The Oxford Handbook of Language Prosody 9780198832232, 0198832230

This handbook presents detailed accounts of current research in all aspects of language prosody, written by leading expe

English Pages 957 Year 2021

Table of contents:
Cover
The Oxford Handbook of Language Prosody
Copyright
Contents
Acknowledgements
List of Figures
List of Tables
List of Maps
List of Abbreviations
About the Contributors
Chapter 1: Introduction
1.1 Introduction
1.2 Motivating our enterprise
1.3 Definitional and terminological issues
1.3.1 Tradition and innovation in defining language prosody
1.3.2 Some typological categories
1.3.3 Some terminological ambiguities
1.4 The structure of the handbook
1.5 Reflections and outlook
Part I: Fundamentals of Language Prosody
Chapter 2: Articulatory Measures of Prosody
2.1 Introduction
2.2 Experimental techniques
2.2.1 Laryngoscopy
2.2.2 Electroglottography
2.3 Aerodynamic and respiratory movement measures
2.4 Point-tracking techniques for articulatory movements
2.4.1 Ultrasound
2.4.2 Electropalatography
2.5 Summary of articulatory measurement techniques
2.6 Conclusion
Acknowledgements
Chapter 3: Fundamental Aspects in the Perception of f0
3.1 Introduction
3.2 A history of fundamental pitch perception research
3.2.1 Basic terminology
3.2.2 Theories of pitch perception
3.2.3 Critical bands and their importance for pitch perception theories
3.2.4 Which components are important?
3.3 Pitch perception in speech
3.3.1 Just noticeable differences and limitations in the perception of f0
3.3.2 Segmental influences on the perception of f0
3.3.3 Perceptual interplay between prosodic parameters
3.4 Conclusion
Part II: Prosody and Linguistic Structure
Chapter 4: Tone Systems
4.1 Introduction: What is tone?
4.1.1 Tone as toneme versus morphotoneme
4.1.2 Tone as pitch versus tone package
4.1.3 Tone-bearing unit versus tonal domain (mora, syllable, foot)
4.1.4 Tone versus accent
4.2 Phonological typology of tone by inventory
4.2.1 Number of tones
4.2.2 Contour tones
4.2.3 Downstep and floating tones
4.2.4 Underspecified tone and tonal markedness
4.2.5 Distributional constraints
4.3 Phonological typology of tone by process
4.3.1 Vertical assimilation
4.3.2 Horizontal assimilation
4.3.3 Contour simplification
4.3.4 Dissimilation and polarity
4.4 Grammatical tone
4.4.1 Lexical versus morphological tone
4.4.2 Tonal morphemes
4.4.3 Replacive tone
4.4.4 Inflectional tonology
4.4.5 Compounding
4.4.6 Phrase-level tonology
4.5 Further issues: Phonation and tone features
4.6 Conclusion
Chapter 5: Word-Stress Systems
5.1 Introduction
5.2 Evidence for stress
5.2.1 Phonetic exponents
5.2.2 Speaker intuitions and co-speech gestures
5.2.3 Segmental and metrical exponents of stress
5.2.4 Distributional characteristics of stress
5.3 Typology of stress
5.3.1 Lexical versus predictable stress
5.3.2 Quantity-insensitive stress
5.3.3 Quantity-sensitive stress
5.3.4 Bounded and unbounded stress
5.3.5 Secondary stress
5.3.6 Non-finality effects
5.4 Rhythmic stress and the foot
5.5 Outstanding issues in word stress
5.5.1 The diagnosis of stress
5.5.2 Stress and prosodic taxonomy
5.5.3 Stress typology and explanation
5.6 Conclusion
Additional reading
Chapter 6: The Autosegmental-Metrical Theory of Intonational Phonology
6.1 Introduction
6.2 AM phonology
6.2.1 AM essentials
6.2.2 Metrical structure and its relationship with the autosegmental tonal string
6.2.3 Secondary association of tones
6.2.4 The phonological composition of melodies
6.3 Phonetic implementation in AM
6.3.1 Tonal alignment
6.3.2 Tonal scaling
6.3.3 Interpolation and tonal crowding
6.4 Applications of AM
6.5 Advantages over other models
Chapter 7: Prosodic Morphology
7.1 Introduction
7.2 Prosodic structure
7.3 Reduplication
7.4 Root-and-pattern morphology
7.5 Truncation
7.6 Infixation
7.7 Summary
Chapter 8: Sign Language Prosody
8.1 The visible organization of sign languages
8.2 Prosodic constituency in signed languages
8.2.1 The syllable and the prosodic word
8.2.2 Intonational phrases
8.2.3 Phonological phrases
8.3 Defining properties of sign language intonation
8.4 Intonation and information structure
8.4.1 Topic/comment
8.4.2 Given/new information
8.4.3 Focus/background
8.5 Prosody versus syntax: evidence from wh-questions
8.6 Summary and conclusion
Acknowledgements
Part III: Prosody in Speech Production
Chapter 9: Phonetic Variation in Tone and Intonation Systems
9.1 Introduction
9.2 Tonal coarticulation
9.3 Timing of pitch movements
9.3.1 Segmentally induced variability in f0 target realization
9.3.2 Time pressure effects on f0 target realization
9.3.3 Truncation and compression
9.4 Scaling of pitch movements
9.4.1 Pitch range variability: basic characteristics
9.4.2 Paralanguage, pitch range (quasi-)universals, and grammaticalization
9.4.3 Downtrend
9.4.4 Perceptual constraints on tone scaling patterns
9.5 Contour shape
9.5.1 Peak shapes and movement curvatures
9.5.2 ‘Dipping’ Lows and local contrast
9.5.3 Integrality of f0 features
9.6 Non-f0 effects
9.7 Conclusion
Chapter 10: Phonetic Correlates of Word and Sentence Stress
10.1 Introduction
10.2 Acoustic correlates of word stress
10.2.1 Segment duration
10.2.2 Intensity
10.2.3 Spectral tilt
10.2.4 Spectral expansion
10.2.5 Resistance to coarticulation
10.2.6 Rank order
10.3 Acoustic correlates of sentence stress
10.4 Perceptual cues of word and sentence stress
10.5 Cross-linguistic differences in phonetic marking of stress
10.5.1 Contrastive versus demarcative stress
10.5.2 Functional load hypothesis
10.6 Conclusion
Appendix
Measuring correlates of stress using Praat speech processing software
Measuring duration
Measuring intensity
Measuring spectral tilt
Measuring formants F1, F2 (for vowels and sonorant consonants)
Measuring noise spectra (for fricatives, stops, and affricates)
Measuring pitch correlates
Chapter 11: Speech Rhythm and Timing
11.1 Introduction
11.1.1 Periodicity in surface timing
11.1.2 Contrastive rhythm
11.1.3 Hierarchical timing
11.1.4 Articulation rate
11.2 ‘Rhythm metrics’ and prosodic typology
11.2.1 Acoustically based metrics of speech rhythm: lessons and limitations
11.2.2 The fall of the rhythm class hypothesis
11.3 Models of prosodic speech timing
11.3.1 Localized approaches to prosodic timing
11.3.2 Coupled oscillator approaches to prosodic timing
11.4 Conclusions and prospects
Part IV: Prosody Across the World
Chapter 12: Sub-Saharan Africa
12.1 Introduction
12.2 Tone
12.2.1 Tonal inventories
12.2.2 The representation of tone
12.2.3 Phonological tone rules/constraints
12.2.4 Grammatical functions of tone
12.3 Word accent
12.4 Intonation
12.4.1 Pitch as marking sentence type or syntactic domain
12.4.2 Length marking prosodic boundaries
12.5 Conclusion
Chapter 13: North Africa and the Middle East
13.1 Introduction
13.2 Afro-Asiatic
13.2.1 Berber
13.2.2 Egyptian
13.2.3 Semitic
13.2.3.1 East Semitic
13.2.3.2 West Semitic: Modern South Arabian
13.2.3.3 West Semitic: Ethio-Semitic
13.2.3.4 Central Semitic: Sayhadic
13.2.3.5 Central Semitic: North West Semitic
13.2.3.6 Central Semitic: Arabian
13.2.4 Chadic
13.2.5 Cushitic
13.2.6 Omotic
13.3 Nilo-Saharan
13.3.1 Eastern Sudanic
13.3.2 Central Sudanic
13.3.3 Maban
13.3.4 Saharan
13.4 Discussion
Chapter 14: South West and Central Asia
14.1 Introduction
14.2 Turkic
14.2.1 Lexical prosody in Turkish: stress
14.2.2 Lexical prosody: vowel harmony in Turkish
14.2.3 Post-lexical prosody in Turkish
14.2.4 Focus in Turkish
14.3 Mongolian
14.3.1 Lexical prosody in Mongolic: stress
14.3.2 Lexical prosody: vowel harmony in Mongolian
14.3.3 Post-lexical prosody in Mongolian
14.3.4 Focus in Mongolian
14.4 Persian
14.4.1 Lexical prosody in Persian
14.4.2 Post-lexical prosody in Persian
14.4.3 Focus in Persian
14.5 Caucasian
14.5.1 Georgian
14.5.1.1 Lexical prosody in Georgian
14.5.1.2 Post-lexical prosody in Georgian
14.5.1.3 Focus in Georgian
14.5.2 Daghestanian
14.6 Communicative prosody: question intonation
14.7 Conclusion
Chapter 15: Central and Eastern Europe
15.1 Introduction
15.2 Word prosody
15.2.1 Quantity
15.2.1.1 Baltic
15.2.1.2 Finno-Ugric
15.2.1.3 Slavic
15.2.2 Word stress
15.2.2.1 Baltic
15.2.2.2 Finno-Ugric
15.2.2.3 Slavic
15.2.2.4 Romance
15.3 Sentence prosody
15.3.1 Baltic
15.3.2 Finno-Ugric
15.3.3 Slavic
15.3.4 Romance
15.4 Conclusion
Chapter 16: Southern Europe
16.1 Introduction
16.2 Prosodic structure
16.2.1 Stress
16.2.2 Rhythm
16.2.3 Prosodic constituency
16.3 Intonation
16.3.1 Inventories
16.3.2 Downstep
16.3.3 Copying, merging, and truncation
16.4 Conclusion
Chapter 17: Iberia
17.1 Introduction
17.2 Word prosody
17.2.1 Catalan, Spanish, and Portuguese
17.2.2 Basque
17.3 Prosodic phrasing
17.3.1 Prosodic constituents and tonal structure
17.3.2 Phrasal prominence
17.4 Intonation
17.4.1 Tonal events
17.4.2 Main sentence types and pragmatic meanings
17.5 Conclusion and perspectives
Acknowledgements
Chapter 18: Northwestern Europe
18.1 Introduction
18.2 Continental North Germanic
18.2.1 Word stress
18.2.2 Tone
18.2.2.1 Typology
18.2.3 Intonation
18.2.4 Notes on Danish
18.2.5 Prosodic domains
18.3 Continental West Germanic
18.3.1 Prosodic domains
18.3.2 Word stress
18.3.3 Intonation
18.3.4 Tone accents
18.4 Concluding remarks
Chapter 19: Intonation Systems Across Varieties of English
19.1 The role of English in intonation research
19.2 Scope of the chapter
19.3 Intonational systems of Mainstream English Varieties
19.3.1 Northern hemisphere
19.3.2 Non-mainstream varieties of American English
19.3.3 Non-mainstream British varieties
19.3.4 Southern hemisphere mainstream varieties
19.4 English intonation in contact
19.4.1 Hong Kong English
19.4.2 West African Englishes (Nigeria and Ghana)
19.4.3 Singapore English
19.4.4 Indian English
19.4.5 South Pacific Englishes (Niue, Fiji, and Norfolk Island)
19.4.6 East African Englishes (Kenya and Uganda)
19.4.7 Caribbean English
19.4.8 Black South African English
19.4.9 Maltese English
19.5 Uptalk
19.6 Conclusion
Chapter 20: The North Atlantic and the Arctic
20.1 Introduction
20.2 Celtic
20.2.1 Irish and Scottish Gaelic
20.2.2 Intonation
20.3 Insular Scandinavian
20.3.1 Stress in words and phrases
20.3.2 Intonation
20.4 Eskimo-Aleut
20.4.1 Inuit
20.4.2 Yupik
20.4.3 Aleut
20.5 Conclusion
Chapter 21: The Indian Subcontinent
21.1 Introduction
21.2 Quantity
21.3 Word stress
21.4 Tone
21.5 Intonation and intonational tunes
21.5.1 Declarative
21.5.2 Focus
21.5.3 Yes/no questions, with and without focus
21.6 Segmental rules and phrasing
21.7 Conclusion
Acknowledgements
Chapter 22: China and Siberia
22.1 Introduction
22.2 The syllable and tone inventories of Chinese languages
22.3 Tone sandhi in Chinese languages
22.4 Lexical and phrasal stress in Chinese languages
22.5 Intonation in Chinese languages
22.5.1 Focus
22.5.2 Interrogativity
22.6 The prosody of Siberian languages
22.7 Summary
Chapter 23: Mainland South East Asia
23.1 Scope of the chapter
23.2 Word-level prosody
23.2.1 Word shapes and stress
23.2.2 Tonation
23.2.2.1 Inventories
23.2.2.2 Tonal phonology, tone sandhi, and morphotonology
23.3 Phrasal prosody
23.3.1 Prosodic phrasing
23.3.2 Intonation
23.3.3 Information structure
23.4 Conclusion
Chapter 24: Asian Pacific Rim
24.1 Introduction
24.2 Japanese
24.2.1 Japanese word accent
24.2.2 Japanese intonation
24.3 Korean
24.3.1 Korean word prosody
24.3.2 Korean intonation: melodic aspects
24.3.3 Korean intonation: prosodic phrasing
24.4 Conclusion
Acknowledgement
Chapter 25: Austronesia
25.1 Introduction
25.2 Lexical tone
25.3 Lexical stress
25.4 Intonation
25.5 Prosodic integration of function words
25.6 Conclusion
Chapter 26: Australia and New Guinea
26.1 General metrical patterns in Australia
26.1.1 Quantity and peninitial stress
26.2 Intonation in Australian languages
26.3 Word prosody in New Guinea
26.3.1 Stress
26.3.2 Tone
26.4 Intonation in Papuan languages
26.5 Conclusion
Chapter 27: North America
27.1 Introduction
27.2 Stress in North American Indian languages
27.2.1 Typology of stress in North American Indian languages
27.2.2 Weight-sensitive stress
27.2.3 Iambic lengthening
27.2.4 Morphological stress
27.2.5 Phonetic exponents of stress in North America
27.3 Tone in North American Indian languages
27.3.1 Tonal inventories
27.3.2 Tonal processes
27.3.3 Stress and tone
27.3.4 Grammatical tone
27.3.5 Tonal innovations and multidimensionality of tone realization
27.4 Intonation and prosodic constituency
27.5 Prosodic morphology
27.6 Conclusions
Chapter 28: Mesoamerica
28.1 Introduction
28.2 Oto-Manguean languages
28.2.1 Lexical tone
28.2.2 Stress
28.2.3 Phonation type
28.2.4 Syllable structure and length
28.2.5 Intonation and prosody above the word
28.3 Mayan languages
28.3.1 Stress and metrical structure
28.3.2 Lexical tone
28.3.3 Phonation
28.3.4 Syllable structure
28.3.5 Intonation
28.4 Toto-Zoquean
28.4.1 Syllable structure, length, and phonation type
28.4.2 Stress and intonation
28.5 Conclusion
Acknowledgements
Chapter 29: South America
29.1 Introduction
29.2 Stress and metrical structure
29.2.1 Manifestation of prominence
29.2.2 Metrical feet and edges
29.2.3 Syllable, mora, and quantity
29.3 Tones
29.3.1 H, M, L
29.3.2 Underlying L and default H
29.3.3 Languages with underlying H and default L
29.3.4 Languages with underlying H and L
29.4 Sonority hierarchies, laryngeals, and nasality
29.5 Word prosody and morphology
29.5.1 Stress and morphology
29.5.2 Tones and morphology
29.6 Historical and comparative issues
29.6.1 Stress
29.6.2 Tones
29.7 Conclusion
Part V: Prosody in Communication
Chapter 30: Meanings of Tones and Tunes
30.1 Introduction
30.2 Basic concepts for the study of intonational meaning
30.3 Generalist and specialist theories of intonational meaning
30.3.1 Generalist theories
30.3.2 Specialist theories
30.4 Towards unifying generalist and specialist theories
30.5 Experimental work on intonational meaning
30.6 Conclusion
Acknowledgements
Chapter 31: Prosodic Encoding of Information Structure: A typological perspective
31.1 Introduction
31.2 Basic concepts of information structure
31.3 A typology of prosodic encoding of information structure
31.3.1 Stress- or pitch-accent-based cues
31.3.1.1 Types of nuclear accent or tune
31.3.2 Phrase-based cues
31.3.3 Register-based cues
31.4 Syntax–prosody interaction and non-prosodic-marking systems
31.5 Unified accounts
31.6 Evaluation and considerations for future research
Chapter 32: Prosody in Discourse and Speaker State
32.1 Introduction
32.2 Prosody in discourse
32.2.1 Prosody and turn-taking
32.2.2 Prosody and entrainment
32.3 Prosody and speaker state
32.3.1 Prosody and emotion
32.3.2 Prosody and deception
32.4 Conclusion
Chapter 33: Visual Prosody Across Cultures
33.1 Introduction
33.2 What is visual prosody?
33.2.1 How do auditory and visual prosodic cues relate to each other?
33.2.2 Is there cultural variability in (audio)visual prosody?
33.3 Three case studies
33.3.1 Cues to feeling-of-knowing in Dutch and Japanese
33.3.2 Correlates of winning and losing by Dutch and Pakistani children
33.3.3 Gestural cues to time in English and Chinese
33.4 Discussion and conclusion
Chapter 34: Pathological Prosody: Overview, assessment, and treatment
34.1 Introduction
34.2 Neural bases of pathological prosody
34.2.1 History of approaches to hemispheric specialization
34.2.2 Current proposals of hemispheric specialization of prosodic elements
34.2.3 Disturbances of pitch control and temporal cues
34.2.4 Subcortical involvement in speech prosody: basal ganglia and cerebellum
34.2.5 Prosody in autism
34.3 Evaluation of prosodic performance
34.4 Treatment for prosodic deficits
34.5 Concluding remarks
Part VI: Prosody and Language Processing
Chapter 35: Cortical and Subcortical Processing of Linguistic Pitch Patterns
35.1 Introduction
35.2 The basic functional anatomy of the human auditory system
35.3 Experimental methods in the neurophysiology of language processing
35.4 Hemispheric specialization
35.5 Neural evidence for mechanisms of linguistic pitch processing
35.6 Cortical plasticity of pitch processing
35.7 Subcortical pitch processing and its plasticity
35.8 Prosody and syntactic processing in the brain
35.9 Conclusion and future directions: bridging linguistic theory and brain models
Chapter 36: Prosody and Spoken-Word Recognition
36.1 Introduction
36.2 Defining prosody in spoken-word recognition
36.3 The Bayesian prosody recognizer: robustness under variability
36.3.1 Parallel uptake of information
36.3.1.1 Influences on processing segmental information
36.3.1.2 Influences on lexical segmentation
36.3.1.3 Influences on lexical selection
36.3.1.4 Influences on inferences about other structures
36.3.2 High contextual dependency
36.3.2.1 Left-context effects
36.3.2.2 Right-context effects
36.3.2.3 Syntagmatic representation of pitch
36.3.3 Adaptive processing
36.3.4 Phonological abstraction
36.4 Conclusions and future directions
Chapter 37: The Role of Phrase-Level Prosody in Speech Production Planning
37.1 Introduction
37.2 Modern theories of prosody
37.3 Evidence for the active use of prosodic structure in speech production planning
37.3.1 Rules and processes that are sensitive to prosodic constituent boundaries: selection of cues
37.3.2 Patterns of surface phonetic values in cues to distinctive segmental features, reflecting prosodic structure
37.3.3 Behavioural evidence for the role of prosody in speech planning
37.4 The role of prosody in models of speech production planning
37.5 Summary and related issues
Part VII: Prosody and Language Acquisition
Chapter 38: The Acquisition of Word Prosody
38.1 Introduction
38.2 The acquisition of lexical tone
38.2.1 Perception of lexical tones
38.2.2 The role of lexical tone in word learning
38.2.3 Production of lexical tones
38.2.4 Summary
38.3 The acquisition of pitch accent
38.3.1 Perception of pitch accent
38.3.2 The role of pitch accent in word recognition and learning
38.3.3 Production of lexical accent
38.3.4 Summary
38.4 The acquisition of word stress
38.4.1 The perception of word stress
38.4.2 The role of word stress in word recognition and word learning
38.4.3 The production of word stress
38.4.4 Summary
38.5 Discussion and conclusions
Chapter 39: Development of Phrase-Level Prosody from Infancy to Late Childhood
39.1 Introduction
39.2 Prosody in infancy
39.2.1 Infants’ perception of tonal patterns and prosodic phrasing
39.2.2 Prosody in pre-lexical vocalizations
39.3 Prosodic production in childhood
39.3.1 The acquisition of intonational contours
39.3.2 The acquisition of speech rhythm
39.4 Communicative uses of prosody in childhood: production and comprehension
39.4.1 Acquisition of prosody and information structure
39.4.2 Acquisition of prosody and sociopragmatic meanings
39.5 Future research
Chapter 40: Prosodic Bootstrapping
40.1 Introduction
40.2 Prosodic bootstrapping theory
40.3 Newborns’ sensitivity to prosody as a foundation for prosodic bootstrapping
40.4 How early sensitivity to prosody facilitates language learning
40.4.1 Prosodic grouping biases and the Iambic-Trochaic Law
40.4.2 How lexical stress helps infants to learn words
40.4.3 How prosody bootstraps basic word order
40.4.4 How prosody constrains syntactic analysis
40.5 Conclusion and perspectives
Chapter 41: Prosody in Infant- and Child-Directed Speech
41.1 Introduction
41.2 Primary prosodic characteristics of infant- and child-directed speech
41.3 Cross-cultural similarities and differences
41.4 Other sources of variation
41.5 Function of prosodic characteristics
41.6 Conclusion and future directions
Chapter 42: Prosody in Children with Atypical Development
42.1 Introduction
42.2 Autism spectrum disorder
42.2.1 Prosody production
42.2.2 Prosody perception
42.3 Developmental language disorder
42.3.1 Prosody production
42.3.2 Prosody perception
42.4 Cerebral palsy
42.4.1 Prosody production
42.5 Hearing loss
42.5.1 Prosody production
42.5.1.1 Prosody production in sentences
42.5.1.2 Emotional prosody production
42.5.2 Prosody perception
42.5.2.1 Prosody and sentence perception
42.5.2.2 Prosody and emotion perception
42.6 Clinical practice in developmental prosody disorders
42.6.1 Assessing prosody deficits
42.6.2 Treatment of prosody deficits
42.7 Conclusion
Chapter 43: Word Prosody in Second Language Acquisition
43.1 Introduction
43.2 Lexical stress
43.2.1 Second language word perception/recognition
43.2.2 Second language word production
43.3 Lexical tone
43.3.1 Second language perception/recognition of lexical tone
43.3.2 Second language production of lexical tone
43.4 Conclusions and future directions
Chapter 44: Sentence Prosody in a Second Language
44.1 Introduction
44.2 Intonational aspects of second language sentence prosody
44.2.1 Prosodic marking of information structure
44.2.2 Prosodic marking of questions
44.2.3 Prosodic phrasing
44.2.4 Phonetic implementation of pitch accents and boundary tones
44.2.5 Prosodic marking of non-linguistic aspects
44.3 Timing phenomena in second language sentence prosody
44.3.1 Rhythm
44.3.2 Tempo and pauses
44.3.3 Fluency
44.4 Perception of second language sentence prosody
44.4.1 Perception and interpretation
44.4.2 Perceived foreign accent and ease of understanding
44.5 Conclusions
Acknowledgements
Chapter 45: Prosody in Second Language Teaching: Methodologies and effectiveness
45.1 Introduction
45.2 The importance of prosody for L2 learners
45.3 Teaching prosody
45.3.1 Intonation
45.3.2 Rhythm
45.3.3 Word stress
45.4 The effectiveness of L2 pronunciation instruction applied to prosody
45.4.1 Awareness
45.4.2 Perception
45.4.3 Production
45.4.4 Multi-modality: visual and auditory input and feedback
45.5 Conclusion
Part VIII: Prosody in Technology and the Arts
Chapter 46: Prosody in Automatic Speech Processing
46.1 Introduction
46.2 A short history of prosody in automatic speech processing
46.2.1 Timeline
46.2.2 Phenomena and performance
46.3 Features and their importance
46.3.1 Power features
46.3.2 Leverage features
46.3.3 An illustration
46.4 Concluding remarks
Acknowledgements
Chapter 47: Automatic Prosody Labelling and Assessment
47.1 Introduction
47.2 Prosodic inventories
47.3 Two types of information about prosody
47.3.1 Information from syntax
47.3.2 Information from the acoustic signal
47.3.3 Fusion of syntactic and acoustic information
47.4 How AuToBI works
47.5 Assessment
47.5.1 Intrinsic assessment of automatic labelling
47.5.2 Extrinsic assessment of automatic labelling
47.5.3 Assessment of language learner prosody
47.6 Conclusion
Chapter 48: Stress, Meter, and Text-Setting
48.1 Introduction
48.2 Meter
48.3 Prominence, rhythm, and stress
48.4 Stress-based meters
48.5 Quantitative meters
48.6 Text-setting
Acknowledgements
Chapter 49: Tone–Melody Matching in Tone-Language Singing
49.1 Introduction
49.2 Defining and investigating tone–melody matching
49.3 Some examples
49.3.1 Cantonese pop music
49.3.2 Vietnamese tân nhạc
49.3.3 Contemporary Thai song
49.3.4 Traditional Dinka songs
49.4 Prospect
References
Index of Languages
Subject Index

Edited by Carlos Gussenhoven (Professor of General and Experimental Phonology) and Aoju Chen (Professor of Language Development in Relation to Socialisation and Identity)

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

The Oxford Handbook of

LANGUAGE PROSODY

OXFORD HANDBOOKS IN LINGUISTICS Recently published

THE OXFORD HANDBOOK OF LANGUAGE POLICY AND PLANNING Edited by James W. Tollefson and Miguel Pérez-Milans

THE OXFORD HANDBOOK OF PERSIAN LINGUISTICS Edited by Anousha Sedighi and Pouneh Shabani-Jadidi

THE OXFORD HANDBOOK OF ENDANGERED LANGUAGES Edited by Kenneth L. Rehg and Lyle Campbell

THE OXFORD HANDBOOK OF ELLIPSIS Edited by Jeroen van Craenenbroeck and Tanja Temmerman

THE OXFORD HANDBOOK OF LYING Edited by Jörg Meibauer

THE OXFORD HANDBOOK OF TABOO WORDS AND LANGUAGE Edited by Keith Allan

THE OXFORD HANDBOOK OF MORPHOLOGICAL THEORY Edited by Jenny Audring and Francesca Masini

THE OXFORD HANDBOOK OF REFERENCE Edited by Jeanette Gundel and Barbara Abbott

THE OXFORD HANDBOOK OF EXPERIMENTAL SEMANTICS AND PRAGMATICS Edited by Chris Cummins and Napoleon Katsos

THE OXFORD HANDBOOK OF EVENT STRUCTURE Edited by Robert Truswell

THE OXFORD HANDBOOK OF LANGUAGE ATTRITION Edited by Monika S. Schmid and Barbara Köpke

THE OXFORD HANDBOOK OF LANGUAGE CONTACT Edited by Anthony P. Grant

THE OXFORD HANDBOOK OF NEUROLINGUISTICS Edited by Greig I. de Zubicaray and Niels O. Schiller

THE OXFORD HANDBOOK OF ENGLISH GRAMMAR Edited by Bas Aarts, Jill Bowie, and Gergana Popova

THE OXFORD HANDBOOK OF AFRICAN LANGUAGES Edited by Rainer Vossen and Gerrit J. Dimmendaal

THE OXFORD HANDBOOK OF NEGATION Edited by Viviane Déprez and M. Teresa Espinal

THE OXFORD HANDBOOK OF LANGUAGE PROSODY Edited by Carlos Gussenhoven and Aoju Chen

For a complete list of Oxford Handbooks in Linguistics please see pp. 893–896.

The Oxford Handbook of

LANGUAGE PROSODY

Edited by CARLOS GUSSENHOVEN and AOJU CHEN

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© editorial matter and organization Carlos Gussenhoven and Aoju Chen 2020
© the chapters their several contributors 2020

The moral rights of the authors have been asserted.

First Edition published in 2020

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data: Data available

Library of Congress Control Number: 2020937413

ISBN 978-0-19-883223-2

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Contents

Acknowledgements xi

List of Figures xiii

List of Tables xxii

List of Maps xxv

List of Abbreviations xxvii

About the Contributors xxxv

1. Introduction

1

Carlos Gussenhoven and Aoju Chen

PART I FUNDAMENTALS OF LANGUAGE PROSODY

2. Articulatory measures of prosody

15

Taehong Cho and Doris Mücke

3. Fundamental aspects in the perception of f0

29

Oliver Niebuhr, Henning Reetz, Jonathan Barnes, and Alan C. L. Yu

PART II PROSODY AND LINGUISTIC STRUCTURE

4. Tone systems

45

Larry M. Hyman and William R. Leben

5. Word-stress systems

66

Matthew K. Gordon and Harry van der Hulst

6. The Autosegmental-Metrical theory of intonational phonology

78

Amalia Arvaniti and Janet Fletcher

7. Prosodic morphology

96

John J. McCarthy

8. Sign language prosody

104

Wendy Sandler, Diane Lillo-Martin, Svetlana Dachkovsky, and Ronice Müller de Quadros

PART III PROSODY IN SPEECH PRODUCTION

9. Phonetic variation in tone and intonation systems

125

Jonathan Barnes, Hansjörg Mixdorff, and Oliver Niebuhr

10. Phonetic correlates of word and sentence stress

150

Vincent J. van Heuven and Alice Turk

11. Speech rhythm and timing

166

Laurence White and Zofia Malisz

PART IV PROSODY ACROSS THE WORLD

12. Sub-Saharan Africa

183

Larry M. Hyman, Hannah Sande, Florian Lionnet, Nicholas Rolle, and Emily Clem

13. North Africa and the Middle East

195

Sam Hellmuth and Mary Pearce

14. South West and Central Asia

207

Anastasia Karlsson, Güliz Güneş, Hamed Rahmani, and Sun-Ah Jun

15. Central and Eastern Europe

225

Maciej Karpiński, Bistra Andreeva, Eva Liina Asu, Anna Daugavet, Štefan Beňuš, and Katalin Mády

16. Southern Europe

236

Mariapaola D’Imperio, Barbara Gili Fivela, Mary Baltazani, Brechtje Post, and Alexandra Vella

17. Iberia

251

Sónia Frota, Pilar Prieto, and Gorka Elordieta

18. Northwestern Europe

271

Tomas Riad and Jörg Peters

19. Intonation systems across varieties of English

285

Martine Grice, James Sneed German, and Paul Warren

20. The North Atlantic and the Arctic

303

Kristján Árnason, Anja Arnhold, Ailbhe Ní Chasaide, Nicole Dehé, Amelie Dorn, and Osahito Miyaoka

21. The Indian subcontinent

316

Aditi Lahiri and Holly J. Kennard

22. China and Siberia

332

Jie Zhang, San Duanmu, and Yiya Chen

23. Mainland South East Asia

344

Marc Brunelle, James Kirby, Alexis Michaud, and Justin Watkins

24. Asian Pacific Rim

355

Sun-Ah Jun and Haruo Kubozono

25. Austronesia

370

Nikolaus P. Himmelmann and Daniel Kaufman

26. Australia and New Guinea

384

Brett Baker, Mark Donohue, and Janet Fletcher

27. North America

396

Gabriela Caballero and Matthew K. Gordon

28. Mesoamerica

408

Christian DiCanio and Ryan Bennett

29. South America

428

Thiago Costa Chacon and Fernando O. de Carvalho

PART V PROSODY IN COMMUNICATION

30. Meanings of tones and tunes

443

Matthijs Westera, Daniel Goodhue, and Carlos Gussenhoven

31. Prosodic encoding of information structure: A typological perspective

454

Frank Kügler and Sasha Calhoun


32. Prosody in discourse and speaker state

468

Julia Hirschberg, Štefan Beňuš, Agustín Gravano, and Rivka Levitan

33. Visual prosody across cultures

477

Marc Swerts and Emiel Krahmer

34. Pathological prosody: Overview, assessment, and treatment

486

Diana Van Lancker Sidtis and Seung-yun Yang

PART VI PROSODY AND LANGUAGE PROCESSING

35. Cortical and subcortical processing of linguistic pitch patterns

499

Joseph C. Y. Lau, Zilong Xie, Bharath Chandrasekaran, and Patrick C. M. Wong

36. Prosody and spoken-word recognition

509

James M. McQueen and Laura Dilley

37. The role of phrase-level prosody in speech production planning

522

Stefanie Shattuck-Hufnagel

PART VII PROSODY AND LANGUAGE ACQUISITION

38. The acquisition of word prosody

541

Paula Fikkert, Liquan Liu, and Mitsuhiko Ota

39. Development of phrase-level prosody from infancy to late childhood

553

Aoju Chen, Núria Esteve-Gibert, Pilar Prieto, and Melissa A. Redford

40. Prosodic bootstrapping

563

Judit Gervain, Anne Christophe, and Reiko Mazuka

41. Prosody in infant- and child-directed speech

574

Melanie Soderstrom and Heather Bortfeld


42. Prosody in children with atypical development

582

Rhea Paul, Elizabeth Schoen Simmons, and James Mahshie

43. Word prosody in second language acquisition

594

Allard Jongman and Annie Tremblay

44. Sentence prosody in a second language

605

Jürgen Trouvain and Bettina Braun

45. Prosody in second language teaching: Methodologies and effectiveness

619

Dorothy M. Chun and John M. Levis

PART VIII PROSODY IN TECHNOLOGY AND THE ARTS

46. Prosody in automatic speech processing

633

Anton Batliner and Bernd Möbius

47. Automatic prosody labelling and assessment

646

Andrew Rosenberg and Mark Hasegawa-Johnson

48. Stress, meter, and text-setting

657

Paul Kiparsky

49. Tone–melody matching in tone-language singing

676

D. Robert Ladd and James Kirby

References

689

Index of Languages

877

Subject Index

887


Acknowledgements

We thank Julia Steer for her invitation to produce a handbook on language prosody and for the confidence she appeared to have in its completion even during the protracted gestation period that preceded our first moves. We are deeply grateful to all first authors for their efforts to coordinate the production of their chapters and for the congenial way in which we were able to negotiate the composition of their author teams with them. We also thank Emilia Barakova, Caroline Féry, David House, Hui-Chuan (Jennifer) Huang, Michael Krauss, Malcolm Ross, Louis ten Bosch, Leo Wetzels, Tony Woodbury, as well as many of our authors for suggesting names of potential contributors or providing information. We thank Karlijn Blommers, Rachida Ganga, Megan Mackaaij, and Laura Smorenburg for the many ways in which they helped us move the project forward, as well as Karen Morgan and Vicki Sunter of Oxford University Press and their outsourced service providers Premkumar Ap, Kim Allen, and Hazel Bird for their cooperation and guidance. Ron Wunderink deserves special thanks for the production of Map 1.1. We would like to acknowledge the financial support from Utrecht University for editorial assistance and from the Centre of Language Studies of Radboud University for Map 1.1.

Our work on this handbook was crucially facilitated by the following reviewers of chapters: Aviad Albert, Kai Alter, Mark Antoniou, Meghan Armstrong-Abrami, Amalia Arvaniti, Stefan Baumann, Štefan Beňuš, Antonis Botinis, Bettina Braun, Amanda Brown, Gene Buckley, Daniel Büring, Gabriela Caballero, Michael Cahill, Francesco Cangemi, Thiago Chacon, Chun-Mei Chen, Laura Colantoni, Elisabeth de Boer, Fernando O. de Carvalho, Maria del Mar Vanrell, Volker Dellwo, Anne-Marie DePape, Christian DiCanio, Laura Dilley, Laura Downing, Núria Esteve-Gibert, Susan D. Fischer, Sónia Frota, Riccardo Fusaroli, Carolina Gonzalez, Matthew K. Gordon, Wentao Gu, Mark Hasegawa-Johnson, Kara Hawthorne, Bruce Hayes, Nikolaus P. Himmelmann, David House, Larry M. Hyman, Pavel Iosad, Haike Jacobs, Barış Kabak, Sayang Kim, John Kingston, James Kirby, Michael Krauss, Gjert Kristoffersen, Jelena Krivokapić, Haruo Kubozono, Frank Kügler, Anja Kuschmann, D. Robert Ladd, Angelos Lengeris, Pärtel Lippus, Zenghui Liu, Jim Matisoff, Hansjörg Mixdorff, Peggy Mok, Christine Mooshammer, Claire Nance, Marta Ortega-Llebaria, Cédric Patin, Roland Pfau, Cristel Portes, Pilar Prieto, Anne Psycha, Melissa Redford, Bert Remijsen, Tomas Riad, Toni Rietveld, Anna Sara Romøren, Malcolm Ross, Katrin Schweitzer, Stefanie Shattuck-Hufnagel, Stavros Skopoteas, Louis ten Bosch, Annie Tremblay, Jürgen Trouvain, Hubert Truckenbrodt, Frank van de Velde, Harry van der Hulst, Vincent J. van Heuven, László Varga, Irene Vogel, Lei Wang, Natasha Warner, Leo Wetzels, Chris Wilde, Patrick C. M. Wong, Tony Woodbury, Seung-yun Yang, Sabine Zerbian, Bao Zhiming, and Marzena Żygis.


List of Figures

2.1 Waveforms corresponding to vocal fold vibrations in electroglottography (examples by Phil Hoole at IPS Munich) for different voice qualities. High values indicate increasing vocal fold contact.

17

2.2 Volume of the thoracic and abdominal cavities in a Respitrace inductive plethysmograph during sentence production, inhalation and exhalation phase.

19

2.3 Lip aperture in electromagnetic articulography. High values indicate that lips are open during vowel production. Trajectories are longer, faster, and more displaced in target words in contrastive focus (lighter grey lines) compared with words out of focus.

21

2.4 Tongue shapes in ultrasound.

23

2.5 Contact profiles in electropalatography for different stops and fricatives. Black squares indicate the contact of the tongue surface with the palate (upper rows = alveolar articulation, lower row = velar articulation).

25

3.1 Enumeration ‘Computer, Tastatur und Bildschirm’ spoken by a female German speaker in three prosodic phrases (see also Phonetik Köln 2020).

35

3.2 Schematic representation of the two key hypotheses of the Theory of Optimal Tonal Perception of House (1990, 1996): (a) shows the assumed time course of information density or cognitive workload across a CVC syllable and (b) shows the resulting pitch percepts for differently aligned f0 falls.

38

3.3 Utterance in Stockholm bei der ICPhS, Kiel Corpus of Spontaneous Speech, female speaker g105a000. Arrows indicate segmental intonation in terms of a change in the spectral energy distribution (0–8 kHz) of the final [s], 281 ms.

41

3.4 Perceived prosodic parameters and their interaction.

41

5.1 Number of languages with different fixed-stress locations according to StressTyp2 (Goedemans et al. 2015).

70

5.2 Median percentages of words with differing numbers of syllables in languages with a single stress per word and those with rhythmic secondary stress in Stanton (2016).

73


6.1 Spectrograms and f0 contours illustrating the same English tune as realized on a monosyllabic utterance (a) and a longer utterance (b).

80

6.2 Spectrograms and f0 contours of the utterance [koˈlibise i ˈðimitra] with focus on [koˈlibise] ‘swam’ (a) and on [ˈðimitra] (b), translated as ‘Did Dimitra SWIM?’ and ‘[Was it] DIMITRA who swam?’ respectively.

85

6.3 The English intonation grammar of Pierrehumbert (1980); after Dainora (2006).

86

8.1 The monosyllabic sign SEND in ISL. The dominant hand moves in a path from the chest outward, and the fingers simultaneously change position from closed to open. The two simultaneous movements constitute a complex syllable nucleus.

106

8.2 ISL complex sentence, ‘The cake that I baked is tasty’, glossed: [[CAKE IX]PP [I BAKE]PP]IP [[TASTY]PP]IP. ‘IX’ stands for an indexical pointing sign.

107

8.3 Linguistic facial expressions for three types of constituent in ISL. (a) Yes/no questions are characterized by raised brows and head forward and down; (b) wh-questions are characterized by furrowed brow and head forward; and (c) squint signals retrieval of information shared between signer and addressee. These linguistic face and head positions are strictly aligned temporally with the signing hands across each prosodic constituent.

109

8.4 Simultaneous compositionality of intonation in ISL: raised brows of yes/no questions and squint of shared information, e.g. ‘Did you rent the apartment we saw last week?’.

110

8.5 Overriding linguistic intonation with affective intonation: (a) yes/no question, ‘Did he eat a bug?!’ with affective facial expression conveying fear/revulsion, instead of the neutral linguistic yes/no facial expression shown in Figure 8.3a. (b) wh-question, ‘Who gave you that Mercedes Benz as a gift?!’ Here, affective facial expression conveying amazement overrides the neutral linguistic wh-question expression shown in Figure 8.3b.

111

8.6 Intonational marking of topics in (a) ISL and (b) ASL.

113

8.7 Different phonetic realizations of the low accessibility marker, squint, in (a) ISL and (b) ASL.

114

8.8 Non-manual markers in Libras accompanying (a) information focus (raised brows and head tilted back) and (b) contrastive focus (raised and furrowed brows, and head tilted to the side).

116

8.9 ASL alternative question, glossed: PU FLAVOUR CHOCOLATE VANILLA OR LAYER OR PU, translated roughly as ‘What flavour do you want, chocolate, vanilla or layer?’.

121

9.1 Carry-over coarticulation in Mandarin Chinese. See text for an explanation of the abbreviations.

127


9.2 Anticipatory coarticulation in Mandarin Chinese appears dissimilatory. See text for an explanation of the abbreviations.

128

9.3 Segmental anchoring: a schematic depicting the relative stability of alignment of f0 movements (solid line) with respect to the segmental string (here CVC) and the accompanying variation in shape (i.e. slope and duration) of the f0 movement.

130

9.4 Schematic representation of compressing and truncating approaches to f0 realization under time pressure.

132

9.5 An f0 peak realized over the monosyllabic English word Anne at seven different levels of emphasis. Peaks vary considerably, while the final low is more or less invariant.

133

9.6 f0 contours for 11 English sentences read by speaker KS. A general downward trend is clearly observed (§9.4.3), but the distance between the peaks and the baseline is also progressively reduced, due to the topline falling more rapidly than the baseline. S = sentence.

134

9.7 Waveform, spectrogram, and f0 contour of a Cantonese sentence, 媽媽擔憂娃娃, maa1 maa1 daam1 jau1 waa1 waa1, ‘Mother worries about the baby’, composed entirely of syllables bearing high, level Tone 1. Gradually lowering f0 levels over the course of the utterance could be attributed to declination.

136

9.8 Waveform, spectrogram, and f0 contour of a Cantonese sentence, 山岩遮攔花環, saan1 ngaam4 ze1 laan4 faa1 waan4, ‘A mountain rock obstructs the flower wreath’, in which high Tone 1 alternates with the low falling Tone 4, creating a HLHLHL pattern reminiscent of the terracing downstep typically described in African languages.

137

9.9 The realization of the H+L* versus H* contrast in German by means of variation in f0 peak alignment (top) or f0 peak shape (bottom). The word-initial accented CV syllables of Laden ‘store’, Wiese ‘meadow’, Name ‘name’, and Maler ‘painter’ are framed in grey. Unlike for the ‘aligner’ (LBO), the f0-peak maxima of the ‘shaper’ are timed close to the accented-vowel onset for both H+L* and H*.

141

9.10 A declarative German sentence produced once as a statement (left) and once as a question (right). The shapes of the prenuclear pitch accent peaks are different. The alignment of the pitch accent peaks is roughly the same (and certainly within the same phonological category) in both utterances (statement and question).

142

9.11 A sharp peak, and a plateau, realized over the English phrase ‘there’s luminary’.

142

9.12 Schematic depiction of how various f0 contour shape patterns affect the location of the Tonal Center of Gravity (TCoG) (Barnes et al. 2012b) and the concomitant effect on perceived pitch event alignment. The shapes on the left should predispose listeners to judgements of later ‘peak’ timing, while the mirror images (right) suggest earlier timing. Shapes that bias perception in the same direction are mutually enhancing and hence predicted to co-occur more frequently in tonal implementation.

145

9.13 f0-peak shift continuum and the corresponding psychometric function of H* identifications. The lighter lines refer to a repetition of the experiment but with a flatter intensity increase across the CV boundary.

146

10.1 Initial stress perceived (%) as a function of intensity difference between V1 and V2 (in dB) and of duration ratio V1 ÷ V2 in minimal stress pairs (a) in English, after Fry (1955), and (b) in Dutch, after van Heuven and Sluijter (1996).

159

14.1 Multiple φ’s in all-new context and with canonical SOV order.

210

14.2 Pitch track of Ali biliyor Aynurun buraya gelmeden önce nereye gitmiş olabileceğini ‘Ali knows where Aynur might have gone to before coming here’, illustrating multiple morphosyntactic words as a single ω, with focus for the subject Ali (Özge and Bozşahin 2010: 148).

210

14.3 Pitch track showing the division into α’s of all-new [[mʊʊr]α[nɔxɔint]α [parʲəgtəw]ip]ι ‘A cat was caught by a dog’, where underlined bold symbols correspond to the second mora in an α. -LH marks the beginning of the ip (Karlsson 2014: 194).

213

14.4 Pitch track of [[pit]α [marɢaʃα]ip [[xirɮʲəŋ]α [ɢɔɮig]α]ip [tʰʊʊɮəŋ]ip]ι ‘We will cross the Kherlen river tomorrow’ (Karlsson 2014: 196). -LH marks the beginning of an ip.

214

14.5 Pitch track and speech waveform illustrating final ←Hfoc marking focus on all the preceding constituents. The utterance is [[[manai aaw pɔɮ]α]ip [[[saixəŋʦantai]α]ip [[ʊxaɮəg]α]ip]foc [xuŋ]ip]ι ‘My father is nice and wise’.

215

14.6 f0 contours of 13a (a) and 13b (b).

218

14.7 Pitch track and speech waveform of Manana dzalian lamaz meomars bans, ‘Manana is washing the very beautiful soldier’. Each word forms an α with a rising contour, [L* Hα].

220

14.8 Pitch track of The soldier’s aunt is washing Manana. The complex NP subject [meomris mamida] forms an ip, marked with a H- boundary tone that is higher than the preceding Hα.

221

14.9 Pitch track of No, GELA is hiding behind the ship, where the subject noun is narrowly focused and the verb, instead of being deaccented, has a H+L phrase accent. The focused word and the verb together form one prosodic unit.

222

16.1 Surface variants predicted by Jun and Fougeron’s model (from Michelas and D’Imperio 2012b).

243


16.2 Pitch tracks for non-contrastive topic (T), partial topic (PT), and contrastive focus (CF) renditions of the sentence Milena lo vuole amaro ‘Milena drinks it black’, in Neapolitan (NP–VP boundary = vertical line).

246

16.3 In the utterance Le schéma du trois-mâts de Thomas devenait vraiment brouillon ‘Thomas’s sketch of a square-rigger became a real scribble’, the continuous line represents the reference pitch level for the first phrase.

247

16.4 Schematization involving a nuclear L* Hφ followed by a postnuclear L+H- Hι combination.

248

16.5 Tonal copy of the final tone of the matrix sentence Parce qu’il n’avait plus d’argent ‘Because he didn’t have any money’ onto the right-dislocated constituent Mercier (after Ladd 1996: 141–142).

249

17.1 Frequencies of stress patterns (%) in Catalan, Spanish, and Portuguese.

252

17.2 Left utterance: Amúmen liburúa emon nau (grandmother-gen book-abs give aux ‘(S)he has given me the grandmother’s book’); right utterance: Lagunen diruá emon nau (friend-gen money-abs give aux ‘(S)he has given me the friend’s money’).

254

17.3 f0 contour of the Catalan utterance La boliviana de Badalona rememorava la noia (‘The Bolivian woman from Badalona remembered the girl’).

256

17.4 f0 contour of the Spanish utterance La niña de Lugo miraba la mermelada ‘The girl from Lugo watched the marmalade’.

256

17.5 f0 contour of the Portuguese utterance A nora da mãe falava do namorado (‘The daughter-in-law of (my) mother talked about the boyfriend’).

257

17.6 f0 contour of an utterance from Northern Bizkaian Basque: ((Mirénen)AP (lagúnen)AP (liburúa)AP )ip erun dot (Miren-gen friend-gen book-abs give aux ‘I have taken Miren’s friends’ book’).

257

17.7 f0 contour of an utterance from Northern Bizkaian Basque: ((Imanolen alabien diruá)AP )ip erun dot (Imanol-gen daughter-gen money-abs give aux ‘I have taken Imanol’s daughter’s money’).

258

17.8 f0 contour of the broad-focus statement Les nenes volen melmelada, produced by a Catalan speaker (top), and Las niñas quieren mermelada, produced by a Spanish speaker, ‘The girls want jam’ (bottom).

263

17.9 f0 contour of the narrow contrastive focus statement Les nenes volen MELMELADA (‘The girls want JAM’), produced by a Catalan speaker.

264

17.10 f0 contour of the narrow contrastive-focus statement MELMELADA, quieren (JAM (they) want, ‘(They) want JAM’), produced by a Spanish speaker.

264


17.11 f0 contour of the rising yes/no question Quieren mermelada?, produced by a Spanish speaker (top), and Que volen melmelada, produced by a Catalan speaker (bottom), ‘Do (they) want jam?’

265

17.12 f0 contour of the broad-focus statement As meninas querem marmelada (‘The girls want jam’), produced by a European Portuguese speaker.

266

17.13 f0 contour of the narrow contrastive-focus statement As meninas querem marmelada (‘The girls want jam’) with a narrow focus on ‘MARMELADA’ (top) and on ‘AS MENINAS’ (bottom), in European Portuguese.

267

17.14 f0 contour of the yes/no question As meninas querem marmelada? (‘(Do) the girls want jam?’), produced by a Standard European Portuguese speaker.

268

17.15 f0 contours of the statement Allagá da laguna (arrive aux friend-abs ‘The friend has arrived’) and the yes/no question Allagá da laguna? (‘Has the friend arrived?’), uttered by the same speaker of Northern Bizkaian Basque.

269

17.16 f0 contour of the yes/no question Garágardoa edán du? (beer-abs drink aux, ‘Did he drink the beer?’) in Standard Basque.

269

18.1 The lexical tone and the partial identity of the accents in Central Swedish.

274

18.2 Post-lexical Accent 2, compound, uppmärksamhetssplittring ‘attention split’.

274

18.3 Swedish intonation: initiality accent (dåliga), deaccenting with plateau (gamla), word accent (lagningar), deaccented auxiliary (måste), and nuclear accent (åtgärdas).

276

18.4 Comparison of Stockholm and Copenhagen pitch accents in three different conditions. The extracted word is ˈKamma (name), which gets Accent 2 in Swedish and no-stød in Danish.

277

18.5 Nuclear pitch contours without accent modifications attested for Dutch, High German, Low German, and West Frisian.

282

18.6 Cologne Accent 1 and Accent 2 in nuclear position interacting with two contours, H*LL% and L* H-L%.

283

19.1 Tonal representations and stylized f0 contours for three stress patterns in a declarative context.

292

19.2 Tonal representations and stylized f0 contours for three stress patterns in a polar interrogative context.

292

19.3 Waveform, spectrogram, and f0 track for a sentence of read speech in Singapore English.

294

19.4 Intonation patterns for yes/no questions in Fijian and Standard English.

296


19.5 Fall-rise uptalk contour, Australian English.

300

19.6 Late rise uptalk contour, New Zealand English.

300

21.1 Declarative intonation in Bengali.

322

21.2 Standard Colloquial Assamese ‘Ram went to Ramen’s house’ (Twaha 2017: 57).

322

21.3 Nalbariya Variety of Assamese ‘Today I scolded him’ (Twaha 2017: 79).

322

21.4 Bengali intonation, focus on Nɔren.

325

21.5 Bengali intonation, focus on Runir.

325

21.6 Four prosodic structures for Bengali [mɑmɑ-r ʃɑli-r bie] ‘Mother’s brother’s wife’s sister’s wedding’. The neutral, broad-focus declarative (a); the declarative with focus on [mɑmɑ-r] ‘It is Mother’s brother’s wife’s sister’s wedding’ (b); the neutral, broad-focus yes/no question (c); the yes/no question with focus on [mɑmɑ-r] ‘Is it Mother’s brother’s wife’s sister’s wedding?’ (d). Only in (c) can r-coronal assimilation go through, since there are focus-marking phonological phrase boundaries after [mɑmɑr] in (b) and (d), and an optional, tone-marked phonological phrase boundary in (a).

329

23.1 Waveforms, spectrograms, and pitch tracks of the Wa words tɛɁ ‘land’ (clear register, left) and tɛ̤Ɂ ‘wager’ (breathy register, right). The clear register is characterized by sharper, more clearly defined formants; the breathy register has relatively more energy at very low frequencies.

348

24.1 Waveform and f0 track of example (10) produced as [{(jʌŋanɨn) (imoɾaŋ)(imobuɾaŋ)}ip{(jʌŋhwagwane)(kandejo)}ip]IP by a Seoul Korean speaker.

367

24.2 Waveform and f0 track of example (10) produced as [{(jʌŋanɨn) (imoɾaŋ)(imobuɾaŋ)}ip {(jʌŋhwagwane kandejo)}ip]IP by a Chonnam Korean speaker.

368

25.1 The Austronesian family tree (Blust 1999; Ross 2008).

371

25.2 f0 and edge tones for (3).

377

25.3 f0 and edge tones for (4).

377

25.4 f0 and edge tones for (6).

379

25.5 f0 and edge tones for (7).

380

25.6 f0 and edge tones for (8).

380

25.7 f0 and tonal targets for (9).

381

25.8 f0 and tonal targets for (10).

382

26.1 f0 contour for a wh-question in Mawng: ŋanti calŋalaŋaka werk ‘Who is the one that she sent first?’ with high initial peak on the question word: ŋanti ‘who’.

389


28.1 Tones in utterance non-final and utterance-final position in Ixcatec. The figures show f0 trajectories for high, mid, and low tones, averaged across four speakers.

416

31.1 Typical realizations of (1) and (4), showing how focus position affects prosodic realization. A schematic pitch realization is given, along with the prosodic phrasing, intonational tune, and text, where capitals indicate the pitch accented syllable. See text for further details.

457

31.2 Lebanese Arabic (a) and Egyptian Arabic (b) realization of narrow focus on the initial subject, from Chahal and Hellmuth (2014b). As can be seen, post-focal words are deaccented in Lebanese Arabic but not Egyptian Arabic.

458

31.3 Broad focus (a) and contrastive focus (b) in Sardinian.

460

31.4 Time-normalized pitch tracks in different focus conditions in Hindi, based on five measuring points per constituent, showing the mean across 20 speakers. SOV (a) and OSV word order (b). The comparisons of interest are subject focus (dotted line) and object focus (dashed line) with respect to broad focus (solid line).

463

33.1 Visual cues reflecting a positive (a) and a negative (b) way to produce the utterance ‘My boyfriend will spend the whole summer in Spain’.

480

34.1 (a) Schematized f0 curve of He’s going downtown today, produced by a healthy male speaker. (b) Schematized f0 curve of He’s going downtown today, produced by a patient with right hemisphere damage diagnosed with dysprosody, before treatment. The intonation contour rises at the end, yielding an unnatural prosody. (c) Schematized f0 curve of He’s going downtown today, produced by a dysprosodic patient with right hemisphere damage, after treatment. The intonation contour approaches the healthy speaker’s profile.

493

37.1 Two versions of the Nijmegen approach to modelling speech production planning. (a) shows the 1989 version of planning for connected speech, while (b) shows the 1999 version of planning for single-word (or single-PWd) utterances.

533

41.1 An example of mean f0 and f0 variability in CDS compared with ADS across six languages for both fathers (Fa) and mothers (Mo).

575

45.1 Visualization techniques for intonation contours. (a) depicts drawn, stylized intonation contours (e.g. also employed by Celce-Murcia et al. 1996/2010). (b) portrays a smoother, continuous contour (e.g. used in Gilbert 1984/2012). (c) shows a system consisting of dots representing the relative pitch heights of the syllables; the size of the dots indicates the salience level of the syllables; tonal movements are indicated by curled lines starting at stressed syllables (O’Connor and Arnold 1973). (d) represents pitch movement by placing the actual text at different vertical points (e.g. Bolinger 1986). (e) illustrates a notational system that uses arrows to indicate the direction of pitch movement and diacritics and capitalization to mark stress (similar to Bradford 1988, who used a combination of (c) and (d)). (f) represents a modification of the American transcription system ToBI, based on the autosegmental approach (e.g. Toivanen 2005; Estebas-Vilaplana 2013). The focal stress of the sentence is marked by the largest dot above the stressed syllable (c), capitalization of the stressed syllable and the diacritic ´ (e), and the L+H* notation marking the pitch accent (f).

622

45.2 The rhythms of some other languages (top) and English (bottom).

623

45.3 Representing pitch movement in a pronunciation course book.

626

45.4 Teaching word stress using rubber bands.

627

45.5 Waveforms and pitch curves of jìn lái ‘come in’ produced by a female native speaker (left) and a female student (right).

629

45.6 Different renditions of the question ‘You know why, don’t you?’.

630

45.7 A screenshot of the Streaming Speech software.

630

46.1 Effect of power features on performance: a few power features contribute strongly to performance (continuous line), whereas often there is no clear indication of which features contribute most (dashed line).

644


List of Tables

2.1 Advantages and disadvantages of articulatory measuring techniques

26

4.1 Tonal contrasts in Iau

45

4.2 Tonal contrasts in Vietnamese

46

4.3 Different contour simplifications of L-HL-H

55

4.4 H- tone stem patterns in Kikuria

58

4.5 Detransitivizing LH replacive tone in Kalabari

58

4.6 Possessive determiners in Kunama

59

4.7 Noni sg~pl alternations in noun class 9/10

59

4.8 Day completive/incompletive aspect alternations

59

4.9 Iau verb tones

60

4.10 Inflected subject markers in Gban

60

4.11 Tonal distributions in Itunyoso Trique

64

8.1 Non-manual marking used in different contexts in ASL and Libras

120

12.1 Grammatical tone in a language without a tone contrast in the verb stem (Luganda) and its absence in a language with such a tone contrast (Lulamogi)

187

12.2 Stem-initial prominence marked by distributional asymmetries

190

13.1 Stress assignment in different Arabic dialects

199

13.2 Pausal alternations observed in Classical Arabic (McCarthy 2012)

200

15.1 Available descriptions based on the autosegmental-metrical framework

230

16.1 Information-seeking yes/no questions: nuclear patterns in 16 Italian varieties (left table) and their stylization (right schemes); motifs indicate possible groupings on the basis of nuclear tunes; varieties are represented by abbreviations. For details see Gili Fivela and Nicora (2018), adapted and updated from Gili Fivela et al. (2015a)

242

16.2 Combinations of phrase accent and boundary tone and their pragmatic functions in Athenian Greek (from Arvaniti and Baltazani 2005)

244

17.1 Inventory of nuclear accents

261

17.2 Inventory of IP boundary tones

262

17.3 Nuclear configurations

262


18.1 Accent contrast and prominence levels in Central Swedish (Bruce 1977, 2007; Myrberg 2010; Myrberg and Riad 2015, 2016). The lexical tone is bolded and intonation tones are plain. The tone-bearing unit (TBU) is a stressed syllable; * indicates association to a TBU

273

22.1 Tonal inventories in three dialects of Chinese

333

28.1 Tonal complexity by Oto-Manguean language family

409

28.2 Ixpantepec Nieves Mixtec (Carroll 2015)

410

28.3 San Juan Quiahije Chatino tone sandhi (Cruz 2011)

411

28.4 Yoloxóchitl Mixtec tonal morphology (Palancar et al. 2016)

411

28.5 Stress pattern by Oto-Manguean language family

412

28.6 Controlled and ballistic syllables (marked with /ˊ/) in Lalana Chinantec (Mugele 1982: 9)

413

28.7 The distribution of Itunyoso Triqui tones in relation to glottal consonants

414

28.8 Permitted rime types and length contrasts by Oto-Manguean family

415

28.9 Syllable structure in Ayutla Mixe

424

28.10 Segment-based quantity-sensitive stress in Misantla Totonac nouns (Mackay 1999)

425

28.11 Lexical stress in Filomena Mata Totonac (McFarland 2009)

426

29.1 Summary of languages with stress and/or tone systems

428

29.2 Position of primary stress relative to word edges

430

29.3 Types and proportion of quantity-sensitive systems

432

30.1 Information-structural meanings of pitch accents (Steedman 2014)

447

34.1 Theoretical overview of hemispheric lateralization for speech prosody

488

37.1 Distribution of glottalized word-onset vowels in a sample of FM radio news speech, showing the preference for glottalization at the onset of a new intonational phrase and at the beginning of a pitch accented word, as well as individual speaker variation. Stress level is indicated with +/−F for full versus reduced vowel, and +/−A for accented versus unaccented syllable

523

42.1 Recording form for judging prosodic production in spontaneous speech

592

46.1 Prototypical approaches in research on prosody in automatic speech processing over the past 40 years (1980–2020), with the year 2000 as a turning point from traditional topics to a new focus on paralinguistics

636

46.2 Phenomena and performance: a rough overview (qualitative performance terms appear in italics)

638


49.1 Similar, contrary, and oblique settings, as defined by the relation between the pitch direction in a sequence of two tones and the two corresponding musical notes

680

49.2 Expected frequencies of similar, oblique, and contrary settings

680

49.3 The six Cantonese tones classified in terms of overall level, for the purposes of defining pitch direction in a sequence of two tones

681

49.4 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a 2,500-bigram corpus from Cantonese pop songs, from Lo (2013)

682

49.5 The six Vietnamese tones classified in terms of overall level, for purposes of defining pitch direction in a sequence of two tones

682

49.6 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a corpus from Vietnamese ‘new music’

683

49.7 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a bigram corpus from 30 Thai pop songs

684

49.8 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a 355-bigram pilot corpus from three Dinka songs

685


List of Maps

1.1 Areal groupings of languages explored in Part IV

see plate section

12.1 Number of contrastive tone heights

see plate section

12.2 Types of tonal contours

see plate section

12.3 Types of downstepped tones

see plate section

13.1 Geographical location of languages treated in this chapter, with indications of the presence of stress and of tone contrasts (1 = binary contrast; 2 = ternary contrast; 3+ = more complex system), produced with the ggplot2 R package (Wickham 2009)

196

24.1 Japanese and South Korean dialect areas

357

24.2 Six dialect areas of Korean spoken in South Korea

363

26.1 Locator map for tonal elaboration in New Guinea. Grey circle: language with no tone; dark grey square: language with a two-way contrast in tone; black circle: language with three or more tonal contrasts

392

List of Abbreviations

1pl.o 1st person exclusive object
A1 primary auditory cortex
a1pl 1st person plural absolutive
a1sg 1st person singular absolutive
AAE African American English
abil abilitative
abs absolutive
acc accusative
adj adjective
ADS adult-directed speech
aka also known as
AM acoustic model
AM autosegmental-metrical (theory of intonational phonology)
and andative
AP accentual phrase
AP Articulatory Phonology
appl applicative
ASC autism spectrum condition
ASD autism spectrum disorder
ASL American Sign Language
ASP automatic speech processing
ASR automatic speech recognition/recognizer
ATR advanced tongue root
AU action unit
AusE Australian English
AuToBI Automatic ToBI-labelling tool
aux auxiliary
av actor voice
AVEC Audio/Visual Emotion Challenge
BCMS Bosnian-Croatian-Montenegrin-Serbian
BlSAfE Black South African English
BOLD blood-oxygen-level-dependent
BPR Bayesian Prosody Recognizer
BURSC Boston University Radio Speech Corpus
C consonant
Ca Catalan
CAH Contrastive Analysis Hypothesis
CAPT computer-assisted pronunciation training
caus causative
CAY Central Alaskan Yupik
CC corpus callosum
CDS infant- and child-directed speech
CEV Contact English Variety
CF conversational filler
CI cochlear implant
cl classifier
clf cleft
CNG Continental North Germanic
com comitative
com.a3sg completive aspect, 3rd person singular absolutive
compl completive
conj conjunction
CP categorical perception
CP computational paralinguistics
CP cerebral palsy
C-Prom annotated corpus for French prominence studies
CQ closed quotient, aka contact quotient (glottis)
cs centisecond
CS current speaker
CWCI children with cochlear implants
CWG Continental West Germanic
CWTH children with typical hearing
DANVA Diagnostic Analysis of Nonverbal Accuracy
dat dative
dB decibel
def definite
det determiner
DIRNDL Discourse Information Radio News Database for Linguistic Analysis
DLD developmental language disorder
dur durative
e1pl 1st person plural ergative
e3sg 3rd person singular ergative
EEG electroencephalography, electroencephalogram
EGG electroglottography, electroglottograph
EMA electromagnetic articulography, articulograph
emph emphatic
encl enclitic
EPG electropalatography
ERB equivalent rectangular bandwidth (unit)
ERP event-related potential
erg ergative
excl exclusive (1pl)
ez Ezefe marker
F falling tone
f0 fundamental frequency
F1 Formant 1
F1 harmonic mean of precision and recall or F-measure
F2 Formant 2
FFR frequency-following response
FLH Functional Load Hypothesis
fMRI functional MRI
fNIRS functional near-infrared spectroscopy
foc focus
FOK feeling-of-knowing
FP filled pause
Ft foot
fut future
G glide
G_ToBI German ToBI
gen genitive
GhanE Ghanaian English
GR_ToBI Greek ToBI
GT grammatical tone
H High tone
H heavy (syllable type)
HKE Hong Kong English
HL hearing loss
HTS High tone spreading
Hz hertz
i (subscript) intonational phrase
IE Indian English
IFG inferior frontal gyrus
IK intonational construction
incl inclusive (1pl)
inst instrumental
intr intransitive
INTSINT International Transcription System for Intonation
IP intonational phrase
ip intermediate phrase
IPB intonational phrase boundary
IPO Institute for Perception Research
IPP irregular pitch period
irr irrealis
IS information structure
ISL Israeli Sign Language
ITL Iambic-Trochaic Law
IViE Intonational Variation in English
JND just noticeable difference
K_ToBI Korean ToBI
kHz 1,000 Hz
L Low tone
L light (syllable type)
L1 first or native language
L2 second language
lat lative
LeaP corpus Learning Prosody in a Foreign Language corpus
LH left hemisphere
Libras Brazilian Sign Language
LIS Italian Sign Language
LLD low-level descriptor
LM language model
Ln natural logarithm
loc locative
LSF Langue des signes française (French Sign Language)
LSVT Lee Silverman Voice Treatment
LTAS long-term average spectrum
LTS Low tone spreading
M mid (tone)
m medium (stress type)
MAE Mainstream American English
MAE_ToBI Mainstream American English ToBI
MaltE Maltese English
MDS multi-dimensional scaling
MEG magnetoencephalography, magnetoencephalogram
MEV Mainstream English Variety
MFCC mel frequency cepstral coefficient
mira mirative
MIT Melodic Intonation Therapy
ML machine learning
MMN mismatch negativity
MOMEL Modelling Melody
MRI magnetic resonance imaging
ms millisecond
MSEA Mainland South East Asia
MTG middle temporal gyrus
Mword morphological word
N noun
NBB Northern Bizkaian Basque
NC Niger-Congo
neg negation
NGT Nederlandse Gebarentaal (Sign Language of the Netherlands)
NigE Nigerian English
NIRS near-infrared spectroscopy
nom nominative
NP noun phrase
NPN non-Pama-Nyungan
nPVI normalized pairwise variability index
NRU narrow rhythm unit
NVA Nalbariya Variety of Assamese
NZE New Zealand English
Ø distinctive absence of tone
obj object
obliq oblique
OCP Obligatory Contour Principle
OQ open quotient (glottis)
OSV object-subject-verb
OT Optimality Theory
p (subscript) phonological phrase
PA pitch accent
PAM Perceptual Assimilation Model
PENTA Parallel Encoding and Target Approximation
PEPS-C Profiling Elements of Prosody in Speech-Communication
perf perfect(ive)
PET positron emission tomography
PFC post-focus compression
pfx prefix
pl plural
PLVT Pitch Limiting Voice Treatment
png person-number-gender
POS parts-of-speech
poss possessive
pot potential
PP phonological phrase
pr present
prf perfective
prog progressive
ProP Prosody Profile
prs present
prtc participle
Ps subglottal pressure
PT Portuguese
pv patient voice
PVI pairwise variability index
PVSP Prosody-Voice Screening Protocol
Pword phonological word
Q1 short (quantity)
Q2 long (quantity)
Q3 overlong (quantity)
qm question marker
Qpart question particle
QUD question under discussion
quot quotative
R rising tone
RaP Rhythm and Pitch
red reduplicant
refl reflexive
RF random forest
RFR rise-fall-rise
RH right hemisphere
RIP Respitrace inductive plethysmograph
rls realis
RPT Rapid Prosodic Transcription
s second
S strong (stress type)
SA South America(n)
SAH Segmental Anchoring Hypothesis
SAOV subject-adverbial-object-verb
SCA Standard Colloquial Assamese
sg singular
SgE Singapore English
SJQC San Juan Quiahije Chatino
SLM Speech Learning Model
SLUSS simultaneous laryngoscopy and laryngeal ultrasound
SOV subject-object-verb
SP Spanish
sp species
SQ skewness quotient (glottis)
sqrt square root
ss status suffix
SSBE Southern Standard British English
ST semitone
StB Standard Basque
STG superior temporal gyrus
SVM support vector machine
syll/s syllables per second
TAM tense-aspect-mood
TBU tone-bearing unit
TCoG Tonal Center of Gravity
TD typically developing
tns tense
ToBI Tones and Break Indices
ToDI Transcription of Dutch Intonation
top topic marker
tr transitive
TRP transition-relevant place
tv transitive verb
U utterance (prosodic constituent)
UAR unweighted average recognition
UNB Urban Northern British
V vowel
V̂ vowel with falling tone
V̌ vowel with rising tone
V́ high-toned vowel
V̀ low-toned vowel
V̋ super-high-toned vowel
VarcoC variation coefficient for consonantal intervals
VarcoV variation coefficient for vocalic intervals
veg vegetable noun class
vn verbal noun
VOS verb-object-subject
VOT voice onset time
VP verb phrase
VSAO verb-subject-adverbial-object
VSO verb-subject-object
W weak (stress type)
WA word accent
WALS World Atlas of Language Structures
WAR weighted average recognition
WER word error rate
wpm words per minute
WSAfE White South African English
XT/3C Extrinsic-Timing-Based Three-Component model
YM Yoloxóchitl Mixtec
α accentual phrase
ι intonational phrase
μ mora
σ syllable
υ utterance (prosodic constituent)
φ phonological phrase
ω phonological word, prosodic word

About The Contributors

Bistra Andreeva is Professor of Phonetics and Phonology in the Department of Language Science and Technology, Saarland University. Her major research interests include the phonetics and phonology of intonation and rhythm, cross-language and individual differences in the production and perception of syllabic prominence in various languages, the relation between intonation and information structure, and the interaction between information density and prosodic structure.

Kristján Árnason is Professor Emeritus of Icelandic Language and Linguistics at the University of Iceland. He obtained his PhD at the University of Edinburgh in 1977. Among his publications in English are Quantity in Historical Phonology: Icelandic and Related Cases (Cambridge University Press, 1980/2009), The Rhythms of Dróttkvætt and Other Old Icelandic Metres (Institute of Linguistics, University of Iceland, 1991/2000), and The Phonology of Icelandic and Faroese (Oxford University Press, 2011). Particular interests within phonology include intonation and prosody, the interface between morphosyntax and phonology, and Old Icelandic metrical rhythm and poetics. He has organized conferences and participated in research projects on phonological variation, metrics, and sociolinguistics.

Anja Arnhold is an Assistant Professor in the Department of Linguistics at the University of Alberta. She is an experimental phonologist who specializes in the prosodic marking of information structure and has worked on various languages, including Finnish, Greenlandic (Kalaallisut), Inuktitut, Mandarin, and Yakut (Sakha). Anja earned her MA from the University of Potsdam in 2007 and her PhD from Goethe University Frankfurt in 2013, and subsequently held positions as a postdoctoral fellow and contract instructor at the University of Alberta and as a postdoctoral researcher at the University of Konstanz.

Amalia Arvaniti is Professor of English Linguistics at Radboud University. She has worked extensively on prosody, particularly on rhythm and intonation; her research focuses on Greek, English, Romani, and Korean. She has previously held appointments at the University of Kent (2012–2020), the University of California, San Diego (2002–2012), and the University of Cyprus (1995–2001), and temporary appointments at Cambridge, Oxford, and the University of Edinburgh. She is currently the president of the Permanent Council for the Organisation of the International Congress of Phonetic Sciences and the vice-president of the International Phonetic Association.

Eva Liina Asu is an Associate Professor of Swedish and Phonetics at the University of Tartu. She obtained her PhD in Linguistics at the University of Cambridge in 2004. Her research focuses on various prosodic aspects of Estonian, including the phonetics and phonology of intonation, rhythm, stress, and quantity. She is also interested in prosodic and segmental features of Estonian Swedish in comparison with other varieties of Swedish.

Brett Baker is an Associate Professor in Linguistics at the School of Languages and Linguistics, University of Melbourne. His primary research areas are phonology and morphology, with a focus on Australian Indigenous languages, including Kriol. He has worked on a number of languages of southeastern Arnhem Land, especially Ngalakgan and Nunggubuyu/Wubuy, through primary fieldwork since the mid-1990s. His current work takes an experimental approach to investigating the extent to which speakers of Wubuy have knowledge of the internal structure of polysynthetic words.

Mary Baltazani is a researcher at the Phonetics Laboratory, University of Oxford. Her research focuses on phonetics, phonology, and their interface, with special interests in intonation and pragmatics, Greek dialects, dialectology, and sociophonetics. She is currently investigating the diachronic development of intonation as it has been shaped by the historical contact of Greek with Italian and Turkish in a project supported by the Economic and Social Research Council, UK.

Jonathan Barnes is an Associate Professor in the Boston University Department of Linguistics. He received his PhD from the University of California, Berkeley, in 2002, and specializes in the interface between phonetics and phonology, most particularly as this concerns the structures of tone and intonation systems. Much of his recent work involves dynamic interactions in perception between ostensibly distinct aspects of the acoustic signal, and the consequences of these interactions for our understanding of the content of phonological representations.

Anton Batliner is a Senior Research Fellow affiliated with the chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg. He obtained his PhD at LMU Munich in 1978. He has published widely on prosody and paralinguistics and coauthored Computational Paralinguistics (Wiley, 2014, with Björn Schuller), besides being an active editor and conference organizer. His earlier affiliations were with the Pattern Recognition Lab at the University of Erlangen-Nuremberg and the institutes for Nordic Languages and German Philology (both LMU Munich).

Ryan Bennett is an Associate Professor in the Department of Linguistics at the University of California, Santa Cruz. His primary research area is phonology, with a particular emphasis on prosody and the interfaces between phonology and other grammatical domains (phonetics, morphology, and syntax). His current research focuses on the phonetics and phonology of K’ichean-branch Mayan languages, particularly Kaqchikel and Uspanteko. This work involves ongoing, original fieldwork in Guatemala and draws on data from elicitation, experimentation, and corpora. He also has expertise in Celtic languages, especially Irish.

Štefan Beňuš is an Associate Professor in the Department of English and American Studies at Constantine the Philosopher University and a Senior Researcher in Speech Sciences at the Institute of Informatics of the Slovak Academy of Sciences in Bratislava. He holds a PhD in linguistics from New York University and postdoctoral qualifications from Columbia University and LMU Munich. His research centres on the relationship between (i) speech prosody and the pragmatic/discourse aspect of the message and (ii) phonetics and phonology, with a special interest in the articulatory characteristics of speech. He previously served as an associate editor of Laboratory Phonology and regularly presents at major conferences, such as Speech Prosody and Interspeech.

Heather Bortfeld is a Professor of Psychological Sciences at the University of California, Merced (UC Merced). She completed her PhD in experimental psychology at the State University of New York, Stony Brook, in 1998. Her postdoctoral training in cognitive and linguistic sciences at Brown University was supported by the National Institutes of Health. She was on the psychology faculty at Texas A&M University and the University of Connecticut prior to arriving at UC Merced in 2015. Her research focuses on how typically developing infants come to recognize words in fluent speech and the extent to which the perceptual abilities underlying this learning process are specific to language. She has more recently extended this focus to the influence of perceptual, cognitive, and social factors on language development in paediatric cochlear implant users.

Bettina Braun is Professor of General Linguistics and Phonetics at the University of Konstanz. Her research focuses on the question of how listeners process and interpret the continuous speech stream, with a special emphasis on speech prosody. Further research interests include first and second language acquisition of prosody, and the interaction between prosody and other aspects of language (word order, particles).

Marc Brunelle joined the Department of Linguistics at the University of Ottawa, where he is now Associate Professor, in 2006. He obtained his PhD at Cornell University in 2005. His research interests include phonology and phonetics, tone and phonation, prosody, language contact, South East Asian linguistics, and the linguistic history of South East Asia. His work focuses on Chamic languages and Vietnamese.

Gabriela Caballero is an Associate Professor in the Department of Linguistics at the University of California, San Diego. She received her BS from the University of Sonora in 2002 and her PhD from the University of California, Berkeley, in 2008. Her research focuses on the description and documentation of indigenous languages of the Americas (especially Uto-Aztecan languages), phonology, morphology, and their interaction. Her research interests have recently extended to the psycholinguistic investigation of phonological and morphological processing in order to better understand patterns of morphological and phonological variation in morphologically complex languages and prosodic typology.

Sasha Calhoun is a Senior Lecturer in Linguistics at Victoria University of Wellington. Her research focuses on the functions of prosody and intonation, in particular information structure. Her PhD thesis, completed at the University of Edinburgh, looked at how prosody signals information structure in English from a probabilistic perspective. More recently, she has extended this work to look at how information structure, prosody, and syntax interact in other languages, including Samoan, te reo Māori, and Spanish. She has also been involved in work looking at intonation from an exemplar perspective in English and German.

Thiago Costa Chacon is an Assistant Professor of Linguistics at the University of Brasilia and a research associate at the Max Planck Institute for the Sciences of Human History. He received his PhD from the University of Hawaiʻi at Manoa in 2007 and has specialized in the native languages of the Amazon, with particular focus on phonology, typology, historical linguistics, and language documentation and conservation. He has done fieldwork in several languages, including Kubeo, Tukano, Desano, Wanano (Tukanoan family), Ninam (Yanomami family), and Arutani (linguistic isolate).

Bharath Chandrasekaran is a Professor and Vice Chair for Research in Communication Science and Disorders at the University of Pittsburgh. His research uses a systems neuroscience approach to study the computations, maturational constraints, and plasticity underlying speech perception.

Aoju Chen is Professor of Language Development in Relation to Socialisation and Identity at Utrecht University. She has worked extensively on the production, perception, and processing of prosodic meaning and acquisition of prosody in first and second languages from a cross-linguistic perspective. More recently, she has extended her work to research on the social impact of developing language abilities in a first or second language with a focus on speech entrainment and the development of belonging. She is currently an associate editor of Laboratory Phonology and an elected board member of the ISCA Special Interest Group on Speech Prosody (SProSIG).

Yiya Chen is Professor of Phonetics at the Leiden University Centre for Linguistics and a Senior Researcher at the Leiden Institute for Brain and Cognition. Her research focuses on prosody and prosodic variation, with particular attention to tonal languages. The general goal of her research is to understand the cognitive mechanisms and linguistic structures that underlie communication and language. She obtained her PhD from Stony Brook University in 2003. She has worked as a postdoctoral researcher at the University of Edinburgh and Radboud University in Nijmegen. She currently serves on the editorial boards of the Journal of Phonetics and the Journal of the International Phonetic Association.

Taehong Cho is Professor of Phonetics in the Department of English Language and Literature and the Director of the Institute for Phonetics and Cognitive Sciences of Language at Hanyang University. He earned his PhD degree in phonetics at the University of California, Los Angeles, and subsequently worked at the Max Planck Institute for Psycholinguistics in Nijmegen. His main research interest is in the interplay between prosody, phonology, and phonetics in speech production and its perceptual effects in speech comprehension. He is currently serving as editor in chief of the Journal of Phonetics and is book series editor for Studies in Laboratory Phonology (Language Science Press).

Anne Christophe is a Centre National de la Recherche Scientifique (CNRS) researcher and the Director of the Laboratoire de Sciences Cognitives et Psycholinguistique at the École Normale Supérieure in Paris (part of PSL University). She received her PhD in cognitive psychology from the École des Hautes Études en Sciences Sociales (Paris) in 1993 and worked as a postdoctoral researcher at University College London prior to her CNRS research position. Her work focuses on early language acquisition and the role of phrasal prosody and function words in promoting early lexical and syntactic acquisition.

Dorothy M. Chun is Professor of Applied Linguistics and Education at the University of California, Santa Barbara (UCSB). Her research areas include second language phonology and intonation, second language reading and vocabulary acquisition, computer-assisted language learning, and telecollaboration for intercultural learning. She is the author of Discourse Intonation in L2: From Theory and Research to Practice (John Benjamins, 2002). She has been the editor in chief of the online journal Language Learning & Technology since 2000 and is the founding director of the PhD Emphasis in Applied Linguistics at UCSB.

Emily Clem is an Assistant Professor of Linguistics at the University of California, San Diego. She obtained her PhD at the University of California, Berkeley, in 2019. Her research focuses primarily on syntax and its interfaces and draws on data from her fieldwork on Amahuaca (a Panoan language of Peru) and Tswefap (a Grassfields Bantu language of Cameroon). Her work also examines the large-scale areal distribution of linguistic features, such as tone, using computational tools to illuminate the influence of inheritance and contact on distributional patterns.

Svetlana Dachkovsky is a researcher at the Sign Language Research Lab, University of Haifa, and a lecturer at Gordon Academic College. She obtained her PhD at the University of Haifa in 2018. Her research addresses topics in linguistic and non-linguistic aspects of prosody, information structure, and change in sign language grammar, as well as multimodal communication in signed and spoken modalities. Her work focuses on the grammaticalization of non-linguistic signals into linguistic intonation in sign language, and on the role of information structure in this process.

Anna Daugavet is an Assistant Professor at the Department of General Linguistics of Saint Petersburg State University, where she teaches courses on Lithuanian and Latvian dialectology and areal features of the Baltic languages. She completed her PhD in linguistics at Saint Petersburg University in 2009. Her research interests include syllable weight, tone, and stress.

Fernando O. de Carvalho is Assistant Professor of Linguistics at the Federal University of Amapá. He has been a visiting researcher at the Max Planck Institute for Evolutionary Anthropology in Leipzig and the Laboratoire Dynamique du Langage in Lyon. His primary research area is the historical linguistics of the indigenous languages of South America, in particular of the Arawak, Jê, and Tupi-Guarani language families. He has done fieldwork with a number of lowland South American languages, including Mebengokre (Jê), Kalapalo (Cariban), Wayuunaiki (Arawak), and Terena (Arawak).

Nicole Dehé has held postdoctoral research and teaching positions at University College London, the University of Leipzig, the University of Braunschweig, the Humboldt University at Berlin, and the Freie Universität Berlin. She obtained her PhD at the University of Leipzig in 2001. In 2010, she joined the Department of Linguistics at the University of Konstanz as Full Professor of Linguistics. She is an Adjunct Professor at the Faculty of Icelandic and Comparative Cultural Studies, University of Iceland. Her research focuses on prosody, intonation, syntax, and the syntax–prosody and prosody–pragmatics interfaces. She mainly works on Icelandic, English, and German.

Christian DiCanio is Assistant Professor at the University at Buffalo and a senior research scientist at Haskins Laboratories. He obtained his PhD at the University of California, Berkeley, in 2008. As a phonetician and a fieldworker, he focuses primarily on the phonetics, phonology, and morphology of tone in Oto-Manguean languages. He has documented the San Martín Itunyoso Triqui language, and applied corpus and laboratory methods to the analysis of various endangered languages of the Americas.

Laura Dilley is Associate Professor in the Department of Communicative Sciences and Disorders at Michigan State University. She received her BS in brain and cognitive sciences with a minor in linguistics in 1997 from MIT and obtained her PhD in the Harvard–MIT Program in Speech and Hearing Biosciences and Technology in 2005. She is the author of over 60 publications on prosody, word recognition, and other topics.

Mariapaola D’Imperio is currently Distinguished Professor in the Department of Linguistics and the Cognitive Science Center at Rutgers University and Head of the Speech and Prosody Lab. She obtained a PhD in linguistics from Ohio State University in 2000 and then joined the Centre National de la Recherche Scientifique in 2001. She then obtained a position as Professor of Phonetics, Phonology and Prosody at the Department of Linguistics at Aix-Marseille University, where she was Head of the Prosody Group at the Laboratoire Parole et Langage in Aix-en-Provence, France. She is currently associate editor of the Journal of Phonetics and president of the Association for Laboratory Phonology. Her research interests range from the intonational phonology of Romance languages to prosody production, perception, and processing.

Mark Donohue has worked linguistically in New Guinea since 1991, researching languages from both sides of the international border. He has published extensively on the prosodic systems of the languages of a number of different families in the region and beyond, as well as grammatical description and historical linguistics. He works with the Living Tongues Institute for Endangered Languages.

Amelie Dorn obtained her PhD in linguistics from Trinity College Dublin, where she carried out research on the prosody and intonation of Donegal Irish, a northern variety of Irish. In 2015, she joined the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences in Vienna. She is also a postdoctoral researcher and lecturer in the Department of German Studies at the University of Vienna.

San Duanmu is Professor of Linguistics at the University of Michigan. He received his PhD in linguistics from MIT in 1990 and has held teaching posts at Fudan University (1981–1986) and the University of Michigan, Ann Arbor (1991–present). His research focuses on general properties of language, especially those in phonology. He is the author of The Phonology of Standard Chinese (2nd edition, Oxford University Press, 2007), Syllable Structure: The Limits of Variation (Oxford University Press, 2008), Foot and Stress (Beijing Language & Culture University Press, 2016), and A Theory of Phonological Features (Oxford University Press, 2016).

Gorka Elordieta is Professor of Linguistics in the Department of Linguistics and Basque Studies at the University of the Basque Country, Spain. He obtained his PhD at the University of Southern California in 1997. His main interests are the syntax–phonology interface (the derivation of prosodic structure from syntactic structure and the effect of prosodic markedness constraints on prosodic phrasing), intonation in general, the prosodic realization of information structure, and intonational issues in language and dialect contact situations.

Núria Esteve-Gibert is an Associate Professor in the Department of Psychology and Educational Sciences at the Universitat Oberta de Catalunya. She is mainly interested in first and second language development, and in particular the interfaces between prosody, body gestures, and pragmatics. Her research has shown that prosodic cues in speech align with body movements, and that language learners use multi-modal strategies to acquire language and communication.

Paula Fikkert is Professor of Linguistics specializing in child language acquisition at Radboud University in Nijmegen. She obtained her PhD from Leiden University for her award-winning dissertation On the Acquisition of Prosodic Structure (1994). She has been a (guest) researcher at various universities, among them the University of Konstanz, the University of Tromsø, and the University of Oxford. Her research concerns the acquisition of phonological representations in the lexicon and the role of these representations in perception and production. Most of her research is conducted at the Baby and Child Research Center in Nijmegen.

Janet Fletcher is Professor of Phonetics in the School of Languages and Linguistics at the University of Melbourne. Her research interests include phonetic theory, laboratory phonology, prosodic phonology, and articulatory and acoustic modelling of prosodic effects in various languages. She is currently working on phonetic variation, prosody, and intonation in Indigenous Australian languages, including Mawng, Bininj Gun-wok, and Pitjantjatjara, and has commenced projects on selected languages of Oceania.

Sónia Frota is Professor of Experimental Linguistics at the University of Lisbon. Her research seeks to understand the properties of prosodic systems (phrasing, intonation, and rhythm), the extent to which they vary across and within languages, and how they are acquired by infants and help to bootstrap the learning of language. She is the director of the Phonetics and Phonology Lab and the Baby Lab at the University of Lisbon, and is the editor in chief of the Journal of Portuguese Linguistics.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

xlii About The Contributors James Sneed German is an Associate Professor of Language Sciences at Aix-Marseille University and the Laboratoire Parole et Langage. His research focuses primarily on intonational meaning, and especially on how specific marking strategies are disrupted by conflicting constraints across different levels of representation. He has worked extensively on modelling the intonational phonology of understudied varieties, while more recent research explores how the linguistic system dynamically adapts to socio-indexical cues in situations of dialect contact. His research covers a range of language varieties including American English, Glaswegian English, Singapore English, Standard French, Corsican French, Singapore Malay, and Taiwan Mandarin. Judit Gervain is a senior research scientist at the Centre National de la Recherche Scientifique (CNRS), working in the Laboratoire Psychologie de la Perception in Paris. She received her PhD from the International School for Advanced Studies, Trieste, in 2007 and worked as a postdoctoral research fellow at the University of British Columbia in Vancouver, before taking up a CNRS researcher position in 2009. Her work focuses on speech perception and language acquisition during the prenatal and early postnatal periods. Barbara Gili Fivela is Associate Professor at the University of Salento, where she is also vicedirector of the Centro di Ricerca Interdisciplinare sul Linguaggio and director of the programme for Studies in Interlinguistic Mediation/Translation and Interpretation. Since 2019, she has been president of the Associazione Italiana Scienze della Voce, an ISCA Special Interest Group. Her main research interests are the phonology and phonetics of intonation, second language learning processes, and the kinematics of healthy and dysarthric speech. Daniel Goodhue is a postdoctoral researcher at the University of Maryland, College Park. 
He completed his PhD at McGill University in 2018 with a thesis entitled ‘On asking and answering biased polar questions’. Using traditional and experimental methods, he researches the semantics and pragmatics of various phenomena, including intonation, questions, answers, and modality. Matthew K. Gordon is Professor of Linguistics at the University of California, Santa Barbara. His research interests include prosody, typology, phonological theory, and the phonetic and phonological description of endangered languages. He is the author of Syllable Weight: Phonetics, Phonology, Typology (Routledge, 2006) and Phonological Typology (Oxford University Press, 2016). Agustín Gravano is Professor in the Computer Science Department at the University of Buenos Aires, and Researcher at CONICET (Argentina’s National Research Council). His main research topic is the coordination between participants in conversations, both at a temporal level and along other dimensions of speech. The ultimate goal is to incorporate this knowledge into spoken dialogue systems to improve their naturalness. Martine Grice is Professor of Phonetics at the University of Cologne. She has served as president of the Association for Laboratory Phonology and edits the series Studies in Laboratory Phonology. Her work on intonation theory includes the analysis of complex
tonal structures and the consequences of tune–text negotiation. She has worked extensively on the intonation of Italian, English, and German and has addressed specific challenges in the analysis of Vietnamese, Tashlhiyt Berber, and Maltese. More recently she has been working on the use of intonation in attention orienting and the effect of perspective-taking abilities (e.g. in autism) on prosodic encoding and decoding. Güliz Güneş is a postdoctoral researcher at the Leiden University Centre for Linguistics and a scientific researcher at Eberhard Karls University of Tübingen. She participates in a project funded by the Dutch Research Council on the prosody of ellipsis and deaccentuation. Her past research focused on the prosodic constituency of Turkish, the interactions between information structure and prosody, and syntax–prosody correspondence in Turkish at the morpheme, word, phrase, and sentence levels. She has also worked on how morphology mediates prosodic constituency formation in Turkish. More generally, her research seeks to understand what prosodic traits of spoken speech can tell us about the interactions between syntax, morphology, and discourse structure. Carlos Gussenhoven obtained his PhD from Radboud University in 1984. He is currently Professor of Phonetics and Phonology at National Chiao Tung University, funded by the Ministry of Science and Technology, and Professor Emeritus at Radboud University, where he held a personal chair from 1996 to 2011. He was a visiting scholar at Edinburgh University (1981–1982), Stanford University (1985–1986), and the University of California, Berkeley (1995, Fulbright), and has held positions at the University of California, Berkeley (1991) and Queen Mary University of London (2004–2011), as well as guest professorships at the University of Konstanz and Nanjing University. 
Besides publishing his research on prosodic theory and the prosodic systems of a variety of languages in journals, edited books, and conference proceedings, he co-authored Understanding Phonology (4th edition, Routledge, 2017) and published The Phonology of Tone and Intonation (Cambridge University Press, 2004). Mark Hasegawa-Johnson is Professor of Electrical and Computer Engineering at the University of Illinois (Urbana-Champaign) and a Fellow of the Acoustical Society of America. He is currently treasurer of the International Speech Communication Association, Secretary of SProSIG, and a member of the Speech and Language Technical Committee of the Institute of Electrical and Electronics Engineers; he was general chair of Speech Prosody 2010. Mark received his PhD under Ken Stevens at MIT in 1996, was a postdoctoral researcher with Abeer Alwan at the University of California, Los Angeles, and has been a visiting professor with Jeff Bilmes in Seattle and with Tatsuya Kawahara in Kyoto. He is the author or co-author of over 50 journal articles, book chapters, and patents, and of over 200 conference papers and published abstracts. His primary research areas are in the application of phonological concepts to audio and audiovisual speech recognition and synthesis (Landmark-Based and Prosody-Dependent Speech Recognition), in the application of semi-supervised and interactive machine learning methods to multimedia browsing and search (Multimedia Analytics), and in the use of probabilistic transcription to develop massively multilingual speech technology localized to under-resourced dialects and languages.
Sam Hellmuth is Senior Lecturer in Linguistics in the Department of Language and Linguistic Science at the University of York. Sam was director and principal investigator of the UK Economic and Social Research Council-funded project Intonational Variation in Arabic. Her research seeks to understand the scope of variation observed in the intonational systems of spoken Arabic dialects, and the interaction of intonation in these languages with segmental and metrical phonology, syntax, semantics, and information structure. Sam also works on second language acquisition of prosody, and the prosodic properties of regional dialects of British Englishes and World Englishes. Nikolaus P. Himmelmann is Professor of General Linguistics at the Universität zu Köln. His specializations include typology and grammaticalization as well as language documentation and description. He has worked extensively on western Austronesian as well as Papuan languages. Julia Hirschberg is Percy K. and Vida L. W. Hudson Professor of Computer Science at Columbia University. Her research focuses on prosody and discourse, including studies of speaker state (emotional, trustworthy, and deceptive speech), text-to-speech synthesis, detection of hedge words and phrases, spoken dialogue systems and entrainment in human–human and human–machine conversation, and linguistic code-switching (language mixing by bilinguals). Her previous research includes studies of charismatic speech, turn-taking behaviours, automatic detection of corrections and automatic speech recognition errors, cue phrases, and conversational implicature. She previously worked at Bell Labs and AT&T Labs in speech and human–computer interface research. Larry M. Hyman has since 1988 been Professor of Linguistics at the University of California, Berkeley, in the Department of Linguistics, which he chaired from 1991 to 2002. 
He has worked extensively on phonological theory and other aspects of language structure, including publishing several books—such as Phonology: Theory and Analysis (Holt, Rinehart & Winston, 1975) and A Theory of Phonological Weight (Foris, 1985)—and numerous theoretical articles in such journals as Language, Linguistic Inquiry, Natural Language and Linguistic Theory, Phonology, Studies in African Linguistics, and the Journal of African Languages and Linguistics. His current interests centre around phonological typology, tone systems, and the descriptive, comparative, and historical study of Niger-Congo languages, especially Bantu. He is also currently executive director of the France-Berkeley Fund. Allard Jongman is a Professor in the Linguistics Department at the University of Kansas. His research addresses the nature of phonetic representations and involves the study of the acoustic properties of speech sounds, the relation between phonetic structure and phonological representations, and the interpretation of the speech signal in perception across a wide range of languages. In addition to many journal articles, he is the co-author (with Henning Reetz) of Phonetics: Transcription, Production, Acoustics, and Perception (Wiley, 2020). Sun-Ah Jun obtained her PhD from the Ohio State University in 1993. She is Professor of Linguistics at the University of California, Los Angeles. She also taught at the Linguistic Society of America Summer Institute in 2001 and 2015, and the Netherlands Graduate School of Linguistics Summer School in 2013. Her research focuses on intonational phonology,
prosodic typology, the interface between prosody and syntax/semantics/sentence processing, and language acquisition. Her publications include The Phonetics and Phonology of Korean Prosody: Intonational Phonology and Prosodic Structure (Garland, 1996; Routledge, 2018) and the two edited volumes Prosodic Typology: The Phonology of Intonation and Phrasing (Oxford University Press, 2005, 2014). Anastasia Karlsson is an affiliate Associate Professor of Phonetics at Lund University. Her research area is phonetics with the main focus on prosodic typology. She has contributed empirical studies on a number of typologically different languages, such as Kammu; the Formosan languages Bunun, Puyuma, and Seediq; and Halh Mongolian. Maciej Karpiński is a Professor in the Faculty of Modern Languages and Literatures of Adam Mickiewicz University, Poznań. He holds a PhD in general linguistics and a postdoctoral degree in applied linguistics. Focusing on pragma-phonetic and paralinguistic aspects of communication, he has recently investigated the contribution of prosody and gestures to the process of communicative accommodation and the influence of social factors, emotional states, and physical conditions on speech prosody. He has developed a number of linguistic corpora and contributed to projects on language documentation. Daniel Kaufman is an Assistant Professor at Queens College and the Graduate Center of the City University of New York and co-director of the Endangered Language Alliance, a non-profit organization dedicated to the documentation and support of endangered languages spoken by immigrant communities in the New York area. He obtained his PhD at Cornell University in 2010. He specializes in the Austronesian languages of the Philippines and Indonesia with a strong interest in both the synchronic analysis of their phonology, morphology, and syntax and their typology and diachrony. Holly J. 
Kennard completed her DPhil in linguistics at the University of Oxford. She subsequently held a British Academy Postdoctoral Fellowship, examining Breton phonology and morphophonology, and is now a Departmental Lecturer in Phonology at the University of Oxford. Her research focuses on phonology and morphonology in Breton and a variety of other languages. Paul Kiparsky, a native of Finland, is Professor of Linguistics at Stanford University. He works mainly on grammatical theory, language change, and verbal art. He collaborated with S. D. Joshi to develop a new understanding of the principles behind Pāṇini’s grammar. His book Pāṇini as a Variationist (MIT Press, 1979) uncovered an important dimension of the grammar that was not known even to the earliest commentators. James Kirby received his PhD in linguistics from the University of Chicago in 2010. Since that time he has been a Lecturer (now Reader) in the Department of Linguistics and English Language at the University of Edinburgh. His research considers the phonetic and phonological underpinnings of sound change, with particular attention to the emergence of lexical tone and register systems. Emiel Krahmer is Professor of Language, Cognition and Computation at the Tilburg School of Humanities and Digital Sciences. He received his PhD in computational linguistics in 1995, after which he worked as a postdoctoral researcher in the Institute
for Perception Research at the Eindhoven University of Technology before moving to Tilburg University. His research focuses on how people communicate with each other, both verbally and non-verbally, with the aim of improving the way computers communicate with human users. Haruo Kubozono completed his PhD at the University of Edinburgh in 1988. He taught phonetics and phonology at Nanzan University, Osaka University of Foreign Studies, and Kobe University (at the last of which he was a professor) before he moved to the National Institute for Japanese Language and Linguistics as a professor and director in 2010. His research interests range from speech disfluencies to speech prosody (accent and intonation) and its interfaces with syntax and information structure. He recently edited The Handbook of Japanese Phonetics and Phonology (De Gruyter Mouton, 2015), The Phonetics and Phonology of Geminate Consonants (Oxford University Press, 2017), and Tonal Change and Neutralization (De Gruyter Mouton, 2018). Frank Kügler is Professor of Linguistics (Phonology) at Goethe University Frankfurt. His PhD thesis (Potsdam University, 2005) compared the intonation of two German dialects. He received his postdoctoral degree (Habilitation) from Potsdam University studying the prosodic expression of focus in typologically unrelated languages and obtained a Heisenberg Fellowship to do research at the Phonetics Institute of Cologne University. His research interests are in cross-linguistic and typological prosody, tone, intonation, recursivity of prosodic constituents, and the syntax–prosody interface. He has worked on the prosody of a number of typologically diverse languages. D. Robert (Bob) Ladd is Emeritus Professor of Linguistics at the University of Edinburgh. Much of his research has dealt with intonation and prosody in one way or another; he was also a leading figure in the establishment of ‘laboratory phonology’ during the 1980s and 1990s. 
He received a BA in linguistics from Brown University in 1968 and a PhD from Cornell University in 1978. He undertook various research and teaching appointments from 1978 to 1985, and has been at the University of Edinburgh since 1985. He was appointed Professor of Linguistics in 1997 and became an Emeritus Professor in 2011. He became a Fellow of the British Academy in 2015 and a Member of Academia Europaea in 2016. Aditi Lahiri holds PhDs from the University of Calcutta and Brown University. She has held a research appointment at the Max Planck Institute for Psycholinguistics and various teaching appointments at the University of California, Los Angeles and Santa Cruz. After spending 15 years as holder of the Lehrstuhl Allgemeine Sprachwissenschaft (Chair of General Linguistics) at the University of Konstanz, she is currently the Statutory Professor of Linguistics at the University of Oxford. She specializes in phonology from various perspectives (synchronic, diachronic, and experimental) and has largely focused on Germanic languages and on Bengali. Joseph C. Y. Lau is a postdoctoral scholar in the Department of Psychology and Roxelyn and Richard Pepper Department of Communication Sciences and Disorders at Northwestern University. He received his PhD in linguistics at the Chinese University of Hong Kong. His work focuses on using neurophysiological and behavioural methods in consonance with
machine learning techniques to understand long-term and online neuroplasticity in speech processing in neurotypical and clinical populations from infancy to adulthood. William R. Leben is Professor Emeritus of Linguistics at Stanford University, his main affiliation since 1972. He has done fieldwork in Niger, Nigeria, Ghana, and Côte d’Ivoire on tone and intonation in Chadic and Kwa languages. He has co-authored textbooks on Hausa, the structure of English vocabulary, and the languages of the world. John M. Levis has taught and researched pronunciation for many years. He is founding editor of the Journal of Second Language Pronunciation, the founder of the Pronunciation in Second Language Learning and Teaching Conference, and the co-developer of pronunciationforteachers.com. He is co-editor of several books, including the Phonetics and Phonology section of the Encyclopedia of Applied Linguistics (Wiley, 2012), Social Dynamics in Second Language Accent (de Gruyter, 2014), the Handbook of English Pronunciation (Wiley, 2015), Critical Concepts in Linguistics: Pronunciation (Routledge, 2017), and Intelligibility, Oral Communication, and the Teaching of Pronunciation (Cambridge University Press, 2018). Rivka Levitan received her PhD from Columbia University in 2014. She is now an Assistant Professor in the Department of Computer and Information Science at Brooklyn College, and in the Computer Science and Linguistics Programs at the City University of New York Graduate Center. Her research focuses on the detection of paralinguistic and affective information from speech and language, with special interest in the information carried by dialogue and prosody. Diane Lillo-Martin is a Board of Trustees Distinguished Professor of Linguistics at the University of Connecticut and a Senior Research Scientist at Haskins Laboratories. 
She is a Fellow of the Linguistic Society of America and currently serves as chair of the international Sign Language Linguistics Society. She received her PhD in linguistics from the University of California, San Diego. Her research areas include the morphosyntax of American Sign Language and the acquisition of both signed and spoken languages, including bimodal bilingualism. Florian Lionnet is Assistant Professor of Linguistics at Princeton University. He obtained his PhD at the University of California, Berkeley, in 2016. His research focuses on phonology, typology, areal and historical linguistics, and language documentation and description, with a specific focus on African languages. He is currently involved in research on understudied and endangered languages in southern Chad. He has published on a range of topics, including the phonetics–phonology interface, tonal morphosyntax, the areal distribution of phonological features in northern sub-Saharan Africa, and the typology and grammaticalization of verbal demonstratives. Liquan Liu is a Lecturer in the School of Psychology at Western Sydney University. He received his PhD from Utrecht University in 2014, and currently holds a Marie Skłodowska-Curie fellowship at the Center for Multilingualism in Society across the Lifespan, University of Oslo. He uses behavioural and electrophysiological techniques to measure infant and early childhood development, focusing on speech perception and bilingualism from an interdisciplinary perspective.
Katalin Mády is a Senior Researcher at the Hungarian Research Institute for Linguistics, Budapest. After finishing her PhD on clinical phonetics at the Institute of Phonetics and Speech Processing (IPS), LMU Munich in 2004, she became Assistant Professor in German linguistics at the Pázmány Péter Catholic University. She returned to IPS LMU as a postdoctoral researcher in 2006 to do research in sociophonetics, laboratory phonology, and prosody. Her current work mainly focuses on prosodic typology, with a special interest in Uralic and other understudied languages in Central and Eastern Europe. James Mahshie is Professor in the Department of Speech, Language and Hearing Sciences at George Washington University. His research examines the perception and production of speech features by deaf children with cochlear implants. He has published numerous articles and book chapters on speech production and deafness, and co-authored a text on facilitating communication enhancement in deaf and hard-of-hearing children. Zofia Malisz is a researcher in speech technology at the Royal Institute of Technology in Stockholm. Her work focuses on modelling speech rhythm, timing, and prominence as well as improving prominence control in text-to-speech synthesis. Reiko Mazuka is Team Leader of the Laboratory for Language Development at the RIKEN Center for Brain Science. Her PhD dissertation was on developmental psychology (Cornell University, 1990). Before opening her lab at RIKEN in 2004, she worked in the Psychology Department at Duke University. She is interested in examining the role of language-specific phonological systems in phonological development. John J. McCarthy is Provost and Distinguished Professor at the University of Massachusetts Amherst. He is a fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, and the Linguistic Society of America. 
His books include Hidden Generalizations: Phonological Opacity in Optimality Theory (Equinox, 2007) and Doing Optimality Theory (Blackwell, 2008). With Joe Pater he edited Harmonic Grammar and Harmonic Serialism (Equinox, 2016). James M. McQueen is Professor of Speech and Learning at Radboud University. He studied experimental psychology at the University of Oxford and obtained his PhD from the University of Cambridge. He is a principal investigator at the Donders Institute for Brain, Cognition and Behaviour (Centre for Cognition) and is an affiliated researcher at the Max Planck Institute for Psycholinguistics. His research focuses on learning and processing in spoken language: How do listeners learn the sounds and words of their native and non-native languages, and how do they recognize them? His research on speech learning concerns initial acquisition processes and ongoing processes of perceptual adaptation. His research on speech processing addresses core computational problems (such as the variability and segmentation problems). He has a multi-disciplinary perspective on psycholinguistics, combining insights from cognitive psychology, phonetics, linguistics, and neuroscience. Alexis Michaud received his PhD in phonetics from Sorbonne Nouvelle in 2005. He joined the Langues et Civilisations à Tradition Orale (LACITO) research centre at the Centre National de la Recherche Scientifique as a research scientist in 2006. His interests include
tone and phonation, prosody, language documentation and description, and historical linguistics. His work focuses on languages of the Naish subgroup of Sino-Tibetan (Na and Naxi) and of the Vietic subgroup of Austroasiatic. Hansjörg Mixdorff is a professor at Beuth University of Applied Sciences Berlin. His main research interest is the production and perception of prosody in a cross-language perspective. He employs the Fujisaki model as a tool to synthesize fundamental frequency as well as to measure, for instance, differences between native and second language speakers’ prosody. More recently in studies on prominence and attitude, he has worked on the interface between non-verbal facial gestures and prosodic features. Osahito Miyaoka took his MA degree and a PhD in linguistics at Kyoto University. For about 40 years, he carried out a great deal of fieldwork in Southwest Alaska (the Yupik area, including Bethel), in addition to his fieldwork on Yámana as spoken in Ukika on Navarino Island (Tierra del Fuego). In 2012, he published his monumental A Grammar of Central Alaskan Yupik (CAY) (Mouton de Gruyter). Before his retirement from Kyoto University in 2007, he taught at the Graduate School of Letters at Kyoto University. He has also taught at the University of Alaska (Fairbanks and Anchorage) and Hokkaido University. Bernd Möbius is Professor of Phonetics and Phonology at Saarland University and was editor in chief of Speech Communication (2013–2018). He was a board member of the International Speech Communication Association (ISCA) from 2007 to 2015, a founding member and chair (2002–2005) of ISCA’s special interest group on speech synthesis, and has served on ISCA’s Advisory and Technical committees. A central theme of his research concerns the integration of phonetic knowledge in speech technology. 
Recent work has focused on experimental methods and computational simulations to study aspects of speech production, perception, and acquisition. Doris Mücke is a Senior Researcher at the IfL Phonetics Lab at the University of Cologne, where she works with various methods of acoustic and articulatory analysis. She finished a PhD on vowel synthesis and perception in 2003 and in 2014 obtained her Habilitation degree for her research on dynamic modelling of articulation and prosodic structure. Her main research interest is the integration of phonetics and phonology with a special focus on the interplay of articulation and prosody. She investigates the coordination of tones and segments, prosodic strengthening, and kinematics and acoustics of syllable structure in various languages in typical and atypical speech. Currently, she is working on the relationship between brain modulation and speech motor control within a dynamical approach in healthy speakers as well as with patients with essential tremor and Parkinson’s disease. Ronice Müller de Quadros has been a professor and researcher at the Federal University of Santa Catarina since 2002 and a researcher on sign languages at Conselho Nacional de Desenvolvimento Científico e Tecnológico since 2006. She holds an MA (1995) and a PhD (1999) in linguistics, both from Pontifícia Universidade Católica do Rio Grande do Sul. Her PhD project included an 18-month internship at the University of Connecticut (1997–1998). Her main research interests are the grammar of Brazilian Sign Language (Libras), bimodal
bilingual languages (Libras and Portuguese, and American Sign Language and English), sign language acquisition, and the Libras Corpus. Ailbhe Ní Chasaide is Professor of Phonetics and Director of the Phonetics and Speech Laboratory at the School of Linguistic, Speech and Communication Sciences, Trinity College Dublin. She has directed over 20 funded research projects and published widely on a range of topics including the voice quality dimension of prosody, how voice quality and pitch interact in signalling both linguistic and affective information, and the prosodic and segmental structure of Irish dialects. She is the lead principal investigator on the ABAIR project, which is developing phonetic-linguistic resources and technology for the Irish language. Oliver Niebuhr earned his doctorate (with distinction) in phonetics and digital speech processing from Kiel University and subsequently worked as a postdoctoral researcher at linguistic and psychological institutes in Aix-en-Provence and York as part of the interdisciplinary European Marie Curie Research Training Network ‘Sound to Sense’. In 2009, he was appointed Junior Professor of Spoken Language Analysis and returned to Kiel University. He is now Associate Professor of Communication and Innovation at the University of Southern Denmark. In 2017, he was appointed head of the CIE Acoustics Lab and founded the speech-technology startup AllGoodSpeakers ApS. Mitsuhiko Ota is Professor of Language Development at the University of Edinburgh. His research addresses phonological development in both first and second languages, with a focus on the role of linguistic input and the interface between phonology and the lexicon. He has worked on children’s acquisition of prosodic systems, such as those of Japanese and Swedish. He is an associate editor of Language Acquisition. 
Rhea Paul is Professor and Chair of Speech-Language Pathology at Sacred Heart University and the author of over 100 refereed journal articles, over 50 book chapters, and nine books. She holds a PhD and a Certificate of Clinical Competence in speech-language pathology. She received the Ritvo-Slifka Award for Innovative Clinical Research from the International Society for Autism Research in 2010 and the Honors of the Association for lifetime achievement from the American Speech-Language-Hearing Association in 2014. Mary Pearce obtained her PhD in linguistics at University College London in 2007 on the basis of her research on the phonology and phonetics of Chadic languages, with a particular interest in vowel harmony and tone. She lived for a number of years in Chad, although she is now based in the UK. Her publications include The Interaction of Tone with Voicing and Foot Structure: Evidence from Kera Phonetics and Phonology (Center for the Study of Language and Information, Stanford, 2013) and ‘The Interaction between Metrical Structure and Tone in Kera’ (Phonology, 2006). She is currently the International Linguistics Coordinator for SIL International. Jörg Peters is Professor of Linguistics at Carl von Ossietzky University Oldenburg, where he teaches phonology, phonetics, sociolinguistics, and pragmatics. His research interests
are in segmental and prosodic variation of German, Low German, Dutch, and Saterland Frisian. His publications include Intonation deutscher Regionalsprachen (de Gruyter, 2006) and Intonation (Winter, 2014). Brechtje Post is Professor of Phonetics and Phonology at the University of Cambridge. Her research focuses on speech prosody, which she explores from a number of different angles (phonology, phonetics, acquisition, and cognitive and neural speech processing). Pilar Prieto is a Research Professor funded by the Catalan Institution for Research and Advanced Studies in the Department of Translation and Language Sciences at Universitat Pompeu Fabra. Her main research interests are how prosody and gesture work in language acquisition and how they interact with other types of linguistic knowledge (pragmatics and syntax). She has published numerous research articles that address these questions and co-edited a book titled Prosodic Development (John Benjamins, 2018). Hamed Rahmani obtained his PhD from Radboud University in 2019. His research focuses on the word and sentence prosody of Persian and its interaction with morphosyntax and semantics. His other research interests include the relations between mathematics, music, and linguistic structures. Melissa A. Redford is Professor of Linguistics at the University of Oregon. She received her PhD in psychology and postdoctoral training in computer science from the University of Texas at Austin. Her research investigates how language and non-language systems interact over developmental time to structure the speech plan and inform speech production processes in real time. Henning Reetz holds an MSc in computer science and received his PhD from the University of Amsterdam in 1996. He worked at the Max Planck Institute for Psycholinguistics in Nijmegen and taught phonetics at the University of Konstanz. Currently, he is Professor of Phonetics and Phonology at the University of Frankfurt. 
His main research is on human and machine speech recognition with a focus on the mental representation of speech. Part of this work includes the processing of audio signals in the early neural pathway, where pitch perception plays a major role. Tomas Riad has been Professor of Nordic languages at Stockholm University since 2005. His main research interests concern prosody in various ways: North Germanic pitch accent typology, the origin of lexical pitch accents, the relationship between grammar and verse metrics, and the relationship between morphology and prosody. He is the author of The Phonology of Swedish (Oxford University Press, 2014). He has been a member of the Swedish Academy since 2011. Nicholas Rolle is a postdoctoral researcher at Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS, Berlin). He received his PhD from the University of California, Berkeley in 2018, and was previously a Postdoctoral Research Associate at Princeton University. His specialization is phonology at its interface with morphology and syntax, including the grammatical use of tone, prosodic subcategorization, paradigm uniformity

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

effects, and allomorphy. His empirical focus is on African languages, involving fieldwork on the Edoid and Ijoid families of Nigeria. Andrew Rosenberg has been a Research Staff Member at IBM Research since 2016. He received his PhD from Columbia University in 2009. He then taught and researched at Queens College, City University of New York (CUNY), until he joined IBM. From 2013 to 2016, he directed the CUNY Graduate Center Computational Linguistics Program. He has written over 70 journal and conference papers, primarily on automated analyses of prosody and the use of these on downstream spoken-language-processing tasks. He is the author and maintainer of AuToBI, an open-source tool for the automatic assignment of ToBI labels from speech. He is a National Science Foundation CAREER award winner. Hannah Sande is an Assistant Professor of Linguistics at Georgetown University. She obtained her PhD at the University of California, Berkeley, in 2017. She carries out both documentary and theoretical linguistic research. Her theoretical work investigates the interaction of phonology with morphology and syntax, with original data primarily from African languages. She has spent many summers in West Africa working with speakers of Guébie, an otherwise undocumented Kru language spoken in Côte d’Ivoire. She also works locally with speakers of Amharic (Ethio-Semitic), Dafing (Mande), Nobiin (Nilotic), and Nouchi (contact language, Côte d’Ivoire). Her dissertation work focused on phonological processes and their interaction with morphosyntax, based on data from Guébie, where much of the morphology is non-affixal and rather involves root-internal changes such as tone shift or vowel alternations. She continues to investigate morphologically specific phonological alternations across African and other languages. Wendy Sandler is Professor of Linguistics at the University of Haifa and Founding Director of the Sign Language Research Lab there. 
She has developed models of sign language phonology and prosody that exploit general linguistic principles to reveal both the similarities and the differences in natural languages in two modalities. More recently, her work has turned to the emergence of new sign languages and ways in which the body is recruited to manifest increasingly complex linguistic forms within a community of signers. Sandler has authored or co-authored three books on sign language: Phonological Representation of the Sign (Foris, 1989); A Language in Space: The Story of Israeli Sign Language, co-authored with Irit Meir (Hebrew version: University of Haifa Press, 2004; English version: Lawrence Erlbaum Associates/Taylor & Francis, 2008, 2017); and Sign Language and Linguistic Universals, co-authored with Diane Lillo-Martin (Cambridge University Press, 2006). She is currently conducting a multi-disciplinary research project, The Grammar of the Body, supported by the European Research Council. Stefanie Shattuck-Hufnagel is a Principal Research Scientist in the Speech Communication Group at MIT. She received her PhD in psycholinguistics from MIT in 1974, taught in the Department of Psychology at Cornell University, and returned to MIT in 1979. Her research is focused on the cognitive processes and representations that underlie speech production planning, using behaviour such as speech errors, context-governed systematic variation in surface phonetic form, prosody, and co-speech gesture to test hypotheses about the planning process and to derive constraints on models of that process. Additional interests include developmental and clinical aspects of speech production, and the role of individual acoustic cues to phonological features in the processing of speech perception. She is a proud founding member of the Zelma Long Society. Elizabeth Schoen Simmons is an Assistant Professor at Sacred Heart University. She received her PhD in Cognitive Psychology from the University of Connecticut. Her research focuses on language development in both typical and clinical populations. Melanie Soderstrom is Associate Professor of Psychology at the University of Manitoba. She received her PhD in psychological and brain sciences from Johns Hopkins University in 2002, and has been a National Institutes of Health-funded postdoctoral researcher in Brown University’s Department of Cognitive and Linguistic Sciences. Her early work examined infants’ responses to the grammatical and prosodic characteristics of speech. More recently, she has focused on the characteristics of child-directed speech in the home environment. She is currently active in the ManyBabies large-scale collaborative research initiative, and in a smaller collaborative project, Analyzing Child Language Environments Around the World (ACLEW). Marc Swerts is a Professor in the School of Humanities and Digital Sciences at Tilburg University and currently also acts as the vice-dean of research in that same school. His scientific focus is on trying to get a better understanding of how speakers exploit non-verbal features to exchange information with their addressees, with a specific interest in the interplay between prosodic characteristics, facial expressions, and gestures to signal socially and linguistically relevant information. He has served on the editorial boards of three major journals in the field of language and speech research, and has served as editor in chief of Speech Communication. 
He was elected one of the two distinguished lecturers (for the years 2007–2008) of the International Speech Communication Association (ISCA) to promote speech research in various parts of the world, and was awarded an ISCA fellowship in 2015. Annie Tremblay is a Professor of Linguistics at the University of Kansas. She completed her PhD in second language acquisition at the University of Hawaiʻi in 2007. She uses psycholinguistic techniques such as eye tracking, cross-modal priming, and artificial language segmentation to investigate speech processing and speech segmentation in non-native listeners, with a focus on the use of suprasegmental information in spoken-word recognition. Jürgen Trouvain is a Researcher and Lecturer at the Department of Language Science and Technology at Saarland University. The focus of his PhD was tempo variation in speech production. His research interests include non-verbal vocalizations, non-native speech, stylistic variation of speech, and historical aspects of speech communication research. He has been a co-editor of publications on non-native prosody and phonetic learner corpora. Alice Turk is Professor of Linguistic Phonetics at the University of Edinburgh, where she has been since 1995. Her work focuses on systematic timing patterns in speech as evidence for the structures and processes involved in speech production. Specific interests include prosodic structure and speech motor control.

Harry van der Hulst specializes in phonology, understood as encompassing both the sound systems of languages and the visual aspects of sign languages. He obtained his PhD at Leiden University in 1984. He has published four books, two textbooks, and over 170 articles, and has edited over 30 books and six journal theme issues in the above-mentioned areas. He has been editor in chief of The Linguistic Review since 1990 and is co-editor of the series Studies in Generative Grammar (Mouton de Gruyter). He has been Professor of Linguistics at the University of Connecticut since 2000. Vincent J. van Heuven is Emeritus Professor of Experimental Linguistics and Phonetics at Leiden University and the University of Pannonia. He has honorary professorships at Nankai University and Groningen University, and is a guest researcher at the Fryske Akademy in Leeuwarden. He served as the director of the Holland Institute of Linguistics from 1999 to 2001 and of the Leiden University Centre for Linguistics from 2001 to 2006. He is a life member of the Royal Netherlands Academy of Arts and Sciences. Diana Van Lancker Sidtis is Professor of Communicative Sciences and Disorders at New York University and Research Scientist at the Nathan Kline Institute for Psychiatric Research in Orangeburg, New York. She holds a PhD and a Certificate of Clinical Competence for Speech-Language Pathologists. Educated at the universities of Wisconsin and Chicago and Brown University, she pursued predoctoral studies at the University of California, Los Angeles, and was awarded a National Institutes of Health postdoctoral fellowship at Northwestern University. Her peer-reviewed published research examines voice, aphasia, motor speech, prosody, and formulaic language. Her scholarly book Foundations of Voice Studies, co-authored with Dr Jody Kreiman (Wiley-Blackwell, 2011), won the 2011 Prose Award for Scholarly Excellence in Linguistics from the American Publishers Association. 
Alexandra Vella completed her PhD in linguistics at the University of Edinburgh in 1995. She is Professor of Linguistics at the University of Malta, where she coordinates the Sound component of the Institute of Linguistics and Language Technology programme, teaching various courses in phonetics and phonology. Her main research focus is on prosody and intonation in Maltese and its dialects, as well as Maltese English, the English of speakers of Maltese in the rich and complex linguistic context of Malta. She leads a small team of researchers working on developing annotated corpora of spoken Maltese and its dialects as well as of Maltese English. Paul Warren is Professor of Linguistics at Victoria University of Wellington, New Zealand. He teaches and researches in psycholinguistics (having published Introducing Psycholinguistics, Cambridge University Press, 2012) and in phonetics, especially the description of New Zealand English and of intonation (addressed in his Uptalk, Cambridge University Press, 2016). He is on the editorial boards of the Journal of the International Phonetic Association, Laboratory Phonology, and Te Reo (the journal of the Linguistics Society of New Zealand). He is a founding member of the Association for Laboratory Phonology. Justin Watkins is Professor of Burmese and Linguistics at SOAS, University of London. His research focuses on Burmese and minority languages of Myanmar.

Matthijs Westera is an Assistant Professor in Humanities and AI at Leiden University. He obtained his PhD in 2017 from the University of Amsterdam with a dissertation on implicature and English intonational meaning, after which he held a postdoctoral position in computational linguistics at the Universitat Pompeu Fabra. His research is on the semantics–pragmatics interface and combines traditional methods with advances in computational linguistics and deep learning. Laurence White is a Senior Lecturer in Speech and Language Sciences at Newcastle University. He completed his PhD in linguistics at Edinburgh University in 2002 and was a postdoctoral researcher at Bristol University and the International School for Advanced Studies, Trieste. He joined Plymouth University as a lecturer in 2011 and Newcastle University in 2018. His research explores speech perception, speech production, and their relationship, with a focus on prosody and its role in the segmentation of speech by listeners. He also works on infant language development and second language acquisition. Patrick C. M. Wong holds the Stanley Ho Chair in Cognitive Neuroscience and is Professor of Linguistics and Otolaryngology and Founding Director of the Brain and Mind Institute at the Chinese University of Hong Kong. His research covers basic and translational issues concerning the neural basis and disorders of language and music. His work on language learning attempts to explain the sources of individual differences by focusing on neural and neurogenetic markers of learning in order to support methods to personalize learning. His work also explores questions concerning phonetic constancy and representation. Zilong Xie is currently a postdoctoral researcher in the Department of Hearing and Speech Sciences at the University of Maryland, College Park. He received his PhD in communication sciences and disorders at the University of Texas at Austin. 
His research focuses on understanding the sensory and cognitive factors that contribute to individual differences in speech processing in typical as well as clinical (e.g. individuals with cochlear implants) populations, using behavioural and neuroimaging (e.g. electroencephalography) methods. Seung-yun Yang is Assistant Professor in Communication, Arts, Sciences & Disorders at Brooklyn College, City University of New York. She is also a certified Speech-Language Pathologist and a member of the Brain and Behavior Laboratory at the Nathan Kline Institute for Psychiatric Research in Orangeburg, New York. Her research aims to better understand the neural bases and acquisition of nonliteral language: how people communicate nonliteral meanings in spoken language and how acquired brain damage affects these communicative functions and pragmatic skills. Alan C. L. Yu is Professor of Linguistics and Director of the Phonology Laboratory at the University of Chicago. His research primarily addresses issues of individual variation in the study of language variation and change, particularly in how it informs the origins and actuation of sound change. He is the author of A Natural History of Infixation (Oxford University Press, 2007) and a (co-)editor of the Handbook of Phonological Theory (2nd edition, Blackwell Wiley, 2011) and the Origins of Sound Change: Approaches to Phonologization (Oxford University Press, 2013).

Jie Zhang completed his PhD in linguistics at the University of California, Los Angeles, in 2001 and is currently Professor in the Linguistics Department at the University of Kansas, where he teaches courses on phonology, introductory linguistics, and the structure of Chinese. He also served as a Lecturer in Linguistics at Harvard University from 2001 to 2003. His research uses experimental methods to investigate the representation and processing of tone and tonal alternation patterns, with a special focus on the productivity of tone sandhi in Chinese dialects.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 1

Introduction

Carlos Gussenhoven and Aoju Chen

1.1 Introduction

In this chapter we first offer a motivation for starting the project that has led to this handbook (§1.2). Next, after a discussion of some definitional and terminological issues in §1.3, we lay out the structure of the handbook in §1.4. Finally, in §1.5 we discuss our reflections on this project and the outlook for this handbook.

1.2 Motivating our enterprise

Surveys of language prosody tend to focus on specific aspects, such as tone, word stress, prosodic phrasing, and intonation. Surveys that attempt to cover all of these are less common. In part, this may be due to a perceived lack of uniformity in the field. Shifting conceptions and terminologies may suggest to newcomers a lack of consensus about basic issues, and confronting the variety of approaches to the topic may well have the same effect. We believe, however, that the way today’s researchers conceptualize the word and sentence prosodic structures in the languages of the world and their place in discourse, processing, acquisition, language change, speech technology, and pathology shows more coherence than differences in terminology may suggest. Three developments in the field have been particularly helpful. The first of these is the model of phonology and phonetics presented by Janet Pierrehumbert in her 1980 dissertation and its application to the intonation of English. Its relevance extends considerably beyond the topic of intonation, mainly because her work has shaped our conceptualization of the relation between phonological representations and phonetic implementation. This descriptive framework moreover increased the crosslinguistic comparability of prosodic accounts, and preserved one of the main achievements of Gösta Bruce’s 1977 dissertation, the integration of tone and intonation in the same grammar. Quite unlike what happened after the introduction of phonology as a separate discipline in the early twentieth century, the separation of phonetic implementation from phonological representations has had a fruitful effect on the integrated study of phonetics
and phonology. This is true despite the fact that it may be hard to decide whether specific intonational features in some languages have a phonological representation or arise in the phonetic implementation. The second development is the expansion of the database. The wider typological perspective has led to a more realistic view of the linguistic diversity in prosodic systems. Hypotheses about the universality of the relation between prosodic prominence and focus, for instance, are now competing with a hypothesis taking post-focus compression, a common way of marking focus, to be an areal feature (e.g. Samek-Lodovici 2005; Xu et al. 2012). Further, the data increase will have helped to shift the objects of typology from languages (‘intonation language’, ‘tone language’) to linguistic properties (cf. Larry Hyman’s work on ‘property-driven typology’, 2006). Finally, the availability of large corpora annotated for various features has resulted in new insights into the use of prosody in everyday communication by speakers with different native languages and second languages. The third development is the emergence of new lines of research, in part due to a rapid evolution of research methodologies and registration techniques, such as eye tracking and the registration of brain activity (e.g. electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI)). Earlier, psycholinguistic research paradigms focused on the role of prosodic cues in spoken-word recognition, on how listeners capitalize on prosodic cues to resolve temporary syntactic and semantic ambiguity in ‘garden path’ cases such as When he leaves the house is dark, and on how pragmatically appropriate prosody influences speech comprehension. More recently, neurological research has shown topographic and latency patterns in brain activity induced by a variety of linguistic stimuli. 
Accent location and type of accent have been investigated as cues to language processing in eye-tracking research, and electromagnetic registrations of articulation have provided information on how speakers integrate the phonetic implementation of prosodic and ‘textual’ parts of language. Broadly, Parts I to IV deal with linguistic representations and their realization, while Parts V to VIII deal with a variety of fields in which language prosody plays an important role. In this second half of the book, preconceived theoretical notions might at times be less of a help and, accordingly, not all chapters relate to a theoretical framework.

1.3 Definitional and terminological issues

Much as we dislike pinning ourselves down to definitions of ‘language prosody’ (it is best to keep options open as long as our understanding is developing), it will be useful to provide some orientation on what we mean by that term. Definitions of scientific objects do more than circumscribe phenomena so as to shape the reader’s expectations: they also reflect the conceptualizations of the field. In §1.3.1, we briefly sketch the development in the conceptualization of language prosody, while §1.3.2 indicates the typological categories that this handbook is most directly concerned with and §1.3.3 discusses some terminological ambiguities.

1.3.1 Tradition and innovation in defining language prosody

Over the past decades, a factor of interest has been the balance between form and function. An early form-based conceptualization focused on the analysis of the speech signal into four phonetic variables: pitch, intensity, duration, and spectral pattern. The first three variables are ‘suprasegmental’ in the sense that they are ‘overlaid’ relative to features that inherently define or express phonetic segments (Lehiste 1970). Examples of inherent features are voicing, nasality, and a low second formant, or even, as Lehiste points out, duration and intensity inasmuch as these are required for a segment to be identifiable in the visually or auditorily perceived speech signal. Thus, duration is suprasegmental if it is used to create a long vowel or geminate consonant, and pitch is suprasegmental when it is a manifestation of a tonal or intonational pattern (p. 2). This division between ‘segmental’ and ‘suprasegmental’ features has shaped the conceptualization of the field up until today. One consequence is that the term ‘segmental’ typically still excludes tones, the segments of autosegmental phonology, whose realization relies largely on pitch. Its usual reference is only to vowels and consonants, segments whose realization relies largely on spectral variation. Within this approach, a functional perspective will consider how each suprasegmental variable plays its role in creating communicative effects. Of these variables, pitch is the one most easily perceived as a separate component of speech, and its acoustic correlate, fundamental frequency (f0), is readily extractable from the signal (see chapter 3). It thus invitingly exposes itself to linguistic generalizations, as in Xu’s (2005) function-based approach known as the PENTA model. 
In principle, this approach could also be applied to duration, which may reflect vowel or consonant quantity as well as effects of domain boundaries, tone-dependent lengthening, and effects of speech tempo, but it would be less suitable for intensity, which has a substantially reduced role to play in speech communication (cf. Watson et al. 2008a). In order to go beyond the decomposition of the speech signal into phonetic variables and the establishment of their role in language, it is useful to ask ourselves to what extent these variables express forms or functions. More recently, there has been a stronger emphasis on the distinction between what Lehiste (1970) referred to as ‘paralanguage’ and ‘language’. Paralanguage is ‘not to be confused with language’, but is nevertheless used ‘in systematic association with language’ (p. 3). As a result, researchers have confronted the fact that phonetic variables can express meanings, as in paralanguage, and meaningless phonological forms, as in language. The first aspect is involved in the signalling of affect and emotion, as when we raise our pitch to express fear or surprise, and sometimes of more linguistic functions such as interrogativity and emphasis, through manipulations of pitch, intensity, and duration. The second concerns the phonetic expression of phonological constituents such as segments, syllables, feet, and larger phonological constituents including the phonological word and the phonological phrase. Of these, segments have a different status in that the features they are composed of provide the phonological content of the phonological representation, while the other constituents provide a hierarchical prosodic phrasing structure, similar though not identical to the structure built by morphosyntactic constituents.

1.3.2 Some typological categories

To return to the issue of what this handbook should be about, we consider the core prosodic elements to be tone, stress, prosodic constituents, and intonation. First, as said above, tones are segments, following their analysis as autonomous segments in autosegmental phonology (Goldsmith 1976a). They have a variety of functions, as discussed in detail in chapters 4 and 6. Second, stress, which is not a segment, is a manifestation of a headed prosodic constituent, the foot. The intrinsic link between feet and word stress accounts for its ‘obligatoriness’ (Hyman 2006), on the assumption that while specific tones, vowels, and consonants may or may not appear in any one word of a language, prosodic parsing is obligatory (Selkirk 1984; Nespor and Vogel 1986; cf. the ‘Strict Layer Hypothesis’). For many languages, contrastive secondary stress will additionally require a headed phonological word, as in English, which differentiates between the stress patterns of ˈcatamaˌran and ˌhullabaˈloo, for example. Further increases in prominence can be created by the presence of intonational pitch accents on some stressed syllables. Many researchers assume that stress exists above the level of the word as a gradient property correlating with the rank of phonological constituents. In this view, increasing levels of stress, or of prominence, function as the heads of prosodic phrases and guide the assignment of pitch accents to word-stressed syllables (for recent positions, see e.g. Cole et al. 2019; Kratzer and Selkirk 2020; and references in both of these). Other analyses describe pitch accent assignment independently of any stress levels beyond the word, following Bolinger’s (1958: 111) conclusion that ‘pitch and stress are phonemically independent’ (cf. Lieberman 1965; Gussenhoven 2011). Chapter 5 explicitly keeps its options open and reports cases of phrasal stress for languages that may not have word stress. 
Also, chapter 10 makes the point that the phonetic properties of pitch-accented syllables in West Germanic languages are typically of the same kind as those that differentiate stressed from unstressed syllables at the word level, and appear to enhance the word stress (e.g. Beckman and Cohen 2000). The third element, prosodic constituency, was already implicated in the above comments on stress. Prosodic constituents form an increasingly encompassing, hierarchical set. Not all of these are recognized by all researchers, but a full set ranges from morae (μ) to utterances (υ), with syllables (σ), feet (Ft), phonological words (ω), clitic groups (CG), accentual phrases (α, aka AP), phonological phrases (φ, aka PP), intermediate phrases (ip), and intonational phrases (ι, aka IP) in between. Timing variation conditioned by certain constituents in this prosodic hierarchy may, in part, be responsible for perceptions of language-specific rhythmicity (see chapter 11). Languages appear to skip some of these prosodic ranks as a matter of course, meaning that not all of these constituents are referred to by grammatical generalizations in all languages. Truth be told, it is more difficult to show that a constituent has some role to play than to show it has no reality at all; no convincing empirical case has been made for the absence of syllables in any language, for instance (cf. Hyman 2015). Languages with contrastive syllabification within the morphological word are usually claimed to have ω’s, where these constitute syllabification domains. For example, English ˈX-Parts and ˈexperts respectively have an aspirated and an unaspirated [p]. The first item can on that basis be argued to have two ω’s, (X)ω (Parts)ω, in contrast to a single ω for the second item (cf. Nespor and Vogel 1986: 137). 
Further up the hierarchy, prosodic constituents often define the tonal structure, like α’s rejecting more than one pitch accent in their domain, or any constituent defining the distribution of tones, as in the case of the α-initial boundary tones
of Northern Bizkaian Basque or the ι-final ones in Bengali, Irish, Lusoga, Mawng, or Turkish. Because a grammar need not make use of every constituent, researchers may find themselves free to pick a rank for the phonologically active constituent. For instance, the φ maximally has one pitch accent in many Indo-European languages spoken in India, but, since it would appear to be the only constituent between the ω and the ι, either α or ip might in principle have been used instead of φ. Besides such indeterminacy, rival accounts may exist for the same language. For instance, West Germanic languages have been described with as well as without ip’s, and only continued research can decide which description is to be preferred. The fourth element, ‘intonation’, is a formal as well as a functional concept. Ladd’s (2008b: 4) definition identifies three properties:

The use of suprasegmental phonetic features to convey ‘postlexical’ or sentence-level pragmatic meanings in a linguistically structured way. (italics original)

The ‘form’ aspects here are the restriction to linguistically structured ways and the restriction to suprasegmental features. The first restriction says that the topic concerns phonologically encoded morphemes. The second restriction excludes morphemes that are exclusively encoded with the help of spectral features, i.e. with the help of vowels and consonants, such as question particles, focus-related morphemes, and modal adverbs with functions similar to those typically encoded by intonational melodies. As for the functional aspect, ‘sentence-level meanings’ are those that do not arise from the lexicon, thus excluding any form of lexically or morphologically encoded meaning. These emphatically include the intonational pitch accents of a language like English (even though Liberman 1975, quite appropriately, referred to them as an ‘intonational lexicon’ to bring out the fact that they form a collection of post-lexical tonal morphemes with discoursal meanings). The additional reference to ‘pragmatic’ meaning allows vowel quantity to be part of ‘intonation’. Final vowel lengthening in Shekgalagari signals listing intonation, while absence of penultimate vowel lengthening marks hortative and imperative sentences (Hyman and Monaka 2011). At the same time, the inclusion of ‘pragmatic’ may pre-judge the issue of the kinds of meaning that post-lexical tone structures can express. Because prosodic phrasing may affect pitch accent location, and because prosodic phrasing reflects syntactic phrasing, syntactic effects may arise from the way utterances are prosodically phrased (Nespor and Vogel 1986: 301). 
For instance, to borrow an example from chapter 19, the accent locations in CHInese GARden indicate that this is a noun phrase, but the accentuation of MAINland ChiNESE GARden Festival signals the phrasal status of Mainland Chinese, thus giving an interpretation whereby the festival features international garden design, as opposed to exclusively focusing on garden design in China at an exhibition on some mainland. More directly syntactic uses of tone are mentioned for Yanbian Korean (chapter 24) and Persian (chapter 14).

1.3.3 Some terminological ambiguities

Partly because of shifts in conceptions, reports of prosodic research can be treacherous in their use of technical terms, notably ‘pitch accent’, ‘phrase accent’, and ‘compositionality’. To tackle the first of these, Pierrehumbert (1980) distinguished ‘boundary tones’, which align
with the boundaries of prosodic domains, from ‘central tones’, which are located inside those domains. Boundary tones are typically non-lexical, while central tones have either a lexical or an intonational function. Central tones are frequently referred to as ‘accents’ or ‘pitch accents’, but those terms have been used with other meanings too. Quite apart from its evidently different meaning of a type of pronunciation characteristic of a regional or social language variety, there are various prosodic meanings of ‘accent’, ranging from that of a label indicating the location of some tone, as in Goldsmith (1976a), to a phonetic or phonological feature that is uniquely found in a specific syllable of the word, such as duration, bimoraicity, or tone, as in van der Hulst (2014a). There would appear to be at least four meanings of the term ‘pitch accent’. One is the occurrence of a maximum of a single instance of distinctive tone per domain, as in Barasana (chapters 4 and 29), Tunebo (chapter 29), tonal varieties of Japanese, and Bengali, whereby the location of the contrast may be fixed, like the word-initial syllable in Tunebo. Second, ‘pitch accent’ may refer to a lexically distinctive tone contrast in a syllable with word stress, often restricted to word stress in a specific location in the word, like the first in South Slavic languages (chapter 15), a non-final one in Swedish (chapter 18), and the last in Ma’ya (chapter 25). Other cases are the Goizueta and Leitza varieties of Basque, Franconian German (including Limburgish), and Norwegian. The Lithuanian case is presented with some reservations in chapter 15. These two meanings are often collapsed in discussions, particularly if the tone contrasts are binary, as in all languages mentioned here except Ma’ya. 
Third, ‘pitch accent’ is used for the intonational tones that are inserted in the accented syllables in many Indo-European languages, whereby there may be more than one pitch accent in a domain. This usage derives from Bolinger’s (1958) article arguing for the phonological independence of tone and word stress and was boosted by Pierrehumbert’s (1980) adoption of the term. A factor favouring the use of ‘pitch accent’ in all three types of central tone mentioned so far is the existence of generalizations about their location. If they are deleted in some morphosyntactic or phonological contexts, descriptions may prefer stating that the accent is deleted to stating that the tones are deleted. In Tokyo Japanese, these two ways of putting things amount to the same thing, since only a single tone option is available. However, syllables that must be provided with tone in an English sentence, i.e. its accented syllables, each have a set of tones from which a choice is to be made, which makes a description in terms of tone insertion or deletion more cumbersome. Ladd (1980) coined the term ‘deaccenting’ to refer to the removal of the option of tonal insertion in a given syllable. Here, ‘accent’ is often used to refer to the location and ‘pitch accent’ to the tones. Finally, a fourth use of ‘pitch accent’ is a word prominence that is exclusively realized by f0, as in tonal varieties of Japanese. In this sense, Italian and English have ‘non-pitch accents’ (aka ‘stress accents’), because an accented stressed syllable will have durational and spectral features distinguishing it from unstressed syllables, besides f0 features distinguishing it from unaccented stressed syllables (Beckman 1986; see also chapter 10). Of these different usages, the Bolinger–Pierrehumbert one has perhaps been most widely accepted. The term ‘phrase accent’ has different meanings, too. 
Pierrehumbert (1980) first used it to refer to a tone or tones between the last pitch accent and the boundary tone at the right edge of the ι. In Swedish, this description applies to a H tone that marks the end of the focused constituent (the ‘sentence accent’; Bruce 1977), which may be flanked by a lexical pitch accent on its left and a final boundary tone on its right. Pierrehumbert (1980) introduced L- and H- in this postnuclear position to provide greater melodic scope for the English

INTRODUCTION 7 nuclear pitch accent as compared to prenuclear ones. Two new meanings arose from theoretical proposals about the status of these English tones. First, Beckman and Pierrehumbert (1986) reanalysed the L- and H- phrase accents as the final boundary tones of a new prosodic constituent, the ip, ranking immediately below the IP. The second redefinition is that of a postnuclear tone in any language that has an association with a tone-bearing unit, typically a stressed syllable, a configuration otherwise reserved for the starred tone of a pitch accent (Grice et al. 2000). In this second sense, the phrase accent retains the property of being final in the phrase as well as that of an accent, because it will typically be associated with the last available stressed syllable. Since there is no guarantee that what have been analysed as boundary tones of the ip always have an association with a stressed syllable and vice versa, it is important to keep these meanings apart. A final terminological comment concerns ‘compositionality’, which refers to the transparent contributions of individual morphemes to the meaning of a linguistic expression. In intonation the term has been used to refer to the more specific assumption that each of the morphemes must consist of a single phonological tone, a position frequently attributed to Pierrehumbert and Hirschberg (1990). In the latter interpretation, all alternative analyses are ‘non-compositional’; in the more general sense, that label only applies to Liberman and Sag’s (1974) proposal that the sentence-wide melodies are indivisible morphemes, like L*+H L* H- H% (to use Pierrehumbert’s later analysis), which they analyse as meaning ‘contradiction’.
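The notational distinctions discussed in this section can be made concrete with a small sketch. It assumes MAE_ToBI-style labels, in which a starred tone (H*, L*+H, !H*) is a pitch accent, a tone followed by ‘-’ is a phrase accent, and a tone carrying ‘%’ is a boundary tone. The hypothetical function `classify_tune` sorts the tones of a tune such as Liberman and Sag’s contradiction contour into central tones (pitch accents) and edge tones. Its hypothetical `phrase_accent_as_ip_boundary` flag relabels L- and H- as boundary tones of the ip, in the spirit of Beckman and Pierrehumbert’s (1986) reanalysis. This illustrates the notation only and is not a labelling tool.

```python
import re

# One tone label, assuming MAE_ToBI-style notation:
#   optional "%" (initial boundary tone), optional "!" (downstep),
#   H or L, optional "*" (star), an optional "+"-joined second tone,
#   and an optional trailing "-" (phrase accent) or "%" (final boundary tone).
# Compound labels such as "L-H%" are therefore split into "L-" and "H%".
TONE_LABEL = re.compile(r"%?!?[HL]\*?(?:\+!?[HL]\*?)?[-%]?")


def classify_tune(tune, phrase_accent_as_ip_boundary=False):
    """Label each tone in a ToBI-style tune (hypothetical helper).

    Starred tones are pitch accents (central tones); the others are edge
    tones. With phrase_accent_as_ip_boundary=True, "-" tones are labelled
    as ip boundary tones and "%" tones as IP boundary tones.
    """
    labelled = []
    for label in TONE_LABEL.findall(tune):
        if "*" in label:
            kind = "pitch accent"
        elif "%" in label:
            kind = "IP boundary tone" if phrase_accent_as_ip_boundary else "boundary tone"
        elif label.endswith("-"):
            kind = "ip boundary tone" if phrase_accent_as_ip_boundary else "phrase accent"
        else:
            kind = "unclassified"
        labelled.append((label, kind))
    return labelled


if __name__ == "__main__":
    # Liberman and Sag's contradiction contour in Pierrehumbert's notation
    print(classify_tune("L*+H L* H- H%"))
```

The parser assigns categories only. It cannot say whether the contradiction contour consists of four tonal morphemes or one indivisible morpheme, which is exactly the compositionality question raised above.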

1.4 The structure of the handbook

Part I, ‘Fundamentals of Language Prosody’, lays out two fundamental prerequisites for language prosody research. Chapter 2 (Taehong Cho and Doris Mücke) sets out the available measurement techniques, while emphasizing the integrity of speech and the interaction between prosodic and supralaryngeal data. Chapter 3 (Oliver Niebuhr, Henning Reetz, Jonathan Barnes, and Alan C. L. Yu) surveys the mechanics of f0 perception in acoustic signals generally, as well as pitch perception in its dependence on aspects of speech, including a discussion of the relation between the visually observable f0 tracks in the speech signal and the pitch contours as perceived in speech. Part II, ‘Prosody and Linguistic Structure’, contains five chapters devoted to structural aspects of prosody. Chapter 4 (Larry M. Hyman and William R. Leben) is a wide-ranging survey of lexical and grammatical tone systems, juxtaposing data from geographically widely dispersed areas. Chapter 5 (Matthew K. Gordon and Harry van der Hulst) is on stress systems and shows that languages with stress vary in the extent to which stress locations are rule governed as opposed to lexically listed; it also presents a typological survey of stress systems and their research issues. The autosegmental-metrical model, which describes sentence prosody in terms of tones and their locations in the prosodic structure, is discussed in chapter 6 (Amalia Arvaniti and Janet Fletcher), with explanatory references to the analysis of American English by Janet Pierrehumbert and Mary Beckman (MAE_ToBI) as well as other languages. Chapter 7 (John J. McCarthy) summarizes the way phonological constituents such as the syllable, the foot, and the phonological word may be the exponents of morphological categories. Finally, chapter 8 (Wendy Sandler, Diane Lillo-Martin, Svetlana Dachkovsky, and Ronice de Quadros) dives into the syntax versus prosody debate and

shows how non-manual markers of information structure and wh-questions in unrelated sign languages are prosodic in nature. Part III, ‘Prosody in Speech Production’, contains three survey chapters on reflexes of prosodic structure in the speech signal, dealing with tones, stress, and rhythm. Chapter 9 (Jonathan Barnes, Hansjörg Mixdorff, and Oliver Niebuhr) tackles tonal phonetics, covering both lexical and post-lexical tone. The chapter keeps a close eye on the difference between potentially speaker-serving strategies, such as tonal coarticulation, and potentially hearer-serving strategies aimed at sharpening up pitch contrasts. Chapter 10 (Vincent J. van Heuven and Alice Turk) surveys cues to word stress and sentence accents, which are not restricted to variation in the classic suprasegmental features (cf. §1.3.1) but also appear in spectral properties, as derivable from hyperarticulation. Chapter 11 (Laurence White and Zofia Malisz) evaluates the claim that languages fall into rhythm groups based on the timing of morae, syllables, or stresses. As an alternative to temporal regularity of these lower-end hierarchical constituents, segmental properties, such as consonant clusters and reduced vowels, have been hypothesized to be the determinants of rhythm. Regardless of whether they are, these have figured in a number of ‘rhythm measures’ by which languages have been characterized. Part IV, ‘Prosody across the World’, consists of 18 surveys of the prosodic facts and prosody research agendas for the languages of the world. We felt that a geographical approach would best accommodate the varying regional densities of language families and the varying intensities of research efforts across the regions of the world. As an added benefit, a geographical approach gives scope to identifying areal phenomena.
Given our choice, the distribution of languages reflects the situation before the European expansion of the fifteenth and later centuries, which allowed us to include varieties of European languages spoken outside Europe in the chapters covering Europe. One way or another, our geographical approach led to the groupings shown in Map 1.1 (see plate section), which identifies geographical areas by chapter title and number. The core topics addressed in these chapters are word stress, lexical tone, and intonation, plus any interactions of these phenomena with other aspects of phonology, including voice quality, non-tonal segmental aspects, prosodic phrasing, or morphosyntax. For word stress, these notably involve rhyme structures. They may have effects on intonation, as in Finnish and Estonian (chapter 15, Maciej Karpiński, Bistra Andreeva, Eva Liina Asu, Anna Daugavet, Štefan Beňuš, and Katalin Mády), and more commonly on stress locations, as in the case of Maltese (chapter 16, Mariapaola D’Imperio, Barbara Gili Fivela, Mary Baltazani, Brechtje Post, and Alexandra Vella). Derivation-insensitive, morphemic effects on stress are pervasive in Australian languages, which tend to place the stress at the left edge of the root (chapter 26, Brett Baker, Mark Donohue, and Janet Fletcher). Many of the languages dealt with in chapter 14 (Anastasia Karlsson, Güliz Güneş, Hamed Rahmani, and Sun-Ah Jun) have no stress and the same goes for those in chapter 25 (Nikolaus P. Himmelmann and Daniel Kaufman), which goes some way towards correcting an impressionistic over-reporting of stress in the earlier literature. Phonological constraints on the distribution of lexical tone may be imposed by segmental, metrical, syllabic, or phrasal structures. Interactions between tones and voice quality are notably present in South East Asia (chapter 23, by Marc Brunelle, James Kirby, Alexis Michaud, and Justin Watkins) and Mesoamerica (chapter 28, by Christian DiCanio and Ryan Bennett).
In varieties of Chinese, tone contrasts are reduced by a coda glottal stop (chapter 22, by Jie Zhang, San Duanmu, and Yiya Chen), and in the Chadic

language Musgu, tones are targeted by depressor as well as raiser consonants (chapter 13, Sam Hellmuth and Mary Pearce). Swedish allows lexical tone in stressed syllables only, while in most German tonal dialects it is additionally restricted to syllable rhymes with two sonorant morae (chapter 18, Tomas Riad and Jörg Peters), as it is in Somali and Lithuanian. Besides the restriction on the number of distinctive tones within a phrase (see §1.3), phrase length may impose constraints on the number of boundary tones that are realized. Thus, in Seoul Korean one or two tones out of four will not be realized if the α has three syllables or fewer (chapter 24, by Sun-Ah Jun and Haruo Kubozono). Tones are frequent exponents of morphological categories (‘grammatical tone’) in the languages spoken in sub-Saharan Africa, as illustrated in chapter 12 (Larry M. Hyman, Hannah Sande, Florian Lionnet, Nicholas Rolle, and Emily Clem), and North America, as discussed in chapter 27 (Gabriela Caballero and Matthew K. Gordon), for instance. While all chapters discuss prosodic phrasing, the number of phrase types involved in constructing the tone string varies from three in varieties of Basque (the accentual phrase, the intermediate phrase, and the intonational phrase) via two in Catalan and Spanish (which lack an accentual phrase) to one in Portuguese (which only uses an intonational phrase in the construction of the tonal representation) (chapter 17, by Sónia Frota, Pilar Prieto, and Gorka Elordieta). In chapter 20 (Kristján Árnason, Anja Arnhold, Ailbhe Ní Chasaide, Nicole Dehé, Amelie Dorn, and Osahito Miyaoka), varying degrees of integration of clitics into their host in Central Alaskan Yupik create different foot and syllable structures, with effects on stress and segments. Finally, as expected, prosodic diversity varies considerably from chapter to chapter.
South America is home to 53 language families and shows a variety of languages with stress, with tone, and with both stress and tone (chapter 29, by Thiago Costa Chacon and Fernando O. de Carvalho). Two chapters offer counterpoints to this situation. Chapter 19 (Martine Grice, James Sneed German, and Paul Warren) deals with English but presents a number of closely related varieties of this language under the heading of Mainstream English Varieties as well as a number of contact languages, known as ‘New Englishes’, which display a range of typologically different phenomena. And chapter 21 (Aditi Lahiri and Holly J. Kennard) shows how the two largest language families spoken on the Indian subcontinent, Dravidian and Indo-European, present very similar intonation systems despite their genetic divergence. Part V, ‘Prosody in Communication’, starts with a survey of approaches to intonational meaning, contrasting detailed accounts of limited data with broader accounts of entire inventories of melodies, with progress being argued to follow from the integration of these approaches (chapter 30, by Matthijs Westera, Daniel Goodhue, and Carlos Gussenhoven). In chapter 31 (Frank Kügler and Sasha Calhoun), prosodic ways of marking focus are split into those that rely on the manipulation of prominent syllables (stress- or pitch-accent-based cues) and those that rely on phrase- or register-based cues, with discussion of the prosodic marking of a number of aspects of information structure, such as topic, comment, and givenness. Chapter 32 (Julia Hirschberg, Štefan Beňuš, Agustín Gravano, and Rivka Levitan) addresses the importance of prosody beyond the utterance level. It surveys the role of prosody in characterizing the dynamics of interpersonal spoken interactions, such as turn-taking and entrainment, and in providing indications of each speaker’s state of mind, such as deception versus truthfulness.
Chapter 33 (Marc Swerts and Emiel Krahmer) zooms in on how visual prosody relates to auditory prosody in communication and on how the combined use of visual and auditory prosody varies across cultures. Finally, chapter 34 (Diana Van Lancker Sidtis and Seung-yun Yang) reviews the underlying causes of

communication-related prosodic abnormalities in adults with brain damage and the challenges facing the evaluation of prosodic abilities, and surveys treatments for certain prosodic deficiencies together with evidence for their efficacy. Part VI, ‘Prosody and Language Processing’, examines both the processing of prosodic information and the role of prosody in language processing. Chapter 35 (Joseph C.Y. Lau, Zilong Xie, Bharath Chandrasekaran, and Patrick C.M. Wong) reviews lesion, neuroimaging, and electrophysiological studies of the processing of linguistically relevant pitch patterns (e.g. lexical tones) and the influence of prosody on syntactic processing. It shows that what underlies linguistic pitch processing is not a specific lateralized area of the cerebral cortex, but an intricate neural network that spans the two hemispheres as well as the cortical and subcortical areas along the auditory pathway. Chapter 36 (James M. McQueen and Laura Dilley) discusses how the prosodic structure of an utterance constrains spoken-word recognition; the chapter also outlines a prosody-enriched Bayesian model of spoken-word recognition. Chapter 37 (Stefanie Shattuck-Hufnagel) summarizes evidence for the use of prosodic structure in speech production planning as part of a review of modern theories of prosody in the acoustic, articulatory, and psycholinguistic literature, and probes into the role of prosody in different models of speech production planning. Part VII, ‘Prosody and Language Acquisition’, reflects a growing body of research on prosodic development at both word and phrase level in first language (L1) acquisition.
Chapter 38 (Paula Fikkert, Liquan Liu, and Mitsuhiko Ota) surveys the developmental stages during infancy and early childhood in the perception and production of lexical tone, Japanese pitch accent, and word stress, and summarizes work on the relationship between perception and production, the representation of word prosody, and the factors driving the development of word prosody. Chapter 39 (Aoju Chen, Núria Esteve-Gibert, Pilar Prieto, and Melissa A. Redford) presents developmental trajectories for the formal and functional properties of phrase-level prosody and reviews the factors that may explain why prosodic development is a gradual process across languages and why cross-linguistic differences nevertheless arise early. Chapter 40 (Judit Gervain, Anne Christophe, and Reiko Mazuka) summarizes empirical evidence for early sensitivity to prosody and discusses in depth how infants use prosodic information to bootstrap other aspects of language, in particular word segmentation, word order, syntactic structure, and word meaning. Chapter 41 (Melanie Soderstrom and Heather Bortfeld) reviews the primary prosodic characteristics of child-directed speech (CDS), the input for early language development, considers sources of variation across culture and context, and examines the function of CDS for social and linguistic development. Focusing on pathological conditions, chapter 42 (Rhea Paul, Elizabeth Schoen Simmons, and James Mahshie) examines prosodic dysfunction in children within relatively common developmental disorders, such as autism spectrum disorder and developmental language disorder. It also outlines strategies for assessing and treating these prosodic deficits, many of which offer at least short-term improvements in both prosodic production and perception.
Moving on to research on the learning and teaching of prosody in a second language, chapter 43 (Allard Jongman and Annie Tremblay) surveys adult second language (L2) learners’ production, perception, and recognition of word prosody, and the role of L1 word prosody in L2 word recognition. Chapter 44 (Jürgen Trouvain and Bettina Braun) reviews current knowledge of adult L2 learners’ production and perception of phrasal-level prosody (i.e. intonation, phrasing, and timing) and their use of intonation for communicative

purposes, such as expressing information structure, illocutionary force, and affect. Chapter 45 (Dorothy M. Chun and John M. Levis) focuses on teaching prosody, mainly intonation, rhythm, and word stress, and the effectiveness of L2 prosody instruction. Part VIII, ‘Prosody in Technology and the Arts’, begins with two chapters dealing with technological advances and best practices in automatic processing and labelling of prosody. Chapter 46 (Anton Batliner and Bernd Möbius) identifies changes in the role of prosody in automatic speech processing since the 1980s, focusing on two main aspects: power features and leverage features. Chapter 47 (Andrew Rosenberg and Mark Hasegawa-Johnson) examines major components of an automatic labelling system, illustrates the most important design decisions to be made on the basis of AuToBI, and discusses assessment of automatic prosody labelling and automatic assessment of human prosody. The third and fourth chapters are dedicated to art forms that interact with linguistic-prosodic structures. In chapter 48, Paul Kiparsky investigates the way metrical constraints define texts so as to fit the poetic forms of verse. He demonstrates their effects in various verse forms in a broad typological spectrum of languages and includes an account of the detailed constraints imposed by William Shakespeare’s meter. Chapter 49 (D. Robert Ladd and James Kirby) is the first typological treatment of the constraints musical melodies pose on linguistic tone. The authors show how both in Asia and in Africa constraints focus on transitions between notes and tones, rather than the notes or the defining pitches of the tones themselves.

1.5 Reflections and outlook

This handbook needed to strike a balance between our desire to be comprehensive and the fact that, besides becoming available as an e-book, it was to be published as a single volume. We understood one side of this balance, comprehensiveness, not only in terms of topic coverage but also in the sense of representing the work of different authors and research groups from different disciplines. Moreover, in line with the usual objective of handbooks of this kind, we wanted to provide a research survey for specialists that would also be suitable as an introduction for beginning researchers in the field as well as for researchers with a non-prosodic background who wish to extend their research to prosody. That is, it should ideally present the state of the art and only secondarily provide a platform for new research results. The other side of the balance, size limitation, made us emphasize conciseness in our communications with authors in a way that frequently bordered on rampant unreasonableness, particularly when our suggestions came with requests for additional content. We are deeply grateful for the generosity and understanding with which authors responded to our suggestions for textual savings and changes and hope that this policy has not led to significant effacements of information or expository threads. The outline of the contents of the handbook in §1.4 passes over topics that have no dedicated chapter in this handbook. Some of these exclusions derive from our discussion of the definition of language prosody in §1.3. Here, we may think of laughter, filled pauses, more or less continuous articulatory settings characteristic of languages, nasal prosodies, syllable onset complexity, or focus-marking particles with no tone in them. All of these will interact with language prosody, but they are not focuses of this handbook. In other cases, various circumstances account for the absence of topics. Part II might have featured a chapter on

prosodic phrasing setting out its relation to syntactic structure, including the apparent motivations for deviating from phonological-morphosyntactic isomorphism. The relative lack of typological data here would have made it hard to discuss, say, the estimated frequencies with which prosodic constituents make their appearance in languages or the extent to which size constraints trump syntactic demands on phonological structure. The geographical approach in Part IV made us lose sight of creoles, languages that arose on the basis of mostly European languages in various parts of the world; a survey might well have addressed the debate on the status of creoles as a typological class (McWhorter 2018). Finally, a few topics do not feature in the handbook in view of recent overviews or volumes dedicated to them. In Part I, we decided to leave out a chapter on the physiological and anatomical aspects of speech production because of the existence of a number of surveys, notably Redford (2015). In Part VI, while the processing of prosody in language comprehension is briefly considered in chapter 35, we refer to Dahan (2015) for a more detailed discussion. Including a chapter on prosody in speech synthesis, a chapter on the role of prosody in social robotics, and a chapter on the link between music and speech prosody would have given Part VIII a larger coverage. Hirose and Tao (2015), Crumpton and Bethel (2016), and Heffner and Slevc (2015), however, provide welcome overviews of the state of the art on each of these topics. We hope that this survey will not only be useful for consultation on a wide range of information but also serve as a source of inspiration for tackling research questions that the 49 chapters have implicitly or explicitly highlighted through their multi-disciplinary lens.

Part I

Fundamentals of Language Prosody

Chapter 2

Articulatory Measures of Prosody

Taehong Cho and Doris Mücke

2.1 Introduction

Over the past few decades, theories of prosody have been developed that have broadened the topic in such a way that the term ‘prosody’ does not merely pertain to low-level realization of suprasegmental features such as f0, duration, and amplitude, but also concerns high-level prosodic structure (e.g. Beckman 1996; Shattuck-Hufnagel and Turk 1996; Keating 2006; Fletcher 2010; Cho 2016). Prosodic structure is assumed to have multiple functions, such as a delimitative function (e.g. a prosodic boundary marking), a culminative function (e.g. a prominence marking), and functions deriving from the distribution of tones at both lexical and post-lexical levels. It involves dynamic changes of articulation in the laryngeal and supralaryngeal systems, often accompanied by prosodic strengthening, that is, hyperarticulation of phonetic segments to enhance paradigmatic contrasts by a more distinct articulation, and sonority expansion to enhance syntagmatic contrasts by increasing periodic energy radiated from the mouth (see Cho 2016 for a review). Under a broad definition of prosody, therefore, prosody research in speech production concerns the interplay between phonetics and prosodic structure (e.g. Mücke et al. 2014, 2017). It embraces issues related to how abstract prosodic structure influences the phonetic implementation by the laryngeal and supralaryngeal systems, and how higher-level prosodic structure may in turn be recoverable from or manifest in the variation in the phonetic realization. For example, marking a tonal event in the phonetic substance involves dynamic changes not only in the laryngeal system (regulating the vocal fold vibration to produce f0 contours) but also in the supralaryngeal system (regulating movements of articulators to produce consonants and vowels in the textual string).
With the help of articulatory measuring techniques, the way these two systems are coordinated in the spatio-temporal dimension is directly observable, allowing various inferences about the role of the prosodic structure in this coordination to be made. This chapter introduces a number of modern articulatory measuring techniques, along with examples across languages indicating how each technique may be used or has been

used on various aspects of prosody in the phonetics–prosody interplay. These include (i) laryngoscopy and electroglottography (EGG) to study laryngeal events associated with vocal fold vibration; (ii) systems such as the magnetometer (electromagnetic articulography, EMA), electropalatography (EPG), and ultrasound systems for exploring supralaryngeal articulatory events; and (iii) aerodynamic measurement systems for recording oral/subglottal pressure and oral/nasal flow, and a device called the RIP (Respitrace inductive plethysmograph) for recording respiratory activity.

2.2 Experimental techniques

2.2.1 Laryngoscopy

Laryngoscopy allows a direct observation of the larynx. A fibreoptic nasal laryngoscopy system (Ladefoged 2003; Hirose 2010) contains a flexible tube with a bundle of optical fibres which may be inserted through the nose, while the lens at the end of the fibreoptic bundle is usually positioned near the tip of the epiglottis above the vocal folds. Before the insertion of the scope, surface anaesthesia may be applied to the nasal mucosa and to the epipharyngeal wall. The procedure is relatively invasive, requiring the presence of a physician during the experiment. A recent system for laryngoscopy provides high-speed motion pictures of the vibrating vocal folds with useful information about the laryngeal state and the glottal condition during phonation (e.g. Esling and Harris 2005; Edmondson and Esling 2006). The recording of laryngeal images by a laryngoscope is often made simultaneously with a recording of the electroglottographic and acoustic signals (see Hirose 2010: fig. 4.3). Because of its invasiveness and operating constraints, however, the use of laryngoscopy in phonetics research has been quite limited. In prosody research, a laryngoscope may be used to explore the laryngeal mechanisms for controlling f0 in connection with stress, tones, and phonation types. Lindblom (2009) discussed a fibrescopic study in Lindqvist-Gauffin (1972a, 1972b) that examined laryngeal behaviour during glottal stops and f0 changes for Swedish word accents. The fibrescopic data were interpreted as indicating that there may be three dimensions involved in controlling f0 and phonation types: glottal adduction–abduction, laryngealization (which involves the aryepiglottic folds), and activity of the vocalis muscle.
With reference to more recent fibrescopic data (Edmondson and Esling 2006; Moisik and Esling 2007; Moisik 2008), Lindblom (2009) suggested that the glottal stop, creaky voice, and f0 lowering may involve the same kind of laryngealization to different degrees. Basing their argument on cross-linguistic fibrescopic data, Edmondson and Esling (2006) suggested that there are indeed different ‘valve’ mechanisms for controlling articulatory gestures that are responsible for cross-linguistic differences in tone, vocal register, and stress, and that languages may differ in choosing specific valve mechanisms. A fibreoptic laryngoscopic study that explores the interplay between phonetics and prosodic structure is found in Jun et al. (1998), who directly observed the changing glottal area in the case of disyllabic Korean words with different consonant types, with the aim of understanding the laryngeal states associated with vowel devoicing. A change in the glottal area was in fact shown to be conditioned by prosodic position (accentual

phrase-initial vs. accentual phrase-medial), interpreted as gradient devoicing. Further fibreoptic research might explore the interplay between phonetics and prosodic structure, in particular to study the connection between prosodic strengthening and laryngeal articulatory strengthening.

2.2.2 Electroglottography

The electroglottograph (EGG), also referred to as a laryngograph, is a non-invasive device that allows for monitoring vocal fold vibration and the glottal condition during phonation (for more information see Laver 1980; Rothenberg and Mahshie 1988; Rothenberg 1992; Baken and Orlikoff 2000; d’Alessandro 2006; Hirose 2010; Mooshammer 2010). It estimates the contact area between the vocal folds during phonation by measuring changes in the transverse electrical impedance between two electrodes placed on the skin over both sides of the thyroid cartilage (Figure 2.1). Given that a glottis filled with air does not conduct electricity, the electrical impedance across the larynx is roughly negatively correlated with the contact area. EGG therefore not only provides an accurate estimation of f0 but also measures parameters related to the glottal condition, such as open quotient (OQ, the percentage of the open glottis interval relative to the duration of the full abduction–adduction cycle), contact or closed quotient (CQ, the percentage of the closed glottis interval relative to the duration of the full cycle), and skewness quotient (SQ, the ratio between the closing and opening durations). EGG signals are often obtained simultaneously with acoustic and airflow signals, so that the glottal condition can be holistically estimated. While readers are referred to d’Alessandro (2006) for a review of how voice source parameters (including those derived from EGG signals) may be used in prosody

0 10 20 30 40 50 60 70 80 Time (ms) Loud voice

0.3 0.2 0.1 0 –0.1 –0.2 –0.3 –0.4

0

2

4

6

8 10 12 14 16 18 Time (ms)

Figure 2.1 Waveforms corresponding to vocal fold vibrations in electroglottography (examples by Phil Hoole at IPS Munich) for different voice qualities. High values indicate increasing vocal fold contact. Photo taken at IfL Phonetics Lab, Cologne.

18 TAEHONG CHO AND DORIS Mücke analysis, in what follows we will discuss a few cases in which EGG is used in exploring the interplay between phonetics and prosody. An EGG may be used to explore variation in voice quality strengthening as a function of prosodic structure. For example, Garellek (2014) explored the effects of pitch accent (phraselevel stress) and boundary strength on the voice quality of vowels in vowel-initial words in English and Spanish. Word-initial vowels under pitch accent were found to have an increase in EGG contact (as reflected in the CQ) in both English and Spanish, showing laryngealized voice quality. Garellek’s study of the phonetics–prosody interplay built on the assumption that fine-grained phonetic detail, this time in the articulatory dimension of glottis, is modulated differently by different sources of prosodic strengthening (prominence vs. boundary). Interestingly, however, both languages showed a decrease in EGG contact at the beginning of a larger prosodic domain (e.g. intonational phrase-initial vs. word-initial). This runs counter to the general assumption that domain-initial segments are produced with a more forceful articulation (e.g. Fourgeron 1999, 2001; Cho et al. 2014a) and that phrase-initial vowels are more frequently glottalized than phrase-medial ones (e.g. Dilley et al. 1996; Di Napoli 2015), which would result in an increase in EGG contact. Moreover, contrary to Garellek’s observation, Lancia et al. (2016) reported that vowel-initial words in German showed more EGG contact only when the initial syllable was unstressed, which indicates that more research is needed to understand this discrepancy from cross-linguistic perspectives. EGG was also used for investigating glottalization at phrase boundaries in Tuscan and Roman Italian, pointing to the fact that these glottal modifications are used as prosodic markers in a gradient fashion (Di Napoli 2015). 
Another EGG study that relates the glottal condition to prominence is Mooshammer (2010), which examined various parameters obtained from EGG signals to explore how word-level stress and sentence-level accent may be related to vocal effort in German. Mooshammer found that a vowel produced with global vocal effort (i.e. with increased loudness) was similar to a vowel with lexical stress in terms of at least two parameters: OQ and glottal pulse shape (obtained by applying a version of principal component analysis). This held independently of higher-level accent (due to focus). A focused vowel, on the other hand, was produced with a lower SQ than an unfocused vowel, showing a more symmetrical glottal pulse shape. To the extent that these results hold, lexical stress and accent in German may be marked by different glottal conditions. However, an accented vowel is in general produced with an increase in loudness, as has been found across languages (including German, e.g. Niebuhr 2010). Further research is therefore required to explore the exact relationship between vocal effort and accent, both of which apparently increase loudness.
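The quotients defined above can be illustrated with a minimal Python sketch. It assumes a criterion-level method of the kind often applied to EGG signals: within each cycle, a sample counts as "closed" when the signal exceeds 35% of that cycle's peak-to-peak range. This is not necessarily the method used in the studies cited. The threshold, the rule that delimits cycles at upward crossings of the signal mean, the toy waveform, and the function names are all hypothetical. SQ is not computed, because locating the closing and opening phases requires further landmark conventions.

```python
import math


def synthetic_egg(f0=125.0, fs=10000, n_cycles=10, contact_fraction=0.5):
    """Toy EGG-like signal: a half-sine contact pulse followed by a flat open phase.

    Higher values stand for more vocal fold contact, as in Figure 2.1.
    Illustrative only; not a physiological model of the glottal cycle.
    """
    period = int(round(fs / f0))                     # samples per glottal cycle
    contact = int(round(period * contact_fraction))  # samples in the contact pulse
    signal = []
    for _ in range(n_cycles):
        for n in range(period):
            signal.append(math.sin(math.pi * n / contact) if n < contact else 0.0)
    return signal


def egg_cycle_measures(egg, fs, criterion=0.35):
    """Per-cycle f0 (Hz), contact quotient (CQ) and open quotient (OQ).

    Hypothetical choices, not taken from the chapter:
    - cycles are delimited by upward crossings of the signal mean;
    - a sample counts as 'closed' when it exceeds
      min + criterion * (max - min) of its own cycle.
    """
    mean = sum(egg) / len(egg)
    starts = [i for i in range(1, len(egg)) if egg[i - 1] < mean <= egg[i]]
    measures = []
    for start, end in zip(starts, starts[1:]):
        cycle = egg[start:end]
        lo, hi = min(cycle), max(cycle)
        level = lo + criterion * (hi - lo)
        closed = sum(1 for v in cycle if v > level)  # samples in the 'closed' phase
        period = end - start
        cq = closed / period
        measures.append({"f0": fs / period, "CQ": cq, "OQ": 1.0 - cq})
    return measures


if __name__ == "__main__":
    fs = 10000
    for m in egg_cycle_measures(synthetic_egg(fs=fs), fs)[:3]:
        print(f"f0 = {m['f0']:.1f} Hz, CQ = {m['CQ']:.2f}, OQ = {m['OQ']:.2f}")
```

Changing `criterion` changes CQ and OQ for the same waveform. This is one reason quotient values are only comparable across studies that use the same extraction method.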

2.3 Aerodynamic and respiratory movement measures

The aerodynamic devices most widely used in phonetic research employ oral and nasal masks (often called Rothenberg masks, following Rothenberg 1973), through which the amount of oral and nasal flow can be obtained in a fairly non-invasive way. Intraoral pressure may be obtained simultaneously by inserting a small pressure tube between the lips inside the oral mask; this records the pressure of the air in the mouth (e.g. Ladefoged 2003; Demolin 2011). Subglottal pressure may be the aerodynamic measure most directly related to prosody. Sufficient subglottal pressure is required to initiate and maintain vocal fold vibration (van den Berg 1958), and an increase in subglottal pressure is likely to increase loudness (sound pressure level) and f0 (e.g. Ladefoged and McKinney 1963; Lieberman 1966; Ladefoged 1967). It is not, however, easy to measure subglottal pressure directly. Doing so involves either a tracheal puncture (i.e. inserting a pressure transducer needle into the trachea; see Ladefoged 2003: fig. 3) or inserting a rubber catheter with a small balloon through the nose and down into the oesophagus, behind the trachea (e.g. Ladefoged and McKinney 1963). Non-invasive methods of estimating subglottal pressure from intraoral pressure and volume flow have been developed (Rothenberg 1973; Smitheran and Hixon 1981; Löfqvist et al. 1982). However, these have limited applicability in prosody research, because certain conditions (e.g. a CVCV context) must be met to obtain reliable data. Aerodynamic properties of speech sounds can be compared with respiratory activity such as changes in lung volume. These may be obtained with a so-called RIP, or Respitrace inductive plethysmograph (e.g. Gelfer et al. 1987; Hixon and Hoit 2005; Fuchs et al. 2013, 2015). In this technique, subjects wear two elastic bands (approximately 10 cm wide vertically), one around the thoracic cavity (the rib cage) and one around the abdominal cavity (Figure 2.2).
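A common way of combining the two band signals into a single lung-volume estimate is a two-compartment model. Here it is assumed for illustration rather than taken from the chapter. In this model, volume change is treated as a weighted sum of the rib-cage and abdominal signals, with weights obtained by calibration. The minimal Python sketch below fits such weights by least squares against a reference volume signal (e.g. from a spirometer). All signals, weights, and function names are hypothetical, and real calibration procedures differ across laboratories.

```python
import math


def fit_rip_weights(rib_cage, abdomen, reference_volume):
    """Least-squares weights (k_rc, k_ab) so that k_rc*RC + k_ab*AB ~ reference volume.

    Solves the 2 x 2 normal equations of a no-intercept linear regression.
    """
    s_rr = sum(r * r for r in rib_cage)
    s_aa = sum(a * a for a in abdomen)
    s_ra = sum(r * a for r, a in zip(rib_cage, abdomen))
    s_rv = sum(r * v for r, v in zip(rib_cage, reference_volume))
    s_av = sum(a * v for a, v in zip(abdomen, reference_volume))
    det = s_rr * s_aa - s_ra * s_ra
    if abs(det) < 1e-12:
        raise ValueError("band signals are (nearly) collinear; calibration is ill-posed")
    k_rc = (s_rv * s_aa - s_ra * s_av) / det
    k_ab = (s_rr * s_av - s_ra * s_rv) / det
    return k_rc, k_ab


def lung_volume(rib_cage, abdomen, k_rc, k_ab):
    """Estimated lung-volume change from the two band signals (two-compartment model)."""
    return [k_rc * r + k_ab * a for r, a in zip(rib_cage, abdomen)]


if __name__ == "__main__":
    # Toy calibration data: 4 s of quiet breathing sampled at 100 Hz (hypothetical)
    times = [i / 100 for i in range(401)]
    rc = [math.sin(2 * math.pi * 0.25 * t) for t in times]              # rib-cage band
    ab = [0.5 * math.sin(2 * math.pi * 0.25 * t + 0.6) for t in times]  # abdominal band
    ref = [1.2 * r + 0.8 * a for r, a in zip(rc, ab)]  # stand-in for a spirometer trace (L)
    k_rc, k_ab = fit_rip_weights(rc, ab, ref)
    print(f"k_rc = {k_rc:.3f}, k_ab = {k_ab:.3f}")
```

The two bands must not move in perfect lockstep during calibration. Otherwise their separate contributions cannot be distinguished, which is why the function refuses a near-singular system.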
The bands expand and recoil as the volume of the thoracic and abdominal cavities changes during inhalation and exhalation. The resulting changes in the inductance of wire coils embedded in the bands (especially the upper band) are used to estimate the change in lung volume during speech production.

[Figure 2.2: acoustic waveform and thorax and abdomen band signals (in V) plotted against time (s), with the inhalation and exhalation phases marked, for the German sentence Er malt Tanja, aber nicht Sonja. (‘He paints Tanja, but not Sonja.’)]
Figure 2.2 Volume of the thoracic and abdominal cavities measured with a Respitrace inductive plethysmograph during sentence production, showing the inhalation and exhalation phases. (Photo by Susanne Fuchs at Leibniz Zentrum, ZAS, Berlin.)

Gelfer et al. (1987) used subglottal pressure measurements (Ps) obtained directly from the trachea to examine the nature of global f0 declination (e.g. Pierrehumbert 1979; Cooper and Sorensen 1981). Based on a comparison of Ps with f0 and estimated lung volume (obtained with a RIP), Gelfer et al. suggested that Ps is a controlled variable in sentence production and that f0 declination comes about as a consequence of controlling Ps. More recently, however, Fuchs et al. (2015) assessed respiratory contributions to f0 declination in German using the same RIP technique. They suggested that f0 declination may not stem entirely from physiological constraints on the respiratory system, but may additionally be modulated by speech planning and by communicative constraints, as proposed in Fuchs et al. (2013). This finding is in line with Arvaniti and Ladd (2009) for Greek. Some researchers have measured airflow and intraoral pressure as an index of respiratory force, because both are closely correlated with subglottal pressure. For instance, oral flow (usually observed during a vowel or a continuant consonant) and oral pressure (usually observed during a consonant) are often interpreted as being correlated with the degree of prominence (e.g. Ladefoged 1967, 2003). For example, Cho and Jun (2000) explored boundary-related strengthening effects on the production of the three-way contrastive stops of Korean (lenis, fortis, aspirated; e.g. Cho et al. 2002). They observed systematic variation in the airflow measured just after the release of the stop as a function of boundary strength. However, the detailed pattern was better understood as serving to maintain the three-way obstruent contrast. This again implies that variation in respiratory force as a function of prosodic structure is further modulated in a language-specific way, in this case by the segmental phonology of the language.
In a similar vein, nasal flow has been investigated by researchers in an effort to understand how the amount of nasal flow produced with nasal sounds may be regulated by prosodic structure (e.g. Jun 1996; Fougeron and Keating 1997; Gordon 1997; Fougeron 2001). From an articulatory point of view, Fougeron (2001) hypothesized that the articulatory force associated with prosodic strengthening may have the effect of elevating the velum, resulting in a reduction of nasal flow. Results from French (Fougeron 2001), Estonian (Gordon 1997), and English (Fougeron and Keating 1997) indeed show that nasal flow tends to be reduced in domain-initial position, in line with Fougeron’s articulatory strengthening-based account. (See Cho et al. 2017 for a suggestion that reduced nasality for the nasal consonant may be interpreted in terms of paradigmatic vs. syntagmatic enhancement due to prominence and domain-initial strengthening, respectively.) These studies again indicate that an examination of nasal flow would provide useful data on how low-level segmental realization is conditioned by higher-order prosodic structural factors.
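One simple way to quantify the global f0 declination discussed in this section is the slope of a least-squares line fitted to the f0 track of an utterance. It is assumed here for illustration and is not necessarily the measure used by Gelfer et al. (1987) or Fuchs et al. (2015). The Python sketch below uses a synthetic f0 track. The rate of fall, the accent peaks, and the function name are hypothetical. It also shows a caveat: an accent placed late in the utterance biases a single line fitted through all f0 values. This is one reason some analyses fit lines to f0 peaks or valleys instead.

```python
import math


def linear_trend(times, values):
    """Ordinary least-squares slope (value units per second) of a signal over time."""
    mt = sum(times) / len(times)
    mv = sum(values) / len(values)
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den


# Toy f0 track for a 2 s utterance sampled every 10 ms: a linear fall of 15 Hz/s
# plus one accent peak centred in the middle of the utterance.
times = [i / 100 for i in range(201)]
f0 = [220 - 15 * t + 20 * math.exp(-((t - 1.0) / 0.1) ** 2) for t in times]
# Same declination, but with the accent peak near the end of the utterance
f0_late = [220 - 15 * t + 20 * math.exp(-((t - 1.7) / 0.1) ** 2) for t in times]

print(f"declination slope, central accent: {linear_trend(times, f0):.2f} Hz/s")
print(f"declination slope, late accent:    {linear_trend(times, f0_late):.2f} Hz/s")
```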

2.4 Point-tracking techniques for articulatory movements

Point-tracking techniques allow the positions and movements of articulators to be measured over time by attaching small pellets (or sensors) to flesh points on individual articulators. The point-tracking systems that have been used in phonetic research include the magnetometer, the X-ray microbeam, and the Optotrak (an optoelectronic system). These systems track the movements of individual articulators over time during speech production, including the upper and lower lips, the jaw, the tongue, and the velum. The optoelectronic system, however, is limited to external use, i.e. to tracking the lips and the jaw (see Stone 2010 for a brief summary of each of these systems). Among the point-tracking systems, the electromagnetic articulograph (EMA), also generally referred to as a magnetometer, has been steadily developed and more widely used in recent years than the other two techniques. It is less costly and more accessible, and it provides a faster tracking rate than the X-ray microbeam system. The EMA system uses alternating electromagnetic fields generated by multiple transmitters placed around the subject’s head. The early two-dimensional magnetometer system (e.g. Perkell et al. 1992) used three transmitters, but Carstens’ most recent system (AG 501; see Hoole 2014) uses nine transmitters that provide multi-dimensional articulatory data. In this technique, a number of receiver coils (sensors, as small as 2 × 3 mm) are glued onto the articulators (Figure 2.3), usually along the midsagittal plane, although three-dimensional systems can also capture further dimensions. Note that the NDI Wave System (Berry 2011) uses sensors containing multiple coils (e.g. Tilsen 2017; Shaw and Kawahara 2018). The basic principle is that each transmitter generates its field at a different frequency, and the strength of the field picked up by a receiver (sensor) is inversely related to the sensor’s distance from that transmitter.
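The distance-to-position step can be illustrated with a minimal two-dimensional Python sketch. It rests on simplifying assumptions that real systems do not make:

- the received amplitude is taken to fall off as 1/r³ (a dipole-field approximation);
- sensor orientation is ignored, although orientation affects the induced signal, which is one reason three-dimensional systems also solve for angular coordinates;
- there is no noise or calibration.

The transmitter layout, the sensor position, and the function names are hypothetical.

```python
import math


def distance_from_amplitude(amplitude, k=1.0):
    """Invert an assumed amplitude ~ k / r**3 fall-off to get a distance r."""
    return (k / amplitude) ** (1.0 / 3.0)


def trilaterate_2d(transmitters, distances):
    """Sensor (x, y) from distances to three non-collinear transmitters.

    Subtracting the first circle equation from the other two cancels the
    x**2 + y**2 terms, leaving a 2 x 2 linear system solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = transmitters
    r1, r2, r3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("transmitters must not be collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


if __name__ == "__main__":
    transmitters = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.0)]  # cm; hypothetical layout
    sensor = (8.0, 6.0)                                      # 'true' sensor position
    amplitudes = [1.0 / math.dist(sensor, t) ** 3 for t in transmitters]
    estimate = trilaterate_2d(transmitters, [distance_from_amplitude(a) for a in amplitudes])
    print(f"estimated position: ({estimate[0]:.3f}, {estimate[1]:.3f}) cm")
```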
Based on this principle, the system measures the voltages induced in each sensor at the different frequencies and derives the sensor’s distance from each transmitter. From these distances, the position of the sensor can be estimated in a two-dimensional (x, y) plane, or in three-dimensional (x, y, z) space plus two angular coordinates. See Zhang et al. (1999) for the technical aspects of two-dimensional EMA, Hoole and Zierdt (2010) and Hoole (2014) for EMA systems that use three or more dimensions, and Stone (2010) for a more general discussion of EMA. Given its high temporal and spatial resolution (with sampling rates of up to 1,250 Hz in a recent EMA system), an EMA is particularly useful for investigating dynamic aspects of overlapping vocal tract actions that are coactive over time (for quantitative analysis of EMA data see Danner et al. 2018; Tomaschek et al. 2018; Wieling 2018). In prosody research, an EMA is particularly useful for exploring the phonetics–prosody interplay in the case of tone–segment alignment and articulatory signatures of prosodic structure. A few examples are provided below.

Figure 2.3 Lip aperture in electromagnetic articulography. High values indicate that the lips are open during vowel production. Trajectories are longer, faster, and more displaced in target words in contrastive focus (lighter grey lines) than in target words out of focus. (Photo by Fabian Stürtz at IfL Phonetics Lab, Cologne.)

It is worth noting, however, that prosodically conditioned dynamic aspects of articulation were investigated before the EMA system was widely used. Researchers employed the X-ray microbeam (e.g. Browman and Goldstein 1995; de Jong 1995; Erickson 1995) and an optoelectronic device (e.g. Edwards et al. 1991; Beckman and Edwards 1994). An EMA may be used to investigate the timing relation of tonal gestures with supralaryngeal articulatory gestures (D’Imperio et al. 2007b). Tone gestures are defined as dynamic movements in f0 space that can be coordinated with articulatory actions of the oral tract within the task dynamics framework (Gao 2009; Mücke et al. 2014). For example, Katsika et al. (2014) showed that boundary tones in Greek are lawfully timed with the vocalic gesture of the pre-boundary (final) syllable. The tone and the vocalic gesture showed an anti-phase (sequential) coupling relationship, in interaction with the stress distribution over the phrase-final word (see also Katsika 2016 for related data in Greek). As for tone–gesture alignment associated with pitch accent, Mücke et al. (2012) reported that Catalan employs an in-phase coupling relation (i.e. roughly simultaneous initiation of gestures) between the tone and the vocalic gesture for a nuclear LH pitch accent. By contrast, tone–gesture alignment is more complex in a language with a delayed nuclear LH rise, such as German. There, L and H may compete to be in phase with the vocalic gesture, and an in-phase L induces a delayed peak. Moreover, quite a few studies have shown that tone–segment alignment may be captured better with articulatory gestural landmarks than with acoustic ones, in line with a gestural account of tone–segment alignment (e.g. Mücke et al. 2009, 2012; Niemann et al.
2014; see also Gao 2009 for lexical tones in Mandarin). More broadly, an EMA provides useful information about the nature of tone–segment coordination, allowing various assumptions of the segmental anchoring hypothesis to be tested (see chapter 6). EMA has also been used extensively to investigate supralaryngeal articulatory characteristics of prosodic strengthening in connection with prosodic structure. EMA studies have shown that articulation is systematically modified by prominence, largely in such a way as to enhance paradigmatic contrast (e.g. Harrington et al. 2000; Cho 2005, 2006a; see de Jong 1995 for similar results obtained with the X-ray microbeam system). On a related point, Cho (2005) showed that strengthening of [i] by means of tongue position adjustments manifested different kinematic signatures of the dual function of prosodic structure (boundary vs. prominence marking). See Tabain (2003) and Tabain and Perrier (2005) for relevant EMA data in French, Mücke and Grice (2014) in German, and Cho et al. (2016) in Korean. A series of EMA studies has also examined the nature of supralaryngeal articulation at prosodic junctures, especially in connection with phrase-final (pre-boundary) lengthening (Edwards et al. 1991; Byrd 2000; Byrd et al. 2000, 2006; Byrd and Saltzman 2003; Krivokapić 2007; Byrd and Riggs 2008; Krivokapić and Byrd 2012; Cho et al. 2014b; Katsika 2016). These kinematic data have often been interpreted in terms of gesture types associated with different aspects of prosodic structure. Examples are ‘π-gestures’, which slow down the local tempo at boundaries, and ‘μ-gestures’, which account for temporal and spatial variation under stress and accent. Krivokapić et al. (2017) extended the analysis of vocal tract actions to manual gestures by combining an EMA with motion capture. They showed that manual gestures are tightly coordinated with pitch-accented syllables and boundaries.
Both vocal tract actions and manual gestures (pointing gestures) undergo lengthening under prominence. Scarborough et al. (2009) is an example of the combined use of an EMA and an optoelectronic system; the study examined the relationship between articulatory and visual (facial) cues in signalling lexical and phrasal stress in English. Together, these studies have provided insights into the theory of the phonetics–prosody interplay in general, and the dynamic approaches have improved our understanding of the human communicative sound system (e.g. Mücke et al. 2014; see also recent EMA studies on other aspects of speech dynamics, such as Hermes et al. 2017; Pastätter and Pouplier 2017).
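Gestural landmarks of the kind used in tone–gesture alignment studies can be illustrated with a Python sketch. It assumes a velocity-threshold criterion: a gesture's onset is the point where its speed first reaches 20% of its peak speed. This convention is often used in EMA work but is an assumption here. It is applied to a lip-aperture trajectory (computed, as in Figure 2.3, from two lip sensors) and to an f0 rise treated as a tone gesture. The trajectories, sensor positions, sampling rate, and function names are hypothetical. The sketch only measures the onset lag and does not model gestural coupling. A lag near zero would be compatible with roughly simultaneous (in-phase) initiation, whereas a clearly positive lag would point to a more sequential relation.

```python
import math

FS = 1000  # Hz; assumed common sampling rate for the articulatory and f0 channels


def raised_cosine(n, start, dur, lo, hi):
    """Toy trajectory: flat at lo, smooth rise to hi over `dur` samples from `start`."""
    out = []
    for i in range(n):
        if i <= start:
            out.append(lo)
        elif i >= start + dur:
            out.append(hi)
        else:
            out.append(lo + (hi - lo) * (1 - math.cos(math.pi * (i - start) / dur)) / 2)
    return out


def velocity(signal, fs):
    """Central-difference velocity (signal units per second)."""
    v = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        v[i] = (signal[i + 1] - signal[i - 1]) * fs / 2
    return v


def gesture_onset(signal, fs, fraction=0.2):
    """Earliest sample before the velocity peak from which speed stays >= fraction * peak."""
    speed = [abs(v) for v in velocity(signal, fs)]
    peak = max(range(len(speed)), key=speed.__getitem__)
    threshold = fraction * speed[peak]
    i = peak
    while i > 0 and speed[i - 1] >= threshold:
        i -= 1
    return i


# Lip aperture as the Euclidean distance between (hypothetical) upper- and lower-lip sensors
lower_lip_drop = raised_cosine(500, start=100, dur=150, lo=2.0, hi=12.0)  # mm
upper_lip = (0.0, 10.0)
lip_aperture = [math.dist(upper_lip, (0.0, 10.0 - d)) for d in lower_lip_drop]

# Toy L-to-H rise in f0 (Hz), treated as a tone gesture
f0 = raised_cosine(500, start=130, dur=200, lo=180.0, hi=240.0)

lip_on = gesture_onset(lip_aperture, FS)
tone_on = gesture_onset(f0, FS)
print(f"opening onset {1000 * lip_on / FS:.0f} ms, tone onset {1000 * tone_on / FS:.0f} ms, "
      f"lag {1000 * (tone_on - lip_on) / FS:.0f} ms")
```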

2.4.1 Ultrasound

Ultrasound imaging (also referred to as (ultra)sonography) is a non-invasive technique used in phonetic research to produce dynamic images of the sagittal tongue shape. It allows vocal tract characteristics, tongue shape, and tongue motion to be investigated over time (for reviews see Gick 2002; Stone 2010). It uses the reflective properties of ultra-high-frequency sound waves (which humans cannot hear) to create images of the inside of the body (Figure 2.4). The sound wave penetrates soft tissues and fluids, but it bounces back off surfaces or tissues of a different density, as well as off air. The variation in reflected echoes is processed by computer software and displayed as a video image. Ultrasound has some limitations, however (see Stone 2010). For example, because of its relatively low sampling rate (generally lower than 90 Hz), it does not allow fine-grained dynamic characteristics of tongue movements to be investigated, unlike EMA, which provides sampling rates of up to 1,250 Hz. In addition, it is generally unable to capture the shape of the tongue tip or any region lying beyond tissue or air that reflects the ultrasound. Nevertheless, ultrasound imaging allows researchers to examine detailed real-time lingual postures not easily captured by methods such as EPG and EMA, including the tongue groove and the tongue root (see Lulich et al. 2018). Some systems are also portable and inexpensive (Gick 2002). For these reasons, ultrasound has increasingly been used in various phonetic studies (Stone 2010; Carignan 2017; Ahn 2018; Strycharczuk and Sebregts 2018; Tabain and Beare 2018), including studies involving young children (Noiray et al. 2013). Lehnert-LeHouillier et al. (2010) used an ultrasound imaging system to investigate prosodic strengthening in English, as reflected in the tongue shape for mid vowels in domain-initial position under different prosodic boundary strengths.
They found a cumulatively increasing magnitude of tongue lowering as boundary strength increased for vowels in vowel-initial (VC) syllables, but not in consonant-initial (CVC) syllables. Based on these results, the authors suggested that boundary strengthening is localized in the initial segment, whether consonantal or vocalic.

Figure 2.4 Tongue shapes in ultrasound. (Photo by Aude Noiray at the Laboratory for Oral Language Acquisition.)

In a related ultrasound study of lingual shapes for initial vowels in French, Georgeton et al. (2016) showed that prosodic strengthening of domain-initial vowels is driven by an interaction between two kinds of factors. These are linguistic factors (such as phonetic distinctiveness in the perceptual vowel space) and physiological constraints imposed on the different tongue shapes across speakers. Davidson and Stone (2003) explored the articulatory nature of the insertion of a schwa-like element into a consonantal sequence, using ultrasound imaging to investigate how phonotactically illegal consonantal sequences may be repaired. Their tongue motion data for the production of /zC/ sequences in pseudo-Polish words suggested that native speakers of English employed different repair mechanisms for the illegal sequences. These repairs were gradient, in line with a gestural account: they arose from interpolation between the flanking consonants, without a positional target in the articulatory space (Browman and Goldstein 1992a). It remains to be seen how such a gradient repair may be further conditioned by prosodic structure. An ultrasound imaging system may also be used in conjunction with laryngoscopy. Moisik et al. (2014) employed simultaneous laryngoscopy and laryngeal ultrasound (SLLUS) to examine Mandarin tone production. In SLLUS (see also Esling and Moisik 2012), laryngoscopy is used to obtain real-time video images of the glottal condition, which provide information about laryngeal state, while laryngeal ultrasound is used simultaneously to record changes in larynx height.
Results showed no positive correlation between larynx height and f0 in the production of Mandarin tones except for low f0 tone targets that were found to be accompanied by larynx raising due to laryngeal constriction (as low tone often induces creakiness). This study implies that larynx height may be controlled to help facilitate f0 change, especially under circumstances in which f0 targets may not be fully accomplished (e.g. due to vocal fold inertia). Despite the invasiveness of laryngoscopy, the innovative technique was judged to be particularly useful in exploring the relation between f0 regulation and phonation type and their relevance to understanding the production of tones and tonal register targets.
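The practical effect of the frame-rate limitation noted earlier in this section can be illustrated with a small Python calculation. The two rates compared are the ones cited in the text: ultrasound generally below 90 Hz, and up to 1,250 Hz for a recent EMA system. The movement (10 mm in 40 ms), its raised-cosine shape, and the central-difference velocity estimate are hypothetical choices, and real tongue movements and tracking pipelines differ. At 90 Hz the frame interval is about 11.1 ms. A 40 ms movement is therefore covered by only three or four frames, and the estimated peak velocity falls well short of the true value.

```python
import math

START, DUR, AMPLITUDE = 0.100, 0.040, 10.0  # s, s, mm; hypothetical fast tongue movement


def displacement(t):
    """Toy raised-cosine movement of AMPLITUDE mm lasting DUR s."""
    if t <= START:
        return 0.0
    if t >= START + DUR:
        return AMPLITUDE
    return AMPLITUDE * (1 - math.cos(math.pi * (t - START) / DUR)) / 2


def estimated_peak_velocity(fs, total=0.3):
    """Peak speed (mm/s) from central differences on the movement sampled at fs Hz."""
    samples = [displacement(i / fs) for i in range(int(total * fs) + 1)]
    return max(abs(samples[i + 1] - samples[i - 1]) * fs / 2
               for i in range(1, len(samples) - 1))


TRUE_PEAK = AMPLITUDE * math.pi / (2 * DUR)  # analytic maximum of the derivative

for fs in (90, 1250):  # rates cited in the text for ultrasound and a recent EMA system
    est = estimated_peak_velocity(fs)
    print(f"{fs:5d} Hz: frame interval {1000 / fs:5.1f} ms, "
          f"peak velocity {est:6.1f} mm/s ({100 * est / TRUE_PEAK:.0f}% of true value)")
```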

2.4.2 Electropalatography

Electropalatography (EPG) is a technique for monitoring linguo-palatal contact (i.e. contact between the tongue and the hard palate) and its dynamic change over time during articulation. The subject wears a custom-fabricated artificial palate, usually made of thin acrylic, which is held in place by wrapping around the upper teeth (Figure 2.5). Several dozen electrodes are placed on the artificial palate. The electrodes contacted by the tongue during articulation send signals to an external processing unit, indexing details of tongue activity during articulation. Unlike EMA, which is usually used to track articulatory movements in the midsagittal plane, EPG records tongue contact anywhere on the palate (i.e. the target of the tongue movement). EPG is generally limited to investigating the production of consonants and high vowels, which involve contact between the lateral margins of the tongue and the sides of the palate near the upper molars (Byrd et al. 1995; Gibbon and Nicolaidis 1999; Stone 2010; see Ünal-Logacev et al. 2018 for the use of EPG with an aerodynamic device).

[Figure 2.5: EPG contact profiles for /t/, /s/, /k/, and /ç/]
Figure 2.5 Contact profiles in electropalatography for different stops and fricatives. Black squares indicate contact of the tongue surface with the palate (upper row = alveolar articulation, lower row = velar articulation). (Photo taken at IfL Phonetics Lab, Cologne.)

In prosodic research, EPG has been used to measure prosodic strengthening in the production of lingual consonants across languages: English (Fougeron and Keating 1997; Cho and Keating 2009), French (Fougeron 2001), Korean (Cho and Keating 2001), Taiwanese (Hsu and Jun 1998), and German (Bombien et al. 2010) (see Keating et al. 2003 for a comparison of four languages). These studies have shown that consonants have a greater degree of linguo-palatal contact when they occur in initial position of a higher prosodic domain and when they occur in stressed and accented syllables. They provide converging evidence that the low-level articulatory realization of consonants is fine-tuned by prosodic structure. As a consequence, prosodic structure itself is expressed, at least to some extent, by systematic low-level variation in consonantal strengthening. This variation is roughly proportional to the degree of prosodic strengthening associated with a given position in the prosodic structure. EPG has also been used to explore the effects of syllable position and prosodic boundaries on the articulation of consonant clusters (Byrd 1996; Bombien et al. 2010). Byrd (1996), for example, showed that a consonant in English is spatially reduced (with less linguo-palatal contact) in coda position compared to the same consonant in onset position. She also showed that an onset cluster (e.g. /sk/) overlaps less than a coda or a heterosyllabic cluster of the same segmental make-up. In relation to this study, Bombien et al. (2010) examined the articulation of initial consonant clusters in German and reported that at least for some initial clusters (e.g.
/kl/ and /kn/), boundary strength induced spatial and/or temporal expansion of the initial consonant, whereas stress caused temporal expansion of the second consonant, and both resulted in less overlap. These two studies together imply that coordination of consonantal gestures in clusters is affected by elements of prosodic structure, such as syllable structure, stress, and prosodic boundaries.
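A simple index of the degree of linguo-palatal contact is the percentage of electrodes contacted in an EPG frame, either overall or within a region of the palate. The minimal Python sketch below assumes a 62-electrode layout (6 electrodes in the front row and 8 in each of the other seven rows), as on Reading-style palates; other palates differ. The region boundary and the two /t/ frames are hypothetical illustrations, not data from the studies cited.

```python
# Assumed layout: 8 rows from front (row 1) to back (row 8); the front row has
# 6 electrodes and the others 8, i.e. 62 electrodes in total (a Reading-style palate).
ROW_SIZES = [6, 8, 8, 8, 8, 8, 8, 8]
ANTERIOR_ROWS = range(0, 3)  # hypothetical 'alveolar region' = rows 1-3


def percent_contact(frame, rows=None):
    """Percentage of contacted electrodes ('1') in one EPG frame, optionally in selected rows."""
    if [len(r) for r in frame] != ROW_SIZES:
        raise ValueError("frame does not match the assumed electrode layout")
    selected = range(len(frame)) if rows is None else rows
    cells = [c for i in selected for c in frame[i]]
    return 100.0 * cells.count("1") / len(cells)


# Hypothetical /t/ closure frames (not data from the studies cited)
t_domain_initial = ["111111", "11111111", "11000011", "10000001",
                    "10000001", "10000001", "10000001", "00000000"]
t_domain_medial = ["111111", "11000011", "10000001", "10000001",
                   "00000000", "00000000", "00000000", "00000000"]

for label, frame in [("initial", t_domain_initial), ("medial", t_domain_medial)]:
    print(f"{label:8s} total {percent_contact(frame):5.1f}%  "
          f"rows 1-3 {percent_contact(frame, ANTERIOR_ROWS):5.1f}%")
```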


2.5 Summary of articulatory measurement techniques

Table 2.1 summarizes advantages and disadvantages of several articulatory measurement techniques that we have introduced in this chapter.

Table 2.1 Advantages and disadvantages of articulatory measuring techniques

Laryngoscopy: provides high-speed motion pictures of larynx activities
Advantages: Direct observation of vocal fold vibrations; drawing inferences about different glottal states and activity of the vocalis muscle
Disadvantages: Relatively invasive; not ideal for obtaining a large quantity of data from a large population

EGG: monitors glottal states using electrical impedance
Advantages: Non-invasive; relatively easy to handle for monitoring vocal fold vibration and different glottal states during phonation with a larger subject pool
Disadvantages: Not as accurate as laryngoscopy; indirect observation of larynx activity estimated by a change in electrical impedance across the larynx

RIP: estimates lung volume change during speech by measuring expansion and recoil of elastic bands wrapped around the thoracic and abdominal cavities
Advantages: Non-invasive device; useful for testing the relationship between the respiratory process and global prosodic planning in conjunction with acoustic data
Disadvantages: Difficult to capture fine detail at the segmental level; useful for testing prosodic effects associated with large prosodic constituents, such as an intonational phrase

EMA: tracks positions and movements of sensors that are attached to several articulators (e.g. tongue, lips, jaw, velum), within an electromagnetic field
Advantages: Data obtained at high sampling rates; useful for examining kinematics for both consonants and vowels; simultaneous observation of multiple articulators; high temporal and spatial resolution; applicable to manual and facial movements in prosody; three-dimensional devices available; used for recording a larger population with a recent device
Disadvantages: Limited to point tracking; usually used to track movements of points along the midsagittal plane; difficult to capture the surface structure (e.g. the complete shape of the tongue); can only be used to observe the anterior vocal tract; quite invasive with sensors; possibly impedes natural articulation

Ultrasound: provides imaging of tongue position and movement
Advantages: Data obtained at relatively high sampling rates (though not as high as a recent EMA system); generally non-invasive; used for measuring the real-time lingual postures during vowel and consonant production; a portable device is available
Disadvantages: Difficult to image the tongue tip (usually about 1 cm), some parts of the vocal tract (e.g. the palatal and pharyngeal wall), and the bony articulator (e.g. the jaw); because tongue contours are tracked relative to probe position, it is difficult to align them to a consistent reference, such as hard palate structure

EPG: measures linguo-palatal contact patterns using individual artificial palates with incorporated touch-sensitive electrodes
Advantages: Useful for examining linguo-palatal contact patterns along the parts of the palate over time, especially for coronal consonants; provides information about the width, length, and curvature of constriction made along the palatal region
Disadvantages: Custom-made artificial palates needed for individual subjects; often impedes natural speech; restricted to complete tongue–palate contacts; no information about articulation beyond the linguo-palatal contact (e.g. non-high vowels, labials, velars)

Optoelectronic device (Optotrak): provides three-dimensional motion data (using optical measurement techniques) with active markers placed on surfaces
Advantages: Non-invasive; data with high temporal and spatial resolution; useful for capturing movements of some articulators (jaw, chin, lips) as well as all other visible gestures (e.g. face, head, hand)
Disadvantages: Limited to point tracking; limited to external use (i.e. to ‘visible’ movements); impossible to capture articulation inside the vocal tract; the system requires line of sight to track the markers

Real-time MRI: provides real-time imaging data of complex structures and articulatory movements of the vocal tract using strong magnetic fields
Advantages: Non-invasive with high spatial resolution; useful for rich anatomical data analysis as well as for analysis of articulatory movements of the entire vocal tract (including the pharyngeal area)
Disadvantages: Relatively poor time resolution; subjects are recorded in supine posture, which may influence speech; concurrent audio from production is very difficult to acquire because of scanner noise

2.6 Conclusion

Prosody research has benefited from various experimental techniques over the past decades and has, as a result, extended its scope. It now embraces investigations of various phonetic events, both laryngeal and supralaryngeal, that occur in the interplay between segmental phonetics and prosodic structure. Careful examination of the fine-grained phonetic detail that arises in the phonetics–prosody interplay has no doubt advanced our understanding of the regulatory mechanisms and principles that underlie the dynamics of articulation and prosody. Prosodic features and segmental features are not implemented independently. The coordination between the two systems is modulated in part by universally driven physiological and biomechanical constraints imposed on them. It is also modulated in part by linguistic and communicative factors that must be internalized in the phonological and phonetic grammars in language-specific ways. Future prosody research using articulatory measuring techniques of the type introduced in this chapter will undoubtedly continue to uncover the physiological and cognitive underpinnings of the articulation of prosody. This will illuminate the universal versus language-specific nature of the prosody that underlies human speech.


Acknowledgements

This work was supported in part by the Global Research Network programme through the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF2016S1A2A2912410), and in part by the German Research Foundation as part of SFB 1252 ‘Prominence in Language’ in the project A04 ‘Dynamic Modelling of Prosodic Prominence’ at the University of Cologne.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 3

Fundamental Aspects in the Perception of f0

Oliver Niebuhr, Henning Reetz, Jonathan Barnes, and Alan C. L. Yu

3.1 Introduction

The mechanisms and cognitive processes that lead to the physiological sensation of pitch in acoustic signals are complex. On the one hand, the basic mechanisms and processes involved in the perception of pitch in a speech signal work on the same principles that apply to any other acoustic signal. On the other hand, pitch perception in speech is nevertheless a special case, for two reasons. First, speech signals show variation and dynamic changes in many more time and frequency parameters than many other acoustic signals, in particular those that are used as psychoacoustic test stimuli. Second, speech signals convey meaning and thus give the listener’s brain more possibilities for ‘top-down’ interpretation and prediction than other signals. Focusing on fundamental points, this chapter therefore addresses both the general principles by which pitch is created and the more specific aspects of this process in speech. The basic mechanisms and processes are dealt with in §3.2, embedded in their historical development. Next, §3.3 addresses speech-specific aspects of pitch perception, from the detection of changes in f0 to the influences of segments and prosodies. Finally, §3.4 concludes that speech scientists need to take special care when analysing visually represented f0 contours in terms of perceived pitch contours.


30 OLIVER NIEBUHR, HENNING REETZ, JONATHAN BARNES, AND ALAN C. L. YU

3.2 A history of fundamental pitch perception research

Pitch perception theories can roughly be divided into three categories:

1. Place (or tonotopic) theories assume that the pitch of a signal depends on the places of stimulation on the basilar membrane in the inner ear, which code the spectral properties of the acoustic signal.
2. Rate theories assume that the temporal distance between the firings of neurons determines the perceived pitch, so that pitch is coded by the temporal properties of a signal.
3. Other theories assume place coding for lower-frequency components and rate coding for higher-frequency components.

We present the most important results of nearly 200 years of experimentation that have contributed to the development of current theories. For this discussion, it is essential to be aware of the differences between the periodicity frequency (rate), the fundamental frequency, and the pitch of a signal.

3.2.1 Basic terminology

According to Fourier’s theorem, a periodic signal can be represented by an infinite sum of sinusoids (see Oppenheim 1970). These sinusoids are uniquely determined by their frequency, amplitude, and phase. In speech perception, the phase is ignored, because our hearing system evaluates it only for directional hearing and not for speech perception per se (Moore 2013). The squared amplitude (i.e. the power) of a signal is usually displayed in a (power) spectrum, which shows the individual frequency components of a signal. The development of a spectrum over time is represented in a spectrogram. Frequencies are given in hertz (Hz) and power on a decibel (dB) scale. The transformation from the acoustic signal in the time domain to the spectrum in the frequency domain is usually performed with a Fourier transformation of short stretches of speech. Windowing can reduce the artefacts caused by cutting speech into such stretches (Harris 1978), but it cannot eliminate them.

A pure (sine) tone of, for example, 100 Hz has a period duration of 10 ms and has only this frequency component in its spectrum. Consider a complex signal composed of several pure tones whose frequencies are whole multiples of its lowest sine component (e.g. 100, 200, 300, and 400 Hz). Its f0 is equivalent to this lowest component (here: 100 Hz), and the multiples are called ‘harmonics’. All frequency components show up in the spectrum, and the period duration of this complex signal is that of the fundamental frequency, here: 10 ms. A complex signal can also consist of harmonics of which some, even the fundamental, have no energy (e.g. 200, 300, and 400 Hz). This complex tone still has a period duration of 10 ms. Its period frequency, which is the inverse of the period duration, is 100 Hz. The fundamental frequency is therefore also 100 Hz, even though it has no energy, which is why it is usually said to be ‘missing’.

Pitch is a physiological sensation, not a physical property. The debate in pitch perception is whether the periodicity of a speech signal or its harmonic structure is more important for the perception of its pitch. There are two theories, both based on series of experiments, as discussed in the following sections.
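The missing-fundamental case can be checked numerically. The sketch below uses plain Python. It assumes a 16 kHz sampling rate and uses a simple autocorrelation peak picker, which is our illustrative choice and not a method described in this chapter. The sketch builds a complex tone from 200, 300, and 400 Hz only. It then confirms that the spectrum has no energy at 100 Hz and recovers a 10 ms period, that is, a period frequency of 100 Hz. The function names and the 50–500 Hz search range are hypothetical.

```python
import cmath
import math

def autocorr_period_frequency(x, fs, fmin=50.0, fmax=500.0):
    """Return fs / lag for the autocorrelation peak among lags spanning fmin..fmax (hypothetical helper)."""
    n = len(x)
    lags = range(int(fs / fmax), int(fs / fmin) + 1)   # candidate periods in samples

    def r(lag):
        # Unnormalized autocorrelation at one lag.
        return sum(x[i] * x[i + lag] for i in range(n - lag))

    best_lag = max(lags, key=r)
    return fs / best_lag

def dft_magnitude(x, freq_hz, fs):
    """Magnitude of the discrete Fourier transform of x evaluated at a single frequency."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, v in enumerate(x)))

fs = 16000          # sampling rate in Hz (assumed)
n = fs // 10        # 100 ms of signal
# Harmonics 2-4 of 100 Hz; the 100 Hz fundamental itself is absent.
x = [sum(math.sin(2 * math.pi * f * i / fs) for f in (200, 300, 400)) for i in range(n)]

print(dft_magnitude(x, 100, fs) < 1e-6 * dft_magnitude(x, 200, fs))  # True: no energy at 100 Hz
print(autocorr_period_frequency(x, fs))                                # 100.0: a 10 ms period
```

The period is a property of the waveform as a whole. The autocorrelation therefore peaks at a lag of 10 ms even though no component at 100 Hz is present.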

3.2.2 Theories of pitch perception

It was long believed that the fundamental frequency (i.e. the first harmonic) of a signal must be present for it to be perceived as the pitch of a signal (see de Cheveigné 2005). This changed when Seebeck (1841) used air sirens to create tones with little energy at the fundamental frequency. He observed that the perceived pitch was still that of the weak fundamental. He argued that it was perceived on the basis of the rate of the periodic pattern of air puffs. Ohm (1843) objected to this explanation. He assumed instead that listeners hear each harmonic in a complex tone according to the mechanical resonance properties of the cochlea in the inner ear, with the fundamental as the strongest harmonic. Because this view makes the presence of the fundamental essential, Ohm assumed that non-linear distortions in the outer and middle ear had somehow reintroduced the fundamental in Seebeck’s experiments. On this assumption, Helmholtz (1863) argued that the inner ear functions as a resonator and a ‘place coder’ for the fundamental frequency. The experimental evidence for rate coding of pitch, and the absence of the fundamental in Seebeck’s experiments, were thus explained away. The explanation combined the place theory with the assumption that the fundamental frequency is reintroduced in the outer or middle ear. The place theory remained the accepted theory and was supported by Békésy’s (1928) findings about the frequency-dependent elongation of different places along the basilar membrane in the inner ear. This theory was challenged by Schouten (1938, 1940a), who electronically generated complex tones with no energy at the fundamental. His findings paralleled Seebeck’s in that the perceived pitch was that of the missing fundamental.
He also added pure tones close to the frequency of the eliminated fundamental. If the fundamental had been physically reintroduced, as Ohm proposed, these tones should have produced interference patterns in the form of a waxing and waning loudness, so-called beats (Moore 2013). Since no such beats were perceived, Schouten showed that there was no energy at the fundamental frequency in the listener’s ear. Licklider (1954) provided further evidence that the fundamental is absent. He presented stimuli in which the frequency region of the fundamental was masked by noise, and this did not prevent listeners from perceiving the pitch at that frequency. More advanced theories were needed to explain these experimental observations.

3.2.3 Critical bands and their importance for pitch perception theories

Fletcher (1940) introduced the term ‘critical bands’ for the notion that two simultaneous frequencies must be sufficiently different to be perceived as separate tones rather than as a single complex tone. (Think of flying insects in a bedroom at night. Several insects can sound like a single bigger one unless their wing-beat frequencies fall into different ‘critical bands’.) Several experiments with different methodologies have replicated this finding (see Moore 2013). The question of whether pitch perception follows a rate or a place principle was thus turned into an issue relating to the widths of critical bands. These bands are narrower at low frequencies and wider at higher frequencies, while the harmonics of a complex tone are equidistant. Consequently, the lower harmonics can be fully separated, since they rarely fall into the same critical band. Higher harmonics, in contrast, readily fall into the same critical band and are therefore not separable. At the same time, temporal selectivity is better in the higher frequency range than in the lower frequency range. The question of pitch perception as either rate or place thus became a question of resolving individual harmonics. If the resolvable lower harmonics are important for pitch perception, then the coding can be explained by the place of stimulation along the basilar membrane. If pitch perception is guided by the higher harmonics, then coding must rely on the time intervals between the firings of individual nerve fibres, because the higher harmonics cannot be resolved by place along the basilar membrane.

To show that higher harmonics can determine pitch perception, Schouten (1940b) conducted experiments using signals with a non-trivial harmonic structure. The signals consisted of three equally spaced harmonics in a range where place coding is unlikely (e.g. 1,800, 2,000, and 2,200 Hz). The missing fundamental of these three harmonics (here: 200 Hz) was, as predicted, perceived as the pitch. He then shifted all frequency components by a constant (e.g. 40 Hz), generating the inharmonic components 1,840, 2,040, and 2,240 Hz. The question was: which frequency do subjects perceive? The pitch should correspond to 200 Hz if it is derived from the spacing between the components. If instead pitch is something like the greatest common divisor of the components, then the percept should correspond to a missing fundamental at 40 Hz. The three components are the 46th (1,840 Hz), 51st (2,040 Hz), and 56th (2,240 Hz) harmonics of this 40 Hz fundamental. However, the subjects in Schouten’s experiment most often perceived a pitch of 204 Hz, and sometimes 185 or 227 Hz. These findings subsequently appeared to be explained by the firing rate of neurons that operate as ‘peak pickers’, firing at signal maxima (Schouten et al. 1962). Concentrating on the fine structure of the waveform, de Boer (1956) measured the distances between the peaks of the amplitude signal. A distance of 4.9 ms occurred most often, corresponding to a pitch of 204 Hz. Distances of 5.4 and 4.4 ms occurred less frequently, corresponding to 185 and 227 Hz, respectively. Walliser (1968, 1969) gave an explanation based on a spectral decomposition of the signal. He observed that 204.4 Hz is a subharmonic of 1,840 Hz, and that 185.4 and 226.6 Hz are subharmonics of 2,040 Hz. He suggested that the listener’s brain first ‘computes’ a rough power spectrum of the resolvable harmonics. A central processor then uses pattern-recognition techniques to determine the pitch. In this way, the virtual fundamental frequency with the best fit is selected as the perceived pitch. Explaining pitch perception with only approximately matching quasi-fundamental frequencies is actually an improvement over exactly matching frequencies. A speech signal never has a perfectly harmonic structure, and the ear must deal with such imperfect structures.
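The competing predictions can be made concrete with a little arithmetic. The sketch below computes the component spacing and the greatest common divisor of Schouten’s shifted components. As a stand-in for the ‘best fit’ idea, it also computes a least-squares f0 under the hypothetical assumption that the three components are consecutive harmonics n, n+1, and n+2. This fitting rule and the name `best_fit_f0` are our illustration, not Walliser’s or de Boer’s published procedure. Even so, the results land close to the 204, 227, and 185 Hz percepts reported above.

```python
from functools import reduce
from math import gcd

components = [1840, 2040, 2240]   # Hz: Schouten's components shifted by 40 Hz

spacing = components[1] - components[0]   # 200 Hz, the 'spacing' prediction
common_divisor = reduce(gcd, components)   # 40 Hz, the 'greatest common divisor' prediction

def best_fit_f0(freqs, first_harmonic):
    """Least-squares f0 if freqs are read as consecutive harmonics first_harmonic, +1, +2, ...

    Minimises sum((f_i - n_i * f0) ** 2), which gives f0 = sum(n_i * f_i) / sum(n_i ** 2).
    """
    ns = range(first_harmonic, first_harmonic + len(freqs))
    return sum(n * f for n, f in zip(ns, freqs)) / sum(n * n for n in ns)

print(spacing, common_divisor)             # 200 40
for n in (8, 9, 10):
    print(n, round(best_fit_f0(components, n), 1))
# 8 226.4
# 9 204.0
# 10 185.5
```

Reading the components as the 9th to 11th harmonics gives about 204 Hz, the most frequent percept. Shifting the assumed harmonic numbers by one in either direction yields the two less frequent alternatives.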


3.2.4 Which components are important?

Plomp (1967) and Ritsma (1967) conducted a number of experiments on fundamentals between 100 and 400 Hz. They concluded that a frequency band around the second, third, and fourth harmonics was crucial for the perception of pitch, with periodicity of the signal being the important factor. In contrast, place proponents (e.g. Hartmann 1988; Gulick et al. 1989) argued that the resolved harmonics (i.e. the lowest multiples of f0) determine the pitch through their tonotopic distances. There was support for both theories (for detailed overviews see Plomp 1967; Moore 2013). The experiments therefore gave no clear insight into whether pitch is perceived in the temporal (rate) or the spectral (place) domain.

Houtsma and Goldstein (1972) conducted an important experiment in which they presented complex tones composed of two pure tones (e.g. 1,800 and 2,000 Hz) to experienced listeners. These listeners were able to identify the missing fundamental of 200 Hz. The pitch percept was present even when one harmonic was presented to the left ear and the other to the right ear. Houtsma and Goldstein assumed a central pitch processor in the listener’s brain that receives place-pitch information from each ear and integrates this information into a single central pitch percept. In addition, Bilsen and Goldstein (1974) found that subjects can perceive pitch in white-noise signals presented to both ears but delayed in one ear by 2 ms or more. The pitch percept is weak, but similar to the percept when delayed and undelayed signals are presented to the same ear (Bilsen 1966). This again points to a central mechanism (Bilsen 1977). Goldstein (1973) proposed an ‘optimum processor’ theory, in which a central estimator receives information about the frequencies of resolvable simple tones. The estimator tries to interpret resolved tones as harmonics of some fundamental and looks for a best fit. This analysis can be based on either place or rate information.
Wightman (1973) suggested a ‘pattern-transformation’ theory. This is a sort of cepstrum analysis that roughly corresponds to a phase-insensitive autocorrelation function of the acoustic signal. Terhardt (1974) explained pitch perception in his ‘virtual pitch’ theory via two components. The first is a learning matrix of the spectral pitches of pure tones, in which harmonics leave ‘traces’. The second is an analytic mode for later identifying the pitch. In a later version of his theory (Terhardt et al. 1982), he included the concepts of analytical listening (‘hearing out’ individual harmonics of a signal) and holistic listening (perceiving the spectral shape as a whole). The close relationships between these three theories are brought out in de Boer (1977). All three in fact have peripheral stages where a rough power spectrum of the resolved harmonics is computed. All three also have a central processor that uses pattern-recognition techniques to extract the pitch of a signal. None of the theories offers a complete explanation of how pitch is computed from harmonic information alone. Nevertheless, binaural pitch perception and similar phenomena cannot be explained by a peripheral pitch processor alone. The central processing theories can explain a variety of experimental findings. However, they still fail to account for the decreasing spectral resolution of higher harmonics due to the critical bands, which predicts good spectral coding for lower frequencies and better temporal coding for higher frequencies. This led Licklider (1951), de Boer (1956), Moore and Glasberg (1986), and Houtsma and Smurzynski (1990) to argue for a dual mechanism of pitch perception. They propose a frequency-selectivity mechanism for lower harmonics and a rate analysis for higher harmonics. They argue that this dual mechanism is more flexible, and thus more robust, than a single mechanism.
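Which harmonics count as ‘resolved’ can be estimated by comparing the harmonic spacing, which equals f0, with the bandwidth of the auditory filter centred on each harmonic. The sketch below does this with the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore (1990), ERB(f) = 24.7(4.37f/1000 + 1) Hz. This is a later measure that stands in here for Fletcher’s critical bands and is not used in this chapter. The criterion of treating a harmonic as resolved whenever f0 exceeds the local ERB is a deliberately crude assumption. The function names are hypothetical.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz (Glasberg and Moore 1990)."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)

def resolved_harmonics(f0, max_harmonic=20):
    """Harmonic numbers whose spacing (f0) exceeds the local ERB.

    Crude, assumed criterion: harmonic n counts as resolved if f0 > ERB(n * f0).
    """
    return [n for n in range(1, max_harmonic + 1) if f0 > erb_hz(n * f0)]

print(resolved_harmonics(100))   # [1, 2, 3, 4, 5, 6]
print(resolved_harmonics(200))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

On this criterion, Schouten’s components at 1,800–2,200 Hz are harmonics 9–11 of 200 Hz. They fall outside the resolved range, which is consistent with the statement that place coding is unlikely there.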


There is some evidence for inter-individual differences in the use of the two mechanisms. Ladd et al. (2013) adopted the terms ‘spectral listeners’, who use spectral patterns to perceive pitch (i.e. place coding), and ‘f0 listeners’, whose pitch perception follows the signal’s periodicity (i.e. rate coding). Some listeners seem to prefer place coding and some rate coding. No one, however, seems to be either a pure ‘spectral listener’ or a pure ‘f0 listener’. Moreover, the nature of the stimulus itself is also relevant. Many listeners switch between the place and rate mechanisms, and the number of ‘spectral listeners’ increases for stimuli with lower-frequency components. Virtual pitch perception therefore appears to dominate in natural, unmanipulated speech signals and in everyday conversational situations.

The bottom line of §3.2 is thus that pitch perception in complex sound signals relies on multi-layer, signal-adaptive cognitive mechanisms. In these mechanisms, f0 need not be physically present, nor is it directly translated into its psychoacoustic counterpart. Pitch is virtual, and this becomes even clearer when we shift the focus to speech signals. The following sections on pitch perception in speech begin with listeners’ sensitivity to f0 change (§3.3.1). They then move successively deeper into segments and prosodies, explaining how speech segments hinder and support the perception of f0 change (§3.3.2) and how pitch perception is shaped by duration and intensity (§3.3.3).

3.3 Pitch perception in speech

3.3.1 Just noticeable differences and limitations in the perception of f0

Just noticeable differences (JNDs) play a crucial role in the analysis of speech melody. We need to know how fine-grained a listener’s resolution of pitch differences is in order to separate relevant from irrelevant pitch changes. However, psychoacoustic JNDs often considerably underestimate the actual JNDs of listeners in the perception of speech melody. This is probably because of the limited cognitive processing capacity of the auditory system. That capacity allows for more fine-grained resolution of pitch differences in simple, steady psychoacoustic stimuli than in complex and variable speech stimuli (Klatt 1973; Mack and Gold 1986; House 1990). For example, psychoacoustic studies suggest that the JND between two f0 levels is as low as 0.3–0.5%, that is, only 1 Hz or even lower for typical speech f0 values (Flanagan and Saslow 1958). In contrast, studies based on real speech or speech-like stimuli have found that listeners only detect f0 changes larger than 4–5%, that is, 5–10 Hz or about 1 semitone (ST) (Isačenko and Schädlich 1970; Rossi and Chafcouloff 1972). Fry’s (1958) experiments on f0 cues to word stress also support this JND level. Reverberation in a room scrambles the phase relationships between individual frequencies of the signal. When this happens, listeners’ sensitivity to pitch differences decreases further, to about 10% (Bernstein and Oxenham 2006). This 10%, or roughly 2 ST, is about 20 times higher than the JND specified in psychoacoustic research.

Many phonetic analyses nevertheless reflect 2 ST as a realistic threshold for the detection of pitch changes in everyday speech. For instance, annotating intonation often involves deciding whether two sequential pitch accents are linked by a high plateau or a sagging transition. This is particularly so in languages that distinguish a ‘hat pattern’ from a ‘dip pattern’ (e.g. German). Ambrazaitis and Niebuhr (2008) found that German listeners need a difference of at least 2–3 ST before their meaning-based judgements switch from hat- to dip-pattern identification. Similarly, communicatively relevant steps in stylized intonation contours such as the calling contour are at least 1–2 ST (Day-O’Connell 2013; Niebuhr 2015; Arvaniti et al. 2017; Huttenlauch et al. 2018). Furthermore, listeners seem equally insensitive to differences in f0 range, that is, to the magnitude of f0 movements. Several experiments on this topic suggest that the JND for f0 ranges is at least 1 ST, and perhaps much higher, at 2–3 ST (Pierrehumbert 1979; ’t Hart 1981). This becomes relevant for the minor f0 drop that frequently occurs at the end of phrase-final rises, as in the audio examples for Kohler (2005) shown in IPDSP (2009: fig. 3). From a purely visual perspective, labelling such phrase endings with a downstepped high (or low) rather than a high boundary tone seems justifiable. However, these drops are typically less than 2–3 ST and hence not audible (though see Jongman et al. 2017 on language specificity). Figure 7(c) in the audio examples for Gussenhoven (2016) shows a rare case in which a short f0 drop at the end of a phrase-final rise is just above the JND and hence audible for listeners (358–305 Hz, 2.8 ST).

Rossi (1971) made another important observation with respect to f0-range perception. The perceived pitch range can be considerably smaller than the measured f0 range, varying between 66% and 100% of the f0 movement, with steeper movements resulting in smaller perceived ranges. In Figure 3.1, for example, the continuation rises depicted for the first two words have roughly the same frequency range and end at similar levels (rise 1: 174–275 Hz; rise 2: 180–272 Hz).
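The thresholds above are stated in semitones, but f0 tracks are usually in hertz, so a conversion is needed. The sketch below uses the standard interval formula 12·log2(f2/f1). The 2 ST default in `exceeds_jnd` is an assumed, conservative threshold taken from the discussion above. Both function names are hypothetical.

```python
import math

def semitones(f_from_hz, f_to_hz):
    """Signed interval in semitones from one f0 value to another."""
    return 12 * math.log2(f_to_hz / f_from_hz)

def exceeds_jnd(f_from_hz, f_to_hz, jnd_st=2.0):
    """True if the f0 change is larger than an assumed, conservative JND in semitones."""
    return abs(semitones(f_from_hz, f_to_hz)) > jnd_st

print(round(semitones(358, 305), 1))   # -2.8: the phrase-final drop in Gussenhoven's example
print(exceeds_jnd(358, 305))           # True
print(round(semitones(100, 110), 2))   # 1.65: a 10% change is a little under 2 ST
```

Applied to the two rises in Figure 3.1, the same function gives about 7.9 ST (174–275 Hz) and about 7.1 ST (180–272 Hz).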
Yet the second rise starts later, is steeper, and thus perceptually ends at a lower pitch level than the first. In this case, Rossi’s finding has no phonological consequence, but it could easily have one. This would happen, for example, when the f0 contour suggests an upstep from a prenuclear to a nuclear pitch accent. Listeners might then not perceive an upstep because of the steepness of the nuclear rise. Rossi discovered a correlation between the steepness of f0 slopes and the perceived range of the corresponding pitch movements. This correlation could also indirectly explain the findings of Michalsky (2016). His original aim was to determine the transition point between question and statement identification in a slope continuum of phrase-final f0 movements in German. Instead, he stumbled upon a variation in identification behaviour that was linked to speaking rate. This link suggests that the faster speakers talk, the smaller the perceived pitch ranges of the f0 movements they produce, all else being equal, including the steepness of the f0 slope. A faster speaking rate may make f0 movements appear steeper, which would then, in line with Rossi’s findings, reduce the perceived pitch range.

[Figure 3.1: f0 contour with the phrase labels (1) ‘Computer,’ | ‘Tastatur und’ | ‘Bildschirm.’]

Figure 3.1 Enumeration ‘Computer, Tastatur und Bildschirm’ spoken by a female German speaker in three prosodic phrases (see also Phonetik Köln 2020). (From the G_ToBI training material)


Additionally, not every f0 rise or fall is even perceived as a pitch movement. Dynamic pitch perception requires certain relations between the duration and range of an f0 movement. Below this ‘glissando threshold’, f0 movements appear to listeners as stationary tones. The shorter a frequency transition is, the steeper it must be to induce a dynamic percept (Sergeant and Harris 1962). Likewise, a relatively flat movement must be fairly long to be perceived as dynamic, which is consistent with the results of Rossi (1971). ’t Hart et al. (1990) integrate these results on glissando perception into a formula. It states that the minimum f0 slope (in ST/s) that a movement needs to be perceived as a pitch movement equals a constant factor divided by the square of the movement’s duration. A short f0 movement (50 ms) must have a slope of at least 46 ST/s to yield a movement percept. This approaches the limits of human production velocity, at least for rises (Xu and Sun 2002). For a 100 ms movement, the glissando threshold decreases to about 12 ST/s. One important practical implication is that short f0 movements framed by voiceless gaps are often not steep enough to be perceived as pitch movements. The result is a steady pitch event whose level roughly corresponds to the mean f0 of the second half of the movement. This applies, for example, to the short f0 fall on und [ʊnb̥ː] in Figure 3.1. The contour on und Bildschirm visually suggests annotation with a boundary high, %H. Annotators who also listen to the phrase, however, will likely decide on %L instead. (See G_ToBI conventions in Grice and Baumann 2002.) The und example shows the relevance of the glissando threshold for upstep and downstep decisions, in particular between phrases (Truckenbrodt 2007). As for the discrimination of f0 slopes, Nábelek and Hirsh (1969) report a minimum threshold of about 30% for psychoacoustic conditions.
For more speech-like stimuli, the JND again increases, to between 60% and 130% (see also Klatt 1973 and ’t Hart et al. 1990, who obtained a JND of about 100% for most speech-like conditions). Accordingly, speakers who distinguish pitch accents by shape rather than by alignment vary the steepness of their pitch-accent slopes so as to exceed this JND (cf. Niebuhr et al. 2011a). Also, differences between monotonal and bitonal accents such as H* and L+H* or H*+L typically involve slope differences of 60% or more. This leaves aside the controversial cross-linguistic discussion about the separate phonological status of these accent types (cf. Gussenhoven 2016).

In summary, analyses of speech melody must take into account that not all observable changes in f0 result in differences in perceived pitch. Some changes may be insufficiently large, and others not steep or fast enough (or too steep or fast). Even if an f0 movement produces a dynamic percept, its perceived range is likely to be substantially smaller than f0 suggests. To complicate matters further, there is increasing evidence that JNDs are not mere limitations of the auditory system. To some degree, they are also acquired barriers that the brain sets up to filter out what it has learned to regard as irrelevant variability (cf. Gregory 1997). This idea fits well with the inter-individual differences reported in many JND studies, such as the superior performance of musicians compared to non-musicians. It also fits the training or learning effects reported in these studies (’t Hart 1981; Handel 1989; Micheyl et al. 2006). The implication is that participants in perception experiments could be screened for JND levels (perhaps just with biographical questions). Likewise, acoustic f0 analyses should work with conservative JND estimates (see Mertens 2004).
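The glissando threshold can be turned into a simple test. In the sketch below, the constant in the c/T² formula is back-computed from the 46 ST/s quoted for 50 ms movements, giving c ≈ 0.115 ST·s. This value is an assumption for illustration and not the published constant of ’t Hart et al. (1990). It gives about 11.5 ST/s at 100 ms, close to the 12 ST/s quoted above. Following the description above, a movement below threshold is reported as a level pitch at the mean f0 of its second half. All names are hypothetical.

```python
import math

GLISSANDO_C = 0.115  # ST*s; assumed value, back-computed from "46 ST/s at 50 ms"

def glissando_threshold(duration_s, c=GLISSANDO_C):
    """Minimum slope (ST/s) for a movement of this duration to be heard as a glide."""
    return c / duration_s ** 2

def perceived_event(f0_track_hz, duration_s):
    """Classify an evenly sampled f0 movement as a glide or as a level pitch (hypothetical helper)."""
    interval_st = 12 * math.log2(f0_track_hz[-1] / f0_track_hz[0])
    slope = abs(interval_st) / duration_s
    if slope >= glissando_threshold(duration_s):
        return ("movement", round(interval_st, 1))
    # Below threshold: a steady pitch at the mean f0 of the second half of the samples.
    second_half = f0_track_hz[len(f0_track_hz) // 2:]
    return ("level", round(sum(second_half) / len(second_half), 1))

print(round(glissando_threshold(0.05), 1))   # 46.0
print(round(glissando_threshold(0.10), 1))   # 11.5
# The same ~2 ST fall (200 -> 178 Hz), realized over 50 ms and over 100 ms:
print(perceived_event([200, 194, 189, 183, 178], 0.05))   # ('level', 183.3)
print(perceived_event([200, 194, 189, 183, 178], 0.10))   # ('movement', -2.0)
```

The same interval is heard as a level pitch when it is compressed into 50 ms but as a fall when it is spread over 100 ms. This mirrors the short fall on und discussed above.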


3.3.2 Segmental influences on the perception of f0

While §3.3.1 focused on the f0 contour itself, this section focuses on how segmental elements both limit and enhance the perception of pitch movements in speech. We begin with three examples of limitations. First, the speaker’s intended intonational f0 contour is typically interspersed with many f0 ‘micro-perturbations’ (Kohler 1990; Pape et al. 2005; Hanson 2009). Kirby and Ladd (2016a), for example, argue that there are two types of consonant-based f0 perturbation effect. One raises f0 following the release of a voiceless consonant, and the other depresses f0 around the closure phase of a voiced obstruent. These micro-perturbations are actually not as small as the term ‘micro’ may imply. They can easily be 10–20 Hz in magnitude, or more than 10% of the f0 range typically employed in natural speech. Moreover, these ups and downs may extend for a considerable time into the vowel (Hombert 1978). As a result, they may appear visually quite salient in depictions of f0 contours. Despite this visual salience, however, f0 micro-perturbations have little effect on listeners’ perception of speakers’ intended intonational patterns. They can, though, significantly influence phoneme identification and duration perception (see §3.3.3). In other words, pitch-pattern annotators (both human and machine) must learn how f0 perturbations manifest themselves for the consonants of the language under analysis (Jun 1996). They must then apply this knowledge when evaluating the perceptual relevance of consonant-adjacent f0 movements, particularly those smaller than 3 ST (Mertens 2004) and shorter than about 60 ms (Jun 1996). Special caution is required when annotating phonologically relevant high and low targets near consonants. F0 perturbations, especially local dips before or inside voiced consonants, ‘could easily be misinterpreted as [low] tonal targets’ (Braun 2005: 106; cf. Petrone and D’Imperio 2009).
The same applies to local f0 maxima after voiceless consonants and their interpretation as high tonal targets. Several explanations exist for apparent listener ‘deafness’ to consonantal f0 micro-perturbations. One is that such movements are so abrupt that they fall below the glissando threshold and/or the JNDs for change in pitch level and range (see §3.3.1). This cannot be the whole story, however. Micro-perturbations are clearly audible insofar as they function as cues to segmental contrasts such as stop voicing (Kohler 1985; Terken 1995). They sometimes even result in the phonologization of tone contrasts that replace their consonantal precursors altogether (Haudricourt 1961; Hombert et al. 1979; House 1999; Mazaudon and Michaud 2008). Rosenvold (1981) approaches this paradox by assuming that f0 perturbations are somehow integrated first as segmental properties in perception. This exempts them from additional parsing at the intonational level, a view also argued for by Kingston and Diehl (1994). Ultimately, though, unanswered questions remain.

A second reason why not all observable f0 changes manifest themselves as differences in perceived pitch comes from House’s (1990, 1996) Theory of Optimal Tonal Perception. According to this theory, continuous f0 movements are parsed into a sequence of either level pitches or pitch movements, depending on the information density of the speech signal over the relevant span. At segment onsets, for example, and particularly at consonant–vowel boundaries, listeners must process a great deal of new information (see Figure 3.2a). The auditory system is thus fully engaged in processing spectral information for segmental purposes. As a result, it reduces f0 movements in those regions to steady pitch events, approximately corresponding to the mean f0 of the movement’s final centiseconds (about 3–4 cs; see d’Alessandro et al. 1998 for an alternative pitch-level calculation).

[Figure 3.2: (a) Time course of cognitive workload across a CVC syllable (Min to Max over C, V, C); (b) Perception of f0 movements: three differently aligned f0 falls (f0 axis 100–150 Hz; time axis in ms) heard as Low, Fall, and High]

Figure 3.2 Schematic representation of the two key hypotheses of the Theory of Optimal Tonal Perception of House (1990, 1996): (a) shows the assumed time course of information density or cognitive workload across a CVC syllable and (b) shows the resulting pitch percepts for differently aligned f0 falls.

Only when enough cognitive resources are available can f0 movements be tracked faithfully as dynamic events. This requires a spectrally stable sonorant section in the speech signal of at least 100 ms, as typically provided by vowels (stressed and/or long vowels in particular). This is why, in Figure 3.2b, only the middle f0 fall across a sonorant consonant-initial (CVC) sequence will be perceived as a falling movement (F). The other two falls create the impression of either High (H) or Low (L) tones. The Prosogram algorithm for displaying and analysing f0 contours (Mertens 2004) is the first software to incorporate both this principle and the glissando threshold (§3.3.1) into its analyses.

A third way in which sound segments limit the perception of f0 patterns is related to House’s Theory of Optimal Tonal Perception. It concerns the interplay between segmental sonority and the salience of f0 events. In general, pitch percepts are more robust and salient when they originate from more sonorous segments. It is therefore no coincidence (e.g. Gordon 2001a; Zhang 2001) that phonologically relevant f0 events occur primarily in the higher-sonority regions of the speech signal, for example, right after the onset of a vowel or towards its end (Kohler 1987; Xu 1998; Atterer and Ladd 2004). Barnes et al. (2011, 2014) investigated the perceived scaling of plateau-shaped English pitch accents (see §9.5.1 on peak shape across languages). They observe that f0 plateaux that coincide with high-sonority segments (e.g. accented vowels) are judged higher than identical plateaux that partially overlap with less sonorous coda consonants.
Accordingly, they include segmental sonority as one of the factors that weight each f0 sample’s influence on the holistic percept of the timing and scaling of f0 events (such as those instantiating pitch accents), a percept they call the Tonal Center of Gravity. Thus, a plateau-shaped accent that visually appears to exhibit the typical f0 alignment of a L*+H (late-peak) accent in American English may instead sound like a L+H* if the f0 plateau

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

extends largely over lower-sonority segmental material. Clearly, then, f0 contours cannot be properly analysed in isolation from the segmental context over which they are realized, and annotators must rely on their ears as much as, or even more than, on convenient electronic visualizations. These findings accord with Niebuhr and Kohler (2004), who suggest that categorical perception (CP) of intonational contrasts is modulated by the presence of sonority breaks, notably CV or VC boundaries. Niebuhr (2007c) demonstrated that the abruptness of such a sonority break (measured as the slope of the intensity change) can alter how clearly perception experiments appear to show CP in their results. Making a sonority break less abrupt can make an otherwise categorical-seeming identification function appear gradual, while making it more abrupt can do the opposite. In any case, the nature of a (categorical or gradual) perceptual boundary between prosodic entities such as pitch accents is determined not only by the entities themselves but also by the segmental context in which they are perceived (see §9.5.1 for more on timing relations between the f0 and intensity contours). These findings call into question the usefulness of CP as a tool for identifying phonological contrast, echoing Prieto’s statement (2012: 531–532) that ‘the application of this [CP] paradigm to intonation research has met with mixed success and there is still a need to test the [...] adequacy of this particular method’.

So far, we have dealt with segmental phenomena that reduce the richness of an utterance’s perceived pitch contour in comparison with its observable f0. However, at least two noteworthy instances of the opposite are attested as well.
First, while consonantal f0 perturbations may add f0 patterns to the signal that are not (entirely) incorporated into listeners’ perceived pitch pattern, vowels may in fact add something to it that is not immediately visible in recorded f0. This effect, known as intrinsic pitch, is not to be confused with intrinsic f0, whereby high vowels increase the vertical tension of the vocal folds, thus raising f0 compared to low vowels, which relax and thicken the vocal folds, hence lowering f0 (Whalen and Levitt 1995; Fowler and Brown 1997). Rather, the effect of intrinsic pitch runs counter to that of intrinsic f0: all else being equal, low vowels such as [a] are perceived as higher in pitch than high vowels such as [i] and [u], even when their realized f0 is in fact identical (Hombert 1978; Fowler and Brown 1997; Pape et al. 2005). Thus, an f0 rise across the two vowels [ɑː] and [i] of hobby is (in addition to the facts described in §3.1) perceptually smaller than its f0 range suggests, whereas an analogous fall becomes perceptually larger. The opposite would apply to a word like jigsaw, with the inverse ordering of vowel heights (Silverman 1987). Intrinsic-pitch differences are also big enough to create rising or falling pitch perceptions during visually flat f0 regions across vowel sequences or diphthongs (Niebuhr 2004). The intrinsic-pitch effect can thus lend a word like hi, which consists mainly of a diphthong, a falling intonation although its f0 is completely flat. A further way in which segments enhance f0 patterns relies on the fact that pitch perception is not restricted to periodic signals. Noise signals are also capable of creating aperiodic pitch impressions. These are particularly strongly influenced by the frequency of F2 but can, more generally, be modelled as a weighted combination of acoustic energy in different frequency bands (Traunmüller 1987), independently of a listener’s phonological background (Higashikawa and Minifie 1999).
Thus, to the extent that speakers can control formants and the distribution of acoustic energy in the frequency spectrum, they can also control and actively vary the aperiodic pitch impressions evoked by the noise signals in their speech. As is well known, aperiodic pitch patterns can take over when f0 is not available, as in whispered speech, where the aperiodic pitch contour allows listeners to reliably identify features of information structure, turn taking, sentence mode, or (in the case of tone languages) lexical meanings (cf. Meyer-Eppler 1957; Abramson 1972; Whalen and Xu 1992; Krull 2001; Nicholson and Teig 2003; Liu and Samuel 2004; Konno et al. 2006). Controlled and functional aperiodic pitch impressions were long assumed to be characteristic of whispered speech only. Recent work shows that they are not. Fricatives such as [f s ʃ x] within normally voiced utterances vary in their spectral energy distribution such that the aperiodic pitch impression they create reflects the adjacent f0 level. That is, fricatives are ‘higher pitched’ in high f0 contexts and ‘lower pitched’ in low f0 contexts (‘segmental intonation’; Niebuhr 2009). Segmental intonation occurs at the end as well as in the middle of prosodic phrases and has been found for a number of different phonological f0 contexts in German (Niebuhr 2008, 2012, 2017; Niebuhr et al. 2011b; Ritter and Röttger 2014) and other languages, such as Polish (Żygis et al. 2014), Cantonese (Percival and Bamba 2017), French (Welby and Niebuhr 2019), and Dutch (Heeren 2015). Heeren also confirms the evidence in Mixdorff and Niebuhr (2013) and Welby and Niebuhr (2016) that the segmental intonation of fricatives is integrated into the listener’s overall perception of utterance intonation. Segmental intonation could thus be one reason why the intonation contours of utterances are ‘subjectively continuous’ (Jones 1909: 275) despite the fact that between 20% and 30% of an utterance is usually voiceless.
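As a rough illustration of how a noise spectrum can carry a pitch-like value, the Python sketch below computes a spectral centroid, i.e. the power-weighted mean of band centre frequencies, for two versions of a fricative. This is a common proxy, not Traunmüller’s (1987) model, and the band centres, dB levels, and the function name `spectral_centroid_hz` are invented for illustration. Shifting energy into the upper bands, as in the ‘high-context’ [s], raises the centroid, which parallels the ‘higher-pitched’ fricatives in high f0 contexts described above.

```python
def spectral_centroid_hz(band_centres_hz, band_levels_db):
    """Power-weighted mean frequency of a noise spectrum given per-band levels.

    A crude stand-in for an aperiodic pitch impression (an assumption, not a
    published perceptual model).
    """
    if len(band_centres_hz) != len(band_levels_db) or not band_centres_hz:
        raise ValueError("need one level per band, and at least one band")
    powers = [10 ** (level / 10) for level in band_levels_db]  # dB -> linear power
    total = sum(powers)
    return sum(f * p for f, p in zip(band_centres_hz, powers)) / total


bands = [1000, 3000, 5000, 7000]    # hypothetical band centres within 0-8 kHz
s_low_context = [40, 50, 55, 50]    # invented levels for [s] next to low f0
s_high_context = [40, 48, 55, 58]   # invented levels for [s] next to high f0

print(round(spectral_centroid_hz(bands, s_low_context)))
print(round(spectral_centroid_hz(bands, s_high_context)))
```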
To be sure, the segmental intonation of a fricative does not simply co-vary with the adjacent f0 context, since speakers may also produce spectral energy changes inside these fricatives such that a falling f0 movement (for example) is followed by (or continued in) a falling aperiodic pitch movement (Ritter and Röttger 2014). Furthermore, Percival and Bamba’s (2017) finding that segmental intonation is more pronounced in English than in Cantonese underscores the extrinsic nature of the phenomenon. The most important practical implication of segmental intonation probably concerns phrase-final f0 movements. There is evidence (e.g. Kohler 2011) that phrase-final f0 movements that are acoustically truncated by final voiceless fricatives can be continued in the aperiodic noise of that fricative and consequently sound less truncated. Similarly, phrase-final low rises ended by a voiceless fricative might be perceived as high rises, while phrase-final high-to-mid f0 falls ended by a voiceless fricative might be perceived as ending in low pitch. Again, this means that the decision between L-H% and H-^H% in GToBI, for instance, cannot be based on the f0 movement alone in such cases, and the same applies to the decision between !H-% and L-%. Figure 3.3 (from Kohler 2011) shows a highly truncated f0 fall followed by [s] at the end of in Stockholm bei der ICPhS ‘in Stockholm at the ICPhS’. Despite the clear final f0 rise, this utterance is not perceived as an open question with a phrase-final rise (!H-%) but as a binding proposal whose falling phrase-final pitch movement (L-%) reaches as low as the phrase-internal one on Stockholm.

[Plot: pitch (Hz) and energy (dB) against time (s), 0 to about 2.44 s, with word labels In, Stockholm, bei, der, ICPhS.]

Figure 3.3 Utterance in Stockholm bei der ICPhS, Kiel Corpus of Spontaneous Speech, female speaker g105a000. Arrows indicate segmental intonation in terms of a change in the spectral energy distribution (0–8 kHz) of the final [s], 281 ms. (Adapted from Kohler 2011)

3.3.3 Perceptual interplay between prosodic parameters

Beyond the interplay of segments and prosodies reviewed above, other phonetic characteristics of the signal, such as duration and intensity, also interact with f0 in perception. This is schematized in Figure 3.4. There is, for example, a robust effect of pitch on the perceived durations of stimuli such as syllables, vowels, or even pauses. Lehiste (1976) concludes from perception experiments that a changing fundamental frequency pattern has a strong influence on the listener’s perception of duration (see also Kohler 1986). More specifically, compared to a flat pitch, pitch movements, and falling ones in particular, make a sound or a syllable appear longer to listeners. Van Dommelen (1995) and Cumming (2011a) report similar lengthening effects for Norwegian, German, and French. Brugos and Barnes (2012) show in addition that larger pitch changes across pauses make these pauses appear longer. The work of Yu (2010) extends these findings to the perception of level versus dynamic lexical tones in Cantonese, adding moreover that syllables with high pitch are perceived as longer than syllables with low pitch (see also Gussenhoven and Zhou 2013). The latter effect is either language- or domain-specific, though, as Kohler (1986) and Rietveld and Gussenhoven (1987) showed that higher pitch levels over a longer interval cause an increase in perceived speech rate. Perceived duration, in the form of speaking rate, also affects pitch in the opposite direction: faster speaking rates narrow perceived pitch ranges.

Various investigations point to a further effect of perceived (intensity-related) loudness on pitch (for an overview, see Rossing and Houtsma 1986). While the direction and magnitude of this effect vary, the most frequent finding is that a decrease in loudness increases perceived pitch, potentially by as much as 1 ST. In addition, higher loudness levels lead to the perception of larger pitch intervals (Thomsen et al. 2012). In turn, loudness is affected by perceived duration (Lehiste 1970). Explanations of this interplay of prosodic parameters range from basic bottom-up reception mechanisms to expectation-based top-down perceptual processing that helps the listener ‘filter out the factors that influence duration and frequency in order to perceive the speaker’s intended’ parameter values (Handel 1989: 422). Lehiste (1970: 118) uses the term ‘correction factors’ in a similar context (see also Yu 2010). Whatever the explanation, the interaction of these parameters underscores the need for prosody to be viewed as a whole, with none of its various dimensions analysed and interpreted in isolation from the others.

In this same context, note that Figure 3.4 also implies the possibility of effect chains. For instance, a longer or shorter perceived duration raises or lowers perceived loudness, which, in turn, lowers or raises perceived pitch; alternatively, a pitch movement increases perceived duration, which, in turn, increases perceived loudness. Such potential interactions are particularly relevant in prosodically flatter portions of the signal, where, in the absence of other activity, the right combination of these factors might modulate the perceived prominence of syllables, or even cue the presence of less salient H* or L* pitch accents. Lastly, it is worth mentioning that perceived pitch can also be modulated significantly by voice quality or phonation type. Examples of interaction between voice quality and tone/intonation systems are presented in chapters 12, 23, and 29.

[Diagram: perceived pitch, perceived duration, and perceived loudness linked by arrows labelled ‘movement increases’, ‘tempo decreases’, ‘negative correlation’, and ‘positive correlation’.]

Figure 3.4 Perceived prosodic parameters and their interaction.
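The effect chains mentioned above can be tallied mechanically if each link is reduced to a direction. The Python sketch below is a qualitative toy: the `EFFECTS` table encodes only directions stated in this section (pitch movement lengthens perceived duration, longer duration increases loudness, lower loudness raises pitch, faster rate narrows pitch range), it ignores magnitudes and language-specific variation, and the names `EFFECTS` and `chain_direction` are hypothetical.

```python
# Direction of each perceptual effect, as stated in this section:
# +1 = "more of A, more of B"; -1 = "more of A, less of B".
EFFECTS = {
    ("pitch movement", "duration"): +1,    # pitch movements lengthen perceived duration
    ("duration", "loudness"): +1,          # longer perceived duration, louder
    ("loudness", "pitch"): -1,             # lower loudness, higher perceived pitch
    ("speaking rate", "pitch range"): -1,  # faster rate, narrower perceived pitch range
}


def chain_direction(path):
    """Net direction (+1 or -1) of an effect chain such as A -> B -> C."""
    sign = 1
    for cause, effect in zip(path, path[1:]):
        if (cause, effect) not in EFFECTS:
            raise KeyError(f"no effect listed from {cause!r} to {effect!r}")
        sign *= EFFECTS[(cause, effect)]
    return sign


print(chain_direction(["pitch movement", "duration", "loudness"]))
print(chain_direction(["duration", "loudness", "pitch"]))
```

Multiplying signs is the simplest possible composition rule; real perceptual interactions need not compose this way, so the sketch only shows which way a chain would push if each link acted alone.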

3.4 Conclusion

Today, f0 can be extracted from acoustic speech signals and displayed visually with relative ease, making useful and compelling information on pitch contours in speech available. Nonetheless, in this chapter we have urged researchers not to mistake visual salience for perceptual reality. Intonation models involving integration of f0 information over time, such as Prosogram (Mertens 2004), Tilt (Taylor 2000), Tonal Center of Gravity (Barnes et al. 2012b), or Contrast Theory (Niebuhr 2013; for a discussion see chapter 9), represent steps towards operationalizing this reality. Meanwhile, we hope to have convinced readers that a meaningful analysis of acoustic f0 in speech signals must take into account not just how it was produced but also how it is perceived.

Part II

PROSODY AND LINGUISTIC STRUCTURE

Chapter 4

Tone Systems

Larry M. Hyman and William R. Leben

4.1 Introduction: What is tone?

All languages use ‘tone’ if what is meant is either pitch or the f0 variations that are unavoidable in spoken language. However, this is not what phonologists generally mean by the term. Instead, there is a major typological split between languages that use tone to distinguish morphemes and words and those that do not. Found in large numbers of languages in sub-Saharan Africa, East and South East Asia, parts of New Guinea, Mexico, the Northwest Amazon, and elsewhere, tone can be used to distinguish lexical morphemes (e.g. noun and verb roots) or grammatical functions. Thus, in Table 4.1 the same eight-way tonal contrast has a lexical function on nouns but marks the indicated grammatical distinctions on the verb /ba/ ‘come’ in Iau [Lakes Plain; West Papua] (Bateman 1990: 35–36).1

Table 4.1 Tonal contrasts in Iau

Tone | Noun | Gloss | Verb | Gloss | Inflectional meaning
H | bé | ‘father-in-law’ | bá | ‘came’ | Totality of action punctual
M | bē | ‘fire’ | bā | ‘has come’ | Resultative durative
HS | bé˝ | ‘snake’ | bá˝ | ‘might come’ | Totality of action incompletive
LM | be᷅ | ‘path’ | ba᷅ | ‘came to get’ | Resultative punctual
HL | bê | ‘thorn’ | bâ | ‘came to end point’ | Telic punctual
HM | be᷇ | ‘flower’ | ba᷇ | ‘still not at endpoint’ | Telic incompletive
ML | be᷆ | ‘small eel’ | ba᷆ | ‘come (process)’ | Totality of action durative
HLM | bê̄ | ‘tree fern’ | bâ̄ | ‘sticking, attached to’ | Telic durative

1 For tone, H = high (´), ꜜH = downstepped H (ꜜH), M = mid (¯), L = low (`), and S = superhigh (˝). HS thus represents a contour that rises from high to superhigh tone. Our segmental transcriptions of African language data follow standard practice in African linguistics, which sometimes conflicts with the International Phonetic Alphabet. For example, we use ‘y’ for IPA [j] and ‘c’ and ‘j’ for IPA [tʃ] and [dʒ]. In a few forms cited (not in phonetic brackets) we are unsure of what segment our source intended by a symbol.

As the above monosyllabic examples make clear, tone can be a crucial exponent of morphemes, which may be distinguished only by tone. While Iau tone is thus densely paradigmatic, at the other end of the spectrum tone can be quite sparse and syntagmatic. This is the case in Chimwiini [Bantu; Somalia], where a single H tone contrasts with zero (Ø), is strictly grammatical (nouns and verbs are underlyingly toneless), and can occur only on the final or penultimate syllable of a phonological phrase (Kisseberth and Abasheikh 2011: 1994):

(1) a. n-jileː n̪amá ‘I ate meat’; n-jile ma-tuːndá ‘I ate fruit’
    b. jileː n̪amá ‘you sg. ate meat’; jile ma-tuːndá ‘you sg. ate fruit’
    c. jileː n̪áma ‘s/he ate meat’; jile ma-túːnda ‘s/he ate fruit’

As seen from the above examples, the H tone occurs phrase-finally in the past tense if the subject prefix on the verb is first or second person. Otherwise the phonological phrase receives a default penultimate H.
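The placement of the single H in (1) amounts to a two-way choice between the last and the penultimate syllable of the phonological phrase. The Python sketch below encodes only the past-tense pattern described above; the syllabification, the omission of the n- subject prefix, and the function name `chimwiini_h_index` are simplifying assumptions, and the many other factors governing Chimwiini phrasing are ignored.

```python
def chimwiini_h_index(phrase_syllables, subject_person):
    """Index of the syllable bearing the single H in a past-tense phrase.

    Simplified: first/second person subjects put a grammatical H on the
    final syllable; otherwise the phrase gets a default penultimate H.
    """
    if len(phrase_syllables) < 2:
        raise ValueError("this sketch assumes a phrase of at least two syllables")
    if subject_person in (1, 2):
        return len(phrase_syllables) - 1   # grammatical H: phrase-final
    return len(phrase_syllables) - 2       # default H: penultimate


# jile ma-tuːnda 'ate fruit', with the subject prefix left out for simplicity
phrase = ["ji", "le", "ma", "tuːn", "da"]
print(phrase[chimwiini_h_index(phrase, 1)])  # 'I ate fruit' -> da
print(phrase[chimwiini_h_index(phrase, 3)])  # 's/he ate fruit' -> tuːn
```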

4.1.1 Tone as toneme versus morphotoneme

A comparison of Iau and Chimwiini, two extremes, reveals two different approaches to tonal contrasts, exemplified by two pioneers in the study of tone. For Kenneth Pike the presence of tone had to do with surface phonological contrasts. Hence, a language with tone is one ‘having significant, contrastive, but relative pitch on each syllable’ (K. Pike 1948: 3). For William E. Welmers, on the other hand, tone was seen to be an underlying property of morphemes. Hence, a tone language is one ‘in which both pitch phonemes [read: features] and segmental phonemes enter into the composition of at least some morphemes’ (Welmers 1959: 2; 1973: 80). Since Pike conceptualized tone as relatively concrete surface contrasts, he assumed that every output syllable carries a tone (or tones), as in Iau. Welmers, on the other hand, emphasized that a tone system could have toneless tone-bearing units (TBUs) as well as toneless morphemes (e.g. toneless noun and verb roots in Chimwiini). We here follow Welmers in defining a tone language as one in which both tonal and segmental features enter into the composition of at least some morphemes.

4.1.2 Tone as pitch versus tone package

As indicated, the bare minimum for a language to be considered a ‘tone language’ is that pitch enters as a (contrastive) exponent of at least some morphemes. However, more than pitch can be involved in a tonal contrast. This is particularly clear in Chinese and South East Asian languages. As seen in Table 4.2, different phonation properties accompany the six-way tonal contrast in Hanoi Vietnamese (Kirby 2011: 386).

Table 4.2 Tonal contrasts in Vietnamese

Vietnamese term | Example | Pitch level | Contour | Other features | Gloss
Ngang | ma | high-mid | level | laxness | ‘ghost’
Huyền | mà | mid | falling | laxness, breathiness | ‘but, yet’
Sắc | má | high | rising | tenseness | ‘cheek’
Nặng | mạ | low | falling | glottalization or tenseness | ‘rice seedling’
Hỏi | mả | low | falling | tenseness | ‘tomb’
Ngã | mã | high | rising | glottalization | ‘code’

In ‘stopped’ syllables ending in /p, t, k/, the above six tones are neutralized to a binary contrast between a ‘checked’ rising and a low tone (mát ‘cool’, mạt ‘louse, bug’). In addition to glottalization, breathiness, and tense-laxness, different tones can have different durations. While falling and rising contour tones may be associated with greater duration (Zhang 2004a), tone-specific durational differences are not always predictable on the basis of universal phonetics. Thus, of the four tones of Standard Mandarin as spoken in isolation, level H Tone 1 and HL falling Tone 4 tend to be shorter than rising Tone 2, which is shorter than low-dipping Tone 3 (Xu 1997: 67).2 Correlating with such complex phonetic realizations in Chinese and South East Asia is the traditional view of areal specialists that contour tones should be interpreted as units and not as sequences of individual level tones. The typological distinction therefore seems to be between tone as a ‘package’ of features (necessarily including pitch) and tone as pitch alone (cf. Clements et al. 2010: 15). For examples from South East Asia, see §23.2.2, which refers to such tone packages as tonation, following Bradley (1982). These complexes of tone and other laryngeal features are especially prevalent in languages with monosyllabic words, and their worldwide distribution can be characterized as the Sinosphere versus the rest of the world. Outside the Sinosphere (Matisoff 1999), phonations and non-universal timing differences between tones are much rarer. Where they do occur (e.g. in the Americas), they are generally independent of the tones, and tonal contours are readily decomposable into sequences of level tones (cf. §4.2.2).

4.1.3 Tone-bearing unit versus tonal domain (mora, syllable, foot)

Another way tone systems can differ is in their choice of TBU and tonal domain. By TBU we mean the individual landing sites to which the tones anchor. Past literature has referred to vowels (or syllabic segments), to morae, or to syllables as the carriers of tone. Some languages count a bimoraic (heavy) syllable as two TBUs and a monomoraic (light) syllable as one. Such languages often allow HL or LH contours only on bimoraic syllables. Thus, in Jamsay [Dogon; Mali], bimoraic CV: syllables can be H, L, HL, or LH, while monomoraic CV syllables can only be H or L (Heath 2008: 81). Other languages are indifferent to syllable weight and treat all syllables the same with respect to tone.

Another notion distinct from the TBU is the domain within which tones (or tonal melodies) are mapped. In Kukuya [Bantu; Republic of Congo], for example, the five tonal melodies /L, H, LH, HL, LHL/ are a property of the prosodic stem (Paulian 1975; Hyman 1987). Thus, in (2), the /LHL/ ‘melody’ stretches out over the maximally trimoraic stem.

(2) (ndɛ̀) bvɪ᷈ ‘(s/he) falls’
    (ndɛ̀) kàây /kàɪ̂/ ‘(s/he) loses weight’
    (ndɛ̀) pàlɪ̂ ‘(s/he) goes out’
    (ndɛ̀) bàámì ‘(s/he) wakes up’
    (ndɛ̀) kàlə́gì ‘(s/he) turns around’
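One common way to model how a stem melody such as /LHL/ spreads over one, two, or three morae in (2) is left-to-right, one-to-one association, with surplus tones crowding onto the last mora. The Python sketch below implements that simplified mapping; the function `map_melody` and its treatment of leftover tones and morae are assumptions for illustration and do not reproduce every detail of the analyses cited. For one-, two-, and three-mora stems it yields LHL, L + HL, and L + H + L, matching the patterns of bvɪ᷈, pàlɪ̂, and kàlə́gì.

```python
def map_melody(melody, n_morae):
    """Associate a stem melody (e.g. "LHL", one letter per tone) to n_morae morae.

    Tones link left to right, one per mora; surplus tones pile onto the last
    mora (creating a contour); surplus morae copy the melody's last tone.
    """
    if n_morae < 1 or not melody:
        raise ValueError("need at least one mora and at least one tone")
    morae = [[] for _ in range(n_morae)]
    for i, tone in enumerate(melody):
        morae[min(i, n_morae - 1)].append(tone)   # one-to-one, then crowd the last mora
    for j in range(len(melody), n_morae):
        morae[j].append(melody[-1])               # spread the last tone rightwards
    return ["".join(m) for m in morae]


for stem, n in [("bvɪ", 1), ("pàlɪ̂", 2), ("kàlə́gì", 3)]:
    print(stem, map_melody("LHL", n))
```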

2 Xu adds that these findings match results from earlier studies, citing Lin (1988a). We thank Kristine Yu for locating these studies for us. See also Yu (2010: 152) and references cited therein.

For some this has meant that the prosodic stem is the TBU. However, it is important to keep the carrier of tone (mora, syllable) distinct from the domain within which the tones or tonal sequences map. While the distinction is usually clear, Pearce’s (2013) study of Kera [Chadic; Chad, Cameroon] shows how the two notions can be confused. In this language, tones are mapped by feet. Since a foot will often take one or another tone (or tone pattern), it is tempting to refer to the foot as the TBU. A similar situation arises in Tamang [Tibeto-Burman; Nepal] (Mazaudon and Michaud 2008), where there are four word-tone patterns (with phonations) that map over words. In languages that place a single ‘culminative’ tone, typically H, within a prosodic domain, as in Chimila [Chibchan; Colombia] (Malone 2006: 34), the H is often described not only as a property of its TBU but also of its domain. However, the distinction between TBU and tonal domain is clearer in most languages, and it is useful to keep the two separate.

4.1.4 Tone versus accent

The example of a single H per domain raises the question of whether the H is (only) a tone or whether it is also an ‘accent’. We saw such a case in Chimwiini, with only one H tone per phonological phrase. One way to look at this H is from the perspective of the domain, the phonological phrase. In this case, since there can be only one H, the temptation is to regard the H as an ‘accent’, as Kisseberth and Abasheikh (1974) do. However, in (1), final H is a strictly tonal exponent of the first or second person subject prefix. In this sense it satisfies the definition of tone, which is our only concern here. Although there are other cases where the tone-versus-accent distinction becomes blurred, the goal is not to assign a name to the phenomenon but rather to understand strictly tonal properties. While there are very clear cases of tone, such as Iau in Table 4.1, and of accent (e.g. stress in English), no language requires a third category called ‘pitch accent’ or ‘tonal accent’ (Hyman 2009). Instead, there are languages with restricted, ultimately obligatory and culminative ‘one and only one H tone per domain’ systems, as in Kinga [Bantu; Tanzania] (Schadeberg 1973) and Nubi [Arabic-lexified creole; Uganda] (Gussenhoven 2006). Between this and a system that freely combines Hs and Ls, languages place a wide range of restrictions on tonal distributions.

4.2 Phonological typology of tone by inventory

There are a number of ways to typologize tone systems by inventory. The first concerns the number of tones, which can be calculated in one of two ways: (i) the number of tone heights and (ii) the number of distinct tonal configurations, including level and contour tones. Tone systems can also differ in whether they allow register effects such as downstep, in the various constraints they place on the distribution of their tones, and in whether the lack of tone (Ø) can function as a contrastive value. We take up each of these in this section.
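The two ways of counting tones mentioned above can be illustrated on the Iau inventory in Table 4.1. The Python sketch below treats each letter of a tone label as one tone height, so it counts surface labels only and takes no position on how many heights an analysis should posit underlyingly; `IAU_TONES` and `inventory_counts` are hypothetical names. For Iau it gives four heights (H, M, L, S) and eight configurations, two level and six contour.

```python
# Tone labels of the eight Iau contrasts in Table 4.1
# (one letter per height: H high, M mid, L low, S superhigh).
IAU_TONES = ["H", "M", "HS", "LM", "HL", "HM", "ML", "HLM"]


def inventory_counts(tones):
    """Count tone heights and tonal configurations in a list of tone labels."""
    heights = {height for tone in tones for height in tone}
    level = [tone for tone in tones if len(tone) == 1]
    contour = [tone for tone in tones if len(tone) > 1]
    return {
        "heights": len(heights),
        "configurations": len(set(tones)),
        "level": len(level),
        "contour": len(contour),
    }


print(inventory_counts(IAU_TONES))
```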

4.2.1 Number of tones

In order to satisfy the definition of a tone language in §4.1, there must minimally be a binary contrast in pitch. In most cases this will be a contrast between the two level tones /H/ and /L/, as in Upriver Halkomelem [Salish; Canada] /qʷáːl/ ‘mosquito’ vs. /qʷàːl/ ‘to speak’ (Galloway 1993: 3). Other languages contrast up to five tone heights, as in Shidong Kam [Tai-Kadai; China] (Edmondson and Gregerson 1992: 566). A tone system may distinguish fewer underlying contrastive tone heights than surface ones. A particularly dramatic case of this is Ngamambo [Bantoid; Cameroon], which, although analysed with /H, L/ (Hyman 1986a), presents a five-way H, M, ꜜM, L˚, L contrast on the surface. (Concerning L˚ see §4.2.2; concerning ꜜM see §4.2.3.) More than a simple distinction of contrasting heights is sometimes needed, based on the phonological behaviour of the tones. While the most common three-height tonal contrast is /H, M, L/ (á, ā, à), where /M/ functions as quite distinct from /H/ and /L/, some tone systems instead distinguish H and extra H (a̋, á, à) or L and extra L (á, à, ȁ).

4.2.2 Contour tones

In addition to level tones, languages often have contour tones, where the pitch can be falling, rising, rising-falling, or falling-rising. Essentially what this means is that two or more tone heights are realized without interruption by a (supralaryngeal) consonant. Contours thus occur in all but the last Kukuya example in (2). As shown autosegmentally in (3a), the LHL sequence is realized on a single mora. In (3b) the LH sequence is realized one to one on the first two morae. Both (3a) and (3b) would be called contours, in contrast with the L-to-H-to-L transitions in (3c), where each tone is linked to a CV mora.

(3) a. bvɪ: L, H, and L all linked to the single mora
    b. bàámì: L and H linked one to one to the two morae of bàá; L linked to mì
    c. kàlə́gì: L, H, and L each linked to one CV mora (kà, lə́, gì)

Thus, contours arise either when more than one tone links to the same TBU or when two or more tones link to successive vocalic morae. A third possible interpretation, often assumed in the study of Chinese and South East Asian languages, would treat the sequenced pitch gestures as a single unit, such as ‘falling’, as mentioned in §4.1.2 (see Yip 1989, 2002: 50–52). Note that the above refers to phonological contours. Level tones, too, often have redundant phonetic contours: it is very common for sequences of like tones to rise slightly or trail off in actual pronunciation. Even among closely related languages there can be differences. Within the Kuki-Chin branch of Tibeto-Burman, a prepausal L tone will abruptly fall in Kuki-Thaadow, e.g. /zààn/ [ \] ‘night’. This is the most common realization of a L tone before pause. In closely related Hakha Lai, however, L tone is realized with level pitch, e.g. /kòòm/ [ _ ] ‘corn’. Other languages can have a surface contrast between falling versus level L. In most cases the level tone, represented with ˚, can be shown to be the result of the simplification of a final rising tone or of the effect of a ‘floating’ H tone in the underlying representation, e.g. Bamileke-Dschang [Bantoid; Cameroon] /lə̀-tɔ̀ŋ´/ → lə̀tɔ̀ŋ˚ ‘navel’ vs. /lə̀-tɔ̀ŋ/ → lə̀tɔ̀ŋ ‘to reimburse’ (Hyman and Tadadjeu 1976: 91). Correspondingly, there are languages where /H/ (more rarely /M/) is realized as a falling contour before pause (and hence in isolation), e.g. Leggbó [Cross River; Nigeria] /dzɔ́/ → dzɔ̂ ‘ten’ (Hyman, personal notes) and Tangkhul Naga [Tibeto-Burman; North East India] /sām/ → sa᷆m ‘hair’ (Hyman, personal notes). In other cases contour tones arise by fusing unlike tones, either between words, as in the reduplication case in Etsako [Benue-Congo; Nigeria] (Elimelech 1978: 45) in (4a), or by affixation of a grammatical tone, as in Tanacross [Athabaskan; Alaska] in (4b), where the possessive H+glottal stop suffix also conditions voicing (Holton 2005: 254).

(4) a. ówà + ówà ‘house’ → ówǒwà ‘every house’
    b. š-tš’òx + ´ʔ → š-tš’ǒɣʔ ‘my quill’

In other cases input contours are simplified to level tones (see §4.3.2). In short, contours can be either underlying or derived.
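The autosegmental distinctions in (3) can be encoded as a list of association lines. The Python sketch below uses a hypothetical representation, with made-up syllable and mora indices: it flags a syllable as carrying a phonological contour when two or more tones are linked to it, and records whether those tones share a single mora, as in (3a), or sit on successive morae, as in (3b). Example (3c), with one tone per CV mora, yields no contour. The sketch ignores the redundant phonetic contours of level tones discussed above.

```python
def find_contours(links):
    """Find syllables carrying a phonological contour.

    links: (tone, syllable_index, mora_index) triples, one per association
    line, listed left to right. A syllable with two or more tones has a
    contour; its kind depends on whether the tones share one mora.
    """
    by_syllable = {}
    for tone, syllable, mora in links:
        by_syllable.setdefault(syllable, []).append((tone, mora))
    contours = {}
    for syllable, linked in by_syllable.items():
        if len(linked) > 1:
            morae = {mora for _, mora in linked}
            kind = "single mora" if len(morae) == 1 else "successive morae"
            contours[syllable] = ("".join(tone for tone, _ in linked), kind)
    return contours


bvi = [("L", 0, 0), ("H", 0, 0), ("L", 0, 0)]      # (3a) bvɪ
baami = [("L", 0, 0), ("H", 0, 1), ("L", 1, 0)]    # (3b) bàá-mì
kalegi = [("L", 0, 0), ("H", 1, 0), ("L", 2, 0)]   # (3c) kà-lə́-gì
for name, links in [("bvi", bvi), ("baami", baami), ("kalegi", kalegi)]:
    print(name, find_contours(links))
```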

4.2.3 Downstep and floating tones

In most two-height systems, alternating Hs and Ls in declarative utterances undergo ‘downdrift’, in which each H preceded by L is lowered, with each lowered H establishing a new terrace for further tones. Downdrift is absent in a few two-height languages, such as Haya [Bantu; Tanzania] (Hyman 1979a). Independent of whether a language has downdrift or not, it may also have non-automatic downsteps, marked by ꜜ. The most common downstepped tone is ꜜH, which usually contrasts with H only after another (ꜜ)H, as in Aghem [Bantoid; Cameroon]. As seen in (5a), the two nouns ‘leg’ and ‘hand’ are both realized H-H in isolation (Hyman 2003).

(5) a. kɨ́-fé (H H) ‘leg’; kɨ́-wó (H H) ‘hand’
    b. fé kɨ́n (H H) ‘this leg’; wó ꜜkɨ́n (H L H) ‘this hand’
    c. fé kɨ̂a (H L) ‘your sg. leg’; wó kɨ̀a (H L L) ‘your sg. hand’

However, as seen in (5b) and (5c), they have different effects on the tones that follow (the noun class prefix /kɨ́-/ drops when these nouns are modified). When followed by the /H/ tone of /kɨ́n/ ‘this’ in (5b), ‘this leg’ is realized H-H, while ‘this hand’ is realized H-ꜜH. As indicated, the lowering or downstepping of /kɨ́n/ is conditioned by an abstract floating L tone (which used to be on a lost second stem syllable; cf. Proto-Bantu *-bókò ‘hand’). While this L has no TBU of its own on which to be pronounced, it has effected a ‘register lowering’ on the H of the demonstrative. The same floating L tone blocks the H tone of /-wó`/ ‘hand’ from spreading onto the following L-toned possessive pronoun /kɨ̀a/ in (5c).

Although first and most extensively documented in African tone systems, downstepped H is also found in New Guinea, e.g. Kairi [Trans New Guinea; Papua New Guinea] (Newman and Petterson 1990); Asia, e.g. Kuki-Thaadow [Tibeto-Burman; Myanmar, North East India] (Hyman 2010a); Mexico, e.g. Coatzospan Mixtec [Mixtecan] (E. Pike and Small 1974); and South America, e.g. Tatuyo [Tukanoan; Colombia] (Gomez-Imbert 1980). The most common source is the loss of a L tone between Hs, whether by loss of the TBU (as in the Aghem example in (5b)), by simplification of a contour tone in a HL-H or H-LH sequence, or by assimilation of a /H-L-H/ sequence to either H-H-ꜜH or H-ꜜH-H, e.g. Igbo [Benue-Congo; Nigeria] /ócé/ ‘chair’ + /àtó̙/ ‘three’ → ócé ꜜátó̙ ‘three chairs’ (Welmers and Welmers 1969: 317). Another source is the downstepping of a H that directly follows another H, as when Shambala [Bantu; Tanzania] /nwáná/ ‘child’ + /dú/ ‘only’ is realized nwáná ꜜdú ‘only a child’ (Odden 1982: 187). While H is by far the most common downstepped tone, downstepped M and L also occur as contrastive tones, albeit rarely. An example is the five-way H, M, ꜜM, L˚, L contrast of Ngamambo [Bantoid; Cameroon] (Hyman 1986a: 123, 134–135). In some languages, the sequence H-ꜜH contrasts phonologically with H-M, even though the step from H to ꜜH can be identical (or nearly so) to the step from H to M. The difference is phonological: ꜜH establishes a new register, so that an immediately following H will be on the same level as ꜜH, while following H-M a H will go up a step in pitch (Hyman 1979a). While downstep is clearly established in the literature, there are also occasional mentions of an opposite ‘upstep’ phenomenon whereby H tones become successively higher. This is found particularly in Mexican tone languages. In Acatlán Mixtec (E. Pike and Wistrand 1974: 83), where H and upstepped H contrast, upstep appears to be the reverse of downstep.
In Peñoles Mixtec (Daly and Hyman 2007: 182) a sequence of input Hs is realized level; however, if preceded by a L, the Hs will each go up in pitch, ultimately reaching the upper end of a speaker’s pitch range. Upstep has also been reported in some African languages, such as Krachi [Kwa; Ghana] (Snider 1990). A number of other languages have what has been called ‘upsweep’: a sequence of H tones begins quite low and ultimately reaches a H pitch level (Tucker 1981). One such language is Baule [Kwa; Ivory Coast] (Leben and Ahoua 1997). Connell (2011) and Leben (in press) survey downstepping and upstepping phenomena in some representative languages.
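The register difference between H-ꜜH and H-M described above can be made concrete with a toy model. This is a minimal sketch under stated assumptions. Pitch is counted in abstract one-step units relative to the first H, and ꜜH is encoded with the hypothetical label "!H". It illustrates only the phonological contrast, not the phonetics of any particular language.

```python
# Toy model of the register effect of downstep (H-ꜜH vs. H-M).
# Assumptions: one abstract "step" per interval; labels "H", "!H", "M".

def realize(tones: list[str]) -> list[int]:
    """Return a relative pitch level (in steps) for each tone."""
    ceiling = 0  # current H register
    levels = []
    for tone in tones:
        if tone == "H":
            levels.append(ceiling)      # H sits at the current ceiling
        elif tone == "!H":
            ceiling -= 1                # downstep lowers the register itself...
            levels.append(ceiling)      # ...and !H sits at the new ceiling
        elif tone == "M":
            levels.append(ceiling - 1)  # M is one step down; register unchanged
        else:
            raise ValueError(f"unsupported tone: {tone!r}")
    return levels


print(realize(["H", "!H", "H"]))  # [0, -1, -1]: H after ꜜH stays on the new level
print(realize(["H", "M", "H"]))   # [0, -1, 0]: H after M goes back up
```

Each ꜜH lowers the ceiling, so repeated downsteps produce successively lower terraces. M never resets the register.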

4.2.4 Underspecified tone and tonal markedness

In many tone languages one of the contrastive tone heights is best represented as the absence of tone. The simplest and most common case is a two-height system with an underlying contrast between /H/ and Ø. TBUs that do not have an underlying /H/ may acquire H or L by rule, the latter most often as a default tone. The primary argument for zeroing out a tone is that it is not ‘phonologically activated’ in the sense of Clements (2001). In /H/ vs. Ø languages, phonological rules and distributions refer to H but not to L. Examples are Chichewa [Bantu; Malawi] (Myers 1998), Tinputz [Oceanic; Papua New Guinea] (Hostetler and Hostetler 1975), Blackfoot [Algonquian; Montana, Alberta] (Stacy 2004), and Iñapari [Arawakan; Peru] (Parker 1999). Some languages contrast /L/ vs. Ø, e.g. Malinke [Mande; Mali] (Creissels and Grégoire 1993), Bora [Witotoan; Colombia, Peru] (Thiesen and Weber 2012: 56), and a number of Athabaskan languages (Rice and Hargus 2005: 11–17). The asymmetrical behaviour of different tones becomes even more evident in languages with multiple tone heights. A common analysis in a three-height tone system is to treat the M tone as Ø, as originally proposed for Yoruba [Benue-Congo; Nigeria] (Akinlabi 1985; Pulleyblank 1986), where only the H and L tones are activated. However, Campbell (2016) has shown that Zenzontepec Chatino [Oto-Manguean; Mexico] has a /H, M, Ø/ system, where Ø TBUs assimilate to a preceding (activated) H or M or otherwise receive the default L tone. Finally, there are /H, L, Ø/ systems where Ø does not represent a distinct tone height but rather a TBU with a third behaviour. Thus in Margi [Chadic; Nigeria], roots and suffixes can have a fixed /H/ or /L/, or can have a third tone (Ø) that varies between H and L depending on the neighbouring tone (Pulleyblank 1986: 69–70).

52 larry m. hyman and william r. leben

4.2.5 Distributional constraints

We have already mentioned that tone is more dense in some tone systems than in others (cf. Gussenhoven 2004: 35). At one end of the density scale are languages where all tonal contrasts are fully specified and realized in all positions. Assuming that the syllable is the TBU, a /H, L/ system would thus predict two possible contrasts on monosyllabic words, four on disyllabic words, eight on trisyllabic words, and so forth (2ⁿ for n syllables), as in Andoke [isolate; Colombia] (Landaburu 1979: 48). At the other extreme are systems such as Somali [Cushitic; Somalia], which, along with other restrictions, rarely allows more than one /H/ per word (Green and Morrison 2016). In between the extremes are systematic constraints, such as on the distribution of underlying tones or on their surface realization. In Tanimuka [Tukanoan; Colombia] (Keller 1999: 77), for example, disyllabic words are limited to H-H, L-H, and H-L, with *L-L non-occurring. The same requirement of at least one H is found on trisyllabic words (*L-L-L), but, in addition, there are no words of the shape *H-L-L. (There are no monosyllabic words.) As mentioned in §4.1.3, Kukuya allows the five prosodic stem melodies /L, H, LH, HL, LHL/. Significantly, it lacks a /HLH/ melody. Where underlying /HLH/ sequences occur, they are frequently modified (Cahill 2007), e.g. to trisyllabic H-H-H, H-ꜜH-H, or H-H-ꜜH. Languages may also restrict some or all tonal contrasts to the stressed syllable, as in the ‘Accent 1’ versus ‘Accent 2’ contrast in Swedish and Norwegian (Riad 1998a, inter alia). More dramatic is the Itunyoso Trique [Oto-Manguean; Mexico] nine-way tonal contrast (45, 4, 3, 2, 1, 43, 32, 31, 13) realized only on the word-final (stressed) syllable (DiCanio 2008). Both underlying and derived contour tones can also be subject to strict distributional constraints. First, they can be restricted by syllable type.
Heavy syllables, especially those with long vowels, support tonal contours better than syllables with shorter rimes or stop codas (Gordon 2001a; Zhang 2004a). In addition, tonal contours can be restricted to stressed syllables or to phrase-final or penultimate position. The following markedness scale is generally assumed (where R = rising, F = falling, and > means ‘more marked than’): RF, FR > R > F > H, L (cf. Yip 2002: 27–30). Finally, contours can be restricted by what precedes or follows them: some languages require that a contour be preceded or followed by a like tone height (e.g. L-LH, HL-L), while others prefer that the neighbouring tone height be opposite (e.g. H-LH, HL-H) (Hyman 2007: 14). Other languages restrict contour tones to final position (Clark 1983).
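The contrast between a fully specified /H, L/ system and the Tanimuka restrictions discussed above can be checked by enumeration. This is a minimal sketch. It encodes only the constraints listed for Tanimuka (at least one H; no *H-L-L on trisyllables) and is not a full account of Tanimuka word structure. The predicate name `tanimuka_ok` is hypothetical.

```python
from itertools import product

def all_patterns(n: int) -> list[tuple[str, ...]]:
    """Every H/L sequence over n syllables in a fully specified /H, L/ system (2**n)."""
    return list(product("HL", repeat=n))

def tanimuka_ok(pattern: tuple[str, ...]) -> bool:
    """Constraints reported above for Tanimuka disyllables and trisyllables."""
    if "H" not in pattern:            # at least one H: *L-L, *L-L-L
        return False
    if pattern == ("H", "L", "L"):    # additional gap on trisyllables: *H-L-L
        return False
    return True

for n in (2, 3):
    attested = [p for p in all_patterns(n) if tanimuka_ok(p)]
    print(n, len(all_patterns(n)), ["-".join(p) for p in attested])
# 2 4 ['H-H', 'H-L', 'L-H']
# 3 8 ['H-H-H', 'H-H-L', 'H-L-H', 'L-H-H', 'L-H-L', 'L-L-H']
```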

4.3 Phonological typology of tone by process

In some languages the input or underlying tonal contrasts are realized essentially unchanged in the surface phonology. Such is the case in Tangkhul Naga [Tibeto-Burman; NE India], which has no tonal morphology or tonal alternations: the /H/, /M/, and /L/ tones that contrast on lexical morphemes (páay ‘defecate’, pāay ‘be cheap, able’, pàay ‘jump’) do not change in the output. However, in many (perhaps most) tone systems, input tones can be modified by processes triggered by another tone, a boundary, or the grammar (morphology, syntax). In this section we consider the most common phonologically conditioned tone rules.

4.3.1 Vertical assimilation

Whenever tones of different heights occur in sequence, the pitch level of one or the other can be raised or lowered by what we call ‘vertical assimilations’. In a two-tone system, the interval of a /L-H/ sequence generally compresses, while a /H-L/ interval expands. For example, a H is raised to a new extra H level before a L in Engenni [Edoid; Nigeria] (Thomas 1978: 12). While only the H TBU immediately before the L is affected in Engenni, in other languages it can be a whole sequence of Hs, especially when the Hs are raised in anticipation of an upcoming downstep, which can be located several syllables away, as in the examples in (6) from Amo [Kainji; Nigeria] (Hyman 1979a: 25n).

(6) a. kìté úkɔ́ɔ́m í fínáwà      ‘the place of the bed of the animal’
    b. kìꜛté úkɔ́ɔ́m í fíkáꜜlé     ‘the place of the bed of the monkey’

By expanding the /L-H/ interval of kìꜛté ‘place’ in (6b), speakers create the tonal space for what can in principle be an unlimited number of ꜜH pitch levels (cf. §4.2.3). Such anticipatory pre-planning is extremely common, perhaps universal, in languages with downstep (cf. Rialland 2001; Laniran and Clements 2003). Vertical assimilations can occur in multi-height tone systems as well. Thus, Jamieson (1977: 107) reports that all three non-low tones are raised before a low tone in four-height Chiquihuitlán Mazatec [Oto-Manguean; Mexico]. Less common are cases where the /H-L/ interval is compressed to [M-L] or [H-M], the latter occurring in the Kalenjin group of Nilotic [Kenya], e.g. Nandi áy-wà → áy-wā ‘axe’ (Creider 1981: 21) and Endo tány ‘cow’ + àkà ‘another’ → tány ākā ‘another cow’ (Zwarts 2004: 95). Finally, note that vertical assimilation can be conditioned by a boundary, as when one or more H TBUs are realized M before pause in Isoko and Urhobo [Edoid; Nigeria] (Elugbe 1977: 54–55).


4.3.2 Horizontal assimilation

Whereas vertical assimilations involve an upward or downward adjustment in pitch range, horizontal assimilations involve cases where a tone extends to a neighbouring TBU. Better known as ‘tone spreading’, the most common cases involve perseverative assimilations where the first tone spreads into a following TBU. In horizontal assimilations there is a tendency for a tone to last too long rather than start too early (Hyman and Schuh 1974: 87–90), as schematized in (7).

(7) a. Natural (perseverative):        L-H → L-LH     H-L → H-HL
    b. Less natural (anticipatory):    L-H → LH-H     H-L → HL-L

In (7a), L tone spreading (LTS) and H tone spreading (HTS) create a contour tone on the next syllable, which may, however, be simplified by subsequent processes, as in the case of HTS in Adioukrou [Kwa; Ivory Coast] in (8c) below (Hérault 1978: 11).

(8) a. /jɔ́w + à/      → jɔ́w â                          ‘the woman’
    b. /tʃǎn + à/     → tʃǎn â                         ‘the goat’
    c. /má + dʒěn/    → má dʒe᷉n (→ [má dʒéꜜń])         ‘type of pestle’

LTS occurs less frequently (and often with more restrictions) than HTS, e.g. applying in Nandi [Nilotic; Kenya] only if the H TBU has a long vowel: /là̙ːk-wé̙ːt/ → là̙ːk-wě̙ːt ‘child’ (Creider 1981: 21). Many languages have both HTS and LTS that potentially interact with each other in longer utterances. In (9a) we see that HTS combined with LTS creates successive contour tones in Yoruba [Benue-Congo; Nigeria] (Laniran and Clements 2003: 207). In Kuki-Thaadow [Tibeto-Burman; North East India, Myanmar] the expected contours in (9b) are, however, simplified, since the language allows contours only on the final syllable (Hyman 2010a).

(9) a. /máyò̙ mí rà wé/ (H L H L H)     → [máyô̙ mǐ râ wě]        ‘Mayomi bought books’
    b. /kà zóoŋ lìen thúm/ (L H L H)    → [kà zòoŋ líen thǔm]    ‘my three big monkeys’
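The interaction in (9) can be sketched as two small functions over per-syllable tones. The first adds perseverative spreading as in (7a). The second is a Kuki-Thaadow-style repair that keeps only the first part of every non-final contour. Both make simplifying assumptions, chosen so that they reproduce (9a) and (9b). Spreading is computed from the input tones only (non-iterative), only H and L are handled, and the function names are hypothetical.

```python
def spread(tones: list[str]) -> list[str]:
    """Perseverative spreading (7a): each tone also docks on the next TBU,
    so an unlike neighbour becomes a contour (L-H -> L-LH, H-L -> H-HL).
    Computed from the input tones only (non-iterative)."""
    if not tones:
        return []
    out = [tones[0]]
    for prev, cur in zip(tones, tones[1:]):
        out.append(prev + cur if prev != cur else cur)
    return out

def final_contours_only(tones: list[str]) -> list[str]:
    """Kuki-Thaadow-style repair: non-final contours keep only their first part."""
    return [t[0] for t in tones[:-1]] + tones[-1:]

# (9a) Yoruba máyò mí rà wé: H L H L H
print(spread(["H", "L", "H", "L", "H"]))
# ['H', 'HL', 'LH', 'HL', 'LH']  = [máyô mǐ râ wě]

# (9b) Kuki-Thaadow kà zóoŋ lìen thúm: L H L H
print(final_contours_only(spread(["L", "H", "L", "H"])))
# ['L', 'L', 'H', 'LH']  = [kà zòoŋ líen thǔm]
```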

In languages with a M tone, spreading most likely occurs between Hs and Ls, where the interval is greater than between M and either H or L. However, Gwari [Benue-Congo; Nigeria] (Hyman and Schuh 1974: 88–89) not only has HTS and LTS of the sort illustrated in (9), but also /M-L/ is realized M-ML, e.g. /ōzà/ → ōza᷆ ‘person’. We thus expect tone spreading to follow a hierarchy of likely occurrence, HTS > LTS > MTS, where HTS is most common and MTS quite rare. While the above examples involve an interaction between H and L tones, spreading can also occur in privative systems. In /H/ vs. Ø systems, H tone often spreads over multiple TBUs, as in Yaqui [Uto-Aztecan; Mexico] /téeka/ → tééká ‘sky’, /tá-tase/ → tá-tásé ‘is coughing’ (Demers et al. 1999: 40). In both Bagiro [Central Sudanic; Democratic Republic of Congo] (Boyeldieu 1995: 134) and Zenzontepec Chatino [Oto-Manguean; Mexico] (Campbell 2016: 147), a H tone will spread through any number of Ø (or L) TBUs until it reaches either pause or a H or M tone, which will be downstepped, as in the Zenzontepec Chatino example (10).

(10) ta tāká tzaka nkwítza → [tà tāká tzáká nkwítzá]
     ‘there already was a child’ (already exist one child) (Campbell 2016: 148)

In other languages HTS can be restricted from applying onto a L (or Ø) TBU that is followed by another H. Thus, in Chibemba [Bantu; Zambia], /bá-la-kak-a/ ‘they tie up’ is realized bá-lá-kàk-à, with HTS from the subject prefix /bá-/ onto the following tense marker, while /bá-la-súm-a/ ‘they bite’ is realized bá-là-súm-á utterance-medially. As seen, the /H/ of /bá-/ cannot spread, because it would bump into the H of /-súm-/, a violation of the ‘Obligatory Contour Principle’, so named by Goldsmith (1976a, 1976b) for a prohibition against two identical autosegments adjacent in a melody. (The /H/ of the verb root /-súm-/ does, however, spread onto the final inflectional suffix /-a/.)
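The Chibemba blocking effect can be sketched as a one-step spreading rule with an OCP check. This is a minimal sketch under simplifying assumptions fitted to the two utterance-medial forms above. Each underlying H spreads at most one TBU to the right onto a toneless TBU, encoded as `None`. Spreading is blocked if the TBU after the target bears an underlying H. Remaining toneless TBUs get default L. The function name is hypothetical.

```python
from typing import Optional

def hts_with_ocp(tbus: list[Optional[str]]) -> list[str]:
    """Spread each underlying H one TBU rightward onto a toneless (None) TBU,
    unless the next TBU after the target bears an underlying H (OCP block).
    Toneless TBUs left over receive default L."""
    out = list(tbus)
    for i, tone in enumerate(tbus):
        target = i + 1
        if tone != "H" or target >= len(tbus) or tbus[target] is not None:
            continue
        blocked = target + 1 < len(tbus) and tbus[target + 1] == "H"
        if not blocked:
            out[target] = "H"
    return [t if t is not None else "L" for t in out]

# /bá-la-kak-a/ 'they tie up' and /bá-la-súm-a/ 'they bite' (utterance-medial)
print(hts_with_ocp(["H", None, None, None]))  # ['H', 'H', 'L', 'L']  bá-lá-kàk-à
print(hts_with_ocp(["H", None, "H", None]))   # ['H', 'L', 'H', 'H']  bá-là-súm-á
```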

4.3.3 Contour simplification

We have already seen in some of the examples that a common process is contour tone simplification. As mentioned in §4.2.5, languages frequently limit the distribution of contour tones, requiring that their beginning or end points be preceded or followed by a like or unlike tone height. The Kuki-Thaadow example in (9b) shows another tendency, which is to restrict contour tones to the final syllable of a phrase or utterance. A major motivator of contour simplification is the general principle of minimizing the number of ups and downs, a potential problem that becomes particularly acute when a contour is surrounded by unlike tone heights. Table 4.3 lists various fates of the L-HL-H input sequence in the indicated Grassfields Bantu languages [Bantoid; Cameroon] (Hyman 2010b: 71).

Table 4.3 Different contour simplifications of L-HL-H

Language          Output     Process                   Source
Mankon            L-H-ꜛH     H-upstep                  Leroy (1979)
Babanki           L-M-H      HL-fusion                 Hyman (1979b)
Babadjou          L-H-ꜜH     H-downstep                Hyman, field notes
Yemba (Dschang)   L-ꜜH-H     HL-fusion + H-downstep    Hyman and Tadadjeu (1976)
Kom               L-M-M      H-lowering                Hyman (2005)
Aghem             L-H-H      L-deletion                Hyman (1986b)

As seen, contour simplifications can produce a new surface-contrastive tone, such as the M in Babanki in (11), which results from the simplification of the HL contour created by n-deletion (Akumbu 2016):

(11) a. kə̀-bán + ə̀-kɔ́m → kə̀-bāː kɔ́m      ‘my fufucorn’
     b. kə̀-ŋkón + ə̀-kɔ́m → kə̀-ŋkɔ̄ː kɔ́m     ‘my fool’

Similarly, to minimize ups and downs, an input H-LH-L sequence is subject to multiple modifications in the output. A second motivation for contour simplification is tone absorption (Hyman and Schuh 1974: 90), whereby the input sequences LH-H and HL-L are simplified to L-H and H-L, respectively. In these cases the endpoint of the contour has been masked by the following like tone height. Thus, in Lango [Nilotic; Uganda], a HL falling tone derived by HTS is simplified to H: /dɔ́g gwɛ̀nò/ (mouth + chicken) → dɔ́g gwɛ̂nò → dɔ́g gwɛ́nò ‘chicken’s mouth’ (Noonan 1992: 51). Another common change is LH-L → L-HL, which occurs in Lango (Noonan 1992: 53), Isthmus Zapotec [Oto-Manguean; Mexico] (Mock 1988: 214), and elsewhere. In this case the more marked LH rising tone is avoided and a less marked HL falling tone results.
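The two simplifications just described can be sketched as rewrite rules over a list of per-TBU tones. Contours are encoded as two-letter strings. The rule ordering (absorption checked first, in a single left-to-right pass) is an assumption of this sketch, not a claim about the grammar of Lango or any other language.

```python
def simplify_contours(tones: list[str]) -> list[str]:
    """One left-to-right pass over adjacent TBUs.
    - Tone absorption: LH-H -> L-H and HL-L -> H-L (the contour's endpoint
      is masked by an identical level tone that follows it).
    - LH-L -> L-HL: the marked rise gives way to a less marked fall."""
    out = list(tones)
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if len(cur) == 2 and cur[1] == nxt:
            out[i] = cur[0]
        elif cur == "LH" and nxt == "L":
            out[i], out[i + 1] = "L", "HL"
    return out

# Lango dɔ́g gwɛ̂nò (H HL L) -> dɔ́g gwɛ́nò (H H L)
print(simplify_contours(["H", "HL", "L"]))  # ['H', 'H', 'L']
print(simplify_contours(["LH", "H"]))       # ['L', 'H']
print(simplify_contours(["LH", "L"]))       # ['L', 'HL']
```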

4.3.4 Dissimilation and polarity

The tone processes discussed above are all either assimilatory or simplifications of contours and other ‘ups and downs’. As in segmental phonology, there are also processes that are dissimilatory in nature. In Munduruku [Tupi; Brazil], a L tone becomes H after /L/, as in /è + dìŋ/ (tobacco + smoke) → è-díŋ ‘tobacco smoke’ (Picanço 2005: 312). Besides tone levels, contours can dissimilate, as when the Hakha Lai [Tibeto-Burman; Myanmar, North East India] LH rising tone becomes falling HL after another LH rising tone, as in /ka kǒoy hrǒm/ → ka kǒoy hrôm ‘my friend’s throat’ (Hyman and VanBik 2004: 832). Similar ‘contour metatheses’ occur in various Chinese dialects, e.g. Pingyao hai35 + bing35 → hai53 bing35 ‘become ill’ (Chen 2000: 15). Even disyllabic sequences can dissimilate. Thus in Cuicateco [Oto-Manguean; Mexico], a sequence of /M-L/ + /M-L/ becomes L-M + M-L, as in /ntōʔò/ ‘all’ + /ʔīnù/ ‘three’ → ntòʔō ʔīnù ‘all three’ (Needham and Davis 1946: 145). Where it is not desirable to posit an underlying tone, a morpheme may instead receive the ‘polar’ tone opposite to what precedes or follows. In Eastern Kayah Li (Karen) [Tibeto-Burman; Myanmar], which distinguishes /H, M, L/, prefixes contrast in tone before a M root: ʔì-lū ‘the Kayah New Year festival’ vs. ʔí-vī ‘to whistle’. However, prefixes are H- before /L/ and L- before /H/, as in ʔí-lò ‘to plant (seeds)’ and ʔì-khré ‘to winnow’ (Solnit 2003: 625). In many analyses morphemes with polar tone are analysed as underlyingly toneless, receiving their tone by context. This is so for Margi [Chadic; Nigeria], discussed at considerable length by Pulleyblank (1986: 203–214), as well as Fuliiru [Bantu; Democratic Republic of Congo], which contrasts /H/, /L/, and /Ø/ verb roots, the last behaving like /H/ or /L/ verb roots in different parts of the paradigm (Van Otterloo 2014: 386).

4.4 Grammatical tone

In this section we consider grammatical functions of tone. While tone is (almost) completely lexical in many languages (e.g. most Chinese languages), there are other languages where tone is largely grammatical, for example marking morphological classes, morphological processes, and ultimately syntactic configurations as well as semantic and pragmatic functions such as negation and focus. For example, in the Igboid language Aboh [Benue-Congo; Nigeria], the difference between affirmative and negative can be solely tonal: ò jè kò ‘s/he is going’ vs. ó jé kò ‘s/he is not going’. Grammatical functions of tone are as varied as grammar itself. From the above examples we see that tone can function alone as a morpheme. It follows therefore that if tone can be a morpheme, it can do everything that a (segmental) morpheme can do, such as mark singular/plural, case, person, tense, aspect, and of course negation (Hyman and Leben 2000: 588). Moreover, tone vastly surpasses segmental phonology in encoding syntactically dependent prosodic domains (cf. §4.4.6).

4.4.1 Lexical versus morphological tone

It is clear that tone can be a property of either lexical morphemes (nouns, verb roots, etc.) or grammatical elements (pronouns, demonstratives, prefixes, suffixes, clitics, etc.). There may of course be generalizations concerning the distribution of tones by word class. For example, in Mpi [Tibeto-Burman; Thailand], nouns contrast /H, M, L/ (sí ‘four’, sī ‘a colour’, sì ‘blood’) while verbs contrast /MH, LM, HL/ (sı᷄ ‘to roll’, sı᷅ ‘to be putrid’, sî ‘to die’) (Matisoff 1978). However, the term ‘grammatical tone’ does not usually refer to tonal contrasts on segmental morphemes, such as the H and L tones of the subject pronouns à ‘I’, ò ‘he’, and á ‘she’ in Kalabari [Ijoid; Nigeria] (Jenewari 1977: 258–259). In such cases the tone is clearly linked to its TBU and not assigned by a grammatical process. In the following subsections, ‘grammatical tone’ will refer to cases either where tone is the sole exponent of morphology or where morphology introduces tonal exponents that are realized independently of any segmental morpheme that may accompany the tone.

4.4.2 Tonal morphemes

The most straightforward type of grammatical tone is where the tone is the only exponent of a morphological distinction. Typically called a ‘tonal morpheme’, its position can sometimes be established within a string of (segmental) morphemes. For example, the subject H tone of Yoruba [Benue-Congo; Nigeria] occurs exactly between the subject and verb: ō̙mō̙ + ´ + lō̙ → ō̙mó̙ lō̙ ‘the child went’ (Akinlabi and Liberman 2000: 35). Similarly, the H genitive (‘associative’) marker of Igbo [Benue-Congo; Nigeria], often translatable as ‘of’, can be located between the two nouns in /àlà/ ‘land’ + ´ + /ìgbò/ ‘Igbo’ → àlá ìgbò ‘Igboland’ (Emenanjo 1978: 36). Such tonal morphemes can have any shape (L, M, etc.) and can even occur in sequences. In other cases it is harder to analyse morphological tones as items to be arranged in a sequence with segmental morphemes. Instead, individual H or L tones may be assigned to various positions within a paradigm. In the /H/ vs. Ø language Kikuria [Bantu; Kenya], there is no lexical tone contrast on verb roots. Instead, different inflectional features assign a H tone to the first, second, third, or fourth mora of the verb stem. In the examples in Table 4.4, the stem is bracketed and the mora receiving the H is underlined. As also seen, this H then spreads to the penultimate vowel (Marlo et al. 2014: 279).


Table 4.4 H-tone stem patterns in Kikuria

μ1   ntoo-[kó̱óndókóra]       ‘indeed we have already uncovered’
μ2   ntooɣa-[koó̱ndókóóye]    ‘indeed we have been uncovering’
μ3   ntore-[koondó̱kóra]      ‘we will uncover (then)’
μ4   tora-[koondokó̱ra]       ‘we are about to uncover’

Untimed past anterior Hodiernal past progressive Anterior focused Remote future focused Inceptive
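The Kikuria pattern in Table 4.4 can be sketched as "assign H to mora n of the stem, then spread it up to the penultimate mora". This is a minimal sketch. The mora segmentation used in the example is an assumption: the stem is split ko-o-ndo-ko-ra, with the prenasalized ndo treated as a single mora. Toneless moras are shown as Ø.

```python
def kikuria_stem_tones(n_moras: int, h_mora: int) -> list[str]:
    """H assigned to mora `h_mora` (1-based) of the stem spreads rightward
    up to the penultimate mora; all other moras stay toneless (Ø)."""
    if not 1 <= h_mora <= n_moras - 1:
        raise ValueError("H must land on a mora before the final one")
    penult = n_moras - 2  # 0-based index of the penultimate mora
    return ["H" if h_mora - 1 <= i <= penult else "Ø" for i in range(n_moras)]

# μ3: ntore-[koondó̱kóra], stem moras assumed to be ko-o-ndo-ko-ra
print(kikuria_stem_tones(5, 3))  # ['Ø', 'Ø', 'H', 'H', 'Ø']
# μ2: ntooɣa-[koó̱ndókóóye], stem moras assumed to be ko-o-ndo-ko-o-ye
print(kikuria_stem_tones(6, 2))  # ['Ø', 'H', 'H', 'H', 'H', 'Ø']
```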

4.4.3 Replacive tone

In other cases a morphological process may assign a ‘replacive’ tone or tonal schema. Table 4.5 gives examples from Kalabari [Ijoid; Nigeria], where a LH ‘melody’ replaces the contrastive verb tones in deriving the corresponding intransitive verb (Harry and Hyman 2014: 650).

Table 4.5 Detransitivizing LH replacive tone in Kalabari

Transitive                                   Intransitive
kán        H        ‘tear, demolish’         kàán       LH       ‘tear, be demolished’
kɔ̀n        L        ‘judge’                  kɔ̀ɔ́n       LH       ‘be judged’
ányá       H-H      ‘spread’                 ànyá       L-H      ‘be spread’
ɗìmà       L-L      ‘change’                 ɗìmá       L-H      ‘change’
sáꜜkí      H-ꜜH     ‘begin’                  sàkí       L-H      ‘begin’
kíkímà     H-H-L    ‘hide, cover’            kìkìmá     L-L-H    ‘be hidden, covered’
pákìrí     H-L-H    ‘answer’                 pàkìrí     L-L-H    ‘be answered’
gbólóꜜmá   H-H-ꜜH   ‘join, mix up’           gbòlòmá    L-L-H    ‘be joined, mixed up’

As seen, the LH melody is realized as a LH rising tone (with vowel lengthening) on monosyllables, L-H on two syllables, and L-L-H on trisyllabic verbs. In (12), denominal adjectives are derived via replacive H tone in Chalcatongo Mixtec [Oto-Manguean; Mexico] (Hinton et al. 1991: 154; Macaulay 1996: 64), while deadjectival verbs are derived via replacive L in Lulubo [Central Sudanic; South Sudan] (Andersen 1987a: 51).

(12) a. Chalcatongo Mixtec
        bīkò    ‘cloud’       → bíkó    ‘cloudy’
        tānà    ‘medicine’    → táná    ‘medicinal’
        sòʔò    ‘ear’         → sóʔó    ‘deaf’
        žūù     ‘rock’        → žúú     ‘solid, hard’
     b. Lulubo
        ōsú     ‘good’        → òsù     ‘become good’
        àkēlí   ‘red’         → àkèlì   ‘become red’
        álí     ‘deep’        → àlì     ‘become deep’
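The replacive patterns in Table 4.5 and (12) can be sketched as a function that overwrites a word's tones with a melody. This is a minimal sketch. One-tone melodies (Chalcatongo H, Lulubo L) cover every syllable. A two-tone melody is mapped the way Kalabari LH is described above: it forms a contour on a monosyllable, and otherwise its last tone goes on the final syllable. The function name and the restriction to one- and two-tone melodies are assumptions of the sketch.

```python
def map_melody(melody: list[str], n_syllables: int) -> list[str]:
    """Replace a word's tones with a morphological melody (Kalabari-style
    mapping for two-tone melodies; one-tone melodies spread throughout)."""
    if n_syllables < 1:
        raise ValueError("need at least one syllable")
    if len(melody) == 1:
        return melody * n_syllables
    if len(melody) == 2:
        if n_syllables == 1:
            return ["".join(melody)]          # contour on a monosyllable
        return [melody[0]] * (n_syllables - 1) + [melody[1]]
    raise ValueError("only one- and two-tone melodies are handled here")

for n in (1, 2, 3):
    print(n, map_melody(["L", "H"], n))  # ['LH'], ['L', 'H'], ['L', 'L', 'H']
print(map_melody(["L"], 3))              # Lulubo àkèlì: ['L', 'L', 'L']
```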

Replacive tones are found in Asia as well, such as in Southern Vietnamese [Mon-Khmer; Vietnam] (Thompson 1965) and White Hmong [Hmong-Mien; China] (Ratliff 1992). For cases of replacive tone conditioned by phrasal domains see §4.4.6.


4.4.4 Inflectional tonology

The above examples show that tone can directly mark derivational processes (to which we return in §4.4.5). It may also mark inflectional morphology, specifically morphosyntactic features such as person, number, gender, tense, and aspect. Thus, in Ronga [Nilo-Saharan; Chad, Central African Republic] certain nouns mark their plural by assigning a H tone: tə̀ù ‘flour’ (pl. tə́ú), ndòbó ‘meat’ (pl. ndóbó) (Nougayrol 1989: 27). A similar H tone plural effect on possessive determiners occurs in Kunama [Nilo-Saharan; Eritrea] (Connell et al. 2000: 17), as shown in Table 4.6.

Table 4.6 Possessive determiners in Kunama

                              Singular    Plural
First person (exclusive)      -àaŋ        -áaŋ
Second person                 -èy         -éy
Third person                  -ìy         -íy
First person (inclusive)                  -íŋ

As seen, the segmental morphs mark person, while the tones mark number (L for singular, H for plural). Similar alternations due to number are seen in Table 4.7 for noun class 9/10 in Noni [Bantoid; Cameroon] (Hyman 1981: 10).

Table 4.7 Noni SG~PL alternations in noun class 9/10

        Stem tone   Singular             Plural               Gloss         Alternation
(i)     /L/         jòm   /` + jòm/      jo᷆m   /´ + jòm/      ‘antelope’    L vs. ML
(ii)    /LH/        bìè   /` + bìé/      bíé   /´ + bìé/      ‘fish’        L vs. H
(iii)   /HL/        bìē   /` + bíè/      bīē   /´ + bíè/      ‘goat’        LM vs. M
(iv)    /H/         bwě   /` + bwé/      bwé   /´ + bwé/      ‘dog’         LH vs. H

As indicated, from a two-height system, Noni developed a H, M, L surface contrast, where most occurrences of *H became M. The main exception is the plural of ‘fish’: in this case the expected HLH sequence simplified to H. A similar situation arises in Day [Adamawa; Chad] in the marking of aspect (Nougayrol 1979: 161). Although the language contrasts surface H, M, and L, we again recognize inputs /H/ and /L/ in Table 4.8.

Table 4.8 Day completive/incompletive aspect alternations

                      /yúú/ ‘put on, wear’    /yùù/ ‘drink’
Completive (H-)       yúú                     yūū
Incompletive (L-)     yūū                     yùù

As seen, when the completive H- prefix combines with the H-tone verb ‘put on, wear’, the result is a H tone. Similarly, when the incompletive L- prefix combines with the L-tone verb ‘drink’, the result is a L tone. Both H+L and L+H result in M tone yūū, which can mean either ‘put on, wear’ (incompletive) or ‘drink’ (completive). While the above cases allow us to factor out the individual tonal contributions of each morpheme (an affix and a root), such a segmentation may be difficult or impossible in other cases. Recall from Table 4.1 the inflected verb forms that were seen in Iau, here summarized in Table 4.9.

Table 4.9 Iau verb tones

                 Telic    Totality of action    Resultative
Punctual         HL       H                     LM
Durative         HLM      ML                    M
Incompletive     HM       HꜛH                   –

Although Iau verbs lend themselves to a paradigmatic display by plotting the above morphosyntactic features, the portmanteau tonal melodies do not appear to be further segmentable into single tones or features. Of course one can look for patterns, such as that telic forms begin with H followed by a L, a M, or both, but these would not be predictive. Inflectional tonology may also produce scalar effects. In Gban [Mande; Ivory Coast], a language with four tone heights (4 = highest, 1 = lowest), there are systematic effects on inflected subject markers conditioned by person and tense (Zheltov 2005: 24). As seen in Table 4.10, first and second persons are one degree higher than third, and past tense is two degrees higher than present.

Table 4.10 Inflected subject markers in Gban

                   Present            Past
                   sg.     pl.        sg.     pl.
First person       ĩ2      u2         ĩ4      u4       [+raised]
Second person      ɛɛ2     aa2        ɛɛ4     aa4      [+raised]
Third person       ɛ1      ɔ1         ɛ3      ɔ3       [-raised]
                   [-upper]           [+upper]

Although there are different ways to implement such a paradigm, Table 4.10 shows how the tonal reflexes can be nicely modelled with the features [upper] and [raised] (Yip 2002). Such an analysis is not possible in neighbouring Guébie [Kru; Ivory Coast], where (i) each of the tone heights 1–4 goes down one level in the imperfective; (ii) just in case the imperfective verb is already at level 1, the tone height of the preceding subject is raised by one level instead; and (iii) just in case the subject is already at level 4, its tone height is further raised to a super-high level 5, the only such occurrence in the language (Sande 2018).
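The Guébie scalar shift can be sketched as a small function on tone heights (1 = lowest). This is a minimal sketch of the three clauses as summarized in the text. It assumes that the lowered element is the verb's tone. It also reduces subject and verb to a single tone height each, a hypothetical simplification.

```python
def guebie_imperfective(subject: int, verb: int) -> tuple[int, int]:
    """Imperfective scalar shift on (subject, verb) tone heights 1-4:
    (i) lower the verb by one level;
    (ii) if the verb is already at 1, raise the preceding subject instead;
    (iii) a subject already at 4 thereby reaches the super-high level 5."""
    if not (1 <= subject <= 4 and 1 <= verb <= 4):
        raise ValueError("input tone heights must be 1-4")
    if verb > 1:
        return subject, verb - 1
    return subject + 1, verb

print(guebie_imperfective(2, 3))  # (2, 2): verb lowered
print(guebie_imperfective(2, 1))  # (3, 1): subject raised instead
print(guebie_imperfective(4, 1))  # (5, 1): the only source of level 5
```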


4.4.5 Compounding

In §4.4.3 we observed a number of cases where tone was the only reflex of a derivational process, such as a change in word class. Languages that have (almost) no morphology may, however, show traces of earlier derivational processes, as in the following falling-tone nominalizations in Standard Mandarin: shán ‘to fan’ → shân ‘fan’ (n.), lǐan ‘to connect’ → lîan ‘chain’, shù(´) ‘to count’ → shû ‘number’ (Wang 1972: 489). Both in Chinese and in other tone languages, tones can be modified in compounding. Thus, in Shanghainese compounds all but the tone of the first word reduce to a default L, as in (13a) (Zee 1987; cf. Selkirk and Shen 1990).

(13) a. ɕɪŋ + vəŋ (HL + LH) → ɕɪŋ vəŋ (H L) ‘news’ (< ɕɪ̂ŋ ‘new’ (HL))
        ɕɪŋ + vəŋ + tɕia (HL + LH + MH) → ɕɪŋ vəŋ tɕia (H L L) ‘news reporting circle’
        ɕɪŋ + ɕɪŋ + vəŋ + tɕi + tsɛ (HL + HL + LH + MH + MH) → ɕɪŋ ɕɪŋ vəŋ tɕi tsɛ (H L L L L) ‘new news reporter’
     b. khʌʔ + sɤ (MH + MH) → khʌʔ sɤ (M H) ‘to cough’
        khʌʔ + sɤ + dã (MH + MH + LH) → khʌʔ sɤ dã (M H L) ‘cough drops’
        khʌʔ + sɤ + jʌʔ + sr̹ + bɪŋ (MH + MH + LH + MH + LH) → khʌʔ sɤ jʌʔ sr̹ bɪŋ (M H L L L) ‘cough tonic bottle’
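The Shanghainese pattern in (13) can be sketched as a left-dominant mapping. The first syllable's contour is split over the first two syllables, and every later syllable receives default L. This is a minimal sketch. It handles only contour first tones, which is all (13) contains, and the function name is hypothetical.

```python
def shanghai_compound(tones: list[str]) -> list[str]:
    """Left-dominant sandhi sketch for compounds of two or more syllables:
    the first syllable's contour (e.g. 'HL', 'MH') is split over the first
    two syllables; every later syllable gets default L."""
    if len(tones) < 2:
        raise ValueError("a compound needs at least two syllables")
    first = tones[0]
    if len(first) != 2:
        raise ValueError("only contour first tones, as in (13), are handled")
    return [first[0], first[1]] + ["L"] * (len(tones) - 2)

print(shanghai_compound(["HL", "LH", "MH"]))              # ['H', 'L', 'L']
print(shanghai_compound(["MH", "MH", "LH", "MH", "LH"]))  # ['M', 'H', 'L', 'L', 'L']
```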

The examples in (13b) show that when the first tone is a contour, here MH, its tones map to the first two syllables, any remaining syllables receiving a default L. It is very common for elements of a compound to undergo tonal modifications. This happens also in Barasana [Tukanoan; Colombia], which contrasts H-H, H-L, L-H, and L-HL on disyllabic words. In (14), ~ marks nasality, a prosodic property of morphemes (Gomez-Imbert and Kenstowicz 2000: 433–434).

(14) a. H-L + H-L  → H-L + L-L    ~újù ~kùbà     ‘kind of fish stew’          (~kúbà ‘stew’)
        H-L + L-H  → H-L + L-L    ~kíì jècè      ‘peccary (sp.)’              (jècé ‘peccary’)
        H-L + L-HL → H-L + L-L    héè rìkà       ‘tree fruits (in ritual)’    (rìká` ‘fruits’)
     b. H-H + H-L  → H-H + H-H    ~ɨ́dé ~bídí     ‘bird (sp.)’                 (~bídì ‘bird’)
        H-H + L-H  → H-H + H-H    ~kóbé cótɨ́     ‘metal cooking pot’          (còtɨ́ ‘cooking pot’)
        H-H + L-HL → H-H + H-H    héá ~gɨ́tá-á    ‘flint stone’                (~gɨ̀tá-à ‘stone-cl’)

As seen in (14a), if the first member of the compound ends with L, the second member of the compound will be L-L. In (14b), however, where the first member ends with H, the second member is realized H-H. It is reasonable to assume that the tones of the second member have been deleted, followed by the spreading of the final H or L of the first member.
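The deletion-plus-spreading analysis just proposed for Barasana can be sketched directly. The second member's own tones are discarded, so only its length matters. The final level tone of the first member then covers every TBU of the second member. The handling of a first member ending in a contour (its last level tone is used) is an assumption: all first members in (14) end in level tones.

```python
def barasana_compound(first: list[str], second_len: int) -> list[str]:
    """Delete the second member's tones, then spread the final tone of the
    first member over all `second_len` TBUs of the second member.
    Tones are given per syllable; a final contour contributes its last part."""
    last = first[-1][-1]
    return list(first) + [last] * second_len

print(barasana_compound(["H", "L"], 2))  # ['H', 'L', 'L', 'L']  as in (14a)
print(barasana_compound(["H", "H"], 2))  # ['H', 'H', 'H', 'H']  as in (14b)
```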

Reduction or loss of tone on certain member(s) of a compound is quite widespread. In the Mande group of West Africa, such changes are very common. Known as compacité tonale in the French literature, the process applies to compounds and to certain derivational processes (Creissels 1978). The following Bambara [Mande; Mali] examples from Green (2013: 4) illustrate compacité tonale in compounds (15a), noun+adjective (15b), and noun+derivational suffix (15c) combinations.

(15) a. jàrá ‘lion’ + wòló ‘skin’       → jàrà-wóló      ‘lion skin’
        jàkúmá ‘cat’ + wòló ‘skin’      → jàkùmà-wóló    ‘cat skin’
     b. jàkúmá ‘cat’ + wárá ‘wild’      → jàkùmà-wárá    ‘feral cat’
     c. jìgí ‘hope’ + -ntan ‘neg’       → jìgì-ntán      ‘hopeless’

If the first word has a LH melody, the full compound is realized LH with the H on the final constituent and the L on preceding ones, as first formulated to our knowledge by Woo (1969: 33–34), with an acknowledgement to Charles S. Bird for help with the data (see also Leben 1973: 128; Courtenay 1974: 311). This is seen particularly clearly in more complex examples (16) (Green 2013: 9).

(16) a. fàlí ‘donkey’ + bálá ‘upon’ + yɛ̀lɛ̀n ‘climb’   → fàlì-bàlà-yɛ́lɛ́n    ‘ride a donkey’
     b. nún ‘nose’ + kɔ̀rɔ́ ‘under’ + síí ‘hair’        → nún-kɔ́rɔ́-síí       ‘moustache’
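The generalization in (15) and (16) can be sketched as follows: the first word's melody determines the tones of the whole compound. This is a minimal sketch. It assumes that only H and LH first-word melodies need handling, which covers the examples above. The melody is read off the first word's per-syllable tones by collapsing adjacent identical tones. `syllables` gives the length of each constituent.

```python
def compacite_tonale(first_word: list[str], syllables: list[int]) -> list[list[str]]:
    """Compound tones set by the first word's melody.
    LH first word: L on every constituent except the last, H on the last.
    H first word: H throughout."""
    melody = [t for i, t in enumerate(first_word) if i == 0 or t != first_word[i - 1]]
    if melody == ["H"]:
        return [["H"] * n for n in syllables]
    if melody == ["L", "H"]:
        return [["L"] * n for n in syllables[:-1]] + [["H"] * syllables[-1]]
    raise ValueError("only H and LH first words are handled in this sketch")

# (16a) fàlí + bálá + yɛ̀lɛ̀n -> fàlì-bàlà-yɛ́lɛ́n
print(compacite_tonale(["L", "H"], [2, 2, 2]))  # [['L', 'L'], ['L', 'L'], ['H', 'H']]
# (16b) nún + kɔ̀rɔ́ + síí -> nún-kɔ́rɔ́-síí
print(compacite_tonale(["H"], [1, 2, 1]))       # [['H'], ['H', 'H'], ['H']]
```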

4.4.6 Phrase-level tonology

It is commonly observed that tones have the potential for considerable mobility and mutual interaction at a distance. This is seen particularly dramatically in their behaviour at the phrase level. As an example, Giryama [Bantu; Kenya] contrasts /H/ with Ø. In (17a) all of the morphemes are toneless, and all of the TBUs are pronounced with default L pitch. In (17b), however, where only the subject prefix /á-/ differs, its /H/ is realized on the penultimate mora of the phrase (Volk 2011a: 17).

(17) a. All L tone: ‘I want ...’
        ni-na-maal-a
        ni-na-mal-a ku-guul-a            ‘... to buy’
        ni-na-mal-a ku-gul-a ŋguuwo      ‘... to buy clothes’
     b. H tone on penultimate mora: ‘he/she wants ...’
        a-na-maál-a
        a-na-mal-a ku-guúl-a             ‘... to buy’
        a-na-mal-a ku-gul-a ŋguúwo       ‘... to buy clothes’
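The Giryama pattern in (17) can be sketched by recording only where the H surfaces, not how it gets there: the phrase's single /H/, contributed by the subject prefix /á-/, lands on the penultimate mora of the phrase. This is a minimal sketch. The mora segmentation in the example (ŋguuwo split as ŋgu-u-wo) is an assumption, and the function name is hypothetical.

```python
def giryama_phrase(moras: list[str], has_h: bool) -> list[str]:
    """Put the phrase's single /H/ (if any) on the penultimate mora;
    all other moras get default L."""
    if len(moras) < 2:
        raise ValueError("sketch assumes a phrase of at least two moras")
    tones = ["L"] * len(moras)
    if has_h:
        tones[-2] = "H"
    return tones

# a-na-mal-a ku-gul-a ŋguuwo, with ŋguuwo assumed to split as ŋgu-u-wo
moras = ["a", "na", "ma", "la", "ku", "gu", "la", "ŋgu", "u", "wo"]
print(giryama_phrase(moras, has_h=True))   # H only on the mora 'u' of ŋguúwo
print(giryama_phrase(moras, has_h=False))  # all L, as in (17a)
```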

Outside of tone, no other phonological property is capable of such a long-distance effect. Even less dramatic tone rules applying between words are still much richer than what is found in segmental phonology. In the Chatino languages (Cruz 2011; Campbell 2014; McIntosh 2015; Sullivant 2015; Villard 2015) the tonal processes apply throughout the clause, blocked only by a sentence boundary or pause. In other languages they apply only within certain prosodic domains that in turn are defined by the syntax. A good example of the latter occurs in Xiamen [Sinitic; Fujian province, China]. In what is known as the Southern Min Tone Circle, each of the five contrasting tones, when followed by another tone, is replaced by a different paired tone according to the schema 24, 44 → 22 → 21 → 53 → 44. Thus, in (18) (Chen 1987: 113), only the last tone remains unchanged.

(18) # yi    kiong-kiong   kio     gua   ke     k’uah   puah   tiam-tsing   ku     ts’eq #
       44    24    24      21      53    44     21      21     53    44     53     32
     → 22    22    22      53      44    22     53      53     44    22     44     32
       he    by force      cause   I     more   read    half   hour         long   book
     ‘he insisted that I read for another half an hour’
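Because the Xiamen changes in (18) are a fixed replacement schema rather than a phonetically natural rule, a lookup table is a faithful way to sketch them. Every tone in a tone group except the last is swapped for its partner on the circle. Tones outside the five-tone circle (such as the 32 at the end of (18)) are left unchanged here. That is a simplifying assumption of this sketch, not a claim about how such tones behave non-finally.

```python
# Xiamen (Southern Min) tone circle: non-final (sandhi) forms, per the schema above.
TONE_CIRCLE = {24: 22, 44: 22, 22: 21, 21: 53, 53: 44}

def tone_group_sandhi(tones: list[int]) -> list[int]:
    """Replace every non-final tone of a tone group by its circle partner;
    the final tone keeps its citation form."""
    return [TONE_CIRCLE.get(t, t) for t in tones[:-1]] + tones[-1:]

example = [44, 24, 24, 21, 53, 44, 21, 21, 53, 44, 53, 32]
print(tone_group_sandhi(example))
# [22, 22, 22, 53, 44, 22, 53, 53, 44, 22, 44, 32]
```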

The above changes take place only within what Chen (1987) calls a ‘tone group’, a phrasal prosodic domain determined by the syntax (cf. Selkirk 1986, 2011). While most of the tonal processes discussed in §4.3 were shown to be natural phonological rules, Xiamen shows that such ‘tone sandhi’ can also involve quite arbitrary replacive tone. Cases involving tone and syntactically defined prosodic domains are common, early examples being Chimwiini [Bantu; Somalia] (Kisseberth and Abasheikh 1974, 2011), Ewe [Niger-Congo; Ghana, Togo] (Clements 1978), and several additional languages described in Clements and Goldsmith (1984), Kaisse and Zwicky (1987), and Inkelas and Zec (1990). Many of these studies show that the left or right edge of a prosodic domain can be marked by a boundary tone. An example is the floating L tone in Japanese at the end of an accentual phrase (Poser 1984a; Pierrehumbert and Beckman 1988). More generally, the tonal elements illustrated here for word-level tonology, accent, and phrasal tonology play a key role in intonation. This is true in non-tone languages as well, as one can gather from Gussenhoven (2004) and chapter 4 of this volume. The behaviour of tones in lexical tone systems has provided inspiration for the analysis of intonation in tone languages and non-tone languages alike. This tradition reaches back at least as far as Liberman (1975) and Pierrehumbert (1980), as traced by Gussenhoven (2004) among others, and as evidenced in many current analyses of intonation in specific languages, including those compiled in Jun (2005a, 2014a) and Downing and Rialland (2017a).

4.5 Further issues: Phonation and tone features

As noted in §4.1.2, pitch sometimes interacts with phonation types and syllable types. For example, in some languages phonological tones are accompanied by a laryngeal gesture, such as breathiness or glottalization. Even if analysed as final /-h/ and /-ʔ/, these laryngeal gestures can affect the distribution of tones, as they do in Itunyoso Trique [Oto-Manguean; Mexico] (Table 4.11) (DiCanio 2016: 231).

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

64 larry m. hyman and william r. leben

Table 4.11 Tonal distributions in Itunyoso Trique

Tone  Open syllable          Coda /h/                  Coda /ʔ/
/4/   yu᷉4    ‘earthquake’    ya᷉h4    ‘dirt’            niʔ4    ‘see.1dual’
/3/   yu᷉3    ‘palm leaf’     ya᷉h3    ‘paper’           tsiʔ3   ‘pulque’
/2/   u᷉2     ‘nine’          tah2    ‘delicious’       ttʃiʔ2  ‘ten’
/1/   yu᷉1    ‘loose’         ya᷉h1    ‘naked’           tsiʔ1   ‘sweet’
/45/  -                      toh45   ‘forehead’        -
/13/  yo13   ‘fast (adj.)’   toh13   ‘a little’        -
/43/  ra43   ‘want’          nna᷉h43  ‘mother!’         -
/32/  ra᷉32   ‘durable’       nna᷉h32  ‘cigarette’       -
/31/  ra᷉31   ‘lightning’     -                         -

As seen, the high rising tone /45/ occurs only on CVh syllables, while only the four level tones occur on CVʔ syllables. ‘Stopped’ syllables typically have fewer contrasts than ‘smooth’ syllables in Chinese and South East Asian languages in general. Tone and phonation also interact in a variety of ways in many other languages around the world: cf. Volk (2011a), Wolff (1987), and chapters 9 (§9.6), 12 (§12.2), 23 (§23.2.2), 27 (§27.3.5), 28 (§28.2.3, §28.3.3, and §28.4.1), and 29 (§29.4 and §29.6.2). Interactions between pitch, laryngeal gesture, and syllable type can pave the way to tonogenesis and subsequent tonal splits (Haudricourt 1961; Matisoff 1973; Kingston 2011). A common pattern is for one or both of the contrasting /H/ and /L/ tones to further ‘bifurcate’ into two distinct heights, each conditioned by the voicing of the onset consonant. This likely accounts for the four-way contrast in Gban in Table 4.10. Once a language has at least a binary H vs. L contrast, the tones themselves can also interact to produce further tone heights, such as the M of Babanki in (11). While a featural analysis was provided for Gban, whether (or to what extent) tone features are needed in the phonology of tone has been questioned (Clements et al. 2010; Hyman 2010b; but cf. McPherson 2017). This is a key issue in the analysis of tone systems that remains to be resolved.

4.6 Conclusion

This chapter has offered a definition of tone broad enough to cover its various functions, behaviours, and manifestations in the languages of the world while preserving the notion that tone is the same phonological entity in all the cases discussed. Our survey has attempted to cover the general properties of tone systems and some unusual ones as well. Tone, as seen, can interact in a variety of ways with other phonetic features as well as with the abstract feature accent. The basic phonological unit to which a tone is linked can differ from language to language and may include vowels (or syllabic segments), morae, and syllables.


A separate question is the domain across which a tone or tone melody is mapped. Different tone system typologies have been based on the number of tone heights or tone shapes (potentially including contours) in the phonological inventory and on several distributional properties, contrastiveness being the most important for the phonologist. Another type of tonal typology differentiates the various types of assimilation and dissimilation that tones can undergo. Yet another aspect of tone is its function as a property of lexical morphemes, grammatical morphemes, or both, and its ability to function at the level of the syntactic or phonological phrase as well as in intonation.

We dedicate this chapter to the memory of our dear friend and colleague of over four decades, Russell G. Schuh, who loved tone as much as we do.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 5

Word-Stress Systems

Matthew K. Gordon and Harry van der Hulst

5.1 Introduction

The term ‘stress’ refers to increased prominence on one or more syllables in a word.¹ Depending on the language, stress is diagnosed in different ways: through a combination of physical properties, speaker intuitions, and phonological properties such as segmental constraints and processes. For example, the first, stressed syllable of the English word totem /ˈtoʊtəm/ is longer, louder, and realized with higher pitch than the unstressed second syllable. In addition, the /t/ in the stressed syllable is aspirated, while the unstressed vowel is reduced to schwa and the preceding /t/ is flapped. A word may also have one or more secondary stresses that are less prominent than the main (or primary) stress. For example, the word manatee /ˈmænəˌti/ has a primary stress on the first syllable and a secondary stress on the final syllable, as is evident from the non-flapped /t/ in the onset of the final syllable. In §5.2 we consider the ways in which stress is manifested phonetically, in its correlations with segments and syllables, and in speaker intuitions, while in §5.2.4 we discuss some of its distributional properties in languages generally. A summary of typological research is provided in §5.3, while §5.4 considers stress’s relation to rhythm and foot structure. Finally, §5.5 deals with some outstanding issues.

5.2 Evidence for stress

5.2.1 Phonetic exponents

Acoustic correlates of stress include increased duration, higher fundamental frequency (pitch), greater overall intensity (loudness), and spectral attributes such as an increased weighting in favour of higher frequencies and a shift in vowel quality (see chapter 10 for a discussion). There is considerable cross-linguistic variation in the properties that mark stress. In a 75-language survey, Gordon and Roettger (2017) found duration to be the most reliable correlate of stress, distinguishing stressed from unstressed syllables in over 90% of these languages. Other exponents of stress included in their survey (intensity, f0, vowel reduction, and spectral tilt) are also predictive of stress in the majority of studies. Acoustic evidence for secondary stress is more tenuous. In virtually all studies in Gordon and Roettger’s survey, secondary stress was distinguished from primary stress and/or from lack of stress by only a subset, if any, of the properties that distinguished primary stressed from unstressed syllables.

¹ Some researchers refer to ‘accent’ rather than ‘stress’; see van der Hulst (2014a) for terminological matters.

WORD-STRESS SYSTEMS 67

5.2.2 Speaker intuitions and co-speech gestures

Evidence for stress may also come from speaker intuitions, which may be accessed either directly through questioning or covertly through observation of co-speech gestures, such as beat gestures, tapping, or eyebrow movements, which tend to coincide with peaks in fundamental frequency (e.g. Tuite 1993; Cavé et al. 1996; Leonard and Cummins 2010). In the tapping task commonly employed by stress researchers, speakers are asked to simultaneously tap on a hard surface while pronouncing a word. When asked to tap once, speakers typically tap on the primary stress. Additional prompted taps characteristically coincide with secondary stresses. Tapping has been used to elicit judgements about stress not only for languages with lexically contrastive stress, such as noun–verb pairs in English (e.g. ˈimport vs. imˈport), but also for languages with predictable stress, such as Tohono O’odham [Uto-Aztecan; United States] (Fitzgerald 1997) and Banawá [Arawan; Brazil] (Ladefoged et al. 1997). The tapping diagnostic has its limitations, however, and is not successful for speakers of all languages.

5.2.3 Segmental and metrical exponents of stress

Stress also conditions various processes, many of which are phonetic or phonological manifestations of the strengthening and weakening effects discussed earlier. Stressed and unstressed vowels are often qualitatively different. Unstressed vowels are commonly centralized relative to their stressed counterparts, although unstressed high vowels are more peripheral in certain languages (see Crosswhite 2004 for the typology of vowel reduction). Unstressed vowels in English typically reduce to a centralized vowel, gradiently or categorically. Gradient reduction occurs in the first vowel in [ɛ]xplain/[ə]xplain. Such qualitative reduction is typically attributed to articulatory undershoot due to reduced duration, which precludes the attainment of canonical articulatory targets (Lindblom 1963). Categorical reduction in English can often be argued to have a derivational status, as in the case of the second vowel in ˈhum[ə]n in view of its stressed counterpart in huˈm[æ]nity, but underived reduced vowels are frequent, like those in the second syllables of totem and manatee mentioned in §5.1.

Vowel devoicing is another by-product of undershoot in the context of voiceless consonants or right-edge prosodic boundaries, contexts that are characteristically associated with laryngeal fold abduction, which may overlap with a vowel, especially if the vowel is unstressed. For example, in Tongan [Austronesian; Tonga] (Feldman 1978), an unstressed high vowel devoices when it occurs after a voiceless consonant and either before another voiceless consonant or utterance-finally, as in /ˈtuk[i̥]/ ‘strike’, /ˈtaf[u̥]/ ‘light a fire’, /ˌpas[i̥]ˈpas[i̥]/ ‘applaud’ (see Gordon 1998 for the typology of devoicing). Deletion is an extreme manifestation of reduction. For example, the first vowel in t[ə]ˈmato and the middle vowel in ˈfam[ə]ly are often absent in rapid speech. In San’ani Arabic [Afro-Asiatic; Yemen], unstressed vowels optionally delete, e.g. /fiˈhimtiː/ ~ /ˈfhimtiː/ ‘you f.sg understood’, /kaˈtabt/ ~ /ˈktabt/ ‘I wrote’ (Watson 2007: 73). Vowel deletion often parallels devoicing in displaying gradience and optionality. Furthermore, deletion is often only a perceptual effect of shortening, as articulatory traces of inaudible vowels may remain (see Gick et al. 2012).

68 MATTHEW K. GORDON AND HARRY VAN DER HULST

A complementary effect to reduction is strengthening in stressed syllables (see Bye and de Lacy 2008 for an overview). For example, short vowels in stressed non-final open syllables in Chickasaw [Muskogean; United States] are substantially lengthened (Munro and Ulrich 1984; Gordon and Munro 2007), e.g. /ʧiˌpisaˌliˈtok/ → [ʧiˌpiːsaˌliːˈtok] ‘I looked at you’, /aˌsabiˌkaˈtok/ → [aˌsaːbiˌkaːˈtok] ‘I was sick’. Stressed syllables may also be bolstered through consonant gemination, e.g. Delaware [Algonquian; United States] /nəˈmə.təmeː/ → [nəˈmət.təmeː] (Goddard 1979: xiii). Gemination in this case creates a closed and thus heavy syllable (see §5.3.3 on syllable weight). Gemination can also apply to a consonant in the onset of a stressed syllable, as in Tukang Besi [Austronesian; Indonesia] (Donohue 1999) and Urubú Kaapor [Tupian; Brazil] (Kakumasu 1986).

Stress may also have phonological diagnostics extending beyond strengthening and weakening. In the Uto-Aztecan language Tohono O’odham (Fitzgerald 1998), traditional song meter is sensitive to stress.
The basic stress pattern (subject to morphological complications not considered here) is for primary stress to fall on the first syllable and secondary stress to occur on subsequent odd-numbered syllables (Fitzgerald 2012; see §5.4 on rhythmic stress): /ˈwa-paiˌɺa-dag/ ‘someone good at dancing’, /ˈʧɨpoˌs-id-a-ˌkuɖ/ ‘branding instrument’. Although lines in Tohono O’odham songs are highly variable in their number of syllables, they are subject to a restriction against stressed syllables in the second and final positions; these restrictions trigger syllable and vowel copying processes (Fitzgerald 1998). Stress may also be diagnosed through static phonotactic restrictions, such as the confinement of tonal contrasts to stressed syllables in Trique [Oto-Manguean; Mexico] (DiCanio 2008), the restriction of vowel length contrasts to stressed syllables in Estonian [Uralic; Estonia] (Harms 1997), or the occurrence of schwa only in unstressed syllables in Dutch (van der Hulst 1984), including words where there is no evidence for an underlying full vowel.

5.2.4 Distributional characteristics of stress

There are certain properties associated with ‘canonical’ stress systems (see Hyman 2006 for a summary). One of these is the specification of the syllable as the domain of stress, a property termed ‘syllable integrity’ by Hayes (1995). Syllable integrity precludes stress contrasts between the first and second halves of a long vowel or between a syllable nucleus and a coda. Syllable integrity differentiates stress from tone, which is often linked to a sub-constituent of the syllable, the mora. Another potentially definitional characteristic of stress is ‘obligatoriness’, the requirement that every word have at least one stressed syllable. Obligatoriness precludes a system in which stress occurs on certain words but not others. Obligatoriness holds for phonological rather than morphological words; thus, a function word together with a content word, e.g. the man, constitutes a single phonological word. Unlike stress systems, canonical tone systems do not require every word to have tone. The complement of obligatoriness is ‘culminativity’, which requires that every word have at most one syllable with primary stress. Most, if not all, stress systems obey culminativity. Culminativity is not, however, definitional for stress, since it is also a property of certain tone languages that allow only a single lexically marked tone per word, such as Japanese [Japonic; Japan]. These are often called ‘restricted tone languages’ (Voorhoeve 1973).

Although syllable integrity, obligatoriness, and culminativity are characteristic of most stress systems, each of them has been challenged as a universal feature of stress systems. Certain Numic [Uto-Aztecan; United States] languages, such as Southern Paiute (Sapir 1930) and Tümpisa Shoshone (Dayley 1989), are described as imposing a rhythmic stress pattern sensitive to mora count, a system that allows either the first or the second half of a long vowel to bear stress, in violation of syllable integrity. Some languages are described as violating obligatoriness in having stressless words, such as words lacking heavy syllables in Seneca [Iroquoian; United States] (Chafe 1977) and phrase-final and isolation words of the shape CVCV(C) in Central Alaskan Yupik [Eskimo-Aleut; United States] (Miyaoka 1985; Woodbury 1987; see chapter 20 for discussion of Yupik). Other languages, such as Central Alaskan Yupik (Woodbury 1987) and Tübatulabal [Uto-Aztecan; United States] (Voegelin 1935), are said to have multiple stresses per word, none of which stands out as the primary stress, in violation of culminativity.
Hayes (1995) suggests that isolated violations of syllable integrity, obligatoriness, and culminativity are amenable to alternative analyses that preserve these three proposed universals of stress.

5.3 Typology of stress

The typology of stress systems has been extensively surveyed (e.g. Hyman 1977; Bailey 1995; Gordon 2002; Heinz 2007; van der Hulst and Goedemans 2009; van der Hulst et al. 2010; Goedemans et al. 2015). We summarize here some of the results of this research programme.

5.3.1 Lexical versus predictable stress

A division exists between languages in which stress is predictable from phonological properties such as syllable location and shape and those in which it varies as a function of morphology or lexical item. Finnish [Uralic; Finland] (Suomi et al. 2008), in which primary stress falls on the first syllable of every word, provides an example of phonologically predictable stress. At the other extreme, words in Tagalog [Austronesian; Philippines] (Schachter and Otanes 1972) may differ solely on the basis of stress, e.g. /ˈpito/ ‘whistle’ vs. /piˈto/ ‘seven’. In reality, the predictability of stress is more of a continuum than a binary division, since most languages display elements of both contrastive and predictable stress. For example, although stress in Spanish is lexically distinctive, e.g. /ˈsabana/ ‘bed sheet’ vs. /saˈbana/ ‘savannah’, it is confined to a three-syllable window at the right edge of a word, with a strong statistical preference for the penultimate syllable (Roca 1999; Peperkamp et al. 2010). Similarly, stress-rejecting affixes in Turkish Kabardian [Northwest Caucasian; Turkey] (Gordon and Applebaum 2010) create deviations from the otherwise predictable stress pattern, such as the predictable penultimate stress in /ˈməʃɐ/ ‘bear’ vs. the final stress in /məˈʃɐ/ ‘this milk’, which is attributed to the stress-rejecting prefix mə- ‘this’.

5.3.2 Quantity-insensitive stress

Phonologically predictable stress systems differ in their sensitivity to the internal structure of syllables. In languages with ‘quantity-insensitive’ or ‘weight-insensitive’ stress, stress falls on a syllable at or near the periphery of a word. For example, Macedonian [Indo-European; Macedonia] stresses the antepenultimate syllable of words (Lunt 1952; Franks 1987): /voˈdeniʧar/ ‘miller’, /vodeˈniʧari/ ‘miller-pl’, /vodeniˈʧarite/ ‘miller-def.pl’. Surveys reveal five robustly attested locations of ‘fixed stress’: the initial, the second, the final, the penultimate, and the antepenultimate syllables. Stress on the third syllable is a more marginal pattern, reported for Ho-chunk [Siouan; United States] (but see discussion of alternative tonal analyses in Hayes 1995) and as the default pattern in certain languages with lexical stress, such as Azkoitia Basque [isolate; Spain] (Hualde 1998). Three stress locations (initial, penultimate, and final) statistically predominate, as illustrated in Figure 5.1, based on the StressTyp2 (Goedemans et al. 2015) database of 699 languages.
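Because quantity-insensitive stress depends only on a syllable's position, each of the five robustly attested fixed-stress locations reduces to an index computed from word length. The sketch below is illustrative only. The function names are hypothetical, syllabification is assumed to be given, and clamping stress onto the nearest existing syllable in words too short for the target position is an assumption of the sketch, since languages differ in how they treat such words.

```python
def fixed_stress(n_syllables, position):
    """Return the 0-based index of the stressed syllable in a word."""
    if n_syllables < 1:
        raise ValueError("a word needs at least one syllable")
    targets = {
        "initial": 0,
        "peninitial": 1,
        "antepenult": n_syllables - 3,
        "penult": n_syllables - 2,
        "final": n_syllables - 1,
    }
    # Assumption: words too short to contain the target syllable get
    # stress on the nearest syllable that does exist.
    return min(max(targets[position], 0), n_syllables - 1)


def mark_stress(syllables, index):
    """Write the IPA primary-stress mark before the stressed syllable."""
    return "".join(("ˈ" if i == index else "") + s for i, s in enumerate(syllables))


# Macedonian (antepenultimate stress), pre-syllabified by hand.
for word in (["vo", "de", "ni", "ʧar"],
             ["vo", "de", "ni", "ʧa", "ri"],
             ["vo", "de", "ni", "ʧa", "ri", "te"]):
    print(mark_stress(word, fixed_stress(len(word), "antepenult")))
# prints voˈdeniʧar, vodeˈniʧari, vodeniˈʧarite (one per line)
```

The stress moves rightward by one syllable with each added suffix syllable, exactly as in the Macedonian forms above, because the index is counted from the right edge.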

5.3.3 Quantity-sensitive stress

In many languages, stress is sensitive to the internal structure or ‘weight’ of syllables, where the criteria for which syllables count as ‘heavy’ vary across languages (Hayes 1989a; Gordon 2006). For example, in Piuma Paiwan [Austronesian; Taiwan] (Chen 2009b), stress typically falls on the penultimate syllable of a word: /kuˈvuvu/ ‘my grandparents’, /səmuˈkava/ ‘to take off clothes’. However, if the penult is light, i.e. contains a schwa, stress migrates rightward to the final syllable (even if it too contains schwa): /qapəˈdu/ ‘gall’, /ʎisəˈqəs/ ‘nit’. The rejection of stress by schwa is part of a cross-linguistic weight continuum in which, in some languages, non-low central vowels are lighter than peripheral vowels. Among peripheral vowels, languages may treat low vowels as heavier than non-low vowels or non-high vowels as heavier than high vowels (Kenstowicz 1997; de Lacy 2004; Gordon 2006); see, however, Shih (2016, 2018) and Rasin (2016) on the paucity of compelling evidence for vowel-quality-based stress.

Figure 5.1 Number of languages with different fixed-stress locations according to StressTyp2 (Goedemans et al. 2015). [Bar chart; x-axis: fixed-stress location (Initial, Peninitial, Antepenult, Penult, Final); y-axis: number of languages, 0 to 200.]

It is more common for a weight-sensitive stress system to be sensitive to the structure of the syllable rime than to vowel quality (see Gordon 2006 for statistics). Many languages treat syllables with long vowels (CVV) as heavier than those with short vowels, while others treat both CVV and closed syllables (CVC) as heavy. For example, in Kabardian (Abitov et al. 1957; Colarusso 1992; Gordon and Applebaum 2010), stress falls on a final syllable if it is either CVV or CVC, and otherwise on the penult: /sɐˈbən/ ‘soap’, /saːˈbiː/ ‘baby’, /ˈwənɐ/ ‘house’, /χɐrˈzənɐ/ ‘good’. Tone may also condition stress in some languages, where higher tones are preferentially stressed over lower tones (de Lacy 2002).

In some languages, weight is scalar (Hayes 1995; Gordon 2006), and in others, weight is sensitive to onset consonants (Gordon 2005b; Topintzi 2010; see §5.3.3). Pirahã [Mura-Pirahã; Brazil] (Everett and Everett 1984a; Everett 1998) observes a scalar weight hierarchy that simultaneously appeals to both onset and rimal weight: stress falls on the rightmost heaviest syllable within a three-syllable window at the right edge of a word. The Pirahã weight scale is KVV > GVV > VV > KV > GV, where K stands for a voiceless onset and G for a voiced onset.
Onset-sensitive weight is rare compared to rime-sensitive weight. Of 136 languages with weight-sensitive stress in Gordon’s (2006) survey, only four involve onset sensitivity (either presence vs. absence or type of onset). The primacy of rimal weight is mirrored language-internally: onset weight almost always implies rimal weight, and, where the two coexist, rimal weight takes priority over onset weight. This dependency is exemplified in Pirahã, where a heavier rime (one consisting of a long vowel) outweighs a heavier onset (one containing a voiceless consonant)—that is, GVV outweighs KV.
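The three quantity-sensitive patterns described above (Piuma Paiwan, Kabardian, and Pirahã) can be sketched as small functions over pre-syllabified input. These are illustrative simplifications, not analyses from the cited sources. Syllabification, and the classification of each syllable into a shape label (CV, CVC, CVV for Kabardian; KVV, GVV, VV, KV, GV for Pirahã), are taken as given, and schwa is detected simply by the character ə. The function names and the numeric ranks standing in for the Pirahã scale are hypothetical; only their ordering comes from the text.

```python
def paiwan_stress(syllables):
    """Piuma Paiwan: stress the penult unless it contains schwa, in which
    case stress the final syllable (even if that also contains schwa)."""
    if len(syllables) == 1:
        return 0
    penult = len(syllables) - 2
    return penult + 1 if "ə" in syllables[penult] else penult


def kabardian_stress(shapes):
    """Kabardian: stress a final CVV or CVC syllable, otherwise the penult."""
    final = len(shapes) - 1
    if final == 0 or shapes[final] in ("CVV", "CVC"):
        return final
    return final - 1


# Pirahã scale from the text, KVV > GVV > VV > KV > GV, encoded as ranks.
# Only the ordering matters; the numbers themselves are arbitrary.
PIRAHA_RANK = {"KVV": 5, "GVV": 4, "VV": 3, "KV": 2, "GV": 1}


def piraha_stress(shapes):
    """Pirahã: rightmost heaviest syllable within the last three syllables."""
    start = max(len(shapes) - 3, 0)
    best = start
    for i in range(start, len(shapes)):
        # ">=" lets a later syllable win ties, so the rightmost one is chosen.
        if PIRAHA_RANK[shapes[i]] >= PIRAHA_RANK[shapes[best]]:
            best = i
    return best


print(paiwan_stress(["qa", "pə", "du"]))         # 2, i.e. qapəˈdu
print(kabardian_stress(["CVC", "CV", "CV"]))     # 1, i.e. χɐrˈzənɐ
print(piraha_stress(["KVV", "GV", "GV", "KV"]))  # 3: the initial KVV is outside the window
```

The Pirahã function also reflects the point just made about rimal priority: given the ranks, a GVV syllable beats a KV syllable anywhere inside the window.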

5.3.4 Bounded and unbounded stress

In the stress systems discussed thus far, stress is limited, or ‘bounded’, to a range of syllables at a word edge. For example, in Piuma Paiwan, which avoids stress on schwa (Chen 2009b; §5.3.3), stress falls on one of the last two syllables, even if there is a peripheral vowel to the left of the penult and the final two syllables both contain schwa. Stress windows are also observed at the left edge in some languages. In Capanahua [Panoan; Peru] (Loos 1969; Elías-Ulloa 2009), stress falls on the second syllable if it is closed, but on the first otherwise, as seen in /ˈmapo/ ‘head’, /ˈwaraman/ ‘squash’, /piʃˈkap/ ‘small’, /wiˈrankin/ ‘he pushed it’ (see van der Hulst 2010a for more on window effects for weight-sensitive stress). As the word /ˈwaraman/ indicates, stress is limited to the first two syllables even if these are light and a later syllable is heavy. Lexical stress may also be bound to stress windows. For example, Choguita Rarámuri [Uto-Aztecan; Mexico] (Caballero and Carroll 2015) has lexically contrastive stress operative within a three-syllable window at the left edge of a word, where the default stress location is the second syllable: /ˈhumisi/ ‘run away pl’ vs. /aˈsisi/ ‘get up’ vs. /biniˈhi/ ‘accuse’. When a lexically stressed suffix attaches to a root with default second-syllable stress, stress shifts to the suffix unless the suffix falls outside the left-edge three-syllable window. For example, the conditional suffix /sa/ attracts stress in /ru-ˈsa/ ‘s/he is saying’ and /ʧapi-ˈsa/ ‘s/he is grabbing’, but not in /ruruˈwa-sa/ ‘s/he is throwing liquid’.

Not all weight-sensitive or lexical stress systems are bounded. For example, stress in Yana [isolate; California] (Sapir and Swadesh 1960) is ‘unbounded’, falling on the leftmost heavy syllable (CVV or CVC) regardless of its position in a word. In words lacking a heavy syllable, stress defaults to the initial syllable. Languages with unbounded stress may have either initial stress in the default case, as in Yana, or default final stress, as in Kʷak’ʷala [Wakashan; Canada] (Boas 1947; Bach 1975; Wilson 1986; Shaw 2009; Gordon et al. 2012). If, in languages with unbounded stress, several morphemes with inherent stress are combined into a complex word, the leftmost or rightmost among them attracts stress. This situation parallels unbounded weight-sensitive stress if lexical stress is viewed as diacritic weight (van der Hulst 2010a). In both cases, stress defaults to the first or last syllable (or the peninitial or penult, if extrametricality/non-finality applies) if no heavy syllable is present. A case in point is Russian [Indo-European; Russia], in which primary stress falls on the rightmost syllable with diacritic weight and on the first syllable if there is no syllable with diacritic weight: /gospoˈʒa/ ‘lady’, /koˈrova/ ‘cow’ vs. /ˈzʲerkalo/ ‘mirror’, /ˈporox/ ‘powder’ (Halle 1973).
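The contrast between bounded and unbounded systems can be made concrete with two small functions. The first is a left-edge two-syllable window for Capanahua. The second is a single "leftmost or rightmost heavy syllable, else a default edge" rule that covers both Yana's phonological weight and Russian's diacritic weight. This is a sketch under simplifying assumptions. Syllable shapes and weight flags are supplied by hand, the function names are hypothetical, the Yana inputs are schematic rather than real words, and the Russian diacritic-weight flags are simply inferred from the stress positions cited above.

```python
from typing import List


def capanahua_stress(shapes: List[str]) -> int:
    """Bounded: stress the second syllable if it is closed, else the first."""
    if len(shapes) > 1 and shapes[1].endswith("C"):
        return 1
    return 0


def unbounded_stress(heavy: List[bool], pick: str, default: str) -> int:
    """Unbounded: stress the leftmost or rightmost heavy syllable anywhere in
    the word; with no heavy syllable, fall back to the initial or final one."""
    heavy_positions = [i for i, is_heavy in enumerate(heavy) if is_heavy]
    if heavy_positions:
        return heavy_positions[0] if pick == "leftmost" else heavy_positions[-1]
    return 0 if default == "initial" else len(heavy) - 1


# Capanahua: waraman has a heavy third syllable, but it lies outside the window.
print(capanahua_stress(["CV", "CV", "CVC"]))  # 0
# Yana-style (schematic weights): leftmost heavy syllable, default initial.
print(unbounded_stress([False, False, True, True], "leftmost", "initial"))  # 2
# Russian-style: rightmost diacritically heavy syllable, default initial.
print(unbounded_stress([False, True, False], "rightmost", "initial"))  # 1, koˈrova
```

Treating lexical marking as a weight flag is what lets one function serve both Yana and Russian, mirroring the view of lexical stress as diacritic weight mentioned in the text.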

5.3.5 Secondary stress

In certain languages, longer words may have one or more secondary stresses. In some, there may be a single secondary stress at the opposite edge from the primary stress. For example, in Savosavo [Central Solomon Papuan; Solomon Islands] (Wegener 2012), primary stress typically falls on the penult with secondary stress on the initial syllable, as in /ˌsiˈnoqo/ ‘cork’, /ˌkenaˈɰuli/ ‘fishing hook’. In other languages, secondary stress rhythmically propagates either from the primary stress or from a secondary stress at the opposite edge from the primary stress. Rhythmic stress was exemplified earlier (see §5.2.3) for Tohono O’odham, in which primary stress falls on the first syllable and secondary stress falls on subsequent odd-numbered syllables: /ˈwa-paiˌɺa-dag/ ‘someone good at dancing’, /ˈʧɨpoˌs-id-a-ˌkuɖ/ ‘branding instrument’. Languages with a fixed primary and a single fixed secondary stress are relatively rare compared to those with rhythmic stress. In Gordon’s (2002) survey of 262 quantity-insensitive languages, only 15 feature a single secondary stress, compared to 42 with rhythmic secondary stress. Both, though, are considerably rarer than single fixed stress systems, which number 198 in Gordon’s survey, although it is conceivable that some languages for which only primary stress is described may turn out to have secondary stress. Even rarer are hybrid ‘bidirectional’ systems, in which one secondary stress ‘wave’ radiates from the primary stress while a single secondary stress occurs at the opposite edge of the word. For example, primary stress in South Conchucos Quechua [Quechuan; Peru] (Hintz 2006) falls on the penult, with secondary stress docking both on the initial syllable and on alternating syllables to the left of the penult, as in /ˌwaˌraːkaˌmunqaˈnaʧi̥/ ‘I crunch up my own (e.g. prey) with teeth’. The bidirectional nature of stress leads to adjacent stresses (i.e. stress clashes) in words with an odd number of syllables. In some bidirectional systems, such as Garawa [Australian; Australia] (Furby 1974), rhythmic stress is suppressed where it would result in a stress clash.

Another rare system has stress on every third syllable. For example, primary stress in Cayuvava [isolate; Bolivia] (Key 1961, 1967) falls on the antepenultimate syllable and secondary stress falls on every third syllable to the left of the primary stress: /ikiˌtapareˈrepeha/ ‘the water is clean’, /ˌʧa.adiˌroboβuˈuruʧe/ ‘ninety-five (first digit)’. StressTyp2 (Goedemans et al. 2015) cites only two quantity-insensitive stress systems with stress on every third syllable, although there are a few quantity-sensitive stress languages (§5.3.3) in which ternary intervals occur in sequences of light syllables (see Hayes 1995).

Stanton’s (2016) survey of word length in 102 languages suggests that rhythmic stress (generalized over all subtypes) is especially prevalent in languages with longer words, whereas single stress systems are more common in languages with fewer long words. Figure 5.2 plots the median percentages of words ranging from one to four or more syllables for languages with a single stress per word (34 languages in Stanton’s database) and for those with rhythmic secondary stress (22 languages). Non-stress languages and those with other types of stress systems, such as those based on tone or those with one stress near each edge of the word, are excluded from Figure 5.2. The two sets of languages display virtually identical frequency patterns for words with two and three syllables, but differ in the relative frequency of monosyllabic words and words of at least four syllables. Monosyllables vastly outnumber (by nearly 30%) words with four or more syllables in the single stress languages, but are only marginally more numerous than long words in the languages with rhythmic stress.
This asymmetry suggests that stress lapses are dispreferred and that when the morphology of a language creates longer words in sufficient frequency, speakers tend to impose rhythmic stress patterns, which may then generalize to shorter words. A more cynical view might attribute the link between word length and rhythmic stress to the perceptual transfer of rhythmic secondary stresses by researchers accustomed to hearing secondary stresses in their native language, a phenomenon that Tabain et al. (2014) term ‘stress ghosting’ in their study of Pitjantjatjara [Australian; Australia].

Figure 5.2 Median percentages of words with differing numbers of syllables in languages with a single stress per word and those with rhythmic secondary stress in Stanton (2016). [Bar chart in two panels, Single Stress and Rhythmic Stress; x-axis: word length in syllables (1, 2, 3, 4+); y-axis: median % of words of different lengths, 0 to 45.]
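The Savosavo and South Conchucos Quechua patterns described in §5.3.5 differ only in whether secondary stresses propagate leftward from the penult. Both can therefore be expressed with one helper that builds a list of stress levels (2 = primary, 1 = secondary, 0 = unstressed). The encoding, the function names, and the treatment of very short words (primary stress simply overriding the initial secondary) are assumptions of this sketch, not claims from the cited descriptions.

```python
PRIMARY, SECONDARY, NONE = 2, 1, 0


def stress_levels(n, primary, secondaries):
    """Per-syllable stress levels; primary stress overrides a secondary one."""
    levels = [NONE] * n
    for i in secondaries:
        levels[i] = SECONDARY
    levels[primary] = PRIMARY
    return levels


def savosavo_stress(n):
    """Primary stress on the penult, a single secondary on the first syllable."""
    return stress_levels(n, max(n - 2, 0), {0})


def conchucos_stress(n):
    """Primary on the penult; secondaries on the first syllable and on every
    second syllable counting leftward from the penult (bidirectional)."""
    primary = max(n - 2, 0)
    secondaries = set(range(primary - 2, -1, -2)) | {0}
    return stress_levels(n, primary, secondaries)


def render(syllables, levels):
    """Write IPA stress marks before stressed syllables."""
    marks = {PRIMARY: "ˈ", SECONDARY: "ˌ", NONE: ""}
    return "".join(marks[lv] + syl for syl, lv in zip(syllables, levels))


quechua = ["wa", "raː", "ka", "mun", "qa", "na", "ʧi"]
print(render(quechua, conchucos_stress(len(quechua))))
# prints ˌwaˌraːkaˌmunqaˈnaʧi
```

In the seven-syllable Quechua word, the leftward wave lands on the second syllable while the edge stress sits on the first, which produces the clash ˌwaˌraː; with an even number of syllables the two coincide and no clash arises.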


A recurring feature of languages with rhythmic secondary stress is that the primary stress serves as the starting point for the placement of secondary stresses (van der Hulst 1984). Thus, in a language with rightward propagation of secondary stress, such as Tohono O’odham, the primary stress is the leftmost stress, whereas in languages with leftward iteration of secondary stress, such as Émérillon [Tupian; French Guiana] (Gordon and Rose 2006) and Cayuvava [isolate; Bolivia] (Key 1961, 1967), the rightmost stress is the primary one. Systems in which the stress at the endpoint of the rhythmic train is the primary one are comparatively rare. Virtually all of the exceptions to this generalization involve cases of rightward propagation of stress and the rightmost stress being the primary one, a pattern that plausibly reflects phrasal pitch accent rather than word stress (van der Hulst 1997; Gordon 2014). Perhaps the only case in the literature of leftward stress assignment and promotion of the leftmost stress to the primary one is found in Malakmalak [Australian; Australia] (Birk 1976).
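The generalization that the primary stress is the starting point for rhythmic secondary stress suggests a simple procedure. Place the primary stress, then step away from it by a fixed interval until the word edge is reached. The sketch below instantiates this procedure for Tohono O'odham, Émérillon, and Cayuvava. The function names and the handling of words too short to contain the primary-stress position are assumptions. Émérillon words with a final heavy syllable, which attract stress, are not modelled.

```python
def rhythmic_stress(n, primary, step, direction):
    """Return per-syllable stress levels (2 primary, 1 secondary, 0 none).

    Secondary stresses are placed every `step` syllables, moving away from
    the primary stress in `direction` (+1 = rightward, -1 = leftward),
    until the word edge is reached.
    """
    levels = [0] * n
    levels[primary] = 2
    i = primary + step * direction
    while 0 <= i < n:
        levels[i] = 1
        i += step * direction
    return levels


def tohono_oodham(n):
    # Primary on the first syllable; secondaries on later odd-numbered syllables.
    return rhythmic_stress(n, 0, 2, +1)


def emerillon(n):
    # Primary on the penult; secondaries every second syllable leftward.
    return rhythmic_stress(n, max(n - 2, 0), 2, -1)


def cayuvava(n):
    # Primary on the antepenult; secondaries every third syllable leftward.
    return rhythmic_stress(n, max(n - 3, 0), 3, -1)


print(tohono_oodham(5))  # [2, 0, 1, 0, 1], cf. ˈʧɨpoˌsidaˌkuɖ
print(cayuvava(9))       # [1, 0, 0, 1, 0, 0, 2, 0, 0], cf. ˌʧa.adiˌroboβuˈuruʧe
```

Because the loop always starts from the primary stress, the primary stress ends up at the leftmost position under rightward iteration and at the rightmost position under leftward iteration, which is the recurring pattern noted above.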

5.3.6 Non-finality effects

Many stress systems exhibit a bias against (primary or secondary) stress on final syllables. Final stress avoidance has various manifestations. Some languages suppress or shift a rhythmic secondary stress that would be predicted to fall on a final syllable. An example of final stress suppression comes from Pite Saami [Uralic; Sweden] (Wilbur 2014), which has the same basic rhythmic stress pattern as Tohono O’odham except that final odd-numbered syllables are not stressed, e.g. /ˈsaːlpmaˌkirːje/ ‘psalm book nom.sg’, /ˈkuhkaˌjolkikijt/ ‘long-leg-nmlz-acc.pl’. Other languages may stress the second syllable of a word, but not if that stress would be final. For example, in Hopi (Jeanne 1982), stress falls on the second syllable of a word with more than two syllables if the first syllable is light, but in disyllabic words stress is initial regardless of the weight of the first syllable: /kɨˈjapi/ ‘dipper’, /laˈqana/ ‘squirrel’, /ˈkoho/ ‘wood’, /ˈmaqa/ ‘to give’. Another species of non-finality occurs in weight-sensitive systems in which weight criteria are more stringent for final than for non-final syllables, a pattern termed ‘extrametricality’ (Hayes 1979). Thus, in Cairene Arabic [Afro-Asiatic; Egypt] (Mitchell 1960; McCarthy 1979a; Watson 2007), CVC attracts stress in the penult, as in /muˈdarris/ ‘teacher m.sg.’, but a final syllable containing a short vowel must have two coda consonants (CVCC) to attract stress, cf. /kaˈtabt/ ‘I wrote’ but /ˈasxan/ ‘hotter’ (see Rosenthall and van der Hulst 1999 for more on context-driven weight for stress).
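Both non-finality patterns above can be stated as small modifications of a positional rule. In this sketch, the function names are hypothetical, and syllable counts and the weight of the first syllable are supplied by hand. Pite Saami reuses the Tohono O'odham rhythm but never places a secondary stress on the final syllable. Hopi chooses between second-syllable and initial stress based on word length and first-syllable weight. Giving initial stress to Hopi words of three or more syllables with a heavy first syllable is an inference from the description, not something stated above.

```python
def pite_saami_stress(n):
    """Primary on the first syllable, secondaries on later odd-numbered
    syllables, except that the final syllable never gets a secondary."""
    levels = [0] * n
    levels[0] = 2
    for i in range(2, n - 1, 2):  # upper bound n - 1 excludes the final syllable
        levels[i] = 1
    return levels


def hopi_stress(n, first_is_heavy):
    """Second-syllable stress in words of three or more syllables with a light
    first syllable; initial stress otherwise (including every disyllable)."""
    if n >= 3 and not first_is_heavy:
        return 1
    return 0


print(pite_saami_stress(5))                  # [2, 0, 1, 0, 0], cf. ˈkuhkaˌjolkikijt
print(hopi_stress(3, first_is_heavy=False))  # 1, cf. kɨˈjapi
print(hopi_stress(2, first_is_heavy=False))  # 0, cf. ˈkoho
```

In both functions, non-finality is a single guard on an otherwise ordinary positional rule: the loop bound in Pite Saami and the word-length check in Hopi.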

5.4 Rhythmic stress and the foot

Languages with rhythmic stress have provided the impetus for theories that assume the foot as a prosodic constituent below the word (e.g. Liberman and Prince 1977; Hayes 1980, 1995; Selkirk 1980; Halle and Vergnaud 1987; Halle and Idsardi 1995). In these theories, foot type is treated as a parameter, with certain languages employing trochaic feet, which consist of strong–weak pairs of syllables, and others opting for iambic feet, consisting of weak–strong pairs. Tohono O’odham provides an example of trochaic footing in which, in words with an odd number of syllables, the final syllable constitutes a monosyllabic foot, as in /(ˈʧɨpo)(ˌsida)(ˌkuɖ)/ ‘branding instrument’ (cf. /(ˈwapai)(ˌɺadag)/ ‘someone good at dancing’).

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

WORD-STRESS SYSTEMS 75

The mirror-image trochaic system stresses even-numbered syllables counting from the right, as in Émérillon (excluding words with a final heavy syllable, which attract stress from the penult) (Gordon and Rose 2006): /(ˌmana)(ˈnito)/ ‘how’, /(ˌdeze)(ˌkasi)(ˈwaha)/ ‘your tattoo’. Osage [Siouan; United States] (Altshuler 2009), in which stress falls on even-numbered syllables counting from the left, exemplifies iambic stress: /(xoːˈʦo)(ðiːbˌrɑ̃)/ ‘smoke cedar’, /(ɑ̃ːˈwɑ̃)(lɑːˌxy)ɣe/ ‘I crunch up my own (e.g. prey) with teeth’. (The final syllable remains unfooted to avoid a stress clash with the preceding syllable.) Its mirror-image iambic pattern stresses odd-numbered syllables from the right, as in Urubú Kaapor (Kakumasu 1986). Trochaic stress patterns predominate cross-linguistically. In StressTyp2, the Tohono O’odham-type trochaic pattern is found in 42 languages, while the Émérillon-type trochaic system is found in 40 languages. In contrast, their inverses, the Osage-type and Urubú Kaapor-type iambic systems, are observed in only 7 and 13 languages, respectively. The alternative to a foot-based theory of stress represents stress only in terms of a prominence grid (e.g. Prince 1983; Selkirk 1984; Gordon 2002), in which stressed syllables project grid marks while unstressed ones do not. Differences in level of stress (e.g. primary vs. secondary stress) are captured in terms of differences in the number of grid marks dominating a syllable. Foot-based theories assume that the grid marks are grouped into (canonically) disyllabic constituents, although single syllables may be parsed into feet at the periphery of a word, as in Tohono O’odham. Foot-based and grid-based representations of stress are exemplified for Tohono O’odham in (1).

(1)

Foot-based
Level 1 (primary stress)       ( x                    )
Level 2 (secondary stress)     ( x   . )( x   . )( x  )
                               (ˈʧɨ  po)(ˌsi  da)(ˌkuɖ)

Grid-based
Level 1 (primary stress)         x
Level 2 (secondary stress)       x         x         x
                                 ˈʧɨpo     ˌsida     ˌkuɖ
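The four foot-based patterns described above differ only in foot type (trochee or iamb), parsing direction, and whether a leftover syllable may form a monosyllabic foot. The sketch below is a hypothetical illustration of that parametric view, not an implementation from the cited theories. It assumes that primary stress falls on the first foot parsed, in line with the generalization that primary stress marks the starting point of the rhythmic train, and it derives a grid-style listing from the same stress levels. The function names and the "LR"/"RL" direction codes are invented for this sketch.

```python
from typing import List

PRIMARY, SECONDARY, UNSTRESSED = 2, 1, 0
MARK = {PRIMARY: "\u02c8", SECONDARY: "\u02cc", UNSTRESSED: ""}  # ˈ and ˌ

def parse_feet(n: int, direction: str, allow_degenerate: bool) -> List[List[int]]:
    """Group syllable indices into binary feet, parsing from one edge ("LR" or "RL").

    Feet are returned in parsing order, each as a left-to-right list of indices.
    A leftover syllable becomes a monosyllabic foot only if allow_degenerate is True.
    """
    order = list(range(n)) if direction == "LR" else list(range(n - 1, -1, -1))
    feet, i = [], 0
    while i + 1 < n:
        feet.append(sorted(order[i:i + 2]))
        i += 2
    if i < n and allow_degenerate:
        feet.append([order[i]])
    return feet

def stress_levels(n: int, foot: str, direction: str, allow_degenerate: bool) -> List[int]:
    """Trochees are headed on the left, iambs on the right; the first foot
    parsed carries primary stress, later feet carry secondary stress."""
    levels = [UNSTRESSED] * n
    for k, f in enumerate(parse_feet(n, direction, allow_degenerate)):
        head = f[0] if foot == "trochee" else f[-1]
        levels[head] = PRIMARY if k == 0 else SECONDARY
    return levels

def render_feet(syllables: List[str], foot: str, direction: str,
                allow_degenerate: bool) -> str:
    """Bracketed, stress-marked transcription such as (ˈʧɨpo)(ˌsida)(ˌkuɖ)."""
    n = len(syllables)
    feet = parse_feet(n, direction, allow_degenerate)
    levels = stress_levels(n, foot, direction, allow_degenerate)
    starts = {f[0] for f in feet}
    ends = {f[-1] for f in feet}
    out = []
    for i, syl in enumerate(syllables):
        out.append(("(" if i in starts else "") + MARK[levels[i]] + syl
                   + (")" if i in ends else ""))
    return "".join(out)

def grid_rows(levels: List[int]) -> List[str]:
    """Grid rows, primary level first: 'x' = grid mark, '.' = no mark."""
    top = max(levels, default=0)
    return [" ".join("x" if lvl >= h else "." for lvl in levels)
            for h in range(top, 0, -1)]

toh = ["ʧɨ", "po", "si", "da", "kuɖ"]
print(render_feet(toh, "trochee", "LR", True))             # (ˈʧɨpo)(ˌsida)(ˌkuɖ)
print(grid_rows(stress_levels(5, "trochee", "LR", True)))  # ['x . . . .', 'x . x . x']
print(render_feet(["ma", "na", "ni", "to"], "trochee", "RL", False))    # (ˌmana)(ˈnito)
print(render_feet(["ɑ̃ː", "wɑ̃", "lɑː", "xy", "ɣe"], "iamb", "LR", False))  # (ɑ̃ːˈwɑ̃)(lɑːˌxy)ɣe
print(stress_levels(5, "iamb", "RL", True))  # Urubú Kaapor-type: [1, 0, 1, 0, 2]
```

The Osage call leaves the final syllable unfooted because allow_degenerate is False, which mirrors the clash-avoidance remark above; the grid rows show that the grid records only the number of marks per syllable, not the grouping into feet.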

Phonologists have long debated the role of the foot in the analysis of stress (see Hermans 2011). An asymmetry between trochaic and iambic feet in their sensitivity to weight provides one of the strongest pieces of evidence for the foot. Whereas quantity-insensitive rhythmic stress systems are biased towards trochees, quantity-sensitive rhythmic stress tends towards iambic groupings, with an ideal profile consisting of a light–heavy sequence; this asymmetry is termed the ‘iambic-trochaic law’. Chickasaw instantiates a prototypical iambic language in which stressed light (CV) syllables are lengthened non-finally (see §5.2.3) and all heavy (CVV, CVC) syllables are stressed: /(ʧiˌkaʃ)(ˈʃaʔ)/ ‘Chickasaw’, /(ˈnaːɬ)(toˌkaʔ)/ ‘policeman’, /ʧiˌpisaˌliˈtok/ → /(ʧiˌp[iː])(saˌl[iː])(ˈtok)/ ‘I looked at you’. In contrast to iambic feet, trochaic feet in some languages are subject to shortening of stressed vowels to produce a canonical light–light trochee, e.g. Fijian /m͡buːŋ͡gu/ → /(ˈm͡b[u].ŋ͡gu)/ ‘my grandmother’ (Schütz 1985).
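The weight asymmetry can be illustrated with a simplified left-to-right iambic parse in the spirit of the Chickasaw examples, plus a one-step Fijian-style trochaic shortening. This is a hypothetical sketch, not a full analysis of either language: syllable weights are passed in explicitly, only stressed (foot-head) positions are returned rather than the primary/secondary distinction, and leaving a stray final light syllable unfooted is an assumption.

```python
from typing import List, Tuple

LONG = "\u02d0"  # IPA length mark ː

def iambic_parse(sylls: List[Tuple[str, str]]) -> Tuple[str, List[int]]:
    """Chickasaw-style iambic parse, left to right.

    sylls: (text, weight) pairs with weight "L" (CV) or "H" (CVV, CVC).
    A heavy syllable forms its own foot; otherwise a light syllable is grouped
    with the next syllable into a weak-strong foot. A stressed light syllable is
    lengthened unless it is word-final. A leftover final light syllable stays
    unfooted (an assumption of this sketch).
    Returns the bracketed surface form and the indices of stressed syllables.
    """
    n = len(sylls)
    surface = [text for text, _ in sylls]
    feet, stressed, i = [], [], 0
    while i < n:
        if sylls[i][1] == "H":
            feet.append([i])
            stressed.append(i)
            i += 1
        elif i + 1 < n:
            head = i + 1
            feet.append([i, head])
            stressed.append(head)
            if sylls[head][1] == "L" and head < n - 1:
                surface[head] += LONG  # iambic lengthening of a light head
            i += 2
        else:
            i += 1
    starts = {f[0] for f in feet}
    ends = {f[-1] for f in feet}
    form = "".join(("(" if j in starts else "") + s + (")" if j in ends else "")
                   for j, s in enumerate(surface))
    return form, stressed

def trochaic_shortening(head: str, weak: str) -> str:
    """Fijian-style: shorten the long vowel of a heavy-light trochee's head."""
    return "(" + head.replace(LONG, "") + weak + ")"

print(iambic_parse([("ʧi", "L"), ("pi", "L"), ("sa", "L"), ("li", "L"), ("tok", "H")]))
# ('(ʧipiː)(saliː)(tok)', [1, 3, 4])
print(iambic_parse([("naːɬ", "H"), ("to", "L"), ("kaʔ", "H")]))
# ('(naːɬ)(tokaʔ)', [0, 2])
print(trochaic_shortening("m͡buː", "ŋ͡gu"))  # (m͡buŋ͡gu)
```

The iambic function only ever adds length (light heads become heavy), while the trochaic one only removes it, which is the directional contrast the iambic-trochaic law describes.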

5.5 Outstanding issues in word stress

5.5.1 The diagnosis of stress

Stress is easily identified in its prototypical instantiation, in which phonetic and phonological exponents, speaker intuitions, and distributional characteristics converge. There are many languages, however, in which evidence for stress is more ambiguous. It is thus often difficult to determine whether prominence should be attributed to stress rather than to other properties, including tone, intonation, and the marking of prosodic boundaries (for discussion see Gordon 2014; Roettger and Gordon 2017). Raised pitch could thus potentially reflect a high tone in a tone language, reflect a phrase- or utterance-initial boundary tone, or be triggered by focus. Similarly, increased length could be attributed to a prosodic boundary rather than stress. Distributional restrictions on other phonological properties may be diagnostic of stress in lieu of obvious phonetic exponents or phonological alternations. For example, certain Bantu languages preferentially restrict high tone to a single syllable per word (Hyman 1989; Downing 2010), a distribution that is consistent with the property of culminativity holding of canonical stress systems (§5.2.4). There are also languages in which potential phonetic correlates of stress may not converge on the same syllable, as in Bantu languages with high tone on the antepenult but lengthening of the penult (Hyman 1989), or languages such as Belarusian [Indo-European; Belarus] (Dubina 2012; Borise 2015) and Welsh [Indo-European; United Kingdom] (Williams 1983, 1999) with cues to stress spread over the stressed and adjacent syllable. Non-convergence may be due to the existence of multiple prominence systems (e.g. intonation vs. word-level stress) or to a diffuse phonetic realization of stress (e.g. a delayed or premature f0 peak relative to the stress).

5.5.2 Stress and prosodic taxonomy

Stress is widespread in languages of the world. Of the 176 languages included in the 200-language World Atlas of Language Structures sample, approximately 80% (141 languages) are reported to have stress (Goedemans 2010: 649; see chapter 10 for a lower estimate). Phonemic tone and stress have traditionally been regarded as mutually exclusive. However, an increasing body of research has demonstrated cases of stress and tone co-existing in the same language, whether functioning orthogonally to each other, as in Thai [Tai-Kadai; Thailand] (Potisuk et al. 1996), Papiamentu [Portuguese Creole; Aruba] (Remijsen and van Heuven 2002), and Pirahã (Everett and Everett 1984a; Everett 1998); in a dependent relationship in which tone is predictive of stress, as in Ayutla Mixtec [Oto-Manguean; Mexico] (Pankratz and Pike 1967; de Lacy 2002); or in one in which stress is predictive of tone, as in Trique (DiCanio 2008, 2010). On the other hand, several languages that have traditionally been regarded as stress languages are now generally considered languages in which prominence is linked to phrasal pitch events rather than to word-level stress (or tone), such as French [Indo-European; France] (Jun and Fougeron 1995), Korean [Koreanic; Korea] (Jun 1993), Indonesian [Austronesian; Indonesia] (van Zanten et al. 2003), Ambonese Malay [Austronesian; Indonesia] (Maskikit-Essed and Gussenhoven 2016), West Greenlandic [Eskimo-Aleut; Greenland] (Arnhold 2014), and Tashlhiyt [Afro-Asiatic; Morocco] (Roettger et al. 2015). What these languages have in common is pitch events that occur near the edges of prosodic domains larger than the word, though they differ in how consistently these pitch events are timed.

5.5.3 Stress typology and explanation

A burgeoning area of research explores various perceptual and cognitive motivations behind stress patterns. For example, several scholars have developed phonetically driven accounts of onset weight that appeal to auditory factors such as perceptual duration (Goedemans 1998), adaptation and recovery (Gordon 2005b), and perceptual p-centres (Ryan 2014). Gordon (2002) offers an account of rime-sensitive weight appealing to the non-linear mapping between acoustic intensity and perceptual loudness and to the temporal summation of energy in the perceptual domain. Non-finality effects have been linked to an avoidance of tonal crowding between the high pitch characteristic of stress and the default terminal pitch fall typically associated with the right edge of an utterance (Hyman 1977; Gordon 2000, 2014). Lunden (2010, 2013) offers an account of final extrametricality based on differences in the relative phonetic duration of final versus non-final syllables. Stanton (2016) hypothesizes that the absence of languages that orient stress towards the middle of the word rather than an edge, the ‘midpoint pathology’, is due to the difficulty of learning such a pattern, given the relative rarity of words long enough to allow the learner to disambiguate midpoint stress from other potential analyses.

5.6 Conclusion

Although a combination of typological surveys of stress and detailed case studies of particular languages has revealed a number of robust typological generalizations governing stress, many questions still remain. These include the abstract versus physical reality of stress, the relationship between word stress and prominence associated with higher-level prosodic units, and the role of functional and grammatical factors in explaining the behaviour of stress. The continued expansion of typological knowledge gleaned from phonological, phonetic, and psycholinguistic studies of stress will further shed light on these issues (but will undoubtedly raise more questions).

Additional reading

There are several overviews of typological and theoretical aspects of word stress that contain further references to particular topics, including Kager (2007), van der Hulst et al. (2010), Gordon (2011a, 2011b, 2015), Hammond (2011), Hermans (2011), Hyde (2011), and Gordon and Roettger (2017). Hyman (2006) is a recent discussion of the definitional characteristics of stress as a prosodic class distinct from tone. The papers in van der Hulst (2014a, 2014b), Heinz et al. (2016), Goedemans et al. (2019), and Bogomolets and van der Hulst (in press) explore various contemporary descriptive and theoretical issues related to word stress.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

chapter 6

The Autosegmental-Metrical Theory of Intonational Phonology

Amalia Arvaniti and Janet Fletcher

6.1 Introduction

The autosegmental-metrical theory of intonational phonology (henceforth AM) is a widely adopted theory concerned with the phonological representation of intonation and its phonetic implementation. The term ‘intonation’ refers to the linguistically structured modulation of fundamental frequency (f0), which directly relates to the rate of vibration of the vocal folds and gives rise to the percept of pitch. Intonation is used in all languages and is specified at the ‘post-lexical’ (phrasal) level by means of a complex interplay between metrical structure, prosodic phrasing, syntax, and pragmatics; these factors determine where f0 movements will occur and of what type they will be. Intonation serves two main functions: encoding pragmatic meaning and marking phrasal boundaries. In addition to intonation, f0 is used for lexical purposes when it encodes tonal contrasts, both in languages traditionally described as having a ‘lexical pitch accent’, such as Swedish and Japanese, and in languages with a more general distribution of ‘lexical tone’, such as Mandarin, Thai, and Igbo. Both types are modelled in AM together with tones that signal intonation (see e.g. Pierrehumbert and Beckman 1988 on Japanese). In addition to these linguistic uses, f0 is used to signal ‘paralinguistic’ information such as boredom, anger, emphasis, or excitement (on paralinguistic uses of f0, see Gussenhoven 2004: ch. 5; Ladd 2008b: ch. 1; see also chapter 30). Several models for specifying f0 contours are available today, such as Parallel Encoding and Target Approximation (PENTA) (Xu and Prom-On 2014, inter alia), the International Transcription System for Intonation (INTSINT) (Hirst and Di Cristo 1998), and the Fujisaki model (Fujisaki 1983, 2004). However, many of these aim at modelling f0 curves rather than at defining the relation between f0 curves and the phonological structures that give rise to them.
In contrast, AM makes a principled distinction between intonation as a subsystem of a language’s phonology and f0, its main phonetic exponent. The arguments for this distinction are similar to those that apply to segmental aspects of speech organization.

THE AUTOSEGMENTAL-METRICAL THEORY OF INTONATIONAL PHONOLOGY 79

Consider the following analogy. It is axiomatic in linguistic theory that, in producing a word, a speaker does not map the word directly onto the movements of the vocal organs. There is instead an intervening level of phonological structure: a word is represented in terms of abstract units of sound known as ‘phonemes’ or articulatory ‘gestures’, which cause the vocal organs to move in an appropriate way. According to AM, the same applies in intonation. If a speaker wants to produce a meaning associated with a polar question (e.g. ‘Do you live in Melbourne?’), this meaning is not directly transduced as rising pitch. Instead, there is an intervening level of ‘abstract tones’ (which can, like phonemes, be represented symbolically); these tones specify a set of pitch targets that the speaker should produce if this particular melody is to be communicated. This relationship between abstract tones and phonetic realization also applies in languages that have lexically specified tone. In both types of language, only the abstract tones form part of the speaker’s cognitive-phonological plan in producing a melody, with the precise details of how pitch changes are to be realized being filled in by phonetic procedures. AM thus integrates the study of phonological representation and phonetic realization (for details, see §6.2 and §6.3 respectively). The essential tenets of the model are largely based on Pierrehumbert’s (1980) dissertation (see also Bruce 1977), with additional refinements built on experimental research and formal analysis involving a large number of languages (see Ladd 2008b for a theoretical account; see Gussenhoven 2004 and Jun 2005a, 2014a for language surveys).
The term ‘autosegmental-metrical’, which gave the theory its name, was coined by Ladd (1996) and reflects the connection between two subsystems of phonology: an autosegmental tier representing intonation’s melodic part as well as any lexical tones (if part of the system), and metrical structure representing prominence and phrasing. The connection reflects the fact that AM sees intonation as part of a language’s ‘prosody’, an umbrella term that encompasses interacting phenomena that include intonation, rhythm, prominence, and prosodic phrasing. The term ‘prosody’ is preferred over the older term ‘suprasegmentals’ (e.g. Lehiste 1977a; Ladd 2008b), so as to avoid the layering metaphor inherent in the latter (cf. Beckman and Venditti 2011): prosody is not a supplementary layer over vowels and consonants but an integral part of the phonological representation of speech. Crucial to AM’s success has been the central role it gives to the underlying representation of tunes as a series of tones rather than contours. Specifically, AM analyses continuous (and often far from smooth) pitch contours as a series of abstract primitives. This is a challenging endeavour for two reasons. First, intonational primitives cannot be readily identified based on meaning (as tones can in tone languages, such as Cantonese, where distinct pitch patterns are associated with changes in lexical meaning). In contrast, the meaning of intonational primitives is largely pragmatic (Hirschberg 2004), so in languages like English choice of melody is not constrained by choice of words. Second, f0 curves do not exhibit obvious changes that readily lead to positing distinct units; thus, breaking down the f0 curve into constituents is not as straightforward as identifying distinct patterns corresponding to segments in a spectrogram. This is all the more challenging, as a melody can spread across several words or be realized on a monosyllabic utterance. 
To illustrate this point, consider the pitch contours in Figure 6.1. The utterance in panel a is monosyllabic, while the one in panel b is eight syllables long. The f0 contours of the two utterances are similar but not identical: neither can be said to be a stretched or squeezed version of the other. Nevertheless, both contours are recognized by native speakers of English as realizations of the same melody, in terms of both form and pragmatic function, the aim of which is to signal incredulity (Ward and Hirschberg 1985; Hirschberg and Ward 1992). The differences between the contours are not random. Rather, they exhibit what Arvaniti and Ladd (2009) have termed ‘lawful variability’, i.e. variation that is systematically related to variables such as the length of the utterance (as shown in Figure 6.1), the position of stressed syllables, and a host of other factors (see Arvaniti 2016 for a detailed discussion of additional sources of systematic variation in intonation). Besides understanding what contours like those in Figure 6.1 have in common and how they vary, a central endeavour of AM is to provide a phonological analysis that reflects this understanding.

80 AMALIA ARVANITI AND JANET FLETCHER

[Figure 6.1: f0 (Hz, 150 to 350) against time (s); (a) ‘Lou?!’ (0 to 0.6964 s) and (b) ‘A ballet aficionado?!’ (0 to 1.496 s), each transcribed L*+H L-H%.]
Figure 6.1 Spectrograms and f0 contours illustrating the same English tune as realized on a monosyllabic utterance (a) and a longer utterance (b).

6.2 AM phonology

6.2.1 AM essentials

In AM, intonation is phonologically represented as a string of Low (L) and High (H) tones and combinations thereof (Pierrehumbert 1980; Beckman and Pierrehumbert 1986; Ladd 2008b; cf. Leben 1973; Liberman 1975; Goldsmith 1981). Tones are considered ‘autosegments’: they are autonomous segments relative to the string of vowels and consonants. Ls and Hs are the abstract symbolic (i.e. phonological) primitives of intonation (much as they are the primitives in the representation of lexical tone). Their identity as Hs and Ls is defined in relative terms: H is used to represent tones deemed to be relatively high at some location in a melody compared with the pitch of the surrounding parts of the melody, while L is used to represent tones that are relatively low by the same criterion (cf. Pierrehumbert 1980: 68–75). Crucially, the aim of the string of tones is not to faithfully represent all the modulations that may be observed in f0 contours but rather to capture significant generalizations about contours perceived to be instances of the same melody (see Arvaniti and Ladd 2009 for a detailed presentation of this principle). Thus, AM phonological representations are underspecified in the sense that they do not account (and are not meant to account) for all pitch movements; rather, they include only those elements needed to capture what is contrastive in a given intonational system. At the phonetic level as well, only the tones of the phonological representation are realized as targets, with the rest of the f0 contour being derived by ‘interpolation’ (see §6.3.3 for a discussion of interpolation).
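The division of labour between sparse tonal targets and interpolation can be sketched in a few lines. The example below is a hypothetical illustration, not the AM implementation model of §6.3: it treats every tone as a single point target, interpolates linearly between adjacent targets (the simplest possible choice), holds f0 level outside the first and last targets, and uses invented times and f0 values for an L*+H L-H% tune.

```python
from bisect import bisect_right
from typing import List, Tuple

Target = Tuple[float, float, str]  # (time in ms, f0 in Hz, tone label)

def interpolate_f0(targets: List[Target], times_ms: List[float]) -> List[float]:
    """Derive an f0 contour from sparse tonal targets by linear interpolation.

    targets must be sorted by time. Only the tones in the phonological string
    define targets; every other point is filled in between adjacent targets.
    Before the first and after the last target, f0 is held level (an assumption).
    """
    ts = [t for t, _, _ in targets]
    fs = [f for _, f, _ in targets]
    contour = []
    for t in times_ms:
        if t <= ts[0]:
            contour.append(fs[0])
        elif t >= ts[-1]:
            contour.append(fs[-1])
        else:
            k = bisect_right(ts, t)  # ts[k - 1] <= t < ts[k]
            frac = (t - ts[k - 1]) / (ts[k] - ts[k - 1])
            contour.append(fs[k - 1] + frac * (fs[k] - fs[k - 1]))
    return contour

# Hypothetical targets for an L*+H L-H% tune; all values are invented.
targets = [(100, 180.0, "L*"), (300, 260.0, "+H"), (450, 170.0, "L-"), (600, 300.0, "H%")]
print(interpolate_f0(targets, [50, 200, 375, 525, 700]))
# [180.0, 220.0, 215.0, 235.0, 300.0]
```

In this sketch the tone string stays fixed while the target times and values would change with the text, which is one simple way of picturing how the same melody can surface on utterances of different lengths.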

6.2.2 Metrical structure and its relationship with the autosegmental tonal string

The relationship between tones and segments (often referred to as ‘tune–text association’) is mediated by a metrical structure. This is a hierarchical structure that represents (i) the parsing of an utterance into a number of constituents and (ii) the prominence relations between them (e.g. the differences between stressed and unstressed syllables). The term ‘metrical structure’, as in the term ‘autosegmental-metrical theory’, is typically used when the representation of stress is at issue; when the emphasis is on phrasal structure, the term ‘prosodic structure’ is often used instead. Both relative prominence and phrasing can be captured by the same representation (see e.g. Pierrehumbert and Beckman 1988). An example is given in (1), which represents the prosodic structure of the utterance in Figure 6.1b, ‘a ballet aficionado?!’. As can be seen, syllables (σ) are grouped into feet (F), which in turn are grouped into prosodic words (ω); in this example, prosodic words are grouped into one intermediate phrase (ip), which is the only constituent of the utterance’s only intonational phrase (IP). Relative prominence is represented by marking constituents as strong (s) or weak (w).

(1)

IP
   ip (s)
      ω (w)
         σ (w)   a
         F (s)
            σ (s)   bal
            σ (w)   let
      ω (s)
         σ (w)   a
         F (w)
            σ (s)   fi
            σ (w)   cio
         F (s)
            σ (s)   na
            σ (w)   do

(IP = Intonational Phrase; ip = intermediate phrase; ω = Prosodic Word; F = Foot; σ = Syllable; s = strong; w = weak)
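Structure (1) can be encoded as a small tree to make two of its properties explicit: which syllables skip a level of the hierarchy (the limited extrametricality discussed next), and which syllable is reached by following strong branches from the top. The sketch below is a hypothetical encoding with invented names (Node, layer_skips, strongest_syllable); the level list is the one used in (1), and returning None for a constituent with no strong daughter reflects the notion of headless constituents.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LEVELS = ["IP", "ip", "ω", "F", "σ"]  # levels used in structure (1)

@dataclass
class Node:
    label: str                 # one of LEVELS
    strength: str = ""         # "s" (strong), "w" (weak), or "" for the root
    text: str = ""             # syllable text, for σ nodes only
    children: List["Node"] = field(default_factory=list)

def syl(text: str, strength: str) -> Node:
    return Node("σ", strength, text)

def layer_skips(node: Node) -> List[Tuple[str, str]]:
    """(parent label, child) pairs where the child is not exactly one level
    below its parent, i.e. departures from the Strict Layer Hypothesis."""
    skips = []
    for child in node.children:
        if LEVELS.index(child.label) != LEVELS.index(node.label) + 1:
            skips.append((node.label, child.text or child.label))
        skips.extend(layer_skips(child))
    return skips

def strongest_syllable(node: Node) -> Optional[str]:
    """Follow strong branches down to a syllable; None if a constituent is headless."""
    while node.children:
        strong = [c for c in node.children if c.strength == "s"]
        if not strong:
            return None
        node = strong[0]
    return node.text

# Structure (1) for 'a ballet aficionado?!'
tree = Node("IP", children=[
    Node("ip", "s", children=[
        Node("ω", "w", children=[
            syl("a", "w"),
            Node("F", "s", children=[syl("bal", "s"), syl("let", "w")])]),
        Node("ω", "s", children=[
            syl("a", "w"),
            Node("F", "w", children=[syl("fi", "s"), syl("cio", "w")]),
            Node("F", "s", children=[syl("na", "s"), syl("do", "w")])]),
    ])])

print(layer_skips(tree))         # [('ω', 'a'), ('ω', 'a')]
print(strongest_syllable(tree))  # na
```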


The prosodic structure in (1) is based on the model of Pierrehumbert and Beckman (1988), which has been implicitly adopted and informally used in many AM analyses. This model is similar to other well-known models (cf. Selkirk 1984; Nespor and Vogel 1986) but differs from them in some critical respects. First, the number and nature of levels in the prosodic hierarchy are not fixed but language specific. For instance, Pierrehumbert and Beckman (1988) posit three main levels of phrasing for Tokyo Japanese: the accentual phrase (AP), the intermediate phrase (ip), and the Intonational Phrase (IP). However, they posit only two levels of phrasing for English, the ip and the IP, as illustrated in (1), since they found no evidence for an AP level of phrasing (Beckman and Pierrehumbert 1986). Further, the model assumes that it is possible to have headless constituents (i.e. constituents that do not include a strong element, or ‘head’). In the analysis of Pierrehumbert and Beckman (1988), this applies to Japanese APs that do not include a word with a lexical pitch accent; in such APs, there are no syllables, feet, or prosodic words that are strong. The same understanding applies by and large to several other languages that allow headless constituents, such as Korean (Jun 2005b), Chickasaw (Gordon 2005a), Mongolian (Karlsson 2014), Tamil (Keane 2014), and West Greenlandic (Arnhold 2014a); informally, we can say that these languages do not have stress. In addition, the model of Pierrehumbert and Beckman (1988) relies on n-ary branching trees (trees that allow more than two branches per node); grouping largely abides by the Strict Layer Hypothesis, according to which all constituents of a given level in the hierarchy are exhaustively parsed into constituents of the next level up (Selkirk 1984).
However, Pierrehumbert and Beckman also accept limited extrametricality, such as syllables that are linked directly to a prosodic word node (Pierrehumbert and Beckman 1988: 147 ff.); this is illustrated in (1), where the indefinite article a and the unstressed syllable at the beginning of aficionado are linked directly to the relevant ω node. (For an alternative model of prosodic structure that allows limited recursiveness, see Ladd 2008b: ch. 8, incl. references.) Independently of the particular version of prosodic structure adopted in an AM analysis, it is widely agreed that tones associate with phrasal boundaries or constituent heads (informally, stresses) or both (see §6.2.3 for details on secondary association of tones, and Gussenhoven 2018 for a detailed discussion of tone association). Tones that associate with stressed syllables are called ‘pitch accents’ and one of their roles is prominence enhancement; they are notated with a star (e.g. H*). The final accent in a phrase is called the ‘nuclear pitch accent’ or ‘nucleus’, and is usually deemed the most prominent. Pitch accents may consist of more than one tone and are often bitonal; examples include L*+H- and L-+H* (after Pierrehumbert’s 1980 original notation but also annotated as L*H or L*+H, and LH* or L+H* respectively). Pitch patterns have been analysed as reflexes of bitonal accents in a number of languages, including English (Ladd and Schepman 2003), German (Grice et al. 2005a), Catalan (Prieto 2014), Arabic (Chahal and Hellmuth 2014b), and Jamaican Creole (Gooden 2014). Grice (1995a) has also posited tritonal accents for English. (See also §6.2.3 on secondary association.) In Pierrehumbert (1980), the starred tone of a bitonal pitch accent is metrically stronger than the unstarred tone, and the only one that is phonologically associated (for details see §6.3.1); the unstarred weak tone that leads or trails the starred tone is ‘floating’ (i.e. it is a tone without an association). 
Research on a number of languages since Pierrehumbert (1980) indicates that additional types of relations are possible between the tones of bitonal accents. Arvaniti et al. (1998, 2000) have provided experimental evidence from Greek that tones in bitonal accents can be independent of each other, in that neither tone exhibits the behaviour of an unstarred tone described by Pierrehumbert (1980). Frota (2002), on the other hand, reports data from Portuguese showing that the type of ‘loose’ bitonal accent found in Greek can coexist with pitch accents that show a closer connection between tones, akin to the accents described by Pierrehumbert (1980) for English. Tones that associate with phrasal boundaries are collectively known as ‘edge tones’, and their main role is to demarcate the edges of the phrases they associate with. These may also be multitonal; for example, for Korean, Jun (2005b) posits boundary tones with up to five tones (e.g. LHLHL%), while Prieto (2014) posits a tritonal LHL% boundary tone for Catalan. Following Beckman and Pierrehumbert (1986), many analyses posit two types of edge tone, ‘phrase accents’ and ‘boundary tones’, notated with - and % respectively (e.g. H-, H%). Phrase accents demarcate ip boundaries and boundary tones demarcate IP boundaries. For example, the sentence ‘Nick and Mel were late because they missed the train’ is likely to be uttered as two ips forming one IP: [[Nick and Mel were late]ip [because they missed the train]ip]IP; the boundary between the two ips is likely to be demarcated with a H- phrase accent. An illustration of the types of association between tones and prosodic structure used in AM is provided in (2), using the same utterance as in (1).

(2)

Prosodic structure (constituents and s/w labels as in (1)):
    [ [ [ a (bal let)F(s) ]ω(w)  [ a (fi cio)F(w) (na do)F(s) ]ω(s) ]ip(s) ]IP
Tune:
    L*+H  L-H%
Associations:
    L* (starred tone of L*+H)  →  strong syllable na
    +H                         →  unassociated (floating)
    L-                         →  right edge of the ip
    H%                         →  right edge of the IP

All of the languages investigated so far have edge tones that associate with right boundaries. Left-edge boundary tones have also been posited for several languages, including English (Pierrehumbert 1980; Gussenhoven 2004), Basque (Elordieta and Hualde 2014), Dalabon (Fletcher 2014), and Mongolian (Karlsson 2014). However, the specific proposal of Beckman and Pierrehumbert (1986) linking phrase accents to the ip and boundary tones to the IP has not been generally accepted, although it has been adopted by many analyses, including those for Greek (Arvaniti and Baltazani 2005), German (Grice et al. 2005a), Jamaican Creole (Gooden 2014), and Lebanese and Egyptian Arabic (Chahal and Hellmuth 2014b). Some AM analyses dispense altogether with phrase accents, either for reasons of parsimony (positing only two types of primitives, pitch accents and boundary tones) or because they adopt a different conception of how the f0 contour is to be broken into constituent tones (see, among others, Gussenhoven 2004, 2005 on Dutch; Frota 2014 on Portuguese; Gussenhoven 2016 on English). Thus, phrase accents are not necessarily included in all AM analyses.
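The label conventions just described (a star for pitch accents, - for phrase accents, % for boundary tones, with + joining the tones of a bitonal accent) are regular enough to parse mechanically. The parser below is a hypothetical sketch covering only the notation used in this section: it handles right-edge tones, allows at most one leading or trailing tone per accent, and does not handle left-edge boundary tones or Pierrehumbert's original L*+H- style of notation.

```python
import re
from typing import Dict, List

# Pitch accent: optional leading tone, starred tone, optional trailing tone.
ACCENT = re.compile(r"^(?:(?P<lead>[LH])\+)?(?P<star>[LH])\*(?:\+(?P<trail>[LH]))?$")
# Edge-tone token: optional phrase accent (L- or H-) followed by an optional
# boundary tone of one or more tones ending in %.
EDGE = re.compile(r"^(?P<phrase>[LH]-)?(?P<boundary>[LH]+%)?$")

def parse_tune(tune: str) -> Dict[str, List]:
    """Split a space-separated tune such as 'L*+H L-H%' into its tonal primitives."""
    parsed = {"pitch_accents": [], "phrase_accents": [], "boundary_tones": []}
    for token in tune.split():
        m = ACCENT.match(token)
        if m:
            parsed["pitch_accents"].append(
                {"label": token, "starred": m["star"],
                 "leading": m["lead"], "trailing": m["trail"]})
            continue
        m = EDGE.match(token)
        if not m or not (m["phrase"] or m["boundary"]):
            raise ValueError(f"unrecognized tone label: {token!r}")
        if m["phrase"]:
            parsed["phrase_accents"].append(m["phrase"])
        if m["boundary"]:
            parsed["boundary_tones"].append(m["boundary"])
    return parsed

print(parse_tune("L*+H L-H%"))                  # English incredulity tune (Figure 6.1)
print(parse_tune("L* H-L%")["phrase_accents"])  # Greek polar question: ['H-']
print(parse_tune("LHLHL%")["boundary_tones"])   # multitonal boundary tone: ['LHLHL%']
```

An analysis without phrase accents would simply never produce tokens ending in -, so the same parser returns an empty phrase_accents list for it.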


Following Pierrehumbert and Hirschberg (1990), pitch accents and edge tones are treated as intonational morphemes with pragmatic meaning that contribute compositionally to the pragmatic interpretation of an utterance (Pierrehumbert and Hirschberg 1990; Steedman 2014; see chapter 30 for a discussion of intonational meaning). Although this understanding of intonation as expressing pragmatic meaning is generally accepted, it may not apply to the same extent to all systems. For example, in languages like Korean and Japanese, in which intonation is used primarily to signal phrasing, tones express pragmatic meaning to a much lesser extent than in languages like English (Pierrehumbert and Beckman 1988 on Japanese; Jun 2005b on Korean).

6.2.3 Secondary association of tones

In addition to a tone’s association with a phrasal boundary or constituent head, AM provides a mechanism for ‘secondary association’. For instance, according to Grice (1995a: 215 ff.), leading tones of English bitonal accents, such as L in L+H*, associate with the syllable preceding the accented one (if one is available), while trailing tones, such as H in L*+H, occur a fixed interval in ‘normalized time’ after the starred tone. The former is a type of secondary association (for discussions of additional association patterns, see Barnes et al. 2010a on English; van de Ven and Gussenhoven 2011 on Dutch; Peters et al. 2015 on several Germanic varieties). Although secondary association has been used for a variety of purposes, it has come to be strongly associated with phrase accents. Pierrehumbert and Beckman (1988) proposed the mechanism of secondary association to account for the fact that phrase accents often spread (see also Liberman 1979). Specifically, Pierrehumbert and Beckman (1988) proposed that edge tones may acquire additional links (i.e. secondary associations) either to a specific tone-bearing unit (TBU), such as a stressed syllable, or to another boundary. For example, they posited that English phrase accents are linked not only to the right edge of their ip (as advocated in Beckman and Pierrehumbert 1986) but also to the left edge of the word carrying the nuclear pitch accent. An example of such secondary association can be found in Figure 6.1b, in which the L- phrase accent is realized as a low f0 stretch. This stretch is due to the fact that the L- phrase accent associates both with the right ip boundary (and thus is realized as close as possible to the right edge of the phrase) and with the end of the accented word (and thus stretches as far as possible to the left).
The general mechanism whereby edge tones have secondary associations has also been used by Gussenhoven (2000a) in his analysis of the intonation of Roermond Dutch, which assumes that boundary tones can be phonologically aligned both with the right edge of the phrase and with an additional leftmost position. The analyses of Pierrehumbert and Beckman (1988) and Gussenhoven (2000a) were the basis for a wider use of secondary association for phrase accents developed in Grice et al. (2000), who argue that the need for positing phrase accents in a given intonation system is orthogonal to the need for the ip level of phrasing. Grice et al. (2000) examined putative phrase accents in a variety of languages (Cypriot Greek, Dutch, English, German, Hungarian, Romanian, and Standard Greek). They showed that the phrase accents they examined are realized either on a peripheral syllable, as expected of edge tones, or on an earlier one, often one that is metrically strong; which of the two realizations prevails depends on whether or not the metrically strong syllable is already associated with another tone.

[Figure 6.2: f0 (Hz, 100 to 275) against time (s) for [koˈlibise i ˈðimitra]; panel (a) carries the tones L* and H-L% (duration 1.332 s), panel (b) carries L*+H, L*, and H-L% (duration 1.509 s).]
Figure 6.2 Spectrograms and f0 contours of the utterance [koˈlibise i ˈðimitra] with focus on [koˈlibise] ‘swam’ (a) and on [ˈðimitra] (b), translated as ‘Did Dimitra SWIM?’ and ‘[Was it] DIMITRA who swam?’ respectively.

This type of variation is illustrated in Figure 6.2 with the Greek polar question tune L* H-L% (Grice et al. 2000; Arvaniti et al. 2006a). As can be seen in Figure 6.2, both contours have a pitch peak close to the end of the utterance. This peak co-occurs with the stressed antepenult of the final word in the question in Figure 6.2a ([ˈði] of [ˈðimitra]), but with the last vowel in the question in Figure 6.2b (the vowel [a] of [ˈðimitra]). (Note also that the stressed antepenult of [ˈðimitra] has low f0 in Figure 6.2b, as does the stressed syllable of [koˈlibise] in Figure 6.2a; both reflect an association with the L* pitch accent of this tune.) Grice et al. (2000) attribute this difference in the alignment of the pitch peak to secondary association: the peak is the reflex of a H- phrase accent that is associated with a phrasal boundary but also has a secondary association to the last metrically strong syllable of the utterance. This association is phonetically realized when this metrically strong syllable is not associated with a pitch accent; this happens when the focus is on an earlier word, which then attracts the L* pitch accent. The phonological structures involved are shown in (3a) and (3b): (3a) shows the primary and secondary association of the H- phrase accent; (3b) shows that the secondary association of H- is not possible because [ˈði] is already associated with the L* pitch accent.

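86 AMALIA ARVANITI AND JANET FLETCHER

(3) a. ‘Did Dimitra SWIM?’
       L*   H- L%
       [[koˈlibise i ˈðimitra]ip]IP
       (L* associated with [ˈli]; H- associated with the ip boundary and, secondarily, with [ˈði])

    b. ‘[Was it] DIMITRA [who] swam?’
       L*   H- L%
       [[koˈlibise i ˈðimitra]ip]IP
       (L* associated with [ˈði]; the secondary association of H- to [ˈði] is blocked)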
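The blocking effect in (3) can be made concrete with a toy placement function. This is a minimal sketch, not Grice et al.'s (2000) formalism: the `Word` type, the syllabification of the utterance as `ko li bi se`, `i`, `ði mi tra`, and the rule that H- falls back to the final syllable when its secondary target is already taken by L* are simplifying assumptions based on the description above. L% is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (word index, syllable index)


@dataclass
class Word:
    syllables: List[str]
    stress: Optional[int]  # index of the stressed syllable; None for an unstressed clitic


def place_question_tune(words: List[Word], focus: int) -> Dict[str, Position]:
    """Toy placement of L* and H- for the Greek polar question tune L* H-L%."""
    if words[focus].stress is None:
        raise ValueError("the focused word must have a stressed syllable")
    # L* is associated with the stressed syllable of the focused word.
    l_star = (focus, words[focus].stress)
    # The last metrically strong syllable of the utterance.
    strong = [(i, w.stress) for i, w in enumerate(words) if w.stress is not None]
    last_strong = strong[-1]
    final = (len(words) - 1, len(words[-1].syllables) - 1)
    # H- is realized on its secondary target unless L* already occupies it;
    # otherwise it surfaces at the phrase edge, i.e. on the final syllable.
    h_minus = final if last_strong == l_star else last_strong
    return {"L*": l_star, "H-": h_minus}


def syllable(words: List[Word], pos: Position) -> str:
    return words[pos[0]].syllables[pos[1]]


# [koˈlibise i ˈðimitra], syllabified for illustration only.
UTTERANCE = [Word(["ko", "li", "bi", "se"], 1), Word(["i"], None), Word(["ði", "mi", "tra"], 0)]

for focus in (0, 2):
    tones = place_question_tune(UTTERANCE, focus)
    print(focus, {t: syllable(UTTERANCE, p) for t, p in tones.items()})
```

With focus on the verb (index 0), H- lands on `ði`, as in (3a); with focus on the name (index 2), L* takes `ði` and H- is pushed to `tra`, as in (3b).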

6.2.4 The phonological composition of melodies

In Pierrehumbert (1980), the grammar for generating English tunes is as shown in Figure 6.3. Primitives of the system can combine freely and, in the case of pitch accents, iteratively. With the exception of left-edge boundary tones, which are optional, all other elements are required. In other words, a well-formed tune must include at least one pitch accent followed by a phrase accent and a boundary tone (e.g. H* L-L%). The fact that elements combine freely is connected to Pierrehumbert’s position that there is no hierarchical structure for tunes (they are a linear string of autosegments, as illustrated in (2)). It follows that, unlike in some other models of intonation, there are no qualitative differences between pitch accents, and no elements are privileged in any way. This conceptualization of the tonal string also allows for the integration of lexically specified and post-lexical tones (i.e. intonation) into one tonal string. Not everyone who works within AM shares this view. Gussenhoven (2004: ch. 15, 2005, 2016) provides analyses of English and Dutch intonation that rely on the notion of nuclear contours as units comprising what in other AM accounts is a sequence of a nuclear pitch accent followed by edge tones. Gussenhoven’s nuclear contours are akin to the nuclei of the British School. Gussenhoven (2016) additionally argues that a grammar along the lines of Figure 6.3 makes the wrong predictions, since not all possible combinations are grammatical in English, while the grammar results in both over- and under-analysis (capturing dubious distinctions while failing to capture genuine differences between tunes, respectively). Dainora (2001, 2006) also showed that some combinations of accents and edge tones are much more likely than others (though the frequencies she presents may be skewed as they are based on a news-reading corpus). Other corpus studies of English also find that there are preferred combinations of tones in spoken interaction (e.g. Fletcher and Stirling 2014). Overall, the evidence indicates that some combinations are preferred and standardized, possibly because they reflect frequently used pragmatic meanings. This is particularly salient in some languages in which tune choice is limited to a small number of distinctive patterns (see e.g. chapter 26 for a description of intonation patterns in Indigenous Australian languages) by contrast with languages such as Dutch or English, where a range of tonal combinations is available to speakers (e.g. Gussenhoven 2004, 2005).

[Figure 6.3 (diagram): optional initial boundary tone (%H, %L) → one or more pitch accents (H*, L*, L+H*, L*+H, H*+L, H+L*, H*+H) → phrase accent (H, L) → boundary tone (H%, L%)]

Figure 6.3 The English intonation grammar of Pierrehumbert (1980); after Dainora (2006).
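The free-combination grammar of Figure 6.3 can be sketched as a well-formedness check. This is a minimal illustration, assuming the tone inventory listed in the figure, the notation used in the text (`-` for phrase accents, `%` for boundary tones), and hypothetical names such as `is_well_formed`. It deliberately accepts every combination equally, which is exactly the property criticized by Gussenhoven (2016) and qualified by Dainora's (2001, 2006) frequency data.

```python
import re

# Inventory read off Figure 6.3 (Pierrehumbert 1980, after Dainora 2006).
PITCH_ACCENTS = {"H*", "L*", "L+H*", "L*+H", "H*+L", "H+L*", "H*+H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}
INITIAL_BOUNDARY_TONES = {"%H", "%L"}

# One token: an initial boundary tone, a bitonal accent, or a single tone
# followed by *, - or %.
TOKEN = re.compile(r"%[HL]|[HL]\*?\+[HL]\*?|[HL][*%-]")


def is_well_formed(tune: str) -> bool:
    """Check a tune such as 'H* L-L%' against the free-combination grammar."""
    tokens = TOKEN.findall(tune)
    if "".join(tokens) != re.sub(r"\s+", "", tune):
        return False  # the string contains material that is not a recognized tone
    if tokens and tokens[0] in INITIAL_BOUNDARY_TONES:
        tokens = tokens[1:]  # the left-edge boundary tone is optional
    if len(tokens) < 3:
        return False  # need >= 1 pitch accent, a phrase accent and a boundary tone
    *accents, phrase, boundary = tokens
    return (all(a in PITCH_ACCENTS for a in accents)  # pitch accents may iterate
            and phrase in PHRASE_ACCENTS
            and boundary in BOUNDARY_TONES)


for tune in ["H* L-L%", "%H L* L+H* H-H%", "H* L%"]:
    print(tune, is_well_formed(tune))
```

Because the check treats all licensed sequences alike, it cannot express the preferred combinations found in corpus studies; capturing those would need information (such as frequencies or pragmatic pairings) that the grammar itself does not encode.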

6.3 Phonetic implementation in AM

As noted in §6.1, AM provides a model for mapping abstract phonological representations to phonetic realization. Much of what we assume about this connection derives from Pierrehumbert (1980) and Bruce (1977). Phonetically, tones are said to be realized as ‘tonal targets’ (i.e. as specific points in the contour), while the rest of an f0 curve is derived by interpolation between these targets. That is, f0 contours are both phonologically and phonetically underspecified, in that only a few points of each contour are determined by tones and their targets (see Arvaniti and Ladd 2009 for empirical evidence of this point). Tonal targets are usually ‘turning points’, such as peaks, troughs, and elbows in the contour; they are defined by their ‘alignment’ and ‘scaling’ (see §6.3.1 and §6.3.2). Scaling refers to the value of the targets in terms of f0. Alignment is defined as the position of the tonal target relative to the specific TBU with which it is meant to co-occur (e.g. Arvaniti et al. 1998, 2006a, 2006b; Arvaniti and Ladd 2009). The identity of TBUs varies by language, depending on syllable structure, but we can equate TBUs with syllable nuclei (and in some instances with morae and coda consonants; see Pierrehumbert and Beckman 1988 on Japanese; Ladd et al. 2000 on Dutch; Gussenhoven 2012a on Limburgish). The TBUs with which tones phonetically co-occur are related to the metrical positions with which the tones associate in phonology: thus, pitch accents typically co-occur with stressed syllables (though not all stressed syllables are accented); edge tones are realized on peripheral TBUs, such as phrase-final vowels.
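The target-and-interpolation idea can be sketched as follows: each target carries an alignment (a landmark plus an offset) and a scaling value (f0), and the contour between targets is filled in by linear interpolation. The landmark names, the timings, and the choice to hold f0 flat outside the first and last targets are hypothetical assumptions made for illustration only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (time in s, f0 in Hz)


@dataclass
class Target:
    tone: str       # "L" or "H"
    landmark: str   # segmental landmark the target is aligned to (hypothetical labels)
    offset: float   # alignment: signed distance from that landmark, in seconds
    f0: float       # scaling: the target's value in Hz


def turning_points(targets: List[Target], landmarks: Dict[str, float]) -> List[Point]:
    """Turn alignment specifications into absolute (time, f0) turning points."""
    return sorted((landmarks[t.landmark] + t.offset, t.f0) for t in targets)


def f0_at(t: float, points: List[Point]) -> float:
    """f0 at time t by linear interpolation between targets (held flat outside them)."""
    times = [time for time, _ in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    (t0, y0), (t1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


# Hypothetical landmarks (s) for one accented syllable, and a rising accent whose
# H target is aligned 20 ms after the end of the accented vowel.
landmarks = {"syllable_onset": 0.20, "vowel_offset": 0.38}
rising_accent = [Target("L", "syllable_onset", 0.0, 110.0),
                 Target("H", "vowel_offset", 0.02, 180.0)]
points = turning_points(rising_accent, landmarks)
print(points)
print(round(f0_at(0.30, points), 1))
```

Only two values are specified; every other f0 value in the stretch is derived, which is the sense in which the contour is phonetically underspecified.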

6.3.1 Tonal alignment

In AM, tonal alignment is a phonetic notion that refers specifically to the temporal alignment of tones with segmental and/or syllabic landmarks. Alignment can refer to the specific timing of a tone, but it may also reflect a phonological difference, in which case the timing of tones relative to the segmental string gives rise to a change in lexical or pragmatic meaning. For example, Bruce (1977) convincingly showed that the critical difference between the two lexical pitch accents of Swedish, Accent 1 and Accent 2, was due to the relative temporal alignment of a HL tonal sequence. For Accent 1, the H tone is aligned earlier with respect to the accented vowel than for Accent 2, a difference that Bruce (2005) encoded as a phonological difference between H+L* (Accent 1) and H*+L (Accent 2) for the Swedish East Prosodic dialect (see Bruce 2005 for a full overview of dialect-specific phonological variation in Swedish). Pierrehumbert (1980) similarly proposed L+H* and L*+H in English to account for the difference between early versus late alignment, with the H of the L*+H being realized after the stressed TBU (see also the discussion of trailing and leading tones in §6.2.3). While alignment differences are to be encoded in tonal representations when they are contrastive, in cases where variation in tonal alignment is not contrastive, a single representation, or ‘label’, is used. For instance, in Glasgow English, the alignment of the rising pitch accent, L*H in the analysis of Mayo et al. (1997), varies from early to late in the accented rhyme, without any apparent difference in meaning. In such cases, the tonal representation may allow for more options without essentially affecting the analysis. For instance, since the rising pitch accent of Glasgow English is variable, a simpler representation as H* instead of L*H may suffice, as nothing hinges on including a L tone in the accent’s representation or on starring one or the other tone (cf. Keane 2014 on Tamil; Fletcher et al. 2016 on Mawng; Arvaniti 2016 on Romani). One point that has become very clear thanks to a wide range of research on tonal alignment is that the traditional autosegmental idea that phonological association necessarily entails phonetic co-occurrence between a tone and a TBU does not always hold (e.g. see Arvaniti et al. 1998 on Greek; D’Imperio 2001 on Neapolitan Italian). This applies particularly to pitch peaks. Indeed, one of the most consistent findings in the literature is that of ‘peak delay’, the finding that accentual pitch peaks regularly occur after the TBU they are phonologically associated with. Peak delay was first documented by Silverman and Pierrehumbert (1990), who examined the phonetic realization of prenuclear H* accents in American English. It has since been reported for (among many others) South American Spanish (Prieto et al.
1995), Kinyarwanda (Myers 2003), Bininj Gun-wok (Bishop and Fletcher 2005), Catalan (Prieto 2005), Irish (Dalton and Ní Chasaide 2007a), and Chickasaw (Gordon 2008). The extent of peak delay can vary across languages and pitch accents, but it remains stable within category (Prieto 2014). This stability is known as ‘segmental anchoring’. The idea of segmental anchoring is based on the alignment patterns observed by Arvaniti et al. (1998) for Greek prenuclear accents and further explored in subsequent work by Ladd and colleagues on other languages (e.g. Ladd et al. 2000 on Dutch; Ladd and Schepman 2003 and Ladd et al. 2009b on English; Atterer and Ladd 2004 on German). Segmental anchoring is the hypothesis that tonal targets anchor onto particular segments in phonetic realization. The idea of segmental anchoring spurred a great deal of research in a variety of languages that has largely supported it (e.g. D’Imperio 2001 on Neapolitan Italian; Prieto 2009 on Catalan; Arvaniti and Garding 2007 on American English; Gordon 2008 on Chickasaw; Myers 2003 on Kinyarwanda; Elordieta and Calleja 2005 on Basque Spanish; Dalton and Ní Chasaide 2007a on Irish). Finally, research on tonal alignment also supports the key assumption underpinning AM models in which tonal targets are levels rather than contours (i.e. rises or falls). This idea was put to the test in Arvaniti et al. (1998), who found that the L and H targets of Greek prenuclear accents each have their own alignment properties. A consequence of this mode of alignment is that the rise defined by the L and H targets has no invariable properties (such as duration or slope), a finding used by Arvaniti et al. (1998) to argue in favour of levels as intonational primitives. Empirical evidence from tone perception in English (Dilley and Brown 2007) showing that listeners perceptually equate pitch movements with level tones supports this view (see also House 2003).
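Why independently anchored L and H targets leave the rise itself with no invariant duration or slope can be shown with a little arithmetic. In this sketch the anchors are hypothetical placeholders (L at the onset of the accented syllable, H a fixed peak delay after the accented vowel's offset); actual anchor points differ across languages and accents.

```python
from typing import Tuple


def anchored_rise(syllable_onset: float, vowel_offset: float,
                  f0_low: float, f0_high: float, peak_delay: float) -> Tuple[float, float]:
    """Rise produced by two independently anchored targets.

    Hypothetical anchors: L at the onset of the accented syllable, H a fixed
    `peak_delay` (s) after the offset of the accented vowel.
    Returns the rise duration (s) and its mean slope (Hz/s).
    """
    t_low = syllable_onset
    t_high = vowel_offset + peak_delay  # peak delay: H falls after the accented vowel
    duration = t_high - t_low
    return duration, (f0_high - f0_low) / duration


# Same anchors and f0 values; accented syllables of different lengths.
short_dur, short_slope = anchored_rise(0.0, 0.15, 110.0, 170.0, 0.03)
long_dur, long_slope = anchored_rise(0.0, 0.30, 110.0, 170.0, 0.03)
print(f"short: {short_dur:.2f} s, {short_slope:.0f} Hz/s")
print(f"long:  {long_dur:.2f} s, {long_slope:.0f} Hz/s")
```

The alignment of each target relative to its landmark is identical in both cases, yet the rise is almost twice as long and correspondingly shallower in the longer syllable.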

6.3.2 Tonal scaling

Since Ladd (1996), a distinction has been made between ‘pitch span’, which refers to the extent of the range of frequencies used by a speaker, and ‘pitch level’, which refers to whether these frequencies are overall high or low; together, level and span constitute a speaker’s ‘pitch range’. Thus, two speakers may have the same pitch span of 200 Hz but one may use a low level (e.g. 125–325 Hz) and the other a higher level (e.g. 175–375 Hz). A speaker’s pitch range may change for paralinguistic reasons, while, cross-linguistically, gender differences have also been observed (e.g. Daly and Warren 2002; Graham 2014). Three main linguistic factors affect tonal scaling: declination, tonal context, and tonal identity. Declination is a systematic lowering of targets throughout the course of an utterance (’t Hart et al. 1990), though declination can be suspended (e.g. in questions) and is reset across phrasal boundaries (Ladd 1988; see also Truckenbrodt 2002). Listeners anticipate declination effects and adjust their processing of tonal targets accordingly (e.g. Yuen 2007). Within AM, the understanding of declination follows Pierrehumbert (1980): the scaling of tones is modelled with reference to a declining baseline that is invariant for each speaker (at a given time). The baseline is defined by its slope and a minimum value assumed to represent the bottom of the speaker’s range, which tends to be very stable for each speaker (Maeda 1976; Menn and Boyce 1982; Pierrehumbert and Beckman 1988). L and H tones (apart from terminal L%s) are scaled above the baseline and with reference to it. Tonal context refers to the fact that the scaling of a target depends on the targets of preceding tones. For sequences of accentual H tones in particular, Liberman and Pierrehumbert (1984) have argued that every tone’s scaling is a fraction of the scaling of the preceding H.
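The span/level distinction in the example above reduces to simple arithmetic. In this sketch span is the width of the range; level is indexed, as a hypothetical choice, by the midpoint of the range (the bottom of the range would serve equally well as an index of level).

```python
from typing import Dict, Tuple


def pitch_span(low_hz: float, high_hz: float) -> float:
    """Span: the extent of the frequency range used (Hz)."""
    return high_hz - low_hz


def pitch_level(low_hz: float, high_hz: float) -> float:
    """Level: how high or low the range sits, indexed here (hypothetically) by its midpoint."""
    return (low_hz + high_hz) / 2


# The two speakers from the example in the text.
speakers: Dict[str, Tuple[float, float]] = {"A": (125.0, 325.0), "B": (175.0, 375.0)}
for name, (low, high) in speakers.items():
    print(name, pitch_span(low, high), pitch_level(low, high))
```

Both speakers have the same span (200 Hz) but differ in level by 50 Hz, whichever index of level is chosen.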
Tonal scaling is influenced by tonal context: for example, according to Pierrehumbert (1980: 136), the difference between the vocative chant H*+L- H- L% and a straightforward declarative, H* L- L%, is that the L% in the former melody remains above the baseline (and is realized as sustained level pitch), while the L% in the latter is realized as a fall to the baseline. In Pierrehumbert’s analysis, this difference is due to tonal context: in H*+L H-L%, L% is ‘upstepped’ (i.e. scaled higher) after a H- phrase accent; this context does not apply in H* L-L%, so L% reaches the baseline. One exception to the view that each H tone’s scaling is calculated as a fraction of the preceding H is what Liberman and Pierrehumbert (1984) have called ‘final lowering’, the fact that the final peak in a series is scaled lower than what a linear relation between successive peaks would predict. It has been reported in several languages with very different prosodic systems, including Japanese (Pierrehumbert and Beckman 1988), Dutch (Gussenhoven and Rietveld 1988), Yoruba (Connell and Ladd 1990; Laniran and Clements 2003), Kipare (Herman 1996), Spanish (Prieto et al. 1996), and Greek (Arvaniti and Godjevac 2003); see Truckenbrodt (2004, 2016) for an alternative analysis of final lowering. Tonal identity refers to different effects of a number of factors on the scaling of H and L tones. In English, for instance, L tones are said to be upstepped following H tones, while the reverse does not apply (Pierrehumbert 1980). Further, changes in pitch range affect the scaling of H and L tones in different ways: L tones tend to get lower when pitch span expands, while H tones get higher (e.g. Pierrehumbert and Beckman 1988; Gussenhoven and Rietveld 2000). An aspect of tonal scaling that has attracted considerable attention is ‘downstep’, the lower-than-expected scaling of H tones. 
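One simple reading of the scaling described above can be sketched numerically: a baseline that declines to the speaker's bottom value, accentual H peaks whose height above the baseline is a fixed fraction of the preceding peak's height, and an optional extra factor on the last peak standing in for final lowering. The parameter names and values are hypothetical, and this is not Liberman and Pierrehumbert's (1984) exact formulation.

```python
from typing import List


def baseline(t: float, bottom: float, slope: float, end: float) -> float:
    """Declining baseline (Hz): equal to `bottom` at time `end`, higher earlier."""
    return bottom + slope * (end - t)


def scale_peaks(times: List[float], first_height: float, ratio: float,
                bottom: float, slope: float, end: float,
                final_lowering: float = 1.0) -> List[float]:
    """f0 (Hz) of a series of accentual H peaks at the given times (s).

    Each peak's height above the baseline is `ratio` times the previous
    peak's height; the last peak is additionally multiplied by
    `final_lowering` (1.0 means no final lowering).
    """
    heights = [first_height]
    for _ in times[1:]:
        heights.append(heights[-1] * ratio)
    heights[-1] *= final_lowering
    return [baseline(t, bottom, slope, end) + h for t, h in zip(times, heights)]


times = [0.0, 0.5, 1.0]
plain = scale_peaks(times, 100.0, 0.8, bottom=80.0, slope=10.0, end=1.0)
lowered = scale_peaks(times, 100.0, 0.8, bottom=80.0, slope=10.0, end=1.0,
                      final_lowering=0.9)
print([round(f, 1) for f in plain])
print([round(f, 1) for f in lowered])
```

Only the last value changes when final lowering applies: it ends up below what the constant ratio between successive peaks would predict.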
In Pierrehumbert (1980) and Beckman and Pierrehumbert (1986), the essential premise is that downstep is the outcome of contextual rules. Thus, Pierrehumbert (1980) posits that downstep applies to the second H tone in a HLH sequence, as in the case of the second H in Pierrehumbert’s (1980) representation of the vocative chant H*+L- H- L% above. In Beckman and Pierrehumbert (1986), tonal identity is also a key factor: all bitonal pitch accents trigger downstep of a following H tone. This position has been relaxed somewhat as it has been found that bitonal accents do not always trigger downstep in spoken discourse in American English (Pierrehumbert 2000). Others have argued that downstepped accents differ in meaning from accents that are not downstepped and thus that downstep should be treated as an independent phonological feature to mark the contrast between downstepped and non-downstepped accents, such as !H* and H* respectively (Ladd 1983, 2008b). The issue of whether downstep is a matter of context-dependent phonetic scaling or represents a meaningful choice remains unresolved for English and has been a matter of debate more generally. In some AM systems, additional notations are used to indicate differences in scaling. For example, 0% and % have been used to indicate an intermediate level of pitch that is neither high nor low within a given melody and is often said to reflect a return to a default mid pitch in the absence of a tonal specification (e.g. Grabe 1998a; Gussenhoven 2005; see Ladd 2008b: ch. 3 and Arvaniti 2016 for discussion). Still other systems incorporate additional variations in pitch, such as ‘upstep’, or higher-than-expected scaling. For instance, Grice et al. (2005a) use ^H% in the analysis of German intonation, and Fletcher (2014: 272) proposes ^ as ‘an upstepped or elevated pitch diacritic’ in her analysis of Dalabon.
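The two rule-based views of downstep summarized above can be compared on the same tunes. This is a minimal sketch: the rules are simplified from the summaries in the text (the second H of an H L H sequence, following Pierrehumbert 1980; the first H in the element immediately after a bitonal accent, following Beckman and Pierrehumbert 1986), the vocative chant is written here as H*+L H- L%, and the function names are hypothetical. Ladd's treatment of downstep as an independent feature (!H*) is not modelled.

```python
from typing import List, Tuple

Tone = Tuple[int, str]  # (index of the label in the tune, single tone)


def flatten(labels: List[str]) -> List[Tone]:
    """Split labels such as 'H*+L' into their component tones."""
    return [(i, part) for i, label in enumerate(labels) for part in label.split("+")]


def is_high(tone: str) -> bool:
    return "H" in tone


def downstep_hlh(labels: List[str]) -> List[Tone]:
    """Simplified Pierrehumbert (1980) rule: the second H of an H L H sequence."""
    tones = flatten(labels)
    return [tones[k] for k in range(2, len(tones))
            if is_high(tones[k - 2][1]) and not is_high(tones[k - 1][1])
            and is_high(tones[k][1])]


def downstep_after_bitonal(labels: List[str]) -> List[Tone]:
    """Simplified Beckman and Pierrehumbert (1986) rule: an H after a bitonal accent."""
    hits = []
    for i in range(1, len(labels)):
        if "+" in labels[i - 1]:  # the preceding element is a bitonal pitch accent
            highs = [part for part in labels[i].split("+") if is_high(part)]
            if highs:
                hits.append((i, highs[0]))
    return hits


vocative = ["H*+L", "H-", "L%"]
two_peaks = ["L+H*", "H*", "L-", "L%"]
for tune in (vocative, two_peaks):
    print(" ".join(tune), downstep_hlh(tune), downstep_after_bitonal(tune))
```

Both rules downstep H- in the vocative chant, but only the bitonal-accent rule downsteps the second H* in L+H* H* L-L%. This is the kind of divergence that makes the choice between contextual rules and a contrastive downstep feature an empirical question.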
The use of symbols such as 0% reflects the awkwardness that mid-level tones pose for analysis, particularly if evidence suggests that such mid-level tones contrast with H and L tones proper, as has been argued for Greek (Arvaniti and Baltazani 2005), Maastricht Limburgish (Gussenhoven 2012b), Polish (Arvaniti et al. 2017), and German (Peters 2018). The use of diacritics more generally reflects the challenge of determining what is phonological and what is phonetic in a given intonational system, and thus what should be part of the phonological representation; see Jun and Fletcher (2014) and Arvaniti (2016) for a discussion of field methods that address these issues. A reason why separating phonological elements from phonetic realization in intonation is such a challenge is the significant amount of variation attested in the realization of intonation. Even speakers of languages that have a number of different contrastive pitch accents may realize the same pitch accent category in different ways. Niebuhr et al. (2011a), for instance, report data from North German and Neapolitan Italian showing that some speakers signal a pitch accent category via f0 shape, whereas others manipulate tonal alignment (see also Grice et al. 2017 on individual variation in the realization and interpretation of pitch accents in German). Tonal alignment may also vary depending on dialect (e.g. Grabe et al. 2000 on varieties of English; Atterer and Ladd 2004 on varieties of German), and even the amount of voiced material available (Baltazani and Kainada 2015 on Ipiros Greek; Grice et al. 2015 on Tashlhiyt Berber). There may also be variation in the degree of rising or falling around the tone target, or general pitch scaling differences depending on where the target occurs in an utterance (IP, ip, or AP), degree of speaker emphasis, dialect, or speaker-specific speaking style. 
Phrase accent and boundary tone targets also vary in terms of their phonetic realization even across typologically related varieties. The classic fall-rise tune of Australian English H* L-H% is often realized somewhat differently from the same phonological tune in Standard Southern British English. The L-H% represents a final terminal rise in both varieties but scaling of the final H% tone tends to be somewhat higher in Australian English and is often described as a component of ‘uptalk’ (Warren 2016; see Fletcher and Stirling 2014 and chapter 19 for more detailed discussion). It follows from the preceding discussion that the actual phonetic realization of tonal elements can be gradient with respect to both alignment and scaling. Listeners may not necessarily interpret the different realizations as indicative of different pragmatic effects, suggesting that there is no need to posit additional contrastive categories to represent this variation. It is therefore important that a phonetic model for any language or language variety can account for this kind of realizational variation (for a proposal on how to do so, see Arvaniti 2019 and chapter 9).

6.3.3 Interpolation and tonal crowding

Interpolation is a critical component of AM, as it is directly linked to the important AM tenet that melodies are not fully specified at either the phonological or the phonetic level, in that the f0 on most syllables in an utterance is determined by the surrounding tones. The phonological level involves a small number of tones; at the phonetic level, it is only these tones that are planned as tonal targets, while the rest of the f0 contour is derived by interpolation between them. The advantages of modelling f0 in this manner were first illustrated with Tokyo Japanese data by Pierrehumbert and Beckman (1988: 13 ff.). They showed that the f0 contours of APs without an accented word could be modelled by positing only one H target, associated with the AP’s second mora, and one L target at the beginning of the following AP; the f0 slope from the H to the L target depended on the number of morae between the two. This change in the f0 slope is difficult to model if every mora is specified for f0. Despite its importance, interpolation has not been investigated as extensively as alignment and scaling. The interpolation between targets is expected to be linear and can be conceived of as the equivalent of an articulator’s trajectory between two constrictions (cf. Browman and Goldstein 1992b). An illustration of the mapping between phonological structure and f0 contour is provided in (4), where the open circles in the f0 contour at the bottom represent tonal targets for the four tones of the phonological representation. As mentioned in §6.2.3, the L- phrase accent in this melody shows a secondary association to the end of the word with the nuclear accent, here ‘ballet’, and thus has two targets, leading to its realization as a stretch of low f0 (for a detailed discussion, see Pierrehumbert and Beckman 1988: ch. 6).

(4)

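[Diagram: prosodic structure of ‘a ballet aficionado’ (nodes IP, ips, ωs, ωw, Fs, Fw, Fs over the syllables a bal let a fi cio na do, labelled σw σs σw σw σs σw σs σw), with the tune L*+H L-H% and an f0 contour in which open circles mark the tonal targets]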
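The Tokyo Japanese case described above can be sketched with just two targets per unaccented AP. The assumptions are hypothetical and chosen for illustration: equal mora durations, the H target at the midpoint of the AP's second mora, the L target at the start of the following AP, and arbitrary f0 values. The per-mora f0 values are not specified anywhere; they fall out of linear interpolation.

```python
from typing import List, Tuple


def unaccented_ap(n_morae: int, mora_dur: float = 0.1,
                  f0_h: float = 180.0, f0_next_l: float = 120.0) -> Tuple[float, List[float]]:
    """Two-target sketch of an unaccented Tokyo Japanese AP.

    Hypothetical assumptions: equal mora durations, the H target at the
    midpoint of the AP's second mora, and the L target at the start of the
    following AP (i.e. at the end of this one). Returns the H-to-L slope
    (Hz/s) and the interpolated f0 at the midpoint of each mora after the H.
    """
    if n_morae < 2:
        raise ValueError("the H target needs a second mora")
    t_h = 1.5 * mora_dur           # midpoint of the second mora
    t_l = n_morae * mora_dur       # start of the following AP
    slope = (f0_next_l - f0_h) / (t_l - t_h)
    later = [f0_h + slope * ((k + 0.5) * mora_dur - t_h) for k in range(2, n_morae)]
    return slope, later


for n in (3, 5, 8):
    slope, values = unaccented_ap(n)
    print(n, round(slope), [round(v) for v in values])
```

The same two targets yield a steep fall in a three-mora AP and a much gentler one in an eight-mora AP, which is the slope effect that is awkward to capture if every mora carries its own f0 specification.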

One possible exception to linear interpolation is the ‘sagging’ interpolation between two H* pitch accents discussed in Pierrehumbert (1980, 1981) for English; sagging interpolation is said to give rise to an f0 dip between the two accentual peaks. It has always been seen as something of an anomaly, leading Ladd and Schepman (2003) to suggest that in English its presence is more plausibly analysed as the reflex of a L tone. Specifically, Ladd and Schepman (2003) proposed that the pitch accent of English represented in Pierrehumbert (1980) as H* should be represented instead as (L+H)*, a notation implying that in English both the L and H tone are associated with the stressed syllable. Independently of how sagging interpolation is phonologically analysed, non-linear interpolation is evident in the realization of some tonal events. For instance, L*+H and L+H* in English differ in terms of shape, the former being concave and the latter convex, a difference that is neither captured by their autosegmental representations nor anticipated by linear interpolation between the L and H tones (Barnes et al. 2010b). In order to account for this difference, Barnes et al. (2012a, 2012b) proposed a new measure, the Tonal Center of Gravity (see chapter 9). Further, although it is generally assumed that tones are realized as local peaks and troughs, evidence suggests this is not always the case. L tones may be realized as stretches of low f0, a realization that may account for the difference between convex L+H* (where the L tone is realized as a local trough) and concave L*+H (where the L tone is realized as a low f0 stretch). Similarly, H tones may be realized not as peaks but as plateaux. In some languages, plateaux are used interchangeably with peaks (e.g. Arvaniti 2016 on Romani), while in others the two are distinct, so that the use of peaks or plateaux may affect the interpretation of the tune (e.g. D’Imperio 2000 and D’Imperio et al. 
2000 on Neapolitan Italian), the scaling of the tones involved (e.g. Knight and Nolan 2006 and Knight 2008 on British English), or both (Barnes et al. 2012a on American English). Data like these indicate that a phonetic model involving only targets as turning points and linear interpolation between them may be too simple to fully account for all phonetic detail pertaining to f0 curves or for its processing by native listeners. Nevertheless, the perceptual relevance of these additional details is at present far from clear. As noted above, the need for interpolation comes from the fact that the phonological representation of intonation is sparse; for example, ‘a ballet aficionado’ in (2) has eight syllables but the associated melody has a total of four tones. Nevertheless, it is also possible for the reverse to apply—that is, for an utterance to have more tones than TBUs; ‘Lou?’ uttered with the same L*+H L-H% tune (as in Figure 6.1b) is such an instance, as four tones must be realized on one syllable. In AM, this phenomenon is referred to as ‘tonal crowding’. Tonal crowding is phonetically resolved in a number of ways: (i) ‘truncation’, the elision of parts of the contour (Bruce 1977 on Swedish; Grice 1995a on English; Arvaniti et al. 1998 and Arvaniti and Ladd 2009 on Greek; Grabe 1998a on English and German); (ii) ‘undershoot’, the realization of all tones without them reaching their targets (Bruce 1977 on Swedish; Arvaniti et al. 1998, 2000, 2006a, 2006b on Greek; Prieto 2005 on Catalan); and (iii) temporal realignment of tones (Silverman and Pierrehumbert 1990 on American English; Arvaniti and Ladd 2009 on Greek). Undershoot and temporal realignment often work synergistically, giving rise to ‘compression’. Attempts have been made within AM to pin different resolutions of tonal crowding to specific languages (Grabe 1998a). 
Empirical evidence, however, indicates that the mechanism used is specific to elements in a tune, rather than to a language as a whole (for discussion see Ladd 2008b; Arvaniti and Ladd 2009; Arvaniti 2016).

Arvaniti et al. (2017) proposed using tonal crowding as a diagnostic of a putative tone’s phonological status, as it allows us to distinguish optional tune elements (those that are truncated in tonal crowding) from required elements (those that are compressed under the same conditions).
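The contrast between truncation and compression, and the diagnostic built on it, can be sketched as follows. The tune uses abstract labels T1 to T4, and the `required` flags, the even spacing of targets and the `min_gap` threshold are all invented for illustration; they are not claims about any language. In real data the diagnostic runs in the opposite direction: one observes which tones survive crowding and infers their status. Undershoot of f0 values and temporal realignment beyond even spacing are not modelled.

```python
from typing import Dict, List, Tuple


def realize(tones: List[Tuple[str, bool]], start: float, end: float,
            min_gap: float = 0.05) -> List[Tuple[str, float]]:
    """Place tonal targets evenly between start and end (s).

    `tones` pairs each label with a hypothetical `required` flag. If the
    stretch cannot hold every target `min_gap` apart, optional tones are
    truncated; the surviving targets are then compressed into the stretch.
    """
    if end - start < (len(tones) - 1) * min_gap:
        tones = [t for t in tones if t[1]]  # truncation of optional tones
    if not tones:
        return []
    if len(tones) == 1:
        return [(tones[0][0], (start + end) / 2)]
    step = (end - start) / (len(tones) - 1)  # compression: spacing shrinks with the stretch
    return [(label, start + i * step) for i, (label, _) in enumerate(tones)]


def diagnose(uncrowded: List[Tuple[str, float]],
             crowded: List[Tuple[str, float]]) -> Dict[str, str]:
    """Truncated under crowding -> optional; still realized (compressed) -> required."""
    present = {label for label, _ in crowded}
    return {label: "required" if label in present else "optional" for label, _ in uncrowded}


# Abstract tune; which tone is optional is invented for the example.
tune = [("T1", True), ("T2", False), ("T3", True), ("T4", True)]
roomy = realize(tune, 0.0, 0.6)    # e.g. a polysyllabic stretch
cramped = realize(tune, 0.0, 0.1)  # e.g. a single short syllable
print(cramped)
print(diagnose(roomy, cramped))
```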

6.4 Applications of AM

The best-known application of AM is the family of ToBI systems. ToBI (Tones and Break Indices) was a tool originally developed for the prosodic annotation of American English corpora (Silverman et al. 1992; see also Beckman et al. 2005; Brugos et al. 2006). Since then several similar systems have been developed for a variety of languages (see e.g. Jun 2005a, 2014a for relevant surveys). In order to distinguish the system for American English from the general concept of ToBI, the term MAE_ToBI has been proposed for the former (where MAE stands for Mainstream American English; Beckman et al. 2005). ToBI was originally conceived as a tool for research and speech technology; for example, the MAE_ToBI annotated corpus can be searched for instances of an intonational event, such as the H* accent in English, so that a sample of its instantiations can be analysed and generalizations as to its realization reached. Such generalizations are useful not only for speech synthesis but also for phonological analysis and the understanding of variation (for additional uses and extensions see Jun 2005c). ToBI representations consist of a sound file, an associated spectrogram and a pitch track, and several tiers of annotation. The required tiers are the tonal tier (a representation of the tonal structure of the pitch contour) and the break index tier (in which the perceived strength of prosodic boundaries is annotated using numbers). In the MAE_ToBI system, [0] represents cohesion between items (such as flapping between words in American English), [1] represents the small degree of juncture expected between most words, [3] and [4] represent ip and IP boundaries respectively, and [2] is reserved for uncertainty (e.g. for cases where the annotator cannot find tonal cues for the presence of a phrasal boundary but does perceive a strong degree of juncture).
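The MAE_ToBI break index values described above can be used to recover phrasing from a break index tier. This is a minimal sketch under simplifying assumptions: the tier is given as one plain integer per word (real ToBI tiers are time-aligned label files), index 2 is treated as not closing a phrase, any further annotation conventions are ignored, and the function name is hypothetical.

```python
from typing import List

# MAE_ToBI break indices as described in the text.
BREAK_MEANING = {
    0: "cohesion between words (e.g. cross-word flapping)",
    1: "ordinary juncture between words",
    2: "uncertainty (e.g. strong juncture without tonal cues)",
    3: "intermediate phrase (ip) boundary",
    4: "intonational phrase (IP) boundary",
}


def group_phrases(words: List[str], breaks: List[int]) -> List[List[List[str]]]:
    """Group words into IPs, each a list of ips, from one break index per word."""
    if len(words) != len(breaks):
        raise ValueError("expected one break index per word")
    ips: List[List[str]] = []        # ips of the IP currently being built
    current: List[str] = []          # words of the ip currently being built
    result: List[List[List[str]]] = []
    for word, b in zip(words, breaks):
        if b not in BREAK_MEANING:
            raise ValueError(f"unknown break index {b}")
        current.append(word)
        if b >= 3:                   # both 3 and 4 close an ip
            ips.append(current)
            current = []
        if b == 4:                   # 4 also closes the IP
            result.append(ips)
            ips = []
    if current:                      # material after the last boundary
        ips.append(current)
    if ips:
        result.append(ips)
    return result


words = ["Marianna", "made", "the", "marmalade", "and", "left"]
print(group_phrases(words, [1, 1, 1, 3, 1, 4]))
```

Because 3 and 4 are ordered, a single comparison (`b >= 3`) closes an ip at either boundary type, while only 4 also closes the enclosing IP. This mirrors the nesting of ips inside IPs.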
A ToBI system may also include an orthographic tier and a miscellaneous tier for additional information, such as disfluencies. Brugos et al. (2018) suggest incorporating an ‘alternatives’ tier, which allows annotators to capture uncertainty in assigning a particular tonal category. The content of all tiers can be adapted to the prosodic features of the system under analysis, but also to particular research needs and theoretical positions of the developers. For instance, Korean ToBI (K_ToBI) includes both a phonological and a phonetic tier (Jun 2005b), while Greek ToBI (GR_ToBI) marks sandhi, which is widespread in Greek and thus of phonological interest (Arvaniti and Baltazani 2005). ToBI as a concept has often been misunderstood. Some have taken ToBI to be the equivalent of an IPA alphabet for intonation, a claim that the developers of ToBI have taken pains to refute (e.g. Beckman et al. 2005; Beckman and Venditti 2011). A ToBI annotation system presupposes and is meant to rely on a phonological analysis of the contrastive elements of the intonation and prosodic structure of the language or language variety in question. ToBI can, however, be used as an informal tool to kick-start such an analysis on the understanding that annotations will have to be revisited once the phonological analysis is complete (Jun and Fletcher 2014; Arvaniti 2016).

6.5 Advantages over other models

AM offers a number of advantages, both theoretical and practical, relative to other models. A major feature that distinguishes AM is that as a phonological model it relies on the combined investigation of form and meaning, and the principled separation of phonological analysis and phonetic exponence. The former feature distinguishes AM from the Institute for Perception Research (IPO) model (’t Hart et al. 1990), which focuses on intonation patterns but strictly avoids the investigation of intonational meaning. The latter feature contrasts AM with systems developed for the modelling and analysis of f0, such as PENTA (e.g. Xu and Prom-on 2014), INTSINT (Hirst and Di Cristo 1998), or Fujisaki’s model (e.g. Fujisaki 1983), which do not provide an abstract phonological representation from which the contours they model are derived. As argued elsewhere in some detail (Arvaniti and Ladd 2009), the principled separation of phonetics and phonology in AM gives the theory predictive power and allows it to reach useful generalizations about intonation and its relation to the rest of prosody, while accounting for attested phonetic variation. In terms of phonetic implementation, the target-and-interpolation modelling of f0 allows for elegant and parsimonious analyses of complex f0 patterns, as shown by Pierrehumbert and Beckman (1988) for Japanese. AM can also accommodate non-linear interpolation, unlike the IPO (’t Hart et al. 1990). In addition, although tonal crowding is extremely frequent cross-linguistically, AM is the only model of intonation that can successfully handle it and predict its outcomes (see e.g. Arvaniti and Ladd 2009 and Arvaniti and Ladd 2015 for comparisons of the treatment of tonal crowding in AM and PENTA; see also Xu et al. 2015). Further, by relying on the formal separation of metrical structure and the tonal string, AM has disentangled stress from intonation.
This has been a significant development, in that the effects of stress and intonation on a number of acoustic parameters, particularly f0, have often been confused in the literature (see Gordon 2014 for a review and chapter 5). This confusion applies both to documenting the phonetics of stress and intonation, and to developing a better understanding of the role of intonation in focus and the encoding of information structure. Research within AM has shown that it is possible for words to provide new information in discourse without being accented, or to be accented without being discourse prominent (Prieto et al. 1995 on Spanish; Arvaniti and Baltazani 2005 on Greek; German et al. 2006, Beaver et al. 2007, and Calhoun 2010 on English; Arvaniti and Adamou 2011 on Romani; Chahal and Hellmuth 2014b on Egyptian Arabic). Finally, since AM reflects a general conceptualization of the relationship between tonal elements on the one hand and vowels and consonants on the other, it is sufficiently flexible to allow for the description of diverse prosodic systems (including systems that combine lexical and post-lexical uses of tone) and the development of related applications. In addition to the development of ToBI-based descriptive systems, as discussed in §6.4, such applications include modelling adult production and perception, developing automatic recognition and synthesis algorithms, and modelling child development, disorders, and variation across contexts, speakers, dialects, and languages (see e.g. Sproat 1998 on speech synthesis; Lowit and Kuschmann 2012 on intonation in motor speech disorders; Thorson et al. 2014 on child development; Gravano et al. 2015 on prosodic entrainment and speaker engagement detection; Kainada and Lengeris 2015 on L2 intonation; Prom-on et al. 2016 on modelling intonation; see Cole and Shattuck-Hufnagel 2016 for a general discussion). In conclusion, AM is a flexible and adaptable theory that accounts for both tonal phonology and its relation to tonal phonetics across languages with different prosodic systems, and it can be a strong basis for developing a gamut of applications for linguistic description and speech technology.
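The target-and-interpolation idea credited above can be illustrated with its phonetic half: the phonology supplies a sparse string of tonal targets, and the f0 contour between them is filled in by interpolation. The following is a minimal Python sketch, assuming straight-line interpolation (the simplest case; AM can also accommodate non-linear interpolation) and hypothetical target times and f0 values. It does not model how tones are mapped onto targets, which is where the phonological analysis lies.

```python
from bisect import bisect_right

# Hypothetical tonal targets for one short phrase, as (time in ms, f0 in Hz):
# say a high pitch-accent target, a low target, and a final high target.
TARGETS = [(200, 220.0), (500, 160.0), (700, 200.0)]


def interpolate_f0(targets, t):
    """Return f0 at time t by straight-line interpolation between the two
    surrounding targets. f0 is held level before the first target and after
    the last one. `targets` must be sorted by time."""
    times = [time for time, _ in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t)  # now times[i - 1] <= t < times[i]
    (t0, f0), (t1, f1) = targets[i - 1], targets[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


# Sample the interpolated contour every 100 ms.
contour = [(t, interpolate_f0(TARGETS, t)) for t in range(100, 801, 100)]
print(contour)
```

Because only the targets are stored, changing one tone's target moves the whole stretch of contour around it, which is the parsimony the section attributes to this style of modelling.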

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

chapter 7

Prosodic Morphology
John J. McCarthy

7.1 Introduction
The phrase ‘prosodic morphology’ refers to a class of linguistic phenomena in which prosodic structure affects morphological form. In the Nicaraguan language Ulwa, for example, possessive morphemes are observed to occur after the main stress of the word, which always falls on one of the first two syllables in an iambic pattern, shown in (1) (Hale and Lacayo Blanco 1989; McCarthy and Prince 1990a).

(1) Ulwa possessives
    ˈsuːlu      ‘dog’        ˈsuː-ki-lu       ‘my dog’
                             ˈsuː-ma-lu       ‘your (sg.) dog’
                             ˈsuː-ka-lu       ‘his/her dog’
    ˈbas        ‘hair’       ˈbas-ka          ‘his/her hair’
    ˈasna       ‘clothes’    ˈas-ka-na        ‘his/her clothes’
    saˈna       ‘deer’       saˈna-ka         ‘his/her deer’
    siˈwanak    ‘root’       siˈwa-ka-nak     ‘his/her root’
    aˈrakbus    ‘gun’        aˈrak-ka-bus     ‘his/her gun’

In prosodic morphological terms, the possessive is suffixed to the main-stress metrical foot of the word: (siˈwa)-ka-nak. The possessive suffix subcategorizes for a prosodic constituent, the main-stress foot, rather than a morphological one, the stem. Ulwa is an example of infixation (§7.6), because the possessive suffix is internal to words with non-final stress. Other prosodic morphological phenomena to be discussed include reduplication (§7.3), root-and-pattern morphology (§7.4), and truncation (§7.5). First, though, a brief summary of the relevant assumptions about prosodic structure is necessary.
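The Ulwa generalization in (1) amounts to a two-step procedure: locate the head foot, then suffix to it. The Python sketch below is a hypothetical illustration of that procedure, not McCarthy's formalism. It assumes hand-supplied syllabification, omits stress marks, and encodes the iambic head foot described above as either a heavy first syllable on its own or the first two syllables.

```python
VOWELS = set("aeiou")


def is_heavy(syllable):
    """Heavy (bimoraic) = ends in a long vowel or a consonant.
    Diphthongs are not handled; the forms in (1) contain none."""
    return syllable.endswith("ː") or syllable[-1] not in VOWELS


def head_foot_length(syllables):
    """Syllables in the word-initial iambic head foot: a heavy first
    syllable is a foot on its own, otherwise the first two syllables."""
    if len(syllables) == 1 or is_heavy(syllables[0]):
        return 1
    return 2


def possessive(syllables, affix="ka"):
    """Suffix the possessive to the head foot rather than to the stem."""
    n = head_foot_length(syllables)
    parts = ["".join(syllables[:n]), affix, "".join(syllables[n:])]
    return "-".join(part for part in parts if part)


# Input is pre-syllabified by hand; stress marks are omitted.
for word in (["suː", "lu"], ["bas"], ["as", "na"], ["sa", "na"],
             ["si", "wa", "nak"], ["a", "rak", "bus"]):
    print(possessive(word))
```

Run on the stems in (1), it prints suː-ka-lu, bas-ka, as-ka-na, sana-ka, siwa-ka-nak, and arak-ka-bus, matching the third-person forms apart from stress marks. Whether the affix surfaces as a suffix or an infix falls out of word length, with no separate rule for each.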

7.2 Prosodic structure
Word prosody is an area of lively research and consequent disagreement, but there are certain fairly standard assumptions that underlie much work on prosodic morphology (though for other views see Downing 2006: 35; Inkelas 1989a, 2014: 84). The constituents of word prosody are the prosodic or phonological word (ω), the metrical or stress foot (Ft), the syllable (σ), and the mora (μ). The parsing of words into metrical feet is fundamental to most theories of word stress, with binary feet accounting for the typical rhythmic patterns of stress assignment: (ˌipe)(ˌcacu)(ˈana) (i.e. the English word ipecacuana). The mora is the unit of syllable weight. Generally, syllables ending in a short vowel (often referred to as CV syllables) are monomoraic and therefore light, while syllables ending in a long vowel, a diphthong, or a consonant (CVː, CVV, and CVC syllables) are bimoraic and therefore heavy. Some languages, called quantity-insensitive, do not make distinctions of syllable weight; in these languages, all syllables (or perhaps all CV and CVC syllables) are monomoraic (see chapter 5). These constituents are arranged into a prosodic hierarchy as in (2) (Selkirk 1981), in which every constituent of level n is obligatorily headed by a constituent of level n−1.

(2) Prosodic hierarchy
    ω
    |
    Ft
    |
    σ
    |
    μ

The head of a prosodic word is its main-stress foot, the head of a foot is the syllable that bears the stress, and the head of a syllable is the mora that contains the syllable nucleus. In addition to the headedness requirement, there are various principles of form that govern each level of the prosodic hierarchy. Of these, the one that is most important in studying prosodic morphology is foot binarity, the requirement that feet contain at least two syllables or morae. Many languages respect foot binarity absolutely; all languages, it would appear, avoid unary feet whenever it is possible to form a binary foot. Combining the headedness requirement of the prosodic hierarchy with foot binarity leads to the notion of a minimal word (Broselow 1982; McCarthy and Prince 1986/1996).
If every word must contain some foot to serve as its head, and if every foot must contain at least two syllables or two morae, then the smallest or minimal word in a language that respects foot binarity will be a disyllable (if distinctions of syllable weight are not made) or a single heavy syllable (if distinctions of syllable weight are made). Thus, in the Australian language Diyari (Austin 1981; Poser 1989), which is quantity-insensitive, monosyllabic words are prohibited, while in Latin, which is quantity-sensitive, the smallest word is the smallest foot, a heavy monosyllable CVC, CVː, or CVV, as in (ˈsol) ‘sun’, (ˈmeː) ‘me’, or (ˈkui) ‘to whom’. (Though for other views of the minimal word = minimal foot equivalence see Hayes 1995: 86; Garrett 1999; Gordon 1999: 255.)
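The minimal-word argument is a short chain of checks (every word needs a head foot, every foot must be binary), so it can be written down directly. The following Python sketch assumes pre-syllabified input, the five vowels a, e, i, o, u, and rhymes of at most two segments. The CV monosyllable pa is a hypothetical form, and the syllabification wi.l̪a of Diyari ‘woman’ is assumed for illustration.

```python
VOWELS = set("aeiou")


def moras(syllable):
    """Moras in one syllable: onset consonants are weightless, the first vowel
    contributes one mora, and anything after it (a length mark, a second vowel,
    or a coda consonant) contributes a second. Capped at two."""
    i = 0
    while i < len(syllable) and syllable[i] not in VOWELS:
        i += 1  # skip the onset
    return min(len(syllable) - i, 2)


def is_possible_word(syllables, quantity_sensitive):
    """Minimal-word check: the word must be able to contain one binary foot,
    i.e. two syllables, or (if syllable weight counts) one heavy syllable."""
    if len(syllables) >= 2:
        return True
    if quantity_sensitive:
        return moras(syllables[0]) >= 2
    return False  # quantity-insensitive: a lone syllable is a single mora


print(is_possible_word(["sol"], quantity_sensitive=True))   # Latin-type system
print(is_possible_word(["sol"], quantity_sensitive=False))  # Diyari-type system
```

The same monosyllable passes or fails depending only on whether the language counts weight, which is the contrast drawn between Latin and Diyari.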

7.3 Reduplication
Reduplicative morphology involves copying all or part of a word. From the standpoint of prosodic morphology, partial reduplication is more interesting because prosodic structure determines what is copied. The naive expectation is that reduplication identifies a prosodic constituent in the stem and then copies it. This is not always the case, however, and is in fact atypical. Much more commonly, reduplication involves copying sufficient material to create a new prosodic constituent (Marantz 1982). The prosodic requirement is imposed on the copied material, not on the base from which it was copied. The example in (3) will clarify this important distinction. In the Philippine language Ilokano, the plural of nouns is formed by prefixing sufficient copied material to make a heavy syllable (Hayes and Abad 1989).

(3) Heavy syllable reduplication in Ilokano
    pusa       pus-pusa        ‘cat/pl.’
    kaldiŋ     kal-kaldiŋ      ‘goat/pl.’
    ʤanitor    ʤan-ʤanitor     ‘janitor/pl.’

A heavy syllable is the prosodic characterization of the prefixed reduplicative material: pus-, kal-, and ʤan- are all heavy syllables. It is clearly not the case, however, that a syllable, heavy or otherwise, is targeted in the stem and then copied. Although kal happens to be a syllable in the stem, pus and ʤan are not. Rather, these segmental sequences in the stem are split across two syllables: pu.sa, ʤa.ni.tor. Other examples are given in §25.3. The analysis of partial reduplication posits a special type of morpheme, called a prosodic template, that characterizes the shape of the reduplicated material. In the Ilokano example, this morpheme is a heavy syllable, [μμ]σ, that is devoid of segments. The heavy-syllable prefix borrows segments from the stem to which it is attached via a copying operation. The details of how copying is achieved are not directly relevant to the topic of this volume, but see Marantz (1982), McCarthy and Prince (1988, 1999), Steriade (1988), Raimy (2000), Inkelas and Zoll (2005: 25), and McCarthy et al. (2012) for various approaches. Like segmental morphemes, templatic reduplicative morphemes come in various forms. In addition to its heavy-syllable reduplicative prefix, Ilokano also has a light-syllable prefix with various meanings, shown in (4).
When combined with the segmental prefix ʔagin-, it conveys the sense of pretending to do something.

(4) Light-syllable reduplication in Ilokano (Hayes and Abad 1989)
    ʤanitor    ‘janitor’     ʔagin-ʤa-ʤanitor     ‘pretend to be a janitor’
    trabaho    ‘to work’     ʔagin-tra-trabaho    ‘pretend to work’
    saŋit      ‘to cry’      ʔagin-sa-saŋit       ‘pretend to cry’

Observe that both simplex and complex onsets are copied: sa, tra. The light-syllable reduplicative template is satisfied by both CV and CCV, because onsets do not contribute to syllable weight. Wherever the template does not limit copying, the segmental make-up of the base is duplicated exactly. Another reduplicative prosodic template, particularly common in the Australian and Austronesian languages, is the foot or minimal word. Recall that the minimal word in Diyari is a disyllabic foot. So is the reduplicative prefix (which has varied morphological functions), as shown in (5).


(5) Minimal word reduplication in Diyari (McCarthy and Prince 1994a)
    ˈwil̪a          ˈwil̪a-ˈwil̪a             ‘woman’
    ˈwakari        ˈwaka-ˈwakari           ‘to break’
    ˈnaŋkan̪t̪i      ˈnaŋka-ˈnaŋkan̪t̪i        ‘catfish’
    ˈtjilparku     ˈtjilpa-ˈtjilparku      ‘bird’

The reduplicative morpheme in Diyari is quite literally a prosodic word (Austin 1981): it has its own main stress impressionistically, its first syllable has segmental allophones that are diagnostic of main stress, and it must end in a vowel, like all other prosodic words of Diyari. Reduplicated words in Diyari are prosodically compound, consisting of a minimal prosodic word followed by one that is not necessarily minimal. Why is the reduplicative part minimal even though the stem part is not? In other words, how is Diyari’s minimal word reduplication distinguished from total reduplication, like hypothetical ˈnaŋkan̪t̪i-ˈnaŋkan̪t̪i? McCarthy and Prince (1994a) argue that Diyari reduplication, and perhaps all forms of partial reduplication, are instances of what they call ‘emergence of the unmarked’. In Optimality Theory (OT), markedness constraints can be active even when they are ranked too low to compel violation of faithfulness constraints (Prince and Smolensky 1993/2004). Minimal word reduplication emerges when certain markedness constraints that are important in basic stress theory are active but dominated by faithfulness. Among these constraints is Parse-Syllable, which is violated by unfooted syllables (McCarthy and Prince 1993a). In a prosodic hierarchy of strict domination, there would be no Parse-Syllable violations, because every constituent of type n−1 would be immediately dominated by a constituent of type n. In OT, however, the force of Parse-Syllable is determined by its ranking. In Diyari, Parse-Syllable is ranked below the constraints requiring faithfulness to the underlying representation, so it is not able to force deletion of stem segments in an odd-numbered (and hence unfooted) final syllable.
But Parse-Syllable is ranked above constraints requiring total copying of the stem into the reduplicative prefix (denoted here by the ad hoc constraint Copy). The effect of this ranking is shown somewhat informally in (6).

(6) Emergence of the unmarked
Red-tjilparku

Faith

Parse-Syll Copy

a. →

[(ˈtjilpa)Ft]PWd-[(ˈtjilpar)Ftku]PWd

*

b.

[(ˈtjilpa)Ftku]PWd-[(ˈtjilpar)Ftku]PWd

**

c.

[(ˈtjilpa)Ft]PWd-[(ˈtjilpa)Ft]PWd

**

***

The losing candidate in (6b) has copied the unfooted syllable ku, and necessarily left it unfooted, because Diyari does not permit monosyllabic feet. This candidate fails because it has incurred two Parse-Syllable violations, while (6a) has only one. The losing candidate in (6c) has eliminated all Parse-Syllable violations by deleting ku from the stem, a fatal violation of faithfulness. The winning candidate in (6a) retains the stem’s Parse-Syllable violation (unavoidable because of high-ranking faithfulness), but it avoids copying that violation, at the expense of only low-ranking Copy. The extent to which other reduplicative templates, like those of Ilokano, are reducible to emergence of the unmarked, like Diyari, is a topic of discussion. See, for example, Urbanczyk (2001), Blevins (2003), Kennedy (2008), and Haugen and Hicks Kennard (2011).
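The key point of this section, that the template constrains the copy rather than picking out a constituent of the stem, can be illustrated with a descriptive Python sketch of the three templates in (3) to (5). The function names are hypothetical, each segment is assumed to be one character (so ʤ counts as one), and a, e, i, o, u are the only vowels. The functions scan the stem's segments, not its syllables, which is why pus- comes out even though pus is not a syllable of pu.sa. The minimal_word function simply states the Diyari output shape (copy through the second vowel); in McCarthy and Prince's analysis that shape is not stipulated but emerges from the ranking in (6).

```python
VOWELS = set("aeiou")


def split_onset(stem):
    """Split off the word-initial consonant(s): 'trabaho' -> ('tr', 'abaho')."""
    i = 0
    while i < len(stem) and stem[i] not in VOWELS:
        i += 1
    return stem[:i], stem[i:]


def light_syllable(stem):
    """[μ]σ template: the onset (simplex or complex) plus the first vowel."""
    onset, rest = split_onset(stem)
    return onset + rest[:1]


def heavy_syllable(stem):
    """[μμ]σ template: onset, vowel, and the next consonant (CVC).
    Assumes the first vowel is followed by a consonant, as in (3)."""
    onset, rest = split_onset(stem)
    return onset + rest[:2]


def minimal_word(stem):
    """Diyari-style shape: copy through the second vowel, giving a
    disyllabic foot that ends in a vowel like every Diyari prosodic word."""
    vowels_seen = 0
    for i, segment in enumerate(stem):
        if segment in VOWELS:
            vowels_seen += 1
            if vowels_seen == 2:
                return stem[:i + 1]
    return stem


def reduplicate(stem, template, prefix=""):
    """Prefix whatever copied material satisfies the template."""
    return prefix + template(stem) + "-" + stem


print(reduplicate("ʤanitor", heavy_syllable))
print(reduplicate("ʤanitor", light_syllable, "ʔagin-"))
print(reduplicate("tjilparku", minimal_word))
```

The three calls print ʤan-ʤanitor, ʔagin-ʤa-ʤanitor, and tjilpa-tjilparku (Diyari stress marks omitted). Swapping the template function while keeping the copying step fixed mirrors the idea that the template is the morpheme and copying supplies its segments.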

7.4 Root-and-pattern morphology
In root-and-pattern morphology, a prosodic template is the determinant of the form of an entire word, rather than just an affix, as is the case with reduplication. The prosodic template specifies the word pattern onto which segmental material (the root) is mapped (McCarthy 1981). A root-and-pattern system is arguably the fundamental organizing principle in the morphology of the Semitic languages (though see Watson 2006 for a review of divergent opinions). Some of the Classical Arabic prosodic templates are shown in (7).

(7) Classical Arabic prosodic templates based on the root ktb ‘write’
    Template   Word      Gloss                Function of template
    CaCaC      katab     ‘wrote’              basic verb form
    CaCːaC     kattab    ‘caused to write’    causative verb
    CaːCaC     kaːtab    ‘corresponded’       reciprocal verb
    CuCuC      kutub     ‘books’              plural
    CaːCiC     kaːtib    ‘writer’             agent
    maCCaC     maktab    ‘office’             place
    maCCuːC    maktuːb   ‘written’            passive participle

As usual in root-and-pattern systems, the effect of imposing a template is a fairly thorough remaking of a word’s form, so it may initially seem unrecognizable. Observe, however, that the consonants of the root are constant throughout. The same can be found with other roots; the root ħkm, for example, can also be found in other words that deal with the general concept of ‘judgement’: ħakam ‘passed judgement’, ħakkam ‘chose as arbitrator’, ħaːkama ‘prosecuted’, ħakiːm ‘judicious’, ħaːkim ‘a judge’, maħkam-at ‘a court’, and so on. For further discussion, see McCarthy (1981, 1993) and McCarthy and Prince (1990a, 1990b).
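The mapping of a root onto a template in (7) can be approximated as filling the template's C slots with the root consonants from left to right. The Python sketch below is plain string substitution, not McCarthy's autosegmental association. It assumes one symbol per consonant, an uppercase C as the slot symbol, and a Cː slot realized as a doubled consonant, as in kattab. The function name apply_template is hypothetical.

```python
def apply_template(root, template):
    """Map root consonants left to right onto the C slots of a template.
    'Cː' marks a geminate slot, written as a doubled consonant; every other
    template symbol (vowels, ː after a vowel, the m- prefix) is copied as is.
    Assumes the template has exactly as many C slots as the root has consonants."""
    consonants = iter(root)
    out = []
    i = 0
    while i < len(template):
        symbol = template[i]
        if symbol == "C":
            consonant = next(consonants)
            if i + 1 < len(template) and template[i + 1] == "ː":
                out.append(consonant * 2)  # geminate slot: kattab
                i += 2
                continue
            out.append(consonant)
        else:
            out.append(symbol)
        i += 1
    return "".join(out)


for template in ("CaCaC", "CaCːaC", "CaːCaC", "CuCuC", "CaːCiC", "maCCaC", "maCCuːC"):
    print(template, apply_template("ktb", template), apply_template("ħkm", template))
```

Holding the template constant and swapping roots (ktb, ħkm) reproduces the observation above that the root consonants stay constant while the template remakes the word; the ħkm column is generated by the sketch, not claimed to list attested words beyond those cited in the text.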

7.5 Truncation
In truncation, a portion of the stem is deleted to mark a morphological distinction (Doak 1990; Ito 1990; Mester 1990; Weeda 1992; Benua 1995; Féry 1997; Ito and Mester 1997; Bat-El 2002; Cohn 2005; Alber and Arndt-Lappe 2012). There are two ways in which prosodic structure affects truncation: by specifying what remains or by specifying what is taken away. The former is a type of templatic morphology, closely resembling reduplicative and root-and-pattern morphology (§7.3, §7.4). The latter is often referred to as subtractive morphology. In templatic truncation, a word is typically reduced to one or two syllables. This is particularly common in nicknames and terms of address, as in (8) and (9), though it can be found in other grammatical categories as well.

(8) Japanese templatic truncation (Poser 1984b, 1990; Mester 1990)
    Name       Truncated
    juːko      o-juː        Yuko
    ɾanko      o-ɾan        Ranko
    jukiko     o-juki       Yukiko
    midori     o-mido       Midori
    ʃinobu     o-ʃino       Shinobu

(9) Indonesian templatic truncation (Cohn 2005)
    Word       Truncated
    anak       nak          ‘child’
    bapak      pak          ‘father’
    Agus       Gus          personal name
    Lilik      Lik          personal name
    Glison     Son          personal name
    Mochtar    Tar          personal name

The analysis of templatic truncation is very similar to the analysis of reduplication in Diyari. The template is some version of the minimal word, a single foot. Mapping an existing word to this template shortens it to minimal size. Mapping can proceed from left to right, as in Japanese, or right to left, as in Indonesian. Mapping may also start with the stressed syllable, as in English Elizabeth/Liz, Alexander/Sandy, and Vanessa/Nessa. In subtractive truncation, the material with constant shape consists of what is removed rather than what remains. In Koasati, for example, there are processes of plural formation that truncate the final VC or Vː (10) or the final C (11) of the stem.
(10) Koasati VC subtractive truncation (Martin 1988)
    Singular          Plural
    pitaf-fi-n        pit-li-n         ‘slice up the middle’
    albitiː-li-n      albit-li-n       ‘to place on top of’
    akocofot-li-n     akocof-li-n      ‘to jump down’

(11) Koasati C subtractive truncation (Martin 1988)
    Singular          Plural
    bikot-li-n        bikoː-li-n       ‘to bend between the hands’
    asikop-li-n       asikoː-li-n      ‘to breathe’

What remains after truncation can be one, two, or three syllables long, depending on the length of the original stem. The constant, then, is that which is taken away rather than that which remains.
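The contrast between templatic and subtractive truncation can be made concrete by coding what each one holds constant. In the Python sketch below (all function names hypothetical), the templatic functions keep a bimoraic stretch mapped from the left (Japanese) or from the right (Indonesian, plus one onset consonant), while the subtractive functions delete a fixed amount from the end of the Koasati stems. Assumptions: one character per segment, lowercase spelling for the Indonesian names, a, e, i, o, u as the only vowels, no onset clusters inside the stretch being mapped, and a two-symbol final VC or Vː. The vowel lengthening in c_truncate is read off the forms in (11). Which Koasati stems take which process is lexical and is not predicted here.

```python
VOWELS = set("aeiou")
LONG = "ː"


def is_moraic(word, i):
    """Vowels and length marks bear a mora; so does a consonant that is not
    followed by a vowel (a coda). Onset clusters are not handled."""
    segment = word[i]
    if segment in VOWELS or segment == LONG:
        return True
    following = word[i + 1] if i + 1 < len(word) else ""
    return following not in VOWELS


def truncate_left(word, size=2):
    """Templatic, mapped left to right: keep the first `size` moras."""
    count = 0
    for i in range(len(word)):
        if is_moraic(word, i):
            count += 1
            if count == size:
                return word[:i + 1]
    return word


def truncate_right(word, size=2):
    """Templatic, mapped right to left: keep the last `size` moras
    plus the onset consonant before them, if there is one."""
    count = 0
    for i in range(len(word) - 1, -1, -1):
        if is_moraic(word, i):
            count += 1
            if count == size:
                start = i - 1 if i > 0 and word[i - 1] not in VOWELS else i
                return word[start:]
    return word


def vc_truncate(stem):
    """Subtractive: delete a final VC or Vː (two symbols here)."""
    return stem[:-2]


def c_truncate(stem):
    """Subtractive: delete the final C; in (11) the vowel surfaces long."""
    return stem[:-1] + LONG


print(["o-" + truncate_left(n) for n in ("juːko", "ɾanko", "jukiko", "midori", "ʃinobu")])
print([truncate_right(w) for w in ("anak", "bapak", "agus", "lilik", "glison", "mochtar")])
print([vc_truncate(s) + "-li-n" for s in ("pitaf", "albitiː", "akocofot")])
print([c_truncate(s) + "-li-n" for s in ("bikot", "asikop")])
```

In the templatic functions the output size is fixed and the deleted part varies; in the subtractive ones the deleted part is fixed and the output size varies (pit, albit, akocof), which is exactly the distinction drawn in the text.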


Subtractive truncation is not common, and often appears to be a historically secondary development in which erstwhile suffixes have been reanalysed as part of the base. As Alber and Arndt-Lappe (2012) note, there have been various efforts to analyse putative cases of subtractive truncation as something entirely different, such as phonological deletion. There have also been proposals within OT to introduce antifaithfulness constraints (i.e. constraints that require underlying and surface representations to differ from one another), and these constraints have been applied to the analysis of subtractive truncation (Horwood 1999; Alderete 2001a, 2001b; Bat-El 2002). There is a third type of truncation that cannot be readily classified as either templatic or subtractive. A class of vocatives in Southern Italian truncates everything after the stressed vowel, as illustrated in (12).

(12) Southern Italian vocatives (Maiden 1995)
    Word           Vocative
    avvoˈkatu      avvoˈka      ‘lawyer!’
    miˈkele        miˈke        ‘Michael!’
    doˈmeniko      doˈme        ‘Dominic!’

Similar phenomena can be found in other languages: Coeur d’Alene (Doak 1990; Thomason and Thomason 2004), English (Spradlin 2016), and Zuni (Newman 1965). The shape constant is that which remains after truncation (a word with final stress), but it is not a prosodic constituent such as a foot, because it is of arbitrary length. Phenomena such as this suggest that the identification of templates with prosodic constituents is insufficient. Generalized template theory (McCarthy and Prince 1993b, 1994b) allows templates to be defined by phonological constraints, much as we saw in (6). For further discussion, see Downing (2006), Flack (2007), Gouskova (2007), and Ito and Mester (1992/2003).
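The Southern Italian pattern in (12) is easy to state as a condition on the output (it must end in the stressed vowel) but not as a fixed-size constituent. The Python sketch below, with a hypothetical function name, shows this: the condition is fixed, yet the outputs are of different lengths (avvoˈka has three syllables, miˈke two). It assumes stress is marked with ˈ immediately before the stressed syllable and that a, e, i, o, u are the only vowels; it is a descriptive illustration, not an implementation of generalized template theory.

```python
VOWELS = set("aeiou")
STRESS = "ˈ"


def vocative(word):
    """Keep the word up to and including its stressed vowel; drop the rest."""
    start = word.index(STRESS)  # raises ValueError if no stress mark is present
    for j in range(start + 1, len(word)):
        if word[j] in VOWELS:
            return word[:j + 1]
    return word


for word in ("avvoˈkatu", "miˈkele", "doˈmeniko"):
    print(word, "->", vocative(word))
```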

7.6 Infixation
Infixes are affixes that are positioned internal to the root. As the Ulwa example in (1) shows, infixes sometimes fall within the general scope of prosodic morphology because prosodic factors affect their position. In Ulwa, the possessive affixes subcategorize for the head foot of the word, to which they are suffixed. Expletive infixation in English is another example (McCarthy 1982). Expletive words, such as fuckin’ or bloody, can be inserted inside other words, provided that they do not split metrical feet: (ˌabso)Ft-fuckin’-(ˈlutely)Ft, not *(ˌab-fuckin’-so)Ft(ˈlutely)Ft or *(ˌabso)Ft(ˈlute-fuckin’-ly)Ft. Prince and Smolensky (1993/2004) analyse Tagalog um-infixation, illustrated in (13), as prosodically conditioned. When a word begins with a single consonant, um is placed right after it. When a word begins with a consonant cluster, the infix variably falls within or after the cluster.

(13) Tagalog um-infixation
    sulat      s-um-ulat                    ‘to write’
    ʔabot      ʔ-um-abot                    ‘to reach for’
    gradwet    g-um-radwet ~ gr-um-adwet    ‘to graduate’
    preno      p-um-reno ~ pr-um-eno        ‘to brake’

Prince and Smolensky propose that infixed um is actually a prefix that is displaced from initial position because otherwise the word would begin with a vowel: *um-sulat. In OT’s terms, um’s prefixhood is determined by a ranked, violable constraint, Align-Left(um, word). This constraint is ranked below Onset, which requires syllables to begin with consonants. (For further details, see Klein 2002; Zuraw 2007.) McCarthy and Prince (1993b) discuss a case of reduplicative infixation in the Timugon Murut language of Borneo. In this language, a light-syllable reduplicative template is prefixed to words beginning with a consonant (14a), but it is infixed after the first syllable of a word beginning with a vowel (14b).

(14) Infixing reduplication in Timugon Murut (Prentice 1971)
    a. Copy initial CV
       bulud      ‘hill’        bu-bulud       ‘ridge’
       limo       ‘five’        li-limo        ‘about five’
    b. Skip initial V(C) and copy following CV
       ulampoj    no gloss      u-la-lampoj    no gloss
       abalan     ‘bathes’      a-ba-balan     ‘often bathes’
       ompodon    ‘flatter’     om-po-podon    ‘always flatter’

If the reduplicative prefix were not infixed, as in *u-ulampoj or *o-ompodon, the result would incur an additional Onset violation. Abstractly, the analysis is the same as in Tagalog: Onset dominates the constraint requiring left-edge alignment of the reduplicative prefix. For further examples of prosodic morphology, see chapter 25, and for a comprehensive review of infixation phenomena and another point of view, see Yu (2007).
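The ranking Onset >> Align-Left can be run as a toy OT evaluation. Each candidate places the affix at some position k; its violation profile is the pair (Onset violations, k), where k counts the segments separating the affix from the left edge. Choosing the candidate with the lexicographically smallest profile is strict domination: Align-Left decides only among candidates tied on Onset. This is a minimal Python sketch under loud assumptions: the candidate set is restricted to placements of the affix, Onset is approximated as a vowel that is word-initial or follows another vowel (no real syllabification), and the function names are hypothetical. Under this strict ranking the sketch derives only g-um-radwet; the variant gr-um-adwet needs further machinery (see Zuraw 2007).

```python
VOWELS = set("aeiou")


def onset_violations(form):
    """Onset: one mark per onsetless syllable, approximated here as a vowel
    that is word-initial or immediately preceded by another vowel."""
    return sum(1 for i, seg in enumerate(form)
               if seg in VOWELS and (i == 0 or form[i - 1] in VOWELS))


def light_copy(rest):
    """Reduplicative light syllable copied from what follows: V or CV."""
    if rest and rest[0] in VOWELS:
        return rest[0]
    if len(rest) >= 2 and rest[1] in VOWELS:
        return rest[:2]
    return None  # no light syllable can be copied at this position


def evaluate(stem, affix_for):
    """Try the affix at every position k and return the optimal candidate.
    Profiles are compared as tuples (Onset, Align-Left), i.e. by strict domination."""
    best = None
    for k in range(len(stem) + 1):
        affix = affix_for(stem[k:])
        if affix is None:
            continue
        form = stem[:k] + affix + stem[k:]
        profile = (onset_violations(form), k)  # k = Align-Left violations
        if best is None or profile < best[0]:
            shown = "-".join(p for p in (stem[:k], affix, stem[k:]) if p)
            best = (profile, shown)
    return best[1]


def tagalog_um(stem):
    return evaluate(stem, lambda rest: "um")


def timugon_murut(stem):
    return evaluate(stem, light_copy)


print([tagalog_um(s) for s in ("sulat", "ʔabot", "gradwet", "preno")])
print([timugon_murut(s) for s in ("bulud", "limo", "ulampoj", "abalan", "ompodon")])
```

The same evaluate function handles both languages; only the affix differs (a fixed um versus a copied light syllable), which is one way to see the claim that the two analyses are abstractly the same.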

7.7 Summary
This brief overview of prosodic morphology has introduced the principal phenomena (reduplication, root-and-pattern morphology, truncation, and infixation) and some of the proposals for how they should be analysed: prosodic templates and constraint interaction.

chapter 8

Sign Language Prosody
Wendy Sandler, Diane Lillo-Martin, Svetlana Dachkovsky, and Ronice Müller de Quadros

8.1 The visible organization of sign languages
The first thing that the naive observer notices when deaf people communicate in sign language is rapid and precise motion of the hands. The second thing that strikes the observer is salient expressions on the face and motions of the head. And indeed, all of these channels are put to critical use in the organization of sign languages. But what is the division of labour between them? In the early years of sign language research, the central goal was to understand the structure of words and the syntax of their arrangement. This meant that attention was on the hands, which convey words (Battison 1978; Bellugi and Klima 1979; Stokoe 1960). Before long, however, researchers began to set their sights beyond the hands and to observe the face. Baker and Padden (1978) observed, for example, that blinks systematically marked phrasal boundaries in American Sign Language (ASL), and Liddell (1978, 1980) showed that systematic patterns of facial expression and head position characterize yes/no and content questions, as well as relative clauses and other complex structures in ASL. Liddell attributed these non-manual patterns to syntax, claiming that syntactic markers in ASL are non-manual. An advantage of this approach was that it was able to show that there are indeed complex sentences in ASL. This explicitly syntactic view of non-manual articulations established a tradition in sign language research (Wilbur and Patschke 1999; Neidle et al. 2000; Cecchetto et al. 2009). Other studies of complex sentences in ASL and in Israeli Sign Language (ISL), such as conditionals and relative clauses, likened the behaviour of facial expression and head position in these sentences to intonation (Reilly et al. 1990; Nespor and Sandler 1999; Wilbur 2000; Sandler and Lillo-Martin 2006; Dachkovsky and Sandler 2009; Dachkovsky et al. 2013). We adopt the intonation position here, because of the linguistic function and formal patterning of such expressions, and support the existence of a prosodic component in the architecture of language structure more broadly. We review research showing that facial expression and head position correspond to intonation within a prosodic system that also includes (and is aligned with) timing and prominence, which in turn are signalled by the hands (Nespor and Sandler 1999). The existence of a prosodic component in the linguistic architecture is supported by evidence for units of timing, prominence, and intonational meaning that cannot be explained on the basis of other levels of structure, such as the lexical or syntactic levels. Like spoken languages, sign languages are characterized by the following hierarchy of prosodic constituents: phonological utterance > intonational phrase > phonological phrase > prosodic word > foot > syllable (Sandler 2010). §8.2.1 sets the stage with a brief overview of the syllable and the prosodic word, and we then ascend to higher levels of prosodic structure: intonational (§8.2.2) and phonological (§8.2.3) phrases. The nature and linguistic identity of non-manual elements (primarily facial expressions that are found at the higher levels of structure) are controversial, and we continue in §8.3 with an overview of the articulations that are at issue, and their coordination with manually conveyed signs. We support the claim that these signals are explicitly intonational, showing that information structure accounts for their occurrence and distribution. A discussion of three main categories of information structure (topic/comment, given/new, and focus/background) and their expression in sign languages follows in §8.4. These sections are based primarily on research on ASL and ISL, two well-studied but unrelated sign languages.
To address the issue of the architecture of the grammar (specifically, to what extent the markers at issue belong to the syntactic or the prosodic component), we turn our attention in §8.5 to yes/no and wh-questions, around which the debate has centred, and include evidence from Brazilian Sign Language (Libras) as well. §8.6 is a summary and conclusion.

8.2 Prosodic constituency in signed languages
8.2.1 The syllable and the prosodic word
A well-formed sign in a sign language must include movement of the hand(s). This movement consists of (1) a movement path from one location to another, (2) hand-internal movement (by change of orientation or of finger position), or (3) both path and internal movement together. Figure 8.1 shows a sign with a complex syllable nucleus of this type. Coulter (1982) was the first to propose that ASL is a monosyllabic language. The idea is that any type of movement constitutes a syllable nucleus, and that the vast majority of signs have only one nucleus, that is, one syllable. Many studies have attributed visual sonority to the movement component of signs (e.g. Sandler 1989, 1993, 1999; Brentari 1990, 1993, 1998; Perlmutter 1991;¹ Wilbur 1993).

1 Perlmutter (1993) proposed that ASL also has a moraic level.


Figure 8.1 The monosyllabic sign SEND in ISL. The dominant hand moves in a path from the chest outward, and the fingers simultaneously change position from closed to open. The two simultaneous movements constitute a complex syllable nucleus.

Sign languages are known to be morphologically complex. How, then, is monosyllabicity preserved under morphological operations such as inflection and compounding? First, morphological complexity in both ASL and ISL is typically nonconcatenative (Sandler 1990), and thus it does not affect the monosyllabic structure of the base sign. Compounding (in which two morphosyntactic words combine to form a new word) is common in sign languages. While novel compounds can occur freely, resulting in disyllabic words (Brentari 1998), lexicalized compounds often reduce to one syllable in ASL (Liddell and Johnson 1986; Sandler 1989) and in ISL (Sandler 2012). Reduplicative processes in ASL copy only the final syllable of compounds (or the only syllable if they are reduced), providing support for the syllable unit in that language (Sandler 1989, 2017). It is not only concatenation of words in compounds that can reduce two morphosyntactic words to a single syllable. Cliticization of pronouns to hosts can do the same. Such phenomena suggest a broader generalization, compatible with Coulter’s insight: the optimal ‘prosodic word’ in sign languages is monosyllabic (Sandler 1999).² In both reduced compounds and cliticized pronouns, two morphosyntactic words constitute a single prosodic word, much like cliticized and contracted forms in spoken languages, such as Sally’s an unusual person, or I’m going to Greece tomorrow. The research sketched above demonstrates that prosodic and morphosyntactic constituents are not isomorphic, and we will see more evidence for this claim below. Our focus in the rest of this chapter is on higher prosodic levels: the intonational phrase (IP) and the phonological phrase (PP).

2 Brentari (1998) does not dispute that most ASL signs are monosyllabic, but proposes that the maximal prosodic word in ASL is disyllabic.


8.2.2 Intonational phrases
Let’s consider the sentence in Figure 8.2, from ISL, ‘The cake that I baked is tasty’.

[Figure 8.2 panels: CAKE, IX, I, BAKE, TASTY]

Figure 8.2 ISL complex sentence, ‘The cake that I baked is tasty’, glossed: [[CAKE IX]PP [I BAKE]PP]IP [[TASTY]PP]IP. ‘IX’ stands for an indexical pointing sign.

The sentence consists of two intonational phrases, indicated by IPs after the relevant brackets in the caption of Figure 8.2. The first IP consists of two phonological phrases, indicated by PPs ([CAKE IX] and [I BAKE]), and the second consists of one phonological phrase, [TASTY]. We will deal with PPs in the next section. As for IPs, according to Nespor and Sandler (1999) and subsequent research on ISL, the final manual sign at the IP boundary is marked in one of three ways: larger sign, longer duration, or repetition. In this particular sentence, the IP-final signs BAKE and TASTY have more repetitions than the citation forms. Non-manual markers are equally important in characterizing IPs. The non-manual markers of facial expression and head position align temporally with the manual markers, and all markers change across the board between IPs. In the sentence in Figure 8.2, the entire first IP is marked by raised eyebrows, squint, and forward head movement.³ The form and functions of these non-manual elements will be further discussed in §8.3 and §8.4.

3 Certain head movements, such as side-to-side headshake for negation, may well directly perform syntactic functions and indicate their scope in some sign languages. See Pfau and Quer (2007) for an in-depth comparison of negation devices in Catalan Sign Language and German Sign Language.


8.2.3 Phonological phrases
Spoken language researchers have found evidence for the existence of prosodic phrases below the level of the IP, called phonological phrases (Nespor and Vogel 2007) or intermediate phrases (Beckman and Pierrehumbert 1986). Two kinds of evidence support the existence of this prosodic constituent in ISL. The first type is articulatory. Manual markers of PP boundaries (increased size or duration, or repetition) are similar to but more subtle than those marking IP boundaries. As for face/head intonation, there can be either a relaxation of non-manual markers at PP boundaries or a partial change. For example, in Figure 8.2, the final sign at the first PP boundary, IX, is repeated once quickly with minimum displacement, and the facial expression (here, eye gaze) and head position are slightly relaxed. The second kind of evidence, found for ISL in the Nespor and Sandler (1999) study, is a phonological process that occurs within but not across prosodic phrases. The non-dominant hand, which enters the signing space in the two-handed sign BAKE, spreads regressively to the left edge of the PP, as shown with circles in Figure 8.2. The shape and location of this hand in the sign BAKE are already present during the preceding, one-handed sign, I.⁴ In a study of a narrative in ASL, Brentari and Crossley (2002) support the co-occurrence of non-manual and manual signals with different prosodic domains, including non-dominant hand-spreading behaviour. The spread of the non-dominant hand in ISL is formally similar to external sandhi processes in spoken language that are also bounded by the PP, such as French liaison, and ‘raddoppiamento sintattico’, or syntactic doubling, in Italian (Nespor and Vogel 2007). In §8.3, we describe the form and function of intonational signals. We then provide a principled information-structure-based analysis of the role of these signals in §8.4.
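The non-dominant hand spreading described above is a phrase-bounded rule, so it can be sketched as a pass over a list of phonological phrases. In the Python sketch below (function and variable names hypothetical), each PP is a list of (gloss, two_handed) pairs, and the non-dominant hand is marked as present on every sign up to and including the last two-handed sign of the same PP; nothing spreads across a PP boundary. For the sentence in Figure 8.2, only BAKE is described as two-handed; treating CAKE, IX, and TASTY as one-handed is an assumption made for illustration (it is consistent with the non-dominant hand being at rest during [CAKE IX] for this signer).

```python
def spread_h2(phrases):
    """For each phonological phrase (a list of (gloss, two_handed) pairs),
    mark the signs during which the non-dominant hand is in signing space
    after regressive spreading: every sign up to and including the last
    two-handed sign of the same PP. Spreading never crosses a PP boundary."""
    result = []
    for pp in phrases:
        last = max((i for i, (_, two_handed) in enumerate(pp) if two_handed),
                   default=-1)
        result.append([(gloss, i <= last) for i, (gloss, _) in enumerate(pp)])
    return result


# Figure 8.2: [[CAKE IX]PP [I BAKE]PP]IP [[TASTY]PP]IP, with the IP level omitted.
# Only BAKE is described as two-handed; the other values are assumptions.
SENTENCE = [
    [("CAKE", False), ("IX", False)],
    [("I", False), ("BAKE", True)],
    [("TASTY", False)],
]
print(spread_h2(SENTENCE))
```

The non-dominant hand reaches back to I but not to IX, because IX belongs to the preceding PP; this boundedness is what makes the process evidence for the PP as a constituent.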

8.3 Defining properties of sign language intonation

The non-manual intonational signals under discussion are not articulatorily similar to those of spoken language; however, they share important structural similarities. This section gives an overview of the similarities and differences between spoken and signed language intonation, in terms of both their structural properties and their meaning and usage. First, as the previous section demonstrated, facial intonation and head movements are temporally aligned with the timing of the words on the hands according to prosodic phrasing. By virtue of this neat alignment, intonational 'tunes' reinforce the prosodic constituency of signed utterances. An informative overview of non-manual signals and discussion of their prosodic and syntactic roles across sign languages is found in Pfau and Quer (2010).

4 The position of the non-dominant hand in the preceding PP [CAKE IX] is the rest position for this signer.

SIGN LANGUAGE PROSODY 109

Second, intonation in both language modalities is compositionally organized. Pierrehumbert and Hirschberg (1990) and Hayes and Lahiri (1991) provide evidence for compositionality in intonational systems, and we find the same in ISL and ASL. Just as the Bengali focus pattern L*HL can combine with the H tone of continuation to produce L*HLH (Hayes and Lahiri 1991), so can non-manual intonational signals co-occur on the same constituents in sign languages. The compositionality claim depends on evidence that each intonational element separately contributes a stable meaning. This raises the challenging issue of intonational meaning. Like linguistic intonation in spoken language, facial expression and head position in sign languages can serve a variety of functions—they can signal illocutionary force (e.g. interrogative and imperative), information status such as topic and comment, and relationships between constituents, such as dependency. Although some languages (such as English and French) also have morphosyntactic markers for some of these functions, many other spoken languages (such as Hebrew and Russian) can express them with intonation alone. This is typically the case with sign languages as well. In ISL, as in many sign languages, raised eyebrows signal yes/no questions as well as continuation between parts of a sentence (Figure 8.3a). Also common across many sign languages is a furrowed brow expression for wh- (content) questions (Figure 8.3b). Squinted eyes signal retrieval of shared information in ISL (Figure 8.3c). In each type of sentence elicited from several signers in ISL, the designated non-manual actions appear reliably in over 90% of the appropriate sentence types (Dachkovsky and Sandler 2009; Dachkovsky et al. 2013), indicating that they are part of the conventionalized linguistic–prosodic system.

Figure 8.3 Linguistic facial expressions for three types of constituent in ISL. (a) Yes/no questions are characterized by raised brows and head forward and down; (b) wh-questions are characterized by furrowed brow and head forward; and (c) squint signals retrieval of information shared between signer and addressee. These linguistic face and head positions are strictly aligned temporally with the signing hands across each prosodic constituent.

However, the realizations of compositionality differ in signed and spoken languages. In the former, the visual modality allows non-manual components to be simultaneously superimposed on one another, rather than sequentially concatenated like tones in spoken language melodies. Whereas Figure 8.3a illustrates the raised brows of yes/no questions and Figure 8.3c shows the squint of shared information retrieval, a question such as 'Did you rent the apartment we saw last week?' is characterized by the raised brows of yes/no questions together with the shared information squint, as shown in Figure 8.4.

Figure 8.4 Simultaneous compositionality of intonation in ISL: raised brows of yes/no questions and squint of shared information, e.g. ‘Did you rent the apartment we saw last week?’.

Another characteristic property of intonation is its intricate interaction with non-linguistic, affective expressions. Although this area is under-investigated in sign languages, here too we find similarities and differences with spoken languages. In spoken languages the intonational expression of emotion (e.g. anger, surprise, happiness) is realized through gradient features of pitch range and register rather than through specific intonational contours (Ladd 1996; Chen et al. 2004a). In contrast, in the visual modality both linguistic and emotional functions are expressed through constellations of the same types of visual signal (facial expressions and head/torso movements) that characterize linguistic intonation (Baker-Shenk 1983; Dachkovsky 2005). This results in different patterns of interaction between the two. For example, Weast (2008) presents the first quantitative analysis of eyebrow actions in a study of six native Deaf participants producing yes/no questions, wh-questions, and statements, each in neutral, happy, sad, surprise, and angry states. The findings demonstrate that ASL maintains linguistic distinctions between questions and statements through eyebrow height regardless of emotional state, as shown in Figure 8.5a, where the emotional expression of disgust co-occurs with the raised brows of yes/no questions. On the other hand, De Vos et al. (2009) show on the basis of Sign Language of the Netherlands (NGT) that conflicting linguistic and emotional brow patterns can be blended (as in Figure 8.5a), or one can supersede the other, as in Figure 8.5b, where the linguistic lowered brows of content questions are superseded by the raised brows of surprise (see similar examples in Libras in Table 8.1). Which factors determine the specific type of interaction between linguistic and emotional displays within a sign language is a question that requires further investigation.


Figure 8.5 Overriding linguistic intonation with affective intonation: (a) yes/no question, 'Did he eat a bug?!' with affective facial expression conveying fear/revulsion, instead of the neutral linguistic yes/no facial expression shown in Figure 8.3a. (b) wh-question, 'Who gave you that Mercedes Benz as a gift?!' Here, affective facial expression conveying amazement overrides the neutral linguistic wh-question expression shown in Figure 8.3b.

8.4 Intonation and information structure

One of the main functions of intonation is the realization of information structure (Gussenhoven 1983b, 2004; Ladd 1996; House 2006; see also chapter 31). Since sign languages are generally 'discourse oriented' (e.g. Friedman 1976; Brennan and Turner 1994), intonational signals of different aspects of information structure play an important role in the signed modality. However, after decades of research, there is still little consensus on the basic terms and categories of information structure, or on how they interact with each other. We rely on the model of information structure presented by Krifka (2008) and Gundel and Fretheim (2008). The primary categories of information structure discussed by these authors are topic/comment, given/new information, and background/focus. Although these notions partially overlap (Vallduví and Engdahl 1996), they are independent of each other. For instance, as exemplified by Krifka (2008) and Féry and Krifka (2008), focus/background and givenness/newness cannot be reduced to just one opposition, because given expressions, such as pronouns, can be focused. Also, we do not claim that the notions of topic, givenness, and focus exhaust all that there is to say about information structure; other effects connected to information flow relate to broader discourse structure (Chafe 1994). Here we will stay within the confines of the sentence (in a particular context) and illustrate some of the ways in which the information structure notions specified above are expressed intonationally in sign languages.


8.4.1 Topic/comment

One category of information structure is the opposition between topic and comment. This opposition involves a partition of the semantic/conceptual representation of a sentence into two complementary parts: identification of the topic, and provision of information about it in a comment (Gundel and Fretheim 2008; Krifka 2008). A common assumption is that particular accent types or intonational patterns mark utterance topics in spoken languages (e.g. Jackendoff 1972; Ladd 1980; Gussenhoven 1983b; Vallduví and Engdahl 1996). However, there are very few empirical studies on the issue, and the results that have been offered involve different languages (e.g. English in Hedberg and Sosa 2008, and German in Braun 2006), making it difficult to arrive at generalizations. The variability of topic intonation might reflect the fact that topics vary on syntactic, semantic, and pragmatic grounds. In the following discussion, 'topic' refers to sentence topic, and not necessarily to topicalization or movement of a constituent. The visual nature of non-manual intonational signals and their articulatory independence from one another can be an advantage for researchers by providing a clearer form–meaning mapping of the intonational components. Several studies have demonstrated that topic–comment is a common organizing principle for sentences in sign languages (Fischer 1975 and Aarons 1994 for ASL; Crasborn et al. 2009 for NGT; Kimmelman 2012 for Russian Sign Language; Rosenstein 2001 for ISL; Sze 2009 for Hong Kong Sign Language), and that topics are typically marked with particular facial expressions and head positions, which, we argue, are comparable to prosodic marking in spoken languages for this information structuring role. Kimmelman and Pfau (2016) present a comprehensive overview of information structure and of the topic/comment distinction in sign languages.
Early work on topic marking in ASL identified specific cues for topics (Fischer 1975; Friedman 1976; Ingram 1978; Liddell 1980), such as raised brows and raised/retracted head position. Topic–comment constructions, set off by specific and systematic non-manual marking, have also been reported in many other sign languages, such as Swedish Sign Language (Bergman 1984), British Sign Language (Deuchar 1983), Danish Sign Language (Engberg-Pedersen 1990), Finnish Sign Language (Jantunen 2007), NGT (Coerts 1992; Crasborn et al. 2009), and ISL (Meir and Sandler 2008). The most frequent non-manual marker of topics reported cross-linguistically is raised eyebrows. However, a comparative study of prosody in ASL and ISL, two unrelated sign languages, demonstrates language-particular differences in the marking and systematicity of information structure, as one would expect in natural languages (Dachkovsky et al. 2013). Comparison of the same sentences in the two languages across several signers revealed that topics are marked differently in ISL and ASL. ISL topics are marked by head movement that starts in a neutral or slightly head-up position and gradually moves forward, and often by squint (shared information) (see Figure 8.6a). In ASL, a static head-up position, together with raised brows (Figure 8.6b), usually marks the scope of the entire topic, as Liddell (1980) originally claimed. These findings highlight the importance of head position as an intonational component in sign language grammar, and show that topics are characterized by different head positions that can vary from language to language. They demonstrate that information structure is systematically marked linguistically in sign languages generally, but that the specific marking is language-specific and not universal, as is also the case in spoken languages.


Figure 8.6 Intonational marking of topics in (a) ISL and (b) ASL.

Similarly, this research demonstrated that the facial cues accompanying topics in both languages are also variable, within and across sign languages. Variability in topic marking has surfaced in more recent studies on other sign languages as well (e.g. Hong Kong Sign Language; Sze 2009). Sze found that topics in Hong Kong Sign Language are not marked consistently by one particular non-manual cue, but rather by a variety of signals, and sometimes even by manual prosodic cues alone. The reason for this variability might lie in the very nature of topics as discourse entities: they can be linked to the preceding discourse in various ways and may include different types of information—that is, information that is more accessible or less accessible to the interlocutors (e.g. Krifka 2008). This brings us to another dimension of information structure—the given/new distinction.

8.4.2 Given/new information

Referential givenness/newness is defined as indicating whether, and to what degree, the denotation of an expression is present in the interlocutors' common ground (Gundel and Fretheim 2008: 176; Krifka 2008). On the assumption that different types of mental effort or 'cost' are involved in the processing of referents, the information structure literature distinguishes a scale of information states, ranging from active (or given/highly accessible) to inactive (or new/inaccessible) (Chafe 1974; Ariel 1991; Lambrecht 1996). The general pattern that has emerged from the spoken language literature is that referents with a lower degree of activation, or accessibility, tend to be encoded with greater intonational prominence and/or with particular accent types, although these patterns very much depend on the language (e.g. Baumann and Grice 2006; Chen et al. 2007; Umbach 2001). The given/new category has received much less attention in the sign language literature than the topic/comment category. Engberg-Pedersen (1990) demonstrated that squint in Danish Sign Language serves as an instruction to the addressee to retrieve information that is not given in the discourse, and might be accessible from prior knowledge or shared background. On the basis of a fine-grained analysis, Dachkovsky (2005) and Dachkovsky and Sandler (2009) argued for a comparable function of squint in ISL. Similar conclusions related to the function of squint as a marker of information with low accessibility have been reported for German Sign Language (Herrmann and Steinbach 2013; Herrmann 2015).

The study by Dachkovsky et al. (2013) mentioned earlier investigates the interaction between two categories of information structure—given/new and topics—in ISL and ASL. It demonstrates essential differences with regard to the intonational marking of low referent accessibility in topics in the two languages. First of all, coding of non-manual signals using the Facial Action Coding System (Ekman et al. 2002) reveals a cross-linguistic phonetic difference in the articulation of the low-accessibility signal, squint. To be precise, 'squint' is achieved by different muscle actions, or action units (AUs), in the two sign languages, in both cases serving to narrow the eye opening and to emphasize the 'nasolabial triangle' between nose and mouth. While in ISL the effect is produced by lower-lid tightening (AU7 in Figure 8.7) and deepening of the nasolabial furrow (AU11), in ASL a similar appearance is achieved by raising the cheeks (AU6) and raising the upper lip (AU10), as shown in Figure 8.7.

Figure 8.7 Different phonetic realizations of the low accessibility marker, squint, in (a) ISL and (b) ASL.

The differences between ISL and ASL information status marking pertain not only to formal properties but also to the functional properties of information status. Dachkovsky et al. (2013) suggest that the ISL intonational system is more sensitive than that of ASL to the accessibility status of a constituent. Specifically, ASL tends to reserve squint for topic constituents with very low accessibility only, according to the motivated accessibility ranking of Ariel (1991), high > mid > low. Most other topics are marked by raised brows. In ISL, on the other hand, the occurrence of squint in topics is broader—it systematically co-occurs with mid- as well as low-accessibility topic constituents. These findings show that syntax does not determine the non-manual marking of information structure. Specifically, the presence of squint in ISL and ASL topics is related to different degrees of sensitivity to pragmatic considerations—the degree of accessibility—regardless of their syntactic role (i.e. whether they are adverbial phrases, object noun phrases, or subject noun phrases).


8.4.3 Focus/background

The third crucial category in the organization of information is focus/background. A focused part of an utterance is usually defined through the presence of alternatives that are relevant for the interpretation of the utterance (e.g. Rooth 1992; Krifka 2008). Information focus can be conceived of as the part of a sentence that answers a content question. Here we can also observe how the focus/background distinction interacts with other information structure categories, such as the given/new distinction. Specifically, contrastive and emphatic foci are used to negate or affirm information that was previously given (mentioned) in the discourse. In spoken languages, a focused constituent often receives prosodic prominence and particular types of accent (Ladd 1996; Gussenhoven 2004). One study that investigated prosodic distinctions between different types of foci was conducted on NGT (Crasborn et al. 2009). The authors find that focused constituents are generally characterized by a range of non-manual signals, none of which is exclusive or obligatory. The markers of focus include intense gaze at the addressee, intensified mouthing, and enhanced manual characteristics of signs, such as size and duration. The study demonstrates that, although the prosodic marking of focus is sensitive to various semantic–pragmatic considerations and syntactic factors, the relation between these and prosody is not systematic. Kimmelman's (2014) study demonstrates that Russian Sign Language is very different from NGT: it hardly employs any non-manuals to mark either type of focus. The most common signals of focus in Russian Sign Language are manual cues, such as holds or size and speed modifications of a sign, along with some syntactic strategies, such as doubling and ellipsis. In German Sign Language, only contrastive focus seems to be systematically marked. Distinguishing information focus (i.e. new information) from contrastive focus, Waleschkowski (2009) and Herrmann (2015) demonstrate that, whereas the marking of information focus is not obligatory in German Sign Language, contrastive focus is consistently marked by specific manual and non-manual marking (focus particles)—mostly by head nods. ASL seems to have more regular non-manual patterns characterizing different types of focus, with contrastive foci being distinguished by opposite body leans (Wilbur and Patschke 1998). Schlenker et al. (2016; see also Gökgöz et al. 2016) complement Wilbur and Patschke's (1998) findings on ASL by comparing them to a similar data set in LSF (French Sign Language), and show that prosodic modulations of signs and non-manuals (brow raise and head movements) suffice to convey focus, with diverse semantic effects, ranging from contrastive to exhaustive, as in spoken language. Distinct non-manual profiles were also observed for focus in Libras in a comparative study with ASL. Lillo-Martin and Quadros (2008) discuss three types of focus: information (non-contrastive) focus, contrastive focus, and emphatic focus (which, like contrastive focus, can be used to negate previous information, but unlike contrastive focus can also be used to affirm; Zubizarreta 1998). They report that different non-manual markers are associated with information focus (raised brows and head tilted back) and contrastive focus (brows that are furrowed with their inner parts raised, and head tilted to the side), as illustrated in Figure 8.8.


Figure 8.8 Non-manual markers in Libras accompanying (a) information focus (raised brows and head tilted back) and (b) contrastive focus (raised and furrowed brows, and head tilted to the side).

As Lillo-Martin and Quadros (2008) report, both information focus and contrastive focus can occur with elements that are at the beginning of a sentence or in their sentence-internal position, as illustrated in examples (1) (information focus) and (2) (contrastive focus).5 These examples are grammatical in both ASL and Libras. The fact that the same non-manual marking is associated with elements in different syntactic positions in (1) can be seen as evidence in favour of the view that this marking is determined by pragmatic rather than strictly syntactic factors.

(1) Information focus
         _____________wh
    S1:  WHAT YOU READ
         'What did you read?'
         ____________I-focus
    S2:  BOOK CHOMSKY I READ
                ____________I-focus
    S2:  I READ BOOK CHOMSKY
         'I read Chomsky's book.'

(2) Contrastive focus
         ____________________y/n
    S1:  YOU READ STOKOE BOOK
         'Did you read Stokoe's book?'
             ____________C-focus
    S2:  NO, BOOK CHOMSKY I READ
         'No, I read Chomsky's book.'

5 Throughout this chapter, signs are indicated by translation-equivalent glosses in upper case. Here, PU indicates a sign made with palm up; this sign is used in ASL for general wh-questions as well as other functions. For a recent review and analysis of 'palm up' in gesture and in sign languages, see Cooperrider et al. (2018) and references cited therein. An abbreviation on a line above glosses indicates a non-manual marker co-occurring with the indicated signs, as follows:
wh: wh-question
y/n: yes/no question
I-focus: information focus
C-focus: contrastive focus
hn: head nod

Unlike information focus and contrastive focus, in Libras and ASL emphatic focus is not associated with a particular type of non-manual marking but with a stressed version of the marking that would go with the non-emphatic reading of the sentence. Emphatic elements are generally expressed in the sentence-final position or in doubling constructions, as illustrated in (3), which again represents sentences that are acceptable in both languages, where the default position of modals such as CAN is pre-verbal.

(3)

                      ___hn
    a.  JOHN CAN READ CAN
                      ___hn
    b.  JOHN CAN READ CAN
        'John really CAN read.'

Even though the examples in (3) show isomorphism between the prosody and the syntactic structure, it is possible for this relationship to be broken. For example, in Libras, the head nod accompanying the emphatic focus element can extend beyond the end of the manual sentence, as shown in (4).

                  _________hn
(4) JOHN CAN READ CAN

In sum, we have seen that information structure is signalled by facial expression and head position in sign languages, performing much the same role as intonation in spoken languages. Along all familiar parameters of information structure—topic/comment, given/new, and focus/background—the literature shows non-trivial similarities between the distribution of these non-manual articulations in sign language and intonation in spoken language. While these information structure categories also often have syntactic characteristics, the relationship between prosody and syntax is indirect. Nevertheless, there is disagreement on this issue, with some researchers taking the position that familiar non-manual markers are associated directly with the syntactic component of the grammar, and, as such, reveal underlying syntactic structure. We now address the dispute regarding whether observed non-manual markers are best understood as more closely related to the prosodic or the syntactic component of the grammar by turning our attention to illocutionary force— specifically, to interrogatives. In spoken languages, questions may be syntactically marked, and they are also apparently universally marked by intonation. What is the case in sign languages?

8.5 Prosody versus syntax: evidence from wh-questions

§8.3 showed that a particular set of non-manual markers accompanies questions in ISL. In fact, these markers are found in many sign languages, including ASL and Libras. One of the earliest descriptions of these markers, by Liddell (1980), proposed that the spread of the non-manual marker is determined by the structural configuration known as 'command' (a precursor to the contemporary notion of c-command), as illustrated by the tree in (5).

(5)      S
        / \
       A   Y
           |
           B

If syntactic elements are located in a tree structure such as the one in (5), element A commands element B if the S node dominating A also dominates B (Y indicates a node between A and B). Liddell's argument is that the non-manual markers are represented in tree structures and that their spread is determined by syntactic relationships. The analysis of non-manual marking for questions as determined by syntactic structure has been maintained by numerous scholars, including Petronio and Lillo-Martin (1997) and Neidle et al. (2000) for ASL, and Cecchetto et al. (2009) for LIS (Italian Sign Language). These researchers have used the spread of non-manuals, especially those associated with wh-questions, to help determine the syntactic structure involved. Cecchetto et al. furthermore claimed that in Italian Sign Language, because wh-question scope is marked by non-manuals, the default linear order of syntactic constituents typically found in spoken languages can be overridden, a possibility specific to sign languages. However, as we will now show, the assumed relationship between non-manual marking of questions and syntactic structures can be violated in numerous ways (see Sandler and Lillo-Martin 2006 for evidence from ISL). This lack of a direct correspondence calls into question a syntactic analysis of non-manual marking and, even more so, the use of the spread of non-manual marking to make inferences about syntactic structures (see Sandler 2010). Instead, in line with the conclusion put forward in the previous section, we take this common non-isomorphism to show that (at least in some sign languages) question non-manual marking behaves like intonation. It conveys pragmatic (illocutionary) information whose spread is determined by semantic–pragmatic organization. The constituents organized in this way are often correlated with syntactic phrasing and categories, but in many cases they are not (see Nespor and Sandler 1999; Nespor and Vogel 2007).
For an approach that aims to reconcile syntactic distribution with prosodic spreading of non-manuals in numerous structures, see Pfau (2016). In the following, the 'wh' label above text corresponds to the standard non-manual marker of wh-questions in ASL. In ASL, wh-questions are characterized by furrowed brow and forward head position, as illustrated in Table 8.1a (Liddell 1980). In a study that elicited wh-questions from six ASL signers, this facial expression and head position characterized 100% and 65% of wh-questions respectively (Dachkovsky et al. 2013), and they are thus a systematic part of the linguistic system. The first case to consider is indirect questions. If the spread of the wh-question non-manual marker is determined by the scope of the [+wh] element, as is often assumed, we should see the pattern illustrated in (6).6 This pattern should be found regardless of the matrix verb.7

        _______________________________________wh
(6) a.  [Matrix wh-question [embedded clause] ]
                       ______________________wh
    b.  [Matrix clause [embedded wh-question] ]

However, the pattern in (6) does not hold. If the matrix verb is ASK, a pattern like this might be seen, as shown in (7a). However, if the matrix verb is WONDER, a puzzled expression, like that in Figure 8.8b, might appear across the whole sentence, as in (7b). Furthermore, if the matrix verb is KNOW, no furrowed-brow wh-question expression is observed; instead, there might be a head nod, as in (7c).

                 ___________________wh
(7) a.  IX_1 ASK [ WHERE JOHN LIVE ]
        ______________________________wh
    b.  IX_1 WONDER [ WHY JOHN LEAVE ]
        ______________________hn
    c.  IX_1 KNOW [ HOW SWIM ]

The distribution of wh-question non-manual markers observed in (7) is unexpected on an account by which the spread of this marker is determined by the syntactic scope of a wh-phrase. On the other hand, if the marker indicates a direct question to which a response is expected, the lack of any such marking in (7c) is completely expected. Furthermore, its presence in (7a) can be accounted for by assuming that this example represents a (quoted) direct question. In (7b), the expression has different characteristics: it is a puzzled expression, with eye gaze up rather than towards the addressee; this expression co-occurs with the matrix clause because the whole sentence expresses puzzlement. One could maintain that the pattern of non-manual marking in indirect questions is not inconsistent with a syntactic account of its distribution, only that the syntactic account must be more nuanced. However, even matrix wh-questions might not display typical wh-question non-manuals in various contexts. For example, a signer can produce a wh-question with affective facial expressions that completely replace the typical wh-question non-manuals, as illustrated in row c of Table 8.1 and Figure 8.5b, similar to the example from ISL in Figure 8.3b above. It is also possible for a non-question (i.e. without morphosyntactic question markers) to be used to seek information (row d of Table 8.1), in which case it will use a non-manual marker with the same or similar properties as the typical wh-question non-manual marking, or for a non-wh-question to indicate puzzlement using a facial expression with a brow furrow, like standard wh-questions (row e of Table 8.1 and Figure 8.5a). All of these examples can be straightforwardly compared to intonational patterns in spoken languages.

6 Of course, syntactic accounts could make different predictions, such as the one proposed by Petronio and Lillo-Martin (1997). Here we simply address the most common syntactic proposal associating the non-manual marking with the [+wh] feature.

7 At this point we are abstracting away from questions about the position of the wh-phrase in both matrix and indirect questions. Because wh-phrases are frequently found in sentence-final position in ASL, some scholars (e.g. Neidle et al. 2000) have taken that to be the unmarked position (or the position of the specifier of CP, the highest 'root' sentence node). The grammaticality of some sentence-initial wh-phrases is disputed by these authors. However, for adjunct wh-phrases in indirect questions, the clause-initial position is generally claimed to be acceptable, so we use clause-initial adjuncts in the examples in (7). As far as we know, the same pattern is found no matter the position of the wh-phrase.


Table 8.1 Non-manual marking used in different contexts in ASL and Libras (columns: Context, ASL, Libras)

a. Direct wh-questions
b. Indirect question (not all types have the same marking)
c. Wh-question with affect
d. Non-question requesting information
e. Non-question with puzzled affect

In Libras, while the distribution of non-manual markings is similar to that described for ASL, the actual facial expressions are different, as shown in Table 8.1. In row a, typical wh-questions are shown, with brows that are furrowed in both sign languages, and, in Libras, with the inner parts of the brows somewhat raised and the head tilted back. Row b illustrates two types of marking used with indirect questions, in different contexts in the two languages. Row c illustrates that wh-questions can have affective overlays such as playful doubt (ASL) or exasperation (Libras). Row d shows that sentences without a manual (morphosyntactic) wh-element can be produced with the brow expressions typically used in wh-questions, in order to seek information.⁸ Such examples suggest that it is the information-seeking function that determines the facial intonation, and not the syntactic structure. Similarly, row e represents an ASL non-question with furrowed brows and head tilt, conveying puzzlement. Further research is needed to determine the degree of systematicity, additional factors determining the distribution, and associated pragmatic roles of the intonational displays shown in rows b–e of Table 8.1 and in example 8.3 above, from ISL. Our point is simply that these displays, like the systematic linguistic marking of standard wh-questions, are intonational.

Taken together, these observations reveal that the non-manual marking typically associated with questions serves pragmatic functions, seeking a response or (in the case of certain yes/no questions) confirmation. As such, this marking appears on questions only when they have this pragmatic function, and it appears even on declaratives that have such a function. Furthermore, the spread of the non-manuals is often consistent with syntactic structure: for example, a question or a declarative seeking a response might have some indication of the non-manual marking throughout the whole clause. However, we note that the syntactic constituent is often, but not always, isomorphic with the prosodic constituent. The various components of the non-manual marking (brows, eyelids, head position, eye gaze) can change over the course of the sentence, and may be most intense at the end of the signer’s turn, when the interlocutor’s response is expected (an observation that is given a syntactic account by Neidle et al. 2000 and others). In addition, other factors can interrupt the flow of the wh-question marker. For example, in an alternative question, the list intonation interrupts the production of the furrowed-brow wh-question marker, as first observed for ISL by Meir and Sandler (2008) and illustrated in ASL in Figure 8.9.

Figure 8.9 ASL alternative question, glossed: PU FLAVOUR CHOCOLATE VANILLA OR LAYER OR PU, translated roughly as ‘What flavour do you want, chocolate, vanilla or layer?’.

⁸ We note that furrowed brows characterize wh-questions in ISL, ASL, and Libras, as in other sign languages, such as British Sign Language (Woll 1981) and Sign Language of the Netherlands (Coerts 1992; Zeshan 2004). This suggests that this aspect of facial expression may have a general non-linguistic source that is conventionalized in sign languages.

It remains to be seen whether the patterns observed for ASL, Libras, and ISL are replicated in other sign languages. There is extensive discussion of the patterns of wh-questions found in different sign languages in Zeshan (2006), although the studies collected there do not address the central question here, which is whether the non-manual marking typically associated with questions represents an essentially prosodic phenomenon or a syntactic one. However, any analysis of the structure of questions in sign languages should take into consideration the evidence that non-manual marking behaves as an intonational component, whose distribution and scope are determined more by pragmatic than by syntactic factors (Selkirk 1984; Nespor and Vogel 2007; Sandler and Lillo-Martin 2006; Sandler 2010).

8.6 Summary and conclusion

The systematic use of facial expression and head position in sign languages, and their alignment with manual cues of prominence and timing, offer a unique contribution to linguistic theory by literally making the intricate interaction of intonation with pragmatics and syntax clearly visible. In sign languages, both the occurrence and the scope of manual and non-manual signals, as well as their coordinated interaction, are there for all to see. Nevertheless, there are disputes in this relatively new field of inquiry. We have pointed out that one of the characteristics of prosody, including intonation, is variation (due to semantic, pragmatic, and other factors, such as rate of speech/signing), and we encourage future researchers to engage in controlled and quantified studies across a number of signers in order to document and account for the data. We hope that future research, conducted at ever finer resolutions across signers and sign languages, will further illuminate the nature of the system, allowing us to arrive at a detailed model of prosody in sign languages and of its interaction with other linguistic components. The evidence we have presented shows that the distribution and behaviour of these signals correspond closely to those of intonation and prosodic phrasing in spoken languages, suggesting that an independent prosodic component is a universal property of language, regardless of physical modality.

Acknowledgements

Portions of the research reported here have received funding from the European Research Council under the European Union’s Seventh Framework Programme, grant agreement No. 340140 (Principal Investigator: WS); the Israel Science Foundation, grants 553/04 (PI: WS) and 580/09 (PIs: WS and Irit Meir); the U.S. National Institutes of Health, NIDCD grants #DC00183 and #DC009263 (Principal Investigator: DLM); and the Brazilian National Council for Research (CNPq), grants #200031/2009-0 and #470111/2007-0 (Principal Investigator: RMQ).

PART III

PROSODY IN SPEECH PRODUCTION

Chapter 9

Phonetic Variation in Tone and Intonation Systems

Jonathan Barnes, Hansjörg Mixdorff, and Oliver Niebuhr

9.1 Introduction

In both tonal and segmental phonology, patterned variability in the realization of abstract sound categories is a classic object of empirical description as well as a long-standing target of theoretical inquiry. Among prosody researchers in particular, focus on this critical aspect of the phonetics–phonology interface has been constant and intensive, even during periods when intellectual contact between segmental phonologists and their phonetician counterparts notably ebbed. In this chapter, we review the most commonly cited phenomena affecting the phonetic realization of both lexical and intonational tone patterns. The chapter’s title purposefully invokes a broad range of interpretations, and we aim, if not for exhaustivity, then at least for inclusivity of coverage within that range. Many of the phenomena we investigate here are examples of non-distinctive, within-category variability: realizational elasticity, in other words, within some phonetic dimension, in spite of or unrelated to the phonological contrasts being expressed. At the same time, however, we mean equally to review the ways in which particular phonetic dimensions of the signal may be modulated in support of the expression of contrasts (in the manner of Kingston and Diehl 1994). The thread that unifies it all is a broad concern with how tone and intonation patterns are implemented phonetically. Throughout what follows, we return repeatedly to several issues we view as central to the development of the field. One such focus is on phonetic motivation for phonological patterns, with emphasis on both perception and production evidence. We also touch on cross-language variation in the distribution and implementation of the phenomena reviewed. Do they appear in both tone and intonation systems? Do they have both gradient and categorical manifestations, and, if so, how are these connected?
Lastly, we urge researchers to consider the potential interaction of all the phenomena under discussion.

The emergence of higher-level regularities, such as those involving enhancement relations or perceptual cue integration, is only now being explored by prosody researchers in the way that it has been for decades in the study of segmental contrasts such as voicing or consonant place. This line of work holds great promise for the future of the field. We begin with a discussion of coarticulation patterns among adjacent tonal targets (§9.2), then turn to a more general consideration of patterns of tonal timing (§9.3) and of f0 scaling (§9.4). §9.5 reviews how aspects of global contour shape contribute to the realization and recognition of tone contrasts, while §9.6 does the same for non-f0 factors such as voice quality. A brief conclusion is offered in §9.7.

9.2 Tonal coarticulation

Contrasting pitch patterns in phonological inventories are commonly described in terms of their canonical static or dynamic f0 shapes when uttered in isolation. However, like their segmental counterparts, such patterns rarely occur entirely on their own. As the larynx strives to attain multiple sequenced targets in finite time, adjacent targets may influence one another in a manner at least partially reducible to physiological constraints on tone production. At the same time, this coarticulation is tempered by the need to maintain contrasts within the system, and thus may take on diverse shapes across languages (DiCanio 2014). Much of the literature on tonal coarticulation focuses on lexical tone languages in East and South East Asia, with detailed descriptions for Standard Chinese (Ho 1976; Shih 1988; Shen 1990; Xu 1994, 1997, 1999, 2001), Cantonese (Gu and Lee 2009), Taiwanese (Cheng 1968; Lin 1988b; Peng 1997), Vietnamese (Han and Kim 1974; Brunelle 2003, 2009a), and Thai (Abramson 1979; Gandour et al. 1992a, 1992b, 1994; Potisuk et al. 1997). More recently, studies have also been devoted to African tone languages (e.g. Myers 2003 on Kinyarwanda; Connell and Ladd 1990, Laniran 1992, and others on Yoruba) and to Central and South American tone languages (e.g. DiCanio 2014 on Triqui). Laboratory studies of tonal coarticulation typically involve elicitation of specific tone sequences, focusing on effects of preceding and following context on the realization of each tone in terms of both f0 scaling and alignment with the segmental string. Focus patterns and speech rate are also commonly varied. Figures 9.1 and 9.2, from Xu’s (2001) investigation of tonal coarticulation in Mandarin, are representative. In Figure 9.1, we see a high (H, top panel) and a rising (R, bottom panel) tone, each preceded by low (L), high (H), rising (R), and falling (F) tones.
The f0 contour in the second syllable is clearly influenced by the preceding tone, and the ultimate target is often only approximated towards the syllable’s end, yielding ‘carry-over’ or perseveratory coarticulation. By contrast, Figure 9.2 displays Mandarin high and rising tones in the first syllable followed by four different tone types in the second. When the second syllable bears a Low, f0 trajectories in the first syllable are higher than before the other tone types. The second-syllable Low target thus influences preceding Highs in a pattern of ‘anticipatory’, dissimilative coarticulation that has been called ‘pre-Low raising’ (see below). A comparison of Figures 9.1 and 9.2 also reveals that, in Mandarin, carry-over coarticulation appears to be more dramatic than anticipatory coarticulation.
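The carry-over pattern in Figure 9.1 can be illustrated with a minimal sketch. It assumes, for illustration only, that f0 within a syllable approaches a static tonal target exponentially, starting from wherever the preceding tone left off. The first-order approach, the rate constant, and all Hz values are hypothetical simplifications rather than Xu’s (2001) data or model; the point is only that the second syllable inherits the preceding offset and approximates its own target towards the syllable’s end.

```python
import math


def approach_target(onset_hz, target_hz, duration_s, rate_per_s, n_points=5):
    """Sample f0 across one syllable as an exponential approach to a static target.

    onset_hz   -- f0 inherited from the preceding tone's offset (carry-over context)
    target_hz  -- the current tone's level target
    rate_per_s -- hypothetical rate constant; larger values reach the target sooner
    """
    samples = []
    for i in range(n_points):
        t = duration_s * i / (n_points - 1)
        # The remaining distance to the target shrinks exponentially with time.
        f0 = target_hz + (onset_hz - target_hz) * math.exp(-rate_per_s * t)
        samples.append(round(f0, 1))
    return samples


# A High target (140 Hz) after a Low offset versus after a High offset (values hypothetical).
after_low = approach_target(onset_hz=100.0, target_hz=140.0, duration_s=0.2, rate_per_s=20.0)
after_high = approach_target(onset_hz=140.0, target_hz=140.0, duration_s=0.2, rate_per_s=20.0)
print(after_low)   # [100.0, 125.3, 134.6, 138.0, 139.3]: target only approximated at the end
print(after_high)  # [140.0, 140.0, 140.0, 140.0, 140.0]: already at target throughout
```

In this toy setting, a slower approach (smaller `rate_per_s`) or a shorter syllable leaves the syllable offset further from the target, which corresponds to a stronger perseveratory effect.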

[Figure 9.1, panels (a) and (b): F0 (Hz) over normalized time across the syllables ma ma, with contours labelled by tone (H, R, L, F)]

Figure 9.1 Carry-over coarticulation in Mandarin Chinese. See text for an explanation of the abbreviations. (Xu 2001)

Nearly all studies of individual languages document tonal coarticulation in both directions. Perhaps the strongest cross-linguistic generalization, however, is that both the magnitude and the duration of perseveratory coarticulation regularly and substantially exceed those of anticipatory coarticulation (though Brunelle 2003, 2009a finds that for both Northern and Southern Vietnamese, anticipatory coarticulation, though weaker in magnitude, is longer lasting). One common reflection of this general tendency is the perseveratory pattern known as peak delay, whereby a high target, particularly in fast speech or on weaker or shorter syllables, tends to reach its maximum during a following syllable rather than during its phonological host. The reason for the directional asymmetry is not immediately apparent, though see Flemming (2011) for a candidate explanation based on matching of tonal targets to regions of high segmental sonority. The pattern, in any case, is apparently not universal: in Kinyarwanda (Myers 2003), tonal coarticulation is primarily anticipatory (see also Chen et al. 2018). Perseveratory coarticulation is in all known cases assimilative in nature. Anticipatory coarticulation, however, may be either assimilative or dissimilative, depending on the language or even on specific tones in a single language. Vietnamese has assimilative anticipatory coarticulation, while most studies on Thai have reported dissimilative anticipatory coarticulation, especially as Low tones affect high offsets of preceding targets (compare the Mandarin pattern in Figure 9.2).

[Figure 9.2, panels (a) and (b): F0 (Hz) over normalized time across the syllables ma ma, with contours labelled by tone (H, R, L, F)]

Figure 9.2 Anticipatory coarticulation in Mandarin Chinese appears dissimilatory. See text for an explanation of the abbreviations. (Xu 2001)

For Taiwanese, Cheng (1968) and Peng (1997) find anticipatory effects to be primarily assimilative, but Peng (1997) also notes a dissimilative effect whereby high-level and mid-level tones are higher when the following tone has a low onset. For Standard Chinese, dissimilative raising of preceding high targets before a low has been noted by multiple studies (Shih 1986; Shen 1990; Xu 1994, 1997), but Shih (1986) and Shen (1990) also report assimilative tendencies in anticipatory coarticulation, such as raising of tone offsets before following high-onset tones (Shen 1990). A common theme, however, is that Low tones more often cause dissimilation of preceding Highs than Highs do of preceding Lows. This phenomenon, often called high tone (or pre-low) raising, has been taken to represent a form of syntagmatic contrast enhancement.¹ In West African languages in particular, it is often mentioned in the context of the implementation of downstep, where its local effect would be to maximize the distinction between a phonological Low tone and a downstepped High, while a global effect might be the prophylactic expansion of the pitch range, to avoid the endangerment of tone contrasts under later downstep-driven compression (Laniran and Clements 2003). While primarily observed in lexical tone systems, high tone raising has also been reported in certain intonation systems (Féry and Kügler 2008; Kügler and Féry 2017).

¹ A connection might thus be drawn between high tone raising and ‘low tone dipping’, discussed in §9.5.2. In both instances, low tone targets appear to be phonetically enhanced, by either the addition or the exaggeration of adjacent high targets.

Tonal coarticulation should be distinguished from potentially related processes known as tone sandhi (see chapter 22) and from other phonological processes such as tone shift or tone spreading. Coarticulation in the clearest cases is assumed not to alter the phonological status of affected tones, but only to shape their acoustic realization. Likewise, tonal coarticulation is typically gradient and may vary across speech rates and styles, whereas sandhi is categorical, ideally independent of rate or style, and may result in the neutralization of underlying tone contrasts. There are, however, resemblances between the coarticulatory patterns noted above and certain common phonological tone patterns. Tone spreading, for example, usually described as assimilation affecting all or part of a target tone (Manfredi 1993), is extremely common across languages, and proceeds overwhelmingly from left to right (Hyman and Schuh 1974; Hyman 2007). In Yoruba, for instance, High tones spread to following syllables with Lows, creating a falling contour on the host syllable; Low tones similarly spread right to syllables with Highs, creating surface rises (Schuh 1978). Hyman (2007) notes the connection between the two patterns and suggests that tonal coarticulation is in fact a phonetic precursor to many commonly phonologized patterns in tone systems. Downstep of High tones following a Low in terraced-level tone systems is likewise often considered part of categorical phonology, to the extent that it involves lowering the ceiling of the pitch range for all further Highs in a domain in a manner that is independent of speech rate or effort. Such systems may, however, have phonetic roots in gradient perseveratory coarticulation.
At the same time, the distinction between phonetic and phonological patterns is not always clear. The Yoruba spreading pattern is typically treated as phonological, but in many analogous cases there is little evidence to decide the matter. In Standard Chinese trisyllables, Chao (1968) finds a perseveratory effect of an initial high or rising tone on a following rise, which apparently causes the rise to neutralize with underlying high tones, at least in fast speech. If the process is indeed neutralizing, this sounds like typical tone sandhi. Its rate dependence, by contrast, sounds phonetic and suggests coarticulation.

9.3 Timing of pitch movements

With the rise of autosegmental phonology (Leben 1973; Goldsmith 1976a; Pierrehumbert 1980; see also chapter 5), investigation of tonal implementation in terms of specified contour shapes was largely replaced by a focus on the locations in time and f0 space of phonetic tone-level targets thought to be projected by tonal autosegments (H, L, etc.) in the phonology. These targets, in turn, were commonly equated with observable turning points (maxima, minima, ‘peaks’, ‘valleys’, ‘elbows’) in the f0 contour. In this section, we review the literature on phenomena affecting the timing of tonal targets (though, in practice, many of these patterns involve target scaling as well).

The timing of phonetic tonal targets, relative either to segmental elements or to other prosodic structures, is called ‘tonal alignment’ (cf. ‘tonal association’, a phonological relation rather than a physical one). A key finding in the alignment literature is segmental anchoring (Arvaniti and Ladd 1995; Arvaniti et al. 1998; Ladd et al. 1999), the observation that when the duration or number of segments spanned by tonal movements changes, movement shapes (e.g. durations or slopes) vary correspondingly (Figure 9.3). By contrast, the temporal relationship between pitch movement onsets/offsets and select segmental landmarks remains relatively constant. This finding comfortably echoes the ‘target-and-interpolation’ approach to tonal implementation suggested by Goldsmith (1976a: §3.2) in his dissertation, where f0 contours move from target to target, each with specified timing and scaling. Between targets, f0 interpolates along the shortest or articulatorily cheapest path. Distinctive variation in ‘underspecified’ regions is not predicted.

[Figure 9.3: two schematic CVC syllables (C V́ C), each with an f0 movement drawn over the segmental string]

Figure 9.3 Segmental anchoring: a schematic depicting the relative stability of alignment of f0 movements (solid line) with respect to the segmental string (here CVC) and the accompanying variation in shape (i.e. slope and duration) of the f0 movement.
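The target-and-interpolation idea and segmental anchoring can be sketched together as follows. This is a minimal illustration under stated assumptions: targets sit at fixed offsets from segmental landmarks (the landmark names, offsets, and Hz values are hypothetical), and the ‘shortest path’ between targets is taken to be a straight line in Hz. Lengthening the host syllable leaves each target’s alignment relative to its landmark unchanged while the slope of the rise changes, as in Figure 9.3.

```python
def anchored_targets(landmarks_s, anchors):
    """Place tonal targets at fixed offsets from segmental landmarks.

    landmarks_s -- dict: landmark name -> time in seconds (names are hypothetical)
    anchors     -- list of (landmark_name, offset_s, f0_hz) triples
    Returns time-ordered (time_s, f0_hz) targets.
    """
    return sorted((landmarks_s[name] + offset_s, f0_hz) for name, offset_s, f0_hz in anchors)


def interpolate(targets, t):
    """Linear interpolation between the targets that bracket time t (flat outside them)."""
    if t <= targets[0][0]:
        return targets[0][1]
    for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return targets[-1][1]


def slope_hz_per_s(targets):
    """Slope of the movement between the first two targets."""
    (t0, f0), (t1, f1) = targets[0], targets[1]
    return (f1 - f0) / (t1 - t0)


# A rise (L then H): L just after the syllable onset, H just before the vowel offset.
rise = [("syllable_onset", 0.01, 100.0), ("vowel_offset", -0.01, 150.0)]
short = anchored_targets({"syllable_onset": 0.0, "vowel_offset": 0.15}, rise)
long_ = anchored_targets({"syllable_onset": 0.0, "vowel_offset": 0.30}, rise)

print(slope_hz_per_s(short), slope_hz_per_s(long_))  # steeper rise on the shorter syllable
print(interpolate(long_, 0.15))                      # f0 midway through the longer rise
```

The design choice here mirrors the text: only target positions and heights are specified, so contour shape between targets falls out of the interpolation rule rather than being stored.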

9.3.1 Segmentally induced variability in f0 target realization

While segmental anchoring in some form is broadly accepted, questions remain concerning its details. Given a pitch rise (phonological LH), do we expect both tonal targets to be equally and independently ‘anchored’, or is there some asymmetry and/or interdependence between them? For example, the pitch accents of various European intonation systems show relatively stable timing of pitch movement onsets, while movement offset timing varies considerably relative to segmental hosts (Silverman and Pierrehumbert 1990; Caspers and van Heuven 1993; van Santen and Hirschberg 1994). One expression of this is the tendency for rising accent peaks to anchor relatively earlier in closed syllables than in open ones (D’Imperio 2000; Welby and Lœvenbruck 2006; Jilka and Möbius 2007; Prieto and Torreira 2007; Prieto 2009; Mücke et al. 2009), with rise onsets largely unaffected. Perhaps relatedly, peaks are also seen to align earlier in syllables with obstruent codas than in all-sonorant rhymes (van Santen and Hirschberg 1994; Rietveld and Gussenhoven 1995; Welby and Lœvenbruck 2006; Jilka and Möbius 2007). Both these patterns may reflect a tendency for critical f0 contour regions to avoid realization in less sonorous contexts (House 1990; Gordon 1999; Zhang 2001, 2004a; Prieto 2009; Dogil and Schweitzer 2011; Flemming 2011; Barnes et al. 2014). Some studies (e.g. Rietveld and Gussenhoven 1995) have shown an influence of syllable onset composition on alignment patterns as well (cf. Prieto and Torreira 2007).


9.3.2 Time pressure effects on f0 target realization

Non-segmental factors, such as temporal pressure, may also influence tonal alignment. Silverman and Pierrehumbert (1990) famously found earlier alignment of prenuclear H* accents in English induced by multiple facets of upcoming prosodic context, including word boundaries and pitch-accented syllables. This phenomenon, known as ‘tonal crowding’, is observed in many languages, including Dutch (Caspers and van Heuven 1993), Spanish (Prieto et al. 1995), Greek (Arvaniti et al. 1998), and Chickasaw (Gordon 2008). Interestingly, in cases where both right-side prosodic context and speech rate were involved, only the former resulted in significant alignment adjustment (Ladd et al. 1999; cf. Cho 2010, whose speech rate manipulation study shows both segmental anchoring and some pressure towards constant pitch movement duration in English, Japanese, Korean, and Mandarin). The similarity of pitch-movement onset/offset asymmetries to segmental inter-gestural coordination patterns, whereby onset consonant timing is more stable and more reliably coordinated with syllable nuclei (Browman and Goldstein 1988, 2000), may pose a challenge to the standard conception of segmental anchoring (Prieto and Torreira 2007; Gao 2008; Mücke et al. 2009; Prieto 2009). Other studies, however, argue that while ‘right context’ does condition alignment changes for pitch movement offsets in various languages, within a given structural condition, movement offsets are no more variable (and hence no less anchored) than movement onsets (e.g. Ladd et al. 2000; Dilley et al. 2005; Schepman et al. 2006; Ladd et al. 2009b). In both English and Dutch (Schepman et al. 2006; Ladd et al. 2009b), for example, pitch movement offsets align differently for syllables containing phonologically ‘short’ and ‘long’ vowels, but these differences are not determined by phonetic vowel duration, requiring reference to structural factors instead (e.g. syllabification).
Some languages also exhibit more timing variability in pitch movement onsets than in offsets (e.g. Mandarin: Xu 1998).² Much remains to be understood about cross-language variation here, both for timing patterns and for anchor types. Right or left syllable edges are often loosely invoked as anchors, but holistically construed entire syllables (Xu and Liu 2006), as well as various subsyllabic constituents (e.g. morae: Zsiga and Nitsaroj 2007), have also been proposed. Comparative studies likewise demonstrate subtle cross-language alignment differences for otherwise analogous tone patterns. Southern German speakers align prenuclear rises slightly later than Northern Germans, and both align these later than Dutch or English speakers (Atterer and Ladd 2004; see also Arvaniti and Garding 2007 for American English dialects; Ladd et al. 2009b on Scottish and Southern British English). Mennen (2004) shows not only that ‘comparable’ pitch accents in Greek and Dutch differ subtly in timing but also that Dutch non-native speakers of Greek display timing patterns different from those of both native Greek and native Dutch speakers. The phonological implications of such differences remain contested (e.g. Prieto et al. 2005).

² In other words, carry-over coarticulation is stronger than anticipatory (§9.2).

9.3.3 Truncation and compression

Much of the crowding literature focuses on repair strategies for tone strings realized in temporally challenging circumstances. Two distinct strategies, called ‘truncation’ and ‘compression’, have been identified, with some suggesting that languages differ monolithically according to their preference between these two (Erikson and Alstermark 1972; Bannert and Bredvad-Jensen 1975, 1977; Grønnum 1989; Grabe 1998a; Grabe et al. 2000). In the original formulation of the distinction, given a pitch movement under temporal pressure (e.g. HL in phrase-final position, or with a shorter host vowel, a complex or voiceless coda, etc.), a compressing language alters the timing of pitch targets, for example by retracting the final Low. All targets remain realized, but in a compressed interval and therefore with steeper movements. A truncating language, by contrast, resolves the problem by undershooting targets: a fall’s slope might remain unchanged, for example, but its final target would not reach as low as usual (Figure 9.4). Grabe (1998a) argues that English compresses while German truncates. Bannert and Bredvad-Jensen (1975, 1977) distinguish dialects of Swedish similarly, and Grabe et al. (2000) find that while many British dialects favour compression (e.g. Cambridge), others prefer truncation (e.g. Leeds).

More recent work, however, calls into question the binarity of the distinction. Rathcke (2016), for example, shows that German and Russian, both ostensibly truncating, in fact employ a mixture of strategies (see also Hanssen et al. 2007 on Dutch). Cho and Flemming (2015) further point out inconsistencies in usage surrounding these concepts. Some authors (e.g. Grice 1995a; Ladd 1996, 2008b) take compression to mean fitting a complete tone melody into a reduced temporal interval, potentially affecting both timing and scaling. Truncation, in this usage, refers not to phonetic undershoot but to deletion of some phonological specification altogether. As Ladd (2008b: 180–184) points out, it is often not clear whether a given instance represents phonological deletion or phonetic undershoot, making the distinction challenging to investigate empirically.

[Figure 9.4: two schematic panels, ‘Compression’ and ‘Truncation’, each comparing f0 (Hz) on a short and a long word]

Figure 9.4 Schematic representation of compressing and truncating approaches to f0 realization under time pressure. (Grabe et al. 2000)
Lastly, it must be recognized that temporal pressure on f0 realization is not always remedied exclusively to the detriment of the tone melody. A range of alterations to segmental material, allowing it to better accommodate the timing of an intended tone string, have also been documented, including lengthening of final sonorous segments, blocking of vowel devoicing, and final vowel epenthesis (Hanssen 2017; Roettger and Grice 2019).
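The original phonetic characterization of the two strategies (a compressing language retimes the targets and steepens the fall; a truncating language keeps the slope and undershoots the final target) can be made concrete with a minimal sketch. The function name, the single HL fall, and the numbers are hypothetical. The sketch models only this binary phonetic version, not the phonological-deletion sense of truncation used by Grice (1995a) and Ladd (1996, 2008b), nor the mixed strategies reported by Rathcke (2016), nor segmental accommodations such as final lengthening.

```python
def realize_fall(h_hz, l_hz, planned_dur_s, available_dur_s, strategy):
    """Return (final_f0_hz, slope_hz_per_s) for an H-to-L fall under time pressure.

    planned_dur_s   -- duration the fall would take without time pressure
    available_dur_s -- duration actually available (e.g. a short or final host)
    strategy        -- "compression" or "truncation" (the phonetic sense only)
    """
    planned_slope = (l_hz - h_hz) / planned_dur_s  # negative for a fall
    if available_dur_s >= planned_dur_s:
        return l_hz, planned_slope                 # no time pressure: nothing to repair
    if strategy == "compression":
        # Both targets are reached; the fall is squeezed into less time, so it is steeper.
        return l_hz, (l_hz - h_hz) / available_dur_s
    if strategy == "truncation":
        # The slope is kept; the fall runs out of time before reaching the Low target.
        return h_hz + planned_slope * available_dur_s, planned_slope
    raise ValueError(f"unknown strategy: {strategy!r}")


# A 200 -> 120 Hz fall planned over 0.2 s, squeezed into 0.1 s (values hypothetical).
for strategy in ("compression", "truncation"):
    final_hz, slope = realize_fall(200.0, 120.0, 0.2, 0.1, strategy)
    print(f"{strategy}: ends at {final_hz:.0f} Hz, slope {slope:.0f} Hz/s")
```

With these values, compression still ends on the Low (120 Hz) but with twice the planned slope (-800 Hz/s), while truncation keeps the planned slope of -400 Hz/s and stops at 160 Hz.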


9.4 Scaling of pitch movements

9.4.1 Pitch range variability: basic characteristics

Unlike musical notes, linguistic pitch specifications cannot be invariantly associated with context-independent f0 values; rather, they are massively and multi-dimensionally relative. Speaker size, age, gender, sexual orientation, sociolinguistic identity, and emotional state can all influence pitch range in a global fashion, as can environmental factors such as ‘telephone voice’ or the Lombard effect during speech in noise or with hearing impairment (Hirson et al. 1995; Gregory and Webster 1996; Junqua 1996; Schötz 2007; Pell et al. 2009; Pépiot 2014). Some variability is determined by physiological factors, such as the length and mass of the vocal folds, though this is clearly both modulated by sociocultural norms (e.g. van Bezooijen 1993, 1995; Biemans 2000; Mennen et al. 2012) and actively employed for identity construction and sociolinguistic signalling. Pitch range varies not just globally, however, from individual to individual or utterance to utterance, but also in a structured, often highly local fashion, and it is this variability that is usually associated with the encoding of linguistic meanings or functions (cf. §9.4.2).³

One basic observation about this variation is that it tends to affect higher f0 targets more saliently than lower ones.⁴ Rather than shifting globally upward or downward, pitch range seems instead to be compressed or expanded, primarily through raising or lowering of its topline or ceiling. The bottom, or ‘baseline’ (Maeda 1976), remains comparatively unperturbed (see Figures 9.5 and 9.6). (Cf. Ladd’s 1996 decomposition of pitch range into ‘pitch level’, the overall height of a speaker’s f0, and ‘pitch span’, the distance between the lowest and highest values within that range.)
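Ladd’s decomposition and the topline/baseline asymmetry can be sketched numerically. In this minimal illustration, which is an assumption-laden operationalization rather than an established procedure, pitch level is taken to be the mean f0 of a contour, pitch span is the interval between its lowest and highest values in semitones (to allow for the non-linear perception of f0 in Hz), and range expansion scales each value’s distance above the contour’s minimum, so that the baseline stays where it is. The function names and contour values are hypothetical.

```python
import math
import statistics


def semitones(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in semitones (12 per doubling of frequency)."""
    return 12 * math.log2(f_hz / ref_hz)


def level_and_span(f0_hz):
    """Hypothetical measures: level = mean f0 (Hz); span = max-to-min interval (semitones)."""
    return statistics.mean(f0_hz), semitones(max(f0_hz), min(f0_hz))


def expand_from_baseline(f0_hz, factor):
    """Scale distances above the lowest value by `factor`; the baseline itself is unchanged."""
    baseline = min(f0_hz)
    return [baseline + factor * (f - baseline) for f in f0_hz]


contour = [110.0, 180.0, 130.0, 100.0]         # hypothetical f0 values: a peak, then a fall
emphatic = expand_from_baseline(contour, 1.5)   # more emphasis: higher peak, same low

print(emphatic)                  # [115.0, 220.0, 145.0, 100.0]
print(level_and_span(contour))   # level 130 Hz, span about 10.18 st
print(level_and_span(emphatic))  # level 145 Hz, span about 13.65 st
```

Because only the distance above the baseline is scaled, both level and span increase while the low stays fixed, loosely mirroring the stable final low in Figure 9.5.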

[Figure 9.5: F0 in Hz, with labelled regions ‘range of peaks’ and ‘range of lows’]

Figure 9.5 An f0 peak realized over the monosyllabic English word Anne at seven different levels of emphasis. Peaks vary considerably, while the final low is more or less invariant. (Liberman and Pierrehumbert 1984)

³ It is worth mentioning here that by ‘pitch range’ we do not usually mean the physiologically determined upper and lower limits on frequencies an individual can produce, but rather the continuously updated, contextually determined band of frequencies an individual is using in speech at a given moment.

⁴ Even when correcting for non-linearities in the perception of f0 measured in Hz.

[Figure 9.6, panel ‘Group III KS (S3, S4, S6, S7, S8, S9, S16, S19, S20, S22, S23)’: F0 (Hz) against time (sec)]

Figure 9.6 f0 contours for 11 English sentences read by speaker KS. A general downward trend is clearly observed (§9.4.3), but the distance between the peaks and the baseline is also progressively reduced, due to the topline falling more rapidly than the baseline. S = sentence. (Maeda 1976)

Additionally, different tone types may not be treated identically under pitch range modifications. Pierrehumbert (1980) influentially asserts that a central difference between phonologically High and Low tones is that, under hyperarticulation, High targets are raised while Lows are lowered (resembling peripheralization of corner vowels). Gussenhoven and Rietveld (2000) and Grice et al. (2009) provide some supporting evidence (the former only perceptual, the latter highly variable). Contradictory findings, however, also abound. Chen and Gussenhoven (2008) find that hyperarticulatory lowering of Lows in Mandarin is subtle and variable at best. Tang et al. (2017) find that all tones increase in f0 under noise-induced (Lombard effect) hyperarticulation. Zhao and Jurafsky (2009) report raising of all Cantonese tones, including Low, in Lombard speech, and Kasisopa et al. (2014) report the same for Thai. Gu and Lee (2009) report raising of all tones under narrow focus for Cantonese, but note that higher targets are affected more dramatically than lower ones. Michaud et al. (2015) report raising of all tones in Naxi, including Low, in ‘impatient’ speech. Pierrehumbert (1980: 68) suggests that lowering of hyperarticulated Lows may be constrained physiologically by proximity to the pitch floor, and in some cases obscured by simultaneous pitch floor raising. Disentangling these possibilities empirically presents a challenge.

9.4.2 Paralanguage, pitch range (quasi-)universals, and grammaticalization

One major challenge in the study of pitch range is that, while canonical linguistic contrasts are categorical in nature (a given root bears a high tone or a low tone, a given utterance does or does not show wh-movement, etc.), linguistically significant pitch range variation often appears gradient (e.g. the different levels of emphasis in Figure 9.5; see also chapter 29).

A distinction between ‘linguistic’ and ‘paralinguistic’ uses of f0 is frequently invoked here, though in practice this boundary can be elusive (Ladd 2008b: 37). Furthermore, the Saussurean arbitrariness we expect to link sounds and meanings is not always obvious in intonation systems. Certain sound–meaning pairings appear with suspicious regularity across languages, sometimes in gradient, paralinguistic forms and at other times in categorical, grammaticalized ones. Gussenhoven (2004: ch. 5), building on work by Ohala (e.g. 1984), approaches these parallels in terms of what he calls ‘biological codes’. His ‘Effort Code’, for example, involves a communicatively exploitable link between greater expenditure of articulatory effort and higher degrees of emphasis. If greater effort results in larger f0 movements, then gradient pitch range expansion might be a paralinguistic expression of agitation or excitement, while global compression might signal the opposite. A linguistic codification of this pattern might be the broadly attested tendency across languages for focused elements to be realized with expanded pitch ranges, or for given information to be realized with compressed pitch range (e.g. post-focal compression: Xu et al. 2012; deaccenting; or dephrasing of post-focal material).⁵ In some cases, the link is gradient (Figure 9.5 again), while in others it is categorical, as in European Portuguese (Frota 2000; Gussenhoven 2004: 86), where two contrasting pitch accents, mid-falling H+L* and high-peaked H*+L, encode the difference between presentational and corrective focus. Note also, however, recent work attempting to unify accounts of ostensibly categorical and gradient patterns in a dynamic systems model (Ritter et al. 2019).

9.4.3 Downtrend

Perhaps the most exhaustively studied pattern of contextual pitch range variability is downtrend, a term spanning a range of common, if not universal, phenomena involving a tendency for pitch levels to fall over the course of an utterance, a tendency that has long been recognized (e.g. Pike 1945: 77). The nature of the patterns involved, however, and even the number of distinct phenomena to be recognized, remain the subject of sustained disagreement among scholars. Even the basic terminology is contentious and frustratingly inconsistent across research traditions. Our usage here is as follows: ‘declination’ refers to ‘a gradual tapering off of pitch as the utterance progresses’ (Cohen and ’t Hart 1965). Declination is putatively global, gradient, and time dependent. ‘Downstep’ refers to a variety of categorical (or at least abrupt) lowering patterns, usually, though not exclusively, affecting High tones. Some downstep patterns are phonologically conditioned, such as H-lowering after L in terraced-level tone systems (Welmers 1973; see also §9.2), or boundary-dependent lowering of sequential Highs in Japanese (Pierrehumbert and Beckman 1988) and Tswana (Zerbian and Kügler 2015). Other cases are lexically or constructionally specific, such as contrastively downstepped pitch accents in English and lexical downstep in Tiv (Arnott 1964) or Medumba (Voorhoeve 1971).6 Lastly, the term ‘final lowering’ (Liberman and Pierrehumbert 1984) refers to various, probably distinct, phenomena involving lowered pitch in domain-final contexts. Some applications appear gradient, possibly time dependent (e.g. Beckman and Pierrehumbert 1986 on Japanese), while others are heavily structure dependent and grammaticalized (e.g. Welmers 1973: 99 on Mano). Figures 9.7 and 9.8 show examples of apparent declination and terracing downstep in Cantonese.7

5 The paralinguistic expression of the Effort Code is (arguably) universal. Its linguistic manifestation, though, clearly varies. Some languages, such as Northern Sotho (Zerbian 2006) and Yucatek Maya (Kügler and Skopeteas 2006), are reported to lack prosodic encoding of information structure. Akan (Kügler and Genzel 2012) appears to express focus with a lowering of pitch register.
6 Some sources, following Stewart (1965), distinguish between ‘automatic’ and ‘non-automatic’ downstep, where the former refers to phonologically conditioned post-Low downstep, while the latter usually means morphosyntactic or lexical downstep (sometimes attributed to the presence of ‘floating’ Low tones in phonological representation). Other terms one encounters for different types of downstep include ‘catathesis’ (Beckman and Pierrehumbert 1986, referring to the Japanese pattern) and ‘downdrift’ (sometimes used to refer to phonologically conditioned terracing downstep, and sadly also sometimes used for declination). See Leben (in press) for an excellent recent overview from a phonological perspective, as well as chapter 4.
7 Cantonese is not normally mentioned among the languages that show terraced-level downstep. To confirm that this is in fact the correct characterization of the Cantonese pattern, we would need to verify (i) that the degree of lowering in the post-Low case is greater than would be expected from a background declination effect and (ii) that this is not merely an effect of perseveratory coarticulation (i.e. that the effect is not time dependent but, for example, causes a lowering of the pitch ceiling that persists, absent further modifications, for the remainder of the relevant prosodic constituent).

136 JONATHAN BARNES, HANSJÖRG MIXDORFF, AND OLIVER NIEBUHR

Figure 9.7 Waveform, spectrogram, and f0 contour of a Cantonese sentence, 媽媽擔憂娃娃, maa1 maa1 daam1 jau1 waa1 waa1, ‘Mother worries about the baby’, composed entirely of syllables bearing high, level Tone 1. Gradually lowering f0 levels over the course of the utterance could be attributed to declination. (Example courtesy of Di Liu)

Figure 9.8 Waveform, spectrogram, and f0 contour of a Cantonese sentence, 山岩遮攔花環, saan1 ngaam4 ze1 laan4 faa1 waan4, ‘A mountain rock obstructs the flower wreath’, in which high Tone 1 alternates with the low falling Tone 4, creating a HLHLHL pattern reminiscent of the terracing downstep typically described in African languages. (Example courtesy of Di Liu)

Much of the controversy over downtrend centres on whether globally implemented f0 patterns such as declination are distinct from, say, downstep conditioned by local interactions between tonal targets or other phenomena. Pierrehumbert and Beckman (1988), for example, argue that much of what has been taken to represent gradual f0-target lowering in the past may simply be the result of phonetic interpolation from an early High target in an utterance to a late Low. Intervening targets would thus not be lowered but rather be absent altogether. Assuming declination does exist, though, there is also controversy over whether it is ‘automatic’ (’t Hart et al. 1990: ch. 5; Strik and Boves 1995). Lieberman (1966) may have been the first to suggest a causal link between some form of downtrend and falling levels of subglottal pressure over a domain he calls the ‘breath group’. If uncompensated changes in subglottal pressure result in directly proportional f0 changes (Ladefoged 1963), and if subglottal pressure falls gradually over the utterance with decreasing volume of air in the lungs, then perhaps declination is not, strictly speaking, linguistic, insofar as it is not ‘programmed’ or ‘voluntary’.8 (Lieberman actually seems to focus on rapidly falling subglottal pressure domain-finally, yielding what is now called ‘final lowering’; see Herman et al. 1996. It is also worth noting that Lieberman finds this connection relevant only to ‘unmarked breath groups’. Languages of course also implement ‘marked breath groups’, e.g. English interrogative f0 rises, during which no trace of this ‘automatic’ tendency should be observable.) A further challenge to the idea of ‘automatic’ declination is Maeda’s (1976) observation that longer utterances typically have shallower declination slopes than shorter ones. Assuming fixed initial and final f0, the magnitude of declination thus appears constant and time independent, which in turn seems to require pre-planning, if not specifically of f0 decay then at least of the rate of airflow from the lungs. Further evidence for pre-planning comes from anticipatory or ‘look-ahead’ raising (Rialland 2001). At least some speakers of some languages have been shown to exhibit higher initial f0 levels in longer utterances than in shorter ones (e.g. ’t Hart et al. 1990: 128 for Dutch; Shih 2000 for Mandarin; Prieto et al. 2006 for Catalan, Italian, Portuguese, and Spanish; Yuan and Liberman 2010 for English and Mandarin; Asu et al. 2016 for Estonian).9

8 Much of the disagreement here seems to hinge on tacit assumptions about what ‘automatic’ means. Some seem to understand it as ‘physiologically uncontrolled and unavoidable’, while others (e.g. ’t Hart et al. 1990) may mean something weaker, such as ‘not explicitly specified syllable by syllable’, though admitting some form of linguistically informed global control or targeting. Additionally, it should be clear that the existence of an underlying biological motivation for a linguistic pattern hardly makes that pattern ‘automatic’, the literature being replete with instances of phonetically motivated patterns that have nonetheless been ‘grammaticalized’ (§9.4.2).

Another problem relating to automaticity is that downtrend can apparently be contextually modulated or even turned off. This is the case both within morphosyntactic or discourse contexts (e.g. Thorsen 1980 on Danish; Lindau 1986 and Inkelas and Leben 1990 on Hausa; Myers 1996 on Chichewa) and for specific lexical tone categories (Connell and Ladd 1990 on Yoruba; Connell 1999 on Mambila). In Choguita Rarámuri, Garellek et al. (2015) find no evidence for declination at all. Extensive basic empirical work across languages is urgently required here.

One last point in the automaticity debate concerns the implementation of so-called declination ‘reset’ (Maeda 1976), that is, the abandonment of a given interval of downtrend and the return of the pitch range ceiling to something like typical utterance-initial levels. Here the notion of the ‘breath group’ as a domain becomes problematic, in that resets frequently fail to correspond to inhalations on the part of the speaker (Maeda 1976; Cooper and Sorenson 1981). Instead, reset tends to occur at linguistically salient locations, such as syntactic boundaries, thereby serving as a cue to the structure of the utterance. The degree of reset furthermore sometimes correlates with the depth of the syntactic boundary in question, distinguishing hierarchical structures with different branching patterns (Ladd 1988, 1990; van den Berg et al. 1992; Féry and Truckenbrodt 2005).
How reset interacts with other known cues to boundary size and placement (e.g. pitch movements, lengthening) is an active area of current research (e.g. Brugos 2015; Petrone et al. 2017).

Concerning global versus local conditioning, Liberman and Pierrehumbert (1984) argued that downtrend in English is entirely a consequence of local scaling relations between adjacent targets. They famously modelled the heights of sequential accent peaks in downstepping lists (e.g. Blueberries, bayberries, raspberries, mulberries, and brambleberries . . .), such that each peak’s f0 is scaled as a constant fraction of the preceding one.10 The resulting pattern of exponential decay creates the appearance of global downtrend without actual global planning. A tendency towards higher initial f0 in longer lists, reminiscent of ‘look-ahead raising’, was observed in this study but discounted as non-linguistic ‘soft pre-planning’. The constant-ratio approach has been applied successfully in various languages (e.g. Prieto et al. 1995 on Mexican Spanish). Beckman and Pierrehumbert (1986), however, found that assuming an additional global declining f0 trend improved their constant-ratio model of Japanese downtrend, suggesting coexistence of downstep and declination (Poser 1984a). The constant-ratio model of English also systematically underpredicted downstep of series-final pitch accents, leading Liberman and Pierrehumbert (1984) to posit the activity of an additional decrement, or ‘final lowering’.
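The constant-ratio idea lends itself to a small numerical sketch. The Python below is a minimal illustration under stated assumptions, not Liberman and Pierrehumbert’s fitted model: each peak’s height above an optional reference level (our assumption; it defaults to 0 Hz, i.e. raw peak values) is a fixed fraction of the previous one, and ‘final lowering’ is modelled, hypothetically, as one extra multiplier on the last peak only. The function name and parameters are illustrative.

```python
def downstep_peaks(first_peak_hz, n_peaks, ratio, reference_hz=0.0, final_lowering=1.0):
    """Return peak f0 values (Hz) for a constant-ratio downstep series.

    Each peak's excursion above reference_hz is `ratio` times the previous
    peak's excursion, giving exponential decay towards reference_hz.
    final_lowering is a hypothetical extra multiplier applied only to the
    last peak's excursion (1.0 = no final lowering).
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must lie in (0, 1]")
    excursion = first_peak_hz - reference_hz
    peaks = []
    for i in range(n_peaks):
        e = excursion * ratio ** i
        if i == n_peaks - 1:
            e *= final_lowering  # extra decrement on the series-final peak
        peaks.append(reference_hz + e)
    return peaks


# Five list items, each peak 80% as high (above 100 Hz) as the one before.
print([round(p, 1) for p in downstep_peaks(200, 5, 0.8, reference_hz=100)])
# → [200.0, 180.0, 164.0, 151.2, 141.0]
```

Because each step removes a fixed proportion of the remaining excursion, successive drops shrink (20, 16, 12.8 Hz, and so on), which is how purely local relations can mimic a global downtrend. Setting `final_lowering` below 1 lowers only the last peak, mimicking the extra series-final decrement.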

9 Cf. Laniran and Clements (2003) on Yoruba and Connell (2004) on Mambila, neither of which appears to exhibit this tendency.
10 An idea they attribute to Anderson (1978), writing on terraced-level tone systems.


9.4.4 Perceptual constraints on tone scaling patterns

The extreme malleability of tone values in the frequency domain raises questions about how listeners map between realized f0 levels in an utterance and the linguistic tone categories they express. Pierrehumbert’s (1980: 68) purely paradigmatic definition of High and Low tone (a High tone is higher than a Low tone would have been in the same context) encapsulates this difficulty, opening up the counterintuitive possibility that High tones under the right circumstances (e.g. late in a downstepping pattern) might be realized lower than Low tones within the same utterance. How often Highs and Lows in fact cross over in this manner is still not entirely clear. Welmers (1973) distinguishes between discrete tone-level languages, in which contrasting level tones tend to avoid realization within the other tones’ characteristic ‘frequency bands’, and other languages, such as Hausa, in which cross-over may take place in extended terracing downstep. Yoruba has terracing downstep but may resist cross-over (Laniran and Clements 2003).11 Mapping from realized f0 to phonological tone categories is commonly thought to involve evaluating the heights of individual targets against some form of contextually updating reference level. Many studies equate perceived prominence with the magnitude of, for example, a High pitch accent’s excursion over a ‘reference line’ (Pierrehumbert 1980; Liberman and Pierrehumbert 1984; Rietveld and Gussenhoven 1985). Pierrehumbert (1979) showed that for two American English pitch accents in sequence to sound equal in scaling, the second must be lower than the first. If the two are equal, listeners perceive the second as higher.12 The reference line, whether ‘overt’ (e.g. extrapolated through low f0 valleys between prominences) or ‘implicit’ (Ladd 1993), appears to be constantly declining. Gussenhoven and Rietveld (1988) provide evidence that perceptual declination is global and time dependent.
Gussenhoven et al. (1997) present additional evidence that realized f0 minima in an utterance do not determine perceived prominence of neighbouring peaks.13 While phrase-initial f0 levels do alter the course of the projected reference line, Gussenhoven and Rietveld (1998) show that global normalization factors, such as inferred speaker gender, also play a role (cf. Ladd 1996, 2008b on initializing vs. normalizing models of scaling perception).
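One way to make the reference-line account concrete is to treat a peak’s perceived height as its excursion above a declining line. The sketch below assumes, purely for illustration, a straight reference line starting at 120 Hz and falling 10 Hz per second; these values and the function names are hypothetical, and the literature differs on how the line is actually derived (‘overt’ versus ‘implicit’).

```python
def reference_line(t, onset_hz=120.0, slope_hz_per_s=10.0):
    """Hypothetical linearly declining reference line (Hz) at time t (s)."""
    return onset_hz - slope_hz_per_s * t


def excursion(peak_hz, t, onset_hz=120.0, slope_hz_per_s=10.0):
    """Peak height expressed as distance (Hz) above the reference line."""
    return peak_hz - reference_line(t, onset_hz, slope_hz_per_s)


def matching_peak(first_hz, t1, t2, onset_hz=120.0, slope_hz_per_s=10.0):
    """f0 a peak at t2 needs in order to match the excursion of a peak at t1."""
    target = excursion(first_hz, t1, onset_hz, slope_hz_per_s)
    return reference_line(t2, onset_hz, slope_hz_per_s) + target


# Two 180 Hz peaks, one second apart: the later one sits further above the line.
print(excursion(180, 0.5), excursion(180, 1.5))  # → 65.0 75.0
# To sound equal, the later peak must be lower than the first.
print(matching_peak(180, 0.5, 1.5))  # → 170.0
```

Under these assumptions the sketch reproduces the direction of Pierrehumbert’s (1979) finding: physically equal peaks make the second one stand further above the line, and equal excursions require the second peak to be lower.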

11 It is tempting to relate the notion of ‘frequency banding’ to the principle of adaptive dispersion (Liljencrants and Lindblom 1972), though efforts to locate such parallels between, say, tone systems and vowel inventories have thus far yielded equivocal results (e.g. Alexander 2010).
12 This has been interpreted as a form of perceptual compensation for realized declination in production.
13 Interestingly, even utterance-final low f0, despite its demonstrated invariance within speakers, exerts no influence on judgements of prominence.

9.5 Contour shape

Autosegmental phonology promotes a view of tone and intonation with just two orthogonally variable representational dimensions: the levels of tonal autosegments (H, M, L, etc.), and the timing relationships between them, emerging from their alignments with segmental hosts. The target-and-interpolation view of tonal implementation, moreover, extends this picture directly into the phonetics, where research has been focused either on the timing of tonal targets or on their scaling in the f0 dimension. It is furthermore commonly assumed that those tonal targets can be operationalized more or less satisfactorily as f0 turning points reflecting target attainment in both domains. While turning points are surely related to phonological tone specifications, the directness and exhaustivity of this relationship are much less certain. Much recent research has been devoted to aspects of global f0 contour shape that are varied systematically by speakers, relied upon as cues by listeners, and yet difficult to characterize in terms of turning-point timing or scaling (Barnes et al. 2012a, 2012b, 2014, 2015; Niebuhr 2013; Petrone and Niebuhr 2014). Where the target-and-interpolation model sees empty spaces or transition zones, we are increasingly identifying aspects of tonal implementation that are no less important than the ‘targets’ themselves. The following sections focus on a few of these characteristics.

9.5.1 Peak shapes and movement curvatures

Peak shape and movement curvature are two aspects of contour shape whose perceptual relevance is increasingly evident in both tone and intonation systems. This section makes three points. First, pitch accent peaks need not be symmetrical. Second, f0 movements towards and away from high tones need not be linear. Third, peak maxima need not be local events.

The first point is reflected in most autosegmental-metrical analyses. Pitch accents include leading or trailing tones that precede or follow ‘starred tones’ by what was originally thought to be a constant interval (cf. §9.3). This idea embodies the observation that some slopes related to pitch accents are characteristically steeper than others. For a H+L* accent, for instance, the falling slope to the Low is expected to be systematically steeper than the rise to the H, whereas for L*+H, a steep rise should be a defining feature. Perception experiments by Niebuhr (2007a) on German support these expectations and show that movement slopes furthermore interact with peak alignment: the less steep the fall of a H+L* accent, the earlier in time the entire falling pattern must be to reliably convey its communicative function. L*+H accents are not identified at all by listeners without a steep rise.14 For H*, identification is best if both rise and fall are shallow. The shallower these slopes, in fact, the less important it is perceptually how the peak aligns with respect to the accented syllable. (Similarly, see Rathcke 2006 on Russian.) Cross-linguistic research shows that this interplay of peak shape and alignment can be a source of inter-speaker variation (Niebuhr 2011). In both German and Italian, a continuum has been identified between two opposing strategies for pitch accent realization. Some speakers (‘aligners’) distinguish their pitch accent categories primarily using pitch movement timing, with peak shapes kept virtually constant. Other speakers (‘shapers’) produce contrasting pitch accents with more or less identical timing, but with strong differences in shape. Figure 9.9 illustrates this difference using data from two exemplary speakers of German. The corpus data suggest that pure shapers are fairly rare, the vast majority of speakers using both strategies to some degree, with alignment typically dominant.

14 Identification of L*+H is additionally enhanced by a steep fall. See Niebuhr and Zellers (2012) for the relevance of falling slope here, and a possible tritonal analysis.

[Figure 9.9 panels: top, ‘Aligner’; bottom, ‘Shaper’; each shows H+L* and H* realizations on the accented syllables La-, Wi-, Na-, and Ma-.]

Figure 9.9 The realization of the H+L* versus H* contrast in German by means of variation in f0 peak alignment (top) or f0 peak shape (bottom). The word-initial accented CV syllables of Laden ‘store’, Wiese ‘meadow’, Name ‘name’, and Maler ‘painter’ are framed in grey. Unlike for the ‘aligner’ (LBO), the f0-peak maxima of the ‘shaper’ are timed close to the accented-vowel onset for both H+L* and H*.

Dombrowski and Niebuhr (2005) discovered systematic variation in the curvature of phrase-final boundary rises in German. Concave rises, starting slowly and subsequently accelerating, were mainly produced in turn-yielding contexts, whereas convex (but non-plateauing) rises were mainly produced at turn-internal phrase-final boundaries. A convex–concave distinction also appears at the ends of questions, where a convex shape signals ‘please just respond and let me continue speaking afterwards’ and a concave one ‘please feel free to take the turn and keep it’. Dombrowski and Niebuhr (2005) and Niebuhr and Dombrowski (2010) characterize this communicative function as ‘activating’ (convex) or ‘restricting’ (concave) the interlocutor. Asu (2006), Petrone and D’Imperio (2008), and Cangemi (2009) report similar convex–concave distinctions in varieties of Estonian and Italian. For Italian, a convex fall from a high target marks questions, while a concave fall marks statements. Petrone and Niebuhr (2014) showed that the same form–function link applies to final falls in German as well, and even extends here, in a perceptually relevant way, to the prenuclear accent peaks of questions and statements. That is, listeners infer from the shape of a prenuclear fall whether the utterance is going to be a question or a statement. Concave rises and/or convex falls are such powerful cues to sentence mode in German that they may sway listeners even in the absence of other morphosyntactic or prosodic interrogative markers, as shown in Figure 9.10. Temporal instantiation of f0 peaks may be ‘sharp’, with rapidly rising and falling flanks, or flatter, with f0 lingering close enough to its maximum for no single moment within that lengthier high region to be identifiable as ‘the target’ (Figure 9.11).
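The convex–concave distinction can be operationalized in many ways. The sketch below uses one simple heuristic of our own (an assumption, not the measure used in the studies cited): compare the movement with the straight line (chord) joining its endpoints. A contour that sags below the chord, like a rise that starts slowly and then accelerates, is labelled concave; one that bulges above it is labelled convex. The same test applies to falls.

```python
def curvature(contour, tol=1e-6):
    """Label an f0 movement 'convex', 'concave', or 'linear'.

    contour: f0 samples (Hz), equally spaced in time, from movement onset
    to offset. The label depends on the mean deviation of the interior
    samples from the straight line (chord) joining the two endpoints.
    """
    n = len(contour)
    if n < 3:
        raise ValueError("need at least three samples")
    start, end = contour[0], contour[-1]
    deviations = [
        f0 - (start + (end - start) * i / (n - 1))
        for i, f0 in enumerate(contour)
    ][1:-1]
    mean_dev = sum(deviations) / len(deviations)
    if abs(mean_dev) <= tol:
        return "linear"
    return "convex" if mean_dev > 0 else "concave"


xs = [i / 20 for i in range(21)]  # normalized time, 21 samples
print(curvature([100 + 100 * x ** 2 for x in xs]))  # slow-then-fast rise → concave
print(curvature([200 - 100 * x ** 2 for x in xs]))  # stays high, then drops → convex
```

A chord-based measure has the advantage of treating rises and falls alike, which matches the way the labels are used above for both boundary rises and prenuclear falls; real analyses would first have to locate the movement’s onset and offset.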

[Figure 9.10 panels: (a) Question and (b) Statement, ‘Katherina sucht ne Wohnung’ (Katherina searches for a flat); F0 (in Hz), 50–350, against time (s), 0–1.889 in (a) and 0–1.532 in (b); contour stretches labelled convex and concave.]

Figure 9.10 A declarative German sentence produced once as a statement (left) and once as a question (right). The shapes of the prenuclear pitch accent peaks are different. The alignment of the pitch accent peaks is roughly the same (and certainly within the same phonological category) in both utterances (statement and question).

[Figure 9.11: f0 (Hz), 100–600, against time (s), 0–2.467.]

Figure 9.11 A sharp peak, and a plateau, realized over the English phrase ‘there’s luminary’.

In many languages, this variation remains largely unexplained (Knight 2008: 226). Its perceptual consequences, however, are increasingly clear. In the scaling domain, it is widely observed (e.g. D’Imperio 2000, citing a remark in ’t Hart 1991; Knight 2003, 2008) that plateau-shaped accentual peaks sound systematically higher to listeners than analogous sharp peaks with identical maximum f0. Köhnlein (2013) suggests that this higher perceived scaling may be the reason for the relative unmarkedness across languages of high-level pitch as the phonetic realization of a High tonal target. In Northern Frisian (Niebuhr and Hoekstra 2015), extended-duration peaks appear systematically in contexts where speakers of other languages would expand pitch range (e.g. contrastive focus). Turning to the composition of lexical tone inventories, it is tempting to see this perceptual advantage as one factor making high-level tones common across languages relative to, say, sharp-peaked rising-falling tones. Cheng (1973), for example, in his survey of 736 Chinese tone inventories, finds 526 instances of high-level tones (identified by 55 or 44 transcriptions using the Chao tone numbers), against just 80 instances of convex (rising-falling) tones.15 Explanations of the higher perceived scaling of plateaux include greater salience of the f0 maximum, owing to longer exposure (Knight 2008), and the suggestion that scaling perception involves a form of f0 averaging over time (Barnes et al. 2012a, 2012b). That is, if a plateau-shaped pattern remains close to its maximum f0 over a longer time span, then listeners should perceive it as higher in pitch. This account (correctly) predicts perceived scaling differences for other shape variations as well (Barnes et al. 2010a; Mixdorff et al. 2018; see also Niebuhr et al. 2018 on peak shape variation in ‘charismatic’ speech).

15 One could of course also appeal to the greater structural complexity or production difficulty of convex tones in explaining this asymmetry.

The effect of the sharp peak versus plateau shapes has also been studied with respect to the perceived timing of f0 targets (D’Imperio 2000; Niebuhr 2007a, 2010; D’Imperio et al. 2010; Barnes et al. 2012a), uncovering significant variation across languages and even between different intonation contours in a single language. What is clear is that no single point within a high plateau can be identified in a stable manner with the f0 maximum of a sharp peak for the purposes of ‘target’ identification. It is also clear that attempts to study perception (or production) of tonal timing independently of tone scaling will inevitably miss key insights, insofar as any aspect of f0 contour shape that affects one of these dimensions likely affects the other as well, yielding perceptual interactions that we are only just beginning to explore.
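The f0-averaging account of plateau scaling can be illustrated numerically. The sketch below assumes, as a deliberate simplification, that perceived height tracks the plain arithmetic mean of f0 (in Hz) over a fixed window; the two contours are hypothetical, piecewise-linear, and share the same 200 Hz maximum, differing only in whether that maximum is a single point or a 0.4 s plateau.

```python
def f0_at(t, points):
    """Linearly interpolate f0 (Hz) at time t from (time, f0) breakpoints."""
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the contour")


def sample(points, n=1001):
    """Sample a contour at n equally spaced times between its first and last breakpoint."""
    start, end = points[0][0], points[-1][0]
    return [f0_at(start + (end - start) * i / (n - 1), points) for i in range(n)]


SHARP = [(0.0, 100.0), (0.5, 200.0), (1.0, 100.0)]
PLATEAU = [(0.0, 100.0), (0.3, 200.0), (0.7, 200.0), (1.0, 100.0)]

sharp, plateau = sample(SHARP), sample(PLATEAU)
print(max(sharp), max(plateau))  # → 200.0 200.0
print(round(sum(sharp) / len(sharp)), round(sum(plateau) / len(plateau)))  # → 150 170
```

Under this assumption the plateau averages about 20 Hz higher despite an identical maximum, which is the direction of the reported perceptual effect. A fuller model would also have to choose the integration window and the frequency scale (e.g. semitones rather than Hz).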

9.5.2 ‘Dipping’ Lows and local contrast

If level f0 around an f0 maximum is an efficacious way of implementing High tones, the same may not be true for Low targets, which are instead frequently buttressed, and especially preceded, by higher f0, creating a salient movement down towards the low target, or up away from it, a pattern that Gussenhoven (2007; after Leben 1976) refers to as ‘dipping’. Gussenhoven cites allophonic concave realization of Mandarin Tone 3, as well as instances of phonological Lows enhanced by higher surrounding pitches in Stockholm Swedish (Bruce 1977; Riad 1998a), Northern European Portuguese (Vigário and Frota 2003), and Borgloon Dutch (Peters 2007). Ahn (2008) discusses High f0 flanking L* pitch accents in English yes/no questions in similar terms. While some of these patterns have been treated as phonological High tone insertion, others may be a matter of gradient phonetic enhancement of the Low target. (Gussenhoven 2007 presents Swedish as a case that has been analysed both ways by different scholars.) The commonplace description of late-peak (L*+H) pitch accents as ‘scooped rises’ suggests that a similar, if less dramatic, pattern may be standard in many languages. Again consulting Cheng’s (1973) Chinese tone inventory survey, we observe that while convex rise-fall patterns are relatively rare as citation forms (see §9.5.1), concave or fall-rise patterns are in fact quite common: 352 attestations in the survey of 736 systems, as against only 166 instances of tones described as low and level (Chao numerals 22 or 11).16 The connection between enhancement patterns such as Low dipping and the Obligatory Contour Principle (Leben 1973) should be clear and is explicitly invoked by both Gussenhoven (2007) and Ahn (2008). Analogous insertion of Low to separate underlying Highs has also been proposed (Gussenhoven 2012b on the Maastricht Fall-Rise). Nonetheless, it seems fair to observe that High tone analogues to the dipping Low pattern are substantially less common, leaving us to wonder why dynamicity so commonly complements Low targets, while stasis, in the form of plateaux, is so suited to Highs.17 Perhaps there is some sense in which High tones, with their particular relation to prominence of varying descriptions, are perceptually advantaged and perhaps less in need of support from syntagmatic enhancements than their Low counterparts (see Evans 2015 and references therein).

16 Cheng’s sample skews heavily towards Mandarin, especially Northern Mandarin, so some caution in interpreting this typology is advised. The fact that we are in most cases observing citation forms of tones in such surveys also merits caution.
17 It is possible, of course, that the corresponding pattern for Highs is hiding in plain sight: if the English L+H* pitch accent is correctly thought of as an emphatic or hyperarticulated H* (a big if, of course: Ladd and Morton 1997), then perhaps the leading Low tone is in fact precisely this sort of enhancing feature. Similarly, ’t Hart et al. (1990: 124) refer to ‘extra-low F0 values preceding the first prominence-lending rise’ in a phrase, called an ‘anticipatory dip’ by Cohen and ’t Hart (1967) and also observed by Maeda (1976).

9.5.3 Integrality of f0 features

The interaction of contour shape with f0 timing and scaling makes clear the need to view individual aspects of the f0 contour as part of a larger constellation of cues working together to realize the contrasting categories of phonological systems. Peak timing and scaling, for example, interact not just with contour shape but with one another. Gussenhoven (2004) documents a pattern across languages whereby later peak timing either co-varies with or substitutes for higher peak scaling. His explanation for this ‘later = higher’ pattern involves an inference by listeners, such that longer elapsed time implies greater distance covered in f0 space, and hence a higher target.18 Numerous instances of the opposite pattern are also documented, however, whereby earlier peak timing co-varies with higher peak scaling (e.g. Face 2006 and others on earlier, higher peaks in Spanish narrow-focus constructions, similarly Cangemi et al. 2016 on Egyptian Arabic, and Smiljanić and Hualde 2000 on Zagreb Serbo-Croatian). Gussenhoven (2004) treats such counterexamples as a distinct manifestation of his Effort Code. Barnes et al. (2015, 2019) suggest that both patterns may originate from language- and construction-specific manipulations of peak timing in order to maximize mean f0 differences during a particular syllable or interval.

Trading and enhancement relations between contour shape features are only now beginning to be explored. Work in connection with the Tonal Center of Gravity theory (TCoG) (Barnes et al. 2012b; chapter 3), for example, makes explicit predictions concerning which aspects of contour shape should be mutually reinforcing and hence likely to trade or co-occur (Figure 9.12), both with one another and with particular timing and scaling patterns. For example, for a rising-falling pitch accent, both a concave rise and a convex fall would shift the bulk, or TCoG, of the raised-f0 region later. They thus enhance one another and together promote the perception of a relatively later high pitch event. Their co-occurrence might therefore be preferred across languages (while mirror images would be avoided in late-timing contexts, insofar as they would counteract the later percept). Bruggeman et al. (2018) use the notion of the TCoG to generalize across patterns of inter-speaker variability in the realization of pitch accents in Egyptian Arabic. Patterns of individual difference such as the ‘shapers’ and ‘aligners’ presented above (§9.5.1) may also be explained in this manner. Lastly, Barnes et al. (2015, 2019) develop the notion of the TCoG as a perceptual reference location for f0 events, both in the timing and the scaling dimensions, as a way of accounting for apparent perceptual interactions of the kind noted earlier in this section.

18 Also called the ‘Tau Effect’, a potentially domain-general phenomenon whereby increased separation of events in time causes overestimation of their separation in space (Helson 1930; Henry et al. 2009).

[Figure 9.12: schematic panels (a)–(h), each marking the TCoG of a different contour shape.]

Figure 9.12 Schematic depiction of how various f0 contour shape patterns affect the location of the Tonal Center of Gravity (TCoG) (Barnes et al. 2012b) and the concomitant effect on perceived pitch event alignment. The shapes on the left should predispose listeners to judgements of later ‘peak’ timing, while the mirror images (right) suggest earlier timing. Shapes that bias perception in the same direction are mutually enhancing and hence predicted to co-occur more frequently in tonal implementation.
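The TCoG prediction can be checked with a short computation. As we understand the proposal of Barnes et al. (2012b), the TCoG is an f0-weighted mean of time, sum(f0_i * t_i) / sum(f0_i). Subtracting a floor from f0 before weighting, and the particular contour shapes below, are illustrative assumptions. The sketch tests the claim in the text: a concave rise plus a convex fall should pull the TCoG later than a symmetric linear peak, and the mirror-image shapes should pull it earlier.

```python
def tcog(times, f0, floor=0.0):
    """Tonal Center of Gravity: f0-weighted mean time, sum(t*w) / sum(w).

    Weights are f0 minus `floor` (clipped at zero); the floor is an
    illustrative assumption that makes shape differences easier to see.
    """
    weights = [max(f - floor, 0.0) for f in f0]
    total = sum(weights)
    if total == 0:
        raise ValueError("no f0 above the floor")
    return sum(t * w for t, w in zip(times, weights)) / total


def rise_fall(rise_exp, fall_exp, n=201, low=100.0, high=200.0):
    """Hypothetical rise-fall over 0-1 s with its maximum at 0.5 s.

    rise_exp > 1: concave rise (slow start); rise_exp < 1: convex rise.
    fall_exp > 1: convex fall (stays high, drops late); fall_exp < 1: concave fall.
    """
    times = [i / (n - 1) for i in range(n)]
    f0 = []
    for t in times:
        if t <= 0.5:
            f0.append(low + (high - low) * (t / 0.5) ** rise_exp)
        else:
            f0.append(high - (high - low) * ((t - 0.5) / 0.5) ** fall_exp)
    return times, f0


for label, shape in [("linear", (1, 1)),
                     ("concave rise + convex fall", (2, 2)),
                     ("convex rise + concave fall", (0.5, 0.5))]:
    print(label, round(tcog(*rise_fall(*shape), floor=100.0), 3))
```

The symmetric contour yields a TCoG at the 0.5 s peak, the concave-rise/convex-fall contour a later one, and its mirror image an earlier one, even though all three reach the same maximum at the same moment.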

Niebuhr’s (2007b, 2013) Contrast Theory likewise showcases the interplay of seemingly disparate aspects of the signal in perception, but with an emphasis on perceived prominence. Its basic assumption is that varying realization strategies serve to increase the perceived prominence of some f0 intervals over others, thereby enhancing phonological contrasts. For instance, the final low section of H+L* and the central high section of H* in German should each achieve maximum prominence, assuming that the starred tone (T*) is the prominent one. One way to achieve this would be to exploit the prominence-lending effect of duration and create a plateau-shaped peak for H*. Another strategy would be to centre the relevant f0 stretches over the accented vowel, thereby exploiting its inherent prominence-lending energy level. This would yield, as typically attested, an earlier peak alignment for H+L* than for H*. Indeed, perception studies involving f0 peak alignment continua (Figure 9.13) locate the category boundary between these two accents in synchrony with the intensity increase corresponding to the transition from onset consonant to accented vowel. Moreover, the less abrupt this intensity increase is, the less abrupt the categorization shift from H+L* to H* along the peak timing continuum.

146 JONATHAN BARNES, HANSJöRG MIXDORFF, AND OLIVER NIEBUHR

[Figure 9.13: upper panel, spectrogram (frequency, Hz), intensity (dB), and F0 (Hz) against time (s) for German ‘Sie war mal MAlerin’ (‘She was once a PAINter’), with the [m] and [a:] intervals and the H* and H+L* labels; lower panel, % H* identification for stimuli 1–11.]

Figure 9.13 f0-peak shift continuum and the corresponding psychometric function of H* identifications. The lighter lines refer to a repetition of the experiment but with a flatter intensity increase across the CV boundary.

PHONETIC VARIATION IN TONE AND INTONATION SYSTEMS 147

The conceptual core of the Contrast Theory relies on ideas developed by Kingston and Diehl (1994), whereby multiple acoustic parameters may be coordinated by speakers such that they form ‘arrays of mutually enhancing acoustic effects’ (p. 446), with each array representing ‘a single contrastive perceptual property’ (p. 442). Here, combinations of timing, shape, and slope properties of f0 movements would constitute the ‘arrays’, while the contrastive perceptual property would be prominence. Contrast Theory holds that speakers vary individual prominences to create a certain prominence Gestalt, which, together with the coinciding pitch Gestalt, encodes communicative meanings and functions. Contrast Theory and the TCoG both represent attempts to reconcile a tension in the literature between ‘configuration’-based accounts of tone patterns and those based on level-tone targets (Bolinger 1951). Both approaches turn on the integration of acoustic f0 cues into higher-level perceptual variables. While often complementary, these approaches sometimes diverge in interesting ways, as in the case of accentual plateaux in German (Niebuhr 2011), where results run contrary to those of Barnes et al. (2012a) on English, in a way that appears to favour a contrast-based approach over one involving timing of the TCoG.
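The identification data of the kind shown in Figure 9.13 are often summarized by two quantities: where the category boundary lies and how abrupt the shift is. The following minimal sketch is not the analysis used in the studies cited. It generates hypothetical logistic identification curves over an 11-step continuum, where the `boundary` and `slope` values are invented for the demonstration. It then locates the 50% crossover by linear interpolation and uses the largest step-to-step change as a crude index of abruptness.

```python
import math


def logistic(x, boundary, slope):
    """Proportion of H* responses predicted at continuum step x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))


def crossover(steps, proportions, criterion=0.5):
    """Continuum value where identification first crosses `criterion`,
    found by linear interpolation between adjacent steps."""
    pairs = list(zip(steps, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 != p1 and (p0 - criterion) * (p1 - criterion) <= 0:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None  # the function never reaches the criterion


def max_step_change(proportions):
    """Largest change in identification between adjacent steps."""
    return max(abs(b - a) for a, b in zip(proportions, proportions[1:]))


steps = list(range(1, 12))  # stimuli 1-11, as in Figure 9.13
# Hypothetical identification data: an abrupt vs. a gradual category shift.
abrupt = [logistic(x, boundary=6, slope=2.0) for x in steps]
gradual = [logistic(x, boundary=6, slope=0.8) for x in steps]

print(crossover(steps, abrupt), crossover(steps, gradual))
print(max_step_change(abrupt) > max_step_change(gradual))
```

In this toy model, the flatter intensity rise reported for the repeated experiment would correspond to a smaller `slope`. The boundary location and the abruptness of the shift are separate parameters.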

9.6 Non-f0 effects
The fact that Mandarin, for example, is highly intelligible when whispered, or resynthesized with flattened f0, attests to the salience of non-f0 cues to tonal contrasts in that language (Holbrook and Lu 1969; Liu and Samuel 2004; Patel et al. 2010). That listeners can discriminate above chance both whispered question–statement pairs and different prominence patterns in languages such as Dutch (Heeren and van Heuven 2014) makes a similar point regarding intonation. In addition to the interaction of intensity/sonority with f0 already discussed, chapter 3 discusses other non-f0 cues, such as duration and ‘segmental intonation’ (Niebuhr 2008, 2012), the pseudo-f0 present in (for example) obstruent noise during voiceless intervals. In what follows, we focus on one additional non-f0 factor: phonation type or voice quality. We limit the discussion to creaky voice, though a similar literature exists regarding breathiness (e.g. Hombert 1976; Hombert et al. 1979; Esposito 2012; Esposito and Khan 2012). For overviews of phonation type, see Gerratt and Kreiman (2001) and Gordon and Ladefoged (2001), and on interactions with tone in particular see Kuang (2013a). Though creak is a well-known cue to prosodic boundary placement and strength (Pierrehumbert and Talkin 1992; Dilley et al. 1996; Redi and Shattuck-Hufnagel 2001; Garellek 2014, 2015), here we discuss it solely in relation to tonal contrasts. In some languages, voice quality is a contrast-bearing feature essentially orthogonal to tone (e.g. Jalapa Mazatec: see Silverman et al. 1995; Garellek and Keating 2011; Dinka: see Andersen 1993). In other cases, the two may be linked in complex ways. In White Hmong (Garellek et al. 2013), for example, the association of breathy voice with an otherwise high-falling tone is sufficiently strong for breathiness alone to cue listener identifications. Low falling tone, by contrast, though frequently creaky, depends primarily on duration and f0 for identification, with voice quality playing no discernible role.19

Both physiological and perceptual motivations for association patterns between voice qualities and tones have been proposed. Creaky voice, for example, frequently co-occurs with lower tones, both lexical and intonational. Welmers (1973: 109) notes that native speakers of Yoruba ‘have sometimes interpreted a habitually creaky voice in an American learner as signalling low tone even when the pitch relationships represented an adequate imitation of mid and high’. Yu and Lam (2014) show that creaky voice added to otherwise identical f0 contours is sufficient to shift Cantonese listener judgements from low-level Tone 6 to low falling Tone 4. In Green Hmong, Andruski and Ratliff (2000) show that three low falling tones with broadly similar f0 are distinguished primarily by voice quality (modal, creaky, and breathy). In some cases, the phonetic link between low f0 and creak appears quite direct, as in Mandarin, where Kuang (2013a, 2017) shows that, although creak is a strong cue for (Low) Tone 3, it actually occurs whenever context draws speakers to the lower extremes of their pitch range (e.g. some offsets of high-falling Tone 4). Correspondingly, Tone 3 creaks less when realized in a raised pitch range and more when pitch range is lowered. Puzzlingly, creakiness is also frequently associated with very high f0 targets. The issue may be partly terminological (Keating et al. 2015). However, Kuang (2013a, 2017), building on work by Keating and Shue (2009) and Keating and Kuo (2012), demonstrates a connection between creaky or tense phonation and both high and low f0.
English and Mandarin speakers producing rising and falling ‘tone sweeps’ exhibited a wedge-shaped relationship between f0 and voice quality, such that both extreme low and extreme high f0 targets were realized with shallower spectral slopes (i.e. low H1–H2). Kuang hypothesizes that extreme f0 values at either end of the pitch range lead to increased vocal fold tension and thus non-modal phonation. For low f0 values, this becomes prototypical creak or vocal fry, often with irregular glottal pulsing added to shallow spectral slope. For high f0, it results instead in ‘tense’ or ‘pressed’ voice quality, sharing shallow spectral slope but lacking irregular pulsing.20 (Kingston 2005 reasons similarly regarding apparent tone reversals in Athabaskan languages related to glottalization.) There is also, however, a psychoacoustic component to these associations. Many have suggested (e.g. Honorof and Whalen 2005) that voice quality provides cues to where f0 targets lie within a speaker’s pitch range, facilitating speaker normalization. For example, tense voice quality on high-f0 syllables might indicate the top end of the speaker’s range. Kuang et al. (2016) show that manipulation of spectral slope to include more higher-frequency energy (i.e. to create a ‘tenser’ voice quality) causes listeners to report higher pitches than when the same f0 is presented with steeper spectral slope. Moreover, Kuang and Liberman (2016a) showed that at least some listeners interpreted the same spectral slope differently in different pitch ranges: shallower slope elicited lower pitch judgements when the stimulus appeared to be low in a synthetic female pitch range, but higher judgements when it appeared high in that same range (cf. the wedge-shaped production relationship above). Lastly, Kuang and Liberman (2016b) elicited lower f0 judgements to stimuli with synthetic vocal fry added (through ‘jittering’ pulse spacing) than to the same stimuli presented unjittered. The relationship between f0–voice quality interactions in speech production and the integration of those features in perception represents rich ground for future research. Brunelle’s (2012) investigation of the integration of f0, F1 (Formant 1), and voice quality in the perception of ‘register’ in Cham dialects is exemplary here.

19 In some languages these features, along with vowel quality and duration, are so densely interwoven that the term ‘register’ (Henderson 1952) is used instead. See Brunelle and Kirby (2016) and chapter 23 on problems with this distinction.

20 Interestingly, some of Kuang’s Mandarin speakers, particularly when instructed not to creak during their tone sweeps, produced breathy voice at the low end of their pitch ranges instead, and Kuang cites Zheng (2006) for the observation that, for some speakers at least, the dipping Mandarin Tone 3 may be allophonically breathy rather than creaky (as it is usually described).
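H1–H2, used above as an index of spectral slope, is the level difference in dB between the first and second harmonics. The following is a minimal sketch on a synthetic two-harmonic signal. It assumes that f0 is already known and that a whole number of glottal cycles is analysed. The helper names are hypothetical. Measurements on real speech typically also require f0 tracking and correction for formant effects, both of which are omitted here.

```python
import cmath
import math


def harmonic_level_db(signal, fs, freq):
    """Level (dB re amplitude 1) of the component at `freq` Hz,
    estimated with a single-frequency discrete Fourier transform."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, x in enumerate(signal))
    amplitude = 2.0 * abs(acc) / n  # peak amplitude of a sinusoid
    return 20.0 * math.log10(max(amplitude, 1e-12))  # floor avoids log of zero


def h1_minus_h2(signal, fs, f0):
    """H1-H2 in dB: lower values indicate a shallower spectral slope."""
    return harmonic_level_db(signal, fs, f0) - harmonic_level_db(signal, fs, 2 * f0)


def two_harmonics(fs, f0, a1, a2, n_cycles=10):
    """Synthetic 'voice' with two harmonics, spanning whole periods of f0."""
    n = int(round(n_cycles * fs / f0))
    return [a1 * math.sin(2 * math.pi * f0 * i / fs)
            + a2 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(n)]


fs, f0 = 8000, 100
steeper = two_harmonics(fs, f0, a1=1.0, a2=0.5)    # H2 about 6 dB below H1
shallower = two_harmonics(fs, f0, a1=1.0, a2=1.0)  # H2 as strong as H1

print(round(h1_minus_h2(steeper, fs, f0), 2))
print(round(h1_minus_h2(shallower, fs, f0), 2))
```

Analysing whole periods keeps each harmonic exactly on its own DFT frequency, so the two components do not leak into each other in this idealized case.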

9.7 Conclusion
In the foregoing we have reviewed, to the extent possible, the main patterns of variation documented in the realization of f0 contours across tone and intonation systems. We have attempted to give some indication both of what these patterns are like descriptively and of what kinds of explanations researchers have offered for them, drawing especially on connections with the literature on perception. Beyond this, we have attempted to underscore the importance of viewing all these patterns in light of their mutual interactions. In general, we expect study of the integration of cues from all the dimensions of the contour discussed herein to be a rich source of progress in prosody research concerning the production and perception of tonal contrasts for years to come.

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

chapter 10

Phonetic Correlates of Word and Sentence Stress
Vincent J. van Heuven and Alice Turk

10.1 Introduction
It has been estimated that about half of the languages of the world have stress (van Zanten and Goedemans 2007; van Heuven 2018). In such languages every prosodic domain has a stress (also called prosodic head). In non-stress languages (e.g. tone languages) there are no head versus dependent relationships at the word level. Prosodic domains are hierarchically ordered such that each next-higher level in the hierarchy is composed of a sequence of elements at the lower level (e.g. Nespor and Vogel 1986). In a stress language one of these lower-level units is the prosodic head; the other units, if present at all, are the dependents. This chapter deals with the prosodic heads at the word and sentence levels, called the (primary) word stress and (primary) sentence stress, respectively. Sentence stresses, whether primary or secondary, typically involve the presence of a prominence-lending tone or tone complex (i.e. a pitch accent) in a syllable with word stress (e.g. Sluijter and van Heuven 1995), which may additionally have effects on other phonetic variables and is as such profitably discussed in combination with word stress. Stresses on dependents at each of these levels can be considered secondary, or even tertiary; secondary and tertiary stress will not be considered here.1 Word stress is generally seen as a lexical property. Its location is fixed for every word in the vocabulary, in many languages by one or more fairly simple regularities. In Finnish, Hungarian, and Estonian, for instance, the word stress is invariably on the first syllable, in Turkish it is on the last, and in Polish it is on the second-but-last syllable (except in some loanwords). In Dutch the location of the word stress is correctly predicted in about 85% of the vocabulary by half a dozen quantity-sensitive rules (Langeweg 1988). In some languages (e.g. Russian and Greek) the location of the stress is fixed for each individual word but apparently no generalizations can be formulated that predict this location; here stress has to be memorized for each word in the lexicon separately.

1 See Rietveld et al. (2004), and references therein, for duration differences between primary and secondary stress in Dutch.

PHONETIC CORRELATES OF WORD AND SENTENCE STRESS 151

From a structural point of view, languages have a richer inventory of stressed than unstressed syllables (Carlson et al. 1985; van Heuven and Hagman 1988). Stressed syllables often allow more complex onsets and codas, as well as long vowels and diphthongs. Unstressed syllables often permit only single consonants in the onset and none in the coda, while distinctions in vowel length tend to be neutralized. Moreover, stressed syllables tend to resist deletions and assimilations to neighbouring unstressed syllables, whereas unstressed syllables tend to assimilate to adjacent stressed syllables and are susceptible to weakening processes and deletions.

The classical definition of stress equates it with the amount of effort a speaker spends on the production of a syllable. This implies that some extra effort is spent on each of the stages of speech production: the pulmonary stage (more air is pushed out of the lungs per unit time), the phonatory stage (strong contraction of selected laryngeal muscles), and the articulatory stage (closer approximation of the articulatory targets of segments). It has been notoriously difficult, however, to find direct physiological or neural correlates of effort or stress in speech production, and we will not attempt to improve on this state of affairs in the present chapter. Instead, we will survey the acoustic correlates of primary stress, with emphasis on languages such as English and Dutch, at the word and sentence level.2 We will show that sentence stress is signalled by all the properties that are acoustic correlates of word stress but that some extra properties are added when the word receives sentence stress. We will also review the literature on the relative importance of the correlates of word and sentence stress.
An acoustic marker ranks higher among the correlates of stress the more reliably it differentiates stressed syllables from their unstressed counterparts in automatic classification procedures. The review will also bring to light that the rank order of acoustic correlates does not correspond one-to-one to the perceptual importance of the cues. The final part of this chapter will briefly consider the universality of the rank order of stress cues and address the question: Is the relative importance of the acoustic correlates or of the perceptual cues the same across all languages that employ stress, or does it differ from one language to the next, and if so, what are the factors that influence the ranking of stress cues?

2 For recent surveys of stress correlates in a wider range of languages we refer to, e.g., Hargus and Beavert (2005), Remijsen and van Heuven (2006), Gordon (2011b), and Gordon and Roettger (2017).

10.2 Acoustic correlates of word stress
In this section, we will consider how we can determine the acoustic correlates of primary word stress. The procedure is relatively straightforward if a language has minimal stress pairs, that is, pairs of lexical items that contain the same phoneme sequence and differ only in the location of the stress. Such minimal stress pairs are not abundant in the Germanic languages, but there are enough of them for research purposes. The most frequently used minimal stress pairs in research on English are the noun–verb pairs in words of Latin origin, such as (the) import versus (to) import. A single word (i.e. a one-word sentence, also called the citation form) will always receive sentence stress. If we want to study the correlates of word stress, recorded materials should not have sentence stress on the target word(s), and multi-word sentences must therefore be constructed in which the sentence stress is shifted away from the target word by manipulating the information structure of the sentence. For instance, putting the time adjunct in focus would shift the sentence stress away from the target word onto the adverb in the answer part in (1), where bold small capitals denote word stress and large capitals represent sentence stress:

(1) Q. When did you say ‘the IMport’?
A. I said ‘the import’ YESterday.

(2) Q. When did you say ‘to imPORT’?
A. I said ‘to import’ YESterday.

152 VINCENT J. VAN HEUVEN AND ALICE TURK

In Germanic languages, the rule is to remove sentence stress from words (or larger constituents) that were introduced in the immediately preceding context. Because the target word is introduced in the precursor question, it typically no longer receives sentence stress in the ensuing answer. The acoustic correlates of word stress can now be examined by comparing the members of the minimal stress pair in (1A) and (2A). This is best done by comparing the same syllable in the same position with and without word stress, a procedure referred to as ‘paradigmatic comparison’. Syntagmatic comparison of the first and second syllables is problematic, since it compares segmentally different syllables in different positions, and should be avoided. For (partial) solutions for syntagmatic comparisons, see van Heuven (2018). Different segments have inherently different durations, intensities, and resonance frequencies. The vowel in im- is shorter, has less intensity, and has different formant frequencies than the vowel in -port, differences that preclude a direct comparison of the effect of stress. Also, final syllables of words tend to be pronounced more slowly than non-final syllables, which adds to the difficulty of isolating the correlates of stress. Although it is possible, in principle, to correct for segment-inherent and position-dependent properties, this is not normally done in research on acoustic correlates of stress. If a language has no minimal stress pairs (for instance, when the language has fixed stress), paradigmatic comparison is not possible. Phonetic differences between stressed and unstressed syllables in a fixed-stress language will then always be ambiguous, since the effects can be caused by a difference in stress but also by a difference in the position of the syllable in the word.

In order to make the stressed syllable stand out from its environment, the talker makes an effort to pronounce this syllable more clearly. The result is that the stressed vowel and consonants approximate their ideal articulatory targets more closely, which in turn causes the segments to be lengthened and produced with greater acoustic distinctiveness and intensity.

10.2.1 Segment duration
Segmentation is a somewhat artificial task because of widespread coarticulation of speech movements. However, the timing of events such as consonantal closure, consonantal release, and voice onset can often be reliably identified in the acoustic waveform and spectrogram, and segment durations can be measured on the basis of these intervals (e.g. Turk et al. 2006). Findings for segment-related durations suggest that the lengthening effects of stress are strongest for vocalic intervals, which in English and Dutch are approximately 40–50% longer when stressed (Fry 1955; Nooteboom 1972; Sluijter and van Heuven 1995, 1996a).3 Fry showed that the duration of the vocalic interval differentiated stressed from unstressed tokens in an automatic classification procedure with an accuracy of 98% (for details, see van Heuven 2018). In English, the lengthening caused by stress is found irrespective of the position of the syllable in the word (e.g. van Santen 1994). In Dutch, however, a phrase-final syllable that is already longer because of pre-boundary lengthening will not be lengthened further when it has sentence stress (Cambier-Langeveld and Turk 1999). Fry (1955) suggested that consonants were less susceptible to lengthening by stress than vowels. Findings on the effect of stress on consonantal intervals come primarily from comparisons of consonantal intervals in unstressed syllables with those in stressed syllables that bear both word and sentence stress (e.g. Lisker 1972; van Santen 1994). In English, the size of the stress effect depends on the type of consonant and its position in the word (van Santen 1994). Word-initial and word-final effects of stress in van Santen (1994) were no greater than 20 ms, but larger effects were observed in word-medial position for ˈVCV versus VˈCV comparisons, particularly for /s/ and /t/ (see also Klatt 1976; Turk 1992). In ˈVCV position, alveolar stops often lenite to taps in American English, and /t/ often lenites to a glottal stop in British English. In Dutch, Nooteboom (1972) found small but consistent effects of stress on consonants, both in onset and coda position, in trisyllabic CVCVCVC nonsense words (see also van Heuven 2018).
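Percentage lengthening and a paradigmatic classification by duration can be sketched in a few lines. The decision rule below labels the longer of two realizations of the same syllable as stressed. This rule is a simplifying assumption for illustration and is not necessarily the procedure Fry used. The function names are hypothetical and the durations are invented.

```python
def percent_lengthening(stressed_ms, unstressed_ms):
    """Lengthening of the stressed realization relative to the unstressed one."""
    return 100.0 * (stressed_ms - unstressed_ms) / unstressed_ms


def paradigmatic_accuracy(pairs):
    """Proportion of minimal-pair comparisons classified correctly when the
    longer realization of the same syllable is labelled 'stressed'.

    Each pair is (duration when stressed, duration when unstressed) in ms.
    """
    correct = sum(1 for stressed, unstressed in pairs if stressed > unstressed)
    return correct / len(pairs)


# Hypothetical vowel durations (ms) for the same syllable with and without word stress.
pairs = [(120, 80), (135, 95), (90, 100), (140, 96)]

print([round(percent_lengthening(s, u)) for s, u in pairs])
print(paradigmatic_accuracy(pairs))
```

Because each comparison involves the same syllable in the same position, segment-inherent and position-dependent duration differences cancel out. This is the motivation given above for preferring paradigmatic over syntagmatic comparison.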

3 Fry (1955) and Nooteboom (1972) actually elicited their words in sentence contexts. However, Nooteboom made sure the targets were out of focus (no sentence stress), and Fry did not measure pitch effects. So, it would be safe to use these data as correlates of word stress rather than sentence stress.

10.2.2 Intensity
In most studies on stress, the peak intensity of the vowel is measured as a correlate of stress. Intensity can be measured as the root-mean-square average of the amplitude of the sound wave in a relatively short time window that should include at least two periods of the glottal vibration. For a male voice the integration window should be set at 25 ms; for a female voice the window can be shorter. Instead of the peak intensity, some studies (also) report the mean intensity (the sum of the intensities per time window divided by the number of time steps contained in the vocalic interval). Consonant intensities are not normally reported as correlates of stress. Beckman (1986) proposed the intensity integral (i.e. the total area under the intensity curve of a vowel) as an optimal correlate of stress. It should be realized, however, that the intensity integral is essentially a two-dimensional measure, whose value is determined jointly by the vowel duration and the mean intensity. For this reason we prefer to report the values for these two dimensions separately. Sound intensities are conventionally reported in decibels (dB). The decibel scale has an arbitrary zero-point, which is equal to the sound level that distinguishes audible sound from silence for an average human hearer. The decibel scale is logarithmic: every time we multiply the intensity of a sound by a factor of 10, we add 10 dB. The loudest intensity the human ear can tolerate (the threshold of pain) is a trillion times stronger than the threshold of hearing, so that the range of intensities is between 0 and 120 dB. Vowel intensities are typically in the range of 55 to 75 dB. The effects of stress are small but consistent: a stressed vowel is roughly 5 dB stronger than its unstressed counterpart. Fry (1955) has shown that the stressed and unstressed realizations of the same syllable in English minimal stress pairs can be discriminated from each other with 89% accuracy (see van Heuven 2018 for details).
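The windowed RMS measure and the decibel arithmetic described above can be sketched as follows. The 25 ms window follows the recommendation for male voices. The following are assumptions for illustration:

* the 5 ms hop;
* the reference amplitude of 1.0, which makes the levels relative dB rather than calibrated dB SPL;
* the function names.

```python
import math


def intensity_contour_db(signal, fs, win_ms=25.0, hop_ms=5.0, ref=1.0):
    """RMS level (dB re `ref`) in successive windows of `win_ms` milliseconds."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    levels = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        rms = max(rms, 1e-12)  # avoid log of zero in silent stretches
        levels.append(20.0 * math.log10(rms / ref))  # 20 log10 for amplitudes
    return levels


def peak_and_mean(levels):
    """Peak intensity and mean intensity (sum over windows / number of windows)."""
    return max(levels), sum(levels) / len(levels)


def intensity_ratio_to_db(ratio):
    """Multiplying intensity by 10 adds 10 dB: level = 10 log10(ratio)."""
    return 10.0 * math.log10(ratio)


fs = 8000
# Hypothetical 'vowels': 200 Hz sinusoids, 100 ms long; the second has
# ten times the intensity of the first (amplitude multiplied by sqrt(10)).
quiet = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
loud = [math.sqrt(10) * x for x in quiet]

peak_q, mean_q = peak_and_mean(intensity_contour_db(quiet, fs))
peak_l, mean_l = peak_and_mean(intensity_contour_db(loud, fs))
print(round(peak_l - peak_q, 2))      # level difference in dB
print(intensity_ratio_to_db(1e12))    # threshold of pain vs. threshold of hearing
```

Intensity is proportional to squared amplitude, so 20 log10 of an amplitude ratio equals 10 log10 of the corresponding intensity ratio. Multiplying the amplitude by √10 therefore adds the 10 dB mentioned in the text.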

10.2.3 Spectral tilt
When we produce a sound with more vocal effort, the effect is not just that the overall intensity of the sound increases. The increased airflow through the glottis makes the vocal folds snap together more forcefully, which specifically boosts the intensity of the higher harmonics (above 500 Hz), thereby generating a flatter spectral tilt. This effect of vocal effort has been shown to be perceptually more noticeable than the increase in overall intensity (Sluijter and van Heuven 1996a; Sluijter et al. 1997). The spectral slope of a vowel can be estimated by computing its long-term average spectrum (LTAS) in the range from 0 to 4,000 Hz, and then fitting a linear regression line through the intensities of the LTAS. Spectral tilt is the slope coefficient of the regression line and is expressed in dB/Hz. The spectral tilt is typically flatter for the stressed realization of a vowel than for an unstressed realization (all else being equal). See also Campbell and Beckman (1997), Hanson (1997), Fulop et al. (1998), Hanson and Chuang (1999), Heldner (2001), Traunmüller and Eriksson (2000), and Kochanski et al. (2005) for additional ways of measuring spectral tilt.
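The LTAS-plus-regression procedure just described can be sketched as below. The sketch rests on three assumptions:

* The LTAS is taken to be the mean power spectrum of non-overlapping Hann-windowed frames. It is computed with a plain discrete Fourier transform to keep the example self-contained; a real analysis would use an FFT or a tool such as Praat.
* The test signals are synthetic noise stand-ins rather than vowels.
* The function names are hypothetical.

```python
import cmath
import math
import random


def ltas_db(signal, fs, frame_len=256, fmax=4000.0):
    """Long-term average spectrum from 0 to `fmax` Hz: mean power over
    non-overlapping Hann-windowed frames, returned as (freqs_hz, levels_db)."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = int(fmax * frame_len / fs) + 1
    # Precompute DFT kernels for bins 0 .. n_bins-1.
    kernels = [[cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
               for k in range(n_bins)]
    power = [0.0] * n_bins
    for f in range(n_frames):
        seg = signal[f * frame_len:(f + 1) * frame_len]
        frame = [x * w for x, w in zip(seg, window)]
        for k, kernel in enumerate(kernels):
            coeff = sum(x * e for x, e in zip(frame, kernel))
            power[k] += abs(coeff) ** 2 / n_frames
    freqs = [k * fs / frame_len for k in range(n_bins)]
    levels = [10.0 * math.log10(p + 1e-20) for p in power]
    return freqs, levels


def spectral_tilt(freqs, levels_db):
    """Slope (dB/Hz) of the least-squares regression line through the LTAS."""
    n = len(freqs)
    mean_f = sum(freqs) / n
    mean_l = sum(levels_db) / n
    sxy = sum((f - mean_f) * (l - mean_l) for f, l in zip(freqs, levels_db))
    sxx = sum((f - mean_f) ** 2 for f in freqs)
    return sxy / sxx


fs = 8000
rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(256 * 16)]
# One-pole low-pass filter: a crude stand-in for a steeply sloping spectrum.
steep, prev = [], 0.0
for x in white:
    prev = x + 0.9 * prev
    steep.append(prev)

print(spectral_tilt(*ltas_db(white, fs)))  # near zero for white noise
print(spectral_tilt(*ltas_db(steep, fs)))  # clearly negative
```

A stressed vowel would be expected to yield a less negative (flatter) slope than its unstressed counterpart, all else being equal.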

10.2.4 Spectral expansion
Stressed sounds are articulated more clearly. For vowels, this means that the formant values will deviate further from those of a neutral vowel (schwa). The acoustic vowel triangle with corner points for /i, a, u/ in an F1-by-F2 plot will be larger for vowels produced with stress than for vowels produced without stress (‘spectral expansion’; a shrinking of the effective vowel triangle for unstressed vowels is commonly referred to as ‘spectral reduction’). Spectral expansion is expressed as the Euclidean distance of the vowel token in the F1-by-F2 vowel space from the centre of the vowel space, where the neutral vowel schwa is located (i.e. F1 = 500 Hz, F2 = 1500 Hz for a typical male voice; add 15% per formant for female voices). It is advisable to apply perceptual scaling of the formant frequencies, using Bark conversion (also for the neutral reference vowel), in order to abstract away from differences in the sensitivity of the human ear to low and high formant frequencies. Hertz-to-Bark conversion is done as in (3) by an empirically determined transformation (Traunmüller 1990), where f is the frequency in hertz:

$$\mathrm{Bark} = 7 \ln\!\left(\frac{f}{650} + \sqrt{\left(\frac{f}{650}\right)^{2} + 1}\right) \tag{3}$$

The Euclidean distance D of the vowel token (V) from the centre of the vowel space (schwa) is then computed by (4):

$$D = \sqrt{\left(\mathrm{F1}_{V} - \mathrm{F1}_{\mathrm{schwa}}\right)^{2} + \left(\mathrm{F2}_{V} - \mathrm{F2}_{\mathrm{schwa}}\right)^{2}} \tag{4}$$

Typically, the difference (in hertz or, better still, in Bark units) between the D of the stressed vowel token and that of its unstressed counterpart (in a paradigmatic comparison) is positive, indicating that the stressed vowel is further away from the centre of the vowel space. Spectral expansion has been reported as a useful correlate of stress for Dutch (Koopmans-van Beinum 1980; van Bergem 1993; Sluijter and van Heuven 1996a) and for English (Fry 1965; Sluijter et al. 1995; Sluijter and van Heuven 1996b). Although automatic discrimination between stressed and unstressed (and therefore partially reduced) tokens of the same vowels was well above chance, the discriminatory power of spectral expansion is smaller than that of either duration or intensity.

A calculation of spectral expansion and reduction might also be attempted for frication noise, that is, for fricative sounds and release bursts of stops and affricates. Frication noise is not normally analysed in terms of resonances but is characterized by the statistical properties of the entire energy distribution. The noise spectra do not normally extend to frequencies below 1 kHz. In order to be able to compare the noise spectra across voiced and voiceless sounds, it is expedient to confine the spectral analysis to a 1 to 10 kHz frequency band, thereby excluding most of the energy produced by vocal fold vibration in voiced sounds. Maniwa et al. (2009) propose that all four moments of the energy distribution be measured as correlates: the spectral mean (also known as centre of gravity), the standard deviation, the skew, and the kurtosis. Their analysis of clearly articulated (stressed) fricatives in American English shows that the combination of spectral mean and standard deviation discriminates well between fricatives of different places of articulation. We are not aware, however, of any studies that have applied the concept of spectral moments to the effects of stress.
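Equations (3) and (4) translate directly into code. The sketch below applies the Bark conversion to both the vowel token and the schwa reference before taking the Euclidean distance. The schwa reference is 500 Hz and 1500 Hz, raised by 15% per formant for female voices, as stated above. The function names and the formant values in the example are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Hertz-to-Bark conversion of equation (3)."""
    r = f_hz / 650.0
    return 7.0 * math.log(r + math.sqrt(r * r + 1.0))


def schwa_reference(female=False):
    """Centre of the F1-by-F2 space in Hz; +15% per formant for female voices."""
    f1, f2 = 500.0, 1500.0
    return (f1 * 1.15, f2 * 1.15) if female else (f1, f2)


def expansion_bark(f1_hz, f2_hz, female=False):
    """Equation (4) on Bark-scaled formants: distance of the token from schwa."""
    s1, s2 = schwa_reference(female)
    return math.hypot(hz_to_bark(f1_hz) - hz_to_bark(s1),
                      hz_to_bark(f2_hz) - hz_to_bark(s2))


# Hypothetical male /a/ tokens of the same syllable with and without word stress.
d_stressed = expansion_bark(750.0, 1300.0)
d_unstressed = expansion_bark(650.0, 1400.0)
print(round(d_stressed - d_unstressed, 2))  # positive: the stressed token is more expanded
```

Equation (3) is equivalent to 7·asinh(f/650), which offers a convenient check on the implementation.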

10.2.5 Resistance to coarticulation
One characteristic of a spectrally expanded stressed syllable is that its segments show little coarticulation with each other or with the abutting segments of preceding and following unstressed syllables. Unstressed syllables are, however, strongly influenced by an adjacent stressed syllable, in that properties of the stressed syllable are anticipated in a preceding unstressed syllable and perseverate into the following unstressed syllable (van Heuven and Dupuis 1991). Resistance to coarticulation was claimed to be an important articulatory correlate of stress in English by Browman and Goldstein (1992a) and in Lithuanian by Dogil and Williams (1999); see also Pakerys (1982, 1987). Acoustic correlates of coarticulatory resistance are most likely to be longer segment durations and larger frequency differences between onset and offset of CV and VC formant transitions in stressed syllables, when paradigmatically compared with their unstressed counterparts.

10.2.6 Rank order
The findings on acoustic correlates of word stress in English and Dutch are compatible with the suggestion traditionally made in the literature that duration of the syllable (or the vowel within it) is the most consistent and reliable correlate of stress, followed by intensity. The literature on other correlates is somewhat scarcer, but what emerges is that (flatter) spectral tilt and spectral expansion of vowels are at the bottom of the rank order, with no clear difference between them.


10.3 Acoustic correlates of sentence stress
Words that are the prosodic heads of constituents that are in focus (i.e. constituents that are presented by the speaker as introducing important information into the discourse) are pronounced with sentence stress on the syllable that carries the word stress; such words are often called ‘(nuclear pitch) accented’. Words that refer to concepts that were introduced in the (immediately) preceding context are typically pronounced without a sentence stress and have word stress only. Function words generally do not receive sentence stress. When a word is pronounced with sentence stress (nuclear or prenuclear pitch accent), the stressed syllable in that word is associated with a prominence-lending change in the rate of vocal fold vibration, causing a change in pitch often called a ‘pitch accent’ (Vanderslice and Ladefoged 1972; Pierrehumbert 1980; Pierrehumbert and Hirschberg 1990; Beckman and Edwards 1994). The fundamental frequency (f0) change may be caused by a local rise in the frequency with which the vocal folds vibrate (causing f0 to go up), by a fall of the f0, or by a combination of rise and fall. These abrupt f0 changes are typically analysed as sequences of two f0 targets, H (for high f0) and L (for low f0), where one of the targets is considered to be prominence-lending (indicated with a star in phonological analyses; see e.g. the ToBI transcription system, Beckman et al. 2005). An H* configuration (assuming preceding low f0) would then represent a rise, whereas H*L would be a rise-fall configuration, in both cases with a prominence-lending H target. Changes in f0 associated with sentence stress may differ in size and in their location in the syllable. Less-than-full-sized f0 changes are denoted by downstepped H targets (!H*). However, for accurate (automatic) classification of sentence stress, the f0 movement in normal human speech should be at least four semitones (a change in f0 of at least a major third, or roughly 25% in hertz). Normally, smaller f0 changes are not prominence-lending. Such small f0 perturbations (also called micro-intonation; Di Cristo and Hirst 1986; ’t Hart et al. 1990) are interpreted by the listener as involuntary (non-planned), automatic consequences of, among other things, the increase in transglottal pressure due to the sudden opening of the mouth during the production of a vowel. Sluijter et al. (1995) showed that the members of disyllabic English minimal stress pairs were differentiated automatically with 99% accuracy by the presence of an f0 change of at least four semitones within the confines of the stressed syllable. The f0 contrast between initial and final stress fell to chance level for words produced without sentence stress (i.e. without pitch accents), suggesting that f0 is not a correlate of word stress in English. The temporal alignment of the prominence-lending f0 change is a defining property of the sentence stress. For instance, rises that occur early in a Dutch syllable are interpreted as a sentence stress, but when late in a phrase-final syllable they are perceived as an H% boundary tone (’t Hart et al. 1990: 73).4 The alignment of the f0 changes differs characteristically between languages (e.g. Arvaniti et al. 1998 for Greek versus Ladd et al. 2000 for Dutch), and even between dialects of the same language (e.g. van Leyden and van Heuven 2006).

4 It is assumed here that ’t Hart et al.’s boundary-marking rise ‘2’ refers to the same phenomenon as the H% boundary tone (cf. Gussenhoven 2005: 139).

A secondary correlate of sentence stress has been found in temporal organization. Dutch words with sentence stress are lengthened by some 10% to 15%. Van Heuven’s (1998) experiments that manipulated focus independently of sentence stress ([+focus, −sentence stress] vs. [+focus, +sentence stress]) showed that durational effects are due to sentence stress rather than to focus; effects of focus on duration are indirect and occur via the mapping of focus to sentence stress (focus-to-accent principle; Gussenhoven 1983a; Selkirk 1984; Ladd 2008b). Early work (e.g. Eefting 1991; Sluijter and van Heuven 1995; Turk and Sawusch 1997; Turk and White 1999; Cambier-Langeveld and Turk 1999) showed that sentence stress affects more than just the syllable with word stress. For example, both syllables in bacon (English) and panda (Dutch) are longer in contrastively focused contexts where the word bears sentence stress than when it does not. Experiments that manipulated the location of sentence stress in two-word target phrases (e.g. bacon force vs. bake enforce) showed that effects were largely restricted to the word bearing sentence stress (with the exception of much smaller spill-over effects, which can occur on a syllable immediately following the stressed syllable across a word boundary). The occurrence of longer durations on both syllables in words such as bacon raised the question of whether sentence stress targets a continuous domain, corresponding perhaps to the whole word or to part of a word (e.g. a foot).
However, findings for longer words, such as Dutch geˈkakel ‘cackling’ (Eefting 1991) and English ˈpresidency and ˌcondeˈscending (Dimitrova and Turk 2012), show that sentence stress lengthens particular parts of words more than others, specifically stressed syllable(s), word-onset consonant closure intervals, and final syllable rhyme intervals. These findings suggest that words bearing sentence stress are marked durationally in two ways: (i) by lengthening their stressed syllables (primary and secondary) and (ii) by lengthening their edges, in a similar way to well-documented effects of phrase-initial and phrase-final lengthening (e.g. Wightman et al. 1992; Fougeron and Keating 1997). Additional spill-over effects that are small in magnitude (10% or less) can be observed on syllables immediately adjacent to stressed syllables. Van Bergem (1993) showed that sentence stress can cause spectral expansion of full vowels in Dutch comparable in magnitude to the spectral expansion effect of word stress. See also Summers (1987), de Jong et al. (1993), Beckman and Edwards (1992, 1994), Cho (2005), and Aylett and Turk (2006) for spectral expansion, differences in articulatory magnitudes, and resistance to coarticulation for different stress categories in English. In Germanic languages, then, sentence stress is signalled acoustically by all the properties of word stress. In addition to these properties that word stress and sentence stress share, sentence stress has prominence-lending f0 changes and some lengthening of (parts of) the word containing the sentence stress. These additional properties in particular discredit the theory that word stress is simply a reduced version of sentence stress (e.g. Chomsky and Halle 1968).
The experimental results reported also show that a large change in f0, when appropriately aligned with the segments making up the syllable, is a powerful correlate of (sentence) stress, even though there is no consistent f0 contour that accompanies all sentence stresses.


158 VINCENT J. VAN HEUVEN AND ALICE TURK

10.4 Perceptual cues of word and sentence stress

In §10.2 and §10.3, we have seen that word and sentence stress are acoustically marked by at least five different correlates: longer duration, higher (peak or mean) intensity, flatter spectral tilt, more extreme formant values, and, in the case of sentence stress, a prominence-lending change in the f0. Studies on the perceptual cues for stress employ synthetic speech or resynthesized natural speech in which typically two stress-related parameters are varied systematically. The range of variation of each parameter is representative of what is found in stressed and unstressed tokens produced in natural speech. For each parameter the range is then subdivided into an equal number of steps (e.g. five or seven). The relative perceptual strength of a parameter is quantified as the magnitude of the cross-over from stress to non-stress and as the width or steepness of the psychometric function describing the cross-over.5 When multiple experiments are constructed along the same lines, a generalized rank order of perceptual stress cues can emerge. For instance, Fry published a series of three experiments comparing the perceptual strength of vowel duration (as a baseline condition) with that of three other parameters: peak intensity (Fry 1955), f0 (Fry 1958), and vowel quality (Fry 1965).6 Fry (1955) varied the durations of V1 and V2 in synthesized tokens of English minimal stress pairs such as ˈimport–imˈport in five steps. The target words were embedded in a fixed carrier sentence Where is the accent in . . ., with sentence stress on the target; the f0 was 120 Hz throughout the sentence. The duration steps were systematically combined with five intensity differences (by amplifying V1 and at the same time attenuating V2) such that the V1–V2 difference varied between +10 and −10 dB.
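The two measures of cue strength just described, where the cross-over occurs and how wide the psychometric function is, can be illustrated with a short sketch. It assumes that the boundary is the 50% point and that the width is the distance between the 75% and 25% points, each found by linear interpolation between adjacent stimulus steps. Many studies fit a logistic function instead. The function names and response percentages are hypothetical. When the curve never reaches 75% or 25% (an incomplete cross-over), the width is returned as undefined.

```python
from typing import Optional, Sequence, Tuple


def crossing_point(steps: Sequence[float], percents: Sequence[float],
                   level: float) -> Optional[float]:
    """Stimulus value at which the response curve first reaches `level`.

    Uses linear interpolation between adjacent steps; returns None if the
    curve never reaches that level within the tested range.
    """
    for i in range(len(steps) - 1):
        x0, x1 = steps[i], steps[i + 1]
        y0, y1 = percents[i], percents[i + 1]
        if y0 != y1 and (y0 - level) * (y1 - level) <= 0:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    return None


def boundary_and_width(steps: Sequence[float], percents: Sequence[float]
                       ) -> Tuple[Optional[float], Optional[float]]:
    """Cross-over boundary (50% point) and boundary width (75%-25% distance)."""
    boundary = crossing_point(steps, percents, 50.0)
    p75 = crossing_point(steps, percents, 75.0)
    p25 = crossing_point(steps, percents, 25.0)
    if p75 is None or p25 is None:
        # Incomplete cross-over: the width is not defined.
        return boundary, None
    return boundary, abs(p25 - p75)


# Hypothetical percentages of initial-stress responses over five stimulus steps.
steep = boundary_and_width([1, 2, 3, 4, 5], [95, 85, 15, 5, 2])
shallow = boundary_and_width([1, 2, 3, 4, 5], [60, 55, 50, 45, 40])
print(steep)    # boundary 2.5, width about 0.71 steps
print(shallow)  # boundary 3.0, width None
```

Under these assumptions, a strong cue such as duration gives a narrow width. A weak cue gives a wide width or an undefined one.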
Figure 10.1a presents perceived initial stress for the five duration steps (averaged over words and intensity steps) and for the five intensity steps (averaged over words and duration ratios). The results show a cross-over from stress perceived on the first syllable to the second syllable. The cross-over takes place between duration steps 2 and 3 and is both steep (within one stimulus step) and convincing (≥ 75% agreement on either side of the boundary). In contrast, the intensity difference is inconsequential: although there is a gentle trend for more initial stress to be perceived as V1 has more decibels than V2, the difference is limited to some 20 percentage points; the boundary width, which can only be estimated by extrapolation, would be some 15 times larger than for duration. This shows that duration outweighs intensity in Fry’s experiment roughly by a factor of 15. See Turk and Sawusch (1996) for similar findings. Figure 10.1b shows the results of a similar experiment run by Sluijter et al. (1997) for a single Dutch minimal stress pair, the reiterant non-word nana. The results are the same as in English. Van Heuven (2014, 2018) showed that manipulating the duration of a consonant was largely inconsequential for stress perception in disyllabic Dutch reiterant non-words.

5 The width (or steepness) is ill-defined if the cross-over from one percept to the other is incomplete.

6 Because the cue value of vowel quality was very weak, Fry (1965) limited the range of duration variation severely relative to the earlier two experiments.


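[Figure 10.1 (plot not reproduced). Panels (a) and (b); vertical axis: percent initial stress perceived (0–100); horizontal axis: intensity difference V1–V2 (dB) and duration ratio V1/V2; curves for Duration and Intensity. Step values: −10 to +10 dB with duration ratios 0.25 to 2.25 (five steps, panel a), and −3 to +3 dB with duration ratios 0.47 to 1.35 (seven steps, panel b).]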

Figure 10.1 Initial stress perceived (%) as a function of intensity difference between V1 and V2 (in dB) and of duration ratio V1 ÷ V2 in minimal stress pairs (a) in English, after Fry (1955), and (b) in Dutch, after van Heuven and Sluijter (1996).

An almost complete cross-over from initial to final stress perception was achieved nevertheless by shortening or lengthening either the onset or coda C in the first syllable by 50%, but only if the syllable contained a short vowel. Consonant changes in the second (final) syllable, or in syllables with long vowels, had no perceptual effects. Sluijter et al. (1997) showed that intensity differences become (much) more important as perceptual stress cues if the energy differences are concentrated in the higher frequency bands (above 500 Hz), which is tantamount to saying that a flatter spectral slope cues stress. Under normal listening conditions vowel duration remained the stronger cue, but manipulating the spectral slope became almost as effective when stimuli were presented with a lot of reverberation (so that the segment durations were poorly defined). Fry’s (1965) results indicate that spectral vowel reduction was only a weak stress cue in English noun–verb pairs (contract, digest, object, subject), where stress was less likely to be perceived on the syllable with reduced vowel quality; the tendency was somewhat stronger when the vowel quality was reduced in the F2 dimension (backness and rounding) than in the F1 dimension (height), and was strongest when both quality dimensions were affected simultaneously. The effect of vowel quality was small and did not yield a convincing cross-over: the percentage of initial-stress responses varied between 45% and 60%. The effect of vowel duration was clearly much stronger. Even with the smaller range of duration variation adopted in this experiment, a convincing cross-over was obtained spanning more than 50 percentage points. Fry’s results were confirmed for Dutch in a more elaborate study by van Heuven and de Jonge (2011). Vowel reduction was taken as a cue for non-stress only when the duration ratio was ambiguous between initial and final stress.
Fry (1958) found that f0 changes were stronger perceptual stress cues than duration changes (see van Heuven 2018 for a detailed analysis of this experiment and a similar one by Bolinger 1958). For Dutch, a properly timed pitch change will always attract the perception of stress. This cue cannot therefore be neutralized by any combination of cues suggesting stress on a different syllable (van Katwijk 1974; see also van Heuven 2018). The upshot of the experiments on English and Dutch is that f0 provides a very strong cue to stress perception, overriding all (combinations of) other cues, provided the f0 change is properly aligned.


An appropriately aligned pitch change is the hallmark of sentence stress. The fact that it is exclusively a property of sentence stress (i.e. not of word stress) explains why the effects of the f0 change cannot be counteracted by other cues. The overall conclusion of this section is that the strength of the acoustic correlates of stress and the strength of their perceptual cue values do not correlate. This is for two reasons. First, the location of an f0 change is a strong correlate of stress in speech production, but it can only yield reliable automatic detection if the f0 change exceeds a threshold of three to four semitones and if it is appropriately aligned with the segmental structure. When words do not receive sentence stress, the f0 change is no longer a reliable correlate. A smaller f0 change may still be effective as a cue for sentence stress as long as it is noticeably larger than the f0 movements associated with micro-intonation (see §10.3). A change from 97 to 104 Hz (roughly one semitone) was enough to evoke final-stress perception, while the reverse change of the same magnitude yielded initial stress (Fry 1958, experiment 1). Therefore, f0 change may be perceptually the strongest cue, but it is acoustically unreliable. Second, the human listener does not rely on uniform intensity differences between stressed and unstressed syllables. This probably makes intensity the weakest perceptual cue of all, even though it is acoustically quite reliable.7 Differences in vowel duration are both perceptually strong and acoustically highly reliable, for both word stress and sentence stress.
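The semitone figures used in this chapter can be checked with a short sketch. These are the four-semitone threshold (about 26% in hertz) and Fry's 97 to 104 Hz change (about 1.2 semitones). The interval in semitones is 12 times the base-2 logarithm of the frequency ratio. The sketch also applies a simple decision rule: the stressed syllable is the first one whose within-syllable f0 change reaches four semitones. The rule, the function names and the (start, end) f0 pairs are illustrative assumptions, loosely modelled on the criterion attributed to Sluijter et al. (1995). This is not their procedure. It also ignores alignment, which the text says is equally necessary.

```python
import math
from typing import Optional, Sequence, Tuple


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed f0 interval in semitones: 12 * log2(f_to / f_from)."""
    if f_from_hz <= 0 or f_to_hz <= 0:
        raise ValueError("f0 values must be positive")
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def accented_syllable(f0_changes_hz: Sequence[Tuple[float, float]],
                      threshold_st: float = 4.0) -> Optional[int]:
    """Index of the first syllable whose f0 change reaches the threshold.

    Each item is a hypothetical (start_hz, end_hz) pair measured within one
    syllable. Returns None when no syllable has a large enough change, as
    for a word spoken without sentence stress.
    """
    for index, (start_hz, end_hz) in enumerate(f0_changes_hz):
        if abs(semitones(start_hz, end_hz)) >= threshold_st:
            return index
    return None


print(round(semitones(97, 104), 2))   # Fry's small change: about 1.2 st
print(round(2 ** (4 / 12), 3))        # a 4-semitone rise multiplies f0 by about 1.26
print(accented_syllable([(110, 140), (120, 115)]))  # first syllable: rise of about 4.2 st
print(accented_syllable([(97, 104), (104, 100)]))   # no syllable reaches 4 st
```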

10.5 Cross-linguistic differences in phonetic marking of stress

There has been some speculation on whether all languages that use the linguistic parameter of stress also use the same correlates, with the same order of relative importance, both as acoustic correlates and as cues to stress perception. The general feeling is that different correlates (and different perceptual cues) are employed depending on the structure of the language under analysis. In this section we will discuss two sets of differences between languages and their potential consequences for stress marking. The first set of differences concerns the type of stress system a language employs. The second concerns the extent to which a language also exploits stress parameters for other linguistic contrasts.

7 The relative unresponsiveness of the human hearing mechanism to differences in intensity in a linguistic context has been known for over a century. The first to comment on this phenomenon was Saran (1907). See also Mol and Uhlenbeck (1956: 205–213, 1957: 346) and Bolinger (1958: 114).

10.5.1 Contrastive versus demarcative stress

It seems reasonable to assume that languages with fixed stress have a smaller need for strongly marked stress positions than languages in which the position of the stressed syllable varies from word to word. In the latter type, the position of the stress within the word is a potentially contrastive property, whereas in the former type words are never distinguished by the position of the stress, which is the same for all the words in the language.8 We would predict, therefore, that the size of the f0 movements does not vary as a function of the type of word-stress system of the language, but that in fixed-stress languages the difference between stressed and unstressed syllables in non-focused words is less clearly marked along all the non-f0 parameters correlating with word stress. There is some evidence that the basic prediction is correct. Dogil and Williams (1999) presented a comparative study of Polish (fixed penultimate stress) and German (quantity-sensitive plus lexical stress) stress marking, and concluded that stress position is less clearly marked in Polish. Similar results were found more recently in a strictly controlled cross-linguistic study of Spanish and Greek (with contrastive stress) versus Hungarian (fixed initial stress) and Turkish (fixed final stress) by Vogel et al. (2016). Their results show that the same set of acoustic stress parameters (applied in the same manner across the four languages) affords good to excellent automatic classification of stressed and unstressed syllables at the word level for the two contrastive-stress languages but not for the fixed-stress languages.

10.5.2 Functional load hypothesis

Berinstein (1979) was the first to formulate the Functional Load Hypothesis (FLH) of stress marking. The FLH predicts that stress parameters will drop to a lower rank in the hierarchy of stress cues when they are also employed elsewhere in the phonology of the language. For instance, if a language has a length contrast in the vowel system, vowel duration, which is normally a strong cue for stress, can no longer function effectively in the signalling of stress. Berinstein (1979) is often quoted in support of the FLH (e.g. Cutler 2005). In fact, Berinstein’s claim is contradicted by her own results. Languages with long versus short vowels (English, K’ekchi) were found to exploit duration as a stress cue as (in)effectively as similar languages without a vowel length contrast (Spanish, Kaqchikel). For a detailed analysis of Berinstein’s results, see van Heuven (2018). Similarly, Vogel et al. (2016) compared the strength of correlates of word and sentence stress in Hungarian, Spanish, Greek, and Turkish. For Hungarian, which has a vowel length contrast that is lacking in the other three languages, the FLH predicts a lower rank of duration as a stress cue, but this was not found. This null result was confirmed by Lunden et al. (2017), who found no difference in the use of duration as a stress cue between languages with and without segmental length contrasts in their database of 140 languages. These results suggest that the FLH by itself does not determine the overall ranking of particular stress cues in individual languages. There is some evidence, however, that functional load may be involved in determining the magnitudes of effects in some contexts in some languages, preventing the use of stress correlates in production from disrupting the use of the same acoustic correlates in signalling lexical contrast.
8 The assumption is that the word-boundary cue imparted by fixed stress is highly redundant and can be dispensed with even in noisy listening conditions, whereas contrastive stress provides vital information to the word recognition process (see Cutler 2005 for a detailed discussion of the role of stress in the word recognition process).

For example, Swedish is a language with phonemic vowel length distinctions; according to the FLH this language would be expected to make little use of duration to signal stress. However, Swedish does use duration to signal sentence stress, but only for phonemically long vowels (Heldner and Strangert 2001). In this way, phonemic contrasts are maintained. The FLH would therefore need to be weakened to accommodate these findings. Although there seems little support for the original, strong version of the FLH as it relates to stress versus segmental quantity correlates, the situation may well be different when stress-related parameters are in competition with f0 contrasts. What, for instance, if a language has both stress and lexical tone? In such cases, it might be more difficult for the listener to disentangle the various cues for the competing contrasts. Potisuk et al. (1996) investigated the acoustic correlates of sentence stress in Thai, a language with five different lexical tones and a vowel length contrast. According to the FLH, f0 should not be a high-ranking correlate of (sentence) stress, given that f0 is the primary correlate of lexical tone, and duration should not be an important stress cue since it is already implicated in the vowel length contrast. Automatic classification of syllables as stressed versus unstressed was largely unsuccessful when based on f0, while intensity was not significantly affected by stress. Duration proved the strongest stress correlate, yielding 99% correct stress decisions. These results, then, are in line with the idea developed above that stress parameters can be used simultaneously in segmental quantity and stress contrasts but not in simultaneous stress and tone contrasts. This was confirmed by Remijsen (2002) for Samate Ma’ya. This Papuan language has both lexical tone and stress, but does not have a vowel length contrast. Acoustic correlates of stress were the f0 contour, vowel quality (expansion/reduction), loudness (intensity weighted by frequency band), and duration.
Remijsen’s results reveal a perfect inverse relationship between the parameters’ positions in the rank orders of cues for stress and tone.9 The original FLH idea, as formulated by Berinstein (1979), Hayes (1995), and Potisuk et al. (1996), was that stress correlates cannot be effectively used if they are also involved in lexical contrasts, whether tonal or segmental in nature. This, then, would seem to be too strong a generalization.10 The FLH appears to make sense only as long as parameters are in competition for stress versus lexical tone contrasts.

9 Since Mandarin Chinese is a tone language, the FLH predicts an avoidance of the use of f0 in signalling focus. However, it does use expanded f0 range as a correlate of sentence stress marking focus (Xu 1999). Fundamental frequency range and tone shape therefore seem to operate as independent parameters. Similarly, f0 is used in Germanic languages to signal sentence stress as well as boundary tone without competition because tone shape and alignment are separate parameters.

10 In a recent meta-analysis, Lunden et al. (2017) presented results from a database of reported stress correlates and use of contrastive duration for 140 languages, and found no support for the FLH.

10.6 Conclusion

In this chapter we reviewed the acoustic correlates of word and sentence stress (§10.2 and §10.3, respectively), drawing mainly on studies done on European languages. We concentrated on the marking of primary stress at both levels, leaving the marking of secondary and lower stress levels largely untouched. In the rank order of stress correlates that emerged, the effects of stress are most reliably seen in (relatively) longer vowel duration, followed by greater intensity, more spectral expansion, and flatter spectral tilt. A change in f0 does not reliably correlate with word stress, but, if appropriately timed and larger than three to four semitones, is a unique and highly reliable marker of sentence stress. In §10.4 we examined the perceptual cues for stress. It was found that an appropriately timed change in f0 is the strongest cue for stress, such that it can counteract any (combination of) other cues suggesting stress on a different syllable. We argue that this is because the f0 cue is the unique marker of sentence stress, and sentence stress outranks word stress. The rank order of acoustic correlates of stress is therefore not necessarily the same as the order of importance of perceptual cues. We interpret the findings as evidence suggesting that word and sentence stress are different phenomena with different communicative functions, rather than that word stresses are just lower degrees of sentence stress. In §10.5 we asked whether stress is cued by the same acoustic parameters in the same order of importance in all languages. The available data suggest that, overall, stress is acoustically less clearly marked in languages with fixed stress than in languages in which the stress position varies between words. No cross-linguistic support was found for the claim that stress cues become less reliable or less salient when they are implicated in segmental length contrasts. However, a weaker version of this FLH may remain viable, since (sentence) stress and (lexical) tone do draw on partially shared prosodic parameters.

Appendix

Measuring correlates of stress using Praat speech processing software

The program Praat (Boersma and Weenink 1996) can be downloaded at no cost from www.praat.org. No scripting is assumed here. Results of measurements can be copy-pasted from the information window to a spreadsheet.

Measuring duration
D1. Read the sound file into Praat. Select the Sound object and in the editor drag the portion of the waveform that corresponds exactly to the target vowel or consonant.
D2. Click Query > Get selection length for target duration.
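Once durations have been read off with D2, two quantities used in this chapter follow by simple arithmetic. The first is the V1/V2 duration ratio on the horizontal axis of Figure 10.1. The second is percentage lengthening under sentence stress (some 10% to 15% for Dutch, §10.3). This is a minimal sketch; the function names and the durations in seconds are hypothetical.

```python
from statistics import mean
from typing import Sequence


def duration_ratio(v1_s: float, v2_s: float) -> float:
    """V1/V2 duration ratio (both durations in seconds)."""
    return v1_s / v2_s


def mean_percent_lengthening(accented_s: Sequence[float],
                             unaccented_s: Sequence[float]) -> float:
    """Percentage by which the mean accented duration exceeds the mean unaccented one."""
    base = mean(unaccented_s)
    return 100.0 * (mean(accented_s) - base) / base


# Hypothetical selection lengths (s) for the same vowel with and without sentence stress.
print(round(duration_ratio(0.150, 0.100), 2))                            # 1.5
print(round(mean_percent_lengthening([0.230, 0.240], [0.200, 0.210]), 1))  # about 14.6
```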

Measuring intensity
I1. As D1.
I2. Perform step P1 (under ‘Measuring pitch correlates’ below) to set an appropriate window size.
I3. Under Intensity, click Get intensity for mean intensity.
I4. Click Get maximum intensity for peak intensity.
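If frame-by-frame intensity values (in dB) are exported from Praat, mean intensity, peak intensity and the V1–V2 difference manipulated in Fry's design can be recomputed offline. This sketch assumes energy averaging: dB values are converted to energies, averaged, and converted back. We have not verified which averaging the editor's Get intensity query applies, so the values may differ slightly from Praat's. The function names and frame values are hypothetical.

```python
import math
from typing import Sequence


def energy_mean_db(frames_db: Sequence[float]) -> float:
    """Mean intensity of a stretch, averaging energies rather than dB values."""
    if not frames_db:
        raise ValueError("no intensity frames")
    mean_energy = sum(10.0 ** (db / 10.0) for db in frames_db) / len(frames_db)
    return 10.0 * math.log10(mean_energy)


def v1_minus_v2_db(v1_frames_db: Sequence[float], v2_frames_db: Sequence[float],
                   use_peak: bool = False) -> float:
    """Intensity difference V1 - V2 in dB, from peak or energy-averaged values."""
    if use_peak:
        return max(v1_frames_db) - max(v2_frames_db)
    return energy_mean_db(v1_frames_db) - energy_mean_db(v2_frames_db)


# Hypothetical intensity frames (dB) for the two vowels of a disyllable.
v1 = [68.0, 72.0, 74.0, 71.0]
v2 = [65.0, 67.0, 66.0]
print(round(energy_mean_db([60.0, 70.0]), 1))  # 67.4, not the arithmetic mean 65.0
print(v1_minus_v2_db(v1, v2, use_peak=True))    # 7.0
```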


Measuring spectral tilt
T1. As D1, then under File click Extract selected sound (time from 0).
T2. In the Objects window click Analyse spectrum > To Ltas…, Bandwidth = 100 Hz, OK.
T3. In the Objects window click Compute trend line…, Frequency range from 50 to 4,000 Hz, OK.
T4. In the Objects window click Query > Report spectral tilt, 0 to 4,000 Hz, linear, robust, OK.
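Steps T2–T4 fit a line to the long-term average spectrum and report its slope. Some readers may export the Ltas bin values (frequency, level in dB) and compute the slope offline. For them, a plain least-squares version is sketched here. It is not the 'robust' fit selected in T4, and the dB-per-kHz unit is our choice, so the numbers will not match Praat's report exactly. The spectra are hypothetical. A less negative slope means a flatter tilt, which is the stress correlate discussed in the chapter.

```python
from typing import Sequence


def spectral_tilt_db_per_khz(freqs_hz: Sequence[float], levels_db: Sequence[float],
                             fmin_hz: float = 0.0, fmax_hz: float = 4000.0) -> float:
    """Ordinary least-squares slope of level (dB) against frequency (kHz).

    Only bins between fmin_hz and fmax_hz are used.
    """
    points = [(f / 1000.0, level) for f, level in zip(freqs_hz, levels_db)
              if fmin_hz <= f <= fmax_hz]
    if len(points) < 2:
        raise ValueError("need at least two spectral bins in range")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all bins have the same frequency")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical long-term spectra with 100-Hz bins (cf. step T2).
freqs = [50.0 + 100.0 * i for i in range(60)]          # 50 ... 5950 Hz
stressed = [70.0 - 4.0 * f / 1000.0 for f in freqs]    # flatter tilt
unstressed = [70.0 - 9.0 * f / 1000.0 for f in freqs]  # steeper tilt
print(round(spectral_tilt_db_per_khz(freqs, stressed), 2))    # -4.0
print(round(spectral_tilt_db_per_khz(freqs, unstressed), 2))  # -9.0
```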

Measuring formants F1, F2 (for vowels and sonorant consonants)
F1. As D1. Then set the spectrogram window by checking Show spectrogram (under Spectrum); view range from 0 to 10,000 Hz, Window length = 0.005 s, Dynamic range = 40 dB, OK.
F2. In the spectrogram window, drag the (spectrally stable) portion of the waveform you want to analyse.
F3. Under Formant, check Show formants. Set parameters Maximum formant = 5,000 Hz, Number of formants = 5, Window Length = 0.025 s, Dynamic Range = 30 dB, Dot size = 1 mm, OK.
F4. Visually check that the formant tracks coincide with the energy bands in the spectrogram (if not, adjust the settings by changing Maximum formant and/or Number of formants).
F5. Under Formant click Get first formant (or press F1 on keyboard), Get second formant (or press F2).

Measuring noise spectra (for fricatives, stops, and affricates)
N1. Perform T1 for the portion of the waveform that corresponds to the noise burst you want to analyse.
N2. In the Objects window click Filter…, From 1,000 Hz, To 10,000 Hz, Smoothing 100 Hz, OK.
N3. In the Objects window click Analyse spectrum > To spectrum (Fast), OK.
N4. Query Get centre of gravity…, Power = 2; Get standard deviation…, Power = 2; Get skewness…, Power = 2; Get kurtosis…, Power = 2.
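The four queries in N4 are moments of the spectrum. Each frequency is weighted by the spectral magnitude raised to the chosen power; with power 2, the weighting is by energy. The sketch uses the standard moment definitions. Skewness is the third central moment divided by the second raised to 1.5. Kurtosis is the fourth central moment divided by the squared second, minus 3. We believe these match Praat's documented definitions, but this is an assumption to check against the Praat manual. Input bins would come from the filtered spectrum of N2–N3; the example spectrum is hypothetical.

```python
import math
from typing import Sequence, Tuple


def spectral_moments(freqs_hz: Sequence[float], magnitudes: Sequence[float],
                     power: float = 2.0) -> Tuple[float, float, float, float]:
    """Centre of gravity, standard deviation, skewness and excess kurtosis.

    Each frequency bin is weighted by |magnitude| ** power.
    """
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("empty spectrum")
    cog = sum(f * w for f, w in zip(freqs_hz, weights)) / total

    def central(k: int) -> float:
        # k-th central moment around the centre of gravity
        return sum(((f - cog) ** k) * w for f, w in zip(freqs_hz, weights)) / total

    m2 = central(2)
    if m2 == 0:
        raise ValueError("spectrum has no spread")
    sd = math.sqrt(m2)
    skewness = central(3) / m2 ** 1.5
    kurtosis = central(4) / m2 ** 2 - 3.0
    return cog, sd, skewness, kurtosis


# Hypothetical two-bin spectrum: equal magnitudes at 2 and 4 kHz.
print(spectral_moments([2000.0, 4000.0], [1.0, 1.0]))  # (3000.0, 1000.0, 0.0, -2.0)
```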

Measuring pitch correlates
P1. Read the sound file into Praat. Select the Sound object in the Praat objects list. Click View and Edit. Ask for a pitch display by checking the box Show pitch (under Pitch). Adjust settings to the speaker’s voice: click Pitch settings. . . > Pitch range = 75 Hz, 250 Hz (for a male voice; double these frequencies for a female voice), Unit = Hertz, Analysis method = Autocorrelation, Drawing method = Speckles.
P2. In the editor drag the time interval in which you wish to locate a pitch maximum and minimum. Click Query > List (or press F5).
P3. In the listing delete all lines except the ones with the f0 maximum and minimum. Copy and paste the time-frequency coordinates. The minimum precedes the maximum for an f0 rise, but follows it for an f0 fall. Note: complex f0 changes (e.g. rise-fall) are analysed separately for the rise and fall portions; the time-frequency coordinates of the maximum will be the same for the rise and the fall. Hertz values can be converted offline to either semitones or (better still) equivalent rectangular bandwidth (ERB) units.11
P4. In the waveform, locate the vowel onset (or some other segmental landmark you want to use for your alignment analysis) of the target syllable. Query > Get cursor. Store the time coordinate of the segmental landmark (this will be needed later offline to measure the alignment of the pitch change).
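Steps P3 and P4 give three things for each rise or fall: the time and frequency of the f0 minimum, the same for the maximum, and the time of a segmental landmark. The offline computations they mention can be sketched as follows. The sketch computes excursion size in semitones and in ERB units, plus the timing of both turning points relative to the landmark. The appendix does not specify an ERB formula. We assume the Glasberg and Moore (1990) ERB-rate formula, 21.4 log10(1 + 0.00437 f); other ERB formulas exist and give slightly different values. Function names and measurements are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate of a frequency, using the Glasberg and Moore (1990) formula (an assumption)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def describe_f0_change(t_min_s: float, f_min_hz: float,
                       t_max_s: float, f_max_hz: float,
                       landmark_s: float) -> dict:
    """Summarize one rise or fall from the P3 minimum/maximum and the P4 landmark.

    A rise-fall is analysed as two calls (rise, then fall) sharing the maximum.
    Sizes are positive magnitudes for both rises and falls; alignment values
    are in ms relative to the landmark (negative = before it).
    """
    return {
        "direction": "rise" if t_min_s < t_max_s else "fall",
        "size_st": 12.0 * math.log2(f_max_hz / f_min_hz),
        "size_erb": hz_to_erb_rate(f_max_hz) - hz_to_erb_rate(f_min_hz),
        "min_alignment_ms": 1000.0 * (t_min_s - landmark_s),
        "max_alignment_ms": 1000.0 * (t_max_s - landmark_s),
    }


# Hypothetical measurements: minimum at 0.412 s (98 Hz), maximum at 0.530 s (131 Hz),
# vowel onset of the target syllable at 0.450 s.
rise = describe_f0_change(0.412, 98.0, 0.530, 131.0, 0.450)
print(rise["direction"], round(rise["size_st"], 1), round(rise["max_alignment_ms"]))
```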

11 The ERB scale is preferred when the f0 interval is the correlate of perceived prominence (Hermes and van Gestel 1991). The semitone scale is more appropriate for f0 as the correlate of lexical or boundary tones (Nolan 2003).

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 11

Speech Rhythm and Timing

Laurence White and Zofia Malisz

11.1 Introduction

Rhythm is a temporal phenomenon, but how far speech rhythm and speech timing are commensurable is a perennial debate. Distinct prosodic perspectives echo two Ancient Greek conceptions of time. First, chronos (χρόνος) signified time’s linear flow, measured in seconds, days, years, and so on. Temporal linearity implicitly informs much prosody research, wherein phonetic events are interpreted with respect to external clocks and surface timing patterns are expressible through quantitative measures such as milliseconds. Second and by contrast, kairos (καιρός) was a more subjective notion of time as providing occasions for action, situating events in the context of their prompting circumstances. Kairos was invoked in Greek rhetoric: what is spoken must be appropriate to the particular moment and audience. Rhythmic approaches to speech that might be broadly classified as ‘dynamical’ reflect, to varying degrees, this view of timing as emerging from the intrinsic affordances occasioned by spoken interaction. Interpretation of observable timing patterns is complicated by the fact that vowel and consonant durations are only approximate indicators of the temporal coordination of articulatory gestures, although there is evidence that speakers do manipulate local surface durations for communicative goals (e.g. signalling boundaries and phonological length; reviewed by Turk and Shattuck-Hufnagel 2014). Furthermore, perception of speech’s temporal flow is not wholly linear. For example, Morton et al. (1976) found that a syllable’s perceived moment of occurrence (‘P-centre’) is affected by the nature of its sub-constituents. Moreover, variation in speech rate can affect the perception of a syllable’s presence or absence (Dilley and Pitt 2010) and the placement of prosodic boundaries (Reinisch et al. 2011). Thus, surface timing patterns may have non-linear relationships both to underlying control structures and to listeners’ perceptions of prominence and grouping.
More generally, the term ‘speech rhythm’, without qualification, can cause potentially serious misunderstandings because ‘“rhythm” carries with it implicit assumptions about the way speech works, and about how (if at all) it involves periodicity’ (Turk and Shattuck-Hufnagel 2013: 93). Various definitions of rhythm applied to speech, and the timing thereof, are considered by Turk and Shattuck-Hufnagel: periodicity (surface, underlying, perceptual), phonological/metrical structure, and surface timing patterns. In this chapter we do not attempt a single definition of speech rhythm but review some of these diverse perspectives and consider whether it is appropriate to characterize the speech signal as rhythmical (for other definitions, see e.g. Allen 1975; Cummins and Port 1998; Gibbon 2006; Wagner 2010; Nolan and Jeon 2014). With such caveats in mind, the remainder of this section reviews four aspects of speech that may influence perceptions of rhythmicity: periodicity, alternation between strong and weak elements, hierarchical coordination of timing, and articulation rate. §11.2 discusses attempts to derive quantitative indices of rhythm typology. §11.3 contrasts two approaches to speech timing, one based on linguistic structure and localized lengthening effects and the other on hierarchically coupled metrical units, and §11.4 considers the prospects for a synthesis of such approaches. We do not attempt a definitive summary of empirical work on speech rhythm and timing (for reviews, see e.g. Klatt 1976; Arvaniti 2009; Fletcher 2010; White 2014) but aim to highlight some key theoretical concepts and debates informing such research.

SPEECH RHYTHM AND TIMING 167

11.1.1 Periodicity in surface timing

Before technology made large-scale analyses of acoustic data tractable, descriptions of speech timing were often impressionistic, with terminology arrogated from traditional poetics. In particular, the assumption that metrical structure imposes global timing constraints has a long history (Steele 1779). A specific timing constraint that proved pervasively influential was ‘isochrony’, the periodic recurrence of equally timed metrical units such as syllables or stress-delimited feet. Classe (1939), while maintaining that isochrony is an underlying principle of English speech, concluded from his data that ‘normal speech [is] on the whole, rather irregular and arrhythmic’ (p. 89), due to variation in the syllable number and phonetic composition of stress-delimited phrasal groups, as well as to grammatical structure. Pike (1945) contrasted typical ‘stress-timed’ English rhythm and ‘syllable-timed’ Spanish rhythms, while asserting that stylistic variation could produce ‘syllable-timed’ rhythm in English. Abercrombie (1967) formalized ‘rhythm class’ typology, asserting that all languages were either syllable timed (e.g. French, Telugu, Yoruba) or stress timed (e.g. Arabic, English, Russian). Isochronous mora-timing has been claimed for Japanese, among other languages (Ladefoged 1975). The mora is a subsyllabic constituent (e.g. consonant plus short vowel), with somewhat language-specific definitions, and is important in Japanese poetics (e.g. haiku comprise 17 morae), where syllables with long vowels or consonantal rhymes constitute two morae. Apparently by extension from poetry (cf. syllables in French and Spanish, stress feet in English and German), spoken Japanese morae were assumed to be isochronous (e.g. Bloch 1950). Some data suggested approximate mora-timing but with deviations due to the mora’s internal structure (Han 1962) and utterance position (longer morae phrase-finally; Kaiki and Sagisaka 1992).
Warner and Arai’s (2001) review concluded that Japanese mora duration is not isochronous, and that relatively regular mora-timing—when observed—is due to contingent features such as syllable phonotactics.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

168 LAURENCE WHITE AND ZOFIA MALISZ

The ‘rhythm class’ concept persisted despite much evidence (e.g. Bertinetto 1989; Eriksson 1991) demonstrating the lack of isochrony of syllables or stress-delimited feet in surface speech timing. In a proleptic challenge to the syllable-timing hypothesis, Gili Gaya (1940; cited in Pointon 1980) observed that Spanish syllable duration is strongly affected by structural complexity, stress, and utterance position. Pointon (1980), reviewing Spanish timing studies, concluded that syllable duration is determined bottom-up—what he called an ‘antirhythmic’ or ‘segment-timed’ pattern—and found further support in a study of six Spanish speakers (Pointon 1995; see also Hoequist 1983, contra Spanish syllable-timing). Roach (1982) found similar correlations between interstress interval duration and syllable counts in Abercrombie’s (1967) ‘stress-timed’ and ‘syllable-timed’ languages, with variance measures of syllable and interstress interval duration failing to support the categorical typology. Although the elementary design and single speaker per language limit interpretation of Roach’s study, it proved influential for the use of variance measures of interval duration, later adopted in ‘rhythm metrics’, and for challenging the rhythm class hypothesis.

11.1.2 Contrastive rhythm

Brown (1911) distinguished ‘temporal rhythm’—the regular recurrence of structural elements (here termed ‘periodicity’)—from ‘accentual rhythm’, the relative prominence of certain structural elements (for similar distinctions, see, inter alia: Allen 1975; Nolan and Jeon 2014; White 2014). As discussed above, speech usually lacks periodicity in surface timing, but many languages contrast stronger and weaker elements through lexical stress (relative within-word syllable prominence) and phrasal accent (relative within-phrase word prominence, also called ‘sentence stress’). Here we use the term ‘contrastive rhythm’ rather than ‘accentual rhythm’ (to avoid confusion with the nature of the contrast: lexical stress or phrasal accent). Dauer (1983), in an influential discussion, elaborated upon Roach’s (1982) suggestion that cross-linguistic rhythmic differences may inhere in structural regularities such as vowel reduction and syllable complexity, and their relation with syllable stress. In particular, Dauer observed that the phonetic realization of stressed syllables and their (lexically or syntactically determined) distribution conspire to make (for example) English and Spanish seem rhythmically distinct. Most Spanish syllables have consonant–vowel (CV) structure, whereas the predominant English syllable structure is CVC and up to three onset consonants and four coda consonants are permissible. Moreover, the association between lexical stress and syllable weight (related to coda cluster complexity) is stronger for English, and also for Arabic and Thai, than Spanish. Additionally, unstressed syllables undergo minimal vowel reduction in Spanish, but most English unstressed syllables contain a reduced vowel, predominantly schwa (Dauer noted unstressed vowel reduction also for Swedish and Russian). All these patterns converge towards a high durational contrast between English strong and weak syllables. 
Furthermore, English stressed syllables tend to recur with relative regularity, particularly given the existence of secondary lexical stress, while long unstressed syllable sequences are more likely in Greek, Italian, and Spanish (Dauer 1983). Structural trends do not converge onto a high–low contrast gradient for all languages, however: for example, Polish has high syllable complexity but limited vowel reduction, while Catalan has low syllable complexity but significant vowel reduction (Nespor 1990).

In part due to the recent status of English as a scientific lingua franca, analytical concepts derived from English linguistics have sometimes guided the characterization of other languages. Thus, much early comparative field linguistics had a guiding assumption that ‘stress’ was universally meaningful. In fact, English—particularly standard southern British English—seems a conspicuously ‘high-contrast’ language in terms of lexical stress and also phrasal accent. Comparisons of global timing properties between selected languages often show English to have the highest variation in vowel duration (e.g. Ramus et al. 1999; White and Mattys 2007a). In Nolan and Asu’s (2009) terminology, English has a markedly steep ‘prominence gradient’. Even other Germanic languages, such as Dutch, have sparser occurrence of reduced vowels in unstressed syllables (Cutler and van Donselaar 2001). However, stress is—manifestly—linguistically important in many languages lacking such marked stress cues as English: thus, Dauer (1983) observed that while stress-related duration contrasts are substantially greater in English than Spanish, combinations of cues make stressed syllables in Spanish, Greek, or Italian salient to native listeners (in contrast with French, which lacks lexical stress). Indeed, Cumming (2011b) suggested that languages may appear less rhythmically distinct once prosodic perceptual integration is taken into account (see also Arvaniti 2009). On the other hand, it is also becoming clear that many languages may lack metrically contrasting elements (e.g. Korean: Jun 2005b; Ambonese Malay: Maskikit-Essed and Gussenhoven 2016; see Nolan and Jeon 2014 for references questioning the status of stress in several languages). Tabain et al. (2014) suggested the term ‘stress ghosting’ to highlight how Germanic language speakers’ native intuitions may induce stress perception in languages unfamiliar to them. 
Stress ghosting arises due to misinterpretation of phonetic or structural patterns that would be associated with prominence in languages—such as Dutch, English, or German— with unambiguous lexical stress contrast (e.g. English ˈinsight vs. inˈcite). By contrast, native speakers of languages without variable stress placement as a cue to lexical identity have been characterized as having ‘stress deafness’ (Dupoux et al. 2001). Specifically, speakers of languages that either lack lexical stress (e.g. French) or have non-contrastive stress (e.g. Finnish or Hungarian fixed word-initial stress) do not appear to retain stress patterns of nonwords in short-term memory, suggesting that their phonological representations do not include stress (Peperkamp and Dupoux 2002; see also Rahmani et al. 2015). Thus, the notion of contrastive rhythm, while pertinent for some widely studied linguistic systems, may be inapplicable for many languages (Nolan and Jeon 2014).

11.1.3 Hierarchical timing

Unlike typologies based on isochrony of morae, syllables, or stressed syllables (reviewed above), hierarchical timing approaches do not identify a single privileged unit strictly governing any language’s surface timing. They describe relative timing dependencies between at least two hierarchically nested constituents—for example, the syllable and the stress-delimited foot (e.g. O’Dell and Nieminen 1999). The syllable (or syllable-sized unit, e.g. vowel-to-vowel interval) is regarded as a basic cyclic event in speech perception or production (Fowler 1983) and the smallest beat-induction speech unit (Morton et al. 1976). With regard to the stress-delimited foot, various definitions are proposed, sometimes related to
the metrical structure of particular languages, with a key distinction being whether or not the foot respects word boundaries (e.g. Eriksson 1991; Beckman 1992; Bouzon and Hirst 2004). Analyses of timing relationships between hierarchically nested constituents were developed from Dauer’s (1983) findings that, in various languages, stress foot duration is neither independent of syllable number (the expectation based on foot isochrony) nor an additive function of syllable number (the expectation based on syllable isochrony). Eriksson (1991) further explored Dauer’s data on the positive relationship between total foot duration and syllable number. The durational effect of adding a syllable to the foot (i.e. the slope of the regression line) was similar for all five of Dauer’s (1983) languages. However, the intercept differed between putative ‘rhythm classes’ (‘syllable-timed’ Greek, Italian, Spanish: ~100 ms; ‘stress-timed’ English, Thai: ~200 ms). Eriksson claimed that the natural interpretation of the intercept variation was that the durational difference between stressed and unstressed syllables is greater in English and Thai than in Greek, Italian, or Spanish. However, as Eriksson observed (also O’Dell and Nieminen 1999), the positive intercept does not itself indicate where the durational variation takes place. Eriksson further noted an inverse relationship between the number of syllables in the foot and the average duration of those syllables. Similarly, Bouzon and Hirst (2004) found sub-additive relationships between several levels of structure in British English: syllables in a foot; phones in a syllable; feet in an intonational unit. These linear relationships between foot duration and the number of sub-constituents, with positive slope and intercept coefficients of the linear function, can—as described in more detail in §11.3.2—be modelled as systems of coupled oscillators (e.g. 
O’Dell and Nieminen 1999, at the syllable level and foot level). Other approaches that relate surface timing to the coupled interaction of hierarchically nested constituents include work on the coordination of articulatory gestures within syllables and prosodic phrases (e.g. Byrd and Choi 2010) and on the coordination of articulatory gestures within syllables and feet (Tilsen 2009).
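The slope/intercept analysis attributed to Eriksson (1991) above can be sketched as an ordinary least-squares fit of foot duration on syllable count. The durations below are hypothetical illustrative values, not Dauer's or Eriksson's data; the sketch only shows how a positive intercept by itself implies that mean syllable duration falls as feet get longer, without saying where in the foot the extra duration lies.

```python
"""Least-squares fit of stress-foot duration on syllable count.

The durations are HYPOTHETICAL illustrative values, not data from
Dauer (1983) or Eriksson (1991).
"""


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical mean foot durations (ms) for feet of 1 to 4 syllables.
syllables_per_foot = [1, 2, 3, 4]
foot_duration_ms = [310, 395, 510, 600]

intercept, slope = fit_line(syllables_per_foot, foot_duration_ms)

# With a positive intercept, mean syllable duration (foot duration / n)
# decreases as n grows, i.e. an inverse relationship of the kind noted above.
mean_syllable_ms = [d / n for n, d in zip(syllables_per_foot, foot_duration_ms)]

print(f"intercept = {intercept:.1f} ms, slope = {slope:.1f} ms/syllable")
print("mean syllable duration per foot size:", mean_syllable_ms)
```

With these made-up numbers the fit gives an intercept of 207.5 ms and a slope of 98.5 ms per syllable, and the mean syllable durations are 310.0, 197.5, 170.0, and 150.0 ms.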

11.1.4 Articulation rate

Cross-linguistic variations in predominant syllable structures (Dauer 1983) are associated with systematic differences in ‘articulation rate’, defined as syllables per second excluding pauses (given that pause frequency and duration significantly affect overall speech rate; Goldman-Eisler 1956). Estimated rates vary between studies due to the spoken materials, the accents chosen for each language, and speaker idiosyncrasies. Stylistic and idiosyncratic effects notwithstanding, languages with predominantly simple syllable structures, such as Spanish, tend to be spoken at a higher syllables-per-second rate than languages with more complex syllable structures, such as English (White and Mattys 2007a; Pellegrino et al. 2011). Of course, such differences in syllable rates do not imply that Spanish speakers articulate more quickly than English speakers, rather that more syllables are produced per unit of time when those syllables contain fewer segments. Additionally, Pellegrino et al. (2011) pointed to an effect of information density on rate: for example, Mandarin Chinese has lower syllable-per-second rates than Spanish, but more informationally rich syllables when taking lexical tone into account, hence their information density is roughly similar.
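The distinction between overall speech rate and articulation rate comes down to which denominator is used. The sketch below assumes a hypothetical annotation format (a list of labelled intervals with durations in seconds) and is meant only to show that excluding pauses raises the syllables-per-second figure.

```python
def speech_and_articulation_rate(intervals):
    """Return (speech_rate, articulation_rate) in syllables per second.

    `intervals` is a hypothetical annotation format: a list of
    (label, duration_s) pairs where label is "syl" for a syllable and
    "pause" for a silent pause.
    """
    n_syllables = sum(1 for label, _ in intervals if label == "syl")
    total_s = sum(d for _, d in intervals)
    pause_s = sum(d for label, d in intervals if label == "pause")
    speech_rate = n_syllables / total_s                    # pauses included
    articulation_rate = n_syllables / (total_s - pause_s)  # pauses excluded
    return speech_rate, articulation_rate


# Hypothetical utterance: 12 syllables of 0.2 s each, with one 0.6 s pause.
utterance = [("syl", 0.2)] * 6 + [("pause", 0.6)] + [("syl", 0.2)] * 6
sr, ar = speech_and_articulation_rate(utterance)
print(f"speech rate {sr:.2f} syl/s, articulation rate {ar:.2f} syl/s")
```

Here the pause lowers the overall rate to 4 syllables per second, while the articulation rate stays at 5 syllables per second.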

Listeners’ linguistic experience may strongly affect rate judgements, particularly with unfamiliar languages. Thus, where Japanese and German utterances were assessed by native speakers of both languages, there was overestimation of the unfamiliar language’s rate compared to the first language (Pfitzinger and Tamashima 2006). This has been described as the ‘gabbling foreigner illusion’ (Cutler 2012): when confronted with speech that we cannot understand, we tend to perceive it as spoken faster (see also Bosker and Reinisch 2017, regarding effects of second language proficiency). This illusion may, in part, be due to difficulties segmenting individual words in unfamiliar languages (Snijders et al. 2007). Conversely, when judging non-native accents, listeners generally interpret faster speech rate as evidence of more native-like production (e.g. White and Mattys 2007b; see Hayes-Harb 2014 for a review of rate influences on accentedness judgements). Moreover, the perception of cross-linguistic ‘rhythm’ contrasts is influenced by structurally based rate differences (Dellwo 2010). For example, when hearing delexicalized Spanish and English sasasa stimuli (all vowels replaced by /a/, all consonants by /s/, but with the original segment durations preserved), English speakers were more likely to correctly classify faster Spanish but slower English utterances (White et al. 2012; Polyanskaya et al. 2017). Thus, some perceptions of linguistic differences typically described as ‘rhythmic’ may be associated with systematic variations in rate (Dellwo 2010).
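The sasasa manipulation described above can be expressed as a simple relabelling of a segmented utterance. This sketch only transforms a hypothetical list of (phone, duration) pairs; the actual studies resynthesized audio (and in some conditions flattened f0), which is not attempted here. The input format and the vowel set are assumptions for illustration.

```python
def delexicalize(segments, vowels):
    """Map each segment to /a/ (vowel) or /s/ (consonant), keeping durations.

    `segments` is a list of (phone_label, duration_ms) pairs and `vowels`
    is the set of labels treated as vocalic; both are hypothetical inputs.
    Audio resynthesis is outside the scope of this sketch.
    """
    return [("a" if phone in vowels else "s", dur) for phone, dur in segments]


# Hypothetical segmentation of Spanish "gato" with made-up durations (ms).
segments = [("g", 55), ("a", 95), ("t", 70), ("o", 110)]
stimulus = delexicalize(segments, vowels={"a", "e", "i", "o", "u"})
print(stimulus)
```

The timing pattern is unchanged, so any remaining cue for listeners comes from durations (and hence rate), not from segmental identity.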

11.2 ‘Rhythm metrics’ and prosodic typology

Informed, in particular, by Dauer’s (1983) re-evaluation of rhythmic typology, various studies under the ‘rhythm metrics’ umbrella have attempted to empirically capture cross-linguistic differences in ‘rhythm’ (often loosely defined: see §11.2.2 and Turk and Shattuck-Hufnagel 2013). These studies employed diverse metrics of durational variation (cf. Roach 1982), notably in vocalic and consonantal intervals. Some studies were premised on the validity of ‘rhythm class’ distinctions (e.g. Ramus et al. 1999), raising a potential circularity problem where the primary test of a metric’s worth is whether it evidences the hypothesized class distinctions (Arvaniti 2009), although studies of perceptual discrimination between languages (e.g. Nazzi and Ramus 2003) were sometimes cited as external corroboration. However, the accumulated evidence from speech production and perception—reviewed in §11.2.2—strongly questions the validity and usefulness of categorical rhythmic distinctions. Some evaluative studies have highlighted empirical strengths and limitations of different rhythm metrics, observing that while certain metrics might provide data about cross-linguistic variation in the durational marking of stress contrast, they neglect much else that might be relevant to ‘rhythm’, notably distributional information (White and Mattys 2007a; Wiget et al. 2010). More trenchantly, other researchers have argued that the ‘rhythm metrics’ enterprise was compromised by a lack of consistency regarding which languages were distinguished (Loukina et al. 2011)—for example, when comparing read and spontaneous speech (Arvaniti 2012). Indeed, the term ‘rhythm metrics’ is a misnomer: aggregating surface timing features does not capture the essence of ‘speech rhythm’, however defined
(e.g. Cummins 2002; Arvaniti 2009). We next consider some lessons from the ‘rhythm metrics’ approach.

11.2.1 Acoustically based metrics of speech rhythm: lessons and limitations

In the development of so-called rhythm metrics for typological studies, there was a threefold rationale for quantifying durational variation based on vowels and consonants, rather than syllables or stress feet. First, languages such as Spanish typically have less vowel reduction and less complex consonant clusters than, for example, English (Dauer 1983). Second, Mehler et al. (1996), assuming early sensitivity to vowel/consonant contrasts, proposed that young infants use variation in vowel duration and intensity to determine their native language ‘rhythm class’. Third, syllabification rules vary cross-linguistically and are not uncontroversial even within languages, while applying heuristics to identify vowel/consonant boundaries is (comparatively) straightforward (Low et al. 2000). Thus, Ramus et al. (1999) proposed the standard deviation of vocalic and consonantal interval duration (‘ΔV’ and ‘ΔC’ respectively), along with the percentage of utterance duration that is vocalic rather than consonantal (%V). They found that a combination of ΔC and %V statistically reflected their predefined rhythm classification of, in increasing %V order: Dutch/English/Polish, Catalan/French/Italian/Spanish, and Japanese. Seeking to capture syntagmatic contrast within an utterance as well as global variation, pairwise variability indices (PVIs) average the durational differences between successive intervals—primarily, vocalic/consonantal—over an utterance (see Nolan and Asu’s 2009 account of this development). PVI-based measures showed differences between a Singaporean and a British dialect of English that had been claimed to be rhythmically distinct (Low et al. 2000), as well as gradient variation between languages previously categorized as either ‘stress-timed’ or ‘syllable-timed’ (Grabe and Low 2002, based on one speaker per language). 
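The interval-based metrics just described can be computed from a vowel/consonant segmentation. The following is a minimal sketch under stated assumptions: the input is a hypothetical list of ("V" or "C", duration) segments with pauses already removed, adjacent same-type segments are merged into intervals, ΔV and ΔC use the population standard deviation (some studies may use the sample SD), and the raw PVI is applied to consonantal and the normalized PVI to vocalic intervals, a common pairing assumed here rather than taken from the text.

```python
from itertools import groupby
from statistics import pstdev


def to_intervals(segments):
    """Merge adjacent same-type segments into vocalic/consonantal intervals.

    `segments` is a hypothetical list of ("V" or "C", duration_ms) pairs.
    """
    return [(kind, sum(d for _, d in group))
            for kind, group in groupby(segments, key=lambda s: s[0])]


def percent_v(intervals):
    """%V: proportion of utterance duration that is vocalic, in percent."""
    total = sum(d for _, d in intervals)
    return 100 * sum(d for k, d in intervals if k == "V") / total


def delta(intervals, kind):
    """Delta-V or Delta-C: standard deviation of one interval type."""
    return pstdev([d for k, d in intervals if k == kind])


def raw_pvi(durations):
    """rPVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations):
    """nPVI: each difference divided by the mean of its pair, times 100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical segmentation (ms); the two adjacent Cs merge into one interval.
segments = [("C", 80), ("V", 100), ("C", 60), ("C", 50),
            ("V", 120), ("C", 70), ("V", 60)]
intervals = to_intervals(segments)
vowels = [d for k, d in intervals if k == "V"]
consonants = [d for k, d in intervals if k == "C"]

print(intervals)
print(f"%V = {percent_v(intervals):.1f}, deltaV = {delta(intervals, 'V'):.1f} ms")
print(f"nPVI-V = {normalized_pvi(vowels):.1f}, rPVI-C = {raw_pvi(consonants):.1f} ms")
```

Note that %V and ΔC depend on how consonant clusters are merged into intervals, which is one reason the metrics are sensitive to phonotactics.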
While PVIs were intended to capture sequential durational variation more directly than global measures, Gibbon (2006) noted that PVIs do not necessarily discriminate between alternating versus geometrically increasing sequences (although the latter are implausible in speech: Nolan and Jeon 2014). Variance-based measures of interval duration tend to show high correlation with speech rate: as overall intervals lengthen with slower rate, so—other things being equal—do standard deviations (Barry et al. 2003; Dellwo and Wagner 2003; White and Mattys 2007a). With normalized PVI (nPVI)-based metrics, interval durations were normalized to take account of speech rate variation (Low et al. 2000). With standard deviation measures (ΔV, ΔC), speech rate normalization was implemented through coefficients of variation for consonantal intervals (VarcoC: Dellwo and Wagner 2003) and vocalic intervals (VarcoV: Ferragne and Pellegrino 2004). In the case of consonants, however, VarcoC lacked discriminative power (White and Mattys 2007a): as noted by Grabe and Low (2002), mean consonantal interval duration varies substantially due to language-specific phonotactics, so using the mean as a normalizing denominator also eliminates linguistically relevant variation. Comparing the power of various metrics, White and Mattys (2007a) suggested that rate-normalized metrics of vowel duration (VarcoV, nPVI-V) are more effective in capturing
cross-linguistic variation, alongside %V to represent differences in consonant cluster complexity. (For broadly similar conclusions about the relative efficacy of the normalized vocalic metrics, see Loukina et al. 2011; Prieto et al. 2012b). In contrast with Ramus et al. (1999), cross-linguistic studies employing such metrics often found variation in scores within hypothesized rhythm classes to be as great as that between classes (Grabe and Low 2002; White and Mattys 2007a; Arvaniti 2012). While conclusions about prosodic typology based only on rhythm metrics should be treated with circumspection, these data generally align with recent perceptual studies (White et al. 2012; Arvaniti and Rodriquez 2013; White et al. 2016) in refuting categorical notions of rhythm class. Several studies emphasize the limitations of even the more reliable metrics for capturing language-specific durational characteristics, given their susceptibility to variation in utterance composition and idiosyncratic differences between speakers (e.g. Wiget et al. 2010; Loukina et al. 2011; Arvaniti 2012; Prieto et al. 2012b). Given that %V, for example, is designed to reflect variation in the preponderance of syllable structures between languages, it is unsurprising to find that sentences constructed to represent language-atypical structures elicit anomalous scores (Arvaniti 2012; Prieto et al. 2012b). Moreover, the sensitivity of rhythm metrics to speaker-specific variation, a potential problem for typological studies, has been exploited in forensic phonetics and speaker recognition (Leemann et al. 2014; Dellwo et al. 2015) and in discriminating motor speech disorders (Liss et al. 2009). 
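Two properties discussed above, rate normalization through the coefficient of variation and Gibbon's (2006) observation about PVIs, can be checked numerically. The sketch assumes Varco is computed as 100 × (population SD / mean) and idealizes a rate change as a uniform scaling of every interval, which real rate variation is not. The durations are hypothetical.

```python
from statistics import mean, pstdev


def varco(durations):
    """Varco: standard deviation divided by the mean, times 100."""
    return 100 * pstdev(durations) / mean(durations)


def normalized_pvi(durations):
    """nPVI: mean normalized difference between successive intervals, x100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations (ms) at a "normal" rate...
normal = [100, 120, 60, 90, 140]
# ...and the same utterance idealized as uniformly 1.5 times slower.
slow = [1.5 * d for d in normal]

print("deltaV:", round(pstdev(normal), 2), "->", round(pstdev(slow), 2))
print("VarcoV:", round(varco(normal), 2), "->", round(varco(slow), 2))

# An alternating sequence and a geometrically increasing one
# yield the same nPVI, because every successive ratio is 2.
alternating = [1, 2, 1, 2, 1]
geometric = [1, 2, 4, 8, 16]
print("nPVI alternating:", round(normalized_pvi(alternating), 2))
print("nPVI geometric:  ", round(normalized_pvi(geometric), 2))
```

ΔV grows by the scaling factor while VarcoV is unchanged. Both nPVI values come out at about 66.67, since each pairwise term depends only on the ratio of neighbouring intervals. The same normalization by the mean is what removes phonotactically driven differences in mean consonantal duration when it is applied as VarcoC.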
It is clear, however, that large sample sizes and a variety of materials are needed to represent languages in typological studies, a major limitation given the laborious nature of manual measurement of segment duration (and the potential for unconscious language-specific biases in application of acoustic segmentation criteria; Loukina et al. 2011). While automated approaches have potential (Wiget et al. 2010), data-trained models for recognition and forced alignment may not be available for many languages; furthermore, Loukina et al. (2011) indicated drawbacks with forced alignment that they addressed using purely acoustic-based automated segmentation. Also problematic for ‘rhythm metrics’ is that relationships between sampled languages vary according to elicitation methods (for comparisons of read and spontaneous speech, see Barry et al. 2003; Arvaniti 2012) and that no small set of metrics, even the more reliable, consistently distinguishes all languages (Loukina et al. 2011). Furthermore, articulation rates should be reported, as the more reliable metrics are rate-normalized (VarcoV and nPVI, although not %V), but perceptual evidence shows the importance for language discrimination of syllable-per-second rate differences (Dellwo 2010; White et al. 2012; Arvaniti and Rodriquez 2013). At best, metrics such as VarcoV and %V are approximate indicators of broad phonetic and phonotactic patterns. Questions about cross-linguistic timing differences—for example, comparing the durational marking of prominences and boundaries—could often be better addressed by more direct methods (Turk and Shattuck-Hufnagel 2013). Moreover, duration-based metrics neglect other perceptually important prosodic dimensions (Cumming 2011b). From a theoretical perspective, the need to declare one’s assumptions about the nature of speech rhythm is paramount (e.g. Wiget et al.’s 2010 specific use of the term ‘contrastive rhythm metrics’). 
Indeed, there is usually a more directly appropriate term for one’s object of phonetic or phonological study than ‘rhythm’ (Turk and Shattuck-Hufnagel 2013).

11.2.2 The fall of the rhythm class hypothesis

Rhythm classes based on isochronous units—syllable-timed, stress-timed, mora-timed—have long been undermined by durational evidence, as discussed above. The multi-faceted nature of prominence provides further counter-arguments to the rhythm class hypothesis. Two languages characterized as ‘syllable-timed’ are illustrative. Spanish and French both have limited consonant clustering, minimal prominence-related vowel reduction, and relatively transparent syllabification. As Pointon (1995) observed, however, French lacks lexical stress and has phrase-final prominence, while Spanish has predominantly word-penultimate stress but lexically contrastive exceptions, minimally distinguishing many word pairs (e.g. tomo ‘I take’ vs. tomó ‘she took’). Despite such ‘within-class’ structural distinctions, some studies have suggested that initial speech processing depends upon speakers’ native ‘rhythm class’. Thus, French speakers were quicker to spot targets corresponding to exactly one syllable of a carrier word: for example, ba in ba.lance, bal in bal.con versus (slower) bal in ba.lance, ba in bal.con (Mehler et al. 1981). This ‘syllable effect’ was contrasted with metrical segmentation, wherein speakers of Germanic languages with predominant word-initial stress (e.g. Dutch and English) were held to infer word boundaries preceding stressed (full) syllables (Cutler and Norris 1988; although Mattys and Melhorn 2005 argued that stressed-syllable-based segmentation implies, additionally, a syllabic representation). These different segmentation strategies were explicitly associated with ‘rhythm class’ (Cutler 1990), which Cutler and Otake (1994) extended to Japanese, the ‘mora-timed’ archetype. Furthermore, the importance of early childhood experience was emphasized, suggesting that infants detect their native ‘rhythm class’ to select a (lifelong) segmentation strategy (Cutler and Mehler 1993). 
It is questionable, however, whether Spanish and French speakers would share rhythmical segmentation strategies, given differences in prominence distribution and function. Indeed, the ‘syllable effect’ subsequently appeared elusive in Spanish, Catalan, and Italian, all with variable, lexically contrastive stress placement (Sebastián-Gallés et al. 1992; Tabossi et al. 2000). Moreover, Zwitserlood et al. (1993) found that speakers of (‘stress-timed’) Dutch showed syllabic matching effects comparable to those Mehler et al. (1981) reported for French (for syllabic effects in native English speakers, see Bruck et al. 1995; Mattys and Melhorn 2005). It appears that syllabic and metrical effects are heavily influenced by linguistic materials and task demands, rather than fixed by listeners’ earliest linguistic experiences (for a review, see White 2018). Some perceptual studies have shown that listeners can distinguish two languages from distinct ‘rhythm classes’, but not two languages within a class. For example, American English-learning five-month-olds distinguished Japanese utterances from—separately—British English and Italian utterances, but did not distinguish Italian and Spanish, or Dutch and German (Nazzi et al. 2000a). Using monotone delexicalized sasasa speech preserving natural utterance timing (as described above), Ramus et al. (2003) found between-class, but not within-class, discrimination by French adult listeners (but postulated a fourth ‘rhythm class’ to account for discrimination of Polish from—separately—Catalan, Spanish, and English). However, subsequent similar studies found discrimination within ‘rhythm classes’: for five-month-olds hearing intact speech (White et al. 2016) and for adults with delexicalized speech (White et al. 2012; Arvaniti and Rodriquez 2013). Discrimination patterns can be explained by cross-linguistic similarity on salient prosodic dimensions, including speech
rate and utterance-final lengthening, without requiring categorical distinctions (White et al. 2012). In her influential paper ‘Isochrony Reconsidered’, Lehiste (1977b) argued that support for isochrony-based theories was primarily perceptual; indeed, data from perception studies have since been invoked to buttress the rhythm class hypothesis. It now seems clear, however, that responses to speech stimuli are not determined by listeners’ native ‘rhythm class’ (segmentation studies) or by categorical prosodic classes of language materials (discrimination studies). Languages clearly vary in their exploitation of temporal information to indicate speech structure, notably prominences and boundaries, but this variation is gradient and—integrating other prosodic features—multi-dimensional. There remain typological rhythm-based proposals, such as the ‘control versus compensation hypothesis’ (Bertinetto and Bertini 2008), but these assume gradient between-language variation in key parameters. The concept of categorical rhythm class seems superfluous, indeed misleading, for theories of speech production and perception.

11.3 Models of prosodic speech timing

Factors affecting speech duration patterns are diverse and not wholly predictable, including—beyond this linguistically oriented survey’s scope—word frequencies, emotional states, and performance idiosyncrasies. At the segmental level, voicing and place/manner of articulation influence consonant duration, while high vowels tend to be shorter than low vowels (for reviews see Klatt 1976; van Santen 1992). Some languages signal consonant or vowel identity by length distinctions, sometimes with concomitant quality contrasts (for a review see Ladefoged 1975). Connected speech structure also has durational consequences: for example, vowels are shorter preceding voiceless obstruents than voiced obstruents (Delattre 1962). This consonant–vowel duration trade-off (‘pre-fortis clipping’) is amplified phrase-finally (Klatt 1975), hinting at the importance of considering prosodic structure when interpreting durational data (e.g. White and Turk 2010). Beyond segmental and intersegmental durational effects, an ongoing discussion concerns the nature of the higher-level structures that are important for describing speech timing, and the mechanisms through which these structures influence observed durational patterns. Here we review two of the many extant approaches to these problems (see also, inter alia, Byrd and Saltzman 2003; Aylett and Turk 2004; Barbosa 2007). §11.3.1 considers approaches based on localized lengthening effects associated with linguistic constituents. §11.3.2 considers dynamical systems models based on hierarchical coupling of oscillators. For each, we briefly highlight key features and consider their accounts of some observed timing effects.

11.3.1 Localized approaches to prosodic timing

The fundamental claim of ‘localized’ approaches to prosodic timing is that no speech units impose temporal constraints on their sub-constituents throughout the utterance (van Santen 1997). Timing is primarily determined bottom-up, based on segmental identity (echoing
Pointon’s 1980 description of Spanish as ‘segment-timed’) and processes of accommodation and coarticulation between neighbouring segments. Higher-level structure influences timing via localized lengthening effects at linguistically important positions (White 2002, 2014). The best-attested lengthening effects are at prosodic domain edges and, for some languages, at prosodic heads (see Beckman 1992 regarding edge-effect universality versus head-effect language-specificity). Final (‘pre-boundary’) lengthening is widely observed at various levels of linguistic structure (e.g. English: Oller 1973; Dutch: Gussenhoven and Rietveld 1992; Hebrew: Berkovits 1994; Czech: Dankovičová 1997; see Fletcher 2010 for an extensive review). Lengthening (and gestural strengthening) of word-initial consonants is also reported cross-linguistically (e.g. Oller 1973; Cho et al. 2007), with greater lengthening after higher-level boundaries (e.g. Fougeron and Keating 1997). In many languages, lexically stressed syllables are lengthened relative to unstressed syllables (e.g. Crystal and House 1988), although the magnitude of lengthening varies (e.g. Dauer 1983; Hoequist 1983) and, as discussed in §11.1.2, some languages may lack lexical stress. Additionally, stressed and other syllables are lengthened in phrasally accented words (e.g. Sluijter and van Heuven 1995). White’s (2002, 2014) prosodic timing framework proposed that lengthening is the durational means by which speakers signal structure for listeners. The distribution of lengthening depends on the particular (edge or head) structural influence: for example, the syllable onset is the locus of word-initial lengthening (Oller 1973), while the pre-boundary syllable rhyme is lengthened phrase-finally (as well as syllable rhymes preceding a final unstressed syllable; Turk and Shattuck-Hufnagel 2007). Thus, the distribution (‘locus’) of lengthening disambiguates the nature of the structural cue (e.g. 
Monaghan et al. 2013). This emphasis on localized lengthening affords a reinterpretation of ‘compensatory’ timing processes, that is, inverse relationships between constituent length and the duration of sub-constituents. For example, Lehiste (1972) reported ‘polysyllabic shortening’, an inverse relationship between a word’s syllable count and its primary stressed syllable’s duration. As observed by White and Turk (2010), however, many duration studies have only measured phrasally accented words, such as in fixed frame sentences (e.g. ‘Say WORD again’). The primary stressed syllables are lengthened in these phrasally accented words, as—to a lesser extent—are unstressed syllables; moreover, the greater the number of unstressed syllables, the smaller the accentual lengthening on the primary stressed syllable (Turk and White 1999). Hence, pitch-accented words appear to demonstrate polysyllabic shortening (e.g. cap is progressively shorter in cap, captain, captaincy; likewise mend in mend, commend, recommend); however, in the absence of pitch accent, there is no consistent relationship between word length and stressed syllable duration (White 2002; White and Turk 2010). Similar arguments apply to apparent foot-level compression effects. Beckman (1992: 458) noted the difficulty in distinguishing ‘rhythmic compression of the stressed syllable in a polysyllabic foot from the absence of a final lengthening for the prosodic word’. Likewise, Hirst (2009) considered the durational consequences of the length of the English ‘narrow rhythm unit’ (NRU) (or ‘within-word foot’, from a stressed syllable to a subsequent word boundary). He found the expected linear relationship between syllable number and NRU duration, not the negative acceleration expected for a cross-foot compression tendency (Nakatani et al. 1981; Beckman 1992). Furthermore, Hirst (2009) attributed the ‘residual’ extra duration within each NRU (the intercept of the regression line for NRU length vs. 
duration) to localized lengthening effects at the beginning and end of the NRU (cf. White 2002, 2014; White and Turk 2010). Similarly, Fant et al. (1991: 84), considering Swedish,

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

French, and English, suggested that the primary (but ‘marginal’) durational consequence of foot-level structure was in ‘the step from none to one following unstressed syllables in the foot’. This localized lengthening of the first of two successive stressed syllables (e.g. Fourakis and Monahan 1988; Rakerd et al. 1987; called ‘stress-adjacent lengthening’ by White 2002) may relate to accentual lengthening variation in cases of stress clash. Generalizing from these observations, White (2002, 2014) reinterpreted apparent compensatory timing as being due to variation in the distribution of localized prosodic lengthening effects at domain heads and domain edges (e.g. phrasal accent lengthening or phrase-final lengthening). The localized lengthening framework further argues that, outside the loci of prosodic lengthening effects, there is little evidence for relationships between constituent length and sub-constituent duration (see also e.g. Suomi 2009; Windmann et al. 2015). Beyond such localized lengthening, the primary determiner of a syllable’s duration is its segmental composition (van Santen and Shih 2000).
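The regression argument attributed above to Hirst (2009) can be made concrete with a small sketch. The Python below fits an ordinary least-squares line of foot (NRU) duration against syllable count and lists the duration added by each extra syllable. The durations are invented for illustration and are not data from any cited study; the function names are hypothetical. On the reading given in the text, roughly constant increments indicate a linear relationship, a positive intercept corresponds to the ‘residual’ duration attributed to localized lengthening at NRU edges, and shrinking increments would instead indicate the negative acceleration expected under cross-foot compression.

```python
"""Toy illustration of regressing foot (NRU) duration on syllable count.

All durations are invented for illustration; they are not measurements
from Hirst (2009) or any other cited study.
"""


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def successive_increments(durations_by_count):
    """Duration added by each extra syllable, in order of syllable count.

    Roughly constant increments suggest a linear relationship; shrinking
    increments suggest negative acceleration (cross-foot compression).
    """
    counts = sorted(durations_by_count)
    return [durations_by_count[b] - durations_by_count[a]
            for a, b in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Hypothetical mean NRU durations (ms) for NRUs of 1-4 syllables.
    linear = {1: 300, 2: 420, 3: 540, 4: 660}
    compressed = {1: 300, 2: 400, 3: 470, 4: 520}

    intercept, slope = fit_line(list(linear), list(linear.values()))
    print(f"intercept = {intercept:.0f} ms, slope = {slope:.0f} ms/syllable")
    print("linear increments:", successive_increments(linear))
    print("compressed increments:", successive_increments(compressed))
```

With the invented linear data the fit gives an intercept of 180 ms and a slope of 120 ms per syllable, while the second data set shows the decreasing increments that a compression account would predict.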

11.3.2 Coupled oscillator approaches to prosodic timing

Coupled oscillator models posit at least two cyclical processes that are coupled, that is, that influence each other’s evolution in time. O’Dell and Nieminen’s (1999, 2009) model of prosodic timing considers hierarchically coupled syllable and stress-delimited foot oscillators (but see Malisz et al. 2016 regarding other units). Some models additionally include non-hierarchically coupled subsystems: Barbosa’s (2007) complex model also contains a coupled syntax-articulation module, whose syntactic component is controlled by a probabilistic model, as well as a coupled prosody-segmental interaction; it generates abstract vowel-to-vowel durations, which were tested on a corpus of Brazilian Portuguese. (For overviews of dynamical approaches to speech, including timing, see Van Lieshout 2004; Tilsen 2009.) Empirical support for coupled oscillator models at the level of surface timing has been found in the linear relationship between the number of syllables in a foot and the foot’s duration, discussed in §11.3, with non-zero coefficients. This relationship emerges naturally from O’Dell and Nieminen’s (2009) mathematical modelling of foot and syllable oscillator coupling. Importantly, the asymmetry in the coupling strengths of the two oscillators varies between and within languages (see also Cummins 2002). If one process wholly dominated, isochrony of syllables or stress feet would be observed: in strict foot-level isochrony, foot duration would be independent of syllable count; in strict syllable-level isochrony, foot duration would be directly proportional to syllable count. That such invariance is rarely observed is, of course, not evidence against oscillator models. Surface regularity of temporal units is not a prerequisite; rather, it is the underlying cyclicity of coupled units that is assumed (for a discussion see Turk and Shattuck-Hufnagel 2013; Malisz et al. 2016).
Indeed, periodic control mechanisms, if coupled, should not typically produce static surface isochrony on any subsystem level (e.g. syllable or foot): hierarchical coupling promotes variability in temporal units (Barbosa 2007; Malisz et al. 2016), and only under specific functional conditions is surface periodicity achieved. Regression of unit duration against number of sub-constituents cannot, however, distinguish where local expansion or compression may take place. Indeed, coupled oscillator models are neutral about where durational effects are allocated within temporal domains (Malisz et al. 2016), ranging from extreme centralization to equal allocation throughout the domain. By contrast, localized approaches (e.g. White 2014) suggest strict binding of lengthening effects to specific loci (e.g. syllable onset or rhyme) within the domain (e.g. a word), while other syllables are predicted to remain unaffected as they are added to the domain outside this locus. Effects on surface timing predicted by the coupled oscillator model are thus less constrained than those of localized approaches, which specifically argue for the absence of compression effects beyond the locus (see §11.3.1). Dynamical models, such as O’Dell and Nieminen (2009), updated in Malisz et al. (2016), depend rather on evidence of hierarchical coupling, such as that provided by Cummins and Port (1998) in the specific case of speech-cycling tasks. While repeating a phrase to a uniformly varying metronome target beat, English speakers tended to phase-lock stressed syllables to the simple ratios (1:3, 1:2, 2:3) of the repetition cycle. Furthermore, other periodicities emerge at harmonic fractions of the phrase repetition cycle (Port 2003, who relates these observations to periodic attentional mechanisms [Jones and Boltz 1989; see also McAuley and Fromboluti 2014]). There is also suggestive empirical support for metrical influences on speech production in findings that speakers may prefer words and word combinations that maintain language-specific metrical structures (Lee and Gibbons 2007; Schlüter 2009; Temperley 2009; Shih 2014), although Temperley (2009) found that contextually driven variations from canonical form (e.g. stress-clash avoidance) actually increase interval irregularity in English. In dynamical theories, coupling is evident within hierarchical speech structures, between speakers in dialogue, and within language communities (Port and Leary 2005; Cummins 2009). Periodic behaviour is understood to be one of the mechanisms of coordination within complex systems (Turvey 1990), mathematically modelled by oscillators.
Furthermore, coupled oscillatory behaviour is a control mechanism that spontaneously arises in complex systems where at least two subsystems interact, without necessarily requiring a periodic referent, such as a regular beat (Cummins 2011). Whether the undoubted human ability to dynamically entrain our actions is mirrored in the entrainment of metrical speech units remains debatable, as discussed here. Evidence of the entrainment of endogenous neural oscillators (e.g. theta waves) to the amplitude envelope of speech (e.g. Peelle and Davis 2012) suggests a possible neural substrate for oscillator-based speech behaviour, potentially important in listeners’ generation of durational predictions based on speech rate (e.g. Dilley and Pitt 2010). Theories of neural entrainment need, however, to address the lack of surface periodicity in most speech, as well as the imprecise mapping between the amplitude envelope and linguistic units (Cummins 2012). More generally, oscillator models of timing may be challenged by evidence that many languages lack levels of prominence, such as lexical stress, that were once thought universal (e.g. Jun 2005b; Maskikit-Essed and Gussenhoven 2016).
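Why coupled oscillators yield a linear relationship between syllable count and foot duration, with non-zero coefficients, can be shown with a simplified steady-state calculation. This is a sketch in the spirit of O’Dell and Nieminen’s model, not a reproduction of their equations. It assumes that a foot oscillator with natural rate f_F and a syllable oscillator with natural rate f_s lock so that n syllable cycles fill one foot cycle, and that a shared coupling term h shifts the syllable rate by −a·h and the foot rate by +b·h. Solving n·Ω = f_s − a·h and Ω = f_F + b·h for the shared foot rate Ω gives a foot duration T(n) = (r + n) / (r·f_F + f_s), with r = a/b. The parameter names and rates are our own hypothetical choices.

```python
"""Steady-state foot duration for two coupled oscillators (simplified sketch).

Assumptions (ours, not taken from the chapter): a foot oscillator with natural
rate foot_rate and a syllable oscillator with natural rate syllable_rate lock
n:1, and a shared coupling term h shifts the syllable rate by -a*h and the
foot rate by +b*h. With r = a / b, the locked foot duration is
T(n) = (r + n) / (r * foot_rate + syllable_rate).
"""


def foot_duration(n_syllables, r, foot_rate, syllable_rate):
    """Locked foot duration in seconds for a foot of n_syllables syllables.

    r             -- hypothetical coupling ratio; large r means the foot
                     oscillator dominates, r = 0 means the syllable dominates
    foot_rate     -- natural foot (stress) oscillator rate, cycles per second
    syllable_rate -- natural syllable oscillator rate, cycles per second
    """
    return (r + n_syllables) / (r * foot_rate + syllable_rate)


def linear_coefficients(r, foot_rate, syllable_rate):
    """Intercept and slope (seconds) of the linear relation T(n) = A + B*n."""
    denom = r * foot_rate + syllable_rate
    return r / denom, 1 / denom


if __name__ == "__main__":
    foot_rate, syllable_rate = 2.0, 5.0  # invented: 500 ms feet, 200 ms syllables
    for r in (0.0, 1.0, 1e6):
        durations = [round(foot_duration(n, r, foot_rate, syllable_rate), 3)
                     for n in range(1, 5)]
        a, b = linear_coefficients(r, foot_rate, syllable_rate)
        print(f"r={r:g}: T(1..4)={durations}  intercept={a:.3f}  slope={b:.3f}")
```

With r = 0 the foot duration grows in direct proportion to syllable count (syllable-level isochrony); with a very large r it stays near 0.5 s whatever the syllable count (foot-level isochrony); intermediate values give the linear relation with non-zero intercept and slope described in §11.3.2.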

11.4 Conclusions and prospects

The hypothesis that speech is consistently characterized by isochrony has succumbed to the weight of counterevidence, and the associated hypothesis about categorical ‘rhythm class’ has, at best, scant support. The accumulated production and perception data do, however, support a continuing diversity of approaches to speech timing, varying in their balance
between chronos and kairos, notably the degree to which surface timing patterns or hierarchical control structures are emphasized. Regarding the two approaches sketched here, there may appear to be superficial contrasts between dynamical timing models, emphasizing underlying coupling between hierarchically organized levels of metrical structure (e.g. Cummins and Port 1998; O’Dell and Nieminen 2009; Malisz et al. 2016), and localized approaches, emphasizing the irregularity of surface timing and the information about structure and phonology that this temporal unpredictability provides for listeners (e.g. Cauldwell 2002a; Nolan and Jeon 2014; White 2014). A synthesis of dynamical and localized models may, however, emerge from a deeper understanding of the complex interaction between the information transmission imperative in language and the affordance that speech offers for multi-level entrainment of interlocutors’ gestural, prosodic, linguistic, and social behaviour (Tilsen 2009; Pickering and Garrod 2013; Mücke et al. 2014; Fusaroli and Tylén 2016). Some degree of broad predictability is a prerequisite for humans interacting in conversation or other joint action. More specifically, local unpredictability in speech timing cannot be interpreted as structurally or prosodically motivated unless listeners have a foundation on which to base temporal predictions and the ability to spot violations of predictions (e.g. Baese-Berk et al. 2014; Morrill et al. 2014b). Where mutual understanding confers predictability, for example via a common social framework or foregoing linguistic context, the surface timing of speech may be freer to vary unpredictably, towards maximizing encoding of information.
When interlocutors lack shared ground and predictability is consequently elusive, then relative underlying periodicity may dominate, supporting mutual coordination and ease of processing, but with potential loss of redundancy in information encoding (see Wagner et al. 2013). This proposal, which we tentatively call the ‘periodicity modulation hypothesis’, lends itself to ecologically embedded studies of infant and adult spoken interactions and their relationship to neurophysiological indices of perception and understanding.

Part IV

PROSODY ACROSS THE WORLD

Chapter 12

Sub-Saharan Africa

Larry M. Hyman, Hannah Sande, Florian Lionnet, Nicholas Rolle, and Emily Clem

12.1 Introduction

In this chapter we survey the most important properties and issues that arise in the prosodic systems of sub-Saharan Africa. While our emphasis is on the vast Niger-Congo (NC) stock of approximately 1,500 languages, much of what is found in NC is replicated in Greenberg’s (1963) other major stocks: Nilo-Saharan, Khoisan, and the Chadic, Cushitic, and Omotic subgroups of Afro-Asiatic. As we shall point out, both the occurrence of tone and other properties of the prosodic systems of sub-Saharan Africa show noteworthy areal distributions that cut across these groups. We start with a discussion of tone (§12.2), followed by word accent (§12.3) and then intonation (§12.4).

12.2 Tone

Tone is clearly an ancient feature across sub-Saharan Africa, with the exception of Afro-Asiatic (e.g. Chadic), which likely acquired tone through contact with NC and/or Nilo-Saharan (Wolff 1987: 196–197). It is generally assumed that Proto-NC, which existed somewhere between 7,000 and 10,000 years ago, already had tone, most likely with a contrast between two heights, H(igh) and L(ow) (Hyman 2017). Two observations support this. First, almost all NC languages are tonal, including those whose inclusion in NC is controversial, such as Mande, Dogon, and Ijoid. Second, non-tonal NC languages are geographically peripheral and have lost their tone via natural tone simplification processes (cf. Childs 1995) and/or influence from neighbouring non-tonal languages (cf. Hombert 1984: 154–155). This includes not only Swahili in the East but also Northern Atlantic (Fula, Seereer, Wolof, etc.), Koromfé (Northern Central Gur; Rennison 1997: 16), and (outside NC) Koyra Chiini (Songhay; Heath 1999: 48), whose loss of tone could be the effect of contact with Berber or Arabic, either directly or through Fula (Childs 1995: 20). The only non-peripheral cases concern a near-continuous band of Eastern Bantu languages, such as Sena [Mozambique], Tumbuka [Malawi] (Downing 2017: 368), and Nyakyusa [Tanzania], which are not likely to have lost their tones through external contact. Since tonogenesis usually if not always produces a binary contrast, languages with multiple tone heights have undergone subsequent tonal splits conditioned either by laryngeal features, such as the obstruent voicing of so-called depressor consonants, as in Kru (see Singler 1984: 74 for Wobe), or by the tones themselves, such as raising of L to M(id) before a H tone that then subsequently drops out, as in Mbui [Grassfields Bantu; Cameroon] (Hyman and Schuh 1974: 86).

12.2.1 Tonal inventories

Numerous sub-Saharan languages still show a binary contrast in their tones, which may be analysed as equipollent /H/ vs. /L/, e.g. Ga [Kwa; Ghana] /lá/ ‘blood’, /là/ ‘fire’ (Kropp Dakubu 2002: 6); privative /H/ vs. Ø, e.g. Haya [Bantu; Tanzania] /-lí-/ ‘eat’ vs. /-si-/ ‘grind’ (Hyman and Byarushengo 1984: 61); or (more rarely) privative /L/ vs. Ø, e.g. Malinke de Kita [Mande; Mali] /nà/ ‘to come’ vs. /bo/ ‘to exit’ (Creissels 2006: 26). As pointed out by Clements and Rialland (2008: 72–73), languages with three, four, or even five contrastive pitch heights tend to cluster along a definable East–West Macro-Sudan Belt (Güldemann 2008) starting in Liberia and Ivory Coast, e.g. Dan (Vydrin 2008: 10), and ending in Ethiopia, where Bench also contrasts five tone heights: /ka̋r/ ‘clear’, /kárí/ ‘inset or banana leaf’, /kār/ ‘to circle’, /kàr/ ‘wasp’, and /kȁr/ ‘loincloth’ (Rapold 2006: 120). Most languages spoken south of the Macro-Sudan Belt (Güldemann 2008) contrast two tone heights (see Map 12.1 in the plate section).¹ Another area of high tonal density is the Kalahari Basin region (Güldemann 2010) in southwestern Africa, where languages formerly subsumed under the label ‘Khoisan’ have up to four level tones: the Kx’a language ǂʼAmkoe (Gerlach 2016) and the Khoe-Kwadi languages Khwe (Kilian-Hatz 2008: 24–25), Gǀui (Nakagawa 2006: 32–60), and Ts’ixa (Fehn 2016: 46–58) have three tone heights (H, M, L), while Khoekhoe (Khoe-Kwadi: Haacke 1999: 53) and the Ju branch of Kx’a (formerly ‘Northern Khoisan’: Dickens 1994; König and Heine 2015: 44–48) have four (Super-H, H, L, Super-L). Only one Kalahari Basin language (the West dialect of Taa, aka ǃXóõ) has been analysed as opposing two tone heights (H vs. L: Naumann 2008). Besides the number of tone heights, African tone system inventories differ in whether they permit contours or not, and, if so, which ones are present.
Map 12.2 (see plate section) shows that R(ising) and F(alling) tonal contours tend to appear more often in languages of the Macro-Sudan Belt. In terms of the number of contour tones, African languages have been reported with up to five falling tones, e.g. Yulu HM, HL, HꜜL, ML, MꜜL (Boyeldieu 1987: 140), and five rising tones, e.g. Wobe 31, 32, 41, 42, 43, where 1 is the highest tone (Bearth and Link 1980: 149). Another difference in inventory concerns whether a language allows downstepped tones or not. Whereas some languages contrast the three tone heights /H/, /M/, and /L/, which in principle can combine to produce nine possibilities on two syllables and 27 possibilities on three, as in Yoruba (Pulleyblank 1986: 192–193), others contrast /H/, /L/, and a downstepped ꜜH, which usually is contrastive only after another (ꜜ)H. As seen in Map 12.3 (see plate section), a smaller number of languages have contrastive ꜜM and ꜜL. While in most African languages with downstep an underlying L tone between two H tones results in the second H surfacing as ꜜH, an input H-H sequence may also become H-ꜜH, as in Shambala (Odden 1982) and Supyire (Carlson 1983). A number of underlying /H, M, L/ languages lack ꜜH but have downstepped ꜜM, which results from a wedged L tone, e.g. Yoruba (Bamgboṣe 1966), Gwari (Hyman and Madaji 1970: 16), and Gokana (Hyman 1985: 115). Downstepped L, on the other hand, is more likely to derive from a lost H tone, as in Bamileke-Dschang (Hyman and Tadadjeu 1976: 92). ꜜH is by far the most common downstepped tone, and a three-way H vs. ꜜH vs. L contrast is the most common downstep system. On the other hand, Ghotuo is reported to have both ꜜM and ꜜL, but no ꜜH (Elugbe 1986: 51). Yulu (Boyeldieu 2009), which is said to have an ‘infra-bas’ (extra-low) tone, may be best analysed as having ꜜL. Similarly, the contrastive L falling tone of Kalenjin may be best analysed as a LꜜL contour tone (Creider 1981). While ꜜH occurs throughout sub-Saharan Africa, ꜜM and ꜜL are more commonly found in the eastern part of the Macro-Sudan Belt (e.g. Nigeria, Cameroon, Central African Republic).

¹ Note that in Map 12.1 tone heights are counted based on the number of contrastive pitch levels a language employs on the surface. Thus, if a language has a system consisting of L, H, and ꜜH, it will be counted as having three tone heights.
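The most common source of downstep described above, a floating L wedged between two tones, can be sketched as a small procedure over tone sequences. This is an illustrative simplification, not a full autosegmental analysis. It assumes each tone is given as a (tone, is_linked) pair, that only a floating (unlinked) L triggers downstep, and that the affected tone must follow a non-L tone. Treating the Kalabari possessive melody HLH on two syllables (náꜜmá, discussed in §12.2.4) as H, floating L, H is our assumption. The sketch does not model the H-H → H-ꜜH pattern of Shambala and Supyire, nor ꜜL from a lost H.

```python
"""Toy derivation of downstep from a floating (unlinked) L tone.

Assumption: each tone is a (tone, is_linked) pair; a floating L wedged
between a preceding non-L tone and a following linked H or M lowers the
latter to a downstepped tone. Floating tones are not pronounced themselves.
"""

DOWNSTEPPED = {"H": "ꜜH", "M": "ꜜM"}


def realize(sequence):
    """Return surface tone labels for the linked tones in `sequence`."""
    surface = []
    floating_l_pending = False
    for tone, is_linked in sequence:
        if not is_linked:
            # Remember a floating L; it can only affect the next linked tone.
            floating_l_pending = floating_l_pending or tone == "L"
            continue
        wedged = floating_l_pending and bool(surface) and surface[-1] != "L"
        if wedged and tone in DOWNSTEPPED:
            surface.append(DOWNSTEPPED[tone])
        else:
            surface.append(tone)
        floating_l_pending = False
    return surface


if __name__ == "__main__":
    print(realize([("H", True), ("L", False), ("H", True)]))  # náꜜmá-type
    print(realize([("H", True), ("L", True), ("H", True)]))   # linked L: no downstep
    print(realize([("M", True), ("L", False), ("M", True)]))  # ꜜM from a wedged L
```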

12.2.2 The representation of tone

The density of the tonal contrasts depends on whether a contrastive tone is required on every tone-bearing unit (TBU) or whether some or many TBUs may be toneless. In the densest system, the number of contrastive tone patterns will equal the number of contrastive tones raised to the power of the number of TBUs. Thus, in a /H, L/ system there should be two patterns on a single TBU, four patterns on two TBUs, eight on three, and so on (and perhaps more patterns if tonal contours are allowed). A sparse tonal situation tends to arise in languages that have longer words but a more syntagmatic tone system. In these languages, single tones, typically privative H, can be assigned to a specific position in a lexically defined group of stems or words, as in Chichewa, where an inflected verb stem can be completely toneless or have a H on its penultimate or final syllable. It is such systems that lend themselves to a privative /H/ vs. Ø analysis. In another type of system, often referred to as melodic, the number of TBUs is irrelevant. In Kukuya, for instance, verb stems can have any of the shapes CV, CVV, CVCV, CVVCV, or CVCVCV, i.e. up to three morae, over which five different tonal melodies are mapped: H, L, LH, HL, or LHL (Paulian 1975). Since a CV TBU can have any of the LH, HL, or LHL contours, analysed as sequences of tones, Kukuya unambiguously requires a /H/ vs. /L/ analysis. Other languages can reveal the need for a /H/ vs. /L/ analysis by the presence of floating tones (e.g. many Grassfields Bantu languages).
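The arithmetic behind dense versus sparse systems is easy to check mechanically. The sketch below enumerates every level-tone string for k tones over n TBUs (kⁿ strings, matching the nine and 27 /H, M, L/ combinations cited for Yoruba in §12.2.1) and, for contrast, the three patterns of a Chichewa-style privative system in which an inflected verb stem is toneless or has a single H on its penult or final syllable. Contours, floating tones, and Kukuya-type melodic mapping are deliberately ignored, and the function names are hypothetical.

```python
"""Counting tone patterns in dense versus sparse tone systems (toy sketch)."""

from itertools import product


def dense_patterns(tones, n_tbus):
    """Every TBU bears one of `tones` (level tones only, no contours)."""
    return ["-".join(combo) for combo in product(tones, repeat=n_tbus)]


def sparse_h_patterns(n_tbus):
    """Chichewa-style verb stem: toneless, or one H on the penult or final TBU.

    'Ø' marks a toneless TBU. Assumes n_tbus >= 2.
    """
    if n_tbus < 2:
        raise ValueError("need at least two TBUs for a penultimate position")
    toneless = ["Ø"] * n_tbus
    penult_h = toneless.copy()
    penult_h[-2] = "H"
    final_h = toneless.copy()
    final_h[-1] = "H"
    return ["-".join(p) for p in (toneless, penult_h, final_h)]


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, "TBUs:", len(dense_patterns(["H", "L"], n)), "H/L patterns,",
              len(dense_patterns(["H", "M", "L"], n)), "H/M/L patterns")
    for n in (2, 4, 6):
        print(n, "TBUs (sparse):", sparse_h_patterns(n))
```

The dense count grows geometrically with word length, whereas the sparse system stays at three patterns however long the stem is, which is why longer words and syntagmatic tone systems tend to go together.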

12.2.3 Phonological tone rules/constraints

Almost all sub-Saharan languages modify one or another of their tones in specific tonal environments. By far the most common tone rule found in NC languages is perseverative tone spreading, which most commonly applies to H tones, then L tones, then M tones. In languages that have privative /H/ vs. Ø, only H can spread, as in much of Eastern and Southern Bantu; similarly, only L can spread in /L/ vs. Ø systems such as Ruwund (Nash 1992–1994). Such spreading can be either bounded, affecting one TBU to the right, or unbounded, targeting either the final or penultimate position within a word or phrase domain. In some languages both H and L tone spreading can result in contours being created on the following syllable, as in the Yoruba example /máyò̙ mí rà wé/ → [máyô̙ mǐ râ wě] ‘Mayomi bought books’ (Laniran and Clements 2003: 207). In languages that do not tolerate contours, the result is doubling of a tone to the next syllable. This is seen particularly clearly in privative H systems, e.g. Kikerewe [Bantu; Tanzania] /ku-bóh-elan-a/ → [ku.bó.hé.la.na] ‘to tie for each other’ (Odden 1998: 177). In some cases the original tone delinks from its TBU, in which case the result is tone shifting, as in Jita /ku-βón-er-an-a/ → [ku-βon-ér-an-a] ‘to get for each other’ (Downing 1990: 50). Tone anticipation is much less common but does occur, e.g. Totela /o-ku-hóh-a/ → [o-kú-hoh-a] ‘to pull’ (Crane 2014: 65). Other languages may raise or lower a tone, e.g. a /L-H/ sequence may be realized L-M as in Kom [Bantoid; Cameroon] (Hyman 2005: 315–316) or M-H as in Ik [Eastern Sudanic; Uganda] (Heine 1993: 18), while the H in a /H-L/ sequence may be raised to a super-high level, as in Engenni [Benue-Congo; Nigeria] /únwónì/ → [únwőnì] ‘mouth’ (Thomas 1974: 12). Finally, tone rules may simplify LH rising and HL falling tones to level tones in specific environments. For more on the nature of tone rules in African languages see Hyman (2007) and references cited therein.
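The three bounded processes illustrated above for privative H systems (doubling, shifting, and anticipation) differ only in where the H ends up, which a toy sketch over per-syllable tone lists makes explicit. The syllabifications (ku.bó.he.la.na, ku.βó.ne.ra.na, o.ku.hó.ha) and the behaviour of an H already at a word edge are our assumptions; real systems add conditions such as blocking and edge effects that are not modelled here.

```python
"""Bounded H doubling, H shifting, and H anticipation in a privative system.

Tones are listed per syllable: "H" for a high-toned TBU, None for toneless.
These are toy versions of the processes described in the text.
"""


def double_h(tones):
    """Each H also links to the following TBU (e.g. Kikerewe)."""
    out = list(tones)
    for i, tone in enumerate(tones):
        if tone == "H" and i + 1 < len(tones):
            out[i + 1] = "H"
    return out


def shift_h(tones):
    """Each H moves one TBU rightwards and delinks from its source (e.g. Jita).

    Assumption: an H already on the final TBU stays there.
    """
    out = [None] * len(tones)
    for i, tone in enumerate(tones):
        if tone == "H":
            out[min(i + 1, len(tones) - 1)] = "H"
    return out


def anticipate_h(tones):
    """Each H moves one TBU leftwards (e.g. Totela).

    Assumption: an H already on the first TBU stays there.
    """
    out = [None] * len(tones)
    for i, tone in enumerate(tones):
        if tone == "H":
            out[max(i - 1, 0)] = "H"
    return out


if __name__ == "__main__":
    # Kikerewe /ku.bó.he.la.na/ -> [ku.bó.hé.la.na]
    print(double_h([None, "H", None, None, None]))
    # Jita /ku.βó.ne.ra.na/ -> [ku.βo.né.ra.na]
    print(shift_h([None, "H", None, None, None]))
    # Totela /o.ku.hó.ha/ -> [o.kú.ho.ha]
    print(anticipate_h([None, None, "H", None]))
```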

12.2.4 Grammatical functions of tone

One of the most striking aspects of tone across Africa is its frequent use to mark grammatical categories and grammatical relations. Three types of grammatical tone (GT) are illustrated below from Kalabari [Ijoid; Nigeria] (Harry and Hyman 2014). The first is morphological GT at the word level, which turns a transitive verb into an intransitive verb by replacing lexical tones with a LH tone melody. In this case, the only mark of the grammatical category is the GT, with no segmental exponence, as illustrated in (1).

(1) kán       H      ‘demolish’   →  kàán     LH    ‘be demolished’
    kíkíꜜmá   HHꜜH   ‘hide’       →  kìkìmá   LLH   ‘be hidden’

The second, syntactic type occurs at the phrase level. As shown in (2), when a noun is possessed by a possessive pronoun (e.g. /ìnà/ ‘their’), the lexical tones of the noun are replaced with a HLH melody, realized [HꜜH] on two syllables.

(2) námá   HH   ‘animal’   →  ìnà náꜜmá   HꜜH   ‘their animal’
    bélè   HL   ‘light’    →  ìnà béꜜlé   HꜜH   ‘their light’

Unlike the first case, here GT only secondarily expones the grammatical category ‘possessive’, and must co-occur with a segmentally overt possessive pronoun. Both morphological and syntactic types are referred to as ‘replacive tone’ (Welmers 1973: 132–133). Finally, the third type is also phrase level, but is crucially different in that the GT does not replace lexical tone but rather co-occurs with it. For example, still in Kalabari, the future auxiliary /ɓà/ assigns /H/ to a preceding verb, which surfaces as [LH] if it has /L/: /sì/ ‘(be) bad’ → námá sìí ɓà ‘the animal will become bad’. Virtually all tonal languages in Africa exhibit GT, typically robustly; we know of only one language with only lexical tone. GT usage cuts across other typological dimensions such as tone inventory, degree of analyticity/syntheticity, and the headedness parameter, and can express virtually all grammatical categories and many distinct types of grammatical relations, including derivation (valency, word class changes, and more), and all major inflectional categories such as number, case, tense, aspect, mood, subject agreement, and polarity, as in Aboh [Benue-Congo; Nigeria]: [ò jè kò] ‘s/he is going’, [ó jé kò] ‘s/he is not going’ (L. Hyman, personal notes). One robust pattern found across Africa involves GT marking ‘associative’ (roughly, genitive and compound) constructions, e.g. in Mande (Creissels and Grégoire 1993; Green 2013), Kru (Marchese 1979: 77; Sande 2017: 40), much of Benue-Congo, and (outside of NC) Chadic (Schuh 2017: 141), the isolate Laal (Lionnet 2015), and many Khoe-Kwadi languages (Haacke 1999: 105–159; Nakagawa 2006: 60–80, among others). In the verbal domain, tone often only has a grammatical function if verb roots lack a lexical tonal contrast. Table 12.1 illustrates this with the closely related Bantu languages Luganda and Lulamogi (Hyman 2014a). Both exhibit a lexical tonal contrast in the nominal domain, but only Luganda does so in the verbal domain.

Table 12.1 Grammatical tone in a language without a tone contrast in the verb stem (Lulamogi) and its absence in a language with such a tone contrast (Luganda)

              Nouns                                    Verbs
Luganda       e-ki-zimbe   ≠  e-ki-zîmba               o-ku-bal-a   ≠  o-ku-bál-a
Lulamogi      é-ki-zimbé   ≠  é-kí-zîmbá               ó-ku-bal-á   =  ó-ku-bal-á
              ‘building’      ‘boil, tumor’            ‘to count’      ‘to produce, bear fruit’

The lack of lexical tone contrasts in the verbal domain is common across African tonal languages, such as in Kisi [South Atlantic; Sierra Leone] (Childs 1995: 55), Konni [Gur; Ghana] (Cahill 2000), CiShingini [Kainji; Nigeria] (N. Rolle and G. Bacon, field notes), and Zande [Ubangi; Central African Republic] (Boyd 1995), not to mention many Bantu languages where tones are assigned by the inflectional morphology (tense-aspect-mood-negation), e.g. Lulamogi /a-tolók-a/ ‘s/he escapes’ vs. /á-tolok-é/ ‘may s/he escape!’. At least one language, Chimwiini [Bantu; Somalia], has only GT and no lexical tone in any domain (Kisseberth and Abasheikh 2011). Here, a single final or penultimate privative H tone is determined by the grammar, e.g. [ji:lé] ‘you sg. ate’, [jí:le] ‘s/he ate’ (Kisseberth and Abasheikh 2011: 1994), and although the above contrast derives from the inflectional morphology of the verb, it is realized phrasally: [jile ma-tu:ndá] ‘you sg. ate fruit’, [jile matú:nda] ‘s/he ate fruit’. Kisseberth and Abasheikh (2011) analyse phrase-penultimate H as the default, with the final H in these examples being conditioned by first (and second) person subject marking. Other final H tones are assigned by relative clauses, conditional clauses introduced by /ka-/, the negative imperative, and the conjunction /na/ ‘and’ (Kisseberth and Abasheikh 2011: 1990–1992).
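The Chimwiini pattern is a case where the locus of a grammatically assigned H is computed over the whole phonological phrase rather than the word that triggers it. The sketch below encodes the description above: a single H on the phrase-penultimate syllable by default, or on the final syllable when a trigger such as first or second person subject marking is present. The syllabifications (ji.le.ma.tu:.nda, ji:.le), the capital-letter notation for the H-toned syllable, and the treatment of one-syllable phrases are our assumptions.

```python
"""Placement of the single phrasal H in Chimwiini (simplified sketch).

Default: H on the phrase-penultimate syllable. final_h=True models contexts
(e.g. first or second person subject marking) that trigger a phrase-final H.
The one-syllable case is an assumption.
"""


def h_index(n_syllables, final_h=False):
    """Index of the H-toned syllable in a phonological phrase."""
    if n_syllables < 1:
        raise ValueError("a phrase needs at least one syllable")
    if final_h or n_syllables == 1:
        return n_syllables - 1
    return n_syllables - 2


def mark_h(syllables, final_h=False):
    """Join the syllables, writing the H-toned one in capitals."""
    i = h_index(len(syllables), final_h)
    return ".".join(s.upper() if j == i else s for j, s in enumerate(syllables))


if __name__ == "__main__":
    phrase = ["ji", "le", "ma", "tu:", "nda"]
    print(mark_h(phrase, final_h=True))   # 'you sg. ate fruit'
    print(mark_h(phrase, final_h=False))  # 's/he ate fruit'
```

Note that the verb’s person features decide between the two options, but the H itself surfaces on the phrase’s last word, as in [jile ma-tu:ndá] and [jile matú:nda].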

The interaction of GT with lexical tone and other GTs is extremely rich and varied. One useful way of illustrating complex GT interaction is through tonological paradigms showing which morphosyntactic features assign which tones. These assignments often conflict. It is profitable to view GT competition in terms of ‘dominance effects’ (Kiparsky 1984; Inkelas 1998; Kawahara 2015). As implemented in Rolle (2018), dominant GT systematically wins in competition with lexical tone, resulting in tonal replacement, as exemplified in the first two types from Kalabari above, (1) and (2). In contrast, non-dominant GT does not systematically win over other tones, often resulting in tones from two distinct morphemes co-occurring, as in the third type in Kalabari shown above with the future auxiliary /ɓà/. Dominant and non-dominant GT can be interleaved in morphologically complex words, resulting in ‘layers’ of GT. The following example comes from Hausa (Inkelas 1998: 132). In (3a), the dominant affixes /-íí/ AGENT and /-ìyáá/ FEM replace any tones in the base and assign a L and H melody, respectively. The non-dominant affixes /má-/ NML (nominalizer) and /-r/ REF either assign no tone or assign a floating tone which docks to the edge but does not replace tones, as shown in (3b).

(3) a. [ [ [ má- [ káràntá -Líí ] ] -Hìyáá ] -Lr ]
          NML-   read    -AGENT  -FEM  -REF
       ‘the reader (f.)’

    b. Dom       káràntá -Líí           →  kàràncíí
       Non-dom   má- kàràncíí           →  mákàràncíí
       Dom       mákàràncíí -Hìyáá      →  mákáráncììyáá
       Non-dom   mákáráncììyáá -Lr      →  mákáráncììyâr

GT interaction can be very complex and may involve intricate rules of resolution not easily captured as dominant versus non-dominant GT, as in the case of the grammatical H tones in Jita [Bantu; Tanzania] (Downing 2014). In addition, tone may exhibit allomorphic melodies conditioned by properties of the target. For example, in Tommo So [Dogon; Mali] (McPherson 2014) possessive pronouns assign a H melody to bimoraic targets but a HL melody to longer targets. Thus, /bàbé/ ‘uncle’ → [mí bábé] ‘my uncle’ vs. /tìrɛ̀-àn-ná/ ‘grandfather’ → [mí tírɛ̀-àn-nà] ‘my grandfather’. Although virtually all sub-Saharan tonal languages exhibit both lexical tone and GT, the functional load of each can vary significantly. Many descriptions of African languages explicitly note the lack of tonal minimal pairs, as in the Chadic languages Makary Kotoko (Allison 2012: 38) and Goemai (Tabain and Hellwig 2015: 91); for Cushitic as a whole (Mous 2009), e.g. Awngi (Joswig 2010: 23–24); and in Eastern Bantu languages such as Luganda and Lulamogi (see Table 12.1). Other languages have more frequent minimal pairs, such as the oft-cited minimal quadruplet in Igbo: /àkwà/ ‘bed’, /àkwá/ ‘egg’, /ákwà/ ‘cloth’, /ákwá/ ‘crying’. The functional load of GT similarly varies across Africa.
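The contrast between dominant and non-dominant GT can be sketched with the Kalabari examples in (1) and the future auxiliary /ɓà/: a dominant melody ignores the base’s lexical tones entirely, whereas a non-dominant H is added at the right edge and leaves them in place. This is a toy illustration, not Rolle’s formalism. The mapping of the LH melody (L on every syllable but the last, a LH contour on a monosyllable) is inferred from the two forms in (1), and the treatment of an already H-final verb is our assumption.

```python
"""Dominant (replacive) versus non-dominant grammatical tone (toy sketch).

Tones are listed per syllable. The LH melody mapping is inferred from the two
Kalabari examples in (1) and is an assumption, as is the handling of an
already H-final verb under the non-dominant H.
"""


def dominant_lh(base_tones):
    """Replace all lexical tones with the LH melody; input tones are ignored."""
    n = len(base_tones)
    if n == 1:
        return ["LH"]  # cf. kán -> kàán, where the vowel lengthens to host LH
    return ["L"] * (n - 1) + ["H"]


def non_dominant_final_h(base_tones):
    """Add an H at the right edge without erasing the lexical tones."""
    out = list(base_tones)
    if not out[-1].endswith("H"):
        out[-1] = out[-1] + "H"  # cf. sì -> sìí before the future auxiliary ɓà
    return out


if __name__ == "__main__":
    print(dominant_lh(["H"]))              # kán 'demolish' -> 'be demolished'
    print(dominant_lh(["H", "H", "ꜜH"]))   # kíkíꜜmá 'hide' -> 'be hidden'
    print(non_dominant_final_h(["L"]))     # sì '(be) bad' + future /ɓà/
```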

12.3 Word accent

While our understanding of tone in African languages is well developed, there has been considerably less clarity about the status of word stress. In this section we adopt word accent (WA) as a cover term to include word stress and other marking of one and only one most prominent syllable per word. In the most comprehensive survey of WA in sub-Saharan Africa to date, the studies cited by Downing (2010) describe individual African languages with WA assigned at the left or right edge, on the basis of syllable weight, or by tone. However, many if not most authors either fail to report word stress or explicitly state that there is no stress, rather only tone. Thus in Lomongo, ‘stress is entirely eclipsed by the much more essential marking of tones’ (Hulstaert 1934: 79, our translation). Some of the relatively few non-tonal languages do appear to have WA, such as initial (~ second syllable) in Wolof (Ka 1988; Rialland and Robert 2001) and penultimate (~ antepenultimate) in Swahili (Vitale 1982). Other non-tonal languages appear not to have WA at all, but rather mark their prosody at the phrase level, such as by lengthening the vowel of the phrase-penultimate syllable and assigning a H tone to its first mora in Tumbuka (Downing 2017: 369–370). Kropp (1981) describes the stylistic highlighting (‘stress’) of different syllables within the pause group in Ga [Kwa; Ghana]. While descriptions of many tone languages do not mention stress or accent, we do find occasional attempts to predict WA from tone. In Kpelle, Welmers (1962: 86) shows that basic (unaffixed, single-stem) words can have one of five tone melodies H, M, HL, MHL, and L. He goes on to say that when these melodies are mapped onto bisyllabic words as H-H, M-M, H-L, M-HL, and L-L, accent falls on the initial syllable if its tone is H or L, otherwise on the second syllable if its tone is HL. Words that are M-M are ‘characterized by lack of stress’ (Welmers 1962: 86). However, the fact that some words are accentless makes the analysis suspect, as obligatoriness is a definitional property of stress in non-tone languages.
Since the MHL and M melodies derive from /LHL/ and /LH/, respectively (Hyman 2003: 160), Welmers’ accent would have to be a very low-level phenomenon, assigned after the derivation of LH → M. We suspect that other claims of WA based on tonal distinctions are equally due to the intrinsic properties of pitch and other factors unrelated to WA. While in cases such as Kpelle there is a lack of additional phonetic or phonological evidence for WA, in a number of sub-Saharan African languages the stem- (or root-)initial syllable is an unambiguously ‘strong’ position licensing more consonant and vowel contrasts than pre- and post-radical positions, where ‘weaker’ realizations are also often observed. Perhaps the best-known case is Ibibio, whose consonant contrasts are distributed within the prosodic stem as in (4) (Urua 2000; Harris and Urua 2001; Akinlabi and Urua 2006; Harris 2004).

(4) a. prosodic stem structures:   CV, CVC, CVVC, CVCV, CVVCV, CVCCV
    b. stem-initial consonants:    b f m t d s n y ɲ k ŋ kp w
    c. coda consonants:            p m t n y k ŋ
    d. intervocalic VCV:           β m ɾ n ɣ ŋ
    e. intervocalic VCCV:          pp mm tt nn yy kk ŋŋ

As indicated, 13 consonants contrast stem-initially versus six or seven in the other positions. The intervocalic weakening of stops to [β, ɾ, ɣ] only between the two stem syllables (not between prefix and stem, for instance), as well as the realization of /i, u/ as [ɨ, ʌ] in stem-initial position, points to the formation of a prosodic stem with a strong-weak foot structure: /díp/ → [dɨ́p] ‘hide’, /díp-á/ → [dɨ́βé] ‘hide oneself’. In addition, although the first syllable can have any of the six vowels /i e u o ɔ a/, the second syllable is limited to a single vowel analysable as /a/, which assimilates to the vowel of the first syllable (cf. [tòβó] ‘make an order’, [dééβ-é] ‘not scratch’, [kɔ́ŋ-ɔ́] ‘be hung’).
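The counts in (4) and the stem-internal weakening can be checked with a small sketch that transcribes the inventories by position. The pairing p → β, t → ɾ, k → ɣ is inferred by lining up rows (4c) and (4d) and matching the order of ‘stops to [β, ɾ, ɣ]’ in the text; it is an interpretive assumption, as is the choice to leave /y/ unhandled, since (4) does not say what happens to it intervocalically. The function name is hypothetical.

```python
"""Ibibio consonant contrasts by position, transcribed from (4) (toy sketch)."""

IBIBIO_CONSONANTS = {
    "stem-initial": ["b", "f", "m", "t", "d", "s", "n", "y", "ɲ", "k", "ŋ", "kp", "w"],
    "coda": ["p", "m", "t", "n", "y", "k", "ŋ"],
    "intervocalic VCV": ["β", "m", "ɾ", "n", "ɣ", "ŋ"],
    "intervocalic VCCV": ["pp", "mm", "tt", "nn", "yy", "kk", "ŋŋ"],
}

# Pairing of coda stops with their weakened stem-internal realizations,
# inferred by lining up rows (4c) and (4d); an interpretive assumption.
WEAKENED = {"p": "β", "t": "ɾ", "k": "ɣ"}
UNCHANGED_NASALS = {"m", "n", "ŋ"}


def stem_internal(consonant):
    """Realization of a stem-final consonant between the two stem syllables.

    Only stops and nasals are handled; /y/ is not covered by the sketch.
    """
    if consonant in WEAKENED:
        return WEAKENED[consonant]
    if consonant in UNCHANGED_NASALS:
        return consonant
    raise ValueError(f"no stem-internal realization modelled for /{consonant}/")


if __name__ == "__main__":
    for position, inventory in IBIBIO_CONSONANTS.items():
        print(f"{position}: {len(inventory)} consonants")
    print("/díp-á/ second consonant:", stem_internal("p"))  # cf. [dɨ́βé]
```

Mapping the six stops and nasals of (4c) through this function yields exactly the six consonants of (4d), which is consistent with reading (4d) as the weakened counterpart of the coda inventory.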

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

190 Larry M. Hyman et al.
Such distributional asymmetries are an important and widespread areal feature in West and Central Africa, in a zone extending from parts of Guinée, Côte d’Ivoire, and Burkina Faso in the West to Gabon and adjacent areas in the two Congos, partly overlapping with what Güldemann (2008) identifies as the core of the Macro-Sudan Belt. Most languages in this stem-initial prominence area are from the NC stock. However, the pattern whereby the initial syllable is favoured with consonant and vowel contrasts while the second is starved of them is an areal feature and cuts across families. It is strongest in the centre of the area (i.e. on both sides of the Nigeria–Cameroon border) and decreases towards the periphery. Most peripheral NC languages have very few such distributional asymmetries (e.g. North-Central Atlantic, Bantu south of the Congo); the pattern is present in Northwest Bantu, but not (or not as much) in the rest of Bantu. Finally, most non-NC languages with similar distributional asymmetries are found at the periphery of the area, where they are likely to have acquired stem-initial prominence through contact with NC languages. Examples include the Chadic languages spoken on the Jos Plateau next to Benue-Congo languages with stem-initial prominence, such as Goemai, which has a long history of contact with Jukun (cf. Hellwig 2011: 6). Similarly, the initial-prominent Chadic languages Ndam (Broß 1988) and Tumak (Caprile 1977) and the isolate Laal (Boyeldieu 1977; Lionnet, personal notes) are spoken in southern Chad next to Lua and Ba, two Adamawa languages with strong stem-initial prominence (Boyeldieu 1985; Lionnet, personal notes). Nilo-Saharan languages, most of which are spoken far from the stem-initial prominence area, do not seem to have similar distributional asymmetries. This is also true of Saharan or Bongo-Bagirmi languages, spoken relatively close to the area.
Stem-initial prominence cued by segmental distributional asymmetries thus seems to be an areal feature within the Macro-Sudan Belt, affecting mostly NC languages (cf. Table 12.2 section a), but also neighbouring unrelated languages through contact (cf. Table 12.2 section b). However, as in the case of multiple tone heights, the Kalahari Basin area acts as a southern counterpart to the Macro-Sudan Belt in being an area of strong stem-initial prominence. In most of the languages formerly known as ‘Khoisan’, lexical stems strictly conform to the phonotactic templates C(C)1V1C2V2, C(C)1V1V2, and C(C)1V1N. The stem may start with virtually any consonant in the (sometimes very large) inventory, including clicks, and any of the attested clusters (only a few sonorants are not attested stem-initially), while only a handful of consonants, mostly sonorants, are attested in C2 (cf. Table 12.2 section c).

Table 12.2 Stem-initial prominence marked by distributional asymmetries

a. Mande         Guro (Vydrin 2010), Mano (Khachaturyan 2015)
   Gur           Konni (Cahill 2007), Koromfe (Rennison 1997)
   Gbaya         Gbaya Kara Bodoe (Moñino and Roulon 1972)
   Adamawa       Lua (Boyeldieu 1985), Kim (Lafarge 1978), Day (Nougayrol 1979), Mundang (Elders 2000), Mambay (Anonby 2010), Mumuye (Shimizu 1983), Dii/Duru (Bohnhoff 2010)
   Plateau       Izere (Blench 2001; Hyman 2010c), Birom (Blench 2005; Hyman 2010c)
   Cross River   Ibibio (Urua 2000; Harris and Urua 2001; Akinbiyi and Urua 2002; Harris 2004), Gokana (Hyman 2011)
   NW Bantu      Kukuya (Paulian 1975; Hyman 1987), Tiene (Ellington 1977; Hyman 2010c), Basaa (Hyman 2008), Eton (Van de Velde 2008)
b. Chadic        Goemai (Hellwig 2011), Tumak (Caprile 1977), Ndam (Broß 1988)
   Isolate       Laal (Lionnet, personal notes)
c. ‘Khoisan’     (Beach 1938; Traill 1985; Miller-Ockhuizen 2001; Nakagawa 2010)


The initial stem syllable may also affect how tone rules apply, e.g. by attracting a H tone to it, as in Giryama (Volk 2011b), or by stopping the further spread of H, as in Lango, where Noonan (1992: 51) states that ‘primary stress in Lango is invariably on the root syllable’. Since stem-initial stress is common cross-linguistically, it is natural to identify such stem-initial effects with the broader concept of WA, despite the otherwise elusive nature of WA in sub-Saharan African languages. For further discussion see Hyman (2008: 324–334), Downing (2010: 408–409), Hyman et al. (2019), and references cited therein.

12.4 Intonation
Focusing on the prosodic features that mark sentence type or syntactic domain, we follow Ladd’s (2008b) definition: ‘Intonation, as I will use the term, refers to the use of suprasegmental phonetic features to convey “post-lexical” or sentence-level pragmatic meanings in a linguistically structured way’ (italics retained from Ladd). Following Ladd, we leave aside paralinguistic functions of intonation, such as enthusiasm or excitement, as achieved by tempo and pitch range modulations. A number of African languages distinguish sentence types with intonational pitch contours, often in addition to the lexical tones, GTs, or WA found in the language. Other prosodic features, such as length, are also used to mark phrasal boundaries. However, some highly tonal languages in Africa show little to no effect of intonation at all. Perhaps expectedly, there seems to be a correlation between high numbers of contrastive lexical and grammatical level tones and a lack of intonational contours to mark sentence type. For example, Connell (2017) describes the prosodic system of Mambila, a Bantoid language (Nigeria and Cameroon) with four contrastive tone heights in addition to GT marking, as having no consistent f0 contribution in indicating sentence type (i.e. declarative sentence vs. polar question). This section surveys intonational tendencies in polar questions, declarative sentences, and focus constructions across sub-Saharan African languages.

12.4.1 Pitch as marking sentence type or syntactic domain
One particularly salient property of intonation contours in a wide range of sub-Saharan African languages is the lack of a rising right-edge boundary tone in polar questions (Clements and Rialland 2008: 74–75; Rialland 2007, 2009). However, even languages that lack a H% in polar questions by and large show pitch raising in some respect, either through register raising (as in Hausa and Lusoga) or through raising of a H before a final L%. In a sample of over 100 African languages, Clements and Rialland (2008) found that more than half lack an utterance-final high or rising contour in polar questions. A number of languages show no intonational difference between declarative sentences and polar questions. Others make use of utterance-final low or falling intonation in polar questions. Specifically, such marking of polar questions by a final boundary tone L% is found in most Gur languages, as well as a number of Mande, Kru, Kwa, and Edoid languages, suggesting that it is an areal feature of West Africa. Clements and Rialland (2008: 77) found no Bantu, Afro-Asiatic, or Khoisan languages that mark polar questions with a final L%,² though see Rialland and Embanga Aborobongui (2017) on Embosi, a Bantu language with a HL% falling boundary tone in polar questions. Further east in Lusoga, another Bantu language, there is a right-edge H% tone in declaratives, but a L% in interrogatives and imperatives (Hyman 2018). All of the verb forms and the noun ‘farmers’ in (5a) are underlyingly toneless, while in (5b) ‘women’ has a H to L pitch drop from /ba-/ onto the first syllable of the noun root /kazi/.

(5) a. Declarative:    à-bál-á á-bá-límí    ‘s/he counts the farmers’
       Interrogative:  à-bàl-à à-bà-lìmì    ‘does s/he count the farmers?’
       Imperative:     bàl-à à-bà-lìmì      ‘count the farmers!’
    b. Declarative:    à-bál-á á-bá-kàzí    ‘s/he counts the women’
       Interrogative:  à-bál-á á-ba̋-kàzì    ‘does s/he count the women?’
       Imperative:     bàl-à à-ba̋-kàzì      ‘count the women!’
    (a- ‘3sg noun class 1 subject’, -bal- ‘count’, -a ‘final inflectional vowel’, a- ‘augment determiner’, -ba- ‘class 2 noun prefix’, -limi ‘farmer’, -kàzi ‘woman’)

While the speaker typically raises the pitch register to produce the completely toneless interrogative utterance in (5a), the whole sequence trends down towards the final L%. In the interrogative in (5b), the phonological L that follows the H is realized on super-high pitch, with subsequent TBUs progressively anticipating the level of the L%.³ This widespread L% question marking across sub-Saharan Africa is surprising from a cross-linguistic perspective, since a H% or rising intonation in polar questions has been noted across language families (Bolinger 1978: 147) and was at one time thought to be a near-universal linguistic property (Ohala 1984: 2).

On the other hand, a large number of African languages show a right-edge L% in declarative sentences. Of the 12 African language prosodic systems described in Downing and Rialland (2017a), 10 display a L% in declaratives. The two exceptions are Basaa [Bantu; Cameroon] (Makasso et al. 2017) and Konni [Gur; Ghana] (Cahill 2017). See also the discussion above of Lusoga, which has a H% in declaratives.

Remarkably from a cross-linguistic perspective, not many African languages use prosody to mark focus constructions. While there are a number of distinct focus constructions found across African languages (Kalinowski 2015), intonation plays little to no role in focus marking. According to Kalinowski (2015: 159), ‘It is evident from the collection of examples from these 135 languages that focus encoding in African languages is largely morphosyntactic in nature.
While prosodic cues of stress and intonation may also be involved, they are not the primary means of encoding focus.’ However, there are a few exceptions to the rule, where focused elements show a marked intonation contour: Hausa (Inkelas 1989b; Inkelas and Leben 1990), Chimwiini [Bantu; Somalia] (Kisseberth 2016), Akan (Kügler 2016), Shingazidja [Bantu; Comoros] (Patin 2016), and Bemba [Bantu; Zambia] (Kula and Hamann 2017). In Hausa (Inkelas et al. 1986; Inkelas and Leben 1990), almost any word in an utterance can be emphasized (focused) by raising the first high tone in the emphasized word, which marks the beginning of an intonational domain. Phonological alternations that apply only within and not across intonational domains in Hausa (i.e. downdrift and raising of underlying low tones between two underlying high tones) do not apply between an emphasized word and the preceding word.

² Hausa shows an optional low boundary tone in polar questions (Newman and Newman 1981); however, there is also clear register raising in polar questions (Inkelas and Leben 1990), which rules it out of Clements and Rialland’s list.
³ Concerning the imperative, it is also possible to say [bàl-à à-bá-kàzí] if the meaning is a suggestion, e.g. ‘what should I do?’, answer: ‘count the women!’. It is never possible to show a final rise in a question.

In languages with both complex tonal inventories and intonation, the two sometimes interact. In Embosi [Bantu; Congo-Brazzaville] (Rialland and Aborobongui 2017), intonational boundary tones are superimposed onto lexical tones, resulting in super-high and super-low edge tones. In (6), the final lexical L is produced with super-low pitch due to the utterance-final L%.

(6) [wáβaaɲibeabóowéȅ]   (Rialland and Aborobongui 2017: 202)
    wa        áβaaɲi              bea           bá       (m)o-we
    3sg.pro   3sg.take.away.rec   cl8.property  cl8.gen  cl1-deceased
    ‘He took away the properties of the deceased.’

In Hausa, the final falling tone (HL%) in interrogatives neutralizes the difference between underlying H and underlying HL (Inkelas and Leben 1990). For example, word-final kai, ‘you’, with underlying H, and kâi, ‘head’, with underlying HL, are both produced with a HL fall as the final word in a question. In addition, downdrift is suspended both in questions and under emphasis in Hausa (Schachter 1965). In other languages with both tone and intonation, the two have very little effect on each other.

In a number of languages, coordination is often optionally marked with intonation alone. This is the case, for example, in Jamsay (Dogon: Heath 2008: 136–138), a two-tone language where coordinated NPs can simply be listed, the coordination being marked on every coordinated element only by what Heath terms ‘dying quail’ intonation, characterized by the ‘exaggerated prolongation of the final segment (vowel or sonorant), accompanied by a protracted, slow drop in pitch lasting up to one second’, e.g. /wó∴ kó∴/ → [wóōò kóōò] ‘he/she and it’. A similar phenomenon is attested in Laal (Lionnet, personal notes), which has three tones (H, M, and L) and in which two intonational patterns marking emphatic conjunction are attested. In both cases, the conjoined NPs are juxtaposed, and the coordination is marked only by a specific word-final pitch contour. In the first case, illustrated in (7a), the last syllable of every coordinated member is realized with rising pitch, irrespective of the value of the last lexical tone. The second strategy is preferred when the list is very long: the word-final rhyme is significantly lengthened, and the rising pitch is followed by a fall to mid pitch, as shown in (7b).

(7) a. bààr↗       náár↗        í      tú   pār   → [bàa̋r náa̋r…]
       his.father  his.mother   it.is  Bua  all
       ‘Both his father and his mother are Bua [ethnic group].’
    b. í      sèré↗↘   í      cáŋ↗↘  kə́w   kíínà  pār   → [í sèrée̋ē í cáŋ̋ŋ̄…]
       it.is  S.Kaba   it.is  Sara   also  do.it  all
       ‘The Sara Kaba, the Sara, etc. everyone used to practise it too (slavery).’


12.4.2 Length marking prosodic boundaries
Pitch is not the only phonetic parameter used to demarcate utterance and phrasal boundaries. A number of Bantu languages display lengthening of the penultimate vowel of a particular syntactic domain or sentence type. For instance, in Shekgalagari, which contrasts /H/ vs. Ø, the penultimate vowel is lengthened in declarative utterances, creating differences between medial and final forms of nouns, as shown in (8) (Hyman and Monaka 2011: 271–272). While nouns with /Ø-H/ and /H-Ø/ patterns simply show vowel lengthening, penultimate lengthening affects the tone of the last two syllables of the other two patterns. When the last two syllables are toneless, the lengthened penultimate vowel contours from a L to super-low tone. When both of the last two syllables are /H/, the final H is lost and the penultimate H contours from H to L.
(8)

     Underlying   Medial   Final    Gloss
     /Ø-Ø/        nàmà     nȁːmà    ‘meat’
     /Ø-H/        nàwá     nàːwá    ‘bean’
     /H-Ø/        lórì     lóːrì    ‘lorry’
     /H-H/        nárí     nâːrí    ‘buffalo’

Penultimate lengthening does not occur in interrogative or imperative sentence types, where the final tones are realized as in medial position: [à-bàl-à rì-nárí] ‘is s/he counting buffalos?’ (cf. [à-bàl-à rì-nâːrí] ‘s/he is counting buffalos’). See also Selkirk (2011) for clause-level penultimate lengthening in Xitsonga and Hyman (2013) for a survey of the status of penultimate lengthening in different Bantu languages.

12.5 Conclusion
The prosodic systems of sub-Saharan languages are quite varied. While tone is almost universal in the area, some languages have very dense tonal contrasts, some sparse; some languages make extensive grammatical use of tone, some less; and so forth. Word stress is less obvious in most languages of the area, and the question of whether stem-initial prominence should be equated with WA remains unresolved. Finally, while intonation is less studied, the recent flurry of intonational studies is very encouraging.


chapter 13

North Africa and the Middle East
Sam Hellmuth and Mary Pearce

13.1 Introduction
This chapter reviews the prosodic systems of languages spoken in North Africa and the Middle East, taking in the Horn of Africa, the Arabian Peninsula, and the Middle East. The area’s southern edge is formed by Mauritania, Mali, Niger, Chad, South Sudan, Ethiopia, and Somalia, as illustrated in Map 13.1, which indicates word-prosodic features by the greyscaled locator circles.¹ We outline the scope of typological variation within and across the Afro-Asiatic and Nilo-Saharan language families in word prosody, prosodic phrasing, melodic structure, and prosodic expression of meaning (sentence modality, focus, and information structure). The survey is organized around language sub-families (§13.2 and §13.3). We close with a brief discussion in §13.4, where we also set out priorities for future research. In this chapter the term ‘stress’ denotes word-level or lexical prominence. We assume that tone and stress are independent, with no intermediate accentual category (Hyman 2006). The term ‘pitch accent’ thus always denotes a post-lexical prominence or sentence accent, as used in the autosegmental-metrical framework (Gussenhoven 2004; Ladd 2008b).

13.2 Afro-Asiatic

13.2.1 Berber
The Berber languages, now known as Amazigh, are all non-tonal but appear to vary regarding the presence of stress. The Eastern varieties (in Tunisia, Libya, and Egypt) display word-level stress (Kossmann 2012), though without stress minimal pairs. Relatively little is known about the word prosody of most Libyan dialects, such as Ghadames (Kossmann 2013), but in Zwara stress generally falls on the penult (Gussenhoven 2017a). In contrast, in the Northern varieties (in Morocco and Algeria), although it is possible to formulate rules for stress assignment in citation forms, these do not hold in connected speech (Kossmann 2012). For example, in Tarifit, prosody marks both clause structure and discourse structure, but pitch and intensity do not routinely co-occur (McClelland 2000). Similarly, in Tuareg, although stress can be described for citation forms (lexically determined in nouns and verbs but on the antepenultimate otherwise), accentual phrasing overrides these citation-form stress patterns in ways that are as yet poorly understood and require further investigation (Heath 2011: 98). Experimental investigation has clarified this variable pattern in Tashlhiyt, which emerges as a non-tonal, non-stress language (without culminative stress). For example, in Tashlhiyt the intonational peak in polar questions varies probabilistically: sonorant segments tend to attract the pitch accent, and tonal peaks are later in questions than in statements (Grice et al. 2015); a similar pattern is found in wh-questions (Bruggeman et al. 2017). Intonational peaks in Tashlhiyt thus do not display the kind of consistent alignment that might indicate underlying association with a stressed syllable. In contrast, the intonation patterns of Zwara, which has word-level stress, are readily analysed in terms of intonational pitch accents and boundary tones (Gussenhoven 2017a).

¹ We also mention languages in the Nilotic family spoken further south, in Uganda, Kenya, and into Tanzania.

[Map 13.1 legend: stress; no stress; one tone; two tones; three or more tones]
Map 13.1 Geographical location of languages treated in this chapter, with indications of the presence of stress and of tone contrasts (1 = binary contrast; 2 = ternary contrast; 3+ = more complex system), produced with the ggplot2 R package (Wickham 2009)


In general, Amazigh languages make use of an initial or final question particle in polar questions and display wh-word fronting (Frajzyngier 2012). Focused elements are usually fronted but can also be right-dislocated, with associated prosodic effects; a topic is similarly placed clause-initially and marked by intonation (Frajzyngier 2012). Verb focus can be marked solely prosodically in most Amazigh varieties, with the exception of Kabyle, which requires the verb to be clefted (Kossmann 2012: 94).

13.2.2 Egyptian
The now extinct Egyptian language went through several stages (Old, Middle, Late, and Demotic) before evolving into Coptic. There is no indication that the language had contrastive tone at any stage. Egyptian had wh-in-situ, and it is assumed (Frajzyngier 2012) that at all stages a polar question could be realized on declarative syntax by changing the intonation contour. It had a set of stressed pronouns and a focus particle, and topicalization was realized through extraposition and a particle. The Coptic language was spoken from the fourth to the fourteenth centuries, cohabiting with Arabic from the ninth century onwards, and survives only in the liturgy of the Coptic Orthodox Church. Reconstructing from Coptic, it is likely that stress in Egyptian fell on either the final or the penultimate syllable; it is reported to have been marked by ‘strong expiratory stress’ (Fecht 1960; cited in Loprieno and Müller 2012: 118). Questions in Coptic were marked by particles and ‘possibly also by suprasegmental features such as intonation’ (Loprieno and Müller 2012: 134).

13.2.3 Semitic
The Semitic languages are almost all non-tonal stress languages (exceptions are noted below).

13.2.3.1 East Semitic
Evidence from texts in the now extinct Akkadian language indicates that it did not have phonemic stress, but otherwise little is known about its prosody (Buccellati 1997). It displayed fronting of topics and right-dislocation with resumptive pronouns (Gragg and Hoberman 2012).

13.2.3.2 West Semitic: Modern South Arabian
In the western Modern South Arabian languages (Hobyot, Bathari, Harsusi, and Mehri), stress falls on the rightmost long syllable in the word, else on the initial syllable; in the eastern languages, Jibbali can have more than one prominent syllable per word, while in Soqotri stress falls towards the beginning of the word (Simeone-Senelle 1997, 2011). Polar questions are marked in the Modern South Arabian languages by means of intonation alone, and wh-words are either always initial (e.g. Soqotri) or always final (e.g. Mehri) (Simeone-Senelle 1997, 2011). A recent investigation of co-speech gestures in Mehri and Shehri (Jibbali) notes that intonation is used in Mehri to mark the scope of negation, though without explicitly describing the prosodic means used to achieve this effect (Watson and Wilson 2017).


13.2.3.3 West Semitic: Ethio-Semitic
Although Ge’ez is no longer spoken, tradition suggests that stress fell on the penult in verbs but was stem-final in nouns and pronouns, with some exceptions (Gragg 1997; Weninger 2011). The position of stress in Tigrinya has been described as shifting readily from one position to another and is not always marked in parallel by dynamic stress correlates (intensity/duration) and pitch. Kogan (1997: 439) therefore suggests that ‘sentence intonation is clearly predominant over the stress of an individual word’, resembling descriptions noted above for Amazigh varieties that lack stress. A similar pattern is reported for neighbouring Tigré (Raz 1997). In Amharic, stress is described as ‘not prominent’, falling primarily on stems but displaying some interaction with syllable structure, and requiring further research (Hudson 1997). In other Ethio-Semitic languages, descriptions tend to be limited to a statement that stress is not phonemic, without elaborating further (e.g. Wagner 1997 for Harari), or make no mention of stress at all (Watson 2000). Hetzron (1997a) suggests that there is variation among Outer South Ethiopic languages, with only the most ‘progressive’ (Inor) displaying discernible stress (typically on a final heavy syllable, else on the penult). In Amharic, polar questions are marked by rising intonation, a clause-final question marker, or a verbal suffix (Hudson 1997); in wh-questions the wh-word occurs before the sentence-final verb (Frajzyngier 2012). Questions are formed by means of a question particle attached to the questioned constituent in Tigrinya (Kogan 1997), and by an optional sentence-final particle in the Outer South Ethiopic languages (Hetzron 1997a).

13.2.3.4 Central Semitic: Sayhadic
Little is known about the stress system or any other aspect of the prosody of the now extinct Sayhadic languages (Kogan and Korotayev 1997).

13.2.3.5 Central Semitic: North West Semitic
In Biblical Hebrew, stress was generally final, with exceptions (Edzard 2011); stress markings in the codified Masoretic text of the Hebrew Bible show that stress position was contrastive and that surface vowel length was governed by stress (Steiner 1997). Segmental sandhi, known as ‘pausal forms’, are observed at phrase boundaries (McCarthy 1979b, 2012), and prosodic rhythm rules applied in the Tiberian system (Dresher 1994). Stress in Modern Hebrew falls on the final or penult syllable, with some morphological exceptions (Berman 1997; Schwarzwald 2011), as it most likely did in early Aramaic. In the Eastern Neo-Aramaic languages, stress tended to fall on the penult, whereas in the West Aramaic languages the position of stress depends on syllable structure, as for Arabic (Jastrow 1997; Arnold 2011; Gragg and Hoberman 2012).

13.2.3.6 Central Semitic: Arabian
Little is known of the prosody of the extinct Ancient North Arabian languages. The other members of the Arabian family form five regional groups of spoken dialects across North Africa, Egypt and Sudan, the Levant, Mesopotamia, and the Arabian Peninsula (Watson 2011).


The position of primary stress in the word is in general predictable from syllable structure in Arabic dialects (as also in Maltese) and there is an extensive literature on microvariation between dialects in stress placement, as illustrated in Table 13.1 (see summaries in van der Hulst and Hellmuth 2010; Watson 2011; Hellmuth 2013).

Table 13.1 Stress assignment in different Arabic dialects

                      ‘book’     ‘writer’   ‘library’    ‘he wrote’
Standard Arabic       kiˈtaːb    ˈkaːtib    ˈmaktaba     ˈkatab
Palestinian Arabic    kiˈtaːb    ˈkaːtib    ˈmaktaba     ˈkatab
Lebanese Arabic       kiˈtaːb    ˈkaːtib    ˈmaktabe     ˈkatab
Cairene Arabic        kiˈtaːb    ˈkaːtib    makˈtaba     ˈkatab
Negev Bedouin         kiˈtaːb    ˈkaːtib    ˈmaktabah    kaˈtab

(Adapted from Hellmuth 2013: 60)

Exceptions to the general rule of predictable stress in Arabic include Nubi (derived from an Arabic pidgin), which has an accentual system (Gussenhoven 2006), and Moroccan Arabic, in which the status of stress is disputed. Maas and Procházka (2012) argue that Moroccan Arabic and Moroccan Berber (including Tashlhiyt) form a Sprachbund, sharing a large number of features across all linguistic domains, including phonology. They thus argue that Moroccan Arabic, like Moroccan Berber (see §13.2.1), has post-lexical phrasal accentuation only, and no stress. There have been differing views on Moroccan Arabic stress (Maas 2013), since a stress generalization can be formulated for citation forms that no longer holds in connected speech (Boudlal 2001). One suggestion is that Moroccan Arabic has stress but is an ‘edge-marking’ language with boundary tones only and no prominence-marking intonational pitch accents (Burdin et al. 2015). Indeed, the descriptive observation is that tonal peaks occurring on a phrase-final word align with the syllable that would be stressed in citation form (Benkirane 1998), as also confirmed in corpus data (Hellmuth et al. 2015). This suggests that the peak is neither solely prominence-marking nor solely edge-marking, requiring analysis either as an edge-aligned pitch accent, as proposed for French (Delais-Roussarie et al. 2015), or as a non-metrical pitch accent (Bruggeman 2018). A recent comparative study (Bruggeman 2018) shows that Moroccan Arabic and Moroccan Berber speakers both demonstrate perceptual insensitivity to lexical prominence asymmetries, of the type shown by speakers of other languages known to lack word-level stress, such as French or Farsi (Rahmani et al. 2015).

Standard Arabic is not acquired by contemporary speakers as a mother tongue but is instead learned in the context of formal religious or state education.
It is possible to formulate a stress algorithm for Standard Arabic (Fischer 1997), and stress rules are described for learners of Arabic (Alhawary 2011), but Gragg and Hoberman (2012: 165) note that the traditional Arab grammarians did not describe the position of stress in Classical Arabic, and take this as evidence that Classical Arabic did not have stress and was ‘like modern Moroccan Arabic’. Retsö (2011) similarly suggests that the absence of stress–morphology interaction in Classical Arabic indicates that it had a system in which prominence was marked only by pitch. The prosody of Standard Arabic, as used today in broadcasting and other formal settings, most likely reflects the prosodic features of a speaker’s spoken mother-tongue dialect (cf. Retsö 2011). For stress, this generates micro-variation in stress assignment patterns in Standard Arabic in different contexts, such as Cairene Classical Arabic versus Egyptian Radio Arabic (Hayes 1995). For intonation, some sharing of tonal alignment features between colloquial and Standard Arabic was found in a small study of Egyptian Arabic speakers (El Zarka and Hellmuth 2009). Prosodic juncture is marked in Standard Arabic by ‘pausal forms’, whereby grammatical case and other suffixes appear in a different form when phrase-final (Hoberman 2008; Abdelghany 2010; McCarthy 2012), as in Table 13.2. Accurate use of pausal forms is part of tajwīd rules for recitation of the Qur’ān (Al-Ali and Al-Zoubi 2009).

Table 13.2 Pausal alternations observed in Classical Arabic (McCarthy 2012)

Alternation                           Phrase-internal   At pause      Gloss
Absence of suffix case vowel          ʔalkitaːb-u       ʔalkitaːb     ‘the book (nom)’
Epenthesis of [h] after stem vowel    ʔiqtadi           ʔiqtadih      ‘imitate (3ms.imp)’
Metathesis of suffix vowel            ʔal-bakr-u        ʔal-bakur     ‘the young camel (nom)’
Absence of suffixal [n]               kitaːb-un         kitaːb        ‘a book (nom)’
[ah] for suffix [at]                  kaːtib-at-un      kaːtib-ah     ‘a writer (f.nom)’

There are relatively few descriptions of cues to phrasing in spoken Arabic dialects (Hellmuth 2016), but it is likely that dialects vary in their ‘default’ prosodic phrasing, much as Romance languages do: in Spanish, a phrase boundary is typically inserted after the subject in an SVO sentence, but not in Portuguese (Elordieta et al. 2005), and a similar pattern appears to differentiate Jordanian Arabic and Cairene Arabic (Hellmuth 2016). Segmental sandhi mark prosodic boundaries in some dialects: laryngealization in dialects of the Arabian Peninsula (Watson and Bellem 2011) and Tunisia (Hellmuth 2019), diphthongization of final vowels in the Levant, and nasalization in western Yemen (Watson 2011). Further research is needed to determine whether these cues mark syntactic structure or some other aspect of discourse structure, such as turn-finality.

Focus and topic marking are achieved in spoken Arabic through a mixture of syntactic and prosodic means, including clefts or pseudo-clefts with associated prosodic effects. In most varieties a polar question can be realized through prosodic means alone; dialects vary with respect to wh-fronting versus wh-in-situ (Aoun et al. 2010). Focus can also be marked by prosodic means alone in many if not all dialects (see the literature review in Alzaidi et al. 2018).

There is a growing body of literature on the intonational phonology of Arabic dialects (see summaries in Chahal 2006; El Zarka 2017). So far, all Arabic dialects outside North Africa appear to display intonation systems comprising both pitch accents and boundary tones. Variation in the inventory of nuclear contours (nuclear accent + final boundary tone combinations), as reported in Chahal (2006), suggests dialectal variation in the inventory of boundary tones, at least, and further comparative work may reveal variation in pitch accent inventories.
Retsö (2011) notes variation across dialects in the phonetic realization of stress, differentiating ‘expiratory accent’ in Levantine varieties from ‘tonal accent’ in Cairene; this observation has been reanalysed in the autosegmental-metrical framework as variation in the distribution of pitch accents, occurring on every prosodic word in Cairene but more sparsely distributed, at the phrasal level, in Levantine (Hellmuth 2007; Chahal and Hellmuth 2014a).

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

NORTH AFRICA AND THE MIDDLE EAST 201

13.2.4 Chadic

Chadic languages are tonal. Many Chadic languages (e.g. Migaama, Mofu, and Mukulu) are open to analysis as ‘accentual languages’ in which there is at most one H tone per word, accompanied by other indicators of prominence (Pearce 2006), but others (e.g. Kera, Masa, and Podoko) have three tones and a variety of lexical tone melodies on nouns. A common explanation for this variety within the Chadic family is that a process of tonogenesis has generated a tonal split from a single-tone system into two tones in some languages, and into three in others (Wolff 1987). A typical example, as illustrated for Musgu in (1), is a system where syllables with voiceless onsets usually carry a H tone and syllables with voiced onsets usually carry a L tone; sonorants and implosives may be associated with a third tone (M), or they may pattern with one of the other groups.
(1) Musgu depressor and raiser consonants (Wolff 1987)
    depressor:  L  zìrì ‘align’    vìnì ‘take’
    raiser:     H  sírí ‘squash’   fíní ‘stay’
    neutral:    L  yìmì ‘trap’
                H  yímí ‘be beautiful’
The wide variety of systems observed in Chadic suggests that tonogenesis probably occurred independently in separate languages rather than once in proto-Chadic (Jim Roberts, personal communication). Whatever the diachronic history, in the synchronic grammar the roles may become reversed: in Kera it is now tone that is phonemic, with laryngeal voice onset time cues serving as secondary enhancement to the tone cues (Pearce 2005). The function of tone in Chadic is lexical as well as grammatical (Frajzyngier 2012), and most languages appear to display little tone movement or spreading, and probably no downstep (Jim Roberts, personal communication); exceptions, however, include Ga’anda, which has floating tones and associated downdrift (Ma Newman 1971), and Ngizim, which has tone spreading and downstep (Schuh 1971). Hausa has two basic tones, H and L. Surface falling tones derive from adjacent underlying HL sequences (e.g. due to affixation) but can be realized only on a heavy syllable; in contrast, underlying LH sequences are truncated to a surface high tone (P. Newman 2000, 2009). A more complex case is Kera, which has three tones in rural speech communities, but in urban varieties (where there has been prolonged contact with French) the system reduces to two tones plus a voicing contrast in some contexts, and the change is sociolinguistically conditioned: among women in the capital, there is an almost complete loss of tone (Pearce 2013). Although Kera is cited as one of the few languages to exhibit long-distance voicing harmony between consonants (Odden 1994; Rose and Walker 2004), the facts can be accounted for by proposing tone spreading with voice onset time corresponding to the tone (Pearce 2006). Similarly, it has been claimed that Kera voiced (‘depressor’) consonants lower the tone of the following syllable (Ebert 1979; Wolff 1987; Pearce 1999), but acoustic analysis confirms that although there is surface consonant and tone interaction, it is the tones that are underlying and distinct (Pearce 2006). Mawa has three tones in surface transcription, which can probably be reduced to two underlying tones, M and L, with H as an allophone of L (Roberts 2013), and Roberts (2005) makes similar claims for Migaama. In sum, the typical Chadic tonal system has a two-way contrast between /H/ and a non-high tone, realized as [M] or [L] depending on the preceding consonant, which in some languages has developed into a three-way /H, M, L/ contrast.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

202 SAM HELLMUTH AND MARY PEARCE

Turning to sentence prosody, in Central and some West Chadic languages, word-final vowel retention (i.e. blocking of word-final vowel deletion) marks the right edge of prosodic phrases, for example in Hausa and Gidar, with similar blocking of vowel raising at phrase edges in Mafa; Wandala does not permit consonants in phrase-final position (Frajzyngier and Shay 2012). Polar questions are typically marked with a particle, whereas focus and topic can be marked in different ways, including particles, extraposition, tense-aspect markers, or intonation (Green 2007; Frajzyngier 2012). Focus, however, if marked at all, is not always marked prosodically (Hartmann and Zimmermann 2007a).

13.2.5 Cushitic

All languages in the Cushitic family appear to be tonal, and generally of the non-obligatory type (in which a tone is not observed on every syllable, or on every lexical item). In contrast to Chadic, in most Cushitic languages tone has a primarily grammatical rather than lexical function, marking categories such as negation, case, gender, or focus (Frajzyngier 2012; Mous 2012). Some languages in the family, such as K’abeena, have a purely demarcative tonal system, whereas Awngi has demarcative phrasal stress as well as lexical tone (Hetzron 1997b). Somali has three surface word melodies, high LLH ~ falling LHL ~ low LLL (Saeed 1987), typically analysed as a privative system in which presence of a high tone (underlying /H/) alternates with absence of a high tone (underlying ‘Ø’), realized phonetically with low tone (Saeed 1999; Hyman 2009). Iraqw likewise has either surface H or L on the final syllable, with all non-final syllables realized with mid or low tone (Nordbustad 1988; Mous 1993), and can also be analysed as /H/ ~ Ø (Hyman 2006). Beja has one culminative tone per word, whose position is contrastive, yielding minimal pairs such as [ˈhadhaab] ‘lions’ ~ [haˈdhab] ‘lion’ (Wedekind et al. 2005). Sidaama has at most one tone per word, whose position is contrastive but also subject to movement in connected speech (Kawachi 2007). Afar has an obligatory phrasal H tone on the first word in each accentual phrase, appearing on the accented syllable in lexically accented words, otherwise on the final syllable (Hayward 1991). In some Cushitic languages (including Somali, Iraqw, and Alagwa), when a sentence-final H tone is added to a word to form a polar question, any and all preceding H tones in the word or phrase are deleted, resulting in a low-level contour with a final rise that is described as ‘a phonologized intonational pattern’ (Mous 2012: 352). 
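The polar question rule just described can be stated as an operation on a string of tones. The sketch below is illustrative only and rests on simplifying assumptions not made in the sources: each syllable carries exactly one label, ‘H’ for a high tone and ‘L’ for a syllable without one (under the privative /H/ ~ Ø analysis, Ø surfaces as low), and a phrase is a list of words. The function name polar_question is hypothetical.

```python
from __future__ import annotations


def polar_question(phrase: list[list[str]]) -> list[list[str]]:
    """Apply the question rule to a phrase given as words of syllable tones.

    Every H in the phrase is deleted (surfacing as L), and an H is placed on
    the phrase-final syllable, giving a low-level contour with a final rise.
    """
    if not phrase or not phrase[-1]:
        raise ValueError("phrase must end in a word with at least one syllable")
    # Build fresh lists so the input phrase is left unchanged
    question = [["L"] * len(word) for word in phrase]
    question[-1][-1] = "H"
    return question


# A single word with the falling LHL melody, then a two-word phrase
print(polar_question([["L", "H", "L"]]))              # [['L', 'L', 'H']]
print(polar_question([["H", "L"], ["L", "L", "H"]]))  # [['L', 'L'], ['L', 'L', 'H']]
```

Treating the rule as global H deletion followed by a single final H captures the ‘low-level contour with a final rise’ description directly; it says nothing about the phonetic scaling of the rise.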
More generally in Cushitic, polar questions are formed by a change to the prosodic pattern, such as a rise in pitch or a rise-fall (e.g. in Sidaama: Kawachi 2007), with the addition of further segmental material in some languages. In Iraqw this takes the form of a verbal suffix, whereas in K’abeena it consists of fully voiced (modal) rather than whispered realization of the utterance-final vowel (Crass 2005; cited in Mous 2012); in southern Oromo dialects, the final fall in pitch is realized on a ‘linker clitic’ [áa] (Stroomer 1987). Focus is marked in Cushitic by clefting and/or use of focus particles, and topicalization by means of extraposition and determiners (Frajzyngier 2012). Iraqw and Somali display topic-fronting with a following pause (Frascarelli and Puglielli 2009; Mous 2012). In Oromo, a fronted syllable attracts sentence stress, as does a focus particle (Stroomer 1987). In Iraqw, a polar question is realized by the addition of a sentence-final particle, together with a H tone on the penult syllable of the phrase, which is also lengthened (Nordbustad 1988). In Beja, the shape and direction of the prosodic contour at phrase edges also serves to disambiguate the function of connective co-verbs in marking discourse structure (Vanhove 2008).

13.2.6 Omotic

All Omotic languages are reported to have contrastive tone, with an overall tendency in the group towards marking of grammatical function rather than lexical contrast (Frajzyngier 2012). In some languages a tone is observed on every syllable, but in others tones do not necessarily appear on every lexical item, nor on every syllable if a word does have tone. Overall, then, the tonal systems vary considerably across the languages in this putative group, which may contribute to doubts about the degree of their relatedness.2 The number of contrastive tones and their distribution range from just one tone per word in Wolaitta, in which H tones contrast with toneless syllables, up to the typologically large system of six level tones in Bench, with a tone on every syllable. Dizi and Sheko each have a system of four level tones, and Nayi and Yem have three. This wide degree of variation may be due to contact with Nilo-Saharan languages (Amha 2012). Hayward (2006) notes a constraint on tonal configurations in nominals in a subset of Omotic languages such that only one high tone may appear within the nominal phrase, with other syllables bearing low tone, yielding a LHL contour, which he calls the ‘OHO’ (one high only) constraint. He notes further that this constraint is confined to those languages that display consistent head-final syntax, and thus have post-modifiers in the noun phrase. Polar questions are formed in Maale by means of a question particle, optionally accompanied by rising intonation, but in Zargulla a question is marked by a change in verbal inflection, without any accompanying prosodic marking (Amha 2012). Focus is generally achieved by means of extraposition, again with no mention of accompanying prosodic marking (Frajzyngier 2012).

13.3 Nilo-Saharan

The Nilo-Saharan languages3 are tonal, and most have two or three tonal categories, with little tone spreading but some interaction of tone with voice quality and vowel harmony.

2 The North Omotic and South Omotic languages are now treated as independent (Hammarström et al. 2018) due to their lack of Afro-Asiatic features (Hayward 2003), despite earlier inclusion in Afroasiatic (Hayward 2000; Dimmendaal 2008). This section reviews the prosody of languages treated as members of the Omotic family at some point or spoken in south-western Ethiopia (within the geographical scope of Figure 13.1) without taking a position on the affiliation of individual languages or sub-families to Afroasiatic.
3 The Nilo-Saharan languages are diverse and there is debate as to the integrity of the family (Bender 1996).


13.3.1 Eastern Sudanic

Tone is used to mark case in a number of East African languages, with case often marked exclusively by tone, as in Maa and Datooga (König 2009); in all cases where case is marked by tone, the language has a ‘marked nominative’ case system (König 2008). Hyman (2011, 2019) also notes tonal case-marking in Maasai. Similarly, the Ik language displays lexical tone (in verb and noun roots) realized differently according to grammatical context, with tonal changes that must accompany certain segmental morphemes (Schrock 2014, 2017); the underlying H and L tones each have surface ‘falling’ and downstepped variants and are also subject to downdrift and the effects of depressor consonants. Overall, the patterning of these tonal processes in Ik indicates a role for metrical feet, alongside distinct intonational contours marking indicative, interrogative, and ‘solicitive’ illocutionary force. In Ama, tone plays a part in several grammatical constructions and, in contrast to Ik, there are cases where tone is the only change, as shown in (2) (Norton 2011).
(2)              Imperfective third person present    Imperfective first or second person present
    searching    sāŋ                                  sàŋ
    sleeping     túŋ                                  tūŋ
    washing      ágēl                                 āgèl

Dinka has a robustly demonstrated three-way vowel length contrast (Remijsen and Gilley 2008; Remijsen and Manyang 2009; Remijsen 2014), and appears to have developed from a vowel harmony type system into a contrast between breathy voice and creaky voice. Dinka is also rich in grammatical tone, for case marking and in verb derivations (Andersen 1995), with some dialectal variation in the number of tonal categories. Acoustic analysis has shown that Dinka contour tones contrast in the timing of the fall relative to the segmental content of the syllable, as well as in tonal height (Remijsen 2013). Contrastive alignment is also found in Shilluk (Remijsen and Ayoker 2014), thus challenging earlier typological claims that alignment in contour tones is never phonologically contrastive (Hyman 1988; Odden 1995). Shilluk has a complex tonal system involving three level tones and four contour tones (Remijsen et al. 2011). Tone has lexical function, marking morphemic status in verbs, but there is also some grammatical function (e.g. the possessive marker). In Mursi, anticipatory ‘tonal polarity’ is observed at the end of any word ending in a toneless syllable, in anticipation of the tone on the following word (Muetze and Ahland, in press). As with the other Nilo-Saharan languages, there seems to be a limit of one syllable on tone spreading or displacement. Mursi appears to have a two-tone system plus a neutral ‘toneless’ option, and this may hint at a link between two- and three-tone languages in this family, if a ‘toneless’ category developed into a mid M tone in some languages (or vice versa). Kunama also has stable tones that do not move or spread, though tonal suffixes may cross syntactic boundaries and replace tones. Kunama has three level tones (H, M, and L), three falls (HM, HL, and ML), and one rise (MH), with contours observed only on heavy syllables or on word-final short vowels (Connell et al. 2000; Yip 2002: 141–142).


13.3.2 Central Sudanic

Sudanic languages typically have three tones (Jim Roberts, personal communication). In Ngiti [Central Sudanic; Zaire], tone is largely grammatical, marking for example tense or aspect on verbs (Kutsch Lojenga 1994). The Sara language also has three level tones, but little grammatical tone or tonal morphology (Jim Roberts, personal communication). In contrast to these three-tone languages, Boyeldieu (2000) describes Modo, Baka, and Bongo as having four melodies in disyllabic words and no contour tones on individual syllables, suggesting a classic two-tone system: a phonetic M tone is derived from adjacent /H/ tones, the second of which drops to [M]. In Bongo, tone marks perfective aspect, and lexical tone on verbs is affected by preceding subject pronouns (Nougayrol 2006; Hyman 2016a). Gor has three tones but could have originated from a two-tone system, as four melodies predominate: HM, LM, LL, and MM, with no ML pattern (Roberts 2003). However, Gor cannot now be analysed as a two-tone language because words with CV structure can carry any of the three melodies H, M, or L. Tonal person markers are found in noun suffixes: a H tone indicates possession, but the same segments with no H tone indicate the direct object.

13.3.3 Maban

In Masalit, tone has low functional load; the language has a 10-vowel system exhibiting advanced tongue root (ATR) harmony from suffix to root, though the [+ATR] close vowels are increasingly merging with their [−ATR] counterparts (Angela Prinz, personal communication). Weiss (2009) analyses Maba (which has a 10-vowel ATR system) as a pitch accent system that affects the intensity, distinctiveness, and quality of vowels; the position of the accent is determined by the presence of H tone, a long vowel, and the syllabic structure.

13.3.4 Saharan

In Zaghawa, there appear to be two tones instead of the usual Sudanic three, as well as ATR harmony, but it is too early to make major statements about the tonal system.

13.4 Discussion

This chapter yields a near-comprehensive picture for only one of the four aspects of prosody in our survey, namely word prosody: we can identify what type of word prosody each language has, that is, whether a language has tone or stress, or both, or neither. Frajzyngier (2012) points to a basic divide in word prosody across Afro-Asiatic languages, between tonal and non-tonal languages, and notes debate about the origin of such a divide. One view argues that if any members of the wider family have lexical tone, the common ancestor must also have had it; thus, non-tonal languages must result from loss of tonal contrast over time, and this is argued to explain the large number of homophones found in Semitic. The competing view proposes tonogeneses of various kinds: due to laryngeal effects in Chadic, where tone more commonly has lexical function; evolving from a predictable stress system coupled with segmental neutralization; and/or due to contact with robustly tonal languages from other language families. It is beyond the scope of this chapter to resolve this debate, but our survey confirms that the tonal versus non-tonal divide does not equate to a simple ‘stress versus tone’ dichotomy. Among tonal languages, there is wide variation in the number, distribution, and function of tonal contrasts, and it is now becoming clear that non-tonal languages do not all have stress. The non-binary nature of the stress versus tone distinction is well established in the theoretical literature on tone (Hyman 2006) and is matched by more recent analyses, within the autosegmental-metrical framework, of non-tonal but also non-stress languages as ‘edge-marking’ languages, in which tonal events associate only with the edges of prosodic domains (Jun 2014b).

Our ability to document prosodic variation with respect to prosodic phrasing, melodic structure, and prosodic expression of meaning is limited by the availability of descriptions of these aspects of the languages under consideration. This is sometimes due to a general lack of description of a language but more commonly to a lack of description of postlexical prosody in those descriptions that do exist (with notable exceptions). Frajzyngier (2012: 606) has also noted, in a discussion of parataxis (marking of the relationship between clauses in complex sentences), that prosodic characteristics are ‘seldom indicated in grammars’, and our survey shows that this is still the norm. 
Some of these gaps will be artefacts of methodological choices and priorities, but others may be due to the practical difficulties, perceived or real, involved in the performance of post-lexical prosodic analysis. For example, Watson and Wilson (2017) highlight the importance of information about intonation patterns in contexts that are syntactically ambiguous in written transcription, but also note the ‘cumbersome’ nature of prosodic annotation, and thus argue for collection and sharing of audio (and audiovisual) recordings of less-described languages. There is so much scope for further research on the prosodic systems of North Africa and the Middle East, and particularly on post-lexical prosody, that the work of overcoming these obstacles is merited.

Chapter 14

South West and Central Asia

Anastasia Karlsson, Güliz Güneş, Hamed Rahmani, and Sun-Ah Jun

14.1 Introduction

This chapter offers a survey of prosodic features of languages across Southwestern, Central, and Northern Asia. In this rather large area we find a variety of language families. In §14.2, our focus is on Turkish, the standard variant spoken in Turkey (Turkic), while §14.3 deals with Halh (Khalkha) Mongolian, the standard variant spoken in Mongolia (Mongolic language family). In §14.4, the standard variant of Persian spoken in Iran (Indo-European) is treated. §14.5 deals with standard Georgian (Kartvelian). The Turkic and Mongolic groups are usually regarded as two of the three branches of the proposed Altaic language superfamily, the third being the Tungusic group. Georgian belongs to the South Caucasian language group. The term ‘Caucasian’ applies to the four linguistic families indigenous to the Caucasus: Kartvelian, Abkhaz-Adyghe, Daghestanian, and Nakh (Kodzasov 1999). Owing to the considerable lack of descriptions of the prosody of languages spoken in the Caucasus and Central Asia, Georgian is the only language in this group that can be given more than a cursory treatment here.

14.2 Turkic

The majority of Turkic languages lack contrastive lexical stress, and its status and realization in many of them are still debated, something that is characteristic of the Altaic language group generally. According to Özçelik (2014), most Turkic languages have finally prominent words, but the nature and function of this final prominence varies across them. For example, Kazakh has iambic feet, while Uyghur has footless intonational prominence, marked tonally by principles similar to those applying in Turkish. A counterexample to this general right-edged prominence is Chuvash [Turkic; western part of the Russian Federation], which marks words tonally on their left edge (Dobrovolsky 1999).

208 ANASTASIA KARLSSON ET AL.

14.2.1 Lexical prosody in Turkish: stress

Turkish has long been analysed as a stress-accent language (Lees 1961; Kaisse 1985; Barker 1989; Inkelas and Orgun 1998; Inkelas 1999; Kabak and Vogel 2001; İpek and Jun 2013; İpek 2015; Kabak 2016). In this tradition, word stress is assigned to a word-final syllable (1), with some exceptions, such as place names (2c, 2d), some loanwords, some exceptionally stressed roots, or pre-stressing suffixes (e.g. Sezer 1983; Inkelas and Orgun 1998; Kabak and Vogel 2001). More recently, Turkish has been analysed as a lexical pitch accent language (Levi 2005; Kamali 2011), whereby words with exceptional stress, as in (2c, 2d), are lexically accented with a H*L pitch accent and words with the regular word-final stress, as in (1) and (2a, 2b), are lexically unaccented. Unaccented words acquire a H tone post-lexically, marking the right edge of the phonological word (ω) (Güneş 2015), which provides a reliable cue for word segmentation in speech processing (Van Ommen 2016). An event-related potential study of the processing of Turkish stress by Domahs et al. (2013) demonstrates that native speakers of Turkish process these two types of stress/accent differently. Turkish participants showed the ‘stress deafness’ effect (Dupoux et al. 1997; Peperkamp and Dupoux 2002) only for the regular finally stressed or lexically unaccented words, and treated violations of stress/accent location as a lexical violation only for the exceptionally stressed or accented words.
(1) Word-final ‘stressed’/‘accentless’ words in Turkish
    a. taní ‘know’
    b. tanı-dík ‘acquaintance’
    c. tanı-dığ-ím ‘my acquaintance’
In lexically accented words, H*L occurs on roots and creates a lexical contrast between segmentally identical strings, as shown for bebek in (2a) and (2c). The word accent remains on the root as the morphosyntactic word (and ω) is extended, as seen in (2c) and (2d). 
(2) Final stress (a, b), and exceptional lexical stress plus H*L accent (c, d) in Turkish
    a. bebék ‘baby’
    b. bebek-ler-ím ‘my babies’
    c. Bébek ‘Bebek’ (the name of a neighbourhood in Istanbul)
    d. Bébek-li-ler ‘Those who are from Bebek.’
The Turkish ω-final position is thus assigned a demarcative prominence, a lexical stress, or a post-lexically tone-bearing syllable, depending on the analysis.

14.2.2 Lexical prosody: vowel harmony in Turkish

In Altaic languages, vowel harmony interacts with prosodic constituent structure. Vowel harmony may involve backness, labiality (rounding), vowel height, and pharyngealization (van der Hulst and van de Weijer 1995). Many Turkic languages have backness and labial harmony, while pharyngeal and labial harmony occurs in Mongolian. In Turkish, backness harmony requires front vowels to follow front vowels and back vowels to follow back vowels (3) (Clements and Sezer 1982; Charette and Göksel 1996). In rounding harmony, non-initial vowels in a word can be round only if preceded by another rounded vowel (4) (cf. Göksel and Kerslake 2005). Like Mongolian, Turkish is agglutinative, and suffixes harmonize with the root.
(3) a. araba-lar-da ‘in the cars’
    b. kedi-ler-de ‘in the cats’
(4) a. üz-gün-üz ‘we are sad’
    b. kız-gın-ız ‘we are angry’

The domain of Turkish vowel harmony is not always the ω (Kornfilt 1996). A single harmony domain may contain two ω’s (Göksel 2010), while multiple vowel harmonic domains may be parsed as a single ω (Güneş 2015). Turkish compounds, regardless of whether they are parsed as single ω’s (5a) or two ω’s (5b), are non-harmonic. Loanwords (5c) and certain suffixes, such as -gen in (5d), are also non-harmonic.
(5) a. (çek-yát)ω ‘pullover sofa’
    b. (keçí)ω(boynuzú)ω ‘carob’
    c. kitap ‘book’
    d. altı-gen ‘hexagon’
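The harmonic suffixes in (3) and (4) follow a regular enough pattern to be stated as a small procedure. The sketch below is a simplification under explicit assumptions: it uses standard Turkish orthography, writes a harmonizing low suffix vowel as A and a high one as I (placeholder letters chosen for this illustration), and copies backness (and, for I, rounding) from the nearest preceding vowel. It deliberately ignores the non-harmonic cases in (5) and consonant alternations such as -da/-ta. The helper names last_vowel, harmonize, and inflect are hypothetical.

```python
# Simplified Turkish suffix harmony (illustrative; not a full analysis).
BACK = set("aıou")
FRONT = set("eiöü")
ROUNDED = set("oöuü")
VOWELS = BACK | FRONT

# High suffix vowel chosen by (is_back, is_rounded) of the trigger vowel
HIGH_VOWEL = {(True, False): "ı", (False, False): "i",
              (True, True): "u", (False, True): "ü"}


def last_vowel(word: str) -> str:
    """Return the rightmost vowel of a lowercase word."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def harmonize(preceding: str, template: str) -> str:
    """Fill the placeholders in a suffix template from the preceding vowel.

    'A' agrees in backness (a / e); 'I' agrees in backness and rounding.
    """
    v = last_vowel(preceding)
    back, rounded = v in BACK, v in ROUNDED
    out = []
    for ch in template:
        if ch == "A":
            out.append("a" if back else "e")
        elif ch == "I":
            out.append(HIGH_VOWEL[(back, rounded)])
        else:
            out.append(ch)
    return "".join(out)


def inflect(root: str, *templates: str) -> str:
    """Attach suffixes left to right, each harmonizing with what precedes it."""
    parts = [root]
    for template in templates:
        parts.append(harmonize("".join(parts), template))
    return "-".join(parts)


print(inflect("araba", "lAr", "dA"))  # araba-lar-da  'in the cars'
print(inflect("kedi", "lAr", "dA"))   # kedi-ler-de   'in the cats'
print(inflect("üz", "gIn", "Iz"))     # üz-gün-üz     'we are sad'
print(inflect("kız", "gIn", "Iz"))    # kız-gın-ız    'we are angry'
```

Because each suffix looks only at the vowel immediately to its left, harmony propagates left to right through the word, which is why the outputs reproduce (3) and (4); a fuller model would need lexical exception marking for forms like those in (5).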

14.2.3 Post-lexical prosody in Turkish

Unless pragmatically marked tunes dictate otherwise, sentence-internal prosodic constituency in Turkish can be traced to syntactic branching and relations between syntactic constituents. Root clauses are parsed as intonational phrases (ι) (Kan 2009; Güneş 2015). ι’s contain (a number of) phonological phrases (φ), which correspond to syntactic phrases (Kamali 2011) and contain maximally two ω’s (Güneş 2015). The prosodic hierarchy proposed in the intonational model of İpek and Jun (2013) and İpek (2015) is similar to this, but their intermediate phrase (ip), which corresponds to φ, can contain more than two prosodic words. Four major cues are employed to distinguish between intonational phrases (ι) and phonological phrases (φ) in Turkish: (i) boundary tones (H- for the right edges of non-final φ’s, and H% or L% for the right edges of ι’s), (ii) pauses (shorter across φ’s and longer across ι’s), (iii) head prominence, and (iv) final lengthening (shorter final syllable before φ boundaries and longer final syllable before ι boundaries). Figure 14.1 presents the prosodic phrasing of (6) with one ι and three φ’s.
(6) [((Nevriye)ω)φ ((araba-da)ω)φ ((yağmurluğ-u-nu)ω (ar-ıyor)ω)φ]ι
    Nevriye car-loc raincoat-3poss-acc search-prog
    ‘Nevriye is looking for her raincoat in the car.’ (Güneş 2015: 110)

In Turkish, ι’s are right-prominent and φ’s are left-prominent (Kabak and Vogel 2001; Kan 2009). Prominence is marked by variation in pitch register and prosodic phrasing across the head and non-head parts of φ’s. In φ’s with two ω’s, the head-ω (i.e. the leftmost ω in a φ) exhibits a higher f0 register and a final H, which is accompanied by a rise in non-final φ’s and a plateau in final φ’s (7). The head-ω of the final φ is also the head of its ι (i.e. the nucleus), yet its register is not higher than the heads of prenuclear φ’s. Any item that follows the nucleus receives low-level f0 and is prosodically integrated with the non-head part of the final φ. A schematic illustration of the prosodic and tonal structure of a declarative with unaccented words is given in (7).

Figure 14.1 [Pitch track of (6): Pitch (Hz) against Time (s), with syllable, gloss, and tone labels] Multiple φ’s in all-new context and with canonical SOV order.

Figure 14.2 [Pitch track: Pitch (Hz) against Time (s)] Pitch track of Ali biliyor Aynurun buraya gelmeden önce nereye gitmiş olabileceğini ‘Ali knows where Aynur might have gone to before coming here’, illustrating multiple morphosyntactic words as a single ω, with focus for the subject Ali (Özge and Bozşahin 2010: 148).

(7)

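[Schematic of a declarative ι with unaccented words: %L at the left edge; pre-nuclear non-final φ’s, each containing a head ω and ending in H-; a final φ whose head ω (headN) is the nucleus, followed by the post-nuclear ω and L%. Tone labels: %L, L, H, H-, L%.]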

Regardless of its morphological and syntactic complexity, the postnuclear ω bears low, flat f0 (Özge and Bozşahin 2010), as illustrated in (8) and Figure 14.2.
(8) %L H L L%
    [(Ali)ω-N (biliyor Aynurun buraya gel-me-den önce nereye gitmiş ol-abil-eceği-ni)ω-post-N]ι
    Ali knows Aynur.gen to.here come-neg-abl before where gone be-abil-comp.3poss-acc
    ‘Ali knows where Aynur might have gone to before coming here.’


14.2.4 Focus in Turkish

In Turkish, prosodic phrasing is the main focus alignment strategy. In single focus contexts, focus is aligned as the head of an ι, the nucleus (9). Word order variation can also be indirectly related to focus alignment, in which case the focused constituent is realized in a default nuclear position (i.e. the immediately pre-verbal area) (10) (cf. Kennelly 1999; İşsever 2003; İpek 2011; Gürer 2015; but cf. İşsever 2006).
(9) (O_FOC S V): focused object, not immediately pre-verbal but the nucleus (adapted from Özge and Bozşahin 2010: 139)
    %L H L L%
    [((KAPIYI)ω-N/FOC (Ali kırdı)ω-Post-N)FINAL-φ]ι
    door.acc Ali broke
    ‘Ali broke the DOOR_FOC.’
(10) (O)(S_FOC V): focused subject, immediately pre-verbal and the nucleus
    %L H- L H L L%
    [(Kapıyı)φ-Pre-N ((ALİ)ω-N/FOC (kırdı)ω-Post-N)FINAL-φ]ι
    door.acc Ali broke
    ‘ALİ_FOC broke the door.’
In addition to prosodic phrasing and word order, focus in Turkish is marked by f0. Unlike intonation languages, where the pitch range of a focused word is expanded compared to that of the pre-focus words, in Turkish the pitch range of a focused word is reduced in comparison to pre-focus words. The syllable before the nuclear word has a higher f0 than an equivalent syllable at a default phrase boundary. The pitch range of post-focus words, however, is substantially compressed (İpek and Jun 2013; İpek 2015); see Figure 14.2. When words with non-final lexical accent are focused, a steep f0 fall is observed right after the accented syllable (Kamali 2011). In such cases, the non-final lexical pitch accent marks the prosodic head of the final φ, and hence is associated with focus if this head is aligned with a focused item. If words with non-final lexical accent occur in the post-verbal, postnuclear area, they are deaccented and bear low-level f0 (Güneş 2015; İpek 2015).

14.3 Mongolian

14.3.1 Lexical prosody in Mongolic: stress

There is no consensus among linguists on the status and realization of lexical stress in Mongolian and Mongolic in general (for an overview see Svantesson et al. 2005). Native speakers also disagree about the placement of lexical stress in judgement tasks and in some cases do not perceive any stress at all (Gerasimovič 1970 for Halh Mongolian; Harnud 2003 for Chakhar [Standard Mongolian; China]). Analysis by Karlsson (2005) suggests that Mongolian has no lexical stress: three potential correlates of stress (vowel quality, vowel duration, and tone) do not converge in marking any single syllable as stressed. Moreover, vowels, even phonemically long ones, can be completely deleted in all positions in casual speech. Since the initial syllable governs vowel harmony in Mongolian, this position is often ascribed stress. However, this vowel is often elided in casual speech. Mongolian speakers often devoice and completely delete all vowels in a word, which leads to chains of words with very little or no voiced material. Nor does vowel epenthesis always occur as predicted by syllabification rules: underlying /oʃgɮ-ʧ/ ‘kick-converb’, which is pronounced [oʃəgɮəʧ] in formal speech, is pronounced [ʃxɮʧ] in casual speech, with failed epenthesis and deletion of the phonemic vowel (Karlsson and Svantesson 2016). Extreme reduction is frequent. For example, /gaxai/ is reduced to [qχ] in /xar ɢaxai xɔjr-iŋ/ хар гахай хоёрын ‘black pig two-gen’, realized as [χarq.χɔj.riŋ], with syllabification taking place across word boundaries.

14.3.2 Lexical prosody: vowel harmony in Mongolian
Pharyngeal harmony prevents pharyngeal /ʊ a ɔ/ and non-pharyngeal /u e o/ from co-occurring in the same word, with transparent /i/ occurring with either set. Harmony spreads from left to right in a morphological domain. The root thus determines the vowels of affixes in this agglutinative language, as in the reflexive suffix -e (e.g. ug-e ‘word’, xoɮ-o ‘foot’, am-a ‘mouth’, mʊʊr-a ‘cat’, and ɔr-ɔ ‘place’). Non-initial /i/ is ignored by vowel harmony (e.g. the reflexive suffix in mʊʊr-a ‘cat’ does not change in mʊʊr-ig-a ‘cat-acc-rfl’). Rounding harmony applies in the same domain, with /i/ again being transparent and high back /ʊ u/ being opaque. The opaque vowels block rounding harmony, as in ɔr-ɔd ‘enter-perf’ (cf. ɔr-ʊɮ-ad ‘enter-caus-perf’) (Svantesson et al. 2005: 54).
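The interaction of pharyngeal and rounding harmony can be made concrete with a minimal Python sketch. It picks the vowel of a low-vowel suffix, such as the reflexive (-e/-a/-o/-ɔ) or the perfective, from the stem. The function name and the plain-string input are illustrative assumptions. So is the default for stems containing only /i/, which the text above does not cover. This is not Svantesson et al.'s (2005) formalization.

```python
# Hypothetical helper illustrating the Mongolian harmony facts described above.
PHARYNGEAL = set("ʊaɔ")          # pharyngeal vowels
NON_PHARYNGEAL = set("ueo")      # non-pharyngeal vowels
ROUNDING_SPREADERS = set("oɔ")   # rounded non-high vowels spread rounding
# /i/ belongs to neither set, so it is skipped (transparent). /ʊ u/ are
# harmonically active but do not spread rounding, so they block it (opaque).


def low_suffix_vowel(stem: str) -> str:
    """Choose e/a/o/ɔ for a low-vowel suffix (e.g. the reflexive) after `stem`."""
    # Find the nearest harmonically active vowel to the left of the suffix.
    for ch in reversed(stem):
        if ch in PHARYNGEAL or ch in NON_PHARYNGEAL:
            pharyngeal = ch in PHARYNGEAL
            rounded = ch in ROUNDING_SPREADERS
            break
    else:
        # Assumption (not from the text): /i/-only stems take the non-pharyngeal set.
        pharyngeal, rounded = False, False
    if pharyngeal:
        return "ɔ" if rounded else "a"
    return "o" if rounded else "e"
```

The sketch reproduces the examples above. low_suffix_vowel("mʊʊr-ig") gives "a", as in mʊʊr-ig-a, because /i/ is skipped. low_suffix_vowel("ɔr-ʊɮ") also gives "a", as in ɔr-ʊɮ-ad, because opaque /ʊ/ stops rounding from /ɔ/.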

south west and central asia 213

14.3.3 Post-lexical prosody in Mongolian
In read speech, major-class words have rising pitch because Lα is associated with the first mora and Hα with the second. As a result, /mʊʊ.ra/ ‘cat’ has a pitch rise in its first syllable, and /xo.ɮo/ ‘foot’ has a pitch rise over its two syllables. The assignment of LαHα to the left edge of the accentual phrase (α) is post-lexical, as shown by its sensitivity to post-lexical syllabification. For instance, /nʊtʰgtʰai/ ‘homeland-com’ is either trisyllabic due to schwa epenthesis, [nʊ.tʰəg.tʰai], or disyllabic, [nʊtʰx.tʰai]. Hα appears on [tʰəg] in the first case and on [tʰai] in the second. The domain of syllabification has been described as the ω, and the domain of LαHα assignment as the α. Postpositions always share the same ω (or α) as their left-edge host. Many word combinations that function as compounds (often written as two words) are realized as one α. An example is the compound ɢaɮtʰ tʰirəg ‘train’ (literally ‘fire vehicle’), pronounced with one LαHα in [ɢaɮtʰtʰirəg]α. In spontaneous speech, vowels are often deleted and words are syllabified across word boundaries. Several lexical words can thus be clustered as one ω and marked as one α. This leads to a discrepancy between the morphological domain of vowel harmony and the phonological domain of prosodic parsing.

LαHα mark the left edge of an accentual phrase (α) and, by implication, of an ip in Mongolian. The ip corresponds to the syntactic phrase and often contains more than one α. As a consequence of α-phrasing, almost every major-class word in neutral declaratives in read speech begins with a lowering of f0 towards the phrase-initial Lα, as illustrated in Figure 14.3. The Hα tones in a series of LαHα boundary rises show a downtrend across the ip. This downtrend is reset at the beginning of every ip except the last, which corresponds to a verb phrase and is pronounced with distinctly lower pitch on the last word. Figure 14.4 shows the downtrend on the second syllable of marɢaʃ and the reset on the second syllable of ɢɔɮig. Tonal marking with a right-edge ip boundary tone H- occurs optionally in subordination, coordination, and enumeration. Clauses are parsed as intonational phrases (ι), which carry a right-edge L% or H% and contain one or more ips. However, in spontaneous speech, units larger than root clauses can be marked as ι, which appears to be connected to discourse structure. Moreover, L% is rare in spontaneous speech, where final rises due to H% are frequent.

Among other Mongolic languages, Oirat has been described by Indjieva (2009) in her comprehensive account of the prosody of its Houg Sar and Bain Hol varieties. Oirat is a Western Mongolic language spoken in the Xinjiang region of China. It lacks lexical stress and nuclear pitch accents, and instead marks the edges of prosodic units: the α, with its initial LαHα, and the ι. These features are very similar to those of Mongolian.

Figure 14.3 Pitch track showing the division into α’s of all-new [[mʊʊr]α [nɔxɔint]α [parʲəgtəw]ip]ι ‘A cat was caught by a dog’, where underlined bold symbols correspond to the second mora in an α. -LH marks the beginning of the ip (Karlsson 2014: 194).

Figure 14.4 Pitch track of [[[pit]α [marɢaʃ]α]ip [[xirɮʲəŋ]α [ɢɔɮig]α]ip [tʰʊʊɮəŋ]ip]ι ‘We will cross the Kherlen river tomorrow’ (Karlsson 2014: 196). -LH marks the beginning of an ip.
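The alignment of Lα with the first mora and Hα with the second explains why Hα moves when syllabification changes. A minimal Python sketch of this follows, under several assumptions. The function name is hypothetical, and the caller supplies the number of moras per syllable. Counting only vowels as moraic, so that the coda of [nʊtʰx] adds no mora, is our reading of the [nʊtʰx.tʰai] example rather than a claim made in the text. Placing Hα on the only syllable of a one-mora α is also an assumption.

```python
from typing import List, Tuple


def alpha_tone_hosts(moras_per_syllable: List[int]) -> Tuple[int, int]:
    """Return (syllable index of Lα, syllable index of Hα) for one α.

    Lα is linked to the first mora and Hα to the second mora of the α.
    """
    if not moras_per_syllable or min(moras_per_syllable) < 1:
        raise ValueError("every syllable must carry at least one mora")
    # One entry per mora, recording which syllable that mora belongs to.
    mora_owner = [i for i, n in enumerate(moras_per_syllable) for _ in range(n)]
    low_host = mora_owner[0]
    # Assumption: in a one-mora α, Hα falls on the same (only) syllable.
    high_host = mora_owner[1] if len(mora_owner) > 1 else mora_owner[0]
    return low_host, high_host
```

The sketch handles both syllabifications of /nʊtʰgtʰai/. For [nʊ.tʰəg.tʰai] (moras 1, 1, 2) it places Hα on syllable 1, [tʰəg]. For [nʊtʰx.tʰai] (moras 1, 2) it places Hα on syllable 1, which is now [tʰai]. For /mʊʊ.ra/ (moras 2, 1) both tones fall on the first syllable.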

14.3.4 Focus in Mongolian
Mongolian is a strictly verb-final subject-object-verb (SOV) language. The pre-verbal position is sometimes claimed to be a focus position, but this has not been confirmed (Karlsson 2005). Focus is marked by strengthening the initial boundary of the ip that contains the focused word(s), resulting in an enhanced pitch reset. A similar pattern is found in Oirat (Indjieva 2009). Dephrasing does not occur, except in a special marking of focal constituents by pitch lowering. This marking is found only in ι-final position in read speech, and even in such cases α-boundaries are often traceable. In spontaneous speech, focus is most often marked by Hfoc at the end of the focused phrase(s), as illustrated in Figure 14.5. The scaling of Hfoc provides further evidence that it correlates with the new/given dichotomy: it is higher when new information coincides with the second part of the ι. To show formally that Hfoc spreads leftward to the beginning of the ip containing the focus constituent, an arrow is used: ←Hfoc.

Figure 14.5 Pitch track and speech waveform illustrating final ←Hfoc marking focus on all the preceding constituents. The utterance is [[[manai aaw pɔɮ]α]ip [[[saixəŋʦantai]α]ip [[ʊxaɮəg]α]ip]foc [xuŋ]ip]ι ‘My father is nice and wise’.

14.4 Persian
14.4.1 Lexical prosody in Persian
Persian word prominence has been described as stress in nouns, adjectives, and most adverbs. Right-edge clitics, such as the indefinite [=i] and the possessive markers, are excluded from stress assignment. Verbs with inflectional prefixes take stress on the leftmost prefix, as illustrated in (11) (Ferguson 1957; Lazard 1992).

(11)
a. pedár
   father
b. pedár=am
   father=1sg
   ‘my father’
c. mí-goft
   dur-said.3sg
   ‘s/he would say’

Some authors have attempted to show that Persian ‘stress’ is exclusively governed by prosodic phrasing (e.g. Kahnemuyipour 2003). Recent research, however, suggests that it is in fact a post-lexical tone assigned on the basis of the morphosyntactic label, independently of prosodic phrasing (Rahmani et al. 2015, 2018; Rahmani 2018, 2019). That analysis is in line with three recent experimental findings. First, the syllabic prominence at issue is created only by f0, suggesting that it is a tone or accent rather than a metrical entity (Abolhasanizadeh et al. 2012; but see Sadeghi 2017 for a different view). Second, it is not obligatory on the surface: it disappears in some sentential contexts (Rahmani et al. 2018). It thus lacks a hallmark feature of stress as defined by Hyman (2006). Third, ‘stress’ location has a high functional load, for instance due to homophony between derivational suffixes and clitics ([xubí] ‘goodness’ vs. [xúbi] ‘good.2sg’). Despite this, Persian listeners are ‘stress deaf’ in the sense of Dupoux et al. (2001), indicating that there is no word-prosodic information in the lexicon (Rahmani et al. 2015).

Phonologically, the Persian accent consists of a H tone. The syntactic motivation behind the location of accent is based on several observations, two of which are given here. First, a given word may receive accent on different syllables depending on its syntactic environment or grammatical function. Thus, nouns are accented on the initial syllable when appearing as vocatives, as opposed to their default final accent (cf. [pédar] ‘father!’ vs. [pedár] ‘father’) (Ferguson 1957). Similarly, the position of accent on various grammatical words is sensitive to sentential polarity. Examples are the intensifier /xejli/ ‘very’ and the compound demonstrative /hamin/ ‘this same one’. These are accented on the first syllable in positive sentences (cf. [xéjli], [hámin]) but take a final accent in negative sentences (cf. [xejlí], [hamín]). Second, whenever an expression (including a phrase or clause) is used as though the entire group were syntactically a single noun, it follows the accentual pattern of nouns. That is, it is assigned one accent on its final syllable, irrespective of its default phrasal accent pattern (Vahidian-Kamyar 2001). (12a) illustrates a clause in its default accentuation. In (12b), the same form is used as a head noun in a possessive construction to refer to a movie title. It is then reanalysed as a noun by the accent rule, so that the entire unit is assigned one accent on its final syllable.

(12)

a. [bɒ́d mɒ́=rɒ xɒhád bord]
   wind 1pl=obj want.3sg carry
   ‘The wind will carry us.’
b. [bɒd mɒ=rɒ xɒhad bórd]=e kiɒrostamí
   wind 1pl=obj want.3sg carry=ez Kiarostami
   ‘Kiarostami’s The wind will carry us’
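The accent-placement generalizations in (11), the vocative pattern, and the noun reanalysis in (12b) can be condensed into a minimal Python sketch. It simplifies under stated assumptions. The category labels and argument names are hypothetical, and syllables (or morph-sized pieces) are supplied by hand. Treating the leftmost prefix as the first syllable is also an assumption. Polarity-sensitive words such as /xejli/ are not modelled.

```python
# Hypothetical sketch of the Persian accent-placement generalizations above.
# Category labels and argument names are illustrative, not from the source.


def accented_index(syllables, category, clitic_syllables=0, vocative=False):
    """Return the index of the syllable that carries the H accent."""
    host = len(syllables) - clitic_syllables   # right-edge clitics never take accent
    if host < 1:
        raise ValueError("the host must have at least one syllable")
    if category == "prefixed_verb":
        return 0          # assumption: the leftmost prefix is the first syllable
    if vocative:
        return 0          # vocatives take initial accent: [pédar] 'father!'
    return host - 1       # nouns and noun-like units: final syllable of the host
```

The movie title in (12b) can be treated as a single noun with the ezafe =e as a right-edge clitic: accented_index(["bɒd", "mɒ", "rɒ", "xɒ", "had", "bord", "e"], "noun", clitic_syllables=1). This returns 5, i.e. [bórd].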

Independently of their accentual pattern, Persian words have iambic feet, which serve as the domain for assimilation processes such as vowel harmony (Rahmani 2019). Mid vowels assimilate to a following high vowel only if the two syllables are grouped into a single foot. Thus, [o] normally raises to [u] in [ho.lu] ‘peach’, which is a disyllabic iamb. It cannot do so in [hol.gum] ‘pharynx’, which contains two monosyllabic iambs. In Ossetian [Indo-Iranian; Central Caucasus], accent is realized only as a function of prosodic phrasing. Words do not have individual stress but are organized in groups by a tonal accent (Abaev 1949).
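Foot-bounded raising in Persian can be illustrated with a minimal Python sketch. Syllabification is supplied by hand. Several parts are simplifying assumptions for illustration, not Rahmani's (2019) analysis: the weight criterion (an open syllable is light), the left-to-right iambic parse, and the restriction to same-backness pairs (o→u, e→i). Only the [ho.lu] versus [hol.gum] contrast comes from the text.

```python
# Hypothetical sketch of foot-bounded mid-vowel raising (see assumptions above).
VOWELS = set("aeiouɒ")
# Mid vowel -> raised vowel, keyed by (mid vowel, following high vowel).
RAISING = {("o", "u"): "u", ("e", "i"): "i"}


def parse_iambs(syllables):
    """Group syllables into iambic feet from left to right.

    Assumption: an open syllable counts as light and groups with the next
    syllable; a closed (heavy) syllable forms a foot on its own.
    """
    feet, i = [], 0
    while i < len(syllables):
        is_light = syllables[i][-1] in VOWELS
        size = 2 if is_light and i + 1 < len(syllables) else 1
        feet.append(syllables[i:i + size])
        i += size
    return feet


def nucleus(syllable):
    """Return the first vowel letter of a syllable."""
    return next(ch for ch in syllable if ch in VOWELS)


def raise_mid_vowels(syllables):
    """Raise a mid vowel before a high vowel only inside the same foot."""
    result = []
    for foot in parse_iambs(syllables):
        if len(foot) == 2:
            key = (nucleus(foot[0]), nucleus(foot[1]))
            if key in RAISING:
                foot = [foot[0].replace(key[0], RAISING[key], 1), foot[1]]
        result.extend(foot)
    return result
```

With these assumptions, ["ho", "lu"] forms one foot and surfaces as ["hu", "lu"]. ["hol", "gum"] forms two monosyllabic feet and stays unchanged, because the raising context never arises inside a foot.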


14.4.2 Post-lexical prosody in Persian
The Persian prosodic hierarchy includes the φ and ι, in addition to the ω. The ω is the domain of obligatory syllabification (Hosseini 2014). It roughly corresponds to a (simple or derived) stem plus inflectional affixes and clitics. φ and ι may be characterized by different degrees of pause length and pre-boundary lengthening (Mahjani 2003). Persian has a small tonal inventory. In addition to the syntactically driven accent H, there are two ι-final boundary tones, L% and H% (see §14.6). Some models of Persian intonation have assumed a ‘focus accent’ and a ‘phrase accent’ in the tonal inventory of the language (e.g. Scarborough 2007). There appears to be insufficient supporting evidence for these (Rahmani et al. 2018). The minimal pair in (13) is given with two prosodic segmentations for each member. Accent placement is the same under either segmentation, which shows that prosodic constituency is irrelevant to the distribution of accent. Their pitch tracks are presented in Figure 14.6.

(13)
a. bɒd mɒ=rɒ xɒhad bórd
   wind 1pl=obj want.3sg carry
   ‘The wind will carry us’ (naming expression)
   [((bɒd)ω (mɒ=rɒ)ω)φ ((xɒhad)ω (bórd)ω)φ ]ι
   [((bɒd)ω)φ ((mɒ=rɒ)ω (xɒhad)ω (bórd)ω)φ ]ι
b. bɒ́d mɒ́=rɒ xɒhád bord
   wind 1pl=obj want.3sg carry
   ‘The wind will carry us.’ (sentential expression)
   [((bɒ́d)ω (mɒ́=rɒ)ω)φ ((xɒhád)ω (bord)ω)φ ]ι
   [((bɒ́d)ω)φ ((mɒ́=rɒ)ω (xɒhád)ω (bord)ω)φ ]ι

The intonation systems of other Iranian languages are not well documented. An exception is Kurdish (Northern Kurmanji) (Hasan 2016).

14.4.3 Focus in Persian
Persian has SOV as its unmarked word order, with all possible permutations used for pragmatic purposes (Sadat-Tehrani 2007). It is still unclear whether word order variations cue focus. Intonation is the most reliable cue. Under broad focus, post-verbal words are obligatorily unaccented and all other words are obligatorily accented. Thus, every word is accented in an SAOV utterance, while in VSAO only the accent on the verb remains. Under narrow focus, post-focal words are deaccented, irrespective of the position of the verb. Thus, SfocAOV has one accent, on Sfoc. The prosodic expression of focus is syntactically restricted. In sentences with the unmarked SOV word order, any word can be prosodically marked for focus. In sentences with pragmatically marked word order, post-verbal words cannot be focused if their unmarked position is pre-verbal. Some clause types may deviate slightly from these patterns, such as those with nonspecific objects, manner adverbials, or motion verbs. These are ignored here for lack of space; see Sadat-Tehrani (2007) and Kahnemuyipour (2009) for more information.
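The distribution of accents under broad and narrow focus can be stated as a small Python function. The constituent labels and the function name are assumptions for illustration. So is the choice to let pre-focal constituents follow the broad-focus rule. The syntactic restrictions on which post-verbal words can be focused, and the deviating clause types mentioned above, are not modelled.

```python
# Hypothetical sketch of the Persian accent/focus generalizations above.
# Constituents are labels such as "S", "A", "O", "V"; exactly one "V" is expected.


def accent_pattern(constituents, focus=None):
    """Return one boolean per constituent: True if it carries an accent."""
    verb = constituents.index("V")
    pattern = []
    for i in range(len(constituents)):
        # Broad focus: everything up to and including the verb is accented,
        # post-verbal constituents are unaccented.
        accented = i <= verb
        if focus is not None:
            if i == focus:
                accented = True       # the narrowly focused constituent keeps its accent
            elif i > focus:
                accented = False      # post-focal constituents are deaccented
        pattern.append(accented)
    return pattern
```

For example, accent_pattern(["V", "S", "A", "O"]) gives [True, False, False, False]. accent_pattern(["S", "A", "O", "V"], focus=0) gives the same pattern. These match VSAO and SfocAOV in the text.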

Figure 14.6 f0 contours of 13a (a) and 13b (b).

14.5 Caucasian
About 50 languages are spoken in the Caucasus, 37 of which are indigenous (Kodzasov 1999). Among these, Georgian, a member of the South Caucasian language group, is the most studied and is described in §14.5.1. The Daghestanian languages, part of the Northern Caucasian language group, are briefly described in §14.5.2.

14.5.1 Georgian
14.5.1.1 Lexical prosody in Georgian
The existence and location of lexical stress in Georgian are debated in the literature. The general consensus, however, has been that stress is assigned word-initially (Robins and Waterson 1952; Aronson 1990). Some studies further claim that, in words longer than four syllables, both the initial and the antepenultimate syllables are stressed, with primary stress on the antepenult (Harris 1993). However, Vicenik and Jun (2014) showed that the domain of antepenultimate stress is not the word but the α. Stress is not influenced by syllable weight (Zhgenti 1963) or vowel quality (Aronson 1990). Robins and Waterson (1952), working from words in isolation, claimed that the main phonetic correlate of Georgian stress is high pitch. Zhgenti (1963; cited in Skopeteas et al. 2009) related it instead to a rhythmical-melodic structure. Vicenik and Jun (2014: 157) measured words in a carrier sentence with target vowels of the same quality. They found that the word-initial syllable had significantly greater duration and intensity than all following syllables. The antepenultimate syllable, by contrast, was not stronger than the syllable immediately preceding it. In declaratives with neutral focus, the f0 of the word-initial syllable was typically low, demarcating the beginning of a word (and an α). It was often high or rising in questions (see §14.6) or when the word was narrowly focused (see §14.5.1.3). That is, the pitch of the stressed syllable is determined post-lexically based on sentence type or focus, confirming the observations made in earlier studies (Zhgenti 1963; Tevdoradze 1978).

14.5.1.2 Post-lexical prosody in Georgian
Only a few studies have examined prosody at the post-lexical level in Georgian (Bush 1999; Jun et al. 2007; Skopeteas et al. 2009; Skopeteas and Féry 2010; Vicenik and Jun 2014; Skopeteas et al. 2018); studies published in Russian and Georgian are not included here. These studies all agree that the intonation of simple declarative sentences typically consists of a sequence of rising f0 contours. Jun et al. (2007) and Vicenik and Jun (2014) showed that the domain of a rising f0 contour, an α, often contains one content word, though it can contain more. They proposed that Georgian, like Mongolian, has three prosodic units above the word: an ι, an ip, and an α. The rising contour of the α is analysed as a L* pitch accent on the initial syllable, followed by a Hα boundary tone on the final syllable. However, the α may be part of an embedded syntactic constituent or occur in a (wh or polar) interrogative sentence. It is then often realized with a falling contour, i.e. initial H* and final Lα.

Figure 14.7 shows the f0 of a simple declarative sentence in which each word forms one α. It also illustrates a downtrend of final Hα tones over the whole utterance. The sentence-final syllable is marked with a low boundary tone, L%, a common ι boundary tone for a declarative sentence. The whole sentence thus forms one ι and also one ip, which includes five α’s. A sequence of α’s can form an ip when the α’s are close together syntactically or semantically. This higher prosodic unit is marked by a high boundary tone, H-, which is higher than the high tone of the preceding α. Figure 14.8 shows an example pitch track of a declarative sentence, The soldier’s aunt is washing Manana. Here the complex NP subject, [meomris mamida], forms an ip marked with a H- boundary tone. The f0 height of H- breaks the kind of downtrend of α-final H tones seen in Figure 14.7.

Finally, the Georgian α can have one more tonal property. When it exceeds four syllables, a falling tone occurs over the antepenult and penult, in addition to the α-initial pitch accent. The antepenult then has a H tone and the penult a L tone, regardless of the location of a word boundary inside the α. Since this f0 fall is not a property of a word, it is categorized as a H+L phrase accent of an α. This phrase accent occurs frequently in questions (§14.6) and as a marker of focus (§14.5.1.3) in Georgian.

Figure 14.7 Pitch track and speech waveform of Manana dzalian lamaz meomars bans, ‘Manana is washing the very beautiful soldier’. Each word forms an α with a rising contour, [L* Hα]. (Vicenik and Jun 2014: fig. 6.1, redrawn in Praat)
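The tonal make-up of a Georgian α can be summarized in a short Python sketch. It covers the initial pitch accent, the final boundary tone, and the H+L phrase accent on the antepenult and penult when the α exceeds four syllables. Several parts are simplifying assumptions for illustration. The function is hypothetical, and it takes a syllable count rather than actual syllables. Applying the H+L rule to falling α's as well is also an assumption.

```python
# Hypothetical sketch of tone assignment within one Georgian α.


def georgian_alpha_tones(n_syllables, falling=False):
    """Map each syllable of a Georgian α to its tone label(s)."""
    if n_syllables < 1:
        raise ValueError("an α needs at least one syllable")
    tones = [[] for _ in range(n_syllables)]
    # Initial pitch accent: L* in the default rising α, H* in the falling α
    # (embedded constituents, interrogatives).
    tones[0].append("H*" if falling else "L*")
    if n_syllables > 4:
        # H+L phrase accent: H on the antepenult, L on the penult.
        tones[-3].append("H")
        tones[-2].append("L")
    # Final α boundary tone: Hα (rising) or Lα (falling).
    tones[-1].append("Lα" if falling else "Hα")
    # Join labels per syllable; syllables without a tone get an empty string.
    return [" ".join(t) for t in tones]
```

For instance, a five-syllable rising α comes out as ["L*", "", "H", "L", "Hα"]. A two-syllable one comes out as ["L*", "Hα"].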

Figure 14.8 Pitch track of The soldier’s aunt is washing Manana. The complex NP subject [meomris mamida] forms an ip, marked with a H- boundary tone that is higher than the preceding Hα. (Vicenik and Jun 2014: fig. 6.4, redrawn in Praat)

14.5.1.3 Focus in Georgian
Focus in Georgian is marked by word order and prosody. As in Turkish, a pre-verbal argument receives prominence in the neutral focus condition, showing that word order is sensitive to information structure. However, a word order that is infelicitous for focus may become felicitous with an appropriate prosodic structure. This suggests that prosodic constraints outrank syntactic constraints in encoding information structure (Skopeteas et al. 2009). In addition, Georgian shows different intonation patterns depending on the location of the word in a sentence. Skopeteas and Féry (2010), Vicenik and Jun (2014), and Skopeteas et al. (2018) show that a focused word is realized with high f0 (due to H*) sentence-initially, but with low, flat f0 (L*) sentence-finally. Sentence-final focused words are often preceded by a phrase break marked with a high boundary tone (H- in Vicenik and Jun 2014). A (L)H* pitch accent marks the prominence of a focused word in Georgian. It is not always realized in an expanded pitch range, however, especially when the focused word is sentence-medial. The focused word is nevertheless salient, owing to increased intensity and duration of its stressed syllable and a reduced pitch range of the post-focus words. Interestingly, Vicenik and Jun (2014) show that a focused word is often marked by an additional tone, a H+L phrase accent. This falls on the antepenultimate syllable of the focused word itself, or of a larger phrase consisting of the focused word and the following word. Figure 14.9 shows an example in which a pre-verbal argument (Gela) is focused but the H+L phrase accent is realized on the following word, a verb. Besides tonal prominence marked by pitch accent and phrase accent, prosodic phrasing may also mark focus. In Vicenik and Jun’s (2014) model, a focused word often begins an ip. Production data suggest that focus can be expressed by word order, prosodic phrasing, and pitch accent. Any of these can mark focus, but none of them seems to be obligatory.

Figure 14.9 Pitch track of No, GELA is hiding behind the ship, where the subject noun is narrowly focused and the verb, instead of being deaccented, has a H+L phrase accent. The focused word and the verb together form one prosodic unit.

14.5.2 Daghestanian
The majority of Daghestanian languages have no stress (Kodzasov 1999). Besides stress languages (most languages of Southern Dagestan), there are tonal languages (e.g. Andi, Akhvakh) and quasi-tonal languages (most languages of North Dagestan). In the quasi-tonal languages, tone is probably connected to a stiffness/slackness contrast, whereby the articulatory transition from slackness to stiffness generates a rising f0. In Andi, the tonal contours of words like hiri (Low-Low) ‘red’ and mic’c’a (High-High) ‘honey’ are generated by lexical tones. In Chamalal, by contrast, the tonal contrasts in /aː/ (Rising-Low) ‘broth’, /aː/ (Low-Low) ‘pus’, and /aː/ (High-High) ‘ear’ result from underlying stiffness/slackness contrasts. In Ingush, tones are grammatical, and some tonal suffixes have a rising-falling tone. An example is lät-âr ‘fought (witnessed past)’ vs. lât-ar ‘used to fight (imperfect)’, where the opposition is marked by ablaut and tone shift (Nichols 2011). Tone in Ingush can occur on only one syllable per word. Chechen, a North East Caucasian language, mainly uses word order to signal focus (Komen 2007). Certain clitics and suffixes have an inherent high pitch (Nichols 1997), which suggests the presence of lexical tone. Komen recognizes an ι and an α in Chechen, both marked by L at their left edge and followed by H*.

14.6 Communicative prosody: question intonation
In Turkish questions, the wh-word and the item that precedes the polar question particle are parsed as the nucleus (14). In wh-questions the right edges of ι’s are marked with H%, while polar questions end with L% (Göksel and Kerslake 2005; Göksel et al. 2009). Like focused items, the item preceding the Q-particle is aligned with the nucleus via prosodic phrasing (Shwayder 2015). Göksel et al. (2009) observe that the pre-wh-word area exhibits higher pitch than the pre-nuclear area in polar questions and declaratives.

(14) Prosodic phrasing of polar and wh-questions compared with declaratives

                                                                                  ι-boundary tone
Declarative:      [----------- ((nucleus)ω (post-nucleus)ω)φ]ι                   L%/H%
wh-question:      [----------- ((wh-word)ω (post-nucleus)ω)φ]ι                   H%
Polar question:   [----------- ((a constituent)ω (Q-particle+post-nucleus)ω)φ]ι  L%

A vocative proper name exhibits a pitch fall (H*L%) (15a), which may convey surprise if spoken with an expanded pitch range (15b). Rising f0 in the same environment (i.e. LH*H%) conveys a question meaning ‘Is it you?’ (15c) (Göksel and Pöchtrager 2013).

(15) Vocatives with various meanings
a. Aslı    H*L%     calling address
b. Aslı    H*L%     surprise address
c. Aslı    LH*H%    is-it-you address

Mongolian polar questions are marked by a final question particle. The particle typically also appears at the end of wh-questions, in which the wh-word is in situ, but it may be omitted in colloquial speech. Mongolian interrogatives often have f0 shapes similar to those of declaratives, with final H% used in both. In all-new interrogatives, however, dephrasing and suspension of the downtrend (i.e. inclination) may occur.

Persian polar questions have intonation contours similar to those of declaratives (Sadat-Tehrani 2011). The question particles are often omitted in colloquial speech. In that case, a final H% distinguishes questions from declaratives, which have L%. Additionally, questions are characterized by sentence-final syllable lengthening and a wider pitch range. Wh-questions are generally characterized by deaccentuation of the elements after the wh-word and a final L% boundary. Native listeners can easily differentiate wh-questions from their declarative counterparts on the basis of the part of the utterance before the wh-word (Shiamizadeh et al. 2017).

In Georgian, both polar and wh-questions are marked by word order and prosody. The wh-word occurs sentence-initially and is immediately followed by a verb, with which it tends to form a single ip. This phrase is marked by the sequence H* H+L, with H* on the wh-word. If the verb has four or more syllables, H+L falls on its antepenultimate and penultimate syllables. If the verb is shorter than three syllables, only L appears, on the penult. Most commonly, a final H- appears on the final syllable of the ip, although L- is also possible. The end of a wh-question is often marked by H% or L%, less frequently HL%, without obvious differences in meaning. In polar questions, the verb occurs either sentence-initially or sentence-medially. A sentence-initial verb forms an ip by itself, with a H* L H- pattern. A sentence-medial verb either shows a H* L H- pattern by itself or appears together with a preceding subject in an ip marked by H* H+L H-. The latter is similar to the pattern of the wh-word + verb group described above. Polar questions, too, end in H%, L%, or HL%. Bush (1999) pointed out that HL% is characteristic of polite questions.

14.7 Conclusion
All the languages discussed in this chapter lack contrastive lexical stress and, more generally, culminative stress in Trubetzkoy’s (1939/1969) terms. That is, minimal word pairs like English éxport versus expórt do not occur, or are limited to a few cases, and stress is not morphologically conditioned. Moreover, pitch, intensity, and duration are not found to coincide in marking the prominent word-initial or word-final syllable. This indicates that the syllable is not metrically strong but is instead marked by tone. Interestingly, this seems to be true for most Altaic, Caucasian, and Indo-Iranian languages. It may be the reason why there is no consensus about the status, realization, and placement of lexical stress in these languages.

Vowel harmony can be seen as signalling a word as an entity in speech. Baudouin de Courtenay (1876) and Kasevič (1986), for instance, suggested that this coherence-signalling function parallels Indo-European lexical stress. If vowel harmony has a demarcative function similar to that of lexical stress, this may explain the redundancy of stress in harmonic languages. The absence of contrastive stress is a common feature of many harmonic languages, as reported here for Turkic. Other examples are a number of Uralic languages, among them Finnish and Hungarian. Stress is completely absent in Mongolian (as described in §14.3), Erzya [Finno-Ugric; Mordovia], and some Chukchi-Kamchatkan languages [Paleo-Asian] (Jarceva 1990). Some languages have developed lexical stress after losing vowel harmony (Binnick 1980; Kasevič 1986). Examples are Uzbek [Turkic; Uzbekistan] and Monguor [Mongolic; China, Qinghai, and Gansu provinces]. In Monguor and its dialects, final lexical stress has arisen. In addition, the first syllable, which governs vowel harmony in other Mongolic languages, has been lost in some words. For example, Old Mongolian *Onteken ‘egg’ has become ontəg in Halh and ndige in Monguor. These correlations support the view that harmony has a demarcative function similar to that of lexical stress.

The languages treated in this chapter share some structural features, such as SOV word order, agglutination, and some prosodic similarities. Their tunes nevertheless sound rather different. This is due, among other things, to different interactions between lexical and post-lexical prosody (micro- and macro-rhythm; Jun 2014b), as well as to the shapes and distribution of pitch accents and boundary tones.

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

chapter 15

Central and Eastern Europe
Maciej Karpiński, Bistra Andreeva, Eva Liina Asu, Anna Daugavet, Štefan Beňuš, and Katalin Mády

15.1 Introduction
The languages of Central and Eastern Europe form a typologically diverse collection. It includes Baltic (Latvian, Lithuanian), Finno-Ugric (Estonian, Finnish, Hungarian), Slavic (Belarusian, Bulgarian, Czech, Macedonian, Polish, Russian, pluricentric Bosnian-Croatian-Montenegrin-Serbian (BCMS), Slovak, Slovenian, Ukrainian), and Romance (Romanian). Most of them have well-established positions as official state languages. There are also a good many minority and regional languages varying in their history, status, and number of speakers (e.g. Sorbian, Latgalian, Kashubian, a number of Uralic languages, and groups of Romani dialects). Slavic and Baltic languages are assumed to have emerged from a hypothetical common ancestor, Proto-Balto-Slavic (also referred to as very late Proto-Indo-European; Comrie and Corbett 1993: 62). They are assumed to have split some 2,000 years ago (Mallory and Adams 2006: 103–104). Slavic broke up into East, West, and South Slavic (Mallory and Adams 2006: 14, 26; Sussex and Cubberley 2006; Clackson 2007: 8, 19). Romanian (Eastern Romance) arose from the Romanization of Dacia in the first centuries AD and the later invasion of the Goths (Du Nay 1996). Hungarian is considered to have emerged from the Ugric branch of Proto-Uralic, while Estonian and Finnish belong to the Finnic branch (Abondolo 1998). Beyond genetic relations, migration, language policy, and language contact shaped the present linguistic picture of Central and Eastern Europe, including many prosodic aspects. This chapter discusses the word prosody (§15.2) and sentence prosody (§15.3) of the major languages of the region.


226 MACIEJ KARPIńSKI et al.

15.2 Word prosody
15.2.1 Quantity
Quantity distinctions play an important role in the word prosody of the region and may involve consonants as well as vowels. In most cases, vowel quantity distinctions are accompanied by a difference in vowel quality (e.g. Kovács 2002; Podlipský et al. 2009; Skarnitzl and Volín 2012; Grigorjevs and Jaroslavienė 2015).

15.2.1.1 Baltic
Latvian and Lithuanian have a quantity contrast in vowels, and Latvian has additionally developed contrastive quantity in consonants. Some dialects have lost quantity in unstressed syllables. The durational proportion between short and long vowels pronounced in isolation has been shown to be 1:2.1 for both Latvian (Grigorjevs 2008) and Lithuanian (Jaroslavienė 2015). In Lithuanian, short open vowels are lengthened under stress in non-final syllables of the word, except in certain grammatical forms; see (1) below (Girdenis 1997). In Latvian, voiceless intervocalic obstruents are lengthened if preceded by a short stressed vowel, which has been attributed to Finnic influence (Daugavet 2013).

15.2.1.2 Finno-Ugric
Estonian has developed a three-way quantity system with short (Q1), long (Q2), and overlong (Q3) degrees, in which duration closely interacts with stress and tone (Lehiste 1997). A decisive factor in determining the degree of quantity is the duration ratio of the first (stressed) syllable to the second (unstressed) syllable in a disyllabic foot (Lehiste 1960a). Pitch remains a vital cue for distinguishing the long and overlong quantity degrees (e.g. Lehiste 1975; Danforth and Lehiste 1977; Eek 1980a, 1980b; Lippus 2011). In disyllabic Q1 and Q2 feet, f0 steps down between the two syllables, while in Q3 feet there is an f0 fall early in the first syllable. In Finnish, both consonant and vowel durations are contrastive, independently of each other and of word stress. That is, short and long vowels may occur before and after both short and long consonants, in both stressed and unstressed syllables (Suomi et al. 2008: 39). As in Estonian, the f0 contour in Finnish may act as a secondary cue for distinguishing phonological quantities (Lehtonen 1970; O’Dell 2003; Järvikivi et al. 2007; Vainio et al. 2010). Hungarian, too, differentiates between short and long vowels and consonants. The quantity contrast for consonants is less crucial, however, as various phonotactic constraints make consonant length predictable (Siptár and Törkenczy 2007).

15.2.1.3 Slavic The historically widespread presence of vowel quantity in the area is now absent from Bulgarian, Macedonian, Polish, Ukrainian, Belarusian, and Russian, and it never existed in the only Romance language in the region, Romanian. It is preserved in Czech, Slovak, Slovenian, and pluricentric BCMS. It is found in stressed and unstressed syllables in Czech, Slovak, and BCMS, where long vowels are, however, excluded from a pre-stressed position. In Slovenian, phonological quantity is present only in final stressed syllables, stressed vowels being otherwise long and unstressed short.

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

CENTRAL AND EASTERN EUROPE 227

Syllabic /l/ and /r/ occur in Czech and Slovak, and syllabic /r/ in South Slavic. Syllabic liquids participate in the quantity contrast in Slovak and BCMS, but in Slovenian the only syllabic liquid /r/ is always long. Duration ratios between short and long nuclei, relevant to rhythm, vary considerably, in part depending on style (laboratory speech vs. reading) (for Czech see Janota and Jančák 1970 and Palková 1994; for Slovak see Daržágín et al. 2005 and Beňuš and Mády 2010; for BCMS see Lehiste and Ivić 1986: 63 and Smiljanić 2004). The relevance of distinctions between long and short vowels has been called into question in Slovenian (Srebot-Rejec 1988) as well as in the Zagreb dialect of BCMS (Smiljanić 2004).

15.2.2 Word stress The entire range of word stress patterns—mobile and fixed, left edge and right edge, based on various phonetic properties, interacting with other prosodic domains (van der Hulst 2014b)—is represented in the languages of the region as a result of both genetics and contact factors.

15.2.2.1 Baltic Lithuanian retains the mobile stress of the Balto-Slavic system (Young 1991; Girdenis 1997; Stundžia 2014) and features a tonal contrast in the stressed syllable (Dogil 1999a: 878; see also Revithiadou 1999; Goedemans and van der Hulst 2012: 131). Latvian stress tends to fall on the initial syllable of the main word of the clitic group (e.g. /uz ˈjumta/ ‘on the roof’) (Kariņš 1996), which is sometimes attributed to Finnic influence (Rinkevičius 2015; cf. Hock 2015), although what may be seen as early stages of the tendency towards initial stress are also found in Lithuanian dialects, where there is no Finnic influence. Secondary stresses in both Latvian and Lithuanian occur at intervals of two or three syllables, but may also depend on syllable weight and morphological structure (Daugavet 2010; Girdenis 2014). A unique feature of Latvian, a weight-insensitive unbounded system (van der Hulst et al. 1999: 463), is the existence of distinctive patterns involving pitch and glottalization on both stressed and unstressed heavy syllables (Seržants 2003). Lithuanian orthography distinguishes three marks traditionally referred to as ‘accents’, which conflate stress and length. ‘Grave’ indicates a stressed light syllable, as in the final syllable of the instrumental case for ‘wheel’ in (1), while ‘acute’ and ‘circumflex’ indicate what is traditionally referred to as a tonal contrast on heavy syllables, as in (2a, 2b), which is now lost on long vowels. Phonetically, the role of f0 is secondary compared to the duration ratio between the first and second halves of the heavy rhyme (Dogil and Williams 1999: 278–284).
In syllables with the acute accent, the first element of the diphthong and of short-vowel-plus-sonorant combinations is lengthened, while the second is short and presumably non-moraic; the circumflex accent indicates that the second element is lengthened, while the first is short and qualitatively reduced, indicating a possible loss of its mora (Daugavet 2015: 139). The circumflex is traditionally believed to be the accent of short vowels that are lengthened under stress, as observed in §15.2.1.1 and shown in (1). Stress on circumflex syllables may also shift to certain morphemes (‘stress mobility’).

(1) stressed-vowel lengthening
    rãtas [ˈrɑː.tas] ‘wheel’, cf. ratù [ra.ˈtʊ] inst.sg


228 MACIEJ KARPIńSKI et al.

(2) a. acute
       áukštas [ˈɑˑʊk.ʃtas] ‘high’
       táiką [ˈtɑˑɪ.kɑː] ‘aim; apply’ prs.prtc.nom.pl
    b. circumflex
       aũkštas [ˈɒuˑk.ʃtas] ‘storey of a building’
       taĩką [ˈtəiˑ.kɑː] ‘peace’ acc.sg

15.2.2.2 Finno-Ugric In Estonian, the primary stress in native words always falls on the first syllable, but it may occur elsewhere in recent loans (e.g. menüü [meˈnyː] ‘menu’). Secondary stresses normally occur on odd-numbered syllables; their placement is determined by the derivational and syllabic structure of the word (Viitso 2003). The foot is maximally trisyllabic; words of more than three syllables may consist of combinations of monosyllabic, disyllabic, and trisyllabic feet (Lehiste 1997). A tetrasyllabic word is generally made up of two disyllabic trochees. The main phonetic correlate of stress in Estonian is vowel duration in interaction with the three-way quantity system: in long (Q2) and overlong (Q3) quantity, the stressed vowels are longer than the unstressed ones, whereas in short quantity (Q1) it is the other way round (Lippus et al. 2014). Primary stress in Finnish always falls on the first syllable of the word (Iivonen 1998: 315). The placement of secondary stress depends on several factors, including the segmental structure of syllables and the morphology of the word (Karlsson 1983: 150–151; Iivonen 1998: 315; Karvonen 2005). Long words are formed of disyllabic feet. In compound words, the secondary stress falls on the first syllable of the second element, even if both elements are monosyllabic (e.g. puupää [ˈpuːˌpæː] ‘blockhead’). The main phonetic correlate of stress in Finnish is the duration of segments when they constitute the word’s first or second mora (relative to segment durations elsewhere in the first foot) (Suomi and Ylitalo 2004). There is virtually no reduction of vowel quality in unstressed syllables relative to stressed syllables (Iivonen and Harnud 2005: 65). In Hungarian too, primary stress is fixed to the word-initial syllable but, unless the word carries a pitch accent, is not marked by salient acoustic cues such as vowel quality, duration, or intensity (Fónagy 1958; Szalontai et al. 2016).
The existence of secondary stress is disputed (Varga 2002).
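The default footing for Estonian and Finnish described above (primary stress on the first syllable, secondary stresses on odd-numbered syllables, disyllabic trochees, feet of at most three syllables) can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it ignores the morphological and segmental factors that also govern secondary stress in both languages, and the choice to absorb a stray final syllable into a trisyllabic foot, rather than footing it alone, is an assumption of the sketch. The function names are hypothetical.

```python
def parse_trochees(syllables):
    """Group syllables left to right into trochaic feet (simplified default).

    Disyllabic feet are built from the left edge; when exactly three
    syllables remain they form one trisyllabic foot, so outside monosyllables
    a stray final syllable is never footed on its own. That choice is an
    assumption of this sketch, not a claim taken from the text.
    """
    feet = []
    i = 0
    n = len(syllables)
    while i < n:
        remaining = n - i
        size = 3 if remaining == 3 else min(2, remaining)
        feet.append(list(syllables[i:i + size]))
        i += size
    return feet


def mark_stress(feet):
    """Join syllables with '.', marking primary (ˈ) and secondary (ˌ) stress."""
    marked = []
    for foot_index, foot in enumerate(feet):
        for syl_index, syl in enumerate(foot):
            if syl_index == 0:
                # The head of the first foot carries primary stress,
                # the heads of later feet carry secondary stress.
                syl = ("ˈ" if foot_index == 0 else "ˌ") + syl
            marked.append(syl)
    return ".".join(marked)


# Four syllables: two disyllabic trochees, secondary stress on syllable 3.
print(mark_stress(parse_trochees(["ka", "las", "ta", "ja"])))  # ˈka.las.ˌta.ja
```

With the assumed syllabification ka.las.ta.ja of Finnish kalastaja ‘fisherman’, the sketch yields two trochees; a five-syllable input yields a disyllabic foot followed by a trisyllabic one, so stresses stay on odd-numbered syllables.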

15.2.2.3 Slavic All modern West Slavic languages feature weight-insensitive word stress systems (van der Hulst et al. 1999: 436). Word stress is bound in different ways to the left or to the right edge of the word. It falls on the initial syllable in Czech and Slovak but mostly on the penultimate syllable in Polish (Jassem 1962; Steffen-Batóg 2000). In Czech, stress is achieved mainly by means of intensity with no systematic vowel reduction in unstressed conditions (Palková 1994). As an exception to the Polish penultimate syllable stress rule, stress may fall on the antepenultimate syllable in some loanwords (3a) or even on the preantepenultimate one in some verb forms (3b).

(3) a. matematyka [matɛˈmatɨka] ‘mathematics’ nom.sg
    b. pojechalibyśmy [pɔjɛˈxalʲibɨɕmɨ] ‘we would go’

The primary stress may also move to a different syllable in order to keep its penultimate position in inflectional forms.


(4) bałagan [baˈwaɡan] ‘mess’ nom.sg
    bałaganu [bawaˈɡanu] ‘of mess’ gen.sg

The nature of secondary stress in Polish is still under discussion (Rubach and Booij 1985; Newlin-Łukowicz 2012; Łukaszewicz 2018), with recent studies showing a lack of systematic acoustic evidence for it (Malisz and Żygis 2018). Similar doubts apply to Czech and Slovak. Czech and Slovak proclitics are integrated into the prosodic word (5a), while Polish word stress preserves its position except in the case of one-syllable pronominals (5b) (Dogil 1999b: 835).

(5) a. Czech/Slovak do domu [ˈdo domu] ‘(to, towards) home’ gen.sg.m
    b. Polish do mnie [ˈdɔ mɲɛ] ‘to me’ i.gen.sg.m

In the Eastern South Slavic group, Bulgarian has traditionally been described as having distinctive (non-predictable) dynamic word stress (Stojkov 1966). In Bulgarian, three of the six vowels are subject to stress-related phonological vowel reduction (Pettersson and Wood 1987; Andreeva et al. 2013). Macedonian is the only non-West Slavic language with fixed stress, which is antepenultimate in trisyllabic and longer words (Koneski 1976, 1983; Bethin 1998: 178; van der Hulst et al. 1999: 436). Unlike Bulgarian, Polish, and Slovenian, BCMS apply stress assignment rules to clitic groups (Nespor 1999: 145; Werle 2009). In Macedonian, for example, post-verbal clitics cause a stress shift to the antepenultimate syllable (Rudin et al. 1999: 551). Among Western South Slavic languages, Serbian and Croatian have a lexical high tone that spreads to the syllable to its left if there is one, with some exceptions specific to the region of Zagreb (e.g. Smiljanić 2004). Stress in Slovenian falls on the first syllable with a strong low tone or, if there is no tone, on the last syllable (van der Hulst 2010b: 455). Slovenian stress is independent of lexical low and high tones, which are obligatory in some dialects but optional in the standard language (Gvozdanović 1999; Jurgec 2007). East Slavic languages, Russian, Ukrainian, and Belarusian, have unbounded distinctive word stress systems where the stress may occupy any position in a word and differ across inflexional forms, for example as shown in (6). (6) Russian

    борода [bərɐˈda] ‘beard’ nom.sg
    бороды [ˈborədᵻ] ‘beards’ nom.pl

Russian is often characterized as having free-stress assignment (Danylenko and Vakulenko 1995; Hayes 1995; Lavitskaya 2015). Longer word forms may feature secondary stress, but rules for its location remain a matter of dispute. Duration and intensity, the latter being less significant, are the major acoustic correlates of word stress, while pitch may be important when duration- and intensity-based cues are inconclusive (Eek 1987: 21). Duration and intensity would also appear to be the major correlates of word stress for Ukrainian and Belarusian, but they may differ in terms of their weight (Nikolaeva 1977: 111–113, 127–130; Łukaszewicz and Mołczanow 2018). In Russian, vowels are systematically reduced in unstressed positions (Bethin 2012). In standard Belarusian, the contrast between non-high vowels is neutralized to [a] or a more lax [ɐ] in unstressed syllables, and vowel reduction is categorical (Czekman and Smułkowa 1988).
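The contrast drawn in this section between edge-bound stress (initial in Czech and Slovak, penultimate in Polish, antepenultimate in Macedonian) and lexically specified stress (Polish exceptions, Russian free stress) can be made concrete with a small sketch. The rule table, function names, and the `lexical_index` parameter are hypothetical illustrations; syllabifications are assumed, and the fallback of short words to the first syllable is a property of the sketch.

```python
from typing import Optional, Sequence

# Default edge-bound rules from the text, mapping a syllable count to the
# index of the stressed syllable. max(0, ...) makes short words fall back
# to the first syllable.
RULES = {
    "initial": lambda n: 0,                      # Czech, Slovak
    "penultimate": lambda n: max(0, n - 2),      # Polish default
    "antepenultimate": lambda n: max(0, n - 3),  # Macedonian
}


def stress_index(syllables: Sequence[str], rule: str,
                 lexical_index: Optional[int] = None) -> int:
    """Return the index of the stressed syllable in a stress domain.

    lexical_index stands in for stress that no edge rule predicts: Polish
    exceptions such as matematyka (3a), or the free stress of Russian.
    """
    if lexical_index is not None:
        return lexical_index
    return RULES[rule](len(syllables))


def mark(syllables: Sequence[str], index: int) -> str:
    """Join syllables with '.' and prefix the stressed one with ˈ."""
    return ".".join("ˈ" + s if i == index else s
                    for i, s in enumerate(syllables))


# Polish: stress moves to stay penultimate under inflection, as in (4).
for word in (["ba", "ła", "gan"], ["ba", "ła", "ga", "nu"]):
    print(mark(word, stress_index(word, "penultimate")))
# ba.ˈła.gan
# ba.ła.ˈga.nu

# Czech: the proclitic is part of the stress domain, as in (5a).
print(mark(["do", "do", "mu"], stress_index(["do", "do", "mu"], "initial")))
# ˈdo.do.mu
```

Treating the clitic group rather than the word as the input is what lets the initial rule reproduce Czech [ˈdo domu]; no rule in the table can predict Russian борода versus бороды, which is why the sketch needs the lexical override.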

15.2.2.4 Romance Romanian features weight-sensitive, right-edge word stress, influenced in both verbs and nouns by derivational affixes but not by inflexional ones (Chitoran 1996; Franzen and Horne 1997). Vowel quality in Romanian does not change significantly across stressed and unstressed tokens. Empirical studies show greater vowel dispersion under stress and limited centralization in unstressed positions (Renwick 2014).

15.3 Sentence prosody Intonational properties of the languages of the region have been studied to varying degrees employing both the more traditional contour-based methods and target-based descriptions such as the autosegmental-metrical (AM) framework (Table 15.1). Empirical studies of speech rhythm in these languages have contributed to the discussion on interval-based rhythm metrics.

Table 15.1 Available descriptions based on the autosegmental-metrical framework Language

Prosodic units

Pitch accents

Estonian

Intonation phrase

Finnish

Intonation phrase

Finno-Ugric H*, L*, H*+L, ^H*+L, %, H% L*+H, H+L*, H+!H* L+H*, L*+H L%, H%

Czech

Intonation phrase

Slovak

Accentual phrase Intermediate phrase Intonation phrase Intermediate phrase (aka Minor phrase) Major phrase Intonation phrase

Polish Russian BCMS Bulgarian

Phonological word Intermediate phrase Intonation phrase Intermediate phrase Intonation phrase

Slavic H*, L*, H*L, L*H, and a flat contour S* H*, L*, !H*, L* H*L, L*H, LH*, HL*, LH*L H*L, H*H, H*M, L*, L*, L*H, ^HL*, H*M/(H)L* H*+L, L*+H H*, L*, L+H*, L*+H, H+!H*, H+L* -

Boundaries

L%, H%, M%, 0% H-, L-, !H%H, H%, L% H-a, L-

Source Asu (2004, 2006), Asu and Nolan (2007) Välimaa-Blum (1993) Duběda (2011, 2014) Rusko et al. (2007), Reichel et al. (2015) Wagner (2006, 2008)

H%, L% %H, %M, %L, Odé (2008) L%, 0% %L, %H Godjevac LH(2000, 2005) L%, H%, HL% L-, HAndreeva (2007) %H, L%, H%

Romance Romanian Intermediate phrase H*, L*, L+H*, L+

Catalan cabell), and trochees are equivalent to Catalan monosyllabic words (Spanish and Portuguese caro ‘expensive’ > Catalan car). Final stress and monosyllabic words are more frequent in Portuguese than in Spanish, among other things due to the historical loss of intervocalic /l/ and /n/ (Spanish palo ‘stick’ > Portuguese pau, Catalan pal; Spanish artesana ‘craftswoman’ > Portuguese artesã, Catalan artesana). The different position of Spanish is shown in Figure 17.1. Antepenultimate stress is rare, particularly in Catalan and Portuguese. Although there are competing analyses of stress assignment in these languages, accounts of stress as a predictable phenomenon have relied on morphological information (e.g. lexical stem, suffixal morphemes, word category; for reviews, see Vigário 2003a; Hualde 2013). The extent to which syllabic quantity or weight determines stress location is debatable (Mateus and Andrade 2000; Wheeler 2005; Garcia 2017; Fuchs 2018). Besides a demarcative tendency, stress systems tend to show an alternation of prominences within the word, yielding patterns of secondary stresses. Catalan, Spanish, and some varieties of Portuguese, such as Brazilian Portuguese, may display the typically Romance alternating pattern of secondary stresses to the left of the primary stress (Frota and Vigário 2000; Hualde 2012). However, native speakers’ intuitions are less clear on the locations of secondary stress. Experimental work has often failed to find evidence for alternating patterns (Prieto and van Santen 1996; Díaz-Campos 2000). A different pattern occurs in European Portuguese, with alignment of secondary stress with the left edge of the word (Vigário 2003a). This pattern may also be found in Catalan and Spanish (Hualde 2010, 2012).

[Figure 17.1: bar chart of frequency (%), axis 0 to 45, for Catalan, Spanish, and Portuguese across four stress patterns: Monosyll, Disyll WS, Disyll SW, Trisyll WSW.]

Figure 17.1 Frequencies of stress patterns (%) in Catalan, Spanish, and Portuguese. S = Strong; W = Weak. (Data from Frota et al. 2006; Prieto 2006; Vigário et al. 2006, 2010; Frota et al. 2010)

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

Iberia 253

Pitch accents generally associate to syllables with primary stress (see §17.4.1). However, in Catalan and Spanish emphatic speech, pitch accents can additionally associate with secondary stresses located at the left edge of the prosodic word. Prominence in compound words in Ibero-Romance languages is right-headed; that is, the rightmost element of the compound bears the compound stress. Internal members of compounds typically keep their stress in Catalan and Portuguese (Prieto 2004; Vigário 2003a, 2010), whereas in Spanish the survival of internal stresses seems to depend on the morphosyntactic or lexicalized nature of the compound (Hualde 2006).

17.2.2 Basque In contrast with Catalan, Portuguese, and Spanish, Basque displays a variety of word-prosodic systems. Most Basque dialects belong to the stress-accent type, that is, all lexical words have a syllable with main word stress (cf. Hualde 1997, 1999, 2003a), with differences in the (default) stress location. The most widespread system is deuterotonic stress, that is, stress on the second syllable from the left word edge (e.g. alába ‘daughter’, emákume ‘woman’, argálegi ‘too thin’). Some varieties do not allow for final stress, so in disyllabic words stress falls on the initial syllable. In many of these varieties, there is evidence that the domain for foot construction is the root rather than the whole word, as words such as lúrrentzako ‘for the lands’ and béltzari ‘to the black one’, which morphologically are composed of the monosyllabic roots lur and beltz followed by suffixes, have initial stress. Borrowings from Spanish (Sp.) or Latin (Lt.) show initial stress: jénde or jénte ‘people’ ( aɦ͂ári > ahái ‘ram’; neskáa > neská ‘the girl’). Secondary stress has been reported in Standard Basque on the final syllable of words that are four or more syllables long, without secondary stresses on alternate syllables (e.g. gizónarenà ‘the one of the man’, enbórrekìn ‘with the tree trunks’). Final-syllable secondary stress has also been found in Southern High Navarrese, which has initial stress (Hualde 1997, 1999, 2003a). The geographical distribution of this pattern is as yet unknown. The Northern Bizkaian Basque (NBB) varieties have been classified as pitch accent systems, due to the similarity of NBB’s lexical contrast between accented and unaccented words with Tokyo Japanese. In NBB, a subject-object-verb (SOV) language, unaccented words have a prominent word-final syllable when they occur finally in a sentence fragment, including citation pronunciations, and when they occur pre-verbally in a sentence. Unlike these unaccented words, accented words have a lexically or morphologically determined prominent syllable regardless of their sentential position. The prominence derives from a H*+L accent in all cases. In the sentence on the left in Figure 17.2, the accented words amúmen ‘of the grandmother’ and liburúa ‘book’ have word-level stress, with a falling accent on the pre-final syllable. However, in the sentence on the right, the lexically unaccented word lagunen ‘of the friend’ does not have word-level stress (i.e. it does not have a pitch accent). The word dirua ‘money’ is lexically unaccented, but it receives an accent on its final syllable because it precedes the verb (cf. e.g. Hualde et al. 1994, 2002; Elordieta 1997, 1998; Jun and Elordieta 1997; Hualde 1997, 1999, 2003a). Accented words receive an accent because they have one or more accented morphemes (including roots). In most of the local varieties of NBB, the falling accent is assigned to the syllable that precedes the accented morpheme. If there is more than one accented morpheme in the word, the leftmost one determines the position of the accented syllable.

254 Sónia Frota, Pilar Prieto, and Gorka Elordieta
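The deuterotonic default of the Basque stress-accent varieties, with its two variants (a ban on final stress, and the root rather than the word as the footing domain), reduces to a small decision procedure. This is a minimal sketch under assumptions: the function name and the `domain_size` encoding are hypothetical, syllabifications such as a.la.ba and lur.ren.tza.ko are assumed, and the sketch does not model the NBB accented/unaccented system or the exceptional initial stress of borrowings.

```python
def deuterotonic_index(syllables, ban_final=False, domain_size=None):
    """Index of the stressed syllable under the Basque deuterotonic default.

    ban_final:   the variety disallows final stress, so a disyllabic
                 domain is stressed on its first syllable.
    domain_size: syllable count of the footing domain. Pass the root's
                 syllable count for varieties where the root rather than
                 the whole word is the domain (hypothetical encoding).
    """
    n = len(syllables) if domain_size is None else domain_size
    if n >= 3:
        return 1  # second syllable from the left edge
    if n == 2:
        return 0 if ban_final else 1
    return 0  # monosyllabic domain


for word in (["a", "la", "ba"], ["e", "ma", "ku", "me"], ["ar", "ga", "le", "gi"]):
    print(word[deuterotonic_index(word)])
# prints la, ma, ga (one per line), as in alába, emákume, argálegi

# Monosyllabic root lur + suffixes: initial stress, as in lúrrentzako.
print(deuterotonic_index(["lur", "ren", "tza", "ko"], ban_final=True, domain_size=1))  # 0
```

How disyllabic roots behave when the root is the domain is not covered by the text, so the sketch simply applies the same disyllable rule to them.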
In eastern varieties of NBB, such as the well-documented Lekeitio variety, a fixed location for stress developed (the penultimate or antepenultimate syllable of the word, depending on the variety), regardless of the location of the leftmost accented root. Further to the east, the Central Basque varieties of Goizueta and Leitza (in Navarre) can also be classified as pitch accent systems, this time not because of their similarity to Japanese but because of their similarity to languages such as Serbian, Croatian, Swedish, and Norwegian, which have a lexical tone contrast in the syllable with word stress. Thus, in Goizueta and Leitza there is no lexical contrast between accented and unaccented morphemes and words, as all words have one syllable that stands out as more prominent prosodically; rather, there is a distinction in stress location as well as lexical pitch accent type. Four-way distinctions exist: words with stress on the initial syllable with rising or falling pitch accents, and words with stress on the second syllable with rising or falling pitch accents (Hualde 2007, 2012; Hualde et al. 2008). Although the pitch accent varieties of NBB and Goizueta/Leitza are different, there is a historical connection between them. In fact, Hualde (2003b, 2007, 2012), Elordieta (2011), and Egurtzegi and Elordieta (2013) argue that the NBB varieties are the remnants of a once-general prosodic system with an accented/unaccented distinction, which changed into stress-accent systems in most areas of the Basque-speaking territory. Supporting evidence lies in the fact that accented morphemes in NBB are precisely those that introduce marked accentuation patterns (initial stress) in the stress-accent varieties with deuterotonic stress (cf. Hualde 2003b, 2007, 2012). Compounds show deuterotonic stress in Standard Basque. In stress-accent varieties there is considerable variation across and within local varieties. In most pitch accent varieties, compounds are generally pitch accented, even when the members in isolation are unaccented. The accent tends to occur on the last syllable of the first member of the compound, although there is variation among local varieties as well (Hualde 1997, 2003a).

[Figure 17.2: f0 contours (f0 in Hz, 50 to 200; time axis 0 to 2.5) with H*+L accent labels; word labels amúmen, liburúa, emon nau (left utterance) and lagunen, diruá, emon nau (right utterance).]

Figure 17.2 Left utterance: Amúmen liburúa emon nau (grandmother-gen book-abs give aux ‘(S)he has given me the grandmother’s book’); right utterance: Lagunen diruá emon nau (friend-gen money-abs give aux ‘(S)he has given me the friend’s money’).

17.3 Prosodic phrasing The hierarchical structure of prosodic constituents is characterized by patterns of metrical prominence and may co-determine the tonal structure of the utterance. The prosodic systems of Iberian languages differ in the set of prosodic phrases that are intonationally relevant as well as in the patterns of phrasal prominence.

17.3.1 Prosodic constituents and tonal structure Prosodic phrasing may be signalled by boundary tones. Across the Iberian languages and language varieties, up to three prosodic constituents have been defined at the phrasal level: the intonational phrase (IP), the intermediate phrase (ip), and the accentual phrase (AP). In Catalan and Spanish, the IP and the ip are intonationally relevant. They are characterized by pre-boundary lengthening (stronger at the IP level) and the presence of boundary tones after their final pitch accent, with the inventory of boundary tones for the ip being smaller than that for the IP (Prieto 2014; Hualde and Prieto 2015; Prieto et al. 2015). Figures 17.3 and 17.4 illustrate these prosodic phrases in Catalan and Spanish, respectively. Unlike Catalan and Spanish, Portuguese has only one intonationally relevant prosodic constituent, the IP (Frota 2014; Frota et al. 2015; Frota and Moraes 2016; Moraes 2008). The IP is the domain for pre-boundary lengthening; it defines the position for pauses and it is the domain of the minimal tune, which in the European variety may consist only of the nuclear accent plus the final boundary tone. Prosodic phrases smaller than the intonational phrase do not exhibit tonal boundary marking. An illustration is provided in Figure 17.5.


[Figure 17.3: f0 contour (f0 in Hz, 100 to 400; time axis 0 to 2) with word labels la, boliviana, de, Badalona, rememorava, la, noia, and ip and IP boundary labels.]

Figure 17.3 f0 contour of the Catalan utterance La boliviana de Badalona rememorava la noia (‘The Bolivian woman from Badalona remembered the girl’).

[Figure 17.4: f0 contour (f0 in Hz, 100 to 400; time axis 0 to 2) with word labels la, niña, de, Lugo, miraba, la, mermelada, and ip and IP boundary labels.]

Figure 17.4 f0 contour of the Spanish utterance La niña de Lugo miraba la mermelada ‘The girl from Lugo watched the marmalade’.

Many of the constructions reported in Catalan and Spanish to be signalled by ip boundaries, such as parentheticals, tags, and dislocated phrases, are signalled in Portuguese by IP boundaries (Vigário 2003b; Frota 2014), and the prosodic disambiguation of identical word strings by ip’s in Spanish and Catalan occurs at the IP level in Portuguese. While in Catalan, Spanish, and Portuguese there is no evidence for tonally marked prosodic constituents between the prosodic word and the ip/IP (with the exception of Northern Catalan due to contact with French; see Prieto and Cabré 2013), in many dialects of Basque, three constituents are relevant to tonal structure: the IP, the ip, and the AP.


[Figure 17.5: f0 contour (f0 in Hz, 100 to 300; time axis 0 to 1.5) with word labels a, nora, da, mãe, falava, do, namorado, and an IP boundary label.]

Figure 17.5 f0 contour of the Portuguese utterance A nora da mãe falava do namorado (‘The daughter-in-law of (my) mother talked about the boyfriend’).

[Figure 17.6: f0 contour (f0 in Hz, 50 to 200; time axis 0 to 2) with word labels Mirénen, lagúnen, liburúa, erun, dot, and AP, ip, and IP boundary labels.]

Figure 17.6 f0 contour of an utterance from Northern Bizkaian Basque: ((Mirénen)AP (lagúnen)AP (liburúa)AP )ip erun dot (Miren-gen friend-gen book-abs give aux ‘I have taken Miren’s friends’ book’).

[Figure 17.7: f0 contour (f0 in Hz, 50 to 200; time axis 0 to 2) with word labels Imanolen, alabien, diruá, erun, dot, and ip and IP boundary labels.]

Figure 17.7 f0 contour of an utterance from Northern Bizkaian Basque: ((Imanolen alabien diruá)AP )ip erun dot (Imanol-gen daughter-gen money-abs give aux ‘I have taken Imanol’s daughter’s money’).

In Basque, IP’s are signalled by different boundary tones depending on sentence modality and on whether IP’s are final or non-final in an utterance (Elordieta and Hualde 2014). While final IP’s may have low, rising, or falling contours, non-final IP’s are intonationally marked by rising contours in Basque, signalling continuation. In Standard Basque, ip’s are marked by rising boundary tones at their right edge, but in NBB they are not marked by any right-edge boundary tones. Rather, they are characterized as domains of downstep, where the H*+L pitch accents cause downstep on a following accent (Elordieta 1997, 1998, 2003, 2007a, 2007b, 2015; Jun and Elordieta 1997; Gussenhoven 2004; Elordieta and Hualde 2014; see Figure 17.6). The lower-level constituent is the AP. In NBB, AP’s are sequences of one or more words marked by a rise in pitch at the left edge and a pitch fall at the right edge. The initial rise is a combination of an AP-initial L boundary tone and a phrasal H tone phonologically associated with the second syllable of the AP, and the pitch fall is a H*+L pitch accent. The H*+L accent may belong to an accented word or to a lexically unaccented word that occurs in immediate pre-verbal position (see §17.2). In all other contexts, lexically unaccented words do not carry an accent and are included in the same AP with any following word(s). Figures 17.6 and 17.7 illustrate the general patterns of phrasing into AP’s in NBB, respectively showing a sequence of three accented words and thus three AP’s, and a sequence of three unaccented words (which form one AP) before the verb. In Standard Basque, AP’s are not clearly identified at their left edge by a rising intonation. Rather, any word with an accent could constitute an AP (Elordieta 2015; Elordieta and Hualde 2014).
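The NBB phrasing rule just described (an AP closes at each H*+L accent, which falls on a lexically accented word or on the unaccented word immediately before the verb, while other unaccented words join the AP of the following word) behaves like a simple left-to-right grouping. The sketch below assumes its input is only the pre-verbal word sequence; the verb and auxiliary, the AP-initial L and phrasal H tones, and downstep within the ip are not modeled. The function name and the `(form, accented)` encoding are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, bool]  # (word form, lexically accented?)


def nbb_accentual_phrases(preverbal: List[Word]) -> List[List[str]]:
    """Group the words before the verb into accentual phrases (APs).

    Each AP closes at a word bearing H*+L: a lexically accented word, or
    the unaccented word immediately before the verb. Other unaccented
    words are carried forward into the AP of the following word(s).
    """
    aps: List[List[str]] = []
    current: List[str] = []
    for position, (form, accented) in enumerate(preverbal):
        current.append(form)
        is_last_before_verb = position == len(preverbal) - 1
        if accented or is_last_before_verb:
            aps.append(current)  # H*+L marks the right edge of this AP
            current = []
    return aps


# Figure 17.6: three accented words give three APs.
print(nbb_accentual_phrases([("Mirénen", True), ("lagúnen", True), ("liburúa", True)]))
# -> [['Mirénen'], ['lagúnen'], ['liburúa']]

# Figure 17.7: three unaccented words give one AP, closed by pre-verbal diruá.
print(nbb_accentual_phrases([("Imanolen", False), ("alabien", False), ("dirua", False)]))
# -> [['Imanolen', 'alabien', 'dirua']]
```

Because the word immediately before the verb always closes a phrase, no unaccented word is left outside an AP, which matches the two configurations shown in the figures.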

17.3.2 Phrasal prominence Phrasal prominence refers to the main prosodic prominence within a prosodic constituent. It is frequently related to the expression of focus, in line with the tendency for languages to exploit prosodic structure for the marking of information status. In Catalan, Spanish, and Portuguese phrasal prominence is rightmost. The last prosodic word in the phrase gets nuclear stress and the nuclear pitch accent thus typically occurs close to the right edge of the intonational phrase. In a broad-focus statement such as ‘They want jam’, the main phrasal prominence is on the last word, as illustrated in (1). Similarly, in a narrow-focus statement such as ‘They want JAM (not butter)’, it is also final, as in (2), but the pitch accent used to express it is different. In all three languages, a particular pitch accent is commonly used to convey narrow (contrastive) focus (see also §17.4.2).


(1) ‘(They) want jam’
    a. (Volen melmelada)IP        L* L%       (Catalan)
    b. (Quieren mermelada)IP      L* L%       (Spanish)
    c. (Querem marmelada)IP       H+L* L%     (Portuguese)

(2) ‘(They) want JAM (not butter)’
    a. (Volen MELMELADA)IP        L+H* L%     (Catalan)
    b. (Quieren MERMELADA)IP      L+H* L%     (Spanish)
    c. (Querem MARMELADA)IP       H*+L L%     (Portuguese)

Non-final nuclear prominence is also possible, but here the three languages differ. Changes in the placement of the nuclear accent can be used as a strategy to convey narrow focus in Catalan, Spanish, and Portuguese, as shown in (3) for the statement ‘MARINA is coming tomorrow (not Paula)’, where nuclear prominence is found on ‘Marina’. However, Catalan and to a somewhat lesser extent Spanish are less flexible in shifting the main prominence to a non-phrase-final position than West Germanic languages (Vallduví 1992; Hualde and Prieto 2015). Instead, word order changes are generally used for focus marking in combination with prosodic prominence strategies (Vanrell and Fernández-Soriano 2013), as in (4). Although word order changes are also possible in some constructions in Portuguese, prosodic strategies like those exemplified in (2) and (3) are more widely used. For further details on phrasal prominence and focus in Catalan, Spanish, and Portuguese, see Frota (2000, 2014), Face (2002), Fernandes (2007), Vanrell et al. (2013), Prieto (2014), Frota et al. (2015), Prieto et al. (2015), and Frota and Moraes (2016).

(3) ‘MARINA is coming tomorrow (not Paula)’
    a. (La MARINA vendrà demà)IP      L+H* L* L%      (Catalan)
    b. (MARINA vendrá mañana)IP       L+H* L* L%      (Spanish)
    c. (A MARINA virá amanhã)IP       H*+L H+L* L%    (Portuguese)

(4) ‘(They) want JAM (not butter)’
    a. (MELMELADA)ip (volen)IP        L+H* L- L* L%   (Catalan)
    b. (MERMELADA)ip (quieren)IP      L+H* L- L* L%   (Spanish)

Unlike Catalan, Spanish, and Portuguese, Basque has SOV as the neutral word order in declarative sentences, and the main prosodic prominence is assigned to the pre-verbal constituent, that is, the object (Hualde and Ortiz de Urbina 2003). In the sentence in (5), the direct object madari bát ‘a pear’ is interpreted as the constituent with main prominence.


(5) Mirének umiari madari bát emon dotzo.
    Miren-erg child-dat pear-abs one give aux
    ‘Miren has given a pear to the child’

Any word order that is not SOV necessarily indicates that the sentence has a constituent that is focalized and other constituents have to be understood as ‘given’ information. For instance, if the sentence in (5) were to have the order OSV, the subject would be interpreted as having narrow focus and the object would be ‘given’ information. In narrow-focus contexts, the focalized constituent must be immediately to the left of the verb. Narrow focus can also occur post-verbally, at the end of the clause (Hualde and Ortiz de Urbina 2003; Elordieta and Hualde 2014). It is not clear whether there is a difference between the realization of prosodic prominence in broad focus and in non-corrective narrow focus. In Central and Standard Basque, in both cases pitch accents are rising. Whereas pitch accents may have their peaks on the post-tonic syllable in broad focus, in narrow focus there is a tendency for such peaks to be realized within the tonic syllable (Elordieta 2003; Elordieta and Hualde 2014). In the specific type of narrow focus called ‘corrective’, the focalized constituent always has an accent with a peak in the tonic syllable, followed by a reduced pitch range on the following material. This holds for all varieties (Elordieta 2003, 2007a). Thus, Iberian languages show varying prosodic prominence effects of focus, with Basque displaying heavy syntactic constraints that are not found in the other languages and with Portuguese showing a more flexible use of prosodic prominence.

17.4 Intonation In Iberian languages, the tonal structure of utterances comprises intonational pitch accents and boundary tones. Lexical pitch accents contribute to the tonal structure in Basque only (§17.2). Differences in the types, complexity, and distribution of pitch events, and resulting nuclear configurations, are found across languages. The division of labour between prosodic and morphosyntactic means to express sentence modality and other pragmatic meanings varies greatly too. Unless otherwise stated, the description below is mostly based on the varieties of each language whose intonation is best known: Central Catalan, Castilian Spanish, Standard European Portuguese, and Northern Bizkaian and Standard Basque.

17.4.1 Tonal events

All of the languages described in this chapter have pitch accents and IP boundary tones, but not all show ip or AP edge tones. Portuguese stands out for the absence of ip tonal boundaries, whereas Basque is unique in the central role played by the AP in its tonal structure. This is as expected, given the set of intonationally relevant prosodic constituents for each language described in §17.3.1. There are larger and often different sets of nuclear pitch accents than of prenuclear pitch accents in Catalan, Spanish, and Portuguese. While most pitch accents in the languages’

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

Iberia 261 inventories are used only in nuclear position, it is not uncommon for a given pitch accent to occur only in prenuclear position, as do Catalan L*+H and Spanish L+

[Figure 18.6 example utterances: ‘Was the kid smart?’; Is do en Luus2 jewäs? ‘Is there a louse?’]

Figure 18.6 Cologne Accent 1 and Accent 2 in nuclear position interacting with two contours, H*L L% and L*H L%.

In some dialects, the opposition between Accent 1 and Accent 2 remains intact outside accented syllables. In Cologne and Maastricht, for instance, the contrast is maintained in postnuclear position by both durational and f0 cues (Gussenhoven and Peters 2004; Peters 2006b; Gussenhoven 2012b), while in Tongeren and Hasselt, Accent 2 primarily affects f0 (Heijmans 1999; Peters 2008). Venlo, Roermond, and Helden maintain the contrast only in intonationally accented syllables and IP-final syllables, where the lexical tone of IP-final Accent 2 interacts with the final boundary tone (Gussenhoven and van der Vliet 1999; Gussenhoven 2000c; Gussenhoven and van den Beuken 2012). Lexical tones associate either to the mora (Central Franconian and Dutch Limburgian dialects) or to the syllable (some Belgian Limburgian dialects) (Peters 2008). Where the TBU is the mora, the tone contrast is bound to bimoraic syllables, with (sonority) requirements on segmental structure that vary across dialect groups. In addition, regional variation is found in the lexical distribution of Accent 1 and Accent 2 (for overviews, see Schmidt 1986, 2002; Hermans 2013). Unlike the case in CNG, the presence of a lexical tone accent distinction in Franconian varieties does not seem to drastically restrict the richness of the intonational system (Gussenhoven 2004: 228 ff.). Figure 18.6 illustrates the interaction of Accent 1 and Accent 2 with two nuclear contours in Cologne German, where the lexical tone of Accent 2 assimilates to the starred tone (after Peters 2006b). More recently, alternative metrical analyses have been proposed that account for the opposition in terms of a contrastive foot structure (e.g. Hermans and Hinskens 2010; Hermans 2013; Köhnlein 2016; Kehrein 2017; for discussion see Gussenhoven and Peters 2019).
Apart from the question of whether the tone accent distinction is better accounted for by a tonal or a metrical analysis, there is an unresolved controversy about the origin of the Franconian tone accents (cf. de Vaan 1999; Gussenhoven 2000c, 2018a; Schmidt 2002; Köhnlein 2011; Boersma 2017).

284 Tomas Riad and Jörg Peters

18.4 Concluding remarks

The prosody of varieties of Continental Germanic may be more similar than descriptive traditions suggest. For the stress system, the bigger divide is between English and the other Germanic languages, for which the Norman Conquest is usually taken to be responsible. Within Continental Germanic, there is a major divide between the languages that retain a surface-evident consonant quantity distinction and those that do not. That isogloss runs between the northern CNG languages on the one hand, and Danish, Standard German, Standard Dutch, English, and so on, on the other. At the same time, there are several CWG varieties with a generally southern (Alemannic) spread that retain a consonant quantity distinction (Kraehenmann 2001). The Germanic languages are all intonational languages, employing pitch accents and boundary tones that tend to be distributed and/or used for information-structural and pragmatic purposes. There are also two lexical tonal systems that are superimposed on the intonation system. In CNG, the tonal contrast must be assumed to have restricted the pragmatic variation as expressed by the intonation, unlike the case in CWG, where the intonation system is less drastically affected by the presence of a tonal contrast. This speaks to a basic difference between the CNG and CWG tonal contrasts, where the Franconian tonal system is constituted more like the Latvian and Lithuanian ones than the North Germanic one. This is also seen in the interaction with segmentals. The CNG system is relatively ‘prosodic’ in that it (i) is insensitive to the sonority of syllables, (ii) does not affect vowel quality, (iii) typically requires more than one syllable to be expressed (Accent 2), and (iv) assigns a version of the lexical tonal contour (Accent 2) to compounds, by a prosodic rule. By contrast, the CWG tone contrast (i) requires two sonorant morae if the TBU is the mora, (ii) may affect vowel quality, (iii) may occur within a single stressed syllable, and (iv) does not have a prosodic rule assigning a particular tonal contour to compounds.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 19

Intonation Systems Across Varieties of English

Martine Grice, James Sneed German, and Paul Warren

19.1 The role of English in intonation research

The mainstream standard varieties of English have played a major role in the development of models of intonation, with different traditions on either side of the Atlantic. The British School emphasized auditory training, including the production and perception of representative intonation patterns (akin to training in the cardinal vowel system; Jones 1967), and transcription using tonetic stress marks. These diacritics indicate the position of stressed syllables and pitch movements (falling, rising, level, falling-rising, and rising-falling) across these and following syllables. The most important feature is the ‘nuclear tone’, the pitch movement over the stressed syllable of the most prominent word in the phrase1 (nucleus) and any following syllables (tail). The British School has included didactic approaches (O’Connor and Arnold 1961), approaches focused on phonetic detail and broad coverage (Crystal 1969), and approaches considering the relationship with syntax and semantics/pragmatics (e.g. Halliday 1967; Tench 1996). Meanwhile, American Structuralism focused on phonological structure, separating stress-related intonation from intonation marking the edges of phrasal constituents (Pike 1945; Trager and Smith 1951). Together with the all-encompassing work by Bolinger (1958, 1986, 1989), who introduced the concept of pitch accent, and insights from work on Swedish (Bruce 1977), this set the stage for autosegmental-metrical (AM) approaches to English intonation (e.g. Liberman 1975; Leben 1976; Pierrehumbert 1980; Gussenhoven 1983b; Ladd 1983; Beckman and Pierrehumbert 1986). In AM theory, the separation of prominence-cueing and edge-marking aspects of intonation is crucial: pitch accents are associated with stressed syllables, and edge tones with phrase edges. The AM equivalent to the British School nuclear tone is the combination of the last pitch accent and the following edge tones, referred to as the ‘nuclear contour’. In what follows, we couch our discussion in AM theory, currently the most widespread approach to intonation, paying special attention to the nuclear contour, which is often the focus of comparison in the papers we have consulted. Although using AM representations facilitates comparison, we nonetheless have to exercise caution, since AM models can differ from each other in ways that do not always reflect differences in the varieties they describe. Each model can be seen as a prism through which to observe a variety, and at the same time a model that is developed to describe a variety is likely to have been shaped by this variety. We return to this problem when discussing mainstream varieties below.

1 Whether the nuclear syllable is always the most prominent in the phrase has been contested. Instead it has been defined positionally as the last content word or the last argument of the verb.

286 MARTINE GRICE, JAMES SNEED GERMAN, AND PAUL WARREN

19.2 Scope of the chapter

This chapter is concerned with the structure and systematicity underlying intonation. A historical emphasis on standardized varieties from the British Isles and North America means that our descriptive understanding of those varieties is particularly comprehensive, and they have also played a central role in the development of theoretical models and frameworks. Therefore, we treat those varieties, and closely related varieties from Australia, New Zealand, and South Africa, under the label ‘Mainstream English Varieties’ (MEVs). MEVs show relatively few differences in overall phonological organization, whereas more substantial variation arises in cases where MEVs were adopted and subsequently nativized by non-English-speaking populations. We therefore examine separately a selection of ‘Contact English Varieties’ (CEVs) that differ from MEVs in typologically interesting ways.2 We explore the challenges posed by their diverging prosodic structures, in terms of both prominence and edge-marking, and observe that an account of the intonation of current-day ‘Englishes’ needs to cover a broad range of typological phenomena going well beyond what is present in the extensively researched mainstream varieties. This broad range in turn provides us with a chance to observe prosodic variation within one language in a way that is usually only available to cross-linguistic studies.

19.3 Intonational systems of Mainstream English Varieties

MEVs have many intonational properties in common, both in terms of the distribution of tones throughout the utterance and in terms of their local and global tonal configurations. Hence, although we discuss northern hemisphere and southern hemisphere varieties separately below, this is mainly for convenience in the context of this handbook, which groups languages according to their geographical location.

2 This distinction is closely similar to that between Inner and Outer Circle varieties (Kachru 1985). A separate terminology is used here because the present review is concerned primarily with similarities and differences in the synchronic structural aspects of nativized varieties, as opposed to their sociolinguistic contexts.


INTONATION SYSTEMS ACROSS VARIETIES OF ENGLISH 287

19.3.1 Northern hemisphere

All MEVs have lexical word stress, in that specific syllables within a word are designated as prosodically privileged in the sense of Ladd (2008b) and Hyman (2014b). Not only do such syllables receive acoustic and articulatory enhancement, but they are also possible locations for pitch accent realization. When present, pitch accents usually occur on the primary stressed syllable of a word, though, in particular contexts, secondary stressed or even unstressed syllables can be promoted to bear accents. This can be for rhythmic reasons (e.g. Chiˈnese has an initial accent in CHInese GARden to avoid a clash of pitch accents on adjacent syllables) or in metalinguistic corrections, such as ‘I met roSA not roSIE’. Northern hemisphere MEVs also share a considerable reduction of unstressed syllables, leading to strong durational differences between stressed and unstressed syllables. This is not always the case in southern hemisphere MEVs (see §19.3.4). Pitch accents are sparsely distributed in MEVs, and are more common on nouns than on verbs, and on content words than on function words. Each intonational phrase (or intermediate phrase, if the model includes one) requires at least one pitch accent, the nuclear pitch accent (with exceptions for subordinate phrases and tags; see Crystal 1975: 25; Firbas 1980; Gussenhoven 2004: 291; Ladd 2008b: 238). Prenuclear pitch accents are not usually obligatory in MEVs, but they may serve information-structural functions (e.g. topic marking). They may also be optionally inserted for rhythmic purposes, usually near the beginning of a phrase. MEVs use the placement of nuclear pitch accents to mark the difference between new or contrastive information and discourse-given information. The rules governing this relationship are complex and outside the scope of this chapter (see Wagner 2012a for a review).
In broad terms, however, a focused constituent obligatorily contains a nuclear pitch accent, while discourse-given words following the focused constituent are unaccented. This contrasts with many contact varieties, for which accent placement does not appear to be used to mark information status or structure. The intonation systems of MEVs have a rich paradigmatic range of pitch accents and edge tone combinations. The pitch accents proposed in a consensus system (Beckman et al. 2005) to describe Mainstream American English (MAE) are simple low (L*) and high (H*) tones, rises (L+H*), and scooped rises (L*+H); downstepped variants of each of these last three (!H*, L+!H*, L*+!H); and an early-peak falling accent (H+!H*). This system assumes two phrase types: the intonational phrase (IP) and the intermediate phrase (ip). The latter has a high (H-), low (L-), or downstepped high-mid (!H-) edge tone at its right edge. IP-finally, the ip edge tone is followed by a high (H%) or low (L%) IP edge tone (Pierrehumbert 1980; Beckman and Pierrehumbert 1986). The co-occurrence of ip and IP edge tones leads to complex tone sequences at IP edges: a phrase-final nuclear syllable—if combined with a bitonal pitch accent—can carry up to four tones. These pitch accents and edge tones are illustrated in online ToBI (Tones and Break Indices) training materials (Veilleux et al. 2006). The edge tones of ip’s are also referred to as ‘phrase accents’. For some varieties it has been argued that they can be associated with a post-focal stressed syllable (Grice et al. 2000),3 lending them an accent-like quality. Although this does not promote the syllable sufficiently
to allow it to bear a pitch accent (thus, it does not counteract deaccentuation), the syllable is rendered more prominent by virtue of bearing a tone. This same inventory of pitch accents and edge tones could in principle be used to describe Southern Standard British English (SSBE) (Roach 1994; Ladd 2008b). However, there are a number of differences in the tonal inventories of AM models developed for SSBE, although they appear to reflect differences in the models themselves rather than differences between the intonation of MAE and SSBE. For instance, some models include a pitch accent representing a nuclear fall H*+L (also written H*L; see Gussenhoven 1983b, 2004 on British English; Grabe et al. 2001), whereas others capture this configuration with a sequence of pitch accent and ip edge tone H* L-. AM models of MEVs also differ in their treatment of the movement (onglide) towards the starred tone. In MAE_ToBI it is captured with a leading tone (L+H*), whereas in some other models it is either non-distinctive (and therefore treated as phonetic detail; Grabe et al. 2001) or derived from a L tone from a previous accent (Gussenhoven 2004). Nonetheless, if the pitch movement is falling (i.e. there is high pitch before the accented syllable), there is a general consensus across the different models that this should be captured with an early-peak accent (H+!H*). The status of the onglide is certainly of theoretical import, but it does not capture differences across the intonation of these varieties (see Grice 1995a for a discussion). It may be, for instance, that a more frequent use of falling pitch phrase-medially leads to the conclusion that there must be a H*+L pitch accent (Estebas Vilaplana 2003), whereas if falls tend to occur phrase-finally, they might be more likely to be analysed as H* followed by an L- edge tone.
The danger of comparing different intonational systems using models that are based on different assumptions, even if they are all in the AM tradition, is discussed in Ladd (2008a), who argues that this can lead to comparisons that are not about particular languages or varieties but about the models themselves. Crucially, differences in the analysis of specific patterns should not be downplayed, as they have consequences for the overall organization of a model, including, for example, the need for an ip level of phrasing. Ladd (2008b) and Gussenhoven (2016), among others, provide valuable critiques of the MAE-based consensus model outlined above. However, they highlight that revisiting early theoretical choices of that model need not undermine the appropriateness of an AM approach to MEVs. Although any combination of pitch accents and edge tones is in principle allowed, certain preferred combinations occur more frequently than others (Dainora 2006). Likewise, although individual pitch accents and edge tones have been assigned pragmatic functions (Pierrehumbert and Hirschberg 1990; Bartels 1999), it is often combinations of nuclear pitch accents and edge tones (i.e. nuclear contours) that are referred to when the meaning of intonation is discussed (see e.g. Crystal 1969; Brazil et al. 1980; Cruttenden 1986; Tench 1996). For instance, rise-fall and (rise-)fall-rise nuclear contours are said to convey meanings such as unexpectedness or disbelief respectively (see §19.5 for a discussion of rises). Differences between North American and British standard varieties can often be expressed in terms of which nuclear contour is preferred in which pragmatic context (Hirst 1998), rather than differences in the inventory. However, differences in usage can lead to misunderstandings: for instance, a request using H* L-H% is routinely used in SSBE but can sound condescending in MAE (Ladd 2008b), where a simple rise is preferred.

3 This is particularly common in fall-plus-rise contours, referred to as a compound fall-plus-rise tune in the British School, e.g. ‘My \mother came from /Sheffield’, where Sheffield is given in the discourse context (O’Connor and Arnold 1961: 84).
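The consensus inventory described above can be made concrete by enumerating it. The sketch below is an illustration only, assuming the eight pitch accents, three ip edge tones, and two IP edge tones listed in the text for MAE_ToBI, and treating every combination as licit (as the text says holds in principle); it does not model which combinations are preferred (Dainora 2006). The function names are hypothetical. It shows how the 'up to four tones' on a phrase-final nuclear syllable arises: one bitonal accent plus one ip and one IP edge tone.

```python
from itertools import product

# Inventory as listed in the text for the MAE_ToBI consensus system (Beckman et al. 2005).
PITCH_ACCENTS = ["L*", "H*", "L+H*", "L*+H", "!H*", "L+!H*", "L*+!H", "H+!H*"]
PHRASE_ACCENTS = ["H-", "L-", "!H-"]  # ip edge tones ('phrase accents')
BOUNDARY_TONES = ["H%", "L%"]         # IP edge tones


def tone_count(label: str) -> int:
    """Count the H and L tones in a label; '+' joins the two tones of a bitonal accent."""
    return label.count("H") + label.count("L")


def nuclear_contours() -> list:
    """Every IP-final nuclear contour: last pitch accent + ip edge tone + IP edge tone."""
    return [f"{pa} {ip}{bt}"
            for pa, ip, bt in product(PITCH_ACCENTS, PHRASE_ACCENTS, BOUNDARY_TONES)]


def max_tones_on_final_nucleus() -> int:
    """Most tones a phrase-final nuclear syllable can carry under this inventory."""
    return max(tone_count(pa) + tone_count(ip) + tone_count(bt)
               for pa, ip, bt in product(PITCH_ACCENTS, PHRASE_ACCENTS, BOUNDARY_TONES))


if __name__ == "__main__":
    contours = nuclear_contours()
    print(len(contours))                 # 48 (8 accents x 3 ip tones x 2 IP tones)
    print(contours[:3])                  # ['L* H-H%', 'L* H-L%', 'L* L-H%']
    print(max_tones_on_final_nucleus())  # 4
```

Enumerating the full cross-product makes clear that the inventory itself does not distinguish frequent from rare contours; differences between MAE and SSBE of the kind discussed above would show up in how often each contour is used in a given context, not in this list.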


19.3.2 Non-mainstream varieties of American English

The limited research on the intonation of regional or ethnic varieties of American English suggests that varietal differences are relatively minor. Comparing rising pitch accents in speakers from Minnesota and Southern California, for example, Arvaniti and Garding (2007) found that the former group has later alignment of tonal targets and likely lacks a distinction between what are taken to be two distinct accents in the latter variety, H* and L+H*. In a comparison of Southern and Midland US varieties, Clopper and Smiljanić (2011) found distributional differences, namely in the relative frequencies of certain edge tone categories, which also varied across gender and texts. African American English (AAE) has been relatively well studied (see Thomas 2007, 2015 for reviews). One widely observed feature is that certain words have initial stress where MAE has non-initial stress (e.g. ˈpo.lice, ˈho.tel; e.g. Fasold and Wolfram 1970). This is noteworthy since it affects tonal alignment. A few studies report final falling or level contours in AAE for polar questions (Tarone 1973; Loman 1975; Jun and Foreman 1996; Green 2002), whereas MAE speakers typically use a high rising contour (L* H-H%). Jun and Foreman (1996) also report post-focal pitch accents (i.e. no deaccenting) with early focus. Other differences involve f0 scaling, including a greater overall range (Tarone 1973; Hudson and Holbrook 1981, 1982; Jun and Foreman 1996) and a lower degree of declination (Wolfram and Thomas 2002; Cole et al. 2008). Research on Chicano English in Hispanic communities remains entirely descriptive, and characteristics vary by region. Metcalf (1979) provides a comprehensive survey of Chicano English studies spanning 20 years, identifying five features as typical: (i) a tendency for noun compounds to have primary stress on the second constituent rather than the first (e.g.
baby ˈsitter; Metcalf 1979; Penfield 1984), (ii) a less pronounced utterance-final fall for declaratives and wh-interrogatives (Castro-Gingras 1974; Metcalf 1979), (iii) greater use of non-final rising pitch prominences (Castro-Gingras 1974; Thomas and Ericson 2007), (iv) the use of final falling contours for polar questions (Castro-Gingras 1974; Fought 2003), and (v) the use of emphatic ‘rising glides’ with a peak very late in the stressed syllable (Penfield 1984; Penfield and Ornstein-Galicia 1985). Fought (2002) also notes additional lengthening of stressed syllables at the beginnings and ends of utterances. Burdin (2016) presents a comprehensive examination of English intonation among Jewish and non-Jewish Americans in Ohio. While the differences mainly concern category usage (e.g. Jewish speakers use more rising pitch accents in listing contexts and more level contours in narratives), her findings suggest that contact with Yiddish has contributed to a more ‘regular alternation between high and low tones within a prosodic phrase’ (p. 12), or ‘macro-rhythm’ (Jun 2014b), in Jewish English. In general, existing research on non-standard American English varieties is highly descriptive, rarely concerned with phonological structure, and mostly from older studies; this area of research clearly needs to be updated.

19.3.3 Non-mainstream British varieties

Across British varieties there is considerable regional and individual variation in intonational inventories (Cruttenden 1997; Grabe et al. 2000), as well as differences in preferred patterns in given pragmatic contexts. One striking aspect of many urban northern varieties (e.g. in Birmingham, Liverpool, Glasgow, Belfast, and Tyneside, which Cruttenden 1994 dubbed Urban Northern British (UNB)) is the routine use of rising contours in statements. Not only their form but also their usage as the default contour distinguishes them from uptalk (see §19.5). Cruttenden (1997) proposes four types of ‘rise’: a simple rise (preferred in Glasgow), two rises with plateaux (rise-plateau and rise-plateau-slump, both common in Birmingham, Liverpool, Belfast, and Tyneside), and a rise-fall, found less commonly in several UNB varieties as well as in Welsh English. An alternative AM transcription system has been developed specifically to capture this regional variation (Intonational Variation in English, or IViE; Grabe et al. 2001); it would transcribe these contours as L*+H (Grabe 2002, 2004) combined with different edge tone sequences. To capture the difference between a final rise, a level, and a slump, IViE incorporates a null boundary tone (0%) in addition to H% and L% (the latter taken to be low, not upstepped as is sometimes the case in MAE_ToBI). However, in a detailed description of Manchester English, Cruttenden (2001) argues that this addition is insufficient to capture the range of distinctive contours both here and in UNB varieties. Distinctions need to be made between slumps and slumps plus a mid-level stretch, and between simple falls and falls plus a low-level stretch. Cruttenden argues for a feature involving sustention, rather like spreading, which §19.4 will show is useful for contact varieties. Irish English, referred to as Hiberno-English, is defined by a set of unofficial localized linguistic standards, setting it apart from British varieties.
The intonation patterns in Hiberno-English varieties can be mapped onto corresponding regional Irish varieties (Dalton and Ní Chasaide 2007b; Dorn and Ní Chasaide 2016). There is a clear difference in intonation between the south and the north of the island, which comprises the Republic of Ireland and Northern Ireland. According to Kalaldeh et al. (2009), declarative statements in Dublin and Drogheda English tend to have prenuclear and nuclear falling contours, analysed as bitonal H*L pitch accents, whereas further north (e.g. in Donegal) they have prenuclear and nuclear rises similar to those in Belfast (analysed as L*H). O’Reilly et al. (2010) report post-focal deaccenting in Donegal English. Early focus, interestingly, leads to a gradual fall from the focal H tone over the post-focal domain, reaching low pitch phrase-finally, rather than the characteristic final rise of broad-focus or late-focus statements in this variety.

19.3.4 Southern hemisphere mainstream varieties

Many of the properties of northern hemisphere varieties apply to southern hemisphere varieties too. Analyses of Australian English (AusE) assume the same inventory of pitch accents, phrase accents, and boundary tones as the AM analyses of MAE (see e.g. Cox and Fletcher 2017), although earlier analyses (e.g. Fletcher et al. 2002) included H*+L and its downstepped variant, !H*+L, which are absent from MAE_ToBI. Rising tunes have been widely studied in AusE (see also §19.5). Fletcher et al. (2002) found that low-onset high rises (L* H-H%) and expanded-range fall-rises (H*+L H-H%) are more frequent and have greater pitch movements than low-range rises (L* L-H%) and low-range fall-rises (H* L-H%). There are higher topline pitch values and more high-onset rises (H* H-H%) in instructions and information requests than in acknowledgements. For participants issuing instructions, expanded-range rises are more likely at the ends of turns than turn-medially. An analysis of newsreader speech showed that New Zealand English (NZE) has comparatively fewer level nuclei and more complex nuclei (the latter potentially expressing greater emotional involvement) than British English (Warren and Daly 2005). NZE tends to have more dynamic intonation patterns than British English, with a higher rate of change in pitch (Warren and Daly 2005), though this varies by region within New Zealand. Vermillion (2006) found that NZE is characterized by higher pitch, with higher boundary tones (H%) but smaller pitch falls between adjacent H* accents. Comparisons of AusE and NZE have chiefly focused on rising intonation patterns, particularly uptalk (Fletcher et al. 2005; Warren and Fletcher 2016a, 2016b). One point of comparison has been how the two varieties distinguish uptalk rises on declaratives from yes/no question rises, with more dramatic rises for statements than for questions (see §19.5) and more fall-rise patterns for uptalk in AusE. Moving finally to South Africa, given that English is the most commonly spoken language in official and commercial public life and is the country’s lingua franca, it is surprising how little has been written about the intonation of this variety. Indeed, reference descriptions of South African English (e.g. Lass 2002; Bowerman 2008) make no mention of prosodic features. Uptalk rises on declaratives have been reported for White South African English (WSAfE), and, as in NZE, these tend to have a later rise onset than question rises (Dorrington 2010a, 2010b). For Black South African English see §19.4.8.

19.4 English intonation in contact

This section considers varieties arising from second language (L2) use of English, with subsequent nativization. Contact with typologically diverse languages has resulted in intonation systems that differ, sometimes dramatically, from those of MEVs.

19.4.1 Hong Kong English

Hong Kong English (HKE), or Cantonese English, refers to a variety spoken by first language (L1) speakers of Cantonese, either as an L2 or as a nativized co-L1 in contexts such as Hong Kong, where both languages are spoken side by side. Luke (2000) provided the first account of HKE intonation in terms of tone assignment that is sensitive to lexically stressed syllables. Word stress patterns of SSBE are interpreted in terms of three Cantonese level tones: High (H), Mid (M), and Low (L). The resemblance between Cantonese level tones and the pitch patterns of HKE syllables is indeed striking, and most authors assume that the tone inventory of HKE at least partly originates from Cantonese (though see Wee 2016 for a discussion). Luke (2000) proposes the rules in (1) for tone assignment in HKE, where ‘stress’ corresponds to all primary stressed syllables and variably to secondary stressed syllables in British English. Other patterns result from concatenations of the word-level patterns, except that certain classes of function words (possessors, modals, and monosyllabic prepositions) are realized as M (Wee 2016).

(1) a. Stressed syllables are realized as H.
    b. Unstressed syllables before the first stressed syllable in a word are realized as M.
    c. H spreads rightward.

Luke (2000) assumes that all syllables have level pitch, though later studies show that the final syllable may be realized as a fall, resulting from a declarative L% boundary (Wee 2008; Cheung 2009). As Figures 19.1a and 19.1b show, the final syllable is realized as HL if it is stressed and as L otherwise. For sentences that end with more than one unstressed syllable in sequence, an interpolated fall stretches from the H of the last stressed syllable (Figure 19.1c; Wee 2016). Luke (2000) and Gussenhoven (2017b, 2018b) also observe a sustained, high-level final pattern for final-stress utterances. Gussenhoven attributes this to an absence of boundary tone (Ø), which contrasts with L% and conveys a non-emphatic declarative meaning. Wee (2016), however, argues that this is a phrase-medial pattern (even when pronounced in isolation), ruling out a boundary tone. The right edge of polar interrogatives is high rising H H% if the last syllable is stressed (Figure 19.2a) and falling-rising HLH% from the last stressed syllable otherwise (Figures 19.2b and 19.2c). According to Gussenhoven (2017b, 2018b), pre-final L in the latter case arises from a HL assigned to the last stressed syllable (as opposed to H for earlier stresses), while Wee (2016) assumes that LH% is a compound boundary tone. Existing accounts mostly agree on the surface phonological representations in HKE, and that deriving those representations begins with an assignment of H to lexically privileged syllables corresponding to stress in MEVs. They also agree that H spreads rightward within a word unless the word is IP-final, and M is assigned to unstressed syllables before the first stress in a word, as well as to certain functional categories.

Figure 19.1 Tonal representations and stylized f0 contours for three stress patterns in a declarative context. (Adapted from Wee 2016: ex. 15d) [Panels: a. Final stress, tea; b. Penultimate stress, apple; c. Antepenultimate stress, yesterday.]

Figure 19.2 Tonal representations and stylized f0 contours for three stress patterns in a polar interrogative context. (Adapted from Wee 2016: ex. 15g) [Panels: a. Final stress, tea; b. Penultimate stress, apple; c. Antepenultimate stress, yesterday.]
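Because rules (1a-c) and the declarative adjustments are stated procedurally, they can be sketched as a small function. This is a simplified illustration, not Luke's (2000) or Wee's (2016) formalization: it assumes stress is supplied as one boolean per syllable, uses a hypothetical "-" symbol for syllables realized by interpolation, and covers declaratives only (the sustained high-level pattern and polar interrogatives are left out). Following the points of agreement summarized above, H spreads rightward in every word except the IP-final one.

```python
from typing import List

# A word is a list of syllables; True marks a syllable treated as stressed
# (primary stress, and optionally secondary stress, in British English).
Word = List[bool]


def hke_word_tones(word: Word) -> List[str]:
    """Rules (1a-c): stressed -> H, unstressed before the first stress -> M, H spreads rightward."""
    tones: List[str] = []
    seen_h = False
    for stressed in word:
        if stressed:
            seen_h = True
            tones.append("H")  # (1a)
        elif seen_h:
            tones.append("H")  # (1c) rightward spreading of H
        else:
            tones.append("M")  # (1b)
    return tones


def hke_declarative(words: List[Word]) -> List[List[str]]:
    """Concatenate word patterns, then adjust the IP-final word for a declarative L%."""
    tones = [hke_word_tones(w) for w in words]
    last_word, last_tones = words[-1], tones[-1]
    stressed = [i for i, s in enumerate(last_word) if s]
    if last_word[-1]:
        last_tones[-1] = "HL"  # stressed final syllable: H followed by L%
    else:
        if stressed:
            # No H spreading in the IP-final word: syllables between the last
            # stress and the final syllable are unspecified ("-") and realized
            # by an interpolated fall.
            for i in range(stressed[-1] + 1, len(last_word) - 1):
                last_tones[i] = "-"
        last_tones[-1] = "L"  # unstressed final syllable carries L% (assumed also for unstressed-only words)
    return tones


if __name__ == "__main__":
    print(hke_declarative([[True]]))                # [['HL']]  (tea)
    print(hke_declarative([[True, False]]))         # [['H', 'L']]  (apple)
    print(hke_declarative([[True, False, False]]))  # [['H', '-', 'L']]  (yesterday)
```

Treating the IP-final word separately mirrors the description above, where the only difference between medial and final words is whether H is allowed to spread to post-stress syllables.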

19.4.2 West African Englishes (Nigeria and Ghana)

Nigerian English (NigE) intonation has been described as involving predominantly level tones (Wells 1982; Jowitt 1991; Gut 2005). This is not surprising given its emergence in contact with level-tone languages such as Hausa, Igbo, and Yoruba. Gut (2005) proposes that tones are assigned to syllables based on word stress and grammatical category: lexical words take H on all syllables except the first, which is L if unstressed and H otherwise. Certain functional categories (e.g. personal pronouns) also take H, while articles, prepositions, and conjunctions take L. Downtrending is widely observed (Jowitt 2000; Gut 2005). A perception study (Gussenhoven and Udofot 2010) suggests that this results from obligatory downstep between subsequent H tones, triggered by a L associated to the left edge of lexical words (in words with initial stress, L is floating). A further production study (Gussenhoven 2017b) indicated that, contra Gut (2005), H is assigned only to syllables that have primary or secondary stress in SSBE, and that intervening syllables have high f0 due to interpolation. Word-level tone assignment appears obligatory and cannot be modified for the expression of contrast or information status (Jibril 1986; Jowitt 1991; Gut 2005). This may explain the impression that NigE has nuclear pitch on the final word. For declarative utterances, the final syllable is falling, which Gussenhoven (2017b) attributes to L%. Polar interrogatives generally end with a high or rising tune (Eka 1985; Jowitt 2000), suggesting that L% alternates with H%. Ghanaian English (GhanE) is very similar to NigE with a few notable exceptions. Gussenhoven (2017b), building on observations by Criper (1971) and Criper-Friedman (1990), proposes that H is assigned only to syllables corresponding to primary stresses in British English.
For lexical words and certain functional categories, a word-initial L spreads to all syllables before the first H, while H spreads rightward from that syllable. IP-finally, word-final unstressed syllables are low or falling, suggesting that H is blocked from spreading rightward in that position. As in NigE, most function words are assigned L. Downstep is triggered across subsequent LH-bearing words, while pitch reset occurs across IP boundaries. Polar interrogatives are not usually marked with a boundary tone, ending instead with the pitch level of the last H. Occasionally, the final syllable of a polar interrogative is low-rising; thus, when H% occurs, a polar tone rule changes the final H to L.
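The word-level rules paraphrased above for NigE (Gut 2005) and GhanE (Gussenhoven 2017b) are explicit enough to be stated as small procedures. The Python sketch below is a toy illustration only and does not reproduce the authors' formalism. Several elements are hypothetical: the stress coding (1 = primary, 2 = secondary, 0 = unstressed), the category labels, the set of H-taking function words, and the function names. Downstep, floating L, and boundary tones are ignored. The sketch also assumes that the blocking of H spreading in IP-final words applies to every post-stress syllable.

```python
from typing import List

# Hypothetical category labels. The text names personal pronouns as H-taking
# and articles, prepositions, and conjunctions as L-taking function words.
H_FUNCTION = {"pronoun"}
L_FUNCTION = {"article", "preposition", "conjunction"}


def nige_tones(stress: List[int], category: str) -> List[str]:
    """NigE word-level tones after Gut (2005), as paraphrased in the text.

    stress: one code per syllable (1 = primary, 2 = secondary, 0 = unstressed).
    """
    if category in H_FUNCTION:
        return ["H"] * len(stress)
    if category in L_FUNCTION:
        return ["L"] * len(stress)
    # Lexical word: H on every syllable except the first,
    # which is L if unstressed and H otherwise.
    first = "H" if stress[0] > 0 else "L"
    return [first] + ["H"] * (len(stress) - 1)


def ghane_tones(stress: List[int], category: str, ip_final: bool = False) -> List[str]:
    """GhanE word-level tones after Gussenhoven (2017b), as paraphrased in the text.

    Assumes every word outside L_FUNCTION has exactly one primary stress (code 1).
    """
    if category in L_FUNCTION:
        return ["L"] * len(stress)      # most function words are assigned L
    h = stress.index(1)                  # H is linked to the primary-stressed syllable
    tones = ["L"] * h + ["H"]            # word-initial L spreads up to the first H
    # H spreads rightward, except in an IP-final word, where (by assumption)
    # all post-stress syllables stay low.
    tones += ["L" if ip_final else "H"] * (len(stress) - h - 1)
    return tones


# 'banana' (stress on the second syllable) and 'elephant' (initial stress)
print(nige_tones([0, 1, 0], "lexical"))                   # ['L', 'H', 'H']
print(ghane_tones([0, 1, 0], "lexical"))                  # ['L', 'H', 'H']
print(ghane_tones([0, 1, 0], "lexical", ip_final=True))   # ['L', 'H', 'L']
print(ghane_tones([1, 0, 0], "lexical", ip_final=True))   # ['H', 'L', 'L']
```

In this word-level sketch, the two rules diverge only in IP-final words, where post-stress syllables in GhanE stay low. The NigE final fall that Gussenhoven (2017b) attributes to L% is not modelled.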

19.4.3 Singapore English

Singapore English (SgE) is a fully nativized L1 variety. Virtually all Singaporeans speak SgE by early school age, and it is the dominant home language for one third of the population (Leimgruber 2013). SgE has been influenced by contact with a wide range of language varieties, and even highly standardized uses of SgE differ markedly from MEVs in terms of prosody (Tay and Gupta 1983). SgE intonation has been described as a series of ‘rising melodies’, ending in a final rise-fall (Deterding 1994). The domain of rises is typically a single content word along with preceding function words (Chong 2013), though in certain syntactic contexts a rise may span two content words (Chong and German 2015, 2017). As Figure 19.3 illustrates, the utterance-initial rise has a large pitch excursion, while subsequent rises have smaller ranges (Deterding 1994; Low and Brown 2005). The repeating rises likely correspond to units of phrasing, as they reflect the primary domain of tune assignment and serve a grouping function. Chong (2013) proposes an AM model based on the accentual phrase (AP). The AP is marked at its left and right edges by phrasal L and H tones respectively, as well as by an optional L* pitch accent aligned with the primary stressed syllable of the content word. Chong’s phrasing analysis is further supported by the fact that these rises are associated with extra lengthening of the word-final syllable (Chong and German 2017). Chong’s (2013) model also includes two further levels of phrasing: the ip and IP. The ip is marked at the right edge by H- (IP-medially) or L- (IP-finally) and accounts for pitch reset in adjacent AP’s. The IP is marked at the right edge by L% for declaratives and by H% for polar interrogatives. F0 peaks are aligned close to the end of the word in non-final AP’s. Thus, if contrastive word-level prominence is present in SgE, its link to f0 differs from that in MEVs. SgE lacks marked f0 changes between stressed and unstressed syllables, as well as any step-down in f0 from stressed to post-stress syllables within a word (Low and Grabe 1999). The proposed stress-sensitivity of L* was not corroborated by Chong and German (2015), who found no effect of stress on the alignment of low or high targets in initial AP’s. Instead, the contour shape was similar across stress patterns, while words with initial stress were produced with a globally higher f0. The alignment of f0 peaks in IP non-initial positions needs systematic investigation.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

294 MARTINE GRICE, JAMES SNEED GERMAN, AND PAUL WARREN

Figure 19.3 Waveform, spectrogram, and f0 track for a sentence of read speech in Singapore English, ‘animals were digging in the rubbish’ (f0 axis 80–200 Hz; 0–1.816 s). (D’Imperio and German 2015: fig. 1b)
Wee (2008) and Ng (2009, 2011) propose that three level tones (L, M, H) are assigned to syllables at the phonological word level. H is assigned to all final syllables, L to initial unstressed syllables, and M elsewhere, either as the default tone or through spreading. Some syllables may end up toneless, in which case interpolation applies. Further quantitative studies are needed to establish which aspects of SgE contours are better accounted for by level tones assigned to syllables versus phrase-level tones. Existing research concentrates on ethnically Chinese speakers and read speech (though see Ng 2009). Tan (2002, 2010), however, describes tune differences across three ethnic groups of Singapore (Chinese, Malay, and Indian), which reflect characteristics of the respective ‘mother tongue’ languages. More research is needed to establish which aspects of SgE intonation are shared across communities and registers.
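The level-tone proposal of Wee (2008) and Ng (2009, 2011) can also be sketched as a word-level mapping. The Python below is a hypothetical illustration that rests on several assumptions. The stress coding and the function name are invented. M is treated as the default tone. Toneless syllables and interpolation are ignored. Finally, where a word consists of a single unstressed syllable, the final-H rule is assumed to take precedence over initial L.

```python
from typing import List


def sge_word_tones(stress: List[int]) -> List[str]:
    """Level tones for one SgE phonological word.

    stress: one code per syllable (1 or 2 = stressed, 0 = unstressed).
    """
    tones = ["M"] * len(stress)   # M as the default tone
    if stress[0] == 0:
        tones[0] = "L"            # an initial unstressed syllable gets L
    tones[-1] = "H"               # every word-final syllable gets H (applied last)
    return tones


print(sge_word_tones([0, 1, 0]))  # ['L', 'M', 'H']
print(sge_word_tones([1, 0, 0]))  # ['M', 'M', 'H']
```

Under these assumptions, each word surfaces as a rise toward a final H. This is one way such a system could yield the series of ‘rising melodies’ described by Deterding (1994). Whether level tones or phrase-level tones give the better account is exactly what the quantitative studies called for above would need to test.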

19.4.4 Indian English

Indian English (IE) is widely considered to be a fully nativized variety (e.g. Kachru 1983). Although there is a high degree of variation across speakers, IE intonation appears to conform to a single basic phonological organization similar to that of MEVs, while varietal differences concern mainly word-level stress, tonal inventories, or meaning. Word-level stress in IE is largely rule-governed rather than lexically specified, though the specific rules differ by L1 background, region, and individual (Wiltshire and Moon 2003; Fuchs and Maxwell 2015; Pandey 2015). For instance, stress assignment is quantity-sensitive for both Hindi and Punjabi speakers (Sethi 1980; Pandey 1985) but quantity-insensitive for Tamil speakers (Vijaykrishnan 1978). This poses a challenge for researchers, since it is difficult to anticipate stress patterns when designing experimental materials. Nevertheless, most varieties involve prominent pitch movements on stressed syllables, wherever the latter occur (Pickering and Wiltshire 2000; Wiltshire and Moon 2003; Puri 2013). Féry et al. (2016) analyse the narrow-focus patterns produced by speakers with a Hindi background in terms of phrasal tones. However, studies by Maxwell (see Maxwell 2014; Maxwell and Fletcher 2014) involving Bengali and Kannada speakers showed two things for rising contours. First, peak delay relative to syllable landmarks did not vary with the number of post-accentual syllables. Second, peak delay and L-to-H rise time were correlated with syllable duration. Since these results indicate peak alignment with stressed syllables and not with the right edge of the word, they disfavour a phrase tone (i.e. edge tone) analysis. Studies nevertheless show that IE has at least two levels of phrasing (ip and IP, as in MEVs), with differential final lengthening (Puri 2013; Maxwell 2014). Based on detailed timing data, Maxwell (2014) and Maxwell and Fletcher (2014) characterize the alignment characteristics of rising pitch accents in IE.
For all speakers, L is consistently anchored to the onset consonant of the stressed syllable. For Kannada speakers, H aligns to the end of the accented vowel if it is long and to the onset of the post-accented syllable if the accented vowel is short. For Bengali speakers, H aligns to the post-accentual vowel in nuclear focal accents and variably to the accentual or post-accentual vowel in prenuclear accents. These results suggest that Kannada speakers have a single rising accent, while Bengali speakers use both L+H* and L*+H for prenuclear accents. ToBI analysis of read and spontaneous speech shows evidence of other pitch accent categories, including H* and L*, as well as downstepped variants of pitch accents and phrase accents. A study by Wiltshire and Harnsberger (2006) similarly found broad differences between Gujarati speakers, who use mostly rising pitch accents (L+H* or L*+H), and Telugu speakers, who additionally produce falling accents (H+L*, H*+L, H*).
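For Kannada speakers, the alignment pattern reported by Maxwell (2014) and Maxwell and Fletcher (2014) amounts to a simple rule for predicting where the L and H targets of a rising accent fall. The following is a Python sketch that assumes segment boundaries (in seconds) are already available from a hand-corrected segmentation. The class and field names are hypothetical. The vowel-length flag stands in for whatever phonological length criterion an analysis adopts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AccentedSyllable:
    onset_start: float       # start of the onset consonant of the stressed syllable (s)
    vowel_end: float         # end of the accented vowel (s)
    vowel_is_long: bool      # is the accented vowel phonologically long?
    next_onset_start: float  # start of the post-accented syllable (s)


def kannada_rise_anchors(syl: AccentedSyllable) -> Tuple[float, float]:
    """Predicted (L, H) anchor times for a rising accent in Kannada-background IE."""
    l_anchor = syl.onset_start                 # L anchored to the stressed-syllable onset
    if syl.vowel_is_long:
        h_anchor = syl.vowel_end               # H at the end of a long accented vowel
    else:
        h_anchor = syl.next_onset_start        # H at the onset of the post-accented syllable
    return l_anchor, h_anchor


print(kannada_rise_anchors(AccentedSyllable(0.10, 0.32, True, 0.38)))   # (0.1, 0.32)
print(kannada_rise_anchors(AccentedSyllable(0.10, 0.22, False, 0.30)))  # (0.1, 0.3)
```

Comparing such predicted anchors with measured f0 turning points is one way to test the alignment claims on new data. The Bengali pattern, with variable alignment in prenuclear accents, would need a category-dependent rule instead.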

In general, focused words are realized with some combination of greater duration, amplitude, and pitch excursion on the accented syllable (Moon 2002; Wiltshire and Harnsberger 2006; Maxwell 2010; Maxwell 2014). Studies also report compression in the post-focal region (Féry et al. 2016) or alternation between deaccenting and post-focal compression without deaccenting (Maxwell 2014). Focus may also be marked by the insertion of a phrase boundary before or after the focused constituent (Maxwell 2010; Maxwell 2014; Féry et al. 2016). Many regional and L1 varieties of IE remain understudied; thus, more research is needed to corroborate the impression that IE varieties involve a similar phonological organization for intonation. Additionally, detailed phonetic evidence is needed to clarify many existing claims.

19.4.5 South Pacific Englishes (Niue, Fiji, and Norfolk Island)

The islands of the South Pacific are home to a range of contact varieties of English. Because of patterns of economic migration, many of these varieties have greater numbers of speakers outside the islands (e.g. Niuean English, which has more speakers in New Zealand than on Niue). In addition, the movement of indentured labourers from India to Fiji in the late nineteenth and early twentieth centuries has resulted in an Indo-Fijian English variety alongside Fijian English. Starting with Fiji, since English is a second or third language for nearly all Fiji Islanders, there is L1 transfer in suprasegmental as well as segmental features (Tent and Mugler 2008). Stress patterns often differ from those of Standard English (i.e. in this case SSBE), as in [ˈkɒnˌsidɐret] for considerate or [ˌɛˈmikabɐl] for amicable. There is also a tendency for the nuclear pitch accent to occur on the verb, even in unmarked sentences, such as I am STAYing in Samabula. The most marked intonational feature, however, is an overall higher pitch level than in MEVs, especially a pattern on yes/no questions that starts high and ends with a rapid rise and sudden fall in pitch. This pattern (see Figure 19.4) is much closer to Fijian than to Standard English, and sometimes results in misunderstandings, where Standard English listeners have the impression that the speaker expects a positive response. Turning to Niue, durational differences between stressed and unstressed syllables are weaker in Niuean English than in MEVs (Starks et al. 2007). This variety also exhibits the high rising terminal or uptalk intonation pattern found in NZE.

Figure 19.4 Intonation patterns for yes/no questions in Fijian and Standard English: ‘(Are) you ready to go?’ (Fiji English) and ‘Are you ready to go?’ (Standard English). (Tent and Mugler 2008: 249)

The status of Norfolk Island English (‘Norfuk’) as either a variety of English or a creole is uncertain (Ingram and Mühlhäusler 2008). Its intonation is characterized as having a wide pitch range with much (often exaggerated) variation in pitch and tempo.

19.4.6 East African Englishes (Kenya and Uganda)

The intonation of East African Englishes is understudied. Otundo’s (2017a, 2017b) important groundwork reveals considerable differences in the English spoken by different L1 groups in Kenya, which she refers to as ethnically marked varieties. For declarative questions, Nandi speakers produce a rising pitch accent followed by a boundary rise (L*+H H%), whereas Bukusu speakers produce a high pitch accent followed by a fall (H* L%). Statements predominantly have either a rise-fall (L*+H L%) or a fall (H* L%) in both groups, depending on the presence or absence of prenuclear material, respectively. wh-questions have a nuclear pitch accent on the wh-word, unlike MEVs, but they may have a terminal rise, in which case the nuclear accent is on a later word in the phrase. Nassenstein (2016), in an extensive overview of Ugandan English, gives no detail on intonation but does identify an interesting use of an intonational rise accompanied by additional vowel lengthening to mark assertive focus, as in (2).

(2) And he went ↑far up to there.
‘And he went far (further than I had imagined).’ (Nassenstein 2016: 400)

Both Kenyan English and Ugandan English show less durational reduction of unstressed syllables compared to most MEVs (Schmied 2006).

19.4.7 Caribbean English

This section must begin with a caveat: descriptions of Caribbean English do not always clarify whether they are describing Caribbean-accented varieties of English or English-based Caribbean creoles. A notable exception is Devonish and Harry (2008), who explicitly distinguish Jamaican Creole and Jamaican English, identifying the latter as an L2 acquired through formal education by speakers of Jamaican Creole. The two languages coexist, therefore, in a diglossic situation. The prosodic feature these authors highlight is the pattern of prominence on disyllabic words. For example, Jamaican Creole distinguishes between the kinship and priest meanings of faada (‘father’) by having a high tone on the first syllable⁴ for the first meaning but on the second syllable for the second meaning, also treated as a lexical stress contrast /ˈfaada/ vs. /faaˈda/ (Cassidy and LePage 1967/1980). In Jamaican English, these tonal patterns are maintained on /faaðo/, but the lexical stress is on the first syllable in both cases. Wells (1982: 572–574) highlights several other features of the prosody of West Indian English. These include a reduced durational distinction between stressed and unstressed syllables as compared to MEVs (see also Childs and Wolfram 2008). At the same time, however, pitch ranges tend to be wider, which compensates somewhat for the reduction in the stress contrast by increasing the difference between accented and unaccented syllables. Wells also observes a tendency to shift stress rightwards, particularly in emphasis, giving /kɪˈʧɪn/ for kitchen (noted also for Eastern Caribbean creoles by Aceto 2008). He suggests that this might in fact be a rise-fall nucleus associated with the initial syllable, in autosegmental terms L* H- L%. In this case, rather than stress being shifted, the second syllable may simply have pitch prominence from the H- phrase accent. A further intonational characteristic highlighted for Bahamian English (Wells 1982; Childs and Wolfram 2008) as well as Trinidadian and Tobagonian Creole (Youssef and James 2008) is the use of high rising contours with affirmative sentences. These rises appear to invite a response from the listener.

⁴ In other analyses of Caribbean creoles (e.g. Sutcliffe 2003), it is suggested that founder speakers of the creoles might have reinterpreted English word stress along the lines of the tonal distinctions of West African languages, and realized stressed syllables with a high tone.

19.4.8 Black South African English

Given that English is less common as a first language (L1) in South Africa (9.6%) than IsiZulu (22.7%) or IsiXhosa (16.0%) and is the L1 for just 2.9% of Black South Africans (Statistics South Africa 2011), it is unsurprising that Black South African English (BlSAfE) is heavily influenced by its speakers’ L1s. Swerts and Zerbian (2010) compared intonation in the English of intermediate and advanced speakers who had Zulu as their L1. Both groups used rising and falling intonation patterns, common to both Zulu and English, to mark the difference between non-final and final phrases respectively. However, only the advanced speakers patterned with L1 speakers of English in using intonation to mark focus (Zerbian 2015). Coetzee and Wissing (2007) report that, compared to WSAfE and Afrikaans English, BlSAfE (in this case Tswana English) has less of a distinction in duration between stressed and unstressed syllables, and furthermore does not show phrase-final lengthening. This supports similar general observations for BlSAfE by van Rooy (2004). He also notes that, again largely in line with the speakers’ L1, stress assignment is on the penultimate syllable (e.g. sevénty) unless the last syllable is superheavy. This author also observes ‘more frequent occurrence of pragmatic emphasis, leading to a different intonation structure of spoken BlSAfE’ (p. 178) and notes that IP’s tend to be shorter than in WSAfE (cf. Gennrich-de Lisle 1985).
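The BlSAfE stress rule reported by van Rooy (2004) can be stated in a few lines. In this hypothetical Python sketch, syllable weights are supplied as labels rather than computed from segments. Stress on a superheavy final syllable is an assumption: the text says only that the penultimate default does not apply in that case.

```python
from typing import List


def blsafe_stress_index(weights: List[str]) -> int:
    """Index of the stressed syllable; weights are 'light', 'heavy', or 'superheavy'."""
    if len(weights) == 1:
        return 0                      # monosyllables: stress the only syllable
    if weights[-1] == "superheavy":
        return len(weights) - 1       # assumed: a superheavy final syllable is stressed
    return len(weights) - 2           # default: penultimate stress


# se-ven-ty: penultimate stress, as in the text's example 'sevénty'
print(blsafe_stress_index(["light", "heavy", "light"]))  # 1
```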

19.4.9 Maltese English

Maltese English (MaltE), alongside Maltese and Maltese Sign Language, is an official language of Malta, with the great majority of its population being bilingual to various degrees. MaltE does not reduce unstressed syllables to the same extent as MEVs and does not have syllabic sonorants in unstressed syllables (Grech and Vella 2018; see also chapter 16). As in a number of other contact varieties, MaltE also differs from MEVs in stress assignment in compounds, such as fire ˈengine and wedding ˈpresent, with stress on the final rather than initial element, except in cases where the final element is monosyllabic, such as ˈfireman (Vella 1995). However, like Maltese, MaltE has regular pitch accents associated to lexically stressed syllables and tones associated with the right edge of IP’s. In wh-questions and some other modalities (e.g. vocatives and imperatives), tones can also be associated to the left edge of constituents (Vella 2003, 2012). The right-hand phrasal edge tone, a phrase accent in the sense of Grice et al. (2000), is particularly important in the tonal phonology of MaltE and is often associated with a lexically stressed syllable (Vella 1995, 2003), leading to a percept of strong post-focal prominence (Galea Cavallazzi 2004).

19.5 Uptalk

A frequently discussed feature of English intonation is uptalk, sometimes referred to as ‘high rising terminal’ intonation (but see discussion in Warren 2016 and references therein). This use of rising intonation at the end of a declarative utterance should not be confused with UNB rises, from which it is phonetically and functionally distinct. The term ‘antipodean rise’ reflects its possible provenance in Australia and/or New Zealand, where it is becoming an unmarked feature. It is, however, found in many English varieties, including those spoken in the United States, Canada, South Africa, and the United Kingdom (see e.g. Armstrong and Vanrell 2016; Arvaniti and Atkins 2016; Moritz 2016; Prechtel and Clopper 2016; Warren 2016; Wilhelm 2016). Because of its phonetic similarity to rises on yes/no and echo questions, uptalk is frequently interpreted by non-uptalkers as signalling that the speaker is questioning the content of their own utterances and is therefore uncertain or insecure. However, the distribution of uptalk and its interpretation by uptalkers indicate that it is used as an interactional device, to ensure the listener’s engagement in the conversation. (For further discussion of the meanings of uptalk see Tyler and Burdin 2016; Warren 2016.) In AM terms, uptalk has been labelled as follows:
- Canadian English: L* H-H%, L* L-H%, and H* L-H% (Di Gioacchino and Jessop 2011; Shokeir 2007, 2008);
- American English: L* L-H%, L* H-H%, and H* H-H% (Hirschberg and Ward 1995; McLemore 1991; Ritchart and Arvaniti 2013);
- AusE and NZE: L* H-H%, H* H-H%, and the longer sequence H* L* H-H% (Fletcher and Harrington 2001; Fletcher 2005; Fletcher et al. 2005; McGregor and Palethorpe 2008);
- British English: H* L-H% or H*+L H-H% (Bradford 1997).
This mixture of labels indicates that there is variation both within and between varieties in the shape of the uptalk rise. Many labels include a falling component before the rise, so that the rise starts from either a low pitch accent or a low phrase accent. Moreover, researchers have identified phonetic distinctions between uptalk and question rises. These include the relative lateness of the rise onset in uptalk (in NZE: Warren 2005; for WSAfE: Dorrington 2010a) and a lower rise onset in uptalk (especially in AusE: Fletcher and Harrington 2001); see Figures 19.6 and 19.5 respectively. The differences between uptalk and question rises can be perceived and interpreted appropriately by native speakers of the varieties (Fletcher and Loakes 2010; Warren 2005, 2014).

Figure 19.5 Fall-rise uptalk contour, Australian English: ‘It’s probably a little bit away from whispering pine’ (f0 axis 75–250 Hz; 0.04–2.49 s). (Warren 2016: fig. 4.1)

Figure 19.6 Late rise uptalk contour, New Zealand English: ‘and yer gonna go right round the bowling alley’ (f0 axis 100–400 Hz; 1.34–3.96 s). (Warren 2016: fig. 4.2)

19.6 Conclusion

Our survey has revealed a diverse set of intonational systems across varieties of English. For some varieties, especially mainstream ones, diversity is limited to relatively minor features in inventories or in correspondences between contours and meanings. However, differences in overall phonological organization are also observed. Alongside stress-accent varieties, there are varieties such as HKE, NigE, and GhanE that involve level tones assigned by both lexical specification and spreading rules. In terms of phrasing, MEVs and most other varieties involve a lowest level of phrasing that is relatively large. SgE, by contrast, patterns with languages such as French and Korean in having smaller units of phrasing (i.e. AP’s) that generally span only one or two content words. At the other end of the spectrum, HKE, NigE, and GhanE include only one large unit of phrasing (i.e. IP’s) that contributes to the tonal specification. MEVs raise issues concerning the linking of tones to either pitch accents or edge tones (see §19.3.1). In light of these issues, an important question for future research is whether the need for specific phrasing levels can be supported by evidence that is independent of the edge tones they are purported to carry (e.g. progressive final lengthening). All the English varieties we have covered include word stress in some form or another. Stress may be manifested as greater acoustic prominence, or it may merely designate syllables as prosodically privileged. In the latter case, privileged syllables are singled out to receive specific tonal values without concomitant acoustic or articulatory enhancement, and in most cases these tones are lexically specified. The high functional load of pitch accents in mainstream varieties has most probably led to the location of these pitch accents (lexical stress) being reinterpreted as tone. Across the different varieties, there is considerable variation in the assignment of word stress. It differs from MEVs for certain words (e.g. Fijian English, where secondary stress and primary stress are sometimes exchanged, and AAE, where there is a preference for initial stress) or word types (e.g. the compound rule is different in MaltE and Hispanic English). In Indian English, stress is rule based, with quantity sensitivity in some L1 groups (e.g. Hindi and Punjabi), but overall the rules are highly variable. In BlSAfE, stress also appears to be rule based, with penultimate stress in most cases.
Caution is required, however. Sometimes it might appear that word stress is shifted when in fact the pitch accent is low and is followed by an H- phrase accent (Caribbean English, SgE). A similar impression can arise when constituents in a compound are treated as separate word domains (NigE, GhanE). A similar issue applies to sentence-level prominence (and, by extension, word citation forms). In varieties such as NigE and HKE, a lack of postnuclear deaccenting, combined with rightward spreading of H tones, can give the impression that prominence is ‘shifted’ to a later word or syllable than in MEVs. It is therefore essential that the interplay of stress, accent, and tonal assignment be worked out separately for each variety and independently of what might give the impression of prominence in another variety. This diversity highlights the important role played by the choice of framework and analytical tools when approaching any variety, whether unstudied or well studied. It would, for example, be clearly problematic to apply the notion of a nuclear contour to HKE, NigE, or GhanE, since, in those varieties, the tune at the end of a phrase is fully determined by lexical word stress and the choice of boundary tone. Apart from the latter, in other words, there is no optionality that could result in meaningful contrasts (see Gussenhoven 2017b for further discussion). Additionally, we need look no further than MEVs to recognize that aspects of the prenuclear contour can be informative in terms of both structure and meaning. Thus, it is highly problematic to apply AM or British School categories developed for a well-studied variety to a variety whose phonological organization and intonational inventory have not yet been established. Doing so strongly biases the researcher towards an analysis in terms of pitch accents (or even nuclear pitch accents), thereby missing other typological possibilities.
Moreover, even if the variety in question does pattern with MEVs in having regular pitch accents, the use of a pre-existing inventory runs two risks: (i) treating non-contrastive differences as contrastive and (ii) lumping existing contrasts into a single category. This issue is underscored by the numerous studies on UNB and American English varieties (Arvaniti and Garding 2007; Clopper and Smiljanić 2011) as well as IE (Maxwell 2014; Maxwell and Fletcher 2014). In these cases, detailed phonetic studies have revealed inventory differences across varieties that would have been missed had the authors been restricted to any of the inventories available for MEVs. Besides the differences outlined above, our survey also revealed certain regularities. For example, the majority of non-MEV varieties either lack post-focal deaccenting or use it with less regularity. These include HKE, SgE, NigE, GhanE, BlSAfE, IE, MaltE, and AAE. This broad tendency suggests that post-focal deaccenting tends to resist transfer in contact situations. This may be because it is not compatible with certain types of phonological organization, or because it is not sufficiently salient to L2 speakers during nativization. It is also interesting to note the striking similarity between the intonation systems of HKE and West African varieties, which emerged from contact with unrelated tone languages. As Gussenhoven (2017b: 22) notes, this can be ‘explained by the way the pitch patterns of [British English] words were translated into tonal representations’. Some accounts suggest that a similar generalization applies to SgE. However, that variety has received substantial influence from intonation languages, including Malay, Indic, and Dravidian languages, and even IE (Gupta 1998), which could explain why it patterns less closely with the above ‘tonal’ varieties. In this chapter we have only been able to report on a selection of contact varieties of English. The typological diversity we have observed will no doubt be enriched once we take up the challenge of analysing newly emerging (extended circle) varieties, such as those spoken in China and Korea.


chapter 20

The North Atlantic and the Arctic

Kristján Árnason, Anja Arnhold, Ailbhe Ní Chasaide, Nicole Dehé, Amelie Dorn, and Osahito Miyaoka

20.1 Introduction

The languages described in this chapter belong to three families: Celtic, North Germanic, and Eskimo-Aleut. The grouping into one chapter is purely geographical, and the different groups have very different histories and little in common structurally. There is also little evidence of Sprachbund or contact phenomena between these groups, although pre-aspiration and some tonal features have been noted as common to some Scandinavian varieties and Celtic.

20.2 Celtic

Irish and Scottish Gaelic are the indigenous languages of Ireland and Scotland, belonging to the Goidelic branch of Celtic together with Manx, while the Brittonic branch of Celtic comprises Breton, Welsh, and Cornish. Irish is, with English, one of the two official languages in the Republic of Ireland. Today, it is mostly spoken as a community language in pockets (Gaeltachtaí) that mainly stretch along the western Irish coast. There is no spoken standard, but there is standardization of written forms. The three main Irish dialects, Ulster, Connaught, and Munster Irish, differ at the phonological level, both segmental and prosodic, besides differences at the morphological, lexical, and syntactic levels. The strongest Scottish Gaelic-speaking areas today are also concentrated along the northwestern coastal areas and islands of Scotland. Welsh-speaking areas, on the other hand, extend across wider parts of Wales, with higher numbers of speakers particularly in the north and west. Breton is spoken in very limited areas, mainly in the western half of Brittany, France.


304 kristján árnason et al.

20.2.1 Irish and Scottish Gaelic

Irish is a verb-subject-object language with reduced vowels in unstressed syllables, a contrast between long and short vowels, and frequent consonant clusters, features that have been associated with ‘stress-timed’ languages (cf. Ó Siadhail 1999). Primary stress generally falls on the first syllable of words and, with few exceptions, this is also true for Scottish Gaelic (cf. Clement 1984; Bosch 2010). In Munster Irish, primary stress shifts to the second or third syllable of disyllabic and trisyllabic words if this is the first heavy syllable. A heavy syllable is one containing a long vowel or diphthong (cf. Blankenhorn 1981; Ó Sé 1989, 2019; Ó Siadhail 1999) or the rhyme /ax/, as in /baˈkax/ ‘lame’. Stress in Welsh (cf. Williams 1983; Hannahs 2013) and Breton (cf. Jackson 1967; Ternes 1992), on the other hand, is traditionally placed on the penultimate syllable in polysyllabic words, with exceptions confined to loanwords or dialectal variation. Syllable structure in the Irish dialects, described as similar to English, is complex (Carnie 1994; Green 1997; Ní Chiosáin 1999) and open to different interpretations by linguists (cf. de Bhaldraithe 1945/1966; de Búrca 1958/1970; Mhac an Fhailigh 1968; Ní Chiosáin et al. 2012). For Scottish Gaelic, a close link between syllabicity and pitch has been suggested (cf. Borgstrøm 1940; Oftedal 1956; Ternes 2006). Studies on Scottish Gaelic intonation have commented on its use of lexical tone or ‘word accent’ (Borgstrøm 1940; MacAulay 1979), a feature not observed in Irish and sometimes posited as a Viking influence. Phonetic studies have also addressed the different realizations of lexical tone (cf. Bosch and de Jong 1997; Ladefoged et al. 1998). In Welsh, on the other hand, stress and pitch prominence have been noted to occur independently of one another (cf. Thomas 1967; Oftedal 1969; Williams 1983, 1985; Bosch 1996). There is a long tradition of segmental description of Irish dialects (e.g. Quiggin 1906; Sommerfelt 1922; de Bhaldraithe 1945/1966; Breatnach 1947; de Búrca 1958/1970; Ó Cuív 1968; Ní Chasaide 1999; Ó Sé 2019). The most striking feature at the phonological level, which Irish shares with Scottish Gaelic, is the systematic contrast of palatalized and velarized consonants. Here the term ‘velarization’ covers secondary articulations in the velar, uvular, or upper pharyngeal regions. The palatalization or velarization of the consonant tends to give rise to diphthongal on- or offglides to or from an adjacent heterorganic vowel (i.e. when a palatalized consonant occurs with a back vowel or a velarized consonant with a front vowel). Thus, for example, the phoneme /iː/ may be realized as [iː] in bí /bʲiː/ ‘be’; [iˠi] in buí /bˠiː/ ‘yellow’; [iiˠ] in aol /iːlˠ/ ‘limestone’; [ˠiiˠ] in baol /bˠiːlˠ/ ‘danger’. This phonological contrast of palatalized–velarized pairs of consonants plays an important role in signalling grammatical information, for example in marking the genitive case in óil /oːlʲ/ as compared to the nominative case ól /oːlˠ/ ‘drink’. In Scottish Gaelic, the realization of the stop voicing contrast entails pre-aspiration of the voiceless series (and post-aspiration in pre-vocalic contexts). There is little or no voicing of the voiced series. The extent of pre-aspiration varies across dialects, and similar realizations are attested for Irish (Shuken 1980; Ní Chasaide 1985).
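The Munster Irish pattern described above can be sketched as a search over the first three syllables. By default stress is initial, unless the second or third syllable is the first heavy one. This Python is an illustrative simplification, not a full account of Munster stress. Syllable weight is passed in as one boolean per syllable (heavy = long vowel, diphthong, or /ax/ rhyme), and the function name is hypothetical.

```python
from typing import List


def munster_stress_index(heavy: List[bool]) -> int:
    """Primary stress index in Munster Irish (simplified).

    heavy[i] is True if syllable i contains a long vowel or diphthong,
    or has the rhyme /ax/.
    """
    for i, is_heavy in enumerate(heavy[:3]):
        if is_heavy:
            return i   # the first heavy syllable among the first three is stressed
    return 0           # no heavy syllable there: default initial stress


print(munster_stress_index([False, True]))  # 1, as in /baˈkax/ 'lame'
```

For the other Irish dialects, and largely for Scottish Gaelic, the corresponding function would simply return 0 (initial stress).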

20.2.2 Intonation
Until recently, relatively little was known about Irish intonation. Broad-ranging instrumental analyses of the northern and southern Irish dialects have been carried out in the Prosody of Irish Dialects project (Dalton and Ní Chasaide 2003, 2005a, 2005b, 2007a, 2007b; Ní Chasaide 2003–2006; Ní Chasaide and Dalton 2006; Dalton 2008) and beyond (Dorn 2014; O’Reilly 2014).

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

THE NORTH ATLANTIC AND THE ARCTIC 305

These studies reveal a major divide between the northern dialects of Donegal on the one hand and the southern dialects of Connemara, the Aran Islands, Kerry, and Mayo on the other. The southern dialects have falling tonal contours for both declaratives and questions. The northern dialects are atypical in that the default tune in neutral declaratives is a rising contour, which is also the contour used in wh- and polar questions (cf. Dalton and Ní Chasaide 2003; Dalton 2008; Dorn et al. 2011). Further accounts of Connemara Irish are found in Blankenhorn (1982) and Bondaruk (2004). Irish has identifiable pitch accents, both monotonal and bitonal, and boundary tones. In terms of alignment with the segmental string, the main tonal target typically falls on the lexically stressed syllable. As mentioned above, a north–south difference emerges. For the northern varieties (of Donegal), neutral declaratives are typically characterized by sequences of (L*+H L*+H L*+H 0%), whereas for the southern dialects investigated so far (Connemara, Kerry, and Mayo), they most typically consist of a sequence of falls in all positions in the IP (H*+L H*+L H*+L 0%). Boundary tones can be high (H%), low (L%), or level (%, aka 0%, Ø). (For more detailed descriptions, see Dalton and Ní Chasaide 2003; Dalton 2008; Dorn 2014; O’Reilly 2014.) For Scottish Gaelic, Welsh, and Breton, different accounts of intonation exist. MacAulay (1979) describes Scottish Gaelic intonation (Bernera dialect, Lewis) as generally having phrase-final falls, with phrase-final rises in questions, but without considering the interaction of lexical and intonational tones. This interaction was studied more recently by Nance (2013), who also looked at features of language change in young Gaelic speakers (Nance 2013, 2015).
Ternes (1992) describes Breton intonation as having rising polar (sentence) questions and falling declaratives, and several studies have addressed Welsh intonation using different transcription methods (cf. Thomas 1967; Pilch 1975; Rees 1977; Rhys 1984; Evans 1997). Detailed instrumental analyses of Welsh intonation (Anglesey variety) were carried out by Cooper (2015), who addressed both phonological and phonetic issues, concentrating on the alignment and scaling of tonal targets and on the intonational encoding of interrogativity. In Irish, interrogativity is marked prosodically as well as syntactically. Although questions have the same tonal sequence as neutral declaratives, they involve phonetic adjustments, specifically to the relative pitch of the initial or final accent of the utterance, regardless of tune. In Donegal Irish, the prenuclear accent in wh-questions is boosted by raising the pitch accent targets relative to statements and polar questions. For polar questions, the principal marker is a similar boosting of the nuclear pitch accent compared to statements. Thus, relative to statements, wh-questions can overall be characterized by downdrift (i.e. a falling f0 slope), while polar questions more typically show a tendency towards upsweep (i.e. a rising f0 slope). In South Connaught Irish (Cois Fharraige and Inis Mór), polar questions typically have a higher pitch peak in the initial prenuclear accent relative to statements, as well as a raised pitch level (cf. Dorn et al. 2011; O’Reilly and Ní Chasaide 2015). Cross-dialect differences in peak alignment are found in the Irish dialects (cf. Dalton and Ní Chasaide 2005a, 2005b, 2007a; Ní Chasaide and Dalton 2006; O’Reilly and Ní Chasaide 2012).
Although Donegal (Gaoth Dobhair) Irish and the Connaught dialect of Cois Fharraige differ greatly in their tunes, both tend towards fixed alignment: in prenuclear and nuclear accents alike, the tonal target tends to be anchored to a particular point in the accented syllable. However, a comparison of peak timing in two southern varieties of Connaught Irish (Cois Fharraige and Inis Oírr) reveals striking differences.

306 kristján árnason et al.

As mentioned, Cois Fharraige has fixed alignment, but Inis Oírr exhibits variable peak alignment: the peak of the first prenuclear accent is timed earlier if the anacrusis (the number of unaccented syllables preceding the initial prenuclear accent) is longer, while the peak of the nuclear accent is later when the tail (the postnuclear stretch) is longer. These alignment differences are all the more striking as they occur within what is considered to be a single dialect (Dalton and Ní Chasaide 2005a, 2007a). Focus in Irish can be realized by prosodic prominence as well as by syntactic clefting and by emphatic forms of, for example, pronouns (the emphatic form of the pronoun ‘me’ is mise [mʲɪʃə], versus non-emphatic mé [mʲeː], typically reduced to [mʲə]). Prosodic prominence involves highlighting the focal element by raising the f0 peak and extending the duration of the accented syllable, as well as deaccenting post-focal material and possibly reducing the scaling of pre-focal accents (cf. O’Reilly et al. 2010; O’Reilly and Ní Chasaide 2016). The same pitch accent type is used across semantically different focus types (narrow, contrastive) in Donegal Irish (cf. Dorn and Ní Chasaide 2011).

20.3 Insular Scandinavian
Icelandic and Faroese form the western branch of the Nordic family of languages, sometimes referred to as Insular Scandinavian as opposed to the Continental group of Norwegian, Danish, and Swedish. They inherit initial word stress from Old Norse and have a similar quantity structure, with stress-to-weight and a distinction between long (open-syllable) and short (closed-syllable) vowels, the result of a quantity shift that occurred in both languages. Stress may be realized by tonal accents and lengthening of long vowels and, in closed syllables, by lengthening of post-vocalic consonants, producing in Icelandic the so-called half length (Árnason 2011: 149–151, 189–195). An important difference is that Faroese has a set of segmentally restricted stressless syllables, which draw on a reduced vowel system, whereas in Icelandic all syllables are phonotactically equal. The modern languages do not have a tonal distinction of the kind found in Swedish and Norwegian, although older Icelandic shows rudimentary signs of such distinctions (Árnason and Þorgeirsson 2017). Both languages have pre-aspiration of post-vocalic fortis stops, correlating with stress (Árnason 2011: 216–233), but there are differences in realization and distribution (Hansson 2003), with Icelandic pre-aspiration being more ‘segment-like’, as if it received half length after short vowels and closed the syllable as a coda.

20.3.1 Stress in words and phrases
Icelandic has regular initial stress with a rhythmically motivated secondary stress on odd-numbered syllables: forusta [ˈfɔːrʏsˌta] ‘leadership’. Morphological structure in compounds and derived words can affect the rhythmic stress contour, as stress on second components may be retained as secondary stress, disregarding alternating stress: höfðingja#vald [ˈhœvðiɲcaˌvalt] ‘aristocracy’ (literally: ‘chieftains-gen#power’). However, adjacent stresses tend to be avoided, thus shifting the morphologically predicted secondary stress to the right in words such as borð#plata [ˈpɔrðplaˌta] ‘table top’ (literally: ‘table plate’). Normally, a prefix takes initial stress in Icelandic, but there are principled exceptions: Hann er hálf-leiðinlegur [haulvˈleiːðɪnlɛɣʏr̥] ‘He is rather boring’ (literally: ‘half-boring’) (Árnason 1987). The unstressed prefixes have a special modal function and, although morphologically bound, they do not form phonological words with their anchors. Some loanwords and foreign names show arhythmic word-stress patterns: karbórator [ˈkʰarˑpouraˌtʰɔˑr̥] ‘carburettor’, suggesting karbóra- and -tor as separate pseudo-morphs; the plosive in -tor is aspirated in southern Icelandic, indicating that it is foot-initial. A majority of native Faroese words have their main stress on the first syllable: tómur [ˈtʰɔuːmʊɹ] ‘empty’, hestarnir [ˈhɛstanɩɹ] ‘the horses’, onga#staðni [ˈɔŋkanˌstɛanɩ] ‘nowhere’ (literally: ‘no place’) (Lockwood 1955/1977: 8). Secondary stress can be placed on the fourth syllable of a compound, as in tosingarlag [ˈtɔːsiŋkarˌlɛa] ‘mode of speaking’. Adjacent stresses occur, as in ˈtil#ˌbiðja ‘worship’ (literally: ‘to pray’). According to Dehé and Wetterlin (2013), the most prominent phonetic parameters related to Faroese secondary stress are vowel duration and voice onset time. Faroese prefixes such as aftur- ‘again’ commonly take initial stress, as in ˈafturtøka ‘repetition’, but words of three or more syllables with prefixes such as ó- ‘un-’ or ser- ‘apart’ regularly take stress on the second morphological constituent: ser#stakliga(ni) [sɛɹˈstɛaːklijanɩ] ‘especially’. Compound prepositions and adverbs commonly have stress on the second part: afturum [aʰtəˈɹʊmː] ‘behind’ (literally: ‘after about’). The corresponding prepositions take initial stress in most varieties of Icelandic: ˈaftanvið ‘behind’.
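The interplay of rhythmic and morphological secondary stress in Icelandic can be pictured as a two-pass procedure. The Python sketch below is a simplification, not a model taken from the cited work: the caller supplies the syllable count and the 0-based indices where later compound members begin (a hypothetical input format); a morphological stress that would clash with a preceding stress moves one syllable to the right, as in borð#plata, and rhythmic secondary stress on odd-numbered syllables is added only where it creates no clash. Under these assumptions it reproduces forusta, höfðingja#vald, and borð#plata; unstressed modal prefixes and loanword patterns are not modelled.

```python
from typing import Iterable, List


def icelandic_stress(n: int, component_starts: Iterable[int] = ()) -> List[str]:
    """Label syllables 'primary', 'secondary' or 'unstressed'."""
    stressed = {0}  # regular initial primary stress
    # Pass 1: retain stress on later compound members as secondary stress,
    # shifting it one syllable right if it would be adjacent to a stress.
    for start in sorted(component_starts):
        pos = start + 1 if start - 1 in stressed else start
        if 0 < pos < n:
            stressed.add(pos)
    # Pass 2: rhythmic secondary stress on odd-numbered syllables
    # (even 0-based indices), skipped where it would create a clash.
    for i in range(2, n, 2):
        if i - 1 not in stressed and i + 1 not in stressed:
            stressed.add(i)
    return ["primary" if i == 0 else "secondary" if i in stressed else "unstressed"
            for i in range(n)]


print(icelandic_stress(3))       # forusta: primary, unstressed, secondary
print(icelandic_stress(4, [3]))  # höfðingja#vald: secondary on vald only
print(icelandic_stress(3, [1]))  # borð#plata: stress shifted to -ta
```

Ordering the passes this way lets morphological stress take precedence over rhythm, which is one way of reading "disregarding alternating stress" above.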
The stress pattern of many native Faroese compounds also seems to vacillate: burðar#vektir ‘birth weights (of infants)’, which in a careful style can have the main stress either on the first or the second component: [ˈpuɹaˌvɛktɩɹ] or [ˌpuɹaˈvɛktɩɹ]. Forms such as ítrótt [ˈʊiːtrɔʰt] ‘sports’, bláloft [ˈplɔɑːlɔft] ‘(blue) sky’ have two nonrestricted syllables, of which the first takes the word stress and the second is weak accordingly. Rhythmic effects seem to be common in Faroese, so that words such as ˈvið#víkjˌandi ‘concerning’ (literally: ‘to#applying’), benkinun [ˈpɔiɲʧɩˌnʊn] ‘the bench’ show alternation. Restricted syllables such as the last one in benkinun, which take the rhythmic type of secondary stress, have been classified as leves, whereas the fully weak or reduced ones are levissimi (Hagström 1967). Forms such as tíðliga [tʰʊiʎ.ja] ‘early’, where the vowel of the second syllable remains unpronounced, show that alternating rhythm is a traditional feature of the phonological system. Many Faroese loanwords have non-initial stress: signal [sɩkˈnaːl] ‘signal’, radiatorur [ˌɹaˑtiaˈtʰoːɹʊɹ] / [ˌɹaˑtɩˈaːtoɹʊɹ] ‘radiator’. Icelandic also shows non-initial stress in loans: Þetta er dálítið extreme [ɛxsˈtriːm] ‘This is a bit extreme’, experimental notkun [ˌɛxspɛrɪˈmɛn̥talˈnɔtkʏn] ‘experimental use’. The final stress of extreme is likely to be moved to the first syllable in cases such as Þetta er extreme dæmi [ˈexstrimˈtaimɪ] ‘This is an extreme example’, avoiding a stress clash. In ‘emphatic re-phrasing’ (Árnason 2009: 289), words may be uttered as phrases, placing the accent on non-initial syllables, as in Hann er hrika-LEGUR [ˌr̥ɪˑkaˈleɛːɣʏr̥] ‘He is really terrible’ (Árnason 2011: 151). 
The normal pattern of phrasal accentuation in both languages is right-strong: Nanna borðar HAFRAGRAUT [ˌnanˑaˌpɔrðarˈhavraˌkrœyˑt] ‘Nanna eats oatmeal’, Dávur spælar FÓTBOLT [ˌtɔaːvʊɹˌspɛalaɹˈfɔuː(t)pɔl̥t] ‘David plays football’. However, two types of exception to the unmarked pattern have been noted in Icelandic, one systematic and the other pragmatic (Árnason 2009). Some word classes, such as nouns, are ‘stronger’ than others, such as verbs, and the weaker class may for that reason reject the phrasal accent in wide focus, as in Ég sá JÓN koma ‘I saw JOHN coming’ (noun > verb). Placing the accent on the verb koma would put the focus on the verb. A strength hierarchy of noun > verb > preposition > personal pronoun has been proposed (cf. Árnason 1994–1995, 1998, 2005: 446–447). Definite noun phrases are normally left-strong under broad focus: Ég gaf Jóni [GAMLA hestinn] ‘I gave John the old horse’, as against Ég gaf Jóni [gamlan HEST] ‘I gave John an old horse’, although semantically heavy nouns can retain their accent, as in Þarna er gamla PÓSTHÚSIÐ ‘There is the old post office’ (Árnason 2005: 453–454; Ladd 2008b: 242). Faroese forms such as til tað [ˈtʰɪlta] ‘to that’, hjá honum [ˈʧɔːnʊn] ‘with him’, hjá mær [ˈʧɔmːɛaɹ] ‘with me’, where the pronouns tað ‘it, that’, honum ‘him’, and mær ‘me’ have been cliticized so as to form phonological words with the prepositions, must have their origin in phrases in which prepositions were stronger than pronouns. By contrast, compound prepositions have their main stress on the prepositional stem: afturum [aʰtəˈɹʊmː] ‘behind’ (literally: ‘after about’). As in Icelandic, ‘re-phrasing’ may turn parts of words into phrases. Faroese compounds such as Skálafjörður ‘a place name’ (literally: ‘hall-gen#fjord’) can be split up in this way, and individual syllables, as in í-TRÓTT-sögu [ʊiːˈtrɔʰtˌsœɵ] ‘sports history’, can also take contrastive stress for emphasis or as a feature of a clear speaking style (Árnason 2011: 292). Segmental processes may serve as cues to phrasing. Glottal onsets commonly occur before vowel-initial stressed syllables in both languages: Jón kemur ALDREI [jouɲcɛmʏrˈʔaltrei] (Icelandic) ‘John NEVER comes’ (literally: ‘John comes never’), Okkurt um árið 1908 [ˈʔɔoʰkʊɹ̥tumˈʔɔɑːɹəˈnʊiːʧɔntrʊˈʔɔʰta] (Faroese) ‘Around the year 1908’. Final devoicing is a clear signal of a pause or the end of an utterance in Icelandic: Jón fór [fouːr̥] ‘John went’ (Helgason 1993; Dehé 2014).
Dehé’s (2014) results show that devoicing is obligatory at the ends of utterances and optional within utterances, with its frequency reflecting the rank of the prosodic boundary. A common phenomenon in Faroese, often co-occurring with final devoicing, is the deletion or truncation of vowels in utterance-final syllables: eftir ‘after’ [ɛʰtɹ̥] (instead of, say, [ɛʰtɩɹ] or [ɛʰtəɹ]); veit ikki ‘don’t know’ [vaiːʧʰ], instead of [vaiːʧɩ] or [vaiːtɩʧɩ]. In Icelandic, phrasal cohesion is shown by deletion of the final vowel of a stronger constituent before a weaker one beginning with a vowel: Nonn(i) ætlar að far(a) á fund ‘Jonny is going to a meeting’ (Dehé 2008; Árnason 2011: 295, 299).
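The word-class strength hierarchy and the cohesion-based vowel deletion described above can both be expressed as comparisons over a ranking. The Python sketch below is illustrative only: the numeric values encoding noun > verb > preposition > personal pronoun are arbitrary, the extra ‘particle’ class for the infinitival marker að is an assumption of this example, and real deletion is variable and sensitive to phrasing and focus, whereas the sketch applies it deterministically. Accent placement picks the rightmost word of the strongest class, combining the right-strong default with the hierarchy; left-strong definite noun phrases are ignored.

```python
from typing import List, Tuple

# Hypothetical numeric ranks for the proposed hierarchy
# noun > verb > preposition > personal pronoun; 'particle' is an assumption.
STRENGTH = {"noun": 4, "verb": 3, "preposition": 2, "pronoun": 1, "particle": 0}
VOWELS = set("aáeéiíoóuúyýæö")

Word = Tuple[str, str]  # (orthographic form, word class)


def wide_focus_accent(words: List[Word]) -> int:
    """Index of the word taking the phrasal accent under wide focus."""
    best = max(STRENGTH[cls] for _, cls in words)
    # Right-strong default: the rightmost word of the strongest class wins.
    return max(i for i, (_, cls) in enumerate(words) if STRENGTH[cls] == best)


def delete_final_vowels(words: List[Word]) -> str:
    """Drop a word-final vowel before a weaker, vowel-initial word."""
    out = []
    for i, (form, cls) in enumerate(words):
        if i + 1 < len(words):
            nxt_form, nxt_cls = words[i + 1]
            if (form[-1].lower() in VOWELS and nxt_form[0].lower() in VOWELS
                    and STRENGTH[cls] > STRENGTH[nxt_cls]):
                form = form[:-1]
        out.append(form)
    return " ".join(out)


saw_john = [("Ég", "pronoun"), ("sá", "verb"), ("Jón", "noun"), ("koma", "verb")]
print(saw_john[wide_focus_accent(saw_john)][0])  # Jón

nonni = [("Nonni", "noun"), ("ætlar", "verb"), ("að", "particle"),
         ("fara", "verb"), ("á", "preposition"), ("fund", "noun")]
print(delete_final_vowels(nonni))  # Nonn ætlar að far á fund
```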

20.3.2 Intonation
The intonation of Icelandic has been analysed in terms of prenuclear and nuclear accents and boundary tones, as well as typical nuclear contours (Árnason 1998, 2005, 2011; Dehé 2009, 2010, 2018). Boundary tones are low (L%) or high (H%), with L% marking finality and H% indicating some sort of non-finality or special connotation. Dehé (2009, 2010) suggests that there are two phrase accents (L- and H-). Pitch accents are either monotonal, high (H*) or low (L*), or bitonal, rising (L*+H: late rise; L+H*: early rise) or falling (H*+L). Based on tonal alignment data, however, it has been debated whether all bitonal pitch accents are in fact rising, with the perceived fall at the end of an accented phrase being due to a low edge tone rather than to a trailing tone of the pitch accent (Dehé 2010). A typical contour in neutral declaratives has one or more rising (L*+H) prenuclear accents and a (monotonal or bitonal) H* nuclear accent, terminated by L%. The default melody in all Icelandic utterance types is a fall to L%, with a downtrend within the intonational phrase. This includes polar questions (Árnason 2005, 2011; Dehé 2018; Dehé and Braun 2020) and other-initiated repair initials such as Ha ‘huh’ and Hvað segirðu ‘What did you say’ (Dehé 2015), which are typically rising in related languages. According to Árnason (2011: 323), questions with rising intonation ‘have special connotations’ (e.g. impatience), while questions with falling intonation are neutral. Árnason (2011: 322–323) also suggests that statements and polar questions may differ in the type of nuclear accent, an early rise (L+)H* in statements versus a late rise L*+H in polar questions, while wh-questions typically have H* (Árnason 2005: 467–477); rhetorical polar and wh-questions mostly have L+H* nuclear accents (Dehé and Braun 2020). Other intonational differences may lie in the overall downtrend (Dehé 2009), the prenuclear region, and the overall pitch level, but all of this is a matter for current and future research. Systematic study of Faroese intonation has been even more limited than that of Icelandic, but scholars have observed more rising patterns (H%) than in Icelandic, especially in polar questions. There are also some anecdotal descriptions of intonational peculiarities of Faroese varieties in places such as Suðuroy and Vágar (see Árnason 2011: 324–326). In both Icelandic and Faroese, focus is marked by pitch accents, especially high tonal targets, which are then exempted from the overall downtrend (for Icelandic see Árnason 1998, 2009; Dehé 2006, 2010). Focus may also weaken the prosodic boundary at the right edge of the focused constituent (Dehé 2008), favouring the application of final vowel deletion. The likelihood of deletion of the final vowel of Jónína in Jónína ætlar að baka köku ‘Jónína is going to bake a cake’ increases when Jónína is focused: the subject then forms one constituent with the verb, allowing final vowel deletion to apply within that constituent.
An interesting aspect of Icelandic intonation is that given information may resist deaccentuation (Nolan and Jónsdóttir 2001; see also Dehé 2009: 19–21).

20.4 Eskimo-Aleut
The Eskimo-Aleut language family consists of two branches, Eskimo (Inuit and Yupik) and Aleut. All languages in the family share central prosodic characteristics (e.g. phonemic vowel length) but differ in others (e.g. stress). The now extinct Sirenikski, originally classified as a Yupik language but now viewed as a separate branch of Eskimo (Krauss 1985a; Vakhtin 1991, 1998), deviated by having an alternating stress pattern but no clear vowel length contrast, as well as showing vowel reduction in non-initial syllables (Krauss 1985c).

20.4.1 Inuit
Within the Inuit dialect continuum, stretching from Alaska to Eastern Greenland, Kalaallisut (West Greenlandic), the official language of Greenland with about 45,000 speakers (Fortescue 2004), has received the most attention. While both vowel and consonant length are phonemically contrastive in Inuit (for phonetic studies, see Mase and Rischel 1971; Massenet 1986; Nagano-Madsen 1988, 1992), there is no lexical specification of prominence or tone. Rischel (1974) was the first to suggest that the notion of stress is not useful in analysing Kalaallisut, pointing out that neither native speakers nor trained phoneticians could agree on stress patterns. Fortescue (1984) agrees that there is neither contrastive nor demarcative stress, although an impression of prominence can arise when intonational pitch movements coincide with a heavy syllable. An experimental phonetic study by Jacobsen (2000) confirmed that duration and pitch are not correlated and that there is thus no evidence for stress in Kalaallisut. On the basis of auditory analysis, Fortescue (1983) states that stress is generally not part of the Inuit prosodic system. Massenet’s (1980) acoustic study supports this conclusion for speakers from Inukjuak, Quebec, living in Qausuittuq, Nunavut. Similarly, Pigott (2012) analysed f0, duration, and intensity and found no acoustic evidence for stress in Labrador Inuttut, nor did Arnhold et al. (in press) for South Baffin Island Inuktitut. A possible exception is Seward Peninsula Inupiaq, which, according to Kaplan (1985, 2009), has adopted a system of consonant gradation in alternating syllables through contact with Yupik. However, Kaplan states that consonant gradation, unlike in Yupik, is independent of stress, which appears on all non-final closed syllables and on all long vowels (for discussions of similar adjustments to syllable structure in other Inuit varieties, see Rischel 1974; Rose et al. 2012; Arnhold et al., in press). Instead of pitch accents, tonal targets seem to be associated with prosodic domains in the varieties that have been studied.
Research into prosodic domains and phrasing is still somewhat preliminary, and no more than two tonally marked domains have been suggested, although further levels may be relevant for pitch scaling; for example, Nagano-Madsen and Bredvad-Jensen’s (1995) study of phrasing and reset in Kalaallisut text reading showed some correspondence between syntactic and prosodic units, but also considerable variation between speakers. As Inuit is polysynthetic, many words are complex and correspond to whole phrases when translated into languages such as English. Kalaallisut words, when uttered in isolation, consistently bear a final rising-falling-rising tonal movement (i.e. HLH tones associated with the last three vowel morae) (Mase 1973; Rischel 1974). In declarative utterances consisting of more than one word, words can have a final HLH, HL, or LH contour, as well as flat pitch, though HL and HLH are by far most frequent (Rischel 1974; Fortescue 1984; Nagano-Madsen 1993; Arnhold 2014a). Based on Rischel’s (1974) distinction between phrase-internal and phrase-final contours, Nagano-Madsen (1993) suggested a decomposition of the HLH melody into HL, which is associated with the word level, and the final H, associated with the phrase level. However, Arnhold (2007, 2014a) analysed all three tonal targets as associated with the word level to account for the high frequency of HLH realizations that did not coincide with phrase boundary markers, such as pauses and following f0 resets, while such markers may occur after words with HL contours. Both accounts agree that the final H tone in a larger unit, identified as the intonational phrase here, is often lowered, and words in utterance-final position are frequently reduced in intensity, devoiced, or ‘clipped’ by omitting the final segments (see also Rischel 1974; Fortescue 1984) (cf. Aleut in §20.4.3, and final devoicing and truncation in Insular Scandinavian in §20.3.1). 
Arnhold (2014a) additionally proposes an L% boundary tone associated with the intonational phrase to account for the marking of sentence type. Whereas imperatives, exclamatives, and wh-questions end in high, though sometimes lowered, pitch like declaratives, polar interrogatives consistently end in low pitch (Rischel 1974; Fortescue 1984; Arnhold 2014a). This is true of the central dialect spoken in the Greenlandic capital Nuuk and to the south of it. In northern West Greenlandic, polar questions have the same HLH contour as declaratives, but the last vowel of the word is lengthened, so that the tones, which are still associated with the last three morae, are ‘shifted’ one mora to the right of where they are in a declarative (Rischel 1974; Fortescue 1984). Fortescue (1983) describes three major differences in the intonation of statements and polar questions across Inuit varieties. First, while the mora is the tone-bearing unit in eastern and western Greenland, as well as Labrador and Arctic Quebec, he observed syllable-based patterns for the rest of Canada, Alaska, and northern Greenland (on North Greenlandic, see also Jacobsen 1991). Second, he finds that declaratives have a fall on the final, penultimate, or antepenultimate syllable or mora (followed by a rise in some varieties, such as Kalaallisut). Third, several eastern and western varieties have falling pitch in interrogatives, as in central Kalaallisut, while others have a final pitch rise, with or without lengthening of the last syllable. Acoustic studies of intonation have been conducted for three varieties other than Kalaallisut. For Itivimuit,1 Massenet (1980) describes final f0 falls with a H tone on the penultimate vocalic mora in statements, a H tone on the last vocalic mora in exclamatives, and a H tone on the antepenultimate vocalic mora in questions, followed by one of three contour shapes: a simple fall, or a fall-rise with either a doubled or a tripled last vocalic mora. For Labrador Inuttut, Pigott (2012) describes similar patterns, though with a less clear distinction between questions and statements. In addition to frequent phrase-final lengthening, he also found aspiration of final coda plosives. On non-final words, he observed final HL contours. For their corpus of South Baffin Inuktitut, Arnhold et al. (2018) found HL contours on all words, with the H realized early in the word and the L at its end. On final words, the resulting fall was followed by a plateau in most cases, indicating the presence of another L associated with the intonational phrase.
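The association of the Kalaallisut word-final HLH melody with the last three vowel morae, and the northern West Greenlandic shift in polar questions, can be pictured as aligning a fixed tone string with the right edge of a mora list. In the Python sketch below, morae are abstract labels supplied by the caller (no mora parsing of real Kalaallisut words is attempted), words with fewer than three morae are rejected because the description above does not say how they are handled, and the function name `associate_hlh` is hypothetical.

```python
from typing import List, Optional, Tuple


def associate_hlh(morae: List[str], northern_polar_question: bool = False
                  ) -> List[Tuple[str, Optional[str]]]:
    """Pair each vowel mora with H, L, or None (toneless)."""
    if len(morae) < 3:
        raise ValueError("this sketch assumes at least three vowel morae")
    morae = list(morae)
    if northern_polar_question:
        # Lengthening the last vowel adds one mora at the right edge.
        morae.append(morae[-1] + "ː")
    tones: List[Optional[str]] = [None] * len(morae)
    # HLH is anchored to the last three morae, counted from the right edge.
    tones[-3:] = ["H", "L", "H"]
    return list(zip(morae, tones))


word = ["m1", "m2", "m3", "m4"]  # hypothetical four-mora word
print(associate_hlh(word))
# [('m1', None), ('m2', 'H'), ('m3', 'L'), ('m4', 'H')]
print(associate_hlh(word, northern_polar_question=True))
# [('m1', None), ('m2', None), ('m3', 'H'), ('m4', 'L'), ('m4ː', 'H')]
```

Relative to the original morae m2 to m4, the melody in the second call has moved one mora to the right, which corresponds to the ‘shift’ reported for northern West Greenlandic polar questions.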
In addition to marking prosodic boundaries and distinguishing sentence types, Inuit prosody is influenced by pragmatics. Massenet’s three question contours distinguish ‘leading’ questions, where the speaker already knows part of the answer, from neutral polar questions and confirmation-seeking echo questions. For Kalaallisut, Fortescue (1984) describes a complex interplay between mood (e.g. indicative, interrogative, or causative), intonation (final rise vs. final fall), and context/speaker intent. Prosodic marking of information structure has only been investigated for Kalaallisut (Arnhold 2007, 2014a), where focused words are more often realized with HLH tones and an expanded pitch range, while given/backgrounded words have smaller ranges and more frequent HL realizations.

20.4.2 Yupik
Yupik is a group of polysynthetic languages spoken in southwestern Alaska, the largest being Central Alaskan Yupik (CAY). St Lawrence Island Yupik, which is not mutually intelligible with CAY, is spoken to the north, while Alutiiq Yupik is spoken on the Alaska Peninsula to the south. CAY has approximately 10,000 speakers, many of whom speak mixed English–CAY varieties (‘half-speakers’, as they are known in the area). The description below deals with CAY and is based on Miyaoka (1971, 2002, 2012, 2015), informed by Jacobson (1985), Leer (1985a), and Woodbury (1987). Some information on other varieties appears at the end of this section. The text below concentrates on feet and the constituents that dominate them.

1 According to Fortescue (1983) and Pigott (2012), Massenet’s findings are representative of the Arctic Quebec region the speakers left about 20 years before the recordings took place.


Words may be monomorphemic as well as highly synthetic through suffixation, forming the first constituent above the morpheme in the morphosyntactic hierarchy, the ‘articulus’.2 The next two levels up in the hierarchy are the enclitic bound phrase (indicated by {…}, with clitic boundaries indicated by ‘=’) and the non-enclitic bound phrase (with boundaries indicated by ‘≠’). There are four vowels, /i a u ə/, the first three of which also occur long; /ə/ does not appear word-finally and has no long counterpart. In the enclitic bound phrase, syllabification is continuous and quantity-sensitive iambs are built from the left edge. The last of the stressed syllables carries the primary stress. Long vowels reject the weak position of the foot, as does a word-initial closed syllable. As shown in (1), the last syllable of the word is never accented, due to a ban on final feet, which is suspended if the word would otherwise remain footless, as in nuna /(nu.ˈna)/ ‘land’. As a result, one or two syllables at the end of a longer enclitic bound phrase will be unfooted. Thus, an initial monosyllabic foot occurs in (1a, 1b) and a disyllabic one in (1c, 1d). Penultimate stress in (1a) is due to the long vowel (cf. initial primary stress in /aana=mi/, giving /{(ˈaa).na.mi}/ ‘How about mother?’). In (1b), four word-internal closed syllables appear in weak positions. In addition, within the word, a sequence of heavy-light-light is parsed with a monosyllabic foot for the heavy syllable instead of a disyllabic one, as shown in (1d); compare /qayar-pag-mi=mi/ ‘How about in the big kayak’, which has a clitic boundary between the two light syllables, allowing the construction of a HL foot: /{(qa.ˌjaχ).(paɣ.ˈmi).mi}/.

(1) a. aaluuyaaq
       (ˌaa).(ˈluu).jaaq
       ‘swing’
    b. qusngirngalngur-tangqer-sugnarq-uq=llu=gguq
       goat-there.be-probable-indic.3sg=encl=quot
       (ˌquz).(ŋiʁ.ˌŋal).(ŋuχ.ˌtaŋ).(qəχ.ˌsuɣ).(naχ.ˈquq).l̥u.xuq
       ‘they say there seems to be a goat also’
    c. cagayag-yagar-mini
       bear-baby-loc.4sg.sg/pl
       (ca.ˌɣa).(ja.ˌja).(ɣa.ˈmi).ni
       ‘in his own baby bear(s)’
    d. qayar-pag-mini
       kayak-big-loc.4sg.sg/pl
       (qa.ˌjaχ).(ˈpaɣ).mi.ni
       ‘in his own big kayak(s)’

The primary stress is the location of a rapid high-to-low pitch fall. Since stressed syllables are bimoraic, the three full vowels /i a u/ are subject to iambic lengthening in strong open syllables (not indicated in the current orthography or in the transcriptions here). In addition to underlyingly closed syllables, open syllables may be closed by a geminate consonant in specific right-hand contexts, which is why the stress on a derived closed syllable is referred to as ‘regressive’ (e.g. Miyaoka 2012).

2 In Miyaoka (2002, 2012, 2015), a structural hierarchy is assumed, the ‘bilateral articulation’. It does not follow the commonly assumed conception of the ‘double articulation’ of language, specifically the notion of potentially non-isomorphic morphosyntactic and phonological hierarchies. In an integrated model of morphosyntax and phonology, each constituent has a morphosyntactic plane (content) as well as a phonological plane (expression) (cf. (4)).


In (2), three contexts are listed in which closed syllables arise in the enclitic bound phrase, underlined in the transcriptions.

(2) a. An open syllable before a sequence of a consonant-initial open syllable and an onsetless syllable acquires a coda through gemination of the following consonant: …V.C1V.V… → …VC1.C1V.V…, as in /a.ki.a.ni/ {(ˌak).(ki.ˈa).ni} ‘across it’ (aki ‘one across’, -ani loc.3sg) and /ang.ya.cu.ar.mi/ {(ˌaŋ).(ˌjat).(tɕu.ˈaʁ).mi} ‘in the small boat’ (angya ‘boat’, -cuar- ‘small’, -mi loc.sg).
    b. The second of two identical onset consonants separated by /ə/ is geminated, as in /nang.te.qe.qapig.tuq/ {(ˌnaŋ).(te.ˌqəq).(qa.ˈpix).tuq} ‘he is very sick’ (nangteqe- ‘sick’, -qapigc- ‘very much’, -tuq ind.3sg). In the same context, /ə/ is deleted between non-identical consonants, as in /keme-ni/ {(ˈkəm).ni} ‘his own flesh’ (keme ‘flesh’, -ni abs.3sg.refl.sg/pl; cf. /keme-mi/ {(kə.ˈməm).mi} ‘of his own flesh’, keme ‘flesh’, -mi erg.3sg.refl.sg/pl).
    c. A word consisting of a single open syllable acquires a coda through gemination of the onset consonant of a following morpheme, as in /ca=mi/ {(ˈcam).mi} ‘then what?’ (cf. /ca-mi/ {(ca.ˈmi)} ‘in what?’; -mi loc). This particular ‘enclitic regression’ shows some dialect variation (cf. Miyaoka 2012).
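The CAY footing procedure for the enclitic bound phrase, together with gemination rule (2a), is close to an algorithm. The Python sketch below is a simplified model under stated assumptions, not Miyaoka’s own formalization. Input is a pre-syllabified list of simplified IPA syllables; a syllable counts as ‘long’ if it contains a doubled full vowel and as ‘closed’ if it ends in a consonant; a coda created by gemination is treated as heavy, in line with the ‘regressive’ stress on derived closed syllables; and only the first symbol of the following onset is copied, so that tɕ yields the coda t as in {(ˌaŋ).(ˌjat).(tɕu.ˈaʁ).mi}. Rules (2b) and (2c), clitic and non-enclitic boundaries, and the heavy-light-light pattern of (1d) are not modelled, so (1d) itself comes out as (qa.ˌjaχ).(paɣ.ˈmi).ni rather than the attested parse. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

VOWELS = set("iauə")
Syllable = Tuple[str, str]  # (simplified IPA transcription, weight class)
Foot = Tuple[int, ...]      # indices of the syllables in one foot


def classify(syl: str) -> str:
    """'long' (doubled full vowel), 'closed' (final consonant) or 'open'."""
    if any(v * 2 in syl for v in "iau"):
        return "long"
    return "open" if syl[-1] in VOWELS else "closed"


def geminate(sylls: List[str]) -> List[Syllable]:
    """Rule (2a): ...V.C1V.V... -> ...VC1.C1V.V..., marking the derived coda."""
    out = [(s, classify(s)) for s in sylls]
    for i in range(len(out) - 2):
        text, kind = out[i]
        nxt, after = out[i + 1][0], out[i + 2][0]
        if (kind == "open" and nxt[0] not in VOWELS and nxt[-1] in VOWELS
                and after[0] in VOWELS):
            out[i] = (text + nxt[0], "geminated")  # copy first onset symbol
    return out


def is_heavy(sylls: List[Syllable], i: int) -> bool:
    """Syllables that cannot occupy the weak (left) position of an iamb."""
    kind = sylls[i][1]
    return kind in ("long", "geminated") or (kind == "closed" and i == 0)


def build_feet(sylls: List[Syllable], allow_final: bool) -> List[Foot]:
    """Left-to-right quantity-sensitive iambs."""
    limit = len(sylls) if allow_final else len(sylls) - 1
    feet: List[Foot] = []
    i = 0
    while i < limit:
        if is_heavy(sylls, i):
            feet.append((i,))        # monosyllabic foot
            i += 1
        elif i + 1 < limit:
            feet.append((i, i + 1))  # weak-strong iamb
            i += 2
        else:
            break                    # a lone light syllable stays unfooted
    return feet


def parse(sylls: List[str]) -> str:
    syl = geminate(sylls)
    feet = build_feet(syl, allow_final=False)     # ban on final feet...
    if not feet:
        feet = build_feet(syl, allow_final=True)  # ...unless otherwise footless
    primary = feet[-1][-1] if feet else None      # head of the last foot
    starts: Dict[int, Foot] = {foot[0]: foot for foot in feet}
    parts: List[str] = []
    i = 0
    while i < len(syl):
        if i in starts:
            foot = starts[i]
            inner = []
            for j in foot:
                mark = ""
                if j == foot[-1]:  # the head is the rightmost syllable
                    mark = "ˈ" if j == primary else "ˌ"
                inner.append(mark + syl[j][0])
            parts.append("(" + ".".join(inner) + ")")
            i = foot[-1] + 1
        else:
            parts.append(syl[i][0])
            i += 1
    return ".".join(parts)


print(parse(["aa", "luu", "jaaq"]))             # (ˌaa).(ˈluu).jaaq
print(parse(["a", "ki", "a", "ni"]))            # (ˌak).(ki.ˈa).ni
print(parse(["aŋ", "ja", "tɕu", "aʁ", "mi"]))  # (ˌaŋ).(ˌjat).(tɕu.ˈaʁ).mi
```

Treating derived codas as heavy is the design choice that lets the sketch produce the separate foot (ˌjat) in /ang.ya.cu.ar.mi/; underived non-initial closed syllables remain light, as in (1b).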

20.4.2.1 The enclitic bound phrase
CAY has over a dozen monosyllabic enclitics, at least three of which may occur in succession in an enclitic bound phrase, as in (3).

(3) a. nuna-ka=mi=gguq
       land-abs.1sg.sg=how.about=quot
       {(nu.ˌna).(ka.ˈmi).xuq}
       ‘how about my land, it is said’
    b. aana-ka=llu=ggur=am
       mother-abs.1sg.sg=and=quot=emph
       {(ˌaa).(na.ˌka).(l̥u.ˈxuʁ).ʁam}
       ‘tell him/them that my mother . . .!’

There are no stem compounds in the language, except for a few exceptional phrasal compounds, notably with inflections on both constituents. Two of these are given in (4).

(4) a. Teknonymic terms, which are very common, refer to parents by mentioning the name of their child, as in May’am+arnaan {(ˌmaj).(ja.ˈmaʁ).na.an} ‘(lit.) May’aq’s mother’ (May’aq ‘proper name’, -m erg.sg, arnar ‘woman’, -an erg.3sg). Such terms are used to avoid the real name of the parent, in contrast with the syntactic phrase May’am arnaa(n) {(ˈmaj).jam}{(ˈaʁ).na.a(n)} ‘(of) May’aq’s woman’.
    b. Complex verbs formed from nouns indicating location and verbs indicating existence have inflections on both constituents, as in /ang.ya.an+(e)tuq/ {(ˌaŋ).(ja.ˈa).nə.tuq} ‘she is in his boat’ (angya ‘boat’, -an contracted from -ani loc.3sg.sg, (e)t- ‘to exist’, -uq ind.3sg). By contrast, the two-phrase /angyaani etuq/ is retained in the Nunivak dialect as {(ˈaŋ).ja.an}{(ə.ˈtuq)}.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

314 kristján árnason et al.

The non-enclitic bound phrase is an intermediate, somewhat variable category between the enclitic bound phrase and the (free) syntactic phrase. Often, the boundary between the words in a non-enclitic bound phrase blocks the maximum onset principle and footing across it, as an enclitic bound phrase boundary does, but allows footing to continue up to the medial boundary. It is indicated by ≠, an asymmetrical boundary. Example (5) would have been parsed as /{(nu.ˌna).(ka.ˈta).ma.na}/ if it were an enclitic bound phrase and as /{(nu.ˌna).ka}{(ta.ˈma).na}/ if there were two syntactic phrases. As it is, (5) allows /kat/ to be footed, with gemination of /t/ to satisfy bimoricity.

(5) nuna-ka ≠ tama-na
land-abs.1sg.sg that-abs.sg
{(nu.ˌna).(ˌkat) {(ta.ˈma).na}}
‘that land of mine’

In sum, the enclitic bound phrase is the domain of syllabification and of right-edge iambic quantity-sensitive feet, in which non-initial closed syllables count as light. The word has a minor effect on the way strings of V.CV.V are parsed, while the non-enclitic bound phrase has a final constituent that resists inclusion in its host enclitic bound phrase but suspends the constraint on final feet. Woodbury (1987) reports various rules of expressive lengthening in varieties of CAY.

In Central Siberian Yupik, coda consonants do not affect stress and stress does not cause consonant gemination. Stress appears on all syllables with long vowels and on each syllable following an unstressed syllable, except in final position (Jacobson 1985). Stressed vowels lengthen in open syllables (except /ə/, which is always short). Long stressed vowels are additionally marked by falling pitch (Jacobson 1990; but see Krauss 1985b on the loss of some length distinctions in younger speakers). The stress system of Naukanski is similar, though closed syllables attract stress and stress placement shows some apparent flexibility (Krauss 1985c; Dobrieva et al. 2004).
A different foot structure is reported for Alutiiq Yupik Sugt’stun, which combines binary and ternary stress (Leer 1985a, 1985b, 1985c; Martínez-Paricio and Kager 2017; and references therein).
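The Central Siberian Yupik rule stated above (stress on every long vowel and on each syllable that follows an unstressed syllable, but not in final position) can be read as a left-to-right procedure. The sketch below is a literal, hypothetical reading under two assumptions that the text does not settle: syllabification and vowel length are given as input, and the final-position exception restricts only the rhythmic clause, so a final long vowel is still treated as stressed.

```python
def csy_stress(long_vowel: list[bool]) -> list[bool]:
    """Literal reading of the Central Siberian Yupik stress rule.

    long_vowel[i] is True when syllable i contains a long vowel.
    Assumption: the final-position exception limits only rhythmic stress.
    """
    stressed: list[bool] = []
    last = len(long_vowel) - 1
    for i, is_long in enumerate(long_vowel):
        if is_long:
            s = True  # every long vowel is stressed
        elif i == last:
            s = False  # no rhythmic stress word-finally
        else:
            # rhythmic stress on a syllable that follows an unstressed one
            s = i > 0 and not stressed[i - 1]
        stressed.append(s)
    return stressed


# Four short syllables give an iambic pattern with an unstressed final syllable.
print(csy_stress([False, False, False, False]))  # [False, True, False, False]
```

Because the rhythmic clause looks only at the immediately preceding syllable, a long vowel resets the alternation, which is why a word-initial long vowel is followed by an unstressed syllable.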

20.4.3 Aleut

Aleut (Unangam Tunuu) is a severely endangered language spoken on the western Alaska Peninsula, the Aleutian island chain, and the Pribilof and Commander Islands. In addition to the phonemic length contrast, vowel durations mark primary stress (Rozelle 1997; Taff et al. 2001), which is penultimate in eastern dialects, unless the penultimate syllable contains a short vowel and the ultimate a long vowel, in which case the ultimate is stressed (Taff 1992, 1999).3 This pattern is stable even if the last syllable is reduced or deleted, as happens frequently in phrase-final position (Taff 1999; cf. Kalaallisut in §20.4.1): when the ultima is deleted, the then final, but underlyingly penultimate, syllable is stressed (Rozelle 1997; Taff et al. 2001).

3 Acoustic investigations of prosody have not been conducted for Western dialects, which according to Bergsland (1994, 1997) prefer initial stress but have intonation similar to that of eastern dialects.
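The eastern Aleut rule is a two-way choice between penult and ultima, and its stability under deletion follows if stress is computed on the underlying form before the ultima is lost. The sketch below illustrates that ordering under stated assumptions: vowel length is supplied as a string of 'L' (long) and 'S' (short), words of fewer than two syllables are out of scope, and the syllables 'ta', 'ka', 'na' are placeholders, not Aleut words.

```python
def eastern_aleut_stress(lengths: str) -> int:
    """Index of the primary-stressed syllable, computed on the underlying form.

    lengths has one letter per underlying syllable: 'L' = long vowel, 'S' = short.
    """
    n = len(lengths)
    if n < 2:
        raise ValueError("sketch covers words of two or more syllables")
    if lengths[-2] == "S" and lengths[-1] == "L":
        return n - 1  # short penult + long ultima: stress the ultima
    return n - 2  # default: penultimate stress


def surface_syllables(sylls: list[str], lengths: str, ultima_deleted: bool = False) -> list[str]:
    """Mark stress with ˈ, then optionally drop the ultima (as phrase-finally)."""
    if len(sylls) != len(lengths):
        raise ValueError("one length letter per syllable is required")
    idx = eastern_aleut_stress(lengths)
    marked = ["ˈ" + s if i == idx else s for i, s in enumerate(sylls)]
    # Stress was assigned before deletion, so it stays on the underlying penult.
    # The source does not discuss deletion of a stressed ultima; here its mark is simply lost.
    return marked[:-1] if ultima_deleted else marked


# Placeholder syllables, not real Aleut words.
print(surface_syllables(["ta", "ka", "na"], "SSS", ultima_deleted=True))  # ['ta', 'ˈka']
```

Ordering stress assignment before deletion is what makes the surface-final syllable stressed without any extra rule.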


Intonation is very regular. Almost all content words bear a pitch rise at the beginning and a pitch fall near the end (Taff 1999; Taff et al. 2001; cf. South Baffin Inuktitut in §20.4.1), which Taff (1999) models with two phrase-accent boundary tones, H and L, since these movements are not associated with the stressed syllable. Within a sentence, H and L tones are successively lowered. Additionally, the first word in a sentence starts with lower pitch and has a later peak than subsequent words, which Taff (1999) models with an initial L% boundary tone. Moreover, pitch falls on sentence-final words are steeper than on preceding words, modelled with a final L% (which contrasts with H% in the rare clause-final, sentence-internal cases with less clear falls). Intonation is very similar for declaratives and polar questions (Taff 1999; Taff et al. 2001). Prosodic marking of focus is optional and may employ suspension of downtrends between prosodic words, extra-long duration, and/or increased use of small peaks, followed by a fall, on the penultimate syllable; this led Taff (1999) to propose a sparsely used H* accent.

20.5 Conclusion

We have discussed three genetically unrelated language groups with different basic structures. The Indo-European languages make a clear distinction between word constituency and phrasal constituency, whereas for the Eskimo-Aleut languages this distinction is less clear. Devoicing and pre-aspiration in the North Atlantic region are of particular interest, notably the final devoicing and truncation noted for Insular Scandinavian, Kalaallisut, and Aleut.

chapter 21

The Indian Subcontinent

Aditi Lahiri and Holly J. Kennard

21.1 Introduction

The Indian subcontinent comprises Bangladesh, India,1 and Pakistan, and its languages come from five major language families: Indo-Aryan, Nuristani branches of Indo-Iranian, Dravidian, branches of Austroasiatic, and numerous Tibeto-Burman languages, as well as language isolates (Grierson 1903/1922; Masica 1991; Krishnamurti 2003; Thurgood and LaPolla 2003; Abbi 2013; Dryer and Haspelmath 2013). In general, the term ‘prosody’ subsumes quantity contrasts, metrical structure, lexical tone, phrasing, and intonation, but, as far as is known at present, the only language of the Indian subcontinent to have lexical tone is Punjabi. In this chapter, we touch on all of these aspects of prosody but discuss quantity only insofar as it is linked to stress, phrasing, and intonation. It is not possible to cover the entire range of existing languages; rather, we give representative examples, largely from Assamese, Bengali (Kolkata), Bangladeshi Bengali (Dacca), Hindi, Malayalam, Tamil, and Telugu.

21.2 Quantity

From a prosodic perspective, quantity contrasts are pertinent because they are directly related to syllable weight and hence to stress assignment. Vowel quantity is always relevant to stress in a quantity-sensitive system, but if geminates are truly moraic and add weight to the syllable (cf. Hayes 1989a), they too could attract stress.2 For our purposes, it is worth noting that consonant quantity contrasts prevail in most Indo-Aryan languages, while true vowel quantity distinctions are less frequent; a possible example is educated standard Marathi, for which Le Grézause (2015) reports a quantity contrast for high vowels, with no geminates.

1 Indian Nagaland is covered in chapter 23.
2 Davis (2011) argues that although geminate consonants generally have an underlying mora, other factors may constrain their surface realization. Mohanan and Mohanan (1984) also suggest that geminates in Malayalam may not be truly moraic.

There is, however, a tense/lax distinction in vowels in most of the Indo-Aryan languages, and Masica (1991) provides the following details. Some languages have an eight-vowel system, as in Gujarati /i e ɛ ɑ ə ɔ o u/, while others are assumed to contain nine vowels, such as Dogri /iː ɪ e æ aː ə o uː ʊ/,3 with three long vowels. Hindi has an additional vowel, but without the quantity distinctions: /i ɪ e æ a ə ɔ o u ʊ/, which is also the view held by Ohala (1999). In the Bengali vowel system, there is neither a lax/tense nor a quantity distinction: /i e æ ɑ ɔ o u/ (Lahiri 2000), but allophonic vowel lengthening occurs in monosyllabic words. Consequently, vowel quantity alternations can be observed in morphologically related (base vs. suffixed) words as well as across unrelated monomorphemic words, as in (1).

(1) Vowel quantity alternations in Bengali
[nɑːk] ‘nose’ ~ [nɑk-i] nose-adj ‘nasal’
[kɑːn] ‘ear’ ~ [kɑnɑ] ‘blind’

As in most languages with consonantal quantity contrasts, singleton–geminate pairs are observable medially but not initially or finally. In most Indo-Aryan languages, almost all consonantal phonemes have geminate counterparts, but there may be language-specific constraints on their appearance. Hindi, Telugu, and Bengali allow geminates to appear only in medial position, and there are further segmental restrictions. In Hindi, /bʰ ɽ h ɦ/ do not geminate (Ohala 1999), while in Telugu /f ʂ ʃ h ɳ/ are always singletons (Bhaskararao and Ray 2017). Bengali does not allow /h/ to geminate; there are, however, also constraints on singletons: retroflex /ɖ ɖʰ/ are not permitted in word-medial position, where they are rhotacized. Examples of monomorphemic word pairs with a subset of obstruent and sonorant phonemes are given in (2) for Bengali, Hindi, and Telugu.

(2) Selected examples of singleton–geminate obstruents and sonorants
a. Bengali
[ɑʈɑ] ‘wheat’ ~ [ɑʈːɑ] ‘eight o’clock’ (voiceless unaspirated retroflex stop)
[ʃobʰɑ] ‘beauty’ ~ [ʃobʰːo] ‘civilized’ (voiced aspirated labial stop)
[ɔɡɑd] ‘plenty’ ~ [ɔɡːæn] ‘faint’ (voiced unaspirated velar stop)
[bɑtʃʰɑ] ‘to isolate’ ~ [bɑtʃʰːɑ] ‘child’ (voiceless aspirated palatoalveolar affricate)
[ʃodʒɑ] ‘straight’ ~ [ʃodʒːɑ] ‘bedding’ (voiced unaspirated palatoalveolar affricate)
[kɑn̪ɑ] ‘blind’ ~ [kɑn̪ːɑ] ‘tears’ (dental nasal)
*[ɖ] (not permitted) ~ [bɔɖːo] ‘too much’ (voiced unaspirated retroflex stop)
b. Hindi (Ohala 1999: 101)
[pət̪ɑ] ‘address’ ~ [pət̪ːɑ] ‘leaf’ (voiceless unaspirated dental stop)
[kət̪ʰɑ] ‘narrative’ ~ [kət̪ʰːɑ] ‘red powdered bark’ (voiceless aspirated dental stop)
[ɡəd̪ɑ] ‘mace’ ~ [ɡəd̪ːɑ] ‘mattress’ (voiced unaspirated dental stop)
[bətʃɑ] ‘save’ ~ [bətʃːɑ] ‘child’ (voiceless unaspirated palatoalveolar affricate)
[pəkɑ] ‘to cook’ ~ [pəkːɑ] ‘firm’ (voiceless unaspirated velar stop)
c. Telugu (Bhaskararao and Ray 2017: 234)
[ɡɐdi] ‘room’ ~ [ɡɐdːi] ‘throne’ (voiced unaspirated denti-alveolar stop)
[ɐʈu] ‘that side’ ~ [ɐʈːu] ‘pancake’ (voiceless unaspirated retroflex stop)
[moɡɐ] ‘male’ ~ [moɡːɐ] ‘bud’ (voiced unaspirated velar stop)
[kɐnu] ‘give birth to’ ~ [kɐnːu] ‘eye’ (alveolar nasal stop)
[kɐlɐ] ‘dream’ ~ [kɐlːɐ] ‘falsehood’ (alveolar lateral approximant)
[mɐri] ‘again’ ~ [mɐrːi] ‘banyan tree’ (alveolar trill)

3 Masica writes the lax vowels /ɪ ʊ/ as capital /I, U/. Also, following traditional usage for our languages, p and i are used instead of φ and ι to indicate the phonological phrase and the intonational phrase.
Concatenation of identical phonemes leads to geminates, but gemination as a phonological process also occurs quite commonly within and across words and across morphemes via assimilation. Such assimilations are restricted to specific prosodic domains, as we will see in §21.3. The examples in (3) illustrate this point.

(3) Gemination
a. Concatenation
Bengali
/kʰel-l-ɑm/ > [kʰelːɑm] ‘play-simple past-1pl’
/ʃɑt̪ t̪ɔlɑ/ > [ʃɑt̪ːɔlɑ] ‘seven floors’
Marathi glide gemination (Pandharipande 2003: 725)
/nəu-wadzta/ > [nəwwadzta] ‘at nine o’clock’
/nahi-jet/ > [nahjːet] ‘does not come’
b. Derived via r-coronal assimilation, whereby /r/ assimilates completely to a following dental, palatoalveolar, or retroflex consonant, yielding a geminate
Bengali
/kor-t̪-ɑm/ > [kot̪ːɑm] ‘do-habitual past-1pl’
/tʃʰord̪i/ > [tʃʰod̪ːi] ‘youngest older sister’

There is another form of gemination, associated with emphasis. It is very clearly observed in Bengali time adverbials, as in (4).

(4) Adverbs and emphatic gemination in Bengali4
[ækʰon] ‘now’ ~ [ekʰːuni] ‘immediately’
[t̪ɔkʰon] ‘then’ ~ [t̪okʰːuni] ‘right after that time’

Thus, geminates and gemination are typologically quite frequent in the Indo-Aryan languages (cf. Goswami 1966 for Assamese). However, although geminates may add weight to the preceding syllable, it does not necessarily follow that they play an active role in stress assignment, to which we turn in the next section.

4 Note that vowels are raised one step when a high vowel follows.


21.3 Word stress

Word stress is a contentious topic in the Indo-Aryan languages. In Bengali, for example, lexical prominence falls on the first syllable of a word, but it is considered to be phonetically ‘weak’ and hardly perceptible (Chatterji 1926/1975; Hayes and Lahiri 1991; Masica 1991). Nevertheless, there are some clear diagnostics, largely post-lexical, for the location of the main prominence in a word (see also Khan 2016 for Dacca Bengali). For example, as mentioned in §21.2, there are seven oral vowels /i e æ ɑ ɔ o u/ in Bengali, and all of them have a nasal counterpart. However, there are distributional constraints on the vowels based on stress. First, the vowels /ɔ/ and /æ/ occur only in the word-initial syllable, as in [kɔt̪ʰɑ] ‘speech’, [bæt̪ʰɑ] ‘pain’. In contrast, plenty of examples exist with final /i e u o ɑ/, such as [pori] ‘fairy’, [bẽʈe] ‘short (in height)’, [d̪ʰɑt̪u] ‘metal’, [bɔɽo] ‘large, big’, [kɔlɑ] ‘banana’. Second, since all nasal vowel phonemes must be in stressed position, they are restricted to the first syllable of a word. Third, geminate consonants in monomorphemic words are also restricted to the stressed syllable, as in [ʃot̪ːi]; words of the shape *ˈCVCVCːV are not permitted. However, since geminate suffixes exist, geminates are permitted in non-initial syllables of polymorphemic words: /ˈd̪ækʰ-ɑ-tʃʰː-e/ show-caus-prog-3sg ‘(s)he is showing’; /ˈmɑrɑ-t̪ːok/ ‘deadly’. As we shall see in §21.5, the alignment of pitch accents in intonational tunes is another indicator of the syllable that carries the main prominence. A further diagnostic is the adaptation of loans. From the seventeenth century onwards, and more so in the eighteenth and nineteenth centuries, numerous loans came into Bengali, primarily from Portuguese and English, both of which can have words with non-initial and variable stress. Irrespective of the stress pattern of the donor language, Bengali has always borrowed words with main prominence on the first syllable.
Portuguese estirár, ananás, alfinéte, espáda, bálde, janélla have been borrowed into Bengali as [ˈis̪t̪iri] ‘iron (for ironing clothes)’, [ˈɑnɑrɔʃ] ‘pineapple’, [ˈɑlpin] ‘pin’, [ˈiʃpɑt̪] ‘steel’, [ˈbɑlt̪i] ‘bucket’, [ˈʤɑnlɑ] ‘window’ (Lahiri and Kennard 2019). Irrespective of which syllable bore stress in Portuguese, Bengali has firmly maintained word-initial stress. The same occurs for English words: exhibítion, inspéctor, América, cómpany are pronounced as [ˈegʤibiʃɑn], [ˈinʃpekʈɔr], [ˈæmerikɑ], [ˈkompɑni]. As Chatterji (1926/1975: 636) puts it, ‘the stress is according to the habits of Bengali’.5 Other dialects of Bengali and related sister languages such as Oriya do not necessarily have fixed word-initial stress. Masica (1991: 121) claims that stress in these languages is more evenly spaced and is weak (see also Lambert 1943; Pattanayak 1966). In Hindi, stress is probably quantity-sensitive and tends to fall on the penultimate syllable if that syllable is closed or its vowel is tense, and otherwise on the antepenult, very much like Latin stress. Names can provide good comparative evidence. For example, the name Arundhati carries antepenultimate stress in Hindi [əˈrund̪ʰət̪i] but not in Bengali, where the stress is on the first syllable [ˈorund̪ʰot̪i]. However, there are language-specific differences as to which types of syllables contribute to weight and therefore attract stress. As Masica (1991: 121) states, ‘each language has its own peculiarities: Hindi gaˈrīb, nukˈsān ‘poor’, ‘loss’ vs. Gujarati ˈgarib, ˈnuksān’. Thus, for intonation purposes, it is important to consider whether the pitch accents are aligned to stressed syllables and whether there is more variation in some languages than in others.

5 Cases of Bengali loans in English demonstrate that borrowed words in English conform to English stress rules: largely penultimate if the penult is heavy, otherwise antepenultimate. For instance, Darjeeling (a favourite Himalayan resort in West Bengal) is pronounced in Bengali as [ˈd̪ɑrʤiliŋ], while English has predictably borrowed it with stress on the penultimate syllable: Darjéeling.
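The contrast between Bengali fixed initial stress and the Latin-like Hindi tendency can be made concrete with the name Arundhati. The sketch below is a simplification under explicit assumptions: syllabification and the closed/tense properties are hand-annotated, the `Syllable` type and function names are hypothetical, and the Hindi function covers only words of three or more syllables, since disyllables such as gaˈrīb plainly follow further weight conditions that the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    text: str
    closed: bool = False  # syllable has a coda consonant
    tense: bool = False  # nucleus is a tense vowel (not lax ə, ɪ, ʊ)


def bengali_stress(sylls: list[Syllable]) -> int:
    """Bengali: main prominence on the first syllable, whatever the weights."""
    return 0


def hindi_stress(sylls: list[Syllable]) -> int:
    """Latin-like Hindi tendency: heavy penult, else antepenult (3+ syllables only)."""
    if len(sylls) < 3:
        raise ValueError("sketch covers words of three or more syllables only")
    penult = sylls[-2]
    if penult.closed or penult.tense:
        return len(sylls) - 2
    return len(sylls) - 3


def mark(sylls: list[Syllable], idx: int) -> str:
    """Join syllables, placing ˈ before the stressed one."""
    return "".join(("ˈ" if i == idx else "") + s.text for i, s in enumerate(sylls))


# Hand syllabification of Hindi [əˈrund̪ʰət̪i]: ə.run.d̪ʰə.t̪i (penult light, so antepenult)
arundhati_hindi = [Syllable("ə"), Syllable("run", closed=True), Syllable("d̪ʰə"), Syllable("t̪i", tense=True)]
# Bengali [ˈorund̪ʰot̪i]: weight is irrelevant to the initial-stress rule
arundhati_bengali = [Syllable("o"), Syllable("run", closed=True), Syllable("d̪ʰo"), Syllable("t̪i")]

print(mark(arundhati_hindi, hindi_stress(arundhati_hindi)))  # əˈrund̪ʰət̪i
print(mark(arundhati_bengali, bengali_stress(arundhati_bengali)))  # ˈorund̪ʰot̪i
```

The Hindi result depends entirely on the annotation of the penult: marking d̪ʰə as closed or tense would move stress to it, which is where the language-specific weight differences noted by Masica would enter.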


21.4 Tone

Punjabi tone arose from a merged laryngeal contrast in consonants; the tone contrast appears only in that segmental context and most clearly in stressed syllables (Evans et al. 2018). Given this segmental origin, the steeply falling tone would be expected to be a low tone, because it appears after historically voiced aspirated (i.e. murmured) plosives, while the other tone is low and flat but would be expected to be high, as it appears after the plosives that historically had no breathy voice (voiceless unaspirated, voiceless aspirated, and plain voiced) (cf. §3.3.2).6 The authors propose that this apparent anomaly is to be explained by a perceptual enhancement of an original low tone after murmured plosives by a fall before it (‘dipping’; cf. §9.5.2). The low pitch of the tone after the other laryngeal groups must subsequently have arisen through a contrast-enhancing lowering of an original high tone.

21.5 Intonation and intonational tunes

In this section we discuss three nuclear tunes: the declarative (neutral), focus, and yes/no question contours. The standard tune in Indo-Aryan languages is overwhelmingly LH. It not only marks focus but is also the general prenuclear contour. Another general fact is that plateaux are not common in Indian languages, which must be related to the fact that the Obligatory Contour Principle (OCP) prohibits the occurrence of a sequence of identical tones in the intonational phrase (IP). Beyond that, there are various interactions between the three intonational tunes in these languages. Three specific issues arise from a brief survey:

1. Are the focus tune and the neutral tune identical, and, regardless of whether they are, do they differ in pitch range?
2. How reliably can one distinguish a non-focused yes/no question from a narrow-focused one?
3. If the tunes of non-focused and narrow-focused yes/no questions are neutralized, might there still be phrasal segmental processes that can distinguish them?

We next illustrate the various patterns with examples from Bengali, Assamese, and Hindi. What these languages have in common is that focus is marked by a L*Hp contour, with L* associated to the earliest stressed syllable of the focused element. As for the neutral contour, Hayes and Lahiri (1991) and Lahiri and Fitzpatrick-Cole (1999) have claimed that it is marked by a different pitch accent, namely H*. Other researchers have made different claims (e.g. Twaha 2017 for Assamese), but the actual descriptions and pitch tracks seem to suggest that the sentence-level prosodic phrase bears a H tone. Most researchers have also claimed that the prenuclear contour is LH, as mentioned above. We turn to this in more detail after discussing the individual tunes in Bengali.

6 The tone contrast must represent a fairly recent case of tonogenesis. The Gurmukhi script, which dates to the early sixteenth century, still indicates voiced aspirates in syllables with falling tone (Jonathan Evans, personal communication). Mikuteit and Reetz (2007) give a detailed description of the acoustics of voiced aspirates in Dacca Bengali, providing evidence that they are really voiced and aspirated.


21.5.1 Declarative

Bengali is strictly verb-final, and the verb will form a nuclear phrase. However, not all sentences need to have a verb, as seen in (5a). More generally, the nuclear pitch accent H* is aligned to the first prosodic word of the last prosodic phrase of an IP in declarative intonation. Importantly, Bengali obeys the OCP within IPs, disallowing sequences of H tones.

(5) Declarative neutral tune (surface)
a. L* HP H* LI
(((ˈtʃʰele-ʈi)ω)φ ((ˈlɔmbɑ)ω)φ)I
boy-CLASSIFIER tall
‘The boy is tall.’
b. L* HP H* LI
(((d̪id̪i-r)ω (d̪æor)ω)φ ((rɑnːɑ)ω (kɔre)ω)φ)I
elder sister’s brother-in-law cook.VN do.3sg.pres
‘My elder sister’s husband’s younger brother cooks.’

In (5a), ‘tall’ is the last full prosodic element to carry stress and thereby align with H*. In (5b), /d̪id̪i-r d̪æor/ falls within a single phonological phrase and can therefore undergo assimilation; thus, the phrase /d̪id̪i-r d̪æor/ ‘elder sister’s husband’s younger brother’ surfaces with gemination: [d̪id̪id̪ːæor]. Since the prenuclear accent is L*Hp, the OCP will delete one of the two adjacent H tones. In the examples, we have indicated the deletion of the phrasal Hp, but we could equally well have deleted the second H; pitch tracks show that there is a H hovering around the edge of the phrase, as in Figure 21.1. In §21.5.3, evidence is presented that in yes/no questions, Hp is deleted when it is adjacent to a final HI.

Twaha (2017) provides detailed analyses of Standard Colloquial Assamese (SCA) and the Nalbariya Variety of Assamese (NVA). In both varieties, each phonological phrase has a L*HP tune, while the final verb bears a pitch accent H* in a declarative sentence, as seen in Figure 21.2.7 Later, however, Twaha (2017: 79) suggests that only NVA, not SCA, has a H* on the verb or verbal complex, apparently assuming an OCP deletion of H* occasioned by the preceding HP in SCA. The NVA example is given in Figure 21.3.

7 Twaha (2017: 57–58) claims that ‘in the third P-phrase ghɔrɔk, a plateau is observable after HP is manifested on the first mora of the final syllable rɔk. This plateau is caused because, unlike the preceding two P-phrases, the following P-phrase geisil bears a high pitch accent H* on its first syllable. Since H*, as per the proposal here, is aligned to the first mora of the initial syllable of geisil, the interpolation from Hp of ghɔrɔk to H* of geisil creates a plateau.’ However, the H* on the verb is not marked consistently; in some figures there is only a rising contour on the pre-verbal phrase followed by a sentence-final L boundary tone.
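The OCP effect described above amounts to collapsing adjacent same-level tones in an IP, with a choice of which member survives: the text marks the phrasal Hp as deleted but notes that the second H could equally have gone. The sketch below models only that choice. It assumes monotonal labels written as strings (e.g. 'L*', 'Hp', 'H*', 'Li'), whose level is read from the first H or L in the label; the function names are hypothetical, and bitonal or contour labels are out of scope.

```python
def tone_level(label: str) -> str:
    """Level ('H' or 'L') of a monotonal label such as 'L*', 'Hp', 'H*', 'Li', 'fHP'."""
    return next(c for c in label if c in "HL")


def apply_ocp(tones: list[str], delete: str = "earlier") -> list[str]:
    """Collapse each run of adjacent same-level tones within an IP to a single tone.

    delete='earlier' keeps the later tone (e.g. drops phrasal Hp before H*);
    delete='later' keeps the earlier tone instead.
    """
    if delete not in ("earlier", "later"):
        raise ValueError("delete must be 'earlier' or 'later'")
    out: list[str] = []
    for t in tones:
        if out and tone_level(out[-1]) == tone_level(t):
            if delete == "earlier":
                out[-1] = t  # replace the earlier tone with this one
            continue  # with 'later', this tone is simply dropped
        out.append(t)
    return out


# Underlying declarative (5): L* Hp H* Li
print(apply_ocp(["L*", "Hp", "H*", "Li"]))  # ['L*', 'H*', 'Li']
print(apply_ocp(["L*", "Hp", "H*", "Li"], delete="later"))  # ['L*', 'Hp', 'Li']
```

The same function, applied to a question string such as ['L*', 'Hp', 'Hi', 'Li'], removes Hp before the final HI, which is the pattern the text attributes to yes/no questions in §21.5.3.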


[Figure 21.1 Declarative intonation in Bengali: nɔren runir mɑlɑgulo nɑmɑlo (Nɔren Runi-GEN garlands bring.down-PST-3SG) ‘Nɔren brought down Runi’s garlands.’ Tone labels: L*, Hp, H*, Li.]

[Figure 21.2 Standard Colloquial Assamese ‘Ram went to Ramen’s house’ (Twaha 2017: 57). Pitch track, Pitch (Hz) against Time (s), for [[rɔmɛnɛ]P [ramɔr]P [ghɔrɔk]P [geisil]P]I. Tone labels: L*, HP, H*, LI, L%.]

[Figure 21.3 Nalbariya Variety of Assamese ‘Today I scolded him’ (Twaha 2017: 79). Pitch track, Pitch (Hz) against Time (s), for [[azi teok]P [dhɔmki]P [dilu]P]I. Tone labels: L*, HP, LP, H*, LI.]

In sum, it appears that the H* on declaratives does show up in dialects of Assamese and even in the standard variety. With respect to the OCP disallowing sequences of H tones, Twaha (2017: 80) concludes: ‘However, the H* nuclear accent in NVA declarative utterances may not be always phonetically apparent due to the phonetic pressure created by the prosodic boundary tones preceding and following it (HP and LI respectively).’ Consequently, although we see a sequence of H tones in the figures, they do not in fact always surface. Khan (2008, 2014) suggests that in Bangladeshi Bengali (spoken largely in Dacca), no sequence of identical tones is accepted, although he makes no explicit appeal to the OCP. Following Selkirk’s (2007) analysis of Bengali, Khan argues that if an accentual phrase ends in a H tone, the following pitch accent is L, and vice versa. This general pattern is again similar to what we have seen above.

Féry (2010) argues that Hindi neutral sentences consist of repetitions of LH ending with HL and claims that the same sequences are maintained regardless of whether one takes the canonical word order to be SOV or SVO and whether the elements are focused. That is, focus is also LH, be it initial or medial, and does not change the overall picture (see more in §21.5.2). Her coding of an SOV sentence, with focus on the subject, is given in (6).

(6) Hindi: SOV focus (Féry 2010: 294)
HP LP HP HP LI LP
adhaypak ne moorti ko banaaya
teacher erg sculpture acc make.past

Note that Féry labels the accusative marker with a H tone followed by another H on the verb, although no obvious plateau is apparent from the figure. The same sort of picture is observable in the Dravidian languages Tamil and Malayalam; the general pattern continues to be LH-marked phrases ending with HL, as (7) and (8) show.

(7) Tamil (Féry 2010: 305)
HP LP HP HP LI LP
[[meeri]P [niRaiya]P [ceer vaank-in-aaL]P]I
Mary many chairs buy-past-png
‘Mary bought many chairs.’

(8) Malayalam (Féry 2010: 307)
HP LP HP LP HP HP LI LP
[[Peter]P [oru [rasakaram-aya]P]P [pustakam]P [vaichu]P]I
Peter one interesting book read
‘Peter read one interesting book.’

There would thus appear to be a general constraint that prohibits a sequence of identical tones in most Indian languages. For Bengali, Hayes and Lahiri (1991) and Lahiri and Fitzpatrick-Cole (1999) argue that the H* of a declarative final prosodic phrase clashes with the preceding prenuclear LH. The arguments that have been raised against the assumption of a pitch accent on the final verb (cf. Dutta and Hock 2006) are probably based on the intolerance of a sequence of H tones, which causes variation in the surface alignment of the remaining H, as in Bengali, Hindi, and Assamese.


21.5.2 Focus

Returning now to the examples in (5), we saw that in a neutral declarative sentence, Bengali has an initial L*Hp prenuclear accent, followed by a final H*LI. The OCP deletes one of the high tones, so this contour surfaces as an initial low followed by a rise to the last prosodic phrase (which need not be the verb). If, however, one element, such as [lɔmbɑ] ‘tall’, is focused, we see very different patterns. The focus tune is L*Hp, followed by a final LI in a declarative. In (9a), the focus on [lɔmbɑ] is marked by L*Hp. Focus on pre-final constituents, such as [tʃʰeleʈi] in (9b), is shown by the deletion of post-focal pitch accents, a feature that Bengali shares with many Indo-European languages. Example (9c) shows a sequence of two phonological phrases, each consisting of two words, with focus on the second phonological phrase, the complex verb [rɑnːɑ kɔre] ‘cook-3sg’. In this analysis, the boundary tones are linked to edges of prosodic constituents, but their targets do not necessarily appear strictly at the edges. In contrast, the pitch accents L* and H*, which are always linked to the initial syllable of the first prosodic word, are more stably aligned to the stressed syllable.

(9) Bengali: focus intonation
a. L* HP L* HP LI
(((ˈtʃʰele-ʈi)ω)φ ((ˈlɔmbɑ)ω)φ)I
‘The boy is (really) tall!’
b. L* HP LI
(((ˈtʃʰele-ʈi)ω)φ ((ˈlɔmbɑ)ω)φ)I
‘The boy is tall.’
c. HP L* L* HP LI
(((d̪id̪i-r)ω (d̪æor)ω)φ ((rɑnːɑ)ω (kɔre)ω)φ)I
[d̪id̪id̪ːæor]
‘Sister’s brother-in-law cooks!’

We have taken a sentence with sonorants to illustrate the L*Hp contour for both the prenuclear and the focus tunes. In Figure 21.4, the first word Nɔren is focused and we see a clear L*Hp contour with a gradual fall ending in a Li at the end of the sentence; this should be compared with the example in Figure 21.1. Figure 21.5 illustrates the prenuclear as well as the focus contour on Runir; here the focus suggests that it is Runi’s garlands that were brought down and not someone else’s. The prenuclear L*Hp is lower than the focus contour, and again the intonation goes down to a final LI. The Bengali examples suggest that L*Hp is the focus tune and that the final non-focused tune is H*Lp, typically associated with the sentence-final verb or verbal cluster (a complex predicate or a noun-incorporating verb).

[Figure 21.4 Bengali intonation, focus on Nɔren: nɔren runir mɑlɑgulo nɑmɑlo (Nɔren Runi-GEN garlands bring.down-PST-3SG) ‘Nɔren brought down Runi’s garlands.’ Tone labels: L*, Hp, Li.]

[Figure 21.5 Bengali intonation, focus on Runir: same sentence as Figure 21.4. Tone labels: L*, Hp, L*, Hp, Li.]

Assamese has a similar pattern. Twaha (2017) states that the focused element begins with a L* pitch accent and ends with a focus H boundary tone, which he labels L*fHP. An example is given in (10).

(10) Assamese (Twaha 2017: 111)
L* L* HP HP L* fHP LI
[[rɔmɛn-ɛ]P [dɔrza-r]P [sabi-pat]P [milɔn-ɔk]P [di-lɛ]P]I
Ramen-nom door-gen key-cl Milan-acc give-3sg.past
‘Ramen gave the door-key to Milan.’

Three further comments need to be added. First, if the focused constituent is longer than one prosodic word, the contour breaks up into L*H fHP. Second, if the focused constituent is towards the beginning of the sentence, the rest of the sentence ends with a general fall; thus, there is again post-focal deaccenting. Both of these are illustrated in example (11).

(11) Assamese (Twaha 2017: 106)
L* HP L*+H fHP LI
[[madhɔb]P [kɔmɔla kha-bo-loi]P [khɔgɛn-ɔr ghɔr-ɔloi]P [go-isɛ]P]I
Madhab orange eat-fut-dat Khagen-gen house-dat go-past.3sg
‘Madhab went to Khagen’s house to eat oranges.’

Third, if the final verb is focused, there will be a sharp drop to the end of the sentence, as shown in (12).

(12) Assamese (Twaha 2017: 105)
L* fHP LI
[[rɔmɛn-ɛ]P [dɔrza-r sabi-pat]P [milɔn-ɔk]P [di-lɛ]P]I
Ramen-nom door-gen key-cl Milan-acc give-past.3sg
‘Ramen gave the door-key to Milan.’

Except for the focusing of longer constituents, SCA and Bengali are very similar. In Bengali, a longish focused constituent such as runi-r malagulo would show a smooth rise, while in SCA it would be broken up. For Hindi, Féry (2010: 294) argues that the focused contour is also the same LH, as ‘there was no change in the overall contour’. She reports Moore’s (1965) claim that a H boundary is placed after the focused constituent; this would be similar to the pattern in Assamese and Bengali. According to Féry’s own work, however, the only real difference between focused and non-focused utterances is that if the focus falls on an early constituent, there is a greater phonetic pitch rise, while the post-focal constituent is lowered. Tamil is also claimed to have a LH pattern ending with HL on the verb in a neutral sentence. However, when the object is topicalized, the sentence ends with LH, as in (13).

(13) Tamil (Féry 2010: 306)
HP LP HP LP HP LP HI LP
[[[niRaiya]P [ceer]P]I [[meeri]P [vaank-in-aaL]P]I
many chairs Mary buy-past-png
‘Mary bought many chairs.’

Similarly, Malayalam shows an identical pattern of LH ending with HL. Of course, many sentential and morphosyntactic options are available to mark focus. Nevertheless, from an intonation perspective, the Indo-Aryan and Dravidian languages appear to have very similar patterns.

Structural devices to mark focus differ across languages. West Germanic languages, for example, use pitch accents to indicate focused parts of sentences, while European Portuguese differentiates presentational focus from corrective focus by pitch accents, using H+L* for the former and H*+L for the latter. In Italian, narrow focus is not expressed through deaccenting within noun phrases. Japanese, in contrast, marks focus by preventing the lowering of the pitch range, which is otherwise automatic. Northern Bizkaian Basque can allow complex noun phrases with a single accent on the final word to have exclusive focus on a preceding word, as in [lagunen aMA] friend-gen.sg mother ‘The FRIEND’s mother’ (Elordieta and Hualde 2014). English marks focus for domains that may include unaccented words: in ‘Don’t move the dinner table, just move the kitchen table’, for example, the focus is ambiguous between kitchen and kitchen table. In contrast, Bengali can discriminate words even within compounds, as a consequence of the combination of phrasing and pitch accent marking. In a declarative sentence such as nirmal lalbari-r malik ‘Nirmal is the owner of the red house’, one could focus lalbari ‘the red house’ or just lal ‘red’, the latter suggesting that Nirmal owns the red house (not the black one). The contours in (14) show that when the compound lalbari is the focus, the pitch accent associated with [lɑl] is followed by a continuous rise to the beginning of malik, while when the focus is on lal, the L*H contour covers only that part of the compound.

(14) Bengali focus within compounds
a. L* HP
nirmɔl lɑl bɑri-r mɑlik
Nirmal red.house-gen owner
‘Nirmal is the owner of the red house.’
b. L*HP
nirmɔl lɑl bɑri-r mɑlik
Nirmal red.house-gen owner
‘Nirmal is the owner of the red house [not the black one].’

21.5.3 Yes/no questions, with and without focus

In Bengali, as probably in most Indo-Aryan languages, the yes/no question tune carries a L* pitch accent and ends with a contour boundary tone HiLi. The examples in (15) illustrate the contour.

(15) Yes/no contours
a.   L* HiLi
     kɔfi?
     ‘Coffee?’
b.   L* HiLi
     ˈtʃʰeleʈi kʰub ˈlɔmbɑ?
     ‘Is the boy very tall?’
c.   L* HiLi
     d̪id̪i-r d̪æor rɑnːɑ kɔre?
     [d̪id̪id̪ːæor]
     ‘Does elder sister’s brother-in-law cook?’

328 ADITI LAHIRI AND HOLLY J. KENNARD

As we can see, the generic contour in Bengali is LH, the important distinction being between H*, which signals a neutral declarative, and L*, which is used for focused declaratives as well as for yes/no questions. This raises the question of how focused yes/no questions are prosodically distinct from broad-focus ones: if L* is also used to mark focus in interrogatives, and it is, how can a neutral yes/no question be distinguished from a focused one? Consider the Bengali equivalent of the sentence ‘Has mother invited father’s friend?’ in (16) and the three tonal structures in (a), (b), and (c). Observe that in (16a) and (16b), the OCP has deleted the medial phrasal Hp.

(16) Underlying versus surface patterns in yes/no questions
     mɑː-ki bɑbɑ-r bondʰu-ke nemontonːo kor-e-tʃʰ-e
     mother-Qpart father-gen friend-obliq invitation do-perf-prog-3
     ‘Has mother invited father’s friend?’
a.   L* Hp HiLi
     mɑː-ki bɑbɑ-r bondʰu-ke [nemontonːo]f kor-e-tʃʰ-e
b.   L* Hp HiLi
     mɑː-ki [bɑbɑ-r bondʰu-ke]f nemontonːo kor-e-tʃʰ-e ]p ]i
c.   L* HiLi
     mɑː-ki [bɑbɑ-r bondʰu-ke nemontonːo kor-e-tʃʰ-e]p ]i

Structure (16a) indicates a narrow focus on [nemontonːo] ‘invitation’; that is, the question is whether mother actually invited father’s friend (as opposed to the friend coming uninvited). In contrast, the focus on [bɑbɑ-r bondʰu-ke] ‘father’s friend’ in (16b) suggests that mother specifically invited this friend, and not someone else. The most generic reading is intended to be (16c), an answer to ‘What happened?’ Unfortunately, (16b) and (16c) are identical with respect to intonation: given the OCP constraint, the Hp of the focus phrase is deleted, an inevitable consequence in the context of the question tune. The claim for Bengali has been that the focus tune is L*Hp, which suggests that focus marking has three elements: (i) the L* pitch accent, (ii) a Hp boundary tone, and (iii) a phrase boundary delimiting the focused part.
The third element is relevant for the evaluation of an alternative account in which Hp is interpreted as a prominence-marking tone. Selkirk (2007: 2) seeks to ‘eliminate focus-phrasing alignment constraints from the universal interface constraint repertoire and to reduce all nonmorphemic, phonological, reflexes of focus to reflexes of stress prominence’ (a typological treatment of prosodic focus is offered in chapter 31). Consequently, under this view there are no constraints like Align L/R(FocusP), where P is a prosodic constituent like φ. Since in our account the OCP and the phrase boundary requirement for the focus are independent phonological constraints, the prediction is that the OCP-triggered deletion of Hp leaves the phrase boundary unaffected. To test this, we will look at phrasal assimilation rules: since these are bounded by phonological phrases, removal of the phrase boundary predicts that the assimilation rules will apply within the merged phonological phrase. To illustrate the two scenarios, let us consider the four intonational structures provided for [mɑmɑ-r ʃɑli-r bie] ‘wedding of mother’s brother’s wife’s sister’ (17), illustrated in Figure 21.6, focusing on the interaction between the intonation contour and the phonological phrase structure.

(17) mɑmɑ-r ʃɑli-r bie
     mother-brother-gen sister-wife-gen wedding
     ‘Mother’s brother’s wife’s sister’s wedding’

Figure 21.6a is the neutral declarative, as presented earlier in (5) and (6), with [bie] carrying the last H*Li pitch accent and [mɑmɑr], [ʃɑlir], and [bie] occurring in separate φ’s, whereby the Hp of [ʃɑlir] was deleted by the OCP. The failure of [rʃ] to be assimilated to [ʃː] at the boundary between [mɑmɑr] and [ʃɑlir] shows that this assimilation rule cannot apply across a phonological phrase boundary. As expected, deaccenting [ʃɑlir] and [bie] to mark an early focus on [mɑmɑr], shown in panel b, cannot alter the status of that boundary, since the focus is ended by a phonological phrase boundary. While this boundary is here overtly marked by Hp, an early focus in an interrogative sentence will not be able to retain its Hp, because it will be deleted by the OCP, as triggered by the final HiLi. Panel d shows the interrogative counterpart to panel b, and it does not show the assimilation, indicating the presence of a phonological phrase boundary. For the assimilation to apply, the sentence must have neutral focus and be spoken without the now optional boundaries, as in panel c. Thus, while the surface tunes in panels c and d are identical (L* HiLi), the phrasing is not: r-coronal assimilation can apply in cases such as panel c, because φ-structure in neutral sentences is optional, whereas in panel d the phrase break before [ʃɑlir] is obligatory, since it comes between focused [mɑmɑ-r] and post-focal [ʃɑlir bie], and it blocks assimilation.8 This means that phonological phrases, while being restricted to maximally one pitch accent each, can be unaccented. The question of the discriminability of yes/no questions with broad and narrow focus and the possible roles of r-coronal assimilation and pitch range remains a topic for future research.

[Figure 21.6, panels (a) to (d), not reproduced]

Figure 21.6 Four prosodic structures for Bengali [mɑmɑ-r ʃɑli-r bie] ‘Mother’s brother’s wife’s sister’s wedding’: the neutral, broad-focus declarative (a); the declarative with focus on [mɑmɑ-r] ‘It is Mother’s brother’s wife’s sister’s wedding’ (b); the neutral, broad-focus yes/no question (c); the yes/no question with focus on [mɑmɑ-r] ‘Is it Mother’s brother’s wife’s sister’s wedding?’ (d). Only in (c) can r-coronal assimilation go through, since there are focus-marking phonological phrase boundaries after [mɑmɑr] in (b) and (d), and an optional, tone-marked phonological phrase boundary in (a).
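The independence argument above, that OCP deletion removes the tone Hp but not the phrase boundary it marks, can be made concrete with a small sketch. It assumes a toy representation of our own, not the chapter's: each phonological phrase is a list of tone labels (or a list of words), the utterance-final boundary tones are passed separately, and only word-final /r/ before a word-initial coronal obstruent is considered. The function names (apply_ocp, surface_tune, r_coronal_assimilation) and the coronal set are hypothetical illustrations.

```python
from typing import List

# Word-initial coronal obstruents that trigger r-coronal assimilation in this toy
# model (an illustrative subset; "tʃ" and dental "d̪" are caught by their first letter).
CORONAL_ONSETS = ("t", "d", "ʈ", "ɖ", "s", "ʃ")


def apply_ocp(phrase_tones: List[List[str]], final_tones: List[str]) -> List[List[str]]:
    """Delete a phrase-final Hp when the next tone in the utterance is also H.

    Only the tone is removed: the list of phrases (the phrasing) keeps its length.
    """
    result = [list(tones) for tones in phrase_tones]
    for i, tones in enumerate(result):
        if tones and tones[-1] == "Hp":
            following = [t for later in result[i + 1:] for t in later] + final_tones
            if following and following[0].startswith("H"):
                tones.pop()
    return result


def surface_tune(phrase_tones: List[List[str]], final_tones: List[str]) -> str:
    """Return the utterance tune after OCP deletion, as a space-separated string."""
    kept = [t for tones in apply_ocp(phrase_tones, final_tones) for t in tones]
    return " ".join(kept + final_tones)


def r_coronal_assimilation(phrases: List[List[str]]) -> str:
    """Assimilate word-final /r/ to a following coronal obstruent, but only
    between words inside the same phonological phrase ('|' marks a boundary)."""
    out = []
    for words in phrases:
        words = list(words)
        for j in range(len(words) - 1):
            if words[j].endswith("r") and words[j + 1].startswith(CORONAL_ONSETS):
                # Complete gemination: /r/ becomes a copy of the next consonant.
                words[j] = words[j][:-1] + words[j + 1][0]
        out.append(" ".join(words))
    return " | ".join(out)


# Panel (c): broad-focus question, a single phrase with L* and final HiLi.
print(surface_tune([["L*"]], ["HiLi"]))                        # L* HiLi
print(r_coronal_assimilation([["mɑmɑr", "ʃɑlir", "bie"]]))     # mɑmɑʃ ʃɑlir bie

# Panel (d): focus on mɑmɑ-r; its Hp is deleted by the OCP, the boundary stays.
print(surface_tune([["L*", "Hp"], []], ["HiLi"]))              # L* HiLi
print(r_coronal_assimilation([["mɑmɑr"], ["ʃɑlir", "bie"]]))   # mɑmɑr | ʃɑlir bie
```

Keeping tones and phrasing in separate structures is the design point: the OCP function edits only tone lists, so the two questions end up with the same tune while the assimilation function still sees different phrase boundaries.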

21.6 Segmental rules and phrasing

Despite their potential as evidence, segmental processes are rarely appealed to for confirmation of prosodic phrasing. Hayes and Lahiri (1991) and Lahiri and Fitzpatrick-Cole (1999) provide evidence from assimilation rules in support of phonological words and phrases. These include voicing assimilation as well as the r-coronal assimilation discussed in §21.5.3. Phonological processes that apply only within phonological phrases have also been reported for other Indian languages. For Assamese, Twaha (2017) provides evidence from aspirate spirantization and flapping, in addition to voicing assimilation and /r/ assimilation. In NVA, r-coronal assimilation within a phonological phrase is triggered by a focus constituent, as shown in (21).

(21) Assamese (Twaha 2017: 136)
     rɔmɛnɛ (dɔrzar sabipat milɔnɔk dilak)
     Ramen-nom door-gen key-cl Milan-acc give-past.3sg
     ‘Ramen gave the door-key to Milan.’

Furthermore, intervocalic spirantization can apply across prosodic word boundaries, but it has to be phrase-internal. Thus, aspirate spirantization occurs in (22b), where [kɔmla] and [kʰa-ba] share a phonological phrase, but not in (22a), where a phrase boundary separates them. Therefore, focus-governed phrasing can also constrain phonological rules in Assamese.

(22) Assamese (Twaha 2017: 137–138)
a.   [[rɔmɛn-ɛ]P [makhɔn-ɔr ghɔr-ɔt]P [kɔmla]P [kʰa-ba ge-isi]P]I      does not apply
b.   [[rɔmɛn-ɛ]P [makhɔn-ɔr ghɔr-ɔt]P [kɔmla kʰa-ba ge-isi]P]I        does apply: [kʰ] > [x]

8 To test this, in a pilot study we examined all occurrences of /r/ plus coronal obstruent sequences in focused questions of the type shown in panel d of Figure 21.6 in the oral rendition of the book Epar Bangla Opar Bangla (Bengal on this side and the other) (Fitzpatrick-Cole and Lahiri 1997). There was a total of 11,205 tokens with various coronal obstruents (stops, [tʃ], [ʃ]). Since r-coronal assimilation leads to complete gemination, it was easy to measure the duration of closure or the duration of frication for the stridents. Our results were as follows: 87% had a complete sonorant /r/, in 5% it was difficult to tell whether the /r/ was there or not, and 8% had complete assimilation. Thus, when the phrase boundary was there, despite the fact that there was no H boundary tone, the phrase boundary blocked the assimilation.


21.7 Conclusion

The Indian subcontinent is vast. Not only are there many languages, but there are also at least four language families. Unlike in Germanic, stress does not play a very important role in most of these languages in terms of marking lexical contrast, and minimal pairs such as tórment (N) and tormént (V) will not occur. Nevertheless, stress does play a role in pitch accent association as well as in distributional constraints on segments; for example, nasal and oral vowels contrast in Bengali only in stressed position and thus only in word-initial syllables. Very rarely do we find tonal contrasts; Punjabi is the only language that has been claimed to have developed tone. The intonational systems are very comparable. In general, the basic contour appears to be LH, for prenuclear as well as focus tunes. Unsurprisingly, the pitch accents are aligned to the most prominent syllable, which may not be word-initial (e.g. Hindi; Khan 2016). Recent cross-linguistic studies confirm that, despite variation, the data are compatible with accentual phrases (also equated with phonological phrases) that begin with a L pitch accent and end with a H (Khan 2016, based on ‘The North Wind and the Sun’ in six languages; Deo and Tonhauser 2018, based on data from Chodri, Marathi, and Gujarati). Nevertheless, Khan argues that this is clearer in the Indo-Aryan languages than in the Dravidian languages. The end of the focus seems generally to be demarcated by a H boundary tone. Moreover, sequences of identical tones appear to be generally disallowed. Finally, we have seen that segmental processes are also bound by the accentual or phonological phrase in many languages.

Acknowledgements

We are grateful to Shakuntala Mahanta, who very kindly provided a copy of Twaha (2017), and to Gadakgar Manjiri for providing references. The research was partially supported by the European Research Council Advanced Grant MORPHON 695481, PI Aditi Lahiri.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

chapter 22

China and Siberia

Jie Zhang, San Duanmu, and Yiya Chen

22.1 Introduction

This chapter provides a summary of the prosodic systems of languages in Northern Asia, including varieties of Chinese spoken in mainland China and Taiwan as well as languages in Siberia, in particular Ket. A common theme in the prosody of these languages is their ability to use pitch to cue lexical meaning differences; that is, they are tone languages. The well-known quadruplet ma55/ma35/ma214/ma51 ‘mother/hemp/horse/to scold’1 in Standard Chinese exemplifies the tonal nature of the languages in this area. We start with a brief discussion of the typology of syllable and tonal inventories in Chinese languages (§22.2). These typological properties lead to three unique aspects of prosody in these languages: the prevalence of complex tonal alternations, also known as ‘tone sandhi’ (§22.3); the interaction between tone and word- and phrase-level stress (§22.4); and the interaction between tone and intonation (§22.5). The prosodic properties of Ket are discussed briefly in §22.6. The last section provides a summary (§22.7).

22.2 The syllable and tone inventories of Chinese languages

The maximal syllable structure of Chinese languages is CGVV or CGVC (C = consonant; G = glide; VV = long vowel or diphthong) (Duanmu 2008: 72). The syllabic position of the prenuclear glide is controversial, and it has been analysed as part of the onset (Duanmu 2007, 2008, 2017), part of the rime (Wang and Chang 2001), occupying a position of its own (van de Weijer and Zhang 2008), or variably belonging to the onset or the rime depending on the language, the phonotactic constraints within a language, and the speaker (Bao 1990; Wan 2002; Yip 2003). Yip (2003) specifically used the ambiguous status of the prenuclear glide as an argument against the subsyllabic onset-rime constituency.

The coda inventory is reduced to different degrees, from Northern dialects, in which only nasals and occasionally [ɻ] are legal, to Southern dialects (e.g. Wu, Min, Yue, Hakka), where stops [p, t, k, ʔ] may also appear in addition to nasals. Syllables closed by a stop are often referred to as ‘checked syllables’ (ru sheng) in Chinese phonology, and they are considerably shorter than non-checked (open or sonorant-closed) syllables. There are typically three to six contrastive tones on non-checked syllables in Chinese dialects. On checked syllables, the tonal inventory is reduced: one or two tones are common, and three tones are occasionally attested. Table 22.1 illustrates the tonal inventories on non-checked and checked syllables in Shanghai (Wu), Fuzhou (Min), and Cantonese (Yue).

1 Tones are transcribed in Chao numbers (Chao 1948, 1968), where ‘5’ and ‘1’ indicate the highest and lowest pitches in the speaker’s pitch range, respectively. Juxtaposed numbers represent contour tones; for example, ‘51’ indicates a falling tone from the highest pitch to the lowest pitch.

Table 22.1 Tonal inventories in three dialects of Chinese
Dialect                               Non-checked syllables      Checked syllables
Cantonese (Matthews and Yip 1994)     55, 33, 22, 35, 21, 23     5, 3, 2
Shanghai (Zhu 2006)                   52, 34, 14                 4, 24
Fuzhou (Liang and Feng 1996)          44, 53, 32, 212, 242       5, 23
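Footnote 1 describes Chao numbers as digit strings on a 1 to 5 scale. The sketch below reads such a string and gives it a coarse shape label, applied here to the non-checked tones of Table 22.1. The function name chao_contour and its five labels (level, rising, falling, dipping, rise-fall) are our own illustrative categories, not a classification used in the chapter.

```python
def chao_contour(tone: str) -> str:
    """Label the pitch shape of a tone in Chao numbers (1 = lowest, 5 = highest)."""
    levels = [int(ch) for ch in tone]
    if not levels or any(v < 1 or v > 5 for v in levels):
        raise ValueError(f"not a Chao-number tone: {tone!r}")
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(s == 0 for s in steps):
        return "level"        # e.g. 55, or a single digit such as checked 5
    if all(s >= 0 for s in steps):
        return "rising"       # e.g. 35, 24
    if all(s <= 0 for s in steps):
        return "falling"      # e.g. 51, 53
    # The direction changes: fall-then-rise is dipping, rise-then-fall is rise-fall.
    first_move = next(s for s in steps if s != 0)
    return "dipping" if first_move < 0 else "rise-fall"


# Non-checked tones from Table 22.1.
TABLE_22_1 = {
    "Cantonese": ["55", "33", "22", "35", "21", "23"],
    "Shanghai": ["52", "34", "14"],
    "Fuzhou": ["44", "53", "32", "212", "242"],
}
for dialect, tones in TABLE_22_1.items():
    print(dialect, [chao_contour(t) for t in tones])
```

The labels only describe the direction of pitch movement; they ignore register (e.g. 55 versus 22), which is contrastive in all three dialects.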

22.3 Tone sandhi in Chinese languages

A prominent aspect of the prosody of Chinese languages is that they often have a complex system of ‘tone sandhi’, whereby tones alternate depending on the adjacent tones or the prosodic/morphosyntactic environment in which they appear (Chen 2000; Zhang 2014). Two examples of tone sandhi, from Standard Chinese and Xiamen (Min), are given in (1). In Standard Chinese, T3 214 becomes T2 35 before another T3;2 in Xiamen, tones undergo regular changes whenever they appear in non-final positions in a syntactically defined tone sandhi domain (Chen 1987; Lin 1994).

(1) Tone sandhi examples
a.   Tonally induced tone sandhi in Standard Chinese
     214 → 35 / ___ 214
b.   Positionally induced tone sandhi on non-checked syllables in Xiamen
     (in non-final positions of a tone sandhi domain)
     53 → 44 → 22 → 21 → 53
     24 → 22
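The two rule types in (1) differ in what triggers the change: the next tone in Standard Chinese, the position in the domain in Xiamen. A minimal sketch, with hypothetical function names, assuming tones are Chao-number strings and that the Xiamen sandhi domain is supplied by the caller (it is syntactically defined, which the sketch does not compute). The Standard Chinese function ignores prosodic structure, so it is not a complete account of third-tone sandhi (see also footnote 2).

```python
from typing import List

# (1b): non-final sandhi on Xiamen non-checked tones,
# i.e. the circle 53 → 44 → 22 → 21 → 53, plus 24 → 22.
XIAMEN_NONFINAL = {"53": "44", "44": "22", "22": "21", "21": "53", "24": "22"}


def mandarin_t3_sandhi(tones: List[str]) -> List[str]:
    """(1a): 214 → 35 before another 214 (tonally induced sandhi).

    Each syllable looks at the *underlying* tone of the next one, so a run of
    three 214s yields 35 35 214; real outputs for longer runs also depend on
    prosodic and syntactic structure, which this sketch ignores.
    """
    return [
        "35" if t == "214" and i + 1 < len(tones) and tones[i + 1] == "214" else t
        for i, t in enumerate(tones)
    ]


def xiamen_sandhi(domain: List[str]) -> List[str]:
    """(1b): positionally induced, right-dominant sandhi. Every non-final tone
    in the domain changes; the final tone keeps its base form."""
    return [XIAMEN_NONFINAL.get(t, t) for t in domain[:-1]] + domain[-1:]


print(mandarin_t3_sandhi(["214", "214"]))               # ['35', '214']
print(xiamen_sandhi(["53", "44", "22", "21", "24"]))    # ['44', '22', '21', '53', '24']
```

Both functions preserve the final tone, which is what makes them right-dominant in the sense of the next paragraph; the Xiamen table is a plain lookup precisely because the circle has no phonotactic motivation to derive it from.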

Tone sandhi patterns can generally be classified as ‘left-dominant’ or ‘right-dominant’. Right-dominant sandhi, found in most Southern Wu, Min, and Northern dialects, preserves the base tone on the final syllable in a sandhi domain and changes the tones on non-final syllables; left-dominant sandhi, typified by Northern Wu dialects, preserves the tone on the initial syllable (Yue-Hashimoto 1987; Chen 2000; Zhang 2007, 2014). It has been argued that the sandhi behaves asymmetrically depending on directionality: right-dominant sandhi tends to involve local or paradigmatic tone change, while left-dominant sandhi tends to involve the extension of the initial tone rightward (Yue-Hashimoto 1987; Duanmu 1993; Zhang 2007). We have seen in (1) that the tone sandhi patterns in both Standard Chinese and Xiamen are right-dominant and involve local paradigmatic tone change. In the left-dominant Shanghai tone sandhi pattern in (2), however, the tone on the first syllable is spread across the disyllabic word, neutralizing the tone on the second syllable (Zhu 2006).

(2) Shanghai tone sandhi for non-checked tones
     52-X → 55-31
     34-X → 33-44
     14-X → 11-14

Zhang (2007) argued that the typological asymmetry is due to two phonetic effects. One is that the prominent positions in the two types of dialect have different phonetic properties: the final position in right-dominant systems has longer duration and can maintain the contrastive tonal contour locally, whereas the initial position in left-dominant systems has shorter duration and therefore needs to allocate the tonal contour over a longer stretch of the sandhi domain. The other is the directionality effect of tonal coarticulation, which tends to be perseverative and assimilatory; the phonologization of this type of coarticulatory effect could then potentially lead to a directional asymmetry in tone sandhi. Duanmu (1993, 1994, 1999, 2007), on the other hand, argued that the difference stems from differences in syllable structure, and hence in stress patterns, between the two types of language, as discussed in §22.4.

2 This is a vast simplification. While in identification tasks T2 is indistinguishable from the sandhi tone for T3 (e.g. Wang and Li 1967; Peng 2000), recent phonetic, psycholinguistic, and neurolinguistic evidence indicates that the sandhi tone for T3 is neither acoustically identical to T2 (e.g. Peng 2000; Yuan and Chen 2014) nor processed the same way as T2 in online spoken word processing (e.g. Li and Chen 2015; Nixon et al. 2015).
Despite these typological tendencies, phonetically arbitrary tone sandhi patterns abound in Chinese dialects. For instance, the circular chain shift in the Xiamen pattern (1b) has no phonotactic, and hence no phonetic, motivation, as the base tone itself is not phonotactically illegal in the sandhi position. Left-dominant sandhi, likewise, often involves changes that cannot be predicted by a straightforward tone-mapping mechanism. In Wuxi (Wu), for example, the tone on the initial syllable of a word must first be replaced with another tone before it spreads rightward (Chan and Ren 1989), and Yan and Zhang (2016) argued that this tone substitution involves a circular chain shift, as in (3).

(3) Wuxi tone sandhi for non-checked tones with voiceless initials
     53-X → 43-34      (falling → dipping)
     323-X → 33-44     (dipping → rising)
     34-X → 55-31      (rising → falling)
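Both left-dominant patterns can be sketched as lookups keyed on the initial tone, with the second syllable's tone ignored, which is what neutralization amounts to here. For Wuxi, the sketch follows the two-step description in the text (substitute, then spread); the intermediate mapping (53 → 323, 323 → 34, 34 → 53) and the spread table are our reading of (3), and all names are hypothetical.

```python
from typing import Dict, Tuple

# (2): Shanghai non-checked tones. The initial tone fixes the whole disyllable.
SHANGHAI_SPREAD: Dict[str, Tuple[str, str]] = {
    "52": ("55", "31"),
    "34": ("33", "44"),
    "14": ("11", "14"),
}

# (3), read as two steps (our interpretation): substitute the initial tone via
# the circle falling → dipping → rising → falling, then spread the substitute.
WUXI_SUBSTITUTE: Dict[str, str] = {"53": "323", "323": "34", "34": "53"}
WUXI_SPREAD: Dict[str, Tuple[str, str]] = {
    "323": ("43", "34"),  # dipping contour spread over two syllables
    "34": ("33", "44"),   # rising
    "53": ("55", "31"),   # falling
}


def shanghai_disyllable(first: str, second: str) -> Tuple[str, str]:
    """Left-dominant sandhi: `second` is deliberately unused (it is neutralized)."""
    return SHANGHAI_SPREAD[first]


def wuxi_disyllable(first: str, second: str) -> Tuple[str, str]:
    """Substitution followed by rightward spreading; `second` is neutralized."""
    return WUXI_SPREAD[WUXI_SUBSTITUTE[first]]


print(shanghai_disyllable("52", "14"))   # ('55', '31')
print(wuxi_disyllable("53", "34"))       # ('43', '34')
```

Splitting Wuxi into a substitution table and a spreading table mirrors the distinction drawn in the productivity studies discussed next, where the two components behave differently.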

The phonetic arbitrariness and complexity of the synchronic tone sandhi patterns raise the question of whether all of these patterns are equally productive and learnable for speakers. This question has been investigated using ‘wug’ tests in a long series of studies since the 1970s. For instance, Hsieh (1970, 1975, 1976), Wang (1993), and Zhang et al. (2011a) have shown that the circular chain shift in Taiwanese Southern Min (a pattern very similar to that of Xiamen in (1b)) is not fully productive. Yan and Zhang (2016) and Zhang and Meng (2016) compared Shanghai and Wuxi tone sandhi and showed that the Shanghai pattern is generally productive; for Wuxi, the spreading aspect of the tone sandhi is likewise productive, but the substitution aspect is unproductive due to its circular-chain-shift nature.

The relevance of phonetic naturalness to tone sandhi productivity has also been investigated in non-chain-shift patterns. Zhang and Lai (2010), for instance, tested the productivity difference between the phonetically less natural third-tone sandhi and the more natural half-third sandhi in Standard Chinese and showed that, although both apply consistently to novel words, the former applies incompletely at the phonetic level and is thus less productive. In general, the productivity studies of tone sandhi demonstrate that, to understand how native speakers internalize the complex sandhi patterns in their language, we need to look beyond the sandhi patterns manifested in the lexicon and consider methods that tap more directly into the speakers’ tacit generalizations. In our current understanding, the synchronic grammar of tone sandhi likely includes both productive derivations from the base tone to the sandhi tone and allomorph listings of sandhi tones, depending on the nature of the sandhi.

22.4 Lexical and phrasal stress in Chinese languages

We begin with word stress. All monosyllabic content words in Chinese occur in heavy syllables, are long and stressed, and have a lexical tone, such as lian214 脸 ‘face’. Function words can have stress and carry a lexical tone, too, but they often occur in light syllables, are short and unstressed, and have no lexical tone, such as the aspect marker le. The pattern is captured by the generalizations in (4) and (5).

(4) Metrical structure in monosyllables (Hayes 1995)
     A heavy syllable has two morae, forms a moraic foot, and is stressed.
     A light syllable has one mora, cannot form a foot, and has no stress.

(5) The Tone-Stress Principle (Liberman 1975; Goldsmith 1981; Duanmu 2007)
     A stressed syllable can be assigned a lexical tone.
     An unstressed syllable is not assigned a lexical tone.

In two-syllable words or compounds, stress patterns are more complicated. Three degrees of stress can be distinguished, represented in (6) and (7) as S (strong), M (medium), and L (light or unstressed). Tones are omitted, since they differ from dialect to dialect.

(6) Stress patterns in final positions
Variety    Stress type   Example in Pinyin   Gloss
Beijing    MS (67%)      da-xue 大学          ‘university’
           SM (17%)      bao-dao 报道         ‘report’
           SL (14%)      ma-ma 妈妈           ‘mom’
Chengdu    SM            da-xue 大学          ‘university’
           SL            ma-ma 妈妈           ‘mom’
Shanghai   SL            da-xue 大学          ‘university’


(7) Stress patterns in non-final positions
Variety    Stress type   Example in Pinyin   Gloss
Beijing    SM            da-xue 大学          ‘university’
           SL            ma-ma 妈妈           ‘mom’
Chengdu    SM            da-xue 大学          ‘university’
           SL            ma-ma 妈妈           ‘mom’
Shanghai   SL            da-xue 大学          ‘university’

Stress in Beijing is analysed based on Yin (1982). Stress in Chengdu has a robust phonetic realization in syllable duration (Ran 2011). Stress in Shanghai is realized in both syllable duration (Zhu 1995) and tone sandhi (Xu et al. 1988). In Chengdu and Shanghai, the stress patterns remain the same whether the position is final or non-final. In Beijing, however, MS is found in final position only, and it changes to SM in non-final positions. For example, da-XUE ‘university’ is MS when final but SM when non-final, as in DA-xue jiao-SHI ‘university teacher’ (uppercase indicates S). The patterns raise the three questions given in (8).

(8) Three questions to explain
a.   Out of nine possible combinations (SS, SM, SL, MS, MM, ML, LS, LM, and LL), why are only SM and SL found in non-final positions?
b.   Why is MS found in final position only?
c.   How do we account for dialectal differences?

(8a) is explained by (9), which allows (SM) and (SL) but not *MM, *ML, *LM, *LL (no main stress), or *SS (two main stresses).

(9) Constraint on stress patterns in non-final positions
     Chinese has syllabic trochees.

(8b) is explained by (10), where 0 is an empty beat, which is realized as either a pause or lengthening of the preceding syllable.

(10) Stress shift
     (SM) → M(S0) / __ #

(8c) is related to the complexity of syllable rimes. As shown in (11), Beijing has the most complex rimes and Shanghai the simplest.

(11) Rime complexity
Variety    Diphthongs   [-n -ŋ] contrast
Beijing    Yes          Yes
Chengdu    Yes          No
Shanghai   No           No

Rime complexity can explain differences in stress patterns: (12a) explains why stress shift occurs in Beijing but not in Chengdu or Shanghai. (12b) explains why Shanghai has S and L but no M.


(12) Stress and rime complexity
a.   Stress shift occurs in languages that have both diphthongs and contrastive codas.
b.   Languages without diphthongs or contrastive codas have no inherent heavy syllables.

There is a common view that a language can only choose one foot type (Hayes 1995), which seems to contradict our assumption that Chinese has both moraic trochees and syllabic trochees. However, a standard assumption in metrical phonology is that multiple tiers of metrical constituents are needed, such as in the analysis of main and secondary word stress in English. The foot type of a language is simply the foot type at the lowest level of the metrical structure. Our analysis suggests that the lowest metrical tier in Chinese is the moraic foot.

Monomorphemic words longer than two syllables are mostly foreign names, in which binary feet are built from left to right. Some examples in Shanghai are shown in (13), transcribed in Pinyin, where uppercase indicates stress.

(13) Stress in polysyllabic foreign names in Shanghai
     ZI-jia-ge 芝加哥 ‘Chicago’
     DE-ke-SA-si 德克萨斯 ‘Texas’
     JIA-li-FO-ni-ya 加利福尼亚 ‘California’
     JE-ke-SI-luo-FA-ke 捷克斯洛伐克 ‘Czechoslovakia’

Let us now consider phrasal stress. Chomsky and Halle (1968) proposed two cyclic rules for English, shown in (14).

(14) Phrasal stress (Chomsky and Halle 1968)
     Nuclear Stress Rule: In a phrase [A B], assign stress to B.
     Compound Stress Rule: In a compound [A B], assign stress to B if it is branching, otherwise assign stress to A.

The rules have been reinterpreted as a single rule and extended to other languages (Gussenhoven 1983a, 1983c; Duanmu 1990, 2007; Cinque 1993; Truckenbrodt 1995; Zubizarreta 1998). In (15), X is a syntactic head and XP a syntactic phrase.

(15) Stress-XP (Truckenbrodt 1995)
     In a syntactic unit [X XP] or [XP X], XP is assigned phrasal stress.

Stress-XP can be noncyclic. A comparison of the Compound Stress Rule (CSR) and Stress-XP is shown in (16), with English compounds.

(16) A comparison of cyclic CSR and noncyclic Stress-XP
                          whale-oil lamp [[XP X] X]    law-school language-exam [[XP X][XP X]]
CSR, cycle 1              x on whale                   x on law, x on language
CSR, cycle 2              x on whale                   x on language
Stress-XP (one step)      x on whale                   x on law, x on language
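The contrast in (16) can be sketched as two functions over the same bracketed structure. The sketch assumes a toy tree encoding of our own: a leaf is (label, word), with the XP/X labels taken from the brackets in (16), and a branching node is a pair of subtrees. The function names are hypothetical, and csr_main_stress returns only the word with the strongest stress rather than the full grid of stress levels.

```python
from typing import List


def is_leaf(node: tuple) -> bool:
    """A leaf is (label, word); a branching node is (left, right)."""
    return isinstance(node[0], str)


def csr_main_stress(node: tuple) -> str:
    """Cyclic Compound Stress Rule: in [A B], stress B if B branches, else A.
    Returns the word that ends up with the strongest stress."""
    if is_leaf(node):
        return node[1]
    a, b = node
    return csr_main_stress(b) if not is_leaf(b) else csr_main_stress(a)


def stress_xp(node: tuple) -> List[str]:
    """Non-cyclic Stress-XP: every constituent labelled XP gets phrasal stress
    in a single pass; heads (X) get none."""
    if is_leaf(node):
        return [node[1]] if node[0] == "XP" else []
    a, b = node
    return stress_xp(a) + stress_xp(b)


whale_oil_lamp = ((("XP", "whale"), ("X", "oil")), ("X", "lamp"))
law_school_language_exam = ((("XP", "law"), ("X", "school")),
                            (("XP", "language"), ("X", "exam")))

print(csr_main_stress(law_school_language_exam))  # language
print(stress_xp(law_school_language_exam))        # ['law', 'language']
```

The recursion in csr_main_stress has to descend cycle by cycle to find the winner, whereas stress_xp only needs the labels, which is the one-step property the text attributes to Stress-XP.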


On cycle 1, CSR assigns stress to the left in whale-oil, law-school, and language-exam. On cycle 2, CSR assigns stress to whale-oil (because lamp is not branching) and language-exam (because it is branching). In contrast, Stress-XP assigns stress to each XP in one step. There are three differences between the analyses. First, as Gussenhoven (1983a, 1983b) notes, Stress-XP achieves the result in one step, while CSR cannot. Second, CSR produces many stress levels, while Stress-XP produces far fewer, in support of Gussenhoven (1991). Third, in law-school language-exam, CSR assigns more stress to language, while Stress-XP assigns equal stress to law and language. In what follows, we shall consider Stress-XP only, since it is a simpler theory and can account for all the Chinese data.

Now, consider a compound and a phrase in Shanghai Wu (Xu et al. 1988; Duanmu 1999), shown in (17). The foot/weight tier shows foot boundaries and syllable weight (H for heavy, L for light, and 0 for an empty beat). On the tone tiers, H means high and L means low.

(17) A compound and a phrase in Shanghai Wu
                   Compound                   Verb phrase
Stress             x on tso                   x on ve
Syntax             [XP X]                     [X XP]
Foot/weight        (HL)                       H(H0)
IPA                tso ve                     tso ve
Underlying tone    LH-LH                      LH-LH
Surface tone       L-H                        LH-LH
Character          炒饭                        炒饭
Gloss              ‘fry-rice (fried rice)’    ‘fry rice (to fry rice)’

In the compound, ‘fry’ has phrasal stress, and ‘rice’ has no stress and loses its underlying tones. The underlying tones of ‘fry’ are then split between the syllables. In the phrase, ‘rice’ has phrasal stress, whereas ‘fry’ does not but remains heavy, because no expression in Chinese starts with a light syllable. The three degrees of length, L, H, and (H0), are quite clear phonetically (Zhu 1995: tab. [10L] and [10M]). Next, we consider speech style, shown in (18). Of interest is the fact that in careful speech both expressions form two domains, but in casual speech the compound can reduce to one domain, while the verb phrase stays with two domains (Xu et al. 1988).3

(18) Speech style in a compound and a phrase in Shanghai Wu
Careful            Compound                   Verb phrase
Stress             x x                        x x
Syntax             [[XP X][XP X]]             [[XP X][XP X]]
Foot/weight        (HL)(HL)                   (HL)(HL)
IPA                nø-ʨĩ da-ɦoʔ               kʰo-zɑ̃ da-ɦoʔ
Underlying tone    LH-HL LH-LH                LH-LH LH-LH
Surface tone       L-H L-H                    L-H L-H

Casual             Compound                   Verb phrase
Stress             x                          x
Syntax             [XP X]                     [X XP]
Foot/weight        (HL) LL                    (HL)(HL)
IPA                nø-ʨĩ da-ɦoʔ               kʰo-zɑ̃ da-ɦoʔ
Underlying tone    LH-HL LH-LH                LH-LH LH-LH
Surface tone       L-H 0-0                    L-H L-H

Character          南京大学                    考上大学
Gloss              ‘South-capital big-school (Nanjing University)’    ‘exam-enter big-school (enter university by exam)’

3 As a reviewer pointed out, the difference between casual and careful styles may be gradient. Nevertheless, the tonal difference between typical casual and careful styles is quite noticeable, as described by Xu et al. (1988).
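The surface tones in (17) and (18) follow from a simple mapping once the tonal domains are known: within a domain only the initial syllable's melody survives, and its tones are mapped one per syllable from the left, so non-initial syllables get no tone of their own. The sketch below assumes single-letter tones (H, L), takes the domain division as given rather than deriving it from Stress-XP, and uses hypothetical function names; "0" marks a toneless syllable, as in (18).

```python
from typing import List


def map_domain(tones: List[str]) -> List[str]:
    """Map the first syllable's melody onto one tonal domain, left to right."""
    melody = tones[0]                  # e.g. "LH": one letter per tone
    n = len(tones)
    out = []
    for i in range(n):
        if i >= len(melody):
            out.append("0")            # no tone left: toneless syllable
        elif i == n - 1:
            out.append(melody[i:])     # last syllable takes any remaining tones
        else:
            out.append(melody[i])
    return out


def surface(domains: List[List[str]]) -> List[str]:
    """Apply map_domain to each domain and concatenate the results."""
    return [t for domain in domains for t in map_domain(domain)]


# (17): compound = one domain; verb phrase = two one-syllable domains.
print(surface([["LH", "LH"]]))                 # ['L', 'H']
print(surface([["LH"], ["LH"]]))               # ['LH', 'LH']
# (18), Nanjing University: careful (two domains) versus casual (one domain).
print(surface([["LH", "HL"], ["LH", "LH"]]))   # ['L', 'H', 'L', 'H']
print(surface([["LH", "HL", "LH", "LH"]]))     # ['L', 'H', '0', '0']
```

Because the mapping is fixed, the only thing that differs between careful and casual speech in this sketch is the domain division passed in, which matches the text's point that style changes the number of XPs, not the tone rules.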

In careful speech, each expression has two XPs; each XP yields a stress and a tonal domain, as expected. In casual speech, we may assume that each disyllabic unit is treated as a single word. As a result, the compound is now [XP X], with just one XP and one stress (on ‘South-capital’). The verb phrase is now [X XP], with phrasal stress on the object still. The verb has no phrasal stress but gets stress from a separate requirement that a Chinese expression cannot start with an unstressed syllable. Before we end this section, let us consider the role of function words. An example is shown in (19), where [jɪʔ] can be an article ‘a’ or a numeral ‘one’.

(19) Example with [jɪʔ] ‘a/one’ in Shanghai Wu
                 一 [jɪʔ] as ‘a’           一 [jɪʔ] as ‘one’
Feet/weight      (HL)L(H0)                 H(HL)(H0)
Feet/words       (ma jɪʔ) po (se)          ma (jɪʔ po) (se)
Character        买一把伞                   买一把伞
Gloss            ‘buy a cl umbrella’       ‘buy one cl umbrella’

When 一 [jɪʔ] means ‘a’, it is not an XP and hence unstressed (the classifier cl 把 [po] is not an XP either). When 一 [jɪʔ] means ‘one’, it is an XP (numeral phrase) and stressed. The verb 买 [ma] ‘buy’ gets stress, again because a Chinese expression cannot start with a light syllable. Finally, 伞 [se] ‘umbrella’ is an XP (noun phrase) and is always stressed. Selkirk and Shen (1990) used (19) to exemplify a phonology–syntax mismatch. A metrical analysis can explain how the mismatch occurred. The discussion above barely scratches the surface of metrical effects in Chinese, but we hope the reader can see that (i) stress plays a crucial role in Chinese phonology and (ii) the fundamental metrical principles in Chinese are the same as in other languages.
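The left-to-right binary footing of the foreign names in (13) can be checked with a short sketch. It assumes, as the forms in (13) suggest, that each foot is a syllabic trochee stressed on its first syllable and that an odd final syllable stays unfooted and unstressed; the function name is hypothetical.

```python
from typing import List


def foot_foreign_name(syllables: List[str]) -> str:
    """Build binary feet from left to right and stress the first syllable of
    each foot; a leftover final syllable is unfooted and unstressed.
    Stressed syllables are written in uppercase, as in (13)."""
    out = []
    for i, syl in enumerate(syllables):
        heads_a_foot = i % 2 == 0 and i + 1 < len(syllables)
        out.append(syl.upper() if heads_a_foot else syl.lower())
    return "-".join(out)


print(foot_foreign_name(["jia", "li", "fo", "ni", "ya"]))   # JIA-li-FO-ni-ya
```

The odd-length names in (13), ZI-jia-ge and JIA-li-FO-ni-ya, are what motivate leaving the last syllable unfooted rather than giving it a degenerate stressed foot.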

22.5 Intonation in Chinese languages

Intonation in Chinese is present in every utterance and serves diverse linguistic and paralinguistic functions beyond word meanings. It marks the prosodic organization of an utterance (Li et al. 2011) and helps to organize its information structure (Chen et al. 2016). It also signals the make-up of a discourse (Tseng et al. 2005; Yang and Yang 2012) and regulates turn-taking between interlocutors (Levow 2005). Moreover, speakers employ intonation to perform speech acts (Ho 1977) as well as to express emotional states and attitudes (Liu and Pell 2012; Li 2015). Both intonation and lexical tone involve the modification of various acoustic aspects of the speech signal, but their primary correlate is fundamental frequency (f0). The multiplexing of the f0 channel raises the intriguing question of how exactly utterance-level intonation and word-level lexical tones interact in Chinese. Early discussions of Chinese intonation include Chao (1933, 1968), Gårding (1987), Shen (1989a), Shen (1992), Cao (2002), Lin (2004), and Wang and Shi (2011). To date, most quantitative studies of intonation have focused on the f0 marking of focus and interrogativity.

22.5.1 Focus

Focus refers to the highlighting of information that speakers intend to bring to the discourse, as opposed to other alternatives. It is an important strategy that languages adopt for efficient speech communication. In answer to the question of who teaches linguistics (20a), 玛丽FOC in (20b) would be focused (indicated with FOC) and uttered with prominence, implying that among the set of possible teachers, it is MARY who teaches linguistics.

(20)
a. 谁教语言学?
   Shui jiao yuyanxue
   who teach linguistics
   ‘Who teaches linguistics?’
b. 玛丽FOC教语言学
   Mali jiao yuyanxue
   Mary teach linguistics
   ‘{MARY}FOC teaches linguistics.’

Focal prominence in Chinese is cued via an ensemble of acoustic variations. These include not only the distinctive realization of lexical tone contours and durational lengthening (Chen 2003, 2006; Chen and Gussenhoven 2008) but also higher intensity (Shih 1988; Chen et al. 2014) and hyperarticulated segmental contrasts (Chen 2008). What has been of great interest is the underlying mechanism that leads to the f0 marking of focus. Gårding et al. (1983) were the first to adopt the notion of a pitch range grid to describe f0 modifications of tones under emphasis in Standard Chinese: an expanded range for focus and a compressed range out of focus. This pattern has been repeatedly observed in subsequent studies (e.g. Jin 1996; Xu 1999; Yuan 2004). It led to the view that focus is encoded via tri-zone pitch range control: expansion under focus, compression after focus, and little or no change before focus (Xu and Xu 2005). Both within- and cross-dialectal variation in focus-induced f0 adjustments has been reported. Shanghai Wu, for example, has five lexical tones, two of which are rising tones differing mainly in pitch register. Chen (2009d) showed that syllables with the low-register rising tone do show significant f0 range expansion under focus, but syllables with the high-register rising tone do not. This is presumably to keep the two rising tones distinct: significant f0 range expansion of both would cause their f0 ranges to overlap, making them less distinguishable. Taiwanese Southern Min is another Chinese variety that lacks consistent focus-induced f0 range expansion across lexical tones. Pan (2007) showed that f0 raising is clearly present in the HH and HL tones but not in the MM or ML tones. Taiwanese thus parallels Shanghai Wu in that focus-related f0 modification is robust only when it does not sacrifice the distinctive realization of lexical tonal contrasts. The two studies converge on the importance of lexical tonal properties (such as the role of f0 register for tonal contrasts) for focus-induced f0 range manipulations. In the post-focus position, a range of languages spoken in China has been documented to lack f0 compression. These include Cantonese (Gu et al. 2006; Gu and Lee 2007), Taiwanese, and Taiwan Mandarin (Xu et al. 2012), as well as Wa, Deang, and Yi (Wang et al. 2011). Chen (2010b) reported a lack of compression in certain tonal contexts even in Standard Chinese. Chen argued that post-focus f0 realization is conditioned by lexical tonal properties and by the weak implementation of tonal targets in the post-focus condition. Xu et al. (2012) hypothesized that post-focus compression (PFC) has a single origin that evolved into a typological divide between languages with and without PFC. This study and its follow-up research were typically based on speakers’ production of a single stimulus sentence, and therefore a limited tonal context, so larger-scale investigations are needed to test this hypothesis further. Focus f0 expression is also sensitive to higher-level prosodic organization, as is evident in Wu dialects. In Wenzhou Wu Chinese, a disyllabic word serves as the domain for tone sandhi. When only one syllable within the disyllabic sandhi domain is contrastively focused, f0 range expansion is quite uniformly distributed over the entire disyllabic domain (Scholz 2012; Scholz and Chen 2014).
Shanghai Wu shows a similar sensitivity of focus to the tone sandhi domain, in addition to other lexical prosodic properties (Chen 2009d; Ling and Liang 2017). Work on how focus interacts with prosodic phrasing is overdue. Two findings are compatible with the ‘prominence-marking’ view of focus expression explored in Chen (2003, 2010b) and Chen and Gussenhoven (2008): the multiple cues for focal prominence, and the within- and cross-dialectal variation in focus-induced f0 range manipulations. (See Chen 2012 and references therein for a cross-linguistic perspective.) This view holds that the phonological reflex of focus is prosodic prominence, whose phonetic expression is contingent upon lexical and prosodic properties of the focused constituent. Focus-induced f0 range manipulation is one important means of signalling focus. Other f0 adjustments, such as a delayed f0 rise or fall in the rising and falling tones, may also be prioritized. As an ensemble, these adjustments ensure that lexical tones are produced with enhanced distinctiveness of their characteristic f0 contours. Moreover, focus-induced f0 variation is largely independent of f0 variation for lexical tones, but the two do interact. This is evident in the dialects where focus-related f0 adjustments are absent or compromised.
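To make the tri-zone pitch range control of Xu and Xu (2005) concrete, the sketch below scales each syllable's f0 excursion around a fixed reference value. The scaling depends on whether the syllable precedes, carries, or follows focus. The reference value and scaling factors are hypothetical illustration values, not estimates from any study. Treating post-focus compression as pure range scaling, with no overall lowering, is a simplifying assumption.

```python
from typing import List

Contour = List[float]  # f0 samples in Hz for one syllable


def tri_zone(contours: List[Contour], focus_index: int,
             reference_hz: float = 200.0,
             expand: float = 1.5, compress: float = 0.5) -> List[Contour]:
    """Apply tri-zone pitch range control to a sequence of syllable contours.

    Pre-focus syllables are left unchanged, the focused syllable's excursion
    around reference_hz is expanded, and post-focus excursions are compressed.
    All numeric defaults are hypothetical.
    """
    if not 0 <= focus_index < len(contours):
        raise IndexError("focus_index must point at one of the syllables")
    scaled = []
    for i, contour in enumerate(contours):
        if i < focus_index:
            factor = 1.0       # pre-focus: little or no change
        elif i == focus_index:
            factor = expand    # on-focus: range expansion
        else:
            factor = compress  # post-focus: range compression
        scaled.append([reference_hz + factor * (f0 - reference_hz) for f0 in contour])
    return scaled


def f0_range(contour: Contour) -> float:
    """Size of the f0 excursion of one syllable, in Hz."""
    return max(contour) - min(contour)


if __name__ == "__main__":
    utterance = [[220.0, 180.0], [180.0, 240.0], [240.0, 160.0]]
    out = tri_zone(utterance, focus_index=1)
    print(out)                         # [[220.0, 180.0], [170.0, 260.0], [220.0, 180.0]]
    print([f0_range(c) for c in out])  # [40.0, 90.0, 40.0]
```

The dialect data reviewed above are the cases this uniform scaling gets wrong. Shanghai Wu high-register rising tones and the Taiwanese MM and ML tones show little or no on-focus expansion, and several varieties lack post-focus compression. A more faithful model would therefore need tone-specific factors rather than one factor per zone.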

22.5.2 Interrogativity

Generally speaking, interrogativity in Chinese is encoded via a global f0 raising with greater magnitude towards the end of the utterance, which is consistent with a cross-linguistic tendency (Cruttenden 1997; cf. Rialland 2007). Chinese dialects differ in the way the rising question intonation is implemented in production and used in perception. Cantonese faces competing f0 cues for lexical tone and question intonation. It favours cueing interrogativity with an utterance-final f0 rise, at the cost of tonal neutralization and misidentification. This is evident in the f0 realizations of the utterance-final lexical tones T21, T23, and T22, which become indistinguishable from the high rising tone T25 (Ma et al. 2006). The substantial effect of question intonation on lexical tone contours in Cantonese allows the local f0 rise to serve as a reliable cue for intonation perception (Ma et al. 2006). However, it also leads to a high error rate for utterance-final lexical tone identification, especially for the low and low-rising tones (Ma et al. 2011a). At the same time, listeners’ sensitivity to f0 raising over non-final syllables as a marker of interrogativity seems reduced (Xu and Mok 2012). Standard Chinese, in contrast, opts to cue lexical tones at the cost of potential intonation misidentification (Ho 1977; Shih 1988; Liu and Xu 2005). A falling tone at the end of a question maintains its falling f0 contour, although it does not fall as low as in declaratives. A rising tone at the end of a statement is realized with its characteristic rising f0 contour, but at a lower f0 level than in questions. Neither the global f0 raising nor the local f0 rise distorts the lexical tone contours. Cantonese thus shows a more direct mapping between phonetics (final f0 rise) and meaning (interrogativity). In Standard Chinese, the mapping is more obscure, because the phonetic implementation of the so-called question-induced final f0 ‘rise’ varies as a function of the utterance-final lexical tone. This makes the final syllable a less reliable f0 cue-bearer for interrogativity than in Cantonese. Listeners have been reported to tune in more to the pre-final f0 raising as an additional cue to question perception (Jiang and Chen 2016).
They can also identify lexical tones produced with both question and statement intonation quickly and with near-ceiling accuracy (Liu et al. 2016a). The recognition rate of intonation, however, is contingent on the identity of the utterance-final lexical tone. Lower identification accuracy and more variance have been reported for the rising tone in question intonation (Yuan 2011). Higher accuracy rates and faster responses have been found for the falling tone in statements (Liu et al. 2016a). Behavioural studies show cross-dialect differences in the interplay between lexical tone and intonation, and these differences are echoed in event-related potential (ERP) brain response data (Ren et al. 2009, 2013; Kung et al. 2014; Liu et al. 2016b). Together, these results suggest that dialects may differ, because of their different lexical tone systems, in the extent to which interrogative intonation is grammaticalized into a local boundary tone. More statistically robust data from more dialects and/or languages are essential to replicate these findings and to elucidate the range of possible interactions between tone and intonation.

22.6 The prosody of Siberian languages

Central Siberia is a linguistically complex region, with at least five genetically distinct language groups present: Samoyedic, Ob-Ugric, Yeniseic, Tungusic, and Turkic (Anderson 2004: 2). The prosody of these languages is understudied, and we focus here on the Yeniseic language Ket, for two reasons. One is that, like Chinese languages, Ket has lexical tones. The other is that relatively detailed descriptions of tone in the language exist (e.g. Werner 1997; Vajda 2000, 2003, 2004; Georg 2007). The following summary is based primarily on Vajda (2000, 2003).


Monosyllabic words spoken in isolation can have one of four tones in Ket: a high tone, a glottalized tone, a rising-falling tone (the tone rises, then falls), and a falling tone. Tonal pitch contours vary allophonically depending on the carrier syllable. On disyllabic or longer words, however, only two tonal melodies appear, on the two leftmost syllables of the word. One is a rising-falling contour, with the peak on the rising portion. The other is a rising-(high-falling) contour, with the peak on the falling portion and a less pronounced fall. These are considered allotones of the monosyllabic rising-falling tone and high tone, respectively. The remaining syllables are ‘tonally neutral’ (Vajda 2003: 407). When a monosyllabic root is followed by a syllabic suffix, the tone of the suffix, to some extent, predicts the disyllabic melody of the word. For instance, a rising-(high-falling) contour is more likely when the suffix has a rising-falling tone, whereas a rising-falling contour is more likely when the suffix has a high tone (Vajda 2003: 409). But there are many exceptions. In phonological phrases, by contrast, the monosyllabic tones are preserved. From these facts, Vajda (2000, 2003) concluded that the domain of tone in Ket is the word rather than the syllable. This description of Ket tone is strikingly similar to that of the tonal systems of Northern Wu dialects of Chinese, which have a similar word/phrase distinction and word-level tones. Vajda (2003: 416) recognizes this affinity between Ket and Chinese languages in the nature of tone. Word- and phrase-level stress in Ket has not been described in the literature. Therefore, whether its tone patterns are related to stress patterns, as in dialects of Chinese (see §22.4), remains an open question. The intonation patterns of Ket are also understudied, and a comprehensive survey remains to be conducted (Vajda 2003: 411).
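Vajda's description treats tone as a property of the whole word rather than of each syllable. The sketch below encodes that idea: a word stores one tone, and its realization depends only on how many syllables the word has. The tone labels follow the description above, and the function name is hypothetical. Rejecting the glottalized and falling tones on polysyllables is an assumption, drawn from the statement that only two melodies appear there. The probabilistic influence of suffix tone is not modelled.

```python
from typing import List, Tuple

MONOSYLLABIC_TONES = {"high", "glottalized", "rising-falling", "falling"}

# Melodies on polysyllabic words, treated as allotones of monosyllabic tones.
POLYSYLLABIC_ALLOTONE = {
    "rising-falling": "rising-falling (peak on rising portion)",
    "high": "rising-(high-falling) (peak on falling portion)",
}


def realize_word_tone(tone: str, n_syllables: int) -> Tuple[str, List[int]]:
    """Return the melody of a word and the syllable positions that carry it.

    Syllable positions not returned are 'tonally neutral'.
    """
    if tone not in MONOSYLLABIC_TONES:
        raise ValueError(f"unknown tone: {tone!r}")
    if n_syllables < 1:
        raise ValueError("a word needs at least one syllable")
    if n_syllables == 1:
        return tone, [0]
    if tone not in POLYSYLLABIC_ALLOTONE:
        # Assumption: only two melodies occur on polysyllabic words.
        raise ValueError(f"{tone!r} has no attested polysyllabic allotone")
    # The melody spans the two leftmost syllables; the rest are neutral.
    return POLYSYLLABIC_ALLOTONE[tone], [0, 1]


if __name__ == "__main__":
    print(realize_word_tone("glottalized", 1))
    print(realize_word_tone("high", 4))
```

Because the tone belongs to the word, adding syllables changes only how much of the word is neutral, not which melody appears. The preservation of monosyllabic tones in phonological phrases would need a separate phrase-level representation, which this sketch does not attempt.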

22.7 Summary

Chinese languages and Siberian languages such as Ket are known for being tonal. The scholarship on the prosody of these languages has contributed to our understanding of the typology of prosodic systems in a number of important ways, three of which we have focused on in this chapter. First, tonal alternation patterns, generally known as ‘tone sandhi’ in the Chinese context, are typologically diverse and often complex. Experimental studies on the productivity of different types of tonal alternation can shed light on two questions: the learnability of phonological alternations, and the nature of the synchronic grammar in languages with complex alternations in general. Second, although the tonal aspect of these languages may obscure stress cues, their metrical structure shows striking similarities with that of non-tone languages. Third, the interaction between intonation and tone shows that intonational realization in tone languages is influenced not only by the general tonal nature of the language but also by the specific tonal contrasts it uses. It is hoped that future work on the prosody of languages in these regions, particularly understudied languages and dialects, will continue to explore the ways in which tone interacts with other aspects of the prosodic and grammatical systems.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 23

Mainland South East Asia

Marc Brunelle, James Kirby, Alexis Michaud, and Justin Watkins

23.1 Scope of the chapter

Mainland South East Asia (MSEA), often defined as a Sprachbund, is a linguistic area where languages from five different phyla (Austroasiatic, Austronesian, Hmong-Mien, Kra-Dai, and Sino-Tibetan) converge and have developed similar structures (Matisoff 1973; Alieva 1984; Enfield 2003, 2005).¹ While convergence processes are easy to identify in the region, its geographical boundaries are ill-defined, and one should not understate its typological diversity (Henderson 1965; Brunelle and Kirby 2016; Kirby and Brunelle 2017). In this chapter, we cover the area encompassing the Indochinese Peninsula (Vietnam, Cambodia, Laos, Thailand, Myanmar, and Malaysia). We also include Guangxi and Yunnan in southern China (excluding Chinese varieties, which are covered in chapter 22) and northeast India. As Austronesian languages are covered in chapter 25, our discussion of this phylum is limited to Chamic languages spoken in Vietnam and Cambodia and to Austronesian languages of the Malay Peninsula. Our main goal is to give an overview of representative types of word-level (§23.2) and phrase-level (§23.3) prosody, highlighting areas of convergence between families without understating their diversity.

23.2 Word-level prosody

In this section, we first discuss the most common word shapes and stress patterns found in MSEA (§23.2.1). As these two properties are largely interdependent, they are discussed together. We then give an overview of the diverse tonation systems of the region (§23.2.2).

¹ Indo-European and Dravidian languages are also spoken by sizeable language communities in Burma, Malaysia, and Northeast India, but are not covered in this chapter.


23.2.1 Word shapes and stress

The basic vocabulary of many MSEA languages is monosyllabic. This is the case in most Kra-Dai and Hmong-Mien languages, but also in Vietnamese, an Austroasiatic language. However, in most of these languages, a significant part of the lexicon is made up of compounds, and most languages also have some polysyllabic loanwords. This can be illustrated with Vietnamese. The Vietnamese basic lexicon is largely monosyllabic, as illustrated in (1). Our transcriptions follow the conventions in Kirby (2011), except for the tone notation.

(1) Vietnamese monosyllables
đi [ɗi44] ‘to go’                  nghiêng [ŋiəŋ44] ‘to be leaning’
tuyết [tɥiət45] ‘snow’             ngoan [ŋwaːn44] ‘to be well-behaved’

However, Vietnamese has a significant proportion of non-monosyllabic words. According to Trần and Vallée (2009), 49% of its lexicon is disyllabic and 1% is trisyllabic. Native compounds (2) and reduplicants (3) make up most of the disyllabic vocabulary.

(2) Native Vietnamese compounds
nhà nghỉ [ɲa21 ŋi2̰1]       house + rest      ‘inn, low-end hotel’
kiếm ăn [kiəm45 an44]       search + eat      ‘to make a living’
bố mẹ [ɓo45 mɛ31ʔ]          father + mother   ‘parents’
vui tính [vuj44 tiŋ̟45]      happy + temper    ‘to be good-tempered’

(3) Vietnamese reduplicants
bạn bè [ɓaːn31ʔ ɓɛ21]       friends + RED     ‘friends’
tim tím [tim44 tim45]        RED + purple      ‘purplish’

Vietnamese also has a large number of compounds whose morphemes are borrowed from Chinese. These often have opaque semantics and, as such, seem better analysed as polysyllables (4). A significant number of loanwords from other languages are also polysyllabic, even if monomorphemic (5). In addition, a number of native Austroasiatic words seem to constitute polysyllabic morphemes, although this is rarely pointed out in the literature. Examples include tắc kè [tak45 kɛ21] ‘gecko’ and thọc lét [thɔk͡p31 let45] ‘to tickle’.

(4) Opaque Sino-Vietnamese compounds
tuần lộc 馴鹿 [twɤ̆n21 lok͡p31]   docile + deer          ‘elk, reindeer’
thái độ 態度 [tʰaːj24 ɗo31ʔ]      appearance + degree    ‘behaviour’

(5) Vietnamese monomorphemic polysyllables (loanwords)
ban công [ɓaːn44 koŋ͡m44]                 ‘balcony’ (< French balcon)
phô tô cóp pi [fo44 to44 kɔk͡p45 pi44]    ‘photocopy’ (< French photocopie)
Trà Vinh [tɕa21 viŋ̟44]                   place name (< Khmer [Preah] Trapeang)

By definition, monosyllabic words cannot bear paradigmatic or syntagmatic word stress. However, even in languages whose core lexicon is monosyllabic, polysyllabic words can have fairly complicated stress patterns. Vietnamese polysyllables do not seem to show any type of word-level prominence (Brunelle 2017). By contrast, the Indic loanwords of many South East Asian languages have alternating stress systems that are not necessarily attested in their native lexicon (Luangthongkum 1977; Potisuk et al. 1994, 1996; Green 2005). For instance, polysyllabic Thai words show a tendency towards alternating iambic stress, stress clash avoidance, and the application of the stress-to-weight principle, as illustrated in (6).

(6) Stress in Thai polysyllabic words (examples adapted from Luangthongkum 1977: 199)
โทรทัศน์ ˌthoːrəˈthát                 ‘television’
มะเร็งในเม็ดโลหิต məˌreŋnəiˌmétloˈhìt   ‘leukaemia’
ไวยากรณ์ปริวัตร ˌwaijəˌkɔːnpəriˈwát     ‘transformational grammar’

In these Thai polysyllables, stress is realized primarily through longer duration. The tones of stressed syllables are also realized more fully, while those of unstressed syllables are raised and partially neutralized (Potisuk et al. 1996). Many MSEA languages also have a canonical ‘sesquisyllabic’ word shape, a structure typical of the region. The concept of the sesquisyllable seems to be attributable to Henderson (1952), but the term was coined by James Matisoff (1973) to designate words containing ‘one syllable and a half’. Generally speaking, a sesquisyllable is a disyllable with an iambic stress pattern. Its unstressed first syllable is called the ‘minor syllable’ or the ‘presyllable’. It has a reduced phonological inventory and a limited array of possible syllable structures. Its stressed second syllable has the full array of possible contrasts of the language and can have a more complex syllable structure. Sesquisyllables show variation across languages and sometimes even within languages. Thomas (1992) argues that there are four types of sesquisyllable. In the first type, a fully predictable schwa is inserted in some clusters, as in the Khmer word [kəɓaːl] ‘head’, which is underlyingly /kɓaːl/.
Most authors consider such cases to be monosyllables rather than sesquisyllables and treat their schwa as an excrescent vowel (Thomas 1992; Butler 2014). The second type of sesquisyllable consists of iambic disyllables in which the first vowel is a schwa and the CəC- sequence contrasts with corresponding CC- clusters. Examples from Jeh, an Austroasiatic language of the Central Vietnamese Highlands, are given in (7).

(7) Jeh sesquisyllables (Gradin 1966)
trah ‘to chop out’      təˈrah ‘to squawk (of chicken)’
khej ‘month’            kəˈhej ‘moon’

The third and fourth types of sesquisyllable distinguished by Thomas (1992) are qualitatively similar. They consist of sesquisyllables whose minor syllables can contain only a subset of the vowels that can appear in the main syllables. Examples from Northern Raglai, an Austronesian language of south-central Vietnam, are given in (8). Northern Raglai has six phonemic vowels that contrast in length and nasality, but only three are allowed in minor syllables.

(8) Northern Raglai (Nguyễn 2007)
piˈtuk ‘cough’
paˈtih ‘thigh’
buˈmaw ‘mushroom’

Interestingly, the trochaic mirror image of sesquisyllables, namely disyllables with an initially stressed syllable and a reduced second syllable, does not seem to be attested in MSEA.

Many languages of the area also have a non-sesquisyllabic polysyllabic structure as their canonical word shape. One example is Malay, a language that tends to have disyllabic roots but can have much longer grammatical words because of affixation or loans from Indic or Western languages. Careful analysis strongly suggests that Peninsular Malay does not have word stress (Mohd Don et al. 2008). Many Sino-Tibetan languages can also be shown to be polysyllabic because segmental or tonal processes affect their prosodic words. In Qiang and Shixing, for instance, the lenition of word-medial consonants provides positive evidence for polysyllabic prosodic words (LaPolla and Huang 2003: 31–32; K. Chirkova 2009: 12–13).

23.2.2 Tonation

Many South East Asian languages employ one or more contrastive laryngeal properties that we term ‘tonation’ (following Bradley 1982). This includes not only the use of pitch but also properties such as vowel quality, voice quality, intensity, and/or duration. The extent to which it is useful to sub-typologize languages according to exactly which property or properties they (canonically) employ remains a matter of some debate (Abramson and Luangthongkum 2009; DiCanio 2009; Enfield 2011; Gruber 2011; Brunelle and Kirby 2016). Despite this, we have broadly organized the following sections by phonetic property in order to emphasize the diversity and phonetic variability of the region’s word-level prosodic systems.

23.2.2.1 Inventories

Around 20% of the languages spoken in MSEA are completely atonal (Brunelle and Kirby 2015). These languages are virtually all of either Austronesian or Austroasiatic stock. Diversity is greater in Austroasiatic languages, while the Austronesian languages of MSEA are either atonal or have simple tonation-type properties.² Many languages of the area, especially in the Austroasiatic and Austronesian phyla, have been described as having ‘registers’. Henderson (1952) was the first author to use the term ‘register’ for a ‘bundle’ of (broadly suprasegmental) features, such as phonation type, pitch, vowel quality, intensity, and vowel duration. This led to the designation of (voice-)register languages in the South East Asian linguistic literature (Henderson 1952; Gregerson 1973; Ferlus 1979; Diffloth 1982). Register is normally understood to arise from the neutralization of voicing in onsets and the subsequent phonologization of phonetic properties originally associated with voicing. A hallmark of register systems is redundancy, in the sense that one can identify multiple co-occurring properties. The Austroasiatic language Mon is an example of a canonical register system relying on pitch and phonation, but also on vowel quality and duration (Lee 1983; Diffloth 1985; Luangthongkum 1987; Abramson et al. 2015). Another example is Wa, a Mon-Khmer language spoken in northeastern Myanmar and in the southwest of Yunnan province in China. Wa distinguishes two lexical registers termed ‘clear’ and ‘breathy’ (Watkins 2002).

² Tsat, a Chamic (Austronesian) language spoken in Hainan, has a fully fledged tone system (Maddieson and Pang 1993; Thurgood et al. 2015).

[Figure 23.1: for each word, waveform, spectrogram (Frequency (Hz), 0–8000), and pitch track (Pitch (Hz), 0–300), plotted against Time (s)]

Figure 23.1 Waveforms, spectrograms, and pitch tracks of the Wa words tɛʔ ‘land’ (clear register, left) and tɛ̤ʔ ‘wager’ (breathy register, right). The clear register is characterized by sharper, more clearly defined formants; the breathy register has relatively more energy at very low frequencies.

In Wa, vowels in the breathy register are characterized principally by breathier phonation, as opposed to the modal phonation of clear-register vowels, as illustrated in Figure 23.1. In addition, clear-register vowels typically have slightly higher pitch than breathy-register vowels. Vowel duration and vowel quality are mostly insignificant with respect to Wa register, though for some speakers these properties may also differ between the registers. The Wa register contrast applies independently of syllable-final /h/ and /ʔ/, making possible the set of distinct syllables in (9).

(9) Vowel register independent of laryngeal consonants in Wa
tɛ ‘sweet’       tɛ̤ ‘peach’
tɛʔ ‘land’       tɛ̤ʔ ‘swear’
tɛh ‘reduce’     tɛ̤h ‘turn over’

An outstanding question concerns the stability of register systems. These have frequently been seen to ‘restructure’ (Huffman 1976), or to move towards realizing the contrast by means of a single acoustic property. An apparently recent shift from register to a primarily pitch-based system has been documented for several dialects of Khmu (Suwilai 2004; Svantesson and House 2006; Abramson et al. 2007). Restructuring can also lead to the development of a large vowel inventory, as apparently occurred in the history of Khmer (Huffman 1976) or Haroi (Lee 1977; Mundhenk and Goschnick 1977). We can contrast registers with tone inventories based on pure pitch. By ‘pure’ pitch, we mean a system in which pitch is the only phonetic exponent of a suprasegmental tonation contrast. A good example of such a language in MSEA might be Southern Vietnamese (Gsell 1980; Vũ 1982; Brunelle 2009b). However, setting aside restructured register languages such as Khmu, it is not clear whether such systems actually exist. If they do, they may in fact be rather rare: it seems reasonable to assume that there are always at least low-level spectral effects present in ‘pure’ pitch systems. In any case, it is probably still possible to distinguish two kinds of tone system: those where these spectral effects are redundant, and those where they are a necessary element of the tone contrasts, as detailed in §23.2.2.2. A related issue concerns the phonological analysis of primarily pitch-based tone systems. The languages of sub-Saharan Africa provide compelling evidence for an analysis based on sequences of level tones (from two, High and Low, to as many as five levels; see chapter 12; see also chapter 4). In Asia, such systems appear to be significantly less common (see Evans 2008 for an overview), though cases do exist, such as Pumi (Jacques 2011; Daudey 2014; Ding 2014) and Yongning Na (Michaud 2017: 87–101). Evidence for this type of decompositional analysis comes primarily from morphotonological alternations (see §23.2.2.2). To our knowledge, these systems are restricted to Sino-Tibetan languages of the Himalayas, on the northern periphery of the area under consideration here. Analyses of other languages of South East Asia in terms of level tones have also been proposed (e.g.
Morén and Zsiga 2006 on Thai), but such proposals are challenging to evaluate in the absence of language-internal (morpho)phonological evidence (Clements et al. 2010). Finally, MSEA is home to a number of languages with complex tonation systems involving multiple phonetic properties. While there may be a certain amount of variation, a hallmark of such systems is the canonical co-occurrence of two or more phonetic properties. For example, three of the six tones in Northern Vietnamese are systematically realized with a laryngealized voice quality in sonorant-final syllables (Vũ 1982; Nguyễn and Edmondson 1997; Michaud 2004), and perceptual research has shown that the strong glottalization of the low glottalized tone is normally sufficient for identification, to the point of largely overriding pitch cues (Brunelle 2009b). Hmong-Mien languages also tend to exhibit systems of this type (Huffman 1987; Andruski and Ratliff 2000; Esposito 2012; Garellek et al. 2013, 2014). For example, Black Miao, a Hmong-Mien language spoken in Guizhou province, China, contrasts five level tones, but three of these tones are also respectively characterized by laryngealized, tense, or breathy phonation, all of which are important cues for accurate native-speaker discrimination (Kuang 2013b). Although strictly speaking outside MSEA proper, a number of Wu languages spoken in China also have mixed phonation and pitch tonation systems (Rose 1989). These languages are perhaps especially notable for employing ‘whisper’ and/or ‘growl’ phonation types, probably involving oscillation of epilaryngeal structures (Edmondson et al. 2001).

23.2.2.2 Tonal phonology, tone sandhi, and morphotonology

Tone serves a wide range of functions in the world’s languages. In addition to its phonemic function, it can mark grammatical categories, it can be assigned according to paradigm-specific rules, and it can even constitute the sole phonological form of a morpheme (grammatical tone; see chapter 4). In MSEA, the vast majority of Austroasiatic, Austronesian, and Tai-Kadai tone languages have ‘inert’ tones, that is, tones that are not active in phonology or morphology. Productive tonal processes are more commonly found in some Hmong-Mien and Sino-Tibetan languages. The first type of tone process found in the area is tone sandhi in its narrow sense: a tone turns into another contrastive tone in a specific tonal environment. For instance, White Hmong has seven tones, five of which undergo the permutations in (10) in most compounds and some phrases. This tone sandhi seems partly fossilized in contemporary White Hmong, but there is little doubt that it was productive at an earlier stage of the language (Ratliff 1987; Mortensen 2004).

(10) White Hmong tone sandhi (Ratliff 1987)
52, 22, 31ʔ → 42̤   / 55, 53 ___
24 → 33            / 55, 53 ___
33 → 22            / 55, 53 ___

Tone sandhi must be distinguished from tonal coarticulation, which could be characterized as phonetic accommodation between adjacent tones. Studies of tonal coarticulation in Central Thai and Vietnamese suggest two tendencies in these languages: progressive coarticulation is much stronger than regressive coarticulation, and assimilatory effects are more common than dissimilatory ones (Han and Kim 1974; Gandour et al. 1992a, 1992b, 1994; Brunelle 2009a). Tone sandhi could develop from the misinterpretation of some forms of tone coarticulation, but this seems to require more than simple phonologization (Brunelle et al. 2016).

The most complex sandhi-like processes in the region are doubtless found in the Kuki-Chin languages of Burma, Mizoram, and Nagaland. In these languages, combinations of tone spreading and positional tone sandhi sensitive to the boundaries of prosodic domains are commonplace (Hyman and VanBik 2002, 2004; Watkins 2013). In the Tibeto-Burman southern Chin language Sumtu, of which the Myebon dialect is described by Watkins (2013), a morpheme may have lexically high or low tone.
Functional morphemes attached to a noun or verb stem may have no lexically specified tone. In that case, their tone is derived by a process whereby high and low tones alternate so that adjacent highs or lows are avoided where possible, that is, unless a lexically specified tone makes adjacent highs or lows inevitable. Examples of sentences with a lexically high-tone verb stem [pék] ‘give’ and a low-tone verb stem [hŋà] ‘borrow’ are given in (11). A string of verbal auxiliaries and particles is attached to the right of the stem, and a subject/object prefix is attached to its left. Only the verb stem has lexical tone: the attached morphemes are assigned alternating high and low tones so that no adjacent tones are the same.

(11)

a. ʔə̀-m-pék-bà-láʔ-hnì
   3-tr-give-again-must-prf
   L-H-L-H-L
   ‘He has had to give back.’

b. ʔə̀-m-pék-làʔ-hní
   3-tr-give-must-prf
   L-H-L-H
   ‘He has had to give.’

c. ʔə́-hŋà-láʔ-hnì
   3-borrow-must-prf
   H-L-H-L
   ‘He has had to borrow.’
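The alternation in (11) can be restated as a procedure. The Python sketch below is a minimal illustration built on a few assumptions. It lists only tone-bearing morphemes, so the segmental transitivity prefix m- is omitted. It also assumes that each toneless morpheme takes the tone opposite to its neighbour on the side of the stem. The function name `assign_polar_tones` and its input format are hypothetical and do not come from Watkins (2013).

```python
from typing import List, Optional

# Hypothetical illustration; names and input format are not from the cited sources.
FLIP = {"H": "L", "L": "H"}


def assign_polar_tones(tones: List[Optional[str]], stem: int) -> List[str]:
    """Fill toneless slots (None) by alternating H/L outward from the stem.

    tones: one entry per tone-bearing morpheme: "H", "L", or None (toneless).
    stem:  index of the lexically toned verb stem.
    Simplification: any other lexically specified tone is kept as-is, even if
    that yields two adjacent identical tones.
    """
    if tones[stem] not in FLIP:
        raise ValueError("the stem must carry a lexical H or L tone")
    out = list(tones)
    # Suffixes: each toneless slot takes the opposite of its left neighbour.
    for i in range(stem + 1, len(out)):
        if out[i] is None:
            out[i] = FLIP[out[i - 1]]
    # Prefixes: each toneless slot takes the opposite of its right neighbour.
    for i in range(stem - 1, -1, -1):
        if out[i] is None:
            out[i] = FLIP[out[i + 1]]
    return out


# (11a) ʔə̀-(m)-pék-bà-láʔ-hnì: only the stem pék 'give' is lexically H.
print("-".join(assign_polar_tones([None, "H", None, None, None], stem=1)))  # L-H-L-H-L
# (11c) ʔə́-hŋà-láʔ-hnì: only the stem hŋà 'borrow' is lexically L.
print("-".join(assign_polar_tones([None, "L", None, None], stem=1)))  # H-L-H-L
```

The sketch does not cover the dual forms in (12). There, dissimilation runs the other way: a lexically L prefix imposes H on the stem.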

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

Mainland South East Asia 351

In Sumtu, the dual number in verb paradigms is indicated by tone, as shown in (12). The lexically low tone verb [sìʔ] ‘go’ has minor-syllable pronominal prefixes attached. In the singular and the plural forms, these prefixes have a high tone: having no lexically assigned tone, they assume the tone that is the polar opposite of the stem to which they are attached. The dual number, however, is indicated by a tone change in the pronoun prefix. The low tone dual pronominal prefix triggers a dissimilatory tone change in the verb stem, so that in the dual forms the verb stem has a high tone.

(12) Tone change in Myebon Sumtu dual number verb forms

             singular   dual       plural
     1       kə́-sìʔ
       incl             mə̀-síʔ     mə́-sìʔ
       excl             kə̀n-síʔ    kə́n-sìʔ
     2       nə́-sìʔ     nə̀n-síʔ    nə́n-sìʔ
     3       ʔə́-sìʔ     ʔə̀n-síʔ    ʔə́n-sìʔ

A second type of tone alternation is tone spreading, a process observed in some level-tone systems. For instance, in Yongning Na (Sino-Tibetan), L tone spreads progressively (‘left to right’) onto syllables that are unspecified for tone (Michaud 2017: 324). ‘Spreading’ of level tones is a process of phonological copying. It needs to be distinguished from cases where the domain of phonetic realization of a lexical tone category is the entire phonological word, as in Tamang (Sino-Tibetan). The four tones of Tamang ‘unfold’ over an entire phonological word. Non-initial syllables of words, whether they are a suffix or part of a single morpheme, never carry their own tone. Their fundamental frequency (f0) curve can therefore be considered an expression of the tone lexically carried by the initial lexeme, which is allowed to unfold over the available space, namely the entire phonological word (Mazaudon and Michaud 2008).
This can usefully be distinguished from ‘tonal coarticulation’ on toneless syllables, as illustrated by Northern Mandarin. There, the phonetic realization of a toneless suffix is heavily influenced by the tone of the preceding syllable, but that tone can still be considered to be realized phonetically on the syllable to which it is lexically associated (Chen and Xu 2006).

Tone can also be used to mark morphological alternations. In MSEA, this is relatively rare, except in Sino-Tibetan, where morphological alternations involving tone are most abundant in Kuki-Chin (see Ozerov 2018 for an overview and a case study of Anal) and in Na-Qiangic (Evans 2008; Jacques and Michaud 2011; Daudey 2014). Cases of morphology conveyed solely by tone (i.e. tonal morphology proper) are much rarer than cases of conditioning of tone assignment by morphosyntax (i.e. morphotonology). In Anal (Ozerov 2018), omission of grammatical suffixes leads to a grammatical distinction being marked only by tonal alternations on the last syllable of the stem. Interestingly, traces of the reduced suffix can consist of (i) changed tone, (ii) vowel lengthening, or (iii) both tone change and vowel lengthening. Another example is the Burmese creaky tone, which can express possession on a restricted number of lexemes (pronouns, kinship terms, and a few more) in place of the full possessive marker, which also carries creaky tone (Okell and Allott 2001: 273). Naxi (Sino-Tibetan) has cases of reduction of H tone grammatical words to a floating H tone. In contrast, M and L tone syllables that become coalescent are reported to retain a vowel target of their own; that is, the reduction process stops short of complete segmental ellipsis (Michaud and He 2007).


23.3 Phrasal prosody

The phrasal prosody of MSEA languages has attracted far less systematic attention than their word-level prosody. In this section, we first review research on prosodic domains (§23.3.1). We then go over descriptions of intonational patterns and their interaction with final particles (§23.3.2) and explore the role of information structure in the languages of the area (§23.3.3).

23.3.1 Prosodic phrasing

The study of prosodic phrasing in MSEA has developed steadily over the past decade. Research has focused on the difficulty of applying the standard Prosodic Hierarchy (Selkirk 1984; Nespor and Vogel 1986) to the languages of the region. While some languages, such as Boro, faithfully conform to the hierarchy (Das 2017), a number of researchers question the very existence of a universal hierarchy, especially in the Sino-Tibetan domain, and argue for emergent domains (Hildebrandt 2007: 353–376; Bickel et al. 2009; Post 2009; Schiering et al. 2010; Michaud 2017). Lhasa Tibetan, for instance, is argued to have no phonological phrase but two word-size prosodic domains (Lim 2018). Most studies adopt a narrower scope and focus on evidence (or lack thereof) for specific prosodic domains (Phạm 2008; E. Chirkova and Michaud 2009; Karlsson et al. 2012; Brunelle 2016). For instance, the absence of segmental or suprasegmental processes in grammatical words argues against the existence of a prosodic word in Vietnamese (Schiering et al. 2010; Brunelle 2017; but cf. Phạm 2008). This conclusion is reinforced by the lack of phonetic difference between homophonous compounds and phrases, such as hoa hồng [hwa44 hoŋ͡m21] (flower + pink) ‘rose’ or ‘pink flower’ (Ingram and Nguyễn 2006). To our knowledge, the issue of prosodic recursion (the embedding of a prosodic constituent within a constituent of the same type) has not yet been explored systematically in MSEA. A notable exception is Boro, a language in which a tone-spreading process suggests that enclitics are parsed into a recursive prosodic word that also encompasses the prosodic word formed around its host (Das and Mahanta 2016; Das 2017).

23.3.2 Intonation

Intonation, and more specifically the interaction between tone and intonation, has been studied in a number of MSEA languages. Although it is still too early to reach strong conclusions, boundary tones3 seem able to play an important role in the intonational phonology of languages with small tone inventories (Blood 1977; House et al. 2009; Karlsson et al. 2010; Karlsson et al. 2012; Phạm and Brunelle 2014). In Northern Khmu, a two-way tone contrast does not prevent the realization of a phrasal H tone on the rightmost edge of every prosodic phrase; the tone curves are adjusted accordingly (Karlsson et al. 2012). A simpler example is Eastern Cham, a language in which sentence-final boundary tones concatenate with register on the final syllable, as illustrated in (13).

3 The term ‘boundary tone’ is used as a convenient label for intonational effects that are mostly realized at the edge of intonational domains. We recognize a divergent range of views on whether these effects should be formalized as tones or as a different type of primitive (on this topic, see Rialland, in press).

(13) Final boundary tones realized on the final syllable in Eastern Cham (Phạm and Brunelle 2014): registers are autosegmentally represented as H/L for convenience

a.  L     H     H     L       L L%
    c̥a    ka    naw   p̥ajʔ    c̥ɤ
    boy   name  go    study   already
    ‘Ka has gone to school.’

b.  L     H     H     L       L H%
    c̥a    ka    naw   p̥ajʔ    j̥ɤ
    boy   name  go    study   already
    ‘Has Ka gone to school?’

The effect of boundary tones can also be seen in languages with large tone inventories. The clearest cases are languages in which the pitch contour of toneless particles can be predicted based on intonation, such as Thai (Pittayaporn 2007), or in which an intonational contour overrides the lexical tone of discourse markers, such as backchannels and repair utterances in Northern Vietnamese (Hạ 2010, 2012). However, the typical scenario in such languages is that intonational effects are realized through a combination of various cues, such as the global pitch height and slope of the utterance, phrase-final pitch contour, and duration (Trần 1967; Đỗ et al. 1998; Luksaneeyanawin 1998; Nguyễn and Boulakia 1999; Michaud 2005; Vũ et al. 2006; Brunelle et al. 2012; Mạc 2012). It is unclear whether these intonational cues, which show great speaker variability, can be analysed as categorical boundary tones in the autosegmental-metrical sense (Michaud 2005; Brunelle et al. 2012; Brunelle 2016). The lack of categorical realization of intonation in languages with large tone inventories could be facilitated by sentence-final particles, which are a pervasive feature of most MSEA languages. These often have the same function as intonation, arguably making it redundant. In fact, Hyman and Monaka (2011) have proposed treating such particles as a part of the intonational system. However, the existence of final particles alone does not imply that intonation is not employed, either redundantly or primarily (e.g. Dryer 2013); much more work is needed in this area.

23.3.3 Information structure

In many MSEA languages, information structure is primarily marked by means of syntactic restructuring and overt morphological markers. See Michaud and Brunelle (2016) for an overview of such markers in Yongning Na and Vietnamese. More relevant to this chapter is the prosodic marking of information structure. Although such marking has not received much attention in the languages of MSEA, it seems to involve mainly prosodic phrasing and overt focus. A Yongning Na example of information structure realized through prosodic phrasing is given in (14). In this example, ‘dog meat’ is topicalized and thus forms a tone group separate from the rest of the sentence, a phrasing that is marked by the bolded tone changes (see Michaud 2017: 324–327 for detailed tone rules).

(14)
/kʰv˩mi˩-ʂe˩   dzɯ˥   mə˧   ɖo˧        pi˥   zo/
 dog-meat      eat    neg   ought_to   say   advb
kʰv˩mi˩-ʂe˩˥, dzɯ˧-mə˧-ɖo˧-pi˧-zo˥
‘It is said that one must not eat dog meat! / It is said that dog meat is something one must not eat!’ (Michaud and Brunelle 2016: 783)

Vietnamese is the MSEA language in which overt focus has been studied most systematically. Studies have been conducted on corrective focus (Michaud 2005; Vũ et al. 2005; Brunelle 2017) and pragmatic focus (Jannedy 2007). Results reveal that speakers can realize focus through a number of correlates of vocal effort, such as raised f0 and intensity, increased duration, and a fuller realization of tone contours and of the phonation types associated with tones. However, speakers do not need to use all these cues simultaneously, and they exhibit significant individual variation. In spontaneous speech, prosodic focus is normally accompanied by morphosyntactic focus-marking strategies.

23.4 Conclusion

In this chapter, we have attempted to give an overview of the diverse prosodic systems of MSEA. We have argued that it is difficult to characterize the languages of the region in terms of a few stereotypical prosodic properties. The chapter also reflects the current state of knowledge on the prosodic structures of MSEA. While their word-level prosody is well understood, more work is urgently needed on their phrasal prosody, which remains poorly understood.

chapter 24

Asian Pacific Rim

Sun-Ah Jun and Haruo Kubozono

24.1 Introduction

This chapter describes the prosodic systems of Japanese and Korean, two languages in the Asian Pacific Rim that are often said to be historically related. It analyses both lexical and post-lexical prosody, with the main focus on word accent and intonation. The discussion is not restricted to the standard varieties of the two languages, Standard Tokyo Japanese and Standard Seoul Korean. It also covers various regional dialects that are known for their diversity in prosodic organization.

24.2 Japanese

Japanese is known to have lexical pitch accent systems in which word prosody is described with reference to the roles that pitch (or f0) plays at the lexical level.1 It is also known for the diversity observed across different regional dialects, ranging from so-called accentless systems to those that permit multiple contrastive pitch patterns. With this background, this section describes the diversity of both lexical and post-lexical prosody in the language with Tokyo Japanese as a reference point.

24.2.1 Japanese word accent

Previous studies of lexical pitch accent have demonstrated multiple ways to typologize lexical pitch accent systems. The most popular typology used in Japanese (and Korean) is based on the number of lexical pitch patterns observed in each system (Uwano 1999). For example, Tokyo Japanese permits multiple distinctive patterns for nouns, while Standard

1 See Beckman (1986) for the phonetic properties of Japanese pitch accent as opposed to stress accent as found in English.

356 Sun-Ah Jun and Haruo Kubozono

Seoul Korean does not. The latter is known as an ‘accentless’ system where pitch features are not specified at the lexical level and, hence, pitch does not play any distinctive role. The distinctive role of pitch in Tokyo Japanese can be demonstrated by minimal pairs of words like those in (1).2 In (1) and the rest of this section, high pitch is denoted by capital letters. In a more abstract accentual analysis, lexical pitch accents are indicated by apostrophes after the accented syllable or mora in transcriptions contained in square brackets (McCawley 1968; Poser 1984a).3 Like Tokyo Japanese, many Japanese dialects use pitch contrastively at the word level.

(1) a. HA.na-ga   [ha’.na-ga]   ‘Hana (girl’s name)-nom’
    b. ha.NA-ga   [ha.na’-ga]   ‘flower-nom’
    c. ha.NA-GA   [ha.na-ga]    ‘nose-nom’
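The capitalized transcriptions in (1), and in (2) below, follow a regular pattern once the accent position is known. The first syllable is low unless it is accented. Pitch is high from the second syllable up to and including the accented one, and low after it. Unaccented words have no fall at all. The Python sketch below derives the capitalized forms from the accent position under that reading. It assumes that every syllable is light (one mora) and ignores phrasal effects. The function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical illustration; assumes light (monomoraic) syllables throughout.


def tokyo_pitch(syllables: List[str], accent: Optional[int], particle: str = "ga") -> str:
    """Render noun + particle with high-pitched syllables in capitals.

    accent: 1-based position of the accented syllable, or None if unaccented.
    """
    units = syllables + [particle]
    rendered = []
    for pos, unit in enumerate(units, start=1):
        if accent == 1:
            high = pos == 1            # initial accent: H on the first syllable only
        elif accent is None:
            high = pos > 1             # unaccented: initial L, then H with no fall
        else:
            high = 1 < pos <= accent   # initial L, H up to the accent, L afterwards
        rendered.append(unit.upper() if high else unit)
    return ".".join(rendered[:-1]) + "-" + rendered[-1]


def all_patterns(syllables: List[str]) -> List[str]:
    """Every accent position plus the unaccented option: n + 1 patterns."""
    positions = [*range(1, len(syllables) + 1), None]
    return [tokyo_pitch(syllables, a) for a in positions]


for accent, gloss in [(1, "Hana-nom"), (2, "flower-nom"), (None, "nose-nom")]:
    print(tokyo_pitch(["ha", "na"], accent), gloss)
# HA.na-ga Hana-nom / ha.NA-ga flower-nom / ha.NA-GA nose-nom
```

`all_patterns` enumerates every accent position plus the unaccented option. This reproduces the n + 1 count discussed for Tokyo nouns: a trisyllabic input yields the four shapes exemplified in (2).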

Under Uwano’s (1999) typology, pitch accent systems fall into two major groups, ‘accented’ and ‘accentless’ systems. Accentless systems are found in several peripheral areas of Japan, notably southern Tohoku (e.g. Miyagi, Yamagata, and Fukushima Prefectures), northern Kanto (e.g. Ibaragi Prefecture), and central Kyushu (e.g. Kumamoto Prefecture): see Map 24.1 for areas and dialects in Japan. ‘Accented’ systems can be further classified into two subgroups, called ‘multi-pattern’ and ‘n-pattern’ systems. In multi-pattern systems, the number of distinctive pitch patterns increases as the word becomes longer. A typical example is Tokyo Japanese, which exhibits n + 1 patterns for nouns with n syllables: two patterns for monosyllabic nouns, three patterns for disyllabic nouns, four patterns for trisyllabic nouns, and so on (McCawley 1968; Haraguchi 1977; cf. Kubozono 2008). The four patterns exhibited by trisyllabic nouns are shown in (2). Multi-pattern systems are widespread; they are found in most parts of Honshu and Shikoku.

(2) Trisyllabic nouns in Tokyo
    a. KA.bu.to-ga   [ka’.bu.to-ga]   ‘helmet-nom’
    b. ko.KO.ro-ga   [ko.ko’.ro-ga]   ‘heart-nom’
    c. o.TO.KO-ga    [o.to.ko’-ga]    ‘man-nom’
    d. sa.KA.NA-GA   [sa.ka.na-ga]    ‘fish-nom’

While the number of pitch patterns increases in proportion to the phonological length of words in multi-pattern systems, it is fixed at a certain number in n-pattern systems. N-pattern systems in Japanese range from one-pattern to four-pattern systems, with

2 Syllable and morpheme boundaries are shown using dots and hyphens, respectively, while nom means a nominative particle.
3 Words without an apostrophe are ‘unaccented words’, the class of words that do not involve an abrupt pitch fall even when they are followed by a grammatical particle.

ASIAN PACIFIC RIM 357

Map 24.1 Japanese and South Korean dialect areas

two-pattern systems found quite extensively.4 A typical example is the two-pattern system of Kagoshima Japanese, spoken in southern Kyushu. This system exhibits two and only two pitch patterns, traditionally called ‘Type A’ and ‘Type B’ (Hirayama 1951), for all types of content words. In this particular system, the two pitch patterns bear a H(igh) tone on the penultimate and final syllables, respectively, as shown in (3) (Kibe 2000; Kubozono 2012).

(3) Two-pattern system of Kagoshima Japanese

    a. Type A                          b. Type B
       NA.tu            ‘summer’          ha.RU            ‘spring’
       na.tu.ya.SU.mi   ‘summer holiday’  ha.ru.ya.su.MI   ‘spring holiday’
       KI.ru            ‘to wear’         ki.RU            ‘to cut’

4 N-pattern systems are found in Korean dialects, too. According to Fukui (2000), they are found in Jeolla and some western parts of Gyeongsang. N-pattern systems in Gyeongsang vary from one-pattern to five-pattern systems (Fukui 2000; Son 2007). The Masan dialect in South Gyeongsang, for example, has five distinctive patterns (Son 2007).

N-pattern systems include a one-pattern system where all words exhibit one and the same pitch pattern. This system is sporadically found in Japanese (e.g. the Miyakonojo and Kobayashi dialects in Miyazaki Prefecture, in the vicinity of Kagoshima Japanese). In Kobayashi Japanese, for example, all words are pronounced with a H tone on the final syllable; that is, it has only one accent pattern, one that corresponds to Type B in Kagoshima (3b). Since this system permits only one pattern, pitch does not have a distinctive role, just as in the accentless systems mentioned above. However, one-pattern and accentless systems must be distinguished from each other: ‘In the one-pattern system of Miyakonojo, there is a definite pitch shape, and native speakers recognize it, but in the accentless type, there is no native-speaker recognition of the pitch pattern, though phrases are normally pronounced in a flat tone with a slight rise in the middle’ (Shibatani 1990: 213). In other words, some pitch pattern must be available in the lexicon (though not specified for each word) in one-pattern systems, while pitch is not a property of words in accentless systems.

The distinction between multi-pattern and n-pattern systems is closely related to the domain where pitch patterns are manifested, that is, the word versus the bunsetsu (a minimal syntactic phrase). This difference can be shown via a comparison of Tokyo Japanese (multi-pattern system) and Kagoshima Japanese (n-pattern system). To take the loanword pi.za ‘pizza’, for example, both systems assign an accent (or H tone) to the initial syllable, pronouncing the word in the same way. However, they display crucial differences when the word combines with one or more grammatical particles: the prominence remains on the same syllable in Tokyo, but it moves rightwards in Kagoshima. In the latter system, the H tone is realized on the penultimate syllable in the bunsetsu domain.
This is illustrated in (4).

(4) Tokyo versus Kagoshima Japanese

    Tokyo            Kagoshima         Gloss
    PI.za            PI.za             pizza
    PI.za-ga         pi.ZA-ga          pizza-nom
    PI.za-ka.ra      pi.za-KA.ra       from pizza
    PI.za-ka.ra-mo   pi.za-ka.RA-mo    from pizza, too
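The Kagoshima facts in (3) and (4) amount to counting syllables from the right edge of the bunsetsu rather than of the word. The Python sketch below places the single H tone under that assumption. The function name and the list-of-morphemes input format are hypothetical. Monosyllabic Type A phrases are handled by an explicitly flagged guess.

```python
from typing import List

# Hypothetical illustration; names and input format are not from the cited sources.


def kagoshima_pitch(bunsetsu: List[List[str]], word_type: str) -> str:
    """Place the single H tone of a Kagoshima bunsetsu by counting syllables.

    bunsetsu:  the phrase as a list of morphemes, each a list of syllables.
    word_type: "A" (H on the penultimate syllable) or "B" (H on the final one).
    """
    n = sum(len(m) for m in bunsetsu)
    if word_type == "A":
        # Assumption (not from the source): a one-syllable Type A phrase has
        # no penultimate syllable, so H is simply placed on its only syllable.
        h = max(n - 2, 0)
    elif word_type == "B":
        h = n - 1
    else:
        raise ValueError("word_type must be 'A' or 'B'")
    out, k = [], 0
    for morpheme in bunsetsu:
        sylls = []
        for syl in morpheme:
            sylls.append(syl.upper() if k == h else syl)
            k += 1
        out.append(".".join(sylls))
    return "-".join(out)


for example in ([["pi", "za"]], [["pi", "za"], ["ga"]],
                [["pi", "za"], ["ka", "ra"]], [["pi", "za"], ["ka", "ra"], ["mo"]]):
    print(kagoshima_pitch(example, "A"))
# PI.za / pi.ZA-ga / pi.za-KA.ra / pi.za-ka.RA-mo
```

Because the H position is recomputed over the whole bunsetsu, each added particle shifts it rightwards. In the Tokyo column of (4), by contrast, the accent stays on the first syllable of pi.za.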

This difference can be attributed to the nature of pitch accent in the two dialect systems. In the Tokyo-type system, a certain position (syllable or mora) within the word is phonologically marked, as shown in the accentual analyses in (1) and (2) above. In the Kagoshima-type system, by contrast, a certain pitch pattern or melody is phonologically specified. Hayata (1999) labels these two types ‘word accent’ and ‘word tone’, respectively. The latter is comparable to ‘syllable tone’ as found in Mandarin Chinese and other languages traditionally known as ‘tone languages’ (Pike 1948). Hayata’s notions of ‘word accent’ and ‘word tone’ roughly correspond to Uwano’s (1999) ‘multi-pattern system’ and ‘n-pattern system’, respectively.

Pitch accent systems also vary depending on the basic phonological units used to measure phonological distances and to bear the lexical prominence (McCawley 1978).5 Tokyo and Kagoshima Japanese exhibit crucial differences in this respect, too. In Tokyo, most loanwords and a good number of native and Sino-Japanese words bear pitch accent on the third mora from the end of the word; that is, it is a mora-counting system. In Kagoshima, in contrast, the syllable is consistently used to calculate the position of the prominence. Thus, the nouns in (5) are invariably accented or H toned on the antepenultimate mora in Tokyo, while they are H toned on the penultimate syllable in Kagoshima.

5 Another parameter that can distinguish one pitch accent system from another in Japanese is the phonetic feature used to distinguish one pitch pattern from another (Kubozono 2018a). Tokyo Japanese and many other systems are sensitive to a pitch fall, but Narada Japanese, a highly endangered dialect spoken in Yamanashi Prefecture, is sensitive to a pitch rise (Uwano 2012).

(5) Mora versus syllable

    Tokyo        Kagoshima    Gloss
    KA.na.da     ka.NA.da     Canada
    DOi.tu       DOI.tu       Germany
    In.do        IN.do        India
    HA.wai       HA.wai       Hawaii
    su.WEe.den   su.WEE.den   Sweden
    mo.ROk.ko    mo.ROK.ko    Morocco

The distinction between mora-based and syllable-based systems is more complicated, since the counting and prominence-bearing units are not always identical. Tokyo Japanese, for example, is defined as a ‘mora-counting, syllable language’ because not all morae can bear the pitch accent (McCawley 1978). Specifically, non-initial morae of heavy (i.e. bimoraic) syllables cannot bear the accent and, hence, shift the accent to the head mora of the same syllable. This is illustrated with four-mora nouns in (6), where Kagoshima patterns are also given for comparison.

(6) Accent shift in Tokyo

    Tokyo                                          Kagoshima   Gloss
    ROn.don [ro’n.don], *roN.don [ron’.don]        RON.don     London
    In.da.su [i’n.da.su], *iN.da.su [in’.da.su]    in.DA.su    Indus
    SAi.daa [sa’i.daa], *saI.daa [sai’.daa]        SAI.daa     cider, lemonade
    PAa.tii [pa’a.tii], *paA.tii [paa’.tii]        PAA.tii     party
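The contrast in (5) and (6) can be made explicit by representing each word as a list of syllables, each syllable being a list of morae. The Python sketch below rests on two simplifying assumptions. For Tokyo, it applies the default loanword pattern: the antepenultimate mora is accented, and the accent moves to the head mora if it falls on the second mora of a heavy syllable. For Kagoshima, it applies Type A, with H on the penultimate syllable. The function names are hypothetical, and lexical exceptions are not modelled.

```python
from typing import List

# Hypothetical illustration; default patterns only, no lexical exceptions.
Word = List[List[str]]  # a word is a list of syllables; a syllable is a list of morae


def tokyo_loanword(word: Word) -> str:
    """Mora counting: accent the antepenultimate mora, moved to its syllable's head."""
    morae = [(s, m) for s, syl in enumerate(word) for m in range(len(syl))]
    s, _ = morae[max(len(morae) - 3, 0)]   # syllable containing the antepenultimate mora
    accent = morae.index((s, 0))           # the head mora of that syllable bears the accent
    out, pos = [], 0
    for syl in word:
        rendered = ""
        for mora in syl:
            # Accent on the first mora: only that mora is H.
            # Otherwise: initial L, then H from the second mora up to the accent.
            high = (pos == 0) if accent == 0 else (0 < pos <= accent)
            rendered += mora.upper() if high else mora
            pos += 1
        out.append(rendered)
    return ".".join(out)


def kagoshima_type_a(word: Word) -> str:
    """Syllable counting: H on the whole penultimate syllable."""
    h = max(len(word) - 2, 0)
    return ".".join("".join(syl).upper() if i == h else "".join(syl)
                    for i, syl in enumerate(word))


words = {
    "Canada": [["ka"], ["na"], ["da"]],
    "Sweden": [["su"], ["we", "e"], ["de", "n"]],
    "London": [["ro", "n"], ["do", "n"]],
    "party": [["pa", "a"], ["ti", "i"]],
}
for gloss, word in words.items():
    print(f"{tokyo_loanword(word):10} {kagoshima_type_a(word):10} {gloss}")
```

Separating the counting unit (the flattened mora list) from the prominence-bearing unit (the syllable head) is what makes Tokyo a ‘mora-counting, syllable’ system in this sketch. The Kagoshima function never looks at morae at all.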

Many Japanese dialects are mora-counting but sensitive to the syllable, just like Tokyo Japanese. They can be labelled ‘mora-counting, syllable’ systems in McCawley’s (1978) typology. Kagoshima Japanese, on the other hand, is a ‘syllable-counting, syllable dialect’, since it counts the number of syllables and places the H tone on a certain syllable. Systems of the Kagoshima type are rare in Japanese, but they are quite common in Korean.6 Note that the classification based on the mora/syllable is independent of the typology based on the nature of pitch accent: multi-pattern systems (word accent) versus n-pattern systems (word tone). In fact, most multi-pattern and n-pattern systems in Japanese are mora-counting ones, including Kagoshima’s sister dialects, such as Nagasaki and Koshikijima Japanese (Matsuura 2014; Kubozono 2016). This contrasts with the fact that pitch accent systems in Korean are generally analysed as syllable-based ones, where the syllable serves as both a counting and a prominence-bearing unit, regardless of whether they are multi-pattern or n-pattern systems (Son 2007; Fukui 2013).

6 See Kubozono’s (2018b) account of loanword accent in Gyeongsang Korean as an example of mora-based analysis of the language.

Japanese pitch accent systems also vary in terms of the direction in which accent patterns are computed. In most pitch accent systems of Japanese, including those mentioned so far (e.g. Tokyo, Kagoshima, Kobayashi), the position of the phonological prominence (accent or H tone) is calculated from the end of the word. Some systems, however, compute the prominent position from the beginning of the word. For example, Nagasaki Japanese, which is a sister dialect to Kagoshima, assigns a H tone to the second mora in Type A words, whether that mora is the head or non-head mora of the syllable (Matsuura 2014). Interestingly, Kagoshima and Nagasaki Japanese calculate the prominence patterns from opposite directions (i.e. right to left vs. left to right), although they otherwise have very similar two-pattern systems. More interestingly, the two procedures can coexist in a single pitch accent system: one pattern is defined from the left edge of the word, while another pattern in the same system is computed from the right edge (Kubozono 2018a). In Japanese, these ‘hybrid’ systems are found in the Echizen-Kokonogi dialect (Nitta 2012) and the Yuwan dialect of Amami Ryukyuan (Niinaga and Ogawa 2011).7

Finally, compound accent can also distinguish one pitch accent system from another.
In general, languages with word prosodies, whether pitch accent or stress accent, can be classified into two groups depending on whether their compound accent is formulated with reference to the phonological properties of the initial or the final member of the compound (Kubozono 2012). Tokyo Japanese has a typical right-dominant compound rule that refers to the phonological structure of the final member. Kagoshima Japanese and its sister dialects with two-pattern systems, by contrast, have left-dominant compound rules whereby compound prosody is determined by the prosodic properties of the initial member.8 One complication is the existence of a hybrid system like Kyoto/Osaka Japanese, whose compound accent rule refers to both the initial and the final members (Uwano 1997; Kubozono 2012, 2015, 2018a).9 In Hayata’s (1999) analysis, this is a hybrid system involving word accent (pitch fall) and word tone (melody), the latter referring specifically to the word-initial pitch height.10

7 The Kuwanoura dialect of Koshikijima Japanese presents a more interesting situation: it shows two H tones in relatively long words in which the first H tone is calculated from the left edge and the second H tone from the right edge within one and the same word (Kubozono 2016, 2019). Similar hybrid systems are reported in some Korean dialects too, such as the Gwangyang dialect in South Jeolla (Fukui 2000), Jinju, and other dialects in South Gyeongsang (Son 2007). 8 This feature is obviously independent of the directionality discussed in the preceding paragraph: Tokyo and Kagoshima belong to the same group with respect to the directionality, but to different groups in terms of compound prosody. 9 The Daegu dialect of North Gyeongsang shows the same hybrid feature (Son 2007). 10 It is worth referring to the historical aspects of Japanese pitch accent systems, a topic that is closely related to the regional differences found in the language but is not discussed in this chapter due to length limitations. See de Boer (2017), for example, for a controversy regarding the historical development of Japanese lexical prosody.


24.2.2 Japanese intonation

In Japanese, as in many other languages, the overall pitch patterns of utterances are determined by the combination of lexical and post-lexical pitch information: the lexical pitch patterns, and the post-lexical patterns manifesting phrase-level and sentence-level pitch properties. The latter include information on sentence types (e.g. declarative vs. interrogative sentences) as well as pitch features signalling the edges of syntactic domains (e.g. phrases, clauses, sentences). In terms of the prosodic hierarchy, the largest prosodic unit at the lexical level in Japanese is the ‘phonological word’ or ‘prosodic word’ (Pword). The Pword usually corresponds to a content word (e.g. a noun, verb, or adjective), whether lexically accented or unaccented, and is hence the domain within which there can be at most one pitch accent. In contrast, the smallest prosodic unit at the post-lexical level is the ‘accentual phrase’ (AP) (Pierrehumbert and Beckman 1988), which is also called the ‘minor phrase’ (McCawley 1968; Poser 1984a; Kubozono 1988; Selkirk and Tateishi 1991). In Tokyo Japanese, the AP is often defined as the domain of initial pitch rise, which serves to signal the beginning of a syntactic phrase. The relationship between the Pword and the AP is straightforward in principle, since an AP typically consists of one Pword plus one or more optional grammatical particles: every syntactic phrase in (1) and (2), for example, usually constitutes one AP.11 In Tokyo Japanese, this simple picture is complicated by the existence of many lexically ‘unaccented’ words, which often form an AP together with other Pwords. For example, uma’i nomi’mono ‘tasty drink’ constitutes two AP’s since the first Pword, uma’i ‘tasty’, is lexically accented, whereas amai nomi’mono ‘sweet drink’ often (but not always) forms one AP because the first Pword, amai ‘sweet’, is lexically unaccented.
This is where the lexical notion of the Pword and the post-lexical notion of the AP diverge. Stated conversely, the two notions would correspond nicely if there were no ‘unaccented’ words in the language. This simpler picture is actually observed in some dialects of Japanese (e.g. Kagoshima Japanese), where every Pword at the lexical level always forms one AP together with the following grammatical particle(s) at the post-lexical level. In addition to the AP, most studies of Japanese intonation postulate two larger/higher prosodic units, the utterance (U) and the intermediate phrase (ip) (Pierrehumbert and Beckman 1988), the latter also known as the ‘major phrase’ (McCawley 1968; Poser 1984a; Kubozono 1988; Selkirk and Tateishi 1991). The U is loosely defined as the entire utterance, thus roughly corresponding to an intonational phrase (IP) in Korean and other languages. The ip is usually defined as the domain of downstep, a post-lexical process whereby pitch range is lowered or narrowed after an accented AP. Every ip consists of one to three AP’s, since downstep can occur iteratively within a certain syntactic domain (Poser 1984a) while being subject to the principle of rhythmic alternation in Tokyo Japanese (Kubozono 1988; Selkirk and Tateishi 1991). Moreover, because downstep is believed to be blocked at major syntactic boundaries (e.g. clause boundaries), ip boundaries are usually posited at these syntactic boundaries.

11 It is difficult to phonetically distinguish between these simplex AP’s and Pwords of the same phonological structure: for example, the phrase HA.na-ga in (1a) is phonetically indistinguishable from a hypothetical trimoraic noun HA.na.ga in spontaneous speech. On the other hand, AP’s containing two Pwords are phonetically distinguishable from those containing only one Pword: thus, amai nomi’mono ‘sweet drink’ can have a pause between the two Pwords, while a hypothetical noun of the same segmental and accentual structure, amainomi’mono, cannot.

The prosodic hierarchy thus described can be illustrated by (7), which consists of accented words, and (8), which consists entirely of unaccented words. While both involve an ip boundary at the major syntactic boundary (i.e. between the subject and verb phrases), they differ in the number and phrasing of AP’s, as can be seen by comparing (7b) and (8b). [ ] and { } denote AP and ip boundaries, respectively, whereas gen and top mean genitive and topic particles, respectively.

(7) a. bo’ku-no   a’ni-wa             ro’ndon-ni   su’nda.
       I-gen      elder brother-top   London-in    lived
       ‘My elder brother lived in London’
    b. {[bo’ku-no][a’ni-wa]} {[ro’ndon-ni][su’nda]}

(8) a. watasi-no   ane-wa              igirisu-ni   itta.
       I-gen       elder sister-top    England-to   went
       ‘My elder sister went to England’
    b. {[watasi-no ane-wa]} {[igirisu-ni itta]}
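The AP groupings in (7b) and (8b) fit a simple procedure, as does the uma’i/amai contrast described above. Within each ip, an accented Pword closes its AP, while an unaccented Pword is grouped with whatever follows it. The Python sketch below encodes only this hypothetical simplification. It takes ip boundaries as given and ignores the optionality of grouping (‘often but not always’). It also ignores the limit on the number of AP’s per ip. Accent is read off an apostrophe in the Pword, following the transcription convention used here. The function names are illustrative.

```python
from typing import List

# Hypothetical illustration of a simplified phrasing rule; not a published algorithm.
ACCENT_MARKS = ("'", "\u2019")  # straight or typographic apostrophe after the accented mora


def is_accented(pword: str) -> bool:
    return any(mark in pword for mark in ACCENT_MARKS)


def phrase(ips: List[List[str]]) -> str:
    """Group Pwords into AP's ([ ]) inside the given ip's ({ })."""
    rendered = []
    for ip in ips:
        aps: List[List[str]] = []
        current: List[str] = []
        for pword in ip:
            current.append(pword)
            if is_accented(pword):   # an accented Pword ends the current AP
                aps.append(current)
                current = []
        if current:                  # trailing unaccented Pwords form the last AP
            aps.append(current)
        rendered.append("{" + "".join("[" + " ".join(ap) + "]" for ap in aps) + "}")
    return " ".join(rendered)


print(phrase([["bo'ku-no", "a'ni-wa"], ["ro'ndon-ni", "su'nda"]]))
# {[bo'ku-no][a'ni-wa]} {[ro'ndon-ni][su'nda]}
print(phrase([["watasi-no", "ane-wa"], ["igirisu-ni", "itta"]]))
# {[watasi-no ane-wa]} {[igirisu-ni itta]}
```

The same rule puts uma'i nomi'mono into two AP's and amai nomi'mono into one. It therefore captures only the preferred grouping, not the variation the text notes for unaccented words.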

As seen in §24.2.1, Japanese varies greatly from one dialect to another in the organization of lexical pitch accent systems. For example, multi-pattern systems involve rather dense pitch specifications at the lexical level, while accentless systems lack such specifications. These lexical differences certainly affect the overall pitch shapes of utterances. In addition, Japanese dialects can differ in how they prosodically phrase one and the same sentence into ip’s and AP’s (Igarashi 2015), and in how they phonetically manifest different sentence types and syntactic boundaries. Japanese dialects are not uniform in the latter respect either: questions are signalled by a pitch rise in Tokyo and its neighbouring dialects, while the same sentence type involves a pitch fall in Kagoshima Japanese and its neighbours (Kubozono 2018c). Compared with lexical prosody, the post-lexical prosody of Japanese is largely understudied and deserves more serious attention in the future.

24.3 Korean

This section describes the word prosody and intonation of Korean, covering the Standard (Seoul) dialect as well as the main regional dialects and a variety of Korean spoken in Yanbian Autonomous Prefecture, China, north of North Korea. Since intonation patterns are heavily influenced by word prosody, word prosody is introduced first (§24.3.1), followed by intonation (§24.3.2 and §24.3.3). Where relevant, a brief comment is made on similarities to or differences from Japanese prosody at both the lexical and post-lexical levels.

ASIAN PACIFIC RIM 363

24.3.1 Korean word prosody

Middle Korean (spoken from the eleventh to the fifteenth century) was a tone language with three tonal types, High, Low, and Rising (Sohn 1999/2001; Martin 1951, 1992; Gim 1994; I. Lee and Ramsey 2000; S.-O. Lee 2000; Ito and Kenstowicz 2017), but in Modern Korean the tonal contrast has been either weakened or fully lost, depending on the dialect. That is, some dialects of Korean are lexically pitch accented while others are accentless, a pattern of word prosody that resembles the one found across Japanese dialects. There are six major regional dialects of Korean in South Korea,12 representing six geographical regions; each dialect is named after its province, except for the Seoul dialect. They are the Seoul dialect (the standard dialect of Korean, spoken in the central area of Korea including Gyeonggi province) and the Gangwon, Chungcheong, Jeolla, Gyeongsang, and Jeju dialects. See Map 24.2 for a map of South Korean dialects. Based on the lexical specification of pitch, these dialects can be grouped into two types (Sohn 1999/2001; S.-O. Lee 2000; Yeon 2012). The regional varieties spoken in the central and western parts of Korea (i.e. the Seoul/Gyeonggi, Chungcheong, Jeolla, and Jeju dialects) generally lack any lexical specification of pitch and thus have accentless word prosody (H.-B. Lee 1987; S.-A. Jun 1993, 1998; H.-Y. Lee 1997; S.-H. Lee 2014; Oh and Jun 2018). On the other hand, those varieties spoken in the eastern part of Korea

[Map 24.2: labelled dialect areas Seoul, Gyeonggi, Gangwon, Chungcheong, Gyeongsang, Jeolla, Jeju]
Map 24.2 Six dialect areas of Korean spoken in South Korea

12 Scholars have traditionally grouped the varieties of Korean spoken in North Korea into three dialects: Pyeongan (accentless, western dialect), Hamgyeong (accented, eastern dialect), and Hwanghae (accentless, central dialect), although the Hwanghae dialect is sometimes grouped with the Pyeongan dialect or with the central dialect group that includes the Seoul and Gyeonggi dialects (Yi 1995; Sohn 1999/2001; I. Lee and Ramsey 2000; King 2006; Yeon 2012; Ito and Kenstowicz 2017). Due to the limited resources on North Korean dialects, the prosody of Korean dialects spoken in North Korea is not covered in this chapter.

(i.e. Gyeongsang and Gangwon, excluding the city of Chuncheon and its neighbouring areas) are known to have a lexical specification of pitch. Traditionally, these eastern dialects have been analysed as lexical tone languages (Ramsey 1978; Hur 1985; Gim 1994; I. Lee and Ramsey 2000), but the Gyeongsang dialect has been reanalysed as a lexical pitch accent language (G.-R. Kim 1988; Chung 1991; Kenstowicz and Sohn 1997; N.-J. Kim 1997; J. Jun et al. 2006; J. Kim and Jun 2009). Gim (1994) argues that the tonal categories of the accented dialects reveal which dialects are more conservative, that is, closer to the tonal system of Middle Korean. For example, the Middle Korean tonal categories Low, High, and Rising have developed into High, Mid, and Low, respectively, in the Changwon variety of Gyeongsang Korean, which thus keeps a three-way tonal contrast, whereas Middle Korean High and Rising have merged into a Mid tone in the Daegu variety of Gyeongsang Korean. That is, the three-way tonal contrast of Middle Korean has been reduced to a two-way contrast in the Daegu dialect. However, unlike Tokyo Japanese, where not all words are accented and the pitch shape of a lexical accent is fixed (i.e. falling, HL), all Gyeongsang Korean words are accented, and the pitch shape of a lexical accent is not fixed but varies in the number and location of H tones. For example, a three-syllable word in North Gyeongsang Korean (the Daegu area) can have four possible tonal patterns, Initial H, Penult H, Final H, and Double H, the last with its two H tones word-initially, as shown in (9) (e.g. Chung 1991; Kenstowicz and Sohn 1997; N.-J. Kim 1997; J. Jun et al. 2006). Similarly, the tonal types possible for a three-syllable word in South Gyeongsang Korean (the Busan area) are HLL, LHL, HHL, and LHH. Since three tonal types (HL, HH, LH) are possible for a two-syllable word in both South and North Gyeongsang Korean, Gyeongsang Korean can be classified as having a multi-pattern system.

(9) a. Initial H   /me’.nu.ɾi/    ‘daughter-in-law’
    b. Penult H    /ʌ.mu’.i/      ‘mother’
    c. Final H     /wʌ.nʌ.min’/   ‘native speaker’
    d. Double H    /o’.ɾe’.pi/    ‘older brother’

Like the Gyeongsang dialect spoken in the Daegu area, Yanbian Korean is lexically accented, with a two-way tonal contrast, High and Low. Furthermore, like Tokyo Japanese, it has n + 1 patterns for words with n syllables. That is, a High tone can occur on any syllable of a word, but each prosodic word can have at most one H tone. Yanbian Korean is a variety of the Hamgyeong dialect of North Korea, which is claimed to be more conservative than Gyeongsang Korean (Gim 1994; H. Jun 1998; Ito 2008, 2014a, 2014b). In addition to pitch, Korean dialects vary in whether they have a vowel length contrast, a property independent of their lexical pitch specification. For example, among accentless dialects, the Chungcheong and South Jeolla (called Chonnam) dialects have a word-initial13 vowel length contrast, while the Seoul/Gyeonggi and Jeju dialects do not (S.-A. Jun 1989, 1993, 1998; Park 1994; S.-O. Lee 2000; Shin et al. 2013; Kang et al. 2015; Oh and Jun 2018). Among accented dialects, most varieties of Gyeongsang and Gangwon Korean also preserve the vowel length contrast (Martin 1951, 1992; Park 1994; N.-J. Kim 1997; Sohn 1999/2001; S.-O. Lee 2000). In accentless dialects with a vowel length contrast, the mora is the tone-bearing unit (TBU), and only the vowel (not the coda consonant) is moraic, while in accentless dialects without a vowel length contrast, the syllable is the TBU. However, this correlation between TBU and vowel length does not seem to hold in accented dialects: N.-J. Kim (1997) argues that the syllable is the TBU in Gyeongsang Korean even though this dialect has a vowel length contrast.

13 The word-initial long vowel is maintained only when the word occurs initially in an AP (S.-A. Jun 1998) (see §24.3.2 for a definition of an AP).
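The n + 1 count for Yanbian Korean follows from the two conditions stated above: a H may fall on any syllable, but a prosodic word has at most one. The Python sketch below enumerates these lexical melodies under two assumptions not spelled out in the chapter: every syllable is specified as either H or L, and the extra '+1' pattern is the word with no H at all (which 'at most one' permits). For comparison, it also encodes the four North Gyeongsang trisyllabic patterns in (9) as H/L strings, assuming that the syllables without H are L. All names are illustrative.

```python
from __future__ import annotations


def n_plus_one_patterns(n_syllables: int) -> list[str]:
    """Lexical melodies for an n-syllable word with at most one H.

    Assumption: each syllable is 'H' or 'L', and the H-less word
    is the '+1' pattern.
    """
    if n_syllables < 1:
        raise ValueError("a word needs at least one syllable")
    patterns = ["L" * n_syllables]  # no lexical H
    for i in range(n_syllables):  # exactly one H, on syllable i
        patterns.append("L" * i + "H" + "L" * (n_syllables - i - 1))
    return patterns


# The North Gyeongsang trisyllabic patterns in (9), non-H syllables taken as L:
# Initial H, Penult H, Final H, Double H.
NORTH_GYEONGSANG_TRISYLLABIC = {"HLL", "LHL", "LLH", "HHL"}

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, len(n_plus_one_patterns(n)), n_plus_one_patterns(n))
    # Patterns in (9) that an n + 1 system would not allow:
    print(NORTH_GYEONGSANG_TRISYLLABIC - set(n_plus_one_patterns(3)))  # {'HHL'}
```

For three-syllable words both systems have four patterns, but the inventories differ: the Double H pattern (HHL) falls outside the n + 1 set, while the H-less LLL, which the n + 1 set includes, is absent from (9), consistent with all Gyeongsang words being accented.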

24.3.2 Korean intonation: melodic aspects

The best-studied feature of the phrasal prosody of Korean is intonation and the way it defines a prosodic structure. The prosodic structure shared by all dialects of Korean, regardless of their type of word prosody, is a hierarchy of three prosodic units defined by intonation: an IP, an ip, and an AP. The IP is the largest prosodic unit, defined by a boundary tone realized on its final syllable, which is substantially lengthened; it is optionally followed by a pause. A typologically unique feature of the Korean IP is its large inventory of boundary tones, in which a single boundary tone can have multiple (up to five) tonal targets, all realized on the last syllable or mora of the IP. In Seoul Korean, nine types of IP boundary tone have been identified: L%, H%, LH%, HL%, LHL%, HLH%, LHLH%, HLHL%, and LHLHL% (S.-A. Jun 1993, 2000, 2005b). Of these, L%, H%, LH%, HL%, and LHL% are also common in other varieties of Korean, and LHL% is more common in spontaneous speech in the Jeolla and Chungcheong dialects than in other dialects of Korean (S.-A. Jun 1989; Oh and Jun 2018). The ip, which is smaller than the IP, is the domain of pitch reset. It has two major functions in Korean: marking syntactic grouping and marking prominence (S.-A. Jun 2006, 2011; S.-A. Jun and Jiang 2019). Only the syntax-marking ip has a final boundary tone (H- or L-). Typically, the right edge of a syntax-marking ip matches the right edge of a syntactic constituent, and the ip-final syllable is slightly lengthened. The acoustic cues for a prominence-marking ip, by contrast, are found at its beginning: a prominent or focused word with an expanded pitch range starts the ip, so that the f0 peak of the ip-initial word is higher than that of the immediately preceding word, the last word of the preceding ip.
In this situation, the final syllable of the preceding ip is neither lengthened nor marked by an ip boundary tone (unless that ip also happens to end a syntax-marking ip). The phonetic realization of an ip therefore depends on its function. Finally, an AP in Korean is smaller than an ip and slightly larger than a word. The Seoul Korean AP is typically three or four syllables long, and about 84% of AP’s contain only one content word (S.-A. Jun and Fougeron 2000; S. Kim 2004). The AP is the smallest prosodic unit defined by intonation in Korean. In all accentless varieties, the AP begins with either a L or a H tone, determined by the laryngeal feature of the AP-initial segment: when that segment is aspirated (including /s/ and /h/) or tense, the AP-initial tone is H; otherwise it is L. This is a typologically very unusual feature of Korean intonation. Though all accentless dialects of Korean share this AP-initial tonal category, they fall into two groups depending on the AP-final tone, that is, on whether the AP ends in a rise,14 as in the Seoul/Gyeonggi and Jeju dialects (S.-H. Lee 2014), or in a fall, as in the Chungcheong and Jeolla dialects.

14 A typical AP-final boundary tone in Seoul Korean is Ha, but La is sometimes used. Though more research is needed, the factors known so far to trigger this change are the length of an AP and an Obligatory Contour Principle-like tonal context. That is, La occurs most often when the AP has three tones and when the preceding AP tones are all High (i.e. H and +H but no L+) and the following AP begins with a H tone. S. Kim (2004) shows that, based on the Radio corpus (two interviews) and the Read speech corpus, La occurs in 10.6% of L-initial AP’s but in 29.4% of H-initial AP’s.

In the Seoul/Gyeonggi and Jeju dialects, the basic tonal pattern of an AP is either LHLH or HHLH, summarized as THLH (T = H or L). The first two tones are realized on the initial two syllables of an AP, and the final two tones on the final two syllables. In Korean ToBI (S.-A. Jun 2000, 2005b), a transcription system for Seoul Korean intonation, these patterns are labelled ‘L +H L+ Ha’ and ‘H +H L+ Ha’, respectively (‘+H’ is a H tone on the second syllable of an AP, ‘L+’ is a L tone on the penultimate syllable, and ‘Ha’ is a H tone on the AP-final syllable). When an AP is longer than three syllables, all four tones are realized (i.e. [L +H L+ Ha], [H +H L+ Ha]), though one of the two medial tones can be partially undershot. In trisyllabic and shorter AP’s, however, one or both medial tones (+H, L+) may not be realized (i.e. they are fully undershot), resulting in five more AP tonal patterns: [L +H Ha], [L L+ Ha], [H L+ Ha], [L Ha], and [H Ha]. Additionally, the AP-final H tone is, on rare occasions, replaced with a L tone (La), creating seven more tonal patterns: [L La], [H La], [L +H La], [H +H La], [H L+ La], [L +H L+ La], and [H +H L+ La]. All in all, therefore, there are only 14 surface variants of the AP tonal pattern (2 + 5 + 7). It follows that only four AP tonal patterns have all four tones realized, [LHLH], [HHLH], [LHLL], and [HHLL], so four-tone sequences such as *[TLHT], *[HLLH], and *[LHHL] are not allowed. Finally, when an AP is the last AP of a higher prosodic unit (ip or IP), its final syllable carries the ip or IP boundary tone, which overrides the AP-final boundary tone.

Figure 24.1 shows the pitch track of sentence (10) as produced by a Seoul speaker. The utterance is a single IP containing two ip’s, the first with three AP’s and the second with two. In the first ip, the f0 peak of each AP is on the AP-final syllable, unlike in the two AP’s of the second ip. The first AP here has the [L +H Ha] pattern, commonly used when the AP includes a prominent word; the f0 of its first H exceeds that of the Ha on its last syllable because of the pitch reset at the beginning of the ip. The final AP shows no f0 peak on its final syllable because that syllable carries the low IP boundary tone (L%), signalling a statement.

(10) [jʌŋanɨn imoɾaŋ imobuɾaŋ jʌŋhwaɡwane kandejo]
     Younga-top aunt-with uncle-with movie.theatre-loc is.going
     ‘Younga is going to a movie theatre with her aunt and uncle.’

In other accentless dialects, such as Chungcheong and Jeolla Korean, the basic tonal pattern of the AP is THL(L). The initial tone can be L or H, as in Seoul and Jeju, but the second tone is always H, realized on the second mora. The AP-final tone is typically L, and in longer AP’s the penultimate mora often also has a L tone, suggesting a THLL basic tonal pattern (Oh and Jun 2018). This penultimate L is obligatory when the AP is longer than three morae and is final in a higher prosodic unit whose final boundary tone begins with H (e.g. H-, H%, HL%). An utterance of (10) by a Chonnam speaker is shown in Figure 24.2. The second ip shows pitch reset, as in the Seoul example in Figure 24.1, but it combines the last two words into a single AP.15 Note that the f0 peak of each AP falls on the second mora, realizing the LHL tonal pattern, and that these peaks decline within the first ip.

15 Depending in part on the length of the verb, the second ip could also have two AP’s, as in the Seoul example in Figure 24.1, without any change in meaning.
For comparability with the Seoul utterance, the Chonnam utterance here uses the standard verbal ending rather than the dialectal Chonnam ending.

[Figure 24.1: waveform and f0 track, f0 axis 80–350 Hz, time axis 0–2.712 s, with syllable, gloss, and AP/ip/IP tiers]
Figure 24.1 Waveform and f0 track of example (10) produced as [{(jʌŋanɨn)(imoɾaŋ)(imobuɾaŋ)}ip{(jʌŋhwagwane)(kandejo)}ip]IP by a Seoul Korean speaker.

In accented dialects, as expected, the AP-initial tone is not sensitive to the laryngeal feature of the initial segment. Instead, as in Japanese, an AP’s tonal pattern is determined by the type of lexical pitch accent and the AP boundary tone. In Gyeongsang Korean, the AP boundary tone is L (J. Jun et al. 2006; J. Kim and Jun 2009), but in Yanbian Korean it can be L or H (S.-A. Jun and Jiang 2019). Since an AP allows only one lexical H, only one H survives if the AP contains more than one accented word, and dialects differ in the rules governing which H survives within an AP (or phonological phrase). For example, in North Gyeongsang Korean there is a strong tendency for the H to fall on the leftmost word within an AP (N.-J. Kim 1997). In Yanbian Korean, however, the lexical H of the noun always survives, even when the adnominal modifier preceding the noun is a relative clause (S.-A. Jun and Jiang 2019). In addition, the type of lexical tone within an AP can affect the tonal realization of the following AP. In North Gyeongsang Korean, an AP is downstepped after an AP whose final syllable has a lexical L but upstepped after an AP whose final syllable has a lexical H. In this dialect, upstep or downstep may apply iteratively within an ip (Kenstowicz and Sohn 1997; N.-J. Kim 1997; J. Jun et al. 2006), but, as in Japanese, downstep is blocked across an ip boundary.
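The dialect-specific choice of which lexical H survives in a multi-word AP can be stated as a small parameterized function. The Python sketch below is hypothetical throughout: the `Word` class, the dialect keys, and the placeholder word labels are invented for illustration. It treats the North Gyeongsang "strong tendency" toward the leftmost H as categorical, and its fallback for a Yanbian AP without an accented noun is an assumption the chapter does not make. Focus, which in Yanbian can override these choices (§24.3.3), is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """A word inside one AP (hypothetical encoding for illustration)."""
    form: str
    has_lexical_h: bool
    is_noun: bool = False


def surviving_h(ap: list[Word], dialect: str) -> Optional[int]:
    """Index of the word whose lexical H survives, or None if no word has one.

    'north_gyeongsang': leftmost accented word (a tendency, made categorical here).
    'yanbian': an accented noun; with no accented noun, fall back to the
    leftmost accented word (assumption, not from the chapter).
    """
    accented = [i for i, w in enumerate(ap) if w.has_lexical_h]
    if not accented:
        return None
    if dialect == "north_gyeongsang":
        return accented[0]
    if dialect == "yanbian":
        nouns = [i for i in accented if ap[i].is_noun]
        return nouns[0] if nouns else accented[0]
    raise ValueError(f"unknown dialect: {dialect!r}")


def realize(ap: list[Word], dialect: str) -> list[tuple[str, bool]]:
    """Pair each word with whether it surfaces with its lexical H."""
    keep = surviving_h(ap, dialect)
    return [(w.form, i == keep) for i, w in enumerate(ap)]


if __name__ == "__main__":
    modifier_noun = [Word("modifier", True), Word("noun", True, is_noun=True)]
    object_verb = [Word("object", True, is_noun=True), Word("verb", True)]
    print(realize(modifier_noun, "yanbian"))           # H kept on the noun
    print(realize(modifier_noun, "north_gyeongsang"))  # H kept on the modifier
    print(realize(object_verb, "yanbian"))             # H kept on the object
```

Keeping the dialect as a parameter mirrors the point of the paragraph above: the one-H-per-AP constraint is shared, while the choice of surviving H varies.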

[Figure 24.2: waveform and f0 track, f0 axis 100–270 Hz, time axis 0–2.506 s, with syllable, gloss, and AP/ip/IP tiers]
Figure 24.2 Waveform and f0 track of example (10) produced as [{(jʌŋanɨn)(imoɾaŋ)(imobuɾaŋ)}ip {(jʌŋhwagwane kandejo)}ip]IP by a Chonnam Korean speaker.
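The Seoul/Gyeonggi AP tonal grammar described in §24.3.2 lends itself to a mechanical check. The Python sketch below is an illustration only, not Korean ToBI tooling: it hard-codes the 14 surface patterns listed in the text, applies the AP-initial tone rule (H after an aspirated or tense onset, L otherwise), and confirms that exactly four patterns realize all four tones. The segment labels in `ASPIRATED` and `TENSE` are assumed IPA encodings, and all names are hypothetical. It also shows that, of the eight possible three-tone H/L strings, only the flat HHH and LLL never occur in the list; the sketch records this without claiming a reason for it.

```python
from __future__ import annotations

from itertools import product

# AP-initial segment classes. The IPA labels are an illustrative assumption;
# the chapter states only "aspirated (including /s/ and /h/) or tense".
ASPIRATED = {"pʰ", "tʰ", "kʰ", "tʃʰ", "s", "h"}
TENSE = {"p͈", "t͈", "k͈", "s͈", "tʃ͈"}

# The 14 surface AP patterns listed in the text, as K-ToBI label tuples.
SURFACE_PATTERNS = [
    ("L", "+H", "L+", "Ha"), ("H", "+H", "L+", "Ha"),  # all four tones
    ("L", "+H", "Ha"), ("L", "L+", "Ha"), ("H", "L+", "Ha"),
    ("L", "Ha"), ("H", "Ha"),  # one or both medial tones undershot
    ("L", "La"), ("H", "La"), ("L", "+H", "La"), ("H", "+H", "La"),
    ("H", "L+", "La"), ("L", "+H", "L+", "La"), ("H", "+H", "L+", "La"),  # La
]


def ap_initial_tone(segment: str) -> str:
    """H if the AP-initial segment is aspirated or tense, otherwise L."""
    return "H" if segment in ASPIRATED or segment in TENSE else "L"


def tones(pattern: tuple[str, ...]) -> str:
    """Drop the K-ToBI position marks ('+', 'a') to get a bare H/L string."""
    return "".join(label.strip("+a") for label in pattern)


def candidate_patterns(initial_segment: str) -> list[tuple[str, ...]]:
    """Surface patterns compatible with the AP-initial tone rule."""
    t = ap_initial_tone(initial_segment)
    return [p for p in SURFACE_PATTERNS if p[0] == t]


FOUR_TONE = {tones(p) for p in SURFACE_PATTERNS if len(p) == 4}
THREE_TONE = {tones(p) for p in SURFACE_PATTERNS if len(p) == 3}
UNUSED_THREE_TONE = {"".join(t) for t in product("LH", repeat=3)} - THREE_TONE

if __name__ == "__main__":
    print(len(SURFACE_PATTERNS))      # 14
    print(sorted(FOUR_TONE))          # ['HHLH', 'HHLL', 'LHLH', 'LHLL']
    print(sorted(UNUSED_THREE_TONE))  # ['HHH', 'LLL']
    # Onsets of the five APs in (10): glide, vowel, vowel, glide, lax /k/.
    print([ap_initial_tone(s) for s in ["j", "i", "i", "j", "k"]])
```

The list is encoded as data rather than generated from optional tones because generating all 16 combinations of T, optional +H, optional L+, and Ha/La would overgenerate exactly the two combinations the text does not list, [H +H Ha] and [L L+ La]. The five L-initial APs predicted for (10) fit the [L +H Ha] pattern of its first AP.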

24.3.3 Korean intonation: prosodic phrasing

In all varieties of Korean, prosodic phrasing (the grouping of words into prosodic units) plays a major role in marking prominence and syntactic, semantic, and pragmatic information. As mentioned earlier, a narrowly focused word starts an ip, but an equally important prosodic feature of focus in accentless Korean dialects is post-focal dephrasing: the AP tones of post-focal words are deleted, and these words are incorporated into a large AP beginning with the focused word. However, when the post-focal string is long or syntactically heavy, these words tend to retain their AP structure but are realized in a compressed pitch range (S.-A. Jun and Lee 1998; S.-A. Jun 2011). In Gyeongsang Korean, where all words are lexically accented, a focused word begins an ip, but post-focal AP’s are not dephrased. Instead, the post-focal AP is either downstepped or upstepped, depending on the pitch accent type of the focused item. That is, the tonal interaction between AP’s described at the end of §24.3.2 is maintained, but, relative to the neutral-focus condition, the f0 of a downstepped AP is further lowered and that of an upstepped AP is further raised in the narrow-focus condition. Besides marking syntactic structure, prosodic phrasing also helps distinguish wh-questions from yes/no questions. Wh-words in Korean (/nuku/ ‘who’, /muʌt/ ‘what’, /ʌntʃe/ ‘when’, /ʌti/ ‘where’) can be interpreted as indefinite pronouns (‘anyone’, ‘anything’, ‘any time’, ‘anywhere’), depending on the AP phrasing of the word and the following verb. For example, /ʌti kasejo/ ‘where/anywhere go-honorific’ means ‘Where are you going?’ if the two words form one AP, but ‘Are you going anywhere?’ if there is an AP boundary between them (S.-A. Jun and Oh 1996; Yun 2018; Yun and Lee, in press).

Yanbian Korean additionally shows an interesting interaction between syntax and accentual phrasing. As mentioned in §24.3.2, all adnominal modifiers before a nominal syntactic head lose their lexical H when they form one AP with the head noun; likewise, when an object noun and the following verb form a single AP, it is the verb, not the object, that loses its lexical H. However, regardless of the type of syntactic head, a focused item always begins an AP and keeps its lexical H, while post-focal items lose their lexical H, unless the focused word is one of several prenominal modifiers of a heavy noun phrase (NP), in which case the post-focal head noun retains its lexical H. This constraint is so powerful that if a focused modifier is immediately followed by a head noun in a heavy NP, the modifier’s lexical H is not realized while the head noun’s is, creating a mismatch between the presence of H and focus. The deaccented focused modifier is realized with increased intensity and duration on its initial syllable, compensating for the absence of the lexical H. More details on focus realization in Yanbian Korean are provided in S.-A. Jun and Jiang (2019).

24.4 Conclusion

We have described the word prosody and intonation of Japanese and Korean and have shown that these languages have quite similar prosodic features. In both languages, word prosody varies across dialects in terms of whether pitch is used distinctively, whether the syllable or the mora is used as the TBU or as a measure of phonological distance, and whether the tonal system is a multi-pattern or an n-pattern system. The proportion of accented dialects would seem to be higher in Japanese than in Korean. For intonation, the languages share the same prosodic structure, with three intonationally defined prosodic units: an AP, an ip, and a U or IP. In sum, even though the prosodic system of Tokyo Japanese, with its lexical accents and mora-counting, is quite different from that of Seoul Korean, which is accentless and syllable-counting, the prosodic systems of these two languages appear to be quite similar if we consider a wider group of varieties.

Acknowledgement

The work reported in this chapter was supported by a University of California, Los Angeles, senate grant to the first author and JSPS KAKENHI grants (Grant nos. 19H00530, 16H06319, and 17K18502) and the NINJAL collaborative research project ‘Cross-Linguistic Studies of Japanese Prosody and Grammar’ to the second author.

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

chapter 25

Austronesia

Nikolaus P. Himmelmann and Daniel Kaufman

25.1 Introduction

The Austronesian languages cover a vast area, reaching from Madagascar in the west to Hawaiʻi and Easter Island in the east. With close to 1,300 languages, over 550 of which belong to the large Oceanic subgroup spanning the Pacific Ocean, they constitute one of the largest well-established phylogenetic units of languages. While a few Austronesian languages, in particular Malay in its many varieties and the major Philippine languages, have been documented for some centuries, most remain underdocumented and understudied. The major monographic reference work on Austronesian languages is Blust (2013). Adelaar and Himmelmann (2005) and Lynch et al. (2005) provide language sketches as well as typological and historical overviews of the non-Oceanic and Oceanic Austronesian languages, respectively. Major nodes in the phylogenetic structure of the Austronesian family are shown in Figure 25.1 (see Blust 2013: ch. 10 for a summary). Following Ross (2008), the italicized groupings in Figure 25.1 are not valid phylogenetic subgroups but rather collections of subgroups whose relations have not yet been fully worked out. The groupings in boxes, on the other hand, are thought to represent single proto-languages and have been partially reconstructed using the comparative method. Most of the languages mentioned in this chapter belong to the Western Malayo-Polynesian linkage, which includes all the languages of the Philippines and, with a few exceptions, of Indonesia. When languages from other branches are discussed, this is explicitly noted. The prosody of no Austronesian language has been studied to a degree that comes close to that of the well-studied European languages. There are a few specialist prosodic studies, but there is no comprehensive description of a prosodic system that covers how word-based prominence interacts with higher-level prominence.
Descriptive grammars, if they mention prosody at all, usually go no further than stating a default stress pattern (without providing evidence for a stress analysis) and making the odd remark on intonation patterns, which refers to pitch trajectories rather than attempting to identify phonologically relevant tonal targets.


AUSTRONESIA 371

[Figure 25.1: tree diagram]
Proto Austronesian
    Formosan
    Proto Malayo-Polynesian (MP)
        Western MP
        Proto Central/Eastern MP
            Central MP
            Proto Eastern MP
                Proto South Halmahera/West New Guinea
                Proto Oceanic

Figure 25.1 The Austronesian family tree (Blust 1999; Ross 2008).

This chapter is exclusively concerned with the two major word-based prosodies, lexical tone and lexical stress, and with phrase-based intonation, ignoring other word-related suprasegmental phenomena such as Sundanese nasal spreading, made famous by Robins (1957) and repeatedly taken up in the phonological literature (cf. Blust 2013: 238–241). Stress and tone have their own sections in Blust (2013), which, however, does not deal with intonation.

25.2 Lexical tone

Most Austronesian languages do not use tone to distinguish lexical items. Distinctive lexical tone has been reported only for a few geographically widely separated language groups, for which see Edmondson and Gregersen (1993), Remijsen (2001), Brunelle (2005), and Blust (2013: 657–659). This section provides some general observations and briefly reports on the cross-linguistically very unusual tone systems of the West New Guinea languages. See also chapter 23 for the Chamic subgroup of Western Malayo-Polynesian and chapter 26 for languages in New Guinea. In most instances, distinctive lexical tone in Austronesian languages is transparently due to contact influence, and these cases provide important evidence for transferred tonogenesis. Tonal distinctions are usually restricted, either phonotactically (e.g. contrast only on the final syllable) or with regard to the permissible tone patterns per word (e.g. words bear either high or low tone). Contact-induced tonogenesis often involves a shift from the disyllabic words characteristic of the family to monosyllables as the most common word type. Edmondson and Gregersen (1993) contains specialist chapters on a number of the better-studied Austronesian tone languages. The little that is known about the West New Guinea languages, mostly spoken on the islands along the Bird’s Head and Cenderawasih Bay in eastern Indonesia, points to a bewildering variety of word-prosodic systems. These languages are part of the extended contact zone between Austronesian and Papuan languages along the island of New Guinea.


372 NIKOLAUS P. HIMMELMANN AND DANIEL KAUFMAN

According to Remijsen (2007), monosyllabic words in Magey Matbat have a six-way tone contrast. From the few examples he gives, it appears that at least one syllable in polysyllabic words is toneless, but that the position of tone-bearing syllables is not predictable. This contrasts with Moor, which Kamholz (2014: 101–106) analyses as having four tonal patterns, largely confined to the final two syllables. More importantly, and rather unusually for a tone language, ‘tones are realized only on phrase-final words’ (Kamholz 2014: 102). Kamholz (2014: 116 and passim) also briefly mentions Yerisiam and Yaur as languages with complex word-tone systems plus contrastive vowel length. A particularly complex and cross-linguistically unusual word-prosodic system is found in Ma’ya. Remijsen (2001, 2002) makes a convincing case for an analysis in terms of both lexical stress and lexical tone. There are three tonal contrasts, confined to the final syllable. In addition, lexical bases differ in whether they are stressed on the penultimate or the ultimate syllable. Thus, there are minimal pairs that differ only in tone, e.g. /sa12/ ‘to sweep’ vs. /sa3/ ‘to climb’ vs. toneless /sa/ ‘one’ (Remijsen 2002: 596), and minimal pairs that differ only in stress, e.g. /ˈmana3/ ‘light (of weight)’ vs. /maˈna3/ ‘grease’ (Remijsen 2002: 600). Importantly, Remijsen (2002: 602–610) provides detailed acoustic evidence for the proposed stress difference, including not only duration measures but also differences in vowel quality and spectral balance.

25.3 Lexical stress

Many Austronesian languages are described as having primary stress on the penultimate syllable, more rarely on the ultima. Outside the Philippine languages, discussed below, stress is rarely claimed to be contrastive, and, if it is, the contrast usually applies only to a small subset of lexical items that are said to have final stress (e.g. Toba Batak; cf. van der Tuuk 1971). Structural correlates of stress, such as differing phoneme inventories for stressed and unstressed syllables, are relatively rare. Examples include a stress-dependent ə/o alternation in Begak (Goudswaard 2005); a stress-dependent umlaut process in Chamorro, which has figured in several theoretical studies beginning with Chung (1983); and pre-stress vowel reduction in the Atayalic languages (Li 1981; Huang 2018), among others. A number of Austronesian languages with iambic stress patterns are reported for western Indonesia. Some of these are clearly the result of contact with mainland South East Asian languages, as is the case with Acehnese (Durie 1985), Chamic (Thurgood 1999), and Moken (Larish 1999; Pittayaporn 2005). The iambic patterns of other languages in Borneo (e.g. Merap, the Segai-Modang group, Bidayuh) are also likely due to areal effects, although evidence for direct contact with speakers of non-Austronesian languages is unclear. In some of these languages, the iambic pattern has led to vowel breaking in the final syllable and vowel weakening in the initial syllable, as described by Smith (2017). Stress is often described impressionistically on the basis of words heard in isolation, without properly distinguishing between word-based and phrase-based prominence (a general problem for the study of word stress; cf. Hyman 2014c; Roettger and Gordon 2017). Work going beyond auditory impressions generally discusses stress only in terms of pitch, duration, and overall intensity; notable exceptions are Remijsen (2002) and Maskikit-Essed and Gussenhoven (2016), who also examine spectral balance.
Furthermore, Rehg (1993) claims that in Micronesian languages the major pitch excursions occurring on the penultimate syllable do not necessarily correlate with stress. Claims about stress that are based primarily on (impressionistic) evidence from pitch are difficult to evaluate without a more comprehensive account of the intonational system, which is usually lacking. In line with widespread assumptions of current prosodic theory, in particular the autosegmental-metrical framework of analysis for intonation (Gussenhoven 2004; Ladd 2008b), the present discussion assumes that the analysis of pitch trajectories does not necessarily presuppose the existence of metrically strong anchoring points (i.e. lexically stressed syllables). Rather, intonational targets may also be linked to the boundaries of prosodic phrases. Consequently, many claims about stress in Austronesian languages found in the literature need further investigation, which may show that a putative stress phenomenon is more insightfully analysed in other terms. An instructive example is the so-called definitive accent in Tongan. Definiteness was previously claimed to be marked here by shifting stress from the penult to the ultima, but it has now been convincingly argued that the relevant contrast is essentially one of vowel length (which is phonemic in Tongan) and involves an enclitic vowel (cf. Anderson and Otsuka 2006). The bulk of the specialist work on stress concerns Standard Indonesian, the variety of Malay serving as the national language of Indonesia. Like many other varieties of Malay, Indonesian has widely been claimed to have penultimate (primary) stress unless the penultimate syllable contains a schwa, in which case stress shifts to the ultima (see Halim 1981: ch. 2 for a summary of the early literature).
Beginning with Odé (1994), however, a group of Leiden phoneticians have questioned this view in acoustic and perceptual investigations of presumed stress phenomena in Indonesian (see also Zubkova 1966). Van Heuven and van Zanten (2007) provide a detailed report on this work, which also extends to three other Malayic varieties, namely Betawi (see also van Heuven et al. 2008), Manado, and Kutai Malay, as well as to Toba Batak. The main findings are as follows (see also Zuraidah et al. 2008 on Malaysian Standard Malay and Maskikit-Essed and Gussenhoven 2016 on Ambon Malay):

• Strong first language (L1) effects exist for the production and perception of potentially stress-related parameters in Indonesian, with L1 Javanese speakers showing the least clear evidence for stress.
• Speakers of Manado Malay and L1 Toba Batak speakers of Indonesian are more consistent in making a fixed (typically penultimate) syllable within words more prominent.
• Perceptually, speakers rate as roughly equivalent examples in which any one of the final three syllables is made acoustically prominent by manipulating pitch, duration, or overall intensity. Using a different methodology, Riesberg et al. (2018) find that speakers of Papuan Malay are unable to agree on which syllables are prominent in short excerpts of spontaneous narrative Papuan Malay speech. However, Kaland (2019) presents acoustic data that may indicate a subtle stress distinction outside IP-final positions.
• Prominence distinctions between words appear to lack a communicative function in Indonesian. Thus, in gating experiments, Indonesian speakers were unable to make use of prominence differences in the initial syllables. They were also unable to understand contrastive stress at the subword level (as in English ‘cof[FER] not cof[FIN]’), as shown by their inability to judge the pragmatic appropriateness of examples involving such contrasts (van Heuven and Faust 2009).

Much of this work argues that what has been analysed as word stress in Indonesian has no functional relevance for native speakers and that Indonesian and other varieties of Malay have no word-based prominence, a conclusion that nevertheless needs further scrutiny before it can be considered firmly established. Inasmuch as the evidence for stress in other Austronesian languages is similar to that invoked for Indonesian, this kind of argument may apply more widely. This is, in fact, hypothesized by Goedemans and van Zanten (2014), who propose a set of diagnostics for suspicious stress claims and note that these apply to a broad range of Austronesian languages. Some of the variation noted in older descriptions of Indonesian stress has, for example, also been described for dialects of Paiwan, a Formosan language of Taiwan, by Chen (2009b, 2009c). Chen observes that the southern dialect has invariable stress on the penultimate syllable, whereas the Piuma dialect shifts it to the ultima when the nucleus of the penultimate syllable is schwa. As with Indonesian, however, studies of stress and prosodic prominence in Formosan languages (e.g. Chiang and Chiang 2005 for Saisiyat) generally do not take phrasal phonology into account, so the same questions remain open. Philippine languages present different problems for stress typology, since the vast majority of them have a phonemic distinction in prosodic prominence at the root level. Zorc (1978) is an early attempt to understand the phenomenon from a historical perspective, and Zorc (1993) provides an overview. In Tagalog, the best-studied language of the Philippines, this prominence has been analysed as due either to underlying stress or to vowel length. Official Tagalog orthographic conventions for indicating stress imply that final stress is word-based (e.g. ‘beauty’) and that penultimate stress is unmarked (e.g. ‘head’). Both implications are probably wrong. First, what has been analysed as word-final stress is most likely a right-aligned edge tone, as further illustrated in §25.4. Second, it has been noted that roots with apparent ‘final stress’ in Tagalog are significantly more common than those with ‘penultimate stress’ (Blust 2013: 179), so penultimate stress or prominence should be considered the marked case. The situation becomes clearer when we examine words in non-final positions, as shown in (1), where all syllables except the final one would typically be pronounced with even duration, overall intensity, and pitch (the stress mark here simply indicates some form of perceived prominence).

(1) Tagalog
    a. [taˈŋa]
       stupid
       ‘stupid’
Official Tagalog orthographic conventions for indicating stress imply that final stress is word-based (e.g. ‘beauty’) and that penultimate stress is unmarked (e.g. ‘head’). Both implications are probably wrong. First, what has been analysed as word-final stress is most likely a right-aligned edge tone, as further illustrated in §25.4. Second, it has been noted that roots with apparent ‘final stress’ in Tagalog are significantly more common than those with ‘penultimate stress’ (Blust 2013: 179), and thus that penultimate stress or prominence should be considered the marked case. The situation becomes clearer when we examine words in non-final positions, as shown in (1), where all syllables except the final one would typically be pronounced with even duration, overall intensity, and pitch (the stress mark here simply indicates some form of perceived prominence).

(1) Tagalog
a. [taˈŋa]
   stupid
   ‘stupid’
b. [aŋ ta~taŋa niˈla]
   det pl~stupid 3pl note.gen
   ‘How stupid they are!’

Unlike final prominence, penultimate prominence in the Tagalog word does not disappear in non-final contexts such as the above. It does, however, shift with suffixation. For instance, [ˈbaːsag] ‘break’ becomes [baˈsaːgin] ‘break (patient voice)’ with the /-in/ suffix. Again unlike final prominence, penultimate prominence does not shift when the word is followed by clitics or other lexical material, e.g. [baˈsaːgin mo] ‘you break it!’. Prima facie, this looks like a proper stress system. Treating Philippine prosodic systems such as Tagalog as inherently stress based, however, leads to a paradox in which heavy syllables seem to repel stress. While roots with an open penult allow for penultimate (trochaic) or final (iambic) prominence, no trochaic pattern is possible when the penultimate syllable is closed, as shown in (2).

(2) Tagalog syllable structure and putative word-level prominence patterns
    open penult      ˈCV.CV(C)       CV.ˈCV(C)
    closed penult    *ˈCVC.CV(C)     CVC.ˈCV(C)

This, in conjunction with the divergent behaviour of penultimate and final prominence, strongly suggests that length is the underlying category of interest (following Schachter and Otanes 1972: 15–18; Wolff et al. 1991: 12; Zorc 1993; but contra Bloomfield 1917: 141–142; French 1988: 63–64f). The penultimate syllable of native roots can bear a long vowel, as in /baːsag/ ‘break’, or not, as in /taŋa/ ‘stupid’. Apparent final stress is thus post-lexical, appearing only when the word occurs in phrase-final position, whereas penultimate prominence reflects a lexically specified long vowel. Long vowels can only occur in open syllables, a common state of affairs, so the paradox of heavy syllables repelling stress is illusory. Long vowels cannot occur in final syllables, but this is also not unusual, as final-lengthening effects tend to neutralize length distinctions in final position (Barnes 2006: 151). What is unusual about Philippine prosodic systems such as that of Tagalog is that they instantiate ‘length shift’, a phenomenon far more familiar from stress systems. This can be explained as a structure-preservation effect, given that long vowels never appear in (native) roots earlier than the penultimate syllable; length shift thus preserves this generalization in suffixed words. Iterative stress (typically, with trochaic feet) has been posited for a number of Austronesian languages as well.1 It is worth noting that, with only very few exceptions (e.g.
French 1988, which does not contain phonetic evidence), iterative stress has only been demonstrated for Western Malayo-Polynesian languages lacking phonemic prominence distinctions on the word level. The Sama languages of the southern Philippines have no length distinction on the penultimate syllable and are perhaps the best candidates for possessing iterative (right-aligned trochaic) stress (Allison 1979 for Sibutu; Walton 1979 for Pangutaran; Brainard and Behrens 2002 for Yakan). A similar iterative footing is reported for Oceanic languages and tentatively reconstructed for Proto-Oceanic by Ross (1988: 18). The main areas of investigation within right-aligned trochaic systems have been the integration of suffixes and clitics, the treatment of the left edge of the prosodic word (e.g. whether or not it contains an initial dactyl), and the often irregular behaviour of vowel hiatus. Zuraw et al. (2014) exemplify this line of inquiry for Samoan, while Buckley (1998) examines related issues in the much discussed stress pattern of Manam (Lichtenberk 1983; Chaski 1985).
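The length-based analysis of Tagalog prominence described above can be restated as a small rule sketch. The Python below is a hypothetical toy model, not the authors’ formalism. It assumes that words arrive pre-syllabified as CV(C) strings and that a root is lexically marked as having a long penult or not. It also assumes that a vowel-initial suffix such as /-in/ resyllabifies a stem-final consonant; vowel-final stems are not handled. The names `attach_suffix`, `surface`, and `long_penult` are illustrative only.

```python
"""Toy model of the length-based analysis of Tagalog prominence (illustrative only)."""
from typing import List

VOWELS = set("aeiou")
LONG = "ː"


def attach_suffix(stem: List[str], suffix: str) -> List[str]:
    """Attach a suffix; a stem-final coda consonant becomes the onset of a vowel-initial suffix."""
    sylls = list(stem)
    last = sylls[-1]
    if suffix[0] in VOWELS and last[-1] not in VOWELS:
        sylls[-1] = last[:-1]              # e.g. 'sag' -> 'sa'
        sylls.append(last[-1] + suffix)    # 'g' + 'in' -> 'gin'
    else:
        sylls.append(suffix)
    return sylls


def surface(sylls: List[str], long_penult: bool, phrase_final: bool) -> str:
    """Mark the prominent syllable with ˈ.

    A lexically long vowel always surfaces on the penult of the (possibly
    suffixed) word, which models 'length shift'. Without a long vowel,
    prominence is only an IP-final edge effect on the last syllable.
    """
    out = list(sylls)
    marked = None
    if long_penult and len(out) >= 2:
        assert out[-2][-1] in VOWELS, "long vowels occur only in open syllables"
        out[-2] += LONG
        marked = len(out) - 2
    elif phrase_final:
        marked = len(out) - 1
    return "".join(("ˈ" if i == marked else "") + s for i, s in enumerate(out))


print(surface(["ba", "sag"], long_penult=True, phrase_final=True))   # ˈbaːsag
print(surface(attach_suffix(["ba", "sag"], "in"), True, True))       # baˈsaːgin
print(surface(["ta", "ŋa"], long_penult=False, phrase_final=True))   # taˈŋa
print(surface(["ta", "ŋa"], long_penult=False, phrase_final=False))  # taŋa
print(surface(["bas", "bas"], long_penult=False, phrase_final=True)) # basˈbas
```

Under these assumptions, length lands on the penult of the suffixed form, as in [baˈsaːgin], and it stays there in non-final position. A root without a long vowel, including one with a closed penult such as /basbas/, is prominent only on its final syllable, and only phrase-finally.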

1 Such a pattern was also posited by Cohn (1989) for non-Javanese Indonesian. It is likely that the divergence between these analyses and the more recent stressless analyses discussed above is partly due to differences in substrate between the varieties under examination (e.g. Toba Batak vs. Javanese) and partly due to differences in methodology (stress perception tests vs. impressionistic evaluation of words in isolation).


25.4 Intonation

Stoel (2005, 2006) has proposed analyses of the intonation of Manado Malay and Banyumas Javanese, the essential features of which have also been found for Malaysian Malay (Zuraidah et al. 2008), Waima’a (Himmelmann 2010, 2018), and Ambon Malay (Maskikit-Essed and Gussenhoven 2016). See also Halim (1981) for a description of Indonesian in a pre-autosegmental framework. There is still no well-established standard analysis of intonation in Austronesian languages, and much of what is reported here remains tentative. In fact, for most branches too little is known about intonation to provide a basis for even the most basic observations. (For the Oceanic branch, see e.g. Rehg 1993; Clemens 2014; Jepson 2014; Calhoun 2015. For the Formosan languages, there are Chiang et al. 2006; Karlsson and Holmer 2011; and a number of Taiwanese MA theses.) Despite the great diversity of Austronesian languages and the scarcity of detailed analyses, several generalizations may still apply widely. We have not encountered any languages with post-focus compression, although this may not be surprising given the large number of languages for which it is thought to be absent (Xu et al. 2012). Relatedly, it appears impossible for most Austronesian languages to express narrow focus on a sub-constituent of a clause or noun phrase with intonation alone. Narrow focus is achieved through syntactic or morphological means, potentially in conjunction with a particular intonational pattern, but the role of intonation in focus marking is clearly more limited than in a language like English (Kaufman 2005). In many languages, the tones aligned with the edges of phrases and utterances are the only tonal targets that surface with any consistency. As is cross-linguistically common, we find an association between H-L% and declarative statements, contrasting with an L-H% combination for polar interrogatives.
This can be seen in the Waima’a and Totoli examples in (3) and (4).2

(3) Waima’a (elicited)
    ne  de  kara haru  lumu
    3sg neg like shirt green
    ‘S/he doesn’t like the green shirt.’

(4) Totoli (elicited)
    isei nangaanko      saginna
    who  av.rls:eat:and banana:3sg.poss
    ‘Who ate his/her banana?’

Figure 25.2 f0 and edge tones for (3). [Plot: Pitch (Hz) against Time (s); tonal labels H– L%.]

Figure 25.3 f0 and edge tones for (4). [Plot: Pitch (Hz) against Time (s); tonal labels H$, L–, H%.]

In addition to the bitonal target at the right edge, intonational phrases (IP) are often (but not necessarily; cf. Figure 25.2) divided into phonological phrases (φ), which usually end on a high target, represented by H$ in Figure 25.3. Note that all tonal annotations in this section pertain only to targets that are clearly recognizable in the fundamental frequency (f0) extraction. Sub-IP phrasing is too little understood to allow for speculation as to whether IP’s are exhaustively divided into φ’s (with deletion of the φ-final boundary tone). Where they exist, φ’s are of variable size, but they are often larger than a single phonological word and may span complete (subordinate) clauses. Himmelmann (2018) provides more details for Waima’a and Totoli.

2 Throughout this section we will refer to the combination of the two tonal targets as ‘edge tones’, the final pitch excursion as a ‘boundary tone’ (T%), and the target preceding the latter as the ‘prefinal target’ (T-). We will return to issues in the analysis of this prefinal target below.

The cues for prosodic phrasing are generally intonational in Austronesian languages and can be subtle. Downstep of H tones within φ has been observed in all the languages discussed here, with the exception of IP-final excursions, which can be considerably higher (arguably due to H%). Higher-level durational effects remain largely unexplored, although this is a potentially rich area for uncovering the mapping of syntactic structure to φ’s. Richards (2010: 165–182), for example, explores the structure of higher prosody in Tagalog with a view towards syntactic analysis and suggests an algorithm for locating edge tones at initial φ-edges. Hsieh (2016) examines Tagalog verb durations in two conditions and finds evidence for closer prosodic integration of transitive subjects with preceding verbs when compared with objects. However, not all cues for phrasing are prosodic. In Rotuman and many of the languages of the Timor area, VC-metathesis appears to be rooted in phonological phrasing. In Amarasi, a language of West Timor, as described by Edwards (2016: 3), metathesis occurs in several phrase-medial contexts but not between a subject and its predicate, as seen in (5). Edwards (2016: ch. 2) provides an exhaustive review of the literature on prosodic metathesis in Austronesian and an in-depth analysis of Amarasi.

(5) Amarasi
a.  faut  koʔu
    stone big
    ‘(a) big stone’
b.  fatu  koʔu
    stone big
    ‘Stones are big.’

There are also segmental effects in Philippine languages that depend on prosodic phrasing. In Tagalog, for instance, we find glottal stop deletion, final vowel lowering, and intervocalic tapping (/d/→[ɾ]), all of which appear to be diagnostic of prosodic boundaries, although they are also subject to speech rate effects. Recall from §25.3 that the major evidence for stress in most Austronesian languages is observed in pitch trajectories on words spoken in isolation (i.e. as short IP’s), which may in fact reflect either IP-final edge tones or lexical stress. While it is widely acknowledged that in the national standard varieties of Indonesian and Malaysian Malay IP edge tones are not systematically associated with either of the two unit-final syllables, this association appears to be more systematic in many other varieties, including Toba Batak, Waima’a, and eastern varieties of Malay (Manado, Papuan). Here, the prefinal target appears to be regularly associated with the penultimate syllable and the final boundary tone with the final syllable, giving rise to the widely made claim of regular penultimate stress. If this association of edge tones were exceptionless, it would not warrant an analysis in terms of lexical stress, because the pitch trajectory can be fully described with reference to the IP boundary (a rise starting at the beginning of the penultimate syllable, reaching the peak at the boundary to the final syllable, etc.). In Papuan Malay, the penultimate intonational prominence in IP-final position disappears in phrase-medial position, as seen in (6) (see also Figure 25.4).

(6) Papuan Malay (elicited)
a.  baju
    shirt
b.  baju  mera
    shirt red

Many languages with a prominent penultimate syllable in the IP have been claimed to possess a small group of words that in IP-final position trigger prominence on their final syllable. Such a state of affairs suggests a stress difference, as proposed by Stoel (2005) for Manado Malay, for instance. However, Maskikit-Essed and Gussenhoven’s (2016) careful investigation of Ambon Malay, for which essentially the same claims as for Manado Malay have been made (cf. van Minde 1997), shows that the f0 peak of the prefinal rise does not clearly align with either the final or the prefinal syllable. Instead, the best predictor of its placement is the duration of the IP-final rhyme, syllable, or word. These authors therefore propose an analysis of the IP-final edge tone combination as ‘floating boundary tones’ (HL%). Obviously, further investigations are needed to determine whether such an analysis can also be supported for other instances of a presumed stress difference based on what is heard as a difference in alignment of the IP-final edge tone combination.

Figure 25.4 f0 and edge tones for (6). [Plot: Pitch (Hz) against Time (s); tonal labels H– L% for baju and H– L% for baju mera.]

Other languages, however, clearly do associate IP-final edge tones with specific syllables. Thus, in Tagalog, they are regularly associated with the final syllable, unless the penultimate syllable contains a long vowel, in which case the prefinal target is typically reached on the long vowel. Central Philippine languages vary as to whether closed penultimate syllables attract prosodic prominence. In Cebuano, for example, a closed penult coincides with prosodic prominence, as in /tanʔaw/ [ˈtanʔaw] ‘look’ (Wolff 1972: x) (cf. the Tagalog example in (1)). An underlying form such as /basbas/ thus appears to have final prominence in Tagalog [basˈbas] but penultimate prominence in Cebuano [ˈbasbas] (cf. Shryock 1993; Zorc 1993: 19). In yet other Philippine languages, such as Itawis and Pangasinan, IP-final edge tone association is apparently not predictable even in words with a closed penult (Blust 2013: 177).

25.5 Prosodic integration of function words

Among languages displaying word-level prominence, there are two strategies for handling clitics and function words. In Philippine languages, second-position clitics are distinct from lexical roots in not satisfying (disyllabic or bimoraic) minimality constraints (Kaufman 2010; Billings and Kaufman 2004). In most of these languages, the last syllable of the clitic, not that of the lexical host to its left, attracts the IP-final edge tone combination, as shown in (7), where it occurs on the final clitic /ba/. No prominence at all is accorded to the final syllable of the verb /binili/ in (7) (see also Figure 25.5), which instead carries the utterance-initial H tone characteristic of Tagalog on its first syllable.

Figure 25.5 f0 and edge tones for (7). [Plot: Pitch (Hz) against Time (s); segments b i n i l i m o b a; tonal labels %H and L+H%.]

(7) Tagalog (elicited)
    binili=mo=ba?
    buy=2sg.gen=qm
    ‘Did you buy (it)?’

The initial H typically docks to the first word of the IP but not necessarily to its first syllable, as seen in Figure 25.6, where it is aligned with the final syllable of the proclitic /maŋa/. The initial and final pitch targets in this typical Tagalog pattern completely ignore the lexical/functional and prosodic word/clitic distinctions. In terms of duration, note that only syllables with lexically specified vowel length or with length derived through compensatory lengthening after ʔ-deletion emerge as long. No word-based durational effects other than vowel length are present in the elicited example in (8) (see also Figure 25.6).

Figure 25.6 f0 and edge tones for (8). [Plot: Pitch (Hz) against Time (s); tonal labels %H and H–L%.]

(8) Tagalog (elicited)
    maŋa=baːtaʔ=ŋaʔ=pala=sila
    pl=child=emph=mira=3pl.nom
    [maŋa baːtaː ŋaː pala sila]
    ‘They are really children!’

In the Pamona-Kaili and South Sulawesi languages of Sulawesi, on the other hand, function words are treated differently from lexical words: only the latter project pitch accents. For Uma, a Pamona-Kaili language, word-based prominence is aligned with the penultimate syllable of a window that includes the lexical word itself and a small number of adverbial clitics such as /mpuʔu/ ‘really’ and /oaʔ/ ‘anyway’ (Martens 1988). A large number of other types of clitic, including those with pronominal, aspectual, and adverbial functions, are excluded from this window. Martens’s (1988) observations are of interest because they concern prominence in what is generally a non-final element in the Uma clause, the verb. A similar pattern can be found in the closely related Kulawi language (Adriani and Esser 1939: 9), where prominence is associated with the penultimate syllable of the word including suffixes but excluding pronominal and adverbial clitics.3 Enclitics do not attract prominence even when they are disyllabic, as in [ˈhou=kami] house=1pl.excl.gen ‘our house’. Unlike the Philippine languages discussed above, Kulawi and Uma appear to make a clear distinction between lexical and functional words, with pitch targets anchoring only to lexical words. Note how the pitch targets in (9) (see also Figure 25.7) are anchored to the penultimate syllable of the lexical word, excluding the genitive clitics. Similarly, in (10) (see also Figure 25.8) the function word /padena/ ‘then’ and the pronominal clitic are not associated with pitch excursions.

(9) Kulawi (from a spoken narrative)
    nam-peˈgika  ˈdike=na     no-pa-ˈdapa       hiˈnoko=ra
    rls.av-wait  dog=3sg.gen  rls.av-caus-hunt  prey=3pl.gen
    ‘. . . his dog was waiting while he was hunting their prey.’
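The window-based placement of prominence in Kulawi and Uma can be sketched as a small procedure. The Python below is a hypothetical illustration, not an analysis from the cited sources. It assumes that the lexical word, including its affixes, and each clitic arrive as lists of syllables, with morpheme-internal hyphens dropped. Each clitic carries a flag saying whether it belongs to the prominence window, as Uma adverbial clitics such as /mpuʔu/ are reported to. The window is taken to end at the first excluded clitic. The syllabification of /hou/ as ho.u and the function name `mark_prominence` are assumptions.

```python
"""Toy model of window-based penultimate prominence in Kulawi/Uma (illustrative only)."""
from typing import List, Tuple

Clitic = Tuple[List[str], bool]  # (syllables, does it count toward the prominence window?)


def mark_prominence(word: List[str], clitics: List[Clitic]) -> str:
    """Put ˈ on the penultimate syllable of the prominence window.

    The window is the lexical word (with its affixes) plus any directly
    following clitics flagged as window-internal; it stops at the first
    excluded clitic. Pieces are joined with '=' in the output.
    """
    pieces = [list(word)] + [list(sylls) for sylls, _ in clitics]
    n_in_window = 1
    for _, inside in clitics:
        if not inside:
            break
        n_in_window += 1
    # (piece index, syllable index) for every syllable inside the window
    window = [(p, i) for p in range(n_in_window) for i in range(len(pieces[p]))]
    p, i = window[-2] if len(window) >= 2 else window[-1]
    pieces[p][i] = "ˈ" + pieces[p][i]
    return "=".join("".join(piece) for piece in pieces)


print(mark_prominence(["hi", "no", "ko"], [(["ra"], False)]))        # hiˈnoko=ra
print(mark_prominence(["ho", "u"], [(["ka", "mi"], False)]))         # ˈhou=kami
print(mark_prominence(["mo", "mu", "li"], [(["ko", "mi"], False)]))  # moˈmuli=komi
```

Because excluded clitics never enter the window, a disyllabic pronominal enclitic such as /=kami/ cannot attract prominence, which matches the Kulawi facts cited above.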

Figure 25.7 f0 and tonal targets for (9). [Plot: Pitch (Hz) against Time (s); word labels nam-pegika, dike=na, no-pa-dapa, hinoko=ra; tonal labels H* (four times) and L%.]

3 Similar patterns are found in South Sulawesi languages, such as Bugis (Sirk 1996) and Makassarese (Jukes 2006).

Figure 25.8 f0 and tonal targets for (10). [Plot: Pitch (Hz) against Time (s); word labels padena, momuli, komi; tonal labels H* and H–.]

(10) Kulawi (from a spoken narrative)
     padena  mo-ˈmuli=komi
     then    irr.av-create=2pl.nom
     ‘you (go) create’

The prosodic framework sketched in this chapter is not sufficient to capture all relevant intonational contrasts. Thus, in Manado Malay and the Javanese Palace language (see chapters in van Heuven and van Zanten 2007), questions may involve a (more or less) continuous rise across most of an IP, usually after a minor initial drop. Furthermore, echo questions may have specific features such as a higher pitch level than the preceding statement. In fact, there appear to be various ways to expand the IP at the right edge, after the IP-marking edge tone combination. Stoel (2005), for example, reports that Manado Malay has the option of adding a single further phrase after the IP edge tones; this phrase tends to be flat and involves a highly compressed pitch range. There are also various options for what may be termed ‘intonational clitics’, often determiners or conjunctions, which may occur after the IP-marking edge tone combination. This is similar to the compression found on postposed reporting clauses and vocatives in English (e.g. ‘Don’t do it!’ she said).

25.6 Conclusion

Until very recently, Austronesian languages had contributed little to our understanding of prosodic typology. We are now in a position to enumerate several unusual features of these languages that require special attention from theoretical, typological, and experimental approaches. First of all, the possibility that many languages of Indonesia lack word-based prominence must be taken seriously and examined with a more diverse range of tools. Corpus investigations, which are necessary to reveal the predictive power of the production and perception studies cited in the preceding sections, are lacking. The theoretical implications of ‘stresslessness’ have also gone unexplored. Does ‘stresslessness’ simply indicate that intonational and durational prominence are anchored to higher levels within an otherwise normal hierarchical prosodic structure, or does it imply a restructuring or even a lack of structure at the level of the prosodic word? To answer this, evidence for prosodic structure must be culled from segmental alternations, phonotactic generalizations, morphophonological processes, acquisition, and elsewhere. Only then will we be able to say whether a lack of word-based prominence is diagnostic of more profound structural differences. Philippine languages, on the other hand, present us with the problem of mobile (and morphologically conditioned) vowel length, its relation to syllable type, and the anchoring of pitch movements, areas that show interesting cross-linguistic variation within the group. Making progress here will require abandoning (at least temporarily) the composite notion of ‘word stress’. For each language under investigation, the determinants of segment and syllable duration, pitch movements, overall intensity, and other possible measures such as spectral tilt must be treated as a priori independent dimensions, with special attention to keeping word-level, phrase-level, and utterance-level effects distinct. It may emerge that Austronesian languages indeed provide broad support for the ‘no stress type’. On the other hand, we cannot yet rule out that a careful reassessment of prosodic organization in some of the apparently stressless languages could reveal subtle word-based prominence patterns that are occluded by higher-level prosodic phenomena.

chapter 26

Australia and New Guinea

Brett Baker, Mark Donohue, and Janet Fletcher

A number of Papuan languages have contrastive lexical tonal systems of various types (syllable- or word-based, with varying numbers of tones), while no Australian languages are described as tonal. A pervasive feature of Australian languages is the association of stress with the initial syllables of polysyllabic morphemes, but this pattern appears to be absent from Papuan languages. Papuan languages are described in terms of both iambic and trochaic metrical systems. In Australia, evidence for foot structure is hard to come by, though no Australian languages are described as having a straightforward iambic metrical system. The intonation systems of Australian languages are not unlike those of European languages, though with a smaller number of tunes. This aspect of Papuan languages is, however, poorly documented.

26.1 General metrical patterns in Australia

The few, largely impressionistic, descriptions of the word prosody of Australian languages present a fairly uniform picture. With just a couple of exceptions, these languages have stress that is fully predictable on the basis of morphological structure, word edges, and, to some extent, syllable structure.1 For the most part, they are described as having root-initial stress, as well as stress on the initial syllable of polysyllabic suffixes (though see Tabain et al. 2014 for a cautionary note). Hale’s (1977: 15–16) account of stress in Warlpiri is perhaps the earliest description of this type, pointing out that morphologically governed stress provides contrasts between segmentally identical strings such as /ˈjapa-ˌɭaŋu-ɭu/ ‘person-also/even-ergative’ and /ˈjapaɭa-ˌŋuɭu/ ‘father’s mother-elative’.

1 The only well-described exception to this statement that we know of is Ndjébbana (McKay 2000), although Evans (1995) and Round (2009) both claim that suffixes differ in whether they are ‘underlyingly-stressed’ or not in Kayardild.

AUSTRALIA AND NEW GUINEA 385

Some examples from Warlpiri are shown in (1) (adapted from Nash 1986: 102; Baker 2008: 159).2 Note that initial stress is unaffected by word length. The location and realization of secondary stress depend on whether a syllable can form a foot with a following syllable without disrupting the foot structure of polysyllabic suffixes. Final syllables are never stressed.

(1)                   ‘man’                ‘tree’            ‘spinifex plain’
    root              ˈwati                ˈwatija           ˈmanaŋˌkara
    root-loc          ˈwati-ŋka            ˈwatiˌja-ɭa       ˈmanaŋˌkara-ɭa
    root-loc-erg      ˈwati-ˌŋka-ɭu        ˈwatija-ˌɭa-ɭu    ˈmanaŋˌkara-ˌɭa-ɭu
    root-loc-too-erg  ˈwati-ŋka-ˌɭaŋu-ɭu
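The generalizations in (1) lend themselves to a procedural statement. The Python sketch below is a hypothetical reconstruction fitted only to the forms in (1) and to Hale’s pair above; it is not Nash’s or Baker’s analysis. It assumes that words arrive pre-segmented into morphemes and syllables, with the root first. It builds disyllabic trochees from the left edge of the root and of each polysyllabic suffix, pairs up runs of monosyllabic suffixes from the left, and lets a single leftover monosyllabic suffix form a foot with an unfooted syllable immediately before it. How longer runs of monosyllabic suffixes behave is an extrapolation, and the function name `warlpiri_stress` is illustrative.

```python
"""Toy model of morphologically governed stress in Warlpiri, fitted to the forms in (1)."""
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]  # (morpheme index, syllable index)


def warlpiri_stress(morphemes: List[List[str]]) -> str:
    """Stress a word given as [root, suffix1, ...], each a list of syllables."""
    marks: Dict[Pos, str] = {}
    pending: Optional[Pos] = None  # unfooted syllable at the end of the previous morpheme

    def foot(m: int) -> None:
        # disyllabic feet from the left edge of a root or a polysyllabic suffix
        nonlocal pending
        sylls = morphemes[m]
        for s in range(0, len(sylls) - 1, 2):
            marks[(m, s)] = "ˌ"
        pending = (m, len(sylls) - 1) if len(sylls) % 2 else None

    foot(0)
    m = 1
    while m < len(morphemes):
        if len(morphemes[m]) > 1:
            foot(m)  # polysyllabic suffix starts its own foot; a pending syllable stays unfooted
            m += 1
            continue
        run: List[Pos] = []  # maximal run of monosyllabic suffixes
        while m < len(morphemes) and len(morphemes[m]) == 1:
            run.append((m, 0))
            m += 1
        if len(run) % 2 and pending is not None:
            marks[pending] = "ˌ"  # pending syllable and first suffix form a foot
            run = run[1:]
        for k in range(0, len(run) - 1, 2):
            marks[run[k]] = "ˌ"
        pending = run[-1] if len(run) % 2 else None
    marks[(0, 0)] = "ˈ"  # main stress on the root-initial syllable
    return "-".join(
        "".join(marks.get((m, s), "") + syl for s, syl in enumerate(sylls))
        for m, sylls in enumerate(morphemes)
    )


print(warlpiri_stress([["wa", "ti"], ["ŋka"], ["ɭa", "ŋu"], ["ɭu"]]))  # ˈwati-ŋka-ˌɭaŋu-ɭu
print(warlpiri_stress([["wa", "ti", "ja"], ["ɭa"]]))                   # ˈwatiˌja-ɭa
print(warlpiri_stress([["ja", "pa", "ɭa"], ["ŋu", "ɭu"]]))             # ˈjapaɭa-ˌŋuɭu
```

Since every stressed syllable heads a two-syllable foot, no word-final syllable can be stressed, which matches the observation above.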

Deviations from this pattern are infrequent and largely concern cases where sequences of monosyllabic suffixes are not footed, or where main stress can fall on a foot later than the first (typically, the last). Languages of this type include Dyirbal (Dixon 1972: 274), Ngiyambaa (Donaldson 1980: 42), Diyari (Austin 1981: 31), Yankunytjatjara (Goddard 1986: 27), Martuthunira (Dench 1994: 43), Kayardild (Evans 1995; Round 2009), Eastern Arrernte (Henderson 1998: 211), Nhanda (Blevins 2001b: 24–28), and Warlpiri (Nash 1986: 100), to name just a handful from widely dispersed parts of the continent. ‘Prefixing languages’ (mainly of the non-Pama-Nyungan (NPN) group of the central north and northwest of the continent) also appear to follow the same general pattern, e.g. Ngan’gityemerri (Reid 1990: 89), Ngalakgan (Baker 2008), Jingulu (Pensalfini 2000, 2003), Limilngan (Harvey 2001), and Bininj Gun-wok (Evans 2003). In prefixing NPN languages, the metrical domain is often preferentially associated with the root, leaving strings of prefixes unstressed (and apparently unfooted), regardless of the relative sizes of the prefix string and the verb stem. In (2c), for instance, there are four unfooted syllables before the main stress on the first root in the word.

(2) Ngalakgan (Baker 2008: 169)3
a. ŋuruɳ-ˈɻu
   12pl.o-burn.pr
   ‘we’re getting burnt’
b. ŋuruɳ-mu-ˈɳe
   12pl.o-veg-burn.pr
   ‘It [sc. ‘sun’, veg] burns us’
c. ŋuruɳ-pu-pak-ˈpoɭk+pu+n
   12pl.o-3pl-appl-noise+hit+pr
   ‘they are making noise on us’ (i.e. ‘talking over the top of us’)
d. jiriɳ-pi-pak-ˈwoc+ma
   1pl.o-3pl-appl-steal+get.pr
   ‘they always steal from us’

2 Note that Warlpiri (like some other Australian languages; Baker 2014: 152) has case allomorphy determined by the moraic size of the stem, which is the reason for the difference in the form of the locative suffix in column one (‘man’) vs. the other stems here.

3 The verb glosses in small caps here are the denotations of these finite verbs when used independently; here, they function as verbalizers enabling a non-finite ‘coverb’ constituent to function as a predicate, and lack their ordinary independent meaning.

386 BRETT BAKER, MARK DONOHUE, AND JANET FLETCHER

Prefixing languages vary in this respect, however. In neighbouring, closely related Rembarrnga (McKay 1975), examples analogous to those in (2) take initial stress, at least as an option. In Ngan’gityemerri, Reid (1990: 89) finds that noun class markers can be either internal or external to the metrical domain (the former thereby classified as ‘prefixes’ and the latter as ‘proclitics’). In Bardi, according to Bowern et al. (2012), nouns and verbs differ with respect to whether prefixes are included (nouns) or not (verbs) within the metrical domain. To our knowledge, there are to date no perceptual studies of word stress in Australian languages. There are a few instrumental studies, but the picture they paint of the acoustic correlates of stress is far from clear. There is some suggestion that the post-tonic consonant, rather than the tonic (i.e. main stressed) vowel, may show distinctive lengthening in some languages, such as Warlpiri (Pentland 2004), Bininj Gunwok (henceforth BGW; Stoakes et al. 2019), Mawng (Fletcher et al. 2015), and Djambarrpuyngu (Jepson et al., in press). This process is also described impressionistically for Umpila (Levin 1985: 136) and Kukatj (Breen 1992). The widespread isomorphism between morphological and prosodic words (henceforth Mwords and Pwords) in Australian languages demands further research. However, it appears that not all morphological relations are relevant to metrical structure (Baker 2008; and see Hore 1981 for an early move in this direction with respect to Nunggubuyu). The contrast between (3a) and (3b) shows that inflectional suffixes on verbs do not constitute metrical domains, despite being (arguably) polysyllabic here, unlike the bound dative pronominal clitic in (3b). Examples (3b) and (4b) show that in Ngalakgan, as in Warray (Harvey and Borowsky 1999), monosyllabic CV roots must have a long vowel regardless of affixation, suggesting that roots constitute Pwords on their own. The (a) examples show that this requirement does not hold of verb roots, which do not undergo vowel lengthening when affixed.4

(3) Ngalakgan (Baker 2008: 103)
a. [ˈɟaŋana]     /ca+ŋana/    stand+fut      ‘will stand’
b. [ˌgeːˈŋini]   /ke=ŋini/    son=1sg.dat    ‘my son’

(4)
a. [ˈbuni]       /pu+ni/      hit+irr        ‘might hit’
b. [ˈboːwi]      /po-wi/      river-lat      ‘along the river’

In most Australian languages, polysyllabic case suffixes form their own metrical domain, but tense suffixes characteristically do not, as in Ngalakgan. In a few cases, such as Arrernte and Ngan’gityemerri, where tense suffixes are productive on verbs in the same way as case suffixes typically are on nouns, tense suffixes can constitute domains for stress (see e.g. Henderson 1998; Reid 1990/2011).

4 The use of the ‘+’ boundary symbol here, as in Baker (2008), is meant to indicate a tighter degree of juncture between verb roots and tense affixes than between noun roots and nominal suffixes (‘-’) or clitics (‘=’).


26.1.1 Quantity and peninitial stress

A good number of Australian languages have contrastive vowel length (Fletcher and Butcher 2014), and long vowels generally appear to be positively correlated with metrically strong positions. In some languages, the position of long vowels affects the location of word stress, as in Yidiny (Dixon 1977: 39), Ngiyambaa (Donaldson 1980: 40), Martuthunira (Dench 1994: 42), varieties of Bandjalang (Crowley 1978), Nhanda (Blevins 2001a: 24), Yukulta (Keen 1983: 198), Wubuy/Nunggubuyu (Hore 1981), Bardi (Bowern et al. 2012), and Kayardild (Evans 1995: 79; Round 2009). In others, long vowels occur only in metrically strong syllables, typically the word-initial syllable, as in Djambarrpuyngu and other Yolngu varieties (Heath 1980; Morphy 1983; Wilkinson 1991) as well as in Warlpiri (Nash 1986), Umpila (Harris and O’Grady 1976: 166), Yankunytjatjara (Goddard 1986: 11), and (under some interpretations) Yidiny (e.g. Dixon 1977; Nash 1979; Hayes 1995). We know of no Australian languages where vowel length is contrastive without a concomitant interaction with the metrical system as described here. Closed syllables rarely appear to behave as heavy syllables in Australian languages, an exception being Bardi (Bowern et al. 2012). However, Baker (2008) describes closed-syllable weight in a few Northern languages, including Ngalakgan and BGW (which both lack contrastive vowel length). Remarkably, homorganic clusters (nasal-stop clusters and long or geminate stops) fail to contribute weight to a preceding syllable in these languages. Apart from these cases and the long-vowel patterns noted above, quantity is not a common factor in Australian metrical systems. Besides the large number of languages with metrical patterns like those of Warlpiri, there is a smaller, widely dispersed group of languages that are described as having stress on the peninitial (i.e. second) syllable, a cross-linguistically uncommon pattern (Hayes 1995: 73). They are also characterized by loss of initial consonants, again a cross-linguistically highly unusual feature (Blevins 2001b; Dixon 2002: 589–602; Kümmel 2007: 108, 126). Languages in this category include the Northern Paman group in North Queensland (Hale 1964), Mbabaram of the southern group (Dixon 1991), the Arandic group in Central Australia (Koch 1997), the Anaiwan (or Nganyaywana) language of the New England highlands in northern New South Wales (Crowley 1976), and Nhanda in the north of Western Australia, which exhibits initial consonant dropping but lacks peninitial stress (Blevins and Marmion 1994). Examples (5) and (6) are from Eastern Arrernte and show, respectively, peninitial stress in surface vowel-initial words and initial stress in surface consonant-initial words. A very similar pattern is described for other Arandic languages, such as Alyawarra (Yallop 1977: 43), as well as for Northern Paman languages such as Uradhi (Hale 1976: 44). The analysis of the metrical system, and the syllable structure, of Arandic languages is a subject of ongoing debate (see e.g. Breen and Pensalfini 1999; Panther et al. 2017; Topintzi and Nevins 2017).

(5) Eastern Arrernte (Henderson 1998: 210)
a. ampe           ‘child’          /amp/            [ˈɐmbɐ] ~ [ɐmˈbɐ]
b. ampere         ‘knee’           /amp.əɻ/         [ɐmˈbəɻə]
c. amperneme      ‘burn+pr’        /amp.əɳ.əm/      [ɐmˈbəɳəmɐ]
d. arrernelheme   ‘put+refl+pr’    /ar.əɳ.əl̪.əm/    [ɐˈɾəɳəˌl̪əmɐ]

388 BRETT BAKER, MARK DONOHUE, AND JANET FLETCHER

(6)
 a. the ‘1SG:ERG’ /ət̪/ [ˈt̪ɐ]
 b. theme ‘poke+PR’ /ət̪.əm/ [ˈt̪əmɐ]
 c. theleme ‘pour+PR’ /ət̪.əl.əm/ [ˈt̪ələmɐ]
 d. thelelheme ‘pour+REFL+PR’ /ət̪.əl.əl̪.əm/ [ˈt̪ələˌl̪əmɐ]

The nature of metrical structure in Australian languages and its relationship with word structure clearly deserve more detailed investigation. Even less attention has been devoted to the interesting issue of text-to-tune alignment in song language, which in many cases appears to have musical (rhythmic, i.e. isochronous) beats that do not correspond with prominent syllables in speech (see e.g. Turpin 2008).
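The surface generalization in (5) and (6) can be restated as a small procedure. The sketch below is an illustration under stated assumptions, not the chapter's analysis (the Arandic analysis is debated): the onset-based surface syllabification is supplied by hand and is hypothetical, primary stress goes to the first syllable with a consonant onset, and secondary stress goes to every second syllable after that except the word-final one. This rule was chosen only because it reproduces the transcriptions in (5b–d) and (6).

```python
# Illustrative sketch only: a surface-based restatement of the Eastern
# Arrernte forms in (5)-(6). The syllabification is supplied by hand and
# is an assumption, not the analysis of Henderson (1998).

VOWELS = set("aeiouɐəɪ")


def has_onset(syllable: str) -> bool:
    """True if the syllable begins with a consonant."""
    return syllable[0] not in VOWELS


def arrernte_stress(syllables: list[str]) -> list[str]:
    """Return one mark per syllable: 'ˈ' primary, 'ˌ' secondary, '' none."""
    n = len(syllables)
    marks = [""] * n
    # Primary stress: first syllable with a consonant onset (else the first).
    primary = next((i for i, s in enumerate(syllables) if has_onset(s)), 0)
    marks[primary] = "ˈ"
    # Secondary stress: every second syllable after the primary,
    # but never on the word-final syllable.
    for i in range(primary + 2, n - 1, 2):
        marks[i] = "ˌ"
    return marks


def render(syllables: list[str]) -> str:
    """Join the syllables with their stress marks, IPA style."""
    return "".join(m + s for m, s in zip(arrernte_stress(syllables), syllables))


print(render(["ɐm", "bə", "ɻə"]))               # ampere 'knee'
print(render(["ɐ", "ɾə", "ɳə", "l̪ə", "mɐ"]))    # arrernelheme 'put+REFL+PR'
print(render(["t̪ə", "lə", "l̪ə", "mɐ"]))        # thelelheme 'pour+REFL+PR'
```

Under this rule ampe surfaces only as [ɐmˈbɐ]; the free variant [ˈɐmbɐ] in (5a) is not modelled.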

26.2 Intonation in Australian languages

In addition to metrical word structures of the type found in languages such as English, Australian languages have intonation (i.e. patterns of pitch, duration, and intensity) that can extend across a stretch of speech to create the characteristic intonational tunes of a language. To date, detailed descriptions are available for only a handful of Australian languages. These include Wik Mungkan (Sayers 1976a, 1976b, probably the earliest descriptions of this type), Dyirbal (King 1998), BGW (Bishop 2003; Bishop and Fletcher 2005; Fletcher and Evans 2000), Kayardild (Round 2010; Ross 2011), Dalabon (Evans et al. 2008; Ross 2011; Fletcher 2014), Jaminjung (Simard 2010), Yukulta (Bonnin 2014), and Warlpiri (Pentland 2004). Generally, tunes that mark out definable ‘chunks’ of speech in these languages reflect universal trends: falling intonation often signals finality and completion of a discourse segment, and high pitch signals incompleteness or non-finality. As in many languages of the world, there is pre-boundary lengthening at the right edge of intonational constituents. Pause location and duration also contribute to the delimitation of higher-level prosodic constituents, such as intonational phrases (IPs) (e.g. Fletcher and Evans 2002; Ross 2011; Fletcher et al. 2016) and the utterance (Round 2010). A particularly striking tune that occurs frequently in narrative discourse in many Australian languages is a stylized high-level plateau accompanied by extreme lengthening of the phrase-final vowel. As first noted by Sharpe (1972), this tune-cum-lengthening is generally associated with prolonged, ongoing, or continuous actions (Yallop 1977; Heath 1984; Bishop 2003; Ross 2011; Fletcher 2014).

Most research on intonation in Australian languages is couched within the autosegmental-metrical framework. Tone targets (prominence-lending pitch accents, e.g. H* and L+H*, and boundary tones, e.g. L%, H%, and LH%) combine as intonational morphemes to create different utterance melodies. Tones may demarcate different levels of prosodic structure, such as the phonological or accentual phrase, or the IP (Fletcher and Evans 2000; Bishop 2003; Bishop and Fletcher 2005; Fletcher 2014). As a general rule, the tone inventories of the Australian languages analysed to date tend to be smaller than those of languages such as English (e.g. Pierrehumbert 1981; Beckman et al. 2005), German (Grice et al. 1995), Greek (Arvaniti and Baltazani 2005), and Dutch (Gussenhoven 2005). Most analyses of Australian languages assume a structural relationship between pitch accents and main stress, as in BGW, Kayardild, Mawng, and Dalabon (for overviews see Fletcher and Evans 2002; Bishop 2003; Fletcher and Butcher 2014). For other languages, it has been suggested that post-lexical prosody is phrasal, with post-lexical pitch events demarcating phrase edges (e.g. see Tabain et al. 2014 for Pitjantjatjara). There may be prosodic variation among and within Australian languages, but to date this has been documented only in cross-dialect analyses of BGW (Bishop and Fletcher 2005). In terms of Jun’s (2014b) typology, most languages thus fit into the head/edge-prominence marking category (e.g. BGW) or have edge-prominence marking at the level of the phrase (e.g. Pitjantjatjara). Tones mostly demarcate the left and right edges of intonational constituents, with a relatively sparse distribution of tones within them. A preliminary study of HL patterns within IPs in Mawng and Pitjantjatjara suggests that these languages have relatively strong macro-rhythm (following the typology described by Jun 2014b), but it is not clear to what extent this is typical of the area.

Analyses of intonational functions show that changes in pitch span, pitch register or range, prosodic phrasing, and to a lesser extent pitch accent dynamism contribute to the signalling of utterance modality (e.g. interrogative vs. declarative) and information structure. Interrogatives in many Australian languages are realized with sharply falling intonation from a pitch peak that is generally located on the interrogative marker, which is often sentence-initial. This tune resembles the question intonation of languages such as Hungarian (see Ladd 2008b: 115–118 for an overview), except that the pitch fall is initiated at the IP’s left edge, regardless of whether there is an interrogative word. An example of this tune is presented in Figure 26.1 for the Northern Australian language Mawng. Here, the highest pitch peak of the IP is due to an L+H* accent on the wh-word ŋanti [ˈŋɐndɪ] ‘who’, while the rest of the IP is realized with a highly marked falling pitch pattern. Polar questions are also often realized with a similar pitch contour. This contrasts sharply with the typical ‘flat hat’ contour of Mawng declaratives, which is usually (H*) H* L%.

[Figure 26.1: f0 track (Hz) with a word tier (nganti, j(a)ingalangaka, werrk), a tone tier (L+H*, !H*, Lp, H*, L%), and the question word marked]

Figure 26.1 f0 contour for a wh-question in Mawng: ŋanti calŋalaŋaka werk ‘Who is the one that she sent first?’, with a high initial peak on the question word ŋanti ‘who’.

Information structure categories, including narrow focus, also tend to be signalled by wider pitch span of the prosodic phrase containing the focal constituent, followed by pitch range compression of post-focal material. However, we do not observe dephrasing or deaccenting in this context (i.e. the accent pattern remains intact). Constituents in pragmatic focus bear the highest pitch prominence of the utterance (often a super-high pitch peak ^H* or L+^H*) and are often realized in their own minor or major IP (e.g. Singer 2006; Hellmuth et al. 2007; Fletcher et al. 2016 for Mawng; Bishop 2003 for BGW). Word order also plays an important role in the marking of information structure. The left edge of clauses is pragmatically significant in a range of Australian languages (e.g. Mushin 2005 for Garrwa), and discourse-prominent constituents are often left-dislocated (Simpson and Mushin 2008), with the result that the constituent ends up in an IP of its own. In sum, manipulations of phrasing, pitch register, and pitch span, rather than pitch accent dynamism or nuclear accent placement (typical of Germanic languages), indicate discourse prominence in a range of Australian languages (for topic marking in Jaminjung, see Simard 2010; Schultze-Berndt and Simard 2012). Modifications to global pitch trends within a discourse segment are thus an important index of discourse segmentation and information structure in Australian languages, similar to patterns observed in many other languages of the world. Fletcher and Evans (2000) compared global and local pitch topline trends across successive IPs in a corpus of BGW narratives and found that a pitch range reset at the start of a discourse segment often signals topic shift, and that final lowering (an extra-low boundary tone) often signals the end of a major discourse segment. Similar pitch register patterns have been observed in Dyirbal, Dalabon, and Jaminjung.
Additionally, an upward pitch register shift is often a key indicator of reported speech in a wide range of languages, including Alawa, Dalabon, and Kayardild (Sharpe 1972; Evans et al. 1999; see also Round 2010; Ross 2011). Unexpectedly, prosodic and morphosyntactic structures are closely related in Australian languages. Generally, IPs line up with major clauses (e.g. Ross et al. 2016). Because words in these languages are often morphologically complex, clauses tend to consist of relatively few words: IPs average 2.4 Mwords in Dalabon, 2.3 in Kayardild (Ross et al. 2016), and 1.9 in BGW (Bishop and Fletcher 2005). Strikingly, grammatical words in Dalabon can be broken into two IPs, a drastic mismatch between prosodic structure and morphological wordhood (Evans et al. 2008; see also Baker 2018). Conversely, prosodic integration may play an important role in signalling morphosyntactic structure in many of these languages. In Dalabon and BGW, for instance, two complex verbal constructions may occur in a single IP as a means of indicating subordination or semantic cohesiveness (e.g. Evans 2003; Bishop and Fletcher 2005; Ross et al. 2016), often creating very long IPs as a result.

26.3 Word prosody in New Guinea

Despite their separation from mainland Asia by Island Southeast Asia, the languages of New Guinea have word-prosodic systems that fall within the general typological spectrum found in mainland Asia. While relatively simple word-prosodic systems dominate to the west and, to a lesser extent, to the east (see chapter 25), New Guinea itself is home to a rich variety of stress systems, seasoned with a wide variety of tonal phonologies, and frequently served with a mixture of the two. The languages of New Guinea therefore contrast with the fairly homogeneous prosodic systems of Australia described in §26.2. This section presents some of the variety of prosodic systems in New Guinea, both western (Indonesia) and eastern (Papua New Guinea (PNG)), with no pretence at an exhaustive survey or even a typological overview.

26.3.1 Stress

Stress is frequently on the word-initial syllable in the languages of New Guinea, perhaps more so towards the north of the island, though trochaic penultimate stress is common (e.g. initial stress in Alamblak [Sepik; East Sepik, PNG] and Mesem [Trans New Guinea; Morobe, PNG]; penultimate stress in Ambulas [Sepik; East Sepik, PNG] and Wambon [Trans New Guinea; Boven Digul, Indonesia]). In many cases, stress is insensitive to weight; in some languages, stress location is sensitive to morphosyntactic factors. The language One [Torricelli; Sandaun, PNG] regularly assigns stress to the penultimate syllable, but verbs allow stress on either syllable of the final foot, resulting in contrasts such as [ˈte.re] ‘chop’ and [te.ˈre] ‘howl’. While forms such as the noun [ˈte.li] ‘round thing’ would be equally legitimate for a verb, *[te.ˈli] with final stress is not a possible noun. Weight-sensitivity does, however, occur in some stress systems. In Tifal [Trans New Guinea; Sandaun, PNG], stress is assigned to the first syllable of the word unless there is a heavy syllable later in the word. If the word contains one or more long vowels, the first of these receives primary stress; if there are no long vowels, a final closed syllable attracts stress. In Yimas [Lower Sepik-Ramu; East Sepik, PNG], stress is initial, but epenthetic vowels are preferentially not assigned stress. When the initial vowel is underlying, as in /ŋa.riŋ/, stress is initial, but a word like /t.kay/ ‘nose’ is realized with an iambic pattern: [tʰə̆ˈkʰaj], *[ˈtʰəkʰaj]. If both vowels in the initial foot are epenthetic, stress is assigned to the first of these: /ŋmkŋn/ ‘underneath’ is realized as [ˈŋəmɡə̆ŋɨ̆n]. Stress is never realized to the right of the initial foot in the word, showing that the left alignment of stress holds both within and outside the foot, and that epenthetic vowels are visible for stress assignment.
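The Tifal rule is a simple priority ordering: first long vowel, else a final closed syllable, else the initial syllable. The sketch below restates it under an explicit assumption: since the text gives no Tifal word forms, syllables are reduced to two hypothetical weight features (`long_vowel`, `closed`), and the test words are abstract rather than real Tifal data.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """Hypothetical weight features; the text gives no Tifal forms."""
    long_vowel: bool = False  # syllable contains a long vowel
    closed: bool = False      # syllable ends in a consonant


def tifal_primary_stress(word: list[Syllable]) -> int:
    """0-based index of the primary-stressed syllable, per the rule in the text."""
    if not word:
        raise ValueError("empty word")
    # 1. If the word has long vowels, the first of them takes primary stress.
    for i, syl in enumerate(word):
        if syl.long_vowel:
            return i
    # 2. Otherwise a final closed syllable attracts stress.
    if word[-1].closed:
        return len(word) - 1
    # 3. Default: stress on the word-initial syllable.
    return 0


LIGHT = Syllable()
LONG = Syllable(long_vowel=True)
CLOSED = Syllable(closed=True)

print(tifal_primary_stress([LIGHT, LIGHT, LIGHT]))   # no heavy syllable
print(tifal_primary_stress([LIGHT, LONG, LONG]))     # first long vowel wins
print(tifal_primary_stress([LIGHT, LIGHT, CLOSED]))  # final closed syllable
```

Ordering the checks this way encodes the text's statement that long vowels outrank a final closed syllable.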
Blevins and Pawley (2010) present a treatment of the phonological ambiguity of epenthetic vowels in Kalam [Trans New Guinea; Morobe, PNG] that elaborates on the Yimas case. Kanum [Maro and Upper Morehead Rivers; Merauke, Indonesia] stress is a right-aligned trochee, as shown in /kæj.kæj/ ‘skin’ [ˈkæjkæj] and /jɒ.kɒ.mpjæ.to/ ‘bad’ [ˌjɒkɒˈmpjæto] (Donohue 2009). In contrast to Yimas, epenthetic vowels are ignored by the foot structure, such that any lexical vowel is stressed in preference to an epenthetic vowel. A sole lexical vowel in the word is lengthened, as in /j.kæ/ ‘faeces’ [jĭˈkæː], *[ˈjɪkæ]. The invisibility of the epenthetic [ɪ] is shown by /ntæ.mɛ.tj/ ‘grandfather’, which has the structure CVCVCC and surfaces with initial stress, [ˈntæmɛtɪj], not *[ntæˈmɛtɪj]: the final surface syllable [tɪj] does not contain an underlying vowel and so is not footed. Likewise, /jelwpæ/ ‘paddle’ shows initial stress, [ˈjɛlʊ̆pæ], and the syllabification of /w/ is not visible to foot assignment. When there are no lexical vowels in the word, stress is assigned trochaically but is aligned to the left. While alignment in a word like /kj.kj/ [ˈkɪkɪ] ‘language’ is ambiguous, the pronunciations of /k.s.pl/ ‘cuscus’ and /k.lk.lwn/ ‘sky’ as [ˈkəsə̆ˌpə̆ɭ] and [ˈkəɭkə̆ˌɭʊ̆n] (and not *[kə̆ˈsəpə̆ɭ], *[kə̆ɭˈkəɭʊ̆n]) show initial stress and thus left alignment of the trochee. While the language is therefore generally trochaic, the edge of the word from which feet are aligned depends on the presence of lexical vowels in the word. Note the change in stress when the dative [-nɛ] is added to /kspl/: in [kə̆sə̆pə̆ɭˈnɛː], final stress and vowel lengthening are both assigned to the sole lexical vowel, on the right edge.
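The Kanum generalizations can be condensed into a rule for primary stress: count only syllables with lexical vowels, stress the penultimate one (or the sole one), and fall back to left alignment when there are none. The sketch below is a hypothetical restatement of the text, not Donohue's (2009) analysis; the syllabification of each form and the `has_lexical_vowel` flag are assumptions, and secondary stress and lengthening are not modelled.

```python
def kanum_primary_stress(syllables: list[tuple[str, bool]]) -> int:
    """0-based index of the primary-stressed syllable in Kanum.

    Each syllable is (surface form, has_lexical_vowel). Assumed restatement:
      * feet are built only over syllables with lexical vowels, and the
        trochee is right-aligned, so the penultimate such syllable (or the
        sole one) carries primary stress;
      * if no syllable has a lexical vowel, the trochee is left-aligned.
    """
    lexical = [i for i, (_, has_lexical_vowel) in enumerate(syllables)
               if has_lexical_vowel]
    if len(lexical) >= 2:
        return lexical[-2]
    if len(lexical) == 1:
        return lexical[0]
    return 0


# Forms from the text; the syllabification is an assumption.
examples = {
    "'grandfather'": [("ntæ", True), ("mɛ", True), ("tɪj", False)],
    "'faeces'": [("jĭ", False), ("kæː", True)],
    "'cuscus'": [("kə", False), ("sə̆", False), ("pə̆ɭ", False)],
    "'cuscus' + dative": [("kə̆", False), ("sə̆", False), ("pə̆ɭ", False), ("nɛː", True)],
}
for label, word in examples.items():
    print(label, "->", word[kanum_primary_stress(word)][0])
```

The single function captures the switch of alignment edge that the text describes: adding the dative [-nɛ] moves stress from the left edge to the sole lexical vowel on the right.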

26.3.2 Tone

Tone is reported in approximately 30% of the languages of New Guinea. Languages with rich tone systems show strong areal concentrations, as shown in Map 26.1. The Lakes Plains in the northwest, the central north coast where the Skou family is found, and the eastern highlands around Kainantu and Goroka are regions where elaborate tone systems are the norm. Languages between these areas, especially along the central cordillera and the floodplains to its south, are likely to be tonal, but with only two or three melody contrasts. A smaller area is found among languages of the Onin peninsula in the far west of the island, where both Austronesian and non-Austronesian languages show tone systems.

Map 26.1 Locator map for tonal elaboration in New Guinea. Grey circle: language with no tone; dark grey square: language with a two-way contrast in tone; black circle: language with three or more tonal contrasts.

The Kiwai languages, like other languages of southern New Guinea, have uncomplicated word-prosodic systems. Arigibi Kiwai [Kiwai; Western, PNG] shows a culminative constraint on tone assignment: a word must have at least one L and may have at most one H, which can fall on any of its syllables. As a result, there is no prosodic contrast in monosyllabic words, which are always L, and there is a three-way contrast on disyllables (LL, HL, and LH). An accentual analysis (words with a H vs. words with no H) would have to specify that a monosyllable cannot be accented. Elaborations of this system may have more than one melody on the accented syllable, as in Fasu [Trans New Guinea, possibly Papuan Gulf; Southern Highlands, PNG], where H and L contrast on the accented syllable (unaccented syllables bear a L). Monosyllables thus contrast H and L, while disyllables contrast HM, LM, MH, and ML, with the mid pitch assigned to the unaccented syllables. Folopa [Trans New Guinea? Papuan Gulf?; Southern Highlands, PNG] has a similar system, with H, L, and Ø contrasting only on the final syllable of the word; all syllables without a H or L are realized as mid. Skou [Skou; Jayapura, Indonesia] shows contrastive accent realized only with the HL tone (other tones are H, L, LH, and LHL; Donohue 2003). Newman and Petterson (1990) offer a clear example of a language with word melodies, Kairi [Papuan Gulf?; Gulf, PNG]. Words are lexically specified for one of four melodies: H, LH, HL, and LHL. On monosyllables these produce high, rising, falling, and peaking contours, respectively. In polysyllabic words, the last syllable must be associated with the second-last tone of the melody. For instance, in /hakane/ ‘grasshopper’, the HL melody is realized as a high-high-fall pitch contour, [há.ká.nê], not *[há.kà.nè], which might be expected if tones were assigned from the left (7).

(7)

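 /hakane/ ‘grasshopper’, melody HL
 σ σ σ + HL → H linked to the final syllable and spreading leftwards, L linked to the final syllable: [há.ká.nê] [˥ ˥ ˥˩]
 not: H linked to the first syllable, L spread over the second and third: *[há.kà.nè] *[˥ ˩ ˩]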

A small set of words obeys the additional condition that the second-last tone in the melody must be pre-associated with both the final syllable and the penultimate syllable, while any third-last tone is also associated with the penultimate syllable. In (8), we see that /kakiha/ ‘bamboo (sp.)’ has a LHL tone melody, [kà.kǐ.hâ], with H pre-associated to both the final and the penultimate syllable and the first L pre-associated to the penultimate syllable.

(8)

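 /kakiha/ ‘bamboo (sp.)’, melody LHL with pre-association
 σ σ σ + LHL → first L linked to the first and penultimate syllables, H linked to the penultimate and final syllables, final L linked to the final syllable: [kà.kǐ.hâ] [˩ ˩˥ ˥˩]
 not *[˩ ˩ ˥˩], *[˩ ˥ ˥˩]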

Kairi thus has contrastive tone melodies and contrastive pre-association links. Many variations on tone systems with such word melodies can be found; the papers in McKaughan (1973) and James (1994) provide excellent overviews of the complexities found in the eastern highlands of New Guinea.

The description of Sikaritai [Lakes Plains; Central Mamberamo, Indonesia] by Martin (1991) shows a language in which each syllable, the tone-bearing unit (TBU), can be assigned a H or a L independently of any other syllable in the word. Monosyllables thus show a two-way contrast between H and L; disyllables show a four-way contrast between HH, HL, LH, and LL; and trisyllables show all eight possible combinations: HHH, HHL, HLH, HLL, LHH, LHL, LLH, and LLL. Affixes are independently specified for tone, and no interactions between root tone and affixal tone have been noted. The absence of word melodies in Sikaritai is unusual for a language of New Guinea. Odé (2000) presents a detailed description of another language with unconstrained assignment of tones to syllables, Mpur [isolate; Tambrauw, Indonesia].

Iha [Trans New Guinea; Fakfak, Indonesia] shows a consistent, non-contrastive final fall across all roots, analysable as H assigned to the entire root and a L% affecting the final syllable, creating a HL fall. This can be seen in words of one to four syllables: [tâp] ‘stick’, [kɔ́.rêt] ‘food’, [hí.ŋgé.rî] ‘bandicoot (sp.)’, [á.ɣó.βá.dê] ‘banana slug’ (see (9a)). The only exceptions involve initial epenthetic vowels, in which case a word-initial %L is realized, interacting with the H to produce a mid tone on the first syllable, as in [wə̄sájmbî] ‘fish (sp.)’, [˧ ˥ ˥˩], from /wsajmbi/. The lack of contrasts in root melodies suggests that a non-tonal analysis of prosody would be appropriate. Tonal contrasts are, however, present on grammatical suffixes. The demonstrative suffix [-(j)ɔ] ‘this’ is not associated with any lexical tone, and a root with this suffix behaves as if it were one syllable longer, as shown in (9b). By contrast, the allative suffix [-na] has an underlying L tone. It is raised to mid pitch by the H of the root and forms a fall through combination with the L% boundary tone; see (9c). Thus, while tone is not contrastive on lexical roots, it plays a role in the phonology of grammatical morphemes.

(9) a. Isolation: tap H, L% [˥˩]; kɔret H, L% [˥ ˥˩]
 b. Proximate: tap-(j)ɔ H, L% [˥ ˥˩]; kɔret-(j)ɔ H, L% [˥ ˥ ˥˩]
 c. Allative: tap-na H, L, L% [˥ ˧˩]; kɔret-na H, L, L% [˥ ˥ ˧˩]
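The contrast between the unconstrained Sikaritai system and the culminative Arigibi Kiwai system described above can be made concrete by enumerating the H/L strings each allows. In this sketch the function names are hypothetical, syllables are treated as the TBU in both languages, and only the constraints stated in the text are encoded.

```python
from itertools import product


def sikaritai_patterns(n: int) -> list[str]:
    """Every H/L string over n syllables: each TBU is specified independently."""
    return ["".join(p) for p in product("HL", repeat=n)]


def arigibi_kiwai_patterns(n: int) -> list[str]:
    """Strings with at least one L and at most one H (culminative constraint)."""
    return [p for p in sikaritai_patterns(n) if "L" in p and p.count("H") <= 1]


for n in range(1, 5):
    print(n, len(sikaritai_patterns(n)), arigibi_kiwai_patterns(n))
```

The Sikaritai inventory doubles with each added syllable (2^n patterns, e.g. the eight trisyllabic combinations listed above), whereas the Arigibi Kiwai constraint allows only n + 1 patterns for words of two or more syllables and a single pattern (L) for monosyllables.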

The languages of Yapen island, in the northwest of New Guinea, show complex prosodic systems that are not easily characterized as stress or tone, as generally understood. Taking data from Saweru [West Papuan; Yapen Islands, Indonesia], we find contrasts such as those shown in (10), indicating that lexically contrastive prosody is a feature of the language. (10)

 tá.ni ‘meat, body, name’ ta.ní ‘mat’
 mán.da ‘starfish’ man.dá ‘sea urchin’

A simple analysis in terms of lexical stress is complicated by the presence of contrastive stress patterns on monosyllabic roots. In isolation, the roots in (11) are prosodically identical. When prefixed for person, they are associated with different stress patterns, with stress remaining on the root of ‘tooth’ but shifting to the prefix with ‘forget’. Bringing in an additional root, ne ‘pregnant’, we can see a third behaviour under prefixation, with stress on the initial syllable of the word (12). The data in (11) and (12) indicate not only that at least some of the feet in the language are ternary rather than binary, but also that stress is contrastive in different positions in that foot, and that monosyllabic roots can project contrastive foot structure. Ternary feet are also attested in Sentani [East Bird’s Head-Sentani; Jayapura, Indonesia] (Elenbaas 1999).

(11) tu ‘tooth’ i.na.tú ‘my tooth’
 ne ‘forget(fulness)’ i.ná.ne ‘I forgot’

(12) ne ‘pregnant’ í.na.ne ‘I’m pregnant’

These data might suggest a non-metrical system, similar to the Arigibi Kiwai case mentioned earlier. The difference is that Saweru has a clear metrical structure: when the trochaic words /tá.ni/ and /u.ma.ná.naj/ are prefixed, we can see their binary foot structure projected onto the prefixes through the evidence of secondary stresses, as shown in (13). When the iambic root /man.dá/ ‘sea urchin’ is prefixed, an iambic foot is projected. When /ta.ní/ ‘mat’, a root specified for a ternary foot with final stress, is prefixed, a ternary foot is projected, and when /wo.wé.nam.be/ ‘swamp’, which has a ternary foot with initial stress, is prefixed, we again see the lexically specified foot structure and prominence assignment replicated on the prefix.

(13)

 tá.ni ‘body’ ì.na-táni ‘my body’
 ùmanánaj ‘hornbill’ i.mà.ma-ùmanánaj ‘our hornbill’
 mandá ‘sea urchin’ ì.ma.mà-mandá ‘our sea urchin’
 taní ‘mat’ i.mà.ma-taní ‘our mat’
 wowénambe ‘swamp’ i.mà.ma-wowénambe ‘our swamp’

The distribution of the wide variety of word-prosodic systems in New Guinea tends not to be defined by genetic groupings, but does show areal trends. More investigation is evidently required into the Papuan languages off the main island of New Guinea.

26.4 Intonation in Papuan languages

Only in the past 15 years have descriptions of intonation in Papuan languages extended beyond basic accounts of declarative versus interrogative patterns (e.g. Levinson 2010), Pence (1964) being an early exception. These studies have revealed different patterns of interaction between tone and intonation. Some languages, such as Kuot [isolate; PNG], show no interaction between intonation and stress (Lindström and Remijsen 2005), while in Fataluku [Timor-Alor-Pantar] excrescent H tones appear in certain contexts (Stoel 2008). Intonation overrides lexical tone in Skou, where contrastive topics are mapped to a LHL contour that is not found lexically and that eliminates the lexical contour associated with the word.

26.5 Conclusion

Although Australia and New Guinea are among the world’s most sparsely investigated areas, this chapter has identified a relatively homogeneous prosodic blueprint for the languages of Australia and a strikingly high level of word-prosodic diversity among the languages of Papua New Guinea. Both areas urgently call for more research, since unique data are undoubtedly still waiting to be discovered.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 27

North America

Gabriela Caballero and Matthew K. Gordon

27.1 Introduction

The indigenous languages of North America display considerable prosodic diversity. Represented in the 34 language families and 28 language isolates of the continent is a striking array of stress, tone, and intonation systems, the study of which has enhanced typological knowledge of prosody. This chapter provides an overview of several prosodic features in North American languages¹ and their interaction with each other and with morphology. Although many languages are discussed here, we use the 30 North American Indian languages in the World Atlas of Language Structures (WALS) 200-language sample as the basis of our overview. This genetically broad sample is representative of the diversity of the prosodic systems found in North America. We discuss genetic and (to the extent they are apparent) geographically defined areas sharing prosodic features. North American languages are traditionally classified into several linguistic areas (for discussion see Goddard 1996; Campbell 1997; Mithun 1999). The languages of the Southwest, which include the Uto-Aztecan languages spoken in northwestern and central Mexico, are variously classified in areal linguistic surveys as North American or as Central or Middle American, since the southern periphery of the Southwest cultural/linguistic area has been contested in the anthropological literature. These languages, which are still spoken by large numbers of speakers, are of particular relevance here since they remain largely understudied and are prosodically complex. Our overview supplements the WALS sample with sources covering languages of this and other areas.

1 We adopt Campbell’s (1997) operationalization of the boundary between North and Middle America, which is placed at the Pánuco River in Central Mexico. We discuss languages and language families spoken north of this boundary (see also chapter 28).


NORTH AMERICA 397

27.2 Stress in North American Indian languages

Stress refers to increased prominence associated with one or more syllables in a word (see chapters 5 and 9). Stress has various language-specific phonetic exponents, including increased duration, greater intensity, higher fundamental frequency (f0), and shifts in spectral characteristics (Gordon and Roettger 2017). Stress is a relational property assessed across syllables and potentially entails multiple degrees of stress. For example, in the Plains Cree [Algic, Algonquian; Canada] word /pasaˌkwaːpiˈsimoˌwin/ ‘Shut-Eye dance’ (Wolfart 1996: 431), primary stress falls on the fifth syllable, secondary stress falls on other odd-numbered syllables counted from the right, and even-numbered syllables are unstressed.

27.2.1 Typology of stress in North American Indian languages

In North America, all major classes of stress system are represented. Stress systems fundamentally differ in whether the location of stress(es) is predictable from phonological properties or is a lexical property of individual words or morphemes. Yuchi [isolate; United States] exemplifies lexical stress, e.g. /ˈʃaja/ ‘squirrel’ vs. /ʃaˈja/ ‘weeds’ (Wagner 1933: 309). In other languages, the position of stress is predictable from distance to a word edge and/or the internal structure of syllables. In many languages, stress occurs at a fixed distance from the word edge. Thus, in Plains Cree, primary stress falls on the antepenult, the third syllable from the end of a word. In Lakhota [Siouan; United States] (Boas and Deloria 1941; Rood and Taylor 1996), stress falls on the second syllable, e.g. /waˈjawa/ ‘he reads’, /juˈwaʃte/ ‘to make good’, /ʧʰãˈhãpi/ ‘sugar’, /oˈtxũwahe/ ‘village’ (Rood and Taylor 1996: 444). The dichotomy between predictable and lexical/morphological stress is better viewed as a continuum than as a categorical distinction, since most languages with predictable stress have at least some words with exceptional stress. Lakhota, for example, has words with initial stress, e.g. /ˈsapa/ ‘black’, /ˈkʰata/ ‘hot’ (Rood and Taylor 1996: 442). As it turns out, second-syllable and antepenultimate stress are relatively rare among stress locations. More common are three other positions: initial stress, as in Southeastern Pomo [Pomoan; United States] (Moshinsky 1974; Buckley 2013); penultimate stress, as in Nahuatl [Uto-Aztecan; Mexico] (Beller and Beller 1979; Brockway 1979; Sischo 1979; Tuggy 1979; Wolgemuth 1981); and final stress, as in Sm’algyax Tsimshian [Tsimshianic; United States] (Dunn 1995).
Of the 24 North American languages in Goedemans and van der Hulst’s (2013a) chapter on fixed-stress systems in WALS, eight have initial stress, four have second-syllable stress, one has antepenultimate stress, and five each have penultimate or final stress. (One language, Ho-Chunk [Siouan; United States], has third-syllable stress.) These proportions are representative of the worldwide distribution of fixed-stress locations with the exception of second-syllable stress, which is relatively overrepresented in the North American languages these authors surveyed (although two of these four languages, Dakota and Stoney, are closely related Siouan languages).
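Fixed stress at a given distance from a word edge can be described with two parameters: the edge and the position counted from it. The sketch below is illustrative only; the function name and the treatment of words shorter than the target position are assumptions, and the syllable counts in the usage lines come from the Lakhota and Plains Cree forms cited above (with a hand-assigned syllabification).

```python
def fixed_stress_index(n_syllables: int, edge: str, position: int) -> int:
    """0-based index of the stressed syllable in a fixed-stress system.

    edge: "left" or "right"; position: 1 = the edge syllable, 2 = the next, ...
    Assumption: words shorter than `position` are stressed on the syllable
    farthest from the edge (languages differ in how they handle this).
    """
    if n_syllables < 1 or position < 1:
        raise ValueError("need at least one syllable and a positive position")
    if edge not in ("left", "right"):
        raise ValueError("edge must be 'left' or 'right'")
    k = min(position, n_syllables) - 1
    return k if edge == "left" else n_syllables - 1 - k


# Lakhota second-syllable stress: wa.ja.wa 'he reads'
print(fixed_stress_index(3, "left", 2))
# Plains Cree antepenult: pa.sa.kwaː.pi.si.mo.win (the fifth syllable)
print(fixed_stress_index(7, "right", 3))
```

Initial, second-syllable, and third-syllable stress are left-edge settings (positions 1 to 3); final, penultimate, and antepenultimate stress are the mirror-image right-edge settings.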


398 GABRIELA CABALLERO AND MATTHEW K. GORDON The locations of fixed stress are also mirrored in stress window effects, whereby stress may vary but only within a two- or three-syllable margin at the edge of a domain (Everett and Everett 1984a, 1984b; Green and Kenstowicz 1995; Kager 2012). Stress windows are well attested in North America. For example, Dakota [Siouan; United States] (Shaw 1985) and Yaqui [Uto-Aztecan; United States and Mexico] (Demers et al. 1999) have initial disyllabic stress windows. Initial trisyllabic windows are extremely rare cross-linguistically (Kager 2012) but attested in the Tarahumaran branch of Uto-Aztecan; for example, prefixing reduplication triggers alternations to maintain stress within the window in Mountain Guarijío (Miller 1996). Plains Cree illustrates another feature common to many stress systems: rhythmic secondary stresses, which fall on odd-numbered syllables counting from the right edge. Kutenai [isolate; United States] (Garvin 1948) has another subtype of rhythmic secondary stress, in which stress falls (in the default case) on even-numbered syllables counted from the right with the stress on the penult being the primary one: e.g. /ˌkqaqaˈnaɬkqaːt͡s/ ‘automobile’, /ˌhuqaɬˌwiynʔoːɬʔˈxupxa/ ‘I want to know’ (Garvin 1948: 37). Cahuilla [Uto-Aztecan; United States] (Seiler 1957, 1965, 1977) has rhythmic stress oriented towards the left rather than the right edge, falling on odd-numbered syllables: e.g./ˈtaxmuˌʔaʔtih/ ‘song (objective case)’ (Seiler 1965: 57), /ˈhaʔaˌtisqal/ ‘he is sneezing’ (p. 52). Chickasaw [Muskogean; United States] (Munro and Ulrich 1984; Munro and Willmond 1994; Gordon 2004) also has left-oriented alternating stress but on even- numbered syllables: /ʧoˌkoʃkoˈmo/ ‘s/he plays’, /kiˌlimpiˈtok/ ‘s/he was strong’ (Gordon and Munro 2007). Plains Cree and Chickasaw are typically considered iambic languages, in which stresses are grouped into feet consisting of a weak syllable followed by a strong one, e.g. 
Chickasaw /(ʧoˌkoʃ)(koˈmo)/, whereas Kutenai and Cahuilla are trochaic languages, since they possess strong-weak feet, e.g. Kutenai /(ˌhuqaɬ)(ˌwiynʔoːɬʔ)(ˈxupxa)/. Another rarer variant of rhythmic stress is found in the Alutiiq variety of Yup’ik, which stresses the second syllable of a word and every third syllable to its right: /aˌtuːquniːˈki/ ‘if he (refl) uses them’, /piˌsuːqutaˈquːni/ ‘if he (refl) is going to hunt’ (Leer 1985b: 113; see also chapter 20).
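The rhythmic patterns above differ only in the edge from which syllables are counted, the position of the first stress, and the interval between stresses. As a minimal illustrative sketch (not the authors' formalism; the function name and its parameters are hypothetical), the following Python function returns the 1-based positions, counted from the left, of rhythmically stressed syllables. It assumes the syllable count is known in advance and deliberately ignores syllable weight and the primary/secondary distinction.

```python
def rhythmic_stress(n_syllables, edge="left", first=1, period=2):
    """Return 1-based positions (counted from the left) of rhythmic stresses.

    Hypothetical helper: counting starts at `edge` ("left" or "right"),
    the first stress falls on syllable `first` from that edge, and further
    stresses recur every `period` syllables. Weight and primary stress are
    not modelled.
    """
    if edge not in ("left", "right"):
        raise ValueError("edge must be 'left' or 'right'")
    from_edge = range(first, n_syllables + 1, period)
    if edge == "left":
        return list(from_edge)
    # Convert positions counted from the right into positions from the left.
    return sorted(n_syllables + 1 - p for p in from_edge)


# Cahuilla /ˈtaxmuˌʔaʔtih/: odd-numbered syllables from the left
print(rhythmic_stress(4, "left", first=1))            # [1, 3]
# Chickasaw /ʧoˌkoʃkoˈmo/: even-numbered syllables from the left
print(rhythmic_stress(4, "left", first=2))            # [2, 4]
# Kutenai /ˌhuqaɬˌwiynʔoːɬʔˈxupxa/: even-numbered syllables from the right
print(rhythmic_stress(6, "right", first=2))           # [1, 3, 5]
# Alutiiq /piˌsuːqutaˈquːni/: second syllable, then every third
print(rhythmic_stress(6, "left", first=2, period=3))  # [2, 5]
```

Heavy syllables in Chickasaw can interrupt this count, which the sketch does not attempt to capture.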

27.2.2 Weight-sensitive stress

Stress in Cahuilla and Chickasaw is also ‘weight-sensitive’ (or ‘quantity-sensitive’), in that ‘heavy’ syllable types preferentially attract stress. In Chickasaw, syllables containing a long vowel (CVV) or a coda consonant (CVC) attract stress even if this interrupts the rhythmic stress pattern: /ˌissoˈba/ ‘horse’, /ˌokˌfokˈkol/ ‘type of snail’. Primary stress in Chickasaw typically falls on the final syllable of a word; however, a long vowel to the left of the final syllable attracts primary stress: /ˈbaːˌtamˌbiʔ/ ‘Chickasaw name’, /aˈboːkoˌʃiʔ/ ‘river’. Chickasaw thus observes a complex weight hierarchy in which CVV is heaviest, since it preferentially attracts primary stress; CV is lightest, since it attracts stress only in a rhythmically strong position; and CVC is intermediate, since it attracts at least secondary stress. Although relatively rare, scalar weight hierarchies are attested in several North American languages, e.g. Klamath [Penutian; United States] (Barker 1964), Nez Perce [Penutian; United States] (Crook 1989), and Central Alaskan Yup’ik [Eskimo-Aleut; United States] (Woodbury 1985). Syllable weight can also be sensitive to vowel quality, most commonly involving a distinction between light schwa and heavy peripheral (i.e. non-schwa) vowels. In Lushootseed [Salishan; United States] (Bianco 1995), stress falls on the leftmost peripheral vowel in a word, and on the first syllable in words containing only schwa: /jəˈlaʧiʔ/ ‘both hands’, /ˈp’əʧ’əb/ ‘bobcat’, /ˈdadatut/ ‘morning’ (Bianco 1995: 4–5).

27.2.3 Iambic lengthening

A feature common in North America but rare in other areas is ‘iambic lengthening’, whereby short stressed vowels in non-final open syllables lengthen, e.g. Chickasaw /ʧiˌkiːsiˈliːˌtok/ ‘s/he bit you’ (Gordon and Munro 2007), from underlying /ʧikisilitok/. Lengthening of all stressed syllables (through either vowel or consonant lengthening) is almost exclusive to iambic languages. Hayes’s (1995) survey includes 23 languages with iambic lengthening, all but four of which are found in North America. More generally, North American languages are atypical in being heavily biased towards iambic stress. Goedemans and van der Hulst (2013a) find a nearly five-to-one skew in favour of trochaic stress in languages worldwide (153 trochaic vs. 31 iambic languages) but a prevalence of iambic rhythm in North America (16 iambic vs. 11 trochaic).
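The Chickasaw facts in §27.2.2 and §27.2.3 can be combined into a small rule-based sketch. It rests on several assumptions not stated in the text, all hypothetical: words arrive pre-syllabified; a syllable counts as CVV if it contains the length mark, CVC if it ends in a consonant, and CV otherwise; iambs are built left to right, with a heavy syllable that would fall in weak position forming a foot on its own; and when several non-final long vowels are present, the rightmost takes primary stress (which is what /ʧiˌkiːsiˈliːˌtok/ shows). The function names are illustrative; this is not Gordon and Munro's analysis.

```python
VOWELS = set("aeiou")
LONG = "ː"  # IPA length mark (U+02D0)


def weight(syllable):
    """Classify a syllable as CVV, CVC, or CV (simplified assumption)."""
    if LONG in syllable:
        return "CVV"
    if syllable[-1] not in VOWELS:
        return "CVC"
    return "CV"


def chickasaw_stress(syllables):
    """Sketch of Chickasaw footing, iambic lengthening, and primary stress."""
    syls = list(syllables)
    n = len(syls)
    stressed = set()
    i = 0
    # Left-to-right iambs: a light syllable is the weak member of a
    # (weak, strong) foot; a heavy syllable in weak position stands alone.
    while i < n:
        if weight(syls[i]) != "CV" or i == n - 1:
            stressed.add(i)
            i += 1
        else:
            stressed.add(i + 1)
            i += 2
    # Iambic lengthening: stressed short vowels in non-final open syllables lengthen.
    for j in stressed:
        if j < n - 1 and weight(syls[j]) == "CV":
            syls[j] += LONG
    # Primary stress: a non-final long vowel if any (rightmost, by assumption),
    # otherwise the final syllable.
    long_nonfinal = [j for j in range(n - 1) if weight(syls[j]) == "CVV"]
    primary = long_nonfinal[-1] if long_nonfinal else n - 1
    marks = {j: "ˌ" for j in stressed}
    marks[primary] = "ˈ"
    return "".join(marks.get(j, "") + s for j, s in enumerate(syls))


print(chickasaw_stress(["ʧi", "ki", "si", "li", "tok"]))  # ʧiˌkiːsiˈliːˌtok
print(chickasaw_stress(["a", "boː", "ko", "ʃiʔ"]))        # aˈboːkoˌʃiʔ
```

The same rules also reproduce /ˌissoˈba/, /ˌokˌfokˈkol/, /ˈbaːˌtamˌbiʔ/, and /kiˌlimpiˈtok/, which is a useful check that lengthening feeds the primary-stress rule.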

27.2.4 Morphological stress

Morphology also influences stress in many languages of North America. For example, prefixes in Lushootseed are unstressed: the perfective prefix /uʔ-/ in /uʔ-ˈlil-aɬ/ ‘travelling far’ (Bianco 1995: 13) is unstressed even though it contains the leftmost peripheral vowel. Suffixes vary: some attract stress even when attached to roots containing a peripheral vowel, e.g. /-ap/ ‘bottom’ in /pixʷ-ˈap/ ‘brush off bottom’, whereas others do not (unless they follow roots containing only schwa), e.g. /-utsid/ ‘language’ in /ləˈliʔ-utsid/ ‘foreign language’ (Bianco 1995: 9) (see §27.5 for more on the interaction between morphology and prosodic domains).
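The Lushootseed generalizations in §27.2.2 and §27.2.4 can be stated as a simple search. The following is a minimal sketch, assuming pre-syllabified input with one vowel per syllable and a known number of prefix syllables; the function names and the `n_prefix` parameter are hypothetical. It ignores the lexically variable behaviour of suffixes, and for schwa-only stems preceded by a prefix it assumes, without support from the text, that stress falls on the first stem syllable.

```python
SCHWA = "ə"
VOWELS = set("aeiou") | {SCHWA}


def nucleus(syllable):
    """Return the first vowel of a syllable (assumes each syllable has one)."""
    return next(ch for ch in syllable if ch in VOWELS)


def lushootseed_stress(syllables, n_prefix=0):
    """Return the 0-based index of the stressed syllable.

    The first `n_prefix` syllables (prefixes) are skipped; stress goes to the
    leftmost remaining syllable with a peripheral (non-schwa) vowel, or to
    the first remaining syllable if all of them contain schwa.
    """
    for i in range(n_prefix, len(syllables)):
        if nucleus(syllables[i]) != SCHWA:
            return i
    return n_prefix


print(lushootseed_stress(["jə", "la", "ʧiʔ"]))              # 1  /jəˈlaʧiʔ/
print(lushootseed_stress(["p’ə", "ʧ’əb"]))                  # 0  /ˈp’əʧ’əb/
print(lushootseed_stress(["uʔ", "lil", "aɬ"], n_prefix=1))  # 1  /uʔ-ˈlil-aɬ/
```

Without `n_prefix=1`, the last call would return 0, placing stress on the prefix vowel /u/; this is exactly the outcome the prefix exclusion rules out.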

27.2.5 Phonetic exponents of stress in North America

Stress in the languages of North America has received relatively little acoustic investigation. In a recent survey, Gordon and Roettger (2017) identify acoustic studies of stress in 13 languages spoken in North America. In all 13, duration is an exponent of stress, with stressed vowels longer than unstressed vowels in virtually all cases. Stressed syllables are also characterized by higher f0 and greater intensity in most languages in Gordon and Roettger’s survey: seven of the ten languages for which f0 was investigated and seven of the nine for which intensity was measured. Of the three languages in which f0 did not signal stress, two, Choguita Rarámuri [Uto-Aztecan; Mexico] (Caballero and Carroll 2015) and Sekani [Athabaskan; Canada] (Hargus 2005), have lexical tone, which plausibly restricts the availability of f0 as a stress cue.

Other potential correlates of stress have been less studied. Stressed vowels in both Balsas Nahuatl [Uto-Aztecan; Mexico] (Guion et al. 2010) and Choguita Rarámuri (Caballero and Carroll 2015) display a shallower spectral tilt than unstressed vowels, that is, a smaller decrease in energy as frequency increases. Unstressed vowels are centralized in Chickasaw (Gordon 2004) but not in Lakhota (Cho 2006b).

27.3 Tone in North American Indian languages

North American word-prosodic systems are highly diverse but mostly lack lexical tone: of the 30 North American Indian languages in the WALS 200-language sample, only nine are classified as tonal (Maddieson 2013a; see also Mithun 1999; Rice 2010). The larger sample of 526 tone languages in the WALS tone chapter (Maddieson 2013a) includes only 16 from North America. Tone is, however, widely found in Athabaskan (Rice and Hargus 2005) and Uto-Aztecan. The main generalizations about tone in North American languages are summarized in (1).

(1) Tone in North American Indian languages
• Tone systems tend to be ‘restricted’ in terms of number of tonal contrasts, with either two-tone or privative oppositions predominating.
• Tone systems are also ‘restricted’ in terms of their distribution, with many coexisting alongside stress.2
• Tonal processes and tone distribution are often linked to the complex morphological structure of these languages.
• Tone has largely developed historically from the loss of laryngeal features.

27.3.1 Tonal inventories

The WALS survey classifies tonal systems as either ‘simple’ or ‘complex’: simple systems (seven of the nine in the survey) have a privative or a two-tone contrast, and complex systems (Acoma [Western Keresan; United States] and Koasati [Muskogean; United States] in the survey) have at least a three-way contrast. Slave [Na-Dené; Canada] (Rice 1987, 1989) exemplifies a simple (privative) tone system, distinguishing between high and low tone, where the latter is the default (unmarked) tone: /já/ ‘louse’ vs. /jà/ ‘sky’, /ʧíh/ ‘mitts’ vs. /ʧìh/ ‘fish hook with bait’ (Rice 1987: 40). Choguita Rarámuri (Caballero and Carroll 2015) has a complex tone system contrasting low, high,3 and falling tones: /mê/ ‘win’ vs. /mè/ ‘mezcal’, /niʔˈwí/ ‘to be lightning’ vs. /niˈwì/ ‘to have a wedding’.

2 Following Hyman (2006, 2009), we adopt a property-driven characterization of word prosody, where individual languages may exhibit stress-like phonological properties, tone-like phonological properties, or properties of both stress accent and tone, and exclude the possibility of a ‘pitch accent’ type.
3 Analysed in Caballero and Carroll (2015) as mid tones.

Both simple and complex tone systems are found within single language families. For example, alongside toneless languages such as Ahtna [Athabaskan; United States] and Koyukon [Athabaskan; United States], Northern Athabaskan (Krauss 2005) contains many languages with simple tone systems, but also others, such as Tsuut’ina [Athabaskan; Canada] (Sapir 1925; Cook 1971) and Tanacross [Athabaskan; United States] (Holton 2005), with complex tone inventories. Tonal systems can also be classified by shape as level (with flat f0), contour (rising or falling), or complex, with two slopes (a fall followed by a rise, or a rise followed by a fall). All North American tonal languages have level tones, and none have complex tones. Falling tones occur in several languages, such as Choguita Rarámuri (Caballero and Carroll 2015) and the Kiowa-Tanoan languages [United States] Kiowa (Watkins 1984) and Jemez (Bell 1993). These languages do not have rising tones, consistent with the more restricted distribution of rising tones relative to falling tones cross-linguistically (Zhang 2001, 2004b; see also Gordon 2016). Languages with both falling and rising tones are relatively rare in North America; examples are Oklahoma Cherokee [Iroquoian; United States] (Wright 1996; Uchihara 2016) and Tanacross (Holton 2005). In many languages, notably several in the southeast (Gordon 2017), contour tones are restricted to ‘heavy’ syllables: those with a long vowel in Oklahoma Cherokee (Wright 1996; Uchihara 2016), and those containing either a sonorant coda or a long vowel in Koasati [Muskogean; United States] (Gordon et al. 2015), Creek [Muskogean; United States] (Martin 2011), and Caddo [Caddoan; United States] (Chafe 1976a).

27.3.2 Tonal processes

Many common tonal processes are observed in North America, though they do not show any obvious areal or genetic distribution. Tone plateauing, whereby a single syllable or a sequence of toneless or low-toned syllables between two high tones rises to a high tone, is found in Creek (§27.3.3). Creek also displays downstep, whereby each lexically specified high tone triggers lowering of the immediately following high tone (Haas 1977; Martin 2011). Tone spreading is exemplified in Tanacross, where high tones spread rightward in a morphologically specified domain (Holton 2005). Other Athabaskan languages, including Hare Slave and Gwich’in, have similar tone-spreading processes (Rice and Hargus 2005), involving interactions between sub-constituents of morphologically complex words (§27.3.4).

27.3.3 Stress and tone

Languages with both stress and tone have been sparsely documented cross-linguistically, with existing references focusing on European languages, including Swedish and Norwegian [Germanic; Sweden, Norway] (Kristoffersen 2000; Riad 2014), Latvian and Lithuanian [Balto-Slavic; Latvia and Lithuania] (Ternes 2011), and Franconian [Germanic; Germany] (Gussenhoven and Peters 2004). Such systems are nevertheless documented in other areas of the world (Remijsen 2002; Remijsen and van Heuven 2005; Guion et al. 2010)4 and are well represented in North America. Tone and stress exhibit different relationships depending on the language, either interacting or functioning independently of each other. North American languages commonly limit tonal contrasts to stressed syllables, as in Mohawk [Iroquoian; United States] (Michelson 1988; see also Mithun 1999: 24) and Yaqui (Demers et al. 1999). Tonal Uto-Aztecan languages of the southwest also exhibit this interaction. In Choguita Rarámuri, for instance, one of three lexical tones is exclusively realized in stressed syllables (Caballero and Carroll 2015). In Balsas Nahuatl, on the other hand, tone and a system of fixed stress interact in a clash-avoidance or dissimilatory pattern in some sub-dialects, where words with innovated lexical tones shift stress accent from the penultimate to the final syllable (Guion et al. 2010). Other kinds of stress–tone interaction are attested in Athabaskan languages. In Hare Slave, high tones are attracted to stressed syllables (Rice and Hargus 2005), whereas in Sekani (Hargus 2005) and Tanacross (Holton 2005), high tone attracts stress. Tone may also be sensitive to metrical structure. In Creek (Haas 1977; Martin 2011), which possesses a left-aligned iambic metrical system, words lacking lexical tone have a surface high tone that extends from the first to the last metrically strong syllable, e.g. /(nokó)(sótʃí)/ ‘bear cub’, /(awá)(nájí)ta/ ‘to tie to’ (Martin 2011: 75). The coexistence of stress and tone raises the question of how each system is encoded acoustically, and whether each system deploys mutually exclusive acoustic correlates (Remijsen 2002; Remijsen and van Heuven 2005). 
In the innovative tonal varieties of Balsas Nahuatl, tone is reliably realized through f0, whereas stress is encoded primarily through duration; in non-tonal varieties, stress is encoded primarily through spectral balance (Guion et al. 2010: 25). Guion et al. attribute these differences in the acoustic encoding of stress to a change in progress that maximizes the distinctiveness of the stress system vis-à-vis the newly developed tonal contrast.
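The Creek default-tone pattern described in this section can be sketched in a few lines. The sketch assumes a word made up only of light syllables, so that left-aligned iambs are simply successive pairs of syllables and a leftover final syllable stays unfooted, as in /(awá)(nájí)ta/. Heavy syllables and lexical tones, which complicate the real system, are not modelled, and the function name is hypothetical.

```python
def creek_default_tone(n_syllables):
    """Return an H/L string for a toneless Creek word of light syllables.

    Left-aligned iambs put metrically strong syllables at 0-based indices
    1, 3, 5, ...; only complete feet count. High tone spans from the first
    to the last strong syllable; all other syllables are non-high (L).
    """
    strong = list(range(1, 2 * (n_syllables // 2), 2))
    if not strong:
        return "L" * n_syllables
    first, last = strong[0], strong[-1]
    return "".join("H" if first <= i <= last else "L" for i in range(n_syllables))


print(creek_default_tone(4))  # LHHH   cf. /(nokó)(sótʃí)/
print(creek_default_tone(5))  # LHHHL  cf. /(awá)(nájí)ta/
```

Because the high tone spans the whole stretch between strong syllables, the weak syllables inside it surface as high, which is the plateauing effect mentioned in §27.3.2.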

27.3.4 Grammatical tone

The cross-linguistic study of tone has largely focused on its lexical phonological properties, its phonetic implementation, and its interaction with other prosodic phenomena, but the purely morphological role of tone is still poorly understood (Palancar and Léonard 2016). Furthermore, as noted in Hyman (2016b), although some areas are skewed towards systems in which tone serves an almost exclusively lexical function (e.g. East and South East Asian languages), most tonal systems in the world deploy tone in both lexical and grammatical functions. The use of tone as the exponent of a morphological category is not widespread in North America, but it is documented in some languages. In Navajo [Athabaskan; United States], imperfective and perfective aspect are tonally distinguished in the pair /jìʧà/ ‘he’s crying (imperfective)’ vs. /jíʧà/ ‘he cried (perfective)’ (McDonough 1999: 509). In Koasati, the negative suffix /-ko/ carries a high tone, e.g. /isko-laho/ ‘s/he will drink’ vs. /is-kó-laho/ ‘s/he will not take it’. In Choguita Rarámuri, the imperative singular construction has a low-tone allomorph with falling-tone stems; this grammatical tone may override lexical tone, e.g. /maˈtô/ ‘she carries it (present)’ vs. /maˈtò/ ‘carry it!’, where the grammatical low tone replaces the lexical falling tone on the stressed syllable of the imperative.

Tone may also be sensitive to morphologically defined domains. In Athabaskan, several prosodic phenomena, including tone, are sensitive to a tripartite distinction between the stem, conjunct prefixes (which immediately precede the stem), and disjunct prefixes (to the left of the conjunct domain). The stem preferentially attracts stress (see case studies in Hargus and Rice 2005), and the stem and disjunct domain contain a richer array of segmental and tonal contrasts than the conjunct domain. Athabaskan languages are rife with instances of morphologically conditioned tone effects, where particular morphemes may impose tone on adjacent morphemes at specific morphological junctures (see Rice and Hargus 2005). An instructive case of complex morphological tone interacting with other sources of tone is found in Choguita Rarámuri. As described above, this language has a three-way tonal contrast, HL vs. H vs. L, exclusively realized in stressed syllables. Stress accent is in turn morphologically conditioned, and tonal patterns are partially predictable from stress. Yet there is evidence for grammatical tone and tonal classes independent of stress. In addition to the tonal allomorphy in the imperative singular described above, some verbs acquire morphologically conditioned L or HL tone in inflection: /ˈpà-li/ ‘she brought it’ vs. /ˈpâ-ma/ ‘she will bring it’. In other verb classes, specific morphemes, such as the imperfective suffix /-i/, condition a morphological L tone: /riˈwá-li/ ‘she saw it’ vs. /riˈwà-i/ ‘she used to see it’. Stressed suffixes also show different tone patterns: /to-ˈkâ/ ‘you (sg) take it!’ vs. /to-ˈsì/ ‘you all take it!’.

4 This may be in part a by-product of analytical choices and the assumption of an ambiguously determined ‘pitch accent’ prosodic type.

27.3.5 Tonal innovations and multidimensionality of tone realization

In the Athabaskan family, a subset of languages developed tone from a Proto-Athabaskan contrast between stem-final glottalic and non-glottalic consonants, with high tone evolving in some languages and low tone in others from different articulatory implementations of constricted nuclei (Kingston 2005; Krauss 2005).5 In the southwest, in addition to the Apachean branch of Athabaskan, tone has developed in Tanoan and Keresan languages (Mithun 1999: 318), as well as in Hopi and other Uto-Aztecan languages of northwestern Mexico. Hopi tones have been proposed to have developed from loss of coda /h/ or from preaspiration of a stop consonant in the following syllable onset (Manaster-Ramer 1986). In some varieties of Balsas Nahuatl, high tone developed on vowels followed by a breathy-voiced coda: the lower pitch associated with this coda was reinterpreted as a high pitch on the preceding syllable, resulting in a high-low tonal contour (Guion et al. 2010). Tone has also been innovated in Achumawi [Palaihnihan; United States] and in Upriver Halkomelem [Salishan; Canada], the only Salishan language to have developed tone.

5 More recent reversals between high and low tones involved listeners’ misperception of late realization of f0 targets as tone in a preceding syllable (Kingston 2005). The different reflexes are analysed synchronically as tonal ‘flip-flop’, where the phonologically active (‘marked’) tone is either high or low.

In the Algonquian language family, Arapaho, Cheyenne, and Kickapoo have been proposed to have developed tone independently (Mithun 1999): Arapaho tone developed through loss of final syllables and of preconsonantal glottal stop (Goddard 1974, 1982), and Cheyenne tone developed from a Proto-Algonquian vowel length contrast (Frantz 1972; see also Goddard 2013: 57–58). In Kickapoo, Vh sequences are the most likely source of low tone (Gathercole 1983). Voice quality and f0 interact not only diachronically but also synchronically, in ways that are not fully understood (Kingston 2005; Kuang 2013a). Some studies shed light on the phonetic implementation of tonal contrasts in languages of North America. Tanacross high tones, the reflexes of Proto-Athabaskan constricted nuclei, are produced with creaky voice (Rice and Hargus 2005; for other instrumental analyses of tone in Athabaskan, see also e.g. de Jong and McDonough 1993; McDonough 1999; Gessner 2005; Hargus 2005). Instrumental analysis of tonal realization in Choguita Rarámuri reveals considerable inter-speaker variation, with some significant trends: (i) low tones are realized with creaky voice; (ii) high tones are realized with breathy voice; (iii) utterance-final position triggers lengthening, greater for low than for falling or high tones; and (iv) falling tones are interrupted by a creaky phase or glottal stop (i.e. are rearticulated) in utterance-final position (Aguilar et al. 2015). Of these findings, the association of low tone with creaky voice is common cross-linguistically (Gordon 2001b), as creaky voice may be employed synergistically to reach a low-f0 target (Kuang 2013a). By contrast, the association between high tone and breathy voice is unexpected, given that breathy voice is typically associated with f0 lowering (Hombert et al. 1979; cf. Garellek and Keating 2011). 
For Choguita Rarámuri, given that both low and falling tones have low-f0 targets (and hence more glottal constriction), the breathiness of high tones may serve to enhance the tonal contrast (Aguilar et al. 2015). Since encoding of prosodic constituency may also involve voice quality, in addition to or in place of f0 effects, such as breathiness in Chickasaw (Gordon and Munro 2007), an open question is how these phonetic dimensions interact in the prosodic systems of tone languages.

27.4 Intonation and prosodic constituency

The characteristically complex morphology of North American Indian languages and their rich traditions of oral narrative have provided fertile ground for the investigation of prosody above the word level. The common occurrence of long morphological words raises the possibility of a single word consisting of multiple prosodic domains, the mirror image of the more familiar typological situation in which multiple morphological words group together to form a single prosodic unit (Jun 2005a). Gordon (2005) reports that in Chickasaw a long verb may be parsed into two phonological phrases, each with the characteristic LHHL tone pattern. For example, the nine-syllable word /akittimanompoʔlokitok/ ‘I didn’t speak to him’ may be broken down into a six-syllable phrase followed by a three-syllable one.

However, a single morphological word, regardless of its length, is more likely to form a single prosodic phrase in Chickasaw. Indeed, isomorphism between morphological and prosodic units appears to be the norm in other languages with long morphological words. In some languages, this isomorphism manifests itself as a strong (though not absolute; cf. Gordon 2005 on Chickasaw) tendency for morphologically long words to form a single prosodic domain, diagnosed in most documented cases through stress or other word-level prominence. In others, the isomorphic prosodic unit is the phrase, as diagnosed through phrasal tone alignment, e.g. in Chickasaw (Gordon 2005) and West Greenlandic [Eskimo-Aleut; Greenland] (Arnhold 2014a; see also chapter 6). This preference for matching morphological and prosodic constituents results in considerable variability in the length of prosodic domains, which runs counter to the cross-linguistic preference for isochrony. For example, Berez (2011) finds that intonation units range from 2 to 20 syllables long in her study of Ahtna narratives. On the other hand, in certain languages the domains governing stress or tone are smaller than the morphological word, and morphemes at the periphery of a word are commonly excluded from these domains. The Muskogean family provides an example of this phenomenon (Martin 1996). Most languages in the family display evidence for iambic foot structure, differing in the phonetic exponent(s) of feet, in the domain over which footing applies, and in its productivity; in all of them, however, certain peripheral suffixes and clitics are excluded from the footing domain. Most restrictive is Alabama, in which only the root, infixes, and the causative suffix are included in the footing domain, as evidenced by iambic lengthening (§27.2.3). 
At the other end is Creek, in which all prefixes are parsed into feet, as diagnosed through the distribution of high tone (§27.3.3). Intermediate are Chickasaw, Choctaw, and Mikasuki [Muskogean; Florida (Everglades)], in which foot structure, evidenced through iambic lengthening, encompasses certain prefixes (the non-agentive person markers and the applicative) but not others (the instrumental and the dative). For example, the non-agentive prefix /sa-/ belongs to the iambic lengthening domain in Chickasaw, as in /(sa-noː)si/ ‘I’m asleep’, but its cognate /ʧa-/ in Alabama does not, as in /ʧa-noʧi-hʧi/ ‘I’m asleep’ (not /*ʧa-noːʧi-hʧi/) (Martin 1996: 15). The dative prefix /im-/, on the other hand, is outside the footing domain in Chickasaw, e.g. /ʧim-(apiː)la/ ‘s/he helps her/him for you’ (not /*(ʧim-aː)(pila)/), but within it in Creek, e.g. /(im-ó)(pónáː)(j-ís)/ ‘s/he is talking for her/him’ vs. /(opó)(náː)(j-ís)/ ‘s/he is talking’ (Martin 2011: 183). As Martin (1996) suggests, the variation within the family is likely attributable to grammaticalization, whereby independent words gradually fused to the root, with languages differing in the extent to which they have prosodically incorporated the new affixes into the footing domain. At prosodic levels above the phrase, morphological complexity and information-flow constraints jointly conspire to shorten prosodic units as measured in word count. Chafe (1994), for example, observes that Seneca [Iroquoian; Canada/United States] intonation units, cognitively governed prosodic units that modulate the introduction of new information into the discourse, average approximately two words in length, compared with four words in English, in keeping with the considerably greater morphological density of Seneca words. 
North American Indian languages have been the subject of several studies of higher-level prosodic units, such as Tuttle (2005) on Apache [Na-Dené; United States], Berez (2011) on Ahtna [Na-Dené; United States], Lovick and Tuttle (2012) on Dena’ina [Na-Dené; United States], Beck and Bennett (2007) on Lushootseed [Salishan; United States], and Palakurthy (2017) on Navajo [Na-Dené; United States]; these studies complement the poetic movement in Native American linguistic anthropology (e.g. Tedlock 1983 on Zuni [isolate; United States]; Bright 1984 on Karuk [Hokan; United States]; Woodbury 1985 on Yup’ik). In addition to the intonation unit (or phrase), this body of research identifies phonetic evidence for a larger unit, variously called the prosodic sentence, story unit, line, or utterance, and for an even larger constituent, the paragraph. These studies reveal that large prosodic constituents are signalled through a constellation of properties, including final lengthening, pitch reset, terminal pitch excursions, and pauses.

27.5 Prosodic morphology

Another important way in which prosodic structure interacts with the morphological structure of North American Indian languages is through patterns of prosodic morphology, which have played a crucial role in the development of phonological and morphological theories (see chapter 7). A common phenomenon in North American Indian languages falling under the heading of prosodic morphology is reduplication. The copied part of a morpheme, the ‘reduplicant’, characteristically adheres to a template with a particular prosodic shape. For example, the distributive in Creek is formed by placing a copy of the first CV sequence of the root immediately before the root-final consonant, i.e. before the onset of the final syllable: e.g. /likátʃw-iː/ ‘be dirty’ vs. /likatʃliw-íː/ ‘be dirty (two or more)’, /tóːsk-iː/ ‘be mangy’ vs. /toːstok-íː/, /hátk-iː/ ‘be white’ vs. /háthak-íː/ (Martin 2011: 203–204). In the Cahitan languages (Yaqui and Mayo), reduplication is typologically unusual in imposing the template on the base rather than on the reduplicant; the reduplicant thus varies to match the shape of the first syllable of the root, e.g. in the Yaqui habitual aspect: /vu.vu.sa/ from /vu.sa/ ‘awaken’ vs. /vam.vam.se/ from /vam.se/ ‘hurry’ (Haugen 2009: 507–508). This differs from the Creek reduplicant, which has a fixed CV shape. The Yaqui data raise questions about cross-linguistic variation in reduplicative templates and the differential assignment of bases in reduplication within a single language (Haugen 2009: 505). The languages of North America also exhibit complex prosodic morphological phenomena triggered by affixation. 
A relevant example is found in Nuu-chah-nulth (Nootka) [Wakashan; Canada] (Sapir and Swadesh 1939) and other Wakashan languages, where length and laryngeal feature alternations may be associated with affixation constructions, a pattern that may be found across the Northwest Coast area of North America.6 In Ditidaht (Nitinaht) [Southern Wakashan; Canada] (Stonham 1994: 42), a large number of suffixes trigger reduplication in the stem to which they attach. As noted in Inkelas (2014: 152), reduplication in this case could be analysed as a morphophonological effect brought about by suffixation, or as a purely morphological effect, where stems have allomorphs that are selected by specific affixation constructions. A rarer type of templatic morphology is found in Tepiman languages [Uto-Aztecan; Mexico], which exhibit subtractive morphology, whereby morphological constituents are shortened by a unit of a determined phonological shape or size: for example, in Southeastern Tepehuan, the perfective aspect is marked by deleting a final CV syllable from an imperfective stem, e.g. perfective /hoohoi/ vs. imperfective /hoohoidza/ ‘look at’, perfective /maa/ vs. imperfective /maki/ ‘to give’ (Willett 1991: 28). Many North American Indian languages also feature prosodic minimality effects. An interesting case of inconsistent prosodic behaviour occurs in Tohono O’odham [Uto-Aztecan; United States and Mexico]: although stress is attributed to quantity-insensitive trochees, minimal word requirements and other prosodic morphological phenomena exhibit sensitivity to weight (Fitzgerald 2012).

6 We would like to thank an anonymous reviewer for pointing this out.
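The contrast drawn earlier in this section between the Creek distributive, with its fixed CV reduplicant, and the Yaqui habitual, whose reduplicant copies whatever the first syllable of the root happens to be, can be made concrete in a short sketch. It assumes consonant-initial, consonant-final Creek roots written without tone marks, a single-character initial consonant, and pre-syllabified Yaqui input; both function names are hypothetical, and suffixes and tone shifts (e.g. Creek /-íː/) are left out.

```python
VOWELS = set("aeiou")


def creek_distributive(root):
    """Copy the root's first CV and place it before the root-final consonant."""
    if root[0] in VOWELS or root[-1] in VOWELS:
        raise ValueError("sketch assumes a root beginning and ending in a consonant")
    first_vowel = next(ch for ch in root if ch in VOWELS)
    reduplicant = root[0] + first_vowel  # fixed CV template; vowel length not copied
    return root[:-1] + reduplicant + root[-1]


def yaqui_habitual(syllables):
    """Prefix a full copy of the first syllable, whatever its shape."""
    return ".".join([syllables[0]] + list(syllables))


print(creek_distributive("likatʃw"))  # likatʃliw  (cf. /likatʃliw-íː/)
print(creek_distributive("toːsk"))    # toːstok
print(yaqui_habitual(["vam", "se"]))  # vam.vam.se
```

The design difference is the point: the Creek function always emits two segments, while the Yaqui function's output length depends on the input's first syllable.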

27.6 Conclusions

Both typological knowledge of prosody and theoretical treatments of prosodic phenomena are heavily indebted to research on North American Indian languages, many of which are severely endangered. The contributions of this geographically vast and genetically diverse region to our understanding of prosody continue to resonate in a multitude of research areas, including stress, tone, intonation, prosodic constituency, and interactions between prosody and morphology. Further research on the complex prosodic systems of these languages, combining qualitative and instrumental analysis, will continue to shed light on the nature of prosody cross-linguistically.

chapter 28

Mesoamerica

Christian DiCanio and Ryan Bennett

28.1 Introduction

Mesoamerica spans from Northern Central Mexico to Costa Rica. Several unrelated language families occupy this territory, including the Oto-Manguean, Mayan, and Toto-Zoquean families (Brown et al. 2011a), and a few language isolates, such as Huave (Kim 2008), Xinca (Rogers 2010), and Tarascan (Purépecha) (Friedrich 1975). Although the Uto-Aztecan languages Nahuatl and Pipil are spoken in Mesoamerica, and have been in close contact with other Mesoamerican languages for centuries, they are not generally considered part of the Mesoamerican linguistic area (Campbell et al. 1986).1 The same is true for the Chibchan and Misumalpan families. This chapter focuses on word prosody within the Mesoamerican area and, to a lesser extent, on prosodic structure above the word. The word-prosodic systems of Mesoamerican languages are diverse, owing in part to a time depth of 4,000–6,000 years within each family. The practice of equating language names with larger ethnolinguistic groups has also resulted in a vast underestimation of linguistic diversity; for example, ‘Mixtec’ refers to at least 18 mutually unintelligible dialect clusters, with roughly 2,000 years of internal diversification (Josserand 1983). This chapter is organized into three sections, corresponding to the major language families of Mesoamerica: Oto-Manguean, Mayan, and Toto-Zoquean. The prosodic systems of these languages diverge substantially. Many Mesoamerican languages make use of non-modal phonation in their segmental inventories or word-level prosody. Thus, in addition to stress, tone, and syllable structure, this chapter also examines phonation contrasts.

28.2 Oto-Manguean languages

The Oto-Manguean family comprises approximately 180 languages spoken by about 2,148,000 people (INALI 2015). Historically, Oto-Manguean languages were spoken from Northern Central Mexico as far south as Costa Rica, but all languages spoken south of Mexico are currently dormant or extinct (Chiapanec, Mangue, Subtiaba, and Chorotega). Oto-Manguean is divided into two major branches: East, with Mixtecan, Popolocan, Zapotecan, and Amuzgo subgroups, and West, with Mè’phàà-Subtiaba, Chorotegan, Oto-Pamean, and Chinantecan subgroups (Campbell 2017a). Oto-Manguean languages are morphologically mostly isolating, though verbs generally take one or more tense-aspect-mood (TAM) prefixes. Most words may also take one or more pronominal enclitics. There is a strong tendency for morphophonology to involve fusional changes on the root.

1 The prosody of the Uto-Aztecan family, including the various Nahuatl languages, is examined in chapter 27.

28.2.1 Lexical tone

All Oto-Manguean languages are tonal, without exception, and many also possess stress. There is a sizeable literature on tone in Oto-Manguean: we report here on a survey of the entire descriptive phonological literature on the family. A total of 94 language varieties were examined.2 Five relevant prosodic features were extracted for each language: (i) tonal contrasts, (ii) maximum number of tones on a single syllable, (iii) stress pattern, (iv) rime types, and (v) additional suprasegmental features. A summary of the tonal inventory size for each major sub-family is shown in Table 28.1. Roughly half of all Oto-Manguean languages investigated (51 of 94, or 54%) possess small tonal inventories (2–3 tones), a sizeable portion (25 of 94, or 27%) possess intermediate inventories (4–5 tones), and another sizeable portion (18 of 94, or 19%) possess large inventories (6 or more tones). However, the size of the tonal inventory in an individual language shows only part of the complexity of the tonal system, because more than one tone may often surface on an individual syllable. Thus, if a Mixtecan language has the same number of tones as a Zapotecan language, the Mixtecan language will typically allow more of them on the same syllable.
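The percentages quoted above follow directly from the column totals of Table 28.1. Below is a minimal Python sketch of that arithmetic. The dictionary keys and the helper name `grouped_shares` are illustrative only, and the merging of the 6–7, 8–9, and 10–11 tone columns into one ‘large’ group follows the grouping used in the text.

```python
# Language counts per tonal-inventory size, summed across sub-families (Table 28.1).
COUNTS = {
    "2-3 tones": 51,
    "4-5 tones": 25,
    "6-7 tones": 9,
    "8-9 tones": 6,
    "10-11 tones": 3,
}

def grouped_shares(counts):
    """Collapse the table into small/intermediate/large groups.

    Returns the total and, for each group, (count, whole-number percentage).
    """
    groups = {
        "small (2-3)": counts["2-3 tones"],
        "intermediate (4-5)": counts["4-5 tones"],
        "large (6+)": counts["6-7 tones"] + counts["8-9 tones"] + counts["10-11 tones"],
    }
    total = sum(groups.values())
    return total, {name: (n, round(100 * n / total)) for name, n in groups.items()}

total, shares = grouped_shares(COUNTS)
print(total)
for name, (n, pct) in shares.items():
    print(f"{name}: {n} of {total} ({pct}%)")
```

The same helper could be applied to the rime-type counts in Table 28.8, which are also given out of 94.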

Table 28.1 Tonal complexity by Oto-Manguean language family

Family                    Languages   Number of tones                  Average number of tonal
                                      2–3   4–5   6–7   8–9   10–11    contrasts per syllable
Amuzgo                    2           0     0     1     1     0        7
Chinantecan               9           1     1     5     1     1        8
Mè’phàà-Subtiaba          3           3     0     0     0     0        9
Mixtecan                  25          19    2     0     3     1        9
Oto-Pamean                15          11    4     0     0     0        3
Popolocan                 14          7     7     0     0     0        9
Zapotecan                 26          10    11    3     1     1        5
Across all sub-families   94          51    25    9     6     3        7

2 At the time of writing, this reflects all languages known to have been investigated in the Oto-Manguean family (not the total number of languages within each sub-family). There are no living speakers of any Chorotegan language and no extant descriptions of their tonal systems.

410 CHRISTIAN D ICANIO AND RYAN BENNETT

Table 28.2 Ixpantepec Nieves Mixtec (Carroll 2015)

kwéé ‘slow’          vií ‘clean’         tjìí ‘numb’
xĩ́ĩ ‘different’      ĩĩ ‘one’            vèe ‘heavy’
kwĩ́ĩ̀ ‘skinny’        nĩĩ̀ ‘corn ear’      ĩ̀ĩ̀ ‘nine’

H=/á/, M=/a/, L=/à/.
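Ixpantepec Nieves Mixtec allows H, M, and L to combine freely over the two morae of a root. Under that assumption the nine words in Table 28.2 fill out a 3 × 3 grid of melodies, and exactly six of those melodies are contours. The Python sketch below simply enumerates the grid. The name `bimoraic_melodies` is hypothetical, and the sketch does not model any co-occurrence restriction a given language might impose.

```python
from itertools import product

LEVEL_TONES = ("H", "M", "L")

def bimoraic_melodies(tones=LEVEL_TONES):
    """Every tone sequence over two morae, assuming tones combine freely."""
    return ["".join(pair) for pair in product(tones, repeat=2)]

melodies = bimoraic_melodies()
# A melody with two different tones on its two morae is a (derived) contour.
contours = [m for m in melodies if m[0] != m[1]]
print(len(melodies), melodies)
print(len(contours), contours)
```

A language that bans particular sequences could be modelled by filtering `melodies` further. That choice would be an analytical one, not something the enumeration itself decides.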

Most Oto-Manguean languages have at least two level tones, and many possess three or more. Languages that permit more than one level tone per syllable (especially Popolocan and Mixtecan) may possess a large number of contour tones. Examples from Ixpantepec Nieves Mixtec are shown in Table 28.2: high, mid, and low tones combine freely with another tone on the root,3 creating a set of six derived contour tones. In most Mixtec languages, roots consist of either a single syllable with a long vowel or two syllables with short vowels (Longacre 1957; Macaulay and Salmons 1995). Consequently, the tonal contours shown above also occur as sequences in disyllabic roots, e.g. /kìki/ ‘sew’ (cf. [vèe] ‘heavy’ in Table 28.2). Since the distribution of tone is sensitive to root shape, researchers have argued that the tone-bearing unit (TBU) for many Mixtec languages is the bimoraic root, with tones aligned to morae rather than syllables (McKendry 2013; DiCanio et al. 2014; Carroll 2015; Penner 2019). Note that not all contour tones in Oto-Manguean languages are derived from tonal sequences. In some, such as Yoloxóchitl Mixtec, contour tones are indivisible units that contrast with tone sequences, e.g. /ta1.a3/ ‘man’ vs. /nda13.a3/ ‘went up’ (periods indicate moraic boundaries) (DiCanio et al. 2014). Tone sandhi is found in many Oto-Manguean languages as well, most notably in the Mixtecan, Zapotecan, and Popolocan subgroups. Seminal work on Oto-Manguean tone sandhi dealt with Mazatec and Mixtec languages (Pike 1948). Work on these languages was also important to the development of autosegmental-metrical theory (Goldsmith 1990). Tone sandhi in many Oto-Manguean languages is lexically conditioned. For example, in the same language, some roots with high tones may condition tonal changes on the following word, while other roots with high tones do not.
The tonal systems of Chatino languages (Zapotecan) contain several different types of floating tones that illustrate this pattern. Examples from San Juan Quiahije Chatino (SJQC) are shown in Table 28.3. SJQC has 11 tones (H, M, L, M0, MH, M∧, LM, L0, 0L, HL, ML), where ‘0’ represents a super-high tone and ‘∧’ a slight rise. Table 28.3 shows that certain high and low tone roots in Chatino are specified with a floating super-high tone (‘0’), which can replace the tone on the following word. Floating tones are lexically specified and surface only in phrasal contexts. Tonal inventories in these languages may therefore be larger than previously assumed, because a high tone with no floating tone must be phonologically distinct from one with a floating super-high tone (Cruz and Woodbury 2014).

3 Given the largely isolating morphology of Oto-Manguean, the terms ‘root’ and ‘stem’ are roughly synonymous for this family.

Table 28.3 San Juan Quiahije Chatino tone sandhi (Cruz 2011)

knaH ‘snake’      +  ĩML ‘3S’  =  knaH ĩML   ‘his/her snake’
ktaL ‘tobacco’    +  ĩML ‘3S’  =  ktaL ĩML   ‘his/her tobacco’
snaH ‘apple’      +  ĩML ‘3S’  =  snaH ĩ0    ‘his/her apple’
skwãL ‘I threw’   +  ĩML ‘3S’  =  skwãL ĩ0   ‘I threw him/her’

Table 28.4 Yoloxóchitl Mixtec tonal morphology (Palancar et al. 2016)

Gloss                Stem      NEG        COMP       INCOMP     1S
‘to break’ (tr)      ta3Ɂβi4   ta14Ɂβi4   ta13Ɂβi4   ta4Ɂβi4    ta3Ɂβi42
‘hang’ (tr)          tʃi3kũ2   tʃi14kũ2   tʃi13kũ2   tʃi4kũ2    tʃi3kũ2=ju1
‘to change’ (intr)   na1ma3    na14ma3    na13ma3    na4ma13    na1ma32
‘to peel’ (tr)       kwi1i4    kwi14i14   kwi1i4     kwi4i14    kwi1i42
‘to get wet’         tʃi3i3    tʃi14i3    tʃi13i3    tʃi4i4     tʃi3i2

Tone is not merely lexical but often serves a morphological role in many Oto-Manguean languages, particularly in inflection (Hyman 2016b; Palancar and Léonard 2016). Tone has a high functional load in the morphology of Yoloxóchitl Mixtec (YM) (Table 28.4). YM has nine tones, /4, 3, 2, 1, 13, 14, 24, 42, 32/ (‘4’ is high and ‘1’ is low). Tonal changes in the initial syllable of the YM verb root indicate negation, completive (perfective) aspect, or incompletive aspect. On polysyllabic words, the penultimate syllable’s tone is replaced by the morphological tone. In monosyllabic words, the morphological tone is simply appended to the left edge of the syllable, creating complex tonal contours. The 1sg enclitic is realized as tone /2/ at the right edge of the root unless the root contains a final tone /2/ or /1/. In this environment, the allomorph of 1sg is an enclitic /=ju1/. It is possible to combine several tonal morphemes on a single root in YM, e.g. /tʃi14i(3)2/ ‘I will not get wet’. Many Oto-Manguean tonal systems have been described and analysed in formal phonological terms in recent work, mostly using autosegmental phonology: for example, in Mixtecan (Hollenbach 1984; Macaulay 1996; Paster and Beam de Azcona 2005; Daly and Hyman 2007; DiCanio 2008, 2016; McKendry 2013; Hernández Mendoza 2017), Oto-Pamean (Turnbull 2017), Popolocan (Beal 2011), and Zapotecan (Arellanes Arellanes 2009; Chávez Peón 2010; Tejada 2012; Antonio Ramos 2015; McIntosh 2015; Villard 2015). These languages raise three major analytical issues: (i) To what extent are contours decomposable into smaller units? (ii) What is the TBU? (iii) Is tone sandhi or tonal morphophonology predictable, and can either be modelled by autosegmental rules or general phonological constraints? These issues have been examined in various languages, though for a majority of Oto-Manguean languages tone is minimally analysed (and, in several cases, not analysed at all).
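The regular replacement pattern in Table 28.4 can be stated procedurally. The Python sketch below encodes the rules described above, assuming that a word is a list of (segment, tone) pairs with one pair per syllable/mora. The helpers `inflect`, `first_singular`, and `render` are hypothetical and do not come from any cited source. The sketch reproduces the ‘to break’ and ‘hang’ rows and the 1S forms of the first four verbs. It does not capture irregular cells such as INCOMP na4ma13 or COMP kwi1i4. The monosyllable branch, and the decision to check only the last tone level when choosing the 1sg allomorph, are assumptions.

```python
# Tones are strings of digits, '4' = high ... '1' = low, as in Table 28.4.
ASPECT_TONES = {"NEG": "14", "COMP": "13", "INCOMP": "4"}

def render(word):
    """Join (segment, tone) pairs into the numeral-tone notation of Table 28.4."""
    return "".join(seg + tone for seg, tone in word)

def inflect(word, category):
    """Replace the penultimate syllable's tone with the aspect/polarity tone.

    For a monosyllable, the morphological tone is instead prefixed to the
    existing tone (left edge), giving a complex contour (assumed format).
    """
    new_tone = ASPECT_TONES[category]
    if len(word) == 1:
        seg, old_tone = word[0]
        return [(seg, new_tone + old_tone)]
    out = list(word)
    seg, _ = out[-2]
    out[-2] = (seg, new_tone)
    return out

def first_singular(word):
    """1sg: tone /2/ at the right edge, or =ju1 after a final /2/ or /1/."""
    seg, last_tone = word[-1]
    if last_tone[-1] in ("1", "2"):  # assumption: inspect the last tone level
        return render(word) + "=ju1"
    return render(word[:-1] + [(seg, last_tone + "2")])

to_break = [("ta", "3"), ("Ɂβi", "4")]
hang = [("tʃi", "3"), ("kũ", "2")]
for cat in ASPECT_TONES:
    print(cat, render(inflect(to_break, cat)))
print("1S", first_singular(to_break), first_singular(hang))
```

Listing the irregular cells as lexical exceptions, rather than complicating `inflect`, keeps the sketch close to the generalization stated in the text.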

28.2.2 Stress

Stress is usually fixed in Oto-Manguean languages, and is always confined to roots/stems (affixes never receive stress). Most roots/stems are maximally disyllabic and, as a result, root-initial and root-final stress are the norm. The presence of stress in Oto-Manguean phonological systems can be motivated by distributional asymmetries: often, more segmental and tonal contrasts are possible on stressed syllables than on unstressed syllables (Hollenbach 1984; DiCanio 2008; Hernández Mendoza 2017). In some languages, such as Mazahua (Knapp Ring 2008), tone is only contrastive on the stressed, initial syllable of the root. Of the 94 languages surveyed in §28.2.1, some description of stress was found for 70 (Table 28.5). Excluding the 12 languages with monosyllabic roots, 25 of the remaining 58 (43%) have root-final stress and 21 (36%) have root-initial stress. Stem-penultimate stress is also described for certain Zapotec languages and for Metzontla Popoloca (Veerman-Leichsenring 1991).4 Variable (i.e. mobile) stress is found in several Oto-Manguean languages: Diuxi Mixtec (Pike and Oram 1976), Molinos Mixtec (Hunter and Pike 1969), Ayutla Mixtec (Pankratz and Pike 1967), San Juan Atzingo Popoloca (Kalstrom and Pike 1968), Tlacoyalco Popoloca (Stark and Machin 1977), and Comaltepec Zapotec (Lyman and Lyman 1977). Since tone may also interact with stress, such languages have been of interest within the larger phonological literature (e.g. de Lacy 2002), though older descriptions of these languages warrant further phonological and phonetic investigation. Given that stress is assigned primarily to roots, secondary stress is absent in most Oto-Manguean languages, though alternating, head-initial trochaic stress is reported for several languages: San Miguel Tenoxtitán Mazahua (Knapp Ring 2008), Déposito Mazahua (Juárez García and Cervantes Lozada 2005), Acazulco Otomí (Turnbull 2017), San Lucas Quiaviní Zapotec (Chávez Peón 2010), and Lachíxio Zapotec (Sicoli 2007). Little work has examined the phonetic correlates of stress in Oto-Manguean languages, though stress has been explored instrumentally in a few Mixtecan languages: Ixpantepec Nieves Mixtec (Carroll 2015), Southeastern Nochixtlán Mixtec (McKendry 2013), and Itunyoso Triqui (DiCanio 2008, 2010). In each of these languages, the main correlate of stress is acoustic duration.

Table 28.5 Stress pattern by Oto-Manguean language family

Family              Languages   Monosyllabic   Root-     Root-    Root-          Variable
                                roots          initial   final    penultimate
Amuzgo              1           0              0         1        0              0
Chinantecan         8           3              0         5        0              0
Mè’phàà-Subtiaba    2           0              0         2        0              0
Mixtecan            14          0              7         4        0              3
Oto-Pamean          12          1              11        0        0              0
Popolocan           9           0              0         5        1              3
Zapotecan           24          8              3         8        3              2
Total               70          12             21        25       4              8
Note that 47 of the 94 languages surveyed here (50%) also possess a vowel/rime length contrast, so duration may not be a reliable stress cue in all languages. The phonetics of stress remains an open area of inquiry in Oto-Manguean linguistics. For 11 of the 94 languages surveyed, a contrast is reported between ‘ballistic’ and ‘controlled’ stress: all nine Chinantecan languages surveyed, Xochistlahuaca Amuzgo (Buck 2015), and San Jerónimo Mazatec (Bull 1978). Ballistic syllables, first described by Merrifield (1963) and reviewed in Mugele (1982), may possess some or all of the following phonological characteristics: (i) fortis-initial onsets, (ii) shorter vowel duration, (iii) an abrupt, final drop in intensity, (iv) tonal variation (specifically f0 raising), (v) post-vocalic aspiration, and/or (vi) coda devoicing. Examples from Lalana Chinantec are shown in Table 28.6. Though the controlled/ballistic distinction is considered to be a type of ‘stress’, these contrasts may occur in monosyllabic lexical words, making them fundamentally different from true word-level stress distinctions (Hyman 2006). Mugele argues, on the basis of acoustic data, that the distinguishing feature of ballistic syllables in Lalana Chinantec is an active expiratory gesture that raises subglottal pressure and produces syllables with most of the characteristics listed above (all except (i)). Silverman et al. (1995) and Kim (2011) find no evidence for this contrast in San Pedro Amuzgos or Jalapa Mazatec, respectively, despite previous descriptions. Regarding ballistic syllables, Silverman (1997a: 241) states that ‘a byproduct of this increased transglottal flow (for producing post-vocalic aspiration) is a moderate pitch increase on the latter portion of the vowel, around the onset of aspiration’. A major question is the extent to which the acoustic features of controlled and ballistic syllables are derivable from a single articulatory parameter. Since little instrumental work has been done on this question, the nature of this unique contrast remains an open area of research.

4 As some of these languages can possess trisyllabic words, it is currently unclear whether the intended generalization in the existing descriptions is that stress is root-initial or truly penultimate.

Table 28.6 Controlled and ballistic syllables (marked with /ˊ/) in Lalana Chinantec (Mugele 1982: 9)

Controlled stress            Ballistic stress
ɔː2 ‘mouth’                  ɔ́ː2 ‘bury it!’
dʒi3 ‘chocolate atole’       dʒí3 ‘wind’
liː23 ‘appears’              líː23 ‘remembers’

1 = high tone, 2 = mid tone, 3 = low tone.

28.2.3 Phonation type

Some Oto-Manguean languages possess phonation type contrasts in their consonant, vowel, and/or prosodic systems (see Silverman 1997a). Phonation type is usually orthogonal to tone in the phonological system. For instance, Jalapa Mazatec (Popolocan) possesses a three-way distinction between breathy, modal, and creaky vowels, but all three tones (high, mid, low) co-occur with each phonation type (Silverman et al. 1995; Garellek and Keating 2011). In some Zapotec languages, however, tone and phonation are interdependent. Itunyoso Triqui (Mixtecan) has coda glottal consonants (/Ɂ/ and /ɦ/) as well as intervocalic /Ɂ/. Contour tones do not surface on syllables with coda /Ɂ/, but most tonal patterns surface on words with intervocalic glottalization or coda /ɦ/ (DiCanio 2008, 2012a). Intervocalic /Ɂ/ in Itunyoso Triqui is frequently realized as creaky phonation on adjacent vowels (DiCanio 2012a). Table 28.7 demonstrates that glottal contrasts in Itunyoso Triqui are orthogonal to tonal contrasts, though they may still interact with them in certain ways (e.g. no contour tones surface before /Ɂ/). In many Oto-Manguean languages, glottalized or creaky vowels are realized in a phased manner (Silverman 1997a, 1997b; Gerfen and Baker 2005; Avelino 2010; DiCanio 2012a).


Table 28.7 The distribution of Itunyoso Triqui tones in relation to glottal consonants

Tone   Modal                 Coda /ɦ/                  Coda /Ɂ/               /VɁV(ɦ)/
/4/    ββe4 ‘hair’           yãɦ4 ‘dirt’               tʃiɁ4 ‘our ancestor’   ɾã4Ɂãɦ4 ‘to dance’
/3/    nne3 ‘plough’         yãɦ3 ‘paper’              tsiɁ3 ‘pulque’         nã3Ɂãɦ3 ‘limestone’
/2/    nne2 ‘to lie’         nãɦ2 ‘again’              ttʃiɁ2 ‘10’            ta2Ɂaɦ2 ‘some, half’
/1/    nne1 ‘naked’          kãɦ1 ‘naked’              tʃiɁ1 ‘sweet’          na1Ɂaɦ1 ‘shame’
/45/   –                     nãɦ45 ‘to wash’           –                      nã3Ɂãɦ45 ‘I return’
/13/   ββi13 ‘two of them’   nãɦ13 ‘this (one)’        –                      kã1Ɂãɦ3 ‘four of them’
/43/   tʃe43 ‘my father’     nnãɦ43 ‘mother! (voc.)’   –                      ko4Ɂo43 ‘to drink’
/32/   nne32 ‘water’         nnãɦ32 ‘cigarette’        –                      sã3Ɂãɦ2 ‘money’
/31/   nne31 ‘meat’          –                         –                      kã3Ɂã1 ‘wind, breath’

Creaky vowels are produced as sequences, i.e. [aa̰a], rather than with creaky phonation sustained throughout the vowel. In most Zapotec languages, there is in fact a contrast between a checked vowel, i.e. /aɁ/ → [aɁ], and a rearticulated vowel, i.e. /aɁa/ → [aa̰a]. The latter is realized with weak creaky phonation and the former with more abrupt glottal closure. Both behave as single syllable nuclei in Zapotec (Avelino Becerra 2004; Arellanes Arellanes 2009).5 A number of Oto-Manguean languages also possess phonation type contrasts among consonants. Almost all Oto-Pamean and many Popolocan languages have a series of aspirated/breathy and glottalized consonants, e.g. Mazahua /màɁa/ ‘to go’ vs. /m̥aûphɨ/ ‘nest’ vs. /m̰ása/ ‘grub’ (Knapp Ring 2008). The representation of these complex consonants has been a topic of some theoretical interest (e.g. Steriade 1994; Golston and Kehrein 1998).

28.2.4 Syllable structure and length

Many Oto-Manguean languages permit complex rimes, especially in the Oto-Pamean and Zapotecan families (Jaeger and Van Valin 1982; Berthiaume 2004), e.g. Northern Pame /st͡s’ǎhawnt/ ‘tree knot’ and /st͡sháwɁ/ ‘ruler’.6 The distribution of rime types is shown in Table 28.8. Roughly a third of all languages permit only open syllables (33 of 94, 35%), while a sizeable number of languages permit only a glottal consonant coda (22 of 94, 23%) or a single (buccal) coda consonant (27 of 94, 29%). Seven languages permit closed syllables only in non-word-final syllables, and five additional languages permit more complex coda types. While not shown here, many Oto-Manguean languages permit complex onsets as well, especially languages in which pre-tonic syncope has taken place via historical sound change, e.g. compare Zenzontepec Chatino /lutzeɁ/ ‘tongue.3S’ to Tataltepec Chatino /ltzéɁ/ (Campbell 2013). Prefixation may also produce complex onset clusters on verbs (Jaeger and Van Valin 1982).

5 This differs from the Triqui data in Table 28.7, where the /VɁV(ɦ)/ examples are disyllabic (DiCanio 2008).

6 The sole exceptions within Zapotecan are the five Chatino languages, none of which permit codas other than /Ɂ/.


Table 28.8 Permitted rime types and length contrasts by Oto-Manguean family

                                Permitted syllable types
Family             Languages   (C)V   (C)V(P/h)   (C)V(C) but   (C)V(C)   (C)V(C)(C)   Length
                                                  *(C)VC#                              contrasts
Amuzgo             2           0      2           0             0         0            0
Chinantecan        9           0      6           0             3         0            9
Mè’phàà-Subtiaba   3           2      1           0             0         0            2
Mixtecan           25          19     6           0             0         0            3
Oto-Pamean         15          0      0           7             3         5            4
Popolocan          14          12     2           0             0         0            3
Zapotecan          26          0      5           0             21        0            26
Total              94          33     22          7             27        5            47

Length contrasts occur in 50% (47 of 94) of the languages surveyed. For Mixtec languages, roots are typically bimoraic (see §28.2.1). Thus, there is a surface contrast between short vowels in polysyllabic words (e.g. CVCV) and long vowels in monosyllabic words (e.g. CVV). This type of root template is not counted as a length contrast here. For Zapotec languages, the contrast between fortis and lenis consonants involves an alternation with vowel length on the root. Long vowels surface before a lenis (or short) consonant but short vowels surface before a fortis (or long) consonant (Avelino 2001; Leander 2008; Arellanes Arellanes 2009; Chávez Peón 2010), e.g. /wdzínː/ ‘arrived’ vs. /dzìːn/ ‘honey’ in Ozolotepec Zapotec (Leander 2008). This trade-off in duration between the vowel and consonant in Zapotec is similar to the C/V trading relation with voicing in languages like English (Port and Dalby 1982; Luce and Charles-Luce 1985) and, in fact, the fortis/lenis contrast in many Zapotec languages has evolved into a voicing contrast among obstruents (Beam de Azcona 2004).
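The Zapotec fortis/lenis alternation is a complementary distribution: the consonant's strength predicts the length of the preceding vowel. The toy Python sketch below encodes that prediction. It assumes that a rime is reduced to one vowel plus one following consonant flagged as fortis or lenis, which is a deliberate simplification. The name `surface_rime` is hypothetical, and length is written with ‘ː’ as in the Ozolotepec examples.

```python
LONG = "ː"

def surface_rime(vowel, consonant, fortis):
    """Spell out V + C under the fortis/lenis duration trade-off.

    A fortis (long) consonant follows a short vowel; a lenis (short)
    consonant follows a long vowel.
    """
    if fortis:
        return vowel + consonant + LONG
    return vowel + LONG + consonant

print("wdz" + surface_rime("í", "n", fortis=True))   # wdzínː  'arrived'
print("dz" + surface_rime("ì", "n", fortis=False))   # dzìːn   'honey'
```

Only one length mark surfaces per rime here, which mirrors the trading relation described in the text. Languages where the contrast has shifted to voicing would need a different encoding.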

28.2.5 Intonation and prosody above the word

Given the complexity of word-level prosody in Oto-Manguean languages, fairly little work has been done to date examining prosodic structure above the word. Lexical tone has a high functional load and most morphemes in Oto-Manguean languages are specified for tone. Intonational pitch accents are fairly limited, and evidence for prosodic phrasing must therefore be based on patterns of lengthening and on the domains of phonological processes such as tone sandhi. Tone production in certain languages is sensitive to phrasal position. Declination and/or final lowering influences the production of tone in Coatlán Lochixa Zapotec, where rising or level tones are realized with a falling f0 pattern in utterance-final position (Beam de Azcona 2004). In Chicahuaxtla Triqui, a phrase-final tone (/3/) is appended to noun phrases (NPs) (Hernández Mendoza 2017). In Ixcatec (Popolocan), low tones surface only at the end of a phonological phrase. In phrase-internal (but word-final) position, all low tones neutralize with mid tone (DiCanio, submitted). In Figure 28.1a, we observe complete overlap in the production of low and mid tones. These same target words are realized with different tones when they appear in utterance-final position. In Figure 28.1b, we also observe a separate pattern of high tone lowering in utterance-final position.

Figure 28.1 panels: (a) Tones in non-final position in monosyllabic words; (b) Tones in final position in monosyllabic words. Each panel plots f0 (normalized) against time (normalized) for H, M, and L tones.

Figure 28.1 Tones in utterance non-final and utterance-final position in Ixcatec. The figures show f0 trajectories for high, mid, and low tones, averaged across four speakers.

Tone sandhi provides the clearest evidence of higher-level prosodic structure in Oto-Manguean languages. In Zenzontepec Chatino, high tones spread rightward onto toneless syllables (Ø), but adjacent mid (/ā/) or high (/á/) tones undergo downstep. This downstep extends to the end of the intonational phrase (IP) (1).

(1) Intonational domains in high tone downstep in Zenzontepec Chatino (Campbell 2014: 138, citing ‘la familia 9:36’)
(Tones in the initial line are underlying. Tones below this are derived.)
(jā kisōʔná=na tāká)IP (maxi k-ii=ą laaɁ nyāɁā)IP
Ø Ø.M.H=H ↓(M.H) Ø.Ø Ø=Ø Ø M.M
conj master=1pl.incl exist[.3] even.if pot-feel=1pl.incl like.so see.2sg
‘We have our master, even if we think that way, you see.’

Little instrumental research has been done on phonological phrasing but, impressionistically, two general patterns typify the Oto-Manguean family: (i) the verb (with all TAM affixes) and a following NP usually form a phonological phrase, with no pause between the verb and the NP, and (ii) any pre-verbal free morphemes belong to a separate phonological phrase.7 The pattern in (i) is grammaticalized in San Ildefonso Tultepec Otomí, where there are two classes of verbs (bound and free), the former of which is used when the verb forms a phonological phrase with the following NP (Palancar 2004). With respect to (ii), the pre-verbal domain serves as a position for constituents under argument or contrastive focus in many Oto-Manguean languages (Broadwell 1999; Foreman 2006; Chávez Peón 2010; Esposito 2010; McKendry 2013; Carroll 2015; DiCanio et al. 2018). Finally, new words are formed in many Oto-Manguean languages through compounding, which may involve phonological changes sensitive to constituency. In Southeastern Nochixtlán Mixtec (Mixtecan), auxiliary verbs and verbal prefixes are reduced before verb roots, suggesting that the verbal complex (aux + pfx − root = enclitic) is a prosodic unit (McKendry 2013). In comparison to research on lexical tone, investigations into higher-level prosodic structure remain a rich, though challenging, area for future research.

7 VSO word order is the most common for Oto-Manguean languages (Campbell et al. 1986) and, as alluded to above, the juncture between the root and the following personal clitic is the locus of complex morphophonological patterns across the language family.

28.3 Mayan languages

The Mayan family comprises some 30-odd languages, spoken by over 6 million people in a region spanning from southeastern Guatemala through southern Mexico and the Yucatan peninsula (Bennett et al. 2016). The principal subgroups of this family are Eastern Mayan, Western Mayan, Yucatecan, and Huastecan. Huasteco, the most linguistically divergent Mayan language, is spoken far from the Maya heartland in east-central Mexico (Kaufman 1976a). There is evidence of considerable linguistic contact among Mayan languages, and between Mayan and other Mesoamerican languages (Campbell et al. 1986; Law 2013, 2014). Aissen et al. (2017) is a comprehensive source on Mayan languages, their history, and their grammatical structures. On the phonetics and phonology of Mayan languages, see Bennett (2016) and England and Baird (2017). Glossing conventions and orthographic practices in this section follow Bennett (2016) and Bennett et al. (2016).

28.3.1 Stress and metrical structure

Stress is predictable in Mayan languages, with few exceptions. Four distinct patterns of stress assignment are robustly attested within the family. The first is fixed final stress, found in K’ichean-branch Mayan languages and Southern Mam (all Eastern Mayan languages of Guatemala).

(2) Sakapulteko (DuBois 1981: 109, 124, 138; Mó Isém 2007)
a. axlajuuj [ʔaʃ.la.ˈxuːx] ‘thirteen’
b. kinb’iinik [kim.ɓiː.ˈnekʰ] ‘I walk’
c. xinrach’iyan [ʃin.ʐə.tʃʔi.ˈjaŋ] ‘he hit me’
d. kaaqaqapuuj [kaː.qa.qa.ˈpuːχ] ‘we will go to cut it’

The second is fixed penultimate stress, found in Southern Mam.

(3) Ostuncalco Mam (England 1983, 1990: 224–226; Pérez Vail and Jiménez 1997; Pérez Vail et al. 2000)
a. kyaaje’ [ˈkjaː.χeʔ] ‘four’
b. quniik’un [qu.ˈniː.kʔun] ‘night’
c. t-xmilaal [ˈtʂmi.laːl] ‘his/her body’
d. kaab’aje [kaː.ˈɓa.χe] ‘day before yesterday’
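Fixed stress of the kind illustrated in (2) and (3) depends only on the position of a syllable counted from the right edge of the word. The Python sketch below assumes the input is already syllabified by hand. The name `fixed_stress` is hypothetical, and the choice to stress the only syllable of a monosyllable under the ‘penultimate’ setting is an assumption.

```python
def fixed_stress(syllables, pattern):
    """Place 'ˈ' on the final or penultimate syllable and join with '.'.

    `pattern` is 'final' (as in (2)) or 'penultimate' (as in (3)).
    Monosyllables receive stress on their only syllable.
    """
    if pattern == "final":
        index = len(syllables) - 1
    elif pattern == "penultimate":
        index = max(len(syllables) - 2, 0)
    else:
        raise ValueError(f"unknown stress pattern: {pattern!r}")
    marked = list(syllables)
    marked[index] = "ˈ" + marked[index]
    return ".".join(marked)

print(fixed_stress(["ʔaʃ", "la", "xuːx"], "final"))         # ʔaʃ.la.ˈxuːx
print(fixed_stress(["kaː", "qa", "qa", "puːχ"], "final"))   # kaː.qa.qa.ˈpuːχ
print(fixed_stress(["qu", "niː", "kʔun"], "penultimate"))   # qu.ˈniː.kʔun
print(fixed_stress(["kaː", "ɓa", "χe"], "penultimate"))     # kaː.ˈɓa.χe
```

Neither pattern looks at syllable weight, which is what separates these languages from the quantity-sensitive systems discussed next.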

The third pattern is quantity-sensitive stress, found in Huasteco as well as some Mamean languages (Northern Mam, Ixil, Awakateko, and Teko; all Eastern Mayan). In Huasteco, stress falls on the rightmost long vowel, otherwise on the initial syllable (Larsen and Pike 1949; Edmonson 1988; Herrera Zendejas 2011). Long vowels also attract stress in Mamean languages, as do syllables ending in [VɁ], [VɁC], or even [VC], depending on the language. In some cases (e.g. Northern Mam), stress assignment may follow a complex weight scale [Vː] > [VɁ] > [VC] > [V] (Kaufman 1969; England 1983, 1990).

(4) Chajul Ixil (Ayres 1991: 8–10; Poma et al. 1996; Chel and Ramirez 1999)
a. Default penultimate stress:
(i) ib’otx’ [ˈʔi.ɓoʈʂʔ] ‘vein’
(ii) amlika’ [ʔam.ˈli.kaʔ] ‘sky’
b. Stress attraction to final [Vː], [VʔC#]:
(i) ixi’m [ʔi.ˈʂiʔm] ‘corn’ (~[ˈʔi.siʔm])
(ii) vitxoo [βi.ˈʈʂoː] ‘his/her animal’
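The Ixil generalization in (4) and the Huasteco rule stated above can both be written as functions that return the index of the stressed syllable. The Python sketch below makes several assumptions. Syllabification is supplied by hand. Only the five symbols a, e, i, o, u are treated as vowels. A long vowel anywhere in the final syllable counts as heavy for Ixil. The names `ixil_stress_index`, `huasteco_stress_index`, and `mark_stress` are hypothetical, and the Huasteco check below uses invented strings rather than attested words.

```python
import re

VOWELS = "aeiou"  # simplification: only these vowel symbols are recognized
# A final [VʔC]: vowel, glottal stop, then at least one non-vowel segment to the end.
FINAL_V_GLOTTAL_C = re.compile(rf"[{VOWELS}]ʔ[^{VOWELS}ːʔ]+$")

def mark_stress(syllables, index):
    """Join syllables with '.' and put 'ˈ' before the stressed one."""
    out = list(syllables)
    out[index] = "ˈ" + out[index]
    return ".".join(out)

def ixil_stress_index(syllables):
    """Chajul Ixil as in (4): final stress on [Vː] or [VʔC#], else penultimate."""
    last = len(syllables) - 1
    final = syllables[last]
    heavy = "ː" in final or FINAL_V_GLOTTAL_C.search(final) is not None
    if last == 0 or heavy:
        return last
    return last - 1

def huasteco_stress_index(syllables):
    """Huasteco: rightmost syllable with a long vowel, otherwise the first."""
    long_positions = [i for i, syl in enumerate(syllables) if "ː" in syl]
    return long_positions[-1] if long_positions else 0

for word in (["ʔi", "ɓoʈʂʔ"], ["ʔam", "li", "kaʔ"], ["ʔi", "ʂiʔm"], ["βi", "ʈʂoː"]):
    print(mark_stress(word, ixil_stress_index(word)))
```

In ib’otx’ the glottalization belongs to an ejective affricate, not to a [Vʔ] sequence, so the regular expression requires the glottal stop to follow the vowel directly. A graded weight scale such as Northern Mam’s would need a ranking function in place of the boolean `heavy` test.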

More restricted patterns of quantity sensitivity are attested in Uspanteko (§28.3.2) and possibly K’iche’ (Henderson 2012). These cases involve additional conditioning by tone and/or morphological structure (also reported for quantity-sensitive stress in Mamean languages, e.g. England 1983). The fourth pattern is phrasally determined stress. Several languages in the Q’anjob’alan subgroup of Western Mayan have variable stress conditioned by phrasal position: stress is normally on the first syllable of the word or root, but it shifts to the final syllable in phrase-final position. Phrasally conditioned stress is well documented for Q’anjob’al (5) and its close relatives Akateko and Popti’ (Day 1973; England 2001).

(5) Q’anjob’al (Mateo Toledo 1999, 2008: 94–96; Baquiax Barreno et al. 2005)
A naq Matin max kokolo’, naq kawal miman.
[a naqx ˈma.tin maʂ ko.ko.ˈloɁ, naqx ˈka.wal mi.man]
foc clf Matin com.a3sg e1pl.help.tv clf tns big.a3sg
‘It was Matin who we helped, the big one.’

It remains unclear whether ‘stress shift’ in this pattern actually affects word-level stress or instead reflects the addition of a non-metrical, intonational prominence to phrase-final syllables (i.e. a boundary tone; see Gordon 2014 for discussion). Descriptions of Yucatecan and Western Mayan languages (particularly the Greater Tseltalan subgroup) commonly report complex interactions between stress, phrase position, sentence type, and intonation (§28.3.5). For example, Vázquez Álvarez (2011: 43–45) states that Ch’ol has word-final and phrase-final stress in declaratives, but initial stress in polar questions (6) (see also Attinasi 1973; Warkentin and Brend 1974; Coon 2010; Shklovsky 2011).

(6) a. buchuloñtyokula [bu.tʃu.loɲ.tjo.ku.ˈla] ‘yes, we are still seated’
b. buchuloñäch [ˈbu.tʃu.lo.ɲi.tʃ] ‘Is it true that I am seated?’

Such patterns may indicate that ‘stress’ is phrasal rather than word-level in some Mayan languages (as claimed by e.g. Polian 2013 for Tseltal), or that phrasal stress and intonation mask the position of word-level stress in certain contexts. Given these uncertainties, the description of word prosody and phrasal prosody in the Western Mayan and Yucatecan languages would benefit from more targeted investigation. There is little consensus over stress assignment in Yucatec. Since the influential early study of Pike (1946), Yucatec has been described as having a mixture of quantity-sensitive and initial/final stress (e.g. Fisher 1973; Fox 1978; Bricker et al. 1998; Gussenhoven and Teeuw 2008; see Bennett 2016 for more references). Existing analyses are not all mutually compatible, and the actual phonetic cues to stress in Yucatec remain obscure. It has even been suggested that Yucatec, a tonal language (§28.3.2), may lack word-level stress altogether (Kidder 2013). Chontal (Western Mayan) is the only language in the family that provides clear evidence for phonemic stress, e.g. u p’isi [Ɂu ˈpʔi.si] ‘he measured it’ vs. u p’isi [ʔu pʔi.ˈsi] ‘he wakened him’ (Keller 1959; Knowles 1984; Pérez González 1985). However, many minimal pairs for stress in Chontal are morphologically or syntactically conditioned (e.g. a sutun [Ɂa su.ˈtun] ‘you turn it over’ vs. sutun [ˈsu.tun] ‘Turn it over!’; Knowles 1984: 61–62).
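The phrasally conditioned pattern in (5) makes the stressed position a function of both the word and its position in the phrase. The Python sketch below states only that descriptive generalization. It does not take a position on whether the phrase-final prominence is metrical or intonational (a boundary tone), and it counts from the word edge rather than the root edge. The names `qanjobal_stress` and `mark_stress` are hypothetical.

```python
def mark_stress(syllables, index):
    """Join syllables with '.' and put 'ˈ' before the stressed one."""
    out = list(syllables)
    out[index] = "ˈ" + out[index]
    return ".".join(out)

def qanjobal_stress(syllables, phrase_final):
    """Initial stress by default; final stress when the word ends a phrase."""
    index = len(syllables) - 1 if phrase_final else 0
    return mark_stress(syllables, index)

print(qanjobal_stress(["ma", "tin"], phrase_final=False))       # ˈma.tin
print(qanjobal_stress(["ko", "ko", "loɁ"], phrase_final=True))  # ko.ko.ˈloɁ
```

Because the same word can surface with either stress pattern, phrase position has to be passed in as context. It cannot be computed from the word alone.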
Most Mayan languages lack word-level secondary stress, apart from morphological compounds composed of two or more independent words (e.g. Ch’ol matye’ chityam [ma.ˌtʲeʔ t͡ʃi.ˈtjam] ‘wild boar’; Vázquez Álvarez 2011: 44). However, there are a few scattered claims of secondary stress in non-compound words as well (Bennett 2016: 497). Perhaps because most Mayan languages lack rhythmic, alternating stress, not much has been written about abstract foot structure in this family. Bennett and Henderson (2013) argue that foot structure conditions stress, tone, and segmental phonotactics in Uspanteko. In their analysis, final stress involves iambic footing (e.g. inb’eweroq [Ɂim.ɓe(we.ˈroq)] ‘I’ll go to sleep’), whereas penultimate stress (with tone) involves trochaic footing (e.g. intéleb’ [Ɂin(ˈté.leɓ)] ‘my shoulder’) (Can Pixabaj 2007: 57, 224). Bennett and Henderson support this analysis by arguing that foot-internal vowels are more susceptible to deletion than foot-external vowels, under both iambic and trochaic footing.

28.3.2 Lexical tone

Most Mayan languages lack lexical tone, suggesting that Proto-Mayan and its immediate daughters were not tonal languages (though see McQuown 1956; Fisher 1973, 1976 for other views). However, lexical tone has emerged several times within the Mayan family, mostly as a reflex of post-vocalic [h Ɂ], which were often lost in the process of tonogenesis (see Fox 1978; Bennett 2016; Campbell 2017b; England and Baird 2017). Yucatec is the best-studied tonal language in the family (Pike 1946; Blair 1964; Bricker et al. 1998; Frazier 2009a, 2009b, 2013; Sobrino Gómez 2010; and many others). Lexical tone is also attested in Southern Lacandon (Yucatecan), Uspanteko (Eastern Mayan), Mocho’ (Western Mayan), and possibly one variety of Tsotsil (Western Mayan; see below in this section). Incipient tone is reported for both Teko and the Ixtahuacán variety of Mam (Eastern Mayan; England and Baird 2017) as well as Tuzanteco (Western Mayan; Palosaari 2011). Yucatec has a contrast between high /V́ː/ and low /V̀ː/ on long vowels (e.g. miis /mìːs/ ‘cat’ vs. míis /míːs/ ‘broom’; Sobrino Gómez 2010). Short vowels are realized with pitch in the low-mid range, and are standardly analysed as phonologically unspecified for tone. Additionally, ‘rearticulated’ /VɁV/ vowels (phonologically a single nucleus; §28.3.3) are realized with a sharply falling pitch contour. The phonetic realization of tone, particularly high /V́ː/, varies with phrasal position and intonational context in Yucatec (e.g. Kügler and Skopeteas 2006; Gussenhoven and Teeuw 2008). Southern Lacandon, another member of the Yucatecan branch, is described as having a contrast between high /V́ː/ and toneless /Vː/ long vowels; as in Yucatec, short vowels are phonologically toneless (Bergqvist 2008: 64–66; cf. Fisher 1976). Uspanteko has a contrast between high (or falling) tone /V́ː/ and low (or unspecified) tone /Vː/ on long vowels in stressed, word-final syllables (e.g. chaaj [ˈt͡ʃáːχ] ‘ash’ vs. kaaj [ˈkaːχ] ‘sky’; Can Pixabaj 2007: 69, 110; see also Bennett and Henderson 2013). Additionally, words with short vowels in the final syllable show a contrast between toneless […σˈσ] and tonal […ˈσ́σ], in which both stress and high tone occur on the penult (e.g. ixk’eq [Ɂiʃ.ˈkɁeq] ‘fingernail’ vs. wixk’eq [ˈwíʃ.kɁeq] ‘my fingernail’).
(See Kaufman 1976b; Campbell 1977; Grimes 1971, 1972 for different descriptions of stress and tone in Uspanteko.) Palosaari (2011) describes nouns in Mocho’ as having a three-way contrast in stressed, final syllables between toneless long vowels (e.g. kaanh [ˈkaːŋ] ‘four’), long vowels with falling tone (marked as low, e.g. kàanh [ˈkàːŋ] ‘sky’), and toneless short vowels (e.g. k’anh [ˈkɁaŋ] ‘loud’) (see also Martin 1984). Sarles (1966) and Kaufman (1972) report that the variety of Tsotsil spoken in San Bartolomé de los Llanos (also known as San Bartolo or Venustiano Carranza Tsotsil) has a contrast between high and low tone on roots, and predictable tones on affixes. This characterization of the data is disputed by Herrera Zendejas (2014), who argues that pitch variation across vowels in San Bartolo Tsotsil reflects allophonic conditioning by glottalized consonants rather than true phonological tone (see also Avelino et al. 2011: fn. 1). It remains an open question whether this or any other variety of Tsotsil has phonological tone contrasts. Several languages in the Mayan family have incipient tone: some vowels appear to be specified for a particular pitch level or contour, though pitch is at least partially predictable from context (e.g. Hyman 1976; Hombert et al. 1979). For example, in Ixtahuacán Mam (Eastern Mayan), /VːɁ/ sequences are realized as [Vː] with falling tone and no apparent glottal closure corresponding to the underlying /Ɂ/, as shown in (7).

(7) Ixtahuacán Mam (England 1983: 32–41; England and Baird 2017)
a. i’tzal /iʔtsal/ → [ˈʔiʔ.tsal] ‘Ixtahuacán’
b. sii’ /siːʔ/ → [ˈsîː] ‘firewood’
c. a’ /aʔ/ → [ˈʔaʔ] ‘water’
d. waa’ya /waːʔja/ → [ˈwâː.ja] ‘my water’

MESOAMERICA 421

Similar cases of quasi-tonemic pitch conditioned by /Ɂ/ are reported for Teko (Eastern Mayan: Kaufman 1969; Pérez Vail 2007) and Tuzantec (Western Mayan, possibly a dialect of Mocho’, which is tonal; Martin 1984; Palosaari 2011). To our knowledge there are no instrumental studies of incipient tone in Mayan languages.
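The Ixtahuacán Mam pattern in (7), where a long vowel followed by /ʔ/ surfaces as a long vowel with falling tone and no glottal closure while short /Vʔ/ is left intact, can be stated as a simple string rewrite. The following is a minimal Python sketch, not England's analysis: it assumes phonemic input without stress marks, ignores the epenthetic word-initial [ʔ] seen in (7a) and (7c), assumes a five-vowel inventory /a e i o u/, and writes falling tone with a circumflex, as in (7).

```python
import re
import unicodedata

LONG = "\u02d0"      # ː length mark
GLOTTAL = "\u0294"   # ʔ glottal stop
FALLING = "\u0302"   # combining circumflex, used here for falling tone as in (7)
VOWELS = "aeiou"     # assumed five-vowel inventory

# A long vowel immediately followed by a glottal stop.
_LONG_V_GLOTTAL = re.compile(f"([{VOWELS}]){LONG}{GLOTTAL}")


def mam_falling_tone(form: str) -> str:
    """Rewrite /Vːʔ/ as [V̂ː] (falling tone, no glottal closure); short /Vʔ/ is untouched."""
    out = _LONG_V_GLOTTAL.sub(lambda m: m.group(1) + FALLING + LONG, form)
    # Compose vowel + combining circumflex into a single character where possible (i + ̂ -> î).
    return unicodedata.normalize("NFC", out)


if __name__ == "__main__":
    for phonemic in ("siːʔ", "waːʔja", "aʔ", "iʔtsal"):
        print(f"/{phonemic}/ -> [{mam_falling_tone(phonemic)}]")
```

Only the long-vowel context triggers the rule, which is why /aʔ/ ‘water’ and /iʔtsal/ pass through unchanged.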

28.3.3 Phonation

Several Mayan languages have laryngeally complex vowels. In the Yucatecan languages, modally voiced vowels contrast with so-called rearticulated vowels /VₓʔVₓ/ (8). Although typically transcribed as a sequence, these are phonologically single segments: words such as Mopan ch’o’oj [t͡ʃɁoɁoh] ‘rat’ (Hofling 2011: 5, 172) are monosyllabic (Bennett 2016: §2.3).

(8) Itzaj (Hofling 2000: 4–5, 10)
a. kan [ˈkan] ‘snake’
b. ka’an [ˈkaʔan] ‘sky’
c. taan [ˈtaːn] ‘front’
d. ta’an [ˈtaʔan] ‘lime’
e. a’ [ʔaʔ] det

In Yucatec, rearticulated vowels are associated with a sharp high-low pitch contour, /V́ₓɁV̀ₓ/. Phonetically, they are usually produced with creaky voice rather than a full glottal stop; Frazier (2009a, 2009b, 2013) argues that a more appropriate phonetic transcription for these vowels would be [V́V̰]. Gussenhoven and Teeuw (2008) report that glottalization is strongest in phrase-final position. Attinasi (1973) and Coon (2010) argue for a second type of laryngeally complex vowel in Ch’ol (Western Mayan), ‘aspirated’ /Vh/ ~ /VV̥/ (e.g. k’ajk [kɁahk] ~ [kɁaḁk] ‘fire’ vs. pak’ [pakɁ] ‘seed’). However, many authors treat the voiceless portion of ‘aspirated’ vowels as an independent consonant rather than contrastive vowel phonation (e.g. Schumann Gálvez 1973; Vázquez Álvarez 2011). Polian (2013: 105, 112–117) notes that [VhCCV] clusters are the only triconsonantal clusters permitted in Oxchuc Tseltal (Western Mayan), which may indicate that [h] is in fact a vowel feature rather than a true consonant in this context (see also Vázquez Álvarez 2011: 19, 46–47 on Ch’ol). Both phonemic and epenthetic glottal stops are pervasive in Mayan and are frequently realized as creakiness on adjacent vowels rather than as a full stop (Frazier 2009a, 2013; Baird 2011; Baird and Pascual 2011). The realization of /VɁC/ sequences often includes an ‘echo’ vowel, [VₓʔVₓC], making them superficially similar to ‘rearticulated’ vowels in the Yucatecan languages.
England and Baird (2017) note that the phonological behaviour of /Ɂ/ in some Mayan languages suggests that /Ɂ/ is both a consonant and a feature of vowels.

28.3.4 Syllable structure

Mayan languages differ substantially in their consonant cluster phonotactics. Yucatecan and Western Mayan languages tend to allow clusters of no more than two consonants, as in Ch’ol kpech [k-pet͡ʃʰ] ‘my duck’ (Vázquez Álvarez 2011: 19, 46–47). Eastern Mayan languages are often more permissive, e.g. Sipakapense xtqsb’jaj [ʃtqsɓχaχ] ‘we are going to whack him/her/it’ (Barrett 1999: 32). Complex clusters in Eastern Mayan are frequently the result of prefixation and/or vowel syncope; as a consequence, word-final clusters are often simpler than initial or medial clusters, even in languages (like Sipakapense) that allow long strings of consonants (Barrett 1999: 23–33). The phonological syllabification of consonant clusters remains unclear for many Mayan languages (see Bennett 2016: §4). Sonority does not seem to influence consonant cluster types in Mayan, though certain clusters are avoided (e.g. adjacent identical consonants; García Matzar et al. 1999: 29 for Kaqchikel; Bennett 2016: §2.4.4, §4, generally). Root morphemes typically conform to a /CV(ː)C/ template, though more complex roots such as Kaqchikel k’u’x /kɁuɁʃ/ ‘heart’ are attested as early as Proto-Mayan (Kaufman 1976a, 2003). These root shape restrictions are statistical regularities rather than absolute requirements, and hold more strongly for some lexical classes (e.g. verbs) than for others (e.g. nouns). The /CV(ː)C/ root template may reflect independent syllable shape requirements, with the caveats that (i) some languages seem to allow syllables that are more complex than /CV(ː)C/ while still enforcing root shape requirements, and (ii) there are other phonotactic conditions in Mayan languages that hold directly over roots and that do not apply to syllables as such (e.g. consonant co-occurrence restrictions; Bennett 2016: §5).

28.3.5 Intonation

Many primary sources on Mayan languages describe intonation across different clause types, but there are no large-scale surveys of intonation in the family. Additionally, the relationship between morphosyntactic structure and higher prosodic domains has not been studied systematically for most Mayan languages. A few generalizations nonetheless emerge from the literature. In some Mayan languages, declarative sentences are often produced with final rising pitch (e.g. Berinstein 1991; Aissen 1992, 2017b; Palosaari 2011; Shklovsky 2011; and references therein), against the typological trend towards falling intonation in declaratives (e.g. Gussenhoven 2004: ch. 4). Nuclear stress tends to occur in phrase- or utterance-final position (e.g. K’iche’ and Q’eqchi’, Eastern Mayan: Berinstein 1991; Nielsen 2005; Henderson 2012; Baird 2014; Burdin et al. 2015; Wagner 2014; Ch’ol, Western Mayan: Warkentin and Brend 1974; Huasteco: Larsen and Pike 1949). Many Mayan languages have clitics or affixes whose form and/or appearance is conditioned by phrasal position (e.g. Skopeteas 2010; Aissen 2000, 2017b). In K’iche’, for instance, intransitive verbs are marked with the ‘status suffix’ /-ik/ when they occur at the end of an IP, but not in IP-medial position (Henderson 2012):

(9)
a. X-in-kos-ik.
   compl-b1sg-tire-ss
   ‘I am tired.’
b. X-in-kos r-umal nu-chaak.
   compl-b1sg-tire a3sg-cause a1sg-work
   ‘I am tired because of my work.’

These edge-marking morphemes can be a useful diagnostic for intonational domains in Mayan (e.g. Aissen 1992). Most research on the intonation of Mayan languages has dealt with the prosody of topic and focus constructions. Almost all Mayan languages have VS(O) or V(O)S as their basic word order (England 1991; Clemens and Coon 2018; Huasteco is an exception: Edmonson 1988: 565). Discourse topics may appear in a pre-verbal position (10c) (Aissen 1992, 1999, 2017a). Focused constituents may also be fronted, typically to a position between the verb and a preverbal topic, if present (10c). In situ focus is possible as well, sometimes with additional morphological marking or focus particles (10b) (see also Velleman 2014).

(10) Tsotsil (Aissen 1987, 1992, 2017a)
a. [Tseb San Antrex]F la te s-ta-ik un.
   girl San Andrés cl there a3-find-pl encl
   ‘It was a San Andrés girl that they found there.’
b. ja’ i-kuch yu’un i [soktometik]F
   foc compl-prevail by det Chiapanecos
   ‘It was the Chiapanecos that won.’
c. [A ti prove tseb-e]top [sovra]F ch’ak’bat.
   top det poor girl-encl leftovers was.given
   ‘It was leftovers that the poor girl was given.’

In some Mayan languages, pre-verbal topics are followed by a relatively strong prosodic boundary, indicated by phrase-final intonational contours, the possibility of pause, pitch reset, and phrase-final morphology (Aissen 1992; Avelino 2009; Can Pixabaj and England 2011; Bennett 2016; England and Baird 2017). Fronted foci are typically followed by a weaker boundary, and in some languages (e.g. Tz’utujil; Aissen 1992) even topics appear to be prosodically integrated with the rest of the clause (see also Curiel Ramírez del Prado 2007; Yasavul 2013; Burdin et al. 2015). In Yucatec, fronted foci do not appear to be prosodically marked, at least with respect to duration and pitch excursions (Kügler and Skopeteas 2006, 2007; Kügler et al. 2007; Gussenhoven and Teeuw 2008; Avelino 2009); in situ foci may be followed by pauses (Kügler and Skopeteas 2007). K’iche’ may also lack prosodic marking for focus (Yasavul 2013; Velleman 2014; Burdin et al. 2015); however, Baird (2014) found that duration, pitch range, and intonational timing were potential cues to focus in this language, particularly for in situ focus.

28.4 Toto-Zoquean

The Toto-Zoquean language family consists of two major branches, Totonacan and Mixe-Zoquean (Brown et al. 2011a). The Totonacan languages, comprising three Tepehua and approximately 16 Totonac varieties, are spoken in the states of Veracruz and Puebla, Mexico. The Mixe-Zoquean languages, comprising seven Mixe and five Zoque (also called Popoluca8) varieties, are spoken further south in the states of Oaxaca and Chiapas, Mexico (Wichmann 1995).

8 Not to be confused with Popoloca, which is Oto-Manguean.


28.4.1 Syllable structure, length, and phonation type

Most Toto-Zoquean languages permit up to two onset and two coda consonants, that is, (C)(C)V(V)(C)(C). Most languages also have a phonemic contrast in vowel length. In Ayutla Mixe, up to four coda consonants are possible, though more complex clusters are usually heteromorphemic, e.g. /t-ɁaˈnuɁkʂ-nɤ-t/, 3A-borrow-perf-pl.dep, [tɁaˈnuɁkʂn̥t] ‘they borrowed it’ (Romero-Méndez 2009: 79). Examples showing varying syllable types are given in Table 28.9, which also demonstrates the contrast between short and long vowels in Ayutla Mixe. The length contrast is orthogonal to voice quality on vowels (modal /V/, creaky /VɁ/, and breathy /Vh/). Though the maximal syllable structure is CCVːCC in Ayutla Mixe, complex codas are rare after long vowels in uninflected stems, and are often heteromorphemic or expone verbal inflection. Similar syllable structure constraints are found throughout the family, for example in Alotepec Mixe (Reyes Gómez 2009), Chuxnabán Mixe (Jany 2011), Tamazulápam Mixe (Santiago Martínez 2015), Sierra Popoluca (de Jong Boudreault 2009), Filomena Mata Totonac (McFarland 2009), Huehuetla Totonac (Kung 2007), Misantla Totonac (MacKay 1994, 1999), Zacatlán Totonac (Aschmann 1946), and Pisaflores Tepehua (MacKay and Treschel 2013). Phonation type is contrastive on vowels in most Toto-Zoquean languages. Modal vowels contrast with glottalized/creaky vowels, often transcribed as /VɁ/ when short and /VɁV/ when long. In certain varieties of Mixe (Alotepec, Ayutla, Chuxnabán, Totontepecano) (Suslak 2003; Reyes Gómez 2009; Romero-Méndez 2009; Jany 2011) and Sayula Popoluca (Clark 1959), breathy vowels also occur. In Chuxnabán Mixe, short glottalized vowels are realized with creaky phonation at the end of the vowel, while long glottalized vowels are ‘rearticulated’, realized with glottalization at the vowel midpoint (Jany 2011; Santos Martínez 2013).
Breathy vowels are realized with aspiration or breathiness near the end of the vowel nucleus, regardless of length. The same pattern of vowel-glottal phasing (cf. Silverman 1997b) is described impressionistically for Alotepec Mixe (Reyes Gómez 2009), Sierra Popoluca (de Jong Boudreault 2009), and Zacatlán Totonac (Aschmann 1946). In Metepec Mixe, rearticulated vowels contrast with long glottalized vowels, i.e. /VɁV/ vs. /VːɁ/ (Santos Martínez 2013). Huehuetla Totonac (Kung 2007) and Pisaflores Tepehua (MacKay and Treschel 2013) have glottalized consonants but no glottalized vowels. In both languages, bilabial and alveolar stops are realized as implosives in word-initial position, whereas more posterior stops/affricates are realized as ejectives. Vowel length is contrastive in many Toto-Zoquean languages and may interact with phonation type. In Ayutla Mixe (see above) and in Totontepecano Mixe (Suslak 2003), both glottalized and breathy vowels contrast for length. However, in Alotepec Mixe, length is non-contrastive in breathy vowels (Reyes Gómez 2009). A three-way contrast in vowel length has been described for Coatlán Mixe, e.g. /poʃ/ ‘guava’, /poːʃ/ ‘spider’, and /poːːʃ/ ‘a knot’ (Hoogshagen 1959). Subsequent work on the closely related Guichicovi Mixe variety showed that this three-way contrast was not phonemic, but partially conditioned by a previously undescribed contrast in consonant length (lenis vs. fortis consonants). In a phonetic study of Guichicovi Mixe, Bickford (1985) found that short and long vowels shorten before fortis consonants, e.g. /kappɨk/ [kăpːɨk] ‘carry it (imp)’, but lengthen before lenis consonants, e.g. /kapɨk/ [kaːpɨk] ‘no (quot)’. An alternation between vowel and consonant length is phonologized in Alotepec Mixe, where ‘weak’ consonants surface after long vowels (/Vː, VɁV/) and not before short vowels (Reyes Gómez 2009). Phonetically, short vowels in Ayutla Mixe are more centralized than long vowels (Romero-Méndez 2009), and impressionistic work on Zacatlán Totonac and Tlachichilco Tepehua suggests a similar pattern (Aschmann 1946; Watters 1980). However, little instrumental work has been done to date on these vowel length contrasts and associated consonant mutations.

Table 28.9 Syllable structure in Ayutla Mixe
Rime     /V/                    /VɁ/                     /Vh/
CVC      hut ‘hole’             puɁt͡s ‘short’           pʌhk ‘bone’
CVːC     huːt ‘take it out!’    puɁut͡s ‘rotten’         nʌːhʂ ‘ground’
CVCC     tʌt͡sk ‘ear’            jhɤɁkʂ ‘it gets hot’     kʌhpʂ ‘speak!’
CVːCC    waːn=s ‘few=1S’        jhɤɁɤkʂ ‘it got hot’     kʌːhpʂ ‘he spoke’
(Data from Romero-Méndez 2009)

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

Map 12.1 Number of contrastive tone heights (legend: 2, 3, 4, 5)
Map 12.2 Types of tonal contours (legend: Complex R&F, R only, F only, None)
Map 12.3 Types of downstepped tones (legend: ↓L, M; ↓M, ↓H; ↓L, ↓H; ↓L; ↓M; ↓H; None)
Key to Maps 12.1–12.3: A=Atlantic, B=Benue-Congo, C=Chadic, D=Idjoid, E=Edoid, F=Kordofanian, G=Gur, H=Cushitic, I=Igboid, J=Kainji, K=Kru, M=Mande, N=Nilotic, O=Omotic, S=Central Sudanic, T=Bantoid, V=Volta-Congo, W=Kwa, X=Nubian, Y=Khoe-Kwadi, Z=Atlantic-Congo

Map 1.1 Areal groupings of languages explored in Part IV (regions by chapter: 12 Sub-Saharan Africa; 13 North Africa and the Middle East; 14 South West and Central Asia; 15 Central and Eastern Europe; 16 Southern Europe; 17 Iberia; 18 Northwestern Europe; 19 English; 20 The North Atlantic and the Arctic; 21 The Indian Subcontinent; 22 China and Siberia; 23 Mainland Southeast Asia; 24 Asian Pacific Rim; 25 Austronesia; 26 Australia and New Guinea; 27 North America; 28 Mesoamerica; 29 South America)
Key to Map 1.1: Afroasiatic: Chapter 13 (for Chadic and Cushitic, see also Chapter 12); Maltese: Chapter 16. Austroasiatic: Chapter 23. Austronesian: Chapter 25 (for Chamic and peninsular Malay, see also Chapter 23). Basque: Chapter 17. Caucasian: Chapter 14. Dravidian: Chapter 21. Eskimo-Aleut: Chapter 20 (for Central Alaskan Yupik and West Greenlandic, see also Chapter 27). Finno-Ugric: Chapter 15. Indo-European: Baltic: Chapter 15; Celtic: Chapter 20; Germanic: Mainland Germanic: Chapter 18; English: Chapter 19; Icelandic and Faroese: Chapter 20; Hellenic: Chapter 16; Indo-Aryan: Chapter 21; Persian: Chapter 14; Romance: Catalan, Spanish, and Portuguese: Chapter 17; French and Italian: Chapter 16; Romanian: Chapter 15; Slavic: Chapter 15. Hmong-Mien: Chapter 23. Japanese: Chapter 24. Khoi-San: Chapter 12. Korean: Chapter 24. Kra-Dai: Chapter 23. Mayan: Chapter 28. Mongolic: Chapter 14. Niger-Congo: Chapter 12. Nilo-Saharan: Chapter 13 (for Kunuma and Ronga, see also Chapter 12). Oto-Manguean: Chapter 28. Pama-Nyungan: Chapter 26. Sino-Tibetan: Tibeto-Burman: Chapter 23; Sinitic: Chapter 21. Toto-Zoquean: Chapter 28. Turkic: Chapter 14. Yeniseian: Chapter 22. Yupik-Aleut: Chapter 20. Numerous families and isolates: §14.5, chapters 26 and 29.

28.4.2 Stress and intonation

Four types of primary stress systems are observed in Toto-Zoquean languages, differing slightly from those observed in Mayan languages (§28.3.1): quantity-sensitive stress, morphologically conditioned stress, fixed stress, and lexical stress. Primary and secondary stress are observed in most languages, and evidence of tertiary stress in Sierra Popoluca is discussed in de Jong Boudreault (2009). Primary stress usually surfaces at the right edge of the morphological word, but the conditions on its assignment vary. The most common stress pattern in Toto-Zoquean is primary stress on the final syllable if it is heavy and otherwise on the penult, as in Sierra Popoluca (de Jong Boudreault 2009), Misantla Totonac (MacKay 1999), Pisaflores Tepehua (MacKay and Treschel 2013), Huehuetla Totonac (Kung 2007), and Texistepec Popoluca (Wichmann 1994). The phonological criteria for classifying syllables as light or heavy vary by language. In Pisaflores Tepehua, syllables with long vowels and/or sonorant codas are heavy, but syllables with obstruent codas are light (MacKay and Treschel 2013). In Huehuetla Totonac, only syllables with codas are heavy; open syllables are light (Kung 2007). A unique pattern is found in Misantla Totonac, where syllables with a coronal obstruent coda are light, but syllables with any other coda or with a long vowel are heavy (MacKay 1999) (Table 28.10). Table 28.10 also illustrates weight-sensitive secondary stress in Misantla Totonac. Primary stress is assigned at the right edge, but secondary stress surfaces on all preceding heavy syllables in the word, a pattern also observed in Pisaflores Tepehua (MacKay and Treschel 2013). Secondary stress occurs on every other syllable preceding the primary (rightmost) stressed syllable in both Texistepec Popoluca (Wichmann 1994) and Huehuetla Totonac (Kung 2007).

Table 28.10 Segment-based quantity-sensitive stress in Misantla Totonac nouns (MacKay 1999)
Penultimate stress
  /min-kiɬ-nḭ/       [ˌmiŋˈkiɬnḭ]       ‘your mouth’
  /paːɬka̰/           [ˈpaːɬka̰]          ‘comal’
  /mukskut/          [ˈmukskut]         ‘fire’
  /min-siksi/        [ˌmiˈsiksi]        ‘your bile’
  /kḭspa̰/            [ˈkḭspa̰]           ‘corn kernel’
  /maː-kit͡sis/       [maːˈkit͡sis]       ‘five’
Ultimate stress
  /min-paː-luː/      [ˌmimˌpaːˈluː]     ‘your intestines’
  /ɬukuk/            [ɬuˈkuk]           ‘pierced’
  /min-laː-qa-pḭn/   [ˌmiˌlaːqaˈpḭn]    ‘your ribbons’
  /sapa̰p/            [saˈpa̰p]           ‘warm’

Primary stress is morphologically driven in many Toto-Zoquean languages. Table 28.10 reflects the stress pattern found on nouns in Misantla Totonac, but verbs have fixed final stress (i.e. no weight-sensitivity). Despite otherwise having right-edge primary stress, ideophonic words in Huehuetla and Filomena Mata Totonac have initial stress (Kung 2007; McFarland 2009). Moreover, morpheme-specific exceptions to these stress patterns occur throughout the family (Romero-Méndez 2009). In some languages, the domain of primary stress assignment is the nominal or verbal root rather than the morphological word, for example Ayutla and Tamazulápam Mixe (Romero-Méndez 2009; Santiago Martínez 2015). Lexical stress occurs in Filomena Mata Totonac, though almost 85% of the lexicon displays morphologically conditioned stress (McFarland 2009: 51) (Table 28.11). In such cases stress is not quantity-sensitive: final light syllables may receive stress when they follow heavy penults, and light penults or antepenults may receive stress when the final syllable is heavy.

Table 28.11 Lexical stress in Filomena Mata Totonac (McFarland 2009)
Antepenultimate
  ˈskawawɁḁ      ‘dry tortilla’
Penultimate
  ˈʃtiːlan       ‘chicken’
  ˈsasan         ‘skunk’
  piˈtʃawaɁḁ     ‘eagle’
Ultimate
  naˈku          ‘heart’
  tʃaaˈli        ‘tomorrow’
  ɬtoˈxox        ‘backpack’

Fixed stress is rare within Toto-Zoquean languages. Primary stress is fixed on the penultimate syllable in Chimalapa Zoque (Johnson 2000), Chapultenango Zoque (Herrera Zendejas 1993), and Chiapas Zoque (Faarlund 2012), but on the initial syllable in Alotepec Mixe (Reyes Gómez 2009).

There are only a few impressionistic descriptions of the intonational patterns of Toto-Zoquean languages. For Tlachichilco Tepehua, Watters (1980) describes statement intonation as consisting of a downglide from the stressed syllable if stress is utterance-final, but a high pitch and subsequent fall if the stressed syllable is not final.
Question intonation is described as having a high pitch on the pre-tonic syllable and a low target pitch on a final stressed syllable. In Zacatlán Totonac, statements are described as involving an utterance-final fall, but content questions consist of a final rise (Aschmann 1946). Apart from the patterns mentioned here, there are a large number of segmental processes that are sensitive to prosodic domains and stress in Toto-Zoquean languages, such as consonant weakening, glottalization, and the domain of palatalization rules. Readers are referred to the descriptions of individual languages mentioned here for more information on these patterns.
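The Misantla Totonac noun pattern in Table 28.10 (primary stress on a heavy final syllable, otherwise on the penult, with coronal-obstruent codas counting as light) can be sketched in Python. This is a minimal illustration under stated assumptions, not MacKay's (1999) analysis: syllabification is supplied by hand, only primary stress is assigned, and the inventory in `CORONAL_OBSTRUENTS` is our assumption, since the chapter does not list it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the text only says "coronal obstruent"; this inventory is illustrative.
CORONAL_OBSTRUENTS = {"t", "s", "ʃ", "ɬ", "t͡s", "t͡ʃ"}


@dataclass(frozen=True)
class Syllable:
    text: str           # segmental content, e.g. "kuk"
    coda: str = ""      # final consonant; "" for an open syllable
    long: bool = False  # True if the nucleus is a long vowel


def is_heavy(syl: Syllable) -> bool:
    """Misantla weight: long vowels and codas other than coronal obstruents are heavy."""
    if syl.long:
        return True
    return syl.coda != "" and syl.coda not in CORONAL_OBSTRUENTS


def primary_stress(word: list[Syllable]) -> int:
    """0-based index of primary stress in a noun: a heavy final syllable, else the penult."""
    if not word:
        raise ValueError("a word needs at least one syllable")
    last = len(word) - 1
    if last == 0 or is_heavy(word[last]):
        return last
    return last - 1


def transcribe(word: list[Syllable]) -> str:
    """Join the syllables, marking primary stress with ˈ (secondary stress is ignored)."""
    stressed = primary_stress(word)
    return "".join(("ˈ" if i == stressed else "") + syl.text for i, syl in enumerate(word))


if __name__ == "__main__":
    # ɬukuk 'pierced': the final coda /k/ is heavy, so stress is final.
    print(transcribe([Syllable("ɬu"), Syllable("kuk", coda="k")]))  # ɬuˈkuk
    # maː-kit͡sis 'five': the final coda /s/ is a light coronal obstruent, so the penult wins.
    print(transcribe([Syllable("maː", long=True), Syllable("ki"), Syllable("t͡sis", coda="s")]))
```

Only `is_heavy` is language-specific; replacing it with the Pisaflores Tepehua criterion (long vowel or sonorant coda) or the Huehuetla Totonac one (any coda) would model the other right-edge systems described above with the same `primary_stress` logic.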

28.5 Conclusion

The three major language families of Mesoamerica (Oto-Manguean, Mayan, and Toto-Zoquean) display an extreme diversity of word-prosodic patterns, including complex lexical tone systems, distinct stress alignment patterns, simple and complex syllable structure, and myriad phonation contrasts that interact with other prosodic phenomena. Generally speaking, there is a paucity of linguistic research on higher-level prosodic structure in Mesoamerican languages. Moreover, despite the observed complexity, a large number of languages remain minimally described; the descriptive work consists of either older unpublished sources or brief statements found within more general grammatical descriptions. The summary of patterns provided here serves both as a brief overview of the typological complexity within this linguistic area and as motivation for future fieldwork and research.

Acknowledgements

This work was supported by NSF Grant #1603323 (DiCanio, PI) at the University at Buffalo.


Chapter 29

South America

Thiago Costa Chacon and Fernando O. de Carvalho

29.1 Introduction

In this overview of prosodic systems in South American (SA) indigenous languages, we highlight their most general prosodic features as well as many of their unique ones. There are about 420 languages still spoken in SA, belonging to 53 linguistic families with two or more languages and 55 linguistic isolates (Campbell 2012). Information on prosodic systems is still widely dispersed and uneven.1 Table 29.1 summarizes the distribution of SA languages with stress and/or tone systems.2 No language has been reported to lack both tone and stress. One of the challenges of SA is the analysis of languages with mixed or less prototypical stress and tone systems. Stress and tone systems in SA are discussed in §29.2 and §29.3, respectively. Some tonal phenomena are also discussed in §29.2, since they relate to stress in crucial ways. In §29.4, we discuss miscellaneous issues, such as sonority hierarchies, laryngeals, and nasality, which interact with prosody in complex ways. In §29.5, we discuss issues related to word prosody and morphology. In §29.6, we discuss some historical and comparative topics. In §29.7, we present our conclusions.

Table 29.1 Summary of languages with stress and/or tone systems
Stress       Tone and stress    Only tone
93% (189)    25% (50)           7% (14)

1 For complementary discussion we refer to Hyman (2016c) and to Wetzels and Meira (2010), as well as to more general works, such as Dixon and Aikhenvald (1999), Adelaar (2004), and Storto and Demolin (2012), and phonological databases, such as Saphon (Michael et al. 2015), Lapsyd (Maddieson et al. 2014), and several chapters of the World Atlas of Language Structures (Goedemans and van der Hulst 2013a, 2013b, 2013c; Maddieson 2013a, 2013b).

2 The database for the current survey has 203 languages distributed in 32 linguistic families with two or more related languages plus 21 linguistic isolates. It is a work in progress and is not publicly available.
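The percentages in Table 29.1 add up to more than 100%. A short Python check shows why, assuming (our inference from the numbers, not a statement in the text) that the ‘Stress’ column includes the 50 languages with both tone and stress: the counts then partition the 203-language database of note 2 exactly.

```python
# Counts from Table 29.1 and note 2. Reading "Stress" as including the
# "Tone and stress" languages is an inference from the numbers, not a
# statement in the source.
TOTAL = 203
STRESS = 189           # all languages with stress (assumed to include the 50 below)
TONE_AND_STRESS = 50
ONLY_TONE = 14


def percent(n: int, total: int = TOTAL) -> int:
    """Round a count to a whole-number percentage of the database."""
    return round(100 * n / total)


# Mutually exclusive groups, which should sum to the whole database.
partition = {
    "stress only": STRESS - TONE_AND_STRESS,
    "tone and stress": TONE_AND_STRESS,
    "only tone": ONLY_TONE,
}

if __name__ == "__main__":
    for label, n in partition.items():
        print(f"{label}: {n} ({percent(n)}%)")
    print("sum of partition:", sum(partition.values()))
```

Under this reading, stress is present in 189 of 203 languages and tone in 64 (50 + 14), which is consistent with the statement that no language lacks both.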


SOUTH AMERICA 429

29.2 Stress and metrical structure

29.2.1 Manifestation of prominence

Word-level prominence is obligatory and culminative in most SA languages (cf. Hyman 2006). Some languages, however, give more weight to prominence at the foot level than at the word level. On the basis of an impressionistic study of Tiriyó [Kariban] stress, Meira (1998) claims that there is no audible difference in prominence between the stressed syllables of multiple iambic feet in a word, as in [ˌkaː.pu.ˌɾuːtu] ‘cloud’, and that monosyllabic words lack stress (see also Jarawara [Arawan] in Dixon 2004: 26; for Nukini and other Panoan languages see González 2016: 140). In other systems, prominence is a demarcative property of words or of the root, with little relevance to metrical feet. Shiwilu [Cahuapanan] exemplifies the former (see §29.5.1 for the latter): monosyllables and disyllables are stressed on the first syllable, and in longer words stress regularly falls on the second syllable. There is no evidence for secondary stress, and there are no segmental effects on stress (Valenzuela and Gussenhoven 2013). Pitch is the most frequent acoustic correlate of stress on the continent, and several languages use pitch alone to signal prominence.3 An interesting case that may challenge our notion of prominence is Cavineña [Tacanan], for which Guillaume (2008: 44–45) reports two non-contrastive pitch levels, H and M. M goes to the last two syllables, while H goes to the first syllable as well as to any remaining syllables in words longer than three syllables, as in [kímìshà] ‘three’, [mátùjà] ‘cayman’, [íwárákwàrè] ‘called’. Almost all of the languages that signal prominence by pitch alone have a vowel length contrast (cf. Wetzels and Meira 2010). In other languages with long vowels, especially in the Arawakan family, intensity can be used in addition to pitch to highlight stress.
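The Shiwilu and Cavineña patterns just described are simple positional rules, and a minimal Python sketch makes their edge orientation explicit. This is an illustration under assumptions, not the cited authors' formalization: words are given as pre-syllabified lists (or syllable counts), and the Cavineña function only covers words of three or more syllables, since those are the cases described.

```python
from __future__ import annotations


def cavinena_pitch(syllables: list[str]) -> list[str]:
    """Non-contrastive pitch levels in Cavineña, following the description of Guillaume (2008).

    M goes to the last two syllables and H to every other syllable.
    """
    n = len(syllables)
    if n < 3:
        raise ValueError("this sketch only covers words of three or more syllables")
    return ["M" if i >= n - 2 else "H" for i in range(n)]


def shiwilu_stress(n_syllables: int) -> int:
    """0-based index of the stressed syllable in Shiwilu (Valenzuela and Gussenhoven 2013)."""
    if n_syllables < 1:
        raise ValueError("a word needs at least one syllable")
    # First syllable in monosyllables and disyllables, second syllable in longer words.
    return 0 if n_syllables <= 2 else 1


if __name__ == "__main__":
    print(cavinena_pitch(["ki", "mi", "sha"]))             # [kímìshà] 'three'
    print(cavinena_pitch(["i", "wa", "ra", "kwa", "re"]))  # [íwárákwàrè] 'called'
    print([shiwilu_stress(n) for n in range(1, 6)])
```

The contrast is in the anchoring edge: the Cavineña M span is counted from the right edge of the word, while Shiwilu stress is counted from the left.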
Languages that make use of all acoustic correlates of stress, or exclusively of duration or intensity, are rarer.4 In some languages, duration is a distinctive property of unstressed syllables. In Iquito [Zaparoan], post-tonic syllables have longer duration (Michael 2011). In some Panoan languages, pre-tonic syllables have longer duration (González 2016). In Sekoya [Tukanoan], the pre-tonic vowel can be lengthened and devoiced before a voiceless stop, as in [ʔòo̥khó] ‘water’ (Johnson and Levinson 1990). Remarkably, in Pirahã, unstressed vowels have longer duration than stressed vowels (Everett and Everett 1984b). In languages with tones and obligatory stress, prominence may be manifested by pitch, greater duration, and/or intensity (Wetzels and Meira 2010: 316–317). Some systems show an interlocking pattern, where the location of primary stress and the association of an underlying tone converge in the same syllable; this may lead to an increase in pitch levels in order to enhance the contrast between the same underlying tones in stressed and unstressed positions (see §29.3).

3 For instance, ‘Noble’ Kadiweu [Guaicuruan] (Sândalo 1995), Achagua [Arawakan] (Meléndez 1998), Sabanê [Nambikwaran] (Antunes 2004), Tsafiki [Barbacoan] (Dickinson 2002), Aguaruna [Chicham] (Overall 2007), and Mapudungun [isolate] (Molineaux 2016a, 2016b).

4 Languages that only use intensity are Jarawara [Arawan] (Dixon 2004), Piaroa [Sáliban] (Mosonyi 2000), Awa Pit [Barbacoan] (Dueñas 2000), and Páez [isolate] (Rojas Curieux 1998). Languages using only duration are Paresí (Brandão 2014), the ‘non-Noble’ sociolect of Kadiweu (Sândalo 1995), and Wapixana (dos Santos 2006).


430 THIAGO COSTA CHACON AND FERNANDO O. DE CARVALHO

Strikingly, Terena [Arawakan] is reported to have two ‘stress phonemes’: ‘acute stress’, as in [káʃe] ‘sun’, which is realized by increased intensity in the stressed vowel and increased duration of the following consonant, and ‘circumflex stress’, as in [apâka] ‘liver’, realized by a lengthening of the stressed vowel, greater intensity, and a characteristic falling pitch contour (Ekdahl and Grimes 1964: 266; Ekdahl and Butler 1979: 16–17). The most parsimonious analysis in our view is that stress is underlyingly associated with one of the first three syllables of a word, and that some lexemes are associated with an HL tonal melody, which causes lengthening of the vowel (see also Aikhenvald 1999: 79).

29.2.2 Metrical feet and edges

Most SA stress systems are bound to the right edge of words or root morphemes, as shown in Table 29.2. These figures include only languages with bound stress systems, whether fixed or free. Stress locations refer to the pattern that is most common or regular according to the literature surveyed. Languages with fixed antepenultimate stress are rare (e.g. Cayuvava, Hayes 1995; ‘Nonnoble’ Kadiweu, Sândalo 1995). There are no systems with fixed stress on the third syllable, but a few systems allow a three-syllable window at the left edge, as in Tunebo [Chibchan] (Headland and Headland 1976; Headland 1994), Tuyuka [Tukanoan] (Barnes and Malone 2000), Ese’eja [Takanan] (Rolle and Vuillermet 2016), and Terena (see §29.2.1). About 45% of SA languages have iambic feet and 55% have trochees. Of the iambic languages, 70% assign feet from the left, whereas in the trochaic languages the edge parameter is evenly divided between left and right.5 Iambic feet in SA languages do not always show the typical light-heavy structure (Hayes 1995), many systems having weight-insensitive iambs (e.g. Piaroa, Mosonyi 2000; Matis, González 2016; ‘Noble’ Kadiweu, Sândalo 1995; Kubeo, Chacon 2012). Some languages have a dual stress system. In South Conchucos Quechua, stress may be penultimate or initial depending on whether the word is pronounced in isolation or in connected speech (Hintz 2006); the same holds for tetrasyllabic words in Kakataibo [Panoan] (Zariquiey 2018). In Muniche [Isolate], nouns and verbs differ in their metrical parse of syllabic trochees, which are built from the right in nouns and from the left in verbs (Michael et al. 2013: 327).

Table 29.2 Position of primary stress relative to word edges

Location of primary stress | Percentage of languages
Ultimate | 25%
Penultimate | 30%
Antepenultimate or three-syllable window at the right edge | 14%
First | 12.3%
Second | 6.3%
Three-syllable window at the left edge | 12.3%

5 These figures refer only to 93 languages (45%) of our entire corpus, since most available sources neither state foot patterns explicitly nor allow a clear inference about them. These figures also disregard known cases of languages with non-cohesive foot types.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

SOUTH AMERICA 431

Other languages allow more than one foot type, such as the 11 Panoan languages with ‘non-cohesive’ foot types (González 2016).6 For instance, in Huariapano, secondary stress is assigned iteratively from left to right by syllabic trochees, and primary stress is determined by a moraic trochee at the right word edge, as in [ʂo.ˈmoʂ] ‘needle’, [ˌtʃu.kaj.βah.ˈkaŋki] ‘they washed’. In addition, while trochees are the default, some roots have iambic feet for the assignment of secondary stress, as in [βis.ˌma.noh.ˌko.no.ˈʂi.ki] ‘I forgot’ (Parker 1992). Unbound stress systems also occur, such as the ‘default to opposite side’ system of Sikuani [Guahiboan] (Kondo 1985; see also §29.5.1), where stress falls on the rightmost heavy syllable of the root and otherwise, unpredictably, on the initial or second syllable, or the systems of Latundê and Lakondê [Nambikuaran], where stress falls predictably on the final syllable of words with only light syllables (Telles 2002). The related language Mamaindê has a ‘default to the same side’ system, stressing the rightmost heavy syllable or else the final light syllable (Eberhard 2009). Tonal systems may show notable correlations between tones, foot structure, and edges. In Tanimuka [Tukanoan] (Eraso 2015), a H tone must occur within a two-syllable window from the left and right edges of words. In related Kubeo, H spreads rightwards from the syllable with primary stress over the syllables of one following foot, which may be monosyllabic or disyllabic, as in [maˈkáró] ‘jungle’, [maˈkáróré] ‘to/at the jungle’, [maˈkárórékà] ‘the very jungle’. In Iñapari [Arawakan], as in Kubeo, H tones are obligatory in every word; a H can be associated with at most four morae (HHHH), which points to the colon as a constraint on tone assimilation.
In addition, the final two morae of a word always bear a HL melody, the final mora being always L; morae without underlying tone associations are L by default, e.g. /aparépaanari/ [àpàrépáánárì] ‘similar, alike’. Secondary stress and the optional syncope of unstressed high vowels provide additional evidence for foot structure in the language (Parker 1999). In Bora [Boran], a right-aligned trochee seems to constrain parts of the tonal phonology. First, as in Iñapari, most words must end in a L tone. In addition, a constraint against sequences of three L tones (*LLL) conspires to create a binary rhythmic alternation, e.g. /LLLLL/ > LHLHL (Thiesen and Weber 2012; see also the discussion of the ‘vowel split rule’ in §29.2.3). Finally, some tonal systems are bound to the right edge of morphemes or words, and tonal assimilation proceeds from right to left, as in Kakataibo (Zariquiey 2018), Bora (Thiesen and Weber 2012), Miraña [Boran] (Seifart 2005), and Suruí [Tupian] (van der Meer 1982) (for details see §29.3).
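As a rough illustration of the Iñapari pattern just described, the Python sketch below links an underlying H to a given mora, spreads it rightwards to at most four morae in total, keeps the final mora L, and assigns default L to every other mora. The function name, the zero-based mora indexing, and the segmentation of /aparépaanari/ into a-pa-re-pa-a-na-ri are our own assumptions. The sketch reproduces only the cited example; it does not attempt Parker's (1999) full analysis (for instance, the default penultimate H of toneless words is not modelled).

```python
def inapari_tones(n_morae, h_index, max_links=4):
    """Hypothetical sketch: surface tones for a word with one underlying H.

    The H is linked at mora `h_index` (0-based) and spreads rightwards to at
    most `max_links` morae in total; it never reaches the final mora, which
    is always L. All other morae receive default L.
    """
    if not 0 <= h_index < n_morae - 1:
        raise ValueError("the underlying H must precede the final mora")
    tones = ["L"] * n_morae
    stop = min(h_index + max_links, n_morae - 1)  # exclusive bound; final mora stays L
    for i in range(h_index, stop):
        tones[i] = "H"
    return tones


# /aparépaanari/ 'similar, alike', segmented into morae (our assumption)
morae = ["a", "pa", "re", "pa", "a", "na", "ri"]
tones = inapari_tones(len(morae), h_index=2)
print(" ".join(f"{m}:{t}" for m, t in zip(morae, tones)))
# a:L pa:L re:H pa:H a:H na:H ri:L, i.e. [àpàrépáánárì]
```

The cap of four links is written as a parameter so that the colon-sized limit is explicit rather than buried in the loop.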

29.2.3 Syllable, mora, and quantity

About one third of SA languages have a simple CV syllable structure, a much higher proportion than the 12% of CV languages reported for the world as a whole (Maddieson 2013b).

6 In a typology of 25 languages, González (2016) reports seven languages analysed as ‘non-cohesive’ and four as ‘lexical stress’. We treat them as the same here, since their effects on foot structure at a given word edge are the same.


Table 29.3 Types and proportion of quantity-sensitive systems

Type | Proportion
VV and VC | 60%
VV only | 27%
VC only | 13%

Complex onsets have a limited distribution in lowland SA, occurring most notably in some Kariban languages, Yanomami, and Macro-Jê (Storto and Demolin 2012). In the Andean highlands (including the Southern Cone), complex onsets and codas are more common (Adelaar 2004). One third of SA languages have quantity-sensitive stress or tone patterns; the types of quantity sensitivity and their proportions are given in Table 29.3. As expected (cf. Hayes 1995: 120), VV (phonemic long vowels), which occur in a third of SA languages, are more relevant to quantity than VC (syllables closed by a consonant). Also, most languages with VV have a quantity-sensitive system, with a few exceptions, such as ‘Noble’ Kadiweu (Sândalo 1995) and Warekena [Arawakan] (Aikhenvald 1998). Panoan languages show several notable quantity-related patterns. Capanahua has weight-sensitive primary trochaic stress and weight-insensitive secondary stress (Elías-Ulloa 2004). Peruvian Matsés has weight-insensitive iambs assigned from left to right; heavy syllables attract neither primary nor secondary stress, yet they may sound more prominent than light unstressed syllables (González 2016). In Saynáwa, final syllables always bear primary stress and must therefore be heavy: a light final syllable is lengthened (Couto 2016). Tone, quantity, and length also frequently co-occur. A common pattern is lengthening caused by an underlying tone. In Chimila, H tones prefer heavy syllables and may create weight by geminating the following onset consonant (Malone 2006: 36), similar to Terena’s ‘acute stress’ (§29.2.1). In Muinane, vowels bearing the rightmost H tone in a word are longer (Walton et al. 2000; see also §29.3 and §29.6.2 for similar facts in Nadahup languages). In Kakataibo, a language with tones and independent stress, (C)VC syllables attract a H tone, whereas stress is quantity-insensitive; conversely, stress lengthens a vowel but a H tone does not, as in [ˈβáːka] ‘river’ vs. [ˈmaːʂáʂ] ‘stone’ (Zariquiey 2018; see also §29.3.3).
In some languages, tonal contrasts are restricted to long vowels and diphthongs, as in Terena (§29.2.1) and related Baniwa, where LH, found only on long vowels, contrasts with H (found on short vowels and, in free variation with HL, on long vowels), as in [pìíttì] ‘ant’ vs. [pííttì] / [píìttì] ‘your fat’ (Ramirez 2001). Bora does not contrast tones on long vowels but bans sequences of three morae surfacing with low tones (*LLL); the ban causes a long vowel to split into two syllables and the L tone to dissimilate, creating a L.H.L melody: /kɯ̀ː.mɯ̀/ > [kɯ̀.ɯ́.mɯ̀] ‘large signal drum’. By a different rule, short vowels with L tone in phrase-final position may surface as two heterosyllabic vowels bearing a L.H melody; compare phrase-medial [íhjɯ̀] ‘this trail’ with phrase-final [íhjɯ̀.ɯ́] (for details see Thiesen and Weber 2012: 70–75). The two rules suggest that surface tones constrain syllabification patterns in Bora.
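The Bora ban on three successive L tones can be pictured as a repair that raises the middle L of any LLL sequence. In the minimal Python sketch below, the left-to-right direction of application and the function name are our assumptions, not part of Thiesen and Weber's (2012) description. It reproduces /LLLLL/ > LHLHL and, if the long vowel of /kɯ̀ː.mɯ̀/ is treated as two L morae, the L.H.L melody of [kɯ̀.ɯ́.mɯ̀]; it says nothing about the resyllabification itself.

```python
def repair_lll(tones):
    """Hypothetical *LLL repair: scanning left to right, raise the middle
    member of any sequence of three L tones to H."""
    out = list(tones)
    for i in range(1, len(out) - 1):
        if out[i - 1] == out[i] == out[i + 1] == "L":
            out[i] = "H"
    return out


print("".join(repair_lll("LLLLL")))           # LHLHL
print(".".join(repair_lll(["L", "L", "L"])))  # L.H.L, cf. [kɯ̀.ɯ́.mɯ̀]
```

Because each repair creates an H, a left-to-right scan yields the binary alternation directly; a right-to-left scan would give the same result for odd-length runs but not in general.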


29.3 Tones

Based on the number of underlying pitch levels and constraints on surface patterns, tonal systems in SA can be divided into five types:

• languages with three underlying tones (H, L, and M), reportedly found only in Tikuna;
• languages with only one underlying tone, either
  ◦ underlying L (and default H), in Miraña and perhaps also in Kashinawa [Panoan], or
  ◦ underlying H (and default L), in 35% of all tonal languages (21 languages in total);
• languages with underlying H and L tones, in which H is either
  ◦ obligatory, in 20% of all tonal languages (12 languages in total), or
  ◦ non-obligatory, in 40% of all tonal languages (24 languages in total).

Across all five types, H is obligatory in 36 tone languages (55%). The following sections discuss each type in turn.

29.3.1 H, M, L

Tikuna has by far the most complex tone system in SA, further complicated by considerable dialectal variation. For the Amacucayu variety [Colombia], Montes Rodríguez (2004) describes a system with three underlying tones (L, M, and H), which contrast in words such as /témá/ ‘Mauritia flexuosa’, /dénè/ ‘cane’, /pōí/ ‘banana’, /būrē/ ‘green ink’, /dèá/ ‘water’, /kàtà/ ‘beam’. The association of tones with tone-bearing units (TBUs) seems to be largely unconstrained, but the surface combinations HM, ML, and LM have not been found. From the three underlying tones, six level and two contour tones arise on the surface. A super-high tone may appear as an allophone of a H tone when it is adjacent to an independent H tone (V́-V́ > V̋-V̋); likewise, sequences of L tones may give rise to super-low tones (Montes Rodríguez 2004: 29–34). For other analyses of Tikuna see Anderson (1962) and Soares (2000).

29.3.2 Underlying L and default H

In Miraña, lexical and grammatical morphemes may be marked by at most one underlying L tone (Seifart 2005), which may or may not be floating. A L tone is associated with one TBU only. TBUs that are not associated with a L tone surface as H, except the last TBU in the word or ‘tonal phrase’, which is L by default, as in /àmana/ [àmánà] ‘dolphin’.7 An intriguing case is Kashinawa [Panoan] (Kensinger 1963), where H contrasts with L only in the penultimate syllable, H being the default tone elsewhere. Primary stress is stem-initial and secondary stress shows a trochaic pattern.

7 In the closely related languages Bora (Thiesen and Weber 2012) and Muinane (Vengoechea 2012), there is evidence for underlying H tones in roots and affixes.
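The Miraña default-tone pattern described above can be sketched in a few lines of Python. The sketch assumes that each underlying L has already docked on its TBU (floating L tones and the delimitation of ‘tonal phrases’ are not modelled), and the function name is hypothetical.

```python
from typing import List, Optional


def mirana_surface(underlying: List[Optional[str]]) -> List[str]:
    """Hypothetical sketch of Miraña default tone assignment in one word or
    tonal phrase.

    Each TBU is either "L" (linked to an underlying L) or None (toneless).
    Toneless TBUs surface as H, except the final TBU, which defaults to L.
    """
    last = len(underlying) - 1
    surface = []
    for i, tone in enumerate(underlying):
        if tone == "L" or i == last:
            surface.append("L")  # underlying L, or the final-TBU default
        else:
            surface.append("H")  # default H for other toneless TBUs
    return surface


# /àmana/ 'dolphin': L on the first TBU, the other two toneless
print(mirana_surface(["L", None, None]))  # ['L', 'H', 'L'], i.e. [àmánà]
```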


29.3.3 Languages with underlying H and default L

Languages with only H as an underlying tone show strong co-occurrences of tone, stress, and foot structure. In some languages these co-occurrences are absolute; in others there is evidence that tone and stress are independently contrastive. In Tuyuka [Tukanoan], H tone is obligatory and culminative in the word and is accompanied by ‘maximum intensity’ on the associated vowel (Barnes and Malone 2000). A default high pitch is assigned to the second vowel of the word if neither the root nor the suffixes carry an underlying tone. Otherwise, an underlying H docks on the first mora of the root or on a suffix corresponding to the third mora of the word (J. Barnes 1996). Aguaruna [Chicham] (Overall 2007) has a very similar system: in verbs, the default position of H is the second vowel, whereas in nouns it can be either the first or the second. In Iñapari (see §29.2.2 for stress), H is obligatory and is usually accompanied by greater loudness and duration. Except in disyllables, the first syllable of a word always has a L tone, and final syllables are usually L as well. Underlying H tones are unpredictable and spread rightwards over up to four syllables. Words can have more than one underlying H tone only if they are plurimorphemic. If there is no underlying tone, a default H is assigned to the penultimate syllable and does not spread (Parker 1999). For Kofán [Isolate], Borman (1962) describes words as having either of two types of stress: ‘stress 1’, signalled by a gradual pitch drop after the stressed syllable, and ‘stress 2’, whereby the stressed syllable is longer and pitch rises gradually towards the right edge of the word. As in Parker’s analysis of Iñapari, ‘stress 2’ may result from an underlying H that spreads rightwards.
For ‘stress 1’, our interpretation is that there is no underlying tone and that stress correlates with high pitch on the primary stressed syllable, the remaining syllables having low tones. Thus, not all words would have an underlying H tone, and default stress rules would supply a surface H tone in these cases. In Kakataibo (Zariquiey 2018; see §29.2.3 for stress), the H tone is obligatory. In most cases, H can be interpreted as the phonetic manifestation of primary stress. However, some grammatical morphemes may have an underlying H tone, in which case a word may surface with two H tones. An underlying H usually docks on the rightmost trochaic foot, unless the preceding syllable is closed, in which case that syllable attracts the H. In related Capanahua, according to Loos (1969), there is only one stressed syllable per word: the second syllable if it is closed, and otherwise the first. Some morphemes have an underlying H tone, which is equally sensitive to syllable weight. Because H spreads rightwards within a morpheme, it is not culminative in the way stress is. In addition, segmental processes and laryngeal segments affect tone and stress differently (see also Elías-Ulloa 2004, 2016).

29.3.4 Languages with underlying H and L

Languages with underlying H and L can have up to four underlying patterns: H, HL, LH, L. Surface constraints may reduce this set to three, two, or even one. Other constraints determine whether a language has contour or register tones, as well as which TBUs are available for association with underlying tones.


Less constrained systems are reminiscent of African languages with tone melodies, as shown by the tonal templates of Karapana [Tukanoan] on bimoraic stems with a toneless suffix: /H/ > HH-H, /L/ > LL-L, /HL/ > HL-L, and /LH/ > LH-H, where tones in bold indicate the location of stress according to Metzger (1981). In Mundurukú [Tupian] (Picanço 2005), TBUs can be underlyingly associated with H or L tones. Other L tones are the default realization of empty TBUs, while H tones are either associated with specific TBUs or floating (see also §29.5). Owing to the interaction of intonation and creaky voice, Mundurukú can have up to four surface tones: super-high, high, low, and creaky-low. Similarly, in related Suruí (van der Meer 1982), two underlying tones can generate up to six surface tone levels through interaction with word stress and intonation. Stress usually, though not always, coincides with a high tone, and word stress in fact phonetically enhances the realization of tones, as in Mundurukú and Tikuna. In Wãnsöhöt [Isolate], monosyllabic morphemes have a four-way contrast: L, H, HL, LH (Girón and Wetzels 2007). In disyllabic and trisyllabic morphemes, H tone is culminative and contour tones are dispreferred. Words with only L tones are interpreted as toneless, even though L would appear to be phonologically active: a root with HL tone spreads its L to certain suffixes, while other suffixes block a floating H from associating, presumably because those suffixes have an underlying L. In neighbouring Kakua [Kakua-Nukak] (Bolaños 2016) and Kawiyari [Arawakan] (Reinoso Galindo 2012), surface tone melodies suggest an underlying system of L, HL, and LH. No H tone surfaces without a L tone: there are LLL words but no HHH words. This suggests that tone melodies are minimally HL and LH. Whether L tone is phonologically active needs further investigation.
Languages such as Sharanawa [Panoan] and Máíhɨ̃ki [Tukanoan] have surface contrasts between H, L, and HL tones, as illustrated by the Sharanawa words [fàì] ‘corn field’, [fáí] ‘path’, and [fáì] ‘rise in a river’ (Pike and Scott 1962). In Máíhɨ̃ki, tonal polarity also confirms that L is phonologically active (Farmer and Michael n.d.). A contrast based only on falling versus rising tones, HL vs. LH, appears to occur in all languages of the Nadahup family except Nadëb (see also §29.6.2). For instance, on the surface Dâw has the four melodies H, L, HL, and LH, as in [ˈpɔ́j] ‘spirit’, [ˈpɔ̂ːj] ‘surubim (fish sp.)’, [ˈpɔ̌ːj] ‘caranã (palm sp.)’, [bɤ̀ˈjɤ̌ː] ‘return’. However, in the analysis by Martins (2005), L appears only in unstressed syllables, while H occurs in stressed syllables that have no underlying tone. In addition, only the underlying HL and LH tones lengthen the vowel. Tukano [Tukanoan], too, has the two underlying tone melodies HL and LH, according to Ramirez (1997). However, HL is never realized as a contour tone: in some roots, H is associated with the last mora, in which case L is realized only if there is a suffix, as in /peta-re/ HL [pèhtárè] ‘to the port’, and is deleted otherwise. The LH melody is realized on the final syllable, lengthening it, as in /peta/ LH > [pèhtàá] ‘ant sp.’. The lengthening is absent if the final syllable is a suffix, as in /peta-re/ LH [pèhtàre᷄] ‘to the ant’. In related Eastern Tukanoan languages, such as Kubeo (Chacon 2012), Barasana (Gomez-Imbert 2004), Makuna (Smothermon et al. 1995), and Kotiria (Waltz and Waltz 2000; Waltz 2007), two underlying tones, H and HL, dock on the first or the second TBU of a lexical root. When they dock on the second TBU, the first receives a default L. This leads to four surface forms: /HL/ > HL-L, LH-L and /H/ > HH-H, LH-H (where the hyphen separates off the tonally unspecified suffix).
In Kubeo and Barasana, /L/ is also phonologically active elsewhere, as shown by some grammatical morphemes that block H spreading. Contour tones surface only in Barasana, and only at the edge of bimoraic words with (L)HL, where HL is associated with a single vowel, making it phonetically longer, similarly to Tukano above (see §29.2.2 for stress issues in Kubeo).

Finally, some languages have a two-way tonal contrast restricted to a single TBU in the word. In Tunebo [Chibchan] (Headland and Headland 1976; Headland 1994), H and HL contrast in the first syllable, as in [áàka] ‘face’ vs. [ákà] ‘rock’ (or [àákà] in slow speech). A H tone may dock on any vowel of the word, whereas HL is restricted to the first syllable. Also, HL always lengthens the vowel of the initial syllable, while lengthening by a H tone is optional. For the single-location contrast in Baniwa and Terena, see §29.2.3 and §29.2.1, respectively.
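The Eastern Tukanoan docking pattern described above (and the Karapana templates) can be rendered schematically: TBUs before the docking site receive default L, the melody's tones are linked one per TBU from the docking site, and the last tone spreads rightwards over the remaining TBUs, including the toneless suffix. This Python sketch is our own simplification, assuming a bimoraic root and a one-TBU suffix; it ignores contour tones (as in Barasana) and the blocking of H spreading by certain morphemes, and the function name is hypothetical.

```python
def dock(melody, site, n_root=2, n_suffix=1):
    """Hypothetical schematic surface tones for a root plus a toneless suffix.

    melody: underlying tones as a string, e.g. "H" or "HL"
    site: 0-based index of the root TBU on which the melody docks
    """
    if not 0 <= site < n_root:
        raise ValueError("the melody must dock on a root TBU")
    n = n_root + n_suffix
    tones = ["L"] * site  # default L before the docking site
    for k in range(n - site):
        # link tones one per TBU; the last tone spreads to any remaining TBUs
        tones.append(melody[min(k, len(melody) - 1)])
    return "".join(tones[:n_root]) + "-" + "".join(tones[n_root:])


for melody in ("HL", "H"):
    for site in (0, 1):
        print(f"/{melody}/ docked on TBU {site + 1}: {dock(melody, site)}")
# /HL/ docked on TBU 1: HL-L
# /HL/ docked on TBU 2: LH-L
# /H/ docked on TBU 1: HH-H
# /H/ docked on TBU 2: LH-H
```

With docking fixed on the first TBU, the same function also yields the Karapana templates /L/ > LL-L and /LH/ > LH-H.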

29.4 Sonority hierarchies, laryngeals, and nasality

In many SA languages, patterns of stress and tone are sensitive to sonority hierarchies, vowel phonation, and laryngeal segments. Wetzels and Meira (2010) discuss how stress in Nanti [Arawakan] and Umutina [Macro-Jê] is sensitive to the quality of the vowel in the syllable nucleus. In Pirahã [Mura-Pirahã], the main factor in stress assignment is syllable quantity, but when competing syllables tie for weight, the sonority of the onset becomes relevant. In Asheninka [Arawakan] (Payne 1989), syllable quantity, vowel quality, and the nature of the onset jointly define a prominence hierarchy. Many languages show a correlation between nasal vowels and stress. Historically, this relation is probably always indirect, as is clear in Baré [Arawakan] (Aikhenvald 1995), for instance, where ‘derived nasals’ arise through nasalization of vowels by suffixal nasal consonants, which are subsequently deleted, as in /nu-tʃinu-ni/ [nu.tʃi.ˈnũ] ‘my dog’. In some languages, [ʔ] is associated with strong metrical positions, as in Saynáwa (cf. §29.2.3) and in related Capanahua and Shipibo, where coda /ʔ/ is preserved only in stressed syllables, making the vowel creaky throughout (Elías-Ulloa 2016: 188). A similar phenomenon is reported for Muniche (Michael et al. 2013: 329–330). In Western Tukanoan languages, coda /ʔ/ is associated with high pitch, as in Koreguahe (Cook and Criswell 1993), or with stress, as in Siona (Wheeler 1987). Less commonly, [ʔ] is associated with non-prominent positions. In Kofán (Borman 1962) and Wayuunaiki [Arawakan] (Álvarez 1994), stress is antepenultimate if the penultimate syllable has a glottal stop. In Desano [Tukanoan], a glottal stop is epenthesized after the first mora, together with a shift of stress, in words derived from roots with initial stress, as in [ˈse.a-ke] ‘dig!’ and [se.ʔa-ˈbĩ] ‘he digs’ (Miller 1999: 16). The association of laryngealization with low tones is very frequent.
In Mundurukú, creaky voice is a contrastive phonation type in vowels but is dependent on tone, as it co-occurs only with L tones (Picanço 2005). Insertion of [ʔ] or creaky voice also appears to be a common effect of transitions from L to H, as reported for Mundurukú, Siriano (Criswell and Brandup 2000), and Tikuna (Montes Rodríguez 2004: 30–31), similarly to what is described for Desano. In Tukano, unvoiced vowels and vowels with creaky voice always have L, while those with modal voice have either L or H. There is consistent evidence from several related Tukanoan languages that laryngealization should be analysed as an underlying specification of some roots, with variable phonetic implementation along a continuum between glottal stop and creaky voice (Ramirez 1997; Stenzel 2007; Stenzel and Demolin 2013; Silva 2016). Less commonly than laryngealization, [h] and unvoiced (or whispered) vowels may co-occur with particular prosodic patterns. Wetzels and Meira (2010) describe systems in which, when stressed vowels are devoiced by an independent process, stress shifts to a syllable with modal voice (cf. Damana [Chibchan] and Awa Pit [Barbacoan]). In Piaroa [Sáliban] and Sekoya, a pre-tonic vowel is devoiced before a voiceless obstruent (Mosonyi 2000: 658; Johnson and Levinson 1990). In the Campa branch of the Arawakan family, syncope of [h] leaves surface H tones. Finally, association between laryngeals and nasalization (rhinoglottophilia) is very common in SA. In two languages, [ʔ] or [h] has [ŋ] as an allophone (cf. Sekoya, Vallejos 2013; Aguaruna, Overall 2007). In other languages, [h] may nasalize adjacent vowels (cf. Arabela, Rich 1999: 13; Iñapari, Parker 1999; Yawitero, Mosonyi et al. 2000; Nanti, Michael 2008: 321; Yagua, Payne and Payne 1990: 438–442).

29.5 Word prosody and morphology

29.5.1 Stress and morphology

There are various kinds of morphological conditioning of word stress in SA. Ese Ejja [Takanan] shows interesting phenomena of stress attraction and extrametricality at the left word edge. For instance, the privative suffix -ma attracts stress if it falls within a three-syllable window from the left edge; otherwise, it is unstressed: daki-má ‘naked (with no clothes)’ and bawícho-ma ‘with no rat’ (Vuillermet 2012: 201–202). There are also extrametrical affixes, such as the non-possessive prefix e- characteristic of a subclass of the language’s nouns, as seen in the contrast between éna ‘water’ and e-ná ‘blood’ (Vuillermet 2012: 203–204, 299). Banawá [Arawan] has an interesting case of extrametricality, in which the infix [-ka-] ‘noun class marker’ is extrametrical despite not being at the edge of a phonological domain (Buller et al. 1993). Compare [ˌtiniˈkabuˌne], where stress is regularly assigned to every odd-numbered syllable, with [ˌti.ka.niˈkabuˌne], where the second syllable, ka, is the noun class marker. Although [-ka-] adds a syllable, the stress pattern does not change.8 As with tone systems (§29.5.2), stress may be sensitive to the morphological structure of words. In Waorani [Isolate], foot parsing proceeds from the left in the root domain and from the right in the suffix domain (Hayes 1995: 182). Wichí [Mataguayan] (Nercesian 2011: 81) and Mapudungun (Molineaux 2016a, 2016b) use word-level stress as a demarcative feature, for instance of the boundary between the roots of a compound, of the boundary between a verb stem and inflectional affixes, and of the lexical head of compounds, distinguishing N-V compounds (noun incorporation) from V-N compounds. In Sikuani (Kondo 1985), stress is bound to the root, but some roots lose their stress when inherently stressed suffixes are added.
With other kinds of suffixes, stress regularly falls on the penultimate syllable of the root, or on the final syllable of the root if that syllable is heavy. Affixation alters the status of heavy syllables: all except the last heavy syllable of the root lose their stress-attracting properties.

8 We note that noun class markers, and classifiers in general, show distinctive prosodic behaviour in several SA languages.
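The Banawá extrametricality facts described in §29.5.1 can be made concrete with a short sketch: stress goes on every odd-numbered syllable counted from the left edge, while the extrametrical noun class marker -ka- is skipped in the count. The syllabification, the function name, and the decision not to distinguish primary from secondary stress are our own simplifications.

```python
def odd_syllable_stress(syllables, extrametrical=frozenset()):
    """Hypothetical sketch: return the indices of stressed syllables, i.e.
    every odd-numbered syllable (1st, 3rd, 5th, ...) counting from the left
    and skipping syllables marked as extrametrical."""
    stressed = []
    count = 0
    for i in range(len(syllables)):
        if i in extrametrical:
            continue  # invisible to stress assignment
        count += 1
        if count % 2 == 1:
            stressed.append(i)
    return stressed


plain = ["ti", "ni", "ka", "bu", "ne"]
with_infix = ["ti", "ka", "ni", "ka", "bu", "ne"]  # index 1 is the noun class marker
print(odd_syllable_stress(plain))            # [0, 2, 4]
print(odd_syllable_stress(with_infix, {1}))  # [0, 3, 5]
```

In both forms the stressed syllables are ti, the root syllable ka, and ne, which is the sense in which the added syllable leaves the stress pattern unchanged.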

29.5.2 Tones and morphology

Bound morphemes in tonal SA languages usually have complex morphophonological properties. In Wãnsöhöt, for instance, there are at least five types of suffixes with distinct prosodic features, including three with an underlying H that behaves differently depending on whether the root has H, HL, or LH (Girón and Wetzels 2007: 149). Tonal morphemes are found in a number of languages, as either derivational or inflectional markers. The Boran family is particularly rich in them, one example being the genitive, which, as in Bora (Thiesen and Weber 2012), is marked by a L tone on the final syllable of a monosyllabic or disyllabic possessor noun. In longer words, the L tone docks on the first syllable. In neighbouring Ocaina [Witotoan], tones are involved in coding tense-aspect-mood features (Agnew and Pike 1957). Tonal polarity, whereby a morpheme takes the tone opposite to that of a preceding morpheme, is widely attested in H vs. L tone systems (§29.3). In Mundurukú (Picanço 2005), some morphemes surface as L if the root has an underlying H and vice versa. Similar phenomena are found in several languages of Northwest Amazonia, such as the Tukanoan languages Máíhɨ̃ki, Koreguahe, and Tanimuka, as well as Yukuna [Arawakan] and the Boran languages. Barasana has both grammatical and polar tones. In combination with segmental suffixes, a tonal prefix H distinguishes first and second person from third person, which has HL (Gomez-Imbert 2004: 71–74). In the past tense, the tonal prefix is polar with respect to the tones of the root: if the root has HL, the prefix is H, but if the root has H, the prefix is HL. Similarly, in Latundê-Lakondê (Telles 2002), first and second person are distinguished in the imperfect aspect by H and L, respectively, on the nasal coda of a word-final syllable, V́Ń marking first and V́Ǹ second person. Tonal polarity is also found in the lexical domain among Tukanoan languages.
In Makuna, tones distinguish between body-part terms for humans and animals. If the human term has an underlying LH tone, the animal term has HL and vice versa, e.g. [rìhógá] ‘head (human)’ and [ríhògà] ‘head (animal)’ (Smothermon et al. 1995). In Barasana and Eduria (Taiwano), tonal polarity serves to mark the dialectal difference between the two (Gomez-Imbert 1999). In many etymologically equivalent words, Eduria’s underlying HL corresponds to Barasana (L)H and vice versa, as in Eduria [ˈékàɾè] vs. Barasana [ˈékáɾé] ‘to feed’.

29.6 Historical and comparative issues

29.6.1 Stress

The edge alignment of primary stress in bound stress systems in SA shows a strong areal and typological bias. Left-edge systems are restricted to the western and northwestern fringes of lowland SA. This overlaps with the distribution of tonal languages (see §29.6.2).


In addition, half of the tonal languages have primary stress at the left edge and the other half at the right; the latter set is also genetically more diverse. By contrast, only about 20% of non-tonal languages have left-edge primary stress. Foot form appears to be a genetically or areally stable feature in only a few SA families or regions, notably the trochaic systems of the Andes and of the lowland families Guaicuruan and Yanomami. In other lowland SA families, iambic and trochaic feet are distributed more randomly. Only a few families consistently display iambic feet, including some lower branches of the Kariban family, Sáliban, Nambikuaran (excluding Sabanê), and Macro-Jê. Most families, especially those with wider geographical distributions, show an almost even split between iambic and trochaic feet, including Arawakan, Kariban, Tupian, and Panoan. Available reconstructions of prosodic systems suggest developments from less predictable to more predictable stress systems. This is the case in Proto-Chibchan, for which Constenla Umaña (1981, 2012: 405) reconstructs free stress and three contrastive tones, whereas most daughter languages developed fixed word-final stress owing to the frequent loss of unstressed vowels. Free stress is also reconstructed for Proto-Guaicuruan (Viegas Barros 2013: 98), implying that the predictable systems attested in most daughter languages are historical innovations. Other analyses suggest recent changes that have made stress systems less predictable.
In Proto-Panoan, according to González (2016, and references therein), trisyllabic words had fixed penultimate stress; after the loss of the final vowel or of the entire final syllable, stress became word-final, yielding iambic feet where the syllables were light (*CVCV́CV > CVCV́) and keeping a moraic trochaic pattern where the last syllable was heavy (*CVCV́CV > CVCV́C), which explains the variability of foot types in the family (cf. §29.2.2). Proto-Arawan had syllabic trochees assigned from left to right, but reanalysis of surface patterns led to changes in primary stress placement and in the direction of metrical parsing (Everett 1995). Jarawara and Banawá (Jamamadi) reanalysed the direction of parsing, adopting right-to-left trochees. However, in Jarawara (Dixon 2004), a second level of stress corresponds to the original system of the proto-language (before reanalysis) and triggers certain segmental rules. Similarly, in Koreguahe [Tukanoan], there is no audible secondary stress, but a fortition process affecting bound morphemes can be accounted for by an older iambic system (cf. Cook and Criswell 1993; Chacon 2014).

29.6.2 Tones

Tonal languages are found predominantly in regions of great linguistic diversity occupying the western and northwestern fringes of lowland SA, including a few in the Andean foothills (Dixon and Aikhenvald 1999; Adelaar 2004). Some more peripheral languages within these areas, as well as outliers such as Guató [Isolate], Xipaya [Tupian], and Terena [Arawakan], are known cases of (pre-)historical migration. A genuine outlier is Selk’nam [Chonan] in the Southern Cone. Tonal systems seem to be recent developments in SA languages (Hyman 2016c). Tones may be reconstructed for Proto-Tukanoan, Proto-Boran, Proto-Chibchan, and, perhaps, Proto-Tupian. The short time depth of tone, attrition under pressure from dominant national languages (Eberhard 2009), and language death may jointly explain why tonal languages make up a smaller proportion of languages in SA than in the world as a whole: 33% against 50–60%, respectively (cf. Hyman 2001; Yip 2002). The recent development of tones in SA might also explain the existence of mixed stress and tone systems. Tonogenesis processes in SA may be useful for expanding the typology of how tones emerge cross-linguistically. For instance, in Terena, vowel syncope triggered the appearance of a contour tone on the first syllable of unpossessed roots: -étete-a ‘pepper’ (possessed) > têti ‘pepper’ (absolute). In the Tukanoan family, languages that lost the glottal stop developed quite distinct tone patterns. In Kubeo, glottal stop syncope created a H tone. In Barasana, Tatuyo, and Máíhɨ̃ki, the syncope resulted in L tones, similarly to a phonetic effect of the glottal stop in Tukano, where it was retained.9 The evolution of tonal patterns in the Nadahup family shows a complex interplay between length and phonation. Epps (2017) proposes that a vowel length contrast should be reconstructed for Proto-Nadahup; HL and LH underlying tones emerged as incipient tonal melodies on long vowels when the following consonant was voiced or voiceless, respectively, while short vowels kept H tones. The loss of phonemic vowel length led to the phonologization of tones in Dâw, Hup, and Yuhup, followed by independent changes in each language. Nadëb did not develop underlying tones and retained the vowel length contrast. The relationship between tone and length is so conspicuous that Martins (2005) has proposed that tones in fact go back to Proto-Nadahup and that vowel length emerged epiphenomenally.

29.7 Conclusion

In this review, we have concentrated on general and specific properties of stress, tones, and metrical structure, showing how these properties cluster more or less independently within specific prosodic systems. ‘Mixed systems’, as well as less prototypical stress systems, are perhaps the hallmark of SA word-prosodic typology, which is also a treasure trove for the investigation of diachronic aspects of tone, as well as of tone and stress co-evolution. There are very few acoustic and instrumental studies of the prosody of these languages, the few exceptions being Storto and Demolin (2005), Gordon and Rose (2006), Hintz (2006), and Elías-Ulloa (2010). Given that most SA languages are still un(der)documented, we expect that much remains to be learned about SA prosodic systems.

9 See for instance the following correspondences from Proto-Tukanoan: *neʔe ‘buriti palm’: Máíhɨ̃ki nèè : Koreguahe neʔe : Kubeo néí : Tukano nèʔé : Barasana rẽ̀é : Tatuyo nèè.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Part V

PROSODY IN COMMUNICATION


Chapter 30

Meanings of Tones and Tunes
Matthijs Westera, Daniel Goodhue, and Carlos Gussenhoven

30.1 Introduction

What a speaker can be taken to mean often depends in part on the intonation used. Explanations of this fact often invoke the notion of ‘intonational meaning’, the idea that intonational features carry meaning in their own right and in this way contribute to the meaning of the utterance. Intonational meaning can reside in the phrasing of an utterance and, where available, in its melody and in the location of its accents. This chapter presents a concise introduction to these topics, focusing on the meaning of melody. It aims to provide both an overview of the subfield and a more detailed look at several theoretical and empirical studies, with an eye to the future. The body of the chapter is organized around a distinction between two kinds of theories of intonational meaning, generalist and specialist, and their potential reconciliation. Generalist theories aim to account for the meanings of a wide range of intonation contours (comparable to the ‘abstract’ meanings of Cruttenden 1997: 89). This is typically attempted by assigning basic meanings to a set of phonological building blocks, the intonational morphemes. In contrast, specialist theories aim to account, in considerable detail and often with formal explicitness, for the usage of a narrow range of contours, or even only a particular use of a particular type of contour (as in Cruttenden’s ‘local’ meanings). Generalist and specialist theories are not necessarily incompatible. They are most fruitfully regarded, we think, as the starting points of two different approaches that may ultimately meet: a general-to-specific or ‘top-down’ approach, and a specific-to-general or ‘bottom-up’ approach. Proponents of generalist theories may ultimately want their theories to yield predictions as detailed as those of specialist theories, and proponents of specialist theories may regard their theories as stepping stones to a more general theory in the future.
After introducing relevant concepts and distinctions in §30.2, we survey several prominent specialist and generalist theories in §30.3. Next, in §30.4, we briefly investigate the extent to which the gap between specialist and generalist theories can be bridged, concentrating on the role of pragmatics. Lastly, §30.5 reviews empirical work on intonational meaning, relating it to the same challenge, and §30.6 presents a brief conclusion.

30.2 Basic concepts for the study of intonational meaning

The study of meaning in a strict sense is concerned with what speakers mean when they produce an utterance, that is, what they intend to communicate. In a broader sense, it is concerned also with how an audience interprets an utterance, which includes what they take the speaker to mean but may also include any other inferences an audience might draw. Although the difference between intention and interpretation is important, we will follow most of the current literature in using ‘meaning’ in its broader sense. The study of meaning is often subdivided into ‘semantics’ and ‘pragmatics’. Many characterizations of this division exist (for overviews and discussion see Bach 1997; McNally 2013). A sensible one is what Leech (1983) calls the ‘pragmaticist’s view’, in which semantics covers the linguistic conventions on which the clear communication of what a speaker means relies and pragmatics covers the rest (e.g. what a speaker may reasonably mean to begin with, and how the speaker’s communication relies on a combination of conventions and context). The issue of which meaning components are conventional (or ‘semantic’) and which are not (or ‘pragmatic’) is particularly challenging in the case of intonation (Prieto 2015). Speakers use intonation to comment on the pragmatic status of their utterance, for instance to clarify how the main contribution of the utterance relates to the conversational goals and to the beliefs of speaker and hearer (‘information structure’) (e.g. Brazil et al. 1980; Gussenhoven 1984: 200; Hobbs 1990; Pierrehumbert and Hirschberg 1990). This function is mostly carried by the linguistically structured part of intonation, encoded in intonational phonology, which comprises discrete contrasts such as that between H* and L*.
Paralinguistic intonation, by contrast, is expressed by gradient adjustments of the pitch contour (Ladd 2008b: 37), which typically add emotional or evaluative meanings to the linguistic message. Paralinguistic phonetic adjustments vary gradiently with their meaning. For instance, if high pitch register signals indignation, then higher pitch will signal more indignation. This chapter is primarily concerned with the linguistic part of intonational meaning. However, it may at times be hard to tell whether an intonational meaning is linguistic, because some meanings may be expressed either paralinguistically or morphologically, depending on the language (Grice and Baumann 2007). For instance, languages with a single phonological intonation contour for both assertions and questions, such as Hasselt Limburgish (Peters 2008), will signal the difference between the two meanings paralinguistically, by pitch register raising or pitch range expansion for questions (cf. Yip 2002: 260). An added difficulty is that linguistic and paralinguistic intonational meaning may be diachronically related. Paralinguistic intonational meaning has been claimed to derive from various anatomical and physiological influences on intonation.1 The best known of these influences is the size of the vocal folds, which correlates inversely with their vibration frequency and thus with pitch. Ohala’s (1983, 1984) Frequency Code accordingly assigns ‘small’ meanings (‘friendly’, ‘submissive’, ‘uncertain’, etc.) to higher pitch and ‘big’ meanings (‘authoritative’, ‘aggressive’, ‘confident’, etc.) to lower pitch. Similar connections between sources of variation and meanings have been identified as the Effort Code, the Respiratory Code, and, tentatively, the Sirenic Code (Gussenhoven 2016 and references therein). Paralinguistic uses of high pitch for questions, of expanded pitch range for emphasis, or of final high pitch to signal incompleteness may have developed into morphemes in many languages, as in the case of interrogative H% (a reflection of the Frequency Code), focus-marking pitch accents (the Effort Code), and H% for floor-keeping (the Respiratory Code), respectively (Gussenhoven 2004: 89). To move on to linguistic intonation, Pierrehumbert (1980) aimed at formulating a phonological grammar to account for the contrastive intonation forms of English (see chapters 4 and 19). The possibility of a morphological analysis (i.e. a parsing of the phonologically well-formed strings of tones into meaning-bearing units) was only hinted at, but concrete proposals were made in numerous subsequent theories of intonational meaning, some of which will be discussed in §30.3. Theories of intonational meaning may differ in (i) the presupposed phonological analysis, (ii) the way the phonemes are grouped into morphemes, and (iii) the meanings assigned to these morphemes. Unless the size of these morphemes encompasses the intonational phrase or the utterance, intonational meaning is ‘compositional’: it arises from the combination of the meanings of the various morphemes (Pierrehumbert and Hirschberg 1990).
Controversial elements in this phonological analysis concern the existence of the intermediate phrase (ip) and its final boundary tones H- and L-, the internal composition of pitch accents, and the obligatory status of the final boundary tone (for discussion and references, see Ladd 2008b: ch. 4; Gussenhoven 2016; see also chapter 19). In §30.3 we will discuss several ways in which theories of intonational meaning may help to shed light on phonological issues.

1 This does not mean that paralinguistic intonation cannot also be a matter of linguistic convention (Prieto 2015). Dachkovsky (2017) provides an example of the conventionalization of paralinguistic signals up to their ultimate morphemic status in the development of Israeli Sign Language.

30.3 Generalist and specialist theories of intonational meaning

30.3.1 Generalist theories

Accounts of intonational phonology have been proposed for many languages, and often come paired with a coarse characterization of intonational meaning, typically in terms of the kinds of speech acts (e.g. question vs. assertion), their turn-taking effects (e.g. continuation vs. completeness), and speaker attitudes (e.g. surprise, uncertainty, incredulity, authoritativeness) with which various contours may typically occur (see, for instance, collections such as Hirst and Di Cristo 1998; Jun 2005a, 2014a; for turn-taking specifically, see chapter 32; see also Park 2013 for Korean). More systematic and explanatory theories of intonational meaning are rarer; they have been developed primarily for English, and we will concentrate on these in what follows. For generalist theories of intonational meaning for other languages see, for example, Portes and Beyssade (2012) for French and Kügler (2007a) for German (Swabian and Upper Saxon). With regard to English, there is considerable agreement about the meaning of final rising pitch, despite some differences as to whether this meaning is contributed by a high boundary tone (H%), a rising accent (L*H), its high trailing tone (or a phrase accent H-), or some combination (e.g. L*H H%). The meaning of a final rise is commonly construed either as ‘incompleteness’ (e.g. Bolinger 1982; Hobbs 1990; Westera 2018; Schlöder and Lascarides 2015) or in terms of what may be consequences of (or explanations for) incompleteness, such as ‘testing’ (Gussenhoven 1984), ‘questioning’ (Bartels 1999 and Truckenbrodt 2012, with regard to H-), ‘suspending judgement’ in some respect (Imai 1998), raising a ‘metalinguistic issue’ (Malamud and Stephenson 2015), being ‘forward-looking’ or ‘continuation-dependent’ (Pierrehumbert and Hirschberg 1990; Bartels 1999; Gunlogson 2008; Lai 2012), and placing some responsibility on or effecting an engagement of the addressee (Gunlogson 2003; Steedman 2014; see also chapter 19). This range of characterizations could in principle reflect differences in empirical focus among the various authors, rather than essential differences in the supposed meaning of the rise.2 There is also considerable agreement that a plain falling contour in English should in some sense mean the opposite of a rise. A distinction can be drawn, however, between theories (most of the aforementioned ones) according to which a fall would convey the strict negation of the rise (e.g. ‘completeness’, ‘continuation-independence’), and theories that instead consider the fall a meaningless default, such as those of Hobbs (1990) and Bartels (1999), where L% conveys not the intention to convey the negation of what H% conveys, but merely the absence of the intention to convey what H% conveys. Which approach is more plausible depends in part on which theory of intonational phonology one assumes. For instance, in ToBI, boundaries are either H% or L%, so it would make sense if one of the two were the meaningless default.
But if, as in Ladd (1983: 744), Grabe (1998b), and Gussenhoven (2004), final boundaries can also be toneless, it seems more natural to treat the toneless boundary as the meaningless default and the low boundary tone as the strict negation of the high boundary tone, as in most accounts (including several that are in fact based on ToBI). And the picture may be different again if one understands ToBI as providing a four-way boundary distinction (i.e. L-L%, L-H%, H-L%, H-H%). As for (pitch) accents, there seems to be a consensus that accents (cross-linguistically) serve to mark words that are ‘important’ in some sense; we will discuss this separately in §30.3.2. There is less agreement about what the meanings of the different kinds of accents would be (for a more detailed overview see chapter 33; see also Büring 2016: ch. 9). According to some authors, the distinction between rising and falling accents in English mirrors that between final rises and falls (e.g. Gussenhoven 1984; Hobbs 1990; Westera 2019). For instance, Hobbs assumes that boundary tones indicate the (in)completeness of an intonational phrase while trailing tones indicate the (in)completeness of an accent phrase. Other authors do not assume such similarity (e.g. Pierrehumbert and Hirschberg 1990; Steedman 2014); for example, Steedman treats (ToBI-style) boundary tones as signalling agency of speaker ((H/L)L%) versus hearer ((H/L)H%), and assigns meanings to accents on the basis of two other dimensions: (i) whether the material conveyed by the accented phrase is ‘thematic’ (i.e. supposed to be common ground) or ‘rhematic’ (i.e. intended to update the common ground), which is similar to Brazil’s (1997) ‘referring’ versus ‘proclaiming’ or Gussenhoven’s (1984) ‘selection’ versus ‘addition’ (cf. Cruttenden 1997: 108), and (ii) whether this supposition or update is successful or unsuccessful. Relative to these two binary distinctions, Steedman locates the ToBI accents as in Table 30.1.

Table 30.1 Information-structural meanings of pitch accents (Steedman 2014)

                        Success      Failure
Thematic (suppose)      L+H*         L*+H
Rhematic (update)       H*, H*+L     L*, H+L*

Together with Steedman’s (2014) treatment of the boundary tones, this results in intricate meanings: for example, for the contour L*+H H-H%, that ‘the hearer fails to suppose that the accented material is common ground’. What this means depends of course on a theory of notions such as supposition and common ground; we return to this dependence in §30.4. For now, note that Steedman’s distinction between the rows in Table 30.1 is one between rising accents and non-rising accents, while the distinction between the columns concerns the location of the ‘star’, such that H* signals success and L* failure (unlike the more common generalization that H*/L* conveys newness/givenness; e.g. Pierrehumbert and Hirschberg 1990). Within ToBI, this organization may appear quite natural. But within Gussenhoven’s (2004) phonology this apparent naturalness disappears. For one thing, Gussenhoven’s theory does not draw a distinction between L+H* and H* (top left and bottom left of Table 30.1), taking the first to be an emphatic pronunciation of the second (cf. Ladd 2008b: 96). Moreover, ToBI’s L*+H (top right) may correspond in Gussenhoven’s theory either to a rising accent L*H or to a high accent (H*/H*L) that is delayed by means of an independently meaningful L* prefix, with retention of the semantic characteristics of the unprefixed pitch accent (L*HL; cf. Gussenhoven 2016; §3.5). The second case follows the more generally assumed morphemic status of what has been discussed for English as ‘scoop’ (Vanderslice and Pierson 1967; Vanderslice and Ladefoged 1972: 1053), ‘delayed peak’ (Ladd 1983), or [Delay] (Gussenhoven 1983b), and as ‘late peak’ for German (Kohler 1991, 2005). So what is a single morphological operation in one analysis would correspond in Steedman’s (2014) theory to a difference along two semantic dimensions.
Thus, again we see that a theory of intonational meaning depends (through one’s morphological analysis) in part on one’s intonational phonology.

2 It has been proposed, perhaps in contrast, that the ‘incompleteness’ and ‘questioning’ uses of the final rise stem from different biological codes (Gussenhoven 2004; cf. §30.2). For a recent example and discussion of two meanings for a single form of the final rise in declaratives, see Levon (2018).
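Steedman's two binary accent dimensions (Table 30.1) combine with his speaker/hearer reading of boundary tones to yield contour meanings such as the one quoted for L*+H H-H%. The Python sketch below illustrates only that compositional step, under two assumptions: the agent is read off the final %-tone alone, and the English glosses for all cells other than the quoted L*+H H-H% case are hypothetical paraphrases, not Steedman's wording. It says nothing about the underlying theory of supposition and common ground.

```python
# Table 30.1 (Steedman 2014): ToBI accent -> (information-structural role, outcome)
ACCENT_MEANINGS = {
    "L+H*": ("thematic", "success"),
    "L*+H": ("thematic", "failure"),
    "H*": ("rhematic", "success"),
    "H*+L": ("rhematic", "success"),
    "L*": ("rhematic", "failure"),
    "H+L*": ("rhematic", "failure"),
}


def agent(boundary: str) -> str:
    """Boundary meaning: (H/L)L% marks the speaker, (H/L)H% the hearer."""
    if boundary.endswith("L%"):
        return "speaker"
    if boundary.endswith("H%"):
        return "hearer"
    raise ValueError(f"unrecognized boundary: {boundary}")


def gloss(accent: str, boundary: str) -> str:
    """Compose accent and boundary meanings (hypothetical English paraphrases)."""
    role, outcome = ACCENT_MEANINGS[accent]
    who = "the " + agent(boundary)
    if role == "thematic":
        act = "supposes" if outcome == "success" else "fails to suppose"
        return f"{who} {act} that the accented material is common ground"
    act = "updates" if outcome == "success" else "fails to update"
    return f"{who} {act} the common ground with the accented material"


print(gloss("L*+H", "H-H%"))
# the hearer fails to suppose that the accented material is common ground
```

Encoding the table as a lookup keyed by accent makes the observation in the text easy to check: every accent whose star falls on H maps to success, and every L*-starred accent to failure.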

30.3.2 Specialist theories

Specialist theories aim to account for (a particular usage of) a particular intonational feature or contour in considerable detail, often using tools from formal semantics and pragmatics. These theories have been applied to a number of different melodic features, among them (i) utterance-final rises in declarative sentences (e.g. Gunlogson 2003; Nilsenova 2006; Truckenbrodt 2006; Gunlogson 2008), (ii) accentuation and focus (e.g., among many, Rooth 1985, 1992; Roberts 1996/2012), (iii) particular uses of rise-fall-rise (RFR) (e.g. Ward and Hirschberg 1985; Büring 2003; Constant 2012), (iv) stylized intonation (Ladd 1978), (v) rises and falls in lists (e.g. Zimmermann 2000), and (vi) utterance-final rises and falls in questions (e.g. Roelofsen and van Gool 2010; Biezma and Rawlins 2012). We will discuss a number of examples in more detail.


An influential specialist theory of accent placement (in English and many other languages) is Rooth’s (1985) theory of focus. Rooth seeks to account for the observation (e.g. Dretske 1972; Jackendoff 1972) that accent placement can have various semantic and pragmatic effects, including effects on the truth conditions of the main, asserted contribution of an utterance, as shown in (1), for example.

(1) a. John only introduced BILL to Sue.
                            H*L     L%
    b. John only introduced Bill to SUE.
                                    H*L L%

That is, (1a) is taken to express that John introduced Bill and no one else to Sue, whereas (1b) conveys that John introduced Bill to Sue and to no one else. This motivated Rooth’s integration of accent meaning with ordinary compositional semantics, giving rise to his Alternative Semantics for focus. Very roughly, the accent on Bill in (1a) introduces a set of focus alternatives into the semantics, say, the set {Bill, Peter, Ann}, which higher up in the syntactic tree generates the set {introduced Bill to Sue, introduced Peter to Sue, introduced Ann to Sue}. This set may then serve as input to the word only, which would, as its core meaning, serve to exclude all focus alternatives except the one involving Bill. Although this approach is still influential, Beaver and Clark (2009) argue for a slightly different perspective on the interplay of compositional semantics with the meaning of accentuation, in part based on cases where accent placement appears not to affect the interpretation of only. For them, words such as only are not directly sensitive to accentuation, but only indirectly, by virtue of both only and accentuation being sensitive to the kind of question addressed by the utterance in which they occur, also called the ‘question under discussion’ (QUD) (e.g. Roberts 1996/2012).
That is, given the accentuation, (1a) and (1b) are most naturally understood as addressing different QUDs, and only would convey exclusivity relative to these different QUDs. The idea that certain intonation contours presuppose particular QUDs has been applied to various intonational features, such as English RFR (Ward and Hirschberg 1985, whose ‘scales’ are roughly QUDs), English question intonation (Biezma and Rawlins 2012), and the French ‘implication contour’ (Portes and Reyle 2014), a rise-fall contour where the high peak falls on the final full syllable (which Portes and Reyle transcribe as LH*L% or LH*L-L%). According to Portes and Reyle, the implication contour expresses that the QUD has multiple possible answers. To illustrate, the implication contour would be fine on a disagreeing response (2a) but strange on an agreeing response (2b), because disagreement entails that the original QUD remains an open question (e.g. which kinds of restaurants there are); in (3), the implication contour is fine on an agreeing response, provided what is agreed on is only a partial answer, likewise leaving the QUD an open question.

(2) A: Dans cette ville, il n’y a de restaurants que pour les carnivores.
       ‘In this town, there are restaurants only for carnivores.’
    a. B: Il y a un restaurant végétarien.
                               L+H* L- L%
          ‘There is a vegetarian restaurant.’
    b. B: # Il n’y a pas un restaurant végétarien.
                                       L+H* L- L%
          ‘There is no vegetarian restaurant.’


(3) A: Il y a pas de volets quoi.
       ‘There are no shutters.’
    B: Ah oui ils y ont des rideaux hein.
                            L+H* L- L%
       ‘Ah yes they have curtains don’t they.’

Other specialist accounts rely not on the notion of QUD but on ‘epistemic’ notions such as discourse commitment, speaker bias, and contextual evidence; intonation thus seems to reflect the interlocutors’ goals (e.g. QUDs) as well as their epistemic states. Among these we find, for instance, a rich literature on English rising declaratives (e.g. Gunlogson 2003, 2008; Nilsenova 2006; Truckenbrodt 2006; Trinh and Crnič 2011; Farkas and Roelofsen 2017), an account of question intonation in Catalan (Prieto and Borràs-Comes 2018), and an account of the contradiction contour in English (Goodhue and Wagner 2018; for an earlier account, see e.g. Bolinger 1982). We summarize Goodhue and Wagner’s account for concreteness. According to Liberman and Sag (1974), the contradiction contour requires some kind of contradiction, but Pierrehumbert and Hirschberg (1990) criticize this characterization for being too vague and incorrectly permissive of examples such as (4) (cf. Pierrehumbert and Hirschberg 1990: 293, ex. (20)).

(4) A: There are mountain lions around here.
    B: # Alvarado said there are no mountain lions around here.
         %H (L*)                                     L* L-H%

Goodhue and Wagner (2018) offer a more precise characterization of the contradiction contour as requiring contextual evidence against the proposition expressed. For instance, A’s utterance in (5) provides contextual evidence both for the proposition that A asserts and for the proposition embedded under the verb said, hence against the content of B’s utterances in both (5a) and (5b), licensing the contradiction contour in each.

(5) A: Alvarado said there are mountain lions around here.
    a. B: No he didn’t.
          %H L*  L-H%
    b. B: There aren’t any mountain lions around here.
          %H (L*)                        L* L-H%

In contrast, in (4) there is no contextual evidence against the proposition expressed by B, hence the contour is not licensed.
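To make the Alternative Semantics mechanism described above for (1) concrete, the sketch below computes focus alternatives for the two accentuations and applies an exclusive only to them. It is a simplified extensional model in Python, not Rooth's type-theoretic formulation: the world (`FACTS`) and the goal alternatives (`GOAL_DOMAIN`) are hypothetical, while the theme alternatives {Bill, Peter, Ann} are those mentioned in the text. In the assumed world John introduced Bill both to Sue and to Ann, so (1a) comes out true and (1b) false, mirroring their different truth conditions.

```python
# Hypothetical world: the (agent, theme, goal) introductions that took place.
FACTS = {("John", "Bill", "Sue"), ("John", "Bill", "Ann")}

THEME_DOMAIN = {"Bill", "Peter", "Ann"}  # alternatives to the theme, as in the text
GOAL_DOMAIN = {"Sue", "Peter", "Ann"}    # hypothetical alternatives to the goal


def focus_alternatives(theme, goal, focus):
    """Set of VP alternatives 'introduced x to y' generated by the focused slot."""
    if focus == "theme":
        return {(x, goal) for x in THEME_DOMAIN}
    if focus == "goal":
        return {(theme, y) for y in GOAL_DOMAIN}
    raise ValueError(f"unknown focus slot: {focus}")


def only(subject, vp, alternatives, facts=FACTS):
    """True iff the prejacent holds and every other focus alternative is false."""
    if (subject, *vp) not in facts:
        return False
    return all((subject, *alt) not in facts for alt in alternatives if alt != vp)


vp = ("Bill", "Sue")
s1a = only("John", vp, focus_alternatives(*vp, focus="theme"))  # (1a): BILL accented
s1b = only("John", vp, focus_alternatives(*vp, focus="goal"))   # (1b): SUE accented
print(s1a, s1b)  # True False
```

The same VP meaning and the same world yield different truth values purely because the accent determines which alternatives only gets to exclude, which is the core of Rooth's proposal as summarized above.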

30.4 Towards unifying generalist and specialist theories

Despite their successes, both specialist theories and generalist theories have limitations. Specialist theories can generate precise predictions, but only for a narrow empirical domain and from relatively costly assumptions. Generalist theories have a broader scope but generate less precise predictions, perhaps to such an extent that generalist theories are not really falsifiable. According to Ladd (2008b: 150), this is primarily because such theories rely on an underdeveloped theory of pragmatics. For instance, the claim that a particular pitch accent marks selection from a common background (Brazil 1997; Gussenhoven 1984) is difficult to falsify in the absence of a pragmatic theory that defines the conditions under which selection from a common ground would be a rational, cooperative thing to do (see Büring 2012 for a congenial criticism of generalist theories with regard to accent placement). Perhaps specialist and generalist theories can be regarded as the starting points of two approaches to intonational meaning, a specific-to-general and a general-to-specific approach. Reconciling these requires investigating how the ingredients of specialist accounts can be generalized to or derived from the assumptions of generalist accounts, for instance through a theory of pragmatics. To illustrate this, consider the suggestions contained in some specialist theories of English rising declaratives for fitting their proposed meanings into a more generalist account. For instance, Gunlogson (2008) proposes to regard her specialist treatment in terms of a ‘contingent commitment’ as a special case of a more generalist treatment of all rising declaratives as contingent discourse moves, and sketches how certain features of the context may guide an audience’s understanding to the more specific meaning (though see Nilsenova 2006 for criticism). Similarly, Malamud and Stephenson (2015) suggest that their specialist treatment, which builds on Gunlogson’s, could be regarded as instantiating the more generalist account of all rising declaratives as raising a metalinguistic issue.
For a general-to-specific approach to rising declaratives, one could instead start from the generalist assumption that the rising intonation signals ‘incompleteness’ and try to make this more precise in terms of what it means for an utterance to be incomplete. If it is understood as ‘incompleteness given the goals of cooperative conversation’, this could be explicated for instance in terms of suspending one of Grice’s (1975) maxims of conversation (Westera 2013, 2018) or in terms of the required existence of some future discourse segment (Schlöder and Lascarides 2015). To illustrate the approach based on maxim suspensions, note that one can find or construct a rising declarative for each of the maxims, as in (6) to (9) (examples, respectively, from Gunlogson 2003; Pierrehumbert 1980; Westera 2013; Malamud and Stephenson 2015).

(6) (To someone entering the room with an umbrella.)
    It’s raining? (H%)
(7) (To a receptionist.)
    Hello, my name is Mark Liberman. (H%)
(8) (English tourist in a French café.)
    I’d like... err... je veux... a black coffee? (H%)
(9) (B isn’t sure if A wants to know about neighbourliness or suitability for dating.)
    A: What do you think of your new neighbour?
    B: He’s attractive? (H%)

The final rise in (6) seems to convey that the speaker is uncertain about the truth of the proposition expressed (suspending the maxim of Quality, which demands certainty in this regard), in (7) about whether the information provided is sufficient (suspending Quantity), in (8) about whether it was comprehensible (suspending Manner), and in (9) about whether the information was relevant to the preceding question (suspending Relation). Moreover, conceiving of final rises in terms of maxim suspensions can help to explain additional characteristics. For instance, Quality-suspending examples like (6) are known to express a speaker bias: the truth of the proposition expressed must be deemed sufficiently likely. This is plausibly because one should not risk violating an important maxim like Quality (i.e. risk uttering a falsehood), unless its falsehood is considered sufficiently unlikely (Westera 2018). In this way, by explicating relevant parts of a pragmatic theory, one can derive particular ingredients of specialist accounts from a generalist characterization. A topic where a similar reconciliation of specialist and generalist accounts through pragmatics seems underway is accentuation and focus. For instance, specialist theories of focus, such as Selkirk’s (1995) influential account, can be simplified and potentially improved by placing part of the burden of accent placement not on syntactic stipulations but on pragmatics (Schwarzschild 1999; Büring 2003; Beaver and Velleman 2011). And Rooth’s (1985) specialist theory of focus sensitivity (of words such as only) has been restated in terms of the pragmatic notion of QUD (Beaver and Clark 2009, following Roberts 1996/2012 and others).

30.5 Experimental work on intonational meaning

Experimental work on intonational meaning encompasses both corpus research and behavioural experiments. Corpus research offers the advantage of spontaneous speech, but it does not enable one to precisely and repeatedly control for subtle pragmatic factors; experiments offer better opportunities for control, but at the cost of less spontaneous data. For reasons of space, we only review some behavioural experimental work in what follows, pointing the reader interested in corpus work to Calhoun et al. (2010, and references contained therein). Experiments may involve production and perception data. In production experiments, utterances are recorded that are produced by participants in response to stimuli, such as a short discourse or a description of the sort of meaning that is to be expressed. The recordings are then annotated for intonation. The goal is to discover manipulations of stimuli that reliably affect intonational behaviour (e.g. Hirschberg and Ward 1992; González-Fuente et al. 2015; Goodhue et al. 2016; Klassen and Wagner 2017). For instance, Goodhue et al. (2016) sought to demonstrate the existence of different rising contours in English with distinct meanings. Participants were asked to produce a single sentence in three different contexts. In (10), this sentence is You like John, to be uttered in response to (10a) so as to contradict an interlocutor, to insinuate something in response to (10b), or to express disbelief or incredulity in response to (10c).

(10) a. (Your friend Emma spent the whole day with John yesterday and you know for a fact that she likes him.)
        Emma: So yesterday Sarah asked me if I was going to John’s birthday party and I said no, I don’t even like him.
     b. (You know your friend John is attending the party, and you know Emma knows and likes him, but you’re not sure whether she’ll like anyone else.)
        Emma: I don’t feel like going to this party tonight, I have the feeling I might not like any of the people there.
     c. (Just the other day your friend Emma was bad-mouthing John, so you know for a fact that she doesn’t like him.)
        Emma: Yesterday Sarah kept saying mean things about John and I was really uncomfortable because John’s a nice guy, I really like him.


Each context reliably elicited distinct rising contours, with (10a) eliciting the contradiction contour, (10b) the RFR contour, and (10c) emphatic question rises. Specialist accounts could in principle model this outcome with relative ease, especially if they were to treat each of these contours non-compositionally as a single morpheme, as in the original proposal by Liberman and Sag (1974) for the contradiction contour. Generalist accounts would have to cover a potentially broader range of uses of these contours (or their morphemes), and explain through a pragmatic theory how the contexts in (10) enable the participants to reliably convey the meanings they were asked to convey. Without a sufficiently precise pragmatic theory, generalist theories in particular are difficult to evaluate empirically. In perception experiments, participants hear utterances, again often in context, and are given tasks that are intended to shed light on how intonation affects interpretation. For instance, they may be asked to respond to questions such as ‘How natural is this utterance on a scale of 1 to 7?’, ‘Is this utterance true or false?’, ‘How different are the meanings of these contours?’, or ‘Does this utterance mean X or Y?’ (e.g. among many others, Nash and Mulac 1980; Gussenhoven and Rietveld 1991; Hirschberg and Ward 1992; Chen et al. 2004a; Watson et al. 2008b; Portes et al. 2014; Jeong and Potts 2016; Goodhue and Wagner 2018). Let us consider De Marneffe and Tonhauser (2016) as an example to illustrate that relating empirical findings to theory is not straightforward. Many theories of English RFR treat the contour as cancelling or weakening exhaustivity inferences, that is, inferences that stronger answers to the QUD are false (e.g. Tomioka 2010; Constant 2012; Wagner 2012b).
In apparent contrast to these theories, De Marneffe and Tonhauser discovered that when B’s answer in (11) is pronounced with RFR, the answer is less likely to be interpreted as an affirmative answer to the question (i.e. as meaning ‘beautiful’) than when it is pronounced with a plain falling contour.

(11) A: Is your sister beautiful?
     B: She’s attractive…
             L*+H   LH%

De Marneffe and Tonhauser’s interpretation of their results is that RFR strengthens the ‘not beautiful’ interpretation, which would amount to strengthening an exhaustivity inference rather than weakening it, in apparent contrast to the aforementioned accounts of RFR. However, this conclusion seems to rely on two implicit assumptions about the pragmatics of cases such as (11), the plausibility of which is difficult to assess without a detailed pragmatic theory. One is that B’s response must be interpreted either as an affirmative or as a negative answer to A’s question (as opposed to, e.g., ‘I don’t know’ or ‘it depends’); otherwise, a lower likelihood of the affirmative interpretation does not necessarily imply a higher likelihood of the negative interpretation (i.e. that its exhaustivity inference is strengthened). In other words, depending on one’s theory of pragmatics, the results are consistent with participants drawing an ignorance inference rather than an exhaustivity inference. Another assumption is that the non-exhaustivity predicted by existing accounts would necessarily pertain to A’s question and not, say, to some higher, implicit QUD such as ‘Could your sister be a model?’, a possibility acknowledged by Wagner (2012b; though contrary to Tomioka 2010 and Constant 2012). For although (11) makes part of the context explicit through A’s question, it leaves implicit why this question was asked. Moreover, Kadmon and Roberts (1986) note that different intonation contours may favour different understandings of implicit parts of the context (say, of the implicit, higher QUD in (11)), further complicating the interpretation of experimental results. This again underscores the importance of pragmatics for the study of intonational meaning; conversely, given a particular theory of intonational meaning, experiments may offer an important window on pragmatics.

30.6 Conclusion

There is agreement that intonation has a linguistically structured component, with a phonology, a morphology, and basic meanings for its morphemes, and that this component serves to clarify the pragmatic status of utterances in various ways. There is also some agreement on generalist characterizations of the meanings of utterance-final contours (e.g. in terms of pragmatic ‘(in)completeness’), and proposals exist to make these characterizations more precise with the help of advances in pragmatics. With regard to (the various ingredients of) more complex contours, however, much remains to be discovered. The lack of consensus about the meanings of various accent types is due in part to (and also a partial cause of) disagreement about what the phonemes are, and even more so about what the morphemes are, even for the intensively studied West Germanic languages. It is also due in part to the difficulty of testing theories of intonational meaning, but here, too, advances in pragmatic theory will lead to a better understanding.

Acknowledgements

We would like to thank Daniel Büring and Pilar Prieto for their feedback. Any remaining errors are of course our own.

chapter 31

Prosodic Encoding of Information Structure
A typological perspective

Frank Kügler and Sasha Calhoun

31.1 Introduction

Information structure (IS) can be conceived of as a ‘cognitive domain’ that interacts, on the one hand, with the linguistic modules of syntax, phonology, and morphology and, on the other hand, with other cognitive capabilities that control, update, and infer interlocutors’ common beliefs (Zimmermann and Féry 2010). A speaker’s utterance can be subdivided according to its IS, and the constituents of an utterance can be analysed as being focused, given, and/or topical. Interlocutors use various linguistic means to control, update, and infer their common beliefs. This chapter deals, from a typological perspective, with the prosodic means that speakers use to achieve this goal. At first sight, languages may appear to vary widely in which prosodic cues signal IS (see Jun 2005a, 2014a; Kügler 2011; Downing and Rialland 2017a). However, it has repeatedly been proposed within prosody and IS research that this variation can be subsumed under underlying principles that point to common phonological structures present cross-linguistically (Truckenbrodt 1995; Büring 2009; Féry 2013). Another line of research concerns whether there is a relationship between the prosodic profile of a language and the prosodic means it uses to express IS (Burdin et al. 2015). This chapter is organized around the different prosodic strategies used to express IS in the languages of the world (§31.3), including interactions between syntax, prosody, and IS (§31.4), leading up to a discussion of commonalities between languages (§31.5). The basic concepts of IS are introduced in §31.2, and §31.6 presents an evaluative outlook for future research.


31.2 Basic concepts of information structure

We adhere to the widely held view that the basic notions of IS are the cognitive categories focus, givenness, and topic, which are rooted in a theory of communication (Krifka 2008). IS refers to the division of an utterance into information packages (Chafe 1976b) with the aim of continuously updating the common ground of the interlocutors. The category focus is defined as an indication of ‘the presence of alternatives that are relevant for the interpretation of linguistic expressions’ (Krifka 2008: 247). This abstract cognitive category can be expressed prosodically in language-specific ways.

Consider the mini-dialogue in (1). Speaker Q’s wh-question is a request to update the common ground with the referent of the wh-word. Speaker A responds by selecting the referent Peter as the relevant one among the possible alternatives in the context. This constituent is the focus and receives a pitch accent in languages such as English (the location of the pitch accent is shown in upper case). The rest of the utterance is the background. The focused constituent in the answer corresponds to the wh-constituent in the question, thus creating a coherent discourse.

(1) Q: Who stole the cookie?
    A: [PEter]foc stole the cookie. (Krifka 2008: 250)

A further use of focus is to correct or confirm information that an interlocutor puts under discussion, as in (2). Here, the context (2C) is a proposition that provides potential focus alternatives. In (2A), the speaker corrects (2C), while in (2A´) the speaker confirms it by repeating the identical proposition.

(2) C: Mary stole the cookie.
    A: (No,) [PEter]foc stole the cookie!
    A´: Yes, [MAry]foc stole the cookie. (Krifka 2008: 252)

In (1), the focus is also new information in the context, and the background is given. However, givenness is actually orthogonal to focus, in that a focused constituent can be either given or new, these representing the two ends of a givenness scale. Givenness refers to the information status of a constituent. A constituent is given if it is present in the immediate common ground (Krifka 2008: 262), and new if it is not. An intermediate stage on the givenness scale is reached when a constituent is present in the common ground but may be activated by the immediate discourse. In some languages, different degrees of salience correlate with the degree of prosodic prominence with which a constituent is expressed (see further §31.3.1). The dialogue in (3), for instance, shows a case where speaker A mentions the cookie and speaker B repeats the same words in the answer. The givenness status of the cookie affects accent assignment rules in some languages (Gussenhoven 1983c; Selkirk 1995). The deaccentuation of cookie, combined with a focus accent on the verb, establishes discourse coherence and illustrates that a given constituent is expressed with less prosodic prominence.

(3) A: I know that John stole a cookie. What did he do then?
    B: He [reTURNED [the cookie.]Given]foc (Krifka 2008: 264)

Finally, the category topic refers to a constituent of an utterance that a speaker chooses to give further information about in the rest of the sentence, which is usually referred to as the comment (Krifka 2008: 264–265). A topic in this sense is known as an aboutness topic. The division of a sentence into topic and comment establishes a relationship between the information conveyed by the topic and the information given about the topic. Taking the answer in the cookie example (1) as a context-free sentence, (4) illustrates that Peter represents the topic, and the information that follows in the comment is about the topic constituent Peter.

(4) A: [Peter]Topic [stole the cookie.]Comment

The key IS categories are illustrated here in English, which relies heavily on prosodic cues to express IS. Note, however, that across languages, linguistic devices other than prosodic means are used to express these information-structural categories (Zimmermann and Féry 2010; Féry and Ishihara 2016). The prosodic expression of focus is by far the most studied aspect, but we discuss the expression of givenness and topic where possible. In many studies, the category of focus/background is conflated with the referential status of new/given. We comment on work that separates these where relevant.

31.3 A typology of prosodic encoding of information structure

This section divides the prosodic means of marking IS into stress-based, phrase-based, and register-based encodings.

31.3.1 Stress- or pitch-accent-based cues

Stress-based systems are the best-studied type of prosodic encoding of IS, with decades of research, predominantly on English. The basic pattern is that the focused word in an utterance is the most prosodically prominent (e.g. Selkirk 1995; Ladd 2008b; Büring 2009, 2016; Calhoun 2010). Stress-based prominence is marked by phonetic and phonological cues, which increase the prominence of a word relative to others in the utterance. Phonetic cues include higher fundamental frequency (f0), greater f0 movement, lengthening, increased intensity, and higher spectral tilt on the word, and a drop in f0 after it (see summaries in Ladd 2008b; Breen et al. 2010; Fletcher 2010; Turk 2011). Of these, f0 cues are the best studied and are perceptually important (see review in Cole 2015, and references therein). Intensity and lengthening are also perceptually important, at least in English (Turk and Sawusch 1996; Kochanski et al. 2005; Breen et al. 2010). Phonologically, a word is the most prominent in an utterance because its main stressed syllable is the head of the largest prosodic phrase that it is part of (usually the intonational phrase, ι) (Ladd 2008b). The head of the ι-phrase carries a nuclear pitch accent. Therefore, as established by work in the Autosegmental-Metrical (AM) framework, focus is not marked directly by phonetic cues but rather indirectly: these cues primarily mark (nuclear) pitch accents, which in turn mark focus (Selkirk 1995; Ladd 2008b; Büring 2009, 2016; Calhoun 2010, inter alia). This contrasts with a direct view, on which focus is marked directly by phonetic prominence (Eady et al. 1986; Xu and Xu 2005; Breen et al. 2010).

[Figure 31.1: schematic pitch tracks. (a) ((PEter)φ L- (stole the cookie)φ)ι L%, with L+H* on [PEter]foc. (b) ((PEter)φ (stole the COOKie)φ)ι L-L%, with H* on PEter and L+H* on [COOKie]foc.]
Figure 31.1 Typical realizations of (1) and (4), showing how focus position affects prosodic realization. A schematic pitch realization is given, along with the prosodic phrasing, intonational tune, and text, where capitals indicate the pitch accented syllable. See text for further details.

Figure 31.1 illustrates typical realizations of (1) and (4). Each utterance has a prosodic phrase structure (phonological phrases, φ, and ι-phrases are shown) and an intonational tune (according to the AM_ToBI scheme; see chapter 6). The nuclear accent is, by default, the final pitch accent in the ι-phrase; hence the focus falls on Peter in Figure 31.1a and on cookie in Figure 31.1b. This is supported by perception studies, which show that listeners expect the nuclear accent, and therefore the focus, to be final, and treat a final nuclear accent as compatible with variable focus scope (Ayers 1996; Terken and Hermes 2000; Carlson et al. 2009; Bishop 2012, 2017). Prenuclear pitch accents may or may not mark additional foci in the sentence (Féry and Samek-Lodovici 2006; Calhoun 2010); for example, the H* accent on Peter in Figure 31.1b does not. The prominence of prenuclear accents can also signal the distinction between broad and narrow focus, although realizations compatible with either overlap substantially (Rump and Collier 1996; Ladd 2008b: ch. 7; Breen et al. 2010). Postnuclear phonological heads are either not associated with pitch accents (i.e. they are ‘deaccented’) or, if they are accented, appear in a compressed pitch register (Kügler and Féry 2017). Fully fledged postnuclear accents mark focus only in constrained discourse contexts such as second occurrence focus (e.g. Beaver et al. 2007; Baumann 2016).

Stress-based marking of focus appears to be widespread among the world’s languages. It is ubiquitous among Germanic languages (Fanselow 2016), such as English, German (e.g. Féry and Kügler 2008), Dutch (e.g.
Gussenhoven 2004), and Swedish (e.g. Bruce 1977; Myrberg and Riad 2016). Stress-based marking is also commonly found in Slavic languages (Jasinskaja 2016), including Russian (Luchkina and Cole 2016), as well as in other Eastern European languages, including Romanian (Manolescu et al. 2009), Greek (Baltazani and Jun 1999; Skopeteas 2016), Romani (Arvaniti and Adamou 2011), and Estonian (Ots 2017). Stress-based focus marking is also reported for Persian (Sadat-Tehrani 2007; Hosseini 2014; but see also chapter 13), the Oceanic language Torau (Jepson 2014), and Paraguayan Guaraní (Clopper and Tonhauser 2013).

There are differences in the extent of stress-based marking of focus, however, even between closely related languages or varieties of one language. For example, Chahal and Hellmuth (2014b) report differences between Lebanese and Egyptian Arabic (see also Caron et al. 2015; and see El Zarka 2017 for an overview of varieties of Arabic). In Lebanese Arabic, a narrowly focused word is nuclear accented, and any postnuclear material is deaccented (see Figure 31.2a), similarly to what happens in English (see Figure 31.1). In Egyptian Arabic, every prosodic word (ω) is pitch accented, and there is no postnuclear deaccentuation, although the pitch range of the focal accent is expanded (see Figure 31.2b).

[Figure 31.2: f0 tracks, Pitch (Hz) 100–400 against Time (s). (a) Lebanese Arabic, ˈmuna ˈħamet ˈlama min ˈlima ‘Muna she-protected Lama from Lima’ (0–1.849 s), tones H* … L-L%. (b) Egyptian Arabic, ˈmaama bititˈʕallim juˈnaani bi-l-ˈleel ‘mum she-is-learning Greek at-night’ (0–1.827 s), tones L+H* L– L+H* L+H* L+H* H–H%.]
Figure 31.2 Lebanese Arabic (a) and Egyptian Arabic (b) realization of narrow focus on the initial subject, from Chahal and Hellmuth (2014b). As can be seen, post-focal words are deaccented in Lebanese Arabic but not Egyptian Arabic.

Romance languages show similar variation: some (including European Portuguese and Mexican, Argentinian, Basque, and Canarian varieties of Spanish) follow the standard stress-based pattern, whereas others (including Madrid Spanish, Catalan, and Italian) use in situ stress-based marking only in some discourse contexts (e.g. contrastive focus), and postnuclear accents are not deleted (Frota and Prieto 2015b; Vanrell and Fernández-Soriano 2018); see further §31.3.3 and §31.4. Along with stress-based marking, focus can usually also be signalled by word order, and this interacts with prosodic cues to focus (see §31.4). However, in the standard case, in situ stress-based marking, whereby the nuclear accent can be in any position in the utterance, is always available as a means of marking focus, even if it may not be the preferred one (Skopeteas and Fanselow 2010).

As discussed in §31.2, referential givenness is orthogonal to focus, although focus and newness are correlated. In studies that separate focus and givenness, givenness is generally associated with lower stress-based prominence and/or deaccenting, both inside and outside focused constituents (e.g. Cutler et al. 1997; Baumann 2006; Féry and Kügler 2008; Cole et al. 2010; Baumann and Riester 2012; Baumann and Kügler 2015). As shown in (3), in English and other Germanic languages, if the final word within a multi-word focus is highly contextually salient, it is usually deaccented (Wagner 2005; Ladd 2008b; Riester and Piontek 2015). Similarly, given words in the background (non-focus) of an utterance are more likely to be unaccented than new words (Gussenhoven 1983c; Selkirk 1995; Féry and Kügler 2008). Listeners typically find it harder to process discourses in which given items are accented and new items unaccented (Cutler et al. 1997; Baumann and Schumacher 2012). Most of the empirical work in this area has been on Germanic languages, so it is unclear how widespread these patterns are. The tendency to deaccent given words within a focus, as in (3), is found in Slavic languages (Jasinskaja 2016) and Paraguayan Guaraní (Burdin et al. 2015), but not in Romance languages (Swerts et al. 2002; Ladd 2008b) or ‘outer circle’ varieties of English, including Indian English and Caribbean English (Ladd 2008b: 232). Given words have lower prominence than new words in unfocused positions in Hungarian (Genzel et al. 2015) but not in Egyptian Arabic (Hellmuth 2011).

31.3.1.1 Types of nuclear accent or tune

Along with the placement of the nuclear accent, the pitch accent and/or boundary tone type is argued to play an important role in encoding IS in a variety of languages, particularly aspects of IS beyond focus, principally contrast and topic status. One well-studied case is Romance languages, where there is widespread consensus that the configuration of nuclear accent and boundary tone signals broad versus narrow/contrastive focus (Frota and Prieto 2015b; chapters 16 and 17), with the exception of French (see §31.3.2). A typical example is Sardinian (Vanrell et al. 2015). In broad focus, the nuclear accent is usually H+L* (Figure 31.3a), while in narrow focus it is H*+L (Figure 31.3b), both followed by a low boundary tone L%. That is, the nuclear accent peak is aligned earlier in broad focus. In these languages, prenuclear accents are generally of one tonal type (L+H* in Figure 31.3), while nuclear accent (and following boundary tone) types vary and signal IS and other pragmatic meanings. Relatedly, in English and other Germanic languages, it is widely claimed that contrastive foci are typically marked by L+H* accents, with H* for non-contrastive focus (e.g. Pierrehumbert and Hirschberg 1990; Ito and Speer 2008; Watson et al. 2008b). The L+H* versus H* distinction has, however, always been problematic, as it is difficult for annotators to make, and it is the only accent type distinction not based on the association of L and H tones. The most reliable cue to the distinction is peak height (Ladd and Schepman 2003; Dilley 2005; Ladd 2008b; Calhoun 2012; Repp 2016), which has been experimentally confirmed for English (e.g. Welby 2003b; Breen et al. 2010; Katz and Selkirk 2011), Dutch (Krahmer and Swerts 2001), and German (Braun 2006; Kügler and Gollrad 2015), suggesting that this is better construed as a stress-based distinction.
Similar arguments have been made about Romance languages: the later peaks for narrow focus are also often higher (e.g. see Figure 31.3), so peak alignment may be a partial proxy for peak height (Gussenhoven 2004: 90; Ladd 2008b: 186; Vanrell et al. 2013; Borràs-Comes et al. 2014; Repp 2016). It remains unsettled whether distinctions based on peak scaling are gradient or categorical (e.g. Gussenhoven 2004: ch. 4; Ladd 2008b: ch. 5; Borràs-Comes et al. 2014). A related issue is the lack of consistency across studies as to what constitutes contrastive focus: narrow focus (focus on a single word), contrastive focus (involving contextually identifiable alternatives), or corrective focus (see (2)) (e.g. Repp 2016). It seems unlikely that any one language systematically distinguishes all of these prosodically. Rather, speakers may use increased phonetic prominence to draw attention to foci that are informationally salient (e.g. explicit contrasts or corrections) (Baumann et al. 2006; Calhoun 2009; Féry 2013).

[Figure 31.3: f0 (Hz) tracks against time (s). (a) Broad focus, Una pitzinna mandicande arantzu, with prenuclear L+H*, nuclear H+L*, and final L%. (b) Contrastive focus, Deu bollu MANDARINU!, with prenuclear L+H* accents, nuclear H*+L on MANDARINU, and final L%.]
Figure 31.3 Broad focus (a) and contrastive focus (b) in Sardinian from del Mar Vanrell et al. (2015).

In a number of languages, topics and foci are associated with different accent types: rising accents (L+H*/L*+H) for topics and falling ones (H*+L/L-) for foci, as in German (e.g. Braun 2006; Repp and Drenhaus 2015), English (Büring 2003; Steedman 2014), Russian (Alter and Junghanns 2002), and Arabic (El Zarka 2017). This is linked to sentence position, with topics typically preceding foci. Calhoun (2012) claims for English that the distinction is better construed as one of relative prominence, with topics relatively less prominent than foci (see also Féry 2007). Some proposals link a greater range of tonal event types with more detailed IS frameworks (see also chapter 30). For example, for English, Steedman (2000, 2014) proposes four orthogonal dimensions of IS, signalled by combinations of pitch accents and boundary tones: background/contrast (our focus), theme-rheme (roughly topic-comment), added/not added to the common ground, and speaker/hearer claimed (see also Pierrehumbert and Hirschberg 1990; Brazil 1997; for German see Féry 1993 and Grice et al. 2005a; see Prieto 2015 for related work on Romance languages). It is unlikely, however, that there is a one-to-one correspondence between phonological tune types and IS expression (e.g. Hirschberg et al. 2007; Féry 2008; Zimmermann and Féry 2010; Féry and Ishihara 2016). The role of accent and boundary tone type in signalling IS has predominantly been researched in languages with post-lexical pitch accents. However, these tonal events may also play a role in other languages. For example, in Mandarin Chinese, boundary tones on sentence-final particles can signal meanings such as presupposition, which are part of IS (e.g. Peng et al. 2005, and references therein).

31.3.2 Phrase-based cues

The expression of focus by phrase-based cues began to receive more attention once research on languages beyond well-studied ones such as English entered the field. The basic insight is that the word-prosodic systems of these languages predominantly lack lexical prominence, so focus cannot be expressed by pitch accenting as in languages with stress-based cues. The principal intonational units are phrase tones, and prosodic phrasing is the major component of intonation. The languages differ with respect to the domain that prosodic phrase markings indicate (e.g. a ω, a φ-phrase, or an ι-phrase), but what they have in common is that focus induces additional phrase boundaries. Hence, the highlighting function of focus is expressed by separating the focused constituent from other constituents; at the same time, post-focal constituents may be integrated into the same phrase as the focused constituent.

Consider Korean, which has neither lexical pitch accent nor lexical stress (Jun 2005b; chapter 24). Korean distinguishes two levels of phrasing. The ι-phrase dominates at least one accentual phrase (α), and an α consists of at least one ω. Each α is tonally marked by two rising pitch patterns. The first is associated with the initial two morae of the α and the second with the α-final morae (5b). There is some variation in the initial rise: if the α-initial consonant is tense or aspirated, the α-initial tone is H (S.-A. Jun 1998, 2005b). Phrase-final lengthening in combination with a low boundary tone demarcates the ι-phrase, whereas the α shows no final lengthening.

A prosodic boundary is consistently inserted before a focused constituent (Jun and Lee 1998; Jun and Kim 2007; Yang 2017; Jeon and Nolan 2017), while following words tend to be in the same φ as the focused constituent (5c). Jeon and Nolan (2017) also observe a tendency for the focused constituent to be realized as an ι-phrase. In addition, all these researchers report that the focused constituent is realized with higher phrase-initial pitch, longer duration, and higher intensity. In this sense, Korean overlaps with languages that use stress-based cues to mark focus (see §31.3.1). However, one may interpret these cues as a phonetic effect, while focus phrasing may be phonological (Jun and Lee 1998).

(5) a. miɾaneka neil tʃənjəke bananaɾɨl məknɨnte
       mira.family.gen tomorrow night banana.pl eat.prog
       ‘Mira’s family is eating bananas tomorrow night.’
    b. phrasing in broad focus
       ((LH LH)φ (LH LH)φ (LH LH)φ (LH LH)φ (LH L)φ L%)ι
    c. phrasing in narrow focus, e.g. focus on second φ-phrase
       ((LH LH)φ (LH L)φ L%)ι
    (Korean, Jun and Lee 1998; prosodic phrasing is our own)

For the Bantu language Chichewa, Kanerva (1990) analyses the effect of focus as the insertion of a prosodic boundary at the right edge of a focused constituent. Bantu languages are known for phrase-penultimate vowel lengthening. Under verb phrase (VP) focus (6a), the penultimate vowel of the phrase-final noun mwála ‘rock’ is lengthened, because the prosodic phrase extends from the verb at its left edge to the right edge of the focused constituent (Kanerva 1990: 157). The effect of focus on the PP (6b) is identical to that of VP focus. Any following unfocused constituents each form their own prosodic phrasal domain (6c, 6d). For a more complex analysis of the interaction between focus and prosodic phrasing in Chichewa, see Downing et al. (2004) and Downing and Pompino-Marschall (2013), who, however, base their analysis on speakers of a different variety of Chichewa.

(6)

a. VP focus: What did he do?
   (Anaményá nyumbá ndí mwáála)φ
   he hit the house with a rock
   ‘He hit the house with a rock.’
b. PP focus: What did he hit the house with?
   (Anaményá nyumbá ndí mwáála)φ
c. Object focus: What did he hit with the rock?
   (Anaményá nyuúmba)φ (ndí mwáála)φ
d. Verb focus: What did he do to the house with the rock?
   (Anaméenya)φ (nyuúmba)φ (ndí mwáála)φ
(Chichewa, adapted from Kanerva 1990: 156)

Languages thus differ in where a phrase boundary is inserted: in Korean it is inserted to the left of the focus, in Chichewa to its right. Further languages that show patterns similar to either Korean or Chichewa are French (Féry 2001; Jun and Fougeron 1995, 2000), Japanese (Beckman and Pierrehumbert 1986; Venditti et al. 2008), Georgian (Skopeteas et al. 2018), Shingazidja (Patin 2016), and Xhosa (Jokweni 1995; Zerbian 2004).


31.3.3 Register-based cues

Raising the pitch register is another strategy used by a number of languages. Pitch register defines reference lines relative to which local tonal targets are scaled (Clements 1979; Ladd 2008b). This type of cue is similar to stress-based cues in that it involves increasing the prominence, particularly the pitch scaling, of the focused element and/or reducing prominence by compressing the post-focal pitch register; however, this is not achieved through pitch accenting, as these languages generally do not have post-lexical pitch accents. Register raising is also similar to phrase-based cues, which likewise frequently involve pitch scaling, but in the languages discussed in this section it is not straightforwardly related to phrasing (though see further §31.5).

In Mandarin, the pitch register effects of focus are well studied (Xu 1999; Chen et al. 2016). Two prosodic effects occur, both of which preserve the lexical tones. First, the focused word exhibits a change in f0 that depends on the lexical tone: compared to broad focus, maximum f0 for H tones is raised, including the beginning of HL and the end of LH, while minimum f0 for L tones is lowered, including the end of HL and the beginning of LH. That is, there is a register expansion affecting both the top line and the bottom line of the register (Xu 1999: 69), in addition to an increase in duration (Chen 2006; Chen and Braun 2006). Second, the f0 after the focused word is lowered, a feature called post-focus compression (Xu 1999, 2011; Xu et al. 2012).

[Figure 31.4: mean f0 in Hz (180–280) across constituents, with Subject Focus, Object Focus, and Wide Focus conditions. (a) Pitch contour SOV: Subject, Object, Verb. (b) Pitch contour OSV: Object, Subject, Verb.]
Figure 31.4 Time-normalized pitch tracks in different focus conditions in Hindi, based on five measuring points per constituent, showing the mean across 20 speakers, from Patil et al. (2008: 61). SOV (a) and OSV word order (b). The comparisons of interest are subject focus (dotted line) and object focus (dashed line) with respect to broad focus (solid line).

Hindi is likewise best characterized as using register-based cues to mark focus prosodically. In this case, there is minimal focal raising but clear post-focal compression (Patil et al. 2008). In Figure 31.4, compression can be seen on the object in SOV order and on the subject in OSV order, with only a small raising on the focused subject in SOV order. The register effect thus appears post-focally and correlates with givenness in Patil et al. (2008). Post-focal compression also functions as a cue to focus perception (Kügler 2020). Other languages that seem best described as register-based are West Greenlandic (Arnhold 2014a), Georgian (Skopeteas et al. 2018), Jaminjung (Simard 2010: 214–216), Serbo-Croatian (with post-focal compression similar to Hindi) (Godjevac 2005), and Akan, which shows a general register-lowering effect of focus, even for lexical H tones (Kügler and Genzel 2012). As for topics, a thorough study of the effects of focus and topic in Mandarin revealed that while a topic raises the f0 register at the beginning of the sentence, f0 drops gradually after the topic (Wang and Xu 2011). Hence, unlike with foci, there is no post-topic compression. The amount of topic raising differs from that of focal raising.

31.4 Syntax–prosody interaction and non-prosodic-marking systems

In many languages, syntax is an essential means of encoding IS, affecting word order and the choice of syntactic construction (see Féry and Ishihara 2016). In many of these languages, prosodic and syntactic encodings of IS are aligned, leading to proposals that syntactic encoding of IS is often prosodically motivated. Languages may use a syntactic focus position, most commonly either sentence-initial or sentence-final (e.g. see Rebuschi and Tuller 1999; Drubig and Schaffar 2001; Neeleman et al. 2009; Féry and Ishihara 2016). Sentence-initially, a specific construction such as a cleft is often used (e.g. Lambrecht 2001; Hedberg 2013; Cesare 2014), where the focus in the cleft generally carries the highest prosodic prominence, while the post-focal main clause has reduced prominence (Lambrecht 2001; Vanrell and Fernández-Soriano 2018). The word order type of a language and its syntactic focus position appear to co-vary. In verb-initial languages, an initial focus position seems to be common and often coincides with the position of nuclear prominence (Herring 1990; Longacre 1995; Simard and Wegener 2017). For example, in Hungarian, narrowly focused items almost obligatorily appear immediately before the (finite) verb, that is, initially apart from any pre-verbal topics (e.g. Kiss 2002; Szendröi 2003). The pre-verbal position is also the position of the nuclear accent in Hungarian, so the focus must be placed there to align with the nuclear prominence. Narrow focus in Hungarian is marked by increased prosodic prominence on the focused word and/or lowering of accent peaks and deaccenting in the post-focal region, and corrective (exclusive) focus is marked by increasing the relative prominence of the focus compared to the post-focal region (Genzel et al. 2015). Other verb-initial languages reported to have an initial focus position coinciding with the nuclear prominence include Māori (Bauer 1997), Samoan (Calhoun 2015), and probably other Polynesian languages (e.g. Clemens 2014), the Oceanic language Gela (Simard and Wegener 2017), the Berber language Tamasheq (Caron et al. 2015), and the Australian language Dalabon (Fletcher 2014); though see later in this section for counterexamples.

Sentence-final focus is common in SVO languages, where the default position of nuclear prominence is final: for example, most Romance languages (Zubizarreta 1998; Ladd 2008b; Büring 2009; Frota and Prieto 2015a). In Madrid Spanish, for instance, foci must be nuclear accented, and the nuclear accent must occur in phrase-final position. The prosodic marking of focus in Madrid Spanish is effectively the same as in a standard stress-based system (see §31.3.1), except that the tendency that the nuclear accent should appear in phrase-final position is weaker than a tendency against non-canonical word order. Note that there is considerable variation within Romance languages, including between varieties of Spanish, in the extent to which they show in situ stress-based focus marking (Frota and Prieto 2015b; Vanrell and Fernández-Soriano 2018).

In verb-final languages, the focus position is often immediately before the verb, which correlates with nuclear prominence, with the verb consistently produced with lower pitch, as in Hindi (Patil et al. 2008; Féry et al. 2016), Bengali (Hayes and Lahiri 1991), Turkish (Vallduví and Engdahl 1996; Kamali 2011), and Basque (Elordieta and Hualde 2014).

There is a growing number of languages, however, where morphological and/or syntactic focus marking has not been found to correlate with any distinct prosodic marking. For instance, like other Mayan languages, the tone language Yucatec Maya has a pre-verbal focus position, syntactically analysed as a cleft construction (Kügler et al.
2007: 189; Verhoeven and Skopeteas 2015: 3), and canonical word order is VOS. The prosody of pre-verbal focused words does not differ from comparable non-focused words (Kügler and Skopeteas 2006; Gussenhoven and Teeuw 2008). Moreover, Kügler and Skopeteas (2007) present quantitative evidence that in situ focus on adjectives in an object noun phrase does not affect the tonal realization either. Similar results as in Yucatec Maya are found for the intonation languages Wolof (Rialland and Robert 2001) and Nɬeʔkepmxcin (Thompson River Salish) (Koch 2008); for other African tone languages such as Sotho (Zerbian 2007; Zerbian et al. 2010), Hausa (Hartmann and Zimmermann 2007b), Buli (Schwarz 2009), and further ones discussed in Downing and Rialland (2017b); for the Athabaskan languages Beaver (Schwiertz 2009) and Navajo (McDonough 2002); and for Malay and other varieties of Indonesian (e.g. Maskikit-Essed and Gussenhoven 2016), for which it has been shown that neither stress nor prosodic focus is perceptually detected (Goedemans and van Zanten 2007; Roosman 2007). The initial position is also associated with topics, with many languages placing topic constituents initially that may or may not be integrated syntactically with the rest of the clause (e.g. see Gundel and Fretheim 2004; Neeleman et al. 2009). A commonly found pattern is that initial topics form their own ι-phrase and, unlike foci, are not accompanied by any prominence reduction in post-topical material. Languages where this pattern is reported include Hungarian (Surányi et al. 2012; Genzel et al. 2015); German (Féry 2011); Māori (Bauer 1997); Gela (Simard and Wegener 2017); the West African language Zaar and Juba and Tripoli Arabic (Caron et al. 2015); Russian (Alter and Junghanns 2002); and the Australian language Jaminjung (Simard 2010).


31.5 Unified accounts

In this chapter, languages are grouped by the main type of prosodic cue each uses to encode IS. However, rather than treating these as independent strategies, it is frequently proposed that they are different instantiations of a general principle of prosodic focus marking (e.g. Truckenbrodt 1995; Zubizarreta 1998; Gussenhoven 2008; Büring 2009; Féry 2013). Each of the three types of cue has been argued to encompass the other two: focus-as-prominence, focus-as-alignment, and focus-as-register. The most commonly proposed unifying principle is prominence. Under this principle, the smallest prosodic unit (e.g. ω) containing the focus is the most prominent unit within the largest prosodic unit containing the focus (e.g. the ι-phrase) (e.g. Truckenbrodt 1995; Samek-Lodovici 2005; Büring 2009). This is consistent with the paralinguistic 'Effort Code' (Gussenhoven 2002). Importantly, prominence is an abstract property, so its cues may differ between languages. For phrase-based systems, it is argued that prominence is assigned at the phrasal level and can be positional (Büring 2009). For example, as shown for Korean, φ-phrases to the right of the focus are deleted (§31.3.2), making the focused φ-phrase rightmost in the ι-phrase and hence the most prominent (see the analysis in Büring 2009). Consistent with this, the focused phrase is phonetically prominent. However, not all such languages show clear phonetic correlates of prominence on the focus (e.g. Chichewa; Downing and Pompino-Marschall 2013), which weakens this claim. For register-based systems, pitch range expansion can be argued to mark ι-level prominence. Consistent with this, in Mandarin Chinese, lexical tones in focused words have more distinct f0 contours and weaker coarticulatory effects (Chen et al. 2016). Syntactic and non-marking languages fit the principle where the syntactic focus position aligns with the default nuclear stress (see §31.4).
However, when there are no clear cues to nuclear stress, it is hard to find theory-external evidence that these languages fit the principle. Féry (2013) proposes an alternative principle of alignment (see also Koch 2008). She claims that, cross-linguistically, the strongest tendency is for focused constituents to be aligned with the left or right edge of a phrase, usually the ι-phrase and sometimes the φ-phrase, with prominence being secondary and separable from alignment. For stress-based systems, she argues that nuclear stress is also phrasal, as it marks a φ-phrase head. For example, in Germanic languages, the nuclear accent is right-aligned because it is the rightmost phrase head (see Figure 31.1; Truckenbrodt 1995; Büring 2009). However, the phonetic cues to the assumed φ-phrase boundaries are often weak, as illustrated at the right edge of Peter in Figure 31.1a, which weakens this claim. For syntactic and non-marking languages, as with the prominence-led approach, it is difficult to find theory-external evidence for alignment where this does not involve phonetic cues to phrasing. Furthermore, if alignment and prominence are independent, this approach does not explain why they so often co-occur. To our knowledge, a fully fledged focus-as-register theory has not been advocated. However, current approaches often assume register reference lines for languages with stress-based systems, implying a view of prominence that encompasses both stress-based and phrase-based systems (e.g. German: Truckenbrodt 2002; Féry and Kügler 2008; Kügler and Féry 2017). Focal prominence raises the pitch register line across a phrase, affecting pitch accent height in a stress-based system and the whole phrase in a phrase-based system. Féry and Ishihara (2010) further propose that focus raises the register while givenness lowers it; however, languages differ in the extent of raising and compression. For example, post-focal and givenness compression is almost complete in English and German (Kügler and Féry 2017), but only partial in Mandarin (Xu 1999) and Hindi (Patil et al. 2008), allowing tonal distinctions to be maintained in Mandarin. These unifying accounts are appealing and have explanatory power across a wide range of languages. For all three, however, there remain cases that fit awkwardly at best, especially languages without any clear prosodic cues to focus. They would rather appear to be separate, though overlapping, approaches. The details of where and to what extent they overlap need much more empirical investigation.
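A toy calculation can illustrate the register view and the claim that only partial compression leaves room for tonal contrasts. The sketch below is our own simplification, not a model from the works cited. Each word carries a tone level between 0.0 (L, at the register bottom) and 1.0 (H, at the register top). Focus multiplies the register span of the focused word, and post-focal compression scales down the span of every word after it. All parameter values are hypothetical.

```python
def register_targets(tones, focus_index, focus_raise=1.5, postfocal_scale=0.3,
                     bottom_st=0.0, span_st=6.0):
    """Return one f0 target (in semitones) per word, given tone levels
    (0.0 = L at the register bottom, 1.0 = H at the register top).
    The focused word's register span is multiplied by `focus_raise`;
    spans after the focus are multiplied by `postfocal_scale`."""
    targets = []
    for i, level in enumerate(tones):
        if i == focus_index:
            span = span_st * focus_raise       # focal expansion
        elif i > focus_index:
            span = span_st * postfocal_scale   # post-focal compression
        else:
            span = span_st                     # pre-focal: unchanged
        targets.append(bottom_st + level * span)
    return targets

tones = [1.0, 1.0, 1.0, 0.0]  # H H H L, focus on the second word
for scale in (0.3, 0.02):     # partial vs near-complete compression
    t = register_targets(tones, focus_index=1, postfocal_scale=scale)
    print(scale, "post-focal H-L distance:", round(t[2] - t[3], 2), "st")
```

A small `postfocal_scale` makes the post-focal H and L targets nearly coincide, which corresponds to near-complete compression. A larger value keeps them apart, as with partial compression.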

31.6 Evaluation and considerations for future research

The discussion of the different languages in this chapter shows that certain prosodic characteristics of a language often entail certain types of prosodic encoding of IS. For instance, if a language has stress, it most likely uses stress-based cues, and if a language predominantly uses phrase tones to mark intonation units, it most likely uses phrase-based cues. However, there are exceptions. Further, as discussed in §31.5, future work in this area needs to provide more evidence for a prominence view, a phrasing view, or a register view of the expression of focus. One important topic that we have not had space to cover in this chapter is methodology. Eliciting IS means eliciting both the mental states of speakers and hearers and the linguistic means used to convey these mental states. It is not clear that classical tests, that is, mini-dialogues such as question–answer pairs (Krifka 2008; §31.2), are sufficient to generate the appropriate mental states; more interactive tasks may be preferable (e.g. the Questionnaire on Information Structure, Skopeteas et al. 2006; Kügler et al. 2007; Genzel and Kügler 2010; Kügler and Genzel 2014; Calhoun 2015; Chen 2018). Elicitation materials need to be carefully constructed and measured to control for other effects such as tonal context and segmental influences (see Féry and Kügler 2008; Wang and Xu 2011; Kügler and Genzel 2012; Calhoun 2015; Genzel et al. 2015). Further, the majority of the studies discussed rely on acoustic analysis of production data. Only a few studies examine whether listeners perceive and process the prosodic cues in line with the intended pragmatic IS manipulations (e.g. Rump and Collier 1996; Ladd and Morton 1997; Zerbian 2007; Baumann and Schumacher 2012; Dilley and Heffner 2013; Kügler and Gollrad 2015; Kügler 2020). This is important, for example, to determine whether prominence or phrasing cues are primary.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 32

Prosody in Discourse and Speaker State

Julia Hirschberg, Štefan Beňuš, Agustín Gravano, and Rivka Levitan

32.1 Introduction

This chapter discusses several aspects of prosodic variation in discourse and in speaker state. Prosody is a critical component of both and links the two in multiple ways. Prosodic variation can signal simple but important things to recognize in conversation, such as whether a speaker has completed their turn. It can also signal more complex ones, such as whether speakers appreciate their partners or whether they are being truthful. For discourse, we focus on turn-taking behaviours and prosodic entrainment. We consider current prosodic research on turn-taking (§32.2.1), focusing on the prosodic characteristics of different turn-taking behaviours, such as smooth turn exchanges, interruptions, and backchannels, and of the turns that precede and follow these. Turn-taking analysis provides a useful means of characterizing the dynamics of interpersonal spoken interactions. Another approach to such analysis is represented by the discussion of prosodic entrainment (§32.2.2). Entrainment occurs when conversational partners begin to behave like one another in pitch, speaking rate, or intensity, as well as in higher-level prosodic elements such as pitch accents and phrasal tones. Prosody can also indicate each speaker's own state of mind, including their personality or more transient states such as their emotions. We review prosody in speaker state, focusing primarily on the role of prosody in emotion detection (§32.3.1), including the different ways in which emotion is defined and many of the resources and machine learning methods being used to identify it. We next examine the role of prosody in identifying an important speaker state: whether the speaker is being deceptive or truthful in conversation (§32.3.2). We discuss the prosodic features that research has found to be associated with deception versus truthfulness and compare results in the automatic detection of deception with human performance.


PROSODY IN DISCOURSE AND SPEAKER STATE 469

32.2 Prosody in discourse

Discourse is a broad term. It covers concepts ranging from the (institutional) frame for any conversation or text, such as media or political discourse, to the ways in which speakers structure sentences or utterances into larger meaningful units. Here we take the narrower approach and examine the role of prosody in understanding how a particular utterance fits in with the surrounding utterances as well as with the broader context. While such analysis can be applied to both monologues and dialogues, here we focus on turn-taking and speaker entrainment in dyadic or multi-party spoken interactions.

32.2.1 Prosody and turn-taking

In a seminal paper, Sacks et al. (1974) characterized turn-taking in conversations between two or more persons and articulated a basic set of rules governing turn construction. At every transition-relevance place (TRP): (i) if the current speaker (CS) selects a conversational partner as the next speaker, then that partner must speak next; (ii) if the CS does not select the next speaker, then anyone may take the next turn; and (iii) if no one else takes the next turn, then the CS may take it. TRPs are conjectured to occur at points of possible syntactic completion, with intonation playing an important role. Some studies stress the importance of lexico-semantic cues over prosodic cues in regulating conversational turn-taking (De Ruiter et al. 2006), but the relevance of prosody for turn management is well established. Early findings based on conversation analysis methodology showed that speech rate, loudness, pitch level, and direction of pitch contour all combine into complex signals that precede TRPs, such that the more complex the signal, the higher the likelihood of a speaker change (Duncan 1972; Cutler and Pearson 1986; Couper-Kuhlen and Selting 1996; Ford and Thompson 1996; Wennerstrom and Siegel 2003). More recently, corpus studies on large data sets have confirmed and refined these hypotheses, and there is now general agreement on the nature of these turn-yielding cues:

- a falling or a high rising final intonation (L-L% or H-H% in the ToBI system, respectively);
- reduced phrase-final lengthening;
- a decrease in pitch and intensity levels;
- a distinct voice quality (captured by acoustic features such as jitter, shimmer, and noise-to-harmonics ratio);
- a point of syntactic and/or semantic completion;
- a stereotyped expression (such as you know)

(Gravano and Hirschberg 2011; Hjalmarsson 2011; Raux and Eskenazi 2012; Meena et al. 2014; Bögels and Torreira 2015).
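The likelihood of a speaker change is reported to grow with the number of co-occurring cues. A simple way to operationalize this list is therefore to count the cues present at the end of an inter-pausal unit (IPU). The sketch below assumes speaker-normalized (z-scored) features. The field names, the zero thresholds, and the omission of the voice-quality cue are simplifying assumptions, not values or procedures from the cited studies.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class IPUFeatures:
    """Features of an inter-pausal unit (IPU) ending at a potential TRP.
    Numeric values are assumed to be z-scores normalized per speaker."""
    final_tones: str             # ToBI phrase accent + boundary tone, e.g. "L-L%"
    final_lengthening_z: float   # duration of the phrase-final syllable
    pitch_z: float               # pitch level over the IPU-final stretch
    intensity_z: float           # intensity level over the IPU-final stretch
    syntactically_complete: bool
    ends_in_stereotype: bool     # e.g. ends in "you know"

def turn_yielding_cues(ipu: IPUFeatures) -> list[str]:
    """List the turn-yielding cues present in `ipu`. The zero thresholds
    are illustrative assumptions, not values from the cited studies."""
    cues = []
    if ipu.final_tones in ("L-L%", "H-H%"):
        cues.append("falling or high-rising final intonation")
    if ipu.final_lengthening_z < 0:
        cues.append("reduced final lengthening")
    if ipu.pitch_z < 0:
        cues.append("lower pitch")
    if ipu.intensity_z < 0:
        cues.append("lower intensity")
    if ipu.syntactically_complete:
        cues.append("syntactic/semantic completion")
    if ipu.ends_in_stereotype:
        cues.append("stereotyped expression")
    return cues

# Hypothetical IPU: falling tune, short final syllable, low pitch and
# intensity, syntactically complete, no stereotyped ending.
ipu = IPUFeatures("L-L%", -0.4, -1.1, -0.6, True, False)
print(len(turn_yielding_cues(ipu)), "cues present")
```

A cue count like this could serve as one input to a statistical model of turn-taking. In practice, the thresholds would need to be estimated from data.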
Finally, while the vicinity of turn ends is very important for turn-management cues, prosodic features in the initial parts of turns may also contribute to turn-taking signals and cue pragmatic intentions. For example, Sicoli et al. (2015) found that speakers use a boosted initial pitch to signal questions, and, more specifically, questions that elicit evaluation rather than information. This is important for smooth turn-taking, since planning a response has to start well before the interlocutor's turn ends (Levinson 2016). Discourse markers also play a central role in turn-taking, in the structuring and coordinating of dialogue (Grosz and Sidner 1986). Short linguistic expressions such as right or yeah are frequent in spontaneous conversation and are heavily overloaded. Their possible discourse/pragmatic functions include agreeing with what the interlocutor has said, displaying interest and continued attention, cueing the start of a new topic, and many others; some words (notably okay) may convey as many as 10 different functions (Schiffrin 1987; Redeker 1991; Gravano et al. 2012). There is robust evidence that prosody is a crucial mechanism for disambiguating the meaning of discourse markers and thus their function in turn-taking behaviour. Prosodic features effectively complement lexical, syntactic, and contextual features for discriminating between the discourse and sentential uses of discourse markers (Litman 1996; Zufferey and Popescu-Belis 2004). They also help to distinguish different discourse functions (Jurafsky et al. 1998; Stolcke et al. 2000). For example, despite their high lexical variability (e.g. yeah, okay, uh-huh, mm-hm, right), such 'backchannels' (speech from a conversational partner that is not intended to take the turn) are characterized in American English as having higher overall pitch and intensity levels than other pragmatic functions, as well as a marked rising final intonation (L-H% or H-H%) (Beňuš et al. 2007). Conversational fillers (CFs), or filled pauses (FPs), together with their temporal and acoustic features, should be understood as devices with important turn-organizational uses (Sacks et al. 1974: 720). CFs may function as turn-holders, turn-yielders, or turn-grabbers, and serve to structure and ground discourse. For example, Shriberg and Lickley (1993) found that the fundamental frequency (f0) of CFs occurring within a clause is linearly related to the f0 of the surrounding speech from the same speaker. Beňuš et al. (2011) identified a link between the temporal alignment of turn-initial CFs and their discourse meaning in negotiating grounding and interpersonal dominance.

32.2.2 Prosody and entrainment

Entrainment, also called 'accommodation', 'alignment', 'adaptation', and 'convergence', is the tendency of conversational partners to become similar to each other along numerous dimensions of speech and language. Here we follow the computer science literature, which uses 'entrainment' as a general term (Brennan and Clark 1996; Nenkova et al. 2008; Lee et al. 2010; Levitan and Hirschberg 2011; Fandrianto and Eskenazi 2012; Friedberg et al. 2012), and discuss entrainment behaviour in the dimension of prosody. Acoustic-prosodic entrainment is well documented in the literature. In one of the earliest studies of entrainment, Matarazzo and Wiens (1967) showed that interviewers could manipulate an interviewee's response latency by increasing or decreasing their own duration of silence before responding. Similarly, Street (1984) showed that interviewees converged towards their interviewers in response latency and speech rate. Natale (1975) manipulated a confederate's within-conversation intensity levels and showed that subjects engaged in open-ended conversation entrained to each new intensity level. This entrainment increased over the course of a conversation, and again when the subject returned for subsequent sessions. Gregory et al. (1993) found that similarity was greater in true conversations than in conversations simulated by splicing together utterances from speakers who did not actually interact. Ward and Litman (2007) measured local entrainment on loudness and pitch and showed that this measure could successfully distinguish randomized from naturally ordered data. Levitan et al. (2011) showed that speakers entrain on speech rate and voice quality in addition to vocal intensity and pitch. Moving beyond low-level features of the speech signal, Gravano et al. (2014) proposed entrainment measures based on strings of discrete ToBI labels, and Reichel and Cole (2016) showed entrainment using polynomial parametrization of f0 contours. Focusing specifically on the acoustic-prosodic features of speech immediately preceding backchannels, Levitan et al. (2011) showed that speakers entrain on the use of such features as backchannel-inviting cues. More detailed studies have explored various factors that mediate entrainment behaviour, such as gender. Namy et al. (2002) found that female speakers were perceived to accommodate more than male speakers in a shadowing task. Pardo (2006), on the other hand, found that female pairs were less phonetically similar to each other than male pairs. Thomason et al. (2013) found that males entrained more than females on vocal intensity features. Bilous and Krauss (1988) found that female and male speakers converged on different features. Levitan et al. (2012) found that female–male dyads entrained on vocal intensity, pitch, and voice quality (all features tested). Female–female pairs, by contrast, did not entrain on pitch, and male pairs entrained only on intensity. The lack of a consistent pattern of results across studies argues against a straightforward relationship between entrainment and gender. Prosodic entrainment appears to be widespread across languages. In a series of studies, Levitan and colleagues (Xia et al. 2014; Levitan et al. 2015) compared global and local measures of acoustic-prosodic entrainment in English, Spanish, Slovak, and Chinese. They found commonalities across languages as well as systematic differences, using comparable corpora and comparable methods of feature extraction and statistical analysis. There is as yet no clear account of how acoustic-prosodic entrainment interacts with a language's particular prosody. Several theoretical models of entrainment have been proposed in the literature.
The most influential, Communication Accommodation Theory (Giles et al. 1991), proposes that speakers converge to (or diverge from) their interlocutors in order to attenuate (or accentuate) social differences. An alternative theory, proposed by Chartrand and Bargh (1999), suggests that humans behave like chameleons rather than apes, passively and unconsciously reflecting the social behaviour of their interlocutor. They posit that the psychological mechanism behind entrainment is the perception–behaviour link: the act of observing another's behaviour increases the likelihood of the observer's engaging in that behaviour. Pickering and Garrod (2004) propose an analogous theory of dialogue, which holds that production is automatically linked to perception through entrainment on linguistic levels of representation (phonology, syntax, pragmatics). In a more general model, they use speech entrainment findings as support for the central role of prediction in dialogue (Pickering and Garrod 2013). Kraljic et al. (2008) studied perceptual learning and production changes in an ambiguous pronunciation. They decouple the processes of perception and production and show that, at least in this domain, changes in perception did not trigger corresponding changes in production. This suggests that the link between perception and production is not automatic but can be mediated by pragmatic goals. They further suggest that the strength of this link may relate to the levels of representation activated in perception and production, since at the motor level it may be difficult to override the effect of practice. Characterizing speech features in terms of their levels of representation may help to reconcile disparate findings regarding perception, production, and pragmatic context. Motivated by the theories relating entrainment to pragmatic goals, some studies have explored the relationship between entrainment and extrinsic metrics. Entrainment on various prosodic features has been associated with numerous measures of conversational quality or positive outcomes (Lee et al. 2010; Beňuš et al. 2012, 2014a; Levitan et al. 2012; Manson et al. 2013; Thomason et al. 2013; Gravano et al. 2015; Reichel and Cole 2016; Borrie and Delfino 2017). However, many of these studies also reported negative results for other features of prosody or conversation quality. For example, the positivity of interactions between married couples in therapy was associated with entrainment on pitch but not on energy features (Lee et al. 2010). Partners who converged on speech rate were more likely to cooperate later in a prisoner's dilemma, but not more likely to evaluate each other positively (Manson et al. 2013). Partners who entrained on intensity and pitch were perceived as trying to be liked and as encouraging each other, but no correlations were found for entrainment on voice quality or speech rate (Levitan et al. 2012). These findings indicate that prosodic entrainment can be a useful signal when evaluating conversation quality. They also indicate that entrainment in different dimensions may constitute distinct phenomena and should be examined individually. The level of representation activated by a particular speech feature (Kraljic et al. 2008), or perhaps the range of 'acceptable' values available, may provide a useful framework for explaining differences in the entrainment of individual prosodic features. Additionally, given that prosody is closely linked to the pragmatics of speech, future analysis of prosodic-pragmatic entrainment behaviours, such as backchannel-inviting cues (Levitan et al. 2011) or dialogue acts (Cole and Reichel 2016), is likely to yield better cognitive models of prosody.
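The sketch below makes the notion of an entrainment measure more concrete. It computes two simple session-level quantities from one prosodic feature summarized per turn (for example, mean intensity), assuming strictly alternating turns. 'Convergence' is taken here as the Pearson correlation between exchange index and the absolute difference between partners, so a negative value means the partners become more similar over time. 'Synchrony' is taken as the correlation between each turn of speaker A and the following turn of speaker B. These definitions follow the spirit of the global and local measures used in the studies above, but the exact formulations in those studies may differ, and the data are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (needs non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def convergence(a_turns, b_turns):
    """Correlation between exchange index and |A - B| per exchange.
    Negative values: the partners' feature values grow more similar."""
    diffs = [abs(a - b) for a, b in zip(a_turns, b_turns)]
    return pearson(list(range(len(diffs))), diffs)

def synchrony(a_turns, b_turns):
    """Correlation between each A turn and the B turn that immediately follows
    it (turns assumed to alternate A, B, A, B, ...)."""
    return pearson(a_turns, b_turns)

# Hypothetical per-turn mean intensity (dB) for two speakers.
a = [70.0, 68.0, 66.0, 65.0, 64.0]
b = [60.0, 61.0, 62.0, 63.0, 63.5]
print("convergence:", round(convergence(a, b), 2))
print("synchrony:", round(synchrony([60, 65, 62, 70, 64], [61, 66, 63, 69, 65]), 2))
```

In practice such values would be compared against a baseline, such as the same speakers paired with non-partners, before being read as evidence of entrainment.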

32.3 Prosody and speaker state

Another important aspect of discourse is the state of mind or intentions of the conversants, which clearly influence not only what they say but also how they say it. Prosody plays a major role in both the production and the perception of a speaker's state of mind, including their emotional state as well as their believability or trustworthiness.

32.3.1 Prosody and emotion

A speaker's emotional state is important to recognize, for humans as well as machines. Many studies have shown that prosodic information (both low- and high-level features) plays an important role in conveying a speaker's emotional state. Emotions are often defined in terms of Ekman's classic six: disgust, sadness, happiness, fear, anger, and surprise (Ekman 1992, 1999). Ekman himself expanded these to include amusement, contempt, contentment, embarrassment, excitement, guilt, pride, relief, satisfaction, sensory pleasure, and shame (Ekman 1999). Plutchik's (2002) alternative typology, depicted in his Wheel of Emotions, instead proposed eight basic emotions. These are divided into positive and negative (valence), with variation in the degree to which each is held (activation). A simpler scheme identifies only positive or negative valence and plus or minus activation. Research on emotional speech depends on the existence of hand-annotated corpora in which one or more emotions are distinguished from neutral speech. This is a major limiting factor: where natural speech is available, annotators frequently disagree, whatever the emotion to be identified. However, acted emotion does not appear to be truly representative of natural emotion and is, in fact, much easier to identify since it is often over-acted. Enos and Hirschberg (2006) have therefore proposed using a paradigm more familiar to professional actors, in which the actors are given contexts or scripts designed to induce the desired emotions instead of simple instructions to 'sound angry' or 'sound sad'. In addition, beyond the Interspeech Challenge corpora, labelled data exist for relatively few languages and few emotions, and even fewer such corpora are available for general research use. This also makes cross-language comparison particularly difficult. Despite these barriers, emotion research has increased considerably over the past decade and a half with the introduction of annual competitions, starting with the first Interspeech Computational Paralinguistics Challenge in 2009 (http://compare.openaudio.eu). Organized by Bjoern Schuller, Anton Batliner, and colleagues, this challenge introduced the openSMILE feature extraction tool (http://audeering.com/technology/opensmile). The tool extracts low-level acoustic-prosodic descriptors such as cepstral features, pitch, energy, and voice quality. Functions are then applied to these descriptors to produce means, maxima, minima, peaks, segments, and durations, inter alia, together with a variety of functionals operating on these. The result is two feature sets, one with 384 and one with 6,552 elements.
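The two-step design described above can be sketched in plain Python: frame-level low-level descriptors first, then functionals that summarize each descriptor over an utterance. This sketch does not use openSMILE or reproduce its configuration. The descriptor names and the five functionals chosen are illustrative assumptions, and the contours are invented.

```python
from statistics import mean, pstdev

# Each functional maps a frame-level contour to one utterance-level number.
FUNCTIONALS = {
    "mean": mean,
    "max": max,
    "min": min,
    "range": lambda xs: max(xs) - min(xs),
    "stddev": pstdev,
}

def apply_functionals(llds):
    """`llds` maps a descriptor name (e.g. 'pitch') to its per-frame values.
    Returns a flat dict with one '<descriptor>_<functional>' entry per pair."""
    features = {}
    for name, contour in llds.items():
        for fname, func in FUNCTIONALS.items():
            features[f"{name}_{fname}"] = func(contour)
    return features

# Hypothetical three-frame contours.
feats = apply_functionals({"pitch": [180.0, 200.0, 220.0], "energy": [0.5, 0.7, 0.6]})
print(len(feats), feats["pitch_range"])
```

Because every functional is applied to every descriptor, feature counts multiply quickly. For example, 16 descriptors plus their first derivatives under 12 functionals already give 16 × 2 × 12 = 384 features.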
The openSMILE tool has allowed researchers to examine the same acoustic-prosodic characteristics across a wide variety of speaker states, listed here with the challenge year(s) in which each was addressed:

- the classic and derived emotions (2009, 2013);
- age, gender, and degree of interest (2010);
- intoxication and sleepiness (2011);
- personality, likability, and pathology (2012);
- social signals such as laughs and fillers, degree of conflict, and autism (2013);
- cognitive load and physical load (2014);
- degree of 'nativeness' of language, Parkinson's condition, and eating disorders (2015);
- deception, sincerity, and native language (2016);
- distinguishing adult speech addressed to a child from speech addressed to another adult, speech affected by a cold, and snoring classification (2017).

These challenges usually provide as a baseline the same set of openSMILE features typically used in most emotional speech research. Findings for the languages with available corpora indicate that low-level acoustic-prosodic features, such as those extracted by the openSMILE libraries, are more useful than higher-level ones in characterizing differences in valence and activation as well as between different emotions. Mel frequency cepstral coefficients (MFCCs) are among the most useful features for distinguishing emotions. They summarize the short-term spectral envelope on the mel scale, a frequency scale modelled on human pitch perception across the audible range (roughly 20 Hz to 20 kHz). Features such as pitch, intensity, speaking rate, and voice quality are particularly useful in detecting activation. Higher-level prosodic features such as intonational contour have rarely been found to distinguish different emotions in terms of valence. However, Liscombe et al. (2003) did find that, for the Linguistic Data Consortium EPSAT corpus of acted emotional speech (Liberman et al. 2002), a final plateau contour was associated with negative emotions while a final fall characterized positive emotions.
Lexical features typically provide some improvement in emotion detection for natural speech. For some emotions, however, such as sadness, acoustic-prosodic features can be more important than lexical features (Devillers and Vidrascu 2006). There is also evidence that context can be important in identifying emotions such as anger and frustration. For example, a gradual increase in features such as mean pitch and intensity, combined with a decrease in speaking rate, has been found to be useful in identifying anger (Liscombe et al. 2005). A sliding window of cepstral, prosodic, and text features has also been found to support real-time recognition of valence (Mishra and Dimitriadis 2013). Sociolinguistic studies of the role of prosody in affective speech suggest that cultural factors and the age and gender of the speaker also affect the production and perception of emotion (Hancil 2009). Due to the lack of labelled corpora in many languages, cross-lingual approaches to emotion detection have become increasingly popular (Feraru et al. 2015). These approaches include training on one language to detect the same emotion in another, as well as training multilingual classifiers that can handle unseen languages (Polzehl and Metze 2010). Their results do not yet equal those of systems trained and tested on the same language, but they offer a promising means of identifying emotions when such data are not available. They also illustrate the importance of acoustic and prosodic features in emotion detection when lexical information may be unavailable. Multi-modal approaches to emotion detection have combined acoustic-prosodic information with information from facial expression and body gestures. Since 2011, the AVEC (Audio/Visual Emotion Challenge) competitions have brought together researchers to examine audio and video features, alone or in combination, for large multi-modal corpora such as the Semaine (McKeown et al. 2010), Recola (Ringeval et al. 2013), SimSensei (DeVault et al. 2014), and DAIC-WOZ (Gratch et al. 2014) corpora, and the SEWA Database (https://db.sewaproject.eu). Most contestants have found prosody to play an important role in the multi-modal context. For example, the winner of the 2016 competition (Brady et al. 2016) used the Recola corpus with features based on vocal effort (captured by measuring loudness and spectral tilt), on variation in intonation (pitch range and standard deviation), and on speaking rate.
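The finding that gradual changes across turns help to identify anger (Liscombe et al. 2005) suggests tracking trends over a window of recent turns rather than single-turn values. The sketch below is a hypothetical illustration, not the method of that study. Per-turn mean pitch, intensity, and speaking rate are summarized by least-squares slopes over a sliding window, and a window is flagged when pitch and intensity rise while rate falls. A practical system would more likely feed such trend features into a trained classifier than apply a hard rule.

```python
def slope(values):
    """Least-squares slope of `values` against their positions 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def rising_arousal_flags(turns, window=4):
    """For each window of `window` consecutive turns (window >= 2), flag whether
    mean pitch and intensity trend upwards while speaking rate trends downwards.
    `turns` is a list of dicts with 'pitch', 'intensity' and 'rate' keys."""
    flags = []
    for end in range(window, len(turns) + 1):
        recent = turns[end - window:end]
        pitch_up = slope([t["pitch"] for t in recent]) > 0
        intensity_up = slope([t["intensity"] for t in recent]) > 0
        rate_down = slope([t["rate"] for t in recent]) < 0
        flags.append(pitch_up and intensity_up and rate_down)
    return flags

# Hypothetical per-turn values: pitch (Hz), intensity (dB), rate (syll/s).
turns = [
    {"pitch": 180, "intensity": 60, "rate": 5.0},
    {"pitch": 182, "intensity": 61, "rate": 4.9},
    {"pitch": 190, "intensity": 63, "rate": 4.6},
    {"pitch": 200, "intensity": 66, "rate": 4.2},
    {"pitch": 185, "intensity": 60, "rate": 5.1},
]
print(rising_arousal_flags(turns))
```

Here only the first window shows the combined rise in pitch and intensity with falling rate. In the second window, intensity no longer rises.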

32.3.2 Prosody and deception

Deception is generally defined as a ‘deliberate choice to mislead a target without any notification’ (Ekman 2009: 41) to gain some advantage or to avoid some penalty. This definition excludes theatre as well as falsehoods due to ignorance, self-deception, delusion, or pathological behaviour. ‘White lies’ (e.g. ‘I love your new jacket.’) are hard to detect, largely because the hearer is happy to believe them true. ‘Serious lies’, by contrast, are thought to be easier to identify. Researchers hypothesize that our cognitive load increases when we lie, because it is difficult to keep the false story straight and to remember what we have and have not said. Our fear of detection also increases if we believe our target is hard to fool or is suspicious of us, and if the stakes of being discovered are high, such as serious rewards and/or punishments. It is thought that all of these factors can make it hard for us to control indicators of deception, possibly making serious lies easier to detect. Researchers in multiple disciplines, including psychology, computer science, linguistics, and criminology, have studied deception for many years. They have examined physiological cues (Horvath 1973), micro-expressions of the face (Ekman et al. 1991; Frank and Svetieva 2015), and body gestures (Burgoon et al. 2014), as well as lexical cues and, more recently, brain activation and body odour. Previous work on language cues has examined text-based cues to deception in various domains, including criminal testimonies (Bachenko et al. 2008; Fornaciari and Poesio 2013; Pérez-Rosas et al. 2015), hotel reviews (Ott et al. 2011), and opinions about controversial topics such as abortion (Newman et al. 2003). This line of research shows that liars use less cognitively complex language, fewer self- or other-references, and more negative emotion words. These patterns are attributed to feelings of guilt and the increased cognitive load of telling false stories.

PROSODY IN DISCOURSE AND SPEAKER STATE 475

There is also some literature, and much practitioner lore in law enforcement and the military, on auditory and lexical cues to deception. The most widely mentioned cues are response latency, filled pauses, coherence of discourse, passive voice, and use of contractions (Reid 2000; Adams and Jarvis 2006; Bachenko et al. 2008). Of the scientific studies that have examined acoustic and prosodic cues, Ekman et al. (1991) found a significant increase in pitch for deceptive speech over truthful speech. Streeter et al. (1977) reported similar results, with stronger findings for more highly motivated subjects. Voice stress analysis procedures have attempted to use low-level indicators of stress as indirect indicators of deception. Commercial systems promise to distinguish truth from lie in this way (or love from indifference), with little independent evidence of success. A meta-study by DePaulo et al. (2003) included many of the proposed cues to deception. Of the vocal cues examined, this study found that three had been demonstrated to indicate deception: vocal indicators of uncertainty (thought to indicate that liars are less ‘compelling’), vocal tension, and pitch indications of liar tenseness. Vocal indicators of ‘involvement’ and ‘immediacy’ had not. In another meta-analysis, of 41 studies, Sporer and Schwandt (2006) examined nine potential cues to deception: speaking rate, response latency, message duration, number of words, filled and unfilled pauses, repetitions, speech errors, and pitch. Of all these cues, they found only pitch and response latency to be reliably associated with deception, both showing increases during lying compared to truth-telling. Kirchhübel et al. (2013) examined temporal cues to deception in data from 19 subjects, including speaking rate, response onset time, and the frequency and duration of hesitation markers. In the deceptive condition, they found a significant increase in rate, a significant decrease in response onset time, and a reduction in hesitation phenomena. They concluded that deceptive speech shows an overall acceleration of speaking tempo. While most studies of deception focus on native speakers of English, some work has examined deceptive behaviour in other cultures. In a perception study by Cheng and Broadhurst (2005), Cantonese–English bilinguals were more often judged as deceitful when they spoke in their second language, English, than when they spoke in their first language, Cantonese. This held regardless of whether they were telling the truth or lying. When speaking in English, they displayed more visual cues to deception (e.g. hand and arm movements) and more audio cues to deception (e.g. changes in pitch). Examining pitch, response latency, and speaking rate in Italian speech, Spence et al. (2012) found significant correlations of deception with the latter two features. As one might expect, speaking rate was slower and response latency longer during deception. Many studies thus indicate that there are auditory cues to deception. Even so, there has been little systematic examination of these cues in large-scale corpora, largely owing to the lack of cleanly recorded corpora of deceptive and non-deceptive speech. Two exceptions are the Columbia SRI Colorado (CSC) corpus and the Columbia X-Cultural Deception (CXD) corpus. The CSC corpus (Hirschberg et al. 2005) was the first large-scale corpus of deceptive speech in English, with around seven hours of within-subject deceptive and non-deceptive speech.
Thirty-two subjects were asked to perform a series of tasks and then to lie about their performance in a subsequent recorded interview. During the interview, they indicated by pedal presses whether each utterance was true or false. The corpus was transcribed and aligned, and acoustic-prosodic as well as lexical features were extracted. These were analysed statistically and also used in machine learning experiments designed to automatically distinguish deceptive from non-deceptive speech. Extracted features included:

- raw and normalized duration (phone, vowel, syllable, word, phrase, and utterance);
- speaking rate, pause length, and speech-to-pause ratio;
- energy;
- pitch (stylized contours, range, slope, min, mean, and max);
- spectral tilt;
- numerous lexical features;
- laughs, audible breaths, and other ‘noise’.

With these features, it was possible to distinguish deceptive from non-deceptive speech with 70% accuracy. Human judges, in contrast, performed worse than chance. However, the judges differed significantly in performance, and this difference correlated significantly with their personality profiles. This raised the question of whether personality profiles might also help identify individual differences between subjects. Pitch and energy features did contribute to classification performance. Notably, though, subjects differed individually in how their prosody changed between the two conditions, as practitioners have long believed. For example, some subjects raised their pitch when lying while others lowered it. Acoustic-prosodic features were therefore more useful for predicting deception in individual subjects than across the subject population as a whole. The second corpus was designed to address this issue by collecting information on subjects’ personality using the NEO Five-Factor Inventory (NEO-FFI) (Costa and McCrae 1989). This information is now being used to predict subjects’ deceptive and non-deceptive behaviour, with results that show improved deception-detection performance (An et al. 2018).
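The observation that some subjects raised their pitch when lying while others lowered it can be made concrete with invented numbers. The minimal Python sketch below assumes one mean-f0 value per utterance and uses toy data, not anything from the CSC corpus. Speaker "A" raises pitch when lying and speaker "B" lowers it by the same amount. As a result, the pooled lie-minus-truth difference cancels to zero, while each speaker's own difference remains clearly non-zero.

```python
import statistics
from typing import List, Optional, Tuple

# (speaker id, condition, mean f0 of the utterance in Hz); toy values only
Row = Tuple[str, str, float]

def lie_minus_truth(rows: List[Row], speaker: Optional[str] = None) -> float:
    """Mean f0 of 'lie' utterances minus mean f0 of 'truth' utterances,
    for one speaker or, if speaker is None, pooled over all speakers."""
    selected = [r for r in rows if speaker is None or r[0] == speaker]
    lie = [f0 for _, cond, f0 in selected if cond == "lie"]
    truth = [f0 for _, cond, f0 in selected if cond == "truth"]
    return statistics.mean(lie) - statistics.mean(truth)

rows: List[Row] = [
    ("A", "truth", 200.0), ("A", "truth", 200.0), ("A", "lie", 220.0), ("A", "lie", 220.0),
    ("B", "truth", 120.0), ("B", "truth", 120.0), ("B", "lie", 100.0), ("B", "lie", 100.0),
]

print(lie_minus_truth(rows))               # 0.0: the pooled pitch cue cancels out
print(lie_minus_truth(rows, speaker="A"))  # 20.0: speaker A raises pitch when lying
print(lie_minus_truth(rows, speaker="B"))  # -20.0: speaker B lowers pitch when lying
```

This is why a population-level classifier can miss a cue that is informative for each individual. It also shows why speaker-specific information, such as the personality data collected for the second corpus, may help.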

32.4 Conclusion

In this chapter, we have summarized the importance of prosody in discourse and in speaker state. In turn-taking, prosody plays a major role in indicating whether speakers are retaining or relinquishing the floor. It also signals whether their conversational partners are attempting to take the floor or simply inviting the current speaker to continue. Prosody also plays a major role in the overall progress of a conversation, as a signal on which interlocutors can entrain, which is a significant factor in discourse quality. It is also a valuable indicator of speakers’ states of mind in discourse. What sorts of emotions are conversational partners displaying through their prosody? Are these conversants actually being truthful or deceptive? Future research should examine the role of speaker state in conversation more thoroughly, for example to determine whether people also tend to entrain in their emotional state when speaking together. Conversants’ perception of their partner’s emotional state should also be the subject of more study. In terms of deceptive speech, the characteristics of speech that are trusted or mistrusted (whether justifiably or not) also constitute an important area of future research. Altogether, there are many more areas where discourse and speaker state interact. Understanding them will be important in producing the spoken dialogue systems of the future.

Chapter 33

Visual Prosody Across Cultures

Marc Swerts and Emiel Krahmer

33.1 Introduction

Human communication proceeds via multiple channels. In their daily encounters with others, people not only talk but also make things clear through facial expressions, gestures, body posture, and the physical distance they keep from their dialogue partner (Knapp and Hall 2009). That interlocutors use such visual cues in addition to their speech appears logical, as they can hear and see each other during many of their interactions. It would be a missed opportunity not to exploit the visual modality as an extra communicative resource. Somewhat surprisingly, however, the vast majority of studies of people’s interactions have been restricted to analyses of the auditory channel, that is, of the sounds speakers produce and the way these are perceived by their addressees. The same holds for the study of (auditory) prosody, which has long been treated as a purely acoustic phenomenon. It is usually taken to comprise auditorily observable features such as intonation, tempo, rhythm, voice quality, and loudness, which can in principle be measured from the speech signal alone. Much research has been devoted to the various ways in which such auditory features can add information to the spoken content of a message that is not necessarily expressed via the words and the syntax, as seen in chapters 30, 31, and 32. In this chapter, we explore the visual counterparts to auditory prosody, study how the two modalities relate to each other, and ask to what extent the combined use of modalities varies as a function of culture.

33.2 What is visual prosody?

Throughout this chapter, we use the term ‘visual prosody’ to refer to all of the above visually perceivable correlates of auditory prosody, including facial expressions (such as eyebrow movements, lip shapes, etc.), manual gestures, and body posture. When auditory and visual cues work in tandem (e.g. rising intonation and raised eyebrows) for communicative purposes (e.g. indicating that an utterance should be interpreted as a question), we also use the term ‘audiovisual prosody’, following Granström and House (2004). A key feature of audiovisual prosody is that the auditory and visual cues are temporally aligned. The link between the two modalities can, however, be more or less strict, depending on the cues in question and their communicative function. (Audio)visual prosody is related to, for example, gesture studies (e.g. McNeill 1992; Kendon 1997) and non-verbal communication (e.g. Mehrabian 1972; Knapp and Hall 2009). It is more specific than either, however, because it focuses on the interrelation with speech. New methods have emerged to record, analyse, and (re-)synthesize the visual appearance of people. With them, it has become increasingly evident that visual prosody may serve communicative functions similar to those of auditory prosody. It can signal, for instance:

- whether information is important or not (Dohen et al. 2004; Swerts and Krahmer 2010);
- whether or not a speaker intends to continue or abandon a speaking turn (Argyle and Cook 1976; Barkhuysen et al. 2008);
- whether a speaker wants to provide negative or positive feedback (Granström et al. 2002);
- whether someone is declaring or asking something (Srinivasan and Massaro 2003; Borràs-Comes and Prieto 2011);
- whether a person is emotional or affective (Ekman and Friesen 1975; Rilliard et al. 2009), ironic (Attardo et al. 2003), uncertain (Krahmer and Swerts 2005; Swerts and Krahmer 2005), incredulous (Sendra et al. 2013), surprised (Visser et al. 2014), or truthful or deceptive (DePaulo et al. 2003).

These are just a few of its functions. As a result, researchers have gradually started to depart from an almost exclusive focus on what can be heard. They have become more interested in speaker cues that can be seen and that are communicatively relevant.

478 MARC SWERTS AND EMIEL KRAHMER
This audiovisual perspective on human communication fits well with what has become known more broadly as ‘multisensory processing’ (e.g. de Gelder and Bertelson 2003; Stein 2012). This view maintains that we perceive the world through a combination of visual, auditory, haptic, and olfactory signals. Our appreciation of a meal is determined not only by the taste and the smell of the food but also by the music we hear in the background, the decor of the restaurant we see, and the softness of the chair on which we are seated. Similarly, it is an apophthegm that we use not only our auditory senses when trying to understand speech; our visual senses also contribute. They can influence intelligibility in adverse acoustic conditions (e.g. Benoît et al. 1996; Jordan and Sergeant 2000). They can also alter what is heard in the case of incongruencies: an auditory /ba/ combined with a visual /ga/ is perceived as /da/ by most people (McGurk and MacDonald 1976; Skipper et al. 2007). These examples highlight that, depending on the communicative setting, having multiple modalities and senses can be helpful in a variety of ways. First, one channel may be especially useful in that it can compensate for problems with another channel, for example because that other channel is noisy or deficient. At a cocktail party, for instance, other people are babbling and objects clatter in the background. Addressees can then profit from visual cues in the face of a dialogue partner who is talking, as movements of that person’s lips and head improve speech intelligibility (e.g. Sumby and Pollack 1954; Jordan and Sergeant 2000; Davis and Kim 2007). Conversely, in situations with low or even no visibility, such as a chat in a poorly lit room (or on the phone), it is obviously useful that speakers can rely more on auditory information.
Second, an additional advantage of being able to access multiple channels is that their use can be distributed over speaking partners, so that they can exchange information in parallel. For instance, while one person is talking, the other can return visual feedback, such as affirmative head nods or expressions of surprise or misunderstanding. Because such feedback is produced in silence, it does not interfere with the other person's speech, yet it indicates how the addressee is processing the information (Barkhuysen et al. 2008). If dialogue partners instead produced speech simultaneously, the overlapping sound streams might well cause miscommunication, because one person's speech might mask the other's. Despite such clear cases, however, we still lack a complete picture of the relative importance of the various audiovisual cues supporting or steering human communication.

VISUAL PROSODY ACROSS CULTURES 479

33.2.1 How do auditory and visual prosodic cues relate to each other?

It will be intuitively clear to most people that audiovisual prosody plays a significant role in daily interaction. In fact, judging by popular wisdom, people may well attach too much importance to its role in communication. We regularly encounter statements (typically unattributed) to the effect that non-verbal communication (broadly construed) accounts for some 90% (or a comparable figure) of the message. Such statements probably go back to work by Mehrabian and colleagues in the second half of the 1960s (e.g. Mehrabian and Ferris 1967; Mehrabian and Wiener 1967). They studied how people judged a speaker’s general attitude, which could be positive, negative, or neutral. Judges based their assessments on possibly conflicting (i.e. incongruent) verbal, intonational, and facial cues. These studies found that the relative weights of these three factors were .07, .38, and .55, respectively (totalling .93 for non-verbal cues). The applicability of this result has been stretched beyond recognition (Trimboli and Walker 1987). Even so, the notion that non-verbal cues such as intonation and facial expressions are important for communication is in itself uncontroversial. However, the relative importance of different auditory and visual cues is far from transparent, and may additionally vary across different aspects of communication. Intuitively, the relative contribution of different modalities may differ between linguistic and paralinguistic functions. Linguistic functions exploit audiovisual prosody to mark structurally relevant information in a spoken discourse. When speakers produce sequences of utterances, they tend to indicate how those utterances are connected.
As Ladd and Cutler (1983) pointed out, auditory prosody is particularly useful for segmenting the flow of speech into separate chunks of information, such as topical units, and for highlighting fragments that need to stand out, because they represent new or contrastive information. Considerable research has been done on how boundaries between units are marked auditorily through pauses, boundary tones, and pitch resets, and on how particular stretches of speech are made more prominent via acoustic cues (Ladd 2008b). There is evidence that such discourse-structural information can also be marked via facial expressions. For instance, the ends of a speaking turn have been shown to be accompanied by variations in eye gaze, with speakers averting their gaze during a speaking turn and returning the gaze to their partner when they want to end their speaking turn (Argyle and Cook 1976). In a similar vein, it has been shown that beat gestures can function as markers of syntactic structure (e.g. Biau and Soto-Faraco 2013; Guellaï et al. 2014). Likewise, pitch accents tend to be aligned with various kinds of beat gestures, such as abrupt movements of eyebrows or of the hands, or head nods (Krahmer and Swerts 2007; Ambrazaitis and House 2017; Esteve-Gibert et al. 2017).


Figure 33.1 Visual cues reflecting a positive (a) and a negative (b) way to produce the utterance ‘My boyfriend will spend the whole summer in Spain’.

However, the evidence so far suggests that these visual cues tend not to be as important as the auditory markers; that is, addressees tend to pay comparatively more attention to the auditory cues for those kinds of phenomena (Swerts and Krahmer 2008; Gambi et al. 2015). The situation appears to be different for ‘paralinguistic functions’, when audiovisual cues refer to attitudinal or emotional meanings, such as irony, sarcasm, or a depressed or angry mood. Various studies have shown that intonation, rhythm, and voice quality can signal the emotional or attitudinal state of a speaker (see e.g. Scherer 2003 for a survey). In this area, facial expressions are particularly important. Since Darwin (1872), it has been observed that faces are ‘windows to the soul’ that externalize the inner feelings of a person most clearly (Ekman and Friesen 1975), even to the extent that these may overrule whatever meaning is conveyed through words or other auditory features. A different facial expression can change the interpretation of an utterance like ‘My boyfriend will spend the whole summer in Spain’ from a positive (Figure 33.1a) to a negative (Figure 33.1b) statement.

33.2.2 Is there cultural variability in (audio)visual prosody?

Of course, any claim about the relative importance of audiovisual prosody is somewhat premature when it is based only on data from a limited set of language communities. Unfortunately, as in many other areas of linguistic research, most of our current insights into prosody stem from analyses of only a handful of languages, with a strong bias towards English. As a result, we still lack insight into which aspects of audiovisual prosody are culture-specific and which are universal. Darwin’s (1872) claim that some non-verbal cues to emotion are similar across the globe has been hotly debated. Perhaps not surprisingly, the current consensus seems to be that there are both universal and culture-specific aspects (e.g. Russell 1994; Kita 2009; Gendron et al. 2014; Chen and Jack 2017). For auditory prosody, it has already been shown that there can be differences both between and within languages. For instance, languages differ in the way they use intonation to mark prominent information, to segment utterances into smaller units, or to differentiate utterance categories (Ladd 1996). Along the same lines, it is worth exploring whether speakers from various communities differ in the way they use audiovisual prosody for communicative purposes. Importantly, ‘culture’ is a complex concept (Berry et al. 2002). At a general level, it may refer to a shared set of beliefs, attitudes, and practices that characterize certain social, religious, or ethnic groups, but the concrete implications of that definition are not always clear. Applied too simplistically, the concept can lead to an underestimation of individual variation in personality, age, gender, and other personal features, which may affect the use of audiovisual prosody. With this caveat in mind, in the remainder of this chapter we focus mainly on regional variation in visual expressions.

33.3 Three case studies

The study of cross-cultural variation in the usage and function of audiovisual prosody is still in its infancy. In this section we sketch three case studies from our own lab, as examples of the kind of research that can be done in this emerging field. They address cues to feeling-of-knowing (Dutch vs. Japanese) (§33.3.1), correlates of losing and winning situations (Dutch vs. Pakistani) (§33.3.2), and gestural references to the concept of time (Chinese vs. English) (§33.3.3). Each study illustrates a different aspect of cross-cultural research on audiovisual prosody. The first focuses on how adult speakers of different cultures convey information via multiple modalities. The second zooms in on cross-cultural similarities and differences in the expressiveness of audiovisual prosody from a developmental perspective. The third addresses the influence of language and culture on visual prosody, especially gesture.

33.3.1 Cues to feeling-of-knowing in Dutch and Japanese

Conversations can be viewed as joint actions that depend on contributions from all dialogue partners (Clark 1996). Indeed, addressees, while listening to a speaker, do not simply decode the incoming messages but also tend to signal to their dialogue partner how that information is being processed. This is illustrated, for example, in quiz-like situations where people are asked a set of questions about a wide range of topics. From their responses, it usually becomes clear whether someone feels certain about the correctness of the answer, is simply guessing, or has no clue at all as to what the right response should be. Such experiences relate to what has become known as ‘feeling-of-knowing’ (FOK). FOK is the ability of people to judge whether or not they know the answer to a factual question, even when they cannot immediately retrieve the correct information from memory (the so-called tip-of-the-tongue phenomenon). Previous work (Smith and Clark 1993; Brennan and Williams 1995) has shown that speakers of English mark the degree of their FOK by auditory features of (un)certainty. For instance, when speakers felt uncertain, they tended to produce their response after some delay, often accompanied by a filled pause such as uh or uhm and by question intonation. We showed (Swerts and Krahmer 2005) that Dutch speakers used these auditory cues as well. They also distinguished uncertain responses from certain ones by means of marked facial expressions, such as a funny face or dynamic movements of eyebrows and head. These cues did not always occur together or at the same time, but we found that the more uncertain a speaker was, the more (auditory and visual) cues he or she used. For that study, we copied the quiz-like procedure of earlier studies on English, except that we also videotaped our participants, which the earlier English studies had not done. We also showed that observers made use of such audiovisual features to determine another person’s FOK quite accurately. We later conducted a similar study with Japanese participants, together with Atsushi Shimojima and with the help of Mark Müskens and Bart de Waal as part of their MA thesis work. Japanese and Dutch have been argued to represent opposing cultures, with the former scoring significantly higher than the latter in terms of uncertainty avoidance and power distance (Hofstede 2001). Interestingly, these two features have been said to be positively correlated with face-saving strategies (Merkin et al. 2014). One would therefore expect Japanese speakers to show clearer cues that they feel uncertain about the correctness of a response. At the same time, however, speakers of Japanese, being members of a collectivistic culture, are reportedly low in self-disclosure for fear of upsetting social harmony (Miyahara 2000). This would instead predict that their cues to uncertainty are less clearly marked, as Japanese speakers may be more inclined to maintain a ‘poker face’. Our findings are in line with the second hypothesis: the audiovisual correlates of FOK were far less obvious for Japanese participants than for Dutch participants. There was, however, a gender difference, as our female Japanese participants were more transparent about their FOK than their male counterparts.
In line with these observations, the Japanese recordings also turned out to be more difficult to judge in terms of a speaker’s uncertainty level, irrespective of whether observers were Dutch or Japanese, while both groups of judges found it easier to assess the FOK of the Dutch participants.

33.3.2 Correlates of winning and losing by Dutch and Pakistani children

The second case study looks into developmental aspects of audiovisual prosodic responses to emotional situations, in particular winning or losing during game play. If one ever needs examples of how people externalize their inner feelings, it suffices to observe recordings of players (or their fans) during a sports event. This is even true for facial expressions produced by blind athletes (Matsumoto and Willingham 2009), suggesting that these expressions may, at least to some extent, be genetically determined. Games are played and watched across the globe by people in different age groups and with different backgrounds, personalities, and skills. They therefore offer a potentially useful resource for studying developmental, historical, and variational aspects of non-verbal expressions. For example, Shahid et al. (2008) studied expressions of Dutch and Pakistani children after they had just won or lost a game. They used a digitized card game in which children were shown rows of cards on a computer screen, where initially only the leftmost card was visible. The goal of the game was to guess whether the next card in the row would reveal a higher or lower number than the one that had already been shown. Children would win a game if all cards in a row were correctly guessed, and would lose otherwise. Shahid and colleagues constructed the card sequences in such a way that rational choices would naturally lead to winning or losing events. Children played the game either alone or in pairs, in which two teammates were jointly responsible for the consecutive guesses. Using an identical procedure for both cultures, the researchers cut clips from the video recordings of the participating children. Each clip showed a child's immediate reaction to the final card after it had revealed whether they had won or lost the guessing game. These video segments, with the sound removed, were then shown to Dutch observers, who had to decide whether a clip had been taken from a winning or a losing context. A comparative analysis revealed that children of both cultures responded similarly in important respects. Adult judges found it easier to distinguish losing from winning events from the expressions of children in the together condition than from those in the alone condition. This is in line with claims that co-presence makes children more expressive. In addition, for both cultures, younger children (8 years old) showed clearer audiovisual responses correlating with game status than older children (12 years old). This is consistent with the general idea that emotions are increasingly internalized with age. However, children from the two cultures responded differently in other respects. First, the Pakistani children were more expressive than the Dutch ones, as their reactions were easier to classify in a judgement study. Second, judges found Dutch children’s reactions to a losing outcome easier to classify than their winning ones, while for the Pakistani children, in contrast, winning reactions were easier to classify. It remains an open question how these findings can be explained and to which cultural factors (religious, philosophical, or educational) these differences can be related.
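The logic of the higher/lower card game, and of sequences built so that rational guesses lead to a predetermined win or loss, can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the software used by Shahid et al. (2008). It assumes cards numbered 1 to 10 and treats a "rational" guess as betting on whichever side has more possible values. It also counts a tie as a wrong guess. Under these assumptions, a designer can fix the outcome in advance by choosing whether each next card agrees with the rational guess.

```python
from typing import List

LOW, HIGH = 1, 10  # assumed card range; not specified in the original study description

def rational_guess(card: int) -> str:
    """Bet on the side with more possible values above or below the visible card."""
    return "higher" if (HIGH - card) > (card - LOW) else "lower"

def outcome(row: List[int]) -> str:
    """Play a row left to right with rational guesses; any wrong guess loses the game."""
    for current, nxt in zip(row, row[1:]):
        if rational_guess(current) == "higher":
            correct = nxt > current
        else:
            correct = nxt < current  # a tie counts as wrong in this sketch
        if not correct:
            return "lose"
    return "win"

# Rows constructed so that rational play ends in a win or a loss.
print(outcome([2, 8, 3, 9]))  # win: every rational guess is confirmed
print(outcome([2, 8, 3, 1]))  # lose: the final card contradicts the guess 'higher'
```

Making only the final card contradict the rational guess keeps the game suspenseful until the end. The children's reaction to that last card is what the video clips captured.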

33.3.3 Gestural cues to time in English and Chinese

The third and final case study compares the way speakers from different cultures use gestures when describing temporal concepts such as ‘past’ or ‘future’. When people refer to time, they often express those references metaphorically in terms of spatial concepts. For instance, in English or Dutch, people often talk about the future as something that lies ahead of them, while the past is situated somewhere behind their back (Casasanto and Boroditsky 2008). These concepts can be expressed in speech (‘We look forward to seeing you’, ‘Bad times lie behind us’) but also in accompanying non-verbal cues, such as (metaphorical) gestures. These gestures may not have direct acoustic correlates, so it is debatable whether they belong to audiovisual prosody narrowly defined. We discuss them here nonetheless, because they illustrate how visual cues may differ across languages in sometimes interesting and unexpected ways. Many cultures use the sagittal (front–back) axis to express temporal concepts, placing the future in front. There are counterexamples, however, such as the Aymara, who tend to view the future as something that is behind their back (Núñez and Sweetser 2006). More generally, cultures may vary in their general temporal focus; for example, Spaniards are reportedly more future-oriented than Moroccans (De la Fuente et al. 2014). When presented with future-focused statements such as ‘Technological and economic advances are good for society’, Spaniards expressed a higher level of agreement than Moroccans. In addition to the sagittal axis, many languages also use the horizontal axis, with English and Dutch speakers tending to locate the past on the left and the future on the right.

484 MARC SWERTS AND EMIEL KRAHMER

In addition to this horizontal axis, speakers of Chinese also use the vertical axis, with a timeline that runs from top (past) to bottom (future) (Boroditsky 2001). Such differences may have originated from differences in exposure to specific writing systems, which are mostly horizontally oriented in Western cultures but were for a long time vertical in Chinese, although this system now exists alongside a left-to-right orientation. In this connection, we note that Hebrew speakers experience time as developing from right to left, in line with a writing direction that is the reverse of that used for English and Dutch (Fuhrman and Boroditsky 2010). Many previous studies have used metacognitive tasks to determine how people conceptualize time, for instance by inviting participants to reflect very explicitly on how they use space to express time. The study by Gu et al. (2017a) differs in that respect, as it looked at the spontaneous gestures Chinese participants produced while describing temporal events. The authors used a task that was ostensibly set up as an experiment to explore how accurately addressees would remember items from speakers’ descriptions. To this end, speakers were given lists of semantically related words; the researchers were specifically interested in the lists that contained temporal references (e.g. yesterday, today, tomorrow). Some of these temporal words contained explicit vertical references (such as the Chinese equivalent of the word ‘up’, which can refer to the past), while the temporal references in other lists contained no such spatial metaphors.
In a first experiment, Gu and colleagues observed that Chinese–English sequential bilinguals often produced vertical temporal gestures when describing temporal word lists for both native Chinese and native English/Dutch addressees, regardless of whether they did the task in Chinese or English, but that they produced more vertical gestures when talking about Chinese time references with vertical spatial metaphors, especially when they did the task in Chinese. In a subsequent perception test, in which judges had to rate how well gestures in video recordings matched lexical expressions, the researchers found that Chinese listeners (again, all bilinguals) preferred vertical gestures to lateral gestures when perceiving time references with vertical spatial metaphors. This bias towards vertical gestures persisted, albeit to a lesser extent, when they rated English translations of those Chinese stimuli. These results support earlier findings that cultures may differ in the way temporal references are mapped onto spatial concepts. That variation is partly reflected in the way members of those cultures metaphorically express time in the words they use.

33.4 Discussion and conclusion

In recent years, there has been increased awareness among intonologists (i) that there are visual counterparts to auditory prosodic cues, (ii) that audiovisual cues can be informative in a range of linguistic and paralinguistic settings, (iii) that the relative importance of auditory and visual cues can differ per function, and (iv) that there may be cultural differences in how such cues are deployed by interlocutors. Even if it remains hard to provide an exact operationalization of ‘culture’, we can be certain that speakers from different parts of the world are alike in that they all use visual cues, in particular facial expressions and gestures, to signal pragmatically relevant information. However, just as there can be variation between communities in their use of auditory prosodic features, their speakers can also differ in the way certain visual features are connected to communicative functions. Some of the variation between cultures appears to be gradient in nature. For instance, when Japanese and Dutch speakers intend to make it clear whether they feel certain or uncertain about a response they give to a question, both use comparable audiovisual features, but the Dutch speakers are more expressive than their Japanese counterparts. Likewise, we saw that both Pakistani and Dutch children indicated visually whether they had won or lost a game, but differed in that the expressiveness of Pakistani children was biased towards winning events, while Dutch children became more expressive after having lost a game. Other variation between cultures in the use of visual features is more categorical, even to the extent that certain form–function relations may be reversed. While many languages tend to express time in terms of spatial metaphors, they can differ markedly in how they do this, with Chinese speakers, for instance, being more inclined to use a vertical axis. This in turn explains why they also gesture along this dimension to refer to past and future events, something almost never observed in data obtained from English-speaking participants. Interestingly, these possible relations between auditory and visual cues are reminiscent of how languages can differ in paralinguistic or gradient form–function mappings in auditory prosody (e.g. Chen et al. 2004a). While the case studies we described are merely examples from an admittedly small (but growing) set of studies, they do illustrate that it would be highly interesting to explore, for a wider range of functions, how cultures resemble or differ in their use of auditory and visual features.
This will teach us more about human communication and its functions, by highlighting which audiovisual prosodic skills we all share and which may be unique to our particular language and culture. This kind of knowledge also has practical implications, for instance for systems that automatically try to detect personality or emotion from verbal and non-verbal cues (e.g. Mairesse et al. 2007; D’Mello and Graesser 2010), which so far have not taken cultural variation into account at all.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 34

Pathological Prosody: Overview, assessment, and treatment

Diana Van Lancker Sidtis and Seung-Yun Yang

34.1 Introduction

Prosody plays a prominent role in a range of expressive and receptive communicative functions, signalling linguistic, attitudinal, affective, personal, and pragmatic information (e.g. Lehiste et al. 1976; Gussenhoven 1984, 2004; Ladd et al. 1985; Bolinger 1989; Pierrehumbert and Hirschberg 1990; Sluijter and Terken 1993; Wichmann 2000; Keating 2006). Pitch (f0, or fundamental frequency), temporal parameters, intensity, voice quality, and phonetic details all contribute to prosodic material (Schirmer et al. 2001; Raphael et al. 2007). It is almost impossible to produce an utterance with truly ‘neutral’ prosody, and listeners can creatively infer prosodic meanings from any utterance, whether or not the speaker intended them. Some cues are used similarly across languages, while others are used in a language-specific way (e.g. Breitenstein et al. 2001a; Chen et al. 2004a; Abdelli-Beruh et al. 2007). A salient example is sarcastic intonation, which signals a meaning opposite to that expressed in the words (Shapely 1987; Milosky and Wrobleski 1994; Attardo et al. 2003; Rockwell 2005, 2007). Prosodic features have been associated with sarcastic expression (Rockwell 2000; Anolli et al. 2002; Cheang and Pell 2008, 2009; Chen and Boves 2018). Failure by persons with neurological dysfunction to process these prosodic cues can lead to serious misunderstanding, as has been shown in persons with Parkinson’s disease (Monetta et al. 2009). Pathological conditions affecting prosodic cueing of any aspect of communication, whether of physical or cognitive origin, can have mild, moderate, or devastating effects on communication with family, friends, and caretakers (Baum and Pell 1999; Kent et al. 1999). In this chapter, we present the state of the art on the neural bases of pathological prosody (§34.2), assessment tools (§34.3), and treatment of prosodic deficits (§34.4).

PATHOLOGICAL PROSODY: OVERVIEW, ASSESSMENT, AND TREATMENT 487

34.2 Neural bases of pathological prosody

The neural bases of prosodic perception and production remain controversial. Considerable research has investigated whether the processing of prosodic elements and their various functions relies on hemispheric specialization attributable to cortical processing (Van Lancker and Breitenstein 2000). A role for the basal ganglia (subcortical nuclei responsible for motor function) has also been identified, especially in prosodic production. Key questions have involved linguistic versus affective-prosodic functions in production and comprehension, and the distribution of prosodic cues, in particular whether timing and pitch features are processed in the left and right hemispheres respectively.

34.2.1 History of approaches to hemispheric specialization

In the nineteenth century, singing and certain forms of emotional speech were found to be preserved in aphasia, a language disorder nearly always associated with left hemisphere (LH) damage. It was therefore inferred that these behaviours were modulated by the intact right hemisphere (RH) in these persons. Later, Mills (1912: 167) spoke of the ‘zone of emotion and emotional expression as especially developed in the right cerebral hemisphere’, but his examples included only facial expressions and behavioural gestures. In more recent times, claims for a role of the RH in processing emotional prosody have continued (Heilman et al. 1975; Ross 1981). Individuals with RH damage have been described as having flat affect and monotonous speech (Ross and Mesulam 1979), lowered mean pitch level, less pitch variation, and a restricted pitch range (Shapiro and Danly 1985; House et al. 1987; Ross and Monnot 2008). Research on hemispheric lateralization for prosody, approached from varying perspectives with supportive evidence, has advocated RH involvement in processing various aspects of prosody, not only affective prosody (Van Lancker 1980; Robin et al. 1990; Klouda et al. 1998; Baum and Dwivedi 2003; Gandour et al. 2003a; Pell 2006, 2007). In contrast, other studies suggest that prosody is not uniquely an RH function (Baum and Pell 1997; Baum et al. 2001; Seddoh 2004; Shah et al. 2006; Walker et al. 2009). In some clinical settings, the LH has appeared to be more important than the RH in processing prosodic features (Baum and Pell 1997). For example, anterior damage in the left cerebral hemisphere, leading to expressive aphasia, has classically been associated with impaired melody of speech, a diagnostic parameter in aphasia evaluation (Luria 1966; Goodglass and Kaplan 1972).
In other studies, no significant differences in recognition of affective-prosodic contrasts were reported between persons with LH damage and persons with RH damage (Schlanger et al. 1976; Van Lancker and Sidtis 1992), and other research has pointed to an important role of subcortical nuclei in both production and comprehension of affective-prosodic meanings (Cancelliere and Kertesz 1990).

488 DIANA VAN LANCKER SIDTIS AND SEUNG-YUN YANG

34.2.2 Current proposals of hemispheric specialization of prosodic elements

Two theories of hemispheric specialization have been proposed to explain prosodic phenomena: the first highlights functional characteristics (affective vs. linguistic uses), and the second attributes observed hemispheric differences to acoustic features (timing vs. pitch) (Van Lancker 1980; Edmondson et al. 1987; Pell and Baum 1997a, 1997b; Schirmer 2004). The functional perspective arises from a model of brain function that attributes emotional experiencing to the RH (Borod 1992, 1993; Bowers et al. 1993) and language function to the LH. This view is also supported by studies that place the perception of tonal contrasts (present in tone languages) and other linguistic-prosodic cues in the LH (Van Lancker and Fromkin 1973; Gandour and Dardarananda 1983; Fournier et al. 2010); here the language function appears to override the expectation, based on acoustic cue differentiation, that the RH processes pitch contrasts. The second approach describes prosody as lateralized according to componential acoustic cues, in particular temporal features versus pitch (Van Lancker and Sidtis 1992). Studies on healthy and clinical participants have suggested that temporal and pitch processing in selected auditory tasks correspond reliably to LH and RH processing respectively (e.g. Sidtis 1980; Sidtis and Volpe 1988; Robin et al. 1990; Baum 1998; Hyde et al. 2008). Models describing the hemispheric lateralization for speech prosody are summarized in Table 34.1. It is likely that both interpretations are pertinent to varying degrees, depending on context. However, recent acoustic analyses of Korean utterances that are ambiguous between literal and idiomatic meanings lend support to the acoustic cue hypothesis (Yang and Van Lancker Sidtis 2016). Damage to the LH or RH of the brain differentially affected the production of prosodic cues.
Acoustic analyses revealed that LH damage negatively affected the production of temporal cues, while RH damage negatively affected the production of pitch cues (Yang et al. 2017). Both of these kinds of cues contribute to the normal production of idiomatic/literal contrasts.

Table 34.1 Theoretical overview of hemispheric lateralization for speech prosody

Hemispheric specialization | Left hemisphere | Right hemisphere
Functional basis | Linguistic information (linguistic prosody) | Emotional information (affective/emotional prosody)
Auditory-acoustic cues | Temporal (duration): rapidly changing, short temporal cues | Spectral (f0): slowly changing, longer temporal cues

34.2.3 Disturbances of pitch control and temporal cues

Another interpretation of the clinical presentation of dysprosodic speech in RH damage attributes the condition to disturbed control of pitch variation, leading to flattened pitch and reduced pitch range (Sidtis and Van Lancker Sidtis 2003). This account places the pathology at the level of motor speech control. In this perspective, the dysprosodic speech of many individuals with RH damage, previously attributed to defects in processing emotional meanings in speech, is based on a pitch control deficiency. Deficient linguistic prosody in RH damage has also been reported (Ouellette and Baum 1994; Baum et al. 1997, 2001; Walker et al. 2004). Less pitch variation and a restricted pitch range are seen in individuals with RH damage who attempt to produce sentence stress or to convey prosodic differences between questions and statements (Danly and Shapiro 1982; Cooper et al. 1984; Shapiro and Danly 1985; Behrens 1988, 1989), again suggesting that prosodic deficits associated with RH damage may be attributable to impaired pitch control in motor speech processing. This is an important consideration in evaluating prosodic abnormalities in persons with brain damage. The well-known early description of a melody-of-speech disorder resulting from LH damage was actually a timing deficit (Monrad-Krohn 1947; Moen 1991). Temporal features may thus account for many instances of apparent dysprosody. Impaired timing control in Broca’s (nonfluent) aphasia has been reported (Williams and Seaver 1986; Baum 1992; Van Lancker Sidtis et al. 2010): the lengthening of final syllables and the shortening of non-final syllables, both consistently observed in normal speech, were impaired in LH damage. These timing relationships are seen in paradigms such as jab, jabber, jabbering (where the stem vowel becomes shorter with each added morpheme) and in prepausal lengthening in phrases: the word you is longer in Why don’t you than in the expanded phrase Why don’t you get tickets (Gaitenby 1965).
It is interesting to note that persons with Parkinson’s disease, although classically impaired in speech rate control (Breitenstein et al. 2001b), retain this relational timing in paradigms: shortening of initial syllables in single words when expanded to longer forms (pot, potter, pottering) and lengthening of final syllables in a phrase (Canter and Van Lancker 1985; Sidtis and Sidtis 2012). This observation contributes to the proposal that timing control on short segments in speech is a cortical function. Temporal problems in Parkinson’s disease (caused by basal ganglia dysfunction) involve rate of speech across phrases and clauses, with ‘short rushes’ occurring in phrases, and acceleration emerging in longer utterances. In contrast, temporal disturbances in ataxic speech (see §34.2.4) affect global syllabic structure, stress patterns, and phrasal rhythm.
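The relational timing patterns described above (stem shortening in pot, potter, pottering and prepausal lengthening of a final word) can be expressed as simple duration ratios. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the durations are assumed to have been measured beforehand (for example, by hand segmentation), and the example values are invented for demonstration, not data from the studies cited.

```python
from typing import List, Sequence


def shortening_ratios(stem_durations_ms: Sequence[float]) -> List[float]:
    """Express each stem duration relative to the stem in the base word.

    The first element is assumed to be the base word (e.g. 'pot'), followed
    by progressively longer derived forms (e.g. 'potter', 'pottering').
    """
    if not stem_durations_ms:
        raise ValueError("at least one duration is required")
    base = stem_durations_ms[0]
    if base <= 0:
        raise ValueError("base duration must be positive")
    return [d / base for d in stem_durations_ms]


def preserves_shortening(stem_durations_ms: Sequence[float]) -> bool:
    """True if the stem gets strictly shorter with each added syllable."""
    return all(a > b for a, b in zip(stem_durations_ms, stem_durations_ms[1:]))


def final_lengthening_ratio(final_ms: float, nonfinal_ms: float) -> float:
    """Duration of a word phrase-finally divided by its phrase-medial duration.

    Values above 1 indicate prepausal lengthening (e.g. 'you' in
    'Why don't you' versus 'Why don't you get tickets').
    """
    if nonfinal_ms <= 0:
        raise ValueError("non-final duration must be positive")
    return final_ms / nonfinal_ms


# Hypothetical durations in milliseconds, for illustration only.
pot_series = [200.0, 150.0, 120.0]  # stem in pot, potter, pottering
print(shortening_ratios(pot_series))          # [1.0, 0.75, 0.6]
print(preserves_shortening(pot_series))       # True
print(final_lengthening_ratio(180.0, 120.0))  # 1.5
```

Under this framing, a speaker who retains relational timing would show ratios that decrease across the series and a final-lengthening ratio above 1; what counts as a clinically normal ratio is not specified here and would require normative data.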

34.2.4 Subcortical involvement in speech prosody: basal ganglia and cerebellum

Compelling evidence has described an important role of subcortical areas in several manifestations of prosodic behaviour (Cohen et al. 1994; Pell 2002; Van Lancker Sidtis et al. 2006; Paulmann et al. 2008; Jones 2009). Dysprosodic speech is seen in persons with Parkinson’s disease, which is caused by an insufficient supply of dopamine to the subcortical nuclei, and in dysfunction of the cerebellum (which has multiple aetiologies), as observed in cerebellar ataxia. Growing attention has been directed towards elaborating the role of subcortical components of functional brain networks engaged in the perception and production of speech prosody. Persons with Parkinson’s disease speak with a dysprosodic profile that is a signature of this progressive disease: low volume, monopitch (flat intonation), and rate abnormalities. Ataxic speech (see below) is quite different, exhibiting signature rhythm deficits. Structures of the basal ganglia, the grouping of nuclei surrounding the thalamus that are responsible for the initiation and monitoring of motor activity, have been implicated as the most significant area damaged in participants demonstrating affective or emotional prosody deficits (Cancelliere and Kertesz 1990; Hertrich and Ackermann 1993; Karow et al. 2001; Pell and Leonard 2003; Van Lancker Sidtis et al. 2006; Paulmann et al. 2011). Tranel (1992: 82) associates ‘defects in articulation and prosody’ with lesions (due to stroke) of the putamen, a major subcortical nucleus. Descriptions of basal ganglia disease are coupled with descriptions of dysprosodic speech (Ali-Cherif et al. 1984; Laplane et al. 1984; Bhatia and Marsden 1994; Saint-Cyr et al. 1995). Caplan et al. (1990: 139) describe speech from subcortical lesions as ‘emotionally flat’. These studies indirectly reveal the impact of basal ganglia damage on prosodic competence. The observation that defective prosody is not seen in Alzheimer’s disease may be attributable to the fact that basal ganglia structures remain intact long into the progression of this neurological disorder. Other dementias, caused by depression or white matter disease, may be associated with prosodic changes in speech. Progressive diseases such as amyotrophic lateral sclerosis, which affects the motor neurons, and multiple sclerosis, a white matter disease, are characterized by prosodic disturbances to the extent that speech is affected. Traumatic brain injury may lead to changes in speech prosody, depending on the site and nature of the damage, but this has not been extensively studied. Ataxic dysarthria is a motor speech disorder arising from damage to the cerebellum.
Speech is affected primarily in its timing and rhythmic characteristics, featuring evenly spaced syllables, a pattern that is inappropriate in English, where both reduced and full syllables occur (Sidtis et al. 2011). The dysarthric profile includes vowel distortions, as well as excess and equal stress on all syllables: neither the expected rhythmic relationships between stressed and unstressed syllables nor the placement of sentence stress is preserved, and instead all syllables tend to receive equal or inappropriately excessive stress (Hartelius et al. 2000). In short, the speech of persons with cerebellar ataxia is characterized by distorted timing and rhythm (Duffy 2005).

34.2.5 Prosody in autism

Little is known about the neurological aetiology of autism, which includes a broad spectrum of disorders and a great range of severity (Sparks et al. 2002; Newschaffer et al. 2007). Persons across the entire autism spectrum, from low to high functioning, children and adults alike, often have prosodic abnormalities, but findings are inconsistent (McCann and Peppé 2003; see also chapter 42 for a review). Some subgroup differences have been reported (Eigsti et al. 2011; Paul et al. 2005a), some of which may contribute to classification (Peppé et al. 2011). In children with limited verbal production, prosodic competence is sometimes retained despite severe linguistic-verbal deficits: the correct intoning of jingles heard in public media, for example, has been described in essentially non-verbal children. In higher-functioning individuals with autism, prosodic abnormalities in speech production are observed (Peppé et al. 2006). Interestingly, when perception was tested, autistic persons over the age of 8 (regardless of mental age) performed nearly at age-equivalent levels in matching emotionally intoned utterances to facial drawings (Van Lancker et al. 1989). Similarly, persons diagnosed with Asperger’s syndrome were able to determine vocal features of gender (Groen et al. 2008). However, because of the extreme variability in the clinical presentation and the lack of knowledge of neurological causes, useful generalizations about the prosodic profiles in production and perception, and their neurological aetiologies, cannot yet be made for this clinical condition.

34.3 Evaluation of prosodic performance

Much prosodic material forms the foundation and perceptual background of speech, against which linguistic meaning stands out. In some cases, the force of intonational meanings is recognized in popular understanding; the comment ‘I didn’t appreciate his tone of voice’ can easily occur in communicative settings. However, while attending to verbal content, listeners and speakers do not always consciously track prosodic material, however important it is to the overall message. Just as musical talent and skill vary across individuals, there is a great deal of variability in the abilities of ordinary as well as highly trained persons to accurately perceive, imitate, and produce prosodic cues. Further, because of the considerable range and variety of phrasal intonational patterns, word and phrase accents, emotional and attitudinal expression, and other functional prosodic material occurring within and between healthy speakers, establishing stable norms is challenging (Crystal 1969). Casual observation reveals that many apparently normal speech patterns produced by healthy persons are monotonous, while others are hypermelodic; further, prosodic patterning depends greatly on context, the speaker’s mood, and setting. Dialects (and languages) differ greatly in tunes and rhythms. It is therefore not surprising that, compared to speech and language testing, relatively few assessment protocols have been developed for prosodic competence, even though prosodic abnormalities are known to occur in several prominent communicative disorders. The complexity of normal prosodic practice is one reason for the dearth of evaluation materials. Another reason arises from the challenges presented to the clinician, who must be able to perceive subtle details of a backgrounded signal and decide whether they are normal or not.
As McSweeny and Shriberg (2001: 520) note, ‘Skills in the assessment of prosody are not routinely taught in academic training programmes in communicative disorders.’ Prosodic abnormalities are often mentioned as diagnostic, as in apraxia, where one of ‘the salient characteristics’ is said to be dysprosody ‘unrelieved by extended periods of normal rhythm, stress, and intonation’ (Wertz and Rosenbek 1992: 46). Here, as is frequently the case, no clearly defined metric for ‘normal’ is provided. Rosenbek et al. (2006) describe an evaluation procedure developed for a prosody remediation study. Four judges view a videotape in which individuals ‘imitate syntactic and emotional prosody and produce syntactic and emotional prosody to command’ (p. 384). Five levels of scoring, from mild to severe, are used, and three of the four judges have to agree on a ranking. This approach accommodates task differences between spontaneous and repeated speech and strives for rater agreement. However, establishing objective measures in these kinds of evaluations remains a challenge. Kalathottukaren et al. (2015) provide an overview of nine assessment tools of expressive prosody for use with persons with cochlear implants. Tests covered include the Prosody Profile (PROP; no normative data; Crystal 1982), the Prosody-Voice Screening Protocol (PVSP; norms from children; Shriberg et al. 1990), and the Profiling Elements of Prosody in Speech-Communication (PEPS-C; norms from children; Peppé and McCann 2003). Research-level protocols include the Aprosodia Battery (affective only; Ross et al. 1997), the Florida Affect Battery (Bowers et al. 1991), the Western Aphasia Prosody Test (Cancelliere and Kertesz 1990), and the Affective-Linguistic Prosody Test (Van Lancker 1984).
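The agreement rule in the Rosenbek et al. (2006) procedure, in which three of four judges must agree on one of five severity levels, can be made explicit with a short Python sketch. The function name, the coding of levels as integers from 1 (mild) to 5 (severe), and the choice to return None when no level reaches the threshold are all assumptions for illustration; how such non-consensus cases were resolved is not described here.

```python
from collections import Counter
from typing import Optional, Sequence

MIN_LEVEL, MAX_LEVEL = 1, 5  # assumed coding: 1 = mild ... 5 = severe


def consensus_rating(ratings: Sequence[int], min_agree: int = 3) -> Optional[int]:
    """Return the severity level chosen by at least `min_agree` judges.

    With four judges and min_agree=3, at most one level can qualify, so the
    most frequent level is the only candidate. Returns None (an assumed
    convention) when no level reaches the threshold.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for level in ratings:
        if not MIN_LEVEL <= level <= MAX_LEVEL:
            raise ValueError(f"rating {level} is outside {MIN_LEVEL}..{MAX_LEVEL}")
    level, count = Counter(ratings).most_common(1)[0]
    return level if count >= min_agree else None


# Four hypothetical judges rating one speaker.
print(consensus_rating([2, 2, 2, 4]))  # 2
print(consensus_rating([1, 1, 3, 3]))  # None
```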

34.4 Treatment for prosodic deficits

Interest in treating prosodic deficiencies in communicative competence has arisen in the clinical community (Leon et al. 2005). Treatment approaches rest on various rationales, some of them based on theories of the cerebral functioning underlying dysprosody. Proposed neural mechanisms underlying prosodic deficits include a motor programming deficit (van der Merwe 1997; Baum and Pell 1999) and a loss of affective representations (Blonder et al. 1991). Two treatments for expressive dysprosody, the imitative treatment and the cognitive-linguistic treatment, are based on these hypotheses (Rosenbek et al. 2004). Both follow a six-step cueing continuum, in which maximum cueing is provided initially and then systematically decreases as the patient progresses through the continuum. The imitative treatment is based on the hypothesis that expressive dysprosody results from impaired motor programming or planning (van der Merwe 1997; Baum and Pell 1999; Zakzanis 1999). The clinician models a sentence using a happy, angry, or sad tone of voice, intoning the appropriate voice quality. One approach used biofeedback during the modelling sessions (Stringer 1996). The cognitive-linguistic treatment is based on the notion that dysprosody results from a loss of the affective representations of language. Patients are given the names and vocal characteristics of happy, angry, and sad, as well as pictures of the appropriate facial expression for each emotion. They are then given sentences and asked to produce them using the cues provided. Evidence supporting these two treatments for expressive dysprosody has been reported, mostly based on listeners’ perception (Rosenbek et al. 2004; Rosenbek et al. 2006; Jones et al. 2009; Russell et al. 2010). One study conducted acoustic analyses on utterances to evaluate the treatment for dysprosody (Jones et al. 2009).
Statistical differences in acoustic measures were found following both treatments, lending additional support for these treatments. Russell et al. (2010) also conducted acoustic as well as perceptual analyses on the utterances following the imitative treatment. However, differences in the acoustic measures, as reported in Jones et al. (2009), were not found. While listeners showed better performance on the identification of linguistic contrasts, decreased accuracy in the identification of emotions in post-treatment dysprosodic speech was reported.

A trial treatment for dysprosody focusing on pitch was undertaken in our laboratory with a 50-year-old Caucasian male who had sustained a large, single infarct to the right frontal, parietal, and temporal lobes, resulting in severely dysprosodic speech. The approach attempted to improve the declination line (the tendency of pitch to fall towards the end of an intonational utterance) in the patient’s speech. As Figure 34.1b shows, the patient’s utterance rose at the end of its trajectory, yielding an unnatural prosody. Pitch control in speech and in singing was equally impaired. Performance on clinical protocols revealed no cognitive, reading, or language disorders. The goal of this case study was to determine whether prosodic intonation patterns could be adjusted by speech therapy, which consisted of providing models of a normal intonation line to the client and asking for repetition, followed by elicitation. The patient had been anecdotally described as ‘sounding Norwegian’. Compared with the healthy speaker (Figure 34.1a), the declination line spoken by the person with dysprosody did not fall normally when he spoke the same sentence. In daily sessions, models of a declination line were provided and the individual was asked to repeat the correct version. He imitated the model with about 80% success by the end of a three-week course of therapy. He then produced improved elicited forms (spoken spontaneously) with about 50% success (Figure 34.1c). This case provides an example of using modelling in therapy and acoustic measures to track improvement. This informal therapy programme suggests that a declination line distorted following a stroke can be remediated in the clinic. Generalization to spontaneous speech was, however, not evaluated.

[Figure 34.1: f0 contours in three panels, (a) to (c); y-axis Pitch (Hz), 75 to 250; x-axis Time (s).]

Figure 34.1 (a) Schematized f0 curve of He’s going downtown today, produced by a healthy male speaker. (b) Schematized f0 curve of He’s going downtown today, produced by a patient with right hemisphere damage diagnosed with dysprosody, before treatment. The intonation contour rises at the end, yielding an unnatural prosody. (c) Schematized f0 curve of He’s going downtown today, produced by a dysprosodic patient with right hemisphere damage, after treatment. The intonation contour approaches the healthy speaker’s profile.

Treatment approaches for the hypophonia (soft speech) of Parkinson’s disease have included Lee Silverman Voice Treatment (LSVT) (Ramig et al. 2001), which provides practice opportunities to help patients progress towards speaking loudly, and SPEAKOUT (Levitt 2014), designed to encourage speakers to focus on intentional effort in order to increase volume as well as articulatory clarity. Another approach, PLVT (Pitch Limiting Voice Treatment), increases loudness while lowering vocal pitch (de Swart et al. 2003). Lowering overall vocal pitch, however, serves a different goal from changing the trajectory of the intonational contour. For timing abnormalities, several techniques, including voluntary rate control, use of an alphabet or pacing board, and hand tapping, are currently used with persons who have rate abnormalities, as seen in Parkinson’s disease (Van Nuffelen et al. 2009). These approaches have all shown some success in the clinical setting. A well-known treatment for aphasia, Melodic Intonation Therapy (MIT), was designed to exploit retained prosodic competence (singing and melodic intoning) in persons with LH damage and severe language disorder.
Rather than treating prosodic deficits, MIT makes use of intoning and melodic production, putatively preserved in the intact RH, in nonfluent aphasia due to LH damage (Helm-Estabrooks and Albert 1991). The theories originally underlying this treatment design were sound. Production of familiar songs has been associated with the intact RH in clinical subjects (Goldstein 1942; Bogen and Gordon 1971; Geschwind 1971; Heilman et al. 1984), including in persons who have undergone left hemispherectomy (Smith 1966). Using a simplified, exaggerated, sung prosody, usually produced on a formulaic expression, MIT trains patients first to sing and then to intone and tap out words and phrases syllable by syllable. The process involves five levels: humming, unison singing, unison intoning with fading, immediate repetition, and response to probe questions (Helm-Estabrooks et al. 1989). Positive results have been reported with MIT in individuals with chronic nonfluent aphasia (Marshall and Holtzapple 1976; Naeser and Helm-Estabrooks 1985; Springer et al. 1993; Schlaug et al. 2009; Hough 2010; Conklyn et al. 2012; van der Meulen et al. 2014). Most studies reported improvement in repetition, trained items, and object naming. It may be, however, that familiar melodies are more successfully associated with intact RH function, whereas MIT uses newly created melodies. Further, recent work indicates that the therapeutic success was due primarily to the use of well-known formulaic expressions with accompanying rhythmic beats (Stahl and Van Lancker Sidtis 2015), which form part of the MIT process.
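
The case study above tracked recovery through two properties of the f0 contour: the declining baseline (declination) and the unnatural final rise shown in Figure 34.1b. As a minimal sketch, assuming f0 has already been extracted frame by frame, both can be summarized by fitting a least-squares line to the non-final part of the utterance and measuring the f0 change over the final stretch. The function names, the invented contours, and the 20% final-portion cut-off are hypothetical choices for illustration, not the measures used in the therapy programme described here.

```python
def linear_fit(times, values):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def contour_summary(times, f0, final_fraction=0.2):
    """Summarize an f0 contour (Hz) sampled at `times` (s).

    Returns (declination_slope, terminal_change):
    - declination_slope: Hz/s of a line fitted to the non-final frames;
      negative values mean f0 declines over the utterance.
    - terminal_change: f0 at the last frame minus f0 at the last
      non-final frame; a large positive value flags a final rise.
    `final_fraction` (hypothetical) sets how much of the end is treated
    as the final portion.
    """
    n_final = max(2, int(len(times) * final_fraction))
    body_t, body_f0 = times[:-n_final], f0[:-n_final]
    slope, _ = linear_fit(body_t, body_f0)
    terminal_change = f0[-1] - body_f0[-1]
    return slope, terminal_change


# Invented contours: 10 frames over 0.9 s.
times = [i / 10 for i in range(10)]
healthy = [140 - 30 * t for t in times]                       # steady declination
patient = [140, 137, 134, 131, 128, 125, 122, 119, 130, 145]  # final rise
print(contour_summary(times, healthy))
print(contour_summary(times, patient))
```

Both invented contours share the same declination slope; only the terminal change separates the rising, dysprosodic-like contour from the healthy one, which is why a single global slope would not be enough to track this kind of treatment.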

34.5 Concluding remarks

Prosodic information of great variety, present throughout the speech signal, contributes importantly to verbal communication. Tone of voice and other details of prosodic material reveal attitude, mood, emotion, intentions, and personality as well as linguistic and pragmatic information. This vast prosodic repertory, using many combinations of subtle auditory-acoustic cues, presents a considerable challenge to speech scientists who wish to describe the phenomena. Regarding questions about hemispheric specialization, both structural (acoustic) and functional (cognitive) properties of the stimuli must be taken into account (Pell 1998). The prevailing evidence suggests that temporal detail engages the LH and pitch the RH, while the use of pitch in linguistic contrasts overrides the RH preference and, instead, develops in the LH in most right-handed speakers of a tone language. Emotional expression is often impaired in RH damage because of impaired pitch control; emotional comprehension may be impaired in RH damage because of the negative effects of brain damage on emotional experiencing in that hemisphere. A compilation of clinical studies and observations suggests that emotional prosodic behaviours in speech are affected by a constellation of perceptual, motor, motivational, emotional, and cognitive-functional factors (Sidtis and Van Lancker Sidtis 2003). It is also clear that much of the brain participates in the production, perception, and comprehension of prosodic information, including subcortical and cerebellar motor structures; cortical-cognitive regions in the temporal, parietal, and frontal lobes; and limbic emotional and attitudinal domains (Robinson 1976; Kent and Rosenbek 1982; Kreiman and Sidtis 2011). It is therefore to be expected that many kinds of brain damage and neurological dysfunction will affect processing of prosodic information in both production and perception. Clinically, evaluation and treatment of prosodic deviations are at an early stage, with several promising attempts available for inspection.
Progress in these efforts is highly desirable, given the large role that prosody performs in human verbal interaction, as well as the sophisticated equipment now available for acoustic and physiological analysis. It is expected that developing models of prosodic structure and function will allow scientists to address the complex layering of prosodic information carried in the speech signal, leading to useful tools for evaluation and treatment of prosodic deficits.

part VI

PROSODY AND LANGUAGE PROCESSING

chapter 35

Cortical and Subcortical Processing of Linguistic Pitch Patterns

Joseph C. Y. Lau, Zilong Xie, Bharath Chandrasekaran, and Patrick C. M. Wong

35.1 Introduction

Pitch is one of the major prosodic cues in speech. Pitch, along with other prosodic parameters such as length and loudness, signals linguistic elements of speech such as lexical stress, lexical tone, and sentence prosody (chapter 3). Pitch also serves as an important perceptual cue for speech segregation (Frazier et al. 2006). The main physical correlate of pitch is the frequency modulation (or the periodicity) of acoustic waveforms, most prominently the fundamental frequency (f0) (Oxenham 2012). Studies on the neurophysiology of hearing have concluded that the human auditory pathway contains cortical and subcortical structures that are sensitive to frequency modulation in the acoustic signal (Alberti 1995: 53–62). How these neural structures derive the percepts of various aspects of linguistic pitch from the acoustic signal is a question that has intrigued language neuroscientists for more than four decades (Wong and Antoniou 2014). The goal of this chapter is to critically examine the literature investigating pitch processing at cortical and subcortical levels in the human auditory system. In earlier behavioural and brain lesion studies, cerebral hemispheric specialization for pitch processing was a dominant theme (Wong 2002). The advent of more advanced neuroimaging techniques in the twenty-first century has given scientists the opportunity to investigate a wider range of topics beyond the functional localization of pitch processing. Research in the past decade has seen a paradigm shift from the early fascination with hemispheric specialization to a vibrant array of empirical research investigating how neural networks across the cortical and subcortical auditory systems work in concert to achieve successful pitch processing (Zatorre and Gandour 2008; Chandrasekaran and Kraus 2010; Chandrasekaran et al. 2014). Recent research has also taken a translational perspective, investigating how neural markers of pitch processing can be used as predictors of language learning outcomes (Wong et al. 2017).

35.2 The basic functional anatomy of the human auditory system

A sound pressure wave funnelled by the external ear sets the tympanic membrane into vibration. The vibrations are transmitted through a chain of bones (malleus, incus, and stapes) in the middle ear to the cochlea, where they are transduced into electrochemical neural potentials. These potentials ascend the auditory pathway via the cochlear nerve to the cochlear nuclei and the superior olivary complex in the brainstem. Following synapses in the inferior colliculus and the medial geniculate nucleus of the thalamus, the ascending projections reach the primary auditory cortex. For speech sounds, various widely distributed frontal and temporal networks, with hubs such as the inferior frontal gyrus (IFG), the superior temporal gyrus (STG), and the middle temporal gyrus (MTG), are also involved in performing essential language functions (Hickok and Poeppel 2007). Sensitivity to frequency modulation is observed throughout the auditory pathway. Neurons in the primary auditory cortex (A1) are tonotopically organized: specific ensembles of neurons respond maximally to particular sound frequencies (Romani et al. 1982). Caudal A1 is best activated by higher frequencies, whereas rostral A1 responds best to lower frequencies. Neurons in the brainstem (e.g. the brainstem nuclei and inferior colliculus) are sensitive to more dynamic frequency modulations, in the form of phase-locking to the auditory signal (Chandrasekaran and Kraus 2010). The extent to which pitch perception is driven by stimulation of particular places in the tonotopic organization of the auditory pathway (i.e. place information) or by the dynamic firing patterns of neurons (i.e. temporal information) has been debated in the psychoacoustics literature for over a century (Oxenham 2013).
In addition to the aforementioned ascending auditory pathway, there are descending pathways from the auditory cortical regions that back-project to the subcortical regions, such as the medial geniculate body, the inferior colliculus, and the brainstem nuclei, through corticofugal pathways. These back-projections, with the ascending pathways, form feedback loops wherein subcortical auditory processing is dynamically modified based on the immediate history of the sensory inputs (Suga 2008).
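
The place-versus-temporal debate mentioned above can be illustrated with a classic temporal-code demonstration. The sketch below is a generic autocorrelation model, not a model proposed in this chapter: it builds a complex tone from components at 400, 600, and 800 Hz, with no energy at 200 Hz, and recovers a 200 Hz pitch from the periodicity of the waveform alone. The sampling rate, search range, and function names are assumptions.

```python
import math


def complex_tone(components_hz, sample_rate_hz, duration_s):
    """Sum of equal-amplitude sine components (a synthetic complex tone)."""
    n = int(sample_rate_hz * duration_s)
    return [sum(math.sin(2 * math.pi * f * i / sample_rate_hz) for f in components_hz)
            for i in range(n)]


def autocorr_f0(signal, sample_rate_hz, fmin_hz=50.0, fmax_hz=500.0):
    """Estimate f0 from periodicity: the lag in [1/fmax, 1/fmin] that maximizes
    the autocorrelation. Dividing by the full signal length (a biased estimate)
    slightly penalizes long lags, which steers the choice to the true period
    rather than to one of its multiples."""
    n = len(signal)
    min_lag = int(sample_rate_hz / fmax_hz)
    max_lag = int(sample_rate_hz / fmin_hz)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate_hz / best_lag


# Components at 400, 600, and 800 Hz: the waveform has no energy at 200 Hz.
tone = complex_tone([400, 600, 800], sample_rate_hz=8000, duration_s=0.1)
print(autocorr_f0(tone, 8000))
```

The waveform repeats every 5 ms (40 samples at 8 kHz), so the periodicity-based estimate reports 200 Hz even though no component sits there. This shows only one side of the debate: place-based pattern-matching models can also account for such a "missing fundamental".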

35.3 Experimental methods in the neurophysiology of language processing

Our understanding of the neurophysiology of hearing and of frequency modulations across the auditory pathway has primarily been established by studies using animal models with invasive techniques such as neuronal cooling, direct neuron stimulation, and experimentally induced lesions (Fay 2012). Using similar invasive techniques to study the neural bases of speech and language processes in humans, however, is challenging. Intracranial stimulation and recordings can only be performed on surgical patients who require a craniotomy in any case (e.g. epileptic patients), and they are often limited by the technique’s inability to map areas related to higher cognitive functions (Towle et al. 2008). Naturally occurring brain lesions (such as those resulting from a stroke), on the other hand, provide a simple yet insightful way to investigate the neural bases of language by correlating an impaired brain region with loss of language functions in behaviour. For example, Broca’s and Wernicke’s areas, the core brain regions crucial to speech production and comprehension (in the left inferior frontal and left superior temporal regions, respectively), were identified in the nineteenth century by associating lesions in these areas found at autopsy with the patients’ premortem loss of language functions (Geschwind 1974). The association between brain regions and pitch processing can also be studied in nonclinical populations in more controlled experimental settings thanks to advances in modern non-invasive neuroimaging techniques, such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI). PET tracks cerebral blood flow through radioactive tracers injected into the body, whereas fMRI measures changes in blood oxygenation in the brain (i.e. blood-oxygen-level-dependent (BOLD) contrasts). PET and fMRI identify the brain regions activated by an experimental task on the assumption that blood flow, and hence BOLD responses, increase when particular brain regions are actively engaged in the task. Electroencephalography (EEG) is a non-invasive technique that can provide the finer temporal resolution needed to investigate neural processing in the time domain (i.e.
how processing unfolds after the sound enters the auditory system). It records the electrophysiological activity of the brain millisecond by millisecond using electrodes placed on the scalp. EEG responses evoked by specific stimuli are known as event-related potentials (ERPs). Various ERP components (potentials occurring at different time points of processing) have been identified as neural markers indexing various language-related cognitive processes. For example, mismatch negativity (MMN) indexes automatic perceptual processes in the brain that discriminate sequences of auditory stimuli (Näätänen et al. 2007); P300 is elicited by attentive behavioural judgements, reflecting the mental effort associated with behavioural responses (Polich 2007); N400, often elicited in sentence/lexical decision tasks, is known to index lexico-semantic access and processing (Lau et al. 2008). These ERP components presumably reflect cognitive processes at the cortical level. Subcortical processing, on the other hand, can be investigated by looking at the frequency-following response (FFR). The FFR is an EEG response generated predominantly by neural ensembles within the auditory brainstem and midbrain that phase-lock to the f0 of the auditory stimulus and its harmonics (Chandrasekaran and Kraus 2010). By examining the integrity of the f0 in the FFR, subcortical encoding of pitch information in speech can be investigated (Skoe and Kraus 2010).
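
The ERP logic described above can be sketched in a few lines. ERPs are conventionally estimated by averaging many stimulus-locked EEG epochs, so that activity unrelated to the stimulus tends to cancel, and the MMN is typically read from a deviant-minus-standard difference wave. The minimal sketch below assumes the epochs are already extracted, baseline-corrected, and stored as equal-length lists of voltages starting at stimulus onset. The function names and the search window are illustrative assumptions, and real pipelines would also filter the data and reject artefacts.

```python
def average_epochs(epochs):
    """Sample-by-sample mean of equal-length, stimulus-locked epochs (an ERP)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard ERP, the waveform in which an MMN is usually read."""
    deviant = average_epochs(deviant_epochs)
    standard = average_epochs(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]


def peak_negativity(wave, sample_rate_hz, window_ms):
    """Most negative amplitude and its latency (ms after onset) within window_ms.

    Assumes sample 0 of `wave` is stimulus onset.
    """
    start = int(window_ms[0] * sample_rate_hz / 1000)
    stop = int(window_ms[1] * sample_rate_hz / 1000)
    idx = min(range(start, stop), key=lambda i: wave[i])
    return wave[idx], idx * 1000 / sample_rate_hz


# Invented data at 1 kHz: a negative deflection peaking at 180 ms in deviant
# trials, plus a trial-varying component that cancels when trials are averaged.
n = 300
bump = [-(3.0 - abs(t - 180) / 50) if abs(t - 180) < 150 else 0.0 for t in range(n)]
noise = [((t * 37) % 11 - 5) / 10 for t in range(n)]
standard = [noise, [-x for x in noise]]
deviant = [[b + x for b, x in zip(bump, noise)], [b - x for b, x in zip(bump, noise)]]
print(peak_negativity(difference_wave(deviant, standard), 1000, (100, 250)))
```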

35.4 Hemispheric specialization

Cortical lateralization of language to the left hemisphere has been observed over more than a century of research using lesion, dichotic listening, and functional neuroimaging techniques (Pulvermüller 2005). The processing of non-linguistic pitch information (e.g. musical melodies), on the other hand, has been associated with the right hemisphere (Zatorre et al. 1994).

Interpretations of this lateralization dichotomy have ignited much scientific debate over the past few decades (Zatorre and Gandour 2008). At least two competing hypotheses have been proposed. The ‘functional hypothesis’ posits that the dichotomy reflects two distinct neural pathways that specifically subserve linguistic and non-linguistic processing (Van Lancker 1980). In contrast, the ‘acoustic hypothesis’ posits that the lateralization patterns reflect two components of a domain-general processing network that respond to different aspects of acoustic signals (e.g. shorter and longer temporal domains) (Hickok and Poeppel 2007). Linguistic pitch patterns, being both pitch and language, are hence an intriguing vehicle for testing these competing hypotheses. The functional hypothesis predicts that linguistic pitch patterns, which primarily serve linguistic functions, are processed in the left hemisphere; the acoustic hypothesis predicts that, since linguistic pitch patterns are by nature dynamic frequency modulations, they are processed in the right hemisphere. Early dichotic listening experiments on lexical tones seem to support the ‘functional hypothesis’. Studies presenting different stimuli simultaneously to the two ears found a right-ear advantage for lexical tone distinctions in native speakers of tone languages (e.g. Van Lancker and Fromkin 1973; for reviews see Wong 2002; Wong and Antoniou 2014). Although these results were interpreted as evidence of left hemisphere specialization (since more nerve fibres in the auditory pathway project to the contralateral side of the brain), lexical tones always co-varied with segments and lexico-semantic meaning in these studies, making it difficult to interpret the right-ear advantage as solely an effect of lexical tones. Results from early lesion studies are mixed at best.
Wong (2002) reviewed three dozen lesion studies investigating the production and perception of non-linguistic (affective prosody) and linguistic (stress, sentential prosody, and lexical tones) pitch patterns in brain-damaged patients. Both left and right hemisphere damage have been linked to deficits in producing and processing linguistic pitch patterns. A more recent meta-analysis of lesion data by Witteman et al. (2011) also concluded that damage to either hemisphere can impair language prosody. Owing to insufficient experimental control (e.g. heterogeneity of brain damage in patient populations, and a lack of studies testing both left- and right-hemisphere-damaged patients with the same experimental materials), it is difficult to conclude from lesion studies that the processing of linguistic pitch patterns is affected only by left or only by right hemisphere lesions. Functional neuroimaging studies have the advantage of better experimental control and more precise anatomical detail. Earlier PET and fMRI studies provided compelling evidence supporting the functional hypothesis. These studies identified several left cortical areas associated with lexical tone processing in native speakers, such as the left inferior frontal network, left insula, and left superior temporal regions (e.g. Klein et al. 2001; Gandour et al. 2003b; Wong et al. 2004b). By contrast, when non-tone-language speakers processed the same tones, even after controlling for lexico-semantic factors (e.g. by embedding the pitch patterns within real English words; Wong et al. 2004b), homologous areas of the right hemisphere were activated, presumably because these pitch patterns do not serve linguistic functions in non-tone languages. However, more recent fMRI studies have also suggested the importance of the right hemisphere for processing lexical tones as well as sentence prosody. Liu et al. (2006) found that lexical tone production was less left-lateralized than segmental production.
Kwok et al. (2016) observed activation in the right middle STG in addition to the left STG in lexical tone discrimination within disyllabic words. Meanwhile, the involvement of the right hemisphere in sentence prosody processing has been demonstrated in cross-language (Gandour et al. 2004), cross-task (Kreitewolf et al. 2014), and meta-analysis (Belyk and Brown 2014) studies. Right-lateralized activation of the superior temporal and inferior frontal regions has been found in prosody recognition more than in segmental recognition (Belyk and Brown 2014; Kreitewolf et al. 2014), irrespective of whether the language has lexical tones (Gandour et al. 2004). These findings have led to the recent view that the functional and acoustic hypotheses may not be mutually exclusive. Instead, they represent different components of a bilateral speech-processing network that respond to different levels of linguistic pitch processing (Zatorre and Gandour 2008). However, hypotheses about the specific roles of the two hemispheres still diverge considerably. Whether the left and right hemispheres are specialized for processing fast versus slow pitch modulation (Belyk and Brown 2014), segmental versus suprasegmental syntactic elements (Kreitewolf et al. 2014), or linguistic representations versus acoustic information (Kwok et al. 2016) remains an open question. Meanwhile, a right-ear advantage for lexical tone processing has been found to be reflected in FFRs, which raises the question of whether subcortical structures act as precursors to hemispheric specialization (Krishnan et al. 2011). Tone languages, which contain both lexical and sentential pitch patterns, would be a good vehicle for future studies aiming to understand comprehensively the neural bases of the interactive processing of linguistic pitch elements at the lexical and sentential levels, as well as at the cortical and subcortical levels.
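
Claims about left- versus right-hemisphere dominance are often quantified with a laterality index. One common convention, assumed here rather than taken from the studies cited above, is LI = (L - R)/(L + R), computed on activation measures from homologous left and right regions. The same formula with the ears swapped, (R - L)/(R + L), expresses a dichotic-listening ear advantage. The +/-0.2 cut-off for calling a result lateralized is likewise a conventional but arbitrary assumption, and the numbers in the usage lines are invented.

```python
def laterality_index(left, right):
    """(L - R) / (L + R): +1 = entirely left, -1 = entirely right, 0 = symmetric.

    `left` and `right` are non-negative activation measures (e.g. active voxel
    counts or summed signal change) from homologous regions.
    """
    total = left + right
    if total <= 0:
        raise ValueError("left + right must be positive")
    return (left - right) / total


def ear_advantage(right_ear_correct, left_ear_correct):
    """Dichotic-listening ear advantage, (R - L) / (R + L).

    Positive values indicate a right-ear advantage, conventionally read as a
    left-hemisphere bias because most auditory fibres cross to the other side.
    """
    return laterality_index(right_ear_correct, left_ear_correct)


def describe(index, threshold=0.2):
    """Label an index using a conventional (and arbitrary) +/-0.2 cut-off."""
    if index >= threshold:
        return "left-lateralized"
    if index <= -threshold:
        return "right-lateralized"
    return "bilateral"


print(describe(laterality_index(80, 20)))   # invented voxel counts
print(describe(laterality_index(45, 55)))
```

A single index collapses a whole activation map into one number, which is convenient for comparing groups (e.g. tone versus non-tone language speakers) but cannot by itself capture the bilateral-network picture described above.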

35.5 Neural evidence for mechanisms of linguistic pitch processing

Like other elements of speech, successful perception of linguistic pitch patterns such as lexical tones requires the mapping of highly variable acoustic information onto high-level linguistic categories at the representational level (Diehl et al. 2004). ERP studies have shed light on the neural mechanisms involved in linguistic pitch processing that may achieve this acoustic–linguistic mapping. Specifically, successful linguistic pitch perception requires the listener to focus on perceptual dimensions or cues that are particularly relevant in the specific language (Chandrasekaran et al. 2010). Given cross-language suprasegmental differences, behavioural evidence has suggested that the weight given to each dimension contributing to pitch perception differs drastically across languages (Gandour 1983). An ERP study by Chandrasekaran et al. (2007a) was the first to investigate how the brain represents the acoustic dimensions relevant to highly automatic pitch processing. These authors used multidimensional scaling (MDS) to analyse MMN responses from Mandarin- and English-speaking listeners that reflected their automatic processing of three Mandarin lexical tone distinctions. The MDS analyses revealed two pitch dimensions (‘height’ and ‘direction’), and the latter was weighted more heavily for the Mandarin group, suggesting that ‘direction’ is more relevant to tone processing in Mandarin. These results suggest that early and automatic cortical pitch processing is shaped by the relative saliency of the acoustic dimensions underlying the pitch patterns of particular languages.

Another early and automatic pitch-processing phenomenon is categorical perception (CP) (Francis et al. 2003). CP is a phenomenon in which acoustically varying sounds are effortlessly transformed into perceptually distinct categories (Liberman et al. 1957). CP, at least for consonants, has been found to be a manifestation of the response of neurons in the posterior STG (a higher-order speech processing area) to phonetic objects rather than to linear acoustic changes (Chang et al. 2010). A series of ERP studies have examined the level of neural processing at which CP of tone is manifested (Xi et al. 2010; Zhang et al. 2011b; Zheng et al. 2012). Across-category distinctions elicited larger MMN (Xi et al. 2010) and P300 responses (Zhang et al. 2011b), suggesting that CP of tones is manifested at both early, pre-attentive (indexed by MMN) and later, attentive stages of processing. Zheng et al. (2012), however, found that P300 amplitude for across-category distinctions differed only for contour tones, not for level tones, and that this effect also differed across language groups. These results suggest that cortical CP of tones may vary as a function of the suprasegmental inventory (e.g. only for contour tones) and of the language. A recent ERP study by Wang et al. (2017) found no CP effect for Mandarin tones in children with autism spectrum disorder. The authors posit that this lack of CP is caused by phonological deficits that also impede speech communication in tone-language environments, suggesting that the neural mechanisms underlying CP are integral to lexical tone processing. Unlike contour tones, level tones are not perceived categorically, so their processing must take prior listening context into account, especially in highly variable ambient speech environments (e.g.
preceding pitch contours) (Wong and Diehl 2003). In this process, known as ‘talker normalization’, the relative pitch height of a level tone, which determines its categorization, is computed from the talker’s pitch range in the preceding speech context. An intriguing question is whether talker normalization takes place at the early stages of suprasegmental processing before lexical retrieval (like CP) or at later stages alongside lexical retrieval. fMRI studies have shown that talker normalization of segments is subserved by brain areas in the MTG (Wong et al. 2004a) that are also involved in lexico-semantic processing (Hickok and Poeppel 2007). An ERP study by Zhang et al. (2013) showed that level tone identification in a non-speech context, compared to speech contexts, elicited only a larger N400 response (indexing lexical retrieval), with no differences in earlier ERP components. Together, these results suggest that talker normalization takes place at the lexical retrieval stage, perhaps by optimizing the selection among activated lexical candidates based on a pitch template computed from the preceding context. Recent ERP studies have also tested the hypothesis that speech processing is modulated by higher-level phonological lexical representations. Li and Chen (2015) found that MMNs to tone pairs were less robust when the two tones were variants of the same lexical category (i.e. allotones) than when they belonged to two different lexical categories (i.e. tone contrasts). Politzer-Ahles et al. (2016) found a similar MMN pattern but questioned whether phonological lexical representations, rather than more general acoustic properties of the tones, were the contributing factor. Future studies should maintain better experimental control (e.g. by controlling for acoustic factors) and investigate multiple levels of processing (e.g. pre-attentive vs.
lexical) to comprehensively test the role of phonological lexical representations in linguistic pitch processing.
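
The talker-normalization computation described above, in which a level tone's height is judged relative to the talker's pitch range in the preceding context, can be sketched as follows. This is a hypothetical illustration: it expresses f0 in semitones, z-scores the target against the context, and applies arbitrary cut-offs for high, mid, and low level tones. It is not the procedure used by Wong and Diehl (2003) or Zhang et al. (2013), and the context values are invented.

```python
import math
import statistics


def semitones(f0_hz, ref_hz=100.0):
    """Convert f0 in Hz to semitones relative to an arbitrary reference."""
    return 12 * math.log2(f0_hz / ref_hz)


def normalized_height(target_f0_hz, context_f0_hz):
    """z-score of the target's f0 (in semitones) against the talker's
    f0 values in the preceding context."""
    context = [semitones(f) for f in context_f0_hz]
    mean = statistics.mean(context)
    sd = statistics.pstdev(context)
    return (semitones(target_f0_hz) - mean) / sd


def classify_level_tone(target_f0_hz, context_f0_hz, cutoff=0.5):
    """Hypothetical three-way level-tone decision from normalized height."""
    z = normalized_height(target_f0_hz, context_f0_hz)
    if z > cutoff:
        return "high"
    if z < -cutoff:
        return "low"
    return "mid"


low_voice = [100, 110, 120, 130]    # invented context f0 values (Hz)
high_voice = [180, 200, 220, 240]
print(classify_level_tone(150, low_voice))
print(classify_level_tone(150, high_voice))
```

The same 150 Hz target is classified as high after the low-pitched context and as low after the high-pitched one, which is the ambiguity that makes prior context necessary for level tones.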

35.6 Cortical plasticity of pitch processing

‘Neural plasticity’ refers to the brain’s ability to change and adapt to an ever-changing environment as experience accumulates throughout the lifespan. Studies in the past two decades have demonstrated that the way the brain processes linguistic pitch patterns varies as a function of lifelong and short-term auditory experience. Whether one’s ability to adapt to the ambient environment is related to one’s language development or learning outcome is a question that has attracted much interest in the era of translational research. The functional neuroimaging and electrophysiological studies reviewed in §35.4 and §35.5 have shown evidence of pitch-processing plasticity as a function of long-term language experience. Listeners from different native language backgrounds (e.g. tone vs. non-tone languages) engage different cortical networks to process linguistic pitch patterns (e.g. Gandour et al. 2004; Wong et al. 2004b) and focus on different acoustic cues in pitch processing (Chandrasekaran et al. 2007a). In addition to native language experience, lifelong musical experience has been shown to enhance lexical tone and sentence prosody processing (see Chandrasekaran et al. 2015 for a review). Wang et al. (2003b) were the first to demonstrate that pitch processing could be modulated by short-term experience: activation of left hemisphere language-related regions expanded after native English speakers completed a short-term Mandarin tone-learning paradigm. A later fMRI study by Wong et al. (2007) identified patterns of short-term neural plasticity that were related to tone-learning success. Less successful learners in a short-term tone-learning paradigm activated a diffuse brain network including the right frontal lobe, whereas more successful learners activated a more focused area in the left temporal lobe to a larger extent. This pattern of cortical streamlining was interpreted as a neural signature of successful learning.
Since then, a series of studies have identified various anatomical (e.g. larger volume of Heschl’s gyrus, as reported in Wong et al. 2008) and functional (e.g. more efficient cortical networks, as reported in Sheppard et al. 2012) markers in the brain that can predict future learning success in short-term tone-learning paradigms (see Wong et al. 2017 for a review).

35.7 Subcortical pitch processing and its plasticity

Subcortical structures such as the auditory brainstem were traditionally considered passive relay stations for auditory signals (e.g. Hickok and Poeppel 2007). However, over the past decade or so, there has been a surge of FFR studies investigating the role of the auditory brainstem in pitch processing. These studies have shown that the subcortical encoding of pitch patterns in lexical tones is malleable, shaped by long-term, short-term, and immediate sensory history. Native language background was the first variable found to modulate subcortical pitch encoding. A series of studies found that pitch tracking in FFRs to lexical tones was more robust and accurate when the tone was linguistically relevant, such as for native speakers of Mandarin compared to native speakers of English (e.g. Krishnan et al. 2005), or for natural pitch contours in lexical tones rather than linear approximations (Xu et al. 2006b). It was later found that long-term musical experience also enhanced FFRs to lexical tones, even for non-tone-language (English) speakers (Wong et al. 2007). Other FFR studies have examined the effect of short-term experience and immediate online listening context on subcortical lexical tone encoding. FFRs to a Mandarin lexical tone improved after non-native participants completed a short-term word-learning programme that required them to associate the tones with pseudo-words (Song et al. 2008). A recent study found that FFRs of native tone-language (Cantonese) speakers were more accurate when a stimulus was more predictable and less repetitive in its preceding context (Lau et al. 2017). Skoe et al. (2014) found an interactive effect between short-term experience and online context: FFRs to unexpected lexical tones improved after short-term word learning. The long-term, short-term, and online plasticity effects demonstrated in these FFR studies raise the question of which neural mechanisms enable such malleability of brainstem encoding across so wide a range of timescales. Recent proposals posit that subcortical plasticity reflects top-down modulation guided by higher-order, behaviourally relevant representations of sounds built up by long-term, short-term, and online experience (Chandrasekaran and Kraus 2010; Chandrasekaran et al. 2014). The cortex fine-tunes the sensory signal at the subcortical level in a feedback loop, presumably through corticofugal pathways (Chandrasekaran et al. 2014). Whether the FFR, given its interpretability at the individual level and its task-independent recording procedure (e.g.
passively recorded in sleep) (Skoe and Kraus 2010), can be used as an index of neural encoding that can predict language learning or language development remains an exciting research area to be explored.
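
FFR pitch tracking of the kind discussed in this section is often summarized by comparing the f0 track extracted from the response with the f0 track of the stimulus. The sketch below assumes both tracks have already been extracted frame by frame (for example, by short-time autocorrelation) and computes two summary measures of the kind often reported: a mean absolute pitch error and a stimulus-to-response correlation. The names and the invented tracks are illustrative, and the exact metrics differ across the studies cited.

```python
import math


def pitch_error(stim_f0, resp_f0):
    """Mean absolute frame-by-frame difference (Hz) between two f0 tracks."""
    return sum(abs(s - r) for s, r in zip(stim_f0, resp_f0)) / len(stim_f0)


def stimulus_response_correlation(stim_f0, resp_f0):
    """Pearson r between stimulus and response f0 tracks (contour fidelity)."""
    n = len(stim_f0)
    ms, mr = sum(stim_f0) / n, sum(resp_f0) / n
    sxy = sum((s - ms) * (r - mr) for s, r in zip(stim_f0, resp_f0))
    sxx = sum((s - ms) ** 2 for s in stim_f0)
    syy = sum((r - mr) ** 2 for r in resp_f0)
    return sxy / math.sqrt(sxx * syy)


# Invented rising contour (Hz per frame) and two hypothetical response tracks.
stim = [100 + 3 * i for i in range(11)]
close = [s + (1 if i % 2 else -1) for i, s in enumerate(stim)]
loose = [s + (5 if i % 2 else -5) for i, s in enumerate(stim)]
for resp in (close, loose):
    print(pitch_error(stim, resp), round(stimulus_response_correlation(stim, resp), 3))
```

The two measures capture different things: the error is sensitive to absolute offsets, whereas the correlation only asks whether the response follows the shape of the contour.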

35.8 Prosody and syntactic processing in the brain

The roles that pitch (along with other types of prosodic information, such as duration and intensity) plays in spoken-word and sentence processing have drawn language scientists’ attention in recent decades (chapter 36). The crucial role of prosody in syntactic processing in the brain is supported by neurolinguistic studies of patients with lesions of the corpus callosum (CC). As reviewed in §32.4, prosody has been associated with the right hemisphere. Meanwhile, syntactic processes of language (e.g. sequencing and structure building) are attributed to the left frontal cortex (Friederici 2002; Friederici and Alter 2004). The CC is a large bundle of fibres that mediates information exchange between the hemispheres. Sammler et al. (2010) found that patients with CC lesions, unlike non-patient controls, failed to elicit ERP responses associated with syntactic processing (N400 or left anterior negativity) when processing prosody–syntax incongruities, but were able to do so for purely syntactic violations. These results reflect a failure to incorporate prosodic information into syntactic processing when required to do so, presumably due to the CC lesion.

Multiple proposals drawn from behavioural evidence have converged to suggest that prosody provides a multi-tier framework that cues the mapping between aspects of the speech signal and representations across the linguistic hierarchy (e.g. syllable, word, phrase) (Frazier et al. 2006). Given the evidence for the prosody–syntax link in the brain, Kreiner and Eviatar (2014) have argued that prosody may also be the grounding property of abstract syntactic hierarchies in the brain. Specifically, the authors reviewed a series of magnetoencephalography (MEG) studies that supported Giraud and Poeppel’s (2012) Oscillation-Based Functional Model. This model suggests that neurons in the bilateral auditory cortex fire and oscillate at different frequency bands (Luo and Poeppel 2012) and phase-lock to acoustic information in different temporal windows to achieve parallel processing, for example of faster phonemic information in the gamma band (25–35 Hz) alongside slower syllabic information in the theta band (4–8 Hz) (Giraud and Poeppel 2012). Kreiner and Eviatar posit that, likewise, oscillations in lower frequency bands (e.g. the delta band: 1–3 Hz) may phase-lock to sentence prosody, given their similar temporal window. Prosody, in the form of phase-locked oscillation, would then cue the mapping of syntactic categories in speech processing; on the other hand, syntactic representations could be embodied through intrinsic simulation of such oscillatory activity. A recent study by Ding et al. (2016) found neural oscillations in the delta band that were phase-locked to the rates at which syntactic phrases (2 Hz) and sentences (1 Hz) were presented in the stimuli, thus partially supporting Kreiner and Eviatar’s proposal.
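
The frequency-tagging logic behind Ding et al. (2016) can be illustrated with simulated data: if a response tracks linguistic units presented at fixed rates, its spectrum should show peaks at those rates. The sketch below simulates a noisy signal containing components at the sentence (1 Hz) and phrase (2 Hz) rates mentioned above, plus an assumed 4 Hz syllable rate. It then compares the amplitude at each tagged frequency with that of neighbouring frequency bins. All parameters and function names are hypothetical, and this is a toy, not an MEG analysis pipeline.

```python
import math
import random


def simulate_response(rates_hz, duration_s=40.0, sample_rate_hz=50.0,
                      amplitude=1.0, noise_sd=1.0, seed=0):
    """Toy 'neural' signal: sinusoids at the tagged rates plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(duration_s * sample_rate_hz)
    return [sum(amplitude * math.sin(2 * math.pi * r * i / sample_rate_hz)
                for r in rates_hz) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


def dft_amplitude(signal, sample_rate_hz, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def tagged_snr(signal, sample_rate_hz, freq_hz, n_neighbors=2):
    """Amplitude at freq_hz divided by the mean amplitude of neighbouring bins."""
    resolution = sample_rate_hz / len(signal)   # equals 1 / duration
    peak = dft_amplitude(signal, sample_rate_hz, freq_hz)
    neighbors = [dft_amplitude(signal, sample_rate_hz, freq_hz + k * resolution)
                 for k in range(-n_neighbors, n_neighbors + 1) if k != 0]
    return peak / (sum(neighbors) / len(neighbors))


# Sentence (1 Hz) and phrase (2 Hz) rates from the text; 4 Hz syllable rate assumed.
response = simulate_response([1.0, 2.0, 4.0])
for f in (1.0, 2.0, 3.0, 4.0):
    print(f, round(tagged_snr(response, 50.0, f), 1))
```

Comparing each bin with its neighbours, rather than with zero, is a simple way to separate a genuine rate-locked peak from the broadband noise floor; 3 Hz serves here as an untagged control frequency.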

35.9 Conclusion and future directions: bridging linguistic theory and brain models

Moving on from the early fascination with hemispheric specialization in linguistic processing, studies in the past decade have begun to address more complex questions about the neural bases of linguistic pitch pattern processing. Using functional neuroimaging and electrophysiological techniques, these studies have shown that linguistic pitch processing does not rely on one specific lateralized area of the cerebral cortex. Instead, it relies on an intricate neural network that spans not only the two hemispheres but also cortical and subcortical areas along the auditory pathway. More recent studies have taken both theoretical and translational approaches, re-examining the neural bases of psycholinguistic models of pitch processing and investigating the extent to which pitch processing can predict learning outcomes. These neuroscientific studies of linguistic pitch pattern processing have, however, largely been uninformed by formal prosody theories of the highly abstract mental structures that underlie various aspects of speech prosody (see chapters 4–6 for overviews). Likewise, the neural bases of formal prosody theories have remained largely elusive. Future work should consider what the brain could tell linguistic studies about prosody theories and, conversely, how prosody theories could shed light on the study of language neuroscience. One approach is to investigate the neural bases of prosodic representations postulated in linguistic theory. Recent advancements in multivariate analysis techniques of brain data,
given their sensitivity in classifying patterns of brain activity into fine-grained experimental conditions (Naselaris and Kay 2015), may shed light on the anatomical bases of phonological/prosodic features (cf. Mesgarani et al. 2014 on the anatomical bases of segmental features in the STG). Another approach would be to test the neural manifestation of various levels of the prosodic hierarchy (e.g. Nespor and Vogel 1986). The Oscillation-Based Functional Model postulates that ensembles of neurons oscillate in different frequency bands and phase-lock to information on different timescales within the same sensory signal (Giraud and Poeppel 2012). Hence, one hypothesis to test is whether neuronal oscillations within the delta frequency band are phase-locked to different levels of the prosodic hierarchy (extending Ding et al. 2016, which focused only on syntactic hierarchies). Such multi-level phase-locking may underlie the differential yet simultaneous processing of the various levels of prosodic information embedded in the same pitch pattern.
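
One way to operationalize this hypothesis is the phase-locking value (PLV), a common measure of how consistent the phase difference between two signals is over time. The sketch below assumes that instantaneous phases have already been extracted (in practice typically by band-pass filtering and a Hilbert transform, both omitted here) and compares a simulated band that tracks a hypothetical 1 Hz intonational-phrase rate with one that does not. Real prosodic units are not perfectly periodic, and all rates, lags, and noise levels here are illustrative assumptions.

```python
import cmath
import math
import random


def phase_locking_value(neural_phases, stimulus_phases):
    """PLV = |mean(exp(i * (phi_neural - phi_stimulus)))|: 1 = perfect locking, near 0 = none."""
    if not neural_phases or len(neural_phases) != len(stimulus_phases):
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (n - s)) for n, s in zip(neural_phases, stimulus_phases))
    return abs(total) / len(neural_phases)


def unit_phase(rate_hz, times):
    """Phase of an idealized, perfectly periodic prosodic unit recurring at `rate_hz`."""
    return [(2 * math.pi * rate_hz * t) % (2 * math.pi) for t in times]


rng = random.Random(1)
times = [i / 100 for i in range(3000)]   # 30 s sampled at a hypothetical 100 Hz
ip_phase = unit_phase(1.0, times)        # hypothetical 1 Hz intonational-phrase rate
# 'Locked' band: constant 0.8 rad lag plus jitter; 'unlocked' band: random phase.
locked = [p + 0.8 + rng.gauss(0.0, 0.3) for p in ip_phase]
unlocked = [rng.uniform(0.0, 2 * math.pi) for _ in ip_phase]
print(round(phase_locking_value(locked, ip_phase), 2))
print(round(phase_locking_value(unlocked, ip_phase), 2))
```

A test of multi-level locking would repeat such a comparison for the rate associated with each prosodic level, using phases extracted from real recordings.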

chapter 36

Prosody and Spoken-Word Recognition

James M. McQueen and Laura Dilley

36.1 Introduction

Each spoken utterance is potentially unique and is one of an infinite range of possible utterances. However, each is made from words that usually have been heard before, sampled from the finite set of words the speaker/listener knows. To understand the speaker’s intended message in any utterance, therefore, the listener must recognize the utterance’s words. We argue here that listeners achieve spoken-word recognition through Bayesian perceptual inference. Their task, over and over again for each word, is to infer the identity of the current word and build an interpretation, integrating current acoustic information with prior knowledge. In this chapter, we consider the role of ‘prosody’ in this process of perceptual recovery of spoken words.

36.2 Defining prosody in spoken-word recognition

We begin with a definition of ‘prosody’, not only because the term can mean different things to different people, but also because one of our goals is to highlight the utility of an abstract definition of prosody that has to do with structures built in the mind of the perceiver. Critically, this definition is tied to the cognition in question: the process of spoken-word recognition. Our definition therefore starts neither from linguistic material (words, sentences) nor from the acoustic properties of speech (e.g. spectral and durational features) but from a psychological perspective, focusing on the representations and processes listeners use as they understand speech. The basis of our definition is that, during word recognition, two types of structure are built in the listener’s mind. The first type is ‘segmental’, in that it is based on abstractions about segments, the traditional combinatorial ‘building blocks’ of words. The second type is ‘suprasegmental’ and relates to abstractions about the prominence,
accentuation, grouping, expressive tone of voice, and so on of syllables relative to each other and of words relative to each other. These suprasegmental structures are prosodic. Hence, to understand the role of prosody in word recognition is to have an adequate account not only of how these structures are built but also of how the segmental structures are built, and of how the two types of structure jointly support speech understanding. This definition thus highlights the interdependency, during processing, of signal characteristics often classified as ‘segmental’ and ‘suprasegmental’. For example, pitch characteristics (i.e. perceptual indices of fundamental frequency variation), often considered suprasegmental in the spoken-word recognition literature, may frequently contribute simultaneously to extracting both segmental and suprasegmental structures, as well as other kinds of structure (e.g. syntactic). In the same vein, acoustic characteristics relating to distributions of periodic (i.e. vocal fold vibration) or aperiodic energy, often considered segmental in the spoken-word recognition literature, contribute to extracting both segmental structures (e.g. words) and suprasegmental structures (e.g. prosodic phrase-level structures through domain-initial strengthening of segments; see later in this section), as well as other kinds of structure (e.g. syntactic). Again, this happens in an interdependent fashion across levels of structure. That such interdependences among different levels of structure exist in spoken-word recognition is consistent with the observation that lexical entries are defined in part by the constructs of ‘syllable’ and ‘stress’, each of which has both a ‘segmental’ and a ‘suprasegmental’ interpretation. That a given acoustic attribute (e.g.
fundamental frequency in speech, which gives rise to a harmonic spectrum) contributes simultaneously to the perception of both segmental and suprasegmental structures has long been recognized (e.g. Lehiste 1970). Consideration of this interdependence across levels of the linguistic hierarchy during structure extraction is also motivated by our perspective on speech recognition. In our view, a core challenge to be explained is how words are extracted from the speech stream in spite of considerable variability. That is, a spoken-word recognizer needs to be robust in the face of acoustic variability of various kinds, for example differences between phonological contexts, speakers, speaking styles, and listening conditions. We argue that redundancy in encoding multi-levelled tiers of structure across different kinds of acoustic information makes the system more robust to any one kind of acoustic degradation. That is, listeners build interlocking segmental and suprasegmental phonological structures as a means of solving the variability problem. We believe that our cognitive definition of prosody allows us to avoid several problems. In particular, we do not need to define particular types of acoustic cue as strictly either ‘segmental’ or ‘suprasegmental’. Such attempts imply that whatever phonetic properties are taken to define ‘suprasegmental’ (usually timing and pitch) are, by logical opposition, ‘not segmental’, and thus do not cue segmental contrasts. Such a view is highly problematic, as many researchers have noted (e.g. Lehiste 1970). Much work has documented the role of timing in cueing segmental contrasts, for both consonants (Lisker and Abramson 1964; Liberman et al. 1967; Wade and Holt 2005) and vowels (cf. vowel length or tenseness; Ainsworth 1972; Miller 1981).
Under our proposal, acoustic information can nevertheless be categorized as that which assists the listener in recognizing either the segments of an utterance (‘segmental information’) or its prosodic structure (‘suprasegmental information’). Our definition is in
service of the view that spoken-word recognition involves simultaneously recognizing the words being said, the prosodic (e.g. grouping, prominence) structures associated with those words, and the larger structures (e.g. syntactic) in which the words are embedded. On this view, it becomes easier to see how diverse acoustic cues, ranging from pitch to timing to allophonic variation, could be employed to help extract structure (lexical and otherwise) at various hierarchical levels. The same acoustic information can therefore help the listener to identify segmental and prosodic structures simultaneously. Take the case of domain-initial strengthening, in which acoustic cues for consonants and vowels tend to be strengthened (e.g. become longer or louder, or acquire glottal stops or other fortition) at the beginnings of prosodic domains (Dilley et al. 1996; Fougeron and Keating 1997; Turk and Shattuck-Hufnagel 2000; Cho and Keating 2001; Tabain 2003; Krivokapić and Byrd 2012; Beňuš and Šimko 2014; Garellek 2014; Cho 2016). Domain-initial strengthening affects pitch, timing, and spectral detail, and it involves systematic variation both at the lexical level, where it can help with lexical disambiguation (Cho et al. 2007), and at the utterance level, where it helps the listener with sentential parsing and interpretation building. That is, domain-initial strengthening concerns variation at (at least) two levels of structure simultaneously, which makes it an example of cross-talk between segmental and suprasegmental domains. Another example relates to the widespread use of pitch in the world’s languages to convey lexical contrast. Not only is pitch used throughout the lexicon to convey lexical contrasts in lexical tone languages (e.g. Mandarin, Thai, Igbo), but it also plays a role in distinguishing words in languages such as Japanese and Swedish (Bruce 1977; Beckman 1986; Heldner and Strangert 2001).
Even intonation languages (e.g. English, Spanish, German, and Dutch) include lexical contrasts based on stress (e.g. IMpact (N) vs. imPACT (V)) that may be signalled by a difference in pitch in many structural and communicative contexts, though certainly not all (Fry 1958; see also chapter 5). Indeed, the acoustic cues that signal lexical stress contrasts are many and varied: they include not only segmental vowel quality differences but also differences in timing, amplitude, spectral balance, and pitch (Beckman and Edwards 1994; Sluijter and van Heuven 1996a; Turk and White 1999; Mattys 2000; Morrill 2012; Banzina et al. 2016). Our definition also highlights how prosody can assist in the perceptual recovery of spoken words when the speech signal is degraded. For example, fine spectral details usually associated with segmental information can be replaced with a few frequency bands of noise, producing noise-vocoded speech, or the dynamic formants can be replaced with sine waves, producing sine-wave speech. Such degraded speech is often highly intelligible, especially with practice (Shannon et al. 1995; Dorman et al. 1997; Davis et al. 2005). This perceptual recovery of spoken words is possible partly because listeners can make contact with their lifetime experience of the timing and frequency properties of spoken words. This ability indicates that stored knowledge about word forms may include timing, pitch, and amplitude information. A critical feature of our fundamentally cognitive definition is thus that it refers not only to relevant acoustic information but also to relevant prior knowledge. To explore prosody in spoken-word recognition is therefore to ask how suprasegmental information and prior knowledge about prosodic structures, together with segmental information and prior knowledge about segments, jointly support speech comprehension.
We propose that the answer to this question is that speech recognition involves Bayesian inference.
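
The robustness argument (redundant cues to the same structure protect recognition against the loss of any one cue) can be illustrated with a toy naive-Bayes decision about whether a syllable is stressed. This is a sketch under strong assumptions: the cue names, Gaussian means, and standard deviations are invented for illustration, and treating the cues as independent is a simplification that real stress cues do not satisfy.

```python
import math

# Hypothetical Gaussian cue models: cue -> (mean if stressed, mean if unstressed, shared sd).
CUE_MODELS = {
    "duration_ms": (180.0, 110.0, 35.0),
    "intensity_db": (72.0, 66.0, 4.0),
    "f0_excursion_st": (3.0, 0.5, 1.5),
    "spectral_balance_db": (-6.0, -10.0, 2.5),
}


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def p_stressed(observed, prior=0.5):
    """P(stressed | cues), treating cues as independent; absent cues are simply skipped."""
    log_odds = math.log(prior / (1 - prior))
    for cue, value in observed.items():
        mean_s, mean_u, sd = CUE_MODELS[cue]
        log_odds += log_gauss(value, mean_s, sd) - log_gauss(value, mean_u, sd)
    return 1 / (1 + math.exp(-log_odds))


token = {"duration_ms": 170.0, "intensity_db": 71.0,
         "f0_excursion_st": 2.5, "spectral_balance_db": -7.0}
print(round(p_stressed(token), 3))
# Drop the pitch cue, as if it were unavailable in a degraded signal.
no_pitch = {cue: v for cue, v in token.items() if cue != "f0_excursion_st"}
print(round(p_stressed(no_pitch), 3))
```

Removing one cue lowers confidence somewhat, but the remaining cues still support the same decision, which is the sense in which redundancy buys robustness.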

36.3 The Bayesian Prosody Recognizer: robustness under variability

A growing body of evidence supports a Bayesian account of spoken-word recognition in which multiple interdependent hypotheses are considered simultaneously about the words being said, their component segments, and the aspects of expressiveness that are heard to accompany those words. According to this view, the linguistic structures that are perceived are those that ultimately best explain the sensory information experienced. Our proposal is that a Bayesian Prosody Recognizer (BPR) supports this inferential process by extracting prosodic structures (syllables, phrases) and words while deriving utterance interpretations. The BPR draws inspiration from other Bayesian models of speech recognition and understanding, and from analysis-by-synthesis approaches (Halle and Stevens 1962; Norris and McQueen 2008; Poeppel et al. 2008; Gibson et al. 2013; Kleinschmidt and Jaeger 2015), that envision the inferential, predictive process of spoken language understanding as involving simultaneous determination of multiple levels of linguistic structure, including hierarchical prosodic structures. In essence, the listener combines prior knowledge with signal-driven likelihoods, as prescribed by Bayes’s theorem, to obtain an optimal interpretation of the current input. The BPR also draws inspiration from previous accounts arguing that speech recognition requires parallel evaluation of segmental and suprasegmental interpretations (in particular the Prosody Analyzer of Cho et al. 2007). Evidence for predictive and inferential processes in speech recognition is reviewed in multiple sources (Pickering and Garrod 2013; Tavano and Scharinger 2015; Kuperberg and Jaeger 2016; Norris et al. 2016). A central motivation for the BPR is the variability problem introduced above: structure extraction needs to be robust in spite of variability in speech.
Bayesian inference is a response to this challenge because, given the listener’s knowledge, it yields the optimal interpretation of the current input. The BPR instantiates four key characteristics of prosodic processing in spoken-word recognition, each of which further specifies how the BPR ensures robust recognition under acoustic variability.
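
The core computation, combining a prior with a signal-driven likelihood, can be sketched for a finite set of candidate words. The candidates and all probabilities below are hypothetical; the cap/captain pair is borrowed from the duration studies discussed in §36.3.1.3 purely for illustration, and this is not an implementation of the BPR itself.

```python
def posterior(priors, likelihoods):
    """Bayes's theorem over a finite set of hypotheses h given evidence e:
    P(h | e) = P(e | h) P(h) / sum over h' of P(e | h') P(h').
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers: the prior (e.g. from context and word frequency) favours "captain",
# but a long [kæp] syllable is more probable if the word is the monosyllable "cap".
priors = {"cap": 0.3, "captain": 0.7}
likelihoods = {"cap": 0.8, "captain": 0.2}   # P(long syllable | word)
print(posterior(priors, likelihoods))
```

Here the durational evidence outweighs the prior, so "cap" ends up the more probable word; with a weaker likelihood ratio the prior would prevail.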

36.3.1 Parallel uptake of information

As we review in the following subsections, considerable evidence from studies examining the temporal dynamics of the recognition process supports our contention that timing and pitch characteristics constrain word identification, and that they do so at the same time as segmental information. In our view, parallel uptake of information has at least two important consequences. First, it makes it possible for structures to be extracted at different representational levels simultaneously. This can readily be instantiated in the BPR. Just as there can be, in a Bayesian framework, a hierarchy of segments (Kleinschmidt and Jaeger 2015), words (Norris and McQueen 2008), and sentences (Gibson et al. 2013), there can also be a Bayesian prosodic hierarchy, potentially from syllables up to intonational phrases. Second, it means that the same acoustic information can contribute simultaneously to the construction of different levels of linguistic representation, including the prosodic, phonological, lexical, and higher (syntactic, semantic, pragmatic) levels. In order to accomplish the above,
the BPR must analyse information across windows of varying sizes simultaneously (some quite long, as when recognizing a tune or determining turn-taking structures in discourse). To illustrate both consequences: as durational information for a prosodic word (i.e. a single lexical item) accumulates, it can also provide evidence for a phrase that contains that word, and evidence about that word in turn influences the interpretation of syntactic information, and so forth. Suprasegmental information (as acoustically defined) has been shown to influence recognition in at least four different ways.

36.3.1.1 Influences on processing segmental information

Segments belonging to stressed syllables in sentences are processed more quickly than those belonging to unstressed syllables (Shields et al. 1974; Cutler and Foss 1977). Segmental content in stressed syllables is more accurately perceived than that in unstressed syllables (Bond and Garnes 1980), and mispronounced segments are more easily detected in stressed syllables than in unstressed syllables (Cole and Jakimik 1978). Distortion of normal word stress information also impairs word processing and recognition (Bond and Small 1983; Cutler and Clifton 1984; Slowiaczek 1990, 1991). Recent findings indicate that categorization of speech segments is modulated by the type of prosodic boundary preceding those segments (Kim and Cho 2013; Mitterer et al. 2016). All of the above evidence supports the view that suprasegmental and segmental sources of acoustic information in words are the basis of parallel inference processes at multiple levels of linguistic structure. In keeping with this view, it has been shown that the same information (durational cues; Tagliapietra and McQueen 2010) can simultaneously help listeners to determine which segments they are hearing and the locations of word boundaries.

36.3.1.2 Influences on lexical segmentation

Consistent with the BPR, the metrical properties of a given syllable affect the likelihood that listeners infer the syllable to be word-initial (Cutler and Norris 1988; Cutler et al. 1997). For instance, in errors of perception, strong syllables are more likely to be heard as word-initial (Cutler and Butterfield 1992). There is evidence that listeners use multiple cues (some lexical and some signal-driven, based on segmental and suprasegmental acoustic properties) to segment continuous speech into words (Norris et al. 1997). Suprasegmental cues appear to play a more important role under more difficult listening conditions. For example, the tendency to assume that strong syllables are word-initial is stronger when stimuli are presented in background noise than when there is no noise (Mattys 2004; Mattys et al. 2005).
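
Within a Bayesian account, a greater reliance on metrical cues in noise can follow from cue weighting: if noise makes segmental evidence less reliable, the metrical prior carries more of the decision. In the toy sketch below, noise is modelled by tempering the segmental likelihood ratio with a reliability weight. This weighting scheme and all numbers are our assumptions for illustration, not a model proposed in the cited studies.

```python
import math


def p_word_initial(metrical_prior, segmental_lr, reliability):
    """Posterior probability that a strong syllable is word-initial.

    metrical_prior: P(word-initial) from the syllable's metrical strength alone.
    segmental_lr: P(evidence | initial) / P(evidence | not initial) from lexical and
        segmental cues.
    reliability: hypothetical weight in [0, 1]; noise lowers it, pushing the segmental
        likelihood ratio toward 1 (uninformative) via a tempered likelihood.
    """
    prior_log_odds = math.log(metrical_prior / (1 - metrical_prior))
    log_odds = prior_log_odds + reliability * math.log(segmental_lr)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical: metrical strength favours a boundary (0.75), segmental cues mildly oppose it.
quiet = p_word_initial(0.75, segmental_lr=0.25, reliability=1.0)
noisy = p_word_initial(0.75, segmental_lr=0.25, reliability=0.2)
print(round(quiet, 2), round(noisy, 2))
```

In quiet, the segmental evidence tips the decision against a boundary; in noise, the same evidence counts for less and the metrical prior wins.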

36.3.1.3 Influences on lexical selection

Suprasegmental pronunciation modifications modulate which words listeners consider and which words they eventually recognize. For example, subtle differences in the durations of segments or whole syllables can help listeners to determine the location of syllable boundaries (Tabossi et al. 2000), word boundaries (Gow and Gordon 1995), and prosodic boundaries (e.g. in distinguishing a monosyllabic word such as cap from the initial syllable of a longer word such as captain; Davis et al. 2002; Salverda et al. 2003; Blazej and Cohen-Goldberg 2015). Additional kinds of suprasegmental acoustic-phonetic information, including pitch and intensity, also modulate perception of syllable boundaries (Hillenbrand and Houde 1996; Heffner et al. 2013; Garellek 2014). The rapidity with which
these kinds of lexical disambiguation take place (e.g. as measured with eye tracking; Salverda et al. 2003) indicates that suprasegmental processing is not delayed relative to segmental processing. Variation in pronunciation associated with distinct positions of words in prosodic phrases (e.g. whether the two words in the phrase ‘bus tickets’ span an intonational phrase boundary or not) has also been shown to modulate lexical selection (Christophe et al. 2004; Cho et al. 2007; see also Tremblay et al. 2016, 2018 for similar non-native language effects). Some earlier studies (Cutler and Clifton 1984; Cutler 1986) suggested that stress differences cued by suprasegmental information (e.g. the distinction between the ‘ancestor’ and ‘tolerate’ senses of ‘forbear’, which is not due to a difference in the segments of the words; Cutler 1986) did not substantially constrain lexical access. Subsequent experiments, however, have indicated that stress does constrain lexical access, albeit to different extents in different languages, as a function of the informational value of suprasegmental stress cues in the language in question (Cutler and van Donselaar 2001; Soto-Faraco et al. 2001; Cooper et al. 2002). For example, the influence of suprasegmental stress cues on word recognition is stronger in Dutch, where such cues tell listeners more about which words have been spoken, than in English, where segmental differences are more informative (Cooper et al. 2002). Eye-tracking studies indicate that suprasegmental cues to stress are taken up without delay and can thus support lexical disambiguation before any segmental cues could disambiguate the input (Reinisch et al. 2010; Brown et al. 2015). Relatedly, work on word recognition in tone languages has shown how pitch characteristics of the input constrain word identification in parallel with the uptake of segmental information (Lee 2009; Sjerps et al. 2018).
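
The notion of the ‘informational value’ of stress cues can be made concrete as expected information gain: how many bits, on average, identifying the stress pattern removes from the uncertainty over the words still compatible with the segments heard so far. The cohorts below are invented caricatures of a Dutch-like case (A), where segmentally matching competitors differ only in stress, and an English-like case (B), where a vowel difference has already excluded the competitor with different stress. The sketch is not a model of the cited experiments.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy (bits) of a distribution given by unnormalized weights."""
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)


def stress_information_gain(cohort):
    """Expected bits gained by identifying the stress pattern, given a segmental cohort.

    cohort: (word, stress_pattern, frequency) triples for all words still matching the
    segments heard so far.
    """
    h_before = entropy([f for _, _, f in cohort])
    groups = defaultdict(list)
    for _, stress, f in cohort:
        groups[stress].append(f)
    total = sum(f for _, _, f in cohort)
    h_after = sum(sum(freqs) / total * entropy(freqs) for freqs in groups.values())
    return h_before - h_after


# Hypothetical cohorts. A: segmentally identical onsets differ only in stress.
# B: the competitor with different stress was already excluded by a vowel difference.
cohort_a = [("word1", "SW", 10), ("word2", "WS", 10)]
cohort_b = [("word1", "SW", 10), ("word3", "SW", 5)]
print(stress_information_gain(cohort_a), stress_information_gain(cohort_b))
```

In case A the stress cue resolves one full bit of uncertainty; in case B it adds nothing, because the remaining candidates share a stress pattern.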

36.3.1.4 Influences on inferences about other structures

Consistent with the BPR, there is considerable evidence that suprasegmental information influences the listener’s inferences about various levels of structure beyond the word level, simultaneously, in real time. The focus of this chapter is on spoken-word recognition, but since perception of lexical forms influences higher levels of linguistic structure and inference, it is important to note that there is evidence that prosody and other higher levels of linguistic knowledge are extracted in parallel. That is, perception of prosodic information and perception of syntactic structure are interdependent (Carlson et al. 2001; Buxó-Lugo and Watson 2016) and prosody influences semantic and pragmatic inference (Ito and Speer 2008; Rohde and Kurumada 2018).

36.3.2 High contextual dependency

Another characteristic of prosodic processing in spoken-word recognition is its high contextual dependency. That is, the interpretation of the current prosodic event depends on the context that occurs before and/or after that event. Context can be imagined as a timeline, where ‘left context’ temporally precedes an event and ‘right context’ follows it.

36.3.2.1 Left-context effects

Under the BPR account, regularities in context that are statistically predictive of properties of upcoming words will be used to infer lexical properties of upcoming words, giving rise
to left-context effects. It is well attested that suprasegmental aspects of sentential context affect the speed with which sentence elements are processed. For example, suprasegmental cues in a sequence of words preceding a given word affect processing speed on that word (Cutler 1976; Pitt and Samuel 1990) and accuracy of word identification (Slowiaczek 1991). The rhythm of stressed and unstressed syllables is an important cue for word segmentation in continuous speech (Nakatani and Schaffer 1978). Further, a metrically regular speech context has been shown to promote spoken-word recognition (Quené and Port 2005). Our BPR proposal accounts for these findings in terms of statistical inference on the basis of regularities in the speech signal. Structures in utterances formed by prosodic (e.g. rhythmic) patterning in production make the structure and timing of upcoming sentential elements predictable in perception (Martin 1972; Jones 1976) at multiple hierarchical levels and points (Liberman and Prince 1977). Statistical regularities in stress alternation and timing are attested in speech production experiments, corpus studies, and theoretical linguistics (Selkirk 1984; Kelly and Bock 1988; Hayes 1995; Farmer et al. 2006). Changes in the priors of a Bayesian model can easily account for the effects of left prosodic context (and other types of preceding context) on recognition of the current word. Contextual influences of suprasegmental cues on the perception of segmental information (e.g. voice onset time) are well known, particularly for timing (Miller and Liberman 1979; Repp 1982; Kidd 1989) but also for pitch (Holt 2006; Dilley and Brown 2007; Dilley 2010; Sjerps et al. 2018). However, such effects have by and large been found to involve fairly proximal speech context, within about 300 ms of a target segment (Summerfield 1981; Kidd 1989; Newman and Sawusch 1996; Sawusch and Newman 2000; but see Wade and Holt 2005).
More recent work has shown that suprasegmental information from the more distant (‘distal’) left context can also influence which words are heard, including how syllables are grouped into words and even whether certain words (and hence certain phonemes) are heard at all. For example, the rate of distal context speech influences whether listeners hear reduced words, such as ‘are’ spoken as ‘err’ (Dilley and McAuley 2008; Pitt et al. 2016). Statistical distributions of distal contextual speech rates influence listeners’ word perception over the course of around one hour (Baese-Berk et al. 2014). Further, the patterns of pitch and timing on prominent and non-prominent syllables in the left context influence where listeners hear word boundaries in lexically ambiguous sequences such as crisis turnip vs. cry sister nip (Dilley and McAuley 2008; Dilley et al. 2010; Morrill et al. 2014a). These patterns also influence the extent to which listeners hear reduced words or syllables (Morrill et al. 2014b; Baese-Berk et al. 2019). Distal rate and rhythm influence lexical processing early in perception and modulate the extent to which lexically stressed syllables are heard to be word-initial (Brown et al. 2011b, 2015; Breen et al. 2014). Consistent with the BPR, whether a listener hears a word depends, in a gradient, probabilistic fashion, on the joint influence of distal rate cues and proximal information signalling a word boundary (Heffner et al. 2013).
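
A gradient, probabilistic combination of distal and proximal cues can be sketched as a logistic function of the two cues. This is the form that the posterior of a Bayesian two-category classifier takes when each cue has a Gaussian distribution with equal variance in both categories. The cue coding, the direction of the distal-rate effect as coded here, and all weights are assumptions for illustration, not estimates from the cited studies.

```python
import math


def p_hear_word(context_rate, target_rate, proximal_evidence,
                bias=0.0, w_distal=4.0, w_proximal=1.5):
    """Hypothetical logistic model of whether a reduced function word is heard.

    distal cue: log ratio of distal-context speech rate to target-region rate (both in
        syllables per second); in this sketch, a context slower than the target lowers
        the probability (our assumed coding of the effect).
    proximal_evidence: standardized strength of local acoustic evidence for the word.
    The bias and weights are invented for illustration.
    """
    distal = math.log(context_rate / target_rate)
    z = bias + w_distal * distal + w_proximal * proximal_evidence
    return 1 / (1 + math.exp(-z))


# Gradient, joint effects: rows vary the distal context rate, columns the proximal evidence.
for context_rate in (4.0, 5.0, 6.0):
    row = [round(p_hear_word(context_rate, 5.0, e), 2) for e in (-1.0, 0.0, 1.0)]
    print(context_rate, row)
```

Neither cue alone fixes the outcome: the same proximal evidence yields different probabilities under different distal rates, and vice versa.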

36.3.2.2 Right-context effects

Information that follows can be informative about lexical content that has already elapsed. A growing body of evidence indicates that listeners often commit to an interpretation of lexical content only after the temporal offset of that content (Bard et al. 1988; Connine et al. 1991; Grossberg and Myers 2000; McMurray 2007). In segmental perception,
temporal information to the right of a given segment can influence listeners’ judgements about that segment (e.g. Miller and Liberman 1979). Eye-tracking studies show that later-occurring distal temporal information (e.g. the relative duration of a subsequent phoneme sequence that includes the morpheme /s/) can influence whether listeners hear a preceding reduced function word (Brown et al. 2014). All of these findings indicate that acoustic information must be held in some kind of memory buffer, and hence that perceptual decisions can be delayed until after the acoustic offset of that information. The extent to which listeners hold alternative parses in mind after a portion of signal consistent with a given word has elapsed, as opposed to abandoning them, is an active area of research and debate (Christiansen and Chater 2016). While the effects of right context might at first glance appear more problematic, they too can be explained in a Bayesian framework. The key notion here is that different hierarchical levels of structure and constituency (e.g. segments, syllables, words, prosodic phrases) entail different time windows over which relevant evidence is collected and applied to generate inferences about representations at that level. This implies that acoustic evidence at a given moment might be taken as highly informative about structure at one level while simultaneously being taken as only weakly informative (or indeed uninformative) about structure at another level. Depending on the imputed reliability of the evidence as it pertains to each level, inferences about structure at different levels may be made at different rates (i.e. staggered in time). Because evidence bearing on the structure of a larger constituent (e.g. a prosodic phrase) typically appears in the signal over a longer time span than evidence bearing on the structure of a smaller one (e.g.
a syllable), completing the inferences about the larger constituent may often require consideration of evidence from some amount of subsequent ‘right-context’ material. This apparent delay in inferences about the structure of the larger constituent does not imply that the BPR fails to use all available information simultaneously, or that it fails to draw inferences at different levels simultaneously. Rather, it implies only that in some cases the current information is insufficient for inferences at a given level of structure to be made with confidence, and hence that the BPR may wait for further information in the upcoming context before committing to an interpretation of structure at that level. This view also entails that later-occurring information might show that an earlier assumption about structure was not well supported, and hence that inferences drawn earlier may be revised.
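
Staggered commitment can be sketched as sequential evidence accumulation at each level, with a level committing once its posterior crosses a confidence threshold. In this toy example the same frames feed a word level and a phrase level, but the phrase-level evidence is weak until a late, right-context frame. The per-frame log-likelihood ratios, the flat prior, and the 0.95 threshold are all hypothetical.

```python
import math


def commit_frames(llrs_by_level, threshold=0.95):
    """First frame at which each level's posterior for its leading hypothesis reaches
    `threshold`, starting from a flat prior; None if it never does.

    llrs_by_level: level -> per-frame log-likelihood ratios (hypothetical values).
    Negative values count against the hypothesis; this sketch stops at the first
    commitment and does not model later revision.
    """
    frames = {}
    for level, llrs in llrs_by_level.items():
        log_odds = 0.0          # flat prior: P = 0.5
        frames[level] = None
        for i, llr in enumerate(llrs):
            log_odds += llr
            if 1 / (1 + math.exp(-log_odds)) >= threshold:
                frames[level] = i
                break
    return frames


# The same eight frames feed both levels. Word-level evidence is strong early on;
# phrase-level evidence (e.g. a boundary cue) arrives only in the right context.
evidence = {
    "word":   [1.2, 1.0, 0.9, 0.2, 0.1, 0.0, 0.0, 0.0],
    "phrase": [0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 1.5, 1.2],
}
print(commit_frames(evidence))
```

Both levels use every frame from the start; the phrase level simply needs more evidence before it is confident, so its commitment lags.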

36.3.2.3 Syntagmatic representation of pitch

Phonological interpretation of pitch cues in spoken language comprehension requires consideration of both left and right pitch context (Francis et al. 2006; Sjerps et al. 2018). Left and right context are also important when listeners draw abstractions about the tonal properties of a given syllable, including those relevant to perceiving distinct lexical items (Wong and Diehl 2003; Dilley and Brown 2007; Dilley and McAuley 2008). Such findings support a view in which the representation of linguistically relevant pitch information is fundamentally syntagmatic (i.e. relational), and in which paradigmatic aspects of tonal information involve inferences driven by abstract knowledge about a typical speaker’s pitch range in relation to incoming pitch information (Dilley 2005, 2008; Lai 2018; Dilley and Breen, in press). The BPR adopts this view.
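
A minimal sketch of a syntagmatic (relational) pitch representation expresses a syllable’s f0 in semitones relative to its neighbours on both sides and labels it relative to that context. The window size, the geometric-mean reference, and the ±1 semitone margin are our assumptions, and the sketch does not model the paradigmatic, range-based inferences also described above.

```python
import math
import statistics


def relative_pitch_st(f0_contour, target_index, window=2):
    """Target syllable's f0 relative to its left and right context, in semitones.

    The reference is the geometric mean of up to `window` syllables on each side, so the
    value depends on the surrounding pitch rather than on any absolute Hz level.
    """
    left = f0_contour[max(0, target_index - window):target_index]
    right = f0_contour[target_index + 1:target_index + 1 + window]
    context = left + right
    if not context:
        raise ValueError("at least one context syllable is needed")
    reference = statistics.geometric_mean(context)
    return 12 * math.log2(f0_contour[target_index] / reference)


def tone_label(st, margin=1.0):
    """Hypothetical relational labelling: H above the context, L below, M otherwise."""
    return "H" if st > margin else "L" if st < -margin else "M"


# Hypothetical per-syllable f0 (Hz): the same 180 Hz target in two talkers' contours.
low_voice = [120, 125, 180, 130, 118]
high_voice = [240, 250, 180, 245, 235]
for contour in (low_voice, high_voice):
    st = relative_pitch_st(contour, target_index=2)
    print(f"{st:+.1f} st -> {tone_label(st)}")
```

The identical 180 Hz target is labelled high in one contour and low in the other, which is the sense in which the representation is relational rather than absolute.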

36.3.3 Adaptive processing

The perceptual apparatus must dynamically adapt to variation in order to remain robust in understanding intended messages. The available evidence suggests that prosodic processing is indeed very flexible. For instance, listeners adapt rapidly to the rate of compressed speech (Dupoux and Green 1997). The evidence just reviewed on context effects shows that listeners track characteristics of the current speech (e.g. distributional properties of speaking rate variation and the metrical properties of utterances) and flexibly adjust to that context (Dilley and McAuley 2008; Dilley and Pitt 2010; Baese-Berk et al. 2014; Morrill et al. 2015). Another way in which prosodic processing has been shown to be adaptive is that it involves perceptual learning. It has been established that listeners can adapt to variation in the realization of segments (Norris et al. 2003; Samuel and Kraljic 2009): they tune in, as it were, to the segmental characteristics of the speech of the current talker. It is thus plausible to expect that there are similar adjustments with respect to suprasegmental characteristics. There is indirect evidence that this may be the case. Listeners adapt to the characteristics of accented as well as distorted speech (Bradlow and Bent 2008; Mitterer and McQueen 2009; Borrie et al. 2012; Baese-Berk et al. 2013), which presumably includes adjustments to suprasegmental features. But there is also more direct evidence. Dutch listeners in a perceptual-learning paradigm can adjust the way they interpret the reduced syllables of a particular Dutch speaker (Poellmann et al. 2014), and Mandarin listeners adjust the way they interpret the tonal characteristics of syllables through exposure to stimuli with ambiguous pitch contours in contexts that encourage a particular tonal interpretation (Mitterer et al. 2011). The BPR therefore needs to be flexible.
Detailed computational work on perceptual learning in a Bayesian model with respect to speech segments has already been performed (Kleinschmidt and Jaeger 2015). The argument, in a nutshell, is that learning is required for the listener to be able to recognize speech optimally, in the context of an input that is noisy and highly variable due, for instance, to differences between talkers (Norris et al. 2003; Kleinschmidt and Jaeger 2015). That is, the ideal observer needs to be an ideal adapter. Exactly the same arguments apply to prosodic variability. Learning processes, for example based on changes in the probability density function of a given prosodic constituent for a given idiosyncratic talker, should be instantiated in the BPR in a similar way to those already implemented for segments.
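The idea of an ideal adapter can be sketched with a conjugate Bayesian update. In this minimal sketch, which assumes a single Gaussian cue (here, syllable duration) per prosodic category with a known token variance, the listener starts from a talker-independent prior over each category's mean and shifts it toward the current talker's productions. Kleinschmidt and Jaeger (2015) develop a fuller treatment; this simplified sketch is not their model, and the category names, durations and variances are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CategoryBelief:
    """Belief about the mean of one cue (e.g. syllable duration, ms) for one category.

    The cue's token variance is treated as known (a simplifying assumption).
    """
    mean: float     # current estimate of the category mean
    var: float      # uncertainty about that estimate
    cue_var: float  # assumed variance of individual tokens around the mean

    def update(self, x: float) -> "CategoryBelief":
        """Normal-normal conjugate update after hearing one token with cue value x."""
        new_var = 1.0 / (1.0 / self.var + 1.0 / self.cue_var)
        new_mean = new_var * (self.mean / self.var + x / self.cue_var)
        return CategoryBelief(new_mean, new_var, self.cue_var)

    def predictive(self, x: float) -> float:
        """Posterior predictive density of a new token x (Gaussian)."""
        v = self.var + self.cue_var
        return math.exp(-((x - self.mean) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)


def p_category_a(x, a, b, prior_a=0.5):
    """Posterior probability that token x belongs to category a rather than b."""
    la = a.predictive(x) * prior_a
    lb = b.predictive(x) * (1 - prior_a)
    return la / (la + lb)


# Hypothetical talker-independent priors for 'short' vs 'long' syllables (ms).
short = CategoryBelief(mean=150.0, var=400.0, cue_var=900.0)
long_ = CategoryBelief(mean=250.0, var=400.0, cue_var=900.0)
print(round(p_category_a(200.0, short, long_), 2))  # 0.5: ambiguous before exposure

# Exposure to a talker whose 'short' syllables are unusually long.
for token in [185.0, 190.0, 195.0, 188.0]:
    short = short.update(token)
print(round(p_category_a(200.0, short, long_), 2))  # now favours 'short'
```

After a few tokens the talker-specific estimate of the 'short' mean has moved from 150 ms to roughly 175 ms, so an ambiguous 200 ms syllable is now more likely to be interpreted as 'short' for this talker.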

36.3.4 Phonological abstraction The final characteristic of prosodic processing in spoken-word recognition is that it is based on phonological abstraction. The listener must be able to form abstractions so as to remain optimally robust and capable of handling not-yet-encountered variation. Phonological abstraction is thus also a feature of the BPR. As in the previous Bayesian accounts focusing on segmental recognition (Norris and McQueen 2008; Kleinschmidt and Jaeger 2015), the representations that inferences are drawn about are abstract categories so that (as the adaptability of the BPR also guarantees) the recognition process is robust to variation due to differences across talkers and listening situations. Evidence suggests that the abstractions
about categories entail generalizations about segmental structures and allophonic variation (Mitterer et al. 2018); lexical stress and tone (Sulpizio and McQueen 2012; Ramachers 2018; Sjerps et al. 2018); pitch accent, pitch range, and boundary tone types (Cutler and Otake 1999; Dilley and Brown 2007; Dilley and Heffner 2013); and relationships between phonological elements and other aspects of the linguistic structure of information, such as grammatical categories (Kelly 1992; Farmer et al. 2006; Söderström et al. 2017). Prosodic processing in speech recognition appears to involve phonological abstraction. One line of evidence for this comes from the learning studies just reviewed. If perceptual learning generalizes to the recognition of words that have not been heard during the exposure phase, then some type of abstraction must have taken place: the listener must know which entities to apply the learning to (cf. McQueen et al. 2006). The studies on learning about syllables (Poellmann et al. 2014) and tones (Mitterer et al. 2011) both show generalization of learning to the recognition of previously unheard words. Experiments on learning novel words also provide evidence that listeners have abstract knowledge about prosody. In these experiments (on prosodic words in Dutch: Shatzman and McQueen 2006; on lexical stress in Italian: Sulpizio and McQueen 2012), listeners learned new minimal pairs of words, and the new words were then acoustically altered to remove suprasegmental cues that distinguished between the pairs. In the final test phase, the listeners heard the altered (training) words and their unaltered (original) variants. 
Eyetracking measures revealed that the listeners had knowledge about the suprasegmental cues that they could apply to the online recognition of the novel words, even though they had never heard those words with those cues (for the Dutch listeners, durational cues distinguishing monosyllabic words from the initial syllables of disyllabic words; for the Italian listeners, durational and amplitude cues to antepenultimate stress in trisyllabic words). These findings suggest that processing of prosody in spoken-word recognition involves not only the uptake of fine-grained acoustic-phonetic cues to prosodic structure but also the storage of abstract knowledge about those cues. That is, while the fine phonetic details about the prosody in the current utterance are key determinants of word recognition and speech comprehension, the listener abstracts over those details in order to be able to understand future utterances. Listeners also form phonological abstractions based on long-term knowledge of phonetic properties of talker attributes, such as gender (Johnson et al. 1999; Lai 2018), that contribute to Bayesian inferences about spoken words and other aspects of linguistic meaning. Phonological abstractions are also formed based on simultaneous or sequential statistical correspondences between phonetic properties, such as pitch and non-modal voice quality, which co-vary in many lexical tone languages (Gordon and Ladefoged 2001; Gerfen and Baker 2005; Garellek and Keating 2011; Garellek et al. 2013). Such phonological abstraction, formed from long-term statistical knowledge of correspondences, is essential for drawing correct inferences based on otherwise highly ambiguous suprasegmental cues (including those for pitch and duration) about, for example, intended words, meaning, and structure (Gerfen and Baker 2005; Bishop and Keating 2012; Lai 2018). For instance, knowledge about co-occurrences of pitch and spectral (e.g. 
formant frequency) information for male versus female voices can be used to infer a typical or mean pitch of a talker’s voice and/or pitch span, from which Bayesian inferences can be drawn about phonological structures (such as those for pitch accents and lexical tones) and associated meanings (Dilley 2005; Dilley and Breen, in press). The BPR assumes that such long-term
abstracted statistical knowledge about talkers and the simultaneous and sequential distributional properties of the phonetic cues they produce is, along with talker-independent abstract phonological knowledge, the basis of the Bayesian probabilistic inferences that enable optimal decoding of spoken signals.
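The chain of inference described here (spectral cues to talker attributes, talker attributes to an expected pitch range, and pitch range to a tonal category) can be illustrated with a toy model. Everything in it is a hypothetical assumption: the two talker classes, a single 'mean formant' spectral cue, the formant and f0 values, equal priors, and Gaussian tone targets 3 semitones above and below the talker's mean f0. It shows only how the same f0 value can receive different tonal interpretations once long-term knowledge about the talker is taken into account.

```python
import math


def gauss(x, mu, sd):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical long-term knowledge about two talker classes:
# a spectral cue (a single 'mean formant' value, Hz) and a typical mean f0 (Hz).
TALKERS = {
    "male": {"formant_mu": 1450.0, "formant_sd": 100.0, "f0_mean": 120.0},
    "female": {"formant_mu": 1650.0, "formant_sd": 100.0, "f0_mean": 210.0},
}
# Hypothetical tone targets in semitones relative to the talker's mean f0.
TONES = {"H": 3.0, "L": -3.0}
TONE_SD = 2.0


def p_talker(formant_hz):
    """Posterior over talker classes given the spectral cue (equal priors)."""
    lik = {k: gauss(formant_hz, v["formant_mu"], v["formant_sd"])
           for k, v in TALKERS.items()}
    z = sum(lik.values())
    return {k: val / z for k, val in lik.items()}


def p_high(f0_hz, formant_hz):
    """P(tone = H | f0, spectral cue), marginalizing over talker class.

    Within each talker class, H and L are given equal prior probability.
    """
    total = 0.0
    for talker, p_t in p_talker(formant_hz).items():
        st = 12.0 * math.log2(f0_hz / TALKERS[talker]["f0_mean"])
        lik = {tone: gauss(st, mu, TONE_SD) for tone, mu in TONES.items()}
        total += p_t * lik["H"] / (lik["H"] + lik["L"])
    return total


# The same 165 Hz syllable, heard in two different voices.
print(round(p_high(165.0, formant_hz=1400.0), 2))  # high: likely an H tone
print(round(p_high(165.0, formant_hz=1700.0), 2))  # low: likely an L tone
```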

36.4 Conclusions and future directions We have argued that spoken-word recognition is robust under speech variability because it is based on Bayesian perceptual inference and that a vital component of this process is the BPR. As a spoken utterance unfolds over time, the BPR, based on prior knowledge about correspondences between acoustic variables, on the one hand, and meanings and structures, on the other, makes Bayesian inferences about the prosodic organization, lexical content, and semantic and pragmatic information in the utterance, among other inferences. These inferences are both signal- and knowledge-driven, and concern abstract structures at different levels in the prosodic hierarchy that are computed in parallel, informed by statistical distributions of relationships between acoustic cues often considered segmental or suprasegmental. Inferences about a given stretch of input are influenced by earlier input and by inferences about it, and can be revised based on later input. Importantly, the BPR adapts to current input to optimize its inferences. We have suggested that the goal of the BPR is to derive the metrical and grouping structures in each utterance at different levels in the prosodic hierarchy. Especially for utterance-level inferences, the representation must include a sparse set of tones, including pitch accents, boundary tones, and/or lexical tones, which are autosegmentally associated with particular positions in metrical and grouping structures indexed to the lexicon (Gussenhoven 2004; Ladd 2008b; Dilley and Breen, in press). Establishing how listeners recover this prosodic hierarchy, and how many levels need to be built, are important challenges for future research. The BPR will need to be implemented as part of a full Bayesian model of speech recognition, which includes, but is not limited to, prosodic inferences. 
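The claim that inferences are shaped by earlier input and can be revised by later input amounts to applying Bayes' rule repeatedly as cues arrive. The sketch below is hypothetical throughout: two invented candidate words that share an initial syllable, Gaussian likelihoods for that syllable's duration (echoing the durational cues distinguishing monosyllabic words from the initial syllables of disyllabic words mentioned above), and made-up probabilities for the identity of the following syllable.

```python
import math


def gauss(x, mu, sd):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def update(prior, likelihoods):
    """One step of Bayes' rule over a dict of hypotheses."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}


# Hypothetical candidates sharing the initial syllable [hæm].
belief = {"ham": 0.5, "hamster": 0.5}

# Cue 1 (prosodic): syllable duration in ms; monosyllabic words are assumed longer.
duration = 270.0
belief = update(belief, {"ham": gauss(duration, 300.0, 40.0),
                         "hamster": gauss(duration, 230.0, 40.0)})
print({h: round(p, 2) for h, p in belief.items()})  # leans toward "ham"

# Cue 2 (later input): the next syllable is [stə], which is far more
# probable if the word is "hamster".
belief = update(belief, {"ham": 0.01, "hamster": 0.95})
print({h: round(p, 2) for h, p in belief.items()})  # revised toward "hamster"
```

The first, duration-based inference is not discarded but is carried forward as the prior for the next step; later segmental input then overturns it.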
Our view is that segmental and suprasegmental structures are built in parallel, based on information that may inform inferences about either or both types of structure. Over time, inferences about prosodic structure feed into (and are in turn influenced by) inferences made about segments and words of the unfolding utterance and its current interpretation. The model will need to specify how interacting processes determine spoken-word recognition and how inferences drawn about the speech signal change over time. It will also need to be tested, through simulations and experimentation. One way to evaluate and develop the BPR would be to compare it to other models on the role of prosody in spoken-word recognition. Unfortunately, no such alternative models currently exist. Shuai and Malins (2017) have recently proposed TRACE-T, an implementation of TRACE (McClelland and Elman 1986) that seeks to account for the processing of tonal information in Mandarin monosyllabic words. While this is a very welcome addition to the literature, TRACE-T is much more limited in scope than the BPR. Comparisons could potentially also be made to the Prosody Analyzer (Cho et al. 2007; but the BPR can be seen as a development of that model) and to Shortlist B (Norris and McQueen 2008; but Shortlist B
is limited, with respect to prosody, to the role of metrical structure in lexical segmentation, and again the BPR is largely inspired by the earlier model). Detailed comparisons of the BPR to other models (e.g. Kurumada et al. 2018) will have to wait for the implementation of the BPR and for the development of competitor models of equivalent scope. Another important aspect of future work will be cross-linguistic comparison. Most work on prosody in spoken-word recognition has been done on English or a small set of related European languages. There are some exceptions to this Eurocentric bias (Cutler and Otake 1999; Ye and Connine 1999; Lee 2007), and there has been an upsurge of work on, for example, pitch cues in conveying lexical and other meanings in typologically diverse languages (Kula and Braun 2015; Ramachers 2018; Sjerps et al. 2018; Wang et al. 2018; Yamamoto and Haryu 2018; Genzel and Kügler, in press). Much research is nevertheless still needed to explore how the full set of prosodic phenomena in the world’s languages modulates the recognition process. We do not expect that experiments on non-European languages will lead to falsification of the Bayesian model. Pitch, for example, conveys different kinds of structure simultaneously in a given language, and is used to convey lexical information to different degrees in different languages. In a Bayesian statistical sense, pitch is simply less informative about lexical structure in intonation languages than in tone languages, and will thus be relied on less in discriminating between and recognizing words in intonation languages. While such cross-linguistic differences can thus readily be captured in a Bayesian model, it will be important to explore how pitch information can simultaneously inform inferences about words and inferences about intonational structures in a tone language, and how this weighting changes in intonation versus lexical tone languages. 
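The claim that pitch is less informative about lexical structure in intonation languages than in tone languages can be made concrete as the mutual information between a pitch cue and word identity. The toy sketch below assumes two hypothetical joint distributions over two words and two pitch patterns. In the 'tone language' table, pitch largely distinguishes the words; in the 'intonation language' table, pitch is statistically independent of word identity (being driven by intonation instead), so it carries no information about which word was said.

```python
import math


def mutual_information(joint):
    """I(Word; Pitch) in bits for a joint distribution {(word, pitch): prob}."""
    p_word, p_pitch = {}, {}
    for (w, p), q in joint.items():
        p_word[w] = p_word.get(w, 0.0) + q
        p_pitch[p] = p_pitch.get(p, 0.0) + q
    return sum(q * math.log2(q / (p_word[w] * p_pitch[p]))
               for (w, p), q in joint.items() if q > 0)


# Hypothetical joint distributions over two words and two pitch patterns.
tone_language = {("ma1", "high"): 0.45, ("ma1", "falling"): 0.05,
                 ("ma4", "high"): 0.05, ("ma4", "falling"): 0.45}
intonation_language = {("cat", "high"): 0.25, ("cat", "falling"): 0.25,
                       ("hat", "high"): 0.25, ("hat", "falling"): 0.25}

print(round(mutual_information(tone_language), 3))        # about half a bit
print(round(mutual_information(intonation_language), 3))  # 0.0
```

Real languages fall between these extremes; on the Bayesian view, a listener's reliance on pitch for word recognition should track this kind of informativeness.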
The Bayesian model will need to be developed in the direction of neurobiological implementation. As in psycholinguistic research (including computational modelling), much work in cognitive neuroscience focuses on how segments (e.g. individual consonants or vowels) are recognized, and how that contributes to word recognition. Prosody has tended to be ignored. There are some interesting new approaches, for example evidence of neural entrainment to oscillations at about 4 Hz, the rate at which syllables tend to be spoken (i.e. the ‘syllable rate’) (Giraud and Poeppel 2012; Ding et al. 2017). Nevertheless, much work still needs to be done to specify the brain mechanisms that support spoken-word recognition as a process that depends on parallel inferences about segmental content and prosodic structures (e.g. whether entrainment is modulated by information arriving in the speech signal at faster or slower rates than 4 Hz). It will also be necessary to specify how the proposed model relates to other aspects of language processing, speech production in particular. Knowledge that is needed to support recognition (e.g. the acoustic characteristics of words with penultimate stress) may not be relevant in speech production. It remains to be determined whether and to what extent the processes and representations involved in recognition are shared with those involved in production. It is already clear, however, that there is an intimate relationship between input and output operations. For example, the Bayesian recognition process depends on the ability of the recognizer to track production statistics. There are undoubtedly constraints on which statistics are tracked (e.g. with respect to the size of the structures that are tracked), but future work will need to establish what those constraints are and why certain statistics are tracked and not others. 
There is also a need to evaluate the model not only relative to other domains of cognitive psychology (such as speech production, language acquisition, and second language
processing) but also relative to other domains of linguistics. The representations of prosodic structure that a listener needs for efficient speech recognition may or may not have a one-to-one correspondence with those that are most relevant (for example) to language typology. It is theoretically possible that a structure such as the prosodic word may have an essential role in typological work and yet have no role in processes relating to the cognitive construction of prosodic structures during spoken-word recognition. It is another important challenge for future research to establish the extent to which representations of prosody indeed vary across different domains of linguistic enquiry. We have here reviewed the state of the art of research on prosody in spoken-word recognition. Rather than being theoretically neutral, we have advocated a specific model. We look forward to future research testing our central claim that prosody influences speech recognition through Bayesian perceptual inference.

chapter 37

The Role of Phrase-Level Prosody in Speech Production Planning

Stefanie Shattuck-Hufnagel

37.1 Introduction Speakers organize the words of an utterance into groups or phrases, and produce some words and syllables with greater prominence than others. Early treatments of these phenomena assumed that they were dictated by the morphosyntactic structure of the sentence intended by the utterance. Accordingly, these accounts focused on aspects of the spoken signal that did not appear to be specified by the words or their sounds, more specifically the pattern of changes in three acoustic parameters that were considered to be suprasegmental (Lehiste 1970) or ‘prosodic’: fundamental frequency (f0), duration, and amplitude (Fry 1955). Thus, for example, Lehiste et al. (1976) showed that certain types of syntactic ambiguity could be disambiguated by changing the duration of a word or syllable at the end of a constituent, as when a longer duration of the word men in an utterance of the word string Old men and women stayed at home causes listeners to infer a syntactic boundary after the phrase old men, so that the adjective old describes men but not men and women. The emergence of modern theories of spoken prosody in the 1970s and 1980s, however, introduced two new ideas that were particularly significant for models of speech production. The first was the insight that the prosodic structure of an utterance, while often influenced by syntactic structure, is separate from it. That is, it was proposed that prosody is a separate component of the phonological grammar, with its own hierarchical structure that is substantially different from the syntactic hierarchy—much simpler (flatter), without recursion, and so on (Liberman and Prince 1977; Selkirk 1984; Nespor and Vogel 1986; Hayes 1989b). This insight suggested that models of speech planning for production need to include a component for generating a prosodic structure that was separate from (although often heavily influenced by) the morphosyntactic structure. At the same time, it
provided the beginnings of an understanding of why one and the same sentence, with a fixed syntactic structure and propositional meaning, could nevertheless be produced with several different prosodic structures, as in (1) and (2).

(1) [Sesame Street is brought to you] [by the Children’s Television Workshop]
(2) [Sesame Street is brought to you by] [the Children’s Television Workshop]

The second insight that was significant for speech production models, made possible by the first, was the discovery that a very large proportion of the variation that is observed in the surface phonetic characteristics of words is determined by the prosodic structure of the utterances they occur in. Until the emergence of a theory of the prosodic hierarchy, many aspects of phonetic variation appeared to occur randomly, because they were not demonstrably related to syntactic structure. For example, Umeda (1978) found no good account of where speakers ‘glottalize’ the onsets of vowel-initial words, because she was searching for an account in terms of where those words occur in syntactic structure. However, once the hierarchy of prosodic constituents had been proposed, it was discovered that these ‘glottalized’ regions in the signal occur (probabilistically) at the onsets of prosodic phrases, as well as at prosodic prominences (Pierrehumbert and Talkin 1992; Dilley et al. 1996; Garellek 2014; see Table 37.1). This chapter briefly summarizes some of the theories of prosody that emerged during the last half of the twentieth century (§37.2), presents a selection from the copious amounts of evidence that have accumulated since then to show that prosodic structures govern much of the systematic context-governed variation in surface phonetic form that occurs across different utterances (§37.3), and describes some of the ways these findings have been incorporated into models of speech production planning (§37.4). 
Throughout the chapter, a focus will be the role of individual acoustic cues to prosodic constituent and prominence structure (Cole and Shattuck-Hufnagel 2016; Brugos et al. 2018).

Table 37.1 Distribution of glottalized word-onset vowels in a sample of FM radio news speech, showing the preference for glottalization at the onset of a new intonational phrase and at the beginning of a pitch accented word, as well as individual speaker variation. Stress level is indicated with +/–F for full versus reduced vowel, and +/–A for accented versus unaccented syllable

                  –Phrase-Initial                       +Phrase-Initial
Speaker   –F, –A      +F, –A      +F, +A        –F, –A      +F, –A      +F, +A
f1a       6% (128)    17% (46)    80% (64)      22% (36)    84% (44)    100% (12)
f2b       10% (282)   18% (60)    58% (191)     29% (104)   52% (61)    90% (29)
f3a       3% (157)    12% (41)    56% (80)      36% (42)    90% (29)    93% (27)
m1b       0% (314)    4% (82)     34% (167)     5% (137)    33% (49)    55% (29)
m2b       5% (150)    0% (26)     27% (88)      14% (57)    63% (27)    75% (20)

(Dilley et al. 1996)
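As a quick check on the pattern described in the caption of Table 37.1, the per-speaker percentages can be pooled into approximate overall rates. This sketch assumes that the figure in parentheses is the number of tokens in each cell; because the percentages are rounded, the reconstructed counts, and therefore the pooled rates, are approximate.

```python
# Table 37.1 as (percentage glottalized, assumed token count) per speaker.
# Columns: -Phrase-Initial (-F-A, +F-A, +F+A), then +Phrase-Initial (same order).
CONDITIONS = ["-F, -A", "+F, -A", "+F, +A"]
TABLE = {
    "f1a": [(6, 128), (17, 46), (80, 64), (22, 36), (84, 44), (100, 12)],
    "f2b": [(10, 282), (18, 60), (58, 191), (29, 104), (52, 61), (90, 29)],
    "f3a": [(3, 157), (12, 41), (56, 80), (36, 42), (90, 29), (93, 27)],
    "m1b": [(0, 314), (4, 82), (34, 167), (5, 137), (33, 49), (55, 29)],
    "m2b": [(5, 150), (0, 26), (27, 88), (14, 57), (63, 27), (75, 20)],
}


def pooled_rates(table):
    """Pool across speakers: approximate glottalized tokens / all tokens, per column."""
    rates = []
    for col in range(6):
        cells = [row[col] for row in table.values()]
        glottalized = sum(pct / 100 * n for pct, n in cells)
        total = sum(n for _, n in cells)
        rates.append(glottalized / total)
    return rates


rates = pooled_rates(TABLE)
for i, cond in enumerate(CONDITIONS):
    print(f"{cond}: non-initial {rates[i]:.0%}, phrase-initial {rates[i + 3]:.0%}")
```

Under that assumption, the pooled rates come out at roughly 5% versus 18% (–F, –A), 11% versus 61% (+F, –A), and 49% versus 80% (+F, +A) for non-initial versus phrase-initial position, so both phrase position and accent/vowel status raise the likelihood of glottalization.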

524 STEFANIE SHATTUCK-HUFNAGEL

37.2 Modern theories of prosody Before the mid-twentieth century, treatments of prosody often focused on poetic meter or instructions for oratorical effectiveness, rather than on the rhythmic and intonational structure of conversational utterances. Sanskrit writings describe the complex metrical structure of literary forms called chandas, and the metrical structure of ancient Greek, Latin, and Arabic poetry did not lack for analysis. In modern times, perhaps the first comprehensive treatment of intonation was Pike’s The Intonation of American English (1945), which sought to provide a statement of the structure of the English intonation system as such, in relation to the structural systems of stress, pause and rhythm. Somewhat later, Lehiste’s (1970) early study Suprasegmentals, and her seminal monograph An Acoustic-Phonetic Study of Open Juncture (1960b), reported results using emerging technologies that permitted objective measurement of prosodic dimensions of spoken utterances; arguably, such phonetic studies of prosody were instrumental in establishing phonetics as one of the major subdisciplines of linguistics in American universities (Johnson 2011). In the immediately succeeding decades, there was a burgeoning interest in prosody and intonation worldwide, as linguists, psycholinguists, and engineers began to recognize the significant role of this aspect of the grammar, particularly for speech planning and production. This interest was driven in part by efforts to improve, for example, the teaching of second languages, models of human speech processing in both perception and production, and algorithms for automatic speech recognition and synthesis, all of which highlighted the importance of prosody. The British School of intonational theory was developed by investigators such as Halliday (1967), O’Connor and Arnold (1973), and Cruttenden (1986) (summarized in Roach 1994). 
This approach focused on the meaning of the final part of an intonational sequence, made up of the final phrase-level prominence (pitch accent) in a phrase (or ‘breath group’) and any following tones. In the Netherlands, ’t Hart et al. (1990) characterized the contrastive categories of Dutch intonation based on experiments using successively longer straight-line approximations to determine which contours sound the same to native listeners. They proposed a phonological system based on three f0 levels, with rises, falls, and plateaux between them. In Japan, Fujisaki and colleagues (Fujisaki and Hirose 1984) proposed a mathematically based implementation model grounded in the assumption that a human speaker produces excitation pulses to raise f0 at phonologically significant locations; this approach was designed for automatic synthesis and recognition of appropriate f0 contours in Japanese. One advantage of an explicit theory of prosodic structure is that it enables investigation of the effects of prosodic boundaries and prominences on surface phonetic form; in turn, the results of such studies provide a test of the claim that speakers represent those prosodic structures and actively use them in planning spoken utterances. In some cases, such theories also provide a set of conventions for labelling the prosody of spoken utterances. Some of the theories developed in the second half of the twentieth century to identify the phonological elements that distinguish different categories of intonation contours (and, to a more limited extent, how those categories map to meaning) are tightly linked to a proposed annotation system. One theory in
particular, the autosegmental-metrical (AM) theory, is associated with several annotation systems, including the ToBI annotation framework, standing for ‘Tones’ (tonal target levels associated with phrase-level prominences and boundaries) ‘and Break Indices’ (a numerical scale associated with different levels in the hierarchy of prosodic constituents), and ToDI (Transcription of Dutch Intonation; Gussenhoven 2005), initially proposed for Dutch. The AM approach and its associated system for developing annotation frameworks has been investigated for many different languages, for example British English (IViE; Grabe et al. 2001), Russian (ToRI; Odé 2008), and German (Grice et al. 2005a) (see Jun 2005a, 2014a for extensive presentations). This is the framework that will be discussed further below, in connection with the role of prosody in speech production planning models.1 The AM approach to prosody integrates two aspects of prosodic structure: intonational phrasing and prominence, and the hierarchy of prosodic constituents (Ladd 1996, 2008b; for briefer summaries see Shattuck-Hufnagel and Turk 1996; Cutler et al. 1997). The intonational component is drawn from Pierrehumbert’s (1980) and Beckman and Pierrehumbert’s (1986) grammar of intonation, which includes two levels of intonational phrasing: (full) intonational (IP) and intermediate (ip). These constituents, sometimes described as the domains of recognizable and complete intonation contours, are defined by three types of tonal target: pitch accents (which signal phrase-level prominence) and two types of edge tone (boundary tones, realized on the final syllable of a phrase, and phrase tones, often described as controlling the f0 contour between the final pitch accent of a phrase and the final boundary tone). 
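The nesting of intermediate phrases inside intonational phrases can be illustrated by reading a ToBI-style break-index transcription into a tree. This sketch assumes the English ToBI convention that a break index of 4 marks an intonational phrase (IP) boundary and 3 an intermediate phrase (ip) boundary, with lower values treated as junctures inside the current ip. The function name is hypothetical, and the break indices assigned to example (2) are invented for illustration.

```python
def group_by_breaks(words, breaks):
    """Nest words into intonational phrases (IPs) and intermediate phrases (ips).

    breaks[i] is the break index after words[i]. Under the convention assumed
    here, 4 closes an IP (and therefore its last ip) and 3 closes an ip;
    lower indices stay inside the current ip. The last word always closes
    the final IP.
    """
    if len(words) != len(breaks):
        raise ValueError("need one break index per word")
    utterance, current_ips, current_words = [], [], []
    for i, (word, b) in enumerate(zip(words, breaks)):
        current_words.append(word)
        is_last = i == len(words) - 1
        if b >= 3 or is_last:
            current_ips.append(current_words)
            current_words = []
        if b >= 4 or is_last:
            utterance.append(current_ips)
            current_ips = []
    return utterance


# Example (2) with hypothetical break indices: an ip boundary after "by".
words = "Sesame Street is brought to you by the Children's Television Workshop".split()
breaks = [1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 4]
for n, ip_group in enumerate(group_by_breaks(words, breaks), start=1):
    for m, ip in enumerate(ip_group, start=1):
        print(f"IP {n}, ip {m}: {' '.join(ip)}")
```

Because every IP boundary also closes an ip, the resulting structure is strictly layered: each IP contains one or more ips, and each ip contains one or more words.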
As noted above, the prosodic constituent hierarchy component is drawn from the work of Selkirk (1984), Nespor and Vogel (1986), Hayes (1989b), and others, and specifies that an utterance of English is made up of one or more IP’s, each in turn made up of one or more ip’s. Additional lower levels in the hierarchy that are variably used across languages include the accentual phrase, the foot, and the syllable.2 This chapter uses the prosodic structures developed in AM theory for illustrative purposes, in part because this theory has provided the framework for many empirical studies of the surface phonetic form of its phonological categories, and of their role in cognitive processing of phonological/phonetic aspects of prosody in speech perception, production, and acquisition. For more thorough presentations of the theory, the reader may want to consult the extensive discussion in Ladd (1996, 2008b) and summaries such as Arvaniti (in press) (see also chapter 6). For discussion of the relationship between the transcription system espoused by the British School of intonation and that of AM theory, see Roach (1994) and Grabe et al. (2001); for the relationship between ToBI and RaP (Rhythm and Pitch) see Breen et al. (2012); for discussions of AM theory in a wide range of languages see Jun (2005a, 2014a). The emergence of theoretical approaches such as AM theory allowed investigators to empirically test the hypothesis that speakers make use of prosodic structures as they plan an utterance for production. §37.3 describes some of the quantitative evidence that has emerged from these investigations.

1 Like AM theory, many of the other theories that have been proposed (as noted above) are also associated with labelling systems, including for British English (Halliday 1967; Cruttenden 1986; and later Grabe et al. 2001), for Dutch (’t Hart et al. 1990, also extended to other languages), and by Hirst and Di Cristo (1998) for a number of languages. More recently, Dilley and Brown (2005) have proposed a system called RaP (for Rhythm and Pitch) that includes annotations for perceived rhythmic regularity. In the framework of Articulatory Phonology (AP), theoretical and experimental work has addressed the question of how AP accounts for the effects of prosodic boundary and prominence on gestural patterns (Byrd and Saltzman 1998; Byrd 2000; Byrd et al. 2006; Krivokapić 2012) but has not proposed a transcription system. Other approaches have emerged from work on automatic speech recognition and synthesis, including (for example) Fujisaki and Hirose (1984). Not all of these approaches have concerned themselves with investigating the role of prosody in human speech production planning models to the same extent that investigators working in the AM framework have done.

2 A wide variety of lower-level constituents have been proposed over the years, on the basis of various types of evidence, including the ‘clitic group’ (Hayes 1989b) and the ‘Abercrombian foot’ (Abercrombie 1965), and it appears that different languages may make use of different sets of levels in the hierarchy.

37.3 Evidence for the active use of prosodic structure in speech production planning It goes without saying that any spoken utterance exhibits a timing pattern, an f0 pattern, and an amplitude pattern; typical speech cannot be produced without these characteristics (although whispered speech may lack an f0 contour). More interestingly, a wide variety of evidence shows that speakers reflect the hierarchical structures of prosodic theory in the surface phonetic patterns of an utterance, which can serve as cues to the listener about the intended prosodic structure (see chapter 36 for further discussion of the perceptual consequences of these patterns). Moreover, in addition to these three well-recognized acoustic correlates of prosodic structure, speakers produce an even wider range of acoustic cues, often probabilistic in their occurrence, at prosodically significant locations. These include phonation qualities such as irregular pitch periods (IPP’s) (sometimes called ‘glottalization’; see Table 37.1 and associated discussion) and aspiration (turbulence noise produced at the glottis). The picture that emerges from these observations suggests that speakers plan their spoken output in terms of its prosodic structure, and that this structure governs much of the systematic variation in surface phonetic form that is commonly observed across utterances. These patterns of variation are difficult to account for otherwise, because they are only indirectly related to traditional morphosyntactic structures, via the influence of those structures (along with that of other factors) on the prosodic structures. Evidence for the view that speakers make active use of prosodic structure in planning an utterance for production comes from several different domains. One line of evidence comes from rules and processes that either apply at prosodic constituent edges or, conversely, apply only within a prosodic constituent (§37.3.1). 
Another line of evidence comes from the distribution of phonetic cues and cue values in relation to the hierarchy of prosodic structures, such as degrees of duration lengthening and shortening, and likelihood of voice quality changes (§37.3.2). Finally, evidence is found in behavioural studies that measure, for example, how prosodic structure influences the time it takes to initiate a spoken utterance (§37.3.3).

37.3.1 Rules and processes that are sensitive to prosodic constituent boundaries: selection of cues Many of the processes that change the form of a word when it occurs in continuous speech are constrained by the boundaries of prosodic constituents. Some of these phenomena have been described at the phonological level, and some as phonetic processes. At the level of phonological processes, Selkirk (1984) and Nespor and Vogel (1986) provide examples from many different languages, such as stress readjustment in Greek within a clitic group, and Raddoppiamento Sintattico in Italian related to the phonological phrase, demonstrating that languages refer to prosodic constituents in the formulation of constraints on when and where such sound-level processes can occur. A wealth of additional evidence supports the view that surface phonetic variation is governed by prosodic structure. For example, in American English, the word-final /t/ of visit is flapped in visiting but can be lightly aspirated across the word boundary in the multi-word constituent visit it (Hayes 1989b). Flapping is also possible between words within a single intonational phrase, as in Might it audit Emmet at Ida’s?, but this is less likely across intonational phrase boundaries, as in Emmet, alias The Rat, eats only cheese (Hayes 1989b). Another cue that varies systematically with prosodic context is the probabilistic occurrence of IPP’s at the onset of word-initial vowels. The distribution of this cue differs between full IP’s and intermediate ip’s. At the onset of a higher-level full IP, IPP’s are likely to occur for any vowel (whether full or reduced); at the onset of a lower-level intermediate phrase, however, they are largely restricted to full vowels (Dilley et al. 1996; for some evidence of dialectal variation in these phonetic processes, see also Garellek 2014; Keating et al. 2015; Shattuck-Hufnagel 2017). 
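The two domain-sensitive patterns just described can be caricatured as rules that consult the strength of the prosodic boundary between two words. This is a deliberately categorical sketch of tendencies that the cited studies describe as probabilistic: True and False stand in for 'more likely' and 'less likely'. The boundary labels, their ToBI-like numeric values, and the function names are hypothetical.

```python
from enum import IntEnum


class Boundary(IntEnum):
    """Strength of the prosodic boundary before a word (values mirror ToBI-style
    break indices, an assumption made for illustration)."""
    WORD = 1          # word boundary inside an intermediate phrase
    INTERMEDIATE = 3  # intermediate phrase (ip) boundary
    INTONATIONAL = 4  # full intonational phrase (IP) boundary


def flapping_likely(boundary: Boundary) -> bool:
    """Word-final /t/ before a vowel-initial word: flapping is possible within an
    intonational phrase and less likely across an IP boundary (after Hayes 1989b)."""
    return boundary < Boundary.INTONATIONAL


def ipp_likely_at_phrase_onset(boundary: Boundary, full_vowel: bool) -> bool:
    """Irregular pitch periods on a phrase-initial vowel (after Dilley et al. 1996):
    at an IP onset likely for any vowel; at an ip onset largely only for full vowels."""
    if boundary == Boundary.INTONATIONAL:
        return True
    if boundary == Boundary.INTERMEDIATE:
        return full_vowel
    raise ValueError("this rule only covers phrase-initial vowels")


print(flapping_likely(Boundary.WORD))          # True:  'Might it audit Emmet ...'
print(flapping_likely(Boundary.INTONATIONAL))  # False: 'Emmet, alias The Rat, ...'
print(ipp_likely_at_phrase_onset(Boundary.INTERMEDIATE, full_vowel=False))  # False
```

A more faithful model would replace each Boolean with a probability conditioned on boundary level, vowel quality and accent status, in the spirit of the rates in Table 37.1.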
Another line of evidence that implicates prosodic constituents as the source of constraints on surface phonetic form is the placement of a pitch accent within its word. Early accent occurs when a phrase-level prominence falls not on the main-stress syllable of a word but on an earlier full-vowel syllable, as in JApanese FOOD (vs. JapaNESE in isolation) or MIssissippi LEgislator (vs. MissiSSIppi in isolation). Selkirk (1984) and many others have observed that this shift in the location of prominence is likely to occur in conditions of stress clash, that is, it prevents the prominences on JapaNESE and FOOD from occurring too close together. Bolinger (1958) proposed a different account, noting that speakers prefer to place phrase-level accents as early as possible (prenuclear) and as late as possible (nuclear) in an intonational phrase. Subsequent work showed that early accent can indeed involve the early placement of a prenuclear pitch accent within the word, and that this is likely to occur when that pitch accent is the first one in a new intonational phrase, whether there is a stress clash or not, as in CHInese DREsser (where it prevents a clash) but also CHInese anTIQUES (where there is no clash; M. Beckman, pers. comm.). Cooper and Eady's (1986) failure to find evidence of early accent under conditions of stress clash, in pairs such as thirteen companies (early accent predicted) versus thirteen corporations (no early accent predicted), is probably due to the fact that early accent occurs in both conditions, following Bolinger's onset-marking principle for pitch accent placement. Note that Kelly and Bock (1988) found evidence of clash-governed prominence shift in conditions where nonsense words (like colVANE) were embedded in pairs of structures like Use the colVANE PROUDly (prominence shift predicted because of the prominence clash between adjacent syllables) and The proud colVANE proPOSED (no prominence shift predicted because there is no prominence clash). In these

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

528 STEFANIE SHATTUCK-HUFNAGEL

contexts, the target word was phrase-medial rather than phrase-initial, and was therefore subject to prominence-clash effects rather than to phrase-onset-marking effects on pitch accent location within the word. In contrast, a nuclear pitch accent (i.e. the final accent in an intonational phrase) is generally required to occur on the last accentable syllable of its word, that is, the main-stress syllable (Shattuck-Hufnagel et al. 1994).3
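The accent-placement pattern described above (early placement for the first prenuclear accent of an intonational phrase, main-stress placement for the nuclear accent) can be stated as a small rule sketch. The syllable representation and the function names below are hypothetical, chosen only for illustration. The sketch treats these tendencies as categorical rules, although they are described above as likely rather than obligatory, and it does not model the clash-driven shift found for phrase-medial words (Kelly and Bock 1988).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str
    full_vowel: bool            # full (unreduced) vowel?
    main_stress: bool = False   # lexical main stress of the word?


def accent_index(word: List[Syllable], role: str) -> int:
    """Index of the syllable that receives the pitch accent (toy rules).

    role == "nuclear": last accent of the intonational phrase, placed on the
        main-stress syllable.
    role == "first_prenuclear": first accent of a new intonational phrase,
        placed on the earliest full-vowel syllable (onset marking). Assuming
        the main-stress syllable is marked as full, this is never later than
        the main stress.
    """
    if role == "nuclear":
        return next(i for i, s in enumerate(word) if s.main_stress)
    if role == "first_prenuclear":
        return next(i for i, s in enumerate(word) if s.full_vowel)
    raise ValueError(f"unknown role: {role!r}")


def render(word: List[Syllable], accented: int) -> str:
    """Spell the word with the accented syllable in capitals, as in the text."""
    return "".join(s.text.upper() if i == accented else s.text
                   for i, s in enumerate(word))


japanese = [Syllable("Ja", True), Syllable("pa", False), Syllable("nese", True, True)]
chinese = [Syllable("Chi", True), Syllable("nese", True, True)]

print(render(japanese, accent_index(japanese, "first_prenuclear")))  # JApanese
print(render(japanese, accent_index(japanese, "nuclear")))           # JapaNESE
print(render(chinese, accent_index(chinese, "first_prenuclear")))    # CHInese
```

Separating the role of the accent (first prenuclear vs. nuclear) from the word's lexical structure mirrors the argument above that accent location within a word depends on the accent's position in the phrase, not only on the word itself.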

3 An interesting exception occurs in some contexts for English words such as probably, possibly, and maybe, which can carry a pitch accent on both the main-stressed initial syllable and the final full-vowel syllable, particularly when they are produced as a single-word utterance with an L* H* contour. This may be related to the finding that when a single word carries all of the pitch accents in an utterance, it tends to bear two accents, as in the short dialogue Where do you live? In MASsaCHUsetts (Shattuck-Hufnagel et al. 1994). This is consistent with Bolinger's (1958) proposal that speakers prefer to produce an early and a late accent in each intonational phrase.

37.3.2 Patterns of surface phonetic values in cues to distinctive segmental features, reflecting prosodic structure

A second kind of evidence for the active use of prosodic structure in speech planning can be found in patterns of phonetic variation in both the acoustic and articulatory domains. In the acoustic domain, some prosodically governed cue value patterns involve what have traditionally been seen as cues to segmental feature contrasts (e.g. non-contrastive differences in voice onset time, or VOT, the interval between the release burst of a stop consonant and the onset of voicing for the following vowel), and others involve what were once regarded as direct correlates of syntactic structure (e.g. constituent-final lengthening). An example of systematic prosodically governed changes in events traditionally defined as cues to segmental features is Jun's (1993) observation that, in Korean, VOT values increase systematically for stop consonants at the onset of higher-level prosodic constituents. An example of a pattern that was earlier seen as a correlate of traditionally defined syntactic structure is phrase-final lengthening. This phenomenon was documented by Klatt (1975, 1976), who, in accord with the then-current assumption that syntax determined the phrasing of spoken utterances, attributed it to clause- or sentence-final position. Ferreira (1993) examined the pattern of final lengthening in utterances with differing degrees of syntactic complexity and showed that these final-lengthening patterns did not reflect the syntactic hierarchy. Instead, they reflected the prosodic hierarchy, whose existence had earlier been suggested by work showing a hierarchy of inter-phrasal pause durations (Gee and Grosjean 1983). Wightman et al. (1992), using a scalar break index labelling system to label the perceived prosodic boundaries (0 for the shallowest; 6 for the deepest) in a corpus of FM radio news speech, showed that for levels 0 to 4, the degree of lengthening on the constituent-final syllable increased monotonically with the level of the boundary in the hierarchy. Their break index levels BI 3 and BI 4 were later incorporated into the ToBI transcription system as boundaries of the intermediate intonational phrase (BI 3) and the full intonational phrase (BI 4).

Lengthening associated with prosodic constituent boundaries is more complex than the term 'final lengthening' might suggest. Although Wightman et al.'s (1992) BI levels 5 and 6 did not show additional lengthening in the final syllable beyond the value for a BI 4 break,
there was some evidence that the following pauses were longer for these constituents (see also Goldman-Eisler 1968). Consistent with this observation, Shattuck-Hufnagel and Ren (2018) have recently found suggestive evidence that pause duration is a correlate of higher-level discourse structure above the BI 4 full intonational phrase. They obtained these data using an extended version of Rapid Prosodic Transcription (RPT; Cole et al. 2017), a crowd-based annotation system that does not assume a particular theoretical stance as to the number of levels in the hierarchy and can therefore be made sensitive to levels above the intonational phrase. Shattuck-Hufnagel and Ren's (2018) extended version of RPT uses three levels of boundary marking (/, //, and ///), which allows listeners to annotate higher-level prosodic constituents above the intonational phrase.

A mechanism for adjusting durations at phrase boundaries has been proposed by Byrd and Saltzman (1998, 2003) in the framework of Articulatory Phonology. In that approach, a prosody-specific gesture called a pi-gesture spans the elements just before and just after a boundary. While this gesture is active, it slows the speaker's internal clock, lengthening the gestures that fall within the spanned interval; because its activation ramps up and down, the slowing, and thus the lengthening, is smaller at the beginning and end of the spanned interval than near the boundary. This model provides an account of the fact that lengthening occurs on both sides of an utterance-internal phrase boundary and that less of it occurs in regions further from the boundary. However, Turk and Shattuck-Hufnagel (2007) observed another facet of the complexity of this phenomenon, noting the possibility of two separate final-lengthening mechanisms.
They found that when words like Michigan occur phrase-finally, so that the main-stress syllable is separated from the phrase-final syllable by another syllable, the final syllable -gan is lengthened as expected, but the main-stress syllable Mi- is also lengthened (although less so) and the intervening syllable -chi- is not lengthened at all. This suggests that separate stress-based and boundary-location-based duration-lengthening mechanisms might be at work in the phrase-final word. Additional durational correlates of prosodic structure can be found in patterns of lengthening associated with phrase-level pitch accents, and shortening associated with multiple sub-constituents within a larger constituent. Turk and White (1999) reported that, in English, accent-related lengthening is distributed primarily in the rhyme of the accented syllable but can also extend to adjacent elements, more so to preceding regions than to succeeding ones. Lehiste (1972) was one of the first to document the effect of multiple sub-constituents on the duration of the first syllable of a word, reporting (for example) that the /i/ in sleep was shortened in sleepy and shortened yet more in sleepiness. Since then, additional studies have shown that there is a general tendency to shorten elements when the prosodic constituent they occur in contains multiple sub-constituents, although this tendency is not strong enough to maintain a fixed surface duration for the larger constituent (e.g. Dauer 1983; Krivokapić 2012). The prosodic constituent hierarchy also finds supporting evidence in the articulatory domain. For example, Fougeron and Keating (1997) measured the amount of contact between the surface of the tongue and the hard palate for /n/ in American English, using electropalatography, and found increasing amounts of contact for instances of /n/ produced at the onsets of increasingly higher levels of prosodic constituents. 
Such domain-initial strengthening patterns are not limited to English; Fougeron (2001) reported similar results for French, Cho and Keating (2001) found similar results in Korean, and Keating et al. (2003) found parallel evidence in four different languages. Byrd et al. (2006) are among many investigators who have reported articulatory magnetometer data that support the existence of prosodic-boundary-related lengthening.

Finally, there are sources of evidence that involve both the acoustic (perceptual) and the articulatory (production) domains. For example, Cole and Shattuck-Hufnagel (2018) asked speakers to listen to and then imitate short conversational utterances, a task that involves both the perception and interpretation of the acoustic cues in the speech signal and the production of speech. They reported that speakers reproduced the prosodic phonology of the utterances they heard (i.e. the location of phrase-level prominences and boundaries), a finding similar to those of Braun et al. (2006), but that speakers did not necessarily reproduce the precise pattern of surface phonetic cues to those prominences and boundaries (such as the probabilistic distribution of IPP episodes at intonational phrase onsets and at pitch-accented word onsets, for words that begin with vowels or sonorant consonants). This raises interesting questions about how to model the process by which the prosodic structure of the utterance to be imitated, along with the consequences of this structure for the acoustic cues and cue values in the heard utterance, is combined with the speaker's own phonology-to-phonetics mapping to generate the imitated output. The related question of whether prosodic structures, like syntactic structures, can be primed has also been addressed. For example, Bishop et al. (2015) have reported evidence that implicit prosody (i.e. the mental representation of prosody that a reader forms when reading a sentence silently) can prime later parsing decisions. Braun and Tagliapietra (2010) also present evidence that contrastive intonation primes the recognition of contrastive referents, while non-contrastive intonation does not.
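The pi-gesture account of boundary-adjacent lengthening (Byrd and Saltzman 1998, 2003), described earlier in this section, can be illustrated numerically. In the sketch below, the raised-cosine activation shape, the `max_slowing` factor, and the simplification that each unit is stretched by the clock slowing at its midpoint are assumptions made for illustration, not parameters of the published model. The sketch only shows why such a mechanism yields lengthening on both sides of a boundary that tapers with distance from it.

```python
import math
from typing import List


def pi_activation(t: float, boundary: float, half_width: float) -> float:
    """Hypothetical raised-cosine activation of a pi-gesture centred on a boundary:
    1.0 at the boundary, falling smoothly to 0.0 at +/- half_width (ms)."""
    distance = abs(t - boundary)
    if distance >= half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * distance / half_width))


def lengthen_near_boundary(durations: List[float], boundary_index: int,
                           half_width: float, max_slowing: float) -> List[float]:
    """Stretch each unit by the clock slowing at its (unstretched) midpoint.

    durations: unit durations in ms before any boundary effect.
    boundary_index: the boundary falls just before durations[boundary_index].
    max_slowing: proportional lengthening at full activation (0.5 = 50% longer).
    """
    boundary = sum(durations[:boundary_index])
    stretched, start = [], 0.0
    for d in durations:
        midpoint = start + d / 2
        slowing = 1.0 + max_slowing * pi_activation(midpoint, boundary, half_width)
        stretched.append(d * slowing)
        start += d
    return stretched


# Six 100-ms syllables with a phrase boundary between the third and fourth.
result = lengthen_near_boundary([100.0] * 6, boundary_index=3,
                                half_width=250.0, max_slowing=0.5)
print([round(d, 1) for d in result])  # [100.0, 117.3, 145.2, 145.2, 117.3, 100.0]
```

In the published model the pi-gesture slows the clock for all gestures active at the same time, so overlapping gestures are affected jointly; stretching units independently, as here, is a simplification.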

37.3.3 Behavioural evidence for the role of prosody in speech planning

Another set of findings that implicates prosody in speech planning comes from measures not of the speech signal itself but of phenomena that arise during the planning process, before and as the utterance begins. For example, Wheeldon and Lahiri (1997) measured the initiation time for Dutch utterances with varying numbers of prosodic words (PWds) while holding the number of syllables constant, as in the analogous English utterances (I drank the) (milk) versus (I drank) (fresh) (milk), with two versus three PWds. Their results showed that (i) for a delayed production task, where the entire utterance could be planned ahead, the total number of PWds in the utterance influenced the time to initiate the utterance, but (ii) for an immediate production task, where presumably only the first PWd could be planned, the complexity of the initial PWd influenced initiation time. More recently, this group has reported that compound words cliticize like monomorphemic words, resulting in a similar effect on initiation time for both compounds and monomorphemic words (Wynne et al. 2018). To the extent that delayed production tasks are analogous to the pre-planning that speakers carry out before producing an utterance, these results implicate the existence of a higher-level plan for the utterance, one that may be developed down to the level of its PWds, at least in some circumstances. It is an interesting and unexplored question whether the degree of prosodic pre-planning varies across different types of typical speaking circumstances, or for different speakers; this parameter may not be a fixed and immutable aspect of the planning process.
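The two-versus-three PWd contrast in the Wheeldon and Lahiri example can be reproduced with a deliberately simplified grouping rule. In the sketch below, the function-word list and the rule that a function word attaches to the preceding content word when there is one (and otherwise to the following one) are assumptions chosen to yield the bracketings given above; they are not a general account of cliticization in English or Dutch.

```python
from typing import List

# Hypothetical list of function words that do not form PWds on their own.
FUNCTION_WORDS = {"i", "the", "a", "it", "to", "of"}


def group_into_pwds(words: List[str]) -> List[List[str]]:
    """Group words into prosodic words (PWds) with a toy cliticization rule.

    Each content word heads its own PWd. A function word attaches to the
    preceding PWd if there is one; otherwise it is held and attached to the
    next content word.
    """
    pwds: List[List[str]] = []
    pending: List[str] = []          # function words still waiting for a host
    for w in words:
        if w.lower() in FUNCTION_WORDS:
            if pwds:
                pwds[-1].append(w)   # attach to the preceding host
            else:
                pending.append(w)    # no host yet: wait for the next content word
        else:
            pwds.append(pending + [w])
            pending = []
    if pending:                      # utterance made up of function words only
        pwds.append(pending)
    return pwds


for utterance in ("I drank the milk", "I drank fresh milk"):
    groups = group_into_pwds(utterance.split())
    print(len(groups), " ".join("(" + " ".join(g) + ")" for g in groups))
# 2 (I drank the) (milk)
# 3 (I drank) (fresh) (milk)
```

On the delayed-production result, initiation time would be expected to track the number of groups returned for the whole utterance; on the immediate-production result, it would track properties of the first group only.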

37.4 The role of prosody in models of speech production planning

There is general agreement that utterances have prosodic structure, that is, that their words are grouped into prosodic constituents and exhibit patterns of relative prominence, and also that these prosodic structures influence the segmental phonological and phonetic shape of the utterance. This agreement raises a number of interesting questions about the production planning process. For example, how far ahead does a speaker plan the prosodic structure, and to what degree of detail? That is, when and how is the phrase-level prosody of an utterance (i.e. the groupings and relative prominences of its words) determined? And how, exactly, does this structure exercise its influence on the quantitative surface phonetic shape of an utterance? More generally, it has long been recognized that production models must grapple with the question of how the higher-level linguistic structure of an utterance influences that utterance's surface phonetic form, how it plays a role in the planning process, and how far ahead the planning process extends. Lashley (1951) noted, in a remarkably common-sense line of argumentation, that speakers do not plan their speech one word at a time, as is clearly shown by speech errors with an anticipatory component (such as completed anticipations, e.g. the plane flies → the flane flies; completed exchanges, e.g. a big fat bug → a fig bat bug; and incomplete errors, e.g. show me some → so me---show me some). In such errors, a word or sound that by rights should appear later in an utterance occurs earlier, showing that (i) that element is discretely represented during the planning process and (ii) it has been accessed well before its appropriate moment of appearance (see also Meringer and Mayer 1895; Meringer 1908; Fromkin 1971; MacKay 1987; and their references).
Garrett (1975, 1980) explicitly proposed that exchange errors require a model of the utterance planning process that includes a structural planning frame, separate from its contents, into which words and other linguistic elements are serially ordered. When this process goes awry, an element that should occur later can be selected for an earlier slot, resulting in the first part of the exchange error; subsequently, the element that was displaced by the first part of the exchange is mis-selected for the slot where the displacing element should have occurred, creating the second part of the exchange. Garrett also noted aspects of word errors that suggest something further about the nature of the planning frame. For example, when a word with phrase-level prominence (pitch accent) exchanges with a word with no accent, the accent occurs at its original location, even though this means it occurs on a different word (i.e. it occurs on the word that has intruded into the accent-marked slot). This suggests that the slots in the planning frame are marked for phrase-level prosodic prominence. Garrett also noted that some elements are serially ordered without their final phonological form, receiving that surface form via a later process of phonological spellout, and argued that this set of words includes function words. For example, serial ordering errors involving pronouns typically mis-order only certain of their features (i.e. gender and number), while the case is left behind with its location in the frame (e.g. She liked him → He liked her, not *Him liked she). This suggests that locations in the planning frame are tagged with syntactic information, which then interacts with the mis-ordered gender and number information to 'spell out', or generate, the phonology of the error output. Shattuck-Hufnagel (1992) pointed out similar phenomena at the level of the individual sound segment or cluster, where exchange errors provide evidence that the phonological planning frame and the individual sound segments that will occupy it are represented separately (see (3)). This is because the phonemic element that is displaced by the first half of an exchange error does not subsequently occur at a random location but is serially ordered into the target slot that was originally intended for the displacing element. This observation is difficult to account for unless there is a multiword structural planning frame, separate from its segmental contents, to maintain the location of that planning 'slot' (Rosenbaum et al. 2007a).

(3) Examples of segmental exchange errors
1) weak and feeble → feak and weeble
2) a copy of my paper → a poppy of my caper
3) top shelf → toff shelp
4) sweater drying → dretter swying
5) squeaky floor → fleaky squoor

Initial proposals about the nature of the planning framework focused on syntactic structure. This was a reasonable approach, since it seems incontrovertible that speakers must represent the syntax of their utterances in some way. Moreover, syntactic structure seemed to provide a good explanation for phenomena such as phrase-final lengthening (Klatt 1975, 1976). However, as noted earlier, the emergence of modern approaches to prosodic analysis in the 1970s and 1980s provided an alternative to this view, in the form of a hierarchy of prosodic constituent structures and prominences for each utterance that is influenced by, but is not isomorphic with, the syntax of the corresponding sentence.
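Returning to the exchange errors in (3), the slot-and-filler logic attributed above to Garrett (1975, 1980) and Shattuck-Hufnagel (1992) can be sketched as follows. The representation (one onset slot per content word, filled serially from the set of intended onsets) and the `misselect` parameter are hypothetical simplifications. The point of the sketch is the one made in the text: because the frame keeps its slots and each onset is used once, the displaced onset is the one left over for the slot vacated by the intruder, which produces an exchange rather than a random substitution.

```python
from typing import List, Optional


def fill_onset_slots(rimes: List[str], onsets: List[str],
                     misselect: Optional[int] = None) -> List[str]:
    """Serially fill the word-onset slots of a planning frame.

    rimes[i] is the rest of word i after its onset; onsets[i] is the intended
    onset for slot i. If `misselect` is given, the onset intended for that
    later slot is wrongly chosen for slot 0 (an anticipation). Each onset is
    used only once, so the displaced onset stays available and fills the
    slot whose own onset has already been used.
    """
    available = list(onsets)            # onsets not yet placed in the frame
    output = []
    for slot, rime in enumerate(rimes):
        if slot == 0 and misselect is not None:
            choice = onsets[misselect]  # later onset selected too early
        elif onsets[slot] in available:
            choice = onsets[slot]       # intended onset still available
        else:
            choice = available[0]       # intended onset already used: take the leftover
        available.remove(choice)
        output.append(choice + rime)
    return output


print(fill_onset_slots(["eak", "eeble"], ["w", "f"], misselect=1))    # ['feak', 'weeble']
print(fill_onset_slots(["eaky", "oor"], ["squ", "fl"], misselect=1))  # ['fleaky', 'squoor']
```

Without the separate frame (if the displaced onset simply dropped out, or were placed anywhere), the second half of the exchange would not be predicted to land in exactly the intruder's original slot.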
Ferreira (1993) was one of the first to demonstrate that the prosodic structure of an utterance, rather than its surface syntactic structure, governed aspects of phonetic realization such as phrase-final lengthening. Much of the subsequent work on the question of how higher-level prosodic structure influences surface phonetic form concerns the effect of word groupings (i.e. prosodic phrasing or constituent structure), although some exploration of the effect of phrase-level prominences has also occurred. For example, Turk and White (1999) showed that phrase-level accentuation added duration to the accented syllable, beyond that associated with lexical stress, and that this lengthening was attenuated at the syllable boundary.

A number of models of the production planning process have been proposed that focus on a particular aspect of speech processing. For example, Fromkin (1971) emphasized the fact that error units are often grammatically motivated elements (such as phonemes or morphemes); Dell and colleagues were particularly concerned with the process of accessing words in the lexicon (see Dell 1986), while Guenther (e.g. 2006) focused on the brain mechanisms involved in speaking and perceiving speech. Most of these models, like the earlier ones mentioned above, dealt with only limited aspects of the speech production process. However, the landscape of speech production planning models was changed irrevocably by the appearance of Levelt's volume Speaking in 1989, because of its ambitiously comprehensive treatment of the planning process for connected speech, from message formulation to articulation.

[Figure 37.1, panels (a) and (b): flow diagrams of two versions of the Nijmegen model. Recoverable labels include the Prosody Generator with its 'intonational meaning' input, metrical, segmental, and phonetic spellout procedures, self-monitoring, lexical concept, lexical selection (lemma), morphological, phonological, and phonetic encoding, syllabification, the syllabary, the phonetic gestural score, articulation, and the sound wave.]

Figure 37.1 Two versions of the Nijmegen approach to modelling speech production planning. (a) shows the 1989 version of planning for connected speech, while (b) shows the 1999 version of planning for single-word (or single-PWd) utterances.

In that model, generating prosodic structure was a key aspect of the process, albeit a late one (Figure 37.1a). That was because the word-sized units that undergo phonetic encoding in that model are not lexical words but PWds, which can (in languages like English and Dutch) include more than one lexical item (as in sell it). The output of the model's Prosody Generator included not only the PWd level of structure but also markings on the individual PWds for phrase-level prominences (pitch accents) and prosodic constituent boundaries, derived from the 'intonational meaning' input. When this approach was implemented as a model of single word production (Levelt et al. 1999; see Figure 37.1b, describing Roelofs's 1997 Weaver algorithm), the multi-word aspect of planning took a back seat and the Prosody Generator understandably disappeared from the model. In the 1999 version, the word-level prosodic constituent PWd appears as the unit of phonetic encoding, providing a partial step towards connected speech for complex PWds like escort us. (See also Levelt 2001 for a pithy discussion of lexical selection and form encoding for multiword utterances, although without regard to phrase-level prosodic structure, and Levelt 2002 for further remarks on the planning of utterances containing multiple PWds.)

Thus, in the 1989 version of what might be called the Nijmegen model, higher-level prosodic structure is imposed on the planned utterance towards the end of the planning process, after metrical and segmental spellout, and this structure is generated incrementally, that is, using only local information and requiring very little look-ahead. The Prosody Generator constructs a sequence of PWds for an utterance so that each PWd can be syllabified (to enable retrieval of syllable-sized articulatory plans) and structured into a hierarchy of
prosodic constituents, including phonological phrases and intonational phrases. Levelt argues that it is most parsimonious to assume that the process of phonetically encoding PWds, with their syllabic substructure and their phrasal superstructure, is incremental. This means that the encoding is a local process, with little look-ahead to later parts of the utterance; each PWd is passed to the phonetic spellout module as soon as it is planned and syllabified, perhaps only one PWd ahead. However, Levelt (2002) points out two important caveats. First, the proposal that PWds are the unit of phonetic encoding, creating the representational unit by which syllable-sized articulatory instructions for the syllables of a PWd are retrieved from a long-term store of such instructions, does not rule out the hypothesis that higher-level representations of the larger utterance plan are generated concurrently. This allowance is consistent with the view advanced in Wheeldon and Lahiri (1997, 2002) that speakers plan entire phrases, or even perhaps entire utterances, in terms of the number of PWds they contain, at a more abstract level of representation than the incremental process of phonetic encoding requires. Thus, in the Nijmegen model, although the role of higher-level prosodic phrase structure is acknowledged, and its construction is ascribed to the Prosody Generator, the details of how this structure influences phonological and phonetic planning are left for later development. This lack of explicitness highlights a potential problem for the model: how can the overall prosodic contour of an utterance, including its prosodic phrasing and accentuation, be generated one PWd at a time?

In contrast, Keating and Shattuck-Hufnagel (2002) propose a Prosody First model, in which phrase- or utterance-level prosodic structure plays a role from the very beginning of the phonological and phonetic encoding processes for an utterance.
The Prosody First model builds on the Nijmegen proposal, in that phonological information is transferred from the lexicon into the planning framework for a particular utterance in several steps. But the Prosody First model provides a different account of how the process of encoding the surface form of an utterance unfolds. It postulates that, as the surface syntactic structure emerges, a concurrent prosodic structure is being planned for the entire phrase or utterance. This prosodic structure is proposed to be minimal initially (an utterance with a single intonational phrase, a single PWd, etc.), and this minimal structure both becomes more complex and undergoes restructuring as various types of information are transferred into it from the surface syntactic plan and the lexicon. This view provides a different account of a number of phenomena, including sound-level speech errors. The Prosody First model postulates that during an early stage in the planning process, speakers generate both a set of words activated for the utterance, with their phonological specifications (Shattuck-Hufnagel 1992), and the surface syntactic structure of the utterance, along with a default phrase- or utterance-sized prosodic planning framework. This prosodic planning framework is then expanded and restructured, as information from the surface syntactic structure and the set of target words is transferred into it. Keating and Shattuck-Hufnagel (2002) point out that because this model includes a planning framework for an entire prosodic phrase (or even an utterance), it can account for the fact that sound-level interaction errors often involve sounds from two separate PWds; this is difficult to account for if the phonological and phonetic encoding for one PWd is incremental and is thus complete before the next one is processed.
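The question of how many PWds are available at once can be framed in terms of a planning window. In the sketch below, the sliding buffer and its `window` size are hypothetical devices introduced for illustration: a sound-level interaction between two PWds is treated as possible only if both are in the buffer together. A strictly incremental window of one PWd then allows no cross-PWd interactions, a window of two (overlapping encoding of adjacent PWds) allows them only between neighbours, and a phrase-sized frame allows them between any two PWds in the phrase.

```python
from itertools import combinations
from typing import Set, Tuple


def interacting_pwd_pairs(n_pwds: int, window: int) -> Set[Tuple[int, int]]:
    """Return pairs of PWd indices that are ever co-present in the planning buffer.

    Assumes (hypothetically) a buffer holding `window` consecutive PWds that
    slides forward one PWd at a time; a sound-level interaction between two
    PWds is possible only if both are in the buffer together.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs: Set[Tuple[int, int]] = set()
    for start in range(max(1, n_pwds - window + 1)):
        buffer = range(start, min(start + window, n_pwds))
        pairs.update(combinations(buffer, 2))
    return pairs


for w in (1, 2, 4):
    print(w, sorted(interacting_pwd_pairs(4, w)))
# 1 []
# 2 [(0, 1), (1, 2), (2, 3)]
# 4 [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

On this framing, the empirical question raised in this section becomes which window size (fixed or variable) best matches the observed distances between interacting PWds in error corpora.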
On the other hand, Levelt (2002) has pointed out that many if not most sound-level interaction errors occur between two adjacent PWds, and notes that in some cases the encoding of two adjacent PWds may overlap in time, providing a potential mechanism for cross-PWd error interactions even in a largely incremental model. A careful study of the PWd organization of utterances in which sound-level interaction errors occur would shed light on the question of how many PWds ahead an utterance is typically planned, and to what level of detail. A recent finding of interest in this regard is the report that, in French, liaison of a coda consonant across a word boundary is more likely to occur if the following word is a high-frequency word (Kilbourn-Ceron 2017). Kilbourn-Ceron suggests that if the word following the current PWd is high frequency, it may be accessed sooner, increasing the possibility of liaison (although see Fougeron et al. 2001 for conflicting findings). Such observations suggest that the size of the window of material available for look-ahead may not be fixed, but instead may depend on contextual factors. Levelt (1989, 2001) voiced a similar view when he noted that the planning window for lexical encoding may be flexible, and even strategic, so that (for example) sound-level interaction errors across longer spans may be more likely in read speech, where the speaker has more time to plan ahead, than in typical conversational speech. An advantage of both the Nijmegen model ('Prosody Last' in some of its versions) and the Prosody First proposal is that they provide an explanation of why segment-sized serial ordering errors occur at all. The occurrence of such interaction errors between two elements of a planned utterance had always been somewhat puzzling, since the serial order of the individual sounds of a word must be specified in the lexicon, in order to distinguish between the forms for e.g. cat, act, and tack.
Despite this specification, sounds sometimes occur in the wrong order, suggesting that there is a serial-ordering process that translates the lexical representations into a different planning framework. In the Nijmegen model (Levelt 1989), this occurs as the information at a lexical address is transferred piece by piece to the phonological encoder, which will develop the phonetic plan for the utterance. This provides an account of how segmental interaction errors might occur, without requiring construction of a planning structure for the entire phrase. However, the fact that a displaced target sound occurs (if it occurs at all) not in some random location in the utterance, but in exactly the location where the first intruding segment should have occurred, seems to suggest a representation of later locations in the utterance, in a phrase-sized planning framework, such as that proposed in the Prosody First approach. The Prosody First model also provides a mechanism by which sound-level serial ordering errors (such as the exchanges in (3) above) arise. This mechanism has much in common with the transfer mechanism proposed in the Nijmegen approach: such errors occur during the process of transferring information from the morphosyntactic representation of a sentence into a different framework. But in this case the framework is the utterance-specific prosodic representation that is being developed on the basis of that syntactic representation. Moreover, the prosodic planning frame provides a mechanism for maintaining the slot locations for the intended sounds of upcoming words, as suggested by exchange errors, and the step-wise transfer of lexical information into the utterance-specific prosodic representation provides a mechanism for serial ordering that, when it goes awry, produces errors of exchange, anticipation, and perseveration. Relevant error-elicitation studies by Croot and colleagues (Croot et al.
2010; Beirne and Croot 2018) offer support for the view that phrase-level prosodic structure is available at the time that serial ordering of phonemic segments occurs; these authors' tongue-twister studies show
that both phrase-level prominence and prosodic boundaries constrain segmental interaction errors.4

In addition to incremental approaches such as the Nijmegen model and phrase-based approaches such as the Prosody First model, other models that are focused on more specific portions of the planning process have included phrase-level prosody. For example, Articulatory Phonology assumes as input an already-specified phonological plan for an utterance that includes phrase-level prosody, and focuses on how that plan can be implemented in an oscillator-based system of motor control that models effects of prosody on timing by a set of coupled planning oscillators for the syllable, foot, and phrase levels (Byrd and Saltzman 1998; Pouplier and Goldstein 2010; Krivokapić 2012). Phrase-level prosodic structure also plays an important early role in the recently developed Extrinsic-Timing-Based Three-Component model (XT/3C) of phonological and phonetic planning proposed by Turk and Shattuck-Hufnagel (2020). The XT/3C approach is an extension of the Prosody First approach that focuses on what the Nijmegen model terms 'phonological and phonetic encoding'. Like the Prosody First model, it envisages a phrase- or utterance-sized planning frame developed early in the planning process. It includes three separate components: (i) Phonological Planning (using symbolic, non-quantitative representations), (ii) Phonetic Planning (generating quantitative representations of the acoustic goals that will signal those symbolic elements and the articulatory instructions that will produce those acoustic patterns), and (iii) Motor-Sensory Implementation (to track and adjust the movements so that they occur at appropriate times to meet those goals).
These three separate components are required to account for the translation of symbolic phonological representations into quantitative phonetic representations with an explicit representation of surface time (using phonology-extrinsic, general-purpose timing mechanisms, hence the ‘extrinsic timing’ designation), which can guide the Motor-Sensory Implementation component as it tracks and adjusts articulatory movements to ensure that the planned acoustic cues are produced on time. The planned prosodic structure of an utterance is seen as one of many inputs into the Phonological Planning component, along with (for example) speaking rate and other factors. In this approach, the Phonological Planning component integrates all the goals, including signalling the intended words and the prosodic structure, achieving a target speaking rate, minimizing effort, minimizing time, and so on. (Thus this planning component includes many factors that are not part of the phonological component of the grammar.) The representational vocabulary in this planning component is abstract and symbolic (i.e. it does not include quantitative specifications), but it does make use of representations of individual acoustic cues to the distinctive features of the intended words, in their symbolic (categorical) form. By selecting the individual acoustic cues that are appropriate for the intended segmental context, prosodic context, speaking rate, and so on, the Phonological Planning component generates a representation that serves as the input to the separate Phonetic Planning component, which outputs a motor plan that will produce those acoustic cues with appropriate quantitative values.

4 Another interpretation of segmental interaction errors is that they arise during the process of retrieving sounds from their stored lexical representations, when the intruding sound reaches its threshold of activation sooner than it should (Dell and Reich 1977; Dell 1986; Dell et al. 1993).

THE ROLE OF PHRASE-LEVEL PROSODY IN SPEECH PRODUCTION PLANNING 537

This conceptual approach incorporates one of the key insights that have emerged from the past several decades of speech research: that a substantial proportion of the systematic context-governed variation in the surface phonetic forms of words across different utterances is due to their hierarchical prosodic structure, including constituents and prominences. As a result, it puts the prosodic structure of a prosodic phrase or utterance at the core of the phonological and phonetic planning processes. It is also compatible with a growing number of findings that suggest a speaker knows a considerable amount about the higher-level prosodic structure of a phrase or utterance before beginning to articulate it. This evidence includes (for example) findings suggesting a higher initial-accent f0 (Ladd and Johnson 1987; Asu et al. 2016) and a deeper pre-utterance breath (Sperry and Klich 1992) for utterances that contain more prosodic constituents. Early knowledge about the overall structure of an utterance is also compatible with Gee and Grosjean’s (1983) finding that speakers, on average, prefer to place a prosodic boundary somewhere near the middle of an utterance. These observations certainly do not provide conclusive evidence that speakers have planned an entire utterance at the level of acoustic and articulatory detail before they begin to produce it. However, they do suggest a certain degree of planning of the overall shape of an utterance before the acoustic signal begins, and it remains to be discovered exactly what type and degree of such pre-planning occurs. For example, to what degree of surface phonetic detail have which parts of the utterance been planned at any particular time, both before it begins and as it unfolds in time? 
Wheeldon and Lahiri’s (1997, 2002) results showing that the total number of PWds in an utterance influences the initiation time in a delayed production task, but not in an immediate production task, suggest that the window size for pre-planning is not fixed but is flexible, and may differ depending on the speaking context.

37.5 Summary and related issues

While the detailed operation of phrase-level prosodic constituent structure and prominence in speech production planning is not yet fully understood, it is accepted by most practitioners that an adequate model of speech production planning must take account of this aspect of language. This is necessary, of course, to ensure that the utterance has the appropriate prosody (intonation and timing, grouping and prominence structure) to signal the speaker’s intended meaning. But it is also required in order to account for the fact that phrase-level prosodic constituent structure and prominence have been shown to have profound effects on the surface phonetic properties of the words and sounds of the utterance. It is clear from this discussion that the task of modelling human speech production planning has been well begun but not yet completed. Outstanding questions include:

• whether the flow of information through the various processing components is best described by modular ‘forward’ models, with no feedback, or by interactive views in which information can flow in both directions between processing modules (Goldrick et al. 2011);
• whether prosodic planning is incremental, proceeding left to right as information becomes available, or has a top-down component that represents some information about the entire phrase or utterance from the beginning of phonological encoding;
• whether sound-level errors require the buffering of candidate items for an utterance in a short-term store (Shattuck-Hufnagel 1992) or can be modelled in terms of the time course of activation and competition within the lexicon (Dell 1986; Dell et al. 1997);
• the role of the syllable and its subcomponents in speech production planning (see Crompton 1982; Laubstein 1987; Schiller 1999; Schiller and Costa 2006; Cholin et al. 2011; see also Xu and Liu 2006 for a model in which the syllable plays a central organizational role);
• which sound-level errors occur by the mechanism of gesture intrusion and reduction, as has been shown for certain tongue-twister patterns (Pouplier and Goldstein 2010), and which ones occur at the level of mis-selection from among a set of abstract phonemes (Shattuck-Hufnagel 1992; Croot et al. 2010);
• the role of prosody in speech disorders and their treatment (Zipse et al. 2014; Bishop et al. 2015);
• how prosodic choices encode the speaker’s semantic and pragmatic meanings;
• the nature of the individual cues to prosodic prominence and constituent structure (Cole and Shattuck-Hufnagel 2016; Brugos et al. 2018; Li et al. 2018).

Guenther (2018) has proposed an extensive model of the brain mechanisms involved in speech production planning, which suggests some ways in which the role of prosodic structure may be specified in this domain. Some additional questions about the role of prosody in speech production concern the links between prosody and co-speech gesture, which appear to be closely timed with each other (Kendon 1980, 2004; Loehr 2004, 2012; Shattuck-Hufnagel et al. 2007; Shattuck-Hufnagel and Ren 2018), the development of prosody-based planning in acquisition (Gerken 1996; Demuth 2014; Frota et al. 2016; Frota and Butler 2018; Prieto and Esteve-Gibert 2018), and a prosodic approach to the phonology of sign language (Brentari 1998). 
In sum, the role of prosody in speech production planning appears to be pervasive, with both prosodic constituent structure and phrase-level prominence playing a critical role. Beyond this basic observation, much remains to be discovered, and it is possible that behavioural tests of proposed models of this process will, in the future, help to resolve some of the more pressing questions about prosodic theory as well as about speech planning.

Part VII

PROSODY AND LANGUAGE ACQUISITION

Chapter 38

The Acquisition of Word Prosody

Paula Fikkert, Liquan Liu, and Mitsuhiko Ota

38.1 Introduction

In learning the sound patterns of words in a language, children need to acquire not only the segmental but also the prosodic features that mark lexical contrasts, the ‘word prosody’. Three types of word prosody have been identified in the literature: lexical tone, lexical pitch accent, and word stress. Both lexical tone and lexical pitch accent use pitch specification to indicate word-level contrasts. In lexical tone systems, such as Mandarin and Thai, pitch patterns are specified for most, if not all, syllables in a word (for more detail see chapter 4). There may be a number of lexically contrastive pitch configurations (i.e. tones) in such systems. In contrast, in a typical pitch accent system, lexical marking of pitch involves only up to one location in a word (the ‘accented’ syllable) and the types of pitch configuration are usually limited in number. Lexical pitch accent can be found in languages without stress, such as Japanese and Basque, but also in languages with stress, such as Swedish and Franconian dialects of German and Dutch, including Limburgian (where the phenomenon is sometimes called ‘word accent’). Word stress marks at least one syllable in every lexical word as the potential bearer of metrical prominence. The actual realization of the prominence of word stress differs depending on the language and phonological context, but it typically manifests itself in the phonetic strengthening of segments (e.g. in duration, spectral properties, and/or amplitude; see chapter 10 for more detail). In this chapter, we outline the developmental changes that occur during infancy and early childhood in the perception and production of lexical tone, pitch accent, and word stress. A necessary but not sufficient condition for learning word prosody is the ability to discriminate the relevant phonetic correlates. Children also need to learn to use each type of word prosody in meaningful situations (i.e. for word recognition and word production), and hence encode it in lexical representations (§38.2, §38.3, §38.4). The review does not extend to the role of word prosody in infant word segmentation, which is covered in chapter 40. The final section (§38.5) summarizes our current understanding of three overarching issues concerning the acquisition of word prosody: the relationship between perception and production, the representation of word prosody, and the factors driving development in word prosody.

38.2 The acquisition of lexical tone

38.2.1 Perception of lexical tones

Infants’ perception of lexical tones changes in the first year after birth as a function of native linguistic experience. Initially, infants are sensitive to the acoustic differences between non-native tonal patterns, but non-tone language learners largely lose their sensitivity to the majority of non-native lexical tones (Burnham and Mattock 2010). Using behavioural measures such as conditioned head-turn paradigms and habituation procedures, previous research has found that infants acquiring Dutch, English, French, and German can discriminate Cantonese, Mandarin, Thai, and Yoruba tones before the age of 6 months but become unable to distinguish many non-native tonal contrasts from 9 months onwards (Harrison 2000; Mattock et al. 2008; Yeung et al. 2013; Liu and Kager 2014; Götz et al. 2018; Shi et al. 2017b). In contrast, tone language learning infants refine their perception of native tonal contrasts throughout this period and retain some sensitivity to non-native contrasts beyond 9 months. For example, infants learning Cantonese, Mandarin, and Yoruba discriminate the corresponding native tonal contrasts at both 6 and 9 months and show language-specific differences in tonal perception already at 4 or 5 months (Harrison 2000; Mattock and Burnham 2006), with Yeung et al. (2013) documenting such language-specific differences as early as 4 months. Infants’ perception of tonal contrasts is clearly influenced by a variety of factors, such as the acoustic properties and corresponding salience of tones, and task difficulty (Burnham and Singh 2018). For instance, lexical tonal contrasts such as high-level versus high-falling, and high-level versus low-dipping in Mandarin Chinese, are marked by substantial acoustic differences, and are thus expected to be easy to distinguish perceptually (Huang and Johnson 2010). 
Tsao (2017) tested 4- to 13-month-old Mandarin-learning infants in a series of conditioned head-turn studies and found improved discrimination of the native high-level versus the native low-dipping contrast from 7 to 11 months. Infants across language backgrounds and developmental stages may face difficulties with non-salient tone contrasts (Tsao 2008; Liu and Kager 2014). It is worth noting that non-tone language learning infants regain tonal sensitivity at around 14–18 months, suggesting a degree of flexibility in tone discrimination in the second year after birth (Götz et al. 2018; Liu and Kager 2014, 2018). Although infants’ ability to perceive lexical tones emerges at a very early stage, its developmental trajectory is non-monotonic and still not well understood (see Singh and Fu 2016 for a more detailed review).

38.2.2 The role of lexical tone in word learning

The ability to discriminate and categorize tonal contrasts is necessary but insufficient for acquiring lexical tones. Importantly, children need to associate tonal contrasts in lexical representations with word meaning. Infants’ ability to integrate tonal contrasts lexically appears to change as a function of language experience and tone properties (Burnham et al. 2018), and recent work typically adopts mispronunciation or label–object association paradigms to examine word learning. Singh et al. (2008) reported early signs of integration of pitch into native word forms in 7.5-month-old English-learning infants, who recognized familiarized words when they were presented in the same pitch but not when their pitch was changed. At 9 months, however, English-learning infants were able to recognize familiarized words independent of changes in pitch, indicating that they had learned the irrelevance of pitch to lexical identity in the language. Other studies report slightly later timings of this developmental change. For example, in Hay et al. (2015) and Liu and Kager (2018), English- and Dutch-learning infants associated novel tones with novel objects at around 15 months but failed in this task at approximately 18 months. In Singh et al. (2014), both English- and Mandarin-learning infants detected Mandarin tonal mispronunciations at 18 months, but only the Mandarin-learning infants did so at 24 months. Quam and Swingley (2010) found that 2.5-year-old English-learning toddlers accepted novel objects labelled with differently toned words, but not if the words had different vowels. Thus, by 2 to 2.5 years, children exposed to a non-tone language stop encoding pitch differences as lexically relevant contrasts. Infants’ ability to lexically encode tonal contrasts is also affected by the particular tonal contrast used in word learning and word recognition experiments. A study by Burnham and colleagues (2018) showed that Mandarin-learning 17-month-olds found it easier to use the contrast between the high-level and mid-rising tones to distinguish novel objects than the contrast between the mid-rising and high-falling tones. The type of tonal contrast also affects the lexical interpretation of tonal patterns by non-tone language learners. Hay et al. (2019) tested the ability of American English-learning 14-month-olds to associate novel tone labels with objects and found that infants performed much better if rising pitch was involved. The authors attributed this ‘rising bias’ to infants’ over-interpretation of rising pitch contours when differentiating words, due to the prominent function of rising pitch in American English, which uses falling versus rising contours to mark meaning differences. Lastly, the lexical encoding of tones appears to follow a non-monotonic developmental pattern, similar to the perceptual development of tones. Shi et al. (2017a) reported that Mandarin-learning 19- to 26-month-old toddlers performed poorly in encoding similar tones, especially the rising tone and the low-dipping tone, in known words. Singh et al. (2015) tested older Mandarin-learning toddlers using a mispronunciation paradigm and reported greater sensitivity to tonal mispronunciations than to segmental mispronunciations at 2.5–3.5 years, but this sensitivity pattern was reversed at 4–5 years. Overall, the findings indicate that despite the early emergence of tone categories, tone language learning children continue to undergo substantial development in the lexical encoding of tones well beyond infancy.

38.2.3 Production of lexical tones

Research on tone production typically centres on how well young children can produce the pitch contours of individual tones or tone sequences, in spontaneous production or in a controlled setting. An early study by Li and Thompson (1977) on Mandarin-learning 1- to 3-year-olds showed that children first learned to produce the high-level and high-falling tones, followed by the mid-rising and low-dipping tones. Some follow-up studies report that Cantonese- and Mandarin-learning children could accurately produce native lexical tones before they mastered vowels and consonants, and rarely produced tone errors shortly after the age of 2 (So and Dodd 1995; Hua and Dodd 2000; Hua 2002; To et al. 2013). In addition, a number of studies with 2- to 3-year-old Cantonese- and Mandarin-learning toddlers have shown that children produce tone sandhi at a relatively early age (Clumeck 1977, 1980; Tse 1978; Hua and Dodd 2000; Hua 2002; Xu et al. 2016). Tone sandhi is a phonological process by which a lexical tone changes its tonal category in a particular tonal context, resulting in a derived tone that is perceptually indistinguishable from another lexical tone (Wang and Li 1967; Xu 1994). However, most of the above-mentioned studies are based on adults’ transcriptions of children’s productions; more recent studies using acoustic analyses have revealed substantial discrepancies in tone production between adults and young children. For example, Wong et al. (2005) and Wong (2012a, 2012b, 2013) found that Mandarin-learning children did not reach an adult level in their phonetic realization of lexical tones by 5 years of age, contrary to the claims based on adults’ transcriptions. Other acoustic analyses report that adult-like production of the low-dipping tone in Mandarin is acquired later than that of other tones, possibly because its articulation requires finer motor control (Wong 2012a) or because its realization varies between mid-rising and low-dipping due to tone sandhi (Chen et al. 2015; Wewalaarachchi and Singh 2016). Development in tonal production thus appears to be constrained by the same factors that affect tonal perception, such as children’s experience with their native language and the acoustic properties of tones.

38.2.4 Summary

Exposure to a tone language is not necessary for tone discrimination in the earliest stages of acquisition, as both tone language and non-tone language learning children are sensitive to tonal contrasts. However, exposure to a tone language is crucial for developing a linguistic tonal system for perception and production. Although non-tone language learning children are initially sensitive to lexical tones in a tone language, only tone language learning children retain and refine their sensitivity to native tonal patterns and use tonal contrasts in word learning after infancy. It is currently unclear how exactly tone language learning infants uncover the lexical tonal contrasts from the input. Existing research has, however, shown that the acoustic properties of tones influence the trajectory of tonal development. That is, tones that are acoustically and perceptually easy to distinguish remain discriminable for non-tone language learning infants, and they tend to be produced earlier and picked up earlier in word learning by tone language learning children.

38.3 The acquisition of pitch accent

There are reasons to hypothesize that the development of lexical pitch accent and word accent (hereafter ‘pitch accent’) would differ from that of lexical tone. On the one hand, the limited range of tonal contrasts in pitch accent languages may make it easier for learners to perceptually discriminate lexical contrasts. On the other hand, the sparse assignment of tonal contrasts in pitch accent languages means that surface pitch patterns in these languages are more strongly affected by post-lexical, or intonational, phonology, potentially making it more challenging for learners to identify the portion of pitch variation that belongs to word prosody. In the following subsections, we explore these issues by examining the development of pitch accent in three languages: (Tokyo) Japanese, (Stockholm) Swedish, and (Eastern) Limburgian.

38.3.1 Perception of pitch accent

Cross-linguistic experiments with newborns have revealed experience-independent abilities to discriminate pitch accent contrasts, just like tonal contrasts. For example, French neonates can discriminate lists of CVCV Japanese words that differ only in their pitch contour, low-high versus high-low (Nazzi et al. 1998b). In Tokyo Japanese, this difference corresponds to the difference between accentless or finally accented disyllabic words on the one hand and initially accented disyllabic words on the other. Thus, a perceptual sensitivity to acoustic differences associated with pitch accent is already present at birth. Perhaps unsurprisingly, this sensitivity is retained in infants learning a pitch accent language. For example, 4- and 10-month-old Japanese-learning infants can discriminate the minimal pairs used in the Nazzi et al. (1998b) study based on the rising versus falling contour differences (Sato et al. 2010). Similarly, Ramachers et al. (2018) demonstrated that Limburgian-learning infants at 6, 9, and 12 months of age were capable of discriminating novel Limburgian words with a fall or a fall-rise in pitch, a difference that reflects the Limburgian lexical contrast between Accent 1 and Accent 2. For instance, in sentence-final, focused position, haas [ha:s] with falling pitch is the word for ‘hare’ (Accent 1), but with falling-rising pitch it is the word for ‘glove’ (Accent 2). However, it is important to note that these sensitivities cannot be unambiguously attributed to Japanese- and Limburgian-learning infants’ exposure to lexical pitch contrasts. This is because similar pitch contours may also represent intonational contrasts in other languages, which could equally cause infants to respond differently to these pitch distinctions. 
For example, although Dutch does not lexically distinguish words based on pitch patterns, it does use falling versus falling-rising contours in the same sentential position to mark the difference between a statement and a question. In fact, Dutch-exposed infants can also discriminate Limburgian Accent 1 and Accent 2 words (Ramachers et al. 2018). Although the behavioural evidence discussed above suggests that infants’ ability to discriminate pitch contour differences is simply a matter of maintaining initial speech perception sensitivities, evidence from neuroimaging research indicates otherwise. Using functional near-infrared spectroscopy (fNIRS), Sato et al. (2010) found that 10-month-old, but not 4-month-old, Japanese-learning infants exhibited a left hemisphere dominance in their cortical hemodynamic responses when listening to the rising/falling pitch contrasts in disyllabic words. This hemispheric asymmetry was also found in adult speakers of Japanese listening to the same prosodic contrast (Sato et al. 2007), but not in infants’ or adults’ responses to pure tone analogues of the pitch contours. These results indicate that neurofunctional changes occur in the perception of contour differences corresponding to lexical pitch contrasts between 4 and 10 months. However, the same caveat mentioned above for the behavioural evidence applies to this finding as well. Until we can demonstrate that infants learning languages without lexical pitch contrasts show a different pattern of neurofunctional development, it cannot be concluded that the changes are induced by the lexical as opposed to intonational nature of the pitch contrasts.

38.3.2 The role of pitch accent in word recognition and learning

Recent work has also begun to examine the role of pitch contrasts in word recognition and word learning by children learning pitch accent languages. If pitch accent is encoded in children’s lexical representations, we expect their word recognition to be sensitive to pitch variation. Indeed, Japanese-learning 17-month-olds (Ota et al. 2018) and 24-month-olds (Yamamoto and Haryu 2018) are slowed down in their identification of familiar words when the words are presented with a lexically incongruent pitch contour (e.g. final-accented inú ‘dog’ spoken with a falling contour instead of the correct rising contour). Comparable results are observed in 2.5- to 4-year-old Limburgian-learning children, whose recognition of trained novel words is attenuated when the test words are presented with the contrastively opposite pitch contour (e.g. Accent 1 instead of Accent 2) (Ramachers et al. 2017). Nonetheless, the ‘incorrect’ pitch contours do not completely block word identification in any of these studies. Moreover, age-matched Dutch-learning children show the same response pattern as the Limburgian-learning children in Ramachers et al. (2017). These results suggest that children are capable of storing some pitch information when learning new words regardless of whether their native language has lexical accent. Recall, however, from §38.2.2 that 1.5-year-old Dutch-learning children fail to associate tonal contrasts between level and falling pitch in monosyllabic novel words with novel objects (Liu and Kager 2018). Thus, children may be able to recruit native intonational contrasts to mark lexical differences, but only when the contour types can be assimilated to native patterns (e.g. fall vs. fall-rise in Dutch).

38.3.3 Production of lexical accent

Fundamental frequency (f0) analyses of children’s spontaneous speech in Japanese and Swedish have shown that lexical and intonational aspects of the pitch phonology become evident in children’s word production during the second year. In (adult) Tokyo Japanese, isolated disyllabic words are produced with a pitch fall from the first to the second syllable when there is an accent on the initial syllable, due to a high-low tonal sequence marking the accented syllable. Isolated disyllabic words with no accent or an accent on the second syllable are produced with a pitch rise between the two syllables, due to a phrase-initial low-high tonal sequence, which is an intonational feature. Some 18-month-olds fail to produce discernible differences in the pitch contours for these two types of words, indicating that they have not acquired any lexical or intonational contrasts in pitch production (Ota 2003). Others, however, reliably produce the pitch fall corresponding to the pitch accent, but not the phrase-initial intonational pitch rise. In other words, lexical and intonational marking develop somewhat separately in production. Research on early word production in Swedish has focused on the contrast between Accent 1 (‘acute’) and Accent 2 (‘grave’) words (see Romøren 2016 for a review). In citation form, disyllabic words with initial word stress differ depending on their accent: Accent 1 words carry one rise-fall between the syllables, while Accent 2 words carry two rise-fall pitch sequences, one on the stressed syllable and one on the post-stress syllable. The initial rise-fall in Accent 2 words is a lexical pitch accent, while the second rise-fall in Accent 2 words and the rise-fall in Accent 1 words result from intonational features (i.e. a focus-marking high tone and a boundary-marking low tone) (Bruce 1977; Riad 2014). An early acoustic analysis of isolated, initially stressed disyllabic words produced by Swedish-learning 17- to 18-month-olds concluded that the children could distinguish Accent 1 and Accent 2 only in the post-stress syllable, with a rise in Accent 2 words from the intonational contour (Engstrand et al. 1991). Later analyses demonstrated that Swedish-learning children of around 16 to 18 months also produced a low-pitch turning point before the post-stress syllable in Accent 2 words, an indication that the effect of the pitch accent was also evident, although its realization was somewhat inconsistent at this age (Kadin and Engstrand 2005; Ota 2006). A more reliable distinction between Accent 1 and Accent 2 contours becomes observable around 24 months (Kadin and Engstrand 2005). Interestingly, in single-word utterances, the first detectable pitch characteristic to emerge in children’s production is not the invariable pitch accent contour but the intonational post-stress rise-fall in Accent 2 words, which is heard only in focus position. This may be because the intonational rise-fall is often more acoustically salient in the input, or because children pay more attention to intonationally highlighted words (Grassmann and Tomasello 2010).

38.3.4 Summary

Although pitch accent can be seen as a type of tonal system, the available developmental work reveals some interesting differences between the acquisition of pitch accent and that of lexical tone. For one thing, there is little evidence (at least for now) that exposure to a pitch accent system triggers the kind of language-specific early perceptual changes to pitch variation observed in infants exposed to lexical tone. Part of the reason for this could be that the patterns that mark lexical contrasts in pitch accent languages are simpler than those in lexical tone languages and may not require perceptual attunement to the relevant acoustic dimensions. In word recognition, we have seen evidence that, by 24 months, children begin to encode pitch accent contrasts in familiar and newly acquired words. But two findings raise interesting questions, such as the extent to which such encoding is uniquely lexical and categorical: children do not completely reject pitch-mismatched words, and learners of a non-pitch-accent language can perform similar tasks. Finally, the production data reveal a complexity not observed in the development of lexical tone, owing to the highly interactive nature of lexical and intonational factors in lexical pitch systems.

38.4 The acquisition of word stress

The acquisition of word stress is more akin to the acquisition of pitch accent than to that of tone, as there is often only one stressed syllable in a word. In many languages, minimal pairs that differ only in lexical stress are very rare. English, for example, has some Romance loans where the verb is stressed on the second syllable but the noun on the first (the verb abstráct vs. the noun ábstract), indicating that word class is a factor besides stress. Moreover, it is possible that word stress is rule- or constraint-governed, at least in part, rather than encoded in the mental lexicon for each lexical item. If so, dominant word stress may not be marked in the lexicon but generated by rules or constraints, leaving only non-dominant stress lexically marked.

38.4.1 The perception of word stress

Months before they start to produce their first words, infants can already discriminate between the dominant stress pattern of their native language and other patterns, and they show a preference for words with the dominant stress pattern. For example, in their seminal study, Jusczyk et al. (1993) presented English-learning infants with lists of strong-weak (S-W) and weak-strong (W-S) words in a head-turn preference (HTP) procedure. Although English-learning infants did not yet prefer trochaic words at 6 months of age, they showed a clear preference for trochaic words at 7.5 months. Similar results were found for Dutch-learning infants at 9 months (Houston et al. 2000). Subsequent research investigated whether the trochaic bias found in English-learning and Dutch-learning infants is universal or already reflects learning of the word-prosodic system of the native language. The evidence points to the latter: French-learning infants showed the opposite preference from German-learning infants, who, like English-learning and Dutch-learning infants, preferred trochaic words from 6 months onwards, though not earlier (Höhle et al. 2009). Similarly, Hebrew-learning infants showed a preference for the dominant iambic word-stress pattern of their language (Segal and Kishon-Rabin 2012). Spanish-learning 9-month-olds did not show a trochaic preference when presented with trochaic and iambic CV.CV sequences, but they did when syllable weight was taken into account: they showed a trochaic bias for CVC.CV words and an iambic bias for CV.CVC words (Pons and Bosch 2010), reflecting the variation in word stress in their input. Such preferences signal infants’ developing knowledge of the word-prosodic system.

Vihman et al. (2004), testing trochaic and iambic words and children’s sensitivity to stress mispronunciations in an HTP paradigm, showed that infants at 11 months, but not at 9 months, preferred to listen to frequent over infrequent words regardless of the stress pattern (iambic vs. trochaic) or the correctness of the stress (correctly vs. incorrectly pronounced stress). In other words, misstressing did not affect word ‘recognition’. However, this experiment did not tap into the lexical representation of stress, as the infants did not know the words. Thus, unlike in the case of lexical tone, there is no evidence that word stress is discriminated better by younger than by older infants; but, as with lexical tone and pitch accent, infants tune in to their native word-stress system in the second half of the first year.

38.4.2 The role of word stress in word recognition and word learning

The role of word stress in novel-word learning has been studied through a habituation paradigm, while its role in the recognition of known words has typically been tested in a stress-mispronunciation paradigm. For example, Curtin (2009) used habituation to test whether 12-month-old English-learning children could learn words that differed only in word stress. She presented the children with two word–object pairings, in which the words differed only in the location of stress (BEdoka vs. beDOka), until they habituated. In the test phase, words and objects were switched. The fact that infants noticed the switch in the word–object combination suggests that they had encoded word stress in the newly learned words. Using the same paradigm, Graf Estes and Bowen (2013) habituated 19-month-old English-learning infants to words with high or low phonotactic probability that had either trochaic or iambic stress, and found that the infants learned the labels only for trochaic words with high phonotactic probability. However, 16-month-old English-learning infants were able to use the difference between trochees and iambs for category learning: they associated iambs with actions (verbs) and trochees with nouns (Curtin et al. 2012). Thus, in the course of the second year, children not only know what the dominant stress pattern of their language is but can also learn words with a non-canonical stress pattern and even use stress to assign category labels (verb/noun). De Bree et al. (2008) used stress mispronunciations to test word recognition in a visual-world paradigm, manipulating stress in words well known to Dutch-learning 3-year-olds, who listened to correctly stressed and misstressed iambic and trochaic words while viewing two objects (target and distractor). In contrast to the infants in Vihman et al. (2004), de Bree et al. found that children looked significantly longer at the target picture on trials with correctly stressed trochaic words than on trials with misstressed trochees, whereas misstressing iambs did not hinder word recognition. This suggests that it is violation of the regular (expected) pattern that hinders word recognition.

THE ACQUISITION OF WORD PROSODY 549
Quam and Swingley (2014) also used a visual-world paradigm to test whether English-speaking preschoolers (206 children aged 2.5 to 5 years) recognized familiar words when misstressed, e.g. bunny (misstressed as bunNY) and banana (misstressed as BAnana). Children fixated the target picture more when the word was correctly stressed, and this was true for both trochaic and iambic words. However, when pitch was the only cue to word stress, the children were less sensitive to misstressings. In sum, the studies reviewed in this section suggest that word recognition is impaired when stress deviates from the expected word-stress pattern.

38.4.3 The production of word stress

Spontaneous production data from many languages show that children tend to produce short words of at most two syllables. In languages where the trochee is the dominant word-stress pattern, children often truncate words that have more than one foot and an initial unstressed syllable, e.g. banana produced as nana in English (English: Smith 1973; Allen and Hawkins 1978; Echols and Newport 1992; Demuth 1996; Kehoe 1997; Kehoe and Stoel-Gammon 1997; Pater 1997a; Dutch: Fikkert 1994; Wijnen et al. 1994; Taelman and Gillis 2003; German: Grimm 2007; Kehoe et al. 2011; Spanish: Macken 1978; Kehoe et al. 2011; but see Hebrew: Adam and Bat-El 2009). In addition, stress errors are rare and short-lived in production, and they occur more often in words with irregular stress (e.g. Fikkert 1994; Kehoe 2001).

One recurring theme in the literature is the trochaic bias in production. Allen and Hawkins (1978, 1980) claimed that children start with a universal trochaic bias and hence produce mostly monosyllabic and trochaic words. This trochaic pattern has frequently been reported, mostly for Germanic languages, which are inherently trochaic, but also for Hebrew, for example. Hebrew does not provide children with overwhelming evidence for either trochees or iambs: both patterns occur frequently. Yet, according to Adam and Bat-El (2009), children adhere to a trochaic foot during the very early stage of acquisition, whereas their production patterns at later stages do reflect the frequencies in the language. In contrast, Hochberg (1988) argued that (Mexican) Spanish-learning children have a neutral start in stress learning. The data for Portuguese-learning infants (both European and Brazilian) do not provide clear evidence for an initial trochaic bias either (dos Santos 2003; Correia 2009). Rose and Champdoizeau (2008) argue that there is no evidence for a trochaic bias, not even in English. They refer to a number of sources on child language data in English (Vihman et al. 1998) and also provide data from French-learning children, who predominantly produce iambic words in their early vocabularies. However, experimental evidence from the repetition of trisyllabic W-S-W (weak-strong-weak) forms clearly shows English-learning children’s preference for trochees over iambs, as these forms are predominantly realized as S-W (Echols 1993).

38.4.4 Summary

Infants start to show a preference for the dominant word-stress pattern of their language in the second half of the first year and become sensitive to minor patterns in the course of development. Misstressing does not seem to hinder recognition of words with the non-dominant word-stress pattern, but it does impair recognition of words with the dominant stress pattern. Such asymmetries have not been reported for infants learning tone or pitch accent languages. Accurate production of word stress takes years to develop: although children produce few stress errors, they typically reduce the number of syllables in longer words.

38.5 Discussion and conclusions

In all three domains discussed in this chapter, children exhibit sensitivity to native lexical tone and pitch accent contrasts and to the dominant word-stress patterns of their language already in the second half of the first year. This appears to provide the foundation for the development of word prosody in their lexical representations and for the production of lexical tones, pitch accent contrasts, and stress patterns, which begins in the babbling stage and continues into childhood.¹ In the remainder of this chapter, three overarching issues will be addressed: the relationship between perception and production, the representation of word prosody, and the factors driving development in word prosody.

Very few studies have addressed the perception and production of lexical tone, pitch accent, or word stress in the same children. Hence, we lack insight into the nature of the attested time lag between perception and production. For instance, while infants seem to have language-specific knowledge of the word-stress system before age 1, they do not show this advanced knowledge in production, where truncation of iambic words still appears in the third year of life. There are, of course, significant methodological differences: perception studies have typically been performed in the lab and are based on group results, whereas production studies are often based on spontaneous speech from individual children. Yet, given that accurate perception of tone, pitch accent, or word stress is a prerequisite for accurate production of these word-level prosodic features, parallels between perception and production could be expected.

While there is a consensus that children have to learn lexical tones and pitch accents as properties of words, the nature of the representation of word stress in the mental lexicon is more debatable: regular word stress could be a lexical property of words or could be assigned by the grammar. The fact that children show a preference for language-specific word stress in words they do not yet know suggests that this knowledge is not specific to lexical items but rather reflects generalization over the input. Alternatively, this preference may originate in infants’ generalization over the words in their receptive vocabulary: Bergelson and Swingley (2013) have shown that infants know more words than previously assumed as early as 6 months, even though most words in infants’ early lexicons are probably monosyllabic or disyllabic.

¹ Whether or not children produce the word-prosodic pattern of a word correctly may depend on factors other than perception, such as sentence prosody. This is an area for future research that we have not been able to cover in this chapter (but see Liu and Kager 2014 and Singh et al. 2014 for lexical tone; Ota 2003, Romøren 2016, and Ramacher et al. 2018 for pitch accent; and Gerken 1994, 1996 for word stress). For instance, Gerken (1994, 1996) argued that whether or not English-learning children produce iambic words accurately depends on the rhythm of the sentence: if an iambic word is preceded by a stressed syllable, its unstressed syllable is more often realized than when it is preceded by an unstressed syllable.
Using word stress to bootstrap word segmentation is an important step in learning words, but it does not necessarily imply that words are learned and stored with word stress. Some have argued that word stress is part of children’s lexical representations, just like lexical tone and pitch accent. In a sentence repetition task, Carter and Gerken (2004) asked children to imitate sentences containing names such as Sandra and Lucinda (e.g. ‘He kissed Sandra/Lucinda’). In the latter case, the initial unstressed syllable was typically deleted (e.g. ‘He kissed _cinda’). However, the pause between the verb and the name was longer when the initial syllable was not realized. This suggests a trace of the missing syllable and may indicate that children know the whole word form even though they produce a truncated version of it.

An important issue that is still under-researched concerns the mechanisms driving development in word prosody (Prieto and Esteve-Gibert 2018). While the developmental trajectories of lexical tone, pitch accent, and word stress are well charted by now, it is time to investigate what may trigger their development. It may be that the development of pitch accent is bootstrapped by the acquisition of sentence prosody, given children’s early sensitivity to declarative and question intonation (Chen and Fikkert 2007; Frota and Butler 2018). Fikkert (1994) presents a developmental account of word stress inspired by the computational stress learner of Dresher and Kaye (1990), which was itself based on a parametric theory of stress. According to Fikkert, children’s development consists in expanding their own production forms on the basis of detected errors: either errors of omission (the child’s form is shorter than the adult target) or errors of commission (the child’s form is stressed differently from the adult target form). Errors of omission typically persist longer than errors of commission.
Pater (1997a) presents a developmental model of word-stress acquisition in Optimality Theory, in which development is characterized as the demotion of markedness constraints, allowing increasingly marked forms to surface over the course of development. Although Pater (1997a) shares the idea of a developing grammar, he argues that children’s lexical representations may be adult-like from the start, unlike Fikkert (1994), who assumes a non-adult-like initial grammar. Ota (2003) uses autosegmental theory to explain how pitch accent may develop hand in hand with phrase-level or sentence-level prosody as well as with stress in children’s production. These very different accounts of similar production data suggest that production data are difficult to interpret. This is why current approaches to understanding development draw on artificial language learning experiments, as evident from the overview article by Gomez and Gerken (2000). Lexical tone, pitch accent, and word stress have all been investigated to some extent in artificial language learning experiments, but owing to space limitations these studies have not been covered in this overview. Such studies may be important for understanding the acquisition of prosody in general and may enhance our insight into the acquisition of word prosody in particular.
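The following Python sketch makes the Optimality Theory mechanism behind Pater’s (1997a) proposal more concrete, under heavily simplified assumptions. It shows how a winner is selected under strict constraint ranking and how demoting a markedness constraint lets a more marked, more faithful form win. The constraint names (NoInitialUnstressed, MaxSyllable), their violation counts, the two-candidate set for banana, and the `demote` step, which simply moves a constraint to the bottom of the ranking, are all hypothetical illustrations. They are not Pater’s constraint set or learning procedure.

```python
# Toy illustration of OT evaluation by strict domination. Constraint names,
# violation counts, and candidates are hypothetical, not Pater's (1997a) analysis.
from typing import Dict, List

# Violation profiles for two candidate outputs of adult baNAna (capitals = stressed).
CANDIDATES: Dict[str, Dict[str, int]] = {
    # Faithful form: keeps all syllables but begins with an unstressed syllable.
    "baNAna": {"NoInitialUnstressed": 1, "MaxSyllable": 0},
    # Truncated form: begins with the stressed syllable but deletes one syllable.
    "NAna": {"NoInitialUnstressed": 0, "MaxSyllable": 1},
}


def optimal(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> str:
    """Return the candidate whose violations, read in ranking order, are
    lexicographically smallest, i.e. the winner under strict domination."""
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])


def demote(ranking: List[str], constraint: str) -> List[str]:
    """Hypothetical, simplified demotion step: move one constraint to the bottom."""
    return [k for k in ranking if k != constraint] + [constraint]


early = ["NoInitialUnstressed", "MaxSyllable"]   # markedness >> faithfulness
later = demote(early, "NoInitialUnstressed")     # faithfulness >> markedness

print(optimal(CANDIDATES, early))  # NAna
print(optimal(CANDIDATES, later))  # baNAna
```

Python compares lists element by element, so a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones. This is what strict domination requires.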


Chapter 39

Development of Phrase-Level Prosody from Infancy to Late Childhood

Aoju Chen, Núria Esteve-Gibert, Pilar Prieto, and Melissa A. Redford

39.1 Introduction

It was long thought that children master phrase-level prosody (intonational contours, phrasing, and rhythm; hereafter prosody) before other aspects of language. This belief stemmed from the observation that infants systematically vary pitch, duration, and intensity in their vocalizations according to the interactional context, thus responding appropriately to prosodic variation in adults’ speech. Yet recent theoretical advances in the field have encouraged more detailed study of children’s prosodic abilities, leading researchers to question the view of early mastery. Researchers now recognize that mastery of a native prosodic system requires four abilities: producing the formal properties of the system, varying them appropriately in response to communicative context, perceiving their meanings, and processing prosodic information in language comprehension (Kehoe 2013; Prieto and Esteve-Gibert 2018). Research over the past two decades has shown that prosodic acquisition starts with perceptual attunement to native tonal patterns and with cues to prosodic boundaries in infant vocalizations. However, formal properties are not acquired until age 2 years, and the phonetic implementation of some prosodic features is not mature until middle childhood. The development of the full communicative functions of prosody takes even longer. The goal of this chapter is to present state-of-the-art research on phrase-level prosodic development from infancy to late childhood (§39.2, §39.3, and §39.4). To this end, we describe developmental trajectories for the formal and functional properties of prosody, identify factors that may explain why prosodic development is a gradual process across languages, and examine why cross-linguistic differences nevertheless arise early. We also suggest key issues for future research (§39.5).


39.2 Prosody in infancy

Infants are sensitive to prosodic variation within and across languages from birth. This sensitivity undergirds early perceptual attunement to ambient language patterns. Infant vocalizations incorporate prosody from a very early age. Whereas the earliest prosodic adaptations to social context may reflect links between acoustic vocal properties and emotional states, it is clear that infants also begin to incorporate language-specific patterns into their vocal repertoires before they begin to speak.

39.2.1 Infants’ perception of tonal patterns and prosodic phrasing

Infants are acutely sensitive to prosody from birth (see chapter 40). They can distinguish between rhythmically different languages shortly after birth and prefer to listen to a language that is rhythmically similar to their own at 3.5 to 5 months of age (Nazzi and Ramus 2003; Molnar et al. 2013). French newborns detect changes in pitch movement (rise vs. fall) in Japanese words (Nazzi et al. 1998b). English-learning 1- to 4-month-old infants can discriminate a pair of English words whose final vowels differ in pitch, duration, or intensity (Bull et al. 1984, 1985; Eilers et al. 1984). This sensitivity no doubt provides the foundation for the perception of tonal patterns (i.e. phonologically contrastive pitch patterns in a language).

Research on early perception of tonal patterns has focused on infants’ perception of non-native tonal patterns from a tone language (Tsao et al. 2004; Mattock and Burnham 2006; Yeung et al. 2013; Liu and Kager 2014; Singh and Fu 2016; see also chapter 38). The main finding is that infants’ tonal perception narrows in the first year of life, a phenomenon known as ‘phonological attunement’ or ‘perceptual reorganization’. All infants can discriminate most native and non-native patterns from birth until 4 to 6 months of age. After 6 months, sensitivity to native tonal patterns is retained or enhanced, while sensitivity to non-native tonal patterns declines, albeit only in infants whose ambient language does not have lexical tone. Thus, perceptual attunement to tonal patterns is similar to that reported for vowels and consonants, though it occurs at a slightly earlier age (see Kuhl 2004; Saffran et al. 2006; Yeung et al. 2013).

Limited research on infants’ perception of native tonal categories in intonation languages shows that early tonal perception may be influenced by how consistently a tonal pattern occurs in a given communicative context and by how salient the acoustic differences between tonal patterns are (Soderstrom et al. 2011; Frota et al. 2014; Butler et al. 2016; see Frota and Butler 2018 for a review). For example, European Portuguese-learning infants can discriminate between nuclear contours with different boundary tones, such as fall and fall-rise, as early as 5 months of age (Frota et al. 2014), but English-learning infants aged 4 to 24 months do not easily discriminate between rising and non-rising nuclear contours (Soderstrom et al. 2011). The two studies are not directly comparable owing to methodological differences. The Portuguese study presented one-word utterances with different nuclear contours in different blocks of stimuli, whereas the English study mixed multiword utterances with different nuclear contours in the same blocks. Even so, the different findings provide initial evidence for infants’ sensitivity to the coupling between tonal patterns and communicative contexts. In particular, in European Portuguese falling nuclear contours are frequently used in statements and fall-rise nuclear contours in questions, whereas in English both rising and non-rising patterns are used in both situations (Soderstrom et al. 2011; Frota et al. 2014). More recently, Butler et al. (2016) have shown that European Portuguese-learning infants do not show sensitivity to contrasts in peak alignment until 8 to 12 months of age, even though these alignment contrasts consistently signal different focus types. Here, peak alignment refers to the difference between a pitch peak at the end of a stressed syllable and one on the following syllable. Together, the studies on European Portuguese-learning infants point to an ordering effect when tonal patterns differ little in how consistently they occur in a given communicative context. In that situation, infants acquire tonal contrasts with more salient acoustic differences (i.e. direction of final pitch movement) earlier than contrasts with less salient acoustic differences (e.g. peak alignment).

Research on early perception of prosodic phrasing suggests perceptual narrowing in the use of boundary cues. Major prosodic boundaries, such as intonational phrase boundaries, are demarcated by several acoustic cues, including pre-boundary lengthening, pausing, and pitch changes (Ferreira 2007; Mo 2010; Wagner and Watson 2010; Holzgrefe-Lang et al. 2016; Zhang 2012). Languages differ in the distribution of these cues, and so listeners weight them differently when identifying boundaries (Peters et al. 2005; Holzgrefe-Lang et al. 2016).
For example, English- and Mandarin-speaking listeners treat pitch change as a more reliable cue to boundary perception than pre-boundary lengthening and pause, but Dutch-speaking listeners appear to treat pause as a more reliable cue to boundary perception than pitch changes and boundary tones (Sanderman and Collier 1997; Swerts 1997). As with tonal patterns, infants become attuned to language-specific boundary marking over developmental time. Between 4 and 6 months of age, infants exposed to different languages can discriminate well-formed and ill-formed prosodic units (e.g. ‘rabbits eat leafy vegetables’ spoken as one intonational phrase or as part of two different intonational phrases with a boundary between ‘eat’ and ‘leafy’) only if all boundary cues are present (Seidl and Cristià 2008; Wellmann et al. 2012); between 6 and 8 months of age, infants begin to rely only on those cues that adults find most critical for boundary marking (Seidl 2007; Johnson and Seidl 2008; Wellmann et al. 2012). To take another specific example, English-learning 6-month-old infants attend more to pitch changes than to the other cues at intonational phrase boundaries, but Dutch-learning 6-month-old infants attend more to pausing than to other cues at intonational phrase boundaries (Johnson and Seidl 2008).

39.2.2 Prosody in pre-lexical vocalizations

Infants modulate prosodic features to express emotional affect well before they produce identifiable consonant-vowel sequences (Oller et al. 2013): as early as 3 months of age, infants’ vocal productions differ as a function of context. For example, vocalizations produced during positive interactions (i.e. playing, reunion, feeding) differ from those produced when the infant is under stress (i.e. in pain, isolated, or hungry) (Lindová et al. 2015). Infants express negative affect by vocalizing for longer and across a wider pitch span than during positive interactions (Scheiner et al. 2002).

Around 7 to 9 months of age, infants’ vocalizations begin to sound like articulated speech. Consonant-like sounds alternate with vowel-like sounds to form long strings of syllable-like utterances. This is the babbling stage of infant vocal development. This stage overlaps with first-word acquisition in that infants continue to produce long strings of babble intermixed with words well into the second year of life (Vihman 2014). Babbling has a clear prosody that is influenced by the ambient language (e.g. Whalen et al. 1991; Engstrand et al. 2003; DePaolis et al. 2008). For example, DePaolis and colleagues (2008) studied the babbling of 10- to 18-month-old infants learning American English, Finnish, French, and Welsh. Duration, intensity, and pitch patterns differed across these groups in directions clearly consistent with the prosody of the ambient language.

At around 9 months of age, some infants’ vocalizations appear to shift from play-like to intentionally communicative (Vihman 2014; Esteve-Gibert and Prieto 2018). At around 11 months of age, infants clearly use duration, pitch range, and the direction of the pitch movement to signal their pragmatic intent. For example, Catalan- and English-learning infants produce longer vocalizations with a larger pitch span when requesting or expressing discontent than when responding to a caregiver or producing a statement (Papaeliou et al. 2002; Papaeliou and Trevarthen 2006; Esteve-Gibert and Prieto 2013). Italian-learning 12- to 18-month-olds tend to produce falling contours with declarative pointing gestures and rising contours with requestive pointing gestures (Aureli et al. 2017).
Moreover, cross-linguistic differences have been observed in the combination of pitch and gesture, possibly reflecting the intonation patterns of the ambient language. For example, in contrast to Italian-learning infants, Dutch-learning 14-month-olds most frequently produce level contours with requestive pointing and rising contours with declarative pointing (Grünloh and Liszkowski 2015).

In sum, prosody in infancy shows four main characteristics:
- sensitivity to variation in prosodic parameters at birth;
- a development from language-independent to language-specific perception;
- an evolution from affectively linked vocalizations to more clearly pragmatically driven ones between 3 and 12 months of age;
- the approximation of language-specific correlates of prosody during babbling.

39.3 Prosodic production in childhood

Developmental studies on the prosodic production of toddlers and preschoolers (hereafter ‘children’, as opposed to the ‘infants’ of the preceding sections) are still fragmentary. Researchers have typically investigated a limited set of prosodic phenomena in a small number of children in a narrow age range for a handful of languages. Despite these limitations, the literature points to an interesting conclusion: children acquire language-specific intonation before they acquire language-specific rhythm. In this section, we briefly review the limited research on prosodic production in childhood that supports this conclusion and then consider what it implies for language development.


39.3.1 The acquisition of intonational contours

Contour-based approaches to child prosody have assessed the acquisition of global intonation and prosodic patterns in languages like English, French, and Japanese (e.g. Hallé et al. 1991; Vihman et al. 1998; Snow 2006). These studies have revealed cross-linguistic differences in the global contours that children produce during babbling and in their early words (Hallé et al. 1991; Vihman et al. 1998). For example, Hallé and colleagues (1991) investigated the fundamental frequency (f0) and vowel duration properties of bisyllabic babbling sequences and single-word productions of four French- and four Japanese-learning 18-month-old toddlers. Global intonation contours and lexical tonal patterns differed as a function of the ambient language. Snow and Balog (2002) cite this study along with several others (e.g. Crystal 1986; Snow 1995) to support their argument that intonational development reflects the acquisition of language-specific knowledge. The alternative view is that intonational development is a by-product of ‘natural tendencies’ due to physiological factors (e.g. breathing) or universal socio-emotional ones. This view, common in earlier work on children’s intonation production (e.g. Lieberman 1966; Dore 1975; D’Odorico 1984), wrongly predicts a universal pattern of intonational development.

More recent studies on children’s speech prosody share Snow and Balog’s (2002) focus on knowledge acquisition. These studies have investigated early intonation from the perspective of autosegmental-metrical (AM) theory, which assumes a prosodic grammar (see chapter 6). The AM perspective encourages a detailed description of children’s phonological categories, including mastery of language-specific pitch accents and boundary tones. To date, children’s intonational grammars have been described for Spanish and Catalan (Prieto et al. 2012a; Thorson et al. 2014), Portuguese (Frota et al. 2016), Dutch (Chen and Fikkert 2007; Chen 2011b), and English (Astruc et al. 2013). These analyses suggest that children acquire a good portion of the intonational system between 14 and 19 months, when their vocabulary is still small (25 words). By age 2 years, children use prenuclear accents appropriately, and their speech is characterized by adult-like pitch accent distributions. Adult-like pitch alignment and pitch scaling have also been observed for a few languages (e.g. Spanish for alignment, Portuguese for scaling) (see Frota and Butler 2018 for a review).

Young children also produce cues to major prosodic boundaries at about the same time as they correctly produce prenuclear pitch accents. Language-specific patterns of final lengthening have been observed in infant babbling (e.g. Hallé et al. 1991), but final lengthening is not fully controlled until age 2 years. For example, Snow (1994) investigated both final lengthening and falling intonation in a longitudinal study of English-learning 16- to 25-month-olds’ speech. He observed that control over final falls emerges earlier than control over final lengthening, which is used systematically as a boundary marker only at the onset of combinatorial speech.

39.3.2 The acquisition of speech rhythm

In contrast to the early acquisition of intonational categories and boundary marking, children do not produce fully adult-like rhythm patterns, as measured by interval-based metrics, until the age of 4 or 5 years for the ‘syllable-timed’ Romance languages (Bunta and Ingram 2007; Payne et al. 2012), and not until after the age of 5 years for the ‘stress-timed’ Germanic languages (Grabe et al. 1999; Sirsa and Redford 2011; Payne et al. 2012; Polyanskaya and Ordin 2015). Part of the apparent delay is due to the fact that language-specific rhythm patterns are very much a running-speech phenomenon. Whereas appropriate intonation and boundary marking can be observed at the one- and two-word stages, well-structured rhythm patterns arise only in longer phrases that are chunked into language-appropriate supralexical units. Of course, children typically produce well-formed sentences as single utterances by about age 3 years. So why is speech rhythm still immature at age 4 in Romance languages and until after age 5 in Germanic languages? Payne and colleagues (2012) suggest that the delayed acquisition of language-specific rhythm in Germanic languages is due to an interaction between the acquisition of phonological abilities and the acquisition of phonetic abilities. More specifically, the interaction is between the acquisition of complex syllable structure and the development of the motor skills necessary for the temporally invariant production of segmental targets. This explanation is plausible in that it aligns well with the view that language-specific rhythm patterns emerge from language-specific syllable structures (e.g. Dauer 1983) and with the observation that speech motor skills develop slowly (e.g. Smith and Zelaznik 2004). But insofar as motor skill development continues until middle adolescence, the explanation may overstate the influence of phonological acquisition on speech rhythm development. For example, English-speaking 5-year-olds who have mastered English syllable structure do in fact produce adult-like temporal patterns associated with lexical stress even though their overall rhythm patterns remain immature (Sirsa and Redford 2011).
This observation suggests that the protracted development of English rhythm may also reflect delays in children’s ability to phonetically implement supralexical structures. For example, Redford (2018) found that 5-year-old children produce longer and louder determiner vowels relative to the adjacent nouns than adults, even though measures of anticipatory coarticulation suggest that they chunk the determiners with the following noun, just as adults do. Thus, the difference between English-speaking children’s and adults’ speech seems to arise from inadequately reduced grammatical words and not from differences in how these words are chunked with adjacent content words to create larger rhythmic units. This interpretation conforms to the more general suggestion that by the time children produce multi-word utterances, their speech representations are no longer influenced by motor skill development, even though this development continues into adolescence (Redford and Oh 2017).

To summarize, current work on children’s speech supports the hypothesis that the phonological aspects of prosody are acquired early. The protracted acquisition of speech rhythm suggests that immature motor skills may nonetheless impede children’s ability to implement the representations they have acquired in an adult-like manner.
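The studies above compare children and adults on ‘interval-based’ rhythm metrics without spelling one out. As an illustration only, the sketch below computes one widely used metric of this kind, the normalized Pairwise Variability Index (nPVI) as used by Grabe and Low (2002), over a list of successive interval durations. Which metric each cited study relied on is not stated here, the function name `npvi` is our own, segmentation of speech into intervals is assumed to have been done beforehand, and the durations in the usage example are invented.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval durations.

    Each adjacent pair contributes |d_k - d_(k+1)| divided by the pair's mean,
    so the index is unchanged if all durations are scaled by the same factor
    (e.g. a uniformly faster speaking rate). The mean ratio is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    ratios = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical vocalic interval durations in ms (invented for illustration).
even = [100, 100, 100, 100]        # no long-short alternation
alternating = [50, 150, 50, 150]   # strong long-short alternation

print(npvi(even))         # 0.0
print(npvi(alternating))  # 100.0
```

Higher values indicate more durational contrast between neighbouring intervals; vocalic nPVI is typically reported as higher for stress-timed languages such as English than for syllable-timed ones, which is why an immature (lower) value in children’s English can be read as less adult-like rhythm.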

DEVELOPMENT OF PHRASE-LEVEL PROSODY FROM INFANCY TO LATE CHILDHOOD

39.4 Communicative uses of prosody in childhood: production and comprehension

Research on children’s communicative uses of prosody has centred on the interface between prosody and information structure, and on the expression of emotions and epistemic meanings typical of children’s interactions with adults. A consistent pattern that has emerged from various studies is that the mastery of adult-like competence in this area is gradual, despite the early use of prosody for interactional purposes and the early mastery of the phonological aspects of prosody. In this section, we review the key findings and the explanations for why mastery of the communicative uses of prosody takes so long.

39.4.1 Acquisition of prosody and information structure

In many languages, speakers vary prosody in response to changes in information structure (see chapter 31). Listeners rely on these changes to make sense of incoming information (Cutler et al. 1997; Dahan 2015). The specific form–function relationship between prosody and information-structural categories is often language-specific. Developmental research on the interface between prosody and information structure is mostly concerned with how children encode focus prosodically in production, how they react to the prosody-to-focus mapping, and how they use prosodic information to anticipate upcoming referents in online comprehension. Although children appear to vary prosody to distinguish new from given information for their interlocutor at the two-word stage (Chen 2011a), adult-like production and comprehension of the prosody-to-information structure mapping develop very gradually: it is not until the age of 10 or 11 years that children acquire adult-like competence in this domain (Chen 2011a, 2018; Ito 2018).

Why does this mapping take so long to acquire? Chen (2018) proposes that differences in prosodic systems and in how information structure is encoded result in differences in both the rate and the route of acquisition in children acquiring different native languages. Her proposal is based on a review of recent studies on children’s prosodic focus marking in typologically different languages, including Mandarin, Korean, Swedish, Finnish, English, German, and Dutch (Hornby and Hass 1970; Wonnacott and Watson 2008; Sauermann et al. 2011; Arnhold et al. 2016; Romøren 2016; Yang 2017; Chen and Höhle 2018). More specifically, children acquire the use of phonetic means (i.e. the phonetic implementation of phonological categories such as lexical tones in Mandarin, lexical pitch accents in Swedish, and pitch accents in English) to distinguish narrow focus from non-focus and to differentiate focus types (i.e. broad focus, narrow focus, narrow contrastive focus) at an earlier age in a language that relies exclusively on phonetic means for focus marking (e.g. Mandarin) than in a language that uses both phonological and phonetic means (e.g. English, Dutch). Furthermore, children acquire phonological encoding of narrow focus at an earlier age in languages with a more transparent form–function mapping between the phonological means and focus conditions (e.g. Swedish and Korean vs. Dutch). The effect of transparency is also present in the phonological marking of focus in different sentence positions within the same language. For example, Dutch-speaking children acquire phonological focus marking earlier in sentence-initial and sentence-final positions than in sentence-medial position, where the form–function mapping is blurrier. Moreover, children acquire the use of pitch-related cues for focus-marking purposes later than duration cues if pitch is also used for lexical purposes (e.g. Mandarin vs. Dutch). Finally, the relative importance of prosody and word order for focus marking affects children’s use of phonetic means in distinguishing focus types in different word orders. For example, 4- to 5-year-olds acquiring languages that use word order in conjunction with prosody to mark focus (e.g. Finnish) use prosody more extensively and are less restricted by the word order of the sentence than children acquiring languages where prosody plays a primary role in focus marking (e.g. German and Dutch).


With respect to the comprehension of the prosody–information structure interface, a much-studied information-structural condition is focus. Adults take the focus-to-prosody mapping into account in online language comprehension, such that an appropriate focus-to-prosody mapping leads to faster comprehension than an inappropriate one (Birch and Clifton 1995; Cutler et al. 1997). It is widely believed that children aged 4 or 5 are not able to interpret or efficiently use the focus-to-prosody mapping in comprehension, even though they use prosody to realise focus in production (Cruttenden 1985; Cutler and Swinney 1987; Hendriks 2005; Müller et al. 2006). Chen (2010b) reviewed the comprehension studies cited as evidence for children’s failure in comprehension and found that none of them had directly examined children’s comprehension of the focus-to-prosody mapping. Moreover, the test materials used in the earlier comprehension studies were usually syntactically more complex and semantically more demanding than the materials used in related production studies. Controlling for syntactic and task complexity, Chen (2010a) and Szendröi et al. (2018) found that children can process the focus-to-prosody mapping at the age of 4 or 5 years. There are, however, substantial individual differences in the comprehension of the focus-to-prosody mapping in children under the age of 11 years (Chen 2014; Chen and van den Bergh 2020), and children take more time than adults to reach a decision, regardless of focus condition (Chen 2010a).

The other frequently studied information-structural condition is contrast. With the increasing accessibility of eye-tracking techniques, researchers have been able to use the visual-world paradigm (Trueswell and Tanenhaus 2005) to study how children aged 4 to 11 years process prosodic manipulations in a short stretch of material (e.g. the adjective of an adjective-and-noun phrase, the first syllable of a word) to predict the upcoming referent in different languages (Arnold 2008; Sekerina and Trueswell 2012; Ito et al. 2012, 2014). For example, Sekerina and Trueswell (2012) reported a facilitative effect of prominence in the adjective on the detection of the correct target referent in colour adjective-and-noun phrases in 6-year-old Russian-speaking children. Ito and colleagues (2012, 2014) found both facilitative and misleading effects of prominence in the adjective in similar tasks with English- and Japanese-speaking 6- to 11-year-olds. However, children take more time than adults to respond to the prosodic information in the stimuli and are still not as fast as adults at the age of 11. They also need more time than adults to recover from misguided interpretations (Ito 2018). According to Ito (2018), the slow acquisition of contrastive prosody comprehension may be related to underdeveloped executive functions, such as attention allocation and inhibition. To respond quickly to prominence in speech, children need to switch their attention quickly from the previous referent, or from the referent they initially considered, to something new. This is hard to achieve while their executive functions are still developing. This proposal highlights the relation between prosodic development and cognitive development.

39.4.2 Acquisition of prosody and sociopragmatic meanings

The meaning of an utterance goes beyond its information-structural interpretation. In communication, speakers also convey and infer sociopragmatic meanings via prosody and body language, such as emotions, irony, politeness, and epistemic stances (e.g. uncertainty, disbelief) (Barth-Weingarten et al. 2009; Prieto 2015; Brown and Prieto 2017; see also chapter 32). In recent years, researchers have begun to study how children express and understand sociopragmatic meanings via speech prosody and body language in a few languages (see Armstrong and Hübscher 2018 for a review). Complex cognitive abilities are required to infer others’ affective and epistemic stances. It is not until the age of 4 to 5 years that children fully develop the so-called theory of mind (the ability to understand others’ emotions, beliefs, and desires), which enables them to use prosody as a tool to infer these complex meanings (e.g. Kovács et al. 2010; Apperly 2012; Ruffman 2014). Research on children’s use of prosody in expressing and understanding sociopragmatic meanings has focused on 3-year-olds and older children. Studies on Catalan- and Dutch-speaking children show that children use prosody to encode epistemic stances such as uncertainty between the ages of 3 and 5 years, but their use of prosody is not yet adult-like even at the age of 7 or 8. For example, Catalan-speaking children use prosodic cues to express uncertainty at age 3, before they use lexical cues for that purpose (Hübscher et al. 2019). Dutch-speaking 7- to 8-year-olds mostly use delays and high pitch, while adults also use filled pauses, eyebrow movements, and funny faces (Krahmer and Swerts 2005). In speech comprehension, children aged 3 to 5 appear to rely more on prosodic cues than on lexical cues to detect others’ epistemic stance (Hübscher et al. 2017). Similarly, 3-year-old English-speaking children can correctly identify which speaker is requesting an action politely on the basis of the intonation contour (i.e. rising contours for polite requests and falling contours for impolite requests) (Hübscher et al. 2016, 2018).
Children’s interpretation becomes more accurate when they can also access the interlocutor’s facial expression (Armstrong et al. 2018; Hübscher et al. 2017). This early reliance on prosodic and bodily cues to interpret epistemic stances and politeness has led to the suggestion that prosody, along with gestural and bodily marking, can serve a bootstrapping function, helping children to express complex sociopragmatic meanings at young ages (Hübscher and Prieto 2019). However, other work has found partially conflicting evidence showing that the ability to use prosodic cues for other indirect pragmatic inferences emerges at a later age. For example, it is not until the age of 5 years that children explicitly match sad-sounding prosody with unfamiliar broken objects and happy-sounding prosody with unfamiliar nicely decorated objects (Berman et al. 2010, 2013). When prosodic cues conflict with contextual and/or lexical cues, as in irony, children under the age of 10 tend to rely more on contextual and lexical cues than on prosodic cues to reach an ironic interpretation (Gil et al. 2014). However, they give prosodic cues more weight than contextual cues when the prosodic cues are combined with a reinforcing facial expression (e.g. González-Fuente 2017).

In summary, the work reviewed above shows that children appear to learn multiple functions of prosody simultaneously between the ages of 3 and 11. The mastery of adult-like competence in the communicative use of prosody is gradual: it requires a certain level of competence in other cognitive domains, and the form–function mapping is not necessarily transparent in everyday speech. Cross-linguistic differences in the rate and route of acquisition can arise from differences in prosodic systems and in the specific uses of prosody.


39.5 Future research

This brief review of past research on children’s use of prosody in production, perception, and comprehension from infancy to late childhood summarizes fundamental knowledge about prosodic acquisition and points to interesting theoretical insights. It has become apparent from this review that the development of phrase-level prosody takes years, despite early sensitivity to acoustic variation in prosodic features and early language-specific patterns in infants’ pre-lexical vocalizations. However, existing work is almost solely concerned with establishing what children can do at which age in specific languages. The question of which learning mechanisms drive the developmental changes awaits attention in future research. Relatedly, research on the acquisition of prosody for information-structural purposes and illocutionary force has been conducted separately from research on the acquisition of prosody for expressing and interpreting sociopragmatic meanings, as also noted by Esteve-Gibert and Prieto (2018) and Ito (2018). Consequently, we are still far from having a comprehensive picture of children’s use of prosody in communication. A holistic approach in which different functions of prosody are studied in interaction will probably be an interesting and rewarding challenge for future research. Furthermore, in most of the communicative uses of prosody reviewed above, much development appears to take place before the age of 4 years. In general, more research is needed on toddlers’ prosodic development, which calls for new, suitable research paradigms (Chen 2018). Another avenue for further research departs from the observation that children appear to master the intonational aspects of speech prosody earlier than the rhythmic aspects.
An initial study of the alignment of phrasal accent and lexical stress confirms that the intonational and rhythmic systems are in fact still segregated in English-speaking school-aged children’s speech (Shport and Redford 2014). Future work should investigate how the two systems come to be integrated over time in production. Finally, children develop a range of abilities in linguistic and non-linguistic domains during the same period in which their prosodic abilities develop. Research is needed to disentangle how prosodic development interacts with the development of other aspects of language (e.g. vocabulary, syntax) and of other cognitive domains (e.g. perspective-taking abilities, empathy).


chapter 40

Prosodic Bootstrapping

Judit Gervain, Anne Christophe, and Reiko Mazuka

40.1 Introduction

Prosody is the first aspect of language that human infants encounter. Maternal tissues low-pass filter the speech signal delivered to the foetus in the womb at around 400 Hz, suppressing much of the spectral detail necessary to identify individual segments but preserving prosody. Thus, infants’ first linguistic experience largely consists of the rhythm and melody of the language(s) spoken by their mothers before birth (Smith et al. 2003). Throughout early language acquisition, prosody continues to play an important role in scaffolding language learning. Here, we provide an overview of theoretical frameworks and experimental findings showing that young infants are sensitive to different aspects of language prosody and can use them to learn about the lexical and morphosyntactic features of their native language(s).

We first summarize the theory of prosodic bootstrapping and its critiques (§40.2). We then review empirical evidence about how newborns and young infants rely on speech rhythm to identify their native language and discriminate languages (§40.3). Subsequently, we show how early sensitivity to prosody facilitates learning different aspects of the native language (§40.4). We discuss three areas of bootstrapping. First, we present evidence showing that infants can use their knowledge of the typical lexical stress patterns of their native language to constrain the learning of word forms during the first year of life. Second, we discuss findings suggesting that infants use prosody to learn about the basic word order of their native language in the first year of life. Third, we review results indicating that infants can use their sensitivity to prosodic structure to constrain syntactic analysis and the learning of word meanings during the second year of life. Finally, we highlight some open questions (§40.5).

Infant studies use specific methods. To provide a better understanding of the empirical results discussed in the chapter, we briefly introduce the most common ones here. The most typical behavioural methods with infants from about 4 months of age rely on infants’ looking behaviour, with looking time as the dependent variable. Looking-time methods include the head-turn preference procedure, intermodal preferential looking, and the switch/habituation-dishabituation task. Here, looking is interpreted as a proxy for infants’ attention, interest, and/or preference for a visual stimulus or for a sound stimulus associated with a visual stimulus. These methods are suitable for testing spontaneous preference, as well as discrimination or learning, especially when the test is preceded by a learning phase (familiarization or habituation). More recently, brain imaging methods have also been developed for infant testing. The most commonly used ones are electroencephalography (EEG) and near-infrared spectroscopy (NIRS). EEG directly targets neural activation, whereas NIRS, like fMRI (functional magnetic resonance imaging), measures its metabolic correlates. In addition to testing discrimination and learning, these methods also provide evidence about the neural correlates of speech and language processing.

40.2 Prosodic bootstrapping theory

Many lexical and morphosyntactic properties of language are accompanied by characteristic prosodic patterns. The theory of prosodic bootstrapping (Gleitman and Wanner 1982; Morgan and Demuth 1996) holds that young learners can exploit the prosodic cues that are directly available in their input to learn about the perceptually unavailable, abstract lexical and grammatical properties with which those cues are correlated. In English, for instance, bisyllabic nouns (N) and verbs (V) with the same segmental make-up are distinguished by lexical stress: nouns tend to have initial stress, verbs final stress, e.g. record /ˈrekə(r)d/ (N) vs. /riˈko(r)d/ (V) (Cutler and Carter 1987). Knowing this regularity, a learner is able to categorize novel words as nouns or verbs even without knowing their meanings. Experimental findings over the past two decades suggest that infants are indeed able to exploit such correlations to break into the lexicon and grammar of their native language(s), thus alleviating the learning problem they face when confronted with the acquisition of abstract linguistic properties. This body of evidence is presented in detail in the sections below.

Prosodic bootstrapping theory is not without its critics; these criticisms are discussed in the concluding section. Despite these issues, the evidence reviewed below suggests that prosody does play a key role in scaffolding language development.

Note that newborns and young infants have sophisticated auditory and speech perception abilities. For instance, newborns prefer speech over equally complex sine-wave analogues (Vouloumanos and Werker 2004; Vouloumanos et al. 2010), and they can detect the acoustic correlates of word boundaries (Christophe et al. 1994). Two-month-olds categorically discriminate consonants (Eimas et al. 1971). As another example, 4-month-old English-exposed babies recognize their own names (Mandel et al. 1995).
In this review, we only concentrate on the abilities that directly support and relate to prosodic bootstrapping. Young infants’ speech perception abilities are reviewed more broadly elsewhere (Werker and Curtin 2005; Kuhl et al. 2008; Gervain and Mehler 2010; Mazuka 2015).
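The noun–verb stress example in §40.2 can be made concrete with a toy model of how a learner might exploit a cue–category correlation. The sketch below is a hypothetical illustration, not a model proposed in the work reviewed here: it tallies how often each stress position (`"initial"` or `"final"`) co-occurs with each category in a small set of already-categorized bisyllables, then assigns a novel bisyllable to whichever category is more frequent for its stress position. The function names and all counts are invented.

```python
from collections import Counter


def tally(labelled_words):
    """Count co-occurrences of (stress_position, category) pairs."""
    return Counter(labelled_words)


def guess_category(counts, stress_position):
    """Return 'N' or 'V' for a novel bisyllable, or None if the cue is uninformative."""
    nouns = counts[(stress_position, "N")]  # Counter returns 0 for unseen keys
    verbs = counts[(stress_position, "V")]
    if nouns == verbs:
        return None
    return "N" if nouns > verbs else "V"


# Invented input: mostly initial-stress nouns and final-stress verbs.
known = ([("initial", "N")] * 8 + [("final", "N")] * 2
         + [("initial", "V")] * 1 + [("final", "V")] * 5)
counts = tally(known)

print(guess_category(counts, "initial"))  # N
print(guess_category(counts, "final"))    # V
```

The point of the sketch is only that a perceptible cue (stress position) can stand in for an abstract category once the correlation has been registered; real learners face noisier input and many more cues.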


40.3 Newborns’ sensitivity to prosody as a foundation for prosodic bootstrapping

Infants show sensitivity to speech prosody from birth. This capacity derives from two sources: some abilities to perceive prosody result from prenatal experience with the native language(s), while others are universal, broadly based, and hard-wired in the mammalian auditory system. Although not yet mature, hearing is operational from the third trimester of gestation (Eggermont and Moore 2012). Foetuses thus experience speech. However, speech perceived in the womb differs from normal speech, as maternal tissues preserve speech prosody but suppress segmental detail (Querleu et al. 1988; Gerhardt et al. 1992; Lecanuet et al. 1992; Lecanuet and Granier-Deferre 1993; Griffiths et al. 1994; Sohmer et al. 2001). Human infants thus start their journey into language by experiencing speech prosody before birth. Indeed, infants learn from this prenatal experience: newborns recognize various linguistic stimuli heard prenatally (DeCasper and Fifer 1980; Bushnell et al. 1989; Moon and Fifer 2000; Sai 2005; Kisilevsky et al. 2009). Most importantly from the perspective of prosodic bootstrapping, they recognize and prefer their native language over other, rhythmically different languages (Mehler et al. 1988; Moon et al. 1993), even when they were exposed to two languages prenatally (Byers-Heinlein et al. 2010). Prenatal experience thus helps infants identify, and draws their attention to, the language(s) they need to acquire, and informs them about a highly relevant prosodic property of their native language: its rhythm.

However, the impact of prenatal prosodic experience goes well beyond simply highlighting the native language. Newborns show specific and detailed knowledge of the prosodic patterns of the languages they were exposed to in utero.
For instance, they show adult-like prosodic grouping preferences predicted by the Iambic-Trochaic Law (ITL), but only for the acoustic cues that mark prosodic prominence in the languages heard prenatally (Abboub et al. 2016b). Thus, prenatally French-exposed newborns show a prominence-final (i.e. iambic) preference for pure-tone pairs that contrast in duration, duration being the acoustic marker that carries prominence in prosodic phrases in French, but no preference for pure-tone pairs that contrast in pitch or intensity. In contrast, newborns whose mothers spoke French and another language that mainly relies on pitch contrasts in its prosody show a prominence-initial (i.e. trochaic) grouping preference for pure-tone pairs differing in pitch. These grouping preferences conform to the ITL (Hayes 1995; Nespor et al. 2008), which holds that sound sequences contrasting in duration are naturally perceived iambically, while sound sequences that contrast in pitch or intensity are typically grouped trochaically. These early preferences also show the impact of in utero learning, as newborns only exhibit grouping preferences for the prenatally experienced acoustic cues. The postnatal development of the ITL bias is discussed in §40.4.1.

An even more striking demonstration of the influence of prenatal experience comes from newborns’ productions (Mampe et al. 2009), although these findings have recently received some methodological criticism (Gustafson et al. 2017). German and French newborns’ communicative cries differ in their pitch and intensity contours in the same way as German and French intonational contours differ. German newborns produce falling cry contours with an initial peak in pitch and intensity, while French newborns produce rising contours with final pitch and intensity peaks, matching the prominence-initial versus prominence-final nature of German and French prosody, respectively. These results suggest that newborns are familiar from birth with the prosodic patterns of the largest linguistic units in their native language, namely intonational and phonological phrases (Nespor and Vogel 1986).

Importantly, not all prosodic knowledge is learned from prenatal experience. Newborns also possess auditory and speech-processing skills that are universal and broadly based, possibly rooted in more general properties of the mammalian auditory system. They are thus able to discriminate not only their native language, with which they are familiar, from other languages, but also unfamiliar languages they have never heard before, as long as those languages are rhythmically different (Nazzi et al. 1998a; Ramus et al. 2000). For instance, French-exposed newborns are able to discriminate between Japanese and Russian, but not between Dutch and English or between Italian and Spanish. In this context, speech rhythm is operationally defined in terms of the proportion of vocalic (%V) and consonantal (%C) intervals in the speech signal, as well as the variability of these intervals (Ramus et al. 1999). Following this definition, Russian, English, and Dutch can be considered stress-timed languages, while French, Italian, and Spanish are syllable-timed. Japanese is mora-timed. For a more detailed discussion of the rhythmic classes of languages, the reader is referred to chapter 11 as well as Grabe and Low (2002), Dellwo (2006), Wiget et al. (2010), and Loukina et al. (2011).
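The operational definition of speech rhythm attributed to Ramus et al. (1999) can be illustrated with a short sketch. It assumes that an utterance has already been segmented into consonantal and vocalic intervals (the hard part in practice, not shown here), and it adds the two variability measures ΔV and ΔC, the standard deviations of vocalic and consonantal interval durations that Ramus et al. used alongside %V. The function name `rhythm_metrics`, the choice of the population standard deviation, and the toy durations are our own illustrative assumptions.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute %V, %C, deltaV and deltaC from labelled interval durations.

    intervals: sequence of (label, duration) pairs, with label "V" for a
    vocalic interval and "C" for a consonantal one; durations in one unit (ms).
    """
    vowels, consonants = [], []
    for label, duration in intervals:
        if label == "V":
            vowels.append(duration)
        elif label == "C":
            consonants.append(duration)
        else:
            raise ValueError(f"unknown interval label: {label!r}")
    if not vowels or not consonants:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vowels) + sum(consonants)
    percent_v = 100 * sum(vowels) / total
    return {
        "%V": percent_v,               # share of total duration that is vocalic
        "%C": 100 - percent_v,         # share that is consonantal
        "deltaV": pstdev(vowels),      # SD of vocalic interval durations
        "deltaC": pstdev(consonants),  # SD of consonantal interval durations
    }


# Invented toy utterance: C V C V, durations in ms.
print(rhythm_metrics([("C", 80), ("V", 120), ("C", 160), ("V", 40)]))
# {'%V': 40.0, '%C': 60.0, 'deltaV': 40.0, 'deltaC': 40.0}
```

In Ramus et al.’s data, stress-timed languages such as English and Dutch were reported to show lower %V and higher ΔC than syllable-timed or mora-timed languages, which is the contrast newborns are thought to pick up on when discriminating rhythm classes.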
One important contribution of rhythm-based discrimination to language acquisition is that it provides multilingual infants with a cue to the presence of several languages in their environment. This seems to be a non-language-specific auditory ability, as it is shared with non-human primates (Ramus et al. 2000). Because it requires no language experience or language-specific knowledge, this ability is well suited to helping multilingual infants recognize that multiple languages are present and keep them apart, as long as the languages are rhythmically different. Language discrimination in infants exposed to two rhythmically similar languages has not been shown before 3.5–4 months (Bosch and Sebastián-Gallés 1997; Molnar et al. 2013) and is believed to rely on cues that are language-specific and primarily non-prosodic in nature (phonotactic regularities, phoneme repertoire, etc.).

Newborns also have universal perceptual abilities that allow them to process word-level phonological information. They are able to discriminate between functors and content words, at least when these words are presented in isolation, outside their sentential context, on the basis of their acoustic/phonological differences (Shi et al. 1999), that is, the phonological ‘minimality’ of functors, such as shorter duration, frequent lack of lexical stress, and simple syllable structure (Morgan et al. 1996). Interestingly, when functors and content words are combined into utterances with intonational phrase contours characteristic of the native language, newborns can no longer individuate them, as they fail to detect changes in the order of functors and content words within utterances (Benavides-Varela and Gervain 2017), suggesting that larger prosodic units might outweigh smaller ones at the very beginning of language development. Newborns are nevertheless sensitive to lexical stress when words with opposite lexical stress patterns, presented in isolation, are contrasted (Sansavini et al. 1997). Indeed, they can discriminate iambically and trochaically stressed bi- and trisyllabic words from one another, irrespective of whether the words are made up of identical or different phonemes.

40.4 How early sensitivity to prosody facilitates language learning

Experimental studies with infants and toddlers have begun to shed light on the specific ways in which this initial sensitivity to prosody facilitates the early stages of language learning.

40.4.1 Prosodic grouping biases and the Iambic-Trochaic Law

As described above, one important principle of prosodic grouping is the ITL (Hayes 1995), which states that sound sequences contrasting in duration are naturally perceived iambically, whereas sound sequences that contrast in pitch or intensity are perceived trochaically. The ITL has recently received considerable attention in the language acquisition literature, as it is believed to be one of the key prosodic bootstrapping mechanisms operational early in infancy. While considerable advances have been made in our understanding of how the ITL might help infants to learn language, as discussed in several sections of this chapter, several important issues are still hotly debated.

One question concerns the phonological domains to which the ITL applies. Many studies on the ITL do not directly address this question but assume, more or less implicitly, that it applies at the word level, and thus to lexical stress, without denying that it may also apply at other levels of the prosodic hierarchy (e.g. Höhle et al. 2009; Hay and Saffran 2011). However, some recent work (Nespor et al. 2008; Langus et al. 2016) explicitly claims that the ITL applies to feet and especially to prosodic phrases, but not to words, because feet and phonological phrases contain an alternation of contrasting strong and weak elements (syllables and words, respectively), while words only have a single primary lexical stress (i.e. a single stressed syllable), irrespective of their length (e.g. in the seven-syllable word demineraliˈzation there is still only one stressed syllable). This creates an irregular distribution of stressed syllables at the word level, rather than an alternation. No experimental study has directly tested at what level(s) infants represent prosodic grouping biases, but indirect evidence comes from studies that have found language-specific influences on grouping biases.
Some existing studies have found that the properties of the phonological phrase in a given language influence adult (Iversen et al. 2008) and infant (Yoshida et al. 2010; Gervain and Werker 2013; Molnar et al. 2014) participants’ grouping preferences. For instance, 9-month-old infants exposed to an object-verb (OV) language such as Japanese or Korean, and a VO language such as English, can use the different prosodic phrase patterns associated with the different word orders (OV languages: trochaic, i.e. prominence-initial, phonological phrases vs. VO languages: iambic, i.e. prominence-final, phonological phrases) to select the relevant word order (see §40.4.3 for details).
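To make the ITL’s two predictions concrete, the toy sketch below groups an alternating sequence of tones into pairs: when the tones contrast in duration it forms weak-strong (iambic) pairs, and when they contrast in pitch or intensity it forms strong-weak (trochaic) pairs. This is a hypothetical illustration, not a model from the studies reviewed here; the function name `itl_grouping`, the midpoint threshold for ‘prominent’, and the strict left-to-right pairwise grouping are simplifying assumptions, and the tone values are invented.

```python
def itl_grouping(values, cue):
    """Pair up tones in an alternating sequence as the Iambic-Trochaic Law predicts.

    values: one number per tone on the contrasting dimension
            (e.g. duration in ms, pitch in Hz, intensity in dB).
    cue: "duration" (iambic, weak-strong pairs) or
         "pitch"/"intensity" (trochaic, strong-weak pairs).
    Returns a list of (i, j) index pairs; tones that fit no pair stay ungrouped.
    """
    if cue not in ("duration", "pitch", "intensity"):
        raise ValueError(f"unsupported cue: {cue!r}")
    # Simplifying assumption: a tone is prominent if it lies above the midpoint
    # between the smallest and largest value in the sequence.
    midpoint = (min(values) + max(values)) / 2
    prominent = [v > midpoint for v in values]
    iambic = cue == "duration"

    pairs = []
    i = 0
    while i + 1 < len(values):
        first, second = prominent[i], prominent[i + 1]
        if (iambic and not first and second) or (not iambic and first and not second):
            pairs.append((i, i + 1))
            i += 2  # both tones are now grouped
        else:
            i += 1  # leave this tone ungrouped and try the next position
    return pairs


# Invented sequences: short-long durations, and low-high pitches.
print(itl_grouping([100, 200, 100, 200], "duration"))  # [(0, 1), (2, 3)]
print(itl_grouping([200, 300, 200, 300, 200], "pitch"))  # [(1, 2), (3, 4)]
```

The second example shows the trochaic prediction at work: even though the sequence begins with a low tone, the pairs are anchored on the high (prominent) tones, leaving the initial tone ungrouped.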

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

568 JUDIT GERVAIN, ANNE CHRISTOPHE, AND REIKO MAZUKA

Evidence for language-specific modulation deriving from prosodic properties at the word level is scarcer. In adults, some studies have observed language-specific effects at the word level, especially with linguistically complex stimuli (Bhatara et al. 2013), while others have not (Hay and Diehl 2007). In infants, one study comparing French and German babies found no differences between the performance of the two groups (Abboub et al. 2016a). It thus remains an open question whether the representations of lexical stress patterns are governed by rhythmic grouping biases such as the ITL. This is not to deny, of course, that infants learn the lexical stress patterns of their native language early and use them as segmentation cues.

A second open question regarding grouping biases is whether they are entirely experience-based or at least partly driven by biologically endowed perceptual mechanisms. As discussed above, the newborn brain shows preferential processing of both iambic and trochaic sequences for the acoustic cues that carry prosodic prominence in the language(s) experienced prenatally (Abboub et al. 2016b). These results suggest that the emergence of prosodic grouping biases is based on language experience. However, the behavioural studies conducted with older infants as well as with non-linguistic animals paint a somewhat different picture: they suggest the primacy of the trochaic pattern. First, a trochaic bias has systematically been observed in infants whose native language has a trochaic lexical stress pattern, at ages at which infants learning an iambic language do not (yet) show a preference. For instance, German-exposed infants show a trochaic bias at 5–6 months (Weber et al. 2004; Höhle et al. 2009) and English-exposed infants between 6 and 9 months (Jusczyk et al. 1993; Curtin et al. 2005), but French infants do not show an iambic bias at the same age (Höhle et al. 2009), nor do Japanese infants (Yoshida et al. 2010; Hayashi and Mazuka 2017). Second, in languages in which the predominant pattern is trochaic but iambic word forms also exist, such as English, the representation of the iambic pattern emerges later than that of the trochaic pattern, irrespective of whether naturalistic stimuli are used (Jusczyk et al. 1999) or whether individual acoustic cues are used to preferentially trigger different groupings (Hay and Saffran 2011). Third, preference for a trochaic grouping has been observed in infants exposed to languages that do not have trochaic lexical or phrasal level prominence. Thus, 7-month-old Italian infants show a trochaic grouping preference for pitch contrasts but no iambic preference for durational contrasts (Bion et al. 2011), whereas Italian has penultimate stress at the lexical level and iambic prosodic prominence at the phrasal level. Similarly, French infants more readily group sequences of speech sounds contrasting in pitch into trochees than sequences contrasting in duration into iambs (Abboub et al. 2016a). Fourth, non-linguistic animals show a trochaic preference for sounds contrasting in pitch but have no preference for sounds contrasting in duration (de la Mora et al. 2013; Spierings et al. 2017). The existing empirical evidence thus shows a complex pattern, possibly suggesting an innate perceptual bias for the trochaic pattern based on pitch, as well as language-specific influences modulating both pitch perception (Butler et al. 2016) and the preference for iambs based on duration perception, but further research is needed to fully understand the origins of the ITL.

40.4.2 How lexical stress helps infants to learn words

Newborns already have some knowledge of the prosodic patterns of their native language(s) at birth. After a few months of experience with their mother tongue(s), this knowledge is further refined, and the infant starts to attune to his or her native language, losing the ability to discriminate most non-native phonological contrasts but becoming more efficient at native discriminations. One important achievement is infants’ growing familiarity with the typical lexical stress patterns in their native vocabulary. This knowledge is a powerful bootstrapping tool that enables infants to better segment the continuous speech stream into words. In languages in which a predominant or fixed lexical stress pattern exists, the position of the stressed syllable can serve as a cue to word boundaries. In English, for instance, in which bisyllabic nouns are typically trochaic (i.e. stress-initial), placing word boundaries before stressed syllables is a successful heuristic. This stress-based segmentation mechanism, also believed to underlie speech perception in adults, is known in the literature as the Metrical Segmentation Strategy (Cutler and Carter 1987; Cutler 1994). In language acquisition, this hypothesis is strengthened by the fact that infants start recognizing familiar words in continuous passages between 6 and 7.5 months (Jusczyk and Aslin 1995) and start attaching meaning to them between 6 and 9 months (Bergelson and Swingley 2012), just when they show the first evidence that they recognize the predominant lexical stress patterns of their native language. American English-exposed infants have been shown to develop sensitivity to the stress patterns of English (a predominance of trochaic words in the initial child-directed vocabulary, e.g. in bi- and trisyllabic nouns) between 6 and 9 months, showing a preference for trochaic over iambic words (Jusczyk et al. 1993; Morgan and Saffran 1995; Morgan 1996). When the target word is aligned with an utterance edge, they succeed in segmenting it as early as 6 months (Johnson et al. 2014). Around this age, they also start to use stress as a cue for segmentation: at 7.5 months, American English-learning infants recognize familiar trochaic words in continuous passages.
Specifically, when familiarized with trochaic English words (e.g. ˈdoctor, ˈcandle), 7.5-month-olds prefer passages containing these words over passages that do not contain them (Jusczyk et al. 1999). This preference is specific to the trochaic word form and not simply to the stressed syllable, because passages containing only the first strong syllables of the words (e.g. dock, can) do not give rise to a similar preference. Moreover, by this age, English-learning infants use language-specific stress cues to segment words from the ongoing speech stream. When presented with a continuous stream of CV syllables in which every third syllable was stressed, 7- and 9-month-olds treated as familiar only those trisyllabic sequences that had initial stress (Curtin et al. 2005). Infants showed no recognition of trisyllabic sequences that were not trochaic (i.e. had stress on the middle or final syllable). The Metrical Segmentation Strategy also predicts that weak–strong (i.e. iambic) words (e.g. guiˈtar) might initially be missegmented, which turns out to be the case. Iambic words are correctly segmented out from continuous passages only at 10.5 months, that is, 3 months later than trochaic ones (Jusczyk et al. 1999). Interestingly, infants exposed to British English do not show evidence of segmentation in the same paradigm before 10.5 months, and only succeed on trochaic words at this age (Mason-Apps et al. 2011; Floccia et al. 2016). The authors explain this difference between American and British English with reference to the more strongly intonated, more infant-directed nature of the American English stimuli. A similar difference has been observed between European French and Canadian French.
While Canadian French infants can segment words from passages at 8 months, both from Canadian French and from European French (Polka and Sundara 2012), European French infants can only segment at this age under appropriate task conditions and only from their native dialect; segmentation from Canadian French has a cost (Nazzi et al. 2006, 2014).
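The Metrical Segmentation Strategy described above amounts to a simple rule: start a new word at every stressed syllable. The sketch below applies that rule to a syllable stream represented as (syllable, is_stressed) pairs. The representation and the function name `metrical_segmentation` are illustrative assumptions, and the sketch deliberately ignores every other cue that infants are known to use. It reproduces the pattern described in the text: trochaic words are recovered, while an iambic word such as guitar is split and its stressed syllable is grouped with the material that follows.

```python
from typing import List, Tuple


def metrical_segmentation(stream: List[Tuple[str, bool]]) -> List[str]:
    """Segment a syllable stream by opening a new word at each stressed syllable.

    stream: (syllable, is_stressed) pairs in order of occurrence.
    Unstressed syllables are attached to the word currently being built;
    unstressed syllables before the first stress form a chunk of their own.
    """
    words: List[List[str]] = []
    for syllable, stressed in stream:
        if stressed or not words:
            # Start a new word: at a stressed syllable, or at the very first syllable.
            words.append([syllable])
        else:
            # A weak syllable continues the current word.
            words[-1].append(syllable)
    return ["".join(word) for word in words]


# Trochaic words (doctor, candle) are recovered correctly.
print(metrical_segmentation([("doc", True), ("tor", False), ("can", True), ("dle", False)]))
# ['doctor', 'candle']

# An iambic word (guitar) is missegmented: 'tar' is grouped with the following 'is'.
print(metrical_segmentation([("the", False), ("gui", False), ("tar", True), ("is", False)]))
# ['thegui', 'taris']
```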

Infants exposed to German, a language in which trochaic words are also predominant, start to show a behavioural preference for trochaic forms (i.e. evidence for the iambic/trochaic discrimination) at 6 months but not yet at 4 months (Höhle et al. 2009). EEG results show a similar pattern: 5-month-old, but not 4-month-old, German-exposed infants showed reliable discrimination of the deviant trochaic forms when the iambic form was the standard, but not the other way around (Weber et al. 2004). By contrast, infants exposed to French, a language in which lexical stress is not marked but an iambic pattern exists at the level of the clitic group or the phonological phrase, showed no preference between trochaic and iambic forms at 6 months, although they could discriminate them (Höhle et al. 2009). More generally, it is still debated whether these iambic–trochaic preferences (i.e. the ITL) arise as a result of experience with the native language or whether they derive from a universal auditory bias, as discussed in greater detail in §40.4.1. In Spanish, where stress assignment is variable, with a slight predominance of trochaic forms and especially a tight link between heavy syllables and stress, 9-month-olds show a trochaic or an iambic preference according to where the heavy syllable is located in nonwords, but have no lexical stress preference for nonwords with two light syllables (Pons and Bosch 2010). Taken together, these results suggest that during the second half of the first year of life, infants are able to use lexical stress to segment the continuous speech stream into words, an essential step of word learning.

40.4.3 How prosody bootstraps basic word order

One important contribution of prosodic grouping at the level of the phonological phrase is that it correlates with the basic word order of languages, and it may thus serve as a potential bootstrapping cue to this fundamental syntactic feature of the native grammar. Specifically, the position as well as the acoustic realization of phrase-level prosodic prominence covaries with word order (Nespor et al. 2008; Gervain and Werker 2013). In Head-Complement or functor-initial languages, such as English or Italian, prosodic prominence in phonological phrases, which falls on the Complement, is phrase-final (i.e. iambic) and is realized as a durational contrast, that is, as the lengthening of the stressed vowel of the Complement (e.g. in Ro:me). By contrast, in Complement-Head or functor-final languages, such as Japanese, Turkish, or Basque, the prominence is initial (i.e. trochaic) and is realized as increased pitch or intensity (e.g. Japanese: ˈTokyo ni ‘to Tokyo’). While other cues may accompany prominence in any language, pitch or intensity serves as the contrastive cue in OV, functor-final languages, whereas duration plays this role in VO, functor-initial languages. Infants as young as 8–9 months of age can align phrasal prosody with the underlying syntactic pattern within phrases, as they expect functors to be non-prominent and content words to be prominent (Bernard and Gervain 2012). Even more importantly, 7-month-old bilinguals exposed to a functor-initial and a functor-final language use the different prosodic realizations to select the relevant word order (Gervain and Werker 2013). Upon hearing a durational contrast, they select sequences with a functor-initial order, while, when presented with a pitch contrast, they prefer functor-final sequences.
This is strong evidence that infants start using prosody to bootstrap syntax even before they have a sizeable lexicon, suggesting that they set abstract syntactic parameters rather than memorize or rote-learn lexical patterns or item-based expressions. In this regard, the role of the ITL is particularly relevant. As mentioned before, newborns already show familiarity with the predominant iambic or trochaic prosodic patterns of their native language. This knowledge may guide young infants from very early on in how they segment and parse the language input, and allow them to determine basic properties of their native grammar, such as its word order. Indeed, correlations between prosody and morphosyntax might allow infants to directly access abstract linguistic knowledge, which might then help them to better parse the input and learn lexical items, rather than the other way around. For instance, an infant expecting a functor–content word order on the basis of prosody will be able to directly assign the correct lexical category to the novel words he or she encounters in an input sentence.

40.4.4 How prosody constrains syntactic analysis

Since prosodic structure reflects syntactic structure to some extent, infants and young children could also exploit phrasal prosody to obtain information about the syntactic structure of utterances even beyond basic word order (Morgan 1986; Morgan and Demuth 1996; Christophe et al. 2008, 2016; Hawthorne and Gerken 2014; Hawthorne et al. 2016). The boundaries of large and intermediate prosodic units are aligned with syntactic constituent boundaries (Nespor and Vogel 1986; Selkirk 1984), and adults exploit these prosodic boundaries to constrain their syntactic analysis (e.g. Millotte et al. 2007, 2008; Kjelgaard and Speer 1999). Infants perceive intonational phrase boundaries from 5 months of age (Hirsh-Pasek et al. 1987; Seidl 2007; Männel and Friederici 2009) and intermediate prosodic boundaries from 9 months of age (Gerken et al. 1994; Shukla et al. 2011); in addition, infants’ ability to memorize well-formed prosodic units and recognize them in fluent speech suggests that they use phrasal prosody to parse continuous speech (Nazzi et al. 2000b); finally, infants exploit these boundaries to constrain lexical access by 10 months of age (Gout et al. 2004; Johnson 2008; Millotte et al. 2010). Thus, phrasal prosody might give infants access to some information about the syntactic structure of sentences, which may help them to figure out the meaning of some words, as proposed by Gleitman in her ‘syntactic bootstrapping’ hypothesis (Gleitman 1990; Yuan and Fisher 2009). For instance, a verb used in a transitive construction, such as the boy is daxing the girl, is more likely to refer to a causative action. While phrasal prosody signals some syntactic constituent boundaries, prosody per se gives no information as to the syntactic nature of these constituents.
To this end, infants may rely on function words and morphemes (grammatical elements such as articles, pronouns, auxiliaries, and inflectional affixes), which can be discovered relatively early because they are extremely frequent syllables generally appearing at the boundaries of prosodic units (Shi et al. 1998). Infants younger than 1 year of age notice when the function words of their native language are replaced by nonsense syllables (see Shi 2014 for a review), and by 18 months they have encoded regularities between function words and content words (e.g. they expect a noun after a determiner and a verb after a personal pronoun; Höhle et al. 2004; Kedar et al. 2006; Shi and Melançon 2010; Cauvet et al. 2014). Thus, phrasal prosody and function words, taken together, may allow children to build an approximate syntactic structure of sentences in which phrasal prosody delimits units, and function words and morphemes supply the syntactic labels of each constituent (e.g. noun phrases typically contain determiners, while auxiliaries signal verb phrases). This approximate syntactic representation may be available to infants even without their having access to the content words making up the sentence (Christophe et al. 2016). Although it is very crude, it may be sufficient to support the acquisition of word meanings; for instance, a novel word appearing in the position of a noun (e.g. it’s a doke!) is likely to refer to an object, while a novel word appearing in a verb position (e.g. it’s doking!) is likely to refer to an action (Bernal et al. 2007; Waxman et al. 2009; He and Lidz 2017). The literature on children’s use of phrasal prosody to constrain syntactic analysis shows mixed results, with a number of experiments suggesting that 5-year-olds are much less skilled than adults in exploiting prosody to disambiguate sentences such as Can you touch the frog with the feather?, in which the prepositional phrase with the feather can modify either the noun frog or the verb touch (Choi and Mazuka 2003; Snedeker and Yuan 2008). However, in these cases the default prosodic structure of the sentence (generated from the syntactic structure) is the same for the two underlying syntactic structures, with three intermediate prosodic units, as in [can you touch] [the frog] [with the feather], so that disambiguation through prosody can be achieved only if the speaker is aware of the ambiguity and intentionally exaggerates one of the prosodic breaks (Millotte et al. 2007; Snedeker and Yuan 2008). To directly test the impact of phrasal prosody on syntactic analysis, it is best to select a case where the default prosodic structure differs between the two interpretations.
Noun/verb homophones provide such a test case: fly can be either a noun, as in [The baby fliesN]NP [hide in the shadows], or a verb, as in [The baby]NP [fliesV his kite]VP. In these sentences, the prosodic boundary falls after the ambiguous word when it is a noun and before it when it is a verb. When listening to the beginnings of such sentences (with the end of the sentence being masked by babble noise), infants as young as 20 months of age are able to exploit the prosodic information to reach the correct interpretation, and direct their gaze to the picture depicting the intended meaning (in English: de Carvalho et al. 2016b; in French: de Carvalho et al. 2016a, 2017). If phrasal prosody is to help bootstrap language acquisition, toddlers should also be able to exploit the syntactic structure they computed through phrasal prosody in order to guess the meaning of a novel word. To test this, 18-month-olds were exposed to sentences that featured novel words and differed only in their syntactic/prosodic structure, as in [Regarde la petite bamoule!] ‘Look at the little bamoule!’, where bamoule is a noun (and probably names an object), versus [Regarde], [la petite] [bamoule!] ‘Look, the little one is bamouling’, where bamoule is a verb (and probably names an action). In a habituation-switch paradigm, toddlers exposed to these sentences while watching a video of an animal performing a self-generated action mapped the novel word to the animal when it appeared in a noun position, and to the action when it appeared in a verb position (de Carvalho et al. 2015). Thus, 18-month-olds already have a fine-grained knowledge of the contexts in which nouns and verbs are supposed to occur, with the context including not only neighbouring words but also the position within the prosodic structure (see also Massicotte-Laforge and Shi 2015), and they can rely on this knowledge to infer the probable meaning of a novel word that they have not encountered before.
To learn which contexts correspond to which kinds of words, they might generalize from words they already know (Bergelson and Swingley 2012, 2013) and infer that words occurring in similar contexts share the same properties as known words (Gutman et al. 2015; Christodoulopoulos et al. 2016).
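The noun/verb homophone sentences discussed above rely on a positional regularity: the prosodic boundary follows the ambiguous word when it is a noun and precedes it when it is a verb. The toy function below encodes only that regularity, taking an utterance already divided into prosodic units. It is an illustrative sketch with a hypothetical name (`guess_category`), not a model of how infants process such sentences, and it returns "unknown" whenever the word is not adjacent to an utterance-internal boundary.

```python
from typing import List


def guess_category(units: List[List[str]], target: str) -> str:
    """Guess whether an ambiguous word is a noun or a verb from prosodic boundaries.

    units:  the utterance split into prosodic units, each a list of words.
    target: the ambiguous word (assumed to occur only once).
    """
    for u, unit in enumerate(units):
        if target not in unit:
            continue
        position = unit.index(target)
        if position == len(unit) - 1 and u < len(units) - 1:
            return "noun"  # boundary right after the word: [the baby flies] [hide ...]
        if position == 0 and u > 0:
            return "verb"  # boundary right before the word: [the baby] [flies his kite]
        return "unknown"  # no adjacent internal boundary to exploit
    raise ValueError(f"{target!r} does not occur in the utterance")


noun_reading = [["the", "baby", "flies"], ["hide", "in", "the", "shadows"]]
verb_reading = [["the", "baby"], ["flies", "his", "kite"]]
print(guess_category(noun_reading, "flies"))  # noun
print(guess_category(verb_reading, "flies"))  # verb
```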

40.5 Conclusion and perspectives

Prosody is the overarching organizing principle of the speech signal. Its structure and hierarchical organization also bear important similarities with morphosyntactic and lexical features. Infants have been shown to exploit these correlations, using the perceptually available cues in the speech signal to learn about the abstract grammatical and lexical properties correlated with them. While research on prosodic bootstrapping continues to progress, several questions have so far remained unanswered (e.g. Höhle 2009), and some tenets of the proposal have even been challenged (e.g. Fernald and McRoberts 1996). One open issue is how infants can identify and rely on the relevant input cues when languages vary as to which cues are most informative. Furthermore, the correlations between prosody and grammar/vocabulary are often imperfect, showing exceptions within a language and often not applying to all the languages of the world (Mazuka 2007). Typological variability across languages clearly affects the applicability of bootstrapping mechanisms. The lexical stress difference between nouns and verbs applies in English, but not in most other languages. So infants will rely on different bootstrapping mechanisms to acquire different languages, but all languages that have been investigated offer some prosody–grammar/vocabulary correlations that infants can exploit. Which correlations are most useful is an empirical question that merits further research. Relatedly, when an infant is growing up with two typologically different systems, as is the case with certain bilingual infants (e.g. English–Japanese bilinguals), more information is needed to understand how they switch between the relevant bootstrapping mechanisms. Similarly, the question of how infants may integrate the different prosodic cues with one another and with other cues (e.g. statistics, phonotactics) has received relatively little attention so far (e.g. Thiessen and Saffran 2003; Johnson and Seidl 2009) and requires further research. Finally and more generally, what is the origin of the correlations between prosody and grammar/vocabulary, and how do young infants know about them (implicitly, of course)? For a prosodic cue to be useful for bootstrapping, the learner must know in the first place with which grammatical or lexical feature it correlates and how. Where this knowledge comes from might vary from one correlation to another. In certain cases, such as the English lexical category distinction mentioned in §40.2, knowledge of at least a few exemplars is necessary for the infant to generalize the pattern. In other cases, such as the ITL (Hayes 1995), language might have evolved to recruit existing biases of the mammalian auditory system (e.g. Lewicki 2002); thus the perceptual mechanisms underlying the correlations might be automatic and present from very early on, making them ideal candidates for early bootstrapping. Despite these open questions, existing empirical evidence suggests that prosodic bootstrapping is a powerful heuristic learning mechanism available to infants from the very beginning of language acquisition.

chapter 41

Prosody in Infant- and Child-Directed Speech

Melanie Soderstrom and Heather Bortfeld

41.1 Introduction

It has long been recognized that adults speak differently to children than they do to other adults, across cultures and languages (Casagrande 1948; Ferguson 1964; Bynon 1968). The characteristics of this ‘infant-directed’ or ‘child-directed’ speech register (henceforth collectively CDS) include a wide variety of prosodic, lexical, syntactic, and phonological changes relative to speech directed at other adults (Soderstrom 2007). Alongside the prosodic characteristics described in §41.2, spoken CDS includes robust shifts in vocal timbre (Piazza et al. 2017), a specialized lexicon (Phillips 1973; Mervis and Mervis 1982), reduced utterance length (Phillips 1973), and particular phonological characteristics (Kuhl et al. 1997; though see e.g. McMurray et al. 2013). The exaggerated features of CDS extend to visual aspects of spoken language, including mouth movements (Green et al. 2010) and head motion (Smith and Strader 2014). Finally, there is evidence of gestural exaggerations in signed languages as well (Masataka 1992). Infants’ preference for speech with the characteristics of CDS over speech with those of adult-directed speech (ADS) has also been robustly documented, particularly in young infants. A meta-analysis found a Cohen’s d effect size of .67 for this preference, with larger effect sizes for studies using naturalistic speech samples (Dunst et al. 2012). Indeed, preference for CDS was recently selected as the topic of a large-scale cross-laboratory replication project, in no small part because of the robustness of this preference (Frank et al. 2017; ManyBabies Consortium 2020). In both theoretical and empirical work, CDS has been identified as having an important influence on language development by drawing attention to the linguistic signal, communicating positive affect, and highlighting specific linguistic elements by simplifying the overall signal. In this chapter, we first review some of the primary prosodic characteristics of CDS (§41.2).
We then discuss sources of variation across culture and context (§41.3 and §41.4). Next, we examine the function of CDS for social and linguistic development (§41.5). We conclude with some thoughts about the future of this research (§41.6).
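For readers less familiar with the effect size reported by Dunst et al. (2012) above, Cohen’s d expresses the difference between two means in standard-deviation units. One common (between-groups) form is shown below; meta-analyses of infant preference studies may compute a variant suited to within-subject designs, so this is a general reference rather than the exact computation used in that meta-analysis.

$$
d = \frac{\bar{x}_{\text{CDS}} - \bar{x}_{\text{ADS}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_{\text{CDS}} - 1)\,s_{\text{CDS}}^{2} + (n_{\text{ADS}} - 1)\,s_{\text{ADS}}^{2}}{n_{\text{CDS}} + n_{\text{ADS}} - 2}}
$$

On this scale, d = .67 means that the average preference measure for CDS exceeded that for ADS by roughly two-thirds of a pooled standard deviation.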

41.2 Primary prosodic characteristics of infant- and child-directed speech

The most salient and stable prosodic characteristic of CDS is a raised mean fundamental frequency (f0) (e.g. Fernald and Simon 1984; Fernald et al. 1989; see Figure 41.1). Another fundamental characteristic of CDS is variation in f0. This has been variously described as expanded pitch contours or increased pitch range (Fernald and Simon 1984), or as increased overall variability (Fernald et al. 1989). Differences across studies in how this pitch variation is characterized can make findings difficult to compare. However the variation is characterized, f0 features appear to be a primary driving force behind infant preference for CDS, at least in early infancy (Fernald and Kuhl 1987). CDS is also characterized by significant rhythmic differences that include not only shorter utterances, slower speech rate, and longer pauses (e.g. Fernald et al. 1989; Soderstrom et al. 2008) but also differing and exaggerated prosodic stress and syllable-lengthening effects compared with ADS. While there is a general consensus that lengthening effects exist, a complex picture emerges with respect to how lengthening is instantiated across sentence positions and stress contexts. For example, in a task where mothers were instructed to teach their 6- and 8-month-old infants multisyllabic words, Albin and Echols (1996) found evidence of greater sentence-final lengthening in CDS than ADS, as well as word-final lengthening effects in CDS, for both stressed and unstressed syllables. Bernstein Ratner (1986) similarly found increased sentence-final lengthening effects for CDS compared with ADS in mothers’ speech to pre-verbal infants, but not for mothers speaking to infants at the one-word stage or beyond (stress was not controlled or examined in this study). These studies suggest that sentence-final lengthening plays a significant role in the durational differences between CDS and ADS.
Indeed, one study found that excluding utterance-final syllables erased the effects of speaking rate differences (i.e. length of syllables across the sentence) between CDS and ADS (Church et al. 2005). Importantly, however, other studies have found that mothers place novel words at the ends of utterances, coincident with exaggerated pitch peaks, something not consistently done in speech to adults (Fernald and Mazzie 1991), and that mothers’ use of stress across discourse to focus attention (e.g. across repeated uses of a word or to highlight a word) in speech to their word-learning infants differs from their use of stress in speech to adults (Fisher and Tokura 1995; Bortfeld and Morgan 2010). This makes it difficult to determine whether lengthening effects are due to position or to focal stress. One recent study (Ko and Soderstrom 2013) attempted to disentangle these factors using highly controlled stimuli and systematically varying speech register (CDS vs. ADS), focus, and utterance type (question vs. declarative). In this study, lengthening effects were found across the sentence, with no proportional increase in lengthening sentence-finally. However, the highly controlled nature of these samples may not be fully reflective of naturalistic CDS, so the question remains incompletely resolved. In addition to highlighting lexical items, CDS carries information at the paralinguistic level, such as conveying approval or comfort, arousing or soothing, or soliciting attention (Papoušek et al. 1991; Katz et al. 1996). Of note are studies suggesting that both infants and their caregivers use specific intonational structures relevant to the dynamics of conversational interaction and turn-taking. In particular, the preponderance of questions (with their rising intonation) has been noted in a wide variety of studies (e.g. Newport et al. 1977; Soderstrom et al. 2008; but cf. van de Weijer 1997). More generally, caregivers use rising intonation to solicit behaviours from their infants (Stern et al. 1982; Ferrier 1985; Papoušek et al. 1991) or bell-shaped contours to maintain gaze (Stern et al. 1982). These different intonational structures are associated with particular grammatical structures (Stern et al. 1982), and their pragmatic meaning (e.g. approval, prohibition) is more salient in their CDS than in their ADS forms (Fernald 1989). Infants are sensitive to these messages, showing preferences for certain kinds of intonational patterns, such as approving contours (Papoušek et al. 1990) and questions (Soderstrom et al. 2011). Some have gone so far as to argue that this affective component is the primary distinguishing feature of CDS: comparisons of CDS and ADS have found that when emotional content is controlled, there are no differences between the prosodic characteristics of CDS and ADS or infants’ responses to them (Trainor et al. 2000; Singh et al. 2002).

Figure 41.1 An example of mean f0 and f0 variability in CDS compared with ADS across six languages for both fathers (Fa) and mothers (Mo). (Reprinted with permission from Fernald et al. 1989) [Figure not reproduced. Panels: French, Italian, German, Japanese, British English, American English; y-axis: fundamental frequency (Hz), 80–500 Hz.]
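The f0 measures discussed in §41.2 (mean f0, pitch range, overall variability) can be operationalized in several ways, which is one source of the comparability problems noted there. The sketch below computes three common summaries from a frame-by-frame f0 track. The convention that unvoiced frames are coded as 0 Hz and the function name `f0_summary` are assumptions for illustration, not the procedure of any study cited here. Expressing range in semitones (12 × log2 of the maximum/minimum ratio) rather than in Hz is one way to compare ranges across speakers with different baseline f0, such as mothers and fathers.

```python
import math
import statistics
from typing import Dict, List


def f0_summary(f0_track_hz: List[float]) -> Dict[str, float]:
    """Summarize an f0 track; frames coded as 0 Hz are treated as unvoiced."""
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    low, high = min(voiced), max(voiced)
    return {
        "mean_hz": statistics.mean(voiced),      # mean f0
        "sd_hz": statistics.stdev(voiced),       # overall variability (sample SD)
        "range_hz": high - low,                  # pitch range in Hz
        "range_st": 12 * math.log2(high / low),  # the same range in semitones
    }


# Toy contour with two unvoiced frames (0 Hz).
print(f0_summary([200, 0, 300, 400, 0, 300]))
```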

41.3 Cross-cultural similarities and differences

Documenting the existence of particular speech modifications across diverse cultures is important in understanding whether the use and characteristics of CDS represent a more general human communication phenomenon or are culturally specific. Indeed, evidence from a wide range of languages and cultures supports the view that there is a general drive for adults to modify their speech when speaking to children. In early work across a range of urban centres, Ferguson (1964, 1977) documented acoustic modifications of speech when adults addressed children. He also observed changes in what he referred to as ‘speech register’ in adults’ speech to children in 15 different languages and 23 different cultures. This latter form of CDS included a change in any linguistic feature of speech, such as vocabulary or syntax, rather than a vocal or acoustic change. It has long been argued that acoustic modification is more typical when addressing pre-linguistic infants, whereas simplified speech is more common when addressing older infants and young children (Fernald 1992), but the relative prevalence of the two forms of modification remains unclear. Intonational characteristics of CDS have been reported in a number of languages. Fernald et al. (1989) compared acoustic aspects of CDS across six languages: German, British English, and American English (stress-timed languages); French and Italian (syllable-timed languages); and Japanese (a mora-timed/pitch-accented language). Acoustic analyses of mothers’ speech revealed higher mean f0 and a wider f0 range in CDS than in ADS in all languages except Japanese, in which the f0 ranges in CDS and ADS were equivalent. In a similar vein, for the tonal language Mandarin Chinese, CDS f0 characteristics were less exaggerated than in American English (Grieser and Kuhl 1988; Papoušek and Hwang 1991). Kitamura and colleagues (2001) investigated the prosodic characteristics of CDS in a tonal (Thai) and a non-tonal (Australian English) language. Longitudinal speech samples were collected from mothers while they spoke to their children, at three-month intervals from birth to 12 months, as well as while the mothers spoke to another adult. While the age trends across the two languages differed for each of the target measures (i.e. mean f0, f0 range, and utterance slope f0), the integrity of the tonal information in Thai was retained.
Although Australian English CDS was generally more exaggerated than Thai CDS, tonal information was only slightly less identifiable in Thai CDS than in Thai ADS. Broesch and Bryant (2015) argued that such claims should specify which aspect of CDS was measured: acoustic modification of speech, simplified speech register, or both. While simplified speech register has been documented in both industrialized and non-industrialized cultures, data on the modification of acoustic properties of speech have come entirely from large, urban, industrialized societies. Broesch and Bryant (2015) set out to determine whether mothers in traditional societies likewise alter acoustic aspects of their speech when speaking to infants relative to adults. In speech from three distinct cultures (rural Fijians, Kenya’s Bukusu, and middle-class Americans), they found that prosodic alteration of speech manifests similarly across quite different cultural groups (but see Ratner and Pye 1984; Ingram 1995). However, the amount of CDS heard by infants can vary quite widely (see e.g. Cristia et al. 2019), and much more work is needed to establish the extent to which CDS can be considered ‘universal’. One important limitation of comparative work to date is that researchers often use different standards of measurement (what counts as an utterance, how f0 is measured, which specific characteristics are examined, etc.), making comparisons across studies very challenging. Kitamura and colleagues’ (2001) finding that Thai mothers preserve linguistic structure by restricting f0 movement was an important test of the parameters of acoustic change in CDS. These and other findings (e.g. Toda et al. 1990; Bornstein et al. 1992) show that, in tonal and pitch-accented languages, mothers compensate for their restricted f0 by increasing the affective content of their CDS.
In other words, it is not the f0 characteristics themselves that generalize across languages and cultures, but the positive affect conveyed in the mother’s voice, a finding consistent with the research described in §41.2 on the relationship between affect and CDS. While some languages allow substantial acoustic modulation of speech to infants, others constrain prosodic modulation more (Grieser and Kuhl 1988; Fernald et al. 1989; Papoušek et al. 1991). Further support for universality comes from the finding of ‘prosodic’ characteristics of CDS even in visual-modality (sign) languages (e.g. Masataka 1992), and from the preference of both deaf and hearing infants for CDS in the visual modality (Masataka 1996, 1998). Overall, the tendency to modify speech seems to be a species-specific adaptation that facilitates effective mother–infant communication.

41.4 Other sources of variation

Cross-linguistic and cross-cultural concerns aside, CDS is often treated as a monolithic entity, a single register, whereas it is better viewed as a cluster of characteristics that vary not only by language and culture but also by age, gender, context, and other factors. While CDS begins at birth (e.g. Fernald and Simon 1984) and continues through early childhood (e.g. Garnica 1977), the way we speak to newborns is patently different from speech to young children. The intonational characteristics described in Fernald and Simon’s work would sound very odd in communication with a 5-year-old or even a 2-year-old; the exaggeration of f0 characteristics decreases with the age of the child after the first year (e.g. Garnica 1977). Indeed, one study of Japanese parents found that prosodic modifications decreased from birth, approaching the characteristics of ADS by the onset of the two-word stage (Amano et al. 2006). However, others have found age-related changes in f0 modification over the first year of life to be less linear. Kitamura and colleagues’ research with Australian and Thai speakers found both linear and higher-order trends in f0 and f0 range measures across the first year of life, with speech to newborns having a lower f0 than speech to 6- to 9-month-olds (Kitamura et al. 2001; Kitamura and Burnham 2003). These changes are also associated with differences in the communicative intent of the speaker (Kitamura and Burnham 2003). Similarly, Stern and colleagues (1983) found the most extreme prosodic modifications at 4 months (compared with newborn, 12, or 24 months) but the longest pauses at the newborn age. Another study did find a linear trend in speaking rate from 4 to 16 months, with rate of speech increasing towards the adult-to-adult rate over time (Narayan and McDermott 2016), but it did not measure speech to newborns.
Overall, there are reasons to believe that some CDS characteristics are unique to the newborn period, with studies reporting particular speech styles not present at other ages, such as whispered speech (Fernald and Simon 1984) and some much longer utterances with more of a self-talk character (Phillips 1973; Snow 1977). Another important source of variation is gender, both that of the caregiver and that of the infant. Findings related to gender have been varied and complex, and appear to involve interactions between the gender of the caregiver and child, age of the child, language, and context. For example, Kitamura and colleagues (2001) found changing patterns of mean f0 raising by mothers over their children’s development that differed between speech to their male versus female infants—but this developmental pattern was different for Thai speakers and speakers of Australian English. On the other hand, both Thai and Australian English mothers produced greater overall f0 range for girls than boys.

Comparisons between fathers and mothers yield a similarly complex story. Overall, it is clear that the prosodic characteristics of CDS are found in both male and female speech (e.g. Fernald et al. 1989). However, some studies have found no difference in f0 measures between male and female adults’ production of CDS (e.g. Jacobson et al. 1983), while others have shown a complex pattern of differentiation across child ages that varies between mothers and fathers (e.g. Warren-Leubecker and Bohannon 1984). While one study found greater increases in f0 range in fathers’ than in mothers’ speech to 2-year-olds (Warren-Leubecker and Bohannon 1984), another found smaller increases in f0 range in fathers’ than in mothers’ speech to 1-year-olds (Fernald et al. 1989). Importantly, baseline measures of ADS also vary across genders, not just for mean f0 but for f0 variability as well (Warren-Leubecker and Bohannon 1984), which complicates the interpretation of these gender findings. Furthermore, gender differences may be strongly influenced by cultural expectations, which may vary not only by language or culture but also with cultural shifts over time.

Beyond systematic variation, there are individual differences in the implementation of CDS that often go unrecognized and may confound analyses with small numbers of individuals. To our knowledge, only one study has examined this directly. Bergeson and Trehub (2007) found systematic individual differences in the implementation of a particular prosodic contour (rising) by particular mothers, which they referred to as ‘signature tunes’.

One of the most important concerns is variation due to the context in which the speech sample is collected, as this touches on the ecological validity of the characteristics of CDS.
This is particularly important given that many experimental studies on the effects of CDS on infant language acquisition rely on laboratory-recorded ‘CDS’ produced in the absence of an infant addressee. A few studies have directly examined the impact of recording context on CDS characteristics. Some have shown that awareness of being observed affects the quantity and quality of CDS (e.g. Graves and Glick 1978; Field and Ignatoff 1981; Shneidman and Goldin-Meadow 2012). Specific to prosody, Fernald and Simon (1984) found that speech produced by a mother to her newborn without the infant actually present (‘simulated’ CDS) contained some of the expanded contours of CDS, but these characteristics were reduced compared to CDS produced while holding the infant. In another study, simulated CDS from female students and trained female actors was compared. The trained actors produced higher f0 in their CDS than the students (Knoll et al. 2009). However, both groups showed similar increases in ratings of positive affect in their CDS compared with ADS. By contrast, Schaeffler and colleagues (2006) found that the presence or absence of a child had no effect on the f0 of CDS. However, in that study, the child was present in the room during the recording of the ADS, which may have reduced any measurable CDS effect in the child-present condition.

41.5 Function of prosodic characteristics

Numerous researchers over the decades have pointed to the functional benefits of CDS (see references in Snow and Ferguson 1977, particularly Sachs 1977 for an early articulation), which fall into three categories. First, CDS engages and maintains attention. Second, it communicates affect and facilitates social interaction. Third, it facilitates language. Each of these three functional benefits will be reviewed in turn.

Attention is required for learning, and different forms of speech are more or less successful in attracting infants’ attention. Since ‘happy talk’ draws infants’ attention in a positive way, caregivers (and doting others) are more inclined to manipulate their vocal acoustics to elicit this response. Indeed, and perhaps unsurprisingly, adults rate infants’ facial responses to CDS as more ‘attractive’ than their facial responses to ADS (Werker and McLeod 1989). Only one study has measured the association of prosody with infant attention longitudinally (Roberts et al. 2013), but its findings confirm the long-term effects of the attention-gating function of such speech, showing that CDS at 6 months predicts infant joint attention skills at 12 months. In other words, the intonational modifications inherent to CDS increase the salience of the input, probably by increasing its variability relative to ADS and by reflecting positive emotions.

Not surprisingly, positive speech greatly affects infants’ social and linguistic development. There is growing evidence that an abundance of negative or neutral speech can have a detrimental effect on early development. For example, Weinberg and Tronick (1998) found that infants as young as 3 months are sensitive to their mothers’ depression. In turn, infants of depressed mothers show impairments in social, emotional, and cognitive functions (Weinberg and Tronick 1998), as well as in associative learning, which is necessary for language development (Kaplan et al. 2002). While affect (or lack thereof) may account for such findings, another important factor is infants’ own active elicitation of responses from caregivers.
This communicative give-and-take creates an environment rich in linguistic structure, which is fundamental for language development, and depressed mothers may not engage in it as readily as non-depressed mothers.

Perhaps most critically, the exaggerated intonational characteristics of CDS highlight linguistic structure and how different components of language are strung together. These properties affect infants’ organization of, and memory for, speech. Moreover, young learners can use a variety of distributional strategies to pull individual words out of these components. The simplest example is that a priori knowledge of certain high-frequency words (e.g. the infant’s own name; Bortfeld et al. 2005), combined with CDS, can help to further delineate where other words begin and end, even at the age of 6 months. In other words, while the structure of CDS provides initial edges in otherwise continuous speech, continued exposure to the regular patterns within the smaller ‘chunks’ of speech that those edges create allows infants to break them down further. Consistent with this, a wealth of recent evidence has highlighted different forms of structural information in the speech signal. For example, 9-month-olds prefer to hear artificial pauses inserted at grammatical boundaries over pauses inserted at non-boundaries, but only for CDS and not ADS (Kemler Nelson et al. 1989); infants can segment artificial speech with CDS characteristics, but not when the stimuli are produced in ADS (Thiessen et al. 2005); and CDS has been found to contribute to lexical learning (Ma et al. 2011b). Other linguistic properties of CDS (e.g. an abundance of questions) may also serve to highlight these chunks and syntactic regularities in the language (Soderstrom et al. 2008).

41.6 Conclusion and future directions

Communication is inherently social. At the earliest stages of development, infants are influenced by the sounds around them, particularly the speech directed at them. Fortunately, caregivers’ biases to communicate in particular ways help infants to focus their attention specifically on speech sounds. The structure of the speech signal, together with the contingent structure of the infant–caregiver interaction (mothers responding dynamically to infants and vice versa), serves to highlight regularities in speech and communicate affect, and infants respond to this. These maternal responses correlate positively with language development. Through these responses, infants appear to learn the association between the production of certain sounds and their outcomes. Finally, caregivers’ input during social interactions and early ‘conversations’ scaffolds language learning by providing information about the activities and objects that are the focus of infants’ attention in the first place.

While research on CDS dates back to the 1970s and even earlier and continues to flourish, recent methodological advances in two domains warrant particular attention in the coming years. First, much of the data available on CDS are behavioural in nature. Although there is a growing number of studies using neurophysiological methods to examine infants’ processing of different forms of CDS (e.g. Bortfeld et al. 2005, 2007; Saito et al. 2007; Naoi et al. 2012; Fava et al. 2014a, 2014b), an increase in such research will broaden our understanding of what role, if any, this form of speech plays in early brain development. Second, recent advances in the use of full-day recordings to examine infants’ and young children’s language experiences (Vandam et al. 2016) present an important opportunity to expand our understanding of how CDS manifests in the real world, particularly across different cultures.
Recent collaborative approaches increasingly allow for data sharing (e.g. Vandam et al. 2016) and focus on the development of standardized annotation schemes (e.g. the DARCLE Annotation Scheme: https://osf.io/4532e), which will greatly improve comparative analyses. These and other approaches will contribute to our understanding of the interplay between language development and important social and emotional competencies. Nonetheless, it is clear that this special form of speech is fundamental to infants’ initial vocal development and lays the foundation for subsequent advances in language learning.

chapter 42

Prosody in Children with Atypical Development

Rhea Paul, Elizabeth Schoen Simmons, and James Mahshie

42.1 Introduction

Atypicalities of communication development are among the most common developmental disabilities (Prelock et al. 2008). For some children, communication is the primary aspect of development affected (Bishop et al. 2016). For other children with a variety of developmental disorders (including intellectual disabilities, neuromotor disorders, and autism), communication difficulties are one aspect of their symptomatology. Most research aimed at understanding the communication disorders of affected children (who represent approximately 13% of the population internationally; McLeod and Harrison 2009) has focused on the development of their vocabulary, speech sound production, and syntactic, morphological, and pragmatic abilities. However, there is an emerging literature that addresses the prosodic strengths and difficulties seen in children with communication disorders. This chapter reviews four developmental disorders in which prosody has been reported to show atypicalities: autism spectrum disorder (ASD) (§42.2), developmental language disorder (DLD) (§42.3), cerebral palsy (CP) (§42.4), and hearing loss (HL) (§42.5). These disorders are the focus because Lopes and Lima (2014) report that there is very little research on prosody in other childhood disabilities. Brief descriptions of the impact of each of these disorders on prosodic function are presented. As chapters 39 and 40 make clear, children are sensitive to prosodic aspects of speech input by 4–6 months of age, but the full acquisition of receptive and expressive prosodic skills at the lexical and utterance levels extends over the course of childhood, with some features not acquired until after 8 years of age. Rates and sequences of acquisition can vary across languages.

42.2 Autism spectrum disorder

The US National Institute of Mental Health (2018) defines autism spectrum disorder (ASD) as a group of developmental disorders that include a spectrum of symptoms, skills, and levels of disability and that involve problems communicating and interacting with others, repetitive behaviours, and circumscribed interests. ASD is diagnosed when these symptoms impair the individual’s ability to function in important areas such as school, work, and community settings. Severity can range from very mild to profound impairment. This source reports an overall prevalence of 1.7% among 8-year-old children. A core feature, and one of the primary diagnostic symptoms, of ASD is a qualitative impairment in social communication (American Psychiatric Association 2013). Although some individuals with ASD have limited spoken language abilities, current estimates (Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators and Centers for Disease Control and Prevention 2014) suggest that more than 70% of those with ASD, as currently defined, function within or near the normal range in intellectual ability and use spoken language as their primary means of communication. Research on the development of language in speakers with ASD (summarized by Kim et al. 2014) suggests relative strengths in phonology (Bartolucci and Pierce 1977; Kjelgaard and Tager-Flusberg 2001), morphosyntax (Tager-Flusberg et al. 1990; Eigsti et al. 2007), and vocabulary (Jarrold et al. 1997; Kjelgaard and Tager-Flusberg 2001) compared with pragmatic abilities, which constitute their primary communication difficulties. Prosody, however, has also been identified as a significant component of the deficits seen in speakers with ASD. Since Kanner’s (1943) original description of the autistic syndrome, prosodic differences in speakers with ASD have been noted (e.g. Pronovost et al. 1966; Ornitz and Ritvo 1976; Fay and Schuler 1980; Baltaxe and Simmons 1985). While not universal in speakers with ASD, inappropriate prosody, when present, tends to persist over time, even when other aspects of language improve (Rutter and Lockyer 1967; Tager-Flusberg 1981; Shriberg et al. 2001).

42.2.1 Prosody production

Shriberg et al. (2001) were perhaps the first to apply a validated assessment instrument to the study of prosody in ASD. They assessed speech samples from 30 young adult speakers with ASD and reported more utterances coded as inappropriate in the domains of phrasing, stress, and resonance for the ASD group than for typical speakers. Paul et al. (2005b), reporting on the same sample of young adults, showed, as earlier studies had reported (Simmons and Baltaxe 1975), that the prosodic deficits found were not universal in the sample: only 47% of participants demonstrated these impairments, primarily in the areas of phrasing and use of stress. For this portion of the sample, however, stress difficulties were significant predictors of both social and communicative ratings on standardized instruments. Recent work suggests that the perception of prosodic impairment in ASD results from both extended duration and extreme pitch variation in the production of simpler pitch contours, which are used more repetitively (Diehl et al. 2009; Green and Tobin 2009; Nadig and Shaw 2012; Diehl and Paul 2013; Fusaroli et al. 2017). These findings suggest that although not all speakers with ASD show deficits in prosody, when they do, the deficits are associated with perceptions of poorer social and communicative skills on the part of significant others. DePape et al. (2012), in reviewing studies of prosodic production, concluded that the degree of prosodic impairment in speakers with ASD is related to their general language level, and that these impairments affect not only the acoustic character of the output but also speakers’ ability to convey crucial information regarding meaning in utterances. Although most of this research has been carried out on English speakers, Chan and To (2016) report that speakers of tonal languages with ASD show similar atypicalities.

42.2.2 Prosody perception

While there is now a fairly consistent body of evidence of impairment in the expressive prosodic abilities of speakers with ASD, a growing literature on the understanding of prosodic information in ASD has yielded some contradictory findings (for reviews see McCann and Peppé 2003; Diehl and Paul 2009). Some studies have reported deficits in the comprehension of prosody used to express emotional states (e.g. Lindner and Rosén 2006; Wang and Tsao 2015; Rosenblau et al. 2017). Others have found that individuals with ASD are just as capable as controls of identifying basic emotional states from prosody (Boucher et al. 1998; Grossman et al. 2010; Brennand et al. 2011; Lyons et al. 2014). Still, several studies have found deficits in the recognition of certain emotions (e.g. happiness: Wang and Tsao 2015; surprise: Martzoukou et al. 2017) but not others, and emotional prosody recognition is more problematic when prosodic cues are discrepant with other information, such as facial expression (Lindström et al. 2016). Some studies have employed neuroimaging techniques to understand emotional prosodic processing in adolescents and adults with ASD. Eigsti et al. (2012) and Gebauer et al. (2014) reported broader recruitment of executive and ‘mind-reading’ brain areas in ASD for a relatively simple emotion-recognition task involving prosody. Eigsti et al. interpreted these findings to suggest that participants had developed less automaticity in processing this information. Rosenblau et al. (2017) reported significant differences between typically developing (TD) individuals and individuals with ASD at both the behavioural and neural levels of processing of emotional prosody. These findings, in conjunction with the inconsistencies noted above, may suggest that processing emotional prosody is effortful and resource-intensive for speakers with ASD.
Prosody, however, plays a role not only in the communication of emotional information but also in structural language processes such as lexical segmentation, lexical identification, and syntactic parsing (Cutler et al. 1997; Wagner and Watson 2010). Chapter 40, for example, discusses the role played by ‘prosodic bootstrapping’ in the acquisition of word order and of phrase and clause boundaries. Research on these nonpragmatic functions of prosody in ASD is thought to be critical for determining whether the prosodic deficits seen in this syndrome are distinct from the general pragmatic deficit noted earlier, or merely collateral to it. The role of intonational phrasing in syntactic parsing has been explored, with mixed results. Three studies, which respectively included adolescents, young adults, and school-aged children, found no difference between participants with ASD and TD controls (Paul et al. 2005a; Peppé et al. 2007; Chevallier et al. 2009), and one found that persons with ASD performed worse than age- and language-matched controls (Järvinen-Pasley et al. 2008). The participants in the last of these studies were less verbally proficient than those in the other studies, suggesting that methodological and age differences may influence results. Diehl et al. (2008) compared prosodic comprehension in adolescent speakers with ASD to that of a control group. Participants were asked to point to the correct picture in response to syntactically ambiguous sentences. Speakers with ASD were less likely than their TD peers to act in concordance with the prosodic cue when following the directions they heard, as in (1).

(1) Put the dog . . . in the box on the star (Put the dog into the box that’s on a star).
    Put the dog in the box . . . on the star (Put a dog that’s in a box onto a star).

Diehl et al. (2015) used both eye-tracking and behavioural responses in an experimental paradigm similar to that used in their 2008 study. They found that speakers with ASD were as likely as TD peers to use prosodic information to resolve syntactic ambiguity, provided that conflicting cues (e.g. a lexical bias to interpret the first prepositional phrase heard as a destination, even when a second prepositional phrase required reanalysis of this interpretation) were absent. Diehl et al. (2015) interpreted these data to suggest that the deficits observed in the understanding of both emotional and linguistic prosody in this population may not be due to a global deficit in prosodic processing. Rather, they may stem from weaknesses in the interpretation of information in the auditory-linguistic signal, as well as in the ability to form and override expectations based on prior knowledge and to integrate cues from nonauditory sources (e.g. facial expression, situational context) and from social-cognitive knowledge (e.g. theory of mind).
It may be the combination of these pressures to integrate information from a variety of sources when processing natural language that leads to the inconsistent performance seen in speakers with ASD on a variety of prosodic tasks, rather than a weakness specific to prosody per se. This interpretation of the findings in this population may help to explain its members’ relative strengths in production. In sum, prosodic deficits have consistently been reported in about half of the individuals with ASD who speak, and these deficits affect others’ perceptions of the affected speakers. Conclusions on the source of differences in receptive prosodic capacity are more mixed, and additional research is clearly needed in this area.

42.3 Developmental language disorder

Developmental language disorder (DLD), sometimes referred to as specific language impairment, is a neurodevelopmental condition that affects approximately 7% of the general population in the UK and the USA, with males more likely to be affected than females (Tomblin et al. 1997). Individuals with DLD are characterized as having impairments in expressive and/or receptive language skills in the absence of obvious sensory deficits, neurological impairment, or other developmental disorders such as ASD. Although there is significant heterogeneity within the disorder, children with DLD frequently present with delayed lexical development, grammatical impairments, and impoverished sentence structure. Subtle pragmatic impairments may also be evident (Schwartz 2009).

As we have seen, prosody plays a critical role in language processing: in parsing and organizing sentences into comprehensible linguistic units (Frazier et al. 2006), providing word boundary cues (Cutler and Carter 1987), and supporting pragmatic processing (Dahan 2015). It therefore seems plausible that prosodic impairments would be evident in individuals with DLD; however, the evidence in the literature on this point is mixed.

42.3.1 Prosody production

There is a dearth of literature evaluating the acoustic properties of prosodic output in children with DLD. However, one study by Snow (1998) used acoustic measures to quantify prosody production in a group of 4-year-olds with and without DLD. This project measured syllable duration and the use of falling pitch contours within utterances collected during spontaneous language sampling. The results revealed that the children with DLD marked syntactic boundaries using prosodic information, including final syllable lengthening and falling pitch contours, in the same way as their typical peers. These findings suggest that these speakers produce at least some acoustic features of prosody in a typical manner.

42.3.2 Prosody perception

There is evidence to suggest that basic auditory processing of low-level prosodic information is impaired in children with DLD. Cumming et al. (2015) report that 9-year-olds with DLD demonstrate diminished sensitivity to amplitude rise time in speech. Since the amplitude envelope transmits information about the global prosodic structure of an utterance, poor sensitivity to this structure may explain some of the higher-level language-processing difficulties observed in the disorder. Haake et al. (2013) report an additional difficulty in children with DLD: impoverished processing of durational information. When participants with DLD were presented with pairs of tones varying in duration and had to choose which tone was longer, a subset of participants demonstrated impaired performance, while the remaining participants performed on par with age-matched typical peers. Thus, deficits were not seen across all participants. In a study of younger children (Fisher et al. 2007), preschoolers with DLD were presented with an unfiltered sentence followed by a filtered sentence that either matched the unfiltered sentence or differed from it on between one and three prosodic parameters. The children with DLD performed more poorly than TD children in determining whether the sentences matched. The authors argue that the diminished performance of the DLD group on this task supports the notion that those with the disorder may not derive the same support for sentence parsing and comprehension from prosodic information as language-typical peers. In summary, the findings reported here suggest that some sublexical, basic auditory processing impairments may underpin the linguistic prosody deficits observed in a subset of children with DLD. Nonetheless, at least on average, these children display the ability to produce most of the prosodic distinctions tested.
It is unclear from the handful of studies presented here whether prosodic functioning is generally impaired in children with DLD, whether only a subset of the DLD population experiences prosodic difficulty, or whether any of the differences reported between DLD and TD groups are clinically meaningful, given the relatively intact production reported. Continued research is needed to evaluate the relationship between basic auditory processing skills and linguistic prosody within the language impairments that characterize the disorder.

42.4 Cerebral palsy

Cerebral palsy (CP) is a nonprogressive developmental disorder characterized by disturbances of movement and posture. It usually has its onset in the pre- or perinatal period and is caused by damage to the central nervous system. It is often accompanied by problems with sensation, perception, cognition, communication, and behaviour (Rosenbaum et al. 2007b). Speech and language problems in children with CP arise primarily from deficits in speech motor control, although comorbid problems in cognition, language, and/or sensation and perception can exist (Hustad et al. 2010). Recent data from population-based samples suggest that 60% of children with CP have some type of communication problem (Bax et al. 2006), the most common of which is dysarthria, a motor speech disorder that results from impaired movement of the muscles used for speech production (American Speech-Language Hearing Association 2017). Dysarthria is characterized by speech that is aberrant in rate, pitch, intensity, and rhythm; it may show changes in voice quality and often includes imprecise consonant articulation and vowel distortions that reduce speech intelligibility.

42.4.1 Prosody production

Most of the research on prosodic performance in CP has been carried out on adults rather than children. Patel (2002a) showed that adults with CP and severe dysarthria were able to produce vowels with contrastive pitch and duration. Patel (2002b) showed that typical listeners could identify the pitch contour cues produced by severely dysarthric speakers with CP in question versus statement contexts, even though these speakers’ range of frequency control was reduced, suggesting that they were able to exert sufficient control to signal the functional question–statement distinction. Patel (2003) found that speakers with dysarthria due to CP used pitch, duration, and intensity cues to signal contrast, and compensated for their reduced control of pitch by exploiting control of loudness and duration. Patel (2004) and Patel and Campellone (2009) reported similar findings for the ability of speakers with dysarthria due to CP to produce contrastive stress, again suggesting compensatory strategies. Connaghan and Patel (2017) showed that some speakers with CP benefit from using contrastive stress as a strategy to improve intelligibility. In one of the few studies conducted on children with CP, Kuschmann et al. (2017) reported on 15 adolescents with moderate dysarthria who received an intervention programme targeting a range of language skills. In monitoring outcomes on intonation, they noted a significant increase in the use of rising intonation patterns after intervention, and there were some indications that this increase was related to gains in speech intelligibility for some of the participants. Pennington et al. (2018) also examined the effect of intervention aimed at respiration, phonation, and speech rate in adolescents with CP and dysarthria. They found that increases in intensity and reductions in pitch were associated with gains in intelligibility. In sum, the sparse literature on the prosody of adult speakers with CP focuses only on production and suggests that some elements of prosody can be preserved in this population, despite limitations in articulatory accuracy. Adult speakers with CP appear able to exploit changes in pitch, duration, and loudness to convey communicative information, often using these cues in a compensatory fashion. Training appears to increase their ability to do so.

588 RHEA PAUL, ELIZABETH SCHOEN SIMMONS, AND JAMES MAHSHIE

42.5 Hearing loss

Approximately 2 or 3 out of every 1,000 children in the United States are born deaf or hard of hearing (National Institutes of Health 2016), about half of them with severe to profound loss. Ninety per cent of these congenitally deaf children are born to parents with normal hearing. While more than 50% of all cases of congenital hearing loss in children result from genetic factors, other causes include prenatal infections, illnesses, toxins consumed by the mother during pregnancy, and other conditions occurring at the time of birth or shortly thereafter (American Speech-Language Hearing Association 2018). Many children with hearing loss have some degree of residual hearing, although any impairment to hearing will affect the development of spoken language. The severity of deficits in the reception and production of spoken language depends not only on the type and extent of hearing loss but also on age at identification and intervention, and on the type of intervention. Perhaps the most significant advance in hearing technology since the advent of the hearing aid has been the development and use of cochlear implants with children. Before cochlear implants came into widespread use, children with hearing loss relied on hearing aids to access speech. These children were typically characterized as having atypical speech, including articulatory errors (Hudgins and Numbers 1942; Smith 1975) and distorted vocal and prosodic characteristics (Hood and Dixon 1969; Monsen 1974; Monsen et al. 1979). By contrast, deaf children who have received cochlear implants before 3 years of age have generally acquired higher levels of speech and language skills than peers using hearing aids (Flipsen 2008, 2011; Niparko et al. 2010).

42.5.1 Prosody production

Numerous studies indicate that children with cochlear implants (CWCI) often have difficulty sustaining stable fundamental frequency (f0) and amplitude, which is likely to affect both the prosodic patterns and the overall quality of their speech (Campisi et al. 2006; Wan et al. 2009; Holler et al. 2010). Studies (e.g. Higgins et al. 2003; Campisi et al. 2006; Gu et al. 2017b) confirm that these difficulties negatively affect the production of prosody.

A limited number of studies have directly examined the production of prosodic features by CWCI. Snow and Ertmer (2012) reported that CWCI exhibit early changes in intonation during the first six months of auditory experience with their implants, similar to those of hearing children. Additional evidence, however, shows that children with cochlear implants present a range of atypicalities in prosody production.

42.5.1.1 Prosody production in sentences

Lenden and Flipsen (2007) reported a number of differences in free speech related to the production of stress and speaking rate, but no consistent difficulties with phrasing or pitch. Peng et al. (2008) elicited a series of syntactically matched questions and statements from children and youth with cochlear implants and from an age-matched group of children with typical hearing (CWTH). The production scores of the CWCI were significantly below those of the CWTH for both sentence types. In contrast, Barbu (2016) reported that judgements from a panel of listeners revealed no significant difference between the CWCI and CWTH groups in the production of rising and falling intonation contrasts to signal a question or a statement, and that the groups were similar in their use of f0 and, to a lesser extent, intensity to distinguish between statements and questions. Mahshie et al. (2016) used the focus output subtest of the Profiling Elements of Prosody in Speech-Communication (PEPS-C; Peppé et al. 2007) to elicit utterances with varied stress patterns from early-implanted CWCI and a comparison group of CWTH. Listener judgements revealed no significant difference between the two groups’ ability to produce word stress accurately, but acoustic analysis suggested that the CWCI relied less on altering amplitude to achieve focus.

42.5.1.2 Emotional prosody production

Research examining the ability to produce speech conveying emotional states is limited in this population. Nakata et al. (2012) compared hearing and implanted children’s imitation of a series of utterances that conveyed surprise (rising intonation contour) and disappointment (falling-rising intonation contour). The CWCI were overall poorer at imitating these patterns. While the CWTH showed steady improvement with age, the scores of the CWCI were not correlated with age and were similar to those of the youngest hearing children. Wang et al. (2013) compared nine ‘highly successful’ bilateral implant users with an age-matched group of CWTH on their ability to imitate ‘happy’ and ‘sad’ sentences. The CWCI performed more poorly.

42.5.2 Prosody perception

42.5.2.1 Prosody and sentence perception

Unlike research on the other disorders discussed here, a good deal of research on children with hearing loss (HL) focuses on the perception of prosody. Despite the significant improvements in speech perception that cochlear implants bring, the speech information these devices provide is impoverished compared to that contained in the intact acoustic signal. Most significant is the absence of f0 information, suggesting that CWCI have a limited ability to perceive (and thus to produce) prosodic features (Peng et al. 2008). This is confirmed by studies (e.g. Most and Peled 2007) that have compared the ability of CWCI and children with hearing aids to perceive intonation, syllable stress, word emphasis, and word pattern; these studies report that the children with hearing aids outperformed CWCI in perceiving both intonation and stress. In addition, a number of studies have compared prosody perception in CWCI and CWTH. Peng et al. (2008) compared the ability of early-implanted CWCI and age-matched hearing individuals to produce and perceive questions and statements. Accuracy scores and appropriateness of pitch contours were significantly lower for both production and perception of these patterns in the CWCI. See et al. (2013) reported similar findings for the perception of intonation. Torppa et al. (2014) examined word and sentence stress perception by CWTH and two subgroups of CWCI, one with some degree of musical experience and one without. The CWCI with music training obtained scores closer to those of the typically hearing group than did the CWCI without musical instruction, suggesting that training may improve auditory perception in children with cochlear implants. Holt et al. (2015) used a reaction time paradigm to examine responses to prosodic cues in adolescents with and without cochlear implants. The group with implants showed slower reaction times than the hearing group, suggesting that ‘deficits in the perception of prosodic cues may impact on an individual’s language processing speed’ (p. 6). Fortunato (2015) examined the role of prosody in the interpretation of syntactically ambiguous sentences and reported that a group of Portuguese-speaking CWCI differed from a matched group of CWTH in their use of prosodic forms to disambiguate sentences.

42.5.2.2 Prosody and emotion perception

Hopyan-Misakyan et al. (2009) used the Diagnostic Analysis of Nonverbal Behavior (DANVA-2; Nowicki and Duke 1994), a research measure of emotion perception, to compare the ability of CWCI and age- and gender-matched CWTH to recognize four affective states: happy, sad, angry, and fearful. The CWCI performed more poorly on all four categories of emotions. However, Chin et al. (2012) examined the ability of CWCI to imitate utterances that conveyed happy and sad emotions and found no significant difference between CWCI and an age-matched group of CWTH. Nakata et al. (2012) likewise compared affective prosody perception by CWCI and CWTH. They reported that the CWCI perceived ‘happy’ and ‘sad’ relatively well, though not entirely on a par with CWTH, and showed larger deficits in perceiving ‘angry’ utterances. In summary, most research examining the production of prosody by CWCI suggests deficits in the production of stress, question–statement intonation, and mood (Peng et al. 2008; See et al. 2013; Torppa et al. 2014) and the use of compensatory strategies (Patel 2004; Patel and Campellone 2009; Connaghan and Patel 2017). Some studies, however, have found comparable performance between CWCI and CWTH. These discrepancies may be accounted for by differences in the characteristics of the children studied and in the methods used to obtain utterances. In particular, research suggests that children who receive their implants before 3 years of age have better speech and language outcomes than children who receive their implants at an older age (see Kirk and Hudgins 2016). While studies of early-implanted children tend to report performance close to that of CWTH, studies showing more differences contained significant numbers of children implanted after age 3. Methods that examine imitated utterances or highly structured elicited productions, as opposed to spontaneous speech, may also play a role in the discrepant outcomes reported. Research on prosody perception in this population reveals deficits in various aspects of perception: performance is slower, less consistent, and less efficient than in typical development, but it is not absent. One study of CWCI and music training suggests positive effects of perception practice.

42.6 Clinical practice in developmental prosody disorders

42.6.1 Assessing prosody deficits

There are few instruments available for assessing prosody. The Prosody-Voice Screening Protocol (PVSP; Shriberg et al. 1992) can be used to examine prosodic variables in free speech samples in terms of stress, rate, phrasing (fluency), loudness, pitch, and voice quality. As a screening measure, the PVSP uses a cutoff score of 80% for identifying a prosodic deficit: if fewer than 80% of the speaker’s utterances are rated as appropriate in one of these six areas, the speech sample is considered to demonstrate prosodic difficulties in that area. The PVSP has undergone extensive psychometric study and demonstrates adequate reliability at the level of summative prosody-voice codes. However, it is highly labour-intensive, requiring transcription and utterance-by-utterance judgements for each prosody-voice code, and raters need intensive training and practice before they reach adequate skill levels. The aforementioned PEPS-C (Wells et al. 2004; Wells and Stackhouse 2015) samples a range of expressive and receptive prosodic elements in an elicitation format. With normative data reported for children aged 5–13, the measure has been used with typical children and those with a range of disabilities. Like the PVSP, it can identify prosodic deficits, but many children with disorders score within the normal range on this measure, and the items are somewhat unlike natural speech contexts. The aforementioned DANVA-2 (Nowicki and Duke 1994), a norm-referenced measure of emotion perception, has been shown to be internally consistent and reliable over time, and its strong psychometric properties make it a useful instrument. 
The Child Paralanguage subtest consists of recordings of the same neutral sentence, ‘I’m going out of the room now, but I’ll be back later’, spoken to convey four emotional states (happy, sad, angry, and fearful) with either high or low emotional intensity. The child responds by selecting one of four pictures that represent the four emotions. This measure has been used frequently in studies of the perception of emotional prosody. A variety of notation systems have been developed for annotating transcribed speech samples with prosodic features. Wells and Stackhouse (2015) supply one example in their Intonation Interaction Profile (IIP), which provides guidance for coding, within transcriptions, the appropriate use of turn-ending prosody, focus within utterances, and tone used to align with previous utterances. These and other notations allow the clinician to rate turn-taking, focus, and tone-matching in order to identify areas in need of intervention.
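The PVSP screening rule described above amounts to a simple proportion test applied separately to each of the six areas. The Python sketch below is purely illustrative and hypothetical: it reduces each utterance judgement to appropriate/inappropriate (the actual PVSP works with a richer set of prosody-voice codes), and the function name `flag_areas`, the area labels, and the choice to skip areas with no rated utterances are assumptions rather than part of the protocol.

```python
from typing import Dict, List

# The six areas named in the text; the string labels are illustrative only.
AREAS = ("stress", "rate", "phrasing", "loudness", "pitch", "voice quality")


def flag_areas(ratings: Dict[str, List[bool]], cutoff_percent: int = 80) -> List[str]:
    """Hypothetical sketch: return the areas in which fewer than
    `cutoff_percent` per cent of rated utterances were judged appropriate.

    `ratings` maps an area to one judgement per utterance:
    True = appropriate, False = inappropriate.
    """
    flagged = []
    for area in AREAS:
        judgements = ratings.get(area, [])
        if not judgements:
            # Assumption: an area with no rated utterances is not scored.
            continue
        appropriate = sum(judgements)  # each True counts as 1
        # Integer form of appropriate / total < cutoff_percent / 100,
        # which avoids floating-point error exactly at the cutoff.
        if 100 * appropriate < cutoff_percent * len(judgements):
            flagged.append(area)
    return flagged


if __name__ == "__main__":
    sample = {
        "stress": [True] * 7 + [False] * 3,  # 70% appropriate
        "rate": [True] * 9 + [False],        # 90% appropriate
        "pitch": [True] * 8 + [False] * 2,   # exactly 80%: not below cutoff
    }
    print(flag_areas(sample))  # ['stress']
```

With 7 of 10 utterances rated appropriate for stress (70%), that area falls below the cutoff and is flagged; an area at exactly 80% is not, because the rule is stated as "fewer than 80%".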


Table 42.1 Recording form for judging prosodic production in spontaneous speech

Prosodic parameter: clinical judgement (Appropriate / Inappropriate / No opportunity to observe)
Rate
Stress in words
Stress in sentences
Fluency; use of repetition, revision
Phrasing; use of pauses
Overall pitch level; relative to age/gender
Intonation (melody patterns of speech)
Loudness

Acoustic analysis, using software such as Praat (Boersma 2001), has also been used to analyse prosodic features, although no standardized methods using this approach have yet been published. Most clinicians assess prosody by relying primarily on subjective rating scales such as the one shown in Table 42.1 (Paul and Fahim 2014) to make judgements about prosodic performance. At this writing, there are no truly standardized measures of prosody production or perception, despite the importance of understanding function in these areas.

42.6.2 Treatment of prosody deficits

Few interventions have been developed to address prosodic deficits, particularly for children with developmental disorders. One problem facing developers of prosody intervention is the lack of normative information on the sequence in which various aspects of prosody are acquired (Diehl and Paul 2009). A few single-subject reports have appeared. Kuschke et al. (2016), for example, report on several cases in which fairly traditional language intervention techniques, coupled with focused listening activities aimed at highlighting prosody, were employed to some effect (Matsuda and Yamamoto 2013). However, the literature on prosody intervention focuses primarily on ASD and HL, and is lacking for children with DLD and CP. Lo et al. (2015) employed melodic contour training with 16 adult cochlear implant users. Therapy involved training with five-note contours forming nine different patterns, such as falling or rising-falling. Following training, the implant users exhibited improved consonant perception along with some benefits for question–statement prosody perception. Rothstein (2013) published a volume of activities for preschool- through school-aged children with HL that uses developmentally appropriate activities (singing, pretending, character voices) to improve receptive and expressive prosody skills in the areas of loudness, pitch, rhythm, and overall intelligibility. No data have been published on the efficacy of this approach, however. Dunn and Harris (2016) also provide a volume of activities designed to address prosody, specifically in speakers with ASD. The programme includes a qualitative screening measure, with ratings based on observation of spontaneous speech. The programme itself presents a series of rule-based activities that employ visual cueing and physical activity to teach awareness and use of breath support, airflow, phonation, and motor sequencing in each area of prosody separately (volume, rhythm, pitch, stress, etc.), via exercises that start with sounds and words and move through phrases, sentences, paragraphs, and conversation. The authors report that research has been done on the effects of the intervention, although this research has not been published as of this writing. Wells and Stackhouse (2015) present an additional intervention manual, with activities developed for a range of developmental levels (prelinguistic through school age), some of them targeted at specific aetiologies (autism, learning disabilities, deafness). The system requires a high level of transcription and analytic skill, and while a great deal of developmental information on prosodic acquisition is reviewed in the volume, no empirical data on the efficacy of the programme are provided. Simmons et al. (2016) reported on the use of a mobile application, SpeechPrompts, designed to treat prosodic disorders in children with ASD and other communication impairments using tablet computer technology. The app allows clinicians to present sample utterances with pictured pitch and loudness characteristics and rates client productions as matching or diverging from the models. Forty students aged 5–19 years with prosody deficits received treatment from their speech-language pathologists in school settings, using the app on a tablet device for short periods (10–20 minutes) once or twice a week for eight weeks. Post-treatment ratings suggest that SpeechPrompts was useful in the treatment of prosodic disorders, but efficacy data are not available.

42.7 Conclusion

This review has identified several emerging trends in the research on prosody in atypical development. In terms of prosodic perception, the data from children with DLD, ASD, and HL converge somewhat on the notion of less rapid, less complete, and less efficient processing of the auditory signal that carries prosody, resulting in perception of prosodic cues that is inconsistent and inefficient rather than absent. Although this perception is, on average, poorer than that seen in typical populations, it appears to provide enough information for these children to learn a good deal about prosody and to produce a range of prosodic parameters. This development, however, is somewhat delayed, less accurate, and less efficient than in typical development, although it is not entirely absent. Many individuals in these populations find ways to compensate for motoric (CP) and perceptual (ASD, DLD, HL) weaknesses by using strategies that improve others’ perception of their prosody. Both production and perception appear to be amenable to the positive effects of training. While more research is clearly needed, the current literature suggests that it is possible to obtain at least short-term improvements in prosodic function using a variety of approaches. Better data on the typical development of prosody, improved assessment procedures, and fuller study of the efficacy of a range of treatment approaches will be necessary to advance the current state of clinical practice in this important area of communicative function.


chapter 43

Word Prosody in Second Language Acquisition

Allard Jongman and Annie Tremblay

43.1 Introduction

The learning of lexical prosody in adult second language (L2) learners differs greatly from that in children acquiring their native language (L1): not only are adults cognitively mature but they also approach the L2 learning task with an already established linguistic system, that of the L1. This system has an important influence on the learning of lexical prosody in an L2. The present chapter provides an overview of this influence by discussing L2 learners’ use of prosodic information in word perception/recognition and their production of such information at the word level, focusing on the learning of lexical stress and lexical tone. The chapter first compares phonological and phonetic approaches to explaining L2 learners’ ability to perceive and use stress in word recognition, and then compares phonological and statistical approaches to explaining L2 learners’ production of lexical stress (§43.2). The chapter then discusses the learning of lexical tone by focusing on the contributions of lower-level acoustic-phonetic and higher-level linguistic information, the perceptual weighting of tonal cues, and the influence of contextual phonetic and prosodic information (§43.3). The efficacy of short-term auditory training is also discussed. The chapter ends with concluding remarks and future directions for research on L2 word prosody (§43.4).

WORD PROSODY IN SECOND LANGUAGE ACQUISITION 595

43.2 Lexical stress

43.2.1 Second language word perception/recognition

One approach that has been adopted to explain the influence of the L1 on adults’ perception of stress and use of stress in spoken-word recognition is the phonological approach. According to this approach, L2 learners’ success at perceiving stress and using it in word recognition is influenced by whether stress is represented phonologically as part of their L1 lexical representations. More precisely, L2 learners are more likely to use stress in word perception/recognition if stress is lexically contrastive in the L1 (i.e. if L1 words differ in their stress pattern) than if it is not lexically contrastive (e.g. Dupoux et al. 1997, 2001, 2008; Peperkamp and Dupoux 2002; Peperkamp 2004; Tremblay 2008, 2009; Peperkamp et al. 2010; C. Y. Lin et al. 2014). A number of studies have provided support for this approach. Using AX perception and sequence recall tasks, Dupoux and colleagues showed that naive French listeners performed more poorly than native Spanish listeners when attempting to perceive stress in phonetically variable Spanish nonwords (i.e. nonwords differing in stress uttered by various Spanish speakers; e.g. Dupoux et al. 1997, 2001). Whereas Spanish words can differ in their stress pattern (e.g. Harris 1983), French words either are unstressed or have their final syllable ‘stressed’ in phrase-final position (e.g. Jun and Fougeron 2000, 2002).1 French listeners’ so-called stress deafness was attributed to stress not being lexically contrastive in French (Dupoux et al. 1997, 2001) (for similar results with speakers of Finnish, a language where stress is also not lexically contrastive, see Peperkamp and Dupoux 2002). To provide a theoretical account of these findings, Peperkamp and Dupoux (2002) proposed the Stress Parameter Model (see also Peperkamp 2004). According to this model, listeners who, during the first two years of their life, receive exposure to a language in which stress is lexically contrastive (e.g. Spanish) set the Stress Parameter to encode (i.e. represent) stress phonologically in their lexical representations, whereas listeners who are exposed to a language where stress is not lexically contrastive (e.g. French, Finnish) do not. 

1 French does not have lexical stress; prominence is instead realized at the level of the phrase, with the last non-reduced syllable in the phrase receiving an intonational pitch accent and thus being perceived as more prominent than the preceding syllables (e.g. Jun and Fougeron 2000, 2002).

596 ALLARD JONGMAN AND ANNIE TREMBLAY

For listeners whose L1 does not have lexical stress, the model further predicts gradience in the degree of stress deafness as a function of whether the L1 prosodic system requires listeners to tease apart content words from function words. For example, in Hungarian, the first syllable of the first content word in a phrase is ‘stressed’ (Vago 1980), and in Polish the penultimate syllable of every content word is stressed (Comrie 1967). A lower degree of stress deafness is predicted for speakers of these languages than for speakers of languages where prosodic generalizations can be made independently of the lexical status of words (e.g. French, Finnish) (for such results, see Peperkamp and Dupoux 2002; Peperkamp et al. 2010) (for electrophysiological evidence that Polish listeners show different patterns of responses to different types of stress violations, see Domahs et al. 2012). ‘Stress deafness’ has been reported not only for naive French listeners but also for French-speaking L2 learners of Spanish. Dupoux et al. (2008) showed that French-speaking L2 learners of Spanish performed similarly to naive French listeners and significantly worse than native Spanish speakers when recalling the stress pattern of nonwords, and they performed significantly worse than native Spanish speakers when judging the lexical status of nonwords that differed from Spanish words only in their stress patterns, irrespective of their proficiency in Spanish. Even simultaneous French–Spanish bilinguals whose dominant language was French appeared ‘deaf to stress’ (Dupoux et al. 2010). On the basis of these findings, Dupoux et al. (2010) proposed that when the two languages learned from birth conflict in whether they have lexical stress, the Stress Parameter is set to encode stress only if the language with lexically contrastive stress is the dominant language. The phonological approach to the study of stress in L2 word perception/recognition has received support from additional language pairings, including L1 French–L2 English (Tremblay 2008, 2009) and L1 Korean–L2 English and L1 Chinese–L2 English (C. Y. Lin et al. 2014). In C. Y. Lin et al. (2014), for example, L2 learners of English whose L1 was Korean, a language where stress is not contrastive at the word level (Jun 2005b), had significantly more difficulty recalling nonwords that differed in stress than did native English listeners and L2 learners of English whose L1 was (Standard) Mandarin, a tonal language where stress is lexically contrastive (Chao 1968; Duanmu 2007). These findings suggest that whether or not listeners can use stress in L2 word perception/recognition is strongly influenced by whether stress is lexically contrastive in the L1 (though not exclusively so; for details, see Peperkamp and Dupoux 2002). The general predictions of the phonological approach are somewhat coarse, however, and the specific predictions of Peperkamp and Dupoux’s (2002) Stress Parameter Model have not been consistently supported (e.g. Rahmani et al. 2015). An approach that instead focuses on the specific cues that distinguish words from one another in the L1 and in the L2, henceforth referred to as the ‘phonetic approach’, may have more power in explaining L2 learners’ use of stress in word perception/recognition. According to this cue-based, phonetic approach, adults’ success at learning lexical stress in the L2 is also influenced by the degree to which the acoustic cues used to realize stress in the L2 (e.g. fundamental frequency (f0), duration, intensity, vowel quality) signal lexical contrasts in the L1 (e.g. Cooper et al.
2002; Zhang and Francis 2010; Ortega-Llebaria et al. 2013; Chrabaszcz et al. 2014; C. Y. Lin et al. 2014; Qin et al. 2017) (for a similar approach to the use of prosodic cues in L2 speech segmentation, see Tremblay et al. 2018). This approach has also received some empirical support. Cooper et al. (2002), for example, showed that Dutch-speaking L2 learners of English were more accurate than native English listeners in a task where they selected the continuation of a stressed or unstressed word fragment they heard. Both English and Dutch have lexically contrastive stress, but unstressed syllables are more reduced in English than in Dutch (e.g. Sluijter and van Heuven 1996a). Since the word fragments used in the experiment all contained full vowels, listeners could not use vowel quality as a cue to lexical stress and thus had to rely on suprasegmental cues such as f0, duration, and intensity to determine whether the fragment was stressed and select the corresponding continuation for that fragment. Hence, the more accurate performance of the Dutch listeners (as compared to that of the English listeners) was attributed to their greater sensitivity to the suprasegmental cues to stress (see also van Heuven and de Jonge 2011). Cooper et al.’s (2002) results thus provide some support for a phonetic approach to the study of L2 learners’ use of stress in word perception/recognition. C. Y. Lin et al. (2014), discussed earlier, also obtained results that can be interpreted within a phonetic approach. In a lexical decision task, native English listeners, but not Korean- or Chinese-speaking L2 learners of English, were more likely to reject English nonwords that were incorrectly stressed if the incorrect stress placement affected the quality of the vowels in the word. The Korean listeners’ results were attributed to the absence of vowel reduction in their L1. 
Although (Standard) Mandarin does have vowel reduction, reduced vowels cannot occur in word-initial syllables in Mandarin. The Chinese listeners’ results

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

WORD PROSODY IN SECOND LANGUAGE ACQUISITION 597 were attributed to the fact that many of the stimuli in C. Y. Lin et al.’s (2014) lexical decision task had a vowel quality change in the first syllable. Stronger evidence for a phonetic approach to the use of stress in L2 word perception/recognition was provided by Zhang and Francis (2010). Using a word identification task with auditory stimuli in which the segmental and suprasegmental cues to lexical stress were orthogonally manipulated, the authors showed that although both native English listeners and (Mandarin) Chinese L2 learners of English relied more on vowel quality than on f0, duration, or intensity when recognizing English words that differed in stress, Chinese listeners’ relative reliance on f0 was greater than that of English listeners. These results were attributed to the fact that Chinese has lexical tones and f0 is the primary cue to these tones (Howie 1976; Gandour 1983). Additional support for a phonetic approach was provided by Ortega-Llebaria et al. (2013). Using a task in which listeners identified the stress pattern of a word in prenuclear position in a declarative sentence, the authors showed that English-speaking L2 learners of Spanish perceived syllables with an f0 rise as being stressed, unlike native Spanish listeners, who perceived the f0 rise as a post-tonic cue to stress on the preceding syllable. Stressed syllables in prenuclear position in Spanish are associated with an f0 rise post-tonically (Prieto et al. 1995; Hualde 2005). L2 learners thus need to associate this f0 rise with the (stressed) syllable preceding it, something that Ortega-Llebaria et al.’s (2013) L2 learners did not appear to do. The L2 learners also made greater use of duration cues than did the native listeners. The duration ratio of stressed to unstressed syllables is larger in English than in Spanish (Delattre 1966), due in part to the occurrence of vowel reduction in English (Beckman and Edwards 1994) but not in Spanish. 
English listeners thus appeared to transfer the use of duration cues to the perception of stress in full vowels in Spanish. These results indicate that even when both the L1 and the L2 have lexically contrastive stress, L2 listeners must learn the acoustic cues to stress in the L2 in order to perceive stress accurately. Chrabaszcz et al. (2014) provided further evidence that L2 learners’ perception of stress is contingent on the cues that signal stress in the L1. In a stress perception task with nonwords, native English listeners and L2 learners of English who spoke Mandarin or Russian as their L1 were found to differ in their reliance on suprasegmental cues to stress: whereas both English and Mandarin listeners weighted f0 cues more heavily than duration and intensity cues, Russian listeners showed the opposite pattern of results. These results were attributed to the participants’ L1, with f0 not being a reliable cue to stress in Russian, unlike in English and Mandarin. Evidence in support of a cue-based approach was also provided by Rahmani et al. (2015). The authors reported that Dutch and Japanese listeners outperformed Persian, Indonesian, and French listeners when recalling sequences of nonwords that differed in stress. Japanese does not have lexical stress, but it has lexical pitch accents, with words differing in their tonal (i.e. pitch) patterns; in contrast, Persian does not have lexical stress or lexical pitch accents, and neither does Indonesian (for more details on the prosodic systems of each of these languages, see Rahmani et al. 2015). The authors interpreted their results as suggesting that listeners can encode stress in sequence recall tasks only if the L1 encodes prosodic markings at a lexical level. Although the authors’ explanation is more phonological in nature, it yields the same predictions as a phonetic, cue-based approach, with Japanese listeners’ use of pitch to differentiate L1 words enabling them to perceive and process L2 stress.

Qin et al. (2017) similarly showed that L2 learners’ processing of lexical stress is modulated by which prosodic cues signal lexical contrasts in the L1. The authors investigated whether speakers of Standard Mandarin, a dialect with lexical stress (Chao 1968; Duanmu 2007), and speakers of Taiwan Mandarin, a dialect without lexical stress (Kubler 1985; Swihart 2003), differed in their ability to use f0 and duration cues when perceiving stress in English nonwords. In both varieties of Mandarin, f0 is the primary cue to lexical tones (Howie 1976; Gandour 1983), and in Standard Mandarin, duration is the primary cue to stress (T. Lin 1985). The results of a sequence recall task showed that, as predicted, L2 learners of English whose L1 was Standard Mandarin made greater use of duration cues when perceiving English stress than L2 learners of English whose L1 was Taiwan Mandarin, and both L2 groups made less use of these cues than native English listeners. Crucially, when stress was realized with conflicting f0 and duration cues, both L2 groups relied more on f0 than on duration when perceiving English stress, whereas native English listeners relied equally on both types of cue. The Mandarin listeners’ greater reliance on f0 than on duration was interpreted as reflecting a transfer of f0 use from the perception of lexical tones in the L1 to the perception of stress in the L2. These findings thus provide further support for a cue-based, phonetic approach to the perception of L2 stress. All in all, the existing research on the use of stress in L2 word perception/recognition suggests that listeners’ success at perceiving stress in the L2 is predicted by both whether stress is lexically contrastive in the L1 and which prosodic cues signal lexical contrasts in the L1. 
Further research that focuses on the transfer of specific acoustic cues from the L1 to the L2 is needed in order to refine the predictions of the phonetic approach.

43.2.2 Second language word production

The influence of the L1 on adults’ production of stress in the L2 has largely been studied from a phonological perspective. This research has typically focused on whether L2 learners from various L1 backgrounds stress the correct syllable in L2 words (e.g. Mairs 1989; Archibald 1992, 1993; Pater 1997b; Tremblay and Owens 2010). The general prediction from this approach is that L2 learners will be more successful at producing the correct lexical stress pattern if the generalizations underlying stress placement in the L1 (if any) are similar to those underlying stress placement in the L2. Archibald (1992, 1993) analysed the stress systems of participants’ L1 and L2 using the metrical parameters proposed by Dresher and Kaye (1990), and made predictions for the production of stress in L2 words based on the L1 parameter settings. In a read-aloud task, Archibald (1992) found that when Polish-speaking L2 learners of English incorrectly stressed English words, they tended to stress the penultimate syllable. As mentioned earlier, Polish does not have lexically contrastive stress; in Polish, words are consistently stressed on the penultimate syllable. L2 learners’ incorrect stress placement in English was attributed to Polish stress not being related to syllable weight (unlike in English, where syllables analysed as heavy should be stressed; Dresher and Kaye 1990) and to Polish words not ending in an extrametrical syllable (unlike in English, where the last syllable of nouns was analysed as extrametrical and, thus, invisible to stress; Dresher and Kaye 1990).

Using a similar task, Archibald (1993) showed that when Spanish-speaking L2 learners of English made stress placement errors in English words, they tended to stress either the penultimate syllable or the final syllable if the latter contained a diphthong or one or more coda consonants. Since the majority of the incorrectly stressed words ended with a derivational suffix, the author attributed these results to derivational affixes not being extrametrical in Spanish, unlike in English (for similar results, see Mairs 1989). Thus, in Archibald (1992, 1993), L2 learners’ stress errors were attributed to the different generalizations underlying stress placement in the L1 and the L2. Pater (1997b) adopted a similar approach with French Canadian L2 learners of English but instead elicited English nonwords. The nonwords were elicited in the subject position of a carrier sentence, and thus were interpreted as nouns. Pater (1997b) found that these L2 learners typically stressed the first syllable of trisyllabic nouns and, unlike native English speakers, showed little sensitivity to syllable structure in their production of lexical stress.2 Importantly, the L2 learners rarely stressed the last syllable of the trisyllabic nonwords, which is the pattern of responses that Pater (1997b) predicted based on his analysis of Canadian French using Dresher and Kaye’s (1990) parameters (with French being analysed as having an iambic, quantity-insensitive foot), suggesting a lack of L1 transfer. Tremblay and Owens (2010), who used a similar task, also reported a tendency for French Canadian L2 learners of English to stress the initial syllable of disyllabic and trisyllabic nonce nouns, independently of syllable structure. The results of Pater (1997b) and Tremblay and Owens (2010), unlike those of Archibald (1992, 1993), suggest that L2 learners do not necessarily show clear evidence of L1 influence in their production of L2 lexical stress. 
Tremblay and Owens (2010) attributed the L2 learners’ production of initial stress to the statistical frequency with which nouns are stressed on the initial syllable in English (e.g. Cutler and Carter 1987; Clopper 2002), leading them to overgeneralize this pattern to contexts where stress should not be word-initial (e.g. in trisyllabic nouns that contain a heavy penultimate syllable). L2 learners’ production of lexical stress has also been investigated from a statistical perspective. This approach has focused on whether L2 learners can learn the statistical regularities of stress patterns in the L2, independently of the L1 (e.g. Davis and Kelly 1997; Guion et al. 2004; Guion 2005). The researchers who have conducted studies under this approach have expressed concerns about the psychological reality of the stress rules proposed to explain stress placement, at least in English (for discussion, see Guion et al. 2004). Guion et al. (2004) examined the production of English nonwords by Spanish speakers who had acquired English at an early age (mean of 3.7 years) or at a later age (mean of 21.5 years).3 Statistically, in English, disyllabic words are more likely to be stressed initially if they are nouns than if they are verbs (Sereno 1986; Kelly and Bock 1988), and syllables with diphthongs are more likely to be stressed than syllables with lax vowels (Guion et al. 2003). Guion et al. (2004) showed that the early L2 learners (and native English speakers) were more likely to stress the first syllable of disyllabic nonce words when they were elicited as nouns than when they were elicited as verbs. Furthermore, the early L2 learners (and native speakers) were more likely to stress syllables that contained a diphthong than syllables that contained a lax vowel, and to stress syllables that contained a complex coda than syllables that contained a simple coda. The late L2 learners, on the other hand, showed an effect of lexical class only for words that contained a lax vowel in the first syllable and a simple coda in the second syllable, and an effect of vowel only for words that had a diphthong in the second syllable. They were also more likely to stress the first syllable of all words compared to the early L2 learners and native speakers, suggesting a similar overgeneralization of the word-initial stress pattern as observed in Pater (1997b) and Tremblay and Owens (2010). The authors interpreted their results as suggesting that the L2 learners’ age of acquisition had an important effect on their ability to extract the statistical regularities of stress patterns from the input, especially regularities that relate to syllable structure. In an attempt to determine whether Korean speakers could learn the statistical regularities of stress patterns in English, Guion (2005) conducted a replication of Guion et al.’s (2004) study but with early and late Korean L2 learners of English. Her production results were similar to those of Guion et al. (2004), with age of acquisition having an effect on L2 learners’ ability to extract the stress regularities that relate to lexical class and syllable structure from the input. However, because the Spanish and Korean groups were tested in separate studies rather than compared directly, it is unclear whether the L1 significantly affected L2 learners’ production of stress. In summary, the existing research on the production of lexical stress in the L2 suggests that L2 learners can but do not necessarily transfer the generalizations underlying stress placement from the L1 to the L2. Furthermore, L2 learners appear to be able to learn the statistical regularities that relate to stress placement in the L2, though not at a native-like level, with early L2 learners outperforming late L2 learners.

2 Pater (1997b) also examined L2 learners’ production of secondary stress in quadrisyllabic words. We do not discuss the results for these words due to space limitations.
3 Guion et al. (2004) also elicited judgements of stress patterns in the same nonwords. We do not report these results due to space limitations.

43.3 Lexical tone

L1 effects have been found not only on the perception/recognition and production of lexical stress but also on the perception/recognition and production of lexical tones. It is by now well established that there is a difference in the way native speakers of a tone language perceive tonal distinctions as compared to native speakers of a non-tonal language. Most generally, speakers of a tone language can discriminate tones more accurately and quickly than speakers of a non-tonal language (see Lee et al. 1996; Wayland and Guion 2003; Bent et al. 2006). Furthermore, speakers of a tone language and speakers of a non-tonal language are differentially sensitive to individual pitch cues. In a tone language, tone (or its acoustic correlate, f0, or its perceptual correlate, pitch) serves to distinguish word meaning. In a non-tonal language, on the other hand, while pitch may provide grammatical or intonational information, it does not distinguish word meaning. Speakers of a tone language are more sensitive to changes in pitch direction and slope, which are crucial to tonal identification and thus linguistically relevant, whereas speakers of a non-tonal language attend more to general phonetic properties such as pitch height (average pitch) and duration, which are arguably linguistically less relevant in the languages under investigation (Gandour and Harshman 1978; Gandour 1983; Chandrasekaran et al. 2007b; Jongman et al. 2017).

Additionally, studies of categorical perception of tones show that tonal distinctions that involve pitch movement are perceived in a categorical manner by native speakers of tone languages (e.g. Hallé et al. 2004; Xu et al. 2006a; Peng et al. 2010). Listeners of non-tone languages, on the other hand, are much more sensitive to within-category differences as compared to native speakers of tone languages. This less categorical nature of tone perception has been reported for listeners from a variety of non-tonal-language backgrounds, including Dutch (Leather 1987), English (Xu et al. 2006a), French (Hallé et al. 2004; see also DiCanio 2012b for French participants listening to Trique), and German (Peng et al. 2010). It has been suggested that speakers of non-tonal languages process the stimuli in an acoustic or psychophysical mode, such that they are sensitive to small acoustic differences as long as they exceed the difference limen (e.g. Hallé et al. 2004); in contrast, speakers of tone languages process the stimuli in a linguistic mode, ignoring small acoustic differences between members of the same category so that they can assign them to one of two tonal categories. The above-reviewed differences in tonal perception between speakers of a tone language and speakers of a non-tone language entail challenges in tonal acquisition in an L2. In what follows, we will discuss a number of these challenges as well as the extent to which training can mitigate them.

43.3.1 Second language perception/recognition of lexical tone

L2 tonal acquisition has been shown to be affected by several factors, including training, familiarity with lexical tone, language proficiency, and tonal transfer. To begin with, research assessing the effect of training on learner performance has employed the well-established high-variability phonetic training paradigm, which aims to help learners establish L2 phonetic categories by exposing them to a wide variety of exemplars of each category (Logan et al. 1991). In one of the first tone training studies, Y. Wang et al. (1999) demonstrated that a brief training regimen (eight sessions over the course of two weeks) significantly improved English learners’ identification of Mandarin tone. During training, learners (students in their first semester of Mandarin instruction) were exposed to a variety of talkers and phonetic contexts; tones were trained pairwise and learners received feedback. The results showed a substantial gain (21%) in tone identification accuracy after training, as compared to no gain in a control group that received no training. All four tones were identified more accurately after training. In addition, the training benefit extended to both words and speakers not encountered during training, and was still present six months after training. Training has also been extended to native speakers of a tone language learning a foreign tone system. While training in the laboratory clearly improves learners’ perception of tone, it is important to establish whether this improvement reflects more native-like processing. Francis et al. (2008) provided Mandarin Chinese (tonal) and English (non-tonal) participants with 10 hours of training on Cantonese tones. Overall, training improved the performance of both groups to the same extent. 
Multi-dimensional scaling analysis showed that the two primary dimensions, pitch height and pitch slope, accounted for a greater proportion of the variance after training than before training. That is, both groups became more Cantonese-like in their weighting of these two cues. However, language-specific differences remained, with non-tonal English participants giving more weight to pitch height than to pitch direction, while Mandarin participants assigned about equal weight to both cues. Wayland and Guion (2003) addressed the role of language proficiency by comparing the discrimination of the mid and low tones in Thai by native speakers and by English speakers with and without Thai language experience. The experienced speakers had studied Thai for 2.5 years and had lived in Thailand for four years on average. The native speakers of Thai performed best, the English speakers without Thai experience performed worst, and the L2 learners of Thai fell in between. Similarly, Guion and Pederson (2007) showed that, when discriminating tone, advanced American learners of Mandarin were more similar to native Mandarin speakers than Americans without any experience with a tone language: while the latter group only used average pitch in their discrimination of synthesized stylistic tones, both the advanced learners and the native speakers of Mandarin used pitch slope in addition to average pitch. Qin and Jongman (2016) directly evaluated the role of tone transfer by comparing discrimination of Cantonese level and contour tones by native speakers of Mandarin, native speakers of English, and English learners of Mandarin. They found that both the native Mandarin speakers and the English learners of Mandarin were better at discriminating the contour–level tone pairs than the level–level tone pairs. This suggests that experience with Mandarin increased L2 learners’ sensitivity to f0 direction in the perception of Cantonese tones. 
Moreover, the L2 learners, as well as the monolingual English speakers, were better than the native Mandarin speakers at discriminating the level–level tone pairs, suggesting that the English L1 experience still influenced how L2 learners of Mandarin perceived Cantonese tones. Overall, previous experience with a tone system, whether as an L1 or an L2, transfers to the perception of tones in a different tone system. In addition, tone training improves tone categorization even for participants with little or no exposure to a tone language. However, differences in the weighting of cues to tone may remain between learners and native speakers of a tone language, and between learners from different non-tonal L1 backgrounds (e.g. Braun et al. 2014). Brain imaging studies can provide an additional way of assessing the extent to which the improvement observed after training reflects more native-like processing (e.g. Kaan et al. 2008).

43.3.2 Second language production of lexical tone

The production of tone has received much less attention than the perception of tone. While much research remains to be done in this area, there are several studies that report on the difficulties that L2 learners of a tone language encounter. For example, Shen (1989b) found that English learners of Mandarin had difficulty with the production of all tones but especially with Tone 4. Tone 4 errors may be ascribed to the fact that this falling tone is less prosodically marked for speakers of English, who may therefore use it more frequently to substitute for other tones. Miracle (1989) conducted an acoustic analysis of productions made by second-year American learners of Mandarin. She used the pitch contours of the learners’ productions, as well as those of three native speakers, to assign the productions to one of the four tonal categories. Miracle (1989) reported an error rate of 43%. These errors were evenly divided between register (too low or high) and contour errors and across all four tones. While Y. Wang et al. (2003a) observed an overall error rate similar to that of Miracle (1989), the distribution of errors was quite different, with accuracy scores of around 67% for Tones 1, 2, and 4 but only around 20% for Tone 3. It should be noted that Tone 3 is also acquired last by native speakers of Mandarin (e.g. C. N. Li and Thompson 1977). Y. Wang et al. (2003a) also directly examined the tone production of the participants in their perceptual training study (Y. Wang et al. 1999). The training group read a wordlist before and after training, while the control group read this wordlist twice, two weeks apart. Pre- and post-training productions were evaluated in two ways: by native Mandarin listeners in an identification task and by acoustic measurements. Results from the perceptual evaluation indicated that a significantly greater number of tones produced after training were perceived as intended as compared to those produced before training. The acoustic analysis consisted of a detailed comparison of a number of parameters, including onset, offset, maximum, and minimum pitch values, between native Mandarin and learner productions. The acoustic results were consistent with the native-speaker judgements, as they showed that the pitch contours of the learners’ productions were closer to those of the native speakers after training than before training. This finding is particularly interesting given that the participants in this study were trained only on the perception of tone and did not receive any production training. Nevertheless, the benefits from the perception training seemed to carry over to production.

43.4 Conclusions and future directions

The above research shows important L1 effects on how adult listeners perceive and process lexical stress and lexical tone in an L2. For lexical stress, listeners’ success at encoding stress in the L2 is predicted by both whether stress is lexically contrastive in the L1 and which prosodic cues signal lexical identity in the L1. For L2 learners whose L1 does not have lexically contrastive stress, proficiency in the L2 does not seem to be a strong predictor of listeners’ ability to perceive and use lexical stress in word recognition. A better predictor is instead whether a prosodic cue signals lexical identity in the L1, with L2 learners relying on this cue to perceive and use stress in L2 word recognition even if the L1 does not have lexically contrastive stress. For lexical tone, speakers of tone languages and speakers of non-tonal languages differ in the weighting of tonal features, with speakers of tone languages being more sensitive to changes in pitch direction while speakers of non-tonal languages attend more to pitch height. Native knowledge of a tone language or acquired proficiency in a tone language can help the processing of tone in an L2. However, the best predictor of performance in a non-native tone language may be the extent to which the acoustic features that characterize the tones in the L1 correspond to those used in the tones of the L2. Although the research cited in this chapter also reveals L1 effects on the production of lexical stress and lexical tones, L2 learners do not necessarily show L1 effects in their production of lexical stress and may instead be sensitive to the statistical regularities that relate to stress placement in the L2.

Laboratory training studies using the high-variability training paradigm have been shown to substantially improve learners’ perception and production of tone. Such research should be extended to the perception and production of lexical stress by L2 learners whose L1 does not have lexically contrastive stress (for an example of a study with English L2 learners of Spanish, see Romanelli et al. 2015). A key challenge is to establish ways to effect a shift in cue weighting to attain more native-like performance (e.g. Lim and Holt 2011). For lexical tones, while much of the training research has focused on monosyllabic words, future research should expand to polysyllabic words, in which tonal coarticulation creates a much greater degree of acoustic variability. For example, it has been shown that it is more difficult to learn tones in disyllabic than in monosyllabic words and that the learnability of a given tone is correlated with its contextual tonal variability (Chang and Bowles 2015). Moreover, preliminary results suggest that training on monosyllabic words does not result in improved tone perception in disyllabic words, while training on disyllabic words does transfer to monosyllabic words (Y. Li 2016). Preliminary findings also show that tone training at phrasal and sentential levels results in improvements in tone perception in these larger linguistic contexts (X. Wang 2012). The roles of individual differences and aptitude need to be investigated in more detail. Research indicates that individual differences in cue weighting may be able to predict which participants benefit most from training (Chandrasekaran et al. 2010), and perceptual aptitude may determine which type of training is preferable, with high-variability training benefiting high-aptitude perceivers and low-variability training benefiting low-aptitude perceivers (e.g. Perrachione et al. 2011; Sadakata and McQueen 2014). 
Finally, there are additional promising areas that could not be included in this review because of space constraints. One such area concerns L1–L2 interactions in tone perception in bilinguals. For example, experience with a tonal language has been shown to affect the processing of English intonation by Chinese learners of English (Ortega-Llebaria et al. 2015). Another area requiring more research is the extent to which the communicative goal (e.g. focusing on tone or intonation) makes listeners shift their cue weighting (X. Li et al. 2008).

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

chapter 44

Sentence Prosody in a Second Language

Jürgen Trouvain and Bettina Braun

44.1 Introduction

This chapter aims to give an overview of the state of the art in research on sentence prosody (i.e. the prosodic properties beyond the word level) in a second language (L2) from both the production and the perception perspective (see chapter 43 for L2 word prosody). By ‘L2’ we refer to any second or non-native language that is acquired after childhood, and ‘L1’ refers to the first or native language(s). We use the term ‘native speaker’ to refer to speakers of the learners’ target languages. L2 sentence prosody is still a relatively underexplored field of L2 acquisition, as evidenced by the lack of discussion of this topic in various handbooks on L2 acquisition (e.g. Doughty and Long 2005; Ritchie and Bhatia 2009; Gass and Mackey 2012; Herschensohn and Young-Scholten 2013). In recent years, research on L2 sentence prosody has been boosted by the collection, annotation, and provision of phonetic learner corpora (Trouvain et al. 2017). Examples with scripted and unscripted speech include the LeaP corpus (Gut 2012), the COREIL corpus (Delais-Roussarie and Yoo 2011), and the AixOx corpus (Herment et al. 2012). However, the annotation of learner data raises a problem that is inherent in L2 research and in prosodic annotation in general: the choice of appropriate measures or categories. Annotation can be done at the purely acoustic level, with category labels of the L1, with category labels of the L2, with new inter-language categories, or with ‘error’ categories, as discussed by Ramírez Verdugo (2005), Bartkova et al. (2012), and Albin (2015). The choice of coding has consequences for reliability, validity, and the comparability of results across studies. Current research is typically concerned with topics relating to intonation (e.g. types of pitch accent and their location and function, types of boundary tone and their functions, and prosodic phrasing) or timing (e.g. 
rhythm, tempo, pauses, fluency); research combining L2 melodic and timing aspects is still sparse, as noted by Mennen and de Leeuw (2014). In what follows, we will review research on intonation in §44.2 and research on timing in §44.3, discuss work on the perception of L2 sentence prosody in §44.4, and conclude with a discussion of the challenges facing this field (§44.5). As in other areas of L2 acquisition, parts of L2 prosody become more native-like with earlier age of acquisition (see e.g. Huang and

606 Jürgen TROUVAIN AND BETTINA BRAUN Jun 2011 for different prosodic features across groups), higher proficiency and fluency in the L2 (e.g. Gut 2009a; Swerts and Zerbian 2010), and for many though not all individuals over time (e.g. Wieden 1993). Although we cannot discuss all of these extra-linguistic aspects in detail, we will mention them where appropriate in our review.

44.2 Intonational aspects of second language sentence prosody

In this section we review two linguistic functions of prosody, the prosodic marking of information structure (§44.2.1) and questions (§44.2.2), and then turn to the acquisition of phrasing (§44.2.3) as well as the phonetic implementation of prosodic events (§44.2.4). We end this section with an overview of the prosodic marking of non-linguistic attributes (§44.2.5).

44.2.1 Prosodic marking of information structure

Information structure can have several partitions: focus/background, topic/comment, or given/new (Krifka 2008; see also chapter 31). In brief, focus signals the presence of alternatives, topic links an utterance to the prior discourse, and the given/new partition distinguishes information that has been mentioned before from information that has not. An additional aspect of the various partitions is contrastive information, that is, information that contrasts with or corrects prior information or assumptions. Information structure is signalled via various linguistic devices: lexical markers, syntactic operations, and prosody (e.g. Vallduví and Engdahl 1996; Burdin et al. 2015). Languages differ not only in the relative importance of these devices but also in their use of prosody. For example, languages can differ in the exact use of prosodic prominence for focus marking even if they are similar in marking phrase-level prosodic prominence. Languages can also differ in whether attenuation of prosodic prominence is used to mark givenness and post-focal information. Languages can be grouped according to how they mark phrase-level prosodic prominence: a language can mark the head of the phrase (e.g. English, using different pitch accents), the edge of the phrase (e.g. Korean, using tonal marking at prosodic edges), or both (e.g. French, Japanese; cf. Jun 2014b). In head-prominence languages (e.g. English, German, Dutch) a pitch accent is assigned to the word in focus (see (1) for a pitch accent on everybody). In broad focus (i.e. in a response to a question like ‘What’s new?’), the pitch accent typically falls onto a so-called focus exponent that is defined syntactically (e.g. prosody in the example sentence in (1); cf. Ladd 2008b). Acoustically, the result is a salient f0 movement on the focus exponent (i.e. the word that carries the focus), higher intensity, and longer duration, but also a higher probability of pauses following the focused constituent (e.g. Arnhold 2016). (Here ‘f0’ stands for the frequency of vocal fold vibration, whose perceptual correlate is pitch.) In some lexical tone languages that are also head-prominence languages (e.g. Mandarin Chinese), the focused constituent is realized by an increase in the f0 range of the lexical tone, followed by compression of the f0 range in the post-focus constituent (e.g. Xu 1999), but other tone languages show pre-focus pitch raising and pitch compression from the focused word onwards (e.g. Bemba; cf. Kula and Hamann 2017). Still other tone languages show no prosodic focus marking at all (Downing and Rialland 2017a: 7). In contrast, in other head-prominence languages, such as Italian, Catalan, and Spanish, information structure is mainly marked via syntactic operations, such as dislocations or syntactic movement (e.g. Vallduví and Engdahl 1996; Jun 2005a; Büring 2009; Dehé et al. 2011). These languages often do not deaccent given information, like prosody in the answer of example (1), and produce an additional accent on prosody. In edge-prominence languages (e.g. Korean), on the other hand, a focused constituent is marked by the initiation of a new prosodic phrase, into which the post-focus constituent is integrated.

SENTENCE PROSODY IN A SECOND LANGUAGE 607

(1) Question: Who is interested in prosody?

Answer: Everybody is interested in prosody.
(Stressed syllables are underlined; the pitch accent is displayed in the schematic f0 contour.)

Learners of the head-prominence languages outlined above are faced with the questions of where to locate pitch accents and where to implement phrase breaks (let alone syntactic operations, which are not dealt with here; cf. Hertel 2003; Zubizarreta and Nava 2011). In contrast, learners of edge-prominence languages face the challenge of phrasing their utterances accordingly. Learners of both types of languages need to acquire the phonetic realization of accents and phrase breaks (see §44.2.4). The acquisition of the prosodic realization of focus in head-prominence languages has been studied in great detail, mostly with English as the target language. Learners whose L1 is a tone language or a head-prominence language that marks focus syntactically appear to encounter more difficulty than learners whose L1 resembles English in its prosodic system and in the use of accent placement for focus marking. For example, Ramírez Verdugo (2006) reported that Spanish learners of English overgeneralized broad-focus realizations to contrastive focus contexts; that is, they did not produce a rising-falling accent on everybody but an accent on prosody in (1). In utterances in which the accent location was correct, learners often differed in accent type from native speakers: they produced more rising accents, while native speakers of English produced more falling accents. In another study, Spanish learners of English were reported to insert pauses after the focused constituent, a strategy that was not used by native English speakers (Ortega-Llebaria and Colantoni 2014). Similarly, learners whose L1 is a tone language have difficulties with accent location (Baker 2010 for Mandarin speakers of English; Swerts and Zerbian 2010 for Zulu speakers of English).
In a study by O’Brien and Gut (2010), German learners of English showed the same accent placement as native speakers of English in a variety of focus conditions but differed from native speakers of English in (i) accent type (by using rising accents more frequently) and (ii) phonetic implementation. Learners from edge-prominence languages (Korean, Japanese) were more accurate than Mandarin learners of English in placing the pitch accent in different focus conditions (Baker 2010), but other studies report incorrect accent placement by Japanese learners of English (Saito 2006).

There is, however, limited research on the acquisition of focus in edge- and head/edge-prominence languages. Current findings suggest that learners from an edge-prominence language can learn the target pattern if the target language places accents on structural rather than information-structural principles. For example, in French, accent distribution is largely independent of whether information is new, given, or contrastive (Krahmer and Swerts 2001), and advanced Dutch learners of French were found to largely produce the French pattern (Rasier and Hiligsmann 2007). For the head/edge-prominence language Japanese, it was found that Swedish learners of Japanese put too much emphasis (f0 scaling) on the topic constituent and too little emphasis on the focus (Nagano-Madsen 2014). In target languages that mark information structure syntactically, the main challenge lies in the acquisition of the correct word order (e.g. Hertel 2003) and, for learners from certain head-prominence languages, in the suppression of prosodic highlighting. Turco et al. (2015) showed that Dutch and German learners of Italian exaggerated affirmative polarity contrast in their L2 productions, either by lexical markers (Dutch) or by a rising-falling pitch accent (German), markings that are not present in L1 Italian. Likewise, English learners of Spanish were shown to use higher intensity and pitch to mark focus and contrast, while native speakers of Spanish used syntactic and lexical markers (Kelm 1987). Finally, we turn to the deaccentuation of given information. English, German, and Dutch are typical deaccenting languages, while Italian and Spanish are not (Brown 1983; Ladd 2008b). Learners from languages without deaccentuation of given information often fail to deaccent given referents in an L2 deaccenting language (Gut and Pillai 2014 for Malaysian Malay speakers of English; Nguyễn et al. 2008 for Vietnamese learners of English; Swerts and Zerbian 2010 for Zulu speakers of English; Ueyama and Jun 1998 for Korean and Japanese learners of English). In contrast, learners with native languages that deaccent given information overuse deaccentuation in an L2 that does not deaccent (e.g. Rasier and Hiligsmann 2007 on Dutch learners of French). Another aspect related to the deaccenting of given information is post-focus compression (PFC), a mechanism whereby constituents following the focused one exhibit shorter durations, a more compressed f0 range, and lower intensity (Eady et al. 1986; Hindi: Patil 2008; Finnish: Vainio and Järvikivi 2007; Mandarin: Xu 1999). Production data show that Taiwanese learners of Mandarin Chinese (i.e. speakers whose L1, Southern Min, does not have PFC but whose target language does; cf. Xu et al. 2012) do not consistently produce PFC in their L2 Mandarin. Rather, the acquisition of PFC seems to be guided by L2 use (Chen et al. 2014). In sum, much research has concentrated on the prosodic marking of focus, given versus new, and contrastive information. Studies suggest a strong influence of the L1 and provide some evidence for successful learning in certain L1–L2 pairings. However, there is comparatively little work on the acquisition of topic marking in an L2. There is also a need for research on a wider range of L1–L2 pairings to better understand the underlying mechanisms.

44.2.2 Prosodic marking of questions

Question forms include polar (yes/no) questions, constituent (wh-) questions, alternative questions, and tag questions. Depending on the language, neutral polar questions are marked syntactically (e.g. German, English), by particles (e.g. Urdu, Japanese), or purely prosodically (e.g. Italian, Basque). In many languages, one can use declarative syntax to ask a question (2a), even though the pragmatic effect may differ from that of a syntactically marked question (2b). Constituent questions are lexically marked by a question word, alternative questions by alternatives, and tag questions by tags (Bartels 1999). In many languages, the intonation of questions is quite variable (Hedberg et al. 2004, 2010; Kohler 2004; Braun et al. 2018), which makes it hard to establish what the native (intonational) grammar is.

(2)

a. You study prosody?

b. Do you study prosody?

Past work on the L2 prosodic marking of questions shows that learners differ from native speakers in (a) prosodic realization and (b) the distribution of realizations. With respect to English polar questions, which often end in a high rise (Quirk et al. 1985), Greek learners of English transferred the typical polar question contour of their L1, a rise-fall (Arvaniti et al. 2006b), to English and also placed the nuclear accent on the verb (instead of on the argument of the verb, as native speakers of English do) (Kainada and Lengeris 2015). The use of a falling contour in English polar questions was also reported for Thai and Spanish learners of English (Wennerstrom 1994). McGory (1997) compared declaratives and polar questions spoken by beginning and advanced Mandarin and Korean learners of English to those produced by native speakers of English. Beginners produced the final high boundary tone that is often used in English polar questions, but instead of a nuclear accent with a low tone they produced a high-falling accent, which typically occurs in English declarative sentences. In English, question tags may be falling or rising (Dehé and Braun 2013), the pattern being influenced by the polarity of the tag (positive or negative) and the position of the tag in the speaker’s turn. Spanish learners of English were found to use a rising tag irrespective of the polarity and position of the tag (Ramírez Verdugo and Romero Trillo 2005). Mexican Spanish learners of French generally had a higher proportion of high rising boundary tones than French native speakers, irrespective of sentence type (Santiago Vargas and Delais-Roussarie 2012). To sum up, many factors influence the intonational realization of questions in different languages, so that the learner has to make sense of a pattern that is obscured by considerable variability.
This may make it hard for learners to approximate the distributions of target intonational patterns and to identify the factors that affect the intonation pattern in the non-native language.

44.2.3 Prosodic phrasing

Depending on the language, the prosodic hierarchy distinguishes between prosodic phrases at different levels: accentual phrases (e.g. Japanese, Korean, French), intermediate phrases (e.g. English, German; often termed ‘minor phrases’), and intonational phrases (‘major phrases’) (e.g. Nespor and Vogel 2007). Learners have been shown to differ from native speakers in the number of phrase breaks they make and in the tonal marking of these breaks. For example, in non-final position (e.g. after the subject NP, such as protection in (3)), French learners of English produced more phrase breaks than native speakers of English and ended phrases mostly with rising contours (Herment et al. 2014), while native speakers of English marked these minor phrases with f0 falls (Horgues 2013). On the other hand, learners of French whose native languages lack accentual phrases (e.g. Mexican Spanish) produced fewer phrases than native speakers of French (Santiago Vargas and Delais-Roussarie 2012).

(3) The idea of a good protection / is to guarantee that your computer doesn’t get infected by a virus /
(Slashes indicate phrase breaks; example from Horgues 2013.)

In Korean, phrasing may distinguish between a polar question reading and a constituent question reading. Jun and Oh (2000) tested the acquisition of phrasing in minimal pairs, recording native speakers of Korean and English learners of Korean with varying proficiency. Phrasal grouping improved with increasing proficiency, but learners generally produced more pitch accents, a type of accent that is absent in Korean. In sum, these studies suggest that phrasal marking is likely transferred from the native language, with proficiency as a modulating factor.

44.2.4 Phonetic implementation of pitch accents and boundary tones

Previous research has shown that L2 prosody differs from that of the target language in the alignment of f0 peaks and f0 troughs of rises, in peak scaling, and in global f0 range. With respect to alignment, German learners of English were shown to align accentual tones later than native speakers of English, due to the influence of L1 dialect or regional accent (Atterer and Ladd 2004; Gut 2009b; Ulbrich 2013). A later alignment of high accentual tones is reported for Japanese and Spanish beginning and advanced learners of American English (Northern Virginia/Washington) by Graham and Post (2018). However, learners do not always produce a later tonal alignment: Dutch learners of Greek, for instance, produced rising accents with an earlier alignment than Greek native speakers (Mennen 2004), likely a transfer from their L1. In terms of the scaling of pitch accents, Mandarin L2 speakers of American English produced accented words with higher f0 peaks than native speakers (Chen et al. 2001). Thai, Japanese, and Spanish learners of English increased f0 as much as English native speakers did to mark focal information, but they showed less reduction of f0 on non-focused information than the native speakers (Wennerstrom 1994). Regarding the production of the nuclear tune (the last pitch accent of the phrase plus the following boundary tone), native speakers of German truncate falls if there is little sonorant material (i.e. they stop the f0 movement earlier in names like Shift than in names like Sheafer), while speakers of English compress them (i.e. they realize the full f0 movement in less time) (Grabe 1998a). German learners of English were found to transfer the truncation of falling accents when there was limited sonorant material, whereas English learners of German could correctly truncate the falling nuclear contours in L2 German (Zahner and Yu 2019).

Finally, mixed findings have been reported for the production of f0 range in L2. Some studies observed a narrower f0 range for learners compared to native speakers (e.g. Kainada and Lengeris 2015 for Greek learners of English), whereas other studies found the reverse (Aoyama and Guion 2007 for Japanese learners of American English; Santiago Vargas and Delais-Roussarie 2012 for Mexican Spanish learners of French). A third group of studies found no differences (Wennerstrom 1994 for Thai, Japanese, and Spanish learners of English; Zimmerer et al. 2015 for French learners of German and German learners of French). A number of other factors may be at play, such as speaker idiosyncrasies, the specific L1–L2 pairing, non-linguistic factors (e.g. level of uncertainty), pragmatic context, and the speech task.
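The studies above do not necessarily quantify f0 range in the same way. As one illustration only, f0 range is commonly expressed on a logarithmic semitone scale, 12 × log2(f0max / f0min), which makes ranges comparable across speakers with different overall pitch levels. The minimal Python sketch below assumes a list of f0 values in Hz with unvoiced frames coded as 0; the function name and the sample contours are hypothetical and not taken from any of the cited studies.

```python
import math

def f0_range_semitones(f0_hz):
    """Return the f0 range of a contour in semitones: 12 * log2(max / min).

    Frames with f0 <= 0 (a common coding for unvoiced frames) are ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced f0 values")
    return 12 * math.log2(max(voiced) / min(voiced))

# Hypothetical contours from a lower and a higher voice (Hz; 0 = unvoiced frame).
lower_voice = [100.0, 0, 150.0, 200.0, 120.0]
higher_voice = [200.0, 300.0, 0, 400.0, 240.0]

# The Hz ranges differ (100 Hz vs 200 Hz), but both spans are one octave.
print(f0_range_semitones(lower_voice))   # 12.0
print(f0_range_semitones(higher_voice))  # 12.0
```

Such a normalization removes one source of speaker idiosyncrasy (overall pitch level), but it does not address the other factors listed above, such as task or pragmatic context.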

44.2.5 Prosodic marking of non-linguistic aspects

Besides linguistic meaning, prosody is often used to signal the speaker’s emotion, epistemic belief, and certainty in questions (e.g. Domaneschi et al. 2017), commitment (Truckenbrodt 2012), or attitude towards the proposition (Crystal 1969; O’Connor and Arnold 1973; Ladd 1980; Pierrehumbert and Hirschberg 1990; Wells 2006). Research on the use of prosody in marking non-linguistic meaning in L2 is rare. Existing work suggests that this aspect of L2 prosody also poses challenges to L2 learners, especially in the absence of explicit instruction, possibly due to differences between L1 and L2. For example, Ramírez Verdugo (2005) investigated how Spanish learners of English and native speakers of English realized uncertainty. While native speakers of English used a fall-rise to mark uncertainty and falling contours to mark certainty, the learners mostly used falling contours with a narrow f0 range in both the certainty and the uncertainty condition, which made it hard to perceive the contrast between the two conditions. Furthermore, Chen and de Jong (2015) examined the prosodic realization of sarcasm in advanced Dutch learners of English. They found that learners sounded less sarcastic to native speakers of English than to native speakers of Dutch. However, learners could produce more sarcastic-sounding prosody after brief training (Smorenburg et al. 2015).

44.3 Timing phenomena in second language sentence prosody

This section deals with timing phenomena (i.e. rhythm, tempo, and fluency) in L2 speech, which can have a dramatic impact on the perceived degree of foreign accent as well as on the comprehensibility of L2 speech.

44.3.1 Rhythm

The definition of rhythm is quite problematic (see chapter 11). There is a classic perception-based division of languages into stress-timed, syllable-timed, and mora-timed languages, depending on which of these units defines approximately isochronous intervals (Pike 1945; Ladefoged 1975). Some studies on L2 speech used metrics for the quantification of speech rhythm, such as ‘pairwise variability indices’ and interval measures of vowels and consonants, together with rate-normalized interval measures (e.g. White and Mattys 2007a; Li and Post 2014; Ordin and Polyanskaya 2015). In these studies, some of the rhythm measures reflected differences between L1 and L2 speech for read sentences, albeit not consistently across studies. However, given the repeated lack of evidence for the classic perception-based rhythmic categories, it is important to integrate (i) tempo and prosodic phrasing, (ii) prominence structure, and (iii) segmental reductions in research on L2 rhythm, because these properties typically work in tandem. For example, the faster we speak, the fewer phrases and pitch accents we tend to produce (e.g. Trouvain and Grice 1999), along with more segmental and syllable reduction, as is often observed in spontaneous speech (Barry and Andreeva 2001; Engstrand and Krull 2001). Barry (2007) and Gut (2009a) examined temporal and prominence structure in L2 German and English instead of adopting a rhythm measure. They found that in L2 speech, especially by learners at lower proficiency levels, the durational relation between strong and weak syllables typically fell short of the L1 norm, whereas more advanced learners were able to produce patterns more similar to the target patterns. Deviant rhythm can have still other sources, some of which reside at the word level (wrong word stress, inappropriate use of vowel reduction and deletion, insertion of epenthetic vowels) and others at the utterance level (wrongly placed sentence accents and prosodic phrase breaks, missing or inappropriate linking between words). These aspects are important when explaining L2 speech rhythm and when studying how L2 speakers acquire and master appropriate rhythmic patterns.
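To make the interval-based rhythm metrics mentioned above more concrete, the Python sketch below computes a raw and a normalized pairwise variability index (rPVI, nPVI), the proportion of vocalic time (%V), and the rate-normalized coefficient of variation (Varco) from lists of interval durations. The formulas follow common formulations in the rhythm-metric literature, but the function names and sample durations are hypothetical, and individual studies may differ in details such as using the population or the sample standard deviation.

```python
from statistics import mean, pstdev

def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))

def npvi(durations):
    """Normalized PVI: each difference is divided by the mean of its pair, times 100."""
    return 100 * mean(abs(a - b) / ((a + b) / 2)
                      for a, b in zip(durations, durations[1:]))

def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    return 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

def varco(durations):
    """Varco: standard deviation divided by the mean, times 100 (rate-normalized).

    Uses the population standard deviation; some studies use the sample SD instead.
    """
    return 100 * pstdev(durations) / mean(durations)

# Hypothetical vocalic and consonantal interval durations (ms) for one read sentence.
vowels = [80, 120, 60, 140]
consonants = [100, 50, 90]

print(rpvi(vowels))                   # 60
print(round(npvi(vowels), 1))         # 62.2
print(percent_v(vowels, consonants))  # 62.5
print(round(varco(vowels), 1))        # 31.6
```

Because nPVI and Varco divide by local or global means, they are less sensitive to overall tempo than rPVI, which matters when L2 speakers speak more slowly than native speakers.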

44.3.2 Tempo and pauses

L2 speech is often characterized by a slower tempo than L1 speech, due to a slower articulation rate and more and longer pauses (Pürschel 1975 and Wiese 1983 for German learners of English; Trofimovich and Baker 2006 for Korean learners of English). When considering stretches of speech beyond a single utterance, it is useful to distinguish between the rather general terms ‘speaking rate’, ‘speech rate’, and ‘tempo’, which typically include pauses, and the term ‘articulation rate’, which excludes pauses (Trouvain 2004). These measures are expressed in linguistic units per time unit, for example syllables per second (syll/s), words per minute (wpm), or phones per second, or inversely as time per unit, such as mean segmental duration. The most widespread metric for speech tempo seems to be syll/s (see Trouvain 2004). Note, however, that syll/s can be problematic in cross-linguistic studies. For example, languages like English and German tend to omit entire syllables; it remains unclear whether syllables in these languages should be counted at the underlying (phonological) level or at the phonetic surface. Moreover, differences in syllable complexity may cause biases: a German speaker’s rate in syll/s in German, with its rather complex and hence longer syllables, may be slower than his or her rate in an L2 with a less complex syllable structure (e.g. Trouvain and Möbius 2014). The measure wpm has the advantage of easy counting but the disadvantages of cross-linguistic differences in word length and coarse granularity, while segments per second offers finer granularity but is prone to omissions, harder to define, and more time-consuming to count.
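The distinction between speaking rate (pauses included) and articulation rate (pauses excluded) can be illustrated with a minimal Python sketch. It assumes a hypothetical annotation in which each interval is labelled as speech or pause and speech intervals carry a syllable count; as noted above, whether that count reflects phonological or surface syllables is a decision the analyst has to make.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    label: str          # "speech" or "pause" (hypothetical annotation scheme)
    duration: float     # seconds
    syllables: int = 0  # syllables counted in a speech interval

def speaking_rate(intervals):
    """Syllables per second over the total duration, pauses included."""
    total_time = sum(iv.duration for iv in intervals)
    return sum(iv.syllables for iv in intervals) / total_time

def articulation_rate(intervals):
    """Syllables per second over speech time only, pauses excluded."""
    speech = [iv for iv in intervals if iv.label == "speech"]
    return sum(iv.syllables for iv in speech) / sum(iv.duration for iv in speech)

# Hypothetical stretch: 18 syllables in 3.5 s of speech plus 1.5 s of pauses.
stretch = [
    Interval("speech", 2.0, 10),
    Interval("pause", 0.5),
    Interval("speech", 1.5, 8),
    Interval("pause", 1.0),
]
print(round(speaking_rate(stretch), 2))      # 3.6
print(round(articulation_rate(stretch), 2))  # 5.14
```

Adding pauses to the stretch lowers the speaking rate but leaves the articulation rate unchanged, which is why the two measures can dissociate in L2 speech with more and longer pauses.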

Various studies show that, with increasing proficiency, the tempo of L2 speakers becomes faster and thus more similar to the tempo of the learner’s L1 and of the target language (Trouvain and Möbius 2014). On the perceptual side, beginning L2 learners commonly have the impression that L1 speakers speak at an extremely fast tempo (Abercrombie 1967: 96). Schwab and Grosjean (2004) investigated the relative tempo perceptions of L1 and L2 in Swiss German learners of French and found that the measured speech rate in the L2 correlated positively with the L2 speech rate perceived by the learners and negatively with speech comprehension in the L2. As mentioned above, pause is a critical concept in the discussion of tempo and related terms. Pauses can be defined as phases of the speaking process in which articulatory and phonatory activity is interrupted. However, there is no generally accepted threshold for when a silence should count as a pause (as opposed to, for example, the closure phase of a plosive). Pauses are sometimes divided into ‘silent’ and ‘filled’ pauses. The latter correspond to filler syllables like erm and uh. So-called silent pauses often contain inhalation noise; such ‘breath pauses’ are usually longer than silent pauses without breathing (Grosjean and Collins 1979; Trouvain et al. 2016). However, neither silence nor breath noise is required for a pause to be perceived: syntactic expectation, together with cues such as final lengthening and the shape of the nuclear contour, can suffice (Butcher 1981). L2 speech tends to have more and longer pauses. However, a study on Dutch L1 and L2 speech showed that these L2 pausing characteristics mainly concerned pauses within utterances rather than between utterances (de Jong 2016).
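Because there is no generally accepted minimum duration for a silence to count as a pause, pause counts depend on the threshold chosen. The Python sketch below uses hypothetical silent-interval durations and purely illustrative thresholds (not values recommended in the literature discussed here) to show how the same recording can yield different pause counts and total pause times.

```python
def pauses_above(silences_s, min_pause_s):
    """Return the silent intervals (in seconds) that reach the pause threshold.

    Shorter silences, e.g. plosive closures, are not counted as pauses.
    """
    return [s for s in silences_s if s >= min_pause_s]

# Hypothetical silent intervals (s) detected in one learner's utterance.
silences = [0.04, 0.09, 0.18, 0.26, 0.55, 1.20]

# Prints pause counts of 4, 3 and 2 for the three thresholds.
for threshold in (0.1, 0.2, 0.3):
    pauses = pauses_above(silences, threshold)
    print(threshold, len(pauses), round(sum(pauses), 2))
```

Comparisons of pause frequency across L1 and L2 speech are therefore only meaningful when the same threshold is applied to both.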

44.3.3 Fluency

Fluency is an important and often-mentioned concept in the assessment of L2 proficiency (Council of Europe 2011). Higher fluency in an L2 is associated with higher proficiency levels. Although we may have an intuitive idea of what fluency is, it is not easy to define. Production fluency is based on the speech signal, which can be used for quantitative measurements. However, there is no agreement on the best parameter for production fluency (e.g. Raupach 1980; Gut 2009a; de Jong 2016). Frequently mentioned measures include the two tempo metrics articulation rate and speaking rate, but also mean length of run (run = inter-pause stretch), the ‘phonation/time ratio’ (ratio of articulation time to total speaking time), the number and duration of unfilled pauses, and the number of filled pauses and other disfluencies (de Jong 2016; for details see Gut 2009a). In contrast to production fluency, perceptual fluency is based on listeners’ assessments of fluency (mostly by native speakers of the target language). Measurements of production and perceptual fluency quite often diverge. Fluency cannot be considered without disfluencies. In spontaneous speech, there are a number of markers of production disfluency. Formulations can be discontinued, and the restart can lead to a repair. The repair consists of the syllables to be repaired (reparandum), followed by the interregnum (or editing phase) after the interruption point, which is terminated by the reparans, the actually repaired sequence (Levelt 1983). The interregnum can contain an explicit editing term (e.g. No, I mean) or, in many cases, silent and/or filled pauses. Interestingly, filled pauses hardly occur in read speech (Duez 1982; Trouvain et al. 2016), but they are very common in spontaneous speech (e.g. Duez 1982; Cucchiarini et al. 2002). Filled pauses can also occur in articulatory phases without any interruption and without any silence. In those cases they may be considered not a disfluency but a means of fluency, sometimes called a ‘fluenceme’ (Götz 2013). In contrast to the negative associations that disfluencies trigger regarding the flow of spoken information, fluencemes may help the listener in speech comprehension. For example, fluencemes may elicit prediction of less accessible referents and may shift listeners’ attention to the upcoming information (e.g. Corley et al. 2007). Although disfluencies can be observed in both L1 and L2 speech, there is evidence that their beneficial effect on speech comprehension is present for L1 disfluent speech but not for its L2 counterpart (e.g. Bosker et al. 2014).
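Several of the production fluency measures listed above can be computed from a time-aligned annotation. The Python sketch below assumes a hypothetical annotation in which each segment is labelled as speech, an unfilled (silent) pause that has already passed a pause threshold, or a filled pause. It treats filled pauses as phonation time that contributes no syllables and delimits runs only by unfilled pauses; these are simplifying assumptions, and published studies operationalize the measures in different ways.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str           # "speech", "silent" (unfilled pause) or "filled" (e.g. "uh", "erm")
    duration: float     # seconds
    syllables: int = 0  # syllables produced; non-zero only for speech segments

def fluency_measures(segments):
    """Compute some common production fluency measures (simplified operationalizations)."""
    total_time = sum(s.duration for s in segments)
    phonation_time = sum(s.duration for s in segments if s.kind != "silent")

    # Runs are stretches between unfilled pauses; their length is counted in syllables.
    # A stretch with zero syllables (e.g. only a filled pause) is not counted as a run.
    runs, current = [], 0
    for s in segments:
        if s.kind == "silent":
            if current:
                runs.append(current)
            current = 0
        else:
            current += s.syllables
    if current:
        runs.append(current)

    silent = [s.duration for s in segments if s.kind == "silent"]
    return {
        "mean_length_of_run": sum(runs) / len(runs) if runs else 0.0,
        "phonation_time_ratio": phonation_time / total_time,
        "unfilled_pauses": len(silent),
        "unfilled_pause_time": sum(silent),
        "filled_pauses": sum(1 for s in segments if s.kind == "filled"),
    }

# Hypothetical annotated stretch of L2 speech.
sample = [
    Segment("speech", 1.2, 6),
    Segment("silent", 0.4),
    Segment("speech", 0.8, 4),
    Segment("filled", 0.3),
    Segment("speech", 1.0, 5),
    Segment("silent", 0.6),
    Segment("speech", 0.7, 3),
]
measures = fluency_measures(sample)
print(measures["mean_length_of_run"])  # 6.0 (runs of 6, 9 and 3 syllables)
```

Under a different convention, for instance one in which filled pauses also end a run, the same annotation would yield a shorter mean length of run, which illustrates why the choice of fluency parameter is still debated.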

44.4 Perception of second language sentence prosody

44.4.1 Perception and interpretation

In this subsection we review how L2 speakers perceive differences in the intonational form of the target language (e.g. discrimination of contours, determination of accent location) and how they interpret the role of prosody in signalling information structure, signalling questionhood, disambiguating syntactic ambiguities, and expressing paralinguistic meaning.1 Although research on these topics is still relatively sparse, existing work has shown that L2 perception and interpretation are subject to the influence of various mechanisms. To begin with, Baker (2010) tested the perception of accent location and the interpretation of information structure by Korean learners of English and found that the learners were as good as the native controls in determining accent location but performed worse than the native controls in the interpretation of information structure, suggesting influence from the L1. Ortega-Llebaria and Colantoni (2014) provided further evidence for L1 influence in their cross-linguistic experiment on the use of prosody in the comprehension of focus by Spanish and Mandarin learners of English. When asked to select an answer to a question from possible responses differing only in accent location, the Mandarin learners of English achieved native-like accuracy but the Spanish learners of English were significantly less accurate than the native controls. The authors related this difference between the two groups of learners to the reliance on word order in focus marking in Spanish but on prosody in Mandarin. Research on the perception of L2 questionhood also shows a strong influence from the L1. For example, Puga et al. (2017) tested the ability of German learners of English to match the intended intonation pattern to a number of sentence types and functions (e.g. polar questions, tag questions, statements). They found that the German learners did not differ from the English controls for polar questions and statements, but were less accurate than the native controls for tag questions, possibly due to L1 transfer. Compared to English, tag questions are less common in German and display less variability in syntactic form and prosody (Dehé and Braun 2013). Yang and Chan (2010) tested the perception of question versus statement interpretation in English learners of Mandarin Chinese. They reported that the learners at all proficiency levels made the most errors when statements ended with a syllable that had Tone 2 (f0 rise) or when questions ended in Tone 4 (high-falling), that is, when the final f0 contour did not match the typical contours of polar questions (rising) and statements (falling). These results also provide evidence for L1 transfer. Similarly, Liang and van Heuven (2007) found evidence for an influence of speaking a tone language in a study on the perception of question versus statement in Mandarin Chinese by three learner groups, two speaking dialects of Chinese (Nantong and Changsha dialect) and one speaking a non-tone language (Uygur, an Altaic language). For the intonation task, participants had to indicate whether the utterance was a statement or a question (the utterance had constant high-level tones). The learners from the non-tone language were more sensitive to the statement/question contrast than the learners from a tone language, who, in turn, were more accurate in a separate tone recognition task. Limited research on the role of prosody in disambiguating syntactic ambiguities and expressing paralinguistic meaning has, however, yielded evidence not only for L1 influence but also for language-independent mechanisms. For example, Cruz-Ferreira (1989) examined the use of prosody in syntactic disambiguation by English learners of Portuguese and Portuguese learners of English in their L1 and L2. She presented the participants with sentences whose meaning could be disambiguated by either accent placement or prosodic phrasing (see examples (4) and (5)). She found that the participants performed well in the L2 when the meaning contrast was realized prosodically in a similar way in their L1 and L2 (due to positive transfer) or in a language-independent way (e.g. associating high pitch with ‘open’ meaning).

(4) She gave her dog biscuits.
(5) She dressed and fed the baby.

Atoye (2005) extended Cruz-Ferreira’s (1989) study by testing both the discrimination of contours (perception) and interpretation in Nigerian learners of English, using a subset of the stimuli from Cruz-Ferreira (1989). Nigerian English differs substantially from British English in prosody (Gut and Milde 2002). For example, in Nigerian English, pitch movements typically occur on pre-pausal syllables, and the pitch height of syllables appears to vary to encode grammatical functions. Atoye found that the Nigerian learners were generally able to perceive differences between the two prosodic versions of a pair but had substantial difficulties in glossing their meanings. These results suggest that, similar to the findings of Baker (2010), difficulties in establishing the form–function link in the L2 probably arise from a lack of similar uses of prosody in the L1 rather than from deficits in low-level perceptual skills. Finally, Chen (2009a) investigated the perception of paralinguistic meaning attributes such as ‘emphatic’ and ‘surprised’ in Dutch learners of English and English learners of Dutch. The learners showed transfer from their L1, but they also partially interpreted non-L1-like form–function mappings in a native-like manner, owing to language-independent uses of pitch, as captured by the biological codes (Gussenhoven 2004), and to exposure to native input.

1 Relevant psycholinguistic studies (e.g. Akker and Cutler 2003; Braun and Tagliapietra 2011; Lee and Fraundorf 2017) and neurolinguistic studies (e.g. Nickels et al. 2013; Nickels and Steinhauer 2018) are not reviewed due to space limitations.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

616 Jürgen TROUVAIN AND BETTINA BRAUN

44.4.2 Perceived foreign accent and ease of understanding

When assessing L2 speech, we can distinguish between perceived degree of foreign accent (or linguistic nativelikeness) and ease of understanding (Munro and Derwing 1995; Derwing and Munro 1997). The latter is usually divided into intelligibility (the number of words actually understood by listeners) and comprehensibility (how well listeners think they understand the speaker). Past work has shown that prosodic properties such as speech rate, rhythm, intonation, and fluency can have an impact not only on accentedness ratings but also on intelligibility and comprehensibility ratings. However, as different prosodic properties and aspects of L2 speech have been examined in different L2s, we do not yet have a clear understanding of the effects of different prosodic properties on accentedness, intelligibility, and comprehensibility in different L1–L2 pairings. For example, Jilka (2000) found that for German learners of English, accent ratings were best predicted by f0 range and word stress measures, whereas comprehensibility scores were mostly associated with speaking rate. Polyanskaya et al. (2017) found that for French learners of English, both speech rate and speech rhythm (operationalized as durational ratios of syllables, vocalic sequences, and consonantal clusters) influenced the degree of perceived foreign accent, but the effect of speech rhythm was larger than that of speech rate. Van Maastricht et al. (2016) showed that L2 Dutch produced by Spanish learners, which deviated from native pitch accent distributions for focus marking, was rated as more foreign-accented and more difficult to understand than L1 Dutch, with speakers’ proficiency as a modulating factor. Using a cross-modal priming paradigm, Braun et al. (2011) showed that an unfamiliar intonation contour on otherwise Dutch sentences resulted in longer latencies for lexical decisions and semantic category judgements, suggesting an effect of non-native intonation on comprehensibility.

Much research on the assessment of L2 speech has been devoted to the relative weighting of segmental and prosodic characteristics in their impact on accentedness and comprehensibility. The findings have been rather mixed. Some studies found that deviations at the segmental level affect ratings of accentedness and comprehensibility less severely than deviations at the prosodic level (Munro and Derwing 1995 on Mandarin learners of English; Trofimovich and Baker 2006 on Korean learners of English). On the other hand, more recent studies have shown that segmental errors and the interplay between segments and prosody have a larger impact. For example, in an investigation of German-accented English by Ulbrich and Mennen (2016), native listeners were more influenced by segments than by prosody. In addition, the listeners were quite sensitive to small prosodic differences when these were combined with non-native segments. In a study on Korean-accented English, Sereno et al. (2016) showed that segments had a significant effect on accentedness, comprehensibility, and intelligibility, but intonation had an effect only on intelligibility. Because different studies have focused on different L2s produced by speakers with different L1s, future research is needed to establish whether the weighting of segmental and prosodic properties in perceived accentedness and comprehensibility varies between L2s produced by learners with different L1s and between different L2s produced by learners with the same L1. What we can state is that both segmental and prosodic features contribute, with their relative weighting differing from case to case. Thus, we can assume that an L2 learner who concentrates on segmental accuracy while neglecting prosodic aspects will probably be perceived as less fluent and less intelligible than an L2 speaker who cares more about fluency and prosody than about vowels and consonants.
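Intelligibility, in the sense of the number of words actually understood, is commonly estimated from listener transcriptions. The sketch below shows one simple way to turn a transcription into a percentage score. It is an illustration under stated assumptions only: exact word matches, word order ignored, and a hypothetical function name (`intelligibility_score`) that is not taken from any of the studies cited.

```python
import re
from collections import Counter


def _words(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes, so case and punctuation are not scored.
    return re.findall(r"[a-z']+", text.lower())


def intelligibility_score(intended: str, transcribed: str) -> float:
    """Percentage of the intended words that also appear in a listener's transcription.

    Assumption: exact-match scoring with word order ignored (a bag-of-words overlap);
    published studies may apply stricter or more lenient scoring criteria.
    """
    target = Counter(_words(intended))
    if not target:
        raise ValueError("the intended utterance contains no words")
    heard = Counter(_words(transcribed))
    # Multiset intersection: each word is credited at most as often as it was intended.
    matched = sum((target & heard).values())
    return 100 * matched / sum(target.values())


if __name__ == "__main__":
    # Hypothetical listener transcription of example (4).
    print(intelligibility_score("She gave her dog biscuits.", "she gave a dog biscuit"))  # 60.0
```

Comprehensibility, by contrast, is a listener judgement (typically a rating), so it cannot be derived from a transcription in this way.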

44.5 Conclusions

Research on L2 sentence prosody covers different aspects (intonation, timing) from different perspectives (production, perception, comprehension). Studies that integrate different aspects and perspectives are rare (but see e.g. Gut 2009a; van Maastricht et al. 2016, 2017). For this reason, theoretical models that aim to predict learning difficulties cover only sub-fields of L2 sentence prosody. One example is the L2 Intonational Learning Theory by Mennen (2015), which describes development in L2 intonational production along the same dimensions in which cross-linguistic differences in intonation can occur (Ladd 1996, 2008b): the inventory of phonological-prosodic elements, their distribution, how these elements are phonetically implemented, which functions these elements have, and how often these elements are used. Models that aim to explain the underlying learning mechanisms (e.g. answering questions such as ‘What drives the learning of L2 prosody in the absence of explicit instruction?’ or ‘What factors matter to the successful learning of L2 prosody?’) are still lacking, even though significant theoretical advances have been made in the acquisition of non-prosodic aspects of L2. Existing research on L2 sentence prosody has been concerned with the influence of the L1 on L2 prosody. However, a solid analysis of such transfer often encounters difficulties because it is unclear which forms count as ‘correct’ in the sentence prosody of the target variety. In contrast to L2 word prosody and L2 segmental forms, L2 sentence prosody frequently allows greater optionality (e.g. in placing phrase breaks and pitch accents). We thus face huge variability, which is often increased by regional and other non-standard influences. In addition, most studies are concerned with a variety of English as the target language. For L2 sentence prosody research, it remains a challenge to define what is ‘correct’ or ‘acceptable’ in the target variety on the one hand, and to widen our knowledge of target and source languages on the other. But there are more challenges for future research. Specifically, characteristics that are traditionally considered segmental, such as reduction, should be examined from a prosodic perspective. There is a need to investigate the interface between prosody and other linguistic levels in L2s, such as syntax (e.g. Zubizarreta and Nava 2011). It is also important to extend L2 prosody research to more natural situations, such as dialogues and other interactional behaviour (e.g. Ward and Gallardo 2017). For this purpose, phonetic learner corpora with prosodic annotation would be a valuable resource. At the same time, learner corpora can help to test theoretical questions about L2 sentence prosody with a substantial number of participants. This will in turn allow us to gain more insight into the developmental stages of L2 sentence prosody, ideally by establishing a hierarchy of learning difficulties. Finally, fluency and timing are not treated together with intonation- and pitch-related aspects in L2 teaching, L2 assessment, and L2 testing. However, such a broad-ranging view on L2 sentence prosody would be beneficial both to the construction of theories concerning the acquisition of L2 prosody and to applications such as assessment in teaching, exercises for individual learning, and automatic testing of spoken performances.

Acknowledgements

We are grateful to Elisabeth Delais-Roussarie, Laura Downing, Christiane Ulbrich, and Katharina Zahner for comments on an earlier version of this chapter. We also thank Clara Huttenlauch, Sophie Kutscheid, Jana Neitsch, Maximilian Wiesner, and Katharina Zahner for help with references.

chapter 45

Prosody in Second Language Teaching

Methodologies and effectiveness

Dorothy M. Chun and John M. Levis

45.1 Introduction

In this chapter, we discuss the teaching of prosody to second language (L2) learners, beginning with the importance of prosody for L2 learners (§45.2) and followed by summaries of the research literature on teaching prosody (which includes intonation, rhythm, and stress) (§45.3). Finally, we present the current state of knowledge regarding the effectiveness of L2 prosody instruction (§45.4), including recommendations for the future of teaching L2 prosody (§45.5).

45.2 The importance of prosody for L2 learners

In L2 pronunciation, two twenty-first-century trends can be cited. First, the goal of L2 pronunciation teaching is no longer to train learners to achieve native-like accuracy, as it tended to be until the late 1990s, but rather to enable them to speak comprehensibly and intelligibly (Derwing and Munro 1997; Levis 2005a; Munro and Derwing 2011; Murphy 2014). Second, a trend that began in the 1980s recognized that prosodic features play as important a role as segmentals in the comprehensibility of L2 speech, if not a more important one (McNerney and Mendelsohn 1992; Derwing et al. 1998; Kang 2010; Gordon and Darcy 2016). For L2 learners, speaking comprehensibly and intelligibly is a goal that requires a certain level of mastery of L2 prosody, in terms of both production (speaking) and reception (listening). As Derwing and Munro (2015: 1) state, speech intelligibility is ‘the most fundamental characteristic of successful oral communication’; they define an intelligible utterance as one that the listener can understand, with the additional requirement that the listener can also grasp the speaker’s intended message. In comparison, speech comprehensibility refers to the amount of effort that the listener must put into understanding speech. Speech may be intelligible (understandable), but if it takes the listener great effort to grasp fully and readily what is being said, that speech is less comprehensible (less easily understandable). Other reasons for the importance of prosody for L2 learners are that (i) prosody contributes relatively more to a non-native/foreign accent than segmentals (vowels and consonants)1 and (ii) fluency, or the degree to which speech flows easily, is closely related to the number and types of pauses and other dysfluency markers, which are often considered to be in the realm of prosody (e.g. Trofimovich and Baker 2006; see also chapter 44). Even though a native-like accent is no longer among the most important goals of L2 speakers, strongly dysfluent speech can considerably hinder comprehensibility, while fluent speech may be considered more comprehensible.

620 DOROTHY M. CHUN AND JOHN M. LEVIS

45.3 Teaching prosody

From a theoretical standpoint, the main proposals for explaining how L2 learners perceive and produce speech revolve primarily around the influence of the L1 (see chapters 39, 40, and 42). These theories in turn have implications for pronunciation teaching. One of the earliest approaches is the Contrastive Analysis Hypothesis (CAH), proposed by Lado (1957), which views language as a system of habits and postulates that native language (L1) habits (especially the articulation of vowels and consonants) have a significant influence on L2 acquisition. By contrasting the phonological inventories of L1 and L2, hierarchies of error types could be established, and teachers would understand which L2 sounds would be most difficult for learners. In a study of prosodic transfer from L1 to L2 based on the CAH, Rasier and Hiligsmann (2007) investigated (pitch) accent in L2 Dutch and L2 French and found that it was easier to shift from a language with non-fixed accentuation (like Dutch) to a language with fixed accentuation (like French) than the other way around. Similarly, Missaglia (1999: 551) developed a pronunciation training method specifically for Italian learners of German based on a contrastive German–Italian prosody framework and found that accentuation and intonation seemed to have a controlling function over syllables and segments. Taking a different tack and theorizing that adults’ difficulties in phonetic acquisition are frequently due to an inability to perceive new or unknown sounds, Best (1995) proposed the Perceptual Assimilation Model (PAM), in which adult L2 learners perceive unfamiliar sounds in terms of their similarities and dissimilarities to their native phonemes. A third approach, developed specifically to account for L2 phonetic learning over time, was the Speech Learning Model (SLM) of Flege (1995). The SLM posits that the accuracy with which an L2 learner can perceive L2 segments affects the accuracy with which the learner can produce them. Although the models of Best (1995) and Flege (1995) were concerned primarily with segmentals, Trofimovich and Baker (2006: 26) suggest that these L2 speech learning theories can be extended to account for the learning of L2 prosody as well. However, such extensions are scarce; exceptions include Kainada and Lengeris (2015) and Mennen (2015), the latter proposing a working model of L2 intonation learning that takes into account predictions from the PAM and the SLM and investigates how cross-language differences in intonation predict where L2 deviation is likely to occur.

1 See Munro (1995) and Derwing et al. (1998). However, this may depend on the L1–L2 pairs under investigation (see chapter 44). For example, Sereno et al. (2016: 303) found in their investigation of Korean-accented English that segments had a significant effect on accentedness, comprehensibility, and intelligibility, but intonation had an effect only on intelligibility.

PROSODY IN SECOND LANGUAGE TEACHING 621

From a practical standpoint, teachers would ideally want to apply what is known theoretically about how L2 pronunciation is acquired to methodologies and materials for teaching prosody. In general, methodologies for teaching prosody combine explanation of prosodic features, awareness raising of the features and functions of prosody, production exercises and practice, and perceptual training (Gut et al. 2007). Depending on the goals and proficiency levels of the learners, instruction will include different combinations of these methods and perhaps a different ordering. Textbooks and other materials for teaching pronunciation that include attention to prosody have been available for decades, particularly for English (e.g. O’Connor and Arnold 1973; Bradford 1988; Wells 2006 for British English; and Pike 1945; Gilbert 1984/2012; Celce-Murcia et al. 1996/2010 for American English). Levis and Sonsaat (2016) examined intermediate English language teaching textbooks from three well-known publishers (Cambridge University Press, Oxford University Press, and Pearson-Longman) and found that the books focused primarily on suprasegmentals (though they included segmentals as well), with intelligibility as a priority.
In the following subsections, we look at several areas of prosody that help to explain what can be expected in L2 prosody learning and how prosody has been, and can be, taught. We start with intonation, then move on to rhythm and word stress, touching on other prosodic features as relevant to the various languages we discuss.

45.3.1 Intonation

For the sake of simplicity, one could posit two major models of intonational structure, corresponding roughly to the British ‘contour-based’ approaches and the American ‘autosegmental’ approach, both of which focus on pitch movement or changes in pitch (Levis 2005b). In contour-based approaches, intonation is composed of three systems: ‘tonality’ (phrasing of intonation units), ‘tonicity’ (focal point of intonation), and ‘tones’ (or tunes, e.g. falls, rises, and fall-rises) (Halliday 1967; Tench 1996). The autosegmental approach, on the other hand, proposes that intonation consists of sequences of tone levels (Pierrehumbert 1980; see also chapter 6). These levels can be realized as ‘pitch accents’, which are associated with lexically stressed syllables, and ‘boundary tones’, which are assigned to phrase-final syllables of either an intermediate (minor) phrase or an intonational (major) phrase. In practice, each approach attends both to holistic contours and to the specific features that make up the contours, differing in which elements are emphasized. Visualization techniques have been used to teach intonation for decades. Some are simple visualizations (e.g. O’Connor and Arnold 1973), while others represent phonological inventories of prosodic labels and symbols (e.g. Pike 1945). Some were originally developed for analytical purposes (e.g. Pierrehumbert 1980). A systematic comparison of the use of different techniques and their effect on learners’ performance was lacking until Niebuhr et al. (2017) studied six different notational techniques for teaching L2 German intonation contours (see Figure 45.1). Niebuhr et al. found that iconic visualization techniques, in particular the technique shown in panel (a), were more helpful to learners than symbolic notation techniques, such as those shown in panels (e) and (f).

[Figure 45.1: panels (a)–(f) notate the intonation of German example sentences, such as ‘Messer und Gabel liegen neben dem Teller’, using six different techniques.]

Figure 45.1 Visualization techniques for intonation contours. (a) depicts drawn, stylized intonation contours (e.g. also employed by Celce-Murcia et al. 1996/2010). (b) portrays a smoother, continuous contour (e.g. used in Gilbert 1984/2012). (c) shows a system consisting of dots representing the relative pitch heights of the syllables; the size of the dots indicates the salience level of the syllables; tonal movements are indicated by curled lines starting at stressed syllables (O’Connor and Arnold 1973). (d) represents pitch movement by placing the actual text at different vertical points (e.g. Bolinger 1986). (e) illustrates a notational system that uses arrows to indicate the direction of pitch movement and diacritics and capitalization to mark stress (similar to Bradford 1988, who used a combination of (c) and (d)). (f) represents a modification of the American transcription system ToBI, based on the autosegmental approach (e.g. Toivanen 2005; Estebas-Vilaplana 2013). The focal stress of the sentence is marked by the largest dot above the stressed syllable (c), capitalization of the stressed syllable and the diacritic ´ (e), and the L+H* notation marking the pitch accent (f). (Adapted from Niebuhr et al. 2017: fig. 1–6)

In teaching L2 intonation, the shift in the focus of L2 pronunciation teaching from segmentals to suprasegmentals is thought to lead to better communication skills and more effective communication. Relatedly, the range of intonation functions that are taught has expanded. Intonation not only conveys linguistic information (e.g. statements vs. questions) but also plays key roles in discourse-level communication (e.g. managing conversational turns or signalling speaker intentions and attitudes). One of the most difficult aspects of teaching and learning L2 intonation is that there is no consistent correspondence in any language between a particular intonational pattern, specific meanings, and specific grammatical structures. In addition, native speakers vary in how they use L1 intonation (e.g. regional differences), making intonation all the more challenging for L2 learners.

45.3.2 Rhythm

Rhythm has long been a mainstay in the teaching of L2 prosody, even if it has largely been built on assumptions that are not empirically justified. The teaching of rhythm is often based on a distinction between stress-timed and syllable-timed languages. In this view, languages like English are stress-timed and are marked by a rhythm with relatively equal durations between stressed syllables, a timing that is thought to be largely preserved no matter how many unstressed syllables occur between two stresses. Languages like Spanish are syllable-timed, and their syllables have relatively equal durations. This distinction has long been visualized, for example as in Figure 45.2, from Prator and Robinett (1985: 29). This impressionistic description still dominates the teaching of L2 rhythm in English; despite its lack of empirical support, it may have value in providing a model for speech patterns, such as how stress and unstress are tied to lexical classes. In English, content (lexical) words (nouns, verbs, adjectives, adverbs) are inherently stressed, while function words, especially one-syllable function words (prepositions, pronouns, determiners, conjunctions, etc.), are not and are typically reduced to schwa. Some categories seem to occupy a middle ground: they have full vowels, like content words, but, unlike content words, rarely carry phrasal prominence. These word classes include demonstrative pronouns, question words, and negative words. Despite its pedagogical attractiveness, this model was seriously challenged nearly four decades ago. Dauer (1983) found that stress timing and syllable timing, rather than being polar opposites typifying groups of languages, were instead tendencies that often overlapped (e.g. stress-timed languages often have groups of syllables with relatively equal durations, while syllable-timed languages often have syllables differing in duration). Despite repeated attempts using a variety of rhythm metrics for vowel and consonant variation in different languages (e.g. Grabe and Low 2002), there is little evidence for the traditional stress/syllable-timed distinction (see also chapter 11). Research from psycholinguistics (Cutler 2012), however, indicates that these different conceptions reflect prosodic reality. Rhythm is particularly important in speech segmentation: English speakers use the stress-based rhythm of English to successfully identify words in the stream of speech, while French speakers segment more successfully by using syllable-based rhythmic properties and Japanese speakers use mora-based properties of speech. Given these language-specific segmentation strategies, L2 learning is more challenging when listeners try to segment speech that is produced with an unfamiliar rhythm.

Figure 45.2 The rhythms of some other languages (top) and English (bottom). (Prator and Robinett 1985: 29)
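Rhythm metrics of the kind mentioned above quantify how much the durations of successive intervals vary. As one illustration, the Pairwise Variability Index associated with Grabe and Low (2002) can be sketched as follows: the raw PVI averages the absolute differences between successive interval durations, while the normalized PVI divides each difference by the mean of the two intervals and scales the average by 100, which factors out overall speech rate. The sketch assumes that vocalic or intervocalic intervals have already been segmented and measured; the durations in the usage lines are invented for illustration.

```python
from typing import Sequence


def _check(durations: Sequence[float]) -> None:
    # A pairwise index needs at least one pair, and normalization needs positive durations.
    if len(durations) < 2:
        raise ValueError("a PVI needs at least two successive intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("interval durations must be positive")


def rpvi(durations: Sequence[float]) -> float:
    """Raw PVI: mean absolute difference between successive interval durations
    (in the same unit as the input, e.g. ms)."""
    _check(durations)
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations: Sequence[float]) -> float:
    """Normalized PVI: each successive difference is divided by the mean of the
    two intervals; the resulting terms are averaged and multiplied by 100."""
    _check(durations)
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    even = [100, 100, 100, 100]       # hypothetical, perfectly even intervals (ms)
    alternating = [50, 150, 50, 150]  # hypothetical strict long-short alternation (ms)
    print(npvi(even), npvi(alternating))  # 0.0 100.0
    print(rpvi(alternating))              # 100.0
```

Higher values reflect stronger alternation between long and short intervals; as noted above, such measures have not yielded clear-cut evidence for a binary stress/syllable-timed split.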

45.3.3 Word stress

Research on word stress in free-stress languages offers some evidence for the reality of how prosodic differences affect L2 learners. In a wide range of studies on stress differences in English, Dutch, and Spanish, researchers have demonstrated that four features are related to perceptions of word stress: vowel quality, duration, pitch, and intensity. In free-stress languages, these features have different weights. Spanish, for example, does not make use of vowel reduction, and thus Spanish stress perception is largely based on prosodic features. English, on the other hand, relies heavily on vowel reduction as a cue to distinguish unstressed from stressed syllables, and duration, pitch, and intensity are often treated as redundant cues to stress. Indeed, the predominance of schwa in English speech (approximately 33% of vowels in running speech; Woods 2005) indicates the strength of vowel quality as a stress cue. Dutch falls somewhere between English and Spanish. Dutch makes use of vowel reduction, though not to the same extent as English (Soto-Faraco et al. 2001), and Dutch listeners are more effective than English listeners in making use of prosodic features to evaluate stress placement, even when evaluating the stress of English words (Cooper et al. 2002). Thus, stress judgements are based on language-specific segmental and suprasegmental cues. When learners whose L1 either does not use stress as a parameter or has fixed stress learn a free-stress language, the success they achieve in perceiving and predicting stress differs from that of native speakers (see also chapter 43). Guion et al. (2003) found that English speakers seem to attend to three cues in determining the stress placement of unknown two-syllable words in noun and verb frames: syllable weight, lexical class, and analogy with known words. They then examined whether the same cues were employed by learners of L2 English. For early and late English–Spanish bilinguals, Guion et al. (2004) found that both groups showed significant relationships between stress and syllable weight, lexical class, and analogy with known words, but that late bilinguals showed significantly reduced effects of syllable weight. Late bilinguals also showed an increased tendency towards initial stress in both noun and verb frames. For Korean learners (Guion 2005), the effect of phonologically similar words was strongest, followed by syllable weight, whose effect decreased for late bilinguals. L2 English stress placement by speakers of Thai was examined by Wayland et al. (2006). Analogy with phonologically similar real words was a significant predictor of stress placement, but Thai speakers also showed a strong tendency to stress syllables with long vowels and to place stress initially in noun frames.

Other research has examined the effect of misplaced word stress on the intelligibility of words for native and non-native listeners. Rightward misstressing seems to create more comprehension difficulties for both native and non-native listeners than leftward misstressing (Field 2005). In longer words, Richards (2016) also found that loss of intelligibility due to changes in vowel quality was cumulative, with multiple quality changes being more damaging than one change. The ultimate success of learning the prosodic patterns of an L2 varies perplexingly, depending on the L1 and L2 prosodic systems and the age at which the learner acquires the L2. A series of articles by Archibald (e.g. 1994, 1997) proposed that learners whose L1 is a non-stress language may compute stress word by word, but that learners from variable- and fixed-stress languages like Spanish, Polish, and Hungarian show evidence of being affected by both universal processes and L1-specific developmental processes. Almost all of these studies, however, were based on very small numbers of participants and must be seen as inconclusive with regard to how L2 learners negotiate stress in an L2. Some research has proposed that French learners may suffer from ‘stress deafness’ when learning a free-stress language (Dupoux et al. 2008), but other research indicates that stress deafness may also affect learners of one free-stress language learning another, because of insufficient attention to the phonetic encoding of stress in the L2 (Ortega-Llebaria et al. 2013).

45.4 The effectiveness of L2 pronunciation instruction applied to prosody

Relatively few studies have examined the extent to which L2 prosodic patterns can be learned and retained with instruction rather than through naturalistic exposure. There is some indication that systematic instruction on prosody can lead more quickly to improvements in comprehensibility, as measured by ratings of the ease of understanding of accented speech. Derwing et al. (1998), studying the efficacy of prosodic instruction, found that the comprehensibility of spontaneous speech improved for learners who received primarily prosodic instruction but not for learners who were instructed on segmental features. (In read speech, both groups’ comprehensibility ratings improved.) Similarly, Gordon and Darcy (2016) found that instruction on suprasegmental features in a short-term pronunciation course led to improvements in comprehensibility, while instruction on segmentals led to poorer comprehensibility ratings, perhaps because students paying attention to segmental errors spoke more hesitantly, whereas instruction on prosody promoted more fluent and thus more comprehensible speech. In the following subsections, we discuss the effectiveness of prosody instruction with regard to awareness raising, perception, production, and the use of multi-modality.

How do you spell ‘ease’? E - A - S - E.
How do you spell ‘easy’? E - A - S - Y.

Figure 45.3 Representing pitch movement in a pronunciation course book.

45.4.1 Awareness

It is hard to produce L2 prosodic features that are not perceived, and it is almost impossible to perceive them without sufficient awareness that they exist. In one study of the value of awareness raising, Coniam (2002) used visual differences in timing found in waveforms to raise English teachers’ awareness of the differences between inner-circle (American English) and outer-circle (Hong Kong) English varieties. Coniam used a popular TV programme with two main characters, one with an American and one with a Hong Kong accent. He raised awareness among the teachers by demonstrating the rhythmic timing differences on the waveforms and helping the teachers to notice the differences between the varieties. Actually seeing the differences in the waveforms led to increased awareness, while discussion alone did not. Gilbert (1984/2012: 6), following Bolinger (1986) and others, marks prosody in examples and exercises through changes in typeface and drawn pitch lines (Figure 45.3). By departing from the uniform visual appearance of written language to reflect significant prosodic features, this creates awareness of the way L2 speech is shaped prosodically.

45.4.2 Perception

Like other areas of L2 phonology, the perception of L2 prosody is challenging. Certain prosodic properties of an L1 are acquired very early, perhaps even before birth (Vihman 2015), and prosodic parameters are used in language-specific ways. Some parameters, such as syllable length, may mark categorical distinctions in languages like German and Estonian while reflecting allophonic distinctions in languages like English. Others, such as pitch accents, are found at the word level in Japanese but function phrasally in English. Even in languages that use prosodic parameters similarly, similar intonation contours may be associated with different meanings, which causes difficulties in the perception of meaning (Cruz-Ferreira 1987). The perception of some prosodic features seems more teachable than that of others (e.g. contrastive stress is more teachable than the use of pitch to mark juncture) (Pennington and Ellis 2000), but few studies have demonstrated improvements in perception resulting from instruction. One promising approach that has proved effective for segmental learning is high-variability phonetic training. This approach, which uses input from multiple voices producing the contrasts in multiple linguistic environments rather than from only one voice, has been shown to promote more robust L2 phonetic categorization of segmental contrasts (e.g. Logan et al. 1991; Thomson 2012; Qian et al. 2018). These encouraging results hold promise for training with prosodic contrasts as well, and such training may help L2 learners to notice, perceive, and produce L2 prosody more successfully (e.g. Chang and Bowles 2015). A low-tech approach to teaching prosody perception and awareness is the use of kazoos to highlight rhythmic and melodic features of the L2. Gilbert (1994) has long advocated kazoos because they filter speech, removing much of the clarity of the words and even of the segments while leaving the prosody prominent, and thus raise awareness of prosody.
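At its core, high-variability training is a way of organizing the input: the same contrast is heard from many talkers and in many items rather than from a single voice. The sketch below assembles one randomized identification block for a word-stress contrast and checks responses for feedback. It assumes four hypothetical talkers and two noun/verb stress pairs; `build_hvpt_block` and `give_feedback` are illustrative names, not part of any published training tool, and actual studies differ in the number of talkers, items, and feedback procedures.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    talker: str    # which recorded voice plays the stimulus
    item: str      # the word or phrase carrying the contrast (its linguistic environment)
    category: str  # the response the learner should give, used for feedback


def build_hvpt_block(talkers, items_by_category, seed=None):
    """Cross every talker with every item of every category, then shuffle,
    so each contrast is heard in many voices and environments rather than one."""
    trials = [
        Trial(talker, item, category)
        for talker in talkers
        for category, items in items_by_category.items()
        for item in items
    ]
    random.Random(seed).shuffle(trials)  # a seed makes the order reproducible
    return trials


def give_feedback(trial: Trial, response: str) -> bool:
    """Return True if the learner's label matches the intended category."""
    return response == trial.category


if __name__ == "__main__":
    talkers = ["F1", "F2", "M1", "M2"]  # hypothetical recorded speakers
    items = {
        "initial stress": ["PERmit (noun)", "REcord (noun)"],
        "final stress": ["perMIT (verb)", "reCORD (verb)"],
    }
    block = build_hvpt_block(talkers, items, seed=1)
    print(len(block))                         # 16
    print(Counter(t.talker for t in block))  # each talker appears 4 times
```

Crossing talkers with items, rather than sampling them independently, keeps every voice equally represented for every category, so no single speaker can serve as a shortcut cue.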

45.4.3 Production

The teaching of prosodic features such as word stress is often accomplished in low-tech ways by physically embodying the length differences between stressed and unstressed syllables. Gilbert (1984/2012: 18) suggests stretching rubber bands between the thumbs: the stretching apart corresponds to the lengthening of the stressed syllable, and the stretched rubber band is released on the unstressed syllable (Figure 45.4). Any physical action that can be performed for longer and shorter times may be used in the same way. Chan (1987a), for example, advocates ‘stress stretches’, in which the speaker stands up on the stressed syllable and sits down on the unstressed one. Acton (2019) advocates haptic approaches to the teaching of rhythm, such as the ‘walkabout’ (in which stressed syllables correspond to steps as learners rehearse a text; Acton 2011), the ‘fight club’ (Burri et al. 2016), and other techniques in which prosodic variations correspond to physical movement. Murphy (2004) calls for metalinguistic awareness of stress patterns and for physical movement in learning to stress multisyllabic academic words. Another technique advocated as especially helpful for L2 prosody learning is ‘backward buildups’, in which the rhythm of longer sentences or texts is built from the end rather than from the beginning (Anderson-Hsieh and Dauer 1997), as in (1). The phrases are suggestions and could be shorter if needed. Starting from the end takes advantage of the sentence-final prosody in building up the phrase, and is far more successful than building from the beginning in promoting an appropriate prosodic shape for the sentence.

Figure 45.4 Teaching word stress using rubber bands.

(1) ‘The rhythm of longer sentences is built from the end rather than the beginning.’
1st phrase: ‘rather than the beginning’ (repeated by L2 learner)
2nd phrase: ‘from the end rather than the beginning’
3rd phrase: ‘is built from the end rather than the beginning’
4th phrase: ‘The rhythm of longer sentences’
5th phrase: Read the entire sentence
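Read procedurally, a backward buildup is a list of ever-longer end portions of the sentence. A minimal Python sketch, assuming the teacher has already divided the sentence into chunks (the function name and the chunking are hypothetical), could look like this:

```python
def backward_buildup(chunks: list[str]) -> list[str]:
    """Return practice phrases built up from the end of the sentence.

    chunks: the sentence pre-divided into phrases by the teacher (the
    division is a pedagogical choice, not fixed by the technique).
    Step k joins the last k chunks, so the sentence-final prosody is
    rehearsed first and is present in every later step.
    """
    return [" ".join(chunks[-k:]) for k in range(1, len(chunks) + 1)]

chunks = ["The rhythm of longer sentences", "is built",
          "from the end", "rather than the beginning"]
for step, phrase in enumerate(backward_buildup(chunks), start=1):
    print(f"{step}: {phrase}")
```

The output reproduces the first three steps of (1) and ends with the whole sentence; (1) additionally rehearses the initial chunk ‘The rhythm of longer sentences’ on its own before the full read-through, a step this sketch omits.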

45.4.4 Multi-modality: visual and auditory input and feedback

Research on L2 phonological acquisition aided by technology has focused increasingly on suprasegmentals (Chun 1998, 2002; Jenkins 2004; Hardison 2005; O’Brien 2006; Hincks and Edlund 2009; Tanner and Landon 2009; Warren et al. 2009). However, as Thomson and Derwing (2015: 330) noted, of the computer-assisted pronunciation training (CAPT) studies they reviewed, only 28% focused on suprasegmentals, compared with 59% on segmentals and 13% on both. In addition, the meta-analysis by Lee et al. (2015: 355) found that computer-provided treatments yielded small effects compared with treatments provided by teachers.

Other approaches to teaching L2 prosody include the use of drama and shadowing. These integrated approaches combine awareness raising, perception, production, and multimodality. Goodwin (2012) advocates using scenes from movies or TV shows to practise pronunciation. Learners transcribe the scene, marking phrase breaks, prominent syllables, intonation, other prosodic features, and corresponding gestures. They then try to match the prosody, gestures, and segmental features of the characters. This top-down approach to learning pronunciation includes controlled practice with authentic models of speech, but it does not focus on individual features out of context. As such, it promotes pronunciation learning holistically. Better use of ‘shadowing’ is made possible by the ready availability of natural speech and the ease with which learners can practise with a focus on prosody (Meyers 2013). Foote and McDonough (2017) examined the effect of shadowing using technology-based practice (iPods) over eight weeks, using scenes from sitcoms and transcripts of the texts. The self-directed instructional routine resulted in improved ratings of comprehensibility and fluency but not of accentedness. In another study, Galante and Thomson (2016) examined the use of drama for instruction with English learners in Brazil.
Drama activities included scripted scenarios and role-plays, as well as the study, rehearsal, and presentation of a short play or scene. Classes met twice a week for four months, and approximately half of each two-hour class period was taken up by drama-based practices. Galante and Thomson found that drama-based instruction, although it did not improve ratings of accentedness, improved oral fluency more than communicatively based instruction did, and that it also led to significant, albeit smaller, gains in comprehensibility.

CAPT for prosody can include relatively simple digital tools, such as digital video, and more sophisticated programmes, such as automatic speech recognition and visualization of the speech signal (via spectrograms, waveforms, fundamental frequency, or pitch contours). Such visualizations are available to anyone with access to a computing device (a computer or a mobile device) and the Internet. In the early 1980s, visual displays of pitch were already being used and studied (de Bot and Mailfert 1982).

Figure 45.5 Waveforms and pitch curves of jìn lái ‘come in’ produced by a female native speaker (left) and a female student (right).

Commercial software, such as Visi-Pitch (by Kay Elemetrics), was found to be helpful to learners of L2 French (Hardison 2004): native English-speaking learners of French received three weeks of training with immediate visual feedback on both idiomatic and non-idiomatic uses of French intonation. The accuracy of their intonation improved significantly, and the improvement transferred to novel sentences. Practice also led to improvement in identifying the trained sentences under filtered conditions. As in Hirata (2004), relatively controlled training at the sentence level with visual feedback led to strong prosody learning. Visi-Pitch was also used by Molholt and Hwu (2008) to demonstrate how visualizations of pitch could be used in the teaching of English, Spanish, and Chinese. Figure 45.5 shows two renditions of the Chinese sentence jìn lái ‘come in’; on the left is the native speaker’s recording and on the right is the L2 learner’s speech. These visualizations were created with the open-source software Praat (Boersma and Weenink 2014). An example of prosody-teaching software for a language other than English is the Intelligent Language Tutoring System with Multimodal Feedback Functions (Euronounce) for German as an L2, aimed at native speakers of Polish, Slovak, Czech, and Russian (Demenko et al. 2010). Levis and Pickering (2004) investigated sentence-level and discourse-level intonation and provided specific examples of how the systematic meanings of pitch movement can be described transparently to learners (using the Computerized Speech Laboratory, also from Kay Elemetrics). Figure 45.6 shows how intonation in English can signal the speaker’s discourse expectations.
The left portion of the figure shows the utterance ‘You know why, don’t you?’ with rising intonation at the end of the first clause (‘You know why’) and falling intonation at the end of the second (‘don’t you?’); here the questioner assumes that the listener knows why and simply seeks confirmation (Chun 2002: 220–221). In contrast, the right portion of the figure depicts the same words, ‘You know why, don’t you?’, but in this rendition the speaker uses falling intonation at the end of the first clause and rising intonation at the end of the second. This pattern might be used when the speaker thinks that the listener knows why but has doubts and wishes to confirm this.
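Pitch displays such as those in Figures 45.5 and 45.6 rest on f0 tracking. As a hedged illustration, the Python sketch below estimates f0 with a plain autocorrelation method (Praat’s standard pitch analysis is also autocorrelation-based but far more refined; this is not Praat’s algorithm) and then labels the end of a clause as rising, falling, or level, the distinction that separates the two renditions of ‘You know why, don’t you?’. All function names, the 75–500 Hz search range, the 0.3 s clause-final window and the 1-semitone threshold are illustrative assumptions.

```python
import numpy as np

def frame_f0(frame, sr, fmin=75.0, fmax=500.0, voicing_threshold=0.5):
    """Estimate f0 (Hz) of one frame from its autocorrelation; 0.0 = unvoiced."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0.0:
        return 0.0                                  # silent frame
    ac = ac / ac[0]                                 # lag 0 normalised to 1
    lo, hi = int(sr / fmax), int(sr / fmin)         # candidate periods (samples)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))        # strongest periodicity
    return sr / lag if ac[lag] >= voicing_threshold else 0.0

def pitch_track(signal, sr, frame_s=0.04, hop_s=0.01):
    """Frame-by-frame f0 contour (Hz), one value every hop_s seconds."""
    n, hop = int(sr * frame_s), int(sr * hop_s)
    return np.array([frame_f0(signal[i:i + n], sr)
                     for i in range(0, len(signal) - n + 1, hop)])

def semitones(f0_hz, ref_hz=100.0):
    """Convert Hz to semitones relative to ref_hz (12 semitones per octave)."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def final_movement(f0_hz, hop_s=0.01, tail_s=0.3, threshold_st=1.0):
    """Label the last tail_s seconds of voiced f0 as 'rise', 'fall' or 'level'."""
    f0 = np.asarray(f0_hz, dtype=float)
    t = np.arange(len(f0)) * hop_s
    voiced = f0 > 0
    t, st = t[voiced], semitones(f0[voiced])
    tail = t >= t[-1] - tail_s                      # clause-final window
    slope = np.polyfit(t[tail], st[tail], 1)[0]     # semitones per second
    change = slope * (t[tail][-1] - t[tail][0])     # fitted change over window
    if change > threshold_st:
        return "rise"
    if change < -threshold_st:
        return "fall"
    return "level"

# Toy checks: a steady 200 Hz tone, then two synthetic f0 contours.
sr = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)
print(np.unique(pitch_track(tone, sr)))             # [200.]
print(final_movement(np.linspace(180, 260, 50)))    # rise
print(final_movement(np.linspace(260, 180, 50)))    # fall
```

Measuring the movement in semitones rather than Hz makes the same threshold usable for lower and higher voices; applied separately to each clause, a real contour like the left rendition in Figure 45.6 would ideally yield ‘rise’ for the first clause and ‘fall’ for the tag.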


Figure 45.6 Different renditions of the question ‘You know why, don’t you?’.

Figure 45.7 A screenshot of the Streaming Speech software (three renditions of // COURSE of STUdy do you think //, labelled mid + high, mid + mid, and mid + low).

An online programme for teaching prosody (no longer available) was Cauldwell’s (2002b) Streaming Speech, based on Brazil’s (1997) theory of discourse intonation. The speech samples were all unscripted narratives that had been extensively repurposed for pedagogical use, and the exercises targeted both perception and production. Learners could observe schematically drawn pitch changes, with stressed words and syllables shown in capitals and with diacritics such as arrows indicating pitch direction (see Figure 45.7).

45.5 Conclusion

This chapter has outlined the importance of prosody for spoken comprehensibility and intelligibility and has presented research suggesting that prosody training is effective in improving L2 speakers’ perception and production of prosodic features. Instruction in prosody has been based on theories of how L2 learners perceive and produce speech, as well as on empirical evidence relating to different methodologies for teaching prosodic elements, including both low-tech and high-tech tools. For the future, we expect greater attention to prosodic learning that takes into account the importance of L1 parameters and L2 learning principles.

OUP CORRECTED PROOF – FINAL, 06/12/20, SPi

Part VIII

PROSODY IN TECHNOLOGY AND THE ARTS


chapter 46

Prosody in Automatic Speech Processing

Anton Batliner and Bernd Möbius

46.1 Introduction

We understand ‘automatic speech processing’ (ASP) to comprise word recognition (automatic speech recognition (ASR)), processing of higher linguistic components (syntax, semantics, and pragmatics), and computational paralinguistics (CP). This chapter attempts to describe the role of prosody in ASP from the word level up to the level of CP, where the focus was initially on emotion recognition and later expanded to the recognition of health conditions, social signals such as back-channelling, and speaker states and traits (Schuller and Batliner 2014). ‘Automatic processing’ of prosody means that at least part of the processing is done by the computer. The automatic part can be small, for example pertaining only to pitch extraction, followed by manual correction of the fundamental frequency (f0) values and subsequent automatic computation of characteristic values such as mean, minimum, or maximum. This is typically done in basic, possibly exploratory, research on prosody and in studies aiming to evaluate certain models and theories. Fully automatic processing of prosody, on the other hand, is necessary when prosody is employed in conjunction with other information in a larger context, such as a prosody module in a complete speech-to-speech dialogue system, or a stand-alone tool that improves the speech of pathological speakers or foreign language learners via screening, monitoring, and feedback on learning progress. Apart from the phenomena to be investigated (such as prosodic parameters, emotions and affects, speaker states and traits, and social signals; for details see §46.2.2) and the speech data to be recorded, the basic ingredients of automatic processing of prosody are (i) the units of analysis, suited to both the phenomenon and the type of features we employ; (ii) the features to be extracted; and (iii) machine learning (ML) procedures that tell us how good we are (i.e.
which classification performance we obtain) and, if relevant, which features are most important, and for which units. The units of analysis in the processing of prosody may be implicit (e.g. an entire speech file), temporally defined (e.g. segments of five seconds or one tenth of the entire speech file), or obtained via pre-processing, such as voice activity detection (e.g. using silence as an indicator of major prosodic/syntactic boundaries), ASR yielding word boundaries, syntactic parsing that generates phrase and sentence boundaries, or a combination of these strategies. Many ML procedures have been employed for the processing of prosody in ASP. Generally speaking, traditional, well-established procedures, such as linear classifiers and decision trees, tend to yield somewhat lower performance but more interpretable models than more recently developed procedures, such as deep neural networks, which tend to yield better results on larger data sets. Additionally, more controlled data, such as read speech, is likely to yield better performance than spontaneous speech. This point may seem trivial but is worth stressing, since comparisons across different types of speech data are not uncommon. Strictly speaking, the performance obtained by, for example, different ML procedures can only be compared on the very same data used in the same way, including, for instance, identical partitioning into training, development, and test sets. Evaluations of the role of prosody in ASP have focused on two issues: performance and importance. Performance can be measured: typically, the result is a numerical value between 0 and 1.0 (the higher, the better) or can be mapped onto such a value (Schuller and Batliner 2014). Importance is not as easy to define: it can mean importance for a model or theory, or importance for specific applications, therapies, or treatments. Nowadays, performance is the preferred measure in ASP. However, an equally important issue, often mentioned in introductory or concluding remarks, is to identify salient parameters (pitch, intensity, duration, voice quality) or features characterizing these parameters (see more on this in §46.3).
In this chapter, we first present a short history of the field (§46.2), including a timeline in §46.2.1 and an overview of the phenomena addressed in the field and performance obtained in §46.2.2. We then describe the main aspects of prosodic features and feature types used in ASP in §46.3, introducing two concepts: ‘power features’ in §46.3.1 and ‘leverage features’ in §46.3.2. We then illustrate these concepts in §46.3.3, which is followed by concluding remarks in §46.4.
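Two of the ingredients named in §46.1, units of analysis and characteristic values such as mean, minimum and maximum, can be illustrated with a small Python sketch. It assumes a frame-level f0 track from some pitch extractor, with 10 ms frames and 0 marking unvoiced or silent frames (our convention), and uses long pauses as unit boundaries, in the spirit of silence as a boundary indicator. The function names and the 20-frame (200 ms) pause threshold are hypothetical; real systems typically rely on energy-based voice activity detection or ASR word boundaries and compute far more features.

```python
import numpy as np

def pause_units(f0_hz, min_pause_frames=20):
    """Split a frame-level f0 track into units separated by long pauses.

    Frames with f0 == 0 are treated as unvoiced or silent. Short gaps (such
    as voiceless consonants) stay inside a unit; a run of at least
    min_pause_frames such frames closes it. Returns (start, end) frame
    indices, end exclusive; every unit starts and ends on a voiced frame.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    units, start, gap = [], None, 0
    for i, value in enumerate(f0):
        if value > 0:
            if start is None:
                start = i                           # first voiced frame of a unit
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_pause_frames:
                units.append((start, i - gap + 1))  # end after last voiced frame
                start, gap = None, 0
    if start is not None:                           # close a unit still open at the end
        units.append((start, len(f0) - gap))
    return units

def functionals(f0_hz, units):
    """Mean, minimum, maximum and range of the voiced f0 values in each unit."""
    f0 = np.asarray(f0_hz, dtype=float)
    rows = []
    for start, end in units:
        segment = f0[start:end]
        voiced = segment[segment > 0]
        rows.append({"mean": float(voiced.mean()), "min": float(voiced.min()),
                     "max": float(voiced.max()),
                     "range": float(voiced.max() - voiced.min())})
    return rows

# Toy f0 track (Hz, 10 ms frames): two stretches of speech separated by 250 ms of silence.
track = [0] * 5 + [100, 120, 0, 140] + [0] * 25 + [200, 220] + [0] * 3
units = pause_units(track)
print(units)                                        # [(5, 9), (34, 36)]
print(functionals(track, units))
```

Temporally defined units (e.g. five-second windows) would simply replace `pause_units` by fixed slicing; the functionals step stays the same.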

46.2 A short history of prosody in automatic speech processing

46.2.1 Timeline

The history of prosody in ASP started with pioneering studies on the prerequisites for automatic processing of prosody, such as Lieberman (1960: 451) on ‘a simple binary automatic stress recognition program’¹ and Mermelstein (1975) on ‘automatic segmentation of speech into syllabic units’. The speech material analysed in these studies consisted of prosodic minimal pairs and elicited, carefully read speech. This was (and quite often still is) the usual procedure for excluding the multifarious confounding factors encountered in real-life situations. This approach, typical of basic research, was adopted by early attempts at incorporating prosodic knowledge in ASP. Table 46.1 gives an overview of research on prosody in ASP over the past 40 years. Most of the studies conducted in the earlier period can be characterized by the components in the left column and most of the studies from the later period by the components in the right column. The entries under ‘integration’ in Table 46.1 denote a gradual transition from studies in which prosody is processed alone (stand-alone) and as the sole topic (intrinsic), and is thereby visible, to studies in which prosody is used jointly with other parameters in an integrated way, towards some extrinsic goal (i.e. some application), so that prosody becomes invisible as a contributing factor. Early studies that laid the foundations for prosody in ASP in the 1980s include Lea (1980), Vaissière (1988, 1989), and Batliner and Nöth (1989). The year 2000 can be viewed as a turning point: the classical approaches culminated in a functional prosody module in an end-to-end system (Batliner et al. 2000a), and new approaches emerged with a focus on the processing of paralinguistics, starting with emotion recognition (Batliner et al. 2000b). Approaches from the earlier years nevertheless continued to be pursued after 2000, but to a lesser extent. Table 46.1 can be seen as a set of building blocks: any ‘component’ in the chain of processing (alone or in combination with some other component) from one of the cells (1–6) can be combined. Normally, only cells from the left or cells from the right are combined with each other, unless the aim is to compare methodologies (see, for instance, Batliner et al. 2000c).

¹ Lieberman already pointed out the incompleteness of the set of prosodic features used, and that prosody is characterized both by the presence of redundant information and by trading relations between different features.

46.2.2 Phenomena and performance

In this section we take a closer look at the phenomena addressed in past studies on prosody in ASP (Table 46.1) and at the performance obtained for them. This is intended as a compact narrative overview rather than a systematic meta-review. In the second phase (after 2000), prosodic features were mainly used together with other features, especially spectral (cepstral) ones. It is therefore important to keep in mind that performance measures are usually not obtained using prosodic features alone. In the 1990s, speech processing focused narrowly on the role of word and phrase prosody (accents and boundaries), intonation models,² syntax (parsing) based on prosodic models, semantics (salience), and segmentation and classification of dialogue acts. This trend went in tandem with the general development of automatic speech and language processing systems, which moved from read speech to less controlled speech in more natural situations, leading to conversational speech and dialogue act modelling. In the first phase (before 2000), mostly only prosodic features, sometimes enriched with features from higher linguistic levels, were used; see the reviews of state-of-the-art systems in Shriberg and Stolcke (2001) and Batliner et al. (2001), as well as Price et al. (1991), Wang and Hirschberg (1992), and Ostendorf et al. (1993). This line of inquiry continued to be pursued after the turn of the century but was complemented and essentially replaced by a strong focus on paralinguistics, starting with emotion recognition (Dellaert et al. 1996) and eventually extending to all kinds of speaker states and traits, including long-term traits, such as age, gender, group membership, and personality; medium-term traits, such as sleepiness and health state; short-term states, such as emotion and affect (e.g. stress, uncertainty, frustration); and interactional/social signals.

² We use ‘intonation’ in a narrower sense, comprising only pitch plus delimiters of pitch configurations (boundaries), and ‘prosody’ in a wider sense, comprising pitch and duration (rhythm), loudness, and voice quality, too.

Table 46.1 Prototypical approaches in research on prosody in automatic speech processing over the past 40 years (1980–2020), with the year 2000 as a turning point from traditional topics to a new focus on paralinguistics

1. Motivation
Left (c. 1980–2000): Getting wiser; basic knowledge; deciding between theoretical constructs; models/theories
Right (c. 2000–2020): Getting better; successful performance/intervention; applications
2. Phenomena
Left: Phonetics/linguistics (speech): accents, boundaries, dialogue acts; parsing, dialogue, speaker adaptation/verification/identification; ...
Right: Paralinguistics (speaker): states (emotion, pain, etc.) and traits (personality, ethnicity, etc.); diagnostics/teaching/therapy; towards ‘direct’ systems; representation (raw audio in–classes out) ± intermediate levels such as tone representation
3. Data
Left: Controlled, constructed; ‘interesting’ phenomena; prompted/acted; lab recordings; one (a few) speaker(s); small segments (units of analysis trivially given)
Right: Less restricted data (more speakers, noisy environment); more spontaneous; from lab to real life; big data; segmentation/chunking into units of analysis necessary
4. Features
Left: A few theoretically and/or empirically motivated; only intonational (tunes, pitch patterns, e.g. ToBI); only prosodic (pitch/loudness/duration plus/minus voice quality); syntactic features; speech only (uni-modal)
Right: Many (brute forcing) low-level descriptors and functionals; together with other types (spectral (cepstral)); all kinds of linguistic features; multi-modal (together with facial and body gestures)
5. Procedures
Left: ‘Traditional’ (k-nearest-neighbour, linear classifiers, decision trees, artificial neural networks); feature selection/reduction
Right: ‘Modern’ ones (support vector machines, ensemble classifiers (random forests)); all varieties of deep neural networks; feature selection/reduction not necessary
6. Utilization
Left: Within theory: interpretability, deciding between alternatives, explicit modelling; within applications: employed for syntactic/semantic ‘pre-processing’
Right: Performance; applications: e.g. semantic salience, states and traits; big data, data mining; (towards) implicit modelling of prosody
7. Integration
Left: Stand-alone, intrinsic, visible
Right: → Integrated, extrinsic, not visible

The successful incorporation of a prosody module into the end-to-end translation system VERBMOBIL (Batliner et al. 2000a; Nöth et al. 2000) has highlighted the impact that prosody can have on ASP.³ However, such integration comes at a cost, as described in Spilker et al. (2001) for speech repairs and in Streit et al. (2006) for modelling emotion. The interaction of the prosody module with other modules is highly complex and to some extent unstable. In general, the modular and partly knowledge-based design of such systems gave way to an integrated ML approach, which proved successful in subsequent years: in a state-of-the-art paper on conversational speech recognition (Xiong et al. 2017), prosody is not even mentioned. This might be the main reason why the focus of prosody research in ASP, and concomitantly the visibility of prosody in ASP, has shifted to the domain of paralinguistics, whereas ASP (and especially ASR) systems today employ prosodic information, if at all, in a rather implicit way, for instance by using prosodic features in combination with all kinds of other features in a large, brute-force feature vector. Yet there are many studies concerned with the assessment of non-nativeness or specific speech pathologies that address the impact of prosodic features, aiming at identifying the (most) important features; see §46.3.⁴

The implementation of the Tones and Break Indices (ToBI) model (Silverman et al. 1992) in ASP nicely illustrates how a genuinely phonological-prosodic approach was harnessed but eventually abandoned by ASP. One of the aims of ToBI was to foster close collaboration between prosody researchers and engineers (Silverman et al. 1992). Especially during the 1990s, researchers tried to employ ToBI categories in mainstream ASP. However, using tonal categories as features in ML procedures introduces a quantization error by reducing detailed prosodic information to only a few parameters (Batliner and Möbius 2005).
A reduced set of ToBI labels (a ‘light’ version proposed by Wightman (2002), based on results from perception experiments, that recognizes classes of tones and breaks instead of the full set of ToBI labels) actually corresponded closely to the labels used in the VERBMOBIL project (Batliner et al. 1998). In other words, a functional model based on the annotation and classification of perceived accents and syntactic-prosodic boundaries should be preferred to a formal model relying on the annotation and classification of intonational forms, that is, pitch configurations with delimiters (break indices as quantized pauses) that lack a clear-cut relationship to functions. In Table 46.2, we report the performance obtained for a selection of representative phenomena, ordered vertically from linguistic to paralinguistic phenomena and from more basic to more complex ones, largely corresponding to the entries listed under ‘phenomena’ in Table 46.1. Performance depends on a plethora of factors, such as the type of data and the features employed. Moreover, it makes a big difference whether ‘weighted average recognition’ (WAR) or ‘unweighted average recognition’ (UAR) is used.⁵ Instead of presenting exact figures, we map the figures onto ranges of performance.

³ Syntactic-prosodic boundary detection reduced the search space for parsing considerably, yielding tolerable response times. This was a limited yet pivotal contribution.
⁴ Shriberg (2007) gives an overview of higher-level (including prosodic) features in the field of automatic speaker recognition. Schuller and Batliner (2014: chs. 4, 5) survey studies on CP, again including prosodic ones.
⁵ WAR is the percentage of all items classified correctly; its chance level is the frequency in per cent of the most frequent class. UAR is the mean of the class-wise recognition rates (the diagonal of the row-normalized confusion matrix) in per cent; its chance level is always 50% for two classes, 33.3% for three classes, and so on. UAR was introduced in the VERBMOBIL project as the ‘average of the classwise recognition rates’ (Batliner et al. 1998: 216), to facilitate a comparison of performance across results with different numbers of syntactic-prosodic boundary classes (skewed class distributions, up to 25



Table 46.2 Phenomena and performance: a rough overview (qualitative performance terms appear in italics)

Word recognition: prosody contributes little (low performance)
Lexicon (word accent, stress): roughly the same performance as for accents
Accents: phrase (primary, sentence) accent: medium to good; secondary accents markedly worse
Boundaries: major and minor boundaries, purely prosodic and/or syntactic; major boundaries good, sometimes excellent; minor boundaries worse; boundaries can be classified better than accents because they display a more categorical distribution
Syntactic parsing: based on accent and boundary detection; successful
Sentence mood: mainly statement vs. question, but others as well (imperative, subjunctive, etc.); depends on the type of sentence mood: questions vs. statements medium to good
Semantic salience (topic spotting): cf. accents above: islands of reliability, salient topics; closely related to phrase accent processing
Dialogue acts: cf. sentence mood above; sometimes good if pronounced, e.g. back-channelling with duration (here, duration is not really a prosodic feature but simply reflects the fact that back-channellings normally consist of very short words)
Agrammatical phenomena: filled/unfilled pauses, false starts, hesitations: low to good
Biological and cultural traits: sex/gender (pitch register): good to very good
Personality traits: Big Five or single traits; depends on the trait to be modelled: good for those that display clear acoustic correlates such as loudness (extraversion), low for others
Emotional/affective states: same as for personality; arousal good, valence rather low (especially if only acoustic features are used); emotions that display pronounced acoustic characteristics can be classified better, cf. anger vs. sadness; yet anger with high arousal can be confused with happiness with high arousal
Typical vs. atypical speech: pathological speech, non-native speech, temporarily deviant speech (duration (non-natives), rhythm, loudness (Parkinson’s condition)); good, almost on par with single human expert annotators for the assessment of intelligibility/naturalness
Discrepant speech: irony/sarcasm, deceptive speech (lying): medium for controlled speech, but very low for uncontrolled speech; off-talk (speaking aside): medium to good
Entrainment/(phonetic) convergence: mutual adaptation of speakers in conversational settings, employing many of the above-mentioned phenomena
Social/behavioural signals: modelling of speakers in interactional/conversational settings, employing many of the above-mentioned phenomena
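The two measures defined in note 5 can diverge sharply when classes are skewed, as they are for boundary vs. no boundary. A minimal Python sketch (the function name `war_uar` and the class counts are invented for illustration, not results from any study) computes both from a confusion matrix:

```python
import numpy as np

def war_uar(confusion):
    """Return (WAR, UAR) in per cent for a square confusion matrix.

    Rows are reference classes, columns are predicted classes; every class
    is assumed to occur at least once in the reference.
    WAR: correctly classified items / all items (weighted by class size).
    UAR: unweighted mean of the class-wise recognition rates (recall).
    """
    cm = np.asarray(confusion, dtype=float)
    war = 100.0 * np.trace(cm) / cm.sum()
    class_rates = np.diag(cm) / cm.sum(axis=1)      # recall of each class
    uar = 100.0 * class_rates.mean()
    return float(war), float(uar)

# Skewed two-class example: 900 'no boundary' items, 100 'boundary' items.
# Always predicting the majority class gives a high WAR but chance-level UAR.
majority_only = [[900, 0],
                 [100, 0]]
print(war_uar(majority_only))                       # (90.0, 50.0)
```

Here a WAR of 90% merely equals the WAR chance level (the share of the most frequent class), whereas UAR sits at its two-class chance level of 50%, which is why UAR allows fairer comparisons across tasks with different class distributions.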

Following Coe (2002), UAR for a two-class problem with 50% chance level is given in per cent, followed by Pearson’s r in parentheses: excellent: >90% (>0.80); good: 80–90% (0.63–0.80); medium: 70–80% (0.46–0.63); low: 60–70% (0.24–0.46); very low: <60% (<0.24).

P´ (18)
a. Constraints requiring stress in Strong positions: P > M > M × × ´ >M
b. Constraints requiring absence of stress in Weak positions: P´ > M >P

Variation in English meter provides abundant support for this hierarchy. One example must suffice here: the types of extrametrical syllable permitted in different varieties of English iambic pentameter. The most restrictive system (19a), found in early Marlowe (Schlerman 1989: 200), allows only unstressed syllables of polysyllabic words. The less restrictive system adopted by Marlowe in his later work (19b) also allows function words (Schlerman 1989: 202). Shakespeare’s plays (but not his sonnets) allow extrametrical stressed monosyllables of compound words, as in (19c) (Kiparsky 1977). The least restrictive system, seen in Jacobean dramatists such as Fletcher (19d), allows even phrasal peaks in extrametrical positions.

(19) a. Because | it is | my coun|tries and | my Fathers (Tamburlaine 2.4.124)
b. This sport | is ex|cellent; | we’ll call | and wake him (Dr. Faustus 4.2.124)
c. Quite o|verca|nopied | with lus|cious woodbine (MND 2.1.251)
d. Ten pound | to twen|ty shil|lings, within | these three weeks (The Tamer Tam’d 1.1.71)

In Shakespeare’s blank verse, the feet are most commonly grouped as 2+3, with a caesura after the fourth syllable. (20) (from Keppel-Jones 2001: 234) shows the percentages of caesuras after each position.

(20) 1. 2. 3. 4. 5. 6. 7. 8. 9.

0.3 2.8

5.3 20.1

3.6 0.3

12.0

32.7 22.9

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

In the later plays, the caesura after the sixth syllable becomes more common (Tarlinskaja 2014: 164). This suggests that the basic structure of Shakespeare’s line shifts from (21a) to (21b).⁷

(21) a. [metrical tree diagram]
b. [metrical tree diagram]

In each case, the line consists of two hemistichs, one consisting of a single dipod, the other of two dipods, one of which has a catalectic (empty) foot. As with any option, the permissible variation in the position of the caesura in successive lines must be fully exploited (Golston 2009). In neoclassical iambic pentameter distichs (heroic couplets), a medial caesura after the fifth position becomes more common. A caesura in the middle of a foot constitutes improper bracketing, which violates Strict Layering (5), but dividing the line into two constituents of equal size could reflect a Parallelism constraint.8 In the nineteenth century, the caesura begins to move still further rightward; Browning favours it after the sixth position in his early work, and later increasingly even after the seventh.

⁷ The right-branching in (21a) might reflect the general stylistic long-last preference, as in drawn and ready; soft and delicate; friends, Romans, countrymen; let her rot, and perish, and be damned to-night (Ryan 2019).
⁸ Cf. Lerdahl and Jackendoff (1983: 51): ‘When two or more segments of the music can be construed as parallel, they preferably form parallel parts of groups.’

48.5 Quantitative meters

Meters in which prominence is marked by quantity challenge the claim that all metrical structure is rhythmic. While stress is uncontroversially structured as in (3), the hierarchical nature of syllable weight is less obvious. Moreover, some quantitative meters look at first sight like fixed aperiodic sequences of heavy and light syllables, with no obvious rhythm. But recent work has shown that quantitative meters do have a hierarchical rhythmic organization. Exactly like stress-based meters, they either require prominence in Strong positions or prohibit prominence in Weak positions, or both, but with prominence assessed by syllable weight rather than by stress. A one-mora unit is light, whereas a unit of two or more morae is heavy. Accordingly, in quantitative meters, Strong positions may be distinguished by requiring bimoraic feet or syllables, and conversely Weak positions may be distinguished by prohibiting them. The typology is thus given by the menu of constraints shown in (22). Note that a meter in which both (22a2) and (22b1) are enforced is isosyllabic, since every Strong position is then a single heavy syllable and every Weak position a single light syllable.

(22) a. Strong positions
1. Must be a bimoraic foot (less restrictive)
2. Must be a bimoraic syllable – (more restrictive)
b. Weak positions
1. Cannot be a bimoraic foot (more restrictive)
2. Cannot be a bimoraic syllable – (less restrictive)

The various binary meters of Greek and Latin exploit all these options (for discussion and references see Kiparsky 2018b).

(23) a. Strict iambic verse, with feet of the form (⏑ –): (22a2), (22b1)
 b. Iambic with resolution in S, with feet of the form (⏑ –) or (⏑ ⏑⏑): (22a1), (22b1)
 c. Iambic with resolution in S and split W, with feet of the form (⏑ –), (⏑ ⏑⏑), (⏑⏑ –), or (⏑⏑ ⏑⏑): (22a1), (22b2)

In ternary quantitative meters, both S and W are bimoraic trochees, and the correspondence constraints on positions determine the distribution of their monosyllabic and disyllabic realizations.

The parallelism between quantitative and stress meters extends to larger units. Quantitative meters typically group their feet into dipods, one of which is more stringently constrained than the other, mirroring the corresponding asymmetry between Strong and Weak dipods in stress-based meters. For example, the ‘trochaic tetrameter’ of classical Greek is really a headless iambic tetrameter of the form (1), where the weak feet of each dipod are special: the first one is empty and the others (positions 5, 9, and 13) are quantitatively indifferent. In addition, this meter allows the type of resolution permitted by correspondence constraint (22a1) except in the last dipod, which is the strongest. The half-line boundary plays a role in the placement of the caesura.

More evidence that quantitative alternations are rhythmic comes from meters in which syllable weight collaborates with stress or pitch in marking prominence. The same metrical positions that (22a) identifies as strong then meet additional prominence requirements. For example, the first position of a quantitative dactyl, which requires a heavy syllable, is Strong, and the second position, which may be occupied by two light syllables, is Weak. Correspondingly, in the last dipod of Latin hexameters, the S positions are nearly always stressed and the W positions unstressed (Ryan 2019). Thus, all the hierarchical groupings in (4) are manifested in classical quantitative meters.
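The constraint menu in (22) and the meters it defines can be checked mechanically. The Python sketch below is an illustration under simplifying assumptions, not part of the analysis itself: a syllable is encoded by its mora count (light = 1, heavy = 2), a metrical position is realized by one light, one heavy, or two light syllables, and the names `CONSTRAINTS`, `METERS`, and `allowed_feet` are hypothetical. Enumerating all iambic (W S) feet and filtering them by each meter's constraints yields that meter's set of permitted foot shapes.

```python
from itertools import product

# Hypothetical encoding: a syllable is its mora count, and a metrical
# position is realized by a tuple of one or two syllables.
LIGHT, HEAVY = 1, 2
CANDIDATES = [(LIGHT,), (HEAVY,), (LIGHT, LIGHT)]


def bimoraic_foot(position):
    """True if the position has two morae: one heavy or two light syllables."""
    return sum(position) == 2


def bimoraic_syllable(position):
    """True if the position is a single heavy syllable."""
    return position == (HEAVY,)


# The menu in (22); each check receives the Strong and Weak realizations.
CONSTRAINTS = {
    "22a1": lambda strong, weak: bimoraic_foot(strong),
    "22a2": lambda strong, weak: bimoraic_syllable(strong),
    "22b1": lambda strong, weak: not bimoraic_foot(weak),
    "22b2": lambda strong, weak: not bimoraic_syllable(weak),
}

# The three iambic meters of (23), defined by the constraints they enforce.
METERS = {
    "23a": ("22a2", "22b1"),
    "23b": ("22a1", "22b1"),
    "23c": ("22a1", "22b2"),
}


def notation(position):
    """Render a position with '–' for a heavy and '⏑' for a light syllable."""
    return "".join("–" if mora == HEAVY else "⏑" for mora in position)


def allowed_feet(meter):
    """All iambic (W S) feet that satisfy every constraint of the meter."""
    feet = []
    for weak, strong in product(CANDIDATES, repeat=2):
        if all(CONSTRAINTS[name](strong, weak) for name in METERS[meter]):
            feet.append(f"{notation(weak)} {notation(strong)}")
    return feet


for meter in METERS:
    print(meter, allowed_feet(meter))
```

Under this encoding, 23a admits only ⏑ –, 23b adds ⏑ ⏑⏑, and 23c admits ⏑ –, ⏑ ⏑⏑, ⏑⏑ –, and ⏑⏑ ⏑⏑. Every 23a foot has exactly one syllable per position, which is the isosyllabicity of a meter enforcing both (22a2) and (22b1).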
Musical performance also corroborates the rhythmic character of quantitative meter. Numerous traditions of quantitative versification preferentially place metrically Strong syllables into musically prominent positions (see §48.6); none are known in which musical rhythm is indifferent to quantity.

668 Paul Kiparsky

Grouping into feet has been questioned particularly in meters based on moraic trochees. Japanese meter has been analysed as mora-counting, but compelling evidence has now been found that it is a tetrameter with two-mora feet and catalexis. In particular, the haiku 5-7-5 has an underlying structure of three eight-mora lines, each consisting of four two-mora feet, with catalexis of all line-final moras, and of the final foot of odd-numbered lines.9

(24)

[Metrical tree not reproduced: S and W positions of the three lines, with silent positions marked Ø.]
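The gap between the eight-mora template and the 5-7-5 count of uttered moras can be made concrete in a few lines of Python. The templates below encode one reading of (24), assumed here for illustration: in odd-numbered lines the final foot and the mora just before it are silent, and in the even-numbered line only the line-final mora is silent. In the strings, `x` marks an uttered mora and `0` a silent one; all names are hypothetical.

```python
# One reading of (24), assumed for illustration: each line has eight mora
# positions (four two-mora feet); "x" is an uttered mora, "0" a silent one.
ODD_LINE = "xxxxx000"   # silent: the mora before the final foot, and the final foot
EVEN_LINE = "xxxxxxx0"  # silent: only the line-final mora
HAIKU = [ODD_LINE, EVEN_LINE, ODD_LINE]


def feet(template):
    """Split a line template into its four two-mora feet."""
    return [template[i:i + 2] for i in range(0, len(template), 2)]


def counts(template):
    """Return (number of positions, number of uttered moras) for a line."""
    return len(template), template.count("x")


for line in HAIKU:
    positions, uttered = counts(line)
    print(" ".join(feet(line)), f"{positions} positions, {uttered} uttered")
```

Each line fills eight positions, while the uttered moras alone give the surface 5-7-5 count.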

The unpronounced moras are realized in performance as silences of the appropriate length to complete the full eight-mora count of each line (Gilbert and Yoneoka 2000; Kawamoto 2000: 175–179; Cole and Miyashita 2006; Yamamoto 2011). A simple 5-7-5 count of just the uttered moras in the text would actually misrepresent the meter. Empty beats challenge bottom-up parsing theories of meter, notably that of Fabb and Halle (2008), which is designed to parse only what is present in the text and explicitly rejects the relevance of performance to meter.

The Somali quantitative masafo meter has received non-rhythmic analyses, but Fitzgerald (2006) has shown that it is actually iambic tetrameter, made up of four iambic dipods with a medial break, tied together by alliteration. Each foot is a quantitative iamb of the type familiar from the theory of stress, consisting maximally of a Light-Heavy pair of syllables (three morae), minimally of a Light-Light pair or a Heavy syllable (two morae). CV and CVC count as Light (monomoraic), whereas CVV(C) is Heavy (bimoraic). (25) shows a distich quoted from Banti and Giannattasio (1996), here parsed metrically into iambs in accordance with Fitzgerald’s theory.

(25) a. Masa|la gaa|bya diin| naa || meel | daran | idiin | dhigay.
 Mid mid | waxow | tiraah|daan || niman|ka moo|radii | furay.
 ‘The unlearned of you taught you a bad lesson. Tell each of the men who opened the pen.’
 b. First hemistich: [metrical tree with feet (ϕ), syllables (σ), and morae (µ) not reproduced]
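Fitzgerald’s weight classes and permitted foot shapes can be checked against the first hemistich of (25a). In the Python sketch below, which is an illustration rather than Fitzgerald’s own procedure, syllable weight is read off the spelling under a simplifying assumption (a doubled vowel letter marks CVV(C), i.e. Heavy; everything else counts as Light), diphthongs are left aside, and the syllabification of the hemistich is supplied by hand; the names `weight` and `scan` are hypothetical.

```python
import re

# Simplified orthographic heuristic, assumed for illustration: a syllable with
# a doubled vowel letter (CVV, CVVC) is Heavy; any other (CV, CVC) is Light.
# Diphthongs such as "ay" or "ow" are not handled.
LONG_VOWEL = re.compile(r"aa|ee|ii|oo|uu")

# Permitted quantitative iambs: Light-Heavy (three morae) at most,
# Light-Light or a single Heavy (two morae) at least.
IAMBS = {"LH", "LL", "H"}


def weight(syllable):
    """Return 'H' for a Heavy syllable and 'L' for a Light one."""
    return "H" if LONG_VOWEL.search(syllable.lower()) else "L"


def scan(feet):
    """Return (weight pattern, is_permitted_iamb) for each foot."""
    result = []
    for foot in feet:
        pattern = "".join(weight(syllable) for syllable in foot)
        result.append((pattern, pattern in IAMBS))
    return result


# First hemistich of (25a), syllabified by hand: Masa|la gaa|bya diin| naa
first_hemistich = [["ma", "sa"], ["la", "gaa"], ["bya", "diin"], ["naa"]]
print(scan(first_hemistich))
```

All four feet come out as permitted iambs (LL, LH, LH, H), whereas a Heavy-Light sequence such as gaa la would be rejected.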

9 The tanka 5-7-5-7-7 doubles the initial distich into a quatrain. Their historical relation is the reverse, since the tanka appears to be older than the haiku.

Fitzgerald’s (2006) analysis is based on a written corpus of verse, but it is supported by Banti and Giannattasio’s (1996) examples of sung and recited masafo (although they do not assume an iambic meter). Their transcriptions show that singers render syllable weight accurately and mark the caesura just where Fitzgerald’s analysis would predict, as illustrated by the distribution of quarter notes and eighth notes in (26).

(26)

Languages that have lexical stress or weight almost always recruit one or both of these features for their metrical verse. Languages that lack these lexical prominence features may still have metrical verse based on syllables and words, which are present in all languages.10 The rhythm is then often enhanced by rhyme or parallelism and by phrasing. Mordvin has neither lexical stress nor vowel quantity nor distinctive pitch. The meters of its rich lyrical and narrative poetry are syllable-counting, with caesuras and strict parallelism as the principal rhythmic devices. In a corpus of 508 poems, lyric and narrative, ranging from 10 to over 500 lines, I found 16 distinct meters, consisting of two, three, or four cola, invariant throughout a poem. They were of three types: short, long, and couplet meters, with 1, 1.5, and 2 lines respectively. Remarkably, their abstract form was the same as that of English folk quatrains (Kiparsky, in press).

(27)

a. short meters: 4|3, 4|4, 4|5, 5|3, 5|5
 b. long meters: 4|4|3, 4|5|3, 4|4|4, 4|4|5, 5|3|3, 5|5|3, 5|4|3, 5|5|5
 c. couplet meters: 5|5|3|3, 4|4|4|3, 4|3|4|3, 5|5|4|3
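With the colon counts of (27), the three meter types can be told apart mechanically. The following Python sketch assumes, as the stated lengths of 1, 1.5, and 2 lines imply, that two cola make up one line; the function name `describe` and the string encoding of the meters are hypothetical.

```python
# Each meter in (27) is written as syllables per colon, e.g. "4|4|3".
# Assumption read off the text: two cola make one line, so meters of 2, 3, and 4
# cola are the short (1 line), long (1.5 lines), and couplet (2 lines) types.
TYPES = {2: "short", 3: "long", 4: "couplet"}


def describe(meter):
    """Return (type, length in lines, total syllables) for a meter string."""
    cola = [int(n) for n in meter.split("|")]
    return TYPES[len(cola)], len(cola) / 2, sum(cola)


for meter in ["4|3", "5|5|3", "4|3|4|3"]:
    print(meter, describe(meter))
```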

Rumsey (2007, 2010, 2011) shows that the meter and music of Ku Waru (New Guinea) tom yaya kange narrative songs have ‘a strong tendency for each beat except the final one to be associated with a single word or bound morpheme’ (2007: 236) and argues on that basis that their lines are built on word-length units. Ku Waru has no lexical stress or vowel length.11 The songs are chanted at breakneck speed to simple repeating two-part melodies, the second part a variation of the first. They consist of quatrains of syntactically independent lines, with a fixed number of feet (five, six, seven, or eight). Each foot has a fixed duration and usually a fixed pitch. The final foot of each line ends with a single long vowel, or, when the singer takes a breath, with a pause of the same length. The metrical and musical form coincide closely. Rumsey (2007) provides the example reproduced in (28).12

10 On allegedly syllable-less languages see Kiparsky (2018a).
11 It does have phonemic pitch accent, but, as in Japanese, it seems to play no role in the meter.
12 The transcription here is based on the audio at http://chl-old.anu.edu.au/languages/ku_waru.php and matches Rumsey’s (2010: 47) partial phonetic transcription except for some details. (For a glossed

(28)
1 puku topa lkund urum
2 ndagl(i) pugla many(a) lyirim
3 ogla pugla wagl(e) lyirim
4 ngi ka- pogla mar(i) tekin
5 kanamb a: kegle(pa) pup(a)
6 koro(ka) komung(a) kai
7 ogla pupa moglu(pa) megl
8 tumbagl kop e(ken)di lyirim
9 kulaim mingiyl ekendi lyirim
10 kanab kuku kegla purum
11 toku noba lkaib turum
12 toku wagle mura murum
13 toku i(k)ily(a) purum kanum
14 waru kupa punglau nyirim
15 ngi ka- pugla mar(i) te(k)in
16 kanab ta(k)a taka namb(a) kanung(a) a:

1 He jumped and came into the house e:
2 He removed his banana leaf apron a:
3 And put on his cordylene kilt.
4 Well done, my lad, well done! a:
5 As I watched he went on his way. e:
6 Headed for Koroka Mountain a:
7 He climbed to the top and stayed
8 With a Jew’s harp in one hand a:
9 And a bamboo flute in the other. e:
10 As I watched he went on his way. a:
11 Where he smoked his tobacco and spat (e)
12 Fields of tobacco plants sprouted a:
13 And the smoke that went up in the sky a:
14 Billowed like clouds round the mountain. a:
15 Well done, my lad, well done!
16 In my mind’s eye the story unfolds

In the output, there are two types of feet: disyllables and heavy monosyllables. Trisyllabic words and short monosyllabic words are accommodated to this foot scheme in two ways. The first method is to divide them or combine them with each other; the second is to resize them by Procrustean lengthenings and shortenings. Lines 4 and 15 of (28) illustrate the first method. The short monosyllable ngi ‘that’ is joined with trisyllabic kapogla ‘good’ into two feet [ngi-ka]φ [pogla]φ. An example of splitting a word into two feet (not in this text) is pidiyap [pidi]φ [yap]φ (Rumsey 2010: 45). Instances of the second method are the lengthening of monomoraic ku ‘and’ by reduplication to kuku to fill a foot (l. 10),13 and the slurring or shortening of trisyllabic words into disyllables, indicated by parentheses in (28), e.g. koroka (name) → koro, konunga ‘mountain’ → konung ‘that’ (l. 6), ekendi ‘on a side’ → edi (l. 8).14

Rumsey’s generalization that words tend to correspond to feet can then be seen as a joint result of the high ranking of FIT and the universal ALIGN FOOT/WORD constraint (p. 4). The first method of accommodating mono- and trisyllabic words results when phonological faithfulness outranks ALIGN FOOT/WORD, whereas the second method results when the reverse ranking gives free rein to the poetic language’s phonetic processes to compress and stretch words to fit the rigid foot structure of the verse. The ‘word-counting’ appearance of the meter would thus be an indirect, epiphenomenal effect of foot/word alignment. Whether foot structure plays a role in the phonology itself remains to be seen; if not, we would have to say that it is, so to speak, imposed on the verse by the musical rhythm and realized by phonetic reduction and elongation.

phonemic transcription of the text, see Rumsey 2010: 53–4.) gl represents a velarized lateral; mb, nd are single prenasalized consonants; and ly, yl, ny, yn are palatals (Rumsey 2010: 43). Ku Waru seems to have no coda clusters. p, t, k are lenited. Rumsey writes ngi in line 15 as i, and I hear the final vocable as alternating a: and e:, until the last quatrain.

In addition to cases of apparent lack of layering, quantitative meters can look downright aperiodic. This challenge is posed by many quantitative meters of classical Sanskrit, Greek (e.g. Pindar), Persian/Urdu, and Berber; three representative examples are given in (29).

(29)

The key to such apparently arrhythmic quantitative meters lies in their characteristic correspondence rules. They are most often built from four-mora dipods, but the distribution of the morae is more complex than in (22) and may differ in successive feet. Some meters allow or require syncopation and empty beats (Deo 2007; Deo and Kiparsky 2011; Kiparsky 2018b). Syncopation (‘anaclasis’ in Greek metrics) operates in all older Indo-European quantitative meters to license the equivalence of ⏑ – and – ⏑ in designated positions of the line. Formally, it is a weight mismatch between the abstract pattern (verse design) and its instantiation, in which a heavy Weak position contributes its extra mora to supply the missing weight of the adjacent Strong position.

(30) (a) W S   (b) S W

Thus an iambic dipod allows three types of syncopation, as shown in (31).

(31) Positions: W1 S2 W3 S4
 (a) no syncopation
 (b) syncopation in positions 1 and 2 (‘choriambic’)
 (c) syncopation in positions 2 and 3 (‘ionic’)
 (d) syncopation in positions 3 and 4 (‘glyconic’)
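The three syncopation types in (31) can be generated from the unsyncopated dipod. In the Python sketch below, the base dipod is assumed to be ⏑ – ⏑ –, and syncopation of two adjacent positions is modelled as an exchange of weights, so that the Weak member surfaces heavy and the Strong member light, as described for (30). The function names are hypothetical, and the sketch makes no claim about how the patterns are realized in particular meters.

```python
# Assumed base form of an iambic dipod: positions W1 S2 W3 S4 = ⏑ – ⏑ –.
DIPOD = ("W", "S", "W", "S")
BASE = {"W": "⏑", "S": "–"}
MORAE = {"⏑": 1, "–": 2}


def syncopate(first=None):
    """Dipod with syncopation at positions first and first + 1 (1-based).

    Adjacent positions of W S W S always pair one Weak with one Strong
    position; swapping their weights makes the Weak one heavy and the
    Strong one light. first=None gives the unsyncopated dipod.
    """
    pattern = [BASE[label] for label in DIPOD]
    if first is not None:
        if not 1 <= first < len(DIPOD):
            raise ValueError("syncopation needs two adjacent positions")
        i, j = first - 1, first
        pattern[i], pattern[j] = pattern[j], pattern[i]
    return " ".join(pattern)


def mora_count(pattern):
    """Total morae of a space-separated pattern."""
    return sum(MORAE[symbol] for symbol in pattern.split())


for first, name in [(None, "none"), (1, "choriambic"), (2, "ionic"), (3, "glyconic")]:
    pattern = syncopate(first)
    print(f"{name:<10} {pattern}  ({mora_count(pattern)} morae)")
```

This yields – ⏑ ⏑ – (choriambic), ⏑ ⏑ – – (ionic), and ⏑ – – ⏑ (glyconic), each with the same total of six morae as the base dipod.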

13 Reduplication is known to fill out a prosodic minimum in other languages, e.g. Chukchi.
14 An exception is ekendi ‘on a side’, which seems to remain trisyllabic in its second occurrence (l. 9).

As an optional correspondence rule, syncopation generates patterns of synchronic equivalence (responsion) within some meters. It also functions as an obligatory correspondence rule that is a defining feature of meters with an independent status in their own right.

Deo (2007) has shown that classical Sanskrit meters are in fact periodic. Their metrical positions are filled by bimoraic trochees (Heavy syllables or two Light syllables), grouped into (usually binary) left- or right-headed feet, which are in turn recursively grouped into larger units. Some meters require syncopation and/or empty beats at designated points in the line. Every line of a poem has the same invariant meter of Heavy and Light syllables, but there are hundreds of distinct meters. A variety at least as rich as the one found within Shakespeare’s blank verse (§48.3) is here distributed across Sanskrit’s metrical repertoire as distinct meters, each readily identifiable by someone schooled in the tradition. Haṃsī consists of four moraic trochee dipods with an empty position after the caesura, as shown in (32).

(32)

The empty position is not posited merely for the sake of regularity. In chanting the metrical pattern, it is obligatorily realized as a pause on the fourth Strong beat, with the fifth syllable placed on the following Weak beat. While ordinary caesuras are just places where a word break is required, Deo (2007) shows that precisely in those cases where they are induced by empty positions they are marked by actual pauses or by lengthening of the precaesural syllable. The rathoddhatā has exactly the same structure, but with syncopation instead of an empty beat.

(33)


The structure is binary and left-headed all the way up: bimoraic positions are grouped into binary feet, which are grouped into binary dipods, and these in turn are grouped into binary hemistichs; a pair of these, separated by a caesura, forms a line. Strong positions are affiliated with Heavy syllables in Strong feet. In Weak feet they are affiliated with Light syllables and get their second mora from the preceding position by syncopation.

The muẓāriʿ turns out to be a syncopated Rubāʿī, as shown in (34).

(34)


On the rigorous four-mora structure of the Berber meters, see Dell and Elmedlaoui (2008) and Riad (2016).
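The binary, left-headed organization described above for the rathoddhatā (positions grouped into feet, feet into dipods, dipods into a hemistich) can be pictured as a metrical grid. The Python sketch below is a generic illustration under the assumption of eight positions per hemistich, i.e. three levels of binary grouping; the grid representation and the function names are hypothetical and not taken from Deo (2007).

```python
def head_labels(levels):
    """S/W label of every position at each level of a binary, left-headed tree.

    With 2**levels positions, the constituent containing position i at level k
    has index i >> k among its sisters; even indices are left (Strong) daughters.
    """
    return [["S" if (i >> k) % 2 == 0 else "W" for k in range(levels)]
            for i in range(2 ** levels)]


def grid_heights(levels):
    """Grid column height: 1 + the number of levels, counted from the bottom,
    at which a position heads its constituent without interruption."""
    heights = []
    for labels in head_labels(levels):
        height = 1
        for label in labels:
            if label == "W":
                break
            height += 1
        heights.append(height)
    return heights


# Three groupings above the position: feet, dipods, and the hemistich.
print(grid_heights(3))
```

The first position, which heads its foot, dipod, and hemistich, gets the tallest column (4), while every right-hand (Weak) position gets a column of height 1.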

48.6 Text-setting

Composing and performing a song requires matching three independent rhythmic structures of the form shown in (3): the intrinsic prosodic phonological representation given by the language itself, a metrical form, and a musical rhythm. Their respective prominent beats are preferentially aligned, and prosodic constituents of language (phonological words and phrases) are preferentially aligned with metrical constituents (feet, dipods, cola, lines) (see §48.2). The required correspondences and the permissible mismatches between them are regulated by conventions that evolve historically within limits grounded in the faculty of language. Predominant metrical systems and recitation/singing/text-setting practices in a poetic tradition are mutually accommodated and in time mutually optimized. Performance can be ‘tilted’ to reflect meter, and metrical forms must be compatible with prevailing text-setting/recitation practices.15

When meter and phonology can be mismatched, fitting a metrical text to a predetermined rhythmic pattern of a song or chant may require either neutralizing phonological features of the language or disregarding the metrical structure. The choice is resolved in partly conventional ways, but subject to some overriding functional principles. In general, the most salient features of a language are preferentially retained. For example, stress in English is highly salient, since it affects the prosodic organization and segmental phonology of the entire word. Therefore text-setting in English privileges phonological prominence over metrical prominence, ensuring the natural rendering of speech. This means preferentially aligning stressed syllables with Strong musical beats, regardless of whether they fall in Strong or Weak positions in the verse (Halle and Lerdahl 1993). In (35), for instance, the composer sets the first two feet of an iambic line counter-metrically.

(35) Yeats, The Secrets of the Old (Samuel Barber, Four Songs, Op. 13.2)

Although óld sits in a Weak metrical position of the iambic line (I have | old wo|men’s se|crets now), it is put on a Strong musical beat in order to bring out the contrastive stress required by the contrast between old and young in the text. The less salient stress is in a language, the more readily it can be mismatched with musical beats. According to Dell and Halle (2009), English matches stresses to strong beats throughout the line, whereas French does so only at the ends of lines; conversely, French traditional songs require a parallel pairing of syllables to beats in each stanza, whereas English songs do not. They propose to derive both differences from the fact that stress is perceptually salient throughout the utterance in English, but only before major breaks in French.

15 For example, Neoclassical recitation conventions that highlight the meter were incompatible with the mismatch-rich Baroque and Renaissance poetry and led to the eclipse of poets such as Donne and Wyatt.

Finnish stress is an intermediate case. It is predictable as in French, yet salient as in English, because it conditions much of the word phonology and allomorphy. In songs, musically prominent beats are generally aligned with stress-bearing word-initial syllables, but they can also be aligned with metrical Strong positions, resulting in shifted word stress. Even vowel length, which is contrastive, can be neutralized. The quatrains of a traditional ballad in (36) illustrate all these options. The singers lengthen the stressed syllables, pronouncing e.g. túlet ‘you come’ in line 1 as túulet.16

(36) Velisurmaaja ‘The Fratricide’, sung by Niekku (4/3 ballad meter).
 1. Mis.täs tú.let kús.tas tú.let,   Where are you coming from,
 2. Poi.ka.ni í.loi.nen?   My cheerful son?
 3. Mí.e tú.len mé.ren rán.nast,   I’m coming from the seashore,
 4. Miu ä́i.ti.ni kul.tai.nén.   My dear mother.

When an unstressed syllable falls on a Strong beat, the singers sometimes shift the word stress onto it, as in line 3 of the quatrain in (37).

(37) 1. Míst on míek.kais vér.re.hen túl.lu,   How did your sword get bloody,
 2. Poi.ka.ni í.loi.nen?   My cheerful son?
 3. Míe ta.póin van.hém.man vél.jen,   I killed my older brother,
 4. Miu ä́i.ti.ni kul.tai.nén.   My dear mother.

Native Finnish meters are ingeniously adapted to optimize both variety and phonological faithfulness to vowel length and stress. The Kalevala meter consists of eight-syllable lines made up of four trochaic (Strong-Weak) feet, with obligatory alliteration and parallelism. The basic metrical rule is that a stressed (i.e. word-initial) light syllable (⏑) must be in a Weak position and a stressed heavy syllable (–) in a Strong position (⏑ = CV̆). This avoids lengthening of vowels in the sensitive first syllable, which would neutralize the most important site of the length contrast. The resulting mismatches between stress and metrical positions in words with an odd number of syllables create the main rhythmic excitement of this meter. In (38), the singers foreground the tension between phonology and meter by singing lines 1 and 3 first with the original word stress, and then a second time shifted to the metrically Strong syllable, in each case with lengthening of some of the stressed vowels.17

(38) a1. Ei pí.täi.si núo.ren néi.en   A young girl shouldn’t
 a2. Ei pi.tä́i.si núo.ren néi.en   A young girl shouldn’t
 b. Vás.ta.kás.va.van ká.na.sen   A chick still growing
 c1. Ús.ko.a ú.ron sá.no.ja   Trust the words of a male
 c2. Us.ko.á ú.ron sa.nó.ja   Trust the words of a male
 d. Míe.hen váls.kin ván.non.noi.ta   The promises of a cheating man

16 https://www.youtube.com/watch?v=NwAcHcuYzL4.
17 Niekku, Suomen kansanmusiikki 1 (Folk Music of Finland 1), Kansanmusiikki-instituutti 1988.
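The basic stress/quantity rule of the Kalevala meter can be stated as a checker over dotted syllabifications like those in (38). The Python sketch below is a simplification introduced for illustration. Accents are omitted, and only word-initial syllables count as stressed, so secondary stresses such as kás in vastakasvavan are ignored. Weight is read off the spelling: two vowel letters or a final consonant make a syllable heavy. No special provision is made for the first foot or for monosyllables, and `kalevala_violations` is a hypothetical name. The last input line is a constructed reordering of c1, not an attested line.

```python
VOWELS = set("aeiouyäö")


def is_heavy(syllable):
    """Heavy if it has two vowel letters (long vowel or diphthong) or a coda."""
    vowel_letters = sum(ch in VOWELS for ch in syllable)
    return vowel_letters >= 2 or syllable[-1] not in VOWELS


def kalevala_violations(line):
    """1-based positions that break the stress/quantity rule in a line.

    The line is written without accents, syllables separated by dots and words
    by spaces. Word-initial syllables count as stressed; odd positions are
    Strong and even positions Weak. A stressed light syllable must be Weak and
    a stressed heavy syllable Strong; other syllables are not checked here.
    """
    syllables = [(syllable, index == 0)
                 for word in line.lower().split()
                 for index, syllable in enumerate(word.split("."))]
    if len(syllables) != 8:
        raise ValueError("a Kalevala line has eight syllables")
    violations = []
    for position, (syllable, stressed) in enumerate(syllables, start=1):
        strong = position % 2 == 1
        if stressed and is_heavy(syllable) != strong:
            violations.append(position)
    return violations


for line in ["Ei pi.täi.si nuo.ren nei.en",
             "Vas.ta.kas.va.van ka.na.sen",
             "Us.ko.a u.ron sa.no.ja",
             "Us.ko.a sa.no.ja u.ron"]:  # last line: constructed reordering
    print(line, kalevala_violations(line))
```

The three attested lines return no violations; the constructed line is flagged at position 7, where the stressed light syllable u would fall in a Strong position.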

In languages where quantity is more salient than stress, it is reflected more faithfully in text-setting. The famous ghazal singer Begum Akhtar respected the syllable count and syllable weight as far as possible while maintaining Urdu’s phonemic vowel length contrast. (39) is the first distich of a ghazal by Ghalib.

(39)

sab kahā̃ kuchh lāla-ō-gul mē̃ numāyā̃ hō gaʾī̃
ḳhāk mē̃ kyā ṡūratē̃ hō̃gī ke pinhā̃ hō gaʾī̃

Where are they now? The earth has covered their eyes. Some did as roses or as tulips rise.

In Begum Akhtar’s rendition,18 the meter – ⏑ – – | – ⏑ – – | – ⏑ – – | – ⏑ – (Ramal) is disrupted twice in the first line: the first foot has five syllables, and the second foot has three.

(40) sa bə ka -hā̃ kuchh lā l-ō -gul mē̃ nu mā yā̃ hō ga -ʾī̃

The extra vowel is not phonemic but inserted, and the missing vowel is phonemic but deleted. With the underlying vocalism in place, as in (39), the line scans perfectly. The discrepancy between scansion and pronunciation is systematic in this artist’s songs. She uses the optional process of ə-insertion to allot more musical time to metrically and phonologically heavy syllables without obliterating Urdu’s phonemic length contrast. Heavy syllables, especially in odd-numbered positions in this meter, are sung as long. The question is how to lengthen a CV̆C syllable. When it ends in a sonorant, as in gul ‘rose’, ham ‘we’, par ‘on’, the note is simply extended over the syllable coda. But in words like səb ‘all’, the obstruent cannot sustain a note, nor can the vowel lengthen, since that would turn it into another word. Such words are sung with final -ə inserted as a last resort to give them their weight in the song, e.g. sabə, jisə, usə, miṭə. Here faithfulness to syllable weight trumps faithfulness to the syllable count.19 The second phonological process seen in this line is the deletion of vowels in hiatus: lāla-ō-gul → lālōgul. It applies regularly and obligatorily when two vowels come into close contact in ordinary speech.

In both cases, the meter is checked before the phonological rules take effect, while the performance is based on their output. In contrast, Classical Greek, Sanskrit, and Latin meters are checked post-lexically, after all the phonological rules have applied, including resyllabification across word boundaries. This variation in the phonology/meter interface illustrates another parametric difference between metrical systems that fits naturally into a theory of the form (1).

The investigation of text-to-tune alignment in cross-linguistic perspective is only beginning, but preliminary results promise rich insights into the relations between the prosodic structures of language, meter, and music.
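The observation that Ghalib’s line scans only in its underlying form can be checked directly. The Python sketch below assumes the Ramal pattern – ⏑ – – | – ⏑ – – | – ⏑ – – | – ⏑ – and assigns syllable weights by hand (long vowels and closed syllables Heavy, short open syllables Light). These weight assignments and the name `scans` are illustrative assumptions, not a full Urdu syllabifier.

```python
# Ramal, assumed here as – ⏑ – – | – ⏑ – – | – ⏑ – – | – ⏑ –  (H = –, L = ⏑)
RAMAL = ["H", "L", "H", "H"] * 3 + ["H", "L", "H"]

# Hand-assigned weights, an assumption for illustration: long vowels and
# closed syllables are Heavy, short open syllables Light.
UNDERLYING = [("sab", "H"), ("ka", "L"), ("hā̃", "H"), ("kuchh", "H"),
              ("lā", "H"), ("la", "L"), ("ō", "H"), ("gul", "H"),
              ("mē̃", "H"), ("nu", "L"), ("mā", "H"), ("yā̃", "H"),
              ("hō", "H"), ("ga", "L"), ("ʾī̃", "H")]

# The sung form in (40): ə inserted after sab, hiatus vowel of la-ō deleted.
SUNG = [("sa", "L"), ("bə", "L"), ("ka", "L"), ("hā̃", "H"), ("kuchh", "H"),
        ("lā", "H"), ("lō", "H"), ("gul", "H"),
        ("mē̃", "H"), ("nu", "L"), ("mā", "H"), ("yā̃", "H"),
        ("hō", "H"), ("ga", "L"), ("ʾī̃", "H")]


def scans(syllables, pattern=RAMAL):
    """True if the syllable weights match the metrical pattern one-to-one."""
    return (len(syllables) == len(pattern)
            and all(w == p for (_, w), p in zip(syllables, pattern)))


print(len(UNDERLYING), scans(UNDERLYING))  # 15 True
print(len(SUNG), scans(SUNG))              # 15 False
```

Both forms have fifteen syllables, but only the underlying one matches the pattern position by position, consistent with checking the meter before ə-insertion and hiatus deletion apply.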

Acknowledgements

Thanks to Elan Dresher for his careful reading of an early draft of this chapter; to Alan Rumsey for information on Ku Waru, and valuable discussion of it at several stages; and to two reviewers.

18 In raga Bhairavī and the rūpak measure of 7 beats, divided 3+2+2, with beat 1 silent, and beats 4 and 6 strong.
19 No ə-insertion takes place in foot-final syllables, e.g. kuchh, which are treated as non-prominent in the song.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

chapter 49

Tone–Melody Matching in Tone-Language Singing

D. Robert Ladd and James Kirby

49.1 Introduction

Speakers of European languages are apt to wonder how it is possible to sing in a tone language, that is, how there can be a musical melody independent of the linguistically specified pitch on each syllable. The short answer, which looks completely uninformative at first sight, is that for a given sequence of linguistic tones, some musical melodies are a better fit than others. In this chapter we outline some of the principles that make for a good fit. Before we start, it is important to point out that the idea of a good fit between tune and text is not unique to tone languages, though the principles are rather different in languages without lexical tone. As set out in detail in chapter 48, it is important for major stressed syllables to occur on strong musical beats in a language like English and perhaps even more important for unstressed syllables not to occur there. A mis-setting well known to classical musicians is Handel’s treatment of the phrase and the dead shall be raised incorruptible in the aria ‘The Trumpet Shall Sound’ from Messiah. Handel’s original version is as in (1), where the unstressed syllable -ti- occurs on the final strong beat of the musical phrase. (It seems plausible to suggest that Handel’s native German, in which that same syllable would have borne the main stress, was responsible for the apparent musical lapse.)

(1)

Although the result is unlikely to affect the intelligibility of the text, it still feels wrong to any native speaker of English, and in most editions and most performances the setting is edited to (2).

(2)

From the point of view of musical aesthetics, the edited version arguably sounds inappropriately bouncy, yet the change is all but forced on editors and performers by the structural principles governing the match between tune and text; see further chapter 48 for metrically governed text-setting. There is now a considerable body of work showing that structural principles—principles that are not reducible to matters of aesthetics—are involved in defining a good match between linguistic tone and musical melody as well. A particularly clear demonstration of systematic correspondence between tone and melody is found in a paper by Chan (1987b) on Cantonese pop songs. Chan observed that the sequences of syllable tones in lyrical stanzas sung to the same melody were strikingly similar from verse to verse, even though the words themselves were different. She treated this as evidence of a systematic relationship between a sequence of musical notes and the corresponding sequence of syllable tones in the accompanying text. She found that an important aspect of this systematic relationship involves matching the direction of pitch change from one syllable (or one musical note) to the next. The critical role of the relation between one note/tone and the next seems to have been overlooked by many earlier investigators. For example, a short paper by Bright (1957) on song in Lushai (a Tibeto-Burman language of Assam State, India) compares the pitch contours of individual musical notes with those of the corresponding spoken syllables, finds no systematic relation, and concludes that tone is simply ignored in singing. The first author’s work on Dinka song (discussed further in §49.3.4) originally made the same mistake, looking in vain for cues to linguistic tone in the acoustic properties of individual notes. 
This is not to assert that lexical tone never influences musical pitch—in fact, Chan (1987c) herself found such effects in songs from popular Mandarin films of the 1970s and 1980s, and more recently Schellenberg and Gick (2020) have shown experimentally that rising tones have ‘microtonal’ effects on the realization of individual notes in Cantonese. Rather, the point here is simply to emphasize that the relative pitch of adjacent syllables is apparently salient to both composers and listeners. Once we are alert to the importance of the relation between adjacent notes, we discover that many unrelated and typologically dissimilar tone languages with widely divergent musical traditions follow remarkably similar principles of matching tone and musical melody. The primary goal of this chapter is to summarize recent work on tone–melody matching in tone languages and to show how focusing on pitch direction across notes makes it possible to formulate clear, tractable research questions. Our emphasis is on general structural principles, not details of performance practices or specific genres, though a clearer understanding of the structural principles should eventually make it easier to interpret performance practices as well.1 We also do not speculate about the possibility of an overarching 1 Correspondence between tone and melody in traditional art forms, such as Cantonese opera, is also well studied (e.g. Yung 1983a, 1983b), but the problem of matching tune and text in these cases is to some extent a matter of performance practices rather than text-setting. Roughly speaking, in the acculturated (i.e. Western-influenced) musics of much of East and South East Asia, melodies are relatively fixed and texts must be chosen to fit, whereas in many ‘traditional’ forms the melodies are fairly abstract templates and may be modified in performance to achieve optimal tone–melody correspondence with a

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

678 D. ROBERT LADD AND JAMES KIRBY theory of text-setting that would unify the formal principles of tone–melody matching in tone languages with those of traditional European metrics, but this also seems an interesting long-term goal (see §49.4).

49.2 Defining and investigating tone–melody matching To provide a descriptive framework for discussing pitch direction across notes, we repurpose some standard terms from classical Western music theory. In polyphonic music, ‘contrary motion’ is present when one melodic line moves down in pitch while another is moving up; in ‘similar motion’, two lines move together in the same direction (without necessarily maintaining the same intervals between them); in ‘oblique motion’, one line moves (up or down) while the other stays on the same pitch. We propose to use the same terms to refer to the relationship between the movement of the musical melody and the corresponding linguistic pitch; for most of this chapter we refer only to the pitch direction across pairs of consecutive notes or syllables, which we refer to henceforth as ‘bigrams’. Given a tonal sequence /tàpá/, where grave accent represents low tone and acute represents high, the bigrams in (3) exemplify the three possible ways of setting a bigram to a musical melody.2 (3)

It can be seen that our definitions identify two subtypes of oblique setting, which may eventually need to be distinguished more carefully. In Type I (3c), two consecutive syllables have different tones but are sung on the same note; in Type II (3d), two consecutive syllables have the same tone but are sung on different notes.

The basic constraint on tone–melody matching found in most of the tone-language song traditions investigated over the past few decades might be summed up as ‘avoid contrary settings’: if the linguistic pitch goes up from one syllable to the next, the corresponding musical melody should not go down, and vice versa. Expressing the constraint in this way allows us to formulate hypotheses that can be tested against quantitative data. Specifically, if we tally all the bigram settings in a song corpus and label them as similar, oblique, or contrary, we find that contrary settings are far less common than would be expected if tone–melody matching were unconstrained. The avoidance of contrary settings has been shown to provide a good quantitative account of tone–melody correspondences in the songs of numerous tone languages, including Zulu (Rycroft 1959), Hausa (Richards 1972), Cantonese (Chan 1987b; Ho 2006, 2010; Lo 2013), Shona (Schellenberg 2009), Thai (Ketkaew and Pittayaporn 2014, 2015), Vietnamese (Kirby and Ladd 2016b), and Tommo So (McPherson and Ryan 2018).3 We review some of these studies in more detail in §49.3.

Given the three types of setting, ‘avoid contrary settings’ is quite a weak constraint; a much stronger constraint would require similar settings. Of the languages and musical genres that have been studied, Cantonese pop music seems to come closest to this stronger constraint, but it still allows oblique settings in certain contexts (see §49.3.1). In other musical traditions, oblique settings appear entirely acceptable, and even contrary settings may not be consistently avoided. Nevertheless, it is clear that speakers of many tone languages are sensitive to the constraint against contrary settings, in that they will intuitively interpret, say, a musical line that rises from one note to the next as corresponding to a rise from a lower to a higher tone. There are numerous anecdotes of Christian hymns that were inappropriately translated into tone languages by European missionaries unaware of the force of this constraint (e.g. Carter-Ényì 2018). Before we can investigate these questions empirically in terms of the properties of bigrams, we must first deal with a specific methodological question: how to characterize bigrams that involve contour tones.

[...] particular text. This issue also arises in the analysis of tone–melody correspondence in a number of South East Asian vocal traditions; see e.g. Williamson (1981) on Burmese, Tanese-Ito (1988) and Swangviboonpong (2004) on Thai, Chapman (2001) on Lao, Norton (2009) on Vietnamese, Lissoir (2016) on Tai Dam, and Karlsson (2018) on Kammu.

2 Schellenberg (2009), who considered the general problem of tone–melody correspondence in more or less the way discussed here, refers to ‘opposing’, ‘non-opposing’, and ‘parallel’ pairs of notes for what we are calling ‘contrary’, ‘oblique’, and ‘similar’ settings.

OUP CORRECTED PROOF – FINAL, 04/12/20, SPi

Tone–Melody Matching in tone-Language Singing 679
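The classification of bigram settings just described amounts to a small decision procedure. The Python sketch below is illustrative only. It assumes that the tones of the two syllables have already been mapped onto ordinal pitch levels (here, Chao-style numbers) and that sung notes are given as semitone integers (MIDI-style numbers). The function names are hypothetical and are not taken from any of the studies cited.

```python
def direction(first: int, second: int) -> str:
    """Return the pitch direction from the first value to the second."""
    if second > first:
        return "up"
    if second < first:
        return "down"
    return "level"


def classify_setting(tone_dir: str, melody_dir: str) -> str:
    """Label one tone-melody bigram as similar, contrary, or oblique."""
    if tone_dir == melody_dir:
        return "similar"
    if melody_dir == "level":
        # Different tones sung on the same note, as in (3c).
        return "oblique (Type I)"
    if tone_dir == "level":
        # The same tone sung on different notes, as in (3d).
        return "oblique (Type II)"
    # Tone and melody move in opposite directions.
    return "contrary"


if __name__ == "__main__":
    # /tàpá/: low tone (level 1) followed by high tone (level 5).
    rise = direction(1, 5)
    print(classify_setting(rise, direction(60, 62)))  # similar
    print(classify_setting(rise, direction(62, 60)))  # contrary
    print(classify_setting(rise, direction(60, 60)))  # oblique (Type I)
    print(classify_setting(direction(5, 5), direction(60, 62)))  # oblique (Type II)
```

A level tone sequence sung on a level note sequence falls through to the first branch and counts as similar. This matches the treatment of the level–level cell in the matrices used below.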
A sequence of a low tone and a mid tone (/11/–/33/),4 or of a high tone and a low tone (/55/–/11/), can unambiguously be treated as going up or down, respectively, so defining the tone–melody correspondence for the bigram is unproblematic. Suppose, however, that we have a sequence of two rising tones, such as /25/–/25/. We could consider this to be a level sequence (because it involves a sequence of identical tones), so that a similar setting would be expected to involve a sequence of identical notes in the musical melody. However, we might also define the tonal bigram in terms of the overall pitch direction from the beginning of the first tone to the end of the second, in which case it involves a rise in pitch. Alternatively, we might define it in terms of the very local pitch change from the end of the first tone to the beginning of the second, in which case it involves a fall. There is no obvious way to decide this a priori, but the question has the virtue of being empirical: once we are aware of the existence of the basic constraint, we can investigate how such bigrams are actually treated in text-setting. Chan’s (1987c) study, which devoted considerable attention to this issue, suggests that the final (or target) pitch of each syllable is what counts for the purposes of defining the pitch direction across a tonal bigram, at least in Cantonese. In other words, a rising tone ending high (e.g. /25/) will be treated as high, a falling tone ending mid (e.g. /53/) will be treated as mid, and so on. Whether or not this ‘offset principle’ is valid universally is an empirical matter, but data from some languages suggest that other parameterizations may also be possible (see §49.3).

3 Our wording here implies that oblique settings represent a lesser violation of tone–melody matching than contrary settings—a view that is also implicit in Schellenberg’s terms ‘opposing’ and ‘non-opposing’ and in the workings of McPherson and Ryan’s constraint-based analysis. This assumption has rightly been called into question by Proto (2016) on the basis of her findings on Fe’Fe’ Bamileke, and the issue certainly deserves closer investigation.

4 To transcribe syllable tone, here and elsewhere we use ‘Chao numbers’ (Chao 1930), which are still widely used in discussing East Asian tone languages. The voice range is divided into five levels from 1 (low) to 5 (high), and the tone on each syllable is indicated by a sequence of two numbers giving the levels at the beginning and the end of the syllable.

680 D. ROBERT LADD AND JAMES KIRBY

Table 49.1 Similar, contrary, and oblique settings, as defined by the relation between the pitch direction in a sequence of two tones and the two corresponding musical notes

                 Melodic sequence
Tone sequence    Up          Down        Level
Up               Similar     Contrary    Oblique
Down             Contrary    Similar     Oblique
Level            Oblique     Oblique     Similar

Table 49.2 Expected frequencies of similar, oblique, and contrary settings

                 Melodic sequence
Tone sequence    Up          Down        Level
Up               Frequent    Rare        Possible
Down             Rare        Frequent    Possible
Level            Possible    Possible    Frequent

Finally, before proceeding to discuss individual cases, we introduce the matrix diagrams we will use to present quantitative data on tallies of bigrams. If in any pair of sung notes the pitch may go up, go down, or stay the same, and the pitch direction from the first syllable tone to the second (given answers to the kinds of definitional questions just discussed) can be up, down, or level, then there are nine possible types of tone–melody bigrams, as shown in Table 49.1. If contrary settings are regularly avoided and similar settings weakly preferred, we might expect to find the distribution of bigrams shown in Table 49.2. By filling the cells in the matrix with actual counts of bigram types, we can test and fine-tune such predictions.5
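The three candidate definitions of pitch direction for a contour-tone bigram, and the tallying of bigrams into a matrix like Tables 49.1 and 49.2, can be sketched as follows. This is a minimal Python illustration under several assumptions. Tones are written as two-digit Chao-number strings, so tones with glottal marks such as Vietnamese /3ˀ5/ would need extra handling. Sung notes are semitone integers. Rows are tonal direction and columns melodic direction. The method names ‘offset’, ‘global’, and ‘local’ are our own labels for the options discussed above.

```python
from collections import Counter

ORDER = ("up", "down", "level")  # row and column order of the 3x3 matrix


def sign(first: int, second: int) -> str:
    """Pitch direction between two ordinal values."""
    return "up" if second > first else "down" if second < first else "level"


def tonal_direction(tone1: str, tone2: str, method: str = "offset") -> str:
    """Direction across two Chao-number tones, e.g. '25' followed by '25'.

    'offset': compare the end points of both tones (the offset principle).
    'global': from the start of the first tone to the end of the second.
    'local':  from the end of the first tone to the start of the second.
    """
    start1, end1 = int(tone1[0]), int(tone1[-1])
    start2, end2 = int(tone2[0]), int(tone2[-1])
    if method == "offset":
        return sign(end1, end2)
    if method == "global":
        return sign(start1, end2)
    if method == "local":
        return sign(end1, start2)
    raise ValueError(f"unknown method: {method}")


def tally(bigrams, method: str = "offset"):
    """Count bigrams into a 3x3 matrix: rows = tone direction, columns = melody."""
    counts = Counter()
    for (tone1, tone2), (note1, note2) in bigrams:
        counts[(tonal_direction(tone1, tone2, method), sign(note1, note2))] += 1
    return [[counts[(row, col)] for col in ORDER] for row in ORDER]


if __name__ == "__main__":
    for method in ("offset", "global", "local"):
        print(method, tonal_direction("25", "25", method))
    # Three hypothetical bigrams, each a similar setting under 'offset'.
    toy = [(("25", "25"), (64, 64)), (("11", "33"), (60, 62)), (("55", "11"), (67, 60))]
    print(tally(toy))
```

For /25/–/25/ the three methods give level, up, and down respectively, which reproduces the three readings described in the text. Swapping the `method` argument makes it easy to compare how the counts along the diagonal change under each definition.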

49.3 Some examples

Space does not permit a thorough review of the literature, only a summary of a few representative cases with which we are most familiar. Some important early works not mentioned so far include Schneider (1943) and Jones (1959) on Ewe, Chao (1956) on Mandarin, and List (1961) on Thai. Most work has considered tone languages spoken in Africa or East Asia; notable exceptions include Herzog (1934) on Navajo, Pugh-Kitingan (1984) on Huli, Baart (2004) on Kalam Kohistani, and Morey (2010) on Tai Phake. We do not discuss ‘pitch accent languages’ at all, but it seems fairly certain that similar principles are at work in at least some such languages. In Japanese, for example, an accented syllable is likely to be sung on a higher note than the preceding and/or the following syllables (Cho 2017). We direct the interested reader to Jürgen Schöpf’s online bibliography (http://www.musikologie.de/Literatur.html), which, while not exhaustive, contains many other useful references not mentioned here. Schellenberg (2012) and McPherson and Ryan (2018) also provide good overviews.

5 Note that in these matrices we define ‘level’ in terms of identical categories, either linguistic or musical. Two notes in sequence that count musically as, say, A flat count as level regardless of how accurately they are realized. In this we differ from the approach taken by Schellenberg (2009) in his exploration of tone–melody matching in Shona, which is otherwise quite similar. In his analysis Schellenberg defined up, down, and level in purely acoustic terms, with a very small (1.5 Hz) threshold for considering two pitches to be the same, which meant that he counted rather few level note sequences.

49.3.1 Cantonese pop music

Several experimental and/or quantitative studies in the past two decades have investigated tone–melody matching in Cantonese pop music or ‘Cantopop’. Wong and Diehl (2002), without fully developing the three-way distinction between types of setting introduced in the previous section, carried out a small quantitative survey of the way tonal bigrams are actually treated melodically in text-setting. For the purposes of defining pitch direction from one tone to the next, they grouped the six tones of Cantonese into three sets, high, mid, and low, as shown in Table 49.3. Defining pitch direction on this basis, they observed similar settings in over 90% of cases. They also conducted a perceptual experiment in which Cantonese listeners were presented with short sung melodies containing an ambiguous test word. The perceived identity of the word was well predicted by assuming a match between the pitch direction in the musical melody and in the inferred linguistic tone sequence.

Subsequent work by Ho (2006, 2010) and Lo (2013) proposed a modification of Wong and Diehl’s (2002) three-way classification. Both authors suggest separating Tone 4 (21) from Tone 6 (22), so that a Tone 6–Tone 4 bigram would be treated as a fall (from low to extra-low) rather than as level (with both tones classed as low). Both showed that this change increases the number of instances of similar settings involving these two tones. Ho made numerous other refinements to the general approach. For example, he noted that when contrary settings occur, they almost always involve bigrams that straddle syntactic (and perhaps also musical) phrase boundaries (see e.g. Ho 2006: 1419).

Table 49.3 The six Cantonese tones classified in terms of overall level, for the purposes of defining pitch direction in a sequence of two tones

                                                 Classification for defining tonal pitch direction
Tone      Phonetic description (Chao numbers)    Wong and Diehl (2002)    Ho (2006) and Lo (2013)
Tone 1    high level (55)                        High                     High
Tone 2    mid-high rising (35)                   High                     High
Tone 3    mid-level (33)                         Mid                      Mid
Tone 4    low falling (21)                       Low                      Extra-low
Tone 5    low-mid rising (23)                    Mid                      Mid
Tone 6    low level (22)                         Low                      Low

Lo (2013) undertook an analysis of 11 Cantonese pop songs, classifying more than 2,500 bigrams into the matrix shown in Table 49.4. The matrix shows clearly that both contrary and oblique settings are generally dispreferred. However, Lo did find one specific context where oblique settings seemed fairly common: sequences of High (/55/ or /35/) tones on a descending melody line. Over 84% of the 210 level tonal bigrams set to falling melodies were High–High sequences (a similar pattern was observed by Ho). As in Ho’s work, virtually all of the instances of contrary setting in Lo’s corpus occur across syntactic and/or musical phrase boundaries (Lo 2013: 31).
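The two classifications in Table 49.3 can be written down as lookup tables. The Python sketch below uses the tone numbers and level labels of the table, but the numeric ranking of the labels and the helper names are our assumptions. It also makes one property of the table explicit. Ranking the six tones simply by the final digit of their Chao numbers gives the same pitch directions as the Ho/Lo classification for every bigram. Wong and Diehl’s grouping, by contrast, merges Tones 4 and 6.

```python
CHAO = {1: "55", 2: "35", 3: "33", 4: "21", 5: "23", 6: "22"}  # Table 49.3

WONG_DIEHL_2002 = {1: "High", 2: "High", 3: "Mid", 4: "Low", 5: "Mid", 6: "Low"}
HO_LO = {**WONG_DIEHL_2002, 4: "Extra-low"}  # Tone 4 split off from Tone 6

RANK = {"Extra-low": 0, "Low": 1, "Mid": 2, "High": 3}


def sign(first: int, second: int) -> str:
    return "up" if second > first else "down" if second < first else "level"


def direction(tone1: int, tone2: int, scheme: dict) -> str:
    """Pitch direction across a Cantonese tone bigram under a given classification."""
    return sign(RANK[scheme[tone1]], RANK[scheme[tone2]])


def offset_direction(tone1: int, tone2: int) -> str:
    """Direction from the final Chao digits alone (the offset principle)."""
    return sign(int(CHAO[tone1][-1]), int(CHAO[tone2][-1]))


if __name__ == "__main__":
    print(direction(6, 4, WONG_DIEHL_2002))  # level: both tones classed as Low
    print(direction(6, 4, HO_LO))            # down: Low to Extra-low
    agrees = all(direction(a, b, HO_LO) == offset_direction(a, b)
                 for a in CHAO for b in CHAO)
    print(agrees)                            # True for all 36 bigrams
```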

49.3.2 Vietnamese tân nhạc

Kirby and Ladd (2016b) considered tone–melody matching in a corpus of Vietnamese tân nhạc or ‘new music’, a broad term covering a number of Western-influenced genres of popular song produced since the 1940s. They took an empirical approach to the question of whether to treat tonal bigrams as level, rising, or falling. They enumerated all possible groupings of the tones of the language and then ranked them by the rates of similar and contrary setting obtained under each grouping. As with Cantopop, the resulting grouping was found to refer primarily to tonal offsets, as shown in Table 49.5.

Table 49.4 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a 2,500-bigram corpus from Cantonese pop songs, from Lo (2013)

                 Melodic sequence
Tone sequence    Up       Down     Level
Up               931      56       103
Down             32       828      210*
Level            22       39       444

* Mostly with H tones. (Lo 2013)

Table 49.5 The six Vietnamese tones classified in terms of overall level, for purposes of defining pitch direction in a sequence of two tones

Tone (traditional name)   Phonetic description (Chao numbers)   Classification for defining tonal pitch direction (Kirby and Ladd 2016b)
ngang                     mid-level (33)                        Mid
huyền                     mid-falling (32)                      Low
sắc                       rising (24)                           High
nặng                      checked (21ˀ)                         Extra-low
hỏi                       low falling (21)                      Extra-low
ngã                       broken (3ˀ5)                          High
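Kirby and Ladd’s procedure of enumerating tone groupings and ranking them can be sketched in a few lines. In the Python below, a ‘grouping’ assigns each tone to one of a set of ordered levels, with ties allowed. Every such assignment is generated and scored on a corpus of (tone, tone, melodic direction) triples. The scoring rule (similar rate minus contrary rate) and the toy corpus are assumptions made for illustration; they are not the published method or data. The search is exhaustive, so it grows quickly: there are 13 groupings of three tones but 4,683 of six.

```python
from itertools import product


def weak_orderings(tones):
    """Yield every ranking of the tones into ordered levels 0..k-1, ties allowed."""
    n = len(tones)
    for levels in product(range(n), repeat=n):
        used = set(levels)
        # Keep only assignments whose levels run contiguously from 0 (no gaps).
        if used == set(range(len(used))):
            yield dict(zip(tones, levels))


def sign(first: int, second: int) -> str:
    return "up" if second > first else "down" if second < first else "level"


def score(grouping, corpus):
    """Rate of similar settings minus rate of contrary settings (an assumed rule)."""
    similar = contrary = 0
    for tone1, tone2, melody in corpus:
        tonal = sign(grouping[tone1], grouping[tone2])
        if tonal == melody:
            similar += 1
        elif "level" not in (tonal, melody):
            contrary += 1
    return (similar - contrary) / len(corpus)


def rank_groupings(tones, corpus):
    """All groupings of the tones, best-scoring first."""
    return sorted(weak_orderings(tones), key=lambda g: score(g, corpus), reverse=True)


if __name__ == "__main__":
    tones = ("ngang", "huyền", "sắc")
    # Hypothetical toy corpus, not drawn from Kirby and Ladd (2016b).
    corpus = [("huyền", "ngang", "up"), ("ngang", "sắc", "up"),
              ("sắc", "huyền", "down"), ("ngang", "ngang", "level")]
    best = rank_groupings(tones, corpus)[0]
    print(best, score(best, corpus))
```

On this toy corpus the top-ranked grouping places huyền below ngang and ngang below sắc. In a real study, the weighting of similar against contrary settings in the score is itself a design choice worth testing.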


Table 49.6 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a corpus from Vietnamese ‘new music’

                 Melodic sequence
Tone sequence    Up       Down     Level
Up               1136     72       256
Down             81       1111     273
Level            84       59       473

(Kirby and Ladd 2016b)

As shown in Table 49.6, the majority of bigrams involved similar settings, although the overall percentage (77%) was lower than that previously found for Cantonese. Oblique settings appear to be tolerated to a certain degree, particularly when sequences of the mid-level (ngang) tone are involved. Contrary settings, however, were found to be comparatively rare, again suggesting that avoidance of contrary setting may generally be more important than achieving ‘parallelism’ between tonal and melodic pitch direction (see also Ketkaew and Pittayaporn 2014; McPherson and Ryan 2018).
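The summary percentages for these corpora follow directly from the matrices. As a check, the short Python sketch below recomputes the proportions of similar, oblique, and contrary settings from the cell values printed in Tables 49.4 and 49.6. It assumes that rows give tonal direction and columns melodic direction, both in the order up, down, level. The function name is our own.

```python
# Cell values as printed in Tables 49.4 and 49.6.
LO_2013_CANTONESE = [[931, 56, 103], [32, 828, 210], [22, 39, 444]]
KIRBY_LADD_2016B_VIETNAMESE = [[1136, 72, 256], [81, 1111, 273], [84, 59, 473]]


def setting_proportions(matrix):
    """Proportions of similar, oblique, and contrary settings in a 3x3 tally."""
    total = sum(sum(row) for row in matrix)
    similar = matrix[0][0] + matrix[1][1] + matrix[2][2]  # diagonal cells
    contrary = matrix[0][1] + matrix[1][0]                 # up/down and down/up
    oblique = total - similar - contrary                   # cells with exactly one 'level'
    return {"similar": similar / total,
            "oblique": oblique / total,
            "contrary": contrary / total}


if __name__ == "__main__":
    for name, matrix in [("Cantonese", LO_2013_CANTONESE),
                         ("Vietnamese", KIRBY_LADD_2016B_VIETNAMESE)]:
        props = setting_proportions(matrix)
        print(name, {k: round(v, 3) for k, v in props.items()})
```

On these figures, similar settings make up about 77% of the Vietnamese bigrams, as stated above, against about 83% in Lo’s Cantonese corpus. Contrary settings stay below 5% in both.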

49.3.3 Contemporary Thai song

Central Thai is normally regarded as having five lexical tones, conventionally labelled High, Mid, Low, Falling, and Rising. However, these labels are somewhat misleading as descriptions of the tones’ modern phonetic realizations. Work on tone–melody mapping in contemporary Thai song has consistently found lower rates of similar setting than in, for example, Cantonese or Vietnamese, with rates reported as low as 40% (List 1961; Ho 2006; Ketkaew and Pittayaporn 2014). As discussed in §49.2, however, these rates are partly a function of how contour tones are classified for the purposes of defining the pitch direction across a bigram. Both Ho (2006) and Ketkaew and Pittayaporn (2014) note that the onset of the Falling tone /42/ appears to be more important than the offset, while the offset of the Rising tone /24/ is more relevant for tone–melody correspondence. In other words, both Falling and Rising tones appear to function more like High tones in at least some contemporary Thai genres. This suggests that the ‘offset principle’, while predicting high rates of tone–melody correspondence in Vietnamese and Cantonese, may not be a universal principle of tonal text-setting.

Rather than group individual tones into discrete categories, Ketkaew and Pittayaporn (2014) treated the 25 possible types of tonal bigrams separately. They grouped these into three categories (rising, falling, and level) based on the type of melodic transition each most often occurred with. For example, they treated the sequence Falling–Falling as a falling transition, but Rising–Rising as a level transition. The resulting correspondence matrix for their corpus of 30 Thai pop songs is reproduced in Table 49.7.

Table 49.7 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a bigram corpus from 30 Thai pop songs

                 Melodic sequence
Tone sequence    Up       Down     Level
Up               1091     415      426
Down             317      1039     483
Level            230      275      594

(Ketkaew and Pittayaporn 2014)

In addition to highlighting the problem of tonal classification, studies on Thai song raise a number of other issues. Many short syllables of polysyllabic Thai words are realized as toneless or with reduced tone on the surface (Potisuk et al. 1994; Peyasantiwong 1986). Simply considering all musical bigrams in an automatic fashion may therefore inflate the count of apparent mismatches. Other factors such as word status (lexical vs. grammatical), note value, and interval range may also need to be taken into account in building up a more accurate picture of tonal text-setting constraints in languages such as Thai.
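Ketkaew and Pittayaporn’s strategy of classifying each tonal bigram type by the melodic transition it most often accompanies amounts to a majority vote per type. The Python sketch below illustrates only that idea. The tone labels follow the conventional names given above, but the toy corpus is hypothetical. The tie-breaking behaviour, in which the first-seen direction wins, is our assumption.

```python
from collections import Counter, defaultdict
from itertools import product

THAI_TONES = ("High", "Mid", "Low", "Falling", "Rising")
BIGRAM_TYPES = list(product(THAI_TONES, repeat=2))  # the 25 possible tonal bigrams


def classify_bigram_types(corpus):
    """Map each attested (tone1, tone2) type to its most frequent melodic direction."""
    by_type = defaultdict(Counter)
    for tone1, tone2, melody in corpus:
        by_type[(tone1, tone2)][melody] += 1
    # most_common(1) keeps the first-encountered direction when counts tie.
    return {pair: counts.most_common(1)[0][0] for pair, counts in by_type.items()}


if __name__ == "__main__":
    toy_corpus = [
        ("Falling", "Falling", "down"), ("Falling", "Falling", "down"),
        ("Falling", "Falling", "level"),
        ("Rising", "Rising", "level"), ("Rising", "Rising", "level"),
        ("Rising", "Rising", "up"),
    ]
    print(len(BIGRAM_TYPES))
    print(classify_bigram_types(toy_corpus))
```

Only bigram types attested in the corpus receive a label. Unattested types among the 25 would need a separate decision before a full matrix like Table 49.7 can be filled.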

49.3.4 Traditional Dinka songs

Some years ago the first author was involved in a large funded project6 on Dinka song. One of the research aims was to try to understand how tone and other ‘suprasegmental’ phonological features are manifested in singing. This is not a peripheral topic in Dinka, for two reasons. First, song is ubiquitous in Dinka culture: individuals compose and own songs that are used for a wide range of social purposes (Deng 1973; Impey 2013). Second, Dinka phonology has an unusually rich suprasegmental component. This includes a two-way voice-quality distinction (Andersen 1987b), a three-way quantity distinction (Remijsen and Gilley 2008), and a tonal system that is heavily used in the ablaut-based inflectional morphology (Andersen 1993, 2002; Ladd et al. 2009a). In most dialects there seem to be three distinct tones (low, high, and falling), but some may also include a fourth (rising) tone (Andersen 1987b; Remijsen and Ladd 2008).

6 ‘Metre and Melody in Dinka Speech and Song’, funded by the UK Arts and Humanities Research Council, 2009–2012. The project was the initiative of Bert Remijsen (Edinburgh); others involved were Angela Impey (SOAS, University of London) and Miriam Meyerhoff (formerly Edinburgh, now Oxford), and, among others in South Sudan, Peter Malek and Elizabeth Achol Deng. Much of the effort of the project was devoted to assembling an archive of some 200 Dinka songs, available from the British Library (http://sounds.bl.uk/World-and-traditional-music/Dinka-songs-from-South-Sudan). The question addressed here—tone–melody matching in Dinka—is unfortunately still best described as ‘work in progress’.

Musically, Dinka songs are generally performed unaccompanied. They are characterized by musical phrases of variable length, with a simple rhythmic pulse and no overarching metrical structure or analogue to stanzas. The phrases often begin with big melodic movements and end with long stretches of syllables sung on the same pitch. The musical scale is an unremarkable anhemitonic pentatonic scale (in ordinary language, any scale that can be played using only the black notes of a piano keyboard), which is found in many unrelated musical traditions around the world. The texts are of paramount importance in many performances, so the question of whether and how tonal identity is conveyed in music is of considerable interest.

As mentioned in the introduction, our initial approach to this question assumed that cues to the tone on a given syllable would be found on the syllable itself: for example, that low-toned syllables might be produced slightly below pitch and high-toned syllables slightly above. In a good deal of exploratory work on some of the songs in our corpus, we found no such pattern.7 The absence of any such effect for high and low tones led us to conclude that there are no local phonetic cues to the linguistic tone of a sung note. By contrast, bigram-based effects of the sort illustrated so far are certainly present. A preliminary tally based on about one minute each of three songs sung by three different singers, a total of 334 bigrams, is displayed in Table 49.8. It can be seen that just over half the bigrams exhibit similar settings and fewer than 10% involve contrary settings. As just mentioned, musical phrases in Dinka songs often end with long sequences of identical notes, which makes oblique setting of Type I (see example (3c)) very likely; this can be seen in the rightmost column of the matrix. Oblique settings of Type II are encouraged by linguistic factors. Over half the syllables in running text have low tone, and it seems accurate to describe low tone as ‘unmarked’, which means that there are many sequences of two, three, or even more low tones. This is reflected in the bottom row of the matrix.
Fuller investigation of tone–melody mapping in Dinka is difficult for a variety of practical reasons, the most important being that tone varies somewhat from dialect to dialect and is not marked in the developing standard orthography. There are also significant questions about how to deal with toneless syllables, which are fairly common in Dinka. The tallies in Table 49.8 are based on a small set of tonally transcribed songs that we intend to use in a fuller analysis.

Table 49.8 Frequencies of similar (bold), oblique (underlined), and contrary (italic) settings in a 355-bigram pilot corpus from three Dinka songs

                 Melodic sequence
Tone sequence    Up       Down     Level
Up               56       6        45
Down             19       50       39
Level            21       19       69
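The two subtypes of oblique setting occupy different cells of the matrix, so they can be counted separately. Type I settings (different tones, same note) fall in the rightmost column, and Type II settings (same tone, different notes) fall in the bottom row. The Python sketch below does this for Table 49.8. The nine cells as printed sum to 324, which differs from the totals of 334 and 355 given in the text and caption, so all figures here are computed from the cell values alone. The function name is our own.

```python
# Table 49.8: rows = tonal direction, columns = melodic direction (up, down, level).
DINKA_PILOT = [[56, 6, 45], [19, 50, 39], [21, 19, 69]]


def setting_breakdown(m):
    """Split a 3x3 tally into similar, contrary, and the two oblique subtypes."""
    return {
        "total": sum(sum(row) for row in m),
        "similar": m[0][0] + m[1][1] + m[2][2],
        "contrary": m[0][1] + m[1][0],
        "oblique_type_I": m[0][2] + m[1][2],   # tones differ, notes identical
        "oblique_type_II": m[2][0] + m[2][1],  # tones identical, notes differ
    }


if __name__ == "__main__":
    counts = setting_breakdown(DINKA_PILOT)
    for key, value in counts.items():
        share = f" ({value / counts['total']:.1%})" if key != "total" else ""
        print(f"{key}: {value}{share}")
```

On these cell values, similar settings account for 54% of the bigrams and contrary settings for under 8%, in line with the description above. Type I oblique settings (84) are roughly twice as frequent as Type II (40).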

7 The negative finding is almost certainly meaningful, because our methods were sensitive enough to detect other small phonetic effects in the same song corpus. Specifically, we readily identified effects of vowel-intrinsic f0: mean f0 values for high vowels produced on a given musical note are slightly higher than for mid vowels, which in turn are slightly higher than for low vowels.


49.4 Prospect

As we have suggested throughout this chapter, an important benefit of treating the problem of tone–melody correspondence in tone-language singing as a matter of text-setting constraints is that this approach allows us to formulate clear directions for further research. In this brief final section we mention a few such possibilities.

Perhaps the most obvious question that requires additional empirical evidence is one we have mentioned several times, namely how to determine whether a given tonal bigram should be expected to match a rising, level, or falling melodic sequence. We introduced this problem in connection with contour tones in §49.2 and illustrated some of the potential complexities in several of our case studies in §49.3. In Thai, for example, simply referencing the offset pitch of contour tones may prove inadequate to account for the way tonal bigrams match musical ones. Similarly, both Thai and Dinka raise the question of how to count toneless syllables in evaluating bigrams. In other languages, similar issues arise in connection with other well-known tone-language phenomena such as downstep and tone sandhi. Expressing these issues in terms of the correspondence between tonal and musical bigrams makes it possible to formulate explicit research questions with empirical answers.

A more general question concerns the variation we may expect to find from one language or musical tradition to another. As our brief case studies have sought to illustrate, languages appear to differ regarding the degree to which they actively prefer similar settings or merely seek to avoid contrary settings. There also appear to be language-specific differences with respect to oblique settings: for instance, Type II oblique settings (sequences of identical tones on a moving melody; see example (3d)) may be tolerated more with certain tones than with others. If this is the case for a particular language, for which tones might it hold, and why?
To what extent can the phonetic or phonological properties of a particular tone system predict the degree to which these constraints will be violated? Much more empirical work will be necessary in order to build up a better picture of the cross-linguistic landscape. Still another general question that needs investigation is whether it is possible and desirable to treat tone–melody mapping exclusively in terms of local constraints, which is the approach we have assumed throughout this chapter. This approach implies that there is no overarching plan to the melody, no sense in which long tone sequences must match whole musical phrases. Every bigram instantiates one of three types of setting, and the constraints on tone–melody correspondence are defined in terms of the type of setting on each bigram. Even the conditions under which the constraints can be relaxed (such as the fact that contrary settings may occur across syntactic and/or musical phrase boundaries) seem to be definable in strictly local terms. It is thus natural to ask whether there are instances in which sequences larger than bigrams are required, or whether highly local constraints (in conjunction with references to high-level structure, such as musical or syntactic boundaries) will suffice. To mention just one example, Carter-Ényì (2016) discusses three-note sequences like those in (4); he shows that, if there is a difference in pitch between the first and third notes, then there is a strong preference for the tones on those notes to differ: melody (4a) best matches H-L-M and melody (4b) best matches M-L-H.

(4)

Cases like (4) also raise the more general question of whether the magnitude of the pitch movement across a bigram—in musical terms, the interval between the two notes or syllables—is relevant to tone–melody correspondence. In this chapter we have treated bigrams simply as rising, falling, or level, without considering the difference between (for example) a small rise and a large one. However, there is plenty of reason to think that this difference may be important in some languages. For example, McPherson and Ryan (2018: 127–128) suggest that in their Tommo So song corpus, contrary settings involving two or more musical scale steps represent a more severe violation of text-setting principles than those involving only a single scale step. The degree to which such tendencies are observed in the tone languages of the world is an open question. A final question for future research is the interaction between tonal and metrical constraints. This issue has broad formal and theoretical aspects but also requires much language-specific empirical investigation. With regard to formal issues, McPherson and Ryan (2018) note several possible parallels between tonal and metrical text-setting, such as sensitivity to phrasal phonology, and Halle (2015) has discussed the extent to which constraints on tune–text matching, in both tonal text-setting and traditional European metrics, are based on purely local rather than long-distance structural relations. As for empirical differences between languages and between musical genres, metrical and tonal constraints may interact in very specific ways. For example, the texts in the vocal traditions of many tone languages often themselves obey specific poetic constraints on which tones may appear in certain metrical positions; moreover, the vocal melodies themselves may not be completely rigid, but will be shaped to accommodate the tones of the text. 
The nature of this bidirectional interaction between tune and text clearly invites further study.
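A strictly local evaluation of the kind discussed in this section, extended with a cost that depends on interval size, might look like the following Python sketch. Everything here is an assumption for illustration. Tones are ordinal levels that are assumed to be already classified, and notes are scale-step indices. Bigrams that straddle a phrase boundary are simply skipped, and the cost values are arbitrary placeholders. This is not McPherson and Ryan’s (2018) constraint-based model.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    tone: int                     # ordinal tone level (assumed already classified)
    step: int                     # scale-step index of the sung note
    boundary_after: bool = False  # phrase boundary between this syllable and the next


def sign(first: int, second: int) -> int:
    """+1 for a rise, -1 for a fall, 0 for level."""
    return (second > first) - (second < first)


def violation_cost(song, oblique=1.0, contrary=3.0, contrary_large=5.0):
    """Sum hypothetical bigram-by-bigram costs over a sung line."""
    total = 0.0
    for s1, s2 in zip(song, song[1:]):
        if s1.boundary_after:
            continue  # bigrams across phrase boundaries are not penalized
        tonal, melodic = sign(s1.tone, s2.tone), sign(s1.step, s2.step)
        if tonal == melodic:
            continue  # similar setting: no cost
        if tonal * melodic == -1:
            # Contrary setting; a leap of two or more scale steps costs more.
            total += contrary_large if abs(s2.step - s1.step) >= 2 else contrary
        else:
            total += oblique  # exactly one of tone or melody is level
    return total


if __name__ == "__main__":
    line = [Syllable(1, 0), Syllable(3, 1),
            Syllable(1, 3, boundary_after=True), Syllable(3, 2)]
    print(violation_cost(line))
```

Because every cost is computed from one bigram and one boundary flag, the evaluation stays strictly local. A trigram-sensitive preference of the kind illustrated in (4) would require looking beyond `zip(song, song[1:])`.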

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

References

Aarons, D. (1994). Aspects of the syntax of American Sign Language. PhD dissertation, Boston University.
Abaev, V. I. (1949). Osetinskiy jazyk i fol’klor. St Petersburg: Akademiya Nauk SSSR.
Abbi, A. (2013). A Grammar of the Great Andamanese Language: An Ethnolinguistic Study. Leiden: Brill.
Abboub, N., N. Boll-Avetisyan, A. Bhatara, B. Höhle, and T. Nazzi (2016a). An exploration of rhythmic grouping of speech sequences by French- and German-learning infants. Frontiers in Human Neuroscience 10, 292.
Abboub, N., T. Nazzi, and J. Gervain (2016b). Prosodic grouping at birth. Brain and Language 162, 46–59.
Abdelghany, H. (2010). Prosodic phrasing and modifier attachment in Standard Arabic sentence processing. PhD dissertation, City University of New York.
Abdelli-Beruh, N., J. Ahn, S. Yang, and D. Van Lancker Sidtis (2007). Acoustic cues differentiating idiomatic from literal expressions across languages. Paper presented at the American Speech and Hearing Association Convention, Boston.
Abercrombie, D. (1965). Syllable quantity and enclitics in English. In D. Abercrombie, D. B. Fry, P. A. D. MacCarthy, N. C. Scott, and J. L. Trim (eds.), In Honour of Daniel Jones, 216–222. London: Longman.
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Abitov, M. L., B. X. Balkarov, J. D. Desheriev, G. B. Rogava, X. U. El’berdov, B. M. Kardanov, and T. X. Kuasheva (1957). Grammatika kabardino-cherkesskogo literaturnogo jazyka. Moscow: Izdatel’stvo Akademii Nauk.
Abolhasanizadeh, V., M. Bijankhan, and C. Gussenhoven (2012). The Persian pitch accent and its retention after the focus. Lingua 122(13), 1380–1394.
Abondolo, D. (1998). The Uralic Languages. New York: Routledge.
Abramson, A. S. (1972). Tonal experiments with whispered Thai. In A. Valdman (ed.), Papers in Linguistics and Phonetics to the Memory of Pierre Delattre, 31–44. The Hague: Mouton.
Abramson, A. S. (1979). The coarticulation of tones: An acoustic study of Thai. In T. Luangthongkum, P. Kullavanijaya, V. Panupong, and K. Tingsabadh (eds.), Studies in Tai and Mon-Khmer Phonetics and Phonology in Honour of Eugenie J. A. Henderson, 1–9. Bangkok: Chulalongkorn University Press.
Abramson, A. S., and T. Luangthongkum (2009). A fuzzy boundary between tone languages and voice-register languages. In G. Fant, H. Fujisaki, and J. Shen (eds.), Frontiers in Phonetics and Speech Science, 149–155. Beijing: Commercial Press.
Abramson, A. S., P. W. Nye, and T. Luangthongkum (2007). Voice register in Khmuˈ: Experiments in production and perception. Phonetica 64, 80–104.
Abramson, A. S., M. K. Tiede, and T. Luangthongkum (2015). Voice register in Mon: Acoustics and electroglottography. Phonetica 72, 237–256.
Aceto, M. (2008). Eastern Caribbean language varieties: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English. Vol. 2: The Americas and the Caribbean, 290–310. Berlin: Mouton de Gruyter.
Acton, B. (2011). Walkabout talk: The TalkaboutWalkabout. Retrieved 22 May 2020 from http://hipoeces.blogspot.ca/2011/11/walking-talk-talk-about-walkabout.html.


Acton, B. (2019). Essential Haptic-Integrated English Pronunciation (EHIEP). Retrieved 22 May 2020 from https://www.actonhaptic.com/about.
Adam, G., and O. Bat-El (2009). When do universal preferences emerge in language development? The acquisition of Hebrew stress. Brill’s Journal of Afroasiatic Languages and Linguistics 1, 255–282.
Adams, S. H., and J. P. Jarvis (2006). Indicators of veracity and deception: An analysis of written statements made to police. International Journal of Speech, Language and the Law 13, 21.
Adelaar, K. A., and N. P. Himmelmann (2005). The Austronesian Languages of Asia and Madagascar. London: Routledge.
Adelaar, W. F. H. (2004). The Languages of the Andes. Cambridge: Cambridge University Press.
Adriani, N., and S. J. Esser (1939). Koelawische taalstudien (Bibliotheca Celebica, I, II, III). Bandung: A. C. Nix.
Agnew, A., and E. G. Pike (1957). Phonemes of Ocaina (Huitoto). International Journal of American Linguistics 23, 24–27.
Aguilar, A., G. Caballero, L. Carroll, and M. Garellek (2015). Multi-dimensionality in the tonal realization of Choguita Rarámuri (Tarahumara). Paper presented at the Society for the Study of Indigenous Languages of the Americas Annual Meeting, Portland.
Ahn, B. (2008). A Case of OCP Effects in Intonational Phonology. Ms., University of California, Los Angeles.
Ahn, S. (2018). The role of tongue position in laryngeal contrasts: An ultrasound study of English and Brazilian Portuguese. Journal of Phonetics 71, 451–467.
Aikhenvald, A. (1995). Bare (Languages of the World/Materials 100). Munich: Lincom Europa.
Aikhenvald, A. (1998). Warekena. In D. C. Derbyshire and G. K. Pullum (eds.), Handbook of Amazonian Languages, 225–440. Berlin: Mouton de Gruyter.
Aikhenvald, A. (1999). The Arawak language family. In R. M. W. Dixon and A. Aikhenvald (eds.), The Amazonian Languages, 65–106. Cambridge: Cambridge University Press.
Ainsworth, W. A. (1972). Duration as a cue in the recognition of synthetic vowels. Journal of the Acoustical Society of America 51, 648–651.
Aissen, J. (1987). Tzotzil clause structure. Dordrecht: Kluwer Academic.
Aissen, J. (1992). Topic and focus in Mayan. Language 68, 43–80.
Aissen, J. (1999). External possessor and logical subject in Tz’utujil. In D. Payne and I. Barshi (eds.), External Possession, 451–485. Amsterdam: John Benjamins.
Aissen, J. (2000). Prosodic conditions on anaphora and clitics in Jakaltek. In A. Carnie and E. Guilfoyle (eds.), The Syntax of Verb Initial Languages, 185–200. Oxford: Oxford University Press.
Aissen, J. (2017a). Information structure in Mayan. In J. Aissen, N. England, and R. Z. Maldonado (eds.), The Mayan Languages, 293–324. New York: Routledge.
Aissen, J. (2017b). Special clitics and the right periphery in Tsotsil. In C. Bowern, L. R. Horn, and R. Zanuttini (eds.), On Looking into Words (and Beyond): Structures, Relations, Analyses, 235–262. Berlin: Language Science Press.
Aissen, J., N. England, and R. Z. Maldonado (eds.) (2017). The Mayan Languages. New York: Routledge.
Akinlabi, A. (1985). Tonal underspecification and Yoruba tone. PhD dissertation, University of Ibadan.
Akinlabi, A., and M. Y. Liberman (2000). The tonal phonology of Yoruba clitics. In B. Gerlach and J. Grijzenhout (eds.), Clitics in Phonology, Morphology and Syntax, 31–62. Amsterdam: John Benjamins.
Akinlabi, A., and E. E. Urua (2006). Foot structure in the Ibibio verb. Journal of African Languages and Linguistics 24, 119–160.
Akker, E., and A. Cutler (2003). Prosodic cues to semantic structure in native and nonnative listening. Bilingualism: Language and Cognition 6, 81–96.
Akumbu, P. W. (2019). A featural analysis of mid and downstepped high tone in Babanki. In E. Clem, P. Jenks, and H. Sande (eds.), Theory and description in African Linguistics: Selected papers from the 47th Annual Conference on African Linguistics, 3–20. Berlin: Language Science Press.
Al-Ali, M. N., and M. Q. Al-Zoubi (2009). Different pausing, different meaning: Translating Quranic verses containing syntactic ambiguity. Perspectives: Studies in Translatology 17, 227–241.
Alber, B., and S. Arndt-Lappe (2012). Templatic and subtractive truncation. In J. Trommer (ed.), The Morphology and Phonology of Exponence, 289–325. Oxford: Oxford University Press.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Alberti, P. W. (1995). The Anatomy and Physiology of the Ear and Hearing. Toronto: University of Toronto Press.
Albin, A. L. (2015). Typologizing native language influence on intonation in a second language: Three transfer phenomena in Japanese EFL learners. PhD dissertation, Indiana University.
Albin, D. D., and C. H. Echols (1996). Stressed and word-final syllables in infant-directed speech. Infant Behavior and Development 19(4), 401–418.
Alderete, J. (2001a). Dominance effects as transderivational anti-faithfulness. Phonology 18, 201–253.
Alderete, J. (2001b). Morphologically Governed Accent in Optimality Theory. New York: Routledge.
Alexander, J. (2010). The theory of adaptive dispersion and acoustic-phonetic properties of cross-language lexical-tone systems. PhD dissertation, Northwestern University.
Alhawary, M. T. (2011). Modern Standard Arabic Grammar: A Learner’s Guide. Malden: John Wiley and Sons.
Ali-Cherif, A., M. L. Royere, A. Gosset, M. Poncet, G. Salamon, and R. Khalil (1984). Behavior and mental activity disorders after carbon monoxide poisoning: Bilateral pallidal lesions. Revue Neurologique 140, 401–405.
Alieva, N. F. (1984). A language-union in Indo-China. Asian and African Studies 20, 11–22.
Allen, G. D. (1975). Speech rhythm: Its relation to performance universals and articulatory timing. Journal of Phonetics 3, 75–86.
Allen, G. D., and S. Hawkins (1978). The development of phonological rhythm. In A. Bell and J. Bybee Hooper (eds.), Syllables and Segments, 173–185. Amsterdam: North Holland.
Allen, G. D., and S. Hawkins (1980). Phonological rhythm: Definition and development. Child Phonology 1, 227–256.
Allison, E. J. (1979). The phonology of Sibutu Sama: A language of the southern Philippines. Studies in Philippine Linguistics 3, 63–104.
Allison, S. (2012). Aspects of a grammar of Makary Kotoko (Chadic, Cameroon). PhD dissertation, University of Colorado.
Alter, K., and U. Junghanns (2002). Topic-related prosodic patterns in Russian. In P. Kosta and J. Frasek (eds.), Current Approaches to Formal Slavic Linguistics, 73–83. Frankfurt: Peter Lang.
Altshuler, D. (2009). Osage fills the gap: The quantity insensitive iamb and the typology of feet. International Journal of American Linguistics 75, 365–398.
Álvarez, J. (1994). Estudios de lingüística guajira. Maracaibo: Gobernación del Estado Zulia, Secretaría de Cultura.
Alzaidi, M. S., Y. Xu, and A. Xu (2018). Prosodic encoding of focus in Hijazi Arabic. Speech Communication 106, 127–149.
Amano, S., T. Nakatani, and T. Kondo (2006). Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. Journal of the Acoustical Society of America 119, 1636–1647.
Ambrazaitis, G., and D. House (2017). Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings. Speech Communication 95, 100–113.
Ambrazaitis, G., and O. Niebuhr (2008). Dip and hat pattern: A phonological contrast of German? In Proceedings of Speech Prosody 4, 269–272, Campinas.
American Psychiatric Association (2013). Diagnostic and Statistical Manual of Mental Disorders (DSM-5®). Washington, DC: American Psychiatric Association.
American Speech-Language-Hearing Association (2017). Dysarthria. (n.d.). Retrieved 7 May 2020 from http://www.asha.org/public/speech/disorders/dysarthria/#what_is_dysarthria.
American Speech-Language-Hearing Association (2018). Causes of hearing loss in children. (n.d.). Retrieved 7 May 2020 from https://www.asha.org/public/hearing/causes-of-hearing-loss-in-children/.
Amha, A. (2012). Omotic. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 423–504. Cambridge: Cambridge University Press.
An, G., S. I. Levitan, J. Hirschberg, and R. Levitan (2018). Deep personality recognition for deception detection. In INTERSPEECH 2018, 421–425, Hyderabad.
Andersen, T. (1987a). An outline of Lulubo phonology. Studies in African Linguistics 18, 39–65.

Andersen, T. (1987b). The phonemic system of Agar Dinka. Journal of African Languages and Linguistics 9, 1–27.
Andersen, T. (1993). Vowel quality alternation in Dinka verb inflection. Phonology 10, 1–42.
Andersen, T. (1995). Morphological stratification in Dinka: On the alternations of voice quality, vowel length and tone in the morphology of transitive verbal roots in a monosyllabic language. Studies in African Linguistics 23, 1992–1994.
Andersen, T. (2002). Case inflection and nominal head marking in Dinka. Journal of African Languages and Linguistics 23, 1–30.
Anderson, D. (1962). Conversational Ticuna. Yarinacocha: Instituto Lingüístico de Verano.
Anderson, G. D. S. (2004). The languages of Central Siberia. In E. J. Vajda (ed.), Languages and Prehistory of Central Siberia, 1–119. Amsterdam: John Benjamins.
Anderson, S. (1978). Tone features. In V. A. Fromkin (ed.), Tone: A Linguistic Survey, 133–176. New York: Academic Press.
Anderson, V., and Y. Otsuka (2006). The phonetics and phonology of ‘definitive accent’ in Tongan. Oceanic Linguistics 45, 21–42.
Anderson-Hsieh, J., and R. M. Dauer (1997). Slowed-down speech: A teaching tool for listening/pronunciation. Retrieved 7 May 2020 from http://files.eric.ed.gov/fulltext/ED413774.pdf.
Andreassen, H., and J. Eychenne (2013). The French foot revisited. Language Sciences 39, 126–140.
Andreeva, B. (2007). Zur Phonetik und Phonologie der Intonation der Sofioter-Varietät des Bulgarischen, PHONUS 12. PhD dissertation, University of the Saarland.
Andreeva, B., T. Avgustinova, and W. J. Barry (2001). Link-associated and focus-associated accent patterns in Bulgarian. In G. Zybatow, U. Junghanns, G. Mehlhorn, and L. Szucsich (eds.), Current Issues in Formal Slavic Linguistics, 353–364. Frankfurt: Peter Lang.
Andreeva, B., W. J. Barry, and J. C. Koreman (2013). The Bulgarian stressed and unstressed vowel system: A corpus study. In INTERSPEECH 2013, 345–348, Lyon.
Andreeva, B., W. J. Barry, and J. C. Koreman (2016). Local and global cues in the prosodic realization of broad and narrow focus in Bulgarian. In M. Żygis and Z. Malisz (eds.), Slavic perspectives on prosody (special issue), Phonetica 73, 260–282.
Andreeva, B., and D. Oliver (2005). Information structure in Polish and Bulgarian: Accent types and peak alignment in broad and narrow focus. In S. Franks, F. Y. Gladney, and M. Tasseva-Kurktchieva (eds.), Formal Approaches to Slavic Linguistics 13: The South Carolina Meeting (2004), 1–12. Ann Arbor: Michigan Slavic Publications.
Andruski, J. E., and M. Ratliff (2000). Phonation types in production of phonological tone: The case of Green Mong. Journal of the International Phonetic Association 30, 37–61.
Anolli, L., R. Ciceri, and M. G. Infantino (2002). From ‘blame by praise’ to ‘praise by blame’: Analysis of vocal patterns in ironic communication. International Journal of Psychology 27, 266–276.
Anonby, E. J. (2010). A Grammar of Mambay. Cologne: Rüdiger Köppe.
Antonio Ramos, P. (2015). La fonología y morfología del zapoteco de San Pedro Mixtepec. PhD dissertation, Centro de Investigaciones y Estudios Superiores en Antropología Social.
Antunes, G. (2004). A Grammar of Sabanê: A Nambikwaran Language. Utrecht: LOT.
Aoun, J. E., E. Benmamoun, and L. Choueiri (2010). The Syntax of Arabic. Cambridge: Cambridge University Press.
Aoyama, K., and S. G. Guion (2007). Prosody in second language acquisition: Acoustic analyses of duration and f0 range. In O.-S. Bohn and M. J. Munro (eds.), Language Experience in Second Language Speech Learning: In Honour of James Emil Flege, 281–297. Amsterdam: John Benjamins.
Apopeiu, V., D. Jitcă, and A. Turculeţ (2006). Intonational structures in Romanian yes-no questions. Computer Science Journal of Moldova 14(1), 113–137.
Apperly, I. A. (2012). What is ‘theory of mind’? Concepts, cognitive processes and individual differences. Quarterly Journal of Experimental Psychology 65, 825–839.
Aquilina, J. (1959). The Structure of Maltese. Malta: Progress Press.
Aquilina, J. (1965). Teach Yourself Maltese. London: English Universities Press.
Archibald, J. (1992). Transfer of L1 parameter settings: Some empirical evidence from Polish metrics. Canadian Journal of Linguistics 37, 301–339.

Archibald, J. (1993). Learnability of English metrical parameters by adult Spanish speakers. International Review of Applied Linguistics 31/32, 129–142.
Archibald, J. (1994). A formal model of learning L2 prosodic phonology. Second Language Research 10, 215–240.
Archibald, J. (1997). The acquisition of second language phrasal stress: A pilot study. Language Acquisition and Language Disorders 16, 263–290.
Arellanes Arellanes, F. (2009). El sistema fonológico y las propiedades fonéticas del zapoteco de San Pablo Güilá: Descripción y análisis formal. PhD dissertation, Colegio de México.
Argyle, M., and M. Cook (1976). Gaze and Mutual Gaze. Cambridge: Cambridge University Press.
Århammar, N. (1968). Friesische Dialektologie. In L. E. Schmitt (ed.), Germanische Dialektologie, 264–317. Wiesbaden: Steiner.
Arias, J. P., N. B. Yoma, and H. Vivanco (2010). Automatic intonation assessment for computer aided language learning. Speech Communication 52, 254–257.
Ariel, M. (1991). The function of accessibility in a theory of grammar. Journal of Pragmatics 16, 443–463.
Armstrong, M. E., N. Esteve-Gibert, I. Hübscher, A. Igualada, and P. Prieto (2018). Developmental and cognitive aspects of children’s disbelief comprehension through intonation and facial gesture. First Language 38(6), 596–616.
Armstrong, M. E., and I. Hübscher (2018). Children’s development of internal state prosody. In P. Prieto and N. Esteve-Gibert (eds.), The Development of Prosody in First Language Acquisition, 271–294. Amsterdam: John Benjamins.
Armstrong, M. E., and M. M. Vanrell (2016). Intonational polar question markers and implicature in American English and Majorcan Catalan. In Proceedings of Speech Prosody 8, 158–162, Boston.
Árnason, K. (1987). The stress of prefixes and suffixes in Icelandic. In K. Gregersen and H. Basbøll (eds.), Nordic Prosody IV: Papers from a Symposium, 137–146. Odense: Odense University Press.
Árnason, K. (1994–1995). Tilraun til greiningar á íslensku tónfalli. Íslenskt mál 16–17, 99–131.
Árnason, K. (1998). Toward an analysis of Icelandic intonation. In S. Werner (ed.), Nordic Prosody: Proceedings of the VIIth Conference, Joensuu 1996, 49–62. Frankfurt: Peter Lang.
Árnason, K. (2005). Hljóð: Handbók um hljóðfræði og hljóðkerfisfræði. Reykjavík: Almenna bókafélagið.
Árnason, K. (2009). Phonological domains in modern Icelandic. In J. Grijzenhout and B. Kabak (eds.), Phonological Domains: Universals and Deviations, 283–313. Berlin: Mouton de Gruyter.
Árnason, K. (2011). The Phonology of Icelandic and Faroese. Oxford: Oxford University Press.
Árnason, K., and H. Þorgeirsson (2017). Tonality in earlier Icelandic. In J. E. Abrahamsen, J. C. Koreman, and W. A. van Dommelen (eds.), Nordic Prosody: Proceedings of the XIIth Conference, Trondheim 2016, 51–62. Frankfurt: Peter Lang.
Arnfield, S. (1994). Prosody and syntax in corpus based analysis of spoken English. PhD dissertation, University of Leeds.
Arnhold, A. (2007). Focus realisation in West Greenlandic intonation. MA thesis, University of Potsdam.
Arnhold, A. (2014a). Prosodic structure and focus realization in West Greenlandic. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 216–251. Oxford: Oxford University Press.
Arnhold, A. (2014b). Finnish prosody: Studies in intonation and phrasing. PhD dissertation, Goethe University.
Arnhold, A. (2015). What do compounds and noun phrases tell us about tonal targets in Finnish? Nordic Journal of Linguistics 38, 221–244.
Arnhold, A. (2016). Complex prosodic focus marking in Finnish: Expanding the data landscape. Journal of Phonetics 56, 85–109.
Arnhold, A., A. Chen, and J. Järvikivi (2016). Acquiring complex focus-marking: Finnish four- to five-year-olds use prosody and word order in interaction. Frontiers in Psychology 7, 1886.
Arnhold, A., R. Compton, and E. Elfner (2018). Prosody and wordhood in South Baffin Inuktitut. In M. Keough, N. Weber, A. Anghelescu, S. Chen, E. Guntly, K. Johnson, D. Reisinger, and O. Tkachman (eds.), Proceedings of the Workshop on Structure and Constituency in the Languages of the Americas 21, 30–39. University of British Columbia Working Papers in Linguistics 46.
Arnhold, A., and A.-K. Kyröläinen (2017). Modelling the interplay of multiple cues in prosodic focus marking. Laboratory Phonology 8, 1–25.
Arnhold, A., E. Elfner, and R. Compton (in press). Inuktitut and the concept of word-level prominence. In K. Bogomolets and H. van der Hulst (eds.), Word Prominence in Languages with Complex Morphology. Oxford: Oxford University Press.
Arnold, J. E. (2008). THE BACON not the bacon: How children and adults understand accented and unaccented noun phrases. Cognition 108(1), 69–99.
Arnold, W. (2011). Western Neo-Aramaic. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 685–696. Berlin: Mouton de Gruyter.
Arnott, D. W. (1964). Downstep in the Tiv verbal system. African Language Studies 5, 34–51.
Aronson, H. I. (1990). Georgian: A Reading Grammar. Bloomington: Slavica.
Arvaniti, A. (1991). The phonetics of modern Greek rhythm and its phonological implications. PhD dissertation, University of Cambridge.
Arvaniti, A. (1992). Secondary stress: Evidence from modern Greek. In G. J. Docherty and D. R. Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 398–423. Cambridge: Cambridge University Press.
Arvaniti, A. (1994). Acoustic features of Greek rhythmic structure. Journal of Phonetics 22, 239–268.
Arvaniti, A. (1998). Phrase accents revisited: Comparative evidence from Standard and Cypriot Greek. In Proceedings of the 5th International Conference on Spoken Language Processing, vol. 7, 2883–2886, Sydney.
Arvaniti, A. (2000). The acoustics of stress in modern Greek. Journal of Greek Linguistics 1, 9–39.
Arvaniti, A. (2007). Greek phonetics: The state of the art. Journal of Greek Linguistics 8, 97–208.
Arvaniti, A. (2009). Rhythm, timing and the timing of rhythm. Phonetica 66, 46–63.
Arvaniti, A. (2012). The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics 40, 351–373.
Arvaniti, A. (2016). Analytical decisions in intonation research and the role of representations: Lessons from Romani. Laboratory Phonology 7(1), 6.
Arvaniti, A. (2019). Crosslinguistic variation, phonetic variability, and the formation of categories in intonation. In Proceedings of the 19th International Congress of Phonetic Sciences, 1–6, Melbourne.
Arvaniti, A. (in press). The autosegmental-metrical model of intonational phonology. In J. A. Barnes and S. Shattuck-Hufnagel (eds.), Prosodic Theory and Practice. Cambridge, MA: MIT Press.
Arvaniti, A., and E. Adamou (2011). Focus expression in Romani. In Proceedings of the 28th West Coast Conference on Formal Linguistics, 240–248. Somerville, MA: Cascadilla Proceedings Project.
Arvaniti, A., and M. Atkins (2016). Uptalk in Southern British English. In Proceedings of Speech Prosody 8, 153–157, Boston.
Arvaniti, A., and M. Baltazani (2005). Intonational analysis and prosodic annotation of Greek spoken corpora. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 84–117. Oxford: Oxford University Press.
Arvaniti, A., M. Baltazani, and S. Gryllia (2014). The pragmatic interpretation of intonation in Greek wh-questions. In Proceedings of Speech Prosody 7, 1144–1148, Dublin.
Arvaniti, A., and G. Garding (2007). Dialectal variation in the rising accents of American English. In J. Cole and J. I. Hualde (eds.), Laboratory Phonology 9, 547–576. Berlin: Mouton de Gruyter.
Arvaniti, A., and S. Godjevac (2003). The origins and scope of final lowering in English and Greek. In Proceedings of the 15th International Congress of Phonetic Sciences, 1077–1080, Barcelona.
Arvaniti, A., and D. R. Ladd (1995). Tonal alignment and the representation of accentual targets. In Proceedings of the 13th International Congress of Phonetic Sciences, vol. 4, 220–223, Stockholm.
Arvaniti, A., D. R. Ladd, and I. Mennen (1998). Stability of tonal alignment: The case of Greek prenuclear accents. Journal of Phonetics 26, 3–25.

Arvaniti, A., D. R. Ladd, and I. Mennen (2000). What is a starred tone? Evidence from Greek. In M. B. Broe and J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V: Acquisition and the Lexicon, 119–131. Cambridge: Cambridge University Press.
Arvaniti, A., D. R. Ladd, and I. Mennen (2006a). Phonetic effects of focus and ‘tonal crowding’ in intonation: Evidence from Greek polar questions. Speech Communication 48, 667–696.
Arvaniti, A., D. R. Ladd, and I. Mennen (2006b). Tonal association and tonal alignment: Evidence from Greek polar questions and contrastive statements. Language and Speech 49, 421–450.
Arvaniti, A., and D. R. Ladd (2009). Greek wh-questions and the phonology of intonation. Phonology 26, 43–74.
Arvaniti, A., and D. R. Ladd (2015). Underspecification in intonation revisited: A reply to Xu, Lee, Prom-on and Liu. Phonology 32, 537–541.
Arvaniti, A., and T. Rodriquez (2013). The role of rhythm class, speaking rate, and f0 in language discrimination. Laboratory Phonology 4, 7–38.
Arvaniti, A., M. Żygis, and M. Jaskuła (2017). The phonetics and phonology of the Polish calling melodies. Phonetica 73, 338–361.
Aschmann, H. P. (1946). Totonaco phonemes. International Journal of American Linguistics 12, 34–43.
Astésano, C., E. Gurman Bard, and A. E. Turk (2007). Structural influences on initial accent placement in French. Language and Speech 50, 423–446.
Astruc, L., E. Payne, B. Post, M. M. Vanrell, and P. Prieto (2013). Tonal targets in early child English, Spanish and Catalan. Language and Speech 56, 229–253.
Asu, E. L. (2004). The phonetics and phonology of Estonian intonation. PhD dissertation, University of Cambridge.
Asu, E. L. (2006). Rising intonation in Estonian: An analysis of map task dialogues and spontaneous conversations. In R. Aulanko, L. Wahlberg, and M. Vainio (eds.), Fonetiikan Päivät 2006/The Phonetics Symposium 2006, 1–8. Helsinki: Helsinki University.
Asu, E. L., P. Lippus, N. Salveste, and H. Sahkai (2016). F0 declination in spontaneous Estonian: Implications for pitch-related preplanning in speech production. In Proceedings of Speech Prosody 8, 1139–1142, Boston.
Asu, E. L., and F. Nolan (2006). Estonian and English rhythm: A two-dimensional quantification based on syllables and feet. In Proceedings of Speech Prosody 3, Dresden.
Asu, E. L., and F. Nolan (2007). The analysis of low accentuation in Estonian. Language and Speech 50, 567–588.
Atoye, R. O. (2005). Non-native perception and interpretation of English intonation. Nordic Journal of African Studies 14, 26–42.
Attardo, S., J. Eisterhold, J. F. Hay, and I. Poggi (2003). Multimodal markers of irony and sarcasm. International Journal of Humor Research 16, 243–260.
Atterer, M., and D. R. Ladd (2004). On the phonetics and phonology of ‘segmental anchoring’ of f0: Evidence from German. Journal of Phonetics 32, 177–197.
Attinasi, J. (1973). Lak t’an: A grammar of the Chol (Mayan) word. PhD dissertation, University of Chicago.
Aureli, T., M. Spinelli, M. Fasolo, M. C. Garito, P. Perucchini, and L. D’Odorico (2017). The pointing-vocal coupling progression in the first half of the second year of life. Infancy 22(6), 801–818.
Austin, P. (1981). A Grammar of Diyari, South Australia (Cambridge Studies in Linguistics 32). Cambridge: Cambridge University Press.
Auziņa, I. (2013). Valodas suprasegmentālās jeb prosodiskās vienības. In D. Nītiņa and J. Grigorjevs (eds.), Latviešu valodas gramatika. Rīga: LU Latviešu valodas institūts.
Avanzi, M., A. C. Simon, J.-P. Goldman, and A. Auchlin (2010). C-PROM: An annotated corpus for French prominence study. In Proceedings of Speech Prosody 5, Chicago.
Avanzi, M., S. Schwab, P. Dubosson, and J.-P. Goldman (2012). La prosodie de quelques variétés de français parlées en Suisse romande. In A. C. Simon (ed.), La variation prosodique régionale en français, 89–120. Louvain-la-Neuve: De Boeck/Duculot.

Avelino, H. (2001). Phonetic correlates of fortis-lenis in Yalálag Zapotec. MA thesis, University of California, Los Angeles.
Avelino, H. (2009). Intonational patterns of topic and focus constructions in Yucatec Maya. In H. Avelino, J. Coon, and E. Norcliffe (eds.), New Perspectives in Mayan Linguistics, vol. 59, 1–21. Cambridge, MA: MIT Press.
Avelino, H. (2010). Acoustic and electroglottographic analyses of nonpathological, nonmodal phonation. Journal of Voice 24, 270–280.
Avelino, H., E. Shin, and S. Tilsen (2011). The phonetics of laryngealization in Yucatec Maya. In H. Avelino (ed.), New Perspectives in Mayan Linguistics, 1–20. Newcastle upon Tyne: Cambridge Scholars Publishing.
Avelino Becerra, H. (2004). Topics in Yalálag Zapotec, with particular reference to its phonetic structures. PhD dissertation, University of California, Los Angeles.
Avesani, C. (1990). A contribution to the synthesis of Italian intonation. In Proceedings of the 1st International Conference on Spoken Language Processing, vol. 1, 834–836, Kobe.
Avesani, C., M. Vayra, and C. Zmarich (2007). On the articulatory bases of prominence in Italian. In Proceedings of the 16th International Congress of Phonetic Sciences, 981–984, Saarbrücken.
Avgustinova, T., and B. Andreeva (1999). Thematic intonation patterns in Bulgarian clitic replication. In Proceedings of the 14th International Congress of Phonetic Sciences, 1501–1504, San Francisco.
Ayres, G. (1991). La gramática Ixil. Antigua, Guatemala: Centro de Investigaciones Regionales de Mesoamérica.
Ayers, G. M. (1996). Nuclear accent types and prominence: Some psycholinguistic experiments. PhD dissertation, The Ohio State University.
Aylett, M., and A. E. Turk (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47, 31–56.
Aylett, M., and A. E. Turk (2006). Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. Journal of the Acoustical Society of America 119, 3048–3058.
Baart, J. L. G. (2004). Tone and song in Kalam Kohistani (Pakistan). In H. Quené and V. J. van Heuven (eds.), On Language and Speech: Studies for Sieb G. Nooteboom, 5–16. Utrecht: Netherlands Graduate Institute of Linguistics.
Bach, E. (1975). Long vowels and stress in Kwakiutl. Texas Linguistic Forum 2, 9–19.
Bach, K. (1997). The semantics–pragmatics distinction: What it is and why it matters. In E. Rolf (ed.), Pragmatik, Linguistische Berichte (Forschung Information Diskussion), 33–50. Wiesbaden: VS Verlag für Sozialwissenschaften.
Bachenko, J., E. Fitzpatrick, and M. Schonwetter (2008). Verification and implementation of language-based deception indicators in civil and criminal narratives. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 41–48, Columbus.
Baese-Berk, M. M., A. R. Bradlow, and B. A. Wright (2013). Accent-independent adaptation to foreign-accented speech. Journal of the Acoustical Society of America 133, 174–180.
Baese-Berk, M. M., L. C. Dilley, M. Henry, L. Vinke, and E. Banzina (2019). Not just a function of function words: Distal speech rate affects perception of prosodically weak syllables. Attention, Perception, and Psychophysics 81, 571–589.
Baese-Berk, M. M., C. Heffner, L. C. Dilley, M. A. Pitt, T. Morrill, and J. D. McAuley (2014). Long-term temporal tracking of speech rate affects spoken-word recognition. Psychological Science 25, 1546–1553.
Baeskow, H. (2004). Lexical Properties of Selected Non-native Morphemes of English. Tübingen: Gunter Narr.
Bagmut, A. J., I. V. Borysjuk, and H. P. Olijnyk (1985). Intonacija spontannoho movlennja. Kyjiv: Naukova Dumka.
Bailey, T. M. (1995). Nonmetrical constraints on stress. PhD dissertation, University of Minnesota.
Baird, B. (2011). Phonetic and phonological realizations of ‘broken glottal’ vowels in Kˈicheeˈ. In K. Shklovsky, P. M. Pedro, and J. Coon (eds.), Proceedings of Formal Approaches to Mayan Linguistics (FAMLi), 39–50. Cambridge, MA: MIT Working Papers in Linguistics.

Baird, B. (2014). An acoustic analysis of contrastive focus marking in Spanish Kˈicheeˈ (Mayan) bilingual intonation. PhD dissertation, University of Texas.
Baird, B., and A. F. Pascual (2011). Realizaciones fonéticas de /VP/ en Q’anjob’al (Maya). In Proceedings of the 5th Conference on Indigenous Languages of Latin America, Austin.
Baken, P. J., and R. F. Orlikoff (2000). Clinical Measurement of Speech and Voice. San Diego: Singular Publishing Group.
Baker, B. J. (2008). Word Structure in Ngalakgan. Stanford: CSLI Publications.
Baker, B. J. (2014). Word structure in Australian languages. In H. Koch and R. Nordlinger (eds.), The World of Linguistics. Vol. 3: The Languages and Linguistics of Australia, 139–213. Berlin: Mouton de Gruyter.
Baker, B. J. (2018). Super-complexity and the status of ‘word’ in Gunwinyguan languages of Australia. In G. Booij (ed.), The Construction of Words, 255–286. Cham: Springer.
Baker, C., and C. Padden (1978). Focusing on the nonmanual components of American Sign Language. In P. Siple (ed.), Understanding Language through Sign Language Research, 27–57. New York: Academic Press.
Baker, R. E. (2010). The acquisition of English focus marking by non-native speakers. PhD dissertation, Northwestern University.
Baker-Shenk, C. (1983). A microanalysis of the nonmanual components of questions in American Sign Language. PhD dissertation, University of California, Berkeley.
Baltaxe, C., and J. Simmons (1985). Prosodic development in normal and autistic children. In E. Schopler and G. Mesibov (eds.), Communication Problems in Autism, 95–125. New York: Plenum Press.
Baltazani, M. (2006). Focusing, prosodic phrasing, and hiatus resolution in Greek. In J. L. Goldstein, D. H. Whalen, and C. T. Best (eds.), Laboratory Phonology 8, 473–494. Berlin: Mouton de Gruyter.
Baltazani, M. (2007a). Prosodic rhythm and the status of vowel reduction in Greek. In Selected Papers on Theoretical and Applied Linguistics from the 17th International Symposium on Theoretical and Applied Linguistics, vol. 1, 31–43. Thessaloniki: Department of Theoretical and Applied Linguistics.
Baltazani, M. (2007b). Intonation of polar questions and the location of nuclear stress in Greek. In C. Gussenhoven and T. Riad (eds.), Experimental Studies in Word and Sentence Prosody: Vol. 2. Tones and Tunes, 387–405. Berlin: Mouton de Gruyter.
Baltazani, M., and S.-A. Jun (1999). Focus and topic intonation in Greek. In Proceedings of the 14th International Congress of Phonetic Sciences, 1305–1308, San Francisco.
Baltazani, M., and E. Kainada (2015). Drifting without an anchor: How pitch accents withstand vowel loss. Language and Speech 58, 84–113.
Baltazani, M., and E. Kainada (2019). The Cretan fall: An analysis of the declarative intonation melody in the Cretan dialect. In M. Janse, B. D. Joseph, I. Kappa, A. Ralli, and M. Tzakosta (eds.), MGDLT 7: Proceedings of the 7th International Conference on Modern Greek Dialects and Linguistic Theory, 38–48. Patras: University of Crete.
Baltazani, M., E. Kainada, A. Lengeris, and K. Nicolaidis (2015). The prenuclear field matters: Questions and statements in Standard Modern Greek. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Baltazani, M., E. Kainada, K. Nikolaidis, A. Sfakianaki, A. Lengeris, E. Tsiartsioni, D. Papazachariou, and M. Giakoumelou (2014). Cross dialectal vowel spaces in Greek. Poster presented at the 14th Conference on Laboratory Phonology, Tokyo.
Baltazani, M., J. Przedlacka, and J. Coleman (2019). Greek in contact: A historical-acoustic investigation of Asia Minor Greek intonational patterns. In M. Janse, B. D. Joseph, I. Kappa, A. Ralli, and M. Tzakosta (eds.), MGDLT 7: Proceedings of the 7th International Conference on Modern Greek Dialects and Linguistic Theory, 49–58. Patras: University of Crete.
Bamgboṣe, A. (1966). The assimilated low tone in Yoruba. Lingua 16, 1–13.

Bannert, R., and A.-C. Bredvad-Jensen (1975). Temporal Organization of Swedish Tonal Accents: The Effect of Vowel Duration. Working Papers, Phonetics Laboratory, Lund University.
Bannert, R., and A.-C. Bredvad-Jensen (1977). Temporal organization of Swedish tonal accents: The effect of vowel duration in the Gotland dialect. In Working Papers 15, 133–138. Phonetics Laboratory, Lund University.
Banti, G., and F. Giannattasio (1996). Music and metre in Somali poetry. In R. J. Hayward and I. M. Lewis (eds.), Voice and Power: The Culture of Language in North-East Africa, 83–128. London: School of Oriental and African Studies.
Banzina, E., L. C. Dilley, and L. Hewitt (2016). The role of secondary stressed and unstressed unreduced syllables in word recognition: Acoustic and perceptual studies with Russian learners of English. Journal of Psycholinguistic Research 45, 813–831.
Bao, Z. (1990). Fanqie languages and reduplication. Linguistic Inquiry 21, 317–350.
Baquiax Barreno, M. C., R. J. Mateo, and F. R. Mejía (2005). Yaq’b’anil stxolilal ti’ Q’anjob’al: Gramática descriptiva Q’anjob’al. Guatemala City: Academia de Lenguas Mayas de Guatemala.
Barbosa, P. A. (2007). From syntax to acoustic duration: A dynamical model of speech rhythm production. Speech Communication 49, 725–742.
Barbu, I. (2016). Listener ratings and acoustic characteristics of intonation contours produced by children with cochlear implants and children with normal hearing. MA thesis, George Washington University.
Bard, E. G., R. C. Shillcock, and G. T. Altmann (1988). The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. Perception and Psychophysics 44, 395–408.
Bardiaux, A., J.-P. Goldman, and A. C. Simon (2012). La prosodie de quelques variétés de français parlées en Belgique. In A. C. Simon (ed.), La variation prosodique régionale en français, 65–87. Bruxelles: De Boeck/Duculot.
Barker, C. (1989). Extrametricality, the cycle, and Turkish word stress. In J. Ito and J. Runner (eds.), Phonology at Santa Cruz 1, 1–33. Santa Cruz: Linguistics Research Center.
Barker, M. A.-a-R. (1964). Klamath Grammar (University of California Publications in Linguistics 32). Berkeley: University of California Press.
Barkhuysen, P., E. Krahmer, and M. Swerts (2008). The interplay between the auditory and visual modality for end-of-utterance detection. Journal of the Acoustical Society of America 123(1), 354–365.
Barnes, J. (1996). Autosegments with three-way lexical contrasts in Tuyuca. International Journal of American Linguistics 62, 31–58.
Barnes, J. A. (2006). Strength and Weakness at the Interface: Positional Neutralization in Phonetics and Phonology. Berlin: Mouton de Gruyter.
Barnes, J. A., A. Brugos, S. Shattuck-Hufnagel, and N. Veilleux (2012a). On the nature of perceptual differences between accentual peaks and plateaux. In O. Niebuhr (ed.), Understanding Prosody: The Role of Context, Function and Communication. Language, Context, and Cognition series, 93–118. Berlin: Mouton de Gruyter.
Barnes, J. A., A. Brugos, N. Veilleux, and S. Shattuck-Hufnagel (2011). Voiceless intervals and perceptual completion in f0 contours: Evidence from scaling perception in American English. In Proceedings of the 17th International Congress of Phonetic Sciences, 108–111, Hong Kong.
Barnes, J. A., A. Brugos, N. Veilleux, and S. Shattuck-Hufnagel (2014). Segmental influences on the perception of pitch accent scaling in English. In Proceedings of Speech Prosody 7, 1125–1129, Dublin.
Barnes, J. A., A. Brugos, N. Veilleux, and S. Shattuck-Hufnagel (2015). Perception of Pseudoswedish tonal contrasts by native speakers of American English: Implications for models of intonation perception. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Barnes, J., and T. Malone (2000). El Tuyuca. In M. S. González de Pérez and M. L. Rodríguez de Montes (eds.), Lenguas indígenas de Colombia: Una visión descriptiva, 437–452. Santafé de Bogotá: Instituto Caro y Cuervo.
Barnes, J. A., N. Veilleux, A. Brugos, and S. Shattuck-Hufnagel (2010a). Tonal center of gravity: How f0 contour shape can matter without configurations. Poster presented at the 12th Conference on Laboratory Phonology, Albuquerque.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Barnes, J. A., N. Veilleux, A. Brugos, and S. Shattuck-Hufnagel (2010b). Turning points, tonal targets, and the L-phrase accent. Language and Cognitive Processes 25, 982–1023.
Barnes, J. A., N. Veilleux, A. Brugos, and S. Shattuck-Hufnagel (2012b). Tonal center of gravity: A global approach to tonal implementation in a level-based intonational phonology. Laboratory Phonology 3, 337–383.
Barnes, J. A., N. Veilleux, A. Brugos, and S. Shattuck-Hufnagel (2019). The interaction of timing and scaling in a lexical tone system: An example from Shilluk. In Proceedings of the 19th International Congress of Phonetic Sciences, 1952–1956, Melbourne.
Barrett, R. (1999). A grammar of Sipakapense Maya. PhD dissertation, University of Texas at Austin.
Barry, W. J. (2007). Rhythm as an L2 problem: How prosodic is it? In J. Trouvain and U. Gut (eds.), Non-native Prosody: Phonetic Description and Teaching Practice, 97–120. Berlin: Mouton de Gruyter.
Barry, W. J., and B. Andreeva (2001). Cross-language similarities and differences in spontaneous speech patterns. Journal of the International Phonetic Association 31, 51–66.
Barry, W. J., B. Andreeva, and J. C. Koreman (2009). Do rhythm measures reflect perceived rhythm? Phonetica 66, 78–94.
Barry, W. J., B. Andreeva, M. Russo, S. Dimitrova, and T. Kostadinova (2003). Do rhythm measures tell us anything about language type? In Proceedings of the 15th International Congress of Phonetic Sciences, 2693–2696, Barcelona.
Bartels, C. (1999). The Intonation of English Statements and Questions: A Compositional Interpretation. New York: Routledge.
Barth-Weingarten, D., N. Dehé, and A. Wichmann (2009). Where Prosody Meets Pragmatics. Bingley: Emerald.
Bartkova, K., E. Delais-Roussarie, and F. Santiago Vargas (2012). Prosotran: A tool to annotate prosodically non-standard data. In Proceedings of Speech Prosody 6, 55–58, Shanghai.
Bartolucci, G., and S. Pierce (1977). A preliminary comparison of phonological development in autistic, normal, and mentally retarded subjects. British Journal of Disorders of Communication 12, 137–147.
Basbøll, H. (2005). The Phonology of Danish. Oxford: Oxford University Press.
Bat-El, O. (2002). True truncation in colloquial Hebrew imperatives. Language 78, 651–683.
Bateman, J. (1990). Iau segmental and tonal phonology. Miscellaneous Studies of Indonesian and Other Languages 10, 29–42.
Batliner, A. (1989). Eine Frage ist eine Frage ist keine Frage: Perzeptionsexperimente zum Fragemodus im Deutschen. In H. Altmann, A. Batliner, and W. Oppenrieder (eds.), Zur Intonation von Modus und Fokus im Deutschen, 87–109. Tübingen: Niemeyer.
Batliner, A., J. Buckow, R. Huber, V. Warnke, E. Nöth, and H. Niemann (1999). Prosodic feature evaluation: Brute force or well designed? In Proceedings of the 14th International Congress of Phonetic Sciences, 2315–2318, San Francisco.
Batliner, A., A. Buckow, H. Niemann, E. Nöth, and V. Warnke (2000a). The prosody module. In W. Wahlster (ed.), Verbmobil: Foundations of Speech-to-Speech Translations, 106–121. Berlin: Springer.
Batliner, A., K. Fischer, R. Huber, J. Spilker, and E. Nöth (2000c). Desperately seeking emotions or: Actors, wizards, and human beings. In Proceedings of the ISCA Tutorial and Research Workshop on Speech and Emotion, 195–200, Newcastle, Northern Ireland.
Batliner, A., R. Huber, H. Niemann, E. Nöth, J. Spilker, and K. Fischer (2000b). The recognition of emotion. In W. Wahlster (ed.), Verbmobil: Foundations of Speech-to-Speech Translations, 122–130. Berlin: Springer.
Batliner, A., A. Kießling, R. Kompe, H. Niemann, and E. Nöth (1997). Can we tell apart intonation from prosody (if we look at accents and boundaries)? In INTERSPEECH 1997, 39–42, Rhodes.
Batliner, A., R. Kompe, A. Kießling, M. Mast, H. Niemann, and E. Nöth (1998). M = syntax + prosody: A syntactic-prosodic labelling scheme for large spontaneous speech databases. Speech Communication 25, 193–222.
Batliner, A., and B. Möbius (2005). Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground? In W. Barry and W. A. van Dommelen (eds.), The Integration of Phonetic Knowledge in Speech Technology, 21–44. Dordrecht: Springer.

Batliner, A., and E. Nöth (1989). The prediction of focus. In Eurospeech 1989, 210–213, Paris.
Batliner, A., E. Nöth, J. Buckow, R. Huber, V. Warnke, and H. Niemann (2001). Whence and whither prosody in automatic speech understanding: A case study. In M. Bacchiani, J. Hirschberg, D. Litman, and M. Ostendorf (eds.), Proceedings of the Workshop on Prosody and Speech Recognition, 3–12, Red Bank, NJ.
Batliner, A., B. Schuller, S. Schaeffler, and S. Steidl (2008). Mothers, adults, children, pets: Towards the acoustics of intimacy. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 4497–4500, Las Vegas.
Batliner, A., S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, and N. Amir (2011). Whodunnit: Searching for the most important feature types signalling emotional user states in speech. Computer Speech and Language 25, 4–28.
Battison, R. M. (1978). Lexical Borrowing in American Sign Language. Silver Spring, MD: Linstok Press.
Baudouin de Courtenay, J. I. N. (1876). Rez’ja i rez’jane. Slavjanskij sbornik 3, 223–371.
Bauer, W. (1997). The Reed Reference Grammar of Māori. Auckland: Reed.
Baum, S. R. (1992). The influence of word length on syllable duration in aphasia: Acoustic analyses. Aphasiology 6, 501–513.
Baum, S. R. (1998). The role of fundamental frequency and duration in the perception of linguistic stress by individuals with brain damage. Journal of Speech, Language, and Hearing Research 41, 31–40.
Baum, S. R., and V. Dwivedi (2003). Sensitivity to prosodic structure in left- and right-hemisphere-damaged individuals. Brain and Language 87, 278–289.
Baum, S. R., and M. D. Pell (1997). Production of affective and linguistic prosody by brain-damaged patients. Aphasiology 11, 177–198.
Baum, S. R., and M. D. Pell (1999). The neural bases of prosody: Insights from lesion studies and neuroimaging. Aphasiology 13, 581–608.
Baum, S. R., M. D. Pell, C. L. Leonard, and J. Gordon (1997). The ability of right- and left-hemisphere damaged individuals to produce and interpret prosodic cues marking phrasal boundaries. Language and Speech 40, 313–330.
Baum, S. R., M. D. Pell, C. L. Leonard, and J. Gordon (2001). Using prosody to resolve temporary syntactic ambiguities in speech production: Acoustic data on brain-damaged speakers. Clinical Linguistics and Phonetics 15, 441–456.
Baumann, S. (2006). The Intonation of Givenness: Evidence from German (Linguistische Arbeiten 508). Tübingen: Niemeyer.
Baumann, S. (2016). Second occurrence focus. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 483–502. Oxford: Oxford University Press.
Baumann, S., and M. Grice (2006). The intonation of accessibility. Journal of Pragmatics 38, 1636–1657.
Baumann, S., M. Grice, and S. Steindamm (2006). Prosodic marking of focus domains: Categorical or gradient? In Proceedings of Speech Prosody 3, Dresden.
Baumann, S., and F. Kügler (2015). Prosody and information status in typological perspective: Introduction to the special issue. Lingua 165(B), 179–182.
Baumann, S., and A. Riester (2012). Referential and lexical givenness: Semantic, prosodic and cognitive aspects. In G. Elordieta and P. Prieto (eds.), Prosody and Meaning (Interface Explorations 25), 119–162. Berlin: Mouton de Gruyter.
Baumann, S., and P. B. Schumacher (2012). (De-)accentuation and the processing of information status: Evidence from event-related brain potentials. Language and Speech 55, 361–381.
Bax, M., C. Tydeman, and O. Flodmark (2006). Clinical and MRI correlates of cerebral palsy: The European cerebral palsy study. JAMA 296, 1602–1608.
Beach, D. M. (1938). The Phonetics of the Hottentot Language. Cambridge: Heffer and Sons.
Beal, H. D. (2011). The segments and tones of Soyaltepec Mazatec. PhD dissertation, University of Texas at Arlington.

Beam de Azcona, R. (2004). A Coatlán-Loxicha Zapotec grammar. PhD dissertation, University of California, Berkeley.
Bearth, T., and C. Link (1980). The tone puzzle of Wobe. Studies in African Linguistics 11, 147–207.
Beaver, D., and B. Z. Clark (2009). Sense and Sensitivity: How Focus Determines Meaning (Explorations in Semantics 12). Oxford: John Wiley and Sons.
Beaver, D., B. Z. Clark, E. Flemming, T. F. Jaeger, and M. Wolters (2007). When semantics meets phonetics: Acoustical studies of second-occurrence focus. Language 83, 245–276.
Beaver, D., and D. Velleman (2011). The communicative significance of primary and secondary accents. Lingua 121, 1671–1692.
Beck, D., and D. Benett (2007). Extending the prosodic hierarchy: Evidence from Lushootseed narrative. Northwest Journal of Linguistics 1, 1–34.
Beckman, M. E. (1986). Stress and Non-stress Accent. Dordrecht: Foris.
Beckman, M. E. (1992). Evidence for speech rhythms across languages. In Y. Tohkura, E. Vatikiotis-Bateson, and Y. Sagisaka (eds.), Speech Perception, Production and Linguistic Structure, 457–463. Tokyo: OHM.
Beckman, M. E. (1996). The parsing of prosody. Language and Cognitive Processes 11, 17–67.
Beckman, M. E., and K. B. Cohen (2000). Modelling the articulatory dynamics of two levels of stress contrast. In M. Horne (ed.), Prosody: Theory and Experiment, 169–200. Dordrecht: Kluwer.
Beckman, M. E., and J. Edwards (1992). Intonational categories and the articulatory control of duration. In Y. Tohkura, E. Vatikiotis-Bateson, and Y. Sagisaka (eds.), Speech Perception, Production and Linguistic Structure, 356–375. Tokyo: OHM.
Beckman, M. E., and J. E. Edwards (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating (ed.), Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III, 7–33. Cambridge: Cambridge University Press.
Beckman, M. E., and G. A. Elam (1997). Guidelines for ToBI labelling, v.3. Retrieved 8 May 2020 from http://www.ling.ohio-state.edu/research/phonetics/E_ToBI.
Beckman, M. E., J. Hirschberg, and S. Shattuck-Hufnagel (2005). The original ToBI system and the evolution of the ToBI framework. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 9–54. Oxford: Oxford University Press.
Beckman, M. E., and J. B. Pierrehumbert (1986). Intonational structure in English and Japanese. In C. J. Ewen and J. Anderson (eds.), Phonology Yearbook 3, 255–310. Cambridge: Cambridge University Press.
Beckman, M. E., and J. J. Venditti (2011). Intonation. In J. A. Goldsmith, J. Riggle, and A. C. Yu (eds.), The Handbook of Phonological Theory, 485–532. Malden: Blackwell.
Behne, D. M., and P. E. Czigler (1995). Distinctive vowel length and postvocalic consonant clusters in Swedish. PHONUM 3, 55–63.
Behrens, S. J. (1988). The role of the right hemisphere in the production of linguistic stress. Brain and Language 33, 104–127.
Behrens, S. J. (1989). Characterizing sentence intonation in a right-hemisphere-damaged population. Brain and Language 37, 181–200.
Beirne, M.-B., and K. Croot (2018). The prosodic domain of phonological encoding: Evidence from speech errors. Cognition 177, 1–7.
Békésy, G. (1928). Zur Theorie des Hörens: Die Schwingungsform der Basilarmembran. Physikalische Zeitschrift 22, 793–810.
Bell, A. (1993). Jemez tones and stress. Colorado Research in Linguistics 12, 26–34.
Beller, R., and P. Beller (1979). Huasteca Nahuatl. In R. Langacker (ed.), Studies in Uto-Aztecan Grammar: Vol. 2. Modern Aztec Grammatical Sketches, 199–306. Arlington, TX: Summer Institute of Linguistics.
Bellugi, U., and E. Klima (1979). Language: Perspectives from another modality. Brain and Mind 69, 99–117.

Belyk, M., and S. Brown (2014). Perception of affective and linguistic prosody: An ALE meta-analysis of neuroimaging studies. Social Cognitive and Affective Neuroscience 9, 1395–1403.
Benavides-Varela, S., and J. Gervain (2017). Learning word order at birth: A NIRS study. Developmental Cognitive Neuroscience 25, 198–208.
Bender, M. L. (1996). The Nilo-Saharan Languages: A Comparative Essay. Munich: Lincom Europa.
Benkirane, T. (1998). Intonation in Western Arabic (Morocco). In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 345–359. Cambridge: Cambridge University Press.
Bennett, R. (2016). Mayan phonology. Language and Linguistics Compass 10, 469–514.
Bennett, R., and R. Henderson (2013). Accent in Uspanteko. Natural Language and Linguistic Theory 31, 589–645.
Bennett, R., J. Coon, and R. Henderson (2016). Introduction to Mayan linguistics. Language and Linguistics Compass 10, 455–468.
Benoît, C., T. Guiard-Marigny, B. Le Goff, and A. Adjoudani (1996). Which components of the face do humans and machines best speechread? In D. Stork and M. Hennecke (eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, 315–328. New York: Springer-Verlag.
Bent, T., A. R. Bradlow, and B. A. Wright (2006). The influence of linguistic experience on the cognitive processing of pitch in speech and nonspeech sounds. Journal of Experimental Psychology: Human Perception and Performance 32, 97–103.
Benua, L. (1995). Identity effects in morphological truncation. In J. Beckman, L. W. Dickey, and S. Urbanczyk (eds.), University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory, 77–136. Amherst: GLSA.
Beňuš, Š., A. Gravano, and J. Hirschberg (2007). The prosody of backchannels in American English. In Proceedings of the 16th International Congress of Phonetic Sciences, 1065–1069, Saarbrücken.
Beňuš, Š., A. Gravano, and J. Hirschberg (2011). Pragmatic aspects of temporal accommodation in turn-taking. Journal of Pragmatics 43, 3001–3027.
Beňuš, Š., A. Gravano, R. Levitan, S. I. Levitan, L. Willson, and J. Hirschberg (2014a). Entrainment, dominance and alliance in supreme court hearings. Knowledge-Based Systems 71, 3–14.
Beňuš, Š., R. Levitan, and J. Hirschberg (2012). Entrainment in spontaneous speech: The case of filled pauses in supreme court hearings. Paper presented at the 3rd IEEE Conference on Cognitive Infocommunications, Kosice, Slovakia.
Beňuš, Š., and K. Mády (2010). Effects of lexical stress and speech rate on the quantity and quality of Slovak vowels. In Proceedings of Speech Prosody 5, Chicago.
Beňuš, Š., U. D. Reichel, and K. Mády (2014b). Modeling accentual phrase intonation in Slovak and Hungarian. In L. Veselovská and M. Janebová (eds.), Complex Visibles Out There: Proceedings of the Olomouc Linguistics Colloquium, 677–689. Olomouc: Palacký University.
Beňuš, Š., and J. Šimko (2012). Rhythm and tempo in Slovak. In Proceedings of Speech Prosody 6, 502–505, Shanghai.
Beňuš, Š., and J. Šimko (2014). Emergence of prosodic boundary: Continuous effects of temporal affordance on inter-gestural timing. Journal of Phonetics 44, 110–129.
Berez, A. (2011). Prosody as a genre-distinguishing feature in Ahtna: A quantitative approach. Functions of Language 18, 210–236.
Bergelson, E., and D. Swingley (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences of the United States of America 109(9), 3253–3258.
Bergelson, E., and D. Swingley (2013). The acquisition of abstract words by young infants. Cognition 127, 391–397.
Bergeson, T. R., and S. E. Trehub (2007). Signature tunes in mothers’ speech to infants. Infant Behavior and Development 30, 648–654.
Bergman, B. (1984). Non-manual components of signed language: Some sentence types in Swedish Sign Language. Recent Research on European Sign Languages, 49–59.
Bergqvist, J. H. G. (2008). Temporal reference in Lakandon Maya: Speaker- and event perspectives. PhD dissertation, School of Oriental and African Studies, University of London.

Bergsland, K. (1994). Aleut Dictionary (Unangam Tunudgusii): An Unabridged Lexicon of the Aleutian, Pribilof, and Commander Islands Aleut Language. Fairbanks: Alaska Native Language Center, University of Alaska.
Bergsland, K. (1997). Aleut Grammar (Unangam Tunuganaan Achixaasix̂): A Descriptive Reference Grammar of the Aleutian, Pribilof, and Commander Islands Aleut Language (Alaska Native Language Center Research Papers 10). Fairbanks: Alaska Native Language Center, University of Alaska.
Berinstein, A. E. (1979). A Cross‑Linguistic Study on the Perception and Production of Stress. UCLA Working Papers in Phonetics 47.
Berinstein, A. E. (1991). The role of intonation in K’ekchi Mayan discourse. In C. McLemore (ed.), Texas Linguistic Forum, vol. 32, 1–19. Austin: University of Texas at Austin.
Berkovits, R. (1994). Durational effects in final lengthening, gapping, and contrastive stress. Language and Speech 37, 237–250.
Berman, J. M. J., C. G. Chambers, and S. A. Graham (2010). Preschoolers’ appreciation of speaker vocal affect as a cue to referential intent. Journal of Experimental Child Psychology 107, 87–99.
Berman, J. M., S. A. Graham, D. Callaway, and C. G. Chambers (2013). Preschoolers use emotion in speech to learn new words. Child Development 84, 1791–1805.
Berman, R. A. (1997). Modern Hebrew. In R. Hetzron (ed.), The Semitic Languages, 312–333. London: Routledge.
Bernal, S., J. Lidz, S. Millotte, and A. Christophe (2007). Syntax constrains the acquisition of verb meaning. Language Learning and Development 3, 325–341.
Bernard, C., and J. Gervain (2012). Prosodic cues to word order: What level of representation? Frontiers in Language Sciences 3, 451.
Bernstein, J. G., and A. J. Oxenham (2006). The relationship between frequency selectivity and pitch discrimination: Sensorineural hearing loss. Journal of the Acoustical Society of America 120, 3929–3945.
Berruto, G. (1993). Le varietà del repertorio. In A. Sobrero (ed.), Introduzione all’italiano contemporaneo, 3–36. Rome: Laterza.
Berry, J. (2011). Accuracy of the NDI wave speech research system. Journal of Speech, Language, and Hearing Research 54, 1295–1301.
Berry, J., Y. H. Poortinga, M. H. Segall, and P. R. Dasan (2002). Cross-Cultural Psychology: Research and Applications. Cambridge: Cambridge University Press.
Berthiaume, S. (2004). A phonological grammar of Northern Pame. PhD dissertation, University of Texas at Arlington.
Bertinetto, P. M. (1980). The perception of stress by Italian speakers. Journal of Phonetics 8, 385–395.
Bertinetto, P. M. (1981). Strutture prosodiche dell’italiano: Accento, quantità, sillaba, giuntura, fondamenti metrici. Florence: Accademia della Crusca.
Bertinetto, P. M. (1985). A proposito di alcuni recenti contributi alla prosodia dell’italiano. Classe di lettere e filosofia 15, 581–643.
Bertinetto, P. M. (1989). Reflections on the dichotomy ‘stress’ vs. ‘syllable-timing’. Revue de phonétique appliquée 91, 99–130.
Bertinetto, P. M., and C. Bertini (2008). On modelling the rhythm of natural languages. In Proceedings of Speech Prosody 4, 427–430, Campinas.
Bertinetto, P. M., and E. Magno Caldognetto (1993). Ritmo e intonazione. In A. Sobrero (ed.), Introduzione all’italiano contemporaneo, 141–192. Bari: Laterza.
Bertini, C., and P. M. Bertinetto (2009). Prospezioni sulla struttura ritmica dell’italiano basate sul corpus semispontaneo AVIP/API. In L. Romito, V. Galatà, and R. Lio (eds.), La fonetica sperimentale: Metodo e applicazioni (Proceedings of the 4th Conference of the Associazione Italiana di Scienze della Voce), 3–21. Torriana: EDK Editore.
Best, C. T. (1995). A direct realist perspective on cross-language speech perception. In W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research, 171–204. Timonium, MD: York Press.
Bethin, C. Y. (1998). Slavic Prosody: Language Change and Phonological Theory. Cambridge: Cambridge University Press.

Bethin, C. Y. (2012). Effects of vowel reduction on Russian and Belarusian inflectional morphology. Lingua 122, 1232–1251.
Bhaskararao, P., and A. Ray (2017). Telugu. Journal of the International Phonetic Association 47, 231–241.
Bhatara, A., N. Boll-Avetisyan, A. Unger, T. Nazzi, and B. Höhle (2013). Native language affects rhythmic grouping of speech. Journal of the Acoustical Society of America 134, 3828–3843.
Bhatia, K. P., and C. D. Marsden (1994). The behavioural and motor consequences of focal lesions of the basal ganglia in man. Brain 117, 859–876.
Bianco, V. (1995). Stress in Northern Lushootseed: A preliminary analysis. In Papers for the 3rd International Conference on Salish and Neighbouring Languages, 127–136. Victoria: University of Victoria.
Biau, E., and S. Soto-Faraco (2013). Beat gestures modulate auditory integration in speech perception. Brain and Language 124, 143–152.
Bickel, B., K. A. Hildebrandt, and R. Schiering (2009). The distribution of phonological word domains: A probabilistic typology. In J. Grijzenhout and B. Kabak (eds.), Phonological Domains: Universals and Deviations, 47–78. Berlin: Mouton de Gruyter.
Bickford, J. A. (1985). Fortis/lenis consonants in Guichicovi Mixe: A preliminary acoustic study. Work Papers of the Summer Institute of Linguistics, University of North Dakota Session, 29, 195–207.
Biemans, M. (2000). Gender Variation in Voice Quality. Utrecht: Landelijke Onderzoekschool Taalwetenschap.
Biezma, M., and K. Rawlins (2012). Responding to alternative and polar questions. Linguistics and Philosophy 35, 361–406.
Billings, L., and D. Kaufman (2004). Towards a typology of Austronesian pronominal clisis. ZAS Papers in Linguistics 34, 15–29.
Bilous, F. R., and R. M. E. Krauss (1988). Dominance and accommodation in the conversational behaviours of same- and mixed-gender dyads. Language and Communication 8, 183–194.
Bilsen, F. A. (1966). Repetition pitch: Monaural interaction of a sound with the repetition of the same, but phase shifted, sound. Acustica 17, 295–300.
Bilsen, F. A. (1977). Pitch of noise signals: Evidence for a ‘central spectrum’. Journal of the Acoustical Society of America 61, 150–161.
Bilsen, F. A., and J. L. Goldstein (1974). Pitch of dichotically delayed noise and its possible spectral basis. Journal of the Acoustical Society of America 55, 292–296.
Binnick, R. I. (1980). The underlying representation of harmonizing vowels: Evidence from modern Mongolian. In R. M. Vago (ed.), Issues in Vowel Harmony, 113–126. Amsterdam: John Benjamins.
Bion, R. A. H., S. Benavides-Varela, and M. Nespor (2011). Acoustic markers of prominence influence infants’ and adults’ segmentation of speech sequences. Language and Speech 54(1), 123–140.
Birch, S., and C. J. Clifton (1995). Focus, accent, and argument structure: Effects on language comprehension. Language and Speech 38, 365–391.
Birk, D. B. W. (1976). The Malakmalak Language, Daly River (Western Arnhem Land). Canberra: Australian National University.
Bishop, D. V., M. J. Snowling, P. A. Thompson, T. Greenhalgh, and CATALISE Consortium (2016). CATALISE: A multinational and multidisciplinary Delphi consensus study. Identifying language impairments in children. PLoS ONE 11, e0158753.
Bishop, J. (2003). Aspects of intonation and prosody in Bininj Gun-wok: An autosegmental-metrical analysis. PhD dissertation, University of Melbourne.
Bishop, J. (2012). Information structural expectations in the perception of prosodic prominence. In G. Elordieta and P. Prieto (eds.), Prosody and Meaning (Interface Explorations 25), 239–270. Berlin: Mouton de Gruyter.
Bishop, J. (2017). Focus projection and prenuclear accents: Evidence from lexical processing. Language, Cognition and Neuroscience 32, 236–253.
Bishop, J., A. J. Chong, and S.-A. Jun (2015). Individual differences in prosodic strategies to sentence parsing. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.

Bishop, J., and P. A. Keating (2012). Perception of pitch location within a speaker’s range: Fundamental frequency, voice quality, and speaker sex. Journal of the Acoustical Society of America 132, 1100–1112.
Bishop, J., and J. Fletcher (2005). Intonation in six dialects of Bininj Gun-wok. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 331–361. Oxford: Oxford University Press.
Bjelica, M. (2012). Speech Rhythm in English and Serbian: A Critical Study of Traditional and Modern Approaches. Novi Sad: Filozofski fakultet u Novom Sadu.
Black, M. P., D. Bone, Z. I. Skordilis, R. Gupta, W. Xia, P. Papadopoulos, S. N. Chakravarthula, B. Xiao, M. V. Segbroeck, J. Kim, P. G. Georgiou, and S. S. Narayanan (2015). Automated evaluation of nonnative English pronunciation quality: Combining knowledge- and data-driven features at multiple time scales. In INTERSPEECH 2015, 493–497, Dresden.
Blair, R. (1964). Yucatec Maya noun and verb morpho-syntax. PhD dissertation, Indiana University.
Blankenhorn, V. (1981). Pitch, quantity and stress in Munster Irish. Éigse 18, 225–250.
Blankenhorn, V. (1982). Intonation in Connemara Irish: A preliminary study of kinetic glides. Studia Celtica 16–17, 259–279.
Blazej, L. J., and A. M. Cohen-Goldberg (2015). Can we hear morphological complexity before words are complex? Journal of Experimental Psychology: Human Perception and Performance 41, 50–68.
Blench, R. (2001). Plural verb morphology in Fobur Izere. Ms., University of Cambridge.
Blench, R. (2005). Plural verb morphology in Eastern Berom. Ms., University of Cambridge.
Blevins, J. (2001a). Nhanda: An Aboriginal language of Western Australia. Honolulu: University of Hawaiʻi Press.
Blevins, J. (2001b). Where have all the onsets gone? Initial consonant loss in Australian Aboriginal languages. In J. Simpson, D. Nash, M. Laughren, P. Austin, and B. Alpher (eds.), Forty Years On: Ken Hale and Australian Languages (Pacific Linguistics 512), 481–492. Canberra: Australian National University.
Blevins, J. (2003). A note on reduplication in Bugotu and Cheke Holo. Oceanic Linguistics 42, 499–505.
Blevins, J., and D. Marmion (1994). Nhanta historical phonology. Australian Journal of Linguistics 14(2), 193–216.
Blevins, J., and A. K. Pawley (2010). Typological implications of Kalam predictable vowels. Phonology 27, 1–44.
Bloch, B. (1950). Studies in colloquial Japanese IV: Phonemics. Language 26, 86–125.
Blonder, L. X., D. Bowers, and K. M. Heilman (1991). The role of the right hemisphere in emotional communication. Brain 114, 1115–1127.
Blood, D. W. (1977). Clause and sentence final particles in Cham. In D. D. Thomas, E. W. Lee, and N. Đ. Liêm (eds.), Southeast Asian Linguistics: No. 4. Chamic Studies, 39–51. Canberra: Pacific Linguistics.
Bloomfield, L. (1917). Tagalog Texts with Grammatical Analysis (3 vols). Urbana: University of Illinois.
Blumenfeld, L. (2015). Meter as faithfulness. Natural Language and Linguistic Theory 33, 79–125.
Blumenfeld, L. (2016). End-weight effects in verse and language. Studia Metrica et Poetica 3, 7–32.
Blust, R. A. (1999). Subgrouping, circularity, and extinction: Some issues in Austronesian comparative history. In E. Zeitoun and P. J.-K. Li (eds.), Selected Papers from the Eighth International Conference on Austronesian Linguistics, 31–94. Taipei: Academia Sinica.
Blust, R. A. (2013). The Austronesian Languages (rev. ed.). Canberra: Asia-Pacific Linguistics.
Boas, F. (1947). Kwakiutl grammar with a glossary of the suffixes (edited by H. B. Yampolsky and Z. S. Harris). Transactions of the American Philosophical Society 37, 199, 202–377.
Boas, F., and E. Deloria (1941). Dakota Grammar (Memoirs of the National Academy of Sciences 23). Washington: US Government Printing Office.
Boersma, P., and V. J. van Heuven (2001). Praat, a system for doing phonetics by computer. Glot International 5(9–10), 341–347.
Boersma, P. (2017). The history of the Franconian tone contrast. In W. Kehrein, B. Köhnlein, P. Boersma, and M. van Oostendorp (eds.), Segmental Structure and Tone, 27–98. Berlin: De Gruyter.
Boersma, P., and D. Weenink (1996). Praat, a System for Doing Phonetics by Computer. Report of the Institute of Phonetic Sciences Amsterdam 132.

Boersma, P., and D. Weenink (2014). Praat: Doing Phonetics by Computer [Software].
Bögels, S., and F. Torreira (2015). Listeners use intonational phrase boundaries to project turn ends in spoken interaction. Journal of Phonetics 52, 46–57.
Bogen, J. E., and H. W. Gordon (1971). Musical tests for functional lateralization with intracarotid amobarbitol. Nature 230, 524–525.
Bogomolets, K., and H. van der Hulst (eds.) (in press). Word Prominence in Polysynthetic Languages. Oxford: Oxford University Press.
Bohnhoff, L. E. (2010). A Description of Dii: Phonology, Grammar, and Discourse. Ngaoundéré: Dii Literature Team.
Bolaños, K. (2016). A Descriptive Grammar of Kakua. Utrecht: LOT.
Bolinger, D. (1951). Intonation: Levels versus configurations. Word 7, 199–210.
Bolinger, D. (1958). A theory of pitch accents in English. Word 14, 109–149.
Bolinger, D. (1978). Intonation across languages. In J. H. Greenberg (ed.), Universals of Human Language: Vol. 2. Phonology, 471–524. Stanford: Stanford University Press.
Bolinger, D. (1982). Intonation and its parts. Language 58, 505–532.
Bolinger, D. (1986). Intonation and its Parts: Melody in Spoken English. Stanford: Stanford University Press.
Bolinger, D. (1989). Intonation and its Uses: Melody in Grammar and Discourse. Stanford: Stanford University Press.
Bombien, L., C. Mooshammer, P. Hoole, and B. Kühnert (2010). Prosodic and segmental effects on EPG contact patterns of word-initial German clusters. Journal of Phonetics 38, 388–403.
Bond, Z. S., and S. Garnes (1980). Misperceptions of fluent speech. In R. A. Cole (ed.), Perception and Production of Fluent Speech, 115–132. Hillsdale, NJ: Erlbaum.
Bond, Z. S., and L. H. Small (1983). Voicing, vowel, and stress mispronunciations in continuous speech. Perception and Psychophysics 34, 470–474.
Bondaruk, A. (2004). The inventory of nuclear tone in Connemara Irish. Journal of Celtic Linguistics 8, 15–47.
Bone, D., C.-C. Lee, and S. S. Narayanan (2014). Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features. IEEE Transactions on Affective Computing 5, 201–213.
Bonnin, C. (2014). Intonation and prosody in Yukulta, a Tangkic language of North West Queensland. MA thesis, University of Queensland.
Booij, G. (1995). The Phonology of Dutch. Oxford: Oxford University Press.
Bordal, G. (2012). Prosodie et contact de langues: Le cas du système tonal du français centrafricain. PhD dissertation, Oslo University.
Borg, A., and M. Azzopardi-Alexander (1997). Maltese. London: Routledge.
Borgeson, S., A. Anttila, R. Heuser, and P. Kiparsky (in press). Antimetricality.
Borgstrøm, C. (1940). The Dialects of the Outer Hebrides: Vol. 1. Oslo: Norsk Tidsskrift for Sprogvidenskap.
Borise, L. (2015). Prominence redistribution in the Aŭciuki dialect of Belarusian. Formal Approaches to Slavic Linguistics 24, 94–109.
Borman, M. B. (1962). Cofan phonemes. SIL International Publications in Linguistics 7, 45–59.
Bornstein, M. H., C. S. Tamis-LeMonda, J. Tal, P. Ludemann, S. Toda, C. W. Rahn, M.-G. Pecheux, H. Azuma, and D. Vardi (1992). Maternal responsiveness to infants in three societies: America, France, and Japan. Child Development 63, 808–821.
Borod, J. C. (1992). Interhemispheric and intrahemispheric control of emotion: A focus on unilateral brain damage. Journal of Consulting and Clinical Psychology 60, 339–348.
Borod, J. C. (1993). Cerebral mechanisms underlying facial, prosodic, and lexical emotional expression: A review of neuropsychological studies and methodological issues. Neuropsychology 7, 445–463.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology 43(1), 1–22.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Borràs-Comes, J., and P. Prieto (2011). Seeing tunes: The role of visual gestures in tune interpretation. Laboratory Phonology 2, 355–380. Borràs-Comes, J., M. M. Vanrell, and P. Prieto (2014). The role of pitch range in establishing intonational contrasts. Journal of the International Phonetic Association 44, 1–20. Borrie, S. A., and C. R. Delfino (2017). Conversational entrainment of vocal fry in young adult female American English speakers. Journal of Voice 31, 513.e25–513.e32. Borrie, S. A., M. J. McAuliffe, J. M. Liss, C. Kirk, G. A. O’Beirne, and T. Anderson (2012). Familiarisation conditions and the mechanisms that underlie improved recognition of dysarthric speech. Language and Cognitive Processes 27, 1039–1055. Bortfeld, H., and J. Morgan (2010). Is early word-form processing stress-full? How natural variability supports recognition. Cognitive Psychology 60, 241–266. Bortfeld, H., J. Morgan, R. M. Golinkoff, and K. Rathbun (2005). Mommy and me: Familiar names help launch babies into speech stream segmentation. Psychological Science 16, 298–304. Bortfeld, H., E. Wruck, and D. Boas (2007). Assessing infants’ cortical response to speech using near-infrared spectroscopy. NeuroImage 34, 407–415. Bosch, A. (1996). Prominence at two levels: Stress vs. pitch prominence in North Welsh. Journal of Celtic Linguistics 5, 121–165. Bosch, A. (2010). Phonology in modern Gaelic. In M. Watson and M. Macleod (eds.), The Edinburgh Companion to the Gaelic Language, 262–282. Edinburgh: Edinburgh University Press. Bosch, A., and K. de Jong (1997). The prosody of Barra Gaelic epenthetic vowels. Studies in the Linguistic Sciences 27, 1–16. Bosch, L., and N. Sebastián-Gallés (1997). Native-language recognition abilities in 4-month-old infants from monolingual and bilingual environments. Cognition 65, 33–69. Bosker, H. R., H. Quené, T. J. M. Sanders, and N. H. de Jong (2014). Native um’s elicit prediction of low-frequency referents, but non-native um’s do not.
Journal of Memory and Language 75, 104–116. Bosker, H. R., and E. Reinisch (2017). Foreign languages sound fast: Evidence from implicit rate normalization. Frontiers in Psychology 8, 1063. Botinis, A. (1989). Stress and Prosodic Structure in Greek. Lund: Lund University Press. Boucher, J., V. Lewis, and G. Collis (1998). Familiar face and voice matching and recognition in children with autism. Journal of Child Psychology and Psychiatry and Allied Disciplines 39, 171–181. Boudlal, A. (2001). Constraint interaction in the phonology and morphology of Casablanca Moroccan Arabic. PhD dissertation, Mohammed V University. Boudreault, M. (1968). Rythme et mélodie de la phrase parlée en France et au Québec. Québec: Presses de l’Université Laval. Bouzon, C., and D. Hirst (2004). Isochrony and prosodic structure in British English. In Proceedings of Speech Prosody 2, 223–226, Nara. Bowerman, S. (2008). White South African English: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English. Vol. 4: Africa, South and Southeast Asia, 164–176. Berlin: Mouton de Gruyter. Bowern, C., J. McDonough, and K. Kelliher (2012). Illustrations of the IPA: Bardi. Journal of the International Phonetic Association 42, 333–351. Bowers, D., R. M. Bauer, and K. M. Heilman (1993). The non-verbal affect lexicon: Theoretical perspectives from neurological studies of affect perception. Neuropsychology 7, 433–444. Bowers, D., L. X. Blonder, and K. M. Heilman (1991). The Florida Affect Battery: Manual. Ms., Center for Neuropsychological Studies, University of Florida. Boyd, R. (1995). Le Zande. In R. Boyd (ed.), Le système verbal dans les langues oubangiennes, 165–197. Munich: Lincom Europa. Boyeldieu, P. (1977). Eléments pour une phonologie du Laal de Gori (Moyen-Chari). In J.-P. Caprile (ed.), Etudes phonologiques tchadiennes, 186–198. Paris: Société des Etudes Linguistiques et Anthropologiques de France. Boyeldieu, P. (1985).
La langue lua (‘Niellim’): Groupe Boua – Moyen-Chari, Tchad. Cambridge: Cambridge University Press.

Boyeldieu, P. (1987). Les langues fer (‘kara’) et yulu du nord centrafricain: Esquisses descriptives et lexiques. Paris: Geuthner. Boyeldieu, P. (1995). Modifications tonales et limites syntaxiques en bagiro (langue ‘Sara’ de la République Centrafricaine). In R. Nicolaï and F. Rottland (eds.), Cinquième Colloque de Linguistique Nilo-Saharienne/Fifth Nilo-Saharan Linguistics Colloquium, Nice, 24–29 Août 1992, 131–145. Cologne: Rüdiger Köppe. Boyeldieu, P. (2000). Identité tonale et filiation des langues Sara-Bongo-Baguirmiennes (Afrique Centrale). Cologne: Rüdiger Köppe. Boyeldieu, P. (2009). Le quatrième ton du Yulu. Journal of African Languages and Linguistics 30, 197–234. Bradford, B. (1988). Intonation in Context. Cambridge: Cambridge University Press. Bradford, B. (1997). Upspeak in British English. English Today 13, 29–36. Bradley, D. (ed.) (1982). Tonation (Papers in South-East Asian Linguistics 8). Canberra: Pacific Linguistics. Bradlow, A. R., and T. Bent (2008). Perceptual adaptation to non-native speech. Cognition 106, 707–729. Brady, K., Y. Gwon, P. Khorrami, E. Godoy, W. Campbell, C. Dagli, and T. S. Huang (2016). Multi-modal audio, video and physiological sensor learning for continuous emotion prediction. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, 97–104, Amsterdam. Brainard, S., and D. Behrens (2002). A Grammar of Yakan. Manila: Linguistic Society of the Philippines. Brandão, A. P. (2014). A reference grammar of Paresi-Haliti (Arawak). PhD dissertation, University of Texas at Austin. Braun, B. (2005). Production and Perception of Thematic Contrast in German. Oxford: Peter Lang. Braun, B. (2006). Phonetics and phonology of thematic contrast in German. Language and Speech 49, 451–493. Braun, B., A. Dainora, and M. Ernestus (2011). An unfamiliar intonation contour slows down on-line speech comprehension. Language and Cognitive Processes 26, 350–375. Braun, B., N. Dehé, J. Neitsch, D.
Wochner, and K. Zahner (2018). The prosody of rhetorical and information-seeking questions in German. Language and Speech 62(4), 779–807. Braun, B., T. Galts, and B. Kabak (2014). Lexical encoding of L2 tones: The role of L1 stress, pitch accent and intonation. Second Language Research 30, 323–350. Braun, B., G. Kochanski, E. Grabe, and B. S. Rosner (2006). Evidence for attractors in English intonation. Journal of the Acoustical Society of America 119, 4006–4015. Braun, B., and L. Tagliapietra (2010). The role of contrastive intonation contours in the retrieval of contextual alternatives. Language and Cognitive Processes 25, 1024–1043. Braun, B., and L. Tagliapietra (2011). Online interpretation of intonational meaning in L2. Language and Cognitive Processes 26, 224–235. Brazil, D. (1997). The Communicative Value of Intonation in English. Cambridge: Cambridge University Press. Brazil, D., M. Coulthard, and C. Johns (1980). Discourse Intonation and Language Teaching. London: Longman. Breatnach, R. B. (1947). The Irish of Ring, Co. Waterford. Dublin: Dublin Institute for Advanced Studies. Breen, G. (1992). Some problems in Kukatj phonology. Australian Journal of Linguistics 12, 1–44. Breen, G., and R. J. Pensalfini (1999). Arrernte: A language with no syllable onsets. Linguistic Inquiry 30, 1–25. Breen, M., L. C. Dilley, J. Kraemer, and E. Gibson (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory 8, 277–312. Breen, M., L. C. Dilley, J. D. McAuley, and L. Sanders (2014). Auditory evoked potentials reveal early perceptual effects of distal prosody on speech segmentation. Language, Cognition and Neuroscience 29, 1132–1146.

Breen, M., L. C. Dilley, M. Brown, and E. Gibson (2018). Rhythm and Pitch (RaP) Corpus. Philadelphia: Linguistic Data Consortium. Breen, M., E. Fedorenko, M. Wagner, and E. Gibson (2010). Acoustic correlates of information structure. Language and Cognitive Processes 25, 1044–1098. Breitenstein, C., D. Van Lancker, and I. Daum (2001a). The contribution of speech rate and pitch variation to the perception of vocal emotions in a German and an American sample. Cognition and Emotion 15, 57–79. Breitenstein, C., D. Van Lancker, I. Daum, and C. Waters (2001b). Impaired perception of vocal emotions in Parkinson’s disease: Influence of speech time processing and executive functioning. Brain and Cognition 45, 277–314. Brennan, M., and G. H. Turner (eds.) (1994). Word-order Issues in Sign Language: Papers Presented at a Workshop Held in Durham, 18–21 September 1991. Durham: International Sign Linguistics Association. Brennan, S., and H. H. Clark (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition 22, 1482–1493. Brennan, S., and M. Williams (1995). The feeling of another’s knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers. Journal of Memory and Language 34, 383–398. Brennand, R., A. Schepman, and P. Rodway (2011). Vocal emotion perception in pseudo-sentences by secondary-school children with autism spectrum disorder. Research in Autism Spectrum Disorders 5, 1567–1573. Brentari, D. (1990). Theoretical foundations of American Sign Language phonology. PhD dissertation, University of Chicago. Brentari, D. (1993). Establishing a sonority hierarchy in American Sign Language: The use of simultaneous structure in phonology. Phonology 10, 281–306. Brentari, D. (1998). A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press. Brentari, D., and L. Crossley (2002). Prosody on the hands and face: Evidence from American Sign Language.
Sign Language and Linguistics 5, 105–130. Bricker, V., E. P. Yah, and O. D. de Po’ot (1998). A Dictionary of the Maya Language: As Spoken in Hocabá, Yucatán. Salt Lake City: University of Utah Press. Bright, W. (1957). Singing in Lushai. Indian Linguistics 17, 24–28. Bright, W. (1984). American Indian Linguistics and Literature. Berlin: Mouton de Gruyter. Broadwell, G. A. (1999). Focus alignment and optimal order in Zapotec. In Proceedings of the 35th Meeting of the Chicago Linguistics Society, Chicago. Brockway, E. (1979). North Puebla Nahuatl. In R. Langacker (ed.), Studies in Uto-Aztecan Grammar: Vol. 2. Modern Aztec Grammatical Sketches, 141–198. Arlington: Summer Institute of Linguistics. Broesch, T., and G. Bryant (2015). Prosody in infant-directed speech is similar across Western and traditional cultures. Journal of Cognition and Development 16, 31–43. Broselow, E. (1982). On predicting the interaction of stress and epenthesis. Glossa 16, 115–132. Broß, M. (1988). Materialien zur Sprache der Ndam von Dik (Rép. Tchad): Untersuchungen zur Phonologie und Morphologie. MA thesis, University of Marburg. Browman, C. P., and J. L. Goldstein (1988). Some notes on syllable structure in articulatory phonology. Phonetica 45, 140–155. Browman, C. P., and J. L. Goldstein (1992a). ‘Targetless’ schwa: An articulatory analysis. In D. R. Ladd and G. J. Docherty (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 26–67. Cambridge: Cambridge University Press. Browman, C. P., and J. L. Goldstein (1992b). Articulatory phonology: An overview. Phonetica 49, 155–180. Browman, C. P., and J. L. Goldstein (1995). Gestural syllable position effects in American English. In F. Bell-Berti and L. J. Raphael (eds.), Producing Speech: Contemporary Issues. Woodbury, NY: AIP Press.

Browman, C. P., and J. L. Goldstein (2000). Competing constraints on intergestural coordination and self-organization of phonological structures. Bulletin de la communication parlée 5, 25–34. Brown, C. H., D. Beck, G. Kondrak, J. K. Watters, and S. Wichmann (2011a). Totozoquean. International Journal of American Linguistics 77, 323–372. Brown, G. (1983). Prosodic structure and the given/new distinction. In A. Cutler and D. R. Ladd (eds.), Prosody: Models and Measurements, 67–77. Berlin: Springer. Brown, L., and P. Prieto (2017). (Im)politeness: Prosody and gesture. In M. Haugh, D. Kádár, and J. Culpeper (eds.), Palgrave Handbook of Linguistic Politeness, 357–379. New York: Palgrave. Brown, M., L. C. Dilley, and M. Tanenhaus (2014). Probabilistic prosody: Effects of relative speech rate on perception of (a) word(s) several syllables earlier. In Proceedings of Speech Prosody 7, 1154–1158, Dublin. Brown, M., A. P. Salverda, L. C. Dilley, and M. Tanenhaus (2011b). Expectations from preceding prosody influence segmentation in online sentence processing. Psychonomic Bulletin and Review 18, 1189–1196. Brown, M., A. P. Salverda, L. C. Dilley, and M. Tanenhaus (2015). Metrical expectations from preceding prosody influence perception of lexical stress. Journal of Experimental Psychology: Human Perception and Performance 41, 306–323. Brown, W. (1911). Studies from the psychological laboratory of the University of California: Temporal and accentual rhythm. Psychological Review 18, 336–346. Browne, E. W., and J. D. McCawley (1965). Srpskohrvatski akcenat. Zbornik za filologiju i lingvistiku 8, 147–151. (English trans., 1973, Serbo-Croatian accent. In E. C. Fudge (ed.), Phonology: Selected Readings, 330–335. Baltimore: Penguin.) Bruce, G. (1974). Tonaccentregler för sammansatta ord i några sydsvenska stadsmål. In C. Platzack (ed.), Svenskans beskrivning 8, 62–75, Lund. Bruce, G. (1977). Swedish Word Accents in Sentence Perspective. Lund: CWK Gleerup. Bruce, G.
(1998). Allmän och svensk prosodi (Praktisk Lingvistik 16). Lund: Lund University. Bruce, G. (2005). Intonational prominence in varieties of Swedish revisited. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 410–429. Oxford: Oxford University Press. Bruce, G. (2007). Components of a prosodic typology of Swedish intonation. In T. Riad and C. Gussenhoven (eds.), Tones and Tunes I: Studies in Word and Sentence Prosody, 113–146. Berlin: Mouton de Gruyter. Bruce, G. (2010). Vår fonetiska geografi: Om svenskans accenter, melodi och uttal. Lund: Studentlitteratur. Bruce, G., and E. Gårding (1978). A prosodic typology for Swedish dialects. In E. Gårding, G. Bruce, and R. Bannert (eds.), Nordic Prosody: Papers from a Symposium, 219–228. Lund: Lund University. Bruck, M., R. Treiman, and M. Caravolas (1995). Role of the syllable in the processing of spoken English: Evidence from a nonword comparison task. Journal of Experimental Psychology: Human Perception and Performance 21, 469–479. Bruggeman, A. (2018). Lexical and postlexical prominence in Tashlhiyt Berber and Moroccan Arabic. PhD dissertation, University of Cologne. Bruggeman, A., F. Cangemi, S. Wehrle, D. El Zarka, and M. Grice (2018). Unifying speaker variability with the Tonal Centre of Gravity. In M. Belz, C. Mooshammer, S. Fuchs, S. Jannedy, O. Rasskazova, and M. Żygis (eds.), Proceedings of the Conference on Phonetics and Phonology in German-Speaking Countries, Berlin. Bruggeman, A., T. B. Roettger, and M. Grice (2017). Question word intonation in Tashlhiyt Berber: Is high good enough? Laboratory Phonology 8(1), 5. Brugos, A. (2015). The interaction of pitch and timing in the perception of prosodic grouping. PhD dissertation, Boston University. Brugos, A., and J. A. Barnes (2012). The auditory kappa effect in a speech context. In Proceedings of Speech Prosody 6, 1–4, Shanghai.

Brugos, A., M. Breen, N. Veilleux, J. A. Barnes, and S. Shattuck-Hufnagel (2018). Cue-based annotation and analysis of prosodic boundary events. In Proceedings of Speech Prosody 9, 245–249, Poznań. Brugos, A., S. Shattuck-Hufnagel, and N. Veilleux (2006). Transcribing prosodic structure of spoken utterances with ToBI (MIT open courseware). Retrieved 19 May 2020 from https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911-transcribing-prosodic-structure-of-spoken-utterances-with-tobi-january-iap-2006. Brunelle, M. (2003). Tone coarticulation in Northern Vietnamese. In Proceedings of the 15th International Congress of Phonetic Sciences, 2673–2676, Barcelona. Brunelle, M. (2005). Register in Eastern Cham: Phonological, phonetic and sociolinguistic approaches. PhD dissertation, Cornell University. Brunelle, M. (2009a). Northern and Southern Vietnamese tone coarticulation: A comparative case study. Journal of Southeast Asian Linguistics 1, 49–62. Brunelle, M. (2009b). Tone perception in Northern and Southern Vietnamese. Journal of Phonetics 37, 79–96. Brunelle, M. (2012). Dialect experience and perceptual integrality in phonological registers: Fundamental frequency, voice quality and the first formant in Cham. Journal of the Acoustical Society of America 131, 3088–3102. Brunelle, M. (2016). Intonational phrase marking in Southern Vietnamese. In Proceedings of the 5th International Symposium on Tonal Aspects of Languages, 60–64, Buffalo, NY. Brunelle, M. (2017). Stress and phrasal prominence in tone languages: The case of Southern Vietnamese. Journal of the International Phonetic Association 47(3), 283–320. Brunelle, M., K. P. Hạ, and M. Grice (2012). Intonation in Northern Vietnamese. The Linguistic Review 29, 3–36. Brunelle, M., K. P. Hạ, and M. Grice (2016). Inconspicuous coarticulation: A complex path to sound change in the tone system of Hanoi Vietnamese. Journal of Phonetics, 23–39. Brunelle, M., and J. Kirby (2015).
Re-assessing tonal diversity and geographical convergence in Mainland Southeast Asia. In N. J. Enfield and B. Comrie (eds.), Mainland Southeast Asian Languages: The State of the Art, 82–110. Berlin: Mouton de Gruyter. Brunelle, M., and J. Kirby (2016). Tone and phonation in Southeast Asian languages. Language and Linguistics Compass 10, 191–207. Brunetti, L., M. D’Imperio, and F. Cangemi (2010). On the prosodic marking of contrast in Romance sentence topic: Evidence from Neapolitan Italian. In Proceedings of Speech Prosody 5, Chicago. Bryzgunova, E. A. (1963/1967). Prakticheskaya Fonetika i Intonatsiya Russkogo Yazyka. Moscow: MGU. Buccellati, G. (1997). Akkadian. In R. Hetzron (ed.), The Semitic Languages, 69–99. London: Routledge. Buck, M. J. (2015). Gramática del amuzgo de Xochistlahuaca. Mexico: Instituto Lingüístico de Verano. Buckley, E. (1998). Alignment in Manam stress. Linguistic Inquiry 29, 475–495. Buckley, E. (2013). Prosodic structure in Southeastern Pomo stress. Paper presented at the Society for the Study of the Indigenous Languages of the Americas Annual Meeting, Boston. Bull, B. (1978). A Phonological Summary of San Jerónimo Mazatec up to Word Level. Dallas: SIL International. Bull, D., R. E. Eilers, and D. K. Oller (1984). Infants’ discrimination of intensity variation in multisyllabic stimuli. Journal of the Acoustical Society of America 76, 13–17. Bull, D., R. E. Eilers, and D. K. Oller (1985). Infants’ discrimination of final syllable fundamental frequency in multisyllabic stimuli. Journal of the Acoustical Society of America 77, 289–295. Buller, B., E. Buller, and D. L. Everett (1993). Stress placement, syllable structure, and minimality in Banawá. International Journal of American Linguistics 59, 280–293. Bunta, F., and D. Ingram (2007). The acquisition of speech rhythm by bilingual Spanish- and English-speaking 4- and 5-year-old children. Journal of Speech, Language, and Hearing Research 50, 999–1014. Burdin, R. S. (2014).
Variation in list intonation in American Jewish English. In Proceedings of Speech Prosody 7, 934–938, Dublin.

Burdin, R. S. (2016). Variation in form and function in Jewish English intonation. PhD dissertation, The Ohio State University. Burdin, R. S., S. Phillips-Bourass, R. Turnbull, M. Yasavul, C. G. Clopper, and J. Tonhauser (2015). Variation in the prosody of focus in head- and head/edge-prominence languages. Lingua 165(B), 254–276. Burgoon, J., J. G. Proudfoot, R. Schuetzler, and D. Wilson (2014). Patterns of nonverbal behavior associated with truth and deception: Illustrations from three experiments. Journal of Nonverbal Behavior 38, 325–354. Büring, D. (2003). On D-trees, beans, and accents. Linguistics and Philosophy 26, 511–545. Büring, D. (2009). Towards a typology of focus realizations. In M. Zimmermann and C. Féry (eds.), Information Structure: Theoretical, Typological, and Experimental Perspectives, 177–205. Oxford: Oxford University Press. Büring, D. (2012). Focus and intonation. In G. Russell and D. G. Fara (eds.), Routledge Companion to the Philosophy of Language, 103–115. London: Routledge. Büring, D. (2016). Intonation and Meaning. Oxford: Oxford University Press. Bürki, A., M. Ernestus, and U. H. Frauenfelder (2010). Is there only one ‘fenêtre’ in the production lexicon? On-line evidence on the nature of phonological representations of pronunciation variants for French schwa words. Journal of Memory and Language 62, 421–437. Bürki, A., M. Ernestus, C. Gendrot, C. Fougeron, and U. H. Frauenfelder (2011). What affects the presence versus absence of schwa and its duration: A corpus analysis of French connected speech. Journal of the Acoustical Society of America 130, 3980–3991. Burling, R. (1966). The metrics of children’s verse: A cross-linguistic study. American Anthropologist, New Series 68, 1418–1441. Burnham, D. K., and K. Mattock (2010). Auditory development. In G. Bremner and T. D. Wachs (eds.), The Wiley Blackwell Handbook of Infant Development, 81–119. Oxford: Wiley Blackwell. Burnham, D. K., and L. Singh (2018).
Coupling tonetics and perceptual attunement: The psychophysics of lexical tone contrast salience. Journal of the Acoustical Society of America 144, 1716–1716. Burnham, D. K., L. Singh, K. Mattock, P. J. Woo, and M. Kalashnikova (2018). Constraints on tone sensitivity in novel word learning by monolingual and bilingual infants: Tone properties are more influential than tone familiarity. Frontiers in Psychology 8, 2190. Burri, M., A. Baker, and W. Acton (2016). Anchoring Academic Vocabulary with a ‘Hard Hitting’ Haptic Pronunciation Teaching Technique. Faculty of Social Sciences Papers, University of Wollongong. Bush, R. (1999). Georgian yes–no question intonation. In Phonology at Santa Cruz 6, 1–11. University of California Santa Cruz. Bushnell, I. W., F. Sai, and J. T. Mullin (1989). Neonatal recognition of the mother’s face. British Journal of Developmental Psychology 7, 3–15. Butcher, A. (1981). Aspects of the Speech Pause: Phonetic Correlates and Communicative Functions. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel 15. Butler, B. A. (2014). Deconstructing the Southeast Asian sesquisyllable: A gestural account. PhD dissertation, Cornell University. Butler, J., M. Vigário, and S. Frota (2016). Infants’ perception of the intonation of broad and narrow focus. Language Learning and Development 12, 1–13. Buxó-Lugo, A., and D. Watson (2016). Evidence for the influence of syntax on prosodic parsing. Journal of Memory and Language 90, 1–13. Bye, P., and P. de Lacy (2008). Metrical influences on fortition and lenition. In J. B. de Carvalho, T. Scheer, and P. Ségéral (eds.), Lenition and Fortition, 173–206. Berlin: Mouton de Gruyter. Byers-Heinlein, K., T. C. Burns, and J. F. Werker (2010). The roots of bilingualism in newborns. Psychological Science 21, 343–348. Bynon, J. (1968). Berber nursery language. Transactions of the Philological Society 67, 107–161.

Byrd, D. (1996). Influences on articulatory timing in consonant sequences. Journal of Phonetics 24, 209–244. Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica 57, 3–16. Byrd, D., and S. Choi (2010). At the juncture of prosody, phonology, and phonetics: The interaction of phrasal and syllable structure in shaping the timing of consonant gestures. In C. Fougeron, B. Kühnert, M. D’Imperio, and N. Vallée (eds.), Laboratory Phonology 10, 31–59. Berlin: Mouton de Gruyter. Byrd, D., E. Flemming, C. A. Mueller, and C. C. Tan (1995). Using regions and indices in EPG data reduction. Journal of Speech and Hearing Research 38, 821–827. Byrd, D., A. Kaun, S. S. Narayanan, and E. Saltzman (2000). Phrasal signatures in articulation. In M. B. Broe and J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V: Acquisition and the Lexicon, 70–88. Cambridge: Cambridge University Press. Byrd, D., J. Krivokapic, and S. Lee (2006). How far, how long: On the temporal scope of prosodic boundary effects. Journal of the Acoustical Society of America 120, 1589–1599. Byrd, D., and D. Riggs (2008). Locality interactions with prominence in determining the scope of phrasal lengthening. Journal of the International Phonetic Association 38, 187–202. Byrd, D., and E. Saltzman (1998). Intragestural dynamics of multiple prosodic boundaries. Journal of Phonetics 26, 173–199. Byrd, D., and E. Saltzman (2003). The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics 31, 149–180. Caballero, G., and L. Carroll (2015). Tone and stress in Choguita Rarámuri (Tarahumara) word prosody. International Journal of American Linguistics 81, 459–493. Cahill, M. (2000). The phonology of Konni verbs. Cahiers Voltaïques/Gur Papers 5, 31–38. Cahill, M. (2007). More universals of tone. Ms., SIL International. Retrieved 11 May 2020 from https://www.sil.org/resources/publications/entry/7816. Cahill, M. (2017).
Kɔnni intonation. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 53–88. Berlin: Mouton de Gruyter. Calhoun, S. (2009). What makes a word contrastive? Prosodic, semantic and pragmatic perspectives. In D. Barth-Weingarten, N. Dehé, and A. Wichmann (eds.), Where Prosody Meets Pragmatics (Studies in Pragmatics 8), 53–78. Bingley: Emerald. Calhoun, S. (2010). The centrality of metrical structure in signalling information structure: A probabilistic perspective. Language 86, 1–42. Calhoun, S. (2012). The theme/rheme distinction: Accent type or relative prominence? Journal of Phonetics 40, 329–349. Calhoun, S. (2015). The interaction of prosody and syntax in Samoan focus marking. Lingua 165(B), 205–229. Calhoun, S., J. Carletta, J. M. Brenier, N. Mayo, D. Jurafsky, M. Steedman, and D. Beaver (2010). The NXT-format Switchboard Corpus: A rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue. Language Resources and Evaluation 44, 387–419. Cambier-Langeveld, T., and A. E. Turk (1999). A cross-linguistic study of accentual lengthening: Dutch vs. English. Journal of Phonetics 27, 171–206. Campbell, E. (2013). The internal diversification and subgrouping of Chatino. International Journal of American Linguistics 79, 395–400. Campbell, E. (2014). Aspects of the phonology and morphology of Zenzontepec Chatino, a Zapotecan language of Oaxaca, Mexico. PhD dissertation, University of Texas at Austin. Campbell, E. (2016). Tone and inflection in Zenzontepec Chatino. In E. L. Palancar and J. L. Léonard (eds.), Tone and Inflection, 141–162. Berlin: Walter de Gruyter. Campbell, E. (2017a). Otomanguean historical linguistics: Exploring the subgroups. Language and Linguistics Compass 11, e12244. Campbell, L. (1977). Quichean Linguistic Prehistory (University of California Publications in Linguistics 81). Berkeley: University of California Press. Campbell, L. (1997).
American Indian Languages: The Historical Linguistics of Native America. Oxford: Oxford University Press.

Campbell, L. (2012). Classification of the indigenous languages of South America. In L. Campbell and V. Grondona (eds.), The Indigenous Languages of South America: A Comprehensive Guide, 59–166. Berlin: Mouton de Gruyter. Campbell, L. (2017b). Mayan history and comparison. In J. Aissen, N. England, and R. Z. Maldonado (eds.), The Mayan Languages, 12–43. New York: Routledge. Campbell, L., T. Kaufman, and T. C. Smith-Stark (1986). Meso-America as a linguistic area. Language 62, 530–570. Campbell, N., and M. E. Beckman (1997). Stress, prominence, and spectral tilt. In Proceedings of the European Speech Communication Association Workshop on Intonation, 67–70, Athens. Campisi, P., B. C. Low, R. J. Papsin, R. J. Mount, and R. V. Harrison (2006). Multidimensional voice program analysis in profoundly deaf children: Quantifying frequency and amplitude control. Perceptual and Motor Skills 103, 40–50. Can Pixabaj, T. A. (2007). Gramática Descriptiva Uspanteka. Antigua, Guatemala: Oxlajuuj Keej Mayaˈ Ajtzˈiibˈ. Can Pixabaj, T., and N. England (2011). Nominal topic and focus in Kˈicheeˈ. In R. Gutiérrez-Bravo, L. Mikkelsen, and E. Potsdam (eds.), Representing Language: Essays in Honor of J. Aissen, 15–30. Santa Cruz: Linguistics Research Center. Cancelliere, A., and A. Kertesz (1990). Lesion localization in acquired deficits of emotional expression and comprehension. Brain and Cognition 13, 133–147. Cangemi, F. (2009). Phonetic detail in intonation contour dynamics. In S. Schmid, M. Schwarzenbach, and D. Studer (eds.), La dimensione temporale del parlato (Atti del V Convegno Nazionale AISV), 325–334. Zurich: EDK Editore (Torriana). Cangemi, F., D. El Zarka, S. Wehrle, S. Baumann, and M. Grice (2016). Speaker-specific intonational marking of narrow focus in Egyptian Arabic. In Proceedings of Speech Prosody 8, 1–5, Boston. Canter, G. J., and D. Van Lancker (1985).
Disturbances of the temporal organization of speech following bilateral thalamic surgery in a patient with Parkinson’s disease. Journal of Communication Disorders 18, 329–349. Cao, J. (2002). The relationship between tone and intonation in Mandarin Chinese. Chinese Language 3, 195–202. Caplan, L. R., J. D. Schmahmann, C. S. Kase, E. Feldman, G. Baquis, J. P. Greenberg, P. B. Gorelick, C. Helgason, and D. B. Hier (1990). Caudate infarcts. Archives of Neurology 47, 133–143. Caprile, J.-P. (1977). Première approche phonologique du Tumak de Goundi. In Etudes Phonologiques Tchadiennes, 63–64, 79–86. Paris: Société des Etudes Linguistiques et Anthropologiques de France. Caputo, M. R. (1996). Le domande in un corpus di italiano parlato: Analisi prosodica e pragmatica. PhD dissertation, University of Naples Federico II. Caputo, M. R., and M. D’Imperio (1995). Verso un possibile sistema di trascrizione prosodica dell’italiano: Cenni preliminari. In Proceedings of IV Giornate di Studio del GFS, Povo, Italy. Carignan, C. (2017). Covariation of nasalization, tongue height, and breathiness in the realization of F1 of Southern French nasal vowels. Journal of Phonetics 63, 87–105. Carlson, K., C. J. Clifton, and L. Frazier (2001). Prosodic boundaries in adjunct attachment. Journal of Memory and Language 45, 58–81. Carlson, K., M. W. Dickey, L. Frazier, and C. J. Clifton, Jr (2009). Information structure expectations in sentence comprehension. Quarterly Journal of Experimental Psychology 62, 114–139. Carlson, R. (1983). Downstep in Supyire. Studies in African Linguistics 14, 35–45. Carlson, R., K. Elenius, B. Granström, and S. Hunnicutt (1985). Phonetic and orthographic properties of the basic vocabulary of five European languages. Speech Transmission Laboratory: Quarterly Progress and Status Report 26, 63–94. Carnie, A. (1994). Whence sonority: Evidence from epenthesis in modern Irish. MIT Working Papers in Linguistics 21, 81–108. Caron, B., C. Lux, S. Manfredi, and C.
Pereira (2015). The intonation of topic and focus: Zaar (Nigeria), Tamasheq (Niger), Juba Arabic (South Sudan) and Tripoli Arabic (Libya). In A. Mettouchi,

M. Vanhove, and D. Caubet (eds.), Corpus-Based Studies of Lesser-Described Languages: The CorpAfRoAs Corpus of Spoken Afro-Asiatic Languages (Studies in Corpus Linguistics 68), 63–115. Amsterdam: John Benjamins. Carroll, L. S. (2015). Ixpantepec Nieves Mixtec word prosody. PhD dissertation, University of California, San Diego. Carter, A., and L. Gerken (2004). Do children’s omissions leave traces? Journal of Child Language 31, 561–586. Carter-Ényì, A. (2016). Contour levels: An abstraction of pitch space based on African tone systems. PhD dissertation, The Ohio State University. Carter-Ényì, A. (2018). Hooked on sol-fa: The do-re-mi heuristic for Yorùbá speech tones. Africa 88, 267–290. Casagrande, J. B. (1948). Comanche baby language. International Journal of American Linguistics 14, 11–14. Casasanto, D., and L. Boroditsky (2008). Time in the mind: Using space to think about time. Cognition 106(2), 579–593. Caspers, J., and V. J. van Heuven (1993). Effects of time pressure on the phonetic realization of the Dutch accent–lending pitch rise and fall. Phonetica 50, 161–171. Cassidy, F., and R. LePage (1967/1980). Dictionary of Jamaican English. Cambridge: Cambridge University Press. Castelo, J., and S. Frota (2016). Variação entoacional no Português do Brasil: Uma análise fonológica do contorno nuclear em enunciados declarativos e interrogativos. Revista da Associação Portuguesa de Lingüística 1, 95–120. Castro-Gingras, R. (1974). An Analysis of the Linguistic Characteristics of the English Found in a Set of Mexican-American Child Data. Los Alamitos, CA: Southwest Regional Laboratory for Educational Research and Development. Cauldwell, R. (2002a). The functional irrhythmicality of spontaneous speech: A discourse view of speech rhythms. Apples – Journal of Applied Language Studies. Retrieved 11 May 2020 from https://jyx.jyu.fi/handle/123456789/22698. Cauldwell, R. (2002b).
Streaming Speech: Listening and Pronunciation for Advanced Learners of English [CD-ROM for Windows]. Birmingham: speechinaction. Cauvet, E., R. Limissuri, S. Millotte, K. Skoruppa, D. Cabrol, and A. Christophe (2014). Function words constrain on-line recognition of verbs and nouns in French 18-month-olds. Language Learning and Development 10, 1–18. Cavé, C., I. Guaïtella, R. Bertrand, S. Santi, F. Harlay, and R. Espesser (1996). About the relationship between eyebrow movements and f0 variations. In Proceedings of the 4th International Conference on Spoken Language Processing, 2175–2179, Philadelphia. Cecchetto, C., C. Geraci, and S. Zucchi (2009). Another way to mark syntactic dependencies: The case for right-peripheral specifiers in sign languages. Language 85, 278– 320. Celce-Murcia, M., C. Brinton, and J. Goodwin (1996/2010). Teaching Pronunciation: A Reference for Teachers of English to Speakers of Other Languages. Cambridge: Cambridge University Press. Cesare, A.-M. D. (2014). Frequency, Forms and Functions of Cleft Constructions in Romance and Germanic: Contrastive, Corpus-Based Studies. Berlin: Walter de Gruyter. Chacon, T. C. (2012). The phonology and morphology of Kubeo: The documentation, theory, and description of an Amazonian language. PhD dissertation, University of Hawaiʻi at Mānoa. Chacon, T. C. (2014). A revised proposal of Proto-Tukanoan consonants and Tukanoan family classification. International Journal of American Linguistics 80, 275–322. Chafe, W. L. (1974). Language and consciousness. Language 50, 111–133. Chafe, W. L. (1976a). The Caddoan, Iroquoian, and Siouan languages. The Hague: Mouton de Gruyter. Chafe, W. L. (1976b). Givenness, contrastiveness, definiteness, subjects and topics. In C. N. Li (ed.), Subject and Topic, 27–55. New York: Academic Press. Chafe, W. L. (1977). Accent and related phenomena in the Five Nations Iroquois languages. In L. M. Hyman (ed.), Studies in Stress and Accent, 169–181. 
Los Angeles: University of Southern California Press.

Chafe, W. L. (1994). Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Chahal, D. (2006). Intonation. In K. Versteegh (ed.), Encyclopedia of Arabic Language and Linguistics, vol. 2, 395–401. Netherlands: Brill Academic.
Chahal, D., and S. Hellmuth (2014a). Comparing the intonational phonology of Lebanese and Egyptian Arabic. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 365–404. Oxford: Oxford University Press.
Chahal, D., and S. Hellmuth (2014b). The intonation of Lebanese and Egyptian Arabic. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 365–404. Oxford: Oxford University Press.
Chan, K. K. L., and C. K. S. To (2016). Do individuals with high-functioning autism who speak a tone language show intonation deficits? Journal of Autism and Developmental Disorders 46, 1784–1792.
Chan, M. (1987a). Phrase by Phrase. Englewood Cliffs: Prentice Hall Regents.
Chan, M. K. M. (1987b). Tone and melody in Cantonese. In Proceedings of the 13th Meeting of the Berkeley Linguistics Society, 26–37, Berkeley.
Chan, M. K. M. (1987c). Tone and melody interaction in Cantonese and Mandarin songs. UCLA Working Papers in Phonetics 68, 132–169.
Chan, M. K. M., and H. Ren (1989). Wuxi tone sandhi: From last to first syllable dominance. Acta Linguistica Hafniensia 21, 35–64.
Chandrasekaran, B., J. T. Gandour, and A. Krishnan (2007a). Neuroplasticity in the processing of pitch dimensions: A multidimensional scaling analysis of the mismatch negativity. Restorative Neurology and Neuroscience 25, 195–210.
Chandrasekaran, B., and N. Kraus (2010). The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology 47, 236–246.
Chandrasekaran, B., A. Krishnan, and J. T. Gandour (2007b). Mismatch negativity to pitch contours is influenced by language experience. Brain Research 1128, 148–156.
Chandrasekaran, B., P. D. Sampath, and P. C. Wong (2010). Individual variability in cue-weighting and lexical tone learning. Journal of the Acoustical Society of America 128, 456–465.
Chandrasekaran, B., E. Skoe, and N. Kraus (2014). An integrative model of subcortical auditory plasticity. Brain Topography 27, 539–552.
Chandrasekaran, B., Z. Xie, and R. Reetzke (2015). Music training and neural processing of speech: A critical review of the literature. In A. Agwuele and A. Lotto (eds.), Essays in Speech Processes: Language Production and Perception, 139–174. Sheffield: Equinox.
Chang, C. B., and A. R. Bowles (2015). Context effects on second-language learning of tonal contrasts. Journal of the Acoustical Society of America 138, 3703–3716.
Chang, E. F., J. W. Rieger, K. Johnson, M. S. Berger, N. M. Barbaro, and R. T. Knight (2010). Categorical speech representation in human superior temporal gyrus. Nature Neuroscience 13, 1428–1432.
Chao, Y. R. (1930). A system of ‘tone letters’. Le maître phonétique 45, 24–27.
Chao, Y. R. (1933). Tone and intonation in Chinese. Bulletin of the Institute of History and Philology, Academia Sinica 4, 121–134.
Chao, Y. R. (1948). Mandarin Primer: An Intensive Course in Spoken Chinese. Cambridge, MA: Harvard University Press.
Chao, Y. R. (1956). Tone, intonation, singsong, chanting, recitative, tonal composition, and atonal composition in Chinese. In M. Halle, H. Lunt, H. McLean, and C. H. van Schooneveld (eds.), For Roman Jakobson: Essays on the Occasion of His Sixtieth Birthday, 52–59. The Hague: Mouton.
Chao, Y. R. (1968). A Grammar of Spoken Chinese. Berkeley: University of California Press.
Chapman, A. (2001). Lexical tone, pitch and poetic structure: Elements of melody creation in ‘khaplam’ vocal music genres of Laos. Context 21, 21–40.
Charette, M., and A. Göksel (1996). Licensing constraints and vowel harmony in Turkic languages. SOAS Working Papers in Linguistics and Phonetics 6, 1–25.
Chartrand, T. L., and J. A. Bargh (1999). The chameleon effect: The perception–behavior link and social interaction. Journal of Personality and Social Psychology 76, 893–910.

Chaski, C. (1985). Linear and metrical analyses of Manam stress. Oceanic Linguistics 25, 167–209.
Chatterji, S. K. (1926/1975). The Origin and Development of the Bengali Language: Vol. 1. Calcutta: Rupa and Co.
Chávez Peón, M. E. (2010). The interaction of metrical structure, tone, and phonation types in Quiaviní Zapotec. PhD dissertation, University of British Columbia.
Cheang, H. S., and M. D. Pell (2008). The sound of sarcasm. Speech Communication 50, 366–381.
Cheang, H. S., and M. D. Pell (2009). Acoustic markers of sarcasm in Cantonese and English. Journal of the Acoustical Society of America 126, 1394–1405.
Chel, A. C., and J. Ramirez (1999). Diccionario del idioma Ixil de Santa María Nebaj. Antigua, Guatemala: Proyecto Lingüístico Francisco Marroquín.
Chen, A. (2009a). Perception of paralinguistic intonational meaning in a second language. Language Learning 59, 367–409.
Chen, A. (2010a). Is there really an asymmetry in the acquisition of the focus-to-accentuation mapping? Lingua 120, 1926–1939.
Chen, A. (2011a). The developmental path to phonological encoding of focus in Dutch. In S. Frota, P. Prieto, and G. Elordieta (eds.), Prosodic Production, Perception and Comprehension, 93–109. Dordrecht: Springer.
Chen, A. (2011b). Tuning information structure: Intonational realisation of topic and focus in child Dutch. Journal of Child Language 38, 1055–1083.
Chen, A. (2014). Production-comprehension (A)symmetry: Individual differences in the acquisition of prosodic focus-marking. In Proceedings of Speech Prosody 7, 423–427, Dublin.
Chen, A. (2018). Get the focus right across languages: Acquisition of prosodic focus-marking in production. In P. Prieto and N. Esteve-Gibert (eds.), Prosodic Development in First Language Acquisition, 295–314. Amsterdam: John Benjamins.
Chen, A., and L. Boves (2018). What’s in a word: Sounding sarcastic in British English. Journal of the International Phonetic Association 48(1), 57–76.
Chen, A., and D. de Jong (2015). Prosodic expression of sarcasm in L2 English: Spoken discourse and prosody in L2. In M. Chini (ed.), Il Parlato in [Italiano] L2: Aspetti Pragmatici e Prosodici, 27–37. Milan: FrancoAngeli.
Chen, A., E. den Os, and J.-P. de Ruiter (2007). Pitch accent type matters for online processing of information status: Evidence from natural and synthetic speech. The Linguistic Review 24, 317–344.
Chen, A., and P. Fikkert (2007). Intonation of early two-word utterances in Dutch. In Proceedings of the 16th International Congress of Phonetic Sciences, 315–320, Saarbrücken.
Chen, A., C. Gussenhoven, and T. Rietveld (2004a). Language specificity in the perception of paralinguistic intonational meaning. Language and Speech 47, 311–349.
Chen, A., and B. Höhle (2018). Four- to five-year-olds’ use of word order and prosody in focus marking in Dutch. Linguistics Vanguard 4, 20160101.
Chen, A., L. Liu, and R. Kager (2015). Cross-linguistic perception of Mandarin tone sandhi. Language Sciences 48, 62–69.
Chen, A., and H. van den Bergh (2020). The production-comprehension link in prosodic development and individual differences. Open Science Framework Preprints. https://doi.org/10.31219/osf.io/mur7f.
Chen, C.-M. (2009b). The phonetics of Paiwan word-level prosody. Language and Linguistics 10, 593–625.
Chen, C.-M. (2009c). Documenting Paiwan phonology: Issues in segments and non-stress prosodic features. Concentric: Studies in Linguistics 35, 193–223.
Chen, C., and R. E. Jack (2017). Discovering cultural differences (and similarities) in facial expressions of emotion. Current Opinion in Psychology 17, 61–66.
Chen, K., and M. Hasegawa-Johnson (2004). How prosody improves word recognition. In Proceedings of Speech Prosody 2, 583–586, Nara.
Chen, K., M. Hasegawa-Johnson, and A. Cohen (2004b). An automatic prosody labeling system using ANN-based syntactic-prosodic model and GMM-based acoustic-prosodic model. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 509–512, Montreal.

Chen, L., and K. Zechner (2011). Applying rhythm features to automatically assess non-native speech. In INTERSPEECH 2011, 1861–1864, Florence.
Chen, M. Y. (1987). The syntax of Xiamen tone sandhi. Phonology [Yearbook] 4, 109–149.
Chen, M. Y. (2000). Tone Sandhi: Patterns across Chinese Dialects. Cambridge: Cambridge University Press.
Chen, S., C. Wiltshire, and B. Li (2018). An updated typology of tonal coarticulation properties. Taiwan Journal of Linguistics 16, 79–114.
Chen, Y. (2003). The phonetics and phonology of contrastive focus in Standard Chinese. PhD dissertation, Stony Brook University.
Chen, Y. (2006). Durational adjustment under corrective focus in standard Chinese. Journal of Phonetics 34, 176–201.
Chen, Y. (2008). The acoustic realization of vowels of Shanghai Chinese. Journal of Phonetics 36, 629–648.
Chen, Y. (2009d). Prosody and information structure mapping: Evidence from Shanghai Chinese. Chinese Journal of Phonetics 2, 123–133.
Chen, Y. (2010b). Post-focus f0 compression: Now you see it, now you don’t. Journal of Phonetics 38, 517–525.
Chen, Y. (2012). Message-related variation. In A. Cohn, C. Fougeron, and M. Huffman (eds.), Oxford Handbook of Laboratory Phonology, 103–115. Oxford: Oxford University Press.
Chen, Y., and B. Braun (2006). Prosodic realization of information structure categories in standard Chinese. In Proceedings of Speech Prosody 3, Dresden.
Chen, Y., and C. Gussenhoven (2008). Emphasis and tonal implementation in standard Chinese. Journal of Phonetics 36, 724–746.
Chen, Y., M. P. Robb, H. R. Gilbert, and J. W. Lerman (2001). A study of sentence stress production in Mandarin speakers in American English. Journal of the Acoustical Society of America 109, 1681–1690.
Chen, Y., P. P.-I. Lee, and H. Pan (2016). Topic and focus marking in Chinese. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 733–752. Oxford: Oxford University Press.
Chen, Y., and Y. Xu (2006). Production of weak elements in speech: Evidence from F0 patterns of neutral tone in standard Chinese. Phonetica 63, 47–75.
Chen, Y., Y. Xu, and S. Guion-Anderson (2014). Prosodic realization of focus in bilingual production of Southern Min and Mandarin. Phonetica 71, 249–270.
Cheng, C.-C. (1973). A quantitative study of Chinese tones. Journal of Chinese Linguistics 1, 93–110.
Cheng, K. H., and R. G. Broadhurst (2005). Detection of deception: The effects of language on detection ability among Hong Kong Chinese. Psychiatry, Psychology and Law 12(1), 107–118.
Cheng, R. L. (1968). Tone sandhi in Taiwanese. Linguistics 41, 19–42.
Cheung, W. H. Y. (2009). Span of high tones in Hong Kong English. In Proceedings of the 35th Meeting of the Berkeley Linguistics Society, 72–82, Berkeley.
Chevallier, C., I. Noveck, F. Happé, and D. Wilson (2009). From acoustics to grammar: Perceiving and interpreting grammatical prosody in adolescents with Asperger syndrome. Research in Autism Spectrum Disorders 3, 502–516.
Chiang, W.-Y., and F.-M. Chiang (2005). Saisiyat as a pitch accent language: Evidence from acoustic study of words. Oceanic Linguistics 44, 404–426.
Chiang, W.-Y., I. Chang-liao, and F.-M. Chiang (2006). The prosodic realization of negation in Saisiyat and English. Oceanic Linguistics 45, 110–132.
Childs, B., and W. Wolfram (2008). Bahamian English: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English: Vol. 2. The Americas and the Caribbean, 239–255. Berlin: Mouton de Gruyter.
Childs, T. (1995). Tone and accent in Atlantic. In A. Traill, R. Vossen, and M. Biesele (eds.), The Complete Linguist: Papers in Memory of Patrick J. Dickens, 195–215. Cologne: Rüdiger Köppe.
Chin, S. B., T. R. Bergeson, and J. Phan (2012). Speech intelligibility and prosody production in children with cochlear implants. Journal of Communication Disorders 45(5), 355–366.

Chirkova, E., and A. Michaud (2009). Approaching the prosodic system of Shǐxīng. Language and Linguistics 10, 539–568.
Chirkova, K. (2009). Shǐxīng, a Sino-Tibetan language of South-West China: A grammatical sketch with two appended texts. Linguistics of the Tibeto-Burman Area 32, 1–90.
Chitoran, I. (1996). Prominence vs. rhythm: The predictability of stress in Romanian. In K. Zagona (ed.), Grammatical Theory and Romance Languages, 47–58. Amsterdam: John Benjamins.
Cho, H. (2010). A weighted-constraint model of f0 movements. PhD dissertation, MIT.
Cho, H., and E. Flemming (2015). Compression and truncation: The case of Seoul Korean accentual phrase. Studies in Phonetics, Phonology, and Morphology 21, 359–382.
Cho, S.-H. (2017). Text alignment in Japanese children’s song. Penn Working Papers in Linguistics 23(1). Retrieved 11 May 2020 from http://repository.upenn.edu/pwpl/vol23/iss1/5.
Cho, T. (2005). Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /a,i/ in English. Journal of the Acoustical Society of America 117, 3867–3878.
Cho, T. (2006a). Manifestation of prosodic structure in articulation: Evidence from lip kinematics in English. In L. M. Goldstein, D. H. Whalen, and C. T. Best (eds.), Laboratory Phonology 8, 519–548. Berlin: Mouton de Gruyter.
Cho, T. (2006b). An acoustic study of the stress and intonational system in Lakhota: A preliminary report. Speech Sciences 13, 23–42.
Cho, T. (2016). Prosodic boundary strengthening in the phonetics–prosody interface. Language and Linguistics Compass 10, 120–141.
Cho, T., and S.-A. Jun (2000). Domain-initial strengthening as featural enhancement: Aerodynamic evidence from Korean. Chicago Linguistics Society 36, 31–44.
Cho, T., S.-A. Jun, and P. Ladefoged (2002). Acoustic and aerodynamic correlates of Korean stops and fricatives. Journal of Phonetics 30, 193–228.
Cho, T., and P. A. Keating (2001). Articulatory and acoustic studies of domain-initial strengthening in Korean. Journal of Phonetics 29, 155–190.
Cho, T., and P. A. Keating (2009). Effects of initial position versus prominence in English. Journal of Phonetics 37, 466–485.
Cho, T., D. Kim, and S. Kim (2017). Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English. Journal of Phonetics 64, 71–89.
Cho, T., Y. Lee, and S. Kim (2014a). Prosodic strengthening on the /s/-stop cluster and the phonetic implementation of an allophonic rule in English. Journal of Phonetics 46, 128–146.
Cho, T., J. M. McQueen, and E. A. Cox (2007). Prosodically driven phonetic detail in speech processing: The case of domain-initial strengthening in English. Journal of Phonetics 35, 210–243.
Cho, T., M. Son, and S. Kim (2016). Articulatory reflexes of the three-way contrast in labial stops and kinematic evidence for domain-initial strengthening in Korean. Journal of the International Phonetic Association 46, 129–155.
Cho, T., Y. Yoon, and S. Kim (2014b). Effects of prosodic boundary and syllable structure on the temporal realization of CV gestures in Korean. Journal of Phonetics 44, 96–109.
Choi, Y., and R. Mazuka (2003). Young children’s use of prosody in sentence parsing. Journal of Psycholinguistic Research 32, 197–217.
Cholin, J., G. S. Dell, and W. J. M. Levelt (2011). Planning and articulation in incremental word production: Syllable frequency effects in English. Journal of Experimental Psychology: Learning, Memory and Cognition 37, 109–122.
Chomsky, N., and M. Halle (1968). The Sound Pattern of English. New York: Harper and Row.
Chong, A. J. (2013). Towards a model of Singaporean English intonational phonology. Proceedings of Meetings on Acoustics 19, Montreal.
Chong, A. J., and J. S. German (2015). Prosodic phrasing and f0 in Singapore English. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Chong, A. J., and J. S. German (2017). The accentual phrase in Singapore English. Phonetica 74, 63–80.

Chrabaszcz, A., M. Winn, C. Y. Lin, and W. J. Idsardi (2014). Acoustic cues to perception of word stress by English, Mandarin, and Russian speakers. Journal of Speech, Language, and Hearing Research 57, 1468–1479.
Christiansen, M. H., and N. Chater (2016). The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences 39, e62.
Christodoulopoulos, C., D. Roth, and C. Fisher (2016). An incremental model of syntactic bootstrapping. In Proceedings of the Seventh Workshop on Cognitive Aspects of Computational Language Learning, Berlin.
Christophe, A., I. Dautriche, A. de Carvalho, and P. Brusini (2016). Bootstrapping the syntactic bootstrapper. In J. Scott and D. Waughtal (eds.), Proceedings of the 40th Boston University Conference on Language Development, 75–88. Somerville, MA: Cascadilla Press.
Christophe, A., E. Dupoux, J. Bertoncini, and J. Mehler (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America 95, 1570–1580.
Christophe, A., S. Millotte, S. Bernal, and J. Lidz (2008). Bootstrapping lexical and syntactic acquisition. Language and Speech 51, 61–75.
Christophe, A., S. Peperkamp, C. Pallier, E. Block, and J. Mehler (2004). Phonological phrase boundaries constrain lexical access: I. Adult data. Journal of Memory and Language 51, 523–547.
Chun, D. (1998). Signal analysis software for teaching discourse intonation. Language Learning and Technology 2, 61–77.
Chun, D. (2002). Discourse Intonation in L2: From Theory and Research to Practice. Amsterdam: John Benjamins.
Chung, S. (1983). Transderivational relationships in Chamorro phonology. Language 59, 35–66.
Chung, Y. (1991). The lexical tone system of North Kyungsang Korean. PhD dissertation, The Ohio State University.
Church, R., B. Bernhardt, K. Pichora-Fuller, and R. Shi (2005). Infant-directed speech: Final syllable lengthening and rate of speech. Canadian Acoustics 33, 13–20.
Cichocki, W., S.-A. Selouani, A. B. Ayed, C. Paulin, and Y. Perreault (2013). Variation of rhythm metrics in regional varieties of Acadian French. Proceedings of Meetings on Acoustics 19, Montreal.
Cinque, G. (1993). A null theory of phrase and compound stress. Linguistic Inquiry 24, 239–297.
Clackson, J. (2007). Indo-European Linguistics: An Introduction. Cambridge: Cambridge University Press.
Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.
Clark, L. E. (1959). Phoneme classes in Sayula Popoluca. Studies in Linguistics 14(702), 25–33.
Clark, M. M. (1983). On the distribution of contour tones. Paper presented at the 2nd West Coast Conference on Formal Linguistics. Los Angeles: University of Southern California.
Classe, A. (1939). The Rhythm of English Prose. Oxford: Basil Blackwell.
Clemens, L. E. (2014). Prosodic noun incorporation and verb-initial syntax. PhD dissertation, Harvard University.
Clemens, L. E., and J. Coon (2018). Deriving verb-initial word order in Mayan. Language 94(2), 237–280.
Clement, R. D. (1984). Gaelic. In P. Trudgill (ed.), Language in the British Isles, 318–343. Cambridge: Cambridge University Press.
Clements, G. N. (1978). Tone and syntax in Ewe. In D. J. Napoli (ed.), Elements of Tone, Stress, and Intonation, 21–99. Washington DC: Georgetown University Press.
Clements, G. N. (1979). The description of terraced-level tone languages. Language 55, 536–558.
Clements, G. N. (2001). Representational economy in constraint-based phonology. In T. A. Hall (ed.), Distinctive Feature Theory, 71–146. Berlin: Mouton de Gruyter.
Clements, G. N., and J. A. Goldsmith (eds.) (1984). Autosegmental Studies in Bantu Tone. Dordrecht: Foris.
Clements, G. N., and S. J. Keyser (1983). CV Phonology: A Generative Theory of the Syllable. Cambridge, MA: MIT Press.

Clements, G. N., A. Michaud, and C. Patin (2010). Do we need tone features? In J. A. Goldsmith, E. V. Hume, and L. Wetzels (eds.), Tones and Features: Phonetic and Phonological Perspectives, 3–24. Berlin: Mouton de Gruyter.
Clements, G. N., and A. Rialland (2008). Africa as a phonological area. In B. Heine and D. Nurse (eds.), A Linguistic Geography of Africa, 36–87. Cambridge: Cambridge University Press.
Clements, G. N., and E. Sezer (1982). Vowel and consonant disharmony in Turkish. In H. van der Hulst and N. Smith (eds.), The Structure of Phonological Representations, part 2, 213–255. Dordrecht: Foris.
Clopper, C. G. (2002). Frequency of Stress Patterns in English: A Computational Analysis. Indiana University Linguistics Club Working Papers Online 2. Retrieved 8 June 2020 from https://pdfs.semanticscholar.org/3404/d56321a167b484a3daa0654ece56aa1b8aff.pdf?_ga=2.242619713.299482006.1591501832927414360.1591501832.
Clopper, C. G., and R. Smiljanić (2011). Effects of gender and regional dialect on prosodic patterns in American English. Journal of Phonetics 39, 237–245.
Clopper, C. G., and J. Tonhauser (2013). The prosody of focus in Paraguayan Guaraní. International Journal of American Linguistics 79, 219–251.
Clumeck, H. (1977). Topics in the Acquisition of Mandarin Phonology: A Case Study. Papers and Reports on Child Language Development, ERIC Clearinghouse.
Clumeck, H. (1980). The acquisition of tone. In G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (eds.), Child Phonology, 257–275. New York: Academic Press.
Coe, R. (2002). It’s the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, Exeter. Retrieved 13 March 2018 from www.leeds.ac.uk/educol/documents/00002182.htm.
Coerts, J. (1992). Nonmanual grammatical markers: An analysis of interrogatives, negations and topicalisations in Sign Language of the Netherlands. PhD dissertation, University of Amsterdam.
Coetzee, A. W., and D. P. Wissing (2007). Global and local durational properties in three varieties of South African English. The Linguistic Review 24, 263–289.
Cohen, A., and J. ’t Hart (1965). Perceptual analysis of intonation patterns. In D. E. Commins (ed.), Proceedings of the 5th International Congress on Acoustics, paper A16, Liège.
Cohen, A., and J. ’t Hart (1967). On the anatomy of intonation. Lingua 19, 177–192.
Cohen, M. J., C. A. Riccio, and A. M. Flannery (1994). Expressive aprosodia following stroke to the right basal ganglia: A case report. Neuropsychology 8, 242–245.
Cohn, A. (1989). Stress in Indonesian and bracketing paradoxes. Natural Language and Linguistic Theory 7, 167–216.
Cohn, A. (2005). Truncation in Indonesian: Evidence for violable minimal words and AnchorRight. In K. Moulton and M. Wolf (eds.), Proceedings of the 34th Annual Meeting of the North East Linguistics Society, vol. 1, 175–189.
Colarusso, J. (1992). The Kabardian Language. Calgary: University of Calgary Press.
Cole, D., and M. Miyashita (2006). The function of pauses in metrical studies: Acoustic evidence from Japanese verse. In B. E. Dresher and N. Friedberg (eds.), Formal Approaches to Poetry, 173–192. Berlin: Mouton de Gruyter.
Cole, J. (2015). Prosody in context: A review. Language, Cognition and Neuroscience 30, 1–31.
Cole, J., H. Choi, H. Kim, and M. Hasegawa-Johnson (2003). The effect of accent on the acoustic cues to stop voicing in radio news speech. In Proceedings of the 15th International Congress of Phonetic Sciences, 2665–2668, Barcelona.
Cole, J., J. I. Hualde, C. L. Smith, C. Eager, T. Mahrt, and R. Napoleão de Souza (2019). Sound, structure and meaning: The bases of prominence ratings in English, French and Spanish. Journal of Phonetics 75, 113–147.
Cole, J., T. Mahrt, and J. Roy (2017). Crowd-sourcing prosodic annotation. Computer Speech and Language 45, 300–325.
Cole, J., Y. Mo, and M. Hasegawa-Johnson (2010). Signal-based and expectation-based factors in the perception of prosodic prominence. Laboratory Phonology 1(2), 425–452.
Cole, J., and U. D. Reichel (2016). What entrainment reveals about the cognitive encoding of prosody and its relation to discourse function. In Proceedings of Speech Prosody 8, Boston.

Cole, J., and S. Shattuck-Hufnagel (2016). New methods for prosodic transcription: Capturing variability as a source of information. Laboratory Phonology 7(8), 1–29.
Cole, J., and S. Shattuck-Hufnagel (2018). Quantifying phonetic variation: Landmark labelling of imitated utterances. In F. Cangemi, M. Clayards, O. Niebuhr, B. Schuppler, and M. Zellers (eds.), Rethinking Reduction, 164–204. Berlin: Mouton de Gruyter.
Cole, J., E. R. Thomas, E. Britt, and E. L. Coggshall (2008). Gender and ethnicity in intonation: A case study of North Carolina English. Ms., Department of Linguistics, University of Illinois at Urbana-Champaign.
Cole, R. A., and J. Jakimik (1978). Understanding speech: How words are heard. In G. Underwood (ed.), Strategies of Information Processing, 67–116. London: Academic Press.
Comrie, B. (1967). Irregular stress in Polish and Macedonian. International Review of Slavic Linguistics 1, 227–240.
Comrie, B., and G. G. Corbett (eds.) (1993). The Slavonic Languages. London: Routledge.
Coniam, D. (2002). Technology as an awareness-raising tool for sensitising teachers to features of stress and rhythm in English. Language Awareness 11, 30–42.
Conklyn, D., E. Novak, A. Boissy, F. Bethoux, and K. Chemali (2012). The effects of modified melodic intonation therapy on nonfluent aphasia: A pilot study. Journal of Speech, Language, and Hearing Research 55, 1463–1471.
Connaghan, K. P., and R. Patel (2017). The impact of contrastive stress on vowel acoustics and intelligibility in dysarthria. Journal of Speech, Language, and Hearing Research 60, 38–50.
Connell, B. A. (1999). Four tones and downtrend: A preliminary report on pitch realization in Mambila. In P. F. A. Kotey (ed.), New Dimensions in African Linguistics and Languages. Trends in African Linguistics, vol. 3, 75–88. Trenton, NJ: Africa World Press.
Connell, B. A. (2004). Tone, utterance length and F0 scaling. In Proceedings of the 1st International Symposium on Tonal Aspects of Languages, 41–44, Beijing.
Connell, B. A. (2011). Downstep. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 2, 824–847. Oxford: Wiley Blackwell.
Connell, B. A. (2017). Tone and intonation in Mambila. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 131–166. Berlin: Mouton de Gruyter.
Connell, B. A., R. A. Hayward, and J. A. Ashkaba (2000). Observations on Kunama tone. Studies in African Linguistics 29, 1–41.
Connell, B. A., and D. R. Ladd (1990). Aspects of pitch realization in Yoruba. Phonology 7, 1–29.
Connine, C. M., D. Blasko, and M. Hall (1991). Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constraints. Journal of Memory and Language 30, 234–250.
Constant, N. (2012). English rise-fall-rise: A study in the semantics and pragmatics of intonation. Linguistics and Philosophy 35, 407–442.
Constenla-Umaña, A. (1981). Comparative Chibchan phonology. PhD dissertation, University of Pennsylvania.
Constenla-Umaña, A. (2012). Chibchan languages. In L. Campbell and V. Grondona (eds.), The Indigenous Languages of South America: A Comprehensive Guide, 391–440. Berlin: Mouton de Gruyter.
Cook, D., and L. L. Criswell (1993). El idioma koreguaje (Tucano occidental). Bogotá: ILV.
Cook, E.-D. (1971). Vowels and tone in Sarcee. Language 47, 164–179.
Coon, J. (2010). Complementation in Chol (Mayan): A theory of split ergativity. PhD dissertation, MIT.
Cooper, N., A. Cutler, and R. Wales (2002). Constraints of lexical stress on lexical access in English: Evidence from native and non-native listeners. Language and Speech 45, 207–228.
Cooper, S. E. (2015). Intonation in Anglesey Welsh. PhD dissertation, University of Bangor.
Cooper, W., and J. Eady (1986). Metrical phonology in speech production. Journal of Memory and Language 25, 369–384.
Cooper, W., C. Soares, J. Nicol, D. Michelow, and S. Goloskie (1984). Clausal intonation after unilateral brain damage. Language and Speech 27, 17–24.

Cooper, W., and J. Sorensen (1981). Fundamental Frequency in Sentence Production. Heidelberg: Springer.
Cooperrider, K., N. Abner, and S. Goldin-Meadow (2018). The palm-up puzzle: Meanings and origins of a widespread form in gesture and sign. Frontiers in Communication 3, 23.
Coquillon, A., A. Di Cristo, and M. Pitermann (2000). Marseillais et Toulousains gèrent-ils différemment leurs pieds? Caractéristiques prosodiques du schwa dans les parlers méridionaux. In Proceedings of the 23rd Journées d’Etude sur la Parole, 89–92, Aussois.
Corley, M., L. J. MacGregor, and D. I. Donaldson (2007). It’s the way that you, er, say it: Hesitations in speech affect language comprehension. Cognition 105, 658–668.
Corrales-Astorgano, M., D. Escudero-Mancebo, and C. Gonzalez-Ferreras (2018). Acoustic characterization and perceptual analysis of the relative importance of prosody in speech of people with Down syndrome. Speech Communication 99, 90–100.
Correira, S. (2009). The acquisition of primary word stress in European Portuguese. PhD dissertation, University of Lisbon.
Costa, P., and R. McCrae (1989). NEO Five-Factor Inventory (NEO-FFI). Odessa, FL: Psychological Assessment Resources.
Coulter, G. (1982). On the nature of ASL as a monosyllabic language. Paper presented at the Annual Meeting of the Linguistic Society of America, San Diego.
Council of Europe (2011). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Strasbourg: Council of Europe.
Couper-Kuhlen, E., and M. Selting (1996). Prosody in Conversation: Interactional Studies. Cambridge: Cambridge University Press.
Courtenay, K. (1974). On the nature of the Bambara tone system. Studies in African Linguistics 5, 303–323.
Coustenoble, H., and L. Armstrong (1934). Studies in French Intonation. Cambridge: W. Heffer and Sons.
Couto, C. (2016). A influência da prosódia sobre o sistema vocálico da língua saynáwa (pano). Revue d’Ethnolinguistique amérindienne 39(1), 53–82.
Cox, F., and J. Fletcher (2017). Australian English Pronunciation and Transcription. Cambridge: Cambridge University Press.
Crane, T. (2014). Melodic tone in Totela TAM. Africana Linguistica 20, 63–79.
Crasborn, O., E. van der Kooij, J. Ros, and H. de Hoop (2009). Topic agreement in NGT (Sign Language of the Netherlands). The Linguistic Review 26, 355–370.
Crass, J. (2005). Das Kabeena: Deskriptive Grammatik einer hochlandostkuschitischen Sprache. Cologne: Rüdiger Köppe.
Creider, C. A. (1981). The tonal system of Proto-Kalenjin. In T. C. Schadeberg and M. L. Bender (eds.), Proceedings of the First Nilo-Saharan Linguistics Colloquium, 19–39. Dordrecht: Foris.
Creissels, D. (1978). À propos de la tonologie du bambara: Réalisations tonales, système tonal et la modalité nominale ‘défini’. Afrique et langage 9, 5–70.
Creissels, D. (2006). Le malinké de Kita. Cologne: Rüdiger Köppe.
Creissels, D., and C. Grégoire (1993). La notion de ton marqué dans l’analyse d’une opposition tonale binaire: Le cas du Mandingue. Journal of African Languages and Linguistics 14, 107–154.
Cremona, M., S. Assimakopoulos, and A. Vella (2017). The expression of politeness in a bilingual setting: Exploring the case of Maltese English. Russian Journal of Linguistics 21, 767–788.
Criper, L. (1971). The tone system of Gã English. In Actes du 8ème Congrès de la Société Linguistique de l’Afrique Occidentale, 45–57, Abidjan.
Criper-Friedman, L. (1990). The tone system of West African coastal English. World Englishes 9, 63–77.
Cristia, A., E. Dupoux, M. Gurven, and J. Stieglitz (2019). Child-directed speech is infrequent in a forager-farmer population: A time allocation study. Child Development 90, 759–773.
Criswell, L. L., and B. Brandrup (2000). Un bosquejo fonológico y gramatical del Siriano. In M. S. G. de Pérez and M. L. R. de Montes (eds.), Lenguas indígenas de Colombia: Una visión descriptiva, 395–415. Santafé de Bogotá: Instituto Caro y Cuervo.

Crocco, C. (2013). Is Italian clitic right dislocation grammaticalised? A prosodic analysis of yes/no questions and statements. Lingua 133, 30–52.
Crompton, A. (1982). Syllables and segments in speech production. In A. Cutler (ed.), Slips of the Tongue and Language Production, 663–716. Berlin: Mouton de Gruyter. (Also 1981, Linguistics 19, 663–716.)
Crook, H. (1989). The phonology and morphology of Nez Perce stress. PhD dissertation, University of California, Los Angeles.
Croot, K., C. Au, and A. Harper (2010). Prosodic structure and tongue twister errors. In C. Fougeron, B. Kuehnert, M. D’Imperio, and N. Vallée (eds.), Laboratory Phonology 10, 433–459. Berlin: Mouton de Gruyter.
Crosswhite, K. (2004). Vowel reduction. In B. Hayes, D. Steriade, and R. Kirchner (eds.), Phonetically Based Phonology, 191–231. New York: Cambridge University Press.
Crowley, T. (1976). Phonological change in New England. In R. M. W. Dixon (ed.), Grammatical Categories in Australian Languages, 19–50. Canberra: Australian Institute of Aboriginal Studies.
Crowley, T. (1978). The Middle Clarence Dialects of Bandjalang. Canberra: Australian Institute of Aboriginal Studies.
Crumpton, J., and C. L. Bethel (2016). A survey of using vocal prosody to convey emotion in robot speech. International Journal of Social Robotics 8(2), 271–285.
Cruttenden, A. (1985). Intonation comprehension in ten-year-olds. Journal of Child Language 12, 643–661.
Cruttenden, A. (1986). Intonation. Cambridge: Cambridge University Press.
Cruttenden, A. (1994). Rises in English. In J. Windsor-Lewis (ed.), Studies in General and English Phonetics: Essays in Honour of Professor J. D. O’Connor, 155–173. London: Routledge.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Cruttenden, A. (2001). Mancunian intonation and intonational representation. Phonetica 58, 53–80.
Cruz, E. (2011). Phonology, tone and the functions of tone in San Juan Quiahije Chatino. PhD dissertation, University of Texas at Austin.
Cruz, E., and A. Woodbury (2014). Finding a way into a family of tone languages: The story and methods of the Chatino Language Documentation Project. Language Documentation and Conservation 8, 490–524.
Cruz-Ferreira, M. (1987). Non-native interpretive strategies for intonational meaning: An experimental study. In A. James and J. Leather (eds.), Sound Patterns in Second Language Acquisition, 103–120. Dordrecht: Foris.
Cruz-Ferreira, M. (1989). Non-native comprehension of intonation patterns in Portuguese and in English. PhD dissertation, University of Manchester.
Crystal, D. (1969). Prosodic Systems and Intonation in English. Cambridge: Cambridge University Press.
Crystal, D. (1975). The English Tone of Voice: Essays in Intonation, Prosody and Paralanguage. London: Arnold.
Crystal, D. (1982). Profiling Linguistic Disability. London: Edward Arnold.
Crystal, D. (1986). Prosodic development. In P. J. Fletcher and M. Garman (eds.), Studies in First Language Development, 174–197. New York: Cambridge University Press.
Crystal, T. H., and A. House (1988). Segmental durations in connected-speech signals: Syllabic stress. Journal of the Acoustical Society of America 83, 1574–1585.
Cucchiarini, C., H. Strik, and L. Boves (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. Journal of the Acoustical Society of America 111, 2862–2873.
Cumming, R. E. (2011a). The language-specific interdependence of tonal and durational cues in perceived rhythmicality. Phonetica 68, 1–25.
Cumming, R. E. (2011b). Perceptually informed quantification of speech rhythm in pairwise variability indices. Phonetica 68, 256–277.
Cumming, R. E., A. Wilson, and U. Goswami (2015). Basic auditory processing and sensitivity to prosodic structure in children with specific language impairments: A new look at a perceptual hypothesis. Frontiers in Psychology 6, 1–16.

Cummins, F. (2002). Speech rhythm and rhythmic taxonomy. In Proceedings of Speech Prosody 1, 121–126, Aix-en-Provence.
Cummins, F. (2009). Rhythm as entrainment: The case of synchronous speech. Journal of Phonetics 37, 16–28.
Cummins, F. (2011). Periodic and aperiodic synchronization in skilled action. Frontiers in Human Neuroscience 5, 170.
Cummins, F. (2012). Oscillators and syllables: A cautionary note. Frontiers in Psychology 3, 364.
Cummins, F., and R. Port (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics 26, 145–171.
Curiel Ramírez del Prado, A. (2007). Estructura de la información, enclíticos y configuración sintáctica en tojol’ab’al. MA thesis, Centro de Investigaciones y Estudios Superiores en Antropología Social.
Curtin, S. (2009). Twelve-month-olds learn novel word–object pairings differing only in stress pattern. Journal of Child Language 36(5), 1157–1165.
Curtin, S., J. Campbell, and D. Hufnagle (2012). Mapping novel labels to actions: How the rhythm of words guides infants’ learning. Journal of Experimental Child Psychology 112(2), 127–140.
Curtin, S., T. H. Mintz, and M. H. Christiansen (2005). Stress changes the representational landscape: Evidence from word segmentation. Cognition 96(3), 233–262.
Cutler, A. (1976). Phoneme-monitoring reaction time as a function of preceding intonation contour. Perception and Psychophysics 20, 55–60.
Cutler, A. (1986). Forbear is a homophone: Lexical prosody does not constrain lexical access. Language and Speech 29, 201–220.
Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In G. T. M. Altmann (ed.), Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives, 105–112. Cambridge, MA: MIT Press.
Cutler, A. (1994). Segmentation problems, rhythmic solutions. Lingua 92, 81–104.
Cutler, A. (2005). Lexical stress. In D. B. Pisoni and R. E. Remez (eds.), The Handbook of Speech Perception, 264–289. Oxford: Blackwell.
Cutler, A. (2012). Native Listening: Language Experience and the Recognition of Spoken Words. Cambridge, MA: MIT Press.
Cutler, A., and S. Butterfield (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language 31, 218–236.
Cutler, A., and D. M. Carter (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133–142.
Cutler, A., and C. E. Clifton (1984). The use of prosodic information in word recognition. In H. Bouma and D. G. Bouwhuis (eds.), Attention and Performance X, 183–196. Hillsdale, NJ: Erlbaum.
Cutler, A., D. Dahan, and W. van Donselaar (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech 40(2), 141–201.
Cutler, A., and D. Foss (1977). On the role of sentence stress in sentence processing. Language and Speech 20, 1–10.
Cutler, A., and J. Mehler (1993). The periodicity bias. Journal of Phonetics 21(1–2), 103–108.
Cutler, A., and D. Norris (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14(1), 113–121.
Cutler, A., and T. Otake (1994). Mora or phoneme? Further evidence for language-specific listening. Journal of Memory and Language 33(6), 824–844.
Cutler, A., and T. Otake (1999). Pitch accent in spoken-word recognition in Japanese. Journal of the Acoustical Society of America 105, 1877–1888.
Cutler, A., and M. Pearson (1986). On the analysis of prosodic turn-taking cues. In C. Johns-Lewis (ed.), Intonation in Discourse, 139–156. San Diego: College-Hill.
Cutler, A., and D. A. Swinney (1987). Prosody and the development of comprehension. Journal of Child Language 14, 145–167.
Cutler, A., and W. van Donselaar (2001). Voornaam is not (really) a homophone: Lexical prosody and lexical access in Dutch. Language and Speech 44(2), 171–195.

Czekman, W., and E. Smułkowa (1988). Fonetyka i fonologia języka białoruskiego z elementami fonetyki i fonologii ogólnej. Warsaw: PWN.
Dachkovsky, S. (2005). Facial expression as intonation in ISL: The case of conditionals. MA thesis, University of Haifa.
Dachkovsky, S. (2017). Grammaticalization of intonation in Israeli Sign Language: From information structure to relative clause relations. PhD dissertation, University of Haifa.
Dachkovsky, S., C. Healy, and W. Sandler (2013). Visual intonation in two sign languages. Phonology 30(2), 211–252.
Dachkovsky, S., and W. Sandler (2009). Visual intonation in the prosody of a sign language. Language and Speech 52(2–3), 287–314.
Dellaert, F., T. Polzin, and A. Waibel (1996). Recognizing emotion in speech. In Proceedings of the 4th International Conference on Spoken Language Processing, vol. 3, 1970–1973, Philadelphia.
Dahan, D. (2015). Prosody and language comprehension. Wiley Interdisciplinary Reviews: Cognitive Science 6, 441–452.
Dainora, A. (2001). An empirically based probabilistic model of intonation in American English. PhD dissertation, University of Chicago.
Dainora, A. (2006). Modelling intonation in English: A probabilistic approach to phonological competence. In L. M. Goldstein, D. H. Whalen, and C. T. Best (eds.), Laboratory Phonology 8, 107–132. Berlin: Mouton de Gruyter.
d’Alessandro, C. (2006). Voice source parameters and prosodic analysis. In S. Sudhoff, D. Lenertova, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, and J. Schließer (eds.), Methods in Empirical Prosody Research, 63–87. Berlin: Walter de Gruyter.
d’Alessandro, C., S. Rosset, and J.-P. Rossi (1998). The pitch of short-duration fundamental frequency glissandos. Journal of the Acoustical Society of America 104, 2339–2348.
Dalton, M. (2008). The phonetics and phonology of the intonation of Irish dialects. PhD dissertation, Trinity College Dublin.
Dalton, M., and A. Ní Chasaide (2003). Modelling intonation in three Irish dialects. In Proceedings of the 15th International Congress of Phonetic Sciences, 1073–1076, Barcelona.
Dalton, M., and A. Ní Chasaide (2005a). Peak timing in two dialects of Connaught Irish. In INTERSPEECH 2005, 1377–1380, Lisbon.
Dalton, M., and A. Ní Chasaide (2005b). Tonal alignment in Irish dialects. Language and Speech 48, 441–464.
Dalton, M., and A. Ní Chasaide (2007a). Melodic alignment and micro-dialect variation in Connemara Irish. In C. Gussenhoven and T. Riad (eds.), Tones and Tunes: Vol. 2. Experimental Studies in Word and Sentence Prosody, 293–316. Berlin: Mouton de Gruyter.
Dalton, M., and A. Ní Chasaide (2007b). Nuclear accents in four Irish (Gaelic) dialects. In Proceedings of the 16th International Congress of Phonetic Sciences, 965–968, Saarbrücken.
Daly, J. P., and L. M. Hyman (2007). On the representation of tone in Peñoles Mixtec. International Journal of American Linguistics 73(2), 165–207.
Daly, N., and P. Warren (2002). Pitching it differently in New Zealand English: Speaker sex and intonation patterns. Journal of Sociolinguistics 2, 85–96.
Daneš, F. (1957). Intonace a věta ve spisovné češtině. Prague: Czechoslovak Academy of Sciences.
Daneš, F. (1974). Functional sentence perspective and the organization of the text. Papers on Functional Sentence Perspective, 106–128.
Danforth, D. G., and I. Lehiste (1977). The hierarchy of phonetic cues in the perception of Estonian quantity. P. Virittaja 4, 404–411.
Dankovičová, J. (1997). The domain of articulation rate variation in Czech. Journal of Phonetics 25(3), 287–312.
Dankovičová, J., and V. Dellwo (2007). Czech speech rhythm and the rhythm class hypothesis. In Proceedings of the 16th International Congress of Phonetic Sciences, 1241–1244, Saarbrücken.
Danly, M., and B. E. Shapiro (1982). Speech prosody in Broca’s aphasia. Brain and Language 16, 171–190.

Danner, S. G., A. V. Barbosa, and J. L. Goldstein (2018). Quantitative analysis of multimodal speech data. Journal of Phonetics 71, 268–283.
Danylenko, A., and S. Vakulenko (1995). Ukrainian. Munich: Lincom Europa.
Darwin, C. (1872). The Expression of the Emotions in Man and Animals. London: John Murray.
Daržágín, S., M. Trnka, and J. Štefánik (2005). Možnosti výskumu kvantity pomocou existujúcich rečových databáz: Kvantita v spisovnej slovenčine a v slovenských nárečiach. Bratislava: Veda.
Das, K. (2017). Tone and intonation in Boro. Paper presented at the Chulalongkorn International Symposium on Southeast Asian Linguistics, Bangkok.
Das, K., and S. Mahanta (2016). Tonal alignment and prosodic word domains in Boro. In Proceedings of the 5th International Symposium on Tonal Aspects of Languages, 111–115, Buffalo, NY.
Daudey, H. (2014). A grammar of Wadu Pumi. PhD dissertation, La Trobe University.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11, 51–62.
Dauer, R. M. (1987). Phonetic and phonological components of language rhythm. In Proceedings of the 11th International Congress of Phonetic Sciences, vol. 5, 447–450, Tallinn.
Daugavet, A. (2010). Syllable length in Latvian and Lithuanian: Searching for the criteria. Baltic Linguistics 1, 83–114.
Daugavet, A. (2013). Geminacija soglasnyx v latyšskom jazyke: Sledy pribaltijsko-finskogo vlijanija. In V. V. Ivanov and P. Arkadiev (eds.), Issledovanija po tipologii slavjanskix, baltijskix i balkanskix jazykov, 280–319. St Petersburg: Aletheia.
Daugavet, A. (2015). The lengthening of the first component of Lithuanian diphthongs in an areal perspective. In P. Arkadiev, A. Holvoet, and B. Wiemer (eds.), Contemporary Approaches to Baltic Linguistics, 139–202. Berlin: De Gruyter.
Davidson, L., and M. Stone (2003). Epenthesis versus gestural mistiming in consonant cluster production: An ultrasound study. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, 165–178. Somerville, MA: Cascadilla Press.
Davis, C., and J. Kim (2007). Audio-visual speech perception off the top of the head. Cognition 100, B21–B31.
Davis, M. H., I. S. Johnsrude, A. Hervais-Adelman, K. Taylor, and C. McGettigan (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General 134, 222–241.
Davis, M. H., W. D. Marslen-Wilson, and M. G. Gaskell (2002). Leading up the lexical garden path: Segmentation and ambiguity in spoken word recognition. Journal of Experimental Psychology: Human Perception and Performance 28, 218–244.
Davis, S. (2011). Geminates. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 1, 837–859. Oxford: Wiley Blackwell.
Davis, S. M., and M. H. Kelly (1997). Knowledge of the English noun-verb stress difference by native and nonnative speakers. Journal of Memory and Language 36, 445–460.
Day, C. (1973). The Jacaltec Language. Bloomington: Indiana University.
Day-O’Connell, J. (2013). Speech, song, and the minor third: An acoustic study of the stylized interjection. Music Perception 30, 441–462.
Dayley, J. (1989). Tümpisa (Panamint) Shoshone Grammar. Berkeley: University of California Press.
de Bhaldraithe, T. (1945/1966). The Irish of Cois Fhairrge, Co. Galway: A Phonetic Study. Dublin: Dublin Institute for Advanced Studies.
de Boer, E. M. (1956). On the ‘residue’ in hearing. PhD dissertation, University of Amsterdam.
de Boer, E. M. (1977). Pitch theories unified. In E. F. Evans and J. P. Wilson (eds.), Psychophysics and Physiology of Hearing, 323–334. London: Academic Press.
de Boer, E. M. (2017). Universals of tone rules and diachronic change in Japanese. Journal of Asian and African Studies 94, 217–242.
de Bot, K., and K. Mailfert (1982). The teaching of intonation: Fundamental research and classroom applications. TESOL Quarterly 16, 71–77.
de Bree, E., P. van Alphen, P. Fikkert, and F. Wijnen (2008). Metrical stress in comprehension and production of Dutch children at risk of dyslexia. In H. Chan, H. Jacob, and E. Kapia (eds.), Proceedings of the 32nd Annual Boston University Conference on Language Development, 60–71. Somerville, MA: Cascadilla Press.
de Búrca, S. (1958/1970). The Irish of Tourmakeady, Co. Mayo. Dublin: Dublin Institute for Advanced Studies.
de Carvalho, A., I. Dautriche, and A. Christophe (2016a). Preschoolers use phrasal prosody online to constrain syntactic analysis. Developmental Science 19(2), 235–250.
de Carvalho, A., I. Dautriche, I. Lin, and A. Christophe (2017). Phrasal prosody constrains syntactic analysis in toddlers. Cognition 163, 67–79.
de Carvalho, A., A. X. He, J. Lidz, and A. Christophe (2015). 18-month-olds use phrasal prosody as a cue to constrain the acquisition of novel word meanings. Paper presented at the Boston University Conference on Language Development, Boston.
de Carvalho, A., J. Lidz, L. Tieu, T. Bleam, and A. Christophe (2016b). English-speaking preschoolers can use phrasal prosody for syntactic parsing. Journal of the Acoustical Society of America 139(6), EL216–EL222.
de Cheveigné, A. (2005). Pitch perception models. In C. J. Plack, A. J. Oxenham, and R. R. Fay (eds.), Pitch, 169–233. New York: Springer.
De Gelder, B., and P. Bertelson (2003). Multisensory integration, perception and ecological validity. Trends in Cognitive Sciences 7, 460–467.
de Jong, K. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America 97, 491–504.
de Jong, K., M. E. Beckman, and J. Edwards (1993). The interplay between prosodic structure and coarticulation. Language and Speech 36, 197–212.
de Jong, K., and J. McDonough (1993). Tone and tonogenesis in Navajo. UCLA Working Papers in Phonetics 84, 165–182.
de Jong, N. H. (2016). Fluency in second language assessment. In D. Tsagari and J. Banerjee (eds.), Handbook of Second Language Assessment, 203–218. Berlin: Mouton de Gruyter.
de Jong Boudreault, L. J. (2009). A grammar of Sierra Popoluca (Soteapanec, a Mixe-Zoquean language). PhD dissertation, University of Texas at Austin.
De la Fuente, J., J. Santiago, A. Román, C. Dumitrache, and D. Casasanto (2014). When you think about it, your past is in front of you: How culture shapes spatial conceptions of time. Psychological Science 25, 1682–1690.
de la Mora, D. M., M. Nespor, and J. M. Toro (2013). Do humans and nonhuman animals share the grouping principles of the iambic–trochaic law? Attention, Perception, and Psychophysics 75, 92–100.
de Lacy, P. (2002). The interaction of tone and stress in Optimality Theory. Phonology 19, 1–32.
de Lacy, P. (2004). Markedness conflation in Optimality Theory. Phonology 21, 1–55.
De Marneffe, M.-C., and J. Tonhauser (2016). Inferring meaning from indirect answers to polar questions: The contribution of the rise-fall-rise contour. In E. Onea, M. Zimmermann, and K. von Heusinger (eds.), Questions in Discourse, 132–163. Leiden: Brill.
De Ruiter, J. P., H. Mitterer, and N. J. Enfield (2006). Projecting the end of a speaker’s turn: A cognitive cornerstone of conversation. Language 82(3), 515–535.
de Swart, B. J. M., S. C. Willemse, B. A. M. Maasen, and M. W. I. M. Horstink (2003). Improvement of voicing in patients with Parkinson’s disease by speech therapy. Neurology 60(3), 498–500.
de Vaan, M. (1999). Towards an explanation of the Franconian tone accents. Amsterdamer Beiträge zur älteren Germanistik 51, 23–44.
De Vos, C., E. van der Kooij, and O. Crasborn (2009). Mixed signals: Combining linguistic and affective functions of eyebrows in questions in Sign Language of the Netherlands. Language and Speech 52(2–3), 315–339.
DeCasper, A. J., and W. P. Fifer (1980). Of human bonding: Newborns prefer their mothers’ voices. Science 208(4448), 1174–1176.
Dehé, N. (2006). Some notes on the focus–prosody relation and phrasing in Icelandic. In G. Bruce and M. Horne (eds.), Nordic Prosody: Proceedings of the IXth Conference, Lund 2004, 47–56. Frankfurt: Peter Lang.

Dehé, N. (2008). To delete or not to delete: The contexts of Icelandic final vowel deletion. Lingua 118(5), 732–753.
Dehé, N. (2009). An intonational grammar for Icelandic. Nordic Journal of Linguistics 32(1), 5–34.
Dehé, N. (2010). The nature and use of Icelandic prenuclear and nuclear pitch accents: Evidence from f0 alignment and syllable/segment duration. Nordic Journal of Linguistics 33(1), 31–65.
Dehé, N. (2014). Final devoicing of /l/ in Reykjavík Icelandic. In Proceedings of Speech Prosody 7, 757–761, Dublin.
Dehé, N. (2015). The intonation of the Icelandic other-initiated repair expressions Ha ‘Huh’ and Hvað segirðu/Hvað sagðirðu ‘What do/did you say’. Nordic Journal of Linguistics 38(2), 189–219.
Dehé, N. (2018). The intonation of polar questions in North American (heritage) Icelandic. Journal of Germanic Linguistics 30(3), 211–256.
Dehé, N., and B. Braun (2013). The prosody of question tags in English. English Language and Linguistics 17(1), 129–156.
Dehé, N., and B. Braun (2020). The intonation of information seeking and rhetorical questions in Icelandic. Journal of Germanic Linguistics 32(1), 1–42.
Dehé, N., I. Feldhausen, and S. Ishihara (2011). The prosody–syntax interface: Focus phrasing, language evolution. Lingua 121(13), 1863–1869.
Dehé, N., and A. Wetterlin (2013). Secondary stress in morphologically complex words in Faroese: A word game. In H. Härtl (ed.), Interfaces of Morphology, 229–248. Berlin: Akademie.
Delais-Roussarie, E., and H.-Y. Yoo (2011). Learner corpora and prosody: From the COREIL corpus to principles on data collection and corpus design. Poznań Studies in Contemporary Linguistics 47(1), 26–39.
Delais-Roussarie, E., B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S.-A. Jun, P. Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H.-Y. Yoo (2015). Developing a ToBI system for French. In S. Frota and P. Prieto (eds.), Intonational Variation in Romance. Oxford: Oxford University Press.
Delais-Roussarie, E., H.-Y. Yoo, and B. Post (2011). Quand frontières prosodiques et frontières syntaxiques se rencontrent. Langue française 170, 29–44.
Delattre, P. (1962). Some factors of vowel duration and their cross-linguistic validity. Journal of the Acoustical Society of America 34(8), 1141–1143.
Delattre, P. (1966). A comparison of syllable length conditioning among languages. International Review of Applied Linguistics in Language Teaching 4, 183–198.
Dell, F. (1984). L’accentuation dans les phrases en français. In F. Dell, D. Hirst, and J.-R. Vergnaud (eds.), La forme sonore du langage, 65–122. Paris: Hermann.
Dell, F., and M. Elmedlaoui (2008). Poetic Meter and Musical Form in Tashlhiyt Berber Songs. Cologne: Rüdiger Köppe.
Dell, F., and J. Halle (2009). Comparing musical textsetting in French and in English songs. In J.-L. Aroui and A. Arleo (eds.), Towards a Typology of Poetic Forms: From Language to Metrics and Beyond, 63–78. Amsterdam: John Benjamins.
Dell, G. S. (1986). A spreading activation theory of retrieval in sentence production. Psychological Review 93, 283–321.
Dell, G. S., L. K. Burger, and W. R. Svec (1997). Language production and serial order: A functional analysis and a model. Psychological Review 104, 123–147.
Dell, G. S., C. Juliano, and A. Govindjee (1993). Structure and content in language production: A theory of frame constraints in phonological speech errors. Cognitive Science 17, 149–195.
Dell, G. S., and P. A. Reich (1977). A model of slips of the tongue. In R. J. Dipietro and E. L. Blansitt (eds.), The Third LACUS Forum, 448–455. Columbia, SC: Hornbeam Press.
Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for ∆C. In P. Karnowski and I. Szigeti (eds.), Language and Language-Processing, 231–241. Frankfurt: Peter Lang.
Dellwo, V. (2010). Influences of speech rate on the acoustic correlates of speech rhythm: An experimental phonetic study based on acoustic and perceptual evidence. PhD dissertation, University of Bonn.

Dellwo, V., A. Leemann, and M.-J. Kolly (2015). Rhythmic variability between speakers: Articulatory, prosodic, and linguistic factors. Journal of the Acoustical Society of America 137(3), 1513–1528.
Dellwo, V., and P. Wagner (2003). Relationships between rhythm and speech rate. In Proceedings of the 15th International Congress of Phonetic Sciences, 471–474, Barcelona.
Demenko, G. (1999). Analiza cech suprasegmentalnych języka polskiego na potrzeby technologii mowy. Poznań: Wydawnictwo Naukowe UAM.
Demenko, G., and W. Jassem (1999). A text-to-speech oriented comparison of Polish and English intonation. Acta Acustica 85, 51–52.
Demenko, G., A. Wagner, and N. Cylwik (2010). The use of speech technology in foreign language pronunciation training. Archives of Acoustics 35(3), 309–329.
Demers, R., F. Escalante, and E. Jelinek (1999). Prominence in Yaqui words. International Journal of American Linguistics 65, 40–55.
Demolin, D. (2011). Aerodynamic techniques for phonetic fieldwork. In Proceedings of the 17th International Congress of Phonetic Sciences, 84–87, Hong Kong.
Demuth, K. (1996). Stages in the acquisition of prosodic structure. In E. Clark (ed.), Proceedings of the 27th Child Language Research Forum, 39–48. Stanford: CSLI.
Demuth, K. (2014). Prosodic licensing and the development of phonological and morphological representations. In A. Farris-Trimble and J. Barlow (eds.), Perspectives on Phonological Theory and Acquisition: Papers in Honor of Daniel A. Dinnsen (Language Acquisition and Language Disorders 56), 11–24. Amsterdam: John Benjamins.
Dench, A. (1994). Martuthunira: A Language of the Pilbara Region of Western Australia. Canberra: Pacific Linguistics.
Deng, F. M. (1973). The Dinka and Their Songs. Oxford: Oxford University Press.
Deo, A. S. (2007). The metrical organization of classical Sanskrit verse. Journal of Linguistics 43(1), 63–114.
Deo, A. S., and P. Kiparsky (2011). Poetries in contact: Arabic, Persian, and Urdu. In M.-K. Lotman and M. Lotman (eds.), Frontiers of Comparative Prosody, 145–172. Bern: Peter Lang.
Deo, A., and J. Tonhauser (2018). On the prosody of pragmatic focus in Chodri, Gujarati and Marathi. Paper presented at the 34th South Asian Languages Analysis Roundtable, Konstanz.
DePaolis, R., M. Vihman, and S. Kunnari (2008). Prosody in production at the onset of word use: A cross-linguistic study. Journal of Phonetics 36, 406–422.
DePape, A.-M. R., A. Chen, G. B. Hall, and L. J. Trainor (2012). Use of prosody and information structure in high functioning adults with autism in relation to language ability. Frontiers in Psychology 3, 199–246.
DePaulo, B. M., J. J. Lindsay, B. E. Malone, L. Muhlenbruck, K. Charlton, and H. Cooper (2003). Cues to deception. Psychological Bulletin 129(1), 74–118.
Derwing, T. M., and M. J. Munro (1997). Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition 20, 1–16.
Derwing, T. M., and M. J. Munro (2015). Pronunciation Fundamentals: Evidence-Based Perspectives for L2 Teaching and Research: Vol. 42. Amsterdam: John Benjamins.
Derwing, T. M., M. J. Munro, and G. Wiebe (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning 48(3), 393–410.
Deterding, D. (1994). The intonation of Singapore English. Journal of the International Phonetic Association 24(2), 61–72.
Deuchar, M. (1983). Is BSL an SVO language? In J. Kyle and B. Woll (eds.), Language in Sign, 69–76. London: Croom Helm.
DeVault, D., R. Artstein, G. Benn, T. Dey, E. Fast, A. Gainer, K. Georgila, J. Gratch, A. Hartholt, M. Lhommet, G. M. Lucas, S. Marsella, F. Morbini, A. Nazarian, S. Scherer, G. Stratou, A. Suri, D. R. Traum, R. Wood, Y. Xu, A. Rizzo, and L.-P. Morency (2014). SimSensei Kiosk: A virtual human interviewer for healthcare decision support. Autonomous Agents and Multiagent Systems 2014, 1061–1068.

References 731 Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators, and Centers for Disease Control and Prevention (CDC) (2014). Prevalence of autism spectrum disorder among children aged 8 years: Autism and developmental disabilities monitoring network, 11 sites, United States, 2010. Morbidity and Mortality Weekly Report 63(2), 1–21. Devillers, L., and L. Vidrascu (2006). Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In INTERSPEECH 2006, 801–804, Pittsburgh. Devonish, H., and O. G. Harry (2008). Jamaican Creole and Jamaican English: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English: Vol. 2. The Americas and the Caribbean, 256–289. Berlin: Mouton de Gruyter. Di Cristo, A. (2000). Vers une modélisation de l’accentuation du français. French Language Studies 10, 27–44. Di Cristo, A., and D. J. Hirst (1986). Modelling French micromelody: Analysis and syntheses. Phonetica 43, 11–30. Di Cristo, A., and D. Hirst (1993). Prosodic regularities in the surface structure of French questions. In Proceedings of the ESCA Workshop on Prosody (Working Papers 41), 268–271. Lund: Department of Linguistics. Di Gioacchino, M., and L. C. Jessop (2011). Uptalk: Towards a quantitative analysis. Toronto Working Papers in Linguistics 33(1). Di Napoli, J. (2015). Glottalization at phrase boundaries in Tuscan and Roman Italian. In J. Romero and M. Riera (eds.), The Phonetics/Phonology Interface: Sounds, Representations, Methodologies, 125–148. Amsterdam: John Benjamins. Diaz-Campos, M. (2000). The phonetic manifestation of secondary stress in Spanish. In H. Campos, E. Herburger, A. Morales-Front, and T. J. Walsh (eds.), Hispanic Linguistics at the Turn of the Millennium: Papers from the 3rd Hispanic Linguistics Symposium 49–65. Somerville: Cascadilla Press. DiCanio, C. (2008). The phonetics and phonology of San Martín Itunyoso Trique. 
PhD dissertation, University of California, Berkeley. DiCanio, C. (2009). The phonetics of register in Takhian Thong Chong. Journal of the International Phonetic Association 39, 162–188. DiCanio, C. (2010). Illustrations of the IPA: Itunyoso Trique. Journal of the International Phonetic Association 40(2), 227–238. DiCanio, C. T. (2012a). Coarticulation between tone and glottal consonants in Itunyoso Trique. Journal of Phonetics 40, 162–176. DiCanio, C. T. (2012b). Cross-linguistic perception of Itunyoso Trique tone. Journal of Phonetics 40, 672–688. DiCanio, C. (2014). Triqui tonal coarticulation and contrast preservation in tonal phonology. In Proceedings from Sound Systems of Mexico and Central America. Yale University. Retrieved 8 June 2020 from https://ubir.buffalo.edu/xmlui/bitstream/handle/10477/41240/DiCanio-2014-Triqui-SSMCA.pdf?sequence=1&isAllowed=y. DiCanio, C. (2016). Tonal classes in Itunyoso Trique person morphology. In E. L. Palancar and J. L. Leonard (eds.), Tone and Inflection: New Facts and New Perspectives (Trends in Linguistics Studies and Monographs 296), 225–266. Berlin: Mouton de Gruyter. DiCanio, C. (submitted). The phonetics of word-prosodic structure in Ixcatec. DiCanio, C., J. D. Amith, and R. C. García (2014). The phonetics of moraic alignment in Yoloxóchitl Mixtec. In Proceedings of the 4th International Symposium on Tonal Aspects of Languages, 203–210, Nijmegen. DiCanio, C., J. Benn, and R. C. García (2018). The phonetics of information structure in Yoloxóchitl Mixtec. Journal of Phonetics 68, 50–68. Dickens, P. J. (1994). English-Jul’hoan, Jul’hoan-English Dictionary. Cologne: Köppe. Dickinson, C. (2002). Complex Predicates in Tsafiki. Eugene: University of Oregon. Diehl, J. J., L. Bennetto, D. Watson, C. Gunlogson, and J. McDonough (2008). Resolving ambiguity: A psycholinguistic approach to understanding prosody processing in high-functioning autism. Brain and Language 106, 144–152.

Diehl, J. J., C. Friedberg, R. Paul, and J. Snedeker (2015). The use of prosody during syntactic processing in children and adolescents with autism spectrum disorders. Development and Psychopathology 27(3), 867–884. Diehl, J. J., and R. Paul (2009). The assessment and treatment of prosodic disorders and neurological theories of prosody. International Journal of Speech-Language Pathology 11(4), 287–292. Diehl, J. J., and R. Paul (2013). Acoustic and perceptual measurements of prosody production on the profiling elements of prosodic systems in children by children with autism spectrum disorders. Applied Psycholinguistics 34(1), 135–161. Diehl, J. J., D. Watson, L. Bennetto, J. McDonough, and C. Gunlogson (2009). An acoustic analysis of prosody in high-functioning autism. Applied Psycholinguistics 30(3), 385–404. Diehl, R. L., A. Lotto, and L. L. Holt (2004). Speech perception. Annual Review of Psychology 55, 149–179. Diffloth, G. (1982). Registres, dévoisement, timbres vocaliques: Leur histoire en Katouique. Mon-Khmer Studies 11, 47–82. Diffloth, G. (1985). The registers of Mon vs. the spectrographist’s tones. UCLA Working Papers in Phonetics 60, 55–58. Dilley, L. C. (2005). The phonetics and phonology of tonal systems. PhD dissertation, MIT. Dilley, L. C. (2008). On the dual relativity of tone. In Proceedings of the Annual Meeting of the Chicago Linguistics Society, vol. 41, 129–144, Chicago. Dilley, L. C. (2010). Pitch range variation in English tonal contrasts is continuous, not categorical. Phonetica 67, 63–81. Dilley, L. C., and M. Breen (in press). An enhanced autosegmental-metrical theory (AM+) facilitates phonetically transparent prosodic annotation: A reply to Jun. In J. A. Barnes and S. Shattuck-Hufnagel (eds.), Prosodic Theory and Practice. Cambridge, MA: MIT Press. Dilley, L. C., and M. Brown (2005). The RaP (Rhythm and Pitch) Labeling System, Version 1.0. Retrieved 12 May 2020 from http://tedlab.mit.edu/tedlab_website/RaPHome.html.
Dilley, L. C., and M. Brown (2007). Effects of pitch range variation on f0 extrema in an imitation task. Journal of Phonetics 35, 523–551. Dilley, L. C., and C. Heffner (2013). The role of f0 alignment in distinguishing intonation categories: Evidence from American English. Journal of Speech Sciences 3(1), 3–67. Dilley, L. C., D. R. Ladd, and A. Schepman (2005). Alignment of L and H in bitonal pitch accents: Testing two hypotheses. Journal of Phonetics 33, 115–119. Dilley, L. C., and J. D. McAuley (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language 59, 294–311. Dilley, L. C., S. L. Mattys, and L. Vinke (2010). Potent prosody: Comparing the effects of distal prosody, proximal prosody, and semantic context on word segmentation. Journal of Memory and Language 63, 274–294. Dilley, L. C., and M. A. Pitt (2010). Altering context speech rate can cause words to appear or disappear. Psychological Science 21(11), 1664–1670. Dilley, L. C., S. Shattuck-Hufnagel, and M. Ostendorf (1996). Glottalization of word-initial vowels as a function of prosodic structure. Journal of Phonetics 24, 423–444. Dimitrova, S. (1997). Bulgarian speech rhythm: Stress-timed or syllable-timed? Journal of the International Phonetic Association 27(1–2), 27–33. Dimitrova, S., and S.-A. Jun (2015). Pitch accent variability in focus production and perception in Bulgarian declaratives. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Dimitrova, S., and A. E. Turk (2012). Patterns of accentual lengthening in English four-syllable words. Journal of Phonetics 40, 403–418. Dimmendaal, G. J. (2008). Language ecology and linguistic diversity on the African continent. Language and Linguistics Compass 2(5), 840–858. D’Imperio, M. (1999). Tonal structure and pitch targets in Italian focus constituents. In Proceedings of the 14th International Congress of Phonetic Sciences, 1757–1760, San Francisco. D’Imperio, M. (2000). 
The role of perception in defining tonal targets and their alignment. PhD dissertation, The Ohio State University.

D’Imperio, M. (2001). Focus and tonal structure in Neapolitan Italian. Speech Communication 33, 339–356. D’Imperio, M. (2002). Italian intonation: An overview and some questions. Probus 14, 37–69. D’Imperio, M. (2006). Current issues in tonal alignment. Italian Journal of Linguistics/Rivista di linguistica 18, 1. D’Imperio, M., R. Bertrand, A. Di Cristo, and C. Portes (2007a). Investigating phrasing levels in French: Is there a difference between nuclear and prenuclear accents? In J. Camacho, V. Deprez, N. Flores, and L. Sanchez (eds.), Selected Papers from the 36th Linguistic Symposium on Romance Languages (LSRL), 97–110. New Brunswick: John Benjamins. D’Imperio, M., and F. Cangemi (2009). Phrasing, register level downstep and partial topic constructions in Neapolitan Italian. In C. Gabriel and C. Lléo (eds.), Intonational Phrasing in Romance and Germanic (Hamburger Studies on Multilingualism 10), 75–94. Amsterdam: John Benjamins. D’Imperio, M., F. Cangemi, and L. Brunetti (2008). The phonetics and phonology of contrastive topic constructions in Italian. Studies on Multilingualism 10, 75–94. Amsterdam: John Benjamins. D’Imperio, M., G. Elordieta, S. Frota, P. Prieto, and M. Vigário (2005). Intonational phrasing in Romance: The role of syntactic and prosodic structure. In S. Frota, M. Vigário, and M. J. Freitas (eds.), Prosodies, 59–97. Berlin: Mouton de Gruyter. D’Imperio, M., R. Espesser, H. Loevenbruck, C. Menezes, N. Nguyen, and P. Welby (2007b). Are tones aligned with articulatory events? Evidence from Italian and French. In J. Cole and J. I. Hualde (eds.), Laboratory Phonology 9, 577–608. Berlin: Mouton de Gruyter. D’Imperio, M., and J. S. German (2015). Phonetic detail and the role of exposure in dialect imitation. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
D’Imperio, M., and B. Gili Fivela (2003). How many levels of phrasing? Evidence from two varieties of Italian. In J. Local, R. Ogden, and R. Temple (eds.), Phonetic Interpretation: Papers in Laboratory Phonology VI, 130–144. Cambridge: Cambridge University Press. D’Imperio, M., B. Gili Fivela, and O. Niebuhr (2010). Alignment perception of high intonational plateaux in Italian and German. In Proceedings of Speech Prosody 5, Chicago. D’Imperio, M., and D. House (1997). Perception of questions and statements in Neapolitan Italian. In Eurospeech 1997, 251–254, Rhodes. D’Imperio, M., and A. Michelas (2014). Pitch scaling and the internal structuring of the intonation phrase in French. Phonology 31(1), 95–122. D’Imperio, M., and C. Petrone (2008). Is the Clitic group tonally marked in Italian questions and statements? Poster presented at the 11th Conference on Laboratory Phonology, Wellington. D’Imperio, M., C. Petrone, and C. Graux-Czachor (2015). The influence of metrical constraints on direct imitation across French varieties. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. D’Imperio, M., and R. Rosenthal (1999). Phonetics and phonology of main stress in Italian. Phonology 16, 1–28. D’Imperio, M., J. Terken, and M. Piterman (2000). Perceived tone ‘targets’ and pitch accent identification in Italian. In Proceedings of the 8th Australian International Conference on Speech Science and Technology, 201–211, Canberra. Ding, N., L. Melloni, H. Zhang, X. Tian, and D. Poeppel (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience 19(1), 158–164. Ding, N., A. D. Patel, L. Chen, H. Butler, C. Luo, and D. Poeppel (2017). Temporal modulations in speech and music. Neuroscience and Biobehavioral Reviews 81, 181–187. Ding, P. S. (2014). A Grammar of Prinmi: Based on the Central Dialect of Northwest Yunnan. Leiden: Brill. Dixon, R. M. W. (1972). The Dyirbal Language of North Queensland. Cambridge: Cambridge University Press. Dixon, R. M. W. (1977). A Grammar of Yidiny. Cambridge: Cambridge University Press. Dixon, R. M. W. (1991). Mbabaram. 
In R. M. W. Dixon and B. Blake (eds.), The Handbook of Australian Languages, vol. 4, 348–402. Oxford: Oxford University Press.

Dixon, R. M. W. (2002). Australian Languages: Their Nature and Development. Cambridge: Cambridge University Press. Dixon, R. M. W. (2004). The Jarawara Language of Southern Amazonia. Oxford: Oxford University Press. Dixon, R. M. W., and A. Aikhenvald (1999). The Amazonian Languages. Cambridge: Cambridge University Press. D’Mello, S. K., and A. Graesser (2010). Multimodal semi-automated affect detection from conversational cues, gross body language, and facial features. User Modeling and User-Adapted Interaction 20, 147–187. Đỗ, T. D., T. H. Trần, and G. Boulakia (1998). Intonation in Vietnamese. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 395–416. Cambridge: Cambridge University Press. Doak, I. (1990). Truncation, í-suffixation, and extended vowel length in Coeur d’Alene. In Papers for the 25th International Conference on Salish and Neighbouring Languages, 97–107. Vancouver, BC: University of British Columbia. Dobrieva, E. A., E. V. Golovko, S. A. Jacobson, and M. E. Krauss (2004). Naukan Yupik Eskimo Dictionary (ed. S. A. Jacobson). Fairbanks: Alaska Native Language Center, University of Alaska. Dobrovolsky, M. (1999). The phonetics of Chuvash stress: Implications for phonology. In Proceedings of the 14th International Congress of Phonetic Sciences, 539–542, San Francisco. D’Odorico, L. (1984). Non-segmental features in prelinguistic communications: An analysis of some types of infant cry and non-cry vocalizations. Journal of Child Language 11(1), 17–27. Dogil, G. (1999a). Baltic languages. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 877–896. Berlin: Mouton de Gruyter. Dogil, G. (1999b). The phonetic manifestation of word stress in Lithuanian, German, Polish and Spanish. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 273–311. Berlin: Mouton de Gruyter. Dogil, G., and A. Schweitzer (2011). 
Quantal effects in the temporal alignment of prosodic events. In Proceedings of the 17th International Congress of Phonetic Sciences, 595–598, Hong Kong. Dogil, G., and B. Williams (1999). The phonetic manifestation of word stress. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 273–334. Berlin: Walter de Gruyter. Dohen, M., H. Lœvenbruck, M.-A. Cathiard, and J.-L. Schwartz (2004). Visual perception of contrastive focus in reiterant French speech. Speech Communication 44, 155–172. Domahs, U., S. Genç, J. Knaus, R. Wiese, and B. Kabak (2013). Processing (un)predictable word stress: ERP evidence from Turkish. Language and Cognitive Processes 28(3), 335–354. Domahs, U., J. Knaus, P. Orzechowska, and R. Wiese (2012). Stress ‘deafness’ in a language with fixed word stress: An ERP study on Polish. Frontiers in Psychology 3, 439. Domaneschi, F., M. Romero, and B. Braun (2017). Bias in polar questions. Glossa: A Journal of General Linguistics 2(1), 23. Dombrowski, E., and O. Niebuhr (2005). Acoustic patterns and communicative functions of phrase-final rises in German: Activating and restricting contours. Phonetica 62(2), 176–195. Donaldson, B. C. (1993). A Grammar of Afrikaans. Berlin: De Gruyter. Donaldson, T. (1980). Ngiyambaa: The Language of the Wangaaybuwan. Cambridge: Cambridge University Press. Donohue, M. (1999). A Grammar of Tukang Besi. Berlin: Mouton de Gruyter. Donohue, M. (2003). The tonal system of Skou, New Guinea. In S. Kaji (ed.), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: Historical Development, Phonetics of Tone, and Descriptive Studies, 329–355. Tokyo: Research Institute for Language and Cultures of Asia and Africa, Tokyo University of Foreign Studies. Donohue, M. (2009). The phonological typology of Papuan languages. Course taught at the Linguistic Society of America Linguistic Institute, Berkeley. Dore, J. (1975). Holophrases, speech acts and language universals. 
Journal of Child Language 2(1), 21–40.

Dorman, M. F., P. C. Loizou, and D. Rainey (1997). Speech understanding as a function of the number of channels of stimulation for processors using sine-wave and noise-band outputs. Journal of the Acoustical Society of America 102, 2403–2411. Dorn, A. (2014). Sub-dialect variation in the intonation of Donegal Irish. PhD dissertation, Trinity College Dublin. Dorn, A., and A. Ní Chasaide (2011). Effects of focus on f0 and duration in Irish (Gaelic) declaratives. In INTERSPEECH 2011, 965–968, Florence. Dorn, A., and A. Ní Chasaide (2016). Donegal Irish rises: Similarities and differences to rises in English varieties. In Proceedings of Speech Prosody 8, 163–167, Boston. Dorn, A., M. O’Reilly, and A. Ní Chasaide (2011). Prosodic signalling of sentence mode in two varieties of Irish (Gaelic). In Proceedings of the 17th International Congress of Phonetic Sciences, 611–614, Hong Kong. Dorrington, N. (2010a). Speaking up: A comparative investigation into the onset of uptalk in General South African English. BA dissertation, Rhodes University. Dorrington, N. (2010b). ‘Speaking up’: An investigation into the uptalk phenomenon in South Africa. Paper presented at the LSSA/SAALA/SAALT Conference, Pretoria. Abstract retrieved 5 May 2020 from https://salals.org.za/abstracts2010-2. dos Santos, M. (2006). Uma gramática do Wapixána (Aruák). PhD dissertation, UNICAMP. dos Santos, R. S. (2003). Bootstrapping in the acquisition of word stress in Brazilian Portuguese. Journal of Portuguese Linguistics 2(1), 93–114. Doughty, C. J., and M. H. Long (eds.) (2005). The Handbook of Second Language Acquisition. Malden: Wiley Blackwell. Downing, L. J. (1990). Problems in Jita tonology. PhD dissertation, University of Illinois. (Revision published 1996, The Tonal Phonology of Jita. Munich: Lincom.) Downing, L. J. (2006). Canonical Forms in Prosodic Morphology. Oxford: Oxford University Press.
Downing, L. J. (2010). Accent in African languages. In H. van der Hulst, R. W. N. Goedemans, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 381–428. Berlin: Mouton de Gruyter. Downing, L. J. (2014). Melodic verb tone patterns in Jita. Africana Linguistica 20, 101–119. Downing, L. J. (2017). Tone and intonation in Chichewa and Tumbuka. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 365–392. Berlin: Mouton de Gruyter. Downing, L. J., A. Mtenje, and B. Pompino-Marschall (2004). Prosody and information structure in Chichewa. In S. Fuchs and S. Hamann (eds.), Papers in Phonetics and Phonology (ZASpil), 167–186, Berlin. Downing, L. J., and B. Pompino-Marschall (2013). The focus prosody of Chichewa and the stress-focus constraint: A response to Samek-Lodovici (2005). Natural Language and Linguistic Theory 31(3), 647–681. Downing, L. J., and A. Rialland (eds.) (2017a). Intonation in African Tone Languages. Berlin: Mouton de Gruyter. Downing, L. J., and A. Rialland (2017b). Introduction. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 1–16. Berlin: Mouton de Gruyter. Dresher, B. E. (1994). The prosodic basis of the Tiberian Hebrew system of accents. Language 70, 1–52. Dresher, B. E., and J. D. Kaye (1990). A computational learning model for metrical phonology. Cognition 34(2), 137–195. Dretske, F. I. (1972). Contrastive statements. Philosophical Review 81(4), 411–437. Drubig, H. B., and W. Schaffar (2001). Focus constructions. In M. Haspelmath, E. König, W. Oesterreicher, and W. Raible (eds.), Language Typology and Language Universals: An International Handbook, 1079–1104. Berlin: Mouton de Gruyter. Dryer, M. S. (2013). Polar questions. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 13 May 2020 from http://wals.info/chapter/116.

Dryer, M. S., and M. Haspelmath (eds.) (2013). The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 13 May 2020 from http://wals.info. Du Nay, A. (1996). The Origins of the Rumanians: The Early History of the Rumanian Language. Toronto: Matthias Corvinus. Duanmu, S. (1990). A formal study of syllable, tone, stress and domain in Chinese languages. PhD dissertation, MIT. Duanmu, S. (1993). Rime length, stress, and association domains. Journal of East Asian Linguistics 2, 1–44. Duanmu, S. (1994). Syllabic weight and syllabic duration: A correlation between phonology and phonetics. Phonology 11, 1–24. Duanmu, S. (1999). Metrical structure and tone: Evidence from Mandarin and Shanghai. Journal of East Asian Linguistics 8, 1–38. Duanmu, S. (2007). The Phonology of Standard Chinese (2nd ed.). Oxford: Oxford University Press. Duanmu, S. (2008). Syllable Structure: The Limits of Variation. Oxford: Oxford University Press. Duanmu, S. (2017). From non-uniqueness to the best solution in phonemic analysis: Evidence from Chengdu Chinese. Lingua Sinica 3(1), 1–23. Duběda, T. (2011). Towards an inventory of pitch accents for read Czech. Slovo a Slovesnost 71(1), 3–13. Duběda, T. (2014). Czech intonation: A tonal approach. Slovo a Slovesnost 75, 83–98. Duběda, T., and J. Raab (2008). Pitch accents, boundary tones and contours: Automatic learning of Czech intonation. In Text, Speech and Dialogue, 293–201. Berlin: Springer. Dubina, A. (2012). Toward a Tonal Analysis of Free Stress. Utrecht: LOT. DuBois, J. W. (1981). The Sacapultec language. PhD dissertation, University of California, Berkeley. Dueñas, R. C. (2000). Fonología y aproximación a la morfosintaxis del awa pit. In G. de Pérez, M. Stella, R. de Montes, and M. Luisa (eds.), Lenguas indígenas de Colombia: Una visión descriptiva, 97–116. Santafé de Bogotá: Instituto Caro y Cuervo. Duez, D. (1982). Silent and non-silent pauses in three speech styles. 
Language and Speech 25, 11–28. Duffy, J. R. (2005). Motor Speech Disorders. St Louis: Elsevier Mosby. Duncan, S. (1972). Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology 23(2), 283–292. Dunn, J. A. (1995). Smalgyax: A Reference Dictionary and Grammar for the Coast Tsimshian Language. Seattle: University of Washington Press. Dunn, M., and L. Harris (2016). Prosody Intervention for High-Functioning Adolescents and Adults with Autism Spectrum Disorder: Enhancing Communication and Social Engagement through Voice, Rhythm, and Pitch. London: Jessica Kingsley. Dunst, C., E. Gorman, and D. Hamby (2012). Preference for infant-directed speech in preverbal young children. Center for Early Literacy Learning 5(1), 1–13. Dupoux, E., and K. Green (1997). Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology: Human Perception and Performance 23, 914–927. Dupoux, E., C. Pallier, N. Sebastian, and J. Mehler (1997). A destressing ‘deafness’ in French? Journal of Memory and Language 36, 406–421. Dupoux, E., S. Peperkamp, and N. Sebastián-Gallés (2001). A robust method to study stress ‘deafness’. Journal of the Acoustical Society of America 110, 1606–1618. Dupoux, E., S. Peperkamp, and N. Sebastián-Gallés (2010). Limits on bilingualism revisited: Stress deafness in simultaneous French-Spanish bilinguals. Cognition 114, 266–275. Dupoux, E., N. Sebastián-Gallés, E. Navarrete, and S. Peperkamp (2008). Persistent stress deafness: The case of French learners of Spanish. Cognition 106, 682–706. Durand, J. (1986). French liaison, floating segments and other matters in a dependency framework. In J. Durand (ed.), Dependency and Non-Linear Phonology, 161–201. London: Croom Helm. Durand, J., and J.-M. Tarrier (2003). Enquête phonologique en Languedoc (Douzens, Aude). La tribune internationale des langues vivantes 33, 117–127. Durie, M. (1985). 
A Grammar of Acehnese on the Basis of a Dialect of North Aceh. Dordrecht: Foris.

Dutta, I., and H. H. Hock (2006). Interaction of verb accentuation and utterance finality in Bangla. In Proceedings of Speech Prosody 3, Dresden. Eady, S. J., W. Cooper, G. V. Klouda, P. R. Mueller, and D. W. Lotts (1986). Acoustical characteristics of sentential focus: Narrow vs. broad and single vs. dual focus environments. Language and Speech 29(3), 233–251. Eberhard, D. M. (2009). Mamaindê Grammar: A Northern Nambikwara Language and Its Cultural Context. Utrecht: LOT. Ebert, K. (1979). Sprache und Tradition der Kera (Tschad): Teil III—Grammatik. Berlin: Reimer. Echols, C. H. (1993). A perceptually-based model of children’s earliest productions. Cognition 46, 245–296. Echols, C. H., and E. L. Newport (1992). The role of stress and position in determining first words. Language Acquisition 2, 189–220. Eckart, K., A. Riester, and K. Schweitzer (2012). A discourse information radio news database for linguistic analysis. In C. Chiarcos, S. Nordhoff, and S. Hellman (eds.), Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata, 65–76. Berlin: Springer. Edmondson, J. A., J. L. Chan, G. Seibert, and E. Ross (1987). The effect of right-brain damage on acoustical measures of affective prosody in Taiwanese patients. Journal of Phonetics 15, 219–233. Edmondson, J. A., and J. H. Esling (2006). The valves of the throat and their functioning in tone, vocal register and stress: Laryngoscopic case studies. Phonology 23, 157–191. Edmondson, J. A., and K. J. Gregerson (1992). On five-level tone systems. In S. J. J. Hwang and W. R. Merrifield (eds.), Language in Context: Essays for Robert E. Longacre, 555–576. Dallas: Summer Institute of Linguistics. Edmondson, J. A., and K. J. Gregerson (eds.) (1993). Tonality in Austronesian Languages (Oceanic Linguistics Special Publication 24). Honolulu: University of Hawaiʻi Press. Edmondson, J. A., L. Ziwo, J. H. Esling, J. G. Harris, and L. Shaoni (2001). 
The aryepiglottic folds and voice quality in the Yi and Bai languages: Laryngoscopic case studies. Mon-Khmer Studies 31, 83–100. Edmonson, B. (1988). A descriptive grammar of Huastec (Potosino dialect). PhD dissertation, Tulane University. Edwards, J. E., M. E. Beckman, and J. Fletcher (1991). The articulatory kinematics of final lengthening. Journal of the Acoustical Society of America 89, 369–382. Edwards, O. (2016). Metathesis and unmetathesis: Parallelism and complementarity in Amarasi, Timor. PhD dissertation, Australian National University. Edzard, L. (2011). Biblical Hebrew. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 480–514. Berlin: Mouton de Gruyter. Eefting, W. Z. F. (1991). The effect of information value and accentuation on the duration of Dutch words, syllables and segments. Journal of the Acoustical Society of America 89, 412–414. Eek, A. (1980a). Estonian quantity: Notes on the perception of duration. Estonian Papers in Phonetics 1979, 5–30. Eek, A. (1980b). Further information on the perception of Estonian quantity. Estonian Papers in Phonetics 1979, 31–57. Eek, A. (1987). Word stress in Estonian and Russian. In R. Channon and L. Shockey (eds.), In Honor of I. Lehiste, 19–32. Dordrecht: Foris. Eggermont, J. J., and J. K. Moore (2012). Morphological and functional development of the auditory nervous system. In L. Werner, R. R. Fay, and A. N. Popper (eds.), Human Auditory Development, 61–105. New York: Springer. Egurtzegi, A., and G. Elordieta (2013). Euskal azentueren historiaz. In R. Gómez, J. Gorrochategui, J. Lakarra, and C. Mounole (eds.), 3rd Conference of the Luis Michelena Chair, 163–186. Bilbao: University of the Basque Country Press. Eguskiza, N., A. Etxebarria, and I. Gaminde (2017). Bai-ez galderak. In A. Etxebarria and N. Eguskiza (eds.), Bariazioa Esaldien Intonazioan, 93–127. Bilbao: University of the Basque Country Press.

Eigsti, I. M., L. Bennetto, and M. B. Dadlani (2007). Beyond pragmatics: Morphosyntactic development in autism. Journal of Autism and Developmental Disorders 37, 1007–1023. Eigsti, I. M., A. B. de Marchena, J. M. Schuh, and E. Kelley (2011). Language acquisition in autism spectrum disorders: A developmental review. Research in Autism Spectrum Disorders 5, 681–691. Eigsti, I. M., J. M. Schuh, E. Mencl, R. T. Schultz, and R. Paul (2012). The neural underpinnings of prosody in autism. Child Neuropsychology 18(6), 600–617. Eilers, R. E., D. K. Oller, and C. R. Benito-Garcia (1984). The acquisition of voicing contrasts in Spanish and English learning infants and children: A longitudinal study. Journal of Child Language 11(2), 313–336. Eimas, P. D., E. R. Siqueland, P. W. Jusczyk, and J. Vigorito (1971). Speech perception in infants. Science 171(3968), 303–306. Eka, D. (1985). A phonological study of Standard Nigerian English. PhD dissertation, Ahmadu Bello University. Ekdahl, M., and N. Butler (1979). Aprenda Terêna (vol. 1). Brasília: Summer Institute of Linguistics. Ekdahl, M., and J. Grimes (1964). Terena verbal inflection. International Journal of American Linguistics 30(3), 261–268. Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion 6(3–4), 169–200. Ekman, P. (1999). Basic emotions. In T. Dalgleish and M. Power (eds.), Handbook of Cognition and Emotion, 45–60. Chichester: John Wiley and Sons. Ekman, P. (2009). Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage (rev. ed.). New York: W. W. Norton. Ekman, P., and W. V. Friesen (1975). Unmasking the Face. Englewood Cliffs: Spectrum-Prentice Hall. Ekman, P., W. V. Friesen, and J. C. Hager (2002). Facial Action Coding System: Manual and Investigator’s Guide. Salt Lake City: Research Nexus. Ekman, P., M. Sullivan, W. V. Friesen, and K. Scherer (1991). Face, voice, and body in detecting deception. Journal of Nonverbal Behaviour 15(2), 125–135. El Zarka, D. (2017). 
Arabic intonation. In Oxford Handbooks Online. Oxford: Oxford University Press. Retrieved 19 May 2020 from https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199935345.001.0001/oxfordhb-9780199935345-e-77. El Zarka, D., and S. Hellmuth (2009). Variation in the intonation of Egyptian formal and colloquial Arabic. Langues et linguistique 22, 73–92. Elders, S. (2000). Grammaire Mundang. Leiden: Leiden University. Elenbaas, N. (1999). A unified account of binary and ternary stress: Considerations from Sentani and Finnish. PhD dissertation, Utrecht University. Elías-Ulloa, J. (2004). Quantity (in)sensitivity and underlying glottal-stop deletion in Capanahua. Coyote Papers 13, 1–16. Elías-Ulloa, J. (2009). The distribution of laryngeal segments in Capanahua. International Journal of American Linguistics 75, 159–206. Elías-Ulloa, J. (2010). An Acoustic Phonetics of Shipibo-Conibo (Pano), an Endangered Amazonian Language—A New Approach to Documenting Linguistic Data. Lewiston, NY: Edwin Mellen Press. Elías-Ulloa, J. (2016). The role of prominent prosodic positions in governing laryngealization in vowels: A case study of two Panoan languages. In H. Avelino, M. Coler, and L. Wetzels (eds.), The Phonetics and Phonology of Laryngeal Features in Native American Languages (Brill’s Studies in the Indigenous Languages of the Americas 12). Leiden: Brill. Elimelech, B. (1978). A Tonal Grammar of Etsako. Berkeley: University of California Press. Ellington, J. (1977). Aspects of the Tiene language. PhD dissertation, University of Wisconsin, Madison. Elordieta, G. (1997). Accent, tone, and intonation in Lekeitio Basque. In F. Martínez-Gil and A. Morales-Front (eds.), Issues in the Phonology and Morphology of the Major Iberian Languages, 3–78. Washington, DC: Georgetown University Press. Elordieta, G. (1998). Intonation in a pitch accent variety of Basque. ASJU: International Journal of Basque Linguistics and Philology 32, 511–569.

Elordieta, G. (2003). Intonation. In J. I. Hualde and J. Ortiz de Urbina (eds.), A Grammar of Basque, 72–113. Berlin: Mouton de Gruyter. Elordieta, G. (2007a). A constraint-based analysis of the intonational realization of focus in Northern Bizkaian Basque. In T. Riad and C. Gussenhoven (eds.), Tones and Tunes: Vol. 1. Typological Studies in Word and Sentence Prosody, 199–232. Berlin: Mouton de Gruyter. Elordieta, G. (2007b). Minimum size constraints on intermediate phrases. In Proceedings of the 16th International Congress of Phonetic Sciences, 1021–1024, Saarbrücken. Elordieta, G. (2011). Euskal azentuaren bilakaera: Hipotesiak eta proposamenak. In A. Sagarna, J. Lakarra, and P. Salaberri (eds.), Pirinioetako Hizkuntzak: Lehena eta Oraina, Iker 26, 989–1014. Bilbao: Euskaltzaindia. Elordieta, G. (2015). Recursive phonological phrasing in Basque. Phonology 32, 1–30. Elordieta, G., and N. Calleja (2005). Microvariation in accentual alignment in Basque Spanish. Language and Speech 48, 397–439. Elordieta, G., S. Frota, and M. Vigário (2005). Subjects, objects and intonational phrasing in Spanish and Portuguese. Studia Linguistica 59(2–3), 110–143. Elordieta, G., and J. I. Hualde (2014). Intonation in Basque. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 405–463. Oxford: Oxford University Press. Elugbe, B. O. (1977). Some implications of low tone raising in southwestern Edo. Studies in African Linguistics, Suppl. 7, 53–62. Elugbe, B. O. (1986). The analysis of falling tones in Ghotuo̞. In K. Bogers, H. van der Hulst, and M. Mous (eds.), The Phonological Representation of Suprasegmentals, 51–62. Dordrecht: Foris. Emenanjo, E. N. (1978). Elements of Modern Igbo Grammar: A Descriptive Approach. Ibadan: Oxford University Press. Encrevé, P. (1988). La liaison avec et sans enchaînement. Paris: Le Seuil. Enfield, N. J. (2003). Linguistic Epidemiology: Semantics and Grammar of Language Contact in Mainland Southeast Asia. 
London: Routledge/Curzon. Enfield, N. J. (2005). Areal linguistics and mainland Southeast Asia. Annual Review of Anthropology 34, 181–206. Enfield, N. J. (2011). Linguistic diversity in mainland Southeast Asia. In N. J. Enfield (ed.), Dynamics of Human Diversity: The Case of Mainland Southeast Asia, 63–80. Canberra: Pacific Linguistics. Engberg-Pedersen, E. (1990). Pragmatics of nonmanual behaviour in Danish Sign Language. In SLR ’87: Papers from the Fourth International Symposium on Sign Language Research, Lappeenranta, Finland July 15–19, 1987, 121–128. Copenhagen: University of Copenhagen. England, N. (1983). A Grammar of Mam, a Mayan Language. Austin: University of Texas Press. England, N. (1990). El Mam: Semejanzas y diferencias regionales. In N. England and S. Elliott (eds.), Lecturas Sobre la Lingüística Maya, 221–252. Antigua, Guatemala: Centro de Investigaciones Regionales de Mesoamérica. England, N. (1991). Changes in basic word order in Mayan languages. International Journal of American Linguistics 57(4), 446–486. England, N. (2001). Introducción a la gramática de los idiomas Mayas. Ciudad de Guatemala: Cholsamaj. England, N., and B. Baird (2017). Phonology and phonetics. In J. Aissen, N. England, and R. Z. Maldonado (eds.), The Mayan Languages, 175–200. New York: Routledge. Engstrand, O., and D. Krull (2001). Simplification of phonotactic structures in unscripted Swedish. Journal of the International Phonetic Association 31, 41–50. Engstrand, O., K. Williams, and F. Lacerda (2003). Does babbling sound native? Listener responses to vocalizations produced by Swedish and American 12- and 18-month-olds. Phonetica 60, 17–44. Engstrand, O., K. Williams, and S. Strömqvist (1991). Acquisition of the Swedish tonal word accent contrast. In Proceedings of the 12th International Congress of Phonetic Sciences, 324–327, Aix-enProvence. Enos, F., and J. Hirschberg (2006). A framework for eliciting emotional speech: Capitalizing on the actor’s process. 
Paper presented at the 5th International Conference on Language Resources and Evaluation Workshop on Corpora for Research on Emotion and Affect, Genoa.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Epps, P. (2017). Reconstructing prosodic phonological features in the Nadahup family: A tale of tonogenesis. Paper presented at the 8th Conference on Indigenous Languages of Latin America, Austin.
Eraso, N. (2015). Gramática Tanimuka, lengua de la Amazonía Colombiana. PhD dissertation, Lyon 2 University.
Erickson, P. G. (1995). Harm reduction: What it is and is not. Drug and Alcohol Review 14, 283–285.
Erikson, Y., and M. Alstermark (1972). Fundamental frequency correlates of the grave word accent in Swedish: The effect of vowel duration. Speech Transmission Laboratory: Quarterly Progress and Status Report, 2–3, 53–60.
Eriksson, A. (1991). Aspects of Swedish speech rhythm. PhD dissertation, University of Gothenburg.
Escandell-Vidal, V. (2017). Intonation and evidentiality in Spanish polar questions. Language and Speech 60(2), 224–241.
Esling, J. H., and J. G. Harris (2005). States of the glottis: An articulatory phonetic model based on laryngoscopic observations. In W. J. Hardcastle and J. M. Beck (eds.), A Figure of Speech: A Festschrift for J. Laver, 345–383. Mahwah: Erlbaum.
Esling, J. H., and S. R. Moisik (2012). Laryngeal aperture in relation to larynx height change: An analysis using simultaneous laryngoscopy and laryngeal ultrasound. In D. Gibbon, D. Hirst, and N. Campbell (eds.), Rhythm, Melody and Harmony in Speech: Studies in Honour of W. Jassem (Speech and Language Technology 14/15), 117–127. Poznań: Polskie Towarzystwo Fonetyczne.
Esposito, C. M. (2010). Variation in contrastive phonation in Santa Ana del Valle Zapotec. Journal of the International Phonetic Association 40, 181–198.
Esposito, C. M. (2012). An acoustic and electroglottographic study of White Hmong tone and phonation. Journal of Phonetics 40, 466–476.
Esposito, C. M., and S. U. D. Khan (2012). Contrastive breathiness across consonants and vowels: A comparative study of Gujarati and White Hmong. Journal of the International Phonetic Association 42, 123–143.
Estebas Vilaplana, E. (2003). The status of L in British English prenuclear accents. Atlantis 25(1), 39–50.
Estebas-Vilaplana, E. (2013). TL_ToBI: A new system for teaching and learning intonation. In Proceedings of the Phonetics Teaching and Learning Conference, 39–42, London.
Esteve-Gibert, N., J. Borràs-Comes, E. Asor, M. Swerts, and P. Prieto (2017). The timing of head movements: The role of prosodic heads and edges. Journal of the Acoustical Society of America 141(6), 4727–4739.
Esteve-Gibert, N., and P. Prieto (2013). Prosody signals the emergence of intentional communication in the first year of life: Evidence from Catalan-babbling infants. Journal of Child Language 40, 919–944.
Esteve-Gibert, N., and P. Prieto (2018). Early development of the prosody-meaning interface. In P. Prieto and N. Esteve-Gibert (eds.), The Development of Prosody in First Language Acquisition, 227–246. Amsterdam: John Benjamins.
Evans, J. P. (2008). African tone in the Sinosphere. Language and Linguistics 9, 463–490.
Evans, J. P. (2015). High is not just the opposite of low. Journal of Phonetics 51, 1–5.
Evans, J. P., W.-C. Yeh, and R. Kulkarni (2018). Acoustics of tone in Indian Punjabi. Transactions of the Philological Society 116(3), 509–528.
Evans, L. (1997). C_ToBI: Towards a system for the prosodic transcription of Welsh. MSc dissertation, University of Edinburgh.
Evans, N. (1995). A Grammar of Kayardild: With Historical-Comparative Notes on Tangkic (Mouton Grammar Library 15). Berlin: Walter de Gruyter.
Evans, N. (2003). Bininj Gun-wok: A Pan-dialectal Grammar of Mayali, Kunwinjku and Kune (Pacific Linguistics 541) (2 vols.). Canberra: Pacific Linguistics.
Evans, N., J. Bishop, I. Mushin, B. Birch, and J. Fletcher (1999). The sound of one quotation mark: Intonational cues to quotation in four north Australian languages. Paper presented at the Australian Linguistics Society Annual Conference, Crawley.
Evans, N., J. Fletcher, and B. Ross (2008). Big words, small phrases: Mismatches between pause units and the polysynthetic word in Dalabon. Linguistics 46, 89–129.

Everett, D. L. (1995). Sistemas prosódicos da família Arawá. In W. L. Wetzels (ed.), Estudos fonológicos das línguas indígenas Brasileiras, 297–339. Rio de Janeiro: Editora UFRJ.
Everett, D. L., and K. Everett (1984a). On the relevance of syllable onsets to stress placement. Linguistic Inquiry 15, 705–711.
Everett, D. L., and K. Everett (1984b). Syllable onsets and stress placement in Pirahã. In Proceedings of the 3rd West Coast Conference on Formal Linguistics, 105–116. Stanford: Stanford Linguistics Association.
Everett, K. (1998). The acoustic correlates of stress in Pirahã. Journal of Amazonian Languages 1(2), 104–162.
Eyben, F., F. Weninger, F. Gross, and B. Schuller (2013). Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the 21st ACM International Conference on Multimedia, 835–838, Barcelona.
Faarlund, J. T. (2012). A Grammar of Chiapas Zoque. Oxford: Oxford University Press.
Fabb, N., and M. Halle (2008). Meter in Poetry: A New Theory. Cambridge: Cambridge University Press.
Face, T. (2002). Intonational Marking of Contrastive Focus in Madrid Spanish. Munich: Lincom Europa.
Face, T. (2006). Narrow focus intonation in Castilian Spanish absolute interrogatives. Journal of Language and Linguistics 5(2), 295–311.
Face, T. (2008). The Intonation of Castilian Spanish Declaratives and Absolute Interrogatives. Munich: Lincom Europa.
Fandrianto, A., and M. Eskenazi (2012). Prosodic entrainment in an information-driven dialog system. In INTERSPEECH 2012, 342–354, Portland.
Fanselow, G. (2016). Syntactic and prosodic reflexes of information structure in Germanic. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 621–641. Oxford: Oxford University Press.
Fant, G., A. Kruckenberg, and L. Nord (1991). Durational correlates of stress in Swedish, French, and English. Journal of Phonetics 19, 351–365.
Farkas, D. F., and F. Roelofsen (2017). Division of labor in the interpretation of declaratives and interrogatives. Journal of Semantics 34(2), 1–53.
Farmer, S., and L. Michael (n.d.). Máíhɨ̃kì tone in comparative Tukanoan perspective. Ms. Retrieved 8 June 2020 from http://www.cabeceras.org/ldm_publications/IJAL_mai_tone_v5.pdf.
Farmer, T. A., M. H. Christiansen, and P. Monaghan (2006). Phonological typicality influences on-line sentence comprehension. Proceedings of the National Academy of Sciences of the United States of America 103(32), 12203–12208.
Fasold, R. W., and W. Wolfram (1970). Some linguistic features of Negro dialect. In R. Fasold and R. W. Shuy (eds.), Teaching Standard English in the Inner City, 41–86. Washington, DC: Center for Applied Linguistics.
Fava, E., R. Hull, K. Baumbauer, and H. Bortfeld (2014a). Hemodynamic responses to speech and music in preverbal infants. Child Neuropsychology 20, 430–438.
Fava, E., R. Hull, and H. Bortfeld (2014b). Dissociating cortical activity during processing of native and non-native audiovisual speech from early to late infancy. Brain Sciences 4, 471–487.
Fay, R. R. (ed.) (2012). Comparative Hearing: Mammals (vol. 4). New York: Springer Science and Business Media.
Fay, W., and A. Schuler (1980). Emerging Language in Autistic Children. Baltimore: University Park Press.
Fecht, G. (1960). Wortakzent und Silbenstruktur: Untersuchungen zur Geschichte der ägyptischen Sprache. Glückstadt: J. J. Augustin.
Fehn, A.-M. (2016). A grammar of Ts’ixa (Kalahari Khoe). PhD dissertation, University of Cologne.
Feldman, H. (1978). Some notes on Tongan phonology. Oceanic Linguistics 17, 133–139.
Feraru, S. M., D. Schuller, and B. Schuller (2015). Cross-language acoustic emotion recognition: An overview and some tendencies. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 125–131, Xi’an.

Ferguson, C. A. (1957). Word stress in Persian. Language 33(2), 123–135.
Ferguson, C. A. (1964). Baby talk in six languages. American Anthropologist 66, 103–114.
Ferguson, C. A. (1977). Baby talk as a simplified register. In C. E. Snow and C. Ferguson (eds.), Talking to Children, 219–236. Cambridge: Cambridge University Press.
Ferlus, M. (1979). Formation des registres et mutations consonantiques dans les langues Mon-Khmer. Mon-Khmer Studies Journal 8, 1–76.
Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody the message? Child Development 60(6), 1497–1510.
Fernald, A. (1992). Human vocalizations to infants as biologically relevant signals: An evolutionary perspective. In J. Barkow and L. Cosmides (eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture, 391–428. New York: Oxford University Press.
Fernald, A., and P. K. Kuhl (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development 10(3), 279–293.
Fernald, A., and G. W. McRoberts (1996). Prosodic bootstrapping: A critical analysis of the argument and the evidence. In J. L. Morgan and K. Demuth (eds.), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, 365–388. Hillsdale, NJ: Erlbaum.
Fernald, A., and C. Mazzie (1991). Prosody and focus in speech to infants and adults. Developmental Psychology 27(2), 209–221.
Fernald, A., and T. Simon (1984). Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology 20(1), 104–113.
Fernald, A., T. Taeschner, J. Dunn, M. Papousek, B. de Boysson-Bardies, and I. Fukui (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language 16, 477–501.
Fernandes, F. R. (2007). Ordem, focalização e preenchimento em Português: Sintaxe e prosódia. PhD dissertation, University of Campinas.
Fernandez, R., A. Rosenberg, A. Sorin, B. Ramabhadran, and R. Hoory (2017). Voice-transformation-based data augmentation for prosodic classification. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 5530–5534, New Orleans.
Ferragne, E., and F. Pellegrino (2004). A comparative account of the suprasegmental and rhythmic features of British English dialects. Paper presented at Modelisations pour l’Identification des Langues, Paris.
Ferreira, F. (1993). Creation of prosody during speech production. Psychological Review 100(2), 233–253.
Ferreira, F. (2007). Prosody and performance in language production. Language and Cognitive Processes 22(8), 1151–1177.
Ferrer, L., E. Shriberg, and A. Stolcke (2003). A prosody-based approach to end-of-utterance detection that does not require speech recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 608–611, Hong Kong.
Ferrier, L. J. (1985). Intonation in discourse: Talk between 12-month-olds and their mothers. In K. E. Nelson (ed.), Children’s Language (vol. 5), 35–60. Hillsdale, NJ: Erlbaum.
Féry, C. (1993). German Intonational Patterns (Linguistische Arbeiten 285). Tübingen: Niemeyer.
Féry, C. (1995). Alignment, Syllable and Metrical Structure in German. Habilitation thesis, University of Tübingen.
Féry, C. (1997). Uni und Studis: Die besten Wörter des Deutschen. Linguistische Berichte 172, 461–490.
Féry, C. (2001). Focus and phrasing in French. In C. Féry and W. Sternefeld (eds.), Audiatur vox sapientiae: A Festschrift for A. von Stechow (Studia Grammatica 52), 153–181. Berlin: Akademie.
Féry, C. (2007). The prosody of topicalization. In K. Schwabe and S. Winkler (eds.), On Information Structure, Meaning and Form, 69–86. Amsterdam: John Benjamins.
Féry, C. (2008). Information structural notions and the fallacy of invariant grammatical correlates. Acta Linguistica Hungarica 55(3–4), 361–380.
Féry, C. (2010). Indian languages as intonational phrase languages. In S. I. Hasnain and S. Chaudhary (eds.), Problematizing Language Studies: Cultural, Theoretical, and Applied Perspectives. Essays in Honour of Rama Kant Agnihotri, 288–312. Delhi: Aakar Books.

Féry, C. (2011). German sentence accents and embedded prosodic phrases. Lingua 121(13), 1906–1922.
Féry, C. (2013). Focus as prosodic alignment. Natural Language and Linguistic Theory 31(3), 683–734.
Féry, C. (2017). Intonation and Prosodic Structure. Cambridge: Cambridge University Press.
Féry, C., and S. Ishihara (2010). How focus and givenness shape prosody. In M. Zimmermann and C. Féry (eds.), Information Structure: Theoretical, Typological, and Experimental Perspectives, 36–65. Oxford: Oxford University Press.
Féry, C., and S. Ishihara (eds.) (2016). Oxford Handbook of Information Structure. Oxford: Oxford University Press.
Féry, C., G. Kentner, and P. Pandey (2016). The prosody of focus and givenness in Hindi and Indian English. Studies in Language 40(2), 302–339.
Féry, C., and M. Krifka (2008). Information structure: Notional distinctions, ways of expression. In P. van Sterkenburg (ed.), Unity and Diversity of Languages, 123–136. Amsterdam: John Benjamins.
Féry, C., and F. Kügler (2008). Pitch accent scaling on given, new and focused constituents in German. Journal of Phonetics 36(4), 680–703.
Féry, C., A. Paslawska, and G. Fanselow (1997). Nominal split constructions in Ukrainian. Journal of Slavic Linguistics 15, 3–48.
Féry, C., and V. Samek-Lodovici (2006). Focus projection and prosodic prominence in nested foci. Language 82(1), 131–150.
Féry, C., and H. Truckenbrodt (2005). Sisterhood and tonal scaling. Studia Linguistica 59(2–3), 223–243.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly 39(3), 399–423.
Field, T., and E. Ignatoff (1981). Videotaping effects on the behaviors of low income mothers and their infants during floor-play interactions. Journal of Applied Developmental Psychology 2(3), 227–235.
Fikkert, P. (1994). On the acquisition of prosodic structure. PhD dissertation, University of Leiden.
Firbas, J. (1980). Post-intonation-centre prosodic shade in the modern English clause. In S. Greenbaum, G. Leech, and J. Svartvik (eds.), Studies in English for Randolph Quirk, 125–133. London: Longman.
Firbas, J. (1992). Functional Sentence Perspective in Written and Spoken Communication. Cambridge: Cambridge University Press.
Fischer, S. (1975). Influences on Word-Order Change in American Sign Language. San Diego: Salk Institute.
Fischer, W. (1997). Classical Arabic. In R. Hetzron (ed.), The Semitic Languages, 187–219. London: Routledge.
Fischer-Jørgensen, E. (1989). A Phonetic Study of the Stød in Standard Danish. Åbo: University of Turku.
Fisher, C., and H. Tokura (1995). The given/new contract in speech to infants. Journal of Memory and Language 34, 287–310.
Fisher, J., E. Plante, R. Vance, L. Gerken, and T. J. Glattke (2007). Do children and adults with language impairment recognize prosodic cues? Journal of Speech, Language, and Hearing Research 50, 746–758.
Fisher, W. (1973). Towards the reconstruction of Proto-Yucatec. PhD dissertation, University of Chicago.
Fisher, W. (1976). On tonal features in the Yucatecan dialects. In M. McClaran (ed.), Mayan Linguistics, 29–43. Los Angeles: University of California Press.
Fitzgerald, C. M. (1997). O’odham rhythms. PhD dissertation, University of Arizona.
Fitzgerald, C. M. (1998). The meter of Tohono O’odham songs. International Journal of American Linguistics 64, 1–36.
Fitzgerald, C. M. (2006). Iambic verse in Somali. In B. E. Dresher and N. Friedberg (eds.), Formal Approaches to Poetry: Recent Developments in Metrics, 193–209. Berlin: Mouton de Gruyter.
Fitzgerald, C. M. (2012). Prosodic inconsistency in Tohono O’odham. International Journal of American Linguistics 78, 425–463.

Fitzgerald, H. (2003). How Different Are We? Spoken Discourse in Intercultural Communication. Clevedon: Multilingual Matters.
Fitzpatrick-Cole, J., and A. Lahiri (1997). Focus, intonation and phrasing in Bengali and English. In Intonation: Theory, Models and Applications. Proceedings of the European Speech Communication Association Workshop on Intonation, 119–122, Athens.
Flack, K. (2007). Templatic morphology and indexed markedness constraints. Linguistic Inquiry 38, 749–758.
Flanagan, J. L., and M. G. Saslow (1958). Pitch discrimination for synthetic vowels. Journal of the Acoustical Society of America 30, 435–442.
Flege, J. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research, 229–273. Timonium, MD: York Press.
Flemming, E. (2011). La grammaire de la coarticulation. In M. Embarki and C. Dodane (eds.), La coarticulation: Des indices à la représentation, 189–211. Paris: L’Harmattan. (English version retrieved 8 June 2020 from http://web.mit.edu/flemming/www/paper/grammar-of-coarticulation.pdf.)
Fletcher, H. (1940). Auditory patterns. Reviews of Modern Physics 12, 47–65.
Fletcher, J. (2005). Compound rises and ‘uptalk’ in Spoken English. In INTERSPEECH 2005, 1381–1384, Lisbon.
Fletcher, J. (2010). The prosody of speech: Timing and rhythm. In W. J. Hardcastle, J. Laver, and F. E. Gibbon (eds.), The Handbook of Phonetic Sciences (2nd ed.), 523–602. Oxford: Wiley Blackwell.
Fletcher, J. (2014). Intonation and prosody in Dalabon. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 252–272. Oxford: Oxford University Press.
Fletcher, J., and A. Butcher (2014). Sound patterns of Australian languages. In H. Koch and R. Nordlinger (eds.), The World of Linguistics: Vol. 3. The Languages and Linguistics of Australia, 91–138. Berlin: Mouton de Gruyter.
Fletcher, J., and N. Evans (2000). Intonational downtrends in Mayali. Australian Journal of Linguistics 20, 23–38.
Fletcher, J., and N. Evans (2002). An acoustic phonetic analysis of intonational prominence in two Australian languages. Journal of the International Phonetic Association 32, 123–140.
Fletcher, J., E. Grabe, and P. Warren (2005). Intonational variation in four dialects of English: The high rising tune. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 390–409. Oxford: Oxford University Press.
Fletcher, J., and J. Harrington (2001). High-rising terminals and fall-rise tunes in Australian English. Phonetica 58(4), 215–229.
Fletcher, J., and D. Loakes (2010). Interpreting rising intonation in Australian English. In Proceedings of Speech Prosody 5, Chicago.
Fletcher, J., and L. Stirling (2014). Prosody and discourse in the Australian English Map Task corpus. In J. Durand, U. Gut, and G. Kristoffersen (eds.), The Oxford Handbook of Corpus Phonology, 562–575. Oxford: Oxford University Press.
Fletcher, J., H. Stoakes, D. Loakes, and R. Singer (2015). Accentual prominence and consonant lengthening and strengthening in Mawng. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Fletcher, J., L. Stirling, I. Mushin, and R. Wales (2002). Intonational rises and dialog acts in the Australian English map task. Language and Speech 45(3), 229–253.
Fletcher, J., H. Stoakes, R. Singer, and D. Loakes (2016). Intonational correlates of subject and object realisation in Mawng (Australian). In Proceedings of Speech Prosody 8, 188–192, Boston.
Flipsen, P. (2008). Intelligibility of spontaneous conversational speech produced by children with cochlear implants: A review. International Journal of Pediatric Otorhinolaryngology 72, 559–564.
Flipsen, P. (2011). Examining speech sound acquisition for children with cochlear implants using the GFTA-2. The Volta Review 111(1), 25–37.
Floccia, C., T. Keren-Portnoy, R. DePaolis, H. Duffy, C. Delle Luche, S. Durrant, L. White, J. Goslin, and M. Vihman (2016). British English infants segment words only with exaggerated infant-directed speech stimuli. Cognition 148, 1–9.

Fónagy, I. (1958). A hangsúlyról (Nyelvtudományi Értekezések 18). Budapest: Akadémiai Kiadó.
Fónagy, I., and K. Magdics (1967). A magyar beszéd dallama. Budapest: Akadémiai Kiadó.
Foote, J., and K. McDonough (2017). Using shadowing with mobile technology to improve L2 pronunciation. Journal of Second Language Pronunciation 3(1), 34–56.
Ford, C. E., and S. A. Thompson (1996). Interactional units in conversation: Syntactic, intonational and pragmatic resources for the management of turns. In E. A. Schegloff, E. Ochs, and S. A. Thompson (eds.), Interaction and Grammar, 134–184. Cambridge: Cambridge University Press.
Foreman, J. O. (2006). The morphosyntax of subjects in Macuiltianguis Zapotec. PhD dissertation, University of California, Los Angeles.
Fornaciari, T., and M. Poesio (2013). Automatic deception detection in Italian court cases. Artificial Intelligence and Law 21(3), 303–340.
Fort, M. C. (2001). Das Saterfriesische. In H. H. Munske, N. Århammar, V. F. Faltings, J. Hoekstra, O. Vries, A. G. H. Walker, and O. Wilts (eds.), Handbook of Frisian Studies, 409–422. Berlin: Walter de Gruyter.
Fortescue, M. (1983). Intonation across Inuit dialects. Études Inuit Studies 7(2), 113–124.
Fortescue, M. (1984). West Greenlandic. London: Croom Helm.
Fortescue, M. (2004). West Greenlandic (Eskimo). In G. Booij, C. Lehmann, J. Mugdan, and S. Skopeteas in collaboration with W. Kesselheim (eds.), Morphology: An International Handbook on Inflection and Word-Formation, 1389–1399. Berlin: Walter de Gruyter.
Fortunato, T. M. (2015). Prosodic boundary effects on syntactic disambiguation in children with cochlear implants, and in normal hearing adults and children. PhD dissertation, City University of New York.
Fougeron, C., and P. A. Keating (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America 101, 3728–3740.
Fougeron, C. (1999). Prosodically conditioned articulatory variations: A review. UCLA Working Papers in Phonetics 97, 1–74.
Fougeron, C. (2001). Articulatory properties of initial segments in several prosodic constituents in French. Journal of Phonetics 29, 109–135.
Fougeron, C. (2007). Word boundaries and contrast neutralization in the case of enchaînement in French. In J. Cole and J. I. Hualde (eds.), Laboratory Phonology 9, 609–642. Berlin: Mouton de Gruyter.
Fougeron, C., J.-P. Goldman, and U. H. Frauenfelder (2001). Liaison and schwa deletion in French: An effect of lexical frequency and competition? In Eurospeech 2001, 639–642, Aalborg.
Fought, C. (2002). Ethnicity. In J. K. Chambers, P. Trudgill, and N. Schilling-Estes (eds.), The Handbook of Language Variation and Change, 444–472. Malden: Blackwell.
Fought, C. (2003). Chicano English in Context. New York: Palgrave Macmillan.
Fourakis, M., A. Botinis, and M. Katsaiti (1999). Acoustic characteristics of Greek vowels. Phonetica 56, 28–43.
Fourakis, M., and C. B. Monahan (1988). Effects of metrical foot structure on syllable timing. Language and Speech 31(3), 283–306.
Fournier, R., C. Gussenhoven, O. Jensen, and P. Hagoort (2010). Lateralization of tonal and intonational pitch processing: An MEG study. Brain Research 1328, 79–88.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms of speech: Cyclic production of vowels in monosyllabic stress feet. Journal of Experimental Psychology: General 112(3), 386–412.
Fowler, C. A., and J. M. Brown (1997). Intrinsic f0 differences in spoken and sung vowels and their perception by listeners. Perception and Psychophysics 59, 729–738.
Fox, J. (1978). Proto-Mayan accent, morpheme structure conditions, and velar innovation. PhD dissertation, University of Chicago.
Frajzyngier, Z. (2012). Typological outline of the Afroasiatic phylum. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 505–623. Cambridge: Cambridge University Press.
Frajzyngier, Z., and E. Shay (2012). Chadic. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 236–341. Cambridge: Cambridge University Press.

Francis, A. L., V. Ciocca, and B. K. C. Ng (2003). On the (non)categorical perception of lexical tones. Perception and Psychophysics 65(7), 1029–1044.
Francis, A. L., V. Ciocca, L. Ma, and K. Fenn (2008). Perceptual learning of Cantonese lexical tones by tone and non-tone language speakers. Journal of Phonetics 36, 268–294.
Francis, A. L., V. Ciocca, N. K. Y. Wong, W. H. Y. Leung, and P. C. Y. Chu (2006). Extrinsic context affects perceptual normalization of lexical tone. Journal of the Acoustical Society of America 119, 1712–1726.
Francuzik, K., M. Karpiński, J. Kleśta, and E. Szalkowska (2005). Nuclear melody in Polish semi-spontaneous and read speech: Evidence from the Polish Intonational Database PoInt. Studia Phonetica Posnaniensia 7, 97–128.
Frank, M. C., E. Bergelson, C. Bergmann, A. Cristia, C. Floccia, J. Gervain, J. K. Hamlin, E. E. Hannon, M. A. Kline, C. Levelt, C. Lew-Williams, T. Nazzi, R. Panneton, H. Rabagliati, M. Soderstrom, J. R. Sullivan, S. R. Waxman, and D. Yurovsky (2017). A collaborative approach to infant research: Promoting reproducibility, best practices, and theory-building. Infancy 22, 421–435.
Frank, M. G., and E. Svetieva (2015). Microexpressions and deception. In M. K. Mandal and A. Awasthi (eds.), Understanding Facial Expressions in Communication: Cross-Cultural and Multidisciplinary Perspectives, 227–242. Berlin: Springer.
Franks, S. (1987). Regular and irregular stress in Macedonian. International Journal of Slavic Linguistics and Poetics 35, 93–142.
Frantz, D. G. (1972). The origin of Cheyenne pitch accent. International Journal of American Linguistics 38, 223–225.
Franzen, V., and M. Horne (1997). Word stress in Romanian. Working Papers [Lund University Department of Linguistics] 46, 75–91.
Frascarelli, M., and A. Puglielli (2009). Information structure in Somali: Evidence from the syntax-phonology interface. Brill’s Journal of Afroasiatic Languages and Linguistics 1(1), 146–175.
Frazier, L., K. Carlson, and C. Clifton, Jr (2006). Prosodic phrasing is central to language comprehension. Trends in Cognitive Sciences 10(6), 244–249.
Frazier, M. (2009a). The production and perception of pitch and glottalization in Yucatec Maya. PhD dissertation, University of North Carolina at Chapel Hill.
Frazier, M. (2009b). Tonal dialects and consonant-pitch interaction in Yucatec Maya. In H. Avelino, J. Coon, and E. Norcliffe (eds.), New Perspectives in Mayan Linguistics, vol. 59, 59–82. Cambridge, MA: MIT.
Frazier, M. (2013). The phonetics of Yucatec Maya and the typology of laryngeal complexity. Sprachtypologie und Universalienforschung 66(1), 7–21.
French, K. M. (1988). Insights into Tagalog: Reduplication, Infixation, and Stress from Nonlinear Phonology. Dallas: University of Texas at Arlington.
Fretheim, T., and R. A. Nilsen (1989). Terminal rise and rise-fall tunes in East Norwegian intonation. Nordic Journal of Linguistics 12, 155–182.
Friedberg, H., D. Litman, and S. B. F. Paletz (2012). Lexical entrainment and success in student engineering groups. In IEEE Spoken Language Technology Workshop, 404–409, Miami.
Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences 6(2), 78–84.
Friederici, A. D., and K. Alter (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language 89(2), 267–276.
Friedman, L. A. (1976). The manifestation of subject, object, and topic in the American Sign Language. In C. N. Li (ed.), Subject and Topic, 125–148. New York: Academic Press.
Friedrich, P. (1975). A Phonology of Tarascan (University of Chicago Studies in Anthropology Series in Social, Cultural, and Linguistic Anthropology 4). Chicago: University of Chicago Press.
Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language 47(1), 27–52.
Frota, S. (2000). Prosody and Focus in European Portuguese: Phonological Phrasing and Intonation. New York: Garland.
Frota, S. (2002). Tonal association and target alignment in European Portuguese nuclear falls. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 387–418. Berlin: Mouton de Gruyter.

References 747 Frota, S. (2014). The intonational phonology of European Portuguese. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 6–42. Oxford: Oxford University Press. Frota, S., and J. Butler (2018). Early development of intonation: Perception and production. In P. Prieto and N. Esteve-Gibert (eds.), The Development of Prosody in First Language Acquisition, 145–164. Amsterdam: John Benjamins. Frota, S., J. Butler, and M. Vigário (2014). Infants’ perception of intonation: Is it a statement or a question? Infancy 19(2), 194–213. Frota, S., M. Cruz, N. Matos, and M. Vigário (2016). Early prosodic development: Emerging inton ation and phrasing in European Portuguese. In M. E. Armstrong, N. C. Henriksen, and M. M. Vanrell (eds.), Intonational Grammar in Ibero-Romance: Approaches across Linguistic Subfields, 295–324. Philadelphia: John Benjamins. Frota, S., M. Cruz, F. Fernandes-Svartman, G. Collischonn, A. Fonseca, C. Serra, P. Oliveira, and M. Vigário (2015). Intonational variation in Portuguese: European and Brazilian varieties. In S. Frota and P. Prieto (eds.), Intonation in Romance, 235–283. Oxford: Oxford University Press. Frota, S., and J. Moraes (2016). Intonation of European and Brazilian Portuguese. In W. L. Wetzels, J. Costa, and S. Menuzzi (eds.), The Handbook of Portuguese Linguistics, 141–166. Chichester: Wiley Blackwell. Frota, S., and P. Prieto (eds.) (2015a). Intonation in Romance. Oxford: Oxford University Press. Frota, S., and P. Prieto (2015b). Intonation in Romance: Systemic similarities and differences. In S. Frota and P. Prieto (eds.), Intonation in Romance, 392–418. Oxford: Oxford University Press. Frota, S., and Vigário, M. (2000). Aspectos de prosódia comparada: Ritmo e entoação no PE e no PB. In V. R. Castro and P. A. Barbosa (eds.), Actas do XV Encontro Nacional da Associação Portuguesa de Linguística, vol. 1, 533–555. Coimbra: APL. Frota, S., M. Vigário, and F. Martins (2006). 
FreP: An electronic tool for extracting frequency information of phonological units from Portuguese written text. In Proceedings of the 5th International Conference on Language Resources and Evaluation, 2224–2229, Genoa. Frota, S., M. Vigário, F. Martins, and M. Cruz (2010). FrePOP: Version 1.0. Phonetics Laboratory, Faculty of Letters, University of Lisbon. Retrieved 19 May 2020 from http://frepop.letras.ulisboa.pt. Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America 27, 765–768. Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech 1, 126–152. Fry, D. B. (1965). The dependence of stress judgments on vowel formant structure. In Proceedings of the 5th International Congress of Phonetic Sciences, 306–311. Basel: Karger. Fuchs, M. (2018). Antepenultimate stress in Spanish: In defense of syllable weight and grammatically-informed analogy. Glossa: A Journal of General Linguistics 3(1), 80. Fuchs, R., and O. Maxwell (2015). The placement and acoustic realisation of primary and secondary stress in Indian English. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Fuchs, S., C. Petrone, J. Krivokapić, and P. Hoole (2013). Acoustic and respiratory evidence for utterance planning in German. Journal of Phonetics 41, 29–47. Fuchs, S., C. Petrone, A. Rochet-Capellan, U. D. Reichel, and L. L. Koenig (2015). Assessing respiratory contributions to f0 declination in German across varying speech tasks and respiratory demands. Journal of Phonetics 52, 35–45. Fuhrhop, N., and J. Peters (2013). Einführung in die Phonologie und Graphematik. Stuttgart: Metzler. Fuhrman, O., and L. Boroditsky (2010). Cross-cultural differences in mental representations of time: Evidence from an implicit nonlinguistic task. Cognitive Science 34(8), 1430–1451. Fujisaki, H. (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In P. 
MacNeilage (ed.), The Production of Speech, 39–55. New York: Springer. Fujisaki, H. (2004). Information, prosody and modeling: With emphasis on tonal features of speech. In Proceedings of Speech Prosody 2, 1–10, Nara. Fujisaki, H., and K. Hirose (1984). Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan 5(4), 233–241. Fukui, R. (2000). Kankokugo shohōgen no akusento taikei ni tsuite. In R. Fukui (ed.), Kankokugo Akusento Ronshū (ICHEL Linguistic Studies 3), 1–20. Tokyo: University of Tokyo.

Fukui, R. (2013). Kankokugo On’inshi no Tankyū. Tokyo: Sanseido. Fulop, S. A., E. Kari, and P. Ladefoged (1998). An acoustic study of the tongue root contrast in Degema vowels. Phonetica 55, 80–98. Furby, C. (1974). Garawa phonology. Pacific Linguistics A.37, 1–11. Fusaroli, R., A. Lambrechts, D. Bang, D. M. Bowler, and S. B. Gaigg (2017). Is voice a marker for autism spectrum disorder? A systematic review and meta-analysis. Autism Research 10(3), 384–407. Fusaroli, R., and K. Tylén (2016). Investigating conversational dynamics: Interactive alignment, interpersonal synergy, and collective task performance. Cognitive Science 40(1), 145–171. Gaitenby, J. H. (1965). The elastic word. Haskins Report SR-2, 3.1–3.12. Galante, A., and R. Thomson (2016). The effectiveness of drama as an instructional approach for the development of second language oral fluency, comprehensibility, and accentedness. TESOL Quarterly 51(1), 115–142. Galea, L. (2016). Syllable structure and gemination in Maltese. PhD dissertation, University of Cologne. Galea Cavallazzi, K. (2004). The phonology-syntax interface in spoken Maltese English. MA thesis, University of Malta. Galloway, B. D. (1993). A Grammar of Upriver Halkomelem. Berkeley: University of California Press. Gambi, C., T. Jachmann, and M. Staudte (2015). Listen, look, go! The role of prosody and gaze in turn-end anticipation. In Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena. Gandour, J. T. (1983). Tone perception in Far Eastern languages. Journal of Phonetics 11(2), 149–175. Gandour, J. T., and R. Dardarananda (1983). Identification of tonal contrasts in Thai aphasic patients. Brain and Language 18(1), 98–114. Gandour, J. T., M. Dzemidzic, D. Wong, M. Lowe, Y. Tong, L. Hsieh, N. Satthamnusong, and J. Lurito (2003a). Temporal integration of speech prosody is shaped by language experience: An fMRI study. Brain and Language 84, 318–336. Gandour, J. T., and R. A. Harshman (1978). 
Crosslanguage differences in tone perception: A multidimensional scaling investigation. Language and Speech 21, 1–33. Gandour, J. T., S. Potisuk, and S. Dechongkit (1992a). Anticipatory tonal coarticulation in Thai noun compounds. Linguistics of the Tibeto-Burman Area 15, 111–124. Gandour, J. T., S. Potisuk, and S. Dechongkit (1992b). Tonal coarticulation in Thai disyllabic utterances: A preliminary study. Linguistics of the Tibeto-Burman Area 15, 93–110. Gandour, J. T., S. Potisuk, and S. Dechongkit (1994). Tonal coarticulation in Thai. Journal of Phonetics 22, 474–492. Gandour, J. T., Y. Tong, D. Wong, T. Talavage, M. Dzemidzic, Y. Xu, X. Li, and M. Lowe (2004). Hemispheric roles in the perception of speech prosody. NeuroImage 23(1), 344–357. Gandour, J. T., Y. Xu, D. Wong, M. Dzemidzic, M. Lowe, X. Li, and Y. Tong (2003b). Neural correlates of segmental and tonal information in speech perception. Human Brain Mapping 20(4), 185–200. Gao, M. (2008). Mandarin tones: An articulatory phonology account. PhD dissertation, Yale University. Gao, M. (2009). Gestural coordination among vowel, consonant and tone gestures in Mandarin Chinese. Chinese Journal of Phonetics 2, 43–50. Garcia, G. (2017). Weight gradience and stress in Portuguese. Phonology 34(1), 41–79. Garcia Matzar, P. O., V. T. Cotzajay, and D. C. Tuiz (1999). Gramática del idioma Kaqchikel. Antigua: Proyecto Lingüístico Francisco Marroquín. Gårding, E. (1987). Speech act and tonal pattern in Standard Chinese: Constancy and variation. Phonetica 44, 13–29. Gårding, E., J. Zhang, and J.-O. Svantesson (1983). A generative model for tone and intonation in Standard Chinese based on data from one speaker. Lund Working Papers 25, 53–65. Garellek, M. (2014). Voice quality strengthening and glottalization. Journal of Phonetics 45, 106–113. Garellek, M. (2015). Perception of glottalization and phrase-final creak. Journal of the Acoustical Society of America 137(2), 822–831. Garellek, M., A. Aguilar, G. Caballero, and L. 
Carroll (2015). Lexical and post-lexical tone in Choguita Rarámuri. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.

Garellek, M., and P. A. Keating (2011). The acoustic consequences of phonation and tone interactions in Jalapa Mazatec. Journal of the International Phonetic Association 41, 185–205. Garellek, M., P. A. Keating, and C. M. Esposito (2014). Relative importance of phonation cues in White Hmong tone perception. In Proceedings of the 38th Annual Meeting of the Berkeley Linguistics Society, 179–189, Berkeley. Garellek, M., P. A. Keating, C. M. Esposito, and J. Kreiman (2013). Voice quality and tone identification in White Hmong. Journal of the Acoustical Society of America 133, 1078–1089. Garnica, O. K. (1977). Some prosodic and paralinguistic features of speech to young children. In C. E. Snow and C. A. Ferguson (eds.), Talking to Children: Language Input and Acquisition, 63–88. Cambridge: Cambridge University Press. Garrett, E. (1999). Minimal words aren’t minimal feet. In M. Gordon (ed.), UCLA Working Papers in Linguistics: Papers in Phonology 2, 68–105. Los Angeles: UCLA Department of Linguistics. Garrett, M. F. (1975). The analysis of sentence production. In G. Bower (ed.), The Psychology of Learning and Motivation, vol. 9, 133–175. New York: Academic Press. Garrett, M. F. (1980). The limits of accommodation: Arguments for independent processing levels in sentence production. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen and Hand, 263–272. New York: Academic Press. Garvin, P. (1948). Kutenai 1: Phonemics. International Journal of American Linguistics 14, 37–43. Gass, S. M., and A. Mackey (eds.) (2012). The Routledge Handbook of Second Language Acquisition. London: Routledge. Gathercole, G. (1983). Tonogenesis and the Kickapoo tonal system. International Journal of American Linguistics 49, 72–76. Gebauer, L., J. Skewes, L. Hørlyck, and P. Vuust (2014). Atypical perception of affective prosody in autism spectrum disorder. NeuroImage: Clinical 6, 370–378. Gee, J. P., and F. Grosjean (1983). 
Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology 15, 411–458. Gelfer, C. E., K. S. Harris, and T. Baer (1987). Controlled variables in sentence intonation. In T. Baer, C. Sasaki, and K. S. Harris (eds.), Laryngeal Function in Phonation and Respiration, 422–435. Boston: Little, Brown. Gendron, M., D. Roberson, J. M. van der Vyver, and L. Feldman Barrett (2014). Perceptions of emotion from facial expressions are not culturally universal: Evidence from a remote culture. Emotion 14(2), 251–262. Gennrich-de Lisle, D. (1985). Theme in conversational discourse: Problems experienced by speakers of Black South African English, with particular reference to the role of prosody in conversational synchrony. MA thesis, Rhodes University. Genzel, S., S. Ishihara, and B. Surányi (2015). The prosodic expression of focus, contrast and givenness: A production study of Hungarian. Lingua 165(B), 183–204. Genzel, S., and F. Kügler (2010). How to elicit semi-spontaneous focus realizations with specific tonal patterns. In M. Grubic, S. Genzel, and F. Kügler (eds.), Linguistic Fieldnotes I: Information Structure in Different African Languages (Interdisciplinary Studies on Information Structure 13), 77–102. Potsdam: Universitätsverlag Potsdam. Genzel, S., and F. Kügler (2020). Production and perception of question prosody in Akan. Journal of the International Phonetic Association 50(1), 61–92. Georg, S. (2007). A Descriptive Grammar of Ket (Yenisei-Ostyak)—Part I: Introduction, Phonology, Morphology. Folkestone, UK: Global Oriental. Georgeton, L., T. Kocjančič Antolík, and C. Fougeron (2016). Effect of domain initial strengthening on vowel height and backness contrasts in French: Acoustic and ultrasound data. Journal of Speech, Language, and Hearing Research 59, 1575–1586. Gerasimovič, L. K. (1970). K voprosu o haraktere udarenija v mongol’skom jazyke. Vestnik Leningradskogo Universiteta 14, 131–137. Gerfen, C., and K. Baker (2005). 
The production and perception of laryngealized vowels in Coatzospan Mixtec. Journal of Phonetics 33, 311–334.

Gerhardt, K. J., R. Otto, R. M. Abrams, J. J. Colle, D. J. Burchfield, and A. J. Peters (1992). Cochlear microphonics recorded from fetal and newborn sheep. American Journal of Otolaryngology 13(4), 226–233. Gerken, L. (1994). A metrical template account of children’s weak syllable omissions from multisyllabic words. Journal of Child Language 21, 565–584. Gerken, L. (1996). Prosodic structure in young children’s language production. Language 72(4), 683–712. Gerken, L., P. W. Jusczyk, and D. R. Mandel (1994). When prosody fails to cue syntactic structure: 9-month-olds’ sensitivity to phonological versus syntactic phrases. Cognition 51, 237–265. Gerlach, L. (2016). Nǃaqriaxe: The Phonology of an Endangered Language of Botswana (Asien- und Afrika-Studien der Humboldt-Universität zu Berlin 47). Wiesbaden: Harrassowitz. German, J. S., J. B. Pierrehumbert, and S. Kaufmann (2006). Evidence for phonological constraints on nuclear accent placement. Language 82(1), 151–168. German, J. S., and M. D’Imperio (2016). The status of the initial rise as a marker of focus in French. Language and Speech 59(2), 165–195. Gerratt, B. R., and J. Kreiman (2001). Toward a taxonomy of nonmodal phonation. Journal of Phonetics 29, 365–381. Gervain, J., and J. Mehler (2010). Speech perception and language acquisition in the first year of life. Annual Review of Psychology 61(1), 191–218. Gervain, J., and J. F. Werker (2013). Prosody cues word order in 7-month-old bilingual infants. Nature Communications 4(1), 1490. Geschwind, N. (1971). Current concepts: Aphasia. New England Journal of Medicine 284, 654–656. Geschwind, N. (1974). Carl Wernicke, the Breslau School and the history of aphasia. In Selected Papers on Language and the Brain, 42–61. Dordrecht: Springer. Gessner, S. (2005). Properties of tone in Dene Su̜ɬiné. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 229–247. Amsterdam: John Benjamins. Giakoumelou, M., and D. Papazachariou (2013). 
Asking questions in Corfu: An intonational analysis. In M. Janse, B. D. Joseph, A. Ralli, and M. Bagriacik (eds.), MGDLT 5: Proceedings of the 5th International Conference on Modern Greek Dialects and Linguistic Theory, 89–100. Patras: University of Patras. Gibbon, D. (2006). Time types and time trees: Prosodic mining and alignment of temporally annotated data. In S. Sudhoff, D. Lenertova, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, and J. Schließer (eds.), Methods in Empirical Prosody Research, 281–209. Berlin: Walter de Gruyter. Gibbon, F. E., and K. Nicolaidis (1999). Palatography. In W. J. Hardcastle and N. Hewlett (eds.), Coarticulation: Theory, Data and Techniques, 229–245. Cambridge: Cambridge University Press. Gibson, E., L. Bergen, and S. T. Piantadosi (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences of the United States of America 110(20), 8051–8056. Gick, B. (2002). The use of ultrasound for linguistic phonetic fieldwork. Journal of the International Phonetic Association 32, 113–121. Gick, B., H. Bliss, K. Michelson, and B. Radanov (2012). Articulation with acoustics: ‘Soundless’ vowels in Oneida and Blackfoot. Journal of Phonetics 40, 46–53. Giegerich, H. J. (1985). Metrical Phonology and Phonological Structure: German and English. Cambridge: Cambridge University Press. Gil, S., M. Aguert, L. Le Bigot, A. Lacroix, and V. Laval (2014). Children’s understanding of others’ emotional states: Inferences from extralinguistic or paralinguistic cues? International Journal of Behavioral Development 38(6), 539–549. Gilbert, J. (1984/2012). Clear Speech Teacher’s Resource and Assessment Book: Pronunciation and Listening Comprehension in North American English. Cambridge: Cambridge University Press. Gilbert, J. (1994). Intonation: A navigation guide for the listener. In J. 
Morley (ed.), Pronunciation Pedagogy and Theory: New Views, New Directions, 38–48. Alexandria, VA: TESOL. Gilbert, R., and J. Yoneoka (2000). From 5–7–7 to 8–8–8: An investigation of Japanese Haiku metrics and implications for English Haiku. Language Issues: Journal of the Foreign Language Education Center 1, 1–35.

Giles, H., N. Coupland, and J. Coupland (1991). Accommodation theory: Communication, context, and consequence. In H. Giles, J. Coupland, and N. Coupland (eds.), Contexts of Accommodation: Developments in Applied Sociolinguistics, 1–68. Cambridge: Cambridge University Press. Gili Fivela, B. (2004). The phonetics and phonology of intonation: The case of Pisa Italian. PhD dissertation, Scuola Normale Superiore. Gili Fivela, B. (2008). Intonation in Production and Perception: The Case of Pisa Italian. Alessandria: Edizioni dell’Orso. Gili Fivela, B., C. Avesani, M. Barone, G. Bocci, C. Crocco, M. D’Imperio, R. Giordano, G. Marotta, M. Savino, and P. Sorianello (2015a). Intonational phonology of the regional varieties of Italian. In S. Frota and P. Prieto (eds.), Intonation in Romance, 140–197. Oxford: Oxford University Press. Gili Fivela, B., S. D’Apolito, A. Stella, and F. Sigona (2008). Domain initial strengthening in sentences and paragraphs: Preliminary findings on the production of voiced bilabial plosives in two varieties of Italian. In Proceedings of the 8th International Seminar on Speech Production, 205–208, Strasbourg. Gili Fivela, B., G. Interlandi, and A. Romano (2015b). On the importance of fine alignment and scaling differences in perception: The case of Turin Italian. In A. Romano, M. Rivoira, and I. Meandri (eds.), Aspetti prosodici e testuali del raccontare: Dalla letteratura orale al parlato dei media, Atti del 10° convegno AISV, 22–24 gennaio 2014, Università di Torino, 229–254. Torino: Edizioni dell’Orso. Gili Fivela, B., and F. Nicora (2018). Intonation in Liguria and Tuscany: Checking for similarities across a traditional isogloss boundary. In G. V. Vietti, L. Spreafico, and D. Mereu (eds.), Studi AISV 4, 131–156. Gili Fivela, B., and M. Savino (2003). Segments, syllables and tonal alignment: A study on two varieties of Italian. In Proceedings of the 15th International Congress of Phonetic Sciences, 2933–2936, Barcelona. 
Gili Gaya, S. (1940). La cantidad silábica en la frase. Castilla 1, 287–298. Gim, C.-G. (1994). Tongshi seongjoron. In S.-J. Chang (ed.), Modern Linguistics: The Present and the Future, 97–131. Seoul: Hanshin. Giordano, R. (2006). The intonation of polar questions in two central varieties of Italian. In Proceedings of Speech Prosody 3, Dresden. Giordano, R. (2008). On the phonetics of rhythm of Italian: Duration patterns in pre-planned and spontaneous speech. In Proceedings of Speech Prosody 4, 347–350, Campinas. Giraud, A.-L., and D. Poeppel (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience 15, 511–517. Girdenis, A. (1997). Phonology. In V. Ambrazas (ed.), Lithuanian Grammar, 13–58. Vilnius: Baltos Lankos. Girdenis, A. (2014). Theoretical Foundations of Lithuanian Phonology (trans. S. R. Young). Vilnius: Eugrimas. Girón, J. M., and W. L. Wetzels (2007). Tone in Wansöhöt (Puinave) Colombia. In W. L. Wetzels (ed.), Language Endangerment and Endangered Languages, 129–156. Leiden: CNWS. Gleitman, L. R. (1990). The structural sources of verb meanings. Language Acquisition 1, 3–55. Gleitman, L. R., and E. Wanner (1982). Language Acquisition: The State of the State of the Art. Cambridge: Cambridge University Press. Goddard, C. (1986). Yankunytjatjara Grammar. Alice Springs: Institute for Aboriginal Development. Goddard, I. (1974). An outline of the historical phonology of Arapaho and Atsina. International Journal of American Linguistics 40, 102–116. Goddard, I. (1979). Delaware Verbal Morphology: A Descriptive and Comparative Study. New York: Garland. Goddard, I. (1982). The historical phonology of Munsee. International Journal of American Linguistics 48, 16–48. Goddard, I. (1996). The classification of the native languages of North America. In Handbook of North American Indians, vol. 17, 290–323. Washington DC: Smithsonian Institution.

Goddard, I. (2013). Algonquian linguistic change and reconstruction. In P. Baldi (ed.), Patterns of Change—Change of Patterns: Linguistic Change and Reconstruction Methodology, 98–114. Berlin: De Gruyter. Godjevac, S. (2000). Intonation, word order, and focus projection in Serbo-Croatian. PhD dissertation, The Ohio State University. Godjevac, S. (2005). Transcribing Serbo-Croatian intonation. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 146–171. Oxford: Oxford University Press. Goedemans, R. W. N. (1998). Weightless Segments: A Phonetic and Phonological Study Concerning the Metrical Irrelevance of Syllable Onsets. The Hague: Holland Academic Graphics. Goedemans, R. W. N. (2010). A typology of stress patterns. In H. van der Hulst, R. Goedemans, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 647–666. New York: Mouton de Gruyter. Goedemans, R. W. N., J. Heinz, and H. van der Hulst (2015). StressTyp2. Retrieved 15 May 2020 from http://st2.ullet.net. Goedemans, R. W. N., J. Heinz, and H. van der Hulst (eds.) (2019). The Study of Word Stress and Accent: Theories, Methods and Data. Cambridge: Cambridge University Press. Goedemans, R. W. N., and H. van der Hulst (2012). The separation of accent and rhythm: Evidence from StressTyp. In H. van der Hulst (ed.), Word Stress: Theoretical and Typological Issues, 119–148. Cambridge: Cambridge University Press. Goedemans, R. W. N., and H. van der Hulst (2013a). Fixed stress locations. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 15 May 2020 from http://wals.info/chapter/14. Goedemans, R. W. N., and H. van der Hulst (2013b). Rhythm types. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 15 May 2020 from http://wals.info/chapter/17. 
Goedemans, R. W. N., and H. van der Hulst (2013c). Weight-sensitive stress. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 15 May 2020 from http://wals.info/chapter/15. Goedemans, R. W. N., and E. van Zanten (2007). Stress and accent in Indonesian. In V. J. van Heuven and E. van Zanten (eds.), Prosody in Indonesian Languages, 35–62. Utrecht: LOT. Goedemans, R. W. N., and E. van Zanten (2014). No stress typology. In J. Caspers, Y. Chen, W. F. L. Heeren, J. Pacilly, N. O. Schiller, and E. van Zanten (eds.), Above and Beyond the Segments: Experimental Linguistics and Phonetics, 83–95. Amsterdam: John Benjamins. Gökgöz, K., K. Bogomolets, L. Tieu, J. L. Palmer, and D. Lillo-Martin (2016). Contrastive focus in children acquiring English and ASL: Cues of prominence. In L. Perkins, R. Dudley, J. Gerard, and K. Hitczenko (eds.), Proceedings of GALANA (2015), 13–23. Somerville: Cascadilla. Göksel, A. (2010). Focus in words with truth values. Iberia 2(1), 89–112. Göksel, A., M. Kelepir, and A. Üntak-Tarhan (2009). Decomposition of question intonation: The structure of response seeking utterances. In J. Grijzenhout and B. Kabak (eds.), Phonological Domains: Universals and Deviations, 249–282. The Hague: Mouton de Gruyter. Göksel, A., and C. Kerslake (2005). Turkish: A Comprehensive Grammar. London: Routledge. Göksel, A., and M. Pöchtrager (2013). The vocative and its kin: Marking function through prosody. In B. Sonnenhauser and P. Noel Aziz Hanna (eds.), Vocative! Addressing between System and Performance, 87–108. Berlin: Walter de Gruyter. Goldman-Eisler, F. (1956). The determinants of the rate of speech output and their mutual relations. Journal of Psychosomatic Research 1(2), 137–143. Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in Spontaneous Speech. London: Academic Press. Goldrick, M., H. R. Baker, A. Murphy, and M. Baese-Berk (2011). 
Interaction and representational integration: Evidence from speech errors. Cognition 121, 58–72. Goldsmith, J. A. (1976a). Autosegmental phonology. PhD dissertation, MIT. (Published 1978, New York: Garland Press.)

Goldsmith, J. A. (1976b). An overview of autosegmental phonology. Linguistic Analysis 2, 23–68. Goldsmith, J. A. (1981). English as a tone language. In D. L. Goyvaerts (ed.), Phonology in the 1980’s, 287–308. Ghent: E. Story-Scientia. Goldsmith, J. A. (1990). Autosegmental and Metrical Phonology. Oxford: Blackwell. Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America 54, 1496–1516. Goldstein, K. (1942). Aftereffects of Brain-Injuries in War: Their Evaluation and Treatment. New York: Grune and Stratton. Golston, C. (2009). Old English feet. In T. K. Dewey and Frog (eds.), Versatility in Versification: Multidisciplinary Approaches to Metrics, 105–122. New York: Peter Lang. Golston, C., and W. Kehrein (1998). Mazatec onsets and nuclei. International Journal of American Linguistics 64(4), 311–337. Gomez, R., and L. Gerken (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences 4, 178–186. Gomez-Imbert, E. (1980). La faille tonale en tatuyo. Paper presented at Journées de Tonologie LP 3–121 du CNRS, Paris. Gomez-Imbert, E. (1999). Variétés tonales sur fond d’exogamie linguistique. Cahiers de grammaire (phonologie: théorie et variation) 24, 67–93. Gomez-Imbert, E. (2004). Fonología de dos idiomas tukano del Pira-paraná: Barasana y tatuyo. Amerindia: Revue d’Ethnolinguistique amérindienne 29–30, 43–80. Gomez-Imbert, E., and M. Kenstowicz (2000). Barasana tone and accent. International Journal of American Linguistics 66(4), 419–463. González, C. (2016). Tipología de los sistemas métricos de veinticinco lenguas pano. Amerindia: Revue d’Ethnolinguistique amérindienne 39(1), 129–172. González-Fuente, S. (2017). Audiovisual prosody and verbal irony. PhD dissertation, Universitat Pompeu Fabra. González-Fuente, S., S. Tubau, M. T. Espinal, and P. Prieto (2015). 
Is there a universal answering strategy for rejecting negative propositions? Typological evidence on the use of prosody and gesture. Frontiers in Psychology 6(899), 1–17. Gooden, S. (2014). Aspects of the intonational phonology of Jamaican Creole. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 272–301. Oxford: Oxford University Press. Goodglass, H., and E. Kaplan (1972). The Assessment of Aphasia and Related Disorders. Philadelphia: Lea and Febiger. Goodhue, D., L. Harrison, Y. T. C. Su, and M. Wagner (2016). Toward a bestiary of English intonational contours. In B. Prickett and C. Hammerly (eds.), The Proceedings of the North East Linguistics Society (NELS) 46, 311–320. Amherst: Graduate Linguistics Student Association. Goodhue, D., and M. Wagner (2018). Intonation, yes and no. Glossa 3(1), 5–45. Goodwin, J. (2012). Pronunciation teaching methods and techniques. In C. A. Chapelle (ed.), The Encyclopedia of Applied Linguistics, 4725–4734. Chichester: Wiley Blackwell. Gordon, J., and I. Darcy (2016). The development of comprehensible speech in L2 learners. Journal of Second Language Pronunciation 2(1), 56–92. Gordon, M. (1997). Phonetic correlates of stress and the prosodic hierarchy in Estonian. In J. Ross and I. Lehiste (eds.), Estonian Prosody: Papers from a Symposium, 100–124. Tallinn: Institute of Estonian Language. Gordon, M. (1998). The phonetics and phonology of non-modal vowels: A cross-linguistic perspective. In Proceedings of the 24th Meeting of the Berkeley Linguistics Society, 93–105, Berkeley. Gordon, M. (1999). Syllable weight: Phonetics, phonology, typology. PhD dissertation, University of California, Los Angeles. Gordon, M. (2000). The tonal basis of final weight criteria. Chicago Linguistics Society 36, 141–156. Gordon, M. (2001a). A typology of contour tone restrictions. Studies in Language 25, 405–444. Gordon, M. (2001b). Linguistic aspects of voice quality with special reference to Athabaskan. 
In Proceedings of the 2001 Athabaskan Languages Conference, 163–178, Los Angeles.

Gordon, M. (2002). A factorial typology of quantity insensitive stress. Natural Language and Linguistic Theory 20, 491–552. Gordon, M. (2004). A phonological and phonetic study of word-level stress in Chickasaw. International Journal of American Linguistics 70, 1–32. Gordon, M. (2005). An autosegmental/metrical model of Chickasaw intonation. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 301–330. New York: Oxford University Press. Gordon, M. (2005a). Intonational phonology of Chickasaw. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 301–330. Oxford: Oxford University Press. Gordon, M. (2005b). A perceptually-driven account of onset-sensitive stress. Natural Language and Linguistic Theory 23, 595–653. Gordon, M. (2006). Syllable Weight: Phonetics, Phonology, Typology. New York: Routledge. Gordon, M. (2008). Pitch accent timing and scaling in Chickasaw. Journal of Phonetics 36, 521–535. Gordon, M. (2011a). Stress systems. In J. A. Goldsmith, J. Riggle, and A. C. Yu (eds.), The New Handbook of Phonology, 141–163. Chichester: Wiley Blackwell. Gordon, M. (2011b). Stress: Phonotactic and phonetic evidence. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology: Vol. 2. Suprasegmental and Prosodic Phonology, 924–948. Oxford: Wiley Blackwell. Gordon, M. (2014). Disentangling stress and pitch accent: Toward a typology of prominence at different prosodic levels. In H. van der Hulst (ed.), Word Stress: Theoretical and Typological Issues, 83–118. Cambridge: Cambridge University Press. Gordon, M. (2015). Metrical structure and stress. In M. Aronoff (ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi: 10.1093/acrefore/9780199384655.013.115. Gordon, M. (2016). Phonological Typology. New York: Oxford University Press. Gordon, M. (2017). 
Phonetic and phonological research on Native American languages: Past, present, and future. International Journal of American Linguistics 83, 79–110. Gordon, M., and A. Applebaum (2010). Acoustic correlates of stress in Turkish Kabardian. Journal of the International Phonetic Association 40, 35–58. Gordon, M., E. Ghushchyan, B. McDonnell, D. Rosenblum, and P. Shaw (2012). Sonority and central vowels: A cross-linguistic phonetic study. In S. Parker (ed.), The Sonority Controversy, 219–256. Berlin: Mouton de Gruyter. Gordon, M., and P. Ladefoged (2001). Phonation types: A cross-linguistic overview. Journal of Phonetics 29, 383–406. Gordon, M., J. B. Martin, and L. Langley (2015). Some phonetic structures of Koasati. International Journal of American Linguistics 81, 83–118. Gordon, M., and P. Munro (2007). A phonetic study of final vowel lengthening in Chickasaw. International Journal of American Linguistics 73, 293–330. Gordon, M., and T. B. Roettger (2017). Acoustic correlates of word stress: A cross-linguistic survey. Linguistics Vanguard 3(1), 2017–0007. Gordon, M., and F. Rose (2006). Émérillon stress: A phonetic and phonological study. Anthropological Linguistics 48, 132–168. Goswami, G. C. (1966). An Introduction to Assamese Phonology. Poona: Deccan College Postgraduate and Research Institute. Götz, A., H. H. Yeung, A. Krasotkina, G. Schwarzer, and B. Höhle (2018). Perceptual reorganization of lexical tones: Effects of age and experimental procedure. Frontiers in Psychology 9, 477. Götz, S. (2013). Fluency in Native and Nonnative English Speech. Amsterdam: John Benjamins. Goudswaard, N. (2005). The Begak (Ida’an) Language of Sabah. Utrecht: LOT. Gouskova, M. (2007). The reduplicative template in Tonkawa. Phonology 24, 367–396. Gout, A., A. Christophe, and J. Morgan (2004). Phonological phrase boundaries constrain lexical access II: Infant data. Journal of Memory and Language 51(4), 548–567. Gow, D., and P. C. Gordon (1995). 
Lexical and prelexical influences on word segmentation: Evidence from priming. Journal of Experimental Psychology: Human Perception and Performance 21, 344–359.

Grabe, E. (1998a). Pitch accent realization in English and German. Journal of Phonetics 26, 129–143. Grabe, E. (1998b). Comparative Intonational Phonology: English and German (MPI Series in Psycholinguistics 7). Wageningen: Ponsen en Looijen. Grabe, E. (2002). Variation adds to prosodic typology. In Proceedings of Speech Prosody 1, 127–132, Aix-en-Provence. Grabe, E. (2004). Intonational variation in urban dialects of English spoken in the British Isles. In P. Gilles and J. Peters (eds.), Regional Variation in Intonation, 9–31. Tübingen: Niemeyer. Grabe, E., and E. L. Low (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 515–546. Berlin: Mouton de Gruyter. Grabe, E., B. Post, and F. Nolan (2001). Modelling intonational variation in English: The IViE system. In S. Puppel and G. Demenko (eds.), Proceedings of Prosody (2000), 51–57. Poznań: Adam Mickiewicz University. Grabe, E., B. Post, F. Nolan, and K. Farrar (2000). Pitch accent realization in four varieties of British English. Journal of Phonetics 28, 161–185. Grabe, E., B. Post, and I. Watson (1999). The acquisition of rhythmic patterns in English and French. In Proceedings of the 14th International Congress of Phonetic Sciences, 1201–1204, San Francisco. Gradin, D. (1966). Consonantal tone in Jeh phonemics. Mon-Khmer Studies 2, 41–53. Graf Estes, K. G., and S. Bowen (2013). Learning about sounds contributes to learning about words: Effects of prosody and phonotactics on infant word learning. Journal of Experimental Child Psychology 114(3), 405–417. Gragg, G. (1997). Geez (Ethiopic). In R. Hetzron (ed.), The Semitic Languages, 242–262. London: Routledge. Gragg, G., and R. D. Hoberman (2012). Semitic. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 145–235. Cambridge: Cambridge University Press. Graham, C. R. (2014). 
Fundamental frequency range in Japanese and English: The case of simultan eous bilinguals. Phonetica 71, 271–295. Graham, C. R., and B. Post (2018). Second language acquisition of intonation: Peak alignment in American English. Journal of Phonetics 66, 1–14. Granström, B., and D. House (2004). Audiovisual representation of prosody in expressive speech communication. In Proceedings of Speech Prosody 2, 393–400, Nara. Granström, B., D. House, and M. Swerts (2002). Multimodal feedback cues in human-machine interactions. In Proceedings of Speech 1, Aix-en-Provence. Grassmann, S., and M. Tomasello (2010). Prosodic stress on a word directs 24-month-olds’ attention to a contextually new referent. Journal of Pragmatics 42(11), 3098–3105. Gratch, J., R. Artstein, G. M. Lucas, G. Stratou, S. Scherer, A. Nazarian, R. Wood, J. Boberg, D. DeVault, S. Marsella, and D. R. Traum (2014). The Distress Analysis Interview corpus of human and computer interviews. In Proceedings of the 9th International Conference on Language Resources and Evaluation, 3123–3128, Reykjavik. Gravano, A., Š. Beňuš, R. Levitan, and J. Hirschberg (2014). Three ToBI-based measures of prosodic entrainment and their correlations with speaker engagement. In Proceedings of the IEEE Spoken Language Technology Workshop, 578–583, South Lake Tahoe. Gravano, A., Š. Beňuš, R. Levitan, and J. Hirschberg (2015). Backward mimicry and forward influence in prosodic contour choice in Standard American English. In INTERSPEECH 2015, 1839–1843, Dresden. Gravano, A., and J. Hirschberg (2011). Turn-taking cues in task-oriented dialogue. Computer Speech and Language 25, 601–634. Gravano, A., J. Hirschberg, and Š. Beňuš (2012). Affirmative cue words in task-oriented dialogue. Computational Linguistics 38(1), 1–39. Graves, Z. R., and J. Glick (1978). The effect of context on mother-child interaction: A progress report. Quarterly Newsletter of the Institute for Comparative Human Development 2(3), 41–46. Grech, S. (2015). 
Variation in English: Perceptions and patterns in Maltese English. PhD dissertation, University of Malta. Grech, S., and A. Vella (2018). Rhythm in Maltese English. In P. Paggio and A. Gatt (eds.), The Languages of Malta, 1–22. Berlin: Language Science Press.

Green, A. (1997). The prosodic structure of Irish, Scots Gaelic, and Manx. PhD dissertation, Cornell University.
Green, A. D. (2005). Word, foot, and syllable structure in Burmese. In J. Watkins (ed.), Studies in Burmese Linguistics, 1–25. Canberra: Pacific Linguistics.
Green, C. R. (2013). Formalizing the prosodic word domain in Bambara tonology. Journal of West African Languages 40, 3–20.
Green, C. R., and M. E. Morrison (2016). Somali wordhood and its relationship to prosodic structure. Morphology 26, 3–32.
Green, H., and Y. Tobin (2009). Prosodic analysis is difficult . . . but worth it: A study in high functioning autism. International Journal of Speech-Language Pathology 11(4), 308–315.
Green, J., I. Nip, E. Wilson, A. Mefferd, and Y. Yunusova (2010). Lip movement exaggerations during infant-directed speech. Journal of Speech, Language, and Hearing Research 53(6), 1529–1542.
Green, L. J. (2002). African American English: A Linguistic Introduction. Cambridge: Cambridge University Press.
Green, M. (2007). Focus in Hausa. Oxford: Wiley Blackwell.
Green, T., and M. Kenstowicz (1995). The lapse constraint. In L. Gabriele, D. Hardison, and R. Westmoreland (eds.), Papers from the Sixth Annual Meeting of the Formal Linguistics Society of Mid-America, 1–15. Bloomington: Indiana University Linguistics Club.
Greenberg, J. H. (1963). The Languages of Africa. The Hague: Mouton.
Greenberg, S., and L. Hitchcock (2001). Stress-accent and vowel quality in the Switchboard corpus. Paper presented at the NIST Large Vocabulary Continuous Speech Recognition Workshop, Linthicum Heights, MD.
Gregerson, K. J. (1973). Tongue-root and register in Mon-Khmer. In N. Jenner, L. Thompson, and S. Starosta (eds.), Austroasiatic Studies, 323–369. Honolulu: University of Hawaiʻi Press.
Gregory, R. L. (1997). Eye and Brain: The Psychology of Seeing. Oxford: Oxford University Press.
Gregory, S. W., and S. Webster (1996). A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. Journal of Personality and Social Psychology 70, 1231–1240.
Gregory, S. W., S. Webster, and G. Huang (1993). Voice pitch and amplitude convergence as a metric of quality in dyadic interviews. Language and Communication 13(3), 195–217.
Grèzes, F., J. Richards, and A. Rosenberg (2013). Let me finish: Automatic conflict detection using speaker overlap. In INTERSPEECH 2013, 200–204, Lyon.
Grice, H. P. (1975). Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and Semantics 3: Speech Acts, 41–58. New York: Academic Press.
Grice, M. (1995a). Leading tones and downstep in English. Phonology 12, 183–233.
Grice, M. (1995b). The Intonation of Interrogation in Palermo Italian: Implications for Intonation Theory (Linguistische Arbeiten 334). Tübingen: Niemeyer.
Grice, M., and S. Baumann (2002). Deutsche Intonation und GToBI. Linguistische Berichte 191, 267–298.
Grice, M., and S. Baumann (2007). An introduction to intonation: Functions and models. In J. Trouvain and U. Gut (eds.), Non-native Prosody: Phonetic Description and Teaching Practice, 25–51. Berlin: De Gruyter.
Grice, M., S. Baumann, and R. Benzmüller (2005a). German intonation in autosegmental-metrical phonology. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 55–83. Oxford: Oxford University Press.
Grice, M., S. Baumann, and N. Jagdfeld (2009). Tonal association and derived nuclear accents: The case of downstepping contours in German. Lingua 119, 881–905.
Grice, M., R. Benzmüller, M. Savino, and B. Andreeva (1995). The intonation of queries and checks across languages: Data from map task dialogues. In Proceedings of the 13th International Congress of Phonetic Sciences, 648–651, Stockholm.
Grice, M., M. D’Imperio, M. Savino, and C. Avesani (2005b). Strategies for intonation labelling across varieties of Italian. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 362–389. New York: Oxford University Press.
Grice, M., D. R. Ladd, and A. Arvaniti (2000). On the place of phrase accents in intonational phonology. Phonology 17, 143–185.
Grice, M., S. Ritter, H. Niemann, and T. B. Roettger (2017). Integrating the discreteness and continuity of intonational categories. Journal of Phonetics 64, 90–107.
Grice, M., T. B. Roettger, and R. Ridouane (2015). Tonal association in Tashlhiyt Berber: Evidence from polar questions and contrastive statements. Phonology 32(2), 241–266.
Grice, M., and M. Savino (1995). Intonation and communicative function in a regional variety of Italian. In W. J. Barry and J. C. Koreman (eds.), Phonus 1, 19–32.
Grice, M., M. Savino, and M. Refice (1997). The intonation of questions in Bari Italian: Do speakers replicate their spontaneous speech when reading? Phonus 3, 1–7.
Grice, M., M. Savino, and T. B. Roettger (2018). Word final schwa is driven by intonation: The case of Bari Italian. Journal of the Acoustical Society of America 143(4), 2474–2486.
Grice, M., S. Vella, and A. Bruggeman (2019). Stress, pitch accent, and beyond: Intonation in Maltese questions. Journal of Phonetics 76, 100913.
Grierson, G. A. (1903/1922). Linguistic Survey of India. Calcutta: Office of the Superintendent of Government Printing.
Grieser, D. L., and P. K. Kuhl (1988). Maternal speech to infants in a tonal language: Support for universal prosodic features in motherese. Developmental Psychology 24(1), 14–20.
Griffiths, S. K., W. S. Brown, K. J. Gerhardt, R. M. Abrams, and R. J. Morris (1994). The perception of speech sounds recorded within the uterus of a pregnant sheep. Journal of the Acoustical Society of America 96(4), 2055–2063.
Grigorjevs, J. (2008). Latviešu valodas patskaņu sistēmas akustisks un auditīvs raksturojums. Riga: LU Latviešu Valodas Institūts.
Grigorjevs, J., and J. Jaroslavienė (2015). Comparative study of the qualitative features of the Lithuanian and Latvian monophthongs. Baltistica 50(1), 57–89.
Grimes, J. (1971). A reclassification of the Quichean and Kekchian (Mayan) languages. International Journal of American Linguistics 37(1), 15–19.
Grimes, J. (1972). The Phonological History of the Quichean Languages. Carbondale: University Museum, Southern Illinois University.
Grimm, A. (2007). The development of early prosodic word structure in child German: Simplex words and compounds. PhD dissertation, University of Potsdam.
Groen, W. B., L. van Orsouw, M. Zwiers, S. Swinkels, R. J. van der Gaag, and J. K. Buitelaar (2008). Gender in voice perception in autism. Journal of Autism and Developmental Disorders 38(10), 1819–1826.
Grønnum, N. (1989). Stress group patterns, sentence accents and sentence intonation in Southern Jutland (Sønderborg and Tønder)—with a View to German. Annual Reports of the Institute of Phonetics, University of Copenhagen 23, 1–85.
Grønnum, N. (1990). Prosodic features in regional Danish with a view to Swedish and German. In K. Wiik and I. Raimo (eds.), Nordic Prosody: Papers from a Symposium V, 131–144. Åbo: University of Turku Phonetics.
Grønnum, N. (in press). Modelling Danish intonation. In S. Shattuck-Hufnagel and J. A. Barnes (eds.), Prosodic Theory and Practice. Cambridge, MA: MIT Press.
Grosjean, F., and M. Collins (1979). Breathing, pausing and reading. Phonetica 36(2), 98–114.
Grossberg, S., and C. Myers (2000). The resonant dynamics of speech perception: Interword integration and duration-dependent backward effects. Psychological Review 107, 735–767.
Grossman, R. B., R. H. Bemis, D. P. Skwerer, and H. Tager-Flusberg (2010). Lexical and affective prosody in children with high-functioning autism. Journal of Speech, Language, and Hearing Research 53(3), 778–793.
Grosz, B., and C. Sidner (1986). Attention, intentions, and the structure of discourse. Computational Linguistics 12(3), 175–204.
Gruber, J. (2011). An acoustic, articulatory, and auditory study of Burmese tone. PhD dissertation, Georgetown University.
Grünloh, T., and U. Liszkowski (2015). Prelinguistic vocalizations distinguish pointing acts. Journal of Child Language 42(6), 1312–1336.
Gsell, R. (1980). Remarques sur la structure de l’espace tonal en Vietnamien du sud (parler de Saigon). Cahiers d’études vietnamiennes 4, 1–26.
Gu, W., K. Hirose, and H. Fujisaki (2006). Modeling the effects of emphasis and question on fundamental frequency contours of Cantonese utterances. IEEE Transactions on Audio, Speech and Language Processing 14, 1155–1170.
Gu, W., and T. Lee (2007). Effects of tonal context and focus on Cantonese f0. In Proceedings of the 16th International Congress of Phonetic Sciences, 1033–1036, Saarbrücken.
Gu, W., and T. Lee (2009). Effects of tone and emphatic focus on f0 contours of Cantonese speech: A comparison with Standard Chinese. Chinese Journal of Phonetics 2, 133–147.
Gu, W., J. Yin, and J. Mahshie (2017b). Production of sustained vowels and categorical perception of tones in Mandarin among cochlear-implanted children. In INTERSPEECH 2017, 1869–1873, Stockholm.
Gu, Y., L. Mol, M. Hoetjes, and M. Swerts (2017a). Conceptual and lexical effects on gestures: The case of vertical spatial metaphors for time in Chinese. Language, Cognition and Neuroscience 32(8), 1048–1063.
Guellaï, B., A. Langus, and M. Nespor (2014). Prosody in the hands of the speaker. Frontiers in Psychology 5, 700.
Guenther, F. (2018). Neural Control of Speech. Cambridge, MA: MIT Press.
Guenther, F. H., S. S. Ghosh, and J. A. Tourville (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language 96(3), 280–301.
Guillaume, A. (2008). A Grammar of Cavineña (Mouton Grammar Library 44). Berlin: Mouton de Gruyter.
Guion, S. G. (2005). Knowledge of English word stress patterns in early and late Korean-English bilinguals. Studies in Second Language Acquisition 27(4), 503–533.
Guion, S. G., J. D. Amith, C. S. Doty, and I. A. Shport (2010). Word-level prosody in Balsas Nahuatl: The origin, development and acoustic correlates of tone in a stress accent language. Journal of Phonetics 38, 137–166.
Guion, S. G., J. Clark, T. Harada, and R. P. Wayland (2003). Factors affecting stress placement for English non-words include syllabic structure, lexical class, and stress patterns of phonologically similar words. Language and Speech 46, 403–427.
Guion, S. G., T. Harada, and J. Clark (2004). Early and late Spanish–English bilinguals’ acquisition of English word stress patterns. Bilingualism: Language and Cognition 7(3), 207–226.
Guion, S. G., and E. Pederson (2007). Investigating the role of attention in phonetic learning. In O.-S. Bohn and M. J. Munro (eds.), Language Experience in Second Language Speech Learning: In Honor of James Emil Flege, 57–77. Amsterdam: John Benjamins.
Güldemann, T. (2008). The Macro-Sudan belt: Towards identifying a linguistic area in northern sub-Saharan Africa. In B. Heine and D. Nurse (eds.), A Linguistic Geography of Africa, 151–185. Cambridge: Cambridge University Press.
Güldemann, T. (2010). Sprachraum and geography: Linguistic macro-areas in Africa. In A. Lameli, R. Kehrein, and S. Rabanus (eds.), Language and Space: An International Handbook of Linguistic Variation, vol. 2, 561–585, maps 2901–2914. Berlin: Mouton de Gruyter.
Gulick, W. L., G. A. Gescheider, and R. D. Frisina (1989). Hearing: Physiological Acoustics, Neural Coding, and Psychoacoustics. New York: Oxford University Press.
Gundel, J. K., and T. Fretheim (2004). Topic and focus. In L. R. Horn and G. Ward (eds.), The Handbook of Pragmatics, 175–196. Oxford: Blackwell.
Gundel, J. K., and T. Fretheim (2008). Topic and focus. In L. R. Horn and G. Ward (eds.), The Handbook of Pragmatics, 175–196. Oxford: Wiley Blackwell.
Güneş, G. (2015). Deriving Prosodic Structures. Utrecht: LOT Dissertation Series.
Gunlogson, C. (2003). True to Form: Rising and Falling Declaratives as Questions in English. New York: Routledge.
Gunlogson, C. (2008). A question of commitment. Belgian Journal of Linguistics 22, 101–136.
Gupta, A. F. (1998). The situation of English in Singapore. In J. A. Foley, T. Kandiah, B. Zhiming, A. F. Gupta, L. Alsagoff, H. C. Lick, L. Wee, I. S. Talib, and W. Bokhorst-Heng (eds.), English in New Cultural Contexts: Reflections from Singapore, 106–126. Singapore: Singapore Institute of Management/Oxford University Press.
Gürer, A. (2015). Semantic, prosodic and syntactic marking of information structural units in Turkish. PhD dissertation, Boğaziçi University.
Gussenhoven, C. (1983a). Focus, mode and the nucleus. Journal of Linguistics 19(2), 377–417.
Gussenhoven, C. (1983b). A Semantic Analysis of the Nuclear Tones of English. Bloomington: Indiana University Linguistics Club.
Gussenhoven, C. (1983c). Testing the reality of focus domains. Language and Speech 26(1), 61–80.
Gussenhoven, C. (1984). On the Grammar and Semantics of Sentence Accents (Publications in Language Sciences 16). Berlin: De Gruyter Mouton.
Gussenhoven, C. (1991). The English rhythm rule as an accent deletion rule. Phonology 8(1), 1–35.
Gussenhoven, C. (1993). The Dutch foot and the chanted call. Journal of Linguistics 29, 37–63.
Gussenhoven, C. (2000a). The boundary tones are coming: On the non-peripheral realization of boundary tones. In M. B. Broe and J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V: Acquisition and the Lexicon, 132–151. Cambridge: Cambridge University Press.
Gussenhoven, C. (2000b). The lexical tone contrast of Roermond Dutch in Optimality Theory. In M. Horne (ed.), Prosody: Theory and Experiment, 129–167. Dordrecht: Kluwer.
Gussenhoven, C. (2000c). On the origin and development of the Central Franconian tone contrast. In A. Lahiri (ed.), Analogy, Levelling, Markedness: Principles of Change in Phonology and Morphology, 215–260. Berlin: Mouton de Gruyter.
Gussenhoven, C. (2002). Intonation and interpretation: Phonetics and phonology. In Proceedings of Speech Prosody 1, 47–57, Aix-en-Provence.
Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.
Gussenhoven, C. (2005). Transcription of Dutch intonation. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 118–145. Oxford: Oxford University Press.
Gussenhoven, C. (2006). Between stress and tone in Nubi word prosody. Phonology 23, 193–223.
Gussenhoven, C. (2007). Intonation. In P. de Lacy (ed.), The Cambridge Handbook of Phonology, 253–280. Cambridge: Cambridge University Press.
Gussenhoven, C. (2008). Notions and subnotions in information structure. Acta Linguistica Hungarica 55(3), 381–395.
Gussenhoven, C. (2009). Vowel duration, syllable quantity, and stress in Dutch. In K. Hanson and S. Inkelas (eds.), The Nature of the Word: Essays in Honor of P. Kiparsky, 181–198. Cambridge, MA: MIT Press.
Gussenhoven, C. (2011). Sentential prominence in English. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology: Vol. 5. Phonology across Languages, 2778–2806. Oxford: Wiley Blackwell.
Gussenhoven, C. (2012a). Quantity or durational enhancement of tone? The case of Maastricht Limburgian high vowels. In B. Botma and R. Noske (eds.), Phonological Explorations: Empirical, Theoretical and Diachronic Issues, 241–254. Berlin: Mouton de Gruyter.
Gussenhoven, C. (2012b). Asymmetries in the intonation system of Maastricht Limburgish. Phonology 29, 39–79.
Gussenhoven, C. (2016). Analysis of intonation: The case of MAE ToBI. Laboratory Phonology 7, 1–35.
Gussenhoven, C. (2017a). Zwara (Zuwarah) Berber. Journal of the International Phonetic Association 48(3), 371–387.
Gussenhoven, C. (2017b). On the intonation of tonal varieties of English. In M. Filppula, J. Klemola, and D. Sharma (eds.), The Oxford Handbook of World Englishes, 569–598. Oxford: Oxford University Press.
Gussenhoven, C. (2018a). In defense of a dialect-contact scenario of the Central Franconian tonogenesis. In H. Kubozono and M. Giriko (eds.), Tonal Change and Neutralization, 350–379. Berlin: Mouton de Gruyter.
Gussenhoven, C. (2018b). On the privileged status of intonational boundary tones: Evidence from Japanese, French, and Cantonese English. In E. Buckley, T. Crane, and J. Good (eds.), Revealing Structure, 57–69. Stanford: Center for the Study of Language and Information.
Gussenhoven, C., and J. Peters (2004). A tonal analysis of Cologne Schärfung. Phonology 21, 251–285.
Gussenhoven, C., and J. Peters (2019). Franconian tones fare better as tones than as feet: A reply to Köhnlein (2016). Phonology 36, 497–530.
Gussenhoven, C., B. H. Repp, T. Rietveld, W. H. Rump, and J. Terken (1997). The perceptual prominence of fundamental frequency peaks. Journal of the Acoustical Society of America 102, 3009–3022.
Gussenhoven, C., and A. C. M. Rietveld (1991). An experimental evaluation of two nuclear tone taxonomies. Linguistics 29, 423–449.
Gussenhoven, C., and A. C. M. Rietveld (1992). Intonation contours, prosodic structure and preboundary lengthening. Journal of Phonetics 20(3), 283–303.
Gussenhoven, C., and T. Rietveld (1988). Fundamental frequency declination in Dutch: Testing three hypotheses. Journal of Phonetics 16, 355–369.
Gussenhoven, C., and T. Rietveld (1998). On the speaker-dependence of the perceived prominence of f0 peaks. Journal of Phonetics 26, 371–380.
Gussenhoven, C., and T. Rietveld (2000). The behavior of H* and L* under variations in pitch range in Dutch rising contours. Language and Speech 43, 183–203.
Gussenhoven, C., and R. Teeuw (2008). A moraic and a syllabic H-tone in Yucatec Maya. In E. Herrera and P. M. Butragueño (eds.), Fonología instrumental: Patrones fónicos y variación, 49–71. Mexico City: El Colegio de México.
Gussenhoven, C., and I. Udofot (2010). Word melodies vs. pitch accents: A perceptual evaluation of terracing contours in British and Nigerian English. In Proceedings of Speech Prosody 5, Chicago.
Gussenhoven, C., and F. van den Beuken (2012). Contrasting the high rise and the low rise intonations in a dialect with the Central Franconian tone. The Linguistic Review 29, 75–107.
Gussenhoven, C., and P. van der Vliet (1999). The phonology of tone and intonation in the Dutch dialect of Venlo. Journal of Linguistics 35, 99–135.
Gussenhoven, C., and W. Zhou (2013). Revisiting pitch slope and height effects on perceived duration. In INTERSPEECH 2013, 1365–1369, Lyon.
Gustafson, G. E., S. M. Sanborn, H.-C. Lin, and J. Green (2017). Newborns’ cries are unique to individuals (but not to language environment). Infancy 22(6), 736–747.
Gut, U. (2005). Nigerian English prosody. English World-Wide 26(2), 153–177.
Gut, U. (2009a). Non-native Speech: A Corpus-Based Analysis of Phonological and Phonetic Properties of L2 English and German. Frankfurt: Peter Lang.
Gut, U. (2009b). Introduction to English Phonetics and Phonology. Frankfurt: Peter Lang.
Gut, U. (2012). The LeaP corpus: A multilingual corpus of spoken learner German and learner English. In T. Schmidt and K. Wörner (eds.), Multilingual Corpora and Multilingual Corpus Analysis (Hamburg Studies on Multilingualism 14), 3–23. Amsterdam: John Benjamins.
Gut, U., and J.-T. Milde (2002). The prosody of Nigerian English. In Proceedings of Speech Prosody 1, 367–370, Aix-en-Provence.
Gut, U., and S. Pillai (2014). Prosodic marking of information structure by Malaysian speakers of English. Studies in Second Language Acquisition 36(2), 283–302.
Gut, U., J. Trouvain, and W. J. Barry (2007). Bridging research on phonetic descriptions with knowledge from teaching practice—the case of prosody in non-native speech. Trends in Linguistics Studies and Monographs 186, 3–21.
Gutman, A., I. Dautriche, B. Crabbé, and A. Christophe (2015). Bootstrapping the syntactic bootstrapper: Probabilistic labeling of prosodic phrases. Language Acquisition 22(3), 285–309.
Gvozdanović, J. (1980). Tone and Accent in Standard Serbo-Croatian. Vienna: Verlag der Österreichischen Akademie der Wissenschaften.
Gvozdanović, J. (1999). South Slavic prosody. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 839–876. Berlin: Mouton de Gruyter.
Gyuris, B., and K. Mády (2014). Contrastive topics between syntax and pragmatics in Hungarian: An experimental analysis. In Proceedings of the 46th Meeting of the Chicago Linguistic Society, number 1, 147–162. Chicago: Chicago Linguistic Society.
Hạ, K. P. (2010). Ờ, ừ and vâng in backchannels and requests for information. Journal of the Southeast Asian Linguistics Society 3, 56–76.
Hạ, K. P. (2012). Prosody in Vietnamese: Intonational Form and Function of Short Utterances in Conversation. Cologne: University of Cologne.
Haacke, W. H. G. (1999). The Tonology of Khoekhoe (Nama/Damara). Research in Khoisan Studies 16. Cologne: Köppe.
Haake, C., M. Kob, K. Willmes, and F. Domahs (2013). Word stress processing in specific language impairment: Auditory or representational deficits? Clinical Linguistics and Phonetics 27(8), 594–615.
Haas, M. (1977). Tonal accent in Creek. In L. M. Hyman (ed.), Studies in Stress and Accent, 195–208. Los Angeles: University of Southern California.
Hagström, B. (1967). Ändelsevokalerna i färöiskan: En fonetisk-fonologisk studie. Stockholm: Almqvist and Wiksell.
Hajičová, E. (2012). Vilém Mathesius and functional sentence perspective, and beyond. In A Centenary of English Studies at Charles University: From Mathesius to Present-Day Linguistics, 49–60. Prague: Univerzita Karlova v Praze.
Hajičová, E., B. Partee, and P. Sgall (1998). Focus, topic and semantics. University of Massachusetts Working Papers in Linguistics 21, 101–124.
Hale, K. L. (1964). Classification of the Northern Paman languages, Cape York Peninsula, Australia: A research report. Oceanic Linguistics 3, 248–265.
Hale, K. L. (1976). Phonological developments in a Northern Paman language: Uradhi. In P. Sutton (ed.), Languages of Cape York, 41–49. Canberra: Australian Institute of Aboriginal Studies.
Hale, K. L. (1977). §1.3 Elementary remarks on Walbiri orthography, phonology and allomorphy. Ms., MIT.
Hale, K. L., and A. Lacayo Blanco (1989). Diccionario elemental del Ulwa (Sumu meridional). Cambridge, MA: Center for Cognitive Science, MIT.
Halim, A. (1981). Intonation in Relation to Syntax in Indonesian (Pacific Linguistics Series D, 36). Canberra: Pacific Linguistics.
Hall, T. A. (1992). Syllable Structure and Syllable-Related Processes in German. Berlin: De Gruyter.
Hall, T. A., and U. Kleinhenz (eds.) (1999). Studies on the Phonological Word. Amsterdam: Benjamins.
Halle, M., and K. N. Stevens (1962). Speech recognition: A model and a program for research. IEEE Transactions on Information Theory 8, 155–159.
Halle, J. (2015). Stress-beat and tone-tune: Mismatches compared. Paper presented at the Singing in Tone satellite workshop at the 18th International Congress of Phonetic Sciences, Glasgow.
Halle, J., and F. Lerdahl (1993). A generative textsetting model. Current Musicology 55, 3–23.
Halle, M. (1973). The accentuation of Russian words. Language 49, 312–348.
Halle, M., and W. J. Idsardi (1995). Stress and metrical structure. In J. A. Goldsmith (ed.), The Handbook of Phonological Theory, 403–443. Oxford: Blackwell.
Halle, M., and J.-R. Vergnaud (1987). An Essay on Stress. Cambridge, MA: MIT Press.
Hallé, P. A., B. De Boysson-Bardies, and M. Vihman (1991). Beginnings of prosodic organization: Intonation and duration patterns of disyllables produced by Japanese and French infants. Language and Speech 34(4), 299–318.
Hallé, P. A., Y.-C. Chang, and C. T. Best (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics 32(3), 395–421.
Halliday, M. (1967). Intonation and Grammar in British English. The Hague: Mouton.
Hamada, H., S. Miki, and R. Nakatsu (1993). Automatic evaluation of English pronunciation based on speech recognition techniques. IEICE Transactions on Information and Systems E76-D(3), 352–359.
Hammarström, H., R. Forkel, and M. Haspelmath (2018). Glottolog 3.3. Max Planck Institute for the Science of Human History. Retrieved 4 December 2018 from http://glottolog.org.
Hammond, M. (2011). The foot. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 2, 949–979. Oxford: Wiley Blackwell.
Han, M. (1962). The feature of duration in Japanese. Onsei No Kenkyuu 10, 65–80.
Han, M., and K.-O. Kim (1974). Phonetic variation of Vietnamese tones in disyllabic utterances. Journal of Phonetics 2, 223–232.
Hancil, S. (2009). The Role of Prosody in Affective Speech. Berlin: Peter Lang.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA: MIT Press.
Hannahs, S. J. (2013). The Phonology of Welsh. Oxford: Oxford University Press.
Hanson, H. (1997). Glottal characteristics of female speakers: Acoustic correlates. Journal of the Acoustical Society of America 101, 466–481.
Hanson, H., and E. S. Chuang (1999). Glottal characteristics of male speakers: Acoustic correlates and comparison with female data. Journal of the Acoustical Society of America 106, 1064–1077.
Hanson, H. M. (2009). Effects of obstruent consonants on fundamental frequency at vowel onset in English. Journal of the Acoustical Society of America 125, 425–441.
Hanson, K. (2006). Shakespeare’s lyric and dramatic metrical styles. In B. E. Dresher and N. Friedberg (eds.), Formal Approaches to Poetry, 111–133. Berlin: Mouton de Gruyter.
Hanson, K., and P. Kiparsky (1996). A parametric theory of poetic meter. Language 72, 287–335.
Hanssen, J. (2017). Regional Variation in the Realization of Intonation Contours in the Netherlands. Utrecht: LOT.
Hanssen, J., J. Peters, and C. Gussenhoven (2007). Phrase-final pitch accommodation effects in Dutch. In Proceedings of the 16th International Congress of Phonetic Sciences, 1077–1080, Saarbrücken.
Hansson, G. O. (2003). Laryngeal licensing and laryngeal neutralization in Faroese and Icelandic. Nordic Journal of Linguistics 26, 45–79.
Haraguchi, S. (1977). The Tone Pattern of Japanese: An Autosegmental Theory of Tonology. Tokyo: Kaitakusha.
Hardison, D. (2004). Generalization of computer-assisted prosody training: Quantitative and qualitative findings. Language Learning and Technology 8, 34–52.
Hardison, D. (2005). Contextualized computer-based L2 prosody training: Evaluating the effects of discourse context and video input. CALICO Journal 22(2), 175–190.
Hargus, S. (2005). Prosody in two Athabaskan languages of northern British Columbia. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 393–423. Amsterdam: John Benjamins.
Hargus, S., and V. Beavert (2005). A note on the phonetic correlates of stress in Yakima Sahaptin. In D. J. Jinguji and S. Moran (eds.), University of Washington Working Papers in Linguistics 24, 64–95. Seattle: University of Washington.
Hargus, S., and K. Rice (eds.) (2005). Athabaskan Prosody. Amsterdam: John Benjamins.
Harms, R. (1997). Estonian Grammar. New York: Routledge.
Harnud, H. (2003). A Basic Study of Mongolian Prosody (Publications of the Department of Phonetics 45). Helsinki: University of Helsinki.
Harrington, J., J. Fletcher, and M. E. Beckman (2000). Manner and place conflicts in the articulation of accent in Australian English. In M. B. Broe and J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V: Acquisition and the Lexicon, 40–55. Cambridge: Cambridge University Press.
Harris, A. (1993). Georgian. In J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann (eds.), Syntax: An International Handbook of Contemporary Research, 1377–1397. Berlin: De Gruyter.
Harris, B. P., and G. O’Grady (1976). An analysis of the progressive morpheme in Umpila verbs. In P. J. Sutton (ed.), Languages of Cape York, 165–212. Canberra: Australian Institute of Aboriginal Studies.
Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE 66, 51–83.
Harris, J. (1983). Syllable Structure and Stress in Spanish: A Nonlinear Analysis. Cambridge: Cambridge University Press.
Harris, J. (2004). Release the captive coda: The foot as a domain of phonetic interpretation. In J. Local, R. Ogden, and R. Temple (eds.), Phonetic Interpretation: Papers in Laboratory Phonology VI, 103–129. Cambridge: Cambridge University Press.
Harris, J., and E.-A. Urua (2001). Lenition degrades information: Consonant allophony in Ibibio. Speech, Hearing and Language: Work in Progress 13, 72–105.
Harrison, P. (2000). Acquiring the phonology of lexical tone in infancy. Lingua 110(8), 581–616.
References 763
Harry, O. G., and L. M. Hyman. Phrasal construction tonology: The case of Kalabari. Studies in Language 38(4), 649–689.
Hartelius, L., B. Runmarker, O. Andersen, and L. Nord (2000). Temporal speech characteristics of individuals with multiple sclerosis and ataxic dysarthria: Scanning speech revisited. Folia Phoniatrica et Logopaedica 52, 228–238.
Hartmann, K., and M. Zimmermann (2007a). Focus strategies in Chadic: The case of Tangale revisited. Studia Linguistica 61(2), 95–129.
Hartmann, K., and M. Zimmermann (2007b). In place—out of place? Focus strategies in Hausa. In K. Schwabe and S. Winkler (eds.), On Information Structure, Meaning and Form, 365–403. Amsterdam: John Benjamins.
Hartmann, W. M. (1988). Pitch perception and the segregation and integration of auditory entities. In G. M. Edelman, W. E. Gall, and W. M. Cowan (eds.), Auditory Function: Neurobiological Bases of Hearing, 623–645. New York: John Wiley and Sons.
Harvey, M. (2001). A Grammar of Limilngan. Canberra: Pacific Linguistics.
Harvey, M., and T. Borowsky (1999). The minimum word in Warray. Australian Journal of Linguistics 19, 89–99.
Hasan, A. (2016). The phonological word and stress shift in Northern Kurmanji Kurdish. European Scientific Journal 12, 370.
Haudricourt, A. G. (1961). Bipartition et tripartition des systèmes de tons dans quelques langues d’Extrême-Orient. Bulletin de la Société de Linguistique de Paris 56, 163–180.
Haugen, E. (1976). The Scandinavian Languages. London: Faber and Faber.
Haugen, J. D. (2009). What is the base for reduplication? Linguistic Inquiry 40, 505–514.
Haugen, J. D., and C. Hicks Kennard (2011). Base-dependence in reduplication. Morphology 21, 1–29.
Hawthorne, K., and L. Gerken (2014). From pauses to clauses: Prosody facilitates learning of syntactic constituency. Cognition 133(2), 420–428.
Hawthorne, K., L. Rudat, and L. Gerken (2016). Prosody and the acquisition of hierarchical structure in children and adults. Infancy 21, 603–624.
Hay, J. F., R. A. Cannistraci, and Q. Zhao (2019). Mapping non-native pitch contours to meaning: Perceptual and experiential factors. Journal of Memory and Language 105, 131–140.
Hay, J. F., K. G. Graf Estes, T. Wang, and J. R. Saffran (2015). From flexibility to constraint: The contrastive use of lexical tone in early word learning. Child Development 86(1), 10–22.
Hay, J. F., and J. R. Saffran (2011). Rhythmic grouping biases constrain infant statistical learning. Infancy 17(6), 610–641.
Hay, J. S., and R. L. Diehl (2007). Perception of rhythmic grouping: Testing the iambic/trochaic law. Perception and Psychophysics 69(1), 113–122.
Hayashi, A., and R. Mazuka (2017). Emergence of Japanese infants’ prosodic preferences in infant-directed vocabulary. Developmental Psychology 53(1), 28–37.
Hayata, T. (1999). Onchō no Taiporojī. Tokyo: Taishūkan.
Hayes, B. (1979). Extrametricality. MIT Working Papers in Linguistics 1, 77–86.
Hayes, B. (1980). A metrical theory of stress rules. PhD dissertation, MIT. (Published 1985, New York: Garland Press.)
Hayes, B. (1989a). Compensatory lengthening in moraic phonology. Linguistic Inquiry 20, 253–306.
Hayes, B. (1989b). The prosodic hierarchy in meter. In P. Kiparsky and G. Youmans (eds.), Phonetics and Phonology I: Rhythm and Meter, 201–260. San Diego: Academic Press.
Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press.
Hayes, B., and M. Abad (1989). Reduplication and syllabification in Ilokano. Lingua 77, 331–374.
Hayes, B., and A. Lahiri (1991). Bengali intonational phonology. Natural Language and Linguistic Theory 9(1), 47–96.
Hayes-Harb, R. (2014). Acoustic-phonetic parameters in the perception of an accent. In J. M. Levis and A. Moyer (eds.), Social Dynamics in Second Language Accent. Berlin: De Gruyter Mouton.
Hayward, R. J. (1991). Tone and accent in the Qafar noun. York Papers in Linguistics 15, 117–137.
Hayward, R. J. (2000). Afroasiatic. In B. Heine and D. Nurse (eds.), African Languages: An Introduction, 74–98. Cambridge: Cambridge University Press.
Hayward, R. J. (2003). Omotic: The empty quarter of Afroasiatic Linguistics. In J. Lecarme (ed.), Research in Afroasiatic Grammar II: Selected Papers from the Fifth Conference on Afroasiatic Languages, Paris, 2000, 241–261. Amsterdam: John Benjamins.
Hayward, R. J. (2006). The OHO constraint. In F. K. Erhard Voeltz (ed.), Studies in African Linguistic Typology, vol. 64, 155–169. Amsterdam: John Benjamins.
He, A. X., and J. Lidz (2017). Verb learning in 14- and 18-month-old English-learning infants. Language Learning and Development 13(3), 335–356.
Headland, E. (1994). Diccionario bilingüe Uw Cuwa (Tunebo)—Español, Español—Uw Cuwa (Tunebo): Con una gramática Uw Cuwa (Tuneba). Bogotá: Instituto Lingüístico de Verano.
Headland, P., and E. Headland (1976). Fonología del tunebo (trans. L. E. Henríquez). In Sistemas fonológicos de idiomas Colombia 3, 17–26. Lomalinda, Colombia: Instituto Lingüístico de Verano.
Heath, J. (1980). Dhuwal (Arnhem Land) Texts on Kinship and Other Subjects: With Grammatical Sketch and Dictionary. Sydney: Oceanic Linguistics.
Heath, J. (1999). A Grammar of Koyra Chiini. Berlin: Walter de Gruyter.
Heath, J. (2008). A Grammar of Jamsay. Berlin: Mouton de Gruyter.
Heath, J. (2011). A Grammar of Tamashek (Tuareg of Mali). Berlin: Walter de Gruyter.
Heath, J. G. (1984). Functional Grammar of Nunggubuyu. Canberra: Australian Institute of Aboriginal Studies.
Hedberg, N. (2013). Multiple focus and cleft sentences. In K. Hartmann and T. Veenstra (eds.), Cleft Structures (Linguistik Aktuell), 227–250. Amsterdam: John Benjamins.
Hedberg, N., and J. M. Sosa (2008). The prosody of topic and focus in spontaneous English dialogue. In C. Lee, M. Gordon, and D. Büring (eds.), Topic and Focus: Cross-Linguistic Perspectives on Meaning and Intonation, 101–120. Dordrecht: Springer.
Hedberg, N., J. M. Sosa, and L. Fadden (2004). Meanings and configurations of questions in English. In Proceedings of Speech Prosody 2, 309–312, Nara.
Hedberg, N., J. M. Sosa, E. Gürgülü, and M. Mameni (2010). Prosody and pragmatics of wh-interrogatives. In Proceedings of the 2010 Annual Conference of the Canadian Linguistic Association, Montreal.
Heeren, W. F. L., and V. J. van Heuven (2014). The interaction of lexical and phrasal prosody in whispered speech. Journal of the Acoustical Society of America 136(6), 3272–3289.
Heeren, W. F. L. (2015). Coding pitch differences in voiceless fricatives: Whispered relative to normal speech. Journal of the Acoustical Society of America 138, 3427–3438.
Heffner, C., L. C. Dilley, J. D. McAuley, and M. A. Pitt (2013). When cues combine: How distal and proximal acoustic cues are integrated in word segmentation. Language and Cognitive Processes 28, 1275–1302.
Heffner, C. C., and L. R. Slevc (2015). Prosodic structure as a parallel to musical structure. Frontiers in Psychology 6, 1962, 1–14.
Heijmans, L. (1999). The representation of the Tongeren lexical tone contrast. Ms., LSA Linguistic Summer Institute.
Heilman, K. M., D. Bowers, L. Speedie, and B. Coslett (1984). Comprehension of affective and nonaffective prosody. Neurology 34, 917–921.
Heilman, K. M., R. Scholes, and R. T. Watson (1975). Auditory affective agnosia. Journal of Neurology, Neurosurgery, and Psychiatry 38, 69–72.
Heine, B. (1993). Ik Dictionary. Cologne: Rüdiger Köppe.
Heinz, J. (2007). The inductive learning of phonotactic patterns. PhD dissertation, University of California, Los Angeles.
Heinz, J., R. W. N. Goedemans, and H. van der Hulst (2016). Dimensions of Phonological Stress. Cambridge: Cambridge University Press.
Heldner, M. (2001). Spectral emphasis as an additional source of information in accent detection. In M. Bacchiani, J. Hirschberg, D. Litman, and M. Ostendorf (eds.), Proceedings of the Workshop on Prosody and Speech Recognition, 57–60, Red Bank, NJ.
Heldner, M., and E. Strangert (2001). Temporal effects of focus in Swedish. Journal of Phonetics 29, 329–361.
Helgason, P. (1993). On Coarticulation and Connected Speech Processes in Icelandic. Reykjavík: Málvísindastofnun Háskóla Íslands.
Hellmuth, S. (2007). The relationship between prosodic structure and pitch accent distribution: Evidence from Egyptian Arabic. The Linguistic Review 24(2–3), 289–314.
Hellmuth, S. (2011). Acoustic cues to focus and givenness in Egyptian Arabic. In Z. M. Hassan and B. Heselwood (eds.), Instrumental Studies in Arabic Phonetics (Current Issues in Linguistic Theory), 299–324. Amsterdam: John Benjamins.
Hellmuth, S. (2013). Phonology. In J. Owens (ed.), The Oxford Handbook of Arabic Linguistics, 45–70. Oxford: Oxford University Press.
Hellmuth, S. (2016). Explorations at the syntax-phonology interface in Arabic. In S. Davis and U. Soltan (eds.), Perspectives on Arabic Linguistics: Proceedings of the 27th Arabic Linguistics Symposium, Bloomington, Indiana, February 28th–March 2nd, 2013, 75–97. Amsterdam: John Benjamins.
Hellmuth, S. (2019). Prosodic variation. In U. Horesh and E. Al-Wer (eds.), The Routledge Handbook of Arabic Sociolinguistics, 169–184. London: Routledge.
Hellmuth, S., F. Kügler, and R. Singer (2007). Quantitative investigation of intonation in an endangered language. In P. Austin, O. Bond, and D. Nathan (eds.), Proceedings of the Conference on Language Documentation and Linguistic Theory, 123–132. London: School of Oriental and African Studies.
Hellmuth, S., N. Louriz, B. Chlaihani, and R. Almbark (2015). F0 peak alignment in Moroccan Arabic polar questions. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Hellwig, B. (2011). A Grammar of Goemai (Mouton Grammar Library 51). Berlin: Mouton de Gruyter.
Helm-Estabrooks, N., and M. L. Albert (1991). Manual of Aphasia Therapy. Austin: Pro-Ed.
Helm-Estabrooks, N., M. Nicholas, and A. Morgan (1989). Melodic Intonation Therapy. Austin: Pro-Ed.
Helmholtz, H. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Braunschweig: F. Vieweg und Sohn.
Helson, H. (1930). The Tau Effect: An example of psychological relativity. Science 71(1847), 536–537.
Henderson, E. (1952). The main features of Cambodian pronunciation. Bulletin of the School of Oriental and African Studies 14, 453–476.
Henderson, E. (1965). The topography of certain phonetic and morphological features of Southeast Asian Languages. Lingua 15, 400–434.
Henderson, J. K. (1998). Topics in Eastern and Central Arrernte grammar. PhD dissertation, University of Western Australia.
Henderson, R. (2012). Morphological alternations at the intonational phrase edge. Natural Language and Linguistic Theory 30(3), 741–787.
Hendriks, P. (2005). Asymmetries in the acquisition of contrastive stress. Paper presented at the workshop on Contrast, Information Structure and Intonation, Stockholm.
Henry, M., J. D. McAuley, and M. Zaleha (2009). Evaluation of an imputed pitch velocity model of the auditory Tau Effect. Attention, Perception, and Psychophysics 71(6), 1399–1413.
Hérault, G. (1978). Éléments de grammaire adioukrou. Abidjan: Institut de Linguistique Appliquée.
Herman, R. (1996). Final lowering in Kipare. Phonology 13, 171–196.
Herman, R., M. E. Beckman, and K. Honda (1996). Subglottal pressure and final lowering in English. In Proceedings of the 4th International Conference on Spoken Language Processing, 145–148, Philadelphia.
Hermans, B. (2011). The representation of word stress. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 2, 980–1002. Oxford: Wiley Blackwell.
Hermans, B. (2013). Phonological features of Limburgian dialects. In F. Hinskens and J. Taeldeman (eds.), Language and Space: An International Handbook of Linguistic Variation: Vol. 3. Dutch, 336–355. Berlin: De Gruyter Mouton.
Hermans, B., and F. Hinskens (2010). The phonological representation of the Limburg tonal accents. In J. E. Schmidt, E. Glaser, and N. Frey (eds.), Dynamik des Dialekts: Wandel und Variation, 101–117. Stuttgart: Steiner.
Herment, S., N. Ballier, E. Delais-Roussarie, and A. Tortel (2014). Modelling interlanguage intonation: The case of questions. In Proceedings of Speech Prosody 7, 492–496, Dublin.
Herment, S., A. Loukina, and A. Tortel (2012). The AixOx corpus. Retrieved 18 May 2020 from http://sldr.org/sldr000784/fr.
Hermes, A., D. Mücke, and A. Bastian (2017). The variability of syllable patterns in Tashlhiyt Berber and Polish. Journal of Phonetics 64, 127–144.
Hermes, D. J., and J. C. van Gestel (1991). The frequency scale of speech intonation. Journal of the Acoustical Society of America 90, 97–102.
Hernández Mendoza, F. (2017). Tono y fonología segmental en el triqui de chicahuaxtla. PhD dissertation, National Autonomous University of Mexico.
Herrera Zendejas, E. (1993). Palabras, estratos y representaciones: Temas de la fonología en zoque. PhD dissertation, Colegio de México.
Herrera Zendejas, E. (2011). Peso silábico y patrón acentual en Huasteco. In K. Shklovsky, P. M. Pedro, and J. Coon (eds.), Proceedings of Formal Approaches to Mayan Linguistics (FAMLi), 135–146. Cambridge, MA: MIT Working Papers in Linguistics.
Herrera Zendejas, E. (2014). Mapa fónico de las lenguas mexicanas: Formas sonoras 1 y 2 (2nd ed.). Mexico: El Colegio de México.
Herring, S. (1990). Information structure as a consequence of word order type. In Proceedings of the 16th Meeting of the Berkeley Linguistics Society, 163–174, Berkeley.
Herrmann, A. (2015). The marking of information structure in German Sign Language. Lingua 165, 277–297.
Herrmann, A., and M. Steinbach (2013). Nonmanuals in Sign Language. Amsterdam: John Benjamins.
Herschensohn, J., and M. Young-Scholten (eds.) (2013). The Cambridge Handbook of Second Language Acquisition. Cambridge: Cambridge University Press.
Hertel, T. J. (2003). Lexical and discourse factors in the second language acquisition of Spanish word order. Second Language Research 19(4), 273–304.
Hertrich, I., and H. Ackermann (1993). Acoustic analysis of speech prosody in Huntington’s and Parkinson’s disease: A preliminary report. Clinical Linguistics and Phonetics 7, 285–297.
Herzog, G. (1934). Speech-melody and primitive music. Musical Quarterly 20(4), 452–466.
Hetzron, R. (1997a). Outer South Ethiopic. In R. Hetzron (ed.), The Semitic Languages, 535–549. London: Routledge.
Hetzron, R. (1997b). Awngi phonology. In A. Kaye (ed.), Phonologies of Asia and Africa, vol. 1, 477–491. Winona Lake, IN: Eisenbrauns.
Hickok, G., and D. Poeppel (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8(5), 393–402.
Higashikawa, M., and F. D. Minifie (1999). Acoustical-perceptual correlates of ‘whisper pitch’ in synthetically generated vowels. Journal of Speech, Language, and Hearing Research 42, 583–591.
Higgins, M. B., E. A. McCleary, A. E. Carney, and L. Schulte (2003). Longitudinal changes in children’s speech and voice physiology after cochlear implantation. Ear and Hearing 24(1), 48–70.
Hildebrandt, K. A. (2007). Prosodic and grammatical domains in Limbu. Himalayan Linguistics 8, 1–34.
Hillenbrand, J., and R. A. Houde (1996). Role of f0 and amplitude in the perception of intervocalic glottal stops. Journal of Speech and Hearing Research 39, 1182–1190.
Hilton, K. (2016). The perception of overlapping speech: Effects of speaker prosody and listener attitudes. In INTERSPEECH 2016, 1260–1264, San Francisco.
Himmelmann, N. P. (2010). Notes on Waima’a intonation. In M. Ewing and M. Klamer (eds.), Typological and Areal Analyses: Contributions from East Nusantara, 47–69. Canberra: Pacific Linguistics.
Himmelmann, N. P. (2018). Some preliminary observations on prosody and information structure in Austronesian languages of Indonesia and East Timor. In S. Riesberg, A. Shiohara, and A. Utsumi (eds.), A Cross-Linguistic Perspective on Information Structure in Austronesian Languages, 347–374. Berlin: Language Science Press.
Hincks, R., and J. Edlund (2009). Promoting increased pitch variation in oral presentations with transient visual feedback. Language Learning and Technology 13(3), 32–50.
Hinton, L., G. Buckley, M. Kramer, and M. Meacham (1991). Preliminary analysis of Chalcatongo Mixtec tone. Occasional Papers on Linguistics 16, 147–155.
Hintz, D. (2006). Stress in South Conchucos Quechua: A phonetic and phonological study. International Journal of American Linguistics 72, 477–521.
Hirata, Y. (2004). Computer assisted pronunciation training for native English speakers learning Japanese pitch and durational contrasts. Computer Assisted Language Learning 17(3–4), 357–376.
Hirayama, T. (1951). Kyūshū Hōgen Onchō no Kenkyū. Tokyo: Gakkai-no-shishin-sha.
Hirose, H. (2010). Investigating the physiology of laryngeal structures. In W. J. Hardcastle, J. Laver, and F. E. Gibbon (eds.), The Handbook of Phonetic Sciences (2nd ed.), 130–152. Chichester: Wiley Blackwell.
Hirose, K., and J. Tao (2015). Speech Prosody in Speech Synthesis: Modeling and Generation of Prosody for High Quality and Flexible Speech Synthesis. Berlin: Springer.
Hirschberg, J. (1993). Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence 63(1–2), 305–340.
Hirschberg, J. (2004). Pragmatics and intonation. In L. R. Horn and G. Ward (eds.), The Handbook of Pragmatics, 515–537. London: Blackwell.
Hirschberg, J., and C. Avesani (2000). Prosodic disambiguation in English and Italian. In A. Botinis (ed.), Intonation: Analysis, Modelling and Technology, 87–96. Dordrecht: Kluwer Academic.
Hirschberg, J., S. Beňuš, J. M. Brenier, F. Enos, S. Friedman, S. Gilman, C. Girand, M. Graciarena, A. Kathol, L. Michaelis, B. Pellom, E. Shriberg, and A. Stolcke (2005). Distinguishing deceptive from non-deceptive speech. In INTERSPEECH 2005, 1833–1836, Lisbon.
Hirschberg, J., A. Gravano, A. Nenkova, E. Sneed, and G. Ward (2007). Intonational overload: Uses of the Downstepped (H* !H* L-L%) contour in read and spontaneous speech. In J. Cole and J. I. Hualde (eds.), Laboratory Phonology 9, 455–482. Berlin: Mouton de Gruyter.
Hirschberg, J., and G. Ward (1992). The influence of pitch range, duration, amplitude and spectral features on the interpretation of the rise-fall-rise intonation contour in English. Journal of Phonetics 20, 241–251.
Hirschberg, J., and G. Ward (1995). The interpretation of the high-rise question contour in English. Journal of Pragmatics 24, 407–412.
Hirsh-Pasek, K., D. G. Kemler Nelson, P. W. Jusczyk, K. W. Cassidy, B. Druss, and L. Kennedy (1987). Clauses are perceptual units for young infants. Cognition 26, 269–286.
Hirson, A., J. P. French, and D. Howard (1995). Speech fundamental frequency over the telephone and face-to-face: Some implications for forensic phonetics. In J. Windsor-Lewis (ed.), Studies in General and English Phonetics: Essays in Honour of Professor J. D. O’Connor, 230–240. London: Routledge.
Hirst, D. (1987). La représentation linguistique des systèmes prosodiques: Une approche cognitive. Habilitation thesis, University of Provence.
Hirst, D. (1998). Intonation of British English. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 56–77. Cambridge: Cambridge University Press.
Hirst, D. (2009). The rhythm of text and the rhythm of utterances: From metrics to models. In INTERSPEECH 2009, 1519–1522, Brighton.
Hirst, D., and A. Di Cristo (1998). A survey of intonation systems. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 1–44. Cambridge: Cambridge University Press.
Hirst, D., and R. Espesser (1993). Automatic modelling of fundamental frequency using a quadratic spline function. Travaux de l’Institut de Phonétique d’Aix 15, 71–85.
Hitchcock, L., and S. Greenberg (2001). Vowel height is intimately associated with stress accent in spontaneous American English discourse. In Eurospeech 2001, 79–82, Aalborg.
Hixon, T., and J. Hoit (2005). Evaluation and Management of Speech Breathing Disorders: Principles and Methods. Tucson: Redington Brown.
Hjalmarsson, A. (2011). The additive effect of turn-taking cues in human and synthetic voice. Speech Communication 53(1), 23–35.
Ho, A. T. (1976). The acoustic variation of Mandarin tones. Phonetica 33, 353–367.
Ho, A. T. (1977). Intonation variation in a Mandarin sentence for three expressions: Interrogative, exclamatory, and declarative. Phonetica 34, 446–457.
Ho, W. S. V. (2006). The tone-melody interface of popular songs written in tone languages. In Proceedings of the 9th International Conference on Music Perception and Cognition, 1414–1422, Bologna.
Ho, W. S. V. (2010). A phonological study of the tone-melody correspondence in Cantonese pop music. PhD dissertation, University of Hong Kong.
Hobbs, J. R. (1990). The Pierrehumbert-Hirschberg theory of intonational meaning made simple: Comments on Pierrehumbert and Hirschberg. In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication, 313–323. Cambridge, MA: MIT Press.
Hoberman, R. D. (2008). Pausal forms. In K. Versteegh, M. Eid, A. Elgibali, M. Woidich, and A. Zaborski (eds.), Encyclopedia of Arabic Language and Linguistics, 564–570. Amsterdam: Brill.
Hochberg, J. (1988). Learning Spanish stress: Developmental and theoretical perspectives. Language 64, 683–706.
Hock, H. H. (2015). Prosody and dialectology of tonal shifts in Lithuanian and their implications. In P. Arkadiev, A. Holvoet, and B. Wiemer (eds.), Contemporary Approaches to Baltic Linguistics, 111–138. Berlin: De Gruyter Mouton.
Hoequist, C. (1983). Syllable duration in stress-, syllable- and mora-timed languages. Phonetica 40(3), 203–237.
Hofling, C. (2000). Itzaj Maya Grammar. Salt Lake City: University of Utah Press.
Hofling, C. (2011). Mopan Maya-Spanish-English Dictionary. Salt Lake City: University of Utah Press.
Hofstede, G. (2001). Culture’s Consequences: Comparing Values, Behaviors, Institutions, and Organizations across Nations (2nd ed.). Thousand Oaks: Sage.
Hognestad, J. K. (2012). Tonelagsvariasjon i norsk: Synkrone og diakrone aspekter, med særlig fokus på vestnorsk. PhD dissertation, University of Agder.
Höhle, B. (2009). Bootstrapping mechanisms in first language acquisition. Linguistics 47(2), 359–382.
Höhle, B., R. Bijeljac-Babic, B. Herold, J. Weissenborn, and T. Nazzi (2009). Language specific prosodic preferences during the first half year of life: Evidence from German and French infants. Infant Behavior and Development 32(3), 262–274.
Höhle, B., J. Weissenborn, D. Kiefer, A. Schulz, and M. Schmitz (2004). Functional elements in infants’ speech processing: The role of determiners in the syntactic categorization of lexical elements. Infancy 5(3), 341–353.
Holbrook, A., and H.-T. Lu (1969). A study of intelligibility in whispered Chinese. Speech Monographs 36(4), 464–466.
Hollenbach, B. E. (1984). The phonology and morphology of tone and laryngeals in Copala Trique. PhD dissertation, University of Arizona.
Holler, T., P. Campisi, J. Allegro, N. K. Chadha, R. V. Harrison, B. Papsin, and K. Gordon (2010). Abnormal voicing in children using cochlear implants. Archives of Otolaryngology: Head and Neck Surgery 136(1), 17–21.
Holt, C., K. Demuth, and I. Yuen (2015). The use of prosodic cues in sentence processing by prelingually deaf users of cochlear implants. Ear and Hearing 38(2), e101–e108.
Holt, L. L. (2006). The mean matters: Effects of statistically defined nonspeech spectral distributions on speech categorization. Journal of the Acoustical Society of America 120, 2801–2817.
Holton, G. (2005). Pitch, tone, and intonation in Tanacross. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 249–275. Amsterdam: John Benjamins.
Holzgrefe-Lang, J., C. Wellmann, C. Petrone, R. Räling, H. Truckenbrodt, B. Höhle, and I. Wartenburger (2016). How pitch change and final lengthening cue boundary perception in German: Converging evidence from ERPs and prosodic judgements. Language, Cognition and Neuroscience, 3798(April), 1–17.
Hombert, J.-M. (1976). Phonetic explanation of the development of tones from prevocalic consonants. UCLA Working Papers in Phonetics 33, 23–39.
Hombert, J.-M. (1978). Consonant types, vowel quality, and tone. In V. A. Fromkin (ed.), Tone: A Linguistic Survey, 77–107. New York: Academic Press.
Hombert, J.-M. (1984). Les systèmes tonals des langues africaines: Typologie et diachronie. Pholia 1, 113–164.
Hombert, J.-M., J. J. Ohala, and E. G. Ewan (1979). Phonetic explanations for the development of tones. Language 55, 37–58.
Hönig, F. (2016). Automatic assessment of prosody in second language learning. PhD dissertation, University of Erlangen-Nuremberg.
Hönig, F., A. Batliner, E. Nöth, S. Schnieder, and J. Krajewski (2014a). Acoustic-prosodic characteristics of sleepy speech: Between performance and interpretation. In Proceedings of Speech Prosody 7, 864–868, Dublin.
Hönig, F., A. Batliner, E. Nöth, S. Schnieder, and J. Krajewski (2014b). Automatic modelling of depressed speech: Relevant features and relevance of gender. In INTERSPEECH 2014, 1248–1252, Singapore.
Hönig, F., A. Batliner, K. Weilhammer, and E. Nöth (2010). Automatic assessment of non-native prosody for English as L2. In Proceedings of Speech Prosody 5, Chicago.
Hönig, F., T. Bocklet, K. Riedhammer, A. Batliner, and E. Nöth (2012). The automatic assessment of non-native prosody: Combining classical prosodic analysis with acoustic modelling. In INTERSPEECH 2012, 823–826, Portland.
Honorof, D., and D. H. Whalen (2005). Perception of pitch location within a speaker’s f0 range. Journal of the Acoustical Society of America 117, 2193–2200.
Hood, R. B., and R. F. Dixon (1969). Physical characteristics of speech rhythm of deaf and normal-hearing speakers. Journal of Communication Disorders 2(1), 20–28.
Hoogshagen, S. (1959). Three contrastive vowel lengths in Mixe. Zeitschrift für Phonetik und allgemeine Sprachwissenschaft 12, 111–115.
Hoole, P. (2014). Recent work on EMA methods at IPS Munich. Retrieved 22 May 2020 from https://www.phonetik.uni-muenchen.de/~hoole/articmanual/ag501/carstens_workshop_summary_issp2014.pdf.
Hoole, P., and A. Zierdt (2010). Five-dimensional articulography. In B. Maassen and P. van Lieshout (eds.), Speech Motor Control: New Developments in Basic and Applied Research, 331–349. Oxford: Oxford University Press.
Hopyan-Misakyan, T., K. Gordon, M. Dennis, and B. Papsin (2009). Recognition of affective speech prosody and facial affect in deaf children with unilateral right cochlear implants. Child Neuropsychology 15, 136–148.
Hore, M. (1981). Syllable length and stress in Nunggubuyu. In B. Waters (ed.), Australian Phonologies: Collected Papers 5, 1–62. Darwin: Summer Institute of Linguistics.
Horgues, C. (2013). French learners of L2 English: Intonation boundaries and the marking of lexical stress. Research in Language 11(1), 41–56.
Hornby, P. A., and W. A. Hass (1970). Use of contrastive stress by preschool children. Journal of Speech and Hearing Research 13(2), 395–399.
Horvath, F. S. (1973). Verbal and nonverbal clues to truth and deception during polygraph examinations. Journal of Police Science and Administration 1, 138–152.
Horwood, G. (1999). Anti-faithfulness and subtractive morphology. Ms., Rutgers University.
Hosseini, A. (2014). The phonology and phonetics of prosodic prominence in Persian. PhD dissertation, University of Tokyo.
Hostetler, R., and C. Hostetler (1975). A tentative description of Tinputz phonology: Phonologies of five Austronesian languages. Workpapers in Papua New Guinea Languages 13, 5–44.
Hough, M. (2010). Melodic intonation therapy and aphasia: Another variation on a theme. Aphasiology 24, 775–786.
House, A., D. Rowe, and P. J. Standen (1987). Affective prosody in the reading voice of stroke patients. Journal of Neurology, Neurosurgery, and Psychiatry 50, 910–912.
House, D. (1990). Tonal Perception in Speech. Lund: Lund University Press.
House, D. (1996). Differential perception of tonal contours through the syllable. In Proceedings of the 4th International Conference on Spoken Language Processing, 2048–2051, Philadelphia.
House, D. (1999). Perception of pitch and tonal timing: Implications for mechanisms of tonogenesis. In Proceedings of the 14th International Congress of Phonetic Sciences, 1823–1826, San Francisco.
House, D. (2003). Perception of tone with particular reference to temporal alignment. In S. Kaji (ed.), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: Historical Development, Phonetics of Tone, and Descriptive Studies, 203–216. Tokyo: Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies.
House, D. (2004). Final rises and Swedish question intonation. In Proceedings of Fonetik 2004, 56–59, Stockholm.
House, D., A. Karlsson, J.-O. Svantesson, and D. Tayanin (2009). The phrase-final accent in Kammu: Effects of tone, focus and engagement. In INTERSPEECH 2009, 2439–2442, Brighton.
House, J. (2006). Constructing a context with intonation. Journal of Pragmatics 38(10), 1542–1558.
Houston, D. M., P. W. Jusczyk, C. Kuijpers, R. Coolen, and A. Cutler (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin and Review 7(3), 504–509.
Houtsma, A. J. M., and J. L. Goldstein (1972). The central origin of the pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of America 51, 520–529.
Houtsma, A. J. M., and J. Smurzynski (1990). Pitch identification and discrimination for complex tones with many harmonics. Journal of the Acoustical Society of America 87, 304–310.
Howie, J. M. (1976). Acoustical Studies of Mandarin Vowels and Tones. Cambridge: Cambridge University Press.
Hsieh, H.-I. (1970). The psychological reality of tone sandhi rules in Taiwanese. In Proceedings of the 6th Meeting of the Chicago Linguistic Society, 489–503. Chicago: Chicago Linguistic Society.
Hsieh, H.-I. (1975). How generative is phonology? In E. F. Koerner (ed.), The Transformational-Generative Paradigm and Modern Linguistic Theory, 109–144. Amsterdam: John Benjamins.
Hsieh, H.-I. (1976). On the unreality of some phonological rules. Lingua 38, 1–19.
Hsieh, H. (2016). Prosodic indicators of phrase structure in Tagalog transitive sentences. In H. Nomoto, T. Miyauchi, and A. Shiohara (eds.), Proceedings of AFLA 23rd Meeting of the Austronesian Formal Linguistics Association, 111–122. Canberra: Asia-Pacific Linguistics.
Hsu, C.-S., and S.-A. Jun (1998). Prosodic strengthening in Taiwanese: Syntagmatic or paradigmatic? UCLA Working Papers in Phonetics 96, 69–89.
Hua, Z. (2002). Phonological Development in Specific Contexts: Studies of Chinese-Speaking Children (vol. 3). Clevedon: Multilingual Matters.
Hua, Z., and B. Dodd (2000). The phonological acquisition of Putonghua (modern Standard Chinese). Journal of Child Language 27(1), 3–42.
Hualde, J. I., G. Elordieta, and A. Elordieta (1994). The Basque Dialect of Lekeitio. Bilbao: Diputación Foral de Gipuzkoa.
Hualde, J. I. (1997). Euskararen Azentuerak. Bilbao: University of the Basque Country Press.
Hualde, J. I. (1998). A gap filled: Postpostinitial accent in Azkoitia Basque. Linguistics 36, 99–117.
Hualde, J. I. (1999). Basque accentuation. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 947–993. Berlin: Mouton de Gruyter.
Hualde, J. I. (2003a). Accentuation. In J. I. Hualde and J. Ortiz de Urbina (eds.), A Grammar of Basque, 65–71. Berlin: Mouton de Gruyter.
Hualde, J. I. (2003b). From phrase-final to post-initial accent in Western Basque. In P. Fikkert and H. Jacobs (eds.), Development in Prosodic Systems, 249–281. Berlin: Mouton de Gruyter.
Hualde, J. I. (2005). The Sounds of Spanish. Cambridge: Cambridge University Press.
Hualde, J. I. (2006). Stress removal and stress addition in Spanish. Journal of Portuguese Linguistics 5(2)–6(1), 59–89.
Hualde, J. I. (2007). Historical convergence and divergence in Basque accentuation. In T. Riad and C. Gussenhoven (eds.), Tunes, Vol. 1: Typological Studies in Word and Sentence Prosody, 291–322. Berlin: Mouton de Gruyter.
References 771 Hualde, J. I. (2010). Secondary stress and stress clash in Spanish. In M. Ortega-Llebaria (ed.), Selected Proceedings of the 4th Conference on Laboratory Approaches to Spanish Phonology, 11–19. Somerville, MA: Cascadilla Press. Hualde, J. I. (2012). Two Basque accentual systems and word-prosodic typology. Lingua 122, 1335–1351. Hualde, J. I. (2013). Los sonidos del español. Cambridge: Cambridge University Press. (Revised Spanish-language edition of Hualde 2005.) Hualde, J. I., G. Elordieta, I. Gaminde, and R. Smiljanić (2002). From pitch-accent to stress-accent in Basque. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 547–584. Berlin: Mouton de Gruyter. Hualde, J. I., O. Lujanbio, and F. Torreira (2008). Stress and tone in Goizueta Basque. Journal of the International Phonetic Association 38, 1–24. Hualde, J. I., and Ortiz de Urbina, J. (2003). A Grammar of Basque. Berlin: Mouton de Gruyter. Hualde, J. I., and P. Prieto (2015). Intonational variation in Spanish: European and American varieties. In S. Frota and P. Prieto (eds.), Intonation in Romance, 350–391. Oxford: Oxford University Press. Huang, B. H., and S.-A. Jun (2011). The effect of age on the acquisition of second language prosody. Language and Speech 54(Pt 3), 387–414. Huang, H.-C. J. (2018). The nature of pretonic weak vowels in Squliq Atayal. Oceanic Linguistics 57(2), 265–288. Huang, T., and K. Johnson (2010). Language specificity in speech perception: Perception of Mandarin tones by native and nonnative listeners. Phonetica 67(4), 243–267. Hübscher, I., N. Esteve-Gibert, A. Igualada, and P. Prieto (2017). Intonation and gesture as bootstrapping devices in speaker uncertainty. First Language 37(1), 24–41. Hübscher, I., M. Garufi, and P. Prieto (2018). Preschoolers use prosodic mitigation strategies to encode polite stance. In Proceedings of Speech Prosody 9, 255–259, Poznań. Hübscher, I., and P. Prieto (2019). 
Gestural and prosodic development act as sister systems and jointly pave the way for children’s sociopragmatic development. Frontiers in Psychology 10, 1259. Hübscher, I., L. Vincze, and P. Prieto (2019). Children’s signaling of their uncertain knowledge state: Prosody, face and body cues come first. Language Learning and Development 15(4), 366–389. Hübscher, I., L. Wagner, and P. Prieto (2016). Young children’s sensitivity to polite stance expressed through audiovisual prosody in requests. In Proceedings of Speech Prosody 8, 897–901, Boston. Hudgins, C. V., and F. C. Numbers (1942). An investigation of intelligibility of speech of the deaf. Genetic Psychology Monographs 25(1–2), 189–292. Hudson, A. I., and A. Holbrook (1981). A study of reading fundamental vocal frequency of young black adults. Journal of Speech and Hearing Research 24, 197–201. Hudson, A. I., and A. Holbrook (1982). Fundamental frequency characteristics of young black adults: Spontaneous speaking and oral reading. Journal of Speech and Hearing Research 25, 25–28. Hudson, G. (1997). Amharic and Argobba. In R. Hetzron (ed.), The Semitic Languages, 457–485. London: Routledge. Huffman, F. (1976). The register problem in fifteen Mon-Khmer languages. Oceanic Linguistics special publication ‘Austroasiatic Studies’ Pt 1, 575–589. Huffman, M. (1987). Measures of phonation type in Hmong. Journal of the Acoustical Society of America 81, 495–504. Hulstaert, G. (1934). Grammaire du Lomongo, Première Partie: Phonologie. Tervuren, Belgium: Musée Royal de l’Afrique Centrale. Hunter, G. G., and E. V. Pike (1969). The phonology and tone sandhi of Molinos Mixtec. Linguistics 47, 24–40. Hur, W. (1985). Gyengsang do pangen uy sengco. Korean Journal 25, 19–32. Hustad, K. C., K. Gorton, and J. Lee (2010). Classification of speech and language profiles in 4-year-old children with cerebral palsy: A prospective preliminary study. Journal of Speech, Language, and Hearing Research 53, 1496–1513.
Huttenlauch, C., I. Feldhausen, and B. Braun (2018). The purpose shapes the vocative: Prosodic realisation of Colombian Spanish vocatives. Journal of the International Phonetic Association 48, 33–56.

Hyde, B. (2011). Extrametricality and nonfinality. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 2, 1027–1051. Oxford: Wiley Blackwell. Hyde, K. L., I. Peretz, and R. J. Zatorre (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639. Hyman, L. M. (1976). Phonologization. In A. Juilland (ed.), Linguistic Studies Presented to J. H. Greenberg, 407–418. Saratoga: Anma Libri. Hyman, L. M. (1977). On the nature of linguistic stress. In L. M. Hyman (ed.), USC Studies in Stress and Accent, 37–82. Los Angeles: USC Linguistics Department. Hyman, L. M. (1979a). A reanalysis of tonal downstep. Journal of African Languages and Linguistics 1, 9–29. Hyman, L. M. (1979b). Tonology of the Babanki noun. Studies in African Linguistics 10, 159–178. Hyman, L. M. (1981). Noni Grammatical Structure (Southern California Occasional Papers in Linguistics 9). Los Angeles: University of Southern California. Hyman, L. M. (1985). A Theory of Phonological Weight. Dordrecht: Foris. Hyman, L. M. (1986a). The representation of multiple tone heights. In K. Bogers, H. van der Hulst, and M. Mous (eds.), The Phonological Representation of Suprasegmentals, 109–152. Dordrecht: Foris. Hyman, L. M. (1986b). Downstep deletion in Aghem. In D. Odden (ed.), Current Approaches to African Linguistics, vol. 4, 209–222. Dordrecht: Foris. Hyman, L. M. (1987). Prosodic domains in Kukuya. Natural Language and Linguistic Theory 5, 311–333. Hyman, L. M. (1988). Syllable structure constraints on tonal contours. Linguistique Africaine 1, 49–60. Hyman, L. M. (1989). Accent in Bantu: An appraisal. Studies in the Linguistic Sciences 19, 115–134. Hyman, L. M. (2001). Tone systems. In M. Haspelmath, E. König, W. Oesterreicher, and W. Raible (eds.), Language Typology and Language Universals: An International Handbook, 1367–1380. Berlin: Walter de Gruyter. Hyman, L. M. (2003).
African languages and phonological theory. GLOT International 7(6), 153–163. Hyman, L. M. (2005). Initial vowel and prefix tone in Kom: Related to the Bantu augment? In K. Bostoen and J. Maniacky (eds.), Studies in African Comparative Linguistics with special focus on Bantu and Mande: Essays in honour of Y. Bastin and C. Grégoire, 313–341. Cologne: Rüdiger Köppe. Hyman, L. M. (2006). Word-prosodic typology. Phonology 23, 225–257. Hyman, L. M. (2007). Universals of tone rules: 30 years later. In T. Riad and C. Gussenhoven (eds.), Tones and Tunes: Studies in Word and Sentence Prosody, 1–34. Berlin: Mouton de Gruyter. Hyman, L. M. (2008). Directional asymmetries in the morphology and phonology of words, with special reference to Bantu. Linguistics 46, 309–349. Hyman, L. M. (2009). How (not) to do phonological typology: The case of pitch-accent. Language Sciences 31, 213–238. Hyman, L. M. (2010a). Kuki-Thaadow: An African tone system in Southeast Asia. In F. Floricic (ed.), Essais de typologie et de linguistique générale, 31–51. Lyon: Les Presses de l’Ecole Normale Supérieure. Hyman, L. M. (2010b). Do tones have features? In J. A. Goldsmith, E. V. Hume, and L. Wetzels (eds.), Tones and Features: Phonetic and Phonological Perspectives, 50–80. Berlin: De Gruyter Mouton. Hyman, L. M. (2010c). Affixation by place of articulation: The case of Tiene. In M. Cysouw and J. Wohlgemuth (eds.), Rara and Rarissima: Collecting and Interpreting Unusual Characteristics of Human Languages, 145–184. Berlin: Mouton de Gruyter. Hyman, L. M. (2011). Tone: Is it different? In J. A. Goldsmith, J. Riggle, and A. Yu (eds.), The Handbook of Phonological Theory (2nd ed.), 197–239. Malden: Wiley Blackwell. Hyman, L. M. (2013). Penultimate lengthening in Bantu. In B. Bickel, L. A. Grenoble, D. A. Peterson, and A. Timberlake (eds.), Language Typology and Historical Contingency: In Honor of Johanna Nichols, 309–330. Amsterdam: John Benjamins. Hyman, L. M. (2014c). Do all languages have word accent? 
In H. van der Hulst (ed.), Word Stress: Theoretical and Typological Issues, 56–82. Cambridge: Cambridge University Press. Hyman, L. M. (2014a). Tonal melodies in the Lulamogi verb. Africana Linguistica 20, 163–180. Hyman, L. M. (2014b). How autosegmental is phonology? The Linguistic Review 31, 363–400. Hyman, L. M. (2015). Does Gokana really have syllables? A postscript to Hyman (2011). Phonology 32, 303–306.

Hyman, L. M. (2016a). Morphological tonal assignments in conflict: Who wins? In E. L. Palancar and J. L. Léonard (eds.), Tone and Inflection: New Facts and Perspectives, 15–39. Berlin: De Gruyter Mouton. Hyman, L. M. (2016b). Lexical vs. grammatical tone: Sorting out the differences. In Proceedings of the 5th International Symposium on Tonal Aspects of Languages, 6–11, Buffalo, NY. Hyman, L. M. (2016c). Amazonia and the typology of tone systems. In H. Avelino, M. Coller, and L. Wetzels (eds.), The Phonetics and Phonology of Laryngeal Features in Native American Languages, 235–257. Leiden: Brill. Hyman, L. M. (2017). On reconstructing tone in Proto-Niger-Congo. In V. Vydrin and A. Lyakhovich (eds.), In the hot yellow Africa, 175–191. St. Petersburg: Nestor-Istoria. Hyman, L. M. (2018). The autosegmental approach to tone in Lusoga. In D. Brentari and J. Lee (eds.), Shaping Phonology, 47–69. Chicago: University of Chicago Press. Hyman, L. M. (2019a). Morphological tonal assignments in conflict: Who wins? In E. L. Palancar and J. L. Léonard (eds.), Tones and Inflections, 15–40. Berlin: De Gruyter Mouton. Hyman, L. M. (2019b). Positional prominence vs. word accent: Is there a difference? In R. Goedemans, J. Heinz, and H. van der Hulst (eds.), The Study of Word Stress and Accent: Theories, Methods and Data, 60–75. Cambridge: Cambridge University Press. Hyman, L. M., and E. R. Byarushengo (1984). A model of Haya tonology. In G. N. Clements and J. A. Goldsmith (eds.), Autosegmental Studies in Bantu Tone, 53–103. Dordrecht: Foris. Hyman, L. M., and W. R. Leben (2000). Suprasegmental processes. In G. Booij, C. Lehmann, and J. Mugdan (eds.), A Handbook on Inflection and Word Formation, 587–594. Berlin: De Gruyter. Hyman, L. M., and D. J. Magaji (1970). Essentials of Gwari Grammar (Institute of African Studies Occasional Publication 27). Ibadan: University of Ibadan Press. Hyman, L. M., and K. C. Monaka (2011). Tonal and non-tonal intonation in Shekgalagari. In S.
Frota, G. Elordieta, and P. Prieto (eds.), Prosodic Categories: Production, Perception and Comprehension, 267–289. Dordrecht: Springer. Hyman, L. M., and R. G. Schuh (1974). Universals of tone rules: Evidence from West Africa. Linguistic Inquiry 5, 81–115. Hyman, L. M., and M. Tadadjeu (1976). Floating tones in Mbam-Nkam. In L. M. Hyman (ed.), Studies in Bantu Tonology (Southern California Occasional Papers in Linguistics 3), 58–111. Los Angeles: University of Southern California. Hyman, L. M., and K. VanBik (2002). Tone and syllable structure in Hakha-Lai. Proceedings of the 28th Meeting of the Berkeley Linguistics Society, 15–28, Berkeley. Hyman, L. M., and K. VanBik (2004). Directional rule application and output problems in Hakha Lai tone. Language and Linguistics 5, 821–861. Igarashi, Y. (2015). Intonation. In H. Kubozono (ed.), Handbook of Japanese Phonetics and Phonology, 525–568. Berlin: De Gruyter Mouton. Iivonen, A. (1998). Intonation in Finnish. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 314–330. Cambridge: Cambridge University Press. Iivonen, A., and H. Harnud (2005). Acoustical comparison of the monophthong systems in Finnish, Mongolian and Udmurt. Journal of the International Phonetic Association 35(1), 59–71. Imai, K. (1998). Intonation and relevance. In R. Carston and S. Uchida (eds.), Relevance Theory: Applications and Implications (Pragmatics and Beyond New Series 37), 69–86. Amsterdam: John Benjamins. Imoto, K., Y. Tsubota, A. Raux, T. Kawahara, and M. Dantsuji (2002). Modeling and automatic detection of English sentence stress for computer-assisted English prosody learning system. In INTERSPEECH 2002, 749–752, Denver. Impey, A. (2013). Keeping in touch via cassette: Tracing Dinka songs from cattle camp to transnational audio-letter. Journal of African Cultural Studies 25, 197–210. INALI. (2015). Proyecto de indicadores sociolingüísticos de las lenguas indígenas nacionales. 
Instituto Nacional de Lenguas Indígenas. Retrieved 20 May 2020 from http://site.inali.gob.mx/Micrositios/estadistica_basica. Indjieva, E. (2009). Oirat Tones and Break Indices (O-ToBI): Intonational structure of the Oirat Language. PhD dissertation, University of Hawaiʻi at Manoa.

Ingram, D. (1995). The cultural basis of prosodic modifications to infants and children: A response to Fernald’s universalist theory. Journal of Child Language 22, 223–233. Ingram, J., and P. Mühlhäusler (2008). Norfolk Island-Pitcairn English: Phonetics and phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English: Vol. 3. The Pacific and Australasia, 267–291. Berlin: Mouton de Gruyter. Ingram, J., and T. A. T. Nguyễn (2006). Stress, tone and word prosody in Vietnamese compounds. In Proceedings of the 11th Australian International Conference on Speech Science and Technology, 193–198, Auckland. Ingram, R. M. (1978). Theme, rheme, topic, and comment in the syntax of American Sign Language. Sign Language Studies 20(1), 193–218. Inkelas, S. (1989a). Prosodic constituency in the lexicon. PhD dissertation, Stanford University. Inkelas, S. (1989b). Register tone and the phonological representation of downstep. In L. Tuller and I. Haik (eds.), Current Approaches to African Linguistics, vol. 6, 65–82. Dordrecht: Foris. Inkelas, S. (1998). The theoretical status of morphologically conditioned phonology: A case study of dominance effects. In G. Booij and J. van Marle (eds.), Yearbook of Morphology 1997, 121–155. Dordrecht: Springer. Inkelas, S. (1999). Exceptional stress-attracting suffixes in Turkish: Representations versus the grammar. In R. Kager, H. van der Hulst, and W. Zonneveld (eds.), The Prosody–Morphology Interface, 134–187. Cambridge: Cambridge University Press. Inkelas, S. (2014). The Interplay of Morphology and Phonology. Oxford: Oxford University Press. Inkelas, S., and W. R. Leben (1990). Where phonology and phonetics intersect: The case of Hausa intonation. In J. Kingston and M. E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, 17–34. Cambridge: Cambridge University Press. Inkelas, S., W. R. Leben, and M. Cobbler (1986). Lexical and phrasal tone in Hausa.
In Proceedings of the 17th Annual Meeting of the North East Linguistics Society, 327–341. Amherst: Graduate Linguistics Student Association. Inkelas, S., and C. O. Orgun (1998). Level (non)ordering in recursive morphology: Evidence from Turkish. In S. Lapointe, D. Brentari, and P. Farrell (eds.), Morphology and its Relation to Phonology and Syntax, 360–392. Stanford: CSLI. Inkelas, S., and D. Zec (1988). Serbo-Croatian pitch accent: The interaction of tone, stress, and intonation. Language 64, 227–248. Inkelas, S., and D. Zec (eds.) (1990). The Phonology-Syntax Connection. Stanford: CSLI. Inkelas, S., and C. Zoll (2005). Reduplication: Doubling in Morphology. Cambridge: Cambridge University Press. IPDSP (Institute of Phonetics and Digital Speech Processing, University of Kiel). (2009). [Figures for Kohler (2005).] Retrieved 30 April 2020 from http://www.ipds.uni-kiel.de/kjk/pub_exx/kk2005_2/kk_05.html. İpek, C. (2011). Phonetic realization of focus with no on-focus pitch range expansion in Turkish. In Proceedings of the 17th International Congress of Phonetic Sciences, 140–143, Hong Kong. İpek, C. (2015). The phonology and phonetics of Turkish intonation. PhD dissertation, University of Southern California. İpek, C., and S.-A. Jun (2013). Towards a model of intonational phonology of Turkish: Neutral intonation. Proceedings of Meetings on Acoustics 19, Montreal. Isačenko, A. V., and H. J. Schädlich (1970). A Model of Standard German Intonation. The Hague: Mouton. Ishchenko, O. (2015). Ukrainian in prosodic typology of world languages. In O. Novikova, P. Hilkes, and U. Schweier (eds.), Dialog der Sprachen, Dialog der Kulturen: Die Ukraine aus globaler Sicht – Internationale virtuelle Konferenz der Ukrainistik, 76–85. Munich: Otto Sagner. İşsever, S. (2003). Information structure in Turkish: The word order-prosody interface. Lingua 113(11), 1025–1053. İşsever, S. (2006). On the NSR and focus projection in Turkish. In S. Yağcıoğlu and C.
Değer (eds.), Advances in Turkish Linguistics: Proceedings of the 12th International Conference on Turkish Linguistics, 421–435, İzmir.

Ito, C. (2008). Historical development and analogical change in Yanbian Korean accent. Harvard Studies in Korean Linguistics 12, 165–178. Ito, C. (2014a). Compound tensification and laryngeal co-occurrence restrictions in Yanbian Korean. Phonology 31, 349–398. Ito, C. (2014b). Loanword accentuation in Yanbian Korean: A weighted-constraints analysis. Natural Language and Linguistic Theory 32, 537–592. Ito, C., and M. Kenstowicz (2017). Pitch accent in Korean. In Oxford Research Encyclopedias. DOI:10.1093/acrefore/9780199384655.013.242. Ito, J. (1990). Prosodic minimality in Japanese. In Proceedings of the 26th Meeting of the Chicago Linguistics Society, 213–239. Chicago: Chicago Linguistic Society. Ito, J., and A. Mester (1992/2003). Weak layering and word binarity. In T. Honma, M. Okazaki, T. Tabata, and S.-I. Tanaka (eds.), A New Century of Phonology and Phonological Theory: A Festschrift for Professor Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, 26–65. Tokyo: Kaitakusha. Ito, J., and A. Mester (1997). Sympathy theory and German truncations. In V. Miglio and B. Morén (eds.), University of Maryland Working Papers in Linguistics 5: Selected Phonology Papers from Hopkins Optimality Theory Workshop 1997 / University of Maryland Mayfest 1997, 117–139. College Park: University of Maryland. Ito, J., and A. Mester (2007). Prosodic adjunction in Japanese compounds. In Y. Miyamoto and M. Ochi (eds.), Formal Approaches to Japanese Linguistics: Proceedings of FAJL 4 (MIT Working Papers in Linguistics 55), 97–111. Cambridge, MA: MIT Department of Linguistics and Philosophy. Ito, J., and A. Mester (2013). Prosodic subcategories in Japanese. Lingua 124(1), 20–40. Ito, K. (2018). Gradual development of focus prosody and affect prosody comprehension: A proposal for a holistic approach. In P. Prieto and N. Esteve-Gibert (eds.), Prosodic Development in First Language Acquisition, 295–314. Amsterdam: John Benjamins. Ito, K., S. Bibyk, L. Wagner, and S. R.
Speer (2014). Interpretation of contrastive pitch accent in 6- to 11-year-old English-speaking children (and adults). Journal of Child Language 41(1), 84–110. Ito, K., N. Jincho, U. Minai, N. Yamane, and R. Mazuka (2012). Intonation facilitates contrast resolution: Evidence from Japanese adults and 6-year-olds. Journal of Memory and Language 66(1), 265–284. Ito, K., and S. R. Speer (2008). Anticipatory effects of intonation: Eye movements during instructed visual search. Journal of Memory and Language 58, 541–573. Iversen, J. R., A. D. Patel, and K. Ohgushi (2008). Perception of rhythmic grouping depends on auditory experience. Journal of the Acoustical Society of America 124(4), 2263–2271. Jackendoff, R. (1972). Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press. Jackson, K. (1967). A Historical Phonology of Breton. Dublin: Dublin Institute for Advanced Studies. Jacobs, N. G. (2005). Yiddish: A Linguistic Introduction. Cambridge: Cambridge University Press. Jacobsen, B. (1991). Recent phonetic changes in the Polar Eskimo dialect. Études Inuit Studies 15(1), 51–73. Jacobsen, B. (2000). The question of ‘stress’ in West Greenlandic: An acoustic investigation of rhythmicization, intonation, and syllable weight. Phonetica 57, 40–67. Jacobson, J. L., D. C. Boersma, R. B. Fields, and K. L. Olson (1983). Paralinguistic features of adult speech to infants and small children. Child Development 54(2), 436–442. Jacobson, S. A. (1985). Siberian Yupik and Central Yupik prosody. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies, 25–46. Fairbanks: Alaska Native Language Center, University of Alaska. Jacobson, S. A. (1990). Comparison of Central Alaskan Yup’ik Eskimo and Central Siberian Yupik Eskimo. International Journal of American Linguistics 56(2), 264–286. Jacques, G. (2011). Tonal alternations in the Pumi verbal system. Language and Linguistics 12, 359–392. Jacques, G., and A. Michaud (2011).
Approaching the historical phonology of three highly eroded Sino-Tibetan languages: Naxi, Na and Laze. Diachronica 28, 468–498. Jaeger, J. J., and R. Van Valin (1982). Initial consonant clusters in Yatée Zapotec. International Journal of American Linguistics 48(2), 125–138. Jakobson, R. (1960). Concluding statement: Linguistics and poetics. In T. A. Sebeok (ed.), Style in Language, 350–377. Cambridge, MA: MIT Press. (Reprinted in J. J. Weber, 1996, The Stylistics Reader: From Roman Jakobson to the Present, London: Arnold.)

James, D. J. (1994). Word tone in a Papuan language: An autosegmental solution. Language and Linguistics in Melanesia 25, 125–148. Jamieson, A. R. (1977). Chiquihuitlán tone. In W. R. Merrifield (ed.), Studies in Otomanguean Phonology, 107–135. Dallas: SIL/University of Texas at Arlington. Jang, T.-Y. (2009). Automatic assessment of non-native prosody using rhythm metrics: Focusing on Korean speakers’ English pronunciation. In Proceedings of the 2nd International Conference on East Asian Linguistics, Beijing. Jannedy, S. (2007). Prosodic focus in Vietnamese. In S. Ishihara, S. Jannedy, and A. Schwarz (eds.), Interdisciplinary Studies on Information Structure, 209–230. Potsdam: Universitätsverlag Potsdam. Janota, P., and P. Jančák (1970). An investigation of Czech vowel quantity by means of listening tests. Acta Universitatis Carolinae, Phonetica pragensis 1, 31–68. Jantunen, T. (2001). On topic in Finnish Sign Language. MA thesis, University of Helsinki. Jany, C. (2011). The phonetics and phonology of Chuxnabán Mixe. Linguistic Discovery 9(1), 31–70. Jarceva, V. N. (ed.) (1990). Lingvističeskij ènciklopedičeskij slovar. Moscow: Sovetskaja Ènciklopedija. Jaroslavienė, J. (2015). Lietuvių kalbos trumpųjų ir ilgųjų balsių kiekybės etalonai. Bendrinė kalba 88. Jarrold, C., J. Boucher, and J. Russell (1997). Language profiles in children with autism. Autism 1, 57–76. Järvikivi, J., D. Aalto, R. Aulanko, and M. Vainio (2007). Perception of vowel length: Tonality cues categorization even in a quantity language. In Proceedings of the 16th International Congress of Phonetic Sciences, 693–696, Saarbrücken. Järvinen-Pasley, A., S. Peppé, G. King-Smith, and P. Heaton (2008). The relationship between form and function level receptive prosodic abilities in autism. Journal of Autism and Developmental Disorders 38(7), 1328–1340. Jasinskaja, K. (2016). Information structure in Slavic. In C. Féry and S.
Ishihara (eds.), Oxford Handbook of Information Structure, 709–732. Oxford: Oxford University Press. Jassem, W. (1962). Akcent języka polskiego. Wrocław: Zakład Narodowy im Ossolińskich. Jastrow, O. (1997). The Neo-Aramaic languages. In R. Hetzron (ed.), The Semitic Languages, 334–377. London: Routledge. Jeanne, L. V. (1982). Some phonological rules of Hopi. International Journal of American Linguistics 48, 245–270. Jenewari, C. E. W. (1977). Studies in Kalabari syntax. PhD dissertation, University of Ibadan. Jenkins, J. (2004). Research in teaching pronunciation and intonation. Annual Review of Applied Linguistics 24, 109–125. Jeon, H.-S., and F. Nolan (2017). Prosodic marking of narrow focus in Seoul Korean. Laboratory Phonology 8(1), 2. Jeong, S., and C. Potts (2016). Intonational sentence-type conventions for perlocutionary effects: An experimental investigation. In M. Moroney, C.-R. Little, J. Collard, and D. Burgdorf (eds.), Proceedings of Semantics and Linguistic Theory 26 (SALT 26), 1–22, Austin, TX. Jepson, K. (2014). Intonational marking of focus in Torau. In L. Gawne and J. Vaughan (eds.), Selected Papers from the 44th Conference of the Australian Linguistic Society, 2013, http://www.als.asn.au. Jepson, K., J. Fletcher, and H. Stoakes (in press). Post-tonic consonant lengthening in Djambarrpuyŋu. Language and Speech. Jessen, M. (1993). Stress-conditions on vowel quality and quantity in German. Working Papers of the Cornell Phonetics Laboratory 8, 1–27. Jiang, P., and A. Chen (2016). Representation of Mandarin intonation: Boundary tone revisited. In Proceedings of the 23rd North American Conference on Chinese Linguistics, 97–109, Eugene, OR. Jibril, M. (1986). Sociolinguistic variation in Nigerian English. English World-Wide 7, 147–174. Jilka, M. (2000). The contribution of intonation to the perception of foreign accent. PhD dissertation, University of Stuttgart. Jilka, M., and B. Möbius (2007). The influence of vowel quality features on peak alignment. 
In INTERSPEECH 2007, 2621–2624, Antwerp. Jin, S. (1996). An acoustic study of sentence stress in Mandarin Chinese. PhD dissertation, The Ohio State University.

Jitcă, D., V. Apopeiu, O. Păduraru, and S. Marusca (2015). Transcription of Romanian intonation. In S. Frota and P. Prieto (eds.), Intonation in Romance, 284–316. Oxford: Oxford University Press. Johnson, O. E., and S. E. Levinsohn (1990). Gramática Secoya (Cuadernos Etnolingüísticos 11). Quito: Instituto Lingüístico de Verano. Johnson, E. K. (2008). Infants use prosodically conditioned acoustic-phonetic cues to extract words from speech. Journal of the Acoustical Society of America 123(6), EL144–EL148. Johnson, E. K., and A. Seidl (2008). Clause segmentation by 6-month-old infants: A crosslinguistic perspective. Infancy 13(5), 440–455. Johnson, E. K., and A. Seidl (2009). At 11 months, prosody still outranks statistics. Developmental Science 12(1), 131–141. Johnson, E. K., A. Seidl, and M. D. Tyler (2014). The edge factor in early word segmentation: Utterance-level prosody enables word form extraction by 6-month-olds. PLoS ONE 9(1), e83546. Johnson, H. (2000). A grammar of San Miguel Chimalapa Zoque. PhD dissertation, University of Texas at Austin. Johnson, K. (2011). An appreciation of I. Lehiste. In UC Berkeley Phonology Lab Annual Report, 1–8. Berkeley: Department of Linguistics, University of California, Berkeley. Johnson, K., E. A. Strand, and M. D’Imperio (1999). Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics 27, 359–384. Jokweni, M. W. (1995). Aspects of Isixhosa phrasal phonology. PhD dissertation, University of Illinois at Urbana-Champaign. Jones, A. M. (1959). Studies in African Music (vol. 1). Oxford: Oxford University Press. Jones, D. (1909). Intonation Curves. Leipzig: Teubner. Jones, D. (1967). The Phoneme: Its Nature and Use. Cambridge: Cambridge University Press. Jones, H. N. (2009). Prosody in Parkinson’s disease. Perspectives on Neurophysiology and Neurogenic Speech and Language Disorders 19, 77–82. Jones, H. N., R. Shrivastav, S. S. Wu, E. K. Plowman-Prine, and J. C. Rosenbek (2009).
Fundamental frequency and intensity mean and variability before and after two behavioral treatments for aprosodia. Journal of Medical Speech-Language Pathology 17, 45–53. Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83, 323–355. Jones, M. R., and M. Boltz (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491. Jongman, A., Z. Qin, J. Zhang, and J. A. Sereno (2017). Just noticeable differences for pitch direction, height, and slope for Mandarin and English listeners. Journal of the Acoustical Society of America 142, EL163. Jordan, T., and P. Sergeant (2000). Effects of distance on visual and audio-visual speech recognition. Language and Speech 43(1), 107–124. Joseph, B. D., and I. Philippaki-Warburton (1987). Modern Greek. London: Croom Helm. Josipović, V. (1994). English and Croatian in the typology of rhythmic systems. Studia Romanica et Anglica Zagrabiensia 39, 25–37. Josserand, J. K. (1983). Mixtec dialect history. PhD dissertation, Tulane University. Jowitt, D. (1991). Nigerian English Usage. Lagos: Longman. Jowitt, D. (2000). Patterns of Nigerian English intonation. English World-Wide 21, 63–80. Jože, T. (1967). Pojmovanje tonemičnosti slovenskega jezika. Slavistična revija 15(1–2), 64–108. Juárez García, C., and A. Cervantes Lozada (2005). Temas de (morfo)fonología del Mazahua de el Déposito, San Felipe del Progreso, Estado de México. MA thesis, Metropolitan Autonomous University Iztapalapa. Jukes, A. (2006). Makassarese (basa Mangkasara'): A description of an Austronesian language of South Sulawesi. PhD dissertation, University of Melbourne. Jun, H. (1998). Hamkyengto pangenuy umcoe tayhan yenku. Yanbian, China: Heuk-ryong-kang Chosun Minjok.

Jun, J., J. Kim, H. Lee, and S.-A. Jun (2006). The prosodic structure and pitch accent of Northern Kyungsang Korean. Journal of East Asian Linguistics 15, 289–317. Jun, S.-A. (1989). The accentual pattern and prosody of Chonnam dialect of Korean. In S. Kuno, J. Whitman, Y.-S. Kang, I.-H. Lee, and S.-Y. Bak (eds.), Harvard Studies in Korean Linguistics III, 89–100. Seoul: Hanshin. Jun, S.-A. (1993). The phonetics and phonology of Korean prosody. PhD dissertation, The Ohio State University (Published 1996, New York: Garland; 2018, New York: Routledge). Jun, S.-A. (1996). Influence of microprosody on macroprosody: A case of phrase initial strengthening. UCLA Working Papers in Phonetics 92, 97–116. Jun, S.-A. (1998). The accentual phrase in the Korean prosodic hierarchy. Phonology 15(2), 189–226. Jun, S.-A. (2000). K-ToBI (Korean ToBI) labelling conventions: Version 3. Speech Sciences 7, 143–169. Jun, S.-A. (ed.) (2005a). Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford: Oxford University Press. Jun, S.-A. (2005b). Korean intonational phonology and prosodic transcription. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 201–229. Oxford: Oxford University Press. Jun, S.-A. (2005c). Prosodic typology. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 432–458. Oxford: Oxford University Press. Jun, S.-A. (2006). Intonational phonology of Seoul Korean revisited. In T. J. Vance and K. Jones (eds.), Japanese-Korean Linguistics, vol. 14, 15–26. Stanford: CSLI. Jun, S.-A. (2011). Prosodic markings of complex NP focus, syntax, and the pre-/post-focus string. In Proceedings of the 28th West Coast Conference on Formal Linguistics, 214–230. Somerville, MA: Cascadilla Proceedings Project. Jun, S.-A. (ed.) (2014a). Prosodic Typology II: The Phonology of Intonation and Phrasing. Oxford: Oxford University Press. Jun, S.-A. (2014b).
Prosodic typology: By prominence type, word prosody, and macro-rhythm. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 520–539. Oxford: Oxford University Press. Jun, S.-A., M. E. Beckman, and H.-J. Lee (1998). Fiberscopic evidence for the influence on vowel devoicing of the glottal configurations for Korean obstruents. UCLA Working Papers in Phonetics 96, 43–68. Jun, S.-A., and G. Elordieta (1997). Intonational structure of Lekeitio Basque. In A. Botinis, G. Kouroupetroglou, and G. Carayiannis (eds.), Intonation: Theory, Models and Applications, 193–196, Athens, Greece. Jun, S.-A., and J. Fletcher (2014). Methodology of studying intonation: From data collection to data analysis. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 493–519. Oxford: Oxford University Press. Jun, S.-A., and C. Foreman (1996). Boundary tones and focus realization in African American English intonation. Paper presented at the 132nd meeting of the Acoustical Society of America, Honolulu. Jun, S.-A., and C. Fougeron (1995). The accentual phrase and the prosodic structure of French. In Proceedings of the 13th International Congress of Phonetic Sciences, vol. 2, 722–725, Stockholm. Jun, S.-A., and C. Fougeron (2000). A phonological model of French intonation. In A. Botinis (ed.), Intonation: Analysis, Modeling and Technology, 209–242. Dordrecht: Kluwer Academic. Jun, S.-A., and C. Fougeron (2002). Realizations of accentual phrase in French intonation. Probus 14, 147–172. Jun, S.-A., and X. Jiang (2019). Differences in prosodic phrasing in marking syntax vs. focus: Data from Yanbian Korean. The Linguistic Review 36(1), 117–150. Jun, S.-A., and H. S. Kim (2007). VP focus and narrow focus in Korean. In Proceedings of the 16th International Congress of Phonetic Sciences, 1277–1280, Saarbrücken. Jun, S.-A., and H.-J. Lee (1998). Phonetic and phonological markers of contrastive focus in Korean.
In Proceedings of the 5th International Conference on Spoken Language Processing, 1295–1298, Sydney.

Jun, S.-A., and M. Oh (1996). A prosodic analysis of three types of wh-phrases in Korean. Language and Speech 39(1), 37–61. Jun, S.-A., and M. Oh (2000). Acquisition of second language intonation. In Proceedings of the 6th International Conference on Spoken Language Processing, 73–76, Beijing. Jun, S.-A., C. Vicenik, and I. Lofstedt (2007). Intonational phonology of Georgian. UCLA Working Papers in Phonetics 106, 41–57. Junqua, J.-C. (1996). The influence of acoustics on speech production: A noise-induced stress phenomenon known as the Lombard reflex. Speech Communication 20, 13–22. Jurafsky, D., E. Shriberg, B. Fox, and T. Curl (1998). Lexical, prosodic, and syntactic cues for dialog acts. In ACL/ISCA Special Interest Group on Discourse and Dialogue, Discourse Relations and Discourse Markers, 114–120. Retrieved 8 June 2020 from https://www.aclweb.org/anthology/W98-0319. Jurgec, P. (2007). Acoustic analysis of tones in contemporary Standard Slovene: Preliminary findings. Slovene Linguistic Studies 6, 195–207. Jusczyk, P. W., and R. N. Aslin (1995). Infants’ detection of the sound patterns of words in fluent speech. Cognitive Psychology 29, 1–23. Jusczyk, P. W., A. Cutler, and N. J. Redanz (1993). Infants’ preference for the predominant stress patterns of English words. Child Development 64(3), 675–687. Jusczyk, P. W., D. M. Houston, and M. R. Newsome (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology 39(3), 159–207. Ka, O. (1988). Wolof phonology and morphology: A non-linear approach. PhD dissertation, University of Illinois at Urbana-Champaign. Kaan, E., C. M. Barkley, M. Bao, and R. P. Wayland (2008). Thai lexical tone perception in native speakers of Thai, English and Mandarin Chinese: An event-related potentials training study. BMC Neuroscience 9, 53. Kabak, B., and I. Vogel (2001). The phonological word and stress assignment in Turkish. Phonology 18, 315–360. Kabak, B. (2016).
Refin(d)ing Turkish stress as a multifaceted phenomenon. Handout for a talk given at the Second Conference on Central Asian Languages and Linguistics (ConCALL-2), Indiana University. Kabak, B., and A. Revithiadou (2009). An interface approach to prosodic word recursion. In J. Grijzenhout and B. Kabak (eds.), Phonological Domains: Universals and Deviations, 105–132. Berlin: Mouton de Gruyter. Kachru, B. B. (1983). The Indianization of English: The English Language in India. Oxford: Oxford University Press. Kachru, B. B. (1985). Standards, codification and sociolinguistic realism: The English language in the outer circle. In R. Quirk and H. G. Widdowson (eds.), English in the World: Teaching and Learning the Language and Literatures, 11–30. Cambridge: Cambridge University Press. Kadin, G., and O. Engstrand (2005). Tonal word accents produced by Swedish 18- and 24-month-olds. In Proceedings of Fonetik 2005, 67–70, Göteborg. Kadmon, N., and C. Roberts (1986). Prosody and scope: The role of discourse structure. In Proceedings of the 22nd Meeting of the Chicago Linguistics Society, 16–28. Chicago: Chicago Linguistic Society. Kager, R. (1989). A Metrical Theory of Stress and Destressing in English and Dutch. Dordrecht: Foris. Kager, R. (2007). Feet and metrical stress. In P. de Lacy (ed.), The Cambridge Handbook of Phonology, 195–227. Cambridge: Cambridge University Press. Kager, R. (2012). Stress in windows: Language typology and factorial typology. Lingua 122, 1454–1493. Kager, R., and V. Martínez-Paricio (2018). The internally layered foot in Dutch. Linguistics 56, 69–114. Kahnemuyipour, A. (2003). Syntactic categories and Persian stress. Natural Language and Linguistic Theory 21(2), 333–379. Kahnemuyipour, A. (2009). The Syntax of Sentential Stress. Oxford: Oxford University Press. Kaiki, N., and Y. Sagisaka (1992). The control of segmental duration in speech synthesis using statis tical methods. In Speech Perception, Production and Linguistic Structure, 391–402. 
Tokyo: Ohmsha.

Kainada, E. (2010). Boundary-related durations in Modern Greek. In Proceedings of ExLing 2010: Third Workshop on Experimental Linguistics, 68–72, Athens.
Kainada, E. (2012). The acoustics of post-nasal stop voicing in Standard Modern Greek. In Z. Gavriilidou et al. (eds.), Proceedings of the 10th International Conference on Greek Linguistics: Selected Papers, 320–329. Komotini: Democritus University of Thrace.
Kainada, E. (2014). F0 alignment and scaling as markers of prosodic constituency. In G. Kotzoglou et al. (eds.), Proceedings of the 11th International Conference on Greek Linguistics, 580–590. Rhodes: University of the Aegean.
Kainada, E., and M. Baltazani (2014). The vocalic system of the dialect of Ipiros. In G. Kotzoglou et al. (eds.), Proceedings of the 11th International Conference on Greek Linguistics: Selected Papers, 591–602. Rhodes: University of the Aegean.
Kainada, E., and A. Lengeris (2015). Native language influences on the production of second-language prosody. Journal of the International Phonetic Association 45, 269–287.
Kaisse, E. (1985). Some theoretical consequences of stress rules in Turkish. In Proceedings of the 21st Meeting of the Chicago Linguistic Society, 199–209. Chicago: Chicago Linguistic Society.
Kaisse, E., and A. Zwicky (eds.) (1987). Syntactic conditions on phonological rules (special issue). Phonology Yearbook 4.
Kakumasu, J. (1986). Urubu-Kaapor. In D. C. Derbyshire and G. K. Pullum (eds.), Handbook of Amazonian Languages, vol. 1, 326–406. Berlin: Mouton de Gruyter.
Kalaldeh, R., A. Dorn, and A. Ní Chasaide (2009). Tonal alignment in three varieties of Hiberno-English. In INTERSPEECH 2009, 2443–2446, Brighton.
Kaland, C. (2019). Acoustic correlates of word stress in Papuan Malay. Journal of Phonetics 74, 55–74.
Kalathottukaren, R. T., S. C. Purdy, and E. Ballard (2015). Prosody perception and musical pitch discrimination in adults using cochlear implants. International Journal of Audiology 54, 444–452.
Kalinowski, C. (2015). A typology of morphosyntactic encoding of focus in African languages. PhD dissertation, University at Buffalo.
Kálmán, L., and Á. Nádasdy (1994). A hangsúly. In F. Kiefer (ed.), Strukturális magyar nyelvtan, 393–467. Budapest: Akadémiai Kiadó.
Kalstrom, M. R., and E. V. Pike (1968). Stress in the phonological system of Eastern Popoloca. Phonetica 18, 16–30.
Kamali, B. (2011). Topics at the PF interface of Turkish. PhD dissertation, Harvard University.
Kamholz, D. C. (2014). Austronesians in Papua: Diversification and change in South Halmahera-West New Guinea. PhD dissertation, University of California, Berkeley.
Kan, S. (2009). Prosodic domains and the syntax-prosody mapping in Turkish. MA thesis, Boğaziçi University.
Kanerva, J. M. (1990). Focusing on phonological phrases in Chichewa. In S. Inkelas and D. Zec (eds.), The Phonology-Syntax Connection, 145–161. Chicago: University of Chicago Press.
Kang, O. (2010). Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System 38, 301–315.
Kang, Y., T.-J. Yoon, and S. Han (2015). Frequency effects of the vowel length merger in Seoul Korean. Laboratory Phonology 6, 469–503.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child 2, 217–250.
Kaplan, L. D. (1985). Seward Peninsula Inupiaq consonant gradation and its relationship to prosody. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 191–210. Fairbanks: Alaska Native Language Center, University of Alaska.
Kaplan, L. D. (2000). Seward Peninsula Inupiaq and language contact around Bering Strait. In M.-A. Mahieu and N. Tersis (eds.), Variations on Polysynthesis: The Eskaleut Languages, 261–272. Amsterdam: John Benjamins.
Kaplan, P. S., J. Bachorowski, M. J. Smoski, and W. J. Hudenko (2002). Infants of depressed mothers, although competent learners, fail to learn in response to their own mothers’ infant-directed speech. Psychological Science 13, 268–271.
Kariņš, K. A. (1996). The prosodic structure of Latvian. PhD dissertation, University of Pennsylvania.

Karlsson, A. (2005). Rhythm and intonation in Halh Mongolian. PhD dissertation, Lund University.
Karlsson, A. (2014). The intonational phonology of Halh Mongolian. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 187–215. Oxford: Oxford University Press.
Karlsson, A. (2018). Coordination between lexical tones and melody in traditional Kammu singing. Journal of the Phonetic Society of Japan 22(3), 30–41.
Karlsson, A., and A. Holmer (2011). Interaction between word order, information structure and intonation in Puyuma. In M. Endo (ed.), Papers in Austroasiatic and Austronesian Linguistics, 28–38. Tokyo: Aoyama Gakuin University.
Karlsson, A., D. House, and J.-O. Svantesson (2012). Intonation adapts to lexical tone: The case of Kammu. Phonetica 69, 28–47.
Karlsson, A., D. House, J.-O. Svantesson, and D. Tayanin (2010). Influence of lexical tones on intonation in Kammu. In INTERSPEECH 2010, 1740–1743, Makuhari.
Karlsson, A., and J.-O. Svantesson (2016). Sonority and syllabification in casual and formal Mongolian speech. In M. J. Ball and N. Müller (eds.), Challenging Sonority: Cross-Linguistic Evidence (Studies in Phonetics and Phonology), 110–121. Sheffield: Equinox.
Karlsson, F. (1983). Suomen kielen äänne ja muotorakenne. Porvoo, Finland: WSOY.
Karow, C. M., T. P. Marquardt, and R. C. Marshall (2001). Affective processing in left and right hemisphere brain-damaged subjects with and without subcortical involvement. Aphasiology 15, 715–729.
Karpiński, M. (2006). Struktura i intonacja polskiego dialogu zadaniowego. Poznań: Wydawnictwo Naukowe UAM.
Karvonen, D. H. (2005). Word prosody in Finnish. PhD dissertation, University of California, Santa Cruz.
Kasevič, V. B. (1986). Anakcentnyje jazyki i singarmonizm. In V. M. Nadeljaev (ed.), Fonetika jazykov Sibiri i sopredel’nyh regionov, 14–17. Novosibirsk: Nauka.
Kasisopa, B., V. Attina, and D. K. Burnham (2014). The Lombard effect with Thai lexical tones: An acoustic analysis of articulatory modifications in noise. In INTERSPEECH 2014, 1717–1721, Singapore.
Kastrinaki, A. (2003). The temporal correlates of lexical and phrasal stress in Greek, exploring rhythmic stress: Durational patterns for the case of Greek words. MSc dissertation, University of Edinburgh.
Katsika, A. (2016). The role of prominence in determining the scope of boundary-related lengthening in Greek. Journal of Phonetics 55, 149–181.
Katsika, A., J. Krivokapić, C. Mooshammer, M. K. Tiede, and J. L. Goldstein (2014). The coordination of boundary tones and their interaction with prominence. Journal of Phonetics 44, 62–82.
Katz, D. (1983). Zur Dialektologie des Jiddischen. In W. Besch, U. Knoop, W. Putschke, and H. E. Wiegand (eds.), Dialektologie: Ein Handbuch zur deutschen und allgemeinen Dialektforschung, 1018–1041. Berlin: De Gruyter.
Katz, G. S., J. F. Cohn, and C. A. Moore (1996). A combination of vocal f0 dynamic and summary features discriminates between three pragmatic categories of infant-directed speech. Child Development 67(1), 205–217.
Katz, J., and E. O. Selkirk (2011). Contrastive focus vs. discourse-new: Evidence from phonetic prominence in English. Language 87(4), 771–816.
Kaufman, D. (2005). Aspects of pragmatic focus in Tagalog. In M. Ross and I. Wayan Arka (eds.), The Many Faces of Austronesian Voice Systems: Some New Empirical Studies, 175–196. Canberra: ANU Press.
Kaufman, D. (2010). The morphosyntax of Tagalog clitics: A typological approach. PhD dissertation, Cornell University.
Kaufman, T. (1969). Teco: A new Mayan language. International Journal of American Linguistics 35(2), 154–174.
Kaufman, T. (1972). El proto-tzeltal-tzotzil: Fonología comparada y diccionario reconstruido. Mexico: National Autonomous University of Mexico.
Kaufman, T. (1976a). Archaeological and linguistic correlations in Mayaland and associated areas of Meso-America. World Archaeology 8(1), 101–118.

Kaufman, T. (1976b). Proyecto de alfabetos y ortografías para escribir las lenguas mayances. Antigua, Guatemala: Proyecto Lingüístico Francisco Marroquín.
Kaufman, T. (2003). A preliminary Mayan etymological dictionary. Ms., Foundation for the Advancement of Mesoamerican Studies. Retrieved 20 May 2020 from http://www.famsi.org/reports/01051.
Kawachi, K. (2007). A grammar of Sidaama (Sidamo): A Cushitic language of Ethiopia. PhD dissertation, University at Buffalo.
Kawahara, S. (2015). The phonology of Japanese accent. In H. Kubozono (ed.), Handbook of Japanese Phonetics and Phonology, 445–492. Boston: Mouton.
Kawamoto, K. (2000). The Poetics of Japanese Verse: Imagery, Structure, Meter. Tokyo: University of Tokyo Press.
Kazlauskienė, A. (2015). Pirminio lietuvių kalbos ritmo dėsningumai. Kaunas: Vytauto Didžiojo Universitetas.
Keane, E. (2014). The intonational phonology of Tamil. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 118–153. Oxford: Oxford University Press.
Keating, P. A. (2006). Phonetic encoding of prosodic structure. In J. Harrington and M. Tabain (eds.), Speech Production: Models, Phonetic Processes, and Techniques (Macquarie Monographs in Cognitive Science), 167–186. New York: Psychology Press.
Keating, P. A., T. Cho, C. Fougeron, and C.-S. Hsu (2003). Domain-initial strengthening in four languages. In J. Local, R. Ogden, and R. Temple (eds.), Phonetic Interpretation: Papers in Laboratory Phonology 6, 145–163. Cambridge: Cambridge University Press.
Keating, P. A., M. Garellek, and J. Kreiman (2015). Acoustic properties of different kinds of creaky voice. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Keating, P. A., and G. Kuo (2012). Comparison of speaking fundamental frequency in English and Mandarin. Journal of the Acoustical Society of America 132, 1050–1060.
Keating, P. A., and S. Shattuck-Hufnagel (2002). A prosodic view of word form encoding for speech production. UCLA Working Papers in Phonetics 101, 112–156.
Keating, P. A., and Y.-L. Shue (2009). Voice quality variation with fundamental frequency in English and Mandarin. Journal of the Acoustical Society of America 126, 2221.
Kedar, Y., M. Casasola, and B. Lust (2006). Getting there faster: 18- and 24-month-old infants’ use of function words to determine reference. Child Development 77(2), 325–338.
Keen, S. (1983). Yukulta. In R. M. W. Dixon and B. Blake (eds.), Handbook of Australian Languages, vol. 3, 190–304. Canberra: Australian National University Press.
Kehoe, M. (1997). Stress error patterns in English-speaking children’s word productions. Clinical Linguistics and Phonetics 11(5), 389–409.
Kehoe, M. (2001). Prosodic patterns in children’s multisyllabic word productions. Language, Speech, and Hearing Services in Schools 32(4), 284–294.
Kehoe, M. (2013). The Development of Prosody and Prosodic Structure. New York: Nova Science.
Kehoe, M., C. Lleó, and M. Rakow (2011). Speech rhythm in the pronunciation of German and Spanish monolingual and German-Spanish bilingual 3-year-olds. Linguistische Berichte 2011(227), 323–352.
Kehoe, M., and C. Stoel-Gammon (1997). The acquisition of prosodic structure: An investigation of current accounts of children’s prosodic development. Language 73(1), 113–144.
Kehrein, W. (2017). There’s no tone in Cologne: Against tone-segment interactions in Franconian. In W. Kehrein, B. Köhnlein, P. Boersma, and M. van Oostendorp (eds.), Segmental Structure and Tone, 147–194. Berlin: De Gruyter.
Kehrein, W., and C. Golston (2004). A prosodic theory of laryngeal contrasts. Phonology 21, 1–33.
Keller, K. (1959). The phonemes of Chontal (Mayan). International Journal of American Linguistics 25(1), 44–53.
Keller, N. E. (1999). Cambios tonales en la palabra verbal de la lengua tanimuca. In Congreso de Lingüística Amerindia y Criolla 3, 72–95. Bogotá: CCELA.
Kelly, M. H., and J. K. Bock (1988). Stress in time. Journal of Experimental Psychology: Human Perception and Performance 14, 389–403.

Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review 99, 349–364.
Kelm, O. R. (1987). An acoustic study on the differences of contrastive emphasis between native and nonnative Spanish. Hispania 70(3), 627–633.
Kemler Nelson, D. G., K. Hirsh-Pasek, P. W. Jusczyk, and K. W. Cassidy (1989). How prosodic cues in motherese might assist language learning. Journal of Child Language 16, 55–68.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, A. (1997). Gesture. Annual Review of Anthropology 26, 109–128.
Kendon, A. (2004). Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kennedy, R. (2008). Bugotu and Cheke Holo reduplication: In defence of the emergence of the unmarked. Phonology 25, 61–82.
Kennelly, S. (1999). The syntax of the p-focus position in Turkish. In G. Rebuschi and L. Tuller (eds.), The Grammar of Focus, 179–211. Amsterdam: John Benjamins.
Kensinger, K. M. (1963). The phonological hierarchy of Cashinahua (Pano). In Studies in Peruvian Indian Languages, vol. 1, 207–217. Norman: Summer Institute of Linguistics of the University of Oklahoma.
Kenstowicz, M. (1997). Quality-sensitive stress. Rivista di Linguistica 9(1), 157–187.
Kenstowicz, M., and H.-S. Sohn (1997). Phrasing and focus in North Kyungsang Korean. MIT Working Papers in Linguistics 30, 25–47.
Kent, R. D., and J. C. Rosenbek (1982). Prosodic disturbance and neurologic lesion. Brain and Language 15, 259–291.
Kent, R. D., G. Weismer, J. F. Kent, H. K. Vorperian, and J. R. Duffy (1999). Acoustic studies of dysarthric speech: Methods, progress and potential. Journal of Communication Disorders 32, 141–186.
Keppel-Jones, D. (2001). The Strict Metrical Tradition: Variations in the Literary Iambic Pentameter from Sidney and Spenser to Matthew Arnold. Montreal: McGill-Queen’s University Press.
Ketkaew, C., and P. Pittayaporn (2014). Mapping between lexical tones and musical notes in Thai pop songs. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation (PACLIC 28), 160–169, Phuket.
Ketkaew, C., and P. Pittayaporn (2015). Do note values affect parallelism between lexical tones and musical notes in Thai pop songs? In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Key, H. (1961). Phonotactics of Cayuvava. International Journal of American Linguistics 27, 143–150.
Key, H. (1967). Morphology of Cayuvava. The Hague: Mouton.
Khachaturyan, M. (2015). Grammaire du mano. Mandenkan 54, 1–253.
Khan, S. U. D. (2008). Intonational phonology and focus prosody of Bengali. PhD dissertation, University of California, Los Angeles.
Khan, S. U. D. (2014). The intonational phonology of Bangladeshi Standard Bengali. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 81–117. Oxford: Oxford University Press.
Khan, S. U. D. (2016). The intonation of South Asian languages: Towards a comparative analysis. Proceedings of Formal Approaches to South Asian Languages 6, 23–36.
Kibe, N. (2000). Seinanbu Kyūshū Nikei Akusento no Kenkyū. Tokyo: Benseisha.
Kidd, G. R. (1989). Articulatory rate-context effects in phoneme identification. Journal of Experimental Psychology: Human Perception and Performance 15, 736–748.
Kidder, E. (2013). Prominence in Yucatec Maya: The role of stress in Yucatec Maya words. PhD dissertation, University of Arizona.
Kilbourn-Ceron, O. (2017). Speech production planning affects phonological variability: A case study in French liaison. In Proceedings of the Annual Meetings on Phonology, Los Angeles.
Kilian-Hatz, C. (2008). A Grammar of Modern Khwe (Research in Khoisan Studies 23). Cologne: Köppe.
Kim, G.-R. (1988). The pitch-accent system of the Taegu dialect of Korean with emphasis on tone sandhi at the phrase level. PhD dissertation, University of Hawaiʻi.

Kim, H., J. Cole, H. Choi, and M. Hasegawa-Johnson (2004). The effect of accent on acoustic cues to stop voicing and place of articulation in radio news speech. In Proceedings of Speech Prosody 2, 29–32, Nara.
Kim, J., and S.-A. Jun (2009). Prosodic structure and focus prosody of South Kyungsang Korean. Language Research 45(1), 43–66.
Kim, N.-J. (1997). Tone, segments, and their interaction in North Kyungsang Korean: A correspondence theoretic account. PhD dissertation, The Ohio State University.
Kim, S. (2004). The role of prosodic phrasing in Korean word segmentation. PhD dissertation, University of California, Los Angeles.
Kim, S., and T. Cho (2013). Prosodic boundary information modulates phonetic categorization. Journal of the Acoustical Society of America 134, EL19–EL25.
Kim, S., R. Paul, H. Tager-Flusberg, and C. Lord (2014). Language and communication in ASD. In F. Volkmar, S. Rogers, R. Paul, and K. Pelphrey (eds.), Handbook of Autism Spectrum Disorders (4th ed.), vol. 1, 176–190. New York: John Wiley and Sons.
Kim, Y. (2008). Topics in the phonology and morphology of San Francisco Del Mar Huave. PhD dissertation, University of California, Berkeley.
Kim, Y. (2011). Algunas evidencias sobre representaciones tonales en amuzgo de San Pedro Amuzgos. In Proceedings of the 5th Conference on Indigenous Languages of Latin America, Austin.
Kimmelman, V. (2012). Word order in Russian Sign Language. Sign Language Studies 12(3), 414–445.
Kimmelman, V. (2014). Information structure in Russian Sign Language and Sign Language of the Netherlands. PhD dissertation, University of Amsterdam.
Kimmelman, V., and R. Pfau (2016). Information structure in sign languages. In C. Féry and S. Ishihara (eds.), The Oxford Handbook of Information Structure, 814–833. Oxford: Oxford University Press.
King, H. B. (1998). The Declarative Intonation of Dyirbal: An Acoustic Analysis. Munich: Lincom Europa.
King, R. (2006). Dialectal variation in Korean. In H.-M. Sohn (ed.), Korean Language in Culture and Society, 264–281. Honolulu: University of Hawaiʻi Press.
Kingston, J. (2005). The phonetics of Athabaskan tonogenesis. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 137–184. Amsterdam: John Benjamins.
Kingston, J. (2011). Tonogenesis. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 4, 2304–2334. Oxford: Wiley Blackwell.
Kingston, J., and R. L. Diehl (1994). Phonetic knowledge. Language 70, 419–454.
Kiparsky, P. (1977). The rhythmic structure of English verse. Linguistic Inquiry 8(2), 189–247.
Kiparsky, P. (1984). A compositional approach to Vedic word accent. In S. D. Joshi (ed.), Amrtadhara: Prof. R. N. Dandekar Felicitation Volume, 201–210. Delhi: Ajanta Publications.
Kiparsky, P. (2006). A modular metrics for folk verse. In B. E. Dresher and N. Friedberg (eds.), Formal Approaches to Poetry, 7–49. The Hague: Mouton.
Kiparsky, P. (2018a). Formal and empirical issues in phonological typology. In L. M. Hyman and F. Plank (eds.), Phonological Typology. Berlin: De Gruyter Mouton.
Kiparsky, P. (2018b). Indo-European origins of the Greek hexameter. In D. Gunkel and O. Hackstein (eds.), Sprache und Metrik, 77–127. Leiden: Brill.
Kiparsky, P. (in press). Kalevala and Mordvin meter. Studia Metrica.
Kirby, J. (2011). Vietnamese (Hanoi Vietnamese). Journal of the International Phonetic Association 41(3), 381–392.
Kirby, J., and M. Brunelle (2017). Southeast Asian tone in areal perspective. In R. Hickey (ed.), The Cambridge Handbook of Areal Linguistics, 703–731. Cambridge: Cambridge University Press.
Kirby, J., and D. R. Ladd (2016a). Effects of obstruent voicing on vowel f0: Evidence from ‘true voicing’ languages. Journal of the Acoustical Society of America 140, 2400–2411.
Kirby, J., and D. R. Ladd (2016b). Tone-melody correspondence in Vietnamese popular song. In Proceedings of the 5th International Symposium on Tonal Aspects of Languages, 48–51, Buffalo, NY.
Kirchhübel, C., A. W. Stedman, and D. Howard (2013). Analyzing deceptive speech. In Proceedings of the International Conference on Engineering Psychology and Cognitive Ergonomics, 134–141, Las Vegas.

Kirk, K. I., and M. Hudgins (2016). Speech perception and spoken word recognition in children with cochlear implants. In N. M. Young and K. I. Kirk (eds.), Pediatric Cochlear Implantation: Learning and the Brain, 145–161. New York: Springer.
Kisilevsky, B. S., S. M. J. Hains, C. A. Brown, C. T. Lee, B. Cowperthwaite, and S. S. Stutzman (2009). Fetal sensitivity to properties of maternal speech and language. Infant Behavior and Development 32(1), 59–71.
Kiss, K. É. (2002). The Syntax of Hungarian. Cambridge: Cambridge University Press.
Kisseberth, C. W. (2016). Chimiini intonation. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 225–284. Berlin: De Gruyter Mouton.
Kisseberth, C. W., and M. I. Abasheikh (1974). Vowel length in ChiMwi:ni: A case study of the role of grammar in phonology. In R. A. Anthony and A. Bruck (eds.), Papers from the Parasession on Natural Phonology, 193–209. Chicago: Chicago Linguistic Society.
Kisseberth, C. W., and M. I. Abasheikh (2011). Chimwiini phonological phrasing revisited. Lingua 121(13), 1987–2013.
Kita, S. (2009). Cross-cultural variation of speech-accompanying gesture: A review. Language and Cognitive Processes 24(2), 145–167.
Kitamura, C., and D. K. Burnham (2003). Pitch and communicative intent in mother’s speech: Adjustments for age and sex in the first year. Infancy 4(1), 85–110.
Kitamura, C., C. Thanavishuth, D. K. Burnham, and S. Luksaneeyanawin (2001). Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and non-tonal language. Infant Behavior and Development 24(4), 372–392.
Kjelgaard, M., and S. R. Speer (1999). Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language 40(2), 153–194.
Kjelgaard, M., and H. Tager-Flusberg (2001). An investigation of language impairment in autism: Implications for genetic subgroups. Language and Cognitive Processes 16, 287–308.
Klassen, J., and M. Wagner (2017). Prosodic prominence shifts are anaphoric. Journal of Memory and Language 92, 305–326.
Klatt, D. H. (1973). Discrimination of fundamental frequency contours in synthetic speech: Implications for models of pitch perception. Journal of the Acoustical Society of America 53, 8–16.
Klatt, D. H. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3(3), 129–140.
Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America 59, 1208–1221.
Klein, D., R. J. Zatorre, B. Milner, and V. Zhao (2001). A cross-linguistic PET study of tone perception in Mandarin Chinese and English speakers. NeuroImage 13(4), 646–653.
Klein, T. (2002). Infixation and segmental constraint effects: UM and IN in Tagalog, Chamorro, and Toba Batak. Ms., University of Manchester.
Kleinschmidt, D. F., and T. F. Jaeger (2015). Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel. Psychological Review 122, 148–203.
Klessa, K. (2016). Analiza wybranych cech zmienności iloczasowej w różnych stylach wypowiedzi na podstawie korpusów nagrań dla technologii mowy. Prace Filologiczne 69, 215–235.
Klouda, G. V., D. A. Robin, N. R. Graff-Radford, and W. Cooper (1998). The role of callosal connections in speech prosody. Brain and Language 35, 154–171.
Knapp, M. L., and J. Hall (2009). Nonverbal Communication in Human Interaction. Wadsworth: Cengage Learning.
Knapp Ring, M. H. (2008). Fonología segmental y léxica del mazahua. Mexico: Instituto Nacional de Antropología e Historia.
Knight, R.-A. (2003). Peaks and plateaux: The production and perception of intonational high targets in English. PhD dissertation, University of Cambridge.
Knight, R.-A. (2008). The shape of nuclear falls and their effect on the perception of pitch and prominence: Peaks vs. plateaux. Language and Speech 51(3), 223–244.

Knight, R.-A., and F. Nolan (2006). The effect of pitch span on intonational plateaux. Journal of the International Phonetic Association 36(1), 21–38.
Knowles, S. (1984). A descriptive grammar of Chontal Maya (San Carlos dialect). PhD dissertation, Tulane University.
Ko, E. S., and M. Soderstrom (2013). Additive effects of lengthening on the utterance-final word in child-directed speech. Journal of Speech, Language, and Hearing Research 56(1), 364–371.
Koch, H. (1997). Pama-Nyungan reflexes in the Arandic languages. In D. Tryon and M. Walsh (eds.), Boundary Rider: Essays in Honour of Geoffrey O’Grady, 271–302. Canberra: Pacific Linguistics.
Koch, K. (2008). Intonation and Focus in Nɬeʔkepmxcin (Thompson River Salish). Vancouver: University of British Columbia.
Kochanski, G., E. Grabe, J. Coleman, and B. S. Rosner (2005). Loudness predicts prominence: Fundamental frequency lends little. Journal of the Acoustical Society of America 118(2), 1038–1054.
Kochanski, G., and C. Shih (2000). Stem-ML: Language-independent prosody description. In Proceedings of the 6th International Conference on Spoken Language Processing, 239–242, Beijing.
Kodzasov, S. V. (1999). Caucasian: Daghestanian languages. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 995–1020. Berlin: Mouton de Gruyter.
Kodzasov, S. V. (1996). Zakony frazovoj akcentuacii. In T. M. Nikolaeva (ed.), Prosodičeskij stroj russkoj reči, 181–204. Moscow: Institut Russkogo Jazyka Imeni V. V. Vinogradova, RAN.
Kogan, L. (1997). Tigrinya. In R. Hetzron (ed.), The Semitic Languages, 424–445. London: Routledge.
Kogan, L., and V. Korotayev (1997). Sayhadic (Epigraphic South Arabian). In R. Hetzron (ed.), The Semitic Languages, 220–241. London: Routledge.
Kohler, K. J. (1985). The perception of lenis and fortis plosives in French: A critical re-evaluation. Phonetica 42, 116–123.
Kohler, K. J. (1986). Parameters of speech rate perception in German words and sentences: Duration, f0 movement, and f0 level. Language and Speech 29, 115–139.
Kohler, K. J. (1987). Categorical pitch perception. In Proceedings of the 11th International Congress of Phonetic Sciences, 331–333, Tallinn.
Kohler, K. J. (1990). Macro and micro f0 in the synthesis of intonation. In J. Kingston and M. E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, 115–138. Cambridge: Cambridge University Press.
Kohler, K. J. (1991). Terminal intonation patterns in single-accent utterances of German: Phonetics, phonology, semantics. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel 25, 295–368.
Kohler, K. J. (2004). Pragmatic and attitudinal meanings of pitch patterns in German syntactically marked questions. In G. Fant, H. Fujisaki, J. Cao, and Y. Xu (eds.), From Traditional Phonology to Modern Speech Processing: Festschrift für Professor Wu Zongji’s 95th Birthday, 205–215. Beijing: Foreign Language Teaching and Research Press.
Kohler, K. J. (2005). Timing and functions of pitch contours. Phonetica 62, 88–105.
Kohler, K. J. (2011). Communicative functions integrate segments in prosodies and prosodies in segments. Phonetica 68, 26–56.
Köhnlein, B. (2011). Rule Reversal Revisited: Synchrony and Diachrony of Tone and Prosodic Structure in the Franconian Dialect of Arzbach. Utrecht: LOT.
Köhnlein, B. (2013). Optimizing the relation between tone and prominence: Evidence from Franconian, Scandinavian, and Serbo-Croatian tone accent systems. Lingua 131, 1–28.
Köhnlein, B. (2016). Contrastive foot structure in Franconian tone-accent dialects. Phonology 33, 87–123.
Komar, S. (1999). The fall-rise: A new tone in the Slovene sentence intonation. Govor 14(2), 139–148.
Komar, S. (2006). The pragmatic function of intonation in English and Slovene. Acta Neophilologica 39(1–2), 155–166.
Komar, S. (2008). Communicative Functions of Intonation: English-Slovene Contrastive Analysis. Ljubljana: Znanstvenoraziskovalni Inštitut Filozofske Fakultete.
Komen, E. R. (2007). Focus in Chechen. MA thesis, Leiden University.

Kompe, R. (1997). Prosody in speech understanding systems. Lecture Notes in Artificial Intelligence 1307, 1–357.
Kondo, R. W. (1985). Contribución al estudio de la longitud vocálica y el acento en el idioma guahibo. Artículos en Lingüística y Campos Afines 13, 55–81.
Koneski, B. (1976). Gramatika na makedonskiot literaturen jazik. Skopje: Kultura.
Koneski, B. (1983). A Historical Phonology of the Macedonian Language. Heidelberg: Carl Winter.
König, C. (2008). Case in Africa. Oxford: Oxford University Press.
König, C. (2009). Marked nominatives. In A. L. Malchukov and A. Spencer (eds.), The Oxford Handbook of Case, 535–548. Oxford: Oxford University Press.
König, C., and B. Heine (2015). The ǃXun Language: A Dialect Grammar of Northern Khoisan (Research in Khoisan Studies 33). Cologne: Köppe.
Konno, D., H. Kanemitsu, J. Toyama, and M. Shimbo (2006). Spectral properties of Japanese whispered vowels referred to pitch. Journal of the Acoustical Society of America 120, 3378.
Kontosopoulos, N. (1994/2001). Dialekti ke idiomata ti Neas Ellinikis (3rd ed.). Athens: Grigoris.
Koopmans-van Beinum, F. J. (1980). Vowel reduction in Dutch. PhD dissertation, University of Amsterdam.
Kornai, A., and L. Kálmán (1988). Hungarian sentence intonation. In H. van der Hulst and N. Smith (eds.), Autosegmental Studies in Pitch Accent, 183–195. Dordrecht: Foris.
Kornfilt, J. (1996). On copular clitic forms in Turkish. In A. Alexiadou, N. Fuhrhop, P. Law, and S. Löhken (eds.), ZAS Papers in Linguistics 6, 96–114. Berlin: ZAS.
Kossmann, M. (2012). Berber. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 18–101. Cambridge: Cambridge University Press.
Kossmann, M. (2013). A Grammatical Sketch of Ghadames Berber (Libya). Cologne: Rüdiger Köppe.
Kostić, D. (1983). Recenicka Melodija u Srpskohrvatskom Jeziku. Beograd: Rad.
Kovács, Á. M., E. Téglás, and A. D. Endress (2010). The social sense: Susceptibility to others’ beliefs in human infants and adults. Science 330(6012), 1830–1834.
Kovács, M. (2002). Tendenciák és szabályszerűségek a magánhangzó-időtartamok produkciójában és percepciójában. PhD dissertation, University of Debrecen.
Kozasa, T. (2000). Moraic tetrameter in Japanese poetry. In G. Sibley, N. Ochner, and K. Russell (eds.), Proceedings 2000: Selected Papers from the 4th College-Wide Conference for Students in Languages, Linguistics and Literature, 9–19. Honolulu: National Foreign Language Resource Center.
Kraehenmann, A. (2001). Swiss German stops: Geminates all over the word. Phonology 18, 109–145.
Kraehenmann, A. (2003). Quantity and Prosodic Asymmetries in Alemannic: Synchronic and Diachronic Perspectives. Berlin: Mouton de Gruyter.
Krahmer, E., and M. Swerts (2001). On the alleged existence of contrastive accents. Speech Communication 34(4), 391–405.
Krahmer, E., and M. Swerts (2005). How children and adults produce and perceive uncertainty in audiovisual speech. Language and Speech 48(1), 29–53.
Krahmer, E., and M. Swerts (2007). The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception. Journal of Memory and Language 57(3), 396–414.
Kráľ, Á. (1988). Pravidlá slovenskej výslovnosti. Bratislava: Slovenské Pedagogické Nakladateľstvo.
Kraljic, T., S. E. Brennan, and A. G. Samuel (2008). Accommodating variation: Dialects, idiolects, and speech processing. Cognition 107(1), 54–81.
Kratzer, A., and E. O. Selkirk (2020). Deconstructing information structure. lingbuzz/004201.
Krauss, M. E. (1985a). Introduction. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 1–6. Fairbanks: Alaska Native Language Center, University of Alaska.
Krauss, M. E. (1985b). Supplementary notes on Central Siberian Yupik prosody. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 47–50. Fairbanks: Alaska Native Language Center, University of Alaska.

Krauss, M. E. (1985c). Sirenikski and Naukanski. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 175–190. Fairbanks: Alaska Native Language Center, University of Alaska. Krauss, M. E. (2005). Athabaskan tone (1979). In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 55–136. Amsterdam: John Benjamins. Kreiman, J., and D. Sidtis (2011). Foundations of Voice Studies: Interdisciplinary Approaches to Voice Production and Perception. Boston: Wiley Blackwell. Kreiner, H., and Z. Eviatar (2014). The missing link in the embodiment of syntax: Prosody. Brain and Language 137, 91–102. Kreitewolf, J., A. D. Friederici, and K. von Kriegstein (2014). Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition. Neuroimage 102, 332–344. Krifka, M. (2008). Basic notions of information structure. Acta Linguistica Hungarica 55(3), 243–276. Krishnamurti, B. (2003). The Dravidian Languages. Cambridge: Cambridge University Press. Krishnan, A., J. T. Gandour, S. Ananthakrishnan, G. M. Bidelman, and C. J. Smalt (2011). Functional ear (a)symmetry in brainstem neural activity relevant to encoding of voice pitch: A precursor for hemispheric specialization? Brain and Language 119(3), 226–231. Krishnan, A., Y. Xu, J. T. Gandour, and P. Cariani (2005). Encoding of pitch in the human brainstem is sensitive to language experience. Cognitive Brain Research 25(1), 161–168. Kristoffersen, G. (1992). Tonelag i sammensatte ord i østnorsk. Norsk Lingvistisk Tidsskrift 10, 39–65. Kristoffersen, G. (1993). An autosegmental analysis of East Norwegian pitch accent. In B. Granström and L. Nord (eds.), Nordic Prosody VI: Papers from a Symposium, 109–122. Stockholm: Almqvist & Wiksell. Kristoffersen, G. (2000). The Phonology of Norwegian. Oxford: Oxford University Press. Kristoffersen, G. (2006). Is 1 always less than 2 in Norwegian tonal accents? In M. 
de Vaan (ed.), Germanic Tone Accents (Zeitschrift für Dialektologie und Linguistik, Beiheft 131), 63–71. Stuttgart: Franz Steiner. Krivokapić, J. (2007). Prosodic planning: Effects of phrasal length and complexity on pause duration. Journal of Phonetics 35, 162–179. Krivokapić, J. (2012). Prosodic planning in speech production. In S. Fuchs, M. Weihrich, D. Pape, and P. Perrier (eds.), Speech Planning and Dynamics, 157–190. Frankfurt: Peter Lang. Krivokapić, J., and D. Byrd (2012). Prosodic boundary strength: An articulatory and perceptual study. Journal of Phonetics 40, 430–442. Krivokapić, J., M. K. Tiede, and M. Tyrone (2017). A kinematic study of prosodic structure in articulatory and manual gestures: Results from a novel method of data collection. Laboratory Phonology 8(3), 1–26. Kropp, M. E. (1981). On the characterization of stress in West African tone languages. Afrika und Übersee 64, 227–236. Kropp Dakuku, M. E. (2002). Ga Phonology (Language Monograph 6). Legon: Institute of African Studies, University of Ghana. Krull, D. (2001). Perception of Estonian word prosody in whispered speech. In W. A. van Dommelen and T. Fretheim (eds.) Nordic Prosody: Proceedings of the VIIIth Conference, Trondheim 2000, 153–164, Frankfurt: Peter Lang. Kuang, J. (2013a). Phonation in tonal contrasts. PhD dissertation, University of California, Los Angeles. Kuang, J. (2013b). The tonal space of contrastive five level tones. Phonetica 70, 1–23. Kuang, J. (2017). Covariation between voice quality and pitch: Revisiting the case of Mandarin creaky voice. Journal of the Acoustical Society of America 142(3), 1693–1706. Kuang, J., Y. Guo, and M. Liberman (2016). Voice quality as a pitch-range indicator. In Proceedings of Speech Prosody 8, 1061–1065, Boston. Kuang, J., and M. Y. Liberman (2016a). Pitch-range perception: The dynamic interaction between voice quality and fundamental frequency. In INTERSPEECH 2016, 1350–1354, San Francisco. Kuang, J., and M. Y. Liberman (2016b). 
The effect of vocal fry on pitch perception. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 5260–5264, Shanghai. Kubler, C. (1985). The influence of Southern Min on the Mandarin of Taiwan. Anthropological Linguistics 27, 156–176.

Kubozono, H. (1988). The organization of Japanese Prosody. PhD dissertation, University of Edinburgh. (Published 1993, Tokyo: Kurosio.) Kubozono, H. (2008). Japanese accent. In S. Miyagawa and M. Saito (eds.), The Oxford Handbook of Japanese Linguistics, 165–191. Oxford: Oxford University Press. Kubozono, H. (2012). Varieties of pitch accent systems in Japanese. Lingua 122(13), 1395–1414. Kubozono, H. (2015). Japanese dialects and general linguistics. Gengo Kenkyu 148, 1–31. Kubozono, H. (2016). Diversity of pitch accent systems in Koshikijima Japanese. Gengo Kenkyu 150, 1–31. Kubozono, H. (2018a). Pitch accent. In Y. Hasegawa (ed.), The Cambridge Handbook of Japanese Linguistics, 154–180. Cambridge: Cambridge University Press. Kubozono, H. (2018b). Loanword accent in Kyungsang Korean: A moraic account. In K. Nishiyama, H. Kishimoto, and E. Aldridge (eds.), Topics in Theoretical Asian Linguistics: Studies in Honor of John B. Whitman, 303–329. Amsterdam: John Benjamins. Kubozono, H. (2018c). Postlexical tonal neutralizations in Kagoshima Japanese. In H. Kubozono and M. Giriko (eds.), Tonal Change and Neutralization, 27–57. Berlin: De Gruyter Mouton. Kubozono, H. (2019). Secondary high tones in Koshikijima Japanese. The Linguistic Review 36(1), 25–50. Kügler, F., and S. Skopeteas (2006). Interaction of lexical tone and information structure in Yucatec Maya. In Proceedings of the 2nd International Symposium on Tonal Aspects of Languages, 77–82, La Rochelle. Kügler, F. (2007a). The Intonational Phonology of Swabian and Upper Saxon. Tübingen: Niemeyer. Kügler, F. (2007b). Timing of legal and illegal consonant clusters in Swedish. In Proceedings of Fonetik 2007, 9–12, Stockholm. Kügler, F. (2011). The prosodic expression of focus in typologically unrelated languages. Habilitation thesis, University of Potsdam. Kügler, F. (2016). Tone and intonation in Akan. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 89–129. 
Berlin: De Gruyter Mouton. Kügler, F. (2020). Post-focal compression as a prosodic cue for focus perception in Hindi. Journal of South Asian Linguistics 10(2), 38–59. Kügler, F., and C. Féry (2017). Postfocal downstep in German. Language and Speech 60(2), 260–288. Kügler, F., and S. Genzel (2012). On the prosodic expression of pragmatic prominence: The case of pitch register lowering in Akan. Language and Speech 55(3), 331–359. Kügler, F., and S. Genzel (2014). On the elicitation of focus: Prosodic differences as a function of sentence mode of the context? In Proceedings of the 4th International Symposium on Tonal Aspects of Languages, 71–74, Nijmegen. Kügler, F., and A. Gollrad (2015). Production and perception of contrast: The case of the rise-fall contour in German. Frontiers in Psychology 6(1254), 1–18. Kügler, F., and S. Skopeteas (2007). On the universality of prosodic reflexes of contrast: The case of Yucatec Maya. In Proceedings of the 16th International Congress of Phonetic Sciences, 1025–1028, Saarbrücken. Kügler, F., S. Skopeteas, and E. Verhoeven (2007). Encoding information structure in Yucatec Maya: On the interplay of prosody and syntax. In S. Ishihara, S. Jannedy, and A. Schwarz (eds.), Interdisciplinary Studies on Information Structure 8 (Working Papers of the SFB 632), 187–208. Potsdam: University of Potsdam. Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience 5(11), 831–843. Kuhl, P. K., J. E. Andruski, I. A. Chistovich, L. A. Chistovich, E. V. Kozhevnikova, V. L. Ryskina, E. I. Stolyarova, U. Sundberg, and F. Lacerda (1997). Cross-language analysis of phonetic units in language addressed to infants. Science 277(5326), 684–686. Kuhl, P. K., B. T. Conboy, S. Coffey-Corina, D. Padden, M. Rivera-Gaxiola, and T. Nelson (2008). Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). 
Philosophical Transactions of the Royal Society Series B: Biological Sciences 363(1493), 979–1000.

Kuhn, T. S. (2012). The Structure of Scientific Revolutions (50th anniv. ed.). Chicago: University of Chicago Press. Kula, N. C., and B. Braun (2015). Mental representation of tonal spreading in Bemba: Evidence from elicited production and perception. Southern African Linguistics and Applied Language Studies 33, 307–323. Kula, N. C., and S. Hamann (2017). Intonation in Bemba. Intonation in African Tone Languages 24, 321–363. Kümmel, M. (2007). Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Wiesbaden: Reichert. Kundrotas, G. (2004). Intonation in the Lithuanian language: Theory and practice. Žmogus ir žodis 6(1), 13–29. Kundrotas, G. (2008). Lietuvių kalbos intonacinių kontūrų fonetiniai požymiai (eksperimentinis-fonetinis tyrimas). Žmogus ir Žodis 1, 43–55. Kundrotas, G. (2010). Lietuvių kalbos intonacinių kontūrų funkcinės galimybės. Žmogus ir Žodis 1, 42–47. Kung, C., D. Chwilla, and H. Schriefers (2014). The interaction of lexical tone, intonation and semantic context in on-line spoken word recognition: An ERP study on Cantonese Chinese. Neuropsychologia 53, 293–309. Kung, S. S. (2007). A descriptive grammar of Huehuetla Tepehua. PhD dissertation, University of Texas at Austin. Kuperberg, G. R., and F. T. Jaeger (2016). What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience 31, 32–59. Kurumada, C., M. Brown, and M. Tanenhaus (2018). Effects of distributional information on categorization of prosodic contours. Psychonomic Bulletin and Review 25, 1153–1160. Kuschke, S., B. Vinck, and S. Geertsema (2016). A combined prosodic and linguistic treatment approach for language-communication skills in children with autism spectrum disorder: A proof-of-concept study. South African Journal of Childhood Education 6(1), 1–8. Kuschmann, A., N. Miller, A. Lowit, and L. Pennington (2017). 
Intonation patterns in older children with cerebral palsy before and after speech intervention. International Journal of Speech-Language Pathology 19(4), 370–380. Kutsch Lojenga, C. (1994). Ngiti: A Central-Sudanic Language of Zaire. Cologne: Rüdiger Köppe. Kwok, V. P., G. Dan, K. Yakpo, S. Matthews, and L. H. Tan (2016). Neural systems for auditory perception of lexical tones. Journal of Neurolinguistics 37, 34–40. Ladd, D. R. (1978). Stylized intonation. Language 54, 517–540. Ladd, D. R. (1980). The Structure of Intonational Meaning: Evidence from English. Bloomington: Indiana University Press. Ladd, D. R. (1983). Phonological features of intonational peaks. Language 59, 721–759. Ladd, D. R. (1988). Declination ‘reset’ and the hierarchical organization of utterances. Journal of the Acoustical Society of America 84, 530–544. Ladd, D. R. (1990). Metrical representation of pitch register. In J. Kingston and M. E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, 35–57. Cambridge: Cambridge University Press. Ladd, D. R. (1993). On the theoretical status of ‘the baseline’ in modeling intonation. Language and Speech 36, 435–451. Ladd, D. R. (1996). Intonational Phonology. Cambridge: Cambridge University Press. Ladd, D. R. (2008a). Review of Jun, S.-A. (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford: OUP (2005). Phonology 25, 372–376. Ladd, D. R. (2008b). Intonational Phonology (2nd ed.). Cambridge: Cambridge University Press. Ladd, D. R., and A. Cutler (1983). Introduction: Models and measurements in the study of prosody. In A. Cutler and D. R. Ladd (eds.), Prosody: Models and Measurements, 1–10. Berlin: Springer. Ladd, D. R., D. Faulkner, H. Faulkner, and A. Schepman (1999). Constant ‘segmental anchoring’ of f0 movements under changes in speech rate. Journal of the Acoustical Society of America 106, 1543–1554.

Ladd, D. R., I. Mennen, and A. Schepman (2000). Phonological conditioning of peak alignment in rising pitch accents in Dutch. Journal of the Acoustical Society of America 107, 2685–2696. Ladd, D. R., and C. Johnson (1987). ‘Metrical’ factors in the scaling of sentence-initial accent peaks. Phonetica 44, 238–245. Ladd, D. R., and R. Morton (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics 25(3), 313–342. Ladd, D. R., B. Remijsen, and C. A. Manyang (2009a). On the distinction between regular and irregular inflectional morphology: Evidence from Dinka. Language 85, 659–670. Ladd, D. R., and A. Schepman (2003). ‘Sagging transitions’ between high pitch accents in English: Experimental evidence. Journal of Phonetics 31, 81–112. Ladd, D. R., A. Schepman, L. White, L. M. Quarmby, and R. Stackhouse (2009b). Structural and dialectal effects on pitch peak alignment in two varieties of British English. Journal of Phonetics 37, 145–161. Ladd, D. R., K. Silverman, F. Tolkmitt, G. Bergman, and K. Scherer (1985). Evidence for the independent function of intonation contour type, voice quality, and f0 range in signaling speaker affect. Journal of the Acoustical Society of America 78, 435–444. Ladd, D. R., R. Turnbull, C. Browne, C. Caldwell-Harris, L. Ganushchak, K. Swoboda, V. Woodfield, and D. Dediu (2013). Patterns of individual differences in the perception of missing-fundamental tones. Journal of Experimental Psychology: Human Perception and Performance 39, 1386–1397. Ladefoged, P. (1963). Some physiological parameters in speech. Language and Speech 6, 109–119. Ladefoged, P. (1967). Three Areas of Experimental Phonetics. Oxford: Oxford University Press. Ladefoged, P. (1975). A Course in Phonetics. New York: Harcourt Brace Jovanovich. Ladefoged, P. (2003). Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques. Oxford: Blackwell. Ladefoged, P., J. Ladefoged, and D. L. Everett (1997). 
Phonetic structures of Banawá, an endangered language. Phonetica 54, 94–111. Ladefoged, P., J. Ladefoged, A. E. Turk, K. Hind, and S. J. Skilton (1998). Phonetic structures of Scottish Gaelic. Journal of the International Phonetic Association 28, 1–41. Ladefoged, P., and N. P. McKinney (1963). Loudness, sound pressure, and subglottal pressure in speech. Journal of the Acoustical Society of America 35, 454–460. Lado, R. (1957). Linguistics across Cultures: Applied Linguistics for Language Teachers. Ann Arbor: Michigan University Press. Lafarge, F. (1978). Etudes phonologiques des parlers Kosop (Kim) et Gerep (Djouman). PhD dissertation, New Sorbonne University. Lahiri, A. (2000). Hierarchical restructuring in the creation of verbal morphology in Bengali and Germanic: Evidence from phonology. In A. Lahiri (ed.), Analogy, Levelling, Markedness, 71–123. Berlin: Mouton de Gruyter. Lahiri, A., and J. Fitzpatrick-Cole (1999). Emphatic clitics in Bengali. In R. Kager and W. Zonneveld (eds.), Phrasal Phonology, 119–144. Dordrecht: Foris. Lahiri, A., and H. J. Kennard (2019). Pertinacity in loanwords: Same underlying systems, different outputs. In M. Cennamo and C. Fabrizio (eds.), Historical Linguistics 2015, 57–74. Amsterdam: John Benjamins. Lahiri, A., and J. C. Koreman (1988). Syllable weight and quantity in Dutch. In Proceedings of the 7th West Coast Conference on Formal Linguistics, 217–228. Stanford: Stanford Linguistic Association. Lahiri, A., and F. Plank (2010). Phonological phrasing in Germanic: The judgement of history, confirmed through experiment. Transactions of the Philological Society 108, 370–398. Lahiri, A., A. Wetterlin, and E. Jönsson-Steiner (2005). Lexical specification of tone in North Germanic. Nordic Journal of Linguistics 28, 61–96. Lai, C. (2012). Rises all the way up: The interpretation of prosody, discourse attitudes and dialogue structure. PhD dissertation, University of Pennsylvania.

Lai, W. (2018). Voice gender effect on tone categorization and pitch perception. In Proceedings of the 6th International Symposium on Tonal Aspects of Languages, 103–107, Berlin. Lambert, H. M. (1943). Marathi Language Course. Oxford: Oxford University Press. Lambrecht, K. (1996). Information Structure and Sentence Form: Topic, Focus, and the Mental Representations of Discourse Referents. Cambridge: Cambridge University Press. Lambrecht, K. (2001). A framework for the analysis of cleft constructions. Linguistics 39(3), 463–516. Lancia, L., D. Voigt, and G. Krasovitskiy (2016). Characterization of laryngealization as irregular vocal fold vibration and interaction with prosodic prominence. Journal of Phonetics 54, 80–97. Landaburu, J. (1979). La langue des Andoke, Amazonie colombienne. Paris: CNRS. Langeweg, S. J. (1988). The stress system of Dutch. PhD dissertation, Leiden University. Langus, A., J. Mehler, and M. Nespor (2016). Rhythm in language acquisition. Neuroscience and Biobehavioral Reviews 81, 158–166. Laniran, Y. (1992). Intonation in tone languages: The phonetic implementation of tones in Yoruba. PhD dissertation, Cornell University. Laniran, Y., and G. N. Clements (2003). Downstep and high tone raising: Interacting factors in Yoruba tone production. Journal of Phonetics 31, 203–250. Laplane, D., M. Baulac, D. Widlöcher, and B. DuBois (1984). Pure psychic akinesia with bilateral lesions of basal ganglia. Journal of Neurology, Neurosurgery, and Psychiatry 47, 377–385. LaPolla, R. J., and C. Huang (2003). A Grammar of Qiang with Annotated Texts and Glossary. Berlin: Mouton de Gruyter. Larish, M. D. (1999). The position of Moken and Moklen within the Austronesian language family. PhD dissertation, University of Hawaiʻi at Manoa. Larsen, R., and E. V. Pike (1949). Huasteco intonations and phonemes. Language 25(3), 268–277. Lashley, K. (1951). The problem of serial order in behavior. In L. A. 
Jeffress (ed.), Cerebral Mechanisms in Behavior, 112–131. New York: Wiley. Lass, R. (2002). South African English. In R. Mesthrie (ed.), Language in South Africa, 104–126. Cambridge: Cambridge University Press. Lau, E. F., C. Phillips, and D. Poeppel (2008). A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience 9(12), 920–933. Lau, J. C. Y., P. C. Wong, and B. Chandrasekaran (2017). Context-dependent plasticity in the subcortical encoding of linguistic pitch patterns. Journal of Neurophysiology 117(2), 594–603. Laubstein, A. S. (1987). Syllable structure: The speech error evidence. Canadian Journal of Linguistics 32(4), 339–363. Laver, J. (1980). The Phonetic Description of Voice Quality. Cambridge: Cambridge University Press. Lavitskaya, Y. (2015). Prosodic structure of Russian: A psycholinguistic investigation of the metrical structure of Russian nouns. PhD dissertation, University of Konstanz. Law, D. (2013). Mayan historical linguistics in a new age. Language and Linguistics Compass 7(3), 141–156. Law, D. (2014). Language Contact, Inherited Similarity and Social Difference: The Story of Linguistic Interaction in the Maya Lowlands. Amsterdam: John Benjamins. Lazard, G. (1992). A Grammar of Contemporary Persian. Costa Mesa, CA: Mazda. Le Grézause, E. (2015). Investigating weight-sensitive stress in disyllabic words in Marathi and its acoustic correlates. University of Washington Working Papers in Linguistics 33, 33–52. Lea, W. (1980). Prosodic aids to speech recognition. In W. Lea (ed.), Trends in Speech Recognition, 166–205. Englewood Cliffs: Prentice-Hall. Leander, A. J. (2008). Acoustic correlates of fortis/lenis in San Francisco Ozolotepec Zapotec. MA thesis, University of North Dakota. Leather, J. (1987). F0 pattern inference in the perceptual acquisition of second language tone. In A. James and J. Leather (eds.), Sound Patterns in Second Language Acquisition, 59–81. Dordrecht: Foris. Leben, W. R. (1973). 
Suprasegmental phonology. PhD dissertation, MIT. (Published 1980, New York: Garland.)

Leben, W. R. (1976). The tones in English intonation. Linguistic Analysis 2, 69–107. Leben, W. R. (in press). The nature(s) of downstep. In F. Ahoua (ed.), Proceedings from SLAO/1er Colloque International, Humboldt Kolleg. Abidjan, 2014. Leben, W. R., and F. Ahoua (1997). Prosodic domains in Baule. Phonology 25, 113–132. Lecanuet, J. P., and C. Granier-Deferre (1993). Speech stimuli in the fetal environment. In B. de Boysson-Bardies, S. de Schonen, P. W. Jusczyk, P. MacNeilage, and J. Morton (eds.), Developmental Neurocognition: Speech and Face Processing in the First Year of Life, 237–248. Dordrecht: Kluwer Academic Press. Lecanuet, J.-P., C. Granier-Deferre, and A.-Y. Jacquet (1992). Decelerative cardiac responsiveness to acoustical stimulation in the near term fetus. Quarterly Journal of Experimental Psychology, 44b(3), 279–303. Lee, C.-C., M. Black, A. Katsamanis, A. Lammert, B. Baucom, A. Christensen, P. G. Georgiou, and S. S. Narayanan (2010). Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples. In INTERSPEECH 2010, 793–796, Makuhari. Lee, C.-Y. (2007). Does horse activate mother? Processing lexical tone in form priming. Language and Speech 50, 101–123. Lee, C.-Y. (2009). Identifying isolated, multispeaker Mandarin tones from brief acoustic input: A perceptual and acoustic study. Journal of the Acoustical Society of America 125, 1125–1137. Lee, E.-K., and S. H. Fraundorf (2017). Effects of contrastive accents in memory for L2 discourse. Bilingualism: Language and Cognition 20(5), 1063–1079. Lee, E. W. (1977). Devoicing, aspiration, and vowel split in Haroi: Evidence for register (contrastive tongue-root position). In D. D. Thomas, E. W. Lee, and N. Đ. Liêm (eds.), Papers in Southeast Asian Linguistics 4, 87–104. Canberra: Australian National University. Lee, H.-B. (1987). Korean prosody: Speech rhythm and intonation. Korean Journal 27(2), 42–68. Lee, H.-Y. (1997). Korean Prosody. 
Seoul: Korean Research Center. Lee, I., and R. Ramsey (2000). The Korean Language. New York: State University of New York Press. Lee, J., J. Jang, and L. Plonsky (2015). The effectiveness of second language pronunciation instruction: A meta-analysis. Applied Linguistics 36(3), 345–366. Lee, M.-W., and J. Gibbons (2007). Rhythmic alternation and the optional complementiser in English: New evidence of phonological influence on grammatical encoding. Cognition 105(2), 446–456. Lee, S.-H. (2014). The intonation patterns of accentual phrase in Jeju dialect. Phonetics and Speech Sciences 6(4), 117–123. Lee, S.-O. (2000). Vowel length and tone. Se Kukeo Saynghwal, 10(1), 197–209. Lee, T. (1983). An acoustical study of the register distinction in Mon. UCLA Working Papers in Phonetics 57, 79–96. Lee, Y.-S., D. A. Vakoch, and L. H. Wurm (1996). Tone perception in Cantonese and Mandarin: A cross-linguistic comparison. Journal of Psycholinguistic Research 25, 527–542. Leech, G. (1983). Principles of Pragmatics. London: Longman. Leemann, A., M.-J. Kolly, and V. Dellwo (2014). Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison. Forensic Science International 238, 59–67. Leer, J. (1985a). Evolution of prosody in the Yupik languages. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies, 77–134. Fairbanks: Alaska Native Language Center, University of Alaska. Leer, J. (1985b). Prosody in Alutiiq (the Koniaq and Chugach dialects of Alaskan Yupik). In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 77–133. Fairbanks: Alaska Native Language Center, University of Alaska. Leer, J. (1985c). Toward a metrical interpretation of Yupik Prosody. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies (Alaska Native Language Center Research Papers 7), 159–172. 
Fairbanks: Alaska Native Language Center, University of Alaska. Lees, R. (1961). The Phonology of Modern Standard Turkish (Uralic and Altaic Series 6). Bloomington: Indiana University.

Lehiste, I. (1960a). Segmental and syllabic quantity in Estonian. In American Studies in Uralic Linguistics, 21–82. Bloomington: Indiana University. Lehiste, I. (1960b). An acoustic-phonetic study of open juncture. Phonetica 5(Suppl.), 5–54. Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press. Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal of the Acoustical Society of America, 51(6B), 2018–2024. Lehiste, I. (1975). Experiments with synthetic speech concerning quantity in Estonian. In V. Hallap (ed.), Congressus Tertius Internationalis Fenno-Ugristarum Tallinnae Habitus, 254–269. Tallinn: Valgus. Lehiste, I. (1976). Influence of fundamental frequency pattern on the perception of duration. Journal of Phonetics 4, 113–117. Lehiste, I. (1977a). Suprasegmentals. Cambridge, MA: MIT Press. Lehiste, I. (1977b). Isochrony reconsidered. Journal of Phonetics 5, 253–263. Lehiste, I. (1997). Search for phonetic correlates in Estonian prosody. In I. Lehiste and J. Ross (eds.), Estonian Prosody: Papers from a Symposium, 11–35. Tallinn: Institute of the Estonian Language. Lehiste, I., and P. Ivić (1963). Accent in Serbo-Croatian: An Experimental Study (Michigan Slavic Materials 4). Ann Arbor: University of Michigan, Department of Slavic Languages and Literatures. Lehiste, I., and P. Ivić (1986). Word and Sentence Prosody in Serbocroatian. Cambridge, MA: MIT Press. Lehiste, I., J. Olive, and L. Streeter (1976). Role of duration in disambiguating syntactically ambiguous sentences. Journal of the Acoustical Society of America 60, 1199–1202. Lehnert-LeHouillier, H., J. McDonough, and S. McAleavey (2010). Prosodic strengthening in American English domain-initial vowels. In Proceedings of Speech Prosody 5, Chicago. Lehtonen, J. (1970). Aspects of Quantity in Standard Finnish (Studia Philologica Jyväskyläensia 6). Jyväskylä: Gummerus. Leimgruber, J. R. (2013). Singapore English: Structure, Variation, and Usage. 
Cambridge: Cambridge University Press. Leino, P. (1986). Language and Metre: Metrics and the Metrical System of Finnish. Helsinki: Suomalaisen Kirjallisuuden Seura. Lenden, J. M., and P. Flipsen (2007). Prosody and voice characteristics of children with cochlear implants. Journal of Communication Disorders 40, 66–81. Leon, S. A., J. C. Rosenbek, G. P. Crucian, B. Hieber, B. Holiway, A. D. Rodriguez, T. U. Ketterson, M. Z. Ciampitti, S. Freshwater, K. M. Heilman, and L. J. Gonzalez-Rothi (2005). Active treatments for aprosodia secondary to right hemisphere stroke. Journal of Rehabilitation Research and Development 42(1), 93–102. Leonard, T., and F. Cummins (2010). The temporal relation between beat gestures and speech. Language and Cognitive Processes 26(10), 1457–1471. Lepschy, A. L., and G. Lepschy (1981). La lingua italiana: Storia, varietà dell'uso, grammatica. Milan: Bompiani. Lerdahl, F., and R. Jackendoff (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press. Leroy, J. (1979). A la recherche de tons perdus. Journal of African Languages and Linguistics 1, 55–71. Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition 14, 41–104. Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press. Levelt, W. J. M. (2001). Spoken word production: A theory of lexical access. Proceedings of the National Academy of Sciences of the United States of America 98(23), 13464–13471. Levelt, W. J. M. (2002). Phonological encoding in speech production: Comments on Jurafsky et al., Schiller et al., and Heuven and Haan. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 87–99. Berlin: Mouton de Gruyter. Levelt, W. J. M., A. Roelofs, and A. S. Meyer (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences 22(1), 1–38.

Levi, S. (2005). Acoustic correlates of lexical accent in Turkish. Journal of the International Phonetic Association 35, 73–97. Levin, J. (1985). Reduplication in Umpila. In D. Archangeli, A. Barss, and R. Sproat (eds.), Papers in Theoretical and Applied Linguistics (MIT Working Papers in Linguistics 6), 133–159. Cambridge, MA: MIT Press. Levinson, S. C. (2010). Questions and responses in Yélî Dnye, the Papuan language of Rossel Island. Journal of Pragmatics 42(10), 2741–2755. Levinson, S. C. (2016). Turn-taking in human communication: Origins and implications for language processing. Trends in Cognitive Sciences 20(1), 6–14. Levis, J. M. (2005a). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly 39(3), 369–377. Levis, J. M. (2005b). Comparing apples and oranges? Pedagogical approaches to intonation in British and American English. In K. Dziubalska-Kolaczyk and J. Przedlacka (eds.), English Pronunciation Models: A Changing Scene, 339–366. Bern: Peter Lang. Levis, J. M., and L. Pickering (2004). Teaching intonation in discourse using speech visualization technology. System 32(4), 505–524. Levis, J. M., and S. Sonsaat (2016). Pronunciation materials. In M. Azarnoosh, M. Zeraatpishe, A. Favani, and H. R. Kargozari (eds.), Issues in Materials Development, 109–119. Rotterdam: Sense. Levitan, R., Š. Beňuš, A. Gravano, and J. Hirschberg (2015). Entrainment in Slovak, Spanish, English, and Chinese: A cross-linguistic comparison. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 325–334, Prague. Levitan, R., A. Gravano, and J. Hirschberg (2011). Entrainment in speech preceding backchannels. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 113–117, Portland. Levitan, R., A. Gravano, L. Willson, Š. Beňuš, J. Hirschberg, and A. Nenkova (2012). Acoustic-prosodic entrainment and social behavior. 
In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 11–19, Montreal. Levitan, R., and J. Hirschberg (2011). Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions. In INTERSPEECH 2011, 3081–3084, Florence. Levitt, J. S. (2014). A case study: The effects of the ‘SPEAK OUT®’ voice program for Parkinson’s disease. International Journal of Applied Science and Technology 4(2), 20–28. Levon, E. (2018). Same difference: The phonetic shape of high rising terminals in London. English Language and Linguistics 24(1), 49–73. Levow, G. (2005). Turn-taking in Mandarin dialogue: Interactions of tones and intonation. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, 72–78. Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience 5(4), 356–363. Li, A. (2015). Encoding and Decoding of Emotional Speech: A Cross-Cultural and Multimodal Study between Chinese and Japanese. Berlin: Springer. Li, A., and B. Post (2014). L2 acquisition of prosodic properties of speech rhythm: Evidence from L1 Mandarin and German learners of English. Studies in Second Language Acquisition 36(2), 223–255. Li, C. N., and S. A. Thompson (1977). The acquisition of tone in Mandarin-speaking children. Journal of Child Language 4, 185–199. Li, P. J.-K. (1981). Reconstruction of Proto-Atayalic phonology. Bulletin of the Institute of History and Philology 52(2), 235–301. Li, X., Y. Chen, and Y. Yang (2011). Immediate integration of different types of prosodic information during on-line spoken language comprehension: An ERP study. Brain Research 1386, 139–152. Li, X., and Y. Chen (2015). Representation and processing of lexical tone and tonal variants: Evidence from the mismatch negativity. PLoS ONE 10, e0143097. Li, X., Y. Yang, and P. Hagoort (2008). Pitch accent and lexical tone processing in Chinese discourse comprehension: An ERP study. 
Brain Research 1222, 192–200. Li, Y. (2016). Effects of high variability phonetic training on monosyllabic and disyllabic Mandarin Chinese tones for L2 Chinese learners. PhD dissertation, University of Kansas.

Li, Y., A. Katsika, and A. Arvaniti (2018). Modelling variability and non-local effects in intonation as cue trading. Poster presented at the 16th Conference on Laboratory Phonology, Lisbon. Liang, J., and V. J. van Heuven (2007). Chinese tone and intonation perceived by L1 and L2 listeners. In C. Gussenhoven and T. Riad (eds.), Tones and Tunes: Vol. 2. Experimental Studies in Word and Sentence Prosody, 27–61. Berlin: Mouton de Gruyter. Liang, Y., and A. Feng (1996). Fuzhouhua yindang. Shanghai: Shanghai Jiaoyu Chubanshe. Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy (1967). Perception of the speech code. Psychological Review 74, 431–461. Liberman, A. M., K. S. Harris, H. S. Hoffman, and B. C. Griffith (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology 54(5), 358–368. Liberman, M. Y. (1975). The intonational system of English. PhD dissertation, MIT. Liberman, M. Y. (1979). Intonational System of English. New York: Garland. Liberman, M. Y., K. Davis, M. Grossman, N. Martey, and J. Bell (2002). Emotional Prosody Speech and Transcripts. Philadelphia: Linguistic Data Consortium. Retrieved 21 May 2020 from https://catalog.ldc.upenn.edu/LDC2002S28. Liberman, M. Y., and J. B. Pierrehumbert (1984). Intonational invariance under changes in pitch range and length. In M. Aronoff and R. T. Oehrle (eds.), Language Sound Structure: Studies in Phonology Presented to Morris Halle, 157–233. Cambridge, MA: MIT Press. Liberman, M. Y., and A. Prince (1977). On stress and linguistic rhythm. Linguistic Inquiry 8, 249–336. Liberman, M. Y., and I. Sag (1974). Prosodic form and discourse function. In Proceedings of the 10th Meeting of the Chicago Linguistics Society, 402–415. Chicago: Chicago Linguistics Society. Lichtenberk, F. (1983). A Grammar of Manam. Honolulu: University of Hawaiʻi Press. Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia 7, 128–134.
Licklider, J. C. R. (1954). ‘Periodicity’ pitch and ‘place’ pitch. Journal of the Acoustical Society of America 26S, 945. Liddell, S. K. (1978). Nonmanual signals and relative clauses in American Sign Language. In P. Siple (ed.), Understanding Language through Sign Language Research, 59–90. New York: Academic Press. Liddell, S. K. (1980). American Sign Language Syntax. The Hague: Mouton. Liddell, S. K., and R. E. Johnson (1986). American Sign Language compound formation processes, lexicalization, and phonological remnants. Natural Language and Linguistic Theory 4(4), 445–513. Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America 32, 451–454. Lieberman, P. (1965). On the acoustic basis of the perception of intonation by linguists. Word 21(1), 40–54. Lieberman, P. (1966). Intonation, perception, and language. PhD dissertation, MIT. (Published 1967, Cambridge: MIT Press.) Liljencrants, J., and B. Lindblom (1972). Numerical simulation of vowel quality systems: The role of perceptual contrast. Language 48, 839–862. Lillo-Martin, D., and R. M. de Quadros (2008). Focus constructions in American Sign Language and Língua de Sinais Brasileira. In Signs of the Time: Selected Papers from TISLR 8, 161–176. Barcelona: Seedorf. Lim, J. K. S. (2018). The role of prosodic structure in the word tonology of Lhasa Tibetan. PhD dissertation, University of Ottawa. Lim, S. J., and L. L. Holt (2011). Learning foreign sounds in an alien world: Videogame training improves non-native speech categorization. Cognitive Science 35, 1390–1405. Lin, C. Y., M. Wang, W. J. Idsardi, and Y. Xu (2014). Stress processing in Mandarin and Korean second language learners of English. Bilingualism: Language and Cognition 17, 316–346. Lin, H.-B. (1988b). Contextual stability of Taiwanese tones. PhD dissertation, University of Connecticut. Lin, M. (2004). Chinese intonation and tone. Applied Linguistics 3, 57–67.

Lin, M. (1988a). Putonghua shengdiao de shengxue texing he zhijue zhengzhao. Zhongguo Yuwen 204, 182–193. Lin, T. (1985). Tantao Beijinghua qingyin xingzhi de chubu shiyan. In S. Hu (ed.), Beijing yuyin shiyanlu, 1–26. Beijing: Beijing Daxue Chubanshe. Lindau, M. (1986). Testing a model of intonation in a tone language. Journal of the Acoustical Society of America 80, 757–764. Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America 35, 1773–1781. Lindblom, B. (2009). F0 lowering, creaky voice, and glottal stop: Jan Gauffin’s account of how the larynx works in speech. In Proceedings of Fonetik 2009, 8–11, Stockholm. Lindner, J. L., and L. A. Rosén (2006). Decoding of emotion through facial expression, prosody and verbal content in children and adolescents with Asperger’s syndrome. Journal of Autism and Developmental Disorders 36(6), 769–777. Lindová, J., M. Špinka, and L. Nováková (2015). Decoding of baby calls: Can adult humans identify the eliciting situation from emotional vocalizations of preverbal infants? PLoS ONE 10(4), e0124317. Lindqvist-Gauffin, J. (1972a). A descriptive model of laryngeal articulation in speech. Speech Transmission Laboratory Quarterly Progress and Status Report (Department of Speech Transmission, Royal Institute of Technology, Stockholm) 2–3, 1–9. Lindqvist-Gauffin, J. (1972b). Laryngeal articulation studied on Swedish subjects. Speech Transmission Laboratory Quarterly Progress and Status Report (Department of Speech Transmission, Royal Institute of Technology, Stockholm) 2–3, 10–27. Lindström, E., and B. Remijsen (2005). Aspects of the prosody of Kuot, a language where intonation ignores stress. Linguistics 43(4), 839–870. Lindström, L. (2005). Finiitverbi asend lauses: Sõnajärg ja seda mõjutavad tegurid suulises eesti keeles. PhD dissertation, University of Tartu. Lindström, R., T. Lepistö-Paisley, R. Vanhala, R. Alén, and T. Kujala (2016).
Impaired neural discrimination of emotional speech prosody in children with autism spectrum disorder and language impairment. Neuroscience Letters 628, 47–51. Ling, B., and J. Liang (2017). Focus encoding and prosodic structure in Shanghai Chinese. Journal of the Acoustical Society of America 141, EL610. Lionnet, F. (2015). Mid-tone lowering in Laal: The phonology/syntax interface in question. In Proceedings of the 49th Annual Meeting of the Chicago Linguistics Society (CLS 49, 2013). Chicago: Chicago Linguistics Society. Lippus, P. (2011). The acoustic features and perception of the Estonian quantity system. PhD dissertation, Tartu University Press. Lippus, P., E. L. Asu, and M.-L. Kalvik (2014). An acoustic study of Estonian word stress. In N. Campbell, D. Gibbon, and D. Hirst (eds.), Proceedings of Speech Prosody 7, 232–235, Dublin. Liscombe, J., G. Riccardi, and D. Hakkani-Tur (2005). Using context to improve emotion detection in spoken dialog systems. In INTERSPEECH 2005, 1845–1848, Lisbon. Liscombe, J., J. Venditti, and J. Hirschberg (2003). Classifying subject ratings of emotional speech using acoustic features. In Eurospeech 2003, 725–728, Geneva. Lisker, L. (1972). Stop duration and voicing in English. In A. Valdman (ed.), Papers in Linguistics and Phonetics to the Memory of Pierre Delattre, 339–343. The Hague: Mouton. Lisker, L., and A. S. Abramson (1964). A cross-language study of voicing in initial stops: Acoustical measurements. WORD: Journal of the International Linguistic Association 20, 384–422. Liss, J. M., L. White, S. L. Mattys, K. Lansford, A. Lotto, S. M. Spitzer, and J. N. Caviness (2009). Quantifying speech rhythm abnormalities in the dysarthrias. Journal of Speech, Language, and Hearing Research 52(5), 1334–1352. Lissoir, M.-P. (2016). Le khap tai dam, catégorisation et modèles musicaux. PhD dissertation, Free University of Brussels and New Sorbonne University. List, G. (1961). Speech melody and song melody in central Thailand. 
Ethnomusicology 5(1), 16–32.

Litman, D. (1996). Cue phrase classification using machine learning. Journal of Artificial Intelligence Research 5, 53–94. Liu, Y., E. Shriberg, and A. Stolcke (2003). Automatic disfluency identification in conversational speech using multiple knowledge sources. In Eurospeech 2003, 957–960, Geneva. Liu, F., and Y. Xu (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62, 70–87. Liu, L., and R. Kager (2014). Perception of tones by infants learning a non-tone language. Cognition 133(2), 385–394. Liu, L., and R. Kager (2018). Monolingual and bilingual infants’ ability to use non-native tone for word learning deteriorates by the second year after birth. Frontiers in Psychology 9, 117. Liu, L., D. Peng, G. Ding, Z. Jin, L. Zhang, K. Li, and C. Chen (2006). Dissociation in the neural basis underlying Chinese tone and vowel production. NeuroImage 29, 515–523. Liu, M., Y. Chen, and N. O. Schiller (2016a). Context effects on tone and intonation processing in Mandarin. In Proceedings of Speech Prosody 8, 1056–1060, Boston. Liu, M., Y. Chen, and N. O. Schiller (2016b). Online processing of tone and intonation in Mandarin: Evidence from ERPs. Neuropsychologia, 307–317. Liu, P., and M. D. Pell (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal stimuli. Behavior Research Methods 44, 1042–1051. Liu, S., and A. G. Samuel (2004). Perception of Mandarin lexical tones when f0 information is neutralized. Language and Speech 47, 109–138. Lo, C. Y., C. M. McMahon, V. Looi, and W. F. Thompson (2015). Melodic contour training and its effect on speech in noise, consonant discrimination, and prosody perception for cochlear implant recipients. Behavioural Neurology 2015, ID 352869. Lo, T. C. (2013). Correspondences between lexical tone and music transitions in Cantonese pop songs: A quantitative and analytic approach. MA thesis, University of Edinburgh. Lockwood, W. B. (1955/1977).
An Introduction to Modern Faroese. Tórshavn: Føroyja Skúlabókargrunnur. Loehr, D. P. (2004). Gesture and intonation. PhD dissertation, Georgetown University. Loehr, D. P. (2012). Temporal, structural, and pragmatic synchrony between intonation and gesture. Laboratory Phonology 3, 71–89. Logan, J. S., S. E. Lively, and D. B. Pisoni (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America 89, 874–886. Loman, B. (1975). Prosodic patterns in a Negro American dialect. In Style and Text: Studies Presented to Nils Erik Enkvist, 219–242. Stockholm: Språkförlaget Skriptor. Longacre, R. E. (1957). Proto-Mixtecan. Bloomington: Indiana University Press. Longacre, R. E. (1995). Left shifts in strongly VSO languages. In P. A. Downing and M. Noonan (eds.), Typological Studies in Language, 331–354. Amsterdam: John Benjamins. Loos, E. (1969). The Phonology of Capanahua and its Grammatical Basis. Norman, OK: Summer Institute of Linguistics. Lopes, L. W., and I. L. B. Lima (2014). Prosódia e transtornos da linguagem: Levantamento das publicações em periódicos indexados entre 1979 e 2009. Revista CEFAC 16(2), 651–659. Loporcaro, M. (1997). L’origine del raddoppiamento fonosintattico: Saggio di fonologia diacronica romanza. Basel: Francke. Loprieno, A., and M. Müller (2012). Ancient Egyptian and Coptic. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 102–144. Cambridge: Cambridge University Press. Lorentz, O. (1995). Tonal prominence and alignment. Phonology at Santa Cruz 4, 39–56. Loukina, A., G. Kochanski, B. S. Rosner, E. Keane, and C. Shih (2011). Rhythm measures and dimensions of durational variation in speech. Journal of the Acoustical Society of America 129(5), 3258–3270. Lovick, O., and S. Tuttle (2012). The prosody of Dena’ina narrative discourse. International Journal of American Linguistics 78, 293–334. Low, E. L., and A. Brown (2005). Singapore English: An Introduction.
Singapore: McGraw-Hill. Low, E. L., and E. Grabe (1999). A contrastive study of prosody and lexical stress placement in Singapore English and British English. Language and Speech 42, 39–56.

Low, E. L., E. Grabe, and F. Nolan (2000). Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43(4), 377–401. Lowit, A., and A. Kuschmann (2012). Characterizing intonation deficit in motor speech disorders: An autosegmental-metrical analysis of spontaneous speech in hypokinetic dysarthria, ataxic dysarthria, and foreign accent syndrome. Journal of Speech, Language and Hearing Research 55, 1472–1484. Luangthongkum, T. (1977). Rhythm in standard Thai. PhD dissertation, University of Edinburgh. Luangthongkum, T. (1987). Another look at the register distinction in Mon. UCLA Working Papers in Phonetics 67, 29–48. Luce, P. A., and J. Charles-Luce (1985). Contextual effects on vowel duration, closure duration, and the consonant/vowel ratio in speech production. Journal of the Acoustical Society of America 78(6), 1949–1957. Luchkina, T., and J. S. Cole (2016). Structural and referent-based effects on prosodic expression in Russian. Phonetica 73(3–4), 279–313. Łukaszewicz, B. (2018). Phonetic evidence for an iterative stress system: The issue of consonantal rhythm. Phonology 35(1), 115–150. Łukaszewicz, B., and J. Mołczanow (2018). The role of vowel parameters in defining lexical and subsidiary stress in Ukrainian. Poznań Studies in Contemporary Linguistics 54(3), 355–375. Luke, K. K. (2000). Phonological re-interpretation: The assignment of Cantonese tones to English words. In Proceedings of the 9th International Conference on Chinese Linguistics, Singapore. Luksaneeyanawin, S. (1998). Intonation in Thai. In D. Hirst and A. Di Cristo (eds.), Intonation Systems, 376–394. Cambridge: Cambridge University Press. Lulich, S. M., H. H. Berkson, and K. de Jong (2018). Acquiring and visualizing 3D/4D ultrasound recordings of tongue motion. Journal of Phonetics 71, 410–424. Lunden, A. (2010). A Phonetically-Motivated Phonological Analysis of Syllable Weight and Stress in the Norwegian Language.
New York: Edwin Mellen Press. Lunden, A. (2013). Reanalyzing final consonant extrametricality: A proportional theory of weight. Journal of Comparative Germanic Linguistics 16, 1–31. Lunden, A., J. Campbell, M. Hutchens, and N. Kalivoda (2017). Vowel-length contrasts and phonetic cues to stress: An investigation of their relation. Phonology 34, 565–580. Lunt, H. (1952). A Grammar of the Macedonian Literary Language. Skopje: n.p. Luo, H., and D. Poeppel (2012). Cortical oscillations in auditory perception and speech: Evidence for two temporal windows in human auditory cortex. Frontiers in Psychology 3, 170. Luria, A. R. (1966). Higher Cortical Functions in Man. New York: Basic Books. Lyman, L., and R. Lyman (1977). Choapan Zapotec phonology. In W. R. Merrifield (ed.), Studies in Otomanguean Phonology, 137–161. Arlington: Summer Institute of Linguistics/University of Texas at Arlington. Lynch, J., M. Ross, and T. Crowley (2005). The Oceanic Languages. London: Routledge. Lyons, M., E. Schoen Simmons, and R. Paul (2014). Prosodic development in middle childhood and adolescence in high-functioning autism. Autism Research 7(2), 181–196. Ma, J., V. Ciocca, and T. Whitehill (2006). Effect of intonation on Cantonese lexical tones. Journal of the Acoustical Society of America 120, 3978–3987. Ma, J., V. Ciocca, and T. Whitehill (2011a). The perception of intonation questions and statements in Cantonese. Journal of the Acoustical Society of America 129, 1012–1023. Ma, W., R. M. Golinkoff, D. M. Houston, and K. Hirsh-Pasek (2011b). Word learning in infant- and adult-directed speech. Language Learning and Development 7(3), 185–201. Ma Newman, R. (1971). Downstep in Gaanda. Journal of African Languages 10, 15–27. Maas, U. (2013). Die marokkanische Akzentuierung. In R. Kuty, U. Seeger, and S. Talay (eds.), Nicht nur mit Engelszungen (Beiträge zur semitischen Dialektologie: Festschrift für Werner Arnold zum 60. Geburtstag). Wiesbaden: Harrassowitz. Maas, U., and S. Procházka (2012).
Moroccan Arabic in its wider linguistic and social contexts. Language Typology and Universals/Sprachtypologie und Universalienforschung 65(4), 329–357.

Mạc, Đ. K. (2012). Génération de parole expressive dans le cas des langues à tons. Grenoble: Grenoble INP. MacAuley, D. (1979). Some functional and distributional aspects of intonation in Scottish Gaelic: A preliminary study of tones. In D. Ó Baoill (ed.), Papers in Celtic Phonology, 27–38. Coleraine: New University of Ulster. Macaulay, M. (1996). A Grammar of Chalcatongo Mixtec. Berkeley: University of California Press. Macaulay, M., and J. C. Salmons (1995). The phonology of glottalization in Mixtec. International Journal of American Linguistics 61(1), 38–61. Mack, M., and B. Gold (1986). The effect of linguistic content upon the discrimination of pitch in monotone stimuli. Journal of Phonetics 14, 333–337. MacKay, C. J. (1994). A sketch of Misantla Totonac phonology. International Journal of American Linguistics 60(4), 369–419. MacKay, C. J. (1999). A Grammar of Misantla Totonac (Studies in Indigenous Languages of the Americas). Salt Lake City: University of Utah Press. MacKay, C. J., and F. R. Trechsel (2013). A sketch of Pisaflores Tepehua phonology. International Journal of American Linguistics 79(2), 189–218. MacKay, D. (1987). Spoonerisms: The structure of errors in the serial order of speech. Neuropsychologia 8, 323–350. Macken, M. A. (1978). Permitted complexity in phonological development: One child’s acquisition of Spanish consonants. Lingua 44(2–3), 219–253. Maddieson, I. (2013a). Tone. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 21 July 2017 from http://wals.info/chapter/13. Maddieson, I. (2013b). Syllable structure. In M. S. Dryer and M. Haspelmath (eds.), The World Atlas of Language Structures Online. Max Planck Institute for Evolutionary Anthropology. Retrieved 21 July 2017 from http://wals.info/chapter/12. Maddieson, I., S. Flavier, E. Marsico, and F. Pellegrino (2014). LAPSyD: Lyon-Albuquerque Phonological Systems Database.
Retrieved 21 May 2020 from http://www.lapsyd.ddl.cnrs.fr/lapsyd. Maddieson, I., and K.-F. Pang (1993). Tone in Utsat. In J. A. Edmondson and K. J. Gregerson (eds.), Tonality in Austronesian Languages, 75–89. Honolulu: University of Hawaiʻi Press. Mády, K. (2012). A fókusz prozódiai jelölése felolvasásban és spontán beszédben. In M. Gósy (ed.), Beszéd, Adatbázis, Kutatások, 91–107. Budapest: Akadémiai Kiadó. Mády, K., F. Kleber, U. D. Reichel, and Á. Szalontai (2016). The interplay of prominence and boundary strength: A comparative study. In Proceedings of 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum, 107–110, Munich. Mády, K., and Á. Szalontai (2014). Where do questions begin? Phrase-initial boundary tones in Hungarian polar questions. In Proceedings of Speech Prosody 7, 568–572, Dublin. Maeda, S. (1976). A characterization of American English intonation. PhD dissertation, MIT. Maekawa, K. (2003). Corpus of spontaneous Japanese: Its design and evaluation. In Proceedings of the ISCA and IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo. Magno Caldognetto, E., F. E. Ferrero, C. Lavagnoli, and K. Vagges (1978). F0 contours of statements, yes-no questions, and wh-questions of two regional varieties of Italian. Journal of Italian Linguistics 3, 57–68. Magro, E.-P. (2004). La chute de la mélodie dans les énoncés assertifs en maltais: Finalité ou continuation? In Proceedings of the 25th Journées d’Etude sur la Parole, 333–336, Fès. Mahjani, B. (2003). An instrumental study of prosodic features and intonation in modern Farsi (Persian). MA thesis, University of Edinburgh. Mahshie, J., K. Preminger, and L. Ciemniecki (2016). Production of contrastive stress by children with cochlear implants: Acoustic evidence. Poster presented at the American Speech and Hearing Association Convention, Philadelphia. Maiden, M. (1995). Evidence from the Italian dialects for the internal structure of prosodic domains. In J. C. Smith and M.
Maiden (eds.), Linguistic Theory and the Romance Languages, 115–131. Amsterdam: John Benjamins.

Mairano, P., and A. Romano (2011). Rhythm metrics for 21 languages. In Proceedings of the 17th International Congress of Phonetic Sciences, 17–21, Hong Kong. Mairesse, F., M. A. Walker, M. R. Mehl, and R. K. Moore (2007). Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research 30, 457–500. Mairs, J. L. (1989). Stress assignment in interlanguage phonology: An analysis of the stress system of Spanish speakers learning English. In S. M. Gass and J. Schachter (eds.), Linguistic Perspectives on Second Language Acquisition, 260–283. Cambridge: Cambridge University Press. Makasso, E.-M., F. Hamlaoui, and S. J. Lee (2017). Aspects of the intonational phonology of Bàsàá. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 167–194. Berlin: De Gruyter Mouton. Malamud, S., and T. Stephenson (2015). Three ways to avoid commitments: Declarative force modifiers in the conversational scoreboard. Journal of Semantics 32(2), 275–311. Malisz, Z. (2013). Speech rhythm variability in Polish and English: A study of interaction between rhythmic levels. PhD dissertation, Adam Mickiewicz University in Poznań. Malisz, Z., M. O’Dell, T. Nieminen, and P. Wagner (2016). Perspectives on speech timing: Coupled oscillator modeling of Polish and Finnish. Phonetica 73(3–4), 229–255. Malisz, Z., and P. Wagner (2012). Acoustic-phonetic realisation of Polish syllable prominence: A corpus study. In D. Gibbon, D. Hirst, and N. Campbell (eds.), Rhythm, melody and harmony in speech: Studies in honour of W. Jassem (special issue), Speech and Language Technology 14–15, 105–114. Malisz, Z., and M. Żygis (2018). Lexical stress in Polish: Evidence from focus and phrase-position differentiated production data. In Proceedings of Speech Prosody 9, 1008–1012, Poznań. Mallory, J. P., and D. Q. Adams (2006). The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World.
Oxford: Oxford University Press. Malone, T. (2006). Tone and syllable structure in Chimila. International Journal of American Linguistics 72, 1–58. Mampe, B., A. D. Friederici, A. Christophe, and K. Wermke (2009). Newborns’ cry melody is shaped by their native language. Current Biology 19(23), 1994–1997. Manaster-Ramer, A. (1986). Genesis of Hopi tones. International Journal of American Linguistics 52, 154–160. Mandel, D. R., P. W. Jusczyk, and D. B. Pisoni (1995). Infants’ recognition of the sound patterns of their own names. Psychological Science 6(5), 315–318. Manfredi, V. (1993). Spreading and downstep: Prosodic government in tone languages. In H. van der Hulst and K. Snider (eds.), The Phonology of Tone, 133–184. Berlin: Mouton de Gruyter. Maniwa, K., A. Jongman, and T. Wade (2009). Acoustic characteristics of clearly spoken English fricatives. Journal of the Acoustical Society of America 125(6), 3962–3973. Männel, C., and A. D. Friederici (2009). Pauses and intonational phrasing: ERP studies in 5-month-old German infants and adults. Journal of Cognitive Neuroscience 21, 1988–2006. Manolescu, A., D. Olson, and M. Ortega-Llebaria (2009). Cues to contrastive focus in Romanian. In M. Vigário, S. Frota, and M. J. Freitas (eds.), Interactions in Phonetics and Phonology, 71–90. Amsterdam: John Benjamins. Manson, J. H., G. Bryant, M. M. Gervais, and M. A. Kline (2013). Convergence of speech rate in conversation predicts cooperation. Evolution and Human Behavior 34(6), 419–426. ManyBabies Consortium (2020). Quantifying sources of variability in infancy research using the infant-directed-speech preference. Advances in Methods and Practices in Psychological Science 3(1), 24–52. Marantz, A. (1982). Re reduplication. Linguistic Inquiry 13, 483–545. Marchese, L. (1979). Atlas linguistique kru. Abidjan: Institut de Linguistique Appliquée. Marchi, E., B. Schuller, A. Batliner, S. Fridenzon, S. Tal, and O. Golan (2012).
Emotion in the speech of children with autism spectrum conditions: Prosody and everything else. In Proceedings of the 3rd Workshop on Child, Computer and Interaction, Portland. Marković, M., and T. Milićev (2017). The effect of rhythm unit length on the duration of vowels in Serbian. Selected Papers on Theoretical and Applied Linguistics 19, 305–311.

Marlo, M. R., L. C. Mwita, and M. Paster (2014). Kuria tone melodies. Africana Linguistica 20, 277–294. Marotta, G. (1985). Modelli e misure ritmiche: La durata vocalica in italiano. Bologna: Zanichelli. Marotta, G. (2008). Lenition in Tuscan Italian (Gorgia Toscana). In J. B. de Carvalho, T. Scheer, and P. Ségéral (eds.), Lenition and Fortition, 235–271. Berlin: Mouton de Gruyter. Marshall, N., and P. Holtzapple (1976). In R. H. Brookshire (ed.), Melodic Intonation Therapy: Variations on a Theme (6th Clinical Aphasiology Conference), 115–141. Minneapolis, MN: BRK. Martens, M. P. (1988). Notes on Uma verbs. In H. Steinhauer (ed.), Papers in Western Austronesian Linguistics 4, 167–237. Canberra: Australian National University. Martin, D. (1991). Sikaritai phonology. Workpapers in Indonesian Languages and Cultures 9, 91–120. Martin, J. B. (1988). Subtractive morphology as dissociation. In Proceedings of the 7th West Coast Conference on Formal Linguistics, 229–240. Stanford: Stanford Linguistic Association. Martin, J. B. (1996). Proto-Muskogean stress. Ms., College of William and Mary. Martin, J. B. (2011). A Grammar of Creek (Muskogee). University of Nebraska Press. Martin, J. G. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 79, 487–509. Martin, L. (1984). The emergence of phonemic tone in Mocho (Mayan). Talk presented at the American Anthropological Association Meeting, Denver. Martin, P. (2014). À propos des variations prosodiques régionales. Nouveaux cahiers de linguistique française 31, 77–85. Martin, S. E. (1951). Korean phonemics. Language 27, 519–533. Martin, S. E. (1992). A Reference Grammar of Korean: A Complete Guide to the Grammar and History of the Korean Language. Tokyo: Tuttle. Martínez-Paricio, V., and R. Kager (2017). Metrically conditioned pitch and layered feet in Chugach Alutiiq. Loquens 3(2), e030. Martins, V. (2005). Reconstrução fonológica do Proto-Maku Oriental.
PhD dissertation, Vrije Universiteit Amsterdam. Martzoukou, M., D. Papadopoulou, and M.-H. Kosmidis (2017). The comprehension of syntactic and affective prosody by adults with autism spectrum disorder without accompanying cognitive deficits. Journal of Psycholinguistic Research 46(6), 1573–1595. Masataka, N. (1992). Motherese in a signed language. Infant Behavior and Development 15, 453–460. Masataka, N. (1996). Perception of motherese in a signed language by 6-month-old deaf infants. Developmental Psychology 32(5), 874–879. Masataka, N. (1998). Perception of motherese in Japanese Sign Language by 6-month-old hearing infants. Developmental Psychology 34(2), 241–246. Mase, H. (1973). A study of the role of syllable and mora for the tonal manifestation in West Greenlandic. Annual Report of the Institute of Phonetics, University of Copenhagen 7, 1–98. Mase, H., and J. Rischel (1971). A study of consonant quantity in West Greenlandic (No. 5). Annual Report of the Institute of Phonetics, University of Copenhagen 5, 175–247. Masica, C. P. (1991). The Indo-Aryan Languages. Cambridge: Cambridge University Press. Maskikit-Essed, R., and C. Gussenhoven (2016). No stress, no pitch accent, no prosodic focus: The case of Ambonese Malay. Phonology 33, 353–389. Mason-Apps, E., V. Stojanovik, and C. Houston-Price (2011). Early word segmentation in typically developing infants and infants with Down syndrome: A preliminary study. In Proceedings of the 17th International Congress of Phonetic Sciences, 1334–1337, Hong Kong. Massenet, J.-M. (1980). Notes sur l’intonation finale en inuktitut. Cahier de linguistique 10, 195–214. Massenet, J.-M. (1986). Étude phonologique d’un dialecte inuit canadien. Québec: Association Inuksiutiit Katimajiit. Massicotte-Laforge, S., and R. Shi (2015). The role of prosody in infants’ early syntactic analysis and grammatical categorization. Journal of the Acoustical Society of America 138, 441–446. Matarazzo, J. D., and A. N. Wiens (1967). 
Interviewer influence on durations of interviewee silence. Journal of Experimental Research in Personality 2(1), 56–69.

Mateo Toledo, B. E. (1999). La cuestión Akateko-Q’anjob’al, una comparación gramatical. Licentiate thesis, Universidad Mariano Gálvez. Mateo Toledo, B. E. (2008). The family of complex predicates in Q’anjob’al (Maya): Their syntax and meaning. PhD dissertation, University of Texas at Austin. Mateus, M. H. M., and E. Andrade (2000). The Phonology of Portuguese. Oxford: Oxford University Press. Matisoff, J. A. (1973). Tonogenesis in Southeast Asia. In L. M. Hyman (ed.), Consonant Types and Tone (Southern California Occasional Papers in Linguistics 3), 71–95. Los Angeles: University of Southern California. Matisoff, J. A. (1978). Mpi and Lolo-Burmese Microlinguistics. Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa. Matisoff, J. A. (1999). Tibeto-Burman tonology in areal context. In S. Kaji (ed.), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: Tonogenesis, Typology and Related Topics. Tokyo: Research Institute for Language and Cultures of Asia and Africa, Tokyo University of Foreign Studies. Matsuda, S., and J. Yamamoto (2013). Intervention for increasing the comprehension of affective prosody in children with autism spectrum disorders. Research in Autism Spectrum Disorders 7(8), 938–946. Matsumoto, D., and B. Willingham (2009). Spontaneous facial expressions of emotion of blind individuals. Journal of Personality and Social Psychology 96(1), 1–10. Matsuura, T. (2014). Nagasaki Hōgen kara Mita Go-onchō no Kōzō. Tokyo: Hituzi Syobo. Matthews, S., and V. Yip (1994). Cantonese: A Comprehensive Grammar. London: Routledge. Mattock, K., and D. K. Burnham (2006). Chinese and English infants’ tone perception: Evidence for perceptual reorganization. Infancy 10(3), 241–265. Mattock, K., M. Molnar, L. Polka, and D. K. Burnham (2008). The developmental course of lexical tone perception in the first year of life. Cognition 106(3), 1367–1381. Mattys, S. L. (2000).
The perception of primary and secondary stress in English. Perception and Psychophysics 62, 253–265. Mattys, S. L. (2004). Stress versus coarticulation: Toward an integrated approach to explicit speech segmentation. Journal of Experimental Psychology: Human Perception and Performance 30, 397–408. Mattys, S. L., and J. F. Melhorn (2005). How do syllables contribute to the perception of spoken English? Insight from the migration paradigm. Language and Speech 48(2), 223–252. Mattys, S. L., L. White, and J. F. Melhorn (2005). Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General 134, 477–500. Maturi, P. (1988). L’intonazione delle frasi dichiarative e interrogative nella varietà napoletana di italiano. Rivista italiana di acustica 12, 13–30. Maxwell, O. (2010). Marking of focus in Indian English of L1 Bengali speakers. In Proceedings of the 13th Australasian International Conference on Speech Science and Technology, 58–61, Melbourne. Maxwell, O. (2014). The intonational phonology of Indian English: An autosegmental-metrical analysis based on Bengali and Kannada English. PhD dissertation, University of Melbourne. Maxwell, O., and J. Fletcher (2014). Tonal alignment of focal pitch accents in two varieties of Indian English. In Proceedings of the 15th Australasian International Conference on Speech Science and Technology, 59–62, Christchurch. Mayo, C., M. Aylett, and D. R. Ladd (1997). Prosodic transcription of Glasgow English: An evaluation study of GlaToBI. In Proceedings of the European Speech Communication Association Workshop on Intonation, 231–234, Athens. Mazaudon, M., and A. Michaud (2008). Tonal contrasts and initial consonants: A case study of Tamang, a ‘missing link’ in tonogenesis. Phonetica 65, 231–256. Mazuka, R. (2007). The rhythm-based prosodic bootstrapping hypothesis of early language acquisition: Does it work for learning for all languages?
Journal of the Linguistic Society of Japan 132, 1–13. Mazuka, R. (2015). Learning to become a native listener of Japanese. In M. Nakayama (ed.), Handbook of Japanese Psycholinguistics, 19–47. Berlin: Mouton de Gruyter. McAuley, J. D., and E. K. Fromboluti (2014). Attentional entrainment and perceived event duration. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1658), 20130401. McCann, J., and S. Peppé (2003). Prosody in autism spectrum disorders: A critical review. International Journal of Language and Communication Disorders 38(4), 325–350.

McCarthy, J. J. (1979a). On stress and syllabification. Linguistic Inquiry 10, 443–465. McCarthy, J. J. (1979b). Formal problems in Semitic phonology and morphology. PhD dissertation, MIT. McCarthy, J. J. (1981). A prosodic theory of nonconcatenative morphology. Linguistic Inquiry 12, 373–418. McCarthy, J. J. (1982). Prosodic structure and expletive infixation. Language 58, 574–590. McCarthy, J. J. (1993). Template form in prosodic morphology. In L. Smith Stvan (ed.), Papers from the Third Annual Formal Linguistics Society of Midamerica Conference, 187–218. Bloomington: Indiana University Linguistics Club Publications. McCarthy, J. (2012). Pausal phonology and morpheme realization. In T. Borowsky, S. Kawahara, T. Shinya, and M. Sugahara (eds.), Prosody Matters: Essays in Honor of Elisabeth Selkirk, 341–373. London: Equinox. McCarthy, J. J., and A. Prince (1986/1996). Prosodic Morphology. New Brunswick: Rutgers University Center for Cognitive Science. McCarthy, J. J., and A. Prince (1988). Quantitative transfer in reduplicative and templatic morphology. In Linguistic Society of Korea (ed.), Linguistics in the Morning Calm, 3–35. Seoul: Hanshin. McCarthy, J. J., and A. Prince (1990a). Foot and word in prosodic morphology: The Arabic broken plural. Natural Language and Linguistic Theory 8, 209–283. McCarthy, J. J., and A. Prince (1990b). Prosodic morphology and templatic morphology. In M. Eid and J. J. McCarthy (eds.), Perspectives on Arabic linguistics II: Papers from the Second Annual Symposium on Arabic Linguistics, 1–54. Amsterdam: John Benjamins. McCarthy, J. J., and A. Prince (1993a). Generalized alignment. In G. Booij and J. van Marle (eds.), Yearbook of Morphology, 79–153. Dordrecht: Kluwer. McCarthy, J. J., and A. Prince (1993b). Prosodic Morphology: Constraint Interaction and Satisfaction. New Brunswick, NJ: Rutgers University Center for Cognitive Science. McCarthy, J. J., and A. Prince (1994a).
The emergence of the unmarked: Optimality in prosodic morphology. In M. Gonzàlez (ed.), Proceedings of the North East Linguistic Society 24, 333–379. Amherst: GLSA. McCarthy, J. J., and A. Prince (1994b). Two lectures on prosodic morphology (Utrecht, 1994). Part I: Template form in prosodic morphology. Part II: Faithfulness and reduplicative identity. Ms., University of Massachusetts Amherst and Rutgers University. McCarthy, J. J., and A. Prince (1999). Faithfulness and identity in prosodic morphology. In R. Kager, H. van der Hulst, and W. Zonneveld (eds.), The Prosody-Morphology Interface, 218–309. Cambridge: Cambridge University Press. McCarthy, J. J., W. Kimper, and K. Mullin (2012). Reduplication in harmonic serialism. Morphology 22, 173–232. McCawley, J. D. (1968). The Phonological Component of a Grammar of Japanese. The Hague: Mouton. McCawley, J. D. (1978). What is a tone language? In V. A. Fromkin (ed.), Tone: A Linguistic Survey, 113–131. New York: Academic Press. McClelland, C. W. (2000). The Interrelations of Syntax, Narrative Structure and Prosody in a Berber Language. Lampeter: Edwin Mellen Press. McClelland, J. L., and J. L. Elman (1986). The TRACE model of speech perception. Cognitive Psychology 18, 1–86. McDonough, J. (1999). Tone in Navajo. Anthropological Linguistics 41, 503–540. McDonough, J. (2002). The prosody of interrogative and focus constructions in Navajo. In A. Carnie, H. Harley, and M. Willie (eds.), Formal Approaches to Functional Phenomena: In Honor of Eloise Jelinek, 191–206. Amsterdam: John Benjamins. McFarland, T. A. (2009). The phonology and morphology of Filomeno Mata Totonac. PhD dissertation, University of California, Berkeley. McGory, J. (1997). The acquisition of intonation patterns in English by native speakers of Korean and Mandarin. PhD dissertation, Ohio State University. McGregor, J., and S. Palethorpe (2008). High rising tunes in Australian English: The communicative function of L* and H* pitch accent onsets. 
Australian Journal of Linguistics 28(2), 171–193.

McGurk, H., and J. MacDonald (1976). Hearing lips and seeing voices. Nature 264, 746–748. McIntosh, J. D. (2015). Aspects of phonology and morphology of Teotepec Eastern Chatino. PhD dissertation, University of Texas at Austin. McKaughan, H. (ed.) (1973). The Languages of the Eastern Family of the East New Guinea Highland Stock. Seattle: University of Washington Press. McKay, G. (1975). Rembarŋa: A language of central Arnhem Land. PhD dissertation, Australian National University. (Published 2011, Munich: Lincom Europa.) McKay, G. (2000). Ndjébbana. In R. M. W. Dixon and B. Blake (eds.), Handbook of Australian Languages, vol. 5, 155–354. Oxford: Oxford University Press. McKendry, I. (2013). Tonal association, prominence, and prosodic structure in South-Eastern Nochixtlán Mixtec. PhD dissertation, University of Edinburgh. McKeown, G., M. F. Valstar, R. Cowie, and M. Pantic (2010). The SEMAINE Corpus of Emotionally Coloured Character Interactions. In Proceedings of the IEEE International Conference on Multimedia and Expo, 1079–1084, Singapore. McLemore, C. (1991). The pragmatic interpretation of English intonation: Sorority speech. PhD dissertation, University of Texas. McLeod, S., and L. Harrison (2009). Epidemiology of speech and language impairment in a nationally representative sample of 4- to 5-year-old children. Journal of Speech, Language, and Hearing Research 52(5), 1213–1229. McMurray, B. (2007). Defusing the childhood vocabulary explosion. Science 317, 631. McMurray, B., K. A. Kovack-Lesh, D. Goodwin, and W. McEchron (2013). Infant directed speech and the development of speech perception: Enhancing development or an unintended consequence? Cognition 129(2), 362–378. McNally, L. (2013). Semantics and pragmatics. Wiley Interdisciplinary Reviews: Cognitive Science 4, 285–297. McNeill, D. (1992). Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press. McNerney, M., and D. Mendelsohn (1992).
Suprasegmentals in the pronunciation class: Setting priorities. In P. Avery and S. Ehrlich (eds.), Teaching American English Pronunciation, 185–196. Oxford: Oxford University Press. McPherson, L. (2014). Replacive grammatical tone in the Dogon languages. PhD dissertation, University of California, Los Angeles. McPherson, L. (2017). Tone features revisited: Evidence from Seenku (Mande, Burkina Faso). In D. Payne, S. Pacchiarotti, and M. Bosire (eds.), Diversity in African Languages: Selected Proceedings of the 46th Annual Conference on African Linguistics, 5–21. Berlin: Language Science Press. McPherson, L., and K. M. Ryan (2018). Tone-tune association in Tommo So (Dogon) folk songs. Language 94, 119–156. McQueen, J. M., A. Cutler, and D. Norris (2006). Phonological abstraction in the mental lexicon. Cognitive Science 30, 1113–1126. McQuown, N. (1956). The classification of the Mayan languages. International Journal of American Linguistics 22(3), 191–195. McSweeny, J. L., and L. D. Shriberg (2001). Clinical research with the Prosody-Voice Screening Profile. Clinical Linguistics and Phonetics 15(7), 505–528. McWhorter, J. (2018). The Creole Debate. Cambridge: Cambridge University Press. Meena, R., G. Skantze, and J. Gustafson (2014). Data-driven models for timing feedback responses in a map task dialogue system. Computer Speech and Language 28(4), 903–922. Mehler, J., J. Y. Dommergues, U. H. Frauenfelder, and J. Segui (1981). The syllable’s role in speech segmentation. Journal of Verbal Learning and Verbal Behavior 20(3), 298–305. Mehler, J., E. Dupoux, T. Nazzi, and G. Dehaene-Lambertz (1996). Coping with linguistic diversity: The infant’s viewpoint. In J. L. Morgan and K. Demuth (eds.), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, 101–116. Mahwah: Erlbaum. Mehler, J., P. W. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison (1988). A precursor of language acquisition in young infants. Cognition 29, 143–178.

Mehrabian, A. (1972). Nonverbal Communication. New York: Routledge. Mehrabian, A., and S. R. Ferris (1967). Inference of attitudes from nonverbal communication in two channels. Journal of Consulting Psychology 31(3), 248–252. Mehrabian, A., and M. Wiener (1967). Decoding of inconsistent communications. Journal of Personality and Social Psychology 6(1), 109–114. Meir, I., and W. Sandler (2008). A Language in Space: The Story of Israeli Sign Language. New York: Taylor and Francis. Meira, S. (1998). Rhythmic stress in Tiriyó (Cariban). International Journal of American Linguistics 64(4), 352–378. Meléndez, M. (1998). La lengua achagua: Estudio gramatical—Colección Lenguas aborígenes de Colombia (Descripciones 11). Bogotá: University of the Andes. Menn, L., and S. Boyce (1982). Fundamental frequency and discourse structure. Language and Speech 25, 341–383. Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32, 543–563. Mennen, I. (2015). Beyond segments: Towards an L2 intonation learning theory. In E. Delais-Roussarie, M. Avanzi, and S. Herment (eds.), Prosody and Language in Contact: L2 Acquisition, Attrition and Languages in Multilingual Situations, 171–188. Berlin: Springer. Mennen, I., and E. de Leeuw (2014). Beyond segments: Prosody in SLA. Studies in Second Language Acquisition 36(2), 183–194. Mennen, I., F. Schaeffler, and G. J. Docherty (2012). Cross-language differences in fundamental frequency range: A comparison of English and German. Journal of the Acoustical Society of America 131(3), 2249–2260. Meringer, R. (1908). Aus dem Leben der Sprache. Berlin: Behr. Meringer, R., and K. Mayer (1895). Versprechen und Verlesen, eine psychologisch-linguistische Studie. Vienna: John Benjamins. Merkin, R., V. Taras, and P. Steel (2014). State-of-the-art themes in cross-cultural communication research: A meta-analytic review. International Journal of Intercultural Relations 38, 1–23.
Mermelstein, P. (1975). Automatic segmentation of speech into syllabic units. Journal of the Acoustical Society of America 58, 880–883. Merrifield, W. R. (1963). Palantla Chinantec syllable types. Anthropological Linguistics 5(5), 1–16. Mertens, P. (2004). The prosogram: Semi-automatic transcription of prosody based on a tonal perception model. In Proceedings of Speech Prosody 2, 549–552, Nara. Mervis, C. B., and C. A. Mervis (1982). Leopards are kitty-cats: Object labeling by mothers for their thirteen-month-olds. Child Development, 267–273. Mesgarani, N., C. Cheung, K. Johnson, and E. F. Chang (2014). Phonetic feature encoding in human superior temporal gyrus. Science 343(6174), 1006–1010. Mester, A. (1990). Patterns of truncation. Linguistic Inquiry 21, 475–485. Metcalf, A. A. (1979). Chicano English. Arlington: Center for Applied Linguistics. Metzger, R. G. (1981). Gramática popular del Carapana. Bogotá: Instituto Lingüístico del Verano. Meyer-Eppler, W. (1957). Realization of prosodic features in whispered speech. Journal of the Acoustical Society of America 29, 104–106. Meyers, C. (2013). Mirroring project update: Intelligible accented speakers as pronunciation models. TESOL Video News, August. Retrieved 8 June 2020 from http://newsmanager.commpartners.com/tesolvdmis/issues/2013-07-27/6.html. Mhac an Fhailigh, É. (1968). The Irish of Erris, Co. Mayo: A phonemic study. Dublin: Dublin Institute for Advanced Studies. Michael, L. (2003). Between grammar and poetry: The structure of Nanti Karintaa chants. Texas Linguistic Forum 47, 251–262. Michael, L. (2008). Nanti Evidential Practice: Language, Knowledge, and Social Action in an Amazonian Society. Austin: University of Texas at Austin. Michael, L. (2011). The interaction of tone and stress in the prosodic system of Iquito (Zaparoan). Amerindia 35, 53–74.

Michael, L., S. Farmer, G. Finley, C. Beier, and K. S. Acosta (2013). A sketch of Muniche segmental and prosodic phonology. International Journal of American Linguistics 79(3), 307–347. Michael, L., T. Stark, E. Clem, and W. Chang (2015). South American Phonological Inventory Database: v1.1.3. University of California. Retrieved 7 October 2017 from http://linguistics.berkeley.edu/~saphon/en. Michalsky, J. (2016). Perception of pitch scaling in rising intonation: On the relevance of f0 median and speaking rate in German. In Proceedings of the 12th German Phonetics and Phonology Meeting, 115–119, Munich. Michaud, A. (2004). Final consonants and glottalization: New perspectives from Hanoi Vietnamese. Phonetica 61, 119–146. Michaud, A. (2005). Prosodie de langues à tons (naxi et vietnamien): Prosodie de l’anglais—Éclairages croisés. PhD dissertation, New Sorbonne University. Michaud, A. (2017). Tone in Yongning Na: Lexical Tones and Morphotonology (Studies in Diversity Linguistics 13). Berlin: Language Science Press. Michaud, A., and M. Brunelle (2016). Information structure in Asia: Yongning Na (Sino-Tibetan) and Vietnamese (Austroasiatic). In C. Féry and S. Ishihara (eds.), The Oxford Handbook of Information Structure, 774–789. Oxford: Oxford University Press. Michaud, A., and X. He (2007). Reassociated tones and coalescent syllables in Naxi (Tibeto-Burman). Journal of the International Phonetic Association 37, 237–255. Michaud, A., M. C. Nguyễn, and J. Vaissière (2015). Phonetic insights into a simple level-tone system: ‘Careful’ vs. ‘impatient’ realizations of Naxi High, Mid and Low tones. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Michelas, A. (2011). Phonetic and phonological characterization of the intermediate phrase in French. PhD dissertation, University of Provence. Michelas, A., and M. D’Imperio (2012a). Specific contribution of tonal and duration cues to the syntactic parsing of French.
In Proceedings of Speech Prosody 6, 147–150, Shanghai. Michelas, A., and M. D’Imperio (2012b). When syntax meets prosody: Tonal and duration variability in French accentual phrases. Journal of Phonetics 40(6), 816–829. Michelas, A., and M. D’Imperio (2015). Prosodic boundary strength guides syntactic parsing of French utterances. Laboratory Phonology 6(1), 119–146. Michelson, K. (1988). A Comparative Study of Lake-Iroquoian Accent. Dordrecht: Kluwer. Micheyl, C., K. Delhommeau, X. Perrot, and A. J. Oxenham (2006). Influence of musical and psychoacoustical training on pitch discrimination. Hearing Research 219, 36–47. Mikuteit, S., and H. Reetz (2007). Caught in the ACT: The timing of aspiration and voicing in East Bengali. Language and Speech 50(2), 247–277. Miller, J. L. (1981). Phonetic perception: Evidence for context-dependent and context-independent processing. Journal of the Acoustical Society of America 69, 822–831. Miller, J. L., and A. M. Liberman (1979). Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception and Psychophysics 25, 457–465. Miller, M. (1999). Desano Grammar (Studies in the Indigenous Languages of Colombia 6). Arlington: Summer Institute of Linguistics/University of Texas at Arlington. Miller, W. (1996). Guarijío: Gramática, textos, y vocabulario. Mexico City: National Autonomous University of Mexico. Miller-Ockhuizen, A. L. (2001). Grounding Ju|ˈHoansi root phonotactics: The phonetics of the guttural OCP and other acoustic modulations. PhD dissertation, The Ohio State University. Millotte, M. J., S. Margules, S. Bernal, M. Dutat, and A. Christophe (2010). Phrasal prosody constrains word segmentation in French 16-month-olds. Journal of Portuguese Linguistics, 10, 67–86. Millotte, S., A. René, R. Wales, and A. Christophe (2008). Phonological phrase boundaries constrain the online syntactic analysis of spoken sentences.
Journal of Experimental Psychology: Learning, Memory, and Cognition 34(4), 874–885. Millotte, S., R. Wales, and A. Christophe (2007). Phrasal prosody disambiguates syntax. Language and Cognitive Processes 22(6), 898–909. Mills, C. K. (1912). The cerebral mechanism of emotional expression. Transactions of the College of Physicians of Philadelphia 34, 381–390.

Milosky, L. M., and C. Wrobleski (1994). The prosody of irony. Paper presented at the International Society for Humor Studies Conference, Ithaca. Miracle, W. C. (1989). Tone production of American students of Chinese: A preliminary acoustic study. Journal of Chinese Language Teachers Association 24, 49–65. Miševa, A. (1991). Intonacionna sistema na bălgarskija ezik. Sofija: Bălgarska Akademija na Naukite. Miševa, A., and M. Nikov (1998). Intonation in Bulgarian. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of 20 Languages, 275–287. Cambridge: Cambridge University Press. Mishra, T., and D. Dimitriadis (2013). Incremental emotion recognition. In INTERSPEECH 2013, 2876–2880, Lyon. Missaglia, F. (1999). Contrastive prosody in SLA: An empirical study with Italian learners of German. In Proceedings of the 14th International Congress of Phonetic Sciences, 551–554, San Francisco. Mitchell, T. F. (1960). Prominence and syllabification in Arabic. Bulletin of the School of Oriental and African Studies 23, 369–389. (Reprinted 1975, T. F. Mitchell (ed.), Principles of Firthian Linguistics, 75–98. London: Longman.) Mithun, M. (1999). The Languages of Native North America. New York: Cambridge University Press. Mitterer, H., Y. Chen, and X. Zhou (2011). Phonological abstraction in processing lexical-tone variation: Evidence from a learning paradigm. Cognitive Science 35, 184–197. Mitterer, H., T. Cho, and S. Kim (2016). How does prosody influence speech categorization? Journal of Phonetics 54, 68–79. Mitterer, H., and J. M. McQueen (2009). Foreign subtitles help but native-language subtitles harm foreign speech perception. PLoS ONE 4, e7785. Mitterer, H., E. Reinisch, and J. M. McQueen (2018). Allophones, not phonemes in spoken-word recognition. Journal of Memory and Language 98, 77–92. Mixdorff, H., and O. Niebuhr (2013). The influence of f0 contour continuity on prominence perception. In INTERSPEECH 2013, 230–234, Lyon. Mixdorff, H., O. Niebuhr, and A.
Hönemann (2018). Model-based prosodic analysis of charismatic speech. In Proceedings of Speech Prosody 9, 814–818, Poznań. Mixdorff, H., M. Vainio, S. Werner, and J. Järvikivi (2002). The manifestation of linguistic information in prosodic features of Finnish. In Proceedings of Speech Prosody 1, 511–514, Aix-en-Provence. Miyahara, A. (2000). Toward theorizing Japanese communication competence from a non-Western perspective. American Communication Journal 3(3). (Reprinted in F. E. Jandt (ed.) (2004), Intercultural Communication: A Global Reader, 279–291. Thousand Oaks: Sage.) Miyaoka, O. (1971). On syllable modification and quantity in Yupik phonology. International Journal of American Linguistics 37, 219–226. Miyaoka, O. (1985). Accentuation in Central Alaskan Yupik. In M. E. Krauss (ed.), Yupik Eskimo Prosodic Systems: Descriptive and Comparative Studies, 51–76. Fairbanks: Alaska Native Language Center, University of Alaska. Miyaoka, O. (2002). ‘Go’ to wa Nani ka: Esukimô-go kara Nihon’go o Miru. Tokyo: Sanseido. Miyaoka, O. (2012). A Grammar of Central Alaskan Yupik: An Eskimoan Language (Mouton Grammar Library 58). Berlin: De Gruyter Mouton. Miyaoka, O. (2015). ‘Go’ to wa Nani ka: Nihon’go Bumpô to Moji no Kansei. Tokyo: Sanseido. Mo, Y., J. Cole, and E.-K. Lee (2008). Naïve listeners’ prominence and boundary perception. In Proceedings of Speech Prosody 4, 735–738, Campinas. Mo, Y. (2010). Prosody production and perception with conversational speech. PhD dissertation, University of Illinois at Urbana-Champaign. Mó Isém, R. (2007). Rikemiik li tujaal tziij: Gramática Sakapulteka. Antigua, Guatemala: Oxlajuuj Keej Mayaˈ Ajtzˈiibˈ. Mock, C. (1988). Pitch accent and stress in Isthmus Zapotec. In H. van der Hulst and N. Smith (eds.), Autosegmental Studies on Pitch Accent, 197–223. Dordrecht: Foris. Moen, I. (1991). Functional lateralization of pitch accents and intonation in Norwegian: Monrad-Krohn’s study of an aphasic patient with altered ‘melody of speech’.
Brain and Language 41, 538–554. Mohanan, K. P., and T. Mohanan (1984). Lexical phonology of the consonant system in Malayalam. Linguistic Inquiry 15(4), 575–602.

Mohd Don, Z., J. Yong, and G. Knowles (2008). How words can be misleading: A study of syllable timing and ‘stress’ in Malay. Linguistics Journal 3, 66–81. Moisik, S. R. (2008). A three-dimensional model of the larynx and the laryngeal constrictor mechanism. MA thesis, University of Victoria. Moisik, S. R., and J. H. Esling (2007). 3D auditory-articulatory modeling of the laryngeal constrictor mechanism. In Proceedings of the 16th International Congress of Phonetic Sciences, 373–378, Saarbrücken. Moisik, S. R., H. Lin, and J. H. Esling (2014). A study of laryngeal gestures in Mandarin citation tones using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS). Journal of the International Phonetic Association 44, 21–58. Mol, H., and E. M. Uhlenbeck (1956). The linguistic relevance of intensity in stress. Lingua 5, 205–213. Mol, H., and E. M. Uhlenbeck (1957). The correlation between interpretation and production of speech sounds. Lingua 6, 333–353. Molholt, G., and F. Hwu (2008). Visualization of speech patterns for language learning. In V. M. Holland and F. P. Fisher (eds.), The Path of Speech Technologies in Computer Assisted Language Learning, 91–122. New York: Routledge. Molineaux, B. (2016a). Rhythmic vs. Demarcational Stress in Mapudungun. Paper presented at the 24th Manchester Phonology Meeting, Manchester. Molineaux, B. (2016b). Morphological and phonological patterns in Mapudungun stress assignment. Paper presented at the Linguistics and English Language Phonetics and Phonology Workshop (P-Workshop). Retrieved 21 May 2020 from http://www.homepages.ed.ac.uk/bmolinea/talk/Pwkshp%20Mapu. Molnar, M., J. Gervain, and M. Carreiras (2013). Within-rhythm class native language discrimination abilities of Basque-Spanish monolingual and bilingual infants at 3.5 months of age. Infancy 19(3), 326–337. Molnar, M., M. Lallier, and M. Carreiras (2014).
The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning 64(Suppl. 2), 45–64. Monaghan, P., L. White, and M. M. Merkx (2013). Disambiguating durational cues for speech segmentation. Journal of the Acoustical Society of America, 134(1), EL45–EL51. Monetta, L., C. Grindrod, and M. D. Pell (2009). Irony comprehension and theory of mind deficits in patients with Parkinson’s disease. Cortex 45(8), 972–981. Moñino, Y., and P. Roulon (1972). Phonologie Du Gbaya Kara ’Bodoe de Ndongue Bongowen (Région de Bouar, République Centrafricaine). Paris: Société d’Études Linguistiques et Anthropologiques de France. Monrad-Krohn, G. H. (1947). Dysprosody or altered ‘melody of speech’. Brain 70, 405–415. Monsen, R. B. (1974). Durational aspects of vowel production in speech of deaf children. Journal of Speech and Hearing Research 17(September), 386–398. Monsen, R. B., A. M. Engebretson, and N. R. Vemula (1979). Some effects of deafness on the generation of voice. Journal of the Acoustical Society of America 66(6), 1680–1690. Montes Rodríguez, M. E. (2004). Morfosintaxis de la lengua Tikuna (Amazonía colombiana) (CESOCCELA Descripciones 15). Bogotá: University of the Andes. Moon, C., R. P. Cooper, and W. P. Fifer (1993). Two-day-olds prefer their native language. Infant Behavior and Development 16(4), 495–500. Moon, C. M., and W. P. Fifer (2000). Evidence of transnatal auditory learning. Journal of Perinatology 20(S8), S37. Moon, R. L. (2002). A comparison of the acoustic correlates of focus in Indian English and American English. MA thesis, University of Florida. Moore, B. C. J. (2013). An Introduction to the Psychology of Hearing. Leiden: Brill. Moore, B. C. J., and B. R. Glasberg (1986). The role of frequency selectivity in the perception of loudness, pitch and time. In B. C. J. Moore (ed.), Frequency Selectivity in Hearing, 251–308. London: Academic Press.

Moore, R. R. (1965). A study of Hindi intonation. PhD dissertation, University of Michigan. Mooshammer, C. (2010). Acoustic and laryngographic measures of the laryngeal reflexes of linguistic prominence and vocal effort in German. Journal of the Acoustical Society of America 127, 1047–1058. Moraes, J. (2008). The pitch accents in Brazilian Portuguese: Analysis by synthesis. In Proceedings of Speech Prosody 4, 389–397, Campinas. Morén, B., and E. Zsiga (2006). The lexical and postlexical phonology of Thai tones. Natural Language and Linguistic Theory 24, 113–178. Morén-Duolljá, B. (2013). The prosody of Swedish underived nouns: No lexical tones required. Nordlyd 40, 196–248. Morey, S. D. (2010). The realisation of tones in traditional Tai Phake songs. North East Indian Linguistics 2, 54–69. Morgan, J. (1986). From Simple Input to Complex Grammar. Cambridge, MA: MIT Press. Morgan, J. (1996). A rhythmic bias in preverbal speech segmentation. Journal of Memory and Language 35(5), 666–688. Morgan, J., and K. Demuth (1996). Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Mahwah: Erlbaum. Morgan, J., and J. R. Saffran (1995). Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Development 66, 911–936. Morgan, J., R. Shi, and P. Allopenna (1996). Perceptual bases of rudimentary grammatical categories: Toward a broader conceptualization of bootstrapping. In J. L. Morgan and K. Demuth (eds.), Signal to Syntax, 263–283. Hillsdale, NJ: Erlbaum. Morin, Y.-C., and J. D. Kaye (1982). The syntactic bases for French liaison. Journal of Linguistics 18(2), 291–330. Moritz, N. (2016). Uptalk variation in three varieties of Northern Irish English. In Proceedings of Speech Prosody 8, 119–122, Boston. Morphy, F. (1983). Djapu, a Yolngu dialect. In R. M. W. Dixon and B. Blake (eds.), Handbook of Australian Languages, vol. 3, 1–188. Canberra: Australian National University Press. Morrill, T.
(2012). Acoustic correlates of stress in English adjective-noun compounds. Language and Speech 55, 167–201. Morrill, T., M. M. Baese-Berk, C. Heffner, and L. C. Dilley (2015). Interactions between distal speech rate, linguistic knowledge, and speech environment. Psychonomic Bulletin and Review 22, 1451–1457. Morrill, T., L. C. Dilley, and J. D. McAuley (2014a). Prosodic patterning in distal speech context: Effects of list intonation and f0 downtrend on perception of proximal prosodic structure. Journal of Phonetics 46, 68–85. Morrill, T., L. C. Dilley, J. D. McAuley, and M. A. Pitt (2014b). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition 131(1), 69–74. Mortensen, D. (2004). The development of tone sandhi in Western Hmongic: A new hypothesis. Ms., University of California, Berkeley. Morton, J., S. Marcus, and C. Frankish (1976). Perceptual centers (P-centers). Psychological Review 83(5), 405–408. Moshinsky, J. (1974). A Grammar of Southeastern Pomo. Berkeley: University of California Press. Mosonyi, E. E. (2000). Elementos de Piaroa. In G. de Pérez, M. Stella, R. de Montes, and M. Luisa (eds.), Lenguas indígenas de Colombia: Una visión descriptiva. Santafé de Bogotá: Instituto Caro y Cuervo. Mosonyi, J. C., E. E. Mosonyi, and A. Largo (2000). Yavitero. In J. C. Mosonyi and E. E. Mosonyi (eds.), Manual de Lenguas Indígenas de Venezuela, vol. 2, 594–664. Caracas: Fundación Bigott. Most, T., and M. Peled (2007). Perception of suprasegmental features of speech by children with cochlear implants and children with hearing aids. Journal of Deaf Studies and Deaf Education 12, 350–361. Mous, M. (1993). A Grammar of Iraqw (Cushitic Language Studies 9). Hamburg: Buske.

Mous, M. (2009). The typology of tone in Cushitic. Paper presented at the 6th World Congress of African Linguistics, Cologne. Mous, M. (2012). Cushitic. In Z. Frajzyngier and E. Shay (eds.), The Afroasiatic Languages, 342–422. Cambridge: Cambridge University Press. Mücke, D., and M. Grice (2014). The effect of focus marking on supralaryngeal articulation: Is it mediated by accentuation? Journal of Phonetics 44, 47–61. Mücke, D., M. Grice, J. Becker, and A. Hermes (2009). Sources of variation in tonal alignment: Evidence from acoustic and kinematic data. Journal of Phonetics 37(3), 321–338. Mücke, D., M. Grice, and T. Cho (2014). More than a magic moment: Paving the way for dynamics of articulation and prosodic structure. Journal of Phonetics 44, 1–7. Mücke, D., A. Hermes, and T. Cho (2017). Mechanisms of regulation in speech: Linguistic structure and physical control system. Journal of Phonetics 64, 1–7. Mücke, D., N. Nam, A. Hermes, and J. L. Goldstein (2012). Coupling of tone and constriction gestures in pitch accents. In P. Hoole, M. Pouplier, L. Bombien, C. Mooshammer, and B. Kühnert (eds.), Consonant Clusters and Structural Complexity, 205–230. Berlin: Mouton de Gruyter. Muetze, B., and C. Ahland (in press). Mursi. In B. Wakjira, R. Meyer, and Z. Leyew (eds.), The Oxford Handbook of Ethiopian Languages. Oxford: Oxford University Press. Mugele, R. L. (1982). Tone and ballistic syllable in Lalana Chinantec. PhD dissertation, University of Texas at Austin. Müller, A., B. Höhle, M. Schmitz, and J. Weissenborn (2006). Focus-to-stress alignment in 4- to 5-year-old German-learning children. In A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, and I. Ferrari (eds.), Proceedings of GALA (2005), 393–407. Cambridge: Cambridge Scholars Publishing. Mundhenk, A. T., and H. Goschnick (1977). Haroi phonemes. In D. D. Thomas, E. W. Lee, and N. Đ. Liêm (eds.), Papers in Southeast Asian Linguistics 4, 1–15. Canberra: Australian National University. Munro, M. J.
(1995). Non-segmental factors in foreign accent. Studies in Second Language Acquisition 17, 17–34. Munro, M. J., and T. M. Derwing (1995). Foreign accent, comprehensibility and intelligibility in the speech of second language learners. Language Learning 45, 73–97. Munro, M. J., and T. M. Derwing (2011). The foundations of accent and intelligibility in pronunciation research. Language Teaching 44(3), 316–327. Munro, P., and C. Ulrich (1984). Structure-preservation and Western Muskogean rhythmic lengthening. In Proceedings of the 3rd West Coast Conference on Formal Linguistics, 191–202. Stanford: Stanford Linguistics Association. Munro, P., and C. Willmond (1994). Chickasaw: An Analytical Dictionary. Norman: University of Oklahoma Press. Murphy, J. (2004). Attending to word-stress while learning new vocabulary. English for Specific Purposes 23(1), 67–83. Murphy, J. (2014). Intelligible, comprehensible, non-native models in ESL/EFL pronunciation teaching. System 42, 258–269. Mushin, I. (2005). Word order pragmatics and narrative functions in Garrwa. Australian Journal of Linguistics 25(2), 253–273. Myers, S. (1996). Boundary tones and the phonetic implementation of tone in Chichewa. Studies in African Linguistics 25, 29–60. Myers, S. (1998). Surface underspecification of tone in Chichewa. Phonology 15, 367–391. Myers, S. (2003). F0 timing in Kinyarwanda. Phonetica 60, 71–97. Myrberg, S. (2010). The Intonational Phonology of Stockholm Swedish (Stockholm Studies in Scandinavian Philology 53). Stockholm: Stockholm University. Myrberg, S., and T. Riad (2015). The prosodic hierarchy of Swedish. Nordic Journal of Linguistics 38, 115–147. Myrberg, S., and T. Riad (2016). On the expression of focus in the metrical grid and in the prosodic hierarchy. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 441–462. Oxford: Oxford University Press.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Näätänen, R., P. Paavilainen, T. Rinne, and K. Alho (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology 118(12), 2544–2590. Nábelek, I., and I. J. Hirsh (1969). On the discrimination of frequency transitions. Journal of the Acoustical Society of America 45, 1510–1519. Nadig, A., and H. Shaw (2012). Acoustic and perceptual measurement of expressive prosody in high-functioning autism: Increased pitch range and what it means to listeners. Journal of Autism and Developmental Disorders 42(4), 499–511. Naeser, M. A., and N. Helm-Estabrooks (1985). CT scan lesion localization and response to melodic intonation therapy with nonfluent aphasia cases. Cortex 21, 203–223. Nagano-Madsen, Y. (1988). Phonetic reality of the mora in Eskimo. Working Papers in General Linguistics and Phonetics [Lund University] 34, 79–82. Nagano-Madsen, Y. (1992). Mora and Prosodic Coordination: A Phonetic Study of Japanese, Eskimo and Yoruba. Lund: Lund University Press. Nagano-Madsen, Y. (1993). Phrase-final intonation in West Greenlandic Eskimo. Working Papers in General Linguistics and Phonetics [Lund University] 40, 145–155. Nagano-Madsen, Y. (2014). Acquisition of L2 Japanese intonation: Data from Swedish learners (in Japanese). Japanese Speech Communication 2, 1–27. Nagano-Madsen, Y., and A.-C. Bredvad-Jensen (1995). An analysis of intonational phrasing in West Greenlandic Eskimo reading text. Working Papers in General Linguistics and Phonetics [Lund University] 44, 129–144. Nakagawa, H. (2006). Aspects of the phonetic and phonological structure of the Gǀui language. PhD dissertation, University of the Witwatersrand. Nakagawa, H. (2010). Phonotactics of disyllabic lexical morphemes in Gǀui. Working Papers in Corpus-Based Linguistics and Language Education 5, 23–31. Nakata, T., S. E. Trehub, and Y. Kanda (2012). Effect of cochlear implants on children’s perception and production of speech prosody. 
Journal of the Acoustical Society of America 131(2), 1307–1314. Nakatani, C. H., and J. Hirschberg (1994). A corpus-based study of repair cues in spontaneous speech. Journal of the Acoustical Society of America 95(3), 1603–1616. Nakatani, L. H., K. D. O’Connor, and C. H. Aston (1981). Prosodic aspects of American English speech rhythm. Phonetica 38(1–3), 84–105. Nakatani, L. H., and J. A. Schaffer (1978). Hearing ‘words’ without words: Prosodic cues for word perception. Journal of the Acoustical Society of America 63, 234–245. Namy, L. L., L. C. Nygaard, and D. Sauerteig (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language and Social Psychology 21(4), 422–432. Nance, C. (2013). Phonetic variation, sound change, and identity in Scottish Gaelic. PhD dissertation, University of Glasgow. Nance, C. (2015). Intonational variation and change in Scottish Gaelic. Lingua 160, 1–19. Naoi, N., Y. Minagawa-Kawai, A. Kobayashi, K. Takeuchi, K. Nakamura, J. Yamamoto, and S. Kojima (2012). Cerebral responses to infant-directed speech and the effect of talker familiarity. NeuroImage 59, 1735–1744. Narayan, C. R., and L. C. McDermott (2016). Speech rate and pitch characteristics of infant-directed speech: Longitudinal and cross-linguistic observations. Journal of the Acoustical Society of America 139(3), 1272–1281. Naselaris, T., and K. N. Kay (2015). Resolving ambiguities of MVPA using explicit models of representation. Trends in Cognitive Sciences 19(10), 551–554. Nash, D. (1986). Topics in Warlpiri Grammar. New York: Garland. (Revision of author’s 1980 PhD dissertation, MIT.) Nash, D. G. (1979). Yidiny stress: A metrical account. In Proceedings of the 9th Annual Meeting of the North East Linguistics Society, 112–130. New York: Routledge. Nash, J. A. (1992–1994). Underlying low tones in Ruwund. Studies in African Linguistics 23, 223–278. Nash, R., and A. Mulac (1980). The intonation of verifiability. In L. R. Waugh and C. H. 
van Schooneveld (eds.), The Melody of Language: Intonation and Prosody, 219–241. Baltimore: University Park Press.

Nassenstein, N. (2016). A preliminary description of Ugandan English. World Englishes 35(3), 396–420. Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality and Social Psychology 32(5), 790–804. National Institute of Mental Health (2018). Autism spectrum disorder. Retrieved 6 November 2018 from https://www.nimh.nih.gov/health/topics/autism-spectrum-disorders-asd/index.shtml. National Institutes of Health (2016). Quick statistics about hearing. Retrieved from https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing. Naumann, C. (2008). High and low tone in Taa ǂaan (ǃXóõ). In S. Ermisch (ed.), Khoisan Languages and Linguistics: Proceedings of the 2nd International Symposium, 279–302, Riezlern/Kleinwalsertal. Nazzi, T., J. Bertoncini, and J. Mehler (1998a). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance 24(3), 756–766. Nazzi, T., C. Floccia, and J. Bertoncini (1998b). Discrimination of pitch contours by neonates. Infant Behavior and Development 21(4), 779–784. Nazzi, T., G. Iakimova, J. Bertoncini, S. Frédonie, and C. Alcantara (2006). Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences. Journal of Memory and Language 54(3), 283–299. Nazzi, T., P. W. Jusczyk, and E. K. Johnson (2000a). Language discrimination by English-learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language 43(1), 1–19. Nazzi, T., D. G. Kemler Nelson, P. W. Jusczyk, and A.-M. Jusczyk (2000b). Six-month-olds’ detection of clauses embedded in continuous speech: Effects of prosodic well-formedness. Infancy 1, 123–147. Nazzi, T., K. Mersad, M. Sundara, and G. Iakimova (2014). Early word segmentation in infants acquiring Parisian French: Task-dependent and dialect-specific aspects. 
Journal of Child Language 41(3), 600–633. Nazzi, T., and F. Ramus (2003). Perception and acquisition of linguistic rhythm by infants. Speech Communication 41(1), 233–243. Needham, D., and M. Davis (1946). Cuicateco phonology. International Journal of American Linguistics 12, 139–146. Neeleman, A., E. Titov, H. van de Koot, and R. Vermeulen (2009). A syntactic typology of topic, focus and contrast. In J. van Craenenbroeck (ed.), Alternatives to Cartography (Studies in Generative Grammar 100), 15–52. Berlin: De Gruyter Mouton. Neidle, C. J., J. Kegl, B. Bahan, D. MacLaughlin, and R. G. Lee (2000). The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. Cambridge, MA: MIT Press. Nenkova, A., A. Gravano, and J. Hirschberg (2008). High frequency word entrainment in spoken dialogue. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 169–172, Columbus. Nercesian, V. (2011). Stress in Wichí (Mataguayan) and its interaction with the word-formation processes. Amerindia 35, 75–102. Nespor, M. (1990). On the rhythm parameter in phonology. In I. Roca (ed.), Logical Issues in Language Acquisition, 157–175. Dordrecht: Foris. Nespor, M. (1993). Fonologia. Bologna: Il Mulino. Nespor, M. (1999). Stress domains. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 117–159. Berlin: Mouton de Gruyter. Nespor, M., and W. Sandler (1999). Prosody in Israeli Sign Language. Language and Speech 42(2–3), 143–176. Nespor, M., M. Shukla, R. van de Vijver, C. Avesani, H. Schraudolf, and C. Donati (2008). Different phrasal prominence realization in VO and OV languages. Lingue e linguaggio 7(2), 1–28. Nespor, M., and I. Vogel (1986). Prosodic Phonology. Dordrecht: Foris. Nespor, M., and I. Vogel (1989). On clashes and lapses. Phonology 6, 69–116. Nespor, M., and I. Vogel (2007). Prosodic Phonology: With a New Foreword (2nd ed.). Berlin: Walter de Gruyter.

Newlin-Łukowicz, L. (2012). Polish stress: Looking for phonetic evidence of a bidirectional system. Phonology 29(2), 271–329. Newman, S. S. (1965). Zuni Grammar. Albuquerque: University of New Mexico Press. Newman, P., and R. Newman (1981). The question morpheme q in Hausa. Afrika und Übersee 64, 35–46. Newman, J., and R. G. Petterson (1990). The tones of Kairi. Oceanic Linguistics 29(1), 49–76. Newman, M. L., J. W. Pennebaker, D. S. Berry, and J. Richards (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin 29(5), 665–675. Newman, P. (2000). The Hausa Language: An Encyclopedic Reference Grammar. New Haven: Yale University Press. Newman, P. (2009). Hausa and the Chadic languages. In B. Comrie (ed.), The World’s Major Languages, vol. 2, 705–723. London: Routledge. Newman, R. S., and J. R. Sawusch (1996). Perceptual normalization for speaking rate: Effects of temporal distance. Perception and Psychophysics 58, 540–560. Newman, Z. K. (2000). The Jewish sound of speech: Talmudic chant, Yiddish intonation and the origins of early Ashkenaz. Jewish Quarterly Review 90, 293–336. Newport, E. L., H. Gleitman, and L. R. Gleitman (1977). Mother, I’d rather do it myself: Some effects and non-effects of maternal speech style. In C. E. Snow and C. A. Ferguson (eds.), Talking to Children, 109–149. Cambridge: Cambridge University Press. Newschaffer, C. J., L. A. Croen, J. Daniels, E. Giarelli, J. K. Grether, S. E. Levy, D. S. Mandell, L. A. Miller, J. Pinto-Martin, J. Reaven, A. M. Reynolds, C. E. Rice, D. Schendel, and G. C. Windham (2007). The epidemiology of autism spectrum disorders. Annual Review of Public Health 28, 235–258. Newton, B. (1972). The Generative Interpretation of Dialect: A Study of Modern Greek Phonology. Cambridge: Cambridge University Press. Ng, E.-C. (2009). Non-plateaus, non-tonal heads: Tone assignment in Colloquial Singaporean English. Chicago Linguistic Society 45(1), 487–501. 
Ng, E.-C. (2011). Reconciling stress and tone in Singaporean English. In L. J. Zhang, R. Rubdy, and L. Alsagoff (eds.), Asian Englishes: Changing Perspectives in a Globalised World, 48–59, 76–92. Singapore: Pearson Longman. Nguyễn, T. A. T., J. Ingram, and J. R. Pensalfini (2008). Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns. Journal of Phonetics 36(1), 158–190. Nguyễn, T. T. H., and G. Boulakia (1999). Another look at Vietnamese intonation. In Proceedings of the 14th International Congress of Phonetic Sciences, 2399–2402, San Francisco. Nguyễn, V. H. (2007). The direction of monosyllabicity in Raglai. In M. Alves, P. Sidwell, and D. Gill (eds.), SEALS VIII Papers from the 8th Annual Meeting of the Southeast Asian Linguistics Society 2007, 121–123. Canberra: Pacific Linguistics. Nguyễn, V. L., and J. A. Edmondson (1997). Tones and voice quality in modern northern Vietnamese: Instrumental case studies. Mon-Khmer Studies 28, 1–18. Ní Chasaide, A. (1985). Preaspiration in phonological stop contrasts. PhD dissertation, University of Bangor. Ní Chasaide, A. (1999). Irish. In Handbook of the International Phonetic Association, 111–116. Cambridge: Cambridge University Press. Ní Chasaide, A. (2003–2006). Prosody of Irish Dialects: The Use of Intonation, Rhythm, Voice Quality for Linguistic and Paralinguistic Signalling. Trinity College Dublin. Retrieved 21 May 2020 from https://www.tcd.ie/slscs/research/projects/past/prosody.php. Ní Chasaide, A., and M. Dalton (2006). Dialect alignment signatures. In Proceedings of Speech Prosody 3, Dresden. Ní Chiosáin, M. (1999). Syllables and phonotactics in Irish. In H. van der Hulst and N. Ritter (eds.), The Syllable: Views and Facts, 551–575. Berlin: Mouton de Gruyter. Ní Chiosáin, M., P. Welby, and R. Espesser (2012). Is the syllabification of Irish a typological exception? Speech Communication 54, 68–91. Nichols, J. (1997). Chechen phonology. In A. S. 
Kaye (ed.), Phonologies of Asia and Africa, vol. 2, 941–971. Winona Lake, IN: Eisenbrauns.

Nichols, J. (2011). Ingush Grammar. Berkeley: University of California Press. Nicholson, H., and A. H. Teig (2003). How to tell beans from farmers: Cues to the perception of pitch accent in whispered Norwegian. Nordlyd 31, 315–325. Nickels, S., B. Opitz, and K. Steinhauer (2013). ERPs show that classroom-instructed late second language learners rely on the same prosodic cues in syntactic parsing as native speakers. Neuroscience Letters 557, 107–111. Nickels, S., and K. Steinhauer (2018). Prosody-syntax integration in a second language: Contrasting event-related potentials from German and Chinese learners of English using linear mixed effect models. Second Language Research 34(1), 9–37. Nicolaidis, K. (2003). Acoustic variability of vowels in Greek spontaneous speech. In Proceedings of the 15th International Congress of Phonetic Sciences, 3221–3224, Barcelona. Niebuhr, O. (2004). Intrinsic pitch in opening and closing diphthongs of German. In Proceedings of Speech Prosody 2, 733–736, Nara. Niebuhr, O. (2007a). The signalling of German rising-falling intonation categories: The interplay of synchronization, shape, and height. Phonetica 64(2), 174–193. Niebuhr, O. (2007b). Perzeption und kognitive Verarbeitung der Sprechmelodie: Theoretische Grundlagen und empirische Untersuchungen (Language, Context and Cognition 7). Berlin: De Gruyter. Niebuhr, O. (2007c). Categorical perception in intonation: A matter of signal dynamics? In INTERSPEECH 2007, 109–112, Antwerp. Niebuhr, O. (2008). Coding of intonational meanings beyond f0: Evidence from utterance-final /t/ aspiration in German. Journal of the Acoustical Society of America 124, 1252–1263. Niebuhr, O. (2009). Intonation segments and segmental intonations. In INTERSPEECH 2009, 2435–2438, Brighton. Niebuhr, O. (2010). On the phonetics of intensifying emphasis in German. Phonetica 67, 170–198. Niebuhr, O. (2011). Alignment and pitch-accent identification: Implications from f0 peak and plateau contours. 
In Working Papers of the Institute of Phonetics and Digital Speech Processing, vol. 38, 77–95. Kiel: University of Kiel. Niebuhr, O. (2012). At the edge of intonation: The interplay of utterance-final f0 movements and voiceless fricative sounds in German. Phonetica 69, 7–21. Niebuhr, O. (2013). The acoustic complexity of intonation. In E. L. Asu and P. Lippus (eds.), Nordic Prosody: Proceedings of the XIth Conference, Tartu 2012, 15–29. Frankfurt: Peter Lang. Niebuhr, O. (2015). Stepped intonation contours, a new field of complexity. In O. Niebuhr and R. Skarnitzl (eds.), Tackling the Complexity in Speech, 39–74. Prague: Univerzita Karlova. Niebuhr, O. (2017). On the perception of ‘segmental intonation’: F0 context effects on sibilant identification in German. EURASIP Journal on Audio, Speech, and Music Processing 2017, 19. Niebuhr, O., M. Alm, N. Schümchen, and K. Fischer (2017). Comparing visualization techniques for learning second language prosody: First results. International Journal of Learner Corpus Research 3, 252–279. Niebuhr, O., M. D’Imperio, B. Gili Fivela, and F. Cangemi (2011a). Are there ‘shapers’ and ‘aligners’? Individual differences in signalling pitch accent category. In Proceedings of the 17th International Congress of Phonetic Sciences, 120–123, Hong Kong. Niebuhr, O., and E. Dombrowski (2010). Shaping phrase-final rising intonation in German. In Proceedings of Speech Prosody 5, Chicago. Niebuhr, O., and J. Hoekstra (2015). Pointed and plateau-shaped pitch accents in North Frisian. Laboratory Phonology 5, 433–468. Niebuhr, O., and K. J. Kohler (2004). Perception and cognitive processing of tonal alignment in German. In Proceedings of the 1st International Symposium on Tonal Aspects of Languages, 155–158, Beijing. Niebuhr, O., C. Lill, and J. Neuschulz (2011b). At the segment-prosody divide: The interplay of intonation, sibilant pitch and sibilant assimilation. 
In Proceedings of the 17th International Congress of Phonetic Sciences, 1478–1481, Hong Kong.

Niebuhr, O., J. Thumm, and J. Michalsky (2018). Shapes and timing in charismatic speech: Evidence from sounds and melodies. In Proceedings of Speech Prosody 9, 582–586, Poznań. Niebuhr, O., and M. Zellers (2012). Late pitch accents in hat and dip intonation patterns. In O. Niebuhr (ed.), Understanding Prosody: The Role of Context, Function, and Communication, 159–186. Berlin: De Gruyter. Nielsen, K. (2005). Kiche intonation. In UCLA Working Papers in Phonetics 104, 45–60. Los Angeles: University of California. Niemann, H., M. Grice, and D. Mücke (2014). Segmental and positional effects in tonal alignment: An articulatory approach. In Proceedings of the 10th International Seminar on Speech Production, 285–288, Cologne. Niinaga, Y., and S. Ogawa (2011). Kita-ryūkyū Amami Yuwan hōgen no akusento taikei. In Proceedings of the 143rd Meeting of the Linguistic Society of Japan, 238–243, Osaka. Nikolaeva, T. M. (1977). Frazovaja intonacija slavjanskix jazykov. Moscow: Nauka. Nikolaeva, T. M. (1982). Semantika akcentnogo vydelenija. Moscow: Nauka. Nikolić, B. M. (1970). Osnovi Mladje Novostokavske Akcentuacije. Belgrade: Institut za Srpskohrvatski Jezik. Nilsenova, M. (2006). Rises and falls: Studies in the semantics and pragmatics of intonation. PhD dissertation, Institute for Logic, Language and Computation, University of Amsterdam. Niparko, J. K., E. A. Tobey, D. J. Thal, L. S. Eisenberg, N. Y. Wang, A. L. Quittner, N. E. Fink, and CDaCI Investigative Team (2010). Spoken language development in children following cochlear implantation. JAMA 303(15), 1498–1506. Nitta, T. (2012). Fukuiken Echizen-chō Kokonogi hōgen no akusento. Onsei Kenkyū 16(1), 63–79. Nixon, J. S., Y. Chen, and N. O. Schiller (2015). Speech variants are processed as abstract categories and context-specific instantiations: Evidence from Mandarin lexical tone production. Language, Cognition and Neuroscience 30, 491–505. Noiray, A., L. Ménard, and K. Iskarous (2013). 
The development of motor synergies in children: Ultrasound and acoustic measurements. Journal of the Acoustical Society of America 133, 444–452. Nolan, F. (2003). Intonational equivalence: An experimental evaluation of pitch scales. In Proceedings of the 15th International Congress of Phonetic Sciences, 771–774, Barcelona. Nolan, F., and E. L. Asu (2009). The Pairwise Variability Index and coexisting rhythms in language. Phonetica 66(1–2), 64–77. Nolan, F., and H.-S. Jeon (2014). Speech rhythm: A metaphor? Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130396. Nolan, F., and H. Jónsdóttir (2001). Accentuation patterns in Icelandic. In W. A. van Dommelen and T. Fretheim (eds.), Nordic Prosody: Proceedings of the VIIIth Conference, Trondheim 2000, 187–198. Frankfurt: Peter Lang. Noonan, M. (1992). A Grammar of Lango. Berlin: Mouton de Gruyter. Nooteboom, S. G. (1972). Production and perception of vowel duration: A study of durational properties of vowels in Dutch. PhD dissertation, Utrecht University. Nordbustad, F. (1988). Iraqw Grammar: An Analytical Study of the Iraqw Language. Berlin: D. Reimer. Norris, D., and J. M. McQueen (2008). Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review 115, 357–395. Norris, D., J. M. McQueen, and A. Cutler (2003). Perceptual learning in speech. Cognitive Psychology 47, 204–238. Norris, D., J. M. McQueen, and A. Cutler (2016). Prediction, Bayesian inference and feedback in speech recognition. Language, Cognition and Neuroscience 31, 4–18. Norris, D., J. M. McQueen, A. Cutler, and S. Butterfield (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology 34, 191–243. Norton, B. (2009). Songs for the Spirits: Music and Mediums in Modern Vietnam. Urbana: University of Illinois Press. Norton, R. (2011). Ama verb morphology: Categories, tone, morphophonemics. Paper presented at the Nuba Mountain Languages Conference, Leiden. 
Nöth, E., A. Batliner, A. Kiessling, R. Kompe, and H. Niemann (2000). Verbmobil: The use of prosody in the linguistic components of a speech understanding system. IEEE Transactions on Speech and Audio Processing 8, 519–532.

Nougayrol, P. (1979). Le day de Bouna (Tchad): I. Éléments de description linguistique. Paris: Société des Etudes Linguistiques et Anthropologiques de France. Nougayrol, P. (1989). La langue des Aiki dits Rounga. Paris: Librairie Orientaliste Paul Geuthner. Nougayrol, P. (2006). Tones and verb classes in Bongo. In A.-A. Abu-Manga, L. G. Gilley, and A. Storch (eds.), Insights into Nilo-Saharan Language, History and Culture: Proceedings of the 9th Nilo-Saharan Linguistics Colloquium, Institute of African and Asian Studies, University of Khartoum, 16–19 February 2004, 335–345. Cologne: Rüdiger Köppe. Nowicki, S., Jr., and M. P. Duke (1994). Individual differences in the nonverbal communication of affect: The diagnostic analysis of nonverbal accuracy scale. Journal of Nonverbal Behavior 18, 9–35. Núñez, R. E., and E. Sweetser (2006). With the future behind them: Convergent evidence from Aymara language and gesture in the crosslinguistic comparison of spatial construals of time. Cognitive Science 30(3), 401–450. Ó Cuív, B. (1968). The Irish of West Muskerry, Co. Cork. Dublin: Dublin Institute for Advanced Studies. Ó Sé, D. (1989). Contributions to the study of word stress in Irish. Ériu 40, 147–178. Ó Sé, D. (2019). The Irish of West Kerry. Dublin: Dublin Institute for Advanced Studies. Ó Siadhail, M. (1999). Modern Irish: Grammatical Structure and Dialectal Variation. Cambridge: Cambridge University Press. O’Brien, M. (2006). Teaching pronunciation and intonation with computer technology. In L. Ducate and N. Arnold (eds.), Calling on CALL: From Theory and Research to New Directions in Foreign Language Teaching, 127–148. San Marcos: CALICO. O’Brien, M., and U. Gut (2010). Phonological and phonetic realization of different types of focus in L2 speech. In K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul (eds.), Achievements and Perspectives in the Acquisition of Second Language Speech: New Sounds, 205–215. Frankfurt: Peter Lang. O’Connor, J. D., and G. F. 
Arnold (1973 [1961]). Intonation of Colloquial English (2nd ed.). London: Longman. Odden, D. (1982). Tonal phenomena in Shambaa. Studies in African Linguistics 13, 177–208. Odden, D. (1994). Adjacency parameters in phonology. Language 70(2), 289–330. Odden, D. (1995). Tone: African languages. In J. A. Goldsmith (ed.), The Handbook of Phonological Theory, 444–475. Oxford: Blackwell. Odden, D. (1998). Verbal tone melodies in Kikerewe. In I. Maddieson and T. J. Hinnebusch (eds.), Language History and Linguistic Description in Africa, 177–184. Trenton: Africa World Press. Odé, C. (1989). Russian Intonation: A Perceptual Description. Amsterdam: Rodopi. Odé, C. (1994). On the perception of prominence in Indonesian. In C. Odé and V. J. van Heuven (eds.), Experimental Studies of Indonesian Prosody, 27–107. Leiden: Department of Languages and Cultures of Southeast Asia and Oceania, Leiden University. Odé, C. (2000). Mpur. In G. Reesink (ed.), Studies in Irian Languages Part II, 59–70. Jakarta: NUSA. Odé, C. (2005). Neutralization or truncation? The perception of two Russian pitch accents on utterance-final syllables. Speech Communication 47(1–2), 71–79. Odé, C. (2008). Transcription of Russian intonation ToRI, an interactive research tool and learning module on the Internet. In P. Houtzagers, J. Kalsbeek, and J. Schaeken (eds.), Dutch Contributions to the Fourteenth International Congress of Slavists, Ohrid: Linguistics (Studies in Slavic and General Linguistics 34), 431–449. Amsterdam: Rodopi. O’Dell, M. (2003). Intrinsic Timing and Quantity in Finnish (Acta Universitatis Tamperensis 979). Tampere: Tampere University Press. O’Dell, M., M. Lennes, and T. Nieminen (2008). Hierarchical levels of rhythm in conversational speech. In Proceedings of Speech Prosody 4, 355–358, Campinas. O’Dell, M., and T. Nieminen (1999). Coupled oscillator model of speech rhythm. In Proceedings of the 14th International Congress of Phonetic Sciences, 1075–1078, San Francisco. O’Dell, M. 
L., and T. Nieminen (2009). Coupled oscillator model for speech timing: Overview and examples. In M. Vainio, R. Aulanko, and O. Aaltonen (eds.), Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008, 179–190. Frankfurt: Peter Lang.

Oftedal, M. (1956). A Linguistic Survey of the Gaelic Dialects of Scotland: Vol. 3. The Gaelic of Leurbost, Isle of Lewis. Oslo: Norsk Tidsskrift for Sprogvidenskap. Oftedal, M. (1969). ‘Word tones’ in Welsh? In Tilegnet Carl Hj. Borgstrøm: Et festskrift på 60-årsdagen, 119–127. Oslo: Universitetsforlaget. Ogden, R. (2001). Turn transition, creak and glottal stop in Finnish talk-in-interaction. Journal of the International Phonetic Association 31(1), 139–152. Oh, S., and S.-A. Jun (2018). Prosodic structure and intonational phonology of the Chungcheong dialect of Korean. Poster presented at Experimental and Theoretical Advances in Prosody 4, Amherst. Ohala, J. J. (1983). Cross-language use of pitch: An ethological view. Phonetica 40(1), 1–18. Ohala, J. J. (1984). An ethological perspective on common cross-language utilization of f0 of voice. Phonetica 41, 1–16. Ohala, M. (1999). Hindi. In Handbook of the International Phonetic Association, 100–103. Cambridge: Cambridge University Press. Ohm, G. S. (1843). Ueber die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Annalen der Physik und Chemie 59, 513–565. Okell, K., and A. Allott (2001). Burmese/Myanmar Dictionary of Grammatical Forms. Richmond: Curzon. Oller, D. K. (1973). The effect of position in utterance on speech segment duration in English. Journal of the Acoustical Society of America 54(5), 1235–1247. Oller, D. K., E. H. Buder, H. L. Ramsdell, A. S. Warlaumont, and L. Chorna (2013). Functional flexibility of infant vocalization and the emergence of language. Proceedings of the National Academy of Sciences of the United States of America 110(16), 6318–6323. Oppenheim, A. V. (1970). Speech spectrograms using the fast Fourier transform. IEEE Spectrum 7(8), 57–62. Ordin, M., and L. Polyanskaya (2015). Perception of speech rhythm in second language: The case of rhythmically similar L1 and L2. Frontiers in Psychology 6, 316. Ordin, M., and J. Setter (2008a). 
Objective indicators of rhythmic Russian–English transfer. In Proceedings of the 20th Session of the Russian Acoustical Society, 649–652, St Petersburg. Ordin, M., and J. Setter (2008b). Comparative research of temporal organization of the syllable structure in Hong Kong English, Russian English and British English. In Proceedings of the 20th Session of the Russian Acoustical Society, 653–656, St Petersburg. O’Reilly, M. (2014). Sentence mode, alignment and focus in the intonation of Cois Fharraige, Inis Mór and Gaoth Dobhair Irish: A dual approach. PhD dissertation, Trinity College Dublin. O’Reilly, M., A. Dorn, and A. Ní Chasaide (2010). Focus in Donegal Irish (Gaelic) and Donegal English bilinguals. In Proceedings of Speech Prosody 5, Chicago. O’Reilly, M., and A. Ní Chasaide (2012). H alignment and scaling in nuclear falls in two varieties of Connemara Irish. Paper presented at the Fifth European Conference on Tone and Intonation, Oxford. O’Reilly, M., and A. Ní Chasaide (2015). Declination, peak height and pitch level in declaratives and questions of South Connaught Irish. In INTERSPEECH 2015, 978–982, Dresden. O’Reilly, M., and A. Ní Chasaide (2016). The prosodic effects of focus in the Irish of Cois Fharraige. Paper presented at the Seventh European Conference on Tone and Intonation, Canterbury, UK. Ornitz, E., and E. Ritvo (1976). Medical assessment. In E. Ritvo (ed.), Autism: Diagnosis, Current Research, and Management, 7–26. New York: Spectrum. Orrico, R., R. Savy, and M. D’Imperio (2019). Individual variability in Salerno Italian intonation: Evidence from read and spontaneous speech. In Proceedings of the 15th International Conference of the Italian Voice and Speech Association, Arezzo. Ortega-Llebaria, M., and L. Colantoni (2014). L2 English intonation: Relations between form-meaning associations, access to meaning and L1 transfer. Studies in Second Language Acquisition 36(2), 331–353. Ortega-Llebaria, M., H. Gu, and J. Fan (2013). 
English speakers’ perception of Spanish lexical stress: Context-driven L2 stress perception. Journal of Phonetics 41, 186–197. Ortega-Llebaria, M., M. Nemogá, and N. Presson (2015). Long-term experience with a tonal language shapes the perception of intonation in English words: How Chinese-English bilinguals perceive ‘Rose?’ vs. ‘Rose’. Bilingualism: Language and Cognition 20, 367–383.

Ostendorf, M., P. Price, and S. Shattuck-Hufnagel (1995). The Boston University Radio News Corpus. Boston University Technical Report ECS-95-001. Ostendorf, M., C. Wightman, and N. Veilleux (1993). Parse scoring with prosodic information: An analysis/synthesis approach. Computer Speech and Language 7, 193–210. Ota, M. (2003). The development of lexical pitch accent systems: An autosegmental analysis. Canadian Journal of Linguistics/Revue canadienne de linguistique 48(3–4), 357–383. Ota, M. (2006). Children’s production of word accents in Swedish revisited. Phonetica 63(4), 230–246. Ota, M., N. Yamane, and R. Mazuka (2018). The effects of lexical pitch accent on infant word recognition in Japanese. Frontiers in Psychology 8, 2354. Ots, N. (2017). On the phrase-level function of f0 in Estonian. Journal of Phonetics 65, 77–93. Ott, M., Y. Choi, C. Cardie, and J. T. Hancock (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 309–319, Portland. Otundo, B. K. (2017a). Intonation variation of declarative questions by Kenyan speakers of English. International Journal of Current Research 9(10), 59412–59419. Otundo, B. K. (2017b). Exploring Ethnically-Marked Varieties of Kenyan English: Intonation and Associated Attitudes. Berlin: LIT. Ouellette, G., and S. R. Baum (1994). Acoustic analysis of prosodic cues in left- and right-hemisphere-damaged patients. Aphasiology 8, 257–283. Overall, S. (2007). A grammar of Aguaruna. PhD dissertation, La Trobe University. Oxenham, A. J. (2012). Pitch perception. Journal of Neuroscience 32(39), 13335–13338. Oxenham, A. J. (2013). Revisiting place and temporal theories of pitch. Acoustical Science and Technology 34(6), 388–396. Özçelik, Ö. (2014). Stress or intonational prominence? Word accent in Kazakh, Turkish, Uyghur and Uzbek. Paper presented at the 10th Workshop on Altaic Formal Linguistics (WAFL 10), Cambridge, MA. 
Ozerov, P. (2018). Tone assignment and grammatical tone in Anal (Tibeto-Burman). Studies in Language 42(3), 708–733. Özge, U., and C. Bozşahin (2010). Intonation in the grammar of Turkish. Lingua 120, 132–175. Pakerys, A. (1982). Lietuvių bendrinės kalbos prozodija. Vilnius: Mokslas. Pakerys, A. (1987). Relative importance of acoustic features for perception of Lithuanian stress. In Proceedings of the 11th International Congress of Phonetic Sciences, 319–320, Tallinn. Palakurthy, K. (2017). Prosodic units in Navajo narrative: A quantitative investigation of acoustic cues. Ms., University of California, Santa Barbara. Palancar, E. L. (2004). Verbal morphology and prosody in Otomí. International Journal of American Linguistics 70(3), 251–278. Palancar, E. L., J. D. Amith, and R. C. García (2016). Verbal inflection in Yoloxóchitl Mixtec. In E. L. Palancar and J. L. Léonard (eds.), Tone and Inflection: New Facts and New Perspectives, 295–336. Berlin: Mouton de Gruyter. Palancar, E. L., and J. L. Léonard (2016). Tone and Inflection: New Facts and New Perspectives. Berlin: Mouton de Gruyter. Palková, Z. (1994). Fonetika a fonologie češtiny: S obecným úvodem do problematiky oboru. Prague: Karolinum. Palosaari, N. (2011). Topics in Mochoˈ phonology and morphology. PhD dissertation, University of Utah. Pan, H.-H. (2007). Focus and Taiwanese unchecked tones. In C. Lee, M. Gordon, and D. Büring (eds.), Topic and Focus: Cross-Linguistic Perspectives on Meaning and Intonation, 197–216. Dordrecht: Springer. Pandey, P. (1985). Word accentuation in Hindustani English in relation to Hindustani and English. PhD dissertation, University of Poona. Pandey, P. (2015). Indian English prosody. In G. Leitner, A. Hashim, and H.-G. Wolf (eds.), Communicating with Asia, 56–68. Cambridge: Cambridge University Press. Pandharipande, R. (2003). Marathi. In G. Cardona and D. Jain (eds.), The Indo-Aryan Languages, 698–728. London: Routledge.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Pankratz, L., and E. V. Pike (1967). Phonology and morphotonemics of Ayutla Mixtec. International Journal of American Linguistics 33, 287–299. Panther, F., M. Harvey, K. Demuth, M. Turpin, N. San, M. Proctor, and H. Koch (2017). Syllable and word structure in Kaytetye. Paper presented at the 25th Manchester Phonology Meeting, Manchester. Papaeliou, C., G. Minadakis, and D. Cavouras (2002). Acoustic patterns of infant vocalizations expressing emotions and communicative functions. Journal of Speech, Language, and Hearing Research 45(2), 311–317. Papaeliou, C. F., and C. Trevarthen (2006). Prelinguistic pitch patterns expressing ‘communication’ and ‘apprehension’. Journal of Child Language 33(1), 163–178. Papazachariou, D. (1998). Language variation and the social construction of identity: The sociolinguistic role of intonation among adolescents in Northern Greece. PhD dissertation, University of Essex. Papazachariou, D. (2004). Greek intonation variables in polar questions. In P. Gilles and J. Peters (eds.), Regional Variation in Intonation, 191–217. Tübingen: Max Niemeyer. Pape, D., C. Mooshammer, S. Fuchs, and P. Hoole (2005). Intrinsic pitch differences between German vowels /i:/, /ɪ/ and /y:/ in a cross-linguistic perception experiment. In Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 1–4, London. Papoušek, M., M. H. Bornstein, C. Nuzzo, H. Papoušek, and D. Symmes (1990). Infant responses to prototypical melodic contours in parental speech. Infant Behavior and Development 13(4), 539–545. Papoušek, M., and S. F. C. Hwang (1991). Tone and intonation in Mandarin babytalk to presyllabic infants: Comparison with registers of adult conversation and foreign language instruction. Applied Psycholinguistics 12, 481–504. Papoušek, M., H. Papoušek, and D. Symmes (1991). The meanings of melodies in motherese in tone and stress languages. Infant Behavior and Development 14, 415–440. Parada-Cabaleiro, E., G. Costantini, A. Batliner, M.
Schmitt, and B. W. Schuller (2019). DEMoS: an Italian emotional speech corpus. Language Resources and Evaluation 54, 341–383. Pardo, J. A. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America 119(4), 2382–2393. Park, J.-W. (1994). Variation of vowel length in Korean. In Y.-K. Kim-Renaud (ed.), Theoretical Issues in Korean Linguistics, 175–188. Stanford: Center for the Study of Language and Information. Park, M.-J. (2013). The Meaning of Korean Prosodic Boundary Tones. Leiden: Brill. Parker, S. G. (1992). Datos del idioma huariapano (Documento de Trabajo 24). Yarinacocha: Ministerio de Educación and Instituto Lingüístico de Verano. Parker, S. G. (1999). A sketch of Iñapari phonology. International Journal of American Linguistics 65, 1–39. Pastätter, M., and M. Pouplier (2017). Articulatory mechanisms underlying onset-vowel organization. Journal of Phonetics 65, 1–14. Paster, M., and R. Beam de Azcona (2005). A phonological sketch of the Yucunany dialect of Mixtepec Mixtec. In Proceedings of the 7th Annual Workshop on American Indigenous Languages, 61–76, Santa Barbara. Patel, R. (2002a). Phonatory control in adults with cerebral palsy and severe dysarthria. Augmentative and Alternative Communication 18(1), 2–10. Patel, R. (2002b). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research 45(5), 858–870. Patel, R. (2003). Acoustic characteristics of the question-statement contrast in severe dysarthria due to cerebral palsy. Journal of Speech, Language, and Hearing Research 46(6), 1401–1415. Patel, R. (2004). The acoustics of contrastive prosody in adults with cerebral palsy. Journal of Medical Speech-Language Pathology 12(4), 189–193. Patel, R., and P. Campellone (2009). Acoustic and perceptual cues to contrastive stress in dysarthria. Journal of Speech, Language, and Hearing Research 52, 206–222.

Patel, A. D., Y. Xu, and B. Wang (2010). The role of f0 variation in the intelligibility of Mandarin sentences. In Proceedings of Speech Prosody 5, Chicago. Pater, J. (1997a). Minimal violation and phonological development. Language Acquisition 6(3), 201–253. Pater, J. (1997b). Metrical parameter missetting in second language acquisition. In S. J. Hannahs and M. Young-Scholten (eds.), Focus on Phonological Acquisition, 235–261. Amsterdam: John Benjamins. Patil, U., G. Kentner, A. Gollrad, F. Kügler, C. Féry, and S. Vasishth (2008). Focus, word order and intonation in Hindi. Journal of South Asian Linguistics 1(1), 53–67. Patin, C. (2016). Tone and intonation in Shingazidja. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 285–321. Berlin: De Gruyter Mouton. Pattanayak, D. P. (1966). A Controlled Historical Reconstruction of Oriya, Assamese, Bengali, and Hindi. The Hague: Mouton. Paul, R., A. Augustyn, A. Klin, and F. Volkmar (2005a). Perception and production of prosody by speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders 35(2), 205–220. Paul, R., and D. Fahim (2014). Assessing communication in autism spectrum disorders. In F. Volkmar, S. Rogers, R. Paul, and K. Pelphrey (eds.), Handbook of Autism and Pervasive Developmental Disorders, 673–694. New York: Wiley. Paul, R., L. D. Shriberg, J. L. McSweeny, D. Cicchetti, A. Klin, and F. Volkmar (2005b). Brief report: Relations between prosodic performance and communication and socialization ratings in high functioning speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders 35(6), 861–869. Paulian, C. (1975). Le kukuya: Langue teke du Congo. Paris: Société d’Études Linguistiques et Anthropologiques de France. Paulmann, S., M. D. Pell, and S. A. Kotz (2008). Functional contributions of the basal ganglia to emotional prosody: Evidence from ERPs. Brain Research 1217, 171–178. Paulmann, S., D. V. Ott, and S. A.
Kotz (2011). Emotional speech perception unfolding in time: The role of the basal ganglia. PLoS ONE 6, e17694. Payne, D., and T. Payne (1990). Yagua. In D. C. Derbyshire and G. K. Pullum (eds.), Handbook of Amazonian Languages, 251–474. Berlin: Mouton de Gruyter. Payne, E., B. Post, L. Astruc, P. Prieto, and M. M. Vanrell (2012). Measuring child rhythm. Language and Speech 55(2), 203–229. Payne, J. (1989). Lecciones para el aprendizaje del idioma asheninca (Serie Lingüística Peruana 28). Yarinacocha: Summer Institute of Linguistics/Ministerio de Educación. Pearce, M. (1999). Consonants and tone in Kera (Chadic). Journal of West African Languages 27(1), 33–70. Pearce, M. (2005). Kera tone and voicing. In M. Pearce and N. Topintzi (eds.), University College London Working Papers in Linguistics, vol. 17, 61–82. London: Department of Linguistics and Phonetics, University College London. Pearce, M. (2006). The interaction between metrical structure and tone in Kera. Phonology 23(2), 259–286. Pearce, M. (2013). The Interaction of Tone with Voicing and Foot Structure: Evidence from Kera Phonetics and Phonology. Stanford: CSLI. Peelle, J. E., and M. H. Davis (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology 3, 320. Pell, M. D. (1998). Recognition of prosody following unilateral brain lesion: Influence of functional and structural attributes of prosodic contours. Neuropsychologia 36(8), 701–715. Pell, M. D. (2002). Surveying emotional prosody in the brain. In Proceedings of Speech Prosody 1, 77–82, Aix-en-Provence. Pell, M. D. (2006). Judging emotion and attitudes from prosody following brain damage. Progress in Brain Research 156, 303–317.

Pell, M. D. (2007). Reduced sensitivity to prosodic attitudes in adults with focal right hemisphere brain damage. Brain and Language 101, 64–79. Pell, M. D., and S. R. Baum (1997a). Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain and Language 57, 195–214. Pell, M. D., and S. R. Baum (1997b). The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain and Language 57, 80–99. Pell, M. D., and C. L. Leonard (2003). Processing emotional tone from speech in Parkinson’s disease: A role for the basal ganglia. Cognitive, Affective, and Behavioral Neuroscience 3, 275–288. Pell, M. D., L. Monetta, S. Paulmann, and S. A. Kotz (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior 33, 107–120. Pellegrino, F., C. Coupé, and E. Marsico (2011). A cross-language perspective on speech information rate. Language 87(3), 539–558. Pence, A. R. (1964). Intonation in Kunimaipa (New Guinea). In Papers in New Guinea Linguistics 1, 1–15. Canberra: Pacific Linguistics. Penčev, J. (1980). Osnovni intonacionni konturi v bălgarskoto izrečenie. Sofia: Bălgarska Akademija na Naukite. Penfield, J. (1984). Prosodic patterns: Some hypotheses and findings from fieldwork. In J. L. Ornstein-Galicia (ed.), Form and Function in Chicano English, 71–82. Rowley, MA: Newbury House. Penfield, J., and J. L. Ornstein-Galicia (1985). Chicano English: An Ethnic Contact Dialect. Amsterdam: John Benjamins. Peng, S.-H. (1997). Production and perception of Taiwanese tones in different tonal and prosodic contexts. Journal of Phonetics 25, 371–400. Peng, S.-H. (2000). Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In M. B. Broe and J. B. Pierrehumbert (eds.), Papers in Laboratory Phonology V: Acquisition and the Lexicon, 152–167. Cambridge: Cambridge University Press. Peng, S.-H., M. K. M. Chan, C.-Y. Tseng, T. Huang, O. J. Lee, and M. E.
Beckman (2005). Towards a pan-Mandarin system for prosodic transcription. In S.-A. Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 230–270. Oxford: Oxford University Press. Peng, S., J. B. Tomblin, and C. Turner (2008). Pediatric cochlear implant recipients and individuals with normal hearing. Ear and Hearing 29(3), 336–351. Peng, G., H.-Y. Zheng, T. Gong, R.-X. Yang, J.-P. Kong, and W. S. Y. Wang (2010). The influence of language experience on categorical perception of pitch contours. Journal of Phonetics 38, 616–624. Penner, K. (2019). Prosodic structure in Ixtayutla Mixtec: Evidence for the foot. PhD dissertation, University of Alberta. Pennington, L., E. Lombardo, N. Steen, and N. Miller (2018). Acoustic changes in the speech of children with cerebral palsy following an intensive program of dysarthria therapy. International Journal of Language and Communication Disorders 53(1), 182–195. Pennington, M., and N. Ellis (2000). Cantonese speakers’ memory for English sentences with prosodic cues. Modern Language Journal 84(3), 372–389. Pensalfini, R. J. (2000). Suffix coherence and stress in Australian languages. In J. Henderson (ed.), Proceedings of the 1999 Conference of the Australian Linguistic Society. http://www.als.asn.au/. Pensalfini, R. J. (2003). A Grammar of Jingulu, an Aboriginal Language of the Northern Territory (Pacific Linguistics 536). Canberra: Pacific Linguistics. Pentland, C. (2004). Stress in Warlpiri: Stress domains and word-level prosody. PhD dissertation, University of Queensland. Peperkamp, S. (2004). Lexical exceptions in stress systems: Arguments from early language acquisition and adult speech perception. Language 80, 98–126. Peperkamp, S., and E. Dupoux (2002). A typological study of stress ‘deafness’. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 203–240. Berlin: Mouton de Gruyter. Peperkamp, S., I. Vendelin, and E. Dupoux (2010).
Perception of predictable stress: A cross-linguistic investigation. Journal of Phonetics 38, 422–430.

Pépiot, E. (2014). Male and female speech: A study of mean f0, f0 range, phonation type and speech rate in Parisian French and American English speakers. In Proceedings of Speech Prosody 7, 305–309, Dublin. Peppé, S., J. Cleland, F. Gibbon, A. O’Hare, and P. M. Castilla (2011). Expressive prosody in children with autism spectrum disorders. Journal of Neurolinguistics 24(1), 41–53. Peppé, S., and J. McCann (2003). Assessing intonation and prosody in children with atypical language development: The PEPS-C test and the revised version. Clinical Linguistics and Phonetics 17(4–5), 345–354. Peppé, S., J. McCann, F. Gibbon, A. O’Hare, and M. Rutherford (2006). Assessing prosodic and pragmatic ability in children with high-functioning autism. Journal of Pragmatics 28, 1776–1791. Peppé, S., J. McCann, F. Gibbon, A. O’Hare, and M. Rutherford (2007). Receptive and expressive prosodic ability in children with high-functioning autism. Journal of Speech, Language, and Hearing Research 50(4), 1015–1028. Percival, M., and K. Bamba (2017). Segmental intonation in tonal and non-tonal languages. Journal of the Acoustical Society of America 141, 3701. Pérez González, B. (1985). El chontal de Tucta. Villahermosa: Gobierno del Estado de Tabasco. Pérez-Rosas, V., M. Abouelenien, R. Mihalcea, Y. Xiao, C. Linton, and M. Burzo (2015). Verbal and nonverbal clues for real-life deception detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2336–2346, Lisbon. Pérez Vail, J. R. (2007). Xtxolil yool b’a’aj: gramática Tektiteka. Antigua, Guatemala: Oxlajuuj Keej Mayaˈ Ajtzˈiibˈ. Pérez Vail, E. G., B. L. García Jiménez, and O. Jiménez (2000). Tz’ixpub’ente tiib’ qyool: Variación dialectal en Mam. Ciudad de Guatemala: Oxlajuuj Keej Maya’ Ajtz’iib’. Pérez Vail, E. G., and O. Jiménez (1997). Ttxoolil qyool Mam: gramática Mam. Ciudad de Guatemala: Cholsamaj. Perkell, J. S., M. H. Cohen, M. A. Svirsky, M. L. Matthies, I. Garabieta, and M.
T. T. Jackson (1992). Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. Journal of the Acoustical Society of America 92, 3078–3096. Perlmutter, D. M. (1991). Sonority and syllable structure in American Sign Language. Linguistic Inquiry 23(3), 407–442. Perlmutter, D. (1993). Sonority and syllable structure in American Sign Language. In G. Coulter (ed.), Phonetics and Phonology: Current Issues in ASL Phonology, vol. 3, 227–261. San Diego: Academic Press. Perrachione, T. K., J. Lee, L. Y. Ha, and P. C. Wong (2011). Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. Journal of the Acoustical Society of America 130, 461–472. Peters, B., K. J. Kohler, and T. Wesener (2005). Phonetische Merkmale prosodischer Phrasierung in deutscher Spontansprache. In K. J. Kohler, F. Kleber, and T. Wesener (eds.), Prosodic Structures in German Spontaneous Speech, 143–184. Kiel: IPDS. Peters, J. (2006a). Intonation deutscher Regionalsprachen. Berlin: De Gruyter. Peters, J. (2006b). The Cologne word accent revisited. In M. de Vaan (ed.), Germanic Tone Accents, 107–133. Wiesbaden: Steiner. Peters, J. (2007). Bitonal lexical pitch accents in the Limburgian dialect of Borgloon. In T. Riad and C. Gussenhoven (eds.), Tones and Tunes: Vol. 1. Typological Studies in Word and Sentence Prosody, 167–198. Berlin: Mouton de Gruyter. Peters, J. (2008). Tone and intonation in the dialect of Hasselt. Linguistics 46, 983–1018. Peters, J. (2010). Tonal variation of West Germanic languages. In T. Stolz, E. Ruigendijk, and J. Trabant (eds.), Linguistik im Nordwesten. Beiträge zum 1. Nordwestdeutschen Linguistischen Kolloquium, 2008, Bremen, October 10–11, 2008, 79–102. Bochum: Brockmeyer. Peters, J. (2014). Intonation. Heidelberg: Winter. Peters, J. (2018). Phonological and semantic aspects of German intonation. Linguistik Online 88, 85–107. 
Retrieved 21 May 2020 from https://bop.unibe.ch/linguistik-online/article/view/4191/6292. Peters, J. (2019). Saterland Frisian. Journal of the International Phonetic Association 49, 223–230.

Peters, J., J. Hanssen, and C. Gussenhoven (2015). The timing of nuclear falls: Evidence from Dutch, West Frisian, Dutch Low Saxon, German Low Saxon, and High German. Laboratory Phonology 6, 1–52. Petrone, C. (2008). Le rôle de la variabilité phonétique dans la représentation phonologique des contours intonatifs et de leur sens. PhD dissertation, University of Provence. Petrone, C., and M. D’Imperio (2008). Tonal structure and constituency in Neapolitan Italian: Evidence for the accentual phrase in statements and questions. In Proceedings of Speech Prosody 4, 301–304, Campinas. Petrone, C., and M. D’Imperio (2009). Is tonal alignment interpretation independent of methodology? In INTERSPEECH 2009, 2459–2462, Brighton. Petrone, C., and M. D’Imperio (2015). Effects of syllable structure on intonation identification in Neapolitan Italian. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Petrone, C., M. D’Imperio, and S. Fuchs (2010). What can prosodic constituency and prominence tell us about the pi-gesture scope? Poster presented at the 12th Conference on Laboratory Phonology, Albuquerque. Petrone, C., M. D’Imperio, S. Fuchs, and L. Lancia (2014). The interplay between prosodic phrasing and accentual prominence on articulatory lengthening in Italian. In Proceedings of Speech Prosody 7, 192–196, Dublin. Petrone, C., and O. Niebuhr (2014). On the intonation of German intonation questions: The role of the prenuclear region. Language and Speech 57, 108–146. Petrone, C., H. Truckenbrodt, C. Wellmann, J. Holzgrefe-Lang, I. Wartenburger, and B. Höhle (2017). Prosodic boundary cues in German: Evidence from the production and perception of bracketed lists. Journal of Phonetics 61, 71–92. Petronio, K., and D. Lillo-Martin (1997). WH-movement and the position of spec-CP: Evidence from American Sign Language. Language 73(1), 18–57. Pettersson, T., and S. Wood (1987).
Vowel reduction in Bulgarian and its implications for theories of vowel reduction: A review of the problem. Folia Linguistica 21, 261–280. Peyasantiwong, P. (1986). Stress in Thai. In R. J. Bickner, T. J. Hudak, and P. Peyasantiwong (eds.), A Conference on Thai Studies In Honor of William J. Gedney, 211–230. Ann Arbor: Center for South and Southeast Asian Studies, University of Michigan. Pfau, R. (2016). Non-manuals and tones: A comparative perspective on suprasegmentals and spreading. Linguística: Revista de estudos linguísticos da Universidade do Porto 11, 19–58. Pfau, R., and J. Quer (2007). On the syntax of negation and modals in Catalan Sign Language and German Sign Language. In P. M. Perniss, R. Pfau, and M. Steinbach (eds.), Visible Variation (Comparative Studies on Sign Language Structure 188), 129–160. The Hague: Walter de Gruyter. Pfau, R., and J. Quer (2010). Nonmanuals: Their grammatical and prosodic roles. In D. Brentari (ed.), Sign Languages, 381–402. Cambridge: Cambridge University Press. Pfitzinger, H., and M. Tamashima (2006). Comparing perceptual local speech rate of German and Japanese speech. In Proceedings of Speech Prosody 3, Dresden. Phạm, A. H. (2008). Is there a prosodic word in Vietnamese? Toronto Working Papers in Linguistics 29, 1–23. Phạm, T. T. H., and M. Brunelle (2014). Ngữ điệu và các tiểu từ cuối câu trong tiếng Cham Phan Rang. Ngôn Ngữ 6, 57–69. Phillips, J. R. (1973). Syntax and vocabulary of mothers’ speech to young children: Age and sex comparisons. Child Development 44, 182–185. Phonetik Köln (2020). Grundlagen Prosodie. Retrieved 22 May 2020 from http://www.gtobi.unikoeln.de/x_grundlagen_prosodie.html#phrasierung. Piazza, E. A., M. C. Iordan, and C. Lew-Williams (2017). Mothers consistently alter their unique vocal fingerprints when communicating with infants. Current Biology 27(20), 3162–3167.

Picanço, G. L. (2005). Mundurukú: Phonetics, phonology, synchrony, diachrony. PhD dissertation, University of British Columbia. Pickering, L., and C. Wiltshire (2000). Pitch accent in Indian-English teaching discourse. World Englishes 19(2), 173–183. Pickering, M. J., and S. Garrod (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27(2), 169–190. Pickering, M. J., and S. Garrod (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences 36(4), 329–347. Pierrehumbert, J. B. (1979). The perception of fundamental frequency declination. Journal of the Acoustical Society of America 66, 363–369. Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. PhD dissertation, MIT. (Distributed 1988, Indiana University Linguistics Club.) Pierrehumbert, J. B. (1981). Synthesizing intonation. Journal of the Acoustical Society of America 70, 985–995. Pierrehumbert, J. B. (2000). Tonal elements and their alignment. In M. Horne (ed.), Prosody: Theory and Experiment: Studies Presented to G. Bruce, 11–26. Dordrecht: Kluwer. Pierrehumbert, J. B., and M. E. Beckman (1988). Japanese Tone Structure (Linguistic Inquiry Monograph 15). Cambridge, MA: MIT Press. Pierrehumbert, J. B., and J. Hirschberg (1990). The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication, 271–311. Cambridge, MA: MIT Press. Pierrehumbert, J. B., and D. Talkin (1992). Lenition of /h/ and glottal stop. In G. J. Docherty and D. R. Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 90–117. Cambridge: Cambridge University Press. Pigott, P. (2012). Degemination and prosody in Labrador Inuttut: An acoustic study. MA thesis, Memorial University of Newfoundland. Pike, E. V., and J. Oram (1976). Stress and tone in the phonology of Diuxi Mixtec. Phonetica 33, 321–333. Pike, E. V., and E.
Scott (1962). The phonological hierarchy of Marinahua. Phonetica 8, 1–8. Pike, E. V., and P. Small (1974). Downstepping terrace tone in Coatzospan Mixtec. In R. B. Brend (ed.), Advances in Tagmemics (North-Holland Linguistic Series 9), 105–134. Amsterdam: North-Holland. Pike, E. V., and K. Wistrand (1974). Step-up terrace tone in Acatlán Mixtec (Mexico). In R. B. Brend (ed.), Advances in Tagmemics (North-Holland Linguistic Series 9), 81–104. Amsterdam: North-Holland. Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press. Pike, K. L. (1946). Phonemic pitch in Maya. International Journal of American Linguistics 12(2), 82–88. Pike, K. L. (1948). Tone Languages. Ann Arbor: University of Michigan Press. Pilch, H. (1975). Advanced Welsh phonemics. Zeitschrift für celtische Philologie 34, 60–102. Pitrelli, J. F., M. E. Beckman, and J. Hirschberg (1994). Evaluation of prosodic transcription labeling reliability in the ToBI framework. In Proceedings of the 3rd International Conference on Spoken Language Processing, 123–126, Yokohama. Pitt, M. A., and A. G. Samuel (1990). The use of rhythm in attending to speech. Journal of Experimental Psychology: Human Perception and Performance 16, 564–573. Pitt, M. A., C. Szostak, and L. C. Dilley (2016). Rate dependent speech processing can be speech specific: Evidence from the perceptual disappearance of words under changes in context speech rate. Attention, Perception, and Psychophysics 78, 334–345. Pittayaporn, P. (2005). Moken as a Mainland Southeast Asian language. In A. Grant and P. Sidwell (eds.), Chamic and Beyond, 189–209. Canberra: Pacific Linguistics. Pittayaporn, P. (2007). Prosody of final particles in Thai: Interaction between lexical tones and boundary tones. Paper presented to the International Workshop on Intonation Phonology: Understudied or Fieldwork Languages, Saarbrücken.

Plénat, M. (1984). Toto, Fanfa, Totor et même Guiguitte sont des anars. In F. Dell, D. Hirst, and J.-R. Vergnaud (eds.), Forme sonore du langage: Structure des représentations en phonologie, 161–181. Paris: Hermann. Plomp, R. (1967). Pitch of complex tones. Journal of the Acoustical Society of America 41, 1526–1533. Plutchik, R. (2002). Nature of emotions. American Scientist 89(4), 344–350. Podlipský, V. J., R. Skarnitzl, and J. Volín (2009). High front vowels in Czech: A contrast in quantity or quality? In INTERSPEECH 2009, 132–135, Brighton. Poellmann, K., H. R. Bosker, J. M. McQueen, and H. Mitterer (2014). Perceptual adaptation to segmental and syllabic reductions in continuous spoken Dutch. Journal of Phonetics 46, 101–127. Poeppel, D., W. J. Idsardi, and V. van Wassenhove (2008). Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society B: Biological Sciences 363, 1071–1086. Pointon, G. E. (1980). Is Spanish really syllable-timed? Journal of Phonetics 8(3), 293–304. Pointon, G. E. (1995). Rhythm and duration in Spanish. In J. Windsor-Lewis (ed.), Studies in General and English Phonetics: Essays in Honour of Professor J. D. O’Connor, 266–269. London: Routledge. Polian, G. (2013). Gramática del Tseltal de Oxchuc. Mexico: Centro de Investigaciones y Estudios Superiores en Antropología Social. Polich, J. (2007). Updating P300: An integrative theory of P3a and P3b. Clinical Neurophysiology 118(10), 2128–2148. Politzer-Ahles, S., K. Schluter, K. Wu, and D. Almeida (2016). Asymmetries in the perception of Mandarin tones: Evidence from mismatch negativity. Journal of Experimental Psychology: Human Perception and Performance 42, 1547–1570. Polka, L., and M. Sundara (2012). Word segmentation in monolingual infants acquiring Canadian English and Canadian French. Infancy 17(2), 198–232. Polyanskaya, L., and M. Ordin (2015). Acquisition of speech rhythm in first language.
Journal of the Acoustical Society of America 138(3), EL199–EL204. Polyanskaya, L., M. Ordin, and M. G. Busa (2017). Relative salience of speech rhythm and speech rate on perceived foreign accent in L2. Language and Speech 60, 333–355. Polzehl, S., and F. Metze (2010). Approaching multi-lingual emotion recognition from speech: On language dependency of acoustic/prosodic features for anger detection. In Proceedings of Speech Prosody 5, Chicago. Poma, M., T. J. T. de la Cruz, M. C. Caba, M. M. Brito, D. S. Marcos, and N. Cedillo (1996). Gramática del idioma Ixil. Antigua, Guatemala: Proyecto Lingüístico Francisco Marroquín. Pompino-Marschall, B., E. Steriopolo, and M. Żygis (2016). Ukrainian. Journal of the International Phonetic Association 47(3), 349–357. Pons, F., and L. Bosch (2010). Stress pattern preference in Spanish-learning infants: The role of syllable weight. Infancy 15, 223–245. Port, R. F. (2003). Meter and speech. Journal of Phonetics 31(3–4), 599–611. Port, R. F., and J. Dalby (1982). Consonant/vowel ratio as a cue for voicing in English. Perception and Psychophysics 32, 141–152. Port, R. F., and A. P. Leary (2005). Against formal phonology. Language 81(4), 927–964. Portes, C., and C. Beyssade (2012). Is intonational meaning compositional? Verbum 34. Retrieved 21 May 2020 from http://www2.lpl-aix.fr/~c3i/doc/PortesBeyssade_2014. Portes, C., C. Beyssade, A. Michelas, J.-M. Marandin, and M. Champagne-Lavau (2014). The dialogical dimension of intonational meaning: Evidence from French. Journal of Pragmatics 74(Suppl. C), 15–29. Portes, C., M. D’Imperio, and L. Lancia (2012). Positional constraints on the initial rise in French. In Proceedings of Speech Prosody 6, 563–566, Shanghai. Portes, C., and U. Reyle (2014). The meaning of French ‘implication’ contour in conversation. In Proceedings of Speech Prosody 7, 413–417, Dublin. Poser, W. J. (1984a). The phonetics and phonology of tone and intonation in Japanese. PhD dissertation, MIT.

Poser, W. J. (1984b). Hypocoristic formation in Japanese. In Proceedings of the 3rd West Coast Conference on Formal Linguistics, 218–229. Stanford: Stanford Linguistic Association. Poser, W. J. (1989). The metrical foot in Diyari. Phonology 6, 117–148. Poser, W. J. (1990). Evidence for foot structure in Japanese. Language 66, 78–105. Post, B. (2000a). Tonal and phrasal structures in French intonation. PhD dissertation, Radboud University Nijmegen. Post, B. (2000b). Pitch accents, liaison and the phonological phrase in French. Probus 12, 127–164. Post, B. (2011). The multi-faceted relation between phrasing and intonation in French. In C. Lléo and C. Gabriel (eds.), Hamburger Studies in Multilingualism 10: Intonational Phrasing at the Interfaces: Cross-Linguistic and Bilingual Studies in Romance and Germanic, 44–74. Amsterdam: John Benjamins. Post, M. W. (2009). The phonology and grammar of Galo ‘words’: A case study in benign disunity. Studies in Language 33, 934–974. Potisuk, S., J. T. Gandour, and M. P. Harper (1994). F0 correlates of stress in Thai. Linguistics of the Tibeto-Burman Area 17, 1–27. Potisuk, S., J. T. Gandour, and M. P. Harper (1996). Acoustic correlates of stress in Thai. Phonetica 53, 200–220. Potisuk, S., J. T. Gandour, and M. P. Harper (1997). Contextual variations in trisyllabic sequences of Thai tones. Phonetica 54, 22–42. Pouplier, M., and L. Goldstein (2010). Intention in articulation: Articulatory timing in alternating consonant sequences and its implications for models of speech production. Language and Cognitive Processes 25(5), 616–649. Prator, C., and B. Robinett (1985). Manual of American English Pronunciation (4th ed.). New York: Holt, Rinehart and Winston. Prechtel, C., and C. G. Clopper (2016). Uptalk in Midwestern American English. In Proceedings of Speech Prosody 8, 133–137, Boston. Prehn, M. (2012). Vowel Quantity and the Fortis/Lenis Distinction in North Low Saxon. Utrecht: LOT Dissertation Series. Prelock, P.
A., T. Hutchins, and F. P. Glascoe (2008). Speech-language impairment: How to identify the most common and least diagnosed disability of childhood. Medscape Journal of Medicine 10(6), 136. Prentice, D. J. (1971). The Murut Languages of Sabah. Canberra: Australian National University. Price, P., M. Ostendorf, and S. Shattuck-Hufnagel (1991). Disambiguating sentences using prosody. In Proceedings of the 12th International Congress of Phonetic Sciences, 418–421, Aix-en-Provence. Prieto, P. (2004). Fonètica i fonologia: Els sons del català. Barcelona: Editorial UOC. Prieto, P. (2005). Stability effects in tonal clash contexts in Catalan. Journal of Phonetics 33, 215–242. Prieto, P. (2006). The relevance of metrical information in early prosodic word acquisition: A comparison of Catalan and Spanish. Language and Speech 49(2), 233–261. Prieto, P. (2009). Tonal alignment patterns in Catalan nuclear falls. Lingua 119, 865–880. Prieto, P. (2012). Experimental methods and paradigms for prosodic analysis. In A. Cohn, C. Fougeron, and M. Huffman (eds.), The Oxford Handbook of Laboratory Phonology, 528–537. Oxford: Oxford University Press. Prieto, P. (2014). The intonational phonology of Catalan. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 43–80. Oxford: Oxford University Press. Prieto, P. (2015). Intonational meaning. Wiley Interdisciplinary Reviews: Cognitive Science 6, 371–381. Prieto, P., J. Borràs-Comes, T. Cabré, V. Crespo-Sendra, I. Mascaró, P. Roseano, R. Sichel-Bazin, and M. M. Vanrell (2015). Intonational phonology of Catalan and its dialectal varieties. In S. Frota and P. Prieto (eds.), Intonation in Romance, 9–62. Oxford: Oxford University Press. Prieto, P., and J. Borràs-Comes (2018). Question intonation contours as dynamic epistemic operators. Natural Language and Linguistic Theory 36, 563–586. Prieto, P., and T. Cabré (eds.) (2013). L’entonació dels dialectes catalans. Barcelona: Publicacions de l’Abadia de Montserrat.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Prieto, P., M. D’Imperio, G. Elordieta, S. Frota, and M. Vigário (2006). Evidence for soft preplanning in tonal production: Initial scaling in Romance. In Proceedings of Speech Prosody 3, Dresden. Prieto, P., M. D’Imperio, and B. Gili Fivela (2005). Pitch accent alignment in Romance: Primary and secondary association with metrical structure. In P. Warren (ed.), Intonation in language varieties (special issue), Language and Speech 48(4), 359–396. Prieto, P., E. Estebas-Vilaplana, and M. M. Vanrell (2010). The relevance of prosodic structure in tonal articulation: Edge effects at the syllable and prosodic word level in Catalan and Spanish. Journal of Phonetics 38, 687–705. Prieto, P., and N. Esteve-Gibert (2018). The Development of Prosody in First Language Acquisition. Amsterdam: John Benjamins. Prieto, P., A. Estrella, J. Thorson, and M. M. Vanrell (2012a). Is prosodic development correlated with grammatical development? Evidence from Catalan, English, and Spanish. Journal of Child Language 39(2), 221–257. Prieto, P., and G. Rigau (2007). The syntax-prosody interface: Catalan interrogative sentences headed by que. Journal of Portuguese Linguistics 6(2), 29–59. Prieto, P., and P. Roseano (eds.) (2010). Transcription of Intonation of the Spanish Language. Munich: Lincom Europa. Prieto, P., C. Shih, and H. Nibert (1996). Pitch downtrend in Spanish. Journal of Phonetics 24, 445–473. Prieto, P., and F. Torreira (2007). The segmental anchoring hypothesis revisited: Syllable structure and speech rate effects on peak timing in Spanish. Journal of Phonetics 35(4), 473–500. Prieto, P., and J. van Santen (1996). Secondary stress in Spanish: Some experimental evidence. In C. Parodi, C. Quicoli, M. Saltarelli, and M. L. Zubizarreta (eds.), Aspects of Romance Linguistics, 337–356. Washington, DC: Georgetown University Press. Prieto, P., J. P. H. van Santen, and J. Hirschberg (1995). Tonal alignment patterns in Spanish. Journal of Phonetics 23, 429–451.
Prieto, P., M. M. Vanrell, L. Astruc, E. Payne, and B. Post (2012b). Phonotactic and phrasal properties of speech rhythm: Evidence from Catalan, English, and Spanish. Speech Communication 54(6), 681–702. Prince, A. (1980). A metrical theory for Estonian quantity. Linguistic Inquiry 11, 511–562. Prince, A. (1983). Relating to the grid. Linguistic Inquiry 14, 19–100. Prince, A. (1989). Metrical forms. In P. Kiparsky and G. Youmans (eds.), Phonetics and Phonology: Vol. 1. Rhythm and Meter, 45–80. San Diego: Academic Press. Prince, A., and P. Smolensky (1993/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Malden: Blackwell. Prom-on, S., Y. Xu, W. Gu, A. Arvaniti, H. Nam, and D. H. Whalen (2016). The Common Prosody Platform (CPP): Where theories of prosody can be directly compared. In Proceedings of Speech Prosody 8, 1–5, Boston. Pronovost, W., M. P. Wakstein, and D. J. Wakstein (1966). A longitudinal study of the speech behavior and language comprehension of fourteen children diagnosed atypical or autistic. Exceptional Children 33(1), 19–26. Proto, T. (2016). Methods of analysis for tonal text-setting: The case study of Fe’Fe’ Bamileke. In Proceedings of the 5th International Symposium on Tonal Aspects of Languages 2016, 162–166, Buffalo, NY. Prunet, J.-F. (1987). Liaison and nasalization in French. In C. Neidle and R. Nuñez Cedeño (eds.), Studies in Romance Languages, 225–235. Dordrecht: Foris. Puga, K., R. Fuchs, J. Setter, and P. Mok (2017). The perception of English intonation patterns by German L2 speakers of English. In INTERSPEECH 2017, 3241–3245, Stockholm. Pugh-Kitingan, J. (1984). Speech-tone realisation in Huli music. In J. C. Kassler and J. Stubington (eds.), Problems and Solutions: Occasional Essays in Musicology Presented to Alice M. Moyle, 94–120. Sydney: Hale and Ironmonger. Pulleyblank, D. (1986). Tone in Lexical Phonology. Dordrecht: Reidel.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience 6(7), 576–582. Puri, V. (2013). Intonation in Indian English and Hindi late and simultaneous bilinguals. PhD dissertation, University of Illinois at Urbana-Champaign. Pürschel, H. (1975). Pause und Kadenz: Interferenzerscheinungen bei der englischen Intonation deutscher Sprecher. Tübingen: Niemeyer. Qian, M., E. Chukharev-Hudilainen, and J. M. Levis (2018). A system for adaptive high-variability segmental perceptual training: Implementation, effectiveness, transfer. Language Learning and Technology 22(1), 69–96. Qin, Z., Y.-F. Chien, and A. Tremblay (2017). Processing of word-level stress by Mandarin-speaking second language learners of English. Applied Psycholinguistics 38, 541–570. Qin, Z., and A. Jongman (2016). Does second language experience modulate perception of tones in a third language? Language and Speech 59, 318–338. Quam, C., and D. Swingley (2010). Phonological knowledge guides 2-year-olds’ and adults’ interpretation of salient pitch contours in word learning. Journal of Memory and Language 62(2), 135–150. Quam, C., and D. Swingley (2014). Processing of lexical stress cues by young children. Journal of Experimental Child Psychology 123, 73–89. Quené, H., and R. F. Port (2005). Effects of timing regularity and metrical expectancy on spoken-word perception. Phonetica 62, 1–13. Querleu, D., X. Renard, F. Versyp, L. Paris-Delrue, and G. Crépin (1988). Fetal hearing. European Journal of Obstetrics and Gynecology and Reproductive Biology 28(3), 191–212. Quiggin, E. C. (1906). A Dialect of Donegal. Cambridge: Cambridge University Press. Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik (eds.) (1985). A Comprehensive Grammar of the English Language. New York: Longman. Rahmani, H. (2018). Persian ‘word stress’ is a syntax-driven tone. In Proceedings of the 6th International Symposium on Tonal Aspects of Languages, 155–158, Berlin. Rahmani, H.
(2019). An evidence-based new analysis of Persian word prosody. Utrecht: LOT. Rahmani, H., T. Rietveld, and C. Gussenhoven (2015). Stress ‘deafness’ reveals absence of lexical marking of stress or tone in the adult grammar. PLoS ONE 10(12), e0143968. Rahmani, H., T. Rietveld, and C. Gussenhoven (2018). Post-focal and factive deaccentuation in Persian. Glossa: A Journal of General Linguistics 3(1), 13. Raimy, E. (2000). The Phonology and Morphology of Reduplication. Berlin: Mouton de Gruyter. Rakerd, B., W. Sennett, and C. A. Fowler (1987). Domain-final lengthening and foot-level shortening in spoken English. Phonetica 44(3), 147–155. Rakić, S. (2010). On the trochaic feet, extrametricality and shortening rules in Standard Serbian. In Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages, 97, Dubrovnik. Ramachers, S. (2018). Setting the Tone: Acquisition and Processing of Lexical Tone in East-Limburgian Dialects of Dutch. Utrecht: LOT. Ramachers, S., S. Brouwer, and P. Fikkert (2017). How native prosody affects pitch processing during word learning in Limburgian and Dutch toddlers and adults. Frontiers in Psychology 8, 1652. Ramachers, S., S. Brouwer, and P. Fikkert (2018). No perceptual reorganization for Limburgian tones? A cross-linguistic investigation with 6- to 12-month-old infants. Journal of Child Language 45(2), 290–318. Ramig, L. O., S. Sapir, S. Countryman, A. A. Pawlas, C. O’Brien, M. Hoehn, and L. L. Thompson (2001). Intensive voice treatment (LSVT®) for patients with Parkinson’s disease: A 2 year follow up. Journal of Neurology, Neurosurgery and Psychiatry 71, 493–498. Ramirez, H. (1997). A Fala Tukano dos Ye’pa-Masa: Vol. 1. Gramatica. Manaus, Brazil: Inspetoria salesiana missionaria da Amazonia. Ramirez, H. (2001). Uma gramática Baniwa do Içana. Manaus: Federal University of Amazonas. Ramírez Verdugo, D. (2005).
The nature and patterning of native and non-native intonation in the expression of certainty and uncertainty. Journal of Pragmatics 37(12), 2086–2115.
Ramírez Verdugo, D. (2006). Prosodic realization of focus in the discourse of Spanish learners and English native speakers. Estudios Ingleses de la Universidad Complutense 14, 9–32. Ramírez Verdugo, D., and J. Romero Trillo (2005). The pragmatic function of intonation in L2 discourse: English tag questions used by Spanish speakers. Intercultural Pragmatics 2(2), 151–168. Ramsey, R. (1978). Accent and Morphology in Korean Dialects. Seoul: Tap. Ramus, F. (2002). Acoustic correlates of linguistic rhythm: Perspectives. In Proceedings of Speech Prosody 1, 115–120, Aix-en-Provence. Ramus, F., M. D. Hauser, C. Miller, D. Morris, and J. Mehler (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science 288(5464), 349–351. Ramus, F., M. Nespor, and J. Mehler (1999). Correlates of linguistic rhythm in the speech signal. Cognition 73(3), 265–292. Ramus, F., M. Nespor, and J. Mehler (2003). The psychological reality of rhythm classes: Perceptual studies. In Proceedings of the 15th International Congress of Phonetic Sciences, 337–340, Barcelona. Ran, Q. (2011). Beijinghua, Sichuanhua qiyi ‘dong-dan + ming-dan’ jiegou de yuyin chayi ji yiyi. Shijie Hanyu Jiaoxue 25(4), 235–448. Raphael, L., G. Borden, and K. Harris (2007). Speech Science Primer: Physiology, Acoustics, and Perception of Speech (5th ed.). Baltimore: Lippincott Williams and Wilkins. Rapold, C. J. (2006). Towards a grammar of Benchnon. PhD dissertation, University of Leiden. Rasier, L., and P. Hiligsmann (2007). Prosodic transfer from L1 to L2: Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28, 41–66. Rasin, E. (2016). The stress-encapsulation universal and phonological modularity. Talk given at NAPhC 9, Concordia University, Montreal. Rathcke, T. (2006). Relevance of f0-peak shape and alignment for the perception of a functional contrast in Russian. In Proceedings of Speech Prosody 3, Dresden. Rathcke, T. (2016).
How truncating are ‘truncating languages’? Evidence from Russian and German. Phonetica 73, 194–228. Ratliff, M. (1987). Tone sandhi compounding in White Hmong. Linguistics of the Tibeto-Burman Area 10, 71–105. Ratliff, M. (1992). Meaningful Tone: A Study of Tonal Morphology in Compounds, Form Classes, and Expressive Phrases in White Hmong (Center for Southeast Asian Studies Monograph Series). De Kalb: Northern Illinois University. Ratner, N. B. (1986). Durational cues which mark clause boundaries in mother–child speech. Journal of Phonetics 14(2), 303–309. Ratner, N. B., and C. Pye (1984). Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan. Journal of Child Language 11, 515–522. Raupach, M. (1980). Temporal variables in first and second language speech production. In H. W. Dechert and M. Raupach (eds.), Temporal Variables in Speech, 263–270. The Hague: Mouton de Gruyter. Raux, A., and M. Eskenazi (2012). Optimizing the turn-taking behavior of task-oriented spoken dialog systems. ACM Transactions on Speech and Language Processing 9(1), 1–23. Raz, S. (1997). Tigre. In R. Hetzron (ed.), The Semitic Languages, 446–456. London: Routledge. Rebuschi, G., and L. Tuller (eds.) (1999). The Grammar of Focus (Linguistik aktuell 24). Amsterdam: John Benjamins. Redeker, G. (1991). Review article: Linguistic markers of discourse structure. Linguistics 29(6), 1139–1172. Redford, M. A. (ed.) (2015). Handbook of Speech Prosody. Chichester: Wiley. Redford, M. A. (2018). Grammatical word production across metrical contexts in school-aged children’s and adults’ speech. Journal of Speech, Language, and Hearing Research 61, 1339–1354.
Redford, M. A., and G. Oh (2017). Representation and execution of articulatory timing in first and second language acquisition. Journal of Phonetics 63, 127–138. Redi, L., and S. Shattuck-Hufnagel (2001). Variation in the realization of glottalization in normal speakers. Journal of Phonetics 29, 407–429. Rees, M. (1977). Aspects of Welsh intonation. PhD dissertation, University of Edinburgh. Refice, M., M. Savino, and M. Grice (1997). A contribution to the estimation of naturalness in the intonation of Italian spontaneous speech. In Eurospeech 1997, 783–786, Rhodes. Rehg, K. L. (1993). Proto-Micronesian prosody. In J. A. Edmondson and K. J. Gregerson (eds.), Tonality in Austronesian Languages (Oceanic Linguistics Special Publication 24), 25–46. Honolulu: University of Hawaiʻi Press. Reichel, U. D., and J. Cole (2016). Entrainment analysis of categorical intonation representations. Phonetik und Phonologie. Reichel, U. D., K. Mády, and Š. Beňuš (2015). Parameterization of prosodic headedness. In INTERSPEECH 2015, 929–933, Dresden. Reid, N. (1990/2011). Ngan’gityemerri, a language of the Daly River region. PhD dissertation, Australian National University. (Published 2011, Munich: Lincom.) Reid, J. E., and Associates (2000). The Reid Technique of Interviewing and Interrogation. Chicago: John E. Reid and Associates. Reilly, J. S., M. McIntire, and U. Bellugi (1990). The acquisition of conditionals in American Sign Language: Grammaticized facial expressions. Applied Psycholinguistics 11(4), 369–392. Reinisch, E., A. Jesse, and J. M. McQueen (2010). Early use of phonetic information in spoken word recognition: Lexical stress drives eye movements immediately. Quarterly Journal of Experimental Psychology 63, 772–783. Reinisch, E., A. Jesse, and J. M. McQueen (2011). Speaking rate affects the perception of duration as a suprasegmental lexical-stress cue. Language and Speech 54(2), 147–165. Reinoso Galindo, A. E. (2012).
La lengua kawiyarí: Una aproximación a su fonología y gramática. n.p.: Editorial Académica Española. Remijsen, B. (2001). Word-Prosodic Systems of Raja Ampat Languages. Utrecht: LOT Dissertation Series. Remijsen, B. (2002). Lexically contrastive stress accent and lexical tone in Ma’ya. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7, 585–614. Berlin: Mouton de Gruyter. Remijsen, B. (2007). Lexical tone in Magey Matbat. In V. J. van Heuven and E. van Zanten (eds.), Prosody in Austronesian languages of Indonesia, 9–34. Utrecht: LOT. Remijsen, B. (2013). Tonal alignment is contrastive in falling contours in Dinka. Language 89(2), 297–327. Remijsen, B. (2014). Evidence for three-level vowel length in Ageer Dinka. In J. Caspers, Y. Chen, W. F. L. Heeren, J. Pacilly, N. O. Schiller, and E. van Zanten (eds.), Above and Beyond the Segments: Experimental Linguistics and Phonetics, 246–260. Amsterdam: John Benjamins. Remijsen, B., and O. G. Ayoker (2014). Contrastive tonal alignment in falling contours in Shilluk. Phonology 31(3), 435–462. Remijsen, B., O. G. Ayoker, and T. Mills (2011). Shilluk. Journal of the International Phonetic Association 41(1), 111–125. Remijsen, B., and L. G. Gilley (2008). Why are three-level vowel length systems rare? Insights from Dinka (Luanyjang dialect). Journal of Phonetics 36(2), 318–344. Remijsen, B., and D. R. Ladd (2008). The tone system of the Luanyjang dialect of Dinka. Journal of African Languages and Linguistics 29, 173–213. Remijsen, B., and C. A. Manyang (2009). Luanyjang Dinka. Journal of the International Phonetic Association 39(1), 113–124. Remijsen, B., and V. J. van Heuven (2005). Stress, tone and discourse prominence in the Curaçao dialect of Papiamentu. Phonology 22, 205–235. Remijsen, B., and V. J. van Heuven (eds.) (2006). Between stress and tone. Phonology 23, 121–333.
Ren, G., Y. Yang, and X. Li (2009). Early cortical processing of linguistic pitch patterns as revealed by the mismatch negativity. Neuroscience 162, 87–95. Ren, G., Y. Tang, X. Li, and X. Sui (2013). Pre-attentive processing of Mandarin tone and intonation: Evidence from event-related potentials. In F. Signorelli and D. Chirchiglia (eds.), Functional Brain Mapping and the Endeavor to Understand the Working Brain, ch. 6. London: InTech. Rennison, J. R. (1997). Koromfe (Descriptive Grammars Series). London: Routledge. Renwick, M. (2014). The Phonetics and Phonology of Contrast: The Case of the Romanian Vowel System. Berlin: De Gruyter Mouton. Repp, B. H. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin 92, 81–110. Repp, S. (2016). Contrast: Dissecting an elusive information-structural notion and its role in grammar. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 270–289. Oxford: Oxford University Press. Repp, S., and H. Drenhaus (2015). Intonation influences processing and recall of left-dislocation sentences by indicating topic vs. focus status of dislocated referent. Language, Cognition and Neuroscience 30(3), 324–346. Retsö, J. (2011). Classical Arabic. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 782–810. Berlin: De Gruyter Mouton. Revithiadou, A. (1999). Headmost Accent Wins: Head Dominance and Ideal Prosodic Form in Lexical Accent Systems. Utrecht: LOT Dissertation Series. Reyes Gómez, J. C. (2009). Fonología de la lengua Ayuuk de Alotepec, Oaxaca. PhD dissertation, Escuela Nacional de Antropología e Historia. Rhys, M. (1984). Intonation and the discourse. In M. J. Ball and G. E. Jones (eds.), Welsh Phonology: Selected Readings, 125–155. Cardiff: University of Wales Press. Riad, T. (1992). Structures in Germanic prosody.
PhD dissertation, Stockholm University. Riad, T. (1998a). Towards a Scandinavian accent typology. In W. Kehrein and R. Wiese (eds.), Phonology and Morphology of the Germanic Languages, 77–112. Tübingen: Max Niemeyer. Riad, T. (1998b). The origin of Scandinavian tone accents. Diachronica 15, 63–98. Riad, T. (2000). The origin of Danish stød. In A. Lahiri (ed.), Analogy, Levelling and Markedness: Principles of Change in Phonology and Morphology, 261–300. Berlin: Mouton de Gruyter. Riad, T. (2006). Scandinavian accent typology. Sprachtypologie und Universalienforschung 59, 36–55. Riad, T. (2009). Eskilstuna as the tonal key to Danish. In Proceedings of Fonetik 2009, 12–17, Stockholm. Riad, T. (2012). Culminativity, stress and tone accent in Central Swedish. Lingua 122, 1352–1379. Riad, T. (2014). The Phonology of Swedish. Oxford: Oxford University Press. Riad, T. (2015). Prosodin i svenskans morfologi. Stockholm: Morfem Förlag. Riad, T. (2016). The meter of Tashlhiyt Berber songs. Natural Language and Linguistic Theory 35, 499–548. Riad, T. (2018). The phonological typology of North Germanic accent. In L. M. Hyman and F. Plank (eds.), Phonological Typology, 341–388. Berlin: Mouton de Gruyter. Rialland, A. (2001). Anticipatory raising in downstep realization: Evidence for preplanning in tone production. In S. Kaji (ed.), Cross-Linguistic Studies of Tonal Phenomena: Tonogenesis, Typology, and Related Topics, 301–322. Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa. Rialland, A. (2007). Question prosody: An African perspective. In T. Riad and C. Gussenhoven (eds.), Tones and Tunes: Vol. 1. Typological Studies in Word and Sentence Prosody, 35–62. Berlin: Mouton de Gruyter. Rialland, A. (2009). The African lax question prosody: Its realisation and geographical distribution. Lingua 119, 928–949. Rialland, A. (in press). Intonation in Bantu languages. In The Oxford Guide to the Bantu Languages. Oxford: Oxford University Press. Rialland, A., and M.
E. Aborobongui (2017). How intonations interact with tones in Embosi (Bantu C25), a two-tone language without downdrift. In L. J. Downing and A. Rialland (eds.), Intonation in African Tone Languages, 195–222. Berlin: De Gruyter Mouton.
Rialland, A., and S. Robert (2001). The intonational system of Wolof. Linguistics 39(5), 893–939. Rice, K. (1987). On defining the intonational phrase: Evidence from Slave. Phonology Yearbook 4, 37–59. Rice, K. (1989). A Grammar of Slave. Berlin: Mouton de Gruyter. Rice, K. (2010). Accent in the native languages of North America. In H. van der Hulst, R. W. N. Goedemans, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 155–248. Berlin: Mouton de Gruyter. Rice, K., and S. Hargus (2005). Introduction. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 1–45. Amsterdam: John Benjamins. Rich, F. (1999). Diccionario Arabela-Castellano. Lima: Instituto Lingüístico de Verano. Richards, P. (1972). A quantitative analysis of the relationship between language tone and melody in a Hausa song. African Language Studies 13, 137–161. Richards, N. (2010). Uttering Trees. Cambridge, MA: MIT Press. Richards, M. (2016). Not all word stress errors are created equal: Validating an English word stress error gravity hierarchy. PhD dissertation, Iowa State University. Riesberg, S., J. Kalbertodt, S. Baumann, and N. P. Himmelmann (2018). On the perception of prosodic prominences and boundaries in Papuan Malay. In S. Riesberg, A. Shiohara, and A. Utsumi (eds.), A Cross-Linguistic Perspective on Information Structure in Austronesian Languages, 389–414. Berlin: Language Science Press. Riester, A., and J. Piontek (2015). Anarchy in the NP: When new nouns get deaccented and given nouns don’t. Lingua 165(B), 230–253. Rietveld, T., and C. Gussenhoven (1985). On the relation between pitch excursion size and prominence. Journal of Phonetics 13, 299–308. Rietveld, A., and C. Gussenhoven (1987). Perceived speech rate and intonation. Journal of Phonetics 13, 273–285. Rietveld, T., and C. Gussenhoven (1995). Aligning pitch targets in speech synthesis: Effects of syllable structure. Journal of Phonetics 23, 375–385. Rietveld, T., J.
Kerkhoff, and C. Gussenhoven (2004). Word prosodic structure and vowel duration in Dutch. Journal of Phonetics 32, 349–371. Rilliard, A., T. Shochi, J.-C. Martin, D. Erickson, and V. Aubergé (2009). Multimodal indices to Japanese and French prosodically expressed social affects. Language and Speech 52(2–3), 223–243. Ringeval, F., A. Sonderegger, J. Sauer, and D. Lalanne (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In Proceedings of the 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), Shanghai. Rinkevičius, V. (2015). Baltų ir slavų kalbų kirčiavimo istorija 1. Vilnius: Vilniaus Universitetas. Rischel, J. (1963). Morphemic tone and word tone in Eastern Norwegian. Phonetica 10, 154–164. Rischel, J. (1974). Topics in West Greenlandic Phonology: Regularities Underlying the Phonetic Appearance of Word Forms in a Polysynthetic Language. Copenhagen: Akademisk Forlag. Ritchart, A., and A. Arvaniti (2013). The form and use of uptalk in Southern Californian English. Proceedings of Meetings on Acoustics 20, San Francisco. Ritchie, W., and T. K. Bhatia (eds.) (2009). The New Handbook of Second Language Acquisition. Bingley: Emerald. Ritsma, R. J. (1967). Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of America 42, 191–198. Ritter, S., D. Mücke, and M. Grice (2019). The dynamics of intonation: Categorical and continuous variation in an attractor-based model. PLoS ONE 14(5), e0216859. Ritter, S., and T. B. Röttger (2014). Speakers modulate noise-induced pitch according to intonational context. In Proceedings of Speech Prosody 7, 890–894, Dublin. Roach, P. (1982). On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In D. Crystal (ed.), Linguistic Controversies, 73–79. London: Edward Arnold. Roach, P. (1994).
Conversion between prosodic transcription systems: ‘Standard British’ and ToBI. Speech Communication 15, 91–99.
Roberts, C. (1996/2012). Information structure in discourse: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics 5(6), 1–96. Roberts, J. S. (2005). Is Migaama a tonal or an accentual language? Poster presented at the Conference Between Stress and Tone, Leiden. Roberts, J. S. (2013). The tone system of Mawa. In H. Tourneux (ed.), Topics in Chadic Linguistics VII: Papers from the 6th Biennial International Colloquium on the Chadic Languages, Villejuif, September 22–23, 2011, 115–130. Cologne: Rüdiger Köppe. Roberts, J. S. (2003). The analysis of central vowels in Gor (Central Sudanic). In Actes du 3e Congrès Mondial de Linguistique Africaine, 53–67, Lomé. Roberts, S., R. Fyfield, E. Baibazarova, S. van Goozen, J. F. Culling, and D. F. Hay (2013). Parental speech at 6 months predicts joint attention at 12 months. Infancy 18(Suppl. 1), E1–E15. Robin, D. A., D. Tranel, and H. Damasio (1990). Auditory perception of temporal and spectral events in patients with focal left and right cerebral lesions. Brain and Language 39, 539–555. Robins, R. H. (1957). Vowel nasality in Sundanese: A phonological and grammatical study. In Studies in Linguistic Analysis, 87–103. Oxford: Blackwell. Robins, R. H., and N. Waterson (1952). Notes on the phonetics of the Georgian word. Bulletin of the School of Oriental and African Studies 14, 52–72. Robinson, B. W. (1976). Limbic influences on human speech. Annals of the New York Academy of Sciences 280, 761–771. Roca, I. (1999). Stress in the Romance languages. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 659–812. Berlin: Mouton de Gruyter. Rockwell, P. (2000). Lower, slower, louder: Vocal cues of sarcasm. Journal of Psycholinguistic Research 29, 483–495. Rockwell, P. (2005). Sarcasm on television talk shows: Determining speaker intent through verbal and nonverbal cues. In A. Clark (ed.), Psychology of Moods, 109–140. New York: Nova Science. Rockwell, P. (2007).
Vocal features of conversational sarcasm: A comparison of methods. Journal of Psycholinguistic Research 36, 361–369. Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition 64, 249–284. Roelofsen, F., and S. van Gool (2010). Disjunctive questions, intonation, and highlighting. In M. Aloni, H. Bastiaanse, T. de Jager, and K. Schulz (eds.), Logic, Language, and Meaning: Selected Papers from the Seventeenth Amsterdam Colloquium, 384–394. New York: Springer. Roettger, T. B., A. Bruggeman, and M. Grice (2015). Word stress in Tashlhiyt: Postlexical prominence in disguise. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Roettger, T. B., and M. Gordon (2017). Methodological issues in the study of word stress correlates. Linguistics Vanguard 3(1), 1–11. Roettger, T. B., and M. Grice (2019). The tune drives the text: Competing information channels of speech shape phonological systems. Language Dynamics and Change 9(2), 265–298. Rogers, C. (2010). A comparative grammar of Xinkan. PhD dissertation, University of Utah. Rohde, H., and C. Kurumada (2018). Alternatives and inferences in the communication of meaning. In K. Federmeier and D. Watson (eds.), Psychology of Learning and Motivation, vol. 68, 215–252. New York: Academic Press. Rohlfs, G. (1966). Grammatica storica della lingua italiana e dei suoi dialetti. Torino: Einaudi. Rojas Curieux, T. (1998). La lengua páez: Una visión de su gramática. Bogotá: Ministerio de Cultura. Roll, M., M. Horne, and M. Lindgren (2009). Left-edge boundary tone and main clause verb effects on syntactic processing in embedded clauses: An ERP study. Journal of Neurolinguistics 22, 55–73. Rolle, N. (2018). Grammatical tone: Typology, theory, and functional load. PhD dissertation, University of California, Berkeley. Rolle, N., and M. Vuillermet (2016). Morphologically assigned accent and an initial three syllable window in Ese’eja.
In UC Berkeley Phonetics and Phonology Lab Annual Report. Berkeley: Department of Linguistics, University of California, Berkeley.
Romanelli, S., A. C. Menegotto, and R. Smyth (2015). Stress perception: Effects of training and a study abroad program for L1 English late learners of Spanish. Journal of Second Language Pronunciation 1, 181–210. Romani, G. L., S. J. Williamson, and L. Kaufman (1982). Tonotopic organization of the human auditory cortex. Science 216(4552), 1339–1340. Romano, A., P. Mairano, and B. Pollifrone (2010). Variabilità ritmica di varietà dialettali del Piemonte. In S. Schmid, M. Schwarzenbach, and D. Studer (eds.), La dimensione temporale del parlato: Proceedings of the 5th National Conference of the Italian Association for Speech Sciences, 101–112, Zürich. Romero-Méndez, R. (2009). A reference grammar of Ayutla Mixe (Tukyo’m Ayuujk). PhD dissertation, University at Buffalo. Romito, L., and J. Trumper (1989). Un problema della coarticolazione: L’isocronia rivisitata. In Atti del convegno dell’Associazione Italiana di Acustica, 449–455. Fidenza: Mattioli. Romøren, A. S. H. (2016). Hunting highs and lows: The acquisition of prosodic focus marking in Swedish and Dutch. PhD dissertation, Utrecht University. Romportl, M. (1973). Intonological typology. In Studies in Phonetics, 131–136. Prague: Academia. Rood, D. S., and A. R. Taylor (1996). Sketch of Lakhota, a Siouan language. In Handbook of North American Indians, vol. 17, 440–482. Washington, DC: Smithsonian Institution. Roosman, L. (2007). Melodic structure in Toba Batak and Betawi Malay word prosody. In V. J. van Heuven and E. van Zanten (eds.), Prosody in Indonesian languages, 89–115. Utrecht: LOT. Rooth, M. (1985). Association with focus. PhD dissertation, University of Massachusetts. Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics 1, 75–116. Rose, P. (1989). On the non-equivalence of fundamental frequency and linguistic tone. In D. Bradley, E. Henderson, and M. Mazaudon (eds.), Prosodic Analysis and Asian Linguistics: To Honour R. K. Sprigg, 55–82.
Canberra: Pacific Linguistics. Rose, Y., and C. Champdoizeau (2008). There is no innate trochaic bias: Acoustic evidence in favour of the neutral start hypothesis. In A. Gavarró and M. J. Freitas (eds.), Language Acquisition and Development: Proceedings of GALA (2007), 359–369. Newcastle: Cambridge Scholars. Rose, Y., P. Pigott, and D. Wharram (2012). Schneider’s Law revisited: The syllable-level remnant of an older metrical rule. McGill Working Papers in Linguistics 22(1), 1–12. Rose, S., and R. Walker (2004). A typology of consonant agreement as correspondence. Language 80(3), 475–531. Rosenbaum, D. A., R. G. Cohen, S.-A. Jax, D. J. Weiss, and R. van der Wel (2007a). The problem of serial order in behavior: Lashley’s legacy. Human Movement Science 26, 525–554. Rosenbaum, P., N. Paneth, A. Leviton, M. Goldstein, M. Bax, and B. Jacobsson (2007b). A report: The definition and classification of cerebral palsy April 2006. Developmental Medicine and Child Neurology 109 (Suppl.), 8–14. Rosenbek, J. C., G. P. Crucian, S. A. Leon, B. Hieber, A. D. Rodriguez, B. Holiway, T. U. Ketterson, M. Ciampitti, K. Heilman, and L. Gonzalez-Rothi (2004). Novel treatments for expressive aprosodia: A phase I investigation of cognitive linguistic and imitative interventions. Journal of the International Neuropsychological Society 10, 786–793. Rosenbek, J. C., A. D. Rodriguez, B. Hieber, S. A. Leon, G. P. Crucian, T. U. Ketterson, M. Ciampitti, F. Singletary, K. M. Heilman, and L. J. Gonzalez Rothi (2006). Effects of two treatments for aprosodia secondary to acquired brain injury. Journal of Rehabilitation Research and Development 43, 379–390. Rosenberg, A. (2009). Automatic detection and classification of prosodic events. PhD dissertation, Columbia University. Rosenberg, A. (2010). AuToBI: A tool for automatic ToBI annotation. In INTERSPEECH 2010, 146–149, Makuhari. Rosenberg, A. (2012a). Classifying skewed data: Importance weighting to optimize average recall.
In INTERSPEECH 2012, 2242–2245, Portland. Rosenberg, A. (2012b). Modeling intensity contours and the interaction between pitch and intensity to improve automatic prosodic event detection and classification. In Proceedings of the IEEE Workshop on Spoken Language Technology, Miami.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Rosenberg, A. (2018). Speech, prosody, and machines: Nine challenges for prosody research. In Proceedings of Speech Prosody 9, 784–793, Poznań.
Rosenberg, A., R. Fernandez, and B. Ramabhadran (2015). Modeling phrasing and prominence using deep recurrent learning. In INTERSPEECH 2015, 3066–3070, Dresden.
Rosenblau, G., D. Kliemann, I. Dziobek, and H. R. Heekeren (2017). Emotional prosody processing in autism spectrum disorder. Social Cognitive and Affective Neuroscience 12(2), 224–239.
Rosenstein, O. (2001). Israel Sign Language: A topic prominent language. PhD dissertation, University of Haifa.
Rosenthall, S., and H. van der Hulst (1999). Weight-by-position by position. Natural Language and Linguistic Theory 17, 499–540.
Rosenvold, E. (1981). The role of intrinsic f0 and duration in the perception of stress. ARIPUC 15, 147–166.
Ross, B. (2011). Prosody and grammar in Dalabon and Kayardild. PhD dissertation, University of Melbourne.
Ross, B., J. Fletcher, and R. Nordlinger (2016). The alignment of prosody and clausal structure in Dalabon. Australian Journal of Linguistics 36(1), 52–78.
Ross, E. D. (1981). The aprosodias: Functional-anatomical organization of the affective components of language in the right hemisphere. Archives of Neurology 38, 561–569.
Ross, E. D., and M. M. Mesulam (1979). Dominant language functions of the right hemisphere? Prosody and emotional gesturing. Archives of Neurology 36, 144–148.
Ross, E. D., and M. Monnot (2008). Neurology of affective prosody and its functional-anatomic organization in right hemisphere. Brain and Language 104, 51–74.
Ross, E. D., R. D. Thompson, and J. Yenkosky (1997). Lateralization of affective prosody in brain and the callosal integration of hemispheric language functions. Brain and Language 56, 27–54.
Ross, K. N., and M. Ostendorf (1996). Prediction of abstract prosodic labels for speech synthesis. Computer Speech and Language 10, 155–185.
Ross, M. D. (1988). Proto Oceanic and the Austronesian Languages of Western Melanesia. Canberra: Pacific Linguistics.
Ross, M. D. (2008). The integrity of the Austronesian language family. In A. Sanchez-Mazas, R. Blench, M. Ross, I. Peiros, and M. Lin (eds.), Past Human Migrations in East Asia: Matching Archaeology, Linguistics and Genetics, 161–181. London: Routledge.
Ross, T., N. Ferjan, and A. Arvaniti (2008). Quantifying rhythm in running speech. Journal of the Acoustical Society of America 123(5), 3427.
Rossi, M. (1971). Le seuil de glissando ou seuil de perception des variations tonales pour les sons de la parole. Phonetica 23, 1–33.
Rossi, M., and M. Chafcouloff (1972). Recherche sur le seuil différentiel de fréquence fondamentale dans la parole. Travaux de l’Institut de Phonétique d’Aix 1, 179–185.
Rossing, T. D., and A. J. M. Houtsma (1986). Effects of signal envelope on the pitch of short sinusoidal tones. Journal of the Acoustical Society of America 79, 1926–1933.
Rothenberg, M. (1973). A new inverse-filtering technique for deriving the glottal air flow waveform during voicing. Journal of the Acoustical Society of America 53, 1632–1645.
Rothenberg, M. (1992). A multichannel electroglottograph. Journal of Voice 6, 36–43.
Rothenberg, M., and J. Mashie (1988). Monitoring vocal fold adduction through vocal fold contact area. Journal of Speech and Hearing Research 31, 338–351.
Rothstein, J. (2013). Prosody Treatment Program. Austin: Pro-Ed.
Round, E. (2009). Kayardild morphology, phonology and morphosyntax. PhD dissertation, Yale University.
Round, E. (2010). Tone height binarity and register in intonation: The case from Kayardild (Australian). In Proceedings of Speech Prosody 5, Chicago.
Rozelle, L. (1997). The effect of stress on vowel length in Aleut. UCLA Working Papers in Phonetics 95, 91–101.

Rubach, J., and G. Booij (1985). A grid theory of stress in Polish. Lingua 66(4), 281–320.
Rudin, C., C. Kramer, L. Billings, and M. Baerman (1999). Macedonian and Bulgarian li questions: Beyond syntax. Natural Language and Linguistic Theory 17, 541–586.
Ruffman, T. (2014). To belief or not belief: Children’s theory of mind. Developmental Review 34(3), 265–293.
Rump, H. H., and R. Collier (1996). Focus conditions and the prominence of pitch-accented syllables. Language and Speech 39(1), 1–17.
Rumsey, A. (2007). Musical, poetic and linguistic form in Tom Yaya sung narratives from Papua New Guinea. Anthropological Linguistics 49, 237–282.
Rumsey, A. (2010). A metrical system that defies description by ordinary means. In J. Bowden and N. P. Himmelmann (eds.), A Journey through Austronesian and Papuan Linguistic and Cultural Space: Papers in Honour of A. K. Pawley, 39–56. Canberra: Pacific Linguistics.
Rumsey, A. (2011). Style, plot, and character in Tom Yaya Tales from Ku Waru. In A. Rumsey and D. Niles (eds.), Sung Tales from the Papua New Guinea Highlands: Studies in Form, Meaning, and Sociocultural Context, 247–274. Canberra: ANU E Press.
Rusko, M., R. Sabo, and M. Dzúr (2007). Sk-ToBI scheme for phonological prosody annotation in Slovak. In Text, Speech and Dialogue, vol. 4629, 334–341. Berlin: Springer.
Russell, E., J. Laures-Gore, and R. Pate (2010). Treatment expressive aprosodia: A case study. Journal of Medical Speech-Language Pathology 18, 115–119.
Russell, J. A. (1994). Is there universal recognition of emotion from facial expressions? A review of the cross-cultural studies. Psychological Bulletin 115(1), 102–141.
Rutter, M., and L. Lockyer (1967). A five to fifteen year follow-up study of infantile psychosis. I: Description of sample. British Journal of Psychiatry 113, 1169–1182.
Ryan, K. M. (2011). Gradient syllable weight and weight universals in quantitative metrics. Phonology 28, 413–454.
Ryan, K. M. (2014). Onsets contribute to syllable weight: Statistical evidence from stress and meter. Language 90, 309–341.
Ryan, K. M. (2017). The stress-weight interface in metre. Phonology 34(3), 581–613.
Ryan, K. M. (2019). Prosodic Weight. Oxford: Oxford University Press.
Rycroft, D. (1959). African music in Johannesburg: African and non-African features. Journal of the International Folk Music Council 11, 25–30.
Sachs, J. (1977). The adaptive significance of linguistic input to prelinguistic infants. In C. E. Snow and C. A. Ferguson (eds.), Talking to Children: Language Input and Acquisition, 51–61. Cambridge: Cambridge University Press.
Sacks, H., E. A. Schegloff, and G. Jefferson (1974). A simplest systematics for the organization of turn-taking for conversation. Language 50, 696–735.
Sadakata, M., and J. M. McQueen (2014). Individual aptitude in Mandarin lexical tone perception predicts effectiveness of high-variability training. Frontiers in Psychology 5, 1318.
Sadat-Tehrani, N. (2007). The intonational grammar of Persian. PhD dissertation, University of Manitoba.
Sadat-Tehrani, N. (2011). The intonation patterns of interrogatives in Persian. Linguistic Discovery Journal 9(1), 105–136.
Sadeghi, V. (2017). Word-level prominence in Persian: An experimental study. Language and Speech 60(4), 571–596.
Saeed, J. (1987). Somali Reference Grammar. Kensington, MD: Dunwoody Press.
Saeed, J. (1999). Somali. Amsterdam: John Benjamins.
Saffran, J. R., J. F. Werker, and L. A. Werner (2006). The infant’s auditory world: Hearing, speech, and the beginnings of language. In W. Damon and R. M. Lerner (eds.), Handbook of Child Psychology, 58–108. Hoboken: John Wiley & Sons.
Sahkai, H., M.-L. Kalvik, and M. Mihkla (2013). Prosodic effects of information structure in Estonian. In E. L. Asu and P. Lippus (eds.), Nordic Prosody: Proceedings of the XIth Conference, Tartu 2012, 323–332. Frankfurt: Peter Lang.

Sai, F. (2005). The role of the mother’s voice in developing mother’s face preference: Evidence for intermodal perception at birth. Infant and Child Development 14(1), 29–50.
Saint-Cyr, J. A., A. E. Taylor, and K. Nicholson (1995). Behavior and the basal ganglia. In W. J. Weiner and A. E. Lang (eds.), Behavioral Neurology of Movement Disorders (Advances in Neurology 65), 1–28. New York, NY: Raven Press.
Saito, H. (2006). Nuclear-stress placement by Japanese learners of English: Transfer from Japanese. In Y. Kawaguchi, I. Fónagy, and T. Moriguchi (eds.), Prosody and Syntax, 125–139. Amsterdam: John Benjamins.
Saito, Y., S. Aoyama, T. Kondo, R. Fukumoto, N. Konishi, K. Nakamura, M. Kobayashi, and T. Toshima (2007). Frontal cerebral blood flow change associated with infant-directed speech (IDS). Archives of Disease in Childhood: Fetal and Neonatal Edition 92, F113–F116.
Salverda, A. P., D. Dahan, and J. M. McQueen (2003). The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension. Cognition 90, 51–89.
Salveste, N. (2013). Kõnetaju kategoriaalsus ehk hüpotees sellest, kuidas me keelelisi üksusi tajume. Eesti ja soome-ugri keeleteaduse ajakiri / Journal of Estonian and Finno-Ugric Linguistics 4(1), 127–143.
Salza, P., G. Marotta, and D. Ricca (1987). Duration and formant frequencies of Italian bivocalic sequences. In Proceedings of the 11th International Congress of Phonetic Sciences, 113–116, Tallinn.
Samek-Lodovici, V. (2005). Prosody-syntax interaction in the expression of focus. Natural Language and Linguistic Theory 23, 687–755.
Sammler, D., S. A. Kotz, K. Eckstein, D. V. Ott, and A. D. Friederici (2010). Prosody meets syntax: The role of the corpus callosum. Brain 133(9), 2643–2655.
Samuel, A. G., and T. Kraljic (2009). Perceptual learning for speech. Attention, Perception, and Psychophysics 71, 1207–1218.
Sândalo, F. (1995). A Grammar of Kadiweu: With Special Reference to the Polysynthesis Parameter. MIT Working Papers in Linguistics.
Sande, H. (2017). Distributing morphologically conditioned phonology: Three case studies from Guébie. PhD dissertation, University of California, Berkeley.
Sande, H. (2018). Cross-word morphologically conditioned scalar tone shift in Guébie. Morphology 28, 253–295.
Sanderman, A. A., and R. Collier (1997). Prosodic phrasing and comprehension. Language and Speech 40(4), 391–409.
Sandler, W. (1989). Phonological Representation of the Sign: Linearity and Nonlinearity in American Sign Language. Dordrecht: Foris.
Sandler, W. (1990). Temporal aspects and ASL phonology. In S. Fischer and P. Siple (eds.), Theoretical Issues in Sign Language Research: Vol. 1. Linguistics, 7–36. Chicago: University of Chicago Press.
Sandler, W. (1993). A sonority cycle in American Sign Language. Phonology 10(2), 209–241.
Sandler, W. (1999). Cliticization and prosodic words in a sign language. In T. Hall and U. Kleinhenz (eds.), Studies on the Phonological Word, 223–254. Amsterdam: John Benjamins.
Sandler, W. (2010). Prosody and syntax in sign languages. Transactions of the Philological Society 108(3), 298–328.
Sandler, W. (2012). The phonological organization of sign languages. Language and Linguistics Compass 6(3), 162–182.
Sandler, W. (2017). The challenge of sign language phonology. Annual Review of Linguistics 3, 43–63.
Sandler, W., and D. Lillo-Martin (2006). Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.
Sansavini, A., J. Bertoncini, and G. Giovanelli (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology 33(1), 3–11.
Santiago Martínez, G. G. (2015). Temas de fonología y morfosintaxis del mixe de Tamazulápam. PhD dissertation, Centro de Investigaciones y Estudios Superiores en Antropología Social.
Santiago Vargas, F., and E. Delais-Roussarie (2012). Acquiring phrasing and intonation in French as second language: The case of yes-no questions produced by Mexican Spanish learners. In Proceedings of Speech Prosody 6, 338–341, Shanghai.

Santos Martínez, M. R. (2013). El sistema fonológico del mixe de Metepec: Aspectos segmentales y prosódicos. MA thesis, Centro de Investigaciones y Estudios Superiores en Antropología Social.
Sapir, E. (1925). Pitch accent in Sarcee, an Athabaskan language. Journal de la Société des Américanistes de Paris 17, 185–205.
Sapir, E. (1930). Southern Paiute, a Shoshonean language. Proceedings of the American Academy of Arts and Sciences 65, 1–296.
Sapir, E., and M. Swadesh (1939). Nootka Texts: Tales and Ethnological Narratives with Grammatical Notes and Lexical Materials. Philadelphia: Linguistic Society of America.
Sapir, E., and M. Swadesh (1960). Yana Dictionary. Berkeley: University of California Press.
Saran, F. L. (1907). Deutsche Verslehre. Munich: C. H. Beck.
Sarles, H. (1966). A descriptive grammar of the Tzotzil language as spoken in San Bartolomé de los Llanos. PhD dissertation, University of Chicago.
Sato, Y., Y. Sogabe, and R. Mazuka (2007). Brain responses in the processing of lexical pitch-accent by Japanese speakers. Neuroreport 18(18), 2001–2004.
Sato, Y., Y. Sogabe, and R. Mazuka (2010). Development of hemispheric specialization for lexical pitch-accent in Japanese infants. Journal of Cognitive Neuroscience 22(11), 2503–2513.
Sauermann, A., B. Höhle, A. Chen, and J. Järvikivi (2011). Intonational marking of focus in different word orders in German children. In Proceedings of the 28th West Coast Conference on Formal Linguistics, 313–322. Somerville, MA: Cascadilla Proceedings Project.
Savino, M. (2012). The intonation of polar questions in Italian: Where is the rise? Journal of the International Phonetic Association 42(1), 23–48.
Sawicka, I. (1991). Some remarks on the intonation of yes-or-no questions in Southern Slavonic. In I. Sawicka and A. Holvoet (eds.), Studies in the Phonetic Typology of the Slavic Languages, 125–152. Warsaw: Sławistyczny Ośrodek Wydawniczy.
Sawicka, I., and L. Spasov (1997). Fonologija na sovremeniot makedonski standarden jazik: Segmentalna i suprasegmentalna. Skopje: Detska Radost.
Sawusch, J. R., and R. S. Newman (2000). Perceptual normalization for speaking rate II: Effects of signal discontinuities. Perception and Psychophysics 62, 285–300.
Sayers, B. J. (1976a). Interpenetration of Stress and Pitch in Wik-Munkan Grammar and Phonology. Canberra: Pacific Linguistics.
Sayers, B. J. (1976b). The Sentence in Wik-Munkan: A Description of Propositional Relationships. Canberra: Pacific Linguistics.
Scarborough, R. (2007). The intonation of focus in Farsi. UCLA Working Papers in Phonetics 105, 19–34.
Scarborough, R., P. A. Keating, S. L. Mattys, T. Cho, and A. Alwan (2009). Optical phonetics and visual perception of lexical and phrasal stress in English. Language and Speech 52, 135–175.
Schachter, P. (1965). Some comments on J. M. Stewart’s The Typology of the Twi Tone System. Bulletin of the Institute of African Studies (Legon University) 1, 28–42.
Schachter, P., and F. T. Otanes (1972). Tagalog Reference Grammar. Berkeley: University of California Press.
Schadeberg, T. C. (1973). Kinga: A restricted tone language. Studies in African Linguistics 4, 23–48.
Scheiner, E., K. Hammerschmidt, U. Jürgens, and P. Zwirner (2002). Acoustic analyses of developmental changes and emotional expression in the preverbal vocalizations of infants. Journal of Voice 16(4), 509–529.
Schellenberg, M. (2009). Singing in a tone language: Shona. In A. Ojo and L. Moshi (eds.), Selected Proceedings of the 39th Annual Conference on African Linguistics: Linguistic Research and Languages in Africa, 137–144. Somerville, MA: Cascadilla Proceedings Project.
Schellenberg, M. (2012). Does language determine music in tone languages? Ethnomusicology 56(2), 266–278.
Schellenberg, M., and B. Gick (2020). Microtonal variation in sung Cantonese. Phonetica 77, 83–106.
Schepman, A., R. Lickley, and D. R. Ladd (2006). Effects of vowel length and ‘right context’ on the alignment of Dutch nuclear accents. Journal of Phonetics 34, 1–28.

Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication 40(1–2), 227–256.
Schiering, R., B. Bickel, and K. A. Hildebrandt (2010). The prosodic word is not universal, but emergent. Journal of Linguistics 46, 657–709.
Schiffrin, D. (1987). Discourse Markers. Cambridge: Cambridge University Press.
Schiller, N. O. (1999). Masked syllable priming of English nouns. Brain and Language 68(1–2), 300–305.
Schiller, N. O., and A. Costa (2006). Activation of segments, not syllables, during phonological encoding in speech production. In G. Jarema and G. Libben (eds.), The Mental Lexicon, vol. 1, no. 2, 231–250. Amsterdam: John Benjamins.
Schirmer, A. (2004). Timing speech: A review of lesion and neuroimaging findings. Cognitive Brain Research 21, 269–287.
Schirmer, A., K. Alter, S. A. Kotz, and A. D. Friederici (2001). Lateralization of prosody during language production: A lesion study. Brain and Language 76, 1–17.
Schlanger, B. B., P. Schlanger, and L. J. Gerstman (1976). The perception of emotionally toned sentences by right hemisphere-damaged and aphasic subjects. Brain and Language 3, 396–403.
Schlaug, G., S. Marchina, and A. Norton (2009). Evidence for plasticity in white-matter tracts of patients with chronic Broca’s aphasia undergoing intense intonation-based speech therapy. Annals of the New York Academy of Sciences 1169, 385–394.
Schlenker, P., V. Aristodemo, L. Ducasse, J. Lamberton, and M. Santoro (2016). The unity of focus: Evidence from sign language (ASL and LSF). Linguistic Inquiry 47(2), 363–381.
Schlerman, B. J. (1989). The Meters of John Webster. New York: Peter Lang.
Schlöder, J., and A. Lascarides (2015). Interpreting English pitch contours in context. In C. Howes and S. Larsson (eds.), Proceedings of the 19th Workshop on the Semantics and Pragmatics of Dialogue, 131–139, Gothenburg.
Schlüter, J. (2009). Rhythmic Grammar: The Influence of Rhythm on Grammatical Variation and Change in English. Berlin: Walter de Gruyter.
Schmid, S. (2004). Une approche phonétique de l’isochronie dans quelques dialectes italo-romans. In T. Meisenburg and M. Selig (eds.), Nouveaux départs en phonologie, 109–124. Tübingen: Narr.
Schmidt, J. E. (1986). Die mittelfränkischen Tonakzente (rheinische Akzentuierung). Stuttgart: Steiner.
Schmidt, J. E. (2002). Die sprachhistorische Genese der mittelfränkischen Tonakzente. In P. Auer, P. Gilles, and H. Spiekermann (eds.), Silbenschnitt und Tonakzente, 201–233. Tübingen: Niemeyer.
Schmied, J. (2006). East African Englishes. In B. B. Kachru, Y. Kachru, and C. L. Nelson (eds.), Handbook of World Englishes, 188–202. Oxford: Blackwell.
Schneider, M. (1943). Phonetische und metrische Korrelationen bei gesprochenen und gesungenen Ewe-Texten. Archiv für vergleichende Phonetik 7(1–2), 1–15.
Scholz, F. (2012). Tone sandhi, prosodic phrasing, and focus marking in Wenzhou Chinese. PhD dissertation, Leiden University.
Scholz, F., and Y. Chen (2014). The independent effects of prosodic structure and information status on tonal coarticulation: Evidence from Wenzhou Chinese. In J. Caspers, Y. Chen, W. F. L. Heeren, J. Pacilly, N. O. Schiller, and E. van Zanten (eds.), Above and Beyond the Segments: Experimental Linguistics and Phonetics, 275–287. Amsterdam: John Benjamins.
Schötz, S. (2007). Acoustic analysis of adult speaker age. In C. Müller (ed.), Speaker Classification I, 88–107. New York: Springer.
Schouten, J. F. (1938). The perception of subjective tones. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen 41, 1086–1093.
Schouten, J. F. (1940a). The residue and the mechanism of hearing. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen 43, 991–999.
Schouten, J. F. (1940b). The residue, a new component in subjective sound analysis. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen 43, 356–365.
Schouten, J. F., R. J. Ritsma, and B. L. Cardozo (1962). Pitch of the residue. Journal of the Acoustical Society of America 34, 1418–1424.

Schrock, T. (2014). A Grammar of Ik (Icé-tód): Northeast Uganda’s Last Thriving Kuliak Language. Utrecht: LOT.
Schrock, T. (2017). The Ik Language: Dictionary and Grammar Sketch. Berlin: Language Science Press.
Schuh, R. G. (1971). Ngizim phonology. Ms., University of California, Los Angeles.
Schuh, R. G. (1978). Tone rules. In V. A. Fromkin (ed.), Tone: A Linguistic Survey, 221–256. New York: Academic Press.
Schuh, R. G. (2017). A Chadic Cornucopia. Oakland: eScholarship.
Schuller, B., and A. Batliner (2014). Computational Paralinguistics: Emotion, Affect, and Personality in Speech and Language Processing. Chichester: Wiley.
Schuller, B., S. Steidl, and A. Batliner (2009). The Interspeech 2009 Emotion Challenge. In INTERSPEECH 2009, 312–315, Brighton.
Schuller, B., S. Steidl, A. Batliner, S. Hantke, F. Hönig, J. R. Orozco-Arroyave, E. Nöth, Y. Zhang, and F. Weninger (2015). The Interspeech 2015 computational paralinguistics challenge: Nativeness, Parkinson’s and eating condition. In INTERSPEECH 2015, 478–482, Dresden.
Schuller, B., S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer, F. Ringeval, M. Chetouani, F. Weninger, F. Eyben, E. Marchi, M. Mortillaro, H. Salamin, A. Polychroniou, F. Valente, and S. Kim (2013). The Interspeech 2013 Computational Paralinguistics Challenge: Social signals, conflict, emotion, autism. In INTERSPEECH 2013, 148–152, Lyon.
Schuller, B., F. Weninger, Y. Zhang, F. Ringeval, A. Batliner, S. Steidl, F. Eyben, E. Marchi, A. Vinciarelli, K. Scherer, M. Chetouani, and M. Mortillaro (2019). Affective and behavioural computing: Lessons learnt from the First Computational Paralinguistics Challenge. Computer Speech and Language 53, 156–180.
Schultze-Berndt, E., and C. Simard (2012). Constraints on noun phrase discontinuity in an Australian language: The role of prosody and information structure. Linguistics 50, 1015–1058.
Schumann Gálvez, O. (1973). La lengua Chol, de Tila (Chiapas). Mexico City: Centro de Estudios Mayas, National Autonomous University of Mexico.
Schütz, A. (1985). The Fijian Language. Honolulu: University of Hawaiʻi Press.
Schwab, S., and F. Grosjean (2004). La perception du débit en langue seconde. Phonetica 61, 84–94.
Schwartz, R. (2009). Handbook of Child Language Disorders. New York: Psychology Press.
Schwarz, A. (2009). Tonal focus reflections in Buli and some Gur relatives. Lingua 119(6), 950–972.
Schwarzschild, R. (1999). GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics 7, 141–177.
Schwarzwald, O. (2011). Modern Hebrew. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 523–536. Berlin: De Gruyter Mouton.
Schwiertz, G. (2009). Intonation and prosodic structure in Beaver (Athabaskan): Explorations on the language of the Danezaa. PhD dissertation, University of Cologne.
Sebastián-Gallés, N., E. Dupoux, J. Segui, and J. Mehler (1992). Contrasting syllabic effects in Catalan and Spanish. Journal of Memory and Language 31(1), 18–32.
Seddoh, S. A. K. (2004). Prosodic disturbance in aphasia: Speech timing versus intonation production. Clinical Linguistics and Phonetics 18, 17–38.
See, R., V. Driscoll, K. Gfeller, S. Kliethermes, and J. Oleson (2013). Speech intonation and melodic contour recognition in children with cochlear implants and with normal hearing. Otology and Neurotology 34(3), 490–498.
Seebeck, A. (1841). Beobachtungen über einige Bedingungen zur Entstehung von Tönen. Annalen der Physik und Chemie 53, 417–436.
Segal, O., and L. Kishon-Rabin (2012). Evidence for language-specific influence on the preference of stress patterns in infants learning an iambic language (Hebrew). Journal of Speech, Language and Hearing Research 55, 1329–1341.
Seidl, A. (2007). Infants’ use and weighting of prosodic cues in clause segmentation. Journal of Memory and Language 57(1), 24–48.

Seidl, A., and A. Cristià (2008). Developmental changes in the weighting of prosodic cues. Developmental Science 11(4), 596–606.
Seifart, F. (2005). The structure and use of shape-based noun classes in Miraña (North West Amazon). PhD dissertation, Radboud University.
Seiler, H. (1957). Die phonetischen Grundlagen der Vokalphoneme des Cahuilla. Zeitschrift für Phonetik und allgemeine Sprachwissenschaft 10, 204–223.
Seiler, H. (1965). Accent and morphophonemics in Cahuilla and Uto-Aztecan. International Journal of American Linguistics 31, 50–59.
Seiler, H. (1977). Cahuilla Grammar. Banning, CA: Malki Museum Press.
Sekerina, I. E., and J. C. Trueswell (2012). Interactive processing of contrastive expressions by Russian children. First Language 32(1–2), 63–87.
Selkirk, E. O. (1977). Some remarks on noun phrase structure. In A. Akmajian, P. Culicover, and T. Wasow (eds.), Studies in Formal Syntax, 283–316. New York: Academic Press.
Selkirk, E. O. (1978). The French foot: On the status of ‘mute’ e. Studies in French Linguistics 1(2), 141–150.
Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry 11, 563–605.
Selkirk, E. O. (1981). On prosodic structure and its relation to syntactic structure. In T. Fretheim (ed.), Nordic Prosody II: Papers from a Symposium, 111–140. Trondheim: Tapir. Distributed by Indiana University Linguistics Club 1980.
Selkirk, E. O. (1984). Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Selkirk, E. O. (1986). On derived domains in sentence phonology. Phonology [Yearbook] 3, 371–405.
Selkirk, E. O. (1995). Sentence prosody: Intonation, stress, and phrasing. In J. A. Goldsmith (ed.), The Handbook of Phonological Theory, 550–569. Malden, MA: Blackwell.
Selkirk, E. O. (2007). Bengali intonation revisited: An optimality theoretic analysis in which focus stress prominence drives focus phrasing. In C. Lee, M. Gordon, and D. Büring (eds.), Topic and Focus: Cross-Linguistic Perspectives on Meaning and Intonation, 217–246. Dordrecht: Kluwer.
Selkirk, E. O. (2011). The syntax-phonology interface. In J. A. Goldsmith, J. Riggle, and A. C. Yu (eds.), The Handbook of Phonological Theory (2nd ed.), 435–483. Malden, MA: Blackwell.
Selkirk, E. O., and T. Shen (1990). Prosodic domains in Shanghai Chinese. In S. Inkelas and D. Zec (eds.), The Phonology-Syntax Connection, 313–337. Stanford: Stanford University/Chicago University Press.
Selkirk, E. O., and K. Tateishi (1991). Syntax and downstep in Japanese. In C. Georgopoulos and R. Ishihara (eds.), Interdisciplinary Approaches to Language, 519–543. Dordrecht: Kluwer.
Selting, M. (2007). Lists as embedded structures and the prosody of list construction as an interactional resource. Journal of Pragmatics 39, 483–526.
Sendra, V. C., C. Kaland, M. Swerts, and P. Prieto (2013). Perceiving incredulity: The role of intonation and facial gestures. Journal of Pragmatics 47, 1–13.
Sereno, J. A. (1986). Stress pattern differentiation of form class in English. Journal of the Acoustical Society of America 79, S36.
Sereno, J. A., L. Lammers, and A. Jongman (2016). The relative contribution of segments and intonation to the perception of foreign-accented speech. Applied Psycholinguistics 37(2), 303–322.
Sergeant, R. L., and J. D. Harris (1962). Sensitivity to unidirectional frequency modulation. Journal of the Acoustical Society of America 34, 1625–1628.
Seržants, I. (2003). Die Intonationen der suffixalen und Endsilben im Lettischen: Synchronie und Diachronie. Baltu filoloģija 12(1), 83–122.
Sethi, J. (1980). Word accent in educated Punjabi speakers’ English. Bulletin of the Central Institute of English 16, 35–48.
Sezer, E. (1983). On non-final stress in Turkish. Journal of Turkish Studies 5, 61–69.
Shah, A., S. R. Baum, and V. Dwivedi (2006). Neural substrates of linguistic prosody: Evidence from syntactic disambiguation in the productions of brain-damaged patients. Brain and Language 96, 78–89.

Shahid, S., E. Krahmer, and M. Swerts (2008). Alone or together: Exploring the effect of physical copresence on the emotional expressions of game playing children across cultures. In P. Markopoulos, B. de Ruyter, W. IJsselsteijn, and D. Rowland (eds.), Fun and Games, 94–105. Berlin: Springer.
Shannon, R. V., F. G. Zeng, V. Kamath, J. Wygonski, and M. Ekelid (1995). Speech recognition with primarily temporal cues. Science 270, 303–304.
Shapely, M. (1987). Prosodic variation and audience response. Papers in Pragmatics 1, 66–79.
Shapiro, B. E., and M. Danly (1985). The role of the right hemisphere in the control of speech prosody in propositional and affective contexts. Brain and Language 25, 19–36.
Sharpe, M. C. (1972). Alawa Phonology and Grammar (Australian Aboriginal Studies 37, Linguistics Series 15). Canberra: Australian Institute of Aboriginal Studies.
Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering. Cognition 42, 213–259.
Shattuck-Hufnagel, S. (2017). Individual differences in the signaling of prosodic structure by changes in voice quality. Journal of the Acoustical Society of America 142(4 Pt 2), 2521.
Shattuck-Hufnagel, S., M. Ostendorf, and K. Ross (1994). Stress shift and early pitch accent placement in lexical items in American English. Journal of Phonetics 22, 357–388.
Shattuck-Hufnagel, S., and A. Ren (2018). The prosodic characteristics of non-referential co-speech gestures in a sample of academic-lecture-style speech. Frontiers in Psychology 9, 1514.
Shattuck-Hufnagel, S., and A. E. Turk (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25, 193–247.
Shattuck-Hufnagel, S., Y. Yasinnik, N. Veilleux, and M. Renwick (2007). Time alignment of gestures and prosody in American English academic-lecture-style speech: ‘Hits’ and pitch accents. In A. Esposito, M. Bratanic, E. Keller, and M. Marinaro (eds.), Fundamentals of Verbal and Nonverbal Communication and the Biometric Issue: Vol. 18. NATO Security Through Science Series: Human and Societal Dynamics, 34–44. Washington, DC: IOS Press.
Shatzman, K. B., and J. M. McQueen (2006). Prosodic knowledge affects the recognition of newly acquired words. Psychological Science 17, 372–377.
Shaw, J. A., and S. Kawahara (2018). The lingual articulation of devoiced /u/ in Tokyo Japanese. Journal of Phonetics 66, 100–119.
Shaw, P. (1985). Coexistent and competing stress rules in Stoney (Dakota). International Journal of American Linguistics 51, 1–18.
Shaw, P. (2009). Default-to-opposite stress in Kʷak’ʷala: Quantity sensitivities in a default-to-right system. Paper presented at the Society for the Study of the Indigenous Languages of the Americas Annual Meeting, Berkeley.
Shen, J. (1992). On Chinese intonation models. Chinese Studies 4, 16–24.
Shen, X. S. (1989a). The Prosody of Mandarin Chinese. Berkeley: University of California Press.
Shen, X. S. (1989b). Toward a register approach in teaching Mandarin tones. Journal of Chinese Language Teachers Association 24, 27–47.
Shen, X. S. (1990). Tonal coarticulation in Mandarin. Journal of Phonetics 18, 281–295.
Sheppard, J. P., J. P. Wang, and P. C. Wong (2012). Large-scale cortical network properties predict future sound-to-word learning success. Journal of Cognitive Neuroscience 24(5), 1087–1103.
Shi, R. (2014). Functional morphemes and early language acquisition. Child Development Perspectives 8(1), 6–11.
Shi, R., J. Gao, A. Achim, and A. Li (2017a). Perception and representation of lexical tones in native Mandarin-learning infants and toddlers. Frontiers in Psychology 8, 1117.
Shi, R., and A. Melançon (2010). Syntactic categorization in French-learning infants. Infancy 15(5), 517–533.
Shi, R., J. Morgan, and P. Allopenna (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language 25(1), 169–201.
Shi, R., E. Santos, J. Gao, and A. Li (2017b). Perception of similar and dissimilar lexical tones by non-tone-learning infants. Infancy 22(6), 790–800.
Shi, R., J. F. Werker, and J. Morgan (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition 72(2), B11–B21.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Shiamizadeh, Z., J. Caspers, and N. O. Schiller (2017). The role of prosody in the identification of Persian sentence types: Declarative or wh-question? Linguistics Vanguard (1), 20160085. doi: https://doi.org/10.1515/lingvan-2016-0085.
Shibatani, M. (1990). The Languages of Japan. Cambridge: Cambridge University Press.
Shields, J. L., A. McHugh, and J. G. Martin (1974). Reaction time to phoneme targets as a function of rhythmic cues in continuous speech. Journal of Experimental Psychology 102, 250–255.
Shih, C. (1986). The prosodic domain of tone sandhi in Chinese. PhD dissertation, University of California, San Diego.
Shih, C. (1988). Tone and intonation in Mandarin. In Working Papers of the Cornell Phonetics Laboratory 3: Stress, Tone and Intonation, 83–109. Ithaca: Cornell University.
Shih, C. (2000). A declination model of Mandarin Chinese. In A. Botinis (ed.), Intonation: Analysis, Modelling and Technology, 243–268. Dordrecht: Kluwer.
Shih, S.-H. (2016). Sonority-driven stress does not exist. In G. O. Hansson, A. Farris-Trimble, K. McMullin, and D. Pulleyblank (eds.), Supplemental Proceedings of the 2015 Annual Meeting on Phonology. Washington, DC: Linguistic Society of America.
Shih, S.-H. (2018). On the existence of sonority-driven stress: Gujarati. Phonology 35, 327–364.
Shih, S. S. (2014). Towards optimal rhythm. PhD dissertation, Stanford University.
Shimizu, K. (1983). The Zing dialect of Mumuye: A Descriptive Grammar with a Mumuye-English Dictionary and an English-Mumuye Index. Hamburg: Helmut Buske.
Shin, J., J. Kiaer, and J. Cha (2013). The Sounds of Korean. Cambridge: Cambridge University Press.
Shklovsky, K. (2011). Petalcingo Tseltal intonational prosody. In K. Shklovsky, P. M. Pedro, and J. Coon (eds.), Proceedings of Formal Approaches to Mayan Linguistics (FAMLi), 209–220. Cambridge, MA: MIT Working Papers in Linguistics.
Shneidman, L. A., and S. Goldin-Meadow (2012). Language input and acquisition in a Mayan village: How important is directed speech? Developmental Science 15(5), 659–673.
Shokeir, V. (2007). Uptalk in Southern Ontario English. Paper presented at New Ways of Analyzing Variation 36, Philadelphia.
Shokeir, V. (2008). Evidence for the stable use of uptalk in South Ontario English. University of Pennsylvania Working Papers in Linguistics 14(2), 16–24.
Shport, I. A., and M. A. Redford (2014). Lexical and phrasal prominence patterns in school-aged children’s speech. Journal of Child Language 41, 890–912.
Shriberg, E. (2007). Higher-level features in speaker recognition. In C. Müller (ed.), Speaker Classification I: Fundamentals, Features, and Methods, 241–259. Berlin: Springer.
Shriberg, E., and R. Lickley (1993). Intonation of clause-internal filled pauses. Phonetica 50, 172–179.
Shriberg, E., and A. Stolcke (2001). Prosody modeling for automatic speech understanding: An overview of recent research at SRI. In M. Bacchiani, J. Hirschberg, D. Litman, and M. Ostendorf (eds.), Proceedings of the Workshop on Prosody and Speech Recognition, Red Bank, NJ.
Shriberg, E., and A. Stolcke (2004). Direct modeling of prosody: An overview of applications in automatic speech processing. In Proceedings of Speech Prosody 2, 575–582, Nara.
Shriberg, L. D., J. Kwiatkowski, and C. Rasmussen (1990). Prosody-Voice Screening Profile (PVSP): Scoring Forms and Training Materials. Tucson: Communication Skill Builders.
Shriberg, L. D., J. Kwiatkowski, C. Rasmussen, G. L. Lof, and J. F. Miller (1992). The Prosody-Voice Screening Profile (PVSP): Psychometric Data and Reference Information for Children (Technical Report 1). Madison: Waisman Center on Mental Retardation and Human Development, University of Wisconsin–Madison.
Shriberg, L. D., R. Paul, J. L. McSweeny, A. Klin, D. J. Cohen, and F. Volkmar (2001). Speech and prosody characteristics of adolescents and adults with high-functioning autism and Asperger syndrome. Journal of Speech, Language, and Hearing Research 44(5), 1097–1115.
Shuai, L., and J. G. Malins (2017). Encoding lexical tones in jTRACE: A simulation of monosyllabic spoken word recognition in Mandarin Chinese. Behavior Research Methods 49, 230–241.
Shuken, C. (1980). An instrumental investigation of some Scottish Gaelic consonants. PhD dissertation, University of Edinburgh.
Shukla, M., K. S. White, and R. N. Aslin (2011). Prosody guides the rapid mapping of auditory word forms onto visual objects in 6-mo-old infants. Proceedings of the National Academy of Sciences of the United States of America 108(15), 6038–6043.

Shwayder, K. (2015). Words and subwords: Phonology in a piece-based syntactic morphology. PhD dissertation, University of Pennsylvania.
Shyrock, A. (1993). A metrical analysis of stress in Cebuano. Lingua 91, 103–148.
Sichel-Bazin, R., C. Buthke, and T. Meisenburg (2012). The prosody of Occitan-French bilinguals. In K. Braunmüller and C. Gabriel (eds.), Multilingual Individuals and Multilingual Societies, vol. 13, 349–364. Amsterdam: John Benjamins.
Sicoli, M. A. (2007). Tono: A linguistic ethnography of tone and voice in a Zapotec region. PhD dissertation, University of Michigan.
Sicoli, M. A., T. Stivers, N. J. Enfield, and S. C. Levinson (2015). Marked initial pitch in questions signals marked communicative function. Language and Speech 58(2), 204–223.
Sidtis, J. J. (1980). On the nature of the cortical function underlying right hemisphere auditory perception. Neuropsychologia 18, 321–330.
Sidtis, J. J., J.-S. Ahn, C. Gomez, and D. Van Lancker Sidtis (2011). Speech characteristics associated with three genotypes of ataxia. Journal of Communication Disorders 44, 478–492.
Sidtis, J. J., and D. Sidtis (2012). Preservation of relational timing in speech of persons with Parkinson’s disease with and without deep brain stimulation. Journal of Medical Speech-Language Pathology 20, 140–151.
Sidtis, J. J., and D. Van Lancker Sidtis (2003). A neurobehavioral approach to dysprosody. Seminars in Speech and Language 24, 93–105.
Sidtis, J. J., and B. T. Volpe (1988). Selective loss of complex-pitch or speech discrimination after unilateral cerebral lesion. Brain and Language 34, 235–245.
Silva, W. (2016). The status of the laryngeals ‘ʔ’ and ‘h’ in Desano. In H. Avelino, M. Coler, and L. Wetzels (eds.), The Phonetics and Phonology of Laryngeal Features in Native American Languages, 285–307. Leiden: Brill.
Silverman, D. (1997a). Laryngeal complexity in Otomanguean vowels. Phonology 14, 235–261.
Silverman, D. (1997b). Phasing and Recoverability. New York: Routledge.
Silverman, D., B. Blankenship, P. L. Kirk, and P. Ladefoged (1995). Phonetic structures in Jalapa Mazatec. Anthropological Linguistics 37(1), 70–88.
Silverman, K. (1987). The structure and processing of fundamental frequency contours. PhD dissertation, University of Cambridge.
Silverman, K., M. E. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. B. Pierrehumbert, and J. Hirschberg (1992). ToBI: A standard for labeling English prosody. In Proceedings of the International Conference on Spoken Language Processing, 866–870, Banff.
Silverman, K., and J. B. Pierrehumbert (1990). The timing of prenuclear high accents in English. In J. Kingston and M. E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, 72–106. Cambridge: Cambridge University Press.
Simard, C. (2010). The prosodic contours of Jaminjung, a language of northern Australia. PhD dissertation, University of Manchester.
Simard, C., and C. Wegener (2017). Fronted NPs in a verb-initial language: Clause-internal or external? Prosodic cues to the rescue! Glossa: A Journal of General Linguistics 2(1), 51, 1–32.
Simeone-Senelle, M. C. (1997). The modern South Arabian languages. In R. Hetzron (ed.), The Semitic Languages, 378–423. London: Routledge.
Simeone-Senelle, M. C. (2011). Modern South Arabian. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 1073–1113. Berlin: De Gruyter Mouton.
Simmons, J., and C. Baltaxe (1975). Language patterns of adolescent autistics. Journal of Autism and Childhood Schizophrenia 5(4), 333–351.
Simmons, E. S., R. Paul, and F. Shic (2016). A mobile application to treat prosodic deficits in autism spectrum disorder and other communication impairments: A pilot study. Journal of Autism and Developmental Disorders 46(1), 320–327.
Simon, A. C. (2012). Quelles avancées dans l’étude de la variation prosodique régionale en français? In A. C. Simon (ed.), La variation prosodique régionale en français, 231–247. Brussels: De Boeck/Duculot.

Simpson, J., and I. Mushin (2008). Clause-initial position in four Australian languages. In I. Mushin and B. J. Baker (eds.), Discourse and Grammar in Australian Languages, 25–57. Amsterdam: John Benjamins.
Singer, R. (2006). Information structure in Mawng: Intonation and focus. In K. Allan (ed.), Selected papers of the 2005 Conference of the Australian Linguistics Society. http://www.als.asn.au.
Singh, L., and C. S. Fu (2016). A new view of language development: The acquisition of lexical tone. Child Development 87(3), 834–854.
Singh, L., H. H. Goh, and T. D. Wewalaarachchi (2015). Spoken word recognition in early childhood: Comparative effects of vowel, consonant and lexical tone variation. Cognition 142, 1–11.
Singh, L., T. J. Hui, C. Chan, and R. M. Golinkoff (2014). Influences of vowel and tone variation on emergent word knowledge: A cross-linguistic investigation. Developmental Science 17(1), 94–109.
Singh, L., J. Morgan, and C. T. Best (2002). Infant listening preferences: Baby talk or happy talk? Infancy 3, 365–394.
Singh, L., K. S. White, and J. Morgan (2008). Building a word-form lexicon in the face of variable input: Influences of pitch and amplitude on early spoken word recognition. Language Learning and Development 4(2), 157–178.
Singler, J. V. (1984). On the underlying representation of contour tones in Wobe. Studies in African Linguistics 15, 59–75.
Siptár, P., and M. Törkenczy (2007). The Phonology of Hungarian (2nd ed.). Oxford: Oxford University Press.
Sirk, Ü. (1996). The Buginese Language of Traditional Literature. Moscow: Author.
Sirsa, H., and M. A. Redford (2011). Towards understanding the protracted acquisition of English rhythm. In Proceedings of the 17th International Congress of Phonetic Sciences, 1862–1865, Hong Kong.
Sischo, W. (1979). Michoacán Nahual. In R. Langacker (ed.), Studies in Uto-Aztecan Grammar: Vol. 2. Modern Aztec Grammatical Sketches, 307–380. Arlington: Summer Institute of Linguistics.
Sjerps, M. J., C. Zhang, and G. Peng (2018). Lexical tone is perceived relative to locally surrounding context, vowel quality to preceding context. Journal of Experimental Psychology: Human Perception and Performance 44, 914–924.
Skarnitzl, R., and J. Volín (2012). Referenční hodnoty vokalických formantů pro mladé dospělé mluvčí standardní češtiny. Akustické listy 18, 7–11.
Skipper, J., V. van Wassenhove, H. Nusbaum, and S. Small (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex 17(10), 2387–2399.
Skoe, E., B. Chandrasekaran, E. R. Spitzer, P. C. Wong, and N. Kraus (2014). Human brainstem plasticity: The interaction of stimulus probability and auditory learning. Neurobiology of Learning and Memory 109, 82–93.
Skoe, E., and N. Kraus (2010). Auditory brainstem response to complex sounds: A tutorial. Ear and Hearing 31(3), 302–324.
Skopeteas, S., I. Fiedler, S. Hellmuth, A. Schwarz, R. Stoel, G. Fanselow, C. Féry, and M. Krifka (2006). Questionnaire on Information Structure (QUIS) (Interdisciplinary Studies on Information Structure 4). Potsdam: Universitätsverlag Potsdam.
Skopeteas, S. (2010). Syntax-phonology interface and clitic placement in Mayan languages. In V. Torrens, L. Escobar, A. Gavarró, and J. Gutiérrez (eds.), Movement and Clitics: Adult and Child Grammar, 307–331. Newcastle upon Tyne: Cambridge Scholars.
Skopeteas, S. (2016). Information structure in modern Greek. In C. Féry and S. Ishihara (eds.), Oxford Handbook of Information Structure, 686–708. Oxford: Oxford University Press.
Skopeteas, S., and G. Fanselow (2010). Focus types and argument asymmetries: A cross-linguistic study in language production. In C. Breul and E. Göbbel (eds.), Comparative and Contrastive Studies of Information Structure (Linguistics Today 165), 165–197. Amsterdam: John Benjamins.
Skopeteas, S., and C. Féry (2010). Effect of narrow focus on tonal realization in Georgian. In Proceedings of Speech Prosody 5, Chicago.

Skopeteas, S., C. Féry, and R. Asatiani (2009). Word order and intonation in Georgian. Lingua 119, 102–127.
Skopeteas, S., C. Féry, and R. Asatiani (2018). Prosodic separation of postverbal material in Georgian: A corpus study on syntax-phonology interface. In E. Adamou, K. Haude, and M. Vanhove (eds.), Information Structure in Lesser-Described Languages: Studies in Prosody and Syntax, 17–50. Amsterdam: John Benjamins.
Slowiaczek, L. M. (1990). Effects of lexical stress in auditory word recognition. Language and Speech 33, 46–68.
Slowiaczek, L. M. (1991). Stress and context in auditory word recognition. Journal of Psycholinguistic Research 20, 465–481.
Sluijter, A. M. C., S. Shattuck-Hufnagel, K. N. Stevens, and V. J. van Heuven (1995). Supralaryngeal resonance and glottal pulse shape as correlates of prosodic stress and accent in American English. In Proceedings of the 13th International Congress of Phonetic Sciences, vol. 2, 630–633, Stockholm.
Sluijter, A. M. C., and J. Terken (1993). Beyond sentence prosody: Paragraph intonation in Dutch. Phonetica 50, 180–188.
Sluijter, A. M. C., and V. J. van Heuven (1995). Effects of focus distribution, pitch accent and lexical stress on the temporal organization of syllables in Dutch. Phonetica 52(2), 71–89.
Sluijter, A. M. C., and V. J. van Heuven (1996a). Spectral balance as an acoustic correlate of linguistic stress. Journal of the Acoustical Society of America 100, 2471–2485.
Sluijter, A. M. C., and V. J. van Heuven (1996b). Acoustic correlates of linguistic stress and accent in Dutch and American English. In Proceedings of the 4th International Conference on Spoken Language Processing, 630–633, Philadelphia.
Sluijter, A. M. C., V. J. van Heuven, and J. Pacilly (1997). Spectral balance as a cue in the perception of linguistic stress. Journal of the Acoustical Society of America 101, 503–513.
Smiljanić, R. (2004). Lexical, Pragmatic, and Positional Effects on Prosody in Two Dialects of Croatian and Serbian: An Acoustic Study. New York: Routledge.
Smiljanić, R., and J. I. Hualde (2000). Lexical and pragmatic functions of tonal alignment in two Serbo-Croatian dialects. In Proceedings of the 36th Meeting of the Chicago Linguistics Society, 469–482. Chicago: Chicago Linguistic Society.
Smith, A. (1966). Speech and other functions after left (dominant) hemispherectomy. Journal of Neurology, Neurosurgery, and Psychiatry 29, 467–471.
Smith, A. (2017). The languages of Borneo: A comprehensive classification. PhD dissertation, University of Hawaiʻi.
Smith, A., and H. N. Zelaznik (2004). Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology 45(1), 22–33.
Smith, C. (1975). Residual hearing and speech production in the deaf. Journal of Speech and Hearing Research 19, 795–811.
Smith, N. (1973). The Acquisition of Phonology. Cambridge: Cambridge University Press.
Smith, N., and H. Strader (2014). Infant-directed visual prosody: Mothers’ head movements and speech acoustics. Interaction Studies 15(1), 38–54.
Smith, S. L., K. J. Gerhardt, S. K. Griffiths, X. Huang, and R. M. Abrams (2003). Intelligibility of sentences recorded from the uterus of a pregnant ewe and from the fetal inner ear. Audiology and Neuro-Otology 8, 47–353.
Smith, V. L., and H. H. Clark (1993). On the course of answering questions. Journal of Memory and Language 32, 25–38.
Smitheran, J. R., and T. Hixon (1981). A clinical method for estimating laryngeal airway resistance during vowel production. Journal of Speech and Hearing Disorders 46, 138–146.
Smorenburg, L., J. Rodd, and A. Chen (2015). The effect of explicit training on the prosodic production of L2 sarcasm by Dutch learners of English. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Smothermon, J., P. S. Frank, and J. Smothermon (1995). Bosquejo del Macuna: Aspectos de la cultura material de los macunas, fonología, gramática. Bogotá: Instituto Lingüístico de Verano.
Snedeker, J., and S. Yuan (2008). Effects of prosodic and lexical constraints on parsing in young children (and adults). Journal of Memory and Language 58, 574–608.

Snider, K. (1990). Tonal upstep in Krachi: Evidence for a register tier. Language 66, 453–474.
Snow, C. E. (1977). The development of conversation between mothers and babies. Journal of Child Language 4, 1–22.
Snijders, T. M., V. Kooijman, A. Cutler, and P. Hagoort (2007). Neurophysiological evidence of delayed segmentation in a foreign language. Brain Research 1178, 106–113.
Snow, C. E., and C. Ferguson (eds.) (1977). Talking to Children. Cambridge: Cambridge University Press.
Snow, D. (1994). Phrase-final syllable lengthening and intonation in early child speech. Journal of Speech and Hearing Research 37, 831–840.
Snow, D. (1995). Formal regularity of the falling tone in children’s early meaningful speech. Journal of Phonetics 23(4), 387–405.
Snow, D. (1998). Prosodic markers of syntactic boundaries in the speech of 4-year-old children with normal and disordered language development. Journal of Speech, Language, and Hearing Research 41(5), 1158–1170.
Snow, D. (2006). Regression and reorganization of intonation between 6 and 23 months. Child Development 77, 281–296.
Snow, D., and H. L. Balog (2002). Do children produce the melody before the words? A review of developmental intonation research. Lingua 112, 1025–1058.
Snow, D., and D. Ertmer (2012). Children’s development of intonation during the first year of cochlear implant experience. Clinical Linguistics and Phonetics 26(1), 51–70.
So, L. K., and B. Dodd (1995). The acquisition of phonology by Cantonese-speaking children. Journal of Child Language 22(3), 473–495.
Soares, M. F. (2000). O suprasegmental em Tikuna e a teoria fonológica: Vol. 1. Investigação de aspectos da sintaxe Tukuna. Campinas: Editora da UNICAMP.
Sobrino Gómez, C. M. (2010). Las vocales con tono del maya yucateco: Descripción y génesis. MA thesis, Centro de Investigaciones y Estudios Superiores en Antropología Social.
Soderstrom, M. (2007). Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review 27, 501–532.
Soderstrom, M., M. Blossom, I. Foygel, and J. Morgan (2008). Acoustical cues and grammatical units in speech to two preverbal infants. Journal of Child Language 35, 869–902.
Söderström, P., M. Horne, P. Mannfolk, D. van Westen, and M. Roll (2017). Tone-grammar association within words: Concurrent ERP and fMRI show rapid neural pre-activation and involvement of left inferior frontal gyrus in pseudowords. Brain and Language 174, 119–126.
Soderstrom, M., E. S. Ko, and U. Nevzorova (2011). It’s a question? Infants attend differently to yes/no questions and declaratives. Infant Behavior and Development 34(1), 107–110.
Sohmer, H., R. Perez, J. Y. Sichel, R. Priner, and S. Freeman (2001). The pathway enabling external sounds to reach and excite the fetal inner ear. Audiology and Neuro-Otology 6, 109–116.
Sohn, H.-M. (1999/2001). The Korean Language. Cambridge: Cambridge University Press.
Solnit, D. (2003). Eastern Kayah Li. In G. Thurgood and R. J. LaPolla (eds.), The Sino-Tibetan Languages, 623–631. London: Routledge.
Sommerfelt, A. (1922). The Dialect of Torr, Co. Donegal. Christiania: J. Dybwad.
Son, J. (2007). Kankokugo Shohōgen no Akusento Kijutsu. PhD dissertation, University of Tokyo. (Rev. version published 2017, Seoul: Chaek-Sarang.)
Song, J. H., E. Skoe, P. C. Wong, and N. Kraus (2008). Plasticity in the adult human auditory brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10), 1892–1902.
Sorianello, P. (2001). Modelli intonativi dell’interrogazione in una varietà di italiano meridionale (Cosenza). Rivista italiana di dialettologia 25, 85–108.
Soto-Faraco, S., N. Sebastián-Gallés, and A. Cutler (2001). Segmental and suprasegmental mismatch in lexical access. Journal of Memory and Language 45(3), 412–432.
Sparks, B. F., S. D. Friedman, D. W. Shaw, E. H. Aylward, D. Echelard, A. A. Artru, K. R. Maravilla, J. N. Giedd, J. Munson, G. Dawson, and S. R. Dager (2002). Brain structural abnormalities in young children with autism spectrum disorder. Neurology 59(2), 184–192.

Spence, K., G. Villar, and J. Arciuli (2012). Markers of deception in Italian speech. Frontiers in Psychology 3, 453.
Sperry, E. E., and R. I. Klich (1992). Speech breathing in senescent and younger women during oral reading. Journal of Speech and Hearing Research 35, 1246–1255.
Spierings, M., J. Hubert, and C. ten Cate (2017). Selective auditory grouping by zebra finches: Testing the iambic–trochaic law. Animal Cognition 20, 665–675.
Spilker, J., A. Batliner, and E. Nöth (2001). How to repair speech repairs in an end-to-end system. In Proceedings of the International Speech Communication Association Workshop on Disfluency in Spontaneous Speech (DISS’01), 73–76, Edinburgh.
Sporer, S. L., and B. Schwandt (2006). Paraverbal indicators of deception: A meta-analytic synthesis. Applied Cognitive Psychology 20, 421–446.
Spradlin, L. (2016). OMG the word-final alveopalatals are cray-cray prev(alent): The morphophonology of totes constructions in English. University of Pennsylvania Working Papers in Linguistics 22, 30.
Springer, L., K. Willmes, and E. Haag (1993). Training in the use of wh-questions and prepositions in dialogues: A comparison of two different approaches in aphasia therapy. Aphasiology 7, 251–270.
Sproat, R. (1998). Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Dordrecht: Kluwer.
Srebot-Rejec, T. (1988). Word Accent and Vowel Duration in Standard Slovene: An Acoustic and Linguistic Investigation (Slavistische Beiträge 226). Munich: Otto Sagner.
Srinivasan, R. J., and D. W. Massaro (2003). Perceiving prosody from the face and voice: Distinguishing statements from echoic questions in English. Language and Speech 46(1), 1–22.
Stacy, E. (2004). Phonological aspects of Blackfoot prominence. MA thesis, University of Alberta.
Stahl, B., and D. Van Lancker Sidtis (2015). Tapping into neural resources of communication: Formulaic language in aphasia therapy. Frontiers in Psychology 6, 1526.
Stanton, J. (2016). Learnability shapes typology: The case of the midpoint pathology. Language 92, 753–791.
Stark, S., and P. Machin (1977). Stress and tone in Tlacoyalco Popoloca. In W. R. Merrifield (ed.), Studies in Otomanguean Phonology, 69–92. Dallas: Summer Institute of Linguistics.
Starks, D., J. Christie, and L. Thompson (2007). Niuean English: Initial insights into an emerging variety. English World-Wide 28(2), 133–146.
Statistics South Africa (2011). Census (2011). Retrieved 22 May 2020 from http://www.statssa.gov.za/census/census_2011/census_products/Census_2011_Census_in_brief.pdf.
Steedman, M. (2000). Information structure and the syntax-phonology interface. Linguistic Inquiry 31(4), 649–689.
Steedman, M. (2014). The surface-compositional semantics of English intonation. Language 90, 2–57.
Steele, J. (1779). Prosodia Rationalis: Or, an Essay Towards Establishing the Melody and Measure of Speech to be Expressed and Perpetuated by Peculiar Symbols. London: J. Nichols.
Steffen-Batóg, M. (1996). Struktura przebiegu melodii języka polskiego ogólnego. Poznań: SORUS.
Steffen-Batóg, M. (2000). Struktura akcentowa języka polskiego. Warsaw: Wydawnictwo Naukowe PWN.
Stehwien, S., and N. T. Vu (2017). Prosodic event recognition using convolutional neural networks with context information. In INTERSPEECH 2017, 2326–2330, Stockholm.
Stein, B. E. (2012). The New Handbook of Multisensory Processing. Cambridge, MA: MIT Press.
Steiner, R. C. (1997). Ancient Hebrew. In R. Hetzron (ed.), The Semitic Languages, 145–173. London: Routledge.
Stenzel, K. (2007). Glottalization and other suprasegmental features in Wanano. International Journal of American Linguistics 73(3), 331–366.
Stenzel, K., and D. Demolin (2013). Traços Laringais em Kotiria e Waikhana (Tukano Oriental). In G. Collisschonn and L. Bisol (eds.), Fonologia: Teoria e Perspectivas—Anais do IV Seminario Internacional de Fonologia, 77–100. Porto Alegre: PURS.
Steriade, D. (1988). Reduplication and syllable transfer in Sanskrit and elsewhere. Phonology 5, 73–155.

Steriade, D. (1994). Complex onsets as single segments: The Mazateco pattern. In J. Cole and C. Kisseberth (eds.), Perspectives in Phonology, 203–291. Stanford: CSLI.
Stern, D. N., S. Spieker, R. K. Barnett, and K. MacKain (1983). The prosody of maternal speech: Infant age and context related changes. Journal of Child Language 10(1), 1–15.
Stern, D. N., S. Spieker, and K. MacKain (1982). Intonation contours as signals in maternal speech to prelinguistic infants. Developmental Psychology 18(5), 727–735.
Stevanović, M. (1989). Savremeni Srpskohrvatski jezik. Beograd: Naucna Knjiga.
Stewart, J. (1965). The typology of the Twi tone system. Bulletin of the Institute of African Studies 1, 1–27.
Stoakes, H., J. Fletcher, and A. Butcher (2019). Nasal coarticulation in Bininj Kunwok: An aerodynamic analysis. Journal of the International Phonetic Association. doi: 10.1017/S0025100318000282.
Stockmal, V., D. Markus, and D. Bond (2005). Measures of native and non-native rhythm in a quantity language. Language and Speech 48(1), 55–63.
Stoel, R. B. (2005). Focus in Manado Malay. Leiden: CNWS.
Stoel, R. B. (2006). The intonation of Banyumas Javanese. In Proceedings of Speech Prosody 3. Dresden.
Stoel, R. B. (2008). Fataluku as a tone language. In P. Sidwell and U. Tadmor (eds.), SEALS XVI: Papers from the 16th Annual Meeting of the Southeast Asian Linguistics Society, 75–83. Canberra: Pacific Linguistics.
Stojkov, S. (1966). Uvod văv fonetikata na bălgarskija ezik. Sofia: Nauka i izkustvo.
Stokoe, W. C., Jr. (1960). Sign language structure: An outline of the visual communication system of the American deaf. Journal of Deaf Studies and Deaf Education 10(1), 3–37.
Stolcke, A., K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer (2000). Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics 26(3), 339–373.
Stolcke, A., E. Shriberg, D. Hakkani-Tür, and G. Tür (1999). Modeling the prosody of hidden events for improved word recognition. In Eurospeech 1999, 307–310, Budapest.
Stone, M. (2010). Laboratory techniques for investigating speech articulation. In W. J. Hardcastle, J. Laver, and F. E. Gibbon (eds.), The Handbook of Phonetic Sciences (2nd ed.), 9–38. Hoboken: John Wiley & Sons.
Stonham, J. (1994). Combinatorial Morphology. Amsterdam: John Benjamins.
Storto, L., and D. Demolin (2005). Pitch accent in Karitiana. In S. Kaji (ed.), Proceedings of the Symposium Cross Linguistic Studies of Tonal Phenomena: Historical development, tone-syntax interface, and descriptive studies, 329–355. Tokyo: Research Institute for Language and Culture of Asia and Africa, Tokyo University of Foreign Studies.
Storto, L., and D. Demolin (2012). The phonetics and phonology of South American languages. In L. Campbell and V. Grondona (eds.), The Indigenous Languages of South America: A Comprehensive Guide, 331–390. Berlin: Mouton de Gruyter.
Strangert, E. (1985). Swedish Speech Rhythm in a Cross-Language Perspective (Acta Universitatis Umensis 69). Stockholm: Almqvist and Wiksell International.
Street, R. L. (1984). Speech convergence and speech evaluation in fact-finding interviews. Human Communication Research 11(2), 139–169.
Streeter, L., R. M. E. Krauss, V. Geller, C. Olson, and W. Apple (1977). Pitch changes during attempted deception. Journal of Personality and Social Psychology 35(5), 345–350.
Streit, M., A. Batliner, and T. Portele (2006). Emotion analysis and emotion-handling subdialogues. In W. Wahlster (ed.), SmartKom: Foundations of Multimodal Dialogue Systems, 317–332. Berlin: Springer.
Strik, H., and L. Boves (1995). Downtrend in F0 and Psb. Journal of Phonetics 23, 203–220.
Stringer, A. Y. (1996). Treatment of motor aprosodia with pitch biofeedback and expression modeling. Brain Injury 10(8), 583–590.
Stroomer, H. (1987). A Comparative Study of Three Southern Oromo Dialects in Kenya. Hamburg: Buske.
Strycharczuk, P., and K. Sebregts (2018). Articulatory dynamics of (de)gemination in Dutch. Journal of Phonetics 68, 138–149.
Stundžia, B. (2014). Bendrinės lietuvių kalbos akcentologija. Vilnius: Vilniaus Universitetas.

Suga, N. (2008). Role of corticofugal feedback in hearing. Journal of Comparative Physiology A 194(2), 169–183.
Sullivant, J. R. (2015). The phonology and inflectional morphology of Cháʔknyá, Tataltepec de Valdés Chatino, a Zapotecan language. PhD dissertation, University of Texas at Austin.
Sulpizio, S., and J. M. McQueen (2012). Italians use abstract knowledge about lexical stress during spoken-word recognition. Journal of Memory and Language 66, 177–193.
Sumby, W. H., and I. Pollack (1954). Visual contribution to speech intelligibility. Journal of the Acoustical Society of America 26, 212–215.
Summerfield, Q. (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance 7, 1074–1095.
Summers, W. V. (1987). Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. Journal of the Acoustical Society of America 82, 847–863.
Suomi, K. (2009). Durational elasticity for accentual purposes in Northern Finnish. Journal of Phonetics 37(4), 397–416.
Suomi, K., J. Toivanen, and R. Ylitalo (2003). Durational and tonal correlates of accent in Finnish. Journal of Phonetics 31(1), 113–138.
Suomi, K., J. Toivanen, and R. Ylitalo (2008). Finnish Sound Structure: Phonetics, Phonology, Phonotactics and Prosody. Oulu: University of Oulu.
Suomi, K., and R. Ylitalo (2004). On durational correlates of word stress in Finnish. Journal of Phonetics 32(1), 35–63.
Surányi, B., S. Ishihara, and F. Schubö (2012). Syntax-prosody mapping, topic-comment structure and stress-focus correspondence in Hungarian. In G. Elordieta and P. Prieto (eds.), Prosody and Meaning (Interface Explorations 25), 35–72. Berlin: De Gruyter Mouton.
Suslak, D. (2003). A grammar of 7anyükojmit 7ay2:k (aka ‘Totontepecano Mije’ aka ‘TOT’). PhD dissertation, University of Chicago.
Sussex, R., and P. Cubberley (2006). The Slavic Languages. Cambridge: Cambridge University Press.
Šuštaršič, R. (1995). Pitch and tone in English and Slovene. Linguistica 35(2), 91–106.
Sutcliffe, D. (2003). Eastern Caribbean suprasegmental systems: A comparative view, with particular reference to Barbadian, Trinidadian, and Guyanese. In M. Aceto and J. P. Williams (eds.), Contact Englishes of the Eastern Caribbean, 265–296. Amsterdam: John Benjamins.
Sutcliffe, E. F. (1936). A Grammar of the Maltese Language. Oxford: Oxford University Press.
Suwilai, P. (2004). Register complex and tonogenesis in Khmu dialects. Mon-Khmer Studies 34, 1–17.
Svantesson, J.-O., and D. House (2006). Tone production, tone perception and Kammu tonogenesis. Phonology 23, 309–333.
Svantesson, J.-O., A. Tsendina, A. Karlsson, and V. Franzén (2005). The Phonology of Mongolian. Oxford: Oxford University Press.
Svetozarova, N. D. (1982). Intonacionnaja sistema russkogo jazyka. St Petersburg: Izdatelstvo Leningradskogo Universiteta.
Svetozarova, N. D. (1998). Intonation in Russian. In D. Hirst and A. Di Cristo (eds.), Intonation Systems: A Survey of Twenty Languages, 261–274. Cambridge: Cambridge University Press.
Swangviboonpong, D. (2004). Thai Classical Singing: Its History, Musical Characteristics and Transmission. Aldershot, UK: Routledge.
Swerts, M. (1997). Prosodic features at discourse boundaries of different strength. Journal of the Acoustical Society of America 101, 514–521.
Swerts, M. (2007). Contrast and accent in Dutch and Romanian. Journal of Phonetics 35(3), 380–397.
Swerts, M., and E. Krahmer (2005). Audiovisual prosody and feeling of knowing. Journal of Memory and Language 53(1), 81–94.
Swerts, M., and E. Krahmer (2008). Facial expression and prosodic prominence: Effects of modality and facial area. Journal of Phonetics 36(2), 219–238.
Swerts, M., and E. Krahmer (2010). Visual prosody of newsreaders: Effects of information structure, emotional content and intended audience on facial expressions. Journal of Phonetics 38(2), 197–206.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Swerts, M., E. Krahmer, and C. Avesani (2002). Prosodic marking of information status in Dutch and Italian: A comparative analysis. Journal of Phonetics 30(4), 629–654.
Swerts, M., and S. Zerbian (2010). Intonational differences between L1 and L2 English in South Africa. Phonetica 67(3), 127–146.
Swihart, D. A. W. (2003). The two Mandarins: Putonghua and Guoyu. Journal of the Chinese Language Teachers Association 38, 103–118.
Syrdal, A. K., J. Hirschberg, J. McGory, and M. Beckman (2001). Automatic ToBI prediction and alignment to speed manual labeling of prosody. Speech Communication 33(1–2), 135–151.
Szalontai, Á., P. Wagner, K. Mady, and A. Windmann (2016). Teasing apart lexical stress and sentence accent in Hungarian and German. In Proceedings of 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum, 215–218, Munich.
Sze, F. (2009). Topic constructions in Hong Kong Sign Language. Sign Language and Linguistics 12(2), 222–227.
Szendröi, K. (2003). A stress-based approach to the syntax of Hungarian focus. The Linguistic Review 20(1), 37–78.
Szendröi, K., C. Bernard, F. Berger, J. Gervain, and B. Höhle (2018). Acquisition of prosodic focus marking by English, French and German 3-, 4-, 5- and 6-year-olds. Journal of Child Language 45, 219–241.
’t Hart, J. (1981). Differential sensitivity to pitch distance, particularly in speech. Journal of the Acoustical Society of America 69, 811–821.
’t Hart, J. (1991). F0 stylization in speech: Straight lines versus parabolas. Journal of the Acoustical Society of America 90(6), 3368.
’t Hart, J., R. Collier, and A. Cohen (1990). A Perceptual Study of Intonation: An Experimental-Phonetic Approach. Cambridge: Cambridge University Press.
Tabain, M. (2003). Effects of prosodic boundary on /aC/ sequences: Articulatory results. Journal of the Acoustical Society of America 113, 2834–2849.
Tabain, M., and R. Beare (2018). An ultrasound study of coronal places of articulation in Central Arrernte: Apicals, laminals and rhotics. Journal of Phonetics 66, 63–81.
Tabain, M., J. Fletcher, and A. Butcher (2014). Lexical stress in Pitjantjatjara. Journal of Phonetics 42, 52–66.
Tabain, M., and B. Hellwig (2015). Goemai. Journal of the International Phonetic Association 45, 88–104.
Tabain, M., and P. Perrier (2005). Articulation and acoustics of /i/ in preboundary position in French. Journal of Phonetics 33, 77–100.
Tabossi, P., S. Collina, M. Mazzetti, and M. Zoppello (2000). Syllables in the processing of spoken Italian. Journal of Experimental Psychology: Human Perception and Performance 26(2), 758–775.
Taelman, H., and S. Gillis (2003). Hebben Nederlandse kinderen een voorkeur voor trochaïsche productievormen? Een onderzoek naar truncaties in kindertaal. Nederlandse Taalkunde 8, 130–157.
Taff, A. (1992). Assigning primary stress in Aleut. In Working Papers in Linguistics vol. 10, 275–282. Seattle: University of Washington.
Taff, A. (1999). Phonetics and phonology of Unangan (Eastern Aleut) intonation. PhD dissertation, University of Washington.
Taff, A., L. Rozelle, T. Cho, P. Ladefoged, M. Dirks, and J. Wegelin (2001). Phonetic structures of Aleut. Journal of Phonetics 29(3), 231–271.
Tager-Flusberg, H. (1981). On the nature of linguistic functioning in early infantile autism. Journal of Autism and Developmental Disorders 11, 45–49.
Tager-Flusberg, H., S. Calkins, T. Nolin, T. Baumberger, M. Anderson, and A. Chadwick-Dias (1990). A longitudinal study of language acquisition in autistic and Down syndrome children. Journal of Autism and Developmental Disorders 20, 1–21.
Tagliapietra, L., and J. M. McQueen (2010). What and where in speech recognition: Geminates and singletons in spoken Italian. Journal of Memory and Language 63, 306–323.
Tallal, P., R. Ross, and S. Curtiss (1989). Familial aggregation in specific language impairment. Journal of Speech and Hearing Disorders 54(2), 167–173.

Tan, Y. Y. (2002). The acoustic and perceptual properties of stress in the ethnic sub-varieties of Singapore English. PhD dissertation, National University of Singapore.
Tan, Y. Y. (2010). Singing the same tune? Prosodic norming in bilingual Singaporeans. In M. C. Ferreira (ed.), Multilingual Norms, 173–194. Frankfurt: Peter Lang.
Tanese-Ito, Y. (1988). The relationship between speech-tones and vocal melody in Thai court song. Musica Asiatica 5, 109–139.
Tang, P., N. Xu Rattanasone, I. Yuen, and K. Demuth (2017). Phonetic enhancement of Mandarin vowels and tones: Infant-directed speech and Lombard speech. Journal of the Acoustical Society of America 142, 493–503.
Tanner, M., and J. Landon (2009). The effects of computer-assisted pronunciation readings on ESL learners’ use of pausing, stress, intonation, and overall comprehensibility. Language Learning and Technology 13(3), 51–65.
Tarlinskaja, M. (2014). Shakespeare and the Versification of English Drama 1561–1622. Farnham: Ashgate.
Tarone, E. (1973). Aspects of intonation in Black English. American Speech 48(1–2), 29–36.
Tavano, A., and M. Scharinger (2015). Prediction in speech and language processing. Cortex 68, 1–7.
Tay, M. W. J., and A. F. Gupta (1983). Towards a description of standard Singapore English. In R. B. Noss (ed.), Varieties of English in Southeast Asia, 173–189. Singapore: RELC.
Taylor, P. (2000). Analysis and synthesis of intonation using the Tilt model. Journal of the Acoustical Society of America 107, 1697–1714.
Tedlock, D. (1983). The Spoken Word and the Work of Interpretation. Philadelphia: University of Pennsylvania Press.
Tejada, L. (2012). Tone gestures and constraint interaction in Sierra Juarez Zapotec. PhD dissertation, University of Southern California.
Telles, S. (2002). Fonologia e Gramática Latundê/Lakondê. PhD dissertation, Vrije Universiteit Amsterdam.
Temperley, D. (2009). Distributional stress regularity: A corpus study. Journal of Psycholinguistic Research 38(1), 75.
Tench, P. (1996). The Intonation Systems of English. London: Cassell.
Tent, J., and F. Mugler (2008). Fiji English: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English: Vol. 3. The Pacific and Australasia, 234–266. Berlin: Mouton de Gruyter.
Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America 55, 1061–1069.
Terhardt, E., G. Stoll, and M. Seewann (1982). Pitch of complex signals according to virtual-pitch theory: Tests, examples, and predictions. Journal of the Acoustical Society of America 71, 671–678.
Terken, J. (1995). The perceptual relevance of micro-intonation: Enhancing the voicing distinction in synthetic speech by means of consonantal f0 perturbation. In L. Hunyadi, M. Gósy, and G. Olaszy (eds.), Studies in Applied Linguistics, vol. 2, 103–124. Hungary: University of Debrecen.
Terken, J., and D. J. Hermes (2000). The perception of prosodic prominence. In M. Horne (ed.), Prosody: Theory and Experiment: Studies Presented to G. Bruce, 89–128. Dordrecht: Kluwer.
Ternes, E. (1992). The Breton language. In D. MacAulay (ed.), The Celtic Languages, 371–442. Cambridge: Cambridge University Press.
Ternes, E. (2006). The Phonemic Analysis of Scottish Gaelic (3rd ed.). Dublin: Dublin Institute for Advanced Studies.
Ternes, E. (2011). Balto-Slavic accentology in a European context. In E. Stadnik-Holzer (ed.), Baltische und slavische Prosodie: International Workshop on Balto-Slavic Accentology IV (Scheibbs, 2–4 July 2008), 169–186. Frankfurt: Peter Lang.
Tevdoradze, I. (1978). Kartuli enis p’rosodiis sak’itxebi. Tbilisi: Tbilisi State University Press.
Themistocleous, C. (2014). Edge-tone effects and prosodic domain effects on final lengthening. Linguistic Variation 14(1), 129–160.
Themistocleous, C. (2016). Seeking an anchorage: Stability and variability in tonal alignment of rising prenuclear pitch accents in Cypriot Greek. Language and Speech 59(4), 433–461.

Thiessen, E. D., E. A. Hill, and J. R. Saffran (2005). Infant-directed speech facilitates word segmentation. Infancy 7, 53–71.
Thiessen, E. D., and J. R. Saffran (2003). When cues collide: Use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology 39(4), 706–716.
Thiesen, W., and D. Weber (2012). A Grammar of Bora with Special Attention to Tone (SIL International Publications in Linguistics 148). Dallas: SIL International.
Thomas, C. H. (1967). Welsh intonation: A preliminary study. Studia Celtica 2, 8–28.
Thomas, D. (1992). On sesquisyllabic structure. Mon-Khmer Studies 21, 207–210.
Thomas, E. (1974). Engenni. In J. Bendor-Samuel (ed.), Ten Nigerian Tone Systems (Studies in Nigerian Languages 4), 13–26. Dallas: Summer Institute of Linguistics/University of Texas Press.
Thomas, E. (1978). A Grammatical Description of the Engenni Language. Arlington: University of Texas/Summer Institute of Linguistics.
Thomas, E. R. (2007). Phonological and phonetic characteristics of AAVE. Language and Linguistics Compass 1, 450–475.
Thomas, E. R. (2015). Prosodic features of African American English. In S. L. Lanehart (ed.), The Oxford Handbook of African American Language, 420–435. New York: Oxford University Press.
Thomas, E. R., and H. A. Ericson (2007). Intonational distinctiveness of Mexican American English. University of Pennsylvania Working Papers in Linguistics 13(2), 193–205.
Thomason, J., H. V. Nguyen, and D. Litman (2013). Prosodic entrainment and tutoring dialogue success. In H. C. Lane, K. Yacef, J. Mostow, and P. Pavlik (eds.), Artificial Intelligence in Education: AIED 2013, 750–753. Berlin: Springer.
Thomason, L., and S. Thomason (2004). Truncation in Montana Salish. In D. B. Gerdts and L. Matthewson (eds.), Studies in Salish Linguistics in Honor of M. Dale Kinkade (University of Montana Occasional Papers in Linguistics 17), 354–376. Missoula: University of Montana.
Thompson, L. (1965). A Vietnamese Grammar. Seattle: University of Washington Press.
Thompson, W. F., V. Peter, K. N. Olsen, and C. J. Stevens (2012). The effect of intensity on relative pitch. Quarterly Journal of Experimental Psychology 65, 2054–2072.
Thomson, R. I. (2012). Improving L2 listeners’ perception of English vowels: A computer-mediated approach. Language Learning 62(4), 1231–1258.
Thomson, R. I., and T. M. Derwing (2015). The effectiveness of L2 pronunciation instruction: A narrative review. Applied Linguistics 36(3), 326–344.
Thorndike, R. L. (1953). Who belongs in the family? Psychometrika 18, 267–276.
Thornton, A., C. Iacobini, and C. Burani (1997). BDVDB: Una base dati per il vocabolario di base della lingua italiana (2nd ed.). Rome: Bulzoni.
Thorsen, N. (1980). A study of the perception of sentence intonation: Evidence from Danish. Journal of the Acoustical Society of America 67, 1014–1030.
Thorson, J., J. Borras-Comes, V. Crespo-Sendra, M. M. Vanrell, and P. Prieto (2014). The acquisition of melodic form and meaning in yes-no interrogatives by Catalan and Spanish speaking children. Probus 26(1), 59–82.
Thurgood, G. (1999). From Ancient Cham to Modern Dialects: Two Thousand Years of Language Contact and Change. Honolulu: University of Hawaiʻi Press.
Thurgood, G., E. Fengxiang, and L. Fengxiang (2015). A Grammatical Sketch of Hainan Cham: History, Contact, and Phonology. Berlin: Mouton de Gruyter.
Thurgood, G., and R. J. LaPolla (2003). The Sino-Tibetan Languages. London: Routledge.
Tiersma, P. M. (1999). Frisian Reference Grammar. Leeuwarden: Fryske Akademy.
Tilkov, D. (1981). Intonacijata v bălgarskija ezik. Sofija: Narodna Prosveta.
Tilsen, S. (2009). Multitimescale dynamical interactions between speech rhythm and gesture. Cognitive Science 33(5), 839–879.
Tilsen, S. (2017). Exertive modulation of speech and articulatory phasing. Journal of Phonetics 64, 34–50.
To, C. K., P. S. Cheung, and S. McLeod (2013). A population study of children’s acquisition of Hong Kong Cantonese consonants, vowels, and tones. Journal of Speech, Language, and Hearing Research 56(1), 103–122.

Toda, S., A. Fogel, and M. Kawai (1990). Maternal speech to three-month-old infants in the United States and Japan. Journal of Child Language 17, 279–294.
Toivanen, J. (2005). ToBI or not ToBI? Testing two models in teaching English intonation to Finns. In Proceedings of the Phonetics Teaching and Learning Conference 2005, 1–4, London.
Tomaschek, F., D. Arnold, F. Bröker, and R. H. Baayen (2018). Lexical frequency co-determines the speed-curvature relation in articulation. Journal of Phonetics 68, 103–116.
Tomblin, J. B., N. Records, P. Buckwalter, X. Zhang, E. Smith, and M. O’Brien (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech and Hearing Research 40, 1245–1260.
Tomioka, S. (2010). A scope theory of contrastive topics. Iberia: An International Journal of Theoretical Linguistics 2(1), 113–130.
Topintzi, N. (2010). Onsets: Suprasegmental and Prosodic Behaviour. Cambridge: Cambridge University Press.
Topintzi, N., and A. Nevins (2017). Moraic onsets in Arrernte. Phonology 34(3), 615–650.
Torppa, R., A. Faulkner, M. Huotilainen, J. Järvikivi, J. Lipsanen, M. Laasonen, and M. Vainio (2014). The perception of prosody and associated auditory cues in early-implanted children: The role of auditory working memory and musical activities. International Journal of Audiology 53(3), 182–191.
Torreira, F., and M. Grice (2018). Melodic constructions in Spanish: Metrical structure determines the association properties of intonational tones. Journal of the International Phonetic Association 48(1), 9–32.
Towle, V. L., H. A. Yoon, M. Castelle, J. C. Edgar, N. M. Biassou, D. M. Frim, J.-P. Spire, and M. H. Kohrman (2008). ECoG gamma activity during a language task: Differentiating expressive and receptive speech areas. Brain 131(8), 2013–2027.
Trager, G. L., and H. L. Smith (1951). An Outline of English Structure. Norman, OK: Battenburg Press.
Traill, A. (1985). Phonetic and Phonological Studies of !Xóõ Bushman: Vol. 1. Quellen zur Khoisan-Forschung. Hamburg: Helmut Buske.
Trainor, L. J., C. M. Austin, and R. N. Desjardins (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science 11(3), 188–195.
Trần, H. M. (1967). Tones and intonation in South Vietnamese. In Đ. L. Nguyễn, H. M. Trần and D. Dellinger, Series A Occasional Papers 9: Papers in Southeast Asian Linguistics, 19–34. Canberra: Linguistics Circle of Canberra.
Trần, T. T. H., and N. Vallée (2009). An acoustic study of interword consonant sequences in Vietnamese. Journal of the Southeast Asian Linguistics Society 2009, 231–249.
Tranel, D. (1992). Neuropsychological correlates of cortical and subcortical damage. In S. C. Yudofsky and R. E. Hales (eds.), Textbook of Neuropsychiatry, vol. 2, 57–88. Washington, DC: American Psychiatric Press.
Traunmüller, H. (1987). Some aspects of the sound of speech sounds. In M. E. H. Schouten (ed.), The Psychophysics of Speech Perception, 293–305. Dordrecht: Martinus Nijhoff.
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. Journal of the Acoustical Society of America 88, 97–100.
Traunmüller, H., and A. Eriksson (2000). Acoustic effects of variation in vocal effort by men, women and children. Journal of the Acoustical Society of America 107, 3438–3451.
Tremblay, A. (2008). Is second language lexical access prosodically constrained? Processing of word stress by French Canadian second language learners of English. Applied Psycholinguistics 29, 553–584.
Tremblay, A. (2009). Phonetic variability and the variable perception of L2 word stress by French Canadian listeners. International Journal of Bilingualism 13, 35–62.
Tremblay, A., M. Broersma, C. E. Coughlin, and J. Choi (2016). Effects of the native language on the learning of fundamental frequency in second-language speech segmentation. Frontiers in Psychology 7, 985.
Tremblay, A., M. Broersma, and C. E. Coughlin (2018). The functional weight of a prosodic cue in the native language predicts the learning of speech segmentation in a second language. Bilingualism: Language and Cognition 21, 640–652.
Tremblay, A., and N. Owens (2010). The role of acoustic cues in the development of (non-)target-like second-language prosodic representations. Canadian Journal of Linguistics 55, 84–114.

Trimboli, A., and M. B. Walker (1987). Nonverbal dominance in the communication of affect: A myth? Journal of Nonverbal Behavior 11(3), 180–190.
Trinh, T., and L. Crnič (2011). On the rise and fall of declaratives. In I. Reich, E. Horch, and D. Pauly (eds.), Proceedings of Sinn und Bedeutung (SuB), vol. 15, 1–16. Saarbrücken: Saarland University Press.
Trofimovich, P., and W. Baker (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28(1), 1–30.
Trompenaars, F., and C. Hampden-Turner (1998). Riding the Waves of Culture: Understanding Diversity in Global Business (2nd ed.). London: Nicholas Brealey.
Trouvain, J. (2004). Tempo variation in speech production. PhD dissertation, Saarland University.
Trouvain, J., C. Fauth, and B. Möbius (2016). Breath and non-breath pauses in fluent and disfluent phases of German and French L1 and L2 read speech. In Proceedings of Speech Prosody 8, 31–35, Boston.
Trouvain, J., and M. Grice (1999). The effect of tempo on prosodic structure. In Proceedings of the 14th International Congress of Phonetic Sciences, 1067–1070, San Francisco.
Trouvain, J., and B. Möbius (2014). Sources of variation of articulation rate in native and non-native speech: Comparisons of French and German. In Proceedings of Speech Prosody 7, 275–279, Dublin.
Trouvain, J., F. Zimmerer, B. Möbius, M. Gósy, and A. Bonneau (2017). Segmental, prosodic and fluency features in phonetic learner corpora: Introduction to special issue. International Journal of Learner Corpus Research 3(2), 105–118.
Trubetzkoy, N. (1939/1969). Principles of Phonology (trans. C. A. M. Baltaxe). Berkeley: University of California Press.
Truckenbrodt, H. (1995). Phonological phrases: Their relation to syntax, focus, and prominence. PhD dissertation, MIT.
Truckenbrodt, H. (2002). Upstep and embedded register levels. Phonology 19, 77–120.
Truckenbrodt, H. (2004). Final lowering in non-final position. Journal of Phonetics 32, 313–348.
Truckenbrodt, H. (2006). On the semantic motivation of syntactic verb movement to C in German. Theoretical Linguistics 32(3), 257–306.
Truckenbrodt, H. (2007). Upstep on edge tones and on nuclear accents. In C. Gussenhoven and T. Riad (eds.), Tones and Tunes: Vol. 2. Experimental Studies in Word and Sentence Prosody, 349–386. Berlin: Mouton de Gruyter.
Truckenbrodt, H. (2012). Semantics of intonation. In C. Maienborn, K. von Heusinger, and P. Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 2039–2069. Berlin: De Gruyter.
Truckenbrodt, H. (2016). Focus, intonation, and tonal height. In C. Féry and S. Ishihara (eds.), The Oxford Handbook of Information Structure, 463–482. Oxford: Oxford University Press.
Trueswell, J. C., and M. Tanenhaus (2005). Approaches to Studying World-Situated Language Use. Cambridge, MA: MIT Press.
Trumper, J., L. Romito, and M. Maddalon (1991). Double consonants, isochrony and ‘raddoppiamento fonosintattico’: Some reflections. In P. M. Bertinetto, M. Kenstowicz, and M. Loporcaro (eds.), Certamen Phonologicum II: Papers from the 1990 Cortona Phonology Meeting, 329–360. Torino: Rosenberg & Sellier.
Tsao, F.-M., H.-M. Liu, and P. K. Kuhl (2004). Speech perception in infancy predicts language development in the second year of life: A longitudinal study. Child Development 75(4), 1067–1084.
Tsao, F.-M. (2008). The effect of acoustical similarity on lexical-tone perception of one-year-old Mandarin-learning infants. Chinese Journal of Psychology 50(2), 111–124.
Tsao, F.-M. (2017). Perceptual improvement of lexical tones in infants: Effects of tone language experience. Frontiers in Psychology 8, 558.
Tse, J. K. P. (1978). Tone acquisition in Cantonese: A longitudinal case study. Journal of Child Language 5(2), 191–204.

Tseng, C.-Y., S.-H. Pin, Y. Lee, H.-M. Wang, and Y.-C. Chen (2005). Fluent speech prosody: Framework and modeling. Speech Communication 46, 284–309.
Tseplitis, L. K. (1974). Analiz rečevoj intonatsii. Riga: Zinatne.
Tucker, A. N. (1981). The conceptions downdrift, upsweep, downstep, upstep, uphitch in African languages. In H. Jungraithmayr and D. Miehe (eds.), Berliner afrikanistische Vorträge, vol. 21, 223–224. Berlin: Deutscher Orientalistentage.
Tuggy, D. (1979). Tetelcingo Nahuatl. In R. Langacker (ed.), Studies in Uto-Aztecan Grammar: Vol. 2. Modern Aztec Grammatical Sketches, 1–140. Arlington: Summer Institute of Linguistics.
Tuite, K. (1993). The production of gesture. Semiotica 93(1–2), 83–106.
Turco, G., C. Dimroth, and B. Braun (2015). Prosodic and lexical marking of contrast in L2 Italian. Second Language Research 31, 465–491.
Turcsan, G. (2005). Le mot phonologique en français du Midi: Domaines, contraintes, opacité. PhD dissertation, University of Toulouse-Le Mirail.
Turk, A. E. (1992). The American English flapping rule and the effect of stress on stop consonant durations. Working Papers of the Cornell Phonetics Laboratory 7, 103–133.
Turk, A. E. (2011). The temporal implementation of prosodic structure. In A. Cohn, C. Fougeron, and M. Huffman (eds.), The Oxford Handbook of Laboratory Phonology, 242–253. Oxford: Oxford University Press.
Turk, A. E., S. Nakai, and M. Sugahara (2006). Acoustic segment durations in prosodic research: A practical guide. In S. Sudhoff, D. Lenertová, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, and J. Schliesser (eds.), Methods in Empirical Prosody Research, 1–28. Berlin, New York: De Gruyter.
Turk, A. E., and J. R. Sawusch (1996). The processing of duration and intensity cues to prominence. Journal of the Acoustical Society of America 99(6), 3782–3790.
Turk, A. E., and J. R. Sawusch (1997). The domain of accentual lengthening in American English. Journal of Phonetics 25, 25–41.
Turk, A. E., and S. Shattuck-Hufnagel (2000). Word-boundary-related duration patterns in English. Journal of Phonetics 28, 397–440.
Turk, A. E., and S. Shattuck-Hufnagel (2007). Multiple targets of phrase-final lengthening in American English words. Journal of Phonetics 35, 445–472.
Turk, A. E., and S. Shattuck-Hufnagel (2013). What is speech rhythm? A commentary on Arvaniti and Rodriguez, Krivokapić, and Goswami and Leong. Laboratory Phonology 4(1), 93–118.
Turk, A. E., and S. Shattuck-Hufnagel (2014). Timing in talking: What is it used for, and how is it controlled? Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130395.
Turk, A. E., and S. Shattuck-Hufnagel (2020). Speech Timing. Oxford: Oxford University Press.
Turk, A. E., and L. White (1999). Structural influences on accentual lengthening in English. Journal of Phonetics 27, 171–206.
Turnbull, R. (2017). The phonetics and phonology of lexical prosody in San Jerónimo Acazulco Otomi. Journal of the International Phonetic Association 47(3), 251–282.
Turpin, M. (2008). Text, rhythm and metrical form in an Aboriginal song series. In INTERSPEECH 2008, 96–98, Brisbane.
Turvey, M. T. (1990). Coordination. American Psychologist 45(8), 938–953.
Tuttle, S. (2005). Duration, intonation and prominence in Apache. In S. Hargus and K. Rice (eds.), Athabaskan Prosody, 319–344. Amsterdam: John Benjamins.
Twaha, A. I. (2017). Intonational phonology and focus in two varieties of Assamese. PhD dissertation, Indian Institute of Technology Guwahati.
Tyler, J., and R. S. Burdin (2016). Epistemic and attitudinal meanings of rise and rise-plateau contours. In Proceedings of Speech Prosody 8, 128–132, Boston.
Uchihara, H. (2016). Tone and Accent in Oklahoma Cherokee. New York: Oxford University Press.
Ueyama, M., and S.-A. Jun (1998). Focus realization in Japanese English and Korean English intonation. Japanese and Korean Linguistics 7, 629–645.
Ulbrich, C. (2005). Phonetische Untersuchungen zur Prosodie der Standardvarietäten des Deutschen in der Bundesrepublik Deutschland, in der Schweiz und in Österreich. Frankfurt: Peter Lang.

Ulbrich, C. (2013). German pitches in English: Production and perception of cross-varietal differences in L2. Bilingualism: Language and Cognition 16(2), 397–419.
Ulbrich, C., and I. Mennen (2016). When prosody kicks in: The intricate interplay between segments and prosody in perceptions of foreign accent. International Journal of Bilingualism 20(5), 522–549.
Umbach, C. (2001). Contrast and contrastive topic. In Proceedings of the ESSLLI 2001 Workshop on Information Structure, Discourse Structure and Discourse Semantics, 175–188, Helsinki.
Umeda, N. (1978). The occurrence of glottal stops. Journal of the Acoustical Society of America 64(1), 88–94.
Ünal-Logacev, Ö., S. Fuchs, and L. Lancia (2018). A multimodal approach to the voicing contrast in Turkish: Evidence from simultaneous measures of acoustics, intraoral pressure and tongue palatal contacts. Journal of Phonetics 71, 395–409.
Urbanczyk, S. (2001). Patterns of Reduplication in Lushootseed. New York: Garland.
Urua, E.-A. (2000). Ibibio Phonetics and Phonology. Cape Town: Centre for Advanced Studies of African Society.
Uwano, Z. (1997). Fukugō meishi kara mita nihongo shohōgen no akusento. In T. Kunihiro, H. Hirose, and M. Kohno (eds.), Accent, Intonation, Rhythm and Pause, 231–270. Tokyo: Sanseidō.
Uwano, Z. (1999). Classification of Japanese accent systems. In S. Kaji (ed.), Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: Tonogenesis, Typology and Related Topics, 151–186. Tokyo: Research Institute for Language and Cultures of Asia and Africa, Tokyo University of Foreign Studies.
Uwano, Z. (2012). Three types of accent kernels in Japanese. Lingua 122(13), 1415–1440.
Vago, R. M. (1980). The Sound Pattern of Hungarian. Washington, DC: Georgetown University Press.
Vahidian-Kamyar, T. (2001). Næva-ye goftar dær Farsi. Mashhad: Ferdowsi University Press.
Vainio, M., M. Airas, J. Järvikivi, and P. Alku (2010). Laryngeal voice quality in the expression of focus. In INTERSPEECH 2010, 921–924, Makuhari.
Vainio, M., and J. Järvikivi (2007). Focus in production: Tonal shape, intensity and word order. Journal of the Acoustical Society of America, EL55–61.
Vaissière, J. (1988). The use of prosodic parameters in automatic speech recognition. In H. Niemann, M. Lang, and G. Sagerer (eds.), Recent Advances in Speech Understanding and Dialog Systems, 71–99. Berlin: Springer.
Vaissière, J. (1989). On automatic extraction of prosodic information for automatic speech recognition systems. In Eurospeech 1989, 202–205, Paris.
Vajda, E. J. (2000). Ket Prosodic Phonology. Munich: Lincom Europa.
Vajda, E. J. (2003). Tone and phoneme in Ket. In D. A. Holisky and K. Tuite (eds.), Current Trends in Caucasian, East European, and Inner Asian Linguistics: Papers in Honor of Howard I. Aronson, 393–418. Amsterdam: John Benjamins.
Vajda, E. J. (2004). Ket. Munich: Lincom Europa.
Vakhtin, N. B. (1991). Sirinek Eskimo: The available data and possible approaches. Language Sciences 13(1), 99–106.
Vakhtin, N. B. (1998). Endangered languages in northeast Siberia: Siberian Yupik and other languages of Chukotka. In E. Kasten (ed.), Bicultural Education in the North: Ways of Preserving and Enhancing Indigenous Peoples’ Languages and Traditional Knowledge, 159–174. Münster: Waxmann.
Valenzuela, P., and C. Gussenhoven (2013). Shiwilu (Jebero). Journal of the International Phonetic Association 43, 97–106.
Välimaa-Blum, R. (1993). A pitch accent analysis of intonation in Finnish. Ural-Altaische Jahrbücher 12, 82–94.
Vallduví, E. (1992). The Informational Component. New York: Garland.
Vallduví, E., and E. Engdahl (1996). The linguistic realization of information packaging. Linguistics 34(3), 459–520.
Vallejos, R. (2013). El secoya del Putumayo: Aportes fonológicos para la reconstrucción del Proto-Tucano Occidental. Liames 13, 67–100.
van Bergem, D. (1993). Acoustic vowel reduction as a function of sentence accent, word stress, and word class on the quality of vowels. Speech Communication 12, 1–23.
van Bezooijen, R. (1993). Fundamental frequency of Dutch women: An evaluative study. In Eurospeech 1993, 2041–2044, Berlin.

van Bezooijen, R. (1995). Sociocultural aspects of pitch differences between Japanese and Dutch women. Language and Speech 38, 253–265.
Van de Velde, M. L. (2008). A Grammar of Eton (Mouton Grammar Library 46). Berlin: Mouton de Gruyter.
van de Ven, M., and C. Gussenhoven (2011). The timing of the final rise in falling-rising intonation contours in Dutch. Journal of Phonetics 39, 225–236.
van de Weijer, J. (1997). Language input to a prelingual infant. In A. Sorace, C. Heycock, and R. Shillcock (eds.), Proceedings of GALA (1997), 290–293. Edinburgh: Edinburgh University Press.
van de Weijer, J., and J. Zhang (2008). An X-bar approach to the syllable structure of Mandarin. Lingua 118, 1416–1428.
van den Berg, J. (1958). Myoelastic-aerodynamic theory of voice production. Journal of Speech, Language, and Hearing Research 1, 227–244.
van den Berg, R., C. Gussenhoven, and T. Rietveld (1992). Downstep in Dutch: Implications for a model. In G. J. Docherty and D. R. Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 335–367. Cambridge: Cambridge University Press.
van der Hulst, H. (1984). Syllable Structure and Stress in Dutch. Dordrecht: Foris.
van der Hulst, H. (1997). Primary accent is non-metrical. Rivista di linguistica 9(1), 99–127.
van der Hulst, H. (2010a). Word accent: Terms, typologies and theories. In H. van der Hulst, R. W. N. Goedemans, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 3–54. Berlin: Mouton de Gruyter.
van der Hulst, H. (2010b). Word accent systems in the languages of Europe. In H. van der Hulst, R. W. N. Goedemans, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 429–508. Berlin: De Gruyter Mouton.
van der Hulst, H. (2014a). The study of word accent and stress: Past, present, and future. In H. van der Hulst (ed.), Word Stress: Theoretical and Typological Issues, 3–55. Cambridge: Cambridge University Press.
van der Hulst, H. (ed.) (2014b). Word Stress: Theoretical and Typological Issues. Cambridge: Cambridge University Press.
van der Hulst, H., and R. W. N. Goedemans (2009). StressTyp [Database]. Retrieved 22 May 2020 from http://fonetiek-6.leidenuniv.nl/pil/stresstyp/stresstyp.html.
van der Hulst, H., R. W. N. Goedemans, and E. van Zanten (eds.) (2010). A Survey of Word Accentual Patterns in the Languages of the World. Berlin: Mouton de Gruyter.
van der Hulst, H., and S. Hellmuth (2010). Word accent systems in the Middle East. In R. W. N. Goedemans and H. van der Hulst (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 615–646. Berlin: Mouton de Gruyter.
van der Hulst, H., B. Hendriks, and J. van de Weijer (1999). A survey of word prosodic systems in European languages. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 425–476. Berlin: Mouton de Gruyter.
van der Hulst, H., and J. van de Weijer (1995). Vowel harmony. In J. A. Goldsmith (ed.), The Handbook of Phonological Theory, 495–534. Malden, MA: Blackwell.
van der Meer, T. H. (1982). Fonologia da língua suruí. MA thesis, University of Campinas.
van der Merwe, A. (1997). A theoretical framework for the characterization of pathological speech sensorimotor control. In M. R. McNeil (ed.), Clinical Management of Sensorimotor Speech Disorders, 1–25. New York: Thieme.
van der Meulen, I., W. M. van de Sandt-Koenderman, M. H. Heijenbrok-Kal, E. G. Visch-Brink, and G. M. Ribbers (2014). The efficacy and timing of melodic intonation therapy in subacute aphasia. Neurorehabilitation and Neural Repair 28, 536–544.
van der Tuuk, H. N. (1971). A Grammar of Toba Batak. The Hague: Martinus Nijhoff.
van Dommelen, W. A. (1995). Interactions of fundamental frequency contour and perceived duration in Norwegian. Phonetica 52, 180–187.
van Heuven, V. J. (1998). Effects of focus distribution and accentuation on the temporal and melodic organisation of word groups in Dutch. In S. Barbiers, J. Rooryck, and J. van de Weijer (eds.), Small Words in the Big Picture: Squibs for Hans Bennis, 37–42. Leiden: Holland Institute of Generative Linguistics.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

van Heuven, V. J. (2014). Stress and segment duration in Dutch. In R. Kager, J. Grijzenhout, and K. Sebregts (eds.), Where the Principles Fail: A Festschrift for Wim Zonneveld on the Occasion of his 64th Birthday, 217–228. Utrecht: Utrecht Institute of Linguistics OTS.
van Heuven, V. J. (2018). Acoustic correlates and perceptual cues of word and sentence stress: Towards a cross-linguistic perspective. In R. Goedemans, J. Heinz, and H. van der Hulst (eds.), The Study of Word Stress and Accent: Theories, Methods and Data, 13–59. Cambridge: Cambridge University Press.
van Heuven, V. J., and M. de Jonge (2011). Spectral and temporal reduction as stress cues in Dutch. Phonetica 68(3), 120–132.
van Heuven, V. J., and M. Ch. Dupuis (1991). Perception of anticipatory VCV-coarticulation: Effects of vowel context and accent distribution. In Proceedings of the 12th International Congress of Phonetic Sciences, 78–81, Aix-en-Provence.
van Heuven, V. J., and V. Faust (2009). Are Indonesians sensitive to contrastive accentuation below the word level? Wacana: Jurnal Ilmu Pengetahuan Budaya 11, 226–240.
van Heuven, V. J., and P. J. Hagman (1988). Lexical statistics and spoken word recognition in Dutch. In P. Coopmans and A. Hulk (eds.), Linguistics in the Netherlands (1988), 59–68. Dordrecht: Foris.
van Heuven, V. J., L. Roosman, and E. van Zanten (2008). Betawi Malay word prosody. Lingua 118, 1271–1287.
van Heuven, V. J., and A. M. C. Sluijter (1996). Notes on the phonetics of word prosody. In R. Goedemans, H. van der Hulst, and E. Visch (eds.), Stress Patterns of the World, Part 1: Background (HIL Publications 2), 233–269. The Hague: Holland Academic Graphics.
van Heuven, V. J., and E. van Zanten (eds.) (2007). Prosody in Austronesian Languages of Indonesia. Utrecht: LOT.
van Huyssteen, G., and D. P. Wissing (1996). Foneties-fonologische aspekte van prototipiese Afrikaanse vraagsinne: ’N verkenning. South African Journal of Linguistics 14 (Suppl. 34), 103–122.
van Katwijk, A. (1974). Accentuation in Dutch: An Experimental Linguistic Study. Amsterdam: Van Gorcum.
van Kuijk, D., and L. Boves (1999). Acoustic characteristics of lexical stress in continuous telephone speech. Speech Communication 27, 95–111.
Van Lancker, D. (1980). Cerebral lateralization of pitch cues in the linguistic signal. Papers in Linguistics 13, 201–277.
Van Lancker, D. (1984). The Affective-Linguistic Prosody Test. Ms.
Van Lancker, D., and C. Breitenstein (2000). Emotional dysprosody and similar dysfunctions. In J. Bogousslavsky and J. L. Cummings (eds.), Disorders of Behavior and Mood in Focal Brain Lesions, 326–368. Cambridge: Cambridge University Press.
Van Lancker, D., C. Cornelius, and J. Kreiman (1989). Recognition of emotional-prosodic meanings in speech by autistic, schizophrenic, and normal children. Developmental Neuropsychology 5(2–3), 207–226.
Van Lancker, D., and V. A. Fromkin (1973). Hemispheric specialization for pitch and tone: Evidence from Thai. Journal of Phonetics 1, 101–109.
Van Lancker, D., and J. J. Sidtis (1992). The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: All errors are not created equal. Journal of Speech and Hearing Research 25, 963–970.
Van Lancker Sidtis, D., D. Kempler, C. Jackson, and E. J. Metter (2010). Prosodic changes in aphasic speech: Timing. Clinical Linguistics and Phonetics 24, 155–167.
Van Lancker Sidtis, D., N. Pachana, J. L. Cummings, and J. J. Sidtis (2006). Dysprosodic speech following basal ganglia insult: Toward a conceptual framework for the study of the cerebral representation of prosody. Brain and Language 97, 135–153.
van Leyden, K., and V. J. van Heuven (2006). On the prosody of Orkney and Shetland dialects. Phonetica 63, 149–174.
Van Lieshout, P. (2004). Dynamical systems theory and its application in speech. In B. Maassen, R. Kent, H. Peters, P. van Lieshout, and W. Hulstijn (eds.), Speech Motor Control in Normal and Disordered Speech, vol. 3, 51–82. Oxford: Oxford University Press.

van Maastricht, L. J., E. Krahmer, and M. Swerts (2016). Native speaker perceptions of (non-)native prominence patterns: Effects of deviance in pitch accent distributions on accentedness, comprehensibility, intelligibility, and nativeness. Speech Communication 83, 21–33.
van Maastricht, L. J., T. Zee, E. Krahmer, and M. Swerts (2017). L1 perceptions of L2 prosody: The interplay between intonation, rhythm, and speech rate and their contribution to accentedness and comprehensibility. In INTERSPEECH 2017, 364–368, Stockholm.
van Minde, D. (1997). Malayu Ambong: Phonology, Morphology, Syntax. Leiden: CNWS.
Van Nuffelen, G., M. De Bodt, F. Wuyts, and P. Van de Heyning (2009). The effect of rate control on speech rate and intelligibility of dysarthric speech. Folia Phoniatrica et Logopaedica 61(2), 69–75.
Van Ommen, S. (2016). Listen to the beat: A cross-linguistic perspective on the use of stress in segmentation. PhD dissertation, Utrecht Institute of Linguistics.
Van Otterloo, K. (2014). Tonal melodies in the Kifuliiru verbal system. Africana Linguistica 20, 385–403.
van Rooy, B. (2004). Black South African English: Phonology. In B. Kortmann, E. W. Schneider, K. Burridge, R. Mesthrie, and C. Upton (eds.), Handbook of Varieties of English: Vol. 1. Phonology, 943–952. Berlin: Mouton.
van Santen, J. P. H. (1992). Contextual effects on vowel duration. Speech Communication 11(6), 513–546.
van Santen, J. P. H. (1994). Assignment of segmental duration in text-to-speech synthesis. Computer Speech and Language 8, 95–128.
van Santen, J. P. H. (1997). Segmental duration and speech timing. In Y. Sagisaka, N. Campbell, and N. Higuchi (eds.), Computing Prosody: Computational Models for Processing Spontaneous Speech, 225–249. New York: Springer.
van Santen, J. P. H., and M. D’Imperio (1999). Positional effects on stressed vowel duration in standard Italian. In Proceedings of the 14th International Congress of Phonetic Sciences, 241–244, San Francisco.
van Santen, J. P. H., and J. Hirschberg (1994). Segmental effects on timing and height of pitch contours. In Proceedings of the 3rd International Conference on Spoken Language Processing, 719–722, Yokohama.
van Santen, J. P. H., and C. Shih (2000). Suprasegmental and segmental timing models in Mandarin Chinese and American English. Journal of the Acoustical Society of America 107(2), 1012–1026.
van Zanten, E., and R. W. N. Goedemans (2007). A functional typology of Austronesian and Papuan stress systems. In V. J. van Heuven and E. van Zanten (eds.), Prosody in Indonesian Languages (LOT Occasional Series 9), 63–88. Utrecht: LOT.
van Zanten, E., R. W. N. Goedemans, and J. Pacilly (2003). The status of word-level stress in Indonesian. In J. van de Weijer, V. J. van Heuven, and H. van der Hulst (eds.), The Phonological Spectrum: Vol. 2. Suprasegmental Structure, 151–175. Amsterdam: John Benjamins.
VanDam, M., A. S. Warlaumont, E. Bergelson, A. Cristia, M. Soderstrom, P. De Palma, and B. MacWhinney (2016). HomeBank: An online repository of daylong child-centered audio recordings. Seminars in Speech and Language 37(2), 128–142.
Vanderslice, R., and P. Ladefoged (1972). Binary suprasegmental features of English and transformational word-accentuation rules. Language 48, 819–838.
Vanderslice, R., and L. S. Pierson (1967). Prosodic features of Hawaiian English. Quarterly Journal of Speech 53, 156–166.
Vanhove, M. (2008). Enoncés hiérarchisés, converbes et prosodie en bedja. In B. Caron (ed.), Subordination, dépendance et parataxe dans les langues africaines, 83–103. Louvain: Peeters.
Vanrell, M. M., M. E. Armstrong, and P. Prieto (2017). Experimental evidence for the role of intonation in evidential marking. Language and Speech 60(2), 242–259.
Vanrell, M. M., and O. Fernández-Soriano (2013). Variation at the interfaces in Ibero-Romance: Catalan and Spanish prosody and word order. Catalan Journal of Linguistics 12, 1–30.
Vanrell, M. M., and O. Fernández-Soriano (2018). Language variation at the prosody-syntax interface: Focus in European Spanish. In M. García and M. Uth (eds.), Focus Realization in Romance and Beyond, 33–70. Amsterdam: John Benjamins.

Vanrell, M. M., F. Ballone, C. Schirru, and P. Prieto (2015). Sardinian intonational phonology: Logudorese and Campidanese varieties. In S. Frota and P. Prieto (eds.), Intonation in Romance, 317–349. Oxford: Oxford University Press.
Vanrell, M. M., A. Stella, B. Gili Fivela, and P. Prieto (2013). Prosodic manifestations of the Effort Code in Catalan, Italian and Spanish contrastive focus. Journal of the International Phonetic Association 43(2), 195–220.
Varga, L. (2002). Intonation and Stress: Evidence from Hungarian. Basingstoke: Palgrave Macmillan.
Varga, L. (2008). The calling contour in Hungarian and English. Phonology 25, 469–497.
Varga, L. (2010). Boundary tones and the lack of intermediate phrase in Hungarian (revisiting the Hungarian calling contour). The Even Yearbook 9. Budapest: Department of English Linguistics, Eötvös Loránd University.
Vayra, M., C. Avesani, and C. A. Fowler (1984). Patterns of temporal compression in spoken Italian. In A. Cohen and M. P. R. van den Broecke (eds.), Proceedings of the 10th International Congress of Phonetic Sciences, 541–546. Dordrecht: De Gruyter Mouton.
Vázquez Álvarez, J. J. (2011). A grammar of Chol, a Mayan language. PhD dissertation, University of Texas at Austin.
Veerman-Leichsenring, A. (1991). Gramática del Popoloca de Metzontla (con vocabulario y textos). Amsterdam: Editions Rodopi.
Veilleux, N., S. Shattuck-Hufnagel, and A. Brugos (2006). 6.911 Transcribing Prosodic Structure of Spoken Utterances with ToBI. MIT. Retrieved 22 May 2020 from https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911-transcribing-prosodic-structure-of-spoken-utterances-with-tobi-january-iap-2006.
Vella, A. (1995). Prosodic structure and intonation in Maltese and its influence on Maltese English. PhD dissertation, University of Edinburgh.
Vella, A. (2003). Phrase accents in Maltese: Distribution and realisation. In Proceedings of the 15th International Congress of Phonetic Sciences, 1775–1778, Barcelona.
Vella, A. (2007). The phonetics and phonology of wh-question intonation in Maltese. In Proceedings of the 16th International Congress of Phonetic Sciences, 1285–1288, Saarbrücken.
Vella, A. (2009a). Maltese intonation and focus structure. In R. Fabri (ed.), Maltese Linguistics: A Snapshot. In Memory of Joseph A. Cremona, 63–92. Bochum: Brockmeyer.
Vella, A. (2009b). On Maltese prosody. In B. Comrie, R. Fabri, E. V. Hume, M. Mifsud, T. Stolz, and M. Vanhove (eds.), Introducing Maltese Linguistics, 47–68. Amsterdam: John Benjamins.
Vella, A. (2011a). Phonetic implementation of falling pitch accents in dialectal Maltese: A preliminary study of the intonation of Gozitan Żebbuġi. In M. Embarki and M. Ennaji (eds.), Modern Trends in Arabic Dialectology, 211–238. Lawrenceville, NJ: Red Sea Press.
Vella, A. (2011b). Alignment of the ‘early’ HL sequence in Maltese falling tune wh-questions. In Proceedings of the 17th International Congress of Phonetic Sciences, 2062–2065, Hong Kong.
Vella, A. (2012). Languages and language varieties in Malta. International Journal of Bilingual Education and Bilingualism 16(5), 532–552.
Vella, A., F. Chetcuti, and S. Agius (2016). Lengthening as a discourse strategy: Phonetic and phonological characteristics in Maltese. In G. Puech and B. Saade (eds.), Shifts and Patterns in Maltese (Studia Typologica 19), 991–114. Berlin: De Gruyter.
Velleman, L. (2014). On optional focus movement in Kˈicheeˈ. In L. E. Clemens, R. Henderson, and P. M. Pedro (eds.), Proceedings of Formal Approaches to Mayan Linguistics (FAMLi) 2, 107–118. Cambridge, MA: MIT Working Papers in Linguistics.
Venditti, J. J., K. Maekawa, and M. E. Beckman (2008). Prominence marking in the Japanese intonation system. In S. Miyagawa (ed.), The Oxford Handbook of Japanese Linguistics, 456–512. Oxford: Oxford University Press.
Vengoechea, C. (2012). Categorisation lexicale en muinane (Amazonie Colombienne). Doctoral dissertation, University of Toulouse.
Vennemann, T. (1990). Syllable structure and simplex accent in modern Standard German. In Proceedings of the 26th Meeting of the Chicago Linguistic Society, vol. 2, 399–412. Chicago: Chicago Linguistic Society.

Vergyri, D., A. Stolcke, V. R. Gadde, L. Ferrer, and E. Shriberg (2003). Prosodic knowledge sources for automatic speech recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 208–211, Hong Kong.
Verhoeven, E., and S. Skopeteas (2015). Licensing focus constructions in Yucatec Maya. International Journal of American Linguistics 81(1), 1–40.
Verluyten, S. (1982). Recherches sur la prosodie et la métrique du français. PhD dissertation, University of Antwerp.
Vermillion, P. (2006). Aspects of New Zealand English intonation and its meanings: An experimental investigation of forms and contrasts. PhD dissertation, Victoria University of Wellington.
Vicenik, C., and S.-A. Jun (2014). An autosegmental-metrical analysis of Georgian intonation. In S.-A. Jun (ed.), Prosodic Typology II: The Phonology of Intonation and Phrasing, 154–186. Oxford: Oxford University Press.
Viegas Barros, J. P. (2013). Proto-Guaicurú. Munich: Lincom Europa.
Vigário, M. (2003a). The Prosodic Word in European Portuguese. Berlin: Mouton de Gruyter.
Vigário, M. (2003b). Prosody and sentence disambiguation in European Portuguese. In P. Prieto (ed.), Romance intonation (special issue), Catalan Journal of Linguistics 2, 249–278.
Vigário, M. (2010). Prosodic structure between the prosodic word and the phonological phrase: Recursive nodes or an independent domain? The Linguistic Review 27(4), 485–530.
Vigário, M., and S. Frota (2003). The intonation of Standard and Northern European Portuguese. Journal of Portuguese Linguistics 2(2), 115–137.
Vigário, M., S. Frota, and F. Martins (2010). A frequência que conta na aquisição da fonologia: Types ou tokens. In A. M. Brito, F. Silva, J. Veloso, and A. Fiéis (eds.), XXV Encontro Nacional da Associação Portuguesa de Linguística: Textos seleccionados, 749–767. Porto: Associação Portuguesa de Linguística.
Vigário, M., F. Martins, and S. Frota (2006). A ferramenta FreP e a frequência de tipos silábicos e classes de segmentos no Português. In F. Oliveira and J. Barbosa (eds.), XXI Encontro da Associação Portuguesa de Linguística: Textos seleccionados, 675–687. Lisbon: Associação Portuguesa de Linguística.
Vihman, M. (2014). Phonological Development: The First Two Years (2nd ed.). Malden, MA: Wiley Blackwell.
Vihman, M. (2015). Acquisition of the English sound system. In M. Reed and J. M. Levis (eds.), The Handbook of English Pronunciation, 333–352. Boston: Wiley Blackwell.
Vihman, M. M., R. A. DePaolis, and B. L. Davis (1998). Is there a ‘trochaic bias’ in early word learning? Evidence from infant production in English and French. Child Development 69, 935–949.
Vihman, M., S. Nakai, R. DePaolis, and P. Hallé (2004). The role of accentual pattern in early lexical representation. Journal of Memory and Language 50(3), 336–353.
Viitso, T.-R. (2003). Phonology, morphology and word formation. In M. Erelt (ed.), Estonian Language, 9–92. Tallinn: Estonian Academy Publishers.
Vijaykrishnan, K. G. (1978). Stress in Tamilian English: A study within the framework of generative phonology. MLitt dissertation, Central Institute of English and Foreign Languages.
Villa-Cañas, T., J. D. Arias-Londoño, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla, and E. Nöth (2015). Low-frequency components analysis in running speech for the automatic detection of Parkinson’s disease. In INTERSPEECH 2015, 100–104, Dresden.
Villard, S. (2015). The phonology and morphology of Zacatepec Eastern Chatino. PhD dissertation, University of Texas at Austin.
Villing, R., J. Timoney, T. Ward, and H. Costello (2004). Automatic blind syllable segmentation for continuous speech. In Proceedings of ISSC, Belfast.
Visser, M., E. Krahmer, and M. Swerts (2014). Contextual effects on surprise expressions: A developmental study. Journal of Nonverbal Behavior 38(4), 523–547.
Vitale, A. J. (1982). Problems of stress placement in Swahili. Studies in African Linguistics 13, 325–330.
Voegelin, C. F. (1935). Tübatulabal grammar. University of California Publications in American Archaeology and Ethnology 34, 55–190.

Vogel, I., A. Athanasopoulou, and N. Pincus (2016). Prominence, contrast and the Functional Load Hypothesis: An acoustic investigation. In R. Goedemans, J. Heinz, and H. van der Hulst (eds.), Dimensions of Phonological Stress, 123–167. Cambridge: Cambridge University Press.
Volk, E. (2011a). Depression as register: Evidence from Mijikenda. In Proceedings of the 37th Meeting of the Berkeley Linguistics Society, 389–398, Berkeley.
Volk, E. (2011b). Mijikenda tonology. PhD dissertation, Tel Aviv University.
Voorhoeve, J. (1971). Morphotonology of the Bamileke noun. Journal of African Languages 10, 44–53.
Voorhoeve, J. (1973). Safwa as a restricted tone system. Studies in African Linguistics 4, 1–21.
Vouloumanos, A., M. D. Hauser, J. F. Werker, and A. Martin (2010). The tuning of human neonates’ preference for speech. Child Development 81(2), 517–527.
Vouloumanos, A., and J. F. Werker (2004). Tuned to the signal: The privileged status of speech for young infants. Developmental Science 7(3), 270–276.
Vũ, M. Q., Đ. T. Đỗ, and É. Castelli (2006). Intonation des phrases interrogatives et affirmatives en langue vietnamienne. Journées d’étude de la parole 4, 187–190.
Vũ, N. T., C. d’Alessandro, and A. Michaud (2005). Using open quotient for the characterization of Vietnamese glottalized tones. In INTERSPEECH 2005, 2885–2889, Lisbon.
Vũ, T. P. (1982). Phonetic properties of Vietnamese tones across dialects. In D. Bradley (ed.), Papers in Southeast Asian Linguistics, 55–75. Sydney: Australian National University.
Vuillermet, M. (2012). Grammaire de l’ese ejja, langue tacana d’Amazonie bolivienne. Doctoral dissertation, Lyon 2 University.
Vydrin, V. (2008). Dictionnaire Dan-Français (dan de l’est). St Petersburg: Nestor-Istoria.
Vydrin, V. (2010). Le pied métrique dans les langues mandé. In F. Floricic (ed.), Essais de typologie et de linguistique générale: Mélanges offerts à Denis Creissels, 53–62. Lyon: ENS Éditions.
Wade, T., and L. L. Holt (2005). Perceptual effects of preceding nonspeech rate on temporal properties of speech categories. Perception and Psychophysics 67, 939–950.
Wagner, A. (2006). A comprehensive model of intonation for application in speech technology. In Proceedings of the 8th International PhD Workshop OWD, 21–24, 91–96, Wisła.
Wagner, A. (2008). Automatic labeling of prosody. In Proceedings of the International Speech Communication Association Tutorial and Research Workshop on Experimental Linguistics (ExLing 2008), 221–224, Athens.
Wagner, A. (2017). Rytm w mowie i języku w ujęciu wielowymiarowym. Warsaw: Elipsa.
Wagner, A., K. Klessa, and J. Bachan (2016). Polish rhythmic database: New resources for speech timing and rhythm analysis. In Proceedings of the 10th International Conference on Language Resources and Evaluation, 4678–4683, Portorož.
Wagner, E. (1997). Harari. In R. Hetzron (ed.), The Semitic Languages, 486–508. London: Routledge.
Wagner, G. (1933). Yuchi. In F. Boas (ed.), Handbook of American Indian Languages, vol. 3. New York: Columbia University Press.
Wagner, K. (2014). An intonational description of Mayan Q’eqchi’. PhD dissertation, Brigham Young University.
Wagner, M. (2005). Prosody and recursion. PhD dissertation, MIT.
Wagner, M. (2012a). Focus and givenness: A unified approach. In I. Kučerová and A. Neeleman (eds.), Contrasts and Positions in Information Structure, 102–148. Cambridge: Cambridge University Press.
Wagner, M. (2012b). Contrastive topics decomposed. Semantics and Pragmatics 5(8), 1–54.
Wagner, M., and D. Watson (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes 25(7), 905–945.
Wagner, P. (2010). A time-delay approach to speech rhythm visualization, modeling and measurement. In M. Russo (ed.), Prosodic Universals: Comparative Studies in Rhythmic Modeling and Rhythm Typology, 117–146. Rome: Aracne.
Wagner, P., Z. Malisz, B. Inden, and I. Wachsmuth (2013). Interaction phonology: A temporal coordination component enabling representational alignment within a model of communication. In I. Wachsmuth (eds.), Alignment in Communication: Towards a New Theory of Communication, 109–132. Amsterdam: John Benjamins.

Waleschkowski, E. (2009). Focus in German Sign Language. Paper presented at the Workshop on Non-Manuals in Sign Languages, Frankfurt.
Walker, J. P., L. Joseph, and J. Goodman (2009). Production of linguistic prosody in subjects with aphasia. Clinical Linguistics and Phonetics 23, 529–549.
Walker, J. P., R. Pelletier, and L. Reif (2004). The production of linguistic prosody by subjects with right hemisphere damage. Clinical Linguistics and Phonetics 18, 85–106.
Walliser, K. (1968). Zusammenwirken von Hüllkurvenperiode und Tonheit bei der Bildung der Periodentonhöhe. PhD dissertation, Technische Hochschule München.
Walliser, K. (1969). Über ein Funktionsschema für die Bildung der Periodentonhöhe aus dem Schallreiz. Kybernetik 6, 65–72.
Walton, C. (1979). Panutaran (Sama) phonology. Studies in Philippine Linguistics 3(2), 189–217.
Walton, J., G. Hensarling, and M. R. Maxwell (2000). El Muinane: La lengua Andoque. In M. S. González de Pérez and M. L. Rodríguez de Montes (eds.), Lenguas indígenas de Colombia: Una visión descriptiva. Santafé de Bogotá: Instituto Caro y Cuervo.
Waltz, N. E. (2007). Diccionario bilingüe: Wanano o Guanano—Español, Español—Wanano o Guanano (ed. P. S. de Jones and C. de Waltz). Bogotá: Editorial Fundación para el Desarrollo de los Pueblos Marginados.
Waltz, N. E., and C. Waltz (2000). El Wanano. In M. S. González de Pérez and M. L. Rodríguez de Montes (eds.), Lenguas indígenas de Colombia: Una visión descriptiva. Santafé de Bogotá: Instituto Caro y Cuervo.
Wan, I.-P. (2002). The status of prenuclear glides in Mandarin syllables: Evidence from psycholinguistics and experimental acoustics. Journal of Chinese Phonology 11, 232–248.
Wan, P., Z. Huang, and J. Gao (2009). Research of the effect of cochlear implant and hearing aid on voice quality of hearing impaired children [in Chinese]. Journal of Clinical Otorhinolaryngology, Head, and Neck Surgery 23(19), 874–877.
Wang, B., L. Wang, and T. Qadir (2011). Prosodic realization of focus in six languages/dialects in China. In Proceedings of the 17th International Congress of Phonetic Sciences, 144–147, Hong Kong.
Wang, B., and Y. Xu (2011). Differential prosodic encoding of topic and focus in sentence-initial position in Mandarin Chinese. Journal of Phonetics 37, 502–520.
Wang, B., Y. Xu, and Q. Ding (2018). Interactive prosodic marking of focus, boundary and newness in Mandarin. Phonetica 75, 24–56.
Wang, D., S. Trehub, A. Volkova, and P. van Lieshout (2013). Child implant users’ imitation of happy- and sad-sounding speech. Frontiers in Psychology 4(1), 351.
Wang, H. S. (1993). Taiyu biandiao de xinli texing. Tsinghua Xuebao 23, 175–192.
Wang, H. S., and C.-C. Chang (2001). On the status of the prenucleus glide in Mandarin Chinese. Language and Linguistics 2, 243–260.
Wang, J.-E., and F.-M. Tsao (2015). Emotional prosody perception and its association with pragmatic language in school-aged children with high-function autism. Research in Developmental Disabilities 37, 162–170.
Wang, M., and J. Hirschberg (1992). Automatic classification of intonational phrase boundaries. Computer Speech and Language 6, 175–196.
Wang, P., and F. Shi (2011). Hanyu yudiao de jiben moshi. Nankai Yuyan Xuekan 2, 1–11.
Wang, W. S.-Y. (1972). The many uses of F0. In A. Valdman (ed.), Papers in Linguistics and Phonetics Dedicated to the Memory of Pierre Delattre, 487–503. The Hague: Mouton.
Wang, W. S.-Y., and K.-P. Li (1967). Tone 3 in Pekinese. Journal of Speech and Hearing Research 10, 629–636.
Wang, X. (2012). Auditory and visual training on Mandarin tones: A pilot study on phrases and sentences. International Journal of Computer-Assisted Language Learning and Teaching 2, 16–29.
Wang, X., S. Wang, Y. Fan, D. Huang, and Y. Zhang (2017). Speech-specific categorical perception deficit in autism: An event-related potential study of lexical tone processing in Mandarin-speaking children. Scientific Reports 7, 43254.
Wang, Y., A. Jongman, and J. A. Sereno (2003a). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. Journal of the Acoustical Society of America 113(2), 1033–1043.

Wang, Y., J. A. Sereno, A. Jongman, and J. Hirsch (2003b). fMRI evidence for cortical modification during learning of Mandarin lexical tone. Journal of Cognitive Neuroscience 15(7), 1019–1027.
Wang, Y., M. M. Spence, A. Jongman, and J. A. Sereno (1999). Training American listeners to perceive Mandarin tones. Journal of the Acoustical Society of America 106, 3649–3658.
Ward, A., and D. Litman (2007). Measuring Convergence and Priming in Tutorial Dialog. Technical report, University of Pittsburgh.
Ward, G., and J. Hirschberg (1985). Implicating uncertainty: The pragmatics of fall-rise intonation. Language 61, 747–776.
Ward, N. G., and P. Gallardo (2017). Non-native differences in prosodic-construction use. Dialogue and Discourse 8(1), 1–30.
Warkentin, V., and R. M. Brend (1974). Chol phonology. Linguistics 12(132), 87–102.
Warner, N., and T. Arai (2001). Japanese mora-timing: A review. Phonetica 58(1–2), 1–25.
Warren, P. (2005). Patterns of late rising in New Zealand English: Intonational variation or intonational change? Language Variation and Change 17(2), 209–230.
Warren, P. (2014). Sociophonetic and prosodic influences on judgements of sentence type. In Proceedings of the 15th Australasian International Conference on Speech Science and Technology, 185–188, Christchurch.
Warren, P. (2016). Uptalk: The Phenomenon of Rising Intonation. Cambridge: Cambridge University Press.
Warren, P., and N. Daly (2005). Characterizing New Zealand English intonation: Broad and narrow analysis. In A. Bell, R. Harlow, and D. Starks (eds.), Languages of New Zealand, 217–237. Wellington: Victoria University Press.
Warren, P., I. Elgort, and D. Crabbe (2009). Comprehensibility and prosody ratings for pronunciation software development. Language Learning and Technology 13(3), 87–102.
Warren, P., and J. Fletcher (2016a). Leaders and followers: Uptalk and speaker role in map tasks in New Zealand English and Australian English. New Zealand English Journal 29–30, 77–93.
Warren, P., and J. Fletcher (2016b). Phonetic differences between uptalk and question rises in two Antipodean English varieties. In Proceedings of Speech Prosody 8, 148–152, Boston.
Warren-Leubecker, A., and J. N. Bohannon (1984). Intonation patterns in child-directed speech: Mother-father differences. Child Development 55, 1379–1385.
Watkins, J. (2002). The Phonetics of Wa: Experimental Phonetics, Phonology, Orthography and Sociolinguistics. Canberra: Pacific Linguistics.
Watkins, J. (2013). A first account of tone in Myebon Sumtu Chin. Linguistics of the Tibeto-Burman Area 36, 97–127.
Watkins, L. J. (with the assistance of P. McKenzie) (1984). A Grammar of Kiowa. Lincoln: University of Nebraska Press.
Watson, D. G., J. E. Arnold, and M. K. Tanenhaus (2008a). Tic tac toe: Effects of predictability and importance on acoustic prominence in language production. Cognition 106(3), 1548–1557.
Watson, D. G., M. Tanenhaus, and C. Gunlogson (2008b). Interpreting pitch accents in online comprehension: H* vs. L+H*. Cognitive Science 32, 1232–1244.
Watson, J. C. E. (2000). Review of R. Hetzron (ed.), The Semitic languages. London: Routledge, 1997. Journal of Linguistics 36(3), 645–664.
Watson, J. C. E. (2006). Arabic morphology: Diminutive verbs and diminutive nouns in San’ani Arabic. Morphology 16, 189–204.
Watson, J. C. E. (2007). The Phonology and Morphology of Arabic. New York: Oxford University Press.
Watson, J. C. E. (2011). Word stress in Arabic. In M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (eds.), The Blackwell Companion to Phonology, vol. 5, 2990–3018. Oxford: Wiley Blackwell.
Watson, J. C. E., and A. Bellem (2011). Glottalisation and neutralisation in Yemeni Arabic and Mehri: An acoustic study. In Z. M. Hassan and B. Heselwood (eds.), Instrumental Studies in Arabic Phonetics, 235–256. Amsterdam: John Benjamins.

Watson, J. C. E., and J. Wilson (2017). Gesture in modern South Arabian languages: Variation in multimodal constructions during task-based interaction. Brill’s Journal of Afroasiatic Languages and Linguistics 9(1–2), 49–72.
Watters, J. K. (1980). Aspects of Tlachichilco Tepehua (Totonacan) phonology. SIL-Mexico Workpapers 4, 85–130.
Waxman, S. R., J. Lidz, I. E. Braun, and T. Lavin (2009). Twenty-four-month-old infants’ interpretations of novel verbs and nouns in dynamic scenes. Cognitive Psychology 59(1), 67–95.
Wayland, R. P., and S. G. Guion (2003). Perceptual discrimination of Thai tones by naive and experienced learners of Thai. Applied Psycholinguistics 24, 113–129.
Wayland, R. P., D. Landfair, B. Li, and S. G. Guion (2006). Native Thai speakers’ acquisition of English word stress patterns. Journal of Psycholinguistic Research 35(3), 285–304.
Weast, T. P. (2008). Questions in American Sign Language: A quantitative analysis of raised and lowered eyebrows. PhD dissertation, University of Texas at Arlington.
Weber, C., A. Hahne, M. Friedrich, and A. D. Friederici (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research 18(2), 149–161.
Wedekind, K., C. Wedekind, and A. Musa (2005). Beja pedagogical grammar. Ms. Retrieved 4 June 2020 from https://www.afrikanistik-aegyptologie-online.de/archiv/2008/1283/beja_pedagogical_grammar_final_links_numbered.pdf.
Wee, L.-H. (2008). Phonological patterns in the Englishes of Singapore and Hong Kong. World Englishes 27, 480–501.
Wee, L.-H. (2016). Tone assignment in Hong Kong English. Language 92(2), 67–87.
Weeda, D. (1992). Word truncation in prosodic morphology. PhD dissertation, University of Texas at Austin.
Wegener, C. (2012). A Grammar of Savosavo. Berlin: Mouton de Gruyter.
Weinberg, M. K., and E. Z. Tronick (1998). Emotional characteristics of infants associated with maternal depression and anxiety. Pediatrics 102, 1298–1304.
Weinreich, U. (1954). Stress and word structure in Yiddish. In U. Weinreich (ed.), The Field of Yiddish: Studies in Yiddish Language, Folklore, and Literature, 1–27. New York: Linguistic Circle of New York.
Weinreich, U. (1956). Notes on the Yiddish rise-fall intonation contour. In M. Halle (ed.), For Roman Jakobson, 633–643. The Hague: Mouton.
Weiss, D. (2009). Phonologie et morphosyntaxe du maba. PhD dissertation, Lyon 2 University.
Welby, P. (2003a). The slaying of Lady Mondegreen, being a study of French tonal association and alignment and their role in speech segmentation. PhD dissertation, The Ohio State University.
Welby, P. (2003b). Effects of pitch accent position, type, and status on focus projection. Language and Speech 46(1), 53–81.
Welby, P. (2006). French intonational structure: Evidence from tonal alignment. Journal of Phonetics 34(3), 343–371.
Welby, P. (2007). The role of early fundamental frequency rises and elbows in French word segmentation. Speech Communication 49, 28–48.
Welby, P., and H. Lœvenbruck (2006). Anchored down in Anchorage: Syllable structure, rate, and segmental anchoring in French. Italian Journal of Linguistics/Rivista di linguistica 18, 74–124.
Welby, P., and O. Niebuhr (2016). The influence of f0 discontinuity on intonational cues to word segmentation: A preliminary investigation. In Proceedings of Speech Prosody 8, 40–44, Boston.
Welby, P., and O. Niebuhr (2019). ‘Segmental intonation’ information in French fricatives. In Proceedings of the 19th International Congress of Phonetic Sciences, 225–229, Melbourne.
Wellmann, C., J. Holzgrefe, H. Truckenbrodt, I. Wartenburger, and B. Höhle (2012). How each prosodic boundary cue matters: Evidence from German infants. Frontiers in Psychology 3, 580.
Wells, B., S. Peppé, and A. Goulandris (2004). Intonation development from five to thirteen. Journal of Child Language 31, 749–778.
Wells, B., and J. Stackhouse (2015). Children’s Intonation: A Framework for Practice and Research. Chichester: Wiley Blackwell.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Wells, J. C. (1982). Accents of English: Vol. 3. Beyond the British Isles. Cambridge: Cambridge University Press.
Wells, J. C. (2006). English Intonation: An Introduction. Cambridge: Cambridge University Press.
Welmers, W. E. (1959). Tonemics, morphotonemics, and tonal morphemes. General Linguistics 4, 1–9.
Welmers, W. E. (1962). A phonology of Kpelle. Journal of African Languages 1, 69–93.
Welmers, W. E. (1973). African Language Structures. Berkeley: University of California Press.
Welmers, W. E., and B. F. Welmers (1969). Noun modifiers in Igbo. International Journal of American Linguistics 35, 315–322.
Weninger, S. (2011). Old Ethiopic. In S. Weninger, G. Khan, M. P. Streck, and J. C. E. Watson (eds.), The Semitic Languages: An International Handbook, 1124–1141. Berlin: De Gruyter Mouton.
Wennerstrom, A. (1994). Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics 15(4), 399–420.
Wennerstrom, A., and A. F. Siegel (2003). Keeping the floor in multi-party conversations: Intonation, syntax, and pause. Discourse Processes 36(2), 77–107.
Werker, J. F., and S. Curtin (2005). PRIMIR: A developmental model of speech processing. Language Learning and Development 1(2), 197–234.
Werker, J. F., and P. J. McLeod (1989). Infant preference for both male and female infant-directed talk: A developmental study of attentional and affective responsiveness. Canadian Journal of Psychology 43, 230–246.
Werle, A. (2009). Word, phrase, and clitic prosody in Bosnian, Serbian and Croatian. PhD dissertation, University of Massachusetts Amherst.
Werner, H. (1997). Die Ketische Sprache. Wiesbaden: Harrassowitz.
Werth, A. (2011). Perzeptionsphonologische Grundlagen der Prosodie. Stuttgart: Franz Steiner.
Wertz, R. T., and J. C. Rosenbek (1992). Where the ear fits: A perceptual evaluation of motor speech disorders. Seminars in Speech and Language 13(1), 39–54.
Westera, M. (2013). ‘Attention, I’m violating a maxim!’: A unifying account of the final rise. In R. Fernández and A. Isard (eds.), Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, 150–159, Amsterdam.
Westera, M. (2018). Rising declaratives of the Quality-suspending kind. Glossa: A Journal of General Linguistics 3(1), 121.
Westera, M. (2019). Rise-fall-rise as a marker of secondary QUDs. In D. Gutzmann and K. Turgay (eds.), Secondary Content: The Semantics and Pragmatics of Side Issues, 376–404. Leiden: Brill.
Wetterlin, A. (2010). Tonal Accents in Norwegian: Phonology, Morphology and Lexical Specification. Berlin: Mouton de Gruyter.
Wetzels, W. L. M., and S. Meira (2010). A survey of South American stress systems. In R. W. N. Goedemans, H. van der Hulst, and E. van Zanten (eds.), A Survey of Word Accentual Patterns in the Languages of the World, 313–379. Berlin: Mouton de Gruyter.
Wewalaarachchi, T. D., and L. Singh (2016). Effects of suprasegmental phonological alternations on early word recognition: Evidence from tone sandhi. Frontiers in Psychology 7, 627.
Whalen, D. H., A. G. Levitt, and Q. Wang (1991). Intonational differences between the reduplicative babbling of French- and English-learning infants. Journal of Child Language 18, 501–516.
Whalen, D. H., and A. G. Levitt (1995). The universality of intrinsic f0 of vowels. Journal of Phonetics 23, 349–366.
Whalen, D. H., and Y. Xu (1992). Information for Mandarin tones in the amplitude contour and in brief segments. Phonetica 49, 25–47.
Wheeldon, L., and A. Lahiri (1997). Prosodic units in speech production. Journal of Memory and Language 37, 356–381.
Wheeldon, L., and A. Lahiri (2002). The minimal unit of phonological encoding: Prosodic or lexical word. Cognition 85, B31–B41.
Wheeler, A. (1987). Gantëya Bain: El pueblo siona del río Putumayo, Colombia: Vol. 1. Etnología, gramática, textos. Vol. 2. Diccionario. Bogotá: ILV.
Wheeler, M. W. (2005). The Phonology of Catalan. Oxford: Oxford University Press.
White, L. (2002). English speech timing: A domain and locus approach. PhD dissertation, University of Edinburgh.


White, L. (2014). Communicative function and prosodic form in speech timing. Speech Communication 63–64, 38–54.
White, L. (2018). Segmentation of speech. In S.-A. Rueschemeyer and G. Gaskell (eds.), Oxford Handbook of Psycholinguistics, 5–30. Oxford: Oxford University Press.
White, L., C. Delle Luche, and C. Floccia (2016). Five-month-old infants’ discrimination of unfamiliar languages does not accord with ‘rhythm class’. In Proceedings of Speech Prosody 8, 567–571, Boston.
White, L., and S. L. Mattys (2007a). Calibrating rhythm: First language and second language studies. Journal of Phonetics 35(4), 501–522.
White, L., and S. L. Mattys (2007b). Rhythmic typology and variation in first and second languages. In P. Prieto, J. Mascaró, and M.-J. Solé (eds.), Segmental and Prosodic Issues in Romance Phonology, 237–257. Amsterdam: John Benjamins.
White, L., S. L. Mattys, and L. Wiget (2012). Language categorization by adults is based on sensitivity to durational cues, not rhythm class. Journal of Memory and Language 66, 665–679.
White, L., and A. E. Turk (2010). English words on the Procrustean bed: Polysyllabic shortening reconsidered. Journal of Phonetics 38(3), 459–471.
Wichmann, A. (2000). Intonation in Text and Discourse. London: Pearson Education.
Wichmann, S. (1994). Underspecification in Texistepec Popoluca phonology. Acta Linguistica Hafniensia 10, 455–474.
Wichmann, S. (1995). The Relationship among the Mixe-Zoquean Languages of Mexico. Salt Lake City: University of Utah Press.
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Cham: Springer.
Wieden, W. (1993). Aspects of acquisitional stages. In B. Kettemann and W. Wieden (eds.), Current Issues in European Second Language Acquisition Research, 125–135. Tübingen: Gunter Narr.
Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics 70, 86–116.
Wiese, R. (1983). Psycholinguistische Aspekte der Sprachproduktion. Hamburg: Buske.
Wiese, R. (1996). The Phonology of German. Oxford: Clarendon Press.
Wiget, L., L. White, B. Schuppler, I. Grenon, O. Rauch, and S. L. Mattys (2010). How stable are acoustic metrics of contrastive speech rhythm? Journal of the Acoustical Society of America 127(3), 1559–1569.
Wightman, C., S. Shattuck-Hufnagel, P. Price, and M. Ostendorf (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America 91(3), 1707–1717.
Wightman, C., and M. Ostendorf (1994). Automatic labeling of prosodic patterns. IEEE Transactions on Speech and Audio Processing 2(4), 469–481.
Wightman, C. (2002). ToBI or not ToBI. In Proceedings of Speech Prosody 1, 25–29, Aix-en-Provence.
Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the Acoustical Society of America 54, 407–416.
Wiik, K. (1989). Pohjois-eurooppalaisten kielten entinen yhteinen puherytmi. In H. Nyyssönen and O. Kuure (eds.), XV Kielitieteen päivät Oulussa 13–14.05.1988, 277–302. Oulu: University of Oulu.
Wijnen, F., E. Krikhaar, and E. Den Os (1994). The (non)realization of unstressed elements in children’s utterances: Evidence for a rhythmic constraint. Journal of Child Language 21, 59–83.
Wilbur, R. B. (1993). Syllables and segments: Hold the movement and move the holds! In G. Coulter (ed.), Current Issues in ASL Phonology, 135–168. New York: Academic Press.
Wilbur, R. B. (2000). Phonological and prosodic layering of nonmanuals in American Sign Language. In K. Emmorey and H. Lane (eds.), The Signs of Language Revisited: An Anthology to Honor U. Bellugi and E. Klima, 215–244. Mahwah: Erlbaum.
Wilbur, J. (2014). A Grammar of Pite Saami. Berlin: Language Science Press.
Wilbur, R. B., and C. G. Patschke (1998). Body leans and the marking of contrast in American Sign Language. Journal of Pragmatics 30(3), 275–303.
Wilbur, R. B., and C. G. Patschke (1999). Syntactic correlates of brow raise in ASL. Sign Language and Linguistics 2(1), 3–41.


Wilhelm, S. (2016). Towards a typological classification and description of HRTs in a multidialectal corpus of contemporary English. In Proceedings of Speech Prosody 8, 138–142, Boston.
Wilkinson, M. (1991). Djambarrpuyngu: A Yolngu variety of northern Australia. PhD dissertation, University of Sydney.
Willet, T. L. (1991). A Reference Grammar of Southeastern Tepehuan. Arlington: Summer Institute of Linguistics/University of Texas at Arlington.
Williams, B. (1983). Stress in modern Welsh. PhD dissertation, University of Cambridge. (Distributed 1989, Indiana University Linguistics Club.)
Williams, B. (1985). Pitch and duration in Welsh stress perception: The implications for intonation. Journal of Phonetics 13, 381–406.
Williams, B. (1999). The phonetic manifestation of stress in Welsh. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 311–334. New York: Mouton de Gruyter.
Williams, S. E., and E. J. Seaver (1986). A comparison of speech sound durations in three syndromes of aphasia. Brain and Language 29(1), 171–182.
Williamson, M. C. (1981). The correlation between speech-tones of text syllables and their musical setting in a Burmese classical song. Musica Asiatica 3, 11–28.
Wilson, S. A. (1986). Metrical structure in Wakashan phonology. In Proceedings of the 12th Meeting of the Berkeley Linguistics Society, 283–291, Berkeley.
Wiltshire, C., and J. Harnsberger (2006). The influence of Gujarati and Tamil L1s on Indian English: A preliminary study. World Englishes 25(1), 91–104.
Wiltshire, C., and R. Moon (2003). Phonetic stress in Indian English vs. American English. World Englishes 22(3), 291–303.
Windmann, A., J. Šimko, and P. Wagner (2015). Optimization-based modeling account of speech timing. Speech Communication 74, 76–92.
Wissing, D. P. (2014). Fonologie. In W. A. M. Carstens and N. Bosman (eds.), Kontemporêre Afrikaanse taalkunde, 131–176. Pretoria: Van Schaik.
Withgott, M., and P.-K. Halvorsen (1984). Morphological Constraints on Scandinavian Tone Accent. Stanford: CSLI.
Witt, S. M. (2012). Automatic error detection in pronunciation training: Where we are and where we need to go. In Proceedings of IS ADEPT, Stockholm.
Witteman, J., M. H. van IJzendoorn, D. van de Velde, V. J. van Heuven, and N. O. Schiller (2011). The nature of hemispheric specialization for linguistic and emotional prosodic perception: A meta-analysis of the lesion literature. Neuropsychologia 49(13), 3722–3738.
Wolfart, H. C. (1996). Sketch of Cree, an Algonquian language. In I. Goddard (ed.), Handbook of North American Indians: Vol. 17. Languages, 390–439. Washington, DC: Smithsonian Institution.
Wolff, E. (1987). Consonant-tone interference in Chadic and its implications for a theory of tonogenesis in Afroasiatic. In D. Barreteau (ed.), Langues et cultures dans le bassin du lac Tchad, 193–216. Paris: ORSTOM.
Wolff, J. U. (1972). A Dictionary of Cebuano Visayan. Ithaca: Southeast Asia Program and Linguistic Society of the Philippines.
Wolff, J. U. (with M. T. C. Centeno and D.-H. V. Rau) (1991). Pilipino through Self-Instruction. Ithaca: Cornell Southeast Asia Program.
Wolfram, W., and E. R. Thomas (2002). The Development of African American English (Language in Society 31). Oxford: Blackwell.
Wolgemuth, C. (1981). Gramática náhuatl de Mecayapan. Verano: Instituto Lingüístico de Verano.
Woll, B. (1981). Question structure in British Sign Language. In B. Woll, J. Kyle, and M. Deuchar (eds.), Perspectives on British Sign Language and Deafness, 136–149. London: Croom Helm.
Wong, P. (2012a). Acoustic characteristics of three-year-olds’ correct and incorrect monosyllabic Mandarin lexical tone productions. Journal of Phonetics 40(1), 141–151.
Wong, P. (2012b). Monosyllabic Mandarin tone productions by 3-year-olds growing up in Taiwan and in the United States: Interjudge reliability and perceptual results. Journal of Speech, Language, and Hearing Research 55(5), 1423–1437.


Wong, P. (2013). Perceptual evidence for protracted development in monosyllabic Mandarin lexical tone production in preschool children in Taiwan. Journal of the Acoustical Society of America 133(1), 434–443.
Wong, P., R. Schwartz, and J. Jenkins (2005). Perception and production of lexical tones by 3-year-old, Mandarin-speaking children. Journal of Speech, Language, and Hearing Research 48(5), 1065–1079.
Wong, P. C. M. (2002). Hemispheric specialization of linguistic pitch patterns. Brain Research Bulletin 59, 83–95.
Wong, P. C. M., and M. Antoniou (2014). The neurophysiology of tone: Four decades of research. In Proceedings of the 4th International Symposium on Tonal Aspects of Languages, 199–202, Nijmegen.
Wong, P. C. M., and R. L. Diehl (2002). How can the lyrics of a song in a tone language be understood? Psychology of Music 30(2), 202–209.
Wong, P. C. M., and R. L. Diehl (2003). Perceptual normalization for inter- and intra-talker variation in Cantonese level tones. Journal of Speech, Language, and Hearing Research 46, 413–421.
Wong, P. C. M., H. C. Nusbaum, and S. L. Small (2004a). Neural bases of talker normalization. Journal of Cognitive Neuroscience 16(7), 1173–1184.
Wong, P. C. M., L. M. Parsons, M. Martinez, and R. L. Diehl (2004b). The role of the insular cortex in pitch pattern perception: The effect of linguistic contexts. Journal of Neuroscience 24(41), 9153–9160.
Wong, P. C. M., E. Skoe, N. M. Russo, T. Dees, and N. Kraus (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10, 420–422.
Wong, P. C. M., L. C. Vuong, and K. Liu (2017). Personalized learning: From neurogenetics of behaviors to designing optimal language training. Neuropsychologia 98, 192–200.
Wong, P. C. M., C. M. Warrier, V. B. Penhune, A. K. Roy, A. Sadehh, T. B. Parrish, and R. J. Zatorre (2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex 18, 828–836.
Wonnacott, E., and D. Watson (2008). Acoustic emphasis in four year olds. Cognition 107(3), 1093–1101.
Woo, N. H. (1969). Prosody and phonology. PhD dissertation, MIT.
Woodbury, A. (1985). Graded syllable weight in Central Alaskan Yupik Eskimo (Hooper Bay-Chevak). International Journal of American Linguistics 51, 620–623.
Woodbury, A. (1987). Meaningful phonological processes: A consideration of Central Alaskan Yupik Eskimo prosody. Language 63, 685–740.
Woods, H. (2005). Rhythm and Unstress. Ottawa: Canadian Government Publishing Centre.
Wright, R. (1996). Tone and accent in Oklahoma Cherokee. In P. Munro (ed.), Cherokee Papers from UCLA (UCLA Occasional Papers in Linguistics 16), 11–22. Los Angeles: University of California, Los Angeles.
Wunderli, P. (1987). L’intonation des séquences extraposées en français. Tübingen: Gunter Narr.
Wynne, H. S. Z., L. Wheeldon, and A. Lahiri (2018). Compounds, phrases and clitics in connected speech. Journal of Memory and Language 98, 45–58.
Xi, J., L. Zhang, H. Shu, Y. Zhang, and P. Li (2010). Categorical perception of lexical tones in Chinese revealed by mismatch negativity. Neuroscience 170(1), 223–231.
Xia, Z., R. Levitan, and J. Hirschberg (2014). Prosodic entrainment in Mandarin and English: A crosslinguistic comparison. In Proceedings of Speech Prosody 7, 65–69, Dublin.
Xiong, W., J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig (2017). Towards human parity in conversational speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 25, 2410–2423.
Xu, B., and P. Mok (2012). Cross-linguistic perception of intonation by Mandarin and Cantonese listeners. In Proceedings of Speech Prosody 6, 99–102, Shanghai.
Xu, B., Z. Tang, R. You, N. Qian, R. Shi, and Y. Shen (1988). Shanghai Shiqü fangyan zhi. Shanghai: Shanghai Jiaoyu Chubanshe.
Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the Acoustical Society of America 95(4), 2240–2253.
Xu, Y. (1997). Contextual tonal variations in Mandarin. Journal of Phonetics 25(1), 61–83.
Xu, Y. (1998). Consistency of tone-syllable alignment across different syllable structures and speaking rates. Phonetica 55, 179–203.


Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27(1), 55–105.
Xu, Y. (2001). Fundamental frequency peak delay in Mandarin. Phonetica 58, 26–52.
Xu, Y. (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication 46(3–4), 220–251.
Xu, Y. (2011). Post-focus compression: Cross-linguistic distribution and historical origin. In Proceedings of the 17th International Congress of Phonetic Sciences, 152–155, Hong Kong.
Xu, Y., S.-W. Chen, and B. Wang (2012). Prosodic focus with and without post-focus compression: A typological divide within the same language family? The Linguistic Review 29, 131–147.
Xu, Y., J. T. Gandour, and A. L. Francis (2006a). Effects of language experience and stimulus complexity on the categorical perception of pitch direction. Journal of the Acoustical Society of America 120, 1063–1074.
Xu, Y., A. Krishnan, and J. T. Gandour (2006b). Specificity of experience-dependent pitch representation in the brainstem. Neuroreport 17(15), 1601–1605.
Xu, Y., A. Lee, S. Prom-on, and F. Liu (2015). Explaining the PENTA model: A reply to Arvaniti and Ladd (2009). Phonology 32(3), 505–535.
Xu, Y., and F. Liu (2006). Tonal alignment, syllable structure and coarticulation: Toward an integrated model. Italian Journal of Linguistics 18, 125–159.
Xu, Y., and S. Prom-on (2014). Towards variable functional representations of variable frequency contours: Synthesising speech melody via model-based stochastic learning. Speech Communication 57, 181–208.
Xu, Y., and X. Sun (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America 111, 1399–1413.
Xu, Y., and C. X. Xu (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics 33(2), 159–197.
Xu Rattanasone, N., P. Tang, I. Yuen, L. Gao, and K. Demuth (2016). 3-year-olds produce pitch contours consistent with Mandarin Tone 3 Sandhi. In Proceedings of the 16th Australasian International Conference on Speech Science and Technology, 5–8, Canberra.
Yallop, C. (1977). Alyawarra: An Aboriginal Language of Central Australia (Australian Aboriginal Studies: Research and Regional Studies 10). Canberra: Australian Institute of Aboriginal Studies.
Yamamoto, H. W., and E. Haryu (2018). The role of pitch pattern in Japanese 24-month-olds’ word recognition. Journal of Memory and Language 99, 90–98.
Yamamoto, S. (2011). Pitch contour of Japanese traditional verse. MA thesis, University of Montana.
Yan, H., and J. Zhang (2016). Pattern substitution in Wuxi tone sandhi and its implication for phonological learning. International Journal of Chinese Linguistics 3, 1–45.
Yang, A. (2017). The Acquisition of Prosodic Focus-Marking in Mandarin Chinese-Speaking and Seoul Korean-Speaking Children. Utrecht: LOT Dissertation Series.
Yang, C., and M. K. M. Chan (2010). The perception of Mandarin Chinese tones and intonation by American learners. Journal of the Chinese Language Teachers Association 45(1), 7–36.
Yang, S., and D. Van Lancker Sidtis (2016). Production of Korean idiomatic utterances following left- and right-hemisphere damage: Acoustic studies. Journal of Speech, Language, and Hearing Research 59, 267–280.
Yang, S.-Y., D. Sidtis, and S.-N. Yang (2017). Listeners’ identification and evaluation of Korean idiomatic utterances produced by persons with left- or right-hemisphere damage. Clinical Linguistics and Phonetics 31(2), 155–173.
Yang, X., and Y. Yang (2012). Prosodic realization of rhetorical structure in Chinese discourse. IEEE Transactions on Audio, Speech, and Language Processing 20, 1196–1206.
Yasavul, M. (2013). Prosody of focus and contrastive topic in K’iche’. In M. E. Beckman, M. Lesho, J. Tonhauser, and T.-H. Tsui (eds.), Ohio State University Working Papers in Linguistics, vol. 60, 129–160. Columbus, OH: Department of Linguistics, The Ohio State University.


Ye, Y., and C. M. Connine (1999). Processing spoken Chinese: The role of tone information. Language and Cognitive Processes 14, 609–630.
Yeon, J. (2012). Korean dialects: A general survey. In N. Tranter (ed.), The Languages of Japan and Korea, 168–185. Abingdon: Routledge.
Yeung, H. H., K. H. Chen, and J. F. Werker (2013). When does native language input affect phonetic perception? The precocious case of lexical tone. Journal of Memory and Language 68(2), 123–139.
Yi, S. (1995). Pangeonhak (Dialectology). Seoul: Hagyeonsa.
Yin, Z. (1982). Guanyu Putonghua shuangyin changyong ci qingzhongyin de chubu kaocha. Zhongguo Yuwen 3(168), 168–173.
Yip, M. (1989). Contour tones. Phonology 6, 149–174.
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Yip, M. (2003). Casting doubt on the onset–rime distinction. Lingua 113, 779–816.
Yoon, T. J., S. Chavarria, J. Cole, and M. Hasegawa-Johnson (2004). Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. In INTERSPEECH 2004, 2729–2732, Jeju Island.
Yoshida, K. A., J. R. Iversen, A. D. Patel, R. Mazuka, H. Nito, J. Gervain, and J. F. Werker (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition 115, 356–361.
Young, S. (1991). The Prosodic Structure of Lithuanian. New York: University Press of America.
Youssef, V., and W. James (2008). The creoles of Trinidad and Tobago: Phonology. In B. Kortmann and E. W. Schneider (eds.), Varieties of English: Vol. 2. The Americas and the Caribbean, 320–338. Berlin: Mouton de Gruyter.
Yu, A. (2007). A Natural History of Infixation. Oxford: Oxford University Press.
Yu, A. (2010). Tonal effects on perceived vowel duration. In C. Fougeron, B. Kühnert, M. D’Imperio, and N. Vallée (eds.), Laboratory Phonology 10, 151–168. Berlin: Mouton de Gruyter.
Yu, K. M., and H. W. Lam (2014). The role of creaky voice in Cantonese tonal perception. Journal of the Acoustical Society of America 136(3), 1320–1333.
Yuan, J. (2004). Intonation in Mandarin Chinese: Acoustics, perception, and computational modeling. PhD dissertation, Cornell University.
Yuan, J. (2011). Perception of intonation in Mandarin Chinese. Journal of the Acoustical Society of America 130, 4063–4069.
Yuan, J., and Y. Chen (2014). 3rd tone sandhi in Standard Chinese: A corpus approach. Journal of Chinese Linguistics 42(1), 218–237.
Yuan, J., and M. Y. Liberman (2010). F0 declination in English and Mandarin broadcast news speech. In INTERSPEECH 2010, 134–137, Makuhari.
Yuan, S., and C. Fisher (2009). Really? She blicked the baby? Psychological Science 20, 619–626.
Yue-Hashimoto, A. O. (1987). Tone sandhi across Chinese dialects. In Chinese Language Society of Hong Kong (ed.), Wang Li Memorial Volumes: English volume, 445–474. Hong Kong: Joint Publishing Co.
Yuen, I. (2007). Declination and tone perception in Cantonese. In C. Gussenhoven and T. Riad (eds.), Tones and Tunes: Vol. 2. Experimental Studies in Word and Sentence Prosody, 63–77. Berlin: Mouton de Gruyter.
Yun, J. (2018). Meaning and prosody of wh-indeterminates in Korean. Linguistic Inquiry 50(3), 630–347.
Yun, J., and H.-S. Lee (in press). Prosodic disambiguation of questions in Korean: Theory and processing. Korean Linguistics.
Yung, B. (1983a). Creative process in Cantonese opera I: The role of linguistic tones. Ethnomusicology 27(1), 29–47.
Yung, B. (1983b). Creative process in Cantonese opera II: The process of t’ien tz’u (text-setting). Ethnomusicology 27(2), 297–318.


Zahner, K., and J. Yu (2019). Compensation strategies in non-native English and German. In Proceedings of the 19th International Congress of Phonetic Sciences, 1670–1674, Melbourne.
Zakzanis, K. K. (1999). Ideomotor prosodic apraxia. Journal of Neurology, Neurosurgery, and Psychiatry 67, 694–695.
Zariquiey, R. (2018). A Grammar of Kakataibo. Berlin: De Gruyter Mouton.
Zatorre, R. J., A. C. Evans, and E. Meyer (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience 14(4), 1908–1919.
Zatorre, R. J., and J. T. Gandour (2008). Neural specializations for speech and pitch: Moving beyond the dichotomies. Philosophical Transactions of the Royal Society of London B: Biological Sciences 363(1493), 1087–1104.
Zec, D. (1993). Rule domains and phonological change. In S. Hargus and E. Kaisse (eds.), Lexical Phonology, 365–405. San Diego: Academic Press.
Zec, D. (1994). Sonority Constraints on Prosodic Structure. New York: Garland Press.
Zec, D. (2009). The prosodic word as a unit in poetic meter. In K. Hanson and S. Inkelas (eds.), The Nature of the Word: Essays in Honor of P. Kiparsky, 63–94. Cambridge, MA: MIT Press.
Zee, E. (1987). Tone demonstration: Shanghai noun compound tone patterns. Handout at the Hyman/Leben course at the LSA Summer Institute, Stanford.
Zerbian, S. (2004). Phonological phrases in Xhosa (Southern Bantu). In S. Fuchs and S. Hamann (eds.), Papers in Phonetics and Phonology (ZASpil), 71–99, Berlin.
Zerbian, S. (2006). Expression of information structure in the Bantu Language Northern Sotho. PhD dissertation, Humboldt University of Berlin.
Zerbian, S. (2007). Phonological phrasing in Northern Sotho (Bantu). The Linguistic Review 24, 233–262.
Zerbian, S. (2015). Syntactic and prosodic focus marking in contact varieties of South African English. English World-Wide 36(2), 228–258.
Zerbian, S., S. Genzel, and F. Kügler (2010). Experimental work on prosodically-marked information structure in selected African languages (Afroasiatic and Niger-Congo). In Proceedings of Speech Prosody 5, Chicago.
Zerbian, S., and F. Kügler (2015). Downstep in Tswana (Southern Bantu). In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Zeshan, U. (2004). Interrogative constructions in signed languages: Crosslinguistic perspectives. Language 80(1), 7–39.
Zeshan, U. (2006). Interrogative and Negative Constructions in Sign Languages. Nijmegen: Ishara Press.
Zhang, C., G. Peng, and W. S. Y. Wang (2013). Achieving constancy in spoken word identification: Time course of talker normalization. Brain and Language 126(2), 193–202.
Zhang, J. (2001). The effects of duration and sonority on contour tone distribution: Typological survey and formal analysis. PhD dissertation, University of California, Los Angeles. (Published 2002, Oxford: Routledge.)
Zhang, J. (2004a). The role of contrast-specific and language-specific phonetics in contour tone distribution. In B. Hayes, R. Kirchner, and D. Steriade (eds.), Phonetically Based Phonology, 157–119. Cambridge: Cambridge University Press.
Zhang, J. (2004b). Contour tone licensing and contour tone representation. Language and Linguistics 5, 925–968.
Zhang, J. (2007). A directional asymmetry in Chinese tone sandhi systems. Journal of East Asian Linguistics 16, 259–302.
Zhang, J. (2014). Tones, tonal phonology, and tone sandhi. In C.-T. J. Huang, Y.-H. A. Li, and A. Simpson (eds.), The Handbook of Chinese Linguistics, 443–464. Oxford: Wiley Blackwell.
Zhang, J., and Y. Lai (2010). Testing the role of phonetic knowledge in Mandarin tone sandhi. Phonology 27, 153–201.
Zhang, J., Y. Lai, and C. Sailor (2011a). Modeling Taiwanese speakers’ knowledge of tone sandhi in reduplication. Lingua 121, 181–206.
Zhang, J., I. Maddieson, T. Cho, and M. Baroni (1999). Articulograph AG100: Electromagnetic Articulation Analyzer—Homemade Manual. UCLA Phonetics Lab. Retrieved 22 May 2020 from http://www.linguistics.ucla.edu/faciliti/facilities/physiology/Emamual.html.


Zhang, J., and Y. Meng (2016). Structure-dependent tone sandhi in real and nonce words in Shanghai Wu. Journal of Phonetics 54(1), 169–201.
Zhang, L., J. Xi, G. Xu, H. Shu, X. Wang, and P. Li (2011b). Cortical dynamics of acoustic and phonological processing in speech perception. PLoS ONE 6(6), e20963.
Zhang, X. (2012). A comparison of cue-weighting in the perception of prosodic phrase boundaries in English and Chinese. PhD dissertation, University of Michigan.
Zhang, Y., and A. Francis (2010). The weighting of vowel quality in native and non-native listeners’ perception of English lexical stress. Journal of Phonetics 38, 260–271.
Zhao, Y., and D. Jurafsky (2009). The effect of lexical frequency and Lombard reflex on tone hyperarticulation. Journal of Phonetics 37, 231–247.
Zheltov, A. (2005). Le système des marqueurs de personnes en gban: Morphème syncrétique ou syncrétisme des morphèmes. Mandenkan 41, 23–28.
Zheng, H. Y., J. W. Minett, G. Peng, and W. S. Wang (2012). The impact of tone systems on the categorical perception of lexical tones: An event-related potentials study. Language and Cognitive Processes 27(2), 184–209.
Zheng, X. (2006). Voice quality variation with tone and focus in Mandarin. In Proceedings of the 2nd International Symposium on Tonal Aspects of Languages, 132–136, La Rochelle.
Zhgenti, S. M. (1963). Kartuli enis rit’mik’ul-melodik’uri st’ruk’t’ura. Tbilisi: Codna.
Zhu, X. (1995). Shanghai tonetics. PhD dissertation, Australian National University.
Zhu, X. (2006). A Grammar of Shanghai Wu. Munich: Lincom Europa.
Zimmerer, F., B. Andreeva, J. Jügler, and B. Möbius (2015). Comparison of pitch profiles of German and French speakers speaking French and German. In Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow.
Zimmermann, M., and C. Féry (2010). Introduction. In M. Zimmermann and C. Féry (eds.), Information Structure: Theoretical, Typological, and Experimental Perspectives, 1–11. Oxford: Oxford University Press.
Zimmermann, T. E. (2000). Free choice disjunction and epistemic possibility. Natural Language Semantics 8, 255–290.
Zipse, L., A. Worek, A. J. Guarino, and S. Shattuck-Hufnagel (2014). Tapped out: Do people with aphasia have rhythmic processing deficits? Journal of Speech, Language, and Hearing Research 57(6), 2234–2245.
Zlatoustova, L. V. (1975). Rhythmic structure types in Russian speech. In G. Fant and M. A. A. Tatham (eds.), Auditory Analysis and Perception of Speech, 477–483. Academic Press.
Zonneveld, W., M. Trommelen, M. Jessen, G. Bruce, and K. Árnason (1999). Word-stress in West-Germanic and North-Germanic Languages. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of Europe, 478–544. Berlin: De Gruyter.
Zora, H., T. Riad, I.-C. Schwarz, and M. Heldner (2016). Lexical specification of prosodic information in Swedish: Evidence from mismatch negativity. Frontiers in Neuroscience 10, 533.
Zora, H., T. Riad, and S. Ylinen (2019). Prosodically controlled derivations in the mental lexicon. Journal of Neurolinguistics 52, 100856.
Zorc, D. (1978). Proto-Philippine word accent: Innovation or Proto-Hesperonesian retention? In S. A. Wurm and L. Carrington (eds.), Second International Conference on Austronesian Linguistics: Proceedings I, 67–119. Canberra: Australian National University.
Zorc, D. (1993). Overview of Austronesian and Philippine accent patterns. In J. A. Edmondson and K. J. Gregerson (eds.), Tonality in Austronesian Languages (Oceanic Linguistics Special Publication 24), 17–24. Honolulu: University of Hawaiʻi Press.
Zsiga, E., and R. Nitisaroj (2007). Tone features, tone perception, and peak alignment in Thai. Language and Speech 50, 343–383.
Zubizarreta, M. L. (1998). Prosody, Focus, and Word Order (Linguistic Inquiry Monographs 33). Cambridge, MA: MIT Press.
Zubizarreta, M. L., and E. Nava (2011). Encoding discourse-based meaning: Prosody vs. syntax—Implications for second language acquisition. Lingua 121, 652–669.

OUP CORRECTED PROOF – FINAL, 07/12/20, SPi

Zubkova, L. G. (1966). Vokalizm Indonezijskogo jazyka. PhD dissertation, Leningrad University. Zufferey, S., and A. Popescu-Belis (2004). Towards automatic identification of discourse markers in dialogs: The case of like. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue, 63–71, Cambridge, MA. Zuraidah, M. D., G. Knowles, and J. Yong (2008). How words can be misleading: A study of syllable timing and ‘stress’ in Malay. Linguistics Journal 3, 66–81. Zuraw, K. (2007). The role of phonetic knowledge in phonological patterning: Corpus and survey evidence from Tagalog infixation. Language 83, 277–316. Zuraw, K., K. M. Yu, and R. Orfitelli (2014). The word-level prosody of Samoan. Phonology 31(2), 271–327. Zwarts, J. (2004). The Phonology of Endo: A Southern Nilotic Language of Kenya. Munich: Lincom Europa. Zwitserlood, P., H. Schriefers, A. Lahiri, and W. van Donselaar (1993). The role of syllables in the perception of spoken Dutch. Journal of Experimental Psychology: Learning, Memory, and Cognition 19(2), 260–271. Żygis, M., D. Pape, L. M. T. Jesus, and M. Jaskuła (2014). Intended intonation of statements and polar questions in Polish in whispered, semi-whispered and normal speech modes. In Proceedings of Speech Prosody 7, 678–682, Dublin.

Index of Languages

This index lists spoken, signed or extinct languages mentioned in the book. For language families see Map 1.1 (plate section).

!Xóõ, see Taa ǂʼAmkoe 184 Aboh 57, 187 Acehnese 372 Achumawi 403 Acoma 400 Adamawa 59, 190 Adioukrou 54 Afar 202 Afrikaans 271, 278–81 Aghem 50–1 Aguaruna 429 n.3, 434, 437 Ahtna 401, 405 Akan 135 n.5, 192–3, 464 Akhvakh 222 Akkadian 253 Alabama 405 Alagwa 202 Alamblak 391 Alawa 390 Aleut (Unangam Tunuu) 314–15 Alyawarra 387 Ama 204 Amarasi 378 Amazigh, see Berber Ambulas 391 American Sign Language 104–22 Amharic 198 Amo 53 Anaiwan (Nganyaywana) 387 Anal 351 Andi 222 Andoke 52 Apache 403, 405 Arabela 437 Arabic 68, 74, 82–3, 94, 100, 144, 167–8, 183, 197–200, 457–9, 461, 465, 524, 661–2 Cairene Arabic 74, 199–200 Classical Arabic 100, 199–200, 662 Egyptian Arabic 83, 94, 144, 200, 457–9 Jordanian Arabic 200 Juba Arabic 465 Lebanese/Tripoli Arabic 83, 199, 457–8, 465 Moroccan Arabic 199

Palestinian Arabic 199 San’ani Arabic 68 Standard Arabic 199–200 Aramaic 198 Arapaho 404 Arigibi Kiwai 392 Arrernte 385–7 Asheninka 436 Assamese 316–31 Atayal 372 Awa Pit 429 n.4, 437 Awngi 188, 202 Ba 190 Babadjou 55 Babanki 55 Bagiro 54 Baka 205 Bambara 62 Bamileke-Dschang 185 Banawá 67, 437, 439 Bandjalang (Bundjalung) 387 Baniwa 432, 436 Barasana 435–6, 438, 440 Bardi 386–7 Baré 436 Basaa 190, 192 Basque 5–6, 9, 70, 83, 251–62, 267–70, 382, 458, 465, 541, 570, 609 Azkoitia Basque 70 Baztan Basque 253 Central Basque 254 Esteribar Basque 253 Gipuzkoan (Beasain) Basque 253 Goizueta Basque 6, 254–5 Leitza Basque 6, 254–5 Lekeitio Basque 254 Northern Bizkaian Basque 254 Roncalese Basque 253 Sara Basque 253 Souletin Basque 253 Southern High Navarrese Basque 254 Bathari 197 Baule 51

Beaver 465 Begak 372 Beja 202–3 Belarusian 76, 225–35 Bemba (Chibemba) 55, 193, 607 Bench 184, 203 Bengali (Kolkata Bengali) 5, 6, 109, 295, 316–31, 465 Bangladeshi Bengali 323 Berber (Amazigh, Tamazight) 90, 183, 195–7, 199, 465, 671–2 Ghadames 196 Kabyle 197 Moroccan Berber 199 Tamasheq 465 Tashlhiyt 76, 90, 196, 199 Tuareg 196 Zwara 196 Bidayuh 372 Bininj Gun-wok 385–90 Blackfoot 51 Bongo 190, 205 Bora 431–2, 433 n.7, 438 Boro 352 Bosnian-Croatian-Montenegrin-Serbian 144, 225–35, 254, 464, 661 Brazilian Sign Language, see Libras Breton 303–5 British Sign Language 112, 120 n.8 Bukusu 297, 577 Bulgarian 225–35 Buli 465 Burmese 351, 678 n.1 Caddo 401 Cahuilla 398 Cantonese 475, 506, 542–4, 601–2, 677–83 Capanahua 127, 432, 434, 436 Catalan 9, 22, 82–3, 88, 89, 138, 168, 172, 174, 245, 251–70 Central Catalan 260–70 Northern Catalan 256 Cavineña 429 Cayuvava 73–4, 430 Cebuano 379 Chakhar 212 Cham 149, 353 Chamalal 222 Chamorro 372 Chatino 410–11, 414–16 Chechen 222 Cherokee, Oklahoma 401 Cheyenne 404 Chibemba, see Bemba Chichewa 51, 138, 185, 462, 466 Chickasaw 68, 75, 82, 88, 131, 398–400, 404–5 Chiini, Koyra 183 Chimila 48, 432 Chimwiini 45–6, 48, 63, 187, 192

Chinese 8, 46–9, 56, 61, 64, 126–9, 142–3, 162, 170, 295, 332–43, 345, 358, 461, 471, 483–5, 542, 577, 596–7, 601, 604, 606, 608, 615, 629, 661 Changsha Chinese 615 Mandarin Chinese 22–4, 61, 78, 126–38, 143–8, 162, 170, 332–43, 351, 358, 461–7, 503–6, 511, 517, 519, 541–4, 555, 559, 577, 596–8, 601–3, 606–10, 614–16, 656, 677, 680 Beijing Mandarin 335–6 Chengdu Mandarin 335–6 Northern Mandarin 143, 351 Standard Mandarin 47, 61, 596, 598 Taiwan Mandarin 341, 598 Min Chinese 333, 608 Fuzhou Min 333 Taiwanese Southern Min 25, 126, 128, 334, 341 Xiamen Southern Min 63, 333 Nantong Chinese 615 Standard Chinese Wu Chinese 333–4, 338–43, 349 Shanghai Wu (Shanghainese) 61, 338–41 Wenzhou Wu 341 Wuxi Wu 334–5 Choctaw 405 CiShingini 187 Coeur d’Alene 102 Coptic 197 Cornish 303 Creek 401–2, 405–6 Croatian, see Bosnian-Croatian-Montenegrin-Serbian Cuicateco 56 Czech 176, 225–35, 629, 662 Dalabon 83, 90, 388–90, 465 Damana 437 Dan 184 Danish 271–84 Danish Sign Language 112–13 Datooga 204 Dâw 435, 440 Day 59 Deang 341 Delaware 68 Dena’ina 405 Desano 436 Dinka 147, 204, 677, 684–6 Ditidaht 406 Diyari 97–101, 385 Dizi 203 Djambarrpuyngu 386–7 Dogon 47, 183, 188, 193 Dogri 317 Dutch 40, 68, 83, 84, 86, 87–9, 131–2, 147, 150–59, 169, 172, 174, 271–84, 388, 457, 459, 481–5, 511, 514, 517–18, 524–5, 530, 541–66, 596–7, 601, 606, 608, 610–16, 620, 624

Limburgish (Limburgian) Dutch 6, 87, 282–3, 541, 545–6 Borgloon Dutch 143 Hasselt Limburgish 283, 444 Helden Dutch 283 Maastricht Limburgish 90 Roermond Dutch 84, 283 Venlo Dutch 283 Dutch Sign Language, see Sign Language of the Netherlands Dyirbal 385, 388, 390 Eduria 438 Egyptian 197 Embosi 192–3 Émérillon 74–5 Endo 53 Engenni 53, 186 English: African American English 289, 301–2 African English: Ghanaian English 293, 300–2 Kenyan English 297 Nigerian English 293, 300–2, 615 Tswana English 298 Ugandan English 297 American English 7, 38, 88–93, 131, 139, 153, 155, 174, 245, 272, 287–90, 299, 302, 470, 527–9, 543, 556, 569, 575–7, 610–11, 621, 626, 647 Australian English 90, 290–1, 299, 300, 577–8 Bahamian English 298 British English 90, 92, 131, 153, 169–70, 174, 234, 288, 291, 293, 299, 302, 525, 569, 575, 577, 621 Glasgow English 88, 290 Manchester English 290 Scottish English 131 Standard Southern British English 90, 131, 169–70, 288 Urban Northern British English 290, 299, 302 Welsh English 290 Canadian English 299 Caribbean English 297–8, 459 Jamaican English 297 West Indian English 297 Chicano English 289 Fijian English 296 Hispanic English 301 Hong Kong (Cantonese) English 291–2, 300–2, 626 Indian English 295, 302 Irish (Hiberno-) English 290, 305 Donegal English 290, 305 Drogheda English 290 Dublin English 290 New Zealand English 291, 296, 299–300 Singapore English 293–4, 301–2 South African English 291, 298

Afrikaans English 298 Black South African English 298, 301–2 White South African English 291, 298–9 South Pacific English (Fijian, Niuean, Norfolk Island) 296–7 Erzya 224 Ese’eja 430 Eskimo, see Inuit, Yupik Estonian 8, 20, 68, 138, 141, 150, 225–35, 457, 626, 661 n.3 Etsako 50 Ewe 63, 680 Faroese 306–9 Farsi 199 Fasu 392 Fataluku 395 Fijian 75, 296, 301 Finnish 8, 69, 150, 169, 224, 225–35, 271, 559, 595, 608, 661–2, 674 Finnish Sign Language 112 Folopa 392 French 20, 22, 24–5, 40–1, 76, 108–9, 167, 169, 172, 174, 199, 232, 236–49, 256, 301, 448, 459, 462, 529, 535, 542, 545, 548, 550, 554–7, 565–70, 572, 575, 577, 595–601, 606, 608–11, 616, 620, 624–5, 629, 648, 656, 673–4 Belgian French 241 Canadian French 239, 569, 599 Central African French 241 European (Hexagonal) French 236–49, 569 Occitan French 241 Swiss French 241 French Sign Language 115 Frisian 278, 280–2 East Frisian 278 Northern (North) Frisian 142, 278 Saterland Frisian 278 West Frisian 278, 280–2 Fula 183 Fuliiru 56 Ga 184, 189 Ga’anda 201 Gaelic 303–5 Garrwa (Garawa) 73, 390 Gban 60, 64 Gela 465 Georgian 207, 219–22, 223, 462, 464 German 4–5, 9, 18–28, 35, 40–1, 82–4, 90, 112, 131–2, 140–7, 161, 167, 171, 174, 246–7, 271–2, 278–84, 388, 445, 447, 457–9, 461, 465–7, 541–2, 548–50, 558–9, 565–70, 575, 601, 606–16, 620–22, 626, 648, 676 Alemannic German 272 n.3, 282, 284 Cologne German 282–3 Franconian German 282

German (cont.) Northern (North, Low) German 90, 131, 271, 278, 280–2 Saxon German 278, 445 Southern (South, High) German 131, 271, 278, 282 Swabian German 445 Swiss German 613 German Sign Language 107, 113, 115 Ghotuo 185 Gidar 202 Giryama 62, 191 Gǀui 184 Goemai 188, 190 Gokana 185, 190 Gor 205 Gothic 272 Greek 20–2, 82–5, 88–94, 131, 150, 157, 161, 166, 235, 236–50, 388, 457, 527, 609–11, 661, 667, 671, 675 Ancient (Classical) Greek 166, 524, 661, 667, 675 Asia Minor Greek 241, 244 Athenian Greek 236–8, 240–4, 248, 249 Corfiot Greek 240–1 Cretan Greek 241, 244 Cypriot Greek 84, 240–1, 244–5 Epirus (Ipiros) Greek 90, 241, 244, 249 Northern Greek 241 Guaraní 457, 459 Guarijío, Mountain 398 Guató 439 Guébie 60 Gujarati 295, 317, 319, 331 Gwari 54, 185 Gwich’in 401 Halkomelem, Upriver 49, 403 Haroi 349 Harsusi 197 Hausa 138–9, 188, 191–3, 201–2, 293, 465, 661, 679 Haya 50, 184 Hebrew 109, 176, 198, 281, 484, 548–50 Hindi 295, 301, 316–31, 463–7, 608, 662 Hmong 58, 147–8, 277, 350 Green Hmong 148 White Hmong 58, 147, 350 Hobyot 197 Ho-Chunk 70, 397 Hong Kong Sign Language 112–13 Hopi 74, 403 Huariapano 431 Huli 681 Hungarian 84, 150, 161, 169, 224, 225–35, 389, 459, 464–5, 595, 625, 661–2 Hup 440 Iau 45–8, 60 Ibibio 189–90

Icelandic 306–9 Iha 394 Ik 186 Ilokano 98–100 Iñapari 51, 431, 434, 437 Indonesian 76, 101, 373–8, 465, 597 Ingush 222 Inor 198 Inuit 309–11, 315 Inuktitut Inuit 310, 315 Inuttut, Labrador 310–11 Itivimuit 311 South Baffin Inuktitut 310–11, 315 Inupiaq, Seward Peninsula 310 Kalaallisut (West Greenlandic) 76, 82, 309–11, 314–15, 405, 464 North Greenlandic 311 Iraqw 202 Irish 5, 88, 290, 303–6 IsiXhosa, see Xhosa IsiZulu, see Zulu Isoko 53 Israeli Sign Language 104–22, 444 n.1 Italian 6, 18, 88, 90, 92, 102, 108, 138, 140–1, 168–74, 236–49, 326, 458, 475, 518, 527, 556, 566, 568, 570, 575–7, 607–9, 620, 656 Bari Italian 241–2, 245–6, 248–9 Cosenza Italian 241–2 Florentine Italian 241–3 La Spezia Italian 242 Lucca Italian 242 Neapolitan Italian 88, 90, 92, 239, 240, 241–3, 245–6, 248–9 Palermo Italian 241–3, 246, 248–9 Pescara Italian 242, 249 Pisa Italian 242–3, 245, 248–9 Roman Italian 18, 241–2 Salerno Italian 242 Southern Italian 102 Turin Italian 242–3 Tuscan Italian 18, 241–3 Italian Sign Language 118 Jamaican Creole 82–3, 297 Jaminjung 388, 390, 464, 465 Jamsay 47, 193 Japanese 6, 10, 63, 69, 78, 82–94, 101, 131, 135–6, 138, 167, 171–2, 174, 254, 326, 355–69, 462, 481–2, 485, 511, 524, 541, 545–6, 554, 557, 560, 566–8, 570, 573–8, 597, 606–11, 624, 626, 648, 658, 668–9, 681 Echizen-Kokonogi Japanese 357, 360 Kagoshima Japanese 360 Kobayashi Japanese 357, 360 Koshikijima Japanese 357, 360 Kuwanoura Japanese 360 n.7 Kyoto Japanese 357, 360

Miyakonojo Japanese 357–8 Nagasaki Japanese 357, 360 Narada Japanese 357, 358 n.5 Osaka Japanese 357, 360 Sino-Japanese 359 Tokyo Japanese 6, 82, 91, 254, 355–64, 369, 545–6, 570 Yuwan Japanese 360 Javanese 373, 375 n.1, 376, 382 Jeh 346 Jemez 401 Jibbali 197 Jingulu 385 Jita 186, 188 Jukun 190 Kabardian 70–1 K’abeena 202 Kabyle, see Berber Kadiweu 429 n.3,4, 430, 432 Kairi 51, 393 Kakataibo 430–4 Kakua 435 Kalabari 57–8, 186, 188 Kalam 391, 681 Kalenjin 53, 241 Kam 49 Kannada 295 Kanum 391 Kaqchikel 161, 422 Karapana 435 Karen, Eastern Kayah Li 56 Karuk 406 Kashinawa 433 Kashubian 225 Kayardild 384 n.1, 385–90 Kazakh 207 K’ekchi 161 Kera 48, 201 Ket 332, 342–3 Khmer 345–6, 349 Khmu 348–9, 352 Khoekhoe 184 Khwe 184 Kickapoo 404 Kikerewe 186 Kikuria 57–8 Kinyarwanda 88, 126–7 Kiowa 401 Kipare 89 Kisi 187 Klamath 398 Koasati 101, 400–2 Kofán 434, 436 Kohistani, Kalam 681 Kom 55, 64, 186 Konni 187, 190, 192

Korean 5, 9, 16, 20, 22, 25, 76, 82–4, 93, 131, 169, 301, 355–69, 445, 461–2, 466, 488, 528–9, 559, 567, 596, 600, 606–10, 612, 614, 616, 620 n.1, 624, 656 Chungcheong Korean 363–6 Daegu Korean 357, 360 n.9, 364 Gangwon Korean 363–4 Gyeonggi Korean 363 Gyeongsang Korean 357, 363–5, 367–8 Hamgyeong Korean 357, 363 n.12, 364 Hwanghae Korean 363 n.12 Jeju Korean 363–6 Jeolla Korean 357, 360 n.7, 363–6 Middle Korean 363–4 North Korean 363 n.12 Pyeongan Korean 363 n.12 Seoul Korean 9, 356, 365–7, 369 Yanbian Korean 5, 362, 364, 367, 369 Koreguahe 436, 438–40 Koromfé 183, 190 Kotiria 435 Kotoko, Makary 188 Koyukon 401 Kpelle 189 Krachi 51 Kubeo 430–1, 435, 440 Kukatj 386 Kuki-Chin 49, 350–1 Kuki-Thaadow 49, 51, 54–5 Kukuya 47, 49, 52, 185, 190 Kulawi 381–2 Kunama 59, 204 Kuot 395 Kutenai 398 Kʷak’ʷala 72 Laal 187, 190, 193 Lai, Hakha 49, 56 Lakhota 397, 400 Lakondê 431, 438 Lango 56, 191 Latgalian 225 Latin 97, 319, 524, 661–2, 667, 675 Latundê 431, 438 Latvian 225–7, 231, 235, 277, 284, 401 Leggbó 50 Libras (Brazilian Sign Language) 105, 110, 115–17, 120–2 Limburgish (Limburgian), see Dutch, Limburgish Limilngan 385 Lithuanian 6, 9, 155, 225–7, 231, 235, 284, 401 Lomongo 189 Lua 190 Luganda 187–8 Lulamogi 87–8 Lulubo 58 Lushai 677

Lushootseed 399 Lusoga 5, 191–2 Maa 204 Maale 203 Maban 205 Macedonian 70, 225–6, 229, 234 Mafa 202 Máíhɨ̃ki 435, 438, 440 Makary Kotoko, see Kotoko Makuna 435, 438 Malakmalak 74 Malay 76, 169, 295, 302, 347, 370–8, 382, 465, 608 Ambon (Ambonese) Malay 76, 169, 373, 376, 378 Betawi Malay 373 Kutai Malay 373 Malaysian Malay 373, 376, 378, 608 Manado Malay 373, 376, 378, 382 Papuan Malay 378 Malayalam 316, 323, 326 Malinké de Kita 51, 184 Maltese 8, 199, 236–42, 245, 248–9, 298–9, 301 Gozo (Gozitan Żebbuġi) Maltese 242 Mambila 138, 191 Manam 375 Mandarin, see Chinese, Mandarin Mankon 55 Mano 136, 190 Manx 303 Māori 465 Mapudungun 429 n.4, 437 Marathi 316, 318, 331 Margi 52, 56 Martuthunira 385, 387 Matbat, Magey 372 Mawa 201 Mawng 5, 88, 386, 389–90 Maya, Yucatec 417–23, 465 Ma’ya 6, 162, 372 Mayo 406 Mazatec 53, 147, 410, 412–13 Jalapa Mazatec 147, 413 Mbabaram 387 Mbui 184 Medumba 135 Mehri 197 Merap 372 Mesem 391 Miao, Black 349 Migaama 201 Mikasuki 405 Min, see Chinese, Min Miraña 431, 433 Mixe, Ayutla 424–6 Mixtec 51, 58, 76, 408–17 Acatlán Mixtec 51 Ayutla Mixtec 76, 412, 424

Chalcatongo Mixtec 58 Coatzospan Mixtec 51 Peñoles Mixtec 51 Modo 205 Mofu 201 Mohawk 402 Moken 372 Mon 347 Mongolian 82–3, 207–14, 219, 223–4 Halh (Khalkha) Mongolian 207, 212, 224 Monguor 2 Moor 372 Mordvin 669 Mpi 57 Mpur 394 Muinane 432–3 Mukulu 201 Mundurukú 56, 435–6, 438 Muniche 430, 436 Mursi 204 Musgu 9, 201 Nadëb 435, 440 Naga, Tangkhul 50, 53 Nahuatl 397, 400–3, 408 Balsas Nahuatl 400–3 Nandi 53–4, 297 Nanti 436–7, 661 Navajo 402, 406, 465, 681 Naxi 134, 351 Nayi 203 Ndam 190 Ndjébbana 384 n.1 Nez Perce 399 Ngalakgan 385–7 Ngamambo 49, 51 Ngan’gityemerri 385–6 Nganyaywana 387 Ngiti 205 Ngiyambaa 385, 387 Ngizim 201 Nhanda 385, 387 Nɬeʔkepmxcin 465 Noni 59 Nootka, see Nu-cha-nulth Norwegian 6, 41, 52, 254, 271–7, 280, 306, 401, 493 Nubi 48, 199 Nu-cha-nulth (Nootka) 406 Nukini 429 Nunggubuyu (Wubuy) 386–7 Nyakyusa 183 Ocaina 438 Oirat 213–14 Old Norse 306 One 391 Oriya 319

Oromo 202 Osage 75 Ossetian 216 Paiute, Southern 69 Paiwan 70–1, 374 Pame, Northern 414 Pangutaran 375 Papiamentu 76 Persian 5, 207, 215–18, 223, 457, 597, 661–2, 671 Piaroa 429 n.4, 430, 437 Pingyao 56 Pirahã 71, 76, 429, 436 Pitjantjatjara 73, 389 Polish 24, 40, 90, 150, 161, 168, 172, 174, 225–35, 595, 598, 625, 629 Pomo, Southeastern 397 Portuguese 9, 83, 135, 138, 143, 177, 200, 251–70, 319, 326, 458, 550, 554–7, 590, 615 Brazilian Portuguese 177, 252, 261, 266, 268, 270 European Portuguese 135, 252, 260–1, 266–8, 326, 458, 554–5 Northern European Portuguese 143 Proto-Arawan 439 Proto-Bantu 50 Proto-Boran 439 Proto-Chadic 201 Proto-Chibchan 439 Proto-Nadahup 440 Proto-Panoan 439 Proto-Tukanoan 439 Proto-Tupian 439 Pumi 349 Punjabi 295, 301, 316, 320, 331 Qiang 347 Quechua, South Conchucos 72, 430 Raglai, Northern 346 Rarámuri, Choguita 399–404 Rembarrnga 386 Romani 88, 92, 94, 225, 457 Romanian 84, 225–6, 229–30, 235, 457 Ronga 59 Rotuman 378 Russian 72, 109, 132, 140, 150, 167–8, 225–6, 229–35, 457, 461, 465, 525, 560, 566, 597, 629, 662 Russian Sign Language 112, 115 Ruwund 186 Saami (Sámi) 74, 271 Pite Saami 74 Saisiyat 374 Sámi, see Saami Samoan 375, 465 Sanskrit 524, 661, 671–2, 675 Sara 205

Sardinian 459–60 Savosavo 72 Saweru 394–5 Sayhadic 198 Saynáwa 432, 436 Scottish Gaelic, see Gaelic Seereer 183 Sekani 399, 402 Sekoya 429, 437 Selk’nam 439 Sena 183 Seneca 69, 405 Sentani 394 Serbian, see Bosnian-Croatian-Montenegrin-Serbian Serbo-Croatian, see Bosnian-Croatian-Montenegrin-Serbian Shambala 51, 185 Shanghainese, see Chinese, Wu Chinese Sharanawa 435 Shehri 197 Shekgalagari 5, 194 Sheko 203 Shilluk 204 Shingazidja 193, 462 Shipibo 436 Shiwilu 429 Shixing 347 Shona 679–80 Shoshone, see Tümpisa Shoshone Sidaama 202 Sign Language of the Netherlands 110, 112, 115, 120 n.8 Sikaritai 393 Sikuani 431, 437 Sirenikski 309 Skou 392–3, 395 Slave 400–2 Slovak 225–35, 471, 629 Slovenian 225–9, 234 Somali 9, 52, 202, 661, 668 Soqotri 197 Sorbian 225 Sotho 135, 465 Northern Sotho 135 Spanish 9, 18, 69, 88–9, 94, 131, 138, 144, 161, 167–72, 174, 176, 200, 245, 251–70, 458, 465, 471, 511, 548–50, 557, 566, 570, 595–9, 604, 607–11, 614, 616, 623–5, 629, 656 Basque Spanish 88, 458 Canarian Spanish 458 Caribbean Spanish 264 Madrid Spanish 458, 465 Mexican Spanish 138, 458, 550, 609–11 South American Spanish 88, 264, 458 Argentinian Spanish 264 Stoney 397 Sumtu, Myebon 350–1 Sundanese 371

Supyire 185 Suruí 431, 435 Swahili 183, 189 Swedish 6, 9, 16, 52, 78, 87–8, 92, 112, 132, 143, 161–2, 168, 176, 254, 271–80, 285, 306, 401, 457, 511, 541, 545–7, 559, 608, 649 Central (Stockholm) Swedish 143, 272–7, 545 Dala Swedish 273, 275 Eskilstuna Swedish 277 Finland Swedish 272 Gotland Swedish 273 South Swedish 273 Swedish Sign Language 112 Taa (!Xóõ) 184 Tagalog 69, 102–3, 374–81 Tai Dam 678 n.1 Tai Phake 681 Taiwanese, see Chinese, Min Tamang 48, 351 Tamasheq, see Berber Tamazight, see Berber Tamil 82, 88, 295, 316, 323, 326, 661 Tanacross 50, 401–4 Tanimuka 108, 431, 438 Tarifit, see Berber Tashlhiyt, see Berber Tatuyo 51, 440 Telugu 167, 316–18 Tepehuan 407 Terena 430–2, 436, 439–40 Thai 76, 78, 126–7, 134, 162, 168, 170, 344–6, 349–50, 353, 511, 541–2, 577–8, 602, 609–11, 625, 678 n.1, 679, 681, 683–6 Tibetan, Lhasa 352 Tifal 391 Tigrinya 198 Tikuna 433, 435–6 Amacayayu Tikuna 433 Timugon Murut 103 Tinputz 51 Tiriyó 429 Tiv 135 Toba Batak 372–3, 375 n.1, 378 Tobagonian Creole 298 Tohono O’odham 67–8, 72, 74–5, 407 Tommo So 188, 679, 687 Tongan 67, 373 Torau 457 Totela 186 Totoli 376–7 Trinidadian Creole 298 Trique (Triqui) 52, 64, 68, 76, 412–15, 601 Chicahuaxtla Triqui 415 Itunyoso Trique (Triqui) 52, 64, 412–14 Tsat 347 n.2 Tsimshian, Sm’algyax 397

Ts’ixa 184 Tsuut’ina 401 Tswana 135 Tuareg, see Berber Tübatulabal 69 Tukang Besi 68 Tukano 435–6, 440 Tumak 190 Tumbuka 183, 189 Tümpisa Shoshone 69 Tunebo 6, 430, 436 Turkish 5, 150, 161, 207–11, 221, 223, 244, 465, 570, 662 Tuyuka 430, 434 Ukrainian 225–6, 229, 233, 235 Ulwa 96, 102 Uma 381 Umpila 386–7 Umutina 436 Unangam Tunuu, see Aleut Uradhi 387 Urdu 608, 661–2, 671, 675 Urhobo 53 Urubú Kaapor 68, 75 Uyghur 207 Uzbek 224 Vietnamese 46, 58, 126–7, 345–6, 349–54, 608, 661, 678 n.1, 679, 682–3 Hanoi Vietnamese 46 Northern Vietnamese 349, 353 Southern Vietnamese 58, 127, 349 Wa 341, 347–8 Waima’a 376–7 Wambon 391 Wandala 202 Wãnsöhöt 435, 438 Waorani 437 Warekena 432 Warlpiri 384–8 Warray 386 Waru 669–70 Wayuunaiki 436 Welsh 76, 303–5, 556 West Greenlandic, see Inuit, Kalaallisut Wichí 437 Wik Mungkan 388 Wobe 184 Wolaitta 203 Wolof 183, 189, 465 Wubuy, see Nunggubuyu Xhosa (isiXhosa) 298, 462 Xipaya 439 Xitsonga 194

Yagua 437 Yakan 375 Yana 72 Yankunytjatjara 385, 387 Yanomami 402, 406, 432, 439 Yaqui 54, 398 Yaur 372 Yawitero 437 Yem 203 Yemba (Dschang) 55 Yerisiam 372 Yi 341 Yiddish 271, 278, 280–1, 289 Yidiny 387 Yimas 391 Yongning Na 349, 351, 353 Yoruba 52, 54, 57, 89, 126, 129, 138–9, 148, 167, 185–6, 293, 542 Yuchi 397 Yuhup 440 Yukulta 387–8 Yukuna 438 Yupik (Yup’ik) 9, 69, 309–11, 314, 398–9, 406 Alutiiq Yupik 311, 314, 398 Central Alaskan Yupik 9, 69, 311, 399 Central Siberian Yupik 314 Naukanski Yupik 314 Sirenik Yupik, see Sirenikski St Lawrence Island Yupik 314 Zaar 465 Zaghawa 205 Zande 187 Zapotec 56, 412, 414–15 Coatlán Loxicha Zapotec 415 Comaltepec Zapotec 412 Isthmus Zapotec 56 Ozolotepec Zapotec 415 San Lucas Quiaviní Zapotec 412 Zargulla 203 Zulu 298, 607–8, 679 Zuni 102, 406

Subject Index

A

Abercrombian foot 525 n.1 accentual phrase 4, 5, 9, 17, 63, 82, 202, 212–13, 219–20, 230, 232–3, 238–9, 243, 247, 255, 279, 294, 323, 331, 361, 388, 461, 525, 600, 610 acoustic cues to prosody ch 10, 228, 365, 479, 488, 495, 505, 511, 519, 523, 526, 530, 536, 555, 565, 568, 596–8 aerodynamic measurement 18–19, 25 agglutination 209, 212, 224 alignment forced 173 Optimality Theory 103, 661, 671, 675 tone target 87, 126, 129–31, 139–42, 145–6, 156, 162, 190, 196, 199–200, 234, 242–5, 249, 252, 261, 289, 294–5, 305–8, 328, 379, 459, 555, 557, 610, 650 sign languages 108, 122 stress/accent 199, 204–1, 243–4, 261, 391, 405, 438, 563 text-setting 388, 661, 671, 675 tone and gesture 22 American structuralism 285 anaclasis 671 anticipatory phonetic 53, 126–8, 138, 558 phonology 54, 204 speech error 531 antimetricality 658 articulation rate, see also speech rate, tempo 167, 170–1, 612–13 Articulatory Phonology 525 n.1, 529, 536 assessment machine performance esp. ch 47, 639, 655 non-native speech 613, 616–18, 637–40 pathology esp. ch 34, 583, 593 assimilation tone 51, 53–4, 65, 129, 431 vowel/consonant 151, 216, 240, 318, 321, 328–30 auditory processing 500, 586–7 autism spectrum disorder 490, 504, 583, 642 automatic labelling ch 47 automatic speech processing ch 46, 47 automatic speech recognition ch 46, 524, 525 n.6, 628, 646, 654 Autosegmental-Metrical Theory ch 6, 140, 200, 206, 230, 238, 285, 353, 373, 456, 525, 557

B

Bayesian model 512, 515, 517–20 beat-splitting 659 big/small accent (Swedish) 273 biological code 135, 446 n.1, 615 bootstrapping, see prosodic bootstrapping boundary tone 6–7, 22, 35, 63, 76, 83–4, 86, 90, 156, 162 n.9, 165 n.11, 191–3, 196, 199–200, chs 14–21, 342, 354–3, 362–7, 377–8, 388, 390, 394, 418, 445, 446–7, 459, 461, 479, 518, 525, 554, 609–10, 621, 640 n.10, 651 British School of intonation analysis 86, 233, 285, 286, 287 n.3, 301, 524, 526

C

calling contour/vocative chant 35, 89–90, 234, 244, 247, 282 Cantopop 677, 681–3 catalexis 658 n.1, 666, 668 categorical perception 39, 67, 504, 564, 601 cerebral palsy 587 chain shift (tone) 334–5 child-directed speech ch 41, 569 Chinese regulated verse 661 clitic group 4, 227, 229, 525 n.2, 527, 570 clitic 57, 106, 202, 215–7, 223, 227, 229, 237, 279, 308, 312–4, 352, 373–5, 379–82, 386, 405, 409, 411, 416 n.7, 417, 422, 530 cochlear implant 492, 588–90, 592 CoG (centre of gravity) 144–6, 147 Compound Stress Rule (CSR) 337–8 computational paralinguistics 473, 638, 640 constraints (Optimality Theory) 99, 102–3, 185, 221, 239, 328, 548, 551, 655, 659, 661–7, 671, 679 n, 686, 687 contour simplification 55–6 contour tone 47–55, 184–5, 204–5, 332 n.1, 401, 410, 413, 433, 435, 440, 504, 608, 679, 683, 686 contraction in speech 253, 279, 475 in text-setting 663 contradiction contour 449, 452 Contrast Theory 145, 147 contrastive rhythm 168–8, 173 correlates of stress ch 10, 76, 158, 198, 212, 229, 233, 235, 354, 372, 386, 400, 402, 412, 429, 434, 466, 649

co-speech gesture 67, 297, 538 coupled oscillators 170, 177–9 creak, see glottalization creole 12, 48, 76, 82, 83, 297–8 cross-cultural variation 481, 576–8, 642 culminativity 15, 48, 196, 202, 224, 392, 428, 434–5 cyclicity phonological 337 speech production 177, 196

D

dance 659–60, 668 deaccenting 6, 135, 211, 217, 223, 232–5, 263, 276, 288–90, 296, 301–2, 306, 325–6, 329, 369, 390, 455, 457–9, 464, 607–8 deaccentuation, see deaccenting deceptive speech 475, 476, 638 declination 20, 89, 135, 136, 137, 138, 139, 289, 493, 494 default tone 51, 294, 433 delayed peak 22, 88, 127, 281, 295, 341, 447 demarcative 160, 202, 208, 224, 251, 429, 437 depressor consonants 184, 201, 204 deseterac meter 661 Developmental language disorder (DLD) 585–7, 592, 593 diachrony 201, 272, 275, 279, 404, 444 Diagnostic Analysis of Nonverbal Behavior 509 Dinka song 677, 684–5 dipod 660, 666–8, 671–3 discrimination in infants 564, 569 intonation 36, 614–15, 653 JND 34–7 rhythm 171, 174–5, 566 stress 155, 570 tone 349, 503, 542–4, 602 dissimilation phonetic 128 phonological 56, 432 distich 660, 666, 668, 675 downdrift 50, 136, 193, 201, 204, 305 downstep 136, 184–5, 201, 204, 243–8, 257, 281, 287, 290, 293, 295, 361–2, 367–8, 377, 401, 416, 648, 686 downtrend 135–8, 213, 220, 223, 293, 308–9, 315 dysarthria 490, 587–8 (dys)fluency 591–2, 606, 611–14, 616–17, 620, 628, 641, 649, 655 dysprosody 489, 491–3

E

early accent, see stress clash echo question 264, 299, 311, 382 edge tone 193, 240, 243, 245, 249, 260, 281, 285–90, 276, 295, 299, 301, 308, 374, 377, 378–80, 382, 525 EEG (electroencephalography) 501, 564, 570

EGG (electroglottography) 16–18, 26 elision in meter 663–4, 675 of pitch movement, see truncation EMA (electromagnetic articulography) 21–4, 26 emotion perception 474, 590–1 emotional speech 472–3, 487 encliticization, see clitic entrainment 95, 178–9, 468, 469–72, 520, 638 epenthesis 132, 200, 212 EPG (electropalatography) 24–5, 529 ERB scale 165 exaggerated pitch 576 extrametricality 72–4, 77, 82, 437, 598–9 extrametricality (meter) 663–4

F

facial expression 104, 105, 107–12, 117–22, 474, 477, 479–80, 482, 484, 487, 492, 561, 567, 584–5 final devoicing 308, 310, 315 final lengthening 157, 175–7, 209, 285, 298, 301, 311, 375, 405, 461, 469, 528–9, 532, 557, 575, 613 fixed stress 160–1, 163, 229, 397–8, 402, 425–6, 430, 439, 624–5 floating tone 50, 185, 188, 201, 410 fMRI (functional magnetic resonance imaging) 27, 501–2, 504–5, 564–5 focal accent/tone 266, 273–8, 287, 290, 295, 369, 428 focal prominence 229, 275, 340–1, 466 focus particle 115, 197, 202, 423 foot 4, 47–8, 74, 81, 96–102, 105, 157, 169–70, 176–77, 189, 207, 212, 216, 226, 228, 238, 253, 272, 283, 307, 312, 314, 335, 337–9, 375, 384–5, 391, 394–5, 405, 419, 429–31, 434, 437, 439, 525, 536, 540, 550, 599, 628, 657, 659, 660–2, 666–71, 675, 994–5 formulaic expression 294 function word 69, 156, 287, 292–4, 335, 339, 379, 381, 516, 531, 571, 595, 623, 649, 664–5 functional load 161, 188, 205, 216, 301, 411, 415 Functional Load Hypothesis (FLH) 161–3 fundamental frequency ch 3

G

garden-path sentence 2 geminate 3, 68, 238 n.3, 245, 279, 312, 314, 316, 317–19, 330, 387, 432 gesture intrusion 538 ghazal 675 givenness 111–14, 117, 135, 214, 231–4, 260, 273, 287, 309, 311, 447, 455–6, 459, 464, 466–7, 559 glottalization/laryngealization/creak 16, 18, 200, 204, 231–2, 277, 343, 349, 351, 404, 413–14, 420–1, 424, 436–7, 523 grammatical/morphological/syntactic tone 50, 56–8, 63, 186–8, 203–5, 216, 349–51, 400–3, 410–11, 438

H

haiku 167, 668 hearing impairment 133 hearing loss 588 perception of prosody 589–91 hemistich 660, 666, 668, 672 Hertz-to-Bark conversion 154 hexameter 667 High Rising Terminal, see uptalk horizontal assimilation, see spreading

I

Iambic-Trochaic Law 73, 565, 567, 570 implication contour 448 improper bracketing 661, 666 incrementality ASR 640 speech planning 533–7 infant directed speech, see child-directed speech infants chs 38–41, 172, 174, 179, 524 infixation 96, 102–3 initial lengthening, see initial strengthening initial strengthening 20, 176, 240, 510–1 initiality accent 273, 276 initiation time 530, 537 intermediate phrase 9, 81–2, 108, 209, 230, 235, 255, 279, 287, 361, 445, 609, 648, 651–2 interpolation 24, 81, 87, 91–2, 94, 130, 136, 140, 293–4, 321 n.7 interrogative sign language 109, 117 spoken language 137, 141, 192–4, 219, 223, 231, 233, 254–5, 261–2, 268, 292–4, 310–1, 328–9, 342, 361, 376, 389, 395, 445 interrogative particle 197–8, 202–3, 223, 264, 286, 608 Intonation Interaction Profile (IIP) 591 intonational construction 233 intonational meaning esp. ch 30, ch 31 intonation(al) phrase 9, 18, 26, 81–2, 105–7, 209, 213, 230, 233, 239, 255, 258, 278–9, 282, 287, 308, 310–11, 317, 320, 361, 376, 388, 416, 445–6, 512, 514, 523, 527–9, 530, 534, 555, 566, 571, 609, 641, 648, 653 intoneme 231, 233 intrinsic f0/pitch 39, 685 n.7 isochrony 167–70, 175, 177–8, 238, 405, 659 isosyllabic 667

J

JND (Just Noticeable Difference) 34–7

L

L2 teaching esp. ch 45 language development 10, 505, 506, 556, 564, 566, 574, 580, 581

language processing 500–1, 586, 590 laryngealization, see glottalization laryngoscopy 16, 24, 26 length shift 375 lesion 490, 499–502, 506 level tone 49, 129, 135, 138, 293, 410, 504, 543, 602 leverage features 11, 634, 642, 643, 645 lexical phonology 662 lexical (pitch) accent 16, 18, 78, 82, 87, 188, 208, 211, 231, 234, 251, 253–5, 261–2, 268–9, 272–6, 277–8, 282–4, 304, 355, 362, 364, 385–90, 461, 544, 546–7, 638 lineation 658–9 localized lengthening 167, 176–9 long-distance effects 63, 201, 687 loudness 18–19, 31, 41–42, 66, 77, 162, 434, 469, 470, 474, 477, 494, 499, 587, 591, 592, 593, 635, 636, 638, 642–3, 662 LTAS (long-term average spectrum) 154 Lục bát 661

M

machine learning 468, 475, 633, 646–7 Mainstream English Varieties 9, 286 major phrase 361 marked nominative case systems 204 maximal prosodic word 106, 276–8 measuring techniques esp. ch 2 melodeme 232 melodic contour training 592 merging 205, 241, 248–9 meter 11, 68, 524, 657–69, 671–5 metrical rhythm (see speech rhythm) metrical structure 78–9, 81, 94, 167, 170, 179, 251, 316, 337, 343, 386, 388, 395, 402, 417, 429, 440, 520, 524, 657, 659, 661, 666, 673, 684 minimal prosodic word 99, 274, 277–8 minimal stress pair 152, 158 minor phrase 361 mismatch (prosodic structure) 272, 339, 369, 390, 501, 661, 671, 673 mispronunciation 542–3, 548–9 missing fundamental 31–3 monosyllabic words 47, 52, 73, 97, 252, 317, 345, 392, 411, 415–16, 429, 518–19, 604, 665, 670 moras 668 morphological tone, see grammatical tone morphotoneme 46 motor speech disorder 94, 173, 490, 587 MRI (magnetic resonance imaging), see fMRI music 12, 238, 478, 506, 590–1, 657, 658–60, 666, 669, 675, 678–9, 681–5

N

narrow focus 134, 217, 231, 234–5, 259–60, 266, 267, 326, 328, 330, 376, 390, 457–62, 559

nativized varieties 286 neuroimaging 10, 499, 501–2, 505, 507, 545, 584 neuroplasticity 505–6 newborn(s) 568, 578–9 Nijmegen model of speech encoding 533–6 Non-Finality 74, 77, 234, 308, 388 nonverbal communication 478–9 nuclear contour 261, 286, 288, 301, 308, 613 nuclear (pitch) accent 7, 82, 84, 86, 91, 200, 277, 279, 297, 308, 457, 459, 464–6, 528, 609 Nuclear Stress Rule 337

O

Obligatory Contour Principle (OCP) 55, 143, 275–6, 320–1, 323–4, 328–9, 365 oblique setting 678, 685 one high only (OHO) constraint 203 optoelectronic device 22, 27

P

paradigmatic comparison 152, 154 paralinguistic(s)/paralanguage 3, 78, 134, 135, 444–5, 480, 485, 576, 615, 635, 637, 639 parallelism 666–7, 669 parentese, see infant/child-directed speech pause 40–1, 170, 470, 475–6, 528–9, 555, 575, 578, 592, 612–14, 638, 641–2, 672 peak intensity 153, 158 PENTA model 3 pentameter 659, 662–3 PEPS-C 492, 591 perceptual integration 169 perceptual learning 471, 517–18 performance conventions 659 periodicity 1–30, 166–8, 177–9, 499, 660 PET (positron emission tomography) 501, 502 phonation type 147, 347, 348, 413, 424, 436 phonetic encoding 533–4, 625 phonetics–prosody interplay 16, 18, 23, 27 phonological abstraction 571–8 phonological phrase 3, 46, 105–7, 238, 277, 279, 352, 415, 527, 567, 570 phonological processes 129, 241, 415, 527 phonological word, see prosodic word phrasal/phrase accent 5, 7, 83–6, 89–91, 168–9, 177, 216–17, 220–2, 232–4, 240–9, 279, 287–9, 301–8, 315, 319, 446, 562, 640 n.10, 651–2 phrasal prominence 258–9, 650 phrasal stress 23, 202, 238, 280, 337–9, 419 phrase tone, see edge tone 295 phrase-based cues 461–2 phrasing, see prosodic phrasing pitch esp. ch 3

pitch perception esp. ch 3, 500, 503, 568 pitch production acquisition 546 disorder 488–9, 493, 495 pitch range 35, 89, 110, 129, 133–5, 148–9, 444–5, 466, 518, 556, 575 planning frame 531–2, 535–6 polar question, see also yes/no question 79, 85, 188, 191, 197, 200, 202, 223, 550, 609, 610 polar tone 56, 293, 438 polarity 56, 187, 204, 216, 435, 438, 608–9 polysynthetic languages 311 positive affect 574, 577, 579 postfocal accent, see also deaccenting 235, 248, 289 post-focus compression 2, 135, 296, 341, 376, 464, 608 post-focus deaccenting 135, 211, 217, 234–5, 263, 289–90, 296, 301–2, 306, 325–6, 329, 369, 390, 455, 457–9, 464, 607–8 post-lexical pitch 361, 389, 461, 463 posttonic suffix 278 power features 641–2 Praat 163–5, 592, 629 pragmatics 444–5, 447–8, 450–2, 453, 472 pre-boundary lengthening 22, 153, 176, 217, 239, 240, 241, 255, 388, 555 preference for infant/child-directed speech 574, 575, 578 prenatal 565–6, 588 prenuclear accent 141, 261, 305, 306, 321, 324 pre-planning 53, 137, 213, 530–1, 537 primary stress 66, 67, 68, 69, 151–5, 191, 199, 219, 228, 237, 238, 251, 252, 272, 279–81, 289, 301, 304, 312, 314, 372, 373, 391, 397, 398, 425–6, 430, 431, 432, 433–4, 439 procliticization 279 productivity 335, 343, 405 Prokosch’s Law 272 prominence 6, 15, 18, 20, 22, 66, 75–6, 79, 81, 82, 105, 139, 145–7, 156, 168–9, 174, 190, 456–7, 459, 461, 463, 464–5, 466, 525, 527–8, 531, 537–8, 560, 565–6, 568, 570, 606–8, 650, 659–2, 665, 667, 673 prominence grid 75 prominence tone 275 prose 658, 661 prosodic bootstrapping esp. ch 40, 561, 584 prosodic constituency, see prosodic phrasing prosodic domain 6, 63, 76, 108, 150, 206, 233, 277–8, 280, 310, 318, 350, 399, 404, 405, 422, 426, 511 prosodic head 151, 211 prosodic hierarchy 4, 82, 97, 209, 217, 240, 277–8, 352, 362, 508, 512, 519, 523, 528, 567, 609–10, 660–1 prosodic morphology ch 7, 406–7 prosodic paragraph 279 prosodic phrasing 15, 20, 22, 25, 81–3, 96–7, 100–1, 105–8, 200, 209, 236, 239, 255–8, 279, 352, 365, 368–9, 377, 404, 415–17, 510, 518, 522–3, 525, 526–30, 532, 533–4, 537, 555, 572, 609

prosodic prominence 115, 267, 565, 570, 606 prosodic structure, see prosodic phrasing prosodic typology chs 4–6, 171, 173, 440, 647 prosodic word/phonological word 97, 105, 106, 513, 521 Prosody First model of speech encoding 534–6 Prosody-Voice Screening Protocol (PVSP) 591

Q

quantitative meters 657, 661–2, 666–71 quantity 226–7, 228, 272, 284, 316–18, 387, 675 quasi-tonal languages 222 quatrain 658, 660, 668–70, 674 question intonation 223–4, 341–2, 389, 426, 448, 449, 551 question particle 5, 223, 268 question under discussion (QUD) 448–9, 451–2

R

ranking 99–100, 659, 671 Rapid Prosodic Transcription (RPT) 529, 647 real time MRI 27 recitation 194, 659, 673 reduplication 50, 96, 97–100, 103, 398, 406 register 50–1, 135, 192, 209, 246–7, 340, 389–90, 444, 457, 463–4, 466 register-based cues 9, 463–4 replacive tone 58, 63, 186–8 resistance to coarticulation 155 respitrace inductive plethysmograph (RIP) 16, 19, 20, 26 restricted tone 69 resyllabification 237, 675 rhythm ch 11, 230, 239, 489, 490, 491, 557–8, 565–6, 587, 611–12, 616, 623–4, 627, 638, 655, 658–60, 667–70 Rhythm and Pitch (RaP) 525–6, 647–8 rhythm class 167–8, 171–5, 178, 230, 624 rhythm metrics 168, 171–3, 230, 624 Rhythm Rule 272, see stress clash rising declarative 450 root-and-pattern morphology 100–1

S

saliency 38, 42, 138, 163, 169, 174, 228, 554–5, 576, 673–5 sandhi, see tone sandhi scaling 38, 87, 89–92, 133–9 scansion 675 secondary association 82, 84–5, 91, 232, 243, 245, 249 secondary stress 4, 67–8, 72–5, 228–9, 237–8, 252–3, 270, 272, 280, 306–7, 385, 397, 412, 419, 425–6, 429, 431–3, 439 segmental intonation 40–1, 147 semitone scale 165 n.1 sentence modality 195, 257, 260, 262, 270

sentence stress ch 10, 168, 202, 489–90 sesquisyllables 346 shortening 68, 75, 176, 489, 526, 529, 671 sign language intonation ch 8 silence 153, 470, 613, 634, 641, 652 specific language impairment, see Developmental language disorder (DLD) spectral expansion 154–5, 157–8 spectral reduction 154 spectral tilt 154–5, 158, 163–4, 456, 474, 478, 650, 652 speech errors 531, 534 speech processing 174, 208, 489, 503–4, 507, 532, 566 speech production planning ch 37 speech rate 41, 131, 166, 170–2, 470, 472, 575, 612–13, 616 speech recognition 510, 512, 518, 521 speech/linguistic rhythm ch 11, 557–8, 566, 612, 616, 658 speech timing 166–8, 175–9 SpeechPrompts 593 spreading 54–5, 129, 185–6, 201, 203–4, 249, 294, 335, 350–2, 401 stød 271, 277 strengthening, non-prosodic 17–18, 20, 25, 67–8, 176, 511, 541 strengthening, prosodic 15, 18, 20, 22–5 stress ch 5, ch 10 ambiguous analysis 189, 196–7, 205, 372–5, 378 and L1 tone 543, 600 and L2 tone 600–3 stress assignment 74, 199, 215, 229, 252, 295, 298, 318–19, 391, 417–19, 426, 570 stress clash 73, 75, 280, 307, 346, 527 phrasal 5, 178, 241, 272 n.1, 280, 287, 307, 323, 527 word 72–3, 75, 198, 346, 402 stress deafness 169, 208, 595, 625 stress meters 661–2, 667 stress shift 229, 280, 336–7, 418, 436 stress typology 76–7, 374 stress-based cues 461–2, 467 stresslessness 69, 306, 375 n.1, 383 stress–tone interaction 76, 163, 226, 371, 401–2, 420, 428, 436 Stress-XP 337–8 Strict Layering 661, 666 stylized contours 281, 476 subcortical auditory systems 499 subcortical structures 499, 503, 505 subtle stress 373 subtractive morphology 101, 406 suffixation 312, 374, 406 suprasegmentals ch 36, 3, 5, 15, 347, 503–4, 509–12, 522, 596–7, 624–5, 684 surface phonetic variation 527 syllabic trochee 336 syllabification 60, 172, 212, 217, 278, 312, 314, 391, 432

syllable integrity 68–9 syllable weight 70–1, 97–8, 168, 219, 227, 238, 279, 338, 398–9, 548, 598, 624, 666–7 syllable-counting 359, 369, 669 syllabotonic meters 661 syncope 414, 422, 431, 437, 440, 663 syncopation 671–2 syntactic analysis 118, 377, 571–2 syntactic processing 506–7 syntactic tone, see grammatical tone syntagmatic contrast 15, 20, 128, 152, 516 syntax–prosody interaction 203, 256–6, 269, 338–9, 365, 369, 404–5, 464–5, 506–7, 528, 532, 649 Sign language 117–122

T

tân nhạc (Vietnamese ‘new music’) 682–3 tanka 668 TBU (tone-bearing unit) 47–8, 84, 185, 274, 311, 364, 393, 410 TCoG (Tonal Center of Gravity) 38, 144–7, 148–56 template 98, 100–3, 406, 415, 422, 504 tempo, see speech rate ternary feet 394 tetrameter 658, 667–8 text-setting ch 48, ch 49 Thai song 683–4 Theory of Optimal Tonal Perception 37, 38 ToBI ch 6, ch 47, 525–6, 637 tonal alignment 87–8, 90, 129–31, 200, 241–5, 250, 289, 308, 405, 610 tonal coarticulation 126–9, 334, 350–1, 604 tonal copying 248–9 tonal crowding 77, 91–4, 131, 249 tonal density 184, 270 tonal domain 47–8, 339 tonal markedness 51–2 tonal morpheme, see also grammatical tone 57 tonal morphology, see also grammatical tone 53, 205, 349–5, 400–2, 410–11, 438 tonal polarity 204, 435, 438 tonal target 87, 90, 142, 249, 305, 525 tone esp. ch 4 tone absorption 56 tone/tonal accent 48, 200, 216, 271–2, 279, 282–3 tone/tonal height 48–9, 51–3, 55, 60, 64–5, 184, 190–1, 204 tone inventory 143, 187, 217, 243, 291, 333, 388, 409 tone melody 65, 132 tone package 46–7 tone rules 53, 63, 185–6, 191, 353 tone sandhi 63, 333–5, 341, 349–50, 410–11, 544 tone scaling 139, 143

tone spreading 54–5, 185–6, 201, 203, 204, 248, 249, 350, 351, 352 tone-gesture alignment 22 toneme 46 tone–melody matching ch 49 Tone-Stress Principle 335 tonogenesis 64, 184, 201, 275, 320, 371, 419 topic particle 361 trochaic bias 548, 549–50, 568 truncation 92, 100–2, 131–2, 233, 248–50, 261, 308, 310, 551, 610 turn-taking 445, 469–70, 476, 576

U

ultrasound systems 16 underspecified 51–2, 81, 87, 243 upstep/upstepped/upstepping 35, 36, 51, 55, 89, 90, 247, 248, 267, 290, 367, 368 upsweep 51, 305 uptalk 90, 290, 291, 296, 299–300

V

variability problem 510, 512 verse ch 48 vertical assimilation 53, 54 virtual pitch 33–4 visual prosody ch 33 vocative 102, 216, 223, 240, 242, 245, 382 vocative chant, see calling contour vowel harmony 203–4, 208–9, 212–13, 216, 224 vowel quantity 226, 272, 316–17, 669 vowel reduction 67, 159, 168, 172, 174, 228, 229, 239, 309, 372, 506, 597, 612, 624, 650

W

Weaver algorithm 533 weight-sensitive stress 71–2, 398–9 WH-question 111, 117–22, 223–4, 389 word accent, see lexical (pitch) accent word learning 542–3, 544, 546, 548–9, 570 word order 211, 217, 221–3, 231, 233, 235, 259–60, 323, 390, 416, 423, 458, 463–5, 559, 570–1, 608, 614 word segmentation 208, 515, 551 word stress esp. ch 5, ch 10 obligatoriness 68–9, 429

X

X-ray microbeam 21, 22

Y

yes/no question, see also polar question 111, 264–9, 291

OXFORD HANDBOOKS IN LINGUISTICS

THE OXFORD HANDBOOK OF AFRICAN AMERICAN LANGUAGE Edited by Sonja Lanehart

THE OXFORD HANDBOOK OF AFRICAN LANGUAGES Edited by Rainer Vossen and Gerrit J. Dimmendaal

THE OXFORD HANDBOOK OF APPLIED LINGUISTICS Second edition Edited by Robert B. Kaplan

THE OXFORD HANDBOOK OF ARABIC LINGUISTICS Edited by Jonathan Owens

THE OXFORD HANDBOOK OF CASE Edited by Andrej Malchukov and Andrew Spencer

THE OXFORD HANDBOOK OF CHINESE LINGUISTICS Edited by William S-Y Wang and Chaofen Sun

THE OXFORD HANDBOOK OF COGNITIVE LINGUISTICS Edited by Dirk Geeraerts and Hubert Cuyckens

THE OXFORD HANDBOOK OF COMPARATIVE SYNTAX Edited by Guglielmo Cinque and Richard S. Kayne

THE OXFORD HANDBOOK OF COMPOSITIONALITY Edited by Markus Werning, Wolfram Hinzen, and Edouard Machery

THE OXFORD HANDBOOK OF COMPOUNDING Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF COMPUTATIONAL LINGUISTICS Edited by Ruslan Mitkov

THE OXFORD HANDBOOK OF CONSTRUCTION GRAMMAR Edited by Thomas Hoffmann and Graeme Trousdale

THE OXFORD HANDBOOK OF CORPUS PHONOLOGY Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

THE OXFORD HANDBOOK OF DERIVATIONAL MORPHOLOGY Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF ENDANGERED LANGUAGES Edited by Kenneth L. Rehg and Lyle Campbell

THE OXFORD HANDBOOK OF ELLIPSIS Edited by Jeroen van Craenenbroeck and Tanja Temmerman

THE OXFORD HANDBOOK OF ENGLISH GRAMMAR Edited by Bas Aarts, Jill Bowie, and Gergana Popova

THE OXFORD HANDBOOK OF ERGATIVITY Edited by Jessica Coon, Diane Massam, and Lisa deMena Travis

THE OXFORD HANDBOOK OF EVENT STRUCTURE Edited by Robert Truswell

THE OXFORD HANDBOOK OF EVIDENTIALITY Edited by Alexandra Y. Aikhenvald

THE OXFORD HANDBOOK OF EXPERIMENTAL SEMANTICS AND PRAGMATICS Edited by Chris Cummins and Napoleon Katsos

THE OXFORD HANDBOOK OF GRAMMATICALIZATION Edited by Heiko Narrog and Bernd Heine

THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons

THE OXFORD HANDBOOK OF THE HISTORY OF ENGLISH Edited by Terttu Nevalainen and Elizabeth Closs Traugott

THE OXFORD HANDBOOK OF THE HISTORY OF LINGUISTICS Edited by Keith Allan

THE OXFORD HANDBOOK OF INFLECTION Edited by Matthew Baerman

THE OXFORD HANDBOOK OF INFORMATION STRUCTURE Edited by Caroline Féry and Shinichiro Ishihara

THE OXFORD HANDBOOK OF JAPANESE LINGUISTICS Edited by Shigeru Miyagawa and Mamoru Saito

THE OXFORD HANDBOOK OF LABORATORY PHONOLOGY Edited by Abigail C. Cohn, Cécile Fougeron, and Marie Huffman

THE OXFORD HANDBOOK OF LANGUAGE AND LAW Edited by Peter Tiersma and Lawrence M. Solan

THE OXFORD HANDBOOK OF LANGUAGE AND SOCIETY Edited by Ofelia García, Nelson Flores, and Massimiliano Spotti

THE OXFORD HANDBOOK OF LANGUAGE ATTRITION Edited by Monika S. Schmid and Barbara Köpke

THE OXFORD HANDBOOK OF LANGUAGE CONTACT Edited by Anthony P. Grant

THE OXFORD HANDBOOK OF LANGUAGE EVOLUTION Edited by Maggie Tallerman and Kathleen Gibson

THE OXFORD HANDBOOK OF LANGUAGE POLICY AND PLANNING Edited by James W. Tollefson and Miguel Pérez-Milans

THE OXFORD HANDBOOK OF LANGUAGE PROSODY Edited by Carlos Gussenhoven and Aoju Chen

THE OXFORD HANDBOOK OF LEXICOGRAPHY Edited by Philip Durkin

THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF LINGUISTIC FIELDWORK Edited by Nicholas Thieberger

THE OXFORD HANDBOOK OF LINGUISTIC INTERFACES Edited by Gillian Ramchand and Charles Reiss

THE OXFORD HANDBOOK OF LINGUISTIC MINIMALISM Edited by Cedric Boeckx

THE OXFORD HANDBOOK OF LINGUISTIC TYPOLOGY Edited by Jae Jung Song

THE OXFORD HANDBOOK OF LYING Edited by Jörg Meibauer

THE OXFORD HANDBOOK OF MODALITY AND MOOD Edited by Jan Nuyts and Johan van der Auwera

THE OXFORD HANDBOOK OF MORPHOLOGICAL THEORY Edited by Jenny Audring and Francesca Masini

THE OXFORD HANDBOOK OF NAMES AND NAMING Edited by Carole Hough

THE OXFORD HANDBOOK OF NEGATION Edited by Viviane Déprez and M. Teresa Espinal

THE OXFORD HANDBOOK OF NEUROLINGUISTICS Edited by Greig I. de Zubicaray and Niels O. Schiller

THE OXFORD HANDBOOK OF PERSIAN LINGUISTICS Edited by Anousha Sedighi and Pouneh Shabani-Jadidi

THE OXFORD HANDBOOK OF POLYSYNTHESIS Edited by Michael Fortescue, Marianne Mithun, and Nicholas Evans

THE OXFORD HANDBOOK OF PRAGMATICS Edited by Yan Huang

THE OXFORD HANDBOOK OF REFERENCE Edited by Jeanette Gundel and Barbara Abbott

THE OXFORD HANDBOOK OF SOCIOLINGUISTICS Second Edition Edited by Robert Bayley, Richard Cameron, and Ceil Lucas

THE OXFORD HANDBOOK OF TABOO WORDS AND LANGUAGE Edited by Keith Allan

THE OXFORD HANDBOOK OF TENSE AND ASPECT Edited by Robert I. Binnick

THE OXFORD HANDBOOK OF THE WORD Edited by John R. Taylor

THE OXFORD HANDBOOK OF TRANSLATION STUDIES Edited by Kirsten Malmkjaer and Kevin Windle

THE OXFORD HANDBOOK OF UNIVERSAL GRAMMAR Edited by Ian Roberts

THE OXFORD HANDBOOK OF WORLD ENGLISHES Edited by Markku Filppula, Juhani Klemola, and Devyani Sharma