English Pages 384 [399] Year 2020
Approaches to the Study of Sound Structure and Speech
This innovative work highlights interdisciplinary research on phonetics and phonology across multiple languages, building on the extensive body of work of Katarzyna Dziubalska-Kołaczyk on the study of sound structure and speech. The book features concise contributions from both established and up-and-coming scholars who have worked with Katarzyna Dziubalska-Kołaczyk across a range of disciplinary fields toward broadening the scope of how sound structure and speech are studied and how phonological and phonetic research is conducted. Contributions bridge the gap between such fields as phonological theory, acoustic and articulatory phonetics, and morphology, but also include perspectives from such areas as historical linguistics, which demonstrate the relevance of other linguistic areas of inquiry to empirical investigations in sound structure and speech. The volume also presents the rich variety of methodologies employed in existing research, including corpus-based, diachronic, experimental, acoustic, and online approaches, and shows them at work, drawing on data from languages beyond the Anglocentric focus of existing research. The collection reflects on Katarzyna Dziubalska-Kołaczyk’s pioneering contributions to widening the study of sound structure and speech and reinforces the value of interdisciplinary perspectives in taking the field further, making this key reading for students and scholars in phonetics, phonology, sociolinguistics, psycholinguistics, and speech and language processing.
Magdalena Wrembel is University Professor and Head of Studies in the Faculty of English at Adam Mickiewicz University in Poznań, Poland. Her main research areas involve bilingualism and multilingualism, phonological acquisition of the third language, and language awareness. Her current work focuses on crosslinguistic influence and longitudinal development of L3 phonology.
Agnieszka Kiełkiewicz-Janowiak is University Professor in the Faculty of English at Adam Mickiewicz University in Poznań, Poland. She has done research and lectured on social dialectology, historical sociolinguistics, discourse analysis as well as language and gender issues. Her current research interests focus on lifespan sociolinguistics and the discourse of ageing across cultures.
Piotr Gąsiorowski is University Professor in the Faculty of English at Adam Mickiewicz University in Poznań, Poland. His research interests include historical and evolutionary linguistics, theories of language change, dialectology, and phonetics and phonology. His current research work focuses on various aspects of Germanic and Indo-European reconstruction as well as Modern English prosody.
Routledge Studies in Linguistics
Perspectives from Systemic Functional Linguistics
Edited by Akila Sellami-Baklouti and Lise Fontaine
Time Series Analysis of Discourse: Method and Case Studies
Dennis Tay
Heart- and Soul-Like Constructs across Languages, Cultures, and Epochs
Edited by Bert Peeters
Systemic Functional Political Discourse Analysis: A Text-based Study
Eden Sum-hung Li, Percy Luen-tim Lui and Andy Ka-chun Fung
Systemic Functional Language Description: Making Meaning Matter
Edited by J.R. Martin and Y.J. Doran
Rarely Used Structures and Lesser-Studied Languages: Insights from the Margins
Emily Manetta
Externalization: Phonological Interpretations of Syntactic Objects
Yoshihito Dobashi
Approaches to the Study of Sound Structure and Speech: Interdisciplinary Work in Honour of Katarzyna Dziubalska-Kołaczyk
Edited by Magdalena Wrembel, Agnieszka Kiełkiewicz-Janowiak and Piotr Gąsiorowski
For more information about this series, please visit: www.routledge.com/Routledge-Studies-in-Linguistics/book-series/SE0719
Approaches to the Study of Sound Structure and Speech
Interdisciplinary Work in Honour of Katarzyna Dziubalska-Kołaczyk
Edited by Magdalena Wrembel, Agnieszka Kiełkiewicz-Janowiak and Piotr Gąsiorowski
First published 2020 by Routledge 52 Vanderbilt Avenue, New York, NY 10017 and by Routledge 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2020 Taylor & Francis The right of Magdalena Wrembel, Agnieszka Kiełkiewicz-Janowiak, and Piotr Gąsiorowski to be identified as authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data A catalog record for this book has been requested ISBN: 978-0-367-33760-5 (hbk) ISBN: 978-0-429-32175-7 (ebk) Typeset in Sabon by Apex CoVantage, LLC
Contents
List of Contributors
Introduction
MAGDALENA WREMBEL, AGNIESZKA KIEŁKIEWICZ-JANOWIAK AND PIOTR GĄSIOROWSKI
PART 1
With Hindsight: Diachronic Approaches
1 The Consonants of 19th-Century English: Southern Hemisphere Evidence
PETER TRUDGILL
2 High Vowel Decomposition in Midwest American English
JERZY RUBACH
3 Social Dialect: The Halting of a Sound Change in Oslo Norwegian Revisited—A Report on the Imminent Victory of Retroflex /ɭ/
ERNST HÅKON JAHR
4 The Palatal ~ Non-Palatal Distinction in Irish and Russian
RAYMOND HICKEY
5 Vennemann’s Head Law and Basque
MIREN LOURDES OÑEDERRA OLAIZOLA
6 Ex Oriente Lux: How Nepali Helps to Understand Relict Numeral Forms in Proto-Indo-European
PIOTR GĄSIOROWSKI AND MARCIN KILARSKI
PART 2
On Close Inspection: Theoretical and Methodological Approaches
7 Pholk Phonetics and Phonology
NANCY NIEDZIELSKI AND DENNIS R. PRESTON
8 Rhythm Zone Theory: Speech Rhythms Are Physical After All
DAFYDD GIBBON AND XUEWEI LIN
9 The Remote Island, Unattested Patterns and Initial Clusters
TOBIAS SCHEER
10 Main Differences Between German and Russian (Mor)phonotactics: A Corpus-Based Study
WOLFGANG U. DRESSLER AND ALONA KONONENKO-SZOSZKIEWICZ
11 Boundaries and Typological Variation in Laryngeal Phonology
EUGENIUSZ CYRAN
12 Cross-Language Phonetic Relationships Account for Most, But Not All L2 Speech Learning Problems: The Role of Universal Phonetic Biases and Generalized Sensitivities
OCKE-SCHWEN BOHN
13 L1 Foreign Accentedness in Polish Migrants in the UK: An Overview of Linguistic and Social Dimensions
AGNIESZKA KIEŁKIEWICZ-JANOWIAK AND MAGDALENA WREMBEL
14 The Greater Poland Spoken Corpus: Data Collection, Structure and Application
MAŁGORZATA KUL, PAULINA ZYDOROWICZ AND KAMIL KAŹMIERSKI
15 Sounds Delicious!
JOHN C. WELLS
PART 3
Reality Check: Empirical Approaches
16 The Involvement of the Cerebellum in Speech and Non-Speech Motor Timing Tasks: A Behavioural Study of Patients With Cerebellar Dysfunctions
MARZENA ŻYGIS, ZOFIA MALISZ, MAREK JASKUŁA AND IRENEUSZ KOJDER
17 ERP Correlates of Figurative Language Processing
ANNA B. CIEŚLICKA AND ROBERTO R. HEREDIA
18 Competing Vowels Facilitate the Recognition of Unfamiliar L2 Targets in Bilinguals: The Role of Phonetic Experience
BARTOSZ BRZOZA
19 Applications of Electropalatography in L2 Pronunciation Teaching and Phonetic Research
GRZEGORZ KRYNICKI AND GRZEGORZ MICHALSKI
20 Polish Two-Consonant Clusters: A Study in Native Speakers’ Phonotactic Intuitions
JOLANTA SZPYRA-KOZŁOWSKA AND PAULINA ZYDOROWICZ
21 Illustration of Markedness and Frequency Relations in Phonotactics
PAULINA ZYDOROWICZ AND PAULA ORZECHOWSKA
22 Laryngeal Phonology and Asymmetrical Cross-Language Phonetic Influence
GEOFFREY SCHWARTZ, JERZY DZIERLA AND EWELINA WOJTKOWIAK
23 Variable Rhoticity in the Speech of Polish Immigrants to England
EWA WANIEK-KLIMCZAK
24 Selected Aspects of Polish Vowel Formants
JAROSŁAW WECKWERTH AND ANNA BALAS
25 Testing Receptive Prosody: A Pilot Study on Polish Children and Adults
JOANNA ŚMIECIŃSKA
26 Fostering Classroom Discourse for English Learners and Special Needs Students in Elementary School Classrooms
RONALD COLE
27 Uniformity, Solidarity, Frequency: Trends in the Structure of Stop Systems
IAN MADDIESON
Index
Contributors
Anna Balas is Assistant Professor in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. Her PhD thesis was supervised by Professor Katarzyna Dziubalska-Kołaczyk. She has published on Natural Phonology, speech perception and production in the second, third and foreign language. Ocke-Schwen Bohn is Professor of English Linguistics at Aarhus University, Denmark. With funding from German and Danish agencies, and in collaboration with American, Canadian and Australian colleagues, his research focuses on the causes and characteristics of foreign accented speech and on speech perception (in infants, cross-language and second language acquisition). Bartosz Brzoza is a PhD student in the Department of Contemporary English Language at the Faculty of English, Adam Mickiewicz University, Poznań, Poland. His research interests include psycholinguistics, bilingualism and visual and spoken word recognition. He employs various psycholinguistic methods in his research. He also works as an educational consultant at the British Council Poland. Anna B. Cieślicka is Professor of Psychology at Texas A&M International University (TAMIU) and Director of the MS in Psychology Graduate Program. Her research focuses mainly on the psycholinguistics of second/foreign language acquisition and processing, bilingual lexicon, figurative language and the neuropsychology of bilingualism. Ronald Cole received his PhD in 1971. He studied speech perception for 10 years. At Carnegie Mellon University he discovered the awesome potential of language technologies to improve learning. He subsequently led efforts to develop intelligent tutoring systems for children and adults. His favorite place to work is Poznań, Poland. Eugeniusz Cyran is Full Professor and Head of the Department of Phonology and Phonetics at the John Paul II Catholic University of Lublin, Poland. His interests include typological phonological systems, Irish and Polish phonology and phonetics.
Wolfgang U. Dressler is head of the Working Group “Comparative Psycholinguistics” in the Department of Linguistics at Vienna University (where he held a professorship 1971–2008). He is a member of the Austrian and six other Academies of Sciences, (co-)author of 28 books and volumes and 531 articles. He is currently working on morphology, language acquisition and poetic language. Jerzy Dzierla is a PhD student in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. He defended his MA thesis in 2015. His research interests revolve primarily around issues associated with phonetic training, e.g., perceptual training paradigms, transfer of learning between the domains of perception and production. Piotr Gąsiorowski is University Professor and Head of the Department of Older Germanic Languages in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. His research interests include historical and evolutionary linguistics, theories of language change, dialectology, phonetics and phonology. His current research work focuses on various aspects of Germanic and Indo-European reconstruction as well as Modern English prosody. Dafydd Gibbon is Emeritus Professor of English and General Linguistics at Bielefeld University. He specialises in computational models and their applications in speech prosody and lexicography, particularly for the Niger–Congo languages of West Africa. For his cooperative research he received the Nigerian Linguistics Association Silver Jubilee Award, the Adam Mickiewicz University Bronze Medal, and was honoured as Officier de l’Ordre du Mérite Ivoirien. Roberto R. Heredia, PhD, is Regents Professor in the Department of Psychology and Communication at Texas A&M International University. He has published on bilingual memory, bilingual lexical representation, bilingual nonliteral language processing, stereotype processing and evolutionary psychology.
He is co-editor and co-founder of Springer’s Bilingual Mind and Brain book series. Raymond Hickey is Professor of English Linguistics at the University of Duisburg, Essen. He has written several books on varieties of English, especially Irish English, the history of English (in particular, the 18th century), the standardization of English, language contact and areal linguistics, as well as sociolinguistic variation and change. Ernst Håkon Jahr has been Professor of Scandinavian Linguistics at the University of Tromsø (1976–1998) and the University of Agder, Kristiansand (since 1999). He was Rector of the latter from 2000 to 2007. He is the founder of the Agder Academy of Sciences and Letters and its President (2002–2019). His research fields include language contact, historical sociolinguistics, language planning and the history of linguistics.
He has written 60 books and about 200 articles, founded three journals and three book series, and organized 21 international conferences. Marek Jaskuła is a researcher working in the Faculty of Computer Science and Information Technology at the West Pomeranian University of Technology in Szczecin, Poland. He obtained his PhD in Electrical Engineering in 1999. His research activity focuses both on biomedical and speech-signal analysis and processing, and on hardware/software embedded systems development. Kamil Kaźmierski is Assistant Professor in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. He obtained a PhD in English and American Studies from the University of Vienna, Austria, in 2013. His research interests include theories of phonological storage, evolutionary linguistics and variationist sociolinguistics. Agnieszka Kiełkiewicz-Janowiak is University Professor in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. She has done research and lectured on social dialectology, historical sociolinguistics, discourse analysis as well as language and gender issues. Her current research interests focus on life-span sociolinguistics and the discourse of ageing across cultures. Marcin Kilarski is University Professor in the Faculty of English at Adam Mickiewicz University in Poznań, Poland. His main areas of interest are linguistic typology and the historiography of linguistics. His publications include the monograph Nominal Classification: A History of Its Study From the Classical Period to the Present (John Benjamins, 2013). Ireneusz Kojder is a Professor and neurosurgeon working in the Department of Neurosurgery and Pediatric Neurosurgery at the Pomeranian Medical University Hospital in Szczecin, Poland. He is also a researcher and lecturer at the Pomeranian Medical University, working on clinical and neurocognitive disorders in cerebral pathologies and on the Balanced Sequence of Cerebellar Activity Theory.
Alona Kononenko-Szoszkiewicz is a PhD student at Adam Mickiewicz University in Poznań, where she previously earned her MA degree. She graduated from Skovoroda National Pedagogical University in Ukraine. Her research interests include the sound-related aspects of language, i.e., phonotactics and morphonotactics. Grzegorz Krynicki is a Lecturer and a researcher in the Faculty of English, Adam Mickiewicz University, Poznań. His areas of interest include articulatory phonetics, computer-assisted pronunciation teaching and speech processing. He has collaborated with Professor Katarzyna Dziubalska-Kołaczyk on three research grants over the past 15 years.
Małgorzata Kul is Senior Lecturer in the Faculty of English, Adam Mickiewicz University, Poznań, Poland. Her research has focused on the exploration of connected speech processes. Another important strand is teaching and learning of advanced phonetics by learners of foreign languages. Yet another area of her research activity is corpus phonology. Xuewei Lin is Lecturer at Jinan University, China. She has a PhD in Translation Studies from the Guangdong University of Foreign Studies, and she visited University College London on a Governmental Joint PhD Scholarship offered by the China Scholarship Council. Her research interests are literary translation and second language acquisition. Ian Maddieson is currently affiliated as Adjunct Research Professor with the University of New Mexico, following years at the Phonetics Laboratory at UCLA, and at UC Berkeley. His research focuses on global patterns in phonetic and phonological systems, exemplified in Sounds of the World’s Languages (with Peter Ladefoged) and the UPSID and LAPSyD databases. Zofia Malisz is a postdoctoral researcher in speech technology at the Department of Speech, Music and Hearing, KTH Royal Institute of Technology in Stockholm, Sweden. She has published in the speech sciences (prosodic modelling, gesture and communication studies, dialogue) and speech technology (HCI and speech synthesis). She has led or contributed to eight research projects in these areas at universities in Poland, Germany and Sweden. Grzegorz Michalski is Senior Lecturer in the Faculty of English, Adam Mickiewicz University, Poznań, Poland. His 2009 doctoral thesis combined the cyclic approach to derivation in phonology with the representational apparatus of CVCV and Element Theory. His main interests are theoretical linguistics and the phonology of English and Polish. Nancy Niedzielski is Associate Professor in the Linguistics Department at Rice University. She is the co-author (with Dennis R. 
Preston) of Folk Linguistics and A Reader in Sociophonetics. Her areas of interest include speech perception, sociophonetics, language and race, language attitudes and forensic linguistics. Miren Lourdes Oñederra Olaizola is a retired Professor of Basque Phonology at the University of the Basque Country. She is a full member of the Royal Academy of the Basque Language, where she chairs the Pronunciation Committee. Most of her research deals with Basque phonology and its theoretical implications. Paula Orzechowska, PhD, works in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. Her main research interests include phonology, morphonology, psycholinguistics and neurolinguistics. In collaborations with linguistic institutes in Germany, France and
Austria she has worked on the phonotactics and word-stress of the Slavic, Germanic and selected Afro-Asiatic languages. Dennis R. Preston is Regents Professor at Oklahoma State University and University Distinguished Professor Emeritus at Michigan State University. He directed the 2003 LSA Institute and was President of the American Dialect Society. He is a sociolinguist and dialectologist, a Fellow of the Linguistic Society of America and holds the Officer’s Cross of the Order of Merit of the Polish Republic. Jerzy Rubach holds professorships at two universities: the University of Warsaw and the University of Iowa. He is a phonologist specialising in Slavic and Germanic languages, and is the founder of Derivational Optimality Theory, a phonological framework based on the assumption that there are four independent levels of phonological evaluation. Tobias Scheer is a senior CNRS researcher (Directeur de Recherche) working at the Université Côte d’Azur in Nice. He has authored a book on Strict CV (2004), A Guide to Phonology—Morpho-Syntax Interface (2011), a volume on his own theory of the latter (Direct Interface, 2012) and a textbook on syllable structure (2015, in French). Geoffrey Schwartz is University Professor at Adam Mickiewicz University, Poznań, Poland. His main interests are phonological theory, the phonetics-phonology interface and second language speech. He has received four research grants from the Polish National Science Centre, and published in journals such as Glossa, Second Language Research and Language Sciences. Joanna Śmiecińska is Senior Lecturer in the Department of Contemporary English Language, Faculty of English, Adam Mickiewicz University, Poznań, Poland. She holds a PhD in Linguistics. She is interested in speech prosody, its development in children and the ways it can be tested. Jolanta Szpyra-Kozłowska is Professor of Linguistics in the Department of English at Maria Curie-Skłodowska University, Lublin, Poland.
She has published extensively on English and Polish phonology, the phonology-morphology interaction, acquisition of English phonetics and phonology by Poles, pronunciation pedagogy, foreign accent perception and gender linguistics. She is the editor of the SOUNDS—MEANING—COMMUNICATION series at Peter Lang Verlag and organizer of international biennial conferences, Approaches to Phonetics and Phonology. Peter Trudgill is a theoretical dialectologist who has held professorships at the Universities of Reading, Essex and Lausanne. He is currently Emeritus Professor of English Linguistics at Fribourg University, Switzerland. He is the author of Dialects in Contact, New-Dialect Formation, Sociolinguistic Typology: Social Determinants of Linguistic Complexity, Investigations in Sociohistorical Linguistics and Dialect
Matters: Respecting Vernacular Language. In 2017 he received the Medal of Merit from Adam Mickiewicz University, Poznań. Ewa Waniek-Klimczak is Professor of English Linguistics in the Institute of English at the University of Łódź, Poland. Her research interests include the acquisition and usage of the second language sound system, cross-linguistic phonetics/phonology and pronunciation teaching. She has published in the area of applied phonetics and phonology and has organised a series of conferences on pronunciation instruction and native/non-native accents of English (called Accents). Jarosław Weckwerth is Senior Lecturer in the Faculty of English, Adam Mickiewicz University, Poznań, Poland. His interests include phonetics, sociolinguistics and the use of technology in the teaching of linguistics. He has published on the acquisition of English pronunciation by Polish learners, as well as various aspects of English sociophonetics. John C. Wells is Emeritus Professor of Phonetics at University College London. He is the author of Accents of English (three volumes, CUP 1982), Longman Pronunciation Dictionary (Pearson Education, 3rd edn. 2008), English Intonation (CUP 2006), Sounds Interesting (CUP 2014) and Sounds Fascinating (CUP 2016). Ewelina Wojtkowiak is a PhD student in the Department of Contemporary English Language at the Faculty of English, Adam Mickiewicz University, Poznań, Poland. Her research interests include the phonetics-phonology interface, feature theory, acoustic phonetics as well as cross-linguistic influence. Magdalena Wrembel is University Professor and Head of Studies in the Faculty of English at Adam Mickiewicz University, Poznań, Poland. Her main research areas involve bilingualism and multilingualism, phonological acquisition of the third language and language awareness. Her current work focuses on cross-linguistic influence and longitudinal development of L3 phonology.
Paulina Zydorowicz is Assistant Professor at Adam Mickiewicz University in Poznań, Poland. Her research interests comprise phonetics and phonology with special focus on phonotactics, casual speech, acquisition and corpus linguistics. She has published on phonotactics of Polish and English in written corpora, acquisition, as well as sociophonetic variation in Polish. Marzena Żygis is a researcher in the laboratory phonology group at the Leibniz-Zentrum Allgemeine Sprachwissenschaft in Berlin working on the phonetics and phonology of Slavic languages. She obtained her PhD in Linguistics at Humboldt-Universität, Berlin in 1999. She received two habilitations in Linguistics: in 2007 at Humboldt-Universität, and in 2013 at Adam Mickiewicz University, Poznań.
Introduction
Magdalena Wrembel, Agnieszka Kiełkiewicz-Janowiak and Piotr Gąsiorowski
This book is a collection of chapters providing a comprehensive overview of a broad range of issues in the study of sound structure and speech approached from interdisciplinary perspectives. It encompasses state-of-the-art articles addressing the topic from various angles such as theory of phonology, acoustic and articulatory phonetics, segmental and suprasegmental phonetics, morphonotactics, historical phonology, speech perception, language contact, cross-linguistic influence, sociophonetics, foreign language acquisition, L2 pronunciation teaching as well as migrant speech. It offers a discussion of fundamental problems in these fields as well as new empirical findings generated from a variety of methodological approaches. The languages covered herein include English, German, Russian, Basque, Norwegian, Polish and miscellaneous others.
The volume aims to honour the academic achievements of an outstanding scholar and charismatic personality, Professor Katarzyna Dziubalska-Kołaczyk, on her sixtieth birthday. To this end, we invited numerous colleagues, from no fewer than ten countries, as authors of particular chapters. The authors have all been actively involved in one way or another in co-operation with the Honorand. They represent the full spectrum of academia, from experienced, internationally renowned scholars to young linguists, Professor Dziubalska-Kołaczyk’s numerous disciples who have contributed to the Poznań school of natural linguistics. Our main objective is to bring together insights from different subdisciplines investigating speech from various perspectives and to bridge the gap in the literature in this area.
The innovative nature of the volume is threefold: first, it widens the approach to the study of speech, thus differing from narrow, highly specialised collections; second, it covers data on a number of languages, going beyond the Anglocentric bias; third, it features various methodologies and designs in the investigations of sound structure and speech. The book consists of three complementary parts. Part 1, “With Hindsight: Diachronic Approaches”, illustrates the different ways in which historical analyses of sound change contribute to our understanding of speech, and vice versa—how current typological perspectives and comparative
studies help us to make sense of the past. Part 2, “On Close Inspection: Theoretical and Methodological Approaches”, offers insights into ongoing basic research, with emphasis on theoretical novelties, interdisciplinary syntheses, innovative methods and exploratory and explanatory work on speech-related phenomena. Part 3, “Reality Check: Empirical Approaches”, contains reports of cutting-edge experimental research, from advanced laboratory phonetics to neurocognitive studies of brain activity during speech processing.
Our mutual friend, Professor Katarzyna Dziubalska-Kołaczyk, born in 1960 and active in academia since 1983, is an outstanding linguist with six monographs, more than a hundred articles or book chapters and about two hundred conference presentations to her credit. She is best known for her role in the development of Natural Phonology, although her interests are wide and her research spans many areas of linguistics. Since 1999, Professor Dziubalska-Kołaczyk has been the editor-in-chief of Poznań Studies in Contemporary Linguistics, which has progressed, under her editorship, to the unquestionably top position among linguistic journals in Poland. Since 2012, she has been the Dean of the Faculty of English at Adam Mickiewicz University in Poznań, the only Polish centre of English studies elevated to such a rank. Many of Professor Dziubalska-Kołaczyk’s publications have resulted from collaboration with eminent linguists world-wide and with her colleagues and students at home. Such emphasis on teamwork and joint projects is rare in the humanities. As a teacher and academic advisor, she has contributed to rearing a new generation of linguists, including more than twenty who worked for their doctoral degrees under her advisorship.
As a member of no fewer than twenty-four linguistic societies—serving as president or officer in some of them at various times—and an indefatigable conference organiser, she has made a powerful impact on the international academic life of the community of linguists. All the authors of chapters in the present volume, including its editors, cherish a profound respect for Professor Dziubalska-Kołaczyk’s achievements as a scholar and an educator. Katarzyna has guided us all by setting the example of a scholar whose approach to language study is, at the same time, deeply humanistic and strictly scientific. We feel that our gratitude will find the best expression in this kind of collaborative undertaking. We hope this volume “showcases the impressive diversity and richness of speech research nowadays” (to gratefully quote the words of one of the book proposal reviewers) by linking the theoretical and empirical domains. We intend to encapsulate in it the dynamic development of the study of speech, expanding over a diversity of languages, communities, methodologies and approaches. The volume is intended as a valuable resource for scholars and professionals investigating the study of speech, phoneticians and phonologists, speech scientists, sociolinguists and more generally linguists—whether theoretical, experimental or applied—for
researchers and students who wish to gain an insight into this area of linguistics. As editors, we should like to thank the authors for their enthusiastic response and contributions to the present volume, as well as their cooperation in the editorial process. We wish to express our gratitude to all the reviewers for their effort in peer-reviewing the book proposal as well as the submitted articles. We are indebted to Dr Jarosław Weckwerth for his invaluable technical assistance in the editing process. We gratefully acknowledge the co-operation with the publisher and, in particular, with Elysse Preposi, the linguistics research editor at Routledge. Last but not least, we thank Kasia, the Honorand, for being an inexhaustible source of inspiration for all of us.
Part 1
With Hindsight: Diachronic Approaches
1 The Consonants of 19th-Century English: Southern Hemisphere Evidence
Peter Trudgill
1. Introduction
The data on which this paper is based derive from the Origins of New Zealand English Project (ONZE) directed by Professor Elizabeth Gordon at Canterbury University in Christchurch, New Zealand (Gordon et al. 2004). Part of the work of this project is based on a rather remarkable database. In 1946, a Mobile Disc Recording Unit was set up by the New Zealand National Broadcasting Service and was sent around the country to collect pioneer reminiscences and local music. As part of the ONZE project, these recordings were copied onto tape, and Professor Gordon and her colleagues transcribed and analysed a large number of them. My personal involvement with the project was for the most part concerned with analysing the phonology of the oldest speakers in order to trace the new-dialect formation developments (see Trudgill 1986) of the New Zealand accent with reference to the British Isles (and other) input. However, it also became clear that the recordings gave us, too, some important and possibly unique insights into earlier forms of British Isles English. These oldest speakers, who were recorded in the 1940s when they were already elderly, were born in the 1850s, 1860s and 1870s. They represent the first generation of New Zealand-born anglophones. Typically, of course, children acquire the dialect and accent of their peers. However, in early anglophone New Zealand, there was no established peer-dialect for children to acquire. This of course is typical of situations involving the dialect contact, dialect mixture, koineisation, and new-dialect formation (see Trudgill 1986) that occurs in colonial situations. In some of the individual New Zealand cases analysed, this lack of a peer-group model led to fascinating and varied individual dialect mixture processes which I have described elsewhere (see Trudgill 2004).
In other cases—particularly with speakers who grew up in isolated rural situations—it is clear that the elderly informants, as children over 150 years ago, acquired something very close indeed to the English, Scottish or Irish English dialects of their parents, for the very good reason that there was nothing else for them to acquire. People who never set foot in
Scotland sound completely Scottish because their parents were Scottish. I have also argued that this led to a form of “colonial lag” (Marckwardt 1958), meaning that new colonial varieties lag behind metropolitan varieties in terms of linguistic change (Trudgill 1999b). The speech of such people is fossilised, in the sense that it represents forms of speech typical of a generation older than would normally be the case. We thus have recordings available which give important insights into the way vernacular English was spoken in the British Isles by people born as early as the 1820s. The colonial situation, that is, enables us to push back the time depth available to us for historical linguistic studies in the sense that we are able to investigate speech which is, as it were, one generation earlier than would be the case with British-born speakers. We do not actually have sound recordings from the 1830s, but it is as if we did.

I now proceed to use data from the ONZE project, as well as from studies of other colonial varieties of English, to argue that certain well-established beliefs about the phonology of 19th-century British English are not entirely correct. I do this through an examination of selected English consonants.
2. H-Dropping

H-dropping refers to the loss of word-initial /h/ in words such as hill, house, hammer, with the result that pairs such as ill and hill become homophonous. Not to be considered as h-dropping is the absence of /h/ in unstressed grammatical words such as him, his, he, her, have, has, had, where all English speakers lack /h/. Neither can we call the absence of /h/ h-dropping in words which have orthographic ⟨h⟩ but which were borrowed from French without /h/, such as heir, hour, honest, honour, and where, again, no English speakers have /h/. However, care has to be taken in analysing the speech of older speakers with a number of words in this category which used to lack /h/ but now have it as a result of spelling pronunciation, e.g., hotel, hospital, humble, humour, herb (though American English still lacks /h/ in this last item). Other words in this class (see MacMahon 1994: 477–478; Wells 1982: 255) have had /h/ for so long that pronunciations of them without /h/ can safely be considered as h-dropping: habit, heritage, homage, hospitable, host(ess), human. Care also has to be taken with older speakers in the case of words such as hotel, historic, hysteria, which have unstressed first syllables and which were treated in archaic RP like the unstressed forms of h-initial grammatical words, i.e., they lacked /h/ also (see Wells 1982: 286). A final group of words in which absence of /h/ cannot be considered h-dropping, because no English speakers employ /h/ in such words (Gimson 1962: 186), consists of certain items with medial ⟨h⟩, such as exhaust, exhilarate, exhibit, vehicle, vehement, shepherd (plus, at least in Britain, place names such as Durham, Birmingham).
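The exclusions just listed amount to a small decision procedure for coding a token that is pronounced without /h/. The following is a minimal sketch of that procedure, assuming tokens arrive as orthographic words with a stress flag. The function name, the category labels and the three-way outcome are hypothetical illustrations, not the coding scheme actually used on the ONZE recordings; the word lists are the examples given in the text and are not exhaustive.

```python
# Word lists copied from the examples in the text; everything else
# (names, labels, the "needs care" outcome) is a hypothetical illustration.
UNSTRESSED_GRAMMATICAL = {"him", "his", "he", "her", "have", "has", "had"}
FRENCH_H_LESS = {"heir", "hour", "honest", "honour"}
SPELLING_PRONUNCIATION = {"hotel", "hospital", "humble", "humour", "herb"}
UNSTRESSED_FIRST_SYLLABLE = {"hotel", "historic", "hysteria"}
MEDIAL_H = {"exhaust", "exhilarate", "exhibit", "vehicle", "vehement", "shepherd"}


def classify_h_absence(word: str, stressed: bool = True) -> str:
    """Classify a token that was pronounced WITHOUT /h/."""
    w = word.lower()
    # No English speakers have /h/ in these words.
    if w in FRENCH_H_LESS or w in MEDIAL_H:
        return "not h-dropping"
    # Unstressed forms of grammatical words lack /h/ for all speakers;
    # stressed forms without /h/ fall through and count as h-dropping.
    if w in UNSTRESSED_GRAMMATICAL and not stressed:
        return "not h-dropping"
    # Words that lacked /h/ for older speakers: judge case by case.
    if w in SPELLING_PRONUNCIATION or w in UNSTRESSED_FIRST_SYLLABLE:
        return "needs care"
    return "h-dropping"


for token, stress in [("hill", True), ("hour", True), ("his", False),
                      ("his", True), ("hotel", True), ("habit", True)]:
    print(token, stress, classify_h_absence(token, stress))
```

Keeping the stress flag separate from the word lists is what lets such a scheme pick out speakers who drop /h/ only in the stressed forms of grammatical words.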
H-dropping represents the end-point of a very long historical process in which the original Old English phoneme /h ~ x/ was gradually subjected to more and more phonotactic restrictions. It was lost word-initially before /r/ as in hring = ring, before /l/ as in hlāf = loaf, and before /n/ as in hnutu = nut in late Old English or early Middle English; in other preconsonantal positions during the 1300s after back vowels, as in daughter and brought; and during the 1400s after other vowels, as in night and sigh, at least in the south of England (McLaughlin 1970: 110). The loss before /w/, as in which, is much more recent, and many varieties remain unaffected (see below). And the loss in absolute initial position, as in hill, is more recent still: Sweet (1888: 259) dates it to the late 1700s: “initial h began to be dropt everywhere in colloquial speech towards the end of thMn” [= third Modern period = 1700–1800]. This would place it about 25–50 years before the beginning of the period we are interested in.

Southern Hemisphere Evidence

Modern New Zealand and South African English typically do not have h-dropping. It occurs at a low level of frequency in the speech of some Australians, and in Falkland Islands English (see Trudgill 2004). H-dropping is not uncommon on the ONZE Project recordings, but only about 25% of speakers use this feature. This suggests that h-dropping was probably much less common in England and Wales in the 19th century than today. Interestingly, a number of ONZE speakers fall into a category which is absent from modern Britain: they have h-dropping only in the stressed forms of the grammatical words have, had, has, his, him, her, hers, here (unstressed forms of such words, of course, lack /h/ in all varieties of English). H-dropping for these speakers does not occur at all, even variably, in lexical words such as hammer, hill, house.
A good explanation for the presence of this hitherto unknown feature in the speech of a sizeable group of ONZE speakers is that it was found in some forms of 19th-century English English, as a result of the spread of h-lessness from unstressed to stressed forms of the relevant grammatical words, and that speakers of this type have subsequently disappeared because h-lessness has now also spread to non-grammatical words.

British Evidence

According to Wells (1982: 255), “historical details of the spread of h-dropping through England are lacking”. However, there is much that we can deduce about the situation in the mid-19th century, most obviously by noting modern trends and working backwards. Most modern local accents of English and Welsh English currently demonstrate h-dropping. The major exceptions to this are the accents of the English
Northeast and East Anglia, although modern urban East Anglia is currently acquiring h-dropping: the East Anglian h-pronouncing area is certainly smaller today than it was at the time of the Survey of English Dialects (SED) fieldwork (see Trudgill 1986: 44–46). H-dropping has not yet reached Scotland or Ireland, and indeed Scots (and to a certain extent Scottish English also) preserve /h/ even in final and pre-consonantal position, as in nicht ‘night’ and loch. In Map I we can see, in the areas labelled 1, the extent of h-dropping in English dialects as of the turn of the 21st century (from Trudgill 1999a). However, in the most conservative of varieties for which we have accurate information, the traditional dialects investigated by the SED in the 1950s, the areas involved are much bigger, as can also be seen from the map in areas labelled 2. Note too that at this stage there is another area in the southwest and a small area in northern Kent which also have absence of h-dropping, as does the Isle of Wight. Kurath and Lowman (1970: 32), employing data gathered by Lowman in the late 1930s, also say that “initial [h] is regular in a continuous area extending from Norfolk into Essex”. We can extrapolate backwards from this pattern to a supposition that absence of h-dropping was even more widespread in the period 1825–1865. Kurath and Lowman (1970: 32) say of the southwestern area in the 1930s that “initial [h] . . . occurs with some frequency in Somerset—Wiltshire—Hampshire” (see their map on p. 33). Ellis (1889), for an even earlier period, shows that the East Anglian h-pronouncing area extended into parts of southeast Lincolnshire, northern Cambridgeshire and northern Huntingdonshire (see transcriptions on pp. 211, 249–252, 298–299). He shows absence of h-dropping also in a rather larger area of Kent (see the transcriptions for Wingham, p. 142). 
There is also evidence that /h/ was retained in Devon—it is reported as being “seldom dropped” in Milbrook, southwest of Plymouth (p. 167)—and in Cornwall (see the transcriptions for St Columb, Marazion and Lands End, pp. 169, 172– 173). And there is also evidence from Ellis that the northeastern area was considerably larger too: most of his West Northern area, which includes south Durham, Westmoreland, northern Lancashire and western Cumberland, is described as a region in which “the aspirate . . . is employed with much uniformity in the country part” (p. 542, and see the transcriptions on pp. 563–594). Note that most of Ellis’s data was obtained in the 1870s. These areas are labelled 3 on Map I (Trudgill 2004). H-dropping is found today in the English of south Wales (Wells 1982: 391). In the Welsh-speaking area of north and west Wales, however, h-dropping does not occur in English for the good reason that /h/ is found in the Welsh of this region (Thomas 1994: 128—labelled 2 on the map). The Welsh-speaking area was of course larger in 1850 than it is today: Ellis (1889: 13–14) has information on where the language frontier ran in the 1860s (see his Map of the English Dialect Districts)—these areas are also labelled 3 on Map I.
Map I
All this information now enables us, by working backwards from our oldest information for any given region, to produce the most extensive area shown in Map I. Ireland is still h-pronouncing to this day, and it can be seen from the map that most of Britain, including much of England, was h-pronouncing at the time in question, confirming the supposition that we arrived at on the basis of the Southern Hemisphere evidence.
3. The /hw/–/w/ Merger

The merger of /hw/ and /w/ as in which/witch on /w/ is referred to by Wells (1982: 228) as Glide Cluster Reduction. (In varieties which still have this distinction, /hw/ may be [hw], [hʍ] or [ʍ].) Wells suggests that the merger began in lower-class speech in the south of England in early Middle English, became current in educated speech in the 1700s, and was “usual by 1800”. The merger can be regarded as part of the process of loss of /h/ just described above.

Southern Hemisphere Evidence

The historical distinction between initial /hw/ and /w/ has been widely preserved in New Zealand English (Wells 1982: 610). It is usual on the ONZE recordings; and Turner (1966: 105) reported that in 1964 about 50% of first-year students (i.e., people born about 1946) at Canterbury University, Christchurch, still had the distinction. Glide Cluster Reduction is, on the other hand, usual in South African and Australian English, and is now increasingly becoming the norm in New Zealand. It is possible that the New Zealand pronunciation reflects a greater degree of retention of the distinction in 19th-century Britain than suggested by Wells.

British Evidence

In support of this, MacMahon (1994: 467) says that /hw/ was retained “by most speakers of educated Southern English until at least the second half of the nineteenth century”. In current regional speech, /hw/ survives totally in Scotland and Ireland (as well as in parts of North America) but has disappeared from all of England except the far north. Map II shows the area which had /hw/ according to Ellis, and the rather smaller area which can be deduced from the 1950s SED materials. Note that a comparison of Maps I and II reveals a puzzling situation. Accents in the south of England clearly lost /hw/ before they lost /h/: East Anglian dialects have /h/ but not /hw/, for instance.
It would therefore seem legitimate to assume that there is an implicational scaling effect here due to the relative chronology of the two changes: all speakers who retain /hw/ also retain /h/, while the reverse is not necessarily true. However, there is a small area of northern England, roughly equivalent to the far
Map II
northwest of Yorkshire, which Ellis shows as having preserved /hw/ while having lost /h/. The assumption cannot, then, be entirely correct.
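The implicational claim, and the way a single area falsifies it, can be made concrete with a short check. This is only an illustrative sketch: the three entries restate what the text says about each variety, while the dictionary layout, the labels and the function name are hypothetical.

```python
# (retains /h/, retains /hw/) as described in the text for each variety.
# The labels and data layout are hypothetical shorthand for illustration.
VARIETIES = {
    "Scotland": (True, True),
    "East Anglia": (True, False),
    "far northwest Yorkshire (Ellis)": (False, True),
}


def violates_scale(retains_h: bool, retains_hw: bool) -> bool:
    """The proposed scale says: retaining /hw/ implies retaining /h/."""
    return retains_hw and not retains_h


counterexamples = [name for name, (h, hw) in VARIETIES.items()
                   if violates_scale(h, hw)]
print(counterexamples)
```

A variety with /h/ but no /hw/ is compatible with the scale; only the combination of /hw/ without /h/ counts against it, which is why the Yorkshire area reported by Ellis is the problem case.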
4. /l/

In modern RP and many other forms of English English, /l/ is normally “clear” before a following vowel or /j/, but “dark” (velarised) before all other consonants, including /w/, or a pause, regardless of word boundaries. In the non-RP accents of southern England, l-vocalisation is also common; this process affects only dark /l/, and is certainly recent even in London, where it is most advanced today. As Wells (1982: 259) points out, there is no reference to it in descriptions of Cockney until the early 20th century.

Southern Hemisphere Evidence

The ONZE data suggest that the clear /l/–dark /l/ allophony of modern England is rather recent. On the ONZE Project recordings “dark” /l/ is not very “dark” at all in the speech of most of the informants, and there is little or no l-vocalisation. A number of informants have clear /l/ in all positions. In the modern Southern Hemisphere varieties, the pronunciation of /l/ tends to be rather dark, possibly pharyngealised, in most or all environments in Australasian English; and the distribution of “clear” and “dark” allophonic variants is certainly not as prominent as in many English English accents (Wells 1982: 603, 609). L-vocalisation is now under way in prepausal and preconsonantal position in varieties of New Zealand and Australian English, but it is obviously a 20th-century innovation. South African English /l/ is described by Wells (1982: 616) as being “neutral or clear in quality, without the dark allophones common elsewhere”. Falkland Islands English (see Trudgill 2004) has the English English type of allophony, with dark /l/ being rather markedly velarised.

British Evidence

The British evidence also supports the view that the clear /l/–dark /l/ allophony of English English appears to be a recent addition to English phonology.
The SED materials show that the dark /l/ allophone was found only south of a line passing between Shropshire and Hereford and proceeding more or less due east to pass between Norfolk and Suffolk. In modern dialects, on the other hand, dark /l/ is now found in the relevant phonotactic environments everywhere except in the northeast (Trudgill 1999b). L-vocalisation is normal in the SED records only in southeastern Essex, southern Hertfordshire, northwestern Kent, Surrey, Middlesex and Sussex, i.e., the areas immediately to the south, east and north of London.
No form of Irish English has dark /l/, clear /l/ being usual in all environments, as it is in parts of the Scottish Highlands. In the Lowlands, on the other hand, dark /l/ is usual in all phonological environments.
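The RP-type distribution described at the start of this section is a simple conditioning rule on the following segment, and can be sketched as below. The vowel inventory is a deliberately simplified, hypothetical set of symbols, and the function name is an invention for this illustration; only the rule itself (clear before a vowel or /j/, dark before any other consonant or a pause) comes from the text.

```python
from typing import Optional

# Hypothetical, simplified vowel inventory used only for this sketch.
VOWELS = {"i", "ɪ", "e", "æ", "ɑ", "ɒ", "ɔ", "ʊ", "u", "ʌ", "ə", "ɜ"}


def l_allophone(next_segment: Optional[str]) -> str:
    """Return the RP-type allophone of /l/ given the following segment.

    next_segment is None for a pause. Word boundaries are ignored, so
    the first segment of a following word counts as the next segment.
    """
    if next_segment is not None and (next_segment in VOWELS or next_segment == "j"):
        return "clear"
    return "dark"


for nxt in ["i", "j", "w", "k", None]:
    print(repr(nxt), l_allophone(nxt))
```

An accent with clear /l/ in all positions, like that of some ONZE informants, would correspond to a function that returns "clear" unconditionally.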
5. /t/-Glottalling

Wells (1982: 261) writes that t-glottalling, the realisation of syllable-final /t/ as [ʔ], “must have spread very fast in the course of the present century” in British English, and indeed there is plenty of evidence that this is the case.

Southern Hemisphere Evidence

None of the modern Southern Hemisphere Englishes has t-glottalling or pre-glottalisation, except for Falkland Islands English, which can be explained by that dialect’s rather later date of origin. It is hardly surprising, therefore, that there is hardly any glottalling in the speech of the elderly New Zealanders on the ONZE Project recordings, a confirmation, if one were needed, of the relatively recent development of this phenomenon in England. There is one exception on our tapes to this generalisation: a Mrs German, who was born in 1867 in Clinton and lived in Balclutha, both on the South Island. Her parents were middle-class people who came from Bury St. Edmunds, Suffolk. Mrs German preserves a number of obviously East Anglian features in her speech (see Trudgill 1999c). In Mrs German’s speech, word-final /t/ is quite often realised as [ʔ]. Although it is often assumed that t-glottalling in England was an urban innovation, it is equally possible that it had its origins in East Anglia: as we just saw, the only area of rural England to have considerable amounts of glottalling in the records of the SED is northern East Anglia (see Trudgill 1974). The fact that Mrs German has this feature suggests that, unlike in the rest of England, it was possibly part of East Anglian English at least from the 1850s. A number of the ONZE Project speakers also have intervocalic /p, t, k/ as [pʔ, tʔ, kʔ]. One such is Mr C. Dixon, who was born in Naseby in 1867, and whose father also came from East Anglia (Norfolk).

British Evidence

According to Bailey (1996: 76) observers have been commenting on t-glottalling only since 1860, and early references were almost exclusively to Scotland and to London.
The SED records show hardly any instances of t-glottalling except in the London area and northern East Anglia. And there is convincing evidence that it reached western areas such as Cardiff (see Mees 1977) and Liverpool only very recently. In many studies (see for example Trudgill 1988) it has been shown that younger speakers demonstrate more glottalling than older speakers.
6. Pre-Glottalisation

Pre-glottalisation (Wells 1982: 260) is the use of a glottal stop before /p, t, k, ʧ/ in items such as hopeless [hoʊʔpləs], match [mæʔʧ]. Pre-glottalisation of this type is very usual today in very many (perhaps most) English English accents, including RP (see Roach 1973). As Wells points out, however, it is something which has attracted very little comment from observers in Britain (although it may be one of the things which leads Americans to describe British accents as “clipped”). We therefore have no solid information which might lead us to any satisfactory indication of its dating.

Southern Hemisphere Evidence

When I first started listening to the ONZE tapes, I noticed that there was something about even those speakers who sounded very English that was strange, something which gave their speech a distinctly un-British ring. I eventually realised what it was: the ONZE speakers show very little evidence of pre-glottalisation. Its absence from the New Zealand data recordings suggests that pre-glottalisation in Britain is a recent and probably late 19th-century phenomenon. None of the modern Southern Hemisphere Englishes has pre-glottalisation, except for Falkland Islands English, which can be explained by its rather later date of origin.

British Evidence

Andrésen (1968) dates pre-glottalisation to the 20th century in RP. Interestingly, Andersen (2002) now shows that it is beginning to make an appearance in American English.
7. /r/ and Rhoticity

Many modern accents of English are non-rhotic. That is, /r/ occurs only in prevocalic position, as in rat, trap, carry, car appliance, but not in non-prevocalic position, as in cart, car wash, car. This non-rhoticity is the result of the sound change which Wells (1982: 218) labels r-dropping, and which is well known to have begun in England (Scotland and Ireland remain rhotic to this day). Wells dates it to the 18th century, when /r/ disappeared before a consonant or in absolute final position. Bailey (1996: 100) says of English English that “the shift from consonantal to vocalic r, though sporadic earlier, gathered force at the end of the eighteenth century”. Strang (1970: 112) writes that “in post-vocalic position, finally or pre-consonantally, /r/ was weakened in articulation in the 17c and reduced to a vocalic segment early in the 18c”. The dates given by Strang and Bailey seem to be accurate for London: Walker (1791) says that non-prevocalic /r/ is “sometimes entirely sunk”.
Southern Hemisphere Evidence

Except for the Southland area, New Zealand English today is non-rhotic (Wells 1982: 606). Australian and Falkland Islands English are also non-rhotic, as is South African English except for “some people who are native Afrikaans speakers” (Wells 1982: 617). This has often been erroneously ascribed (for instance by Trudgill 1986) to the fact that most of England was non-rhotic at the time of the main immigration to New Zealand. It is now obvious that this is not correct at all: of the 84 New Zealand Mobile Unit speakers analysed, an astonishing 92% are rhotic to some degree. This suggests that rhoticity must have been very common in 19th-century England and has subsequently been lost in parallel in England and in New Zealand. English English and New Zealand English, having both been very rhotic in the 19th century, both became very non-rhotic in the 20th century (with the respective exceptions of the English southwest and parts of the northwest, and of New Zealand Southland) as a result of parallel developments. New Zealand English therefore did not inherit non-rhoticity from English English as such but rather inherited an ongoing process involving loss of rhoticity.

Given that New Zealand was settled by anglophones later than Australia and South Africa, which are also non-rhotic, it would not be surprising to discover that they were originally rhotic also. In fact, Branford cites evidence (1994: 436) for rhoticity in South African English in the form of early borrowings from English into Xhosa: for example, in tichela ‘teacher’, the /l/ corresponds to English /r/, indicating a rhotic pronunciation. As far as Australia is concerned, the evidence is even stronger.
This evidence comes from the speech of elderly Australians who were recorded in 1988 as part of the New South Wales bicentennial oral history project.1 The speech during the first hour of each recording made of the 12 oldest speakers, who were born between 1889 and 1899, has been analysed. Of these 12 speakers, the six men are all rhotic to varying degrees. The six women are all non-rhotic. This gender imbalance may well be significant, with women leading the way in an ongoing linguistic change, i.e., rhoticity-loss. The details of the six male rhotic speakers are as follows, with percentages of rhotic tokens given:

Mr G. Golby, b. 1889 (Dalgety): 20% rhoticity (49/240 tokens)
Mr Reg Green, b. 1897 (Tingha): 8% (29/357)
Mr Arthur Debenham, b. 1897 (Pampoolah): 8% (34/395)
Mr A Richardson, b. 1889 (Sydney): 4% (25/580)
Mr Arthur Emblem, b. 1897 (Tamworth): 4% (19/454)
Mr Don Taylor, b. 1891 (Avalon): 1% (8/557)
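The percentages above can be recomputed from the token counts. The sketch below does this and adds a pooled rate across the six men, on the assumption that the names and figures pair up in the order in which they are listed. The variable and function names are hypothetical, and the pooled figure is a derived illustration rather than a value reported in the source.

```python
# (speaker, rhotic tokens, total tokens), paired in the order listed above.
# The pairing of names with counts is an assumption based on that order.
SPEAKERS = [
    ("Mr G. Golby", 49, 240),
    ("Mr Reg Green", 29, 357),
    ("Mr Arthur Debenham", 34, 395),
    ("Mr A Richardson", 25, 580),
    ("Mr Arthur Emblem", 19, 454),
    ("Mr Don Taylor", 8, 557),
]


def rhoticity_pct(rhotic: int, total: int) -> float:
    """Percentage of tokens realised with non-prevocalic /r/."""
    return 100 * rhotic / total


rates = {name: rhoticity_pct(r, t) for name, r, t in SPEAKERS}
pooled = rhoticity_pct(sum(r for _, r, _ in SPEAKERS),
                       sum(t for _, _, t in SPEAKERS))

for name, pct in rates.items():
    print(f"{name}: {pct:.1f}%")
print(f"Pooled across the six speakers: {pooled:.1f}%")
```

The whole-number percentages in the list match these values when the decimals are dropped rather than rounded: 34/395, for instance, is about 8.6%, given as 8%.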
The rhoticity which we see here is best interpreted as the vestigial rhoticity which is typical of the end-stages of rhoticity-loss. The variable rhoticity has a low level of occurrence; the non-prevocalic /r/s tend to be phonetically weaker than in pre-vocalic position (and than in most fully rhotic accents), i.e., they are less fully retroflexed; and /r/ occurs much more frequently in some phonological contexts than others: the preceding vowels which most commonly produce rhoticity are those of NORTH/FORCE and lettER, unlike in New Zealand English and other vestigially rhotic varieties, e.g., those of various parts of England, where rhoticity is most common after NURSE (see Trudgill 2004: 70). So this variable and low-level rhoticity must represent the last surviving traces of earlier, fuller rhoticity. There is no other explanation for why six speakers, all unknown to one another, and living in six different places, should display this kind of linguistic behaviour. Australian English, just like New Zealand English, used to be rhotic; and the nearer to the beginning of the 19th century, the higher the levels of rhoticity must have been. There can be no doubt that Australian English, today totally non-rhotic, was formerly at least variably rhotic. The evidence is that, far from non-rhoticity being brought to Australia, it was actually rhoticity that arrived; and the sound change involving loss of rhoticity did not go to total linguistic and society-wide completion until about 1900, in terms of birthdates of speakers.

British Evidence

I have just asserted, on the basis of New Zealand evidence, that England must have been “very rhotic in the 19th century”. Happily, there is British evidence for this also.
For example, Beal (1999: 7–8) tells us that Walker (1791), in saying that /r/ is “sometimes entirely sunk” (see above), is referring “only to the most advanced dialect” of his day, colloquial London English; and she further argues that the loss of /r/ was still stigmatised in the first decades of the 19th century. Hallam, quoted in MacMahon (1983: 28), also shows that rhoticity continued to be a feature of some upper-class speech into the 1870s, citing the accents of Disraeli (b. 1804) and Prince Leopold (b. 1853), the fourth son of Queen Victoria (see also the discussion in Lass 1997: 6.2). It is also clear from dialectological work that in this respect most regional accents lagged behind London English and the prestige norm that was to become RP; and that Bailey’s (1996: 102) statement that “resistance to the spreading London fashion was, however, not long sustained” is a considerable exaggeration. If we work backwards chronologically, the evidence from dialectology is rather clear on this point. Map III shows areas of England which were rhotic in different ways at different periods in local dialects:

1. Areas that were fully rhotic in the Survey of English Dialects records (large areas of the southwest and the northwest of England, plus the northeast). Also included are areas of Wales which are indicated to be rhotic by Thomas (1994): these are “the rural communities of the west and the north” where it is a feature “carried over” from Welsh (p. 128); and the long-term anglophone areas of southern Pembroke (p. 131), and Gower, where however it is “infrequent”; and the Marches, which are the counties of Wales bordering directly on Herefordshire, Shropshire and Gloucestershire (p. 130).
2. Areas of eastern Yorkshire which in the SED are shown with non-prevocalic /r/ retained only in unstressed final syllables, as in butter.
3. Areas of Essex which in the SED are shown with non-prevocalic /r/ retained only after /ɜː/ in items such as worms.
4. Additional areas which are shown to be fully rhotic in the 1930s in Kurath & Lowman (1970).
5. Areas of the south and east Midlands, and of Essex and Suffolk, which Lowman’s research showed as having /r/ after /ɜː/ in the 1930s (from Kurath & Lowman 1970).
6. Areas which are shown in Ellis (1889) as being rhotic. The details for Ellis (using his terms) are as follows for the areas not shown as rhotic by later workers: In area 20, Border Midland, which is equivalent to Lincolnshire, Ellis explicitly states (p. 297) that /r/ is vocalised or omitted. In area 24, Eastern West Midland, which is essentially south Yorkshire, rhoticity is variable but present, including in Sheffield and Rotherham. In area 25, Western Mid Midland, which centres on Cheshire, the transcriptions all show rhoticity. In area 26, Eastern Mid Midland, which centres on Derbyshire, transcriptions show total rhoticity except in the far east of that county. In area 27, East Midland, which is equivalent to Nottinghamshire, transcriptions show lack of rhoticity, except in East Retford in the north bordering the Eastern West Midland. In area 29, Eastern South Midland, all the transcriptions show rhoticity except for those for Lichfield, Staffordshire, and Atherstone and Enderby, Leicestershire. Transcriptions for area 30, East Northern, generally do not show rhoticity (except in the east, where it shows up, as expected from item 2 in this list, in items such as butter). The whole of area 31, West Northern, has transcriptions showing rhoticity.
7. We can also assume, following Thomas (1994), that there was rhoticity in the second-language English of those areas of Wales which were Welsh-speaking in the mid-19th century.

Map III
Thus, as Map III shows, the only areas of England and Wales for which we have no evidence of rhoticity in the mid-19th century lie in two separate corridors. The first runs south from the North Riding of Yorkshire through the Vale of York into north and central Lincolnshire, nearly all of Nottinghamshire, and adjacent areas of Derbyshire, Leicestershire, and Staffordshire. The second includes all of Norfolk, western Suffolk and Essex, eastern Cambridgeshire and Hertfordshire, Middlesex, and northern Surrey and Kent. It is also possible that there was a third area in recently anglophone areas of South Wales.

The Phonetics of /r/

In modern Britain five different phonetic realisations of /r/ are extant. These are:

• the sharply recessive voiced uvular fricative [ʁ], which is confined to the northeast of England (see Wells 1982: 368) and some Scottish idiolects (Aitken 1984);
• the alveolar flap [ɾ], which is today usually associated with Scotland and parts of the North of England;
• the retroflex approximant [ɻ], which is most typical of southwestern England (Wells 1982: 342);
• the postalveolar approximant [ɹ], most usually associated with RP and much of south and central England; and
• the labio-dental approximant [ʋ], which is currently gaining ground very rapidly amongst younger speakers (see Trudgill 1988).
There is no doubt at all that the labio-dental approximant is a very new pronunciation. Of the other three widespread variants, we can suppose on phonetic grounds that what we are witnessing is an ongoing process of lenition, with the flap being the oldest and the postalveolar approximant the newest variant, and the retroflex articulation chronologically intermediate. We can suppose that even earlier forms of English may have had a roll (trill). Bailey (1996: 99) indicates that “weakening of r from a trilled consonant was first reported in Britain at the end of the sixteenth century” (see also Wells 1982: 370). This gives us a history of lenition in the realisation of /r/ as follows:

[r > ɾ > ɻ > ɹ > ʋ]

Southern Hemisphere Evidence

The relative chronology is clear; what is less certain is the absolute chronology. When, for example, did the variant [ɹ] become the most usual and widespread variant? Here the ONZE recordings are of considerable
interest. The normal pronunciation of /r/ in New Zealand today is also, as in most of England, [ɹ], although on average this is rather more retroflexed, i.e., conservative, than in England. Australian and Falkland Islands English also show more retroflexion than southeastern England. However, analysis of the ONZE tapes shows that the pronunciation of /r/ as a flapped [ɾ] is extremely common on these recordings. There is thus a strong suggestion that the weakening of the flap to an approximant in the South and Midlands of England is a very recent phenomenon dating from approximately the middle of the 19th century, a development that has been followed more or less simultaneously in New Zealand. It is also of considerable interest that one of our informants, Mr Eccles, who was born and lived in Tasmania, Australia, until he was 20, and who is therefore not included amongst our core New Zealand speakers, has flapped /r/ too, suggesting that Australian English has undergone the same process. Some forms of South African English still have a tap to this day which, as argued for by Lass (1997: 206), further strengthens this supposition (a trilled variant can also be heard, but not from native speakers; see Wells 1982: 617). Lass concurs that “non-approximant /r/ is the norm not only for Scots, but for virtually all of rural England in the mid-nineteenth century, and for many towns and cities as well”.
Note

1. The recordings were kindly made available to Prof. Elizabeth Gordon by the copyright holders, the Mitchell Library in Sydney.
References Aitken, A. J. 1984. Scottish accents and dialects. In P. Trudgill (ed.), Language in the British Isles, 94–114. Cambridge: Cambridge University Press. Andrésen, B. 1968. Preglottalization in English standard pronunciation. Oslo: Universitetsforlaget. Andersen, H. 2002. Preglottalization in English and a North Germanic bifurcation. In D. Restle & D. Zaefferer (eds.), Sounds and systems: Studies in structure and change: A Festschrift for Theo Vennemann, 5–24. Berlin: Mouton de Gruyter. Bailey, R. W. 1996. Nineteenth century English. Ann Arbor: University of Michigan Press. Beal, J. 1999. English pronunciation in the eighteenth century: Thomas Spence’s ‘Grand repository of the English language’. Oxford: Clarendon Press. Branford, W. 1994. English in South Africa. In Burchfield (ed.), 182–429. Ellis, A. 1889. On early English pronunciation, vol. 5. London: Trübner. Gimson, A. C. 1962. An introduction to the pronunciation of English. London: Edward Arnold. Gordon, E., L. Campbell, J. Hay, M. Maclagan, & P. Trudgill. 2004. The origins of New Zealand English. Cambridge: Cambridge University Press.
The Consonants of 19th-Century English
Kurath, H., & G. S. Lowman. 1970. The dialectal structure of southern England. Tuscaloosa: University of Alabama Press. Lass, R. 1997. Historical linguistics and language change. Cambridge: Cambridge University Press. MacMahon, M. 1983. Thomas Hallam and the study of dialect and educated speech. Transactions of the Yorkshire Dialect Society 83: 119–131. MacMahon, M. K. C. 1994. Phonology. In S. Romaine (ed.), The Cambridge history of the English language (Vol. 4: 1776–1997), 373–535. Cambridge: Cambridge University Press. Marckwardt, A. 1958. American English. New York: Oxford University Press. McLaughlin, J. C. 1970. Aspects of the history of English. New York: Holt, Reinhart & Winston. Mees, I. 1977. Language and social class in Cardiff. Leiden: University of Leiden Ph.D. thesis. Roach, P. 1973. Glottalisation of English /p,t,k,ʧ/: A re-examination. Journal of the International Phonetic Association 3: 10–21. Strang, B. 1970. A history of English. London: Methuen. Sweet, H. 1888. A history of English sounds from the earliest period. Oxford: Clarendon Press. Thomas, A. 1994. English in Wales. In Burchfield (ed.), 94–147. Trudgill, P. 1974. The social differentiation of English in Norwich. Cambridge: Cambridge University Press. Trudgill, P. 1986. Dialects in contact. Oxford: Blackwell. Trudgill, P. 1988. Norwich revisited: Recent changes in an English urban dialect. English World Wide 9: 33–49. Trudgill, P. 1999a. The dialects of England, 2nd edn. Oxford: Blackwell. Trudgill, P. 1999b. A window on the past: ‘Colonial lag’ and New Zealand evidence for the phonology of 19th-century English. American Speech 74(3): 1–11. Trudgill, P. 1999c. A Southern Hemisphere East Anglian: New Zealand English as a resource for the study of 19th century British English. In U. Carls & P. Lucko (eds.), Form, function and variation in English: Studies in honour of Klaus Hansen, 169–174. Berlin: Peter Lang. Trudgill, P. 2004. New dialect formation: The inevitability of colonial Englishes. 
Edinburgh: Edinburgh University Press. Turner, G. W. 1966. The English language in Australia and New Zealand. London: Longman. Walker, J. 1791. Critical pronouncing dictionary and expositor of the English language. London: Robinson, Cadell. Wells, J. C. 1982. Accents of English. Cambridge: Cambridge University Press.
2
High Vowel Decomposition in Midwest American English Jerzy Rubach
My students at The University of Iowa pronounce due and do in the same way. There seems to be nothing remarkable about it because both of these words are pronounced [du:] in a large part of the United States. Actually, however, the pronunciation of due and do is interesting for two reasons. First, my students pronounce due in the historically faithful way as [dɪu] and, second, the homophony is at the expense of do, so both do and due are pronounced [dɪu] rather than [du:]. The pronunciation [dɪu] for due, but certainly not for do, is reported in Kenyon and Knott’s (1940) Pronouncing Dictionary of American English. Kenyon and Knott (1940) note that there is an alternative pronunciation of due with [u:]. This is also true in the speech of my students, but the dominant pronunciation is [ɪu]. This chapter1 pursues the intriguing question of how [ɪu] developed and how it functions in present-day Midwest American English.2 The perspective is primarily diachronic, including the most recent history: the developments in the last one hundred years. On the theoretical side, the chapter demonstrates how the tools of Optimality Theory with moraic representations can inform an analysis of historical change and the functioning of synchronic grammars. The evidence considered in this chapter argues in favor of unidirectional Ident constraints and derivational levels in Optimality Theory. Section 1 looks at the change from [y:] to [ɪu] and [ju:] in Early Modern English. Section 2 provides an analysis of further developments in Midwest American English. Section 3 summarizes the conclusions.
1. Historical Development in Early Modern English Historical grammarians concur on the observation that the Middle English French vowel [y:] in words such as measure [mɛzy:r] and fortune
This article is dedicated to Professor Katarzyna Dziubalska-Kołaczyk, Kasia, whose scholarship and managerial genius I admire and whose friendship I cherish.
[fɔrty:nə] began to fall in Early Modern English (Luick 1940; Dobson 1968; Jordan 1974; Minkova 2014). The fall resulted in the splitting of //y://3 into two vowels that constituted a diphthong: //y:// → [ɪu].4 A development of this type is well known in phonology, for example, Slovak turned low front //æ:// into [ia], as in riasa ‘cassock’ (see Rubach 1993). The type of process in question is decomposition or fission. The general principle is that under decomposition the properties of the input must be preserved in the correspondents in the output (Struijke 2000; Rubach 2003). In the Slovak example, //ræ:sa// → [riasa], the feature [-back] of the input //æ:// is realized as [i], the feature [+low] surfaces on [a], and length is preserved since diphthongs, like long vowels, have two moras. Given the input //y://, the principle of decomposition mandates that both [-back] and [+round] be preserved in the output. Therefore, the result is that //y:// changes into [ɪu],5 where [ɪ] carries [-back] and [u] preserves [+round]. Returning to our examples, the pronunciation changes between Middle English and Early Modern English (MnE) are as follows.
(1)            ME          Early MnE
    use        y:z         ɪuz
    measure    mɛzy:rə     mɛzɪur6
    fortune    fɔrty:nə    fɔrtɪun
Optimality Theory (OT; Prince & Smolensky 2004; McCarthy & Prince 1995) has no difficulty accommodating English Decomposition. Relevant for the analysis are the following constraints that I state in a simplified manner (for a formal statement, see McCarthy & Prince 1995).
(2) a. Integrity: A segment in the input may not have multiple correspondents in the output, that is, no Decomposition.
    b. Ident[-back]: The feature [-back] of the input segment must be preserved on a correspondent of this segment in the output.
    c. Ident[+round]: The feature [+round] of the input segment must be preserved on a correspondent of this segment in the output.
    d. No Diphthong (*Diph): Diphthongs are not permitted.
    e. *Long-V: Long vowels are not permitted.
    f. *[y:]: No front rounded vowels.
    g. Onset: Syllables must have onsets.
Modern phonological theory represents length in terms of moras (Hock 1986; Hyman 1985; Hayes 1989): long vowels and diphthongs have two moras while short vowels have one mora. The decomposition //y:// → [ɪu] preserves length because both the input and the output have two moras.
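The mora count behind this claim can be made explicit with a small sketch. It is an illustration only: the toy notation (a colon for length, two vowel symbols for a diphthong) and the names `VOWELS` and `mora_count` are assumptions of the sketch, not part of the analysis.

```python
# Toy mora counter: a short vowel has one mora; a long vowel or a
# diphthong has two. The vowel inventory below is an assumption that
# covers only the symbols used in this chapter's examples.
VOWELS = set("yiɪuæa")

def mora_count(nucleus: str) -> int:
    """Return the number of moras carried by a syllable nucleus."""
    is_long = nucleus.endswith(":")
    vowels = nucleus[:-1] if is_long else nucleus
    if not vowels or any(ch not in VOWELS for ch in vowels):
        raise ValueError(f"not a recognised nucleus: {nucleus!r}")
    if len(vowels) == 1:
        return 2 if is_long else 1
    if len(vowels) == 2 and not is_long:
        return 2  # diphthong
    raise ValueError(f"not a simple nucleus: {nucleus!r}")

# Decomposition keeps the mora count constant.
print(mora_count("y:"), mora_count("ɪu"))   # English //y:// and [ɪu]
print(mora_count("æ:"), mora_count("ia"))   # Slovak //æ:// and [ia]
```

On this encoding, both members of each input-output pair carry two moras, which is the sense in which Decomposition preserves length.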
In (3) I look at the Early MnE change of //y:// → [ɪu] in the word use.
(3) use //y:z// → [ɪuz]
               *[y:]  Id[-bk]  Id[+rd]  *Long-V  *Diph  Integrity  Onset
   a. y:z      *!                       *                          *
☞  b. ɪuz                                        *      *          *
   c. u:z             *!                *                          *
   d. i:z                      *!       *                          *
   e. ju:z                              *!              *
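The way Ident[-back] and Ident[+round] are evaluated in (3) can be sketched with feature bundles. The sketch below is not the author's formalization: the `FEATURES` table and both function names are assumptions made for illustration. It contrasts the Input → Output reading, on which an input feature only has to surface on some correspondent, with the Output → Input reading, on which every correspondent has to match the input.

```python
# Feature bundles restricted to [back] and [round]. The values are the
# standard ones for these vowels; the table itself is an assumption.
FEATURES = {
    "y:": {"back": "-", "round": "+"},
    "ɪ":  {"back": "-", "round": "-"},
    "u":  {"back": "+", "round": "+"},
    "u:": {"back": "+", "round": "+"},
    "i:": {"back": "-", "round": "-"},
}

def ident_input_to_output(inp, outputs, feature):
    """Input → Output: the input's value for `feature` must appear on
    at least one output correspondent."""
    value = FEATURES[inp][feature]
    return any(FEATURES[seg][feature] == value for seg in outputs)

def ident_output_to_input(inp, outputs, feature):
    """Output → Input: every output correspondent must carry the same
    value for `feature` as the input."""
    value = FEATURES[inp][feature]
    return all(FEATURES[seg][feature] == value for seg in outputs)

for outputs in (["ɪ", "u"], ["u:"], ["i:"]):
    io = {f: ident_input_to_output("y:", outputs, f) for f in ("back", "round")}
    oi = {f: ident_output_to_input("y:", outputs, f) for f in ("back", "round")}
    print(outputs, "Input→Output:", io, "Output→Input:", oi)
```

Under these assumptions only the diphthong [ɪu] satisfies both Ident constraints in the Input → Output direction, while it fails both in the Output → Input direction.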
The driver for Decomposition is the segment inventory constraint *[y:], which must have been reranked to an undominated position in the phonology of the speakers exhibiting Decomposition.7 Candidate (3a) violates *[y:] and is hence hors de combat. Candidates (3c-d) have turned //y:// into [u:] and [i:], failing to preserve Ident[-back] and Ident[+round], respectively. Candidate (3e), [ju:z], has a long vowel, but the system shown in (3) gives preference to diphthongs, so candidate (3b) wins, the correct result. Decomposition makes an important theoretical point: counter to the tenet of Standard OT (Prince & Smolensky 2004; McCarthy & Prince 1995), Ident constraints must be unidirectional and not bidirectional. This is documented in (3b), where Ident[-back] and Ident[+round] are satisfied in the Input → Output direction because both [ɪ] and [u] are correspondents of //y://, so both [-back] and [+round] are preserved, albeit on different components of the diphthong. The opposite direction, Output → Input does not work as [ɪ] is not only [-back] but also [-round] while the input //y:// is not [-round]. Similarly, the [u] part of the diphthong is not only [+round] but also [+back] while the input //y:// is not [+back]. I conclude that English Decomposition argues for unidirectional Ident constraints and thus lends support to the claim made originally in Pater (1999), Struijke (2000) and Rubach (2003). Luick (1940), Dobson (1968) and Wells (1982) note that the system with [ɪu] was unstable and the [ɪu] tended to develop into [ju:]. The change first occurred word-initially, so in words such as use. This is exactly what phonological theory would predict: syllables want to have onsets and CV is the most optimal syllable. The development [ɪu] → [ju:] follows from the tenets of moraic theory (Hock 1986; Hyman 1985; Hayes 1989): since [ɪ], a component of the diphthong, is moved to the onset, it sheds a mora as onsets can never be moraic. 
The freed mora hooks up to the vowel, yielding [ju:], so we witness a compensatory lengthening effect. An OT analysis of [ɪu] → [ju:] recruits some new constraints, not stated in (2).
(4) a. Max-μ: No deletion of moras.
    b. Dep-Seg: No insertion of segments.
The candidate [ju:z] that was rejected in (3) must now be the winner. The input is //ɪu//. The break-up of the diphthong //ɪu// → [ju:] is enforced by Onset that must now be ranked above *Long-V. The undominated position of Dep-Seg makes sure that no new segments are inserted in order to provide an onset for use, so the [j] has to be harvested from //ɪu//.
(5) use //ɪuz// → [ju:z]
               Dep-Seg  Max-μ  Onset  *Long-V
   a. ɪuz                      *!
   b. jɪuz     *!
   c. juz               *!
☞  d. ju:z                            *
Candidate (5a) loses on Onset. Candidate (5b) avoids this violation by inserting [j]. Insertion of a glide is a well-known repair strategy; for example, it is used in Kurpian (Rubach 2009).
(6) Kurpian glide insertion
    //ɪlɛ// → [jɪlɛ] ‘how many’
    //ul// → [wul] ‘beehive’
    //ɔbɔra// → [wɔbɔra] ‘cowshed’
English does not employ this repair strategy, so Dep-Seg is undominated. The English solution is to make use of the existing material in order to create an onset: the [j] in [ju:z] comes from the //ɪ// in the diphthong //ɪuz// rather than from insertion, so Dep-Seg is not violated in (5c-d). Candidate (5c), with a short vowel, loses because it has failed to pick up the mora freed by gliding, //ɪ// → [j]. The change //ɪu// → [ju:] was not variable and occurred in all dialects of English. A further fall of //ɪu// takes different paths in England and in America. Wells (1982) claims that by 1750 [ɪu] sounded old-fashioned in London and was regularly replaced with [ju:]. We can therefore assume that in the course of the 19th c. [ɪu] totally disappeared from British English, but as I explain in the following section, not so in America.
2. Decomposition in the Midwest It is fortunate that we have a complete and fully professional description of Standard Midwest American English from the year 1924, which is when Kenyon’s first edition of American Pronunciation appeared.
Kenyon, himself a professional phonetician, was a native of Ohio. He reports that he pronounces [ɪu] in his own speech in, for example, mute [mɪut], duty [dɪutɪ],8 suit [sɪut] and abused [əbɪuzd]. Kenyon’s description from the year 1924 is complemented by Kenyon and Knott’s Pronouncing Dictionary of American English, published in 1940 and based on the research carried out in the 1930s, so we have a valuable record of the facts. This section investigates the fate of [ɪu] in America. Section 2.1 provides an analysis of Monophthongization, //ɪu// → [ju:], and determines the context in which //ɪu// is retained as [ɪu]. Section 2.2 looks at a relationship between //ɪu// and Palatalization. Section 2.3 considers the special status of liquids, and Section 2.4 concludes with the extension of [ɪu] to new contexts.
2.1. Monophthongization versus Retention of [ɪu]
Kenyon and Knott (1940) maintain that [ɪu] is found in three types of context.
(7) Kenyon and Knott’s (1940) generalizations9
    a. [ɪu] after labials, velars and laryngeals, where it is in free variation with [ju:], so [ɪu] ⁓ [ju:], where ⁓ means ‘freely varying with’
       abuse [əbɪuz] ⁓ [əbju:z]       refuse [rɪfɪuz] ⁓ [rɪfju:z]
       music [mɪuzɪk] ⁓ [mju:zɪk]     accuse [əkɪuz] ⁓ [əkju:z]
       human [hɪumən] ⁓ [hju:mən]
    b. [ɪu] is the only variant after coronals in stressed syllables (Kenyon 1924: 219)
       tune [tɪun]       duke [dɪuk]
       suit [sɪut]       resume [rɪzɪum]
       new [nɪu]         enthusiast [ɪnθɪuzɪæst]
       juice [ʤɪus]
    c. [ɪu] ⁓ [u:] variation occurs after the liquids [r] and [l]
       rule [rɪul] ⁓ [ru:l]               cherubic [ʧɛrɪubɪk] ⁓ [ʧɛru:bɪk]
       lunatic [lɪunətɪk] ⁓ [lu:nətɪk]    lucid [lɪusɪd] ⁓ [lu:sɪd]
Given that Kenyon’s American Pronunciation appeared in 1924, we may assume that the generalizations stated in (7) represent the state of American English about a hundred years ago. Since then some generalizations have changed. The facts for my students are as follows.
(8) Present-day Midwest10
    a. after labials, velars and laryngeals, [ɪu] is no longer attested because of the change //ɪu// → [ju:]
       music [mju:zɪk]       mute [mju:t]
       dispute [dɪspju:t]    abuse [əbju:z]
       fuse [fju:z]          accuse [əkju:z]
       acute [əkju:t]        human [hju:mən]
    b. after coronals in stressed syllables, [ɪu] freely varies with [u:], but [ɪu] is the dominant pronunciation
       tune [tɪun] ⁓ [tu:n]       duke [dɪuk] ⁓ [du:k]
       suit [sɪut] ⁓ [su:t]       resume [rɪzɪum] ⁓ [rɪzu:m]
       new [nɪu] ⁓ [nu:]          enthusiast [ɪnθɪuziæst] ⁓ [ɪnθu:ziæst]
       juice [ʤɪus] ⁓ [ʤu:s]
    c. after the liquids r and l, the pronunciation [ɪu] has completely disappeared
       rule [ru:l]              cherubic [ʧɛru:bɪk]
       lunatic [lu:nətɪk]       lucid [lu:sɪd]
The fall of [ɪu] after non-coronals yields to a straightforward analysis. I look at music and assume the arbitrary dates 1900 and 2000.
(9) i. The year 1900: music //mɪuzɪk// → [mɪuzɪk] (no change)
               Max-Seg  Max-μ  *Long-V  *[ɪu]
☞  a. mɪuzɪk                            *
   b. mjuzɪk            *!
   c. mu:zɪk   *!              *
   d. mju:zɪk                  *!
    ii. The year 2000: music //mɪuzɪk// → [mju:zɪk]
               Max-Seg  Max-μ  *[ɪu]  *Long-V
   a. mɪuzɪk                   *!
   b. mjuzɪk            *!
   c. mu:zɪk   *!                     *
☞  d. mju:zɪk                         *
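The two tableaux in (9) differ only in the relative ranking of *Long-V and *[ɪu]. Strict domination can be sketched as a lexicographic comparison of violation profiles. The code below is a minimal illustration: the violation counts are read off the constraint definitions in the text, and the names `VIOLATIONS`, `evaluate`, `RANKING_1900` and `RANKING_2000` are assumptions of the sketch.

```python
# Violation counts for the candidates of "music". The cell values follow
# the constraint definitions in the text and are an assumption of this sketch.
VIOLATIONS = {
    "mɪuzɪk":  {"*[ɪu]": 1},
    "mjuzɪk":  {"Max-μ": 1},
    "mu:zɪk":  {"Max-Seg": 1, "*Long-V": 1},
    "mju:zɪk": {"*Long-V": 1},
}

def evaluate(ranking, violations):
    """Return the optimal candidate under strict domination.

    Each candidate is mapped to a tuple of violation counts ordered by
    the ranking; Python compares tuples lexicographically, so the
    smallest tuple belongs to the winner.
    """
    def profile(candidate):
        return tuple(violations[candidate].get(c, 0) for c in ranking)
    return min(violations, key=profile)

RANKING_1900 = ["Max-Seg", "Max-μ", "*Long-V", "*[ɪu]"]
RANKING_2000 = ["Max-Seg", "Max-μ", "*[ɪu]", "*Long-V"]

print(evaluate(RANKING_1900, VIOLATIONS))
print(evaluate(RANKING_2000, VIOLATIONS))
```

Swapping the two lowest-ranked constraints is the only difference between the two grammars in this sketch, which is why the same candidate set yields a different winner for each date.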
The present-day grammar in (9ii) has reranked *[ɪu] above *Long-V, so [mju:zɪk] wins over [mɪuzɪk]. The evaluation of //ɪu// → [ju:] after velars
and laryngeals, as in cute and human, is entirely parallel to the evaluation shown for music in (9). Recall from the data in (8b) that [ɪu] is the default pronunciation after coronals in stressed syllables. The evaluation in (9i-a) suggests that [ɪu] is derived if *Long-V dominates *[ɪu], as it did in the year 1900. However, this ranking is not available in the year 2000 because we want [ju:] to win in music (9ii-d). The ranking must therefore be *[ɪu] >> *Long-V. Let us see how this ranking fares with coronals. In (10), ✗ marks the undesired winner while ☹ marks the desired winner. Onset is irrelevant and hence is omitted.
(10) duke //dɪuk// = [dɪuk] (no change; failed evaluation)
               Max-Seg  Max-μ  *[ɪu]  *Long-V
☹  a. dɪuk                     *!
   b. djuk              *!
   c. du:k     *!                     *
✗  d. dju:k                           *
The result is incorrect because the default pronunciation of duke in the Midwest is [dɪuk] and [dju:k] is unattested. We need a constraint that can eliminate [dju:k]. A comparison of annual [ænjuəl] and annuity [ənɪuəɾi] prompts the solution because we have an alternation: [j] occurs in the unstressed syllable but not in the stressed syllable in the same morpheme. This leads to the following generalization:
(11) Cor-j)Str-Syll: [j] cannot occur after a coronal in a stressed syllable.
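A markedness constraint such as (11) can be thought of as a function from a syllable to a number of violations. The sketch below is a hypothetical encoding: the `Syllable` class, the `CORONALS` set and the function name are assumptions, and treating [nj] as the onset of the unstressed syllable in annual is an assumption about syllabification.

```python
from dataclasses import dataclass
from typing import Tuple

# Coronal consonants relevant to the examples; the set is an assumption.
CORONALS = {"t", "d", "s", "z", "n", "l", "r", "θ", "ʧ", "ʤ", "ʃ", "ʒ"}

@dataclass(frozen=True)
class Syllable:
    onset: Tuple[str, ...]   # onset segments, left to right
    stressed: bool

def cor_j_violations(syllable: Syllable) -> int:
    """Count coronal + [j] sequences in the onset of a stressed syllable."""
    if not syllable.stressed:
        return 0  # the constraint has no jurisdiction over unstressed syllables
    pairs = zip(syllable.onset, syllable.onset[1:])
    return sum(1 for first, second in pairs if first in CORONALS and second == "j")

print(cor_j_violations(Syllable(("d", "j"), stressed=True)))    # *[dju:k]
print(cor_j_violations(Syllable(("d",), stressed=True)))        # [dɪuk]
print(cor_j_violations(Syllable(("n", "j"), stressed=False)))   # annual
print(cor_j_violations(Syllable(("m", "j"), stressed=True)))    # music
```

Only the first of the four syllables incurs a violation, which matches the distribution of [j] described in the text.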
In the Midwest, Cor-j)Str-Syll is surface-true and hence undominated. The evaluation in (10) is now corrected in (12).
(12) duke //dɪuk// = [dɪuk] (no change)
               Cor-j)Str-Syll  Max-Seg  Max-μ  *[ɪu]  *Long-V
☞  a. dɪuk                                     *
   b. djuk     *!                       *
   c. du:k                     *!                     *
   d. dju:k    *!                                     *
*[ɪu] must be ranked lower than Cor-j)Str-Syll and Max-Seg to make sure that candidates (12c) and (12d) lose to the desired output [dɪuk] in (12a).
For completeness let us look at the alternation of zero and [j] in annuity [ənɪuəɾi]—annual [ænjuəl]. The evaluation of annuity runs in the same way as the evaluation of duke in (12). I do not analyze Flapping and Final Tensing (but see Note 10).
(13) annuity //ænɪuətɪ// = [ənɪuəɾi]
               Cor-j)Str-Syll  Max-Seg  Max-μ  *[ɪu]  *Long-V
☞  a. ənɪuəɾi                                  *
   b. ənjuəɾi  *!                       *
   c. ənu:əɾi                  *!                     *
   d. ənju:əɾi *!                                     *
In the adjective annual, the relevant syllable is unstressed. This calls for a constraint regulating length of unstressed vowels.
(14) Shortening: Vowels in unstressed syllables must be short.
In order to have an effect, Shortening must be ranked above Max-μ that enforces the preservation of length. Cor-j)Str-Syll, listed for completeness in (15), is mute because the syllable is unstressed.
(15) annual //ænɪuæl// → [ænjuəl]
               Cor-j)Str-Syll  Max-Seg  Shortening  Max-μ  *[ɪu]  *Long-V
   a. ænɪuəl                            *!                 *
☞  b. ænjuəl                                        *
   c. ænu:əl                   *!       *                         *
   d. ænju:əl                           *!                        *
The analysis delivers the correct result: we have [j] after coronals in unstressed syllables and [ɪu] after coronals in stressed syllables: annual [ænjuəl] in (15) vs. annuity [ənɪuəɾi] in (13). Since the blocking of [j] derived from //ɪu// in the surface representations is executed by Cor-j)Str-Syll and, consequently, is limited to syllables beginning with coronals, the expectation is that [j] should be free to surface after labials, velars and laryngeals not only in unstressed syllables but also in stressed syllables. This expectation is borne out.
(16) Unstressed syllable               Stressed syllable
     commutation [kɑmjəteɪʃən]         commute [kəmju:t]
     computation [kɑmpjəteɪʃən]        compute [kəmpju:t]
     ambiguous [æmbɪgjuəs]             ambiguity [æmbɪgju:əɾi]
2.2. Palatalization
The examples in (17) document a new issue, as the alternations suggest.
(17) habit [hæbɪt]—habitual [həbɪʧuəl]
     grade [greɪd]—gradual [græʤuəl]
     sense [sɛns]—sensual [sɛnʃuəl]
     use [ju:z]—usual [ju:ʒuəl]
We see an alternation between alveolars and palato-alveolars: [t d s z]—[ʧ ʤ ʃ ʒ]. This alternation and its sources can be inspected directly by looking at phonostylistic variation in clitic phrases.
(18) Midwest Clitic Phrase Palatalization
     don’t you      would you
     doʊnt ju:      wʊd ju:
     doʊnʧ ju:      wʊʤ ju:       Palatalization, t d → ʧ ʤ
     doʊnʧ ju       wʊʤ ju        Shortening in unstressed syllables
Clitic Phrase Palatalization is limited to stops in the Midwest, but in other dialects of English, for example, in RP it extends to fricatives.
(19) Clitic Phrase Palatalization in RP
     lets you [s]—[ʃj]—[ʃ]
     knows you [z]—[ʒj]—[ʒ]
What we witness is a rule of Palatalization. Stated originally in Chomsky and Halle (1968) and developed in Rubach (1984), Palatalization turns //t d s z// into [ʧ ʤ ʃ ʒ]. The rule applies before [j] but not before front vowels, so we have [d] rather than [ʤ] in would it [wʊd ɪt], never *[wʊʤ ɪt].
(20) architect [t]—architecture [ʧ]
     proceed [d]—procedure [ʤ]
     press [s]—pressure [ʃ]
     close [z]—closure [ʒ]
The words in (20) show effects of Palatalization, but [j], the trigger of the process, is not visible. This difficulty disappears when we consider the pair fail—failure that has the same structure as the words in (20). In fail+ure the [j] appears overtly in the surface representation, so the /j/ must also be present in (20). Stating Palatalization as an OT constraint would take us far beyond the scope of this chapter, so I will assume the following informal generalization.
(21) Pal: [t d s z] → [ʧ ʤ ʃ ʒ] / — j
The absence of [j] in the surface representations in (20) is due to deletion that applies after palato-alveolars. Stated as a constraint, [j]-Deletion prohibits the following clusters.
(22) [j]-Deletion ([j]-Del): *ʧj ʤj ʃj ʒj
The alternation [t]—[ʧ] in words such as habit—habitual suggests that we are looking at Palatalization. As noted, Palatalization is activated by the occurrence of [j], but habitual does not have [j]: [həbɪʧuəl]. The argument for /j/ is made in three ways. First, [j] is not visible in [həbɪʧuəl] because [j]-Del must have deleted it after [ʧ ʤ ʃ ʒ], so we have an instance of opacity: the trigger of the process is not found in the surface representation. Second, habitual has the same structure as manual, in which [j] is present in the phonetic representation. Third, the pair perpetuity [pərpɛtɪuəɾi]–perpetual [pərpɛʧuəl] takes us directly to the underlying representation. In perpetuity we have the diphthong [ɪu] in the surface representation, so //ɪu// must also be present in the underlying representation. The form perpetual [pərpɛʧuəl] is now easy to understand: the /j/ is derived from //ɪu// in the same way as in annual, //ænɪuæl// → [ænjuəl] analyzed in (15). This smooth reasoning runs into a difficulty. As just stated, we can derive /j/ in perpetual, but the /j/ is subsequently deleted. In a derivational account, the rules would apply as follows.
(23) //pərpɛtɪuæl//
     pərpɛtju:æl      Monophthongization, //ɪu// → /ju:/
     pərpɛʧjuæl       Palatalization, /t/ → [ʧ]/__/j/
     pərpɛʧuæl        [j]-Deletion
     pərpɛʧuəl        Vowel Reduction
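The ordered derivation in (23) can be sketched as a pipeline of string rewrites. This is a toy illustration, not an implementation of the rules as stated in the literature: all function names are hypothetical, the rewrites ignore stress and syllable structure, and shortening of the unstressed vowel, which (23) leaves implicit in the Palatalization line, is added here as a separate step by assumption.

```python
# Alveolar → palato-alveolar mapping used by Palatalization.
PALATALIZE = {"t": "ʧ", "d": "ʤ", "s": "ʃ", "z": "ʒ"}

def monophthongize(form: str) -> str:
    """//ɪu// → /ju:/"""
    return form.replace("ɪu", "ju:")

def palatalize(form: str) -> str:
    """t d s z → ʧ ʤ ʃ ʒ before j"""
    for plain, palatal in PALATALIZE.items():
        form = form.replace(plain + "j", palatal + "j")
    return form

def shorten(form: str) -> str:
    """Assumption: the relevant syllable is unstressed, so /ju:/ → /ju/."""
    return form.replace("ju:", "ju")

def delete_j(form: str) -> str:
    """[j]-Deletion after palato-alveolars"""
    for palatal in PALATALIZE.values():
        form = form.replace(palatal + "j", palatal)
    return form

def reduce_vowel(form: str) -> str:
    """Toy Vowel Reduction: only a final unstressed 'æl' becomes 'əl'."""
    return form[:-2] + "əl" if form.endswith("æl") else form

def derive(underlying: str) -> list:
    """Apply the rules in order and return every intermediate stage."""
    stages = [underlying]
    for rule in (monophthongize, palatalize, shorten, delete_j, reduce_vowel):
        stages.append(rule(stages[-1]))
    return stages

for stage in derive("pərpɛtɪuæl"):
    print(stage)
```

The order matters in this sketch: if `delete_j` ran before `palatalize`, it would find no palato-alveolar to apply after, and the surface form would keep the cluster.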
From the point of view of Standard OT, the reasoning in (23) is unacceptable because it assumes derivation, and OT, it is held firmly, is fully parallel, without any derivational steps. This assumption leads to a conundrum: on the one hand, the Pal constraint is activated only in the context of /j/, but, on the other hand, the surface form does not contain [j]: [pərpɛʧuəl]. This conundrum is insoluble in Standard OT but yields to a straightforward analysis in Derivational OT, a theory proposed originally by Kiparsky (1997) and Rubach (1997). Kiparsky (2000) assumes three derivational levels: the stem level, the word level and the postlexical level. This model is extended in Rubach (2011) to include the clitic level. The architecture of the theory assumes that the input to the first level is the underlying representation, the winner from the first level is the input to the second level (a new ‘underlying representation’), the winner from the second level
is the input to the third level, and the winner from the third level is the input to the postlexical level. Importantly, constraints may be reranked between levels, but the reranking must be minimal (Rubach 2000). Given Derivational OT, the conundrum regarding the incompatibility of Pal and [j]-Del is easily resolved: at level 1, Pal compels //tj dj sj zj// → /ʧj ʤj ʃj ʒj/ because Pal is ranked above the faithfulness constraint militating against changes in the place of articulation.
(24) Ident[+anter]: [+anter] on the input segment must be preserved on a correspondent of that segment in the output.
The /j/ is not deleted at level 1 because Max-Seg is undominated and thwarts a potential action of [j]-Del that is ranked lower. In (25), I ignore a formal account of Vowel Reduction that turns vowels into schwa in unstressed syllables. Irrelevant constraints such as the undominated Cor-j)Str-Syll that has jurisdiction over stressed syllables have been omitted.
(25) Level 1: perpetual //pərpɛtɪuæl// → [pərpɛʧjuəl]
                   Max-Seg  Shortening  Max-μ  *[ɪu]  Pal  Id[+anter]  [j]-Del
   a. pərpɛtɪuəl            *!                 *
   b. pərpɛtjuəl                        *             *!
   c. pərpɛʧju:əl           *!                             *           *
☞  d. pərpɛʧjuəl                        *                  *           *
   e. pərpɛtuəl    *!                   *
As noted, Pal must outrank Ident[+anter] or else it would not have an effect. Crucially, [j]-Del must be ranked below Max-Seg, so that /j/ can survive in the optimal candidate and account for Palatalization. Shortening penalizes two-moraic nuclei in unstressed syllables. Obedience to Shortening is at the cost of violating Max-μ because one mora from the input has been lost, so the ranking is Shortening >> Max-μ. At level 2, the ranking of Max-Seg and [j]-Del is reversed.
(26) Level 1: Max-Seg >> [j]-Del, hence no deletion of /j/
     Level 2: [j]-Del >> Max-Seg, hence [j] is deleted after [ʧ ʤ ʃ ʒ]
The evaluation in (25) continues in (27).
(27) Level 2: perpetual //pərpɛʧjuəl// → [pərpɛʧuəl]
                   [j]-Del  Max-Seg
   a. pərpɛʧjuəl   *!
☞  b. pərpɛʧuəl             *
I conclude that an analysis of English Palatalization argues for Derivational OT and against Standard OT, whose principle of strict parallelism must be abandoned. In diachronic terms, the fall of //ɪu// (from ME //y://) is not complete in the Midwest. The [ɪu] survives in stressed syllables after a coronal, as in duke, but turns into [ju:] after labials, velars and laryngeals, as in music, cute and huge. The context of liquids is special, as I detail in the following section.
2.3. Liquids
After liquids, Kenyon’s (1924) and Kenyon and Knott’s (1940) [ɪu] has been completely eliminated in the speech of my students.
(28)               Kenyon and Knott     Present-day Midwest
     rule          rɪul                 ru:l
     rude          rɪud                 ru:d
     absolutely    æbsəlɪutlɪ           æbsəlu:tli
     lucid         lɪusɪd               lu:sɪd
The context of liquids is special because after coronals other than [r] and [l], [ɪu] occurs freely, and is in fact the default pronunciation in words such as duke [dɪuk]. The absence of [ɪu] after [l, r] is a phonotactic constraint on allowable collocations.
(29) Liquid-[ɪu] (Liq-[ɪu]): No [ɪu] after [r, l].11
To have an effect, Liq-[ɪu] must be ranked above Max-Seg, as //ɪu// → [u:] violates Max-Seg. The evaluation in (30) shows how the sound change has affected the word rule.
(30) rule //rɪul// → [ru:l]12
               Liq-[ɪu]  Cor-j)Str-Syll  Max-Seg  Max-μ  *[ɪu]  *[ɪ:]  *[u:]
   a. rɪul     *!                                        *
   b. rju:l              *!                                            *
   c. rul                                *        *!
☞  d. ru:l                               *                             *
   e. rɪ:l                               *                      *!
A point of interest in (30) is that the diphthong //ɪu// can be monophthongized in two ways. The first segment of //ɪu// can be lengthened at the expense of the //u// that is deleted, //ɪu// → [ɪ:]; or conversely, the second segment of //ɪu// can be lengthened at the expense of the //ɪ// that is
deleted, //ɪu// → [u:]. Both of these options attain the goal of eliminating [ɪu], but only the latter is attested. The desired result follows from the ranking of the segment inventory constraints: *[ɪ:] (don’t be a long high front lax vowel) >> *[u:] (don’t be a long high back tense vowel). This ranking is natural because the generalization is that long vowels are typically tense, not lax. Monophthongization after [l] is best illustrated by looking at the pair voluminous–volume. The former appears as [vəlɪumɪnəs] in Kenyon and Knott (1940) but as [vəlu:mɪnəs] in the speech of my students. On the other hand, volume, where the syllable after [l] is unstressed, has not changed: [vɑljəm] in both Kenyon and Knott (1940) and in the speech of my students. The tableaux in (31–32) do not address the issue of vowel reduction to schwa in unstressed syllables.
(31) voluminous //vɑlɪumɪnəs// → [vəlu:mɪnəs]
                   Liq-[ɪu]  Cor-j)Str-Syll  Max-Seg  Max-μ  *[ɪu]  *[ɪ:]  *[u:]
   a. vəlɪumɪnəs   *!                                        *
   b. vəlju:mɪnəs            *!                                            *
   c. vəlumɪnəs                              *        *!
☞  d. vəlu:mɪnəs                             *                             *
   e. vəlɪ:mɪnəs                             *                      *!
In the noun volume, the syllable with [l] is unstressed and hence subject to Shortening. Cor-j)Str-Syll is omitted because it has no jurisdiction over unstressed syllables.
(32) volume //vɑlɪum// → [vɑljəm]
               Liq-[ɪu]  Max-Seg  Shortening  Max-μ  *[ɪu]
   a. vɑlɪum   *!                                    *
   b. vɑlju:m                     *!
☞  c. vɑljəm                                  *
   d. vɑləm              *!                   *
To summarize, present-day Midwest American English is different from the Midwest dialect in Kenyon (1924) and Kenyon and Knott (1940) in two ways that are relevant to this chapter. First, [ɪu] is no longer attested after labials, velars and laryngeals, so we have music [mju:zɪk], cute [kju:t] and huge [hju:ʤ] rather than [mɪuzɪk], [kɪut] and [hɪuʤ]. Second, [ɪu] has vanished from the context of liquids, so we have rule [ru:l] and lucid [lu:sɪd] rather than [rɪul] and [lɪusɪd].
2.4. Variation and Extension to New Contexts
The context of coronals other than liquids delivers variable results and continues the state of the matter reported in Kenyon (1924) and Kenyon and Knott (1940), so we have duke [dɪuk] ⁓ [du:k] and assume [əsɪum] ⁓ [əsu:m]. The question is how the [u:] variants are generated by the grammar. The answer is straightforward: Max-Seg must be ranked below *[ɪu], as the following evaluation demonstrates.
(33) duke //dɪuk// → [du:k]
               Cor-j)Str-Syll  *[ɪu]  Max-Seg  Max-μ
   a. dɪuk                     *!
   b. dju:k    *!
   c. duk                             *        *!
☞  d. du:k                            *
Candidate (33c) has not only deleted the underlying //ɪ// but also one of the moras, shortening the vowel, an offence penalized by Max-μ. Looking at (33), the two attested variants of duke are generated by assuming that the ranking of *[ɪu] and Max-Seg is variable.
(34) a. Option 1: Max-Seg >> *[ɪu], so (33a) wins
     b. Option 2: *[ɪu] >> Max-Seg, so (33d) wins
There is a fascinating development in the Midwest dialect (the speech of my students): the variation [ɪu] ⁓ [u:] that comes historically from the front rounded vowel //y:// has been extended to the context where it does not belong, as the following examples document. The transcriptions of interest are those in the Present-day Midwest column for do, shoot, noon, zoom and soon.
(35)            Kenyon and Knott       Present-day Midwest
     due        dɪu ⁓ du:              dɪu ⁓ du:
     do         du:                    dɪu ⁓ du:
     chute      ʃɪut ⁓ ʃu:t            ʃɪut ⁓ ʃu:t
     shoot      ʃu:t                   ʃɪut ⁓ ʃu:t
     new        nɪu ⁓ nu:              nɪu ⁓ nu:
     noon       nu:n                   nɪun ⁓ nu:n
     presume    prəzɪum ⁓ prəzu:m      prəzɪum ⁓ prəzu:m
     zoom       zu:m                   zɪum ⁓ zu:m
     assume     əsɪum ⁓ əsu:m          əsɪum ⁓ əsu:m
     soon       su:n                   sɪun ⁓ su:n
The extension of the [ɪu] pronunciation to the o/oo words is historically incorrect. For example, do comes from Old English dōn [do:n] and the [u:] in do [du:] is a result of the Great Vowel Shift that operated in Early Modern English. The unexpected pronunciation [dɪu] for do must come from misassignment of the underlying representation. Since genuine [ɪu] words, such as due [dɪu] ⁓ [du:], overlap in their surface representation with [u:] words, such as do [du:], speakers misassign the representation //ɪu// to //u:// morphemes. Consequently, not only due but also do derives from the underlying //dɪu//. The grammar generates surface [dɪu] without trouble, as shown by the evaluation of duke in (12). The observations just made are strengthened by the fact that the //ɪu// extension occurs in exactly the class of historical [y:]-Decomposition words: stressed coronal initial syllables. After non-coronals in o/oo words, the only attested pronunciation is [u:].
(36) o/oo words with a non-coronal initial consonant
     boom [bu:m], not *[bɪum]
     move [mu:v], not *[mɪuv]
     lagoon [ləgu:n], not *[ləgɪun]
     coo [ku:], not *[kɪu]
     hooligan [hu:ləgən], not *[hɪuləgən]
The absence of the [ɪu] extension in (36) fits the system since these are the contexts in which the historical [ɪu] was not retained and went to [ju:]; compare refuse [rɪfju:z] and acute [əkju:t]. The contention that the [ɪu] extension is a diphthong rather than [ju] with a glide is motivated in three ways. First, the [ɪu] extension occurs only in words that have a long vowel, so in soon [su:n], but not in soot [sʊt]. This restriction fits: short [ʊ] words are monomoraic, so they cannot support a diphthong because diphthongs have two moras. Second, the front segment audible in the speech of my students must be [ɪ] rather than [j] because the //j// from the putative //ju// would have been deleted through the action of Cor-j)Str-Syll that prohibits [j] in stressed syllables beginning with a coronal. However, if the representation is //ɪu//, Cor-j)Str-Syll is mute because there is no [j]. Third, [j]-Del that prohibits [j] after [ʧ ʤ ʃ ʒ] tolerates the front vocoid element in choose [ʧɪuz] and shoot [ʃɪut]. Again, this fits if we assume that the words contain //ɪu// rather than //ju//, so the underlying representations are //ʧɪuz// and //ʃɪut// rather than //ʧjuz// and //ʃjut//, and hence [j]-Del is mute.
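The conditions on the [ɪu] extension described in this section can be collected into a single predicate. The sketch below is a summary device only: the function name, the `CORONALS` set and the encoding of a word as (onset consonant, vowel, stress) are assumptions, and the exclusion of liquids follows from Liq-[ɪu] in (29) rather than from a separate statement about the extension.

```python
LIQUIDS = {"r", "l"}
# Coronal consonants relevant to the examples; the set is an assumption.
CORONALS = {"t", "d", "s", "z", "n", "θ", "ʧ", "ʤ", "ʃ", "ʒ"} | LIQUIDS

def allows_iu_extension(onset: str, vowel: str, stressed: bool = True) -> bool:
    """Can a historical o/oo word surface with [ɪu] ⁓ [u:]?

    Conditions, as described in the text:
      1. the syllable is stressed;
      2. the vowel is long [u:] (two moras are needed for a diphthong);
      3. the syllable begins with a coronal other than a liquid.
    """
    return (
        stressed
        and vowel == "u:"
        and onset in CORONALS
        and onset not in LIQUIDS
    )

examples = {
    "do": ("d", "u:"), "soon": ("s", "u:"), "choose": ("ʧ", "u:"),
    "soot": ("s", "ʊ"), "boom": ("b", "u:"), "coo": ("k", "u:"),
    "rude": ("r", "u:"),
}
for word, (onset, vowel) in examples.items():
    print(word, allows_iu_extension(onset, vowel))
```

The predicate restates the descriptive generalization; in the analysis itself these effects follow from the underlying //ɪu// and the constraint ranking rather than from a list of conditions.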
3. Conclusion The history of the front rounded vowel [y:] occurring in French loanwords in Middle English is interesting from the point of view of both phonological theory and descriptive linguistics. On the theoretical side,
we see how OT makes correct predictions about the decomposition of //y:// into a falling diphthong [ɪu]. Ident[-back] and Ident[+round], the faithfulness constraints guiding Decomposition, provide evidence for the tenet that Ident constraints must be unidirectional rather than bidirectional, so Standard OT must be modified in its understanding of how Ident constraints work. OT predicts further that [ɪu] outputs in which the diphthong is word-initial should monophthongize to [ju:]. Specifically, Onset enforces //ɪ// → [j] and moraic theory predicts, correctly, that taking out //ɪ// from //ɪu// must lead to the lengthening of //u//, so //ɪu// → [ju:]. This lengthening is a compensatory effect caused by the mora of //ɪ/ that is freed when //ɪ// goes into the onset, as in use //ɪuz// → [ju:z]. The analysis of Palatalization leads to a different point of theoretical interest. Since //j//, the trigger of the process, is not attested on the surface, it is necessary to assume a derivational step at which /ʧj ʤj ʃj ʒj/ are the desired outputs. The concept of a derivational step or a derivational level is entirely foreign to Standard OT, which is committed to the ideology of strict parallelism: simultaneous evaluation of all candidates without any derivationalism whatsoever. Derivational OT is exactly the theory that is needed for an analysis of English Palatalization. At level 1, unstressed //tɪu dɪu sɪu zɪu// → /tj dj sj zj/ → /ʧj ʤj ʃj ʒj/, so Pal is satisfied. At level 2, [j]-Del is reranked above Max-Seg, so /j/ is lost, as required. From a descriptive point of view, [y:]-Decomposition is a story of continued fall of the marked structure. First //y:// develops into [ɪu], eliminating front rounded vowels that are alien to the phonology of English. Then //ɪu// itself falls by undergoing Monophthongization: //ɪu// → [ju:]. As noted by Wells (1982), by 1750 [ɪu] was regarded as an old-fashioned pronunciation in London. In the course of the 19th c. 
[ɪu] vanished completely from British English because it was turned into [ju:]. Across the Atlantic, in America, [ɪu] continued to be pronounced. The descriptions from a hundred years ago record [ɪu] in stressed syllables after all types of consonants (labials, coronals, velars and laryngeals): mute [mɪut], duke [dɪuk], rule [rɪul], lucid [lɪusɪd], cute [kɪut] and huge [hɪuʤ]. The current situation is different: [ɪu] is found after coronals in stressed syllables and it is in free variation with [u:], as in duke [dɪuk] ~ [du:k]. After non-coronals, //ɪu// has changed into [ju:], as in mute [mju:t], cute [kju:t] and huge [hju:ʤ]. Liquids constitute a special case because they allow only [u:], as in rule [ru:l] and lucid [lu:sɪd]. In the speech of my students [ɪu] is the dominant pronunciation after coronals. Its dominance is particularly evident in the tendency for historical [u:] words deriving from Old English [o:] to adopt the [ɪu] pronunciation. The bottom line is that due and do as well as chute and shoot are pronounced identically, but, counter to expectations, it is the [ɪu] output that dominates in creating homophony, so [dɪu] is the optimal surface representation for both due and do.
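The present-day distribution just described can be restated as a small lookup. The following is a minimal Python sketch, not part of the analysis itself: the function name `midwest_outcomes` and the two consonant sets are illustrative assumptions, and the coronal set is deliberately not exhaustive. Liquids are checked first because they are themselves coronal but pattern separately.

```python
# Hypothetical helper restating the distribution of historical //ɪu// in
# stressed syllables in the present-day variety described in the text.
# The consonant sets below are illustrative assumptions, not a full inventory.
LIQUIDS = {"r", "l"}
CORONALS = {"t", "d", "s", "z", "n"}


def midwest_outcomes(onset: str) -> list[str]:
    """Return the surface variants reported after the given onset consonant."""
    if onset in LIQUIDS:
        # Liquids allow only [u:], as in rule and lucid.
        return ["u:"]
    if onset in CORONALS:
        # After coronals, [ɪu] is in free variation with [u:], as in duke.
        return ["ɪu", "u:"]
    # After non-coronals, //ɪu// has changed into [ju:], as in mute, cute, huge.
    return ["ju:"]


if __name__ == "__main__":
    for word, onset in [("duke", "d"), ("mute", "m"), ("rule", "r"), ("huge", "h")]:
        print(word, midwest_outcomes(onset))
```

The sketch records only which variants are attested; it says nothing about their relative frequency or about the OT evaluation that selects them.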
Jerzy Rubach
Notes

1. I would like to thank my students at the University of Iowa, particularly the 2018 class in Historical Linguistics, for their help and inspiration.
2. Most of my students are from Iowa but some come from Illinois.
3. I use double slashes for underlying representations, single slashes for intermediate forms, and square brackets for surface representations.
4. In the 17th c. [ɪu] started occurring also in words from native English sources, such as dew and new; the former from ME deew (OE dēaw) and the latter from ME niew (OE niowe). See Lass (1999).
5. It is assumed that [ɪu] is a falling diphthong, which accords well with the other diphthongs of English (Luick 1940; Kenyon & Knott 1940; Dobson 1968).
6. Final schwa was deleted by Apocope.
7. Dobson (1968) and Jordan (1974) note that the pronunciation [y:] persisted into the 17th c., but was limited to educated speakers. These speakers showed variation between [y:] and [ɪu].
8. Kenyon (1924) and Kenyon and Knott (1940) do not transcribe Flapping. The final vowel is [ɪ]; see the explanation below.
9. Following the established tradition, Kenyon and Knott (1940) do not transcribe length because length is predictable from tenseness in American English, so, for instance, ‘long’ u is transcribed [u]. I will depart from this tradition and transcribe tense vowels not only as tense but also as long, which is consistent with the historical development.
10. In addition to the changes detailed below, there are two rules that my students use in an obligatory fashion and that are not attested in Kenyon (1924) or Kenyon and Knott (1940): Final Tensing and Prevocalic Tensing. Final Tensing tenses high vowels at the end of the word while Prevocalic Tensing tenses them before a vowel.

    Final Tensing
                  Kenyon and Knott    Present-day Midwest
    messy         mɛsɪ                mɛsi
    argue         ɑrgjʊ               ɑrgju

    Prevocalic Tensing
                  Kenyon and Knott    Present-day Midwest
    enthusiast    ɪnθɪuzɪæst          ɪnθɪuziæst
    associate     əsoʃɪeɪt            əsoʊʃieɪt
    usual         ju:ʒʊəl             ju:ʒuəl

11. The constraint need not be limited to stressed syllables as [ɪu] does not occur after liquids at all, so the generalization is surface-true.
12. Technically, this is a level 1 evaluation, but nothing happens at level 2. Here and below, I will assume that, if not stated otherwise, the evaluation refers to level 1.
References

Chomsky, N. & M. Halle. 1968. The sound pattern of English. New York: Harper and Row.
Dobson, E. J. 1968. English pronunciation 1500–1700. Oxford: Clarendon Press.
Hayes, B. 1989. Compensatory lengthening in moraic theory. Linguistic Inquiry 20. 253–306.
Hock, H. H. 1986. Compensatory lengthening: In defense of the concept ‘mora’. Folia Linguistica 20. 431–460.
Hyman, L. 1985. A theory of phonological length. Dordrecht: Foris Publications.
Jordan, R. 1974. Handbook of middle English grammar: Phonology. The Hague: Mouton.
Kenyon, J. S. 1924. American pronunciation. Ann Arbor: George Wehr Publishing Company.
Kenyon, J. S. & T. A. Knott. 1940. A pronouncing dictionary of American English. Springfield: Merriam.
Kiparsky, P. 1997. LP and OT: Handout. Ithaca, NY: Cornell Linguistic Institute.
Kiparsky, P. 2000. Opacity and cyclicity. The Linguistic Review 17. 351–365.
Lass, R. 1999. Phonology and morphology. In R. Lass (ed.), The Cambridge history of the English language, Vol. 3: 1476–1776, 56–186. Cambridge: Cambridge University Press.
Luick, K. 1940. Historische Grammatik der englischen Sprache. Leipzig: Bernhard Tauschnitz.
McCarthy, J. J. & A. Prince. 1995. Faithfulness and reduplicative identity. In J. N. Beckman, L. W. Dickey & S. Urbanczyk (eds.), University of Massachusetts occasional papers in linguistics, Vol. 18, 249–384. Amherst, MA: Graduate Linguistic Student Association Publications.
Minkova, D. 2014. A historical phonology of English. Edinburgh: Edinburgh University Press.
Pater, J. 1999. Austronesian nasal substitution and other NC effects. In R. Kager, H. van der Hulst & W. Zonneveld (eds.), Prosody: Morphology interface, 305–343. Cambridge: Cambridge University Press.
Prince, A. & P. Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell. [Revision of 1993 technical report, Rutgers University Center for Cognitive Sciences. Available on Rutgers Optimality Archive, ROA-537].
Rubach, J. 1984. Segmental rules of English and Cyclic Phonology. Language 60. 21–54.
Rubach, J. 1993. Lexical Phonology of Slovak. Oxford: Oxford University Press.
Rubach, J. 1997. Extrasyllabic consonants in Polish: Derivational Optimality Theory. In I. Roca (ed.), Derivations and constraints in phonology, 551–581. Oxford: Oxford University Press.
Rubach, J. 2000. Glide and glottal stop insertion in Slavic languages: A DOT analysis. Linguistic Inquiry 31. 271–317.
Rubach, J. 2003. Duke-of-York derivations in Polish. Linguistic Inquiry 34. 601–629.
Rubach, J. 2009. Zasady pisowni kurpiowskiego dialektu literackiego. Ostrołęka: Związek Kurpiów.
Rubach, J. 2011. Syllabic repairs in Macedonian. Lingua 121. 237–268.
Struijke, C. 2000. Existential faithfulness: A study of reduplicative TETU. College Park: University of Maryland dissertation.
Wells, J. C. 1982. Accents of English. Cambridge: Cambridge University Press.
3
Social Dialect The Halting of a Sound Change in Oslo Norwegian Revisited—A Report on the Imminent Victory of Retroflex /ɭ/ Ernst Håkon Jahr
Introduction

In a conference paper from 1986 (published as Jahr 1988), I analysed a particular sound change involving laterals in Oslo Norwegian (details below) and made the point that this expected and structurally long-overdue change in the Oslo lateral system halted before going to completion because of social factors linked to the simpler system that would have resulted. Such a simple lateral system had long been a feature of the dialect to the southeast of Oslo, which was traditionally considered to be among those with the lowest social status in the country. Language users in the capital did not want to sound like people from that area, and this attitude, I argued, halted the sound change in question in Oslo.

Today, however, this change is nevertheless ongoing in the capital. One of the first to mention this was Uri (2004: 145), who wrote:

And then there is something going on with the l’s, at least in some places. We live in Oslo, but my children and many of their friends use l’s that make them sound like people from Østfold. Since Oslo speech has a high status, this l-sound may spread throughout the country. (My translation)

The phonological pressure to continue the change in the lateral system has turned out to be too strong after all. Within a relatively short period of time, between the late 1990s and now (2019), children and young adults (although not all of them) throughout Oslo have completed the change to a new simple lateral system, the final stage of the change being the eradication of the dental l allophone after [a(:)] and [o(:)]. To most people over 50, this feature is still considered alien to Oslo speech, as testified by, for example, frequent letters to newspapers from more mature members of the language community. To them, it is as if a dam has suddenly broken, allowing a very salient feature to surge in from the low-status Østfold dialect.
The Development of the Oslo Lateral System From c. 1880 Onwards

The lateral system in Oslo Norwegian, involving what to the language users in the capital would be subsumed under the label of an l sound, became rather complicated during the 20th century. Around 1880, however, Oslo speech had a simpler system, which consisted of a dental (or alveolar) [l] and a more rarely used retroflex /ɭ/ (a voiced retroflex lateral approximant). The latter was an assimilated pronunciation of the consonant cluster [rl], which could, however, at that time also be pronounced as [rl] (a trill followed by [l]) in the Oslo West dialect, i.e., in more prestigious speech. The dental [l] was realized as a clear l [l] in all phonological contexts, except when following a back vowel (especially [a(:)] and [o(:)]), where this l allophone would be realized as a voiced velarized dental approximant [ɫ]. Norwegian linguists at that time did not describe the latter as a different “sound” from the l in most other contexts, since the POA (Place of Articulation; here, the placement of the tip of the tongue, the so-called passive articulator) was the same, either dental or alveolar. Neither the famous Norwegian phonetician Johan Storm (cf. Linn 2004) in his descriptions of Norwegian at the time (e.g., Storm 1892: 245–257) nor other contemporary descriptions (Brekke 1881; Western 1889) made any distinction whatsoever between the “l sound” found after front vowels and after back vowels. (And, of course, at that time, long before structuralism, they all wrote about “sounds”, not about phonemes and allophones.) This state of affairs in 1880 had changed by the second half of the 20th century, by which time we could observe three lateral allophones in use in Oslo speech (Jahr 1975, 1981).
The dental allophone was now restricted to two phonological contexts: following another dental consonant (as in handle ‘shop, v.’, fatle ‘sling, n.’); and after the back vowels [a(:)] and [o(:)] (as in alle ‘all’, holde ‘hold, v.’). The retroflex [ɭ] allophone had taken over in most phonological environments and was becoming the most frequently used allophone (in words such as like ‘like, v.’, le(i)ke ‘play, v.’, stilig ‘elegant, smart’). Early in the 20th century, the working-class dialect of the more eastern areas of the capital also had the dental allophone as a possibility after the back vowel [u(:)] (in words such as politi ‘police’, bolig ‘abode, dwelling’). After back vowels, as was the case earlier on, the dental allophone was velarized, the back of the tongue being lifted towards the velum. A velarized dental l [ɫ] is sometimes referred to as “dark l”. The difference between this “dark”, velarized [ɫ] and retroflex [ɭ] now became sociolinguistically important and salient.
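The late 20th century distribution described above can be restated as a simple decision procedure. The following minimal Python sketch is an illustration only: the function name `l_allophone_1980` and the set of dental consonants are assumptions introduced here, and the sketch ignores both the working-class option after [u(:)] and the stress and morpheme-boundary effects discussed later in the chapter.

```python
# Hypothetical restatement of the Oslo /l/ allophony c. 1980.
# DENTALS is an illustrative assumption, not a full consonant inventory.
DENTALS = {"t", "d", "n", "s"}
DARK_L_VOWELS = {"a", "o"}


def l_allophone_1980(prev_segment: str) -> str:
    """Return the /l/ allophone expected after the preceding segment.

    prev_segment is a single transcribed segment, optionally with a
    length mark (e.g. "a:"); use "" for word-initial /l/.
    """
    segment = prev_segment.rstrip(":")  # vowel length is irrelevant here
    if segment in DARK_L_VOWELS:
        return "ɫ"  # velarized dental after [a(:)] and [o(:)], as in alle
    if segment in DENTALS:
        return "l"  # plain dental after another dental, as in handle
    return "ɭ"  # retroflex elsewhere, as in like, stilig


if __name__ == "__main__":
    for word, prev in [("alle", "a"), ("handle", "d"), ("like", ""), ("stilig", "i:")]:
        print(word, l_allophone_1980(prev))
```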
The Complex System of the 1970s and 1980s

The rather complicated late 20th century system is shown on the right side of Figure 3.1.
Figure 3.1 The development of l allophones from a very simple system, c. 1880, to a rather complicated one 100 years later, c. 1980 (= “today” in the figure). The allophone [ɭ] is shown with a dot under the l in this figure.
Source: Jahr 1988: 334.
In Oslo speech (as also in many other southeast Norwegian dialects), we find that a retroflex flap [ɽ] may alternate with both [ɭ] and [ɫ] in some words. The retroflex flap is conceived of by these speakers as an l sound, and thus usually referred to as “thick l”. The variation in use (of either the retroflex flap or one of the lateral allophones) is dependent on well-known sociolinguistic categories, which, however, will not be discussed further here (cf. Jahr 1976, 1978, 1981). Instead, I will investigate a special morphophonological process found in the declension of masculine nouns ending in [r] or retroflex flap [ɽ] in the indefinite singular, e.g., kar [ka:r] ‘guy’, mur [mʉ:r] ‘wall’, dal [da:ɽ] ‘valley’, kål [ko:ɽ] ‘cabbage’, ål [o:ɽ] ‘eel’. This is relevant here, because, as I will demonstrate below, the change in laterals in Oslo speech from c. 1880 onwards also resulted in more masculine nouns being affected by the same morphophonological process (i.e., masculine nouns ending in [əɭ]).

In Norwegian, the definite singular morpheme is suffixed (-en for masculine nouns); thus we find the following forms in the written standard for the set of nouns above: karen ‘guy-the’, muren ‘wall-the’, dalen ‘valley-the’, kålen ‘cabbage-the’, ålen ‘eel-the’. In Oslo speech, however, especially in the working-class variety, the phonological realisation of these forms could be, respectively, [ka:ɳ], [mʉ:ɳ], [da:ɳ], [ko:ɳ], [o:ɳ]. During the 20th century, we find that the same morphophonological process also began to occur in masculine nouns ending in [əɭ]:

indef. sg.    def. sg.      def. sg. (Oslo speech)    gloss
sykkel        sykkelen      [sykæɳ]                   ‘bike’
bibel         bibelen       [bi:bæɳ]                  ‘bible’
artikkel      artikkelen    [aʈikæɳ]                  ‘article’
fakkel        fakkelen      [fakæɳ]                   ‘torch’

This morphophonological process,

[əɭ] + masc. sg. def.art. [ən] > [æɳ]

affects a rather large number of masculine nouns in the definite singular: engelen ‘the angel’, himmelen ‘the sky’, hybelen ‘the rented small room or apartment’, muskelen ‘the muscle’, onkelen ‘the uncle’ etc. Since this is the same morphophonological process as the one involving an [r] or a retroflex flap [ɽ] + masc. sg. def.art., it shows that the [ɭ] in [-əɭ] has its POA close to both [r] and [ɽ], and this then triggers the same morphophonological process, also yielding the ending [æɳ] in masc. def. sg. nouns ending in [əɭ] in the indef. sg. This process, however, did not affect masculine nouns in which the retroflex flap [ɽ] was not an option (in final position), such as sal ‘hall’, mal ‘template’, trål ‘trawl net’. After [a(:)] and [o(:)] the dental [ɫ] then, as expected, prevailed: salen [sa:ɫən], malen [ma:ɫən], trålen [tro:ɫən] (not *[sa:ɳ], *[ma:ɳ], *[tro:ɳ]).

In Amund B. Larsen’s pioneering description of the working-class dialect of Oslo (Larsen 1907), he did not mention this morphophonological process after [əɭ]; it was first mentioned by Vanvik (1975). The piece of evidence presented here from the morphology of masculine nouns ending in [əɭ] in the indef. sg. demonstrates that the /l/ phoneme in Oslo speech has, in most phonological contexts, migrated from a generalized dental/alveolar allophone to a retroflex allophone [ɭ], with a POA close enough to [r] and retroflex flap [ɽ] to trigger the same morphophonological process in masculine nouns ending in [əɭ] as [r] and [ɽ] did. There is, however, lots of individual variation as to how/where the allophone [ɭ] is produced, and thus linguists and phoneticians have argued about what label to use (e.g., supra-dental, post-alveolar, retroflex, also “apical”, as opposed to “laminal” in the case of the velarized dental allophone [ɫ]) (cf. Kristoffersen (2000: 24f) for discussion).
For reasons of simplicity, I have chosen to stick to the term “retroflex” [ɭ] here, well aware that this may disregard a variety of possible placements of the tip of the tongue, all of which, however, are retracted and thus not dental. The auditory perception of the allophone [ɭ] is nevertheless not affected by this. It is interesting to note that, for a long time, the existence of the highly salient “dark” dental [ɫ] after [a(:)] and [o(:)] escaped the notice of most of those who described the sound system of Oslo Norwegian. The development from the simple dental system of the late 19th century to the distinctive two-allophone lateral system of the later 20th century went almost unnoticed (cf. Borgstrøm 1938; Vogt 1939), and we find that most descriptions of Oslo Norwegian (which often is referred to as just “Norwegian”) claim that Oslo speech had just one dental [l]—cf. Marm & Sommerfelt (1943, revised 1967), Popperwell (1963), Haugen & Chapman (1964), Strandskogen (1979). No one had discovered that the system at that time exhibited both a retroflex [ɭ] and a velarized [ɫ] allophone. In Jahr (1981), I stated that Kostas Kazazis, working at the University of Chicago, was the first to mention—in a review in Language of R. G. Popperwell’s (1963) The Pronunciation of Norwegian—that it was clear
there was a velarized [ɫ] allophone in Oslo speech. Popperwell did not mention that such an allophone existed in this book. However, Kazazis had noticed the allophone when listening to the recorded speech samples that accompanied the book (Kazazis 1968). In Jahr (1975), I offered the first description of how the retroflex and velarized l allophones had emerged out of the simple dental [l] allophone of the late 19th century, and at that time (the 1970s) represented salient sociolinguistic features of Oslo speech, in contrast to the Østfold dialect southeast of the capital, which lacked the velarized [ɫ] allophone (cf. Jahr 1981, 1985). My 1975 paper received a lengthy article in reply by Papazian (1977), in which he gave a valuable description of the lateral system in his own Oslo speech, which turned out—as demonstrated in Jahr (1977)—to be identical to the intermediate lateral system of the 1930s described by Vogt (1939). This intermediate system fits nicely between the 1880 scenario and the system of the 1970s and 1980s, cf. Figure 3.1 above.
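The definite singular process described earlier in this section ([r] or [ɽ] + masc. sg. def.art. yielding [ɳ], and [əɭ] + [ən] > [æɳ]) can be sketched as a suffixation function over broad transcriptions. This is a minimal Python illustration under stated assumptions: the function name `definite_singular` is hypothetical, and the indefinite transcription [sykəɭ] is inferred from the statement that these nouns end in [əɭ], since the text gives only the definite forms for that group.

```python
def definite_singular(indef: str) -> str:
    """Return the working-class Oslo definite singular of a masculine noun.

    indef is a broad transcription of the indefinite singular without
    brackets, e.g. "ka:r". Hypothetical helper for illustration only.
    """
    if indef.endswith("əɭ"):
        # [əɭ] + [ən] > [æɳ], as in sykkelen [sykæɳ]
        return indef[:-2] + "æɳ"
    if indef.endswith(("r", "ɽ")):
        # final [r] or retroflex flap merges with the suffix into [ɳ]
        return indef[:-1] + "ɳ"
    # elsewhere the suffix surfaces unchanged, as in salen [sa:ɫən]
    return indef + "ən"


if __name__ == "__main__":
    for form in ["ka:r", "da:ɽ", "sykəɭ", "sa:ɫ"]:
        print(form, "->", definite_singular(form))
```

The last branch reflects the observation that nouns such as sal, mal and trål, where the flap is not an option, keep the velarized dental [ɫ] and the full suffix.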
Halting of the Expected Completion of the Sound Change

The ongoing sound change to a generalized [ɭ] allophone could, at the time and from a structural point of view, be expected to continue until the process had eradicated the velarized dental [ɫ] altogether, yielding a new and, once again, simple lateral system: a single retroflex [ɭ] allophone instead of the simple, single dental [l] allophone of the late 19th century. In Jahr (1988), I made the point that the sound change in the lateral system that had occurred in Oslo speech from c. 1880 to c. 1980 seemed to have been halted due to sociolinguistic factors. I claimed that the main sociolinguistic factor was, without doubt, the attitude of Oslo speech users towards the neighbouring, rather low-status, dialect of the county of Østfold, which long before had lost the dental velarized [ɫ], even after back vowels. This feature of the Østfold dialect stood out to Oslo speech users as the most salient feature of that low-status dialect (cf. Strømsodd 1979). To import that particular feature into Oslo speech seemed very unlikely, unless Oslo speech users could somehow accept sounding like dialect speakers from Østfold. Thus, I predicted that the Oslo dialect would keep the “dark” dental [ɫ] after [a(:)] and [o(:)].
Developments After 1988

Changes since my 1988 paper have proven my prediction wrong. Today, we find children and young adults all over Oslo using the retroflex [ɭ] allophone, so far without any apparent sociolinguistic variation according to city region, social group or gender. This has taken the older generation by surprise, and they often refer to this feature as a ‘mispronunciation’, ‘childish’, ‘Østfold dialect’ and the
like. This development, however, managed to pass under the radar of the general public for several years. This is probably due to another much more salient—to the general public—sound change which has taken place over the course of the past 30–40 years or so across the country: an apparent collapse of the distinction between the [ʃ] and [ç] sounds. Since this merger involves two distinct phonemes, and quite a number of minimal pairs are thus affected, it has received more attention and aroused more outrage among parents and the older generation than the merger of the two l allophones, which involves no minimal pairs. When the change from [ɫ] to [ɭ] started, it went unnoticed by many observers, including trained linguists. Arne Torp, then Associate Professor of Scandinavian Linguistics at the University of Oslo, gave a talk in 1999 to the Norwegian Academy of Science and Letters in Oslo about ongoing phonetic/phonological changes in Norwegian, and he did not mention the change from [ɫ] to [ɭ] in Oslo at all. Instead, he focussed solely on the rapid spread of uvular /ʁ/ in west Norway and the even more rapid merger of [ʃ] and [ç] all over the country (Torp 2002). The spread of uvular /ʁ/ does not impact Oslo, while the merger of [ʃ] and [ç] does. In 1999, when he delivered his talk, the beginning of the final stage of the long, ongoing change in Oslo speech from a generalized [l] to a generalized [ɭ] allophone had obviously started.
The Role of the Østfold Dialect, Attitudes to Dialect

Some people believe the loss of the [ɫ] allophone in Oslo is an influence from the Østfold dialect, and this “explanation” surfaces from time to time in the newspapers, and more often on social media. This is wrong, of course. The change in the lateral system in Oslo from around the turn of the millennium has nothing to do with the Østfold dialect, but is motivated by a drive to complete the sound change from a dental to a retroflex [ɭ] as the generalized and, in principle, sole allophone of the /l/ phoneme in Oslo speech.

An interesting observation concerning language attitudes is relevant here. As long as the use of retroflex [ɭ] after [a(:)] and [o(:)] was confined to the low-status Østfold dialect, the National Broadcasting System (NRK) did not allow their broadcasters to use this feature. When one of the NRK News policy-makers was asked, in a program on dialect usage, why the Østfold dialect could not be accepted on the air, he laughed out loud; it was completely inconceivable for him to allow it. Now, however, that it is more or less common in Oslo for children and young adults to use the retroflex [ɭ] in all contexts (even the 25-year-old MP Mathilde Tybring-Gjedde, who is from Oslo, is consistent both in using the retroflex [ɭ] in all positions and in merging [ʃ] and [ç]), NRK sees no problem in allowing this feature to be used by everybody on the radio and TV. In children’s programs on NRK TV3, this usage now seems to
dominate completely, as children recruited to these programs most often come from the Oslo area. The reason given by NRK for permitting the use of generalized retroflex [ɭ] is interesting indeed, since they still do not allow their staff to merge [ʃ] and [ç] in broadcasts: the retroflex [ɭ] is okay now, they say, because [ɭ] and [ɫ] are just varieties of the “l sound”, while [ʃ] and [ç] are used to differentiate between words. This reasoning is valid enough, of course, since /ʃ/ and /ç/ are different phonemes, but NRK has conveniently forgotten that it prohibited its staff from using retroflex [ɭ] after [a(:)] and [o(:)] when this was characteristic of only the low-status Østfold dialect. This is another nice example proving that it is not the linguistic feature in itself that is important, but who uses it. Features that are used and accepted in Oslo will be taken up more readily nationwide, and consequently used also on the radio and TV.
The Final Stage of the Sound Change

Attempts to explain why retroflex [ɭ] is now dominant in Oslo have erroneously ascribed it to the Oslo variety spoken by an increasing number of foreign immigrants. Oslo has a large proportion of immigrants, especially in certain areas of the city. We would certainly expect that young immigrants and children of immigrants would be linguistically rational and opt for a simplified lateral system, abandoning the dental velarized [ɫ]. And the social status of the Østfold dialect is of no importance whatsoever to these immigrants. However, this explanation is clearly incorrect. The only quantitative investigation (Svendsen 2012) undertaken so far of the retroflex [ɭ] after the back vowels [a(:)] and [o(:)], using data collected between 2004 and 2008, concludes that the use of the generalized retroflex [ɭ] is not at all found more frequently among immigrant school children than among ethnic Norwegian children. The level of education of the parents does not seem to matter either, nor does the area of Oslo the speakers come from, nor their social status or background. As a matter of fact, none of the traditional sociolinguistic factors seem to play a role (Svendsen 2012: 357).

It would also be interesting to try and map the geographical dissemination of this feature from the turn of the millennium. A piece of anecdotal evidence may provide an indication that it did indeed spread from one Oslo (sub)urban area to the next. My grandchild (born in 2003) lives in the Oppsal area of Oslo. In 2014, she began training at a swimming club in the neighbouring Lambertseter area, where her father observed that the generalized use of the retroflex [ɭ] was more frequent among children her age. At that time, it was still quite rare in the Oppsal area, but this usage has now increased in Oppsal.
Laterals in Oslo Speech: From a Simple System (c. 1880) to a New Simple System (Today)

The picture we have drawn here of the change in the pronunciation of l in Oslo Norwegian can be summarized briefly as follows: From an early (c. 1880) lateral system comprised of mainly one dental/alveolar /l/ in all positions (except for an occasional retroflex [ɭ] for the assimilated cluster [rl]), the retroflex [ɭ] allophone spread to more phonological contexts during the 20th century, coming to be used in almost all contexts except following an [a(:)] or [o(:)] in a stressed syllable. Jahr (1975, elaborated further in Jahr 1988) claims that this particular system prevailed, and the sound change would not be completed and yield a simpler system, because of the resistance of Oslo speakers to this low-status Østfold dialect feature. However, around the turn of the millennium, the structural pressure for a simpler lateral system won out, and most children throughout the city started using the retroflex [ɭ] in all positions. This was the culmination of the more than 100-year-long process of change towards a new simple system with, again, just one main l allophone.
Indications That the Social Dialect Hypothesis Is Correct

The fact that there was a period during the second half of the 20th century when this “natural” and structurally expected change seemed to have stopped leads us to speculate about the importance and strength of social dialects: how much did the Oslo speakers’ negative attitude towards the Østfold dialect contribute to halting this sound change? One answer to this question could be that the hypothesis about a social dialect explanation is simply wrong, and that the change just needed this length of time to reach its final stage. However, there are some specific empirical data which lend strong support to the claim that sociolinguistic factors were at work here.

One such piece of evidence is the fact that during the second half of the 20th century, the l allophone after [a(:)] and [o(:)] could also be realized as a retroflex in some words, i.e., if the word was multisyllabic and the syllable containing the combination “al” or “ol” was unstressed: allé ‘alley’, ballet ‘ballet’, hallo ‘hello’, salong ‘drawing-room’, salat ‘lettuce, salad’, alarm ‘alarm’, kollasj ‘collage’. All of these words could and can still be pronounced in Oslo either with the stress on the first syllable (typical of the working-class variety) or with the stress on the second syllable, which was more typical of upper-middle class speech earlier on, but today is probably the most frequently used pronunciation. The interesting observation here, however, is that if the second syllable is stressed, the “al” or “ol” sequence in the first (unstressed) syllable can have a retroflex [ɭ] without this pronunciation being linked in any way with
lower-status Østfold speech. (If the stress falls on the first syllable, however, the use of a retroflex [ɭ] would be associated with the “Østfold l”.) A few other words which take stress on the first syllable can be pronounced with the retroflex [ɭ] without being associated with the Østfold dialect: e.g., smålig ‘petty’, blålig ‘bluish’, grålig ‘grayish’. In these words, there is a morpheme boundary between the [o:] and the [ɭ] (små+lig). We can thus posit a general rule that allows the retroflex [ɭ] to be used in words with a morpheme boundary between an [a:] or [o:] and [ɭ]. For the word salig ‘blessed’, however, we would not find the retroflex [ɭ] being used in Oslo speech during the 20th century; such a pronunciation would immediately be associated with the Østfold dialect. Instead, the word would be pronounced with the velarized dental [ɫ], because the morpheme boundary here is not between the [a:] and the [ɫ], but after the [ɫ] (sal+ig).

Another indication that the negative attitude towards the Østfold dialect seems to have been influential in the capital is that children or adults who had learnt Oslo speech as their primary spoken language but had grown up far from Oslo itself, e.g., far north in Svalbard, almost without exception use the generalized retroflex [ɭ], even after [a(:)] and [o(:)]. Since this pronunciation is what we would expect from a systematic point of view, and it occurs in Oslo speech users far from the capital, the rather long cessation in the process of generalizing the use of the retroflex [ɭ] after [a(:)] and [o(:)] has to be given a sociolinguistic explanation. We may conclude from the examples discussed here that, until the turn of the millennium, the retroflex [ɭ] in Oslo was used in every phonological context and word possible, except where it could be associated with the Østfold dialect.
However, the sociolinguistic restriction caused by the negative attitude towards the Østfold dialect seems to have disappeared after the turn of the millennium. The dam had broken.
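The conditions under which a retroflex [ɭ] after a vowel escaped the “Østfold l” association before the turn of the millennium can be summarized as a predicate. The sketch below is a hedged Python illustration: the class `LContext`, its field names and the function `retroflex_allowed` are hypothetical, and the sketch covers only post-vocalic l, leaving aside other contexts such as l after a dental consonant.

```python
from dataclasses import dataclass


@dataclass
class LContext:
    """Hypothetical description of one post-vocalic l in a word."""
    prev_vowel: str          # vowel before the l, e.g. "a", "o:", "i:"
    stressed: bool           # is the syllable containing vowel + l stressed?
    boundary_before_l: bool  # morpheme boundary between the vowel and the l?


def retroflex_allowed(ctx: LContext) -> bool:
    """True if retroflex [ɭ] was usable in 20th-century Oslo speech
    without being heard as an Østfold feature."""
    if ctx.prev_vowel.rstrip(":") not in {"a", "o"}:
        return True  # retroflex was already general after other vowels
    # After [a(:)] and [o(:)]: only in an unstressed syllable or
    # across a morpheme boundary (små+lig).
    return (not ctx.stressed) or ctx.boundary_before_l


if __name__ == "__main__":
    examples = {
        "salong, stress on 2nd syllable": LContext("a", False, False),
        "salong, stress on 1st syllable": LContext("a", True, False),
        "smålig (små+lig)": LContext("o:", True, True),
        "salig": LContext("a:", True, False),
    }
    for label, ctx in examples.items():
        print(label, retroflex_allowed(ctx))
```

After the turn of the millennium the restriction lapsed, so for younger speakers the predicate would simply return True in every context.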
An Ahistorical Linguistic Feature in Movies

When Norwegian film makers shoot movies today that are set in Oslo during the 20th century, they often use children and young actors who do not sound like people did in the 20th century. The l sound after [a(:)] and [o(:)] immediately gives them away as people from the third millennium. In the 2016 film by Erik Poppe, Kongens nei [The King’s No], about the Norwegian King Haakon VII, who refused to surrender to Nazi Germany after it attacked Norway on April 9, 1940 and subsequently occupied the country, the child who played one of the king’s granddaughters said to the king (49:43 minutes into the film): “Det var du som kastet snøballen!” (‘It was you who threw the snowball’). In the word snøball, she pronounced the l not with the “dark” [ɫ] but with the retroflex [ɭ]: [snø:baɭ], or, as many of the older generation in Oslo still would say, with an “Østfold l”.
Obviously, neither of the two princesses would have used the retroflex [ɭ] in that position in 1940.1
Note

1. I thank friends and colleagues, in particular Jacques Koreman and Peter Trudgill, for constructive comments on earlier drafts of this chapter. I also thank an anonymous reviewer for useful remarks.
References
Borgstrøm, C. H. 1938. Zur Phonologie der norwegischen Schriftsprache (nach der ost-norwegischen Aussprache). Norsk Tidsskrift for Sprogvidenskap 9. 250–273.
Brekke, K. 1881. Bidrag til dansk-norskens lydlære [Contribution to the description of the Danish-Norwegian sound system]. Separataftryk af Aars og Voss’s skoles indbydelsesskrift for 1881. Kristiania [Oslo]: W. C. Fabritius.
Haugen, E. & K. G. Chapman. 1964. Spoken Norwegian (2nd revised edn.; 1st edn. 1944). New York: Holt Rinehart and Winston.
Jahr, E. H. 1975. l-fonemet i Oslo bymål [The l phoneme in Oslo speech]. Norskrift 1 (Talemålsundersøkelsen i Oslo (TAUS). Skrift nr. 1). Arbeidsskrift for nordisk språk og litteratur, 3–15. Oslo: University of Oslo, Department of Nordic Languages and Literature.
Jahr, E. H. 1976. Litt om bruk av tjukk l i Oslo bymål [Some words about the use of retroflex flap in Oslo speech]. In E. Ryen (ed.), Språk og kjønn, 141–146. Oslo: Novus.
Jahr, E. H. 1977. Svar og kommentar til Eric Papazians artikkel ‘Om ‘tjukk l’ og andre rare lyder’ [Reply to and comments on Eric Papazian’s article: On retroflex flap (‘thick l’) and other strange sounds]. Norskrift 14. Arbeidsskrift for nordisk språk og litteratur, 57–67. Oslo: University of Oslo, Department of Nordic Languages and Literature.
Jahr, E. H. 1978. The sound ‘retroflex flap’ in Oslo. In W. U. Dressler & W. Meid (eds.), Proceedings of the twelfth international congress of linguists, Vienna, August 28–September 2, 1977, 785–788. Innsbruck: Institut für Sprachwissenschaft der Universität Innsbruck.
Jahr, E. H. 1981. L-fonema i Oslo bymål [The l phonemes in Oslo speech]. In E. H. Jahr & O. Lorentz (eds.), Fonologi/Phonology (= Studies in Norwegian Linguistics 1), 328–344. Oslo: Novus. [Reprinted 2008 in G. Wiggen, T. Bull & M. A. Nielsen (eds.), Språkhistorie og språkkontakt/Language History and Language Contact, 80–96. Oslo: Novus].
Jahr, E. H. 1985. Another explanation for the development of s before l in Norwegian. In J. Fisiak (ed.), Papers from the 6th international conference on historical linguistics, Poznań, Aug. 22–26 1983 (= Amsterdam Studies in the Theory and History of Linguistic Science, Series IV, 34), 291–300. Amsterdam & Poznań: J. Benjamins & Adam Mickiewicz University Press.
Jahr, E. H. 1988. Social dialect influence in language change: The halting of a sound change in Oslo Norwegian. In J. Fisiak (ed.), Historical dialectology: Regional and social (= Trends in Linguistics, Studies and Monographs 37), 329–335. Berlin & New York: Mouton de Gruyter.
Kazazis, K. 1968. Review of Popperwell, R. G.: The pronunciation of Norwegian (1963). Language 44. 632–633.
Kristoffersen, G. 2000. The phonology of Norwegian. Oxford: Oxford University Press.
Larsen, A. B. 1907. Kristiania bymål. Vulgærsproget med henblik på den utvungne dagligtale [Oslo speech: The working-class dialect in its natural daily use]. Utgit av Bymålslaget. Kristiania [Oslo]: I kommission hos Cammermeyers boghandel.
Linn, A. 2004. Johan Storm: dhi gretest pràktikal liNgwist in dhi werld (= Publications of the Philological Society 38). Oxford & Boston: Wiley-Blackwell.
Marm, I. & A. Sommerfelt. [1943] 1967. Teach yourself Norwegian: A book of self-instruction in the Norwegian Riksmål (3rd edn. 1992). New York: David McKay. [1943 edn. published in London: English Universities Press Ltd.].
Papazian, E. 1977. Om ‘tjukk l’ og andre rare lyder [About the retroflex flap (‘thick l’) and other strange sounds]. Norskrift 14. Arbeidsskrift for nordisk språk og litteratur, 1–56. Oslo: University of Oslo, Department of Nordic Languages and Literature.
Popperwell, R. G. 1963. The pronunciation of Norwegian. Cambridge & Oslo: Cambridge University Press.
Storm, J. 1892. Englische Philologie. Anleitung zum wissenschaftlichen Studium der englischen Sprache. Vom Verfasser für das deutsche Publikum bearbeitet. Zweite, vollständig umgearbeitete und sehr vermehrte Auflage. I: Die lebende Sprache. 1. Abteilung: Phonetik und Aussprache. Leipzig: Reisland.
Strandskogen, Å. B. 1979. Norsk fonetikk for utlendinger [Norwegian phonetics for foreigners]. Oslo: Gyldendal.
Strømsodd, S. A. 1979. Dialektholdninger blant folk i to bydeler i Oslo [Dialect attitude among people from two areas in Oslo]. Unpublished Cand. Philol. thesis. Oslo: University of Oslo, Department of Nordic Languages and Literature.
Svendsen, B. A. 2012. Et språklig bakholdsangrep? ‘Østfold L-ens’ inntog i Oslo [A linguistic ambush? The entry of the ‘Østfold l’ into Oslo]. In U. Røyneland & H.-O. Enger (eds.), Fra holtijaR til holting: Språkhistoriske og språksosiologiske artikler til Arne Torp på 70-årsdagen, 349–365. Oslo: Novus.
Torp, A. 2002. Skarre-r og ‘skjøttkaker’ – barnespråk, talefeil eller språkforandring? [Uvular /r/ and ‘meat balls’ (with initial /ʃ/ instead of /ç/): Child language, speech error or language change?]. In Det Norske Videnskaps-Akademi Årbok 1999, 334–356. Oslo: Novus.
Uri, H. 2004. Hva er språk? [What is language?]. Oslo: Universitetsforlaget.
Vanvik, A. 1975. En detalj i Oslodialekten [A detail in the Oslo dialect]. Maal og Minne 1975. 65–66.
Vogt, H. 1939. Some remarks on Norwegian phonemics. Norsk Tidsskrift for Sprogvidenskap 11. 136–144. [Reprinted 1981 in E. H. Jahr & O. Lorentz (eds.), Fonologi/Phonology (Studier i norsk språkvitenskap/Studies in Norwegian Linguistics 1), 187–195. Oslo: Novus].
Western, A. 1889. Kurze Darstellung des norwegischen Lautsystems. Phonetische Studien 2 (Zeitschrift für wissenschaftliche und praktische Phonetik mit besonderer Rücksicht auf die phonetische Reform des Sprachunterrichts). Marburg in Hessen, 259–282.
4 The Palatal ~ Non-Palatal Distinction in Irish and Russian
Raymond Hickey
1 Introduction
In languages which have a series of palatal consonants, these normally stem from an original phonetic assimilation whereby a front vowel following a consonant caused the point of articulation to be shifted to the area of the palate immediately behind the alveolar ridge. At the initial developmental stage there is convergence of articulation with (i) a shift from velar to palatal, as in [k] to [c], and (ii) a retraction of dental/alveolar to palatal, as in [t] to [tʲ] (Padgett 2003a, 2003b; Ní Chiosáin and Padgett 2012). The convergence points from both directions do not necessarily meet; indeed, retraction of dentals/alveolars is usually accompanied by assibilation, so that [tʲ] can frequently become [tʃ]. This affricate can arise through velar fronting but only as a second step after the rise of [c]. Assibilation is not a necessary correlate of palatalisation, cf. Russian, which shows it (historically), and Irish, which does not.
In a language with a restricted inventory of palatal sounds, the palatal segments are frequently lateral or nasal. For instance, in (Tuscan) Italian both /lʲ/ and /nʲ/ (as geminates) exist (Lepschy and Lepschy 1977: 87–88); in French /nʲ/ is found while /lʲ/ existed also before its vocalisation to /j/ (Rothe 1978: 78, 140–141); in Spanish /nʲ/ exists (and /lʲ/ for those few dialects which do not have yeísmo, MacPherson 1975: 76–77). As sonorants have a clearly discernible formant structure during their articulation, palatalisation can be recognised acoustically, i.e., an /i/-quality can be heard during the articulation of the sonorant. This /i/-quality is perceptible in the relatively high second formant (around 2,000 Hz), which stands in contrast to a /u/-quality that can be found with velarisation as secondary articulation in which the second formant lies in the region of 800 to 1,000 Hz.
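The acoustic cue just described can be illustrated with a short Python sketch. It is only a toy illustration, not a measurement procedure: the function name and the two cut-off values are assumptions chosen to bracket the figures given above (around 2,000 Hz for an /i/-quality, 800 to 1,000 Hz for a /u/-quality), and real formant values vary with speaker and context.

```python
# Assumed cut-offs (hypothetical): the text only gives "around 2,000 Hz"
# for an /i/-quality and "800 to 1,000 Hz" for a /u/-quality.
PALATAL_F2_MIN_HZ = 1800.0
VELARISED_F2_MAX_HZ = 1100.0


def secondary_articulation(f2_hz: float) -> str:
    """Guess the secondary articulation of a sonorant from its second formant."""
    if f2_hz >= PALATAL_F2_MIN_HZ:
        return "palatalised"      # high F2, /i/-like quality
    if f2_hz <= VELARISED_F2_MAX_HZ:
        return "velarised"        # low F2, /u/-like quality
    return "indeterminate"        # between the two assumed regions


for f2 in (2000.0, 900.0, 1500.0):
    print(f2, secondary_articulation(f2))
```

Values falling between the two assumed regions are deliberately left unclassified, since the text makes no claim about them.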
After sonorants, the consonants most likely to be affected by palatalisation are /k, g/ and /t, d/, as they can have a shift in their primary articulation towards the hard palate. With /s/ and /z/ palatalisation may be accompanied by a grooved tongue configuration with possible lip-rounding, yielding /ʃ/ and /ʒ/ respectively. This has been the situation in Irish which has, as its palatal counterpart to /s/, the phonetic realisation [ʃ], a sound to be interpreted phonologically as /sʲ/. Because obstruents do not have the clear formant structure of sonorants, the cue for their point of articulation comes from the bending of the second formant on release of the stop or on the transition from fricative to a following vowel (Fry 1979: 138). The palatalisation of labials, where it occurs, usually arises by analogy with dental and velar palatals (McKenna 2001), as when labials palatalise before front vowels, though contrast can later arise between palatals and following back vowels, for instance if front vowels are retracted or if a suffix with a back vowel is attached to a stem with a final palatal consonant.
In both Russian and Irish the palatal ~ non-palatal contrast plays a significant role in morphology, both inflectional and derivational, and in the phonological structure of lexemes (Hickey 1985; Unbegaun 1969: 29). Again in both languages palatalisation has a common origin as coarticulatory assimilation to a following front vowel (Kuryłowicz 1971; Lunt 1956; Greene 1973). Phonetically motivated palatalisation is no longer obligatory so that palatals can occur without any following front vowel triggering them, cf. Irish¹ cleas /kʲlʲas/ ‘trick.NOM’, chlis /xʲlʲɪsʲ/ ‘trick.GEN’, corp /kʌrp/ ‘body.NOM’, choirp /xɪrʲpʲ/ ‘body.GEN’. Each of these word pairs has palatalisation as the marker of the genitive just as Russian has palatalisation as a morphonological alternation between, for example, the singular and plural with certain nouns (Lomtev 1972: 66–67) and within items of a verbal paradigm as in stlat’ /stlatʲ/ ‘to spread’, stelju /sʲtʲiˈlʲu/ ‘I spread’, gnat’ /gnatʲ/ ‘to drive’, gonju /gaˈnʲu/² ‘I drive’. When palatalisation applies to the entire or nearly the entire consonant inventory, its phonetic realisation will vary depending on the affected segment.
For the grammar of the language in question [palatal] is a cover feature. Phonological segments come to form pairs which are distinguished by phonetic features. For palatalisation this feature is point of articulation for dentals and velars; for labials it can be tenseness and spreading of the lips, in both Irish and Russian.
2 Palatalisation in Russian
Russian is the Slavic language with a system of palatalisation closest to Irish, both on a lexical and on a phonological level. In Russian the functionalisation of palatalisation has been much discussed in the literature and the present analysis takes cognisance of the studies in this field (for a discussion of palatalisation in Irish and Polish see Cyran and Szymanek 2010). The history of the Slavic languages is characterised by a series of palatalisations which shaped the sound system of each of them (Cubberley 2002: 18–21; Sussex and Cubberley 2006: 137–152). There are two major palatalisations which phonetically involved the fronting and assibilation of velars to palatal sibilants in a number of stages (Comrie 1990a: 58).
1) a. First palatalisation before original front vowels
      g, k, x  >  ʤ, ʧ, ʃ
   b. Second palatalisation before front vowels arising from monophthongisations
      g, k, x  >  dzʲ, ʦʲ, sʲ (> s/ʃ)
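The two changes in (1) can be restated as simple lookup tables. The Python sketch below is only an illustration of the correspondences listed there: the function and variable names are hypothetical, the conditioning environments (the following front vowels) are not modelled, and the later development of sʲ to s/ʃ is left out.

```python
# Outcomes copied from (1); names are illustrative, not standard terminology.
FIRST_PALATALISATION = {"g": "ʤ", "k": "ʧ", "x": "ʃ"}
SECOND_PALATALISATION = {"g": "dzʲ", "k": "ʦʲ", "x": "sʲ"}

STAGES = {"first": FIRST_PALATALISATION, "second": SECOND_PALATALISATION}


def palatalise_velar(consonant: str, stage: str) -> str:
    """Return the outcome of a velar under the given palatalisation.

    Consonants other than g, k, x are returned unchanged, because (1)
    only concerns the velars.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return STAGES[stage].get(consonant, consonant)


for velar in ("g", "k", "x"):
    print(velar, palatalise_velar(velar, "first"), palatalise_velar(velar, "second"))
```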
In addition to this, Common Slavonic developed palatal sonorants from original sequences of sonorant plus /j/, i.e., /nj, lj, rj/ > /nʲ, lʲ, rʲ/. There is a third palatalisation in Slavonic which in its effect is the same as the second but its environment is different: it occurs after front vowels. In addition to full vowels, Slavonic had a special development of inherited *i and *u which gave rise to the reduced vowels known as yers. The later development of the yers depended on their position. If strong, e.g., as the nucleus of a stressed syllable, they resulted in a later full vowel, Russian o: son, Polish e: sen ‘dream’, from a strong back yer (Entwistle and Morison 1964 [1949]: 171; Cubberley 2002: 87). Both a weak and a strong yer can be seen in Russian den’ (from dünü) ‘day’ where the strong yer in the first syllable resulted in /e/ and the weak yer at the end disappeared, after palatalising the preceding /n/ to /nʲ/. If weak, e.g., word-final, yers were lost. The front yer has a reflex in the palatalisation of the consonant which preceded it, Russian brat’ ‘take’. In this respect the weak front yers are like the high front vowels of inflectional endings in Irish before apocope set in during the pre-Old Irish period. As back or front vowels could occur in many morphemes, palatalisation came to be a feature of morphonological alternations in Russian. Palatalisation was not the only feature in such cases, as the inherited inflections of Indo-European with front vowels were usually retained. Russian, like Irish, has a virtually complete and complementary series of palatal and non-palatal consonants (Avanesov 1972: 100). Labial, dental and velar points of articulation can be recognised, all of which are of relevance to the systemic palatal ~ non-palatal distinction.
3 Palatalisation in Irish
Already by the time of the earliest written records for Old Irish (from the seventh century onwards) a series of palatal and non-palatal consonants existed, a situation which has continued down to the present in the various dialects, albeit with varying phonetic realisations (Hickey 2011, 2014). In Modern Irish this distinction applies to all consonants, with the exception of /h/ (see Section 4.2 below), and is an essential element of both the grammar and lexical structure of the language. Phonetically, palatal consonants are produced by raising the middle of the tongue towards the palate. This provides the constriction which is the acoustic cue for such segments. Non-palatal consonants are generally velarised with the middle of the tongue lowered and the back raised towards the velum. Acoustically, this gives a hollow sound to non-palatal segments which indicates that they are the opposite of palatal sounds with the constriction just described. As in Russian, this ‘hollow’ quality is very noticeable with non-palatal versions of the sonorants l and n, e.g. lá /lˠɑː/ [lˠɑː] ‘day’, ná /nˠɑː/ [nˠɑː] ‘nor’.
4 The Analysis of Palatalisation in Russian and Irish
For the present analysis of palatalisation in Irish and Russian four points of articulation can be recognised in the oral cavity.
2) 1) labial    2) dental    3) palatal    4) velar
These can be organised as either of two groups. The first rests on the classification of the sounds as phonetically palatal or non-palatal.
3) Phonetically palatal:       1) palatal
   Phonetically non-palatal:   1) labial    2) dental    3) velar
The second grouping is determined by whether the tip or the blade of the tongue is the active articulator for a given sound. All sounds for which this is the case are classified as central; those where it is not the case are non-central or peripheral:
4) Central:                    1) dental    2) palatal
   Non-central (peripheral):   1) labial    2) velar
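The two cross-cutting groupings in (3) and (4) amount to two binary properties of the four points of articulation in (2). A minimal Python sketch makes this explicit; the function names are hypothetical labels for the groupings, not terms used in the analysis.

```python
# The four points of articulation from (2).
PLACES = ("labial", "dental", "palatal", "velar")


def _check(place: str) -> None:
    """Reject labels that are not among the four places in (2)."""
    if place not in PLACES:
        raise ValueError(f"unknown place of articulation: {place!r}")


def is_phonetically_palatal(place: str) -> bool:
    """Grouping (3): only the palatal place is phonetically palatal."""
    _check(place)
    return place == "palatal"


def is_central(place: str) -> bool:
    """Grouping (4): the tip or blade of the tongue is the active articulator."""
    _check(place)
    return place in ("dental", "palatal")


for place in PLACES:
    print(place, is_phonetically_palatal(place), is_central(place))
```

Seen this way, dentals and palatals share the central property while differing in palatality, and labials and velars form the peripheral class.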
4.1 The Position of /j/ in Irish and Russian
In any discussion of palatalisation the position of [j] in the sound system is of particular relevance as it is itself a palatal; with regard to [j] Russian and Irish differ. In Irish it is always the realisation of the phoneme /ɣʲ/ as in: giobach /gʲʌbəx/ ‘untidy’, an-ghiobach /anˠ ɣʲʌbəx/ [anʲjʌbəx] ‘very untidy’. A peculiarity of [j] is not just that it is a dependent phoneme, i.e., that it only occurs as the result of lenition in Irish, but that it can be the result of leniting both /gʲ/ and /dʲ/ as in: díol /dʲiəlˠ/ ‘sale’, a dhíol /ə ɣʲiəlˠ/ [ə jiəlˠ] ‘his sale’. There is no independent segment /j/ in Irish. In Russian /j/ exists phonologically, cf. the initial sound in jazyk /jiˈzɨk/ ‘language’ and jug /jug/ [juk] ‘(the) South’. In addition, /j/ can occur immediately after a palatal consonant as in: p’janyj /pʲjanij/ ‘drunk, intoxicated’, confirming its status as an independent segment in Russian (DeArmond 1975).
4.2 The Position of /h/ in Irish and Russian
In Irish the glottal fricative /h/ exists both as an independent phoneme in some English loans, like hata ‘hat’, and as the output of leniting /s/ or /t/, e.g., a sheoladh [ə hoːlˠə] ‘his address’, a theach [ə hæːx] ‘his house’. The voiceless palatal fricative [ç] is a possible realisation of /h/ when it represents the lenited form of /s/ and when found before a low or back vowel, e.g., a Sheáin [ə çɑːnʲ] ‘John.VOC’. When /h/ is an independent phoneme, i.e., not the result of lenition, it always has the realisation [h]. [ç] may also occur as the palatal counterpart to /x/, i.e., as the phonetic realisation of /xʲ/. In this case it belongs to the set of palatal/non-palatal consonant pairs. In Russian³ the situation is simpler. It has no /h/; in those loanwords which contain /h/ it has been replaced by /x/, e.g., xoll ‘hall’, and occasionally by /g/, e.g., gavan’ ‘harbour’ < Dutch or Middle Low German Haven.
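The lenition facts mentioned in Sections 4.1 and 4.2 show two mergers: /gʲ/ and /dʲ/ both yield /ɣʲ/ (realised as [j]), and /s/ and /t/ both yield /h/. The Python sketch below records only those four correspondences; it is not a full account of Irish lenition, and the names used are hypothetical.

```python
from typing import List, Optional

# Only the lenition outcomes mentioned in Sections 4.1 and 4.2.
LENITION = {"gʲ": "ɣʲ", "dʲ": "ɣʲ", "s": "h", "t": "h"}


def lenite(segment: str) -> Optional[str]:
    """Return the lenited form, or None if the text does not discuss the segment."""
    return LENITION.get(segment)


def sources_of(lenited: str) -> List[str]:
    """List the unlenited segments that merge in the given lenited segment."""
    return sorted(base for base, out in LENITION.items() if out == lenited)


print(sources_of("ɣʲ"))  # the two sources of /ɣʲ/, i.e. of phonetic [j]
print(sources_of("h"))   # the two sources of lenition-derived /h/
```

Because each lenited segment has two possible sources, the mapping cannot be inverted, which is one way of stating that [j] is a dependent rather than an independent segment in Irish.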
5 Segment Inventories
The following tables show the entire sets of palatal and non-palatal segments for Russian and Irish. In Russian⁴ there are a few asymmetries such as the long fricative (formerly pronounced [ʃʲtʲʃʲ] with an internal voiceless stop) and the affricate, /ʃʲʃʲ/ and /tʲʃʲ/ respectively, which do not have non-palatal counterparts; the fricatives /ʃ, ʒ/ are always non-palatal and it is normally assumed that the affricate /ts/ does not have a palatal counterpart.
5) a. Non-palatal/palatal consonant pairs in Russian
      Non-palatal   Palatal      Non-palatal   Palatal
      /p/           /pʲ/         /b/           /bʲ/
      /f/           /fʲ/         /v/           /vʲ/
      /m/           /mʲ/
      /t/           /tʲ/         /d/           /dʲ/
      /ts/
      /s/           /sʲ/         /z/           /zʲ/
      /ʃ/                        /ʒ/
      /l/           /lʲ/         /n/           /nʲ/
      /r/           /rʲ/
                    /ʃʲʃʲ/                     /tʲʃʲ/
      /k/           /kʲ/         /g/           /gʲ/
      /x/           /xʲ/
   b. Non-palatal/palatal consonant pairs in Irish
      Non-palatal   Palatal      Non-palatal   Palatal
      /p/           /pʲ/         /b/           /bʲ/
      /f/           /fʲ/         /v/           /vʲ/
      /m/           /mʲ/
      /t/           /tʲ/         /d/           /dʲ/
      /s/           /sʲ/
      /l/           /lʲ/         /n/           /nʲ/
      /r/           /rʲ/
      /k/           /kʲ/         /g/           /gʲ/
      /x/           /xʲ/         /ɣ/           /ɣʲ/
      /h/
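The asymmetries in the two inventories in (5) can be checked mechanically. The following Python sketch encodes both inventories as sets and looks for segments without a partner; the set and function names are hypothetical, and the pairing criterion (a palatal segment is its non-palatal partner plus the diacritic ʲ) is an assumption about the transcription, not a phonological claim.

```python
PAL = "ʲ"  # palatalisation diacritic used in the transcriptions

RUSSIAN_PLAIN = {"p", "b", "f", "v", "m", "t", "d", "ts", "s", "z",
                 "ʃ", "ʒ", "l", "n", "r", "k", "g", "x"}
RUSSIAN_PALATAL = {"pʲ", "bʲ", "fʲ", "vʲ", "mʲ", "tʲ", "dʲ", "sʲ", "zʲ",
                   "lʲ", "nʲ", "rʲ", "kʲ", "gʲ", "xʲ", "ʃʲʃʲ", "tʲʃʲ"}

IRISH_PLAIN = {"p", "b", "f", "v", "m", "t", "d", "s", "l", "n", "r",
               "k", "g", "x", "ɣ", "h"}
IRISH_PALATAL = {c + PAL for c in IRISH_PLAIN - {"h"}}


def unpaired(plain, palatal):
    """Return (non-palatal segments without a palatal partner,
    palatal segments without a non-palatal partner)."""
    lone_plain = {c for c in plain if c + PAL not in palatal}
    lone_palatal = {c for c in palatal if c[:-len(PAL)] not in plain}
    return lone_plain, lone_palatal


print("Russian:", unpaired(RUSSIAN_PLAIN, RUSSIAN_PALATAL))
print("Irish:", unpaired(IRISH_PLAIN, IRISH_PALATAL))
```

On these inputs the unpaired Russian segments are /ts, ʃ, ʒ/ on the non-palatal side and /ʃʲʃʲ, tʲʃʲ/ on the palatal side, while Irish has only unpaired /h/, in line with the description in the text.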
5.1 The Palatalisation of Labials
As mentioned at the outset above, the consonant inventories of Russian and Irish can be split into a labial and non-labial group. The former has a palatal ~ non-palatal contrast. It is known from Irish and the developments in various dialects (including Scottish Gaelic) that labials were palatalised by analogy with the already existing palatals which have the tongue as active articulator (Jackson 1967: 180). Furthermore, there does not seem to be any language which has a palatal ~ non-palatal contrast only for labials (Bhat 1978: 68–70). The secondary nature of palatalised labials is due to the fact that the tongue obviously cannot be involved in their articulation (at least directly) and so they can only arise if there is pressure from the language system for some feature of secondary articulation to be adopted as the realisation of systemic palatalisation. In both Irish and Russian the phonetic correlates of palatalisation of labials are similar: a high front vowel position for the tongue during the articulation of the labial and a glide from this tongue position on the release of the labial (Jones and Ward 1969: 93). In both languages palatalised labials are characterised by tensing and spreading of the lips and a scarcely perceptible glide on their release.
6) /pʲ/ = [+tense, +spread lips]
   Russian pjat’ /pʲatʲ/ ‘five’
   Irish peann /pʲɑːnˠ/ ‘pen’
In Russian the timing of the release of the stop and the glide from the high front vowel position has been exploited to give a contrast between a palatal consonant (Cʲ) and a palatal consonant followed by a /j/-glide (Cʲ+/j/). The labial consonant in question may be either a stop or a fricative as in pes /pʲos/ ‘dog’, p’eš’ /pʲjoʃ/ ‘you.SG drink’ or vedro /ˈvʲedra/ ‘bucket’, v’eš’ /vʲjoʃ/ ‘you.SG plait’. Vowel quality provides the acoustic cue for palatalisation with labials in medial and final position. This fact has led to a consideration of whether consonants or vowels auditorily carry the greatest significance in Russian (Hamilton 1980). Again in both Irish (de Bhaldraithe 1953: 43) and Russian (Jones and Ward 1969: 93) a brief on-glide [ʲ] to a palatal is heard with low and back vowels.
7) a. Irish cóimheas /koːvʲəs/ > [koːʲvʲəs] ‘comparison’
   b. Irish cóip /koːpʲ/ > [koːʲpʲ] ‘copy’
   c. Russian opera /ˈopʲira/ > [ˈoʲpʲɪrə] ‘opera’
   d. Russian top’ /topʲ/ > [toʲpʲ] ‘marsh, bog’
The examples just given concern palatalised labials after stressed vowels. Due to abundant stress variation in Russian, as opposed to (Western) Irish, which has fixed initial stress in all but a handful of words, there are many instances of medial palatalised labials where the stressed vowel follows. In these cases there is no on-glide; instead, a degree of tenseness and spreading of the lips renders their phonological identification unambiguous: obida /aˈbʲida/ ‘insult’.
5.2 The Palatalisation of Velars
In Russian palatalised velars occur frequently and can in some instances also stand in front of low or back vowels. They are, however, far more restricted in their occurrence than in Irish. Two sources of palatalised velars before low and back vowels can be identified. The first is that of loanwords; here palatalised velars before a back vowel have been used, for instance, as the rendering of French /ky/ as in: kjuvet /kʲuvʲet/ ‘ditch’ or of a fronted velar before a low vowel onset for a diphthong as in gjaur /gʲaur/ ‘non-Muslim’ (from Turkish). A small class of verbs with monosyllabic bases forms the second source, for instance tkat’ ‘to weave’ and its derivatives (Zaliznjak 1980: 669) which, on suffixation of the second person singular morpheme /joʃ/, do not palatalise: tkeš’ /tkʲoʃ/ ‘you.SG weave’ (Daum and Schenk 1971: 634). All other verbs with stem-final /k/ show palatalisation of the stem-final consonant, e.g., tečeš’ /tʲiˈtʲʃʲjoʃ/ ‘you.SG flow’ (cf. teku /tʲiˈku/ ‘I flow’). The Irish palatalised velars are those which have existed in the language since the earliest attestations; they are true palatal stops (phonetically /kʲ/ = [c]; /gʲ/ = [ɟ]), which have not shown, and do not show, any signs of developing into affricates as has happened at various stages and places in the development of the Romance languages (see Price 1971: 49 on palatalisation in French). Nor did any changes, such as those which led to affricates in Russian, ever occur in the history of Irish.
5.3 The Interpretation of [i] and [ɨ]
A long-standing issue for Russian scholars divided the Leningrad (St Petersburg) and Moscow phonological schools (Halle 1968: 8–9; Cubberley 2002: 63). This is whether the vowels [i] and [ɨ] (Leed 1963; Plapp 1996) should be treated as two phonological units or two realisations of the same unit, with the Moscow school favouring the latter analysis yielding a five-vowel, not a six-vowel system. Their argument is based on the complementary distribution of the vowels [i] and [ɨ], the latter only occurring after non-palatal consonants and the former only after palatal consonants.
8) Vowel system for Russian
   i (ɨ)        u
   e            o
         a
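The complementary distribution on which the five-vowel analysis rests can be written as a one-line rule. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, a consonant is treated as palatal if its transcription ends in the diacritic ʲ (or is /j/), and word-initial position is assumed to take [i].

```python
from typing import Optional

PAL = "ʲ"  # palatalisation diacritic used in the transcriptions


def realise_high_unrounded(preceding: Optional[str]) -> str:
    """Choose [i] or [ɨ] for a single phoneme /i/.

    preceding is the transcription of the preceding consonant, or None
    for word-initial position (assumed here to take [i]).
    """
    if preceding is None or preceding == "j" or preceding.endswith(PAL):
        return "i"   # after a palatal consonant (or initially)
    return "ɨ"       # after a non-palatal consonant


for consonant in ("nʲ", "n", "ʃ", None):
    print(consonant, realise_high_unrounded(consonant))
```

Since the choice of vowel is fully predictable from the preceding consonant under these assumptions, nothing is lost by positing one phoneme, which is the core of the Moscow argument.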
The sound written in Irish corresponds to the [ɨ] of Russian written ы. It shows precisely the same distribution as its Russian counterpart: it only occurs after a non-palatal consonant and so also shows a complementary distribution with the high front vowel. The only difference is that the Irish vowels are both phonologically long, i.e., [ɨː] (naoi [nˠɨː] ‘nine’) and [iː] (ní [nʲiː] ‘(small) thing, something’) with the latter contrasting with a short [i] as in bith ‘any, ever’ ≠ bí ‘be.IMPERATIVE’; [ɨ] in Irish is always long. The vowels [iː] and [ɨː] are not treated as separate phonological units in Irish. The parallel with Russian is that the [ɨː] of Irish is, bar its length, pronounced exactly as the vowel indicated in Russian by ы. The discussion of the status of the high central or front vowels appears to be more motivated in Irish than in Russian. In the latter [ɨ] can be seen as a postvelar realisation of a general phoneme /i/. Consider the example s Igorem [ˈsɨgərʲɪmʲ] ‘with Igor’ (Jones and Ward 1969: 36) where initial [i] changes to [ɨ] after a velarised /s/, i.e., the non-palatal equivalent of /sʲ/. This is a case of assimilation, here realised as depalatalisation. The few cases of word-initial [ɨ] in (non-Russian) place names and its occurrence in the name of the letter ы (y in transliteration) are hardly enough to ascribe it phonological status. In Irish, however, there are several instances of initial [ɨː] in native words, e.g., aon [ɨːnˠ] ‘one’, aos [ɨːs] ‘people, group’, and it occurs on its own in aoi [ɨː] ‘guest’. Furthermore, it triggers non-palatal assimilation in a preceding consonant, e.g., in aon a ráite [ɪnˠɨːnˠ . . .] ‘on the point of saying’ (in = /ɪnʲ/).
5.4 Coronal Consonants
Coronals refer to all non-labials and non-velars in both Irish and Russian, and include dentals, alveolars and palatals. The sounds have in common that they involve the tip or blade of the tongue as the active articulator. In a particular language these sounds may group in a different manner on a phonological plane: thus dentals and alveolars do not contrast phonologically in Irish or Russian. However, non-palatalised dentals and palatalised dentals contrast phonologically in both languages. There is also acoustic evidence that the set of sounds which contrast with coronals, i.e., labials and velars, group together, but rarely with central consonants (Hickey 1984). Consider, for instance, the genitive ending -ogo /-əvə/ of Russian and occasional words like segodnja /sʲiˈvodnʲa/ ‘today’, which show a labial fricative from a much earlier velar stop (Cubberley 2002: 131). The inventories of coronals for Irish and Russian are as follows.
9) Coronal consonants in Irish
   Non-palatal:   /t/   /d/   /s/   /l/   /n/   /r/
   Palatal:       /tʲ/  /dʲ/  /sʲ/  /lʲ/  /nʲ/  /rʲ/
   Coronal consonants in Russian
   Non-palatal:   /t/   /d/   /s/   /z/   /ts/    /l/   /n/   /r/   /ʃ/     /ʒ/
   Palatal:       /tʲ/  /dʲ/  /sʲ/  /zʲ/  /tʲʃʲ/  /lʲ/  /nʲ/  /rʲ/  /ʃʲʃʲ/
Irish has no voiced alveolar or palatal fricatives. Phonologically, there are no affricates in Irish either, in contrast to Russian. [tʃ] does, however, occur due to assimilation: tiocfadh sé /tʲʌkəx sʲeː/ > [tʲʌkətʃeː] ‘he would come’. This can be seen as the result of two processes, one is assimilation of point of articulation, /x/ to /s/, and the other is fortition of /s/ to /t/ as there is an absolute rule in Irish phonotactics that no two fricatives can occur in succession (fricative dissimilation). [tʃ] remains a sandhi phenomenon in Irish and is not connected with non-palatal ~ palatal alternations. In Irish palatal /sʲ/ is realised as [ʃ] (the same sound as in English shoe), that is, Irish uses the distinction between the narrow-grooved [s] and the broad-grooved [ʃ] to carry the phonological distinction between non-palatals and palatals. In Russian, the two fricatives in rosa /raˈsa/ ‘dew’, osel /aˈsʲol/ ‘ass’ are both narrow-grooved, the second being pronounced with phonetic palatalisation. Equally, there is a distinction between the phonetic [ʃ] of Irish and the phonemic /ʃ/ and /ʒ/ of Russian as the latter are velarised (Jones and Ward 1969: 133): nozh /noʒ/ > [noʃ] ‘knife’, shit’ /ʃitʲ/ > [ʃɨtʲ] ‘to sew (imperfective)’. The [ʃ] from /sʲ/ in Irish may in its turn be palatalised to [ʃʲ] if it occurs contiguously with a further palatal consonant whose articulation is phonetically palatal: sneachta /sʲnʲaxtə/ > [ʃʲnʲæːxtə] ‘snow’. The two coronal stops of Irish, /t/ and /d/, are similar to those of Russian in articulation as both are dental. However, all voiceless stops in Irish are aspirated, much as in English. The coronal palatal stops, /tʲ/ and /dʲ/, are again similar in both languages. In Russian there is no neutralisation of the contrast between /tʲ/ and the voiceless palatal affricate /tʲʃʲ/, cf. tina /ˈtʲina/ ‘ooze, mud’, čina /ˈtʲʃʲina/ ‘rank.GEN’. In Irish the affrication of palatal stops varies: the Northern dialects have greater amounts of affrication.
5.5 Coronal Sonorants
With the sonorants /l/ and /n/ palatality is most easily perceptible as there is a recognisable formant structure during the articulation of the sonorant. Not only is palatality clear with /lʲ/ and /nʲ/, but the reactive velarisation is also obvious with /l/ and /n/ (= [lˠ] and [nˠ]), e.g., lán [lˠɑːnˠ] ‘full’, nó [nˠuː] ‘or’. But it is equally present with all other non-palatal consonants, as can be seen from the following narrow transcriptions: tuí [tˠɨː] ‘straw’, daor [dˠɨːr] ‘dear’. With obstruents there is no audible formant structure during their articulation and so this velarisation is perceived as formant bending on their release; the tongue configuration for [tˠ] and [nˠ], however, is exactly the same, the only difference being the nasal release and voice. In Russian a similar situation obtains. The laterals in el [jelˠ] ‘ate’ and dolg [dolˠk] ‘duty’ are clearly velarised (Jones and Ward 1969: 109–110) as are the nasals, cf. nužnyi [nˠuʒnˠɨj] ‘necessary’ where both nasals are velarised. In both Russian and Irish the primary articulation is still dental; velarisation is achieved by lowering the body of the tongue while simultaneously raising the back. In Irish, as in Russian, both /r/ and /rʲ/ exist but the realisation of them is different. Both of them are apical but in Irish /r/ is either a frictionless continuant, like the English [r], but velarised, when it occurs initially, medially (intervocalically) or word-finally after a vowel, e.g., rua [rˠuə] ‘red-haired’, árasán [ɑːrˠəsɑːnˠ] ‘flat, apartment’, cur [kʌrˠ] ‘putting’, or it is a flap when it comes after a stop and before a stressed vowel, e.g., brón [bɾoːnˠ] ‘sorrow’.
/r/ is not rolled in Western Irish (unless for emphasis) although in the North a rolled [r] occurs as the articulation of /r/ with some older speakers; this has generally been replaced by a non-trill realisation. Russian has a tap articulation occurring when /r/ is found intervocalically and not immediately preceding a stressed vowel, e.g., gorod [ˈgoɾət] ‘town’, xorošo [xəɾʌˈʃo] ‘good’. Otherwise a slight trill is the normal realisation of non-palatal /r/. The palatal /rʲ/ of Russian maintains its trill character with simultaneous raising of the tongue body towards the palate. This raising of the tongue is accompanied by closing of the jaws and slight lip spreading (Bolla 1981: 99–100, plate 77). In Irish palatal /rʲ/ is realised with considerable tongue body raising and is never a flap or trill. It is found at its clearest in intervocalic or pre-stress, post-stop position: amáireach [əˈmɑːrʲəx] ‘tomorrow’ (Western Irish pronunciation). The functional load of the /r/ ≠ /rʲ/ contrast in pre-vocalic initial position is slight in Irish. The few instances of this which exist tend to neutralise the distinction in favour of /r/: reangach [rængəx] ‘wiry, sinewy’, reatha [ræə] ‘run.GEN’. Furthermore, there is no evidence in Irish for the alternative of sequentialising the apico-alveolar and palatal articulations, so that the sequence /r/+/j/ does not occur; nor are there any cases of palatal /rʲ/ plus /j/ as in Russian r’janyj /rʲjanij/ ‘zealous’.
6 Conclusion
Both Russian and Irish have a palatal ~ non-palatal distinction for the majority of consonants which arose historically from the co-articulation of consonants with following high front vowels. In both languages the distinction is phonetically regular to a large extent, more so in Russian which additionally has a number of sibilants and affricates, whereas Irish only has voiceless sibilants. Again, in both languages the distinction plays a central role in both the inflectional and derivational morphology, testifying to the ability of the palatal ~ non-palatal distinction to indicate grammatical categories in languages.
Notes
1. Note that all transcriptions, unless otherwise specified, show Western Irish pronunciations, see de Bhaldraithe (1945) and Hickey (2011).
2. In the present study the phonemic interpretation of is not discussed for reasons of space. There are different opinions on the interpretation of when it represents a vowel, which is always unstressed, though when it varies between stressed and unstressed the reduced unstressed realisation is regarded as an allophone of /o/, cf. stol /stol/ [stol] ‘table’-NOM; stola /stoˈla/ [stʌˈɫa] ‘table’-GEN (see Hamilton 1980: 35–50 for details).
3. Throughout the present study Russian words have been transliterated. The system used is the so-called scholarly or scientific system which uses Latin letters with a hacek for affricates and sibilants (a practice similar to that found in Czech). This system is essentially different from that in which affricates and sibilants are indicated by a Latin letter followed by h, e.g., ch for č, sh for š, zh for ž, shch for šč, as well as kh for x [x] and ts for c [ts].
4. An anonymous reviewer suggested using a narrow IPA transcription for some sounds of Russian, e.g., IPA /ʂ, ʐ/ for /ʃ, ʒ/. While I take the point that these sibilants are phonetically non-palatal, for the present discussion it is sufficient to mark palatality for palatal sounds; where there is no diacritic on the consonant they are phonetically non-palatal (in both Russian and Irish). For a detailed discussion of sibilant fricatives in Russian, see Żygis (2003).
References
Avanesov, Ruben I. 1972. Russkoe literaturnoe proiznoshenie [Russian literary pronunciation], 5th edition. Moskva: Prosveshchenie.
Bhat, Darbhe N. S. 1978. A general study of palatalization. In Joseph H. Greenberg (ed.), Universals of human language, vol. 2, 47–92. Stanford: University Press.
Bolla, Kálmán. 1981. A conspectus of Russian speech sounds. Köln, Wien: Böhlau.
Comrie, Bernard. 1990b. Russian. In Bernard Comrie (ed.), The major languages of Eastern Europe, 63–81. London: Routledge.
Cubberley, Paul. 2002. Russian: A linguistic introduction. Cambridge: Cambridge University Press.
Cyran, Eugeniusz & Bogdan Szymanek. 2010. Phonological and morphological functions of palatalization in Irish and Polish. Celto-Slavica 3. 99–133.
Daum, Edmund & Werner Schenk. 1971. Die russischen Verben [Russian verbs], 7th edition. Leipzig: Bibliographisches Institut.
DeArmond, Richard C. 1975. On the phonemic status of [i] and [j] in Russian. Russian Linguistics 2. 23–35.
de Bhaldraithe, Tomás. 1945. The Irish of Cois Fhairrge, Co. Galway. Dublin: Dublin Institute for Advanced Studies.
de Bhaldraithe, Tomás. 1953. Gaeilge Chois Fhairrge. An deilbhíocht [The Irish of Cois Fhairrge: The morphology]. Dublin: Dublin Institute for Advanced Studies.
Entwistle, William J. & Walter A. Morison. 1964 [1949]. Russian and the Slavonic languages, 2nd edition. London: Faber and Faber.
Fry, Dennis B. 1979. The physics of speech. Cambridge: Cambridge University Press.
Greene, David. 1973. The growth of palatalization in Irish. Transactions of the Philological Society 72(1). 127–136.
Halle, Morris. 1968. Phonemics. In Thomas A. Sebeok (ed.), Current trends in linguistics, vol. 1: Soviet and East European linguistics, 5–21. The Hague: Mouton.
Hamilton, William S. 1980. Introduction to Russian phonology and word structure. Bloomington, IN: Slavica Publications.
Hickey, Raymond. 1984. On the nature of labial velar shift. Journal of Phonetics 12. 345–354.
Hickey, Raymond. 1985. Segmental phonology and word formation: Agency and abstraction in the history of Irish. In Jacek Fisiak (ed.), Historical semantics and word formation, 199–219. Berlin: Mouton de Gruyter.
Hickey, Raymond. 2011. The dialects of Irish: Study in a changing landscape. Berlin: de Gruyter Mouton.
Hickey, Raymond. 2014. The sound structure of Modern Irish. Berlin: de Gruyter Mouton.
Jackson, Kenneth H. 1967. Palatalization of labials in the Gaelic languages. In Wolfgang Meid (ed.), Festschrift für Julius Pokorny, 179–192. Innsbruck: Innsbruck University Press.
Jones, Daniel & Dennis Ward. 1969. The phonetics of Russian. Cambridge: Cambridge University Press.
Kuryłowicz, Jerzy. 1971. Morphonological palatalization in Old Irish. Travaux Linguistiques de Prague 4. 67–74.
Leed, Richard L. 1963. A note on the phonemic status of Russian high unrounded vowels. Slavic and East European Journal 7(1). 39–42.
Lepschy, Giulio & Anna L. Lepschy. 1977. The Italian language today. London: Hutchinson.
Lomtev, Timofej P. 1972. Fonologija sovremennogo russkogo jazyka [The phonology of the contemporary Russian language]. Moskva: Vysshaja Shkola.
Lunt, Horace. 1956. On the origin of phonemic palatalization in Slavic. In Morris Halle et al. (eds.), For Roman Jakobson, 306–315. The Hague: Mouton.
MacPherson, Ian R. 1975. Spanish phonology. Manchester: University Press.
McKenna, Malachy. 2001. Palatalization and labials in the Irish of Torr, Co. Donegal. In Brian Ó Catháin & Ruairí Ó hUiginn (eds.), Béalra. Aistí ar theangeolaíocht na Gaeilge [Speech: Essays on the linguistics of Irish], 146–160. Maynooth: An Sagart.
Ní Chiosáin, Máire & Jaye Padgett. 2012. An acoustic and perceptual study of Connemara Irish palatalization. Journal of the International Phonetic Association 42(2). 171–191.
Padgett, Jaye. 2003a. The emergence of contrastive palatalization in Russian. In E. Holt (ed.), Optimality theory and language change, 307–335. Dordrecht: Kluwer.
Padgett, Jaye. 2003b. Contrast and post-velar fronting in Russian. Natural Language and Linguistic Theory 21(1). 39–87.
Plapp, Rosemary K. 1996. Russian /i/ and /ɨ/ as underlying segments. Journal of Slavic Linguistics 4. 76–108.
Price, Glanville. 1971. The French language: Past and present. London: Edward Arnold.
Rothe, Wolfgang. 1978. Phonologie des Französischen [The phonology of French], 2nd edition. Berlin: Erich Schmidt.
Sussex, Roland & Paul Cubberley. 2006. The Slavic languages. Cambridge: Cambridge University Press.
Unbegaun, Boris O. 1969. Russische Grammatik [Russian grammar]. Göttingen: Vandenhoeck und Ruprecht.
Zaliznjak, Andrej A. 1980. Grammaticheskij slovarj russkogo jazyka [A grammatical dictionary of Russian]. Moscow: Izdateljstvo Russkij Jazyk.
Żygis, Marzena. 2003. Phonetic and phonological aspects of Slavic sibilant fricatives. ZAS Papers in Linguistics 3. 175–213.
5
Vennemann’s Head Law and Basque*
Miren Lourdes Oñederra Olaizola
1. Introductory Dedication
This chapter is offered to Kasia Dziubalska-Kołaczyk (who deserves much more) in celebration of her productive career and remembering our good old times in Vienna, in Hawaii, in Poznań, and Krems, and Gniezno: ever since Toledo so long ago. I am grateful for our always intense and sincere scientific (or not) discussions. My contribution is no more than a sketchy presentation of some ideas that are emerging from my work (Oñederra, In prep.) on the phonology of Basque within the theoretical framework of Natural Phonology (hereafter NP).
2. Basque Phonology
At first sight Basque phonology could seem relatively straightforward. In general, it is its highly agglutinative morphosyntax and the complexity of its morphology that are considered the most interesting and intriguing parts of the language. However, the explanatory nature of NP offers a stimulating program of research. NP is not exactly well suited to the description of a language, as we heard David Stampe once say in one of the classes of the course on phonology that I was lucky to share with Kasia in 2002: “Phonotactics is important for the description of a language, but NP is not about describing languages . . . NP is not good for describing languages (OT is like that too)”. I am sure Kasia will also have these or similar words in her course notes. As Maddieson (2009: 132) said: “Optimality Theory, as well as Natural Phonology, are all more focused on developing a model of how the phonological part of the overall language faculty might be shaped”.
2.1. The Phonemic Inventory in NP
Although a thorough account of the issue would fall beyond the scope of this paper, a few words should be said, for the sake of information about the theory, on the specific importance of the phonemic paradigm in NP. Unlike other theories that “do not account for the existence of [phonemic] distinctions” (Donegan 2002: 65), NP considers the phonemic inventory of a language to be the result of a specific set of process inhibitions. Such processes are in NP context-free constraints “on possible perceptions and therefore on possible intentions in speech production” (Donegan & Stampe 2009: 3). NP was the first to give a dynamic sense to Jakobson’s implicational laws of irreversible solidarity by means of context-free phonological processes. From that point of view, and given that context-free processes are phonetically motivated, the presence of a certain (class of) phoneme(s) in a given language is analysed as the result of a linguistically motivated counterphonetic choice that must have something to do with the inner structure of its phonological system both synchronically and diachronically.

Sibilants and palatals stand out as the two most marked subsets in the otherwise rather straightforward phoneme inventory of Basque, where we find a largely voiceless obstruent paradigm (voiced stops, the only voiced obstruents, are found exclusively in onset position), relatively many coronal consonants (10 out of 23), the unmarked system of five oral vowels (only some eastern varieties have a sixth vowel, /y/), etc. Among sibilants, however, a rich set of oppositions between apical and laminal alveolars is distinguished in both the fricative and the affricate paradigms, on top of the predorsal pair (fricative and affricate).1 Palatals or predorsals are the other quantitatively marked set: 6 out of 23, and they are phonemically distinguished in every manner of articulation. This relatively larger paradigm of sibilant and palatal phonemes constitutes an outstanding difference in relation to Castilian Spanish, with which Basque otherwise shares many phonological characteristics.

Table 5.1 Sibilant inventory of Basque

Sibilants    alveolar (apical)    alveolar (laminal)    predorsal
fricative    s̺                    s̻                     ʃ
affricate    ts̺                   ts̻                    tʃ

2.2. The Syllable (Basque a Very CV-Language)
As Basque is an agglutinative language in which words may vary in length from eight or more syllables to one syllable, rhythmic regularity is based on the syllable (Donegan 1993: 9–10). Not surprisingly, data unequivocally show the strong tendency of Basque towards optimal syllabic structure or “more natural syllabifications” (Donegan & Stampe 1978: 30). The preference for canonical CV syllables in Basque has been evident at least since the adaptation of Latin loanwords, where even muta cum liquida consonant clusters were avoided by either vowel epenthesis (1.a) or consonant deletion (1.b).

(1) a. Lat. cruce(m) > Basque gurutze ‘cross’
       Lat. libru(m) > Basque liburu ‘book’
    b. Lat. pluma > Basque luma ‘feather’
       Lat. placet > Basque laket (da) ‘pleases’

The derivations shown in (1.a) may easily be considered old witnesses of a tendency to avoid consonant clusters in onset position by the phonemic status given to the (inserted) svarabhakti vowel. That vowel must have been perceivable (i.e., phonemic) to Basque speakers according to prosodic (specifically syllabic) conditions of their first language. A mutatis mutandis parallel case can be seen in Donegan (1993: 6) where, from the NP perspective, epenthesis is also interpreted as the result of prosodic regularities in Munda languages. Although accent is a good candidate for the explanation of those facts, this chapter will focus on issues related to syllable structure.

Other phenomena affecting syllable structure in Basque may also be interpreted as a signal of what Dressler et al. (2010: 54) would consider phonotactic simplicity: restrictions on the types of consonants that appear as word-internal codas (fricative sibilants and sonorant consonants but no stops), together with the strong tendency to open syllables (for a more detailed account of these issues, see Hurch 1988; Jauregi 2008).

These schematic notes on the syllable are intended to be a first approach to a basic idea underlying my work (Oñederra, In prep.), namely that syllabic structure may be the motivation of several segmental processes applying in onset position. Although NP holds that explanation and phonetic motivation are “atomistically” to be sought in each phonological process (i.e., each constraint), it also envisages a global vision of the phonology of a language beyond the list of active and inhibited processes:

“It would be (trivially) possible to count inhibitions, but we think that a more promising program is to explore the influence of prosody on the kinds of inhibitions that are favored.” (Donegan & Stampe 2009: 6, footnote 4)
3. Perceptually Helpful Lenitions?
But let us focus now on the specific subject of this chapter, inspiration for which came from the attentive reading of page 142 of Donegan and Stampe’s (1979) seminal article “The Study of Natural Phonology”. On that page a three-way typology of phonological processes according to their function is given: prosodic processes, fortition processes, lenition processes. The paragraph that accounts for lenition processes is practically a mirror image of the paragraph where fortition processes are explained (Donegan & Stampe 1979: 142–143). There is only one line where the symmetry is broken. While fortitions are said to “invariably have a perceptual teleology, but often incidentally make the segments they affect more pronounceable as well as more perceptible”, lenitions “have an exclusively articulatory teleology, making segments and sequences of segments easier to pronounce”. In other words, lenitions are not at all (even incidentally) linked to the optimization of perception.

The analysis of a still productive process of Basque led me to the conclusion that I was missing either a slot in the fortition vs. lenition pair or a better explanation of the facts. By means of that process stops are devoiced when following a voiceless obstruent (de facto a fricative sibilant). There is no doubt that we are talking about an assimilation, therefore a lenition. But, given that a voiceless obstruent substitutes for a voiced one, it paradoxically seems that perception of the sound will also be optimized.

(2) Maite joan da [ɟoan̪d̪a] ‘Maite (proper name) left’
    Maite ez da joan [est̪aɟoan] ‘Maite did not leave’

As one example among many, the alternation in (2) shows that the third person singular form of the auxiliary verb ‘to be’, da, is pronounced with a voiced dental stop [d̪a], while the stop is voiceless when following the voiceless sibilant of the negative particle ez [es].2 As said before, this obvious assimilation to the voice quality of the preceding consonant has the effect of devoicing an obstruent and, thereby, optimizing its perception. It seems, so to speak, an at least partially fortitive lenition.3 Ignoring now the more than probable oversimplification of that expression, the fundamental question is why the perceptual effect should be so. The first consideration that immediately came to my mind was that perhaps the differences between a language like Basque and languages like English might have something to do with the paradox. For reasons only superficially addressed above, the place to look for fundamental type-constituent characteristics in NP is in prosodic differences.
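The alternation in (2) can be restated as a small rule over lists of segments. The following Python sketch is only an illustration and not part of the analysis itself: the function name `devoice_after_sibilant`, the segment sets, and the simplified transcription (dental diacritics omitted, only b, d, g modelled) are assumptions made for the example.

```python
# Illustrative sketch of the devoicing assimilation discussed in Section 3.
# Assumptions (not from the source): segments are plain strings, dental
# diacritics are omitted, and only the stops b, d, g are modelled.
VOICELESS_SIBILANTS = {"s", "ʃ"}           # de facto triggers of the process
DEVOICED = {"b": "p", "d": "t", "g": "k"}  # voiced stop -> voiceless stop


def devoice_after_sibilant(segments):
    """Return a copy of `segments` with stops devoiced after a voiceless sibilant."""
    output = []
    previous = None
    for segment in segments:
        if previous in VOICELESS_SIBILANTS and segment in DEVOICED:
            output.append(DEVOICED[segment])
        else:
            output.append(segment)
        previous = segment  # the trigger is the underlying preceding segment
    return output


# /es + da/: the stop follows the sibilant of the negative particle.
print("".join(devoice_after_sibilant(["e", "s", "d", "a"])))
# /joan + da/: the stop follows a nasal, so it stays voiced.
print("".join(devoice_after_sibilant(["j", "o", "a", "n", "d", "a"])))
```

Under these assumptions the first call yields esta and the second leaves joanda unchanged. The sketch is purely segmental: it says nothing about the syllable boundary between trigger and target.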
4. Prosody as the Main Factor
In the very enlightening and didactically useful process classification mentioned in the preceding section, Donegan and Stampe define prosodic processes before fortitions and lenitions. That precedence is not merely formal. In NP prosodic processes are “the most important factor in the living phonological pattern of a language and its long-range phonological ‘drift’” (Donegan & Stampe 1979: 142). This substantial precedence of prosody is present in all the works of Donegan and Stampe more or less explicitly, as when they recapitulate the 1978 foundations of their theory on the syllable: “Beats and syllables are not only the domains of timing but also they and their natural parts (. . .) are the domains of phonological processes” (Donegan & Stampe 2004: 17–18). The theory is further justified in the illustrative fourth section of Donegan and Stampe (2009), where, under the meaningful title “Phonological Processes Apply to Features Within Prosodic Domains”, proposals for “an asegmental phonology” of feature interactions are explained.

The most obvious prosodic motivation for the process we are discussing can be related precisely to an asegmental fact not mentioned in the exclusively segmental description given above in Section 3. Devoicing assimilation happens in a heterosyllabic domain, i.e., there is a syllable boundary between the devoicing environment and the devoiced stop (/es+d̪a/ [est̪a]). It is well known that onset position is the position of maximal consonantal strength: “The onsets of prosodic sequences are intrinsically stronger in articulation (hence more perceptually prominent) than the off-sets” (Donegan & Stampe 1978: 30). This point of maximal consonantal strength is therefore the perfect moment for the intrinsic prominence of a voiceless stop to be preferred over a voiced one.4 The onset character of that part of the syllable would therefore behave as the driving force to the point of overriding the immediate assimilatory origin of devoicing.

We may analyze this tendency observed in Basque in terms of Theo Vennemann’s preference laws for the syllable. The assimilation under discussion could specifically be interpreted as an instantiation of the Calibration Law (a) of Head strengthening, by which A.B > A.C, where C is stronger than B (Vennemann 1988: 50).
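The condition “C is stronger than B” can be made concrete with an ordinal strength scale. The Python sketch below is an illustration under stated assumptions: the class labels and numeric ranks in `STRENGTH` are hypothetical values chosen only to respect a general ordering from vowels up to voiceless stops, and `is_head_strengthening` is a name introduced here, not Vennemann’s own formalization.

```python
# Hypothetical ordinal ranks for Consonantal Strength (higher = stronger).
# The labels and exact values are assumptions for illustration only.
STRENGTH = {
    "vowel": 0,
    "glide": 1,
    "liquid": 2,
    "nasal": 3,
    "voiced fricative": 4,
    "voiceless fricative": 5,
    "voiced stop": 6,
    "voiceless stop": 7,
}


def is_head_strengthening(onset_before, onset_after):
    """True if a change A.B > A.C replaces onset B with a stronger onset C."""
    return STRENGTH[onset_after] > STRENGTH[onset_before]


# Basque /es+da/ -> [esta]: the onset changes from a voiced to a voiceless stop.
print(is_head_strengthening("voiced stop", "voiceless stop"))
# The reverse change would weaken the onset.
print(is_head_strengthening("voiceless stop", "voiced stop"))
```

With these ranks the first comparison is true and the second false. Only the relative order matters for the comparison, so any scale that ranks voiceless stops above voiced stops gives the same result for the Basque case.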
The process can also be seen as the realization of part (b) of the Head Law:

“A syllable head is more preferred: (a) the closer the number of speech sounds in the head is to one, (b) the greater the Consonantal Strength value of its onset, and (c) the more sharply the Consonantal Strength drops from the onset toward the Consonantal Strength of the following syllable nucleus.” (Vennemann 1988: 13–14)

In the remaining lines of this section other phenomena of Basque phonology that equally result in syllable onset strengthening will be briefly introduced.5

4.1. Affrication of Sibilants as Onset Reinforcement
Onset reinforcement may also give us the rationale to account for the affrication of sibilants following sonorant consonants that is seen in the adaptation of loanwords like tentsio [tents̺io] (Sp. tensión) ‘tension’, pultso [pults̺o] (Sp. pulso) ‘pulse’, unibertsitate [uniβerts̺itate] (Sp. universidad) ‘university’. There are also language-internal alternations like:

(3) a. utzi zion [utsision] ‘(somebody) lent something (to somebody)’
       eman zion [emantsion] ‘(somebody) gave something (to somebody)’
    b. etorri zen [etorisen] ‘(somebody) came’
       etorri al zen [etorialtsen] ‘did (somebody) come?’

It could be added here that the post-sonorant affrication of sibilants shown in (3), highly productive in many Basque dialects, nicely illustrates the possibility for fortition processes to also favour articulation (Donegan & Stampe 1979: 142), because the presence of the non-continuant element in the sonorant-fricative transition makes the articulation of the consonant sequence easier.6 But the fortitive character of this process, also known in other languages under names like emergent stop, excrescent stop, etc. (for American English and Italian dialects, see Busà 2007; Shosted 2011, among others), is beyond any question, independently of whether we conceptualize it as an epenthesis or as the affrication of a fricative (Jauregi & Oñederra 2010). On top of adding discontinuity to the obstruction, affricates are also better syllable heads than fricatives due to the continual drop of consonantal strength from the beginning toward, and including, the nucleus (Vennemann 1988: 18). Moreover, affricates in onset position make syllable contact better too, as Dressler and Siptár (1989: 34) remind us in their analysis of Hungarian: syllable contact is better if the syllable rise is more complex than the immediately preceding syllable fall.

4.2. Intervocalic Epenthesis
We may also add to the list of onset-strengthening phenomena in Basque the process of intervocalic consonant epenthesis, which inserts a consonant before a vowel when the preceding syllable ends in a (phonemically) high vowel, as in:

(4) gorri [gori] ‘red’, [goriʃa] ‘the red one’
    mendi [men̪d̪i] ‘mountain’, [men̪d̪iʃa] ‘the mountain’
This time it is not a mere strengthening of the onset consonant, but the actual creation of a consonantal onset by epenthesis, so that a potential V.V sequence turns out to be a V.CV one. This alternation is highly frequent in the dialects where it is productive, since it applies, for instance, whenever the determiner suffix /-a/ is added to roots ending in /i/. Intramorphemic examples like biotsa [biʃotsa] ‘heart’ or biar [biʃar] ‘tomorrow’ show the general (though stylistically variable) productivity of the process. Different palatal consonants result from this epenthesis in different dialects ([ʃ], [ʒ], [ʝ], etc.), and Hualde (2003: 48) reports that a palatal glide /j/ is also inserted after the high labial vowel /u/ in Low Navarrese. In that environment labial [β̞] insertion has been productive in other varieties of the language until very recently.

When the high vowel /i/ is realized as a glide (by desyllabification after a vowel), instead of the epenthesis, an apparent change of the glide into a full consonant is found: /mai+a/ ‘the table’ → maja → ma.ʝa, with crucial resyllabification. But this is surely only a superficial impression, as the final result may well be the product of a glide deletion (j → ø/___ ʝ), fed by the very same epenthesis process that we have been talking about.7 Forms from [ʒ]-inserting dialects, where both glide and inserted consonant are pronounced, would point in that direction: /mai+a/ [maj.ʒa].

4.3. Another Case of Onset Reinforcement
The following substitution is also probably the result of the coalescence of several processes: two sibilant fricatives are realized as a single affricate, as in the sentence “bazen baina ez zen” (‘it was but it was not’), which contains the alternation:

(5) bazen [basen] ← /ba/ (affirmative prefix) + /zen/ ‘was’
    ez zen [etsen] ← /ez/ ‘no’ + /zen/ ‘was’
The affricate substituting for the heterosyllabic fricative sequence becomes the onset of the second syllable. The first syllable loses its coda and becomes an open syllable (VC.CV > V.CV). Whatever the exact analysis of this substitution, the optimization of the syllabic structuring is obvious both in the loss of one coda and in the onset strengthening of the subsequent syllable.

4.4. Contact Epenthesis
Another instance of onset strengthening is found in the already lexicalized contact epenthesis (Vennemann 1988: 53) that turned Spanish al revés ‘upside down’, enredar ‘to mess around’, Enrique ‘Henry’ into Basque aldrebes, endreatu, Endrike. The same phenomenon can be observed in the diachronic evolution of Spanish words like *tenerá > *tenrá > tendrá ‘(she/he/it) will have.’

Another candidate to be included in this series of onset optimization phenomena of Basque phonology would be the consonant prothesis that can still be occasionally heard in improvised oral verse composition (e.g., [ɟesan] as the pronunciation of the verb esan ‘to say’ at the beginning of a line). The final result very much resembles the diachronic desyllabifications starting from prevocalic */e/ that have reached in central dialects the obstruent stage of present joan [xoan] ‘to go’ (Michelena [1977] 1985: 515 et passim). I would also propose that the process that changes initial glides (you [juː]) into full palatal consonants in Basque speakers’ pronunciation of English as an L2 ([ɟu]) falls somewhere near all these onset consonantal strengthening moves.

Whether or not the progressive direction of the palatalizing assimilation (bina [biɲa] ← /bi+na/ ‘two+distr. suff’) should also be considered in this light still needs further thought and work. On the other hand, the fact that palatalization may also apply to coda consonants somewhat weakens the case for including it in the potential collection of onset strengthening processes. Nor is the widespread preference of Basque for falling diphthongs (almost exclusive in the traditional forms of central and western dialects) explainable in terms of the onset strengthening preference. To propose that this could be a consequence of initial accent (Donegan & Stampe 1978: 31) is only a conjectural way to bypass the issue. At any rate, how accent interferes with syllabic regularities will have to be addressed in order to reach a better understanding of the prosodic domains of segmental processes in Basque (Oñederra 2015).
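The post-sonorant affrication of Section 4.1 and the intervocalic epenthesis of Section 4.2 can likewise be restated as simple rules over segment lists. The Python sketch below is a rough illustration only: the function names, the segment sets, the use of a single sibilant written "s", and the choice of "ʃ" as the default epenthetic consonant are all assumptions made for the example.

```python
# Illustrative sketch of two onset-strengthening processes from Section 4.
# Assumptions (not from the source): plain-string segments, one sibilant
# written "s", and "ʃ" as the epenthetic consonant (it varies by dialect).
SONORANT_CONSONANTS = {"m", "n", "l", "r"}
VOWELS = {"a", "e", "i", "o", "u"}


def affricate_after_sonorant(segments):
    """Turn a sibilant fricative into an affricate after a sonorant consonant."""
    output = []
    previous = None
    for segment in segments:
        if segment == "s" and previous in SONORANT_CONSONANTS:
            output.append("ts")
        else:
            output.append(segment)
        previous = segment
    return output


def epenthesize_after_i(segments, consonant="ʃ"):
    """Insert `consonant` between /i/ and a following vowel (V.V becomes V.CV)."""
    output = []
    for index, segment in enumerate(segments):
        output.append(segment)
        following = segments[index + 1] if index + 1 < len(segments) else None
        if segment == "i" and following in VOWELS:
            output.append(consonant)
    return output


# eman zion: the sibilant follows /n/ and surfaces as an affricate.
print("".join(affricate_after_sonorant(["e", "m", "a", "n", "s", "i", "o", "n"])))
# etorri zen: the sibilant follows a vowel and stays a fricative.
print("".join(affricate_after_sonorant(["e", "t", "o", "r", "i", "s", "e", "n"])))
# gorri + -a: the determiner suffix creates a V.V sequence repaired by epenthesis.
print("".join(epenthesize_after_i(["g", "o", "r", "i", "a"])))
```

Under these assumptions the three calls give emantsion, etorisen, and goriʃa. The sketch does not model the glide cases such as /mai+a/, the stylistic variability of epenthesis, or the coalescence of two fricatives described in Section 4.3.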
5. Some Concluding Remarks
I want to argue that onset strengthening is a prosodic preference shaping the syllabic patterning of Basque, and that it is probably one of the main factors governing the choice of segmental processes in the living phonology of the language. In other words, processes like obstruent devoicing and the affrication of sibilants could be interpreted as the segmental consequence of the rhythmic organization of Basque, an agglutinative syllable-timed language in which syllable optimization clearly outranks the weight of accent (cf. Donegan & Stampe 1978; Donegan 1993).

Let me finish this chapter with the double question that I took down during a course given in Madrid by Bernhard Hurch (2013):

“Are there languages for which the syllable plays the central role of organizing segments into higher units and organizing words and accentual groups (measures, feet) into smaller units? And are there languages for which the syllable obviously does not play this central role of timing?”

Basque is, in my opinion, a good example of a positive answer to the first question. The processes and substitutions which we have been talking about could be seen, using Hurch’s words again, precisely as “processes (re-) establishing higher syllabic principles.” I am sure that I would very much enjoy a discussion with Kasia about how all this would come out from her proposal of a phonology without the syllable (Dziubalska-Kołaczyk 1995).
Notes
* A preliminary version of this chapter was presented as a lecture at the Centre for the Research on Basque IKER (CNRS, UMR-5478) in Bayonne, France, in May 2017. I would like to thank the audience for their inspiring questions. I also sincerely thank the anonymous editors and reviewers for their very helpful comments and suggestions.
1. I am following Hualde’s (2003: 15) terminology. These oppositions are maximally kept only in some dialects, but they are favoured by the norms for the standard pronunciation of formal registers.
2. The process applies without exception in monomorphemic forms.
3. In equivalent Spanish sequences, exclusively weakening lenitions take place: assimilatory voicing of the first consonant and approximantization of the second one, as in desde [dezðe] ‘since’, musgo [muzɣo] ‘moss’.
4. See Donegan (1995: 63) on the optimal feature combination obstruent-voiceless.
5. It should be taken into account that current Spanish does not share these phonological phenomena, in spite of its otherwise relatively similar phonology. The reasons for that escape me at the moment. Some have also been productive in older stages of Spanish (see 4.4).
6. Productivity of this and the other processes mentioned in the chapter is noticeably decreasing at present due inter alia to Spanish or French bilingualism and to the influence of standard orthography.
7. Lexical remnants like gaua [gaβ̞a] ‘the night’ show that similar phenomena may have happened in the environment of a labial high vowel.
References
Busà, M. G. 2007. Coarticulatory nasalization and phonological developments: Data from Italian and English nasal-fricative sequences. In M. J. Solé, P. Speeter Beddor & M. Ohala (eds.), Experimental approaches to phonology, 55–191. Oxford: Oxford University Press.
Donegan, P. 1993. Rhythm and vocalic drift in Munda and Mon-Khmer. Linguistics of the Tibeto-Burman Area 16(1). 1–43.
Donegan, P. 1995. The innateness of phonemic perception. In V. Samiian & J. Schaeffer (eds.), Proceedings of the 24th western conference on linguistics (WECOL 94), Volume 7, 59–69. Fresno, CA: Dept. of Linguistics, California State University.
Donegan, P. 2002. Phonological processes and phonetic rules. In K. Dziubalska-Kołaczyk & J. Weckwerth (eds.), Future challenges for natural linguistics (LINCOM Studies in Theoretical Linguistics 30), 57–81. Munich: LINCOM EUROPA.
Donegan, P. & D. Stampe. 1978. The syllable in phonological and prosodic structure. In A. Bell & J. Bybee Hooper (eds.), Syllables and segments, 25–34. Amsterdam: North Holland.
Donegan, P. & D. Stampe. 1979. The study of Natural Phonology. In D. A. Dinnsen (ed.), Current approaches in phonological theory, 126–173. Bloomington, IN: Indiana University Press.
Donegan, P. & D. Stampe. 2004. Rhythm and the synthetic drift of Munda. In R. Singh (ed.), Yearbook of South Asian languages and linguistics, 3–37. Berlin & New York: Mouton de Gruyter.
Donegan, P. & D. Stampe. 2009. Hypotheses of natural phonology. Poznań Studies in Contemporary Linguistics 45(1). 1–31. http://phonology.wordpress.com.
Dressler, W. U., K. Dziubalska-Kołaczyk & L. Pestala. 2010. Change and variation in morphonotactics. Folia Linguistica 31. 51–67.
Dressler, W. U. & P. Siptár. 1989. Towards a natural phonology of Hungarian. Acta Linguistica Hungarica 39. 29–51.
Dziubalska-Kołaczyk, K. 1995. Phonology without the syllable. Poznań: Motivex.
Hualde, J. I. 2003. Segmental phonology. In J. I. Hualde & J. Ortiz de Urbina (eds.), A grammar of Basque, 15–65. Berlin & New York: Mouton de Gruyter.
Hurch, B. 1988. Is Basque a syllable-timed language? ASJU 13(3). 813–825.
Hurch, B. 2013. Syllabic typology. Course notes Master de Estudios Fónicos, UIMP Madrid. January–February.
Jauregi, O. 2008. Euskararen silaba: egitura eta historia (The syllable of Basque: Structure and history). Bilbao: UPV-EHU.
Jauregi, O. & M. Lourdes Oñederra. 2010. Sibilantes tras consonante sonante en euskera: inserción vs. africación, fonética y fonología. Revista de Estudos Linguísticos da Universidade do Porto 5(1). 71–89.
Maddieson, I. 2009. Phonology, naturalness and universals. Poznań Studies in Contemporary Linguistics 45(1). 13–140.
Michelena, L. [1977] 1985. Fonética Histórica Vasca. San Sebastián: Seminario Julio de Urquijo de la Excelentísima Diputación de Guipúzcoa.
Oñederra, M. Lourdes. 2015. Tipologia eta hizkuntzaren erritmoa: Stampe eta Doneganen ideien inguruan (Typology and language rhythm: About Stampe and Donegan’s ideas). In M. J. Ezeizabarrena & R. Gómez (eds.), Eridenen du zerzaz kontenta. Sailkideen omenaldia Henrike Knörr irakasleari (1947–2008), 559–607. Bilbao: UPV-EHU. http://hdl.handle.net/10810/17243.
Oñederra, M. Lourdes. In prep. El patrón sonoro de la lengua vasca, una aproximación desde la Fonología Natural.
Shosted, R. K. 2011. An articulatory-aerodynamic approach to stop excrescence. Journal of Phonetics 39. 660–667.
Vennemann, T. 1988. Preference laws for syllable structure and the explanation of sound change. Berlin: Mouton de Gruyter.
6
Ex Oriente Lux: How Nepali Helps to Understand Relict Numeral Forms in Proto-Indo-European
Piotr Gąsiorowski and Marcin Kilarski
1. Introduction
The feminine forms of the numerals ‘3’ and ‘4’ in Celtic and Indo-Iranian have attracted much attention and inspired numerous efforts to make sense of their phonological shape and morphological structure. This chapter examines various aspects of their reconstruction and attempts to throw new light not only on the irregular sound changes that affected them, but also on the original functions of their constituent morphemes. In search of insight provided by functional analogies, the behaviour and properties of the Proto-Indo-European numerals in question are compared with the expression of feminine reference in modern Nepali, a language possessing grammatical gender as well as numeral classifiers due to contact with neighbouring Tibeto-Burman languages. The morphosyntactic and semantic analogies between numeral phrases in Nepali and Proto-Indo-European help to determine the role of internal and external factors, such as inherited typology and language contact, in the rise and loss of morphosyntactic complexity.
2. Feminine Numerals in Celtic and Indo-Iranian Languages
The extraordinary feminine forms of the numerals ‘3’ and ‘4’ in Celtic and Indo-Iranian, reconstructed as *tisres (accent uncertain, see below) and *kʷétesres, do not resemble anything else in the Indo-European (IE) declensional system.1 Their occurrence in geographically and genetically distant branches of the family, as well as their shared idiosyncratic irregularities, leave no doubt as to their antiquity. After several decades of debate, the current prevailing view is that Proto-Indo-European (PIE) distinguished only two genders, common and neuter (or animate and inanimate, in terms of semantic categories). After the separation of Anatolian, the feminine gender arose in the remaining part of the family (Luraghi 2011: 437–438). Although it was not always marked by morphological means, its development was accompanied by a rapid expansion of the femininizing suffixes *-ah₂ and *-ih₂/*-i̯áh₂-.
What the Celtic and Indo-Iranian feminine numerals ‘3’ and ‘4’ display instead is the element *-sr- (with plural inflections) added to what seems to be a simplified form of the respective numeral stem, *tri- and *kʷetu̯or-. A similar suffix can be found in the word *su̯é-sor- ‘sister’ and in several Anatolian nouns for female humans or deities. It can be identified with a scantily attested archaic word for ‘woman, female’, *sór-/*sér- (for the comparative data and details of reconstruction, see Harðarson 2014). The attested examples indicate that at a very early stage in the history of Indo-European, presumably before the emergence of regular feminine inflections, there were attempts to femininize certain words, including the lower numerals, by coining compounds with a noun root meaning ‘woman’, which was soon grammaticalized as a suffix of limited productivity.

Such a phenomenon would not be entirely isolated. For example, Corbett (1991: 168–169) mentions “overdifferentiated” gender-agreement targets in the Central Dravidian languages. Those languages have two genders, male human vs. “other”, but some of the lower numerals have special “female human” forms which co-occur with nouns denoting women. Such a marginal distinction, restricted to a single word-class, cannot be elevated to the role of a separate gender, but it could potentially act as a nucleus for the emergence of one. The similarity to the Indo-European situation described above is remarkable.

In order to identify the morphological components of the numerals in question, it is first necessary to carry out a formal diachronic analysis of their phonological structure. No explanation proposed so far satisfactorily accounts for the vocalism of *kʷétesres. The usual compositional form of the numeral ‘4’ is *kʷ(e)tur- (prevocalic), *kʷ(e)tu̯r̥- ~ *kʷ(e)tru- (preconsonantal).
In obscured compounds, the weak variants *kʷtur-, *kʷtru-, *kʷtu̯ r̥ - may be further reduced to *tur-, *tru-, *tu̯ r̥ - by cluster simplification. None of these forms is a plausible source of *kʷéte-, with its two full vowels. A straightforward derivation of *kʷétesres from a univerbated phrase, *kʷétu̯ ores sóres ‘four women’ would require some completely ad hoc acrobatics to account for a massive deletion of unwanted segments in one fell swoop. Unconventional segmentations like *kʷ(e)t-(h1)ésr- (Kim 2008: 158) fail on several counts: (1) the bare root *kʷet- is not attested in the meaning ‘4’; (2) the reconstruction of the ‘woman’ word as *h1és-or- rather than *sór- is untenable (Harðarson 2014: 46–47); (3) the accentuation of *kʷétesres remains unexplained. Such complications can be avoided if we start with *kʷétu̯ er sóres. The first component here would be an endingless locative, *kʷétu̯ er ‘in a foursome’, formed to the hypothetical noun *kʷét-u̯ r̥ , whose collective *kʷétu̯ ōr is the neuter form of the numeral ‘4’ itself.2 A dissimilatory loss of the first *r is reminiscent of what we must independently assume for the *ti- part of *tisres, and the loss of postconsonantal *u̯ is at least paralleled by such simplifications as *tu̯ é → *te in unaccented position. The
Piotr Gąsiorowski and Marcin Kilarski
resulting univerbation *kʷétesores may have lost its *o through analogical levelling, since a zero-grade would have been expected in the “weak” cases with accented inflectional endings. The form *tisres, presumably reflecting earlier *trisres (or *trisores), is also puzzling. Its Vedic reflex, tisráḥ, has a final accent which cannot be original in the nominative, since the PIE plural ending *-es was inherently unaccentable; the accent must have been generalized from the weak case forms. If parallel to *kʷétesres, *tisres should be expected to contain some adverbial (“locatival”) derivative of the numeral ‘3’. The bare stem *tri, however, is not attested in this function anywhere in Indo-European. An original *trisú sóres, with the locative plural of ‘3’ meaning ‘in (a group of) three’, seems possible, although a haplological reduction, *trisusores > *trisores, would have to be posited. Elusive as the details of the derivation are, one thing is clear: PIE endocentric compounds meaning ‘3–4 women forming a group’ evolved into feminine variants of the animate numerals ‘3–4’ already at an early date. As a three-gender system was established, they were eventually recruited into the numeral system, assuming the role of bona fide feminine numerals. The complete agreement of Celtic and Indo-Iranian guarantees that this exaptation took place already in the common ancestor of all the extant IE languages, if not earlier. Judging from trace evidence surviving elsewhere, it may have been part of a more general attempt to characterize various word classes as feminine with the element *-s(o)r-.³ When this early femininizer became outcompeted by new productive gender markers, the process was aborted, leaving behind only a handful of lexical fossils.
Indo-European compounds involving numerals are of course common across the family. In particular, multiplicative adjectives or adverbs are formed by attaching various nouns to the compositional allomorphs of the cardinals.
In Ṛgvedic Old Indo-Aryan alone, one finds compounds with -bhuji-, -vṛt- and -vártu-, -vaya-, -dhātu-, i.e., deverbal nouns expressing meanings like ‘bend’, ‘turn’, ‘branch’ or ‘layer’ (Emmerick 1991: 191). Note also PGmc. *-falþa- ‘-fold’, Lat. -plex, Gk. -πλαξ (from *pleḱ- ‘twine’) and many similar constructions. These compounds, however, are typically exocentric (‘having n bends/layers/threads/folds’), while *-s(o)res, originally endocentric (‘women in a group of n’), evolved into something resembling, to all intents and purposes, a numeral classifier (as defined in Gil 2013). Such an interpretation of the *-sr- suffix is explicitly adumbrated by Hackstein (2010: 58–64) but has otherwise gone unexplored.
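The derivation of *kʷétesres proposed earlier in this section can be laid out as an ordered series of rewrites. The following is only a minimal sketch: the text names the starting phrase (*kʷétu̯er sóres), the stage *kʷétesores and the outcome *kʷétesres, while the remaining intermediate forms, the relative ordering of the first three steps, and all identifiers (`GLIDE`, `STEPS`, `derive`) are assumptions made for illustration. Reconstruction asterisks are omitted from the strings.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that accented vowels compare reliably."""
    return unicodedata.normalize("NFC", text)


GLIDE = "u\u032f"  # non-syllabic u: u + combining inverted breve below

# Starting point named in the text: locatival 'in a foursome' + 'women'.
START = nfc(f"kʷét{GLIDE}er sóres")

# (label, old, new); each rule rewrites only the first match.
# The ordering of the first three steps is an assumption.
STEPS = [
    ("univerbation, second accent lost", " sóres", "sores"),
    ("dissimilatory loss of the first r", "r", ""),
    ("loss of postconsonantal glide", f"t{GLIDE}", "t"),
    ("analogical levelling of o (zero grade)", "sores", "sres"),
]


def derive(form, steps=STEPS):
    """Apply the rules in order and return every stage, start included."""
    form = nfc(form)
    history = [form]
    for _label, old, new in steps:
        form = form.replace(nfc(old), nfc(new), 1)
        history.append(form)
    return history


if __name__ == "__main__":
    for stage in derive(START):
        print(stage)
```

Keeping the rules as data makes it easy to reorder them and check whether a different relative chronology still reaches the attested outcome.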
3. Gender and Numeral Classifiers in Nepali
Nepali belongs to a small group of Indo-Aryan languages which possess not only grammatical gender, but also numeral classifiers. As is well known, these two most common types of nominal classification have a
Ex Oriente Lux
near-complementary distribution in the languages of the world (cf. Corbett 2013; Gil 2013; Sinnemäki In press). There is a striking disagreement in the literature concerning the properties of gender and numeral classifiers in Nepali, which can be attributed to considerable areal variation and the use of different definitional criteria (for details, see Tang & Kilarski In press). As regards gender, following Tang and Kilarski (In press), we analyse Nepali as having two co-existing gender agreement systems based on the masculine/feminine and human/non-human oppositions. With regard to the former opposition, female humans and female animals are feminine, with the residue assigned to the masculine gender. Gender agreement is found on adjectives, verbs, possessive adjectives and ordinal numerals as well as the general classifier. In contrast, third-person personal pronouns distinguish between humans vs. non-humans, as in u ‘he/she’ vs. tyo ‘it’. The two types of agreement patterns are illustrated in (1) with adjectives and verbs vs. pronouns referring to male and female humans and an inanimate referent:
(1) Gender agreement on adjectives, verbs and pronouns (Tang and Kilarski In press)
a. Sarita Paris-ma bosche ra u ramr-i che
Sarita Paris-at live.prs.3sg.f and he/she beautiful-f be.prs.3sg.f
‘Sarita lives in Paris and she is beautiful.’
b. Ram Paris-ma boscha ra u ramr-o cha
Ram Paris-at live.prs.3sg.m and he/she beautiful-m be.prs.3sg.m
‘Ram lives in Paris and he is handsome.’
c. mer-o ghar Paris-ma cha ra tyo ramr-o cha
my-m house(m) Paris-at be.prs.3sg.m and it beautiful-m be.prs.3sg.m
‘My house is in Paris and it is beautiful.’
In addition to grammatical gender, Nepali has a system of numeral classifiers. In view of the lack of agreement in the available literature, according to which the number of classifiers in Nepali varies between two (Acharya 1991: 100) and over 200 (Pokharel 1997, 2010), we follow Tang and Kilarski (In press), who distinguish at least ten numeral, i.e., sortal classifiers. These include the general classifier wota together with other specific classifiers, e.g., the human classifier jana and dana, the classifier for round fruits. The noun phrase ordering is Numeral-Classifier-Noun, as in tin jana manche (three clf.human man) ‘three men’. The general classifier wota differs from the other classifiers in terms of its semantic and morphosyntactic properties. First, it can be used instead of the specific classifiers for human and inanimate nouns, similarly to general classifiers found in the languages of East Asia. Second, with regard to its expression, while the specific classifiers can only occur independently, the general classifier can occur either as a free-standing word following a numeral or fused with the numeral, as in ek wota chora (one clf.general
son) ‘one son’ vs. eu-ta chora (one-clf.general son) ‘one son’. Finally, in a typologically uncommon pattern, the general classifier distinguishes masculine and feminine agreement both in its independent and fused forms, analogously to adjectives and verbs as in (1) above.⁴ Masculine and feminine agreement forms of the independent and fused forms of the general classifier with human nouns are illustrated in (2) below.
(2) Gender agreement on the general classifier in Nepali (Tang & Kilarski In press)
a. tin wot-a keto
three clf.general-m boy(m)
‘three boys’
b. tin-t-a keto
three-clf.general-m boy(m)
‘three boys’
c. tin wot-i keti
three clf.general-f girl(f)
‘three girls’
d. tin-t-i keti
three-clf.general-f girl(f)
‘three girls’
The emergence of the complex nominal classification system in Nepali can be attributed to language contact. As an Indo-European language, Nepali possesses grammatical gender of the type characteristic of most other languages in the family. In turn, the presence of a numeral classifier system is the consequence of stable, long-term contact with neighbouring Tibeto-Burman languages. The actual status of the two nominal classification systems in Nepali is made still more complex by considerable variation in their expression across the varieties of Nepali spoken in Nepal as well as India and Bhutan. In general, the expression of gender agreement and the inventories of numeral classifiers depend on the relative proximity to other Tibeto-Burman and Indo-Aryan languages. For example, the eastern dialects of Nepali are more frequently characterized by the loss of gender and by larger inventories of numeral classifiers due to the influence of Tibeto-Burman languages, which predominate in the east of Nepal. This situation largely mirrors those Indo-Aryan languages which have lost gender and acquired numeral classifiers, e.g., Bengali (see, e.g., Emeneau 1956; Pokharel 2010; Tang & Kilarski In press).
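The agreement pattern in (2) is regular enough to be stated as a small function. The sketch below is illustrative only: the function name, its parameters and the data layout are assumptions, and it covers just the forms attested in (2), i.e., the free-standing classifier wot- and the fused -t-, each followed by the gender suffix -a (masculine) or -i (feminine). It does not model the specific classifiers or suppletive fused stems such as eu- for ek ‘one’.

```python
# Gender suffixes on the general classifier, as attested in example (2).
GENDER_SUFFIX = {"m": "a", "f": "i"}


def general_classifier_phrase(numeral, noun, gender, fused=False):
    """Build a Numeral-Classifier-Noun phrase with the general classifier.

    gender: "m" or "f", the agreement class of the noun.
    fused:  if True, attach the classifier to the numeral (tin-t-a);
            otherwise use the free-standing form (tin wot-a).
    """
    suffix = GENDER_SUFFIX[gender]
    if fused:
        return f"{numeral}-t-{suffix} {noun}"
    return f"{numeral} wot-{suffix} {noun}"


if __name__ == "__main__":
    for noun, gender in (("keto", "m"), ("keti", "f")):
        for fused in (False, True):
            print(general_classifier_phrase("tin", noun, gender, fused))
```

Run as a script, this prints the four phrases of (2a–d), though in a different order (2a, 2b, 2c, 2d correspond to the free and fused forms for keto, then for keti).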
4. Discussion
Analogies between the early Indo-European developments and the Nepali system are easy to draw. Here we focus on the emergence of a marker
expressing femininity, which in both cases obligatorily follows a numeral within a numeral phrase if it refers to a noun denoting a female. With regard to morphosyntactic realization, the feminine classificatory marker is realized in the two cases in a number of different ways. While in Celtic and Indo-Iranian the marker fused with the numeral by forming a compound which eventually became morphologically opaque as a result of sound change, in Nepali the feminine suffix appears with the general classifier either as part of an independent word following the numeral or is fused with the numeral, as in tin-t-i keti (three-clf.general-f girl(f)) ‘three girls’. However, in both cases we observe the same basic ordering, i.e., Numeral-Fem.Marker-Noun, as in Vedic cáta-sra(s) . . . ghṛta-dúhaḥ (four-clf.female.nom.pl butter-yielder(female).nom.pl) ‘four butter-yielders’ (Rigveda 9.89.5) and Nepali tin wot-i keti (three clf.general-f girl(f)) ‘three girls’ (cf. (2)). Both patterns are consistent with the typical ordering of elements within a numeral phrase in a numeral classifier language, where the classifier and numeral appear contiguously (see Greenberg 1972; Her 2017).
In addition, there are analogies regarding the classificatory nature of the early Indo-European feminine marker and related elements in Nepali. In the first place, in both cases we are dealing with the expression of feminine reference, either by semantically bleached “femininizing” markers on numerals in early Indo-European or feminine agreement suffixes in Nepali. In addition, the Indo-European marker shares common properties with Nepali numeral classifiers, where in both cases we find a classificatory system expressed on numerals, in which animate referents are classified with respect to sex.
The etymological connection between the *-sr- suffix and the noun *sór- implies its original use with nouns denoting female humans; it was only later that the suffix was reinterpreted as a general feminine marker within a new three-gender system.
We do not know what motivated the transition from the Proto-Indo-European animate/neuter system to the three-gender system familiar from most of the non-Anatolian branches. One could suspect contact with some other language family with a gender system based on the masculine/feminine contrast. However, reconstructible Proto-Indo-European as well as the immediate ancestor of the non-Anatolian group show the hallmarks of languages used in societies characterized by “small size, dense social networks, large amounts of shared information, high stability, and low contact” (Trudgill 2011: 185), namely a richness of morphological categories combined with extreme morphophonological and morphosyntactic complexification. It is therefore probable that the motivation was internal. The Celtic and Indo-Iranian feminine forms of ‘3’ and ‘4’ offer a snapshot of a precursor stage of that process. There is evidence that the function of the element *-sor- was at first purely derivational: it expressed oppositional “natural” (morphosemantic) femininity outside the system
of gender concord. Subsequently, it was employed as a numeral classifier, but only to a limited extent. Its full grammaticalisation was perhaps made problematic by the fact that the inherited numerals ‘5–10’ were indeclinable. The potential combinability of *sóres with locatival adverbs like *pénkʷer, *(s)u̯ éḱser ‘in a group of 5, 6’ etc. (cf. Majer 2017) remained unexploited; at any rate, there is no positive evidence of such combinations. The numeral *du̯ o- ‘2’, in turn, required dual inflections and therefore followed a pattern of its own. The experiment involving a numeral classifier for counting “female humans” failed with the rise of a full-fledged feminine gender. As mentioned above, numeral classifiers rarely co-occur with morphosyntactic gender agreement in one and the same language. The older forms were able to survive by being exapted as feminines participating in gender agreement. This early state of affairs is preserved as a morphological relict in Celtic and Indo-Iranian. The remaining branches of non-Anatolian Indo-European either lost specifically feminine forms of the declinable numerals or created new feminines that outcompeted those with an incorporated classifier.
5. Conclusions
Considering the time depth that is involved, the developments discussed in this chapter offer a fascinating insight into a wide range of phenomena worthy of more extensive study. These include the diversity and complexity of the ways that speakers develop to categorize their environment, the variation in the expression of grammatical categories such as gender and number across the history of the Indo-European languages and finally the still rarely acknowledged advantages of integrating diachronic and typological perspectives. We suggest here that the language ancestral to non-Anatolian Indo-European possessed a nascent system of numeral classifiers before it developed the familiar three-gender system. The example of Nepali shows that gender and numeral classifiers may coexist in the same language system (subject to a great deal of sociolinguistic variation). If a comparable situation existed in pre-Proto-Indo-European, it would explain both the origin of the unusual feminine forms of ‘3’ and ‘4’ and their recruitment as marginal markers of grammatical gender. Thus, the paradox of feminine numerals being apparently older than the feminine gender is also resolved. The question whether Proto-Indo-European had any other numeral classifiers remains open and perhaps deserves to be investigated further.
Notes
1. See Kim (2008) for a re-evaluation of the Celtic data.
2. On double full-grade locatives or “locativals” see especially Nussbaum (1986: 189–190); see also Majer (2017) on such forms derived specifically from IE numerals.
3. For other possible examples, see Gąsiorowski (2017).
4. See Ciucci and Bertinetto (2019) for a discussion of possessive classifiers which show agreement in gender and number in Ayoreo (Zamucoan; Bolivia and Paraguay).
References
Acharya, J. 1991. A descriptive grammar of Nepali and an analyzed corpus. Washington, DC: Georgetown University Press.
Ciucci, L. & P. M. Bertinetto. 2019. Possessive classifiers in Zamucoan. In A. Y. Aikhenvald & E. Mihas (eds.), Genders and classifiers: A cross-linguistic typology. Oxford: Oxford University Press. 144–175.
Corbett, G. G. 1991. Gender. Cambridge: Cambridge University Press.
Corbett, G. G. 2013. Number of genders. In M. S. Dryer & M. Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/30.
Emeneau, M. B. 1956. India as a linguistic area. Language 32(1). 3–16. https://doi.org/10.2307/410649.
Emmerick, R. 1991. Old Indian. In J. Gvozdanović (ed.), Indo-European numerals, 163–198. Berlin & New York: Mouton de Gruyter.
Gąsiorowski, P. 2017. Cherchez la femme: Two Germanic suffixes, one etymology. Folia Linguistica Historica 51(s38). 125–147. https://doi.org/10.1515/flih-2017-0005.
Gil, D. 2013. Numeral classifiers. In M. S. Dryer & M. Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/55.
Greenberg, J. H. 1972. Numeral classifiers and substantival number: Problems in the genesis of a linguistic type. Working Papers in Language Universals 9. 1–39.
Hackstein, O. 2010. Apposition and nominal classification in Indo-European and beyond. Vienna: Verlag der Österreichischen Akademie der Wissenschaften.
Harðarson, J. A. 2014. Das andere Wort für ‘Frau’ im Urindogermanischen. In S. Neri & R. Schuhmann (eds.), Studies on the collective and feminine in Indo-European from a diachronic and typological perspective, 23–54. Leiden: Brill.
Her, O.-S. 2017. Deriving classifier word order typology, or Greenberg’s Universal 20A and Universal 20. Linguistics 55(2). 265–303.
Kim, R. 2008. The Celtic feminine numerals ‘3’ and ‘4’ revisited. Keltische Forschungen 3. 143–168.
Luraghi, S. 2011. The origin of the Proto-Indo-European gender system: Typological considerations. Folia Linguistica 45(2). 435–463. https://doi.org/10.1515/flin.2011.016.
Majer, M. 2017. The ‘fiver’: Germanic ‘finger’, Balto-Slavic de-numeral adjectives in *-ero- and their Indo-European background. Transactions of the Philological Society 115(2). 239–262. https://doi.org/10.1111/1467-968X.12099.
Nussbaum, A. J. 1986. Head and horn in Indo-European. Berlin & New York: Walter de Gruyter.
Pokharel, M. P. 1997. Nepali Vakya Vyākaran [Grammar of Nepali syntax]. Kathmandu: Royal Nepal Academy.
Pokharel, M. P. 2010. Noun class agreement in Nepali. Kobe Papers in Linguistics 7. 40–59.
Sinnemäki, K. In press. On the distribution and complexity of gender and numeral classifiers. In F. Di Garbo, B. Wälchli & B. Olsson (eds.), Grammatical gender and linguistic complexity. Berlin: Language Science Press.
Tang, M. & M. Kilarski. In press. Functions of gender and numeral classifiers in Nepali. Poznań Studies in Contemporary Linguistics.
Trudgill, P. 2011. Sociolinguistic typology: Social determinants of linguistic complexity. Oxford: Oxford University Press.
Part 2
On Close Inspection: Theoretical and Methodological Approaches
7
Pholk Phonetics and Phonology Nancy Niedzielski and Dennis R. Preston
1. Introduction
In this chapter we investigate folk attitudes and beliefs about phonetics and phonology. Folk Linguistics (FL), as defined by Niedzielski and Preston (2000), is the study of what nonlinguists believe about language and the level of awareness of linguistic facts that the folk possess (Silverstein 1981; Preston 1996, 2016). No other linguistic area brings this into focus as clearly as that of phonetics and phonology, perhaps the one most frequently brought up by the folk. We should make it clear that we hold to the definitions first stated in Niedzielski and Preston (2000: vii):
1) The folk in FL are not rustic or unsophisticated; they are simply those whose professional work is not linguistic.¹
2) Folk belief about language is like all folk belief; it is neither true nor false, and one reason to study it is to show the relationship between it and scientific positions.
That second point immediately leads to two others:
3) What the folk have observed may offer language professionals insights into areas of research that have not yet occurred to them; we comment on this further in 5) and 6) below.
4) Those who want to perform any sort of language intervention (i.e., applied linguistics) will be better served by understanding local folk belief.
Perhaps our own beliefs about the value of FL are best summarized by others who have appealed to the need for such knowledge.
5) One cannot hope to carry out the full investigation of a speech community without taking its linguistic folk belief into account: Hymes (1972: 39) suggests that “If the community’s own theory of linguistic repertoire and speech is considered (as it must be [italics ours] in any serious ethnographic account), matters become all the more complex and interesting”.
6) In addition, some aspects of the study of language variation and change depend on the study of FL:
The theory of language change must establish empirically the subjective correlates of the several layers and variables in a heterogeneous structure. Such subjective correlates . . . cannot be deduced from the place of the variables within linguistic structure. (Weinreich et al. 1968: 186)
We do not dwell on these matters and instead focus attention on folk references to sounds and sound systems. We examine data from fifty-four interviews with residents of southeast Michigan collected in the 1990s; their demographic details are omitted here to save space, but are given in Niedzielski and Preston (2000). We first describe a taxonomy of references to sounds and sound systems derived from many of these interviews, one that allows us to summarize the most salient factors. We conclude by reflecting on the place of such folk comments within a more general characterization of a sociolinguistically-oriented FL and its position in more general concerns of phonology.
2. Folk Categories
Our taxonomy, related to the notions which occur in the data used for this study, deals with several levels:
1. Speaker Identity, including ideas about region, nativeness, ethnicity, sex and gender, status, and education
2. Phonetic Realization, including discussions of articulatory distinctiveness, deletion and insertion, rate, pitch, intonation, and stress
3. Phonological Units, such as consonants and vowels, and discussions of spelling and the lexicon
4. Evaluation, including ideas about correctness, articulatory effort, and comprehension
We illustrate each of these levels with examples drawn from our interviews below.
2a. Speaker Identity
Awareness of sound and sound systems is most clearly demonstrated in reference to social characteristics, in particular, regional identities. In such discussions, we find that while broad generalizations are often made (e.g., Texas speech is “slower” and “lazy”), fine-grained analyses of regional vowel differences are also discussed. As we have shown in previous work (Preston 1989; Niedzielski & Preston 2000), dialects of the Southern states are highly salient to all our US respondents, and several detailed and specific phonological features of these dialects are offered. The Southern drawl is defined by respondents as “lengthening” (particularly at the end of sentences: see below), and, in fact, more than one respondent identifies this phenomenon as a Southern “draw” since the words in Southern English are “prolonged” or “pull[ed] out”:
G: We- we call it a ‘draw’ because . . .
K: They- prolong it.
H: Prolong it.
K: Prolong it. Pull it out. . . . They speak slower. . . .
H: =Uh so like you say uh Southern draw a- a- could you tell me how do you spell ‘draw’? How do you spe[
G: I-just like we’d normally draw, ((spelling)) d-r-a-w.
Duration then is such an important part of Southern folk phonology that some respondents have derived a folk etymology for a “draw”. We return to folk descriptions of duration in the following section. In addition, specific phonological features of the Southern drawl are discussed. Monophthongization of /ɑɪ/ and its role in the perception of a Southern drawl surfaces as a topic quite frequently: “yeah there’s a Southern accent. They they tend to draw(l) some words out longer. [T]heir /ɑɪ/s are are like [ɑ] . . . in a lot of places”.
Discussion of specific Southern phonological features, however, often involves consonantal variables. One respondent who worked as a salesperson reported being confused as to whether her customers were asking for ‘white’ or ‘wide’ shoes, and she and her husband go on to discuss whether the customers were black or white Southerners. Another respondent states that /ð/ is realized as [v] in the phrase “over there”, and yet another respondent illustrates an epenthetic /r/ in ‘oil’ by describing a Southern woman who has lived in the North for forty years and is still misunderstood at a filling station, suggesting that she must do it on purpose because “I know she knows better than that by now”. It is no surprise that most of these discussions are not only descriptive, but evaluative as well; as we discuss in greater detail below, most discussions of Southern English include negative value judgments.
Perhaps due to the proximity to Canada, phonological features of Canadian English are commonly discussed.
As expected, Canadian raising (the raising and centralization of the onsets to /ɑɪ/ and /ɑʊ/) is often noted by our respondents, but they also note a “broad ‘a’”, and label it “British”, in words like ‘pass’ and ‘class’. This “broad ‘a’” is also noted (and labeled “British”) in ‘tomato’ and ‘aunt’. Specific phonological features of Michigan dialects are not noted; rather, more evaluative phonological descriptions are offered, and we discuss these below. Phonological features linked to ethnicity are also discussed, most often with regard to African-American English. Not surprisingly, our respondents usually offer lexical items containing their versions of these variants, rather than describing the variants themselves. One (white) respondent tells a lengthy story about an interaction with an African-American man and the difficulty that results from the respondent mistaking ‘Atwood’
for ‘Edward’. Another offers ‘door’ (with the /r/ deleted), ‘bed’ (with the vowel fronted and diphthongized), and ‘that’ (with the interdental fricative realized as an alveolar stop). Several white respondents make derogatory comments about African-American English, and phonology is not spared. For instance, we have an unfortunate example of a (white) respondent using an unintelligible string of sounds to imitate black speakers: “They’re not talking ((lowered voice)) ‘Hey man [h_ub_h_u_b_u_h__b_u_]’”. To her, it is not even real language.
An interesting illustration of confounding phonology and morphology comes from a discussion where it is asserted that black speakers omit segments at the end of words. The respondents, who are African American themselves, are discussing the phrase “faking it”. When the fieldworker asks about the spelling, one of the respondents says that “you’d probably spell it f-a-k-i-n”, and then laughs and says that “black people don’t put the endings on their words”. A different respondent then states that this is true: “Cause when I’m reading I’m- I say- you know I don’t put the endings on my words. And I have to correct myself, I have to start putting the endings on them. You know you may leave the ‘s’ out or something you know”. Thus, the phonological process of deletion, which is erroneously illustrated with ‘fakin’, becomes analogized to the morphological deletion of perhaps the plural or third person singular marker.
2b. Phonetic Realization
Folk knowledge of the articulatory phonetic distinctiveness of segments is revealed in discussions of languages other than English, or dialects distinct from those of the respondents. For instance, the uvular /ʁ/ found in French is said to be “rolled”, and to “come from the Adam’s apple” (demonstrating a not-too-far-off knowledge of how this sound is produced), and in fact another respondent described French as “rolling”.
Discussions of post-vocalic /r/-deletion yield quite interesting folk theories of articulation. Several respondents equate r-lessness with nasality, describing New York or Boston English as “nasal”, but then illustrate this by deleting post-vocalic /r/ (rather than actually producing nasalized segments). Nasality in general is a highly-salient phonetic feature, again with regard to dialects on the upper East Coast and Inland North. Most often it is described in articulatory terms as “through the nose”, although one respondent referred to a “nasally” Boston dialect as being produced at the “back of the throat”. Another respondent offered an articulatory description of /ɑɪ/, stating that “you have to move your tongue around a lot” in its production. While not exactly the way a linguist might describe diphthongization, its reference to tongue movement is accurate.
The description and effect of suprasegmental features is one of the topics most often discussed by our respondents. Occasionally the discussion is lexical (where the stressed syllable should be in ‘Caribbean’, for example). More often, fine-grained knowledge of intonational patterns is offered. For instance, there is a lengthy discussion about German and Japanese intonation, which is at first described in non-linguistic terms such as “very strong” and “exacting”, and the fieldworker imitates German as ‘uk duk duk,—duk duk’ and Japanese as ‘li -ch—ch—ch—ch’. But they also use apparently linguistic or descriptive terms, such as “low pitched” or “harsh”. Finally, they describe German as “dictatorial”, and, in one of the most insightful comments we find, D says “you know, language does have something to do with culture”. While D is not correct in directly attributing linguistic qualities of a language to its speakers’ cultural beliefs, she is correct in pointing out that the folk attitudes towards a culture create folk attitudes towards that language.
While the Southern drawl is described segmentally by a few respondents (see above), it is most often described with regard to its intonation. Several respondents refer to duration by describing it as “slow” or “prolonged”. But their awareness of duration is sometimes quite linguistically sophisticated. D, for instance, displays an awareness of phrase final lengthening (Bishop & Kim 2018), by noting that this “slowing down” or “lengthening” occurs at the “end of sentences”.
Our respondents also discuss pitch patterns, but in fairly broad terms. For instance, they describe West Indies English as “musical”. In one discussion, however, J describes French as having sentences that are “smooth”, “straight”, and “level”, but then says “if we don’t talk—you know, in the same monoTONE, then they might take it wrong”. This discussion is interesting for several reasons.
First, although she begins the discussion by using non-linguistic terms like ‘straight’ and ‘level,’ she clearly is aware that French has phrase final sentence-level stress. Second, she actually illustrates this by shifting the word-level stress to the final syllable in ‘monotone.’ Finally, she compares this language’s “straight” and “level” intonation pattern to English: “we tend to talk, you know, up and down”. Thus, while A is not using the terms a linguist might use in differentiating French from English intonation patterns, she demonstrates a clear understanding of them.
We also find a sophisticated discussion of the difference between pitch in a tone language and pitch in a language like English. Respondents G and K discuss the seven “levels” of pitch in Chinese versus one level in English in the following:
G: We don’t have the variety. If-if uh you had where you could say the same sounds, but just change that level.
[
K: And make a completely new word (of it).
[
G: And make a complete-complete new-by using it, cause we actually have to change the wording.
K: Yes.
G: We can change stress. you can y-you can by stressing different words, you can mean something else Y-=
K: =Yes. Yeah. stress.
[
G: Which we just had a class, and th-That’s difficult just just for - one level - of change, (.hhh) is difficult for=
[
H: Uh huh.
G: =our students to understand, (.hhh) but to take something with seven levels.
They are noting a phenomenon that has stumped phoneticians for decades: how to quantify phrase-level stress in languages that have lexical tone (e.g., Hyman 2006). Our respondents first demonstrate an understanding of how in altering phrase-level stress a speaker “can mean something else”, and are clearly aware that stress can be realized through pitch alteration. They then wonder how this happens in a language that has tone, i.e., the “seven levels”.
Finally, tempo is a common topic. Most often, this occurs in discussions of other languages seeming “fast” in language-learning situations, but we also find discussions of English being produced “too fast”, particularly as this relates to prescriptive ideas of clarity. For instance, in response to the fieldworker claiming that English sounds “musical”, G states that “we don’t have space between what we’re saying. In other words, we blend our words together”. He goes on to say that this is “not necessarily a good idea”, and that it’s because “we don’t listen to ourselves”. Thus, duration is implicated in speakers’ carelessness and lack of clarity (see below).
2c. Phonological Units
Most often awareness of units in English phonology is demonstrated through spelling. We have treated this extensively elsewhere (e.g., Niedzielski & Preston 2000), but believe it warrants mention here as an important key to folk understanding. A common theme is that spelling gives speakers the “correct” pronunciation of a lexical item, but speakers ignore it (to their detriment). For instance, there are several discussions of the intervocalic flapped /ɾ/, but often as a way to illustrate “quick” or “sloppy” speech. Respondent H states that he often produces ‘letter’ as [lɛɾəɹ], but then says “I mean to pronounce ‘t’”.
Pholk Phonetics and Phonology
We also find very interesting discussions of vowel variation and spelling. In one discussion, they grapple with what seems to be the low back merger (despite this not being a feature of Northern Cities shifted English, although the fronting of /ɑ/ is also clearly involved). In response to the fieldworker’s inquiry about the pronunciation of ‘dog,’ a lengthy discussion of back vowels emerges. We reproduce this discussion below, since it illustrates how spelling is co-opted to illustrate vowel features:
r: [dɑg] how do you -pronounce-not [dɔg]—not=
[ [
A: [dɑg] ( )=
r: =uh huh.
A: =[dɑg].
J: No um [dɔg] [dɔg]
[
r: [dɑg]
. . .
r: And how about the word [bɑks]. Do you pronounce [baks] or [bɑks].
[
J: [bɑks] I say [bɑks]
A: [baks]
[
r: But they are the same ‘o’ in the middle, ((spelling)) d-o-g [dɑg] but ((spelling)) b-o-x [baks].
J: Are they different?
[
A: [a] uh [a] [a] [dag] no y-no [dag] i-[da] and=
[
r: [baks] or ( )
[
J: ((various pronunciations of ‘box’ and ‘dog’))
A: =[ba]—[baks].
[
J: And you would say [dɑg] [bɑg] [klɑg] [mɑg]
A: If there’s a ‘e’ after—something you would pronounce it wi-like a -uh ‘e’ after—‘e’ after i-in ‘done.’ In ‘done’
We see here first a confusion as to whether ‘dog’ and ‘box’ have the same vowel, and in fact, J changes his pronunciation of the vowel from [ɔ] to [ɑ]. This may be the result of the fieldworker pointing out that they are spelled with the same vowel letter; however, it may also be the result of the low-back merger in other dialects of American English. Preston (1997: 40–41) shows that in rhyming tasks, Michigan speakers (who do not have
the merger) often rhyme words containing /ɔ/ such as ‘dog’ and ‘off’ with words containing /ɑ/, and Niedzielski (1999) also showed that respondents matched audio tokens containing /ɔ/ such as ‘talk’ with the vowel /ɑ/. Thus, Michigan speakers (who believe that their variety of English is correct; see below) are unsure what the “correct” vowel is in these words (and spelling is no help).
In addition, our respondents in the exchange above are confronted with the problem of ‘done,’ also containing an orthographic ‘o,’ and A determines that the ‘e’ at the end changes the vowel to /ʌ/. There is a spelling “rule” for orthographic ‘o’ with a final ‘e’ that was a complex result of the Great Vowel Shift and would have changed some instances of Old English long /ɑ/ into /o/ (as seen in Modern English ‘bone’). That is not what is at stake here. Old and Middle English ‘done’ already had long ‘o’ and could have been expected to raise to /u/ (as in Modern English ‘do’). But the /ʌ/ in ‘done’ is the result of shortening that affected many words, perhaps especially in late Middle English, and is clearly an exception. Nonetheless, A is not deterred in co-opting spelling to explain this.
In a discussion regarding supposed deletion of segments (present in the orthography), these same respondents either claim to actually produce the segments or offer an explanation for why it is prescriptively acceptable that they don’t. A, for instance, says that he does in fact produce a ‘g’ at the end of the word ‘long’ and in fact does produce the [g] in his illustration: “it’s a lo[ŋg] time”. But he states that he is not obligated to produce an ‘s’ in ‘Illinois’ because it’s “Indian, or Iroquois”. J suggests it’s actually French, but ultimately they agree that the ‘s’ is silent because of its different origin.
What we see from these discussions is that, as mentioned above, there is strong folk belief that spelling provides the correct pronunciation for a word and that speakers (including themselves) are too “sloppy” or “lazy” to follow the rules. Nonetheless, their remarks show that they are clearly aware of the segmental units of their language, even if it is the result of orthography.
Finally, a discussion of word games offers a glimpse into the awareness of phonological units. Respondent M offers a description of Pig Latin, and wrestles with exactly which unit is moved. She illustrates this with ‘scram’ and ‘am-scray,’ and states first that the [skrʌ] is moved, and a ‘y’ is added. J adds that “you put the ‘r’ in” (although it is not clear what he means by this). Next, M says “you take the first syllable and put it at the end”, and then “Like-I was saying like ‘scram’ i-‘amscray,’ ‘asr-’ ‘a-’ ‘a-’ ‘amscray.’ (.hhh) Where yo-and then you kind of add a ‘y’”. Interestingly, there is an appeal to phonological segments, but it is the syllable that M overtly mentions, and in fact uses in her description of the process. While ‘am-scray’ is a “correct” Pig Latin production, when she breaks the process down, she offers not only the first three phonemes, but a vowel as well ([skrʌ]). In addition, the exact placement of the /ɹ/ is problematic: she moves it to a position following /s/ (deleting /k/ altogether), and J
specifically says that an /ɹ/ is inserted (somewhere). Thus, the actual order of the phonological segments is difficult in a case where perhaps less basic units (a complex onset) are moved, and an appeal to more basic phonological units available to the folk (the syllable) is enlisted.
2d. Evaluation
In the rest of our examples, we discuss the frequent evaluative dimensions in our data. Here, however, we make only a brief note of how phonology is directly appealed to. As we have suggested above, when even fine-grained phonological differences between language varieties are discussed, they are often accompanied by value judgments, exaggerated when even more general statements regarding phonology are concerned. This does not mean that we are unconcerned with the influence of folk evaluation of phonological processes, especially historical change, but we examine such questions below. We emphasize here again that most salient of regional varieties, the US South (Niedzielski & Preston 2000: 54). While several respondents acknowledge the pleasantness of Southern speech, it is noteworthy that it never receives the positive evaluations that Michigan English receives (see below). For instance, one respondent states that “the Southern drawl is fun to listen to” but describes Michigan English as “clearer” and “better-pronounced” than other varieties, and in no case do we find such positive evaluations as “clarity” given to Southern dialects. Instead, we see labels such as “lazy” (often paired with “slow”) or “slurred”. It is true that “slurred” was also used for Michigan English, but the respondent used this label directly after stating that Michigan English was “clearer and better-pronounced” than other varieties. Thus, only occasionally is Michigan English not the standard-bearer, but other American varieties are never described with such positive labels. Further examination of the role of evaluation in folk phonology is considered at greater length below.
2e. Summary
Several principal themes emerge from these data, particularly as they relate to the structure of the standard. First, there is a robust belief among the folk that the difference between standard and non-standard is one of ease of articulation: standard English involves more articulatory effort, and nonstandard English is therefore “lazier” (e.g., “g-dropping”). Second, and not unrelated, folk discussions often involve ideas about perceptual separation (i.e., it should not be “slurred”) as it pertains to clarity of speech. Our respondents nevertheless display certain levels of phonological awareness. As we have shown, they often offer relatively complex descriptions of phonological processes, such as intonation patterns and duration effects.
Since folk phonetics and phonology is concerned with levels of awareness, it is no surprise that discussions among the folk involve reference to various aspects of spelling as it relates to different linguistic levels: how it relates to the acquisition of reading and writing, for example, or how spelling dictates how standard English should be pronounced. Because orthography is an overt and imperfect representation of phonology, folk discussions surrounding it are an ideal place to view beliefs about and the levels of awareness of sound. For instance, we find FL discussions of the acquisition of correspondences between orthography and phonemes and the relative difficulty of the English spelling system. Additionally, we find on the one hand moral and intellectual overtones about correct spelling, regardless of how closely the orthography corresponds to the phonology, but also encounter beliefs regarding how closely the standard language adheres to a one-to-one correspondence between orthography and phonemes. We turn now to more theoretically oriented considerations, particularly those that deal with levels of awareness in FL and phonetic and phonological facts.
3. Pholk Phonetics and Phonology and Language Awareness
Traditional work in FL suggests that the level of consciousness examined is that of overt awareness. We will not repeat here the details of attempts to account for awareness in sociolinguistic settings; that work is recently reviewed in Babel (2016) and follows up on such reflections in Silverstein (1981) and Preston (1996). We will, however, try to set the comments shown above within a cognitive, processing environment but along the way expand the notion of FL. To do so, we make use of an oft-revised FL triangle introduced in Niedzielski and Preston (2000: 26). Figure 7.1 shows our most recent version.
From the linguistic input (the top of the triangle), the most important thing to take into consideration for FL is (1) Noticing. For the moment we use this label to refer to the kind of mental operation which results in overt or conscious awareness (e.g., Schmidt 1990; Squires 2016). Linguistic material (a stimulus) appears at ɑ, the inside top of the triangle and may be Noticed (Step 1). This material is Classified (Step 2), i.e., attributed to a group.2 Of course there will be errors in attribution and varying degrees of detail about the group. Once the group is identified, Step 3 takes that information to the underlying “Cultural belief system” where there is a repository of ideas about that group’s characteristics. Steps 4a and b then offer the opportunity for Imbuing, or what Irvine (2001) calls iconization, the process that allows attributes of a group (e.g., “slow”, “intelligent”, “assertive”, “friendly”) to be assigned to the linguistic item itself. Once such imbuing takes place the classifying step may be short-cut in future occurrences, allowing the perceiver to retrieve elements previously imbued into
Figure 7.1 An FL triangle.
the linguistic feature directly from the “Cultural belief system”. Please notice “elements” in that last sentence. The belief system about any group is complex and may even contain contradictory notions. Some of these may be more heavily triggered in some settings and tasks and others in different ones. Representations of the underlying structure of this part of the system as an “attitudinal cognitorium” or “indexical field” can be found in Preston (2010) and Eckert (2008), respectively.3 Step 5 feeds these selected items into the procedural apparatus that guides the response, i.e., bc’. Some are guided into b, the leftmost side of the triangle, where they surface as the classical material of FL—consciously available characterizations. But some are sent to c where they are expressed in behaviors and responses that cannot be given explicit description—they are implicit responses, the classical territory of the social psychologists of language. We are not happy with the exclusion of these c-side responses from FL for several reasons. First, we are not so sure that some of the traditional tasks of language attitude study obtain purely implicit data. In the use of the matched guise technique, for example, respondents still know at a fully conscious level that they are reacting to a language stimulus. Second, we believe that the data from various FL tasks, perhaps especially conversational ones, offer opportunities to investigate implicit as well as explicit factors in the respondent belief systems. Third, we believe that FL findings
and language attitude findings complement one another in interesting and even explanatory ways. We elaborate here on the last of these with evidence for a more inclusive pholk phonology. Niedzielski (1999) is a study of reactions to a test seeking to discover the degree to which respondents from southeastern Michigan (US) could accurately detect pronunciation features that matched their own. She asked southeastern Michigan respondents to carry out a simple vowel matching task. In a sentential context, she played the word “last” with a local raised and fronted vowel, i.e., one actually closer to conservative US English [ɛ] than [æ], and on the test page of the task the speaker was identified as “from Michigan”. She then asked the respondents to listen to three other representations of the word “last” and state which sounded most like the first. Table 7.1 shows the acoustic measures of the three vowels presented. One was the raised and fronted vowel which matched the first (called “actual”), also the normal usage of the respondents. Another was the conservative US English /æ/ vowel, called “Canonical”, and the third was a backer version (close to phonetic [a]), called “Hyperstandard”. Table 7.2 shows the complete failure of these respondents to accurately match the two vowels. No one matched it correctly, and four even chose the exaggeratedly lower and backer vowel. Here is Niedzielski’s explanation. The results of this portion of the perceptual study provide further evidence that White speakers in Detroit feel that they are speakers of “standard” speech. . . . To date, however, no other study sought to determine whether Detroiters felt that the Northern Cities Chain Shifted [NCCS] vowels were “correct” and standard or whether they simply did not hear the shift in their own speech. 

Table 7.1 Formant Values for the /æ/ in “Last” Presented to the Respondents for Matching (Adapted From Niedzielski 1999: 74)
Formant Values of Tokens Chosen by Respondents for Last

No. of Token   F1 (Hz)   F2 (Hz)   Label of Vowel
1              900       1530      Hyperstandard
2              775       1700      Canonical /æ/
3              700       1900      Actual token produced by speaker

Table 7.2 Responses to the “Last” Test (Adapted From Niedzielski 1999: 72)
Last: Token Selection

Token (Label)   1 (Hyperstandard)   2 (Standard /æ/)   3 (Actual Token)
Total           10%                 90%                0%

The present study
provides strong evidence for the latter. Even when faced with acoustic data that suggest otherwise, Detroit respondents select standard vowels as those that match the vowels in the speech of fellow Detroiters. It is not the case, then, that Detroiters assign standard labels to raised peripheral vowels and lowered lax ones, that is, NCCS vowels. Rather, Detroiters simply do not perceive NCCS vowels at this level of consciousness. (Niedzielski 1999: 80–81) From a number of FL tasks we know that the sorts of Michigan respondents Niedzielski studied are not at all shy in expressing their belief in their own standardness. Hand-drawn maps of US regional varieties by such respondents note that their own local variety is “normal” and when asked to assess the “correctness” of the English of every state, rate Michigan highest (Niedzielski & Preston 2000: 64). In the following a Michigan respondent answers a nonnative fieldworker’s question about where standard English is spoken. G: If you have such a thing as called standard English other than textbook English, it would probably be the language that you’re hearing right now. As you listen to the Midwestern. In other words, a variety of overt FL studies confirm Niedzielski’s suspicion that the folk stereotype of local “correctness” for Michigan speakers implicitly interferes with their ability to carry out a phonetic perception task in which that stereotype is challenged.4 We believe, however, that we can push this expanded notion of FL (i.e., considering both explicit and implicit matters) even further in the service of phonology. Niedzielski’s 1999 study looked at only a few of the rotating vowels in the Northern Cities Shift (NCS), shown schematically in Figure 7.2. Preston (2005) studied all these shifted vowels in a single-word comprehension test among local southeastern Michigan speakers. 
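The acoustic side of the matching task in Table 7.1 can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that the stimulus had the same formant values as token 3 (the table labels that token as the one actually produced by the speaker), and it uses plain Euclidean distance in raw Hz as a rough stand-in for acoustic similarity. Neither the distance measure nor the variable names come from Niedzielski (1999).

```python
import math

# (F1, F2) values in Hz for the three "last" tokens, copied from Table 7.1.
tokens = {
    "hyperstandard": (900, 1530),
    "canonical": (775, 1700),
    "actual": (700, 1900),
}


def formant_distance(a, b):
    """Euclidean distance between two (F1, F2) points, in Hz (a crude proxy)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Assumption: the stimulus is acoustically identical to the "actual" token.
stimulus = tokens["actual"]
distances = {label: formant_distance(stimulus, f) for label, f in tokens.items()}
closest = min(distances, key=distances.get)

for label, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{label:14s} {d:6.1f} Hz from the stimulus")
print("acoustically closest token:", closest)
```

On these values the canonical token lies roughly 214 Hz from the stimulus and the hyperstandard token roughly 421 Hz, yet Table 7.2 reports that no respondent chose the acoustically identical token.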
In a similar study outside the NCS area, Peterson and Barney (1952) showed excellent comprehension (over 90% correct) of nearly all vowels in the US English system in an /h_d/ environment, but Preston’s study found considerable misperception of NCS vowels, some near chance, while those not involved in the NCS showed Peterson and Barney-like success. The aim of this study, however, did not focus on the rate of misunderstanding, but on the direction.5 That is, what vowel was heard instead of the speaker’s intended one? Figure 7.2 suggests the following:
1. /æ/ could be misunderstood as /e/ or as shifted /ɪ/
2. /ɑ/ could be misunderstood as /ɛ/ if the latter has lowered (see Note 6)
3. /ɔ/ could be misunderstood as /ʌ/
4. /ɛ/ will not be misunderstood since it does not reach shifted /ʌ/ or /ɔ/ but could be misunderstood as /ɑ/ if it has lowered (see Note 6)
5. /ɪ/ could be misunderstood as /e/ or shifted /æ/.
Figure 7.2 The Northern Cities Chain Shift Illustrated by the Formants for an Urban Southern Michigan Female Speaker (in Squares) and Those for a More Conservative US Female Vowel System as Reported in Peterson and Barney (1952) (in Circles). Step 1, /æ/ raises and fronts; step 2, /ɑ/ lowers and fronts; step 3, /ɔ/ lowers and fronts; step 4, /ɛ/ lowers and backs;6 step 5, /ʌ/ backs, and step 6, /ɪ/ lowers and backs.
Let’s try a second, however unlikely, scenario in which the NCCS speaker makes mistakes not based on their own system but based on the older, Peterson and Barney-like conservative system. In this scenario the following might occur:
1. /æ/ might be misunderstood as /ɛ/
2. /ɑ/ will not be misunderstood since it does not reach the space vacated by /æ/
3. /ɔ/ could be misunderstood as /ʌ/ or /ɑ/
4. /ɛ/ will not be misunderstood since it does not reach /ʌ/, /ɔ/, or /ɑ/ territory but could be misunderstood as /æ/ if it has lowered (see Note 6)
5. /ɪ/ could be misunderstood as /e/ or /ɛ/.
These are very different predictions, and Table 7.3 shows how they were realized in the comprehension task.
Table 7.3 Confusion Matrix of NCS Shifted Vowel Misunderstandings. Greyed cells are correct responses; bold numbers refer to errors made principally in the “conservative” direction, and italicized ones refer to errors made principally in the “shifted” direction (Preston 2018: 14)

                      Respondent identifications
NCS stimuli   Total   /ɑ/   /ʌ/   /æ/   /ɛ/   /ɔ/   /ɪ/   other
/ɑ/           431     357   0     72    1     0     0     0
/ʌ/           331     6     287   4     6     21    0     3
/æ/           432     0     0     366   66    0     0     2
/ɛ/           429     0     111   10    298   0     1     7
/ɔ/           432     216   16    8     0     183   1     8
/ɪ/           288     1     0     3     162   0     122   0

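The counts in Table 7.3 can be summarized per stimulus vowel with a few lines of code. The sketch below is a hypothetical reading aid, not part of Preston's analysis: it computes, for each stimulus, the proportion of correct identifications and the single most frequent misidentification. As an assumption of this sketch, the denominators are the row sums of the response counts rather than the printed Total column, so that each row's proportions sum to one.

```python
# Response counts from Table 7.3: one row per NCS stimulus vowel, with entries
# in the order given by RESPONSES. The printed "Total" column is not used.
RESPONSES = ["ɑ", "ʌ", "æ", "ɛ", "ɔ", "ɪ", "other"]
COUNTS = {
    "ɑ": [357, 0, 72, 1, 0, 0, 0],
    "ʌ": [6, 287, 4, 6, 21, 0, 3],
    "æ": [0, 0, 366, 66, 0, 0, 2],
    "ɛ": [0, 111, 10, 298, 0, 1, 7],
    "ɔ": [216, 16, 8, 0, 183, 1, 8],
    "ɪ": [1, 0, 3, 162, 0, 122, 0],
}


def summarize(counts):
    """Return {stimulus: (proportion correct, top error vowel, error count)}."""
    summary = {}
    for stimulus, row in counts.items():
        total = sum(row)  # assumption: row sum, not the printed Total
        by_response = dict(zip(RESPONSES, row))
        correct = by_response.pop(stimulus)
        by_response.pop("other")  # "other" is not a vowel category
        top_error = max(by_response, key=by_response.get)
        summary[stimulus] = (correct / total, top_error, by_response[top_error])
    return summary


summary = summarize(COUNTS)
for stimulus, (rate, err, n) in summary.items():
    print(f"/{stimulus}/: {rate:.1%} correct; most often misheard as /{err}/ ({n})")
```

On these counts, /ɔ/ is identified as /ɑ/ more often than as /ɔ/ (216 versus 183), and /ɪ/ is identified as /ɛ/ more often than as /ɪ/ (162 versus 122).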
With the exception of one /ɑ/ heard as /ɛ/, three /ɪ/s heard as /æ/, and the ambiguous results with regard to /ɛ/’s confusion with /ʌ/, the overwhelming number of errors refer to phonetic positions that reflect the conservative or Peterson and Barney-like vowel system, not the system of the respondents themselves. This confirms Niedzielski’s (1999) results that southeastern Michigan respondents refuse to hear NCS-shifted tokens and that the tokens they believe they hear are deeply embedded in a system that does not guide their production. Their staunch belief that they are speakers of the standard is the guiding FL principle, but these imaginary tokens that nonconsciously represent the standard must come from somewhere. Preston (2018) speculates as follows: [E]vidence of them [i.e., vowel positions in the conservative system] is still around in older speakers, in the speech of speakers from other areas, in media language, and even in some conservative environments of a shifted speaker’s own system. (15–16) And there is newer evidence that this burden of the standard may be causing some reversals in the system among younger speakers, perhaps even at least partly based on overt awareness (Wagner et al. 2016), but we put aside consideration of that more recent development in what follows and return to it in our concluding remarks. Figure 7.3 shows the beginnings of a scenario for the forward and upward movement of /æ/ and the fronting of /ɑ/. The outlined space in Figure 7.3 shows a conservative area for /ɑ/ with 1550Hz as its F2 center (the black square), but in the /æ/ space to the left (i.e., a fronter F2 area), there is one token in which the speaker intended an /ɑ/. If this token is misunderstood as /æ/, there is likely to be no influence on the hearer’s vowel space configuration, but Figure 7.4 shows that the fronted /ɑ/ token is left behind when /æ/ itself is fronted (see
Figure 7.2), and that an adjustment to the hearer’s vowel space results in a new center of the area for /ɑ/. This appears to be pretty normal vowel change, certainly flying under the radar of the hearers, but we believe there is an FL component to it, illustrated in Figure 7.5. The dotted lines outline a box where NCS speakers retain a conservative perceptual apparatus for /æ/, one brought into play when their standardness
Figure 7.3 Distribution of Tokens of the English Low Vowels and the Beginnings of the NCS Realignment of /ɑ/. Source: Preston 2011: 18, Adapted from Labov 2002.
Figure 7.4 The First Stage of the NCS for Low Vowels, Showing the Fronting of /æ/ and With the Black Square the New F2 Center for /ɑ/. Source: Preston 2011: 18, Adapted from Labov 2002.
Figure 7.5 A Hypothetical Conservative /æ/ Vowel Territory for NCS Speakers. Source: Preston 2011: 29, Adapted from Labov 2002.
is challenged or even when standard language is focused on in a matching or comprehension task. Seventy-two /ɑ/ tokens are heard as /æ/ (the white squares and black dots within the dotted line square). Such mishearings did not interfere with the nonconscious progress of the NCS, but they show a reliance on a conservative ideology in some processing tasks. Even more importantly, they show how this conservative ideology lies at the basis of the shift. If these speakers are firm believers in their standardness, why would they tolerate change? Their FL belief in a standard allowed them to construct an alternative imaginary phonology that continued to satisfy their ideological urges; they did not hear themselves or others like them use the shifted items since they had another conservative system at their disposal. We believe that such reasoning as this helps fulfill the explanatory detail sought after in the desideratum stated by Weinreich et al. (1968) at the beginning of this chapter: “The theory of language change must establish empirically the subjective correlates of the several layers and variables in a heterogeneous structure” (186). But it is the knowledge of the overt FL belief in local standardness that buttresses the odd perceptual mismatches that, in turn, explain the progress in a shift that would have had little chance of operating without some way of maintaining the belief in local standardness. In this we may slightly disagree with Eckert and Labov (2017): Our review first assembles evidence that social meaning is deeply involved in phonological variation and that phonological change is frequently motivated and accelerated by the association of social meaning with the more concrete components of linguistic structure. (491)
In this case the social meaning of correctness is associated with what should be a retarding rather than motivating fact, but its realization in an abstract system seems to be empowering for the change itself. We believe there are many such stories to be told in the FL world of phonology. Some of them deal directly with FL observations but others rely on FL experiments and tasks that just as surely reveal nonconscious or implicit facts that may be even more germane to studies of variation and change. From a pholk phonology perspective, for example, Plichta (2004) determined that the overt outsider perception that Great Lakes area US pronunciation was “nasal” was indeed accurate: vowels involved in the NCS in nonnasal environments were, in fact, considerably nasalized, a phonetic discovery triggered by many FL observations in which Michigan, Wisconsin, and Minnesota are often identified as having “nasal accents”.
Implicit folk beliefs emerge as well. Koops et al. (2008) tested implicit knowledge of the /ɪ/ ~ /ɛ/ merger in Houston, Texas. The two are merged by older Anglo (i.e., non-Hispanic and white) speakers, but they are unmerging in the speech of younger speakers. Although previous studies have shown that Houstonians associate the merger with rural areas, age was not considered. Eye-tracking allowed researchers to determine how long respondents focused their gaze on potential (partial) homophones (rinse and rent); the screen showed a photograph of one of three women: “young”, “middle-aged”, and “older”, and the audio samples were spoken by younger and older speakers. The older voice and age suggested by the picture influenced the time a respondent spent looking at “rent” when the stimulus contained the [ɪ] vowel; that combination caused the subjects to look longer at the “rent” alternative; the younger voice and picture when combined with [ɪ] triggered rapid fixation on “rinse”.
Although the respondents do not have explicit knowledge of the correlation between age and merger, the eye-tracker reveals that they have implicit knowledge of who is likely to merge and who is not. Sociolinguists’ knowledge of the fact that respondents make this connection is important to any study of the advancing distinction. In this case, although a great deal of FL belief is focused on the fact that younger speakers are “ruining the language”, in much of the US South older speakers are associated with rurality and less well-educated language use. This FL fact suggests that the young urban Houstonians who are involved in change are at least in part responding to a more modern, standard, and nonlocal ability to distinguish the two sounds, one consistent with their overt stereotypes.7 We cannot conclude without speculating on even more intimate links between folk phonetic and phonological facts and linguistic representations of the same territory. If Figures 7.3, 7.4, and 7.5 suggest something of an exemplar model of the phonetics/phonology relationship, perhaps further consideration of the FL realization of this dual perception of the Michigan vowel system by our respondents is in order. Let’s pretend that
the black square in Figure 7.3 is the prototype position for the English vowel /ɑ/; i.e., that it is the center of the densest cloud (which does not entail its being an actual exemplar, Pierrehumbert 2001). Figure 7.4 shows a somewhat normal scenario in which a sound change has taken place. Due to the forward movement of /æ/, some tokens in its former territory are now understood as /ɑ/, and, with sufficient exemplars in that space, a new prototype center has been established. Figure 7.5, however, shows the potential disruption of perception by sociocultural facts, the explanations used for the mishearings reported in Table 7.3. If southeastern Michigan respondents cannot accurately hear their own (and their own speech community’s) vowels, a number of important issues emerge. Do exemplars misplaced due to such factors as linguistic security count or not count in the creation of the density characteristics of clouds of exemplars? Are these misplaced exemplars only activated by certain tasks in which prescriptivist urges are triggered in respondents? Since social information is encoded with exemplars, do these mishearings mean that social groups are assigned incorrect exemplary systems? If one prefers a feature-based phonology, this reasoning does not seem to be at odds with it (Ettlinger & Johnson 2009). Whatever the answer to these questions, and we have proposed some tentative ones, they would seem to bear heavily on the studies of the perceptual bases of language production and no doubt on those of variation and change. In this particular case, for example, the more recent work by Wagner et al. (2016) that shows a reversal of the NCS can be interpreted not as a reversal from some exterior source nor some strange remembrance of things past but as the activation of a system already present in the local speakers of the variety.
As noted earlier, we take such matters to be part of the solution to the evaluation problem, and they are ignored in the study of phonological systems at some risk. With this in mind, we also recognize the avowed importance of such considerations in natural phonology and morphology as included in IV in the following:
Universals of human language (I) are properties (e.g., phonological processes) which can be scaled along parameters of naturalness from most to least natural. Accordingly, some of them appear in all languages, some of them enter implicational scales of applicability and, finally, some of them are totally suppressible. A selection of universal properties constitutes a language type (II) (e.g., iso-accentual languages, quantity-sensitive languages). Properties (I) and (II) are filtered by the system of an individual language in order to make them comply with the properties defining this system (III). The choices within an individual system undergo further specification via sociolinguistic norms (IV) [italics ours] and conditions of usage (V). Performance itself feeds back into the universals. (Dziubalska-Kołaczyk 2002: 32)
In conclusion, the list of studies that link phonetic and phonological change that can be related to FL, both overt and covert, could be considerably expanded over the ones surveyed here.8 We hope to have shown, however, that “pholk phonetics and phonology” are not only very interesting parts of the ethnography and language ideology of a speech community but also vital parts of the investigation of sound systems and sound change.
Notes
1. This is much too facile to satisfy any folklorist; see, for example, the discussion in Dundes (1980) and the numerous definitions of folklore at the American Folklore Society’s website: www.afsnet.org/page/WhatIsFolklore.
2. This is the step that associates a linguistic item with a social identity. It is a commonplace in both exemplar theory (e.g., Pierrehumbert 2003) and Sprachdynamik (e.g., Schmidt & Herrgen 2011).
3. Preston’s “attitudinal cognitorium” is borrowed from Bassili and Brown (2005), and Eckert’s “indexical field” makes direct reference to Silverstein (2003).
4. This misperception on the basis of folk belief has been replicated in a number of other studies. In Plichta and Preston (2005), for example, US women’s voices were rated as “more Northern” and men’s as “more Southern” in a task in which the variable element was resynthesized; that is, the clues to regionality were exactly the same for men and women but the positioning on a North-South continuum was significantly different. The FL interferences with correct perception in this case were (1) the belief that women speak more standard English than men, and (2) the belief that better English is spoken in the North.
5. Labov (2010, Part A: 18–85) reports extensively on cross-dialectal misunderstandings, but the focus here is on same-dialect misunderstandings.
6. /ɛ/ has an alternative path not shown in Figure 7.2 in which it lowers into the territory previously occupied by conservative /æ/.
7. We are aware that not all implicit reaction is in line with overt belief. Kristiansen (2009), for example, believes that only implicit reactions and beliefs guide language change.
8. A considerable number of such studies can be found in Parts II (Studies of Perception) and III (Studies of Perception and Production) in Preston and Niedzielski (2010).
References American Folklore Society. What is Folklore. www.afsnet.org/page/WhatIsFolklore. Babel, A. (ed.). 2016. Awareness and control in sociolinguistic research. Cambridge: Cambridge University Press. Bassili, J. N. & R. D. Brown. 2005. Implicit and explicit attitudes: Research, challenges, and theory. In D. Albarracín, B. T. Johnson & M. P. Zanna (eds.), The handbook of attitudes, 543–574. Mahwah, NJ & London: Lawrence Erlbaum Associates. Bishop, J. & B. Kim. 2018. Anticipatory shortening: Articulation rate, phrase length and lookahead in speech production. Speech Prosody 48. 235–239.
Dundes, A. 1980. Interpreting folklore. Bloomington: Indiana University Press. Dziubalska-Kołaczyk, K. 2002. Beats-and-binding phonology. Frankfurt/Main: Peter Lang. Eckert, P. 2008. Variation and the indexical field. Journal of Sociolinguistics 12(4). 453–476. Eckert, P. & W. Labov. 2017. Phonetics, phonology, and social meaning. Journal of Sociolinguistics 21(4). 467–496. Ettlinger, M. & K. Johnson. 2009. Vowel discrimination by English, French and Turkish speakers: Evidence for an exemplar-based approach to speech perception. Phonetica 66. 222–242. Hyman, L. 2006. Word-prosodic typology. Phonology 23. 225–257. Hymes, D. 1972. Models of the interaction of language and social life. In J. J. Gumperz & D. Hymes (eds.), Directions in sociolinguistics: The ethnography of communication, 35–71. New York: Holt, Rinehart and Winston. Irvine, J., 2001. “Style” as distinctiveness: The culture and ideology of linguistic differentiation. In P. Eckert and J. R. Rickford (eds.), Style and sociolinguistic variation, 21–43. Cambridge: Cambridge University Press. Koops, C., E. Gentry & A. Pantos. 2008. The effect of perceived speaker age on the perception of PIN and PEN vowels in Houston, Texas. University of Pennsylvania Working Papers in Linguistics 14(2) (= Selected papers from NWAV 36) http.repository.upenn/pwpl/vol14/iss2/ (accessed October 10, 2010). Kristiansen, T. 2009. The macro-level social meanings of late-modern Danish accents. Acta Linguistica Hafniensia 41. 167–192. Labov, W. 2002. Driving forces in linguistic change. A paper presented to the International Conference on Korean Linguistics, Seoul National University, August 2. Labov, W. 2010. Principles of linguistic change, Vol. 3: Cognitive and cultural factors. Oxford: Wiley-Blackwell. Niedzielski, N. 1999. The effect of social information on the perception of sociolinguistic variables. Journal of Language and Social Psychology 18. 62–85. DOI: 10.1177/0261927X99018001005. Niedzielski, N. & D. R. Preston. 2000. 
Folk linguistics. Berlin: Mouton de Gruyter. Peterson, G. E. & H. L. Barney. 1952. Control methods used in a study of the vowels. Journal of the Acoustical Society of America 24(2). 175–184. Pierrehumbert, J. B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. In J. Bybee & P. Hopper (eds.), Frequency effects and emergent grammar, 137–157. Amsterdam: John Benjamins. www.ling.nwu.edu/ jbp. Pierrehumbert, J. B. 2003. Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech 46. 115–154. Plichta, B. 2004. Interdisciplinary perspectives on the Northern Cities Chain Shift. Ph.D. dissertation. East Lansing, MI: Michigan State University. Plichta, B. & D. R. Preston. 2005. The /ay/s have it. In T. Kristiansen, N. Coupland & P. Garrett (eds.), Acta linguistica Hafniensia 2005 (Subjective processes in language variation & change). 107–130. Preston, D. R. 1989. Perceptual dialectology. Dordrecht: Foris. Preston, D. R. 1996. “Whaddayaknow?”: The modes of folk linguistic awareness. Language Awareness 5(1). 40–74. Preston, D. R. 1997. The northern cities chain shift in your mind. In A. Thomas (ed.), Issues and methods in dialectology, 37–45. Bangor: Department of Linguistics, University of Wales Bangor.
Preston, D. R. 2005. Belle’s body just caught the fit gnat: The perception of Northern Cities shifted vowels by local speakers. University of Pennsylvania Working Papers in Linguistics (= Selected papers from NWAV 33) 11(2). 133–146. Preston, D. R. 2010. Variation in language regard. In P. Gilles, J. Scharloth & E. Zeigler (eds.), Variatio delectat: Empirische Evidenzen und theoretische Passungen sprachlicher Variation (für Klaus J. Mattheier zum 65. Geburtstag), 7–27. Frankfurt am Main: Peter Lang. Preston, D. R. 2011. The power of language regard: Discrimination, classification, comprehension, and production. In D. Speelman, S. Grondelaers & J. Nerbonne (eds.), Proceedings of Production, Perception, Attitude 2009. Dialectologia special issue II. 9–33. Preston, D. R. 2016. Whaddayaknow now? In A. Babel (ed.), Awareness and control in sociolinguistic research, 177–199. Cambridge: Cambridge University Press. Preston, D. R. 2018. Language regard: What, why, how, whither? In B. Evans, E. Benson & J. Stanford (eds.), Language regard: Methods, variation, and change, 3–30. Cambridge: Cambridge University Press. Preston, D. R. & N. Niedzielski (eds.). 2010. A reader in sociophonetics (= Trends in Linguistics. Studies & Monographs [TiLSM] 219). Berlin & New York: De Gruyter Mouton. Schmidt, J. E & J. Herrgen. 2011. Sprachdynamik: Eine Einführung in die moderne Regionalsprachenforschung (= Grundlagen der Germanistik. 49). Berlin: Erich Schmidt Verlag. Schmidt, R. W. 1990. The role of consciousness in second language learning. Applied Linguistics 11. 129–158. Silverstein, M. 1981. The limits of awareness. Sociolinguistic working paper #84. Austin, TX: Southwest Educational Development Laboratory. Silverstein, M. 2003. Indexical order and the dialectics of sociolinguistic life. Language and Communication 23. 193–229. Squires, L. 2016. Processing grammatical differences: Perceiving versus noticing. In A. Babel (ed.), Awareness and control in sociolinguistic research, 80–103. 
Cambridge: Cambridge University Press. Wagner, S. E., A. Mason, M. Nesbitt, E. Pevan & M. Savage. 2016. Reversal and re-organization of the Northern Cities Shift in Lansing, Michigan. University of Pennsylvania Working Papers in Linguistics 22(2). Selected Papers from NWAV 44, 171–179. Weinreich, U., W. Labov & M. I. Herzog. 1968. Empirical foundations for a theory of linguistic change. In W. P. Lehmann & Y. Malkiel (eds.), Directions for historical linguistics, 95–188. Austin, TX & London: University of Texas Press.
8 Rhythm Zone Theory
Speech Rhythms Are Physical After All
Dafydd Gibbon and Xuewei Lin
1 Rhythm in a Semiotic Framework
Rhythms in music and language are semiotic events: regularly repeated structured temporal patterns of human experience in performing and perceiving music, dance and speech. More metaphorically, the term also applies to events of non-human origin such as animal sounds, and to regularly repeated spatial patterns in the visual arts and in the dynamics of natural phenomena. The aim of the present chapter is to examine the frequencies of speech rhythms in the newly developed framework of Rhythm Zone Theory (RZT), and to illustrate a possible application domain in the field of foreign language fluency assessment, using two non-fluency markers derived from RZT. These two aspects of theory and practice relate to two of the main research interests amicae optimae laudataeque libri huius. This work is exploratory and concerned with methodological issues; the case study is illustrative of the method rather than primarily evidential.
The background to the work is formulated in Time Type Theory (Gibbon 1992, 2006; Gibbon & Griffiths 2017), which provides an ontology of four linguistically relevant time concepts: (1) abstract categorial time, as in duration contrasts between long and short vowels; (2) abstract relational (‘rubber’) time, as in the sequential and hierarchical relations postulated in linguistic descriptions; (3) clock time, as in measurements of time points and intervals in a speech signal; and (4) cloud time, as in the intuitively perceived timing of actual utterances as they are made. The present chapter is concerned with clock time.
From a semiotic point of view, rhythms have functional, formal and physical characteristics: functions in communication; forms as sequences, hierarchies and parallel streams; and physical characteristics in the movements of a musician or in the movements of the organs of speech in the vocal tract.
In the long history of the scientific treatment of rhythms, the complex interactions of the three semiotic principles of function, form and physics have often, intuitively, been taken to reflect emotions associated with rhythmical aspects of human behaviour and perception, such as faster and slower breathing, heartbeats or limb movements which are determined by the properties of human anatomy and physiology. The
literature on these topics is legion, and the present chapter focuses only on a very small part of this literature. The functional aspects of rhythms are perhaps the most complex and the least researched: the importance of rhythms as cohesive means of framing speech and music into coherent, manageable information patterns is perhaps most obvious in speech, and the emotional heartbeats of rhythm are perhaps most obvious in music, but rhythms in both speech and music share cohesive and emotional functions. A more general functionality of rhythm is described in a thought experiment of whether there could be a world with time as its only dimension (Strawson 1959), in which dynamic changes in amplitude (and thus also rhythms) may be interpreted as approaching and disappearing sound source objects. The formal characteristics of rhythms are linear and hierarchical patterns of sounds in time. In linguistics, particularly in the phonology of sentences and words, there is an extensive field of research in modelling these patterns, most clearly represented in the nuclear stress rule and the compound stress rule of generative phonology (after Chomsky & Halle 1968), the metrical grid (after Liberman & Prince 1977), the prosodic hierarchy (Selkirk 1984), beats and binding (Dziubalska-Kołaczyk 2002) and other variants of and successors to generative phonology. In linguistic studies a certain scepticism about phonetic studies of rhythm reigns, suggesting that rhythms are primarily cognitive constructs, or even not identifiable in physical terms at all. The physical characteristics of rhythms are found in the dynamics of the production of sounds with musical instruments and the voice, and in the perception of these sounds. In musicology and in phonetics, there have been many approaches to capturing, describing and explaining the physical characteristics of rhythm. 
In phonetics, much effort has been spent on investigating ‘the’ rhythm of a language, dialect or idiolect with various phonetic methods, for instance by investigating repeated temporal patterns aligned with syllables and words. These studies have not been particularly successful, and many have relied on human filtering of the speech signal through the procedures of manual annotation (and automatic annotation, i.e., annotation by supervised machine learning involving bootstrapping with manual annotations). More success has been achieved by studies of rhythms as oscillations (see Section 3).
A terminological clarification is necessary at this point. Speech involves approximate frequency ranges of three different types, of which only the first is relevant for the present study:
1. From 0 Hz . . . 20 Hz: the domain of the varying frequencies and their phases which characterise the rhythms of speech sounds, syllables, words, phrases and larger discourse units, which determine the low frequency outline (the amplitude envelope) of the speech signal; events in this frequency range are perceived as separate beats rather than as tones.
2. 80 Hz . . . 400 Hz (adult male and female voices): the domain of the fundamental frequency of the voice, which relates to tones, pitch accents and intonation, the domain usually shown in F0 tracks, ‘pitch’ tracks; in this frequency range, events are perceived as tones.
3. 80 Hz . . . 4000 Hz: the domain of the spectral formants shaped by the oral and nasal cavities of the vocal tract, which characterise vowels and consonants and voice quality, the domain usually shown in spectrograms; particularly in the mid and upper sections of this frequency range, events are perceived as sound qualities.
The following sections concentrate on these temporal physical characteristics, and show that ‘the’ rhythm of a language is best not thought of as ‘the’ rhythm at all: there are many rhythms, in different temporal domains, and each of the rhythms is highly variable both in frequency and in phase. The physical characteristics of speech rhythms are measurable and visualisable using signal processing methods, also in neurophysiological domains. Finally, an application of this recent methodology in a practical field will be demonstrated: the capturing of temporal non-fluency in readings by low proficiency adult Cantonese learners of L2 English.
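The first of the three frequency ranges, the low-frequency outline or amplitude envelope, can be made concrete with a small signal-processing sketch. The Python example below is only an illustration under stated assumptions and is not the chapter's own implementation: it builds a synthetic signal (a 150 Hz carrier, amplitude-modulated at an assumed 'syllable-like' rate of 5 Hz), takes the rectified signal as a crude amplitude envelope, and searches the envelope spectrum below 20 Hz for its strongest component. The function name `envelope_spectrum_peak`, the sampling rate and all signal parameters are hypothetical.

```python
import math

def envelope_spectrum_peak(signal, sample_rate, max_freq=20.0):
    """Return the strongest envelope frequency (Hz) in the range (0, max_freq]."""
    n = len(signal)
    envelope = [abs(x) for x in signal]          # full-wave rectification
    mean = sum(envelope) / n
    envelope = [e - mean for e in envelope]      # remove the 0 Hz offset
    resolution = sample_rate / n                 # spacing of DFT bins in Hz
    best_freq, best_magnitude = 0.0, -1.0
    k = 1
    while k * resolution <= max_freq:
        # Direct DFT, evaluated only for the low-frequency bins of interest.
        angle = 2 * math.pi * k / n
        re = sum(e * math.cos(angle * i) for i, e in enumerate(envelope))
        im = sum(e * math.sin(angle * i) for i, e in enumerate(envelope))
        magnitude = math.hypot(re, im)
        if magnitude > best_magnitude:
            best_freq, best_magnitude = k * resolution, magnitude
        k += 1
    return best_freq

# Synthetic test signal (all values are illustrative assumptions):
# a 150 Hz "voice" carrier whose amplitude is modulated at 5 Hz.
sample_rate = 2000
n_samples = 2 * sample_rate                      # two seconds
signal = [
    0.5 * (1 + math.sin(2 * math.pi * 5 * i / sample_rate))
    * math.sin(2 * math.pi * 150 * i / sample_rate)
    for i in range(n_samples)
]

peak_hz = envelope_spectrum_peak(signal, sample_rate)
print(peak_hz)
```

For this synthetic signal the strongest envelope component lies at the 5 Hz modulation rate, well inside the 0 Hz to 20 Hz range, while the 150 Hz carrier belongs to the second range. Real speech would need a proper low-pass filter and would show several, more variable, envelope components.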
2 Irregularity and Isochrony: The Annotation Method
One basic method of investigating speech timing is to align linguistic units with segments of the speech signal, measure the duration of these units, and perform descriptive statistical analyses and structure building on these duration measurements. Annotation (also known as labelling or phonetic alignment) is a method of pairing the components of a transcription of a speech recording (labels) with time-stamps which indicate the beginning and end, or the beginning and length (more rarely: the middle) of these components, with the aid of speech visualisation software. For manual annotation, the most popular software tool is Praat (Boersma 2001); cf. also Wavesurfer (Beskow & Sjölander 2004), Transcriber (Barras et al. 2001), Annotation Pro (Klessa & Gibbon 2014). The annotations produced with each tool are largely interconvertible. For semi-automatic annotation, a convenient software tool is SPPAS (Bigi 2015). The first tools for the annotation method were originally developed in speech technology, for bootstrapping supervised machine learning procedures in statistical automatic speech recognition.
Three main kinds of approach based on the annotation method have emerged: one-dimensional, two-dimensional and three-dimensional irregularity and isochrony models. The one-dimensional approaches calculate an index from the descriptive statistics of label durations. These approaches have been used to investigate the temporal typology of different spoken languages, and are necessarily based on an assumption that the kind of unit to be labelled (vocalic and consonantal sequences; syllables; feet; words) is universally found in all languages. The simplest index is the standard deviation of the durations of the units of the relevant type (for variants of this index cf. Roach 1982; Scott et al. 1985). One problem
with this approach is the hidden factor of speech tempo: the unit rate per second may vary for different reasons during an utterance. This hidden factor of speech tempo variation is abstracted out by the normalised Pairwise Variability Index (nPVI), which averages the normalised duration differences between neighbouring units, yielding an index with an asymptote of 200: 0≤i
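The contrast between the two one-dimensional indices described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the annotation is a hypothetical list of (label, start, end) triples, the helper names `durations` and `npvi` are assumptions, and the nPVI is computed in the form in which it is commonly stated in the literature, that is, as 100 times the mean of the absolute differences between neighbouring durations, each divided by the mean of the pair.

```python
import statistics

def durations(annotation):
    """Turn (label, start, end) triples into a list of durations in seconds."""
    return [end - start for _label, start, end in annotation]

def npvi(durs):
    """Normalised Pairwise Variability Index over neighbouring durations."""
    pairs = list(zip(durs, durs[1:]))
    # Each term lies between 0 and 2, so the index approaches but never reaches 200.
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Hypothetical syllable annotation: (label, start time, end time) in seconds.
annotation = [("a", 0.00, 0.10), ("b", 0.10, 0.30), ("c", 0.30, 0.40), ("d", 0.40, 0.60)]
durs = durations(annotation)
slow = [2 * d for d in durs]  # the same pattern at half the tempo

print(statistics.pstdev(durs), statistics.pstdev(slow))
print(npvi(durs), npvi(slow))
```

Halving the tempo doubles the standard deviation but leaves the nPVI unchanged, which is the sense in which the index abstracts away from tempo variation. Because each pairwise term is bounded by 2, the index stays below 200 however extreme the alternation.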
       most Slavic    Serbo-Croatian    Ukrainian
FEN    −              +                 +
 >
IEN    −              −                 +
The typology above does not show an option in which IEN would be a licenser but FEN would not. This appears to be theoretically precluded.⁶
7. Conclusions
Observed laryngeal phenomena are misleading with respect to the actual nature of the phonological system that stands behind them. The only relevant generalization that can be made on the basis of phonetic facts is whether there is a laryngeal contrast in a given system. It is not always clear which criteria deem a particular laryngeal phenomenon phonetic or truly phonological. Phonologically relevant aspects are: i) the type of laryngeal representation and ii) laryngeal licensing, which is responsible for the distribution of laryngeal categories within the word. As for representation, a strictly privative approach seems to provide more insight, especially since some variation may be due to a different system of marking (Cyran 2014). Laryngeal licensing, on the other hand, is due to the function of V positions following the relevant obstruent. Full vowels always license the laryngeal distinction and may impart this property to a preceding IEN if the intervening consonant is not an obstruent. Empty nuclei do not license laryngeal distinctions unless they inherit this property (IEN), or they are parametrically set to license. In the latter case, IEN as an independent licenser implies that FEN is one too.
All the parametric and structural configurations discussed in this chapter contribute to a very simple view of the very complex variation in voicing phenomena. The relevant empirical division into the two patterns in (2) and (3) is due to the distinction between IEN and FEN and their respective laryngeal licensing settings. Both types of empty nuclei are present in the representation due to the general design of CVCV phonology.
Given that boundaries are typically viewed as triggers and blockers of phonological phenomena, only FEN seems to fulfil both criteria. It triggers delaryngealization and blocks transmission of laryngeal licensing from the following context. The only observable phenomenon that ignores almost all boundaries is RVA. It is suggested in this chapter that the phenomenon does not involve formal spreading of a laryngeal property. It is probably best understood as an anticipatory articulatory process which may only to some extent be conditioned by phonology. Namely, as we saw in Ukrainian and Serbo-Croatian, some types of RVA require a neutral or neutralized target.
This chapter is one of the initial steps in a research programme aiming to provide a fuller typology of laryngeal systems from the perspective of a phonological model, rather than from one based on observable phonetic facts.
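The licensing conditions summarised in these conclusions can be restated as a toy enumeration. The Python sketch below is an illustration only, not part of the chapter's formalism: the two parameters are modelled as booleans, the implication "IEN as an independent licenser implies that FEN is one too" is used as a filter on the four logically possible settings, and a small helper returns whether a given nucleus licenses the laryngeal distinction. The names `is_possible_system` and `nucleus_licenses`, and the `inherits` flag standing in for licensing inheritance from a following full vowel across a non-obstruent, are assumptions made for this example.

```python
from itertools import product

def is_possible_system(fen_licenses, ien_licenses):
    """IEN as an independent licenser implies that FEN licenses too."""
    return fen_licenses or not ien_licenses

def nucleus_licenses(kind, params, inherits=False):
    """Does a nucleus of this kind license the laryngeal distinction?"""
    if kind == "V":        # full vowels always license
        return True
    if kind == "IEN":      # by parameter setting, or by inheritance
        return params["IEN"] or inherits
    if kind == "FEN":      # by parameter setting only
        return params["FEN"]
    raise ValueError(f"unknown nucleus type: {kind}")

# Enumerate all four parameter combinations and keep the admissible ones.
systems = [
    {"FEN": fen, "IEN": ien}
    for fen, ien in product([False, True], repeat=2)
    if is_possible_system(fen, ien)
]

for system in systems:
    print(system)
```

Three settings survive the filter, matching the three columns of the typology given before these conclusions; the excluded setting is the one in which IEN licenses but FEN does not.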
Notes
1. Progressive assimilation, pre-sonorant sandhi voicing, or other issues such as sonorant transparency and opacity, will be kept aside in this discussion.
2. Phrase boundary is ignored here because no RVA is expected if the final obstruent is followed by silence.
3. RVA is therefore symmetrical observationally, not phonologically.
4. For syllabic and non-syllabic approaches to the distribution of the laryngeal contrast in Polish, see e.g., Bethin (1984, 1992), Gussmann (1992), Rubach (1996).
5. Licensing inheritance is an established mechanism of Government Phonology (Harris 1997), though it was not applied to licensing of laryngeal properties in the original proposal.
6. For a discussion of the differences between FEN and IEN, see e.g., Scheer (2004).
References Andersen, H. 1986. Sandhi and prosody: Reconstruction and typology. In H. Andersen (ed.), Sandhi phenomena in the languages of Europe, 231–246. Berlin & New York: Mouton de Gruyter. Backley, P. 2011. An introduction to Element Theory. Edinburgh: Edinburgh University Press. Bethin, C. 1984. Voicing assimilation in Polish. Journal of Slavic Linguistics and Poetics 29. 17–32. Bethin, C. 1992. Polish syllables: The role of prosody in phonology and morphology. Columbus, Ohio: Slavica. Cyran, E. 2014. Between phonology and phonetics: Polish voicing. Berlin & New York: De Gruyter Mouton. Cyran, E. 2017. Hocus bogus? Licensing paths and voicing in Polish. In Geoff Lindsey & Andrew Nevins (eds.), Sonic signatures: Studies dedicated to John Harris, 33–62. Amsterdam & Philadelphia: John Benjamins. Gussmann, E. 1992. Resyllabification and delinking: The case of Polish voicing. Linguistic Inquiry 23. 29–56. Harris, J. 1994. English sound structure. Oxford: Blackwell. Harris, J. 1997. Licensing inheritance: An integrated theory of neutralisation. Phonology 14. 315–370. Itô, J. 1986. Syllable theory in Prosodic Phonology. Stanford University dissertation. New York: Garland Press. Iverson, G. & J. Salmons. 1995. Aspiration and laryngeal representation in Germanic. Phonology 12. 369–396. Jansen, W. 2004. Laryngeal contrast and phonetic voicing: A laboratory phonology approach to English, Hungarian, and Dutch, PhD dissertation, University of Groningen. [S.l.]: s.n. Jansen, W. 2007. Dutch regressive voicing assimilation as a ‘low level phonetic process’: acoustic evidence. In J. van de Weijer & E. J. van der Torre (eds.), Voicing in Dutch: (De)voicing-phonology, phonetics, and psycholinguistics, 125–151. Amsterdam & Philadelphia: John Benjamins Publishing Company. Lowenstamm, J. 1996. CV as the only syllable type. In J. Durand & B. Laks (eds.), Current trends in phonology: Models and methods, 419–441. 
Salford, Manchester: European Studies Research Institute, University of Salford. Rubach, J. 1996. Non-syllabic analysis of voice assimilation in Polish. Linguistic Inquiry 27. 69–110. Scheer, T. 2004. A lateral theory of phonology: What is CVCV, and why should it be? Berlin: Mouton de Gruyter.
Scheer, T. 2011. A guide to morphosyntax-phonology interface theories: How extra-phonological information is treated in phonology since Trubetzkoy’s Grenzsignale. Berlin: Mouton de Gruyter. Schwartz, G. 2016. Representing non-neutralization in Polish sandhi-voicing. In J. Szpyra-Kozłowska & E. Cyran (eds.), Phonology, its faces and interfaces, 103–121. Frankfurt am Main: Peter Lang.
12 Cross-Language Phonetic Relationships Account for Most, But Not All L2 Speech Learning Problems
The Role of Universal Phonetic Biases and Generalized Sensitivities
Ocke-Schwen Bohn
1. Introduction
Anyone interested in the acquisition of the sounds and the sound system of a nonnative language needs to understand how learners exploit knowledge about previously acquired languages when dealing with their L2.¹ Katarzyna Dziubalska-Kołaczyk has contributed substantially to this knowledge, especially regarding phenomena that many, including the present author, lacked the courage to study, in this case phonotactics (e.g., Dziubalska-Kołaczyk 2003; Dziubalska-Kołaczyk & Zydorowicz 2014; Dziubalska-Kołaczyk & Zielińska 2011). Given that the L2 acquisition of individual segments still presents us with many unanswered questions (for a review, see Bohn 2018a), it certainly does take courage to boldly go where many have not yet gone and consider L2 acquisition beyond the segment. However, as the studies by Katarzyna Dziubalska-Kołaczyk just cited clearly show, she is among what appears to be a minority of scholars whose curiosity goes beyond the study of the influence of previously learned languages on later learned languages. Katarzyna Dziubalska-Kołaczyk’s research has contributed importantly to an understanding of universal factors in L2 acquisition, those that “override other relevant factors such as the structure of the L1” (Dziubalska-Kołaczyk 2003: 2729). The present author shares this interest but, lacking Katarzyna Dziubalska-Kołaczyk’s beyond-the-segment courage, has stayed at the apparently easier segmental level.
This contribution addresses the very general question of whether cross-language phonetic relationships can provide a full account of all L2 speech learning problems. Spoiler alert: The answer to this question is “not quite”, as the title of this contribution indicates.
The structure of this contribution is as follows: Section 2 will, to confirm the “most” in the title, present additional evidence of the predictive value of cross-language
phonetic relationships for L2 speech learning problems (and successes). Section 3 highlights two of the potentially many aspects of L2 speech learning that cannot be accounted for by previous language experience but instead point to universal, L1-independent strategies which language learners apply. Section 4 reviews some evidence which suggests that higher-order generalized sensitivities derived from non-segment-specific properties of the L1 sound system affect L2 speech learning. This contribution concludes with an attempt to provide an overall picture of the role of cross-language phonetic relationships, universal phonetic biases, and generalized sensitivities.
2. Cross-Language Phonetic Relationships Account for Most L2 Speech Learning Problems
Many of the studies that have examined the perception and the production of nonnative speech sounds have been inspired by models of L2 speech learning, all of which assume that speech learning problems (and successes) are directly related to how the L2 learner maps nonnative sounds onto native language categories. Take, for example, the two most widely used models in L2 speech learning research, Flege’s (1995) Speech Learning Model (SLM) and the Perceptual Assimilation Model applied to L2 (PAM-L2) of Best and Tyler (2007). The SLM assumes that the sounds of the L2 are perceived along a continuum ranging from identical through similar to new vis-à-vis native categories, and that learning success depends importantly on the degree to which nonnative sounds can escape equivalence classification with native categories. That is, the more dissimilar an L2 sound is perceived to be, the more learnable it is. PAM-L2, on the other hand, works with six major assimilation types of nonnative contrasts to native categories, which are used to predict the discriminability and, by extension, the learnability of nonnative contrasts. For example, the assimilation of tokens from a Mandarin Chinese contrast which is realized as [tʰ] vs [tsʰ] to Danish /tʰ/ (a long-lag affricated alveolar stop, realized as [tsʰ]) is, in terms of PAM, a Single-Category (SC) assimilation in which tokens from the contrasting Mandarin categories are assimilated to just one Danish category with little or no difference in goodness of fit (Rasmussen & Bohn 2017). PAM-L2 predicts learning problems for SC-assimilated contrasts. These problems may also exist for nonnative contrasts that are assimilated to the same L1 category, but with a difference in goodness of fit, such as English [θ] and [f] tokens to Danish /f/ (Bohn & Ellegaard 2019). In the case of this Category-Goodness (CG) assimilation, learnability depends on how large the difference in goodness of fit is for tokens from contrasting categories: A large difference is expected to aid learnability, a small difference is expected to cause learning problems. Nonnative sounds may also be assimilated as Uncategorized (UC), that is, nonnative listeners
may indicate that the nonnative sound is not a (good) fit to any native category (as syllable-initial English [ð] for L1 Danish listeners, Bohn & Ellegaard 2019), or it is assimilated, with low goodness-of-fit ratings, to several native categories, such as Mandarin [tɕ] to Danish /tʰ/, /tj/, and /dj/ (Rasmussen & Bohn 2017). (For more details, also on other assimilation types in PAM-L2, see Best & Tyler 2007. For recent subclassifications of UC assimilations, see Faris, Best, & Tyler 2016.) The predictions of both the SLM and PAM-L2 have been tested and, by and large, supported in a very large number of studies (for recent summaries, see Bohn 2018a, 2018b). Some of the L2 speech learning phenomena which are not accounted for by current speech learning models are briefly presented and discussed in Sections 3 and 4.
To provide background for these sections, and by way of acknowledging the importance of cross-language phonetic relations for L2 speech learning as posited by these models, consider the results of a study by Bohn and Garibaldi (2017), who explored the phonetic relationships between the close vowels of L2 Danish and the L1s English and Spanish. A superficial comparison of the vowel phoneme inventories would suggest that L1 English and L1 Spanish learners of Danish face the same learning problems because both L1s have only two close vowels, /i/ and /u/, whereas Danish has three, /i/, /u/, and /y/. That, however, is not the case. The participants in the Bohn and Garibaldi (2017) study were ten native speakers each of Spanish and of English, who had spent a mean of 15 years in Denmark and had used Danish on a daily basis. Three L1 Danish speakers provided baseline data. Among other things, Bohn and Garibaldi (2017) examined how the three speaker groups produced the vowels /i/ and /y/ in Danish, using the second formant frequency (F2) as the most direct acoustic correlate of tongue position (front-back) for the Danish close front vowels. The three groups did not differ significantly in their realizations of Danish /i/, but only the L1 Spanish speakers, not the L1 English speakers, produced /y/ vowels with Danish-like F2 values. The English-accented Danish /y/ was produced with F2 values that are too low (the tongue is not advanced enough).
The phoneme inventory differences between English and Spanish (with /i, u/) and Danish (with /i, y, u/) provide no clue as to why the highly experienced L1 Spanish speakers, but not the highly experienced L1 English speakers, produced Danish-like close vowels. One indication of why this is so comes from the acoustic comparison of the /u/ vowel in Danish, Spanish, and English as produced by the participants in the Bohn and Garibaldi study. The /u/ vowels in Spanish and in Danish did not differ in terms of F2, resulting in a clear difference between the /u/ realizations in these languages on the one hand and Danish /y/ on the other, whereas English /u/ was produced with a significantly higher F2 (advanced tongue position) which was halfway between Danish /i/ and /y/. In other words,
Table 12.1 Mean Percent Identification of Danish [i, y, u] as L1 /i/ or /u/ by L1 Spanish and L1 English Listeners. Goodness ratings (1 = bad, 5 = perfect) in parentheses.
Danish stimuli    Spanish response              English response
                  /i/           /u/             /i/           /u/
[i]               100 (3.7)                     100 (3.3)
[y]               33.3 (2.1)    66.7 (2.0)                    100 (2.4)
[u]                             100 (3.6)                     100 (3.2)
Danish /y/ differs greatly from Spanish /u/, which should aid learnability, but Danish /y/ is quite similar to English /u/, which should make learning difficult. However, since acoustic comparisons have been shown to be unreliable guides to perceived cross-language similarity (e.g., Strange, Bohn, Trent & Nishi 2004; Strange, Bohn, Nishi & Trent 2005), Bohn and Garibaldi (2017) also examined how the participants assimilated Danish [i, u, y] to native categories. Table 12.1 shows that, as expected, Danish [i] and [u] tokens are always assimilated to Spanish and English /i/ and /u/, respectively. However, the nonnative groups differ in their assimilation patterns for [y]: The L1 English participants assimilated Danish [y] exclusively to English /u/ (with a somewhat lower goodness-of-fit rating than for Danish [u]), which is a Category Goodness assimilation suggesting that L1 English listeners perceive Danish [y] and [u] to be similar. This differs from the L1 Spanish listeners, whose assimilation pattern for Danish [y] with low goodness-of-fit ratings and a split between two native categories suggests that this vowel is Uncategorized (PAM-L2) or new (SLM). The production of Danish close vowels shows that cross-language phonetic relationships, both in terms of acoustics and perceived cross-language similarity, can account for L2 speech learning success (L1 Spanish speakers) and failure (L1 English speakers). The next sections provide evidence suggesting that additional sources need to be taken into account for a more complete picture of L2 speech learning.
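The reasoning that leads from identification percentages and goodness ratings to an assimilation type can be sketched as a simple decision procedure. The Python example below is only a rough illustration under loudly stated assumptions: PAM-L2 as described here does not fix numerical criteria, so the 70% categorization threshold and the 0.5-point goodness gap are hypothetical values chosen for this example, as is the function name `classify_pair`. The Two-Category outcome is a further PAM assimilation type not discussed in the text above. The input values for the [y]–[u] contrast are those of Table 12.1.

```python
def classify_pair(resp_a, resp_b, categorized_at=70.0, goodness_gap=0.5):
    """Classify a nonnative contrast; resp_x maps L1 category -> (percent, goodness).

    The thresholds are illustrative assumptions, not values given in the chapter.
    """
    def dominant(resp):
        # The most frequently chosen L1 category, if it is chosen often enough.
        category, (percent, goodness) = max(resp.items(), key=lambda kv: kv[1][0])
        return (category, goodness) if percent >= categorized_at else (None, None)

    cat_a, good_a = dominant(resp_a)
    cat_b, good_b = dominant(resp_b)
    if cat_a is None or cat_b is None:
        return "Uncategorized"
    if cat_a != cat_b:
        return "Two-Category"
    if abs(good_a - good_b) >= goodness_gap:
        return "Category-Goodness"
    return "Single-Category"

# Identification percentages and goodness ratings for Danish [y] and [u] (Table 12.1).
english = {"[y]": {"/u/": (100.0, 2.4)}, "[u]": {"/u/": (100.0, 3.2)}}
spanish = {"[y]": {"/i/": (33.3, 2.1), "/u/": (66.7, 2.0)}, "[u]": {"/u/": (100.0, 3.6)}}

print(classify_pair(english["[y]"], english["[u]"]))
print(classify_pair(spanish["[y]"], spanish["[u]"]))
```

With these assumed thresholds the sketch reproduces the interpretation given in the text: a Category-Goodness assimilation for the L1 English listeners and an Uncategorized [y] for the L1 Spanish listeners. Different threshold choices could change the outcome, which is why they are exposed as parameters.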
3. Universal Phonetic Biases in L2 Speech Learning
L2 speech research which is based on the assumption that cross-language phonetic relationships provide a near-complete account of challenges in L2 speech learning acknowledges that the phonetic landscape is uneven, and it assumes that this unevenness is due to perceptual biases rooted in the L1. However, over the past ca. 25 years, research has accumulated which clearly shows that, in L2 speech learning, “L1 transfer doesn’t tell
it all” (Bohn 1995). In other words, L2 learners are biased, and some of these biases cannot be attributed to the L1 because they are universal, L1-independent biases. One of these biases was first observed with L1 Spanish speakers whose strong reliance on the duration cue in their perception of the English /i/—/ɪ/ contrast could not be attributed to the L2, English (this English vowel contrast is near-exclusively cued by vowel quality, not quantity), nor could it be attributed to the learners’ L1 because Spanish does not use duration to contrast vowels (Bohn 1995; Flege, Bohn & Jang 1997). This observation led Bohn (1995: 294–295) to formulate the Desensitization Hypothesis, which states that “whenever spectral differences are insufficient to differentiate vowel contrasts because previous linguistic experience did not sensitize listeners to these spectral differences, duration differences will be used to differentiate the nonnative vowel contrast”, to which should be added “irrespective of whether the duration cue is phonologically relevant in the listener’s L1”. Since 1995, this hypothesis has been tested and confirmed in studies with L2s with relatively large and L1s with relatively small vowel inventories, such as L2 English and L1 Spanish (Kondaurova & Francis 2008; Escudero & Boersma 2004), L1 Catalan (Cebrian 2006), L1 Portuguese (Rauber, Escudero, Bion & Baptista 2005), L1 Russian (Kondaurova & Francis 2008), L1 Polish (Bogacka 2004), L1 Mandarin (Flege et al. 1997), as well as L2 Dutch and L1 Spanish (Escudero, Benders & Lipski 2009; Lipski, Escudero & Benders 2012), and L2 German and L1 Turkish (Darcy & Krüger 2012). Thus, there is solid evidence confirming the Desensitization Hypothesis, which posits a bias in L2 speech learning that cannot be attributed to a characteristic of the L1 sound system (in this case, duration as a cue for vowel identification).
Another universal bias in L2 speech perception was first discovered in a series of infant vowel perception studies. Polka and Bohn observed that the discrimination level of prelingual infants for vowel contrasts depended on which of the two vowels in a contrast was the background and which was the foreground vowel. (Note that in many infant speech perception studies, discrimination ability is examined in versions of the change/no change paradigm in which participants are tested on their ability to discover a change of stimuli from a repeatedly presented background category to a foreground category, e.g., dut . . . dut . . . dut . . . dyt . . . dyt . . .). Figure 12.1 shows that in the studies which Polka and Bohn conducted (Polka & Bohn 1996; Bohn & Polka 2001; Polka & Bohn 2003, 2011) and which we reviewed, it was always (see Note 2) the vowel that is relatively more peripheral in the vowel space that made it difficult to perceive a change towards the less peripheral vowel (e.g., /u/ to /y/), and a change from the less peripheral vowel to the more peripheral vowel that was easier to discriminate (e.g., /y/ to /u/). These perceptual asymmetries led Polka and Bohn to postulate the Natural Referent Vowel framework
Figure 12.1 Plot of F1/F2 Frequencies for Contrasts Showing Asymmetries in Infant Vowel Discrimination Studies. Arrows point to the referent vowels for the contrast; vowel changes in this direction were easier to discriminate. Formant frequency values are approximate. Source: Polka & Bohn 2011.
(Polka & Bohn 2011, see also Bohn & Polka 2014; Polka, Bohn & Weiss 2015), which posits among other things that independent of the infant’s ambient language, relatively peripheral vowels will act as natural referents, probably because of their acoustic properties which Schwartz, Abry, Boë, Ménard, and Vallée (2005) described as formant focalization. The relevance of the Natural Referent Vowel framework for L2 speech learning derives from two of the framework’s hypotheses. First, the framework predicts that it is not, as was originally assumed by Polka and Werker (1994), the vowel that the learner has been exposed to in the ambient language which makes a “foreign” vowel less discriminable. The Polka and Bohn studies clearly showed that this is not the case because, for instance, both German-learning and English-learning infants show the same perceptual asymmetries for the German /u/—/y/ contrast (with /u/ being a vowel shared by English and German) and for the English /ɛ/—/æ/ contrast (with /æ/ being an English-only vowel). Additional evidence for peripherality, not L1 experience, as the factor causing asymmetries comes from a recent study by Masapollo, Polka, Molnar, and Ménard (2017) who compared adult L1 English and L1 French speakers’ perceptual asymmetries in the close back portion of the vowel space: For both listener groups, a French-like back [u] was the attractor, even though the closest, most back vowel in English is realized as a fronted [ʉ].
The second hypothesis derived from the Natural Referent Vowel framework which is relevant for L2 speech learning states that these asymmetries will be maintained beyond infancy if the ambient language does not provide experience with one of the contrasting vowels, but will be lost if the contrasting vowels are present in the ambient language. We tested this prediction in several experiments, one of which examined the discriminability of the German vowel contrasts /u/—/y/ and /ʊ/—/ʏ/ by L1 English and L1 German adults (Polka, Bohn, & Molnar 2005). As predicted, L1 English, but not L1 German listeners perceived each contrast asymmetrically in that they showed the same perceptual asymmetry as English- and German-learning infants and children, for whom the peripheral back vowel (as background category) made the less peripheral front vowel less discriminable. Additional evidence for the maintenance of the special status of referent vowels in the absence of specific experience comes from Danish: The vowels /ɒ/—/ʌ/, which are contrastive in Southern British English (as in hot-hut), but not in Danish, are asymmetrically perceived by both Danish-learning infants and L1 Danish adults, with the more peripheral /ɒ/ vowel serving as referent. One consequence of the maintenance of the special status for referent vowels is that the question of whether a nonnative vowel contrast is difficult to perceive for L2 learners cannot be answered with a simple “yes” or “no”, but with the more interesting response “it depends”. Polka and Bohn (2011) reported that for adult L1 Danish listeners, the overall percent correct discrimination rate for the /ɒ/—/ʌ/ contrast is 59.8%. However, if the direction of presentation is taken into account, discrimination is at chance level (50.5%) when the direction of change is from the more peripheral /ɒ/ to /ʌ/, but it is as high as 69.2% when the direction of change is in the opposite direction. 
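The relation between the overall rate and the two directional rates reported above can be illustrated with a few lines of arithmetic. This is only a minimal sketch: it assumes (the text does not say so) that both directions of change contributed equal numbers of trials, so that the overall rate is the simple mean of the two directional rates. The function and variable names are hypothetical.

```python
# Percent-correct discrimination rates for the /ɒ/-/ʌ/ contrast,
# as reported in the text for adult L1 Danish listeners.
HARD_DIRECTION = 50.5  # change from the more peripheral /ɒ/ to /ʌ/
EASY_DIRECTION = 69.2  # change from /ʌ/ to the more peripheral /ɒ/


def pooled_rate(rate_a: float, rate_b: float) -> float:
    """Mean of two directional rates.

    ASSUMPTION: both directions have the same number of trials.
    """
    return (rate_a + rate_b) / 2


def asymmetry(easy: float, hard: float) -> float:
    """Directional asymmetry, in percentage points."""
    return easy - hard


pooled = pooled_rate(HARD_DIRECTION, EASY_DIRECTION)
gap = asymmetry(EASY_DIRECTION, HARD_DIRECTION)
print(f"pooled: {pooled:.2f}%  asymmetry: {gap:.1f} points")
```

Under the equal-trials assumption, the pooled value comes out close to the reported overall rate of 59.8%, while the gap between the two directions is nearly 19 percentage points. A single pooled figure thus hides the fact that one direction is at chance.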
These and other asymmetries are not just of interest for general speech perception research in that they inform us about universal patterns of vowel perception; they could also have direct applications for perceptual training studies which could examine whether training in the “easier” direction aids the perception of nonnative contrasts in the “harder” direction. The universal, L1-independent privileged status of peripheral vowels which provides the motivation for Polka and Bohn’s (2011) Natural Referent Vowel framework has recently been confirmed in a meta-analysis by Tsuji and Cristia (2017). But what about consonants? Are there consonants which, irrespective of the language background, have a special status in perception and thus may be important for some aspects of language learning? The question of whether natural referents exist has only recently been extended to consonants, so no firm conclusion can (yet) be drawn. Good candidates for natural referent consonants are alveolars, no matter whether they are liquids, affricates, plosives, or fricatives, which all serve as attractors for, respectively, retroflex liquids (L1 Japanese—L2
English listeners in Cutler, Weber & Otake 2006), for retroflex affricates (L1 Burmese and L1 Malay—L2 Mandarin listeners, Lai 2009), for dentals and retroflexes (L1 English—L2 Wubuy listeners in Bundgaard-Nielsen, Baker, Kroos, Harvey & Best 2015), and for labiodentals (L1 English listeners in Schluter, Politzer-Ahles & Almeida 2016). In addition, a perceptual asymmetry with /b/ attracting /v/ was observed for L1 Japanese listeners of L2 English by Tsushima, Shiraki, Yoshida & Sasaki (2005). Interestingly, the very few infant speech perception experiments that examined perceptual asymmetries for consonants suggest that infant and adult L2 learners share the same consonant referents: Tsuji, Mazuka, Cristia and Fikkert (2015) reported that 4-month-old infants show perceptual asymmetries favoring /n/ over /m/ when presented with the /m/—/n/ contrast, and /t/ over /p/ when presented with the /t/—/p/ contrast, adding to the evidence of a special status of alveolars. Also, 5- to 6-month-old infants favor /b/ over /v/ when presented with the /b/—/v/ contrast (Nam & Polka 2016). Overall, these studies suggest that alveolars and stops are somehow “better” consonants for both L1 and L2 learners. More research is clearly needed, but the findings reported so far carry the promise of providing psycholinguistic validity to descriptive notions such as “underspecification” and “markedness”.
4. Generalized L1-Based Sensitivities
Several findings from L2 speech research are difficult to account for even if cross-language phonetic relationships and universal phonetic biases are taken into account. The Bohn and Best (2012) study which compared the perception of American English approximants by nonnative listeners with the L1s French, Danish, German, and Japanese reported two results which illustrate this, and which point to an influence on L2 speech which will somewhat vaguely be labeled “generalized sensitivities”. One experiment in the Bohn and Best (2012) study examined and compared the identification of stimuli from a ten-step rock-lock continuum. We expected that, even though French, Danish, and German have a phonemic /r/—/l/ contrast, their identification would be compromised because of differences in the phonetic realizations: French, Danish, and German all have non-velarized [l] realizations of /l/, whereas American English has [ɫ]. In addition, the /r/ realizations of French and German are [ʁ], and Danish has a pharyngeal approximant [ʕ], whereas American English has [ɻ]. In spite of these differences, the L1 Danish, L1 French, and L1 German listeners did not differ from the L1 American English listeners with respect to the location of the category boundary or the slope of the identification function for the rock-lock continuum. Why did the considerable phonetic differences between the realizations of /r/ and /l/ in American English on the one hand and French, Danish, and German on the other not influence the nonnatives’ identification? Bohn
and Best (2012) speculated that several factors may have contributed to this unexpected finding: In all four languages, /r/ and /l/ share a number of phonotactic characteristics, such as their place in consonant clusters and their likelihood to merge with, or to color, preceding vowels. Also, German, Danish, and English (and, to a lesser extent, French) share a fairly large number of cognates, such as ring, bring, rose, land, lang/long, lamp(e), etc. and the unambiguous orthographic symbols <r> and <l>. It is not unlikely that all this aids nonnatives in the clear perceptual differentiation of American English /r/ and /l/. Another unexpected finding in the Bohn and Best (2012) study resulted in a relatively easily testable hypothesis. We found that listeners with French, with Danish, or with German as their L1s outperformed L1 American English listeners in the discrimination of a ten-step synthetic continuum ranging from /w/ to /j/. The nonnative listeners perceived this continuum in a continuous fashion, with discrimination levels that were at or near ceiling (> 80% correct). The L1 American English listeners, however, showed two discrimination peaks and discrimination levels that were below 80% for all stimulus pairs. We suggested that the surprisingly high and better-than-native discriminability of the /w/—/j/ contrast by the nonnative listeners could be related to the fact that French, Danish, and German have contrastive lip rounding for vowels, unlike English. (Note that [w] and [j] are nonsyllabic, consonantal versions of [u] and [i], respectively.) This led us to hypothesize that the factor that causes continuous high-level discrimination of the /w/—/j/ contrast is listeners’ sensitization to lip rounding distinctions, which is generalized from the sensitivity to lip rounding distinctions for vowels.
Thus, test cases for this hypothesis could be L1 listeners of Japanese (with no lip rounding distinction for vowels) and Turkish (with such a distinction—actually even finer grained than French, Danish, or German, because Turkish has a four-way close vowel distinction between rounded /u, y/ and unrounded /i, ɯ/). For Japanese, data confirming this hypothesis were already reported by Best and Strange (1992). We are currently testing the hypothesis with L1 Turkish listeners, and the preliminary results from six participants suggest that the hypothesis is supported. As shown in Figure 12.2, the discrimination level of the L1 Turkish listeners is higher than for the L1 American English listeners, and the L1 Turkish listeners seem to perceive the English /w/—/j/ contrast near ceiling and in a continuous fashion. Along similar lines, Pajak and Levy (2014) reported what they described as an “enhanced general sensitivity along phonetic dimensions that the listeners’ native language employs to distinguish between categories”. Pajak and Levy examined whether listeners’ familiarity with duration to signal vowel contrasts would aid their discrimination of consonant duration contrasts. L1 listeners of Korean (with both vowel and consonant duration contrasts), of Vietnamese and of Cantonese (with only vowel duration contrasts), and of Mandarin (no duration contrasts)
Figure 12.2 Discrimination Function for a /w/—/j/ Continuum as Perceived by L1 American English and L1 Turkish Listeners. (Preliminary data, for information on stimuli characteristics and procedures, see Best & Strange 1992 and Bohn & Best 2012)
discriminated nonce words with a [CVC(:)V] structure. Pajak and Levy reported that, as expected, L1 Korean listeners performed best, and that L1 Vietnamese and L1 Cantonese listeners performed significantly better than L1 Mandarin listeners, which suggests that L1 experience with a phonetic dimension for one type of segment can be exploited in nonnative speech perception for other types of segments. The general conclusion from the Bohn and Best (2012) and Pajak and Levy (2014) studies is that nonnative speech perception can be shaped by generalized sensitivities (e.g., significance of lip rounding and duration) in addition to cross-language phonetic relationships and universal biases.
5. Conclusion
By way of concluding this overview of recent research on sources of L2 speech learning problems, consider Figure 12.3: At the center, both in this figure and in L2 speech learning, are the cross-language phonetic relationships which account for a large proportion of the phenomena observable in L2 speech learning. However, as shown at the bottom of Figure 12.3, these relationships are shaped by L1- and L2-independent
Figure 12.3 Schematic Representation of Phonetic Factors Which Influence L2 Speech Learning (See Text).
universal phonetic biases such as those formulated in the Natural Referent Vowel framework or the Desensitization Hypothesis, and as illustrated by examples presented in section 3 of this contribution. We can conceptualize these influences as “bottom-up” influences which the learner (L1 and L2) brings to the task of language learning irrespective of her/his language background. On the other hand, L2 speech research also needs to acknowledge the possible influence of learner sensitivities that are caused by general properties of the L1, such as the use of lip rounding or duration to contrast segments, whose presence or absence in the L1 may aid or stand in the way of L2 speech learning. Clearly, much more research is needed on both universal and language-generalized influences on L2 speech learning. While the existence of Natural Referent Vowels is solidly documented, we have only some promising indications that Natural Referent Consonants may exist and influence L2 speech learning, and the existence of Natural Referent Tones has yet to be explored. With respect to generalized sensitivities, it would be rewarding to explore whether other general properties of sound systems (such as the existence of tense/lax contrasts) can be exploited by L2 learners, and whether just the L1, or also the L2 can be the source of these sensitivities, at least for advanced learners (as in Balas 2018). To conclude, this contribution started by acknowledging the importance of cross-language phonetic relationships in L2 speech learning. However, a complete picture of L2 speech learning can only be achieved if we acknowledge additional factors which “override other relevant factors such as the structure of the L1” (Dziubalska-Kołaczyk 2003: 2729).
Notes
1. The abbreviation L2 (second language) is used here as a shorthand for any language learned after first language (L1) acquisition has started.
2. The one exception is the asymmetry for the /e/-/ø/ contrast, which is also the only contrast that involves not just a change in the location in the F1/F2 space, but also in lip rounding.
References
Balas, A. 2018. Non-native vowel perception: The interplay of categories and features. Poznań: Wydawnictwo Naukowe UAM.
Best, C. T. & W. Strange. 1992. Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics 20(3). 305–330.
Best, C. T. & M. D. Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In O.-S. Bohn & M. M. Munro (eds.), Language experience in second language speech learning: In honor of James Emil Flege, 1–47. Amsterdam: John Benjamins.
Bogacka, A. 2004. On the perception of English high vowels by Polish learners of English. Proceedings of the University of Cambridge second postgraduate conference in language research, 43–50. Cambridge, UK: Cambridge University Press.
Bohn, O.-S. 1995. Cross-language speech perception in adults: L1 transfer doesn’t tell it all. In W. Strange (ed.), Speech perception and linguistic experience: Theoretical and methodological issues in cross-language speech research, 275–300. Timonium, MD: York Press.
Bohn, O.-S. 2018a. Second language phonetics. In M. Aronoff (ed.), Oxford research encyclopedia of linguistics. Oxford, UK: Oxford University Press.
Bohn, O.-S. 2018b. Cross-language and second language speech perception. In E. M. Fernandez & H. S. Cairns (eds.), The handbook of psycholinguistics, 213–239. New York: Wiley.
Bohn, O.-S. & C. T. Best. 2012. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. Journal of Phonetics 40. 109–128.
Bohn, O.-S. & A. Ellegaard. 2019. Perceptual assimilation and graded discrimination as predictors of identification accuracy for learners differing in L2 experience: The case of Danish listeners’ perception of English initial fricatives. International Congress of Phonetic Sciences, Melbourne. 2070–2074.
Bohn, O.-S. & C. L. Garibaldi. 2017. Production and perception of Danish front rounded /y/: A comparison of ultimate attainment in native Spanish and native English speakers. In M. Yavas, M. M. Kehoe & W. C. Cardoso (eds.), Romance-Germanic bilingual phonology, 121–136. Sheffield: Equinox Publishing.
Bohn, O.-S. & L. Polka. 2001. Target spectral, dynamic spectral, and duration cues in infant perception of German vowels. Journal of the Acoustical Society of America 110(1). 504–515.
Bohn, O.-S. & L. Polka. 2014. Fast phonetic learning in very young infants: What it shows, and what it doesn’t show. Frontiers in Psychology 5. 511.
Bundgaard-Nielsen, R. L., B. J. Baker, C. H. Kroos, M. Harvey & C. T. Best. 2015. Discrimination of multiple coronal stop contrasts in Wubuy (Australia): A natural referent consonant account. PLoS One 10(12). e0142054.
Cebrian, J. 2006. Experience and the use of non-native duration in L2 vowel categorization. Journal of Phonetics 34(3). 372–387.
Cutler, A., A. Weber & T. Otake. 2006. Asymmetric mapping from phonetic to lexical representations in second-language listening. Journal of Phonetics 34(2). 269–284.
Darcy, I. & F. Krüger. 2012. Vowel perception and production in Turkish children acquiring L2 German. Journal of Phonetics 40(4). 568–581.
Dziubalska-Kołaczyk, K. 2003. On phonotactic difficulty. In Proceedings of the 15th international congress of phonetic sciences, 2729–2732. Barcelona: Futurgraphic.
Dziubalska-Kołaczyk, K. & D. Zielińska. 2011. Universal phonotactic and morphonotactic preferences in second language acquisition. In K. Dziubalska-Kołaczyk, M. Wrembel & M. Kul (eds.), Proceedings of the 6th international symposium on the acquisition of second language speech, new sounds 2010, 53–63. Frankfurt am Main: Peter Lang.
Dziubalska-Kołaczyk, K. & P. Zydorowicz. 2014. The production of high-frequency clusters by native and non-native users of Polish. Concordia Working Papers in Applied Linguistics 5. 130–144.
Escudero, P., T. Benders & S. C. Lipski. 2009. Native, non-native and L2 perceptual cue weighting for Dutch vowels: The case of Dutch, German, and Spanish listeners. Journal of Phonetics 37(4). 452–465.
Escudero, P. & P. Boersma. 2004. Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition 26(4). 551–585.
Faris, M. M., C. T. Best & M. D. Tyler. 2016. An examination of the different ways that non-native phones may be perceptually assimilated as uncategorized. Journal of the Acoustical Society of America 139(1). EL1–EL5.
Flege, J. E. 1995. Second-language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech perception and linguistic experience: Theoretical and methodological issues in cross-language speech research, 233–277. Timonium, MD: York Press.
Flege, J. E., O.-S. Bohn & S. Jang. 1997. Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics 25(4). 437–470.
Kondaurova, M. V. & A. L. Francis. 2008. The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners. Journal of the Acoustical Society of America 124(6). 3959–3971.
Lai, Y. H. 2009. Asymmetry in Mandarin affricate perception by learners of Mandarin Chinese. Language and Cognitive Processes 24(7–8). 1265–1285.
Lipski, S. C., P. Escudero & T. Benders. 2012. Language experience modulates weighting of acoustic cues for vowel perception: An event-related potential study. Psychophysiology 49(5). 638–650.
Masapollo, M., L. Polka, M. Molnar & L. Ménard. 2017. Directional asymmetries reveal a universal bias in adult vowel perception. Journal of the Acoustical Society of America 141(4). 2857–2869.
Nam, Y. & L. Polka. 2016. The phonetic landscape in infant consonant perception is an uneven terrain. Cognition 155. 57–66.
Pajak, B. & R. Levy. 2014. The role of abstraction in non-native speech perception. Journal of Phonetics 46. 147–160.
Polka, L. & O.-S. Bohn. 1996. A cross-language comparison of vowel perception in English-learning and German-learning infants. Journal of the Acoustical Society of America 100(1). 577–592.
Polka, L. & O.-S. Bohn. 2003. Asymmetries in vowel perception. Speech Communication 41(1). 221–231.
Polka, L. & O.-S. Bohn. 2011. Natural Referent Vowel (NRV) framework: An emerging view of early phonetic development. Journal of Phonetics 39(4). 467–478.
Polka, L., O.-S. Bohn & M. Molnar. 2005. Natural referent vowels guide the development of vowel perception. Journal of the Acoustical Society of America 117(4). 2398.
Polka, L., O.-S. Bohn & D. J. Weiss. 2015. Commentary: Revisiting vocal perception in non-human animals: A review of vowel discrimination, speaker voice recognition, and speaker normalization. Frontiers in Psychology 6. 941.
Polka, L. & J. F. Werker. 1994. Developmental changes in perception of nonnative vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance 20(2). 421.
Rasmussen, S. & O.-S. Bohn. 2017. Perceptual assimilation of Mandarin Chinese consonants by native Danish listeners. Journal of the Acoustical Society of America 141(5). 3518.
Rauber, A. S., P. Escudero, R. A. Bion & B. O. Baptista. 2005. The interrelation between the perception and production of English vowels by native speakers of Brazilian Portuguese. Proceedings of Interspeech 2005. 2913–2916.
Schluter, K., S. Politzer-Ahles & D. Almeida. 2016. No place for /h/: An ERP investigation of English fricative place features. Language, Cognition and Neuroscience 31(6). 728–740.
Schwartz, J. L., C. Abry, L. J. Boë, L. Ménard & N. Vallée. 2005. Asymmetries in vowel perception, in the context of the Dispersion-Focalisation Theory. Speech Communication 45(4). 425–434.
Strange, W., O.-S. Bohn, K. Nishi & S. A. Trent. 2005. Contextual variation in the acoustic and perceptual similarity of North German and American English vowels. Journal of the Acoustical Society of America 118. 1751–1762.
Strange, W., O.-S. Bohn, S. A. Trent & K. Nishi. 2004. Acoustic and perceptual similarity of North German and American English vowels. Journal of the Acoustical Society of America 115. 1791–1807.
Tsuji, S. & A. Cristia. 2017. Which acoustic and phonological factors shape infants’ vowel discrimination? Exploiting natural variation in InPhonDB. Interspeech Stockholm. 2108–2112.
Tsuji, S., R. Mazuka, A. Cristia & P. Fikkert. 2015. Even at 4 months, a labial is a good enough coronal, but not vice versa. Cognition 134. 252–256.
Tsushima, T., S. Shiraki, K. Yoshida & M. Sasaki. 2005. Stimulus order effects in discrimination of a nonnative consonant contrast, English /b—v/, by Japanese listeners in the AX discrimination procedure. Paper presented at the First Acoustical Society of America Workshop on L2 Speech Learning, Vancouver, Canada, 13–15 May.
13 L1 Foreign Accentedness in Polish Migrants in the UK
An Overview of Linguistic and Social Dimensions
Agnieszka Kiełkiewicz-Janowiak and Magdalena Wrembel
1. Introduction
For decades of research on bilingual competence, acquisition has received more attention than has deterioration. This focus on development may be justified by the scholars’ ultimate interest in speakers’ well-being and societies’ integrated growth. As a result, however, “far less is known about loss or attrition of language skills than about their acquisition” (Schmid 2016: 186). Globalisation and transnational migration have foregrounded the vulnerability of L1 as minoritised in migration settings (see studies of English as L1 in migrants: Paradis 2007; Ribes & Llanes 2015). In the present contribution we bring forward the L1 attrition processes manifested as foreign accentedness in Polish migrants to the UK and its relation to the acculturation phenomena, thus encompassing the linguistic and social dimensions of this process. We aim to overview relevant research findings and analyse inherent methodological challenges in order to propose a more holistic research design to investigate L1 foreign-accentedness in its social context.
2. Linguistic and Social Dimensions of Language Attrition
2.1 Language Attrition in Social Context: Literature Review
Schmid (2016), in her “research timeline” on language attrition studies, distinguishes several phases of the growing awareness and research attention being given to what was early on (in the 1980s) defined as attrition: “the non-pathological loss of previously fully acquired first (L1) or second language (L2) skills in adult speakers” (Schmid 2016: 186). The research acquired more focus in the next decade, when studies came to be theoretically sound and empirically driven. Increasingly, language attrition research attained more visibility as relevant to larger issues pertaining to the dynamics of bi- and multilingualism.
The research on adult migrants, who arrived in the destination country with a stable competence in their first language, has shown the vulnerability of L1 to restructuring under the influence of L2. Nevertheless, it has been claimed that L1 skills may remain “extremely robust” for the first decade of life in migration (see de Bot and Clyne’s (1994) study of Dutch-English bilingualism in Australia). On the other hand, Waas (1996), in her study of German long-term adult migrants in Australia, concludes that “L1 attrition in an L2 environment is inevitable, even after a stay of only 10 to 20 years” and that “socio-demographic factors such as citizenship and (non-)affiliation have an impact on the extent of L1 attrition” (Waas 1996: 171). An important, though under-researched, dimension of language attrition research refers to the investigations of what we have called the individual’s ‘linguascape’ over their lifespan. For one thing, the history of language contact in a migrant’s life is fundamentally important, and it directly relates to their age of arrival and the time spent since then in the L2 environment. For another, there are age-grading effects in the way people ‘manage’ their access to and usability of language through the life-course: their changing social roles (but also societal expectations) will make one language dominant and another peripheral, which in turn will trigger people’s decisions to acquire, or give up on, some language skills. As might be expected, in the migration context, the requirements of the workplace will strengthen the ability and preference to use the dominant community language, and peripheralise the heritage language, i.e., the migrant’s L1. Accordingly, studies have given attention to the processes of language shift/attrition over the individuals’ lifespans (see above) as well as in the context of their social and familial embedding (see Hulsen 2000; Kim & Starks 2010).
Schmid and Dusseldorp (2010), in a quantitative analysis, studied the impact of extralinguistic factors on L1 retention/attrition, which had so far been considered relevant. The predictors were grouped under the following labels: identification and affiliation with L1, exposure to L1 and attitude towards L1. The authors emphasised that the factors should not be considered in isolation, but rather as interdependent and interacting. In terms of populations affected, Schmid and Dusseldorp (2010: 128–130) divided extralinguistic predictors of language attrition into general ones, relevant to all speakers, and factors “pertaining to bilingual populations”, including those in the context of migration. In the case of bilinguals in a migration setting, details of migration history count, for instance, age at and time since arrival in the destination country. Specifically, there is much research testifying to the significant effect of age roughly between 8 and 13 years on L1 attrition, but not beyond puberty (Köpke & Schmid 2004; Pallier 2007). For early-onset bilinguals (i.e., migrants with a relatively early age of arrival), L1 grammar is more likely to be incomplete (see, for instance, Montrul 2002). As for pronunciation, studies based
L1 Foreign Accentedness in Polish Migrants
on free speech production suggest that late bilinguals hardly ever attain nativelike performance (for a review see Abrahamsson & Hyltenstam 2009). Similarly, certain socio-psychological factors will be relevant in the case of all bilingual speakers. While motivation for learning has been shown to largely influence the success of L2 acquisition, there is less research on the impact of attitudinal factors on L1 attrition (Schmid 2002; Ben-Rafael & Schmid 2007). Nevertheless, findings suggest that positive attitudes to L1 and to heritage culture lead to L1 maintenance, whereas negative attitudes lead to attrition. The social circumstances of contact with L1 seem to be a relevant factor: predictably, disuse of L1, measured by time and frequency, is conducive to attrition, as reflected in retrieval difficulty in online language use (Paradis 2007). However, this dependency has not been supported by empirical work, and Schmid and Dusseldorp (2010: 130) stress the need to refine the L1 use factor with respect to the quality and context of use (e.g., formal/informal situations, workplace/home). In their study measuring the predictive power of sociolinguistic and extralinguistic factors for L1 attrition, they found little impact of language use and language attitudes, with the exception of the use of L1 in the workplace, which seems to have “a protective effect against language attrition” (Schmid & Dusseldorp 2010: 150). Strikingly, the use of L1 within the family and with friends proved hardly related to attrition, although this seemed relevant to the migrants themselves, as gathered from their metalinguistic comments. Education and length of residence are the two factors that have been frequently singled out (and shown to be relevant) in attrition-in-migration research.
However, more importantly, language practices emerge in the complex process of acculturation, i.e., the “confluence among heritage-cultural and receiving-cultural practices, values, and identifications” (Schwartz et al. 2010: 237). The latter is problematised by the (potentially unfavourable) ‘context of reception’, defined as “the ways in which the receiving society constrains and directs the acculturation options available to migrants” (Schwartz et al. 2010: 338). Researching such complex conditioning of language use requires adequate methodological complexity.

2.2 Linguistic Dimensions of Language Attrition

Having reviewed the social aspects involved in first language attrition, we would like to turn to the linguistic dimension of this process, narrowing the scope of our discussion to the phenomenon of foreign accentedness. The term ‘foreign accent’ is commonly used to refer to segmental and prosodic deviations from the native pronunciation norms. Generally, it is assessed holistically by raters on the basis of such rating parameters as the degree of foreign accent, speech intelligibility and/or acceptability,
Kiełkiewicz-Janowiak and Wrembel
as reported in the second language acquisition (SLA) literature (cf. Gallardo del Puerto et al. 2007; Piske et al. 2001). A scalar evaluation of foreign accent by raters is frequently supplemented by the identification of phonetic and phonological features contributing to the perception of a foreign accent in the participants.

2.2.1 Foreign Accentedness Ratings: Overview of Research

The ratings of perceived global foreign accent have been widely applied in SLA research, mostly on adult learners (e.g., Flege 1988; Gallardo del Puerto et al. 2007; Piske et al. 2001). The results of foreign accentedness ratings (FARs) indicate that the degree of foreign accent is the most severely judged rating parameter, and that some degree of foreign accentedness does not necessarily entail a lack of intelligibility. It is generally acknowledged that the degree of foreign accentedness may differ as a function of the characteristics of the subjects examined, the most important differentiating factors being the L1 background, the L2 target and the amount of language experience (Piske et al. 2001). Further, numerous studies have focused on the question of the age of first exposure to the L2 or the age of arrival in an L2-speaking country (AoA) in the case of immigrant subjects. The general findings of these accent ratings provide further evidence for the claim that ‘the earlier, the better’, i.e., the lower the age of arrival in the receiving country, the lower the degree of foreign-accented language production in the community language (e.g., Flege & Fletcher 1992; Flege 1995). In turn, Wrembel et al. (2019) investigated the predictors of foreign-accentedness in the heritage language of Polish-English bilingual children raised in the UK.
To this end, the children’s L1 Polish speech samples were phonetically analysed by trained phoneticians and rated by 55 Polish raters for the degree of native accent, intelligibility and acceptability in order to test whether their L1 Polish would be perceived as different from that of monolinguals matched for age and socioeconomic status. The findings demonstrated significant differences between bilingual and monolingual children in terms of atypical speech patterns as well as holistic accentedness ratings. The amount of exposure to L1 Polish was found to be the main predictor of accentedness.

2.2.2 Foreign Accent Studies in L1 Attrition

The problem of foreign accentedness has been less frequently explored in the context of L1 attrition, with only a handful of investigations comparing foreign accentedness in L1 attriters vs. L2 acquirers. Hopp and Schmid’s (2013) study aimed to explore how perceived foreign accent is conditioned by such factors as the age of onset and bilingual experience. To this end, they compared two groups (i.e., late bilinguals and highly advanced L2 learners) on how native-like their speech was
assessed to be. The findings indicated that both bilingual populations were fairly comparable; however, the L1 attriters were found to be more similar to the native controls than the L2 learners. The factors that correlated with the degree of perceived accentedness included the frequency of L1 use as well as the time spent in the L1-speaking environment. In another study, Schmid and Hopp (2014) compared foreign accent in L1 attrition and L2 acquisition. The major focus of the study was to test how various factors such as raters’ characteristics, the range within speech samples and the procedures applied may affect accent ratings. The findings demonstrated that rater differences did not result in systematic changes in rating patterns. However, the ratings were found to be affected by range effects as well as familiarity with accented speech. The authors showed that including more strongly foreign-accented speech samples resulted in lower ratings for the whole group of L2 speakers compared to native controls. Similarly, lower familiarity with foreign accent was shown to lead to harsher and more variable foreign accentedness judgements. In a study on Polish-English bilingual children, Marecka et al. (2015) investigated specific features that differentiated their speech from that of Polish monolinguals and attested to a phonetic drift towards the community language (i.e., English). Based on a detailed auditory analysis, the authors identified the following problem areas in the L1 Polish of the bilingual children living in the UK: distorted vowel quality and/or quantity, vowel reduction applied to Polish, non-native-like consonants, reductions and substitutions of consonantal clusters, lack of consonant palatalization, atypical VOT patterns in plosives as well as suprasegmental features such as an incorrect number of syllables or incorrect stress patterns. In a follow-up study, which combined FARs with the phonetic analysis of bilingual children’s speech, Wrembel et al.
(2019) examined which categories of atypical speech patterns were the most salient features correlated with perceived foreign accent. The results demonstrated that atypical prosody was the best predictor of holistic accent assessment. In other words, bilingual children who had problems with retaining the syllabic structure of the words or applying accurate stress patterns were more likely to be perceived as non-native and unintelligible in their L1. This is contrary to some previous research which indicates that the presence of non-native segmental features in speech samples is a stronger predictor of foreign accentedness than the presence of non-native suprasegmental features (Liu & Lee 2012).
3. Methodological Considerations

3.1 Extralinguistic Factors in L1 Attrition: Data Type and Elicitation Issues

The challenge of selecting appropriate research tools for capturing the relationship between language proficiency, social and situational variables,
and metalinguistic knowledge has been generally acknowledged (e.g., Schmid 2011; Jaspaert & Kroon 1989). Major methodological concerns relate to the various tools applied for data collection (interview or experimental tasks), the types of data elicited (e.g., free speech or responses to language tests) and the reliability of self-report data, including self-assessments (see also Kim & Starks 2008). Over the decades of research on language attrition processes, scholars have addressed the question of data types and data collection tools. Should spontaneous speech data be analysed or should the data be elicited in experimental settings? Researchers’ decisions have been dependent on the research focus. For instance, Schmid and Jarvis (2014) claim that free speech data are more appropriate for the study of lexical attrition than are responses to formal tasks, or even elicited narratives. Schmid and Hopp (2014) consider the difference in studying accentedness via reading tasks vs. spontaneous speech, suggesting that they rely on different skills and degrees of monitoring. When migrants are the population under study, it is sensible to use the opportunity, while eliciting narratives for linguistic analysis, to retrieve their personal accounts of life in and/or following transition, constructing each migrant’s profile, including their language-related practices, attitudes and emotions. In ethnographically-oriented interviews, it is good practice to position the migrant as a teller, talking about their (migration) experience to an interested researcher and actively selecting the way to tell their story. We would thus meet two objectives: obtain free, relatively spontaneous speech as data on the participants’ language use, and elicit metalinguistic information. However useful free narrating may be, it is a challenge in the course of data elicitation.
Participants may hesitate, have difficulty remembering or naming things, deciding about the meaning of language choices; moreover, their assessments will most likely change over time, verified by life experience, and may therefore appear contradictory. In general, the reliability of self-report data, as garnered from either surveys or (the less structured) interviews, may be questioned. On the other hand, however it is collected, information about reported language use by migrants is valuable: accounts of people’s personal experience in migration and their ideas about the significance of language use decisions will be an important part of a comprehensive inquiry into the social meaning of language and, consequently, the linguistic processes such as maintenance or attrition. In the course of an interview, detailed questions may be asked to find out about the perceived statuses of the dominant and heritage languages. For instance, one may ask: do Polish migrants perceive English as having a competitive or facilitative role (see Dziubalska-Kołaczyk 2016)? In other words, is the English language believed to subvert or support the maintenance of Polish in migrant bilingual speakers? The purportedly
facilitative role of English may derive from the belief in the high prestige of English brought by migrants from their country of origin. Yet, although they see English as the language of opportunity (for employment, social integration and progress), it surfaces in in-depth interviews that migrants specifically hope for their children to become bilingual. To sum up, looking into the social context of language practices in migrants, in order to explore the conditioning of language attrition, requires data amenable to qualitative analysis. In particular, in-depth interviews, complemented by ethnographic observation, though rather time-consuming, help to avoid relying exclusively on self-report as given in response to language use tasks or strictly structured surveys. Closely listening to individual stories and opinions, as well as discussing them with the interlocutor, brings about a more nuanced insight and is a way to capture diversity in the migrant population.

3.2 Integrated Perspective

With respect to a general approach to the research design, some scholars recommend the so-called ‘integrated perspective’, which entails the dynamic co-existence of languages in the bi-/multilingual mind and their complex dependence on extralinguistic factors (e.g., Schmid & Yılmaz 2018). For example, in Cherciov’s (2013) study the interaction of sociolinguistic factors with language performance was shown to be non-linear and complex, and little predictive power of language use and language attitude variables was revealed. Thus, qualitative analyses of detailed interviews with participants were recommended as a way to retrieve data on their specific motivations for L1 maintenance, such as passing the heritage language on to the following generations. Indeed, the attitudes and behavioural responses of individual migrants may be very idiosyncratic.
Therefore, the results of studies on migrants in different parts of the world, and with different migration motivations, have not yielded a consistent pattern with respect to language competences and practices. In a recent study, Schmid and Yılmaz (2018: 1) admit that the complexity in how bilingual development is conditioned by the sociopsychological context is a methodological challenge: “our findings show that statistical models based on linear relationships fall short of capturing the full picture”. The recommendation has been that researchers should ideally look at both/all languages involved and their mutual influences. The authors pointed to the intensity of informal use of both L1 and L2 in daily life as strongly related to L1 maintenance/attrition.

3.3 Methodological Issues in Foreign Accent Ratings

From the methodological standpoint, the application of foreign accentedness ratings offers several advantages. Firstly, the ratings can be elicited
relatively quickly from a large range of raters at the same time. Secondly, this procedure provides a holistic assessment of all aspects of speech including segmentals, suprasegmentals, fluency, etc. Finally, rating perceived global foreign accent mirrors the assessment of pronunciation skills that we tend to perform on a regular basis when passing explicit or implicit evaluations of people’s speaking performance. However, there are several methodological issues that may affect the validity of FARs (cf. Schmid & Hopp 2014) such as the range of accents represented in the sample or rater effects (e.g., familiarity with accented speech). Moreover, accentedness is a complex construct and it is very difficult to assess it unidimensionally on a simple scale or to assign it to specific categories. The phenomenon of accentedness comprises a variety of different features and can be detectable on multiple layers including segmental and suprasegmental levels as well as speech rate, rhythm, disfluency markers, hesitations, etc. As pointed out by Southwood and Flege (1999), there are no physical units in which accent can be measured; therefore, the raters in a foreign accentedness study may exhibit a contraction bias, i.e., overestimate small differences and underestimate large ones. A considerable variety was reported with respect to the techniques used to elicit non-native speech samples. In most accentedness rating studies, subjects were recorded reading sentences, short fragments of texts and individual words (e.g., Bongaerts et al. 1997). Fewer studies employed samples of extemporaneous speech elicited through picture descriptions or recounting personal experiences (e.g., Elliott 1995). Finally, repetition techniques after a native model are also used, featuring either direct repetition (imitation) or delayed repetition (e.g., Flege 1995), the latter providing a more objective measure of speakers’ actual phonetic performance.
Schmid and Jarvis (2014), in their study on lexical access and lexical diversity in first language attrition, conclude that measuring attrition in free speech provides a more accurate impression of lexical attrition than formal tasks or elicited narratives. However, free speech is somewhat problematic in itself since morphosyntactic and lexical errors may influence the accentedness ratings.
4. Research Design Proposal

The present overview stresses the need to discuss the methodological issues related to L1 attrition studies based on FAR and to address them in a new research project. A study focusing on L1 foreign accentedness in Polish migrants in the UK needs to consider the socio-cultural conditioning of migrants’ linguistic choices as well as the relevance of the intra- and cross-linguistic contexts. Therefore, it is necessary to identify salient features indexing migrants’ L1 as foreign and to
explore the acculturation phenomena that correlate with L1 attrition processes. The accentedness ratings should include two perspectives, namely that of internal and external raters. The former would involve community members (Polish migrants in the UK) and their attitudes towards their own language testifying to the process of L1 attrition and a shift of dominance in the linguistic repertoire. The latter would encompass a view from outside, i.e., that of phonetically trained Polish native speakers and their evaluation of accented Polish. This approach is novel because a distinction of raters into internal and external ones has not been implemented in any related FAR studies so far. The participants of the forthcoming project will be speakers with Polish as the L1, migrants to the UK representing different age groups and different ages of arrival (AoA), for example, a) arrival early 1980s, adults (age 50+), b) arrival 2000–2010, adults (age 30+) and their children (age 6+). By considering two different waves of migration we would like to tap into how different migration motivations (i.e., predominantly politically based in the 1980s vs. more economically related in the early 21st century) affect the attitudes towards one’s own (native, home, heritage) language maintenance and whether they lead to potential L1 attrition. In the project we intend to highlight the complexity of circumstances and the multiplicity of factors involved in multi-/bilingualism in the migration situation (see the ‘integrated’ approach as discussed in Section 3.2). In particular, with respect to exploring the social conditioning of L1 attrition, we postulate a sensitive investigation into individual motivations and behaviours as socially shaped in an environment which is transient and challenging, in other words far from stable and secure.
Ultimately, we wish to demonstrate that speakers’ sociolinguistic practices result from their vitality and adaptability in a new environment, which often proves much less friendly than expected. We hope to be able to show how speakers manage their language skills and linguistic choices in the best interest of their own quality of life, but also with the new generations of heritage speakers in mind. The planned methods of data collection are meant to embrace the integrated perspective and will include bio-demographic questionnaires, oral narratives, accentedness ratings and semi-structured sociolinguistic interviews. Thus, we intend to propose a triangulated access to information about the participants, and an analytical approach of combined qualitative and quantitative interpretation of the data. Firstly, a survey, either online or researcher-administered, will aim at gathering bio-demographic data and exploring the first vs. second (additional) language use patterns. This will allow us to identify and control for such factors as the participants’ chronological age, their age of arrival (AoA) in the UK and the length of residence in the receiving country, the amount of exposure to the L1 and L2/Ln (according to a proposed measure) and the context of exposure to L1 and L2/Ln (at school, work,
Saturday school tutoring, family). Secondly, oral narratives will be collected through the use of a storytelling task aimed at investigating lexical richness, lexical instability, fluency/disfluencies (e.g., number of pauses) and code-switching patterns. The narratives will be used as data for further analysis, i.e., accentedness ratings. Thirdly, L1 foreign accentedness ratings will be performed as a form of an auditory assessment of crosslinguistic influence between language systems of the participants. The rating parameters will include:
• degree of foreign accentedness (on a Likert scale 1–7)
• comprehensibility (on a Likert scale 1–7)
• social acceptability (attitudes) (on a Likert scale 1–7)
• raters’ judgements of the number of years spent by a speaker in the UK.
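The rating parameters listed above, together with the proposed split into internal and external raters, could be recorded and summarised roughly as follows. This is only a minimal sketch under stated assumptions: the record layout, the field names (`speaker`, `rater_group`, `years_in_uk_guess`, etc.) and the use of the median as a summary for ordinal Likert scores are illustrative choices, not part of the proposed design.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

LIKERT_FIELDS = ("accent", "comprehensibility", "acceptability")
RATER_GROUPS = {"internal", "external"}


@dataclass(frozen=True)
class Rating:
    """One rater's judgement of one speech sample (hypothetical layout)."""
    speaker: str              # speaker identifier
    rater_group: str          # "internal" (community member) or "external"
    accent: int               # degree of foreign accentedness, 1-7
    comprehensibility: int    # 1-7
    acceptability: int        # social acceptability, 1-7
    years_in_uk_guess: float  # rater's estimate of the speaker's years in the UK


def validate(rating: Rating) -> None:
    """Raise ValueError if a Likert score or the rater group is out of range."""
    for name in LIKERT_FIELDS:
        value = getattr(rating, name)
        if not 1 <= value <= 7:
            raise ValueError(f"{name}={value} is outside the 1-7 scale")
    if rating.rater_group not in RATER_GROUPS:
        raise ValueError(f"unknown rater group: {rating.rater_group!r}")


def median_accent_by_group(ratings):
    """Return {(speaker, rater_group): median accentedness score}."""
    buckets = defaultdict(list)
    for rating in ratings:
        validate(rating)
        buckets[(rating.speaker, rating.rater_group)].append(rating.accent)
    return {key: median(scores) for key, scores in buckets.items()}
```

Keeping the rater group on every record makes it straightforward to compare how community members and trained phoneticians judge the same speaker.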
The speech-sample stimuli for the rating study will come from the oral narratives collected at a previous stage. The ensuing analysis will involve a phonetic and sociolinguistic analysis. Semi-structured sociolinguistic interviews will supplement the survey by garnering detailed ethnographic information on the participants’ language behaviours and attitudes. A qualitative analysis of accounts of language use in everyday life and typical communicative encounters will be conducted to reveal relevant socio-psychological factors, code-switching patterns as well as perceptions of accentedness and attitudes to it. For example, we will question the adequacy of the established term of ‘foreign accentedness’ as we believe a migrant, with a long-standing use of L2 English as a dominant language in numerous contexts, will not perceive English as a ‘foreign’ language. Thus, we will propose the use of “L2 accentedness in L1” as a cover term for our research focus. The analytical approach will accommodate the varied types of data employed and encompass an array of complementary analyses. Firstly, a quantitative analysis of raters’ measures of L1 foreign accentedness ratings (three rating parameters) as well as an analysis of salient features reported as contributing to the perception of foreign accentedness will be conducted. The subsequent analysis of lexico-discursive characteristics (i.e., lexical richness, code-switching) and the phonetic analysis (i.e., fluency measures) will be based on the data drawn from the oral narratives. The ensuing qualitative analysis of speakers’ and raters’ accounts of extralinguistic, social and cultural factors will take into account the data from semi-structured ethnographically-oriented interviews. Further, a correlational analysis between the results of foreign accentedness ratings and interdependent variables (internal and external factors) will be conducted.
Another correlation will be drawn between the acculturation phenomena (elicited from interviews) and L1 attrition processes (based on the performed lexico-discursive analysis and phonetic fluency measures).
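The correlational analyses outlined above could be sketched as follows. The proposal does not name a particular coefficient. Spearman's rank correlation is assumed here only because Likert ratings are ordinal, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

For instance, `spearman(accent_scores, years_of_residence)` would relate each speaker's summary accentedness score to their length of residence, where both argument names stand for per-speaker lists that are assumed here, not taken from the proposal.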
The project will focus on the phenomenon of individual ‘linguascapes’, reflected in the language practices, with speakers translanguaging or code-switching in context. Specifically, we intend to investigate attitudes to foreign-accentedness of L1 in the migration setting, i.e., to explore the acculturation phenomena that correlate with L1 attrition processes. We wish to highlight the significance of such research for local language policies and the well-being and cultural value of heritage speakers in the face of bi-/multilingualism being increasingly recognised as an asset.
5. Conclusions and Implications

Over decades of research on language attrition, particularly L1 attrition in migration settings, scholars have described linguistic phenomena and processes across a wide variety of contexts. However, whenever multiple languages co-exist in societies and in speakers’ minds, too often have they been treated in separation rather than as interacting, and as dynamic resources accessible and flexibly used by speakers in situated ways. Recently, scholars have pleaded for investigating language attrition as a cross-linguistic process: involving the co-existence of, and interaction between, at least two languages, treated as dynamic systems. This contribution and the forthcoming research project are intended to tap into this trend.
References

Abrahamsson, N. & K. Hyltenstam. 2009. Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning 59. 249–306.
Ben-Rafael, M. & M. S. Schmid. 2007. Language attrition and ideology: Two groups of immigrants in Israel. In B. Köpke, M. S. Schmid, M. Keijzer & S. Dostert (eds.), Language attrition: Theoretical perspectives, 205–226. Amsterdam, The Netherlands: John Benjamins.
Bongaerts, T., C. van Summeren, B. Planken & E. Schils. 1997. Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition 19. 447–465.
Cherciov, M. 2013. Investigating the impact of attitude on first language attrition and second language acquisition from a Dynamic Systems Theory perspective. International Journal of Bilingualism 17(6). 716–733.
de Bot, K. & M. Clyne. 1994. A 16 year longitudinal study of language attrition in Dutch immigrants in Australia. Journal of Multilingual and Multicultural Development 15(1). 17–28.
Dziubalska-Kołaczyk, K. 2016. Identities of English: A dynamic emergent scene. Plenary talk at ISLE International Society for the Linguistics of English, Poznań. 18–21 September 2016.
Elliott, R. E. 1995. Field independence/dependence, hemispheric specialization, and attitude in relation to pronunciation accuracy in Spanish as a foreign language. The Modern Language Journal 79. 356–371.
Flege, J. E. 1988. Factors affecting degree of perceived foreign accent in English sentences. Journal of the Acoustical Society of America 84. 70–79.
Flege, J. E. 1995. Second-language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 229–273. Timonium, MD: York Press.
Flege, J. E. & K. L. Fletcher. 1992. Talker and listener effects on degree of perceived foreign accent. Journal of the Acoustical Society of America 91. 370–389.
Gallardo del Puerto, F., E. Gómez Lacabex & M. L. García Lecumberri. 2007. The assessment of foreign accent by native and non-native judges. PTLC Proceedings, London, CD-ROM.
Hopp, H. & M. S. Schmid. 2013. Perceived foreign accent in first language attrition and second language acquisition: The impact of age of acquisition and bilingualism. Applied Psycholinguistics 34(2). 361–394.
Hulsen, M. 2000. Language loss and language processing: Three generations of Dutch migrants in New Zealand. Doctoral dissertation, Katholieke Universiteit, Nijmegen.
Jaspaert, K. & S. Kroon. 1989. Social determinants of language loss. Review of Applied Linguistics (I.T.L.) 83/84. 75–98.
Kim, S. H. O. & D. Starks. 2008. The role of emotions in L1 attrition: The case of Korean-English late bilinguals in New Zealand. International Journal of Bilingualism 12(4). 303–319.
Kim, S. H. O. & D. Starks. 2010. The role of fathers in language maintenance and language attrition: The case of Korean-English late bilinguals in New Zealand. International Journal of Bilingual Education and Bilingualism 13(3). 285–301.
Köpke, B. & M. S. Schmid. 2004. First language attrition: The next phase. In M. S. Schmid, B. Köpke, M. Keijzer & L. Weilemar (eds.), First language attrition: Interdisciplinary perspectives on methodological issues, 1–43. Amsterdam and Philadelphia: John Benjamins.
Liu, X. & J.-K. Lee. 2012. The contribution of prosody to the foreign accent of Chinese talkers’ English speech. Phonetics and Speech Sciences 4(3). 59–73.
Marecka, M., M. Wrembel, D. Zembrzuski & A. Otwinowska-Kasztelanic. 2015. Do early bilinguals speak differently than their monolingual peers? Predictors of phonological performance of Polish-English bilingual children. In E. Babatsouli & D. Ingram (eds.), Proceedings of the international symposium on monolingual and bilingual speech 2015, 207–213. Chania: Institute of Monolingual and Bilingual Speech.
Montrul, S. 2002. Incomplete acquisition and attrition of Spanish tense/aspect distinctions in adult bilinguals. Bilingualism: Language and Cognition 5(1). 39–68.
Pallier, C. 2007. Critical periods in language acquisition and language attrition. In B. Köpke, M. S. Schmid, M. Keijzer & S. Dostert (eds.), Language attrition: Theoretical perspectives, 155–168. Amsterdam and Philadelphia: John Benjamins.
Paradis, M. 2007. L1 attrition features predicted by a neurolinguistic theory of bilingualism. In B. Köpke, M. S. Schmid, M. Keijzer & S. Dostert (eds.), Language attrition: Theoretical perspectives, 121–133. Amsterdam: John Benjamins.
Piske, T., I. R. A. MacKay & J. E. Flege. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29. 191–215.
Ribes, Y. & À. Llanes. 2015. First language attrition: The effects of acculturation to the host culture. Procedia: Social and Behavioral Sciences 173. 181–185.
Schmid, M. S. 2002. First language attrition, use and maintenance: The case of German Jews in Anglophone countries. Amsterdam and Philadelphia: John Benjamins.
Schmid, M. S. 2011. Language attrition. Cambridge: Cambridge University Press.
Schmid, M. S. 2016. First language attrition. Language Teaching 49(2). 186–212.
Schmid, M. S. & E. Dusseldorp. 2010. Quantitative analyses in a multivariate study of language attrition: The impact of extralinguistic factors. Second Language Research 26(1). 125–160.
Schmid, M. S. & H. Hopp. 2014. Comparing foreign accent in L1 attrition and L2 acquisition: Range and rater effects. Language Testing 31(3). 367–388.
Schmid, M. S. & S. Jarvis. 2014. Lexical access and lexical diversity in first language attrition. Bilingualism: Language and Cognition 17(4). 729–748.
Schmid, M. S. & G. Yılmaz. 2018. Predictors of language dominance: An integrated analysis of first language attrition and second language acquisition in late bilinguals. Frontiers in Psychology 9 (article 1306, published online 20 August 2018). doi.org/10.3389/fpsyg.2018.01306.
Schwartz, S. J. et al. 2010. Rethinking the concept of acculturation: Implications for theory and research. American Psychologist 65(4). 237–251. doi: 10.1037/a0019330.
Southwood, M. H. & J. E. Flege. 1999. Scaling foreign accent: direct magnitude estimation versus interval scaling. Clinical Linguistics & Phonetics 13. 335–349.
Waas, M. 1996. Language attrition downunder: German speakers in Australia. Frankfurt/M. and Berlin: Peter Lang.
Wrembel, M., M. Marecka, J. Szewczyk & A. Otwinowska. 2019. The predictors of foreign-accentedness in the home language of Polish-English bilingual children. Bilingualism: Language and Cognition 22(2). 383–400.
14 The Greater Poland Spoken Corpus
Data Collection, Structure and Application
Małgorzata Kul, Paulina Zydorowicz and Kamil Kaźmierski

1. Origins of the Project and Data Collection Process¹

1.1 Project Background

The aim of this contribution is to present the Greater Poland Spoken Corpus (GPSC): its origin, the process of data collection, its structure, possible applications and future developments. The corpus was collected within the project Building an Internet Corpus of Contemporary Spontaneous Polish Spoken in the Area of Greater Poland. The aim of the project was to construct a corpus in the form of recordings, accompanied by orthographic transcripts and partial phonetic transcription, and to make it available to the academic community. One of the existing corpora is the National Corpus of Polish (Narodowy Korpus Języka Polskiego, henceforth NKJP, Pęzik 2012); nevertheless, an analysis of its structure justified the construction of a new contemporary resource of spoken Polish. Firstly, spoken language constitutes merely 10% of NKJP, so written language forms the core of the resource. Within this 10%, only 1% of the data come from spontaneous conversations (Przepiórkowski et al. 2009); the remaining 9% comprise public speeches, interviews, stenographic records and TV and radio programs, which depart from the nature of spontaneous, casual conversations. Secondly, the thematic content of the spoken component is highly varied, whereas studies of phonetic variability require repetitions of the same words in various phonetic, prosodic and semantic contexts. Sufficient numbers of repetitions are necessary to establish phonetic norms for contemporary Polish, e.g., vowel formant values or Voice Onset Time. Furthermore, NKJP does not contain a controlled word list or a scripted text read by all speakers, which would allow the pronunciation of words in scripted and unscripted speech to be compared.
Demographic information about the speakers is incomplete (place of residence/origin is either uncontrolled or unknown). The recordings differ in length, from 17 seconds to 38 minutes. Finally, most recordings come from the PELCRA corpus, collected between 1992 and 2003, so their contemporaneity may be questioned. Another extant resource on phonetic variation in Polish is Słownik wariantywności
fonetycznej współczesnej polszczyzny (A Dictionary of Phonetic Variation in Contemporary Polish, Madelska 2005). Its strength lies in professional phonetic transcriptions of recordings, collected among university students from different places of origin in Poland. The resource serves as a valuable tool for linguists, speech therapists and language teachers. Its limitations include the lack of background information about the speakers, the cumulative nature of the data, which precludes studies of interspeaker variability, and most importantly, no access to the sound files, which forecloses acoustic analyses.

Table 14.1 A Compilation of Spoken Polish Corpora and Their Characteristics
Columns: Existing corpora of spoken Polish; Large (length and speaker sample size); Includes spontaneous speech and/or conversational speech; Excellent acoustic quality; Open access to research; Annotated for phonetic research
Corpora: The National Corpus of Polish (NCP); PELCRA; JURISDICT; Multimodal Communication: Culturological Analysis (MCCA); Polish Corpus of Wroclaw University of Technology (PC of WUT); LUNA; Polish Speech Corpus of AGH (PSC of AGH); Audiovisual Polish Speech Corpus; The Database of Emotional Speech; Acoustic Database for Polish Unit Selection Speech Synthesis
Cell entries as printed: ✔; ✔; —; ✔; partly; ✔ — —; ? partly ✔ (6%); — ✔ ? (no info); — ? (no info) ? (no info) ? (no info) ? (no info) ? (no info); —; ✔; —; ? (no info) ? (no info); ✔ —; — —; ✔ ✔; ? (no info) ? (no info) ? (no info) ? (no info); —; —; ✔; ? (no info) ? (no info); ✔; —; —; ✔; ? (no info); ✔; —; —; ✔; —

To the best of our knowledge, ten Polish speech corpora are available. Most of them, however, do not fulfill the following criteria as compiled in Table 14.1: (1) it is free, open and easily accessible, (2) it is large, (3) it possesses excellent acoustic quality, (4) it contains speech data, elicited
from contemporary speakers of Polish and (5) it includes recordings of spontaneous speech in a conversational setting. Even though current corpora are of excellent acoustic quality, many are small in sample size or recording duration. While much conversational data is scattered across larger collections, we are not aware of a database of sufficient size that collects spontaneous dialogue or multiparty conversation of good acoustic quality and without crosstalk. Some large databases of excellent audio quality (JURISDICT) are unavailable or proprietary. In light of the above compilation, it appears that corpora of spoken Polish designed for acoustic analysis and available online are scarce and/or unavailable for research (marked here as ? no info). Thus, by building the Greater Poland Spoken Corpus we wish to remedy the situation.

1.2 The Structure of the Corpus

The corpus represents speakers from the city of Poznań as well as the voivodeship of Greater Poland (see Figure 14.1). Its construction was greatly inspired by the methodology applied in the collection of the English corpus La Phonologie de l’Anglais Contemporain (Durand & Pukli 2004) and its French counterpart Phonologie du Français Contemporain (Durand et al. 2002). The methodology draws on the works of Labov (1972) and Milroy (1980). According to Milroy (1980), a participant recruited for the recording session selects another participant, for example, a family member or an acquaintance. In the construction of GPSC, we followed suit: the interview was conducted in a 2 + 2 format, i.e., the two speakers knew each other and were accompanied by two interviewers, and the recordings were collected in an amiable, informal environment. The recordings were collected in a setting familiar to the participants, i.e., at university (in the case of students), at work (in the case of employees) or at home (in the case of some speakers residing outside of Poznań).
The recording procedure consisted of three parts: providing background information, the interview proper, and sentence reading. First, each participant was requested to fill in a questionnaire with background information (place of birth, place of residence, education, family background, command of foreign languages) and to sign a consent form regarding the use of the data for academic and scientific purposes. Subsequently, participants were interviewed by two project investigators. The thematic scope of the interview comprised such topics as studies (in the case of students), work (in the case of graduates/jobholders), the place of residence, the Internet, culture and entertainment in Poznań. Thematic coherence ensures numerous repetitions of the same words within and across speakers. The minimum length of the spontaneous component amounts to 15 minutes, but most frequently the recording extends to 40 minutes per pair. Lastly, participants were instructed to read a list of words
Figure 14.1 GPSC Speakers Within the Area of Greater Poland. Point size corresponds to the number of speakers. So-called ‘dialect’ speakers are not included.
embedded in carrier sentences. This task was recorded in an anechoic chamber. Prior to corpus collection, pilot sessions were organized with a view to verifying the format and the conditions of the recording session as well as revising the word list. The aim of introducing the word list into the procedure was to obtain a set of words which are produced under controlled conditions, and to obtain reference values for segments,
e.g., vowel formants or consonant quality prior to the application of connected speech processes. The structure of the word list was as follows: the vocabulary items were selected to represent the phonemic inventory of Polish (they were largely inspired by the word list in Jassem 2003); subsequently, the list was supplemented with the most frequent words from the pilot interviews. The words were embedded in three types of carrier sentences, which guaranteed three repetitions of each word by each participant. Such a policy ensures a fair representation of the phonemic inventory of Polish, on the one hand, and on the other hand, it enables comparing words produced in scripted and unscripted speech. Additionally, 11 words and phrases were added to the list in order to trace the pronunciation of labials and the process of pre-sonorant voicing of obstruents. The data was collected between November 2013 and July 2017. The high quality of the recordings is ensured by using a professional Roland R-26 recorder and Rode lavalier (lapel) microphones. Upon data collection, the corpus was transliterated in Praat (Boersma & Weenink 2014) (15 minutes per session). Each transcript was double-checked by another transcriber. Altogether, the corpus comprises recordings of 94 speakers (63 female and 31 male), born between 1951 and 2006. Seventy-three transcripts of spontaneous conversation are available (68 speakers with text grids ready to be processed automatically in systems like LaBB-CAT (Fromont & Hay 2012); 34 speakers with .doc files). Phonemic transcription (for the transliterated parts) was added automatically; three reduction processes, namely consonant cluster reduction, intervocalic w-deletion and assimilation of /t/ + /ʂ/ sequences, were manually annotated. The view of a Praat window is depicted in Figure 14.2.
For illustration purposes, we present the waveform, the spectrogram, the transcription tier, the number of syllables for each word, the tier where the three processes are identified, the speech rate and the orthographic tier. The corpus documents Poznań speech in the second decade of the third millennium and enables tracing language evolution and making comparisons with previous descriptions of the Polish language. The corpus is also a suitable resource for dialectal, sociolinguistic, phonostylistic and phonotactic studies as well as for training speech recognition systems and increasing their effectiveness (Wypych 1999). The recordings and transcripts are available online in an electronic version for educational and scholarly purposes (Kul & Zydorowicz online, http://wa.amu.edu.pl/korpuswlkp/).
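The speech-rate tier mentioned above (syllables per second) can in principle be derived from the per-word syllable counts and the utterance duration. The following is a minimal sketch of that arithmetic, not the procedure actually used for the corpus: the `Word` record, the `speech_rate` function and the timings are hypothetical, and we assume that duration is measured from the onset of the first word to the offset of the last, pauses included.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word interval on a hypothetical word tier."""
    label: str
    start: float     # onset in seconds
    end: float       # offset in seconds
    syllables: int   # value taken from the syllable-count tier


def speech_rate(words):
    """Return syllables per second from first word onset to last word offset."""
    if not words:
        raise ValueError("utterance has no words")
    duration = words[-1].end - words[0].start
    if duration <= 0:
        raise ValueError("utterance duration must be positive")
    return sum(w.syllables for w in words) / duration


# Invented timings for illustration only; they are not corpus measurements.
utterance = [
    Word("idą", 0.00, 0.30, 2),
    Word("tą", 0.30, 0.50, 1),
    Word("drogą", 0.50, 1.00, 2),
]
rate = speech_rate(utterance)
print(f"{rate:.1f} syllables per second")
```

Whether pauses should count towards the duration is a design decision; excluding them would yield an articulation rate rather than a speech rate.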
2. A Study Illustrating the Use of the GPSC Corpus

We have used the corpus to establish the prevalence of different pronunciation variants of selected phonological variables in present-day Poznań
Figure 14.2 A Spectrogram and Selected Annotation Layers of an Example Sentence. The layers (counting from the top) display: (1) phonological transcription of each word, (2) number of syllables of that word, (3) the incidence of a connected speech process, (4) speech rate (in syllables per second) of the utterance, and (5) an orthographic transcript of the utterance.
speech in order to present a quantitatively informed picture of variation in this variety of Polish (Kaźmierski et al. in press). We also compared these results to the record of Poznań speech from the 1980s, as documented by Witaszek-Samborska (1985, 1986), to address the issue of potential dialect leveling.

2.1 Method

The part of our corpus that was transcribed and time-aligned at the time this project started (15 minutes of speech from four speakers) was queried with regard to all the features Witaszek-Samborska studied, and all variables with at least 20 occurrences of the relevant context were selected for further investigation. Eight features met this criterion; four of them were classified as ‘widespread’ and four as ‘recessive’ in the original study. The widespread features included: (1) pre-sonorant voicing of obstruents (pta[g] odfrunął ‘the bird flew away’), (2) nasal stopping, i.e., the realization of word-final /ɔ̃/ as [ɔm] (id[ɔm] t[ɔm] drog[ɔm] ‘they are going this
way’; other recorded realizations of word-final /ɔ̃/ include [ɔw̃], [ɔw] and [ɔ]), (3) cross-morphemic nasal assimilation (panie[ŋ]ka ‘maid’), and (4) stop + fricative affrication ([t͡ʂ]eba ‘one has to’). The recessive features included: (1) pre-j /ɛ/ raising (lepi[i]j ‘better’), (2) voicing retention in clusters (tr[v]ały ‘lasting’), (3) prothetic /w/ ([w]ojciec ‘father’) and (4) śmy voicing and stress shift (słyszelˈi[ʑ]my ‘we heard’). Once additional time-aligned transcripts were in place, only Poznań residents (14 altogether, 11 females and 3 males, born between 1993 and 1996) were selected from among all speakers in the corpus, and their transcripts were queried for all contexts where the eight selected features of Poznań speech could show up. The results were then coded manually by the three authors, based on audition, aided by visual inspection of spectrograms.

2.2 Results

Most of our variables were binary: the variable was either realized with the local variant or it was not. Only word-final /ɔ̃/ and śmy voicing + stress shift had four possible realizations (details below). Due to the rather small number of speakers in this preliminary study, we cannot perform a robust statistical analysis. Instead, we present summaries of our data set, hoping to stimulate discussion and further research. We have chosen to visualize only the results for ‘widespread’ (cf. section 2.1) Poznań speech features (see Figure 14.3, panels a1, b1, c1 and d1); the outcomes for recessive features are presented without graphical aids. Plots showing individual variation (i.e., panels a2, b2, c2 and d2 in Figure 14.3) omit speakers who did not produce the relevant contexts at all.

Pre-Sonorant Voicing

Of all the obstruent + sonorant sequences across word boundaries (N = 723), 42% (i.e., 301) showed the voiced variant and 58% (i.e., 422) the voiceless variant (see Panel a1 in Figure 14.3).
While Speakers 24, 8, 6, 22, 55 and 44 display a tendency towards pre-sonorant voicing (above the mean of 42%), Speakers 36, 65, 62, 21, 20 and 31 show rates well below the mean.

Word-Final /ɔ̃/

We recorded four realizations of word-final /ɔ̃/: (1) vowel followed by nasalized glide [ɔw̃], (2) vowel followed by nasal stop [ɔm], (3) vowel followed by oral glide [ɔw] and (4) oral vowel [ɔ]. Variants (1) and (2) were the most common. The traditional Poznań feature, nasal stopping (i.e., the [ɔm] variant), occurred in only 25% of cases (80/323). This could suggest that [ɔm] is not the dominant variant in present-day Poznań speech. Instead, the
Figure 14.3 Results for ‘Widespread’ Features of Poznań Speech. Panel a1: pre-sonorant voicing. Panel a2: individual variation for pre-sonorant voicing (NB here and in following plots with binary outcomes, the top broken line represents the mean of means, and the bottom line an arbitrary threshold of 10%). Panel b1: word-final /ɔ̃/. Panel c1: cross-morphemic nasal assimilation. Panel c1 shows only 12 speakers, as for two speakers, a cross-morphemic /nk/ sequence did not arise. Panel d1: stop + fricative affrication.
standard [ɔw̃] dominates (71% of tokens, 229/323). This contrasts with Witaszek-Samborska (1985), who found [ɔm] to be “widespread” for speakers of all age groups, with 23 out of 43 speakers in her sample having [ɔm] as an exclusive or almost exclusive variant. The occurrence of [ɔw] and [ɔ] is a little surprising, as [ɔw] is typical of Warmia and Mazury (Dubisz et al. 1995: 114), and [ɔ] of Eastern Poland (Dunaj 2006: 163). As Panel b2 in Figure 14.3 illustrates, only for three speakers (62, 20, 31) was [ɔm] the dominant variant. For all remaining speakers, [ɔw̃] is favored.

Cross-Morphemic Nasal Assimilation

The post-dental /n/ showed regressive place assimilation in 71% of all cross-morphemic /nk/ sequences (37/52) (Panel c1 in Figure 14.3). For most speakers, assimilation was an exclusive variant, with only three speakers oscillating, and one speaker not using it at all. Due to a rather small number of observations, however, these percentage values should be approached with caution.

Stop + Fricative Affrication

For this variable, 23% of tokens were affricated variants (21/92) and 77% were not (71/92). Panel d1 in Figure 14.3 illustrates their distribution. The results indicate that while the affricated variant was overall rather rare among Poznań speakers, as Panel d2 in Figure 14.3 illustrates, certain speakers produced the affricated variant at high rates.

Pre-j /ɛ/ Raising

Out of all cases of /-ɛj/, only 13% (40/307) were raised, whereas the remaining 87% (267/307) were not. It seems that the use of the raised variant of /ɛ/ is marginal nowadays. It has to be noted that within the raised variants, none was realized as [ij], i.e., with a fully high vowel. Rather, we observed a realization closer to [ɨj]. Half of the speakers showed some degree of pre-j /ɛ/ raising.
Voicing Retention in Clusters

Turning to the retention of voicing in /v/ in clusters in which it is preceded by voiceless plosives, only 4% of tokens were voiced (3/67) while the remaining 96% were voiceless (64/67). In comparison with the Poznań speech of the 1980s, very few instances of the voiced variant occurred in the speech of modern Poznań inhabitants.
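The overall rates reported so far can be recomputed from the token counts. The sketch below is only an illustrative cross-check: the dictionary and function names are introduced here for convenience, and the denominator for nasal stopping follows Table 14.2.

```python
# Token counts as reported in this section:
# (tokens realized with the local Poznań variant, all relevant contexts).
COUNTS = {
    "nasal assimilation": (37, 52),
    "pre-sonorant voicing": (301, 723),
    "nasal stopping": (80, 323),  # denominator as given in Table 14.2
    "affrication": (21, 92),
    "pre-j /ɛ/ raising": (40, 307),
    "voicing retention in /v/": (3, 67),
}


def rate_percent(local, total):
    """Overall rate of the local variant, rounded to a whole percentage."""
    return round(100 * local / total)


rates = {name: rate_percent(*counts) for name, counts in COUNTS.items()}

# Print the variables from the highest to the lowest overall rate.
for name, pct in sorted(rates.items(), key=lambda item: -item[1]):
    local, total = COUNTS[name]
    print(f"{name:<26} {pct:>3}% ({local}/{total})")
```

These are pooled rates over all tokens; a mean of per-speaker rates, as drawn in the individual-variation panels of Figure 14.3, can differ from them when speakers contribute unequal numbers of contexts.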
Prothetic /w/

Among the four hundred cases of word-initial /ɔ/ in the data set, we have not found a single instance of prothetic /w/ before /ɔ/. Thus, prothetic /w/ seems not to be a variable in modern Poznań speech.

Śmy Voicing and Stress Shift

Based on the literature, we expected to find two potentially related processes in words ending in /ɕmɨ/: voicing of /ɕ/ and stress shift from antepenultimate to penultimate, e.g., ˈmieliśmy → mieˈli[ʑ]my. First of all, the overall number of observations was very low (N = 20). Most tokens had the stress shift; none had voicing.

As a summary of the results, the eight features of Poznań speech, divided into widespread (top four rows) and recessive (bottom four rows) according to Witaszek-Samborska (1985), are presented in Table 14.2. Another finding which merits particular interest is that the rates of use of the Poznań variables by the 14 speakers varied considerably. This suggests that certain speakers are more ‘dialectal’ than others, feeding the discussion on the role of interspeaker variability in language theories (as hinted in Piroth & Janker 2004).

Table 14.2 Summary of the Variables and Comparison to Witaszek-Samborska (1985). Percentages show overall rates

Variable                   Witaszek-Samborska (1985)                 Present study
nasal assimilation         Widespread (present in all age-groups)    71% (37/52)
pre-sonorant voicing       Widespread (present in all age-groups)    42% (301/723)
nasal stopping             Widespread (present in all age-groups)    25% (80/323)
affrication                Widespread (present in all age-groups)    23% (21/92)
pre-j /ɛ/ raising          Recessive (in older age-groups only)      13% (40/307)
voicing retention in /v/   Recessive (in older age-groups only)      4% (3/67)
prothetic /w/              Recessive (in older age-groups only)      0% (0/400)
śmy voicing                Recessive (in older age-groups only)      0% (0/20)

2.3 Discussion

The four variables categorized as recessive in 1985 (i.e., raising of the vowel in /-ɛj#/, retention of voiced fricative /v/ in clusters after voiceless plosives, insertion of a prothetic /w/ before word-initial /ɔ/ and voicing /ɕ/ to /ʑ/ in /-ɕmɨ/) are either marginally represented or nonexistent. As the original survey classified them as widespread in the speech of older speakers only, their weak representation in the speech of young speakers 30 years down the line is a logical continuation. Somewhat surprisingly, the variable with the highest incidence within the group of recessive
features is the raising of the vowel in /-ɛj#/. It has to be pointed out that 65% of cases for which it was coded to occur involve a pre-palatal, palatal or palatalized consonant immediately preceding the vowel (e.g., później [ˈpuʑɲi] ‘later’, angielskiej [aŋˈɟelsci] ‘English’, lepiej [ˈlɛpʲji] ‘better’, respectively). The raising of the vowel in these cases can be plausibly interpreted as a coarticulatory phonetic effect, rather than as evidence of the speakers having a representation of the suffix with /i/ instead of /ɛ/. The remaining cases, however, were not preceded by palatal segments, and so arguably are instances of a representation with a raised vowel. More relevant to the issue of dialect leveling, local variants of the four variables which were categorized as widespread in all age groups in 1985 (i.e., /ŋ+k/, pre-sonorant voicing, /-ɔ̃#/ as /ɔm/ and affrication) show rates of use which are not indicative of their dominance in the present study. Only one of the variables (/ŋ+k/) was realized with its local variant in more than half of the contexts in which it was expected (71% of the time), and so its use could arguably still be seen as ‘widespread’, with no decline compared to the 1980s. The remaining three variables seem to be on the decline, with rates of incidence below 50%. Caution, however, needs to be taken with regard to this conclusion for several reasons. First, with regard to the state of affairs in the 1980s, the original survey does not present quantitative data, and it is not straightforward how to interpret the impressionistic category of ‘widespread’ in numerical terms. Second, the present study is based on speech collected by conducting informal interviews. Still, a certain degree of self-consciousness of the participants leading to speech-monitoring cannot be completely ruled out.
If the rates are lowered by conscious suppression of local variants by the speakers, however, such style-shifting might be an indication of stigmatization of these variants. This usually implies a ‘change from above’, i.e., a change towards a socially prestigious norm.

Table 14.3 Implicational Relationship Between Variables. A check mark means that the speaker has realized in excess of 10% of tokens with the Poznań variant
Variables: affrication; /ŋ+k/; pre-sonorant voicing; /ɔm/ for /ɔ̃#/
Speakers: 6, 8, 21, 22, 24, 36, 43, 62, 20, 31, 44, 55, 65, 69
Cell entries as printed: 6 ✓ ✓ ✓; 8 ✓ ✓ ✓; 21 ✓ ✓ ✓; ✓ ✓ ✓; 22 ✓ ✓ ✓; 24 ✓ ✓ ✓; 36 ✓ ✓ ✓; 43 ✓ ✓ ✓; 62 ✓ ✓ ✓; 20; 31; 44; 55; 65; ✓ ✓; ✓ ✓; ✓ ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; ✓; 69

Interestingly, the variables whose realization with the Poznań variant is noticeable (> 10% of all contexts for a given speaker) remain in an implicational relationship to one another (Table 14.3), in that having a local realization of a variable higher up in the table implies having a
local realization of all the variables below (the two variables at the bottom are tied). This generalization holds with no exceptions. For instance, the presence of the velar nasal /ŋ/ before heterosyllabic /k/ implies having both pre-sonorant voicing and /ɔm/ for /ɔ̃#/. This is an indication that the variables differ in the degree to which they are associated with Poznań speech.
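The implicational relationship described above can be checked mechanically once per-speaker rates are available. The sketch below is a minimal illustration under stated assumptions: the speaker labels and rates are invented, the names `SCALE`, `TIED` and `violations` are hypothetical, a variable with no recorded contexts is treated as absent, and we read "tied" as meaning that neither of the two bottom variables implies the other.

```python
THRESHOLD = 0.10  # a variant counts as present above 10% of a speaker's contexts

# Variables ordered from the top to the bottom of the scale.
SCALE = ["affrication", "ŋ+k", "pre-sonorant voicing", "ɔm"]
# The two bottom variables are tied: no implication is checked between them.
TIED = {"pre-sonorant voicing", "ɔm"}


def has_variant(rates, variable):
    """True if the speaker's rate for this variable exceeds the threshold.

    A variable missing from the speaker's record is treated as absent,
    which is a simplifying assumption of this sketch.
    """
    return rates.get(variable, 0.0) > THRESHOLD


def violations(rates):
    """List (higher, lower) pairs where the higher variable is present
    but a variable lower on the scale is absent."""
    found = []
    for i, higher in enumerate(SCALE):
        for lower in SCALE[i + 1:]:
            if higher in TIED and lower in TIED:
                continue
            if has_variant(rates, higher) and not has_variant(rates, lower):
                found.append((higher, lower))
    return found


# Invented per-speaker rates, for illustration only.
speakers = {
    "A": {"affrication": 0.40, "ŋ+k": 1.00, "pre-sonorant voicing": 0.55, "ɔm": 0.20},
    "B": {"affrication": 0.00, "ŋ+k": 0.80, "pre-sonorant voicing": 0.30, "ɔm": 0.60},
    "C": {"affrication": 0.50, "ŋ+k": 0.00, "pre-sonorant voicing": 0.30, "ɔm": 0.15},
}
report = {name: violations(rates) for name, rates in speakers.items()}
for name, pairs in report.items():
    print(name, "consistent with the scale" if not pairs else f"violations: {pairs}")
```

A data set conforms to the scale when every speaker returns an empty list; the invented speaker C is included only to show what a violation would look like.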
3. Implications for Further Research

The study reported above would greatly benefit from including a larger number of subjects. Apart from this, three directions for further research may be outlined. Firstly, we could compare the Poznań features with the features of speakers from outside Poznań. In order to verify the dialect leveling hypothesis further, we could compare the outcomes of the present study with other varieties of Polish, the Warsaw variety in particular. Secondly, in addition to furnishing quantitative analysis of eight Poznań features, we ought to investigate the factors behind the features’ variability. These factors may include lexical frequency, predictability, production planning, phonetic context, speakers’ gender and age. Thirdly, further light could be cast on the issue of awareness and style shifting by annotating the recordings with regard to style. Currently, there is a lively interest in investigating unscripted speech. While until recently phoneticians and phonologists were almost exclusively interested in highly controlled laboratory speech (Johnson 2003), they are now increasingly turning towards unscripted data. In the past, unscripted speech was seen as unconstrained, with too many interfering variables to allow for useful generalizations. However, with the advent and dissemination of sophisticated statistical tools, it is now possible to control for a number of variables at the data analysis stage, instead of at the data collection stage. Thus, the benefits of using unscripted speech can be fully exploited. It has to be stressed that the GPSC in its current shape will be expanded and fully annotated. The recordings already collected to form part of the Greater Poland Spoken Corpus will be further supplemented by recordings of approximately thirty-five speakers to attain equal numbers of female and male speakers.
Additionally, orthographic transcripts will be prepared in Praat for all new recordings, as well as for the parts of the corpus which, as yet, are not accompanied by orthographic transcriptions. As of now, the Greater Poland Spoken Corpus contains orthographic transcriptions of 15 minutes of each recording. In the future, all recordings will be provided with orthographic transcripts in their entirety. The orthographic transcriptions at the utterance level of the entire corpus will be used to create orthographic annotation at the word level and phonetic annotation at the phonemic level (automatically, through forced alignment), as well as to add part-of-speech tagging (semi-automatically).
The annotated corpus will be of interest to linguists from the area of phonetics and phonology. For instance, in discussing VOT values for Polish speakers, Waniek-Klimczak (2011) and Wrembel (2011) refer to Keating et al. (1981). One may expect that the corpus will furnish an up-to-date norm/reference point for the Greater Poland variety. Following studies on the evolution of English (Labov 1994; Preston 2003) and German (Piroth & Skupinski 2010), the fully developed corpus will be an excellent resource for research on language variation and change in Polish. Regarding typological research, the extended corpus might prove useful for researchers working on languages other than Polish with a view to including a typologically unrelated language in their research paradigms. The corpus may also appeal to any researcher venturing into discourse analysis and might provide ample material for lexical and sociolinguistic studies. In a similar vein, our findings may be compared with data from speakers with various speech pathologies (Połczyńska & Tobin 2011). We have shown that the corpus can be effectively mined for phonological material (Kaźmierski et al. in press); future work can be extended to other linguistic variables.
Acknowledgements

The first two authors gratefully acknowledge the financial support of the Ministry of Higher Education (grant number: 0113/NPRH2/H11/81/2013). We also would like to express our gratitude to Dr Liliana Madelska for her involvement in the project, to Marek Simon for collecting a subset of the recordings as well as to our transcribers, Dr Karolina Rosiak, Karolina Baranowska and Zofia Wypychowska. Very special thanks go to Dr Kamil Kaźmierski who monitored the technical aspect of the project by compiling a pronunciation dictionary of Polish and furnishing statistical analysis.
Note

1. The corpus construction was financed by the Ministry of Science and Higher Education within the National Program for the Development of Humanities (Narodowy Program Rozwoju Humanistyki; grant number 0113/NPRH2/H11/81/2013).
References

Boersma, P. & D. Weenink. 2014. Praat: Doing phonetics by computer [Computer program]. Version 5.3.11. www.praat.org/.
Dubisz, S., H. Karaś & N. Kolis. 1995. Dialekty i gwary polskie. Warszawa: Wiedza Powszechna.
Dunaj, B. 2006. Zasady poprawnej wymowy polskiej. Język polski 3. 161–172.
Durand, J., B. Laks & C. Lyche. 2002. La phonologie du français contemporain: Usages, variétés et structure. In Romanistische Korpuslinguistik-Korpora und gesprochene Sprache/Romance Corpus Linguistics: Corpora and spoken language, 93–106. Tübingen: Gunter Narr.
Durand, J. & M. Pukli. 2004. How to construct a phonological corpus: PRAAT and the PAC project. Tribune Internationale des Langues Vivantes (TILV) 36. 36–46.
Fromont, R. & J. Hay. 2012. LaBB-CAT: An annotation store. Proceedings of the Australasian Language Technology Association Workshop, Dunedin, New Zealand. 113–117.
Jassem, W. 2003. Illustrations of IPA: Polish. Journal of the International Phonetic Association 33(1). 103–107.
Johnson, K. 2003. Massive reduction in conversational American English. (Proceedings of the Speech: Data and Analysis, Aug., 2002, Tokyo, Japan).
Kaźmierski, K., M. Kul & P. Zydorowicz. In press. Educated Poznań speech 30 years later. In Studia Linguistica Universitatis Iagiellonicae Cracoviensis.
Keating, P., M. Mikoś & W. F. Ganong. 1981. A cross-language study of range of VOT in the perception of stop consonant voicing. Journal of the Acoustical Society of America 70. 1260–1271.
Kul, M. & P. Zydorowicz. Korpus Mowy Wielkopolskiej [Greater Poland spoken corpus]. http://wa.amu.edu.pl/korpuswlkp/ (9 Jan. 2019).
Labov, W. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Labov, W. 1994. Principles of language change. Vol 1: Internal factors. Oxford: Blackwell.
Madelska, L. 2005. Słownik wariantywności fonetycznej współczesnej polszczyzny. Kraków: Collegium Columbinum.
Milroy, L. 1980. Language and social networks. London; Baltimore: Basil Blackwell & University Park Press.
Narodowy Korpus Języka Polskiego. http://nkjp.pl/ (10 Jan. 2019).
PAC (La Phonologie de l’Anglais Contemporain: usages, variétés et structure: The phonology of contemporary English: Usage, varieties and structure). www.projet-pac.net/ (10 Jan. 2019).
Pęzik, P. 2012. Język mówiony w NKJP. In A. Przepiórkowski, M. Bańko, R. L. Górski & B. Lewandowska-Tomaszczyk (eds.), Narodowy Korpus Języka Polskiego, 37–47. Warszawa: Wydawnictwo Naukowe PWN.
PFC (Phonologie du Français Contemporain). www.projet-pfc.net/ (10 Jan. 2019).
Piroth, H. G. & P. M. Janker. 2004. Speaker-dependent differences in voicing and devoicing of German obstruents. Journal of Phonetics 32. 81–109.
Piroth, H. G. & P. Skupinski. 2010. Merging and splitting processes in Mountain Silesian: A comparison to the Standard German vowel system. (Paper presented at Sociophonetics at the Crossroads of Speech Variation, Processing and Communication, Pisa).
Połczyńska, M. & Y. Tobin. 2011. Phonetic processes and their influence on speech intelligibility in dysarthric speech after traumatic brain injury: A longitudinal case study. In L. Wai-Sum & E. Zee (eds.), Proceedings of the 17th international congress of phonetic sciences, 17–21 August 2011, 1622–1625. Hong Kong: City University of Hong Kong.
Preston, D. R. (ed.). 2003. Needed research in American dialects (= Publications of the American Dialect Society 88). Durham, NC: Duke University Press.
Przepiórkowski, A., R. L. Górski, B. Lewandowska-Tomaszczyk & M. Łaziński. 2009. Narodowy Korpus Języka Polskiego. Biuletyn Polskiego Towarzystwa Językoznawczego 65. 47–55.
Waniek-Klimczak, E. 2011. Aspiration and style: A sociophonetic study of the VOT in Polish learners of English. In M. Wrembel, M. Kul & K. Dziubalska-Kołaczyk (eds.), Achievements and perspectives in the SLA of speech: New Sounds 2010, vol. 1, 303–316. Frankfurt am Main: Peter Lang.
Witaszek-Samborska, M. 1985. Regionalizmy fonetyczne w mowie inteligencji poznańskiej. Slavia Occidentalis 42. 91–104.
Witaszek-Samborska, M. 1986. Mowa poznańskiej inteligencji. In M. Gruchmanowa, M. Witaszek-Samborska & M. Żak-Święcicka (eds.), Mowa mieszkańców Poznania, 29–87. Poznań: Wydawnictwo Poznańskie.
Wrembel, M. 2011. Cross-linguistic influence in third language acquisition of Voice Onset Time. (Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, Aug. 17–21, 2157–2160).
Wypych, M. 1999. Implementacja algorytmu transkrypcji fonematycznej. In Speech and Language Technology, vol. 3. Poznań: Polskie Towarzystwo Fonetyczne.
15 Sounds Delicious! John C. Wells
Many of the words for things we eat and drink exemplify points of phonetic or etymological interest. In what follows I discuss some of them. The names used for specific meals in English vary greatly, depending on the speaker’s age, class, and location of origin, the time of day, or the size of the meal. Our first meal of the day is breakfast. Breakfast is an interesting word in respect of sound-to-spelling correspondences. Although its etymology is transparent—the meal in which we break our night-time fast (= abstention from food), nevertheless its pronunciation ˈbrekfəst has two points of interest. In standard pronunciation the first syllable has the short vowel e (DRESS) rather than the diphthong eɪ (FACE) that we use in the verb to break; and the second syllable has a reduced vowel ə compared with the strong BATH vowel that we use in fast as an independent word. Some English people drink tea with their breakfast. Others, including me, drink coffee. Both these beverages have names borrowed from foreign languages: tea from Chinese 茶 chá, coffee via Turkish from Arabic qahwah. Our word tea, as in the case of most European languages, comes not from standard Chinese chá but from tê, the form this word takes in Hokkien, the form of Chinese local to Amoy (now Xiamen). We now pronounce tea as tiː, with the FLEECE vowel, though previously, as shown by rhymes in poetry down to 1782, it had the FACE vowel (and in a few local dialects still does). With coffee, we have recently acquired a number of new loanwords from Italian, among them espresso, cappuccino, and latte. For the first, alongside standard eˈspresəʊ you can also hear ekˈspresəʊ (under the influence of express and the many words we have that start with ex-). The second can give rise to jokes about one ˌkæpəˈtʃiːnəʊ, two cups o’ chino. For the third we have not yet in England quite decided whether to map Italian a onto our short vowel, ˈlæteɪ, or onto our long one, ˈlɑːteɪ. The Americans go for the latter. 
Ever since I came across it when I first visited the United States, I have liked to start my breakfast with a glass of orange juice ˈɒrɪndʒ dʒuːs. Here, too, there are two points of phonetic interest. One is the double affricate.
Unlike plosives, which when doubled (geminated) are pronounced as a single articulation with a lengthened hold phase, doubled affricates in English are pronounced as two successive complete articulations. The other is the etymology of the word orange. This word seems to have first reached Europe as the Arabic nāranj from Sanskrit, possibly from Tamil or some other Dravidian language. The earliest uses of the word in English refer to the fruit, and the colour orange was later named after the fruit. We took the word from Old French, where the initial n- seems to have been reinterpreted as part of the indefinite article, thus une norange taken as une orange, or in English a norange taken as an orange. This same process, known as metanalysis, gave us an adder (a kind of snake) from earlier a nadder and an apron from earlier a napron. What else do you have for breakfast? Some people like porridge ˈpɒrɪdʒ, particularly when the weather is cold. It is made of oatmeal boiled in water, and we usually eat it with milk and sugar. (The Scots are supposed to prefer it with salt, and the Americans just call it oatmeal.) The origin of this word is phonetically interesting. It started out in English as pottage, borrowed around 1300 as pɒˈtɑːʒ or ˈpɒtɑːdʒ from the French potage meaning ‘soup, food cooked in a pot’. Somehow the t between the vowels came to be reinterpreted as r—perhaps by the same mechanism by which modern AmE Betty, with its tapped t ˈbeɾi, can sound to some people like berry. Spellings with r rather than t are attested from the middle of the 16th century. Pottage itself, with t, remains in the expression pease pottage ˈpɒtɪdʒ, an alternative name for pease pudding, a thick paste made of dried peas. The peas we eat as a vegetable have an interesting etymology, too: the word pea is a back-formation from pease piːz, which has its origin in Latin pisum and Greek πίσος. The z was reinterpreted as the plural ending, and a new singular pea piː inferred.
The same thing happened in cherry, which originally had a final sibilant, as seen in Latin cerasus, French cerise, German Kirsche, etc.; but this was then reinterpreted as a plural ending. Rather than porridge for breakfast, some people prefer muesli ˈmjuːzli. As you might guess from the non-English appearance of its spelling, this word too is a borrowing from another language—in this case quite a recent borrowing from Swiss German. It was invented around 1900 by the Swiss physician Maximilian Bircher-Benner for patients in his hospital. In German it is spelt Müsli and pronounced myːsli. English does not have the close front rounded vowel y, so we have done what we usually do with such borrowings, remapping the vowel as juː (or before r as jʊə), keeping the closeness and roundedness as uː (ʊə) and recasting the front (palatal) element as the semivowel j. (Compare Führer ˈfjʊərə, Zürich ˈzjʊərɪk, etc.) Another cereal people eat for breakfast is corn flakes—corn in the American sense of ‘maize’ rather than the British sense, usually ‘wheat’ or ‘oats’. In ˈkɔːn fleɪks there is the possibility of dealveolar assimilation
so that the n of corn becomes labiodental, giving ˈkɔːɱ fleɪks. The word cereal, meaning any kind of grain that we eat, is a homophone of serial ˈsɪəriəl, which allows us to make puns such as “What do you call a monster who poisons corn flakes?”—“A cereal killer”. A healthy addition to breakfast is fruit. In Britain we have always had our traditional apples, pears, plums, and various berries. The word apple ˈæpl̩ is a good example of one sound change currently under way in the south of England, namely l-vocalization, which turns it into ˈæpo. Among our native currants, blackcurrants supply a good example of a geminated plosive, ˌblækˈkʌrənts, while redcurrants ˌred ˈkʌrənts exemplify possible dealveolar assimilation into ˌreɡ ˈkʌrənts. Talking about both types in the same sentence pushes us into the use of contrastive stress, ˈblækˌkʌrənts and ˈredˌkʌrənts. In summer and autumn we can pick strawberries, raspberries, gooseberries and blackberries to add to our breakfast cereal or to eat in pies and other desserts. Americans usually pronounce the second part of the names of these fruit with a strong vowel, thus ˈstrɔːˌberi, ˈræzˌberi, ˈblækˌberi. But the British normally weaken the vowel or elide it altogether, thus ˈstrɔːb(ə)ri, ˈrɑːzb(ə)ri, ˈblækb(ə)ri. You will notice that for raspberry and gooseberry the pronunciation of the first element does not usually quite correspond to the spelling. The letter p in raspberry is silent, while in both words there may be an irregular assimilation of voicing, so that we can have z rather than s. The rasp- of raspberry has the BATH vowel, so we have BrE RP ˈrɑːz- but AmE ˈræz-. In BrE, but not in AmE, gooseberry can have a short vowel (FOOT) in the first element, giving BrE ˈɡʊzbri alongside AmE ˈɡuːsˌberi. In informal or jocular style, we sometimes call strawberries strawbs strɔːbz (hence the name of the 1960s rock group) or even strawbugs ˈstrɔːbʌɡz.
In moorland areas we may be able to find wild bilberries (in various parts of Britain also known as whortleberries or blaeberries), while supermarkets import the similar cultivated blueberries from all over the world and throughout the year. Nowadays, as well as the familiar oranges (already discussed) and bananas bəˈnɑːnəz (AmE bəˈnænəz), we also import a wide range of exotic fruit. As well as importing the fruit themselves from foreign countries, we also import their names from foreign languages. My local supermarket sometimes has kumquats for sale. I like to slice these tiny citrus fruits and add them to salads, though I do get rid of the pips they contain. I pronounce kumquat (there is an alternative spelling cumquat) as ˈkʌmkwɒt, and I think most speakers of English do likewise. The word comes to us from Cantonese, where it is pronounced k˭ɐm k˭wɐt, with tone 1 (high level) on each syllable. (In Mandarin it would be jīnjú.) Cantonese ɐ sounds like English ʌ, so it makes sense to map it accordingly, giving the English word a first syllable kʌm. The second syllable
has a phonetic shape familiar from words such as squat skwɒt and quad kwɒd. Oranges are just one kind of citrus fruit. There are many others, some with quite interesting names. At Christmas time we often eat satsumas. The name satsuma sætˈsuːmə comes of course from Japanese. In English we regard the two parts of the affricate ts as belonging in separate syllables—unlike in Japanese, where ts is just the form that t takes before the close back vowel. Satsumas are so called in English because the first of these fruit to reach the West were exported from Satsuma province in Japan. Mandarin oranges, also known just as mandarins, share the name mandarin ˈmændərɪn not only with the standard form of the Chinese language but also with the Chinese officials who speak or spoke it and the yellow robes the dignitaries wore—yellow like the colour of the fruit. Some people like yogurt with their cereal, too. In England we pronounce this word with the LOT vowel in the first syllable, thus ˈjɒɡət, though Americans and Australians use the GOAT vowel. The word can furthermore be spelt with or without an h after the g. Either way, the word comes from the Turkish yoğurt joˈuɾt. Yogurt is often sold with fruit flavouring or with fruit added; my local supermarket now offers yogurt with yuzu ˈjuːzuː, which is a loanword from Japanese (ユズ). A full English breakfast, a cooked breakfast, includes fried bacon and eggs or ham and eggs, perhaps with sausages, mushrooms, fried tomatoes, and sautéed potatoes or hash browns. The word bacon ˈbeɪkən came into English via medieval French from an Old High German word, even though modern French and German now use quite different names for it. Phonetically, it is susceptible to possible syllabic consonant formation and progressive assimilation, yielding possible pronunciations ˈbeɪkn̩ and ˈbeɪkŋ̩.
The word sausage ˈsɒsɪdʒ is one of the few words in which the spelling au corresponds to the LOT vowel rather than the more usual THOUGHT or (in loanwords) MOUTH. To sauté ˈsəʊteɪ food is to fry it in a particular way; the word is a loanword from French, where it means ‘jumped’. Mushroom is another word that came to us from French; but English evidently reshaped Norman French muserun as if it were a compound with the second element room. Like the ordinary word room, mushroom can be pronounced either with short ʊ (FOOT) or with long uː (GOOSE). And, as everyone knows, tomato is one of the words in which the British and American pronunciations differ in an unpredictable way: they say təˈmeɪɾoʊ, we say təˈmɑːtəʊ. (I notice that the Japanese form トマト tomato doesn’t quite correspond to either of them, but is based on the spelling.) The final part of a traditional English breakfast is toast təʊst and marmalade ˈmɑːməleɪd. What can happen to the pronunciation of and in this context? We will of course use its weak form, which is basically ən or (less commonly) ənd. But the sequence ən, like other sequences of schwa plus a sonorant, is again a candidate for syllabic consonant formation. So toast
and is likely to become təʊst n̩. But that leaves the final st of toast susceptible to cluster reduction by loss of the t, yielding təʊs n̩. Furthermore, the nasal is followed by a bilabial at the start of marmalade, which makes it susceptible to dealveolar assimilation. So we are very likely to pronounce the phrase as ˌtəʊs m̩ ˈmɑːməleɪd. For British people the special day for eating pancakes is Shrove Tuesday, but Americans often eat them with syrup for breakfast on any day. Although the etymology of pancake is obvious—it’s a sort of cake, cooked in a pan—we don’t necessarily think of it as a compound. This may be helped by the fact that in BrE, at least, it is almost always pronounced with regressive dealveolar assimilation of the nasal, thus ˈpæŋkeɪk. But it has nothing to do with pangs of hunger. Most of us like a mid-morning coffee or tea break, and if it includes a bite to eat we may call it elevenses ɪˈlevənzɪz, an informal word deriving from the fact that it takes place at around 11:00 a.m. The OED dates it to 1887, but does not indicate how the double plural originated. (The only other cases of double plurals in English seem to be historically double plural endings in children and brethren and the modern addition of an English plural ending to already plural loanwords, as in paninis. There is also a dialectal fourses for ‘tea-time’.) In the middle of the day we typically have a light meal usually known as lunch. In some regional or social varieties of English it may alternatively be called dinner or indeed luncheon. As in other words with a similar spelling, the plosive element of the affricate implied by the spelling ch in lunch is omitted by some speakers, so that the pronunciation may be either lʌntʃ or just lʌnʃ. Many people nowadays just buy a sandwich for their lunch, perhaps with a soft drink and a packet of crisps (= AmE chips).
We nowadays have chains of shops specializing in the supply of freshly made sandwiches, including one chain confusingly called Subway and another going by the name Pret a Manger ˌpret ə ˈmɒnʒeɪ, -ˈmɑːn-, -dʒeɪ. This name is more or less the French for ‘ready to eat’ (in French spelling it would require two diacritics, thus prêt à manger; but we ignore them in English). The word sandwich is phonetically interesting for two reasons. The first syllable may be pronounced as written, ˈsænd-; but more frequently we see the d elided, leaving just ˈsæn-. There may also be dealveolar assimilation triggered by the following w, giving ˈsæm(b)-; and in regional speech we also sometimes get assimilation to the velar rather than to the labial component of w, giving ˈsæŋ(ɡ)-. For the second syllable, about half of British speakers use a voiced affricate at the end, -wɪdʒ, rather than the voiceless one, -wɪtʃ, that corresponds to the spelling. We see the same hesitation in many other place-names ending in -wich, e.g., Greenwich, Woolwich, Harwich, and Norwich, and also in the name of the vegetable spinach. It has even been claimed that some people use tʃ in the singular sandwich but dʒ in the plural sandwiches.
A sandwich basically consists of two slices of buttered bread with a filling. What are our favourite fillings? Popular in Britain are prawn, egg and cress, and tuna and sweetcorn. The second of these can have the weak form of and ən reduced to syllabic n̩, very likely in this context to be assimilated to the velar position, ŋ̩. The last of them exemplifies the phenomenon of intrusive r, since we usually pronounce it ˌtjuːnər ən ˈswiːtkɔːn. There is also the likelihood here of yod coalescence, giving ˈtʃuːn-, or—particularly in AmE—yod dropping, giving ˌtuːn-, while the t of sweet- can be glottal (ˈswiːʔ-) or assimilated (ˈswiːk-). My own favourite sandwich is one filled with smoked salmon and cream cheese. Being immediately followed by a consonant in the next word, the final consonant in the first word of the phrase smoked salmon ˌsməʊkt ˈsæmən can undergo elision, thus ˌsməʊk ˈsæmən. In salmon there is no phonetic l, and never has been in English; the letter l in the spelling is etymological, from Latin salmōn-. Lunch is also a meal for which the English-speaking world has started to borrow Japanese food items and their Japanese names along with them. You will now find sushi ˈsʊʃi or ˈsuːʃi on widespread sale in London and New York; and in London, at least, you will also find shops offering you a bento ˈbentəʊ. We are gradually learning other Japanese food words, too—nigiri, miso, wasabi, sashimi, nori. And we have heard of sake. Thus Japanese renews its place in the long list of languages from which English has borrowed. For food and drink that means particularly French, Italian, plus more recently in Britain the languages of India and China, and in America Spanish. In the United States the Japanese word daikon has been borrowed to refer to the long white radish; but in Britain our supermarkets call that vegetable mooli, a word of Hindi-Urdu origin. Around four o’clock in the afternoon, it is tea-time.
If we do have a tea break, it may comprise anything from a cup of tea with a biscuit to a formal afternoon tea such as you may remember from The Importance of Being Earnest—complete with cucumber sandwiches, scones and cake. The word cucumber ˈkjuːkʌmbə is phonetically interesting in that it has an unstressed ʌ rather than ə in its second syllable. Scone is a word over whose pronunciation we British are divided: some say skɒn, rhyming with on, while others say skəʊn, rhyming with stone. In the preference poll I conducted for LPD, people voted 65% for the first, 35% for the second. Then there’s cake, which is grammatically interesting, since this noun can be large and uncountable (“Would you like a slice of cake?”) or smaller and countable (“Have another cake!”). We can also note that some people call their evening meal tea. Hence the British can not only drink tea, but also eat it. In fact, there is considerable geographical and social variation in the use and meaning of the terms tea, dinner, and supper. For most of us, however, the main meal of the day is the evening meal, and we call it dinner.
Dinner often consists of several courses: a first course, which may be known in BrE as a starter and in AmE as an appetizer; a main course, sometimes called the entrée ˈɒntreɪ (though this word may also refer to a course between the first course and the main course—in French it is ɑ̃tʁe and merely means ‘entry’, the main course being the plat de résistance); and after that a sweet course, known in BrE as pudding, afters, sweet, or dessert, but in AmE usually just as dessert. Sometimes people finish with a cheese course rather than a sweet course. For the starter or appetizer, we often have soup. Other possibilities include dishes such as prawn cocktail (in AmE prawns are called shrimp), bruschetta, or hors d’oeuvres. The first mentioned, prawn cocktail ˌprɔːn ˈkɒkteɪl, is a candidate for dealveolar assimilation, making its first word prɔːŋ. Bruschetta is an Italian word, in that language pronounced bɾuˈsketːa, and so ought to be called brʊˈsketə in English; but many people get misled by the sch in the spelling, which they pronounce ʃ as if it were German, giving brʊˈʃetə. Hors d’oeuvres is the French for ‘out of works’, i.e., leftovers from other dishes. In French it is ɔʁ dœvʁ, but in English we say ɔːˈdɜːv. Of course, the pronunciation of our many food words from foreign languages involves adapting them to fit English phonetics. Rather than soup, or before or alongside their main course, people often eat salad. In studying the phonetics of different varieties of English I like to compare the second vowel of salad with that of valid. In my own accent these weak vowels are different—ˈsæləd, ˈvælɪd—so that the two words do not rhyme. But for many Americans, Australians, and others, the two words rhyme perfectly—ˈsæləd, ˈvæləd. These speakers have a smaller weak vowel system than I do. There is a joke: “What is a honeymoon salad?” “Lettuce alone!”. It depends on pronouncing lettuce the same as let us (with the weak form of us), ˈletəs.
So the joke doesn’t really work for those who, like me, pronounce lettuce as ˈletɪs. A possible salad component in Britain is the root vegetable that we in Britain call beetroot, and the Americans call just beet. The pronunciation of beetroot ˈbiːtruːt is noteworthy in that despite its obviously being a compound we pronounce it with an affricate tr in the middle, as in mattress ˈmætrəs, rather than recognizing the morpheme boundary as we would in other obvious compounds such as outright ˈaʊt.raɪt and therefore potentially making the t at the end of the first syllable either glottal or at least not audibly released. Returning to the matter of soup, we can note that many varieties of soup have names borrowed from foreign languages. Certainly bisque biːsk (a smooth creamy seafood soup) is from French, and the similar North American chowder ˈtʃaʊdə may also be from Canadian French. However, the curry-flavoured mulligatawny ˌmʌliɡəˈtɔːni soup and its name came to Britain from south India (Tamil miɭaku-taɳɳi). There are
also two cold soups you may come across in warm weather: vichyssoise ˌviːʃɪˈswɑːz from France and gazpacho ɡæzˈpætʃəʊ from Spain. The Vietnamese noodle soup pho fəʊ comes from much further away, though apparently its name (Vietnamese phở) may ultimately be traceable to the French word feu fø meaning ‘fire’. Except for people who choose a vegetarian option, the main course involves fish or meat. It is striking that the English names for most kinds of meat are of French origin, unlike the names of the animals that give us the meat, which come straight from Anglo-Saxon. So we eat beef from cows, mutton from sheep, and pork from pigs. The reason for this is historical: for a couple of centuries after the Norman Conquest in 1066, the landowners, who ate the meat, spoke French (in its regional Norman version), while the peasants, who raised the animals for their lords’ tables, spoke Anglo-Saxon. Hence meat from the cow is beef biːf (French boeuf bœf—though the cow itself in French is vache vaʃ), meat from the sheep is mutton ˈmʌtn̩ (French mouton mutɔ̃—though nowadays we normally call it lamb læm), and meat from the pig is pork pɔːk (French porc pɔʁ—though nowadays the usual French word for pig is cochon kɔʃɔ̃). Meat from a calf is likewise veal viːl (French veau vo, from Latin vitell-). Less commonly eaten nowadays is meat from the deer, venison ˈvenɪsən, from a French word meaning ‘hunting’. Strangely enough, we do not have this sort of distinction in the case of meat from the hen, for which we use the Germanic word chicken rather than something based on the French word, poule pul—though we do use the term pullet ˈpʊlɪt for a young hen. Likewise, for the eggs hens lay we still use the Germanic word, egg eɡ, rather than something based on the French oeuf œf. These lexical distinctions remained even after the landowning class switched to speaking English. We generally eat meat and fish not raw but cooked. We can stew it, grill (AmE broil) it, fry it, or bake it.
The word stew is phonetically interesting, since it exemplifies not only a typical British/American difference in pronunciation (RP stjuː, GenAm stuː), but also two sound changes currently in progress in England. The first is coalescent assimilation, by which t plus j coalesce into the affricate tʃ, thus stʃuː. The second is s-affricate assimilation, by which s changes to ʃ before a following affricate (tr or tʃ), giving ʃtʃuː. Grilling slowly over fire, usually outdoors, is barbecuing. Barbecue ˈbɑːbɪkjuː has interesting alternative spellings such as bar-b-q and BBQ, based on the name of the letter Q kjuː. The word is believed to derive from the language of the now extinct Taino people of the West Indies, and came to us via American Spanish. There are various ‘cuts’ of meat available. A joint of meat will usually be carved into thin boneless ‘slices’ to be served to a number of people. Pork, veal, and lamb may be served as cutlets. A boneless slice of meat coated in breadcrumbs and fried is an escalope ˌeskəˈlɒp or schnitzel
ˈʃnɪtsəl. These last two words are French and German respectively. Schnitzel is phonetically interesting in that it mildly violates the phonotactics of English: we do not find ʃn at the beginning of native words, and the only word with medial ts and no morpheme boundary between the two consonants is pizza ˈpiːtsə, a 20th-century borrowing from Italian via American English. One way to make schnitzel sound more English is seen in the mispronunciation ˈsnɪtʃəl sometimes encountered, with the metathesis of s and ʃ. If beef is minced (AmE: ground) into tiny pieces, we British call the result minced beef or simply mince mɪns. The Americans call it ground beef or hamburger. Both we and the Americans use the name hamburger for the cooked beef in a bun that you might buy at McDonald’s or Burger King. Like other words with ns after a stressed vowel, mince is pronounced by many people with an epenthetic consonant t between the nasal and the fricative, thus mɪnts. Compare fence fen(t)s, once wʌn(t)s, pencil ˈpen(t)səl, etc. A standard accompaniment to meat is “two veg”, where veg vedʒ is the abbreviation of vegetables. Typically, one of them is potatoes, while the other might, for example, be carrots or a green vegetable such as peas, broccoli, or runner beans. Meat and two veg are usually accompanied by gravy. Gravy ˈɡreɪvi is a word of interesting etymology. Apparently, the letter v here, and the corresponding pronunciation with v, result from a misreading of an n in an Old French word, grané. In medieval manuscripts the modern v was written u, which was often difficult to distinguish from n. Potatoes can be boiled, fried, baked, or roasted. Each of these adjectives ends in a consonant susceptible to being elided or assimilated when immediately followed by the word potatoes. An alternative name for mashed potatoes ˌmæʃ(t) pəˈteɪtəʊz is simply mash. (We have an everyday dish called “bangers and mash”.) 
Strangely, before a noun the verbal past participle roasted tends to be avoided in favour of the synonymous adjective roast, so that one may have the impression here too of the elision of the -ed ɪd suffix: roast potatoes ˌrəʊs pəˈteɪtəʊz. Similarly, fried potatoes are also alternatively known as fries. One method of frying is known by the French-derived term to sauté: the result is sauté(ed) ˈsəʊteɪ(d) potatoes. Literally, this means that the cook makes them jump. As usual, the French vowel o of sauté sote is mapped onto the English GOAT vowel. The word broccoli ˈbrɒkəli is a borrowing from Italian. Like spaghetti spəˈɡeti, in Italian it is a plural. The corresponding singular forms in Italian are broccolo and spaghetto, and mean ‘cabbage sprout’ and ‘little string’ respectively. Broccoli is a form of cauliflower ˈkɒliˌflaʊə, another interesting word. English probably borrowed or adapted it from the Italian cavoli fiori ‘cabbage flowers’. The first part is related to the first half of coleslaw ˈkəʊlslɔː, the salad made from cabbage: this word consists
of cole, an earlier English word for ‘cabbage’, plus slaw, from Dutch sla ‘salad’. Rather than potatoes, the starchy part of the main course might be some form of pasta—perhaps indeed spaghetti, or perhaps noodles, macaroni, gnocchi, or one of the many other varieties we have adopted from Italian cuisine. But names of pasta are not necessarily all Italian—noodle ˈnuːdl̩ is from German. The first vowel in pasta is different from that in past, though in opposite directions in British and American pronunciation. In RP we have ˈpæstə and pɑːst, in GA ˈpɑːstə and pæst. Gnocchi is (or are—again, in Italian this word is plural) indeed of Italian origin, and the word presents us with a difficulty in adapting it to English pronunciation habits. Its initial consonant in Italian is the palatal nasal ɲ, a consonant we do not have in English. When we borrow a foreign word that includes this sound, we usually map it onto nj if it is between two vowels, as when we turn Spanish cañon ˈkaɲon into ˈkænjən, or onto plain n at the end of a word, perhaps with a vowel change to capture the palatal element, as when we turn French champagne ʃɑ̃paɲ into ˌʃæmˈpeɪn and the port city of Boulogne bulɔɲ into buˈlɔɪn. For ɲ at the beginning of a word, as in gnocchi (Italian ˈɲɔkki) we have not yet decided which solution to choose: in English you can hear both ˈnjɒki and ˈnɒki. (For the spelling gn- at the beginning of native words, of course, we pronounce just n, as in gnome and gnaw.) We can now end our meal with the last course, namely dessert or often, in BrE, pudding. The standard pronunciation of the latter term is of course ˈpʊdɪŋ. But as with other -ing forms there exists a lower-prestige alternative form with an alveolar nasal, thus ˈpʊdn̩. For this in Britain we might well choose apple crumble ˌæpl̩ ˈkrʌmbl̩ or (with London-style l-vocalization) ˌæpo ˈkrʌmbo. We might accompany it with custard or ice cream.
The word custard has an interesting etymology: it seems to derive from a French word croustade, something crusty. The compound noun ice cream is one of those compounds in which the stress pattern is variable: in my LPD preference poll, 66% voted for putting the main stress on cream, 34% for putting it on ice. With our meal we need a drink. Some prefer beer, some wine. The word beer is an Anglo-Saxon borrowing from Late Latin, related to Latin bibere ‘to drink’. Its near-synonym ale is straight Germanic. Wine, on the other hand, goes back to Latin vinum. Or we could just drink water. The typical pronunciation of water in BrE, ˈwɔːtə, is so different from its typical pronunciation in AmE, ˈwɑːɾɚ, that it frequently causes a moment of misunderstanding when Brits and Americans encounter one another. Before or after our meal we may drink spirits. Among popular spirituous drinks, sometimes diluted with tonic or another mixer, or combined with other liquids in a cocktail, are gin dʒɪn (from French from
Latin juniperus), brandy ˈbrændi (from Dutch brandewijn ‘burnt (distilled) wine’), whisk(e)y ˈwɪski (from Gaelic uisgebeatha, literally ‘water of life’), and vodka ˈvɒdkə (from Russian водка ˈvotkə, diminutive of вода ‘water’). These diverse origins nicely illustrate the mosaic that is the etymology of English words.
References
The nature of this article renders exhaustive listing of sources unnecessary. Pronunciations may be checked in LPD or other dictionaries. Etymologies may be checked in the online OED; many are also discussed in Ayto 1990.
Ayto, J. 1990 [2nd edn 2005]. Word Origins. London: A & C Black.
LPD = Wells, J. C. 2008. Longman Pronunciation Dictionary. Harlow: Pearson Education.
OED = Oxford English Dictionary. 1st edn. 1884–1928, 2nd edn. 1989, 3rd edn. in preparation; available online. Oxford University Press.
Part 3
Reality Check Empirical Approaches
16 The Involvement of the Cerebellum in Speech and Non-Speech Motor Timing Tasks
A Behavioural Study of Patients With Cerebellar Dysfunctions¹
Marzena Żygis, Zofia Malisz, Marek Jaskuła and Ireneusz Kojder
1. Introduction
The cerebellum, the second largest part of the brain, is located near its base. The cerebellum is engaged in the execution of motor actions responsible for maintaining balance, limb and eye co-ordination as well as speech production. It also participates in higher cognitive functions such as attention and preparation for many mental and motor processes (Ackermann 2008; Ackermann & Brendel 2016). Additionally, the observation of lexico-semantic and syntactic disorders in cases of cerebellar lesions suggests the involvement of ‘the small brain’ in language, understood as a cognitive skill and representation (Mariën et al. 2014). The cerebellum is also clearly relevant for temporal processing pertaining to both production and perception of events in real time. It is important to note, however, that the cerebellum is not the sole centre responsible for temporal processing. In this function, it participates in a distributed network that involves other subcortical networks such as the basal ganglia and thalamus, as well as the supplementary motor area and other areas of the cortex (Coull et al. 2011). The role of the cerebellum in this network is considered to be either a compensation mechanism supporting main routes of temporal processing or a contextually dependent mechanism where the cerebellum is recruited depending on a specific timing task (Breska & Ivry 2016). In particular, the cerebellum’s involvement in prediction, preparation and attention implies that it is relevant for the processing of temporal information in perception, including speech perception (Schwartze & Kotz 2016).
There is also general agreement that the cerebellum specialises in the precise representation of temporal events in the sub-second range (Spencer & Ivry 2013; Breska & Ivry 2016; Schwartze & Kotz 2016) pertaining to motor timing, including speech production timing. For instance, it has been shown that patients with cerebellar degeneration are unable to detect differences between words which differ exclusively in the temporal structure
of the acoustic signal (Ackermann et al. 2007). It has also been shown that the pronunciation of patients with cerebellar dysfunction is characterised by equalised and prolonged syllable duration, called ‘scanning speech’, a slower speaking rate and other symptoms of ataxic dysarthria (Ackermann et al. 2007; Ackermann & Brendel 2016). Regarding more detailed behavioural results of cerebellar involvement in timing tasks, it is evident that the ‘small brain’ takes part in interval duration estimation where precise, time-isolated intervals are considered, and no other temporal context is present (Breska & Ivry 2016). For example, when a single interval is presented, the judgement of the interval duration is impaired in cerebellar patients, suggesting that the cerebellum is involved. However, when the same interval is presented in a stream of isochronous, or otherwise rhythmic stimuli, the judgement is correct, implying that cerebellar structures are not recruited for this particular task. Similarly to the rhythmic context influence, when timing performance depends on emergent dynamics of the signal or activity, the cerebellum does not participate, as Breska and Ivry (2016) argue in their overview of timing tasks in several neuropsychological, neurophysiological and imaging studies. This is the case with, for example, tasks that involve continuous periodic (cyclic) movements (drawing circles) or other emergent activities based on oscillatory entrainment. It appears, therefore, that the cerebellum is critical for tasks in which discrete event timing is involved—but not where dynamic, cyclic context is present.
2. The Present Study The present study focuses on characterising timing constraints in relation to cerebellar lesions via several behavioural timing tasks. According to the taxonomy of timing tasks in Coull and Nobre (1998), the tasks analysed in the present work are explicit motor timing tasks. In our explicit motor tasks, the participants either a) tap or speak to repeat a pattern heard immediately before or b) synchronise their speech production with a periodic signal. Many papers have investigated the behaviour of patients with cerebellar lesions via explicit timing tasks, particularly finger tapping experiments. These studies showed that patients have difficulty in replicating duration relations between beats they listen to (cf. e.g., Ivry & Keele 1989), evidencing cerebellar involvement. Additionally, these types of tasks are traditionally considered to gauge dynamic timing, that is, to involve mechanisms that process rhythmic or continuous patterns. If that is the case, the tapping task, as used in the classic Ivry and Keele (1989) study and now in ours, would be the only example of a dynamic task that is not consistent with Breska and Ivry’s proposal (2016) as mentioned above. That is to say, it is solely the discrete timing tasks, where e.g., discrete intervals are judged or motor performance is
based on the concatenation of discrete events, that require the recruitment of the cerebellum, while dynamic tasks do not usually involve this structure. To explain the discrepancy between the supposed timing process and cerebellar involvement in repetitive tapping tasks, Breska and Ivry (2016) propose that the tapping task could in fact be a discrete rather than a dynamic explicit motor task. They suggest that repetitive tapping is correlated with mechanisms of discrete duration estimation (interval-based timing) and error correction rather than with dynamic timing processing. With this background, we address the question of the extent to which lesions of the cerebellum have an impact on the timing of differently patterned sequences involving repetitive tapping. Additionally, we look at not only the ability of patients to repeat non-speech sequences, but also at repeated syllables arranged in prosodic feet, such as trochees, iambs, dactyls or anapests.
3. The Experiment²
In our experiment we hypothesise that patients with cerebellar dysfunctions encounter more difficulties in reproducing the rhythm patterns than the control group. The patterns are based on speech stimuli (syllables) and non-speech stimuli (taps), arranged in prosodic feet.
3.1. Speech and Non-Speech Material
We gathered data from cerebellar patients, who were asked to complete the following tasks:
Task 1: spontaneous speaking: question answering and picture description;
Task 2: reading aloud the Polish version of “The North Wind and the Sun” (see Appendix);
Task 3: repeating rhythmic structures consisting of two or three feet by pen tapping (henceforth: the tapping task);
Task 4: repeating syllables grouped into two or three prosodic feet (henceforth: the syllable repetition task);
Task 5: synchronising CV syllables with metronome beats at different rates;
Task 6: synchronising five different short sentences with the metronome beats at different rates.
In Task 3, participants were asked to listen to short rhythm sequences tapped by the experimenter (the first author) with a pen on the table. The participants were asked to tap the following sequences: XxXx, xXxX, XxXxXx, xXxXxX, XxxX, xXXx, XxxXxx, xxXxxX, where X indicates an acoustically more prominent tap than x. The participants tapped
their repetitions immediately after each pattern was performed by the experimenter. There were three trials including this set of patterns, interleaved with other tasks in the session.
In Task 4, the participants were asked to listen to sequences of the syllable /pa/ pronounced by the experimenter and to repeat them immediately after each pattern was given. The sequences were arranged into feet and showed a similar pattern to those performed in Task 3: PApaPApa (trochaic short), paPApaPA (iambic short), PApaPApaPApa (trochaic long), paPApaPApaPA (iambic long), PApapaPApapa (dactyl), papaPApapaPA (anapest), but also papapaPA (equal-iambic) and PApapapapaPA (trochaic-equal-iambic). Prominence was realised not only by timing but also by spectral properties such as fundamental frequency and intensity.
In Task 5, the participants were asked to listen to a metronome played at different rates (40, 52, 64, 76, 88, 100, 112, 124, 136, 148) and to synchronise the syllables /ta/ and then /ʃa/ with the beat. Before starting to synchronise, the patients listened to a few beats of the metronome at the pace to be tested. There were ca. 20–25 repetitions of each syllable at a given rate.
In Task 6, the participants listened to the metronome and tried to synchronise a sentence with the metronome rate.
We also recorded healthy informants who were asked to complete Tasks 3–6, as a control group. In the present paper, we will share some observations concerning Tasks 1 and 2, and then focus on Tasks 3 and 4 by analysing patients’ performance compared with that of the control group.
3.2. Speakers
Eight patients with cerebellar dysfunction (six males), aged 25–63 and all native speakers of Polish, took part in the experiment. All participants signed permission for researchers to use their data for scientific purposes.
The dysfunction concerned different locations in the cerebellum: F2 (female): right-sided epidermoid cyst; F1 (female): meningioma of the tentorium; M1 (male): right-sided and pressure on medulla oblongata; M2 (male): left-sided damage; M3 (male): right-sided cavernoma; M4 (male): left-sided astro-pilocytic tumour; M5 (male): Dandy Walker cyst, with left-sided prevalence; M6 (male): both-sided lacunas of the cerebellum. The participants were also asked whether they had a musical background, i.e., if they play or used to play an instrument or sing songs. Four (F1, M2, M3, M4) out of eight patients declared themselves to be musical in this sense. As a control group, we recorded ten healthy native speakers of Polish, students at the West Pomeranian University of Technology, Szczecin. For the purpose of the present study, we analysed the data of four females (F1, F2, F3, F4) and four males (M1, M2, M3, M4) in this group.
3.3. Recordings
Each session took about 40–60 min. Due to patient fatigue, a few sessions were shortened, mainly at the cost of task repetition. Patient recordings took place in a room at the Pomeranian Medical University Hospital in Szczecin (Poland) using a Linear PCM Recorder TASCAM DR-05. The control group was recorded in a sound-proof room at the Electrical Engineering Department of the West Pomeranian University of Technology in Szczecin using a TLM103 microphone connected to a ProTools system with a Digi 003 interface (sampling rate 44100 Hz).
3.4. Analysis
We evaluated the tapping task by annotating (in PRAAT, Boersma & Weenink 2018) and counting the number of taps performed by the experimenter and the participant in both the patient and the control groups. The observed (dis)agreement was logged. The evaluation of the syllable repetition task involved two annotators. Each of them listened to a) the syllable patterns performed by the experimenter, and b) the patient and control group responses, which consisted of immediate repetitions of each pattern given by the experimenter. The annotators labelled each repetition in both a) and b) by assigning either x (weak prominence) or X (strong prominence) to each syllable as they heard it. They also counted the number of syllables in the original and repeated pattern. Finally, they decided whether the repeated pattern reproduced the original pattern and whether the number of syllables in both patterns was the same. If the prominence pattern was correct but the number of syllables differed, the pattern was interpreted as incorrect.
3.5. Statistical Analysis
Statistical calculations were performed in R Studio (R Studio Team 2018, version 1.1.453). We investigated the agreement of the number of taps between the experimenter and the participant (Task 3) and the agreement of prosodic syllable patterns as produced by the experimenter and participants (Task 4).
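The decision rules described in Section 3.4 can be summarised in a short sketch. It is an illustration only and not the authors' implementation: the names `Judgement` and `tap_agreement` are hypothetical, and the prominence decision is passed in as an input because in the study it was made by ear by the annotators.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One annotated trial of the syllable repetition task (hypothetical structure)."""
    original: str        # labels for the experimenter's pattern, e.g. "XxxxxX"
    repeated: str        # labels for the participant's repetition, e.g. "XxxxX"
    prominence_ok: bool  # annotator's decision: was the prominence pattern reproduced?

    @property
    def same_count(self) -> bool:
        # Each label (X or x) stands for one syllable.
        return len(self.original) == len(self.repeated)

    @property
    def correct(self) -> bool:
        # A repetition counts as correct only if both criteria are met.
        return self.prominence_ok and self.same_count


def tap_agreement(experimenter_taps: int, participant_taps: int) -> bool:
    """Tapping task: agreement is based on the number of taps alone."""
    return experimenter_taps == participant_taps


# Prominence judged correct, but one syllable is missing: scored as incorrect.
print(Judgement("XxxxxX", "XxxxX", prominence_ok=True).correct)
# Same number of syllables and prominence reproduced: scored as correct.
print(Judgement("XxXx", "XxXx", prominence_ok=True).correct)
```

Keeping the two criteria as separate properties makes it possible to report later whether an error was due to prominence or to syllable count.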
First, we used chi-square tests to compare the agreements between different patterns separately for patients and the control group. Second, by using binomial mixed-effect models, we studied the effect of Pattern [trochaic, iambic, etc.], Length of the pattern [short = 4 syllables, long = 6 syllables], Participant Group [patients, control] and Sex [female, male] on the agreement of the number of repeated taps (section 3.6.1) and syllables (section 3.6.2). Speaker was entered as a random factor. The models were compared by means of ANOVA and the
best-fit model was selected as final. Finally, we also corrected for multiple comparisons by using the Tukey test.
3.6. Results
In Task 1 (spontaneous, continuous monologues), sporadic slips of the tongue, repetitions of words and single instances of vowel prolongation were observed. The pauses between sentences were not exceptionally long. Impressionistically, no perceivable differences in speech between patients could be established. Similarly, in Task 2, reading aloud the fable “The North Wind and the Sun”, no speaking problems could be established in the patient group, with one exception: patient M6 read the text inserting longer pauses and produced some syllables with exceptionally prolonged duration. He also produced very short intonational phrases, often consisting of two words. We will devote more attention to this patient in section 3.6.3. In the following, we will focus on Tasks 3 and 4.
3.6.1 The Tapping Task
Regarding the tapping task, the experimenter performed 121 tap sessions, 21 of which were erroneously repeated by patients. Each session contained one prosodic pattern. As presented in Figure 16.1 (left), tap disagreement was found in all patterns (anapest = 3 incorrect answers, dactyl = 2, iambic = 5, iambic-trochaic = 2, trochaic = 6, trochaic-iambic = 3). The chi-square test comparing the number of correct/incorrect answers across prosodic patterns was not significant (p = 0.991). Note that the width of the bars indicates the number of investigated patterns: the wider the bar, the more data are represented. The differences in bar width are due to the fact that not every patient was able to accomplish three repetitions of the task. In addition, patterns like XxXx and XxXxXx were both classified as trochaic and xXxX and xXxXxX as iambic.
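The grouping of tapped sequences into pattern labels can be made explicit with a small sketch. The names `FEET` and `classify` are hypothetical, and the assignment of XxxX and xXXx to the mixed labels (trochaic-iambic and iambic-trochaic) is an assumption inferred from the pattern names used in the results, not something the text states directly.

```python
# Foot templates and the labels used in the results (X = strong, x = weak).
FEET = {"Xx": "trochaic", "xX": "iambic", "Xxx": "dactyl", "xxX": "anapest"}


def classify(pattern: str) -> str:
    """Label a tapped sequence by splitting it into equal-sized feet."""
    for size in (2, 3):  # try disyllabic feet first, then trisyllabic ones
        if len(pattern) % size:
            continue
        feet = [pattern[i:i + size] for i in range(0, len(pattern), size)]
        if all(foot in FEET for foot in feet):
            names = []
            for foot in feet:
                if FEET[foot] not in names:  # drop repeats, keep order
                    names.append(FEET[foot])
            return "-".join(names)
    raise ValueError(f"no foot analysis for {pattern!r}")


def length_class(pattern: str) -> str:
    """Short = 4 elements, long = 6 elements, as coded in the study."""
    return {4: "short", 6: "long"}[len(pattern)]


for p in ["XxXx", "xXxX", "XxXxXx", "xXxXxX", "XxxX", "xXXx", "XxxXxx", "xxXxxX"]:
    print(p, classify(p), length_class(p))
```

The sketch covers only the eight tapping sequences of Task 3; the syllable sequences labelled equal-iambic and trochaic-equal-iambic in Task 4 do not divide into the four foot templates and would raise an error here.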
Although we coded the distinction between short and long sequences in our database, we do not show it in the figures, for reasons of simplification and because differences in performance between these sequences were minimal. When we look at patterns provided by individual participants, differences between them are evident; see Figure 16.1 (right). While three participants accomplished the tapping task perfectly (F2, M1, M5), five of them made mistakes. The number of mistakes varied between one and 11 (F1 = 2; M2 = 4; M3 = 3; M4 = 1; M6 = 11). In the control group, the number of incorrect answers was six out of 192 sessions. The incorrect answers were found in the anapest (1), dactyl (1), iambic (1), iambic-trochaic (1) and trochaic-iambic (2) patterns. As expected, the difference between patterns was not significant (p = 0.531). Figure 16.2 (left) illustrates the results.
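As a purely descriptive check, the raw tapping counts reported above (21 incorrect out of 121 sessions for patients, 6 out of 192 for controls) can be turned into error rates and an unadjusted odds ratio. The helper names below are hypothetical. The calculation pools all trials and ignores the repeated measurements per speaker, so it should not be expected to coincide with an estimate from a mixed-effects model.

```python
def error_rate(incorrect: int, total: int) -> float:
    """Proportion of incorrectly reproduced sessions."""
    return incorrect / total


def odds(incorrect: int, total: int) -> float:
    """Odds of an incorrect reproduction: incorrect vs. correct sessions."""
    return incorrect / (total - incorrect)


# Counts taken from the text (tapping task).
patients = {"incorrect": 21, "total": 121}
controls = {"incorrect": 6, "total": 192}

patient_rate = error_rate(**patients)
control_rate = error_rate(**controls)
unadjusted_or = odds(**patients) / odds(**controls)

print(f"patients: {patient_rate:.1%} incorrect")
print(f"controls: {control_rate:.1%} incorrect")
print(f"unadjusted odds ratio: {unadjusted_or:.2f}")
```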
Figure 16.1 Percentage of Correct (= 0) and Incorrect (= 1) Taps in Different Patterns (Left) and Results Obtained by Individual Patients (Right).
Figure 16.2 Percentage of Correct (= 0) and Incorrect (= 1) Taps in Different Patterns (Left) and Results Obtained by Individual Participants in the Control Group (Right).
While five participants from the control group performed the task without any mistakes, three made a few mistakes: F2 (1), F4 (2) and M3 (3). The results are presented in Figure 16.2 (right). A binomial analysis reveals that the number of incorrect vs. correct tap reproductions depends on whether the patterns are performed by patients or the control group. The odds of an incorrect reproduction are 7.74 times higher for patients with cerebellar dysfunctions than for the control group (z = 2.15, p < .05). Other effects, including the pattern, its length and speaker’s sex, were not significant.
3.6.2 The Syllable Repetition Task
In the syllable repetition task, out of 120 sessions produced by the experimenter, 19 were incorrectly reproduced by patients. The results are presented in Figure 16.3 (left). Again, the incorrect answers were found in all patterns (anapest: 1; dactyl: 3; equal-iambic: 4; iambic: 2; trochaic: 6; trochaic-equal-iambic: 3). The chi-square test was not significant (p = 0.382). The results also reveal a large variation among the patients. Of six patients who took part in this task, three experienced difficulties. Note that two patients (F2, M2) did not accomplish the task as they interrupted the experiment, and the other two (M3, M5) could only perform one repetition of the task. As Figure 16.3 (right) shows, the number of incorrect reproductions was not equally dispersed over the three participants. While speaker M3 made one and M1 two mistakes, speaker M6 was mistaken 16 times. In the control group, 214 patterns were performed by the experimenter and only six of them were wrongly reproduced. As illustrated by Figure 16.4 (left), they appeared in the trochaic pattern (1) and the trochaic-equal-iambic pattern (5). The main problem in the trochaic-equal-iambic sequence was not erroneously repeated prominence but a reduced number of syllables (from six to five, that is from XxxxxX to XxxxX). The chi-square test was significant (χ2 = 32.80; p < .001).
The incorrect patterns were produced by three participants, with the number of errors varying from one to three (F1: 1; F4: 3; M4: 2), as shown in Figure 16.4 (right). The binomial analysis shows that the number of incorrect vs. correct syllable reproductions is not significantly different between patients and the control group. The only significant differences concerned the trochaic-equal-iambic sequences, whose performance differed from that of (a) the anapest (z = 2.98, p < .05), (b) the equal-iambic (z = 3.07, p < .05), and (c) the iambic pattern (z = 3.68, p < .01). Other factors were not significant. However, it should be taken into account that most mistakes, not only in the tapping task but also in the speech task, were made by patient M6. We will therefore take a closer look at his answers in the next section.
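How strongly the patient-group errors are concentrated in a single participant can be read off the per-patient counts given in Sections 3.6.1 and 3.6.2. The sketch below only re-tabulates those reported counts; the dictionary and function names are hypothetical.

```python
# Incorrect reproductions per patient, as reported in the text.
tap_errors = {"F1": 2, "M2": 4, "M3": 3, "M4": 1, "M6": 11}
syllable_errors = {"M1": 2, "M3": 1, "M6": 16}


def share(errors: dict, participant: str) -> float:
    """Fraction of all errors in a task made by one participant."""
    return errors[participant] / sum(errors.values())


for task, errors in [("tapping", tap_errors), ("syllable repetition", syllable_errors)]:
    total = sum(errors.values())
    print(f"{task}: {total} errors, M6 share = {share(errors, 'M6'):.0%}")
```

The totals recover the 21 tapping errors and 19 syllable errors reported for the patient group, which gives a simple consistency check on the per-participant figures.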
Figure 16.3 Percentage of Correct (= 0) and Incorrect (= 1) Syllables in Different Patterns (Left) and Results Obtained by Individual Patients (Right).
Figure 16.4 Percentage of Correct (= 0) and Incorrect (= 1) Syllables in Different Patterns (Left) and Results Obtained by Individual Participants in the Control Group (Right).
3.6.3 Results for Patient M6
In the tapping task, patient M6 made 11 mistakes. The patterns tapped by this patient included either fewer or more taps in comparison with the number of taps performed by the experimenter. The spectrum of mistakes was broad, as the patient performed from one tap fewer to 16 taps more than the original pattern. Figure 16.5 shows the six-tap pattern produced by the experimenter (left) and the corresponding 19-tap pattern produced by this patient (right). In the syllable repetition task, patient M6 mistakenly repeated the syllable pattern 16 times, which was far more than other patients. He either placed prominences incorrectly or produced an incorrect number of syllables. An example is provided in Figure 16.6, where the experimenter produces a dactyl pattern (left) and the patient produces nine syllables with the longest first syllable (right). The reason for such exceptional behaviour by this patient might be related to the fact that, compared to other patients, his cerebellar damage was the most extensive. Figure 16.7 provides a schematic presentation of the cerebellum of patient M6, with light areas illustrating damage. As Figure 16.7 shows, patient M6 suffered from both-sided lacunas of the cerebellum and was the only one among eight patients with this type of damage. Other patients had one-sided (left or right) damage (see Section 3.2 for a description of the lesions of other patients). Therefore, we hypothesise that this type of damage is responsible for the extreme behaviour in the tasks conducted in our experiment.
4. Discussion and Summary
Our results reveal a clear difference in task accomplishment depending on the nature of the stimuli. The non-speech stimuli were more frequently replicated correctly by healthy participants than by patients, suggesting this task was more challenging for patients with cerebellar lesions. Differences in tap replications between the experimenter and patients were found not only in complex but also in simple prosodic patterns, e.g., short trochaic patterns XxXx. These results support Breska and Ivry’s (2016) proposal that the tapping task is a discrete rather than a dynamic explicit motor task and as such requires the recruitment of the cerebellum. By contrast, speech stimuli, i.e., consisting of spoken syllables arranged in prosodic feet, were repeated correctly, for the most part, by both patients and the control group. The only exception was the complex trochaic-equal-iambic sequence, which proved challenging for both groups. Thus, the difference in the accomplishment of these tasks suggests that the cerebellum is not equally involved in speech vs. non-speech based tasks. It seems reasonable to assume that the co-variability of other acoustic parameters might play a role here: in case of speech, also fundamental frequency (and other
Figure 16.5 Taps Produced in Task 3 by the Experimenter (Left) and Patient M6 (Right).
Figure 16.6 Syllabic Patterns Produced in Task 4 by the Experimenter (Left) and Patient M6 (Right).
Figure 16.7 Schematic Presentation of the Cerebellum of Patient M6.
parameters) possibly contribute to how temporal stimuli are processed. A more detailed acoustic analysis of the stimuli could help to (dis)prove this conclusion. The present study also reveals a large interspeaker variation between patients in the accomplishment of the tasks. While one patient encountered extreme difficulties in accomplishing both tasks, there were patients who did not make mistakes. This suggests that further study is needed with patients showing the same (or similar) cerebellar lesions in order to localise areas responsible for different types of timing processing, and to detect which areas are responsible for processing speech versus nonspeech patterns.
Acknowledgements
We would like to thank all participants in this study. This research has been supported by Bundesministerium für Bildung und Forschung (BMBF, Germany) Grant Nr. 01UG1411 to Marzena Żygis.
Notes
1. We dedicate this paper to Prof. Katarzyna Dziubalska-Kołaczyk, who has made outstanding contributions to Polish phonology, L1 and L2 studies and to the linguistics and preservation of the world’s languages. We thank you, Katarzyna, for your work, enthusiasm and courage.
2. The present investigation is part of a project related to patients with cerebellar lesions, with the aim of examining cerebellar dysfunctions from perceptual, cognitive, linguistic and neurological points of view. The project was led by Prof. Ireneusz Kojder at the Pomeranian Medical University Hospital in Szczecin (Poland).
References
Ackermann, H. 2008. Cerebellar contributions to speech production and speech perception: Psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31(6). 265–272.
Ackermann, H. & B. Brendel. 2016. Cerebellar contributions to speech and language. In G. Hickok & S. L. Small (eds.), Neurobiology of language. Amsterdam: Elsevier. 73–84.
Ackermann, H., K. Mathiak & A. Riecker. 2007. The contribution of the cerebellum to speech production and speech perception: Clinical and functional imaging data. The Cerebellum 6. 202–213.
Boersma, P. & D. Weenink. 2018. Praat: Doing phonetics by computer [Computer program]. Version 6.0.40, retrieved 11 May 2018 from www.praat.org/.
Breska, A. & R. B. Ivry. 2016. Taxonomies of timing: Where does the cerebellum fit in? Current Opinion in Behavioral Sciences 8. 282–288.
Coull, J. T., R. K. Cheng & W. H. Meck. 2011. Neuroanatomical and neurochemical substrates of timing. Neuropsychopharmacology 36(1). 3.
Coull, J. T. & A. C. Nobre. 1998. Where and when to pay attention: The neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. Journal of Neuroscience 18(18). 7426–7435.
Ivry, R. & S. Keele. 1989. Timing functions of the cerebellum. Journal of Cognitive Neuroscience 1. 136–152.
Mariën, P., H. Ackermann, M. Adamaszek, C. H. Barwood, A. Beaton, J. Desmond & M. Leggio. 2014. Consensus paper: Language and the cerebellum: An ongoing enigma. The Cerebellum 13(3). 386–410.
R Studio Team. 2018. RStudio: Integrated development for R. Boston, MA: RStudio, Inc. Version 1.1.453.
Schwartze, M. & S. A. Kotz. 2016. Contributions of cerebellar event-based temporal processing and preparatory function to speech perception. Brain and Language 161. 28–32.
Spencer, R. M. & R. B. Ivry. 2013. The cerebellum and timing. In M. Manto, D. Gruol, J. Schmahmann & N. Koibuchi (eds.), Handbook of the cerebellum and cerebellar disorders, 1201–1219. Dordrecht: Springer.
Appendix
Wiatr Północny i Słońce
Pewnego razu Północny Wiatr i Słońce sprzeczali się, kto z nich jest silniejszy. Właśnie przechodził drogą jakiś człowiek owinięty w ciepły płaszcz. Umówili się więc, że ten z nich, który pierwszy zmusi przechodzącego, aby zdjął okrycie, będzie uważany za silniejszego. Północny Wiatr zaczął od razu dąć z całej siły, ale im więcej dął, tym silniej podróżny otulał się w płaszcz. Wreszcie Północny Wiatr dał spokój. Wtedy Słońce zaczęło przygrzewać, a w chwilę później podróżny zdjął płaszcz. W ten sposób Północny Wiatr musiał przyznać, że Słońce jest silniejsze od niego.
Translation
The North Wind and the Sun were disputing who was the stronger, when a traveller came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveller take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew, the more closely did the traveller fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shined out warmly, and immediately the traveller took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two.
17 ERP Correlates of Figurative Language Processing Anna B. Cieślicka and Roberto R. Heredia
1. Introduction
Several types of ERP components relevant to the study of language have been identified and described. Early exogenous components, occurring before 200 ms, and later endogenous ones have been distinguished in the ERP literature. After 200 ms, especially at the N400 component, meaning-related attributes are extracted. The N400 is a negative deflection in the event-related potential that is an index of semantic processes and linguistic expectations (Kutas & Federmeier 2011; Kutas & Hillyard 1980b). It peaks around 400 ms after the presentation of a content word and its size depends on a number of factors, such as word frequency, repetition, word concreteness, the number of a word’s orthographic neighbors, semantic relatedness, sentence position, and contextual expectancy (Kutas & Delong 2008). The N400 is smaller when context builds strong expectations concerning the upcoming word. Thus, incongruent or unexpected sentence endings and unrelated word pairs elicit larger N400 responses than congruent or expected sentence endings and related word pairs. In addition, the component has smaller amplitude in response to high frequency words, as compared to words of lower frequency. Unexpected changes of a non-semantic nature (e.g., changes in word size, color, or grammatical violations) have been linked with positive-going potentials occurring either around 300 ms (P300) or 600 ms (P600) after stimulus presentation (Kutas & Hillyard 1980a). The P600 is a large positive-going wave peaking between 500–1000 ms after the presentation of a syntactic anomaly (Friederici 1995; Osterhout & Mobley 1995). The P600 has been shown to reflect syntactic violations (Hahne & Friederici 1999), memory retrieval (Paller & Kutas 1992; Rugg et al. 1995), the processing of well-formed but syntactically complex sentences (Kaan et al. 2000), and processes of reanalysis and syntactic repair (Friederici 1995, 2002; Osterhout et al. 1994).
A number of ERP studies have investigated figurative language processing by native language speakers. Some of them looked into cerebral asymmetries in processing figurative language. For example, Rapp
et al. (2004) reported the strongest activation in the anterior part of the left inferior frontal gyrus and middle temporal gyri during reading metaphorical, as opposed to literal sentences. Similarly, Stringaris et al. (2007) found greater activation in the left hemisphere than in the right hemisphere during the processing of metaphorical, relative to literal sentences. Along with metaphorical/literal sentences, stimuli included non-meaningful ones, which participants evaluated for their meaningfulness. Distinct neural mechanisms were revealed for each sentence type, with non-meaningful and metaphoric sentences primarily activating the left inferior frontal gyrus, metaphoric ones additionally activating the left thalamus, and literal sentences showing strong activation in the right precentral gyrus when contrasted with either metaphorical or non-meaningful conditions. While a number of ERP studies have explored metaphor and metonymy (Weiland et al. 2014), studies into the processing of idioms are scarce and mostly conducted in participants’ native language. Using a behavioral paradigm (semantic priming and a relatedness judgment task) and ERPs, Laurent et al. (2006) showed that highly salient (figurative) meanings of familiar idioms are accessed faster and evoke smaller amplitudes of the brain’s electrophysiological responses than weakly salient idioms. Idioms varying in salience (strongly vs. weakly salient) were presented out of context and followed by words related to their figurative or literal meanings. Participants decided whether a target word was related (yes/no) to the meaning of a preceding idiomatic sentence. Behavioral data indicated that figurative targets were responded to faster than literal targets when presented with salient idioms, whereas faster reaction times (RTs) to literal targets than figurative targets were exhibited in the weakly salient idiom condition.
In turn, ERP recordings showed smaller amplitudes of the N400 and P600 for the last word of the highly salient idioms and for target words compatible with salient meanings of salient idioms. These results support the Graded Salience Hypothesis (GSH), in that, depending on the idioms’ salience, either their idiomatic (for high-salience idioms) or literal (for low-salience idioms) meaning was primarily automatically activated, leading to the increased speed in the processing of a subsequent compatible (figurative/literal) target. Electrophysiological measures corroborated those results, showing smaller N400 amplitude for the idiomatic, as compared to literal meaning following salient idioms. Overall, strongly salient idioms evoked smaller electrophysiological responses on the N400 component (revealing access to semantic memory) and the P600 component (sensitive to syntactic or semantic expectations) than low-salience idioms. More recently, Canal et al. (2015) looked at electrophysiological correlates during the processing of literally plausible idioms (break the ice) embedded in either literal- or figurative-biasing context. ERP recordings were time-locked to the presentation of the first three words of the idiom,
whose structure conformed to the Verb Phrase + Noun Phrase pattern. While no differences were found for the N400 amplitude recorded for the three constituent words of idioms used literally or figuratively, at the time window 400–600 ms, reflecting later integration processes, ERP differences emerged in frontal electrodes between idiomatic vs. literal and idiomatic vs. control sentences during the presentation of the final word of the idiomatic/control expression. Given the scarcity of studies into bilingual figurative processing and ERPs, the current study explores the activation of literal and figurative meanings of English idiomatic expressions by Spanish-English bilinguals. Previous research using behavioral measures points to two important factors requiring further attention. (1) Research findings show that literal meanings of second language (L2) idioms are understood faster and become readily accessible (Liontas 2002; Cieślicka 2006; Cieślicka & Heredia 2011). (2) Language dominance emerges as a crucial factor modulating bilingual language processing (Altarriba & Basnight-Brown 2007; Heredia 1997). In the dominant language, bilinguals feel more comfortable communicating and are more proficient in receptive and productive language skills. Even if the native language (L1) at one point was dominant, over time the L2 may become the dominant and more readily accessible language. Studies have shown that bilinguals not dominant in a given language will process the idiom in that language literally first, and only then trigger its figurative meaning (Cieślicka 2006, 2013). The current study investigates these issues more closely to determine which meanings of ambiguous (literally plausible: kick the bucket) idioms are more salient for fluent Spanish-English bilinguals likely to be dominant in English.
Idioms were embedded in a literal-biased context, favoring the idiom’s less salient meaning (e.g., He slipped on the wet kitchen floor and he kicked the bucket), or in a figurative-biased context, favoring the idiom’s salient meaning (e.g., After a long battle with cancer, finally he kicked the bucket). Sentences were followed by a critical target that was either figuratively (DIE) or literally (FOOT) related to the idiom. Additionally, targets were either congruent or incongruent with the preceding context. The participants’ task was to decide whether the critical target was related to the meaning of the preceding idiomatic sentence. We used both behavioral (RT and response accuracy) and electrophysiological measures as indices of literal and figurative meaning activation of ambiguous idioms presented in context. For the electrophysiological measures, we looked at the N400 and P600 components.
Methods
Participants
Twenty-eight Spanish-English bilingual students (mean age = 24, 15 female) from a South Texas university participated in the experiments. Twenty-five
ERP Correlates of Language Processing
participants were dominant in English and three in Spanish. Language dominance was assessed using Dunn and Fox Tree’s (2009) Bilingual Dominance Scale (BDS). All participants were left-handed and had no known neurological or reading disorders.
Materials
Selection of Stimuli
Stimuli included 120 literally plausible idioms (i.e., idioms that can be interpreted literally). For each idiom, two types of context sentences were created. One sentence biased the idiom’s figurative meaning (I worked all day and I am tired, but I need to prepare my lunch before I hit the sack) and another biased the idiom’s literal meaning (As part of our stress-release seminar we grabbed a bag from the floor, filled it with pillows, and then hit the sack). For each idiom two target words were prepared, one related to the idiom’s figurative meaning (BED) and another to the idiom’s literal meaning (PUNCH). Target words were paired with idiomatic (literal- and figurative-biased) sentences in such a way that they could be congruent or incongruent with the preceding sentence context. This yielded four possible conditions, where a literal-biasing sentence was paired with a congruent (literal) or incongruent (figurative) target, and a figurative-biasing sentence was paired with a congruent (figurative) or incongruent (literal) target. A control (unrelated) sentence was created for each target, so as to establish RT baselines. Congruent and incongruent targets for idioms used in the figurative- or literal-biasing context were matched for number of syllables, word frequency, familiarity, imageability, and concreteness (ps > .05).
Predictions
Since the majority of the participants were English dominant, it was assumed, as posited by the GSH, that the idiomatic expressions’ figurative, rather than literal, meaning would be more salient and more readily available for processing.
Consequently, salient figurative meanings of idioms should be retrieved automatically as whole units from the mental lexicon and not undergo any literal analysis. That is, if the idiom is used in the figurative context, incongruent literal targets should be rejected fast and congruent figurative targets accepted fast. Hence, there should be a large discrepancy between RTs to literal and figurative targets for idioms used in the figurative context, such that figurative targets have significantly shorter RTs than literal targets. In contrast, for idioms used in the literal context, responses to congruent literal targets might not be much faster than responses to incongruent figurative targets, as the idiom’s salient (figurative) meaning is activated regardless of context. As hypothesized by the GSH, even if the idiom is used
in the literal context, it still gets automatically activated as a chunk and its salient figurative meaning is easily accessed. Consequently, even in the context biasing the idiom’s literal meaning, the figurative meaning should be activated, making the contextually incongruent figurative target easier to process. Therefore, we did not expect to see large discrepancies between RTs to literal and figurative targets following idioms embedded in the literal context. In relation to the ERP data, in line with previous research (Laurent et al. 2006), given the salience of figurative meanings in a bilingual’s dominant language, figurative targets should evoke smaller N400 and P600 amplitudes than literal targets, regardless of the contextual congruency.
Procedure
Electroencephalogram (EEG) signals were recorded from 64 scalp sites using a Biosemi headcap (10/20 layout). The ground electrodes were the Common Mode Sense active electrode and the Driven Right Leg passive electrode. Recordings were referenced to the left and right mastoids. To control for artifacts related to eye movements, bipolar horizontal and vertical electrooculographic activity was recorded. Electrode impedances were kept below 5 kΩ. The EEG signals were recorded continuously at a sampling rate of 8 kHz per channel. The resolution of the Biosemi Active Two is 31 nV. The EEG data were initially high-pass filtered at 0.10 Hz, and ERPs were digitally low-pass filtered at 30 Hz. Participants completed a consent form and the BDS and sat comfortably in a reclining chair in front of a computer screen in a sound-proof chamber. Sentence presentation was controlled by the E-Prime 2 experimental software. For each sentence, each word appeared in the center of the screen for 300 ms.
At the end of the sentence, a target was displayed in uppercase letters for 3000 ms, and the participant’s task was to decide if the target was meaningfully related to the preceding sentence. YES responses were signaled by pressing the “0” key and NO responses by pressing the “9” key with the dominant hand. Once participants made their decision, a blank screen appeared for 5000 ms to allow them to rest and prepare for the next trial. Sentence presentation was divided into three blocks of 40 so as to provide participants with a rest. Experimental sessions lasted 60 minutes.
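The timing reported in the Procedure can be turned into a rough upper bound on trial and session length. The sketch below is only illustrative: it assumes that the target stays on screen for the full 3000 ms, that there is no gap between words, and that every sentence has 12 words (a hypothetical value, since sentence lengths are not reported).

```python
# Durations taken from the Procedure description (in ms)
WORD_MS = 300      # each sentence word is shown for 300 ms
TARGET_MS = 3000   # maximum display time of the target
BLANK_MS = 5000    # blank screen following the decision


def max_trial_duration_ms(n_words):
    """Upper bound on the duration of one trial for a sentence of n_words words."""
    return n_words * WORD_MS + TARGET_MS + BLANK_MS


# Hypothetical 12-word sentence (assumption, not a value from the study)
trial_ms = max_trial_duration_ms(12)

# Three blocks of 40 sentences = 120 trials, all assumed to be 12 words long
session_minutes = 120 * trial_ms / 60000
```

Under these assumptions the stimulus presentation itself accounts for well under the reported 60 minutes, which leaves room for electrode preparation, instructions and rest periods.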
2. Results
RT Data
Responses exceeding 3.5 SDs above or below the mean were excluded from further analysis (about 2%). The analysis was run on the percentage of correct responses and on RTs for correct responses. Incorrect responses constituted 30% of the data. This high percentage of inaccurate responses
is expected in the meaningfulness task, which simultaneously taps both literal and figurative idiom meanings. Data were analyzed using the linear mixed effects (LME) models procedure in IBM SPSS V.20, with Context and Target Type as fixed factors and participants and items as random factors. The analysis revealed main effects of Context, F(2, 2443) = 13.40, p < .0001, and Target Type, F(1, 188) = 17.90, p < .0001, and a significant interaction between Context and Target Type, F(2, 2433) = 23.34, p < .0001. Follow-up multiple comparisons revealed no significant differences between literal and figurative targets for idioms biased toward their literal meaning. This finding confirms the prediction that idioms are automatically activated as chunks in the mental lexicon, even if they are used in sentences that bias their literal meaning. Once an idiom is known to a language user, he/she cannot ignore its figurative meaning. Therefore, when a figurative-related target is presented, there is still a priming effect, even if the required response is negative (i.e., reject the target as incongruent). Both incongruent (figurative target) and congruent (literal target) responses take approximately the same amount of time. This finding confirms the GSH’s prediction that salient meanings are readily and directly activated, irrespective of the contextual bias. In the figurative context, responses varied significantly as a function of Target Type. Congruent figurative targets were responded to faster (M = 1595 ms, SE = 107 ms), whereas responses to incongruent literal targets took almost 500 ms longer (M = 2076 ms, SE = 109 ms), F(1, 966) = 61.72, p < .0001. This finding confirms our prediction that less salient (literal) meanings of idioms will not be automatically activated when the idioms are highly familiar and used in their salient, figurative meaning.
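The trimming rule applied to the RT data (excluding responses more than 3.5 SDs above or below the mean) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the rule is applied to a single pooled list of RTs using the sample SD, whereas the study may have trimmed per participant or per condition, and the data are invented.

```python
from statistics import mean, stdev


def trim_outliers(rts, n_sd=3.5):
    """Keep only reaction times within n_sd sample SDs of the mean."""
    m = mean(rts)
    sd = stdev(rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]


# Hypothetical reaction times in ms (not data from the study):
# thirty plausible responses plus one extreme value
rts = [1500 + 10 * i for i in range(30)] + [9000]
kept = trim_outliers(rts)
```

In this toy sample only the extreme 9000 ms response falls outside the 3.5 SD band and is removed.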
Accuracy Data
There was a main effect of Context, F(2, 3501) = 35.34, p < .0001, as well as a significant interaction between Context and Target Type, F(2, 3502) = 35.60, p < .0001. Incongruent targets elicited a significantly higher percentage of inaccurate responses than congruent targets. In the literal-biasing context, responses to literal (congruent) targets were significantly more accurate (M = .71, SE = .035) than responses to figurative (incongruent) targets (M = .60, SE = .036), p < .0001. In the figurative-biasing context, responses to figurative (congruent) targets were significantly more accurate (M = .76, SE = .035) than responses to literal (incongruent) targets (M = .59, SE = .036).
ERP Data
EEG recordings for each subject were first examined for artifact rejection. Trials contaminated by excessive eye or muscle movement artifacts were rejected. Because of heavy movement- and eye-blink-related artifacts,
data from 10 participants were excluded. Based on visual inspection of the brain waves, the time window of 300–500 ms was chosen to capture the N400, and the time window of 500–890 ms to capture the P600 component. Average ERPs from −100 ms to 900 ms relative to the onset of critical target words were computed as a function of the condition (figurative congruent, figurative incongruent, figurative control, literal congruent, literal incongruent, literal control), with the 100 ms before the onset of the target word serving as the baseline. Thus, the epoch for each trial spanned the time window from −100 ms to 900 ms. Data were analyzed separately for lateral and midline electrodes, as well as by region (anterior to posterior). Electrodes were assigned to the following regions and laterality: frontal left: Fp1, AF7, AF3, F1, F5, F7; central left: FT7, FC5, FC3, FC1, C1, C3, C5, T7; posterior left: TP7, CP5, CP3, CP1, P1, P3, P5, P7, P9, PO7, PO3, O1; frontal midline: Fpz, AFz, Fz; central midline: FCz, Cz; posterior midline: Pz, POz, Oz, Iz; frontal right: Fp2, AF8, AF4, F2, F4, F6, F8; central right: FT8, FC6, FC4, FC2, C2, C4, C6, T8; posterior right: TP8, CP6, CP4, CP2, P2, P4, P6, P8, P10, PO8, PO4, O2. Peak deflections larger than 3.5 SDs above or below the mean (larger than ±160 μV) were removed from further analyses. Data were entered into an LME analysis with Context (congruent vs. incongruent vs. control), Target Type (figurative vs. literal), Laterality (left vs. midline vs. right), and Region (frontal vs. central vs. parietal) as fixed factors and participants as a random factor. Due to space limitations, we only report the data for left-side electrodes. Analyses for midline and right-side electrodes were highly comparable.
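The epoching scheme described above (a −100 to 900 ms epoch, a 100 ms pre-target baseline, and separate N400 and P600 windows) can be illustrated with a small sketch. It rests on several assumptions not stated in the chapter: the component measure is taken to be mean amplitude within the window, windows are treated as half-open intervals, and the single-channel epoch below is invented and sampled every 10 ms.

```python
def baseline_correct(epoch, times_ms, baseline=(-100, 0)):
    """Subtract the mean of the pre-stimulus baseline from every sample."""
    base = [v for v, t in zip(epoch, times_ms) if baseline[0] <= t < baseline[1]]
    offset = sum(base) / len(base)
    return [v - offset for v in epoch]


def mean_amplitude(epoch, times_ms, window):
    """Mean amplitude of the samples with window[0] <= t < window[1]."""
    vals = [v for v, t in zip(epoch, times_ms) if window[0] <= t < window[1]]
    return sum(vals) / len(vals)


# Hypothetical single-channel epoch in microvolts, one sample every 10 ms
times_ms = list(range(-100, 901, 10))
epoch = [2.0 if t < 300 else (-3.0 if t < 500 else 4.0) for t in times_ms]

corrected = baseline_correct(epoch, times_ms)
n400 = mean_amplitude(corrected, times_ms, (300, 500))  # N400 window
p600 = mean_amplitude(corrected, times_ms, (500, 890))  # P600 window
```

In practice such values would be computed per trial, electrode and condition before being averaged and entered into the statistical model.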
N400 Data
Analysis conducted on the left-side electrodes (frontal, central and posterior) revealed a main effect of Context, F(2, 2534) = 8.01, p […]
[…] .05), so the confounds related to frequency of items should be avoided. Non-words were generated with the aid of the ARC non-word database (Rastle, Harrington & Coltheart 2002). As mentioned, in the similar and dissimilar conditions, word onsets overlapped phonologically with regard to consonants, but the vowel differed. The auditory primes in English contained a neighbouring vowel /e/ or /ʌ/, given the visual target with /æ/ (similar vowels) or a distinct /iː/ or /ɪ/, given the /ɒ/ targets (dissimilar vowels). The types of primes with respective targets are summarised in Table 18.1.
Recognition of Unfamiliar L2 Targets
Table 18.1 Examples of Prime and Target Types and Examples of Used Items
Prime Type    Prime         Target
Similar       Honey /ʌ/     Hand /æ/
Similar       Pencil /e/    Panda /æ/
Dissimilar    Watch /ɒ/     Witch /ɪ/
Dissimilar    Shop /ɒ/      Sheep /iː/
Filler        Lemon         Snake
Non-word      Gwelk         Flon
2.4. Procedure
The experiment was a language-specific lexical decision task with cross-modal priming conducted in English (L1 for English participants, L2 for Polish participants). It was developed in E-Prime 2.0 Professional (Psychology Software Tools, Pittsburgh, PA). Participants were seated in front of computer screens and instructed to decide whether a string of letters presented to them was an existing English word or not. First, a fixation appeared on the screen for 500 ms, after which a slide with hash signs followed, during which a prime was played through headphones (the average prime length was 716 ms; the slide disappeared after 1000 ms). There was a short (250 ms) blank screen, after which a visual target (word or non-word) appeared. It remained on the screen until participants had made their decision. If there was no response, the target disappeared after 5000 ms. The interstimulus interval was 250 ms, a typical value in similar priming experiments (Gor 2018). After the whole cycle, another trial began. The procedure is presented in Figure 18.1. Participants were instructed to do the task as quickly and as accurately as possible. If they hesitated and were unsure of the correct response, they were asked to consider the string a non-word. Testing for Polish participants took place in the Language and Communication Laboratory, Faculty
Figure 18.1 The Experimental Procedure.
Bartosz Brzoza
of English, Adam Mickiewicz University, Poznań, Poland. English participants were tested in a psycholinguistic laboratory at the University of Reading, United Kingdom. In both cases participants did the task in closed booths, with no noise around. The conditions, instructions and the ambience of the booths were kept identical. The experiment received a positive evaluation from the internal ethics committee as part of a bigger project on spoken-word recognition and the acquisition of phonological skills.
2.5. Data Analysis
The data were subjected to statistical analyses. No participants were identified as outliers. The timing error was below the screen refresh cycle, so no data needed to be excluded because of the computer’s timing error. The error rate was 8.88%, and incorrect responses were excluded from the reaction time analysis. Descriptive statistics were computed and normality tests were conducted. The data showed a normal distribution of scores. After all assumptions had been tested, the data were submitted to a 2-way multivariate ANOVA. The data analysis was conducted only on correct responses.
2.6. Results
Mean reaction times of responses to targets are provided in Table 18.2, which shows response times for the three investigated groups. These results show that native speakers of English have the quickest reaction times overall, and that the difference between their responses following similar and dissimilar primes is numerically small. Non-native speakers have longer processing latencies. The reaction time to targets preceded by dissimilar primes is numerically greater than to targets
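Table 18.2 Mean Reaction Times to Targets in Groups.
Group                                                  Prime Type    Mean RT [ms]    SD
Little phonetic experience group                       Similar       790.05          111.49
Little phonetic experience group                       Dissimilar    912.35          159.84
Little phonetic experience group                       Filler        792.21          134.49
Little phonetic experience group                       Non-word      1202.34         306.96
Greater phonetic experience group                      Similar       867.20          182.03
Greater phonetic experience group                      Dissimilar    946.27          216.81
Greater phonetic experience group                      Filler        815.94          137.42
Greater phonetic experience group                      Non-word      1014.01         269.46
Greatest phonetic experience group (native speakers)   Similar       486.63          40.35
Greatest phonetic experience group (native speakers)   Dissimilar    497.81          51.17
Greatest phonetic experience group (native speakers)   Filler        509.13          50.70
Greatest phonetic experience group (native speakers)   Non-word      572.10          84.46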
preceded by similar primes in both groups of Polish participants, but the difference between these two conditions seems to be smaller in the greater phonetic experience group of non-native speakers. The reaction times to targets in all groups and for all prime types are plotted in Figure 18.2. In the group with little phonetic experience, a significant main effect of prime type on reaction times was observed (F(2,29) = 39.15; p < .001; η² = .558). Post-hoc tests with Bonferroni correction showed which pairs differed. There was a significant difference between reaction times to targets preceded by similar and dissimilar primes (p < .001) as well as to targets preceded by similar vs. non-word primes (p < .001). There was no difference between RTs to targets preceded by similar vs. filler primes (p > .05). In the group with greater phonetic experience, a significant main effect of prime type on reaction times was also observed (F(2,15) = 8.19; p < .001; η² = .325), but the magnitude of the F value and the effect size decreased. There was a significant difference between reaction times to targets preceded by a dissimilar vs. filler prime (p = .005) and a non-word vs. filler prime (p = .006), but no significant difference between reaction times to targets preceded by a similar versus dissimilar prime (p > .05).
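The numerical pattern described above can be checked directly from the group means in Table 18.2. The short sketch below simply subtracts the mean RT after similar primes from the mean RT after dissimilar primes in each group; the dictionary layout and function name are illustrative choices, and the calculation says nothing about statistical significance.

```python
# Mean RTs in ms for the similar and dissimilar conditions, copied from Table 18.2
mean_rt = {
    "little": {"similar": 790.05, "dissimilar": 912.35},
    "greater": {"similar": 867.20, "dissimilar": 946.27},
    "greatest (native)": {"similar": 486.63, "dissimilar": 497.81},
}


def rt_difference(group):
    """Dissimilar minus similar mean RT for one group, rounded to 2 decimals."""
    return round(mean_rt[group]["dissimilar"] - mean_rt[group]["similar"], 2)


differences = {group: rt_difference(group) for group in mean_rt}
```

The difference is about 122 ms in the little phonetic experience group, about 79 ms in the greater phonetic experience group, and about 11 ms among the native speakers.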
Figure 18.2 Comparison of Mean RTs Obtained for Targets Across Groups of Participants.
Table 18.3 Differences Between Reaction Times on Targets Preceded by Different Types of Primes (Significant Differences Have Been Marked Grey)
Group                                Prime Type    Similar     Dissimilar    Filler      Non-word
Little phonetic experience group     Similar       –           p < .001      p > .05     p < .001
Little phonetic experience group     Dissimilar    p < .001    –             p < .01     p < .001
Little phonetic experience group     Filler        p > .05     p < .01       –           p < .001
Little phonetic experience group     Non-word      p < .001    p < .001      p < .001    –
Greater phonetic experience group    Similar       –           p > .05       p > .05     p > .05
Greater phonetic experience group    Dissimilar    p > .05     –             p = .005    p > .05
Greater phonetic experience group    Filler        p > .05     p = .005      –           p = .006
Greater phonetic experience group    Non-word      p > .05     p > .05       p = .006    –
Greatest phonetic experience group   Similar       –           p > .05       p > .05     p < .001
Greatest phonetic experience group   Dissimilar    p > .05     –             p > .05     p = .001
Greatest phonetic experience group   Filler        p > .05     p > .05       –           p = .001
Greatest phonetic experience group   Non-word      p < .001    p = .001      p = .001    –
Among the native speakers of English, the group with the greatest phonetic experience, there was also a main effect of prime type on reaction times (F(2,19) = 18.36; p < .001; η² = .466). Importantly, there was no difference between reaction times on targets preceded by similar primes and those preceded by dissimilar primes (p > .05). Reaction times on targets preceded by all prime types differed significantly only when compared to those preceded by non-words (similar primes: p < .001; dissimilar primes: p = .001; filler primes: p = .001). All the comparisons together with p values are summarised in Table 18.3. Finally, there was an overall main effect of group (F(2,69) = 14.07; p […] .05) but for the difference between similar and dissimilar primes the difference was on the tendency level (p = 0.9). The comparison of native speakers versus non-native speakers (little vs. greatest and greater vs. greatest phonetic experience) revealed significant differences in reaction times to targets (p < .001) in all conditions.
3. Discussion and Conclusion
All the experimental hypotheses have been corroborated. Polish learners of L2 English are indeed primed by the appearance of a word with a similar vowel and they are not primed by the appearance of a word with a dissimilar vowel. In contrast, native speakers are not primed by words with either
similar or dissimilar vowels. They ignore sounds that are only superficially related. The amount of priming is diminished in the more advanced group of Polish learners of L2 English compared to the less advanced group. The observed main effect of group shows that the groups differ overall in their responses to targets preceded by different types of primes. Further analysis of the obtained RTs indicates that the group of native speakers is the quickest in their decisions, with the shortest response latencies. Their processing of targets is similar overall, regardless of the prime type. The most important finding is that there is a significant difference in the processing of targets preceded by words with similar (confusable) vowels (/e/ or /ʌ/ given target with /æ/) as opposed to dissimilar (non-confusable) vowels (/iː/ or /ɪ/, given target with /ɒ/) in the least phonetically experienced group. This difference is gone in more phonetically experienced participants. It is also absent in the processing exhibited by the native speakers of English (the group with the greatest amount of phonetic experience in English). Additional evidence comes from the effect size coefficients for the main effect of prime type, which decrease in non-native speakers following phonetic training. To summarise, the priming effect stemming from similar, confusable vowels has been attested for the least experienced group of non-native speakers. This priming effect disappears in the more experienced groups, which points to the role of phonetic experience in dealing with unwanted competition. Such a finding lends support to the theories postulating a qualitative change in language processing performance following the acquisition of L2 phonology, such as the Lexical Restructuring Model (Metsala & Walley 1998; Walley & Flege 1999).
The observed increase in the processing time for most stimulus types between the Polish little phonetic experience group and the Polish greater phonetic experience group is an interesting result. It might indicate that processing is more accurate and effective, but at the cost of additional attention devoted to the process of word recognition. Native-likeness of the spoken word recognition process, it seems, might lie in the successful exclusion of unwanted competitors, and might not go hand in hand with a more native-like time course of the process. Future research should investigate potential priming effects of different vowel contrasts as well as the role of different features important to word recognition. Scholars should try to tease apart the roles of phonological and lexical detail in lexical processing, as well as the interrelation of both levels of processing. Incorporating multiple factors into the analyses would also be worthwhile.
Acknowledgements
This research was supported by grant 2016/21/N/HS2/02605 received from the National Science Centre, Poland.
References
Alshangiti, W. & B. Evans. 2015. Comparing the efficiency of vowel production training in immersion and non-immersion settings for Arabic learners of English. In The Scottish Consortium for ICPhS 2015 (ed.), Proc. 18th ICPhS Glasgow. Glasgow: University of Glasgow. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/preface.html
Best, C. T. 1995. A direct realist view of cross-language speech perception. In W. Strange (ed.), The development of speech perception and linguistic experience: Issues in cross language research, 171–204. Timonium: York Press.
Blumenfeld, H. K. & V. Marian. 2007. Constraints on parallel activation in bilingual spoken language processing: Examining proficiency and lexical status using eye-tracking. Language and Cognitive Processes 22(5). 633–660.
Broersma, M. 2012. Increased lexical activation and reduced competition in second-language listening. Language and Cognitive Processes 27(7–8). 1205–1224.
Broersma, M. & A. Cutler. 2008. Phantom word activation in L2. System 36(1). 22–34.
Broersma, M. & A. Cutler. 2011. Competition dynamics of second-language listening. The Quarterly Journal of Experimental Psychology 64. 74–95.
Darcy, I., D. Daidone & C. Kojima. 2013. Asymmetric lexical access and fuzzy lexical representations in second language learners. The Mental Lexicon 8(3). 372–420.
Dziubalska-Kołaczyk, K. 2003. Speech is in the ear of the listener: Some remarks on the acquisition of second language sounds. In K. Hales & A. Terveen (eds.), Selected papers from the sixth college-wide conference for students in languages, linguistics and literature 2002, 81–92. Honolulu: University of Hawai‘i at Manoa.
Dziubalska-Kołaczyk, K. & B. Walczak. 2010. Polish. Revue belge de Philologie et d’Histoire 88(3). 817–840.
Frenck-Mestre, C., P. Peri, C. Meunier & R. Espesser. 2011. Perceiving non-native vowel contrasts: ERP evidence of the effect of experience. In M. Wrembel, M. Kul & K. Dziubalska-Kołaczyk (eds.), Achievements and perspectives in SLA of speech: New sounds 2010, Vol. 1, 79–90. Frankfurt am Main: Peter Lang.
Gaskell, G. & W. D. Marslen-Wilson. 1997. Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes 12. 613–656.
Gor, K. 2018. Phonological priming and the role of phonology in nonnative word recognition. Bilingualism: Language and Cognition 21(3). 437–442.
Goto, H. 1971. Auditory perception by normal Japanese adults of the sounds “L” and “R”. Neuropsychologia 9(3). 317–323.
Groot, A. M. B. de. 2011. Language and cognition in bilinguals and multilinguals: An introduction. New York: Psychology Press.
Hazan, V., A. Sennema, M. Iba & A. Faulkner. 2005. Effect of audiovisual perceptual training on the perception and production of consonants by Japanese learners of English. Speech Communication 47. 360–378.
Heuven, W. J. B. van, P. Mandera, E. Keuleers & M. Brysbaert. 2014. SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology 67(6). 1176–1190.
Huettig, F., J. Rommers & A. S. Meyer. 2011. Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica 137(2). 151–171.
Imai, S., A. C. Walley & J. E. Flege. 2005. Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners. The Journal of the Acoustical Society of America 117(2). 896–907.
Insam, M. & B. Schuppler. 2015. Evaluating the effects of pronunciation training on non-native speech: A case study report. In A. Leemann, M. J. Kolly, S. Schmid & V. Dellwo (eds.), Trends in phonetics and phonology in German-speaking Europe, 317–330. Bern: Peter Lang.
Marslen-Wilson, W. & P. Zwitserlood. 1989. Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology: Human Perception and Performance 15(3). 576–585.
McQueen, J. M. 2009. Eight questions about spoken word recognition. In G. Gaskell (ed.), The Oxford handbook of psycholinguistics, 37–53. Oxford: Oxford University Press.
Metsala, J. L. & A. C. Walley. 1998. Spoken vocabulary growth and the segmental restructuring of lexical representations: Precursors to phonemic awareness and early reading ability. In J. L. Metsala & L. C. Ehri (eds.), Word recognition in beginning literacy, 89–120. Hillsdale: Erlbaum.
Norris, D. & J. M. McQueen. 2008. Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review 115(2). 357–395.
Psychology Software Tools, Pittsburgh, PA. 2012. Psychology Software Tools, Inc. [E-Prime 2.0]. Retrieved from www.pstnet.com.
Rastle, K., J. Harrington & M. Coltheart. 2002. 358,534 nonwords: The ARC nonword database. Quarterly Journal of Experimental Psychology: Section A 55(4). 1339–1362.
Rojczyk, A. 2011. Overreliance on duration in nonnative vowel production and perception: The within lax vowel category contrast. In M. Wrembel, M. Kul & K. Dziubalska-Kołaczyk (eds.), Achievements and perspectives in SLA of speech: New sounds 2010, Vol. 2, 239–250. Frankfurt am Main: Peter Lang.
Schmidtke, J. 2014. Second language experience modulates word retrieval effort in bilinguals: Evidence from pupillometry. Frontiers in Psychology 5. 1–16.
Slowiaczek, L. M. & M. B. Hamburger. 1992. Prelexical facilitation and lexical interference in auditory word recognition. Journal of Experimental Psychology: Learning, Memory and Cognition 18. 1239–1250.
Vohs, K. D. 2015. Money priming can change people’s thoughts, feelings, motivations, and behaviors: An update on 10 years of experiments. Journal of Experimental Psychology: General 144(4). e86–e93.
Walley, A. C. & J. E. Flege. 1999. Effects of lexical status on the perception of native and non-native vowels: A developmental study. Journal of Phonetics 27. 307–332.
Weber, A. & A. Cutler. 2004. Lexical competition in non-native spoken-word recognition. Journal of Memory and Language 50(1). 1–25.
19 Applications of Electropalatography in L2 Pronunciation Teaching and Phonetic Research
Grzegorz Krynicki and Grzegorz Michalski
Introduction
Current trends in teaching foreign language (FL) pronunciation have been influenced by psychology, psychotherapy, neurolinguistics, computer science and medicine. These influences have given rise to multidisciplinary practices that are becoming part of mainstream pronunciation teaching. Particularly attractive and influential is the multisensory approach to pronunciation pedagogy, based on evidence that multisensory input is pivotal in understanding concepts taught in both first language (L1) and FL acquisition. This approach assumes that FL speech perception and production may be mediated through sensory modalities, supplementing the auditory channel with visual or even kinaesthetic and tactile reinforcements (Underhill 1996; Wrembel 2001, 2011). Visual feedback on a learner’s production is perhaps the easiest to generate by means of a computer. With the development of new technologies in the area of computer-assisted pronunciation training (CAPT), new possibilities for delivering feedback in both auditory and visual modalities have appeared. The computer-assisted approach to pronunciation training has been explored since the late 1970s for the acquisition of L1 pronunciation (Destombes 1993). The first computer-assisted pronunciation training programs displayed pitch and intensity versus time and were used by the deaf to learn to speak intelligibly. In the 1980s, Flege (1988) used the idea of computer-generated visual feedback in FL pronunciation teaching and demonstrated its value in training learners to produce correct vowels in a target language. These results supported the idea that if speech errors are correctly detected and displayed by the system along with corrective information, learners can use this feedback to learn how to correct their errors.
Recent comparisons of various visual feedback methods indicate that some of the most effective visual information for teaching pronunciation of a given sound may be provided by the display of articulators in movement for that sound (Eskenazi 2009: 837). FL pronunciation training usually involves presenting the learner with visual feedback that compares their production of a given L2 sound to that of a native speaker’s
(the “target”). Methods devised to visualise the movement of articulators in real time include ultrasound imaging (Stone et al. 1992; Gick et al. 2008), electromagnetic articulometry (Levitt & Katz 2010), magnetic resonance imaging (Alwan et al. 1997) and electropalatography (EPG). EPG has so far been used mainly as a source of visual feedback in speech therapy (e.g., Hardcastle et al. 1991b). To a limited extent, it has also been used for FL pronunciation training (Gibbon et al. 1991; Schmidt & Beamer 1998; Bright 1999; Schmidt 2012; Hacking et al. 2017). The effectiveness of EPG in foreign accent reduction, however, has never been tested experimentally, as all the reports on the use of EPG in FL pedagogy are case studies based on a limited number of subjects, with tentative though generally favourable conclusions. In speech therapy, to which the bulk of the literature involving EPG is devoted, there also remains a lack of properly controlled studies. Fifteen of the 228 studies (6.6%) analysed in Wrench (2011) were assigned Efficacy Level III, the fifth grade on a 6-grade scale of evidence for evaluating the quality of treatment studies (Scottish Intercollegiate Guidelines Network 2010), which corresponds to “well-designed non-experimental studies, i.e., comparative, correlational, and case studies”. Of these 228, 213 (93.4%) were assigned the sixth and lowest grade on that scale, which corresponds to “expert committee report, consensus conference, clinical experience of respected authorities”. At the other end of the scale, there are randomised controlled trials (RCT), which, if successfully replicated, provide the highest scientific evidence (cf. Burns et al. 2011 for other scales of scientific evidence). Although the majority of the studies reported by Wrench (2011) showed at least a minimal improvement and none showed degradation in speech performance of patients subjected to EPG therapy, the U.S.
Preventive Services Task Force and the corresponding regulatory bodies worldwide declare no official recommendation for EPG to be provided routinely, because “the balance of benefits and harms is too close to justify a general recommendation” (Wrench 2011: 17). To remedy that situation, a randomised controlled trial of EPG in the area of speech therapy for children with Down Syndrome has recently been conducted (Wood et al. 2019). Although the moderately successful results of the EPG method did not reach statistical significance at the 0.05 level in that study, it set a new standard with its methodological rigour, its sample size of 27 subjects and its 3-month-long intervention followed by a 6-month monitoring period. Providing a definitive answer as to the efficacy of EPG in teaching L2 consonants and consonant clusters may require meeting or surpassing the above standard. At the same time, it may require addressing the limitations of the technology, such as the possible interference of the artificial palate with articulation, the lack of coverage over the velar and labial regions and the absence of a one-to-one correspondence between the EPG pattern and the accompanying acoustic signal. It should also be stressed that the
Grzegorz Krynicki and Grzegorz Michalski
potential efficacy of any visual feedback technique on a learner’s articulation in the context of L2 pronunciation instruction can be considered only if the technique is skilfully integrated with acoustic training. The multisensory approach always assumes the acoustic signal as the primary source of feedback in learning both first and foreign languages. Such acoustic training should also include perception training, considering that without accurate perceptual targets to guide the sensorimotor learning of L2 sounds, production of the L2 sounds would be inaccurate (Flege 1995: 238).
Foreign Accent Reduction
The literature on the application of EPG in pronunciation teaching is scarce. The studies in English that address the issue (that the authors of the present chapter are aware of) are: Gibbon et al. 1991; Schmidt & Beamer 1998; Bright 1999; Schmidt 2012; Hacking et al. 2017. None of these are experimental or even quasi-experimental in character. They are case or correlational studies with the number of subjects ranging from one to ten. They are therefore at best Efficacy Level III in the scale referenced above. Consequently, none of them demonstrates a causal relationship between the EPG treatment and any improvement in the pronunciation of the persons subjected to it. All of them, however, suggest that EPG may be successfully employed in efforts to assist foreign accent reduction. Gibbon et al. (1991) demonstrated the effectiveness of EPG in teaching the pronunciation of /r/–/l/ to two advanced Japanese learners of English. The learning process took 45 minutes per day for two weeks. Schmidt and Beamer (1998) report on three adult native speakers of Thai who participated in EPG treatment designed to teach contrasts between English /s/–/ʃ/, /t/–/θ/, and /l/–/r/ in 45-min. sessions biweekly for 24 weeks. The subjects received additional articulatory practice without EPG to generalize new motor patterns learned with EPG. All subjects learned to correctly produce the target contrasts. Bright (1999, cit. after Isaacson 2015: 29) used EPG to provide feedback for accent reduction in three adult native speakers of Spanish learning English. The results of a formal accent articulation test showed a marked reduction in accent for selected English consonants. Schmidt (2012) reports on two adult native speakers of Korean participating in electropalatographic treatment designed to teach them contrasts between English /s/–/ʃ/, /z/–/ʤ/, and /l/–/r/. Participants were successful in learning to produce the English contrasts. Perception of English consonants was tested pre- and post-treatment. Post-treatment perception improved somewhat for treated consonants but not for untreated consonants, although perception was not directly trained. Hacking et al. (2017) is the most extensive study on the use of EPG in foreign accent reduction. It reports on ten native speakers of American English learning the Russian palatalized vs. unpalatalized distinction in a 6-week course utilizing EPG. Although not all subjects showed significant
improvements, all showed an increase from pre- to post-training in the second formant frequency of vowels preceding palatalized consonants, thus enhancing their contrast between palatalized and unpalatalized consonants. All pre- and post-training productions were evaluated by native speakers of Russian, whose judgements showed a modest increase in identification accuracy. These results suggest that EPG training can be an effective intervention with adult L2 learners. One of the major problems in all of the above studies is that none of them involves a control group, so it cannot be determined whether the described improvements in production would have been possible without EPG. Without a control group there is also no possibility of random assignment to the groups. All of the above studies also recruited subjects opportunistically, which may result in a self-selection bias. Moreover, Schmidt (2012) and Hacking et al. (2017) used artificial palates with an electrode layout that was not normalised for anatomical between-speaker differences, which may have made the comparison between the learner and target articulations unreliable. Finally, most of the studies tested EPG efficacy on a narrow selection of consonants only in binary oppositions. Other selections might include consonants less amenable to contrastive analysis by means of electropalatography (e.g., velars or bilabials) and for some languages they might include a larger set of related sounds. The advocated RCT study on the efficacy of EPG in L2 pronunciation teaching should thus cover as wide a range of sets of related consonants as possible; it should also train and test their use in a variety of contexts. Ideally, in such a study, anatomically normalised artificial palates should be used and targets presented to the learner should come from multiple native speakers. Subjects for such a study should also be randomly recruited from a population and randomly assigned to control and experimental groups. Both groups should be rigorously matched and checked for accidental bias, i.e., differences between the groups in the distribution of important prognostic factors (e.g., language aptitude, language proficiency). Finally, the pronunciation course designed for the purpose of such an RCT experiment should also facilitate generalisation of patterns learned with EPG to everyday speaking situations (Gibbon & Paterson 2006).
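The random assignment and the check for accidental bias recommended above can be illustrated with a minimal sketch. It is not a procedure taken from any of the cited studies: the participant records, the factor names (`aptitude`, `proficiency`) and the scores are all hypothetical, and the balance measure (absolute difference of group means) is only one simple choice among many.

```python
import random
from statistics import mean


def assign_groups(participants, seed=0):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # fixed seed so the assignment can be reproduced
    shuffled = participants[:]     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def balance_gap(group_a, group_b, factor):
    """Absolute difference between the group means on one prognostic factor."""
    return abs(mean(p[factor] for p in group_a) - mean(p[factor] for p in group_b))


# Hypothetical participant records; all scores are invented for illustration.
participants = [
    {"id": i, "aptitude": 40 + (i * 7) % 30, "proficiency": 50 + (i * 11) % 40}
    for i in range(20)
]

control, experimental = assign_groups(participants, seed=42)
for factor in ("aptitude", "proficiency"):
    print(factor, round(balance_gap(control, experimental, factor), 2))
```

A large gap on a prognostic factor would signal that the two groups differ by chance and that the assignment should be reconsidered, for example by stratifying on that factor before randomising.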
Phonetic Research
Apart from the therapeutic and pedagogical applications, EPG has been used to describe the location and timing of linguapalatal contact that accompanies the articulation of speech sounds. Unlike acoustic data, electropalatographic data may be effectively used to determine the actual spatial and temporal articulatory correlates of those distinctive features of segments in a given language that involve a stricture between the tongue and the palate. EPG has also been successfully applied to the study of
coarticulation and assimilatory processes in over a dozen languages or dialects. The focus of phonetic research on particular languages with the use of EPG (usually combined with acoustic data and, occasionally, with intraoral pressure measurement or electromagnetic articulography) varies. Below we outline the research reported in the last two decades. (For older applications of EPG in phonetic research the reader may refer to, e.g., Baken & Orlikoff 2010: 527.) Kochetov (2018) studied the place and manner contrasts in the majority of the consonants of Japanese. (This is a rare example of research not focused on coarticulation or phonostylistic processes.) Place of articulation (POA) contrasts have also been studied in the coronals of Argentine and Cuban Spanish (Kochetov & Colantoni 2011), whereas durational contrasts, as correlates of voicing contrasts, have been studied in Swiss German word-initial singleton and geminate plosives (Kraehenmann & Lahiri 2008) and in Turkish coronal obstruents (Ünal-Logacev & Fuchs 2015). Spanish and Brazilian Portuguese provided the data for a study of the articulation of the palatal nasal /ɲ/ involving two experiments: one to measure coarticulation with vowels, the other to measure occlusion under different prosodic conditions (Shosted et al. 2012). Typically, studies concentrate either on coarticulation or on the relation between articulation and prosody. The former include, among others, coarticulation of lingual consonants with vowels in Spanish (Fernández Planas 2000), coarticulation of /r/ with vowels in Italian (Spreafico et al. 2015) and coarticulation of voiceless lingual fricatives in Polish (Guzik & Harrington 2007). Palatalisation before high front segments has been specifically targeted in studies on the coronal plosives in French (Corneau 2000), as well as on a set of four coronals ([t, r, ʂ, ʐ]) (Rochoń & Pompino-Marschall 1999) and the bilabial plosives in Polish (Pompino-Marschall & Żygis 2003).
The influence of prosodic factors on the articulation of selected consonants has been investigated, for example, in the case of [c] (as an allophone of /k/), /ɲ/ and /ʎ/ (Recasens & Espinosa 2006) and /ɾ/ vs. /r/ in Catalan (Recasens & Espinosa 2007), the temporal organisation of word-initial consonant clusters in German (Bombien et al. 2007, 2010), /t/ and /n/ in English (Cho & Keating 2009), and /t/ vs. /ʈ/ in Arrernte (Tabain & Beare 2015). Finally, EPG has been used in research on the gradience of phonostylistic phenomena, such as regressive POA assimilation (Ellis & Hardcastle 2002) or /l/-vocalisation in English (Scobbie et al. 2007).
Challenges in Therapeutic and Pedagogical Applications of EPG
Whereas the applications of EPG in phonetic research have provided many valuable insights, the results of studies on the applications of EPG
in speech therapy and foreign accent reduction are not straightforward to interpret and in many respects raise more questions than they answer. Any attempt to provide definitive answers as to the efficacy of EPG should minimise the problems that have affected all EPG applications so far, including:
• All pseudopalates initially cause interference with correct articulation (Baum & McFarland 1997). Subjects should therefore undergo an adaptation period before each training session (Gibbon et al. 1991) and they should wear a practice pseudopalate several days prior to the experiment (Schmidt & Beamer 1998). The type of the pseudopalate selected for the experiment should also be tight-fitted and as thin as possible.
• EPG provides no information about articulatory movements before and after the contact was made (Byrd et al. 1995). Moreover, most EPG systems have no or a limited ability to detect interlabial and velar contact and none is able to visualise nasalisation or voicing. Experimenters should therefore consider combining EPG with other imaging instrumentation such as ultrasound (Stone et al. 1992) or optopalatography (Birkholz & Neuschaefer-Rube 2011).
• A sound acoustically acceptable in a given language can be produced by different arrangements of articulators (Neiberg et al. 2008); EPG patterns are subject to random variation and speakers’ idiosyncrasies. Any impressionistic, and especially automatic, comparisons of the learner and the native patterns should be conducted only if the patterns are obtained from anatomically normalised palates or are normalised algorithmically. All such comparisons should also leave a sufficient margin for unavoidable within- and between-speaker articulatory variability.
• Learners find it difficult to generalise new patterns learned with EPG into everyday speaking situations (Gibbon & Paterson 2006). Therapies and courses that employ EPG should adopt specific strategies to promote generalisation and maintenance of the new patterns.
• The feedback that has often been used is not always easy for L2 speakers to interpret during training, which may limit its effectiveness. Linguistically motivated data (dimensionality) reduction techniques (e.g., Hardcastle et al. 1991a) should be used to visualise the key differences between the student and model articulations, and these differences should be presented immediately after L2 production (Öster 1997: 145).
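To make the idea of data reduction and of a comparison with a tolerance margin more concrete, the following is a minimal sketch under stated assumptions. It does not reproduce any published EPG index or the method of any cited study: a frame is assumed to be a small binary grid (1 = contact) already normalised across speakers, the row-wise profile is just one simple reduction, and the margin of 0.15 is an arbitrary illustrative value.

```python
def row_profile(frame):
    """Proportion of contacted electrodes in each row of a binary frame."""
    return [sum(row) / len(row) for row in frame]


def profile_distance(frame_a, frame_b):
    """Mean absolute difference between two row-wise contact profiles."""
    profile_a, profile_b = row_profile(frame_a), row_profile(frame_b)
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


def within_margin(learner, target, margin=0.15):
    """True if the learner's pattern is acceptably close to the target."""
    return profile_distance(learner, target) <= margin


# Hypothetical 4 x 4 frames (front row first); real palates carry more electrodes.
target = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
learner = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(profile_distance(learner, target))
print(within_margin(learner, target))
```

Reducing each frame to a short profile keeps the feedback easy to read, while the explicit margin leaves room for the within- and between-speaker variability mentioned above.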
Conclusions
Over the past 50 years, the use of electropalatography has been reported in close to 600 papers ranging over a wide variety of topics in general phonetics and speech-language pathology. Although many of the latter suggest
the potential effectiveness of EPG, the evidence supporting such efficacy is not strong as, to our knowledge, only one of them was a randomised controlled trial. The scarcity of such evidence, combined with the high cost of manufacturing the artificial palates, the inconvenience of wearing them and the inherent drawbacks of the technology, all adversely affect the popularity of electropalatography as a source of visual feedback in therapy. The uses of EPG as a source of visual feedback are particularly scarce in foreign accent reduction, where the benefit-cost ratio seems even less favourable than in speech therapy. In our chapter, we have provided an overview of available studies on foreign accent reduction and of the most influential studies from the past two decades that involved the use of EPG as a source of articulatory data for phonetic research. We have also highlighted some methodological challenges that should be addressed in a well-designed randomised controlled study testing the effectiveness of EPG in the area of speech therapy and foreign accent reduction.
References
Alwan, A. A., S. S. Narayanan & K. Haker. 1997. Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data, Part II: The rhotics. Journal of the Acoustical Society of America 101. 1078–1089.
Baken, R. & F. Orlikoff. (2010) 2000. Clinical measurement of speech and voice, 2nd edition. San Diego: Singular Publishing.
Baum, S. & D. McFarland. 1997. The development of speech adaptation to an artificial palate. Journal of the Acoustical Society of America 102. 2353–2359.
Birkholz, P. & C. Neuschaefer-Rube. 2011. Combined optical distance sensing & electropalatography to measure articulation. In Proc. of the Interspeech 2011, 285–288. Florence, Italy.
Bombien, L., C. Mooshammer, P. Hoole & B. Kühnert. 2010. Prosodic and segmental effects on EPG contact patterns of word-initial German clusters. Journal of Phonetics 38. 388–403.
Bombien, L., C. Mooshammer, P. Hoole, T. Rathcke & B. Kühnert. 2007. Articulatory strengthening in initial German /kl/ Clusters under prosodic variation. In J. Trouvain & W. J. Barry (eds.), Proceedings of the 16th international congress of the ICPhS, 457–460. Saarbrücken: Universität des Saarlandes.
Bright, A. 1999. The palatometer as an instrument for accent reduction therapy with three native ESL Spanish speakers. Unpublished master’s thesis. Provo, UT: Brigham Young University.
Burns, P. B., R. J. Rohrich & K. C. Chung. 2011. The levels of evidence and their role in evidence-based medicine. Plastic and Reconstructive Surgery 128(1). 305–310.
Byrd, D., E. Flemming, C. A. Mueller & C. C. Tan. 1995. Using regions and indices in EPG data reduction. Journal of Speech and Hearing Research 38. 821–827.
Cho, T. & P. Keating. 2009. Effects of initial position versus prominence in English. Journal of Phonetics 37. 466–485.
Corneau, C. 2000. An EPG study of palatalization in French: Cross-dialect and inter-subject variation. Language Variation and Change 12. 25–49.
Destombes, F. 1993. The development and application of the IBM speech viewer. In A. Brekelmans, A. G. Ben & F. C. Elsendoorn (eds.), Interactive learning technology for the deaf, 187–198. Berlin: Springer.
Ellis, L. & W. J. Hardcastle. 2002. Categorical and gradient properties of assimilation in alveolar to velar sequences: Evidence from EPG and EMA data. Journal of Phonetics 30. 373–396.
Eskenazi, M. 2009. An overview of spoken language technology for education. Speech Communication 51(10). 832–844.
Fernández Planas, A. M. 2000. Estudio electropalatográfico de la coarticulación vocálica en estructuras VCV en castellano. Doctoral dissertation. Barcelona: University of Barcelona.
Flege, J. E. 1988. Using visual information to train foreign language vowel production. Language Learning 38(3). 365–407.
Flege, J. E. 1995. Second language speech learning theory, findings, and problems. In W. Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 233–277. Timonium, MD: York Press.
Gibbon, F. E., B. Hardcastle & H. Suzuki. 1991. An electropalatographic study of the /r/, /l/ distinction for Japanese learners of English. Computer Assisted Language Learning 4(3). 153–171.
Gibbon, F. E. & L. Paterson. 2006. A survey of speech and language therapists’ views on electropalatography therapy outcomes in Scotland. Child Language Teaching and Therapy 22(3). 275–292.
Gick, B., B. M. Bernhardt, P. Bacsfalvi & I. Wilson. 2008. Ultrasound imaging applications in second language acquisition. In J. G. Hansen Edwards & M. L. Zampini (eds.), Phonology and second language acquisition, 309–322. Amsterdam: John Benjamins.
Guzik, K. M. & J. Harrington. 2007. The quantification of place of articulation assimilation in electropalatographic data using the similarity index (SI). Advances in Speech-Language Pathology 9(1). 109–119.
Hacking, J. F., B. L. Smith & E. M. Johnson. 2017. Utilizing electropalatography to train palatalized vs. unpalatalized consonant productions by native speakers of American English learning Russian. Conference presentation: New Sounds 2016, Aarhus.
Hardcastle, W. J., F. E. Gibbon & K. Nicolaidis. 1991a. EPG data reduction methods and their implications for studies of lingual coarticulation. Journal of Phonetics 19. 251–266.
Hardcastle, W. J., F. E. Gibbon & W. Jones. 1991b. Visual display of tongue-palate contact: Electropalatography in the assessment & remediation of speech disorders. British Journal of Disorders of Communication 26. 41–74.
Isaacson, L. D. 2015. Establishing normative data for contact patterns of fricative production by native German speakers: An electropalatography study. Unpublished MA Thesis. Provo, UT: Brigham Young University.
Kochetov, A. 2018. Linguopalatal contact contrasts in the production of Japanese consonants: Electropalatographic data from five speakers. Acoustical Science and Technology 39(2). 84–91.
Kochetov, A. & L. Colantoni. 2011. Coronal place contrasts in Argentine and Cuban Spanish: An electropalatographic study. Journal of the International Phonetic Association 41. 313–342.
Kraehenmann, A. & A. Lahiri. 2008. Duration differences in the articulation and acoustics of Swiss German word-initial geminate and singleton stops. The Journal of the Acoustical Society of America 23(6). 4446–4455.
Levitt, J. S. & W. F. Katz. 2010. The effect of EMA-based augmented visual feedback on the English speakers’ acquisition of the Japanese flap. ISCA Proceedings of Interspeech, Makuhari. 1862–1865.
Neiberg, D., G. Ananthakrishnan & O. Engwall. 2008. The acoustic to articulation mapping: Non-linear or non-unique? Proceedings of Interspeech. 1485–1488.
Öster, A.-M. 1997. Auditory and visual feedback in spoken L2 teaching. Reports from the Department of Phonetics, Umeå University, PHONUM 4. 145–148.
Pompino-Marschall, B. & M. Żygis. 2003. Surface palatalization of Polish bilabial stops: Articulation and acoustics. Proceedings of the 15th international congress of phonetic sciences, 1751–1754. Barcelona: Causal Productions.
Recasens, D. & A. Espinosa. 2006. Articulatory, positional and contextual characteristics of palatal consonants: Evidence from Majorcan Catalan. Journal of Phonetics 34. 295–318.
Recasens, D. & A. Espinosa. 2007. Phonetic typology and positional allophones for Alveolar Rhotics in Catalan. Phonetica 64. 1–28.
Rochoń, M. & B. Pompino-Marschall. 1999. The articulation of secondarily palatalized coronals in Polish. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville & A. C. Bailey (eds.), Proceedings from 14th international congress of phonetic sciences, 1897–1900. American Institute of Physics.
Schmidt, A. M. 2012. Effects of EPG treatment for English consonant contrasts on L2 perception and production. Clinical Linguistics and Phonetics 26(11–12). 909–925.
Schmidt, A. M. & J. Beamer. 1998. Electropalatography treatment for training Thai speakers of English. Clinical Linguistics and Phonetics 12(5). 389–403.
Scobbie, J. M., M. Pouplier & A. A. Wrench. 2007. Conditioning factors in external sandhi: An EPG study of English /l/ vocalisation. In J. Trouvain & W. J. Barry (eds.), Proceedings of the 16th international congress of the ICPhS, 441–444. Saarbrücken: Universität des Saarlandes.
Scottish Intercollegiate Guidelines Network. 2010. www.sign.ac.uk/pdf/sign118.pdf, after E.-A. Efstratiadou. 2018. Investigation of different therapy approaches for aphasia in the Greek language. Unpublished PhD Thesis.
Shosted, R., J. I. Hualde & D. Scarpace. 2012. Palatal complexity revisited: An electropalatographic analysis of /ɲ/ in Brazilian Portuguese with comparison to Peninsular Spanish. Language and Speech 55(4). 477–502.
Spreafico, L., C. Celata, A. Vietti, C. Bertini & I. Ricci. 2015. An EPG+UTI study of Italian /r/. Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: The University of Glasgow.
Stone, M., A. Faber, L. Raphael & T. Shawker. 1992. Cross-sectional tongue shape and linguopalatal contact patterns in [s], [ʃ], and [l]. Journal of Phonetics 20(2). 253–270.
Tabain, M. & R. Beare. 2015. An EPG and EMA study of apicals in stressed and unstressed position in Arrernte. Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: The University of Glasgow.
Ünal-Logacev, Ö. & S. Fuchs. 2015. Voicing contrast in Turkish: Simultaneous measurements of acoustics, EPG and intraoral pressure. Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: The University of Glasgow.
Underhill, A. 1996. Making pronunciation work for your learners. Language Teacher Online 20(9).
Wood, S. E., C. Timmins, J. Wishart, W. J. Hardcastle & J. Cleland. 2019. Use of electropalatography in the treatment of speech disorders in children with Down syndrome: A randomized controlled trial. International Journal of Language & Communication Disorders 54(2). 234–248.
Wrembel, M. 2001. Innovative approaches to the teaching of practical phonetics. Proceedings of the Phonetics Teaching and Learning Conference PTLC2001, UCL, London. 63–66.
Wrembel, M. 2011. Cross-modal reinforcements in phonetics teaching and learning: An overview of innovative trends in pronunciation pedagogy. In W.-S. Lee & E. Zee (eds.), Proceedings of the 17th ICPhS, 104–107. Hong Kong: City University of Hong Kong.
Wrench, A. 2011. Electropalatography clinical review. Document EPG. Articulate Instruments Ltd. www.articulateinstruments.com/wordpress/wp-content/uploads/2013/02/Electropalatography-Clinical-Review.pdf
20 Polish Two-Consonant Clusters
A Study in Native Speakers’ Phonotactic Intuitions1
Jolanta Szpyra-Kozłowska and Paulina Zydorowicz
1. Introduction
Polish is well-known as a language which abounds in consonant clusters rarely found in other languages. Thus, as often pointed out, there can be up to four consonants in word-initial position, e.g., [pstr] in pstry ‘pied’ and up to five consonants in word-final position, e.g., [mpstf] in przestępstw ‘crime, gen. pl.’. The clustering possibilities of Polish consonants are epitomized in the famous tongue twister with many consonantal sequences, coronal fricatives and affricates in particular, which Poles proudly attempt to teach foreign learners: W Szczebrzeszynie chrząszcz brzmi w trzcinie [f ʂt͡ʂɛbʐɛʂɨɲɛ xʂow̃ʂt͡ʂ bʐmʲi f tʂʨiɲɛ] ‘In Szczebrzeszyn, a beetle buzzes in the reed’. Such heavy clusters pose a serious challenge not only for language learners, but also for an adequate account of Polish phonotactics and for different theories of this important aspect of phonology. The issue in question has, understandably, attracted the attention of many linguists, who have devoted numerous studies to it. Thus, Polish clusters were examined in several older publications, for example, Bargiełówna (1950), Leszczyński (1969), Rocławski (1976), Dunaj (1985), Dobrogowska (1992), as well as in more recent ones, e.g., by Śledziński (2010). The authors are concerned with various aspects of clusters: diachronic changes, their synchronic status and structure both in Standard Polish and in local dialects. Apart from older structural analyses, a variety of newer, theory-based approaches to Polish consonant sequences have been offered: within generative models, in reference to syllable structure (e.g., Bethin 1992; Rubach & Booij 1990; Szpyra 1995), in Government Phonology (e.g., Cyran & Gussmann 1999) and Optimality Theory (Rochoń 2000).
In recent years, the problem of phonotactics and consonant clusters in particular has been brought to attention again due to the studies of Katarzyna Dziubalska-Kołaczyk, who developed a measure of cluster preferability called Net Auditory Distance (NAD; e.g., Dziubalska-Kołaczyk 2009, 2014) within the framework of Natural Phonology (Stampe 1979; Donegan & Stampe 1979; Dressler 1984). The author
and her collaborators tested the role of the NAD principle in different areas of external evidence such as corpus-based research, first and second language acquisition and diachronic change (e.g., Dziubalska-Kołaczyk 2014; Marecka & Dziubalska-Kołaczyk 2014; Zydorowicz et al. 2016; Zydorowicz & Orzechowska 2017). In this chapter, we address the issue of Polish two-consonant clusters, both word-initial and word-final, from the perspective of native speakers’ tolerance and acceptance of novel items with such sequences. We report on an experiment in which 50 Polish students were asked to express acceptability judgements concerning 80 monosyllabic nonwords, 68 of which contain two-consonant clusters of various segmental structure, with a view to finding answers to the following research questions:
• What are the participants’ acceptability judgements of nonwords with different two-consonant clusters in word-initial and word-final positions? Which of them are acceptable and which unacceptable as potential Polish words?
• How can the experimental results be accounted for? Can it be done in terms of:
a. the Sonority Sequencing Generalization?
b. the Net Auditory Distance?
c. cluster frequency?
• Which of these approaches is most adequate in dealing with the experimental material?
Thus, the chapter sets itself two major goals: empirical and theoretical. To our knowledge, this is the first study devoted to examining Poles’ acceptability ratings with reference to phonotactic patterns.2 The chapter is structured as follows. Section 2 discusses several important issues concerning consonant cluster phonotactics and nonword acceptability judgements. Section 3 presents the details of the experimental design and the obtained results. Next, in Section 4, the collected data are analysed in terms of the Sonority Sequencing Generalization and, in Section 5, from the perspective of NAD. Cluster frequency is considered as a possible determinant of nonword acceptability judgements in Section 6. Finally, conclusions are drawn in Section 7.
2. Selected Approaches to Phonotactic Knowledge
Phonotactics is often claimed to constitute the core of phonology. Trask (1996: 277) defines it as “a set of constraints on the possible sequences of consonant and vowel phonemes within a word, a morpheme or a syllable.” Due to such restrictions many sequences of segments, known as systematic gaps, are ruled out by a given language. For instance, no
English word begins with /sr/, /zd/ or /lp/. However, not all combinations of segments which are in agreement with the phonotactic rules are actually attested in a language; those that are not constitute the so-called accidental gaps. According to Trask (1996: 5), an accidental gap is “a possible word which is phonologically well-formed in every respect but which happens not to exist, such as /blik/ in English.” Accidental and systematic gaps jointly form the so-called nonwords, pseudowords, nonce words or nonsense words, as they do not correspond to real words. Thus, the following categories of items can be isolated (illustrated with English examples):
• with sequences which are permitted and attested, i.e., existing words (e.g., brick)3
• with sequences which are permitted, but unattested, i.e., accidental gaps (e.g., blick)
• with sequences which are unpermitted and unattested, i.e., systematic gaps (e.g., bnick).4
In our chapter, the focus is on the second category of nonwords, i.e., forms with word-initial and word-final two-consonant clusters which are found in the Polish lexicon, but, with the remaining segments, constitute accidental gaps. Since such sequences are attested in actual words, we might expect them all to be acceptable to native speakers. As the experimental results in Section 3.4 demonstrate, however, this is not the case and among the experimental items we find both acceptable and unacceptable nonwords. It is therefore crucial at this point to present the dominant views on the mechanisms behind native speakers’ acceptability judgements of nonce words. In studies concerning this issue, two major types of approaches can be isolated: lexical and grammatical. According to the first of them (e.g., Greenberg & Jenkins 1964; Vitevitch & Luce 1999, 2004), native speakers’ acceptability judgements are related to the similarity between a given string and an actual word or words, i.e., they are based on the raters’ access to the lexicon. The degree of phonological similarity between a nonword and real words is known as phonological neighbourhood density; the more items of a similar phonological structure are found in the language, the denser a string’s phonological neighbourhood. Nonwords which have many real word neighbours are likely to be more acceptable than those with a sparse phonological neighbourhood. According to the second approach (e.g., Clements & Keyser 1983; Hammond 1999), the factor that determines the acceptability of a nonword is the native speakers’ grammatical knowledge, i.e., their knowledge of phonotactic well-formedness conditions. Such constraints are claimed to be part of their linguistic competence and are responsible for their judgements. Phonotactic constraints are usually formulated in
reference to syllable structure5 and make crucial use of the sonority relations within and between syllable constituents. The sonority relations in a syllable are claimed to be reflected by the cross-linguistic Sonority Sequencing Generalization (SSG), according to which (Selkirk 1982) sonority increases towards the nucleus and decreases away from it. Thus, sequences of obstruents and sonorants are usually claimed to be well-formed syllable onsets, but not codas, whereas their reversals, i.e., clusters of sonorants and obstruents are licit codas, but not onsets.6 The most widely used sonority hierarchy, listed here from the least to the most sonorous class, is the following:7
stops (plosives and affricates) > fricatives > nasals > liquids > glides > vowels
It should be added that while the SSG is valid for many languages, numerous counterexamples to it, for instance in English and Polish, can be found. It thus expresses a cross-linguistic tendency rather than a rigid surface-true constraint. As often noted, however, both the lexical and grammatical approaches to nonword acceptability suffer from a serious drawback since they cannot account for gradient judgements which characterize phonotactic intuitions. Put differently, nonwords are often judged not categorically as well-formed or ill-formed, but as more acceptable or less acceptable. To account for this fact, Treiman (1988) argues that native speakers are sensitive to the frequency of sound sequences found in a given position. For instance, in her study the participants rated the higher frequency rhymes as better than the low frequency rhymes. The issue of strings’ frequency has been termed phonotactic probability.8 It has been formalized in several probabilistic models of phonotactics (e.g., Coleman & Pierrehumbert 1997; Bailey & Hahn 1998; Frisch et al. 2000; and the maximum entropy model of Hayes & Wilson 2008), which, for reasons of space, will not be discussed here.
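The Sonority Sequencing Generalization and the hierarchy given above can be sketched as a simple check on two-consonant clusters. This is an illustration only: the numeric ranks merely encode the order of the hierarchy, the small segment inventory is a hypothetical simplification using one symbol per segment, and treating sonority plateaus (e.g., two stops) as violations is an assumption rather than a claim made in the chapter.

```python
# Ranks follow the hierarchy in the text: stops lowest, vowels highest.
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5, "vowel": 6}

# Hypothetical, simplified inventory: one symbol per segment.
SEGMENT_CLASS = {
    "p": "stop", "t": "stop", "k": "stop", "b": "stop", "d": "stop", "g": "stop",
    "f": "fricative", "s": "fricative", "z": "fricative", "x": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
    "j": "glide", "w": "glide",
}


def sonority_profile(cluster):
    """Map each segment of a cluster to its sonority rank."""
    return [SONORITY[SEGMENT_CLASS[segment]] for segment in cluster]


def onset_obeys_ssg(cluster):
    """In an onset, sonority must rise strictly towards the nucleus."""
    profile = sonority_profile(cluster)
    return all(a < b for a, b in zip(profile, profile[1:]))


def coda_obeys_ssg(cluster):
    """In a coda, sonority must fall strictly away from the nucleus."""
    profile = sonority_profile(cluster)
    return all(a > b for a, b in zip(profile, profile[1:]))


for cluster in ("pr", "rt", "st"):
    print(cluster, onset_obeys_ssg(cluster), coda_obeys_ssg(cluster))
```

Under these assumptions an obstruent + sonorant sequence such as "pr" passes as an onset but not as a coda, while "st" fails as an onset even though English has such onsets, which illustrates why the SSG is best read as a tendency rather than a surface-true constraint.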
To sum up, three major factors have been claimed to affect native speakers’ nonword acceptability judgements: their knowledge of phonotactic constraints, phonological similarity between nonwords and real words, particularly as defined by the concept of neighbourhood density, and the frequency with which various phonological units occur in a given language. It is important to add that all the studies on phonotactics reported in this section are concerned with English, whose phonological grammar is fairly simple when compared to that of Polish with its rich variety of consonant clusters. Therefore, empirical verification of many theoretical claims against languages other than English is in order. In this chapter, the experimental data will be examined in terms of selected possible determinants of the participants’ wordlikeness judgements, i.e., the Sonority Sequencing Generalization (Section 4), Dziubalska-Kołaczyk’s modified approach to sonority expressed by NAD (Section 5) and cluster frequency (Section 6).
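Neighbourhood density, one of the three factors listed above, is often operationalized as the number of real words that differ from a string by exactly one segment (one substitution, deletion or addition). The sketch below assumes that definition, which is not spelled out in this chapter, and uses a tiny hypothetical lexicon in which each character stands for one segment; the function names are illustrative.

```python
def is_neighbour(a, b):
    """True if b differs from a by exactly one segment
    (one substitution, one deletion or one addition)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one substituted segment.
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Different length: deleting one segment from the longer
    # string must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighbourhood_density(nonword, lexicon):
    """Number of lexicon entries that are one segment away."""
    return sum(is_neighbour(nonword, word) for word in lexicon)


# Toy lexicon in broad transcription (hypothetical selection).
lexicon = ["liɕʨ", "puɕʨ", "las"]
print(neighbourhood_density("luɕʨ", lexicon))  # 2
```

On this toy lexicon the nonword luść has two neighbours, liść and puść, the minimal pairs mentioned in note 11.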
Jolanta Szpyra-Kozłowska and Paulina Zydorowicz
3. The Experiment
In this section we present the goals of the experiment, the experimental design, the stimuli, the participants, the adopted procedure and the obtained results.
3.1. Goals
The major goal of the experiment was to examine Polish native speakers’ judgements concerning the wordlikeness of a set of nonwords with two consonants in either word-initial or word-final position. More specifically, we intended to find out which sequences of consonants, divided into sets of two obstruents, two sonorants, an obstruent + a sonorant and a sonorant + an obstruent, placed in word-initial and word-final position, are judged as (a) acceptable, (b) unacceptable or (c) located between these two extremes.
3.2. The Stimuli
The stimuli were a set of 80 monosyllabic nonwords with a large variety of two-consonant clusters attested in Polish words and taken from Zydorowicz et al. (2016). Thus, they are all accidental gaps. Sixty-eight forms constituted the core experimental material, with 33 items containing two initial consonants and 35 items two final consonants. They represent both frequent and infrequent sequences of consonants found in the two positions. Twelve items were included as fillers (distracters): six had the CVC structure, five included three-consonant clusters and one contained two two-consonant clusters (a full list of the stimuli is provided in Appendix 1). The core nonwords used in our study were all monosyllabic (either of CCVC or CVCC structure) to exclude the potential impact of length and stress placement. Only closed syllables were employed to avoid a possible distinction in the evaluation of items ending in vowels and in consonants. In order not to deal with morphotactic constraints, the stimuli were monomorphemic. They were presented to the participants in an orthographic form. The clusters found in the nonwords are listed in Table 20.1.
3.3. The Experimental Procedure
The experiment took place in October 2018.
Fifty volunteer first-year students of English, both male and female, at Maria Curie-Skłodowska University in Lublin were asked to complete a questionnaire which contained 80 nonwords and to decide whether the provided items could, in their opinion, become part of the Polish lexicon. They were given five options to choose from, which were later assigned numerical values: definitely yes (2 points), rather yes (1), rather not (−1), definitely not (−2) and difficult to say (0 points). The participants were told that the experiment was a study of their linguistic intuitions and were asked not to consult one another. No time limit was imposed, but all the students completed their task within 15–20 minutes. After the task, an informal discussion followed in which the participants were asked about the criteria they employed in making their decisions.
Table 20.1 Consonant Clusters Found in the Stimuli
Cluster type | Initial CC clusters (33) | Final CC clusters (35)
obstruent + sonorant | pr, dr, ʂr, t͡ʂw, pl, xl, ɡw, dn, ʐl, vɲ, kl, ʐm, xlʲ, ʨm | ɕɲ, kl, kr, fl, ʥm, pɲ, bl, pr, dm
obstruent + obstruent | st, sk, tʂ, pʂ, ɕp, ft, t͡ʂf, ʣb, sp, db, tk, zb, vʑ | ɕʨ, sk, ks, ft, xʨ, kt, ʂt͡ʂ, xt, tʂ, pt
sonorant + obstruent | wz, rv, lʐ, rʣ | jt͡ʂ, wʦ, lʦ, ms, wʂ, jp, lʂ, jf, lp, nf, mf
sonorant + sonorant | mn, ml | rl, lm, mn, ml, lɲ
3.4. Results
In this section, we present the experimental results starting with some general observations and proceeding with relevant details concerning the obtained data.
3.4.1. General Observations
Fifty participants evaluated 80 nonwords, which yielded 4000 tokens. Among them there were 380 (9.5%) ‘difficult to say’ responses indicating the raters’ problems with regard to some of the experimental items.9 As mentioned in Section 2, all the stimuli constitute accidental gaps: they contain sound sequences attested in real Polish words and, as such, should be acceptable to the students. Nevertheless, the obtained results indicate that many items were rejected as candidates for Polish words. This shows that ‘attested’ cannot always be equated with ‘acceptable.’ Moreover, the participants’ judgements were usually scalar in nature, which means that the respondents did not restrict themselves to the definitely acceptable and definitely unacceptable answers, but made use of the whole scale provided by the experimenters. This is in agreement with the results of other studies which emphasize the noncategorical nature of phonotactic intuitions (e.g., Treiman 1988). It should be pointed out, however, that the participants’ responses show a considerable lack of uniformity and differences of opinion. Thus, many items were judged as fully acceptable by some respondents and as completely unacceptable by others. This means that either the respondents differ in terms of their phonotactic intuitions or that in their decisions they were guided by different factors.
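The scoring scheme described in Section 3.3 can be illustrated with a short Python sketch. The point values and the zero threshold come from the text; the function names, the sample responses and the label for a mean of exactly 0 are assumptions made here for illustration.

```python
from statistics import mean

# Point values assigned to the five response options (Section 3.3).
SCORES = {
    "definitely yes": 2,
    "rather yes": 1,
    "difficult to say": 0,
    "rather not": -1,
    "definitely not": -2,
}


def mean_rating(responses):
    """Mean score of one nonword across all raters."""
    return mean([SCORES[r] for r in responses])


def classify(mean_value):
    """Acceptable above 0, unacceptable below 0.

    A mean of exactly 0 is labelled 'undecided' here; that label
    is an assumption of this sketch.
    """
    if mean_value > 0:
        return "acceptable"
    if mean_value < 0:
        return "unacceptable"
    return "undecided"


# Hypothetical responses of four raters to one nonword.
sample = ["definitely yes", "rather yes", "rather not", "difficult to say"]
print(mean_rating(sample), classify(mean_rating(sample)))  # 0.5 acceptable
```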
3.4.2. Acceptability of Nonwords With Initial and Final CC Clusters
Table 20.2 provides the results for the nonwords with the initial CC clusters, dividing them into acceptable (mean value above 0 points) and unacceptable (mean value below 0 points). They are listed in order of decreasing mean value. Of 33 items with the initial CC clusters, according to the participants, 16 were acceptable (positive mean values) and 17 were unacceptable (negative mean values).10 Four items were judged as most acceptable, with the mean values between 1 and 2; they contain the following sequences: /pr, dr, ʂr/ and /st/. It is striking that the first three comprise clusters of obstruents followed by the trill. The next group of highly acceptable nonwords with the initial clusters, with the mean values between 0.7 and 0.9, is less homogeneous and includes sequences of obstruents (/sk, tʂ, pʂ/) as well as obstruents followed by sonorants (/t͡ʂw, pl/). The forms whose mean acceptability ranges from 0.12 to 0.48 comprise clusters of obstruents and sonorants (/kl, ʐm, ʨm, xlʲ/), of two obstruents (/zb, vʑ/) and one sequence of two sonorants (/ml/). The nonwords unacceptable for the majority of the participants include sequences of obstruents (/ɕp, t͡ʂf, ʣb, sp, ft, db, tk/), of obstruents and sonorants (/xl, ɡw, dn, ʐl, vɲ/), of two sonorants (/mn/) and sonorants followed by an obstruent (/wz, rv, lʐ, rʣ/). The most harshly evaluated forms, with the mean below −1 point, start with /tk, rʣ/ and /mn/.
Table 20.2 Acceptable and Unacceptable Nonwords With the Initial CC Clusters
Acceptable nonwords with initial CC clusters (16) | pr (1.24), dr (1.24), ʂr (1.18), st (1.14), t͡ʂw (0.9), sk (0.9), tʂ (0.78), pʂ (0.76), pl (0.7), zb (0.48), kl (0.44), ʐm (0.4), ml (0.38), vʑ (0.38), ʨm (0.28), xlʲ (0.12)
Unacceptable nonwords with initial CC clusters (17) | xl (−0.08), wz (−0.01), ɡw (−0.1), dn (−0.14), rv (−0.2), ɕp (−0.2), ʐl (−0.68), ft (−0.72), t͡ʂf (−0.76), ʣb (−0.78), sp (−0.8), lʐ (−0.8), db (−0.84), vɲ (−0.84), tk (−1.08), rʣ (−1.14), mn (−1.34)
Let us now present the results concerning the participants’ acceptability ratings of nonwords with the final consonant clusters, summarized in Table 20.3. As before, they are listed in the order of decreasing mean values with the dividing line drawn at the mean of 0 points. The respondents viewed 13 stimuli with the final clusters as acceptable and as many as 21 as unacceptable. One item with the /ms/ sequence received 0 points and will not be included in the subsequent analyses. The most accepted form is luść with the final /ɕʨ/ cluster, whose mean value is considerably higher than that of the remaining nonwords.11 The forms with the mean values between 0.7 and 0.12 comprise sonorants and obstruents (/jt͡ʂ, wʦ, lʦ/), two obstruents (/sk, ks, ft, xʨ, kt/) and sequences of an obstruent + a sonorant (/ɕɲ, kl/). Among the negatively evaluated nonwords, the ones with /fl, wʂ, xt, ʥm, jp/ are fairly close to the acceptability threshold, while the items with /nf, pt, pr, mf, lm, mn, ml, dm, lɲ/ are viewed as highly unacceptable. The remaining ones are located between these extremes. What is striking is the fact that in all these subgroups, we find various combinations of consonants in terms of the sonority relations between them.
Table 20.3 Acceptable and Unacceptable Nonwords With Final Clusters
Acceptable nonwords with final CC clusters (13) | ɕʨ (1.3), ɕɲ (0.7), sk (0.6), kl (0.48), jt͡ʂ (0.38), ks (0.36), wʦ (0.32), ft (0.3), xʨ (0.28), kt (0.22), lʦ (0.12), kr (0.06), ʂt͡ʂ (0.06)
Unacceptable nonwords with final CC clusters (21) | fl (−0.02), wʂ (−0.22), xt (−0.32), ʥm (−0.4), jp (−0.42), pɲ (−0.48), bl (−0.52), rl (−0.68), lʂ (−0.76), jf (−0.84), lp (−0.88), tʂ (−0.86), nf (−0.96), pt (−0.98), pr (−0.98), mf (−1), lm (−1.02), mn (−1.02), ml (−1.02), dm (−1.06), lɲ (−1.16)
The results presented in this section demonstrate that while almost the same number of stimuli with the initial CC clusters were considered acceptable and unacceptable, in the case of nonwords with the final consonant sequences the participants’ degree of tolerance was smaller, as shown in the acceptance of only 13 items and the rejection of 21 out of 35.
4. Experimental Results Versus the SSG
Let us examine the obtained results in terms of their adherence to and violation of the SSG. We employ here the simplest version of the SSG, according to which there should be a rise in the sonority level in the onset and a sonority fall in the coda,12 with no requirement of the minimal sonority distance between the consonants in the initial clusters (Harris 1983; Clements 1990). We examine whether acceptable nonwords contain the clusters which follow the SSG and whether the unacceptable ones contain sequences that violate it. Table 20.4 divides the employed CC clusters according to this principle.
Table 20.4 Acceptable Nonwords With CC Clusters and the SSG
Acceptable | Nonwords with initial CC clusters (16) | Nonwords with final CC clusters (13)
Following SSG | pr, dr, ʂr, t͡ʂw, tʂ, pʂ, pl, kl, ʐm, ml, ʨm, xlʲ (12) | ɕʨ, sk, jt͡ʂ, wʦ, ft, xʨ, lʦ, ʂt͡ʂ (8)
Violating SSG | st, sk, zb, vʑ (4) | ɕɲ, ks, kt, kr, kl (5)
Among the acceptable nonwords with the initial clusters 12 (75%) are in agreement with the SSG, while 4 (25%) violate this principle. It should be noted, however, that three of the latter four clusters are dental fricative + plosive sequences, common in word-initial position in Indo-European languages, which cannot be regarded as grave violations of the SSG. The last cluster represents level sonority. The majority (8, i.e., 61.5%) of the acceptable items with the final clusters adhere to the SSG and 38.5% do not. The offending cases include three sequences of an obstruent + a sonorant (/ɕɲ, kr, kl/) and two sequences of two obstruents (/ks, kt/).
Table 20.5 offers the relevant data on the unacceptable nonwords with the CC clusters presented from the perspective of the SSG. Out of 17 unacceptable nonwords with the initial clusters, 11 (65%) are violations of the SSG and, as such, are predicted by this principle while 6 (35%) are rejected in spite of their well-formedness in terms of the SSG. The latter fact comes as a surprise as five of these clusters represent a sharp sonority rise, desirable in this position. In the case of the unacceptable items with the final clusters, 11 (52%) are predicted by the sonority requirements in the coda, but 10 (48%) are well-formed with regard to the SSG and yet are not accepted. To sum up, our analysis of the experimental results in terms of the SSG demonstrates that this principle is capable of predicting the shape of 70% of the nonwords with the initial clusters and 57% of the items with the final CC sequences. These data show that higher adherence to the SSG is required word-initially than word-finally.
In other words, the participants display greater tolerance to violations of the SSG in word codas than onsets. Nevertheless, when we add the figures concerning both cases, the overall percentage of clusters in agreement with the SSG amounts to 63.5% while 36.5% of them fail to conform to it. This means that the SSG cannot be viewed as the only factor responsible for the respondents’ choices. In the following section we intend to find out whether a modified approach to sonority, i.e., the concept of NAD, fares better in accounting for the experimental results.
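The summary figures for the SSG appear to be unweighted means of the percentages for acceptable and unacceptable items, rounded to whole percentages. This is our reading of the arithmetic rather than a procedure stated in the text. The sketch below recomputes them from the counts in Tables 20.4 and 20.5; all names are illustrative.

```python
# (correct predictions, total items) per position and judgement,
# copied from Tables 20.4 and 20.5. A prediction counts as correct
# when an acceptable item follows the SSG or an unacceptable item
# violates it.
COUNTS = {
    "initial": {"acceptable": (12, 16), "unacceptable": (11, 17)},
    "final": {"acceptable": (8, 13), "unacceptable": (11, 21)},
}


def percentage(hits, total):
    """Share of correctly predicted items, in percent."""
    return 100 * hits / total


def position_score(position):
    """Unweighted mean of the two percentages for one word position."""
    rates = [percentage(*COUNTS[position][kind])
             for kind in ("acceptable", "unacceptable")]
    return sum(rates) / len(rates)


initial = round(position_score("initial"))
final = round(position_score("final"))
overall = (initial + final) / 2
print(initial, final, overall)  # 70 57 63.5
```

Note that a mean weighted by the number of items would give slightly different values, so the choice of averaging method matters when these figures are compared with other measures.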
Table 20.5 Unacceptable Nonwords With CC Clusters and the SSG
Unacceptable | Nonwords with initial CC clusters (17) | Nonwords with final CC clusters (21)
Following SSG | xl, ɡw, dn, ʐl, t͡ʂf, vɲ (6) | jp, lʂ, jf, lp, nf, xt, mf, lm, lɲ, wʂ (10)
Violating SSG | wz, rv, ɕp, ft, ʣb, lʐ, db, tk, rʣ, mn, sp (11) | fl, ʥm, pɲ, bl, rl, tʂ, pt, pr, mn, ml, dm (11)
5. Experimental Results Versus NAD
The present section examines the data presented in Section 3 from the perspective of NAD. First, in 5.1 this concept is clarified. Next, in 5.2, NAD is applied to the experimental results with a view to checking whether there is a correspondence between acceptable and preferred clusters as well as between unacceptable and dispreferred consonant sequences.
5.1. The Concept of NAD
The Net Auditory Distance principle (henceforth NAD) is a measure of cluster goodness proposed by Dziubalska-Kołaczyk (2009). This principle stems from the Beats-and-Binding model of phonotactics (Dziubalska-Kołaczyk 2002), which is embedded in the framework of Natural Phonology. Typology-wise, the universal phonological structure is that of CV(s) alternating in a word (all languages have CVs). However, nearly 31% of languages have a complex syllable structure. A moderately complex syllable structure is characteristic of 56.5% of languages (Maddieson 2013). Thus, in order to survive, clusters must organize themselves according to certain universal conditions, which can be captured in terms of phonotactic preferences. Their function is to counteract the preference for a CV and to prevent the emergence of dysfunctional clusters. The NAD measure is based on the principle of perceptual contrast that needs to be ensured in a string of segments. In its current version, this contrast is obtained by taking into account three parameters of consonant production: manner of articulation (henceforth MOA), place of articulation (henceforth POA) and the distinction between a sonorant and an obstruent in a sequence (S/O). The values for MOA and POA are presented in Table 20.6. The difference between a sonorant and an obstruent in a sequence is presented in a binary fashion (1 = difference, 0 = no difference). Six well-formedness conditions have been formulated for clusters of 2 and 3 segments for each word position (initial, medial and final). For the purpose of this study, we present well-formedness conditions for two-member initial and final sequences and illustrate them with examples of clusters used as stimuli in our experiment.
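Table 20.6 Distances in MOA and POA: Polish (Zydorowicz et al. 2016)
MOA values: OBSTRUENT (stop 5.0, affricate 4.5, fricative 4.0); SONORANT (nasal 3.0, liquid: lateral 2.5, rhotic 2.0, glide 1.5 / 1.0); VOWEL 0
POA value and place | Stop | Affricate | Fricative | Nasal | Lateral | Rhotic | Glide
1.0 bilabial | p b | | | m | | | w, w̃
1.5 labiodental | | | f v | | | |
2.0 (post-)dental | t d | ʦ ʣ | s z | n | | |
2.3 alveolar | | tʂ dʐ | ʂ ʐ | | l | r |
2.6 alveolopalatal | | ʨ ʥ | ɕ ʑ | ɲ | | |
3.0 palatal | | | | | | | j, ȷ̃
3.5 velar | k ɡ | | x | ŋ | | | w, w̃
4.0 radical | | | | | | |
5.0 glottal | | | | | | |
Major place classes: LABIAL, CORONAL, DORSAL, RADICAL, GLOTTAL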
Word-initial double clusters: NAD (C1, C2) ≥ NAD (C2, V)
The condition reads as follows: in word-initial double clusters, the Net Auditory Distance between the two consonants should be greater than or equal to the distance between the vowel and the consonant adjacent to it.
Word-final double clusters: NAD (V, C1) ≤ NAD (C1, C2)
The condition for word-final clusters is a mirror image of the condition for the word-initial context, i.e., the contrast between the two consonants must be greater than (or equal to) the contrast between the vowel and C1. The calculations of cluster goodness are illustrated below.
CC initial: NAD (C1, C2) ≥ NAD (C2, V)
NAD CC = |MOA1 − MOA2| + |POA1 − POA2| + S/O
NAD CV = |MOA2 − MOA V| + S/O
prV
NAD CC = |5 − 2| + |1 − 2.3| + 1 = 3 + 1.3 + 1 = 5.3
NAD CV = |2 − 0| + 0 = 2 + 0 = 2
5.3 > 2; difference 3.3
The preference NAD (C1, C2) ≥ NAD (C2, V) is observed since 5.3 > 2.
ʣbV
NAD CC = |4.5 − 5| + |2 − 1| + 0 = 0.5 + 1 + 0 = 1.5
NAD CV = |5 − 0| + 1 = 5 + 1 = 6
1.5 < 6; difference −4.5
The preference NAD (C1, C2) ≥ NAD (C2, V) is not observed, which results in a negative evaluation of the cluster.
In order to perform efficient large-scale calculations, the NAD calculator has been devised (Dziubalska-Kołaczyk et al. 2014). The tool operates on five languages: English, Polish, German, Russian and Ukrainian, and is accessible online at http://wa.amu.edu.pl/nadcalc/.
5.2. NAD and Nonword Acceptability
Tables 20.7 and 20.8 below summarize nonword acceptability judgements with reference to the NAD criterion. Word-initial and word-final positions are analyzed separately. This time the analysis aims to verify the assumption that acceptable nonwords contain preferred clusters (as measured by NAD) and unacceptable nonwords contain dispreferred clusters. Let us first analyze the word-initial context. Out of 16 acceptable nonwords with the initial clusters in the experimental material, 56% (9 types) are preferred and 44% (7 types) are dispreferred. Word-finally, out of 13 CC types, 3 (23%) are preferred whereas 10 (77%) clusters do not fulfill the NAD criterion. This result demonstrates that the subjects were not guided by the contrast requirements imposed by NAD (mean 39.5%) when accepting nonwords with CC clusters.
Table 20.7 Acceptable Nonwords With CC Clusters in the Light of NAD
Acceptable | Nonwords with initial CC clusters (16) | Nonwords with final CC clusters (13)
Fulfilling the NAD condition | pr, t͡ʂw, kl, dr, pl, xlʲ, ʨm, ʂr, ʐm (9) | jt͡ʂ, wʦ, lʦ (3)
Violating the NAD condition | ml, pʂ, sk, tʂ, vʑ, zb, st (7) | kr, kl, sk, ɕɲ, ks, ft, xʨ, ɕʨ, ʂt͡ʂ, kt (10)
Let us now move on to the pool of disqualified items.
Table 20.8 Unacceptable Nonwords With CC Clusters in the Light of NAD
Unacceptable | Nonwords with initial CC clusters (17) | Nonwords with final CC clusters (21)
Fulfilling the NAD condition | ɡw, xl, ʐl, vɲ, dn (5) | jp, jf, wʂ, lp, lʂ (5)
Violating the NAD condition | rv, wz, rʣ, mn, lʐ, ɕp, t͡ʂf, sp, ʣb, tk, ft, db (12) | mf, nf, pr, lm, rl, lɲ, pɲ, ʥm, ml, bl, mn, fl, dm, xt, tʂ, pt (16)
Out of 17 unacceptable nonwords with the word-initial clusters, 12 (70.5%) were dispreferred whereas 5 (29.5%) were preferred. Twenty-one items with the final clusters were disqualified by the subjects: 16 (76%) contain dispreferred consonant sequences whereas 5 types (24%) fulfill the NAD condition. Thus, the mean value for the predictions of NAD for the unacceptable nonwords with CC clusters amounts to 73%. In sum, the combined results for the application of NAD to the experimental data show that this measure accounts for 56% of the participants’ nonword acceptability decisions. Thus, as shown in Sections 4 and 5, the sonority profiles of consonant clusters are only partly responsible for the obtained results and other explanations for them must be sought.
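The NAD calculations illustrated in Section 5.1 can be reproduced with a minimal Python sketch. It is not the NAD calculator cited above: the dictionaries hold only the handful of MOA and POA values needed for the two worked examples (read off Table 20.6), the vowel is treated as a sonorant with MOA 0, as in those examples, and all function names are assumptions of this sketch.

```python
# MOA and POA values for the segments used in the worked examples.
MOA = {"p": 5.0, "b": 5.0, "ʣ": 4.5, "r": 2.0}
POA = {"p": 1.0, "b": 1.0, "ʣ": 2.0, "r": 2.3}
SONORANTS = {"r"}
VOWEL_MOA = 0.0


def nad_cc(c1, c2):
    """NAD CC = |MOA1 - MOA2| + |POA1 - POA2| + S/O."""
    s_o = 1 if (c1 in SONORANTS) != (c2 in SONORANTS) else 0
    return abs(MOA[c1] - MOA[c2]) + abs(POA[c1] - POA[c2]) + s_o


def nad_cv(c):
    """NAD between a consonant and the adjacent vowel.

    S/O is 1 for an obstruent next to the vowel and 0 for a
    sonorant, as in the worked examples.
    """
    s_o = 0 if c in SONORANTS else 1
    return abs(MOA[c] - VOWEL_MOA) + s_o


def initial_preferred(c1, c2):
    """Word-initial condition: NAD(C1, C2) >= NAD(C2, V)."""
    return nad_cc(c1, c2) >= nad_cv(c2)


def final_preferred(c1, c2):
    """Word-final condition: NAD(V, C1) <= NAD(C1, C2)."""
    return nad_cv(c1) <= nad_cc(c1, c2)


print(round(nad_cc("p", "r"), 1), nad_cv("r"), initial_preferred("p", "r"))
print(round(nad_cc("ʣ", "b"), 1), nad_cv("b"), initial_preferred("ʣ", "b"))
```

The values are rounded for display only, because floating-point subtraction of 2.3 from 1.0 does not yield exactly 1.3. Under the same assumptions, final_preferred("p", "r") returns False, in line with the placement of final /pr/ among the clusters violating the NAD condition in Table 20.8.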
6. Frequency of Consonant Clusters and Experimental Results
In this section, we confront the experimental data with the frequency of occurrence of the analyzed clusters in order to find out whether this factor accounts for the obtained results. At the outset, it should be clarified that the frequency of clusters can be measured in different sets of data. The most common measures include lexical (dictionary) and corpus frequency. The former refers to the number of cluster occurrences in lexical items, e.g., collected in a dictionary, whereas the latter concerns the frequency with which a given sequence of segments appears in a language corpus. In the present study, corpus frequency will serve as a measure as it is indicative of the speakers’ familiarity with the cluster. The frequency data are taken from Zydorowicz et al. (2016).13 It is not a challenging task to indicate the most frequent clusters in Polish (/pʂ/ and /pr/ word-initially and /st/ and /ɕʨ/ word-finally) and the least frequent ones (e.g., /ʐɡn/ word-initially and /w̃st/ word-finally). Nevertheless, it is extremely difficult, if not impossible, to draw the line between the frequent and rare clusters on the frequency continuum. In order to enable a comparison of word acceptability in the light of cluster frequency, the data was divided into three frequency bands: high, medium and low. In the word-initial context, the three frequency bands correspond to the following: high frequency = above 100,000 corpus tokens, medium frequency = between 15,000 and 100,000, low frequency = below 15,000. Arbitrary though the proposed division may seem, it reflects the data distribution in our material. Since word-final clusters generally have a lower frequency than initial ones, the following ranges were delineated: high frequency = above 10,000, medium frequency = between 1,000 and 10,000, low frequency = below 1,000. Table 20.9 presents nonce word acceptance and rejection in relation to the frequency criterion.
Our goal is to establish whether the following equations: acceptable nonwords = frequent clusters and unacceptable nonwords = infrequent clusters hold true.14
Table 20.9 Acceptable and Unacceptable Nonwords and Cluster (Corpus) Frequency
Acceptable nonwords
Frequency | With initial CC clusters (16) | With final CC clusters (11)
High | pʂ, pr, st, sk, dr (5) | ɕʨ, kt, sk, ks (4)
Mid | tʂ, kl, pl, t͡ʂw, zb (5) | ʂt͡ʂ, kl (2)
Low | vʑ, ml, xlʲ, ʐm, ʂr, ʨm (6) | ft, lʦ, ɕɲ, kr, jt͡ʂ (5)
Unacceptable nonwords
Frequency | With initial CC clusters (17) | With final CC clusters (21)
High | sp (1) | (0)
Mid | ɡw, vɲ, ft, t͡ʂf (4) | lm, tʂ, pt (3)
Low | mn, db, xl, dn, tk, wz, rʣ, lʐ, ɕp, rv, ʣb, ʐl (12) | xt, lɲ, pr, mn, rl, ml, mf, fl, lp, wʂ, lʂ, jf, dm, jp, pɲ, ʥm, bl, nf (18)
Table 20.10 Nonword Acceptability vs Cluster Frequency: A Summary
Frequency | Initial CC clusters: acceptable | Initial CC clusters: unacceptable | Final CC clusters: acceptable | Final CC clusters: unacceptable
High | 83% | 17% | 100% | 0%
Mid | 56% | 44% | 40% | 60%
Low | 33% | 67% | 22% | 78%
The data summarized in Table 20.10 demonstrates that, generally, the number of rejected nonwords increases as the frequency of the cluster decreases. In other words, 70% of the acceptable items contain high or mid frequency clusters, whereas 72.5% of the unacceptable forms comprise clusters with a low frequency of occurrence. The conclusion which can be drawn from these observations is that corpus frequency of a two-consonant sequence plays an important role in the participants’ nonword acceptability decisions. However, as Table 20.9 suggests, the decision to disqualify a potential item must also be based on criteria other than frequency as some rare clusters (six initials and five finals) were deemed acceptable as well.
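The banding and the percentages in Table 20.10 can be retraced with a short Python sketch. The thresholds are those given in this section and the counts are read off Table 20.9; the treatment of values lying exactly on a threshold, the zero count for rejected high-frequency final clusters (inferred from the 0% in Table 20.10) and all names are assumptions of this sketch.

```python
# Band thresholds in corpus tokens: (upper, lower) per word position.
THRESHOLDS = {"initial": (100_000, 15_000), "final": (10_000, 1_000)}


def frequency_band(tokens, position):
    """Assign a cluster to the high, mid or low frequency band.

    Values exactly on a threshold are placed in the mid band here;
    the text does not specify this.
    """
    upper, lower = THRESHOLDS[position]
    if tokens > upper:
        return "high"
    if tokens >= lower:
        return "mid"
    return "low"


# (acceptable, unacceptable) counts per band, from Table 20.9.
BAND_COUNTS = {
    "initial": {"high": (5, 1), "mid": (5, 4), "low": (6, 12)},
    "final": {"high": (4, 0), "mid": (2, 3), "low": (5, 18)},
}


def acceptance_rate(position, band):
    """Percentage of accepted nonwords in one band, rounded."""
    accepted, rejected = BAND_COUNTS[position][band]
    return round(100 * accepted / (accepted + rejected))


for position in ("initial", "final"):
    print(position, [acceptance_rate(position, b)
                     for b in ("high", "mid", "low")])
```

With these counts the sketch yields 83, 56 and 33 for the initial clusters and 100, 40 and 22 for the final ones, matching the acceptance columns of Table 20.10.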
7. Conclusions
Let us summarize the major points made in this chapter and the emerging conclusions. We have carried out the first psycholinguistic experiment whose main goal was to collect data on Polish native speakers’ intuitions concerning the acceptability of nonwords with two-consonant clusters in word-initial and word-final positions. Next we applied three measures, i.e., the SSG, NAD and cluster frequency, to the experimental results to see how well they fare in predicting the participants’ acceptability judgements of novel forms (the ratings are provided in Appendix 2). In other words, the focus was on finding out which of the following equations (and their reversals) hold true: acceptable nonwords = nonwords with clusters conforming to the SSG, acceptable nonwords = nonwords with clusters preferred according to NAD and acceptable nonwords = nonwords with frequent clusters. It has been demonstrated that the SSG is capable of predicting the respondents’ acceptability evaluations of 70% of nonwords with the initial and 57% with the final clusters (mean for both positions 63.5%). The respective results for NAD are 63% and 50% (mean 56%). Thus, the sonority profile of the experimental clusters, whether measured in terms of the SSG or NAD, plays an important role in the participants’ judgements, particularly in the ratings of the items with the initial clusters which, in order to be acceptable, must conform to the sonority requirements more than final clusters. Numerous violations of these principles are tolerated, however, which is not surprising as Polish native speakers encounter many such cases in real Polish words. The combined figures for high and medium frequency clusters account for the acceptance of 70% of the stimuli while low frequency accounts for 72.5% of the rejected items. This means that of the three examined measures the best, but certainly not perfect, predictor of the participants’ nonword acceptability decisions is cluster frequency. While the respondents display sensitivity to cluster frequency and, to some extent, to the sonority relations between their members, these are not the only factors which determine their acceptability judgements of nonwords. Further studies are needed, however, to verify the correctness of the observations made in this chapter, coupled with an examination of other possible determinants of wordlikeness such as neighbourhood density and phonological similarity of words.
Notes
1. This chapter is written in honour of Prof. Katarzyna Dziubalska-Kołaczyk and her valuable work on various aspects of phonology, phonetics and language acquisition.
2. Wiese et al. (2017) report an experiment which examined the learnability of nonwords ending in CC clusters by a group of Polish native speakers.
3. Within this category a division is often made between frequent sequences (e.g., #tr, #bl, nd#) and rare ones (e.g., #sf (sphere), #skl (sclerosis), rsk# (torsk)).
4. In addition to these three categories, some scholars isolate items with sequences which are unpermitted, but attested in loanwords (e.g., Vladimir), onomatopoeic expressions (e.g., vroom) and ejaculations (e.g., tsk-tsk).
5. In the traditional generative syllableless model of Chomsky and Halle (1968), such constraints were expressed by morpheme structure conditions. The concept of NAD (see Section 5) defines preferred and dispreferred clusters with reference to the word-initial and word-final position without employing the notion of the syllable.
6. A frequent claim is that within the syllable sonority rises sharply in the onset and decreases gradually in the coda. This requirement is expressed by the concept of the minimal sonority distance which requires consonants in the onset, but not in the coda, not to be neighbours on the sonority scale.
7. On different sonority scales and their application to Polish see Szpyra-Kozłowska (1998).
8. The frequency of different units can be studied: segments, syllables, syllabic constituents, subsegmental features, bigrams and trigrams.
9. The nonwords which turned out to be the most troublesome in this respect include żaft, wniup and głal (10–11 ‘hard to say’ answers). Four items caused no such difficulties: kleg, ftać (1), słemf (a filler, regarded as unacceptable by the majority of the respondents) and drecz (both with 0 ‘hard to say’ answers).
10. It should be pointed out that two nonwords, chluf and chliń, with very similar clusters, i.e., [xl] and [xlʲ], were evaluated differently: the former as unacceptable and the latter as acceptable. The difference in these judgements, however, is slight and statistically insignificant.
11. This fact may be attributed to a high frequency of this final cluster combined with words such as liść ‘leaf’ and puść ‘let go’ which form minimal pairs with luść.
12. The SSG concerns syllable structure rather than word positions. This is of no relevance here since in our experiment only monosyllabic items were employed.
13. The corpus is the full text of the Rzeczpospolita newspaper from the 2000–2001 period, which means 48.6 million words (tokens) and 630,000 unique word forms (types).
14. Two final clusters, i.e., /wʦ/ and /xʨ/, for which no frequency data are available, are not included in Tables 20.9–20.11.
References Bailey, T. M. & U. Hahn. 1998. Determinants of wordlikeness. Proceedings of the Cognitive Science Society 20. 90–95. Bargiełówna, M. 1950. Grupy fonemów spółgłoskowych współczesnej polszczyznykulturalnej. Biuletyn Polskiego Towarzystwa Językoznawczego 10. 1–25. Bethin, C. 1992. Polish syllables: The role of prosody in phonology and morphology. Columbus, OH: Slavica Publishers. Chomsky, N. & M. Halle. 1968. The sound pattern of English. New York: Harper & Row. Clements, G. N. 1990. The role of the sonority cycle in core syllabification. In J. Kingston & M. Beckman (eds.), Papers in laboratory phonology. 1: Between the grammar and physics of speech, 283–333. Cambridge: Cambridge University Press. Clements, G. N. & S. J. Keyser. 1983. CV Phonology: A generative theory of the syllable. Cambridge, MA: MIT Press. Coleman, J. S. & J. Pierrehumbert. 1997. Stochastic phonological grammars and acceptability. Computational Phonology 3. 49–56. Cyran, E. & E. Gussmann. 1999. Consonantal clusters and governing relations: Polish initial consonant sequences. In H. van der Hulst & N. Ritter (eds.), The syllable: Views and facts, 219–247. Berlin: Mouton de Gruyter. Dobrogowska, K. 1992. Word initial and word final consonant clusters in Polish popular science texts and in artistic prose. Studia Phonetica Posnaniensia 2. 47–121. Donegan, P. & D. Stampe. 1979. The study of natural phonology. In D. A. Dinnsen (ed.), Current approaches to phonological theory, 126–174. Bloomington: Indiana University Press.
296
Jolanta Szpyra-Kozłowska and Paulina Zydorowicz
Dressler, W. U. 1984. Explaining natural phonology. Phonology Yearbook 1. 29–50. Dunaj, B. 1985. Grupy spółgłoskowe współczesnej polszczyzny mówionej (w języku mieszkańców Krakowa) (= Prace Językoznawcze 85). Kraków: Zeszyty Naukowe UJ. Dziubalska-Kołaczyk, K. 2002. Beats-and-binding phonology. Frankfurt am Main: Peter Lang. Dziubalska-Kołaczyk, K. 2009. NP extension: B&B phonotactics. Poznań Studies in Contemporary Linguistics 45(1). 55–71. Dziubalska-Kołaczyk, K. 2014. Explaining phonotactics using NAD. Language Sciences 46A. 6–17. Dziubalska-Kołaczyk, K., D. Pietrala & G. Aperliński. 2014. The NAD phonotactic calculator: An online tool to calculate cluster preference in English, Polish and other languages. http://wa.amu.edu.pl/nadcalc/ (8 November 2018). Frisch, S. A., N. R. Large & D. B. Pisoni. 2000. Perception of wordlikeness: Effects of segment probability and length on the processing of nonwords. Journal of Memory and Language 42(4). 481–496. Greenberg, J. H. & J. J. Jenkins. 1964. Studies in the psychological correlates of the sound system of American English. Word 20. 157–177. Hammond, M. 1999. The phonology of English: A prosodic optimality-theoretic approach. Oxford: Oxford University Press. Harris, J. 1983. Syllable structure in Spanish: Non-linear analysis (= Linguistic Inquiry Monograph 8). Cambridge, MA: MIT Press. Hayes, B. & C. Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39. 379–440. Leszczyński, Z. 1969. Studia nad polskimi grupami spółgłoskowymi. Wrocław: Ossolineum. Maddieson, I. 2013. Syllable structure. In M. S. Dryer & M. Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/12 (15 December 2017). Marecka, M. & K. Dziubalska-Kołaczyk. 2014. Evaluating models of phonotactic constraints on the basis of SC cluster acquisition data. Language Sciences 46. 37–47. Rochoń, M. 2000. 
Optimality in complexity: The case of Polish consonant clusters. Berlin: Akademie Verlag. Rocławski, B. 1976. Zarys fonologii, fonetyki i fonotaktyki współczesnego języka polskiego. Gdańsk: Wydawnictwo Uczelniane UG. Rubach, J. & G. Booij. 1990. Edge of constituents effects in Polish. Natural Language and Linguistic Theory 8. 427–463. Selkirk, E. 1982. The syllable. In H. van der Hulst & N. Smith (eds.), The structure of phonological representations II, 107–136. Dordrecht: Foris. Śledziński, D. 2010. Analiza struktury grup spółgłoskowych w nagłosie oraz w wygłosie wyrazów w języku polskim. Kwartalnik Językoznawczy 3–4. 61–84. Stampe, D. 1979. A dissertation on natural phonology. New York: Garland Publishing. Szpyra, J. 1995. Three tiers in Polish and English phonology. Lublin: Wydawnictwo UMCS. Szpyra-Kozłowska, J. 1998. The sonority scale and phonetic syllabification in Polish. Biuletyn Polskiego Towarzystwa Językoznawczego 54. 65–82. Trask, R. L. 1996. A dictionary of phonetics and phonology. New York: Routledge.
Native Speakers’ Phonotactic Intuitions
Treiman, R. 1988. Distributional constraints and syllable structure in English. Journal of Phonetics 16. 221–229. Vitevitch, M. S. & P. A. Luce. 1999. Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language 40. 374–408. Vitevitch, M. S. & P. A. Luce. 2004. A Web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments and Computers 36(3). 481–487. Wiese, R., P. Orzechowska, P. M. Alday & C. Ulbrich. 2017. Structural principles or frequency of use? An ERP experiment on the learnability of consonant clusters. Frontiers in Psychology 7. 2005. doi: 10.3389/fpsyg.2016.02005. Zydorowicz, P. & P. Orzechowska. 2017. The study of Polish phonotactics: Measures of phonotactic preferability. Studies in Polish Linguistics 12(2). 97–121. Zydorowicz, P., P. Orzechowska, M. Jankowski, K. Dziubalska-Kołaczyk, P. Wierzchoń & D. Pietrala. 2016. Phonotactics and morphonotactics of Polish and English: Theory, description, tools and applications. Poznań: Wydawnictwo UAM.
Appendix 1 A List of the Experimental Items (In the Order Presented to the Participants)
przun, szobl, mnep, luść, człać, ruń, dzbyw, wikr, gedź, skorz, żaft, ciusk, szruń, żmyg, łetsz, wypt, kleg, żlorz, siap, dnysz, cekl, spyp, julm, stryw, tuśń, dbeś, siamn, chluf, cior, dżyml, słemf, dżacht, rdzup, kastrz, preń, dzylsz, lżag, somf, lik, styś, rudźm, tkuf, forst, gechć, bolp, ćmesz, wulc, głal, kefl, guf, drecz, nidm, ciopń, zben, ksztaf, binf, czwyp, ziaszcz, mloń, szukt, fyks, ciak, rwol, gełc, szojf, łzur, sełsz, chliń, kajcz, kostw, ftać, żems, wziaj, szorl, wniup, śpym, rojp, trzeg, fupr
Appendix 2
Table 20.11 Experimental Cluster Evaluation in Terms of Acceptability, the SSG, NAD and Corpus Frequency
Word-initial clusters (Pos = i)
CC: pr dr ʂr st t͡ʂw sk tʂ pʂ pl zb kl ʐm vʑ ml ʨm xl xl ɡw wz dn ɕp rv ʐl ft t͡ʂf ʣb sp lʐ vɲ db tk rʣ mn sw
Acceptability rating: 1.24 1.24 1.18 1.14 0.9 0.88 0.78 0.74 0.7 0.48 0.44 0.4 0.38 0.38 0.28 0.12 −0.08 −0.1 −0.1 −0.14 −0.2 −0.2 −0.68 −0.72 −0.76 −0.78 −0.8 −0.8 −0.84 −0.84 −1.08 −1.14 −1.34 −1.52
Evaluation according to SSG: + + + – + – + + + – + + – + + + + + – + – – + – + – – – + – – – – +
Evaluation according to NAD: + + – + – – – + – + + – – + + + + – + – – + – – – – – + – – – – +
Cluster corpus frequency: high high low high medium high medium high medium medium medium low low low low low low medium low low low low low medium medium low high low medium low low low low low
Word-final clusters (Pos = f)
CC: ɕʨ ɕɲ sk kl jt͡ʂ ks ft kt lʦ ʂt͡ʂ kr ms fl wʂ xt ʥm jp pɲ bl rl lʂ jf tʂ lp nf pt pr mf lm mn ml dm lɲ
Acceptability rating: 1.3 0.7 0.6 0.48 0.38 0.36 0.3 0.22 0.12 0.06 0.06 0 −0.02 −0.22 −0.32 −0.4 −0.42 −0.48 −0.52 −0.68 −0.76 −0.84 −0.86 −0.88 −0.96 −0.98 −0.98 −1 −1.02 −1.02 −1.02 −1.06 −1.16
Evaluation according to SSG: + – + – + – + – + + – + – + + – + – – + + + – + + – – + + – – – +
Evaluation according to NAD: – – – – + – – – + – – + – + – – + – – – + + – + – – – – – – – – –
Cluster corpus frequency: high low high medium low high low high low medium low low low low low low low low low low low low medium low low medium low low medium low low low low
21 Illustration of Markedness and Frequency Relations in Phonotactics Paulina Zydorowicz and Paula Orzechowska
1. Introduction
The study of phonotactics has focused on the organization of segments in larger linguistic units, in particular in syllables. Consonant clusters have inspired a large body of studies involving markedness. Since sequences of consonants are rare and disfavoured in the languages of the world (Greenberg 1978; Maddieson 2013), phonologists have been interested in investigating whether, and to what extent, they follow or violate well-formedness principles. The distinction between unmarked and marked clusters has been related to their being easy or difficult to articulate, early or late to acquire, late or early to lose in language deficit, more or less frequent in child-directed speech, or even more or less expected crosslinguistically (Dressler et al. 1987; Rice 2007). Another diagnostic of markedness is frequency. In usage-based phonology (Bybee 2003), type and token frequencies are considered to influence phonological and morphological structure. This idea is the focus of the present chapter. The question of how markedness is related to frequency is tested on Polish and English. Both languages constitute adequate testing grounds for our analysis as they represent typologically different systems, which display different degrees of phonotactic complexity (Maddieson 2013). Polish permits up to 5 consonants at the left and right edge of the word, e.g., /strfj-/ in Strwiąż [a river name] and /-mpstf/ in przestępstw ‘crime’-gen.pl. English admits up to 3 and 4 consonants word-initially and word-finally, respectively, e.g., /str-/ in string and /-mpts/ in attempts. In both languages, phonotactic richness is also due to the intervention of morphology. The study of the two languages makes it possible to compare how frequency and markedness pattern depending on the phonotactic richness of a language. The aim of this contribution is to examine the relation between cluster markedness and frequency in written corpora in Polish and English. This work draws on our earlier findings (see Zydorowicz & Orzechowska 2017; Orzechowska & Zydorowicz 2019) and discusses the distribution of clusters along the preferability and frequency continua. The preferability measures adopted in this chapter are based on two principles: the Sonority Sequencing Generalization and Net Auditory Distance.
2. Measures of Cluster Goodness
2.1 The Sonority Sequencing Generalisation
The study of phonotactics has been dominated by the principle of sonority. The concept dates back to the early works of Whitney (1865), Sievers (1881) and Jespersen (1904). For more than a century, various definitions of the term have been proposed (see Parker 2012 for an overview); however, sonority has been most commonly associated with the constriction in the vocal tract, which corresponds to loudness. Ladefoged and Johnson define sonority as “loudness relative to that of other sounds with the same length, stress, and pitch” (Ladefoged & Johnson 2011: 245). In general, an increase in the aperture of the vocal tract corresponds with an increase in sonority. Various versions of the sonority scale have been proposed over time (e.g., Vennemann 1988; Clements 1990; Parker 2008). Although the hierarchies vary in their degree of detail and in the language they were proposed for, they all share some universal properties. First, vowels and obstruents are found at the most and least sonorous extremes of the hierarchy, respectively. Second, less sonorous obstruents are separated from sonorants. Finally, all scales are based on manner of articulation categories. In the present study, we use the scale of Foley (1972), where sonority increases from 6 to 1: plosives (6), fricatives (5), nasals (4), liquids (3), glides (2), vowels (1). The selection of the scale is motivated by the fact that we are interested in providing an analysis which will be most comparable with NAD. As we will see in Section 2.2, the dimension of NAD which corresponds with sonority involves 6 classes of segments, where 5 distances are found between plosives and vowels. Moreover, in both approaches, the scale does not contain affricates as a separate class between plosives and fricatives. Affricates are viewed as an intermediate category which combines articulatory gestures typical of oral stops and fricatives. This treatment of complex segments has consequences for the calculations performed by means of both principles. Namely, although a distance of one holds between all major consonant classes, affricates involve a half-point (= 0.5) distance towards plosives and fricatives. Specific calculations are detailed below.
The arrangement of segments in a well-formed syllable is governed by the Sonority Sequencing Generalization (SSG, Selkirk 1984). The principle requires a sonority rise from cluster margins towards a nuclear vowel. Originally, the principle was proposed to account for syllable-related processes and syllabification, but it has also been used to classify clusters into well-formed and ill-formed. Clusters which display a falling sonority slope from a vowel outward are unmarked, e.g., plosive+liquid /pr-/ in onsets and the reverse /-rp/ in codas. The reverse, marked pattern is found in numerous Polish clusters, such as initial /wb-/ in łby ‘heads’, /mɕʨ-/ in mścić ‘to revenge’ and final /-tr/ in wiatr ‘wind’ and /-kst/ in tekst ‘text’.
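The SSG classification just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric values follow the scale quoted above (with the half-point value for affricates), the segment-to-class lookup is a small hypothetical sample rather than a full inventory, and treating sonority plateaus as violations is an assumption based on the requirement of a sonority rise.

```python
# Sonority values on the scale quoted in the text (lower value = more sonorous).
# The affricate value reflects the half-point distance mentioned in the text.
SONORITY = {
    "plosive": 6.0,
    "affricate": 5.5,
    "fricative": 5.0,
    "nasal": 4.0,
    "liquid": 3.0,
    "glide": 2.0,
    "vowel": 1.0,
}

# Hypothetical sample lookup, for illustration only (not a full inventory).
SEGMENT_CLASS = {
    "p": "plosive", "b": "plosive", "t": "plosive", "k": "plosive",
    "f": "fricative", "s": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
    "w": "glide", "j": "glide",
}


def sonority(segment: str) -> float:
    """Return the scale value of a segment."""
    return SONORITY[SEGMENT_CLASS[segment]]


def sonority_distance(c1: str, c2: str) -> float:
    """Absolute distance between two consonants on the scale (0 = plateau)."""
    return abs(sonority(c1) - sonority(c2))


def obeys_ssg(c1: str, c2: str, position: str) -> bool:
    """Check a two-consonant cluster against the SSG.

    position is "initial" (C1 C2 V) or "final" (V C1 C2). Sonority must rise
    towards the vowel, so on this scale the consonant next to the vowel needs
    the lower value. Plateaus are treated as violations (an assumption).
    """
    if position == "initial":
        return sonority(c1) > sonority(c2)
    if position == "final":
        return sonority(c1) < sonority(c2)
    raise ValueError("position must be 'initial' or 'final'")
```

With this sketch, `obeys_ssg("p", "r", "initial")` and `obeys_ssg("r", "p", "final")` both hold, while initial /wb-/ and final /-tr/ are flagged as violations, in line with the examples in the text.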
Markedness and Frequency Relations
Apart from providing a binary division into SSG-obeying and SSG-violating clusters, degrees of markedness have been established on the basis of sonority distance (e.g., Clements 1990). In the present work, determining degrees of markedness involves counting distances between consonants along the sonority scale given above. For instance, the smallest distance of 0 is represented by plateau clusters such as /pt-/ and /-mn/ in Polish and /sf-/ and /-vz/ in English. The largest distance is found in clusters containing plosives and glides, e.g., /bj-/ in beauty and /ɡw-/ in głowa ‘head’. This method corresponds to calculations performed in terms of Net Auditory Distance, which enables a comparison of the results between both approaches.
2.2 The Net Auditory Distance Principle
The Net Auditory Distance (NAD) principle (Dziubalska-Kołaczyk 2009, 2014, in press) stems from the Beats and Binding model of phonotactics. It was proposed as a refinement of the Optimal Sonority Distance Principle in Dziubalska-Kołaczyk (2002). NAD is based on universals and semiotic premises: 1) clusters are marked (all languages have CVs, but not all languages have CCs), 2) in order to survive, clusters must be sanctioned by some kind of force, 3) this force is a principle of contrast, 4) the contrast is achieved by a sum of three parameters traditionally used for consonant description: place of articulation (POA), manner of articulation (MOA) and a sonorant/obstruent (S/O) contrast between members in a string. Table 21.1 and Table 21.2 present places and manners of articulation in Polish and English, respectively. The manner dimension corresponds to the sonority scale discussed in Section 2.1, whereas places of articulation are presented from the front of the oral cavity backwards. Both manners and places of articulation are assigned values on a scale of 0–5. The obstruent/sonorant distinction is expressed by a 0/1 difference, where 0 = no difference and 1 = difference in voicing. The POA and MOA values are language-specific. NAD is a sum of values (POA, MOA, S/O) for the contrast between the adjacent consonants in relation to the consonant-vowel neighbourhood. The principle formulates well-formedness conditions for 2- and 3-member clusters in all word positions; however, our illustration will be limited to word-initial CC sequences due to the scope of the analysis to follow: NAD(C1,C2) ≥ NAD(C2,V). In word-initial double clusters, the NAD value between the two consonants should be greater than or equal to the distance between the vowel and the consonant adjacent to it. Below, we provide an illustration of how the evaluation of a cluster proceeds (for automatic calculations see Dziubalska-Kołaczyk et al. 2014). Given the following requirements for a preferred initial CC sequence,
NAD(C1,C2) ≥ NAD(C2,V)
NAD(C1,C2) = |MOA(C1) − MOA(C2)| + |POA(C1) − POA(C2)| + S/O
NAD(C2,V) = |MOA(C2) − MOA(V)| + S/O
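Table 21.1 Places and Manners of Articulation in Polish
Manner of articulation (columns): OBSTRUENT: stop 5.0, affricate 4.5, fricative 4.0; SONORANT: nasal 3.0, liquid (lateral 2.5, rhotic 2.0), glide 1.5, 1.0; VOWEL 0
Place of articulation (rows): 1.0 bilabial; 1.5 labio-dental; 2.0 (post-)dental; 2.3 alveolar; 2.6 alveolo-palatal; 3.0 palatal; 3.5 velar; 4.0; 5.0
Segments: stops p b, t d, k ɡ; affricates ʦ ʣ, tʂ dʐ, ʨ ʥ; fricatives f v, s z, ʂ ʐ, ɕ ʑ, x; nasals m, n, ɲ, ŋ; liquids l, r; glides j, ȷ̃, w, w̃ (w and w̃ are entered twice in the table)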
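Table 21.2 Places and Manners of Articulation in English
Manner of articulation (columns): OBSTRUENT: stop 5.0, affricate 4.5, fricative 4.0; SONORANT: nasal 3.0, liquid (lateral 2.5, rhotic 2.0), glide 1.0; VOWEL 0
Place of articulation (rows): 1.0 bilabial; 1.5 labio-dental; 2.0 inter-dental; 2.3 alveolar; 2.6 postalveolar; 3.0 palatal; 3.5 velar; 4.0 -; 5.0 glottal
Segments: stops p b, t d, k ɡ, ʔ; affricates ʧ ʤ; fricatives f v, θ ð, s z, ʃ ʒ, h; nasals m, n, ŋ; liquids l, ɹ; glides j, w (w is entered twice in the table)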
The Polish cluster /prV/ as in praca ‘work’ is analysed in the following manner:
NAD(C1,C2) = |5 − 2| + |1 − 2.3| + 1 = 3 + 1.3 + 1 = 5.3
NAD(C2,V) = |2 − 0| + 0 = 2 + 0 = 2
5.3 > 2; difference 3.3 (later referred to as NAD product).
The preference NAD(C1,C2) ≥ NAD(C2,V) is observed since 5.3 > 2. The SSG and NAD were applied to account for the structure of clusters at word edges in Polish and English. Zydorowicz & Orzechowska (2017) have shown that the two measures differ in the evaluation of clusters in Polish. Generally, clusters which fulfill the requirements of sonority minimally, i.e., by 0.5 or 1 point, are rejected by NAD as the contrast between C1 and C2 is insufficient. Word-initially, 24% of cluster types and 27% of word types are evaluated differently by the two principles. It can be concluded that NAD is a more demanding measure of cluster goodness as it is based on more requirements. A study by Orzechowska and Zydorowicz (2019) investigated the relationship between frequency and markedness in Polish and English word-initial and word-final clusters. The authors found no correlations between degrees of markedness, expressed by SSG and NAD distances, and logarithmic type and token frequencies, suggesting that the predictions of markedness are not corroborated.
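The /prV/ calculation above can be reproduced with a short Python sketch. It is not the NAD calculator cited in the text; the `Segment` class and the function names are hypothetical, and only the values for /p/, /r/ and the vowel that appear in the worked example are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    moa: float      # manner-of-articulation value
    poa: float      # place-of-articulation value
    sonorant: bool  # True for sonorants and vowels, False for obstruents


# Values taken from the worked example: /p/ = stop (5), bilabial (1);
# /r/ = rhotic (2), alveolar (2.3); vowel = 0 on the manner scale.
P = Segment(moa=5.0, poa=1.0, sonorant=False)
R = Segment(moa=2.0, poa=2.3, sonorant=True)
VOWEL = Segment(moa=0.0, poa=0.0, sonorant=True)  # poa is not used for vowels here


def so_term(a: Segment, b: Segment) -> float:
    """S/O term: 1 if the two segments differ in sonorant/obstruent status."""
    return 0.0 if a.sonorant == b.sonorant else 1.0


def nad_cc(c1: Segment, c2: Segment) -> float:
    """NAD(C1,C2) = |MOA difference| + |POA difference| + S/O."""
    return abs(c1.moa - c2.moa) + abs(c1.poa - c2.poa) + so_term(c1, c2)


def nad_cv(c: Segment, v: Segment) -> float:
    """NAD(C2,V) = |MOA difference| + S/O (no place term)."""
    return abs(c.moa - v.moa) + so_term(c, v)


def nad_product(c1: Segment, c2: Segment, v: Segment) -> float:
    """Difference NAD(C1,C2) - NAD(C2,V), the 'NAD product' of the text."""
    return nad_cc(c1, c2) - nad_cv(c2, v)


def is_preferred_initial(c1: Segment, c2: Segment, v: Segment) -> bool:
    """Word-initial CC preference: NAD(C1,C2) >= NAD(C2,V)."""
    return nad_cc(c1, c2) >= nad_cv(c2, v)
```

Calling `nad_cc(P, R)` and `nad_cv(R, VOWEL)` gives approximately 5.3 and 2, so `is_preferred_initial(P, R, VOWEL)` holds. Because of floating-point rounding, results are best compared with a small tolerance rather than with exact equality.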
3. Methodology
Four types of resources were explored to extract clusters for analysis. As regards Polish, Słownik Podstawowego Języka Polskiego dla Cudzoziemców by Bartnicka-Dąbkowska and Sinielnikoff (1999) provided cluster types as well as word types, and frequency information was appended from the corpus of raw texts of the nation-wide newspaper article collection designed for the general public. As far as English is concerned, the data also include a word list and a corpus. The wordlist was based on the CUV2 lexicon compiled by Mitton (1992) in the Oxford Advanced Learner’s Dictionary of Current English (Hornby 1974) and supplemented by data from Sobkowiak (2006). Frequency counts for the items studied were extracted from the Corpus of Contemporary American English (Davies 2011). Words with initial clusters were extracted from the four resources. We matched each cluster with two types of frequencies: type and token. The term type frequency is used to refer to the number of words representing a given cluster, whereas token frequency is understood as the number of word repetitions in the corpus. Finally, in order to enable comparisons between corpora of different sizes and to group the clusters in several categories, we used the data in Orzechowska & Zydorowicz (2019), where raw frequencies have been transformed into logarithmic frequencies (log freq), along a scale from 1 to 7 (where 1 = rare, 7 = frequent).
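The distinction between type and token frequency can be illustrated with a minimal Python sketch. The input format, the function names and the toy counts are hypothetical, and the base-10 logarithm is an assumption: the text does not specify how raw frequencies were mapped onto the 1 to 7 scale.

```python
import math
from collections import defaultdict


def cluster_frequencies(entries):
    """Compute type and token frequency per cluster.

    entries is an iterable of (word, initial_cluster, corpus_count) tuples.
    Type frequency = number of distinct words containing the cluster;
    token frequency = summed corpus occurrences of those words.
    """
    words = defaultdict(set)
    tokens = defaultdict(int)
    for word, cluster, count in entries:
        words[cluster].add(word)
        tokens[cluster] += count
    return {c: {"type": len(words[c]), "token": tokens[c]} for c in words}


def log_frequency(raw: int) -> float:
    """Base-10 logarithm of a positive raw frequency (base is an assumption)."""
    if raw <= 0:
        raise ValueError("raw frequency must be positive")
    return math.log10(raw)


# Toy input: the corpus counts are invented for illustration only.
toy_entries = [
    ("praca", "pr", 100),
    ("prosty", "pr", 50),
    ("kraj", "kr", 30),
]
toy_freqs = cluster_frequencies(toy_entries)
```

Here `toy_freqs["pr"]` has a type frequency of 2 (two words) and a token frequency of 150 (their summed counts), which is the contrast the two measures are meant to capture.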
Table 21.3 Ranges for the SSG, NAD and Logarithmic Frequency (columns: broad groupings; SSG range; broad divisions; NAD range; broad divisions; log freq range; values: −3, −2, −1, …)
[…] /p/ suggests that the tensions between the two expectations are worked out in different ways in individual cases. At least in some languages, diachronic processes seem to have been powerfully active in eliminating instances of /p/. Comparative studies show that lenition of earlier /p/ to a fricative, such as /f, ɸ, h/, has occurred in many instances (e.g., Proto-Austronesian *puluq ‘ten’, Malagasy folo, Arosi taŋahulu). Most likely such changes are facilitated for perceptual reasons: the explosion of /p/ has the weakest burst and the most diffuse spectrum among plosives (Stevens 1999). A relative disfavoring of /p/ can also be evaluated by comparing its frequency with respect to other voiceless plosives, not to its voiced counterpart. Maddieson (in press) evaluated the relative frequency of /p/ with respect to /k/ (itself the most common consonant in more languages than any other) and found that in a sample of 43 languages with both /k/ and /p/, /p/ was less frequent than /k/ in 41.
Ian Maddieson
2.3. Velars
In the case of velar plosives, Solidarity leads to the expectation that /ɡ/ would be less frequent than /k/ on markedness grounds, and therefore would also be expected to occur in fewer inventories. In the LAPSyD sample, 96% of all languages with a voicing contrast between plosives include /k/ in their inventory. And 29% of these do not have a voiced counterpart /ɡ/, substantially more than the percentage of languages that lack /p/. The relative rarity of /ɡ/ in inventories is therefore confirmed (cf. Maddieson 2011–2014). In an ensemble of 66 languages for which relevant within-language frequency data are available either in published sources or from the author’s calculations, 56 (83%) show a clear predominance of /k/ over /ɡ/. For the remaining ten languages, in half of them /ɡ/ appears more frequently than /k/, and in the others the two are of roughly equal frequency. In this instance, unlike with bilabials, both markedness and occurrence in inventories point in the same direction with respect to within-language frequency. This is consistent with finding a much stronger tendency for /k/ > /ɡ/ than for /b/ > /p/, since in the latter case this ranking conflicts with markedness values related to plosive voicing. However, despite the well-understood phonetic reason that disfavors voiced velar plosives (small supra-laryngeal cavity size, see Ohala 1983), there are nonetheless a few languages that have more frequent /ɡ/ than /k/.
2.4. Summary
The preceding sections illustrated how /p/ is rarer in inventories and in within-language frequency than might be expected, and the same is true a fortiori for /ɡ/. In both cases, uniformity or economy would lead us to expect the presence in inventories to follow simply from the use of the relevant place and manner/voicing features, but this is obviously not fully the case. As noted, phonetic factors that are distinct for different place/manner combinations probably account for these exceptions (Ohala 1979, 1983; Maddieson 2011–2014).
3. Glottalized Stops
Even sharper deviations from the expectations of Uniformity occur in connection with stops produced with a glottal constriction. Ejective stops are produced with a full closure of the vocal folds and larynx raising. These maneuvers most effectively block any possibility of accompanying voicing. Implosive stops are typically produced with vocal fold approximation, not closure, and downward movement of the larynx, increasing the amplitude of translaryngeal flow and of voicing (Ladefoged & Maddieson 1986). Ejective and implosive stops are thus at extreme opposite poles of a voicing continuum.
Uniformity, Solidarity, Frequency
In the LAPSyD database about 14% of the languages have one or both of these classes of consonants, with a slightly larger number having implosive stops than ejective ones. The bilabial place is favored for implosives but relatively disfavored for ejectives: 93 languages have /ɓ/ vs. 61 with /p’/. While both bilabial and coronal places are quite frequent for implosives, velar place is very strongly disfavored: in fact, it only occurs in 14 languages, compared to the 93 that have /ɓ/. Uniformity thus applies very weakly to the class of implosive consonants. Rather few within-language frequency counts are available for relevant languages, but Maddieson (in press) provides a few counts. Of the seven languages with /ɠ/, in none of them is it the most frequent implosive. In the language with the most extensive data, Ma’di (mhi, South Sudan), there are 431 instances of /ɓ/, 339 instances of /ɗ/ and only 56 instances of /ɠ/ in the lexicon. As for ejectives, whereas coronal and velar places are often quite well represented, bilabial /p’/, where it does occur, is frequently extremely rare. In four of the counts reported in Maddieson (in press) there are fewer than ten instances of /p’/ in samples including hundreds of words with ejectives (fewer words are available for Qawasqar [alc, Chile]). For example, in Amharic (amh, Ethiopia), three tokens of /p’/ occur in a lexicon of 1254 words, compared to 194 tokens of /k’/ and 166 of /t’/. Thus the phonetic particularities affecting the combination of voicelessness with bilabial place and voicing with velar place apparent among simple plosives seem to have an accentuated effect among the set of glottalized stops. Both /p’/ and /ɠ/ are of very rare occurrence.
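The Ma’di and Amharic counts quoted above can be turned into relative shares and a rank order with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the counts are the ones given in the text, and the ejective keys use a plain apostrophe for convenience.

```python
def shares(counts):
    """Relative share of each segment within its class (shares sum to 1)."""
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def rank_order(counts):
    """Segments ordered from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


# Counts quoted in the text.
MADI_IMPLOSIVES = {"ɓ": 431, "ɗ": 339, "ɠ": 56}
AMHARIC_EJECTIVES = {"k'": 194, "t'": 166, "p'": 3}

madi_shares = shares(MADI_IMPLOSIVES)
amharic_shares = shares(AMHARIC_EJECTIVES)
```

On these figures /ɠ/ accounts for under 7% of the Ma’di implosives and /p’/ for under 1% of the Amharic ejectives, which is the asymmetry the paragraph describes.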
4. Double Articulations
A relatively small number of languages have a further interesting class of plosive segments involving two places of articulation at the same time. Other than labial–velars, such combinations are vanishingly rare in the world’s languages (Ladefoged & Maddieson 1986). In LAPSyD, 51 languages have /k͡p/ or /ɡ͡b/ or both, with the majority, 44, having both. There does not, therefore, seem to be any very strong bias toward presence of the voiceless, theoretically unmarked, member of this pair in inventories. Most languages with labial–velars are African, but a few occur elsewhere, especially in Papua New Guinea. Birom (bom, Nigeria) has all of the segments /p, t, k, k͡p, b, d, ɡ, ɡ͡b/ although occurrences of /k͡p, ɡ͡b/ are rather rare in the text counted (the Gospel of St. Mark). In this language /t, k/ are roughly twice as frequent in the text as /d, ɡ/, but /p/ is much less frequent than /b/, reflecting the disfavoring of voiceless bilabials. The disfavoring of voicelessness with bilabial place seems also to be reflected in doubly articulated stops that include a bilabial component, as there are only 19 occurrences of /k͡p/ vs. 33 of /ɡ͡b/. In Themne (tem, Sierra Leone) /ɡ͡b/ appears to fill a slot that might otherwise have been filled by /ɡ/ which is missing from the inventory of this
language. Among the voiced plosives the rank order of frequency here is /b/ > /d/ > /ɡ͡b/ (970 cases of /b/, 493 of /d/ and 367 of /ɡ͡b/), so /ɡ͡b/ has a frequency that would be similar to that expected for a /ɡ/. In Yoruba (yor, Nigeria), which has no /p/ in its inventory, /k͡p/ seems to fill its place. Yoruba has /b, d, ɡ/ as well as /ɡ͡b/. In Peust’s (2008) count /b/ > /d/ > /ɡ͡b/ > /k͡p/ > /ɡ/, so /k͡p/ is in a familiar place in the frequency hierarchy relative to /b/ if it is in some sense the representative of /p/ in this language, and /ɡ͡b/ is, as expected, less frequent than the “simpler” /b/. But simple /ɡ/ is strikingly rare. In Amele (aey, Papua New Guinea), it seems that it is /ɡ͡b/ that in some sense might be considered to stand in place of /p/ in the inventory. The set of plosives in this language is /t, k, b, d, ɡ, ɡ͡b/, so there is no voiceless counterpart to either /b/ or /ɡ͡b/. A forced symmetry could put /ɡ͡b/ in the “slot” where /p/ might be expected. However, voiceless plosives in this language are rare, with only 873 occurrences of /t/ in 12 chapters of the Gospel of St. Mark, and even fewer, 81, occurrences of /k/, several of which are in place names (e.g., Kapernaum) or English/Tok Pisin loans (e.g., king, buk). By contrast, there are 1491 occurrences of /b/, 3012 of /d/, 2498 of /ɡ/ and 1312 of /ɡ͡b/. Therefore, the voiced series seems to be much more basic than the voiceless in this language, and the rank order /d/ > /ɡ/ > /b/ > /ɡ͡b/ only surprises by the relative rarity of /b/. Dedua (ded, Papua New Guinea), like Birom, has all of the segments /p, t, k, k͡p, b, d, ɡ, ɡ͡b/. However, like Amele, it also has a strikingly high frequency of the voiced plosives /b, d, ɡ/ relative to their voiceless counterparts. In fact, /b, d, ɡ/ are several times more frequent than /p, t, k/ in ten chapters of the Gospel of St. Mark, but in contrast, /ɡ͡b/ is only about one tenth as frequent as /k͡p/ (38 to 376 instances). Among the voiced stops the rank order is /d/ > /ɡ/ > /b/ > /ɡ͡b/, as in Amele. Note that although both languages are in the Trans-New Guinea family, they are not particularly closely related.
5. Some Conclusions
In the inventories of stop systems, the principle of Uniformity rules to a great extent. However, there are salient exceptions, such as the relatively many languages in which /p/ and /ɡ/ are absent from inventories. Within-language frequency patterns mirror these exceptions, providing strong overall evidence for the Solidarity principle suggested by Jakobson.
6. Practical Issues
Results have been presented so far with no discussion of the issues related to conducting frequency counts. However, these cannot be avoided. Estimating cross-language occurrence in language inventories is itself complicated,
but is simplified in this chapter by relying on decisions made in the LAPSyD resource. Within-language segment frequency counts are typically based either on texts, whether written or spoken, or on a lexicon. Obviously, the relative frequency of segments in both of these types of sources will depend on a number of factors. The varying topic(s) of textual material will naturally result in different frequencies of particular words, which can bias the counts, especially if the texts are relatively short. Lexical material, drawn from dictionaries, wordlists or concordances, will inevitably be heavily dependent on the particular form chosen for the lexical entry. Typically, the material available for ready analysis is limited for all but the major world languages, for which large analyzed corpora are usually available covering both lexical and textual frequency. In the preparation of this chapter, lexical resources have been used to derive the frequency counts in a number of cases, but texts have been used for others. For example, for eight African languages (Miya, Basari, Adamawa Fulani, Ma’di, Hausa (Ader dialect, Niger), Seko, Mamvu and Dan), the RefLex database (Segerer & Flavier 2011–2018) was used. On the other hand, for Kala Lagau Ya, Themne and Dedua an online version of the Gospel of St. Mark written in a phonemic orthography was analyzed (mentions of “Christ” were omitted from the counts), and for Kota two texts in Emeneau (1944) were counted. The heterogeneity of the sources obviously creates some concerns about comparability. To some degree this concern is eased by comparisons made between lexical and textual counts in more well-known languages. To a very large degree, there is coherence between the frequencies found in text and lexical counts (at least if the lexical entry form is not grossly biased, such as would be the case if every English verb was cited in an infinitive form including “to”). At least when relatively large datasets are concerned, there seems to be a good deal of convergence between counts based on text frequency and on lexical frequency. Figure 27.1 shows the overall correlation between the frequencies of consonants in French based on a large corpus of written texts and a large corpus of lexical forms (New & Pallier 2019). The correlation between the two is very high (R² = .86). The higher values for /d, l, s/ in text are easily accounted for as due to their frequent occurrence in grammatical morphemes. This result is broadly reassuring that counts based on either text or lexical frequency will largely detect similar patterns. There are also issues concerning the transcription used. For example, in German the frequency of voiceless stops will be increased if the word-final lexically voiced stops in words like Rad ‘wheel’ are treated as voiceless (as they are phonetically, although not all differences are erased, cf. Port & O’Dell 1985; Port & Crawford 1989). It is not always possible to determine in sources how cases of, say, neutralization or assimilation which might be considered to change phoneme identity have been treated.
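The kind of text-versus-lexicon comparison described above can be sketched as a squared Pearson correlation between two lists of per-segment frequencies. This is an illustrative sketch only: it assumes that the reported R² is the squared correlation of a simple linear fit, the function names are hypothetical, and the numbers below are invented placeholders, not the French data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Squared Pearson correlation (equals R² for a simple linear regression)."""
    return pearson_r(xs, ys) ** 2


# Invented placeholder percentages for four segments, for illustration only.
toy_text_freq = [8.0, 6.5, 3.0, 1.0]
toy_lexical_freq = [7.0, 6.0, 3.5, 1.5]
toy_r2 = r_squared(toy_text_freq, toy_lexical_freq)
```

A value of `toy_r2` close to 1 would indicate that the two kinds of counts rank and scale the segments in much the same way; segments that are over-represented in running text, such as those in grammatical morphemes, would show up as points above the fitted line.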
Figure 27.1 Text vs. Lexical Frequency of Consonants in French.
Despite these challenges, through the assembling of data from many languages—even if somewhat limited in some cases—it seems clear that broad cross-linguistic patterns can be identified and established.
References
Blevins, J. 2009. Another universal bites the dust: Northwest Mekeo lacks coronal phonemes. Oceanic Linguistics 48. 264–273. Dart, S. N. 1998. Comparing French and English coronal consonant articulation. Journal of Phonetics 26. 71–94. Delattre, P. 1965. Comparing the phonetic features of English, German, Spanish and French. Heidelberg: Julius Gross Verlag. Dziubalska-Kołaczyk, K. 2002. Beats-and-binding phonology. Frankfurt-am-Main: Peter Lang. Emeneau, M. B. 1944. Kota texts, Part 1. Berkeley and Los Angeles: University of California Press. Hockett, C. F. 1955. Manual of phonology. International Journal of American Linguistics 21(4), Part 1 (Indiana University Publications in Anthropology and Linguistics, Memoir 11 of IJAL). Baltimore: Waverly Press. Jakobson, R. 1941. Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala: Almqvist och Wiksells Boktryckeri. Ladefoged, P. & I. Maddieson. 1986. Sounds of the world’s languages. Oxford: Blackwell.
Lindblom, B. 1983. The economy of speech gestures. In P. F. MacNeilage (ed.), The production of speech, 217–245. New York: Springer Verlag. Lindblom, B. 2000. Developmental origins of adult phonology: The interplay between phonetic emergents and the evolutionary adaptations of sound patterns. Phonetica 57. 297–314. Lindblom, B. & I. Maddieson. 1988. Phonetic universals in consonant systems. In L. M. Hyman, V. Fromkin & C. N. Li (eds.), Language, speech and mind, 62–78. London: Routledge. Maddieson, I. 2011–2014. Voicing and gaps in plosive systems. In M. Haspelmath, M. S. Dryer, D. Gil & B. Comrie (eds.), World atlas of language structures, 26–29. New York and Oxford: Oxford University Press. Maddieson, I. in press. Segment frequency: Within-language and cross-language similarity. Maddieson, I., S. Flavier, E. Marsico, C. Coupé & F. Pellegrino. 2013. LAPSyD: Lyon-Albuquerque phonological systems database. Proceedings of Interspeech 2013, Lyon, 25–29 August. Martinet, A. 1955. Économie des changements phonétiques. Bern: Francke. Martinet, A. 1968. Phonetics and linguistic evolution. In B. Malmberg (ed.), Manual of phonetics, 464–487. Amsterdam: North-Holland. Neuberg, S. 2016. Phonology of Berta: Language and culture archives. Dallas: SIL International. www.sil.org/system/files/reapdata/34/97/95/34979584559108017895152445720359187456/Neudorf_Berta_Phonology_2016_03_with_coversheet.pdf. New, B. & C. Pallier. 2019. Lexique. On-line resource. Université Savoie-Mont Blanc. www.lexique.org. Ohala, J. J. 1979. Phonetic universals in phonological systems and their explanation. [Summary of symposium moderator’s introduction]. Proceedings of the 9th International Congress of Phonetic Sciences, Copenhagen, 5–8. Ohala, J. J. 1983. The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (ed.), The production of speech, 189–216. New York: Springer Verlag. Palmer, B. 2009. Kokota grammar (Oceanic Linguistics Special Publication 35). Honolulu: University of Hawaii Press. Passy, P. 1891. Étude sur les changements phonétiques et leurs caractères généraux (doctoral dissertation, Faculté des lettres de Paris). Paris: Librairie Firmin-Didot. Peust, C. 2008. On consonant frequency in Egyptian and other languages. Lingua Aegyptia 16. 105–134. Port, R. F. & P. Crawford. 1989. Incomplete neutralization and pragmatics in German. Journal of Phonetics 17. 257–282. Port, R. F. & M. L. O’Dell. 1985. Neutralization of syllable-final voicing in German. Journal of Phonetics 13. 455–471. Salomon, R. G. 1996. Brahmi and Kharoshthi. In P. T. Daniels & W. Bright (eds.), The world’s writing systems, 373–382. New York and Oxford: Oxford University Press. Schadeberg, T. 1981. A survey of Kordofanian: Volume 2, The Talodi group. Hamburg: Helmut Buske. Segerer, G. & S. Flavier. 2011–2018. RefLex: Reference lexicon of Africa, version 1.1. Paris and Lyon: CNRS. http://reflex.cnrs.fr/. Stevens, K. N. 1999. Acoustic phonetics. Cambridge, MA: MIT Press. Trubetskoy, N. S. 1939. Grundzüge der Phonologie (Travaux du Cercle Linguistique de Prague 7). Prague: Le Cercle Linguistique de Prague.
Zipf, G. K. 1929. Relative frequency as a determinant of phonetic change. Boston: Department of the Classics, Harvard University.
Zipf, G. K. 1935. The psycho-biology of language: An introduction to dynamic philology. Boston: Houghton Mifflin.
Zipf, G. K. 1949. Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, MA: Addison-Wesley [reprinted 2012, Eastford, CT: Martino Fine Books].
Index
accentedness ratings 188–189, 192, 194
accidental gaps 130
accommodation 318
activation 258–259
adaptation 327–328
affrication 70–71, 73, 204–208
African languages 375, 377
agreement 77–82
alignment 110, 111, 118, 209
Amele 376
Amharic 375
amplitude envelope 110, 115–119, 125
Anatolian 76–77, 81–82
Arabic 213
aspiration languages 322
assimilation 53–54, 60–61, 69, 202–207, 213–220
assimilatory processes 274
asymmetrical CLI 322–324
attitude(s) 87, 91, 97–98, 190, 194
attrition 185–187, 190
awareness 88, 91–92, 94–96, 101
back vowels 54, 57, 59
Beats and Binding theory 133–134, 303
bilabial stops 373
Bilingual Dominance Scale (BDS) 247
bilingual figurative processing 246
bilinguals 186, 189, 258–260, 262
binarity 112–114, 125
Birom 375
boundaries 161
Calibration Law 70
CASUM project 361–368
Celtic 76–78, 81–82
cerebellar dysfunctions 227, 229
cerebellum 227–229, 235, 238
Chinese 213–216
classroom discourse 359, 361, 368
CLI see cross-linguistic interaction/influence
cluster(s) 301–303, 305–314
coarticulation 274
compensatory lengthening 26, 39
competition 258–261, 267
competitor 259, 267
complexity 301
computer-assisted pronunciation training (CAPT) 270–273
computer-generated visual feedback 270–272, 276
connected speech 202–203
consonantal language 145
consonant clusters 145–157
contact epenthesis 72
coronal: consonants 60–61; stops 372–373
corpus 198–204, 209–210, 305, 306, 309–314
correctness 88, 90–92, 94, 96, 98–99, 101, 104–105
crazy rules 140
cross-language phonetic relationships 172–174
cross-linguistic interaction/influence 316–317, 321–324
cross-modal priming 259, 261
CVCV phonology 163
Danish 172–173
dark l 43, 45–46, 50
data (dimensionality) reduction 275
Dedua 376
dental l 42–51
depalatalisation 60, 63
Derivational Optimality Theory 24, 33–35, 39
Desensitization Hypothesis 175
development 260
devoicing 69, 73
dialect leveling 203, 208–209
diphthongs 25–26, 38
distance 112, 118, 120, 122, 125
domain-general vs. domain-specific explanations 131–132
Donegan, P. 67, 68, 69, 70–74
double articulation 375–376
Down Syndrome 271
Dravidian 77
Dutch 222–223
ejectives 374–375
electromagnetic articulometry 271
electropalatography (EPG) 270–276
emotional prosody 356
emphatic stress 335–336
English phonology 24–41
epenthesis 71–72
equivalence classification 316–317, 322
ethnic affiliation 330, 334
evaluation (problem) 88, 95, 105
event-related potential (ERP) 244–256
expert learners 328, 330
extralinguistic factors 186–187, 189, 191, 194
facilitation 258, 260
feature theory 322
feminine 76–82
figurative language processing 244
final obstruent devoicing 163
first language acquisition 270, 272
fluency 109, 111, 118–120, 122–123, 125
folk linguistics/FL 87–88, 96–99, 101, 102–104, 106
foreign accentedness/accent 188–189, 194
foreign accent reduction 272, 275
foreign language pronunciation teaching 270–273
formants see vowel formants
fortition 68–69
French 53, 59, 214, 217, 219–222, 377–378
frequency 109–111, 115–118, 120, 370–378
fricative 67
front vowels 53–55, 58, 60, 63
functional approaches 132
fused marker 79–81
Gaelic 223
gender 76–83
generalized phonetic sensitivities 178–180
German 145–157, 214, 221–222, 377
glottalized stops 374–375
Graded Salience Hypothesis (GSH) 245–249, 254–256
Greek 214
Hawaiian 372
Head Law 70
Hindi-Urdu 218
identity 88
idiomatic expressions 250, 252, 254
implosives 374–375
Indo-Aryan 78, 80
Indo-European 76–78, 80–82
Indo-Iranian 76–78, 81–82
infant vowel perception 175–177
inhibition 259–260
initial clusters (in Slavic) 134–137
integrated perspective 191, 193
interspeaker variability 207
intervocalic epenthesis 71–72
intonation 111, 115, 119, 349, 351–352
intrusive r 218
Irish 53–63
isochrony 111–112, 114–115, 119, 125
Italian 53, 219, 221–222
Japanese 216, 218
Jassem differencer 212
Kala Lagau Ya 373
Kokota 372
Kota 373, 377
L1 maintenance 191
L1 Polish 320–322
L2 258–263, 266–267
L2 acquisition 171–184, 187–188
L2 English 317–320
language contact 76, 80–81
language ideology 103, 106
language proficiency 326–327
LAPSyD database 370–377
laryngeal licensing 164–166
laryngeal phonology 322–324
Laryngeal Realism 159, 322
Laryngeal Relativism 159
Latin 214, 220, 222–223
left hemisphere 245, 254
lenition 68–69
lexical competition 260
lexical decision task 263
Lexical Restructuring Model 261, 267
licensing inheritance 165
literal salience hypothesis 255
literal sentences 245
logarithmic frequency 305–306
l-vocalization 215
machine learning 110–111, 114, 125
Ma’di 375
magnetic resonance imaging 271
Mandarin Chinese 172–173
markedness 301, 303, 305–306, 308, 310–311, 314
mental lexicon 247, 249, 254–255
metanalysis 214
metric 112, 114–115, 119, 123, 125
modularity of the mind 140–141
modulation 115–116
Modulation Theory 324
monophthongization 28, 33, 36, 39
morphonotactics 145–157
motor timing tasks 227–228
multilingualism 195
N400 244–247, 250
NAD see Net Auditory Distance
narratives 192–194
nasality 90, 104
native speakers 260–262, 264–267
Natural Phonology 66–68, 132–133
natural referent consonants 177–178
Natural Referent Vowel framework 175–178
Nepali 76, 78–82
Net Auditory Distance 150–151, 154–156, 301, 302–303, 305–306, 308–314
‘neutral’ accent 329
non-native speakers 259–261, 264–267
non-palatal consonants 53–58, 60–63
Norwegian 42–51
numeral 76–82
numeral classifier 76, 78–82
onset 70
Onset Prominence 159, 323–324
onset reinforcement 70
Optimality Theory 25–37
Optimal Sonority Distance Principle 303
oscillation 110, 115, 118, 125
Oslo Norwegian 42–51
Østfold-l 42, 50
Østfold dialect 42, 46, 47–50
otherness 329, 335
overgeneration 137–138
overlap 259, 260, 262
P300 244
P600 244–247, 252–255
palatal consonants 53–63
palatalisation 53–56, 58–61
parametric licensing 166–167
PEPS-C 350
perception 68–69, 70, 89, 99, 104–105
Perceptual Assimilation Model (PAM) 172–173
perceptual asymmetries 175–178
periodicity 115, 117, 122–124
phoneme inventory 66–67
phonetic convergence 317–320
phonetic drift 321–322
phonetic experience 258, 260–262, 264–267
phonetic research 273–274
phonetic(s) 87–88, 90, 92, 96, 98–99, 101, 104, 106
phonetic training 260, 267
phonological processes 67, 68, 69
phonological differences 259
phonology 87–90, 92, 95–96, 98–99, 103–106
phonotactics 145–157, 301–303
phrasing 349, 351–352, 354–356
pitch accent 110, 115, 119
plosives 371
Polish 54–55, 338–347
Polish (im)migrants 190, 192–193, 326–327, 329–330, 335
Polish voicing 160–162
Polish vowels: allophony 344–345; formants 338–347
Poznań Polish 202–209
preferability 301, 306
preferred clusters 150–152, 154–156
prefixing language 146
pre-sonorant voicing 202–209
primary articulation 53, 62
prime 260–267
priming 259–261, 263, 267
processes 202
pronunciation pedagogy 270–273
prosodic typology 69
prosody 68–69, 70, 73, 349–351, 355–356
Proto-Austronesian 373
Proto-Indo-European 76, 81–82
Qawasqar 375
QTA (Questioning the Author) approach 360–362
raising 204–208
randomised controlled trials 271, 273, 276
rate 112, 117, 119, 123
regressive voicing assimilation 161
response time 264–265
retroflex l 42–51
rhythm 109–112, 115–119, 121–125
rhythm zone 109–111, 115–119, 121–125
Russian 53–63, 145–157, 223
Sanskrit 213
secondary articulation 53, 58
second language learning 258
Sentinel Island 129
shortening 31–32, 34, 36–37
sibilant 67
signal processing 111, 116, 123, 125
Slavic 55, 167
sociolinguistic interview 190–191, 194
solidarity 371, 373–374, 376
sonorants 53–56, 62
sonority 302, 305–306, 308–311, 314
sonority scale 302–303
Sonority Sequencing Generalization 301–302, 314
sound change 42–51, 105–106
sound sweep 351–353, 356
southern drawl 88–89, 91, 96
Spanish 53, 173–174, 222
spectrum 111, 115–121, 123
speech 109–112, 114–115, 117–120, 124–125
speech community 87, 105–106
Speech Learning Model (SLM) 172, 316, 322
speech therapy 271, 275–276
spelling 88–90, 92–94, 96
spontaneous speech 199–200
SSG 302–303, 305–313
Stampe, David 66–73
stereotype(s) 99, 104
stop inventories 370–376
strengthening 68–69, 70
syllable 70
syllable structure 70, 71
systematic gaps 130
T@blit project 360, 368
Tamil 219
tapping 228–229, 231, 235, 238
target 258–260, 262–267
target variety 327, 330
Themne 373, 375
third factor explanations 131
Tibeto-Burman 76, 80
time 109–112, 114, 119, 125
token frequency 301, 305, 314
tone 110–111, 115, 119
transcription 198–199, 202–203, 209
true-voice languages 322
Turkish 216
type frequency 301, 305, 314
typology 66
Ukrainian voicing 167
ultrasound imaging 271, 275
unidirectional IDENT 24, 26, 39
uniformity 370–371, 373, 375–376
Universal Grammar 130–131
universal phonetic biases 174–178
variation 203–205, 210
velarized l 42–51
Vennemann 70–71
Vietnamese 220
visual feedback 270–272, 275–276
voice onset time 317–322
voicing 202–209
vowel decomposition 25–26, 29, 38–39
vowel formants 111, 338–347
vowels 259, 261–262, 266–267
Wagner Quadrant 113–114
word recognition 258–261, 264–267
Yoruba 376
Zamucoan 83