Universal or Diverse Paths to English Phonology 9783110346084, 9783110345926

The book is concerned with the acquisition of English phonology, both segmental and suprasegmental, by learners of Engli


English Pages 255 [256] Year 2015


Table of contents
1. Introduction
2. The phonology of Brunei English: L2 English or emergent variety
3. Rhoticity in Malaysian English: The emergence of a new norm?
4. Cross-linguistic influence in second vs. third language acquisition of phonology
5. Differences in the perception of English vowel sounds by child L2 and L3 learners
6. Loanword adaptation and second language acquisition: Convergence and divergence
7. Onset consonant cluster realisation in Nigerian English: The emergence of an endogenous variety?
8. Acquiring English and French speech rhythm in a multilingual classroom: A comparison with Asian Englishes
9. A sonority-based account of speech rhythm in Chinese learners of English
10. English word stress in L2 and postcolonial varieties: systematicity and variation
11. Prosodic marking of focus in transitive sentences in varieties of South African English
12. Epilogue: Universal or diverse paths to English phonology?
Index

Ulrike Gut, Robert Fuchs, Eva-Maria Wunder (Eds.) Universal or Diverse Paths to English Phonology

Topics in English Linguistics

Edited by Elizabeth Closs Traugott Bernd Kortmann

Volume 86


ISBN 978-3-11-034592-6
ISBN (PDF) 978-3-11-034608-4
ISBN (EPUB) 978-3-11-039458-0
ISSN 1434-3452

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2015 Walter de Gruyter GmbH, Berlin/Boston
Typesetting: RoyalStandard, Hong Kong
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com

Table of contents

Ulrike Gut, Robert Fuchs & Eva-Maria Wunder
1 Introduction

David Deterding
2 The phonology of Brunei English: L2 English or emergent variety

Stefanie Pillai
3 Rhoticity in Malaysian English: The emergence of a new norm?

Magdalena Wrembel
4 Cross-linguistic influence in second vs. third language acquisition of phonology

Romana Kopečková
5 Differences in the perception of English vowel sounds by child L2 and L3 learners

Hemalatha Nagarajan
6 Loanword adaptation and second language acquisition: Convergence and divergence

Taiwo Soneye & Kehinde Ayoola
7 Onset consonant cluster realisation in Nigerian English: The emergence of an endogenous variety?

Christoph Gabriel, Johanna Stahnke & Jeanette Thulke
8 Acquiring English and French speech rhythm in a multilingual classroom: A comparison with Asian Englishes

Robert Fuchs & Eva-Maria Wunder
9 A sonority-based account of speech rhythm in Chinese learners of English

Heidi Altmann & Barış Kabak
10 English word stress in L2 and postcolonial varieties: systematicity and variation

Sabine Zerbian
11 Prosodic marking of focus in transitive sentences in varieties of South African English

Ulrike Gut
12 Epilogue

Index

Ulrike Gut, Robert Fuchs & Eva-Maria Wunder

1 Introduction¹

Scientific investigations of the acquisition of English phonology are currently carried out in several different disciplines: in second language acquisition research, third or additional language acquisition research and in the sociolinguistic study of the phonologies of the “New Englishes”, the world-wide varieties of English that have developed in the past 200 years. Researchers working in these three disciplines all share the common goal of shedding light on the structures and regularities of learner English phonology and on the role that the learners’ other languages play in the acquisition process. This includes investigations into the course, rate and final outcome of the acquisition of segmental and prosodic properties of English as well as of the factors constraining and influencing this process of phonological acquisition. Yet, despite these common goals, researchers from the different disciplines have so far mainly worked side by side without employing, testing and modifying each other’s theories and methods. Each discipline seems to have developed its own theoretical focus and methodological approaches. This disparity is also reflected in the fact that for these disciplines nearly completely separate research communities exist that share neither specialist conferences nor journals for the publication of their research.

Having its roots in the late 1960s, the scientific study of second language acquisition (SLA) is a relatively young discipline in linguistics. Early theoretical advances included the insight that language produced by language learners is systematic and can be described with the same rules and constraints as any other natural language (e.g. Corder 1967; Selinker 1972). While initially an applied discipline focussing on language teaching strategies, by the mid-1980s SLA research had emerged as a distinct research field with a theoretical orientation and methodology of its own. Current models of learner phonology (e.g.
Best 1995; Flege 1995; Brown 2000; Major 2001) are predominantly concerned with three major issues: the relationship between the speaker’s first (L1) and second language (L2), the role of language universals and the influence of non-linguistic factors on the acquisition rate, process and outcome (e.g. Piske, MacKay, and Flege 2001). Phonemic substitutions and deviations on the prosodic level in learner speech have been explained and predicted by rule-driven approaches, underlying representations, language universals and prosodic hierarchies. Non-linguistic factors, which comprise such diverse concepts as motivation, age of first contact with the L2, length of residence, musical ability, type of instruction and continued L1 use, are investigated in order to explain the variability across speakers in L2 phonology and to predict the ultimate outcome of second language learning (see e.g. Gut 2009, chapter 9).

¹ The idea for this volume was conceived at the workshop “Universal or diverse paths to English phonology”, which was held at the University of Münster in September 2012. We are grateful to the VW Foundation for funding this event. Our thanks also go to the series editor Bernd Kortmann for his helpful comments, to our reviewers Gessica De Angelis, Björn Hammarberg, Magnus Huber, Mary O’Brien and Volker Dellwo and to Silke Elisabeth Stagg for her assistance with the editing process.

In the 1980s, a new discipline in English linguistics arose that focussed on the emerging postcolonial varieties of English. These are spoken in – usually multilingual – countries in which English has an important status as an official language and where it is used in business and commerce, education, media, mass communication and as a lingua franca for interethnic communication. Most empirical studies and comparative overviews (e.g. Hughes and Trudgill 1996; Schneider et al. 2004; Schneider 2007; Mesthrie and Bhatt 2008) to date focus on the description of the structural properties of these New Englishes, often in comparison with standard varieties of English. Divergences are interpreted as a result of language contact between English and the other languages spoken in the respective country (e.g. Schneider 2003: 248), and are considered indicators of the degree of “nativisation”, i.e. the development of a distinct linguistic shape that a particular variety has undergone (e.g. Gut 2007). Only a few studies exist so far that explore the contribution of other factors, such as colonial input, teaching traditions and population migration, to the phonologies of new varieties of English (e.g. Simo Bobda 2003). An increasing number of studies furthermore aim to find angloversals, i.e. shared structural properties across the new Englishes (e.g. Mair 2003; Szmrecsanyi and Kortmann 2009).
An even younger discipline, barely 15 years old, is concerned with research on the acquisition of a third or additional language (L3/Ln) (e.g. Cenoz and Jessner 2000; Cenoz, Hufeisen, and Jessner 2001; De Angelis 2007). Its research focus centres around the fact that L3/Ln learners have already acquired an L2 and thereby have gained conscious linguistic knowledge and language-learning experience on which they can potentially rely when learning a further language. The central hypothesis of this discipline is thus that L3/Ln learners can draw from more and different linguistic competence than monolingual language learners can in second language acquisition, and that L3/Ln language production and perception consequently differ from L2 production and perception by the complexity of potential sources for cross-linguistic influence. While for L2 speakers cross-linguistic influence is restricted to transfer between two languages, in an L3/Ln speaker’s mind at least three linguistic systems interact. It is one of the principal aims of L3/Ln acquisition research to explore how this

interaction works and how cross-linguistic influence may affect the trilingual or multilingual speaker’s language production and comprehension. Current knowledge on both the types of phonological cross-linguistic influence and the conditioning factors, however, is still minimal.

In short, in a very simplified way one could claim that research in SLA so far has focussed mainly on the acquisition process, research on varieties of English primarily on the acquisition outcome, while L3/Ln acquisition research has been predominantly concerned with the factors influencing both acquisition process and outcome. Joining forces of these largely complementary research efforts thus seems highly desirable.

Moreover, the three disciplines do not only differ in their research focus, but also in their preferred research methods. Second language acquisition research employs a wide spectrum of methods. Although focussing on developmental processes, studies mostly rely on cross-sectional data that compares learners at different developmental stages; only a few investigations are longitudinal in character and describe the development of individual learners. The majority of studies in second language phonology favours experimental data and tends to be based on a relatively small empirical base with a limited number of participants and the restriction to one particular speech style (see Gut 2009, chapter 2). The “second” language for participants in SLA studies typically either constitutes the language of the country they immigrated to and live in permanently or a language they learn at school as a “foreign” language.

Research on the phonologies of new Englishes also uses a wide range of data, including experimentally elicited as well as “naturally” occurring language samples. The vast majority of studies employs a cross-sectional approach and compares the phonological properties of a particular variety of English with a standard variety such as British or American English (e.g.
Deterding 2001). Increasingly, comparative studies across several new English varieties are carried out (e.g. Pillai, Manueli, and Dumanig 2010), or the new English phonological structures are compared with the phonologies of the indigenous language/s spoken in the country (e.g. Gut 2005; Hoffmann 2011). In contrast to many studies in SLA research, the learners investigated in the context of varieties of English are typically highly advanced, use English on a regular basis and live in an environment where the language plays a crucial role in many aspects of their life. For an increasing number of speakers, moreover, English is becoming the first or one of the first languages they acquire. In contrast to research in SLA and new varieties of English, early studies in L3/Ln phonological acquisition were longitudinal: they were based on individual case studies that closely followed the language learner’s development for the first year of acquisition and recorded L3/Ln production data at regular

intervals (Hammarberg and Williams 1993; Hammarberg and Hammarberg 2005). Recently, corpus-based studies that comprise various speaking styles have been carried out (e.g. Gut 2010; Wrembel 2010; Wunder 2011), and it is becoming standard methodology to elicit data from the speaker in all of his or her languages. Typically, these learners acquire their L3/Ln as a “foreign” language in a formal school setting.

Suggestions for combining efforts across disciplines date back nearly thirty years (e.g. Sridhar and Sridhar 1986), but only very recently have comparative studies of second language learners and speakers of a postcolonial variety of English begun to be carried out (e.g. Nesselhauf 2009; contributions in Mukherjee and Hundt 2011; Davydova 2012), of which only one so far has been concerned with phonology (Gut 2007). It is the aim of this volume to tackle some of the fundamental questions in research on the acquisition of English phonology and to thus bridge the theoretical and methodological gap between the three disciplines that was sketched above. In particular, the contributions in this volume will address the following issues:
– The commonalities and differences in the phonological properties of English as a second language, a third/additional language and a new variety.
– The viability of the distinction between English as a second, a third/additional and a foreign language on phonological grounds.
– The role of cross-linguistic influence, of language universals and of non-linguistic factors in the acquisition of English phonology and their constraints.
– The methods of investigating the path and outcome of phonological acquisition of English as a second language, a third/additional language and a new variety.

This volume is structured into two parts. The first part comprises six contributions on the acquisition of English vowels and consonants and their distribution in syllables. The second part comprises four chapters on the acquisition of English prosody.
In chapter 2, David Deterding analyses rhoticity in Brunei English and investigates factors that influence it. He shows that young Brunei English speakers differ noticeably in their realisation of coda /r/ and that this is unrelated to other phonological features in their speech. In conclusion, he argues that Brunei English speakers aim at a dynamic global style of English that at the same time maintains some L1 features. Moreover, he suggests that the dichotomy between a “new” variety of English and English as an L2 is becoming less relevant in the modern world. Rhoticity is also investigated by Stefanie Pillai in chapter 3. Focussing on Malaysian English, she compares older and younger, ethnically Malay, Indian

and Chinese as well as L1 and L2 speakers of English. The results show very little evidence of rhoticity across all speaker groups, with only a few speakers having variably rhotic speech. This predominant non-rhoticity of Malaysian English is explained with the speakers’ norm orientation towards the model of British English. Magdalena Wrembel’s study in chapter 4 is concerned with the acquisition of voice onset time (VOT) in the voiceless plosives /p,t,k/ in stressed onset positions by multilingual speakers. She reports that they produce VOT values in their L3 French that are significantly distinct from the VOT values in their L2 English and their L1 German. VOT values between their L1 and L2 English, by contrast, do not differ, which suggests cross-linguistic influence from the L2 English onto the speakers’ L1. The length of instruction in the L2 English as well as self-assessed proficiency in the L3 emerged as significant factors that influence VOT values in the L3. Moreover, Wrembel’s study reveals strong universal effects of the place of articulation and some effects of the vowel context on VOT patterns in all three languages. In chapter 5, Romana Kopečková investigates the vowel perception of Polish L1 children who moved to Ireland. Comparing those who learn English as their only L2 with children who learn it alongside other languages, she demonstrates that the two groups differ in their ability to perceive contrasts between Polish and (Irish) English vowels. Children acquiring English together with other L2s demonstrate a greater perceptual sensitivity and are better able to distinguish the contrasts between /æ/ and /a/, /i/ and /ɪ/ as well as /ɔ/ and /əʊ/, respectively. 
Although both groups of children use the same cognitive mechanism of equivalence classification in their perception of L2 vowels, the superior perceptual abilities of children who have more than one L2 seem to reflect their greater language awareness and experience as well as positive cross-linguistic influence from their other languages. Chapter 6 by Hemalatha Nagarajan compares the realisation of English loanwords in Bangla (Bengali), an Indo-European language spoken in India and Bangladesh, with the variety of English spoken by Bangla speakers as a second or third language. Using an optimality-theoretic (OT) framework, she shows that English loanwords in Bangla are constrained by slightly different phonological rules than English words produced by Bangla speakers in English as an L2/L3: while in loanwords both coda and onset consonant clusters are broken up by epenthesis, this does not occur in acrolectal Bangla English. Nagarajan furthermore proposes a developmental sequence of the acquisition of English syllable structure as a re-ranking of various phonological rules. In chapter 7, Taiwo Soneye and Kehinde Ayoola investigate the production of onset clusters by Nigerian speakers of English. Comparing speakers from

different regional and ethnic backgrounds, they found that the production of epenthetic vowels in order to break up CC and CCC onset clusters is influenced by the speakers’ linguistic background. In addition, their research reveals that speakers of Nigerian English as an L1 produce some onset clusters that L2 Nigerian English speakers do not. In chapter 8, Christoph Gabriel, Johanna Stahnke and Jeanette Thulke investigate the acquisition of speech rhythm. Comparing L1 German and L1 Mandarin Chinese speakers of L2 English, they find evidence of cross-linguistic influence from the L1, which results in more target-like values for L1 German speakers. When producing speech rhythm in L3 French, however, the opposite picture emerges. An analysis of multilingual – speaking both Mandarin Chinese and German – learners’ speech rhythm in their L3/Ln English showed inconsistent patterns that overlap with both monolingual learner groups. These can be explained with the learners’ attitudes and their multilingual awareness which appear to favour positive transfer from an earlier acquired language onto their L3 English. This kind of transfer is also suspected to underlie the speech rhythm of the new English varieties spoken in Singapore, Taiwan and Hong Kong. The acquisition of speech rhythm is also the focus of chapter 9 written by Eva-Maria Wunder and Robert Fuchs. The authors investigate Mandarin Chinese speakers’ speech rhythm in English with a new sonority measurement. Contrary to their expectations they find strong evidence of cross-linguistic influence from the native language in the speech of these advanced learners and thus no difference from speakers of a new variety of English with Mandarin Chinese as heritage language. In chapter 10, Heidi Altmann and Barış Kabak investigate the perception and production of stress placement in English as a second language and in a new variety of English. 
An experiment involving the production of stress on nonce words suggests universal learner strategies: L2 English speakers with very different L1 backgrounds have a preference for placing primary stress on the final syllable. A perception experiment confirms that even highly advanced L2 English speakers are, in some cases, impervious to evidence and follow their own stress assignment rules. One such rule for speakers of Cameroon and Nigerian English is the reliance on the segmental content of the final rhyme. The authors suggest that these fossilised stress representations by individual speakers might have led to the distinct word stress systems in emerging varieties of English. Chapter 11 by Sabine Zerbian compares prosodic strategies of marking focus and givenness in three varieties of South African English: White South African English, which is spoken as an L1, the L2 variety Black South African English and a “crossing over” L2 variety, which is spoken by a recently established black middle class and which differs markedly from Black South African

English in segmental and prosodic aspects. Zerbian shows that focus is marked prosodically in White South African English, but neither in Black South African English nor in the crossing over variety. In contrast, givenness is marked prosodically in similar ways in both White South African English and the crossing over variety, but shows different acoustic correlates in Black South African English. The volume closes with an epilogue by Ulrike Gut, in which she reviews the findings of all contributions to this volume in order to answer the research questions that were specified above.

References

Best, Catherine T. 1995. A direct realist view of cross-language speech perception. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 171–204. Timonium, MD: York Press.
Brown, Cynthia. 2000. The interrelation between speech perception and phonological acquisition from infant to adult. In John Archibald (ed.), Second language acquisition and linguistic theory, 4–63. Oxford: Blackwell.
Cenoz, Jasone, Britta Hufeisen & Ulrike Jessner (eds.). 2001. Cross-linguistic influence in third language acquisition: Psycholinguistic perspectives. Clevedon, UK: Multilingual Matters.
Cenoz, Jasone & Ulrike Jessner (eds.). 2000. English in Europe: The acquisition of a third language. Clevedon, UK: Multilingual Matters.
Corder, Stephen Pit. 1967. The significance of learners’ errors. International Review of Applied Linguistics 5. 161–169.
Davydova, Julia. 2012. Englishes in the Outer and Expanding Circles: A comparative study. World Englishes 31 (3). 366–385.
De Angelis, Gessica. 2007. Third or additional language acquisition. Clevedon, UK: Multilingual Matters.
Deterding, David. 2001. The measurement of rhythm: A comparison of Singapore and British English. Journal of Phonetics 29 (2). 217–230.
Flege, James Emil. 1995. Second-language speech learning: Theory, findings, and problems. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 233–277. Timonium, MD: York Press.
Gut, Ulrike. 2005. Nigerian English prosody. English World-Wide 26 (2). 153–177.
Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26 (3). 346–359.
Gut, Ulrike. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Gut, Ulrike. 2010. Cross-linguistic influence in L3 phonological acquisition. International Journal of Multilingualism 7 (1). 19–38.
Hammarberg, Björn & Sarah Williams. 1993. A study of third language acquisition. In Björn Hammarberg (ed.), Problem, process, product in language learning, 60–70. Stockholm, Sweden: Stockholm University.
Hammarberg, Björn & Britta Hammarberg. 2005. Re-setting the basis of articulation in the acquisition of new languages: A third-language case study. In Britta Hufeisen & Robert J. Fouser (eds.), Introductory readings in L3, 11–18. Tübingen: Stauffenburg Verlag.
Hoffmann, Thomas. 2011. The Black Kenyan English vowel system. English World-Wide 32 (2). 147–173.
Hughes, Arthur & Peter Trudgill. 1996. English accents and dialects: An introduction to social and regional varieties of British English. London: Arnold.
Mair, Christian. 2003. Kreolismen und verbales Identitätsmanagement im geschriebenen jamaikanischen Englisch. In Elisabeth Vogel, Antonia Napp & Wolfram Lutterer (eds.), Zwischen Ausgrenzung und Hybridisierung: Zur Konstruktion von Identitäten aus kulturwissenschaftlicher Perspektive, 79–96. Würzburg: Ergon.
Major, Roy. 2001. Foreign accent: The ontogeny and phylogeny of second language phonology. Mahwah, NJ: Lawrence Erlbaum.
Mesthrie, Rajend & Rakesh M. Bhatt. 2008. World Englishes: The study of new language varieties. Cambridge: Cambridge University Press.
Mukherjee, Joybrato & Marianne Hundt (eds.). 2011. Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap. Amsterdam and Philadelphia: John Benjamins.
Nesselhauf, Nadja. 2009. Co-selection phenomena across New Englishes. English World-Wide 30 (1). 1–25.
Pillai, Stefanie, Maria Khristina Manueli & Francisco Perlas Dumanig. 2010. Monophthong vowels in Malaysian and Philippine English: An exploratory study. Philippine Journal of Linguistics 41. 80–93.
Piske, Thorsten, Ian R. A. MacKay & James Emil Flege. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29. 191–215.
Schneider, Edgar W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79 (2). 233–281.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Schneider, Edgar W., Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.). 2004. A handbook of varieties of English. Volume 1: Phonology. Berlin: Mouton de Gruyter.
Selinker, Larry. 1972. Interlanguage. International Review of Applied Linguistics in Language Teaching 10. 209–231.
Simo Bobda, Augustin. 2003. The formation of regional and national features in African English pronunciation: An exploration of some non-interference factors. English World-Wide 24 (1). 17–42.
Szmrecsanyi, Benedikt & Bernd Kortmann. 2009. Vernacular universals and angloversals in a typological perspective. In Markku Filppula, Juhani Klemola & Heli Paulasto (eds.), Vernacular Universals and Language Contacts: Evidence from Varieties of English and Beyond, 33–53. London/New York: Routledge.
Sridhar, Kamal & S. Sridhar. 1986. Bridging the paradigm gap: Second language acquisition theory and indigenized varieties of English. World Englishes 5 (1). 3–14.
Wrembel, Magdalena. 2010. L2-accented speech in L3 production. International Journal of Multilingualism 7 (1). 75–90.
Wunder, Eva-Maria. 2011. Cross-linguistic influence in multilingual language acquisition: Phonology in third or additional language acquisition. In Gessica De Angelis & Jean-Marc Dewaele (eds.), New trends in crosslinguistic influence and multilingualism research, 105–128. Clevedon, UK: Multilingual Matters.

David Deterding

2 The phonology of Brunei English: L2 English or emergent variety

1 Introduction

The status of different Englishes depends substantially on whether they constitute varieties of English as a first language, English as a second language (ESL), or English as a foreign language (EFL). This three-way distinction has alternatively been represented in terms of the Three Circles model proposed by Kachru (1985, 2005): first-language varieties are found in Inner-Circle countries like the UK, the USA and Australia; ESL varieties occur in Outer-Circle places that were once colonies and where the language now has an official status, such as India, Nigeria and Singapore; and EFL varieties exist in countries in the Expanding Circle where English has no official status, including Germany, Brazil and China.

One crucial difference between the status of varieties of English in the Three Circles, as noted by Kachru, is that Inner-Circle varieties have traditionally established the norms, Outer-Circle varieties increasingly develop their own standards independent of the patterns of usage found in the Inner Circle, and Expanding-Circle varieties generally continue to look to the Inner Circle for guidance on how English should be used. To a certain extent, this distinction between the Circles resonates with attitudes in different places, as people in Outer-Circle countries such as Singapore are usually comfortable with their own style of English, particularly in terms of pronunciation, and most are quite proud to sound Singaporean (Deterding 2007), while people in places such as Poland more often tend to insist that they aspire to RP British English pronunciation (Scheuer 2005; Sobkowiak 2005) and may be upset or even insulted to be told that they speak with a Polish accent.

Despite its widespread adoption and continued usefulness, there are some problems with Kachru’s Three-Circle representation of varieties of English.
The status of different places is determined by history and geography, and some Expanding-Circle countries where English is quite widely used, such as Argentina and Belgium, might alternatively now be regarded as indeterminate between the Outer and Expanding Circles (Jenkins 2009: 20). Furthermore the model fails to capture many of the dynamic ways in which English is being used in today’s globalised world (Cogo and Dewey 2012: 9). In addition, Seidlhofer (2011) argues that there is no need for speakers in the Expanding Circle to continue to be classified as norm-dependent and so be excluded from contributing to the ways that English is evolving, especially as they nowadays constitute the majority of speakers of English in the world (Crystal 2003: 69).

David Deterding, University of Brunei Darussalam

An alternative way of conceptualising the evolution of English in different places is by means of the Five-Phase Model of postcolonial development proposed by Schneider (2007). The first phase deals with the introduction of English into a territory where it was not previously used, while the fifth phase involves the emergence of diversity in a completely mature variety. Only Inner-Circle varieties such as those of the United States and Australia are considered to have reached the fifth phase, though it is possible that English in some places such as Singapore is in the process of achieving this status.

Current research on Brunei English (e.g. Deterding and Salbrina 2013: 119) suggests that it may be in the third phase of Schneider’s model, labelled ‘nativization’, in which the variety is still subject to substantial external influences, as indigenous norms are not yet established. Although Brunei English is certainly developing many distinct local characteristics and it may well one day evolve to establish its own norms of pronunciation, lexis and usage, it still seems to be subject to influence from Inner-Circle varieties. In fact, school exam papers are still set in the UK and then sent to the UK to be graded, and furthermore many British teachers are employed in local schools, so the historical link with Britain continues. In addition, there may be substantial influence from American English, something that will be analysed in this paper.

The paper examines aspects of pronunciation, particularly the apparent increasing incidence of rhoticity among young people in Brunei, as this is something that seems currently to be undergoing a transition.
The investigation of rhoticity may thus provide a window onto the status of Brunei English and help to establish if it is a second language variety or if it is emerging as a variety that may one day become independent of external norms of pronunciation and usage.

2 The phonology of Brunei English

An early investigation into the pronunciation of Brunei English was carried out by Mossop (1996). Based on auditory judgements, he described a range of features, including the use of [t] and [d] for /θ/ and /ð/ (consonants that will here be termed ‘voiceless TH’ and ‘voiced TH’, following the convention established by Wells 1982), the omission of final plosives from words such as first and
hand, the shortening of long vowels in words such as shirt, moon and cream, the merging of /e/ and /æ/ (vowels that will here be referred to as DRESS and TRAP, using the lexical keywords proposed by Wells 1982), and the avoidance of vowel reduction in the second syllable of words such as frigate and mammal. Mossop made no mention of rhoticity, apart from a brief comment about the lack of final [r] when the vowel in words such as square, chair and hair is shortened to [e] (1996: 201). While it is possible that he failed to notice rhoticity among his speakers, or alternatively that he believed it did not merit discussion, it is perhaps more likely that the widespread incidence of rhoticity in Brunei English is a recent phenomenon. Ten years later, Salbrina (2006) investigated the vowels of Brunei English using acoustic measurements as well as auditory judgements, and she confirmed the tendency for long and short vowels (such as FLEECE and KIT ) to be merged and showed that there was also little distinction between DRESS and TRAP. After a further four years, Salbrina (2010) included the study of consonants in her research on the pronunciation of eighteen ethnically Malay female undergraduates in Brunei reading an early version of the Wolf Passage (Deterding 2006), and she reported that about 52% of the tokens of thought, threaten and third in her data had [t] rather than [θ] at the start, and the final plosive in words such as fist and feast was omitted in about 62% of tokens. In addition, she reported that half of her speakers might be classified as rhotic. Salbrina and Deterding (2010) focused just on the rhoticity of the eighteen speakers from Salbrina (2010), and they reported that about 47% of tokens with potential post-vocalic [r] in stressed syllables in the reading of the Wolf Passage had r-colouring. 
While only three of the speakers had r-colouring in all the tokens investigated, nine speakers had r-colouring in most of them, and just six speakers exhibited no r-colouring in any of the tokens. The current paper will investigate rhoticity among Brunei undergraduates in more detail, including data from men as well as women and also including some non-Malays. In addition, the incidence of rhoticity will be correlated with other features of speech, to try to determine if it might be considered a prestigious feature of pronunciation or not. A more extensive analysis of Brunei English, including its grammar, lexis and discourse, is presented in Deterding and Salbrina (2013).

3 Data

53 undergraduates at the University of Brunei Darussalam (UBD) were recorded reading a short text, the Wolf Passage (see Appendix), and they were also interviewed for five minutes by the author of this paper. 38 of them are female and
the other 15 are male. 33 are ethnically Malay, 15 are Chinese, and the remaining five are from one of the minority ethnic groups in Brunei. They were aged between 20 and 24 at the time of the recording except for one female who was aged 35 and one male who was 28. The speech patterns of the two older speakers do not seem to be markedly different from the others. All the speakers have good English, though many stated that Malay is their first language while seven of the Chinese gave Mandarin Chinese as their best language. Further details about the speakers can be found in Deterding and Salbrina (2013: 9). In this paper, the rhoticity of these speakers will be analysed in some detail, particularly based on their reading of the Wolf Passage, and the incidence of rhoticity will be correlated with three other features of pronunciation: the realisation of voiceless TH; omission of [t] from the end of word-final consonant clusters; and differentiation between long and short vowels. Each of these three features has a standard pronunciation, so the correlation may provide an insight into whether rhoticity is linked with a prestigious way of speaking or not in Brunei English, and we can therefore see what this tells us about the status of Brunei English, both as an emergent variety within Schneider’s Five-Phase Model and also as an ESL variety.

4 Incidence of rhoticity

Perceptual judgements were made about the presence or absence of [r] at the end of stressed syllables in five tokens from the Wolf Passage for all 53 speakers: heard, concern, short, more and before. The context for these tokens is shown below (where three dots indicate that the extract is not at the beginning or end of a sentence):

As soon as they heard him, . . .
. . . full of concern for his safety, . . .
. . . stayed with him for a short while.
. . . , and once more he was successful.
. . . cried out even louder than before.

These five tokens provide a range of environments for the potential r-colouring in the coda of a syllable: in more and before, the potential [r] occurs at the end of the word, while in the other three tokens it occurs in a syllable coda where there is a following consonant. Moreover, before is the final word in a sentence, while the other four tokens involve non-final words.

Another phonetician listened to the data, and the rate of agreement between the two listeners was 87%. In cases of disagreement the item was generally counted as non-rhotic, so the results reported here represent a conservative estimate of rhoticity. The results for the incidence of rhoticity for these five tokens are shown in Table 1. (The total for more is 52 rather than 53 because one speaker omitted the word.) These results show that nearly 31% of the tokens have r-colouring while about 69% do not. Furthermore r-colouring is more common in word-final position (more and before) and is less frequent in a non-final position of a coda consonant cluster.

Table 1: Incidence of rhoticity in the Wolf Passage

           [r]           No [r]
heard      13            40
concern    10            43
short       9            44
more       26            26
before     23            30
Total      81 (30.7%)    183 (69.3%)
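The percentages in Table 1, and the contrast between word-final and pre-consonantal position noted above, can be recomputed from the token counts. The following is a minimal sketch: the counts are copied from Table 1, the names `TABLE_1`, `WORD_FINAL` and `rhotic_rate` are illustrative, and the two position-specific rates are derived here from the table rather than reported in the text.

```python
# Token counts copied from Table 1: (tokens with [r], tokens without [r]).
TABLE_1 = {
    "heard": (13, 40),
    "concern": (10, 43),
    "short": (9, 44),
    "more": (26, 26),    # 52 tokens: one speaker omitted the word
    "before": (23, 30),
}

# Grouping taken from the text: [r] is word-final in these two tokens.
WORD_FINAL = {"more", "before"}


def rhotic_rate(words):
    """Return (rhotic tokens, all tokens, percentage) for the given words."""
    rhotic = sum(TABLE_1[w][0] for w in words)
    total = sum(sum(TABLE_1[w]) for w in words)
    return rhotic, total, 100 * rhotic / total


overall = rhotic_rate(TABLE_1)
final = rhotic_rate(WORD_FINAL)
preconsonantal = rhotic_rate(set(TABLE_1) - WORD_FINAL)

for label, (r, n, pct) in [
    ("overall", overall),
    ("word-final", final),
    ("pre-consonantal", preconsonantal),
]:
    print(f"{label}: {r}/{n} = {pct:.1f}%")
```

On these counts the overall rate is 81 of 264 tokens, while the word-final tokens (49 of 105) show r-colouring roughly twice as often as the pre-consonantal ones (32 of 159).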

If we look into these results in more detail, we find that four speakers have r-colouring in all five tokens, 31 of the 53 speakers (over 58%) show some sign of rhoticity, and 22 have no r-colouring in any of the tokens (see Table 2).

Table 2: Number of speakers producing number of coda [r]s

Number of coda [r] realised    Number of speakers
0                              22
1                               5
2                              13
3                               6
4                               3
5                               4
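The distribution in Table 2 can be tallied cumulatively to see how many speakers would count as rhotic under different cut-offs. This is only an illustrative sketch: the counts are copied from Table 2, and the names `SPEAKERS_BY_COUNT` and `speakers_at_or_above` are assumptions introduced here.

```python
# Speakers by number of coda [r] tokens realised (0-5), copied from Table 2.
SPEAKERS_BY_COUNT = {0: 22, 1: 5, 2: 13, 3: 6, 4: 3, 5: 4}


def speakers_at_or_above(threshold):
    """Count speakers who realised at least `threshold` coda [r] tokens."""
    return sum(n for count, n in SPEAKERS_BY_COUNT.items() if count >= threshold)


total_speakers = sum(SPEAKERS_BY_COUNT.values())
# Cross-check against Table 1: each speaker contributes `count` rhotic tokens.
total_rhotic_tokens = sum(count * n for count, n in SPEAKERS_BY_COUNT.items())

for threshold in range(1, 6):
    n = speakers_at_or_above(threshold)
    print(f">= {threshold} of 5: {n} speakers ({100 * n / total_speakers:.1f}%)")
```

A cut-off of one token gives the 31 speakers who show some sign of rhoticity, a cut-off of two gives 26 speakers, and the implied total of 81 rhotic tokens agrees with Table 1.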

If only a single token produced by an individual speaker is perceived as having r-colouring, this could represent an exceptional item, but if at least two tokens are judged to have r-colouring, then we can assume that the speaker may be perceived to be at least partially rhotic. On the basis of a 2-out-of-5 threshold, Deterding and Salbrina (2013: 33) conclude that 26 of these UBD undergraduates (49%) have a rhotic accent, which is almost identical to the 50% reported in
the earlier study involving only female ethnically-Malay speakers (Salbrina and Deterding 2010), though it must be admitted that the 2-out-of-5 threshold is somewhat arbitrary. Indeed, eight of the speakers exhibit r-colouring in more and before (in which there is no following consonant) but not the other three tokens, so it is uncertain if they should be classified as rhotic or not.

We can further consider the incidence of rhoticity among female and male speakers and also between the two main ethnic groups, Malays and Chinese. The results for female and male speakers are shown in Table 3. Using the 2-out-of-5 classification of rhotic speakers, we find that 22 of the 38 females are rhotic (58%) while only 4 out of the 15 males are rhotic (27%).

Table 3: Incidence of rhoticity in the Wolf Passage

           Rhotic        Non-rhotic
Females    22 (57.9%)    16 (42.1%)
Males       4 (26.7%)    11 (73.3%)

The difference between the two genders is significant at the 0.05 level (χ² = 4.2, df = 1, p = 0.041). We should be cautious in drawing too great an inference from such small numbers, as one should not really do a chi-squared test when one of the cells has fewer than five tokens (Mackey and Gass 2005: 279). Nevertheless, these figures suggest that young women in Brunei are more likely to be rhotic than men.

The incidence of rhoticity for the two main ethnic groups is shown in Table 4. While it appears that more Chinese are rhotic than Malays (60% versus 45%), the difference is not significant (χ² = 0.87, df = 1, p = 0.35).

Table 4: Incidence of rhoticity for the Malay and Chinese speakers

           Rhotic        Non-rhotic
Malay      15 (45.5%)    18 (54.5%)
Chinese     9 (60.0%)     6 (40.0%)
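The two chi-squared tests for Tables 3 and 4 can be reproduced from the cell counts. The sketch below assumes a Pearson chi-squared test without continuity correction, which is the variant that matches the reported statistics; the function name `chi_square_2x2` is illustrative, and the p-value uses the closed form that holds for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With one degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected


# Rows are groups; columns are (rhotic, non-rhotic) speakers.
gender = [[22, 16], [4, 11]]      # Table 3: females, males
ethnicity = [[15, 18], [9, 6]]    # Table 4: Malay, Chinese

for name, table in [("gender", gender), ("ethnicity", ethnicity)]:
    stat, p, expected = chi_square_2x2(table)
    smallest = min(min(row) for row in expected)
    print(f"{name}: chi2 = {stat:.2f}, p = {p:.3f}, smallest expected count = {smallest:.2f}")
```

The script also prints the smallest expected count for each table, since the caveat about small cells is often stated in terms of expected rather than observed frequencies; here the smallest observed cell is 4 (rhotic males).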

To summarise so far: about half of young Bruneians can be described as rhotic, though the incidence of r-colouring is variable for most of them and it is more likely to occur in open syllables than closed ones. Women seem to be more likely to be rhotic than men, but there is no difference between ethnically Malay and Chinese Bruneians.

5 Correlation of rhoticity with other features of pronunciation

In addition to analysing the background of the speakers, we can also consider how the incidence of rhoticity correlates with other features of pronunciation that are non-prestigious in Brunei English. In Brunei, as in most of South-East Asia, many speakers pronounce the voiceless TH in word-initial position such as in thin and three as [t] (Deterding and Kirkpatrick 2006). The Wolf Passage has three words with initial voiceless TH, thought, threaten and third, and overall about 47% of the tokens of these words are pronounced with an initial [θ], while nearly 53% of them have [t] at the start. Table 5 shows how the rhotic and non-rhotic speakers pronounce these three tokens.

Table 5: Pronunciation of initial voiceless TH by the rhotic and non-rhotic speakers

                       [θ]           [t]
Rhotic speakers        41 (52.6%)    37 (47.4%)
Non-rhotic speakers    34 (42.0%)    47 (58.0%)
Total                  75 (47.2%)    84 (52.8%)

Although the results in Table 5 seem to suggest that the rhotic speakers tend to use more [θ] than the non-rhotic speakers, the difference between the two groups is quite small and it is not significant (χ² = 1.79, df = 1, p = 0.18).

Next, we can analyse how the rhotic and non-rhotic speakers deal with final consonant clusters, particularly the final [t] in words such as fist, forest and feast, each of which occurs in the Wolf Passage. Of course, it would be quite normal for most speakers of English, including speakers of Standard British English, to omit the final [t] in these words when the next word begins with a consonant (Cruttenden 2014: 314). Consequently, only contexts in which the next word begins with a vowel (fist in, forest and) or where the word is at the end of a sentence (feast) are considered, as these are environments in which speakers in Inner-Circle countries such as Britain and America are more likely to retain the final [t] (Cruttenden 2014: 314; Neu 1980: 47). The incidence of [t] retention and omission for these three tokens is shown in Table 6.

Table 6: Rate of [t] omission by the rhotic and non-rhotic speakers

                       [t] retained    [t] omitted
Rhotic speakers        36 (46.2%)      42 (53.8%)
Non-rhotic speakers    36 (44.4%)      45 (55.6%)
Total                  72 (45.3%)      87 (54.7%)

Although these figures suggest a slightly higher tendency for the rhotic speakers to retain final [t], the differences fall far below the level of significance (χ² = 0.05, df = 1, p = 0.83), so we should conclude that there is no difference between the two groups in terms of retaining or omitting final [t] from word-final consonant clusters.

Finally, we can consider whether the rhotic and non-rhotic speakers make a difference between the long and short vowels in a minimal pair such as feast and fist. Both these words occur in the Wolf Passage, and auditory judgement combined with acoustic measurement of the formants suggests that 14 of the 53 speakers make no difference between these two vowels, as shown in Table 7.

Table 7: Separation of feast and fist by the rhotic and non-rhotic speakers

                       Different vowel    Same vowel
Rhotic speakers        21 (80.8%)          5 (19.2%)
Non-rhotic speakers    18 (66.7%)          9 (33.3%)
Total                  39 (73.6%)         14 (26.4%)

There appears to be a greater tendency for rhotic speakers to differentiate between these vowels (81% versus 67%), but once more the difference falls short of significance (χ² = 1.36, df = 1, p = 0.24). In summary: there is no evidence for a significant correlation of rhoticity with any of the three features of pronunciation investigated. In reality, 53 speakers is a small number when looking for statistical tendencies in pronunciation, and a much larger corpus of data would be needed to enable us to identify trends with any degree of confidence. However, we can certainly conclude that there is no evidence from these results that rhoticity is correlated with non-prestigious features of pronunciation.
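The three comparisons in this section can be checked together. As a sketch, the function below uses the shortcut form of the chi-squared statistic for a 2×2 table, χ² = n(ad − bc)² / (row₁ · row₂ · col₁ · col₂), which is equivalent to the Pearson statistic without continuity correction; the names `chi_square_shortcut` and `FEATURES` are illustrative, and the counts are copied from Tables 5 to 7.

```python
import math


def chi_square_shortcut(a, b, c, d):
    """Chi-squared statistic and p-value (df = 1) for the 2x2 table [[a, b], [c, d]].

    Uses the shortcut n * (ad - bc)^2 / (row1 * row2 * col1 * col2),
    which equals the Pearson statistic without continuity correction.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Each entry: (rhotic row, non-rhotic row), counts copied from Tables 5-7.
FEATURES = {
    "initial voiceless TH ([θ] vs [t])": ((41, 37), (34, 47)),
    "final [t] (retained vs omitted)": ((36, 42), (36, 45)),
    "feast/fist (different vs same vowel)": ((21, 5), (18, 9)),
}

results = {}
for feature, (rhotic, non_rhotic) in FEATURES.items():
    results[feature] = chi_square_shortcut(*rhotic, *non_rhotic)
    stat, p = results[feature]
    print(f"{feature}: chi2 = {stat:.2f}, p = {p:.2f}")
```

None of the three p-values falls below 0.05, in line with the summary that rhoticity shows no significant correlation with these features.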

6 Discussion

It has been shown that about half of Bruneian undergraduates at UBD might be classified as having a rhotic accent, though the incidence of r-colouring is
variable, as only four out of the 53 speakers studied here have a post-vocalic [r] in all the tokens analysed. Based on the apparent absence of rhoticity in the data analysed by Mossop (1996), it may be a recent trend. Indeed, Nur Raihan (2014) has investigated the pronunciation of 24 school children in Brunei and reports that all but one of them could be described as having a rhotic accent, which lends support to the suggestion that rhoticity is an emergent trend in Brunei. In fact, phonics has recently been introduced for all primary school children in the country (Smith 2011), and with this promotion of the teaching of reading by means of explicit linking between the spelling of words and their pronunciation, one might expect the incidence of rhoticity in Brunei to be reinforced in the future, given that post-vocalic [r] reflects the written form of words. Comparison of the different groups in the current study indicates that women in Brunei are more likely to have a rhotic accent than men. Trudgill (1995: 70) observes that, in many societies around the world, women tend to adopt more prestige forms of speech than men. Cameron (2007) urges caution in accepting all the claimed differences between the speech of men and women; but if women have a greater tendency to exhibit r-colouring in Brunei, this suggests that rhoticity may be perceived as a prestige feature of pronunciation, particularly among young people. Furthermore, young women are often believed to be the trend-setters in terms of pronunciation (Johnson 2008: 166), so this further supports the suggestion that rhoticity is currently emerging as the norm in Brunei. There appears to be no difference in rhoticity between the two main ethnic groups, Malays and Chinese. This is a little surprising, as the Malay spoken in Brunei is strongly rhotic (Clynes and Deterding 2011), while the Chinese spoken in Brunei is non-rhotic. 
Although it is true that Standard Chinese can have rhotacised vowels, and for example 兒 (ér, ‘son’) is pronounced with r-colouring as [ɝ] (Lee and Zee 2003: 11), this is much more common in Beijing Dialect than other varieties of the language. Indeed, rhoticity is almost entirely absent in the Mandarin spoken in places such as Singapore and Taiwan (Lin 2007: 7), where 兒 is pronounced with a central vowel with no r-colouring, and this is also true for the Mandarin spoken in Brunei. One might predict, therefore, that on the basis of influence from their dominant home language, Malay Bruneians would exhibit more rhoticity than Chinese Bruneians, and it is not clear why this does not occur. One might note that Brunei Malay is the most widespread lingua franca in the country, and it is commonly spoken even by ethnically Chinese people, so maybe the pronunciation of Brunei Malay influences all speakers whether they are ethnically Malay or not.

Many of the pronunciation features of Brunei English might be characterised as prestigious or non-prestigious: use of [θ] for initial voiceless TH in words such as thought is closer to the Inner-Circle norm than use of [t]; retention of word-final [t] in phrases such as fist in the air is more standard than omission of this consonant; and a clear separation of the long and short vowels in words such as feast and fist is more prestigious than the merging of these two vowels. In each case, the rhotic speakers seem to have a slightly greater use of the more prestigious pattern, though none of the differences is significant, so we should be careful before we draw any firm conclusions about the correlation between rhoticity and these three features of pronunciation. However, these results certainly provide no evidence that non-rhoticity is perceived as the more prestigious way of speaking, even though British pronunciation is largely non-rhotic and pronunciation based on RP British English has traditionally been promoted as the norm in Brunei. In fact, there are currently about 260 teachers from the CfBT Trust employed as English language teachers in Brunei schools (Deterding and Salbrina 2013: 18), most of them from England, Australia and New Zealand and almost all having non-rhotic accents, but it seems that they have little influence on the pronunciation of their pupils.

Given that the incidence of rhoticity seems to be increasing in Brunei, apparently led by young women, we might ask what the source of this influence is.
Three potential influences can be suggested: the first is American English, as young Bruneians watch many American movies and listen to American music, though some linguists have questioned how much influence popular media have on sound changes that take place in society (Chambers 1998: 126); the second is Brunei Malay, which, as mentioned above, is strongly rhotic; and the third is the English of the Philippines, as there are about 200 teachers from the Philippines in Brunei schools, all of whom have a rhotic accent, and furthermore there are many thousands of Filipina domestic helpers (amahs) in Brunei homes. It is hard to determine which of these three influences is greater. Probably, they combine to influence the pronunciation of Brunei English, and the change is taking place because of the existence of all three influences. Finally, we can consider what this tells us about the status of Brunei English. The suggestion that it still seems to be subject to substantial external influences, particularly from American English and maybe also Philippine English, confirms that it belongs in Phase 3 of Schneider’s model. However, the fact that it is breaking away from its historical roots with British English, partly influenced by the pronunciation of the local variety of Malay, suggests that Brunei English is developing its own distinctive style of pronunciation so it might be regarded as moving towards Phase 4.

This progression from Phase 3 to Phase 4 of Schneider’s model might alternatively be seen as a shift from being an L2 variety towards becoming an emergent independent variety. The observation that Brunei English seems currently to be influenced by an external style of pronunciation, in this case American English, suggests that it might be regarded as an L2 variety; yet at the same time, it is shedding its historical links with British English and thereby developing its own distinctive phonology, partly influenced by Brunei Malay, so this suggests that it is becoming an emergent variety. However, there is an alternative perspective: in the modern globalised world, it is possible that young Bruneians are participating in a dynamic global style of English, so maybe the dichotomy between an independent national variety of English and L2 pronunciation is less relevant in the modern world where there is a burgeoning trend towards the use of English as a Lingua Franca (Seidlhofer 2011). This global ELF is characterised by many shared features of pronunciation, including avoidance of vowel reduction in function words such as of and as, widespread adoption of [t] for voiceless initial TH, and omission of the final /t/ in words such as fist (Deterding 2010), and these worldwide trends seem to occur regardless of how people in the UK or USA speak. We might then conclude that the phonological basis for classifying a variety of English as an emergent independent postcolonial variety or alternatively as an L2 variety may nowadays be gradually becoming less relevant in the modern world.

Appendix: The Wolf Passage

The Boy who Cried Wolf (from Deterding 2006)

There was once a poor shepherd boy who used to watch his flocks in the fields next to a dark forest near the foot of a mountain. One hot afternoon, he thought up a good plan to get some company for himself and also have a little fun. Raising his fist in the air, he ran down to the village shouting “Wolf, Wolf.” As soon as they heard him, the villagers all rushed from their homes, full of concern for his safety, and two of his cousins even stayed with him for a short while. This gave the boy so much pleasure that a few days later he tried exactly the same trick again, and once more he was successful. However, not long after, a wolf that had just escaped from the zoo was looking for a change from its usual diet of chicken and duck. So, overcoming its fear of being shot, it actually did come out from the forest and began to threaten the sheep. Racing down to the village, the boy of course cried out even louder than before. Unfortunately, as all the villagers were convinced that he was trying to fool them a third time, they told him, “Go away and don’t bother us again.” And so the wolf had a feast.

References

Cameron, Deborah. 2007. The myth of Mars and Venus. Oxford: Oxford University Press.
Chambers, Jack K. 1998. Myth 15: TV makes people sound the same. In Laurie Bauer & Peter Trudgill (eds.), Language myths, 123–132. London: Penguin.
Clynes, Adrian & David Deterding. 2011. Standard Malay (Brunei). Journal of the International Phonetic Association 41(2). 259–268.
Cogo, Alessia & Martin Dewey. 2012. Analysing English as a Lingua Franca: A corpus-driven investigation. London: Continuum.
Cruttenden, Alan. 2014. Gimson’s pronunciation of English, 8th edn. London: Routledge.
Crystal, David. 2003. English as a global language, 2nd edn. Cambridge: Cambridge University Press.
Deterding, David. 2006. The North Wind versus a Wolf: Short texts for the description and measurement of English pronunciation. Journal of the International Phonetic Association 36(2). 187–196.
Deterding, David. 2007. Singapore English. Edinburgh: Edinburgh University Press.
Deterding, David. 2010. Variation across Englishes: Phonology. In Andy Kirkpatrick (ed.), The Routledge handbook of World Englishes, 385–399. London: Routledge.
Deterding, David & Andy Kirkpatrick. 2006. Emerging South-East Asian Englishes and intelligibility. World Englishes 25(3/4). 391–409.
Deterding, David & Sharbawi Salbrina. 2013. Brunei English: A new variety in a multilingual society. Dordrecht: Springer.
Jenkins, Jennifer. 2009. World Englishes: A resource book for students, 2nd edn. Abingdon, UK & New York: Routledge.
Johnson, Keith. 2008. Quantitative methods in linguistics. Malden, MA: Blackwell.
Kachru, Braj B. 1985. Standards, codification and sociolinguistic realism: The English language in the outer circle. In Randolph Quirk & Henry G. Widdowson (eds.), English in the world: Teaching and learning the language and literatures, 11–30. Cambridge: Cambridge University Press.
Kachru, Braj B. 2005. Asian Englishes: Beyond the canon. Hong Kong: Hong Kong University Press.
Lee, Wai-Sum & Eric Zee. 2003. Standard Chinese (Beijing). Journal of the International Phonetic Association 33(1). 109–112.
Lin, Yen-Hwei. 2007. The sounds of Chinese. Cambridge: Cambridge University Press.
Mackey, Alison M. & Susan M. Gass. 2005. Second language research: Methodology and design. Mahwah, NJ: Erlbaum.
Mossop, Jonathan. 1996. Some phonological features of Brunei English. In Peter W. Martin, Conrad K. Ożóg & Gloria R. Poedjosoedarmo (eds.), Language use & language change in Brunei Darussalam, 189–208. Athens: Ohio University Center for International Studies.
Neu, Helene. 1980. Ranking of constraints on /t,d/ deletion in American English: A statistical analysis. In William Labov (ed.), Locating language in time and space, 37–54. New York: Academic Press.
Nur Raihan, Mohamad. 2014. A comparison of the pronunciation of English by teenagers and university undergraduates in Brunei. Final Year Academic Exercise, BA in English Language and Linguistics. Faculty of Arts and Social Sciences, University of Brunei Darussalam.
Salbrina, Sharbawi. 2006. The vowels of Brunei English: An acoustic investigation. English World-Wide 27(3). 247–264.
Salbrina, Sharbawi. 2010. The sounds of Brunei English: 15 years on. South East Asia: A Multidisciplinary Journal 10. 39–56.
Salbrina, Sharbawi & David Deterding. 2010. Rhoticity in Brunei English. English World-Wide 31(2). 121–137.
Scheuer, Sylwia. 2005. Why native speakers are (still) relevant. In Katarzyna Dziubalska-Kołaczyk & Joanna Przedlacka (eds.), English pronunciation models: A changing scene, 111–130. Bern: Peter Lang.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Seidlhofer, Barbara. 2011. Understanding English as a Lingua Franca. Oxford: Oxford University Press.
Smith, Mark. 2011. Issues for teaching phonics in a multilingual context: A Brunei perspective. South East Asia: A Multidisciplinary Journal 11. 1–15.
Sobkowiak, Włodzimierz. 2005. Why not LFC? In Katarzyna Dziubalska-Kołaczyk & Joanna Przedlacka (eds.), English pronunciation models: A changing scene, 131–149. Bern: Peter Lang.
Trudgill, Peter. 1995. Sociolinguistics: An introduction to language and society. London: Penguin.
Wells, John C. 1982. Accents of English. Cambridge: Cambridge University Press.

Stefanie Pillai, University of Malaya, Kuala Lumpur

3 Rhoticity in Malaysian English: The emergence of a new norm?

1 Introduction

Malaysian English (MalE) is classified as a New English together with other postcolonial varieties of English such as Indian and Singapore English. These varieties developed from the time English was brought over by, in most cases, the British. Their historical link with Britain (and in some cases America) sets these varieties apart from English as a foreign language (EFL) contexts. In many of the postcolonial settings, both the educated and local forms of English are still used for intra-national communication by a certain percentage of the population, and the presence of English in education, business, local media and creative works may still be strong in these contexts. However, different language settings and language policies following the withdrawal of Britain from its colonies have affected the extent to which English is used and learnt in these postcolonial countries. For example, in many postcolonial contexts, English may be considered as a second language (ESL) with reference to it being learnt or used as a second language after one or more indigenous languages or a national language. Different language settings and language policies have also resulted in postcolonial varieties of English developing distinctive linguistic features.

In Malaysia, English is considered to be a second language (L2) but this is only true in the sense that it is the second compulsory language taught in Malay medium schools. The use of the term L2 does not mean that English is the second language learnt by the majority of Malaysians. Neither does it imply that it is the second most used language in Malaysia. The dominance of English in Malaysia is context-driven and restricted to particular domains, and for many multilingual Malaysians, it may be the third or other language which is learnt in school with varying degrees of success. Only approximately two per cent of Malaysians speak English as a first language (L1) (Crystal 1997: 58). This is one of the features of postcolonial settings where there are likely to be speakers, albeit a minority, who claim English as a first language. These L1 speakers of English grow up speaking English at home and will tend to use it as a dominant language for communication even if they
subsequently learn other languages like Malay or Mandarin. For example, in a study on the use of English among Malaysian undergraduates, those who claimed to be L1 speakers of English said that they always used English at home (Pillai 2008a), and similar to another group of L1 speakers interviewed in Pillai and Khan (2011), also said that they tend to use mainly English when communicating with relatives and friends. The undergraduate English L1 speakers were in their early twenties and were of Chinese and Indian origin, while the ones in Pillai and Khan (2011) were of Portuguese Eurasian descent and were aged between 39 to 68 years old. Despite their different ethnic backgrounds, they all considered English rather than their heritage languages to be their first language based on the fact that they first learnt to speak English at home. There are also some Malaysians who grow up speaking English and one or more languages at home. This distinguishes the English-as-an-L1-speakers from the majority of Malaysians who do not learn or speak English at home. They instead learn it either from the time they enter pre-school (from 4 to 5 years old) or primary school (from 7 years old). As mentioned earlier, English may not necessarily be the second language this group learns or speaks, and some of them may become highly proficient in English and use English as much as or more than other languages because of their social, educational background and profession. Thus, to refer to this entire group as L2 speakers may not provide an accurate picture of their actual proficiency and/or use of English. Instead, in multilingual settings such as Malaysia, labels such as English as an L1, L2 or ESL are not always useful due to the diversity in how and when and to what extent English is learnt and used. 
The Portuguese Eurasian group in Pillai and Khan (2011) all said that they used mainly English at the workplace, while the undergraduates who were studying at a public university tended to use both English and Malay depending on to whom they were speaking, and what they were doing. The use of English among the two groups of speakers mirrors the current situation in Malaysia, where the public sector generally functions in Malay, while the private sector, which is focused in cities like Kuala Lumpur and Penang, still largely operates in English. Thus, as reported in McArthur (2002: 335), about “25% of city dwellers use it [English] for some purposes in everyday life. It is widely used in the media and as a reading language in higher education and for professional purposes”. The importance that is placed on English in the professional setting can be seen in the numerous complaints by employers about the poor command of English among Malaysian graduates (Downe et al. 2012; Survey by Manpower Inc. 2008 cited in The National Graduate Employability Blueprint 2012–2017 2012). Further, language use and fluency in English and other languages function in a multidimensional manner taking into account a host of factors. As Canagarajah

and Wurr (2011: 3) point out, “[m]ultilinguals adopt different codes for different contexts and objectives. From this perspective, the objective of their acquisition is repertoire building rather than total competence in individual languages”. Within this repertoire there may also be different ‘types’ of English, such as colloquial and Standard English, which users can weave in and out of depending on the context in which English is being used (Govindan and Pillai 2009). For example, the variety of English used at home is likely to be the colloquial variety of Malaysian English (Pillai 2008a). Thus, even for those for whom English is a first or dominant language, the variety of MalE that is used, especially the spoken variety, displays distinct linguistic features compared to other varieties of English (e.g. Pillai 2012). In terms of pronunciation, for example, there is a noticeable lack of contrast between vowel pairs such as /ɪ/ – /iː/, /e/ – /æ/, and /ʌ/ – /ɑː/ (Pillai et al. 2010). Such features of pronunciation are likely to have been ‘learnt’ at home or in schools from Malaysian teachers, and are then used when communicating in English with fellow Malaysians. Over time, they could become an accepted “tacit endonormative standard” (Gut 2007: 355). This is perhaps one of the distinguishing features between some postcolonial varieties and EFL contexts. Thus, whilst it has been suggested that language background affects English pronunciation, in situations like that in Malaysia, where English is generally learnt from fellow Malaysians, the language background of speakers may not be the only influencing factor, as there are particular pronunciation features which are common across Malaysians, making it possible to distinguish, for example, between a Malaysian and a Hong Kong or Mainland Chinese English speaker. 
Further, unlike typical L2 contexts in which English is learnt in a non-English context, English is used in Malaysia with a colloquial variety thriving alongside a more acrolectal one. While EFL contexts generally lean towards a native model of English, usually either British or American English, some postcolonial countries may have shifted to their own model of English as a norm. Gut (2007: 356) explains this as a shift to an “endonormative orientation” in her Norm Orientation Hypothesis, where distinct linguistic features systematically emerge over generations as speakers begin looking towards their own variety of English as a norm. One such example is Singapore (Gut 2007: 356), which incidentally is also placed in the fourth phase, “endonormative stabilization”, of Schneider’s Dynamic Five-Phase Model of the Evolution of New Englishes (2003: 243). In Malaysia, however, whilst some features of pronunciation seem to be more systematically established than others (e.g. the lack of vowel contrast), others, such as rhoticity, do not appear to be so. A variety of English is considered rhotic when the <r> in the spelling of the word is pronounced in the syllable coda either as the only consonant (e.g. paper#) or before another consonant (e.g.

card) (Trudgill and Hannah 2008: 20). This <r> in the spelling of English words that occurs in the syllable coda position is often called post-vocalic /r/. However, the term “non-prevocalic /r/” (e.g. Trudgill and Hannah 2008: 11) is perhaps more apt for this feature of rhoticity than “post-vocalic /r/”. The latter could include instances in non-rhotic varieties where the <r> in the spelling is pronounced before a vowel, such as in intervocalic positions in a word (e.g. carry) and across word boundaries (e.g. four eggs). In contrast, the term “non-prevocalic /r/” applies to the pronunciation of orthographic <r> in word final positions preceding a pause and preceding a consonant (Salbrina and Deterding 2010). Since there is inconsistency in the terms used in the literature, the original terms used by the authors will be maintained when discussing their work, but the term coda /r/ is used when discussing the findings in the present study.

2 Previous studies

2.1 Rhoticity in Malaysian English

Although MalE, having its roots in British English, is considered a non-rhotic variety (e.g. Baskaran 2004), the pronunciation of coda /r/ has been reported in this variety. For example, Ramasamy (2005) found instances of rhoticity among younger Malaysian Indian speakers. She examined the production of this feature among middle-class Malaysian Indians who were all of Tamil origin, and based on auditory examination, she found no evidence of such realisation among her older speakers (47–54 years old). However, she found that the younger speakers, particularly the 14 to 17 year olds, tended to pronounce the <r> in coda position. Based on her findings, she suggests that rhoticity is an emerging phenomenon in the speech of young Malaysian Tamils. Pillai (2014) also found sporadic realisations of rhoticity among Malaysian speakers who were in their twenties. Ramasamy (2005) suggests that rhoticity among her younger speakers is due to the influence of American media. The influence of American media on young MalE speakers has also been suggested by Rajadurai (2006), although thus far, no other instances of the consistent use of other forms of American English pronunciation, such as flapping or the use of unrounded /ɒ/, have been reported. Phoon, Abdullah and Maclagan (2013), however, did not find any instance of coda /r/ being realised by Indian speakers in their study. This could be attributed to the fact that none of the speakers in their study, who were aged between 19 and 22 years old, used English as a first or dominant language. This differs from the younger (14–17 years old) speakers in Ramasamy’s (2005) study

who had all acquired English at home and used it as a dominant language. Further, none of the speakers in Ramasamy’s study were from Tamil medium primary schools whilst the ones in the study by Phoon, Abdullah and Maclagan (2013) were. This suggests that the Indian speakers in Phoon, Abdullah and Maclagan’s (2013) study were predominantly second language speakers of English compared to the ones in Ramasamy’s (2005) study who used English as their first language. None of the latter spoke their heritage language, Tamil, fluently. The use of English as a first or dominant home language is not an uncommon phenomenon among middle class and above Indian families (e.g. David, Naji and Kaur 2003; Schiffman 1995), and it may be that rhoticity is more evident in this group of speakers. This could be due to more exposure to American media and the influence of peers. All 15 speakers in Phoon, Abdullah and Maclagan’s (2013) study were in fact second language speakers of English, with the Chinese and Indian speakers having attended vernacular primary schools where the medium of instruction is Mandarin or Tamil. It has to be stated here that the term ‘second’ is with reference to English not being the first language of the speakers, and does not necessarily reflect the order of use or learning as the speakers are all multilingual, with at least one other language, Bahasa Malaysia or Malay, in their language repertoire. The speakers’ overall self-rating for speaking in English was between weak to average based on the mean of 2.8 (SD = 0.6) on a scale from 1 to 5 (1 = very weak and 5 = very good) (Phoon, Abdullah and Maclagan 2013: 12). This again contrasts with the ones in Ramasamy’s (2005) study who were all reported to be fluent speakers of English. Perhaps this might explain why Phoon, Abdullah and Maclagan (2013) did not find any evidence of rhoticity among their Indian speakers. 
They did, however, find instances of the coda /r/ being produced by their Chinese speakers, but only in 7% of the selected tokens. Yet, in an earlier study of ten Chinese Malaysians aged between 19 and 26 years old, all of whom were recruited on the criteria that they had been exposed to English since birth and used it as a dominant home language, Phoon and Maclagan (2009: 32) found that although there was evidence of rhoticity among all but one of their speakers, none of them were “consistently rhotic”, with less than 25% of coda /r/s in 990 words being realised. Phoon and Maclagan (2009: 32) did, however, report that the speakers were more inclined to realise the coda /r/ preceding a consonant (e.g. bird) than when it was in word final position (e.g. hair). This inconsistency in realisations of coda /r/ in these word positions is reflected in the different studies that have mentioned rhoticity in MalE. Table 1 summarises the findings from some of the studies that were discussed in this section of the paper.

Table 1: Rhoticity in Malaysian English

| Authors | Speakers: Ethnicity | Speakers: Age (years) | Speakers: Status of English | Evidence of rhoticity |
|---|---|---|---|---|
| Ramasamy (2005) | Tamil | 14–17 | L1 | Pronounced the coda /r/ in word final positions and preceding a consonant. |
| | | 47–54 | Dominant users | No evidence |
| Phoon and Maclagan (2009) | Chinese | 19 to 26 | Dominant home language | Coda /r/ in word final positions and preceding a consonant realised by nine of ten speakers but not consistently so. Higher percentage of occurrences preceding a consonant. |
| Phoon, Abdullah and Maclagan (2013) | Malay, Chinese and Indian | 19–22 | L2 | Only two of five Chinese speakers produced coda /r/ in five out of the 70 words. No evidence of rhoticity among the Indian and Malay speakers. |
| Pillai (2014) | Indians and Chinese | Early 20s | L1 | The <r> in words like bird, board and bard pronounced consistently by only two of 11 speakers the speakers who produced the |

None of the studies on rhoticity in MalE thus far indicate that there is a consistent display of rhoticity in MalE. This contradicts Kirkpatrick (2007: 123), who states that one of the differences between Malaysian and Singapore English is that “Singaporean English is non-rhotic, but Malaysian speakers produce the post-vocalic /r/ in certain contexts”. He does not, however, elaborate on what these contexts are. This clearly does not apply to all Malaysians in general as we are talking about a large number of speakers of different ages with varying

levels of fluency in English, who come from different language backgrounds and social backgrounds, and who live and work in different parts of Malaysia. Although the results on this phenomenon remain largely inconclusive, with a general perception that MalE is becoming more “American”, largely due to auditory impressions of rhoticity, it appears that age and language backgrounds may be related to the presence of rhoticity among Malaysian speakers. There have been no attempts to examine the emergence of rhoticity as a possible development of a new pronunciation norm, signalling a move away from an exonormative norm, British English. This is perhaps one of the key differences between postcolonial and EFL contexts. Speakers in the former, having had a longer encounter with English and a longer period over which particular linguistic features, including pronunciation features, have developed, could have entered a phase where these features may become established norms in the variety, and hence have an endonormative orientation. Speakers in the latter context (e.g. China, Indonesia and Korea) tend to have an exonormative orientation. The study reported in this chapter examines the use of rhoticity among two age groups (20–29 and 30–45 years) from three main ethnic groups (Malay, Indian and Chinese) in order to answer the following research questions: (1) To what extent do Malaysian speakers pronounce coda /r/ both before a consonant and in word-final position? (2) To what extent is there a relationship between rhoticity and speakers for whom English is an L1, and those for whom it is not? (3) Is there more evidence of rhoticity among younger speakers? The assumption underlying these questions is that there will be more evidence of rhoticity among the younger group, in particular among the MalE L1 group, who predominantly use English. 
We can expect linguistic innovations like new pronunciation features to be seen in this group rather than among those for whom English is not a first or dominant language. The findings will also be discussed with reference to rhoticity in neighbouring varieties of English which share the same British English legacy, namely, Singapore and Brunei English. This will be done in order to establish whether similar trends are emerging among these neighbouring varieties, and whether the patterns of use are related to the acceptance of an indigenous or external variety of English as a norm among these postcolonial varieties of English.

2.2 Rhoticity in neighbouring varieties of English

A study on rhoticity in Singapore English by Tan (2012) found that the level of education and socio-economic status of Singapore speakers influences the use of coda /r/ (Tan 2012 used the term postvocalic /r/) and intrusive /r/, the realisation of /r/ at the end of a word which ends with a vowel, e.g. law when it is

followed by a word beginning with a vowel e.g. law and order. Her speakers were Chinese Singaporeans who were English-Mandarin bilinguals. Those with a higher level of education and socio-economic status showed a higher tendency to produce post-vocalic /r/. On the other hand, those with a lower level of education tended to produce the highest percentage of intrusive /r/ compared to those with a higher level of education. Tan (2012: 19) reports that “there is a direct correlation between education level and socioeconomic status of the speaker and the production of postvocalic-r and intrusive-r in SgE. Speakers of higher education levels and socioeconomic status have a tendency to produce postvocalic-r, and speakers of low education levels and socioeconomic status have a tendency to produce the intrusive-r”. This is similar to Tan and Gupta’s (1992) finding that the production of coda /r/ is a prestige feature for some speakers of Singaporean English. Poedjosoedarmo (2000) also found evidence of rhoticity among educated Singapore English speakers, particularly among Chinese speakers. All these findings on Singapore English contradict Kirkpatrick’s (2007: 123) claim that “Singaporean English is non-rhotic”. Still, at this point in time, rhoticity cannot be considered a definitive feature of Singapore English as it does not cut across ethnic or social groups. For instance, Salbrina and Deterding (2010) found that only one of the 12 Singapore Malay speakers in their study could be deemed to be rhotic, whereas about half of the 18 Brunei Malay speakers were rhotic. Thus, unlike MalE and Singapore English, Brunei English is more likely to be rhotic. Salbrina and Deterding (2010) attribute this to Brunei Malay which is rhotic and also to the influence of American media. Salbrina and Deterding (2010) analysed their data, which were obtained from the Wolf Passage (Deterding 2006), both perceptually and acoustically. 
Words with coda /r/ in word final positions and preceding consonants were extracted from the recordings of this passage. For the acoustic analysis, the third formant (F3) of the vowel preceding /r/ was measured at its start and end. They found that the average F3 for the r-coloured vowels were significantly lower than the non-rhotic ones, thus generally confirming their perceptual findings and also showing that there is a relationship between rhoticity and lower F3 values. Tan (2012) also measured F3 to confirm her auditory analysis for instances of postvocalic, intrusive and linking-/r/. In short, whilst rhoticity in Brunei English may be attributed to the influence of Brunei Malay, rhoticity in Singapore English appears to be developing as a prestige norm. Although the use of rhoticity may be attributed to the influence of American media, in the absence of other features of American English pronunciation and given Singapore’s “endonormative orientation” towards English (Gut 2007: 356), this feature could well be an example of an emerging feature of Singapore English pronunciation.

3 Methodology

The study involved 34 speakers from two age groups: 20–29 and 30–45 years. The data were obtained from the Corpus of Spoken Malaysian English, which is being developed at the Faculty of Languages and Linguistics, University of Malaya (Pillai et al. 2010; Pillai, Mohd. Don, and Knowles 2012). The speakers in each group were recorded reading the North Wind and the Sun text (see Appendix). Each group comprised speakers from the three main ethnic groups in Malaysia (Malay, Chinese and Indian), to examine if there was any observable pattern of rhoticity among a particular ethnic group. The 30–45 year old age group comprised five speakers each in the Malay and Chinese groups and four in the Indian one. They were all English language teachers and lecturers who were fluent in English, and used English extensively at work and at home. In the Indian group, three of the speakers said that they grew up speaking English and used predominantly English in most contexts. All three considered English to be their first language. The 20–29 year age group consisted of undergraduate speakers who were divided into L1 and non-L1 speakers. This was based on their answers in a questionnaire where they were asked to state what they considered to be their first and second languages. They were given the option to indicate more than one language as their first language, if they grew up speaking those languages. There were also other questions in the questionnaire which asked them to indicate what language or languages they spoke to each of their parents and other family members, when they started learning the languages they know, and to whom and when they use these languages. As expected, the L1 group reported more frequent use of English. They used mostly English in family and social contexts, and at university. 
There were five speakers in the younger L1 group (two Chinese, two Indians and one speaker of mixed parentage), and five speakers in each of the three ethnic groups in the L2 group. The first language for the Chinese group was Cantonese, and for the Indian group it was Tamil. All were undergraduates majoring in languages and linguistics. Speaker codes were devised according to (i) whether they were L1 MalE speakers; (ii) whether they were in the older (O) or younger (Y) group; (iii) their ethnic group (M = Malay, C = Chinese, I = Indian, MX = mixed parentage). A number was then allocated to each speaker in a group. For example, an older Chinese speaker may be identified as OC1, and a younger one as L1YC1. A total of 19 words per speaker were extracted from the recordings (printed in bold in the Appendix), which should have resulted in 646 tokens in all. However, one speaker (L1YI1) did not produce the second more in the text (see Appendix), and thus a total of 645 words were extracted and examined. The words which were analysed are presented in Table 2.

Table 2: Tokens from North Wind and the Sun

| Position | Tokens | Frequency in text | Total by 34 speakers | Stress |
|---|---|---|---|---|
| rC | north | 4 | 136 | yes |
| rC | warm | 1 | 34 | yes |
| rC | first | 1 | 34 | yes |
| rC | considered | 1 | 34 | no |
| rC | hard | 1 | 34 | yes |
| rC | warmly | 1 | 34 | yes |
| r# | were | 1 | 34 | no |
| r# | stronger | 2* | 68 | no |
| r# | traveller | 4 | 136 | no |
| r# | other | 1 | 34 | no |
| r# | more | 2 | 67 | yes |
| TOTAL | | | 645 | |

rC = before a consonant/consonants, therefore a closed syllable; r# = word final, i.e. an open syllable (followed by a pause or a consonant in the following word).
*The last stronger in the text was not examined for rhoticity as it was followed by the word of, and is, therefore, an instance of linking r.

A perceptual analysis was first carried out where the author and another researcher indicated whether they found the /r/ to be realised or not in the tokens. Using Praat Version 5.3.01 (Boersma and Weenink 2013), the values of the third formant of the vowels in both rhotic and non-rhotic tokens were then measured and compared to see if the rhotacised tokens contained vowels with a lower F3 compared to the non-rhotacised ones (Love and Walker 2012; Hayward 2000; Salbrina 2010; Salbrina and Deterding 2010). This was based on the assumption that “[v]ariations in the frequency of F3 indicate the degree of r-colouring: the lower the F3, the greater the degree of rhoticity” (Ladefoged 2003: 149). In Standard Malay, /r/ is produced as an alveolar trill. However, <r> is not always realised in word final positions (e.g. lebar) in the varieties of Malay used in Peninsular Malaysia, unlike the varieties in the two states neighbouring Brunei (Aman et al. 2000; Omar 1977). None of the Malay speakers or any of the other speakers participating in this study produced /r/ in the rhotic tokens as an alveolar trill or any other form of /r/ which was not an approximant. Thus, the lowering effect of F3 was not caused by other possible realisations. It should be noted that the realisations of /r/ as taps and trills reported for Chinese (Phoon and Maclagan 2009), Malay and Indian MalE speakers (Phoon, Abdullah, and Maclagan 2013) refer to /r/ in syllable onset positions and are thus not related to instances of rhoticity.
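The speaker coding scheme and the token arithmetic described in this section can be illustrated with a short sketch. The regular expression, the `Speaker` class and the function name below are hypothetical and are not part of the original study; they only assume the scheme as stated in the text (optional L1 prefix, O or Y for age group, an ethnic code, and a running number).

```python
import re
from dataclasses import dataclass

# Assumed pattern: optional "L1" prefix, O/Y age group, ethnic code, number.
CODE_PATTERN = re.compile(r"^(L1)?([OY])(MX|M|C|I)(\d+)$")

ETHNICITY = {"M": "Malay", "C": "Chinese", "I": "Indian", "MX": "mixed parentage"}


@dataclass(frozen=True)
class Speaker:
    l1_english: bool   # True if the code carries the L1 prefix
    age_group: str     # "older" (30-45) or "younger" (20-29)
    ethnicity: str
    number: int


def parse_speaker_code(code: str) -> Speaker:
    """Decode a speaker code such as 'OC1' or 'L1YC1' (hypothetical helper)."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised speaker code: {code!r}")
    l1, age, eth, num = match.groups()
    return Speaker(
        l1_english=l1 is not None,
        age_group="older" if age == "O" else "younger",
        ethnicity=ETHNICITY[eth],
        number=int(num),
    )


# Token arithmetic as reported: 19 words for each of 34 speakers,
# minus the one "more" that speaker L1YI1 did not produce.
WORDS_PER_SPEAKER = 19
N_SPEAKERS = 34
MISSING_TOKENS = 1

expected_tokens = WORDS_PER_SPEAKER * N_SPEAKERS      # 646
analysed_tokens = expected_tokens - MISSING_TOKENS    # 645
```

For instance, `parse_speaker_code("L1YC1")` yields a younger Chinese speaker flagged as an L1 speaker of English.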

As Salbrina and Deterding (2010: 125) point out, “the correlation between lowered F3 and R-colouring is only approximate, partly because it is not always possible to derive reliable estimates of F3”. Bearing this in mind, the mid-point of the F3 of each token was measured, using the automatic formant tracking feature in Praat, and hand corrected where necessary.
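As a rough illustration of the mid-point measurement, the sketch below picks the F3 value of the tracked frame nearest the temporal mid-point of a vowel. It does not call Praat; the function name, the `(time, F3)` track format and the example values are all hypothetical, standing in for a formant track that has been exported and hand corrected.

```python
def f3_at_midpoint(track, start, end):
    """Return the F3 (Hz) of the frame closest to the vowel mid-point.

    track: list of (time_in_seconds, f3_in_hz) pairs (assumed export format)
    start, end: vowel boundaries in seconds
    """
    midpoint = (start + end) / 2
    # Keep only frames that fall inside the vowel interval.
    frames = [(t, f3) for t, f3 in track if start <= t <= end]
    if not frames:
        raise ValueError("no formant frames inside the vowel interval")
    _, f3 = min(frames, key=lambda frame: abs(frame[0] - midpoint))
    return f3


# Invented example track with F3 falling towards a following /r/.
example_track = [(0.10, 2900.0), (0.12, 2750.0), (0.14, 2600.0),
                 (0.16, 2500.0), (0.18, 2450.0)]
example_f3 = f3_at_midpoint(example_track, start=0.10, end=0.18)
```

Taking the nearest frame rather than interpolating keeps the value tied to an actual tracker estimate, which is easier to check by hand.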

4 Findings and discussion

The perceptual examination of the sounds, done by the author and another researcher, yielded an agreement of 97% between the two raters in the first instance. Agreement was then reached for the few cases of disagreement upon listening to the contested cases again. Based on the perceptual analysis, three speakers in the older group showed evidence of rhoticity (see Table 3). It is also interesting to note that only two speakers in all the recordings (L1OI3 and L1YC2), incidentally both L1 speakers, produced a linking /r/ in the phrase stronger of, which shows that Malaysian speakers in general do not have this phonological feature. Among the younger group, only four out of the 15 speakers in the non-L1 group produced rhotacised tokens. Contrary to expectations, only one of the L1 speakers (L1YC2) pronounced coda /r/, and this only in four of the 19 words that she read, thus debunking the assumption that the L1 speakers, particularly the younger ones, are purveyors of the emergence of rhoticity in MalE. The findings of the acoustic analysis show that the average F3 value of the vowels preceding a pronounced coda /r/ (Mean = 2566 Hz, S.D. = 418 Hz) is significantly lower than that of the vowels preceding coda /r/ that was not realised (Mean = 2951 Hz, S.D. = 346 Hz): (t = 4.5, df = 643, independent samples, two-tailed, p < 0.001). In total, only 17 out of 645 (2.6%) words were rhotic, and these were produced by only eight of the 34 speakers, or 23.5% (see Tables 3 and 4). Both coda /r/ before consonants (e.g. in first, hard, warmly) and word final coda /r/ (e.g. in other, more, were, stronger) were found among these words (see Table 4). 
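The reported t statistic can be approximately reproduced from the summary figures alone. The sketch below assumes a pooled-variance (Student's) independent-samples t-test and assumes that the two groups contain 17 rhotic and 628 non-rhotic tokens; the group sizes are not stated explicitly for the test, but they are consistent with the reported df of 643. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent samples, from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Non-rhotic tokens first, so that t is positive (assumed group sizes).
t_value, degrees_of_freedom = pooled_t(
    mean1=2951, sd1=346, n1=645 - 17,   # vowels before unrealised coda /r/
    mean2=2566, sd2=418, n2=17,         # vowels before realised coda /r/
)
```

Under these assumptions `t_value` comes out close to the reported 4.5. With only 17 tokens in one group, a Welch test that does not pool the variances might be a reasonable alternative, though that is not what the chapter reports.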
As can be seen in Table 4, the incidence of rhoticity among the stressed closed syllables (11 out of 272 or 4%) is only slightly higher than that for the stressed open syllables (2 out of 67 or 3%), and thus, given these small numbers, it is not possible to say for sure in which environments the coda /r/ is more likely to be realised. There was, however, a larger difference between rhotic tokens in the stressed syllables (13 out of 339 or 3.8%) compared to unstressed syllables (4 out of 306 or 1.3%) as can be seen in Table 4. This may be an indication that rhoticity is more common in stressed syllables.
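The proportions quoted in this paragraph follow directly from the per-word counts in Table 4. The sketch below re-derives them; the dictionary layout and the `rhotic_rate` helper are illustrative choices, while the counts themselves are copied from Tables 2 and 4.

```python
# word: (position, stressed, tokens analysed, rhotic tokens), from Tables 2 and 4
TABLE_4 = {
    "north":      ("rC", True, 136, 0),
    "warm":       ("rC", True, 34, 0),
    "first":      ("rC", True, 34, 6),
    "hard":       ("rC", True, 34, 4),
    "warmly":     ("rC", True, 34, 1),
    "considered": ("rC", False, 34, 0),
    "were":       ("r#", False, 34, 1),
    "stronger":   ("r#", False, 68, 1),
    "traveller":  ("r#", False, 136, 0),
    "other":      ("r#", False, 34, 2),
    "more":       ("r#", True, 67, 2),
}


def rhotic_rate(position=None, stressed=None):
    """Return (rhotic, total, percentage) for the selected subset of words."""
    rows = [
        row for row in TABLE_4.values()
        if (position is None or row[0] == position)
        and (stressed is None or row[1] == stressed)
    ]
    rhotic = sum(row[3] for row in rows)
    total = sum(row[2] for row in rows)
    return rhotic, total, round(100 * rhotic / total, 1)


stressed_closed = rhotic_rate(position="rC", stressed=True)   # (11, 272, 4.0)
stressed_open = rhotic_rate(position="r#", stressed=True)     # (2, 67, 3.0)
all_stressed = rhotic_rate(stressed=True)                     # (13, 339, 3.8)
all_unstressed = rhotic_rate(stressed=False)                  # (4, 306, 1.3)
overall = rhotic_rate()                                       # (17, 645, 2.6)
```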

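Table 3: Rhotic tokens by speakers

| Speakers | Rhotacised tokens | Percentage of rhotic tokens per speaker |
|---|---|---|
| OM1 | None | 0 |
| OM2 | first, hard, warmly | 15.8 |
| OM3 | None | 0 |
| OM4 | first, hard, other | 15.8 |
| OM5 | None | 0 |
| OC1 | None | 0 |
| OC2 | None | 0 |
| OC3 | None | 0 |
| OC4 | first | 5.3 |
| OC5 | None | 0 |
| OI1 | None | 0 |
| L1OI2 | None | 0 |
| L1OI3 | None | 0 |
| L1OI4 | None | 0 |
| YM1 | None | 0 |
| YM2 | None | 0 |
| YM3 | first, hard | 10.5 |
| YM4 | None | 0 |
| YM5 | None | 0 |
| YC1 | hard, other | 10.5 |
| YC2 | more | 5.3 |
| YC3 | None | 0 |
| YC4 | first | 5.3 |
| YC5 | None | 0 |
| YI1 | None | 0 |
| YI2 | None | 0 |
| YI3 | None | 0 |
| YI4 | None | 0 |
| YI5 | None | 0 |
| L1YC1 | None | 0 |
| L1YC2 | first, more, were, stronger | 21.1 |
| L1YI1 | None | 0 |
| L1YI2 | None | 0 |
| L1YMX1 | None | 0 |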
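The per-speaker percentages in Table 3 are each speaker's rhotic tokens out of the 19 target words. As a cross-check, the sketch below recomputes them and tallies the rhotic tokens by word; the variable names are illustrative, and the data are simply the non-zero rows of Table 3.

```python
from collections import Counter

TOKENS_PER_SPEAKER = 19

# Speakers with at least one rhotacised token, as listed in Table 3.
rhotic_tokens = {
    "OM2": ["first", "hard", "warmly"],
    "OM4": ["first", "hard", "other"],
    "OC4": ["first"],
    "YM3": ["first", "hard"],
    "YC1": ["hard", "other"],
    "YC2": ["more"],
    "YC4": ["first"],
    "L1YC2": ["first", "more", "were", "stronger"],
}

# Percentage of the 19 target words realised with coda /r/, per speaker.
percentages = {
    speaker: round(100 * len(words) / TOKENS_PER_SPEAKER, 1)
    for speaker, words in rhotic_tokens.items()
}

# Rhotic tokens per word across all speakers.
word_counts = Counter(
    word for words in rhotic_tokens.values() for word in words
)
total_rhotic = sum(word_counts.values())
```

The tally gives six rhotic tokens of first and four of hard, 17 rhotic tokens in all, which agrees with the total reported in the text.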

None of the speakers realised more than 50% of the 19 tokens of coda /r/ in the text they read (see Table 3), which indicates that, similar to Singapore English, rhoticity is not a common phenomenon among MalE speakers at this point in time. This could well be because of the influence of British English on both these postcolonial varieties, but the findings that rhoticity is being perceived as a prestige feature in Singapore English also indicate the possibility of

moving from one exonormative norm to another (e.g. from British to American English). It further suggests that a pronunciation feature may at first be increasingly used due to, for example, the influence of media and entertainment (also said to be the case for Brunei English). This feature could later become accepted as part of the indigenous norm to the extent that it is no longer seen with reference to the exonormative norm, but as part of the endonormative norm. The inconsistent use of rhoticity is also reflected in the way the same word may have the coda /r/ produced in one instance but not in another by the same speaker (e.g. more by YC2 and L1YC2, and stronger by L1YC2).

Table 4: Word position of rhotic tokens

| Position | Stress | Tokens | Total by 34 speakers | Number and percentage of rhotic tokens according to word | Number and overall percentage of rhotic tokens |
|---|---|---|---|---|---|
| rC | Stressed | north | 136 | 0 | 0 |
| rC | Stressed | warm | 34 | 0 | 0 |
| rC | Stressed | first | 34 | 6 (17.6%) | 0.9 |
| rC | Stressed | hard | 34 | 4 (11.8%) | 0.6 |
| rC | Stressed | warmly | 34 | 1 (2.9%) | 0.2 |
| rC | Unstressed | considered | 34 | 0 | 0 |
| r# | Unstressed | were | 34 | 1 (2.9%) | 0.2 |
| r# | Unstressed | stronger | 68 | 1 (2.9%) | 0.2 |
| r# | Unstressed | traveller | 136 | 0 | 0 |
| r# | Unstressed | other | 34 | 2 (5.9%) | 0.3 |
| r# | Stressed | more | 67 | 2 (5.9%) | 0.3 |
| TOTAL | | | 645 | 17 (100%) | 2.6 |

As mentioned previously, there is also no overwhelming evidence of rhoticity among the younger L1 speakers as might have been expected. This suggests that it is not a developing pattern among those in their twenties. We can also assume that these speakers acquired a non-rhotic variety from their parents. The L1 speakers from both age groups were all non-rhotic except for L1YC2. Although this finding should be treated with caution given the small number of speakers, it may be the case that L1 speakers who have a tendency to use English more frequently and in more contexts, are not rhotic, at least not in the age groups examined.

The data in this study do not indicate any ethnically based patterns except for the fact that none of the non-L1 Indian speakers in either age group pronounced coda /r/. The small number of rhotic tokens produced by three Malays (two from the older group and one from the younger one) and five Chinese speakers (four from the younger group, including one Chinese L1 speaker, compared to one from the older group) is not sufficient for us to come to any conclusions about whether rhoticity is more common among the Malays and Chinese. A total of three speakers from the older group (two Malays and one Chinese) produced seven rhotic tokens, compared to five speakers from the younger group who produced ten tokens (see Table 3). None of these speakers, though, could be considered rhotic, given their inconsistent production of the non-prevocalic /r/. Further, the production of the /r/ could be attributed to the fact that they were reading a text rather than speaking spontaneously. For example, Tan and Gupta (1992) found a higher percentage of post-vocalic /r/ being produced when their speakers were reading a passage and a word list compared to when they were being interviewed. With three older speakers producing seven rhotacised tokens and five younger speakers producing ten, there is no overwhelming evidence that the younger speakers were necessarily more rhotic than the older ones.

5 Conclusion

In relation to the first research question, the combination of perceptual and acoustic findings shows no consistent realisation of coda /r/ among the Malaysian speakers in this study. R-lessness therefore appears to cut across the two age-groups, the three ethnic groups and L1 and non-L1 speakers of MalE. Based on this, MalE appears to still be a non-rhotic variety. Both groups acquired this non-rhotic variety of English in Malaysia from Malaysian family members or teachers who would have presumably been non-rhotic as well. Like the older group, the younger groups (who are in their twenties) of MalE speakers, being non-rhotic, are likely to pass on this non-rhotic variety to their children or students. The findings tentatively suggest that we are not likely to see MalE become rhotic quite so soon as Brunei English (see Deterding, this volume) or as Singapore English, where rhoticity is becoming prestigious. Given the overall inconsistent use of rhoticity among the speakers and especially the lack of rhotic realisations among the L1 speakers, there is no indication that rhoticity is developing as a prestige feature in MalE at present. Unlike Singapore, where it is reported that a “shift to an endonormative orientation [. . .]

has been completed” (Gut 2007: 356), Malaysia is still fixated on British English as a norm. For example, it is stipulated that British English should be used as a pedagogic model: “[t]eachers should use Standard British English as a reference and model for teaching the language. It should be used as a reference in terms of spelling and grammar as well as pronunciation for standardization” (Curriculum Development Division Ministry of Education 2012: 4). There is still a general sense that the local variety of English is not “good” enough or is an “incorrect” variety (Pillai 2008b) and therefore, an exonormative norm, in this case British English, is still considered the reference model. It may be the case that American English and American media and entertainment will influence younger generations of Malaysians to become increasingly rhotic despite the current exonormative orientation towards British English. However, perhaps it is not so much about whether there is a shift from one exonormative norm to another (especially in view of the fact that other features of American English pronunciation have not been reported in MalE), but about whether rhoticity in itself becomes a distinguishing feature of or a norm in MalE. If this were the case in the future, and if there were a shift to an endonormative orientation by then, the acceptance of this feature would no longer be tied to it being a feature of American English but of MalE. In relation to the second and third research questions, there was no obvious relationship between rhoticity and L1 versus non-L1 speakers of English, or older versus younger speakers, although the latter tended to be more non-rhotic. Returning to the lack of rhotic tokens found in this study, whilst this would suggest that MalE is not rhotic at this point in time, the small number of speakers in this study makes the findings tentative. 
Further studies on even younger groups of speakers from different educational and language backgrounds (e.g. international and local public schools) may reveal more about the realisation of coda /r/ in MalE. A study of speakers in Sabah and Sarawak, which border Brunei, may also show different patterns of use, as they may display more instances of rhoticity due to the variety of Malay in these States being rhotic.

Acknowledgements

The study reported in the paper was funded in part by a research grant from the University of Malaya (RG159-10HNE).

References

Aman, Idris, Rosniah Mustaffa, Zharani Ahmad, Jamilah Mustafa & Mohammad Fadzeli Jaafar. 2011. Aksen standard bahasa kebangsaan: Realiti, identiti dan integrasi [National language standard accent: Reality, identity and integration]. In Idris Aman (ed.), Aksen bahasa kebangsaan: Realiti, identiti dan integrasi [National language accent: Reality, identity and integration], 72–82. Bangi: Penerbit Universiti Kebangsaan Malaysia.
Omar, Asmah. 1977. The phonological diversity of the Malay dialects. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Baskaran, Loga. 2004. Malaysian English: Phonology. In Edgar W. Schneider, Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.), A handbook of varieties of English. Volume 1: Phonology, 1034–1046. Berlin: Mouton de Gruyter.
Boersma, Paul & David Weenink. 2013. Praat: Doing phonetics by computer. Version 5.3.57. http://www.praat.org/ (accessed 27 October 2013).
Canagarajah, A. Suresh & Adrian J. Wurr. 2011. Multilingual communication and language acquisition: New research directions. The Reading Matrix 11(1). 1–15.
Crystal, David. 1997. English as a global language. Cambridge: Cambridge University Press.
Curriculum Development Division Ministry of Education. 2012. Dokumen kurikulum standard sekolah rendah (KSSR): Bahasa Inggeris sekolah kebangsaan: Year 3 [Standard primary school curriculum document: English language for national schools: Year 3]. Putrajaya: Ministry of Education Malaysia. http://web.moe.gov.my/bpk/v2/kssr/index.php/dokumen_kurikulum/tahap_i/modul_teras_asas/bahasa_inggeris (accessed 19 January 2014).
David, Maya Khemlani, Ibtisam M. H. Naji & Sheena Kaur. 2003. Language maintenance or language shift among the Punjabi Sikh community in Malaysia? International Journal of the Sociology of Language 160–161. 1–24.
Deterding, David. 2006. The North Wind versus a Wolf: Short texts for the description and measurement of English pronunciation. Journal of the International Phonetic Association 36(2). 187–196.
Downe, Alan G., Siew-Phaik Loke, Jessica Sze-Yin Ho & Ayankunle Adegbite Taiwo. 2012. Corporate talent needs and availability in Malaysian service industry. International Journal of Business and Management 7(2). 224–235.
Govindan, Indira & Stefanie Pillai. 2009. English question forms used by young Malaysian Indians. The English Teacher 38. 74–94.
Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26(3). 346–359.
Hayward, Katrina. 2000. Experimental phonetics. Harlow: Longman.
Kirkpatrick, Andy. 2007. World Englishes: Implications for international communication and English language teaching. Cambridge: Cambridge University Press.
Ladefoged, Peter. 2003. Phonetic data analysis: An introduction to fieldwork and instrumental techniques. Oxford: Blackwell.
Love, Jessica & Abby Walker. 2012. Football versus football: Effect of topic on /r/ realization in American and English sports fans. Language and Speech 56(4). 443–460.
McArthur, Tom. 2002. The Oxford guide to World Englishes. Oxford: Oxford University Press.
Phoon, Hooi San & Margaret Anne Maclagan. 2009. Chinese Malaysian English phonology. Asian Englishes 12(1). 20–45.
Phoon, Hooi San, Anna Christina Abdullah & Margaret Maclagan. 2013. The consonant realizations of Malay-, Chinese- and Indian-influenced Malaysian English. Australian Journal of Linguistics 33(1). 3–30.
Pillai, Stefanie. 2008a. A study of the use of English among undergraduates in Malaysia and Singapore. Southeast Asian Review of English 48. 19–38.
Pillai, Stefanie. 2008b. Speaking English the Malaysian way: Correct or not? English Today 96 (24.4). 42–45.
Pillai, Stefanie. 2012. Colloquial Malaysian English. In Bernd Kortmann & Kerstin Lunkenheimer (eds.), The Mouton world atlas of variation in English, 573–584. Berlin: Mouton de Gruyter.
Pillai, Stefanie. 2014. The monophthongs and diphthongs of Malaysian English: An instrumental analysis. In Hajar Abdul Rahim & Shakila Abdul Manan (eds.), English in Malaysia: Postcolonial and beyond, 55–86. Frankfurt: Peter Lang.
Pillai, Stefanie & Mahmud Hasan Khan. 2011. I am not English but my first language is English: English as a first language among Portuguese Eurasians in Malaysia. In Dipika Mukherjee & Maya Khemlani David (eds.), National language planning and language shifts in Malaysian minority communities: Speaking in many tongues, 87–100. Amsterdam: Amsterdam University Press.
Pillai, Stefanie, Zuraidah Mohd. Don, Gerald Knowles & Jennifer Tang. 2010. Malaysian English: An instrumental analysis of vowel contrasts. World Englishes 29(2). 159–172.
Pillai, Stefanie, Zuraidah Mohd. Don & Gerald Knowles. 2012. Towards building a model of Standard Malaysian English pronunciation. In Zuraidah Mohd. Don (ed.), English in multicultural Malaysia: Pedagogy and applied research, 195–211. Kuala Lumpur: University of Malaya Press.
Poedjosoedarmo, Gloria. 2000. The media as a model and source of innovation in the development of Singapore Standard English. In Adam Brown, David Deterding & Low Ee Ling (eds.), The English language in Singapore: Research on pronunciation, 112–120. Singapore: Singapore Association for Applied Linguistics.
Rajadurai, Joanne. 2006. Pronunciation issues in non-native contexts: A Malaysian case study. Malaysian Journal of ELT Research 2. 42–59.
Ramasamy, Sheila Adelina. 2005. Analysis of the usage of post vocalic /r/ in Malaysian English. Kuala Lumpur: University of Malaya MA research report.
Salbrina, Sharbawi. 2010. The sounds of Brunei English: 15 years on. South East Asia: A Multidisciplinary Journal 10. 39–56.
Salbrina, Sharbawi & David Deterding. 2010. Rhoticity in Brunei English. English World-Wide 31(2). 121–137.
Schiffman, Harold F. 1995. Language shift in the Tamil communities of Malaysia and Singapore: The paradox of egalitarian language policy. Southwest Journal of Linguistics 14(1–2). 151–165.
Schneider, Edgar W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79(2). 233–281.
Tan, Ying-Ying. 2012. To r or not to r: Social correlates of /ɹ/ in Singapore English. International Journal of the Sociology of Language 218. 1–24.
Tan, Chor Hiang & Anthea Fraser Gupta. 1992. Post-vocalic /r/ in Singapore English. York Papers in Linguistics 16. 139–152.
The National Graduate Employability Blueprint 2012–2017. 2012. Ministry of Higher Education Malaysia Putrajaya 2012. http://jpt.mohe.gov.my/PENGUMUMAN/GE%20blueprint%202012–2017.pdf (accessed 30 March 2014).
Trudgill, Peter & Jean Hannah. 2008. International English: A guide to the varieties of Standard English, 5th edn. London: Hodder Education.

Appendix

The North Wind and the Sun were disputing which was the stronger when a traveller came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveller take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could but the more he blew the more closely did the traveller fold his cloak around him and at last the North Wind gave up the attempt. Then the Sun shone out warmly and immediately the traveller took off his cloak. And so the North Wind was obliged to confess that the sun was the stronger of the two.

Magdalena Wrembel
Faculty of English, Adam Mickiewicz University, Poznań, Poland

4 Cross-linguistic influence in second vs. third language acquisition of phonology

1 Cross-linguistic influence

The term cross-linguistic influence (CLI) was first introduced by Sharwood-Smith (1983) to refer to transfer-related phenomena in a theory-neutral manner. It is intended to cover a wider range of linguistic influences including interference, transfer, borrowing or language loss triggered by the coexistence of various language systems. Cross-linguistic influence in second language acquisition research has been traditionally perceived to be a one-to-one type of transfer between the native and the target language. However, as early as the 1980s, some scholars attempted to broaden this understanding by pointing not only to the native tongue but also to other non-native languages as potential sources of influence in the acquisition of subsequent languages. For instance, Gass and Selinker (1983: 372) define language transfer as “the use of native language (or other language) knowledge [. . .] in the acquisition of a second (or additional) language”. Along the same lines, Sharwood-Smith (1994: 198) defines CLI as “the influence of the mother tongue on the learner’s performance in and/or development of a given target language; by extension, it also means the influence of any ‘other language’ known to the learner on that target language”. Such a broader view going beyond L1 influence has been fully embraced by researchers working on third or additional language acquisition such as Cenoz, Hufeisen and Jessner (2001) and De Angelis (2007). A number of empirical studies conducted from the multilingual perspective have made it possible to challenge some well-established assumptions that identify the native language as the only or prevailing source of transfer and, consequently, to modify the existing theoretical models.
This resulted in a new conceptualisation of transfer-related phenomena, acknowledging various interactions between non-native languages and a simultaneous influence of more than one language on the target language being acquired. Therefore, the traditional one-to-one type of transfer associated with SLA was replaced by a suggestion of a many-to-one type of interference and
a proposal of the so called ‘combined cross-linguistic influence’ (De Angelis 2007: 21). The studies conducted to date have largely confirmed the assumption of combined CLI and, at the same time, identified a number of factors that seem to condition the source, direction and relative strength of the influence of previously learnt languages (both native and non-native) on the subsequently acquired language systems. Among the factors most frequently discussed in the literature are typological proximity, psychotypology, target/source language proficiency, order of acquisition of particular languages, recency of use, type of exposure and length of residence (cf. De Angelis 2007; Cenoz 2001). As far as typological proximity is concerned, scholars generally agree that cross-linguistic influence is most likely to occur between languages which are closely related rather than those which are not (e.g. Cenoz 2001; De Angelis 2007; Williams and Hammarberg 1998). Research findings in this field demonstrate that multilinguals tend to be influenced mostly by the languages from their linguistic repertoire that are or are perceived to be the closest to the target language, although there are also less frequent cases of reliance on distant languages. The factor of language distance can be seen as an objective formal measure of a genetic relationship between language families or as learners’ subjective perception of that language distance, i.e. psychotypology. According to Kellerman (1987), transferability is conditioned by two constraints, namely psychotypology and prototypicality, i.e. more prototypical forms/features in the source languages determine a higher degree of CLI into the target language, especially if these languages are perceived to be related/similar. A further distinction is drawn in the area of factors conditioning CLI between relatedness (i.e. 
genetic affiliation or typological proximity between languages belonging to the same or different language family or group) and formal similarity (i.e. explicit identification of similarity between unrelated languages with respect to some language components or features, De Angelis 2007). The proficiency factor is also commonly acknowledged in the literature as conditioning the source and strength of cross-linguistic influence. On the whole, research results so far have mostly attested CLI at the early stages of acquisition of the target language, when the proficiency level in this additionally acquired language is relatively low and learners tend to resort to transfer more frequently as a coping strategy (e.g. Ringbom 1987; Odlin 1989; Hammarberg and Hammarberg 2005; Wrembel 2010). Furthermore, Odlin (1989) claims that transfer characteristic for the low proficiency level in the target language is usually negative, as opposed to the positive type of transfer which typically occurs at more advanced stages of acquisition when learners take advantage of their previous linguistic knowledge much more. Interestingly, some scholars pointed also to

proficiency level in the source language as an important variable for CLI, although few systematic studies investigated it further. It tentatively appears that other non-native languages can be sources of cross-linguistic influence irrespective of how proficient the multilingual learners are (cf. Ringbom 1987; De Angelis 2007); however, some claim that proficiency in a non-native language must reach a sufficiently high threshold for that language to exert influence on another currently acquired foreign language (e.g. Fernandes-Boëchat 2007). Other CLI-related factors involve the length of residence and exposure to a foreign language environment, which are generally found to influence the amount and type of transfer. In addition, recency of use has been identified as one of the focal factors in several studies on multilingualism since the 1960s (cf. Vildomec 1963). The underlying assumption is that recent use tends to trigger more potential influence due to the previous activation of some linguistic information stored in the mind of a multilingual (cf. Williams and Hammarberg 1998). Finally, the order in which languages were acquired was also found to determine the amount and type of cross-linguistic influence (Dewaele 1998). Recent developments in investigations on transfer phenomena have resulted in a proposal of a complex scheme put forward by Jarvis and Pavlenko (2007: 20) that aims at characterising various types of cross-linguistic influence. The classification characterises CLI along ten dimensions: (1) the area of language knowledge (e.g. phonological, semantic, lexical transfer etc.), (2) directionality (forward, reverse, lateral, multidirectional transfer), (3) cognitive level (linguistic vs. conceptual transfer), (4) type of knowledge (implicit vs. explicit), (5) intentionality (intentional vs. unintentional transfer), (6) mode (productive vs. receptive), (7) channel (aural vs. visual), (8) form (verbal vs.
nonverbal), (9) manifestation (overt vs. covert) and (10) outcome (positive vs. negative). For the purpose of the present study, the notion of directionality of CLI is particularly relevant and is therefore described in more detail. The distinction between “forward transfer” (L1 → L2) and “reverse” or “backward transfer” (L2 → L1) is used rather conventionally in the SLA literature (e.g. Gass and Selinker 2001). These terms could potentially be extended to third or additional language acquisition provided that the sequential order of acquisition of L1, L2, L3, Ln is unambiguous and relevant, which is rarely the case, taking into account an array of other factors conditioning CLI. In an attempt to account for the complex nature of third or additional language acquisition, Jarvis and Pavlenko (2007) introduced the term “lateral transfer” to refer to any influence of a non-native (or post-L1) language on another non-native language (e.g. L2 → L3, L3 → L4). Furthermore, “bidirectional or multidirectional transfer” refers to the cases in which two or more languages from the multilinguals’ repertoire function simultaneously as source and recipient languages (L1 ↔ L2, L2 ↔ L3).
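The directionality labels described above can be illustrated with a minimal Python sketch. It assumes that each language is encoded by its rank in the order of acquisition (1 for the L1, 2 for the L2, and so on) and that this order is unambiguous, which, as noted, is rarely the case in practice. The function names and the extension of "forward" and "reverse" to pairs such as L1 → L3 are illustrative assumptions rather than part of Jarvis and Pavlenko's scheme.

```python
from typing import Set, Tuple


def classify_transfer(source: int, recipient: int) -> str:
    """Label the direction of CLI from the acquisition ranks of two languages.

    Ranks are an illustrative encoding: 1 = L1, 2 = L2, 3 = L3, ...
    """
    if source == recipient:
        raise ValueError("source and recipient must be different languages")
    if source == 1:
        return "forward"   # native language influences a later language, e.g. L1 -> L2
    if recipient == 1:
        return "reverse"   # a later language influences the native language, e.g. L2 -> L1
    return "lateral"       # both languages are non-native, e.g. L2 -> L3 or L3 -> L4


def is_bidirectional(observed: Set[Tuple[int, int]], a: int, b: int) -> bool:
    """True if influence between languages a and b is attested in both directions."""
    return (a, b) in observed and (b, a) in observed


# Hypothetical set of attested (source, recipient) influences for one learner.
attested = {(1, 2), (2, 1), (2, 3)}
for source, recipient in sorted(attested):
    print(f"L{source} -> L{recipient}: {classify_transfer(source, recipient)}")
```

In this sketch, bidirectional transfer is not a fourth label but a property of a pair of languages for which influence is attested in both directions.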

2 VOT in second and third language acquisition and new varieties of English

Voice onset time of initial plosives is frequently selected as the focus of investigations on interference-related phenomena in foreign language acquisition for several reasons. On the one hand, it is recognised as a significant feature correlated with a high degree of a perceived global foreign accent (e.g. Major 1990). On the other hand, due to the precise nature of the acoustic measurements of VOT, it allows for statistical comparisons and hypothesis testing. The subsequent overview of the literature shows major findings and differences in methodological approaches on cross-linguistic influence in VOT patterns in research on second language acquisition (SLA), third language acquisition (TLA) and the acquisition of new varieties of English.
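As a minimal illustration of why VOT lends itself to quantitative comparison, the Python sketch below computes VOT as the interval between the release burst of a plosive and the onset of voicing, so that a negative value corresponds to a voicing lead. The function names, the time stamps and the token values are hypothetical and are not taken from any of the studies reviewed here.

```python
from statistics import mean, stdev


def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds from two annotated time points (in seconds).

    Positive values indicate a voicing lag, negative values a voicing lead.
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


def summarise(vots):
    """Return the mean and sample standard deviation of a list of VOT values."""
    return mean(vots), stdev(vots)


# Hypothetical annotations: (burst release, voicing onset) in seconds.
tokens = [(0.100, 0.165), (0.300, 0.372), (0.500, 0.558)]
values = [vot_ms(burst, voicing) for burst, voicing in tokens]
average, spread = summarise(values)
print(f"mean VOT = {average:.1f} ms, SD = {spread:.1f} ms")
```

Per-speaker or per-language means obtained in this way are the kind of values that can then be submitted to statistical tests.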

2.1 VOT in SLA

As pointed out by Hansen Edwards and Zampini (2008), stop consonants are amongst the most widely studied classes of sounds in second language acquisition research with a focus on one acoustic cue, namely voice onset time. The SLA literature provides a lot of evidence of transfer of L1 VOT values in the acquisition of L2 aspiration patterns of stops, especially at the lower levels of L2 proficiency (e.g. Flege 1987; Flege and Hillenbrand 1987). More advanced learners were found to be able to approximate native speaker norms and to differentiate L1 and L2 with respect to VOT (e.g. Caramazza et al. 1973; Flege 1987, 1991). Only the most proficient L2 learners were reported to be able to produce foreign language aspiration patterns with a mean VOT duration that was like that of monolingual native speakers of the target variety (Flege 1987). The inability of inexperienced L2 learners to distinguish between differently aspirated plosives in the L1 and L2 was explained by means of the proposed mechanism of equivalence classification, which blocks the formation of a new phonetic category when L1 and L2 sounds are not sufficiently dissimilar (Flege 1987; Flege and Hillenbrand 1987). According to Flege’s (1995) Speech Learning Model (SLM), early acquirers are able to establish separate phonetic categories for L1 and L2 stops; however, late L2 learners are more likely to create a new “merged” L2 category, which may deflect away from both L1 and L2 categories in order to maintain the phonetic contrast between the two languages. Such “compromise” or “hybrid” VOT values for both languages were evidenced in several SLA studies (Flege 1987;
Flege and Eefting 1988; Major 1992). The results suggest that also L1 phonetic representations may get restructured as the result of L2 acquisition and the production of native language VOT values may be affected by the shift towards more target-like values in the L2, thus resulting in the so called regressive transfer (e.g. Waniek-Klimczak 2011). Furthermore, a number of studies have explored various factors that may influence the degree to which L2 learners are able to approximate native-like VOT durations. The most frequently explored factors included the age of acquisition, the effects of the speaking rate or language mode activation. Many researchers found that early bilinguals are more likely to produce initial plosives with target language durations than those subjects who started acquiring the second language at a later age (e.g. Flege 1991). Moreover, language proficiency was shown as a significant factor influencing the degree to which L2 learners are able to approximate native-like VOT durations (e.g. Flege and Hillenbrand 1984; Flege 1987). Since VOT length may alter as a function of the speaking rate, some studies were conducted also on the factor of rate-related VOT adjustments that need to be made by L2 learners in an attempt to approximate target norms (cf. Schmidt and Flege 1996). The influence of other languages the participants of these studies knew has been ignored completely though.

2.2 VOT research in TLA

By contrast, the mutual influence of all languages of multilinguals is the focus of studies from the perspective of third language acquisition (TLA), although relatively few studies to date have explored VOT patterns in this framework, where L3 phonological acquisition remains an understudied domain (cf. Cabrelli Amaro 2012). In the earliest reported study in this area, Tremblay (2007) analysed the acoustic measurements of voice onset time of four L1 English/L2 French bilinguals at the early stages of acquisition of L3 Japanese. The results showed similar VOT values for the L2 French and L3 Japanese, which were much lower than for the long-lag L1 English VOT. The findings were interpreted as an indication of the L2 effect on L3 phonological acquisition, although the L3 VOT values approximated L2 French and, at the same time, native Japanese target norms. Moreover, the participant sample was very limited. Interestingly enough, no task effect was found as the VOT patterns in L3 did not differ significantly with respect to the task performed, i.e. word list reading or delayed repetition. A comprehensive study by Llama, Cardoso and Collins (2010) investigated whether the “L2 status” or language typology was the determining factor in the production of voiceless stops in stressed onset position in L3 Spanish. The
experiment was based on target word list reading and involved two groups of learners; one with L1 English and L2 French, the other with L1 French and L2 English. The results indicated that the cross-linguistic influence from the L2 rather than typological proximity or the L1 transfer alone seemed to be the stronger predictor in the acquisition of VOT patterns in L3. However, the findings were not unambiguous as to the prevailing source of CLI pointing to the interaction of both native and non-native influences on the third language phonology. Particularly noteworthy is the application of a mirror-design methodology which allowed for a reliable verification of the research hypothesis. However, the lack of data in the participants’ L1s and the reliance on the literature reference values as a baseline instead appears to be a shortcoming of this valuable study. Wunder (2010), on the other hand, analysed text reading samples of eight L1 German speakers with respect to the VOT values in their L2 English and L3 Spanish. Her findings were mixed pointing to either L1 effect or combined L1 German and L2 English cross-linguistic influence on the aspiration patterns in L3 Spanish. The largest pool of VOT measurements was assigned to the category of ‘hybrid’ values in which it was not possible to determine whether the source of influence on L3 VOT were the L1 German or native Spanish values. In conclusion, Wunder stated that her results contradicted previous research demonstrating a prevailing L2 influence on L3 phonology (e.g. Hammarberg and Hammarberg, 2005). Similar results were reported by Sypiańska (2013) who examined VOT of word-initial /p, t, k/ in multilinguals with the following language repertoire: L1 Polish, L2 Danish and L3 English. Her findings attested a combined influence of both L1 and L2 on the VOT patterns in L3 English. 
An interesting case of a regressive transfer was also observed since the effect of L3 English was visible in increased VOT values in L1 Polish and L2 Danish in the trilingual group when compared to a bilingual control group with L1 Polish and L2 Danish. Sypiańska concluded that all component languages of multilingual subjects interact and influence one another in the global language entity. In a series of parallel studies Wrembel (2011, 2014) investigated VOT patterns in trilingual acquisition as a selected phonetic dimension of a foreign accent in order to complement previous research on perceived foreign accentedness based on L3 accent ratings (cf. Wrembel 2012a, 2012b). The results of the studies involving different language combinations, namely (1) L1 Polish, L2 English and L3 French and (2) L1 Polish, L2 English and L3 German, revealed that the multilingual subjects contrasted between VOT duration in all three language systems (i.e. the mean values for /p, t, k/ in stressed onset positions were significantly different in L1, L2 and L3). The reported L3 values corresponded to compromise VOT durations and were intermediate between the L1 and L2 mean VOT. The findings corroborated the coexistence of the L1 and L2 effect, and substantiated the assumption of a combined cross-linguistic influence in L3 acquisition. It was concluded that further research on different multilingual groups with various linguistic repertoires may be necessary to provide more evidence for these findings.

2.3 VOT in studies on new English varieties

Investigations into initial plosive voicing contrasts are relatively scarce in research on new varieties. To the best of my knowledge, a series of studies were conducted to this effect in the South African context (Wissing 2005; Wissing and Pretorius 1996) as well as from an Asian perspective (Poedjianto 2002; Shahidi and Rahim 2011). These mainly focus on L1 influence although the possible influence of language proficiency has also been studied. As in SLA studies, further languages the participants speak remain uninvestigated. Poedjianto (2002) investigated the voicing contrast in the Indonesian variety of English. The findings did not indicate any voicing contrast in Indonesian English between /p/-/b/ in the beginner learners, unlike the general case with English /p/-/b/. However, taking into consideration that the main indicator of the voicing contrast in Surabaya Indonesian is phonation (i.e. stiff and slack voice), the author hypothesises that there will be some adjustment made to reduce slackness along with the increase of VOT in Indonesian English. Moreover, Poedjianto predicts that VOT will progressively get longer across proficiency levels. The production of initial plosives in the Malaysian variety of English was explored by Shahidi and Rahim (2011). Unlike in English, Malay voiceless plosives are always unaspirated. The acoustic measurements of the participants’ productions of Malay and English voiced and voiceless obstruents demonstrated short-lag VOT values for /p, t, k/ in both languages (ranging between 10 and 30 ms) and a voicing lead for the voiced plosives. The phonetic realisations of Malaysian English initial plosives were found to be significantly different (i.e. lower) from the native English values. The authors concluded that the voicing patterns for Malay and Malaysian English are nearly identical, thus corroborating the claim of L1 influence.
The investigations into VOT patterns in African varieties of English feature studies by Wissing and Pretorius (1996) on Setswana, one of the Sotho languages of South Africa, and by Wissing (2005) on the aspiration of voiceless stop consonants in Southern Sotho. Sotho has a dual system in which the presence or absence of aspiration is phonemic, unlike in English where aspiration is phonetically motivated. Moreover, Sotho languages are characterised by long voicing lag in voiceless plosives. The results of Wissing and Pretorius’ (1996) study showed that voice onset time values for /p, t, k/ produced by Setswana speakers of English exceeded those reported for native English. Wissing’s (2005) study yielded comparable results with strongly aspirated voiceless plosives observed both in the Southern Sotho native renditions (in the 80–96 ms range) and in the speakers’ productions of the respective consonants in English (in the 55–90 ms range). In a detailed account of individual results the author proposes a specific explanation of language interference based on some category confusion (in the case of significantly shorter aspiration in /p/) rather than the typical negative transfer from the L1. All in all, the mean VOT values in this variety of English appear to be intermediate between the speakers’ native tongue and expected English values, yet L1 influence is strongly noticeable. In summary, studies in the framework of third language acquisition consider the largest number of different types of CLI and the widest range of potentially influencing factors compared to studies carried out in the framework of second language acquisition or new English varieties. It is the aim of this study to demonstrate the advantages of a multifaceted approach to investigating CLI from which studies on L2 acquisition and new English varieties could profit.

3 Study

3.1 Aims and research questions

The present study constitutes a part of a larger-scale project into third language phonological acquisition based on a series of studies on VOT patterns in different language combinations conducted by the author. Previous results of investigations on VOT patterns in L3 French and L3 German were presented in Wrembel (2014). The study aims to further investigate the complexity of transfer of voice onset time (VOT) patterns in trilingual acquisition. Its major objective is to explore the sources of cross-linguistic influence (CLI) in the acquisition of VOT in L3 French by L1 German learners with an advanced competence in L2 English. A further goal of this contribution is to compare the tendencies in VOT acquisition patterns found in L3 to those observed in research on SLA or new varieties of English. The languages involved in the present study all make a phonological distinction between two categories of stops; however, their phonetic realisation differs. English and German belong to the category of the so-called aspirating languages (cf. Lisker and Abramson 1964), which differentiate between voiceless aspirated
and voiceless unaspirated plosives, whereas French is a voicing language, in which there is a distinction between voiced and voiceless unaspirated plosives. In English /p/, /t/, /k/ are implemented as long-lag stops with VOT around 60–80 ms (Lisker and Abramson 1964), while in German the average VOT values are said to be between 30 and 50 ms (Angelowa and Pompino-Marschall 1985). In turn, in French /p/, /t/, /k/ are implemented as short-lag stops with mean VOT values around 20–30 ms (Caramazza et al. 1973). The study poses the following research questions in order to address the specified objectives:
1) Do multilingual subjects differentiate among their L1, L2 and L3 with regard to VOT values?
2) Do L3 VOT patterns approximate the participants’ L1 German, L2 English or the L3 native French norms?
3) Which factors have an influence on the CLI found for VOT production in the three languages?
4) Do the trends observed in L3 acquisition of VOT resemble the ones reported in studies on SLA and new English varieties?
On the basis of the overview of the literature on third language acquisition, three potential general outcomes as to the sources of CLI were hypothesised: (1) native L1 German would be a prevailing source of cross-linguistic influence for the acquisition of VOT patterns in L3 French; (2) the influence of L2 English, the so-called “foreign language effect”, would override the native language in shaping L3 VOT values; (3) both the native and non-native languages would have an impact on the VOT values in the L3, thus corroborating the assumption of a combined cross-linguistic influence. With regard to the comparison between patterns of VOT acquisition it was hypothesised that (1) similar trends are observed in the acquisition of a second language, third language or new varieties; (2) the trends differ significantly, thus reflecting the specific nature of these three contexts of acquisition.
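The second research question can be made concrete with a small descriptive sketch in Python. It takes the reference VOT ranges cited above (English 60–80 ms, German 30–50 ms, French 20–30 ms) and reports which range a given mean L3 VOT value falls into or lies closest to. The function names and the input values are hypothetical, and the sketch is only an illustration of the comparison, not the statistical analysis used in the study.

```python
# Reference ranges in ms, as cited in the text from the literature.
REFERENCE_RANGES_MS = {
    "L1 German": (30.0, 50.0),
    "L2 English": (60.0, 80.0),
    "L3 French (native norm)": (20.0, 30.0),
}


def distance_to_range(value: float, low: float, high: float) -> float:
    """Distance in ms from a value to a closed range (0 if the value is inside)."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def closest_norms(vot: float) -> list:
    """Return the reference norm(s) nearest to a VOT value, sorted by name."""
    distances = {
        name: distance_to_range(vot, low, high)
        for name, (low, high) in REFERENCE_RANGES_MS.items()
    }
    smallest = min(distances.values())
    return sorted(name for name, d in distances.items() if d == smallest)


# Hypothetical mean L3 VOT values for three learners.
for value in (25.0, 42.0, 55.0):
    print(value, "ms ->", ", ".join(closest_norms(value)))
```

A value that is equally close to two ranges is reported under both, which loosely mirrors the "hybrid" or compromise values discussed in the literature review.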

3.2 Participants and procedure

The study involved 18 native speakers of German who were students at the University of Münster, Germany at the time of data collection. There were 15 female and 3 male participants and their mean age was 29 years (SD = 5.6), ranging from 22 to 43 years old. For all of the participants English was their second language (L2) and French was their third language (L3) both in terms of


Magdalena Wrembel

chronology and the dominance of use. The level of proficiency in L2 English was advanced (C1, according to CEFR) with an average length of training being 14 years (SD = 4.6) and the age of onset at 10 years old (SD = 1.6). In the case of L3 French, the participants’ proficiency level was intermediate (B1/B2 level according to CEFR). Foreign language proficiency level was self-declared by the participants based on internal course placement assessment procedures. The average amount of time of formal training in French (YFT) was 7 years (SD = 3.3), whereas the mean age of onset of learning (AOL) equalled 13 years (SD = 1.5). The total number of foreign languages known by the participants equalled on average 3.3 (SD = 1.1), ranging from 2 to 7. Their self-evaluation of the general language competence in L3 French on a scale from 1–5 (1 = very poor, 5 = very good) equalled 3.3 (SD = 0.8), similarly to the self-evaluation of L3 pronunciation, which was 3.4 (SD = 0.9), corresponding to a category between satisfactory and good. The participants had undergone general linguistic training; however, no practical training of the phonetic feature under investigation was reported.

The data collection procedure involved all three language systems of the multilingual participants, i.e. L1 German, L2 English and L3 French. The stimuli consisted of three word lists with 18 target words in the respective languages. The target words included voiceless plosives /p, t, k/ in stressed onset positions followed by high, mid and low vowels, in mono- and disyllabic words, thus generating a total of 18 items per language list. The words were randomised and embedded in carrier phrases in the respective languages (i.e. Ich sage. . . , I am saying . . . , Je dis. . .). The recordings were made in a clearly specified language mode in the natural order of acquisition of the languages involved, with German as first, English as second and French as third. The participants were asked to read the lists at a natural speed with a few minutes’ break between the recordings. The interaction with the researcher was carried out in the language of the subsequent recording to promote the activation of the respective languages. Finally, a language background questionnaire was administered to tap the subjects’ language history and use.

The stimuli were recorded with Audition CS5.5 as 16-bit mono files at 32000 Hz sampling frequency. Tokens were excluded from the analysis if the target words were mispronounced. A total of 1512 tokens were subjected to an acoustic analysis performed using PRAAT 5.2.15 (Boersma and Weenink 2010). Voice onset time was measured in milliseconds (ms) as the interval between the release burst and the beginning of the regular vocal fold vibrations.
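As a rough illustration of the measurement just described, VOT can be computed from two time points annotated on the waveform. The sketch below is not the procedure used in the study (the measurements there were made in PRAAT); the function name and the example time stamps are hypothetical.

```python
def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Return voice onset time in milliseconds.

    Both arguments are time stamps in seconds (hypothetical annotations):
    the release burst of the plosive and the start of regular vocal fold
    vibration. A positive result means voicing lags behind the release.
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


# Hypothetical token: burst at 1.250 s, voicing starts at 1.317 s.
example = vot_ms(1.250, 1.317)
print(f"VOT = {example:.0f} ms")
```

A negative value would correspond to voicing that starts before the release (prevoicing), which is not expected for the voiceless targets examined here.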


3.3 Results

The analysis of the results was based on the acoustic measurements of mean voice onset time of the target words read in the carrier phrases in L1 German, L2 English and L3 French and it involved (1) mean VOT values for L1, L2 and L3, (2) the comparison to VOT reference values, (3) the analysis of the context effects and (4) the analysis of variance and correlation analysis accounting for the relationships between independent variables. The statistical analyses were performed using SPSS.

3.3.1 Mean VOT values for L1, L2 and L3

Figure 1 presents the mean results of VOT measurements for the voiceless plosives /p/, /t/, /k/ in stressed onset positions in the participants’ L1 German, L2 English and L3 French. The VOT values produced in the participants’ first and second language were relatively similar (L1 German /p/ = 67 ms, /t/ = 71 ms, /k/ = 84 ms; L2 English /p/ = 64 ms, /t/ = 74 ms, /k/ = 83 ms), and were characterised by a longer lag than the values for L3 French (/p/ = 49 ms, /t/ = 55 ms, /k/ = 68 ms).

Figure 1: Mean VOT (ms) values


Across-language comparisons of means for /p/, /t/, /k/ were performed by means of the analysis of variance and a non-parametric Kruskal-Wallis test. Repeated-measures ANOVA pointed to significantly different values for initial voiceless plosives between L1 German and L3 French as well as between L2 English and L3 French (p < .05); however, the difference between mean VOT values in L1 German and L2 English was not statistically significant.

Table 1: Repeated measures ANOVA – a comparison of VOT means in L1 German, L2 English and L3 French

ANOVA p    L1 vs. L2    L1 vs. L3    L2 vs. L3
/p/        1.000000     0.000000*    0.000000*
/t/        0.273820     0.000022*    0.000022*
/k/        0.908390     0.000022*    0.000022*

The following box plots (Figures 2–4) illustrate the observed tendencies in VOT patterns in the respective languages separately for the stressed onset plosives /p/, /t/ and /k/. While the distribution in L1 German and L2 English shows only negligible discrepancies with respect to VOT means, standard deviation as well as the minimum-maximum range, the mean values for L3 French remain significantly lower although the minimum-maximum range is even more pronounced. The language effect was thus observed to hold only between the third language and the remaining two phonological systems. The mean voice onset time values in L3 French were considerably lower than the respective values in L1 German and L2 English, which, on the other hand, display very similar patterns of distribution.

In order to investigate the relationship between the VOT values observed in L3 French and those of the native L1 German as well as L2 English, a Pearson correlation analysis was applied. The calculated coefficients pointed to weak to moderate positive correlations between the mean VOT values. In the case of the voiceless bilabial plosive /p/, the correlation between the non-native languages (L3 French and L2 English) was slightly higher (R = .34) than the one between the L3 and the native German VOT values (R = .25). For the alveolar and velar plosives /t/ and /k/, the correlations were slightly stronger between L3 and L1 than between L3 and L2, although they were in the weak range for /t/ (R = .31 vs. R = .21 respectively) and in the medium range for /k/ (R = .42 vs. R = .37 respectively). It appears impossible to state unequivocally whether L3 VOT values were correlated more with the native values or those of another foreign language, as the differences between the coefficients were relatively small.


Figures 2–4: Box plots for /p/ /t/ /k/ in L1 German, L2 English and L3 French

Table 2: Pearson’s correlations between L3 VOT values and L1 and L2

       VOT          N      R       t      p
/p/    L1 vs. L3    106    0.25    2.6    0.009326*
       L2 vs. L3    106    0.34    3.7    0.000353*
/t/    L1 vs. L3    106    0.31    3.3    0.001344*
       L2 vs. L3    106    0.21    2.1    0.033813*
/k/    L1 vs. L3    108    0.42    4.7    0.000008*
       L2 vs. L3    108    0.37    4.1    0.000076*
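The t values listed in Table 2 are related to R and N by the standard test statistic for a Pearson correlation, t = R·√(N − 2) / √(1 − R²). The following minimal sketch recomputes t from the rounded R values in the table; it is offered only as a consistency check, not as the SPSS procedure used in the study, and the function name is an assumption. Because R is rounded to two decimals, the recomputed values agree with the table only to within about 0.1.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r based on n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# (label, R, N) as reported in Table 2
rows = [
    ("/p/ L1 vs. L3", 0.25, 106),
    ("/p/ L2 vs. L3", 0.34, 106),
    ("/t/ L1 vs. L3", 0.31, 106),
    ("/t/ L2 vs. L3", 0.21, 106),
    ("/k/ L1 vs. L3", 0.42, 108),
    ("/k/ L2 vs. L3", 0.37, 108),
]

for label, r, n in rows:
    print(f"{label}: t = {t_from_r(r, n):.2f} (df = {n - 2})")
```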

3.3.2 Individual variation

The analysis of VOT measurements investigated also the individual variation in the generated VOT values for /p/, /t/, /k/. Due to space limitations, only the individual distribution for L3 French is presented, which is of most relevance for the


Figure 5: Individual variation in L3 French VOT for /p/ /t/ /k/ against the reference VOT values

present study. As can be seen from Figure 5, nearly all of the participants, with a few exceptions (CA, EF, JP), followed the universal VOT pattern, with bilabial plosives yielding the shortest VOT values and velars the longest. The greatest variability seems to be visible for /p/, whereas /t/ and /k/ tended to generate less interspeaker variation. Individual average VOT measures are presented against the selected reference VOT values for French (Caramazza et al. 1973). On the whole, the observed L3 values surpass the reference VOT measurements, with such individuals as CA and EF representing the most extreme departures from the norm. On the other hand, the L3 performance of some individual participants like AB, CF and JS appears to be fairly close to the French norm VOT values.

3.3.3 Comparison with L1 reference values

One-sample t-tests were administered to compare the calculated mean VOT durations for /p, t, k/ in L1 German, L2 English and L3 French to the reference values often quoted in the literature for the respective languages. The overall finding was that the VOT measurements differed significantly from the native


norms as reported in the literature (see Table 3). More specifically, the VOT values for voiceless stops in L1 German of the multilingual participants were found to be significantly longer than the reference VOT German values quoted in the literature (Angelowa and Pompino-Marschall 1985), i.e. /p/ 66.6 vs. 36 ms; /t/ 70.6 vs. 39 ms; /k/ 84.3 vs. 47 ms. As far as the VOT measurements in L2 English are concerned, they were demonstrated to be closer to the reference range (Lisker and Abramson 1964), especially in the case of /k/ (83.3 vs. 84 ms); however, the bilabial and alveolar stops were realised on average with a longer lag than the reference values (/p/ 64.3 vs. 59 ms; /t/ 74 vs. 67 ms). Although the differences were found to be statistically significant for /p/ and /t/, they were still within the accepted 5–10 ms range. Considerable VOT lengthening was also observed for L3 French when compared to the literature reference values (Caramazza et al. 1973), with /p/ equal to 46.8 vs. 18 ms; /t/ 55.1 vs. 23 ms; /k/ 68 vs. 32 ms.

All in all, the French stops were implemented by the multilingual participants as long-lag and thus the L3 phonetic norms were not approximated successfully. The findings demonstrated “compromise” values for L3 French that were longer than typical French native values but shorter than the values observed for both L1 German and L2 English. It is thus impossible to tease apart the influence of the first or the second language on the values in the third language, as the values for the participants’ native German and L2 English did not differ significantly from one another.

Table 3: Comparison to VOT reference values in L1 German, English and French, p < .01. (1 Angelowa and Pompino-Marschall 1985; 2 Lisker and Abramson 1964; 3 Caramazza et al. 1973)

VOT                      /p/        /t/        /k/
German    Ref. VOT 1     36         39         47
          L1 M           66.6       70.6       84.3
          SD             19.5       16.6       19.0
          p              0.0000*    0.0000*    0.0000*
English   Ref. VOT 2     59         67         84
          L2 M           64.3       74.0       83.3
          SD             18.1       16.9       17.3
          p              0.0029*    0.0000*    0.6693
French    Ref. VOT 3     18         23         32
          L3 M           46.8       55.1       68.0
          SD             23.3       16.1       18.9
          p              0.0000*    0.0000*    0.0000*
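The comparisons in Table 3 are one-sample t-tests of an observed mean against a fixed reference value. The sketch below shows how such a statistic follows from the summary values, t = (M − reference) / (SD / √n). The number of tokens per plosive is not stated with the table, so n = 108 is an assumption here (it matches the largest N listed in Table 2); the function name is likewise hypothetical.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, reference: float) -> float:
    """One-sample t statistic computed from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error


N_ASSUMED = 108  # assumption: tokens per plosive, not given alongside Table 3

# L2 English values from Table 3: (plosive, observed mean, SD, reference value)
english = [("/p/", 64.3, 18.1, 59), ("/t/", 74.0, 16.9, 67), ("/k/", 83.3, 17.3, 84)]

for plosive, mean, sd, ref in english:
    t = one_sample_t(mean, sd, N_ASSUMED, ref)
    print(f"{plosive}: t({N_ASSUMED - 1}) = {t:.2f}")
```

Under this assumption the statistic is clearly away from zero for /p/ and /t/ and close to zero for /k/, which is in line with the pattern of significant and non-significant p values reported for English.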


A potential explanation for the mismatch between the L1 German VOT values observed in this study and the reference values from Angelowa and Pompino-Marschall’s (1985) study could be related to dialectal differences among the participants: those in the present study mainly represented the Western Low German area, whereas those in the reference study came from the South German region. Braun (1996) provides relevant support for this suggestion on the basis of her comparison of VOT values in various regional varieties of German, in which the VOT values for North-Western German speakers tend to be higher, resembling those reported in the present study. Consequently, this fact could have also contributed to the present participants’ rather successful renditions of target-like VOT values for English /p, t, k/.

3.3.4 Analysis of variance

A two-factor ANOVA between languages (L1, L2, L3) and VOT durations of the voiceless plosive sounds /p, t, k/ was performed as part of the analysis of variance. Although the differences in VOT values within the factors of language (F (2; 959) = 92.6, p < .05) and segments (F (2; 959) = 89.6, p < .05) were shown to be significant, the interaction between languages and segments on the VOT values was not found to be significant (F (4; 959) = 0.93, p > .05). In other words, the effect of the segment did not depend on the language, as presented in Figure 6.

In order to investigate the interaction of the vowel context and the language on the observed VOT durations, a two-factor ANOVA was performed for the factors of the language (L1, L2, L3) and the context of the vowel following the voiceless plosives in stressed onset positions in the target words (_/a/, _/i/, _/e, o/). In accordance with universal tendencies, the context of high vowels (e.g. /i/) should generate longer VOT values in the preceding plosives than the context of low vowels (e.g. /a/). The results of the analysis indicate that there are significant differences in VOT values within the factor of languages (F (2; 959) = 79.5, p < .05) and the vowel context (F (2; 959) = 11.5, p < .05). Moreover, there is also a significant interaction between the two factors (F (4; 959) = 3.98, p < .05), i.e. the effect of the vowel context depends on the language (see Figure 7). Different patterns of interaction can be observed in the respective languages, with only L3 French following closely the universal patterns, i.e. the longest VOT values in the high vowel context /i/, medium for the mid vowels /e/, /o/, and the shortest for the low vowel context /a/. In L1 German the universal tendencies were not fully observed, with the /a/ context generating on average the longest VOT values, whereas in L2 English the mid and low vowel contexts yielded different VOT duration patterns than the expected ones.


Figure 6: The interaction between the language and segment factors

3.3.5 Correlation analysis of factors influencing VOT production

The analysis of the results involved also the computation of linear Pearson’s correlation between different independent variables and the observed mean VOT durations in particular languages. The selected variables involved such factors as the participants’ age (AGE); the years of formal training in L2 English and L3 French (L2_YFT, L3_YFT); the starting age of learning of both foreign languages (L2_AOL, L3_AOL); proficiency level in L2 English and L3 French (L2_Prof, L3_Prof); self-evaluation of general language proficiency in both languages (L2_self-eval, L3_self-eval); self-evaluation of pronunciation competence in L2 and L3 (L2_self-eval PRON, L3_self-eval PRON); and the total number of foreign languages known by the participants (N_TOTAL). No significant Pearson’s correlations were found for the observed values in L1 German and L2 English; however, L3 French displayed some interesting patterns of dependence. A positive moderate correlation was found between the years of instruction in L2 English and an average VOT duration in L3 French (r = 0.52, p = 0.03), i.e. the longer the training in English, the longer the VOT


Figure 7: The interaction between the language and vowel context factors

values in L3 French. Moreover, a significant negative correlation between the self-evaluation of pronunciation in L3 and VOT length in L3 for /p/ and /t/ was observed (r = –0.51, p = 0.03), i.e. the better one’s self-assessment of L3 oral performance in French, the lower the observed VOT values in L3, which corresponds to more native-like French VOT patterns. The correlations between the remaining variables did not prove significant.

Furthermore, another Pearson’s correlation analysis was performed to investigate any dependence across the selected factors. The analysis pointed to a number of significant correlations (see Table 4) including a strong positive correlation between the amount of training in L3 and the self-evaluation of L3 general competence (r = 0.69, p < .05) as well as between the amount of training in L3 and L3 proficiency level (r = 0.72, p < .05). Moreover, a strong positive correlation was found between self-evaluation of general L3 proficiency and self-evaluation of L3 pronunciation (r = 0.78, p […]

[…] (>> indicates that the constraints to the left are higher ranked than those to the right)

NO COMPLEX ONSET: “No complex onsets/codas allowed”

(14) NO COMPLEX CODA and NO COMPLEX ONSET

Loanword                   Modern Bangla pronunciation    Gloss
/bentʃ/ (English)          [bentʃɪ]                       ‘bench’
/læmp/ (English)           [læmpo]                        ‘lamp’
/treɪn/ (English)          [teren]⁹                       ‘train’
/srad̪ʰd̪ʰo/ (Sanskrit)      [ced̪d̪a]                        (gloss not available)

The repair strategies employed to avoid complex codas and onsets could be either deletion or epenthesis. We once again observe that these constraints NO COMPLEX ONSET and NO COMPLEX CODA apply to Modern Bangla, Assimilated Sanskrit words and the English lexicon.

9 However, interestingly, coda clusters are tolerated in words like [turunk] (trunk) and [ɪʃtænd] (stand) as mentioned in (3) above. This shows that NO COMPLEX ONSET is higher ranked than NO COMPLEX CODA. Epenthesis of a vowel to break the coda cluster is not possible as it would then violate the DISYLLABIC TROCHEE CONSTRAINT.


Hemalatha Nagarajan

4.1.3 Gemination in Bangla

Kar (2009b: 111) discusses different types of gemination processes in Bangla and attempts to identify the domains of application of these rules.

A. Gemination with semi-vowels: A post-consonantal semi-vowel is lost, leading to the gemination of the preceding consonant.

(15) Gemination with semi-vowels

Loanword                  Modern Bangla pronunciation    Gloss
/sa:d̪ʰʋɪ/ (Sanskrit)      [sad̪ʰd̪ʰɪ]                      ‘faithful wife’
/pṛɪt̪ʰʋɪ/ (Sanskrit)      [pṛɪt̪ʰt̪ʰɪ]                     ‘earth’
/bɪsʋa:s/ (Sanskrit)      [bɪʃʃaʃ]                       ‘trust’
/sat̪ʰya/                  [ʃot̪ʰt̪ʰo]                      ‘truth’
/ba:lyaka:l/              [ballokal]                     ‘childhood’
/pʊnja/                   [pʊnno]                        ‘virtue’

Kar attributes this gemination to the operation of the constraint SYLLABLE CONTACT LAW. Vennemann (1988) and Murray and Vennemann (1983) proposed the SYLLABLE CONTACT LAW which attempts to explain syllabification patterns and sound change at syllable boundaries in terms of a single, graded preference “law”. The proposed law can be paraphrased as: “A syllable contact pair α. β is more preferred the greater the increase in consonantal strength from a coda segment α to an onset segment β.”

This explains why gemination takes place in the above mentioned contexts as a less sonorous segment is followed by a segment that is more sonorous: a semi-vowel. We, on the other hand, believe that there is no need to invoke the constraint SYLLABLE CONTACT LAW in these cases. This is because Bangla “hardens” the sonorants /ʋ/ and /j/ to the corresponding obstruents /b/ and /dʒ/. Once these are created, we assume there is total spread of features from the preceding consonant. For instance, the word /pṛɪt̪ʰʋɪ/ (‘earth’) is first converted to [prɪt̪ʰbʰɪ]. Progressive total assimilation takes place then and changes it to [prɪt̪ʰt̪ʰɪ].

/pṛɪt̪ʰʋɪ/ → [prɪt̪ʰbʰɪ] → [prɪt̪ʰt̪ʰɪ]

This spread of features is bidirectional as illustrated in the word /va:hja/ (Sanskrit) → /badʒʰdʒʰo/ (Bangla). The glide /j/ initially changes to /dʒʰ/, which

Loanword adaptation and second language acquisition


spreads to the previous segment too, leading to gemination. This, understandably, applies only to Sanskrit words that have been assimilated totally into Bangla on which the phonological rules of Bangla are imposed. Kar (2009b: 113) also mentions another gemination rule,¹⁰ i.e.:

B. Gemination with liquids: Voiced plosives geminate before a liquid /r/ or /l/, keeping the latter sound intact. However, he categorically states that this rule applies only to Sanskrit, English and Arabic loanwords and not to Modern Bangla words.

(16) A. Gemination with liquids in loans

Loanword                 Modern Bangla pronunciation    Gloss
/pʊt̪ʰra/ (Sanskrit)      [pʊt̪ʰt̪ʰro]                     ‘son’
/sʊbʰra/ (Sanskrit)      [ʃʊbʰbʰro]                     ‘white/bright’
/sʊpri:m/ (English)      [ʃʊpprɪm]                      ‘supreme’
/sʌplaɪ/ (English)       [ʃapplaɪ]                      ‘supply’
/mad̪rasa/ (Arabic)       [mad̪d̪raʃa]                     ‘school’

B. No Gemination with liquids in Modern Bangla

Native Bangla            Modern Bangla pronunciation    Gloss
/sa:t̪ʰra/                [ʃat̪ʰra]                       ‘a Bengali surname’
/babrɪ/                  [babrɪ]                        ‘long curling hair-style’
/sapla/                  [ʃapla]                        ‘water lily’

Kar attributes this too to SYLLABLE CONTACT LAW. In Modern Bangla words, however, it is overridden by the constraint NO COMPLEX ONSET as gemination would lead to the creation of a complex onset word-medially, for example /babrɪ/→[babbrɪ].
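The Syllable Contact Law invoked in this section compares the consonantal strength of a coda with that of the following onset. The sketch below makes that comparison explicit for the liquid cases in (16A). The numeric strength scale is an assumption introduced only for illustration (a coarse five-step scale by manner class); neither Kar nor Vennemann is being quoted for these values, and the function name is hypothetical.

```python
# Assumed consonantal strength scale (higher = stronger, i.e. less sonorous).
STRENGTH = {"glide": 1, "liquid": 2, "nasal": 3, "fricative": 4, "plosive": 5}


def contact_score(coda: str, onset: str) -> int:
    """Increase in strength from coda to onset; larger values are preferred."""
    return STRENGTH[onset] - STRENGTH[coda]


# /pʊt̪ʰ.ra/: plosive coda followed by a liquid onset (strength falls).
before = contact_score("plosive", "liquid")
# [pʊt̪ʰ.t̪ʰro]: after gemination the contact is plosive + plosive (strength level).
after = contact_score("plosive", "plosive")

print(f"before gemination: {before}, after gemination: {after}")
```

On this assumed scale gemination improves the contact, at the cost of creating a word-medial complex onset, which is the trade-off the paragraph above describes for Modern Bangla words.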

10 In addition to these two contexts of gemination, Kar also presents cases of gemination of a plosive before the nasal /m/. For example,

/pad̪ma/ (Sanskrit)     [pɔd̪d̪o]     ‘lotus’
/cʰad̪ma/ (Sanskrit)    [cɔd̪d̪o]     ‘disguise’
/a:t̪ʰma/ (Sanskrit)    [at̪ʰt̪ʰo]    ‘soul’

We treat these as cases of restructuring as (i) these are rare instances, and (ii) gemination is disallowed in a velar plosive + /m/. For example, /rukmɪnɪ/ is never realised as /rukkɪnɪ/.


4.2 The core-periphery of the Bangla lexicon

To summarise this section, we note that loanwords adopted into a language undergo phonological changes depending on the historical period when they were incorporated. Loanwords adopted into Bangla at the same period as English undergo the same constraints, whereas loanwords from an earlier period undergo a different set of constraints. Constraints get reordered in different periods of time. Careful examination of Modern Bangla data and loanwords from English, Sanskrit (and Arabic) reveals the operation of the following families of constraints:

Markedness Constraints        Faithfulness Constraints
DISYLLABIC TROCHEE            NO GEM (No gemination)
NO COMPLEX ONSET              MAX (No deletion)
NO COMPLEX CODA               DEP (No epenthesis)
SYLLABLE CONTACT LAW

However, what is interesting is that not all of them apply uniformly to either Bangla or to all the loanwords used in the language. The diagram shown below illustrates this. The syllable structure rules of NO COMPLEX CODA and NO COMPLEX ONSET, and the DISYLLABIC TROCHEE requirement apply to Modern Bangla, Assimilated Sanskrit loans, English and Portuguese loans. The constraint SYLLABLE CONTACT LAW is seen only in Assimilated Sanskrit vocabulary, English and Portuguese loans and in Arabic loans. This clearly shows that NO COMPLEX ONSET is very high ranked in Modern Bangla whereas this is relaxed in the loans in the medial position (as gemination of the plosive before liquids leads to the creation of a complex onset word-medially, for example /put̪.t̪ro/). To sum up, the constraint rankings that we need for the different strata are as follows (see Figure 4).

Thus, phylogenetically, the historical evolution of the Bangla language reveals a simplification process from Old Bangla to Modern Bangla, triggered by markedness principles. Initially, Old Bangla tolerated complex onsets, complex codas and had words longer than two syllables. Hence, the loans that were incorporated into the language prior to the 13th century underwent no changes. This is reflected in the Unassimilated Sanskrit loans and the Arabic loans. Hence, the ranking order was:

Constraint ranking for Old Bangla, Unassimilated Sanskrit loans, Arabic loans:
Faithfulness >> Markedness


Figure 4: The core and periphery of the Bangla lexicon

Gradually, Bangla started undergoing changes, seen in Middle and Modern Bangla. With the simplification of Bangla, triggered by markedness principles, we notice phonological repairs in the loans too. Thus, there was a gradual shift or promotion of Markedness constraints over Faithfulness constraints.

Constraint ranking for Modern Bangla, Assimilated Sanskrit loans and English loans:
Markedness >> Faithfulness

Thus, we find convergence in the languages that were incorporated into the Bangla lexicon at the same historical period.


2 Second/third language acquisition by Bangla speakers

Having examined English and Sanskrit loanwords in Bangla, we now proceed to describe the phonological changes that English and Hindi words undergo at the level of the syllable in the second/third language variety spoken by Bangla speakers. The data has been gathered from various sources: Karim (2010), Hoque (2011) and Dutta (p.c.). While examining the data, we need to keep in mind that there may be no consensus among speakers of the language, as a second/third language is usually in a fluid state. Keeping this in mind, we have attempted to show the cline or stages of interlanguage from a basic basilectal stage to a near-native acrolectal stage.

Syllable structure constraints in English as L2/L3

Karim (2010: 27) notes the following regarding the pronunciations of L2 English words by L1 Bangla speakers: “The restrictions on word-initial consonant clusters in native Bengali carry over to the pronunciation of English words by Bengali speakers learning English”. These learners use a strategy of vowel epenthesis to break up initial consonant clusters. Sometimes vowel epenthesis occurs between the two consonants of the consonant clusters. For example:

(17) Epenthesis in between consonant clusters in Bangla-English

English      Bangla-English    Gloss
/frʌnt/      /fərʌnt/          ‘front’
/flæt/       /fəlæt/           ‘flat’
/kri:m/      /kəri:m/          ‘cream’
/gru:p/      /gərup/           ‘group’
/flɔr/       /fəlor/           ‘floor’
(Karim 2010: 28)

In some cases epenthesis occurs before the initial consonant clusters. For example:

(18) Epenthesis before consonant clusters in Bangla-English

English      Bangla-English    Gloss
/speʃl/      /ɪspeʃal/         ‘special’
/speɪn/      /ɪspeɪn/          ‘Spain’
/sku:l/      /ɪsku:l/          ‘school’
/steʃn/      /ɪsteʃon/         ‘station’


The site of epenthesis is determined by sonority principles. When we have two consonants with rising sonority as in (17), the epenthetic vowel is inserted between the two consonants of the cluster. However, when there is a falling sonority as in (18), the epenthesis site is before the consonant cluster. Finally, when a consonant cluster occurs between two vowels (i.e. when its members belong to different syllables), epenthesis does not occur. For example:

(19) No epenthesis
a. astonish: /əstɔnɪʃ/
b. continue: /kʌntɪnju/
c. Monday: /mʌndeɪ/
d. April: /eɪprəl/
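The choice of epenthesis site described above can be sketched as a small procedure: inspect the first two segments of a word, and if both are consonants, insert a vowel between them when sonority rises and before them otherwise. The numeric sonority scale, the function name and the segment lists below are assumptions made for illustration; the epenthetic vowels /ə/ and /ɪ/ are the ones shown in (17) and (18).

```python
# Assumed sonority scale (higher = more sonorous); only the consonants
# needed for the examples are listed. Anything else is treated as a vowel.
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,  # plosives
    "f": 2, "s": 2,                                  # fricatives
    "m": 3, "n": 3,                                  # nasals
    "l": 4, "r": 4,                                  # liquids
}


def repair_initial_cluster(segments):
    """Return a copy of the word with a word-initial cluster broken up."""
    segments = list(segments)
    if len(segments) < 2:
        return segments
    first, second = segments[0], segments[1]
    if first not in SONORITY or second not in SONORITY:
        return segments  # no word-initial consonant cluster: nothing to repair
    if SONORITY[second] > SONORITY[first]:
        # Rising sonority, as in (17): vowel goes between the two consonants.
        return [first, "ə"] + segments[1:]
    # Falling (or level) sonority, as in (18): vowel goes before the cluster.
    return ["ɪ"] + segments


print(repair_initial_cluster(["f", "l", "æ", "t"]))  # 'flat'
print(repair_initial_cluster(["s", "k", "u", "l"]))  # 'school'
```

Because only the word-initial position is inspected, medial clusters such as those in (19) are left untouched, which matches the description given for those forms.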

Karim’s study is restricted to onset clusters as Bangla speakers of English do not seem to have problems with the pronunciation of coda clusters of English in words like lift, pant, tax, bank, lunch, pound, chance, lamp etc. To account for these facts, we posit the following constraint rankings:

(20) Interlanguage stage 1 (Basilectal stage)
DISYLLABIC TROCHEE / NO COMPLEX ONSET / SYLLABLE CONTACT LAW >> NO COMPLEX CODA >> MAX, DEP
(L1 Faithfulness)

(21) Interlanguage stage 2 (Mesolectal stage)
NO COMPLEX ONSET / SYLLABLE CONTACT LAW / DISYLLABIC TROCHEE >> ALIGN R >> MAX / DEP > NO COMPLEX CODA > ALIGN L
(Markedness)

(22) Interlanguage stage 3 (Acrolectal stage)
ALIGN R, MAX, DEP >> SYLLABLE CONTACT LAW >> DISYLLABIC TROCHEE > NO COMPLEX CODA / NO COMPLEX ONSET > ALIGN L
(L2 Faithfulness)
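How a ranking selects an output can be illustrated with a minimal evaluator in the spirit of Optimality Theory: candidates are compared on the highest-ranked constraint first, and lower-ranked constraints matter only to break ties. The sketch below deliberately reduces the rankings to two constraints, NO COMPLEX ONSET and DEP, and to two candidates for ‘front’; the violation counts, the function name and this reduction are assumptions for illustration, and the unranked ties written with “/” in the rankings are not modelled.

```python
def optimal_candidate(candidates, ranking):
    """Pick the candidate whose violation profile is best under a strict ranking.

    candidates: dict mapping an output form to {constraint name: violations}
    ranking:    list of constraint names, highest ranked first
    """
    def profile(form):
        # Tuples compare element by element, which mirrors strict domination.
        return tuple(candidates[form].get(constraint, 0) for constraint in ranking)

    return min(candidates, key=profile)


# Hypothetical candidate set for English /frʌnt/ 'front'.
# Both forms keep the coda cluster, so NO COMPLEX CODA cannot decide between them.
CANDIDATES = {
    "frʌnt": {"NO COMPLEX ONSET": 1},   # faithful, keeps the onset cluster
    "fərʌnt": {"DEP": 1},               # epenthetic vowel breaks the cluster
}

basilectal = ["NO COMPLEX ONSET", "DEP"]   # markedness dominates faithfulness
acrolectal = ["DEP", "NO COMPLEX ONSET"]   # faithfulness dominates markedness

print(optimal_candidate(CANDIDATES, basilectal))
print(optimal_candidate(CANDIDATES, acrolectal))
```

Reversing the order of the two constraints is enough to switch the winner from the epenthesised form to the faithful one, which is the kind of re-ranking the three interlanguage stages are meant to capture.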

Initially, due to the dominance of L1 Faithfulness (as noted in the previous section on English loanwords in Bangla), consonant clusters in the onset position are not tolerated. Moreover, the disyllabic requirement leads to the lengthening of vowels in monosyllabic words like guest, bank, lift, tax, pin, pet¹¹ etc. This has been noted even in the speech of English children acquiring English as their first language. Demuth and Fee (1995) proposed that children demonstrate an early sensitivity to word-minimality effects, exhibiting a period of vowel lengthening or vowel epenthesis if coda consonants cannot be produced. Kehoe et al. (2008) show that coda consonants were accurately produced by English-speaking children in contexts where they could be prosodified as part of a bimoraic foot.

11 Lengthening of the vowel is required in Bangla/Bangla English as Bangla has a highly ranked disyllabic rather than a bimoraic requirement.

In the next stage, onset clusters are tolerated if they do not violate sonority sequencing. Coda clusters, however, become part of the repertoire of Bangla-English syllables. This forces the introduction of two additional constraints, ALIGN R (Right) and ALIGN L (Left), where the former is higher ranked than the latter. ALIGN R and ALIGN L are constraints that show a preference in languages for certain linguistic features to be aligned with other linguistic features either at the right or the left edge. Alignment describes the tendency in languages for certain linguistic features to coincide, such as the location of primary word stress at the beginning of a word or at the end of a word. Ranking ALIGN R above ALIGN L explains the acquisition of coda clusters prior to onset clusters. Finally, the Faithfulness constraints of L2 dominate, with traces of L1 as in lengthening of vowels (for the disyllabic requirement etc.).

Syllable structure constraints in Hindi as L2

Before we analyse the Bangla-Hindi data, it is necessary to know about Hindi phonology. The syllable structure of Hindi is (C)(C)V(C)(C). As such, both word-initial and word-final consonant clusters are permitted. Given below are the possible onset and coda clusters in Hindi:

(23)

Permissible onset and coda clusters in Hindi

ONSET CLUSTERS: ʈr, ɖr, kr, kl, pr, pl, br, bl, tr, dr, gr, gl, ghr, mr, ml, nr, hr, hl, s̪p, s̪ph, s̪t, s̪th, s̪k, s̪kh, s̪n, s̪r, s̪l, s̪m, s̪ʈ, dhr, khr
Three-member consonant clusters occur only word-initially and they are 4 in number, viz. skr, str, spr, sʈr

CODA CLUSTERS: mp, nth, nc, ng, sth, sht, rb, ksh, mb, lp

The following are the patterns found in Bangla-Hindi with regard to onset clusters. Firstly, Bangla-Hindi has onset clusters like str, st, pr, kr, gr. The only problematic ones are the nasal + liquid clusters, namely /mr/ and /nr/. We find the following progression from the basilectal to the acrolectal stage for a Hindi word like /mrɪt̪ju/ (‘death’):

(24) [mɪt̪t̪u] → [mrɪt̪t̪u] → [mrɪt̪tju]

What this indicates is that a Bangla speaker acquiring Hindi as a second language at the initial stage (Interlanguage stage 1) is guided by the phonology of his/her L1, namely Bangla. As Modern Bangla does not permit consonant clusters in the onset position, the initial cluster /mr/ is simplified to just /m/ with deletion of /r/. Another constraint, SYLLABLE CONTACT LAW, forces the change of the medial /t̪j/ sequence to the geminated /t̪t̪/. This suggests the ranking of constraints as follows:

(25) Interlanguage stage 1 (basilect)
NO COMPLEX ONSET / SYLLABLE CONTACT LAW >> MAX / DEP
(L1 Faithfulness)

At the next level (Interlanguage stage 2), with exposure to more of the second language Hindi, the learner realises that he or she has to re-rank the constraints.

(26) Interlanguage stage 2 (mesolect)
SYLLABLE CONTACT LAW > MAX / DEP > NO COMPLEX ONSET
(Markedness > L1/L2 Faithfulness)

Some markedness constraints like SYLLABLE CONTACT LAW persist even at this stage. However, with the mastery of the second language, L2 faithfulness (i.e. faithfulness to Hindi phonology) forces reordering of the constraints again as follows:

(27) Interlanguage stage 3 (acrolect)
MAX > SYLLABLE CONTACT LAW > DEP > NO COMPLEX ONSET
(L2 Faithfulness)

In the word-medial position, there is gemination of the coda consonant if there is violation of SYLLABLE CONTACT LAW. For example, /ʃupprobhha:t̪/ (‘morning prayer’), /ra:t̪:t̪rɪ/ (‘night’), /ʃɪgghro/ (‘soon’), /ɔmmrɪt̪o/ (‘nectar’). The most interesting cases are the consonant clusters in the coda position of the syllable. Dutta (p.c.) notes that there is a difference in pronunciation of the following set of words:

(28)
A. /spaʃt/ (‘clear’), /kaʃt/ (‘suffering’), /gʌrb/ (‘pregnancy’), /vrɪkʃ/ (‘tree’), /st̪amb/ (‘pillar’), /sankalp/ (‘difficulty’)

B. /tʃa:nd̪/ (‘moon’), /d̪o:st̪/ (‘friend’), /su:t̪r/ (gloss unknown), /gu:ndʒ/ (‘echo’), /pra:nt̪/ (‘state’), /sa:mp/ (‘snake’), /ka:mp/ (‘tremble’)

Hemalatha Nagarajan

The words in Set A have the structure (C)CVCC whereas the ones in Set B have the structure CVVCC. In the pronunciation of Set A words, we find an epenthetic final vowel, either /ɔ/ in the basilectal varieties or /ǝ/ in the mesolectal varieties. There is no vowel at the end of the words in Set B. We have already noted the high-ranked DISYLLABIC TROCHEE requirement for Bangla. This is operative in the (C)CVCC syllables too. The question to be asked is: Why doesn’t it enforce epenthesis of a vowel in CVVCC syllables? In CVVCC syllables (as these syllables have long vowels), there is “a catalectic syllable” which renders these minimal words “virtual disyllables” (as mentioned in section 4.1.1). Hence, as these words are already disyllabic, there is no need to add a vowel at the end. The phonotactics of word-final and word-initial consonants in these different languages in the initial stages of acquisition can be summarised as in table (29).

(29) Word-final and word-initial consonants in Modern Bangla, Hindi, English, Bangla-Hindi and Bangla-English

| Consonant clusters | Modern Bangla | Hindi | English | Bangla-Hindi | Bangla-English |
|---|---|---|---|---|---|
| Allows final C | Yes | Yes | Yes | Yes | Yes |
| Allows coda clusters | No | Yes | Yes | Yes if preceded by a long vowel. Otherwise, no. | Yes |
| Allows onset clusters | No | Yes | Yes | Yes | No |

The puzzling aspect is the emergence of onset clusters prior to coda clusters in Bangla-Hindi, whereas we observe the reverse in Bangla-English. This pattern persists in the mesolectal varieties and therefore needs to be explained. One plausible explanation could be the high ranking of the constraint ALIGN-R in English, which is transferred to Bangla-English. In English, scansion of syllables for assignment of stress begins from the right edge (Hayes 1995). Hence, this edge needs to be “docked” first. Kirk and Demuth (2006) note the emergence of coda clusters prior to onset clusters in English-speaking children and in Germanic languages in general. On the other hand, Demuth and Kehoe (2006) observe that word-initial clusters are acquired first by French-speaking children, probably because iambic feet are created from left to right in French. Research needs to be done to check whether onset clusters or coda clusters are acquired first by Hindi-speaking children in order to strengthen this hypothesis.

5 Conclusion

To sum up, an attempt has been made in this paper to show how English words adopted into the Bangla language as loanwords (i.e. as a foreign language) have undergone the same phonological rules (at the level of the syllable) as words from other languages adopted in the same historical period. In this case, there is a convergence. On the other hand, the phonological rules that apply to English as a second language show a marked difference from the ones that apply to Hindi as a second/third language. In this case, there is a divergence in the constraints obeyed. We have illustrated within the theoretical framework of Optimality Theory (OT) how constraints are/were re-ranked across different historical stages (learner-external) and developmental stages (learner-internal). In general, the historical analysis of Bangla loanword structures is presented as a shift from focusing on adhering to the shape a word brings from its donor language (L1 Faithfulness) in early loans towards decreasing the degree of markedness in phonotactic structure in late loans. We observe convergence in loanwords adopted in the same period of time, for example from Sanskrit and English, where there is strict adherence to L1 Faithfulness over L2 Faithfulness constraints. This shows that when foreign words are incorporated into a language (at a point of time), they undergo phonological changes to adhere to the system of the L1 (at the given point of time). On the other hand, examination of the prosodic structure of (L1 Bangla) L2 (Hindi or English) learners’ pronunciations in different interlanguage stages reveals a tension between L1 and L2, where the learner’s grammar exhibits a divergence after a certain point not only between L1 and L2 (as the learner’s proficiency in the second language increases) but also between the two (or more) languages acquired after the first language, as illustrated by the case of English and Hindi in this paper. L2 Faithfulness constraints (i.e. phonological constraints of English and Hindi) take precedence over L1 Faithfulness (phonological constraints of Bangla). Some universal markedness constraints persist.

References

Archangeli, Diana & D. Terence Langendoen (eds.). 1997. Optimality Theory: An overview. Oxford: Blackwell.
Becker, Michael. 2003. Lexical stratification of Hebrew: The disyllabic maximum. In Yehuda Falk (ed.), Proceedings of IATL 19.
Bickerton, Derek. 1975. Dynamics of a creole system. Cambridge: Cambridge University Press.
Bishwaksen, Bandhopadhyay. 2005. Loanwords in Bangla. Hyderabad, India: CIEFL M.A. thesis.
Boersma, Paul & Silke Hamann. 2009. Loanword adaptation as first-language phonological perception. In Andrea Calabrese & W. Leo Wetzels (eds.), Loan phonology, 11–58. Amsterdam & Philadelphia: John Benjamins.
Calabrese, Andrea & W. Leo Wetzels (eds.). 2009. Loan phonology. Amsterdam & Philadelphia: John Benjamins.
Chatterji, Suniti Kumar. 1926. The origin and development of the Bengali language. Calcutta: Calcutta University Press.
Clements, G. Nick. 2001. Representational economy in constraint-based phonology. In T. Alan Hall (ed.), Distinctive feature theory, 71–146. Berlin: Mouton de Gruyter.
Dash, Niladri Sekhar. 2005. Methods in madness of Bengali spelling: A corpus-based empirical investigation. South Asia Language Review 14 (2). 63–92. (isical.academia.edu/nsdash).
Demuth, Katherine & E. Jane Fee. 1995. Minimal words in early phonological development. Ms, Brown University and Dalhousie University.
Demuth, Katherine, Jennifer Culbertson & Jennifer Alter. 2006. Word-minimality, epenthesis and coda licensing in the early acquisition of English. Language and Speech 49 (2). 137–174.
Demuth, Katherine & Margaret Kehoe. 2006. The acquisition of word-final clusters in French. Journal of Catalan Linguistics 5. 59–81.
Friesner, Michael L. 2009. The social and linguistic predictors of the outcomes of borrowing in the speech community of Montréal. University of Pennsylvania PhD dissertation.
Hammarberg, Björn. 2001. Roles of L1 and L2 in L3 production and acquisition. In Jasone Cenoz, Britta Hufeisen & Ulrike Jessner (eds.), Cross-linguistic influence in third language acquisition: Psycholinguistic perspectives, 21–41. Clevedon, UK: Multilingual Matters.
Hammarberg, Björn (ed.). 2009. Processes in third language acquisition. Edinburgh: Edinburgh University Press.
Hancin-Bhatt, Barbara Jean. 1994. Segment transfer: A consequence of a dynamic system. Second Language Research 10. 241–269.
Haugen, Einar. 1950. The analysis of linguistic borrowing. Language 26 (2). 210–231.
Hayes, Bruce. 1995. Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press.
Heffernan, Kevin. 2007. The role of phonemic contrast in the formation of Sino-Japanese. Journal of East Asian Linguistics 16. 61–86.
Herd, Jonathon. 2005. Loanword adaptation and the evaluation of similarity. Toronto Working Papers in Linguistics 24. 65–116.
Hoque, Muhammad Azizul. 2011. Problems of pronunciation for the Chittagonian learners of English: A case study. Journal of Education and Practice 2 (6). 1–17.
Itô, Junko & Armin Mester. 1995. Japanese phonology. In John A. Goldsmith (ed.), The handbook of phonological theory, 817–838. Cambridge, MA & Oxford: Blackwell.
Itô, Junko & Armin Mester. 1999. The phonological lexicon. In Natsuko Tsujimura (ed.), The handbook of Japanese linguistics, 62–100. Malden, MA & Oxford: Blackwell.
Itô, Junko & Armin Mester. 2001. Covert generalizations in Optimality Theory: The role of stratal faithfulness constraints. Studies in Phonetics, Phonology, and Morphology 7. 273–299.
Itô, Junko, Yoshihisa Kitagawa & Armin Mester. 1996. Prosodic faithfulness and correspondence: Evidence from a Japanese argot. Journal of East Asian Linguistics 5. 217–294.
Kager, René. 1995. Consequences of catalexis. In Harry van der Hulst & Jeroen van de Weijer (eds.), Leiden in Last: HIL Phonology Papers I, 269–298. The Hague: Holland Academic Graphics.
Kager, René. 1999. Optimality Theory. Cambridge: Cambridge University Press.
Kang, Yoonjung. 2010. Loanword phonology. (individual.utoronto.ca/yjkang/files/TBC_100.kang.pdf).
Kar, Somdev. 2009a. The syllable structure of Bangla in Optimality Theory and its application to the analysis of verbal inflectional paradigms in distributed morphology. Tübingen: University of Tübingen PhD dissertation (TOBIASlib).
Kar, Somdev. 2009b. Gemination in Bangla: An Optimality Theoretic analysis. The Dhaka University Journal of Linguistics 1 (2). 87–114.
Karim, Khaled. 2010. Vowel epenthesis in Bengali: An Optimality Theory analysis. Working Papers of the Linguistics Circle of the University of Victoria 20 (1). 26–36.
Kehoe, Margaret, Geraldine Hilaire-Debove, Katherine Demuth & Conxita Lleó. 2008. The structure of branching onsets and rising diphthongs: Evidence from the acquisition of French and Spanish. Language Acquisition 15 (1). 5–57.
Kiparsky, Paul. 1991. Catalexis. Unpublished manuscript. Stanford University.
Kirk, Cecilia & Katherine Demuth. 2003. Onset/coda asymmetries in the acquisition of clusters. In Barbara Beachley, Amanda Brown & Frances Conlin (eds.), Proceedings of BUCLD 27, 437–448. Somerville, MA: Cascadilla Press.
Kirk, Cecilia & Katherine Demuth. 2006. Accounting for variability in 2-year-olds’ production of coda consonants. Language Learning and Development 2. 97–118.
Krashen, Stephen. 1981. Second language acquisition and second language learning. Oxford: Pergamon Press.
Major, Roy. 2001. Foreign accent: The ontogeny and phylogeny of second language phonology. Mahwah: Lawrence Erlbaum.
McCarthy, John J. & Alan Prince. 1995. Prosodic morphology. In John A. Goldsmith (ed.), The handbook of phonological theory, 318–366. Cambridge, MA & Oxford: Blackwell.
McCarthy, John J. 2001. A thematic guide to optimality theory. Cambridge: Cambridge University Press.
Murray, Robert & Theo Vennemann. 1983. Sound change and syllable structure in Germanic phonology. Language 59. 514–528.
Musa, Monsur. 1995. Bànglàdesher ràstrabhaßà [The state language of Bangladesh]. Dhaka: Bangla Academy.
Paradis, Carole & Darlene LaCharité. 1997. Preservation and minimality in loanword adaptation. Journal of Linguistics 33 (2). 379–430.
Paradis, Carole & Darlene LaCharité. 2008. Apparent phonetic approximation: English loanwords in Old Quebec French 1. Journal of Linguistics 44 (1). 87–128.
Paradis, Carole & Darlene LaCharité. 2009. English loanwords in Old Quebec French: Fewer bilinguals does not mean a great increase in naive phonetic approximation. Langues et Linguistique 32. 82–117.
Peperkamp, Sharon, Michele Pettinato & Emmanuel Dupoux. 2003. Reinterpreting loanword adaptations: The role of perception. In Barbara Beachley, Amanda Brown & Frances Conlin (eds.), Proceedings of the 27th Annual Boston University Conference on Language Development, 650–661. Somerville, MA: Cascadilla Press.
Peperkamp, Sharon, Inga Vendelin & Kimihiro Nakamura. 2008. On the perceptual origin of loanword adaptations: Experimental evidence from Japanese. Phonology 25 (1). 129–164.
Prince, Alan & Paul Smolensky. 1993. Optimality Theory: Constraint interaction in generative grammar. Rutgers Center for Cognitive Science Technical Report TR-2.
Silverman, Daniel. 1992. Multiple scansions in loanword phonology: Evidence from Cantonese. Phonology 9 (2). 289–328.
Steriade, Donca. 2001. The phonology of perceptibility effects: The P-map and its consequences for constraint organization. (Microsoft Word format).
Ussishkin, Adam. 2000. The emergence of fixed prosody. UC Santa Cruz PhD dissertation.
Vennemann, Theo. 1988. Preference laws for syllable structure. Berlin, New York & Amsterdam: Mouton de Gruyter.
Vijayakrishnan, K. G. 2007. The Disyllabic Trochee in Bangla, Punjabi and Tamil: Variations on a theme. In Josef Bayer, Tanmoy Bhattacharya & M.T. Hany Babu (eds.), Linguistic theory and South Asian languages, 237–247. Amsterdam: John Benjamins.

Taiwo Olayemi Soneye & Kehinde A. Ayoola, Obafemi Awolowo University, Ile-Ife

7 Onset consonant cluster realisation in Nigerian English: The emergence of an endogenous variety?

1 Introduction

Many scholars have explored the “domestication”, “indigenisation”, “acculturation” and “nativisation” of the English language in Nigeria (e.g. Banjo 1971; Jibril 1979, 1982; Bamgbose 1982; Jowitt 1991; Kujore 1995; Gut 2004, 2005; Alo and Mesthrie 2008; Ayoola 2007; Akere 2009; Ugorji 2010; Fuchs, Gut, and Soneye 2013). Furthermore, issues relating to the choice of an appropriate model variety for schools and other pedagogical concerns, the relationship of English with the hundreds of indigenous languages spoken in Nigeria and the need for the codification of Nigerian English have been discussed for more than 40 years (e.g. Atoye 1987; Amayo 1988; Dairo 1988; Dadzie and Awonusi 2004; Ekong 2007). However, an endonormative variety, a form of Nigerian English that is accepted as a standard within the country, has not yet emerged. One reason for this might be the fact that several varieties of English exist side by side in Nigeria. Since the pioneering work of Brosnahan (1958), there have been numerous efforts to describe the variation of English in Nigeria using different parameters such as geographical (i.e. Southern, Northern varieties), educational (i.e. non-education, primary, secondary etc.) and ethnic (i.e. Igbo English, Hausa English, Edo English, Yoruba English etc.). Banjo (1971, 1993) for instance describes Nigerian English varieties using educational parameters: Variety I is spoken by Nigerians with primary school education and is considered uneducated English. Variety II has social acceptability but with a low degree of intelligibility and some L1 transfer. Variety III, spoken by university graduates, is socially accepted and has national and international intelligibility, observes phonemic distinctions of British English and is close to Standard British English in syntax and semantics.
Udofot (2005) describes this typology as no longer being a faithful representation of Nigeria’s linguistic space in the 21st century and, in her investigation of stress and rhythm in spoken Nigerian English, identifies a non-standard variety, spoken by primary school leavers, the standard variety spoken by university graduates, and the sophisticated variety, which is
spoken by trained linguists, professional speakers and phoneticians (Udofot 1997, 2003). There is, however, no clear correlation between educational level and phonology: In some cases, the pronunciation of a primary school leaver sounds more British in the use of some consonantal segments such as the dental fricatives and in stress placement than that of another Nigerian who has acquired university education. The present study focuses on the standard variety identified by Udofot and seeks to test whether it is indeed a homogenous variety in terms of onset cluster realisation. Jibril (1979, 1982, 1986) furthermore proposes distinct phonological features of English in the northern and the southern parts of Nigeria, namely Hausa English (basic and sophisticated) and Southern English (basic and sophisticated), and in addition identifies a southern-influenced Hausa English. Jibril’s (1982, 1986) sociolinguistic classification of “Sophisticated Hausa English” and “Sophisticated Southern English”, however, implies some homogeneity of spoken English in the two regions that might not exist. Contemporary experience clearly no longer supports this generalisation, and the broad categorisation of Nigerian English as southern and northern varieties is beginning to pose some challenges. Before 1963, there were three geographical regions in Nigeria: the Southern, Eastern and Northern; but that year the Midwestern region was added and it was later split into Edo and Delta states. Likewise, Cross River and Rivers states were carved out of the Eastern region. This distribution and redistribution of Nigeria has had linguistic implications. For instance, Kwara state used to be within “the Yoruba speaking enclave” and in this state “loyalty to Hausa was for some time almost as strong as to Yoruba” (Banjo 1996: 27), and to date both Hausa and Yoruba are spoken there (Adegbija 2004). Also, to date there has not been any policy statement as to where the Federal Capital Territory, Abuja, belongs.
Table 1 lists the languages spoken in each of the states of contemporary Nigeria. It shows that the broad categorisation of Nigerian English as southern and northern varieties is highly problematic. There are Nigerian states (such as the Federal Capital, Abuja) where the three major languages are widely spoken and the government itself is silent on whether such states belong to the North or South. One of the aims of this study is thus to explore whether classifications on geographical grounds such as into southern and northern Nigerian English are still linguistically valid.

Table 1: The complexity of the language situation in contemporary Nigeria, adapted from Adegbija (2004: Ch. 3)
State
Dominant language(s)
No of languages
Major language(s)
Language(s) spoken by > mill. speakers
Abia
Igbo
Igbo
Akwa Ibom Adamawa Taraba (Former Gongola) Bauchi Benue Bornu Yobe
Annang, Ibibio, Oron Abon, Awak, Bachama, Bandawa Fulfulde (Fula), Chamba, Jukun, Hausa, Kuteb Fulfulde (Fula) Idoma Bade, Balewa, Badawai, Baduna Buduna, Fulfulde (Fula), Kanuri, Shuwa-arabic Effik, Annang, Bokyi, Bekwara, Ibibio Afemai (Yekhee), Ebira Edo (Bini), Itsekiri, Igbo, Ijo (Izon), Urhobo Igbo Igbo Igbo (18 dialects) Hausa, Fulfulde Hausa, Kaje Hausa Hausa, Fulfulde Arabic, Bare, Badakare, Banga Fulfulde, Hausa Baruba, Agwara-Kamberi Ebira, Nupe, Yoruba, Igala Yoruba Gwari, Hausa, Nupe, Fulfulde Yoruba Ijo, Yoruba Yoruba Yoruba Hausa, Jarawa, Jukun, Tiv Ikwere, Izon (Ijo), Kalabari, Kana Gwari, Hausa, Nupe
1 (14) dialects 3 4 119
– – Hausa
Ibibio – Tiv, Fulfulde
73 10 4 39
Hausa Hausa – Hausa
Nupe Tiv – Fulfulde
71
Tiv, Effik, Ibibio
2 31
– Yoruba
– (Pidgin)
1 1 1 2 53 4 2 4 17 2 18 4 23 2 (Egun) 2 (8 dialects) 1 (7 dialects) 1 99 33 10
Igbo Igbo Igbo Hausa Hausa Hausa Hausa – Hausa – Yoruba, Hausa Yoruba Hausa Yoruba Yoruba Yoruba Yoruba Hausa – Hausa, Igbo, Yoruba
– – – Fulfulde – Fulfulde Fulfulde – Fulfulde – Nupe – Fulfulde, Nupe – – – – Tiv Ijo Nupe
Cross River Delta Edo (Bendel) Enugu Anambra Imo Jigawa Kaduna Kano Katsina Kebbi Sokoto Kogi Kwara Lagos Niger Ogun Ondo Osun Oyo Plateau Rivers Abuja (FCT)

A more recent challenge is describing and conceptualising Nigerian English as “English in Nigeria”, as “English as a second language (ESL)” with features of learner errors, or as “Nigerian English”, a “New English variety”, a term that was coined in the 1980s. Jowitt (1991) states that “New English” is not the same as “English as a second language”. According to Jowitt, “‘New English’ refers [. . .] to an established language variety and so to usage, whereas in ‘English as a second language’ the emphasis is usually on language learning and the target of learning” (1991: 3). So the question to be answered is whether Nigerian English in a second language environment is still “striving” to match the norm of the British English bequeathed to Nigeria during the colonial era. This question definitely requires answers via empirical analysis and there could be no better time to do this than now, when there is a growing interest in the comparative study of World Englishes (Schneider et al. 2004). The aim of this study is therefore to investigate whether, following Jowitt’s (1991) claim, there are established usages in terms of onset consonant clusters that educated Nigerian speakers who are themselves university teachers are not aspiring to “correct”. The issue of classifying English in Nigeria as learner English or a new variety of English is further complicated by the fact that there is a growing number of children in Nigeria who have English as their L1. Banjo describes a new generation of Nigerians, living in Nigeria from birth, “who are already English-speaking on their first day at school (. . .) In many cases, they are even monolingual, and for the rest, bilingual with English dominance” (1996: 43–44). The explanation for this phenomenon according to Banjo (1996) is that English is the language of interaction, play and audio-visual education during preschool, nursery and primary school stages of their development. Many Nigerian cities, exemplified by the Lagos metropolis, the oil city of Port Harcourt, the university city of Zaria, the Federal Capital Territory of Abuja and several state capitals, are inhabited by families that hail from different parts of the federation of 36 states and at least 250 ethnic nationalities.
Also, there is an increase in the incidence of marriage between people from different ethnic nationalities; hence the language of communication in such homes is often English and this becomes the first language of their children (Banjo 1996: 44). Many educated Nigerians, irrespective of whether they have different native languages or not, communicate mainly in English with their children. Moreover, city children from the lower-middle class and the middle class are often sent by their parents to nursery-primary schools where English is used almost exclusively for both instruction and all other communicative activities in the school. The use of the language of the immediate environment is almost taboo in many such schools (certainly in Lagos). Children from lower-middle class and middle class homes often have access to satellite television, computer games, play stations, cartoons, videos, storybooks and other educational materials, most of which are usually produced in English. The consequence of this is that the children (and wards) of educated elites, more often than not, use English naturally amongst themselves when they are at play. However, what is germane to this study is whether the English that these Nigerians speak is norm-oriented towards British English or is an L1 English variety that is oriented towards Nigerian English.

The purpose of this study is therefore to explore the issues of regional (and thus L1-influenced) variation in Nigerian English phonology and of possible differences between Nigerian L1 and L2 speakers of English. By focussing on onset cluster realisation it addresses the following questions:
(a) Is educated spoken Nigerian English homogenous?
(b) Are there tangible differences in the spoken English of educated Nigerians whose first language is English and those who have English as their second language?
(c) Are there phonological features in Nigerian English that could be regarded as “nativised” rather than “learner errors”?
(d) Is Nigerian Spoken English already an identifiable variety or is it a variety tilting towards the “bequeathed” British English variety?
The paper is structured as follows: after a brief description of the syllable structures of English and the major Nigerian languages in section 2, the methodology of the present study will be presented in section 3. Subsequently, the results are presented and discussed in sections 4 and 5.

2 Consonant clusters in English and in Nigerian languages

British English has a complex syllable structure that can be described as (C0–C3) V (C0–C4) (Yavas 2011). This means that in British English between zero and three consonants can occur in the onset position of a syllable and between zero and four consonants in the coda position after the nucleus. Consequently, in English a wide range of syllable types are possible: 14 different types of syllables have been attested, ranging from V (nucleus only as in oh) to CCCVCCCC (as in strengths /stɹeŋkθs/). Phonotactic rules, which are based on the sonority of sounds, govern the order of consonant sequences (Ohala 1986). In a syllable, sounds with the highest sonority occur in the nucleus position. Consonants in the onset position increase in sonority towards the nucleus, whereas consonants in the coda position are ordered in terms of decreasing sonority. Nigerian languages, by contrast, have a prevalence of CV syllables. Nigeria’s three major indigenous languages, Igbo, Yoruba and Hausa, for example, do not permit onset clusters. Yoruba and Igbo manifest only three types of syllable structures, which are CV, V and N. Both allow a maximum of two elements in a syllable. These elements are one consonant and one vowel (CV). Yoruba and Igbo also allow syllables without onsets and both allow syllables consisting of a single syllabic nasal (N). However, as Soneye (2009: 84–85) observed, young educated Yoruba speakers of English now exhibit a double consonant cluster rendition in Yoruba words such as “kraakita” for kirakita (strenuous labouring) and “graagraa” for giragira (senseless moves). With regard to consonant clusters in Nigerian English, Jowitt (1991), Simo Bobda (2003, 2007) and Gut (2007) found consonant reduction in syllable codas as well as insertion of epenthetic vowels. Simo Bobda describes consonant cluster reduction as common in the coda position in words such as uncle and devil (2003: 30), while Gut (2007) found that deletion occurs more often in three-consonant coda clusters (e.g. rinsed [rinzd]) than in two-consonant coda clusters (e.g. cold [kəʊld]). Similarly, Huber (2004: 861) states that “cluster reduction at the coda position is a phenomenon that Ghanaian English shares with other West African Englishes” such as Nigerian English, and describes this phenomenon as relatively prevalent in the English of the less educated in Ghana. The production of consonant clusters in onset position in Nigerian English has not yet been investigated systematically and will therefore be the object of this study.
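The syllable templates described in this section can be made concrete with a small sketch that maps a syllable to a C/V skeleton and checks it against a template. The vowel inventory, the function names and the regular-expression encoding of the templates are assumptions made for illustration; only the templates themselves, (C0–C3) V (C0–C4) for British English and CV, V and N for Yoruba and Igbo, come from the text.

```python
import re

# Hypothetical, deliberately small vowel inventory for illustration only.
VOWELS = set("aeiouɪʊɛɔəɒæʌ")


def cv_skeleton(segments):
    """Map a list of phonemic segments to a C/V skeleton string.

    A segment counts as V if its first symbol is in VOWELS, so a diphthong
    written as one segment (e.g. "əʊ") yields a single V.
    """
    return "".join("V" if seg[0] in VOWELS else "C" for seg in segments)


# Templates from the text; their encoding as regular expressions is assumed.
ENGLISH = re.compile(r"C{0,3}VC{0,4}")   # (C0-C3) V (C0-C4)
YORUBA_IGBO = re.compile(r"C?V|N")       # CV, V, or a syllabic nasal N


def fits(template, skeleton):
    """True if the whole skeleton matches the syllable template."""
    return template.fullmatch(skeleton) is not None


strengths = ["s", "t", "ɹ", "e", "ŋ", "k", "θ", "s"]
skeleton = cv_skeleton(strengths)
print(skeleton, fits(ENGLISH, skeleton), fits(YORUBA_IGBO, skeleton))
```

For strengths the skeleton is CCCVCCCC, which the English template accepts and the Yoruba/Igbo template rejects. The sketch checks only the number of consonants on each side of the nucleus and does not model sonority sequencing.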

3 Methodology

Thirty Nigerians, aged between 10 and 70, participated in this study. They are of mixed ethnic and linguistic backgrounds, especially Igbo, Hausa and Yoruba, and comprise nine females and twenty-one males who (with the exception of the 10-year-old child) have all acquired tertiary education (see Appendix 1). Twelve of the respondents are from the Northern part of the country; out of the twelve, five are from the North Central, namely Zamfara, Kano, Sokoto and Kaduna states, with Hausa as their native language, and the remaining seven are from the North Peripheral, namely Benue, Plateau, Kebbi and Kogi states, with their native languages being Tiv, Ngas, Fakkansi, Klela, Igala and Nupe. However, in this study North Central and North Peripheral are often collapsed into the northern group because of the small number of speakers for some individual languages. The remaining eighteen respondents are from the Southern part of Nigeria, twelve from the South-West, namely Osun, Ondo, Ogun, Ekiti and Kwara states. Kwara here is categorised as Southern instead of Northern because the three respondents from Kwara state have Yoruba as their native language, although it is categorised politically and geographically as a northern state. Six participants are from the South-East, four of them from Anambra, Abia and Enugu states, with Igbo as their native language. Two are from the old Midwestern region, namely Edo and Delta states, and their native languages are Urhobo and Afemai respectively. Four subjects from the south-west group, whose ages fall between 10 and 26 and who were raised in Nigeria for most of their lives, have English as their L1. Among the L1 speakers of English, one was born in England and stayed there until he was five years old. The second L1 speaker was born in Osun state, Nigeria, but left for Sussex when she was about 5 years old and stayed there for about 10 years. The last two were born in Lagos and Osun, Nigeria, and had not travelled outside the country in the last 25 years. We consider it useful not to lump these Nigerian L1 speakers together with the south-west group, although the four of them are all from the South-West.

Three specific methods were employed to elicit information from the participants (see Appendix 2). First, a questionnaire with two sections was administered to them. Section A elicited information on the respondents’ career, native language, spouse’s language, state of origin, length of stay in places across the country, most used language at home, order of proficiency in languages used, standard dialect of native language, etc. Section B included a read-aloud text and a retelling of a purposively composed story that contains four words with double onset clusters (smaller, cranny, pride, phlegm), to which a fifth word (problem) from the retelling task was added. Furthermore, the text included nine words with triple onset consonant clusters: four with /str-/ (stranded, strive, strolling, stroke), three with /skr-/ (scream, scroll, scrape) and two with /spr-/ (spread, sprays). After reading the text aloud, the participant retold the event in the passage for collaborative phonological validation. The third task consisted of a structured interview where respondents could engage in spontaneous reaction and comments. The spontaneous reactions were meant to further validate their pronunciations. The analysis was conducted by both researchers using an auditory assessment of the computer-recorded speech.

4 Results The results show systematic differences in the production of double consonant onset clusters and triple consonant clusters (see Table 2). While only 56.6% (153/270) of all triple consonant onset clusters are produced faithfully, 99.3% (149/150) of the double consonant analysed onset clusters are produced faithfully. Table 2 furthermore shows that it is only the /str-/ onset cluster that is rarely produced faithfully by most participants of this study (2.8%), while the /spr-/ cluster is produced faithfully in 98.3% of all cases. The /skr-/ onset cluster is

124

Taiwo Olayemi Soneye & Kehinde A. Ayoola

Table 2: Retention of triple and double consonant clusters among the southern and northern sub-groups and the L1 speakers South West

South East

North Central

North Peripheral

L1 speakers

/str-/ /spr-/ /skr-/ n

0/32 16/16 24/24 72

0/24 12/12 18/18 54

0/20 9/10 15/15 45

0/28 14/14 21/21 63

4/16 8/8 12/12 36

/sm-/ /kr-/ /pr-/ /fl-/ n

8/8 8/8 15/16 8/8 40

6/6 6/6 12/12 6/6 30

5/5 5/5 10/10 5/5 25

7/7 7/7 14/14 7/7 35

4/4 4/4 8/8 4/4 20

always produced faithfully. Faithful production of /str-/ varies significantly across the five speaker groups (χ² (df = 4) = 21.5; p < 0.001), with only one L1 speaker (the 10-year-old child who had spent his first five years in Britain) producing these clusters faithfully. An analysis of the onset consonant cluster reduction processes employed by the speakers shows that two processes are common in Nigerian English: consonant elision and vowel insertion. In the /str-/ clusters, all Nigerian speakers, irrespective of their regional and linguistic background, delete the voiceless alveolar plosive /t/. While the south-west and south-east Nigerian English speakers tend only to elide the plosive /t/ and retain the syllabic structure of the word, the northern Nigerian English speakers, in addition to eliding, tend to insert an epenthetic vowel, so that most of their words become longer, i.e. disyllabic words become trisyllabic, as in strolling [sɪrolɪn]. Figure 1 shows that vowel insertion is especially common in the speech of Nigerians from the North Central region. Figure 2 illustrates that there are some lexical effects. In the word stranded, speakers from all over Nigeria except the L1 speakers produce vowel insertions, while the /str-/ cluster in words like stroke, strive and strolling never shows insertion by any Southern Nigerian speakers. This might be due to the fact that stranded appears as the first triple-onset cluster word in the text. The triple onset cluster /spr-/ as in spray is commonly produced faithfully in the renditions of North Peripheral, South-East and South-West speakers and of Nigerian speakers of English as L1. Only in 20% of the instances did speakers from the North Central region insert an epenthetic vowel, pronouncing spread as [spɪred]. Words with the triple onset cluster /skr-/, such as scroll and scrape, were articulated fully by speakers in all the subgroups. The only significant difference is that the diphthongs in them were realised as monophthongs by all, including 75% of Nigerian speakers of English as L1.

Figure 1: Percentage of vowel insertion in triple onset clusters across the five speaker groups.

Figure 2: Percentage of triple onset cluster reduction by insertion in individual words across the five speaker groups.

Also worth mentioning in this study is the issue of accent diffusion. One of the respondents, NC1 (see Appendix 1), who is from Zamfara (North Central), was born in Gusau and lived there for 20 years, but is married to a Southerner from Ogun State and speaks English as the second home language, exhibits more features of southern than of North Central Nigerian English. We did not examine this further, but the issue of accent diffusion is worth examining at a future date.
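The chi-square statistic reported above for faithful /str-/ production (21.5 with 4 degrees of freedom) can be converted into a p-value by hand. The following is a minimal sketch rather than the authors' own procedure: it relies on the closed-form upper-tail probability of the chi-square distribution for even degrees of freedom, and the function name `chi2_sf_even_df` is an illustrative assumption.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    m = df // 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))

# Statistic and degrees of freedom as reported in the text for /str-/.
p_value = chi2_sf_even_df(21.5, 4)
print(f"p = {p_value:.5f}")
```

The resulting probability lies well below 0.001, which is why a value that rounds to 0.000 at three decimals is better reported as p < 0.001.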


Taiwo Olayemi Soneye & Kehinde A. Ayoola

Figure 3: Percentage of inserted vowels in double consonant onset clusters across the five speaker groups.

Figure 3 presents the total percentage of respondents' reductions of double onset clusters via insertion of epenthetic vowels. It shows that reduction in words with double onset clusters is not as frequent as in words with triple onset clusters. For instance, in the North Central group the rate of reduction by insertion is as high as 80% for triple onset clusters, whereas for double onset clusters the highest rate is 40%. There was only one case of double onset cluster reduction via elision in all the subdivisions. This occurs in the pronunciation of the word problem (from the retelling task), in which the /r/ is elided so that we have [pʊoblem] instead of /prɒbləm/. The cases of cluster reduction by insertion include [sɪmʊla] for smaller and [kɪrani] for cranny. There is also the substitution of /p/ for /f/ in phlegm [plem], especially in the North Central variety, but this aspect is already established in the literature (Soneye 2008). We also have insertion of /ɪ/ in the same word, i.e. [pɪgem], by 8% of the South-West subjects, which is likely to be an instance of spelling pronunciation. Indeed, one of the L1 speakers of English pronounced [pɪgem] instead of /flem/. Two interesting words, which lie outside the scope of the current investigation but were heard in the course of the interviews with 20% of those from the North Central region, are government and hundred. These speakers pronounced [gɔment] or [gɔmen] for government and [hɔndɪred] for hundred, while those from the South pronounced these words [gɔvment] and [hɔndred], retaining the double cluster in the middle of the two words. Table 3 presents the most common rendition of the 14 target words in the four geographical speaker groups. It shows that educated spoken Nigerian English is not homogeneous across the country as far as onset cluster production is concerned. In particular, speakers from the North Central region show patterns of insertion that are not shared by speakers from other parts of Nigeria.
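The two reduction processes described above, consonant elision and vowel insertion, can be told apart mechanically by comparing the target onset with the beginning of the produced form. The sketch below is illustrative only and is not the authors' coding procedure: the function name `classify_onset`, the small vowel set and the simplified transcriptions are assumptions, and consonant substitution (as in [plem] for phlegm) is deliberately not modelled.

```python
VOWELS = set("aeiouɪʊɔəæɒ")  # assumed, simplified vowel inventory

def classify_onset(target_onset: str, produced: str) -> dict:
    """Compare a target onset cluster with the start of a produced form."""
    elided, inserted = [], []
    i = 0  # position of the next target consonant still to be matched
    for pos, seg in enumerate(produced):
        remaining = target_onset[i:]
        if seg in VOWELS:
            nxt = produced[pos + 1] if pos + 1 < len(produced) else ""
            # A vowel counts as epenthetic only if a target consonant follows it.
            if nxt and nxt in remaining:
                inserted.append(seg)
                continue
            break  # the syllable nucleus has been reached
        if seg not in remaining:
            break  # a consonant that does not belong to the target onset
        j = remaining.index(seg)
        elided.extend(remaining[:j])  # target consonants that were skipped
        i += j + 1
    elided.extend(target_onset[i:])  # target consonants never produced
    return {"elided": elided, "inserted": inserted}

# Forms taken from the renditions discussed in the text.
for onset, form in [("str", "sranded"), ("str", "sɪranded"),
                    ("kr", "kɪrani"), ("skr", "skrim")]:
    print(form, classify_onset(onset, form))
```

Returning both lists keeps a form such as [sɪranded] from being forced into a single category, since it shows elision of /t/ and insertion of [ɪ] at the same time.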


Table 3: Commonest pronunciations of triple and double onset consonant clusters among the southern and northern subgroups (excluding the L1 speakers)

| Word | British | South-West, South-East and North Peripheral | North Central | Elision/insertion |
|---|---|---|---|---|
| stranded | /strændɪd/ | sranded | sranded / sɪranded | t |
| scream | /skrim/ | skrim | skrim | – |
| strolling | /strəʊlɪŋ/ | srolin | sɪrolin | t/ɪ |
| strive | /straɪv/ | sraɪv | sɪraɪv | t/ɪ |
| sprays | /spreɪz/ | sprez | sprez | – |
| spread | /spred/ | spred | spred / spɪred | – / (ɪ) |
| scroll | /skrəʊl/ | skrol | skrol | – |
| stroke | /strəʊk/ | srok | srok | t |
| scrape | /skreɪp/ | skrep | skrep | – |
| cranny | /kræni/ | krani | kɪrani | –/ɪ |
| pride | /praɪd/ | praɪd | praɪd | – |
| problem | /prɒbləm/ | problem; pʊroblem | pʊroblem / pʊoblem | /r/, /ʊ/ |
| phlegm | /flem/ | flem / pɪgem; flem | plem | /p/ substitution |
| smaller | /smɔlə/ | smɔla | smɔla / sʊmɔla | –/ʊ |

5 Discussion

This study has examined onset consonant cluster production in the Nigerian variety of English and shown that triple onset consonant clusters are reduced more often than double onset clusters. Our findings thus parallel those of Gut (2007), who observed that deletion occurs more often in three-consonant coda clusters than in two-consonant coda clusters in Nigerian English. The two reduction processes observed are consonant elision and vowel insertion. Consonant elision is most frequent in triple onset clusters with /str-/ in all subgroups, and vowel insertion is most frequent in words with the triple onset cluster /str-/ and the double onset cluster /pr-/ produced by Northerners. Some significant differences were found between speakers from the northern and those from the southern part of Nigeria. While both groups show elision in onset clusters, it is characteristic of northern Nigerian English speakers to reduce onset clusters by insertion more often than their educated southern counterparts. The first research question, i.e. whether educated spoken Nigerian English is homogeneous across the country, has to be answered in the negative.


The second research question was concerned with possible differences between Nigerians who speak English as an L2 and those who speak it as an L1. The linguistic reality in Nigeria is such that a yet-to-be-determined percentage of Nigerian children and youths acquire English as their L1 rather than as a second language. For this set of Nigerians, the “Mother Tongue” or “Language of Immediate Environment”, such as Yoruba, Igbo or Hausa, is beginning to assume second-language status. Results from the questionnaire reveal that the families of 30.3% (9) of the respondents speak English as their major language at home; all of them are from the South: 22% from the South-West (Yoruba) and 8% from the South-East (Igbo). Nine of the 30 subjects claim that English is the language in which they have acquired the highest proficiency, and six of these nine are between the ages of 10 and 26. The implication is that the English-only practice is an emerging phenomenon common among southern Nigerians, especially the Yoruba people, perhaps because formal education developed first in the South-West and gained ground more rapidly there than in northern Nigeria. Although this category of L1 English users forms a small percentage of the overall population of Nigerian speakers of English, the results from the four Nigerians in this study who have English as their L1 are quite significant and suggest that the English-only phenomenon is emerging. The study has shown systematic differences between the spoken English of educated Nigerians whose first language is English and that of those who have English as their second language. For instance, elision is not as frequent in the emerging L1 variety as it is in the variety of Nigerian speakers of English as a second language. The only participant in this study who did not reduce the /str-/ onset cluster in stranded is an L1 speaker. It thus appears that the variety these L1 speakers speak has features which are British-norm dependent.

We assert, however, that those who have acquired English in Nigeria as an L1 speak the same variety as those who have acquired it as an L2. In other words, the L1 speakers of English in Nigeria are also speakers of Nigerian English, perhaps with fewer features of mother tongue interference. However, there is a dire need for more research in this direction. All the processes of onset consonant cluster production in Nigerian English that the current study identified attest to the fact that Nigerian English is not British norm-oriented. However, it does not seem right either to call these features learner errors, since speakers are not going to change their pronunciation. It appears that, as Schneider (2007) describes in his Dynamic Model, educated Nigerian English is moving from Phase 3, nativisation, towards Phase 4, with endonormativity and independence even in linguistic orientation. These realities may have resulted from our articulatory patterns, which are in close affiliation with our L1s (this holds even for Nigerian speakers of English as L1), coupled with our socio-political backgrounds. As Schneider observes:


Phonetic transfer from a first to a second language is easy and happens commonly, due to the psycholinguistic fact for most people the micro-muscular movements of their speech organs and thus their native phonetics, are fixed at a relatively early age. . .and these established pronunciation habits are very difficult to modify at a later age. Therefore an accent which could originally be accounted for as phonetic transfer by individuals became a permanent marker of the community. Examples include Nigerian English (. . .). (2000: 208)

Our speech motor patterns, our “native phonetics” and our “pronunciation habits” differ from ethnicity to ethnicity and from region to region in Nigeria, and all of these differences affect and condition the phonotactics of the varieties of educated Nigerian English we speak, as demonstrated by the onset consonant cluster renditions. One could add as a rider that some sound sequences are phonetically less complex than others for speakers of English as a first or second language in Nigeria alike. It is also clear from this research that there are phonological features that could and should be categorised as nativised rather than as “learner errors”. L1 influence is discernible, and feature diffusion between Southern and Northern Nigerian English varieties is also likely. The study reveals the inadequacy of the age-long geo-tribal approaches (Akinjobi 2006) and of the methodological rhetoric of compartmentalisation in researching the acquisition of English phonology in Nigeria, as some phonological diffusion is beginning to emerge in all the varieties. This might not be unconnected to the high degree of migration between the northern and southern parts of Nigeria within the last one or two decades. On the whole, the study suggests that multidimensional conceptual frameworks are needed to unravel more features and to address the current lacuna in research on the acquisition of English phonology, especially in multilingual environments such as Nigeria.

6 References

Adegbija, Efurosibina E. 2004. Multilingualism: A Nigerian case study. Asmara: Africa World Press.
Akere, Funso. 2009. The English language in Nigeria: The sociolinguistic dynamics of decolonization and globalization. Refereed Proceedings of the 23rd Annual Conference of the Nigeria English Studies Association. 2–16.
Akinjobi, Adenike. 2006. Vowel reduction and suffixation in Nigeria. English Today 85. 22 (1). 10–17.
Alo, M. A. and Rajend Mesthrie. 2008. Nigerian English: Syntax and morphology. In Rajend Mesthrie (ed.), Varieties of English. Volume 4: Africa, South and Southeast Asia, 323–339. Berlin: Mouton de Gruyter.
Amayo, Airen. 1988. Teaching English pronunciation in Nigeria: What model? The case of RP as the model for teaching English pronunciation: A rejoinder. Ife Studies in English Language 2 (1).
Atoye, Raphael. 1987. The case of RP as the appropriate model for teaching English pronunciation. Ife Studies in English Language (ISEL) 1 (1). 63–69.
Ayoola, Kehinde A. 2007. The triumph of non-standard English in Nigeria. Papers in English and Linguistics (PEL) (Obafemi Awolowo University, Ile-Ife), Vol. 7 & 8. 117–126.
Bamgbose, Ayo. 1982. Standard Nigerian English: Issues of identification. In Braj B. Kachru (ed.), The other tongue: English across cultures, 2nd edn., 148–161. New York: Pergamon Press.
Banjo, Ayo. 1971. Towards a definition of ‘Standard Nigerian spoken English’. Actes du 8th Congress de la Société Linguistique de l’Afrique Occidentale, 165–175.
Banjo, Ayo. 1993. An endonormative model for the teaching of the English language in Nigeria. International Journal of Applied Linguistics 3 (2). 261–275.
Banjo, Ayo. 1996. Making a virtue out of necessity: An overview of the English language in Nigeria. Ibadan: University of Ibadan Press.
Brosnahan, Leonard F. 1958. English in Southern Nigeria. English Studies 39. 97–110.
Dadzie, A.B.K. and Awonusi Segun (eds.). 2004. Nigerian English: Influences and characteristics. Lagos: Concept Publications.
Dairo, Lekan. 1988. Teaching English pronunciation in Nigerian Schools: The Choice of a model. Ife Studies in English Language (ISEL) 2 (1). 102–109.
Ekong, Pamela. 2007. On the use of an indigenous model for teaching English in Nigeria. World Englishes 1 (3). 87–92.
Fuchs, Robert, Ulrike Gut and Taiwo Soneye. 2013. We just don’t even know: The usage of the pragmatic focus particles even and still in Nigerian English. English World-Wide 34 (2). 123–145.
Gut, Ulrike. 2004. Nigerian English phonology. In Edgar W. Schneider, Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.), A handbook of varieties of English. Volume 1: Phonology, 813–830. Berlin: Mouton de Gruyter.
Gut, Ulrike. 2005. Nigerian English prosody. English World-Wide 26 (2). 153–177.
Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26 (3). 346–359.
Huber, Magnus. 2004. Ghanaian English phonology. In Edgar W. Schneider, Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.), A handbook of varieties of English. Volume 1: Phonology, 854–863. Berlin: Mouton de Gruyter.
Jibril, Munzali. 1979. Regional variation in Nigerian spoken English. In Ebo Ubahakwe (ed.), Varieties and functions of English in Nigeria, 78–93. Ibadan: African Universities Press.
Jibril, Munzali. 1982. Nigerian English: An introduction. In John B. Pride (ed.), New Englishes, 73–84. Rowley, MA: Newbury House.
Jibril, Munzali. 1986. Sociolinguistic variation in Nigerian English. English World-Wide 7. 47–75.
Jowitt, David. 1991. Nigerian English usage: An introduction. Ikeja: Longman Nigeria.
Kujore, Obafemi. 1995. Whose English? In Ayo Bamgbose, Ayo Banjo and Andrew Thomas (eds.), New Englishes: A West African perspective, 366–380. Ibadan: Mosuro Publishers.
Ohala, J. 1986. Consumer’s guide to evidence in phonology. Phonology Year Book 3. 3–26.
Schneider, Edgar W. 2000. Feature diffusion vs. contact effects in the evolution of new Englishes: A typological case study of negation patterns. English World-Wide 21 (2). 201–230.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Schneider, Edgar W., Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.). 2004. A handbook of varieties of English. Volume 1: Phonology. Berlin: Mouton de Gruyter.
Simo Bobda, Augustin. 2003. The formation of regional and national features in African English pronunciation. English World-Wide 24 (1). 17–42.
Simo Bobda, Augustin. 2007. Some segmental rules of Nigerian English phonology. English World-Wide 28 (3). 279–310.
Soneye, Taiwo. 2008. Accentual variation in educated Nigerian English varieties and its implications for standardization. Ibadan Journal of English Studies 5. 402–413.
Soneye, Taiwo. 2009. Linguistic glottophagia in the usage of Yoruba proverbs: A playful or painful blasphemy? The International Journal of Language, Society and Culture 29. 80–86.
Udofot, Inyang M. 1997. The rhythm of Nigerian English. University of Calabar unpublished PhD dissertation.
Udofot, Inyang M. 2003. Stress and rhythm in the Nigerian accent of English: A preliminary investigation. English World-Wide 24 (2). 201–220.
Udofot, Inyang M. 2005. Emergent trends in English usage in Nigeria. Paper presented at the 22nd Annual Conference of the Nigeria English Studies Association, Obafemi Awolowo University, Ile-Ife, Nigeria, 24th September 2005.
Ugorji, Ugo. 2010. New Englishes in diachronic light: Evidence from Nigerian English phonology. The International Journal of Language Society and Culture 30. 131–141.
Yavas, Mehmet. 2011. Applied English phonology, 2nd edn. London: Blackwell.


Appendix 1

Bio-data of respondents

| Tag | State of origin | Place of birth/length of stay | Sex | Age | Langs. used | Native lang. | L1 | Most proficient in | Lang(s). used at home | State of spouse | Work place |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NC_1 | Zamfara | Gusau/20yrs | M | 41+ | 2 | Hausa | Hausa | Hausa | Hausa/Eng. | Ogun | Kaduna |
| NC_2 | Kano | Kaduna/21yrs | M | 41+ | 4 | Hausa | Hausa | Hausa | Hausa/Eng. | Kaduna | Kano |
| NC_3 | Kano | Kano/50yrs | M | 41+ | 2 | Hausa | Hausa | Hausa | Hausa | Katsina | Kano |
| NC_4 | Sokoto | Sokoto/since | M | 41+ | 3 | Hausa | Hausa | Hausa | Hausa/Eng. | Kebbi | Sokoto |
| NC_5 | Kaduna | Katsina/19yrs | M | 41+ | 3 | Hausa | Hausa | Hausa | Hausa/Arabic | Kaduna | Zaria |
| NP_1 | Benue | Benue/27yrs | F | 26+ | 4 | Tiv | Tiv | Tiv | Tiv/Eng. | N.A | Yobe |
| NP_2 | Benue | Benue/12yrs | M | 41+ | 4 | Tiv | Tiv | Tiv | Tiv/Eng./pidgin | Benue | Benue |
| NP_3 | Plateau | Jos/8yrs | M | 41+ | 3 | Ngas | Ngas | English | Ngas | Plateau | Plateau |
| NP_4 | Kebbi | Kebbi/since | M | 60+ | 4 | Ot-maror | | English | Ot-maror/Hausa | Kebbi | Retired |
| NP_5 | Kebbi | Kebbi | M | 41+ | 3 | C’lela | K’lela | C’lela | K’lela | Kebbi | Kebbi |
| NP_6 | Kogi | Lokoja | M | 41+ | 3 | Nupe | Nupe | Nupe | Nupe/Eng. | Kogi | Zaria |
| NP_7 | Kogi | Kogi/9yrs | M | 41+ | 4 | Igala | Igala | Igala | Igala/Hausa/Eng. | Kogi | Zaria |
| SE_1 | Anambra | Sokoto/33yrs | F | 26+ | 3 | Igbo | Hausa | English | Hausa/Eng. | N.A | Sokoto |
| SE_2 | Abia | Abia/10yrs | M | 41+ | 2 | Igbo | Igbo | Igbo | English | Adamawa | Osun |
| SE_3 | Abia | Oyo/2yrs | F | 41+ | 3 | Igbo | Igbo | Igbo | Igbo | N.A | Osun |
| SE_4 | Enugu | Enugu/10yrs | F | 41+ | 3 | Igbo | Igbo | Igbo | Igbo/Eng. | Enugu | Sokoto |
| SE_5 | Delta | | M | 26+ | 3 | Urhobo | Igala | Hausa | Igala/Eng./Hausa | Benue | Yobe |
| SE_6 | Edo | Edo/20yrs | F | 26+ | 3 | | | Hausa | Hausa/English | Sokoto | Sokoto |
| SW_1 | Kwara | Kaduna/23yrs | M | 26+ | 4 | Yoruba | Hausa | Hausa | Hausa/Eng. | N.A | Osun |
| SW_2 | Kwara | Kwara/17yrs | M | 26+ | 2 | Yoruba | Yoruba | Yoruba | Yoruba/Eng. | N.A | Kaduna |
| SW_3 | Kwara | Lagos/19yrs | M | 60+ | 3 | Yoruba | Yoruba | Yoruba | Yoruba/Eng. | N.A | Kaduna |
| SW_4 | Osun | Ghana/11yrs | F | 41+ | 4 | Yoruba | Twi | English | Eng./Yoruba | Osun | Osun |
| SW_5 | Osun | Osun | F | 18+ | 2 | Yoruba | English | English | Eng./Yoruba | N.A | Osun |
| SW_6 | Osun | Ondo/5yrs | F | 41+ | 2 | Yoruba | Yoruba | Yoruba | Yoruba | Lagos | Osun |
| SW_7 | Ondo | Ondo/18yrs | M | 41+ | 3 | Yoruba | Yoruba | Yoruba | Eng./Yoruba | Ondo | Osun |
| SW_8 | Ondo | Ondo/17yrs | F | 26+ | 3 | Yoruba | Yoruba | English | Yoruba/Eng. | N.A | Osun |
| SW_9 | Osun | Lagos/since | M | 18+ | 2 | Yoruba | English | English | English | N.A | Osun |
| SW_10 | Ekiti | Osun/20yrs | F | 18+ | 3 | Yoruba | English | English | Eng./Yoruba | N.A | Osun |
| SW_11 | Ogun | Oyo | M | 56+ | 2 | Yoruba | Yoruba | English | Yoruba/English | Ogun | Osun |
| SW_12 | Osun | England/5yrs | M | 10+ | 2 | Yoruba | English | English | English/Yoruba | N.A | Osun |

NC = North Central; NP = North Peripheral; SW = South-West; SE = South-East; N.A = Not applicable


Appendix 2

QUESTIONNAIRE

Dear Sir/Madam,

This questionnaire is designed to compile information on the use of English and other languages in Nigeria. Kindly assist us by supplying answers to the questions below:

Section A: Personal Information

1. What do you do for a living? ____________ State the city/town please ____________
2. Your State of origin ____________ Husband/wife’s state of origin ____________
3. Your place of birth ____________. How long did you live there? ____________
4. Where else have you lived in Nigeria? (a) ____________ (b) for how long? ____________
5. Educational level (please tick appropriate box): Primary ☐ Secondary ☐ Tertiary ☐
6. Sex (please tick appropriate box): (a) Male ☐ (b) Female ☐
7. Where do you currently live ____________ since when ____________
8. Age (please tick correct box): 10–17 ☐ 18–25 ☐ 26–40 ☐ 41–60 ☐ 60 & above ☐
9. How many languages do you speak? One ☐ Two ☐ Three ☐ Four ☐ More than Four ☐
10. What is your native language? ____________
11. What language(s) do you speak most at home? (a) ____________ (b) ____________
12. Arrange the languages you speak in the order of proficiency/fluency: 1st ____ 2nd ____ 3rd ____ 4th ____
13. What dialect of your language do you speak? ____________
14. Which dialect of your language is regarded as the ‘core’ or ‘central’ dialect of language? ____________
15. Which Members of your ethnic group use the core dialect? ____________
16. Why do you think the dialect is regarded as ‘core’? ____________
17. What is your perception of the classification of a dialect as core/central (you may tick more than one box)? Fair and Objective ☐ Political ☐ Geographical ☐ Historical ☐
18. Do you think there are other reasons? Please comment freely on Question 17 above. ____________
19. Can you state some features in the Core dialect that are not present in other dialects of your language?
(a) ____________
(b) ____________
(c) ____________

Section B: Kindly read aloud the passage below:

The Lion has become stranded due to the heat from the smaller animals. The spread of his power sprays every nook and cranny of the forest. The powerless strive and scream yet the bitter scroll unfolds continuously as he roars. The lion is now ill and the hill seems insurmountable. Strolling bye with the stroke from his pride he could not even scrape the phlegm on his back.

Please say what the lion’s problems are.


Section C: Interviewee/in-depth Discussion Session (respondents are required to speak/comment freely for about 7 minutes)

There are about 505 Nigerian languages with Hausa, Yoruba and Igbo as main but these languages are spoken differently within each ethnic group. For instance the Hausa spoken in Sokoto is said to be different from the one spoken in Jos, so also the Igbo spoken in Anambra State differs from the variety spoken in Imo State. The Yoruba in Oyo is not the same as the one in Ekiti. Do you agree that there is a standard dialect of your language and what are some of its features? Are there those who formerly belong to your broad language group that now belong to another, as a result of state redistribution in Nigeria?

Thank you for your cooperation.

Christoph Gabriel, Johanna Stahnke, and Jeanette Thulke

8 Acquiring English and French speech rhythm in a multilingual classroom: A comparison with Asian Englishes

1 Introduction

Due to globalisation and migration, linguistic and cultural diversity have become an important aspect of foreign language learning and teaching, especially in school settings. Consequently, the aims of foreign language instruction have become more diverse. Apart from communicative skills and plurilingual competencies, multilingual and language learning awareness have recently gained importance. In this framework, a large number of pedagogical studies have been conducted, taking into account (positive or negative) transfer¹ between typologically related languages during the process of foreign language learning (e.g. Martinez and Reinfried 2006; Mehlhorn 2008; Meißner and Reinfried 1998). However, the influence of typologically distant (and often unsystematically acquired) heritage languages on the learning of further foreign languages has long been disregarded, from both a linguistic and a pedagogical perspective (Gabriel et al. 2012; Hu 2011). This also applies to prosody and in particular to the durational properties of non-native speech, despite the fact that prosody has been shown to considerably contribute to the perception of foreign accent (Boula de Mareüil and Vieru-Dimulescu 2006). While at least a few studies on the learning of foreign languages by multilingual learners address Turkish as a heritage language, focusing on the learners’ reading and understanding competencies (Elsner 2007; Rauch, Jurecka, and Hesse 2010), other typologically distant languages such as Chinese have not yet been taken into account. Our study investigates the acquisition of English and French speech rhythm by German senior high school students with Chinese as a heritage language, thus investigating a group of immigrants who nowadays make up important communities in urban spaces in Germany, but who are almost completely ignored in both linguistic research and language pedagogy. By combining linguistic and educational perspectives, we focus on rhythmic transfer, thereby concentrating on the question of to what extent the languages that make up the learners’ linguistic background, i.e. the surrounding and dominant language German and the heritage language Chinese, serve as a basis for (positive or negative) transfer in the acquisition of English and French speech rhythm. Within the context of the multilingual classroom, we discuss if extra-linguistic factors such as the learners’ attitudes towards the languages of the sample, i.e. German and Chinese on the one hand and the foreign languages learned at school on the other, as well as their individual degree of multilingual and phonological awareness, may have an effect on their production of English and French speech rhythm.²

Contact between typologically distant languages that occurs as a consequence of worldwide migration and that is in particular typical of postcolonial settings is a much better investigated area. This holds, e.g., for the case of Asian Englishes that are in contact with Chinese such as the varieties spoken in Taiwan, Singapore, and Hong Kong. Several studies have shown that these varieties of English are prosodically influenced by the contact language Chinese, in particular with respect to speech rhythm (e.g. Low and Grabe 1995; Low, Grabe, and Nolan 2000; Deterding 2001; Setter 2003, 2006). These findings serve as a basis for a comparison with the data produced by the multilingual learners with Chinese as a heritage language and for detecting possible similarities regarding the prosodic shape of learner and contact varieties of English.

The paper is organised as follows. In a first step, we characterise the languages of our sample from a typological point of view, focusing on the differences and similarities between the heritage language (Chinese), the surrounding language and language of instruction (German), and the foreign languages English and French (section 2). Section 3 provides the reader with general information on speech rhythm and in particular on the durational properties of the speech rhythm of the languages investigated in this study before giving a brief overview of the literature on rhythmic transfer in different settings of language contact. In a next step we present the methodology and the results of our empirical study (section 4) before discussing the results in the wider context of prosodic transfer in learner and contact varieties. This will be done by comparing the rhythmic properties of the non-native English produced by our learners and those attested in varieties of English that are in contact with Chinese (section 5). Section 6, finally, offers an outlook and some concluding remarks.

¹ We adopt the widely accepted definition of transfer proposed by Odlin (1989) who characterises it as “the influence resulting from the similarities and differences between the target language and any other language that has been previously (and perhaps imperfectly) acquired” (27); see also Odlin (2003) for an overview. Negative transfer is to be understood as the nontarget-like transference of a linguistic structure to the foreign language; the notion of positive transfer, by contrast, refers to the target-like production of a certain property of the language learned that corresponds to some parallel structure of the learners’ mother tongue or some other language from their linguistic background.

² A positive effect of linguistic and in particular phonological awareness on the learning of foreign languages has been evidenced in a number of studies; see Mehlhorn (2008) and Schmidt (2010) with special consideration of phonological learning.

Christoph Gabriel, Johanna Stahnke, and Jeanette Thulke, Hamburg

2 Mandarin Chinese as a distant language Our study comprises a wide spectrum of genealogically and typologically distinct languages. Within this constellation, Chinese seems to be maximally distant from both English (as the first foreign language in the learning setting and the contact language in the postcolonial context in Asia), French (second foreign or third language) and German (surrounding language and language of instruction). A conspicuous feature of this “otherness” is first of all related to the use of a completely different script in Chinese: Chinese makes use of a basically logographic writing system, the so-called 汉字 (hànzi, ‘Chinese characters’), which represent meaningful units (lexical or functional morphemes) and thus refer to the semantic level, while the Latin script used for English, French and German reflects in a more or less abstract way the phonological level of the languages concerned. This visually perceptible distance between Chinese on the one hand and English, French and German on the other patterns with the languages’ genealogical distinctness: Chinese is a Sino-Tibetan language and the only language in our study that does not belong to the Indo-European group. With respect to morphosyntactic typology, Chinese is again set apart from the other three languages due to its isolating grammar, which is in sharp contrast to the inflecting (or: fusional) structures of English, French and German.3 The same holds true, at least in part, for the prosodic level:4 Regarding intonation, Mandarin Chinese is a tone language with four lexical tones that allow for the expression of lexical contrasts in monosyllabic and segmentally identical 3 In contrast to inflecting languages, which express grammatical functions by means of affixes on lexical stems and/or by allomorphic variation of the stem itself, the words of isolating languages such as Mandarin Chinese do not undergo any systematic changes. Grammatical information such as, e.g. 
tense or aspect is not marked by bound affixes that attach to the verb stem, but through free morphemes such as the marker of perfective aspect 了 (le), which may appear either clause-finally or next to the verb, e.g. 爸爸看见老师了 (Bàba kànjiàn lăoshī le) or 爸爸看 见了老师 (Bàba kànjiàn le lăoshī), both ‘The father saw the teacher’. 4 The segmental level is not taken into account here since the present study concentrates on prosodic phonology. Concise descriptions of the segmental inventories of the languages involved are given by Lin (2007: 19–82) and Duanmu (2007: 9–47) for Mandarin Chinese, by Gut (2009: 50–74) for English, by Wiese (1996: 9–26) for German, and by Fagyal, Kibbee and Jenkins (2006: 23–52) for French.

138

Christoph Gabriel, Johanna Stahnke, and Jeanette Thulke

words by means of distinct F0 contours.5 By contrast, the other three languages constitute so-called “intonation-only languages” (Gussenhoven 2004: 12), which lack tonal marking of lexical contrasts but systematically use F0 for the marking of lexical stress6 and clause typing (e.g. declarative vs. interrogative structures), for prosodic phrasing, and for the expression of paralinguistic aspects such as emotions, among other things. Turning now to the durational properties of the languages under discussion, the picture considerably changes, inasmuch as the syllable-timed speech rhythm of Mandarin Chinese (Lin and Wang 2007) patterns with French, but contrasts with the stress-timed languages German and English (see section 3 for details). Against this background, it is to be expected that both the “new” contact varieties of English spoken in Hong Kong, Taiwan and Singapore and the L2 English produced by learners with a Mandarin Chinese background (native or heritage language) exhibit evidence for rhythmic transfer from syllable-timed Chinese. By contrast, learners with Chinese as a native or heritage language should produce the syllable-timed rhythm of French in a more target-like manner than monolingual German learners of French.

3 Investigating speech rhythm

The prosody of a language is crucially determined by the systematic use of durational effects and fundamental frequency (F0), i.e. by timing (speech rhythm) and melody (intonation). In languages with lexical stress (Liberman and Prince 1977), such as German or English, both F0 and timing are essentially linked to prosodic prominences on the word level, i.e. to the marking of metrically strong (or: accented) syllables. The role of intensity as the third correlate of stress (besides F0 and duration) will not be considered in our study. Regarding their durational properties, the languages of the world are traditionally classified as belonging to either the stress-timed or the syllable-timed group (Abercrombie

5 A sequence such as [ʂɤ] can convey different lexical meanings, depending on the F0 movement produced on the tone-bearing unit, e.g. first tone (high): 狮(子) shī(zi) ‘lion’, second tone (rising): 十 shí ‘ten’, third tone (falling-rising): 使 shĭ ‘messenger’, fourth tone (falling): 是 shì ‘to be’. Functional morphemes such as the aspectual particle 了 (le) are tonally unspecified (so-called neutral tone). For an analysis of the lexical tones in an autosegmental-metrical framework see Duanmu (2007: 236–238).

6 Note that this does not hold for French, which lacks lexical stress. In contrast to English and German, French intonation is not related to the word level, but to higher levels of the prosodic hierarchy such as the edges of accentual phrases (see, e.g. Jun and Fougeron 2000).

Acquiring English and French speech rhythm in a multilingual classroom


1967; Pike 1945).7 According to this view, the perceived contrast between the two types of languages is interpreted as a contrast in the domain of the isochrony of timing intervals: In syllable-timed languages such as Chinese and French, all syllables tend to be of equal duration. By contrast, stress-timed languages such as English or German rather have (stress-delimited) feet of the same length, i.e., these languages are characterised by approximately equal durations between the onsets of metrically strong syllables. Syllables, in contrast, can be of very different durations. After it was shown by Dauer (1987) and Roach (1982), among others, that neither syllable-based nor stress-based isochrony was systematic in the two groups of languages, the investigation of speech rhythm largely developed in two different directions. One current line of research predominantly focuses on phonological factors and interprets the timing properties of a given language as a surface reflection of its phonological properties such as (more or less complex) syllable structures and the presence or absence of vowel reduction (see, e.g. Auer 2001; Auer and Uhmann 1988; Dasher and Bolinger 1982). A second approach, although based on phonological phenomena, is more phonetically oriented and concentrates on the measurement of the durations of vocalic (V) and consonantal (C) intervals in the speech signal (Dellwo 2006; Dellwo and Wagner 2003; Grabe and Low 2002; Ramus, Nespor, and Mehler 1999; White and Mattys 2007; White, Mattys, and Wiget 2012, among many others).8 Seen from this angle, the contrast between “stress-timed” and “syllable-timed” languages is evidenced by different proportions of vocalic material in the speech signal (%V; Ramus, Nespor, and Mehler 1999) on the one hand and by the values depicting the durational variability of V and C intervals on the other.
The non-normalised metrics ΔV and ΔC express the standard deviation of V and C intervals (Ramus, Nespor, and Mehler 1999); the so-called variation coefficients VarcoV and VarcoC are speech-rate-normalised versions of ΔV and ΔC, respectively (Dellwo and Wagner 2003); the so-called Pairwise Variability Index (PVI; Grabe

7 A third group of languages, among them Japanese as well as probably Ancient Greek and Vedic Sanskrit, exhibits regular pacing with respect to the mora as a basic timing unit. This aspect will not be dealt with in the following since none of the languages of our sample belong to the mora-timed group.

8 These approaches crucially build on the insights of the influential study by Mehler et al. (1996), who showed that infants perceive the speech signal mainly as a sequence of vocalic and intervocalic (i.e. consonantal) intervals. Based on this assumption, Ramus, Nespor and Mehler (1999) suggested that syllables should no longer be interpreted as the basic rhythmic units of a given language, but rather V and C intervals.


and Low 2002) differs from the aforementioned metrics in computing the durational variability in successive intervals instead of calculating the variability over the whole stretch of analysed speech.9 Several studies have shown that these rhythm metrics are influenced by speakers, materials and measurers (Dellwo, Leemann, and Kolly 2012; Wiget et al. 2010; Yoon 2010) and that the choice of the most appropriate rhythm metric is language dependent (Loukina et al. 2011). Although these findings call into question the existence of rhythm classes and the use of rhythm metrics in general (Arvaniti 2012), research on the rhythmic properties of non-native speech and contact varieties has considerably increased during the last decade. As concerns the context of L2 language learning, transfer of the durational properties from a native to a non-native system has been attested in studies on pairs of languages belonging to different rhythmic classes (stress-timed vs. syllable-timed; see, e.g. Pulzován de Egger 2002 for Argentinean Spanish learners of German and German learners of Spanish; Chen 2012 for Taiwanese learners of English), but also in work on the learning of a second language belonging to the same rhythmic group as the native tongue of the learners (see Benet et al. 2012; Gabriel and Kireva 2014 for Italian learners of Spanish; Ordin, Polyanskaya, and Ulbrich 2011 for German learners of English; Gut 2012 for an overview). Rhythmic transfer was also shown to occur in the speech of adult bilingual speakers (see, e.g. White and Mattys 2007 for Spanish/English and Spanish/Dutch adult bilinguals) and in bilingual first language acquisition (see Kehoe, Lleó, and Rakow 2011 for Spanish/German bilingual children). Regarding the field of (usually migration-induced) linguistic contact, rhythmic transfer has been attested for several Asian varieties of English in contact with syllable-timed languages, among them Mandarin Chinese (see, e.g.
Crystal 1995; Deterding 1994, 2001; Jian 2004; Low and Grabe 1995; Low, Grabe, and Nolan 2000; Setter 2003, 2006; see section 5 for a more detailed discussion), for banlieue (‘suburban’) Parisian French in contact with migration languages such as Arabic (Fagyal 2010), and for Argentinean Spanish in contact with Italian (Benet et al. 2012; Gabriel and Kireva 2014). However, studies explicitly addressing the acquisition of non-native speech rhythm by multilingual learners in an instructed learning setting are virtually non-existent, apart from a pilot study by Gabriel et al. (2012)

9 In the literature, the PVI has been applied in both its raw, i.e. non-normalised form (rPVI) and in its speech-rate-normalised version (nPVI). Grabe and Low (2002) argued that only vocalic durations are affected by speech rate and consequently suggested using the PVI in its normalised form for V intervals (VnPVI) and in its raw version for C intervals (CrPVI). However, since it was shown that consonantal durations can also vary according to speech rate (Dellwo and Wagner 2003), the normalised PVI has also been applied for C intervals (CnPVI) in several studies (see, e.g. Kinoshita and Sheppard 2011).


on the acquisition of French speech rhythm by multilingual learners with Chinese as a heritage language in a German classroom. The results obtained in this study indicate that learners with a syllable-timed language such as Chinese as a heritage language might benefit from the rhythmic properties of their linguistic background regarding the acquisition of the syllable-timed rhythm of French through positive transfer. Our empirical study, which is presented in detail in the following section, considerably enlarges the data base as compared to the one used in Gabriel et al. (2012), e.g., by including control data from monolingual Chinese learners of French, and thus aims at filling a research gap in this area. The comparison of the results from the learner data with the findings obtained in the literature on Asian Englishes aims at adding to our knowledge on the similarities and diversity between learner and contact varieties with respect to prosodic transfer.
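The interval-based metrics introduced in this section (%V, ΔV/ΔC, VarcoV/VarcoC, and the raw and normalised PVI) can be illustrated with a short Python sketch. It is not the implementation used in any of the studies cited: the function names are hypothetical, the duration values are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not specify which variant is meant.

```python
from statistics import mean, pstdev


def percent_v(v, c):
    """%V: share of vocalic material in the total V + C duration, in percent."""
    return 100.0 * sum(v) / (sum(v) + sum(c))


def delta(durations):
    """Delta-V or Delta-C: standard deviation of interval durations.
    Assumption: population SD; a sample SD would be an equally plausible reading."""
    return pstdev(durations)


def varco(durations):
    """VarcoV or VarcoC: standard deviation normalised by the mean duration (x 100)."""
    return 100.0 * pstdev(durations) / mean(durations)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return mean(diffs)


def npvi(durations):
    """Normalised PVI: each successive difference is divided by the
    mean of the two intervals involved, then averaged (x 100)."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * mean(terms)


# Invented interval durations in milliseconds, for illustration only.
v_intervals = [100, 200, 100, 200]
c_intervals = [100, 100, 100, 100]

print(percent_v(v_intervals, c_intervals))
print(delta(v_intervals), varco(v_intervals))
print(rpvi(v_intervals), npvi(v_intervals))
```

The sketch makes the contrast described above concrete: ΔV and Varco are computed over the whole stretch of intervals, whereas both PVI variants only compare each interval with its immediate successor.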

4 Empirical study

The following section presents our empirical study. In a first step, we outline our hypotheses (section 4.1), before describing our speakers and the data collection (section 4.2). Then we present the results obtained from the rhythmic analysis performed on the data (section 4.3) and, finally, discuss them (section 4.4).

4.1 Hypotheses

As shown in section 2, (Mandarin) Chinese differs from German, English and French in many respects. Only with regard to speech rhythm does it show similarities with French in that both are said to be syllable-timed. As shown in the pilot study by Gabriel et al. (2012; see section 3), speakers with a syllable-timed language in their linguistic repertoire who learn a syllable-timed foreign language might benefit from their implicit or explicit linguistic knowledge by positively transferring appropriate prosodic characteristics to the target language. The same should apply to speakers of a stress-timed L1 learning a stress-timed L2. Applied to our context, this means that learners with L1 German (stress-timed) produce the speech rhythm of the L2 English (stress-timed) more target-like than learners with L1 Chinese (syllable-timed). The latter, by contrast, should perform better than monolingual German learners regarding the production of the syllable-timed rhythm of their L3 French. Based on these prerequisites, our hypotheses are as follows:


H1: Monolingual German learners and multilingual learners with a Chinese/German background produce the speech rhythm of English as a L2 more target-like than monolingual learners with L1 Chinese (who lack any experience with stress-timed languages).

H2: Monolingual Chinese learners and multilingual learners with a Chinese/German background produce the speech rhythm of French as a L3 more target-like than monolingual learners with L1 German (who lack any experience with syllable-timed languages).

4.2 Methodology

Our set of participants comprises one experimental group and three control groups. The experimental group consists of 13 multilingual Chinese/German speaking students from a German senior high school (Gymnasium). They all had started to learn English as a L2 before French as a second formally instructed L2 at school.10 At the time of recording the speakers were aged between 14 and 18; they had learned English for 5–11 years and French for 2–6 years. Table 1 gives an overview of the linguistic biographies of the participants, showing the heterogeneity of the group of multilingual learners. Although most of the speakers learned Mandarin Chinese or another Chinese language such as Cantonese, 潮州市话 Cháozhōu huà (a variety belonging to the 闽 Mĭn group), or 上海话 Shànghăi huà (which belongs to the 吴 Wù group, see Kurpaska 2010: 37–62),11 and German as a L1 or an early L2, the ages of onset for the individual languages differ considerably (see Table 1).

10 One of our speakers had learned Russian before starting to learn English (see Table 1). The order of acquisition, however, is the same for all learners with respect to the two foreign languages addressed in our study, i.e. English and French.

11 Although the speech rhythm of these varieties has not yet been investigated systematically (Peggy Mok, Chinese University of Hong Kong, and Hongwei Ding, Tongji University of Shanghai, personal communications), they do not considerably differ from Mandarin Chinese with respect to syllable structure (Ramsey 1987: 92–93, 109). This suggests that both Cháozhōu and Shànghăi huà pattern with Mandarin Chinese regarding their rhythmic properties. We are deeply indebted to Dunghui Zuo (Chinese University of Hong Kong) for providing us with a recording of the Shànghăi huà version of the North Wind text; our analysis of this recording showed rhythmic values that differed only slightly from those obtained from the Mandarin Chinese data.


Table 1: Participants in the experimental group (AoL = age of learning, m = male, f = female; MAN = Mandarin Chinese, GER = German, CANT = Cantonese, ENG = English, FRE = French, SPA = Spanish, RUS = Russian).

Learner (age, sex) | L1 (< 3 years) | L2/s (AoL)
C01 (17, f) | MAN | GER (9), ENG (10), FRE (11)
C02 (14, m) | MAN, GER | ENG (9), FRE (12)
C03 (17, m) | MAN | GER (6), ENG (9), FRE (12)
C06 (17, f) | CANT, GER | MAN (6), ENG (9), FRE (12)
C08 (17, m) | GER, Cháozhōu huà | ENG, MAN (11), FRE (13)
C09 (18, m) | MAN | ENG (10), GER (12), FRE (15)
C10 (16, m) | CANT, GER | ENG, MAN (10), FRE (11)
C12 (14, f) | MAN, GER | ENG (8), FRE (10)
C13 (18, m) | MAN, GER | ENG (10), FRE (13)
C14 (15, f) | MAN, GER | ENG (9), FRE (11), SPA (13)
C15 (15, f) | Shànghăi huà, GER, MAN | ENG (8), FRE (11)
C16 (17, f) | MAN | RUS (5), ENG (6), GER (9), FRE (11), SPA (13)
C17 (16, f) | GER | MAN (5), ENG (8), FRE (11), SPA (13)

The source table lists one further SPA (13) entry among learners C01–C15; its row cannot be recovered from the layout.

The first control group consists of ten monolingual German learners; all of them are senior high school students in Germany (aged 15). At the time of recording (Hamburg 2012), they had learned English for seven years and French for four years. The second control group comprises ten monolingual Chinese senior high school students from Beijing who were recorded in Chinese, English and French. Their ages range between 17 and 21 years; at the time of the data collection (Beijing 2012), they had learned English for 9–13 years and French for 1–6 years. All learners were recorded both in their mother tongue (German or Chinese) and in the two L2s. The native control data for French were gathered from ten students from the University of Bordeaux (ages 18–22; Bordeaux 2012). As for English, we refer to the rhythmic values given in Mairano and Romano (2010).

Data collection proceeded in the following way: First, all learners were asked to fill in a questionnaire that enquired about their language biographies. Then they were recorded reading aloud a text in each of their languages. For English, German and Mandarin Chinese, we used the fable The North Wind and the Sun and its respective translations, i.e. Nordwind und Sonne and 北风和太阳 Běifēng hé tàiyáng. Due to the high degree of lexical difficulty of the French version (La bise et le soleil), the participants read a short story from a textbook instead (Amandine fait du sport).12 All the materials were controlled with respect

12 As for the versions of the North Wind text, we followed the Handbook of the International Phonetic Association (see IPA 1999); the Mandarin Chinese version we used is reproduced in the Appendix. The French text was taken from Jouvet (2006: 7) and slightly adapted to our purposes (see Appendix).


to syllable structure in order to make sure that the occurrences of different syllable types (CV, CVC, CCVC etc.) contained in the individual texts correspond to what is typical of the relevant language. The participants read the texts in the order they felt most comfortable with. In a last step, semi-structured interviews (Kvale 2007) focusing on the learners’ attitudes towards their languages as well as on their multilingual and phonological awareness were conducted with the multilingual learners (experimental group). All interviews were conducted in German. For the analysis of the speech data, we measured all C and V intervals using Praat (Boersma and Weenink 2011). Following White and Mattys (2007), the positions of boundaries between V and C intervals were determined on the basis of formant structure and pitch period and set at the point of zero crossing of the waveform. Pre-pausal and phrase-final intervals were considered for the analysis because possible effects of final lengthening were likely to be reflected in the measures (Grabe and Low 2002; White and Mattys 2007). In line with Grabe and Low (2002), glides were treated as belonging to the V intervals if there was no friction detected in the speech signal. For plosives and affricates following a stretch of silence (pause), the beginning was placed at 50 milliseconds before the burst, given that their boundaries can hardly be set using the aforementioned criteria (Mok and Dellwo 2008). Silent pauses and material affected by any kind of speech disfluency were not included in the analysis. On the basis of the segmentation, we calculated %V and VarcoV (see section 3) using Correlatore (Mairano and Romano 2010), in line with White and Mattys (2007) who statistically identified these two rhythm metrics as appropriate for the analysis of multilingual speech data.
Regarding the interviews, we extracted informative passages which reveal the learners’ attitudes towards their languages and hint at their multilingual and phonological awareness. Individual learner profiles were created on the information taken from both the questionnaires and the interviews. Finally, we looked for correlations between the linguistic and the non-linguistic data that might contribute to a better understanding of the results.
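The step from a labelled segmentation to %V and VarcoV can be sketched as follows. This is only an illustration in Python, not a reproduction of Praat or Correlatore: the label names ("V", "C", "pause", "disfluency"), the helper functions and the durations are hypothetical. Two further assumptions are made: an excluded stretch also ends the current interval, so that no V or C interval spans a pause, and VarcoV uses the population standard deviation.

```python
from statistics import mean, pstdev


def to_intervals(segments):
    """Merge adjacent segments of the same class into V and C intervals.

    `segments` is a list of (label, duration) pairs. Anything not labelled
    "V" or "C" (e.g. pauses, disfluent material) is dropped, and it also
    closes the current interval (an assumption of this sketch).
    """
    intervals = []
    previous = None
    for label, duration in segments:
        if label not in ("V", "C"):
            previous = None          # excluded material breaks the interval
            continue
        if label == previous:
            kind, total = intervals[-1]
            intervals[-1] = (kind, total + duration)  # extend current interval
        else:
            intervals.append((label, duration))       # start a new interval
        previous = label
    return intervals


def percent_v_and_varco_v(segments):
    """Return (%V, VarcoV) for one labelled recording."""
    intervals = to_intervals(segments)
    v = [d for kind, d in intervals if kind == "V"]
    c = [d for kind, d in intervals if kind == "C"]
    percent_v = 100.0 * sum(v) / (sum(v) + sum(c))
    varco_v = 100.0 * pstdev(v) / mean(v)
    return percent_v, varco_v


# Invented segment durations in milliseconds, for illustration only.
example = [
    ("C", 80), ("V", 120), ("C", 60), ("C", 40), ("V", 200),
    ("pause", 500),
    ("V", 100), ("C", 100),
]
print(to_intervals(example))
print(percent_v_and_varco_v(example))
```

Merging adjacent consonants (or vowels) before measuring reflects the fact that the metrics operate on V and C intervals rather than on individual segments or syllables.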

4.3 Results

Figure 1 illustrates the results for L2 English. The x-axis shows the percentage of vocalic material in the speech signal (%V); the y-axis indicates the values for the variability of V intervals (VarcoV). For ease of reading, the L1 values for Mandarin Chinese and German are represented as averages from all speakers’ results (upper left graph in Figure 1); as already mentioned, the target (L1)


Figure 1: %V and VarcoV of L2 English. The upper left graph shows L1 averages only; the upper right one adds the results of the monolingual German and Chinese learners of English. The lower graph, finally, depicts the results of all speakers.

values for English are taken from Mairano and Romano (2010). All L1 values correspond to what is reported in the literature: Chinese (right-sided triangle) displays a higher value for the percentage of vocalic material and a lower value for the variability of V intervals as compared to both German (left-sided triangle) and English (rhombus); German is comparable to English for %V. For VarcoV, German shows a somewhat lower value than English (see Ramus, Nespor, and Mehler 1999; Grabe and Low 2002; Dellwo 2006; Mairano and Romano 2010).


Figure 2: Variability of %V (left) and VarcoV (right) in the three groups of L2 English speakers.

Looking at the results for the learners (upper right graph in Figure 1), we first see that the majority of the VarcoV values for the L1 German learners (light grey squares) lie in between the L1 German starting point and the English target point. Most of the monolingual Mandarin Chinese learners (dark grey circles) display VarcoV values lower than those produced by the monolingual German learners. This confirms findings from previous studies on speech rhythm in L2 English produced by native speakers of Mandarin Chinese (Li and Post 2012). The values for the multilingual learners (black triangles, lower graph in Figure 1) overlap with the results of both monolingual learner groups, some of them performing better than others do. For illustration, we highlight three speakers as examples of variable success in producing the speech rhythm of the target language. These three cases will be discussed later on when we look at the non-linguistic data. All learner groups show some dispersion with respect to their %V and VarcoV values, separately depicted as boxplots for each group (see Figure 2). As can be seen, inter-speaker variability regarding %V is highest for the monolingual Chinese learners of English (standard deviation (SD) = 3.59) and lowest for the multilingual learners (SD = 1.95). Regarding VarcoV, variability is highest for the multilinguals (SD = 5.65), whereas the dispersion range is lower for the two monolingual groups (Chinese learners: SD = 3.89, German learners: SD = 3.76). The results were statistically tested by comparing the rhythmic values of the different learner groups with those of the L1 English speakers. Taking into consideration the effect of inter-speaker variability with respect to the variable production in the respective native languages, we applied an ANCOVA test (Analysis of Covariance), which yields adjusted average values for each learner group. To test the difference between these values and the target value of the


English native speakers from Mairano and Romano’s (2010) study, a one-sample mean-comparison test (t-test) was applied. For VarcoV, we found a significant difference between English as a L2 produced by the monolingual Chinese learners and L1 English (t-test, p < 0.05), but no significant difference was found between the L2 English produced by the German monolinguals and L1 English. For testing the multilingual learners’ performance as a group, one of their two L1s, i.e. Chinese or German, would have to be defined as their native language, which is hardly possible given their considerably different linguistic biographies (see Table 1). We thus abstain from statistically testing the multilingual learners as a group and rather treat them as individual speakers. Section 4.4 provides further evidence in favour of this approach from the extra-linguistic data. The results obtained from the analysis performed on the L2/3 French data (Figure 3) show that the value for L1 French (rhombus, target value) is located in between the values for L1 German and L1 Mandarin Chinese (upper left graph in Figure 3). Again, this corresponds to the values reported in previous studies: French displays a percentage of vocalic material in between German and Chinese, while the variability of V intervals is comparable to that of Chinese (Mairano and Romano 2010). Looking at the results of the learners (upper right graph in Figure 3), it is conspicuous that the majority of the learners display quite high values for both metrics when they speak French as a L2. We further see that the monolingual German learners of French (light grey squares) represent the group with the highest values for the variability of V intervals, whereas the monolingual Chinese learners of French (dark grey circles) obtain lower values in this respect. The values for the multilingual learners (black triangles, lower graph in Figure 3) are situated in between the two groups of monolingual learners.
For %V, all of the three groups show a roughly comparable range of dispersion. Despite the fact that all of our three speaker groups can generally be distinguished from each other, some individual speakers again display more target-like results than others. Figure 4 illustrates the inter-speaker variability for L2 French. With respect to %V, the three learner groups do not considerably differ from one another, with standard deviations ranging from 2.25 for the monolingual Chinese learners to 2.95 for the German monolingual learners of French. Concerning VarcoV, variability is highest for the monolingual learners with L1 German (SD = 4.91), while the standard deviation is lower for the two other groups (multilingual learners: SD = 3.34, Chinese learners: SD = 3.66). As concerns %V, we found a significant difference between French as a L2 produced by the monolingual German learners and L1 French (ANCOVA test, p < 0.05), but no significant difference was found between the French produced


Figure 3: %V and VarcoV for L2 French. The upper left chart shows L1 averages only; the upper right one adds the results of the monolingual German and Chinese learners of French. The lower graph shows the results of all speakers.

by the Chinese monolinguals and the target language. In testing the multilinguals, the results yield a significant difference when German is considered as a L1 (ANCOVA test, p < 0.05), but no significant difference was detected with Chinese considered as the learners’ L1. This might be interpreted as a welcome result, since it corresponds to what is expected on the basis of the rhythmic properties of the languages investigated. However, as stated above, treating the multilingual learners of our sample as a homogeneous group and performing statistical tests on their grouped speech production is highly questionable.
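The comparison of a learner group with a single native target value, as used in this section, rests on a one-sample t-test. The following Python sketch shows the test statistic only. It is an illustration under stated assumptions, not the analysis reported here: the values are invented, the function name is hypothetical, and the ANCOVA adjustment that precedes the test in the study is not reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, target):
    """Return (t, degrees of freedom) for H0: the group mean equals `target`.

    Uses the sample standard deviation (n - 1 in the denominator),
    as is standard for the one-sample t-test.
    """
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    t = (mean(values) - target) / standard_error
    return t, n - 1


# Invented VarcoV values for a hypothetical learner group and an
# invented native target value; neither is taken from the study.
learner_varco_v = [40, 42, 44, 46, 48]
target_varco_v = 50

t, df = one_sample_t(learner_varco_v, target_varco_v)
print(t, df)
```

The resulting t value is then compared with the t distribution for the given degrees of freedom to obtain the p-value; statistical packages provide this step directly.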


Figure 4: Variability of %V (left) and VarcoV (right) for the three learner groups of L2 French.

4.4 Discussion

The descriptive results obtained from the analysis of the L2 English data show that speech rhythm is produced more target-like by the monolingual German than by the monolingual Chinese learners. The data produced by the multilinguals exhibit some inconsistencies, in that some of them perform like the German monolinguals, while others rather pattern with the monolingual Chinese learners. Statistical hypothesis testing confirms these findings only with respect to VarcoV: The monolingual Chinese learners of English differ significantly from L1 English speakers, while the German monolingual learners of English show no significant difference from English native speakers. Hypothesis 1 is thus partially confirmed, i.e. regarding the monolingual groups. As concerns L2 French, we found quite high values for both %V and VarcoV in all the learner data. We interpret this finding as an effect of low proficiency which leads to variable (and usually lower) speech rate and higher occurrences of hesitation phenomena. As for the rhythm metrics, the situation is inverted as compared to that of L2 English: The L1 Chinese speakers generally produce French speech rhythm in a more target-like way than the monolingual German learners of French do. The group of the multilingual speakers, again, displays inconsistent results in overlapping with both monolingual learner groups. The statistical tests show that the German monolingual learners of French differ significantly from L1 French speakers regarding %V, while the Chinese monolingual learners of French do not. Consequently, Hypothesis 2 is partly confirmed with regard to the monolinguals’ performance in the L2. As these results suggest, differences between speaker groups may be captured by different rhythm metrics for different languages (Loukina et al. 2011), in our case VarcoV for English and %V for French.
In order to account for the heterogeneous results obtained from the analysis performed on the speech data produced by the multilingual learners, we consider the extra-linguistic data gathered from the semi-structured interviews,


which focused on the learners’ attitudes towards their languages and on their multilingual/phonological awareness. Apart from the general multilingual language background, we have to take into consideration the fact that German and Chinese may have a different status for each of the speakers (L1, L2 or early L2). In addition, the age of learning of the foreign languages English and French varies from speaker to speaker, and they also differ in their language learning biographies (i.e. L2s other than English and French may have been learned before or are learned simultaneously; see section 4.2). Diversity also shows up with respect to the individual attitudes of the learners towards the different languages, and they also differ from one another regarding their relative degree of multilingual and phonological awareness. When taking a closer look at the results of the three speakers highlighted in Figures 1 and 3 (speakers C08, C14, and C16; see section 4.3) and considering their multilingual and/or phonological awareness, there seems to be a relationship between the learners’ individual production of speech rhythm in the foreign languages and certain extra-linguistic factors. As can be seen in the following, speakers vary with respect to the quantity and quality of their statements referring to, e.g. articulation, cross-linguistic comparisons of sounds or speech melody and/or attitudes towards their languages. Speaker C08 does not obtain a target-like result in his L2 English, especially with respect to VarcoV. Speakers C14 and C16, by contrast, both display relatively target-like values for English. Looking at their L3 French, speakers C08 and C16 perform more target-like than C14, who is set far apart from the target value for French. Considering the language profiles depicted in Figure 5 again, the heterogeneity that characterises the group of multilinguals is highly conspicuous. Learner C08 (male, age: 17) was born in Germany. 
With his parents, he usually speaks 潮州市话 Cháozhōu huà, a variety belonging to the group of 闽 Mĭn dialects (see footnote 11); with his brother, by contrast, he only speaks German. He started learning German in Kindergarten (AoL: 3); Mandarin Chinese was only learned from the age of 11 on, when he entered senior high school. At the same time, he began learning English (AoL: 11); French as a second foreign or third language was added two years later (AoL: 13). Learner C14 (female, age: 15) was also born in Germany. She acquired Mandarin Chinese from birth on and German in Kindergarten (AoL: 3). At home, she speaks German, but she attends 星期日学校 xīngqīrì xuéxiào ‘(Chinese) Sunday school’, where she regularly improves her Chinese language skills. English as the first L2 was learned at the age of nine; French was added two years later (AoL: 11). At the time of recording, she was preparing for an exchange year in the US. In addition to French and English, she also learned Spanish from the age of 13 on. Learner C16 (female, age: 17) has an even more complex language profile. She was born in China,


Figure 5: Learner profiles for speakers C08, C14 and C16. The bottom arrow symbolises the age of learning (see Table 1); each bar represents a language learned by a speaker for the indicated period.


where she acquired Mandarin Chinese as a L1, and thus might have a stronger Chinese language background as compared to speakers C08 and C14. Like C14, she practises her Chinese skills by regularly attending Chinese Sunday school. At the age of five, she moved to Russia with her family, where she started learning Russian as the first stress-timed language she came into contact with. To this day, she occasionally practises Russian with her father. One year after having moved to Russia, she started English lessons at school (AoL: 6). When she was nine, her family came to Germany where she started to learn German. She continued learning English at school and started French at the age of 11; Spanish as a further foreign language was added to her linguistic repertoire two years later (AoL: 13). Looking at the learner profiles given in Figure 5, it might seem surprising that C16 is the speaker with the most target-like results for English because she seems to have the strongest Chinese background and started learning stress-timed languages only late in comparison with the other two speakers. One might conclude that she does not have the best prerequisites for (positively) transferring stress-timed characteristics from her other languages to her L2 English. As for C08 and C14, it might be surprising at first sight that the two learners show very different results since both of them have a comparable language background: They both speak three syllable-timed languages, one even as a L1, but only two stress-timed languages. It might seem more probable that they produce more target-like values in their L3 French than in their L2 English, but this only holds true for C08. Comparing the interviews of our three speakers hints at possible reasons for this:

C08: Ähm, also, meinerseits ist es sehr, sehr toll, wenn ich Asiaten sehe, die meine Sprache auch sprechen können, also Cháozhōu. Das war auch auf der Straße so, ich sehe jemanden, und ähm, das waren Freunde von Freunden, ich begrüße sie und dann hatte ich so gefragt, welche Sprache sie sprechen. Und sie gleich Cháozhōu, und dann wurden wir sofort Freunde, auf einen Schlag. Also, es hat schon Vorteile. Und Mandarin, weil, ähm (. .), in China sprechen ja eigentlich alle, das ist so die Nationalsprache und das wäre schon von Vorteil, wenn ich so als Chinese Mandarin sprechen könnte, das wäre ein bisschen peinlich für mich schon.

‘As far as I am concerned it is great when I meet Asians who speak my language, I mean Cháozhōu. I once met some people in the street, friends of a friend. I greeted them and then I asked them which language they speak. They immediately said Cháozhōu and we were friends at once, at one go. So there are advantages. And Mandarin . . . in China nearly everybody speaks Mandarin, it is the national language and it would be an advantage if I as a Chinese spoke Mandarin. Otherwise it would be embarrassing for me.’

C08: Also, also, zumindest, also, zumeist die Grammatik [des Deutschen], ist, fällt für uns sehr schwer, da es auch Artikel gibt und ähm, im Chinesischen ist die Grammatik sehr einfach, das wird alles, wie z. B. die Vergangenheit, im Deutschen muss man die

Acquiring English and French speech rhythm in a multilingual classroom


Verben konjugieren, im Chinesischen setzt man nur ein le hinten dran und dann ist das Vergangenheit.
‘Mostly the grammar [of German] is difficult for us because there are articles and . . . in Chinese the grammar is very easy. Take the past, for example: in German, verbs have to be conjugated; in Chinese, you add le to the end of the verb and then you have a simple past tense form.’

C08 characterises himself as being very German. His friends call him the Nicht-Chinese (‘non-Chinese’) because he has never been to China. The Chinese language, however, is very important to him: he uses Chinese to make friends and considers the knowledge of different Chinese varieties to be beneficial. French, in contrast, is less important to him; he is even happy to quit his French courses, which he attributes to the lack of opportunities to use the language and to boring French classes. His attitude towards English seems to be neutral: English serves him only for communication purposes. By explaining grammatical differences between German and Chinese, he shows a certain degree of multilingual awareness, but the material contains no evidence of phonological awareness.

Interviewer: Ok. Und siehst du für dich Chinesisch als einen Vorteil an?
C14: [. . .] Das ist einfach nur mehr Arbeit. Ich muss sonntags hierher [in die Sonntagsschule] kommen. Ich muss chinesische Hausaufgaben machen. Ich kann sonntags mich nicht mit Freunden treffen. Manche feiern auch ihren Geburtstag sonntags und da konnte ich auch nicht hin und das ist einfach nur . . .
‘Interviewer: Well. And do you think your Chinese skills are an advantage to you? C14: [. . .] It [i.e. learning Chinese] is just more work. I have to come here [i.e. to Chinese Sunday school] on Sundays. I have to do Chinese homework. I can’t meet friends on Sundays. Some of them celebrate their birthday on a Sunday and I could never go there. That’s just . . .’

C14: Zum Beispiel das r, das rollt sie [die muttersprachliche Mitschülerin im Spanisch-Unterricht] halt schon. Ja [. . .] wie es sich gehört und wir halt nur so r [produziert ein ungerolltes r].
‘For example the r, she [a classmate speaking Spanish as an L1] rolls it, like it has to be. And we produce r like that [demonstrates an unrolled r].’

C14 feels strongly connected to Germany and the German language. Learning Chinese is additional work for her, and instead of going to Chinese Sunday school she would prefer to meet friends. Furthermore, she does not see any advantage in her Chinese language background for the learning of other languages. Spanish, French and English are just three languages she uses for communication, with English considered the most important one. By explaining the differences in producing the r sound in Spanish, she shows only a little phonological and multilingual awareness.


C16: Viele können diese Töne nicht so gut und manche Aussprachen nicht so gut, weil die Zunge, also mir ist aufgefallen, bei Sprache ist Zunge ganz wichtig.
‘Many people produce the tones and some pronunciations not very well because of the tongue. I noticed that the tongue is very important for languages.’

C16: Also wenn ich eine Sprache wie Französisch spreche und da Wörter kommen, die auf anderen Sprachen es auch gibt, denke ich sofort daran, wie man das auf Deutsch, auf Englisch, auf Russisch, auf Spanisch aussprechen könnte. Das heißt, die folgen gleich darauf mit dran und alle verbunden, ja.
‘When I speak a language like French, and there are words which also exist in other languages, I immediately think of how they are pronounced in German, in English, in Russian, in Spanish; that is, they directly follow each other and are all interrelated.’

C16 does not show particular affection for any one language, but seems to be interested in every single language she has at her disposal. Compared to the other speakers, her comments on the importance of the tongue for the production of speech sounds reveal a certain degree of phonological awareness, in particular regarding articulatory phonetics. In addition, she refers to prosodic characteristics such as (lexical) tones that might influence foreign language productions. Regarding her own strategies of foreign language learning, she refers to cross-linguistic connections on the lexical level, thus demonstrating a certain degree of multilingual awareness.

Summing up, the correlation between target-like production of speech rhythm in the L2s and the extra-linguistic factors seems to at least partly explain the inconsistent results obtained from the analysis of the data produced by the multilingual learners: both the learners’ attitudes towards their languages (L2s, German and the heritage language Chinese) and multilingual as well as phonological awareness might favour positive transfer from a syllable- or stress-timed L1 to an L2 exhibiting the relevant rhythmic properties.13

13 In order to further substantiate these findings, we conducted Think-Aloud-Protocols (Osburne 2003) with a subgroup of the multilingual learners. First results obtained from the analysis of these data seem to corroborate the correlation between target-like production and extra-linguistic factors (Gabriel et al. 2015).

5 Rhythmic properties of learner and contact varieties: Comparing non-native English and Asian Englishes

As our results suggest, the native prosodic background can have a (positive or negative) transfer effect on L2 prosody in learner settings. In comparison to the


monolingual German learners of English with a stress-timed background, the monolingual Chinese learners with a syllable-timed background consistently produce more target-like French speech rhythm and less target-like English speech rhythm. In line with their native language, these learner varieties of English exhibit a number of specific syllable-timed properties (i.e. higher values for %V and lower values for VarcoV than the target variety).14

From a global perspective on multilingualism outside classroom settings, the role of English as a worldwide lingua franca is worth a closer look. Cross-linguistic influence in the emergence of Asian Englishes, for instance, has been studied with respect to prosodic properties. It is well established that “[t]here is a syllable-timed English emerging all over the world” (Crystal 1995: 177), so that these varieties share their syllable-timed speech rhythm with the learner groups presented in our empirical study (see section 4.3). Chinese-based Asian Englishes thus constitute an interesting reference point in the context of our study.15

To turn to Singapore first: the majority of its population is ethnically Chinese, and Mandarin Chinese is one of its official languages. Regarding speech rhythm, studies on Singapore English have shown that it differs rhythmically from Standard British English: measuring the durations of successive vowels in read sentences (PVI), Low and Grabe (1995) and Low, Grabe and Nolan (2000) found that Singapore English has a lower vocalic variability than British English and that the quality of reduced vowels in Singapore English differs from that in British English. As a consequence, Singapore English is said to be more syllable-timed than British English on a cross-varietal scale. Deterding’s (1994, 2001) studies on Singapore English confirm these results with respect to natural, conversational speech. Comparing the syllable durations produced by speakers of Singapore English and British English, he showed that their variability in Singapore English is lower than in British English. Apart from the qualitative differences of reduced vowels reported by Low, Grabe and Nolan (2000), their infrequent occurrence and relatively longer durations contribute to the effect of the syllable-timedness of Singapore English (Deterding 2001).

Jian’s (2004) findings on Taiwan English correspond to the results obtained for Singapore English: in her study, Taiwan English is compared to American English, which has a %V value that is comparable to British English, while VarcoV

14 Meng et al. (2010) further identify incorrect stress placement and realisation of unstressed syllables as typical prosodic domains of negative transfer by Asian learners of English.
15 The postcolonial varieties of French, which are still spoken in some Asian countries such as Vietnam, Laos and Cambodia, are not considered here, since the use of French in Asia has been decreasing for years, while English has steadily gained importance.


is somewhat lower (Mairano and Romano 2010). Jian uses Low, Grabe and Nolan’s (2000) metrics and similar materials and finds the Taiwan English PVI to be lower than that of American English. Also, the proportion of vocalic material (%V) is higher for Taiwan English than for American English.

Likewise, in her investigation of Hong Kong English, Setter (2003, 2006) examines formal speech produced by L1 Cantonese speakers of English. Cantonese displays values for both %V and VarcoV that are quite similar to those of Mandarin Chinese (Mairano and Romano 2010). Setter’s findings corroborate the general picture of Asian Englishes: among other results, Hong Kong English features less variability in syllable duration and overall longer syllable durations than British English.

Summing up, the examples of Singapore English, Taiwan English and Hong Kong English all show a tendency towards a more syllable-timed speech rhythm as compared to either British or American English. The studies cited here suggest that these results can be explained by less variability of successive V intervals (or entire syllables, as in the study by Deterding 2001), a higher proportion of vocalic material, longer vowel durations and the different nature of reduced vowels in contrast to the varieties of British and American English. The results of our study show a comparable tendency for syllable-timed English in the data produced by the Chinese learners and consequently suggest a clear link between L2 speech and new varieties of English.16 Bearing in mind this similarity, Chinese-based Asian Englishes can plausibly be characterised as contact varieties that have developed out of L2 speech and that have preserved durational properties which are typical of the speech produced by learners with a rhythmically distinct L1 such as Mandarin Chinese.17
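The pairwise variability measure referred to in this section can be illustrated with a minimal sketch. It assumes the commonly cited definitions of the raw and the normalised PVI: the mean absolute difference between successive interval durations, with the normalised version dividing each difference by the mean of the pair and scaling by 100. The function names and the duration values are hypothetical and are not taken from any of the studies cited.

```python
def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are needed")
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def normalised_pvi(durations):
    """Normalised PVI: each difference is divided by the pair mean, then x100."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are needed")
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Hypothetical vowel durations in milliseconds (illustration only).
even_ms = [90, 100, 95, 105]         # near-equal successive vowels
alternating_ms = [150, 60, 140, 55]  # long/short alternation

print("even:", raw_pvi(even_ms), normalised_pvi(even_ms))
print("alternating:", raw_pvi(alternating_ms), normalised_pvi(alternating_ms))
```

Under these assumed definitions, a sequence of near-equal vowel durations yields a low index, in line with what is reported above for the more syllable-timed varieties, whereas an alternation of long and short vowels yields a higher one.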

16 However, further research is needed in order to determine if exactly the same prosodic factors are responsible for these similarities, e.g. with respect to vowel reduction in L2 English.
17 Regarding the similarities of English as an L2 and new English varieties, also see the studies of Benet et al. (2012) and Gabriel and Kireva (2014), who showed that the Italian-based contact variety of Porteño Spanish as spoken in present-day Argentina rhythmically patterns with the L2 Spanish produced by Italian natives who currently learn Spanish as a foreign language in Madrid.

6 Conclusion and outlook

Based on the results obtained from our analysis of L2 English and L3 French data produced by different groups of monolingual and multilingual learners, we conclude that there are both linguistic and extra-linguistic factors that constrain cross-linguistic influence with regard to speech rhythm. Depending on the


interplay of these factors, the multilingual learners with Chinese as a heritage language can have an advantage over the German monolinguals in learning French and over the Chinese monolinguals in learning English in that the speech rhythm of their L1s may be positively transferred to the L2s. The connections between the extra-linguistic data taken from the questionnaire and the semi-structured interviews conducted with the learners and their (more or less target-like) production of speech rhythm in the L2/L3 suggest that positive transfer might be favoured by the learners’ language attitudes as well as by their individual degree of multilingual and phonological awareness. This indicates that a typologically distant language such as Chinese as part of a complex linguistic background does not constitute a disadvantage for the learning of further languages. On the contrary, having syllable-timed Chinese as an L1 along with (stress-timed) German in their linguistic repertoire may rather be seen as an advantage for these learners – provided that the relevant properties of their L1s get “activated” and may thus serve as a basis for positive transfer of durational features.

We cautiously interpret these findings as evidence that phonological and multilingual awareness need to be promoted in secondary education, among both learners and teachers. Bearing in mind possible parallels between English as an L2 and as a postcolonial variety, as outlined in section 5, the reality of the multilingual classroom offers a good opportunity to teach prosodic differences between languages, including the various World Englishes, and to train phonological awareness, especially since certain properties linked to speech rhythm, e.g. syllabic durations, are claimed to be “highly learnable and teachable” (Setter 2006: 767).

References

Abercrombie, David. 1967. Elements of general phonetics. Edinburgh: Edinburgh University Press.
Arvaniti, Amalia. 2012. The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics 40. 351–373.
Auer, Peter. 2001. Silben- und akzentzählende Sprachen. In Martin Haspelmath, Ekkehart König, Wulf Oesterreicher & Wolfgang Raible (eds.), Language typology and language universals: An international handbook, 1391–1399. Berlin: Mouton De Gruyter.
Auer, Peter & Susanne Uhmann. 1988. Silben- und akzentzählende Sprachen: Literaturüberblick und Diskussion. Zeitschrift für Sprachwissenschaft 7 (2). 214–259.
Benet, Ariadna, Christoph Gabriel, Elena Kireva & Andrea Pešková. 2012. Prosodic transfer from Italian to Spanish: Rhythmic Properties of L2 Speech and Argentinean Porteño. In Qiuwu Ma, Hongwei Ding & Daniel Hirst (eds.), Proceedings of Speech Prosody 2012: 6th International Conference, Shanghai, China, May 22–25, 438–441. Shanghai: Tongji University Press.
Boersma, Paul & David Weenink. 2011. Praat: Doing phonetics by computer. Version 5.3 [Computer software]. http://www.praat.org (accessed 13 April 2011).
Boula de Mareüil, Philippe & Bianca Vieru-Dimulescu. 2006. The contribution of prosody to the perception of foreign accent. Phonetica 63 (4). 247–267.
Chen, Hsueh-Chu. 2012. Second language timing patterns and their effects on native listeners’ perceptions. Concentric: Studies in Linguistics 36 (2). 183–212.
Crystal, David. 1995. Documenting rhythmical change. In Jack Windsor Lewis (ed.), Studies in general and English phonetics, 174–179. London: Routledge.
Dasher, Richard & Dwight Bolinger. 1982. On pre-accentual lengthening. Journal of the International Phonetic Association 12 (2). 58–71.
Dauer, Rebecca M. 1987. Phonetic and phonological components of language rhythm. In Tamaz V. Gamkrelidze (ed.), Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn, Estonia, August 1–7, 447–450. Tallinn: Academy of Sciences of the Estonian SSR.
Dellwo, Volker. 2006. Rhythm and speech rate: A variation coefficient for delta C. In Pawel Karnowski & Imre Szigeti (eds.), Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba, 2003, 231–241. Frankfurt: Peter Lang.
Dellwo, Volker & Petra Wagner. 2003. Relations between language rhythm and speech rate. In Maria-Josep Solé, Daniel Recasens & Joan Romero (eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3–9 August, 470–474. Barcelona: Universitat Autònoma de Barcelona.
Dellwo, Volker, Adrian Leemann & Marie-José Kolly. 2012. Speaker idiosyncratic rhythmic features in the speech signal. Electronic Proceedings of Interspeech 2012, Portland, OR, USA.
Deterding, David. 1994. The rhythm of Singapore English. In Roberto Togneri (ed.), Proceedings of the Fifth Australian International Conference on Speech Science and Technology, 316–321. Canberra: Australian Speech Science and Technology Association.
Deterding, David. 2001. The measurement of rhythm: A comparison of Singapore and British English. Journal of Phonetics 29 (2). 217–230.
Duanmu, San. 2007. The phonology of Standard Chinese, 2nd edn. Oxford: Oxford University Press.
Elsner, Daniela. 2007. Hörverstehen im Englischunterricht der Grundschule: Ein Leistungsvergleich zwischen Kindern mit Deutsch als Muttersprache und Deutsch als Zweitsprache. Frankfurt: Peter Lang.
Fagyal, Zsuzsanna. 2010. Accents de banlieue: Aspects prosodiques du français populaire en contact avec les langues de l’immigration. Paris: L’Harmattan.
Fagyal, Zsuzsanna, Douglas Kibbee & Fred Jenkins. 2006. French: A linguistic introduction. Cambridge: Cambridge University Press.
Gabriel, Christoph, Adelheid Hu, Lan Diao & Jeanette Thulke. 2012. Transfer, phonological awareness und Mehrsprachigkeitsbewusstsein: Zum Erwerb des französischen Sprachrhythmus durch Schüler/innen mit chinesischem Sprachhintergrund im deutschen Schulkontext. Bericht aus einem laufenden Forschungsprojekt. Zeitschrift für Fremdsprachenforschung 23. 53–76.
Gabriel, Christoph & Elena Kireva. 2014. Prosodic transfer in learner and contact varieties: Speech rhythm and intonation of Buenos Aires Spanish and L2 Castilian Spanish produced by Italian native speakers. Studies in Second Language Acquisition 36. 257–281.
Gabriel, Christoph, Johanna Stahnke, Jeanette Thulke & Sevda Topal. 2015. Positiver Transfer aus der Herkunftssprache? Zum Erwerb des französischen und englischen Sprachrhythmus durch mehrsprachige deutsch-chinesische und deutsch-türkische Lerner. In Johannes Müller-Lancé, Eva Maria Fernández Ammann & Amina Kropp (eds.), Herkunftsbedingte Mehrsprachigkeit im Unterricht der romanischen Sprachen in Schule und Universität: Herausforderung und Chance für die romanistische Sprachwissenschaft?, 5–26. Berlin: Frank & Timme.
Grabe, Esther & Ee Ling Low. 2002. Durational variability in speech and the rhythm class hypothesis. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory Phonology 7, 515–546. Berlin: Mouton De Gruyter.
Gussenhoven, Carlos. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press.
Gut, Ulrike. 2009. Introduction to English phonetics and phonology. Frankfurt: Peter Lang.
Gut, Ulrike. 2012. Rhythm in L2 speech. Speech and language technology (Technologia Mowy i Języka) 14/15. 83–94.
Hu, Adelheid. 2011. Migrationsbedingte Mehrsprachigkeit und schulischer Fremdsprachenunterricht: Forschung, Sprachenpolitik, Lehrerbildung. In Hannelore Faulstich-Wieland (ed.), Umgang mit Heterogenität und Differenz, 121–140. Baltmannsweiler: Schneider Hohengehren.
IPA (International Phonetic Association). 1999. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press.
Jian, Hua-Li. 2004. On the syllable timing in Taiwan English. In Keikichi Hirose (ed.), Proceedings of Speech Prosody 2004, Nara Japan, 247–250. http://sprosig.isle.illinois.edu/sp2004/PDF/Jian.pdf (accessed 14 November 2013).
Jouvet, Laurent. 2006. Les petites histoires d’Amandine. Stuttgart: Klett.
Jun, Sun-Ah & Cécile Fougeron. 2000. A phonological model of French intonation. In Antonis Botinis (ed.), Intonation: Analysis, modelling and technology, 209–242. Dordrecht: Kluwer.
Kehoe, Margaret, Conxita Lleó & Martin Rakow. 2011. Speech rhythm in the pronunciation of German and Spanish monolingual and German-Spanish bilingual 3-year-olds. Linguistische Berichte 227. 323–352.
Kinoshita, Naoka & Chris Sheppard. 2011. Validating acoustic measures of speech rhythm for second language acquisition. In Wai-Sum Lee & Eric Zee (eds.), Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, 1086–1089. Hong Kong: City University of Hong Kong.
Kurpaska, Maria. 2010. Chinese language(s): A look through the prism of The Great dictionary of modern Chinese dialects. Berlin: Mouton de Gruyter.
Kvale, Steinar. 2007. Doing Interviews. London & Los Angeles: Sage.
Li, Aouju & Brechtje Post. 2012. L2 rhythm development by Mandarin Chinese learners of English. Poster presented at Perspectives on Rhythm and Timing (PoRT), University of Glasgow, July 20.
Liberman, Mark & Alan S. Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8. 249–336.
Lin, Hua & Qian Wang. 2007. Mandarin rhythm: An acoustic study. Journal of Chinese Language and Computing 17 (3). 127–140.
Lin, Yen-Hwei. 2007. The sounds of Chinese. Cambridge: Cambridge University Press.
Loukina, Anastassia, Greg Kochanski, Burton Rosner, Elinor Keane & Chilin Shih. 2011. Rhythm measures and dimensions of durational variation in speech. Journal of the Acoustical Society of America 129. 3258–3270.
Low, Ee Ling & Esther Grabe. 1995. Prosodic patterns in Singapore English. In Kjell Elenius & Peter Branderud (eds.), Proceedings of the XIIIth International Congress of Phonetic Sciences, Stockholm, Sweden, 636–639. Stockholm: Kungliga Tekniska Högskolan (Royal Institute of Technology).
Low, Ee Ling, Esther Grabe & Francis Nolan. 2000. Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43. 377–401.
Mairano, Paolo & Antonio Romano. 2010. Un confronto tra diverse metriche ritmiche usando Correlatore. In Stephan Schmid, Michael Schwarzenbach & Dieter Studer (eds.), La dimensione temporale del parlato: Proceedings of the V National AISV Congress, 79–100. Torriana: EDK.
Martinez, Hélène & Marcus Reinfried (eds.). 2006. Mehrsprachigkeitsdidaktik gestern, heute und morgen. Tübingen: Narr.
Mehler, Jacques, Emmanuel Dupoux, Thierry Nazzi & Ghislaine Dehaene-Lambertz. 1996. Coping with linguistic diversity: The infant’s viewpoint. In James L. Morgan & Katherine Demuth (eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition, 101–116. Mahwah, NJ: Erlbaum.
Mehlhorn, Grit. 2008. Russisch nach Englisch, Polnisch nach Russisch. Überlegungen zu einer Mehrsprachigkeitsdidaktik der slavischen Sprachen aus phonetischer Sicht. In Ljudmila Geist & Grit Mehlhorn (eds.), XIV. JungslavistInnentreffen in Stuttgart, 117–145. München: Kubon & Sagner.
Meißner, Franz-Joseph & Marcus Reinfried. 1998. Mehrsprachigkeitsdidaktik. Konzepte, Analysen, Lehrerfahrungen mit romanischen Fremdsprachen. Tübingen: Narr.
Meng, Helen, Chiu-yu Tseng, Mariko Kondo, Alissa Harrison & Tanya Viscelgia. 2010. Studying L2 suprasegmental features in Asian Englishes: A position paper. In International Speech Communication Association (ed.), 10th Annual Conference of the International Speech Communication Association, Brighton, UK, 6–10 September, Vol. 3, 1683–1686. Red Hook, NY: Curran.
Mok, Peggy Pik Ki & Volker Dellwo. 2008. Comparing native and non-native speech rhythm using acoustic rhythmic measures: Cantonese, Beijing Mandarin and English. In Plinio Barbosa, Sandra Madureira & César Reis (eds.), Proceedings of the 4th Conference on Speech Prosody, 423–426. Campinas, Brazil: Editoria RG/CNPq. http://sprosig.isle.illinois.edu/sp2008/papers/id063.pdf (accessed 14 November 2013).
Odlin, Terence. 1989. Language transfer: Cross-linguistic influence in language learning. Cambridge: Cambridge University Press.
Odlin, Terence. 2003. Cross-linguistic influence. In Catherine J. Doughty & Michael H. Long (eds.), The handbook of second language acquisition, 436–486. London: Blackwell.
Ordin, Mikhail, Leona Polyanskaya & Christiane Ulbrich. 2011. Acquisition of timing patterns in second language. In Piero Cos, Renato de Mori, Giuseppe di Fabbrizio & Roberto Pieraccini (eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association: Interspeech 2011, Florence, Italy, August 27–31. http://www.isca-speech.org/archive/interspeech_2011/i11_1129.html (accessed 14 November 2013).
Osburne, Andrea G. 2003. Pronunciation strategies of advanced ESOL learners. International Review of Applied Linguistics in Language Teaching 41 (2). 131–141.
Pike, Kenneth L. 1945. The intonation of American English. Ann Arbor, MI: University of Michigan Press.
Pulzován de Egger, Silvia. 2002. Fremdsprache und Rhythmus. Eine Untersuchung zum Sprachrhythmus in Deutsch und Spanisch als Fremdsprache. Marburg: Tectum.
Ramsey, S. Robert. 1987. The languages of China. Princeton: Princeton University Press.
Ramus, Franck, Marina Nespor & Jacques Mehler. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73 (3). 265–292.
Rauch, Dominique P., Astrid Jurecka & Hermann-Günter Hesse. 2010. Für den Drittspracherwerb zählt auch die Lesekompetenz in der Herkunftssprache: Untersuchung der Türkisch-, Deutsch- und Englisch-Lesekompetenz bei Deutsch-Türkisch bilingualen Schülern. In Cristina Allemann-Ghionda, Petra Stanat, Kerstin Göbel & Charlotte Röhner (eds.), Migration, Identität, Sprache und Bildungserfolg (Zeitschrift für Pädagogik, Beiheft 55), 78–100. Weinheim: Beltz.
Roach, Peter. 1982. On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In David Crystal (ed.), Linguistic controversies, 73–79. London: Arnold.
Schmidt, Claudia. 2010. Sprachbewusstheit und Sprachlernbewusstheit. In Hans-Jürgen Krumm, Christian Fandrych, Britta Hufeisen & Claudia Riemer (eds.), Deutsch als Fremd- und Zweitsprache: Ein internationales Handbuch, 858–866. Berlin: Mouton de Gruyter.
Setter, Jane. 2003. A comparison of speech rhythm in British and Hong Kong English. In Maria-Josep Solé, Daniel Recasens & Joan Romero (eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3–9 August, 467–470. Barcelona: Universitat Autònoma de Barcelona.
Setter, Jane. 2006. Speech rhythm in World Englishes: The case of Hong Kong. TESOL Quarterly 40 (4). 763–782.
White, Laurence & Sven L. Mattys. 2007. Calibrating rhythm: First language and second language studies. Journal of Phonetics 35. 501–522.
White, Laurence, Sven L. Mattys & Lukas Wiget. 2012. Language categorization by adults is based on sensitivity to durational cues, not rhythm class. Journal of Memory and Language 66. 665–679.
Wiese, Richard. 1996. The phonology of German. Oxford: Oxford University Press.
Wiget, Lukas, Laurence White, Barbara Schuppler, Izabelle Grenon, Olesya Rauch & Sven L. Mattys. 2010. How stable are acoustic metrics of contrastive speech rhythm? Journal of the Acoustical Society of America 127 (3). 1559–1569.
Yoon, Tae-Jin. 2010. Capturing inter-speaker invariance using statistical measures of speech rhythm. Proceedings of Speech Prosody 5, Chicago, IL, USA, May 10–14. http://speechprosody2010.illinois.edu/papers/100201.pdf (accessed 11 April 2014).


Appendix: Materials recorded

Mandarin Chinese: The North Wind and the Sun

北风与太阳
有一次,北风和太阳正在争论谁比较有本事。他们正好看到有个人走过,那个人穿着一件斗篷。他们就说了,谁可以让那个人脱掉那件斗篷,就算谁比较有本事。于是,北风就拼命地吹。怎知,他吹得越厉害,那个人就越是用斗篷包紧自己。最后,北风没办法,只好放弃。接着,太阳出来晒了一下,那个人就立刻把斗篷脱掉了。于是,北风只好认输啦。

Běi fēng hé tàiyáng
Yǒu yí cì, běi fēng hé tàiyáng zhèng zài zhēng lùn shuí bǐ jiào yǒu běn shì. Tāmen zhèng hǎo kàn dào yǒu gè rén zǒu guò, nà gè rén chuān zhe yī jiàn dǒu péng. Tāmen jiù shuō le, shuí kě yǐ ràng nà gè rén tuō diào nà jiàn dǒu péng, jiù suàn shuí bǐ jiào yǒu běn shì. Yú shì, běi fēng jiù pīn mìng de chuī. Zěn zhī, tā chuī de yuè lì hài, nà gè rén jiù yuè shì yòng dǒu péng bāo jǐn zì jǐ. Zuì hòu, běi fēng méi bàn fǎ, zhǐ hǎo fàng qì. Jiē zhe, tài yáng chū lái shài le yī xià, nà gè rén jiù lì kè bǎ dǒu péng tuō diào le. Yú shì, běi fēng zhǐ hǎo rèn shū la.

French: Short story from textbook (Jouvet 2006: 7, slightly adapted)

Amandine fait du sport
Les chats n’aiment pas faire du sport, mais le chat des Carbonne aime ça. Le chat s’appelle Amandine. Elle fait souvent du sport le dimanche soir. Elle ne fait pas de la natation parce que les chats n’aiment pas l’eau. Elle ne fait pas du foot avec Alain, et elle ne fait pas du jogging avec Olivier. Mais elle fait du sport le dimanche soir. La famille Carbonne est devant la télé le dimanche soir, et ils ne jouent pas avec Amandine. Elle n’aime pas ça ! Alors elle grimpe sur la télé. Alors maman va à la cuisine et Amandine va aussi à la cuisine. Après, Amandine grimpe sur les genoux de papa et de maman, puis elle grimpe encore sur la télé, puis sur les étagères et sur la table. Et comme ça, Amandine fait du sport le dimanche soir.

Acknowledgements

The present study was partly funded by the Free and Hanseatic City of Hamburg within the scope of the cluster of excellence (federal level) “Linguistic Diversity Management in Urban Areas” (LiMA). We are grateful for this financial support. Moreover, we are deeply indebted to Adelheid Hu (University of Luxembourg), who co-directed the project at an earlier stage, for fruitful discussions on methodological issues. Further thanks go to Lan Diao (University of Hamburg), especially for her substantial help with the data collection in Beijing (May 2012), and to Vasyl Druchkiv (University Medical Center Hamburg-Eppendorf, Germany) for his help with the statistical analyses. Last but not least, we would like to thank our student assistants Annette Armbrust, Rebekka Constantin, Birte Dorau, Pauline Gaillot, Jonas Grünke, Hongguang Liu, and Duygu Murathanoğlu (all University of Hamburg) for their help with transcribing and segmenting the materials, as well as Dirk Kessing (University of Hamburg) for his help with the cross-checking of references.

Robert Fuchs and Eva-Maria Wunder

9 A sonority-based account of speech rhythm in Chinese learners of English1

1 Introduction

Speech rhythm appears to be a rather difficult feature to master when learning a second or foreign language (e.g. Adams 1979; Bond and Fokes 1985; Faber 1986; Wennerstrom 2001). As various studies have confirmed (e.g. Kaltenbacher 1998; Moyer 1999; Van Els and de Bot 1987; Anderson-Hsieh, Johnson, and Koehler 1992), an inappropriate rendering of the speech rhythm of a language is one of the main reasons learner speech is perceived as accented. To date, the most frequently adduced reason for rhythmic differences between L1 speakers and learners is influence deriving from structural differences between the L1 and the non-native target language (e.g. Adams 1979; Wenk 1985; Kaltenbacher 1998; Gut 2003a, 2003b; Lee, Guion, and Harada 2006). Further potential sources of influence on non-native speech rhythm, such as the phonological structures of a previously acquired non-native language, have not been thoroughly explored yet. Another aspect that needs more attention is that second and foreign language users may differ with respect to the target norms they pursue (Gut 2007; Schneider 2003, 2007), so that differences between native and non-native speech rhythm might be due to differences in norm orientation and/or cross-linguistic influence.

Research trying to resolve these questions also needs to take into account that the definition and analysis of speech rhythm have evolved over the last decades. All languages were traditionally thought to belong to discrete rhythm classes: for example, British English (BrE) and German were thought to be stress-timed, while Spanish and French were thought to be syllable-timed. Following a more recent approach to speech rhythm, languages could also be differentiated according to phonetic aspects, i.e. length, pitch and quality, and phonological aspects, i.e. syllable structure and the function of accent (Dauer 1987). However, many researchers have come to the conclusion that speech rhythm should

1 The authors would like to thank the speakers for taking part in this study as well as Ulrike Gut and the reviewers for their helpful comments on an earlier version of this paper.

Robert Fuchs and Eva-Maria Wunder, Westfälische Wilhelms-Universität Münster


rather be considered as a gradable property, where degrees of stress- and syllable-timing can be distinguished.

The traditional account classifies the world’s languages as either stress-timed or syllable-timed² (e.g. Pike 1945; Abercrombie 1967). This distinction of two main classes of speech rhythm was based on the assumption that in prototypically stress-timed languages, such as BrE, Arabic or German, prominent syllables seem to occur at quasi-isochronous (i.e. regular) intervals: each foot, comprising a stress beat plus all subsequent unstressed syllables up to the next stress beat, is allegedly always of the same length. This means that a varying number of syllables can be contained in one such isochronous foot. Consequently, stress-timed languages were thought to show great durational variability of syllables. In prototypical syllable-timed languages, such as Spanish, Romanian or Mandarin Chinese, on the other hand, it is the duration of the syllable, not of the foot, that is supposedly equal; this means that the syllables are isochronous and stress beats occur only irregularly (e.g. Auer 2001; Rossi 1998). Thus, syllable-timed languages exhibit rather little or no variability of syllable durations.

To account for the differences believed to exist between stress- and syllable-timed languages, several rhythm metrics have been proposed, all of which aim to capture acoustic correlates of speech rhythm. Many of these metrics rely on syllabic, consonantal or vocalic measures: Ramus, Nespor, and Mehler’s (1999) %V, ΔV and ΔC, based on vocalic and consonantal intervals; Dellwo’s (2006) modification of these, VarcoC and VarcoV; Low and Grabe’s (1995) PVI (or raw Pairwise Variability Index, rPVI); Gibbon and Gut’s (2001) Rhythm Ratio; or Deterding’s (2001) syllable-based Variability Index. Many of these rhythm metrics have been used to account for the rhythm of postcolonial varieties of English and learner varieties. However, most of them have attracted criticism, for example because they rely on time-consuming and error-prone manual annotation of syllables and vocalic or consonantal intervals. This makes it hard to compare results across studies.

For the present paper, a different method, based on sonority measurements, will be used to determine rhythmic properties (see section 3 below for details). Rather than drawing on the above-mentioned differences in durational variability of syllables, this method (Galves et al. 2002) is based on acoustic measures of sonority and its variability. Its metrics can be calculated in an entirely automatic process, thus avoiding the comparability issues and the time-consuming annotation that many widely used rhythm metrics rely on.

2 A further category, mora-timed speech rhythm, as exhibited by Japanese or Tamil, was later established (e.g. Ramus, Nespor, and Mehler 1999: 266). None of the languages investigated in the present paper is mora-timed.
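To make the duration-based metrics named above more concrete, the following minimal Python sketch computes %V, ΔV/ΔC, VarcoV/VarcoC, the raw PVI and the normalised PVI (the nPVI used in several studies discussed in section 2) from lists of interval durations. The function names and the use of the population standard deviation are assumptions made for illustration; published implementations may differ in such details, and the durations themselves would still have to come from manual or automatic segmentation.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: vocalic share of the total utterance duration, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta(intervals):
    """Delta-V or Delta-C: standard deviation of the interval durations."""
    return pstdev(intervals)


def varco(intervals):
    """VarcoV or VarcoC: standard deviation divided by the mean, times 100."""
    return 100.0 * pstdev(intervals) / mean(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = zip(intervals, intervals[1:])
    return mean([abs(a - b) for a, b in pairs])


def npvi(intervals):
    """Normalised PVI: each difference is divided by the mean of its pair."""
    pairs = zip(intervals, intervals[1:])
    return 100.0 * mean([abs(a - b) / ((a + b) / 2) for a, b in pairs])


# Hypothetical vocalic durations in milliseconds (not data from this study).
even = [100, 100, 100, 100]        # no durational variability
alternating = [50, 150, 50, 150]   # strong long/short alternation
print(npvi(even), npvi(alternating))
print(varco(even), varco(alternating))
```

In this toy example the evenly timed sequence yields zero on both nPVI and Varco, whereas the alternating sequence yields high values, which is the pattern the traditional account associates with syllable-timing and stress-timing respectively.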


The present paper is the first to apply Galves et al.’s (2002) method of automated sonority measurements, and consequently the determination of speech rhythm, to learner language. One of the most interesting aspects in connection with speech rhythm is that it has been found to be transferable in language learning (e.g. Adams 1979; Flege and Bohn 1989; Gut 2009; Lee, Guion, and Harada 2006). Thus, native speakers of a syllable-timed language learning a stress-timed language, for instance, could exhibit L1 influence on their non-native productions, so that their L2 rhythm leans towards syllable-timing. The speech rhythm of learner varieties often becomes a mixture of the L1 and the target language, with durational variability somewhere in between the two extremes, especially when the two differ to a greater degree with regard to rhythm (cf. White and Mattys 2007). In the present investigation, we want to find further evidence for these intermediate values for learner speech by testing native speakers of Mandarin Chinese, a syllable-timed first language (L1), who are learning English as a foreign language (with a stress-timed target rhythm).

Furthermore, Galves et al.’s (2002) method could also be useful regarding the ongoing discussion about a phonological basis for a distinction between learner English, i.e. English as a foreign language (EFL), and English as a second language (ESL) as it is spoken in postcolonial contexts. We would like to explore whether differences in norm orientation between these groups might influence the rhythm of their speech. While EFL speakers strive to emulate the norms of native and established varieties of English (usually stress-timed British or American English), ESL speakers in postcolonial contexts usually adhere to local, emerging norms (Schneider 2007). We expect that Mandarin-accented EFL has a speech rhythm between the stress-timed rhythm of English as a native language (ENL) and L1 Mandarin syllable-timed rhythm. This is in contrast to Mandarin-based postcolonial varieties, which have been shown to have a more syllable-timed rhythm (Deterding 1994, 2001; Low, Grabe, and Nolan 2000).

We will therefore apply Galves et al.’s (2002) sonority-based rhythm metrics to data from L1 Mandarin, ENL and Mandarin EFL advanced learners’ speech. If advanced Mandarin-accented EFL is relatively similar to ENL in rhythm as measured by sonority-based metrics, then this would support the hypothesis that the learners have successfully acquired the stress-timed ENL rhythm. If, on the other hand, there are substantial differences between these groups, this would suggest that some acoustic correlates of rhythm might be easier for learners to acquire than others.

The remainder of this paper is structured as follows: Section 2 discusses previous results on speech rhythm in learner language. Section 3 presents the sonority-based rhythm metrics used in this study, and section 4 explains how we applied them to our data. Section 5 presents the results, and section 6 discusses the implications of our findings.


2 Speech rhythm in learner language

A number of studies have previously explored non-native speech rhythm, mostly with a focus on English as the target language (e.g. Adams 1979; Flege and Bohn 1989; Lee, Guion, and Harada 2006). Adams (1979), for instance, identified variables that contribute to a non-native-sounding rhythm in the target-language English of learners from various L1 backgrounds, such as a lack of durational differences between stressed and unstressed syllables (cf. Gut 2009: 172; Kaltenbacher 1998: 28). Most studies investigating non-native speech rhythm in English relied on vowel reduction as an acoustic correlate of speech rhythm (e.g. Bond and Fokes 1985; Mairs 1989; Wenk 1985; Zborowska 2000); the common result is that learners of English do not produce enough vowel reduction compared to native speakers. Other studies used the vocalic rhythm metrics VarcoV and nPVI-V to investigate the variability of vocalic durations. In cases where the L1 and target language differed substantially in rhythm, these studies often found a mixed rhythmic pattern in the learner variety (White and Mattys 2007; Grenon and White 2008; Jang 2008; Dellwo, Gutierrez Diez, and Gavalda 2009; Sarmah, Gogoi, and Wiltshire 2009; Ordin, Polyanskaya, and Ulbrich 2011; Tsiartsioni 2011).

The speech rhythm of the two languages investigated in the present paper, Mandarin Chinese and English, is sufficiently different to allow us to pin down any potential L1 influence on the L2. In Mandarin Chinese, a language commonly classified as syllable-timed, most syllables have roughly similar durations, compared to stress-timed languages (e.g. Rossi 1998; Chiao and Kelz 1985: 30; Hunold 2009: 85; Mok 2009). BrE, by comparison, has a tendency for roughly equally long intervals between two stressed beats, regardless of how many unstressed syllables intervene (e.g. Gut 2003b: 140). Thus, sometimes more, sometimes fewer syllables are produced during one stress interval, which results in durational variability of these syllables.

Studies on the rhythm of Mandarin-accented EFL speech have so far failed to provide unequivocal evidence of such a mixed rhythmic pattern. Lin and Wang (2008) found Mandarin-accented EFL speakers to use as much vocalic variability (nPVI-V) as ENL speakers, which was shown to be higher than in L1 Mandarin. If the Mandarin-accented EFL speakers had used a truly mixed rhythm, their vocalic variability would have been halfway between that of L1 Mandarin and ENL. Regarding the proportion of vocalic durations over the whole utterance duration (%V), there was more evidence of such a mixed pattern. %V was found to be highest in L1 Mandarin, followed by Mandarin-accented EFL, and finally ENL with the lowest value. This is evidence for the L1 Mandarin


speakers using a more syllable-timed rhythm, the ENL speakers a more stress-timed rhythm, and the Mandarin-accented EFL speakers using a mixed rhythm, but tests of the statistical significance of the differences were not reported.

A second study by He (2010) also found L1 Mandarin speakers to exhibit lower variability of vocalic durations (as measured by VarcoV and nPVI-V) than ENL speakers. Advanced Mandarin-accented EFL learners in this study spoke with intermediate variability of vocalic durations. This is commensurate with a description of ENL as stress-timed, L1 Mandarin as syllable-timed, and Mandarin-accented EFL as having a mixed rhythm. But the Mandarin-accented EFL speakers were in fact relatively close to the ENL speakers, so that the difference between ENL and EFL was not significant (VarcoV) or only marginally significant (nPVI-V). The same constellation was found for the proportion of vocalic durations over the whole utterance duration: %V was significantly higher for L1 Mandarin than for Mandarin-accented EFL, and ENL had lower %V than Mandarin-accented EFL, although this difference was not significant.³ Studies of Mandarin-accented EFL speech have thus failed to provide unequivocal evidence for its alleged mixed rhythm.

3 Another study on the speech rhythm of Mandarin-accented EFL (Chen and Zechner 2011) investigated how far rhythm measurements of the speech of learners of English with L1 Mandarin can account for accent ratings, but did not provide actual rhythm measurements.

By contrast, studies of an ESL variety with Mandarin substrate, Singapore English (SinE), have provided such evidence. Low, Grabe and Nolan (2000) used measurements of the variability of vocalic durations (nPVI-V) to show that SinE speakers vary vocalic durations significantly less than BrE speakers, which indicates a more syllable-timed rhythm. Similar conclusions were reached by Deterding (1994, 2001), who used a measure of the variability of syllable durations, the Variability Index. The SinE speakers in these studies varied the durations of syllables significantly less than the BrE speakers, which Deterding interpreted as pointing towards a more syllable-timed rhythm in SinE.

The results of these studies suggest that Mandarin-accented EFL, a learner variety, might be relatively close in rhythm to ENL, which is perhaps testament to the fact that relatively advanced learner speech was examined. The learners seemed to be quite successful in imitating or acquiring the rhythm of the target language. SinE (an ESL variety), by contrast, was found to differ in rhythm from ENL. The speakers in the SinE studies also had Mandarin as their L1, but, unlike the Mandarin-accented EFL speakers, they probably did not strive to imitate ENL rhythm. Schneider (2003, 2007) argued that SinE has entered an “endonormative” phase in its development, where Singaporeans have stopped looking to other countries, such as Great Britain, to provide language standards. Instead, they


rely on their own norms, and a more syllable-timed rhythm might be a part of these norms.

It appears then that one of the differences between EFL and ESL is that proficient EFL learners of English with a syllable-timed L1 strive to acquire a native-like stress-timed rhythm, and are often successful at that. Proficient speakers of ESL varieties with a syllable-timed L1, by contrast, do not aim for a stress-timed rhythm, and a more syllable-timed rhythm may be part of the emerging standard of many of these postcolonial varieties. The fact that English speakers in a postcolonial setting tend to orient themselves towards the stress characteristics of their L1 rather than towards those of one of the Inner Circle varieties reflects what Gut (2007) claimed for the phonologies of postcolonial varieties of English in her Norm Orientation Hypothesis.

The studies referred to above all accounted for rhythm by measuring vocalic durations. While this is one of the most commonly used acoustic correlates of rhythm, recently a more holistic view of rhythm has been advocated, taking into account factors other than duration, such as intensity, loudness and pitch (Cumming 2010, 2011; Fuchs 2013, 2014a, 2014b; He 2012; Low 1998; Stojanovic 2009). One of these factors, Galves et al. (2002) argued, should be sonority.

3 Sonority-based rhythm metrics

In contrast to the most commonly used rhythm metrics, such as nPVI-V and %V, which necessarily rely on extensive manual annotation, the sonority-based method suggested by Galves et al. (2002) calculates a measure of sonority almost fully automatically, based on the rate of change in the spectrum of the speech signal. Sonority was defined by the authors as relative change in the acoustic signal. Each 2 ms stretch of a recording was mapped onto a scale ranging from 0 to 1, with no change in the acoustic signal corresponding to 1, or high sonority, and rapid change in the acoustic signal corresponding to 0, or little sonority. Following this definition, vowels, with relatively stable periodic patterns, are very sonorous. Obstruents, by contrast, are characterised by aperiodic noise and rapid change in the acoustic signal, corresponding to regions of low sonority. On the basis of this scale, two metrics were defined: S̄ is a measure of mean sonority in an utterance and is higher for syllable-timed languages, and δS is a measure of mean change in sonority and is higher for stress-timed languages.

Using Ramus, Nespor and Mehler’s (1999) original data in order to replicate and triangulate their results, Galves et al. (2002) were able to discriminate between languages in terms of their rhythmic class, like Ramus, Nespor and


Mehler did with their ΔV, ΔC and %V calculations. That is, both measures placed languages on a similar rhythm scale with different metrics applied to these languages. Galves et al.’s methods were also used by Fuchs (2013), who showed that Educated Indian English has a more syllable-timed rhythm than BrE in terms of sonority. Educated Indian English has higher mean sonority (in read and spontaneous speech) and less variation in sonority (in read speech) than BrE.
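The two summary measures can be illustrated with a short Python sketch. It assumes that a sonority track, i.e. one value between 0 and 1 per analysis frame, has already been computed; the spectral computation itself is not reproduced here. Reading δS as the mean absolute difference between successive frames is an assumption based on the description above ("mean change in sonority"), and the exact normalisation used by Galves et al. (2002) may differ. The function names and the example tracks are hypothetical.

```python
def mean_sonority(track):
    """S-bar: arithmetic mean of the per-frame sonority values."""
    return sum(track) / len(track)


def sonority_variation(track):
    """delta-S, read here as the mean absolute frame-to-frame change (assumption)."""
    steps = [abs(b - a) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


# Hypothetical tracks, not measurements from this study.
steady = [0.9, 0.9, 0.9, 0.9, 0.9]   # vowel-like: stable signal, high sonority
choppy = [0.2, 0.8, 0.2, 0.8]        # rapid alternation of low and high sonority

print(mean_sonority(steady), sonority_variation(steady))
print(mean_sonority(choppy), sonority_variation(choppy))
```

Under these assumptions, a stretch dominated by stable vocalic material scores high on S̄ and low on δS, while frequent transitions between obstruent-like and vowel-like frames raise δS.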

4 Method

4.1 Data

Recordings were made of 10 advanced learners of English with L1 Mandarin Chinese, five female and five male, aged between 21 and 28 at the time of recording. They started learning English, their first non-native language, in China via formal instruction when they were between 10 and 14 years old. Classroom instruction was oriented towards the norms of BrE.

The learners were first recorded reading a text aloud in their (syllable-timed) L1 Mandarin Chinese, namely the Mandarin version of Aesop’s fable The North Wind and the Sun plus two phonetically rich sentences (157 words, see Appendix 1). The latter two sentences were selected based on the number of different phonemes occurring in them, with the aim of obtaining a maximally broad sample of Mandarin speech. Secondly, the learners were recorded reading out a short text taken from the online edition of National Geographic in their foreign language English (282 words, see Appendix 2). All recordings were conducted in a quiet room. In order to avoid any bias of results, the participants were simply asked to read out loud first the Mandarin and then the English texts, and were only told after the recordings which aspects of their productions would be investigated.

To provide a point of comparison in the form of ENL, four native speakers of English (two Southern BrE, one American English, one Scottish English, all female, aged 22 to 48, recorded in Münster, Germany) were recorded reading the same text as the Mandarin L2 English speakers. For the recordings, a handheld Edirol R-09 wav/mp3 recording device with an inbuilt stereo condenser microphone was used with a sampling rate of 44.1 kHz and a bit depth of 16 bit. For the ENL recordings, a high-quality head-mounted microphone was used with the Edirol recording device.


4.2 Data analysis

The uncompressed stereo wav-files were transformed into mono-channel files with the open-source audio editor Audacity. In order to prepare the application of Galves et al.’s (2002) automated sonority measurements, the speech signal had to be segmented into breath units. These were defined as continuous speech of at least five syllables. A pause was defined as a period of silence of at least 150 ms. Utterances of less than five continuous syllables were excluded, as were hesitations, stuttering and non-speech noises, etc. In order to avoid any bias of syllable-final lengthening, the last syllable of each breath unit was also excluded from calculations. After annotating breath units and pauses in Praat TextGrid files and scaling all recordings to the same average intensity level, all breath units of at least five syllables minus the final syllable were extracted automatically with a Praat script. The breath units were then analysed with the Perl script provided by Galves et al. (2002),⁴ which uses a console version of Praat (Praatcon) for all acoustic measurements.

After calculating mean sonority and variation in sonority for each of the breath units, the data was imported into the statistical package R. Finally, median values of mean sonority and variation in sonority were calculated for each speaker individually. These speaker median values were then used to calculate mean values for each of the three languages/varieties. The significance of the differences between them was established with t-tests, which were applied separately for mean sonority and variation in sonority.

It is expected that the L1 Mandarin Chinese passage turns out to be most syllable-timed and the ENL passage to be most stress-timed, whereas the Mandarin-accented EFL recordings will be positioned in between due to influence from the L1. Consequently, mean sonority (S̄) is expected to be highest for L1 Mandarin and lowest for ENL; variation in sonority (δS), on the other hand, is hypothesised to be highest for ENL and lowest for L1 Mandarin. For Mandarin-accented EFL, S̄ and δS are expected to show hybrid values: When influenced more by the learners’ L1, they should be situated more towards the syllable-timed end of the speech rhythm continuum, and when influenced by the target language English they should be situated more towards the stress-timed extreme. Whether these expectations indeed hold true will be discussed in the subsequent results section.

4 The following parameters were used: Beta 1.5, range 3, beginning 20 Hz, end 800 Hz, frequency step 20 Hz, time step 2 ms, window size 25 ms.
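The selection and aggregation steps described in this section can be sketched in Python as follows. This is an illustration only, not the Praat, Perl and R workflow actually used: the input format (syllable start and end times in milliseconds), the function names and the example values are hypothetical. The sketch also assumes that a unit must contain at least five syllables before its final syllable is removed, which is one possible reading of the description above.

```python
from statistics import mean, median

MIN_SYLLABLES = 5    # breath units with fewer syllables are discarded
MIN_PAUSE_MS = 150   # silence of at least 150 ms counts as a pause


def split_breath_units(syllables):
    """Group (start_ms, end_ms) syllables into breath units, splitting at pauses."""
    units, current = [], []
    for start, end in syllables:
        if current and start - current[-1][1] >= MIN_PAUSE_MS:
            units.append(current)
            current = []
        current.append((start, end))
    if current:
        units.append(current)
    return units


def usable_units(units):
    """Keep units of at least five syllables, then drop each unit's final syllable."""
    return [unit[:-1] for unit in units if len(unit) >= MIN_SYLLABLES]


def group_mean(values_by_speaker):
    """Median over breath units per speaker, then the mean of those medians."""
    medians = [median(values) for values in values_by_speaker.values()]
    return mean(medians)


# Hypothetical syllable timings: five syllables, a 200 ms pause, two syllables.
syllables = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500),
             (700, 800), (800, 900)]
kept = usable_units(split_breath_units(syllables))
print(len(kept), len(kept[0]))

# Hypothetical per-breath-unit mean sonority values for two speakers.
print(group_mean({"speaker_1": [0.4, 0.5, 0.6], "speaker_2": [0.3, 0.3]}))
```

Taking the median within each speaker before averaging across speakers limits the influence of single outlying breath units on the group value.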


5 Results

Figure 1 below shows average mean sonority and variation in sonority for the three groups (data for individual speakers is shown in Appendix 3). Variation in sonority was lowest for Mandarin-accented EFL (EFL, 0.142), followed by L1 Mandarin (Man, 0.146) and ENL (0.156). ENL differed significantly from Mandarin-accented EFL (p < 0.0001, t = 6.0, df = 70.5) and L1 Mandarin (p < 0.01, t = 3.1, df = 43.9), but L1 Mandarin and Mandarin-accented EFL were not significantly different in variation in sonority (p = 0.17, t = –1.4, df = 35.1). Variation in sonority was hypothesised to be higher for ENL than for both L1 Mandarin and Mandarin-accented EFL, which was supported by the data. However, Mandarin-accented EFL was expected to have an intermediate value between L1 Mandarin and ENL, but in fact did not differ significantly from L1 Mandarin.
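The non-integer degrees of freedom reported here are what an unequal-variance (Welch) t-test produces, and R's `t.test` applies this variant by default. The text itself only says "t-tests", so treating them as Welch tests is an assumption. Under that assumption, the test statistic and the Welch-Satterthwaite degrees of freedom can be sketched in Python as follows; the function name and the sample values are hypothetical, and the p-value lookup is omitted because it requires the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical samples, not data from this study.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

With unequal variances or group sizes, the resulting degrees of freedom fall below the pooled value and are generally not whole numbers.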

Figure 1: Mean and standard deviation for variation in sonority (δS) and mean sonority (S̄) in L1 Mandarin (Man), ENL and Mandarin-accented EFL (EFL).⁵

5 Our results differ in magnitude from those presented by Galves et al. (2002), most likely due to differences in bit rate, which the authors unfortunately did not specify for the recordings they used. Our recordings had a bit rate of 16 bit × 44,100 Hz × 2 channels = 1,411.2 kbit/s.
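The bit rate of uncompressed PCM audio is the product of bit depth, sampling rate and number of channels. As a small worked check, the following Python sketch (the function name is hypothetical) computes the value for 16-bit audio at 44,100 Hz with one and with two channels.

```python
def pcm_bit_rate_kbps(bit_depth, sample_rate_hz, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return bit_depth * sample_rate_hz * channels / 1000


print(pcm_bit_rate_kbps(16, 44_100, 1))  # mono: 705.6
print(pcm_bit_rate_kbps(16, 44_100, 2))  # stereo: 1411.2
```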


Mean sonority was lowest for Mandarin-accented EFL (0.441), followed by L1 Mandarin (0.445), and ENL (0.506). ENL differed significantly from Mandarin-accented EFL (p < 0.05, t = 2.5, df = 11.6) and L1 Mandarin (p < 0.05, t = 2.2, df = 11.8), but L1 Mandarin and Mandarin-accented EFL did not differ significantly (p = 0.9, t = –0.13, df = 17.9). Mean sonority was hypothesised to be higher for L1 Mandarin than for ENL, which the data did not support. In fact, ENL had a higher mean sonority than both L1 Mandarin and Mandarin-accented EFL.

6 Discussion

This study was driven by two pairs of hypotheses: The first two hypotheses stated that ENL speakers have (a) higher variation in sonority and (b) lower mean sonority than L1 Mandarin speakers. Recordings of four ENL speakers and ten L1 Mandarin speakers provided support for part (a), i.e. higher variation in sonority in ENL than in L1 Mandarin. However, part (b), lower mean sonority in ENL than in L1 Mandarin, was not confirmed. This unexpected result might conceivably be due to the text that was chosen as the basis for the present measurements, and the fact that only four speakers were recorded.⁶

The second pair of hypotheses concerned a possible difference between Mandarin-accented EFL and ESL varieties used by L1 Mandarin speakers (such as SinE). Previous research, based on measurements of speech rhythm as variability of vocalic durations, suggested that advanced EFL learners approximate ENL rhythm fairly closely, while speakers of ESL varieties might maintain their more syllable-timed rhythm and continue to adhere to the local emerging standard. It was thus hypothesised that Mandarin-accented EFL would have intermediate values between L1 Mandarin and ENL, or values closer to ENL, in (a) mean sonority and (b) variation in sonority. The results did not support this. Mandarin-accented EFL had in fact values of mean sonority and variation in sonority that did not differ significantly from L1 Mandarin. Although the Mandarin-accented EFL speakers were advanced learners, they did not approximate ENL stress-timed rhythm as measured by the sonority-based metrics.

It thus appears that duration-based metrics suggest a difference between postcolonial varieties of English with syllable-timed substrate languages (more

6 Recordings of a different and longer text (392 words) read by 10 speakers of British English, taken from the DyViS corpus (Nolan 2006), indicated a much lower value for mean sonority of 0.351 (Fuchs 2013).


syllable-timed than ENL), and EFL learners with syllable-timed L1 (close approximation to ENL stress-timed rhythm). Sonority-based metrics, by contrast, suggest that even advanced EFL learners with a syllable-timed L1 maintain this syllable-timed rhythm.

These results are seemingly contradictory as long as speech rhythm is regarded as a unitary phenomenon with different acoustic correlates, such as variability in vocalic durations, variation in sonority and mean sonority. However, speech rhythm might be better considered as a multidimensional phenomenon (Fuchs 2013, 2014a, b; Gut, Trouvain, and Barry 2007; Stojanovic 2009; Loukina et al. 2011; Nolan and Asu 2009). A language might tend towards syllable-timing as measured by one acoustic correlate, and towards stress-timing as measured by another acoustic correlate. Taking this perspective, it appears that EFL speakers with a syllable-timed L1 can come close to ENL stress-timed rhythm as measured by one acoustic correlate (variability in vocalic durations), and maintain a syllable-timed rhythm as measured by other correlates (variation in sonority and mean sonority).

Other acoustic correlates of rhythm that might shed light on this are variation in fundamental frequency, intensity and loudness. In fact, He (2012) studied variability in intensity in Mandarin L2 English, L1 English and L1 Mandarin, and found that Mandarin L2 English remains relatively close to L1 Mandarin syllable-timed rhythm, and does not approximate L1 English stress-timed rhythm. Like mean sonority and variation in sonority, variability in intensity thus appears to be another correlate of rhythm that is more strongly influenced by the L1, even in advanced learners.

Table 1: Overview of different dimensions of speech rhythm, their assumed learnability, and outcomes in EFL speakers and ESL varieties with syllable-timed substrate languages.

| Rhythm dimension | Learnability | English as a foreign language (EFL) | English as a second language (postcolonial varieties of English) |
|---|---|---|---|
| | | Aim to acquire ENL stress-timed rhythm | Aim to maintain endonormative standards, which might include a more syllable-timed rhythm |
| Variability of vocalic durations | Comparatively easy | Close approximation to ENL stress-timed rhythm | More syllable-timed rhythm than ENL |
| Variation in sonority | Comparatively difficult | More syllable-timed rhythm than ENL | More syllable-timed rhythm than ENL |
| Mean sonority | Comparatively difficult | More syllable-timed rhythm than ENL | More syllable-timed rhythm than ENL |
| Variability in intensity | Comparatively difficult | More syllable-timed rhythm than ENL | More syllable-timed rhythm than ENL |


Table 1 summarises these results, and attempts a classification of different dimensions of speech rhythm, their learnability, and the consequences for EFL and ESL varieties with a syllable-timed substrate language. Crucially, EFL speakers tend to aim for ENL stress-timed rhythm, while speakers of ESL varieties strive to maintain endonormative standards, which might include a more syllable-timed rhythm.

The variability of vocalic durations appears to be a rhythm dimension that is comparatively easy for learners to acquire, and advanced EFL speakers with L1 Mandarin Chinese have been shown to approximate ENL stress-timed rhythm fairly closely on this dimension (Lin and Wang 2008; He 2010). Speakers of ESL varieties with a Mandarin Chinese substrate, by contrast, have been found to maintain a more syllable-timed rhythm, driven by their endonormative standards (Low, Grabe, and Nolan 2000). The conclusion that variability of vocalic durations is a correlate of speech rhythm that is comparatively easy to acquire (at least for advanced learners) is also supported by Gu and Hirose (2014), who investigated the speech rhythm of advanced L1 English learners of Mandarin.

Variation in sonority and mean sonority might be dimensions of speech rhythm that are comparatively hard to change in second language learning. This is what the results of the present study suggest, where speakers of Mandarin-accented EFL were relatively close in rhythm to L1 Mandarin speakers. Speakers of ESL varieties with a syllable-timed substrate language also tend to maintain a more syllable-timed rhythm as measured by this dimension. This is because they desire to maintain local standards with a more syllable-timed rhythm, and, even if they wanted to acquire a more stress-timed rhythm, would face difficulties in doing so because this rhythm dimension might be harder to acquire. Evidence for this rhythm dimension in SinE is lacking, but, as mentioned above, Fuchs (2013) showed that speakers of Educated Indian English have a more syllable-timed rhythm than BrE speakers as far as sonority-based measurements are concerned.

Finally, variability in intensity is another dimension of speech rhythm. It is also comparatively difficult to acquire, and He (2012) showed that even advanced learners of English with L1 Mandarin fail to approximate a stress-timed rhythm on this dimension. Speakers of ESL varieties with a syllable-timed L1 are also likely to maintain a more syllable-timed rhythm because (1) endonormative standards possibly mandate a more syllable-timed rhythm, and (2) variability in intensity is a rhythm dimension that is more difficult to acquire. This is commensurate with Gut’s (2007) Norm Orientation Hypothesis, and also supported by Low (1998), who determined that SinE has less variability in intensity than BrE.

While a range of studies provide support for the generalisations summarised in Table 1, future research might offer more evidence on these questions, with a


wider range of L1s. In addition, other dimensions of speech rhythm, such as variability in loudness and fundamental frequency, should also be considered. Finally, the question of the learnability of different dimensions of speech rhythm also needs to be looked at more closely. Although previous research supports the view that variability in vocalic durations is a feature that is easier to acquire than other dimensions of rhythm, it remains unclear whether it is inherently less difficult, or whether a focus on this rhythm dimension in English language instruction is responsible. If there is evidence for the latter, then a focus on other dimensions of speech rhythm might also help EFL learners in acquiring a stress-timed rhythm on these dimensions.

7 References

Abercrombie, David. 1967. Elements of general phonetics. Edinburgh: Edinburgh University Press.
Adams, Corinne. 1979. English speech rhythm and the foreign learner. The Hague: Mouton de Gruyter.
Anderson-Hsieh, Janet, Ruth Johnson and Kenneth Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42 (4). 529–555.
Auer, Peter. 2001. Silben- und akzentzählende Sprachen. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher and Wolfgang Raible (eds.), Language typology and language universals: An international handbook, 1391–1399. Berlin: Mouton de Gruyter.
Bond, Zinny S. and Joann Fokes. 1985. Non-native patterns of English syllable timing. Journal of Phonetics 13. 407–420.
Chen, Lei and Klaus Zechner. 2011. Applying rhythm features to automatically assess nonnative speech. Proceedings of Interspeech 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27–31.
Chiao, Wei J. and Heinrich P. Kelz. 1985. Chinesische Aussprache, 2nd edn. Bonn: Dümmler.
Cumming, Ruth E. 2010. Speech rhythm: The language-specific integration of pitch and duration. Cambridge: University of Cambridge unpublished PhD dissertation.
Cumming, Ruth E. 2011. Perceptually informed quantification of speech rhythm in Pairwise Variability Indices. Phonetica 68 (4). 256–277.
Dauer, Rebecca M. 1987. Phonetic and phonological components of language rhythm. In Tamaz V. Gamkrelidze (ed.), Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn, Estonia, August 1–7, 447–450. Tallinn: Academy of Sciences of the Estonian SSR.
Dellwo, Volker. 2006. Rhythm and speech rate: A variation coefficient for delta C. In Pawel Karnowski and Imre Szigeti (eds.), Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba, 2003, 231–241. Frankfurt: Peter Lang.
Dellwo, Volker, Francisco Gutierrez Diez, and Nuria Gavalda. 2009. The development of measurable speech rhythm in Spanish speakers of English. Actas de XI Simposio Internacional de Comunicacion Social, Santiago de Cuba, 594–597.


Robert Fuchs and Eva-Maria Wunder

Deterding, David. 1994. The rhythm of Singapore English. In Roberto Togneri (ed.), Proceedings of the Fifth Australian International Conference on Speech Science and Technology, Perth, Australia, 316–321. Canberra: Australian Speech Science and Technology Association.
Deterding, David. 2001. The measurement of rhythm: A comparison of Singapore and British English. Journal of Phonetics 29 (2). 217–230.
Faber, David. 1986. Teaching the rhythms of English: A new theoretical base. International Review of Applied Linguistics in Language Teaching 24. 205–216.
Flege, James Emil and Ocke-Schwen Bohn. 1989. An instrumental study of vowel reduction and stress placement in Spanish-accented English. Studies in Second Language Acquisition 11. 35–62.
Fuchs, Robert. 2013. Speech rhythm in educated Indian English and British English. Münster: University of Münster unpublished PhD dissertation.
Fuchs, Robert. 2014a. Integrating variability in loudness and duration in a multidimensional model of speech rhythm: Evidence from Indian English and British English. In Nick Campbell, Dafydd Gibbon and Daniel Hirst (eds.), Social and Linguistic Speech Prosody: Proceedings of the 7th International Conference on Speech Prosody 2014, Dublin, Ireland, 290–294.
Fuchs, Robert. 2014b. Towards a perceptual model of speech rhythm: Integrating the influence of f0 on perceived duration. In Haizhou Li, Helen Meng, Bin Ma, Eng Siong Cheng and Lei Xie (eds.), Proceedings of Interspeech 2014, Singapore, 1949–1953. Singapore.
Galves, Antonio, Jesus Garcia, Denise Duarte and Charlotte Galves. 2002. Sonority as a basis for rhythmic class discrimination. Paper presented at Speech Prosody 2002, Aix-en-Provence, France, 11–13 April.
Gibbon, Dafydd and Ulrike Gut. 2001. Measuring speech rhythm. In Proceedings of Eurospeech 2001, Aalborg, Denmark, 91–94.
Grenon, Isabelle and Laurence White. 2008. Acquiring rhythm: A comparison of L1 and L2 speakers of Canadian English and Japanese. In Harvey Chan, Heather Jacob and Enkeleida Kapia (eds.), Proceedings of the 32nd Annual Boston University Conference on Language Development, 155–166. Somerville: Cascadilla.
Gu, Wentao and Keikichi Hirose. 2014. Rhythmic patterns in native and nonnative Mandarin speech. In Nick Campbell, Dafydd Gibbon and Daniel Hirst (eds.), Social and Linguistic Speech Prosody: Proceedings of the 7th International Conference on Speech Prosody 2014, Dublin, Ireland, 592–596.
Gut, Ulrike. 2003a. Non-native speech rhythm in German. In Marie-Josep Solé, Daniel Recasens and Joan Romero (eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3–9 August, 2437–2440. Barcelona: Universitat Autònoma de Barcelona.
Gut, Ulrike. 2003b. Prosody in second language speech production: The role of the native language. Fremdsprachen Lehren und Lernen 32. 133–152.
Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26 (3). 346–359.
Gut, Ulrike. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Gut, Ulrike, Jürgen Trouvain and William J. Barry. 2007. Bridging research on phonetic descriptions with knowledge from teaching practice: The case of prosody in non-native speech. In Jürgen Trouvain and Ulrike Gut (eds.), Non-native prosody: Phonetic description and teaching practice, 3–21. Berlin and New York: Mouton de Gruyter.

A sonority-based account of speech rhythm in Chinese learners of English


He, Lei. 2010. Interlanguage Rhythm. Edinburgh: University of Edinburgh MA thesis. http://hdl.handle.net/1842/6011.
He, Lei. 2012. Syllabic intensity variations as quantification of speech rhythm: Evidence from both L1 and L2. In Qiuwu Ma, Hongwei Ding and Daniel Hirst (eds.), Proceedings of the 6th International Conference on Speech Prosody, Shanghai, May 22–26, 2012. Shanghai: Tongji University Press.
Hunold, Cordula. 2009. Untersuchungen zu segmentalen und suprasegmentalen Ausspracheabweichungen chinesischer Deutschlernender. Frankfurt: Peter Lang.
Jang, Tae-Yeoub. 2008. Speech rhythm metrics for automatic scoring of English speech by Korean EFL learners. Malsori Speech Sounds 66. 41–59.
Kaltenbacher, Erika. 1998. Zum Sprachrhythmus des Deutschen und seinem Erwerb. In Heide Wegener (ed.), Eine zweite Sprache lernen, 21–38. Tübingen: Narr.
Lee, Borim, Susan G. Guion and Tetsuo Harada. 2006. Acoustic analysis of the production of unstressed English vowels by early and late Korean and Japanese bilinguals. Studies in Second Language Acquisition 28 (3). 487–513.
Lin, Hua and Qian Wang. 2008. Interlanguage rhythm in the English production of Mandarin speakers. In Proceedings of the 8th Phonetic Conference of China and the International Symposium on Phonetic Frontiers, Beijing, April 18–20.
Loukina, Anastassia, Greg Kochanski, Burton Rosner, Elinor Keane and Chilin Shih. 2011. Rhythm measures and dimensions of durational variation in speech. Journal of the Acoustical Society of America 129. 3258–3270.
Low, Ee Ling. 1998. Prosodic prominence in Singapore English. Cambridge: University of Cambridge PhD dissertation.
Low, Ee Ling and Esther Grabe. 1995. Prosodic patterns in Singapore English. In Kjell Elenius and Peter Branderud (eds.), Proceedings of the XIIIth International Congress of Phonetic Sciences, Stockholm, Sweden, 636–639. Stockholm: Kungliga Tekniska Högskolan (Royal Institute of Technology).
Low, Ee Ling, Esther Grabe and Francis Nolan. 2000. Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43. 377–401.
Mairs, Jane Lowenstein. 1989. Stress assignment in interlanguage phonology: An analysis of the stress system of Spanish speakers learning English. In Susan M. Gass and Jacquelyn Schachter (eds.), Linguistic perspectives on second language acquisition, 260–283. Cambridge: Cambridge University Press.
Mok, Peggy Pik Ki. 2009. On the syllable-timing of Cantonese and Beijing Mandarin. Chinese Journal of Phonetics 2. 148–154.
Moyer, Alene. 1999. Ultimate attainment in L2 phonology. Studies in Second Language Acquisition 21 (1). 81–108.
Nolan, Francis and Eva Liina Asu. 2009. The Pairwise Variability Index and coexisting rhythms in language. Phonetica 66 (1–2). 64–77.
Nolan, Francis, Kirsty McDougall, Gea de Jong and Toby Hudson. 2006. A forensic phonetic study of ‘dynamic’ sources of variability in speech: The DyViS project. In Paul Warren and Catherine I. Watson (eds.), Proceedings of the 11th Australian International Conference on Speech Science and Technology, University of Auckland, New Zealand: December 6–8, 13–18. Canberra: Australian Speech Science and Technology Association.
Ordin, Mikhail, Leona Polyanskaya and Christiane Ulbrich. 2011. Acquisition of timing patterns in second language. In Piero Cosi, Renato de Mori, Giuseppe di Fabbrizio and Roberto Pieraccini (eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association: Interspeech 2011, Florence, Italy, August 27–31. http://www.isca-speech.org/archive/interspeech_2011/i11_1129.html (accessed 14 November 2013).
Pike, Kenneth L. 1945. The Intonation of American English. Ann Arbor, MI: University of Michigan Press.
Ramus, Franck, Marina Nespor and Jacques Mehler. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73 (3). 265–292.
Rossi, Mario. 1998. Intonation in Italian. In Daniel Hirst and Albert di Cristo (eds.), Intonation Systems, 219–238. Cambridge: Cambridge University Press.
Sarmah, Priyankoo, Divya Verma Gogoi, and Caroline Wiltshire. 2009. Thai English: Rhythm and vowels. English World-Wide 30 (2). 196–217.
Schneider, Edgar W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79 (2). 233–281.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.
Stojanovic, Diana. 2009. Issues in the quantitative approach to speech rhythm comparisons. Working Papers in Linguistics (University of Hawai’i at Manoa) 40 (9). 1–20.
Tsiartsioni, Eleni. 2011. Can pronunciation be taught? Teaching English speech rhythm to Greek students. In Eliza Kitis, Nikolas Lavidas, Nina Topintzi and Tasos Tsangalidis (eds.), Selected papers from the 19th International Symposium on Theoretical and Applied Linguistics (ISTAL 19), 447–458. Aristotle University of Thessaloniki: Thessaloniki. http://www.enl.auth.gr/symposium19/.
Van Els, Theo and Kees De Bot. 1987. The role of intonation in foreign accent. The Modern Language Journal 71 (2). 147–155.
Wenk, Brian J. 1985. Speech rhythms in second language acquisition. Language and Speech 28 (2). 157–175.
Wennerstrom, Ann. 2001. The music of everyday speech: Prosody and discourse analysis. Oxford: Oxford University Press.
White, Laurence and Sven L. Mattys. 2007. Calibrating rhythm: First language and second language studies. Journal of Phonetics 35 (4). 501–522.
Zborowska, Justyna. 2000. The acquisition of English speech rhythm by Polish learners. In Proceedings of New Sounds 2000, Amsterdam, The Netherlands, 368–374.


8 Appendices

Appendix 1

The North Wind and the Sun in Mandarin Chinese:

有一回,北风跟太阳正在那儿争论谁的本领大。说着说着,来了一个过路的,身上穿了一件厚袍子。他们俩就商量好了,说,谁能先叫这个过路的把他的袍子脱下来,就算是他的本领大。北风就卯足了劲儿,拼命的吹。可是,他吹得越厉害,那个人就把他的袍子裹得越紧。到末了儿,北风没辙了,只好就算了。一会儿,太阳出来一晒,那个人马上就把袍子脱了下来。所以,北风不得不承认,还是太阳比他的本领大。

Pinyin: Yǒu yì huí, běi fēng gēn tài yang zhèng zài nàr zhēng lùn shuí de běn lǐng dà. shuō zhe shuō zhe, lái le yí ge guò lù de, shēn shàng chuān le yí jiàn hòu páo zi. tā men liǎ jiù shāng liang hǎo le, shuō, shuí néng xiān jiào zhè ge guò lù de bǎ tā de páo zi tuō xià lái, jiù suàn shì tā de běn lǐng dà. běi fēng jiù mǎo zú le jìnr, pīn mìng de chuī. kě shì, tā chuī de yuè lì hài, nà ge rén jiù bǎ tā de páo zi guǒ de yuè jǐn. dào mò liǎor, běi fēng méi zhé le, zhǐ hǎo jiù suàn le. yì huǐr, tài yang chū lái yí shài, nà ge rén mǎ shàng jiù bǎ páo zi tuō le xià lái. suǒ yǐ, běi fēng bù dé bù chéng rèn, hái shì tài yang bǐ tā de běn lǐng dà.

(Translation: The North Wind and the Sun were disputing which of them was stronger, when a traveller came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveller take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew, the more closely did the traveller fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shone out warmly, and immediately the traveller took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two.)

Two phonetically rich sentences in Mandarin Chinese:

1. 小朋友们爱玩纸飞机和气球。
Pinyin: Xiǎo péng yǒu men ài wán zhǐ fēi jī hé qì qiú.
(Translation: The children like playing with paper planes and balloons.)

2. 我们喜欢去公园放风筝,荡秋千和打羽毛球。
Pinyin: Wǒ men xǐ huan qù gōng yuán fàng fēng zheng, dàng qiū qiān hé dǎ yǔ máo qiú.
(Translation: We love flying kites, playing on the swings and playing badminton in the park.)


Appendix 2

English text

Koalas

Millions of koalas once lived in Australia. About 100,000 survive today. What’s happening to these popular critters? Wildfires raged in Australia during January 2002, destroying 600,000 acres of forest. The flames’ victims included countless koalas. These tree-climbing mammals live only in eastern Australia. But the fire alarms caught the attention of koala lovers around the world. The wildfires, however, were just part of a much larger problem: Forests are vanishing throughout eastern Australia. Cute and popular as koalas are, they’re having trouble hanging on. Koalas’ problems stem from being picky eaters. These marsupials like just one thing: They’re hooked on eucalyptus, an Australian tree. Koalas use their big noses to sniff out tasty leaves. “If you offered them something else,” says zookeeper Jennifer Toll, “they wouldn’t know what to do with it. They’d starve before they’d eat a carrot.” Koalas weigh only twenty pounds. But they gobble almost three pounds of food a day. That’s like a sixty-pound kid eating nine pounds a day! Eucalyptus leaves, you see, aren’t very nutritious. So koalas need supersize servings to get enough energy. Even eating as much as they do, koalas don’t have much energy. So they rest about 20 hours a day. That doesn’t make it any easier to search for mates, especially when their territories are so scattered. As a result, the koala population plunged. No one knows exactly how many koalas survive today. What does the future hold for koalas? Can humans find ways to help them hold on? Australians hope so. “The koala,” an Australian once said, “is essential to how we see ourselves.” Saving koalas is possible. But it will take time, work, hard choices – and plenty of eucalyptus leaves.

(modified from http://magma.nationalgeographic.com/ngexplorer/0303/articles/mainarticle.html; 16.02.2010)


Appendix 3

Mean sonority and variation in sonority for individual speakers

Speaker    Mean sonority (S̄)    Variation in sonority (δS)
Eng 1      0.482                0.155
Eng 2      0.547                0.150
Eng 3      0.486                0.148
Eng 4      0.509                0.149
Man 1      0.375                0.145
Man 2      0.420                0.168
Man 3      0.428                0.166
Man 4      0.334                0.151
Man 5      0.495                0.152
Man 6      0.492                0.143
Man 7      0.550                0.131
Man 8      0.441                0.163
Man 9      0.537                0.124
Man 10     0.378                0.147
ManE 1     0.360                0.140
ManE 2     0.409                0.163
ManE 3     0.406                0.159
ManE 4     0.366                0.152
ManE 5     0.470                0.157
ManE 6     0.500                0.145
ManE 7     0.551                0.141
ManE 8     0.422                0.170
ManE 9     0.533                0.129
ManE 10    0.395                0.145
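The per-speaker values in Appendix 3 can be summarised by speaker group. The following is a minimal sketch rather than part of the original analysis: it copies the listed values, groups them by the three speaker-label prefixes (Eng, Man, ManE), and computes an unweighted mean per group. The names `MEAN_SONORITY`, `SONORITY_VARIATION` and `group_means` are illustrative only.

```python
from statistics import mean

# Per-speaker values copied from Appendix 3, in the listed speaker order.
MEAN_SONORITY = {
    "Eng": [0.482, 0.547, 0.486, 0.509],
    "Man": [0.375, 0.420, 0.428, 0.334, 0.495,
            0.492, 0.550, 0.441, 0.537, 0.378],
    "ManE": [0.360, 0.409, 0.406, 0.366, 0.470,
             0.500, 0.551, 0.422, 0.533, 0.395],
}

SONORITY_VARIATION = {
    "Eng": [0.155, 0.150, 0.148, 0.149],
    "Man": [0.145, 0.168, 0.166, 0.151, 0.152,
            0.143, 0.131, 0.163, 0.124, 0.147],
    "ManE": [0.140, 0.163, 0.159, 0.152, 0.157,
             0.145, 0.141, 0.170, 0.129, 0.145],
}


def group_means(per_speaker):
    """Return the unweighted mean of the speaker values for each group."""
    return {group: mean(values) for group, values in per_speaker.items()}


if __name__ == "__main__":
    tables = (("mean sonority", MEAN_SONORITY),
              ("variation in sonority", SONORITY_VARIATION))
    for label, table in tables:
        for group, value in group_means(table).items():
            print(f"{label:22s} {group:5s} {value:.3f}")
```

Group means computed this way are descriptive only; they say nothing about whether differences between the groups are statistically reliable.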


Heidi Altmann and Barış Kabak

10 English word stress in L2 and postcolonial varieties: systematicity and variation

1 Introduction

This paper brings together empirical insights from various studies on contact varieties of English (non-native and native English varieties) in an attempt to identify unity and variation in English stress placement. There is a growing body of work on prosody in non-native (L2) English (e.g. Capliez 2011; Chun 2002; Gut 2003, 2009; Trouvain and Gut 2007) as well as in varieties of English around the world (e.g. Gut 2005; Simo Bobda 2011; Zerbian 2013). Although English word stress has by now been widely investigated in L2 acquisition (see Altmann and Kabak 2011 for a review), comparative studies on its variable realisation in different varieties of English are crucially lacking. It is widely assumed that English stress is primarily lexical (cf. Giegerich 1992; Roach 2009) and that the placement of stress in L2 and postcolonial varieties can be prone to L1 effects (e.g. Altmann 2006; Kijak 2009). To that end, English word-level stress constitutes an instructive research avenue from both a theoretical and an empirical perspective, since different encounters with such a notoriously complex word prosodic system may nevertheless exhibit recurrent patterns that are reflective of universal as well as specifically English-based acquisition-driven processes.

The primary aim of this paper is to understand unity and variation in English stress assignment by investigating an array of different L1s as well as different contact and learning situations, namely English as a Second Language (ESL), English as a Foreign Language (EFL), and postcolonial varieties of English. After a brief introduction to the facts of English stress and its variability, we will bring together three sorts of studies to pinpoint endogenous (timeless laws of phonetics and phonology, as well as English-specific variables) and exogenous (user-specific variables such as the user's L1 and level of proficiency) factors that predict systematicity and variation. In particular, we will report on a study on the production of stress in English nonce words by highly advanced ESL speakers with different L1s in the USA, and compare that with stress placement in real English words by highly advanced EFL speakers in Germany. Finally, we will embed the observations from these experimental studies in the context of previous descriptions of stress placement in English postcolonial varieties, namely Cameroon and Nigerian English, to arrive at broader generalisations of the nature and dynamics of stress in Englishes and to explicate some potential factors in the genesis of non-convergence between standard varieties and postcolonial varieties in stress assignment.

Heidi Altmann, University of Stuttgart; Barış Kabak, University of Würzburg

2 English stress: Facts and findings from non-native speakers

This section presents some well-known facts of word-level stress in Standard Englishes at the interfaces between phonology, morphology and lexicon. Our primary aim here is one of description. As such, we do not evaluate existing theories of English stress, nor do we provide a new account. We also review a number of recent studies on the L2 acquisition and processing of English stress to explore systematicity and variation in non-native encounters with English stress and lay the ground for our experimental studies.

2.1 Facts: Systematicity and variability in English stress

Every learner of English has most likely struggled with stress assignment in English at some point in the learning process. While there are some regular patterns or tendencies, numerous exceptions can be identified for each of them, reflecting the complex diachronic development of the English language (e.g. Fikkert, Dresher and Lahiri 2006; Dresher and Lahiri 2005; Fournier 2007). What is generally accepted by now is that the position of primary stress in an English lexical word depends on different factors such as word class, syllabic structure or morphological composition. These, of course, come in addition to cases where stress simply has to be specified lexically. Purely phonological accounts of English stress focus on the importance of syllable weight: In monomorphemic words, stress is assumed to usually fall on the penultimate syllable (i.e. the second syllable from the end) if it is heavy; if it is light, the preceding (antepenultimate) syllable is stressed (Chomsky and Halle 1968; Hayes 1982; Giegerich 1992).1 While this generalisation holds for most nouns, there are, however, quite a number of verb/noun or adjective/noun pairs that can be distinguished based on the position of stress. In such disyllabic pairs, nouns are stressed on the penultimate syllable and the corresponding verbs or adjectives on the final syllable (e.g. ˈsuspect (n.) vs. susˈpect (v.), ˈpermit (n.) vs. perˈmit (v.), ˈcontent (n.) vs. conˈtent (adj.)), suggesting that word class, a non-phonological construct, exerts an influence on stress placement in English and that primary stress in monomorphemic words will fall on one of the last three syllables.

Another non-phonological factor influencing stress placement is morphological complexity, whereby regularities based on word class are additionally subject to the prosodic demands of inflectional and derivational morphemes (e.g. Kingdon 1958). For example, it is well known that some derivational suffixes cause stress to either shift further to the right within the stem (stress-shifting suffixes) or pull stress onto themselves (stress-attracting suffixes) (see Giegerich 1992 and Yavaş 2011 for textbook treatments). Examples of stress-attracting suffixes are –ese and –ee, as in Portuˈguese (cf. ˈPortugal) and examiˈnee (cf. eˈxamine); examples of stress-shifting suffixes are –al (e.g. poˈlitical, cf. ˈpolitics), –ify (e.g. soˈlidify, cf. ˈsolid) and –iary (e.g. beneˈficiary, cf. ˈbenefit).

For most of the general patterns described above, however, exceptional cases can be cited. These make the actual system much less predictable and more idiosyncratic. For example, there are many verbs and adjectives that are stressed on the first syllable (e.g. ˈborrow, ˈdifficult). Furthermore, there are several disyllabic nouns that are stressed on the final syllable (e.g. balˈloon, hoˈtel, caˈnoe, Juˈly, desˈsert), as well as those with final secondary stress (e.g. ˈsynˌtax, ˈraˌdar). Despite a heavy penultimate syllable, nouns such as ˈcalendar and ˈcolander (cf. coriˈander) must also be treated as exceptions to the English stress rule given above. One can even find (albeit rare) cases of words with primary stress on the pre-antepenultimate syllable (and even secondary stress on the final syllable) in the English lexicon (e.g. ˈcatamaˌran, ˈcaricaˌture), which violate the supposed three-syllable window from the right word edge for primary stress. Finally, there is considerable variation among Standard Englishes with respect to the placement of primary as well as secondary stress (e.g. General American: beˈret vs. Received Pronunciation: ˈberet; General American: ˈsecreˌtary vs. Received Pronunciation: ˈsecretary; see Tottie 2002 for further examples).

1 This could also be presented in terms of final consonant extrametricality for simplex adjectives and verbs and syllable extrametricality for nouns (Hayes 1982); however, our focus here is not on theoretical accounts but is purely descriptive.
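The weight-based generalisation described in this section (stress the penult if it is heavy, otherwise the antepenult) can be written out as a small function. This is an illustrative sketch only, not an implementation of any of the cited analyses. The function name, the "H"/"L" weight coding and the treatment of words with fewer than three syllables are assumptions made for the example, and the words agenda and cinema are textbook-style illustrations that do not come from the chapter.

```python
def predicted_stress(weights):
    """Return the index of the syllable predicted to carry primary stress.

    `weights` lists the syllable weights of a monomorphemic word from
    left to right, using "H" for heavy and "L" for light syllables.
    """
    if not weights:
        raise ValueError("a word needs at least one syllable")
    if any(w not in ("H", "L") for w in weights):
        raise ValueError("weights must be 'H' or 'L'")
    n = len(weights)
    if n <= 2:
        # Assumption for this sketch (not stated in the text): words too
        # short to have an antepenult receive stress on their first syllable.
        return 0
    penult = n - 2
    # Heavy penult: stress it. Light penult: stress the antepenult.
    return penult if weights[penult] == "H" else penult - 1


if __name__ == "__main__":
    # The weight of the final syllable is never consulted by this rule.
    print(predicted_stress(["L", "H", "L"]))  # a.gen.da: penult
    print(predicted_stress(["L", "L", "L"]))  # ci.ne.ma: antepenult
    # ca.len.dar has a heavy penult, so the rule predicts penultimate
    # stress; the attested form is ˈcalendar, one of the exceptions
    # mentioned in the text.
    print(predicted_stress(["L", "H", "L"]))
```

The sketch deliberately ignores word class, morphology and lexical marking, which is exactly why it mispredicts the exceptional items listed above.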

2.2 Non-native encounters: Acquiring and processing English stress

The empirical investigation of the non-native production of word stress is a relatively young discipline in linguistic research. For a long time, segmental issues had been considered much more central to non-native speech and became the impetus behind the development of influential L2 speech models (e.g. Best’s [1995] Perceptual Assimilation Model; Flege’s [1995] Speech Learning Model), albeit without explicit predictions for the development of L2 prosody. Nevertheless, the importance of stress in non-native speech has long been acknowledged in the literature, in that incorrect stress placement alone may lead to miscommunication (e.g. Hubicka 1980). For instance, Benrabah (1997: 163) argues that “its importance as a clue to word-recognition in listening to speech makes it a ‘high-priority’ in language teaching”. Strikingly, though, word stress is excluded from Jenkins’ (2000) list of features of the Lingua Franca Core, i.e. those features that should be focused on in teaching English as a foreign language (EFL). However, given the fact that stress can be used as a cue in speech perception by English native speakers (e.g. Cutler 1984), and that misassigned primary stress in English has been shown to influence comprehensibility and intelligibility (Hahn 2004), Jenkins’ observation that English lexical stress, as well as individual vowel quality (which constitutes yet another cue for word stress in English), do not matter in EFL interactions as they do not lead to unintelligibility (see Deterding 2011 for a review) remains questionable.

Research on L2 stress has revealed what is also commonly found in other areas of L2 development: a prevalent cross-language transfer effect, on the one hand, and unique interlanguage patterns on the other. In particular, L2 learners have been shown to stress English words in accordance with strategies from their native language (Archibald 1992), or to produce some kind of default pattern that corresponds to neither their L1 nor the target language prosodic system (Archibald 1997; Pater 1997).
The application of L1 strategies naturally hints at learners’ direct transfer of properties of the language they are familiar with to the less familiar language. Such a transfer presumably would not indicate an active involvement of a newly created prosodic system for the L2, at least on the surface. For example, Hungarian learners, who tend to place stress in English words at their left edge, might be influenced by the regular Hungarian pattern of word stress on the initial syllable (Archibald 1992). In the vast majority of stress production studies, however, learners were found to produce stress patterns that did not correspond to the ones in their L1 or in their L2, especially if they were required to produce English nonce words. These patterns might be due to novel strategies invented by learners or to a non-target-like (i.e. non-converging) application of existing strategies in English. The French learners of English in the study by Pater (1997) are exemplary for the application of a unique non-converging learner pattern since they consistently stressed the leftmost (heavy) syllable, which is unlike any English native strategy and also unlike the French L1 pattern. In contrast, Guion, Harada and Clark (2004) found Spanish L2 learners to rely on the same factors for stress assignment as native speakers of English (analogy to known words, syllabic structure, lexical class), albeit not to the same extent or in the same relative distribution.

Although with some delay, the perceptual aspects of L2 stress have also received some attention. In particular, robust effects of one’s native language on the ability to perceive stress location or stress differences in words have been repeatedly shown in the psycholinguistic literature. For instance, French listeners, whose L1 has no lexically contrastive stress, were shown to exhibit difficulties with encoding stress contrasts (Dupoux, Peperkamp and Gallés 2001; Dupoux et al. 1997, 2008). “Stress deafness”, as Dupoux and colleagues termed this “impairment”, emerges only in tasks that tap phonological representations of stress. As such, the language-specific modulation of stress sensitivity stems from processing stress at an abstract phonological level rather than at a psycho-acoustic level (see Domahs et al. 2012; Schwab and Llisterri 2011). This suggests that at lower levels of processing, the phonetic differences may be discriminated and used by listeners, yet they may have difficulties with encoding stress in lexical representations.2 Learner-internal factors that have so far been identified as influencing the relative success in stress perception are the degree of predictability of stress position in the L1 (Altmann 2006; Peperkamp and Dupoux 2002), the presence or absence of lexical stress in the L1 (Altmann 2006 for English as L2), and the functional load of word stress in the L1 (Kijak 2009 for Polish as L2).

3 Case studies

In order to get a better understanding of how variation and systematicity manifest themselves in English stress assignment, we will now take a closer look at specific case studies on English stress placement in three distinct learning/contact situations: learners residing in an English-speaking country and those residing in their home country, referred to here as ESL and EFL learners respectively, and a postcolonial variety of English, namely Cameroon/Nigerian English.

2 For example, Turkish speakers have been shown to have difficulties identifying stress location in English words (Altmann 2006) although they can use stress cues for word segmentation in their native language (e.g. Kabak et al. 2010).


While the EFL study tests the learners’ metalinguistic knowledge of stress patterns in the perception of existing English words by highly proficient German university students of English, both the ESL study and the observations from Cameroon/Nigerian English refer to English stress assignment in production. An evaluation of the findings from perception and production studies as well as from different contact situations enables us to identify potentially similar strategies across different modalities and learning scenarios, thus yielding a more comprehensive overview of variation and systematicity in English stress assignment throughout distinct linguistic contexts.

3.1 Stress production in highly proficient ESL users

The first case study we report here concerns highly advanced non-native speakers of English who had been studying and residing in the USA for at least six months at the time of data collection. We investigated which strategies these learners might use to assign stress to words that they have never come across in the second language. It seems clear that there are different strategies available (cf. section 2.2); what is of interest in this context, however, is to what extent English stress patterns differ across the different L1 groups. The data we discuss below were first reported in Altmann (2006) and are re-analysed here in the context of the central research questions raised in this paper.

Participants

A total of 80 university students from the University of Delaware participated in this study. Ten of them were native speakers of American English, all undergraduate students enrolled in introductory linguistics courses. The remaining participants were recruited from the international student community at the University of Delaware, all highly advanced learners as measured by standardised proficiency tests (Test of Spoken English and Michigan English Test) and speakers of a standard variety of their native language. In particular, there were 7 different non-native L1 groups with 10 participants each: Arabic, Turkish, French (all of them are languages with regular/predictable word stress), Spanish (a language with primarily lexical stress), Japanese, Korean, and Chinese (all of them are pitch-accent or tone languages with no word-level stress).


Methodology

Forty-six nonce words were presented to the participants in orthographic form on a printed list in a pseudo-randomised order. Each word was broken down into its individual syllables, separated from each other by a dot (•) for easier reading. The nonce words were created based on the following criteria: (1) only open CV syllables were used, (2) each word was intended to contain at least one stressable (i.e. not schwa) vowel, which in the current case was restricted to a long/tense vowel or diphthong (indicated by double letters in orthography), (3) no more than two stressable syllables should be adjacent, conforming to the English rhythm rules (cf. Liberman and Prince 1977), and (4) no syllable corresponded to any existing English word. Application of these criteria yielded three different structures of words with two syllables (5 tokens per structure = 15 tokens in total, e.g. noo•dee), four different structures of words with three syllables (4 tokens per structure = 16 tokens in total, e.g. sa•foa•na), and five different structures of words with four syllables (3 tokens per structure = 15 tokens in total, e.g. ma•ley•da•zee).3 The task of the participants was to read each word out loud twice. Prior to the study, all participants were instructed that the nonce words were potential words of English, and they practised the production task with a number of real and nonce words. In the main task, only the second production was used for analysis since this allowed the speakers to monitor and improve on their first reading in case it did not ‘sound good’ to them. The second recordings were transcribed and analysed with respect to the location of stress, which was determined by two phonetically trained (near-)native speakers of American English who listened to all the items. Inter-transcriber reliability was very high (90%) and cases of disagreement were discussed until a consensus was reached. If participants provided structures that were unintended but fell into some other structural type, these were grouped with the respective applicable alternative structure; where this was not possible, the item had to be excluded from analysis. This naturally resulted in unequal numbers of actual productions for each category and language group. Below we will only offer a qualitative analysis of the data to reach a holistic understanding of the patterns that emerged in each L1 and how those patterns compare to other L1s’ preferred stress location overall.
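The arithmetic behind the stimulus set and the reliability figure reported above can be checked with a few lines of code. This is a sketch under stated assumptions: the chapter does not say how inter-transcriber reliability was computed, so simple percent agreement is assumed here, and the two transcriber lists in the demo are invented for illustration. The names `DESIGN`, `tokens_per_length` and `percent_agreement` are likewise hypothetical.

```python
# (syllables per word, number of structures, tokens per structure),
# as described in the Methodology paragraph.
DESIGN = [(2, 3, 5), (3, 4, 4), (4, 5, 3)]


def tokens_per_length(design):
    """Map word length in syllables to the number of nonce-word tokens."""
    return {syllables: structures * tokens
            for syllables, structures, tokens in design}


def percent_agreement(first, second):
    """Percentage of items for which two transcribers chose the same
    stressed syllable (plain percent agreement; an assumed measure)."""
    if not first or len(first) != len(second):
        raise ValueError("need two non-empty codings of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


if __name__ == "__main__":
    counts = tokens_per_length(DESIGN)
    print(counts, "total:", sum(counts.values()))

    # Invented codings: index of the stressed syllable in ten items.
    transcriber_a = [0, 1, 1, 0, 2, 1, 0, 1, 3, 0]
    transcriber_b = [0, 1, 1, 0, 2, 1, 0, 1, 3, 1]
    print(percent_agreement(transcriber_a, transcriber_b))
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected coefficient would be an alternative if the raw codings were available.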

3 A list of the specific structures can be found in Appendix 1 and a comprehensive list of all stimuli can be found in Altmann (2006).

Heidi Altmann and Barış Kabak

Results

Table 1 presents the preferred stress position across words of different lengths for each structure that contained at least two potential positions of primary stress. Structures with only one full vowel are excluded, since a single stressable vowel leaves speakers no choice for or against a certain stress position. The table also provides the percentage of how often each pattern was produced overall by the speakers of the respective L1 group. In cases where there were two options within 10% of occurrence, both choices are listed. It should be noted that in words with three full vowels, a score of 50% may still yield a difference of more than 10% from either of the other two possible stress positions. As can be seen in Table 1, Turkish (6/7), French (5/7), Arabic (5/7) and Japanese (4/7) speakers produced most structures in agreement with the English native speakers’ stress patterns. English native speakers overwhelmingly preferred to stress the rightmost non-final stressable syllable, i.e., the penult if this contained a full vowel or otherwise the antepenult, a finding that has recently been confirmed by Domahs, Plag and Carroll (2014) as well. Strikingly, only four English participants actually produced the pattern CV.Cə.Cə.CV (with final primary stress placement), yielding a total of merely six tokens out of a possible total of thirty; all other English native speakers instead delivered pronunciations with differing structures for items of this type, changing one of the intended schwas into stressed tense vowels (which were subsequently counted towards those respective structural types in the current analysis) or into stressed lax vowels (making the following onset consonant ambisyllabic; these structures had to be discarded since they did not correspond to any of the structural types in our investigation).
Thus, for example, the item goo•ve•ra•dee was pronounced (unchanged) as [guːvərəˈdiː], or (changed) as [gʊˈvɛrədiː] or [guːvəˈrʌdiː]. Native speakers’ difficulty with this structure can be explained by the theoretical existence of a “three syllable window” from the right edge that is regularly available for English primary stress (see Domahs et al. 2014 for a CELEX analysis), which means that almost all English real words have primary stress on one of the three last syllables (cf. section 2.1). Since in the case of CV.Cə.Cə.CV both the penultimate and antepenultimate syllables contain schwas, and the final syllable is obviously not the preferred position for stress by native speakers, this would only leave the pre-antepenult as the last potential target here. No English participant, however, produced primary stress on this syllable; instead the majority opted for deviating pronunciations that contained a different (stressable) vowel type in the penult or antepenult syllable, their preferred

English word stress in L2 and postcolonial varieties

Table 1: Most preferred position of primary stress for nonce words by language (shaded cells indicate clear agreement⁴ with the English native group)

| Syllables | Structure | English | Turkish | French | Arabic | Chinese | Japanese | Korean | Spanish |
|---|---|---|---|---|---|---|---|---|---|
| 2 | CV.CV | penult 67.9% | penult 53.9% / final 46.1% | penult 67.2% | penult 50% / final 50% | final 71.1% | final 58.9% | final 64% / penult 56% | final 52% / penult 48% |
| 3 | CV.CV.Cə | penult 79.1% | penult 68.8% | penult 67.4% | penult 72.4% | penult 83.3% | penult 91.3% | penult 71.0% | penult 74.4% |
| 3 | CV.Cə.CV | antepenult 87.1% | antepenult 63.3% | antepenult 68.8% | antepenult 62.5% | final 70.6% | final 74.4% | antepenult 53.3% | final 57.6% |
| 4 | CV.Cə.CV.CV | penult 90% | penult 59.2% | final 54.1% | final 57.1% | final 50% / penult 45.8% | penult 50% | final 50% | final 85.7% |
| 4 | CV.Cə.CV.Cə | penult 100% | penult 95.6% | penult 81.2% | penult 100% | penult 100% | penult 96.0% | penult 89.4% | penult 90% |
| 4 | Cə.CV.Cə.CV | antepenult 80.8% | antepenult 75% | final 54.1% | antepenult 57.9% | final 65.2% | antepenult 66.7% | final 56.5% | final 55.5% |
| 4 | CV.CV.Cə.CV | antepenult 80% | antepenult 70.8% | antepenult 60% | antepenult 55% | final 72.7% | final 54.8% | final 46.2% | final 59.3% |
| 4 | CV.Cə.Cə.CV⁵ | final 100% | final 81.8% | final 81.3% | –⁶ | preantepenult 85.7% | final 100% | final 63.2% | final 94.4% |

4 “Clear agreement” was determined to be only in cases where the most preferred stress position (i) was the same as the one chosen by the English group and (ii) yielded more than 15% difference from any other alternative stress position produced for a given structure.
5 There was only a marginal number of occurrences (n = 6) of this structure provided by few English speakers (n = 4). This row is presented here for reasons of completeness but will not be considered for further analysis.
6 No Arabic speaker provided any single word of this structure.

positions for primary stress. A few speakers did obey the intended structure and had to resort to final stress for these words, as illustrated in the example above, which, however, does not provide a solid basis for a group-wise comparison of stress placement for this type of words with the other L1s. Considering the variable performance of the different L1 groups, the question arises as to why the Turkish, French, Arabic groups as well as the Japanese group showed most convergence with the native speaker population. Can this be due to the transfer of L1 prosody (e.g. employing similar stress patterns or experience with word-level lexical prominence)? Since stress placement in Arabic would yield the same stress patterns as those by the English group, we cannot be certain as to whether these learners applied L1 strategies or English-specific strategies. However, it was also the case that both the Turkish and the French participants displayed similar, target-like strategies for stress assignment for the vast majority of nonce word structures, which cannot be due to an L1-induced pattern. In the few cases where they showed divergence, however, the L1 strategy of making the final syllable prominent surfaced (both Turkish and French have regular stress on the final syllable; see Kabak and Vogel 2001 for Turkish; Dell 1985 for French). As for the Japanese learners, although they do not fully cohere typologically with the three rather “target-like” groups discussed above, their convergence with English native speakers can be due to the word-level prosody of Japanese, where pitch-accent arguably brings about a specific locus of prominence within a word, which is lexical. However, the fact that Spanish speakers, whose L1 also employs a lexical stress system, did not show target-like behavior is not consistent with an explanation based on the potential positive influence of Japanese prosody.
Altogether then, it is not possible to postulate a purely L1-based account of convergence in L2 stress production in this study. Similarly, L1 influence also does not explain the cases of non-convergence, which we turn to below. The remaining L2 groups only showed agreement with the target population if there was a schwa in the final syllable, which then necessarily had to lead to penultimate stress. What is striking, however, is the surprising overlap that can be found across the non-converging stress patterns: almost all of them are due to placing word stress on the final syllable. Since Chinese and Korean are both non-stress languages, this cannot be based on L1 strategies. It rather reflects some linear interlanguage strategy, whereby the learners put word-level prominence on the last syllable that can bear it. Thus, there were some common approaches for non-converging patterns that could be identified in all L2 groups: (i) the most prominent syllable was pushed towards the right edge (as opposed to the beginning of the word), and (ii) final stress was not an option in the case

of schwa syllables. These approaches suggest that all learners had some sensitivity to the rhythmic structure of English. However, this prosodic knowledge did not always produce patterns that converge with those of the English native speakers since prominence was pushed too far to the right, thus ignoring the exceptional character that the final syllable obviously has for native speakers of English. It can therefore be considered a unique learner strategy since the most frequent position for stress in English lexical words is the penultimate or antepenultimate syllable and final stress is statistically not a common position in the native English lexicon in general (cf. Clopper 2002). A somewhat mixed performance was that of the Spanish group, which patterned more with the non-stress languages for most structures. In the absence of orthographical marking, stress in Spanish would generally fall on the penultimate syllable, which is not observed in the data at hand: in cases where the final and the penultimate syllable contained a full vowel, the Spanish learners favoured the final syllable, which agrees with neither their L1 nor the L2. As such, Spanish learners’ English stress assignment strategies can be taken to yield a unique learner pattern, apparently the same one that the other nonconverging L1 groups (i.e. those with no word stress in the L1) applied. What can be concluded from these L2 production data is that highly proficient ESL learners often did not provide the same stress patterns as native English speakers. However, there was systematicity in the non-converging structures since they followed the same basic generalisation that was employed consistently by groups of learners from different L1s, i.e. placing primary stress on the final syllable, a pattern that was neither L1- nor L2-induced.
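The two generalisations discussed in this section can be made concrete with a short sketch. It assumes the structure notation of Table 1 (CV for a syllable with a full vowel, Cə for a schwa syllable). The function names are hypothetical, and the fallback to final stress in the first function is an illustrative assumption rather than part of the analysis reported here.

```python
# Position labels counted from the right edge of the word.
POSITIONS = ["final", "penult", "antepenult", "preantepenult"]


def stressable_from_right(structure):
    """Return one boolean per syllable (True = full vowel), right to left."""
    return [syllable != "Cə" for syllable in structure.split(".")][::-1]


def native_preference(structure):
    """Rightmost non-final stressable syllable: penult, otherwise antepenult.

    Assumption for illustration: if neither is stressable, fall back to the
    final syllable when it has a full vowel.
    """
    full = stressable_from_right(structure)
    for idx in (1, 2):  # penult first, then antepenult
        if idx < len(full) and full[idx]:
            return POSITIONS[idx]
    return "final" if full[0] else None


def learner_rightmost(structure):
    """Last syllable that can bear stress (the non-converging pattern)."""
    for idx, is_full in enumerate(stressable_from_right(structure)):
        if is_full:
            return POSITIONS[idx]
    return None


if __name__ == "__main__":
    for s in ["CV.CV", "CV.CV.Cə", "CV.Cə.CV", "CV.Cə.CV.CV",
              "CV.Cə.CV.Cə", "Cə.CV.Cə.CV", "CV.CV.Cə.CV"]:
        print(s, native_preference(s), learner_rightmost(s))
```

For the seven structures analysed, the first function returns the positions listed in the English column of Table 1, while the second returns final stress wherever the final syllable contains a full vowel and penultimate stress otherwise.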

3.2 Stress identification in highly proficient EFL users

The point of departure in the second study was our longtime personal observations on incorrect stress realisations of some contextually-familiar English words (i.e. those used frequently in academic language) by German university students who persistently produce them with penultimate stress such as hypothesis (as [haɪpɔˈθi:sɪs]), and variable (as [vəˈɹaɪəbl̩]). Given the fact that German has lexical stress, such incorrect renderings of stress are unlikely to stem from a perceptual difficulty, or a general difficulty to store stress contrasts at the word-level, as discussed above. In particular, we investigated to what extent highly proficient L2 learners of English are able to correctly identify stressed syllables in such “problem” words upon carefully focusing on their respective stress patterns, and how confident and consistent they are in their identification.

Participants

Twenty-seven university students (23 female) with German as their L1 participated in the study in 2012. All were studying English either as their primary or secondary major at the University of Würzburg in Germany, and were participating in an advanced-level seminar on Second Language Phonology and Foreign Accent in the English Linguistics department at the time of testing (the experiment was conducted in the last session of the seminar). As such, they had explicit knowledge about English phonetics and phonology as well as empirical background in L2 phonology and foreign accent.

Methodology

The participants were asked to carefully listen to the pronunciation of 30 polysyllabic English words (see Appendix 2) presented in random order by means of two loudspeakers at a comfortable volume, and pay specific attention to word stress in two consecutive tests.7 The items were recorded by a phonetically trained near-native speaker of English (the first author), who modeled her pronunciation after the pronunciation given for General American English in the English Pronouncing Dictionary (Jones 2006). After having heard all the words, the participants were told to turn over the response sheet given to them, which listed in random order all the words that they had heard in the recording. For each word, they were instructed to mark the syllable with the strongest stress. Next, they were asked to indicate, again for each word, how confident they were in their stress judgement using a Likert scale from 1 to 5 (1: no confidence, 5: very high confidence). Additionally, they were requested to come up with at least one word that is phonologically similar to each word on the list. The primary aim of this sub-task was to increase their sensitivity to the phonological properties of the words, segmentally and suprasegmentally, the results of which we will not report here. There were 5 different response sheets, all containing the same items albeit in a different order to control for any order effects, and they were assigned to the participants in a consecutive order (Subject 1-List 1; Subject 2-List 2; . . . Subject 6-List 1, and so on). The participants were given 10

7 The items mostly came from those words that we have long observed to often pose problems for German university students. A closer look at the words reveals that they either do not follow the so-called English Stress Rule (e.g. poˈlice, ˈcalendar), or constitute cognates in German and English, causing the transfer of the German stress pattern to the English equivalent (e.g. English: hyˈpothesis, aˈnalysis, German: Hypoˈthese, Anaˈlyse).

Table 2: Participant characteristics and self-assessment

| | Age | Importance of native-like accent | Importance of fluency | Speaking | Comprehension | Writing |
|---|---|---|---|---|---|---|
| Mean | 24.4 | 3.9 | 4.8 | 3.7 | 4.5 | 3.7 |
| Range | 21–29 | 3–5 | 3–5 | 3–5 | 3–5 | 3–5 |

minutes to finish the task (Test 1). After a short break, they were asked to listen to the same 30 words again and do the same task (i.e. marking the primary stress and indicating their confidence level for each word) on a second response sheet that they received (Test 2). The items on the second response sheet had a different order than the ones on the first response sheet. They again had 10 minutes to finish the task. Finally, they were given a background questionnaire which requested, among other things, information about their age, how important it is for them to have a native-like accent and to be fluent in English (5: Very Important, 4: Important, 3: Moderately Important, 2: Of Little Importance, 1: Unimportant) as well as a self-evaluation of their English skills in speaking, comprehension and writing (5: Very Good, 4: Good, 3: Acceptable, 2: Poor, 1: Very Poor). As expected, the participants reported being highly motivated to be fluent in English and to have a near-native accent. Table 2 summarises the participants’ responses.

Results

A total of 1620 responses were obtained from the two tests. For each test, we analysed the number of incorrect stress assignments as well as the confidence level given to each incorrect response. Here we focus only on overall error patterns, without going into error analyses for each word or participant. The error rate was 11.5% for Test 1 and 7% for Test 2, indicating that the participants’ stress judgements became more reliable after an additional listening session. The generally low error rates may not be surprising given that, first of all, the participants were trained in English phonology and second language acquisition, and perhaps more crucially, that the task demanded explicit attention to stress. However, given the favorable conditions for monitoring and increased metalinguistic awareness, the existence of errors is instructive to understand the pervasiveness of stress difficulties in an L2. Furthermore, errors could not completely be eradicated in Test 2 although the participants heard the words again, suggesting a level of fossilisation in the lexical representation of stress.

Table 3: Error rates in stress marking for different positions in polysyllabic words

| Syllable | % Errors in Test 1 (n=84) | % Errors in Test 2 (n=51) |
|---|---|---|
| initial | 29% (24) | 39% (20) |
| antepenult | 2% (2) | 10% (5) |
| penult | 60% (50) | 45% (23) |
| final | 9% (8) | 6% (3) |

The great majority of errors (90% in Test 1, and 88% in Test 2) came from words with three or more syllables. Of these polysyllabic words with diverging stress position, the majority of errors in both tests was due to the placement of stress on the penultimate syllable. The word that turned out to be most problematic was parenthesis, to which 63% of the participants responded with an incorrect stress pattern in Test 1. The great majority of these erroneous responses involved penultimate stress (83%). The second most problematic word was variable, which was marked incorrectly by 44% of the participants, all with antepenultimate stress. This was followed by reviewed (30%) with initial stress. Among those who incorrectly responded to parenthesis and reviewed, the great majority could correct themselves in Test 2 (83% and 63% respectively). However, the erroneous stress pattern of variable was the most pertinacious: only 33% of the participants could correct the stress pattern in Test 2. Another uniform pattern that emerged in the study involved calendar in Test 1, where all the erroneous answers (22% of all responses) involved penultimate stress; all were corrected, however, in Test 2. Table 3 summarises the distribution of errors to syllable positions in words with three or more syllables in Test 1 and Test 2. Although the number of errors in Test 2 was considerably lower in comparison to Test 1, 72% of these were due to the repetition of the same erroneous stress pattern as the first time around. The remainder was due to new errors (i.e. items that were correctly marked in Test 1, but incorrectly marked in Test 2). Interestingly, 65% of the new errors involved initial stress. Altogether then the results show that the prevalent target positions for non-converging stress placement in polysyllabic words for German speakers are penultimate or initial syllables.
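As a small worked check, the percentage distribution in Table 3 can be recomputed from the raw error counts given in parentheses there. The sketch below uses only those counts; the variable and function names are hypothetical.

```python
# Raw error counts per stress position, as given in parentheses in Table 3.
ERROR_COUNTS = {
    "Test 1": {"initial": 24, "antepenult": 2, "penult": 50, "final": 8},
    "Test 2": {"initial": 20, "antepenult": 5, "penult": 23, "final": 3},
}


def error_shares(counts):
    """Convert raw counts into percentage shares of all errors in one test."""
    total = sum(counts.values())
    return {position: 100 * n / total for position, n in counts.items()}


if __name__ == "__main__":
    for test, counts in ERROR_COUNTS.items():
        shares = error_shares(counts)
        summary = ", ".join(f"{pos} {share:.1f}%" for pos, share in shares.items())
        print(f"{test} (n={sum(counts.values())}): {summary}")
```

The counts sum to the totals stated in the table header (84 and 51), and the recomputed shares agree with the reported percentages to within about one percentage point.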
Concerning the confidence scores assigned to the erroneously marked words, Wilcoxon signed-rank tests with Holm-Bonferroni sequential correction reveal that there is a significant increase in the participants’ confidence levels from Test 1 to Test 2 when the errors are corrected in Test 2 (x̄ = 3.02 vs. 3.94, Z = –4.014, p < 0.001). However, this was also the case when the participants committed the same error (3.23 vs. 3.83, Z = –2.888, p = 0.008), as well as when they made additional errors (3.44 vs. 4.13, Z = –2.230, p = 0.03).
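For readers unfamiliar with the sequential correction mentioned above, the following is a minimal sketch of the Holm-Bonferroni step-down adjustment for a family of p-values. The function name and the input values are hypothetical and purely illustrative; they are not the raw p-values of this study, which are not reported here.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order of the input.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # Illustrative raw p-values for three comparisons (assumed, not from the study).
    raw = [0.01, 0.04, 0.03]
    print(holm_bonferroni(raw))
```

A comparison counts as significant at a given level if its adjusted p-value falls below that level, which is equivalent to the usual step-down procedure of testing the ordered p-values against successively less strict thresholds.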

In sum, this study shows that stress errors can be impervious to correction even for highly advanced L2 learners of English who have explicit knowledge about English stress and who are given an opportunity to monitor stress placement. The preponderance of initial and penultimate stress in non-convergence is on a par with some of the observations made in the production study with ESL speakers (see 3.1 above): Learners were found to employ some unique strategy that native speakers would not (e.g. final stress in the ESL study, initial (i.e. pre-antepenultimate) stress in the EFL study). In addition, they also followed and generalised target-like strategies, as illustrated by the converging preference for a penultimate stress pattern in polysyllabic words, which however led to errors in the case of the EFL study only due to the lexical, exceptional nature of stress placement in some English words (e.g. analysis, hypothesis, phenomenon). An alternative explanation for some errors may be that these words exist as cognates in German, where they have penultimate stress.

3.3 Stress assignment in postcolonial varieties of English: Cameroon/Nigerian English

The lexicon of postcolonial varieties of English often illustrates stress placement that differs from that of the Standard Englishes (i.e. Received Pronunciation and General American). Once one steps away from a prescriptive comparison with the heritage language and considers these varieties as independent grammatical systems in their own right, it becomes obvious that they follow their own patterns and regularities. These patterns may originate from a more systematic application of stress patterns present in the “old Englishes”, they may, however, also be genuine innovations. In the following, we will take a closer look at the stress systems of Cameroon/Nigerian English to support the notion that deviation from standard norms should rather be seen as variation, which in turn should motivate researchers to get a better understanding of a potential system behind this variation. As convincingly argued by Simo Bobda (1995, 2008, 2010), Cameroon and Nigerian English (as opposed to the “old English” targeted in institutional settings in these countries) can be subsumed under one common denominator regarding lexical stress in both varieties. Based on a reanalysis of an array of existing data (frequency counts as well as original experimental data), Simo Bobda (2010) provides an insightful account of stress placement in these varieties as being widely regular with respect to a number of relatively simple and transparent strategies, some of which will be presented in the following.

Many stress placement strategies in these new English varieties are incorporated from Standard Englishes (i.e. RP and GA)8, but more regularly applied, as exemplified in the following select examples. In disyllabic Cameroon/Nigerian English verb-noun pairs, for instance, the position of stress alternates just like it often does for such words in Standard English (i.e. initial stress on nouns, final stress on verbs: ˈsuspect (N) vs. susˈpect (V)), however, it is applied more systematically to nouns in the new varieties. This is evidenced in generally regular initial stress for disyllabic nouns e.g. ˈadvice, ˈextent or ˈsuccess, compared to Standard English adˈvic/se (V/N) or sucˈcess (N). Furthermore, syllable weight as a phonological factor also strongly affects stress placement, just like in Standard English, but to a larger extent, avoiding many exceptions that exist in RP or GA. This can be exemplified by words like multiˈply, reaˈlise, anˈnex or caˈlendar in Cameroon/Nigerian English (which are non-rhotic). In addition, the stress property of affixes is more streamlined than it is in Standard English, for example, the suffix -ly is consistently stress-neutral in these new varieties (ˈmilitari+ly, ˈnecessari+ly, ˈprimari+ly) but not always in Standard English (miliˈtari+ly, necesˈsari+ly, priˈmari+ly). Cameroon/Nigerian English stress placement, however, also involves strategies that can be considered truly innovative. One such innovation is that the presence of a certain vowel or consonant in a final syllable rhyme may exert an effect on stress placement – a strategy that has no parallel in RP or GA, where the specific segmental content of a syllable (as opposed to its internal structure) has no such relevance. For instance, high front vowels in the final rhyme attract stress on this syllable. This tendency can be found for (especially female) first names, e.g. Juˈdy, Doˈris, for nationalities, e.g. Iraˈqi, Somaˈli, for prefixes, e.g. 
cenˈti+grade, poˈly+syllable, miˈni+skirt, but also for simple nouns such as bisˈcuit, tenˈnis or speˈcies. In addition, there is a second stress placement strategy linked to segmental content which yields stress on final syllables ending in an alveolar nasal for (again especially female) first names, e.g. Eveˈlyn, Suˈsan.9 Another segmental strategy resulting in final stress can be found for verbs longer than two syllables with final obstruents, e.g. embarˈrass, interˈpret. As illustrated by the select phenomena above, stress placement in Cameroon/Nigerian English often differs from that in Standard English. This, however, does not mean that speakers of these new varieties fail at approaching a certain target form but rather that they apply their own strategies that are not necessarily identical with the ones in the traditional standard varieties. Such strategies could even involve genuine original innovations of the new Englishes, in addition to a more general extension or simplification of existing strategies in the old Standard Englishes, as in the case of prevalent word-initial stress in disyllabic nouns. As such, what appear to be inconsistencies may stem from an interaction of conflicting patterns, as illustrated by words with the suffix –ism, which is pre-stressing in Cameroon/Nigerian English (materiˈalism, coloniˈalism), and thus does not yield final stress despite the presence of a high front vowel.

8 These varieties are also known as “old English” (e.g. Simo Bobda 2010), or as “Inner Circle English” (e.g. Kachru 1985).
9 First names that do not have such a predisposition for final stress are regularly stressed on the penultimate syllable by default: Moˈnica, Reˈginald.

4 Discussion and conclusions

Comparing the findings of the three case studies and scenarios, the following picture emerges: Word stress can pervasively diverge from normative standards even among those who actively use the language on a daily basis, and, interestingly, even among trained learners who strive for native-like pronunciation. Furthermore, non-native English stress patterns that emerge across different L1 backgrounds may, on the one hand, show uniformity in certain aspects (e.g. preferring a relatively fixed location for stress placement), but may, on the other hand, result from unique learner-internal strategies that are neither L1- nor L2-induced (e.g. segmental content of final rhyme). Across the different varieties in our database, we find a considerable amount of within- and across-group variation with some clear default patterns, which could be either converging with or diverging from the Standard English pattern, with only marginal modulation by the L1 of the speaker (e.g. in the case of cognates). We will unpack each of these factors below and discuss their implications in the context of previous observations and findings. First, in all three studies presented above, learners apply regularisation strategies that comply with core prosodic properties of English (e.g. weight sensitivity, three syllable window, penult or antepenult preference) but may also overgeneralise across the learned lexicon. More specifically, converging patterns show a preponderance of penultimate stress placement, followed by antepenultimate stress depending on syllabic structure. Along these lines, van Rooy (2002) makes use of roughly the same principles, albeit expressed by ranked constraints, in order to account for stress patterns in Tswana English (a variety of Black South African English).
In particular, stress in Tswana English is right-aligned to the word, avoiding however the final syllable unless it is superheavy, culminating in the above mentioned core properties of English.

More specifically, in van Rooy’s Optimality Theoretic account, the ranking SUPERHEAVY >> NONFIN(ALITY) >> ALIGN-R(IGHT) ensures that penultimate syllables attract stress, with superheavy final syllables still being able to receive stress. The interplay of these constraints with the correspondence constraint OUTPUT-OUTPUT IDENT(STRESS) accounts for the fact that certain suffixes may require stress to appear on antepenultimate syllables.10 Second, there were patterns that converged neither with the L1 of the speaker nor with the target language. The above mentioned stress-attracting segmental composition of certain final syllables in Cameroon/Nigerian English (e.g. the presence of high vowels) constitutes one such innovative factor, which parallels Tswana English speakers’ tendency to stress the final syllable if it is superheavy (van Rooy 2002). Likewise, we saw a preponderance of non-target-like final stresses in the ESL study, which can be considered an extension of the strategy found in Tswana English since this also applied to CV syllables and not only to superheavy ones. Third, we have seen cognate effects insofar as the stress patterns of the highly proficient German speakers of English are concerned. This is not surprising given that, first of all, real words were used in our EFL study and, secondly, English and German share a considerable number of cognates. This is on a par with Flege and Bohn’s (1989) findings, where Spanish L2 speakers of English residing in the USA (mean LOR = 2.3 years) resembled English native speakers more closely in producing the stress alternation in able-ability (in terms of stress location, duration and intensity differences between the unstressed and stressed syllables) than in pairs like satan-satanic, where there were greater between- and within-group differences.
The authors speculated that Spanish speakers relied on the stress pattern of the Spanish cognate Satan (with final syllable stress), leading to “smaller average duration and intensity ratios for the Spanish than English speakers in the first vowels in Satan and satanic” (Flege and Bohn 1989: 57). Similarly, the non-convergence between the Spanish and English speakers in the production of the pair apply-application was also attributed to a cognate effect. While it is not possible to tease apart the role of word familiarity from cognate effects in Flege and Bohn’s (1989) study, it would be highly plausible that L2 learners acquire stress alternations on a word-by-word basis and resort to an L1-based strategy where lexical representations in the L2 are not intact. Coupled with our results, where we additionally see both rule-governed (albeit overgeneralised) and lexical effects, it would not be too far-fetched to suggest that English stress assignment, both in old varieties as well as the ones we discussed here, is a by-product of both static, stored, lexicalised patterns as well as more rule-based, computable ones. Finally, highly advanced L2 speakers also exhibit intra-speaker variability or fossilised stress representations that are impervious to correction even under high-monitor settings, which we take to be reminiscent of how non-convergence may originally have emerged in postcolonial varieties. Especially since correction or possibilities to model one’s pronunciation after speakers of old varieties were less likely in postcolonial varieties given the ecology and special sociolinguistic dynamics of language contact situations, learner-initiated generalisations that diverge from standards were prone to eventually become static. In conclusion, the comparison of three different L2 English scenarios allowed us to shed some light on what kinds of general properties English language users are able to extract regarding stress assignment in English and how much they might add to the newly emerging system independently. Word stress in World Englishes is likely to become more fixed and predictable given well known laws of simplification and regularisation in language change and contact. This is perhaps inevitable especially given that there are instances of regularisation even in old English varieties. For example, a preference for initial stress in Southern American English as well as African American English for words that are standardly stressed on other syllables (e.g. July, police, TV, insurance etc.) has been widely noted (see Kretzschmar 2008: 49; Baugh 1983: 62–64). If word stress is an evanescent feature in lexical representations and a less salient marker of linguistic identity than for instance vowels and consonants, linguistic leveling within varieties and dialect leveling across varieties (especially given increasing globalisation) might make word prosody more similar across paradigms in the lexicon.

10 According to van Rooy (2002) this happens in the case of “transparent” suffixes, for example in measuring [ˈme.sa.riŋ].
The unifying English word stress patterns in the varieties we examined here can be taken as indications of an incipient change that may eventually eradicate complexities and exceptions in word prosody.

References

Altmann, Heidi. 2006. The perception and production of second language stress: A crosslinguistic experimental study. Newark, DE: University of Delaware dissertation. http://ifla.uni-stuttgart.de/files/altmann-dissertation.pdf (accessed 07 March 2014)
Altmann, Heidi & Barış Kabak. 2011. Second language phonology. In Nancy C. Kula, Bert Botma & Kuniya Nasukawa (eds.), The Continuum companion to phonology, 298–319. London/New York: Continuum.
Archibald, John. 1992. Transfer of L1 parameter settings: Some empirical evidence from Polish metrics. Canadian Journal of Linguistics 37. 301–339.


Heidi Altmann and Barış Kabak

Archibald, John. 1997. The acquisition of English stress by speakers of nonaccentual languages: Lexical storage versus computation of stress. Linguistics 35(1). 167–181.
Baugh, John. 1983. Black street speech: Its history, structure and survival. Austin, TX: University of Texas Press.
Benrabah, Mohamed. 1997. Word-stress: A source of unintelligibility in English. International Review of Applied Linguistics in Language Teaching 35(3). 157–165.
Best, Catherine T. 1995. A direct realist view of cross-language speech perception. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 171–204. Timonium, MD: York Press.
Capliez, Marc. 2011. Experimental research into the acquisition of English rhythm and prosody by French learners. Lille, France: Université de Lille dissertation.
Chomsky, Noam & Morris Halle. 1968. The sound pattern of English. New York: Harper & Row.
Chun, Dorothy M. 2002. Discourse intonation in L2: From theory and research to practice. Amsterdam: John Benjamins.
Clopper, Cynthia G. 2002. Frequency of stress patterns in English: A computational analysis. Indiana University Linguistics Club Working Papers 2(1). https://www.indiana.edu/~iulcwp/pdfs/02-clopper02.pdf (accessed 07 March 2014)
Cutler, Anne. 1984. Stress and accent in language production and understanding. In Dafydd Gibbon & Helmut Richter (eds.), Intonation, accent and rhythm: Studies in Discourse Phonology, 77–90. Berlin: Mouton de Gruyter.
Dell, François. 1985. Les règles et les sons. Paris: Hermann.
Deterding, David. 2011. English language teaching and the lingua franca core. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS), Hong Kong, 17–21 August, 92–95.
Domahs, Ulrike, Safiye Genç, Johannes Knaus, Richard Wiese & Barış Kabak. 2012. Processing (un)predictable word stress: ERP evidence from Turkish. Language and Cognitive Processes 28(3). 335–354.
Domahs, Ulrike, Ingo Plag & Rebecca Carroll. 2014. Word stress assignment in German, English and Dutch: Quantity-sensitivity and extrametricality revisited. Journal of Comparative Germanic Linguistics 17. 59–96.
Dresher, B. Elan & Aditi Lahiri. 2005. Main stress left in Early Middle English. In Michael Fortescue, Eva Skafte Jensen, Jens Erik Mogensen & Lene Schøsler (eds.), Historical Linguistics 2003. Selected papers from the 16th International Conference on Historical Linguistics, Copenhagen, 10–15 August 2003, 75–85. Amsterdam: John Benjamins.
Dupoux, Emmanuel, Christophe Pallier, Núria Sebastián Gallés & Jacques Mehler. 1997. A destressing ‘deafness’ in French? Journal of Memory and Language 36. 406–21.
Dupoux, Emmanuel, Sharon Peperkamp & Núria Sebastián Gallés. 2001. A robust method to study stress ‘deafness’. Journal of the Acoustical Society of America 110(3). 1606–18.
Dupoux, Emmanuel, Núria Sebastián Gallés, Eduardo Navarrete & Sharon Peperkamp. 2008. Persistent stress ‘deafness’: The case of French learners of Spanish. Cognition 106(2). 682–706.
Fikkert, Paula, B. Elan Dresher & Aditi Lahiri. 2006. Prosodic preferences: From Old English to Early Modern English. In Ans van Kemenade & Bettelou Los (eds.), The handbook of the history of English, 125–150. Oxford: Blackwell.
Flege, James Emil & Ocke-Schwen Bohn. 1989. An instrumental study of vowel reduction and stress placement in Spanish-accented English. Studies in Second Language Acquisition 11. 35–62.


Flege, James Emil. 1995. Second-language speech learning: Theory, findings, and problems. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in crosslinguistic research, 233–277. Timonium, MD: York Press.
Fournier, Jean-Michel. 2007. From a Latin syllable-driven stress system to a Romance versus Germanic morphology-driven dynamics: In honour of Lionel Guierre. Language Sciences 29(2–3). 218–236.
Giegerich, Heinz J. 1992. English Phonology: An introduction. Cambridge: Cambridge University Press.
Guion, Susan G., Tetsuo Harada & J. J. Clark. 2004. Early and late Spanish-English bilinguals’ acquisition of English word stress patterns. Bilingualism: Language and Cognition 7(3). 207–226.
Gut, Ulrike. 2003. Prosody in second language speech production: The role of the native language. Fremdsprachen Lehren und Lernen 32. 133–152.
Gut, Ulrike. 2005. Nigerian English prosody. English World-Wide 26(2). 153–177.
Gut, Ulrike. 2009. Non-native speech. A corpus-based analysis of the phonetic and phonological properties of L2 English and L2 German. Frankfurt: Peter Lang.
Hahn, Laura D. 2004. Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly 38(2). 201–223.
Hayes, Bruce. 1982. Extrametricality and English stress. Linguistic Inquiry 13(2). 227–276.
Hubicka, Olga. 1980. Why bother about phonology? Practical English Teaching 1(1). 22–24.
Jenkins, Jennifer. 2000. The phonology of English as an international language: New models, new norms, new goals. Oxford: Oxford University Press.
Jones, Daniel. 2006. English pronouncing dictionary. Cambridge: Cambridge University Press.
Kabak, Barış & Irene Vogel. 2001. The phonological word and stress assignment in Turkish. Phonology 18. 315–360.
Kabak, Barış, Kazumi Maniwa & Nina Kazanina. 2010. Listeners use vowel harmony and word-final stress to spot nonsense words: A study of Turkish and French. Journal of Laboratory Phonology 1(1). 207–224.
Kachru, Braj B. 1985. Standards, codification and sociolinguistic realism: The English language in the outer circle. In Randolph Quirk & Henry Widdowson (eds.), English in the world: Teaching and learning the language and literatures, 11–30. Cambridge: Cambridge University Press.
Kijak, Anna. 2009. How stressful is L2 stress? A cross-linguistic study of L2 perception and production of metrical systems. The Hague: LOT Publications.
Kingdon, Roger. 1958. The groundwork of English stress. London: Longman.
Kretzschmar, William A. Jr. 2008. Standard American English pronunciation. In Edgar W. Schneider (ed.), Varieties of English. Volume 2: The Americas and the Caribbean, 37–51. Berlin: Mouton de Gruyter.
Liberman, Mark & Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8. 249–336.
Pater, Joe. 1997. Metrical parameter missetting in second language acquisition. In S.J. Hannahs & Martha Young-Scholten (eds.), Focus on phonological acquisition, 235–261. Amsterdam: John Benjamins.
Peperkamp, Sharon & Emmanuel Dupoux. 2002. A typological study of stress ‘deafness’. In Carlos Gussenhoven & Natasha Warner (eds.), Papers in Laboratory Phonology 7, 203–240. Berlin: Mouton de Gruyter.


Roach, Peter. 2009. English phonetics and phonology: A practical course, 4th edn. Cambridge: Cambridge University Press.
van Rooy, Bertus. 2002. Stress placement in Tswana English: The makings of a coherent system. World Englishes 21(1). 145–160.
Schwab, Sandra & Joaquim Llisterri. 2011. Are French speakers able to learn to perceive lexical stress contrasts? Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS), Hong Kong, 17–21 August, 1774–77.
Simo Bobda, Augustin. 1995. The phonologies of Nigerian English and Cameroon English. In Ayo Bamgbose, Ayo Banjo & Andrew Thomas (eds.), New Englishes: A West African perspective, 248–268. Ibadan: Mosuro Publishers.
Simo Bobda, Augustin. 2008. Predictability of word stress in African English: Evidence from Cameroon English and Nigerian English. In Augustin Simo Bobda (ed.), Explorations into language use in Africa, 161–181. Frankfurt: Peter Lang.
Simo Bobda, Augustin. 2010. Word stress in Cameroon and Nigerian Englishes. World Englishes 29(1). 59–74.
Simo Bobda, Augustin. 2011. Understanding the innerworks of word stress in RP and Cameroon English: A case for a competing constraints approach. International Journal of English Linguistics 1(1). 81–104.
Tottie, Gunnel. 2002. An introduction to American English. Malden, MA & Oxford: Blackwell.
Trouvain, Jürgen & Ulrike Gut (eds.), Non-native prosody: Phonetic description and teaching practice. Berlin & New York: Mouton de Gruyter.
Yavaş, Mehmet. 2011. Applied English phonology, 2nd edn. London: Blackwell.
Zerbian, Sabine. 2013. Prosodic marking of narrow focus across varieties of South African English. English World-Wide 34(1). 26–47.


Appendix 1: Structures of nonce words created for Study 1 (V denotes a tense vowel):
2 syllables: Cə•CV, CV•Cə, CV•CV
3 syllables: Cə•CV•Cə, CV•Cə•CV, CV•CV•Cə, CV•Cə•Cə
4 syllables: Cə•CV•Cə•CV, CV•Cə•CV•Cə, CV•Cə•CV•CV, CV•Cə•Cə•CV, CV•CV•Cə•CV

Appendix 2: List of words presented in Study 2:
Disyllabic words: confused, constant, laptop, occur, police, refer
Polysyllabic words: analysis, annual, attention, calendar, component, computer, continuous, dictionary, examine, formulate, hypothesis, interpret, manipulated, memory, paradise, parenthesis, phenomenon, proposal, qualitative, reference, reviewed, speculation, treatment, variable

Sabine Zerbian

11 Prosodic marking of focus in transitive sentences in varieties of South African English¹

1 Introduction

English is one of the official languages in South Africa and the language of teaching and learning. Nevertheless, English in South Africa is not a homogeneous variety. Despite the end of apartheid South Africa remains a heterogeneous society with the different ethnic groups in the country having their own culture and language. The resulting recognisable varieties of English that coexist in the country are still referred to with reference to ethnic groups such as Indian, Black, Coloured and White South African English. The differences in the phonological systems of these varieties have been described in the respective chapters in Mesthrie (2008). There are not only differences across different ethnic varieties, but also the English spoken within an ethnic group shows considerable linguistic variation across speakers. The present chapter focuses on the speech of Black speakers.² Black South African English (BlSAfE) emerged as an ethnic variety of South African English due to the segregation politics of apartheid South Africa (1948 to 1994). Its speakers are multilingual in one or more of the local Bantu languages and English. BlSAfE has phonological and syntactic features (van Rooy 2004) which clearly mirror the influence of the local Bantu languages. Because of this discernible influence of the local Bantu languages, this variety will be

¹ This research was carried out with funding from the German Research Foundation (DFG), grant to the SFB 632 “Information Structure” at the University of Potsdam. Thanks go to Prof. Rajend Mesthrie from the Department of Linguistics at the University of Cape Town for his support in carrying out the research, to all participants for their cooperation, to Bruce Wileman for his help in doing the recordings, to Svenja Schuermann for assistance in data annotation and analysis, and to the three reviewers for helpful and constructive feedback. All errors are my own.

² I follow the convention used in Mesthrie (2010, footnote 2) according to which black (in lower case) is “used to denote a composite group of Blacks, Coloureds and Indians once discriminated against by the apartheid state” and Black (in upper case) is used to refer to a specific ethnic group.

Sabine Zerbian, University of Stuttgart & University of the Witwatersrand, Johannesburg


referred to as an L2 variety. This variety continues to exist in present-day South Africa. However, due to Black economic empowerment and increased opportunities in education and profession, a middle class has emerged among the Black population. The linguistic norms of this group differ remarkably from BlSAfE. Although these speakers are largely still multilingual in English and one or more of the South African Bantu languages, their English does not immediately show phonological and/or syntactic markers of this multilingual background. Mesthrie (2010: 13) suggests that a new dialect has emerged and characterises this accent as “the prestige styles used by young people of colour, who have non-racial peer groups and behave in ways associated traditionally with Whites”. He proposes the sociolinguistic term “crossing over” to be used for this accent “that is not traditionally associated with people of one’s presumed ethnicity” (Mesthrie 2010: 13). This term will also be used here to refer to this “new” variety among Black speakers. More and more studies investigate segmental aspects pertaining to vowels in the “new” variety of Black middle-class speakers (Da Silva 2008; Mesthrie 2010; Wilmot 2014). They find that although white norms might be approached and even adopted by some speakers (e.g. Black females’ /u/-fronting, Wilmot 2014), the variety is not identical to White South African English (WSAfE) (Mesthrie 2010). The present study addresses the considerably lesser-studied aspect of variation in suprasegmental phonology among the group of Black multilingual speakers of both the “crossing over” variety and BlSAfE, with the aim of finding out if the same constraints on the outcome operate in English as a second language (L2) or as a “new” variety. Data from these two varieties are compared to the speech of monolingual speakers of WSAfE. The suprasegmental phenomenon under consideration is the use of prosody for the marking of focused and given constituents. 
In English, prosodic focus marking results in an increase of intensity, length and fundamental frequency on the focused constituent, making it prosodically prominent. At the same time, constituents which are given in discourse are deaccented. The Bantu languages of South Africa, which always are the first and/or additional languages of Black speakers of South African English, have not been reported to use purely prosodic means to mark focused or given constituents (Zerbian 2007 for Northern Sotho; Swerts and Zerbian 2010 for Zulu). Rather, morphosyntactic means such as left- and right-dislocation of given constituents are used (cf. Zerbian 2006 for Northern Sotho). Thus, the difference in the use of prosody between English and the South African Bantu languages makes the South African English of multilingual speakers interesting varieties to study. Zerbian (2013) investigated acoustic cues of focusing nouns and adjectives in modified noun phrases in groups similar to the ones in the current study. It was found that speakers of BlSAfE do not


manipulate F0 and intensity on the basis of focus. Speakers of the “crossing over” variety (termed “postacrolect” in Zerbian 2013) do not change intensity for focus marking. This study used semi-spontaneous speech, and the results indicate that a difference exists in the phonetic implementation of prosody across these three varieties.

The present chapter presents the results of a study which tested the same research question, this time using read speech and, more importantly, varying focus within a sentence (and not within a noun phrase as in Zerbian (2013)). The domain of deaccenting is said to vary across languages: Whereas English deaccentuates both within phrases and within sentences, Egyptian Arabic deaccentuates neither within sentences nor within phrases (Hellmuth 2005), and Italian has been reported to allow deaccenting within sentences but not within phrases (Ladd 1996: 177). It is therefore necessary to test whether the phonetic implementation of focus in these varieties also differs in sentences. At the same time, the current chapter interprets the results of both studies to address one of the leading questions of the volume, namely whether the same phonological constraints operate in English as an L2 or as a “new” variety.

The chapter is structured as follows: Section 2 provides the background concerning prosodic marking of information structure (focus and givenness) in WSAfE, by presenting the results of an experimental study for this group of speakers. It will emerge that in WSAfE focus and givenness are marked prosodically in a similar way to what has been reported for other L1 varieties of English, such as Southern Standard British English or General American English. Sections 3 and 4 present the results of the same study conducted with Black speakers of South African English for the two varieties described above. Section 5 discusses the results.

2 Prosodic marking of focus and givenness in White South African English

2.1 Background

The prosodic marking of focused constituents in transitive sentences in English is well-researched. It is generally agreed that the nuclear accent shifts to the constituent in focus (in (1a), (1b) and (1c) indicated by capital letters), resulting in a significant increase of fundamental frequency (F0), duration and intensity on the primarily stressed syllable of this constituent as compared to a non-focused rendering (see Breen et al. 2010 for a recent review). The prosodic prominence


that accompanies focus in English can in principle be realised on all lexical categories in a transitive sentence, i.e. on the subject, the verb or the object. In (1), as in the experiment to be reported on below, focus on a constituent is unambiguously established by means of a preceding wh-question. This approach follows a definition of focus as an indication of alternatives (cf. Krifka 2008).

(1) a. What may Lila mow? Lila may mow [the MEAdow]F.
    b. Who will mow the meadow? [LIla]F may mow the meadow.
    c. What may Lila do to the meadow? Lila may [MOW]F the meadow.

The following subsection lays out in detail the methodology with which the data were collected from speakers of the varieties of South African English under consideration in the present study.

2.2 Methodology

The methodology closely resembles the paradigm first used by Xu (1999) for Mandarin Chinese and subsequently applied to other languages, such as English in Xu & Xu (2005). A similar paradigm was also followed by Breen et al. (2010) for General American English.

2.2.1 Material

Five stimulus sentences were created that were comparable in segmental make-up.³ The stimulus sentences are shown in (2).

³ All sentences consist of the same number of words. The target words (subject noun, object noun and verb) show comparable initial stress, the same phonological length in the stressed vowel, and a comparable segmental make-up wherever possible. These factors were kept constant to control for segmental effects. Care was taken that the object nouns were disyllabic to allow for the realisation of both pitch accent and boundary tone without tonal crowding on the same syllable. The subject nouns matched the object nouns for number of syllables. Full verbs always had heavy syllables with either a long vowel, diphthong or a closed syllable.


(2) Stimulus sentences

item  subject      auxiliary  verb   article  object
      σ1    σ2                                σ1    σ2
1     Li-   la     may        mow    the      mea-  dow.
2     Ni-   na     may        learn  the      me-   mo.
3     Nor-  man    may        know   my       mom-  my.
4     Mo-   na     may        name   the      min-  now.
5     Nor-  ma     may        gnaw   the      mea-  lie.

To control for focus, each stimulus sentence was preceded by a question, evoking broad focus as well as focus on the subject, verb and object respectively, thus rendering the other constituents discourse-given by having been explicitly mentioned in the question. The answers were presented in writing on a screen, together with their questions, accompanied by a picture illustrating the action. Each sentence occurred once in each focus condition. One additional set was provided so that participants could become familiar with the task; it was excluded from analysis. The repetitions necessary for quantitative analyses were thus provided by the five different stimulus sentences rather than by repetitions of identical sentences. The same order of focused constituent was maintained across all five sets: broad focus, subject focus, verb focus and finally object focus. Sentences with differing focus structures followed each other directly without any intervening fillers in order to make speakers aware of the potential for ambiguity. The stimulus sentences were deliberately presented in direct contrast to each other and not randomised (cf. Xu 1999), because sometimes speakers only reliably mark contrasts when they are made aware of ambiguity (cf. Snedeker and Trueswell 2003; Breen et al. 2010: 1089). In addition, the focused target words were underlined in the answers in order to reduce errors (cf. Xu 1999). The reasoning is that if prosodic means are available to speakers, they will then know exactly where to produce them (due to underlining), and no errors occur because of the focus structure not having been processed correctly. If no prosodic means are available to the speaker, then the underlining will not be able to evoke prominence anyway.
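The blocked, non-randomised design described above can be sketched in a few lines of Python. This is only an illustration of the trial order (five sets, each running through the four focus conditions in a fixed sequence); the function name `presentation_order` and the tuple layout are assumptions made here, not part of the original experimental software, and the practice set is left out.

```python
# Fixed order of focus conditions within every set, as described in the text.
FOCUS_ORDER = ["broad", "subject", "verb", "object"]

# The five stimulus sentences from (2).
SENTENCES = [
    "Lila may mow the meadow.",
    "Nina may learn the memo.",
    "Norman may know my mommy.",
    "Mona may name the minnow.",
    "Norma may gnaw the mealie.",
]


def presentation_order(sentences, focus_order=FOCUS_ORDER):
    """Return (set number, focus condition, sentence) triples in presentation order.

    Conditions are deliberately NOT shuffled: the same sentence is read in
    all four focus conditions back to back, without fillers.
    """
    trials = []
    for set_no, sentence in enumerate(sentences, start=1):
        for condition in focus_order:
            trials.append((set_no, condition, sentence))
    return trials


trials = presentation_order(SENTENCES)
for trial in trials[:4]:
    print(trial)
```

Each sentence contributes exactly one token per focus condition, so the five sentences act as the repetitions for the quantitative analysis.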

2.2.2 Recording

Recordings were conducted in a quiet office at the Department of Linguistics of the University of Cape Town in February/March 2012. A microphone was placed approximately 20 to 30 cm in front of the participant’s mouth. Participants were asked to read out loud both the question and the answer, before proceeding to


the next slide at their own pace. In case of hesitations, the participants were asked to repeat both the question and the answer. Three sets of stimulus sentences were presented together, and after a short break, in which a different task was administered, another two sets of stimulus sentences were presented.

2.2.3 Subjects

Data were analysed from nine white speakers for whom English was the first and only language. They are speakers of General South African English (GenSAfE), the standard variety spoken predominantly by white middle-class speakers (Bekker 2009). The term GenSAfE does not refer to any specific ethnicity and is used in the present chapter when this specific reference is not considered relevant. The participants were aged between 18 and 23, and were students at the University of Cape Town in different disciplines. Five speakers were male, four were female.

2.2.4 Preparation of data analysis

The recordings were directly digitised onto the hard disk of a PC laptop for further analysis. Syllables were delineated and are the unit of display in the graphs in Figures 1 to 8. Data extraction was carried out using ProsodyPro (Xu 2013) for the software Praat (Boersma and Weenink 2012). Praat provides automatic vocal pulse marking which was manually corrected for missing or incorrect values in case of obvious octave jumps, using the ProsodyPro script. The data were first inspected visually for F0 changes over the course of the utterances. Subsequently, duration, mean F0 and mean intensity were measured for the stressed syllables of the subject, object and verb.

2.3 Results for White South African English

2.3.1 Visual inspection of F0 contours

As a first approach to the data, F0 contours averaged across stimuli and speakers were inspected for each focus condition. There are four logical possibilities of how focus and givenness can be realised prosodically: (1) through prosodic marking of the focused constituent, (2) through prosodic marking of given constituents, and (3) by prosodic marking of both the focused and the given constituents. Furthermore, speakers may (4) not use any prosodic marking at all.
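The four logical possibilities can be written down as a small decision function. The sketch below is purely illustrative: the function names and the tolerance of 5 Hz are hypothetical choices made here to show the logic, whereas the chapter itself decides on the basis of visual inspection and, later, inferential statistics.

```python
def marking_strategy(focus_raised: bool, given_lowered: bool) -> str:
    """Map the two observations onto the four logical possibilities."""
    if focus_raised and given_lowered:
        return "focus and givenness marking"
    if focus_raised:
        return "focus marking only"
    if given_lowered:
        return "givenness marking only"
    return "no prosodic marking"


def classify(baseline_hz, focus_hz, given_hz, tolerance_hz=5.0):
    """Compare F0 on a constituent in focus and when given to the broad-focus baseline.

    tolerance_hz is a hypothetical threshold, not a value from the study.
    """
    focus_raised = (focus_hz - baseline_hz) > tolerance_hz
    given_lowered = (baseline_hz - given_hz) > tolerance_hz
    return marking_strategy(focus_raised, given_lowered)


print(classify(baseline_hz=170.0, focus_hz=181.0, given_hz=151.0))
print(classify(baseline_hz=145.0, focus_hz=148.0, given_hz=124.0))
```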


The following two parameters were thus considered when inspecting the F0 contours visually:
– To what extent is there prosodic focus marking, i.e. to what extent do focused constituents show on average an increased pitch compared to a baseline of broad focus?
– To what extent is there givenness marking, i.e. to what extent is a lower pitch observable on given elements as compared to a baseline of broad focus?

The answer to the broad focus question “What is happening?” served as the baseline as it represents an all-new utterance in which no constituent is particularly in focus. The following graph shows the F0 contours averaged across the five stimulus sentences and across speakers. In order to account for gender-related differences in pitch among speakers, the F0 values obtained for each speaker were first converted to their logarithms (using the log() function in R). The logarithmic values were averaged across all sentences and speakers, and the resulting mean was converted back to Hz values for the visual display in Figure 1. In all figures, a pitch range of 130 Hz is displayed. Data are time-normalised by taking ten measures of each syllabic interval. For the labels of the x-axis see (2). The solid line represents the baseline. In comparison to the baseline we find that focus results in a higher F0 in the case of subject and verb focus for these

Figure 1: F0 average across all speakers and utterances for WSAfE


speakers of WSAfE. Givenness results in a lower F0 as compared to the baseline and moreover in postfocal deaccentuation in the case of subject and verb focus. In the case of object focus, the accent on the object is nearly as high as the one on the subject. In the broad focus context, the pitch accent on the object is realised lower than the one on the subject. In object focus context, both pitch accents are of equal height, rendering the second accent perceptually more prominent. A qualitative analysis of the pitch patterns speaker-by-speaker reveals that eight out of nine speakers mark focus and givenness prosodically by either pitch lowering only (3/9) or by both pitch lowering and pitch expansion (5/9). Only one speaker does not seem to apply either. In sum, what can be seen from the graphical display in Figure 1 is that in WSAfE we find clear marking of both focus and givenness by means of F0.
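The two preprocessing steps behind Figure 1, averaging in the log domain and time normalisation to ten points per syllable, can be sketched as follows. The chapter used R and ProsodyPro for these steps; the Python functions below (`log_mean_hz`, `time_normalise`) are hypothetical stand-ins, and the linear interpolation is an assumption about how equally spaced measures might be obtained, not a description of ProsodyPro's internals.

```python
import math


def log_mean_hz(values_hz):
    """Average F0 values in the log domain and convert the mean back to Hz.

    This equals the geometric mean, which is less dominated by
    high-pitched speakers than the arithmetic mean in Hz.
    """
    logs = [math.log(v) for v in values_hz]
    return math.exp(sum(logs) / len(logs))


def time_normalise(samples, n_points=10):
    """Resample one syllable's F0 track to n_points equally spaced values."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(samples) == 1:
        return [samples[0]] * n_points
    out = []
    for i in range(n_points):
        # Fractional index into the original samples.
        pos = i * (len(samples) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Two hypothetical speakers with very different registers.
print(log_mean_hz([100.0, 400.0]))
print(time_normalise([100.0, 200.0], n_points=3))
```

Because every syllable is reduced to the same number of points, contours from sentences with different segmental durations can be averaged point by point.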

2.3.2 Inferential analysis

In order to further examine prosodic focus and givenness marking, the three acoustic parameters relevant for focus, namely maximum F0, intensity and duration, were measured on the stressed syllables of the subject, object and the full verb. Linear mixed models (Bates and Sarkar 2007) were fitted with maximum F0/duration/intensity of the stressed syllable as the dependent variable, focus as a fixed factor and speaker as a random factor. This was done in order to investigate whether these three measures are significantly different from each other when comparing focused and given constituents to the baseline of all-new sentences. The results are presented for each of the measures in turn and are summarised in 2.4.

2.3.2.1 Maximum F0

The mean values of maximum F0 for each of the constituents across all utterances and speakers are presented in Table 1 (using log-values in the analysis, cf. 2.3.1.). The results of the linear mixed models are given below. The significance level was set at p = 0.05.

Table 1: Means of maximum F0 of stressed syllables, differentiated by focus condition (in Hz)

             baseline  focus    given prefocal        given postfocal
σ1 (subj.)   170.07    180.77   vF: 151.02; oF: 155   n/a
σ (verb)     146.28    161.83   oF: 143.73            sF: 127.98
σ1 (obj.)    144.74    147.74   n/a                   sF: 124.37; vF: 127.57

For prosodic focus marking by means of F0 it was found that
– a focused subject has a significantly higher F0 than a subject in the baseline (t = –2.66; p = 0.0086)
– a focused verb has a significantly higher F0 than a verb in the baseline (t = –4.08; p < 0.001)
– a focused object does not differ significantly in F0 from an object in the baseline (t = –0.7; p = 0.486)

For givenness marking by means of F0 it was found that
– a given subject has a significantly lower F0 than a subject in the baseline, both in the case of verb focus (VF) and object focus (OF) (vF: t = –5.12; p < 0.001; oF: t = –6.21; p < 0.001)
– a given verb only has a significantly lower F0 than a verb in the baseline if it appears postfocally, thus in the case of subject focus (sF: t = 5.46; p < 0.001), but not if it occurs prefocally as in the case of object focus (oF: t = –0.72; p = 0.4735)
– a given object has a significantly lower F0 than an object in the baseline, both in the case of verb focus (VF) and subject focus (SF) (sF: t = –5.16; p < 0.001; vF: t = –4.24; p < 0.001)
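For readers who prefer to see the model structure spelled out, the linear mixed model described in 2.3.2 can be written as below. This is a sketch of a standard random-intercept formulation; the notation is introduced here for illustration, and the assumption that speaker enters as a random intercept only (no random slopes) is not stated in the chapter.

```latex
\[
y_{ij} = \beta_0 + \beta_{c(i,j)} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{ij}$ is the measure on the stressed syllable (for F0, the log-transformed maximum) in utterance $i$ of speaker $j$, $\beta_0$ is the mean in the broad-focus baseline, $\beta_{c(i,j)}$ is the fixed effect of the focus condition of that utterance (zero for the baseline), $u_j$ is the speaker-specific intercept and $\varepsilon_{ij}$ is the residual. Each reported t-value then tests whether one $\beta_c$ differs from zero, i.e. whether that condition differs from the baseline.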

2.3.2.2 Duration

In order to examine whether duration is influenced by information structure (focus/givenness), duration was measured on the stressed syllables of the subject, the object and the full verb. The mean values across all utterances and speakers are presented in Table 2.

Table 2: Means of duration of stressed syllables, differentiated by focus condition (in ms)

             baseline  focal    given prefocal        given postfocal
σ1 (subj.)   207.8     226.5    vF: 170.4; oF: 177.3  n/a
σ (verb)     337       384.4    oF: 313.8             sF: 276.2
σ1 (obj.)    165.6     165.8    n/a                   sF: 152.1; vF: 154.7

In order to test whether the differences in duration between the different conditions are significant, a linear mixed model was fitted with duration of the stressed syllable as the dependent variable, focus as a fixed factor and both speaker and item as random factors. Item was chosen to be included as a random factor because there are some slight differences in segmental structure between the different target sentences.

For prosodic focus marking by means of duration it was found that
– a focused subject has a significantly longer duration than a subject in the baseline (t = –2.3; p = 0.0227)
– a focused verb is significantly longer than a verb in the baseline (t = –2.812; p = 0.0055)
– a focused object is not significantly longer as compared to the baseline (t = –0.044; p = 0.9651)

For prosodic givenness marking by means of duration it was found that
– a given subject has a significantly shorter duration than a subject in the baseline both in the case of verb focus and object focus (vF: t = –4.543; p < 0.001; oF: t = –3.755; p < 0.001)
– a given verb is significantly shorter only postfocally, i.e. in the case of subject focus (sF: t = –3.662; p < 0.001), not prefocally, i.e. in the case of object focus (oF: t = –1.399; p = 0.1638)
– a given object is significantly shorter both in the case of subject and verb focus (sF: t = –3.719; p < 0.001; vF: t = –2.959; p = 0.0036)

2.3.2.3 Intensity

The mean values of intensity in the stressed syllable of subject, verb and object across all utterances and speakers are presented in Table 3.

Table 3: Means of intensity of stressed syllables, differentiated by focus condition (in dB)

             baseline  focal    given prefocal        given postfocal
σ1 (subj.)   70.29     70.57    vF: 68.47; oF: 68.6   n/a
σ (verb)     67.53     67.78    oF: 66.42             sF: 64.36
σ1 (obj.)    66        65.75    n/a                   sF: 61.93; vF: 63.15

For prosodic focus marking by means of intensity it was found that
– a focused subject does not have significantly higher intensity than a subject in the baseline (t = –0.52; p = 0.6061)
– a focused verb does not have significantly higher intensity than a verb in the baseline (t = –0.53; p = 0.5937)
– a focused object does not have significantly higher intensity than an object in the baseline (t = 0.52; p = 0.6063)

For prosodic givenness marking by means of intensity it was found that
– a given subject has a significantly lower intensity than a subject in the baseline both for verb and object focus (vF: t = –3.09; p = 0.0024; oF: t = –3.22; p = 0.0016)
– a given verb has a significantly lower intensity than a verb in the baseline both pre- and postfocally (oF: t = –2.37; p = 0.019; sF: t = –6.77; p < 0.001)
– a given object has a significantly lower intensity than an object in the baseline (sF: t = –8.27; p < 0.001; vF: t = –5.73; p < 0.001)

2.4 Summary of the results for WSAfE

The inferential analysis of F0 confirms the visual impression shown in section 2.3.1. We find a significant on-focus F0 increase in this group of speakers for subject and verb focus. We also find off-focus F0 lowering when comparing a given subject to its realisation in the baseline. For the verb, F0 lowering can only be observed for postfocal but not for prefocal occurrence. The F0 lowering on given subjects renders an object prosodically prominent although it is itself not marked with on-focus pitch expansion.

For duration, a corresponding pattern emerges. The stressed syllables of the verb and the subject are significantly longer in the focus condition than in the baseline condition. Given constituents are nearly always shorter than the same constituents in the baseline condition. For the verb, a distinction between prefocal and postfocal position has to be made: only postfocal verbs are shorter than in the baseline.

For intensity, there is no significant increase on the focused constituent as compared to the baseline. We do find consistent intensity decreases on given constituents, though: intensity drops on the subject when another constituent is in focus. For verb and object, we find an intensity decrease if they occur after the focused constituent. Again, the verb shows a slight difference in intensity reduction between pre- and postfocal position, although the intensity decrease is significant in both instances.

Thus, the overall pattern of prosodic focus and givenness marking found in WSAfE is similar to the one reported for Southern Standard British English and General American English (cf. Breen et al. 2010 for a recent overview). Prosodic marking for both focus and givenness can be found. The acoustic parameters F0, duration and intensity conspire to render the focused element prosodically prominent through increased pitch, loudness and duration, through decreased values, or both, in the ways described in detail above.


Against this background, the realisation by Black speakers of South African English will be discussed in the following two sections, starting with Black speakers of the “crossing over” variety in section 3 and continuing with the L2 BlSAfE in section 4.

3 Prosodic marking of focus and givenness in the “crossing over” variety of Black speakers

3.1 Methodology and speakers

The methodology followed the one described in section 2.2 for the speakers of WSAfE. Data of seven Black speakers of the “crossing over” variety were analysed. They were aged between 18 and 22, and were students at the University of Cape Town in different disciplines. Six of them were female, one was male. All speakers were multilingual, speaking English and one or more of the South African Bantu languages (Xhosa (3), Zulu (3), Pedi (1), and Tswana (1); one speaker gave two languages). Their pronunciation was judged by two trained linguists to resemble GenSAfE, showing no obvious linguistic traces of the South African Bantu languages.

Self-reports showed that all speakers in this group had attended a private or ex-model-C high school. Both school types are predominantly White or mixed in terms of peer structure. The term “ex-model-C”, when used in describing South African schools, refers to those schools which were formerly reserved for Whites and which in the early 1990s, when these schools were opened to other race groups, elected to receive state funding of staff members, while allowing for their own policies of admission. Such schools retain a reputation for providing a better education than other public schools, and many have seen a significant influx of black students since even before the end of apartheid, leading to a more racially integrated situation in these schools, where students have more access to GenSAfE norms than is the case in government schools (cf. Hofmeyr 2000).

Recording, data preparation and analysis followed the same protocol as reported in sections 2.2 and 2.3. The results will be presented in a parallel fashion, with a summary in section 3.4.

3.2 Visual inspection

Figure 2 shows the F0 contour averaged across all stimuli sentences and all speakers for the four focus conditions. Values have been normalised for F0 and are shown in a pitch range of 130 Hz.


Figure 2: F0 averages across all speakers and utterances for the “crossing over” variety (Black speakers)

For the group as a whole, F0 is slightly increased on the focused constituents. Given subjects are realised below the baseline, given verbs are realised like the baseline, and given objects have lower F0 values. The pattern is not as clear as with the speakers of WSAfE, though.

Looking at each speaker individually, we find that two patterns emerge across the seven speakers. In two speakers,⁴ we find that F0 is manipulated depending on focus by both F0 increase on the focused constituent and F0 decrease on the postfocal constituents as compared to the baseline. The F0 pattern is similar to the one seen in WSAfE. An example is provided in Figure 3.⁵ Five speakers⁶ do not seem to manipulate F0 on the basis of focus in any systematic way, neither by marking the prominent syllable with an F0 peak nor by lowering the pitch of given elements. An example is provided below in Figure 4 (note that this speaker consistently produces high rising terminals).

⁴ Speakers of Pedi and Tswana respectively.
⁵ Inspection of the data shows that one speaker shows increased duration under focus but no consistent change in intensity. The other speaker shows both increased duration and increased intensity under focus.
⁶ Speakers of Xhosa (3) and Zulu (3). One speaker gave two languages.

Figure 3: F0 averages for speaker 58 (male, “crossing over” variety, Tswana speaker)

Figure 4: F0 averages for speaker 62 (female, “crossing over” variety, Zulu speaker)

3.3 Inferential analysis

As before, the three acoustic measures relevant for focus, namely maximum F0, intensity and duration, were measured on the stressed syllables of the subject, object and the full verb. Linear mixed models were fitted with maximum F0/duration/intensity on the stressed syllable as the dependent variable, focus as a fixed factor and speaker as a random factor. This was done in order to investigate whether each of these three measures differs significantly between focused or given constituents and the baseline. The results are presented for each of the measures in turn and are summarised in 3.4.

3.3.1 Maximum F0

The mean values across all utterances and speakers are presented in Table 4. The results of the linear mixed models are reported below.

Table 4: Means of maximum F0 of stressed syllables, differentiated by focus condition (in Hz)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 187.4 | 187.1 | vF: 180; oF: 185.2 | n/a |
| σ (verb) | 160.9 | 170.1 | oF: 160.4 | sF: 153.6 |
| σ1 (obj.) | 156.5 | 159.4 | n/a | sF: 150.18; vF: 149.65 |

For prosodic focus marking by means of maximum F0 it was found that
– a focused subject is not significantly higher in F0 than in the baseline (t = 0.66; p = 0.5125)
– a focused verb is significantly higher in F0 than in the baseline (t = –3.33; p = 0.0011)
– a focused object is not significantly higher in F0 than in the baseline (t = –0.74; p = 0.4606)

For prosodic givenness marking by means of maximum F0 it was found that
– a given subject is marginally significantly lower in F0 than in the baseline in the case of verb focus but not significantly lower in F0 in object focus (vF: t = –2.65; p = 0.0092; oF: t = –1.19; p = 0.2346)
– a given verb is only postfocally significantly lower in F0 than in the baseline, i.e. in the case of subject focus (sF: t = –2.72; p = 0.0075), not prefocally, i.e. in the case of object focus (oF: t = –0.17; p = 0.8668)
– a given object is not significantly lower in F0 than in the baseline (sF: t = –1.70; p = 0.0912; vF: t = –1.87; p = 0.0646).


3.3.2 Duration

The mean values across all utterances and speakers are presented in Table 5. The results of the linear mixed models are reported below.

Table 5: Means of duration of stressed syllables, differentiated by focus condition (in ms)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 207.25 | 217.86 | vF: 181.44; oF: 182.27 | n/a |
| σ (verb) | 373.9 | 421.4 | oF: 363.81 | sF: 288.35 |
| σ1 (obj.) | 181.9 | 172.99 | n/a | sF: 160.65; vF: 160.87 |

For prosodic focus marking by means of duration it was found that
– a focused subject is not significantly longer than in the baseline (t = 0.939; p = 0.3496)
– a focused verb is not significantly longer than in the baseline (t = –1.759; p = 0.0812)
– a focused object is not significantly longer than in the baseline (t = 1.549; p = 0.124)

For prosodic givenness marking by means of duration it was found that
– a given subject is significantly shorter than in the baseline (vF: t = –2.31; p = 0.0227; oF: t = –2.234; p = 0.0274)
– a given verb is significantly shorter than in the baseline only postfocally, i.e. in the case of subject focus (sF: t = –3.135; p = 0.0022), not prefocally in the case of object focus (oF: t = –0.373; p = 0.7096)
– a given object is significantly shorter than in the baseline in both subject and verb focus (sF: t = –3.66; p < 0.001; vF: t = –3.661; p < 0.001)

3.3.3 Intensity

The mean values across all utterances and speakers are presented in Table 6. The results of the linear mixed models are reported below.

Table 6: Means of intensity of stressed syllables, differentiated by focus condition (in dB)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 72.33 | 72.31 | vF: 72.34; oF: 71.79 | n/a |
| σ (verb) | 69.67 | 70 | oF: 68.76 | sF: 67.42 |
| σ1 (obj.) | 67.76 | 67.51 | n/a | sF: 64.89; vF: 65.29 |


For prosodic focus marking by means of intensity it was found that
– a focused subject does not have a significantly higher intensity than a subject in the baseline (t = 0.04; p = 0.9674)
– a focused verb does not have a significantly higher intensity than a verb in the baseline (t = –0.55; p = 0.5822)
– a focused object does not have a significantly higher intensity than an object in the baseline (t = 0.38; p = 0.7028)

For prosodic givenness marking by means of intensity it was found that
– a given subject does not have a significantly lower intensity than a subject in the baseline (vF: t = 0.01; p = 0.9889; oF: t = –0.95; p = 0.3433)
– only in the postfocal position does a given verb have a significantly lower intensity than in the baseline (sF: t = –3.72; p < 0.001). The same is not true for the prefocal occurrence in case of object focus (oF: t = –1.52; p = 0.1316)
– a given object has a significantly lower intensity than an object in the baseline, both in subject and verb focus (sF: t = –4.36; p < 0.001; vF: t = –3.79; p < 0.001)

3.4 Summary of the results for the “crossing over” variety of Black speakers

There is more variation and a less consistent pattern in the use of prosody for the marking of focus and givenness in the “crossing over” variety of Black speakers. Average maximum F0 is only increased in the case of verb focus as compared to the baseline. In subject and object focus, no increase of F0 takes place. This is contrary to speakers of WSAfE, who show a significantly higher F0 in the focus conditions for all constituents except objects. For F0 on given constituents, it emerges that F0 is only significantly lower on the subject when the verb is focused and on postfocal verbs. This last finding concerning the verb is in agreement with Xu and Xu’s (2005) work on English and with the findings for the speakers of WSAfE, namely that there is a clear difference between pre- and postfocal focus and givenness in verbs. Moreover, the effect in the “crossing over” variety of Black speakers is not as strong as in WSAfE.

Concerning duration, the primary stressed syllable of a focused constituent is never lengthened compared to the baseline rendition. The stressed syllables of given constituents (except a prefocal verb) are significantly shortened as compared to the baseline.

As for intensity, there is no difference in intensity on a focused constituent compared to its rendition in the baseline, similar to what was found for speakers of WSAfE. For the subject, no difference in intensity can be found when comparing given realisations to the baseline. Thus, this group of speakers does not realise the decrease in intensity observed in the group of speakers of WSAfE. For the verb and object, there is a decrease in intensity in the given realisation when the constituents occur postfocally. For the verb, a real difference emerges between prefocal and postfocal realisation. This is parallel to what was found for speakers of WSAfE.

4 Prosodic marking of focus and givenness in BlSAfE (L2 by Black speakers)

4.1 Methodology and speakers

The methodology followed the one described in section 2.2. Data of 18 speakers of BlSAfE were analysed. The speakers were aged between 18 and 25, and were students at the University of Cape Town in different disciplines. Eight were male, ten were female. All speakers were multilingual, speaking English and one or more of the South African Bantu languages (Zulu (5), Xhosa (5), Sotho (2), Pedi (3), Tswana (2), Tsonga (2), Venda (2); three speakers gave two languages). Despite the different Bantu languages represented in this group, they are considered together as Wissing (2002) argues that there are no major differences by home (or ancestral) language, at least at the segmental level. Their English pronunciation showed clear traces of the South African Bantu languages, as confirmed by the two trained linguists who also listened to the speakers of the “crossing over” variety (see van Rooy 2004 for the phonological features of this variety).

The group was slightly heterogeneous in terms of schooling background. Self-reports show that eight speakers of this variety had attended ex-DET schools in the past. The acronym “DET” stands for “Department of Education and Training” and refers to schools which were under the jurisdiction of this department and which were formerly for black students only. Such schools are often referred to as “township schools”, alluding to the fact that they usually serve township communities and are usually located in such areas. They are considered to provide an education which is lower in quality than that provided by other public schools, as these schools were under-resourced during apartheid. As the vast majority of students and teachers in these schools are black and there are very few, if any, speakers of GenSAfE in these schools, access to GenSAfE norms through socialisation is minimal, as there has been less racial integration in these schools (cf. Hofmeyr 2000). The other speakers had attended either government schools or, in two cases, ex-model-C schools.

A differentiation is commonly made in the literature on BlSAfE into mesolect and acrolect (e.g. van Rooy 2004). The acrolect, by definition, is closer to GenSAfE, but has been shown to be characterised by considerable variation in its vowel system (van Rooy 2004: 139). Observations like these led Mesthrie (2010: 28) to conclude that no stable acrolect of BlSAfE has developed. So even among educated speakers, such as the students who participated in the current study, considerable variation is expected to occur.

Recording and data preparation followed the same protocol as reported in sections 2.2 and 2.3. The results are presented in a parallel fashion, with a summary of the results in section 4.4.

4.2 Visual inspection

Figure 5 shows the pitch contour averaged across all stimuli sentences and all speakers for the four focus conditions. Again, values have been converted to their log-values for averaging, cf. 2.3.1. For the group as a whole, we see only very fine differences in F0 between focus contexts. There seems to be accentuation on postfocal constituents in both subject and verb focus with very slightly reduced F0. No clear evidence exists for on-focus F0 increase.

Figure 5: F0 averages across all speakers and utterances for BlSAfE

A speaker-by-speaker investigation shows considerable variation, as was to be expected in this group. Three of the four logical possibilities of focus marking are attested. For six speakers (6/18) F0 is manipulated on the basis of focus by both F0 increase on the focused constituent and F0 decrease on postfocal constituents in at least some cases. The marking is clearer in some cases than in others. Overall, the pitch differences are very small. An example is given in Figure 6.⁷ Four speakers (4/18) do not show a pitch peak on the focused constituent but do show (very slightly) lower pitch on some postfocal elements. Again, the differences are often rather subtle, as in the example in Figure 7.⁸ Eight speakers (8/18) do not seem to manipulate F0 depending on focus, neither by marking the prominent syllable with an F0 increase nor by lowering F0 on postfocal elements, as illustrated in Figure 8.⁹

Figure 6: F0 averages for speaker 18 (female, BlSAfE, speaker of Tswana and Tsonga)
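The log-domain averaging mentioned for the averaged F0 contours in this section can be illustrated with a short sketch. It assumes that "converted to their log-values for averaging" means taking the arithmetic mean of log-transformed F0 values and converting the result back to Hz, which is equivalent to a geometric mean; the function name and the example values are hypothetical and are not taken from the study.

```python
import math


def mean_f0_log_domain(f0_values_hz):
    """Average F0 values in the log domain and convert the result back to Hz.

    This is a geometric mean. It is an assumed reading of the averaging
    procedure, not the study's documented implementation.
    """
    if not f0_values_hz:
        raise ValueError("need at least one F0 value")
    if any(value <= 0 for value in f0_values_hz):
        raise ValueError("F0 values must be positive")
    log_mean = sum(math.log(value) for value in f0_values_hz) / len(f0_values_hz)
    return math.exp(log_mean)


# Hypothetical F0 values (in Hz) at one time point for three speakers.
example_values = [120.0, 180.0, 240.0]
log_domain_mean = mean_f0_log_domain(example_values)
arithmetic_mean = sum(example_values) / len(example_values)
print(round(log_domain_mean, 1), round(arithmetic_mean, 1))
```

Compared with a plain arithmetic mean, averaging in the log domain gives less weight to speakers with a high pitch register, which is one common motivation for this step when pooling male and female voices.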

⁷ Speakers of Tswana (2), Zulu (2), Venda (1), Tsonga (1), Xhosa (1); one speaker gave two languages. An exploratory investigation into the use of intensity and duration for this group of speakers shows that there is no systematic change of duration and intensity due to focus. The duration of a focused subject is longer but not the duration of verb and/or object. Intensity is higher only in a focused object.
⁸ Speakers of Zulu (2), Xhosa (1), Pedi (1). An exploratory investigation into the use of intensity and duration for this group of speakers shows that there are no differences in intensity and/or duration due to focus.
⁹ Speakers of Xhosa (3), Zulu (1), Pedi (2), Sotho (2), Tsonga (1), and Venda (1); two speakers gave two languages. An exploratory investigation into the use of intensity and duration for this group of speakers shows that there are no differences in intensity and/or duration due to focus.


Figure 7: F0 averages for speaker 46 (male, BlSAfE, Xhosa speaker)

Figure 8: F0 averages for speaker 24 (male, BlSAfE, Tsonga speaker)


4.3 Inferential analysis

In order to examine whether any consistent cues emerge in the prosodic marking of focus and givenness in the group as a whole, the three acoustic parameters relevant for prominence, namely maximum F0, intensity and duration, were measured on the stressed syllables of the subject, object and the full verb. Linear mixed models were fitted with maximum F0/duration/intensity of the stressed syllable as the dependent variable, focus condition as a fixed factor and speaker as a random factor. The results are presented for each of the measures in turn and are summarised in section 4.4.

4.3.1 Maximum F0

The mean values of maximum F0 across all utterances and speakers are presented in Table 7 (with their log-values used for analysis, see 2.3.1).

Table 7: Means of maximum F0 of stressed syllables, differentiated by focus condition (in Hz)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 168.52 | 162.55 | vF: 161.93; oF: 161.58 | n/a |
| σ (verb) | 149.86 | 150 | oF: 147.46 | sF: 143.1 |
| σ1 (obj.) | 137.66 | 135.82 | n/a | sF: 131.87; vF: 132.08 |

For prosodic focus marking by means of F0 it was found that
– a focused subject has a significantly lower F0 than a subject in the baseline (t = 3.55; p < 0.001)
– a focused verb does not have a significantly higher F0 than a verb in the baseline (t = –0.1; p = 0.92)
– a focused object does not differ significantly in F0 from an object in the baseline (t = 1.77; p = 0.0783)

For givenness marking by means of F0 it was found that
– a given subject has a significantly lower F0 than a subject in the baseline (vF: t = –3.93; p < 0.001; oF: t = –4.18; p < 0.001)
– a given verb only has a significantly lower F0 postfocally, i.e. in the case of subject focus (sF: t = –4.45; p < 0.001), but not prefocally, i.e. in the case of object focus (oF: t = –1.57; p = 0.1173)
– a given object is significantly lower in F0 than in the baseline (sF: t = –5.59; p < 0.001; vF: t = –5.4; p < 0.001)


4.3.2 Duration

The mean values of duration across all utterances and speakers are presented in Table 8 and tested for significance below. Item was included as a random factor in the linear mixed models because there are some slight differences in segmental structure between the different stimuli sentences.

Table 8: Means of duration of stressed syllables, differentiated by focus condition (in ms)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 171.13 | 176.9 | vF: 160.56; oF: 156.96 | n/a |
| σ (verb) | 376.73 | 363.87 | oF: 333.82 | sF: 320.43 |
| σ1 (obj.) | 217.57 | 202.6 | n/a | sF: 197.99; vF: 202.49 |

For prosodic focus marking by means of duration it was found that
– a focused subject is not significantly longer than a subject in the baseline (t = –0.933; p = 0.3513)
– a focused verb is not significantly longer than a verb in the baseline (t = 0.881; p = 0.3791)
– a focused object is significantly shorter than an object in the baseline (t = 3.358; p < 0.001)

For givenness marking by means of duration it was found that
– a given subject is only significantly shorter than in the baseline in the case of object focus (oF: t = –2.318; p = 0.021), not in the case of verb focus (vF: t = –1.714; p = 0.0875)
– a given verb is significantly shorter than in the baseline (oF: t = –2.965; p = 0.0032; sF: t = –3.849; p < 0.001)
– a given object is significantly shorter than in the baseline (sF: t = –4.349; p < 0.001; vF: t = –3.355; p < 0.001)

4.3.3 Intensity

The mean values of intensity across all utterances and speakers are presented in Table 9 and tested for significance below.

Table 9: Means of intensity of stressed syllables, differentiated by focus condition (in dB)

| | baseline | focal | given (prefocal) | given (postfocal) |
|---|---|---|---|---|
| σ1 (subj.) | 70.87 | 69.6 | vF: 69.6; oF: 70.7 | n/a |
| σ (verb) | 68.38 | 67.59 | oF: 67.53 | sF: 66.99 |
| σ1 (obj.) | 66.34 | 65.55 | n/a | sF: 64.64; vF: 64.43 |


For prosodic focus marking by means of intensity it was found that
– a focused subject has a significantly lower intensity than in the baseline (t = 3.33; p = 0.001)
– a focused verb has a significantly lower intensity than in the baseline (t = 2.43; p = 0.0155)
– a focused object has a significantly lower intensity than in the baseline (t = 2.05; p = 0.0409)

For givenness marking by means of intensity it was found that
– a given subject also has a significantly lower intensity than in the baseline (vF: t = –3.32; p = 0.001; oF: t = –3.07; p = 0.0023)
– a given verb has a significantly lower intensity than in the baseline (oF: t = –2.64; p = 0.0086; sF: t = –4.27; p < 0.001)
– a given object has a significantly lower intensity than in the baseline (sF: t = –4.38; p < 0.001; vF: t = –4.93; p < 0.001)

4.4 Overall results for BlSAfE

A lack of F0 increase can be observed on focused items in contrast to speakers of WSAfE. F0 is never higher on a focused constituent as compared to its realisation in the baseline. On a focused subject, F0 is, surprisingly, significantly lower than in the baseline. For F0 lowering on given constituents, the pattern is similar to the one observed in WSAfE, though the differences are far less clear (comparing the visual displays in Figures 1 and 5 as well as the average values in Tables 1 and 7). The most striking difference lies in the absence of focus marking by F0 and the stable initial high F0 on the subject.

Concerning duration, the stressed syllable of the verb and subject is never significantly longer in the focus condition than in the baseline. The stressed syllable of the object is even significantly shorter in the focus condition than in the baseline. Given constituents are always realised significantly shorter, except for a given subject in the case of verb focus.

An unexpected finding is the significantly higher intensity in the baseline as compared to the focused condition for all constituents. As the baseline was always elicited as the first question-answer pair in all sets of stimuli sentences, this might well be due to a generally higher intensity on the first rendering of an utterance compared to the following ones. Although this does not seem to have happened in the other groups of speakers, the intensity measures of the BlSAfE group were not taken into further consideration.


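5 Discussion

Before the results are discussed, Table 10 summarises the overall results for the three groups of speakers.

Table 10: Comparison of acoustic cues used for focus and givenness in the three varieties (– = no significant difference, ↑ = significantly higher than baseline, ↓ = significantly lower than baseline)

| | acoustic cue | constituent | WSAfE | “crossing over” | BlSAfE |
|---|---|---|---|---|---|
| Focus | F0 | S | ↑ | – | ↓ |
| | | V | ↑ | ↑ | – |
| | | O | – | – | – |
| | duration | S | ↑ | – | – |
| | | V | ↑ | – | – |
| | | O | – | – | ↓ |
| | intensity | S | – | – | n/a |
| | | V | – | – | n/a |
| | | O | – | – | n/a |
| Given | F0 | S | ↓ | VF: ↓ OF: – | ↓ |
| | | V | preFoc: – postFoc: ↓ | preFoc: – postFoc: ↓ | preFoc: – postFoc: ↓ |
| | | O | | – | ↓ |
| | duration | S | ↓ | ↓ | VF: – OF: ↓ |
| | | V | preFoc: – postFoc: ↓ | preFoc: – postFoc: ↓ | ↓ |
| | | O | ↓ | ↓ | ↓ |
| | intensity | S | ↓ | – | n/a |
| | | V | ↓ | preFoc: – postFoc: ↓ | n/a |
| | | O | ↓ | ↓ | n/a |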
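The legend of Table 10 can be read as a simple mapping from a mean comparison and a p-value to one of three symbols. The sketch below illustrates that mapping; it is not the procedure used in the study. The function name is hypothetical, and the significance threshold of 0.05 is an assumption that merely appears consistent with the reported results (p = 0.0409 is treated as significant, p = 0.0812 is not). Direction is taken from the means rather than from the sign of t, because the sign convention of the reported t-values is not stated.

```python
def cue_symbol(baseline_mean, condition_mean, p_value, alpha=0.05):
    """Return the Table 10 symbol for one comparison against the baseline.

    alpha = 0.05 is an assumed threshold, not one stated in the text.
    """
    if p_value >= alpha or condition_mean == baseline_mean:
        return "–"  # no significant difference
    return "↑" if condition_mean > baseline_mean else "↓"


# Verb values for the "crossing over" variety, maximum F0 in Hz (Table 4),
# with the p-values reported for the corresponding comparisons.
baseline_verb = 160.9
print(cue_symbol(baseline_verb, 170.1, 0.0011))  # focused verb
print(cue_symbol(baseline_verb, 160.4, 0.8668))  # given verb, prefocal (object focus)
print(cue_symbol(baseline_verb, 153.6, 0.0075))  # given verb, postfocal (subject focus)
```

Applied to the verb row for the “crossing over” variety, this yields a rise under focus, no difference prefocally and a lowering postfocally, which corresponds to the entries "preFoc: – postFoc: ↓" for given verbs in the table.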

A complex pattern of phonetic variation between the three varieties emerges from the overview in Table 10. What these results show is that all three varieties differ in their use of the three acoustic correlates of prominence. WSAfE shows similar cues to the well-described General American English (Breen et al. 2010) in using both an increase of F0 and duration on focused constituents as well as a decrease of these same parameters (and additionally intensity) for the marking of given constituents. Together, these acoustic changes amount to clearly marking the information structure of an utterance prosodically. What we see in the pattern of these speakers is what Ladd (1980: 67) refers to when he says that in English prosodic focus marking and deaccentuation can be considered “opposite sides of the same coin”.

The multilingual speakers of South African English varieties select some of the features of WSAfE and use them in specific contexts. For prosodic marking of focus, we see in the “crossing over” variety of Black speakers that only one of the acoustic correlates is selected (namely F0) and only for verb focus. Thus, we do not find any categorical use of an acoustic cue for focus. The use of a significantly lower F0 by speakers of BlSAfE in the case of subject focus and of significantly shorter duration in the case of object focus is surprising and cannot be explained in any principled way. It might be interpreted as a reflection of the fact that acoustic parameters are not manipulated in a systematic way depending on focus in this variety.

Interestingly, nearly the same pattern emerges in all three varieties for the use of the acoustic cues for given constituents. In general, F0, intensity and duration are significantly lower on given constituents. Even the different prosodic treatment of a given verb occurring prefocally and postfocally is present in the English varieties by Black speakers.

Givenness marking on postfocal constituents has been analysed in two ways in the literature: as deaccentuation and as postfocal compression. The term deaccentuation refers to a deletion of pitch accents due to discourse-specific rules (Gussenhoven 2011). Consequently, if pitch accents are deleted, the resulting F0 contour should be entirely flat. This is what we see with speakers of WSAfE (Figure 1): the F0 contour does not show any sign of a pitch accent after the focused constituent in the case of subject and verb focus. However, Ladd warns that deaccenting “does not depend on anything as straightforward as ‘failure to be assigned stress’ or ‘low pitch’” (Ladd 1980: 57). Instead he describes that the typical case of deaccenting shows a constituent whose “level of stress” appears quite reduced (Ladd 1980: 55) and is “perceived by inferring a rhythmic structure in which the deaccented item is weaker than it would be if it were not deaccented” (Ladd 1980: 57). This makes deaccentuation comparable to postfocal compression (PFC). This term goes back to the studies by Xu (1999) on focus intonation in Mandarin Chinese and refers to a compression of F0 range and intensity of post-focus constituents. The F0 contour might still show reflexes of pitch accent, albeit in a reduced range.

The statistical results show a significant lowering of F0 on given constituents as compared to the baseline. (Intensity will not be considered here given that there are no reliable data for speakers of BlSAfE.) The numbers on their own do not reveal which of the two analyses is most suited to the case at hand. The visual inspection of the F0 tracks of those Black speakers who do show some marking of givenness suggests an interpretation of deaccentuation for the Black speakers of the “crossing over” variety (speaker 58, see Figure 3) because the F0 contour remains entirely flat after the focused constituent in subject and verb focus, not giving any indication of an additional pitch accent on following content words. The BlSAfE speakers who show prosodic givenness marking still seem to realise some reduced pitch accent on postfocal content words, see e.g. the F0 track of speaker 18 (Figure 6), which shows a very slight peak on the postfocal object in verb focus and a slight peak on the verb in subject focus. Such a pattern would support an analysis as postfocal compression. These observations can only serve as an initial hypothesis for further research on this issue, which is clearly needed. But it is a very interesting hypothesis as it implies that the English varieties of Black speakers might differ in givenness marking: the “crossing over” variety uses a pattern of givenness marking similar to WSAfE, whereas the L2 variety uses a pattern where traces of pitch accents can still be seen, though reduced in range.

It is not surprising to find an absence of focus marking in the L2 variety BlSAfE. As Coulmas points out (2005: 78–79), intonation is one of the most conservative features of speech, while segmental phonetic features are more likely to change in language contact.
In the case at hand, the Southern Bantu languages do not seem to mark focus and givenness prosodically (Zerbian 2007 for Northern Sotho; Swerts and Zerbian 2010 for Zulu), and hence the absence of focus marking in BlSAfE could be considered such a persisting L1 influence.10 An alternative interpretation would be that the absence of prosodic focus marking is a universal aspect of L2 speech, given that it has been found for many L2 speakers of English with different L1s (e.g. Gut 2009). Only further research on language pairings in which both languages show prosodic focus marking can provide a final answer to this question. How does the variability observed in the prosodic patterns within the speaker groups relate to the Bantu languages of the speakers and/or their proficiency in English? As far as the Bantu languages involved are concerned, no consistent pattern emerges. Speakers of the Nguni languages (comprising Zulu and Xhosa) are among those speakers who show prosodic focus marking by means of F0 as well as among those who do not. The same holds for speakers of the Sotho-Tswana languages (comprising Sotho, Tswana, Pedi) and Tsonga. Though a more controlled study is definitely needed to carefully investigate the influence of the different Bantu languages on the prosody of English by Black speakers, the data available suggest that no phonological differences are involved. Thus, Wissing’s (2002) observation of a lack of major differences due to the ancestral language can be extended to prosody. Testing for proficiency in South African English, on the other hand, is not a trivial task, and the procedure used here can only give a first impression and should not be considered reliable enough to draw conclusions from. We administered the grammar part of the Oxford Placement Test (2004 edition), excluding the questions pertaining to question tags (due to a different usage of question tags in South African English; Minow 2010: 72). Black speakers were assigned to two groups depending on their score. The threshold was set arbitrarily at 90% of correct answers. Scores of 90% or above will be referred to as “more proficient”, scores of less than 90% as “less proficient”. The following distribution emerges:

10 Practical reasons did not allow us to test the speakers in their L1s as well, so we had to rely on the existing literature on this topic.

Table 11: F0 used for focus marking and English proficiency

                             F0 used for      only focus    used (not used
                             on/off focus                   for IS)
“crossing over” variety
  “more proficient”          2                –             2
  “less proficient”          –                –             3
BlSAfE
  “more proficient”          4                1             1
  “less proficient”          2                3             7
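The grouping rule described above (a cut-off at 90% of correct answers, followed by a count of speakers per variety and F0 pattern) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout and the three example speakers are hypothetical and are not the study's data or procedure.

```python
from collections import Counter

# Cut-off stated in the text: 90% of correct answers (set arbitrarily).
THRESHOLD = 0.90


def proficiency_group(score, threshold=THRESHOLD):
    """Map a proportion of correct answers (0.0-1.0) to a group label."""
    return "more proficient" if score >= threshold else "less proficient"


def cross_tabulate(speakers):
    """Count speakers per (variety, F0 pattern, proficiency group)."""
    return Counter(
        (variety, f0_pattern, proficiency_group(score))
        for variety, f0_pattern, score in speakers
    )


# Hypothetical records (variety, F0 pattern, test score); NOT the study's data.
speakers = [
    ("BlSAfE", "on/off focus", 0.93),
    ("BlSAfE", "only focus", 0.81),
    ("crossing over", "on/off focus", 0.90),
]
table = cross_tabulate(speakers)
```

A score of exactly 90% falls into the “more proficient” group, in line with the wording “90% or above”.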

The distribution might suggest a tendency: “more proficient” speakers predominate among those who use F0 to mark focus, whereas “less proficient” speakers predominate among those who do not use F0 or use it less. However, we do not consider our test of English proficiency representative and reliable for the South African context, and therefore refrain from drawing any conclusions concerning this issue. What is interesting from a theoretical perspective is the phonetic marking of givenness in BlSAfE. Other English L2 varieties have been reported to lack deaccentuation (Gumperz 1982 for Indian English as cited in Ladd 2008: 232; Gut 2005 for Nigerian English; and recently Gut, Pillai and Zuraidah 2013 for Malaysian English). In addition, given that many languages of the world are reported to lack deaccentuation (see Cruttenden 2006 for an overview), it can be assumed that prosodic givenness marking is a marked feature (cf. Rasier


and Hiligsmann 2007, Zerbian to appear for markedness of prosody). Marked features are difficult to acquire (Eckman 1977). Studies by Wu and Chung (2011) and by Chen, Guion-Anderson and Xu (2012) on different groups of bilingual speakers have confirmed that postfocal compression is not easily transferred from language to language. It is thus surprising to find givenness marking in the L2 variety BlSAfE. As it is a marked feature it should be difficult to acquire. The question is, though, whether the difference (which clearly emerges as statistically significant from the analysis) is phonological or solely phonetic. The interpretation for BlSAfE as postfocal compression based on the inspection of the pitch tracks already suggests that the difference might not be phonological. The smaller effect sizes suggest that differences are less salient. It also needs to be pointed out that although present phonetically, the linguistic cues used for givenness marking might not be interpreted by listeners. In Zerbian (to appear), listeners had to deduce the information-structural structure of SVO sentences uttered by speakers of WSAfE and BlSAfE in different focus conditions, solely based on the prosody of the utterances. Results show that utterances produced by speakers of BlSAfE were more often misjudged with respect to their information structure than utterances produced by speakers of WSAfE. The difference is significant and is fully in line with the production data presented in the present study. In the special case of subject focus, there was a significant difference between all three varieties, with BlSAfE again showing the highest number of cases of misjudgements. The statistical difference between the baseline and givenness marking that emerged in the current study could thus be interpreted as a mere phonetic effect. In work on focus prosody, it is suggested that focus and emphasis have much in common, not least prosodic cues. 
Emphasis can be considered as “an optional paralinguistic overlay to the prosodic realization, if any, of semantic focus” (Downing and Pompino-Marschall 2013: 666). By the same token, slight phonetic reduction of given material might be an optional paralinguistic overlay to the prosodic realization of givenness. In a study on the prosodic expression of focus in modified noun phrases, Zerbian (2013) found for comparable groups of speakers of South African English that speakers of BlSAfE do not manipulate intensity and F0 on the basis of focus (duration was not considered). This result is in line with the current study, which also did not find any prosodic marking of focus. For Black speakers of the “crossing over” variety (called postacrolect SAfE in Zerbian 2013), it was found that F0 but not intensity was changed due to focus. In the present study, F0 emerged as a possible cue to prosodic focus marking only for verb focus, not in general. It needs to be noted, though, that Zerbian (2013) investigated the parameters within the noun phrase whereas in the present study the acoustic cues were compared to a baseline of broad focus. Also, in the present study


the stimulus sentences were read out whereas Zerbian (2013) investigated semi-spontaneous speech. As already mentioned in the introduction, the two studies necessarily complement each other, as languages have been shown to differ in the domains of prosodic marking of information structure. Taken together, the two studies show that varieties of South African English spoken by Black speakers do not reliably mark focus prosodically, either in the phrasal domain or in the sentence domain. Givenness is marked within the sentence, though further research on its perception strongly suggests that it is only a slight but statistically significant phonetic difference, at least in BlSAfE. The “crossing over” variety of Black speakers emerges as both different from and similar to the L2 variety BlSAfE: it differs in some phonetic cues used for focus marking (e.g. increased F0 on the verb when in focus; see also results in Zerbian 2013), and it is similar in its overall absence of systematic focus marking by prosody, while at the same time showing prosodic givenness marking. However, the phonology of givenness marking might again be different across the two Black varieties, though more research is clearly needed. This answers one of the leading questions of this volume in the positive: It seems as if different constraints operate on English as a second language and a newly emerging variety. The exact mechanisms, however, are not yet clear. Given the sociolinguistic situation in South Africa and the ongoing change in the linguistic landscape (cf. Mesthrie 2010), the “crossing over” variety cannot be considered a stable variety. Some of its present features are documented by the findings of the current study, and it will be interesting to see whether and in which direction this variety might develop. So why is it different from GenSAfE?
Influence from the South African Bantu languages does not suggest itself as readily as in the case of BlSAfE as an explanation for why prosodic focus marking is largely absent, because we do not find segmental traces of the Bantu languages. It could be an extreme example of the resistance of prosody to change, such that a pattern known from the local Bantu languages persists even when there are no segmental traces present in the variety. Alternatively, it could be a universal feature, possibly linked to markedness considerations (cf. Zerbian, to appear). Or extralinguistic factors could be driving the absence of focus marking, such as the conscious or unconscious wish not to sound “too white” (cf. Rudwick 2008). Further phonetic and sociolinguistic research is necessary to find answers to these questions.

References

Bates, Douglas & Dipanwita Sarkar. 2007. lme4: linear mixed-effects models using S4 classes. (R package version 0.9975–11).
Bekker, Ian. 2009. The vowels of South African English. Potchefstroom: North-West University dissertation.


Boersma, Paul & David Weenink. 2012. Praat: Doing phonetics by computer. [Computer program]. http://www.praat.org
Breen, Mara, Evelina Fedorenko, Michael Wagner & Edward Gibson. 2010. Acoustic correlates of information structure. Language and Cognitive Processes 25 (7/8/9). 1044–1098.
Chen, Ying, Susan Guion-Anderson & Yi Xu. 2012. Post-focus compression in second language Mandarin. In Qiuwu Ma, Hongwei Ding & Daniel Hirst (eds.), Proceedings of Speech Prosody 2012: 6th International Conference, Shanghai, China, May 22–25, 410–413. Shanghai: Tongji University Press.
Coulmas, Florian. 2005. Sociolinguistics: The study of speakers’ choices. Cambridge: Cambridge University Press.
Cruttenden, Alan. 2006. The de-accenting of given information: A cognitive universal? In Giuliano Bernini & Marcia L. Schwartz (eds.), Pragmatic organization of discourse in the languages of Europe, 311–355. Berlin: Mouton de Gruyter.
Da Silva, Arista B. 2008. South African English: A sociolinguistic investigation of an emerging variety. Johannesburg: University of the Witwatersrand dissertation.
Downing, Laura J. & Bernd Pompino-Marschall. 2013. The focus prosody of Chichewa and the Stress-Focus constraint: A response to Samek-Lodovici (2005). Natural Language and Linguistic Theory 31 (3). 647–681.
Eckman, Fred. 1977. Markedness and the Contrastive Analysis Hypothesis. Language Learning 27 (2). 315–330.
Gumperz, John Joseph. 1982. Discourse strategies. Cambridge: Cambridge University Press.
Gussenhoven, Carlos. 2011. Sentential prominence in English. In Marc van Oostendorp, Colin J. Ewen, Elizabeth V. Hume & Keren Rice (eds.), The Blackwell companion to phonology, 2778–2806. Malden, MA & Oxford: Wiley-Blackwell.
Gut, Ulrike. 2005. Nigerian English prosody. English World-Wide 26 (2). 153–177.
Gut, Ulrike. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Gut, Ulrike, Stefanie Pillai & Zuraidah Mohd Don. 2013. The prosodic marking of information status in Malaysian English. World Englishes 32 (2). 185–197.
Hellmuth, Samantha. 2005. No de-accenting in (or of) phrases: Evidence from Arabic for crosslinguistic and cross-dialectal prosodic variation. In Sónia Frota, Marina Vigário & Maria João Freitas (eds.), Prosodies, 99–112. Berlin: Mouton de Gruyter.
Hofmeyr, Jane. 2000. The emerging school landscape in Post-Apartheid South Africa (Speech presented for the Independent Schools Association of South Africa, 30 March 2000), cited in Alan Morris, A decade of Post-Apartheid: Is the city in South Africa being remade? Safundi 5 (1–2), 2004.
Krifka, Manfred. 2008. Basic notions of information structure. Acta Linguistica Hungarica 55 (3). 243–276.
Ladd, D. Robert. 1980. The structure of intonational meaning: Evidence from English. Bloomington: Indiana University Press.
Ladd, D. Robert. 1996. Intonational phonology, 1st edn. Cambridge: Cambridge University Press.
Ladd, D. Robert. 2008. Intonational phonology, 2nd edn. Cambridge: Cambridge University Press.
Mennen, Ineke. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32 (4). 543–563.


Mesthrie, Rajend (ed.). 2008. Varieties of English 4: Africa, South and Southeast Asia. Berlin & New York: Mouton de Gruyter.
Mesthrie, Rajend. 2010. Socio-phonetics and social change: Deracialisation of the GOOSE vowel in South African English. Journal of Sociolinguistics 14 (1). 3–33.
Minow, Verena. 2010. Variation in the grammar of Black South African English. Frankfurt: Peter Lang.
Rasier, Laurent & Philippe Hiligsmann. 2007. Prosodic transfer from L1 to L2: Theoretical and methodological issues. Nouveaux cahiers de linguistique française 28. 41–66.
Rudwick, Stephanie. 2008. Coconuts and Oreos: English-speaking Zulu people in a South African township. World Englishes 27 (1). 101–116.
Snedeker, Jesse & John Trueswell. 2003. Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language 48. 103–130.
Swerts, Marc & Sabine Zerbian. 2010. Intonational differences between L1 and L2 English in South Africa. Phonetica 67 (3). 127–146.
van Rooy, Bertus. 2004. Black South African English: Phonology. In Edgar W. Schneider, Kate Burridge, Bernd Kortmann, Rajend Mesthrie & Clive Upton (eds.), A handbook of varieties of English. Volume 1: Phonology, 943–952. Berlin: Mouton de Gruyter.
Wilmot, Kirstin. 2014. “Coconuts” and the middle-class: Identity change and the emergence of a new prestigious English variety in South Africa. English World-Wide 35 (3). 306–337.
Wissing, Daan. 2002. Black South African English: A new English? Observations from a phonetic viewpoint. World Englishes 21 (1). 129–144.
Wu, Wing Li & Lisa Chung. 2011. Post-focus compression in English-Cantonese bilingual speakers. In Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China, August 17–21, 148–151.
Xu, Yi. 1999. Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27. 55–105.
Xu, Yi. 2013. ProsodyPro – A tool for large-scale systematic prosody analysis. In Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), Aix-en-Provence. 7–10.
Xu, Yi & Ching X. Xu. 2005. Phonetic realization of focus in English declarative intonation. Journal of Phonetics 33. 159–197.
Zerbian, Sabine. 2006. Expression of information structure in the Bantu language Northern Sotho. ZAS Papers in Linguistics 45. ZAS: Berlin.
Zerbian, Sabine. 2007. Investigating prosodic focus marking in Northern Sotho. In Enoch Oladé Aboh, Katharina Hartmann & Malte Zimmermann (eds.), Focus strategies in African languages: The interaction of focus and grammar in Niger-Congo and Afro-Asiatic, 55–79. Berlin: Mouton de Gruyter.
Zerbian, Sabine. 2013. Prosodic marking of narrow focus across varieties of South African English. English World-Wide 34 (1). 26–47.
Zerbian, Sabine (to appear). Syntactic and prosodic focus in contact varieties of South African English. English World-Wide.
Zerbian, Sabine (to appear). Markedness considerations in L2 prosodic focus and givenness marking. To appear in Prosody and languages in contact: L2 acquisition, attrition, languages in multilingual situations, ed. E. Delais-Roussarie, M. Avanzi & S. Herment. Berlin: Springer.

Ulrike Gut

12 Epilogue: Universal or diverse paths to English phonology?

This volume set out to bridge the gap between the disciplines investigating English as a second language, English as a third or additional language, and English as a new variety. One of the central questions to be tackled in this endeavour is: What do the phonologies of speakers of English as a second language (L2), speakers of English as a third or additional language (L3/Ln) and speakers of new English varieties have in common? What is it that sets them apart? The contributions in this volume have shown that, for both production and perception, what prevails in all three scenarios is high variability across speakers and cross-linguistic influence. Furthermore, the same factors seem to influence the three types of acquisition process. High variability across speakers of English as a second, third or additional language was found in their production of word stress (see chapter 10, this volume), their production of voice onset time (VOT, see chapter 4, this volume) and speech rhythm (see chapter 8, this volume). By the same token, a pronounced variability across speakers in new English varieties was observed with regard to rhoticity in both Brunei and Malaysian English (see chapters 2 and 3, this volume): In both varieties, fully non-rhotic speakers, fully rhotic speakers and speakers with variable rhoticity can be found. In Black South African English, speakers’ phonologies range from an accent that nearly equals that of White South African speakers to segmental and suprasegmental properties that are very distinct from it (see chapter 11, this volume). Likewise, educated Nigerian English comprises both speakers with the consonant cluster simplification strategy of vowel insertion and those whose phonologies do not have this process at all (see chapter 7, this volume).
A further commonality across English spoken as a second, third or additional language, on the one hand, and as a new variety of English, on the other hand, is the clear evidence of cross-linguistic influence (CLI). This was found for the perception of vowels (see chapter 5, this volume) and word stress (see chapter 10, this volume) as well as for the production of coda /r/ (see chapters 2 and 3, this volume), VOT (see chapter 4, this volume), prosodic marking of focus and givenness (see chapter 11, this volume), syllable structure (see chapter 6, this volume), onset consonant clusters (see chapter 7, this volume), and speech rhythm (see chapters 8 and 9, this volume). Moreover, it appears that, independent of the three acquisition scenarios of English considered here, phonological CLI is constrained by the same factors: language experience and awareness appear to foster CLI in both the perception of vowel contrasts (see chapter 5, this volume) and the acquisition of speech rhythm (see chapter 8, this volume). Moreover, the length of instruction in an L2 has an effect on VOT patterns in an L3, as found by Wrembel (chapter 4, this volume). Wrembel’s study furthermore suggests that highly proficient speakers of L2 English might exhibit CLI from their L2 onto their L1, as evidenced in VOT values in their L1 German that resemble those of native English speakers. It appears very likely that these values were influenced by the speakers’ L2 English. This would then corroborate findings by Mennen (2004), who demonstrated L2 influence on speakers’ L1 Dutch. The main factor, however, which strongly affects all speakers/learners of English in the three scenarios seems to be their norm orientation, as first proposed by Gut (2007). Some speakers of a new variety such as Nigerian English and Black South African English seem to have an endonormative orientation, as argued by Soneye and Ayoola (chapter 7, this volume) for onset cluster production and by Zerbian (chapter 11, this volume) for the prosodic marking of focus and givenness. Some English speakers in postcolonial countries, by contrast, show exonormative orientation, as L2 and L3/Ln learners in a non-immigrant setting tend to do. Deterding (chapter 2, this volume), for example, argues that rhoticity in Brunei English is predominantly caused by an orientation towards a prestigious American norm or even an orientation towards a global norm of English as a lingua franca. By the same token, Pillai (chapter 3, this volume) argues that the non-rhotic accent of Malaysian speakers of English stems from their orientation towards the British English norm.

Ulrike Gut, University of Münster
The speakers’ norm orientation can also explain much of the observed variability across speakers of English as a second language, a third/additional language or a new variety. One case in point is the puzzling fact that Black South African speakers who share their ethnic background, are equally multilingual and have the same proficiency in English have acquired two distinct phonological systems, as described in chapter 11 (this volume). What divides the speakers of ‘Black South African English’ and the ‘crossing over’ variety of Blacks in South Africa is their social norm orientation. While the first group go to schools that cater for an almost exclusively black community, speakers of the crossing over variety go to more prestigious schools that are primarily frequented by White South Africans and tend to belong to a higher social class than the Black South African speakers do. The phonologies of the two groups thus reflect the different standards in their communities of practice (Lave and Wenger 1991) as well as their different constructions of identity (see Schneider 2007: 239–241). Likewise, it might turn out that the variability in rhoticity that Deterding (chapter 2, this


volume) found in Brunei English speakers can be explained by their differing norm orientation towards American or global models. Moreover, the single Nigerian speaker included in Soneye and Ayoola’s study (chapter 7, this volume) who was born in Britain and had lived there for the first five years of his life, showed distinct differences in onset cluster production from all other Nigerian speakers. Although he is still a child and might alter some aspects of his phonology later in life, this case also suggests that for him it is the norm orientation towards British English that leads him not to reduce onset clusters. The Norm Orientation Hypothesis can even be invoked to explain the differences in English syllable structure between English loan words in Bangla and English words produced by Bangla L2/L3 speakers of English (see chapter 6, this volume). As Nagarajan shows, norm orientation towards the L1 (or a highly ranked L1 Faithfulness constraint in OT terminology) causes English loan words in Bangla to undergo substantial phonological adaptations such as consonant cluster deletion, consonant gemination and vowel epenthesis. When Bangla L1 speakers speak English, however, their norm orientation is focused on English phonology (or their L2 faithfulness constraints are ranked highest) so that consonant clusters are produced and gemination does not occur. Can we conclude from this that the path to acquiring English phonology is the same across learners of English as a second language, a third/additional language and a new variety? This would certainly be premature. 
It is true that some universal strategies in the acquisition of word stress such as a preponderance of penultimate stress placement or the stress-attracting segmental composition of certain final syllables (see chapter 10, this volume), and universal strategies in the acquisition of VOT, such as effects of the place of articulation and the vowel context (see chapter 4, this volume), have been found in these three groups. In contrast, other studies have provided compelling evidence that the three types of English speakers differ in crucial aspects. Kopečková (chapter 5, this volume) showed that child learners of English as a second language differ sharply in their perception of vowel contrasts from learners of English that simultaneously acquire further languages. Multilingual learners of English are better at discerning differences between English vowels than monolingual learners of English are, which is likely due to their greater language experience and awareness. By the same token, Gabriel, Stahnke and Thulke showed that learners of English as an L3/Ln acquire the speech rhythm of English more easily if one of their earlier languages has similar rhythmic properties (chapter 8, this volume). What does this mean for the second research question of this volume? How useful is it then – from a phonological point of view – to continue using the distinction between English as an L2, English as an L3/Ln, and English as a new variety – together with related distinctions such as English as a second


language (ESL) and English as a foreign language (EFL) (see e.g. McArthur 1998: 42–43)? The results of some studies in this volume suggest that a distinction between English as an L2 and as a new variety is difficult to uphold. For speakers with the same linguistic background, some of the studies in this volume show that there is a clear link between the properties of L2 speech and new English varieties: Gabriel, Stahnke and Thulke (chapter 8, this volume), for example, found striking similarities between the rhythmic properties of English spoken by Chinese L2 learners and that spoken by speakers of English in Taiwan, Hong Kong and Singapore (where the majority of speakers are of Chinese ethnicity). Likewise, Altmann and Kabak (chapter 10, this volume) speculate that it is the fossilised stress patterns that highly advanced L2 speakers of English produce and that are resistant to contrary evidence that might have led to the establishment of innovative stress patterns of English words in the new varieties of English. Conversely, some studies in this volume show that the term ‘new English variety’ can comprise speakers who speak English as their native and possibly even only language (see chapters 3 and 7 for English in Malaysia and Nigeria respectively), which indicates the usefulness of differentiating between English as an L2 and as a new variety after all. Other varieties of English that have a sizable or growing number of L1 speakers include Singapore English, Ghanaian English (Huber 2008) and Liberian English (Singler 2008), but research on whether L1 speakers of a new variety show less variability and possibly different acquisition outcomes is still scarce. If this were indeed the case, the time would have come to differentiate between varieties of English that consist exclusively of L2 (or L3/Ln) speakers and those varieties that also have L1 speakers.
It would be necessary then to coin a new term for such (sub-)varieties of English that have both L1 and L2/L3/Ln speakers and to include this new concept in models of new varieties of English. The recognition of such heterogeneous speaker communities with different norm orientations and thus, as shown in this volume, with different phonologies strongly suggests a reconsideration of traditional classifications such as Kachru’s (1992) Three Circles of English. This model divides the forms of English spoken around the world into the Inner Circle, which includes countries where English is spoken as a primary language that is norm-providing, the Outer Circle, where English is spoken as a second language and serves important roles, e.g. in government, legislature, media, business and education, and the Expanding Circle, where English functions as a foreign language that is norm-dependent and that is used mainly in foreign language classrooms. While this and similar models have superbly served their purpose of awarding the structures of the new varieties of English the status of innovations worthy of linguistic study and


thus removing from them the label of ‘learner errors’, they can no longer adequately represent present-day realities (see also the criticism in Bruthiaux 2003 and Yano 2009). New models are required that incorporate issues such as the growing numbers of L1 English speakers in countries traditionally placed in the Outer Circle such as Nigeria and Malaysia (see chapters 3 and 7, this volume), the wide range of speaker communities within these multi-ethnic and multilingual societies (see chapter 11, this volume) as well as the dynamics of a digitalised and globalised world that allows new forms of interaction and communication. As Deterding (chapter 2, this volume) argues, the classification of varieties as Outer Circle or Expanding Circle is becoming less important and should be replaced by a more fine-grained description based on individual speaker communities that differ in their norm orientation. As far as the distinction between speakers of English as an L3/Ln and speakers of a new English variety is concerned, the case seems to be straightforward: those speakers of a postcolonial variety of English who are multilingual and acquired English as an additional language are speakers of English as an L3/Ln. For this group, cross-linguistic influence between all of their languages will occur in the same manner and be constrained by the same factors as for other L3/Ln speakers of English who do not live in a postcolonial English-speaking country. The fact that English has a different status in ‘Outer Circle’ countries, which might lead to different directions and degrees of CLI, can easily be conceptualised within multifactorial models developed for L3/Ln acquisition (e.g. De Angelis 2007; Cenoz 2001). Thus, the only distinction between speakers of English as a second, a third/additional language and a new variety that has been proven empirically valid, at least as far as the acquisition of phonology is concerned, is that between L2 and L3/Ln speakers of English.
As Kopečková (chapter 5, this volume) has shown, those two groups of learners differ in their cognitive means both for perceiving speech sounds/sound contrasts and forming mental categories for them. Equally, they have different preconditions for acquiring phonological properties of their target language, as shown by Gabriel, Stahnke and Thulke (see chapter 8, this volume) for the acquisition of speech rhythm. This is due to the fact that, as shown in many previous studies as well as Wrembel’s contribution in this volume (see chapter 4), CLI in L3/Ln learners differs sharply from CLI in L2 learners by manifesting itself in more differentiated ways and by being constrained by a greater number of factors. Which theoretical and methodological conclusions can be drawn from the contributions to this volume? It appears vital that the focus of each discipline be widened and that a transfer of methods across the three disciplines of SLA research, L3/Ln acquisition research, and research in new varieties of English


should occur. As the chapters by Wrembel and Altmann and Kabak have shown (see chapters 4 and 10, this volume), this will unearth new insights. For example, in L2 studies, the other languages of the speakers need to be taken into consideration as they will influence (a) the perception of L2 contrasts (see chapter 5, this volume), (b) the production of phonological properties in the target language (e.g. chapter 8, this volume) as well as (c) the phonologies of other languages they speak (see chapter 4, this volume). As a consequence, it needs to be stressed that models of L2 phonological acquisition such as Flege’s (1995) Speech Learning Model or the models proposed by Best (1995) and Brown (1998, 2000) can only be applied to speakers who learn their first (!) non-native language. In order to extend the applicability of these models to L3/Ln acquisition of phonology, the manifold possibilities and constraints of CLI in multilingual speakers must be taken into account. Moreover, the contributions to this volume (see chapters 2, 3, 7 and 11 in this volume) suggest that in studies on the phonologies of new varieties of English differences across speakers should be taken into account more carefully. It appears useful to differentiate between L1 and L2/L3/Ln speakers of a new variety (see chapter 7, but see contrary evidence in chapter 3). Moreover, as shown by studies of other multilingual speakers (see chapters 4, 5 and 8 in this volume), non-linguistic factors such as length of instruction and language experience as well as cross-linguistic influence should be considered when describing the phonological properties of new varieties of English. Equally importantly, the elicitation of speakers’ norm-orientation should always be part of a study on new English varieties. Adequate methods for doing so still need to be developed, though. With many questions still unresolved, some contributions in this volume have even raised more. 
For example, Deterding’s study (chapter 2, this volume) showed the independence of individual phonological features such as rhoticity, vowel inventory, /t/-deletion and TH-stopping in phonological acquisition. Similar findings were reported by Gut (2009) for the areas of consonant cluster reduction, intonation and vowel reduction. Much more research is therefore needed on the interplay between the different phonological aspects of a target language during acquisition. We hope that this volume has put a first silken thread across the gulf that still divides the disciplines of second language acquisition, third or additional language acquisition, and World Englishes. We eagerly await future efforts of a similar kind that will help to transform this silken thread into a solid bridge.


References

Best, Catherine. 1995. A direct realist view of cross-language speech perception. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 171–204. Timonium, MD: York Press.
Brown, Cynthia. 1998. The role of the L1 grammar in the L2 acquisition of segmental structure. Second Language Research 14 (2). 136–193.
Brown, Cynthia. 2000. The interrelation between speech perception and phonological acquisition from infant to adult. In John Archibald (ed.), Second language acquisition and linguistic theory, 4–63. Oxford: Blackwell.
Bruthiaux, Paul. 2003. Squaring the circles: Issues in modeling English worldwide. International Journal of Applied Linguistics 13 (2). 159–178.
Cenoz, Jasone. 2001. The effect of linguistic distance, L2 status and age on cross-linguistic influence in third language acquisition. In Jasone Cenoz, Britta Hufeisen & Ulrike Jessner (eds.), Cross-linguistic influence in third language acquisition: Psycholinguistic perspectives, 8–20. Clevedon, UK: Multilingual Matters.
De Angelis, Gessica. 2007. Third or additional language acquisition. Clevedon, UK: Multilingual Matters.
Flege, James Emil. 1995. Second-language speech learning: Theory, findings, and problems. In Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 233–277. Timonium, MD: York Press.
Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26 (3). 346–359.
Gut, Ulrike. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.
Huber, Magnus. 2008. Ghanaian English: Phonology. In Rajend Mesthrie (ed.), Varieties of English. Volume 4: Africa, South and Southeast Asia, 67–92. Berlin: Mouton de Gruyter.
Kachru, Braj B. 1992. The other tongue: English across cultures, 2nd ed. Champaign-Urbana: University of Illinois Press.
Lave, Jean & Etienne Wenger. 1991. Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
McArthur, Tom. 1998. The English languages. Cambridge: Cambridge University Press.
Mennen, Ineke. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32 (4). 543–563.
Schneider, Edgar W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79 (2). 233–281.
Singler, John Victor. 2008. Liberian Settler English: Phonology. In Rajend Mesthrie (ed.), Varieties of English. Volume 4: Africa, South and Southeast Asia, 102–114. Berlin: Mouton de Gruyter.
Yano, Yasukata. 2009. The future of English: Beyond the Kachruvian three circle model. In Kumiko Murata & Jennifer Jenkins (eds.), Global Englishes in Asian contexts: Current and future debates, 208–226. Basingstoke-New York: Palgrave Macmillan.

Index

Bangla 91–107
bilingual advantage 72, 78, 84, 85
Brunei English 10–12, 29
Chinese 29, 30, 31, 72, 135–138, 166, 167, 168
consonant cluster 12, 15, 72, 95, 108–111, 120–122
consonant cluster reduction 122, 124
cross-linguistic influence 2–4, 41–43, 74, 85, 156, 166, 241, 245, 246
disyllabic trochee constraint 99, 101, 102
elision 124, 126, 127, 128
endonormative 25, 29, 35, 117, 175, 176, 242
English as a foreign language, EFL 9, 23, 25, 29, 167, 168, 170, 185, 188, 244
English as a second language, ESL 9, 12, 23, 24, 118, 167, 169, 170, 176, 185, 190, 244
epenthesis 95, 96, 100, 108–110, 243
exonormative 29, 35, 37, 169, 242
F0 138, 211, 214, 234–238
focus 210–212
fossilisation 197
French 45, 49, 72, 94, 112, 136, 139, 140, 165, 188–189
gender 14, 215
German 46, 136, 137, 138, 140, 165, 166, 195–196
givenness 211
Indian English 66, 171, 176, 236
information structure 211
Inner Circle English 9, 10, 15, 18, 200, 244
intensity 138, 170, 175–176, 202, 210–211
Irish English 76–78
learner English 1, 120, 167
learner language 167–168
lingua franca 2, 17, 19, 155, 188, 242
loanword 5, 91–113

Malaysian English 4–5, 23–29, 31, 33, 36–37
metalinguistic awareness 74, 197
multilingual 2, 3, 5, 6, 23–24, 27, 41–43, 45–47, 49–50, 55, 50–61, 65–67, 72–74, 78, 85–87, 94–95, 129, 135–136, 140–142, 144, 146–150, 154–157, 209–210, 220, 226, 234, 242–243, 245–246
multilingual awareness 6, 153–4, 157
Nigerian English 6, 117–118, 120–122, 124, 126–129, 186, 189–190, 199–202, 236, 241–242
non-rhotic 5, 13–18, 26, 28, 30, 32, 35–37, 200, 241–242
nPVI-V 168–170
onset (of a syllable) 5–6, 32, 45–46, 50–52, 56, 60–61, 95–96, 99, 103–106, 109–112, 117–118, 120–129, 139, 192, 241–243
Optimality Theory 92, 95, 113
Outer Circle English 9, 244–245
penultimate stress 194–195, 198–199, 201, 243
phonological awareness 71, 136, 144, 150, 153–154, 157
Polish 5, 9, 46, 61, 63, 66, 71, 75–84, 189
postvocalic /r/ 29
proficiency 5, 24, 42–45, 47, 50, 57–58, 61–62, 64–66, 72, 78–81, 113, 123, 128, 149, 185, 190, 235–235, 242
prosody 4, 135, 138, 154, 185, 188, 194, 203, 210–211, 225, 236–238
psychotypology 42
rhoticity 4–5, 10–18, 25–37, 241–242, 246
rhythm metrics 140, 144, 149, 166–168, 170
second language acquisition 1–3, 41, 44, 48, 60–63, 91–92, 94–95, 197, 246
Singapore English 23, 28–30, 34, 36, 155–156, 169, 244


sonority 6, 109–110, 121, 166–167, 170–176, 183
South African English 6–7, 66, 201, 209–21, 214, 220, 234, 236–238, 241–242
speech perception 78, 188
speech rhythm 6, 136, 138–142, 146, 149–150, 154–157, 165–168, 172, 174–177, 241–243, 245
stress assignment 6, 185–186, 189–190, 194–195, 197, 199, 202–203
stress deafness 189
stress error 199
stress production 188, 190, 194
stress-timed 138–142, 152, 154–155, 157, 166–170, 172, 174–176
syllable structure 5, 97–99, 106, 108, 110, 121, 139, 144, 165, 241, 243
syllable-timed 138–142, 152, 155–157, 165–172, 174–176

third language acquisition 41, 44–45, 48–49, 63, 65, 93–94, 108
Three Circles of English 244
three-syllable window 187, 192, 201
transfer 2, 6, 41–44, 46, 48, 61, 63, 65–66, 72–75, 85, 93, 117, 129, 135–136, 138, 140–141, 154, 157, 188, 194, 245
typology 45, 62–63, 67, 97, 99, 103, 117, 137
VarcoV 139, 144–150, 155–156, 166, 168–169
voice onset time, VOT 5, 44–45, 46–58, 60–67, 241–243
vowel insertion 124–125, 127, 241
vowels 4–6, 11–12, 16–18, 30, 32–33, 50, 56, 76–78, 80, 82–84, 100, 104, 109–110, 112, 122, 126, 155–156, 170, 192, 200, 202–203, 210, 241, 243
word-level stress 185–186, 190