A Sociophonetic Approach to Scottish Standard English
Varieties of English Around the World (VEAW) issn 0172-7362
A companion monograph series devoted to sociolinguistic research, surveys and annotated text collections. The VEAW series is divided into two parts: a text series contains carefully selected specimens of Englishes documenting the coexistence of regional, social, stylistic and diachronic varieties in a particular region; and a general series which contains outstanding studies in the field, collections of papers devoted to one region or written by one scholar, bibliographies and other reference works. For an overview of all books published in this series, please see http://benjamins.com/catalog/veaw
Editor
Stephanie Hackert
University of Munich (LMU)
Editorial Board
Manfred Görlach
Cologne
Rajend Mesthrie
University of Cape Town
Peter L. Patrick
University of Essex
Edgar W. Schneider
University of Regensburg
Peter Trudgill
University of Fribourg
Walt Wolfram
North Carolina State University
Volume G53 A Sociophonetic Approach to Scottish Standard English by Ole Schützler
A Sociophonetic Approach to Scottish Standard English Ole Schützler University of Bamberg
John Benjamins Publishing Company Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
DOI 10.1075/veaw.g53
Cataloging-in-Publication Data available from Library of Congress: LCCN 2015008369 (print) / 2015011363 (e-book)
ISBN 978 90 272 4913 5 (Hb)
ISBN 978 90 272 6858 7 (e-book)
© 2015 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents

List of tables
List of figures
List of equations
List of abbreviations
List of other symbols
Acknowledgements

Chapter 1 Introduction
  1.1 Scottish English: Previous research
  1.2 The present study in the context of Edinburgh
  1.3 The variables under investigation
  1.4 Research in a sociophonetic framework
  1.5 Overall contribution of the present study
  1.6 Structure of the book

Chapter 2 Scottish Standard English in context
  2.1 The Scottish English language continuum
    2.1.1 Scots and Scottish Standard English
    2.1.2 Status and definition of SSE
    2.1.3 ‘Drifting’
  2.2 SSE as a double contact variety
  2.3 Characteristics of the SSE accent
    2.3.1 The sound inventory
    2.3.2 Criterial and optional features of SSE

Chapter 3 Explaining accent variation and change
  3.1 Accent contact and change by accommodation
  3.2 Internal and external factors in variation and change
    3.2.1 Age, gender and contact
    3.2.2 Style
    3.2.3 Ease of articulation and clarity
    3.2.4 Frequency effects
    3.2.5 Other internal factors
  3.3 Towards a unified model

Chapter 4 Data and methodology
  4.1 Data collection
    4.1.1 Speakers and styles
    4.1.2 Types and tokens
    4.1.3 Recording, processing and transcription
  4.2 Analysing acoustic vowel data
    4.2.1 Making vowel measurements
    4.2.2 Vowel transformation
    4.2.3 Acoustic vowels as variables
  4.3 Auditory analyses of (r)
  4.4 Multilevel modelling
    4.4.1 The hierarchical (generalised) linear model
    4.4.2 Model output and diagnostics
    4.4.3 Model-building

Chapter 5 The research context for /e/ and /o/
  5.1 The acoustics and perception of diphthongs
  5.2 Historical developments of (e) and (o)
  5.3 /e/ and /o/ in Scotland
    5.3.1 Early sources and textbooks
    5.3.2 Contemporary empirical research
  5.4 Summary and research questions

Chapter 6 Statistical analyses of (e) and (o)
  6.1 Descriptive statistics
  6.2 Multilevel analyses of (e) and (o)
    6.2.1 Social effects
    6.2.2 Stylistic effects
    6.2.3 Language-internal effects
  6.3 Discussion of results

Chapter 7 The research context for (r)
  7.1 Variability of /r/
  7.2 Historical developments of /r/
  7.3 /r/ in Scotland
    7.3.1 Early sources and textbooks
    7.3.2 Contemporary empirical research
  7.4 Linking /r/
  7.5 Summary and research questions

Chapter 8 Statistical analyses of (r)
  8.1 Descriptive statistics
  8.2 Multilevel analyses of (r)
    8.2.1 Social effects
    8.2.2 Stylistic effects
    8.2.3 Language-internal effects
  8.3 Discussion of results

Chapter 9 Conclusion: Variation and change in SSE
  9.1 Summary of central findings
  9.2 The SSE-SSBE continuum – fact or fiction?

References

Appendices
  A. Fieldwork material
    A.1 Reading passage
    A.2 Word list
    A.3 Questionnaire
  B. Sample transcript
  C. Token numbers
    C.1 Token numbers of (e) and (o)
    C.2 Token numbers of (r)
  D. Independent variables
    D.1 Technical definitions
    D.2 Normal values

Index
List of tables

Table 1.1 Demographic data for Edinburgh and Glasgow
Table 2.1 The consonants of SSE
Table 4.1 Distribution of speakers by age, gender, and contact
Table 4.2 Types used in the word list
Table 4.3 Types used in the reading passage
Table 4.4 Token numbers for (e), (o) and (r) by style
Table 5.1 Historical developments of present-day /e/ and /o/
Table 6.1 Descriptive statistics for (e) and (o)
Table 6.2 Hierarchical linear regression of (e)
Table 6.3 Hierarchical linear regression of (o)
Table 8.1 Gross frequencies of variants of (r) by subvariables and speaker groups
Table 8.2 Frequencies of variants of coda (r) used by individual speakers
Table 8.3 Frequencies of variants of linking (r) used by individual speakers
Table 8.4 Frequencies of variants of onset (r) used by individual speakers
Table 8.5 Nested hierarchical logistic regression of coda (r)
Table 8.6 Nested hierarchical logistic regression of linking (r)
Table 8.7 Hierarchical logistic regression of onset (r)
List of figures

Figure 2.1 Scots, SSE and SSBE: Two models of contact
Figure 2.2 The vowels of SSE
Figure 3.1 Complementary stylistic continua: ATS and SAT
Figure 3.2 SSE-SSBE contact: Internal and external factors
Figure 4.1 Measuring points in the words bait and no
Figure 4.2 Variants of the subvariables coda (r), linking (r) and onset (r)
Figure 4.3 The 3-level structure of the dataset in the present study
Figure 4.4 Tri-categorical outcomes as a result of two binary choices
Figure 5.1 /e/ and /o/ in SSBE/RP (Deterding 1990) and SSE (McClure 1995)
Figure 6.1 Mean positions and variability of T1 and T2 of (e) and (o)
Figure 6.2 Vowel trajectories for all 27 speakers
Figure 6.3 Vowel trajectories for all 27 speakers by age, gender and contact
Figure 6.4 Controlled formant plot based on the final models for (e) and (o)
Figure 6.5 Effects of age and gender on (e) and (o)
Figure 6.6 Effects of contact on (e) and (o)
Figure 6.7 (e) and (o): Interaction of age, gender and wordlist
Figure 6.8 (e) and (o): Interaction of wordlist and contact
Figure 6.9 Effects of speechrate, stress and bnclogf on (e) and (o)
Figure 6.10 Effects of prepausal on (e) and (o)
Figure 8.1 Proportions of variants of (r) by subvariables and speaker groups
Figure 8.2 Proportions of variants of coda (r) used by individual speakers
Figure 8.3 Proportions of variants of linking (r) used by individual speakers
Figure 8.4 Proportions of variants of onset (r) used by individual speakers
Figure 8.5 Normal proportions of (r)-variants predicted from the final models
Figure 8.6 Effects of age and gender on coda (r) and linking (r)
Figure 8.7 Effect of contact on coda (r)
Figure 8.8 Effects of wordlist on coda (r), with interactions
Figure 8.9 Effects of speechrate and stress on coda (r)
Figure 8.10 Effects of bnclogf on subvariables of (r)
Figure 8.11 Effects of prepausal on coda (r), with interactions
Figure 8.12 Effects of ini 6 and ini 3 on coda (r)
Figure 8.13 Effect of intervoc on onset (r) vis-à-vis variants of linking (r)
Figure 9.1 SSBE as an alternative acrolect (based on Giles 1973: 90)
List of equations

Equation 4.1 Conversion of Hz into ωBk
Equation 4.2 Nearey’s (1978) single logmean using ωBk instead of lnHz
Equation 4.3 Calculation of Euclidean distance in acoustic vowel spaces
Equation 4.4 Schematic representation of a random-intercepts HLM
Equation 4.5 Logit link function (a) and logistic function (b)
Equation 4.6 Probabilities in nested logit models: three variants of (r)
Equation 4.7 R-squared in multilevel models (single variance component)
List of abbreviations

AD        Audience design
AIC       Akaike’s Information Criterion
ATS       Attention to speech
B         Regression coefficient
Bk        Bark
C         Consonant
CBR       Critical Band Rate
CS        Careful speech
D         Qualitative distance
EModE     Early Modern English
ESRC      Economic and Social Research Council
F1        Formant 1
F2        Formant 2
GA        General American
HGLM      Hierarchical Generalised Linear Model
HLM       Hierarchical Linear Model
Hz        Hertz
kHz       Kilohertz
LMC       Lower middle class
LVC       Language variation and change
MC        middle-class
ME        Middle English
ms        millisecond(s)
N         Number of observations
N3/2/1    Number of observations at levels 3/2/1
OR        Odds ratio
p         Probability / proportion
r         Any articulated variant of (r)
RP        Received Pronunciation
s         second(s)
SAT       Speech Accommodation Theory
ScE       Scottish English
SSBE      Southern Standard British English
SSE       Scottish Standard English
SVLR      Scottish Vowel Length Rule
T1        Target 1
T1(F1)    Target 1 of formant 1
T1(F2)    Target 1 of formant 2
T2        Target 2
T2(F1)    Target 2 of formant 1
T2(F2)    Target 2 of formant 2
TX        Reading passage
UMC       Upper middle class
V         Vowel
WC        working-class
WL        Word list
Y         Outcome variable
List of other symbols

(r), (e), (o)    sociophonetic variables
/r/, /e/, /o/    phonemes
[r], [e], [o]    phonetic variants
< >              spellings
<                smaller than
>                larger than
σ                standard deviation
σ²               variance
Σ                Sum of
η                logit link function
ωBk              Octaves based on Bark
#                Morpheme boundary
Acknowledgements
This monograph is the result of research carried out at the University of Bamberg as well as in Edinburgh. Many people deserve my heartfelt thanks for the invaluable help and encouragement they gave me along the way. First and foremost I would like to thank Manfred Krug, who supported me in every conceivable way, not least by putting me in touch with a number of extremely knowledgeable and helpful colleagues. Manfred is a prime example of the fact that, beyond ideas and hard work, fulfilment and success in academia also depend on the goodwill and cooperation of people. Ulrike Gut from the University of Münster gave invaluable advice, supported me as a referee on several occasions and always responded to my concerns with promptness, efficiency and warmth. I would also like to thank the series editor, Stephanie Hackert, as well as an anonymous reviewer, for the effort they put into reading and commenting on the manuscript, which has benefited a great deal from their detailed and constructive feedback. Many thanks are due to the team on the research project Variation, Linguistic Change and Grammaticalization III, first and foremost its principal investigator, Teresa Fanego. The generous personal funding by the Spanish Ministry of Science and Innovation and the European Regional Development Fund was immensely helpful, as was the general support and encouragement I was given by Teresa and her colleagues. I would also like to thank my colleagues Shane Walshe, Anna Rosen and Katrin Sell, for sharing problems and concerns and relaxing over many a coffee at our meetings. Julia Schlüter and Heinrich Ramisch were always there when advice or help was needed. I am also grateful to everyone who took an interest in and commented on my work at conferences and meetings.
Heinz Giegerich, Jane Stuart-Smith and Jim Scobbie gave me the opportunity to present my work and put me in touch with people at their respective institutions, the University of Edinburgh, the University of Glasgow and Queen Margaret University. Jim also very kindly supported me in my application for a DAAD grant. Of course I am most grateful to all informants who participated in my fieldwork and thus made this study possible in the first place. People who gave me access to institutions and speakers and helped me in various ways during data collection were Stefanie Groenke, Laura Bradley, Eleoma Joshua and Siegfrid Opelka at the University of Edinburgh, and various members of staff at the secondary school
where I conducted part of the interviews. I would like to thank the National Library of Scotland, Edinburgh City Library, and the University of Edinburgh Main Library for being such wonderful places to work. The German Academic Exchange Service (DAAD) kindly granted me funding for additional research activities in Edinburgh in 2011/12. It was fantastic to be part of their programme. From my family, I would like to thank my sister Lena for introducing me to multilevel analysis. My parents I would like to thank for taking me on a trip around Scotland in the splendid summer of 1991, which triggered my Scotophilia, and for being an all-round support in every way, over the years. Steffi and Arved I would like to thank for bearing with a partner and dad who was often enough impatient and grumpy because his mind was locked on obscure linguistic problems, and also for forcing his mind off those problems. I may not always have felt this initially, but being chased about by a four-year-old wee monster can really give your brain a fresh start (although it also makes you rather tired). I dedicate this book to both of them. Some errors and weaknesses in the present study will have gone undetected. Needless to say that for these I have only myself to blame.
Edinburgh, spring 2012 / Bamberg, autumn 2014
Chapter 1
Introduction
While it is certainly one of the smaller varieties of English in terms of its number of speakers, Scottish English (ScE) is clearly recognised as a major variety in terms of its importance and influence. Its status and the general recognition and attention it receives are partly due to the long history of English in Scotland as a distinct variety (cf. 2.1.1). This book therefore builds upon a large body of previous research. However, it focuses particularly on the possibility of ongoing linguistic influence of Southern British English on ScE – a perspective, it is argued, that is not often taken. This first chapter contrasts this particular research interest with existing studies and introduces the linguistic variables that are used to explore it. Furthermore, it outlines the sociophonetic framework that is adopted, highlights methodological and theoretical contributions of the study, and, finally, provides an outline of the book as a whole. Throughout the book, Scottish English is used as an umbrella term covering the continuum between non-standard (Scots) and standard (Scottish Standard English) usage (see 2.1 for a discussion).

1.1 Scottish English: Previous research

There is an impressive list of handbook chapters, edited volumes and book-length empirical studies dedicated to aspects of ScE. Some of the most significant and best-known publications include Aitken and McArthur’s edited volume Languages of Scotland (1979), McClure’s (1994) chapter in The Cambridge History of the English Language (vol. V), The Edinburgh History of the Scots Language (Jones, ed., 1997), The Edinburgh Companion to Scots (Corbett, McClure & Stuart-Smith, eds., 2003), the contributions by Stuart-Smith on phonology and by Miller on morphology and syntax to the handbook Varieties of English (Kortmann & Upton, eds., 2008), as well as Jones’s introductory volume The English Language in Scotland (2002).
Another recent collection of research papers on sociolinguistic research on Scottish languages and dialects is Lawson (ed., 2014). Furthermore, McMahon (2000: 140–204) inspects the Scottish Vowel Length Rule (SVLR) in her monograph on lexical phonology, and Giegerich includes Scottish Standard English as one reference accent in his English Phonology (1992). Numerous (mainly PhD) dissertations have been written on aspects of ScE, covering a wide range of topics
and theoretical approaches. There are studies concentrating on sociolinguistics (Romaine 1975; Clark 2008; Lawson 2009; Brato 2012), voice quality (Esling 1978), historical aspects of the ScE accent (Kohler 1964), bilingualism (Gordeeva 2006), language acquisition (Matthews 2001), the Scottish Vowel Length Rule (McKenna 1988; Agutter 1988; Pukli 2006), phonological (non-variationist) aspects of ScE (Kamińska 1995), as well as language attitudes (Schmitt 2009). Against the background of this extensive body of research, it is somewhat surprising that the dynamics resulting from the present-day coexistence of ScE and non-Scottish varieties of English have rarely been the object of (socio)linguistic research – exceptions being the research project Accent and Identity on the Scottish-English Border (AISEB),¹ carried out by investigators based at the universities of York and Newcastle, and research by Carr and Brulard (2006). In general, however, the main interest has been in variation between Scots and Scottish Standard English (SSE; e.g. Romaine 1978; Stuart-Smith 2003), where SSE is viewed as the standard dialect (or accent) in a language continuum, which, as a whole, can be called Scottish English (see 2.1). There may be an underlying assumption that speakers of Southern Standard British English (SSBE)² and speakers of SSE belong not to the same but to different speech communities, which implies that there is little or no mutual influence.³ The research gap is all the more conspicuous because there is general agreement that present-day SSE results from historical contact between Scots and southern English (cf. 2.1.1). The lack of research activity on aspects of Scottish Standard English is pointed out, for example, by Corbett, McClure and Stuart-Smith (2003: 4) and Macafee (2004: 59), the latter highlighting the fact that SSE is a contact variety.
Of course this can be understood in the sense of SSE being the result of historical contact between different Englishes, i.e. a variety created through a process of koinéisation (cf. 3.1; also cf. McClure 1994: 79), but it can also mean that SSE continues to be influenced by SSBE. The transition from acknowledging this contact situation to developing a research interest of the kind pursued in the present study is but a small step. It needs to be pointed out that SSE and SSBE are of course standard dialects of English with differences at all linguistic levels (see 2.1.2). In this study, however, SSE and SSBE will be used as labels for the two standard accents of Scotland and England, respectively; this is done for purely practical reasons.

1. (6 January 2015).
2. When referring to the standard accent of England, I use the term Southern Standard British English (SSBE; McMahon 2002: 69) in preference to Anglo-English (Abercrombie 1979: 73) or Received Pronunciation (RP; e.g. Cruttenden 2008: 77). Particularly in contrast to the latter, this emphasises the regional rather than the class-based associations of this accent (cf. MacMahon 1998: 395; Gimson 1984: 46–47). To avoid confusion, I substituted SSBE for RP in citations of earlier studies in this book (unless quoting verbatim), indicating the substitution where necessary, and referring to this footnote.
3. See Romaine (1980: 217), who points out that Scots who have for some reason acquired a southern English accent are nevertheless part of the linguistic continuum.

1.2 The present study in the context of Edinburgh

The research presented in the following chapters focuses on Edinburgh middle-class speakers interviewed at the University of Edinburgh and at one of Edinburgh’s private schools. For these speakers, SSE-SSBE contact plays a major role, mainly due to three factors: (i) the particular social makeup of Edinburgh as a city, (ii) the institutional contexts of the university and the school, and (iii) the fact that, as members of the middle class, speakers are relatively mobile, and their social networks are relatively loose. In illustration of the first point, Table 1.1 shows some data concerning the cities of Glasgow and Edinburgh and their two main universities, the University of Edinburgh and the University of Glasgow. Figures are taken from The University of Edinburgh Annual Review 2012–2013 (University of Edinburgh 2014), The University of Glasgow Annual Review 2012–2013 (University of Glasgow 2014) and the 2011 Census (National Records of Scotland 2014), respectively. Comparing the census data for the 2011 Scottish Council Areas of the City of Edinburgh and Glasgow City, it is evident that the proportion of the population who state their national identity as ‘Scottish and British’, ‘British’, or even ‘English’ is much higher in Edinburgh than in Glasgow.

Table 1.1 Demographic data for Edinburgh and Glasgow.

                                                                 Glasgow   Edinburgh
Scottish identity only                                           61.9%     48.8%
Scottish and British identity                                    16.1%     18.5%
British identity only                                            8.6%      11.4%
English identity only                                            1.0%      2.6%

Higher managerial / administrative / professional occupations    7.9%      14.6%
Lower managerial / administrative / professional occupations     16.9%     22.3%
Intermediate occupations                                         11.5%     12.4%
∑% top three socio-economic categories                           36.4%     49.3%

Proportion of residents born in England                          4.8%      12.1%

Students from Scotland (2012/13)                                 62.2%     37.0%
Students from rest of UK (2012/13)                               8.5%      24.1%
Non-UK students (2012/13)                                        29.3%     38.9%

It is equally striking that the proportion of socio-economically higher-ranking professionals is considerably larger in Edinburgh. Further, there are far more English-born people living in Edinburgh, and the student mix in Edinburgh is also characterised by a much higher proportion of students from other parts of the UK (predominantly from England, it can be assumed) or from abroad. In this context it is important to note that a general correlation of Englishness and socio-economic standing has been noted by Scobbie, Hewlett and Turk (1999: 242). Compared to most other areas in Scotland, including urban centres, Edinburgh is more robustly middle-class and affluent as well as characterised by stronger British or even English elements. One of the socio-historical reasons for this is the fact that Edinburgh has been the capital of Scotland since the reign of James III in the late 15th century (Lynch 2001: 218), and has therefore been tied into international (and particularly British) affairs to a much larger extent than other parts of Scotland. Although Glasgow’s role in international trade and commerce was also of enormous importance during the industrial revolution and British colonial expansion (cf. Macaulay 1977: 9), this phase was on the whole more short-lived and followed by dramatic economic crises in the twentieth century, during which interest and investment from abroad drained away. By contrast, Edinburgh’s capital status and the concomitant cosmopolitan outlook of the city have on the whole been more constant across the centuries and have provided a fertile ground for certain types of economic activity, still visible (and perhaps even strengthened) today in the number of companies from the banking and insurance sectors – companies that operate internationally and attract employees and customers from all over the world, but particularly from the rest of Britain.
Knock-on effects of this particular aspect of the city’s social make-up are the large number of students from England (see Table 1.1 above), and the fact that several private secondary schools offer leaving certificates that conform to English standards (A-Levels), e.g. the Edinburgh Academy, Fettes College or the Loretto School. There has been a general upsurge of national feeling in Scotland during the 20th century, with points of culmination marked by referendums in 1979 and 1997, addressing the question of devolution, as well as the recent referendum of 2014, concerning the question of independence. However, according to the official figures published by the Electoral Management Board for Scotland (2014), Edinburgh was among those council areas where the proposition of a fully independent Scotland gathered particularly little support, with only 38.9% of yes votes, which is rank 25 out of 32 council areas. Glasgow, on the other hand, was one of only five council areas where a majority voted for independence. All of the above considerations and data make it more understandable why Edinburgh, more than other locations in Scotland, should be an environment conducive to SSE-SSBE contact, and why Scobbie, Hewlett and Turk (1999: 242) should
expect “a greater influence of Anglo English on the Edinburgh middle class” in comparison to the situation in Glasgow, for example. The settings of the school and university, where data collection took place (cf. Section 4.1), can be said to reflect the general social fabric of Edinburgh at lower levels. These two institutions are not only bound to bring out SSE as the general mode of communication, but will also give rise to the kind of contact situations the present study is interested in. For example, Edinburgh University is considerably anglicised, with a large number of students from England or from abroad (see Table 1.1 above). Many university lecturers will be English – the author knows for a fact, for example, that at the time of data collection (in 2008) only one out of approximately ten lecturers was actually Scottish in the department where students were interviewed. While only very few teachers at the school are English, a number of them have taught in England for a while. Among pupils, there is a large proportion of families with at least one English parent. Naturally, the private school attracts pupils from families with a financially strong middle-class background, not least because of the considerable annual fees.⁴ The middle-class background of the speakers interviewed for this study is slightly different between the older and the younger generation. For instance, the older speakers are often from families that are working-class rather than middle-class in the parental generation. However, they can be said to have changed class by becoming teachers at one of the more prestigious private schools in Edinburgh or taking employment with the University of Edinburgh. In several cases, the speakers from the older group had one parent with a skilled (or even academic) profession, and thus the family could be said to have been on the path to gentrification for some time.
In contrast to the older speakers, the younger generation (pupils and students) generally have parents who are already part of the middle class, e.g. teachers, bank managers, solicitors, civil servants, architects or lecturers. That is, younger speakers have not moved up but were born into the middle class. Apart from class membership, speakers spend a large part of their time in institutional contexts where contact with speakers of SSBE or speakers from mixed Scottish-English households plays a relatively major role. Thus, while the speakers of this study are clearly Scottish themselves, they are part of what could be called mixed social networks – that is, networks characterised by various shades of Englishness and Scottishness, in cultural and linguistic terms. The interview situation partly tests to what extent English elements from the accent repertoire are accessed when talking to a non-Scottish interviewer.
4. Statistics for the school were not available, but the general observations that are recorded here are based on personal communications by members of the teaching staff.
The contact situation in which the speakers of the sample find themselves is not one between different languages, nor between standard and non-standard dialects of the same language, but between two supraregional standard dialects (or in this case: accents) of the same language. The results suggest that it is useful to regard SSBE as an additional external player in the variation of SSE, even if its influence is not uniform across different variables. This also means that SSBE needs to be incorporated into existing continuum models of the English language in Scotland. Furthermore, it will be argued that SSE is more than “an analysts’ artefact” (Giegerich 1992: 46), as it displays patterned internal variation and systematically adapts to different communicative functions. The complex linguistic situation in Scotland creates a certain ambiguity of norms. On the one hand, SSE is considered an intact and robust prestigious standard accent of English in Scotland. On the other hand, a direct outside influence of SSBE is also a constant theme. There is thus a conflict between the two views that (i) SSE is indeed a standard, and that (ii) SSE is less standard than SSBE, at least for some speakers. It is one of the aims of this study to contribute to the resolution of these contradictions, based on empirical findings. The first view, i.e. the view that SSE is the norm and that outside influence, especially from SSBE, is therefore relatively unlikely, is put forward by Wells (1982: 393). Similarly, McClure (1994: 62) not only calls SSE the “sociolinguistic norm” in Scotland, but also classifies it as “an autonomous speech form”, which “is recognised as an established national standard, throughout the English-speaking world” (McClure 1994: 79). The choice of the term autonomy is certainly not coincidental and suggests autonomy from southern English. 
McClure (1994: 80) emphasises his point by saying that Received Pronunciation (RP; see footnote 2) is entirely alien to the language situation in Scotland, and that in general speakers will not regard it as a target accent worth emulating. In support of these claims, an earlier accent evaluation experiment by Romaine (1980: 226) found that Scottish-sounding accents evoke positive evaluations of speakers’ personality traits by Scottish (middle-class) raters. Similarly, on a broader UK-wide basis, Coupland and Bishop (2007: 79) find that Scottish accents are rated very favourably for social attractiveness and prestige. These patterns of evaluation are far from restricted to Scotland, although it is here that they are most prominent. Evidence of the positive perception and evaluation of Scottish accents is sometimes complemented by claims to the effect that positive perception of SSBE-like accents is generally, i.e. not only in Scotland, in decline, as argued by Abercrombie (1991), who passes a rather harsh judgement on the model character of the southern standard accent, which he calls RP. It is therefore quite possible that the anglicisation of SSE towards SSBE is a popular myth, rather than sociolinguistic fact. This myth, moreover, is propagated less by linguists than by “informal observers” (Scobbie 2006: 340),
probably mainly based on the historical and economic dominance of England over Scotland, on the asymmetrical population figures between the two countries, and on vague ideas regarding the levelling effects of globalisation (cf. Trudgill 1998; Meyerhoff & Niedzielski 2003: 536). The second view, i.e. the view that SSBE is prestigious for some speakers in Scotland, is put forward by Speitel and Johnston (1983: 41). Somewhat vaguely, they claim that several changes were in progress in the Edinburgh of the 1980s, some of which were changes from below that originated in Scottish working-class usage, but others of which were “changes from above, originating in RP or some other prestige accent”. The picture drawn by Speitel and Johnston (1983) indicates the need for an up-to-date inspection of the same questions: is there any evidence that SSBE has an influence on the SSE accent, and how can we model such situations of contact between two influential standard accents? To conclude this general introduction, it seems reasonable to assume that, at least among certain middle-class speakers in certain settings (see discussion above), there is variation at the accent level between realisations that are more typical of SSE and others that are more typical of SSBE. It would be rash, however, to conclude from this that SSBE holds greater prestige than SSE for those speakers. As Meyerhoff and Niedzielski (2003: 542) point out, quantitative analyses of language variation very often assume that speakers have a target, i.e. the conscious or unconscious desire to sound like someone else. This assumption continues to be popular not least because it already constitutes one cause of language variation and change. It can be dangerous, however, to give a possible target outside the speech community (here: SSBE) too much weight a priori, as this diverts the attention of the researcher away from other possible causes of variation and change. 
For instance, it may also be the case that some aspects of the variation of SSE are accent-internal, i.e. uninfluenced by outside factors. A more nuanced definition of SSE and the ways in which it functions relative to SSBE is needed – a fuller treatment that explores the function of SSE as a standard accent within Scotland, but also its position within the British context. This book aims to contribute to the discussion of these issues.
1.3 The variables under investigation
The broader questions outlined above will be approached through two sets of sociophonetic variables. They are (i) the vowels /e/ and /o/ in words like day and go, and (ii) the consonant /r/ in coda position (e.g. car, part), linking contexts (e.g. here_is), and onset position (e.g. right, bright). I follow Giegerich (1992) in transcribing the vowels phonologically as /e/ and /o/ in all varieties, because their
diphthongisation, e.g. in SSBE and General American (GA), is purely phonetic. This is in contrast to Cruttenden (2008) and Roach, Setter and Esling (eds., 2011). The phonemes /e/ and /o/ correspond to Wells’s (1982) lexical sets face and goat, but will most of the time be referred to as the sociophonetic variables (e) and (o). Likewise, the phoneme /r/ is of interest not so much as a phoneme but as the sociophonetic variable (r), which can be further subdivided into the three special cases mentioned above and discussed in more detail below (cf. 4.3): coda (r), linking (r), and onset (r). The variables were chosen because it has been claimed (and will partly be confirmed) that the variation of vocalic and consonantal variables throws into relief different aspects of variation in an accent (see below). Additionally, (e), (o) and (r) (the latter at least in coda positions) belong to a group of variables singled out by Aitken (1979) as “optional features” of the SSE accent (cf. 2.3.2). One might argue that they should therefore be particularly susceptible to sociophonetic variation. The variables (e), (o) and (r) quite possibly have double roles to play, being salient markers in both accents whose contact is of interest, SSE and SSBE. That is, both non-rhoticity and diphthongal /e/ and /o/ are defining characteristics of the southern English standard accent (Bauer 1984: 74; Roach, Setter & Esling, eds. 2011: vii, xvii), and conversely, rhoticity (and certain phonetic variants of /r/) as well as monophthongal /e/ and /o/ are defining characteristics of SSE (e.g. Jones 2002: 25–27). In contrast to /aɪ/, /aʊ/ and /ɔɪ/, the vowels /e/ and /o/ are phonological monophthongs, i.e. their diphthongisation is redundant for communicative purposes and has no systemic function. However, the fact that diphthongisation is not necessary for communicative purposes can be taken to imply that there is all the more scope for indexical diphthongisation – i.e. 
diphthongisation that comes about for sociolinguistic reasons – in an accent like SSE. As will be discussed in Section 5.2, the two vowels /e/ and /o/ are historically often treated as a pair, and it is tacitly assumed that they will undergo changes in parallel. This can be partly explained from the fact that they are positioned symmetrically in the vowel space and follow similar long-term paths of change (see Table 5.1). However, in a sociophonetic present-day investigation, a rather different picture may well emerge. For example, one of the two variables may be more salient and therefore more responsive to social factors. Accordingly, one of the interesting general questions will be whether the two variables indeed vary (and change) in similar ways or not. Furthermore, it cannot be taken for granted at all that variation and change affecting these two vowel features is best described in terms of diphthongisation. Once more it is not by necessity but historical accident that the focus concerning the variability of (e) and (o) has been shifted to this particular aspect over the last few centuries. However, the present study will also look for
other patterns of variation and change, like qualitative shift, which are known to affect other (monophthong) vowels. The consonant /r/ continues to be a very popular object of sociophonetic research in accents of English. Its appeal is manifold: (i) it varies at several levels (phonological, phonetic), (ii) it is an important separator between accents (rhoticity vs. non-rhoticity; cf. McMahon et al. 2007: 137), (iii) it features in juncture phenomena (linking r, intrusive r) and (iv) it breaks down into highly distinct phonetic variants (e.g. trill [r], tap [ɾ], and several kinds of approximants [ɹ, ɻ, ʋ]). This diversity and these different roles in different contexts are implied in the idea of “hyper-variation” (Scobbie 2006: 337; cf. 7.1).5 Hyper-variation can also mean that the variation of (r) may follow rather different sociolinguistic patterns in each of the three contexts outlined above, coda (r), linking (r), and onset (r). For this reason alone, (r) cannot reasonably be treated as a single unified variable, quite apart from the considerable problems involved in the construction of a single statistical model for all subspecies of (r). While the concept of linking (r) may seem a shade dubious in a rhotic accent like SSE (where, as a segment, /r/ should be in place in any case), this problem is much reduced if SSE is regarded as variably rhotic (cf. 7.1). In this view, potential liaison sites are simply a special context in which the general rate of vocalisation of (r) may not only be different from the one found in non-linking coda (r) (presumably lower, that is), but may also follow different sociolinguistic patterns. According to Foulkes (1997b: 78), “[i]t is usual in studies of English [r] to include a parenthetic comment regarding, or rather disregarding, the high degree of variation across accents in the phonetic characteristics of what is for convenience transcribed as [r]”.
He goes on to explain that, due to the dominance of a phonological perspective, the specific phonetic realisations of /r/ are often of less interest, as the main focus is on the presence or absence of the segment (1997b: 79). In the Scottish context, for example, it makes a potentially vast difference whether all variants of articulated (r) are included within a single category contrasted with the zero-variant, or whether more than two categories are assumed, e.g. [ɹ], [ɾ], and Ø. The former approach might be called sociophonological, because it ignores phonetic properties of the outcome variable; the latter approach is truly sociophonetic, because it assumes that the variation of interest goes beyond the mere presence or absence of the segment (cf. Schützler 2013). The choice of statistical tools in the present study is geared towards the inclusion of both aspects of (r) in
5. Results presented in Krug and Schützler (2013) suggest that the extent to which certain constructions grammaticalise has an effect on the use of intrusive /r/ between their components (see footnote 44). This, it can be argued, is an additional (and as yet little explored) facet of the multifunctionality of /r/.
the cases of coda (r) and linking (r) (see Section 4.4.1). However, descriptions of processes as sociophonological will generally be avoided; particularly where (r) is vocalised (or inaudible) in rapid speech or weak syllables, it would be difficult to argue that the segment has truly disappeared from phonological representation. Findings by Lawson, Stuart-Smith and Scobbie (2008), and the fact that all speakers in the present study are variably rhotic, suggest that it is probably too early to claim a changing phonology of /r/ in ScE, even if the segment is in many instances not produced or not perceived auditorily. The previous paragraphs highlighted some of the variable-specific problems, e.g. concerning the kind of variation that can be expected and the amount of phonetic detail that needs to be included. However, taking a more general macro-perspective also throws certain problems into relief. There are, for example, reports in the variationist literature to the effect that vowels and consonants may respond quite differently to contact situations. For instance, Milroy et al. (1999: 42) and Kerswill (2003: 231) claim that vowel changes are generally more local, while consonants are affected by influences across a wider geographical area. It has to be pointed out, however, that Kerswill’s claim, made in the context of levelling, does not necessarily hold in SSE-SSBE contact, where it is not fully appropriate to assume that levelling is taking place at all (cf. 3.1). It must also be said that Kerswill and Williams (2000: 11) find the opposite pattern for Milton Keynes: vowels in particular change towards a supraregional standard (in this case SSBE6), while for consonants levelling seems to happen towards a more regionally confined non-standard southern norm.
Considering these different patterns and the lack of explanation for them, it is extremely difficult to say if Foulkes and Docherty (1999: 12) are correct in claiming that, on the whole, differences between accents are marked by vowels more than by consonants. Against this background, the present study will try to assess how closely or how loosely the vocalic and consonantal variables attach to either SSE or SSBE, and whether there are fundamental differences between them in this respect.
1.4 Research in a sociophonetic framework
As Thomas (2011: 1) points out, sociophonetics as a sub-discipline in its own right was not truly recognised before the mid-1990s. This is all the more surprising because, as Hay and Drager (2007: 90–91) rightly point out, even the early works of Labov (e.g. 1966) – and, in the Scottish context, Macaulay (1977) and
6. Kerswill and Williams use the term RP, not SSBE (see footnote 2).
Romaine (1978) – fall broadly within the domain of sociophonetics. Acceptance of sociophonetics as an autonomous discipline varies. Thomas (2011: 1) finds that some linguists regard it as a questionable construct (used, perhaps, with an eye on its considerable buzzword factor), while others see unique theoretical aspects in it. A theme common to most of the current approaches to sociophonetics as a linguistic discipline (Thomas 2011; Foulkes, Scobbie & Watt 2010; Hay & Drager 2007; Foulkes & Docherty 2006; Hay & Jannedy 2006) is that, (i) as a discipline, it is still emergent, and that, (ii) due to its situation between the social and the phonetic, there is a rather large degree of methodological freedom in comparison to other, better-established fields within linguistics (cf. Foulkes, Scobbie & Watt 2010: 727–733). It is therefore necessary to review some of these aspects in this section, and to outline what kind of sociophonetic research is carried out in the present study. Foulkes and Docherty (2006: 410) comment on the generally rather poor definition of sociophonetics. Sociophonetic variation has been defined as “variable aspects of phonetic or phonological structure in which alternative forms correlate with social factors” (Foulkes & Docherty 2006: 411); “[s]ocially indexical, interindividual differences in pronunciation” (Hay & Drager 2007: 91); and “phonetic variation in speech which is socially conditioned” (Hay & Jannedy 2006: 405). Sociophonetics, accordingly, is the study of these phenomena (Hay & Drager 2007: 90; Foulkes, Scobbie & Watt 2010: 704). This suggests that sociophonetics is no more than a sociolinguistic study of phonological or phonetic variables, a type of research that became rather fashionable soon after Labov’s groundbreaking work in the 1960s and can therefore not be called particularly innovative per se.
However, the challenge in sociophonetics – and probably also the key to its emancipation as a research tradition in its own right – lies in the integration of methodologies and theories stemming from phonetics and sociolinguistics, respectively, in a single framework (Foulkes, Scobbie & Watt 2010: 703). Naturally, individual responses to these challenges can concentrate more upon aspects and research questions of a social nature or focus on phonetic aspects and problems (Foulkes, Scobbie & Watt 2010: 703). These two approaches differ especially in terms of sampling and the resulting quality of the data (Thomas 2011: 3). Two perspectives on the problem are summarised as follows by Hay and Jannedy (2006: 405):
From a phonetician’s perspective, much current variationist work could be viewed as lacking in methodological rigour both in terms of the analysis of fine phonetic detail, and the sophisticated statistical modelling of that detail. On the other hand, from a variationist’s perspective most phonetic work could be criticized for the focus on non-naturally occurring speech.
A sociophonetic study will usually strike a compromise between the two positions, because phonetic and sociolinguistic variation are each in their own ways highly complex, and a study interested in both types of variation will have to deal with these combined complexities (cf. Hay & Drager 2007: 90). The somewhat ill-defined state of sociophonetics seems to be due to its position between its component disciplines. Pointing to possible improvements to this state, Hay and Drager (2007: 94–95) identify two major trends in current sociophonetic research:
Sociophonetics is extending in two exciting directions: one that complicates the social, and one that complicates the phonetic. These directions stand to benefit the field most if they are jointly pursued. Future work in the field should investigate the extent to which continuous social factors correlate with continuous acoustic factors and the extent to which both correlate with a combination of different nonlinguistic cues.
The implication is that at the end of this process, which is characterised by the development and application of increasingly complex research methods, there will be a sociophonetic discipline that is defined by this very complexity. Regarding the first (sociolinguistic) path of development, Hay and Drager (2007: 93) see the future in an ethnographic approach, where social categories are not limited to traditional predetermined groupings, e.g. according to gender, age, and class, but are more inclusive of other information types as well, paying special attention to the construction of social identity at the level of a specific speech community (cf. also Foulkes, Scobbie & Watt 2010: 706–717). With respect to the second point, i.e. an increase in complexity of the phonetic detail that is modelled, Hay and Drager (2007: 92) use the example of the trend in vowel analysis towards the treatment of outcomes as continuous (e.g. formant frequencies), fine variations of which cannot be captured auditorily. This is a trend that is now almost the rule for studies on vowels, but is relatively slow to catch on for the analysis of consonants (an exception being Docherty and Foulkes 1999, 2001). In addition to aspects of growing phonetic and sociolinguistic complexity, there is a third dimension into which sociophonetic research seems to be branching out. This is the growing application of increasingly complex statistical models. There is a trajectory of change in statistical approaches that can be traced from the purely descriptive (e.g. the statement and discussion of proportions in the early work of Labov) via the application of statistical tests to proportions (e.g. inter-group tests using the χ²-statistic in Johnston 1984) to the application of analyses of variance (ANOVA; e.g. in Lawson 2009) and (multiple) linear or logistic regression (e.g. Sudbury & Hay 2002; Schützler 2010b). But the development does not stop here.
In particular, the hierarchical nature of data structures frequently encountered in
sociophonetics calls for statistical models that approach these structures accordingly. The present study will suggest the use of multilevel models of analysis to address these issues (see Chapter 4.4; cf. Hay 2011: 212–213). Hay and Drager (2007: 94) concede that only very few studies combine fine phonetic detail with an ethnographic approach – that is, few studies develop an interest in methodological and theoretical innovations in more than one area. This is also true for the present study, which invests most heavily in phonetic and statistical issues while remaining relatively conservative concerning sociolinguistic issues.
1.5 Overall contribution of the present study
At the most global level, this study investigates SSE-SSBE contact as an under-researched area. As a result, some additions to current models of accent contact will be suggested, and the hypothesis will be advanced that SSE has a crucial double function. On the one hand, it is the standard accent in the Scottish context; on the other hand it is characterised by a certain flexibility when in contact with another major standard accent of the wider English-speaking world, namely SSBE. Consequently, the well-known Scots-English continuum is augmented by a second dimension of variability with SSE and SSBE as its poles (see 1.1 and 2.2). A central question that is addressed is whether the patterns of variation observed in the data suggest that contact between SSE and SSBE is indeed a driving force behind it. By looking at several variables that have relatively clearly identifiable standard forms in SSE and SSBE, it is possible to inspect how uniformly they vary between these two reference accents. For example, it may very well be the case that socio-stylistic variation of (e) and (o) follows markedly different patterns from that of (r).
In this case, one would probably conclude that there is no direct, unidirectional influence from SSBE, but that the predominant motivation underlying variation of SSE is accent-internal stratification. The presence of SSBE forms in contact situations may still have an influence on this kind of variation, but not in the sense that there is a uniform pull in one direction, and therefore a shift of norms. It is a possibility, but not inevitability, that anglicisation is responsible for the observed variation. Methodologically, the “complication of the social” (Hay & Drager 2007: 94; see 1.4 above) is a path that the present study does not follow. Social variables, while used as parts of a novel and complex statistical model (cf. 4.4), are treated in a relatively traditional way (cf. 3.2). That is, group membership and identity are not quantified at a level of complexity that goes beyond categorical descriptors like, for example, ‘age’ and ‘gender’. However, the phonetic aspects of the data are approached in a complex way, exploring the notion that very often the treatment
of a variable as a single parameter cannot adequately capture the complex variation it is subject to. In particular, the incipient diphthongs (e) and (o) are treated in this way, in this case as variables composed of four parameters that can, at least in theory, vary independently (see 4.2.3). The variable (r) is also approached in a more complex way than in most earlier studies. For coda (r) and linking (r), for example, this has the advantage that a single statistical model can be read in two ways: as a prediction of (r)‑vocalisation or retention, and as a prediction of the phonetic form unvocalised (r) takes (see 4.4.1). Additionally, as part of the analyses of (e) and (o), some minor alterations to existing techniques for the quantification of acoustic vowel data are suggested (see 4.2.2). Finally, multilevel analysis is used as a statistical tool that overcomes some of the drawbacks inherent in other approaches. It is also suggested as a way of partitioning variation (and explanation) in the statistical model into three categories that correspond to the respective partitioning in theory into internal, stylistic and social variation (see 4.4).
1.6 Structure of the book
Following the discussion of the focus and the general aims of the present study that has been presented above, Chapter 2 gives an overview of the linguistic situation in Scotland, providing brief accounts of the history of Scots, the emergence of SSE, and the present-day view of a Scots-English language continuum and its implications. This chapter also includes a brief summary of the main features of SSE phonology and phonetics, beyond the ones that are investigated in this study. Some of the models that have been proposed to account for language variation and change (LVC) in situations of dialect contact will be outlined in Chapter 3. In particular, the notion of change by accommodation (Auer & Hinskens 2005; Trudgill 1986) will be discussed.
Furthermore, consideration will be given to ways in which different levels of LVC, namely internal (linguistic or psycholinguistic) and external (socio-stylistic) factors, can be integrated in a unified approach. The chapter includes brief definitions of the independent variables. Chapter 4 deals with four different aspects of methodology. In Chapter 4.1, the data collection procedures for this study are described. This includes aspects of sampling, the fieldwork materials that were used, recording technique, and the processing of soundfiles. Chapter 4.2 contains an introduction to the main tools for vowel analysis and a description of how vowel data were measured and transformed in the present study. Chapter 4.3 outlines how (r) is broken down into subvariables and describes the approach taken in the auditory analysis. Finally, Chapter 4.4 comprises an introduction to multilevel modelling, the statistical approach taken in the main analyses of the study (Chapters 6 and 8).
The research background for the analyses of (e) and (o) is provided in Chapter 5, leading up to the formulation of more specific research questions and expectations. The chapter includes a discussion of general aspects of (incipient) diphthongs as well as a short summary of the historical developments of the two particular sounds under scrutiny, and reports previous findings, focusing on the Scottish context. The analyses proper of (e) and (o) and a discussion of results are found in Chapter 6. Similar in structure to Chapter 5, Chapter 7 highlights the research background of (r), providing the basis for the main research questions and expectations formulated at the end of the chapter. It includes some general discussions of rhotic sounds as a somewhat problematic category, and proceeds to outline the more recent history of (r) in English and the existing research on this variable, particularly in Scottish English. Chapter 8 contains the main analyses of (r) with a discussion of findings. Chapter 9 provides a final summary and discussion of central results as well as concluding remarks, aiming to bring together and generalise from the separate studies of the individual variables. It also assesses possible implications of the study as a whole for a theory of SSE.
chapter 2
Scottish Standard English in context
Two kinds of context are relevant for an understanding of SSE. First, there is the historical context out of which SSE emerged and took its particular position as the standard accent in the present-day continuum referred to as Scottish English (2.1). Second, there is the wider context of the British Isles, with SSE as one of several standards of pronunciation – the most important neighbour being SSBE, of course (2.2). It is from the historical development and the present-day position of the SSE accent that its defining characteristics and their variability can be understood (2.3).
2.1 The Scottish English language continuum
Scots and English both derive from the Anglo-Saxon branch of the Germanic languages. Moreover, they have been in relatively close contact for much of their histories. This combination of relatedness and prolonged contact makes them linguistically compatible (McClure 1994: 23–24) or “partially interpenetrable” (Johnston 2007: 110). Therefore, everyday usage in Scotland often cannot be described as purely English or Scots, but will be intermediate between the two (Aitken 1985: 41–42). It is therefore convenient to assume a continuum in which linguistic forms can tend more towards either English or Scots (McArthur 1979: 56; Wells 1982: 395). The two concepts SSE and Scots are used to describe the extreme points, or poles, of the continuum (cf. Aitken 1979: 86–87; Jones 2002: 24; Stuart-Smith 2008: 48). The continuum as a whole is usually called Scottish English (Bauer 2002: 26; Hughes, Trudgill & Watt 2005: 101; Stuart-Smith 2008: 52), and the present study follows this practice. A less common terminological solution is to call the continuum Scots (Corbett, McClure & Stuart-Smith 2003: 1–2; McClure 1979: 30), or Modern Scots (Jones 2002: 1).
As Corbett, McClure and Stuart-Smith (2003: 1–2) point out, one problem with this is that the use of the term Scots is double: if unqualified, it is usually taken to refer to broad Scots, and if used to denote the Scots-English language continuum, there would thus be a ‘Scots within Scots’. This terminological quandary, i.e. the ambiguity (or polysemy) of the term Scots, may be one of the reasons why it is seldom adopted to denote the whole continuum. McArthur (1979: 51) introduces the term Scots English as an umbrella
term covering the combined domains of English and Scots in Scotland (see also Aitken 1979: 85). While there is much to be said for McArthur’s label, the present study is not the place for this terminological debate, and the better-known and unambiguous term Scottish English is adhered to.7
2.1.1 Scots and Scottish Standard English
McClure (1988: 14–15) provides an excellent brief description of the genesis of Scots as a language that originated from the same Old English roots but developed differently from English in the rest of Britain. According to him, Scots is
the language brought within the boundaries of what is now Scotland by the encroaching Germanic invaders in the sixth century, becoming one of the languages of the Scottish king’s domains in the eleventh, gradually establishing itself as the medium through which Scotland became a fully-fledged European state in the twelfth and thirteenth, finally coming to be employed as the language of monarchy and government in the early fifteenth, and all the while diversifying from the related dialect of the English metropolis.
Today, at least three problems characterise all attempts to define the form and function of Scots within the Scots-English continuum: (i) structurally, Scots is and always has been closely related to English; (ii) in practice, we hardly ever encounter anything like ‘pure Scots’; and (iii) some authors define things from the vantage point of historical Scots, while others concentrate on present-day usage. Thus the boundary between Scots and SSE is elusive both in theory and practice and remains “extremely nebulous” (McClure 1979: 26; cf. also Burchfield 1994: 5; McArthur 1998: 140). This complex situation is easier to understand against the background of the intertwined histories of Scots and English in Scotland. There have been a number of catalytic forces in the demise of Scots from the 16th century onwards and the simultaneous emergence of what was to become SSE. In the first wave, English norms in writing were propagated by the introduction of the printing press to Scotland in 1508 (Romaine 1982: 58–59; Murison 1979: 9) and by the Reformation (McClure 1994: 33; Aitken 1979: 87). In the latter context, it was particularly the English Bible that had an influence mainly on the written language (Romaine 1982: 58; cf. also Aitken 1979: 91). McClure (1994: 36) views the departure of James VI and I to London after the Union of the Crowns in 1603 as a stronger factor even than the Reformation in boosting the spread of
7. Schmitt’s (2009: 88) terminological solution may at first seem slightly defeatist, but is in fact rather elegant: he only describes and labels the poles of the continuum.
a more southern variety of English. According to Murison (1979: 9), however, the effect was again largely limited to written language (cf. also McClure 1994: 37; Aitken 1979: 90). Romaine (1982: 61) sees the Union of the Parliaments in 1707 as a further step in the degradation of Scots to a merely regional language, and Johnston (2007: 106) points out that the urban upper classes of Scotland, i.e. the elite, generally rejected post-1707 nationalism, both political and linguistic, which led to further anglicisation. For everyday speech, Southern English became a model no earlier than the 18th century. The second half of that century was also a time of deliberate attempts to anglicise Scottish middle-class speech. Notable examples are the teachings of Thomas Sheridan and the agenda of the Select Society of Edinburgh (Jones 1997: 268–272; 1993: 98).8 As a result, upper-class speech did become more English in some ways, but it nevertheless remained recognisably Scottish, more or less resulting in SSE in the contemporary sense (Aitken 1979: 96; Johnston 2007: 108). Jones (1997: 274; 1993: 124–125) traces the promotion of high-prestige features of educated Scots speech in some of the 18th-century elocutionist publications (cf. also Corbett, McClure & Stuart-Smith 2003: 13). It is a relatively small step from these institutionalised accents to the SSE accent of the present day (Corbett, McClure & Stuart-Smith 2003: 13). The trend established in the 18th century was continued in the 19th. While there was an academic interest in and nostalgia connected to Scots, it was no longer a serious competitor to English in everyday usage (Jones 1997: 273). ‘Proper’ English also became the norm in education, a policy further cemented by the introduction of an all-British state system for schools (Education Act 1872). Even in the early 20th century the situation of Scots was much the same as in the 19th (Murison 1979: 12), excepting attempts by C. M. 
Grieve (alias Hugh MacDiarmid) in the 1920s to re-establish Scots through intellectually more serious poetry, an attempt followed up by the ‘Lallans’ movement later in the century.9 Today, in a Scotland that is part of Europe and a globalised world economy, Standard English is used in writing and SSE in speaking in virtually all domains of public life. Contributions like those of Kay (2006) and Fitt (2000) in language activism and literature, respectively, and the acclaim and interest that greet them leave no doubt that Scots continues to have its place in Scottish life and culture. Its role, however, is certainly limited.
8. The (telling) full name is given by Jones (1993: 98) as Select Society for Promoting the Reading and Speaking of the English Language in Scotland.
9. For a particularly disparaging account of MacDiarmid and the Lallans movement cf. Bähr (1974: 140–147), for more balanced views cf. McClure (1981: 98) and Dósa (1999: 80–81).
For McArthur (1979: 51) Scots comprises local or regional vernaculars, but it can also have a social function. As Stuart-Smith (2008: 48) points out, in the urban varieties Scots is generally more working-class and SSE more middle-class (cf. Section 2.1.3). Thus, Scots can be regionally, socially and stylistically marked. The national importance of Scots also comes through in Romaine (1982: 57), who defines it as “the distinctive English of Scotland” that was consolidated into a national language. While Scots clearly no longer functions as that national language but as a set of dialects, raising awareness of its historical status will result in a more balanced view of the continuum in which the role of SSE as a standard is due to historical accident rather than to an innate deficiency of Scots. However, in practice this is of little avail to the latter’s status; to many people, ‘non-standard’ or ‘dialect’ is near-synonymous with ‘uneducated’, ‘lower-class’, or the like (cf. McArthur 1998: xvi; Jones & Singh 2005: 69). The paradox of the double status of Scots as a national language and a regional or social dialect can be partly resolved using Johnston’s (2007: 105; see also Trudgill 2002: 116) distinction between function and status, in which he highlights both the historical and the contemporary perspective:
While [Scots] functions as the localised dialect of Lowland Scotland, it enjoys a special status due to an important aspect of its history: it is the only Germanic variety in Britain besides Standard English ever to have functioned as a full language within an independent state […] and to have been used for all domains […].
It appears that at a surface level, Scots today functions as a group of dialects relative to SSE. Scots dialects differ from SSE in terms of morphosyntax (cf. Miller 2008; Jones 2002: 9–22) and at the accent level in the realisation of certain phonemes, but also in the distribution of phonemes.10 In status, however, Scots often is far more than ‘just’ a dialect of English. In its actual form and use it obviously lacks autonomy from SSE, but at the same time a strong mythological element has been created around the ideal of the pure and distinct Scots language, linked to its former status of national language (Aitken 1981b: 86).
2.1.2 Status and definition of SSE
Linguistically, SSE could be defined as ‘the standard dialect of English used in Scotland, including a Scottish standard accent’. Thus it is the counterpart of SSBE, i.e. the standard dialect of southern English spoken with a prestigious southern
10. Consider, for example, Scots stane [sten] and gless [glɛs] vs. SSE stone [ston] and glass [glas]; s.v. stane and glass in the Concise Scots Dictionary (Robinson, ed. 1985).
Chapter 2. Scottish Standard English in context
British accent.11 In practice, however, SSE – like SSBE, in fact – is mostly used to denote the accent itself, while even small differences in morphosyntax are usually referred to as Scots. A good definition of SSE in this latter (more narrow) sense is the one provided by Carr (1999: 156; cf. Wells 1982: 395; Stuart-Smith 2008: 48): “SSE is the standard accent which many Scots speak when speaking the Standard English dialect”. It is an open question whether the grammars of SSE and SSBE are truly similar enough to be regarded as a single Standard English grammar (as implicitly done by Carr); this question, however, cannot be addressed in the present study. Giegerich (1992: 46) holds that strictly speaking there is no single accent in Scotland with the sociolinguistic status of a standard. The only aspirant to this position is the SSE accent, but it is certainly less clearly defined than SSBE or GA. In contrast, McClure (1994: 79) claims that SSE is accepted as a standard in its own right not only in Scotland, but internationally as well (cf. also Jones 2002: 24). McClure does not use the term SSE in the narrow sense of ‘Scottish Standard English accent’, but his main focus is nevertheless on pronunciation. Education is a parameter often used in connection with SSE. Thus, SSE for Aitken (1979: 99) is the “educated Scottish accent”, Stuart-Smith (2008: 48) associates it with educated middle-class usage, and for McClure (1994: 79–80) it is the “speech of the professional class and the accepted norm in schools”. See also the much earlier example of Grant (1914: 4): The standard adopted in this book is the speech of the educated middle classes in Scotland. It is the speech of our Universities, of the pulpit, the platform, and the school, and although in different districts it may present some variations, it constitutes on the whole a type of pronunciation quite distinct from that of England.
Grant does not explicitly call this Scottish Standard English but Standard Scottish, but as it is associated with the “educated middle classes”, with educational institutions, and with formal public speech, he clearly refers to what we know as SSE. Variation within the SSE accent is generally assumed to be limited (Stuart-Smith 2008: 48), which may partly explain why research dealing specifically with SSE is equally limited, as discussed in Section 1.1. There is a common phonological system shared by most forms of SSE, with some differences conditioned by regional and social factors, and to a lesser degree also by age and gender (McClure 1994: 79). Giegerich (1992: 46) excepts accents of the Scottish Borders and the North from a general definition of SSE because they share many but not all of its characteristics. There is also the difference between SSE in general and Highland English, the accent of English found especially in the Western Isles and the Scottish Highlands (Wells 1982: 412–414; Jones 2002: 81–83). As it is based on a Gaelic substratum, Highland English could be said to constitute a different branch of SSE, or even a separate accent (e.g. Macafee 2004: 61).
11. There is room for improvement in the use of these labels. It could be argued, for example, that in analogy to Scottish Standard English (SSE), the term Southern British Standard English would be more appropriate than Southern Standard British English (SSBE), since the latter takes it for granted that a uniform standard dialect of English is valid for the whole of Britain.
2.1.3 ‘Drifting’
Where on the Scots-English continuum a speaker is positioned in a particular context mainly depends on regional or sociolinguistic factors (e.g. Jones 2002: 5). An example of the latter is formality of context (Jones 2002: 24; Stuart-Smith 2008: 48), while the effect of region can be that in rural areas Scots is generally spoken more than elsewhere. But there is another more basic difference between rural and urban speakers. Wells (1982: 395) indicates that in rural areas there may be a rather sharp division between Scots and SSE, while the continuum is more typical of urban areas. Likewise, Stuart-Smith (2008: 48; cf. Aitken 1985: 42) differentiates between the quasi-categorical dialect shifting of rural speakers and the more flexible and gradual dialect drifting typical of urban speakers (Stuart-Smith 2008: 53). Scots is generally ascribed more to the working classes (Stuart-Smith 2008: 48) and is often called Modern Scots or Modern Urban Scots (Stuart-Smith 2003) if referred to in this context. Intra-speaker variation between Scots and SSE is largely conditioned by formality of context, formal occasions evoking SSE and private settings evoking Scots (Stuart-Smith 2008: 48).
Whatever factors result in a certain position in the continuum, McArthur (1979: 60) rightly suggests that, to avoid bias, speakers should be described as moving “across” or “over” from one pole of the continuum to the other, rather than “up and down”.12 Drifting is of course a more fuzzy and complex process than shifting (or switching), as style-drifters are less predictable in their behaviour and display more fluctuation (Aitken 1979: 85–86). The two different ways in which Scots and SSE can interact or relate to each other are also captured by McArthur (1979: 56) in a discussion of the concept of bipolarism as compared to that of diglossia. He uses both to describe the Scottish language situation, but while bipolarism refers to the continuum-scenario outlined above, diglossia refers to the shift-scenario assumed for rural settings. McArthur (1979: 57) generally finds diglossia too categorical a term. A useful general concept might be “flexible diglossia” (Schmitt 2009: 309) or ‘elastic diglossia’.
12. This is in contrast to the representation used in Figure 2.1 in Section 2.2, which is useful, however, because of the two-dimensional model that is proposed.
In the case of Scots and English, the concept of bipolarism can also be interpreted as a general theme in the history of the speech community, as in the following statement by McArthur (1979: 53): Thus, with the tongues called English and Scots no-one can point to a stage in time or space where something English becomes something Scots – unless of course one were to point to Hadrian’s Wall.
Thus, the present-day Scots-English continuum would appear to reflect in synchrony the diachronic developments Scots and English have undergone in each other’s company from the Reformation until today.
2.2 SSE as a double contact variety
The notion of ‘double contact’ refers to the fact that on the one hand SSE is situated at the standard pole of the Scots-English continuum (i.e. Scottish English), but at the same time at one pole of another continuum, namely that between SSE and SSBE, which is a continuum between two standard accents. While drifting within Scottish English is expected to follow the patterns typical of shifts between dialect and standard, the interaction of SSE and SSBE is on the whole less predictable. As explained in Section 1.1, it is not often consciously or theoretically acknowledged that “SSE is itself a contact variety” (Macafee 2004: 59). Research tends to focus rather heavily on the interface between Scots and English and is less interested in the contact between the standard varieties SSE and SSBE (Corbett, McClure & Stuart-Smith 2003: 4). This may be because SSE as a standard is not as clearly defined as SSBE (Giegerich 1992: 46). Additionally, the concept of SSE is hardly known to non-linguists in Britain and Scotland, as Schmitt (2009: 307) points out. An assessment of the role played by SSBE-SSE contact in language variation and change hinges on the relative status of these two accents in Scotland. There have been early attempts to promote SSE as a standard in teaching (Williams 1912; Grant 1914), and Abercrombie (1991: 53) also toys with the idea of establishing SSE as a standard accent that, he claims, is for several reasons superior to what he still calls RP, but what in this book is generally labelled SSBE. He even goes so far as to reject RP as a model accent, not only in the Scottish context but in general (Abercrombie 1991: 51):
RP is a much disliked accent in many parts of the world, particularly in Scotland and America.
I am an RP speaker, so I speak from experience. It is disliked, as well as envied, by many people in England also. This dislike is becoming more common, and also more outspoken. There are signs that RP’s prestige, privileges, and power are being eroded.
Later, he concludes that “[t]he position and prospects of RP today […] are not very bright” and that the accent “is slowly but surely on its way out” (Abercrombie 1991: 53). Abercrombie’s view seems a bit harsh and somewhat overgeneralising; we also have to be careful not to take the term RP (as Abercrombie uses it) to mean southern accents of British English in general. Wells (1982: 393) comments on the status of SSBE13 in Scotland in more moderate terms, simply saying that it “does not enjoy the same tacit status in Scotland as it does in England and Wales […].” Far from denigrating SSBE in a general way, Wells makes it clear that specifically for the Scottish context this accent may not be accepted as a model. Similarly, McClure (1994: 80) explains that Southern English accents are not native to Scotland and therefore do not generally enjoy social prestige – in his view, they are not necessarily perceived as negative, but simply do not appear to be a factor in the linguistic landscape of Scotland (also cf. Johnston 2007: 109). The above views (Wells 1982; Abercrombie 1991; McClure 1994) would result in a language-contact model that could be called a ‘vertical shift’ model: in Scotland, the Scots-English continuum functions as described in Section 2.1.3, but between SSE and SSBE there is virtually no interaction, i.e. influence of one upon the other, simply because they constitute two standards, one for the English, and one for the Scots. This scenario is represented in Figure 2.1a, where the vertical bar indicates that the Scottish English continuum between SSE and Scots is detached from SSBE. Others, while they are still very critical of the role of SSBE in Scotland, see at least a limited possibility of SSBE influence on the standard accent of Scotland.
Aitken (1979: 110–111) considers it possible or even likely that modifications of Abercrombie’s (1979) Basic Scottish Vowel System towards SSBE are due to individual speakers’ imitation of speakers from England. But, at the same time, Aitken describes the feelings of Scots towards southern speech as ambivalent: “[RP] simultaneously raises hackles and overawes”. This potential modification towards SSBE and the simultaneous discomfort with that accent can mean that only a limited number of accent features will anglicise, the accent as a whole remaining clearly identifiable as SSE. In the context of the present study, this means that even if speakers (variably) diphthongise (e) and (o) or vocalise coda (r) at unusually high rates, this alone cannot be interpreted as evidence of a general anglicisation of their accents. On the other hand, SSBE (or certain components of it) may very well serve as a model for individual speakers. This aspect is also taken up by Hansen, Carls and Lucko (1996: 68), who say that, while SSE as a standard dominates the linguistic situation in Scotland, RP – a term that is preferable to SSBE in this context (see
13. Wells, too, uses the term RP, not SSBE (see footnote 2).
footnote 2) – is still preferred by certain groups like the landed gentry. While this group’s importance as a role model is very limited, to say the least, the idea can probably be extended to speakers of the (upper) middle classes, who may be more readily influenced by southern English speech (see Johnston 1984). The most important prerequisite of an anglicised Scottish accent would seem to be contact with speakers of SSBE. This implies that speakers belong to the middle classes, as the necessary sociolinguistic contact is most likely to happen there (cf. Giegerich 1992: 57). McClure (1994: 91) also sees a general southern influence through the media. He thinks that there is not enough Scots or even SSE in television and the field is thus left to SSBE.14 Accepting the more liberal view that under certain circumstances speakers may become more English at least regarding some features of their accents, the model represented in Figure 2.1b would apply. Here the vertical shift along the Scots-English continuum is unchanged, but there is also an influence of SSBE on SSE.
[Figure 2.1, panels (a) and (b): schematic arrangement of Scots, SSE and SSBE; diagram not reproduced.]
Figure 2.1 Scots, SSE and SSBE: Two models of contact.
Intuitively there is no reason why this interaction should work in only one direction, i.e. why speakers of SSBE living in Scotland should not be influenced by and partly converge towards the SSE accent, but this is not an aspect investigated in the present study (also see the discussion of Figure 3.2 in Section 3.3). McClure (1994: 80) comments that RP speakers in Scotland, if not actual English nationals, have generally acquired their accent through direct English influence, such as education in England or in a quasi-English private school […].
According to McClure, it is reasonable to assume that noticeable anglicisation (including the complete anglicisation he mentions) is more likely in surroundings – like certain schools and universities – where English or strongly anglicised accents are spoken naturally and with no stigma attached to them, alongside other (in this case: Scottish) accents. This kind of contact functions as the ‘node’ between the two standard accents, SSE and SSBE. There is no set formula predicting which features are most likely to anglicise in these contexts, or which features are salient in the sense that they will be more readily perceived as English in a Scottish speaker. Aitken (1979: 111–113) suggests that certain accent features are likely candidates for the drift between SSE and SSBE and calls such features optional features of SSE. They will be discussed in Section 2.3.2.
14. However, see Trudgill (1986: 40), who doubts that there is a strong influence of one-way media like television on accent change.
2.3 Characteristics of the SSE accent
This section will concentrate on systemic, structural (i.e. phonotactic), and realisational aspects of the stressed monophthongs and consonants of SSE, as far as they can be regarded as defining characteristics that set this accent apart from SSBE. Aspects of suprasegmental phonology are not discussed, as they hardly play a role for the present study, and research in this field is scarce (Stuart-Smith 2008: 65–66; but cf. Grant 1914: 85–93; Wells 1982: 414–415). For a discussion of vowels in unstressed syllables see Abercrombie (1979: 80–81) and Stuart-Smith (2008: 55), and for a more detailed description of the full diphthongs /aɪ, au, ɔɪ/ see Schützler (2011a).
2.3.1 The sound inventory
The vowel system of SSE is very small (Abercrombie 1979: 81), due to the complete absence of centring diphthongs – which in turn is the result of rhoticity – and several vowel mergers (see below). One possible vowel inventory of SSE is /i, ɪ, e, ɛ, a, u, o, ʌ, ɔ, aɪ, au, ɔɪ/ (cf. Figure 2.2 below), which is very much in accordance with Giegerich (1992: 47), who transcribes /aʊ/ instead of /au/ and includes /ə/, but otherwise does not differ.15 Figure 2.2 is a full representation of the SSE vowel inventory. It is informed by and not very different from a chart provided by Giegerich (1992: 75), except for the trajectory of /au/, which is given as clearly fronting here, in accordance with the front quality of /u/.
The arrangement of /au/ and /u/ is based on the results of an acoustic investigation of Edinburgh speech (Schützler 2011a), which analysed a subsample of the data used for the present study. A central characteristic of SSE is the lack of phonemic vowel length. The so-called Scottish Vowel Length Rule (SVLR; Aitken 1981a; McKenna 1988; Scobbie, Hewlett & Turk 1999) specifies that there is no general length differentiation
15. Abercrombie (1979: 72) also includes /ɛ̈/ as an optional element, but its phonemic status can be doubted. It seems to be a non-contrastively distributed alternative to /ɛ/, which occurs in specific lexemes.
[Figure 2.2: vowel chart; diagram not reproduced.]
Figure 2.2 The vowels of SSE.
between the vowels in word pairs like beat/bit and bait/bet, all of whose vowels are essentially short (Giegerich 1992: 56). The rule further defines that vowels will be lengthened in open stressed syllables, before a voiced fricative, and before /r/. This is also the case if an inflectional suffix is added to an open syllable. Thus, the stressed vowels in agree and agreed are long, but the vowel in greed is short (McClure 1994: 80; Stuart-Smith 2008: 58). The current view is that only /i, u, aɪ/ are affected and that the role of SVLR is therefore much more limited than formerly believed (Scobbie, Hewlett & Turk 1999: 244). It is sometimes suggested that an additional contrast exists between (short) /aɪ/ and (long) /ae/ in minimal pairs like side and sighed (e.g. Abercrombie 1979: 72). Despite the fact that contrasts like this are responsible for a change in meaning, the phenomenon is usually treated as related to allophonic variation, as it is predictable from the vowel’s environment (e.g. Carr 1999: 157). The SVLR could also be said to create the minimal pair brood (monomorphemic) vs. brewed (morphologically complex) and thus the phonemes /u/ and /uː/, which has not been claimed in the literature. In SSE the lexical sets trap / palm / bath are represented by a single phoneme /a/ in the same way that lot / cloth / thought and foot / goose are represented not by pairs of phonemes but only by /ɔ/ and /u/, respectively (McClure 1994: 81; Giegerich 1992: 54; Jones 2002: 25–27; Stuart-Smith 2008: 59–60). Part of the reason for this reduced vowel inventory is seen in the SVLR (see above), with the general lack of length-distinctions between vowels eroding their qualitative distinctions also (McClure 1994: 81; Giegerich 1992: 56). 
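The lengthening environments of the SVLR described above can be made concrete in a short Python sketch. It is an illustration only, not an implementation taken from the literature cited: the function name `svlr_long`, its parameters and the flat segment representation are hypothetical, stress is taken for granted, and the set of affected vowels follows the narrower view attributed to Scobbie, Hewlett & Turk (1999).

```python
from typing import Optional

# Vowels subject to the rule under the narrower current view cited above.
SVLR_VOWELS = {"i", "u", "aɪ"}
# Voiced fricatives of the SSE consonant inventory.
VOICED_FRICATIVES = {"v", "ð", "z", "ʒ"}


def svlr_long(vowel: str, following: Optional[str],
              boundary_follows: bool = False) -> bool:
    """Predict whether a stressed vowel is long under the SVLR.

    vowel            -- phoneme symbol of the stressed vowel
    following        -- next segment, or None if the vowel is word-final
    boundary_follows -- True if an inflectional suffix attaches directly
                        after the vowel (agree#d, brew#ed); hypothetical flag
    """
    if vowel not in SVLR_VOWELS:
        return False
    if following is None or boundary_follows:
        return True  # open syllable, with or without an inflectional suffix
    return following == "r" or following in VOICED_FRICATIVES


# Word pairs mentioned in the text.
examples = {
    "agree": svlr_long("i", None),
    "agreed": svlr_long("i", "d", boundary_follows=True),
    "greed": svlr_long("i", "d"),
    "brood": svlr_long("u", "d"),
    "brewed": svlr_long("u", "d", boundary_follows=True),
}
for word, is_long in examples.items():
    print(word, "long" if is_long else "short")
```

Because length falls out of the vowel's environment in every case, the sketch also shows why pairs such as brood/brewed are usually treated as allophonic rather than phonemic contrasts.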
A differentiation between /ɑ/ and /a/ along the lines of the contrast in SSBE is reported by Abercrombie (1979: 75–76) especially for Edinburgh speakers, by Giegerich (1992: 57) for middle-class speakers from Edinburgh and Glasgow, and by Jones (2002: 25) for middle-class speakers converging upon SSBE norms in formal contexts. This phenomenon of reversing the mergers of paired SSBE phonemes into single SSE phonemes is perhaps best called a demerger, rather than a split, precisely because it undoes a previous merger. However, Abercrombie (1979: 75–76) points to a rather ‘un-English’ distribution.
For example, gather, salmon and value would be pronounced as [ˈgɑˑ.ðər], [ˈsɑˑ.mən] and [ˈvɑˑl.ju], respectively.16 Abercrombie interprets this as an indicator of the antiquity of this practice, since present-day contact would have resulted in the ‘correct’ acquisition. The phenomenon of demergers would thus appear to be a lexical rather than a sociolinguistic phenomenon. A (partial) demerger of /ɔ/ into /ɒ/ and /ɔ/ may also happen in SSE (Wells 1982: 402–403) but generally seems to be more rare than demergers of /a/ (McClure 1994: 81). Again, Abercrombie (1979: 76) points to a distribution unlike the one found in SSBE, with lorry, squash and watch pronounced as [ˈlɔˑ.ri], [skwɔˑʃ] and [wɔˑtʃ], respectively. According to Wells (1982: 402), SSE /u/ never demerges into /ʊ/ and /u/. There is a general tendency to describe the demerger of /u/ as rather unlikely, that of /ɔ/ as more likely, and that of /a/ as the most probable of the three (McClure 1994: 81). These different probabilities of the three demergers are formulated into an implicational scale by Abercrombie (1979: 76), with /ɔ/→/ɒ, ɔ/ implying /a/→/æ, ɑ/, and /u/→/ʊ, u/ implying both of the others. For the SSE vowels /a/ and /ɔ/, Giegerich (1992: 75) records qualities approximately intermediate between those of their counterparts in SSBE. This is on the whole upheld by Schützler (2011a), although Stuart-Smith (2008: 59–60) points to considerable qualitative variation in both phonemes. The vowel /u/ is transcribed as /ʉ/ by Stuart-Smith (2008: 60), who describes it as a high, usually rounded, central or front vowel. The vowel /ɪ/ is often considerably lower in SSE than in SSBE (Wells 1982: 404; Giegerich 1992: 72). Stuart-Smith (2008: 58) points to considerable sociophonetic variation of this vowel, with lower-class speakers using more retracted and lower variants than those of a higher class (cf. Schützler 2011a; Clark 2008). 
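The implicational scale attributed to Abercrombie (1979: 76) above can be restated as a small consistency check. The following Python sketch is illustrative only: the names `PRESUPPOSES` and `fits_scale` are hypothetical, and each demerger is simply keyed by the SSE phoneme that splits.

```python
# Each demerger is keyed by the SSE phoneme that splits; the value lists
# the demergers it presupposes according to the scale reported above.
PRESUPPOSES = {
    "a": set(),        # /a/ -> /æ, ɑ/ presupposes nothing
    "ɔ": {"a"},        # /ɔ/ -> /ɒ, ɔ/ implies the /a/ demerger
    "u": {"a", "ɔ"},   # /u/ -> /ʊ, u/ implies both of the others
}


def fits_scale(demerged):
    """Return True if a speaker's set of demergers respects the scale."""
    demerged = set(demerged)
    unknown = demerged - PRESUPPOSES.keys()
    if unknown:
        raise ValueError(f"not a demerger on the scale: {sorted(unknown)}")
    return all(PRESUPPOSES[v] <= demerged for v in demerged)


print(fits_scale({"a"}))        # only the most probable demerger
print(fits_scale({"a", "ɔ"}))   # /ɔ/ demerger together with /a/
print(fits_scale({"ɔ"}))        # /ɔ/ demerger without /a/
```

A speaker profile that returns False here would be a counterexample to the scale, which is one way such a hierarchy could be checked against observed data.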
The vowels /e/ and /o/ of the lexical sets face and goat are in most descriptions given as monophthongs in SSE. Since they are two of the variables in this study, no details are given here (but see Section 5.3). The number of vowels that can occur before /r/ is greater in SSE than, for example, in SSBE or GA. This is because levelling of vowels before /r/ is not very pronounced in SSE: kirk, work and perk are pronounced /kɪrk/, /wʌrk/ and /pɛrk/, although it is typical of Edinburgh speakers to merge their vowels upon /ɜ/ (Abercrombie 1979: 80). However, these vowel mergers before /r/ are not generally very pronounced in SSE (McClure 1994: 83). Abercrombie (1979: 80) reports increasing mergers of the vowels in first, word and heard into a single central vowel among the professional classes of Edinburgh and Glasgow, but he points out that this process is not necessarily connected to derhoticisation, as speakers remain rhotic despite the loss of vowel distinctions.
16. There are no length diacritics in the original.
The consonant inventory of SSE is basically equivalent to that of SSBE, with the addition of two phonemes, /x/ and /ʍ/ (Stuart-Smith 2008: 61; cf. Giegerich 1992: 34). Abercrombie does not mention /ʍ/ and thus describes the SSE inventory as exceeding what he calls the “general English consonant system” by only one phoneme, namely /x/. This, he claims, is a phoneme unique to SSE among standard accents of English (Abercrombie 1979: 71). Apart from these systemic peculiarities, SSE is described as a rhotic accent (Abercrombie 1979: 69; Wells 1982: 10–11, 410; McClure 1994: 82; Jones 2002: 26–27). The consonant inventory of SSE is shown in Table 2.1. Symbols in square brackets indicate additional phonetic information. Thus, the glottal stop [ʔ] is not a phoneme in its own right but an allophone of (mostly intervocalic alveolar) stop consonants, the realisations of /r/ as [r] and [ɾ] equally have no impact on the system, and /w/ and /ʍ/ appear in both bilabial and velar position because of their double articulation.
Table 2.1 The consonants of SSE.
                     bilabial  labiodental  dental  alveolar  palato-alveolar  palatal  velar    glottal
stop                 p b       -            -       t d       -                -        k ɡ      [ʔ]
fricative            -         f v          θ ð     s z       ʃ ʒ              -        x        -
affricate            -         -            -       -         tʃ dʒ            -        -        -
lateral approximant  -         -            -       l         -                -        -        -
nasal                m         -            -       n         -                -        ŋ        -
approximant          ʍ w       -            -       r         -                j        [ʍ] [w]  -
trill / tap          -         -            -       [r] [ɾ]   -                -        -        -
The phoneme /x/, realised as a velar fricative (Giegerich 1992: 41), occurs in Gaelic loans, place names and, allegedly, also in words like technical and technique (Abercrombie 1979: 71). McClure (1994: 84) claims that Scots people use /x/ in biblical, classical and foreign names like Enoch, Antioch, Arachne, Munich, Bach, but also finds that it is generally in decline. While Wells (1982: 408) and Stuart-Smith (2008: 63) see /x/ as a stable feature of SSE, Jones (2002: 28) reports its erosion.17 Another sound that is characteristic of Scottish English accents is /ʍ/, which may occur in words beginning with <wh> and creates minimal pairs like wail/whale or witch/which. Articulatorily, the sound can be described as a voiceless labio-velar fricative (McCully 2009: 47).18 There is general agreement that /ʍ/ is maintained in SSE (Wells 1982: 408; Giegerich 1992: 36; McClure 1994: 84; Jones 2002: 27–28; Stuart-Smith 2008: 63), but evidence of the decline of /ʍ/ has also been found (Brato 2007 for Aberdeen; Schützler 2010b for Edinburgh SSE). Apart from the two sounds /x/ and /ʍ/, SSE differs from southern British accents mainly in that it is rhotic. Additionally, the realisation of /r/ is very variable. Both aspects are discussed in detail in Section 7.3. Stop consonants in SSE are generally less aspirated than in other accents (Stuart-Smith 2008: 61), even in the cases of /p, t, k/ in stressed initial position (Wells 1982: 40). Dental variants of /t, d/ can occur especially in Gaelic-influenced areas but also in popular Edinburgh speech and in Aberdeen (Wells 1982: 409). This dentalisation may also affect /l/, which then attracts dental /d, t/ in words like belt and kilt (ibid.). ‘Clear /l/’ [l] and ‘dark /l/’ [ɫ] are not allophones of /l/ in the same way as in SSBE, as the sound tends to be velarised (‘dark’) in all contexts, while invariably ‘clear’ realisations are associated with Gaelic-speaking areas (Wells 1982: 411–412; Stuart-Smith 2008: 65). Wells (1982: 409) reports a considerable degree of /t/-glottaling in less prestigious accents in the Central Lowlands including Glasgow and Edinburgh. Stuart-Smith (2008: 62) also gives this realisation of final or intervocalic /t/ as [ʔ], e.g. in butter [ˈbʌ.ʔər] or bit [bɪʔ], as an option that middle-class speakers may choose.
17. Unlike in names like Munich or Bach, /x/ is rarely heard in words like technical or technique, and this usage has also been doubted by SSE speakers of the author’s acquaintance. Furthermore, in Munich the realisation of /x/ would be [ç], while in Bach it would be [x], a distinction not normally made in the literature.
2.3.2 Criterial and optional features of SSE
Aitken (1979: 111–112) differentiates between criterial and optional features of SSE. The former are displayed by virtually all middle-class speakers, resulting in a vowel system like the one shown in Figure 2.2. Optional features, on the other hand, are modifications away from this common core, which may look like changes in the direction of SSBE.
According to Aitken, these features include the merger of vowels before /r/ into /ɜ/, the diphthongisation of /e/ and /o/, and the loss of coda /r/. Aitken gives two characteristics of optional features: (i) they do not apply to all middle-class speakers, and (ii) they are of relatively recent origin, i.e. they did not start to emerge before the late 19th or early 20th century. In a comment on SSE accents that include optional features, Aitken (1979: 113) explicitly allows for the possible drift between SSE and SSBE (emphasis in original):
18. See the detailed discussion of terminological problems regarding /ʍ/ in Schützler (2010b: 8–9).
As their widespread middle-class occurrence would suggest, ‘hybrid’ accents of this kind seem to attract little popular attention and receive little or no adverse middle-class comment: they are just as acceptable as, or perhaps more acceptable than, accents which employ the Basic System and possess only those features which are criterial for ‘educated’ Scottish Standard English speech.
This suggests that SSE as a standard is somewhat more elastic than sometimes suggested (e.g. in McClure 1994), but also that anglicisation, rather than being a general phenomenon, is particularly likely in a restricted set of accent features. One of the aims of the present study is to investigate whether or not the variables under investigation can be claimed to be optional features of SSE.
chapter 3
Explaining accent variation and change
For the situation of accent contact between SSE and SSBE in the present study, it is assumed that variation and change at a higher level depend upon inter-speaker processes at a lower level. When there is accent contact, speakers change their linguistic behaviour according to the principles of Speech Accommodation Theory and Audience Design, and more permanent accent change can result from these accommodative inter-speaker changes. Sociolinguistic parameters can help to account for the patterns of accommodation and change; language-internal factors, on the other hand, are responsible for additional variability.
3.1 Accent contact and change by accommodation
For the present study it is accent contact (as a special case of dialect contact) rather than language contact (cf. Thomason 2001) that is of interest. Kerswill and Williams (2002: 82) caution against the treatment of dialect contact as simply a subtype of language contact, because it does not involve the learning of a new language. However, it is implicit in the literature that no such critical distinction has to be made between accent contact and dialect contact. Therefore, while most sources use the term dialect contact as a somewhat more general label, theoretical assumptions can be taken to apply to situations where only accent features are of interest, as in the present study.19 Trudgill (2003: s.v. dialect contact; cf. also Britain & Trudgill 2000: 73) defines dialect contact as “[c]ontact between linguistic varieties which results from communication between speakers of different but mutually intelligible dialects, often involving accommodation”. Milroy (2002: 4) stresses the rising importance of such contact as an explanatory tool in variationist linguistics. This is in contrast to earlier approaches, in which a speech community was regarded as geographically closed and as undergoing internal, rather than contact-induced social stratification (cf. also Kerswill 2003: 225).
19. In differentiating between dialect and accent, the mainstream view (e.g. in Trudgill 2000: 5) is followed, namely that between dialects there may exist differences in vocabulary, morphology, syntax and pronunciation, while differences between accents concern pronunciation only.
Largely based on Trudgill (1986: 39, 126), Auer and Hinskens (2005) discuss what they call the “change-by-accommodation model” as a “way of linking change at the level of the community to variable use in verbal interaction” (Auer & Hinskens 2005: 356). They explore the long-standing notion that speaker-to-speaker variation and change is the basis of dialect change at the level of the entire speech community. Essentially, they propose a model of language change based on Speech Accommodation Theory (SAT; Giles 1973; also cf. Giles 2009; Thakerar, Giles & Cheshire 1982; Giles & Coupland 1991), as will be outlined below. Change by accommodation happens in three stages, which form a sequential or implicational hierarchy (Auer & Hinskens 2005: 335–336; cf. Trudgill 1986: 39). First, in the so-called interactional component of the model, speakers accommodate to new forms in face-to-face interaction (cf. Trudgill 1986: 1–3; Kerswill 2003: 223). In the second stage, accommodation to new forms becomes habitual and quasi-permanent for individual speakers, even if there is no longer any interactional need for accommodation. This is called the individual component of the model. Finally, the changed speech habits are generalised across the speech community as a whole. If this stage is reached, change has taken place. For Trudgill (1986: 39), the two crucial factors in the transition from a merely situational accommodation process to one that is at least habitual are frequency of contact (i.e. extensiveness; see also Auer & Hinskens 2005: 337) in combination with a set of attitudes that is favourable to change. At the level of the speech community, contact between different dialects (or accents) leads to mixture, i.e. the coexistence of many different linguistic forms in use. If this situation develops into a synchronically stable new variety, this is called a koiné (Trudgill 1986: 107–108; cf. Trudgill 2003: s.v. koiné).
A koiné is more focused than the original dialect mixture, which means that the several input dialects (or accents) have developed into a single new dialect with norms of usage that are generally agreed upon (Trudgill 2003: s.v. focused). In this context, the term levelling refers to the process of reducing the number of variant forms (e.g. variant pronunciations), often through the discontinuation of use of minority forms or forms that are in other ways socially or regionally marked (Milroy 2002: 7; cf. Farrar & Jones 2002: 9; Kerswill & Williams 2005: 1024). If several variants of a feature are not levelled but remain in the community’s repertoire, they are likely to be reallocated, i.e. different variants are given certain functions and are used in a systematic way (Trudgill 1986: 110; Trudgill 2002: 108; Britain & Trudgill 2000: 73–74). Examples given by Trudgill (1986: 118–121) are reallocation according to style (i.e. intra-speaker reallocation of variants), according to social class (i.e. inter-speaker reallocation), or according to geographical area. For the present study, the working assumption will be that there is a considerable amount of contact between middle-class speakers of SSE and speakers of
Chapter 3. Explaining accent variation and change
SSBE, and that the mechanisms of dialect (or accent) mixing apply more or less as described above. The general position of variants relative to the two poles of the variation continuum will be interpretable as habitual, quasi-permanent accommodation that is a stable characteristic of present-day SSE. Or, if there is structured variation, the coexistence of different variants will be interpreted as reallocation in the sense of a certain flexibility between the two accents in question, which enables speakers to make choices depending on the communicative context. 3.2 Internal and external factors in variation and change The subdivision of factors into internal and external ones plays a central role in much of the literature on language variation and change (LVC). As Aitchison (2001: 153) suggests, external factors strengthen internal processes. She describes sociolinguistic factors as “superficial” in the sense that at a higher level they exploit tendencies encoded in “deep” (i.e. internal) factors (2001: 151). Adopting this view, it will be argued that internal variation enriches the repertoire of variants that are available to speakers, and that social variation can capitalise on this latent low-level variation. Thus, the internal and the external have specific contributions to make to LVC as a whole. More detailed discussions of the terms internal and external (as well as other, related terms) and their theoretical implications can be found in Farrar and Jones (2002: 1–8), Bright (1997: 82–83), and Foulkes and Docherty (1999: 10). However, as a practical terminological solution, the present study uses the term internal simply to denote properties that are measured below the levels of speakers or texts, i.e. below the socio-stylistic levels of analysis (cf. 4.4). This coincides with Aitchison’s (2001: 134) definitions: On the one hand, there are external sociolinguistic factors – that is, social factors outside the language system. 
On the other hand, there are internal psycholinguistic ones – that is, linguistic and psychological factors which reside in the structure of the language and the minds of the speakers.
This approach also rather elegantly takes care of potentially controversial variables like lexical frequency. While frequency is a characteristic attached to types (lexemes) and therefore unconnected to properties of individual speakers, it is at the same time the very use of language by speakers that determines how frequent a type becomes in a particular variety. Lexical frequency is therefore linked to various levels: the type, the speaker (or community of speakers), and possibly even the language variety itself. Following Aitchison’s definition, however, it can legitimately be treated as an internal variable.
3.2.1 Age, gender and contact
The change-by-accommodation model presented in Section 3.1 assumes that speakers accommodate in interaction and that this may lead to changed speech habits of individual speakers and eventually to a change in the whole speech community. This section is concerned with the social patterns that processes of short-term accommodation and long-term change are traditionally expected to follow. It is assumed that accommodation depends on speaker characteristics. For example, women do not speak differently because they are women, but because as women they respond differently to contact situations. It is important to bear in mind that variables like age and gender are primarily descriptive, not explanatory.20 The explanations that are proposed in the present study are found in SAT and its role in accent contact. Social class is not included as a predictor variable (cf. for example Ash 2002 or Meyerhoff 2006: 155–183) although it is rather likely to play a role in contact situations. However, considering the theoretical angle of the present study, it is useful to assume a somewhat simplified scenario in which the settings of the school and the university create sufficient uniformity to overrule differences in social background. Chambers (2002: 349) identifies speaker age as “the primary social correlate of linguistic change”. If there is change, the relative frequency of an innovative variant is expected to be negatively correlated with speakers’ age, i.e. younger speakers display a higher frequency of this variant (Chambers 2002: 355; cf. Feagin 2002: 28). The approach described by Chambers (2002: 355), i.e. comparing the speech of older and younger speakers in synchrony to arrive at conclusions concerning diachronic developments, is commonly referred to as apparent-time (Labov 1994: 43–72).
In apparent-time approaches “the speech of each generation is assumed to reflect the language more or less as it existed at the time when that generation learned the language” (Bailey et al. 1991: 241). The popularity of the apparent-time approach is ascribed by Bailey (2002: 312) to its ability to show formerly unobservable change in progress. As a general qualification and cautionary note, Bailey (2002: 314) says that data analysed in apparent time can be no more than a substitute for real-time data and cannot be taken to have the same meaning (cf. also Chambers 2002: 358). Three potential weaknesses attached to apparent-time studies are identified by Bailey (2002: 314–315). First, it is not clear how generally apparent-time studies reflect diachronic change. That is, the two may not always be well-aligned, and comparative work on this issue is limited (but see Bailey et al. 1991: 261–262). Second, apparent-time approaches assume but cannot be certain that vernaculars are stable at the level of the individual speaker, at least from a time in early adolescence onwards. Third, there is the possibility of age grading, where differences between age groups do not represent change in progress but a pattern that is repeated in each generation of speakers (cf. also Chambers 2002: 358). In response to the second problem, that of the diachronic (in)stability of an individual’s speech, Bailey (2002: 324) explains that individual vernaculars can confidently be assumed to stabilise only in early adulthood, while even older teenagers’ habits are still unstable. Meyerhoff (2006: 133), on the other hand, restricts the critical period of language learning largely to childhood. Since some of the speakers in the present study are as young as seventeen, and, moreover, assignment of speakers was only reasonably possible to two age groups, using the binary predictor age (cf. Appendix D.1), claims of possible change in progress will have to be made with due caution.
20. If in the present study a factor is discussed in general terms, standard orthography is used (e.g. ‘speech rate’, ‘word list’, ‘word-list style’). If the same concepts are translated into independent variables (that can feature in model outputs, for example), small capitals and in many cases also a contracted spelling are used (e.g. speechrate, wordlist).
The second major external factor is speakers’ gender. The term gender generally refers to a social or cultural construct, while sex describes biological characteristics (cf. also Eckert 1989: 245; Milroy & Gordon 2003: 100; Meyerhoff 2006: 133). Strictly, sex and gender are not binary concepts, but they are usually treated as such (Cheshire 2002: 424). In the present study, the binary predictor gender (cf. Appendix D.1) is used as what Cheshire (2002: 425) calls an “unanalysed speaker variable”, i.e. a relatively crude category (cf. Eckert 1989: 247). The possibly multi-faceted construction of gender by speakers at the level of the local community or at the level of society as a whole is not discussed.
Some recurrent differences between male and female speakers will be briefly summarised below. Labov’s (1990: 210–213) first principle of gender-related variation states that for variables characterised by stable social stratification, women will generally use higher proportions of standard variants, i.e. variants associated with overt prestige (cf. Eckert 1989: 247–248). A second, related principle (Labov 1990: 213–214; 2001: 274) is that women are more likely than men to adopt incoming prestige forms in change from above (cf. also Meyerhoff 2006: 209).21 For the well-known arguments that explain these patterns in terms of differences in social position and power between men and women, see the summary in Cheshire (2002: 426–427). Eckert and McConnell-Ginet (1999: 198) argue for an adjustment of research focus away from “properties” of men and women towards their “social practices and social relations”, or, away from gender “attributes” to gender-related “actions and stances”. This view is compatible with an SAT-based approach (see 3.1). Instead of seeking the reasons for women’s speech behaviour in their social position and in power relationships, it might make sense to look for mechanisms at the level of interactive accommodation that differ between women and men. If women are psychologically more observant of differences in language (as can be read between the lines in Meyerhoff 2006: 209), and if we accept that there may be a basic – even a biological or neurological – difference between the sexes regarding their perception of and sensitivity to small differences in speech, then this alone could account for some of the linguistic behaviour of women. If women are more perceptive than men, they will be able to accommodate more sensitively. For this mechanism, social reasons do not even need to be invoked. In short, the characteristic that is measured in the present study is speaker sex, but the social variable that is used is labeled gender, assuming that sex correlates closely with gender as a simple category (cf. Labov 2001: 263).
21. The so-called “gender paradox” (Labov 2001: 292), sometimes treated as an additional principle of gender-related variation, is of little relevance in the present study.
In the present study, contact as a predictor is quite literally taken to mean exposure to and interaction with speakers of SSBE beyond the sporadic everyday encounters that are commonplace in a city like Edinburgh. There are three scenarios for which this was assumed: (i) one or both parents are English, although the speaker can be classified as fully Scottish (speakers 17-20f-c, 18-19f-c, and 27-22m-c), (ii) the speaker has spent an extended period working in England (speaker 11-57f-c), or (iii) the speaker has spent an extended period studying in England (speaker 12-40m-c). The general assumption is that due to the settings in which interviews were conducted, all speakers have considerable experience of SSBE. But in the cases of those speakers who meet the above conditions, there exists (or existed) a particularly high level of exposure.
In treating those speakers as a group potentially influenced by contact, the present study follows Trudgill (1986: 39) and assumes that the frequency (or extensiveness) of contact with speakers of another accent is a crucial factor in accommodation-based models of variation and change (cf. 3.1). 3.2.2 Style Apart from the age and gender of speakers, style is a sociolinguistic dimension along which language can vary. Schilling-Estes (2002: 375) roughly defines stylistic variation as sociolinguistic variation within speakers, in contrast to variation between speakers (cf. also Milroy & Gordon 2003: 198). Two theories of style have been particularly influential (Schilling-Estes 1998: 68): Attention Paid to Speech (ATS; Labov 1972: 70–109) and Audience Design (AD; Bell 1984). The following paragraphs will explicate that style shifts occurring in the interview situation in the present study will be interpreted in terms of AD rather than ATS. However,
this does not necessarily mean that traditional experimental devices like the word list or the reading passage need to be abandoned. The assumption that underlies ATS is that the less consciously speakers monitor their own linguistic behaviour, the more vernacular their speech will be, and vice versa. In the original theory by Labov, style shifts are not only viewed as resulting from different degrees of attention paid to speech, but these degrees are equated to different levels of formality (Schilling-Estes 2002: 379). Still basically adhering to the original ATS framework, Trudgill (2000: 82, 89) points out that formality is actually a more complex factor, due to the situational (or interactional) aspect of the interview. This situational dimension is of central importance in the general criticism of ATS formulated by Giles (1973: 87–88), whose Speech Accommodation Theory (SAT) can be seen as a forerunner of Bell’s (1984, 2009) Audience Design framework. Giles argues that Labov fails to take into account processes between speaker and interlocutor (see 3.1 above). Bell (1984; 2001; 2009) expands the SAT framework beyond the dyad of speaker and interlocutor, puts it into a purely linguistic context, and develops a set of principles (Bell 2009: 268–273; cf. also Bell 2001: 141–148). Three aspects are most relevant for the present study. First, “[s]peakers design their style primarily for and in response to their audience” (2009: 269–270). This claim is Bell’s third principle, which he calls “the heart of audience design”. It is argued that it is the influence of an audience that triggers style shifts, not any introspective process of evaluating the social meaning of a variant. Second, “[s]tyle derives its meaning from the association of linguistic features with particular social groups” (2009: 269). This is Bell’s second principle, which situates AD firmly within sociolinguistics. 
It is in contrast to ATS, where linguistic features are associated with social groups only very indirectly via the link of attention, and the social group relevant in a certain situation cannot therefore be a particular group. Third, “[v]ariation on the style dimension within the speech of a single speaker derives from and echoes the variation which exists between speakers on the ‘social’ dimension” (Bell 2009: 271). This so-called Style Axiom (1984: 151; 2001: 144), Bell’s fifth principle, establishes the relationship between style shifts and actually existing sociolinguistic differences between speaker groups. It agrees not only with Labov (1972: 70–109), but also with the models of dialect variation and change presented by Trudgill (1986) and Auer and Hinskens (2005) as explained in 3.1 above. SAT and AD are very similarly motivated, but between them and Attention to Speech there is a rather deep theoretical divide. Interestingly, the practical difference in design between an ATS-based study and one that adopts SAT need not be equally large. In other words, the theoretical shift from ATS to SAT/AD does not necessarily rule out the use of production types like word lists, reading passages,
and interview components (careful or casual speech), which were developed in traditional ATS-based research. In the present study it is precisely these three stylistic categories that will be used – the predictors being wordlist and text, with careful speech remaining uncoded as the reference category. Results are compatible with ATS, but will be interpreted in terms of SAT or AD. It is proposed that the two approaches to style can be viewed as complementary – in other words, the degree of attention paid to speech and the degree of inter-speaker accommodation are negatively correlated. This is shown schematically in Figure 3.1.
[Figure 3.1: two parallel continua running from word list through reading passage to careful speech. ATS runs from maximal to minimal attention to speech; SAT runs from minimal to maximal interactive accommodation.]
Figure 3.1 Complementary stylistic continua: ATS (above) and SAT (below).
In this model, word list speech will produce what is the norm for the speaker. There is no active interaction to speak of, even if the interviewer is of course still present. At the same time, in the traditional ATS approach, the speaker will pay most attention to the exact form of what is produced in the word list, precisely because the task is cognitively easy for a literate speaker. At the opposite end of the continuum, careful speech will give rise to more accommodation towards the interviewer, as it contains responses to questions or spontaneous tasks, which have a more interactive dimension. The reading passage takes an intermediate position between the two poles of the continuum. What is applied in this study could essentially be described as a reinterpretation of Labov’s ATS approach in terms of SAT (see Giles & Coupland 1991: 62), using the conventional text production types known from the traditional sociolinguistic interview. 3.2.3 Ease of articulation and clarity Blevins (2004: 71–72) explains that teleological approaches to language variation and change “view sound patterns as moving towards optimal targets”. She also identifies the two main targets, ease of articulation and clarity, whose pull in opposite directions results in a continuum of variation (cf. Lindblom 1990).22
22. Optimality Theory is a more formal teleological framework (Blevins 2004: 76–77; see Minkova & Stockwell 2003 for an application) but plays no major role in this study.
Sound changes can result in reduced or increased effort. Thomas (2011: 274) cites assimilation, deletion, lenition, and monophthongisation as examples of effort-reducing changes.23 On the other hand, he makes reference to several sound changes that are fortitions and concludes that it cannot be predicted whether clarity or ease of articulation will prevail in a specific case (2011: 275). In the present study it is assumed that part of the explanation of variation and change may lie in optimisation of some kind, following Lindblom (1998: 259–260), who treats the tension between the goals of clarity and ease as “an important source of the phonetic variation from which speech community members select innovations of the phonological system”. The two predictor variables that are used are stress and speechrate, indicating the level of prosodic prominence of a syllable containing one of the target variables and the speed at which utterances are produced. Measurements of the independent variable speechrate are based on connected stretches of speech containing the target word and excluding pauses. Two parameters are measured: (i) the number of syllables and (ii) the duration in seconds. It is assumed that speech rate is a nonlinear measure. For example, an increase in speech rate from 3 syll./s to 4 syll./s is not equivalent to an increase from 5 syll./s to 6 syll./s, the latter being the smaller increase of the two in relative terms. Each pair of measurements (number of syllables and duration) for a given interval was therefore transformed using the equation speechrate = log(syll./s)/log(2) – 2 (see Appendix D.1). A doubling of the number of syllables per second will result in a unit increase in speechrate. Additionally, subtracting 2 centres the scale on 4 syllables per second (since log(4)/log(2) = 2), the whole number value that is closest to the average value in the data. The variable stress is categorical, not continuous.
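As an illustration only, the speech-rate transformation described above can be sketched in Python. The function name mirrors the predictor speechrate, but the signature, the argument names and the input checks are assumptions made for this sketch; they are not taken from the study's materials.

```python
import math


def speechrate(n_syllables: int, duration_s: float) -> float:
    """Centred log2 speech rate: log(syll./s) / log(2) - 2.

    Hypothetical helper: n_syllables and duration_s stand for the two
    measurements taken on a pause-free stretch of connected speech.
    """
    if n_syllables <= 0 or duration_s <= 0:
        raise ValueError("syllable count and duration must be positive")
    rate = n_syllables / duration_s   # syllables per second
    return math.log2(rate) - 2        # 4 syll./s maps to 0


# A doubling of the rate adds one unit on the transformed scale,
# and equal absolute steps shrink as the rate increases.
step_low = speechrate(4, 1.0) - speechrate(3, 1.0)    # 3 -> 4 syll./s
step_high = speechrate(6, 1.0) - speechrate(5, 1.0)   # 5 -> 6 syll./s
```

On this scale the step from 3 to 4 syll./s is larger than the step from 5 to 6 syll./s, which is the nonlinearity the transformation is meant to capture.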
For (r), it takes whole-number values of 0–3, which are treated as ordinal and centred on the value 1.5. The approach was guided by the concept of metrical grids, as found in Hayes (1983: 366–371). Values of stress = 0 mark syllables that are maximally weak, as in reduced function words, while stress = 3 marks /r/ in syllables that are maximally strong, e.g. in focus stress position. For (e) and (o), stress only ranges from 0 to 2, because for these vowels weak forms – possible, for example, in so and no – were excluded, as it was felt that the reduction of the vowel to a central quality approximating to [ə] would have disqualified it within the context of diphthongisation. For these two dependent variables, stress was centred on the value 1. Coding for stress was done by looking at the context of the relevant syllable and giving it a weight relative to the rest of the intonation phrase (cf. discussion in Gut 2013b).
23. Thomas (2011: 274) reproduces the view that less tongue movement means less effort, but tense stability (as in monophthongs) may actually be more effortful than movement. For example, Krug (2012: 772) partly explains the diphthongisation of the tense ME vowels /iː/ and /uː/ in terms of lenition.
3.2.4 Frequency effects
In usage-based approaches to variation and change, the frequencies with which patterns are used in a language have effects on their mental representation and possibly the phonetic forms that are produced (Bybee 2001: 1).24 All tokens perceived by a speaker are sorted into categories which contain considerably more information than would be necessary for the mere establishment of phonemic contrast. One such category that comprises a number of tokens of sufficient phonetic, contextual and social similarity is called an exemplar (Bybee 2001: 51–52; 2010: 19). As Phillips (2006: 94) explains, innovative forms are stored as part of the lexical representation of the word form. If exposure to one particular variant is high, the respective exemplar is strengthened and may eventually come to be stored as the default, which is then most easily evoked in production (cf. Bybee 2010: 19). This reflects the potential of frequency-based theories in studies of language variation and change like the present one. One finding that has proved relatively robust across different studies is that phonetic change progresses faster in high-frequency words, especially as far as reductive changes such as consonant deletion or vowel reduction are concerned (Bybee 2010: 20; Krug 2003: 18; cf. Bybee 2001: 11; Zipf 1929). According to Bybee (2010: 20), the “bias towards lenition” – with lenition defined as articulatory reduction – of high-frequency words results from practice: gestures overlap more as the type is reinforced through repeated production and perception (also see Gut 2009: 39 and Krug 2012: 772).
One could also say that the phonetic components of high-frequency words are tied more closely together and will be stored and produced less analytically, but rather as one synthetic whole – a process remotely similar to morphological reduction in grammaticalising high-frequency constructions (e.g. want to → wanna; cf. Krug 2000: 152; Krug 2001: 214).
24. For the discussions in this section the relevant concept of frequency is token frequency, i.e. the occurrence of a unit (word) that was actually produced in text (Bybee 2001: 10; cf. Krug 2003: 9–10).
In the present study, the frequency of a token was measured as the logarithm (to the base of 10) of a lexeme’s frequency per 10 million words in the spoken part of the British National Corpus (BNC), centred around the value 3 (see Appendix D.1). The BNC as a reference corpus was chosen because it can serve as a fair representation of British Standard English usage. The predictor bnclogf is based on the added frequencies of closely related words with compatible lexical stress patterns. For example, bnclogf for the word pacing is based on the added frequencies of pacing, pace, paces, paced, pacers, and pacemaker in the spoken BNC.25 The underlying assumption is that a derived or inflected word-form of low frequency is still likely to be influenced by the high frequency of the root or stem on which it is based. Since it is not possible to calculate the logarithm of zero, for lexemes that occurred in the data but not in the BNC a single occurrence in the latter was simulated.26 For linking (r), two additional frequency-related variables were measured: clc bnclogf, which describes the frequency of two-word collocations, and nx bnclogf, which describes the frequency of the second word in a two-word collocation (see Appendix D.1).27 Two final points need to be made. First, word frequency as a conditioning factor is not in conflict with other (e.g. phonetic) avenues of explanation (Bybee 2002: 262). In the words of Aitchison (2001: 86), “[a] change is […] most likely to ‘get its foot in the door’ in places where frequency of usage is combined with linguistic susceptibility”, i.e. where several factors interact, one of them being frequency. Second, Bybee (2010: 21) points out that exemplars can be structured in such a way that phonetic differences can be stored in association with certain social groups along dimensions such as age and gender, or class. Hence it is possible to build an entirely usage-based theory of sociophonetic variation and change. For the present study, however, this kind of approach was not chosen, mainly because it would have required a different (and in many respects more elaborate and time-consuming) research design (see, for example, Eckert 2000 or Clark 2008: 79–83).
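The frequency measure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, its parameters and the example counts are hypothetical, the size of the spoken corpus is passed in rather than assumed, and the treatment of zero counts follows the description above (a single occurrence is simulated).

```python
import math


def bnclogf(related_counts, corpus_size_words, centre=3.0):
    """Centred log10 frequency per 10 million words (hypothetical helper).

    related_counts: raw corpus counts of the target word and its closely
    related forms, which are added together before taking the logarithm.
    corpus_size_words: size of the reference corpus in words (assumption:
    supplied by the caller).
    """
    total = sum(related_counts)
    if total == 0:
        total = 1  # simulate a single occurrence; log(0) is undefined
    per_10m = total * 10_000_000 / corpus_size_words
    return math.log10(per_10m) - centre


# Invented counts for a family of related forms in a 10-million-word corpus.
example = bnclogf([600, 300, 100], corpus_size_words=10_000_000)
unattested = bnclogf([0], corpus_size_words=10_000_000)
```

With these invented counts the word family reaches 1,000 occurrences per 10 million words, which sits exactly at the centre of the scale, while an unattested lexeme falls three units below it.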
3.2.5 Other internal factors
Additional language-internal factors that are included in the present study are to do with the historical pronunciation of the sounds under investigation, the position of segments within the word, or the position of syllables in the sentence, as well as preceding or following segments. A short technical definition of all independent variables is given in Appendix D.1, and only a very short discussion is provided here.
25. A text-only version of the BNC (based on Version 1.0 of the corpus) was used to generate frequency lists in WordSmith. This version of the corpus was created by a research team at the University of Paderborn and is mentioned in Schlüter (2005: 55). My thanks are due to Julia Schlüter for giving me access to this version.
26. For a different approach to this problem see Clark and Trousdale (2009: 38).
27. It was easier to use BNCweb in order to determine values of clc bnclogf, rather than the version of the BNC described above.
The situation of the variables (e) and (o) in SSE is somewhat special because, at least in certain words, present-day diphthongisation can result in forms that resemble earlier (diphthongal) Middle English (ME) forms – a process that has already taken place in SSBE (see 5.2 below). The rationale behind the inclusion of a word’s diphthongal ME origin as a possible predictor (labeled mediph) is that, potentially, those words that had diphthongs at an earlier historical stage may also be the first to diphthongise today. They may either have retained minute traces of the older pronunciation, or it may be the case that their historical spellings encourage a partial return to those pronunciations, at least in reading. Evidence showing that at least in dialects of English this may play a role is provided by Trudgill (2004: 170–171), Wells (1982: 337), Beal (2004b: 123–124) and Thomas (1994: 117–118). A detailed discussion of the relevant historical background can be found in 5.2. The quality of the preceding vowel may have an effect on the realisation of coda (r). For example, due to articulatory factors (e.g. frontness or backness), certain vowels may attract a following approximant or zero-variant. The respective vowel environments are coded as ini 1–ini 6. For words like fear, here, and near, the predictor ini 1 is relevant; the variable ini 2 is assigned to words like where, care, dare (disregarding the distinction between [er] and [ɛr]); ini 3 plays a role for words like car, mark, art; the predictor ini 4 is used for words like north, sport, sort (disregarding the distinction between [or] and [ɔr]); ini 5 is associated with words like sure, poor, tour; and ini 6 is the relevant predictor for words like bird, heard, nurse, corresponding to Wells’s (1982: 137) lexical set NURSE.
As McClure (1994: 83) points out, Scottish accents of English traditionally retain distinct pre-/r/ vowel qualities [ɪr], [ɛr], and [ʌr] in the three examples above, whereas other rhotic accents (like GA) have only [ɜr]. However, this merger is also a peculiarity of Edinburgh middle-class speech (cf. also Downes 1984: 126–127; Lawson, Scobbie & Stuart-Smith 2011: 265). The author’s own observation in Edinburgh was that especially younger speakers do indeed merge [ɪr]/[ɛr]/[ʌr] to a great extent, and in consequence it was decided to use ini 6 as a variable, keeping in mind that as a category it is bound to be somewhat fuzzy and unstable. The following vowel can play a role for realisations of onset (r). The respective five environments are coded as follo 1–follo 5, with a numbering that is strictly parallel to the one used for the variables ini 1 to ini 6 (see above). The following vowel environment is coded as follo 1 for words like really, tree, free, as well as brick, trick (collapsing [ri] and [rɪ]), but also for unstressed final vowels in very or story; the predictor follo 2 is relevant for words like friend, reptile, rest, but also train, rain, trail (collapsing [re] and [rɛ]); follo 3 plays a role for words like rough, travel, rather (combining [rʌ] with the SSE category [ra] that corresponds to [ræ] and [rɑ] in SSBE); follo 4 applies to words like rock, broad, road (combining [ro] with the SSE category [rɔ] that corresponds to [rɒ] and [rɔ] in SSBE); and, finally, follo 5 is associated with words like rule, true, cruel. Further, an immediately following speech pause may very well have a favouring or blocking effect on the vocalisation of coda (r), as may the containment of coda (r) in a consonant cluster (predictors prepausal and precons, respectively). And, once again in analogy, the realisation of onset (r) is possibly affected by its being the very first segment in a word (predictor wordini), or being preceded by a vowel (predictor intervoc). All such language-internal factors may create low-level variation that can serve as the point of inception for more far-reaching, socially motivated patterns of variation and change.
3.3 Towards a unified model
The approach taken in the analysis of variation in the present study rests on the long-standing, traditional and relatively uncontroversial view that linguistic variation and change is best investigated by looking at internal and external factors in combination (Weinreich, Labov & Herzog 1968: 188; Aitchison 2001: 162). Because the individual components have been discussed in detail above, only a relatively brief account will be given here. Phonetic variables are unstable even within an individual speaker for a number of reasons. Lexical frequency, (subconscious) traces of historical stages of the language, the opposing pulls of ease of articulation and clarity of speech, as well as the segmental or morphological environment all contribute to the many-to-one relationships between phonemes and their realisations. In addition, speakers simply do not exert sufficient control over the articulation process to produce exact replicas of the same sound (cf. Ohala 1989, 1993). This internal, low-level instability is only to some extent predictable from identifiable factors such as vowel environment or prepausal position – much of it will remain unexplained and qualifies as ‘noise’.
When speakers of SSE and SSBE meet in conversation, both parties will negotiate the appropriate linguistic distance through accommodation, a process that may to some extent become habitual. The specifics of this accommodative process, i.e. the extent to which speakers converge or diverge, will be partly predictable from social characteristics of the speaker. The variants that come to play a role in the accommodation process will be partly drawn from or informed by the pool of variants that exist speaker-internally. Thus, in the present study it is assumed that low-level (internal) variability enlarges speakers’ repertoires of variants and creates points of inception for lexical
diffusion or the catch-on phase of socially motivated variation and possibly change. It is therefore necessary to look at a range of internal and external factors in combination. Figure 3.2 sums up the contact situation between SSE and SSBE and the processes, both SSE-internal and inter-varietal, that are assumed to play a role.
[Figure: overlapping ellipses for SSE and SSBE; labels: sociolinguistic factors, internal factors + noise, intra-speaker variation, inter-speaker accommodation]
Figure 3.2 SSE-SSBE contact: Internal and external factors.
The empty ellipse for SSBE in Figure 3.2 is an intended simplification; for SSBE, internal patterns of variation as complex as those of SSE can be assumed. The model also indicates that there are accent features in both varieties that are similar enough to be described as shared features – hence the overlap of the ellipses.28 The main challenge posed by this integrated model relates to the observability and demonstrability of some of its components. Accommodation is introduced as a social inter-speaker mechanism, but it is not measured – nor indeed measurable – in the design of the present study. To some extent, therefore, some of the criticisms of Meyerhoff (1998: 208) would seem to apply, i.e. the charge that SAT is used as a post-hoc “hand-waving device” that plays too limited a role in the research design. Cameron (2009: 110) defines patterns of sociolinguistic variation as “essentially descriptive statements about the distribution of certain variables in the speech community” that do not in themselves explain anything. The interpretative approach taken in the present study is therefore only one of several possible ways to account for variation and change, albeit one that appears to be quite plausible for the context under investigation.
28. In fact, particularly as far as consonants are concerned, the overlap is rather more substantial than suggested by the figure (cf. Section 2.3.1).
chapter 4
Data and methodology
The four methodological aspects treated in this chapter are (i) the collection of the data (4.1), which includes the choice of speakers, interview materials, and the conduct of the interview itself; (ii) the analysis of acoustic vowel data (4.2), which comprises decisions concerning the choice of measuring points, the transformation of the data, and general issues in the treatment of vowel variables; (iii) the auditory analysis of the consonantal variable (r) (4.3); and (iv) the statistical analysis (4.4). Particularly concerning data collection, vowel analysis and statistical analysis, the researcher has to make rigorous choices from the possible methodological options that are available, and this chapter therefore partly serves as an illustration of the compromises that are necessary in sociophonetic research (cf. 1.4). 4.1 Data collection As Thomas (2011: 3) points out, there is a fundamental difference in sampling methodology between sociolinguistics and phonetics. If phonetic data are to be measured acoustically, they must be of a high quality, and this requirement places certain restrictions on the recording situation. The obvious solution for phoneticians is to resort to a controlled, quasi-laboratory approach, where subjects are interviewed in a one-to-one situation, ideally in a quiet room. One important aim of sociolinguistic work, on the other hand, is to overcome the Observer’s Paradox, i.e. to conduct interviews in situations where speakers do not feel the pressure of being observed (Feagin 2002: 20; cf. Labov 1972: 209). By its very nature, sociophonetic work will be a compromise between these two approaches (Foulkes, Scobbie & Watt 2010: 727). The data for the present study were collected in March 2008 at the University of Edinburgh and one of Edinburgh’s private schools, within the daily routine of the school and the university, respectively. For practical reasons, interviews were designed to last no more than 25 minutes each.
Only a few recording sessions exceeded this time limit. In the two settings of the school and the university, SSE predominates – that is, there is little, if any, drifting towards Scots usage. In both contexts, the avoidance of non-standard features was reinforced by
the presence of the interviewer as an outsider to whom speakers will to some extent have accommodated their linguistic behaviour (Di Paolo & Yaeger-Dror 2011: 10). Thought must also be given to the choice of materials, i.e. the way in which speech is elicited using either standardised (written) material, an interview format, or a combination of both (cf. Feagin 2002: 31). For example, written texts read by subjects are particularly useful for phonetic variables as they ensure that enough (balanced) material is produced by each speaker. Apart from the time constraint placed on the interview situation, it was partly these considerations that led to the inclusion of a word list and a reading passage. 4.1.1 Speakers and styles Recruiting speakers from two institutions clearly means that sampling at this level was not random. Rather, the procedure can be described as convenience sampling (Thomas 2011: 3; cf. Milroy & Gordon 2003: 26). Of course, limiting sampling to two institutions introduces a bias, and claims of representativeness across SSE as a whole are not possible on this basis (cf. Milroy & Gordon 2003: 24). The sampling of individual speakers was also non-random, as only individuals who were willing and had the time to participate in the study were recorded. As Feagin (2002: 29) points out, sample size plays a role in so far as it is generally advantageous if groups with shared social characteristics (so-called ‘cells’) are larger (cf. Milroy & Gordon 2003: 25). Typically, a number of five individuals per cell is recommended. As Table 4.1 shows, these requirements are met in the present study. However, while the numbers of younger vs. older as well as female vs. male speakers were controlled during data collection, it was not possible to obtain symmetrically filled cells using the three binary criteria age, gender, and contact.
Table 4.1 Distribution of speakers by age, gender, and contact (number of speakers by age and gender).

              younger female   younger male   older female   older male
contact = 0   6                4              4              8
contact = 1   2                1              1              1
In the regression-based analyses used in the present study (see 4.4), balanced cell sizes are not vital for the functioning of the model, but for some effects there will be a considerable reduction in (or lack of) analytic power. This is especially true for interaction effects associated with the variable contact. To identify speakers in the present study, an alphanumerical code is used, in which the speaker number is followed by indications of his or her age and gender
and an additional suffix if contact = 1 for this speaker. For practical reasons, speakers will also be referred to simply by the first number of the label. The interview was designed so as to generate speech in three text types, with the term text type used as a rough equivalent to genre, i.e. a text produced under particular conditions. They are (i) a reading passage, (ii) a careful speech component, and (iii) the reading of a word list. In contrast to text type, the term text is used to denote a specific text unit produced by a particular speaker in the statistical model (level-2 unit; see 4.4). Each interview was concluded by the completion of a standard questionnaire (reproduced in Appendix A.3). This was modelled on a similar format developed at the University of Bamberg for use in fieldwork, designed to document speaker characteristics like age, gender, regional provenance, and linguistic background. In the following, the components of the interview are discussed in the order in which they were elicited. This was (i) reading passage; (ii) careful speech part I (continuation of a story); (iii) careful speech part II (questions and answers, remembering detail); (iv) word list; (v) careful speech part III (discussion of personal details while filling in the questionnaire). Despite the suggestion of Di Paolo and Yaeger-Dror (2011: 16) to present structured tasks (e.g. reading passage, word list) after conversational components to make speakers less self-conscious and to avoid their guessing the variables of interest, a different strategy was followed in the present study. The reading passage was the first task subjects were given, because it was felt that the straightforward task of reading a short text was much more likely to relax speakers in a situation where they were interviewed by an outsider. Additionally, the content of the reading passage was needed for later stages of the experiment (see below).
Furthermore, it is fairly difficult to guess the targeted variables from the reading passage, as can be seen from Appendix A.1. The text for the reading passage was created by the author; a remote influence was the well-known myth connected to the Loch Ness Monster.29 In content, it tells part of the story of a fictitious west-coast fisherman named Hamish MacGregor. The name and description of the man as well as his environment were meant to evoke associations of a gruff, weather-beaten man, braving a life of poverty, hardship, and relative loneliness in a forbidding, peninsular west-of-Scotland setting (cf. Appendix A.1 for the full text). Apart from general descriptions, the text relates that Hamish is troubled by a mysterious pain in his foot, which is of unknown origin. Further, there are hints that the fisherman has for a long time been trying to find (or catch) some mysterious sea-creature, but the background to this part of the story is equally left in the dark. The first part of the text introduces Hamish MacGregor, his looks, his home, some of his habits, and the mysterious ailment that plagues him. In the second paragraph, the focus is on Hamish’s work, his boat, and the fact that for some unknown reason, and much to the astonishment of his retired fisherman friends, he refuses to give up his job. The story terminates on a cliffhanger.
29. I am grateful to my colleague Shane Walshe, who proof-read the final version and made valuable suggestions concerning some details.
Speakers were asked to continue the open-ended story, using a prompt by the interviewer. The amount of back-channelling during this sequel to the narrative was kept to a minimum, as advised by Feagin (2002: 31), but this depended on how well different speakers handled the task. Speakers continued up to a point where their story came to a natural end (often enough creating another cliffhanger), or appeared to run out of narrative steam. At this point, the second part of the careful speech component followed, where certain questions regarding the content of the reading passage were asked by the interviewer. Points of interest were Hamish’s ailment, the difficulties of a fisherman’s life, the area where the story was set, Hamish’s boat, the reasons why he was proud of it, etc. There was more back-channelling and prompting in this part. In cases where the first two parts of the careful speech component did not produce a sufficient amount of data, tokens from the very end of the interview were included as well, i.e. material recorded during the discussion and completion of the questionnaire (see Appendix A.3), after the word list had been read. Following the question-and-answer section of the careful speech component, the word list was read through once. The interviewer handed the single sheet with the list (see Appendix A.2) to the speaker, giving him or her a brief spoken summary of the instructions found at the top of the sheet.
In the widest sense, the careful speech component consists of all material that was not part of either reading passage or word list. In practice, however, a selection of three stretches of speech was made: (i) the continuation of the story, (ii) the question-and-answer part (see below), and to a much more limited extent also (iii) the discussion of speakers’ personal backgrounds while going through the questionnaire after the word list reading. For a sample transcript see Appendix B. 4.1.2 Types and tokens The structure of the word list is best seen by looking at Appendix A.2. The most important points in constructing the list were (i) the clear instruction of the speaker, (ii) readability, (iii) the clear presentation of the carrier phrase, (iv) a sensible sequence of words in the list, and (v) the interspersion of the list proper with ‘dummy’ items. The list started with the words house, rock, and fit, which were
only there for the reader to find his or her pace and adjust to the situation before proceeding to the target words. The carrier phrase ‘Now repeat ________ , please’ was constructed so as to leave the targeted word framed between two voiceless stop consonants to avoid more dramatic transitional effects. Words were printed in three columns of sixteen words each, and were further arranged in triplets constructed as [(e)-type] → [(o)-type] → [(r)-type or ‘dummy’]. What is here referred to as dummies is called fillers by Di Paolo and Yaeger-Dror (2011: 14–15). These are items inserted between targeted tokens to distract subjects from the research focus. Table 4.2 shows the words occurring in the word list. The variable (r) is generally less robustly represented, and linking (r) was not targeted at all.
Table 4.2 Types used in the word list.

(e): they, say, day, bay, bait, shade, paid, face, gaze, days, name, ache, eight, take, Hamish
(o): no, go, know, tow, boat, code, showed, nose, knows, home, oak, own, don’t, Oban
coda (r): hear, hair, car, sure, north, force, butter, word, bird, heard
onset (r): gross, Braes, across, Tobermory
dummies: calm, man, thought, mood, foot
For the variables (e) and (o), the words targeted in the reading passage are essentially the same as in the word list, minus the word they, excluded due to the difficulty of putting it into a syntactic position where it would not be a weak form. Table 4.3 lists the relevant items in the reading passage by variables, in each case not alphabetically but in the order of appearance in the text.
Table 4.3 Types used in the reading passage; marked (*) if variably attributed to coda (r) or linking (r).

(e): Hamish, name, face, shade, day, bay, gaze, days, ache, say, eight, paid, take, bait
(o): code, home, know, boat, oak, own, no, tow, Oban, go, knows, nose, don’t, showed
coda (r): there, fisherman, MacGregor, certainly, part, turned, hard, understand, hardly, ever, Sunart, *where, were, waters, were, waters, were, there, others, *or, fourteen, neighbour, her, mornings, [harbour, harbour], Tobermory, work, retailers, their, more, there, other, retired, fishermen, other, returned, their, your, never, there, *for, for, there, fearsome
linking (r): nature_and, *where_he, door_and, *or_he, for_all, stare_out, *for_him, creature_of
onset (r): MacGregor, proud, grey, from, from, trying, read, Braes, crooked, across, strange, pride, travel, grumbled, prices, friends, retired, returned, crease, reply, true, rest, creature, monstrous, gross
Table 4.3 indicates that in three cases types were variably classified as linking (r) or coda (r), depending on whether the words following the /r/ (he or him) began with an articulated /h/, creating no liaison site, or whether the /h/ was dropped and liaison enabled. These sites of the type /-r # h-/ were individually inspected, and this partly explains the variability of token numbers between coda (r) and linking (r) in the reading passage (see Appendix C.2). The total number of words produced in careful speech was 20,642 (on average 765 words per speaker). Despite its small size, this corpus captured a sufficient number of observations, but its usefulness for other than sociophonetic purposes is of course limited. The number of words per speaker was rather variable, between a minimum of 326 (speaker 11-57f-c) and a maximum of 1,976 (speaker 10-61m), although the latter really was the exception. For an account of token numbers see Appendix C. For both (e) and (o), the loss of data due to unmeasurability was not dramatic, so the semi-laboratory approach that was taken clearly paid off in this respect. Table 4.4 shows token numbers for all variables by style (for a more detailed list see Appendices C.1 and C.2).
Table 4.4 Token numbers for (e), (o) and (r) by style.

              word list   reading passage   careful speech   Σ
(e)           400         359               303              1062
(o)           338         348               299              985
coda (r)      294         1198              1073             2565
linking (r)   -           143               143              286
onset (r)     108         670               618              1396
For coda (r) and onset (r), the number of tokens in word-list style is rather low, because here the elicitation of a sufficient number of vowel tokens was given priority. For linking (r), only reading passage and careful speech are available, since linking (r) was not originally targeted as a variable. 4.1.3 Recording, processing and transcription Stereo recordings were made using a Zoom H2 Handy Recorder, placed on a tripod on the table, facing the speaker. The inbuilt stereo microphone with a directivity angle of 90 degrees was pointed at the speaker, usually with the interviewer sitting at the same table at a right angle to him or her. This setup resulted in the reduction of unwanted background noise and an interviewer who appears predominantly on one stereo channel in the recordings. The sampling rate was 44.1 kHz at a bit depth of 16 bit (stereo), which results in a bit rate of approximately 1.41 Mbit per second (cf. Gut 2013a). Recordings were
stored as wav-files on a flash memory card, which does not generate mechanical background noise. Sound editing consisted of the elimination of high-peak noise (e.g. rattling furniture, slamming of objects onto the table), a reduction of the amplitudes of other irrelevant peaks (e.g. laughter), and the general normalisation of amplitudes. The audio editor Audacity (Audacity Development Team 2010) was used throughout. After normalisation, soundfiles were exported as mp3-files from Audacity. These soundfiles became the input to both the acoustic and the auditory analyses (cf. 4.2 and 4.3). Only ‘first-generation’ mp3-files were used, i.e. multiple compression was not applied at any point. Cieri (2011: 33) generally advises against the use of compression in speech analysis. However, no measurable artefacts of data compression were found in the formant readings of a few tests that were conducted. Cieri is certainly right in saying that storage capacity should not be an argument in favour of data compression, but the processing speed of the computer can be, as using uncompressed soundfiles seriously slows down the acoustic analysis. 4.2 Analysing acoustic vowel data Acoustic vowel analysis in sociophonetics is usually based on the source-filter theory of speech (Fant 1970: 15; cf. also Fant 1973: 6; Clark, Yallop & Fletcher 2007: 233–234). A simplified source-filter structure consists of a single source – the vibrating vocal folds powered by pulmonic airflow – and a filter mechanism defined by variable positions especially of tongue and lips. The filter is responsible for differences in sound quality. Acoustically, it is defined by its resonances or formants, i.e. certain frequency areas that are enhanced, while others are dampened (Clark et al. 2007: 217, 244; cf. Gut 2013a). The resonance frequencies (F1, F2, F3, …) at a specific point in time are referred to as the formant pattern (Fant 1973: 5). 
There is little argument about the general importance of formant frequencies as acoustic properties of vowels (e.g. Borden, Harris & Raphael 1994: 184; Ladefoged 2003: 104). Not only are F1 and F2 generally given the most important role in descriptions, but they are also very often interpreted as acoustic correlates of the articulatory dimensions [close ↔ open] and [back ↔ front], respectively (e.g. Stevens 1997: 503; Ladefoged 2003: 105). However, according to Borden, Harris and Raphael (1994: 106–107), formant outputs have to be understood as “the acoustic response of the vocal tract as a whole to the components of the source”. That is, formant centre frequencies enable the analysis and classification of vowel outputs, but they do not reveal the specific articulatory settings that produced them. If, therefore, in the present study, the
variable (e) is classified as close-mid and front on the basis of its formants, this really refers to acoustic closeness and frontness, and a statement of this kind has to be taken with due caution. The software used for all acoustic vowel analyses in the present study is Praat (Boersma & Weenink 2010). 4.2.1 Making vowel measurements Formant analysis in Praat depends on Linear Predictive Coding. For consecutive time intervals, the computer calculates a specified number of formants in a specified spectral range (Ladefoged 2003: 115; Johnson 2003: 40), which provide the best possible fit to the power spectrum of the output (Ladefoged 2003: 120; Johnson 2003: 97). The formants at different points in time are displayed as formant tracks. For the Praat analysis in this study, the expected number of formants was 5, with a range of 5 kHz for men and 5.5 kHz for women. This is in accordance both with the Praat manual (Boersma & Weenink 2010) and Ladefoged (2003: 113, 125). The time-step strategy was set to ‘automatic’, and the window length for the formant analysis was set to 40 ms, resulting in time steps of 10 ms for the formant tracker. If spurious formants were detected in addition to or instead of the formants shown in the spectrogram, either the ‘maximum formant’ setting or the ‘number of formants’ setting was altered. Tokens were excluded if they were masked or if the formant pattern was not clear and prevented the location of extrema. A large number of monophthong vowels were measured for the present study; while not of interest as sociolinguistic variables, they were needed as normalisers, i.e. vowels measured for the purpose of centring different speakers’ vowel spaces (see 4.2.2). Adopting what Strange (1989: 2081) calls the “simple target” model of monophthong perception, the course of action suggested by Ladefoged (2003: 104; see also Thomas 2001: 12) was followed in the present study. 
The centre frequencies of F1 and F2 were averaged across a 30-millisecond interval at the centre of the vowel, ideally with both formants relatively steady. If there was no steady-state, the position of the 30 ms interval was guided by turning points in the formants, assuming that these were (at least articulatorily) maximally distant from the consonants responsible for the movement (see Di Paolo, Yaeger-Dror & Wassink 2011: 91–92). The mean formant pattern of F1 and F2 across the relevant interval was then treated as a static target. For diphthongs, Ladefoged (2003: 104–105) suggests measuring F1 and F2 at appropriate points near the beginning and end of the vowel, far enough away from its transitional boundaries. These points will typically be found where formants change direction, i.e. at peaks and troughs in the formant track (Thomas 2001: 12; Ladefoged 2003: 132–133). This procedure identifies and measures the points at which formants are likely to be least affected by transitions and roughly
corresponds to what Di Paolo, Yaeger-Dror and Wassink (2011: 91–92) call the “maximal displacement approach”. Based on an expected closing and fronting gesture of (e) and a closing and backing gesture of (o), the first measuring point (target one, hereafter T1) was set at the early maximum of F1, interpreting rising formant gestures up to that point as part of consonantal effects. The second measuring point (target two, hereafter T2) was set at the minimum of F1 towards the end of the vowel, or before destabilisation set in. Vowels were excluded if their formant patterns were unstable enough to seriously hamper the location of minima and maxima. Figure 4.1 shows how the words bait and no were measured in the word list reading of speaker 27-22m-c. The spectrograms are shown with the formant tracks faintly visible as dots. T1 and T2 are indicated, and the gestures of F1 and F2 between them are highlighted in white.
[Figure: spectrograms of bait (/b/ /e/ /t/) and no (/n/ /o/), each with the measuring points T1 and T2 marked]
Figure 4.1 Measuring points in the words bait and no.
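The placement of T1 and T2 can be illustrated with a minimal sketch. It assumes that a formant track is available as one F1 and one F2 value per analysis frame; the function name, the restriction of the T1 search to the first half of the vowel, and the example values are hypothetical and serve illustration only. It does not reproduce the inspection of spectrograms on which the measurements in the present study were based.

```python
def locate_targets(f1, f2):
    """Return (T1, T2) for one vowel token as dicts with frame index, F1 and F2.

    f1, f2: equal-length lists of formant values (Hz), one per analysis frame.
    """
    if len(f1) != len(f2) or len(f1) < 4:
        raise ValueError("need two equal-length tracks with at least 4 frames")
    half = len(f1) // 2
    # T1: early maximum of F1 (assumption: searched in the first half only)
    t1 = max(range(half), key=lambda i: f1[i])
    # T2: minimum of F1 in the part of the vowel that follows T1
    t2 = min(range(t1 + 1, len(f1)), key=lambda i: f1[i])
    return (
        {"frame": t1, "F1": f1[t1], "F2": f2[t1]},
        {"frame": t2, "F1": f1[t2], "F2": f2[t2]},
    )


# Invented track for a closing and fronting vowel such as (e), 10 ms frames
f1_track = [380, 430, 460, 450, 430, 400, 370, 350, 360]
f2_track = [1900, 1950, 2000, 2050, 2100, 2150, 2200, 2220, 2200]
t1, t2 = locate_targets(f1_track, f2_track)
print(t1)
print(t2)
```

In this invented token, the rise of F1 over the first frames is treated as a consonantal effect, and the final frame after the F1 minimum is ignored.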
The two targets are located some distance away from the edges of the vowel at those points where the relevant formant maxima and minima were identified. Note that the entire gesture is shown only for illustrative purposes; in the quantitative analysis, only values of F1 and F2 at both targets play a role. 4.2.2 Vowel transformation Differences in frequency are measured as linear but perceived logarithmically (e.g. Miller 1989: 2120). This means, for example, that, auditorily, the gross frequency difference of 50 Hz between 200 Hz and 250 Hz is not equivalent to that between 400 Hz and 450 Hz, because it is not the absolute difference that matters, but the ratio. This perceptual mechanism essentially also applies to formant centre frequencies (cf. Ladefoged 1967: 87; Johnson 2003: 51).
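The point about ratios can be made concrete with a short calculation. The sketch below expresses the two 50 Hz steps mentioned above as intervals in octaves; the function name is hypothetical.

```python
import math


def interval_in_octaves(f_low, f_high):
    """Size of the interval between two frequencies (Hz), in octaves."""
    return math.log2(f_high / f_low)


low_step = interval_in_octaves(200, 250)   # ratio 1.25, roughly 0.32 octaves
high_step = interval_in_octaves(400, 450)  # ratio 1.125, roughly 0.17 octaves
print(round(low_step, 3), round(high_step, 3))
```

On a logarithmic scale, the same 50 Hz difference is thus almost twice as large at 200 Hz as it is at 400 Hz.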
However, it has been argued that logarithmic treatment of frequency differences is not adequate under all circumstances (e.g. Zwicker 1961: 248), and in consequence the Critical Band Rate (CBR) and its unit, Bark (Bk), were devised (Zwicker & Terhardt 1980). CBR compensates for the fact that frequency intervals are judged logarithmically only up to about 500 Hz, whereas in higher areas of the spectrum more than a doubling of frequency is required, for example, for an interval to be perceived as an octave (Neppert 1999: 58–59). The Bark scale is often regarded as best suited to the representation of the frequency spectrum in speech. It is variously referred to as “a measure of auditory similarity” (Ladefoged 2003: 130), the “psychoperceptual equivalent” of Hertz (Watt, Fabricius & Kendall 2011: 111), or, more generally, as a “perceptual unit” (Watt & Fabricius 2002: 161). However, it can be argued that the Bark scale has to be understood as a kind of link-function, i.e. an intermediary between gross frequency and logarithmic transformation (cf. Harrington & Cassidy 1999: 18–19; Zwicker & Fastl 1999: 160). In the present study, measured formant frequencies (in Hz) are first transformed into critical band rate (in Bk), using Traunmüller’s (1990: 99) formula. In a second step, a conversion of Bark into octaves relative to 1 Bk is applied. For this quantity the unit ωBk (‘omega-Bk’) is proposed. It is an expression of how often 1 Bk needs to be doubled in order to arrive at the respective critical band rate of a measurement. Unlike Bark (and certainly unlike Hz), the resulting scale can with some confidence be treated as linear, i.e. same absolute differences in ωBk between two values will have the same auditory significance, irrespective of the underlying values in Hz. Naturally, both conversions can also be applied in a single step that directly uses f (Hz) as the input quantity. 
This is shown in Equation 4.1, where the bracketed part of the numerator is Traunmüller’s (1990: 99) formula.30
Equation 4.1 Conversion of Hz into ωBk.
\[ \omega\mathrm{Bk} = \frac{\log\left(\dfrac{26.81}{1960/f + 1} - 0.53\right)}{\log(2)} \]
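A minimal sketch of the two-step conversion in Equation 4.1 may be helpful. The function names are hypothetical, and the guard against non-positive Bark values is an added assumption: the formula yields values at or below zero for frequencies under roughly 40 Hz, where the logarithm is undefined.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate (Bk) after Traunmueller's (1990) formula."""
    return 26.81 / (1960.0 / f_hz + 1.0) - 0.53


def hz_to_omega_bk(f_hz):
    """Octaves relative to 1 Bk, i.e. the base-2 logarithm of the Bark value."""
    bark = hz_to_bark(f_hz)
    if bark <= 0:
        # Assumption: such low frequencies do not occur as formant values
        raise ValueError("frequency too low for a logarithmic Bark transform")
    return math.log(bark) / math.log(2)


for f in (500, 1960):
    print(f, round(hz_to_bark(f), 3), round(hz_to_omega_bk(f), 3))
```

For an input of 1960 Hz, for instance, the fraction reduces to 26.81/2, giving 12.875 Bk, which corresponds to roughly 3.69 ωBk.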
Psychoacoustic transformation of the kind described above would be sufficient if the research were interested in absolute differences between isolated acoustic events. However, as Hindle (1978: 162) points out,
    [v]owels spoken by different individuals that sound different may have the same formant measurements, and conversely, vowels of different speakers that sound the same may have different formant values.
30. The figures in the equation have no intrinsic meaning. They simply provide the closest approximation of Hz-values to the experimentally established centre frequencies of critical bands.
This means that the sociolinguistic as well as the linguistic meaning of vowels largely depends on “the relative formant structure of vowels” (Ladefoged & Broadbent 1957: 103), i.e. the qualities of other vowels uttered by the same speaker (cf. also Fant 1973: 29). As a consequence, in addition to psychoacoustic transformation (see 4.2.2), the measured formant frequencies of a speaker need to be normalised to minimise physiological differences between speakers while preserving phonemic and sociolinguistic information (Hindle 1978: 167; Adank, Smits & van Hout 2004). The method adopted in the present study is very similar to Nearey’s (1978: 138) so-called single logmean procedure. The slight change that is made to Nearey’s original approach is of a psychoacoustic nature. The psychoacoustic unit that is used is ωBk (see above) instead of lnHz, but otherwise the process of vowel space centring is the same. Thus, all formant values used in the present study are the result of passing the original measurements first through Equation 4.1 and then centring them using Equation 4.2. Equation 4.2 Nearey’s (1978) single logmean using ωBk instead of lnHz.
\[ F_{n,ij}^{\mathrm{norm}}(\omega\mathrm{Bk}) = F_{n,ij}(\omega\mathrm{Bk}) - \bar{F}_{n,j}(\omega\mathrm{Bk}) \]
That is, a specific measurement i of formant n produced by speaker j is normalised by subtracting that speaker’s mean value of that formant from the unnormalised value. Prior to this, the gross measurement is converted into the psychoacoustic unit ωBk, as indicated in parentheses. In this approach, psychoacoustic values are retained in the centred vowel spaces. Physiological or idiosyncratic differences will be greatly reduced or eliminated through the two-stage transform, but it remains possible to compare qualitative distances in absolute (psychoacoustic) terms. As Disner (1980) points out, the choice of vowels for the determination of the midpoint partly depends on the language variety at hand. For the present study, a selection of vowels was made that is at the same time large and relatively well-balanced in SSE, concerning the front-back and open-close dimensions, respectively (cf. Figure 2.2). This set, which will hereafter be referred to as the normalisers, consists of the following vowels: /i, u, ɪ, e, a, ʌ, ɔ, o/. To control for phonetic environment, only material from the reading passage and word list was used for normalisation. For /e/ and /o/, the midpoint between the two measured targets was used.31
31. On average, the input to the normalisation process consisted of 12 tokens per speaker for each of the vowels /i, u, ɪ, a, ʌ, ɔ/ and 27 tokens per speaker for the two vowels /e/ and /o/ in combination.
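The centring step of Equation 4.2 can be sketched as follows. The sketch assumes that all values have already been converted into ωBk and that the speaker mean is a simple average over whatever normaliser values are passed in; how tokens of the individual normaliser vowels were weighted in the present study is not specified here. Function name and values are invented.

```python
from statistics import mean


def normalise_formant(tokens, normalisers):
    """Centre one speaker's values of one formant (in omega-Bk).

    tokens:      values to be normalised
    normalisers: the same speaker's values of the same formant in the
                 normaliser vowels, used to estimate the speaker mean
    """
    speaker_mean = mean(normalisers)
    return [value - speaker_mean for value in tokens]


# Two hypothetical speakers whose vowel spaces differ by a constant offset
speaker_a = normalise_formant([2.8], normalisers=[2.0, 2.5, 3.0])
speaker_b = normalise_formant([3.0], normalisers=[2.2, 2.7, 3.2])
print(speaker_a, speaker_b)
```

Although the raw values of the two tokens differ, both end up at approximately 0.3 ωBk above the respective speaker mean after centring.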
4.2.3 Acoustic vowels as variables In the present study, the variables (e) and (o) are treated as complex in the sense that, as variables, they are defined not by a single measurable parameter, but four. This is the case because the two targets that are measured – T1 (onset) and T2 (offset) – can vary in the two dimensions of F1 and F2. Accordingly, for each token four measurements are made, and a separate statistical model is fitted to each of the four. Strictly speaking, both vowel variables break down into four subvariables which are technically treated as unrelated. However, the model output will show that very often several of the four parameters will covary in association with a certain independent variable. The advantage of inspecting T1 and T2 separately for the two formants is a maximal retention of detail and a better insight into the precise nature of acoustic variation. The approach outlined in the previous paragraph is chosen explicitly as an alternative to the use of Euclidean (qualitative) distance as a measure of diphthongisation. The use of Pythagoras’s theorem to calculate the Euclidean distance between vowels in order to gauge their qualitative resemblance has gained some currency in sociophonetics (cf. Di Paolo, Yaeger-Dror & Wassink 2011: 101–102; Harrington & Cassidy 1999: 241–243). Likewise, it is possible to express the degree of diphthongisation in terms of the qualitative distance between two points of a qualitative trajectory (e.g. Haddican et al. 2011). Equation 4.3 shows how the length of a diphthong trajectory D is calculated, where T1F1 is the value of the first formant at the starting point and T2F1 is its value at the end point of the relevant interval. T1F2 and T2F2 are the respective values of the second formant. Equation 4.3 Calculation of Euclidean distance in acoustic vowel spaces.
$$D = \sqrt{(T2_{F1} - T1_{F1})^{2} + (T2_{F2} - T1_{F2})^{2}}$$
The clear advantage of using Euclidean distance as the dependent variable is that it constitutes a single parameter. If quantities like D are interpreted as acoustic correlates of diphthongisation, results can be very easily discussed in dichotomies like ‘larger’ vs. ‘smaller’, or ‘more diphthongal’ vs. ‘less diphthongal’. However, one of the problems resulting from this kind of representation is that the often curved or even meandering nature of formant movements is entirely lost. Additionally, and perhaps more critically, Euclidean distance makes a statement only about lengths and distances, but does not take into account the positions of vowels in acoustic space. A very likely pitfall will be to treat items as equal on the basis of trajectory lengths and to ignore that they are very different in other respects, e.g. in terms of general frontness or raising. The present study will therefore make use of Euclidean distance only sparingly, as an additional measure.
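The calculation in Equation 4.3, and the pitfall just described, can be illustrated with a short sketch. The function name and the formant values are hypothetical; the values are chosen only so that two trajectories in different regions of the vowel space come out with the same length.

```python
import math

def trajectory_length(t1_f1, t1_f2, t2_f1, t2_f2):
    """Euclidean distance D between onset (T1) and offset (T2) in the F1/F2 plane."""
    return math.hypot(t2_f1 - t1_f1, t2_f2 - t1_f2)

# Hypothetical normalised formant values (not taken from the study).
# The first trajectory is raised and fronted, the second raised and backed.
front_glide = trajectory_length(t1_f1=0.0, t1_f2=0.0, t2_f1=-3.0, t2_f2=4.0)
back_glide = trajectory_length(t1_f1=2.0, t1_f2=-6.0, t2_f1=-1.0, t2_f2=-10.0)

# Both trajectories have length 5.0, although they differ in position and direction.
print(front_glide, back_glide)
```

On D alone the two tokens are indistinguishable, which is why the four measurements T1F1, T1F2, T2F1 and T2F2 retain more detail.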
4.3 Auditory analyses of (r)

The nature of (r) as a variable (see 7.1) suggests an auditory analysis (cf. Milroy & Gordon 2003: 144–145), sorting variants into categories rather than measuring acoustic properties. Particularly because the differences between taps and approximants are abrupt rather than gradual with respect to acoustic parameters, an instrumental approach would be problematic. As explained in Section 1.3, (r) is more adequately treated not as a single variable but as three subvariables, coda (r), linking (r), and onset (r), as shown in Figure 4.2.

(r)
  coda (r): Ø, [ɹ], [ɾ]
  linking (r): Ø, [ɹ], [ɾ]
  onset (r): [ɹ], [ɾ]

Figure 4.2 Variants of the subvariables coda (r), linking (r) and onset (r).
Realisations of coda (r) and linking (r) are categorised as one of the three variants Ø (the zero-variant), [ɹ] and [ɾ]. The latter two could be called ‘broad categories’ as they are abstractions from the underlying more diverse range of actual variants. For example, the category [ɾ] also includes the trill [r] and makes no distinction between flaps and taps, and [ɹ] is a category composite of [ɹ], [ɻ], their fricative variants, and weakly rhotic forms, all of which are not considered distinct variants of (r). For onset (r), the zero-variant is not an option. Thus, only the variants [ɾ] and [ɹ] are taken into consideration for this subvariable. Individual speakers’ recordings of the reading passage, the word list and the careful speech component were annotated in Praat, creating text grids on which tokens containing instances of (r) were labeled. No randomisation of items was applied. However, in order to avoid priming effects, i.e. the adjustment to the speakers first listened to, the auditory analysis was repeated after a break of several weeks. At the second stage, the enhanced familiarity with the overall range of variability enabled the correction of doubtful cases, particularly those that were partially vocalised and therefore only truly analysable in the knowledge of the full dataset. Where necessary, individual tokens were inspected more than twice. It was also helpful in many instances to include the wider (phrasal or clausal) context of tokens in the listening exercise, particularly in rapid speech.
4.4 Multilevel modelling

A multilevel data structure results from what Hox (2010: 4) calls “a multistage sample”, where a number of units is sampled from a higher level, and within each unit a number of lower-level units are sampled. The statistical tools used to analyse nested data are the hierarchical linear model (HLM) – a multilevel version of multiple linear regression (Snijders & Bosker 1999: 2) – and the hierarchical generalised linear model (HGLM) – a multilevel version of multiple logistic regression. Usually, individuals (in the sense of ‘persons’) are nested within groups, but hierarchical structures can also consist of individual observations clustered within persons (Hox 2010: 1). The latter is the case in the present study; a schematic representation of the structure of the dataset is shown in Figure 4.3. While the number of level-3 and level-2 units is invariable, as there are 27 speakers, each of whom produces speech in three text units, the number of observations at level-1 varies between speakers. This is because not all speakers were equally productive in careful speech, or because tokens were missing or unmeasurable. The particular appeal of applying a three-level statistical model to the data in the present study is that the traditional notions of language-internal, stylistic and social variation are mapped onto three levels in an orderly fashion.

[Figure: tree diagram. Level 3: Speaker 1, Speaker 2, …, Speaker 27. Level 2 (shown for Speaker 2): Reading passage 2, Careful speech 2, Word list 2. Level 1 (shown for Word list 2): Token WL 2.1, Token WL 2.2, …, Token WL 2.n.]

Figure 4.3 The 3-level structure of the dataset in the present study.
The two most important advantages of hierarchical models are that missing data (i.e. unequal numbers of observations within groups) are unproblematic (Snijders & Bosker 1999: 52), and that the assumption of independent observations is relaxed, because dependencies in the data become part of the model structure (Hox 2010: 6). It could be said that the very purpose of multilevel models is to address the problem of dependent observations in datasets, if we equate ‘dependent’ with ‘nested’, ‘grouped’, or ‘clustered’. For a discussion of the potential consequences of applying an ordinary least-squares analysis to nested data see Hox (2010: 3), Snijders and Bosker (1999: 15–16) and Luke (2004: 6–7).32 The software used for all analyses in the present study is HLM7, which is a specialised multilevel analysis program (Raudenbush et al. 2011).

32. See Schützler (2011b) for a methodological study in which a single dataset is analysed both with ordinary multiple logistic regression and hierarchical multiple logistic regression. For a discussion of model assumptions see Raudenbush and Bryk (2002: 253, 291), Hox (2010: 23–24), Agresti and Finlay (2009: 448).

4.4.1 The hierarchical (generalised) linear model

In essence, HLM produces models that are multilevel versions of single-level multiple regression (Hox 2010: 8). This relatedness can be seen in the individual components of the schematic model shown in Equation 4.4. In itself, each rung of the hierarchy – or “system of equations” (Luke 2004: 10) – is a plain regression, potentially extendable to a multiple regression. As highlighted in the simplified system on the right-hand side of Equation 4.4, lower-level intercepts depend on (and can be predicted from) higher level information. That is, separate regression analyses are conducted on each level and are then substituted for the respective terms on lower levels. As indicated by the arrows, only the intercepts are assumed to vary across higher-level units, but it is also possible to model variable slopes. For full discussions of the structure of multilevel models, the reader is referred to Raudenbush and Bryk (2002), Luke (2004), or Hox (2010).

Equation 4.4 Schematic representation of a random-intercepts HLM (based on Luke 2004: 10).

[Equation: two panels, Formal and Simplified, each with one regression equation per level (Level 3, Level 2, Level 1).]
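For orientation, a three-level random-intercepts model of the kind described in this section can be written out in notation commonly used for hierarchical linear models. This is a sketch under assumptions, not a reproduction of Equation 4.4: the symbols π, β and γ, the indices i (token), j (text unit) and k (speaker), and the use of a single predictor per level (a, X, W) are illustrative choices. Only the error terms e, r and u are named in the text.

```latex
\begin{aligned}
\text{Level 1 (token):}\quad   & Y_{ijk} = \pi_{0jk} + \pi_{1}\,a_{ijk} + e_{ijk} \\
\text{Level 2 (text):}\quad    & \pi_{0jk} = \beta_{00k} + \beta_{01}\,X_{jk} + r_{0jk} \\
\text{Level 3 (speaker):}\quad & \beta_{00k} = \gamma_{000} + \gamma_{001}\,W_{k} + u_{00k} \\
\text{Combined:}\quad          & Y_{ijk} = \gamma_{000} + \gamma_{001}\,W_{k} + \beta_{01}\,X_{jk} + \pi_{1}\,a_{ijk} + u_{00k} + r_{0jk} + e_{ijk}
\end{aligned}
```

Substituting the level-3 equation into level 2, and the result into level 1, gives the combined form. In this sketch only the intercept carries the random terms r and u, while the slopes are fixed across texts and speakers.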
It is intuitively plausible that variability at the lowest level – for example the probability of occurrence of a certain variant – not only depends on factors associated with the individual token, e.g. stress or phonetic context, but also on factors that belong to a higher level, e.g. the formality of the text type (level 2) or the age of the speaker (level 3). The advantage of hierarchical models is that observations are not simply coded for characteristics measured at all levels, but analyses are conducted level by level and then substituted. There are two important consequences of this approach. First, speaker-dependent estimates (e.g. the assessment of the effects of age or gender) are actually based on the number of observed speakers. As the number of speakers is invariably lower than the number of tokens, this will make assessments of the p-levels of social effects more conservative and therefore realistic. Second, estimation errors are assumed to exist at each level, indicated by the terms e, r and u in Equation 4.4. This makes
it possible to gauge the proportions of unmodelled variability between speakers, between texts, and between tokens. The Hierarchical Generalised Linear Model (HGLM) is appropriate if the outcome variable is not continuous but takes categorical – e.g. binary – values (Raudenbush & Bryk 2002: 291). If a fully linear model is applied to binary data, nonsensical outcomes of p > 1 or p […]

that ’s uh }--} […] b {--{ ach } well it was co 590 it was *?* it was weekends I was in Dumfriesshire 600 and it was a year after my parents died *?* 610 was down 05:01:39 c *?* I was still at school here but 620 down there { during the *?* }-} 05:03:30 69 {-{ so it ’s something like your } home was down there and { your }-} 05:09:74 70 {-{ it was } both because my 630 mother died I was still s uh at school here 640 but I was *?* staying on { a farm }-} 05:10:53 71 {-{ oh when you were still } going to { school *?* }-} 05:14:44 72 {-{ yes I 650 was } still at { school so I went }-} to school here 660 but in Gordon’s { uh }--}
04:50:57 04:57:15 04:57:15 05:00:43 05:03:12 05:09:10 05:10:15
05:11:05 05:11:94 73 a {-{ oh I see *?* } 05:14:28 05:14:44 b {--{ yeah } 05:14:44 05:18:75 74 but I was uh at uh 670 s at the weekends and holidays I was on the 680 farm in Dumfriesshire 05:18:75 05:19:78 75 ok […] 12:19:84 12:28:11 218 a I sort of changed 1710 social class because my father as a *?* p musician 1720 and then he worked in the post office and did 1730 part time music work 12:28:11 12:32:07 b uh we stayed in quite a 1740 humble house in uh the north side of Edinburgh 12:32:07 12:34:99 c two 1750 rooms the room and the kitchen that was it it 1760 ’s { uh }-} 12:34:73 12:36:68 219 {-{ and even } your mother being a teacher you { *?* }-} 12:36:23 12:41:27 220 {-{ yeah but } she did n’t work because uh 1770 women married women did n’t teach in those days 12:41:27 12:42:19 221 *?* { uhum }-} 12:41:73 12:48:40 222 {-{ but 1780 when } she was a widow she came back into teaching 1790 so I mo I shifted up class and { then came 1800 to }-} Gordon’s so { it was different }--} 12:46:23 12:46:67 223 a {-{ uhum } 12:47:43 12:51:93 b {--{ she she had } gone to university but not like not have been heavily academically trained 12:51:93 12:53:41 224 well { she ’d she 1810 ’d }-} 12:52:22 12:53:85 225 {-{ or yeah } { *?* }-} 12:53:41 13:00:41 226 {-{ but no } she taught until she married but women 1820 had to { resign }-} when they married because you did n’t 1830 have married { teachers women }--} 12:56:63 12:57:34 227 a {-{ *?* } 12:59:71 13:03:71 b {--{ even though she } was in a way the better { qualified of the two }-} 13:02:01 13:09:07 228 {-{yes that ’s correct that ’s 1840 correct } but she went she { stayed at }-} home looked after 1850 us that was the norm in uh that st at 1860 that period […]
_________________________________________________________________
Appendix C Token numbers

C.1 Token numbers of (e) and (o)

Numbers before a slash refer to (e); numbers after a slash refer to (o).

Speaker   N: WL      TX        CS        Σ
1         15/14      14/13     13/11     42/38
2         15/14      14/14     12/10     41/38
3         15/13      14/14     10/8      39/35
4         15/14      14/14     7/11      36/39
5         15/14      14/13     9/14      38/41
6         15/14      14/13     13/14     42/41
7         15/14      13/13     14/14     42/41
8         15/14      14/14     14/12     43/40
9         15/13      14/13     11/14     40/40
10        15/13      13/13     14/14     42/40
11        15/14      14/12     10/13     39/39
12        15/14      13/14     14/12     42/40
13        15/13      14/14     13/12     42/39
14        15/14      12/12     8/13      35/39
15        14/12      13/13     14/14     41/39
16        15/6       13/10     10/10     38/26
17        15/14      14/14     13/14     42/42
18        15/14      14/14     14/10     43/38
19        15/10      14/14     8/9       37/33
20        13/12      12/14     4/8       29/34
21        15/10      13/8      13/8      41/26
22        15/12      14/13     11/9      40/34
23        13/10      13/11     11/3      37/24
24        15/8       11/12     10/6      36/26
25        15/11      14/13     13/10     42/34
26        15/13      9/12      6/14      30/39
27        15/14      14/14     14/12     43/40
Σ         400/338    359/348   303/299   1062/985
C.2 Token numbers of (r)

          N coda (r)                 N linking (r)      N onset (r)
Speaker   WL    TX     CS     Σ      TX    CS    Σ      WL    TX    CS    Σ
1         11    45     42     98     5     7     12     4     25    22    51
2         10    45     41     96     5     6     11     4     25    23    52
3         11    45     41     97     5     5     10     4     25    21    50
4         11    45     16     72     5     6     11     4     25    21    50
5         11    45     45     101    5     4     9      4     25    25    54
6         10    45     39     94     5     7     12     4     25    24    53
7         11    45     44     100    5     5     10     4     25    24    53
8         11    44     38     93     6     9     15     4     25    25    54
9         11    45     22     78     4     4     8      4     24    25    53
10        11    45     36     92     5     13    18     4     25    25    54
11        11    44     36     91     5     6     11     4     25    22    51
12        11    45     44     100    5     3     8      4     23    24    51
13        11    44     42     97     6     7     13     4     25    25    54
14        11    44     41     96     5     5     10     4     25    24    53
15        11    42     45     98     8     5     13     4     25    24    53
16        11    44     42     97     5     5     10     4     25    20    49
17        11    42     47     100    6     3     9      4     25    20    49
18        11    45     41     97     5     8     13     4     25    21    50
19        10    44     44     98     6     2     8      4     25    21    50
20        11    45     32     88     5     7     12     4     24    20    48
21        11    45     41     97     5     6     11     4     25    24    53
22        11    44     46     101    6     2     8      4     24    23    51
23        11    45     42     98     5     6     11     4     25    23    52
24        11    45     36     92     5     0     5      4     25    23    52
25        11    45     42     98     5     4     9      4     25    22    51
26        11    42     44     97     5     3     8      4     25    23    52
27        11    44     44     99     6     5     11     4     25    24    53
Σ         294   1198   1073   2565   143   143   286    108   670   618   1396
Appendix D Independent variables

D.1 Technical definitions
onset (r)
Stylistic (level 2)
wordlist: wordlist = 1 for speech produced in word-list style
text: text = 1 for speech produced in the reading passage
read: read = wordlist + text; i.e. read = 1 for speech produced in the wordlist or the reading passage
linking (r)
Social (level 3)
age: age = 1 if a speaker is between 17 and 22 years old (N = 13); age = 0 if a speaker is between 40 and 62 years old (N = 14)
gender: gender = 1 if a speaker is female (N = 13); gender = 0 if a speaker is male (N = 14)
contact: contact = 1 if speakers have been (or are) exposed to SSBE to a more than ordinary extent (N = 5)
coda (r)
Definition
(e) Variables
(o)
Relevant for
0, 1
✓
✓
✓
✓
✓
0, 1
✓
✓
✓
✓
✓
0, 1
✓
✓
✓
✓
✓
0, 1
✓
✓
✓
0, 1
✓
✓
✓
✓
✓
0, 1
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
Scale
Internal (level 1)
bnclogf: The logarithm (base: 10) of a lexeme’s frequency (per 10 million words) in the spoken part of the British National Corpus (BNC), centred around the value 3:
$$\text{BNCLOGF (centred)} = \log_{10}\left(\frac{F}{N} \times 10^{7}\right) - 3$$
Scale: −2.94 … 1.99
✓
clc bnclogf: Quantified like bnclogf, but centred around the value 2; based on the spoken BNC frequency of an entire collocation whose two parts are potentially linked by /r/
follo 1: follo 1 = 1 for words like really, tree, free, but also unstressed very or story, as well as brick, trick (collapsing [ri] and [rɪ])
follo 2: follo 2 = 1 for words like friend, reptile, rest, but also train, rain, trail (collapsing [re] and [rɛ])
follo 3: follo 3 = 1 for words like rough, travel, rather (combining [rʌ] with the SSE category [ra] that corresponds to [ræ] and [rɑ] in SSBE)
follo 4: follo 4 = 1 for words like rock, broad, road (combining [ro] with the SSE category [rɔ] that corresponds to [rɒ] and [rɔ] in SSBE)
follo 5: follo 5 = 1 for words like rule, true, cruel
ini 1: ini 1 = 1 for words like fear, here, near
ini 2: ini 2 = 1 for words like where, care, dare (disregarding the distinction between [er] and [ɛr])
ini 3: ini 3 = 1 for words like car, mark, art
ini 4: ini 4 = 1 for words like north, sport, sort (disregarding the distinction between [or] and [ɔr])
ini 5: ini 5 = 1 for words like sure, poor, tour
ini 6: ini 6 = 1 for words like bird, heard, nurse, corresponding to Wells’s (1982: 137) lexical set nurse
intervoc: Expresses whether onset (r) occurs in intervocalic position. intervoc = 1 if onset (r) is preceded by a vowel, e.g. in -mory (forming part of the word Tobermory), terrible or – across a morpheme-boundary – in to read. Cases where intervocalic /r/ is etymologically part of a syllable coda were analysed in the context of linking (r).
−2.02 …1.75
onset (r)
Scale
linking (r)
Definition
coda (r)
(e) Variables
(o)
Relevant for
✓
0, 1
✓
0, 1
✓
0, 1
✓
0, 1
✓
0, 1
✓
0, 1 0, 1
✓ ✓
0, 1 0, 1
✓ ✓
0, 1 0, 1
✓ ✓
0, 1
✓
mediph: Coded as mediph = 1 for words that were historically (ME) diphthongs (spellings: , ; , , , )
nxbnclogf: Quantified and centred like bnclogf, but refers only to the second of two words potentially linked by /r/. Thus, in better off, only the frequency of the word off is quantified.
nxstress: Measured exactly like stress (see below), but refers to the syllable following the potential liaison site.
postpausal: Marks the difference between onset (r) immediately following a pause (postpausal = 1) as opposed to onset (r) in mid-sentence.
precons: precons = 1 in words like art with coda (r) contained in a consonant cluster, and precons = 0 in far with word-final coda (r).
prepausal: prepausal = 1 marks cases where the syllable in question immediately precedes a pause, irrespective of whether the segment of interest is in syllable-final position or not. E.g. Where did he go? vs. Where did he go next? or He came by car vs. He took the car to the garage.
speechrate: Measures the rate of speech based on the immediate environment of the target word. In a connected stretch of speech containing the target word and excluding pauses, the number of syllables and the duration in seconds are measured. speechrate is quantified as a logarithmic scale (base 2) centred on the value of 2, as follows:
$$\text{speechrate (centred)} = \frac{\log(\text{syll./s})}{\log(2)} - 2$$
0, 1
−2.94 …2.42
✓
−1.5, −.5, .5, 1.5
✓
0, 1
onset (r)
✓
linking (r)
✓
Scale
coda (r)
Definition
(e) Variables
(o)
Relevant for
✓
0, 1
✓
0, 1
✓
✓
✓
−2.00 …1.47
✓
✓
✓
✓
✓
wordini
coda (r)
linking (r)
onset (r)
stress
Definition
(e) Variables
(o)
Relevant for
for (r): ✓ −1.5, −.5, .5, 1.5
✓
✓
✓
✓
Scale
stress: Centred variable with four levels for (r) and three levels for (e) and (o). The approach was guided by the general idea of metrical grids, as found in Hayes (1983: 366–371). Lowest values of stress occur in function words containing /r/, while high values mark syllables that are maximally strong, as is the case in focus stress position. Coding for stress was done by looking at the context of the relevant syllable and giving it a weight relative to the rest of the phrase/sentence (cf. discussion in Gut 2013b).
wordini: If /r/ is itself in word-initial position, as in red or run, then wordini = 1. If, on the other hand, it is part of a word-initial consonant cluster, as in try or broad, then wordini = 0.
for (e) and (o): −1, 0, 1
0, 1
✓
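The two centred logarithmic scales defined in this appendix, bnclogf and speechrate, can be sketched in a few lines. This is an illustration under assumptions rather than the procedure actually used: it assumes that F is a raw corpus count and N the corpus size in words, and the function names and input values are hypothetical.

```python
import math

def bnclogf_centred(count, corpus_size, centre=3.0):
    """log10 of frequency per 10 million words, centred around `centre`.

    Assumption: `count` is a raw corpus count (F) and `corpus_size`
    the number of words in the corpus (N).
    """
    per_10_million = count * 1e7 / corpus_size
    return math.log10(per_10_million) - centre

def speechrate_centred(syllables, seconds, centre=2.0):
    """log2 of syllables per second, centred around `centre`."""
    return math.log2(syllables / seconds) - centre

# Hypothetical inputs: a lexeme found 1,000 times in a 10-million-word corpus
# lies at the centre of the frequency scale, and a stretch of 8 syllables in
# 2 seconds (4 syll./s) lies at the centre of the speech-rate scale.
print(bnclogf_centred(1000, 10_000_000))
print(speechrate_centred(8, 2.0))
```

Because the speech-rate scale uses base 2, each step of 1 corresponds to a doubling of the rate: 8 syll./s gives a value of 1, and 2 syll./s gives −1.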
D.2 Normal values

Normal values of independent variables (IV) may vary between different dependent variables (DV). The standard deviation (σ) is given only for the three predictors stress, speechrate and bnclogf, and only in cases where high and low values (of +1σ and –1σ) are of relevance in the main analysis.

IV                  DV            Normal value   σ       Comment
age                 all           .5             -       assumes balanced age groups
gender              all           .5             -       assumes balanced gender groups
age*gender          all           .25            -       assumes balanced gender and age groups
contact             all           .185           -       assumes that proportion of 5/27 is ‘normal’ for contact
wordlist            all           0              -       assumes that ‘word list’ is not a normal style
text                all           0              -       assumes that ‘reading passage’ is not a normal style
read                all           0              -       assumes that ‘read speech’ is not a normal style
bnclogf             (e)           .009           1.122   mean value and standard deviation in careful speech
bnclogf             (o)           .362           1.131   mean value and standard deviation in careful speech
bnclogf             coda (r)      .194           1.067   mean value and standard deviation in careful speech
bnclogf             linking (r)   .454           1.180   mean value and standard deviation in careful speech
bnclogf             onset (r)     −.107          1.089   mean value and standard deviation in careful speech
ini 3               coda (r)      .074           -       mean value in careful speech
ini 6               coda (r)      .146           -       mean value in careful speech
intervoc            onset (r)     .311           -       mean value in careful speech
mediph              (e)           .449           -       mean value in careful speech
mediph              (o)           .140           -       mean value in careful speech
prepausal           (e)           .234           -       mean value in careful speech
prepausal           (o)           .268           -       mean value in careful speech
prepausal           coda (r)      .263           -       mean value in careful speech
prepausal*precons   coda (r)      .060           -       mean value in careful speech
speechrate          (e)           .171           .456    mean value and standard deviation in careful speech
speechrate          (o)           .138           .415    mean value and standard deviation in careful speech
speechrate          coda (r)      .254           .411    mean value and standard deviation in careful speech
stress              (e)           .073           .729    mean value and standard deviation in careful speech
stress              (o)           .084           .762    mean value and standard deviation in careful speech
stress              coda (r)      −.124          .907    mean value and standard deviation in careful speech
Index
A Abercrombie, David 2, 6, 23–24, 26–29, 72–73, 149 accommodation 34–36, 38–40 affected acrolect 144–145 Aitken, Adam J. 1, 8, 17–20, 24, 26, 30–31, 138, 146 Akaike’s Information Criterion (AIC) 64 alternative acrolect 145 anglicisation 5–6, 19, 24–26, 31, 97, 107, 135, 139–144, 146–147 apparent-time 36–37 approximant /r/ 9, 44, 59, 101, 103, 105–109, 112–113, 115–116, 123, 134–135, 138–139, 141–143 see also tapped /r/, trilled /r/ articulation 40–42, 44–45, 53–54, 95–96, 104 Attention Paid to Speech (ATS) 38–40 Audience Design (AD) 38–40 see also Style Axiom B Bark, see Critical Band Rate basic Scottish vowel system 24, 73 bipolarism 22–23 British National Corpus (BNC) 42–43 C carrier phrase 50–51 census data 3 centring diphthongs 26 change in progress 36–37, 108 choice models 63 class, see social class collocation 43, 111, 113, 138 consonantal variables 8, 142, 146 see also vocalic variables
contact 2–8, 1317, 23–25, 33–34, 36, 38, 46, 96–97, 99, 109, 140, 142–145 see also dialect contact, language contact convenience sampling 48 Critical Band Rate (CBR, Bark, Bk) 56 D derhoticisation 28, 102–107, 134, 142 see also semi-rhotic, variably rhotic, vocalisation of /r/ devolution 4 dialect 2–3, 14, 20–21, 23, 33–35 contact 14, 33 see also language contact drifting 22–23, 47 mixing 35 shifting 22 diglossia 22 diphthong 26, 54, 58, 67–69, 72, 77–78, 91, 142 see also monophthong -isation 7–8, 24, 30, 41, 44, 58, 67–68, 70–73, 75–78, 87–88, 91–92, 96–97, 139–140, 146 gliding diphthong 68 narrow diphthong 68, 72, 76 sequential diphthong 68 see also monophthongisation dummy items 50–51 E Edinburgh 3–5, 7, 73, 105, 107, 109, 138, 149–160, 168 education 19, 21, 25 effort 41, 99, 113, 127, 137, 141 elocutionists 19 entrenchment 111, 113 Estuary English 71
Euclidean distance 58, 68, 93 exemplar, see usage-based approaches explained variance 64, 84 external factors 35, 37, 45–46 F face merger 69–70 vowel 8, 28, 69–70 fillers, see dummy items filter 53 fixed effects 63–64 fleece merger 69 formants 53–55 formant track 54–55 see also formants fortition 41, 96 frequency, see lexical frequency G Gaelic 22, 29–30, 127 General American (GA) 8 Giegerich, Heinz J. 1, 6–7, 21, 23, 26 Glasgow 3–5 globalisation 7, 19 goat advancement 71, 75–76, 88 merger 69 vowel 8, 28, 69–71, 88 Great Vowel Shift 70 H Highland English 22 hybrid accents 31, 146 hyper-variation 9, 101 I independence, see Scottish independence independent observations 60
intensity 69, 143 interview 39, 47–50 intervocalic context 103, 106, 108, 114, 133–134, 138 intra-class correlation (ICC) 64 J juncture-phenomena 9 see also liaison K koiné 34 -isation 2, 144 see also levelling L laboratory approach 47, 52 Labov, William 10–12, 37, 39–40, 102, 136 Lallans 19 language contact 6, 24, 33 see also dialect contact continuum 1–2, 6, 13–14, 17–18, 20, 22–25, 35, 97, 143 lengthening environments 70, 73, 78 lenition 41–42, 98–99, 104, 113, 137, 141 levelling 34, 144, 146 lexical frequency 35, 42–43, 78, 128, 138, 141–142 see also logarithmic transformation of lexical frequency liaison 110–111, 113, 118 Linear Predictive Coding 54 logarithmic transformation of acoustic frequency 56 of lexical frequency 42–43 of speech rate 41 logistic function 62 regression 12, 60, 62 see also logits logits 62–63 Long Mid Diphthonging 71
M maximal displacement 55 McClure, J. Derrick 1–2, 6, 18, 21, 24–25, 29, 72, 74–75 middle class 3–7, 19–21, 25, 141 model parsimony, see Akaike’s Information Criterion (AIC) monophthong 8, 54, 68, 142 see also diphthong -isation 41, 89, 96 see also diphthongisation Morningside 73, 76–77 N national identity 3 non-rhotic 8–9, 101–104, 107, 135–136 see also rhoticity, semi-rhotic, variably rhotic non-standard 1, 6, 20, 47 normalisation of amplitudes 53 of vowels 55–57 see also normalisers normalisers 54, 57 normal values of predictors 83 O Observer’s Paradox 47 octaves 56 P perception of /r/ 10 of Scottish accents 6 of southern British accents of English 6, 24 of vowels 54–56, 67–69, 76 phonotactics 101–102 pitch 69, 143 prepausal position 45, 78, 106, 109, 113, 131–132, 136–137, 141–142 prestige of accent features 19, 37, 101 of SSE and SSBE 6–7, 23–24, 141 prominence 41, 78, 136–137, 142–143 psychoacoustics 56–57
R random effects 63–65 rapid speech 10, 59, 117–118 rate of speech, see speech rate reading passage 48–52 see also word list reallocation 34–35, 146 Received Pronunciation (RP) 2, 6, 23–25, 145 reduction effects 42, 77–78, 94, 99, 117, 127–128, 137, 141 referendums in Scotland 4 Reformation 18 regression 12, 48, 60–62, 64 see also logistic regression representativeness 48 retroflex /r/ 101, 105 see also tapped /r/, trilled /r/ rhoticity 8–9, 26, 29–30, 101– 105, 107, 110, 113, 135–136, 140 see also non-rhoticity, semirhotic, variably rhotic rhotics 101–102 rhyme 131, 136 RP, see Received Pronunciation S salience 8, 26, 67–68, 73, 135–136 sample size 48, 52, 60 Scobbie, James M. 4, 6, 27, 47, 101, 105–107 Scottish Borders 21 English 1–2, 17–18, 23–24, 29, 71–73, 101, 107, 153–160 independence 4 Vowel Length Rule (SVLR) 1–2, 26–27, 72 semi-rhotic 102 see also non-rhotic, rhoticity, variably rhotic single logmean procedure 57 social class 5, 12, 34, 36, 72, 91, 107–108, 112 see also middle class, working class social identity 12 sociophonology 9–10 source-filter theory 53
Southern Standard British English (SSBE) 2–3, 6–8, 10, 13, 17, 20–21, 23–27, 77–78, 97, 115, 134–135, 139–147 see also Received Pronunciation (RP) spectrogram 54–55 Speech Accommodation Theory (SAT) 34, 39–40, 46 speech rate 41, 110, 113, 128, 137, 141 standard accent 2–3, 6–7, 13, 21, 23 dialect 2–3, 20–21 stress 41–42, 78, 98, 111, 113, 128–129, 136–137, 141 structured tasks 49 see also reading passage, word list Stuart-Smith, Jane 1–2, 10, 17, 20–22, 102, 104, 106–109, 112–113, 134 Style Axiom 39 see also Audience Design (AD) supraregional standard 6, 10
T tapped /r/ 9, 29, 59, 101, 105–107, 112–113, 115–119, 123, 127–128, 133–134, 139, 141–142 see also approximant /r/, trilled /r/ teleology 40–41 tenseness 41, 76, 97–99, 139, 142 trilled /r/ 9, 29, 59, 101–103, 105–106, 115, 137, 144 see also approximant /r/, tapped /r/ Trudgill, Peter 14, 25, 33–34, 38–39, 144, 146 U undershoot, see vowel undershoot Union of the Crowns 18 of the Parliaments 19 University of Edinburgh 3, 5, 47 of Glasgow 3 Urban Scots 22 usage-based approaches 42–43
V variably rhotic 102–103 vocalic variables 8, 142, 146 see also consonantal variables vocalisation of /r/ 9, 14, 103–104, 108, 112, 117–118, 135, 141–143 see also derhoticisation vowel demergers 27–28 length 26, 72 see also Scottish Vowel Length Rule (SVLR) mergers 26–28, 30, 44, 75, 138 targets 54–55, 57–58, 74, 79–80, 141 trajectory 58, 67–69, 74–81 transitions 51, 54 undershoot 96, 141 W Wells, John C. 6, 8, 22, 24, 26, 69–73, 102–106, 145 word list 39–40, 48–52, 89, 97, 126, 143 see also carrier phrase, reading passage working class 7, 20, 22, 104, 106–107