The Written Questionnaire in Social Dialectology: History, Theory, Practice [1 ed.] 9789027267771, 9789027258311


English Pages 426 Year 2015


History, theory, practice

Stefan Dollinger

IMPACT: Studies in Language and Society

The Written Questionnaire in Social Dialectology

40

JOHN BENJAMINS PUBLISHING COMPANY


IMPACT: Studies in Language and Society
ISSN 1385-7908

IMPACT publishes monographs, collective volumes, and text books on topics in sociolinguistics. The scope of the series is broad, with special emphasis on areas such as language planning and language policies; language conflict and language death; language standards and language change; dialectology; diglossia; discourse studies; language and social identity (gender, ethnicity, class, ideology); and history and methods of sociolinguistics. For an overview of all books published in this series, please see http://benjamins.com/catalog/impact

General Editor
Ana Deumert, University of Cape Town

Advisory Board
Peter Auer, University of Freiburg
Jan Blommaert, Ghent University
Annick De Houwer, University of Erfurt
J. Joseph Errington, Yale University
Anna Maria Escobar, University of Illinois at Urbana
Guus Extra, Tilburg University
Marlis Hellinger, University of Frankfurt am Main
Elizabeth Lanza, University of Oslo
William Labov, University of Pennsylvania
Peter L. Patrick, University of Essex
Jeanine Treffers-Daller, University of the West of England
Victor Webb, University of Pretoria

Volume 40 The Written Questionnaire in Social Dialectology. History, theory, practice by Stefan Dollinger

The Written Questionnaire in Social Dialectology History, theory, practice

Stefan Dollinger University of Gothenburg & University of British Columbia

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

DOI 10.1075/impact.40

Cataloging-in-Publication Data available from Library of Congress: LCCN 2015037176 (print) / 2015039102 (e-book)

ISBN 978 90 272 5831 1 (Hb)
ISBN 978 90 272 5832 8 (Pb)
ISBN 978 90 272 6777 1 (e-book)

© 2015 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

John Benjamins Publishing Company · https://benjamins.com

for Nina-Greta & Benjamin Maximilian

Table of contents

List of common abbreviations
List of tables
List of figures
List of illustrations
Acknowledgements
Abstract
Companion website
Author’s preface

Chapter 1. Written Questionnaires in the wider linguistic context
  1.1 Three basic types of language data and WQs
  1.2 Data in traditional dialect geography
    1.2.1 The Fieldworker Interview (FI) method
    1.2.2 Wenker’s Written Questionnaire (WWQ) method
  1.3 Today’s Written Questionnaire (WQ) methods
  1.4 The organization of this book

Part I. History & theory

Chapter 2. A history of written questionnaires in social dialectology
  2.1 German-language pioneers
  2.2 From Wenker’s Deutscher Sprachatlas to Mitzka’s Wortatlas
      Advantages
  2.3 Dutch and Flemish WQs
  2.4 Early English language WQs in the US
    2.4.1 A new beginning: Alva L. Davis’ (1948) WQ Survey
    2.4.2 Cassidy’s and Allen’s WQ studies
  2.5 Scotland and The Linguistic Atlas of Scotland
  2.6 WQs in Canada: A special case
    2.6.1 Canadian beginnings
    2.6.2 Survey of Canadian English (1972)
    2.6.3 Other Canadian WQs
    2.6.4 Dialect Topography of Canada (1991–2004)
    2.6.5 North American Regional Vocabulary Survey
  2.7 Other, more recent applications
  2.8 Chapter conclusion

Chapter 3. A comparison of data collection methodologies
  3.1 Corpus linguistics and WQs: A methodological comparison
    3.1.1 Limited linguistic contexts: Problem #1
    3.1.2 Low-frequency items: Problem #2
    3.1.3 (Positive) Evidence and negative evidence: Problem #3
    3.1.4 Documentation of social backgrounds: Problem #4
    3.1.5 Corpora and WQs: A comparison
      Linguistic examples: Attested and reported
  3.2 Comparison of elicitation techniques: WQ and FI
    3.2.1 The Linguistic Atlas of the Upper Midwest (1947–1953; 1973–6)
    3.2.2 Allen’s WQ and FI data: Chambers’ selection
    3.2.3 Selecting the best test
      Testing the data for equivalence
      Allen’s original assessment
  3.3 Comparison of elicitation techniques: Sociolinguistic interview and WQ
    3.3.1 The Observer’s Paradox and WQs
    3.3.2 McDavid’s test
    3.3.3 WQs and sociolinguistic interviews in Vancouver, Canada
      Low-back vowels
      Yod-dropping
  3.4 Chapter conclusion

Chapter 4. Types of traditional WQ variables
  4.1 Lexis (vocabulary)
    4.1.1 A Canadianism is dying out: chesterfield
      The rise and fall of chesterfield
    4.1.2 A Canadianism is staying put: The case of tap
    4.1.3 A Canadianism is entering the scene: take up #9
      Interpreting the data on take up #9
      Tracing take up #9 in the Canadian Oxford
  4.2 Morphology
      Snuck as a global form?
  4.3 Syntax and usage
    4.3.1 Different from/than/to?
    4.3.2 Between you and me or I?
      The source of the confusion
    4.3.3 Telling time: 11:40 or twenty-to-twelve?
  4.4 Pronunciation: Phonemic variables
    4.4.1 Yod-dropping
      Yod-retention in avenue: An urban vs. rural split?
    4.4.2 Variation in lexical item vase
  4.5 Outlook

Chapter 5. World Englishes, multilingualism and written questionnaires
  5.1 Canadian English and the multilingual speaker
    5.1.1 From monolingual to multilingual perspectives
    5.1.2 From the national to the transnational: The sociolinguistics of globalization
      A new kind of sociolinguistics?
      Local literary practices and WQs
  5.2 World Englishes, Global Englishes: Concepts
  5.3 WQ Elicitation in World English contexts
    5.3.1 Some problems of WQs in contact scenarios
    5.3.2 Conceptualizing space in dialect geography
    5.3.3 Select morphosyntactic features of World Englishes
  5.4 English as a Lingua Franca (ELF)
    5.4.1 Concepts
    5.4.2 Polling language teacher attitudes towards ELF
      Group identity, mutual intelligibility and ELF
    5.4.3 Linguistic error or innovation? The case for WQs
    5.4.4 Discovering variables and variants
      Lexical innovation
      Pragmatic innovation: Idiomaticity in ELF
      Some principles for variable detection
  5.5 Addendum: Global Englishes and expert WQs
  5.6 Chapter summary

Chapter 6. WQ data and linguistic theory
  6.1 Real time and Apparent time
    6.1.1 Age-grading
  6.2 The S-curve of linguistic change
    6.2.1 The case of N/V+ing + N compounds
  6.3 Change from above, change from below: Social class
  6.4 Gender (sex)
    6.4.1 Principle 1: Stable situations: Women use the standard more than men
    6.4.2 Principle 2: Women use more standard forms in changes from above
    6.4.3 Principle 3: Women use more of the incoming variant in changes from below
    6.4.4 Indexing social meaning: Gender
  6.5 Border effects: Autonomy vs. heteronomy
    6.5.1 Insights from Dialect Topography
      A cross-border continuum: St. Stephen (New Brunswick) and Calais (Maine)
      Political borders as linguistic divides: Shone in Ontario and New York
      Heightened differences in immediate border regions
    6.5.2 Insights from NARVS: The North American lexical perspective
  6.6 Sociohistorical framework and explanations
    6.6.1 Canada’s five major immigration waves
    6.6.2 Trudgill and Schneider: Two complementary approaches?
      Trudgill’s (2004) New-dialect Formation Theory
      Schneider’s (2007) Dynamic Model
  6.7 Indexing social meaning
    6.7.1 Three “waves” in sociolinguistics
    6.7.2 Yod-dropping in CanE as an indexical field
  6.8 Homogeneity and heterogeneity
    6.8.1 Homogenization on the national level
      Homogeneity & Standard Canadian English
    6.8.2 Homogenization on the continental level
    6.8.3 Heterogenization (diversification)
  6.9 Chapter summary

Part II. Practice

Chapter 7. Questionnaire design and data collection
  7.1 Planning the questionnaire: Purpose & research question
  7.2 Structure of the questionnaire
    7.2.1 Questionnaire length
    7.2.2 Choice of medium: Paper or online?
  7.3 Question design
      Self-reporting and community-reporting
    7.3.1 Types of questions
      Closed-response items
      Checklists
      Multiple-Choice items: Binary (nominal) & categorical
      Rating scales
      Multiple-items scales in social dialectology
      Inter-related items
      Open-response items
      Mixed-response types
    7.3.2 From raw questions to questionnaire items
    7.3.3 Self-reporting linguistic behaviour
      Inventories vs. social correlations
      Socio-syntactic reformulations
      Magnitude Estimation Tasks and grammaticality judgements
    7.3.4 Self-reporting language attitudes and perceptions
      Language attitudes
      Perceptions
    7.3.5 Community-reporting of linguistic behaviour
    7.3.6 Mitigating prescriptive influence: Framing the questions
      The role of instructions
      Using informal language in the questionnaire
      Reliability
      Some tentative insights: How to ask and how better not
      Defining evaluative categories
      Harnessing a pedagogical phonetic alphabet for social dialectology?
      Order of stimuli and trial items
      Intuitive formatting & item ordering
    7.3.7 Piloting and revising the questionnaire
    7.3.8 Social background questions
  7.4 Population sampling
    7.4.1 Random or judgement sampling?
    7.4.2 A combined sampling method
  7.5 Chapter summary

Chapter 8. Working with WQ data
  8.1 The Dialect Topography portal
    8.1.1 Dialect Topography questionnaire
    8.1.2 “View Results”: One variable in one location
    8.1.3 View Results: Comparing two locations
    8.1.4 Tutorials
  8.2 Calculating social indices
    8.2.1 The Regionality Index (RI)
      Complex cases
      Applying the RI
      Adjusting the RI for multilingual respondents
    8.2.2 Language Use Index (LUI)
    8.2.3 Ethnic Orientation Index (EOI)
    8.2.4 Occupational Mobility Index (OMI) and Social Class (SC)
  8.3 Data-readying in Excel
    8.3.1 Importing DT data into Excel
      PC Users
      Text Import Wizard
      Mac Users
    8.3.2 Three basic Excel commands
      Principles of Excel
      COUNTIF and SUM
      Exercises on the file “GH (All) – q1-different”
      Multiple conditions: COUNTIFS
      Excel graphing tool
      More exercises on the file “GH (All) – q1-different”
      COUNTIFS with indices: RI
      Exercise with RIs
    8.3.3 Beyond manual commands: Pivot tables
  8.4 Chapter summary

Chapter 9. Statistical testing with R
      Downloading and installing R
      Descriptive & Analytical statistics
  9.1 Why use statistical tools?
  9.2 Preliminaries: Types of variables and forming a hypothesis
    9.2.1 Types of variables in traditional WQs
    9.2.2 Formulating a hypothesis
  9.3 Univariate analytical statistics
    9.3.1 One dependent variable
    9.3.2 One dependent variable, one independent variable
    9.3.3 Converting a categorical variable into an ordinal variable
  9.4 Multivariate Analysis with R
    9.4.1 Linear models: Hierarchical Configural Frequency Analysis
    9.4.2 (Non-linear) logistic regression modelling
      Importing “data frames” into R
      The case for data import checking: Two errors
      A first logistic regression with interactions
      When you get a “Warning message”
      Logistic regression modelling without interactions
  9.5 Chapter summary

Chapter 10. Epilogue
  10.1 The revival of WQs in social dialectology
  10.2 WQs and linguistic variables
  10.3 Desiderata
    10.3.1 Guidelines for WQ design in social dialectology
    10.3.2 WQs, geographical space, and potential risks
    10.3.3 WQs and WQ-internal checks and controls
  10.4 WQs: The delayed method

References
Index

List of common abbreviations AAE AfrE ALF AmE AusE AutE AutG BlSAE BNC BrE CanE ChinE CL CONTE DARE DCHP-1 DCHP-2 DT EAfrE EFL ELF ELT EngEng EOI ESL FI GerE GerG GhaE IndE IndSAE L1 L2 LakE

African American English African English Altas linguistique de la France American (US) English Australian English Austrian English Austrian German Black South African English British National Corpus British (UK) English Canadian English Chinese English corpus linguistics Corpus of Early Ontario English Dictionary of American Regional English Dictionary of Canadianisms on Historical Principles, First Edition Dictionary of Canadianisms on Historical Principles, Second Edition Dialect Topography of Canada East African English English as a Foreign Language English as a Lingua Franca English Language Teaching English English Ethnic Orientation Index English as a Second Language Field-worker interview German English German German Ghanaian English Indian English (East Indian) Indian South African English first language second language Sri Lankan English

xvi

The Written Questionnaire in Social Dialectology

LAP LAS LAUM LAUSC LUI MalE NARVS NigE NORM OED OMI PhilE RI SC SCE SED SingE StCanE WE WEs WQ WWQ

Linguistic Atlas Project Linguistic Atlas of Scotland Linguistic Atlas of the Upper-Midwest Linguistic Atlas of the United States and Canada Language Use Index Malayan English North American Regional Vocabulary Survey Nigerian English Non-mobile, older, rural male interviewees Oxford English Dictionary, Third Edition Occupational Mobility Index Pilippine English Regionality Index Social class Survey of Canadian English Survey of English Dialects Singapore English Standard Canadian English World English World Englishes written questionnaire Wenker’s written questionnaire

List of tables

Table 2.1 Survey of Canadian English, overall results
Table 3.1 Lexical item cool in the Corpus of early Ontario English (CONTE), 1776–1849
Table 3.2 Cool in the Strathy Corpus of Canadian English (1985–2011)
Table 3.3 50 most frequent words in BNC
Table 3.4 20 most frequent verbs and noun lemmas in BNC
Table 3.5 Comparison of features of three linguistic data collection methods
Table 3.6 Data from FI and WQ by state in the Linguistic Atlas of the Upper Midwest
Table 3.7 Comparison of 35 lexical variables from LAUM (Allen 1973–76)
Table 3.8 Interviewees in the sociolinguistic interviews
Table 3.9 Matches between self-reports and acoustic data
Table 3.10 Glide deletion in Vancouver: acoustic measurements of yod-ratio
Table 4.1 SCE data for chesterfield (question #29) in percent
Table 4.2 National SCE data for tap (question #31) in percent
Table 4.3 Comparison of OUP dictionary entries for take up #9
Table 4.4 Different from by gender and social class
Table 4.5 First person pronoun singular in Old English and Present-Day English
Table 4.6 Old English a-stem nouns declension for cyning ‘kings’ (plural)
Table 4.7 Cross-tabulation of answers of 2440 responses (Pi 2000)
Table 4.8 Trajectory of change for telling the time with approximate time line
Table 5.1 Calculating the Regionality Index: Two fictional examples from Toronto
Table 5.2 Excerpt from the lexical part of the Bamberg Questionnaire (Krug & Sell 2013)
Table 5.3 Extract from grammatical section of Bamberg Questionnaire (Krug and Sell 2013)
Table 5.4 Lexical innovation in ELF
Table 6.1 Vancouver yod-retention
Table 6.2 Replies for sneakers in 14–19 year-old, RI 1–5 (%)
Table 6.3 Pop (vs. soda) in New Brunswick and Maine among 14–19-year-olds
Table 6.4 Yod-ful avenue, east to west in %
Table 6.5 Strength of lexical boundaries between Canada and the U.S.
Table 6.6 Strength of lexical boundaries between Canadian regions
Table 6.7 Canada’s five major immigration waves
Table 6.8 Three stages in Trudgill’s New-Dialect Formation Theory
Table 6.9 Phases in Dynamic Model applied to Canadian English (Schneider 2007)
Table 6.10 Regional dialects in Canadian English based on Boberg (2005, 2008)
Table 7.1 Perceived pleasantness of dialects (McKinnie & Dailey-O’Cain 2002)
Table 7.2 Large-scale sociolinguistic studies that include audio-recordings
Table 7.3 Sample sizes in two Vancouver WQ studies
Table 8.1 Overview of Dialect Topography data
Table 8.2 Independent (social) Variables in drop-down menu of DT web portal
Table 8.3 Calculation schema for the Regionality Index
Table 8.4 Has drank (not has drunk) in percent by OMI
Table 9.1 Fictitious example for yod-less and yod-fulness in student, low n’s
Table 9.2 Fictitious example for yod-less and yod-fulness in student, high n’s
Table 9.3 Types of variables in traditional WQ studies
Table 9.4 Overview of statistical procedures and sketch of application criteria
Table 9.5 Carbonated drink in Quebec City
Table 9.6 Answers to roof rhyming with hoof in absolute frequencies (ns) to Question 11
Table 9.7 Four major variant types in q74 (wedgie)

List of figures

Figure 2.1 LAMSAS data for variable (chest of drawer) (Kretzschmar 2009)
Figure 2.2 Dialect map (display and interpretive lines) for the pronunciation of route
Figure 3.1 Allen’s data for PQ and FI (grey) and normal distribution (black)
Figure 3.2 Vowel plots for caught/cot (non-normalized)
Figure 3.3 Vowel plots for Don/Dawn and sorry/sari (non-normalized)
Figure 4.1 Chesterfield in the Greater Toronto Region and environs, 1991/2
Figure 4.2 Data from Figure 4.1 interpreted with the apparent-time hypothesis
Figure 4.3 Tap and faucet in the Dialect Topography of Canada database
Figure 4.4 Greater Toronto (1991/2) tap vs. faucet
Figure 4.5 Correct answers take up #9, by the place of formative years (8–18)
Figure 4.6 Canada Census data 1971–2012, In- and Out-migration of ON and AB
Figure 4.7 dove (not dived) and snuck (not sneaked) in the Golden Horseshoe by age (1991/2 data)
Figure 4.8 Snuck (not sneaked) by internet domains in % (31 May 2010 & 4 March 2015)
Figure 4.9 Different from/than/to in the Golden Horseshoe
Figure 4.10 Different from in seven Canadian regions
Figure 4.11 Non-standard between you and I by education
Figure 4.12 Analog answers in the greater Toronto region
Figure 4.13 Student pronounced with yod-dropping, BC (SCE), Vancouver and Toronto (DT)
Figure 4.14 Avenue with yod (you) in seven Canadian and three American locations (DT)
Figure 4.15 Vase rhyming with days, i.e. /veɪz/
Figure 4.16 Vase in Toronto and Vancouver (%) by strength of local ties (Regionality Index)
Figure 5.1 Soother and dummy in six varieties (14 May 2014)
Figure 5.2 Variant responses in traditional Inner Circle & hypothetical super-diversity settings
Figure 5.3 First-ranked varieties for “best” English accents among L2 ELT teachers
Figure 6.1 Deontic Obligation Markers in Canadian English
Figure 6.2 Zed in the Golden Horseshoe, 1991/2 and 2000
Figure 6.3 Depiction of S-curve of linguistic change
Figure 6.4 N/V+ing + N, 1980–2007
Figure 6.5 (ing) in Norwich, England
Figure 6.6 Between you and me (vs. I) in Ottawa
Figure 6.7 Subjunctive was/were in Ottawa
Figure 6.8 Quotatives in Toronto English (ages 9 to >80)
Figure 6.9 Canadian English & Canadian identity, Vancouver 2009 (%)
Figure 6.10 ‘Athletic shoe’ in seven Canadian and four American regions (%)
Figure 6.11 Shone pronounced /ʃɑn/ in the Golden Horseshoe
Figure 6.12 Possible indexes of yod-ful student in Canadian English
Figure 6.13 Vowels before [r] in guarantee for respondents under 40
Figure 7.1 Position of questions and error rate by age of respondent
Figure 7.2 Answers to “Is there a Canadian way of speaking?” by education
Figure 8.1 Soft drink and pop by RI in Quebec City
Figure 8.2 Sofa in Ottawa Valley according to LUI and mother tongue

List of illustrations

Illustration 1.1 Types of linguistic data by degree of monitoring and naturalness
Illustration 2.1 Social background questions in Hempl’s WQ
Illustration 2.2 Part of the questionnaire of the 1972 Survey of Canadian English
Illustration 4.1 Chesterfield (left) vs. chesterfield (right)
Illustration 5.1 Visualization of Kachru’s Three Circle Model
Illustration 6.1 Youngest example of dumping truck in The Globe (and Mail)
Illustration 6.2 Canada-U.S. border along “0 Avenue” in Surrey, BC (left), and Blaine, WA (right)
Illustration 7.1 Informed Consent Form (UBC 2013)
Illustration 7.2 Beginning of Washington State English Survey (2011)
Illustration 7.3 Instructions for the linguistic part (Washington Survey)
Illustration 7.4 Raw material from Preston’s 1984 map task
Illustration 7.5 Perceived dialect regions by 100 Ontarians and 100 Albertans
Illustration 7.6 Instructions from Gregg’s BC Linguistic Survey
Illustration 8.1 DT of Canada Introductory Screen (“About”)
Illustration 8.2 View Results screen
Illustration 8.3 Basic output for View Results function
Illustration 8.4 Automatic Graphing feature (full circles) with manual additions (semi-circles)
Illustration 8.5 Regional comparison feature
Illustration 8.6 Output of Regional comparison function
Illustration 8.7 Tutorial View with Split Windows
Illustration 8.8 Data Request home screen (Dialect Topography)
Illustration 8.9 Raw download format (Full Data Length)
Illustration 8.10 Importing the text into Excel 2010
Illustration 8.11 Copy-and-pasting the delimiting character into the Excel
Illustration 8.12 Imported data in Excel
Illustration 8.13 Mac procedure for inserting the Pipe symbol via Copy-and-Paste
Illustration 8.14 Downloaded DT file, Golden Horseshoe (All), q1-different, in Excel
Illustration 8.15 COUNTIF example
Illustration 8.16 Basic variant count for q1-different
Illustration 8.17a SUM command
Illustration 8.17b SUM command & percent calculation
Illustration 8.18 Creating a percent formula that is automatically “shiftable”
Illustration 8.19 Shift box (top) and full percentages with “shifted” function
Illustration 8.20 COUNTIFS in age groups
Illustration 8.21a Formula dragged to the right
Illustration 8.21b First correction of dragged formula
Illustration 8.22 q1-different by age group & three major variants
Illustration 8.23 Marking data for graphing tool and chart
Illustration 8.24 COUNTIFS and RI
Illustration 8.25 Calling up the PivotTable function in Excel
Illustration 8.26a Submenu for command “PivotTable…”
Illustration 8.26b Responses for q1 per age cohort
Illustration 8.27a Manipulate-able Pivot Table for q1 and all other variables (header in top row)
Illustration 8.27b Answer for different from only in display view in PivotTables
Illustration 9.1 Normal distribution with 95% area shaded (95% likelihood or 0.05 cut-off)
Illustration 9.2 Histogram for wedgie and variants in Vancouver
Illustration 9.3 Input routine for HCFA testing
Illustration 9.4 Select all tables from 0000 to 0015 (in our case)
Illustration 9.5 HCFA_output_sum.txt for differentvan.txt
Illustration 9.6a Montreal.news.txt: original output in “q42”
Illustration 9.6b Montreal news2.txt: original q42 replaced with R-compatible format
Illustration 9.7a Montreal.news2: missing value for dependent variable in line 234, id 5234
Illustration 9.7b Montreal.news3: missing values removed
Illustration 9.8 Logistic Regression modelling, model 1, with interactions. AIC not shown
Illustration 9.9 Model 2, model.glm2, without interactions
Illustration 9.10 Model 3, model.glm3, without interactions
Illustration 9.11 Model 4, model.glm4, without interactions

Acknowledgements

A debt of gratitude is owed to many people. Commissioning editor Kees Vaes is thanked for realizing the relevance of this topic and suggesting the proper format; Ana Deumert for seeing the potential of this book and for soliciting the perfect pair of referees. I am indebted to one anonymous proposal reviewer and to two anonymous manuscript reviewers, who offered most valuable critique. Vielen Dank to Erik Schleef for most incisive, measured and detailed feedback on a draft version of the entire book. Thanks to Manfred Krug for reading Section 5.3 in the right light and for volunteering unpublished data from his Malta study; to Stefan Th. Gries for lightning-fast and good-spirited help with R and for his permission to include the HCFA code on the companion website; to Sandra Clarke and Suzanne Power for help with questions around WQs in Newfoundland; and to J. K. Chambers for piquing my interest in questionnaires in the first place. All shortcomings remain, as always, my sole responsibility. Ruth Dollinger for reading the entire first draft of a hastily composed manuscript and for tolerating my chronic tardiness. Thanks must go to my Joseph-Conradian editor, Ania Basiukiewicz, for straightening out the more opaque ELF constructions I failed to notice in the heat of the writing battle. Since I need to live by what I preach, I occasionally opted for my own innovations regardless. The students in my UBC English Language Majors Seminar (ENGL 489) in the fall of 2014, who, working hard, pushed the limits of the method, and the course’s 2012 and 2013 installments for teaching me what does not work. My lab assistants and junior colleagues Sasha (Alexandra) Gaylie and Jessica Tam for taking on, besides their work on DCHP-2, odd jobs and ends, and always happily so. To my Viennese teachers I offer thanks: Henry Widdowson and Barbara Seidlhofer for first introducing me to englische Sprachwissenschaft (English linguistics);
Nikolaus Ritt for showing me that English linguistics does not stop with current English linguistics literature; and, especially, Herbert Schendl, who is always there with the right doktorfatherly words of advice. Finally, Richard Schrodt for relating the Austrian German vernacular to the bigger linguistic picture and allowing me to connect my Anglistik (English studies) existence with my Germanistik (German studies) existence.


My UBC colleagues, Elizabeth Hodgson for her patience and good counsel, Margery Fee for her encouragement in 2002, 2015 and in between, Laurel Brinton for her high-principled input. Ira Nadel, Mary Chapman, Barbara Dancygier and Tiffany Potter for straight talk and support.

Vielen Dank, tack så mycket and thank you.

Abstract

The methods and procedures used to collect linguistic data are among the most central aspects of social dialectology, the study of regional and social variation in language. Since the early 20th century, interview methods have been preferred over the “indirect method” of written questionnaires. While written questionnaires have hitherto played only a minor role in the field at large, the last decade or so has seen something of a revival in a number of subfields, and various innovations have pushed the limits of this method. It therefore stands to reason that there is more to written questionnaires than usually meets the (linguistic) eye. This book is the first monograph-length account of the theory, history and administration of written questionnaires in the study of regional and social linguistic variation. Reconnecting to a questionnaire tradition that was last given serious treatment in the 1950s, the present book combines the older practice with more recent instantiations and reincarnations and offers an up-to-date, near-comprehensive treatment for the newcomer to the method and the beginner in empirical linguistics and sociolinguistics alike. The text explores the advantages and limitations of written questionnaires in social dialectology in two distinct, yet connected parts: a historical-theoretical and a practical part. The scene is set with a re-evaluative history of the use and avoidance of written questionnaires in traditional dialect geography and sociolinguistics since the late 19th century, with a special focus on English. Methodological comparisons of interview and corpus data with written questionnaires throw into sharp relief the written questionnaire’s strengths and weaknesses, which are illustrated with detailed linguistic variables from traditional (dialect geography) and novel contexts (Global Englishes and English as a Lingua Franca).
The most pervasive sociolinguistic theories are explained and contextualized with examples and case studies from Canadian English, a variety that has, by historical and geographical accident, greatly benefitted from written questionnaires. The practical part is a guide for the newcomer to the field. It caters to the needs of advanced undergraduate and graduate students, was written with special consideration for students in the Arts and Humanities, and assumes no knowledge of quantitative linguistics. It leads readers through a step-by-step process from start to finish, from formulating a research question to interpreting (statistically enhanced) data analyses. From this second part, readers should acquire the skills necessary for conducting their own written questionnaire studies, from question design and data administration to the tabulation and statistical testing of the most typical variable types of written questionnaire data. The book is addressed to anyone wishing to use written questionnaires for the study of language variation and change and will be of relevance to linguistic geographers, social dialectologists, variationists and sociolinguists of many stripes.

Companion website

All data files for Excel (Sections 8.3.2 and 8.3.3) and R commands (Chapter 9) can be downloaded from the book’s companion website:

or

Author’s preface When somebody studies a method without the direct assistance of an experienced practitioner, one is forced to glean insights from the existing literature on the one hand, and to learn by trial-and-error on the other hand. In linguistics, one will quickly find that articles reporting results based on the method in question do not always offer, or if then only in a very limited way, practical instructions on how to proceed. This book has its origins in such attempt. When I first used written questionnaires in the summer of 2008, I was trying to make sense of the methodological sections of existing questionnaire studies, but quickly realized that the method appeared to be only loosely defined and that practices would sometimes stand in outright contradiction. The problem of an apparent lack of universal guidelines became especially obvious when the method was presented to students, whom I required to collect data in a UBC course on Varieties of English (ENGL 323A) in the fall of 2008. While the results from this class-based survey were as good as those published (some discussed in Dollinger 2012a, 2012b), it became obvious that a lot of methodological potential remained untapped. As attempts to gather information on principles and best practices from those who had been actively engaged with written linguistic questionnaires was not particularly helpful either, the idea of composing some sort of “guide” for the design of linguistic questionnaires was first conceived in the spring of 2009. Over the years, the blueprint of the book was extended more and more, including theories, some of longstanding and some newer ones, as three draft manuscripts were tested in advanced upper-level undergraduate courses. Originally, this book was intended as a combination of previously published articles and some newly commissioned papers and section introductions. 
It was Kees Vaes of John Benjamins who, while seeing the potential in the idea, suggested that the text should offer maximum coherence. At a time when Praat and sociophonetics were already buzzwords and were being used in more and more contexts by more and more people, a proposal by a junior scholar on an apparently “old-fashioned” method of yesteryear might have seemed strange on many an editor’s desk. Not so with John Benjamins and Ana Deumert’s IMPACT series, whose assistance resulted in a much improved book. I hope that the outcome will at least in part meet with approval. Experience with multiple methods improves any field. Should this book further this goal in some measure, it will have served its purpose.

Vancouver, Canada, 1 May 2015

Chapter 1

Written Questionnaires in the wider linguistic context

The discipline of linguistics and language study has long straddled the line between the humanities and the social sciences. These have traditionally, if simplistically, been characterized as qualitative and quantitative, respectively. A particularly vexing issue is the nature of the evidence deemed admissible in linguistic inquiry and, consequently, the definition of what precisely constitutes the discipline. In more than one way, one’s view of linguistics is shaped by the evidence that is permitted or, in other words, by the methodologies that are accepted. Most disciplinary disputes seem to be anchored in disagreements about the nature of linguistic data. This book will address one aspect of that debate. Linguistics has seen more than its fair share of competing approaches over the course of the 20th century. In hindsight, it seems as though an earlier period’s universal agreement about the goals of the discipline had to be seriously challenged. Until about World War I, linguistics was a discipline unified by the commonly accepted methodology of comparative historical linguistics, often called, or considered a vital part of, philology. The goal of comparative linguistics was to reconstruct earlier stages of language development, within or across languages, and to establish the lineages between them. The quest for the Indo-European parent language, ancestor to most European and some Asian languages and spoken until about 4000 BC, was a particularly successful venture for much of the 19th and early 20th centuries. Philological greats like Jacob and Wilhelm Grimm (the Brothers Grimm of fairy-tale fame), Karl Verner, Julius Pokorny, August Schleicher, the Junggrammatiker (Neogrammarians) Karl Brugmann and Hermann Paul, or the Anglicist Karl Luick (1964 [1914–40]), to name but a few, propelled the comparative historical method to unprecedented heights.
Beginning with Ferdinand de Saussure (1916), an array of different perspectives on language study developed, and the available methodological approaches multiplied, not to say dispersed. Among other things, Saussure introduced two principled distinctions that had profound consequences for the conception of the field. First, he distinguished between a speaker’s actual instances of speech or language (parole) and language as a collectively shared abstract system (langue). Most significantly, he declared langue the focus of study. Second, he separated the diachronic (historical) study of language, which was de facto the overwhelmingly dominant mode of activity, from the synchronic (contemporaneous) study of language. He placed the emphasis on synchronic perspectives, so that historical perspectives were no longer the only, or even the most important, kind of perspective.

The Written Questionnaire in Social Dialectology

Since Saussure’s work, linguistic approaches have multiplied and several schools have developed. Noam Chomsky’s generative linguistics is one such school, based on a dichotomy between competence and performance. These are not to be confused with Saussure’s conceptual pair. Competence represents the speaker’s “tacit knowledge” of linguistic structures, the abstract idea of language. Performance includes all practical “limitations”, such as fatigue and memory restrictions, and, crucially, concrete uses of language in given situations. While Chomsky declared competence the object of (his kind of) linguistics, performance was relegated to the status of an epiphenomenon, merely to be controlled for and abstracted away from. Chomsky categorically dismissed usage as performance, and thus as beyond the scope of the discipline. He elevated competence, the tacit knowledge of the infamous “ideal speaker-listener”. Both moves caused great concern among linguists of many persuasions. With one stroke, Chomsky ruled many linguists’ objects of study out of the scope of linguistics, since his approach was concerned exclusively with

an ideal speaker-listener, in a completely homogeneous speech-community, who knows its [the speech community’s, SD] language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations [and the like, SD].  (Chomsky 1965: 3)

Many protested against this narrow definition of linguistics, and the field was, quite understandably but regrettably, rife with dispute. In the early 1970s, Derwing (1973: 25) characterized the time since 1957 (the year of Chomsky’s first monograph) as a period in which the discipline gave “the appearance of being racked with disputes, lack of communication, even downright hostility – almost as though it were organized into armed camps”. Philologists and experimental linguists alike were outraged over a definition of the field that relegated their objects of interest to performance, which was to be sidelined. Shortly after Chomsky’s proposal, a new school of thought emerged. It was established by William Labov, Peter Trudgill and others, who (re)discovered within a quantitative framework that precisely those aspects ruled out by Chomsky showed systematic patterns (e.g. Labov 1972). The new school built on earlier dialect geographical methods, but its focus on the social dimension of language use was novel in its consistency, and it gave rise to the field of variationist sociolinguistics. Quantitative methods are at the centre of this discipline, which, in contrast to philology and historical linguistics, focuses on the spoken language. Labov’s quantitative approach was, of course, not entirely new. At least since the 1920s, corpus linguists had approached data from a bottom-up perspective. They aimed to synthesize linguistic principles from linguistic behaviour, which was accessible via text collections (“corpora”). Early corpus linguists such as Charles C. Fries (e.g. 1925) in Michigan in the American context or, in the European tradition, Alvar Ellegård (1953) in Gothenburg, Sweden, showed that a principled approach to quantification, then carried out with pen on paper, would produce important insights into language structure. This kind of empirical approach returned on a grand scale only in the 1980s with the advent of affordable home computers. These greatly facilitated the time-consuming and tedious but necessary tasks of searching for and counting linguistic forms (see Kretzschmar 2009: 6–63 for a summary). Today, basic quantitative methods are part and parcel of almost all schools of thought. There also seems to be agreement, even in most generativist circles, that some form of data collection is required beyond immediate native-speaker introspection. So much for the wider disciplinary context of language variation and of the regional and social study of language. The present book explores one method of data collection, or rather a group of methods, that employ written questionnaires (WQs) for sociolinguistic, dialectological and variation studies. These approaches combined shall be collectively referred to as “social dialectology”. The term has a somewhat older ring to it, which is fully intended, as it harks back to a period in which philological approaches were still part of the linguistic discourse. The term’s simultaneous coverage of social and regional variation and its implied link to historical approaches make it the term of choice. This book, with its focus on regional and social linguistic variation, expressly includes the study of language attitudes and issues of linguistic perception. The examples in this book come from European languages, with a good deal from English.
Canadian English will be given a prominent place in the case studies because of the method’s enduring legacy in that field and the prevalence of WQs in that variety’s scholarship. This introduction sets the stage for WQ studies in the subdisciplines that I call social dialectology. First, however, a further step back will be taken on the question of data and evidence in the field. The goal is to isolate some of the characteristic, high-level similarities and differences between the most important data collection methods. Thereafter, a brief account of traditional dialect geographical projects will be given, starting with Gilliéron’s and Wenker’s paradigm-setting approaches. The introduction will show, as will several examples within the book, that Wenker, while a pioneer in many important ways, is quite incorrectly considered the archetypal proponent of the WQ method. This distinction will be reflected in the terminology: WWQ is used for Wenker’s WQ and WQ(s) for modern-day WQs. The final part of this introduction then characterizes this book’s two parts and the nine chapters that follow.


1.1  Three basic types of language data and WQs

Three basic types of data collection have traditionally been distinguished in language study:

– Introspection
– Elicitation
– Observation

Introspection is the method used in standard generativist theory. It treats the native speaker as the ultimate and best judge of the “grammaticality” of sentences and draws inferences about language structure from such judgements. Termed “armchair linguistics” (Fillmore 1992), this method in its most extreme form requires only one native speaker of the language, often the linguist him- or herself, to produce “data”. Elicitation requires more effort and is today one of the most widespread techniques across various subdisciplines. Generative linguists working on endangered languages (see, e.g., Rau 2013) habitually elicit linguistic structures by asking informants “How do you say X or Y in your language?” or the like. As elicitation is usually practiced, a linguist with no or only limited knowledge of the target language gleans insights into that language. Another elicitation technique entails more informal polling, which is often carried out by linguists of all persuasions in the form of “Do you know the term/meaning/construction …?”. As preliminary hypothesis-building attempts, these techniques are entirely reasonable. Generally, though, such anecdotal reports are not taken as evidence and need to be substantiated with more solid data. Many linguists consider observation the superior form of data collection, the “gold standard”, so to speak, especially for naturally occurring speech (i.e. speech outside formal settings, performance contexts etc.). Observation does not rely on a speaker’s introspection or on the responses elicited from someone conversant in the language. Instead, utterances, whether in writing or speech, are collected after the fact and then systematically analyzed in a second step.
Corpus linguistic data is the only kind of data that can strictly be classified as observation,1 as will be discussed in Chapter 3. Corpora are collections of texts (spoken, written or both) that are compiled post hoc. They are built by sampling texts or (transcriptions of) recordings that were made for other purposes. One limitation is that most corpus material today still consists of written texts, since this medium is much more easily accessible than spoken language. The goal of variationist sociolinguists is to elicit data that is as little monitored and as natural and informal as possible. Labov developed a form of elicitation interview called the sociolinguistic interview (see Becker 2013), designed to elicit a number of speech styles, from more formal to less formal ones. The less formal styles were considered the most prized and most natural forms of human language, i.e. the “vernacular”. Illustration 1.1 depicts the possible approaches to empirical data collection in linguistics, with the exception of introspection (Krug, Schlüter & Rosenbach 2013: Figure 3). The methods are arranged on a scale. At one end are the least natural (–natural) types of data, which allow speakers to actively “monitor” or manipulate their linguistic output, perhaps in accordance with socially desirable norms. At the other end are the most natural (+natural) and least monitored (–monitor) types. In this hierarchy, corpora are considered more desirable than data from the observation category. Only corpus data, with its reliance on “authentic” materials from natural situational contexts, involves no observing or participant-observing researcher who might interfere with or influence linguistic performance (the “observer’s paradox”, see Chapter 3). Granted, there are good workarounds and mitigation procedures to reduce this effect. For the sake of the present argument, however, the observer’s paradox will interfere to one degree or another unless one works with texts created for purposes other than the research and with no researcher present, whether insider or outsider participant. WQs are elicitation techniques and are to be classified as “metalinguistic” in Illustration 1.1, since they generally present direct linguistic questions (not all do, see Chapter 7). As addressed above, there has been a bias against the use of WQs in social dialectology.

1. Even sophisticated ethnographic observation is subject to the observer’s paradox, in the same way as the sociolinguistic interview. Mitigation procedures exist, but there is no real solution to the problem outside of corpus linguistics, as is addressed later in this chapter and in Chapter 3.
As will be shown from a historical perspective in Chapter 2, and from a different angle in Chapter 3, WQs are not employed in most variationist studies (see Boberg 2013 for the few exceptions). A special case must therefore be made for their use in each instance. In corpus linguistics, for different but related reasons, the focus on authentic examples from real-life contexts likewise creates a bias against WQs, which cannot offer such examples.

+natural, –monitor
Corpora
– (Surreptitiously recorded) spontaneous speech
– (Various genres of) written texts
Observation
– (Surreptitious) participant observation
– Unconcealed observation with subject consent
Elicitation
– Sociolinguistic interviews
– Metalinguistic interviews and questionnaires
Experimentation
– Minimally invasive experiments
– Invasive experiments
–natural, +monitor

Illustration 1.1  Types of linguistic data by degree of monitoring and naturalness (Krug, Schlüter & Rosenbach 2013: Figure 3)


Despite their differences, sociolinguistic interviews and WQs share a common characteristic as elicitation techniques, as Illustration 1.1 makes clear. Sociolinguistic interviews, however, have the advantage of the spoken medium and audio recordings, which WQs cannot offer. WQs, in turn, have the advantage of ease of use and ease of dissemination, collection and analysis, as will be shown in detail in later chapters. The dividing line between Corpora and Observation is more categorical than Illustration 1.1 suggests, since the ethical treatment of participants today forbids surreptitious recordings in most, if not all, contexts. Corpora, therefore, need to be set apart as offering a level of authenticity that neither WQs nor sociolinguistic interviews can provide.2 WQs and interviews are both elicitation methods; they work with different media but otherwise share a number of features. Interviews, however, have been the method of choice in dialect geography for more than a century, a fact that will be explored next.

1.2  Data in traditional dialect geography

Traditional dialect geography focuses on the description of linguistic variation across geographical space. It charts this variation by location, e.g. in location A, speaker X uses linguistic form Y, and the like. The major projects of dialect geography are linguistic atlases, which are large, bulky volumes, ideally with maps. Unfortunately and inconveniently, they take a long time to complete, generally measured in decades rather than years. Consider the Deutscher Sprachatlas, the Linguistic Atlas of the United States and Canada (LAUSC) or the Survey of English Dialects (SED). In each case, several decades passed between the start of data collection and actual publication or (partial) completion. With such lengthy projects, methodological innovations often “overtake” works in progress. And that is a good thing: if no new methods were invented over the course of two or three decades, something would be seriously wrong in any field. In order to understand the disciplinary contexts, it is necessary to briefly recount the discipline’s historical development. Standard textbooks in dialect geography and dialectology (e.g. Nelson 1983; Chambers and Trudgill 1998) generally give prominence to two pioneering and methodologically very different projects. On the one hand, there is Georg Wenker’s postal questionnaire method from the 1870s, which would eventually culminate in the German Linguistic Atlas (Deutscher Sprachatlas). On the other, there is Jules Gilliéron’s fieldworker interview method, which was developed for the Linguistic Atlas of France (Atlas linguistique de la France, Gilliéron & Edmont 1902–1910).

2. The crux of the issue is whether corpora are considered, like the other three forms, a data collection method. This is, in my mind, clearly the case, though opinions differ on this precise issue.

1.2.1  The Fieldworker Interview (FI) method

For most of the 20th century, the fieldworker interview (FI) method was the undisputed method of choice. The FI method was developed by Jules Gilliéron, the Swiss director of the Atlas linguistique de la France (ALF), who produced the work based on his sole fieldworker’s transcriptions. Gilliéron pioneered the method for his first dialect atlas, on the southern Rhône dialects, published in 1880, which he surveyed while hiking the area (Lameli 2010: 576–7). Gilliéron’s fieldworker, Edmond Edmont, for his part, is known to legions of dialectology students as the ‘bicycling fieldworker’ (e.g. Chambers & Trudgill 1998: 17). While Edmont cycled around France, sending off his transcriptions as they became available, Gilliéron edited ALF in record time in faraway Paris. Only 14 years passed between 1896, when fieldwork began, and 1910, when the last volume of ALF appeared (Gilliéron & Edmont 1902–1910). Their method of the face-to-face interview is often referred to as the ‘direct method’ in dialectology, as opposed to the ‘indirect’ method of written questionnaires. The FI method, obviously, relies heavily on trained and skilled fieldworkers. In the period before audio recording, they transcribed answers with pen on paper in narrow phonetic script.3 The FI method would become the preferred method for dialect geography in English and Romance linguistics. Fieldworkers originally asked questions and transcribed the interviewees’ answers precisely and immediately; from around the 1940s onwards, FIs were occasionally and partially recorded. It was not until the late 1960s, however, that a linguistic atlas project was taped in its entirety (Lee Pederson’s Linguistic Atlas of the Gulf States, part of the Linguistic Atlas of the US and Canada). After ALF, the FI was first applied in Italy and the Italian-speaking regions of Switzerland, which produced a much-celebrated linguistic atlas (Jaberg & Jud 1928–1940).
Switzerland became a hub for important dialectological projects. In the years 1940 to 1958, fieldwork for the model-defining Sprachatlas der Deutschen Schweiz (SDS) was undertaken, with volumes published between 1962 and 2003. SDS was a methodological continuation of both the Italian-Swiss atlas and ALF. It established a model for smaller regional atlases that are still the norm in German-speaking areas (Scheuringer 2010: 167). Romance dialectology as a whole continued to follow ALF’s lead and its preference for FIs (see Pop 1950). Without much delay, the fieldwork method also found its way to the United States, where Hans Kurath, the Austrian-raised American dialectologist, began the Linguistic Atlas of the United States and Canada (LAUSC) in 1929. Kurath split the vast North American continent into subareas and, as occurs in every new project, introduced some methodological innovations. Methodologically, Kurath adopted Gilliéron’s method “as refined and modified for Italy by Karl Jaberg and Jakob Jud” (Atwood 1986 [1963]: 67) and studied the method in person with members of the Italian team. The FI became the method of choice for almost all4 of the Linguistic Atlas of the United States and Canada projects, whose fieldwork continued into the 1990s. The first and paradigm-setting atlas in the English context was the Linguistic Atlas of New England (LANE, Kurath et al. 1939, Kurath et al. 1972). It was followed by the Linguistic Atlas of the Middle Atlantic States (LAMSAS, Kurath 1949; Kurath & McDavid 1961) and the Linguistic Atlas of the North and Central States (LANCS), for which fieldwork was completed but no publications came forth. Two other important dialect atlases are the Linguistic Atlas of the Upper Midwest (LAUM, Allen 1973–6) and the Linguistic Atlas of the Gulf States (see Pederson, McDaniel & Adams 1986–93). LAUM will be discussed further in Chapter 3 for its special relevance to WQs. The Gulf States atlas was the final one, and, directed by Lee Pederson, it is perhaps the “best” LAUSC atlas. For an up-to-date list, see the Linguistic Atlas Project website. For the first time, it offers an easily accessible, clearly formatted and organized overview of the many LAUSC projects. The website is run by the Linguistic Atlas Project, the umbrella organization for LAUSC and other projects, which is directed and curated by William Kretzschmar Jr. The website offers more LAUSC content than was previously available. This includes the entire, hitherto unpublished LAMSAS data set, which is, in terms of the number of interviews, the most comprehensive LAUSC atlas.

3. ALF’s first volume (which contains no maps) is available in open access (accessed 1 May 2015).
LAUSC reveals the long reach of Gilliéron’s method. A direct lineage runs from Gilliéron’s ALF to Kurath’s LAUSC, and from there to Raven I. McDavid and now William A. Kretzschmar, who succeeded Kurath as atlas directors. Guy Lowman, the key fieldworker on Hans Kurath’s team, carried out preliminary work in England in 1937/8.5 Apart from this, the FI method was not used in England until after World War II, when data from 313 locations was gathered between 1950 and 1961 for the Survey of English Dialects (SED). At the time, Kloeke (1952: 134) described Scotland and England as having “no tradition in linguistic cartography”. Paradoxically, after the original project was completed, SED at first published only the raw data in table format, without any cartographic representation (the so-called “Basic Materials”, Orton et al. 1962–1971). One had to wait until the late 1970s before the first (!) scientific dialect maps of England appeared (e.g. Viereck 1975; Orton, Sanderson & Widdowson 1978; Upton & Widdowson ¹1996, ²2006; Viereck & Ramisch 1997). England was a comparatively late FI adopter, and continental European scholars, above all the Swiss Eugen Dieth, must be credited with bringing the method to the UK (Mather & Speitel 1975, I: 6). SED and LAUSC overlap in about a quarter of their variables (McDavid 1953a: 566). This overlap allows for interesting inferences about historical input and linguistic change in the former North American colonies and the mother country. All of the projects mentioned are based on FIs. A key characteristic of FIs is that they depend on very detailed and extensive interviews, which generally take more than one day to complete and produce large amounts of data from each interviewee. As a consequence, all of these projects share a common limitation: they can process only a limited number of interviewees, which results in small samples. To illustrate this point, France’s ALF is based on 639 interviews for the entire country. The Italian and Southern Swiss Italian atlas is based on 387 locations (available online, see Tisato 2009). The major atlases in the USA interviewed 208 (LAUM), 416 (LANE), 564 (LANCS), 1118 (LAGS) and 1162 (LAMSAS) people respectively, with usually two or three interviewees per location. LAMSAS and LAGS comprise by far the most extensive data sets. LAMSAS has never been published on paper. It is now available in full on the Linguistic Atlas Project website, for the first time since the completion of fieldwork in 1974. LAGS, by contrast, was published on paper and conducted rather swiftly, from 1968, the start of fieldwork, to 1993, when the last volume appeared. In English dialectology, two schools of thought exist on the FI.

4. Some exceptions are discussed in Chapter 2, and include Atwood (1962) in the US South and Bright (1971) in the US West.

5. The fieldworker method, interestingly, seems to have been brought to England from the US and not from France. Guy Lowman, principal and legendary fieldworker of LANE, carried out fieldwork in Southern England just before the outbreak of World War II. This unpublished data is analyzed in Viereck (1975).
The American dialect geography tradition used “work sheets” for data elicitation, which left some room for the individual fieldworkers to find adequate ways to elicit a variable in indirect ways, i.e. without mentioning the target pronunciation, word or construction (see p. 67, question #20.3b for an example). In England, by contrast, the SED was more restrictive by ensuring that every fieldworker asked precisely the same question of each informant. One can see advantages in the US method, such as more lively conversations that tend to reduce the level of monitoring, but also some disadvantages, such as interference from differently worded questions. In the end, though, no matter the approach, both schools share their unequivocal focus on the fieldworker, as expressed here for the SED: No matter how well and ingeniously the questions are drawn up, a questionnaire [for use in fieldwork] will not work or produce the desired results unless it is handled by a competent fieldworker. Much depends upon his [or her, SD] conduct of the interview: there is an art in asking questions in a lively and sympathetic way. Naturally, the questions cannot be put to just anybody. The informant must be both knowledgeable and intelligent, and also quick to respond. (Dieth & Orton 1952: vii)


Dieth and Orton’s term questionnaire refers to a guide for the fieldworker which is not to be confused with the WQ that is filled out by a respondent. The fieldworker questionnaire is merely a list of stimuli to be elicited and may include ways to elicit them. As much as traditional dialectology focuses on the FI, variationist sociolinguists would focus in like manner on the sociolinguistic interview, which can be considered a methodological advancement in the sense that it operates with a more explicit structure (speech styles) (see Becker 2013 for a summary). Generally speaking, interview methods and protocols have received a great deal of attention, starting with the FI method. One measure exemplifies the FI’s dominance very clearly: in the most substantial survey of dialectological projects until the early 1950s, which was arguably the heyday of dialectology, Sever Pop (1950: 1133–1175) places emphasis on the FI method by devoting 40 pages to its principles, while the WQ method is dealt with on merely two pages.

1.2.2  Wenker’s Written Questionnaire (WWQ) method

Georg Wenker’s method (WWQ) is usually presented as the prototypical WQ study. Wenker was a pioneer of the structured elicitation of dialect features and the first to gather linguistic data for linguistic theory building. One should stress, however, that Wenker’s method, which is discussed in detail in Chapter 2, is not the method used in recent WQ studies. Wenker elicited information from German schoolmasters. They were rarely from the region at hand themselves, had no phonetic training, and were instructed to translate sample sentences from Standard German into the local dialect. In other words, Wenker requested a type of community reporting (see Section 7.3.5) of a region’s typical linguistic behaviour. WWQ is therefore very different from most present-day WQ studies, in which speakers themselves fill out the questionnaire without an intermediary. Wenker’s survey eventually covered the entire German Reich, for which data from some 50,000 locations was collected. The Deutscher Sprachatlas was thereby confronted with a massive amount of data, far more than in Gilliéron’s case, where data from about 700 speakers was collected. The survey grid in Germany was therefore much tighter than the one in France: for each data point in France, more than 60 data points were available in Germany. This high density of locations is one of the undisputed assets of the WQ method. In a way, the success of Gilliéron’s method was perhaps contingent on his ability to produce a complete national atlas in less than 15 years, still a feat by today’s standards. Wenker’s method, by contrast, was mired in logistical problems from the beginning. These included problems in data processing characteristic of the pre-computer era and problems in cartographic representation. Until very recently, technological innovations could only slightly alleviate these problems. For example, Wenker’s successor in the 1950s, Walther Mitzka, praised microfilm techniques as a godsend, whereas today we know of their limited practicability. Only the advent of the internet allowed Wenker’s data to become fully available in DiWA, the Digital Wenker Atlas. DiWA finally presents the data to the interested public more than a century and a quarter after the start of the project. In addition to his linguistic skill, Gilliéron must be credited with the clear vision to see a large project through from start to finish. Had he not been so efficient, WQs might have had a different status in dialect geography.

1.3  Today’s Written Questionnaire (WQ) methods

WQs have generally not figured prominently in dialectology, though at certain points in time they played their part. In the early 1950s, McDavid (1953b: 568) described the use of a lexical WQ for the Linguistic Atlas of Scotland, in contrast to Wenker’s survey, as “a new attempt to obtain by correspondence the materials for a linguistic atlas”. He considered the method suitable for giving “accurate information about the distribution of linguistic forms” (ibid: 570). Four decades later, the WQ had all but lost its momentum. As Chambers and Trudgill (1998: 16) put it succinctly, the WQ “is no longer the primary method of data-gathering”. Fast-forward 15 years, and one can witness increased use of, and renewed interest in, WQs in social dialectology and variation studies. Buchstaller et al. (2013: 97), for instance, believe that “questionnaire based approaches can be suitable for studying both morphosyntax and phonology”. They thus take an approach towards WQs that was quite unthinkable only a decade or so earlier. It seems that WQs have finally come to be seen for what they are: a highly interesting method that is often set aside too quickly. As we have seen above, WQs clearly do not produce observation data. Rather, they elicit linguistic information about behaviour. They are tools for different kinds of reporting: either self-reporting (on one’s own use, attitudes or perceptions) or community reporting (on language use in a community). The basic WQ approach as such is, of course, not new. What is new, though, is that scholars are starting to exploit the WQ for its strengths in unprecedented ways. WQs, as a rather intuitive method, have been developed independently in a number of locations and contexts. At least partly as a consequence of William Labov’s sociolinguistic revolution, WQs lost much of their attraction.
They have at times been sidelined for the wrong reasons and have only recently been re-gaining some form of acceptance, yet to varying degrees (e.g. Schleef 2013; Buchstaller et al. 2013; Boberg 2013), outside of their well-established base in the speaker-evaluation tradition (e.g. Giles & Billings 2004 for an overview of that tradition since the 1960s).

It will be argued in the present book that WQs have a lot to offer for both the quantitative and qualitative study of the correlations between linguistic and social phenomena. The present focus will be primarily on a quantitative angle, as this seems to be the area most in need of attention. It will be shown that WQs should be considered a viable method alongside other choices and should be part of the standard methodological toolkit. WQs are defined as questionnaire-based elicitation tools that are filled out by literate and semi-literate respondents without assistance. WQs are used in a number of linguistic disciplines, from applied linguistics and language pedagogy (see, e.g. Brown 2001; Dörnyei 2003) to speech act theory (e.g. Beebe & Cummings 1996) and the study of language use (e.g. Fuller 2005; Pi 2000). Their range is considerable and question types vary widely. The following overview focuses on the question types that are predominantly used in dialectology, dialect geography and sociolinguistics. Schleef (2013) presents five types of questionnaires used in the latter field. Building on these, I suggest a more general, three-tiered WQ question typology:

1. Questions concerning regional and social language variation: from the use of linguistic varieties in given locales and settings (e.g. Extra & Yagmur 2004), to regional and social variation in language (as discussed in Chapter 4), to social variation of particular linguistic items (e.g. Fuller 2005; Lillian 1995 on the use of “Ms”)
2. Questions concerning language perception and language attitudes (e.g. Preston & Long 1999–2002; Watson & Clark 2014; and e.g. Lambert et al. 1960; Lambert 1967; Bourhis, Giles & Howard 1981; or Jenkins 2007)
3. Questions using acceptability judgements of grammaticality: originally a mainstay of generative linguistics, where they were made on a binary scale, such judgements have come to be elicited with WQs on gradient scales outside the generative domain since Bard et al.’s (1996) Magnitude Estimation Method (e.g. Sorace & Keller 2005; Hoffmann 2006)

This typology by subject area is intended to facilitate the classification of the different approaches that are directly relevant to social dialectology and the use of WQs. Questions should also be classified by type of reporting, distinguishing between self-reporting and community reporting, as well as by the type of information sought, with assessments of linguistic behaviour on the one hand and reporting of language attitudes and perceptions on the other. We will refer to studies in these areas throughout this book, and the suggested typology will aid with their classification.



1.4

The organization of this book

The content of this book is organized into two parts: a historical-theoretical part (Part I) and a practical part (Part II). Part I (Chapters 2–6) comprises a historically well-grounded, theoretical overview of the development of WQ methodology. It considers some predecessors, characterizes typical applications and results of traditional WQ variables, addresses recent adaptations of WQ methodology in the context of global Englishes and migratory studies and, last but not least, probes the reliability of WQs when compared to FI, corpus linguistic and sociolinguistic interview data. Part II (Chapters 7–9) was written as a practical aid or ‘handbook’. It aims to illustrate, in as much detail as possible, how written questionnaires can be devised, administered and analysed. The overall goal of this book is to offer an introduction to the WQ method to anyone interested in social dialectology and variation studies, from the (upper-level) undergraduate student to the language scholar wishing to explore another method. A brief synopsis of each part and chapter of the book is offered below.

The theoretical part begins with a history of the use of WQs in social dialectology and related fields in Chapter 2. The chapter’s overarching goal is to identify key moments in the development of the discipline. WQs were used frequently in the late 19th century in both Europe and, with some minor delay, in the US, and it was only thereafter that FIs replaced them as the primary data-gathering tool. From the early 1970s onwards, the sociolinguistic interview increasingly replaced the traditional FI method. It will be argued that the 1940s and 50s saw an interesting renaissance of interest in the WQ method in the US. While used in some projects, WQs failed to regain the status of a fully accepted method in English linguistics. However, WQs have been used continuously in non-English linguistics (e.g. Dutch), and the reasons for their lack of acceptance in English will be explored in that chapter. Anglophone Canada is the exception to the rule, as WQs have been in continuous use there since the late 1940s and have provided some of the major findings. It is not just for this reason that examples from Canadian English figure prominently in this book, but also because a focus on one variety offers avenues for theory-building that are otherwise difficult to establish. After this fairly detailed historical sketch, Chapter 3 probes the reliability of WQ data. The major concern with WQs is that they do not provide observation data, which is why this chapter begins with a principled comparison of WQ data with corpus linguistic data. Following this, both FI and WQ data are compared with sociolinguistic interview data. It will be shown that, in the comparison of FI and WQ, WQ data is no different from FI data. In the comparison of WQ and sociolinguistic data, some problems will be identified and earmarked for further exploration. Overall, however, and varying with the linguistic level and the precise linguistic variable and variable contexts, it will be found that WQ data delivers results that are largely equivalent and, generally, highly useful.

An examination of traditional WQ variables, defined as variables used by the mid-20th century, is offered in Chapter 4. This chapter is intended as a kind of “smallest common denominator” of established practice in question design and WQ data analysis. It focuses almost exclusively on variables in Canadian English, as these set the empirical backdrop for an explication of a number of more general theoretical concepts in Chapters 5 and 6. The elicitation and basic analysis of lexical, morphological, syntactic and usage variables will be illustrated before more recent approaches are addressed. Chapter 5 explores the application of WQs beyond their traditional scope in the contexts of World Englishes and Global Englishes, where special consideration will be given to the study of English as a Lingua Franca, i.e. communication among non-native speakers of English. In this area, the method shows special potential to help fill data gaps in the description of super-regional and global varieties of English. Key concepts in sociolinguistic theory and historical linguistic theory are the focus of attention in Chapter 6. The idea is to introduce the beginning and intermediate student of linguistics to theoretical concepts and findings that will aid in the work with WQ data, which is the focus of Part II. A number of theoretical approaches from both synchronic and diachronic perspectives will be offered and illustrated using Canadian English. These include staple concepts such as real time and apparent time, the S-curve of linguistic change, change from above and change from below, and some concepts involving gender. Among the newer approaches are linguistic border effects, which are of considerable relevance in Canada, sociohistorical frameworks of dialect development and new-dialect formation theory, the indexing of social meaning, and thoughts on homogenizing and heterogenizing forces in today’s dialects. This concludes the theoretical part.
The practical part is intended to guide novices in empirical methods through the design of a WQ study, from the conception of an idea (or shortly after that) to the statistical modelling of the data in the open-source software suite R. This part was written with the advanced undergraduate student of the Arts in mind, who is generally familiar with qualitative methods of language study but not necessarily, or not at all, with quantitative methods. Part II begins with Chapter 7, which explores questionnaire design from a number of perspectives: deciding which variables to focus on, finding a question style that works, determining questionnaire length and protocols for data collection, and determining which questions and variables can be polled with WQs and which cannot, or not easily. Chapter 7 also offers a typology of WQ questions in the context of social dialectology, from self-reporting and community reporting of linguistic behaviour to the assessment of attitudes and perceptions. In the context of this typology, more recent WQ question types, those that have come into use in the 2000s, are also discussed there.



Chapter 8 introduces the reader to practical work with WQ raw data. Using the online database of the Dialect Topography of Canada Project (Chambers 1994), a freely accessible and quite substantial data collection in what may be considered the standard framework, the reader will be shown step by step how to download and manipulate the data. The only software tools needed throughout this book are Excel 2007 or higher (or another spreadsheet program, though some commands may not function in the same way) and the freeware statistics and graphics suite R. The chapter assumes no prior Excel knowledge and is built around a step-by-step tutorial, with screenshots every step of the way. It limits itself to a handful of Excel commands that will enable students to work with large data sets. Up to this point in the book, no statistical tests are applied (with the exception of Chapter 3, for evidential purposes), which allows full focus on the WQ data. Chapter 9, finally, looks at statistical testing and hypothesis modelling with WQ data. This chapter is an introduction to linguistic computing with R, a suite that is in widespread use in statistics departments and increasingly used in linguistics. Limited to the set of variables found in traditional WQs, this chapter introduces a fully illustrated, step-by-step approach to four procedures that will greatly assist in detecting patterns and in identifying the significant factors and predictors that (co-)determine the linguistic variables in question. An Epilogue is offered in Chapter 10, which summarizes the most important points on WQs and aims to gauge their potential in social dialectology. The chapter highlights some immediate desiderata and completes the book with an attempt at an overall assessment of WQs, with special reference to methodological trends and their perception since the mid-20th century.
It is my hope that this treatment may highlight the great versatility of WQs, a feature that has usually been overlooked because of their perceived shortcomings, while at the same time discussing their drawbacks in an adequate, balanced and nuanced way. WQs are time-efficient, cost-effective, easy-to-administer tools that show high reliability and validity for a large number of variables. A review of the vexing history of WQs in the study of language in geographical and social space, a history of changing fortunes, shall be the start of the exploration.

Part I

History & theory

Chapter 2

A history of written questionnaires in social dialectology

In any school of thought, certain methods are favoured over others. Dialect geographers in the English-speaking world have preferred the fieldworker interview (FI) method over written questionnaires (WQs) to such a degree that the latter came to be considered suboptimal. Scepticism towards WQs has been voiced even by those who successfully employed them. Frederic Cassidy, one of the pioneers of the method, for instance, quite clearly stated his preference:

There can hardly be much question that the best method of collecting facts about living language – in this case dialect speech – is that of direct, personal interview of the speaker by a trained interviewer who knows what is significant, and who can elicit this in a natural way and record it accurately. (Cassidy & Duckert 1953: 9)

Such a statement would have made sense, especially at the time, in reference to a collection of pronunciations or a phonetic study, such as the linguistic atlases. It is, however, of more limited relevance for the study of local and regional lexis, to which Cassidy was referring. His 1953 booklet, prepared with Audrey Duckert, presents a postal questionnaire, a WQ that would later be used as part of the data collection for the Dictionary of American Regional English (DARE). It is surprising that Cassidy, who had just administered a large WQ pilot study in the state of Wisconsin (Cassidy 1948), stated so clearly the allegedly inferior role of WQs in the collection of dialect lexis. After all, the materials collected with the WQ method exceeded his expectations, as they “have proven to be even fuller and more various than had been anticipated” (Cassidy 1953: 9, fn 7). Cassidy considered WQs a second-best option, employed for the sake of economic viability rather than linguistic merit:

If it were possible for our collectors to work primarily by personal interview, that would clearly be best. (Cassidy & Duckert 1953: 9)

The idea of WQs as a second-best method is apparent, and this air of opportunism has clung to WQs since the days of Gilliéron, whose work solidified the position of the fieldworker method in 20th-century English-language dialectology and beyond, as shown in Chapter 1. WQs are something one resorts to if one cannot help it, but not if there is a choice. Cassidy himself discarded the idea of a WQ as the empirical base for DARE and commenced an FI-based survey in a more limited number of US locations in the 1960s.

Once a preferred practice is established in a field, attention customarily turns to other issues, and FIs became the de facto method of choice. While there have not been many voices favouring WQs in English dialectology throughout the 20th century, dissenting voices can be found even in some of the earliest research. Angus McIntosh, for instance, offered a possible reason for the avoidance of WQs in the 1950s and 1960s:

[S]ome scholars feel strongly that everything, whatever else it may be [lexical, morphological or syntactic information], is phonetically important, and they therefore insist that all the material should be written down in phonetic script by an expert. (McIntosh 1961: 48)

McIntosh’s assessment certainly offers a reason for a more limited use of WQs: if phonetics is considered to be the most important subdiscipline, then all resources should support it. McIntosh, quite rightly, questioned the usefulness of this approach and certainly did not abide by it, as will be shown in the section on Scotland and Scottish English. Pushing this point further, he suggested that phonetic material gathered without an eye on the systemic structure of the sound system, or a particular aspect of it, might not be as useful to the phonetician as one might think, as one would likely need specific kinds of phonetic contrasts that are not elicited at random (ibid: 48). More recently, Labov, Ash and Boberg (2006: 3–7) have made precisely that point about fieldworker-collected linguistic atlas data in general, even data intended for phonetic study, in their introduction to the Atlas of North American English. They characterize the separation of linguistic geography, or language in physical space, from general linguistics and its theory-focussed work, thereby addressing the long-term severing of the two disciplines. They write that “dialect atlases were produced as works of reference without any immediate connection with the issues that concerned theoretical or descriptive linguistics” (ibid: 4). The idea of collecting data independently of a theory and as objectively as possible proved to be an honourable but elusive goal, as implicit assumptions are always part of any data collection process. Methodologically speaking, there no longer is a one-size-fits-all method, if there ever was one. It is therefore not surprising that in recent years WQs have seen a modest “comeback” in English linguistics and beyond. The possibility of collecting one’s own regional and social dialect data for a specific theoretical purpose in a relatively short timespan makes WQs a very attractive option that has more to offer than meets the eye.
A point that has frequently been made about FIs is that they are expensive and labour-intensive, with durations measured in decades rather than years. Over the course of a few decades, moreover, linguistic approaches change while data collection and the editing of atlas materials are still in progress. The Linguistic Atlas of the United States is a good case in point. As the average time from the start of data collection to publication has proven to be 25 years, and longer if one includes the planning stages, one can imagine that theoretical advances in the meantime render the published atlas materials less useful than had originally been hoped. If years go by between the start and end of fieldwork, the data is also subject to diachronic language change that cannot be isolated from other factors. While counterexamples exist, such as DARE’s US national fieldwork, which was carried out in only five years, the issues described are an inherent feature of most FI surveys. In this chapter, a different kind of history of linguistic geography is offered: a history that brings to the forefront elements of WQ or WQ-type surveys beyond Wenker’s initial method. While it is impossible to list all projects that feature a WQ component, emphasis will be given to the historical development of WQs as they relate to English linguistics, including important developments in other languages. The aim of the present chapter is not to be comprehensive, which would be beyond the present scope, but to offer a reasonable and representative view of how WQs have come to be used (or not) in English language studies. The chapter will begin with German-language precursors in dialectology, at least one of which used a correspondence method more than half a century before Georg Wenker. Dutch linguistics, where WQs have been used to a degree matched only in Germany, will then be looked at briefly. After World War II, WQs were also experimented with in the United States, which was the locus of English-language dialectology, but were not fully embraced, as Cassidy’s stance above suggests. While fieldwork continued to be interviewer-based in the USA, a lively research tradition centred around WQs developed in Canada starting in the late 1940s, one matched for some time thereafter only by Scotland in the English-speaking world. The chapter concludes with more recent approaches from Germany, the US and England, where, in the latter case, the WQ method arrived with some delay.

2.1

German-language pioneers

The early years of the 19th century saw the first systematic attempts at dialect data collection. In Switzerland, Franz Joseph Stadler, a pastor, published as early as 1806 the first volume of his Probe eines schweizerischen Idiotikons (roughly translated as Attempt of a Guide to Swiss Rural and Uneducated Speech), which, after a favourable reception from philologists in both Switzerland and Germany, was followed by volume 2 in 1812. The Idiotikon is best considered a dictionary of Swiss dialects that translates Swiss German varieties into Standard German. In 1819, though, Stadler published his Landessprachen der Schweiz, oder schweizerischen Dialektologie (Languages of Switzerland or Swiss Dialectology), which can be considered the first national approach to dialectology. Besides a dialect grammar of Swiss German, it includes transcriptions of the biblical Parable of the Prodigal Son in 18 Swiss German dialects covering all major regions, from Aargau to Zurich, as well as Raeto-Romansch, French and Italian renderings from various regions, thus comprising all four official languages of Switzerland.

Stadler corresponded with people conversant in the local dialects, most of them fellow pastors, and asked them for transcriptions of the Parable in the local dialect as a form of community reporting. Stadler can therefore be considered to have systematically applied a “correspondence method”. As may be expected from someone with a profound interest in local dialects, Stadler was well aware of the limitations of written language in rendering linguistic variation. He writes:

Diese Uebersetzungen, verfasst von Männern, die der örtlichen Sprachart wohl kundig sind, geben getreulich den Dialekt jedes Ortes, so fern sich der Ton und Laut desselben in leblosen Schriftzeichen ausdrücken lässt. (Stadler 1819: VII)

These transcriptions, penned by men who are familiar with local speech patterns, render faithfully the dialect of each location in as far as its sounds may be captured in lifeless characters. [translation SD]

It can be seen from the text above that even in the earliest applications, administrators of WQs were clearly aware of their limitations. Stadler does not seem to have offered any guidance on the rendering of sounds; instead, he relied on the educated status of his fellow pastors. Approaches that employed tailor-made transcription systems were tried as well (e.g. Ellis 1869–1889), though with limited success, as the burden of learning a new writing system was often too great. The systematicity of Stadler’s approach is remarkable for the early part of the 19th century. Sever Pop, in his comprehensive overview of projects in dialectology, considers Stadler “le fondateur de la dialectologie”, the founder of dialectology (1950: 763). However, the title of “founder of dialectology” seems to be bestowed more often on Johann Andreas Schmeller, who in 1821 produced the first dialect map based on scientific data collection (Harnisch 1992). Schmeller’s map, modestly entitled Kärtchen zur geographischen Uebersicht der Mundarten Baierns (Little Map for a Geographical Overview of the Bavarian Dialects), covered the German dialects of Bavaria. The Kärtchen was based on linguistic observation, which made it very different from previous cartographic attempts. For instance, Schmeller consistently differentiated between grapheme and phoneme, which was novel at the time. Scheuringer (2010: 159) expressly praises Schmeller’s map for its objectivity, as it uses strictly geographical naming practices without pejorative undertones, and assesses the work overall as “an outstanding harbinger” that predates the acknowledged beginnings of linguistic geography with Georg Wenker by more than half a century. The dialect geographic work by Schmeller, who also compiled the first dialect dictionary of Bavarian German, is of a pioneering nature.
At any rate, Schmeller was one of the earliest proponents of an unbiased study of (German) non-standard dialects, based on the conviction that any variety, standard or not, is linguistically as good as the next one (Reiffenstein 1981: 292). With that conviction, Schmeller anticipated descriptive approaches to language that would take hold only in the 20th century.

2.2

From Wenker’s Deutscher Sprachatlas to Mitzka’s Wortatlas

Most introductions to dialect geography date the beginning of the discipline to Georg Wenker’s 1876 doctoral dissertation. Wenker must indeed be given credit not just for systematically collecting phonetic variation, but also for consistently striving to present linguistic variation on maps, which is a key point in dialect geography (cartography). In his thesis, Wenker originally aimed to show the regional distribution of a phonetic isogloss in the vicinity of his Westphalian hometown of Düsseldorf. The province of Westphalia is linguistically interesting, as the isogloss bundles in the eastern and central German regions begin to “fan out” in the Rhineland, in a phenomenon known as the “Rhenish Fan”. Wenker’s method consisted of asking school teachers to transcribe 42 sentences into the local dialect. Because the results were far from confirming the established linguistic borders, Wenker expanded his project: first to all of Westphalia in 1879–1880, then by 1887 to the German Reich, and, until 1926, to all German-speaking areas in Europe. The “Wenker sentences” were adapted marginally as the postal survey was expanded, and eventually 40 sentences were used in most mailings. The sentences, given in Standard German, were transcribed into the local vernacular in whichever way the teachers saw fit. The lack of a coordinated transcription system was justifiably one of the biggest points of criticism, but Wenker and his successors considered it unreasonable to have teachers abide by a system that they would need to spend a great deal of time studying. A selection of the 40 Wenker sentences from the Handbook to the Deutscher Sprachatlas (Mitzka 1952) is shown below. A translation (in italics) is offered to give readers a sense of the kind of sentences that were used.

1. Im Winter fliegen die trockenen Blätter durch die Luft herum.
   In the wintertime dried leaves fly about in the air.
2. Es hört gleich auf zu schneien, dann wird das Wetter wieder besser.
   It will soon stop snowing, then the weather will be better.
3. Thu die Kohlen in den Ofen, daß die Milch bald an zu kochen fängt.
   Put the coal in the stove so that the milk may soon start to boil.
11. Ich schlage Dich gleich mit dem Kochlöffel um die Ohren, Du Affe!
   I will slap your ears with the cooking spoon, you monkey!
17. Geh, sei so gut und sag Deiner Schwester, sie sollte die Kleider für eure Mutter fertig nähen und mit der Bürste rein machen.
   Go and be so good as to tell your sister that she should finish sewing the clothes for your mother and clean them with a brush.
26. Hinter unserem Hause stehen drei schöne Apfelbäumchen mit rothen Aepfelchen.
   Behind our house, there are three beautiful little apple trees with little red apples.
32. Habt ihr kein Stückchen weiße Seife für mich auf meinem Tische gefunden?
   Have you not found a bar of white soap for me on my table?
38. Die Leute sind heute alle draußen auf dem Felde und mähen.
   All the people are outside today in the field mowing.
39. Geh nur, der braune Hund thut Dir nichts.
   Come along, the brown dog won’t hurt you.
40. Ich bin mit den Leuten da hinten über die Wiese ins Korn gefahren.
   I rode with the people over the meadows back there and into the corn field.
(Mitzka 1952: 13–14)

Some of the Wenker sentences seem to represent variants that were possibly not used throughout the German-speaking world. For example, sentence #2 is stylistically strange, and #3 has an atypical placement of the particle an, which is separated from the verb fängt and sounds archaic, even by the standards of the 1870s (Standard German: “bald zu kochen anfängt” or “bald anfängt zu kochen”). For a pioneering project, it would have been too much to ask for all such aspects, for instance the use of pan-Germanic forms and structures, to be controlled for. Most sentences reveal some of the social practices of the late 19th century, including aspects of farming and rural life of the day, e.g. #38, #40, domestic life, e.g. #3, #17, or disciplining children, e.g. #11. Wenker’s surveys produced the Sprachatlas des deutschen Reiches (Linguistic Atlas of the German Reich), in two hand-drawn, multi-coloured versions (Wenker 1887–1923), and eventually gave rise to the Deutscher Sprachatlas (German Linguistic Atlas) (Wrede, Martin & Mitzka 1927–1956). At the end of data collection in 1939, Wenker’s set of 40 sentences had been transcribed for almost 40,000 locations in German-speaking Europe (villages with a school), yielding about 50,000 transcriptions of the Wenker sentences. Wenker’s method relied heavily on regional school inspectors and primary school teachers (often “village teachers”). The former received the Wenker sentences and were instructed to look for a “suitable teacher” to transcribe the sentences into the local dialect. In his instructions to the school inspectors, Wenker addressed linguistic variation: he expressly requested more than one answer sheet from urban areas, presumably to mirror the more diverse linguistic variation found in larger towns and cities. In bi- and multilingual areas, he suggested translating the sentences into other local languages. He stressed the importance of every contributor proceeding with a “certain amount of love” (mit einer gewissen Liebe) for the work and asked, finally, for the names of the respondents. In return, he promised to send every contributor an account of the local language of a given region, should such a volume appear.

Wenker must be credited for his perseverance and his belief in the method, which was to be much criticized, above all by dialectologists of Romance languages. In German dialectology, a lively dispute erupted between Bremer (1895) and Wenker (2013 [1895]) over the reliability of the method. Two points of criticism have been voiced consistently since: first, that the teachers themselves were often not from the target area and, second, that no transcription system or guidance was provided. Each teacher was left to his or her own devices to render the sentences in the local dialect, a compromise Wenker made but Bremer criticized. The data was used for both phonetic and lexical study, the former of which has been criticized most effectively. Ferdinand Wrede, Wenker’s first successor, and his graduate students showed the general equivalence of the Sprachatlas data with data gathered through the FI method in a number of dissertations that appeared in the series Deutsche Dialektgeographie (1908–present). In the Handbook to the Sprachatlas, Walther Mitzka (1952), Wenker’s third successor as director of the Sprachatlas project, addressed a number of these issues, stressing that the Sprachatlas must be seen as a research tool needing further, more detailed study. The basic problem with the Sprachatlas data remains to this day: it is difficult, if not impossible, to gauge the kind of biases and errors that the lack of a transcription system and the employment of multiple transcribers might have introduced into the data.

Advantages

Despite these shortcomings, the Sprachatlas has at least two significant advantages, which are properties of WQs more generally. Its most convincing advantage is the very tight grid of locations across all of German-speaking Europe from which responses were gathered, a density that FI methods could not come close to matching. WQ data were collected from 49,363 locations in the German-speaking areas of Europe (Mitzka 1952: 12). By contrast, the Atlas linguistique de la France (ALF), which used FIs, gathered data from a mere 639 locations. As a result of this massive amount of data, the Sprachatlas faced unprecedented problems of data processing throughout the project’s history. The production of atlas maps proved so difficult that, initially, only two hand-drawn colour copies of a subset of variables were produced in 1889, which were then deposited in the libraries of Marburg and Berlin. The need for visual representation was felt to be the most pressing desideratum, so much so that when Wenker was hired by the German (Prussian) government in 1889, he was ordered not to spend any time interpreting the material but exclusively to draw dialect maps, which he did from 1889 until his death in 1911.


The second advantage of the Sprachatlas and of WQs in general is the time-efficiency with which data can be collected. For his doctoral thesis alone, Wenker collected responses from about 1500 locations, more than twice the number that formed the basis of ALF for all of France. Mitzka, in a study to be discussed shortly, managed to collect postal questionnaire responses from 15,000 locations within half a year. If some of the infelicities of Wenker’s methodology can be controlled for, WQs stand to be more valuable tools than they are reputed to be. In a way, Wenker’s method was just the beginning of WQs, though German dialectology continued to use Wenker’s approach for reasons of comparability with Wenker’s Sprachatlas des Deutschen Reiches (1887–1923) and its continuation as Deutscher Sprachatlas (Wrede, Martin & Mitzka 1927–1956). Newer projects followed suit. Since the Sprachatlas was primarily intended to document phonemic variation, vocabulary was only a by-product of Wenker’s study. Mitzka, therefore, commenced a new survey in 1939, just before the outbreak of World War II. His Deutscher Wortatlas (German Word Atlas) followed Wenker’s method minutely while applying it expressly to lexis (Mitzka & Schmidt 1951–80). Compiling a list of words that serve as lexical triggers, Mitzka (1938, 1939) used an inventory of 200 terms to elicit lexical variants, for example:

170. Tasse (Ober-, Unter-). – 171. Tauber (männl. Taube). – 172. Tomate. – 173. Topf (irdener). – 174. unfruchtbar (von der Kuh) – 175. Veilchen (Violet)

170. cup/saucer (upper part [cup], lower part [saucer]) – 171. cock (male dove) – 172. tomato – 173. pot (of clay) – 174. infertile (of a cow) – 175. [variant form of violet] (violet)

The standard German words offer the cues for variant words in the local dialect, which were once more polled via school inspectors and school teachers. This survey followed Wenker’s method in detail, without any improvements, on the grounds that Wenker’s and Mitzka’s results needed to be comparable. In a critique of the Wortatlas, Kurath (1958) pointed to the main issues, which reiterate the major points of dispute between the FI and the WQ methods, and posed the following crucial question: “Can numbers make up for the inferior quality of the records?” (Kurath 1958: 432). This question is perhaps too harsh in the context of a lexical atlas, especially since Kurath draws on examples from his native Austrian accent of Villach, Carinthia, to show the limitations of WQs, referring to minute phonetic detail that cannot be captured with an uncontrolled transcription method. What we see here is a second round, half a century later, of the methodological dispute between Wenker on the one hand and Gilliéron on the other: Mitzka vs. Kurath.



Chapter 2.  A history of written questionnaires in social dialectology

2.3 Dutch and Flemish WQs

The Dutch and Flemish had an early start in dialectology, with initial word lists sent out by the 1850s and, for the Flemish-Dutch dialects of Belgium, by the 1870s (Kruijsen & van der Sijs 2010: 182–183). The activities culminated in 1930 with the foundation of a government-funded institute for the study of dialects, today’s Meertens Instituut. The Instituut’s main objective was the production of a linguistic atlas of all Dutch-speaking areas, with WQs as the main data gathering tool. The first WQ was sent out in 1931, and since then the Instituut has sent out a questionnaire every year (Kruijsen & van der Sijs 2010: 187). Data collection via WQs for the Linguistic Atlas of the North and South Netherlands (Taalatlas van Noord- en Zuid-Nederland) began in 1924 and covered all Dutch-speaking areas until 1958. The methodological choice did not go uncriticized (Pop 1950: Vol. II gives the full picture); WQ studies continued to attract criticism. One early Dutch project organized by the Geographical Society was discontinued because of “criticism from the linguistic world” (Kruijsen & van der Sijs 2010: 183). The first installment of the Taalatlas was published in 1939 by a team of collaborators that included L. Grootaers, G. G. Kloeke and P. J. Meertens. Between 1981 and 1988, another survey continued the project “with the same design”, employing WQs (ibid: 188). The Dutch and Flemish have made ready use of the WQ method, often combining it with FIs. The idea that WQs offer a first glance at a linguistic geographical situation has a long pedigree in this tradition, and projects combining WQs and FIs have been carried out at least since the early 20th century. The Dictionnaire général de la langue wallonne (General Walloon Language Dictionary) is an early mixed-method project on the French spoken in Belgium.
Although a dialect dictionary, the General Walloon Language Dictionary employed, as early as 1906, a questionnaire that asked respondents about their lexical choices. Respondents were asked to state whether they used a given word and, if not, which equivalent words they knew and whether there were any pronunciation variants. Overall, 20% of responses were deemed “excellent” and 60% “generally good”, while only 20% were deemed “fragmentary, imprecise, insufficient or drafted in haste”; the basis for these ratings is not further specified (Pop 1952: 61). Even Walther Mitzka (1952: 58), the champion of Wenker-style WQs, seems to have voiced the unspoken consensus of the time, praising the “most refined methodology, the mixed method”, which combines the WQ and FI methods. In this climate, any study that did not use fieldworkers was confronted with criticism. WQs continue to be used in pilot studies in recent projects: in the Dutch context, the Syntactic Atlas of the Dutch Dialects [SAND = Syntactische Atlas van de Nederlandse Dialecten] (Barbiers et al. 2004) used WQs as a first survey tool, whose results were then harnessed in the main part of the study to identify specific areas and phenomena for oral interviews.


2.4 Early English language WQs in the US

Similar to Germany and the Netherlands, the first empirical projects in the US were WQ-based. The American Dialect Society, founded in 1889, aimed to study linguistic variation in the United States and Canada. In 1894, George Hempl prepared a written questionnaire to fill the void of linguistic maps of North American English (Hempl 1896a). Hempl’s questions were formed with input from the members of the Modern Language Association, and some of his variables have been in use ever since. Hempl was trained in Germany, and it was therefore no coincidence that he brought the WQ method to the USA. With the help of the American Dialect Society, he solicited answers from about 1600 respondents to his written questionnaire survey. Respondents were mostly from the US, but also from Canada and, to a more limited extent, from England, Scotland, Wales and Ireland. The goal was to collect some form of informal speech: “It cannot be too distinctly emphasized”, Hempl (1896a: 315) writes, “that what is wanted is a report of natural speech, without regard to what dictionaries and teachers say is ‘correct’”. Although his sample overrepresented the Northern US, it allowed for a first national and, to a limited degree, continental and trans-Atlantic perspective, and it yielded a fourfold linguistic division of the US into North, South, Midland and West (1896b: 438), based on the pronunciation of grease/greasy with either /s/ or /z/. He also correlated the reported frequencies of /s/ and /z/ with frequencies in the “Old Country”, Britain and Ireland. Hempl did not produce a map as such, but rather a schematic representation of dialect zones. Hempl’s WQ, which included approximately 100 questions in total, was presented as “trying the experiment of issuing a circular of questions regarding some particular usages” (Babbit & Mott 1896: 313). The variables look remarkably modern, as the examples below illustrate:



27. Would you say “I want up” = ‘I want to get up’?
42. Do you pronounce ‘where’ and ‘wear’, ‘whet’ and ‘wet’ alike?
45. In which (if any) of the following does s have the sound of z ‘the grease,’ ‘to grease,’ ‘greasy’?
49. Which of the following words usually have a as in ‘cat,’ or nearly that?
50. Do any have a sound resembling a in make?
50 ½. Do any have a sound resembling a in art?
51. Do any have a sound resembling a in ‘all’? – calm, psalm, yes ma’am, rather, haunt, drama, gape ‘yawn,’ gape ‘stare,’ almond, salmon, ant, aunt, shan’t, plant, command, dance, answer, sample, laugh, calf, half, staff, draft, path, past, nasty, fasten, ask, basket, glass, grasp?
63. What would you call a wooden vessel for carrying water, etc., a ‘pail’ or a ‘bucket’?
66. Do you say ‘frying pan’, ‘fry pan,’ ‘skillet,’ or ‘spider’?
67. If more than one, how do you differentiate?
(Hempl 1896b)




Some of these variables have been used in surveys ever since, e.g. q63 in LAUSC, q42 in Dialect Topography and q66/67 in NARVS. Hempl’s format, however, is no longer used today, as it caused some problems, for instance by including double-barrelled questions such as q42, which asks respondents to assess more than one issue in the same question. Hempl’s demographic questions, as shown in Illustration 2.1, also strike one as very modern. He polls, for instance, information related to the respondent’s formative years, as shown in question 2. Hempl’s chosen period of ages 8–18 will resurface in the Regionality Index calculation of Dialect Topography a century later (see Section 8.2.1).

Illustration 2.1  Social background questions in Hempl’s WQ (Hempl 1896a: 316)

There are technical details that would warrant a different approach today. For instance, by today’s standards it is unusual to elicit historical settlement information from respondents (question 2). Although Hempl’s survey looks decidedly sociolinguistic in nature, WQs were subsequently abandoned in American dialectology. In 1963, Atwood delivered a scathing assessment of Hempl’s questions as “awkward and unwieldy” and reasoned that “it is difficult to see how most of them could have produced useful answers” (1986 [1963]: 64). While not all questions were ideally worded, by far the biggest issue must have been the unstructured responses that were received. The only publication that appears to have come out of the survey is the paper on grease/greasy (1896b), which shows that Hempl’s data was useful and productive, offering the first profoundly data-driven account of North American dialects. After Hempl’s survey, half a century would pass before WQs were again given serious consideration in the USA. This time, a group of scholars involved in the Linguistic Atlas of the US and Canada undertook pioneering studies comparing FI data with WQ data. Raven I. McDavid, who would later succeed Hans Kurath as director of the LAUSC, assessed these studies in encouraging terms. McDavid stated, for instance, that “a carefully selected multiple-choice questionnaire of lexical items will provide a picture of dialect distributions differing little from that obtained by a preliminary survey in the field” and that material for a “survey of the folk lexicon” can indeed be successfully elicited via WQs (1953b: 569). Alva L. Davis’ (1948) doctoral thesis was the first American study to demonstrate the equivalence of WQ and FI data, and it is to this unpublished Ph.D. thesis that we now turn.

2.4.1 A new beginning: Alva L. Davis’ (1948) WQ survey

The ground-breaking study that reinstated the written questionnaire as a legitimate method in American dialectology was Alva L. Davis’ (1948) unpublished dissertation at the University of Michigan. While the theoretical focus of the study was to trace the dialectal lineages of the Great Lakes region back to the eastern US, the study is most remarkable today for its methodological innovations. Davis’ thesis departed from the established methodological framework by putting a questionnaire directly before the respondents, without the aid of an intermediary, and, importantly, in a highly structured format. Davis stresses these points:

The innovation in technique which is used in this study is that the informants themselves record their usage; by correspondence the field interview is simulated. (Davis 1948: 26)

In the American context, we know that Hempl had already done away with intermediaries. Davis, however, incorporated some of the principles of market research, which was only beginning to become popular at the time. Davis argues that the resulting questionnaire had to be “simple enough so that the linguistically naïve and relatively unschooled informant could indicate his own usage” (1948: 26), and he therefore offered a set of multiple-choice answers (called “checklists”) for each of the 100 lexical questions he put to his respondents. These design features are key in more recent studies, and it is quite easy to credit Davis – or perhaps more so Albert H. Marckwardt, who was one of his supervisors – with adapting WQs in a linguistically more reliable way than had been done previously. Davis’ thesis includes a list of six key points for the selection of items:

1. The item must be a matter of vocabulary.
2. Items should not have an extremely large number of variants.
3. The items should not be those affected by matters of “correctness.”
4. The items should not show a large range of meaning with specialization of certain ones.
5. The distribution of the item [had to be regionally varied].
6. The item must be clear to the informant. It must be possible to indicate briefly the nature of the intended response [which was tried in test runs].




Limiting the variables to lexis (#1) is a more conservative and cautious approach that can be extended, as will be shown in later chapters, most notably in Chapters 4 and 5. Points #2 and #4 appear to target concepts that are clear and easily polled in writing (which is point #6). Point #2 is intended to restrict variables to those that have only very few, perhaps two or three, major variants, which applies to most variables in traditional settings (but see Chapter 5 on highly mobile and multilingual settings, where this is often not the case). Many lexical variables have only a few high-frequency variants that account for some 90% of all tokens. The LAMSAS variable “chest of drawers”, for instance, has only three major variants among the responses collected from 1162 informants (multiple answers were allowed): bureau (1104 occurrences), followed by dresser (382) and chest of drawers (227). The fourth most frequent variant, chest, is already far behind with only 44 occurrences. By the 10th most frequent variant, we are down to 19 occurrences, and the 15th, stand, has a mere 7 occurrences (Kretzschmar 2009: 198–9). Plotted by rank, as in Figure 2.1, such variant frequencies produce an asymptotic hyperbolic curve (Kretzschmar 2009: 190–209). Asymptotic hyperbolic curves, “A-curves” for short, are typical of lexical variation, where very few variants account for the lion’s share of the variation while a long tail of minor variants makes up the rest. Most dialectologists would therefore seek to model the use of the major variants first.

Figure 2.1  LAMSAS data for variable (chest of drawers): occurrences by variant rank, ranks 1–37 (Kretzschmar 2009: 199)

Davis’ criterion #5 is a quality that can only be confirmed after the study, but a hunch for regional variation, or variation in general, certainly helps to produce interesting results. Points #3 and #6 are probably the most important in Davis’ list. Point #3 matters because it is paramount to avoid or mitigate the under-reporting of linguistic items that are negatively evaluated from a social standpoint, as heavily discussed linguistic forms may indeed be less likely to produce valid results (see Chapter 7). Point #6 is equally important, as it means that WQs need to be piloted before the questionnaire is finalized. Davis does not appear to have developed the method from scratch. He was in the fortunate position of building on a questionnaire that Albert Marckwardt, professor at the University of Michigan and one of his supervisors, had used in his teaching. This suggests that the method may have originated as a data collection exercise for students and was subsequently tested in Davis’ Ph.D. project. What is striking is the brevity of Davis’ instructions and questions: they keep all instructions and explanations to a minimum, yet are highly efficient. See the reproduction below:

DIRECTIONS:
1. Please put a circle around the word in each group which you ordinarily use.
2. IF you ordinarily use more than one word in a group, put a circle around EACH of the words you use.
3. DON’T put a circle around any word you don’t actually use, even though you may be familiar with it.
4. IF the word you ordinarily use is not listed in the group, please WRITE it in the space below the item.
5. THE MATERIAL IN CAPITALS IS EXPLANATORY ONLY.

EXAMPLE: TOWN OFFICER: selectman, trustee, councilman, commissioner

The directions he gave to his respondents confirm a number of principles that stand to this day, at least in some forms of WQs. For instance, Davis stresses that his only interest is in the respondent’s use of the terms, not in anything he or she may have heard (i.e. self-reporting, not community reporting; see Chapter 7). It is also important to note that respondents are expressly encouraged to offer variants that are not listed. One might criticize the practice of merely circling all variants used without indicating which is preferred, though intra-speaker variation (variation that is situation-dependent, not speaker-dependent) is perhaps the aspect least successfully studied with WQs. The following shows a sample of Davis’ 100 linguistic questions. The questionnaire was concise: sometimes only the variants were given (e.g. #31), and context was provided in upper-case letters only when needed (e.g. #1 or #2).

MARK ONLY WORDS YOU USE YOURSELF: ADD YOUR WORD IF IT ISN’T GIVEN

1. THE PART OF THE DAY BEFORE SUPPER: afternoon, evening



2. WE START WORK AT: sun-up, sunrise



3. WE QUIT WORK AT: sunset, sun-down



13. SHED FOR WOOD ETC., A SEPARATE BUILDING: wood shed, wood house, tool shed, tool house, shed



22. TO DO THE HOUSEWORK: clean up, redd up, ridd up, do up, tidy up, straighten up



31. corn bread, johnnycake, corn pone, pone bread



34. THICK, SOUR MILK: clabber, thick milk, clabbered milk, loppered milk, lobbered milk, bonny-clabber, bonny-clapper, curdled milk, sour milk, cruddled milk, crudded milk, cruddy milk






57. THE HORSE ON THE LEFT SIDE IN PLOWING OR HAULING: nigh horse, leader, near horse, lead horse, line horse, wheel horse, saddle horse, near-side horse



74. toad, toad-frog, hop-toad, warty-toad, grey frog



75. fish worm, angleworm, fishing worm, earthworm, dew worm, mud worm, bait worm, fish bait, angledog, rain worm, eaceworm, red-worm



87. FAMILY WORD FOR MOTHER: ma, maw, mom, mommer, mammy, mommy



90. OF IMMEDIATE FAMILY: my folks, my parents, my relations, my relation, my people, my family, my home-folks, my kinfolks, my kin, my relatives



98. HE THREW: a rock, a stone, a dornick, a donnick (AT THE DOG)

100. GREETING AT CHRISTMAS: Merry Christmas! Christmas Gift!

The last page of the questionnaire asked for the respondent’s social background information. The categories were: respondent’s name, age, education (highest grade reached), residence and length of residence in the community, birthplace, other places of residence (with dates), travel history, parents’ and grandparents’ birthplaces (as was standard in LAUSC), occupation and other languages spoken. The level of detail on the respondents’ social backgrounds is impressive, especially at a time when sociolinguistic approaches as such would not be developed for another two decades. Names are generally not elicited in today’s WQs, usually in order to increase anonymity, but in other respects Davis’ WQ features a number of innovations. The inclusion of the length of residence in each location, for instance, is highly advanced, as it allows the tracing of regional influences in a respondent’s biography; so is the inclusion of his or her travel history, which allows the gauging of language contact phenomena. The inclusion of all languages spoken was innovative for its cross-linguistic contact dimension, as we will see in Chapter 5. Davis collected a total of 233 responses, typically 4 or 5 responses in each of 59 locations in the states of Michigan, Illinois, Indiana and Ohio (the American “Great Lake States”). In order to assess the validity of the WQ, Davis classified each variant, question by question, as common, fairly common, or not occurring/rare. In the appendix, Davis provides the raw data from the postal questionnaire, which allows an assessment of his frequency categorization. For instance, for question #1 in the first location he reports afternoon for two of five respondents, while three answered with evening. Davis’ assessment of the frequency of afternoon is “fairly common”.
Clearly, such location-specific variation cannot be reproduced with FI data, where only one or two people per location were interviewed. Since Davis’ method of assessment is heuristic rather than statistical, his comparison between WQ and FI data has qualitative undertones. Dividing the data into three dialect regions (northern, central and southern) and comparing the frequencies of occurrence of each variant across those regions, Davis reaches the following verdict:


The correspondence technique, at least when limited to the distribution of folk [i.e. non-standard, SD] words, is shown to be a satisfactorily accurate method of investigation. Its results are comparable in most respects to that of field work, giving almost all the significant features of word distribution. In minor details, there are discrepancies; however, these may be no greater than they would be if the field work were repeated in the same communities. (Davis 1948: 96)

Davis’ result is stunning, since he seems to have demonstrated the equivalence of the WQ and the FI method. Because Davis closely mirrored the social profile of the LAUSC informants, his respondents’ average age was 72. We know that WQs face special problems when put before the elderly (as shown in greater detail in Section 7.2.1), which is reason to believe that his WQ would have worked even better with a more balanced age sample. Despite this very good equivalence, Davis finds it necessary to add a disclaimer: “This is not to say that the [correspondence] technique is preferable to the use of field records.” As Davis was a graduate student of the leading dialectologists of the day, one would not expect him to draw a more radical conclusion, but rather to err on the side of caution. Examined by Marckwardt, Fries and Kurath, the last of whom was the major proponent of the FI method in North America, Davis attributes to the WQ method, perhaps somewhat surprisingly, merely “general usefulness in collecting dialect information” (Davis 1948: 3).

2.4.2 Cassidy’s and Allen’s WQ studies

While Davis pushed the methodological limits of the WQ in English, he stopped short of declaring it equivalent to the FI method. He was, however, only one of several researchers to use WQs in this period. As early as 1947, Frederic G. Cassidy planned a preliminary lexical survey of the state of Wisconsin (Cassidy 1948). His Wisconsin English Language Survey (WELS) would serve as one type of data source for the Dictionary of American Regional English (DARE, Cassidy & Hall 1985–2013), which otherwise relies on FIs for its non-historical data. For WELS, Cassidy employed a very long questionnaire that used open-answer fields rather than multiple-choice answers. The WELS questionnaire was to be followed up by fieldworker visits, which makes WELS, strictly speaking, a mixed-method project. The questionnaire is published in Cassidy and Duckert (1953: 19–95). With a total of 1520 questions, it reads very much like a fieldwork questionnaire. In order not to overtax the respondents’ patience, it was sent out in batches of about 200 questions. Each question was printed on a paper filing card (a “paper slip”) on which the respondent wrote his or her answer directly. Because of its length alone, WELS could not be sent to just anyone, but only to specifically recruited, willing and interested individuals, which introduced a bias into the sample. Some further background on the purpose of WELS will help to contextualize Cassidy’s WQ. From the beginning, the WELS questionnaire was a test balloon for the fieldwork intended to support the dialect dictionary of the American Dialect Society. That dictionary, which was to become DARE, was first envisaged at the founding meeting of the American Dialect Society (ADS) in 1889 and was inspired by Joseph Wright’s English Dialect Dictionary (1898–1905). DARE, however, did not come to fruition until the 1960s and was not finished until 2013 (Cassidy and Hall 1985–2013), which illustrates both the detailed nature of the work and its funding challenges. It therefore stands to reason that WELS was never intended as a WQ in its own right. WELS was based on a semantic analysis of 60 years of work by several generations of American dialectologists, and its primary purpose was to serve as “the pilot study on which the DARE Questionnaire was based” (Cassidy 1985: xii). In hindsight, WELS seems to have been more about identifying semantic areas for the fruitful exploration of dialect lexis in FIs than about seriously testing the WQ method, since the method of choice, given Cassidy’s proximity to the LAUSC scholars, had always been the type of FI used in that project. This “test-balloon” rationale would also explain the unrealistically long questionnaire of WELS, which contrasts sharply with all other WQs employed in North American dialectology; Davis’ short 100-item questionnaire, for instance, offers quite different opportunities for data elicitation. From today’s perspective, we might say that WELS, being so closely modelled on the fieldworker questionnaire, was bound to highlight the advantages of the FI method. Against this background, it is no surprise that the FI method was chosen for DARE, with interviews carried out between 1964 and 1970 in 1002 communities across the United States (Cassidy 1985: xii). It is fully understandable why Cassidy may have felt he had to employ FIs rather than WQs.
This was, simply, the prevailing opinion of the day, which is also reflected in Harold Allen’s Linguistic Atlas of the Upper Midwest (LAUM), for which fieldwork began in 1947 and was completed in 1951. Most interestingly for the present volume, between 1951 and 1953 Allen embarked on a “supplementary” survey in the form of a postal questionnaire in the same region. The ratio of FI data to WQ data was about 1 : 5, similar to Davis’ ratio. Allen, however, treated the WQ merely as a “supplementary mail study”, a view that was first challenged by Chambers (1998a) and is discussed in full detail in Chapter 3. Anyone embarking on a major study of American dialect words in those days would have been best advised to employ the generally accepted method, the FI, and not the WQ, although, I would argue, the WQ would have been not only the more economical but also the more suitable method for that purpose. To summarize the state of the method in mid-century North America, the WQ had generally been regarded as a second-best option, useful for “supplementing” FI data. Atwood (1962) attested to a certain, if limited, popularity of WQs at the time:

More recently nearly all of those engaged in dialect study have made use of a check list [i.e. WQ in the day’s terminology] for one purpose or another – usually to supplement investigations of fieldworkers. (Atwood 1962: 30)


As a result of such partially positive assessments, WQs never completely ceased to be used. While “several surveys supplement[ed] the [LAUSC] interviews with vocabulary questionnaires distributed by mail” (McDavid 1986 [1980]: 118), their status was, without exception, seen as inferior to that of the preferred FI method, and their use was limited to the polling of vocabulary. On the latter point, Atwood justifies his own WQ study of lexis in the US Southeast with reference to

several investigators [who] have found that vocabulary items can be satisfactorily collected by less costly methods than the employment of highly trained fieldworkers.  (Atwood 1962: 29)

This is hardly the language used to describe a generally and universally accepted method. It is safe to say that Cassidy, like Allen, did not fully appreciate the WQ. As they were two of the leading figures of the day, their bias against the WQ was bound to curtail exploration of the method’s limits. In many ways, WQs were not viewed as primary data collection tools in the US until Bert Vaux’s (2004) internet survey in the early 2000s. There is, however, more to the complex and, as I would argue, inconsistent preference for the FI in mid-century American dialectology. Before turning their backs on the questionnaire method, for instance, Allen and Cassidy actively assisted colleagues in Scotland, who were at the time embarking on a linguistic atlas of their own. In the Scottish dialect atlas, WQs were put at the methodological centre, and it is with this project that the focus of activity, as far as English is concerned, shifted from the USA to Scotland.

2.5 Scotland and The Linguistic Atlas of Scotland

In the English language context, special consideration must be given to the Linguistic Atlas of Scotland (LAS). Collaborating with LAUSC in the planning stages, Scottish dialectologists introduced WQs into dialectology with a formerly unknown degree of systematicity. It is true that Alexander Ellis (1869–89) must be credited with using a postal questionnaire with a phonetic script that respondents were required to learn (Mather & Speitel 1975: 14), and Joseph Wright’s English Dialect Dictionary (Wright 1898–1905) would not have been possible without a correspondence element; neither project, however, made use of the more formalized WQs that are the focus of this book. Their efforts are nonetheless important precursors of the WQ method, much in the vein of Stadler’s earlier linguistic enterprise. For instance, Wright sent “at least 12,000 queries […] from the ‘Workshop’ connected with words contained in this [first] volume” (Wright 1898: v), which contributed a good deal of data to making the EDD the impressive tome it became. Beyond the British context, LAS is remarkable as the first major English language atlas to use a structured WQ as the main data collection tool. By structured I mean the use of a single questionnaire, as opposed to individually tailored, ad hoc questions about given lexical items; as such, the LAS questionnaire marks an important step in the use of the method in English linguistics in general. The WQ behind LAS is published in two slightly different versions, the first of which is separately accessible in McIntosh, Uldall and Jackson (1951). Although it was originally meant to comprise a Scots and a Gaelic section, only the Scots section was published in full. LAS, which comprises three volumes, is of special interest for the present purposes because of its lexical part (volumes 1 and 2), which is based entirely on WQ data. The phonetic section (volume 3) is based on fieldwork. It is important to point out, however, that by the time the American dialectologists assisted the Scottish atlas project, WQs had already established a tradition in Scotland. For instance, some LAS questions date back to preliminary WQ surveys from as early as 1936 under the direction of John Orr (Mather & Speitel 1975: 10), so one must credit the Scottish linguists with employing the method more centrally than had been the case in the USA at the time. In the introduction to volume 1, Mather and Speitel (1975) describe in great detail the history of the project and of the Linguistic Survey of Scotland, of which LAS is the result. Given the prevailing reservations about WQs, the authors, too, saw the need to justify the use of postal questionnaires, even for lexical study. In Mather and Speitel’s opinion, WQs

seem[] especially suitable in the collection of lexical data and [are] cheaper than using field-workers who might in any case devote themselves to more intricate investigations. (Mather & Speitel 1975: 10)

In total, the Scottish team collected answers from almost 1800 respondents throughout Scotland between 1952 and 1954. Unfortunately for the general credibility of the method, their WQ was not without problems, though these generally concerned rather superficial aspects of planning. For instance, some features were overlooked: the survey failed to poll the education level or occupation of the respondents (Macaulay 1977: 225) and neglected to include a date field that would have established a time line for the WQs, which were sent out over a few years (e.g. in 1952 or 1954). A further problem concerned the instructions for the respondents, a page of dense print on legal-sized paper that was too complex and confusing. As Mather and Speitel (1975: 13) admit, asking the respondents to distinguish between the usual words and the less common words they use overtaxed the speakers’ capabilities (see Section 7.3.2 on “Accessibility” of linguistic features). Despite these shortcomings, LAS convinced some critics that the WQ method was a reliable tool for large-scale projects. As one reviewer put it: “One of the surprising things about the results from the postal questionnaire is that, despite the shortcomings of the method and the sample, clear isoglosses do emerge” (Macaulay 1979: 227). LAS proved, and not only in the eyes of the variationist sociolinguist Ronald Macaulay, that WQs were indeed better than their reputation.


But even with such praise from practitioners such as Macaulay, who consciously foregrounded the spoken language and the method most conducive to it, the interview, the influence of LAS remained largely confined to Scotland and extended only minimally beyond it. One project that it did influence, however, was based in Wales. Partly inspired by LAS, Thomas’ (1973) Linguistic Geography of Wales is a lexical atlas of Welsh English and Welsh regional words that expanded the principles of LAS by devising a bilingual questionnaire for eliciting English and Welsh words whenever possible. Thomas aimed to document older and archaic uses in order to establish the original lexical boundaries of the Welsh language, unaffected by more recent borrowings from English. Using WQs was not established practice at the time, and Thomas (1973: 3) considered his approach “an experiment” for that reason. Thomas’ study is, however, one of the earlier approaches to multi- and bilingualism employing a WQ, an approach that will be foregrounded in Chapter 5 for its relevance to the highly diverse global settings in which English is used today.

2.6 WQs in Canada: A special case

WQs have a special status in the study of Canadian English (CanE), as they play an “unusually important role in the study” of the variety (Boberg 2013: 135). Walter S. Avis, a graduate of the University of Michigan, was the first to introduce the method to CanE, in a series of three articles on linguistic differences along the Canada-US border. Avis does not reveal many details about his WQ, or, as he calls it, “a questionnaire-type investigation of Ontario speech” (Avis 1954: 16). The questions themselves do not survive; all that is known methodologically is that the survey was “multiple-choice” (Avis 1956: 41), which means that he listed the answer choices, but it is not known whether he also requested additional variants. Most importantly, Avis did not limit his survey to lexical variables, but included phonemic, grammatical and usage questions. He seems to have been the first in North America to do so since Hempl, yet he apparently took great care to avoid some of the problems that afflicted Hempl’s earlier approach. Avis polled on at least three different occasions in 1949/50 in Kingston, Ontario, as part of his graduate work in Henry Alexander’s seminar (Avis 1973: 55). Some variables are listed with the total number of respondents, which ranges between 85 and 165. Avis used a smaller set of responses for the pronunciation variables, as he aimed to rule out inconsistent results from less educated speakers who were confused by the questions; this implies the collection of social background data that is otherwise not reported in his publications. However, there is no indication that Avis collected data on the US side of the Canada-US border, which means that for the comparative component he most likely used data from the unpublished Linguistic Atlas of the Northern and Central States and his own informal observations. In other words, while his Canadian data is sound, in some cases even profound by the standards of the day, his US data is less systematic, as it is based on only a handful of speakers, which is typical for FIs. It also means that his comparative dimension suffers somewhat from an incompatibility of data sets: many responses from Ontario, but only a few from the bordering US states.

2.6.1 Canadian beginnings

Avis published three articles, first on lexis (Avis 1954), then on grammar (Avis 1955) and finally on pronunciation (Avis 1956), thereby putting WQs to use for different linguistic levels and offering a first quantitative description of features of CanE. At the time, little was known about CanE in general: the published findings comprised only half a dozen short papers (see Dollinger 2008a: 21–23), led by Ayearst (1939) and Ahrend (1934), who dealt with phonetics, while next to nothing was known about lexical and morphosyntactic patterns. With this series of articles, Avis established the model for many rounds of WQ inquiries into CanE. Shortly after his own surveys, the method was applied to Montreal English by Hamilton (1958, 1964), who in 1957/1958 collected a total of 230 questionnaires in Montreal from respondents ranging from 17 to 50 years of age (1964: 456). As in Avis’ case, however, the opinion seems to have prevailed that questionnaires are put to best use in the hands of educated speakers. Hamilton, following Avis’ (1956: 43) practice for pronunciation, ruled out responses from individuals who had not at least graduated from high school, which reduced the number of respondents to 118 (1958: 71). It therefore stands to reason that these early Canadian studies underrepresent the variation associated with the more typical education levels of the time. Around the time of Hamilton’s work in Montreal, George Story and Patrick Drysdale were experimenting with questionnaire design in Newfoundland for surveys that would eventually inform the Dictionary of Newfoundland English. A first questionnaire was tested in 1958 (Kirwin 2012: 18), with several versions following.
The preface to the 1959 version of that questionnaire shows that the basic mode of inquiry was the FI method, yet the questionnaire was worded in WQ manner, with complete question and answer sets, to “make it possible for untrained students to record usage on the questionnaire-sheets”, as the preface states. However, “[f]or actual fieldwork” (Story 1959: 2), answers would be recorded on index files. While Story stressed the “experimental nature” of the questionnaire, it seems clear that its primary goal was to elicit data by way of an intermediary, whether university student or trained fieldworker; over several decades, well into the 1980s, it was developed into a more typical FI questionnaire (cf. Noseworthy 1974 for one such version). Some informants may have filled out the questionnaire directly, yet this was likely only a secondary function, as the FI method was given preference (cf. Kirwin 2012). It seems that the WQ method was only an afterthought in Newfoundland dialect projects, which is consistent with prevailing opinions at the time.


The next location for WQs in Canada was British Columbia. UBC Linguistics professor Robert J. Gregg conducted a survey of southern BC English, the results of which were not published but are found in two M.A. theses. Gregg (1973: 107) reports findings from only one region of “a fairly complete survey of all the B.C. districts bordering on the U.S.” that was polled from the late 1960s by a number of students using “experimental questionnaires”. Data survives from a mixed survey of the BC Kootenay region, whose 24 adult responses, collected by Howard Woods, comprised 15 interview responses and 9 postal questionnaire responses. Polson’s (1969) unpublished M.A. thesis from UBC’s Department of Linguistics, supervised by R. J. Gregg, is an important point of departure. Polson’s work is remarkable as it is based on a thorough investigation of the survey items; it is credited as the direct inspiration for the style of questions in Chambers’ Dialect Topography project of the 1990s (Chambers 1998a: 224) and, more generally, for the introduction of some variables and questions into the Canadian WQ tradition. Theoretically, Polson questions whether a questionnaire developed in the US, as was the case with LAUSC, is suitable for use in Canada (1969: 5–8). He considers it unlikely that such a questionnaire would be detailed enough to capture Canadian variation and proves the point for a number of variables with data from Vancouver (1964), Duncan on Vancouver Island (1966) and Hope at the eastern end of the Fraser Valley (1967), in addition to two postal questionnaires distributed province-wide with the help of BC regional newspapers. The case of the vowel pronunciation in vase will be discussed in some detail in Section 4.4.2 to illustrate Polson’s point.
While the exact details of the BC Survey are difficult to establish today, data collection for the Linguistic Atlas of BC, using Polson’s postal questionnaire, was carried out throughout the 1970s, after which approximately 800 postal questionnaires had been completed (de Wolf & Hasebe-Ludt 1993: 304). De Wolf and Hasebe-Ludt (1993) offer data for selected variables from the BC Survey, showing their distribution across eight regions of BC. The best account of the Survey of British Columbia is found in Stevenson (1976), an analysis of the phonological questions of the questionnaire in seven BC regions. Of the 518 responses received at the time, Stevenson used the answers from 368 respondents who were basically non-mobile (i.e. without any significant travel activity), monolingual, and born or at least raised in the immediate area (1976: 96, 19–20). Gregg’s rigid selection criteria will be critically discussed in Section 5.1. Comparisons with virtually all other available data (Gregg’s and Polson’s unpublished data, the SCE and Rodman’s SCE subsample; see below) and extensive appendices of feature frequencies make this M.A. thesis the magnum opus of BC dialectology to date. The thesis includes a small number of display maps with classifications overlaid, as shown in Figure 2.2, which are, to my knowledge, the only dialect maps of English in BC:




Figure 2.2  Dialect map (display and interpretive lines) for the pronunciation of route (Stevenson 1976: 149)

The area shown in Figure 2.2 is the part of British Columbia comprising the Okanagan Valley and Cariboo regions, with the Canada-US border shown as a horizontal line at the bottom. The pronunciations of the word route are charted as either [aʊ] or [u:]. It seems that the variant [ru:t] is most common in the area north of the curved isogloss, while [raʊt] has considerable frequency to the south of it, next to the Canada-US border. Stevenson appears to have been the first to produce dialect maps of BC based on linguistic evidence. Such maps remain extremely rare for the province to this day, and it may well be that Stevenson’s maps are still the only ones for the region.


2.6.2 Survey of Canadian English (1972)

After Avis, Hamilton and the early WQ activities in British Columbia, Matthew H. Scargill, aided by the Canadian Council of Teachers of English, coordinated a national linguistic survey, the Survey of Canadian English (SCE). In 1972, Scargill and his regional directors sent questionnaires to Canadian schools and received 15,575 usable responses from all ten Canadian provinces (provisions for the Northwest Territories were made, but results were not reported). The single-sheet, double-sided questionnaire was handed to Grade 9 students, who filled out the form themselves and then passed it on to their parents. This two-generation design allowed the tracing of language change from one generation to the next in what is now known as apparent time (see Section 6.1). The data was processed automatically within the constraints of early computing, which explains some of the questionnaire’s rather rigid answer options shown in Illustration 2.2:

Illustration 2.2  Part of the questionnaire of the 1972 Survey of Canadian English

The theoretical focus of the project and the general disregard for L2 varieties of English and their influence unfortunately resulted in the rejection of all responses by individuals who were not born in Canada. This left a total of 14,228 responses from presumably L1 speakers of English, as shown in Table 2.1. The questionnaire listed 104 linguistic questions, comprising 42 items on pronunciation, 27 on grammatical usage, 30 on vocabulary and 5 on spelling conventions, which are reported in Scargill and Warkentyne (1972) and, in a slightly different format, in Scargill (1974). Where the results differ, Scargill and Warkentyne (1972) should be given precedence. Some examples of the SCE polling of spelling variables (#11), grammatical variables (#16, for overt adverb marking), pronunciations (#33 and #110) and lexical variables (#34, #58, #74) are shown below:

11. Which spelling do you use? A. color B. colour C. either one

16. Which do you say? A. It’s really hot in here. B. It’s real hot in here. C. either one D. none

33. What do you call the letter Z? A. zee B. zed C. either way

34. What do you call the sweet hard substance that covers some cakes? A. frosting B. icing

58. What do you call the small square of paper with which you can wipe your fingers during a meal? A. serviette B. napkin C. either one

74. If talking about money with friends do you call it bread? A. yes B. no C. sometimes

110. How do you pronounce the first a of guarantee? A. like the a of cat B. like the a of car C. like the a of care

Table 2.1  Survey of Canadian English, overall results (Scargill & Warkentyne 1972: 50)

                    NL    PE    NS    NB    QC    ON    MB    SK    AB    BC   Total
Male parents       153   278   531   272   100   180   182   266   284   250    2490
Female parents     207   437   704   434   163   246   241   388   372   313    3495
Male students      263   458   463   457   540   379   368   421   306   315    3936
Female students    133   544   539   502   555   443   404   434   378   382    4307
Total              756  1717  2237  1665  1358  1248  1195  1509  1340  1260   14228
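The counts in Table 2.1 can be re-added to show how the two-generation sample is distributed across the provinces. The following Python sketch is an illustration only: it copies the cell values as transcribed in Table 2.1 (all variable and function names are hypothetical) and computes province totals, respondent-group totals and the share of students per province. Summed by province, the cells reproduce the printed Total row exactly. Summed by respondent group, they give 2496, 3505, 3970 and 4314, slightly more than the printed group totals, so the cells add up to 14,285 rather than the 14,228 reported; the sketch prints both figures and does not resolve the difference.

```python
"""Re-add the cell counts of Table 2.1 (SCE, Scargill & Warkentyne 1972: 50).

A minimal sketch: the numbers are copied from Table 2.1 as transcribed;
all names below are hypothetical illustrations, not part of the SCE.
"""

PROVINCES = ["NL", "PE", "NS", "NB", "QC", "ON", "MB", "SK", "AB", "BC"]

# Cell counts per respondent group, in the province order above.
COUNTS = {
    "Male parents":    [153, 278, 531, 272, 100, 180, 182, 266, 284, 250],
    "Female parents":  [207, 437, 704, 434, 163, 246, 241, 388, 372, 313],
    "Male students":   [263, 458, 463, 457, 540, 379, 368, 421, 306, 315],
    "Female students": [133, 544, 539, 502, 555, 443, 404, 434, 378, 382],
}

# Margins exactly as printed in Table 2.1.
PRINTED_PROVINCE_TOTALS = [756, 1717, 2237, 1665, 1358, 1248, 1195, 1509, 1340, 1260]
PRINTED_GROUP_TOTALS = {
    "Male parents": 2490,
    "Female parents": 3495,
    "Male students": 3936,
    "Female students": 4307,
}
PRINTED_GRAND_TOTAL = 14228


def province_totals(counts):
    """Sum the four respondent groups for each province (column sums)."""
    return [sum(column) for column in zip(*counts.values())]


def group_totals(counts):
    """Sum each respondent group across the ten provinces (row sums)."""
    return {group: sum(row) for group, row in counts.items()}


def student_share(counts):
    """Share of each province's respondents who belong to the student generation."""
    students = [m + f for m, f in zip(counts["Male students"], counts["Female students"])]
    return {p: s / t for p, s, t in zip(PROVINCES, students, province_totals(counts))}


if __name__ == "__main__":
    print("Province totals match printed Total row:",
          province_totals(COUNTS) == PRINTED_PROVINCE_TOTALS)
    for group, total in group_totals(COUNTS).items():
        print(f"{group}: cells sum to {total}, printed {PRINTED_GROUP_TOTALS[group]}")
    print("Grand total from cells:", sum(province_totals(COUNTS)),
          "printed:", PRINTED_GRAND_TOTAL)
    for province, share in student_share(COUNTS).items():
        print(f"{province}: {share:.0%} students")
```

Run as a script, it confirms the province totals and lists the per-province student shares. Quebec has the highest share of students (1095 of 1358, about 81%) and Nova Scotia the lowest (1002 of 2237, about 45%). Such imbalances between the generations are worth keeping in mind when SCE results are read in apparent time.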


In terms of its social background questions, however, the SCE offers very minimal data, as only three background questions were included for students and four for parents. The social questions were: type of respondent (male parent, female parent, male student, female student), province of birth and province of residence; parents were also asked about their education level. This last question, on level of education, is not reported in any published results, and the data is probably lost. While the SCE allowed a first national comparison, it clearly fell short in the social dimension. Had more background data been included, e.g. birth years instead of a mere classification as “male parent” or “female student”, and most especially the survey location (e.g. Moosehead, Saskatchewan) instead of only the province (e.g. just “Saskatchewan”), much more could have been accomplished with the data. Some voices were very harshly critical of the SCE, and in hindsight quite unreasonably so. Pringle, for instance, called it “a ‘national’ survey of questionable design” (1983: 100) and failed to appreciate its strengths or its exploratory nature. The data collation was carried out by IBM, and it seems likely that the software of these early computing days required each question to have two, three, four or five answer choices. For instance, the province of birth or residence had to be polled in three separate questions, each offering only some of the provinces plus the option “none of the above”, as shown in Illustration 2.2 (questions 2–4). The SCE nevertheless includes a number of interesting Canadian variables, and for many of them it offers the oldest empirical data for benchmark comparisons. It has been used in a number of follow-up studies (e.g. Polson 1969; Rodman 1974; and Stevenson 1976 on the BC sample; Warkentyne & Brett 1993 for the national sample).

2.6.3 Other Canadian WQs

The SCE marked a provisional end point for WQs in Canadian English, with the exception of a national survey on the spelling habits of Canadians (Ireland 1979) and some MA projects (e.g. Nylvek 1984). It was not until the early 1990s that WQs were again used in social dialectology projects. One region about which few detailed linguistic facts are known is the province of Saskatchewan. Here, Nylvek (1992, 1993a, 1993b) used a postal questionnaire to gather 661 responses from two urban and two rural regions in Saskatchewan (one east and one west of Regina), with roughly equal urban and rural distributions. Her results offer the most substantial look at English as used in Saskatchewan.6

6. Previous work consists almost entirely of Allen (1959), who used five field records (interviews) from his Linguistic Atlas of the Upper Midwest, one of which was located in Saskatchewan.




At about the same time, Zeller (1990, 1993) used a postal questionnaire to poll linguistic variables at 20 universities from Toronto, Canada, to Milwaukee, Wisconsin, in an effort to reveal signs of a geographic dialect continuum and the effect of the Canada-US border on pronunciation and lexical variables. From a small sample of only 107 questionnaires, she reported that the “postal questionnaire proved to be very reliable” compared with audio recordings from two locations and that “survey informants responded consistently throughout” (Zeller 1993: 180). She was able to show clear border effects for a number of variables, among them the Ontarian pronunciation of shone, the past tense of shine, as /ʃɑn/, which stops at the political border (see Section 6.5), and running shoes (Ontario) vs. tennis shoes (US Midwest). Zeller’s MA work is credited as the inspiration behind Chambers’ Dialect Topography of Canada project (1991–2004) (Chambers 1998a: 222) and must therefore be considered the impetus behind the continuation of WQs in CanE after a period of inactivity in the 1980s.

2.6.4 Dialect Topography of Canada (1991–2004)

The Dialect Topography of Canada began in 1991/92 as a regional project to document linguistic variables in the “Golden Horseshoe”, Canada’s most populous region. The area covers the western tip of Lake Ontario, roughly from Oshawa through Toronto and Hamilton to Niagara Falls at the Canada-US border. By 2004, its director, J. K. Chambers, had expanded the coverage of the project, with the help of regional directors, to seven Canadian regions, mostly in Eastern Canada, with Metro Vancouver as its only western location. The database also includes four American regions adjacent to the international border, which facilitates cross-border comparisons. The WQ is a relatively short questionnaire of 92 questions, comprising 11 background questions and 76 linguistic questions, some in multiple parts, which pertain to different linguistic levels: 30 pronunciation questions, 31 vocabulary questions (25 general words and 6 special ones), 7 morphology, 5 syntax and 4 usage questions (1994: 38). The name dialect topography rather than dialect geography is meant to indicate that WQs do not reveal the fine-grained details that interviews offer. Between 1991 and 2004, Chambers and his associates gathered data from about 6000 individuals in the Canadian and American regions. The Canadian regions, from east to west, are: New Brunswick, Quebec City (Anglophone population), Montreal (Anglophone population), Eastern Townships (Anglophone population), Ottawa Valley, Golden Horseshoe (Greater Toronto Region & surroundings) and Metro Vancouver. The American data comes from regions adjacent to the Canadian regions of New Brunswick, the Eastern Townships (southeastern Quebec), the Golden Horseshoe and Metro Vancouver. All data is available for viewing and downloading on the Dialect Topography website and will be used in Chapters 8 and 9 for practical work with WQ data. The Dialect Topography project has produced a large body of research that at times builds on established variables and at other times introduces new ones (e.g. Chambers 1994, 1998a, 1998b, 2007, 2008; Chambers & Heisler 1999; Chambers & Lapierre 2011; Easson 2000; Pi 2000; Boberg 2004a, 2004c; Berger 2005). In obvious contrast to the SCE 25 years before, it correlates detailed data on the social background of the respondents with their linguistic responses, and it can therefore be put to much better use than the SCE. Dialect Topography data will be found throughout this book, most notably in Chapters 4 and 6, which is why there is no need to introduce its variables here. The importance of the project for the study of regional and social dialects across this large country cannot be overstated. Dialect Topography showed in the 1990s what many English linguists at the time did not consider possible: that WQs produce sociolinguistically interesting data.

2.6.5 North American Regional Vocabulary Survey

In the spirit of previous WQs on CanE, Boberg’s North American Regional Vocabulary Survey (NARVS) developed from a student data-gathering exercise into a fully fledged survey. The survey was exclusively lexical: 53 vocabulary items were polled, in addition to detailed social background questions (Boberg 2005: 54–57). Data was collected between 1999 and 2007 (Boberg 2010: 168); of the total of 6000 responses gathered, 2400 were used in the analysis: 1900 from Canada and 500 from the United States. Among the 6000 respondents, those who had moved in their childhood were set aside in order not to complicate the data analysis unnecessarily (ibid.). In NARVS, respondents were given a brief definition of the desired word and were then offered a list of answer choices, which they were asked to circle on paper or, in later versions, to tick off on an internet form. Respondents were asked to select the variant “they would use most often in everyday speech” and to select more than one word “only if necessary”. In addition, they were offered the option of writing other variants into the margin (on paper) or into a text field (in the online version). What can be seen here is that every questionnaire involves some compromises: offering answer options likely decreases the range of variants one is able to elicit, but has other advantages, such as a smaller time commitment for respondents and faster collation of responses for the researcher. Boberg (2005: 25) acknowledges some disadvantages of the method, but prefers its ease of use (circling and ticking off is easier than writing words out) to open answer formats.




The variables used are of three types:
– “less-obsolete traditional variables” of CanE (Boberg 2005: 25), such as chesterfield/sofa, frosting/icing or frying pan/fry pan/skillet, rather than quawk ‘uncooked frozen meat or fish’ (DCHP-1 Online), aboiteau ‘dike or sluice gate’ (Maritimes) or splake ‘hybrid trout’, developed by Canadian biologists (DCHP-1 Online);
– “newer variables that seemed likely to reveal national differences between American and Canadian English”, such as candy bar/chocolate bar (the latter being Canadian) (Miller 1989: 30–1), soda/pop (the latter predominantly Canadian) (Miller 1989: 33), or the last letter of the alphabet, zee/zed (the latter Canadian) (Scargill & Warkentyne 1972);
– “newer variables that involved at least one variant known to be characteristic of a particular region of Canada, and particularly of Quebec”, e.g. soda/pop/soft drink (the latter being Quebec English), hoodie/kangaroo jacket/bunnyhug (the latter being a Saskatchewanism) and parking garage/parkade (the latter Western Canadian).

Variables with more than two variants were given preference, on the assumption that they would provide a better, more fine-grained diagnostic from a North American perspective, while variation in officially backed usage, i.e. terms used by government bodies or associations close to government, such as metro/subway or postal code/post code/zip code, was “taken to be less interesting than variation involving no such constraints”. It needs to be kept in mind that this focus disregards officially sanctioned terms, which are considered crucial in many pluricentric approaches and may have important identity-related functions (e.g. Ammon 1995: 162–168 for a list of such terms for Austrian German vs. German German and Swiss German; Muhr 1989: 80).
Finally, “variables distinguishing a single region with a unique local term that has no equivalent elsewhere were excluded” (ibid.), which rules out terms such as saltchuck ‘ocean’ (BC Coast, from Chinook Jargon) or saskie < Salish tsa’tsqi ‘edible shoots of a berry bush’ (an East Vancouver word, now likely extinct; Gregg 1995). The overall aim of NARVS was to “create a questionnaire featuring words that any native speaker of contemporary North American English would recognize as part of his or her daily vocabulary” (2005: 26), so that Boberg’s variables provide a more contemporary data set than the Dictionary of Canadianisms on Historical Principles (Avis et al. 1967, see DCHP-1 Online). Boberg’s set of variables was statistically analyzed and used to isolate lexical boundaries within Canada and between Canada and the USA. The findings were coherent and offer a new perspective on lexis in Canada, producing stronger and more tenable generalizations for CanE in relation to AmE than had previously been possible, on the basis of 16 Canadian regions (from “Vancouver-Victoria” to “Cape Breton” and “Newfoundland and Labrador”) and 7 northern American regions (such as “Western United States” or “New England”).


Using this comparison, Boberg calculates linguistic boundaries, or isoglosses, for 44 of his variables (see Section 6.5.2). New WQ data has also been collected from a cross-border perspective in British Columbia and Washington State. These samples are generally larger than the NARVS samples and comparable with the Dialect Topography data. Unlike Dialect Topography, however, the newer Vancouver data expressly includes L2 speakers of English who are long-term residents. The analysis shows that these newcomers uphold some traditional Canadian variables that would otherwise no longer be operational along the political border (Dollinger 2012a). Chapter 5 addresses the role of L2 speakers in greater detail, interpreting their linguistic behaviour as part of the linguistic pool in a given setting and as reflecting ongoing demographic changes in World Englishes.

2.7 Other, more recent applications

Recently, and fuelled in part by the ease of internet polling, WQs have seen a kind of revival in a number of contexts. One of the early and perhaps most widely publicized internet surveys was Bert Vaux’s Harvard Dialect Survey (Vaux 2004, access maps of results at ), which lists 122 questions on various linguistic levels. The responses to this survey, which is limited to the United States, generally number around an impressive 10,500 per question. On the downside, it appears that no residence data is used to categorize the responses, at least on the currently available maps, which is why some variables show an unusually wide geographical spread, such as question 54, on positive anymore (Chambers 2007):

He used to nap on the couch, but he sprawls out in that new lounge chair anymore.

Respondents who actively use anymore in this manner, which roughly corresponds to “nowadays”, would be expected to be located in the mid-Atlantic (PA, NY) and Midwestern US regions, yet the results show tokens in California and Washington State, for reasons that cannot be isolated. Another interesting and still ongoing project is the Atlas zur Deutschen Alltagssprache (ADA, Atlas of Colloquial Standard German, Elspaß & Möller 2003–). Initiated by Stephan Elspaß and Robert Möller, it has collected data in various “collection rounds” since 2003. The ADA uses the internet to collect a “middle-ground” variety of German that is neither local dialect nor formal standard language and is increasingly used. The aim is to document this “middle ground” of the German language, which might be termed the “Colloquial Standard”, as opposed to the “Formal Standard”, and WQs seem particularly well suited to this task. The questions confine themselves mostly to lexical and grammatical variables, though some pronunciation questions are also included. The data is then mapped online (results can be viewed at ) and a brief commentary is provided that contextualizes and interprets the data (see Elspaß 2005). ADA is methodologically different from most traditional English-language surveys: respondents are asked to select not the variants they themselves use most often, but the variant “most commonly heard” in a given location, i.e. the one that is deemed typical for the location. This kind of community reporting is certainly a more direct route to gauging regionally dominant variants, and it requires fewer responses from a given location, since it does not ask respondents what they themselves use and then abstract from their answers. This convenience, however, may come at the price of precision and of a conflation of attitudes and actual use. Today there are syntactic atlases that rely solely on WQs: the Syntactic Atlas of Swiss German (Syntaktischer Atlas der deutschen Schweiz), for instance, uses 118 questions to elicit information on 54 syntactic constructions, with elaborate questionnaires that combine a number of question types (completion questions, multiple-choice questions and translation questions, e.g. from Standard German to Swiss German). A total of about 3200 questionnaires was received from 383 locations in German-speaking Switzerland, and these form the basis of analysis (see, e.g. Bart et al. 2013; Glaser 2008). Given this activity in German and Dutch language circles, it is perhaps not surprising that WQs have lately come to be appreciated in a different light in English linguistics. Buchstaller et al. (2013) have tried an interesting approach, using WQ methodology in interview settings in Northern England and Scotland and thus mirroring methods employed in the syntactic atlases of Swiss German and Dutch.
They elicited judgements of dialect constructions in various contexts by asking respondents which grammatical constructions with non-standard verbal -s (i.e. the Northern Subject Rule) occur in their community and how often:

1: This type of sentence would never be used here – it seems very odd.
2: This type of sentence is not very common here but it doesn’t seem too odd.
3: I have heard this type of sentence locally but it’s not that common.
4: People around here use this type of sentence a lot.
(Buchstaller et al. 2013: 95)

By requesting indirect grammaticality judgements on community behaviour rather than personal behaviour, the authors hope to reduce prescriptive pressures on the reporting of socially stigmatized variants. For another feature of the region, the change from /t/ to /r/ in contexts such as shut up → shur up, a different question was asked. Here respondents were asked directly about their personal use:

1: I would never pronounce this word with an r
2: I can sometimes pronounce this word with an r, but I wouldn’t do it very often
3: It would be normal for me to pronounce this word with an r
(Buchstaller et al. 2013: 96)

The T-to-R question was handed out after the respondent had been involved in a conversation about the feature and its use in the region. While the T-to-R method is, strictly speaking, not a WQ but a mixed form, since a conversation precedes the questionnaire, it represents an interesting avenue of inquiry for low-frequency features that we will revisit in Sections 7.3.3 and 7.3.5. There has also been a growing number of projects that appeal to the wider public, usually less selectively than pure research projects. In an ingenious and inspiring attempt to combine media work with data gathering, the BBC Voices project collected data in the mid-2000s, with input from Clive Upton, past director of the SED project at Leeds University, using WQs in a combination of online and paper questionnaires. The material led to a sizeable database of regional Englishes throughout the UK. The digitally collected data can be viewed on the BBC Voices webpage, , while the paper-based WQs still await analysis. Such projects have the advantage that they not only collect valuable information (depending on the degree to which dialectologists are allowed to set the parameters), but also bring linguistics closer to speakers, who are often eager to learn about their varieties. It was therefore quite a logical step for the British Library in London to encourage its patrons, in its 2010 Evolving English exhibition, to “either submit a word or phrase they felt was somehow ‘special’ in their variety of English (the ‘WordBank’) or recite a reading passage designed to capture their accent (the ‘VoiceBank’)” (Robinson 2015). This was done in specially set-up recording booths in the London library, its regional branches across England, and at one location in the London subway system.
While linguistically not as rigorously designed as BBC Voices, the project nevertheless produced 15,000 entries, some of them of considerable merit, that form an interesting audio archive. It tapped into the prevailing public interest in dialectological and variationist research and combined outreach with feedback and data collection.



2.8  Chapter conclusion

This brief history of important works employing WQs has shown that WQs have come a long way since Stadler’s and Wenker’s methods. Mitzka, as we have seen, held on to Wenker’s model more than half a century later in order to ensure maximum comparability of data sets. Others, such as Dialect Topography, expanded the social range of documentation and left behind the methodological straitjacket of SCE. The difficult choice between the comparability and the reliability of data sets will be taken up again in the next chapter. Today, WQs are used not only for traditional questions of linguistic geography, but also for testing linguistic theories. The WQ method, while used continuously in some contexts, such as in the Netherlands and in Flemish Belgium, has had a stronghold in Canada, where it has generated a great many insights over the decades. It would be fair to say that we are today at a methodological crossroads, where several options are available and the rigid methodological disputes of earlier decades are, indeed, a thing of the past. WQs seem to be regaining their place not just in dialectology but also in sociolinguistics. The slow development and acceptance of WQs in English linguistics seems to have been influenced by the cautious assessments of the associates of the Linguistic Atlas projects: Fred Cassidy, Alva Davis and, as we will see in detail in the next chapter, Harold Allen did not weigh the merits of the method in a balanced way. The early sociolinguists, with their focus on collecting spoken language in novel ways, were unlikely to see the potential validity of an older method that relied on pen and paper to collect data. They were, clearly and rightly, focussing on speech, and in this context WQs must have seemed neither particularly promising nor exciting. And even Walter S. Avis, the Canadian pioneer of WQs, said little about his WQ method, as if he knew that doing so might open a Pandora’s box of criticism. Instead, he focused on what the data would show about Canadian English, a variety that had not been empirically studied at that point. Some researchers saw the potential of WQs more clearly than others. Raven I. McDavid, who was Kurath’s successor at the LAUSC project and highly experienced in the FI method, was fairly convinced, apparently much more so than other members of the LAUSC group, that WQs were a useful tool. He suggested that WQs may have a role to play even in phonology: “lay observers”, he wrote, “are competent to record the distribution of the phonemes if not their phonetic quality” (McDavid 1953b: 569). In the next chapter, we will revisit data from 1950s American dialectology to show how McDavid might have arrived at this assessment, and how others might have failed to see the potential of WQs.

Chapter 3

A comparison of data collection methodologies

Chapter 2 introduced a number of studies that employed WQs. In English dialectology, however, the received wisdom to avoid WQs whenever possible did not change substantially, except in the Canadian and Scottish contexts. There have been debates about this question (WQs vs. FIs), starting with Bremer’s (1895) critique of Wenker’s approach and continuing into the mid-20th century in the exchanges between Mitzka and Kurath (1958). Both WQs and FIs were discussed in Chapter 1 as representing elicited linguistic behaviour rather than natural observation. WQ and FI alike are responses to linguistic tasks and “thus arise from a situation that is more or less artificial, or experimental” (Seiler 2010: 512). Whether a fieldworker reports responses to elicitation tasks or the respondents themselves do so is a rather small issue in the grand scheme of empirical work and data types. The question is how to elicit data in the least interfering way. In most cases, a combination of methodologies would probably lead to the most reliable results, as the advantages and disadvantages of each method would become apparent and, ideally, be balanced by another method. As Seiler writes in an insightful article: “The question of which is the best elicitation method is perhaps misleading” (2010: 512). Instead, it is important to be aware of the advantages and disadvantages of each method and to use them with a clear rationale in mind. The present chapter takes a broader look at the WQ vs. FI debate. WQs will be compared, first, with identical fieldwork interviews and, second, with sociolinguistic interview data. The latter method is generally classified as elicited behaviour coming close to naturally occurring speech, though it has disadvantages of its own (perhaps the most theoretically vexing issue is the circular definition of the vernacular, see e.g. Becker 2013).
The most pervasive method for the study, description and theorizing of language on the basis of observation is corpus linguistics. Corpus linguistics has developed quickly since its humble beginnings in the pre-computer era, but it was not until personal computers became widely available that the method gained critical mass. Today, corpus linguistic methods are perhaps the most widely used methods in English linguistics in general, which warrants a principled comparison with WQs.

3.1  Corpus linguistics and WQs: A methodological comparison

Unlike historical linguistics (language and its development), sociolinguistics (language and society) or forensic linguistics (language and law enforcement), corpus linguistics is not a branch of linguistics but a method: one that uses elaborate collections of texts, both written and spoken, to produce data on linguistic items in their discourse contexts. To give a simple example, if one wants to know how cool is used in a given English variety, one can retrieve examples of cool in context for analysis, as shown in the extract of a KWIC (keyword-in-context) concordance for historical CanE in Table 3.1:

Table 3.1  Lexical item cool in the Corpus of early Ontario English (CONTE), 1776–1849 (Dollinger 2006)

KWIC lists are part and parcel of corpus linguistics, but they represent only a first step, the raw data, which then needs to be classified and analyzed. As Table 3.1 shows, cool is used only three times in CONTE to refer to temperature (examples 2 and 3, from newspapers of period 1, 1776–99) and once, in a diary, in the sentence Mr Neave sent Mary her trace patterns & a short cool note, he … in a second meaning, “not affected by passion or emotion”. The OED-3 (meaning 2a) shows that this second meaning can already be found in Old English (Beowulf), Middle English (Chaucer) and Shakespeare, so it is no surprise to find it in mid-19th century Canada. As Lindquist points out, corpus linguistics also generally carries with itself “a certain outlook on language” that is best described as a conviction that the rules of language are usage-based and that changes occur when speakers [(or writers, texters, video callers), SD] use language to communicate with each other. The argument is that if you are interested in the workings of a particular language, like English, it is a good idea to study English in use. (Lindquist 2009: 1)

This means that if the use of a particular linguistic item changes, the rules that govern that item change as well. These rule changes need to be reflected in reference grammars, dictionaries and other books on language. Table 3.2 replicates the search for the lexical item cool in a contemporary corpus of Canadian English, the Strathy Corpus. The Strathy Corpus is a balanced, stratified corpus of 50 million words. “Balanced, stratified” means that it aims to mirror CanE in a balanced and representative fashion across a well-selected number of text types (or strata). The Strathy Corpus includes 1664 occurrences of cool; 10 of them (items 1654 to 1663), all from spoken language, are listed in Table 3.2. One can immediately see that cool is used in different contexts in present-day Canada than in early Canada.

Table 3.2  Cool in the Strathy Corpus of Canadian English (1985–2011) (retrieved from )

Cool has acquired new lexical meanings, such as “sophisticated, stylish; or admirable or excellent”, or just “ok, alright”, which the OED-3 documents as early as 1884, but for the most part only for the 20th century. The meaning of “ok” is found only after WWII (OED-3, s.v. “cool” (8a), (8b) & (8c)). While some of these uses were at first limited to African-American English, cool is today used widely and generally. In keeping with the “unspoken mindset” that Lindquist refers to above, such corpus results (that is, if analyzed in full rather than through just a few examples) would need to be reflected in grammars and dictionaries. As such, corpus linguists have a deeply descriptivist approach to language, and they have powerful tools to back up their findings: their corpus data. So why don’t we use corpora in all instances, including in social dialectology? Several problems rule out corpora for a number of questions and approaches.
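The KWIC format behind Tables 3.1 and 3.2 is simple to reproduce. The following Python sketch is a minimal illustration only, not the software used for CONTE or the Strathy Corpus: the function name `kwic`, the 30-character context window and the sample text are assumptions made here. Only the Neave sentence comes from the CONTE example above; the other two sentences are invented.

```python
import re


def kwic(text, keyword, width=30):
    """Hypothetical helper: return keyword-in-context lines for every
    whole-word, case-insensitive match of keyword.

    Each line shows up to `width` characters of left context (right-aligned),
    the matched keyword in brackets, and up to `width` characters of right context.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


sample = (
    "The evening was cool and clear. "  # invented
    "Mr Neave sent Mary her trace patterns & a short cool note. "  # from the CONTE example
    "That concert was really cool, eh?"  # invented
)

for line in kwic(sample, "cool"):
    print(line)
```

Because the search is restricted to whole words, forms such as cooler or coolly are not retrieved; a real study would need separate searches or lemmatization for them, and, as noted above, the resulting lines are only raw data that still need to be classified by meaning.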

3.1.1  Limited linguistic contexts: Problem #1

In theory, corpus linguistics offers a superior form of data: data that is authentic and derived from real language use in authentic communicative contexts, where no observer could have influenced or interfered with the data. In the study of language in space, corpora have not figured as prominently, though advances have lately been made. For English, FRED, the Freiburg English Dialect Project and Corpus (e.g. Kortmann & Wagner 2005), uses transcriptions of British audio recordings that were originally collected for oral history projects. This data, showing a wide array of variation in grammatical features in England, can be used to some extent to locate non-standard forms in geographical space. There are, however, profound practical problems with the use of corpora in dialectology. As corpus collection is a time-consuming task, especially for spoken language, one cannot hope to have a fully fledged corpus for every village one would like linguistic information on, which severely limits the usefulness of corpus linguistics. What can be done with existing corpora is to compare representative corpora of national varieties to arrive at generalizations, but these can only speak to overall differences and can answer questions about regional variation within these national varieties only to a limited degree (see, e.g., Dollinger 2008c; Rohdenburg & Schlüter 2009). In this context, an irreconcilable tension between the comparability and the reliability (authenticity) of the data becomes apparent (Seiler 2010: 512–3). While one wishes to obtain spontaneous speech for dialect study, such reliable (because authentic) data will be limited for a given linguistic variable. Buchstaller et al. (2013: 93 fn10) found that for their study on the Northern Subject Rule, even a large corpus of Newcastle English consisting of sociolinguistic interviews with 24 speakers did not produce nearly enough instances of verbal -s for the phenomenon to be studied. Under the Northern Subject Rule (NSR), verbal -s occurs in non-standard ways in Scottish and Northern English dialects, influenced by a number of factors, including the type of subject (pronoun or not) and the proximity of the subject to the verb. This yields the following distribution in the present tense:

(3.1) a. she like
      b. Mary likes
      c. they like
      d. John and Mary likes
      e. they like and never speaks

In (3.1a) the immediately adjacent pronoun blocks the -s, while in (3.1b) the full noun phrase subject produces it. (3.1c) and (3.1d) show the same contrast for plural subjects, where the -s in (3.1d) occurs in a non-standard context. In (3.1e) the immediate proximity of the pronoun blocks the -s on like, but the verb further away from the pronoun, speaks, takes it. Such linguistic conditioning requires a good deal of data across many contexts, and standard-size spoken corpora today often do not have enough material. The 74 instances of verbal -s in the Newcastle corpus, many of which do not qualify (such as uses of be), were not enough to make any generalizations. Low frequencies in the required contexts are one major reason why corpus data cannot be used in many cases. The (reliable) corpus data is therefore of little help. In order to assess the phenomenon, a different route must be taken: one needs to elicit the required behaviour. As elicitation is a somewhat artificial task, the reliability of elicited data will be somewhat compromised. While elicitation and observation are related, it is difficult, if not impossible, to establish the precise nature of their relationship. A researcher can set up an elicitation routine that can be employed with all respondents or interviewees, which ensures maximum comparability of the data. In this way, tokens of a phenomenon are obtained in all desired contexts from a good sample of speakers, yet the reliability of the data is bound to suffer somewhat (see Chapter 1, Illustration 1.1 on the theoretical distinction between observation and corpus data).
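The conditioning in (3.1) can be made explicit as a toy rule. This Python sketch is a simplification made here for illustration, not Buchstaller et al.’s analysis: it encodes only the two factors named above (pronoun vs. other subject, and adjacency to the verb) through the hypothetical functions `takes_s` and `inflect`, and it ignores every other factor the NSR involves. Even these two binary factors define four contexts, each of which a corpus would have to attest in sufficient numbers.

```python
# Subject pronouns recognized by this toy rule (an assumption for illustration).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def takes_s(subject, adjacent):
    """Toy NSR: the verb takes -s unless its subject is a pronoun
    standing immediately before it, as in examples (3.1a-e)."""
    is_pronoun = subject.lower() in PRONOUNS
    return not (is_pronoun and adjacent)


def inflect(verb, subject, adjacent=True):
    """Attach -s to a regular verb stem according to the toy rule."""
    return verb + "s" if takes_s(subject, adjacent) else verb


for subject in ["she", "Mary", "they", "John and Mary"]:
    print(subject, inflect("like", subject))

# (3.1e): the second verb is separated from the pronoun, so it takes -s.
print("they", inflect("like", "they"), "and never", inflect("speak", "they", adjacent=False))
```

Run as is, the script prints the five patterns of (3.1) in order: she like, Mary likes, they like, John and Mary likes, they like and never speaks.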



3.1.2  Low-frequency items: Problem #2

Another weak point of many data collection techniques, one that applies to all observation methods including corpus linguistics, concerns items of low or even medium frequency. We can make the issue transparent by looking at data from the British National Corpus (BNC), a 100-million-word collection of written (90%) and spoken (10%) British English, which is a standard corpus in English linguistics. While the precise figures are from the BNC, the principles apply to all text collections of human language.

Table 3.3  50 most frequent words in BNC (Lindquist 2009: 28–9)

1. the          11. I           21. are         31. which       41. their
2. of           12. for         22. not         32. or          42. has
3. and          13. you         23. this        33. we          43. would
4. a            14. he          24. but         34. an          44. what
5. in           15. be          25. ’s (poss.)  35. n’t         45. will
6. to (inf.)    16. with        26. they        36. ’s (verb)   46. there
7. it           17. on          27. his         37. were        47. if
8. is           18. that        28. from        38. that        48. can
9. was          19. by          29. had         39. been        49. all
10. to (prep.)  20. at          30. she         40. have        50. her

The BNC’s 50 most frequent words are shown in Table 3.3. Inspection of the list shows that this 100-million-word corpus – and in fact any corpus of English – most frequently includes determiners (the – occurring just over 6 million times), prepositions (of – almost 3 million times and thus at half the frequency of the), conjunctions (and – about 2.6 million times), auxiliary verb forms (is – just under 1 million), pronouns (I – 880,000 times), modal verbs (would, ranked 43rd, with just over 255,000 occurrences) and the personal possessive pronoun her (ranked 50th with 218,000 occurrences, while his, in 27th position, occurs more frequently). This is a typical pattern for language corpora: closed-class words, also called function words, dominate the word counts. This means that if one wishes to study grammatical items of high frequency, corpora will be useful. Other kinds of analyses may not be possible with corpus linguistic methods, as over half of the distinct word forms (types) in the BNC occur only once. Moreover, it goes without saying that many words do not occur at all in the BNC, or even in bigger corpora. Table 3.4 lists the 20 most frequent verbs next to the 20 most frequent open class lexical items (noun lemmas):
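The skewed distribution described above, with a few types accounting for many tokens and many types occurring only once, can be observed even in a tiny text. The following Python sketch uses a made-up sample sentence rather than BNC data; `frequency_profile` is a hypothetical helper that ranks words by frequency and computes the share of types occurring only once (hapax legomena).

```python
from collections import Counter


def frequency_profile(tokens):
    """Hypothetical helper: return (words ranked by frequency,
    share of types that occur exactly once)."""
    counts = Counter(tokens)
    ranked = counts.most_common()  # ties keep the order of first occurrence
    hapaxes = [word for word, n in counts.items() if n == 1]
    return ranked, len(hapaxes) / len(counts)


# Made-up sample text, not BNC data.
text = ("the cat sat on the mat and the dog sat on the chesterfield "
        "while the cat watched the dog")
tokens = text.split()
ranked, hapax_share = frequency_profile(tokens)

print(ranked[:3])
print(f"{len(tokens)} tokens, {len(ranked)} types, hapax share {hapax_share:.2f}")
```

In this 19-token toy text, the function word the occurs six times, while half of the ten types (including chesterfield) occur only once. Scaled up, this is why lexical items quickly run out of tokens even in large corpora.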

Table 3.4  20 most frequent verbs (aux. in italics) and noun lemmas in BNC (Lindquist 2009: 31–3)

Rank  Verb     N per million words   Noun lemma    N per million words
 1.   be       42,277                time          1,833
 2.   have     13,655                year          1,639
 3.   do       5,594                 people        1,256
 4.   will     3,357                 way           1,108
 5.   say      3,344                 man           1,003
 6.   would    2,904                 day           940
 7.   can      2,672                 thing         776
 8.   get      2,210                 child         710
 9.   make     2,165                 Mr            673
10.   go       2,078                 government    670
11.   see      1,920                 work          653
12.   know     1,882                 life          645
13.   take     1,797                 woman         631
14.   could    1,683                 system        619
15.   think    1,520                 case          613
16.   come     1,512                 part          612
17.   give     1,284                 group         607
18.   look     1,151                 number        606
19.   may      1,135                 world         600
20.   should   1,112                 house         598

The left-hand columns of Table 3.4 show that the most frequent verbs are very general: say, make, get and see can all be used in many contexts and functions. Many of them, six of the top ten, are auxiliary verbs, which are italicized. The measurement here is in tokens per million words of running text. Approximately 42,000 occurrences (tokens) of the verb to be (verb lemma or type be) can be expected per million words in English, which is a far cry from the roughly 1,100 occurrences of should (#20) in British English, though these distributions vary somewhat between varieties. The first four of these verbs are “closed class” function words that belong to the grammatical system. Taking the example of the Northern Subject Rule with its constraints, one can see that corpus linguistics works best with very frequent phenomena, as they are more likely to occur in all the linguistic contexts needed. The noun lemmas (right-hand columns of the table), by contrast, are much less frequent than the verbs: time, with just over 1,800 occurrences per million words, is the most frequent open class noun in the BNC. House, in 20th position, has about one third as many instances, which contrasts strikingly with the rapidly decreasing frequencies of the verbs. This means that common nouns (noun lemmas) are much less frequent than general verbs. It is thus easier to study the verb system with corpus linguistics, especially the auxiliary verbs with their much higher frequencies, than lexical nouns: bad news for lexicographers. All auxiliaries are in italics in Table 3.4 and, as one can see, many of the most frequent verbs are auxiliary verbs. It is therefore no coincidence that auxiliary verbs are among the most studied phenomena: not only do they undergo change, they are also frequent enough to be found in moderately sized and even small corpora. As sociolinguistic interviews are also searchable archives of speech, the same can be said of them: text collections always represent only a portion of a person’s or a language’s inventory. Since interview databases are generally much smaller than written corpora, because of the effort needed to transcribe the audio recordings, the limitations for lexical, open class searches are more pronounced. The bigger the corpus, the less significant the problem becomes, but it never goes away completely. It is a structural feature of human language that very few types account for many tokens (occurrences) in a corpus, which presents a profound problem for items of lower or even medium frequency. It would be next to impossible to carry out lexical corpus studies on chesterfield vs. couch, or on a particularly narrow meaning of the phrasal verb take up, both variables that will be presented in Chapter 4. The internet is the largest corpus available, yet it is also the messiest one (e.g. Mair 2006). In many ways, one should speak of the internet not as a corpus but as an unstructured and unbalanced database. It can also be unreliable, as search engines produce different overall counts on different days and even on the same day at different times, against which one needs to build in safeguards. While the internet is increasingly used as a data source today, it can only be used with a number of mitigation procedures (see, e.g. Grieve, Asnaghi & Ruette 2013; Dollinger 2011b; Dollinger 2015).
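The per-million figures in Table 3.4 make the effect of corpus size easy to estimate. The Python sketch below takes four rates from Table 3.4 to compare how steeply frequencies fall from rank 1 to rank 20 among verbs and nouns, and to estimate expected raw counts in a smaller corpus. The 100,000-word corpus size and the rate of 1 per million for a “rare variant” are hypothetical, and the estimate assumes that an item occurs at the same rate throughout, which real text types do not guarantee.

```python
# Rates per million words, taken from Table 3.4 (BNC, Lindquist 2009: 31-3).
VERB_RATES = {"be": 42_277, "should": 1_112}  # ranks 1 and 20 among verbs
NOUN_RATES = {"time": 1_833, "house": 598}    # ranks 1 and 20 among nouns
RARE_VARIANT_RATE = 1.0  # hypothetical rate for a low-frequency lexical variant


def expected_tokens(rate_per_million, corpus_size):
    """Expected raw count in a corpus of corpus_size words,
    assuming the item occurs at the same rate throughout."""
    return rate_per_million * corpus_size / 1_000_000


verb_ratio = VERB_RATES["be"] / VERB_RATES["should"]
noun_ratio = NOUN_RATES["time"] / NOUN_RATES["house"]
print(f"verbs, rank 1 vs. rank 20: {verb_ratio:.1f} times")
print(f"nouns, rank 1 vs. rank 20: {noun_ratio:.1f} times")

interview_corpus = 100_000  # hypothetical size of a transcribed interview corpus
rates = {**VERB_RATES, **NOUN_RATES, "rare variant": RARE_VARIANT_RATE}
for word, rate in rates.items():
    print(f"{word:<13}{expected_tokens(rate, interview_corpus):>9.1f}")
```

Among the verbs, rank 1 is about 38 times as frequent as rank 20; among the nouns, only about 3 times. Under these assumptions, house would be expected about 60 times in such a corpus, and an item occurring once per million words about 0.1 times, that is, most likely not at all.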

3.1.3  (Positive) Evidence and negative evidence: Problem #3

Another problem with corpus data is that negative evidence is not meaningful, at least not by default. If one searches a corpus of British English and does not find construction X, this does not mean that construction X is not used in British English. It means only that the search term does not occur in that particular corpus of British English. Taken alone, this finding says little. It would take considerable work to substantiate the hypothesis that construction X is not used in British English. The reverse case, positive evidence, is fairly unproblematic: if construction X occurs in the corpus, it is used in British English. In addition, anyone who wants to use a corpus must have a fairly good idea of the competing variants that make up a linguistic variable. What are the equivalent variants of the will-future in English? For that variable we have a good idea through existing corpus-based grammars, such as Quirk et al.’s (1985) Comprehensive Grammar of the English Language, Biber et al.’s (1999) Longman Grammar of Spoken and Written English or the methodologically more eclectic Cambridge Grammar of the English Language (Huddleston & Pullum 2002). But for many phenomena, these reference works are not detailed enough, and they certainly do not tell us much about dialects and non-standard variants (DARE, the Dictionary of American Regional English, is a dialect dictionary, but scholarly dialect grammars are much rarer). For lexical variables, the number of variants can be mind-boggling, as the A-curves in Figure 2.1 have shown, and corpora do not show us what to look for. Researchers therefore need to approach a corpus with a good idea of what to extract, and this concerns not just the variable but all (or at least the most prominent) of its variants! To return to the example from the previous chapter, tracing the variants of the variable chest of drawers in a corpus would require at least 37 searches, one for each lexical variant. Needless to say, many variants will not occur at all, because in text corpora few types account for a great part of the tokens.
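The need to enumerate every variant before searching can be illustrated with a short Python sketch. It runs one whole-word, case-insensitive search per variant over a tiny invented corpus. The variant list is an illustrative handful, not the 37 variants discussed in the previous chapter, and `variant_counts` is a hypothetical helper. A count of zero only says that a variant is absent from this corpus, which, as argued above, is not evidence that it is not used.

```python
import re

# A tiny invented corpus, for illustration only.
corpus = (
    "She put the sweaters in the dresser. "
    "Grandma kept her letters in an old chiffonier. "
    "The dresser drawer was stuck."
)

# An illustrative subset of lexical variants for 'chest of drawers'.
variants = ["chest of drawers", "dresser", "bureau", "chiffonier"]


def variant_counts(text, variants):
    """Hypothetical helper: run one whole-word, case-insensitive search per variant."""
    counts = {}
    for variant in variants:
        pattern = r"\b" + re.escape(variant) + r"\b"
        counts[variant] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


for variant, n in variant_counts(corpus, variants).items():
    # Zero hits mean absence from this corpus only, not absence from the variety.
    remark = "  (no hits in this corpus)" if n == 0 else ""
    print(f"{variant:<18}{n}{remark}")
```

Any variant missing from the list is never searched for at all, so the results can only be as complete as the researcher’s prior knowledge of the variable.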

3.1.4  Documentation of social backgrounds: Problem #4

The fourth problem is not a matter of principle but of practice. In dialectology, the documentation of the social background of informants and respondents has generally been foregrounded. Starting in the 1930s, one finds detailed background information about each speaker in the LAUSC project (e.g. Kurath et al. 1939). Sociolinguistics, with its focus on social correlations of linguistic features, tends to document social background in even more detail than dialectology. In corpus linguistics, comparatively little is known about the speakers beyond basic categories such as gender, setting and profession (see Baker 2013, 2010 for corpus linguistic examples), with the exception of sociolinguistic interview corpora, though these are not corpora of observation. The comparative lack of social variables in corpus linguistics is a consequence of its emphasis on authentic settings and situations, such as group discussions in professional settings, where background questionnaires might be seen as overly invasive. When dealing with the written data and text types that have come to be associated with corpus linguistics, often very little or nothing is known of the authors beyond their names and the publication channel. Again, the concern for language data that is used naturally and was not produced for any research purpose trumps the desire to have many detailed social variables for correlation with the linguistic variables. What we gain on one end, authentic linguistic use, we lose on the other, through the relative lack of social variables that often comes with full authenticity. Labov’s sociolinguistic interview treads a middle ground: while the behaviour is elicited for linguistic purposes, safeguards are implemented that aim to reduce the level of conscious monitoring by the interviewee (see, e.g. Becker 2013).



3.1.5  Corpora and WQs: A comparison

Given these four disadvantages of the corpus linguistic approach (limited linguistic contexts for some variables, problems with low-frequency items and lexical searches, the problem of negative evidence, and a shortage of social background information on the writers or speakers in many corpora), there is obviously room for other methods. Sociolinguistic interviews and fieldwork interviews can compensate for all four of these shortcomings to a considerable degree, since the interviewer can influence the direction of the conversation and can assign tasks, such as reading passages, word lists or minimal pair contrasts, that elicit low-frequency items and linguistic contexts which would otherwise occur only very sporadically. But there are obvious limits to how far the interviewer can manipulate the situation before it becomes too artificial. For FI settings, Pratt (1983) makes the point, among other methodological considerations, that for certain lexical items, such as those a person has not used in a long time because the occasion has not arisen (but would use if it did), the traditionally shunned direct elicitation may be the only way to record the word (“Do you know the word…”). The cost of control in interview settings is a reduced level of authenticity of the speech situation. Authenticity is, however, the great strength of corpus linguistics, which faces its own limitations. A big drawback is that even after 30 years of corpus compilation, corpora for many varieties do not yet exist. If we consider the multitude of specialized regional corpora needed for some theoretical questions, it is doubtful that stratified corpora will be readily available in the foreseeable future (e.g. Dollinger 2008c for one such case).
Where no adequate corpora are available, or where items of lower frequency, including lexical items, are to be studied, the elicitation of linguistic forms may well be one of the best options. The important point is that a switch to interview elicitation or a written elicitation format takes place when corpus linguistic methods cannot serve one’s data needs. Once the researcher has informants or respondents answer particular questions targeting a set of variables or forms, a degree of artificiality is introduced. It is clear that both kinds of methodologies, reported and observed linguistic behaviour, need to be seen as complementary and not as alternatives. The following comparison makes explicit the differences between corpus linguistics and WQs. Corpus linguists have, quite rightly, celebrated and stressed the advantages of their method, especially in relation to introspection, the method used by generative linguists, who have long relied on grammaticality judgements, usually from only one native-speaker informant. This “armchair linguistics”, so called because native-speaker linguists could ponder questions of grammaticality from their armchairs, has been contrasted with corpus linguistic methods by one of the pioneers of corpus linguistics (Svartvik 1992). Table 3.5 shows Svartvik’s original table, with an added dimension for WQs. Svartvik’s categories are listed below, next to his comparison with generative linguistics (1992: 8–10). Two categories relevant to WQs were added at the end, yielding 12 points in total, which we will address one by one:

Table 3.5  Comparison of features of three linguistic data collection methods (#1–#10 expanded from Svartvik 1992: 8–10, #11–#12 added)

Methods compared: Introspection, Corpus Linguistics, WQs

1. Data is objective
2. Data can be easily verified (in principle)
3. Data can be shared
4. Data reflects the wide repertoire of uses
5. Frequencies of features are offered
6. Data includes examples (limited)
7. Data is a theoretical resource
8. Data is relevant for applied linguistics (language teaching, planning)
9. Possibility of accounting for all relevant linguistic features in the data, not just a select few
10. Accessible method to linguists of non-native backgrounds
11. Allows for a polling of language attitudes in a population
12. Produces detailed social background data on speakers (generally)

Chapter 3.  A comparison of data collection methodologies

Table 3.5 throws into sharp relief the methods’ key features. While all data aims to be objective (#1), data from one person, or a handful of people, is clearly not objective in the best possible way. Both corpus linguistics (CL) and WQ data share much greater degrees of objectivity – corpora by virtue of the number of texts and their aim to be representative of a given variety, and WQs by virtue of a good sample population that is representative of the speech community under study. Verification (#2) is an interesting point and is most readily done with published corpora, such as the Strathy Corpus of CanE, the British National Corpus or the Corpus of Contemporary American English (all of which are available online). Verification is possible with WQs too, if the survey raw data is made available, as is the case with the SCE (Scargill & Warkentyne’s 1972 published data tables) or Dialect Topography (where the raw data is accessible online, see Section 8.1). With introspection, however, results are bound to, and only generalizable to, the one speaker (or the handful of speakers) asked, which makes data sharing (#3) very difficult. Given the amount of data that variationist studies produce in general, both CL and WQs yield a wide range of variants that would not be accessible by introspection (#4). A wide range of different uses of a construction has traditionally been a major strength of CL, but this range crucially depends on the sampled texts and, equally, on corpus size. If only print sources are used, the data will not be able to speak to spoken language, or to e-discourse for that matter. Likewise, WQs need to be based on good, accessible questions and adequate sampling. The ability to offer frequency counts of the variants in question (#5) is again not available with introspection, but it is a key feature of many empirical approaches. CL is able to offer counts in relation to a benchmark, e.g. per million words, while in WQs a discourse frequency is not directly attainable. Frequency can, however, be successfully gleaned by taking the percentage of respondents reporting a variant as an indicator of a person’s discourse frequency (in self-reporting) or of the frequency in the community’s linguistic feature pool (in community-reporting). The result is not as reliable as frequency counts in a well-balanced and very large corpus, but, as will be shown later in this chapter and in Chapters 4, 6 and 9, it offers quite striking insights into variation that are often equivalent to CL findings.
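To make the contrast between the two kinds of frequency figures concrete, here is a minimal Python sketch: a corpus count normalized per million words, and the share of WQ respondents who report a variant. The function names and all numbers are hypothetical illustrations, not values from any corpus or survey cited in this chapter.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Normalize a raw corpus count to a rate per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be positive")
    return hits / corpus_words * 1_000_000


def reported_share(reporting: int, respondents: int) -> float:
    """Percentage of WQ respondents who report a variant.

    In self-reporting this is read as a proxy for an individual's discourse
    frequency; in community-reporting, as the variant's share of the
    community's feature pool. It is not a discourse frequency itself.
    """
    if respondents <= 0:
        raise ValueError("need at least one respondent")
    return reporting / respondents * 100


# Hypothetical figures, for illustration only.
corpus_rate = per_million(hits=250, corpus_words=50_000_000)
wq_rate = reported_share(reporting=180, respondents=400)
print(f"{corpus_rate:.1f} per million words; {wq_rate:.1f}% of respondents")
```

The two numbers are deliberately kept apart: one measures occurrences in running text, the other the proportion of people who claim a form, so they can be compared in rank or direction but not converted into one another.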
When it comes to the absolute range of variation, there are good reasons to consider WQs superior to even very large corpora in some domains, above all lexical variation, which, as we have seen in the previous section, is notoriously difficult to grapple with in CL. It goes without saying that all three approaches consider their data adequate for theory building (#7). For both CL and WQs, however, a theory of how linguistic features are used is the ultimate goal. The importance of corpora for applied linguistics, e.g. for language teaching and language planning, cannot be overstated (#8), a fact that also correlates with their widespread use among linguists of non-native backgrounds (#10). WQs may be useful for applied linguists as well, though their potential has not been fully exploited, perhaps with the exception of the study of language attitudes (see Section 7.3.4, e.g. Jenkins 2007: 147–189) and perceptions (e.g. Preston & Long 1999–2002), for which WQs are the most logical tool (#11). For the inclusion of social variables, WQs offer the full range of social background information (#12), while corpora, even spoken corpora, usually make do with basic data (e.g. gender, age, perhaps occupation).


Linguistic examples: Attested and reported

The final two points to be discussed are linguistic examples (#6) and the range of linguistic constructions that can be generated and accounted for with each method (#9). The ease of offering examples is the prime domain of CL and one of its biggest strengths. CL produces data that is infinitely more varied than any team of linguists could devise on their own. Corpora generally produce examples that are difficult to devise and that at times run counter to one’s intuition: there is nothing stranger than real linguistic use. In this domain, however, CL is, I would argue, closely followed by certain kinds of WQs. Depending on the type of question, WQs can also produce an impressively wide range of variants that will reveal minor forms. Open answer responses generally fare well, with open answer lexical questions producing extraordinarily wide ranges of variation, as was effectively demonstrated in classic WQ studies such as Dialect Topography. For instance, the question asking for the name of a childhood prank in which elementary school children pull up the back of a classmate’s underwear, now most often referred to as a wedgie, yielded 100 variants, including spelling variants and minor variations, in a sample of 935 respondents. Only 126 respondents did not offer an answer, which shows that the concept was widely known in the survey population. The point to be made here is that the diversity of answers to open response questions in WQs is stunning, with variants including forms such as skwedgies, weggie, wedgie pull, scwegie, wedgie wars, weegie, wedgies, wedge, and a handful of double answers, e.g. “rooney, wedgie”, “wedgie, gotchie” or “gotchie, wedgie, supergotchie”.7 As we have seen in Chapter 2 with the A-curve, lexical variables usually comprise very few major variants and a long rat’s tail of minor variants.
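The tallying step behind figures like these can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: responses are raw strings, double answers are separated by commas, and only letter case is normalized, so spelling variants such as weggie and wedgies stay distinct (the fuller normalization described in footnote 7 is not attempted). The function name is our own, and the sample responses are invented, loosely modelled on the forms quoted above.

```python
from collections import Counter


def tally_open_answers(responses):
    """Tally variants in open-answer responses.

    Double answers separated by commas are split and each part is counted;
    only letter case is normalized, so spelling variants stay distinct.
    Returns the variant counts and the number of blank responses.
    """
    counts = Counter()
    no_answer = 0
    for raw in responses:
        parts = [part.strip().lower() for part in raw.split(",") if part.strip()]
        if parts:
            counts.update(parts)
        else:
            no_answer += 1
    return counts, no_answer


# Invented responses, loosely modelled on the forms quoted above.
responses = [
    "wedgie", "Wedgie", "weggie", "wedgies",
    "rooney, wedgie", "gotchie, wedgie, supergotchie", "",
]
counts, no_answer = tally_open_answers(responses)
for variant, n in counts.most_common():
    print(variant, n)
print("no answer:", no_answer)
```

Sorted by frequency, counts of this kind show the A-curve pattern: one or a few dominant variants followed by a long tail of single occurrences.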
Open answer questions are the response type that consistently produces the widest range of variation, a range that rivals that found in good corpora. In some situations, WQs might even elicit more variants than corpora. This would be the case in a highly diverse population of respondents that a large sample would be able to mirror. Such diversity is often difficult to capture in written corpora, as a result of the standardized nature of the texts comprising many general corpora. One of the traditions of CL has been to account for all of the data, or the bulk of it, instead of just for fringe phenomena or artificially constructed sentences (John loves Mary. A girl eats ice cream. The happy dog eats hot dogs. John is easy to please.). The aim has been to account for a wide range of linguistic constructions (#9), a feature that would apply to theorizing based on WQs as well: likewise, one would aim to describe the overwhelming part (90% or more), if not all, of the data. This feature in WQs is certainly contingent on the extent of a speaker’s conscious access to a feature. Access is generally given for lexis, as shown above, but is somewhat less of a given for morphology and syntax, though innovative prompting and task questions (Section 7.3) allow for the polling of syntactic and morphological data. The problem of accessibility, as we shall see in Section 3.3.3, lies above all in the area of phonetics.

7. Section 9.3.3 will show that this array of answers can be reduced to four major types, six alternative types and seven ambiguous names by normalizing spellings and orthographic conventions without losing too much information.

3.2 Comparison of elicitation techniques: WQ and FI

Having explored the relationship between CL and WQs, we now need to probe further into the reliability and validity of data collected with WQs: how is WQ data different from FI data? A comparison of WQs with FIs has been supplied in a previous study (Chambers 1998a), so the present chapter can limit itself to offering statistical testing not available in the original paper. This section seeks to establish whether more advanced statistical methods confirm the results of Chambers’ pioneering study, which was a first step towards re-establishing the credibility of WQs in English sociolinguistics. The two data sets were collected for the Linguistic Atlas of the Upper Midwest and comprise equivalent WQ and FI data.

3.2.1 The Linguistic Atlas of the Upper Midwest (1947–1953; 1973–6)

The Linguistic Atlas of the Upper Midwest (LAUM) is a major project of the Linguistic Atlas of the United States and Canada. Under the leadership of Harold B. Allen, fieldwork for this project commenced in 1947 and was completed four years later by a team of seven fieldworkers, though the results were not published until Allen (1973–76). The principal survey area was the five Upper Midwestern states of Minnesota, Iowa, the Dakotas and Nebraska, though five interviews were conducted in Canada (three in NW Ontario, one in Manitoba and one in Saskatchewan). Interestingly, LAUM included a supplementary WQ that overlapped in some items with the fieldwork variables, providing the opportunity to compare the findings. A basic comparison of both types of survey was carried out in Chambers (1998a), an innovative paper that will be used as a springboard for the current analysis. Chambers (1998a) is important for two reasons. First, it introduced written questionnaires into variationist linguistic circles, a feat that would have been unthinkable only a decade or two earlier. Second, it offered a sociolinguistic rationale for the WQ method, which Chambers employed in the Dialect Topography of Canada. LAUM’s distribution of respondents by state is shown for both FI and WQ in Table 3.6. It can be seen that the WQ respondents outnumber the fieldwork interviewees roughly four to six times:

Table 3.6  Data from FI and WQ by state in the Linguistic Atlas of the Upper Midwest

State | FI | WQ
Minnesota | 65 | 267
Iowa | 52 | 256
North Dakota | 26 | 135
South Dakota | 28 | 178
Nebraska | 37 | 228
Total | 208 | 1064
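The totals and the WQ-to-FI ratios in Table 3.6 can be checked with a short Python sketch; the dictionary simply transcribes the table, and the helper names are our own. The check puts the per-state ratios between about 4.1 (Minnesota) and 6.4 (South Dakota), with 1064/208, or about 5.1, overall.

```python
# Table 3.6 transcribed: state -> (FI interviewees, WQ respondents).
LAUM_RESPONDENTS = {
    "Minnesota": (65, 267),
    "Iowa": (52, 256),
    "North Dakota": (26, 135),
    "South Dakota": (28, 178),
    "Nebraska": (37, 228),
}


def totals(table):
    """Column totals (FI, WQ) of the table."""
    return (sum(fi for fi, _ in table.values()),
            sum(wq for _, wq in table.values()))


def wq_per_fi(table):
    """WQ respondents per FI interviewee for each state, to one decimal."""
    return {state: round(wq / fi, 1) for state, (fi, wq) in table.items()}


fi_total, wq_total = totals(LAUM_RESPONDENTS)
print("totals:", fi_total, wq_total, "overall ratio:", round(wq_total / fi_total, 1))
for state, ratio in wq_per_fi(LAUM_RESPONDENTS).items():
    print(f"{state}: {ratio}")
```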

The collection of the WQ data took considerable time, as it was designed analogously to the FI procedures. By using a two-stage process for selecting respondents, Allen and his associates spent a lot of time applying the selection method characteristic of interviewee selection, which complicated and slowed down the data collection process considerably. First, Allen identified possible candidates and invited them in writing to participate, and only then would they receive the questionnaire in the mail. Allen collected most of the WQs in the years 1951–53, with some limited follow-up work. Every project of the Linguistic Atlas of the US and Canada adhered to the basic principles laid out in Kurath et al.’s (1939) Handbook of the Linguistic Atlas of New England, but every regional project innovated in some area or another. The inclusion of a WQ can be regarded as one of Allen’s innovations, though Allen himself only considered it a “supplementary mail study”. Chambers (1998a) speculates that Allen did not deem the WQ results fully adequate or on a par with the FI data. At the same time, Allen’s characterization remains somewhat ambivalent. For instance, his inclusion of the WQ section was a direct response to Alva L. Davis’ PhD dissertation on the vocabulary of the Great Lakes Region (1948), introduced in Chapter 2. Allen (1973: 29) considered Davis’ study as having “so effectively demonstrated the supplementary and corroboratory value of lexical data obtained by mail”, yet he did not foreground the WQ in any meaningful or indeed obvious way. On the contrary, it seems that Allen treated the WQ as below par: the WQ was meant merely to be compared with the FI data, and wherever disagreement between the two surveys was found, precedence would be given to the FI.
Though this approach is methodologically biased, it is historically consistent with the established LAUSC fieldworker methodology as the method of choice, and it is about as favourable an assessment of a competing method as one might expect. Seen from this angle, Allen’s assessment of WQs is actually quite positive. Allen’s WQ presents itself in characteristically sociolinguistic fashion, eliciting place of residence, age, education and occupation, and asking respondents to provide birthplaces and the birthplaces of both parents’ grandparents. It also required the complete residence history (with dates) of each respondent, as well as out-of-state travel, in addition to the languages spoken. Respondents were also asked, as in Davis (1948), to provide their names. Allen’s instructions are interesting, as respondents are expressly alerted to regional linguistic variation. The opening line of the instructions reads, for instance: “For many ordinary things around the house and the farm and in daily living, people in various parts of the country use different words”, which is then explicitly linked to different settlement histories. These instructions highlight two features – regional variation and historical lineages. The respondents were further directed:

Put a circle around that word in each group which you yourself ordinarily use. If you ordinarily use more than one word in a group, then put a circle around each of the words you use. Don’t put a circle around any word you do not use, even though it is familiar. If the word you ordinarily use is not listed in the group, then please write it in the space below the item. The words printed in capital letters are there just for explanation. Example: TOWN OFFICERS: selectmen, trustees, councilmen, supervisors, commissioners (with supervisors circled, SD)

The instructions clearly show Allen’s indebtedness to Davis (1948): the parallels run deep, from order to wording, as a quick check (p. 32) brings to light. The WQ encourages multiple answers to allow for more complex variation, yet it does not allow respondents to rank-order the variants, a problem of which Allen was well aware and that remains a methodological challenge to this day. The FI elicited, depending on the version, between 584 and 661 items (i.e. “words and phrases”, Allen 1973: 31–2). The WQ, by contrast, included 136 lexical items in addition to the social background questions.

3.2.2 Allen’s WQ and FI data: Chambers’ selection

Chambers (1998a) collated data for both FIs and WQs for a subset of 35 of the 136 WQ items, which is presented in Table 3.7. The sample was limited for logistical reasons and was chosen as a “fair and representative sample” (Chambers 1998a: 232), which is one way to make do with a subsample. For instance, the first two variants, kerosene and coal oil (going back to Hempl 1896a), are the major variants for the prompt “FUEL FOR LAMPS” of item #23 in the WQ. Other variant options in the WQ included oil and lamp oil, which, as minor variants, were not considered in the comparison. The following test, therefore, applies only to the major variants: two in the case of kerosene/coal oil, but up to six for the variable clodhopper. The WQ question on kerosene/coal oil corresponds to FI question #20.3b:

20. 1 grease (the car) [verb]. 2 greasy. 3a oil. 3b [1949] kerosene, coal oil, lamp oil. 4 (inner) tube. 5 (they are going to) launch the boat. 6 I am going (today) [is the auxiliary verb omitted?]; we ~ ~; they ~ ~. 7 Am I going (to get some)? ~ they ~ ~?

Table 3.7  Comparison of 35 lexical variables from LAUM (Allen 1973–76) (from Chambers 1998a: 233)

Variants | Field | Postal | Variants | Field | Postal | Variants | Field | Postal
kerosene | 90 | 90 | shade | 81 | 94 | doubletree | 82 | 82
coal oil | 20 | 20 | blind | 18 | 20 | evener | 54 | 31
attic | 96 | 98 | curtain | 18 | 13 | frying pan | 75 | 62
garret | 8 | 6 | blacktop | 77 | 86 | skillet | 63 | 45
front room | 24 | 30 | oil road | 13 | 18 | spider | 14 | 10
living room | 72 | 71 | tarvia | 12 | 7 | jew’s harp | 67 | 74
parlor | 18 | 16 | ditch | 79 | 71 | juice harp | 33 | 15
sitting room | 17 | 16 | grader ditch | 7 | 19 | thunderstorm | 50 | 77
porch | 99 | 96 | borrow pit | 13 | 12 | electrical storm | 32 | 29
veranda | 12 | 6 | gutter | 1 | 9 | eavestroughs | 51 | 80
stoop | 26 | 29 | scum | 38 | 44 | gutters | 26 | 25
step(s) | 20 | 22 | skim | 23 | 14 | sashaying | 63 | 63
vegetable garden | 18 | 18 | belly gut | 3 | 1 | go cattywampus | 3 | 31
garden patch | 1 | 8 | belly bunt | 1 | 3 | (coal) bucket | 29 | 41
shafts/shavs | 85 | 88 | belly down | 1 | 10 | (coal) hod | 33 | 14
thills/fills | 21 | 16 | burlap bag/sack | 40 | 45 | (coal) pail | 13 | 45
sawbuck | 61 | 62 | gunny sack | 77 | 65 | (coal) scuttle | 45 | 19
sawhorse | 33 | 40 | relatives | 66 | 67 | | |
haycock | 55 | 51 | relations | 18 | 12 | | |
hayshock | 24 | 29 | folks | 32 | 18 | | |
sook [to calves] | 34 | 32 | kinfolks | 7 | 3 | | |
come | 41 | 48 | (garbage) slop | 71 | 89 | | |
bucket | 31 | 26 | (garbage) swill | 20 | 26 | | |
pail | 80 | 85 | (garbage) pail | 77 | 82 | | |
(wood) bucket | 69 | 62 | (garbage) bucket | 28 | 34 | | |
(wood) pail | 54 | 44 | (thrown) stone | 51 | 63 | | |
harrow | 73 | 64 | (thrown) rock | 50 | 51 | | |
drag | 51 | 50 | moo | 68 | 75 | | |
pig | 35 | 35 | bellow | 19 | 21 | | |
piggy | 16 | 31 | bawl | 28 | 4 | | |
pooie | 15 | 11 | low | 4 | 7 | | |
sooie | 13 | 15 | clodhopper | 9 | 11 | | |
sunset | 54 | 59 | country jake | 12 | 13 | | |
sundown | 53 | 46 | hayseed | 42 | 52 | | |
sheepy | 26 | 35 | hick | 22 | 16 | | |
sheep | 6 | 3 | hillbilly | 13 | 1 | | |
come sheep | 8 | 12 | rube | 9 | 1 | | |
co-nanny | 4 | 10 | give him the air | 15 | 12 | | |
kuday | 2 | 8 | give the bounce | 5 | 3 | | |
yeast | 94 | 78 | cold shoulder | 5 | 14 | | |
dry yeast | 12 | 27 | give the mitten | 24 | 13 | | |
potato yeast | 1 | 7 | jilted him | 37 | 14 | | |
soft yeast | 2 | 6 | threw him over | 9 | 3 | | |




We see here the loosely defined “work sheet” style that is characteristic of American dialect geography: given the prompts of 20.3b, the trained fieldworker is left to his or her own devices to elicit the desired response in as natural a manner as possible. Table 3.7 shows the average percentages for answers to lexical questions by 208 field survey respondents and 1064 postal survey respondents. For the first variable, kerosene/coal oil, 90 percent answered with kerosene and 20 percent with coal oil in both the WQ and FI. The percentages do not add up to 100%, as multiple answers were actively encouraged. Minor differences can be seen in the next item, attic/garret, and more drastic differentials of about 30 percentage points in thunderstorm/electrical storm and eavestroughs/gutters (last column), for example. How can we tell whether the differentials shown are significant differences between FI and WQ or merely a product of chance, that is, the kind of random variation that is always present but not systematically caused by the type of survey method? The original analysis used two statistical tests, which found no significant difference between the two populations (Chambers 1998a: 234). In the following section a stricter statistical approach will be applied. All statistical tests are performed in the open source software suite “R”, which also offers very powerful plotting and visualization features. We will tap into some of R’s functions in the last chapter of this book.
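Because respondents could give several variants, each percentage in Table 3.7 is computed over respondents rather than over answers, which is why the columns can sum to more than 100. A minimal Python sketch, assuming each respondent’s answer is stored as a set of variants; the function name and the four respondents are invented for illustration.

```python
from collections import Counter


def variant_percentages(answers, variants):
    """Percentage of respondents giving each variant.

    `answers` holds one set of variants per respondent. Since a respondent
    may give several variants, the percentages can add up to more than 100.
    """
    if not answers:
        raise ValueError("no respondents")
    counts = Counter(v for answer in answers for v in set(answer))
    return {v: 100 * counts[v] / len(answers) for v in variants}


# Four invented respondents to the FUEL FOR LAMPS item.
answers = [{"kerosene"}, {"kerosene", "coal oil"}, {"kerosene"}, {"coal oil"}]
shares = variant_percentages(answers, ["kerosene", "coal oil"])
print(shares, "sum:", sum(shares.values()))
```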

3.2.3 Selecting the best test

The first step is to check whether the data to be tested meets the requirements of the statistical test. There are a number of resources for this purpose, but linguists will find that Gries (2009b) and, for the more advanced student, Baayen (2008) answer most questions. If the results of two matched samples need to be compared, a Paired T-Test is often used in statistics. This was the test originally used in Chambers (1998a), followed by a Pearson Correlation Coefficient. In Table 3.7, the field and postal percentages for each variant form a pair, and the test compares the field values with the postal values across all pairs. One criterion is that the observations are independent of each other, which is usually assumed (here, this means that a respondent’s choice between attic and garret, the second variable, is not influenced by the choice made for the first variable). Another basic condition of both the T-Test and the Pearson Correlation Coefficient is that the data is what statisticians call “normally distributed”. One may recognize the bell curve with its distinct shape, shown as the black curve in Figure 3.1. A normal distribution means that the data points cluster symmetrically around a central value in the shape of the bell curve. Many data collections meet that requirement, e.g. darts aimed at a bull’s eye or marbles dropping into a row of columns. Language data, however, is often “not normally” distributed, as the statistician says, which is why this condition should be tested beforehand.
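What the Paired T-Test actually computes can be shown in a short, standard-library Python sketch: the test works on the differences between paired values, here the field and postal percentages of the first six variants in Table 3.7. The function name is our own, and it returns only the t statistic and degrees of freedom; turning t into a p-value requires the t distribution, e.g. from a statistics package. As the following discussion shows, the normality assumption does not hold for Allen’s data, so this is for illustration only.

```python
import math
from statistics import mean, stdev


def paired_t(field, postal):
    """Return the paired t statistic and its degrees of freedom."""
    if len(field) != len(postal) or len(field) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [f - p for f, p in zip(field, postal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# First six variants of Table 3.7 (kerosene ... living room).
field = [90, 20, 96, 8, 24, 72]
postal = [90, 20, 98, 6, 30, 71]
t, df = paired_t(field, postal)
print(f"t = {t:.3f}, df = {df}")
```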


There are two ways to check for “normality”: first, by visualizing the results and second, by applying statistical tests of normality. The light grey lines in Figure 3.1 are “density curves” based on Allen’s data from Table 3.7: they show, as smoothed curves, how the field and postal percentages are distributed across the range from 0 to 100. The black line is a “normal distribution” curve, the bell curve. In order for both the T-Test and the Pearson Correlation Coefficient to be applicable, the curves representing the WQ and FI data should be nearly congruent with the normal distribution curve.

Figure 3.1  Allen’s data for PQ and FI (grey) and normal distribution (black). Axes: frequency (x, 0–100) against density (y).

It is easy to see that there is little overlap between the light grey lines of the survey data and the black normal distribution curve, which is a clear warning sign that Allen’s data is not normally distributed, ruling out all tests that require this feature (see Baayen 2008: 76), including the T-Test and the Pearson Correlation Coefficient. The close overlap of the two light grey curves, the WQ and FI data, also suggests that the two data sets may indeed be equivalent. In most cases we would stop here and move straight to tests for “non-normally distributed” data to see if the equivalence of FI and WQ is upheld with more rigorous, non-visual methods.
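For readers who want to see what lies behind density curves like those in Figure 3.1, here is a minimal, standard-library Python sketch of a Gaussian kernel density estimate set against a normal curve with the same mean and standard deviation. The bandwidth rule (a common normal-reference rule of thumb), the helper names and the six-value input are assumptions for illustration; R’s density() function uses its own default bandwidth, so these curves will not reproduce the figure exactly.

```python
import math
from statistics import mean, stdev


def gaussian_kde(data, bandwidth=None):
    """Return a function evaluating a Gaussian kernel density estimate.

    Without an explicit bandwidth, the normal-reference rule of thumb
    1.06 * sd * n**(-1/5) is used (an assumption; other tools differ).
    """
    n = len(data)
    if bandwidth is None:
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    scale = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def normal_pdf(x, mu, sigma):
    """Height of the normal (bell) curve with mean mu and sd sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def max_gap(data, grid):
    """Largest vertical distance on `grid` between the KDE and a fitted normal curve."""
    kde = gaussian_kde(data)
    mu, sigma = mean(data), stdev(data)
    return max(abs(kde(x) - normal_pdf(x, mu, sigma)) for x in grid)


# Field percentages of the first six variants in Table 3.7.
field = [90, 20, 96, 8, 24, 72]
print(f"largest gap on 0..100: {max_gap(field, range(0, 101)):.5f}")
```

A large gap on its own proves nothing; it is the same “soft” visual impression expressed as a number, which is why a formal test of normality follows.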

Testing the data for equivalence

If we decide not to trust the somewhat fuzzy visual comparison, we can test whether the data meets the normal distribution requirement. Normality is tested with the Shapiro-Wilk test. If the p-value of the Shapiro-Wilk test falls below the cut-off point of 0.05, the hypothesis that the data is normally distributed is rejected at the 5% significance level. If that is the case, the Paired T-Test should be replaced with a test designed for non-normally distributed data, such as the Wilcoxon test (Gries 2009b: 214).




The Shapiro-Wilk test uses as its input the differences between the paired values, which are calculated from Table 3.7. The first value is derived by subtracting the values for kerosene, 90 (field) − 90 (postal) = 0, followed by 20 − 20 = 0 for coal oil, 96 − 98 = −2 for attic, and so forth. It is these differences that are fed into R and then submitted to a Shapiro-Wilk test. Testing is very easy in R: this test, like most others, consists of only one command. The resulting p-value is 0.0058, nearly ten times smaller than the 0.05 cut-off point, which means that the hypothesis of normally distributed differences is rejected. We now have hard evidence (instead of our “soft” visual comparison) that the T-Test and Pearson Correlation Coefficient should not be used. Consequently, we use the command and parameters for the Wilcoxon test to establish whether the responses to the WQ and FI are equivalent. If WQ and FI differed systematically, the Wilcoxon test would be expected to produce a p-value below 0.05. But it does not: its p-value is 0.81, which is far above the cut-off of 0.05. A p-value close to 0 would indicate that the differences between WQ and FI are very unlikely to be due to chance alone; a value as high as 0.81 means that the observed differences are well within what chance variation would produce. When all answers are taken together and measured, there is thus no evidence of a systematic difference between WQ and FI, and the two can be treated as equivalent overall. Strictly speaking, what we have shown is that it cannot be demonstrated that WQ and FI are different, which is not the same as proving that they are identical: a non-significant result is an absence of evidence for a difference, not proof of sameness. Note, however, that this test can only speak to the 35 of 136 variables tested, which was the original selection made in Chambers (1998a).
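To show what the Wilcoxon test does with the differences described above, here is a minimal Python sketch of the signed-rank test with a normal approximation, written with the standard library only. The helper names are our own, and the sketch is applied to the first six variants of Table 3.7 purely for illustration; it is not the computation behind the p-value of 0.81 reported above, which was run in R on the full selection. R’s wilcox.test() may apply a continuity correction or an exact distribution, so the two will not match exactly.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped, tied absolute differences receive average
    ranks, and the variance is corrected for ties. No continuity correction.
    Returns (V, p), V being the sum of ranks of the positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    tie_sizes = {}
    for d in diffs:
        tie_sizes[abs(d)] = tie_sizes.get(abs(d), 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values()) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (v - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return v, p


# First six variants of Table 3.7 (field vs. postal percentages).
field = [90, 20, 96, 8, 24, 72]
postal = [90, 20, 98, 6, 30, 71]
v, p = wilcoxon_signed_rank(field, postal)
print(f"V = {v}, p = {p:.3f}")
```

Fed with all the paired values of Table 3.7, the same function would give an approximation to the test reported above; the reading of the result is unchanged, with a p-value below 0.05 indicating a systematic difference.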
This method can also be used to identify questions for which WQ and FI might be less equivalent, by grouping together answer sets that look rather different, such as the last variable in Table 3.7, (coal) bucket. Such an approach would go some way towards identifying variables and types of variables that face challenges in either WQ or FI. Overall, however, the test has provided solid support for the equivalence of WQ and FI with more advanced statistical methods than in the original study, and has arrived at the same conclusion as Chambers (1998a). This is good news for the validity of lexical WQs when compared to FIs.

Allen’s original assessment

It is interesting that Allen, when discussing his mail survey (1973: 29–30), expressed doubts about the WQ (in LAUSC terminology, “checklists”), which led him to strictly separate the FI data from the WQ data. When disparities occurred, Allen sided – in hindsight somewhat uncritically – with the FI. This scepticism runs like a thread through Allen’s commentary, as in the following lines:


Although the checklist returns usually exhibit rather remarkable correspondence with the findings of the field investigations, the contradictory data for a few items raise again the unanswered questions always associated with the use of a questionnaire to be filled out by mail. (Allen 1973: 30)

The FI appears to be the unassailable standard, while the WQ is found deficient where there is no match. Allen, however, had no principled way of knowing whether the contradictions came from the WQ or the FI, though he lists the following sources of error in the WQ:
– influence by the order of variants
– skewing of results by reporting somebody else’s and not one’s own linguistic behaviour
– possible misunderstandings due to the brevity of the prompts

The FI, as the established method, is presented as beyond any doubt and does not receive a similar treatment of possible errors. Instead, Allen finds fault exclusively with the WQ and its “weakness due to the absences of the fieldworker with his ability to explain what is wanted without suggesting possible answers” (Allen 1973: 30). The possibility that fieldworkers suggest variants is, however, a problem in its own right. Could it be that fieldworkers, possibly some of them at least some of the time, led the interviewees towards a given response? There is certainly a fine line between ensuring that the target variable is elicited and suggesting a form otherwise not used (Chambers 1998a: 227). Certainly, in a team of fieldworkers there is bound to be more variation than if one were to interview alone (which has no longer been feasible since Gilliéron’s ALF and Lowman’s work on LANE). For LAUM, FI data was collected by a total of nine fieldworkers, counting the six who worked under Allen’s direction, Allen himself, and the fieldworkers behind two legacy data sets. For another LAUSC project, Bailey and Tillery (1999) show that one interviewer, Barbara Rutledge, elicited the stigmatized double modal might could more than twice as frequently as any other interviewer. Bailey and Tillery (1999: 397) describe this as a “consequence of the work of a single field-worker and of her approach to elicitation”, which is clearly a problem for the FI method, and one that does not apply to WQs.
Allen expressly acknowledges individual differences in interviewing styles and offers some interesting personal details on the fieldworkers’ routines. McDavid is characterized as “least inclined to explore the informant’s vocabulary by jogging his memory with a suggested word”, producing only 3.5 additional words, compared with 18.3 by Weber. More generally, any additional “spontaneously produced language forms” were recorded differently: “fieldworkers differ widely in their concern for recording [valuable conversational items]”, Allen writes. Glenn, a female fieldworker, recorded 84 items on average per interview, while Peterson recorded a mere 11. It seems that the personality and the gender of the interviewer had an effect, at least in terms of additionally documented variables that are not required by the interview questionnaire. It is also conceivable, indeed likely, that differences in the set of variables may be attributed to these factors. The danger of fieldworker effects is probably somewhat greater in American than in English dialectology due to the different kinds of interview questionnaires – the guidelines for the fieldworker – that were used:

While British fieldwork depended on questions that were read verbatim by every fieldworker, American fieldwork was in favor of “a freedom that allows a trained fieldworker to exercise his own discretion and ingenuity in seeking a response in different ways best suited to the occasion and informant, and hence often obtain relevant information that rigidly controlled questions would not elicit.”  (Allen 1973: 26–7)

One can see both sides of the coin: more freedom will possibly yield a more free-flowing conversation (which is good, as it is more likely to produce spontaneous speech despite the artificial interview situation), but on the other hand the data may not be fully equivalent across fieldworkers. Once again, we have arrived at the clash between reliability and comparability of the data. In light of these thoughts, however, it seems even more likely that Allen did not give the WQ full consideration. With more than five times as many responses as the FI, the WQ surpassed the original goal of two responses from each of the 401 UM locations (i.e. 802 responses) by more than 250 responses. Allen’s assessment tells us less about the quality of WQs in lexical surveys than about the prevailing attitudes of the day.

3.3 Comparison of elicitation techniques: Sociolinguistic interview and WQ

The results from the previous section offer support for the validity of data collected with WQs when compared with FIs. However, there is a chance that both surveys, WQs and FIs, err in the same direction. If that is the case, then the two elicitation methods in general – regardless of their precise “mode” – do not adequately reflect language use. This precise issue was explored in early sociolinguistic work, in the classic studies by Labov (²2006 [¹1966]) in New York and by Trudgill (1972, 1974b) in Norwich. In New York, Labov’s well-known series of studies on post-vocalic [r] compared reported use (as in WQs) with (near) observed behaviour (via sociolinguistic interviews). For most of the 20th century, New York English was torn between the Standard American r-ful forms, e.g. car [kɑr], and the traditional vernacular r-less forms, e.g. [kɑ]. Labov (2006: 304) summarizes the r-data: “It appears that there is little relation between the amount of (r) which New Yorkers actually use, and their impressions of their own speech”, with speakers most often over-reporting their use of the more standard-like [r].


If over-reporting were generally the case, one could take such a bias into account and adjust for it. However, in Trudgill’s study of yod-dropping in Norwich, a more complex result was found. In British English, [j] in student, tune, queue is the standard variant, while the yod-less Ø is non-standard. Trudgill (1972: 186) found that 16% of informants over-reported yod, which is the same direction as in the New York case. He also found, however, that 40% under-reported yod, which means that they did pronounce yod in the interview but reported otherwise. Trudgill shows that more male speakers under-report yod, which he ascribes, much like Labov, to the covert prestige of non-standard forms, which are more frequently used by males. These facts are bad news for the reliability of WQs, as self-reports do not consistently err in one direction that could easily be controlled for. For that reason, variationists consider WQs “to be of little use in the investigation of social variables” (Boberg 2013: 133–4).
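The comparison behind figures like Trudgill’s can be sketched as a simple classification of speakers. This minimal Python sketch simplifies observed use to one boolean per speaker (e.g. whether the speaker predominantly used yod in the interview), which is an assumption for illustration; the function name and the ten speakers are invented and do not reproduce Trudgill’s data.

```python
def reporting_profile(speakers):
    """Percentages of over-, under- and accurate reporters.

    Each speaker is a (reported, observed) pair of booleans, True meaning the
    standard variant (e.g. yod in "tune"). Over-reporters claim the standard
    form but do not use it; under-reporters use it but claim they do not.
    """
    n = len(speakers)
    if n == 0:
        raise ValueError("no speakers")
    over = sum(1 for reported, observed in speakers if reported and not observed)
    under = sum(1 for reported, observed in speakers if observed and not reported)
    accurate = n - over - under
    return {
        "over": 100 * over / n,
        "under": 100 * under / n,
        "accurate": 100 * accurate / n,
    }


# Ten invented speakers: 2 over-report, 3 under-report, 5 report accurately.
speakers = (
    [(True, False)] * 2
    + [(False, True)] * 3
    + [(True, True)] * 3
    + [(False, False)] * 2
)
print(reporting_profile(speakers))
```

Because over- and under-reporting pull in opposite directions, no single correction factor applied to the reported figures would recover the observed ones, which is the difficulty noted above.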

3.3.1 The Observer’s Paradox and WQs

As with any data collection method, some types of variables are more prone than others to show biases. Generally, pronunciation variables are considered the least likely variables to be reported faithfully. In contrast, lexical variables are considered to be the most reliably polled, while opinions are divided on syntactic and morphological variables. For the polling of opinions and attitudes about language, WQs are one of the most widely accepted forms of data collection. The rationale of the present section is to compare two pronunciation variables to reveal how good a match can be found between WQ data and (audio-recorded) sociolinguistic interview data. If a satisfactory match can be established for phonological variants, it would imply that other types of variables may fare the same or better with WQ data. The Labovian approach introduced a focus on the spoken language. Its method of choice was the sociolinguistic interview (see, e.g., Tagliamonte 2006: 37–49), which elicits variables, as was discussed in Chapter 1, in a minimally invasive way. The sociolinguistic interview, however, is not a natural situation. Among other things, interviewees are given different tasks, such as reading a list of isolated words and a text passage, in addition to speaking freely. The different tasks produce various “speech styles”, from formal ones (such as word lists), to less formal ones (such as reading a passage), to, ideally, the least formal ones, such as an informal conversation about a topic close to the interviewee’s heart. The conversation is recorded and then later transcribed and analyzed in systematic ways. The single biggest problem in interview elicitation settings is the aforementioned “Observer’s Paradox”, which can be succinctly described with a famous quotation from Labov (1972: 209): “the aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed: yet we can only obtain these data by systematic observation” (see Meyerhoff 2011: 42–43 for discussion). The issue had been known before,8 but Labov designed his interview methodology in order to systematically minimize and overcome the Observer’s Paradox, which affects the usability of direct questions about language, such as those asked in WQs: do you use A or B? Consequently, direct questions about language, be they questions on a dialect survey or judgements of grammaticality, “tell us about people’s opinions on language, not about language itself”, as Boberg (2013: 134) puts it. In this view, WQs only allow researchers to explore people’s language attitudes, and the results have little to no value for the study of actual language use. This all-or-nothing approach seems somewhat radical: while it is certainly true that some variables, more than others, will be subject to manipulation by the respondent, the equivalence test just performed in the previous section suggests that there is little difference between lexical WQ data and the FI in American dialectology, whose goal was a free, as natural as possible conversation that weaves in cues for the interviewee to produce certain variables. Depending on the skill of the fieldworker, some conversations would have come close to the Labovian ideal of a conversation where the focus is the conversation and not language, i.e. closer to the variationist ideal of the vernacular. Granted, a conversation with a fieldworker, however skilled in the “gift of the letting people talk”, who would scribble down answers on sheets of paper is different from a conversation with a sociolinguist who records with an inconspicuous recording device.
But given that FIs span multiple visits over a few days, whereas sociolinguistic interviews typically last only about an hour and are usually done in a single visit, one can easily imagine that by Day 3 a fieldwork interviewee will be quite familiar, and therefore relaxed, with both the fieldworker and the procedure, perhaps even more so than in a comparatively short sociolinguistic interview between strangers. With respect to the Observer’s Paradox, however, WQs might have one clear advantage over any interview setting, as most if not all of the social constraints of a face-to-face interview fall away. Since no stranger is physically present, there is no danger that the respondent will accommodate to a particular interviewer. This is not to say that WQ respondents are unaware that someone will eventually read their answers, but if the questionnaire is submitted anonymously (for instance via anonymous mail-in or an internet questionnaire), there is at least no direct incentive to answer in consciously manipulative ways. Under these circumstances, one might ask why some respondents would not report even socially stigmatized forms.

8. Alexander Ellis (1869–89: IV: 1086) noted, for instance, that “the mere observation is beset with difficulties. The only safe method is to listen to the natural speaking of someone who does not know that he is observed”. Ellis, however, gives credit to an earlier source, the 18th-century German poet Friedrich Gottlieb Klopstock.

As seen in Chapter 2, WQs have come to be credited with greater versatility than was first conceded. Boberg expressly distinguishes variables that invite public comment and carry social stigma from those that do not: some variables may be talked about without being negatively evaluated socially. For these variables, Boberg reasons, “the disadvantages of the observer’s paradox may be diminished, in some cases to the point where they are balanced by the advantages of more data at lower cost” (2013: 135). For variables that are socially evaluated, he suggests that

[a]s long as the effect of direct observation is kept in mind, and survey data are not treated as equivalent to data extracted from actual speech, questionnaire responses can indicate social or regional patterns in the evaluation of variables, including evidence of changes in progress.  (Boberg 2013: 135)

This assessment is very different from ruling out the method a priori. Interpreting the data in light of their strengths and weaknesses seems the only reasonable approach. After all, no type of data captures all aspects of the communicative situation. Some data types come closer to the ideal than others, but all need to be interpreted with their caveats in mind.

3.3.2  McDavid’s test

The basic comparison and test design employed here is not new; it has long been known that the validity and reliability of WQs are best tested against recorded data. One of the first such tests was carried out in the context of the LAUSC project almost 80 years ago by Raven I. McDavid. To test how reliably speakers report their low-back vowels, McDavid collected data suitable for comparing the two methods. Using word lists, he asked informants “to indicate [their] natural pronunciation, whether with [ɑ] or with [ɔ] (identified as the vowels in father and law, respectively)” (McDavid 1940: 145). McDavid (ibid.) lists four influences as possible and likely skewing factors. First, the WQ method cannot register fine phonetic detail, which does not distort his results, since he examined only phonemic contrasts. Second, “phonetically unsophisticated informants cannot determine what vowel they use”, which is potentially detrimental and the method’s biggest challenge. Third, “spelling-pronunciations and notions of ‘correctness’ or ‘elegance’ may inhibit a normal response”, which is the criticism most variationists would raise against the method. Fourth and last, “fatigue or haste” may lead to incorrect reporting or repetitions, which is also a valid point that needs to be mitigated, above all by limiting the length of the questionnaire.




The overall results of McDavid’s comparison show a striking equivalence of FI and WQ data for phonemic variants: McDavid (1940: 145) finds the tabulated WQ data to “correlate closely both with the two field transcriptions and with the general impression the author has of the speech” of the area.9 The lack of an explicit, quantitative comparison, as opposed to McDavid’s qualitative-summative one, might be one reason why his paper, which offers striking evidence for the validity of WQs in phonemic elicitation, seems to have encouraged few WQs for phonemic variables until Avis’ Canadian survey. Another reason might lie in McDavid’s own stated limitations. He wrote that his study made “no claim to presenting a detailed impressionistic [in the sense of elicited and transcribed] analysis” but attempted “merely to indicate a relative phonemic distribution” of two back vowels, with the second, third and fourth skewing factors “cancel[ling]” each other out, so that only group averages may be trusted. In other words, while no single individual record would be reliable, the average reported vowel values across the 75 respondents would adequately represent the speech features of the region. With such modest claims, WQs would remain of little interest to phoneticians.

3.3.3  WQs and sociolinguistic interviews in Vancouver, Canada

The present section provides a direct comparison of WQs and sociolinguistic interviews. For 10 interviewees, all speakers of Vancouver English, phonetic and phonological variables will be compared in an effort to establish the validity and reliability of the WQ method. The comparison will also show the limitations of WQs, which appear to be less egregious than one might think. The variables are yod-dropping, also known as glide-deletion, in the three lexical contexts of student, coupon and avenue, and the low-back vowel merger in three contexts: caught vs. cot, Don vs. Dawn and sorry vs. sari (East Indian dress). This approach goes beyond McDavid’s aggregate reporting: in addition to the aggregate result, each individual vowel token is examined. All interviewees were born and raised in Vancouver, had close network ties in the city and were very local (based on a Regionality Index of 3 or lower; see Section 8.2.1 for an explanation of this index score). Table 3.8 shows the interviewees’ aliases, ages and ethnicities:

9. While his keywords (father and law) are not listed, LAMSAS data is available online at the Linguistic Atlas Project website and invites systematic comparisons with McDavid’s contemporary data, see .


Table 3.8  Interviewees in the Vancouver sociolinguistic interviews (AC = Asian Canadian, CC = Caucasian Canadian)

Age group   Male                                          Female
Young       Anton 15, AC; Mario 25, CC; Gustave 25, AC    Lola 19, AC; Kelsey 20, AC; Ella 26, CC
Middle      Chad 44, CC                                   Nancy 40, CC
Old         Carl 62, CC                                   Carla 61, AC

A few weeks after the interview, the interviewees were given a short questionnaire that included the following questions, interspersed with a number of pseudo-questions so as not to reveal its purpose. For the low-back merger, interviewees were asked:

3. Are the first parts in the words sorry and sari pronounced the same?  Yes / No
7. Do the words cot and caught sound the same to you?  Yes / No
13. Are the names Dawn and Don pronounced the same?  Yes / No

For yod-dropping, the questions were identical to those in Chambers (1998a: 236):

8. Does the ending of AVENUE sound like you or oo?
12. Does the u in STUDENT sound like the oo in too, or the u in use?
15. Does the beginning of COUPON sound the same as cue, or coo?

The self-reports were then compared with the interviewees’ acoustic measurements.

Low-back vowels

The merger of the low-back vowels is an ongoing linguistic change in North America, in which Canada is particularly advanced. The merger was most likely brought to Canada with the United Empire Loyalists in the 18th century (Dollinger 2010: 210–18; Chambers 1993: 11), but it has spread faster across Canada than across the United States, creating such homophones as cot and caught, Don and Dawn, Otto and auto. While the merger now covers more than half of North America, Labov, Ash and Boberg (2006: 65) assess that “Only in Canada is the merger well enough established to show no correlation with age” (see Boberg 2008a: 135 for a larger sample). The cot-caught merger has been surveyed in self-reports for some time. In the 1972 SCE, the merger was already widespread in Canada. Among the parents (respondents in their late 30s and early 40s), merged vowels were reported by 85% of males and 85% of females; among the 14-year-olds, by 84% of males and 87% of females (Scargill & Warkentyne 1972: 64). These figures include Newfoundland, whose rate of around 70% was much lower than that of mainland Canada and lowered the national averages by 2–3 percentage points. Interpreted in apparent time, the basic message is that already in the early 1950s, when the adults were completing their formative years, the low-back merger seems to have been more or less categorical (around 90%) among native-born speakers across Canada who were then in their 20s or younger. There is further support for this interpretation: Gregg (1957: 22) reports that for “local” Vancouver university students, cot and caught as well as caller and collar are merged in a low-back vowel with “slight lip-rounding”.

Figure 3.2 shows the acoustic vowel plots10 of single tokens of caught and cot for the ten informants. Eight of the ten interviewees merge the two vowels (shown by overlaps) or come very close (e.g. Carl, 62). Mario, 25, is a good example of a complete merger, while Ella, 26, Carla, 61, and Carl, 62, whose mergers are auditorily indistinguishable, show minor measurement differences in vowel height that are detectable only by acoustic analysis. Nancy, 40, is a borderline case: she shows a bigger gap and should be considered “close” rather than merged, while Anton, 15, produces clearly distinct vowels in the two words.

[Figure 3.2: one panel per interviewee (Anton 15, Mario 25, Gustave 25, Lola 19, Kelsey 20, Ella 26, Chad 44, Nancy 40, Carl 62, Carla 61), each plotting the caught and cot tokens by F2 (horizontal axis) against F1 (vertical axis)]

Figure 3.2  Vowel plots for caught/cot (non-normalized)

The crucial question is how these measurements compare with the WQ data. Table 3.9 shows the interviewees’ self-reports matched against the acoustic data. For cot and caught, Ella was the only one who did not self-report her vowels as merged, whereas her plot in Figure 3.2 shows an almost complete merger. She seems to have been misled by orthography, which is one of the dangers of WQs. Nancy’s caught and cot tokens sound very similar and might count as a match, though we err on the side of caution by assigning her “No?”. Anton’s measurement, however, is the real outlier: his vowels are clearly distinct, yet he reports them to be identical. A possible, if unusual, explanation is that Anton, who was very nervous for most of the interview, might have lost some control over his linguistic performance. This explanation is considered here because he confessed after the interview that he did “not know what [he] was saying”. On the whole, apart from Ella, Anton and perhaps Nancy, 7 out of 10 interviewees self-report faithfully for cot/caught. If Nancy is counted as a match, given her auditory merger that almost, but not quite, shows in Figure 3.2, only Ella and Anton remain as real cases of wrong self-assessment.

10. The acoustic analysis was carried out in Praat using F1 and F2 measurements at the vowel midpoint.

Table 3.9  Matches between self-reports and acoustic data

Interviewee   cot/caught   Don/Dawn   sorry/sari
Anton         NO           YES        NO
Mario         YES          YES        YES
Gustave       YES          NO         YES
Lola          YES          YES        YES
Kelsey        YES          YES        YES
Ella          NO           YES        YES
Chad          YES          NO         YES
Nancy         NO?          YES        YES
Carl          YES          YES        YES
Carla         YES          NO         YES
Matches       7/10         7/10       9/10

The vowel plots for Don/Dawn and sorry/sari are shown in Figure 3.3. For Don/Dawn, Ella again reports that she distinguishes between the two vowel sounds, and this time the distinction is indeed shown in the vowel plot. Ella is thus an interesting case: she misjudges the cot/caught question, yet faithfully reports the vowel feature of Don/Dawn. Gustave believes he pronounces the two names differently but in fact merges them. Chad and Carla, in the older age cohorts, are not merged but self-report a merger. All others report faithfully, which leaves 7 of 10 correct self-reports for Don/Dawn. For sorry and sari (also spelled saree), we expect a distinction, not a merger, which is borne out in almost all cases. Anton is again the exception: he merges the two vowels but self-reports a non-merger. He did not know that a sari is an East Indian dress and was seeing the word for the first time. Lola also comes close to merging sorry and sari, and she did not know the word either. Carla, 61, fronts the vowel of sari, which is likely due to her unfamiliarity with the word (as she mentioned).

[Figure 3.3: one Don/Dawn panel and one sorry/sari panel per interviewee (Anton 15, Mario 25, Gustave 25, Lola 19, Kelsey 20, Ella 26, Chad 44, Nancy 40, Carl 62, Carla 61), each plotting the tokens by F2 (horizontal axis) against F1 (vertical axis)]

Figure 3.3  Vowel plots for Don/Dawn and sorry/sari (non-normalized)

For sorry and sari, self-reporting is very faithful, with 9 out of 10 reports accurate. Overall, self-reporting of the low-back merger is fairly reliable even at the individual level, an aspect that McDavid’s test did not address. While the auditory data are needed to interpret the graphs fully, the comparison shows that 23 of 30 tokens are correctly reported by the 10 interviewees. There are individual differences, however, with Anton being the least reliable self-reporter.
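The tally behind these figures can be reproduced with a few lines of Python. The sketch below simply encodes the match judgements of Table 3.9 (Nancy’s “No?” for cot/caught is counted as a mismatch, following the cautious reading above) and sums them per word pair, overall and per speaker. The names TABLE_3_9, PAIRS and tally are illustrative only and not part of the study.

```python
# Match judgements from Table 3.9: True = self-report agrees with the
# acoustic data, False = it does not. Nancy's "No?" for cot/caught is
# counted conservatively as False.
TABLE_3_9 = {
    "Anton":   {"cot/caught": False, "Don/Dawn": True,  "sorry/sari": False},
    "Mario":   {"cot/caught": True,  "Don/Dawn": True,  "sorry/sari": True},
    "Gustave": {"cot/caught": True,  "Don/Dawn": False, "sorry/sari": True},
    "Lola":    {"cot/caught": True,  "Don/Dawn": True,  "sorry/sari": True},
    "Kelsey":  {"cot/caught": True,  "Don/Dawn": True,  "sorry/sari": True},
    "Ella":    {"cot/caught": False, "Don/Dawn": True,  "sorry/sari": True},
    "Chad":    {"cot/caught": True,  "Don/Dawn": False, "sorry/sari": True},
    "Nancy":   {"cot/caught": False, "Don/Dawn": True,  "sorry/sari": True},
    "Carl":    {"cot/caught": True,  "Don/Dawn": True,  "sorry/sari": True},
    "Carla":   {"cot/caught": True,  "Don/Dawn": False, "sorry/sari": True},
}

PAIRS = ("cot/caught", "Don/Dawn", "sorry/sari")


def tally(table):
    """Return per-pair match counts, the overall count and the token total."""
    # Summing booleans counts the True values (i.e. the matches).
    per_pair = {pair: sum(row[pair] for row in table.values()) for pair in PAIRS}
    overall = sum(per_pair.values())
    total = len(table) * len(PAIRS)
    return per_pair, overall, total


per_pair, overall, total = tally(TABLE_3_9)
for pair, hits in per_pair.items():
    print(f"{pair}: {hits}/{len(TABLE_3_9)}")
print(f"overall: {overall}/{total}")

# Per-speaker totals make individual reliability visible.
for name, row in TABLE_3_9.items():
    print(name, sum(row.values()), "of", len(PAIRS))
```

The per-speaker loop shows Anton with only one of three matches, the lowest in the group, which is the individual difference noted above.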


Yod-dropping

Like the quality of the low-back vowels, the second variable, yod-dropping (or glide-deletion), has been commented on in CanE for some time. Glide-deletion affects words such as student, news, tune or duke. Until recently, retention of the glide was seen as a marker of CanE by the public and linguists alike. As many commentators have argued, the yod-ful variants are vehicles for Canadians to assert their Canadianness in opposition to variants that are considered American: “When a particular pronunciation is clearly identifiable as American, the majority of Canadians tend to shun it without hesitation”, as Orkin (1970: 124) wrote in a popular account. What counts as American and what does not, however, is neither clear-cut nor trivial. Three large-scale sociolinguistic studies report on glide deletion in three Canadian cities (Woods 1999 [1979] in Ottawa, Gregg 2004 in Vancouver and Clarke 1993b in St. John’s, Newfoundland). In a well-argued paper, Clarke (2006: 234) notes that “the +glide variant is not the formal target for all groups”, which indicates that statements such as Orkin’s no longer apply. Dialect Topography data from the Toronto region suggested that in Canada glide-ful variants generally do not carry overt prestige:

If Canadians were in the habit of ‘putting on airs’ by pronouncing students as st[ju]dents and news as n[ju]s, they would surely do so when answering the language-survey question for the Dialect Topography project. They do not. Yod-dropping is not only common and standard but also unmonitored.  (Chambers 1998b: 19)

That the case of yod is more complex can be seen in the evidence that females over 40 lead glide retention in three Canadian cities (Ottawa; St. John’s, Newfoundland; and Vancouver). Glide loss, on the other hand, is led by males and blue-collar workers. This leads Clarke (2006: 236) to propose a “change in indexicality”, as “glided and glideless variants have come to symbolize different social values for different segments of the Canadian population”. The self-report data are compared with the acoustic measurements in Table 3.10. The acoustic data use a “yod-ratio” measure, which normalizes the length of the glide for cross-speaker comparison.11 For avenue and coupon, a yod-ratio of 0 was taken to indicate no glide, while for student, whose phonetic environment inevitably triggers some measurable glide, a yod-ratio of 0.15 was set as the cut-off point, after each token had been listened to and the presence or absence of a glide in student established auditorily.

11. The “yod-ratio” is calculated as follows: the duration of the /j/-glide was measured from the beginning of the periodic waveform until the start of the F2 drop as the glide transitions into the back vowel /u/. This raw duration was then divided by the duration of the entire syllable to obtain a score that is normalized for different speech rates and allows for comparisons between speakers. This value I call the yod-ratio.
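The measure defined in footnote 11 is a simple ratio of two durations, combined with a word-specific cut-off. The Python sketch below assumes that glide and syllable durations have already been measured (for instance in Praat) and are passed in as milliseconds. The function names and the example durations are hypothetical, and treating a ratio strictly above the cut-off as glide-ful is an assumption, since the text does not say on which side a value of exactly 0.15 would fall.

```python
def yod_ratio(glide_ms: float, syllable_ms: float) -> float:
    """Glide duration divided by the duration of the whole syllable.

    Dividing by syllable length normalizes for speech rate, so that
    ratios can be compared across speakers.
    """
    if syllable_ms <= 0:
        raise ValueError("syllable duration must be positive")
    if not 0 <= glide_ms <= syllable_ms:
        raise ValueError("glide duration must lie within the syllable")
    return glide_ms / syllable_ms


# Cut-offs as described in the text: any measurable glide counts for
# avenue and coupon, whereas student needs a ratio above 0.15.
CUTOFFS = {"avenue": 0.0, "coupon": 0.0, "student": 0.15}


def has_glide(word: str, ratio: float) -> bool:
    """Classify a token as glide-ful (True) or glide-less (False).

    Assumption: only a ratio strictly greater than the cut-off counts
    as glide-ful.
    """
    return ratio > CUTOFFS[word]


# Hypothetical durations, not taken from the study.
r = yod_ratio(glide_ms=48, syllable_ms=240)
print(round(r, 3), has_glide("student", r))
```

With the 0.15 cut-off, Carla’s student token (0.142) falls just below the threshold, while Kelsey’s (0.160) counts as glide-ful.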




Table 3.10  Glide deletion in Vancouver: acoustic measurements of yod-ratio

avenue
Name      Self-report   Yod-ratio   Match
Chad      nju           0.150       YES
Carla     nju           0.190       YES
Carl      nu            0.191       NO
Ella      nju           0.221       YES
Kelsey    nju           0.246       YES
Anton     nu            0.251       NO
Nancy     nju           0.287       YES
Gustave   nju           0.309       YES
Lola      nu            0.348       NO
Mario     nju           0.355       YES
Overall match: 7/10

coupon
Name      Self-report   Yod-ratio   Match
Anton     ku            0           YES
Gustave   ku            0           YES
Lola      ku            0           YES
Ella      ku            0           YES
Nancy     ku            0           YES
Chad      kju           0.135       YES
Carla     ku            0.178       NO
Carl      kju           0.220       YES
Kelsey    ku            0.227       NO
Mario     kju           0.252       YES
Overall match: 8/10

student
Name      Self-report   Yod-ratio   Match
Mario     oo            0.046       YES
Chad      oo            0.071       YES
Carla     confused      0.142       –
Kelsey    oo            0.160       NO
Gustave   oo            0.165       NO
Carl      you           0.169       YES
Nancy     oo            0.188       NO
Ella      oo            0.227       NO
Lola      you           0.246       YES
Anton     oo            0.254       NO
Overall match: 4/9
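Because each match judgement in Table 3.10 combines a self-report with a thresholded yod-ratio, the table can be checked mechanically. The Python sketch below encodes the table rows, maps the self-report labels to “glide reported” or not, applies the cut-offs described above (0 for avenue and coupon, 0.15 for student; treating ratios above the cut-off as glide-ful is an assumption), skips Carla’s undecided answer for student, and records the direction of each mismatch. ROWS, REPORTS_GLIDE and score are illustrative names only.

```python
# Rows of Table 3.10: (name, self-report, yod-ratio).
ROWS = {
    "avenue": [
        ("Chad", "nju", 0.150), ("Carla", "nju", 0.190), ("Carl", "nu", 0.191),
        ("Ella", "nju", 0.221), ("Kelsey", "nju", 0.246), ("Anton", "nu", 0.251),
        ("Nancy", "nju", 0.287), ("Gustave", "nju", 0.309), ("Lola", "nu", 0.348),
        ("Mario", "nju", 0.355),
    ],
    "coupon": [
        ("Anton", "ku", 0.0), ("Gustave", "ku", 0.0), ("Lola", "ku", 0.0),
        ("Ella", "ku", 0.0), ("Nancy", "ku", 0.0), ("Chad", "kju", 0.135),
        ("Carla", "ku", 0.178), ("Carl", "kju", 0.220), ("Kelsey", "ku", 0.227),
        ("Mario", "kju", 0.252),
    ],
    "student": [
        ("Mario", "oo", 0.046), ("Chad", "oo", 0.071), ("Carla", "confused", 0.142),
        ("Kelsey", "oo", 0.160), ("Gustave", "oo", 0.165), ("Carl", "you", 0.169),
        ("Nancy", "oo", 0.188), ("Ella", "oo", 0.227), ("Lola", "you", 0.246),
        ("Anton", "oo", 0.254),
    ],
}

# Does a self-report label claim a glide? None marks an undecided answer.
REPORTS_GLIDE = {"nju": True, "kju": True, "you": True,
                 "nu": False, "ku": False, "oo": False, "confused": None}
CUTOFFS = {"avenue": 0.0, "coupon": 0.0, "student": 0.15}


def score(word):
    """Return (matches, usable reports, mismatches) for one word."""
    matches, usable, misses = 0, 0, []
    for name, report, ratio in ROWS[word]:
        claimed = REPORTS_GLIDE[report]
        if claimed is None:              # skip undecided self-reports
            continue
        measured = ratio > CUTOFFS[word]  # assumed: strictly above cut-off
        usable += 1
        if claimed == measured:
            matches += 1
        else:
            # "under": glide measured but not reported; "over": the reverse
            misses.append((name, "under" if measured else "over"))
    return matches, usable, misses


for word in ROWS:
    m, n, misses = score(word)
    print(f"{word}: {m}/{n}", misses)
```

Every mismatch comes out as “under”: a glide is measured but not reported, which is the pattern discussed below.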


For avenue, self-reporting and observation match robustly. All interviewees self-report some degree of yod-retention, with the exception of Carl, 62, Anton, 15, and Lola, 19. Interestingly, Anton and Lola, the youngest in the sample, self-report the glide-less variant despite having fairly long glides, especially Lola (her yod-ratio of 0.348 is the second highest). Since all interviewees show some kind of audible glide, 7 out of 10 can be considered to report faithfully. For coupon, which has been reported as a variable of divided usage (Chambers 1998a: 243), everyone from Anton, 15, to Nancy, 40, in Table 3.10 is categorically glide-less, while from Chad, 44, to Mario, 25, increasingly long glides can be seen. Self-reporting here is apparently very reliable: 8 out of 10 self-reports match the acoustic data, with Carla and Kelsey as the two unreliable reporters. Arguably, student is the most interesting case: excluding Carla, who could not decide in her questionnaire how she pronounces the word,12 only 4 out of 9 interviewees self-report reliably, using the 0.15 cut-off to distinguish “glide-ful” from “glide-less” tokens. This cut-off is indirectly confirmed by Carla, who sits right at the transition point, which explains her confusion. A striking fact emerges when the three contexts are combined. Excluding Carla’s student token for the above reasons, all incorrect reports consistently under-report the use of yod: in all ten cases in which informants self-report incorrectly, glide-ful pronunciations are measured but not reported (the rows marked NO in Table 3.10). This suggests a reversal of the traditional CanE pattern described by Orkin (1970): glide-less variants now appear to carry overt prestige. The result contradicts many of the statements in the literature on glide deletion, which prove to have been made fairly casually.
Pringle’s assessment is one such example, claiming that

Canadians want to stress how their English differs in sound from American English, they are particularly likely to settle on these [palatal glides, i.e. yods]. In former years many school English texts which concerned themselves with matters of elocution regularly included advice to say ‘news’ not ‘nooz’ […].  (Pringle 1985: 190)

It seems that today something else is affecting the Vancouver speakers. The glide-less pronunciation, which has been part of CanE since its inception, has now become the target variant carrying overt prestige. Speakers may still articulate glides, but they do not report them. As regards the unreliability of self-reports for student, Clarke suggests a social recoding of yod-ful variants in CanE for one lexical item, news. The re-indexicalization of yod in news, Clarke suggests, has little to do with “Britishness”, and Clarke musters evidence for its widespread use in United States TV media, where news retains the glide to give it more ‘respectable formality’ (Pitts 1986: 136 qtd. in Clarke 2006: 243). Context does, of course, play a significant part in the precise interpretation of such features, which may also be linguistically co-determined (to be explored in a more theoretical context in Section 6.7). Similar to yod-ful news in the US media, where it primarily signals ‘sophistication’ rather than ‘Britishness’, st[ju]dent, as opposed to st[u]dent, may have taken on a specialized indexicality for ‘learning & erudition’. Carl, 62, and Lola, 19, both live on the UBC campus, are part of the academic community and correctly self-report their yods. Four others also have active ties with academia or learning: Kelsey and Ella are undergraduates, Anton is a high-school student and Nancy is an art teacher, so they are all part of a wider environment of learning. Without being aware of it, they all retain their glides in student, although they self-report yod-less forms. By contrast, Chad, 44, has no academic connection and is, correctly, a self-reported yod-dropper. Carla is a secretarial assistant who staffs the front desk of a large university department; she is surrounded all day by university instructors, although she comes from a modest background. Her “confusion” as to whether she uses a yod is not surprising given her position in Table 3.10, right at the cut-off for student: her pronunciation lies midway between perceptibly yod-ful and yod-less, which is reminiscent of Chambers and Trudgill’s (1998) “fudged lects”, compromise forms found in the transition zones between isoglosses. This leaves Mario as the odd one out: a law student who correctly self-reports his yod-dropping, when we would expect him to signal “learning”, especially in an exchange with his acquaintance, an assistant professor. Mario, however, is very ambitious and intent on advancing himself; he has a keen interest in Canadian politics and is likely to run for office one day. The indexicality explanation, while not perfect, accounts for 9 out of 10 cases and is thus the preferred interpretation here.

12. This was the only such case of indecisiveness among the 10 interviewees.
The most plausible explanation for Mario’s behaviour is that he chooses not to index student for “learning” in order not to be perceived as “putting on airs”, in the sense of the quotation from Chambers above: Mario has the education, but he does not “show” it with a yod-ful pronunciation. This comparison of yod-dropping has shown that self-reporting errors lie in the over-reporting of the yod-less variant, contrary to older statements that Canadians prefer a yod-ful variant to mimic the British standard. The data show a consistent under-reporting of yod-ful variants and suggest a change in indexicality for individual lexical items, as proposed by Clarke (2006). It seems clear that the general Canadian norm is now the yod-less variant. Exceptions seem to be lexical items that are indexed for some social feature, such as “elegance and good breeding” for n[ju]s (Clarke 2006: 242) and, as suggested above, “learnedness” for st[ju]dent. The matches between self-reporting and acoustic measurements are generally good for avenue (7 of 10) and coupon (8 of 10), while student, owing to its indexicality shift, fares badly with only 4 of 9. In the latter case it was argued that a social interpretation, considering an individual’s ties to academia, offers an explanation.


The remaining variables show that self-reporting has a reliability of 70% or higher: avenue 7 of 10, coupon 8 of 10, caught/cot and Don/Dawn 7 of 10 each, and sorry/sari 9 of 10. The case of student, however, shows that WQ data need to be treated with caution. While a close relationship between reported and actual use can generally be assumed (an overlap of around 70% or higher), variables undergoing social re-evaluation fare badly: what is measured for student is not so much actual use as attitudes towards the form. The consistent over-reporting of yod-less forms confirms their status as the (new) target forms. The comparison also suggests that some respondents clearly underperform and are less reliable, Anton and Carla being two cases in point. Both appeared generally hesitant and doubtful, and their data might best be eliminated from the WQ pool. In most WQs it will be impossible to know which respondents are of the “doubting” kind, but where this is possible, perhaps in smaller studies, their responses would best be eliminated.

3.4  Chapter conclusion

This chapter began with a comparison of WQs and corpus linguistics. WQs were found to have distinct advantages over corpus-linguistic methods in their focussed elicitation techniques, which ensure comparable data, but to suffer some drawbacks relating to the reliability of the data. The latter issue was then explored in two stages: first, WQ and FI data were compared and found to be equivalent; second, WQ data were compared with orally elicited behaviour, in the form of sociolinguistic interviews, for two phonemic variables. Unless a variable is undergoing social re-evaluation, the WQ results were found to match the interview data to a degree of 70% or higher. Where social re-indexing is in progress, WQs may reflect the attitudes attached to those variables more than their actual use, as the case of student suggests. Every data collection method faces some challenges, and it is best to treat each type of data on its own merits. It is paramount not to equate WQ data directly with observed linguistic data, which is best achieved by strictly distinguishing between the verbs “use” and “report”. While it may be problematic to say, on the basis of WQ data, that, e.g., 73% of a population use feature X, it is always correct to state that 73% of a population report feature X.

Chapter 4

Types of traditional WQ variables

This chapter introduces the types of variables that have been successfully employed in WQ studies. The focus will be on the “traditional” linguistic variables used in social dialectology. Because of the special status of WQs in Canada, all examples come from Canadian English, which provides a coherent theme throughout the chapter. Extensions of this focus will be offered in Chapters 5 and 7, while the present chapter aims to present what may be called ‘best-practice approaches’ of ‘tried-tested-and-found-true variables’. First, three types of lexical variables will be presented. Two variables in the area of syntax and morphology follow, together with a third variable from what is usually considered a problem of language usage. The terms grammar, in the sense of syntax and morphology, and usage are sometimes used interchangeably and, as today’s usage often becomes tomorrow’s grammar, the three variables are presented under the same heading. The final variables are phonemic and have been shown to yield interesting and consistent results; a brief outlook then sets the stage for the next chapters.

4.1 Lexis (vocabulary)

The type of variable that linguists of many persuasions generally consider to produce reliable results in self-reporting is the lexical variable. To be more precise, core vocabulary items are considered the best type (Mather & Speitel 1975). In Canada, among the most common variables are chesterfield 'couch', tap 'faucet' and parkade 'parking garage'. Boberg (2005) is one of the most detailed synchronic studies of lexis, polling lexical items only, 53 in total. Virtually all surveys, starting with Avis (1954), the Survey of Canadian English (SCE) and Chambers (e.g. 1998b), include lexical variables, which are the most frequently polled type.


4.1.1 A Canadianism is dying out: chesterfield

The first lexical variable is one of the most widely known variables in the Canadian context and refers to chesterfield as a word for couch. While there are special kinds of couches, such as leather-upholstered couches with steel rivets that are called Chesterfields (with upper case C), chesterfield – with lower case c – is a generic name for any couch. So, instead of chilling out on the couch, some Canadians chill out on their chesterfields. Or rather, they used to, as we will see in this section.

[Image: A couch of the Chesterfield type (left) and a chesterfield in 20th century Canada (right). Source: Wikimedia Commons (photo: Dan Kamminga)]

Illustration 4.1  Chesterfield (left) vs. chesterfield (right)

Illustration 4.1 shows the difference in referents. The variable was part of the Linguistic Atlas of the US and Canada and was adapted into Avis (1954) and later the Survey of Canadian English (Scargill and Warkentyne 1972), which, as we know from Chapter 2, polled some 16,000 Canadians. The SCE asked respondents:

(4.1) What do you call a piece of furniture that seats two or three people in a row and has upholstered arms and back?
A. a sofa
B. a chesterfield
C. a davenport
D. by any other name

The SCE yielded the following results, separated by province, gender and generation (grade 9 students in 1972 or their parents):




Table 4.1  SCE data for chesterfield (question #29) in percent (Scargill & Warkentyne 1972: 86)13
(A = sofa, B = chesterfield, C = davenport, D = any other name)

    Male parents      Female parents    Male students     Female students
       A   B   C   D     A   B   C   D     A   B   C   D     A   B   C   D
NL    10  82   4   3     9  82   1   3     8  83   3   5    11  79   3   5
PE    10  75   5   9     9  79   3   8    12  56   3  28    11  61   2  26
NS     9  83   2   5    10  82   2   5     9  62   2  26     8  67   1  23
NB    11  80   3   5     8  83   1   7     8  58   3  30     9  60   1  30
QC    24  71   1   2    21  66   2   9    16  67   1  14    17  63   1  18
ON    11  86   1   3    11  82   1   5    11  68   3  17    13  61   1  24
MB    13  79   2   4     5  85   3   6     9  74   3  14    12  70   1  15
SK     7  87   2   3     9  91   1   3     8  69   3  19     7  67   1  24
AB     9  84   2   4     6  87   0   7     7  76   0  15     6  69   2  23
BC     6  88   1   3     4  90   0   5     8  72   1  16     6  67   1  25
All   10  82   2   4     8  84   1   6    10  67   2  19    10  65   1  23

Table 4.1 is a typical results table from the SCE and summarizes all the types of information that the SCE provides: generation (parents vs. students), gender and region (10 provinces and the national average) are the three social variables one can operate with. What is not in the table cannot be retrieved from the SCE. What the SCE shows for chesterfield is that it was the dominant form in all Canadian provinces and for all groups of respondents (student or parent, male or female) in the early 1970s. Among the parents, the BC female parents report chesterfield the most (90%), the Quebec female parents the least (66%). All student groups also report chesterfield as their majority form, from 56% (PEI male students) to 83% (NL male students). Generally, the level of reported use by students is lower than that of their parents. In the national data ("All" row), 82% of male and 84% of female parents report the form, compared with only 67% and 65% of male and female students. The students show considerable frequencies in column "D" (any other name): nationally 19% (males) and 23% (females), and as high as 30% among New Brunswick youth. This generational difference indicates a change in progress: in virtually every province, the younger generation reports their parents' preferred form less frequently. If they do not use their parents' preferred form, which form are they using?

13. The abbreviations stand for the Canadian provinces: NL = Newfoundland and Labrador, PE = Prince Edward Island, NS = Nova Scotia, NB = New Brunswick, QC = Quebec, ON = Ontario, MB = Manitoba, SK = Saskatchewan, AB = Alberta, BC = British Columbia.
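The parent–student comparison drawn from Table 4.1 is simple arithmetic and can be checked mechanically. The Python sketch below is illustrative only: the figures are transcribed from column B (chesterfield) of Table 4.1, while the names `CHESTERFIELD` and `generational_gap` are hypothetical helpers, not part of the SCE or any published tool. It subtracts each student group's reported percentage from that of the parents of the same gender, province by province.

```python
# Reported chesterfield (option B), in percent, transcribed from Table 4.1
# (Scargill & Warkentyne 1972: 86). Province order follows the table.
PROVINCES = ["NL", "PE", "NS", "NB", "QC", "ON", "MB", "SK", "AB", "BC"]

CHESTERFIELD = {
    "male parents":    [82, 75, 83, 80, 71, 86, 79, 87, 84, 88],
    "female parents":  [82, 79, 82, 83, 66, 82, 85, 91, 87, 90],
    "male students":   [83, 56, 62, 58, 67, 68, 74, 69, 76, 72],
    "female students": [79, 61, 67, 60, 63, 61, 70, 67, 69, 67],
}


def generational_gap(parents: list[int], students: list[int]) -> dict[str, int]:
    """Percentage-point difference (parents minus students) for each province.

    A positive value means the student generation reports the form less often.
    """
    return {prov: par - stu for prov, par, stu in zip(PROVINCES, parents, students)}


male_gap = generational_gap(CHESTERFIELD["male parents"], CHESTERFIELD["male students"])
female_gap = generational_gap(CHESTERFIELD["female parents"], CHESTERFIELD["female students"])

for prov in PROVINCES:
    print(f"{prov}: males {male_gap[prov]:+d}, females {female_gap[prov]:+d}")
```

On these figures, students report chesterfield less often than their parents in every province and for both genders, with one exception: Newfoundland males (82% among parents vs. 83% among students). The comparison describes reported, not observed, use.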


Here we can see one of the drawbacks of this SCE question: we only know that the students report something else, but not what. Had the respondents been asked to supply their preferred variant (an open-ended question), we would know; with the SCE data, one can infer only with the benefit of hindsight that the new form was couch. The most complete data on chesterfield come from Chambers' Dialect Topography of Canada, whose online database will be introduced in Chapter 8, and from Boberg's North American Regional Vocabulary Survey (Boberg 2005). Chambers (1995) gives the most complete account of the term's rise and fall and shows how WQs can enrich our knowledge of linguistic processes. Dialect Topography uses the following question, slightly different from the SCE question above, to elicit responses:

(4.2) What do you call the upholstered piece of furniture that 3 or 4 people sit on in the living room? ____________________________

Figure 4.1 shows the result of the Dialect Topography data from the Greater Toronto Region (“Golden Horseshoe”) for the early 1990s. The y-axis shows the percentages of respondents that reported one of the three major variants, chesterfield, couch and sofa. The x-axis is divided into age-cohorts: those in their 80s (and older), their 70s, their 60s, down to the respondents in their teens. 

[Chart: couch, chesterfield and sofa (%) by age cohort, from 80s+ to 10s]

Figure 4.1  Chesterfield in the Greater Toronto Region and environs, 1991/2

We can interpret Figure 4.1 in two ways: synchronically, as a snapshot of language users in the Greater Toronto Region at the time of study (1991/2), or diachronically, by employing the apparent-time hypothesis, a standard framework in variationist sociolinguistics. All other things being equal, the apparent-time hypothesis states that one can equate the language of the older generations with the language they used as young adults, for which age 20 can serve as a reasonable cut-off point. Unless there is evidence that speakers change their linguistic behaviour as they grow older, it is customary in variationist circles to work with this assumption, which seems to work particularly well with non-standard variables. We can take the language of the 80-year-olds as a window into the past when they were around 20: 80 − 20 = 60, so the answer of someone aged 80 can be taken as representing the language 60 years prior to the survey date, in this case 1931/2. We will fully explore this apparent-time approach in Section 6.1. If the assumption is correct and the respondents have preserved the language use of their young adult years, Figure 4.1 can, de facto, be interpreted as shown in Figure 4.2:

[Chart: couch, chesterfield and sofa (%) by apparent-time decade, from 1931/2 to 2001/2*]

Figure 4.2  Data from Figure 4.1 interpreted with the apparent-time hypothesis
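The conversion from age cohort to apparent-time date used for Figure 4.2 (an 80-year-old represents 80 − 20 = 60 years before the survey) amounts to a one-line formula. Below is a minimal Python sketch, assuming, as the text does, a survey year of 1991/2 and a cut-off age of 20. The function name `apparent_time_year` is hypothetical, and the results are only as good as the apparent-time assumption itself.

```python
SURVEY_YEAR = 1991      # Dialect Topography fieldwork: 1991/2
YOUNG_ADULT_AGE = 20    # age at which usage is assumed to stabilize


def apparent_time_year(cohort_age: int,
                       survey_year: int = SURVEY_YEAR,
                       young_adult_age: int = YOUNG_ADULT_AGE) -> int:
    """Year whose usage a cohort of the given age is taken to represent."""
    return survey_year - (cohort_age - young_adult_age)


# Lower bound of each age cohort in Figure 4.1 (80s+ down to the teens).
for age in [80, 70, 60, 50, 40, 30, 20, 10]:
    year = apparent_time_year(age)
    # Dates after the survey are projections from not-yet-stable teenage usage.
    flag = "*" if year > SURVEY_YEAR else ""
    print(f"{age}s cohort -> {year}/{(year + 1) % 10}{flag}")
```

The last line printed is the asterisked 2001/2 of Figure 4.2, which marks a projection rather than an observation.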

One can immediately see the attraction of the apparent-time construct, as Figure 4.2 appears to offer a window into the past without having to resort to historical sources. The 80-year-olds approximate the language of the decade following 1931/2 (remember, these data are from 1991/2), the 70-year-olds that of 1941/2, the 60-year-olds that of 1951/2, and so on. Those in their twenties at the time of polling are taken to represent the use of 1991/2, while the teenagers might even be taken to predict future use, i.e. ten years after the survey date. The date 2001/2 is asterisked to signal that it is a prediction, projecting the current use of teenagers into the future. However, as the language use of teenagers is not yet stable, one needs to be especially cautious with such projections. There is, moreover, a more general danger that Figure 4.2 or its description may be read as overly precise and matter-of-fact: one must remember that the construct assumes 'static' linguistic behaviour over a lifetime, an assumption that has its limitations. As a pedagogical model, though, Figure 4.2 succeeds in conveying the gist of a hypothesis that, at bottom, would need to be substantiated with outside, real-time evidence.

Let us assume, though, that apparent time applies in the present case and interpret the data in the light of linguistic variation and change. Figures 4.1 and 4.2 show that the variant chesterfield, which was the majority variant in the 1930s (72%), 1940s (63%) and 1950s (60%), has lost ground consistently since then and was reported by only 4% of 20-year-olds and only 3% of teenagers in 1991/2. This decline is matched by the steady increase of couch, formerly an American term, which is reported by 86% of teenagers. Note that the curve for the incoming variant couch is almost a clean S-shape (an S-curve), apart from some fluctuation among the 60- and 50-year-olds. Changes often take hold in the form of an S-curve: slowly at first, then more quickly (the steep part of the curve), and then levelling off (see Section 6.2). Sofa is an interesting minor variant that has been shown to correlate with a respondent's first or heritage language: Cantonese, for instance, uses the loanword sofa for a couch, as do French and German, which increases the likelihood that respondents with such backgrounds report sofa rather than couch (see also Section 8.2.2). Sofa has its strongest showing, up to 16%, in the middle-aged cohorts, which suggests that younger respondents more readily conform to the new Canadian norm: couch.
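The S-curve of change (slow start, steep middle, levelling off) is commonly modelled with a logistic function. The Python sketch below is illustrative only: the midpoint (1966) and rate (0.12 per year) are invented parameters, not values fitted to the Dialect Topography data, and `logistic_share` is a hypothetical helper. It computes the share of an incoming variant such as couch at each apparent-time date and the gain per decade.

```python
import math

MIDPOINT = 1966.0  # hypothetical year at which the incoming variant reaches 50%
RATE = 0.12        # hypothetical steepness of the curve (per year)


def logistic_share(year: float, midpoint: float = MIDPOINT, rate: float = RATE) -> float:
    """Share (0 to 1) of the incoming variant at a given year on a logistic S-curve."""
    return 1.0 / (1.0 + math.exp(-rate * (year - midpoint)))


years = list(range(1931, 2002, 10))            # 1931, 1941, ..., 2001
shares = [logistic_share(y) for y in years]
gains = [later - earlier for earlier, later in zip(shares, shares[1:])]

for y, share in zip(years, shares):
    print(f"{y}: {share:.0%}")
# The per-decade gains are small, then largest around the midpoint, then small again.
for y, gain in zip(years, gains):
    print(f"{y}-{y + 10}: +{gain:.0%}")
```

The largest decade-on-decade gain falls in the interval containing the midpoint, which corresponds to the "steep part of the curve" in the text; real data, as Figure 4.1 shows, fluctuate around such an idealized shape.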

The rise and fall of chesterfield

Using WQs and the simple question shown in (4.2), Chambers (1995) was able to reveal the coherent pattern of variant distribution replicated in Figures 4.1 and 4.2. These figures are based on an analysis of about 1000 responses. The sociohistorical and linguistic implications of the pattern, however, are considerable. By drawing on historical evidence and on the demographics of immigration into Canada and the Toronto region, we are able to breathe life into the percentage distributions shown above. One question to be addressed is why Canadians, for a good part of the 20th century, used chesterfield as the generic term for couch. Another concerns the processes that led to its demise, as today (and already in 1991/2) chesterfield is only a minor variant.




It is clear that the process of semantic generalization of chesterfield predates the 1930s. Looking at Figure 4.2, one may reason that if the curve of chesterfield were extended (i.e. if one had sufficient data from 90- and 100-year-olds), one might expect higher percentages in the 1920s and possibly the 1910s. Since we are reaching the limits of the apparent-time method in terms of time depth (rarely will one find enough responses from 100-year-olds, though single responses from that age group do occur), we must find other means of dating the change. As will be shown below, the available real-time evidence may serve as an important corrective to an all too literal application of the apparent-time construct. The quest for real-time evidence starts in this case with the name Chesterfield, which has strong English ties (Lord Chesterfield). Canada saw large-scale immigration from Britain and Ireland for most of the 19th century,14 which makes a transatlantic connection plausible. Since the English meaning was, and still is, confined to the specific type of Chesterfield shown on the left of Illustration 4.1, it seems that at some point the term was generalized to all long, upholstered pieces of furniture. Historical dictionaries and repositories provide an important line of inquiry to substantiate such hypotheses: Chambers uses the Oxford English Dictionary, the Dictionary of Canadianisms on Historical Principles, 1st edition (DCHP-1, see Dollinger et al. 2013 [Avis et al. 1967]), and a textbook on Canadian English (McConnell 1979) for the decisive cues on when the word entered Canadian English. The earliest attestation in print in England dates from 1900 (OED-3, s.v. "chesterfield"). The earliest attestation in Canada is from 1903 (DCHP-1), but that attestation may carry the specific English meaning rather than the generic Canadian one.
The 1903 Canadian citation, from a cowboy adventure story set in Alberta by the British writer Ridgwell Cullum, reads: "He was leaning over the cushioned back of the Chesterfield upon which the old lady was seated …." The upper-case spelling, as well as Cullum's British upbringing, suggests the more expensive, leather-upholstered couch, and not yet the generic use reported by the WQ respondents. Similarly, McConnell (1979: 131–32) notes that the term was not used at all in the 1901 catalogue of the T. Eaton Company, the country's major department store at the time. The catalogue illustrates and names an impressive array of long, upholstered seats, including a "davenport sofa", "lounge", "sofa", and "hall settee", but chesterfield is not among them. These three independent pieces of evidence suggest that chesterfield was not used long before 1900 in England (in its specific meaning) and that around 1901–03 its use in Canada also referred to the specific couch.

14. From 1815 to about 1890, British and Irish immigration was much more pronounced than immigration from any other country; see Section 6.6.1.


By 1918, however, the word had been embraced by advertisers. DCHP-1 cites the following newspaper advertisement: "Chesterfields, all-over upholstered frame, deep spring back, roll-shaped arms, and spring seat and edge covered in floral tapestry and velour, reg. $110 for $85.00." This description, combined with the extraordinarily high price, suggests the traditional (specific) Chesterfield rather than a general term. A comparison with McConnell's price data is telling: the largest piece offered by Eaton's in 1901, the davenport sofa, cost $32.50, while the couch cost only $7.90. Given that the term was common enough by 1918 to be used without explanation in an advertisement, it was presumably also on its way toward being generalized as the name for all types of long, upholstered seats. The 1918 advertisement appears to mark a watershed in the term's generalization from a specific item to any item of a category. Chesterfield appears to have taken on its generic Canadian meaning and become the standard term in Canadian English sometime in the 1920s. The data in Figure 4.2 pick up shortly afterwards and document its declining use since the 1960s. For the five decades between the 1920s (post-World War I) and the 1960s, however, chesterfield was the majority term in the Greater Toronto area and, by extension, in Canada (this can be shown with data from other regions; see Chapter 8). Further evidence corroborates this interpretation. In the 1950s, Avis used the variable as his first example of differences between Canadian and American English, in a scenario worth retelling here:

Not long ago a Torontonian shopping in a large department store just across the border asked where he could find chesterfields. On following directions, he was somewhat dismayed to find himself at the cigar counter! (Avis 1954: 13)

The term was also dominant in Canadian literature at the time and was used by literary greats such as Hugh MacLennan, Margaret Atwood and Margaret Laurence. Boberg (2010: 117) cites a literary attestation of its declining currency from Margaret Laurence's 1974 novel The Diviners: "furnished with Danish Modern, long teak coffee tables, svelte things to sit on (you could not call them sofas or chesterfields, both words having unseemly old-fashioned connotations)" (ibid.). Figure 4.2 shows that by the 1970s, reported use of chesterfield was below 30%, and the reason, as Laurence indicates, is that the term had become outdated and unfashionable. The 1970s seem to have been the crucial period of rapid change from chesterfield to couch. In another Canadian region, the Kootenays in the BC Interior, Gregg (1973) reported 56% couch and only 40% chesterfield, with sofa and other responses at 2% each: "contrary to expectation," Gregg wrote, "the preferred General Canadian (GC) form chesterfield has lost ground heavily to couch" (1973: 108). From today's perspective, this was the beginning of its rapid decline. From a national sample of WQs,




Boberg (2010: 191) replicates most aspects of Figure 4.2, thereby independently confirming the scenario.

How did chesterfield come to be generalized? The onset of semantic generalization could be dated to the 1920s, some 20 years or one generation after the first attestations in Canada. It is often insightful to correlate settlement patterns with linguistic data. Chambers (1991) classified immigration to Canada into major "waves", starting with the American Revolution (see Section 6.6.1). One possible scenario for the development of generic chesterfield involves the immigration wave that lasted from about 1890 to the outbreak of World War I in 1914: alongside the continental Eastern European immigrants who characterize the period, a significant number of British working-class emigrants came to Canada, and it is the latter who offer the most plausible link. Many of these British working-class immigrants would not have been in a position to afford a chesterfield (in either the specific or the generic meaning) in their homes in the Old Country, but they would have been exposed to Chesterfields in the specific meaning if they had been employed at an English country house, for instance as a maid or footman. From there, generalization from a specific type of upholstered bench to any kind of upholstered bench is possible. Once in a position to acquire such a piece of furniture for their Canadian homes, they may have used the term, but this time in a generalized way. This scenario (expanded from Chambers 1995) is one possible interpretation of how chesterfield came to be generalized in Canada. Another hinges on trade relations between the UK and Canada and on (Canadian) advertising language (Erik Schleef: correspondence), with its desire to "uptalk" ordinary sofas by calling them chesterfields (with lower case). WQ data played a vital role in revealing the specifics of the case of chesterfield.
It is, arguably, one of the most widely known and best-documented Canadianisms, though it is also a historical Canadianism and as such no longer characteristic of CanE.

4.1.2 A Canadianism is staying put: The case of tap

Of course, not all long-standing Canadianisms are dying out like chesterfield. In some cases, a British form came into use and held its ground, so that it is now standard Canadian. One such variable is the name for the valve that releases water into sinks in kitchens and bathrooms. It is called a tap in Britain, and likewise across Canada, with some competition from the US term faucet. In the USA, spigot occurs as well, as does tap, though the latter is used there at very low frequencies, with the exception of the compound noun tap water. The SCE data are offered in Table 4.2. The question and answer options, which are needed to interpret the SCE's results, are listed in (4.3):


(4.3) What do you turn on to run water into a basin or sink in the house?
A. tap
B. spigot
C. faucet
D. valve

Table 4.2  National SCE data for tap (question #31) in percent (Scargill & Warkentyne 1972: 87)
(A = tap, B = spigot, C = faucet, D = valve)

    Male parents      Female parents    Male students     Female students
       A   B   C   D     A   B   C   D     A   B   C   D     A   B   C   D
NL    73   1  24   1    70   3  23   1    82   2  12   3    81   1  16   1
PE    87   1  10   1    88   1  10   1    92   2   5   0    93   1   5   1
NS    88   0  11   0    90   0   9   0    91   1   7   2    95   0   4   0
NB    89   0  11   0    85   1  13   0    92   2   5   1    95   0   4   1
QC    88   1   8   1    90   1   8   0    84   2  12   1    89   1   9   0
ON    91   1   8   0    90   0   8   0    88   2   9   0    93   1   5   0
MB    87   1  10   2    90   1   8   0    90   1   7   1    93   1   6   0
SK    92   0   6   1    92   1   6   0    87   2   8   1    92   1   7   0
AB    94   0   5   1    95   0   4   0    94   0   5   0    95   1   3   0
BC    92   0   6   0    92   0   6   0    92   2   6   0    95   0   4   0
All   89   1   9   1    89   1   9   0    89   1   7   1    93   1   6   0

In contrast to chesterfield, the responses for tap do not decrease from the parent to the student generation, which is a good indicator that tap is not undergoing change and is a stable linguistic variable. On the contrary, in a number of contexts the students show slightly higher response rates than their parents, for instance in PEI, NS and MB for both genders. From this apparent-time data we can infer that tap was here to stay. Newfoundland has lower response rates for tap in the parent generation, but the province's student generation is moving closer to the national standard form tap. Newfoundland joined Canada only in 1949, as its 10th province. Since then, the speech of young Newfoundlanders has been oriented towards mainland Canadian norms to a greater degree than before, and this increase of tap can be attributed to that orientation. Faucet makes a showing in the 1972 data but, with the exception of Newfoundland, barely rises above the 10% mark. Spigot, a major US midland variant, is selected in 3% of responses or fewer and is thus marginal. Figure 4.3 offers data from seven Canadian regions from the Dialect Topography database, collected a generation to a generation-and-a-half after the SCE. It provides a real-time benchmark against which to check the above prediction, based on the 1972 SCE data, that tap is a stable variant. In this later survey, tap is shown in every Canadian region to still be the




majority variant, with scores in the low 60s in Montreal, the Eastern Townships and Quebec City, all of which are locations in Quebec. Outside of Quebec, percentages reach the high 70s in Greater Toronto and New Brunswick. These WQ data offer strong evidence of tap's majority status across the nation. Faucet, however, is now a stronger variant in Canada than in the SCE, with substantial numbers in all locations. It is strongest in the Eastern Townships, with more than a quarter of all answers. A small percentage of respondents report using both forms. Spigot, on the other hand, has lost even more ground: there is only a single answer, from a middle-aged man in Quebec City, among several thousand responses. One may argue that even the original SCE data exaggerate the responses for spigot: spigot was listed as option (B) on the questionnaire, and a listed option generates more responses than an open answer field would. Some respondents, for instance, may pick the "more exotic" variant they are familiar with.

[Chart: faucet, both and tap (%) in Quebec City, New Brunswick, Eastern Townships, Montreal, Ottawa Valley, Greater Toronto and Metro Vancouver]

Figure 4.3  Tap and faucet in the Dialect Topography of Canada database (source: Chambers 2008)

One can see the difference when (4.3), the SCE question, is compared with (4.4), the open-response question from Dialect Topography:

(4.4) What do you call the knob you turn to get water in a sink? ____________________


It is one thing to select a variant from a list, but quite another to write in a variant that one does not actively use, which is why the Dialect Topography data are more reliable here (see 7.3.1 on types of questions). Spigot is a regional American variant that is used in Philadelphia and, ranging outward from there, in the midland American dialect region (Chambers 2008: 17). Since Pennsylvania was one of the major input areas into Canada following the American Revolution in 1776, one might expect spigot to be a variant in CanE. The almost complete absence of spigot in the Dialect Topography data suggests that the Loyalist input, prior to 1800, was too early for the term to have travelled with the Loyalists (or, alternatively, that it died out in the interim, though this explanation is less likely). The earliest Canadian attestation of spigot in a water context is from 1849, from the Lower Canada Agricultural Society, discussing the introduction of sewer pipes:

I have been in the habit of making these pipes from the well known clay at the Drongan Pottery, spigot and faucet, which, for common sewers, […] are made with eyes, one of which is placed opposite each house, and a smaller pipe led from thence to convey the waste water from the dwellings into a main conduit. (Bank of Canadian English)

It is significant that spigot and faucet are used here in the context of sewers and not of fresh-water pipes, and that both have meanings different from the bathroom knob. In Philadelphia, spigot underwent a semantic shift towards the water pipes in houses, which became widely available only in the last quarter of the 19th century. Chambers (2008: 18) argues that tap was the result of the second wave of immigrants. Almost a century after the American Loyalist immigration, indoor plumbing became more widely available in urban areas. By that time, British immigration had brought the British term for the appliance: tap. One might reason that Canadian–British trade relations solidified the use of the non-American term in Canada.

So far we have only discussed the regional distribution of the variable in seven locations. The question of whether tap continues to be a stable variant needs to be addressed differently. Figure 4.4 uses an apparent-time graph to show that the younger age cohorts in Greater Toronto (30s and under) do not report tap considerably less often than the older age cohorts. They are, in fact, reporting tap slightly more often than the 70s and 80s+ age cohorts. The minor fluctuations shown in the figure are typical of empirical data and should come as no surprise. Note that tap and faucet do not add up to 100%, as minor variants are not included. Such flat lines among the younger age cohorts suggest a stable variable for the next generation(s) to come, a hypothesis that could be confirmed (or refuted) by a new WQ study.




[Chart: tap and faucet (%) by age cohort, from 80s+ to 10s]

Figure 4.4  Greater Toronto (1991/2) tap vs. faucet (Dialect Topography of Canada)

4.1.3 A Canadianism is entering the scene: take up #9

Canadianisms are not only dying out or holding steady; new ones come into existence all the time. Boberg (2005) lists a number of more recent variables in the Canadian context. Some of them have a national character, such as candy bar (AmE) vs. chocolate bar (CanE); the name for the first year in school after kindergarten: first form (BrE) / first grade (AmE) / grade one (CanE); or frosting (AmE) vs. icing (CanE) on a cake. Other terms have a regional dimension and are dominant in some but not all parts of the country. Examples include pizza with all the standard (and free) toppings: all-dressed / deluxe / everything-on-it / loaded / supreme / the works; a sweatshirt with front pockets and a hood: bunny hug (a Saskatchewan term) / hooded sweatshirt / hoodie / kangaroo jacket; or a multilevel building for parking cars: car park / indoor parking / parkade / parking garage / parking lot / parking ramp. A particularly interesting example of a previously undetected variable is what I call take up #9 (for a fuller treatment, see Dollinger in press, b). As a phrasal verb, take up has many meanings, but meaning number 9, so called here because it is listed as the ninth meaning in the Canadian Oxford Dictionary, has a specifically Canadian dimension that had hitherto gone undocumented. In 2007, a Canadian professor teaching in the United States suggested that the term might be Canadian in its specific meaning of "going over the answers to a test, quiz etc." In sentences such as "Let's take up the exam after lunch," it is clear to those who know the term that the exam, after having been marked, is discussed in class to identify the correct answers.


Anecdotal evidence suggests that this particular meaning is known mostly to Ontarians. Existing dictionaries, however, offer little guidance. The OED, s.v. take (v.), 93. take up, gives some general definitions that might offer an explanation for a semantic change, such as meaning "(c) With special obj., implying a purpose of using in some way, such as in to take up a book (i.e. with the purpose to read)." Neither the American Heritage Dictionary (4th ed., 2000) nor the Dictionary of American Regional English (Cassidy and Hall 1985–2013) lists the meaning. The Canadian Oxford Dictionary (Barber 2004) includes the term with the desired meaning as number 9 of take, but does not offer any information on its regional status; most significantly, it is not marked as Canadian. The earliest unambiguous citation in writing is from the 1992 Edmonton Journal, reporting on student life in a Toronto high school (Bank of Canadian English):

"If you only got 55 per cent in Math 23, you probably won't pass this course and someone should have advised you on this," he tells the class. "I'm not telling you to leave, but no one wants to stay in a course and fail." After taking up the test, he gives us plenty of time to do our homework. Everyone pulls out calculators and a couple of kids turn on their Walkmans. (Edmonton Journal, 30 Oct. 1992: A1, By-line: Toronto, ON)

Given that take up #9 is predominantly used in spoken discourse, its first written attestation in 1992 almost certainly understates its real "age" considerably. The most conclusive data come from WQs. Data collected in the spring of 2010 with surveymonkey.com15 offer a convincing picture that supplants the anecdotal evidence. In this small WQ study, 159 respondents answered, among whom 32 spent their formative years in Ontario and 40 in the US, besides respondents from elsewhere in Canada and from around the world. They were asked to explain in their own words the meaning of take up in the following sentence:

(4.5) The professor took up our test in math class this morning.

The majority of respondents interpreted the phrasal verb as "pick up the test", some as "began" or "start", while others simply said that they "had no idea." A few guessed the meaning more or less correctly, such as a 30-year-old female from Germany with no connection to Canada, who answered "perhaps: 'he talked about the topic of the test in class this morning'". Clearly, it is not impossible to guess the correct meaning from the context. Since we aimed to identify those who were familiar with the construction, we excluded answers that were clear guesses, such as those that ended with a question mark or included modal adverbs such as maybe or possibly. For some reason, second language speakers of English were fairly good at guessing the meaning. Native speakers were not, probably because they were bound by other meanings, such as "collect," or simply refused to accept the sentence as grammatically correct. Figure 4.5 shows the results for 'going over the answers' according to where the respondents were raised in their formative years, from 8 to 18. What stands out in this small sample is that more than 93 percent of Ontarians unambiguously paraphrased the target meaning. The Albertans came next (50 percent), followed by the British Columbians (25 percent). Responses from other Canadian regions were too few to allow meaningful conclusions. Americans (from New York State and other regions) generally could not interpret the sentence correctly. It seems that take up 'go over' is an Ontarian regionalism and, to judge from anecdotal evidence, has some currency in Manitoba. Some Ontarians offered revealing statements, such as the following explanation by a Torontonian in his twenties:

I can categorically state that the meaning of this sentence is crystal clear: as everyone knows, the use of "took up" is a simple idiom indicating that the professor went over the correct answers to each question for the benefit of those who were wanting to know.

15. Administered by undergraduate student Caitlin Bethune.

Even people who moved to Ontario as adults identified take up correctly, including a German native in his 60s who came to Ontario at the age of 30. Comments by other respondents provide further evidence of its regional provenance. One Quebec native says: “Since coming to U of T [University of Toronto], I’ve learned that it means ‘to discuss, to go over’. I hadn’t heard that before.”

Figure 4.5  Correct answers for take up #9 ‘go over’, by place of formative years (8–18); bar chart in percent for Ont, BC, AB, NY and elsewhere in the US
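The coding steps described above, discarding hedged guesses and then computing the share of correct paraphrases per region of formative years, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names, the tuple layout of `responses`, the manually coded `is_target_meaning` flag and the extra marker perhaps are all hypothetical. The text also does not say whether guesses were dropped from the sample or merely not counted as correct; the sketch assumes the latter. The sample data are invented.

```python
import re
from collections import defaultdict

# Guess markers named in the text: a final question mark and modal adverbs
# such as "maybe" or "possibly". "perhaps" is our own addition (assumption).
GUESS_ADVERBS = {"maybe", "possibly", "perhaps"}


def is_guess(answer: str) -> bool:
    """Return True if a free-text answer is hedged like a guess."""
    text = answer.strip().lower()
    if text.endswith("?"):
        return True
    words = set(re.findall(r"[a-z]+", text))
    return not GUESS_ADVERBS.isdisjoint(words)


def percent_correct_by_region(responses):
    """Percentage of respondents per region who paraphrase the target meaning.

    `responses` is an iterable of (region, answer, is_target_meaning) tuples,
    where `is_target_meaning` is the coder's judgement (hypothetical field).
    Hedged answers stay in the denominator but never count as correct.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for region, answer, is_target_meaning in responses:
        totals[region] += 1
        if is_target_meaning and not is_guess(answer):
            correct[region] += 1
    return {region: 100 * correct[region] / totals[region] for region in totals}


# Invented toy data, for illustration only.
sample = [
    ("Ont", "he went over the answers", True),
    ("Ont", "maybe he discussed the test", True),
    ("NY", "he collected the test", False),
    ("NY", "he went over the answers", True),
]
print(percent_correct_by_region(sample))  # {'Ont': 50.0, 'NY': 50.0}
```

Keeping the guess filter separate from the tabulation makes it easy to test a stricter or looser definition of a guess without touching the regional counts.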

How did take up come to acquire the meaning of “go over the answers” in one corner of the English-speaking world? The British do not seem to use the construction with this meaning at all, not even regionally, as a follow-up study showed. Neither the OED nor the English Dialect Dictionary (Wright 1898–1905) lists it. Moreover, none of the New Zealanders or Australians we asked knew the meaning; they most often took it to mean “to collect” or “to begin.” As seen above, it is also absent from the American dictionaries consulted.

102 The Written Questionnaire in Social Dialectology

Interpreting the data on take up #9

It appears that the meaning developed independently in Ontario. Early New York English was one input variety into Ontarian English (Dollinger 2008a: 68), but fewer than 10 percent of the New Yorkers in Figure 4.5 recognized the expression. The evidence points to take up as a Canadian innovation rather than the preservation of a colonial form, like tap, or the generalization of a more specialized form, like chesterfield. Figure 4.5 also shows that some western Canadians report take up #9. This is not surprising: the most likely explanation is that the meaning has been spreading westwards out of Ontario across Canada. There are vital links between the cities of Alberta and British Columbia and central Canada, and movement back and forth between them has been constant since the completion of the trans-Canada railway in 1886. As an example of the strength of these links, our research revealed two Ontarians who had been teaching in Vancouver for two decades and had been using take up #9 all those years without realizing that the local student population was not generally familiar with the expression. Statements such as the following, from a British Columbia woman in her 20s, are also telling:

I never use this phrase, though I recognize it from English speakers of other backgrounds. Maybe it’s the British who say it? Australians? I’m not sure offhand.

Her familiarity with take up #9 most likely comes not from British or Australian speakers but from Ontarians, perhaps even from the two Ontarian teachers. The same kind of dialect contact likely explains why many Albertans know the construction. There are substantial population flows between Ontario and Alberta, which have fluctuated with the fortunes of the oil industry, although other regions also contribute to Alberta’s oil-driven population boom. Figure 4.6 shows that Ontario out-migration has tracked Alberta in-migration reasonably closely since the 1970s. With the exception of the mid-1980s and 2009/10, Alberta has had net in-migration over the entire period. When sky-high oil prices set off a modern-day gold rush in Alberta in the 1970s, more than 915,000 people from within Canada moved to the province between 1971 and 1982. The peak, as can be seen in the lighter solid line, occurred in 1980/81 (with almost 110,000). The net gain, in a province of 1.6 million in 1971, was a stunning 300,000 over an 11-year period, or almost a fifth (18.75 percent) of the 1971 population. The number of people flocking to this oil-rich province fell sharply as a result of the worldwide recession, reaching its lowest point in 1983/84 (just over 41,000 that year). Only since the late 1990s has in-migration again reached the levels of the late 1970s, interrupted by a significant decline at the height of the financial crisis in 2009/10 (to just 58,000), and in 2011/12 the number of newcomers exceeded the record figures of the earlier oil boom. In short, Ontarians are moving to Alberta, with some temporary lulls, and they bring their linguistic features, such as take up #9, with them.

Figure 4.6  Canada Census data 1971–2012, In- and Out-migration of ON and AB (absolute frequencies). Source: Statistics Canada, Table 051-0018. Line chart, 1971/72 to 2011/12, y-axis 0 to 120,000; series: Ontario in-migrants, Ontario out-migrants, Alberta in-migrants, Alberta out-migrants

It is important to note, though, that take up #9 does not seem to be spreading into the United States. Like other recent linguistic changes (Chambers 2009: 72–73), take up #9 illustrates the strength of the U.S.–Canada border as a linguistic boundary. Even within Canada, however, take up #9 is not at present generally used or understood across the country. One Vancouverite in his 20s makes this plain in his response to the test sentence. “It means absolutely nothing,” he writes, and then becomes increasingly irritated:

How could you expect a sentence of absolute jibberish [sic!] would mean anything to me. I take it as a personal insult that you expect me to be able to interpret this nonsense!

A reaction like this one would be unthinkable in Ontario, where the expression is widely used. Its narrow regional distribution has obscured its status as a Canadianism. One of the best sources for present-day Canadianisms is the Canadian Oxford Dictionary, 2nd edn. (2004), but even this dictionary missed the regional restriction of take up #9. The meaning itself is listed, but it is not marked Cdn or Ont; instead, it is presented as a general English sense with no regional or national label.


Tracing take up #9 in the Canadian Oxford

A closer look at the entry in the Canadian Oxford is instructive, as it shows how the specifically Ontarian meaning got into the dictionary but remained unlabeled. Like many other dictionaries, the Canadian Oxford was not started from scratch; it took the eighth edition of the Concise Oxford Dictionary (⁸1990) as its starting point. Table 4.3 compares the two entries. They are very similar, with two changes relevant to our case: sense 6 of the Concise Oxford (‘interrupt or question (a speaker)’) is moved to 14 and replaced by a new sense 6, and sense 9 is added in the Canadian Oxford (¹1998): “9 go over the correct answers to (homework, an assignment, a test, etc.).” That the new sense was inserted as number 9 suggests that the editors judged it to be of moderate frequency (less frequent than senses 1–8 and more frequent than those that follow). Its absence from the Concise Oxford is an indirect clue to its Canadian provenance, yet the Canadian editors apparently did not know about its regional base in Ontario. So why was it not marked Canadian, or Ontarian? One plausible explanation is that the Canadian Oxford editorial team, based in Toronto and with almost all its members having strong ties to Ontario,¹⁶ considered the meaning so commonplace that they did not suspect it might be characteristically Canadian. Whether or not this accounts for the omission, the example illustrates how difficult it can be to uncover regional patterns.

Table 4.3  Comparison of OUP dictionary entries for take up #9 (Concise Oxford Dictionary, 8th edn. and Canadian Oxford Dictionary, 1st and 2nd edn.)

Concise Oxford Dictionary (8th edn.)
take up 1 become interested or engaged in (a pursuit). 2 adopt as a protégé. 3 occupy (time or space). 4 begin (residence etc.). 5 resume after an interruption. 6 interrupt or question (a speaker). 7 accept (an offer etc.). 8 shorten (a garment). 9 lift up. 10 absorb (sponges take up water). 11 take (a person) into a vehicle. 12 pursue (a matter etc.) further. take a person up on accept (a person’s offer etc.). take up with begin to associate with.

Canadian Oxford Dictionary (1st edn.)
take up 1 become interested or engaged in (an interest, pursuit, hobby, etc.). 2 adopt as a protégé. 3 occupy (time or space). 4 begin (residence etc.). 5 resume after an interruption. 6 join in (a song, chorus, etc.). 7 accept (an offer etc.). 8 shorten (a garment). 9 go over the correct answers to (homework, an assignment, a test, etc.). 10 lift up. 11 absorb (sponges take up water). 12 take (a person) into a vehicle. 13 pursue (a matter etc.) further. 14 interrupt or question (a speaker). take a person up on accept (a person’s offer etc.). take up the gauntlet accept a challenge. take up with begin to associate with.

16. Five team members were from Ontario and the sixth from Calgary. The editor-in-chief, Katherine Barber, is from Winnipeg, where the expression is reported to be used.




4.2 Morphology

Lexical items are not the only features that have been polled with WQs. Morphological and syntactic features are sometimes considered more difficult to elicit successfully with the method, because WQs rely on writing, and written language is closely linked with prescriptive traditions of right and wrong. This may lead to more standardized answers than for, say, vocabulary. Among the morphological variables successfully elicited in CanE (and AmE) are the past tense forms of dive and sneak. When the Reverend A. Constable Geikie gave a lecture in 1857 on the infractions of English he had witnessed during his short stint in Canada, he included the following comment:

In England, when a swimmer makes his first leap, head foremost, into the water he is said to dive, and is spoken of having dived, in accordance with the ordinary and regular construction of the verb. Not so however, is it with the modern refinements of our Canadian English. In referring to such a feat here, it would be said, not that he dived, but that he dove. Even Longfellow makes use of this form, – so harsh and unfamiliar to English ears […] (Geikie 1857 [2010: 49–50])

The sarcasm is obvious, and Geikie’s prescriptive airs ring loud and clear as he describes what he calls “lawless and systemless changes” in Canadian English. Geikie did not approve, but his objections went unheeded. Both SCE (questions 12 and 23 on dove, and question 89 on snuck) and Dialect Topography gathered data on these morphological variables (Chambers 1998b: 19–25). The data in Figure 4.7 come from the Dialect Topography questions in Example (4.6). Two questions were asked for each variable because some speakers claim to distinguish between inanimate subjects (the submarine) and animate subjects (he), or between noun phrase subjects (the little devil) and pronoun subjects (he).

(4.6) a. Q34. Yesterday he ________________ into the quarry.
      b. Q69. The submarine ________________ to the floor of the sea.
      c. Q46. Which do you say?
         ☐ The little devil sneaked into the theatre.
         ☐ The little devil snuck into the theatre.
      d. Q59. Which would you say?
         ☐ He snuck by when my back was turned.
         ☐ He sneaked by when my back was turned.

As can be seen in Figure 4.7, snuck and dove have made inroads among the younger respondents: snuck increases consistently from older to younger respondents, in what almost resembles a complete S-curve, while dove is somewhat erratic for those over 70 but otherwise regular as well. There is hardly any distinction worth mentioning between the types of subjects, whether inanimate, animate, pronoun or noun phrase. For snuck, however, the pronoun subject (he) consistently triggers more responses than the noun phrase subject (the little devil) from the 80- to the 30-year-olds, which might be the legacy of a pronoun-subject rule that some respondents appear to be familiar with.

Figure 4.7  Dove (not dived) and snuck (not sneaked) in Greater Toronto by age (1991/2 data); series: he dove, sub dove, snuck by, snuck into

Today, dove must be considered the standard CanE past tense form, not dived. The spread of dove is generally viewed as a homogenization process in which all of North America participates. Originally a northern US form, dove spread from there into other parts of the US and Canada (Chambers 1998b: 22), so that today it is a North American form. Snuck, the second morphological example, might be a form that Geikie missed. Had he heard it, he would not have liked it either, but he was probably in no danger of hearing it, because snuck appears to be a later development. In a study of dictionary evidence on snuck, Creswell (1994: 145) identifies Webster’s New International Dictionary (2nd edn., 1934) as the first reference work to mention snuck. While it was marked “dialectal” in that publication, Creswell shows that over the course of the 20th century snuck became “well established, fully standard, and in widespread general use in both the U.S. and Canada, and in growing use in Britain and Australia” (1994: 147). Figure 4.7 confirms Creswell’s assessment of the change. Snuck accelerated in the 1940s and 1950s (as seen in the 50-year-olds), established itself as the majority form in the 1960s and 1970s, and by the 1990s approached categoricity (more than 90 percent) among people younger than thirty. Figure 4.7 illustrates a typical linguistic change in progress: snuck has become the standard form in CanE and, indeed, in North America. In the context of World Englishes, one might reason that the past tense form snuck, not sneaked, would become part of a global standard.
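A logistic function is a common way of describing the kind of S-curve that snuck traces in apparent time: slow at first, rapid in the middle, then levelling off near categoricity. The sketch below only illustrates the shape of such a curve. The parameters `midpoint` and `rate` are hypothetical and were not fitted to the Dialect Topography data, and using birth year as the time axis is our simplifying assumption.

```python
import math


def logistic_share(t: float, midpoint: float, rate: float) -> float:
    """Share (0 to 1) of the incoming form at time t under a logistic S-curve.

    midpoint: time at which the new form reaches 50 percent.
    rate: steepness; larger values give a faster change.
    """
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters, NOT fitted to the Dialect Topography data:
# t is a speaker's birth year, and the change crosses 50 percent around 1950.
for birth_year in range(1910, 1991, 10):
    share = logistic_share(birth_year, midpoint=1950, rate=0.1)
    print(birth_year, f"{100 * share:.0f}%")
```

With real survey data, the two parameters would be estimated from the age-group percentages rather than chosen by hand.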

Snuck as a global form?

There is indeed good reason to assume that snuck might become the standard in many varieties of World English. Because snuck and sneaked are fairly distinctive word forms, normalized internet searches¹⁷ can indicate their use in a wider context. The data are calculated and displayed as complementary shares (Dollinger 2015): a value of 76 percent snuck in Canada (.ca) implies that the remaining 24 percent of uses are sneaked. The higher the bar, the more snuck and the fewer sneaked occurrences were found. Figure 4.8 shows the percentage of Google hits per domain name in 2010 and 2015. Canada (.ca) is quite advanced in its use of snuck, as are the American domains .edu and .gov and, in 2015, China (.cn), which has taken the lead with a snuck share of more than 90 percent, up from the mid-30 percent mark only five years earlier. It is figures such as these that testify to the vibrancy of China in the World English context.

Figure 4.8  Snuck (not sneaked) by internet domains in % (31 May 2010 & 4 March 2015); bars for 2010 and 2015 showing percent of snuck vs. sneaked for .ca, .edu, .gov, .uk, .ie, .au, .za, .pk, .in, .se, .de, .ru, .cn, .hk and .com

17. Normalized with the method used for the Dictionary of Canadianisms on Historical Principles, Second Edition; see Dollinger (2011b, 2015) and Brinton (in press).


Let’s step back to the view from 2010 (dark bars). The 2010 data show that snuck was the prevalent form in 9 of the 15 domains. To gauge the worldwide spread of snuck, the figure includes domains for Inner Circle countries (where English is spoken natively), Outer Circle countries (where English has had a long colonial history) and Expanding Circle countries (where English is taught merely as a foreign language) (Kachru 1985; see Section 5.2). Even in the UK and Ireland (.ie), the most conservative Inner Circle countries, sneaked had lost its status as the sole past tense and past participle form, with about one third of the tokens being snuck. Historical ties with the UK play only a marginal role for Inner Circle countries: in 2010, Canada was among the leaders of the change, as were the American domains .edu and, to a lesser extent, .gov. For the latter two, the snuck shares are slightly lower, presumably because of the more formal registers to be expected on post-secondary education and government sites. In the southern hemisphere, too, historical ties with the UK do not show in the 2010 data: Australia (.au) had the highest snuck score of the Inner Circle countries, with 77 percent, and South Africa (.za) was about equally split. All of these countries were British colonies and are part of the Commonwealth of Nations. Only in the Outer Circle countries is an effect of more conservative British norms still felt, with Pakistan (.pk) and India (.in) showing a similarly low share of around 30 percent snuck; in 2010, both countries were in fact more conservative than the UK. Figure 4.8 also includes English as a Foreign Language (EFL), or Expanding Circle, varieties. The 2010 data from Sweden (.se), Germany (.de) and Russia (.ru) reflect different traditions of teaching English: in Germany, and especially in Sweden, English was part of school curricula for the better part of the 20th century, with British English as the target norm. In Russia (.ru), English spread more recently, at a time when the US was the dominant English-speaking nation, which may explain why Russian web users went directly to the newer form snuck, bypassing the more traditional sneaked of European language teaching. By far the biggest domain on the web is .com. Since it is used by businesses around the globe, it cannot be attributed to a particular country. It may, however, serve as an indicator of Global English, and here snuck was clearly in the lead in 2010: more than 1.9 million tokens of snuck and only 780,000 of sneaked were found in .com, which suggested that snuck would likely become the form of the global standard.

The 2015 data only partly confirm the 2010 prediction that .com could serve as a simple bellwether for the new global standard of English. Paradoxically, snuck decreased slightly in .com compared with 2010. Russia (.ru), firmly in snuck territory in 2010, has lost ground to sneaked, while in the UK snuck is now slightly more common than five years earlier. Ireland, South Africa, India, Pakistan and Sweden, a mixed bag of different types of English, all show increases in snuck. Could it be that British prestige is holding its ground in some areas, or even fighting back (.ru and .com)? Then, of course, there is China (.cn), which shows a dramatic increase in just a few years, morphing from a “snuck Cinderella” in 2010 into the new leader in 2015. Some say, not without good reason, that the future of English depends on China. We will explore these global issues further in Chapter 5.
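The complementary display used for Figure 4.8 comes down to a simple ratio: the snuck share is snuck / (snuck + sneaked), and sneaked takes the remainder. The sketch below applies that ratio to the raw 2010 .com counts quoted above. It deliberately omits the normalization method cited in footnote 17 and treats “more than 1.9 million” as exactly 1.9 million, so its result need not match the bar in the figure; the function name is our own.

```python
def snuck_share(snuck: int, sneaked: int) -> float:
    """Percentage of snuck among all snuck + sneaked tokens.

    The two forms are treated as complementary: 100 minus the result
    is the share of sneaked.
    """
    total = snuck + sneaked
    if total == 0:
        raise ValueError("no tokens of either form")
    return 100 * snuck / total


# Raw 2010 .com counts quoted in the text (not normalized).
share = snuck_share(1_900_000, 780_000)
print(f"snuck {share:.1f}%, sneaked {100 - share:.1f}%")  # snuck 70.9%, sneaked 29.1%
```

Because the two shares always add up to 100, a single bar per domain is enough to show both forms.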

4.3 Syntax and usage

WQs have also been used to investigate syntactic variables, some of which represent interesting usage questions that involve conventions of style and diction. In this area, grammars and usage guides aim to regulate, or at least recommend, certain forms over others. We have seen in the case of dove that some commentators, such as the Rev. A. Geikie, have strong feelings about particular forms. Most commentaries on usage, by which we understand how a language is customarily spoken or written as opposed to some authority’s taste and preferences, are found in the area of syntax. This section reviews the traditional (and somewhat limited) ways of eliciting syntactic features, while Chapter 7 presents more recent approaches to social perspectives on non-standard syntax. We will now look at three variables: two are connected with prescriptive grammar rules, while the third is more frequently encountered in ESL instruction.

4.3.1 Different from/than/to?

The first syntactic variable is a classic among English usage questions. It concerns the preposition following the adjective different, as in question (4.7) from Dialect Topography:

(4.7) Which do you say?
      Our house is very different to yours.
      Our house is very different than yours.
      Our house is very different from yours.

Many texts and style guides devote multiple columns to settling the question of which variant is “right”. In North American English, competition is fierce between different from and different than. Different to occurs in less than five percent of responses and plays no role on this continent, although it has some currency in British and Australian English. Current usage guides rarely go so far as to reject any of the three forms categorically. However, conservative writers still advise against different than on the assumption that than should imply a comparison of degree, as in she is taller than you. Since (4.7) involves no such comparison, “writers should generally prefer different from” (Garner 2003: 249). The OED (s.v. different adj. meaning 1b.), in an entry published in the 1890s, states that “The usual construction is now with from”. Different from, then, carries overt prestige and is the form one would expect most commonly in official texts and broadcasts. This can be shown empirically. Gregg (2004: 84) confirms the prestige status of different from for Vancouver in the late 1970s: 70 percent of his university-educated respondents considered different from “correct”, as did 77 percent of teachers, but only 52 percent of those without postsecondary education. Figure 4.9 shows the Greater Toronto data for all three variants. It becomes clear that different than and different from are neck and neck in the youngest age cohort and that different from is no longer the “usual construction” it once was: different than is steadily gaining ground in apparent time. The Guide to Canadian English Usage (Fee and McAlpine 2007: 177) no longer recommends either of the two forms and takes a practical approach: “Different to is a British form […] this form is uncommon in Canadian English. Whether you use from or than, it is important to make sure that readers know what is being compared with what.”

Figure 4.9  Different from/than/to in Greater Toronto; percent by age group (over 80, 70–79, 60–69, 50–59, 40–49, 30–39, 20–29, 14–19), series: to, than, from

Using either from or than is precisely what Canadians seem to be doing, not just in one region but almost everywhere. Figure 4.10 shows the traditional prestige form different from in seven Canadian regions. The regions overlap and criss-cross within a band 10 to 15 percentage points wide, with some fluctuation in the middle-aged groups. Overall, different from seems to be slowly declining and different than increasing. There is one exception to this trend among speakers under 30: Quebec City. Quebec City stands out in the two youngest age groups, with close to 80 percent reporting different from, which matches much older age groups, such as the 50-year-olds in this city or the speakers 80 and over in the Eastern Townships or New Brunswick.

Figure 4.10  Different from in seven Canadian regions (Dialect Topography database); percent by age group (over 80 to 10s) for Vancouver, Golden Horseshoe, Ottawa Valley, Eastern Townships, Montreal, Quebec City and New Brunswick

These three regions have significant numbers of French speakers (obviously so in Quebec City and the Eastern Townships, both in Quebec, but also in New Brunswick, the only officially bilingual province in Canada¹⁸). One characteristic of the Quebec City anglophone community, however, sets it apart from the other communities of Canadian English speakers. Like Montreal, Quebec City is located in the officially francophone province of Quebec. Unlike Montreal, however, it lacks a strong international orientation, so the bilingual anglophones of Quebec City almost exclusively have French as their second language. Most importantly, Quebec City has by far the lowest proportion of monolingual English speakers, as shown below:

English monolinguals aged 14–29 (%)
Ottawa Valley        74.3
New Brunswick        82.4
Montreal             31.8
Eastern Townships    43.2
Vancouver            52.9
Quebec City           8.6

18. Canada is officially bilingual at the federal level, but not at the provincial level. Quebec is officially francophone, while all other provinces, with the exception of New Brunswick, are de facto anglophone (see, e.g., Boberg 2010: Chapter 1; Dollinger 2011c).


The gap is considerable. Such a concentration of second-language competence in French is likely to have linguistic consequences in English. Young Quebec City anglophones can be expected to have a working knowledge of French: only 8.6 percent speak no language other than English. The French construction différent de is the direct equivalent of different from and would reinforce different from among Quebec City anglophones. At the other end of the spectrum are the Ottawa Valley and New Brunswick, where only about a quarter (25.7 percent) and less than a fifth (17.6 percent), respectively, of the young speakers speak a second language (74.3 and 82.4 percent are monolingual); there, the frequencies of different from are the lowest, with the other regions in between. The prevalence of French thus seems to explain the regional differences in the reported use of different from. Apart from this difference in the younger age cohorts, different from shows the same gender pattern in all Canadian regions. Table 4.3 shows its distribution by gender and social class.

Table 4.3  Different from by gender and social class (source: Dialect Topography database)
different from

%

Vancouver Greater Toronto

Ottawa Valley

Montreal

Quebec City

Gender

Females Males Middle class Lower class

55.9 50.3 56.1 47.8

67.2 62.6 68.1 58.5

67.8 64.7 67.3 63.3

76.0 68 76.5 60.7

Social class

64.9 56.2 61.7 61.3
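The gender and class gaps reported in the next paragraph are differences between two percentages, that is, percentage points. They can be recomputed from Table 4.3, whose values appear in the extracted text in groups of four. We read each group as (females, males, middle class, lower class), because the differentials cited in the text (3.1, 8.7, 0.4 and 15.8) fall exactly at those positions; region labels are left out because their column order could not be recovered with certainty. A minimal sketch, with a helper name of our own:

```python
def gap(higher: float, lower: float) -> float:
    """Difference between two percentages, in percentage points (1 decimal)."""
    return round(higher - lower, 1)


# Values of Table 4.3 as they appear in the extracted text, in groups of four:
# (females, males, middle class, lower class), one group per region.
# Region labels are omitted because their order could not be recovered.
rows = [
    (55.9, 50.3, 56.1, 47.8),
    (67.2, 62.6, 68.1, 58.5),
    (67.8, 64.7, 67.3, 63.3),
    (76.0, 68.0, 76.5, 60.7),
    (64.9, 56.2, 61.7, 61.3),
]

gender_gaps = [gap(f, m) for f, m, _, _ in rows]
class_gaps = [gap(mc, lc) for _, _, mc, lc in rows]

# "Consistently" means every gap is positive.
print(all(g > 0 for g in gender_gaps + class_gaps))  # True
print(min(gender_gaps), max(gender_gaps))            # 3.1 8.7
print(min(class_gaps), max(class_gaps))              # 0.4 15.8
```

Rounding to one decimal keeps the results comparable with the one-decimal values in the table and avoids floating-point noise such as 3.0999999999999943.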

In each of the five regions shown above, gender and social class consistently correlate with the use of different from. First, in every region from east to west, females use different from more often than males, with differentials ranging from 3.1 percentage points in the Ottawa Valley to 8.7 percentage points in the Golden Horseshoe. Second, again consistently, middle class members use different from, the traditional prestige form, more often than lower class members in every community, from a very small differential of 0.4 percentage points in Greater Toronto to 15.8 percentage points in Quebec City. These distributions reflect the residual prestige associated with different from: according to one of the Labovian principles, females are generally more sensitive to prestige norms (see Section 6.4, Principle 2; Meyerhoff 2011: 220, Principle Ia). It is also widely attested that middle class speakers adhere to overt prestige forms more than lower class speakers (e.g. Chambers & Trudgill 1998: 58). While the differences between social classes in the Canadian settings are not drastic, they nevertheless exist. Traditionally, grammatical variables in North American English (in contrast to phonological variables) tend to be absolute or near-absolute markers of class membership. Different from shows no such sharp stratification, which suggests the hypothesis that this social permeability will hasten its decline: if members of all social classes alike use different than, there are fewer social reasons to maintain the old prestige form different from. Most members of the younger age cohorts belong to the middle class: in the Ottawa Valley, for example, 72 percent of the 14–19-year-olds (classified through their parents) and 82 percent of the 20–39-year-olds are middle class. Yet young respondents generally tend to stay away from different from: the 14–19-year-olds report it least often in Figure 4.10, while the 20–39-year-olds report it in only about 60 percent of cases.

Given the ongoing change towards different than, it would not be surprising to see grammar guides soon embracing different than as the default form. As we have seen, the Guide to Canadian English Usage no longer states a categorical preference, and it may only be a question of time before different than gains full acceptance. Since style guide writers can make the case for different than with the same arguments as for different from, switching allegiances is easy: both forms have a long pedigree. Different than has been attested since 1556 and different from since 1551 (OED), and both have been used by literary greats in every period. In some contexts, different than is even deemed more economical than different from: Fee and McAlpine (2007: 176) recommend it in one context, namely when different precedes a clause. Even the most conservative style guides are not entirely immune to this line of thought. Fowler’s Modern English Usage uses the example I was a very different man in 1935 from what I was in 1916, which could be rephrased as I was a very different man in 1935 than in 1916. However, Fowler recommends avoiding this “American” construction, without offering an alternative (Burchfield 1996: 213). As the data show, different than is not only American; it is also perfectly Canadian.

4.3.2 Between you and me or I?

The next syntactic variable concerns grammatical case in pronouns. There is much variation in the pronoun systems of English dialects, a legacy of the case-marking system of Old English (the language of, e.g., Beowulf), shown in Table 4.4:

Table 4.4  First person pronoun singular in Old English and Present-Day English

Old English                  Present-Day English
Nominative      ic           I       Nominative
Accusative      mē           me      Objective
Genitive        mīn          mine    Possessive
Dative          mē           –
Instrumental    mē           –


Although the present-day forms go back to Old English, it no longer makes sense to describe English syntax in terms of the case system shown on the left of Table 4.4, with its dative and instrumental cases. The first person pronoun forms I (nominative), mine (genitive), me (accusative) and me (dative and instrumental) signal different semantic roles: nominative (the subject), genitive (among other things, possession), accusative (the entity undergoing an action), dative (the beneficiary, etc.) and instrumental (the means of achieving the action). Today, the dative and accusative cases taken together are usually called the objective case. Until Middle English times, the time of Chaucer and Gower, English marked the semantic roles of most noun phrases with case endings, as shown for kings in Old English in Table 4.5:

Table 4.5  Old English a-stem noun declension for cyning ‘king’ (plural)

Nominative      cyningas    ‘the kings’ (initiating the action)
Accusative      cyningas    ‘the kings’ (undergoing the action)
Genitive        cyninga     ‘of the kings’
Dative          cyningum    ‘to the kings’
Instrumental    cyningum    ‘by way of the kings’

In Middle English times, such overt case markings were breaking down rapidly and were replaced by a fixed word order. The subject, which formerly was marked by a nominative ending (-as in the above example), today normally precedes the verb, and the object, formerly marked by the objective case, follows it: in Modern English it makes a difference whether you say the cat ate the mouse or the mouse ate the cat. In Old English, both nouns had case markings that identified who ate whom, and their position in the sentence was quite flexible. The older case-marking rules are the driving force behind the variable between you and me/I. In this example, the pronouns you and me/I follow the preposition between. When case marking was operative, many prepositions required the dative or accusative case. In Old English, for instance, the prepositions between and in required the dative case, while through and up required the accusative case. Today, the legacy of this rule demands the objective case, which is me, as shown in Table 4.4. Applied consistently, this principle requires all prepositions to be followed by the objective case in pronouns: me in the first person singular, not I, hence between you and me.

The source of the confusion

So far the discussion has focused on me following a preposition. Confusion arises because the same pronoun behaves differently in subject position. When used as the subject of a sentence, as in You and I are students, the subject is You and I, and subjects are always in the nominative case, that is, I in the first person (see Table 4.4). Applying the same logic, You and I in subject position is right, and You and me is wrong. By contrast, between you and me is right (because of the preposition), and between you and I is wrong. So, going by the traditional rule, it is simply not the case that me is always wrong and I always right. If students are told in English class not to say My friend and me but rather My friend and I in subject position, it is easy to see how the rule might be misinterpreted as “never say me, always use I”, which is where between you and I comes from.

To add to the complexity, English has a long history of substituting me for I in subject position. Most famously, in extrapositions with it, that or there, such as It is me or That is me, objective me has been used since Shakespeare’s time alongside It is I and That is I (Bauer 1998). So in this context, where the traditional case rules predict I, English speakers, including upper-class speakers, have used me for centuries. It is only in this context that me is approaching the status of a standard form. Examples of me in non-extraposed contexts are much harder to find, but they too go back at least to the 18th century, as in “you and me are gone in my carriage”. You and me or Me and you in subject position are considered informal and non-standard and are targeted in education, where You and I is demanded. Extraposed constructions, such as It is me, are the exception here.

The education system in English-speaking countries has aggressively fought me in subject position, and so effectively that a considerable number of people now hypercorrect me by replacing it with I in all situations, including after prepositions. The rule for subject position (always I, never me) is applied to contexts following prepositions, which traditionally require the objective case (always me, never I). Figure 4.11 shows the results from three Canadian regions for non-standard between you and I by level of education.
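The difference between the traditional rule and its classroom misreading can be made explicit in a toy model. This is only an illustrative sketch under our own assumptions: the position labels ("subject", "verb_object", "after_preposition") and the function names are hypothetical simplifications of the contexts discussed above, not categories used in any of the surveys.

```python
# Toy model of the "between you and me/I" variable: the traditional case
# rule versus its hypercorrected reading. The position labels are
# hypothetical simplifications, not categories used in the surveys.

TRADITIONAL_RULE = {
    "subject": "I",             # You and I are students.
    "verb_object": "me",        # They invited you and me.
    "after_preposition": "me",  # between you and me
}


def traditional(position):
    """Nominative I for subjects, objective me for objects of verbs
    and prepositions."""
    return TRADITIONAL_RULE[position]


def hypercorrected(position):
    """The misread classroom rule "never say me, always use I",
    which ignores syntactic position altogether."""
    return "I"


def diverging_positions():
    """Positions in which the two rules predict different forms."""
    return [p for p in TRADITIONAL_RULE if traditional(p) != hypercorrected(p)]


for position in TRADITIONAL_RULE:
    print(f"{position:<18} traditional: you and {traditional(position):<3} "
          f"hypercorrected: you and {hypercorrected(position)}")
```

The two rules agree only in subject position, which is why the classroom correction of My friend and me can spread to between you and I. Extraposed It is me is deliberately left out: there the traditional rule predicts I, but me has long been usual, so it is an exception neither function models.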

Figure 4.11  Non-standard between you and I by education (source: Easson 2000: 16) [chart: percentage (0–100%) by education level (elementary, high school, college (cegep), university) for Quebec City, Ottawa and Greater Toronto]


As Easson (2000) has shown, there is a correlation between non-standard between you and I and education: the lower the education level, the higher the use of the non-standard form. The rate of non-standard between you and I is highest, at about 50 percent, among those with only elementary-level education, a group that consists mostly of the elderly, dropouts and current high-school students. It is somewhat surprising, however, that almost 40 percent of those with a high school diploma or a practical diploma from a trade college report between you and I. Only among the university-educated is the form a clear minority variant, at about 20 percent. That the use of the nominative case after prepositions is still stigmatized could be seen during the 2010 Winter Olympics in Vancouver. The Olympic theme song, I Believe, sung by Montreal singer Nikki Yanofsky, includes the line “I believe in the power of you and I”. You and I following a preposition (in this case in, not between) is so stigmatized that Yanofsky was not universally granted poetic license to preserve the rhyme with the preceding line:

I believe together we’ll fly
I believe in the power of you and I

Grammar Girl19 and other usage mavens weighed in, the Montreal Gazette called the line “grammatically horrific” (Perusse 2010), and a Vancouver Sun writer was “cringing” over it (Kennedy 2010). Fowler’s Modern Usage says, categorically, that between you and I “must be condemned at once” (Burchfield 1996: 373). While journalists may cringe and prescriptive grammarians condemn the construction, many Canadians, as is evident in Figure 4.11, use it. Yanofsky might have been given some leeway, and Canadians generally did grant her just that: the song went to number 1 in the Canadian charts within five days, “incorrect” grammatical construction or not. In contrast to different than, between you and I is fought vigorously by traditional grammarians. For the time being, apparent-time graphs generally show a stable distribution of between you and I across the age cohorts rather than, as we have seen in the case of couch, a steady increase from the oldest to the youngest groups. A variant deemed ‘incorrect’ by the prescriptive rules of English can be limited by education, but it cannot be eliminated by it. While between you and I persists, it also continues to attract criticism. This stands in contrast to different than, which is encroaching on the once-prescribed different from with comparatively little comment, at least in North America. In a language system that no longer has consistent case marking, the pronoun system represents its last foothold from a language evolution perspective. Between you and I therefore persists as a usage variable, and it also persists as a battle front for prescriptive grammarians.

19. [7 August 2014].




4.3.3 Telling time: 11:40 or twenty-to-twelve?

The final variable in this section describes a change in conventions of telling time that can be meaningfully correlated with technological changes and social practices. Respondents were asked the following (Dialect Topography):

[clock face]  Q. 18  What time is it? (Please write in words what you would say.)

[clock face]  Q. 49  What time is it? (Please write in words what you would say.)

Questions about how people tell the time have been included since the start of linguistic geography in North America with the Linguistic Atlas of New England (Kurath et al. 1972 [1939–43]). There are two basic variants for answering the above questions: on the one hand twenty to twelve and half past nine, which shall be called analog variants, and on the other hand eleven-forty (11:40) and nine-thirty (9:30), which shall be called digital variants. Analog ways of telling time characteristically have a number of variants: for 11:40, at least twenty to, twenty before, twenty till/’til and twenty of are recorded; for 9:30, one finds, for instance, half past nine, half nine and half after nine. As with other open questions, there is a noticeable number of dual responses, where a respondent answers, for example, half past nine and 9:30. This is in line with good collection principles, where respondents are expressly asked to include other variants, and the present case study shows that dual responses may offer the critical clue to understanding a change in progress.

Figure 4.12 depicts the results for single answers, analog or digital, in the Greater Toronto region. The analog and digital formats are pitted against each other: the 32% for “half past 9” among the over-80-year-olds corresponds to 68% who say digital nine-thirty. The two variables behave quite differently: for the half-hour variable, only about 10% (or fewer) of respondents younger than 80 report “half past nine”, with about 90% or more reporting digital 9:30. This general pattern can be found in all Eastern Canadian survey regions (Pi 2000). The different behaviour of the two variables may be interpreted, as Pi (2000: 89) suggests, as constituting two independent changes. The conversion to a digital format is basically complete for the half-hour variable, while the twenty-to variable is stable, or at best at the beginning of a change.
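Before any of these percentages can be computed, each written answer has to be coded as analog, digital or dual. The sketch below shows one way such coding could be automated; the word lists and the function name code_response are our own illustrative assumptions drawn from the variants named above, not the coding procedure used in Dialect Topography or by Pi (2000).

```python
import re

# Illustrative coding scheme for written answers to the time questions.
# The word lists are assumptions based on the variants named in the text;
# a real coding scheme would need to be checked against actual responses.
ANALOG_WORDS = {"to", "before", "till", "'til", "til", "of",
                "past", "after", "half"}
DIGITAL_WORDS = {"thirty", "forty"}


def code_response(answer):
    """Classify an answer as 'analog', 'digital', 'dual' or 'uncoded'."""
    text = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    # Tokens are clock-style numerals (9:30) or runs of letters/apostrophes,
    # so "nine-thirty" splits into "nine" and "thirty".
    tokens = re.findall(r"\d{1,2}:\d{2}|[a-z']+", text)
    analog = any(t in ANALOG_WORDS for t in tokens)
    digital = any(t in DIGITAL_WORDS or ":" in t for t in tokens)
    if analog and digital:
        return "dual"      # e.g. "half past nine or 9:30"
    if analog:
        return "analog"
    if digital:
        return "digital"
    return "uncoded"       # left for manual inspection
```

Keyword matching of this kind is crude (an answer such as close to 9:30 would be coded as dual), so uncoded and dual answers would still need to be checked by hand.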


Figure 4.12  Analog answers in the Greater Toronto region [chart: percentage (0–100%) of analog “20 to 12” and “half past 9” by age cohort, from over 80 to 14–19]

Diary entries confirm the relative independence of the two changes. At the beginning of modern time keeping, Samuel Pepys, the mid-17th-century London diarist who was obsessed with time, referred only to full hours, approximating to the full hour where necessary, as in “About 4 o’clock comes Mrs. Pierce to see my wife” (Pepys n.d.: s.v. 25 May 1667). Some 175 years later, the journal entries of Ontarian Charlotte Harris, written in the 1840s, routinely refer to half hours and occasionally to quarters of the hour, as in “Feb 13 We left again at a quarter to nine.” (CONTE; Harris & Harris 1994: 18). In the mid-19th century, no trace of digital time telling is found. Digital displays were not invented until the early 1970s, but the use of digital time telling clearly antedates this innovation. Figure 4.12 shows that only slightly more than 10 percent of the 70-year-olds in 1991/2 used the analog half past nine, which means that almost 90% responded with the digital variant 9:30. Applying the apparent-time hypothesis, one can infer that fifty years earlier, in 1941/2, when the 70-year-olds were in their early 20s, the digital variant was already in use. And the early 1940s clearly predate the advent of digital clocks and watches. This means that digital watches are, contrary to popular belief, not responsible for the switch from analog to digital time telling, though they likely accelerated a change that was already under way. We next need to look into the mechanisms of these two changes, that is, into how people transition from the analog to the digital format. In Pi (2000: 92), the data from five Dialect Topography regions are arranged in an insightful way (Table 4.6):




Table 4.6  Cross-tabulation of answers of 2440 responses (Pi 2000: 92)

The table cross-tabulates the possible combinations of analog and digital formats. It is read as follows: Box A, in the upper left corner, represents speakers who use analog forms for both variables, that is, a form of half past nine or half nine and of twenty to, twenty of or twenty till twelve. Of the 2440 respondents, 249, or 10.20%, report these formats. By contrast, Box I, at the bottom right, represents the end point of the two changes, with speakers saying 9:30 and 11:40, that is, both digital forms. This means that 15.20% have completed the change to digital in both environments. Other boxes show mixed forms; for example, Box D represents speakers who say analog half past but mix analog twenty to and digital 11:40. There is only one such speaker in the data. Box C, by contrast, represents the most common pattern today (that is, in the 1990s, when the data were collected): digital 9:30 combined with analog twenty to, which almost 70 percent of the respondents report, that is, 1693 speakers.

What can be said about the trajectory of the change? Looking at the numbers, it can be seen that Boxes A, C and I are the most numerous (249, 1693 and 371 speakers respectively). The white-shaded Boxes D, E, G and H are the least common (1, 14, 8 and 3 speakers). Because the difference is so great (249 vs. 14 is the smallest differential) rather than close (say, 20 vs. 14), we can make a claim based on the distribution alone: these data suggest that the white-shaded boxes are not part of the trajectory of change but represent idiosyncratic and idiolectal manners of speaking. By comparison, the “dual forms” in Boxes B (50) and F (51) are reported quite frequently, at least three times as often as any of the white-shaded boxes. These numbers suggest that the trajectory of the change in speakers follows the grey-shaded pattern: A → B → C → F → I.
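The arithmetic behind this reading of Table 4.6 can be checked with the nine counts quoted in the text. In the sketch below, the arrangement of rows (the twenty-to variable) and columns (the half-hour variable), each ordered analog, dual, digital, is our reconstruction from the box descriptions (A both analog, C digital 9:30 with analog twenty to, D analog half past with mixed twenty to, I both digital); it is not a copy of Pi’s table, and the names BOXES, cell and share are hypothetical.

```python
# Counts for the nine boxes of Table 4.6 as quoted in the text (Pi 2000: 92).
# Assumed layout: rows = twenty-to variable, columns = half-hour variable,
# each ordered analog / dual / digital (a reconstruction, see lead-in).
LEVELS = ("analog", "dual", "digital")
LABELS = "ABCDEFGHI"
BOXES = {
    "A": 249, "B": 50, "C": 1693,
    "D": 1,   "E": 14, "F": 51,
    "G": 8,   "H": 3,  "I": 371,
}

TRAJECTORY = ["A", "B", "C", "F", "I"]             # grey-shaded path
OFF_PATH = [b for b in LABELS if b not in TRAJECTORY]  # D, E, G, H


def cell(twenty_to, half_hour):
    """Box letter for a combination of the two variables."""
    return LABELS[3 * LEVELS.index(twenty_to) + LEVELS.index(half_hour)]


def share(box):
    """Percentage of all respondents in a box, rounded to two decimals."""
    total = sum(BOXES.values())
    return round(100 * BOXES[box] / total, 2)


for box in TRAJECTORY:
    print(box, BOXES[box], share(box))

# The distributional argument: every box on the path is at least three
# times as frequent as any box off the path.
print(min(BOXES[b] for b in TRAJECTORY) >= 3 * max(BOXES[b] for b in OFF_PATH))
```

The nine counts add up to the 2440 respondents, and the smallest on-path box (B, 50) is still more than three times the largest off-path box (E, 14), which is the comparison the argument above relies on.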

We have identified a pattern that was built up from a good array of empirical data. The next step is to date these changes and to find their cultural motivations. Figure 4.12, with the 80-and-over group, provides a glimpse into the final stage of the change from half past nine to 9:30, which allows the transition from Box A to Box C to be dated to the end of this group’s formative years, around 1930. This is as much as our apparent-time data can do for us. Beyond its reach, we need independent evidence, which we find in writings from different periods. We know when the half-past variant faded, but when did the change from analog to digital begin? Charlotte Harris’ diary from the 1840s, quoted above, serves as a starting point for that purpose. The Linguistic Atlas of the Upper Midwest (LAUM) was recorded in the early 1950s, as you will recall from Chapter 3. LAUM reports 51% of the 9:30 variant for those born before 1890 and 62% for those born in the early years after 1900 (Pi 2000: 96), which goes some way, but not all the way, to the origin of the change from analog to digital. There is evidence to suggest that the change began with the railroad’s introduction of precise time keeping. Prior to the creation of trans-national railroad timetables, which depended on precise time keeping, local times were kept in each town, and each town clock proclaimed its own time. These local times were replaced with “railroad time”, that is, the digital format, and from there it entered the general language. The Dictionary of American Regional English (s.v. railroad time) has an early citation from 1865 in which the term is already used in a general, non-technical and figurative way: “She is … so punctual that railroad time might be kept by her instead of a chronometer.” Table 4.7 offers a summary of the changes from analog to digital ways of telling time, with an approximate real-time dating:

Table 4.7  Trajectory of change for telling the time with approximate time line

Half hour            Twenty to             Approximate period
half past            twenty to             mid-19th century
9:30 or half past    twenty to             mid-19th century to 1920s
9:30                 twenty to             1930s to 1970s
9:30                 twenty to or 11:40    1980s to present
9:30                 → 11:40               ?

The majority pattern in the data, 9:30 combined with twenty to, is dated from about 1930 to 1980. The youngest age cohort still uses the combination of 9:30 and twenty to in most cases. These speakers grew up in the 1970s and 1980s, which offers a point in time for the next change. The late 1970s and 1980s, when digital LCD displays became common, coincide with this time line, and the displays likely accelerated the transition to the 11:40 variant. Digital clocks, however, are not the driving force behind the change to digital, which is a much older development, intricately linked with human development and an increasing dependence on time keeping.
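The dates in this section rest on a simple apparent-time calculation: subtract a cohort’s age from the survey year and add the age at which speech habits are assumed to settle. The sketch below makes that arithmetic explicit; the formative age of 20 is our assumption, chosen to match the “early 20s” reasoning above, and the function names are hypothetical.

```python
def formative_year(survey_year, age, formative_age=20):
    """Year in which a speaker aged `age` at the survey was `formative_age`.

    Assumes, with the apparent-time hypothesis, that a speaker's usage
    reflects the norms current when they were about `formative_age`.
    """
    return survey_year - age + formative_age


def cohort_window(survey_year, youngest, oldest, formative_age=20):
    """Formative-year range for an age cohort such as the 70-79-year-olds."""
    return (formative_year(survey_year, oldest, formative_age),
            formative_year(survey_year, youngest, formative_age))


print(formative_year(1991, 70))     # 1941: the 70-year-olds of 1991/2
print(cohort_window(1991, 70, 79))  # (1932, 1941)
print(formative_year(1991, 80))     # 1931: the 80-and-over group, around 1930
```

Varying formative_age shifts every date by the same amount, so the relative ordering of the cohorts, which is what the argument relies on, does not depend on this assumption.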




In the youngest group, we already see signs of the final step of the sequence in Table 4.7, the step towards 11:40. The majority of the teenagers in Western Washington (80%), Montreal (52%) and New Brunswick (51%), and large minorities in Vancouver (46%) and, to a lesser degree, Quebec City (33%), already report 11:40. As Tony Pi put it some years ago, the twenty to variable “is at the beginning of its change towards digital” (2000: 100). The change has likely progressed further since, as more recent data would probably show, but we cannot predict exactly when the current variability in the twenty-to category will give way to categorical 11:40 (hence the “?” in Table 4.7). Ways of telling time may at first appear to be of only marginal linguistic interest. What Tony Pi’s account first showed, however, is the underlying structure behind these changes. WQ data can offer a coherent interpretation and provide a thread through otherwise complex and multilayered changes that span more than 150 years. The half-hour mark acted as the entry point for “railroad time” into the general language. The half-hour change to a digital format took almost a century to filter through to general use, from the time of the first railways in North America in the 1840s to the 1920s. Much later, digital watch displays in the 1970s and 1980s helped extend the trend to the 11:40 context. The time-telling variables invite comparisons with other languages and may be considered a modern equivalent of cultural universals. Traditionally, anthropological linguists have sought universals in domains that appear stable across cultures, generally kinship or colour terms. In today’s world, time telling is such a unifying factor in all modern societies. It would also be interesting to see how 11:45 (quarter to) or 11:55 (five to) compare with 9:30 and 11:40: is there a prediction to be made?

4.4 Pronunciation: Phonemic variables

The written medium and traditional writing system have a notoriously bad reputation for recording variation in sounds. A number of studies have shown, however, that methods such as rhymes and comparisons with key words that have a uniform pronunciation may alleviate some problems. WQs may be used to poll phonemic differences, but they are generally considered unsuitable for eliciting finer-grained phonetic features. In other words, respondents are generally reliable in identifying those elements in the sound system of their language that create meaning distinctions, e.g. dusk vs. tusk, or the absence of a phoneme, e.g. g-dropping as in thinkin’ or t/d-deletion as in bes’ for best. WQ respondents, however, are not reliable in offering information on allophones of a given phoneme, e.g. whether the t in tusk, butter or nut sounds the same or different. With that important proviso in mind, we can approach cases of successful phoneme elicitation. The accessibility of a feature to respondents must be assessed in every case, as the example of yod-dropping in Chapter 3 has demonstrated.

Avis (1956) must be considered one of the pioneers of using WQs to elicit phonemic information. As we have seen in Chapter 2, he first applied the method in a questionnaire that he distributed at Queen’s University in 1949/50 (1956: 56, fn 1), expanding in important ways on Davis’ (1948) questionnaire, which was limited to lexis. Not attempting to elicit information on Canadian Raising and other phonetic features, Avis (1956: 43) “aimed at broad distinctions which the informants could be expected to recognize”. This is the most important maxim for all WQ phonetic questions. After eliminating respondents who appeared to be confused by the questions on sound, identified by reviewing their answers for strange patterns, Avis decided to include only senior university students or graduate students, who, in his survey, were able to interpret the questions. A limitation by education is today generally no longer necessary, though the data still need to be inspected to ensure maximal validity.

4.4.1 Yod-dropping

Yod-dropping refers to the absence of a /j/-glide, or yod, in words of the type student, news, dew, tune and the like. It is a process that began in 16th-century English and has today reached almost all phonetic contexts. In Canadian English, a general increase in yod-dropping in stressed syllables has been reported on the basis of WQ data for the items student and news, though in somewhat unusual ways (Chambers 1998b). Both items seem to fluctuate around the 60 percent mark in Metro Vancouver, with sizeable competing variants in other places, which leaves a lot of room for sociolinguistic exploitation. Student is an interesting variable for a number of reasons. It has been polled in the following way (here from the SCE):

(4.8) 17. The u in student is pronounced like
A. oo in too
B. u in use
C. either way

Student is of interest for two reasons: first, for an apparent mismatch between reported and measured yod-retention, as we recall from Chapter 3; second, for its apparent clarity in the SCE (for BC), in contrast to the more complex picture revealed by the Dialect Topography data (for Vancouver), as shown in Figure 4.13:




Figure 4.13  Student pronounced with yod-dropping, BC (SCE), Vancouver and Toronto (DT) (BC data comprises two data points only: 60s & 30s) [chart: percentage of oo responses (0–90%) by age cohort, from 10s to 80s+ (90s+); series: BC 1972, Vancouver 2004, Toronto 1991/2]

Figure 4.13 represents the state in 2004, with the BC and Toronto data aligned accordingly. What is clearly visible is that yod-dropping occurs at different rates in Toronto and in Vancouver, while BC and Vancouver, the largest city in BC, pattern nicely together. The SCE’s parents and grade-9 student data (the diamond symbols) were averaged and matched with the age cohorts of the 2004 Vancouver survey: the grade-9 students would have been in their 40s in 2004 and their parents mostly in their 60s. While in Toronto we have a steady progression towards yod-dropping from the 80s+ group (who would have been in their 90s+ in 2004), Vancouver shows no such progression. The 1972 BC data suggest that the drop in the Vancouver 60-year-olds is no coincidence: it appears that those born in the decade after 1944 show less yod-dropping than the older age cohort. Since the 1960s (the 50s and younger), Vancouver’s rate of yod-dropping has fluctuated around 65%. Clarke (2006) suggests, as we recall from Chapter 3, that yod-dropping or retention serves different social functions today; for instance, yods in news are being inserted to index a speaker’s level of learnedness: the more education, the more likely one is to retain the yod in news. Similar processes can be suspected for student. The process of social indexing will be explained in greater detail in Section 6.7.


Yod-retention in avenue: Urban vs. rural split?

Yod-dropping can also occur in unstressed syllables, and it is here that the Canadian results are more homogeneous than for student (Chambers 1994, 1998b). The context of avenue offers a clear Canada–US split, as Figure 4.14 shows:

Figure 4.14  Avenue with yod in seven Canadian and three American locations (DT) [chart: percentage (0–100%) by age cohort, from 80s+ to 10s; series: NB, QC, ET, MTL, OV, GTA, VCR, US-NY, US-WA, US-NewEng]

It is apparent from Figure 4.14 that Canadians show a tendency towards yod in avenue and that the American locations (dotted lines) show lower percentages. However, there are considerable differences within Canada in terms of age and location. Most significantly, the younger respondents in the Eastern Townships, the Ottawa Valley and New Brunswick, three of the most rural locations, favour yod much less than their peers in Canada’s more urban regions. All other locations remain at 70% or higher, even for the youngest cohort. It remains to be seen whether successive generations in the three rural locations will continue to prefer the yod-less form, traditionally American, over the Canadian pronunciation. In the Eastern Townships, the change in apparent time is consistent and has already made the yod-less variant the majority form. It may be that in rural contexts yod in avenue is perceived first and foremost as old and traditional, regardless of its status as a Canadianism. We mentioned indexicalization with news and student: if one chooses to construct oneself linguistically as “non-rural”, yod-less avenue would contribute towards that performative goal, though further work is needed to substantiate this interpretation.

So far we have dealt with binary phonemic variables. It is possible, though, to elicit more complex patterns involving more than two choices. One of the most successful variables is the pronunciation of the first vowel in guarantee, which will be discussed in the sociolinguistic theory chapter (Chapter 6). Here we will briefly look at a less frequently reported variable of long standing: the pronunciation of the vowel in vase.

4.4.2 Variation in the lexical item vase

The available pronunciation data for the vowel in vase afford an interesting view of dialectal variation across Canada’s regions, one that has not previously been synthesized. Vase has been polled since Avis’ 1949/50 survey (1956: 43), though Avis’ question wording, which would be of particular methodological relevance, is not known. Cassidy and Duckert (1953) include the variable in their WQ. Polson studies vase in BC with a wide array of data from four different BC locations, on which the following discussion is based. The SCE includes it as well, though with a differently and confusingly worded question. Dialect Topography adopts Polson’s question, shown in (4.9):

(4.9) 2. Would you rhyme “vase” with “face”, “days”, “cause”, or “has”? If you don’t rhyme it with any of these, supply your own rhyme.  (Polson 1969: 42)

The variant choices are interesting, as this question type has proven efficient at eliciting the four major variants, which are:

face /feɪs/, yielding /veɪs/, the principal variant in the USA
days /deɪz/, yielding /veɪz/, a predominantly Canadian variant
cause /kɑz/ (or /kɒz/), yielding /vɑz/ (or /vɒz/) (or other low/low-back vowels), the /ɑ/ realizations being the originally British variant
has /hæz/, yielding /væz/, a minor variant
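Turning answers to (4.9) into distributions like the ones reported below involves two steps: mapping each rhyme word to the pronunciation of vase it implies, and tallying the coded forms. The following is a minimal sketch of those steps; the function names are hypothetical, answers supplying an “own rhyme” are simply set aside as "other" (in practice they would be coded by hand), and the sample in the test is invented for illustration, not survey data.

```python
from collections import Counter

# Rhyme words offered in question (4.9) and the vase pronunciation each
# implies, following the list above (cause is given with /ɑ/ here).
RHYME_TO_VASE = {
    "face": "/veɪs/",
    "days": "/veɪz/",
    "cause": "/vɑz/",
    "has": "/væz/",
}


def code_vase(answer):
    """Map a rhyme-word answer to a phonemic form; own rhymes become 'other'."""
    return RHYME_TO_VASE.get(answer.strip().lower(), "other")


def distribution(answers):
    """Percentage share of each coded form, rounded to whole percent."""
    counts = Counter(code_vase(a) for a in answers)
    total = sum(counts.values())
    return {form: round(100 * n / total) for form, n in counts.items()}
```

Keeping own rhymes in an "other" bin rather than guessing a form from them follows the chapter’s caution that WQ phonetic questions should rely on broad, recognizable distinctions.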

Responses from the Toronto region show the following overall distribution: cause 44%, days 35%, face 17%, has 2%. The apparent-time graph in Figure 4.15 for Toronto (GTA, circle line marker) reveals a different and more nuanced picture: the variant days /veɪz/ is on the increase and has reached more than 50% among the teenagers. The figure shows two more facts. First, the American percentages (dotted and dashed lines) are all under 10% in the younger age cohorts, while the octogenarians in Vermont report /veɪz/ more often, which suggests that /veɪz/ was an American variant in the past. Second, there is considerable variation in Canada. While New Brunswick (NB) has by far the highest counts in all age groups (the NB octogenarians have roughly the same reported frequency as the Toronto teens), Vancouver has the lowest reported figures for the three youngest age groups. Overall, it looks as if /veɪz/ is a Canadian variant that might soon gain overall predominance. Phonetically located “between” the British /vɑz/ and the American /veɪs/, /veɪz/ can be seen as a typical Canadian compromise variant. There is a long tradition of Canadian “middle ground” solutions, from classic variables such as spellings that combine American tire (British tyre) with British centre (American center) in compounds such as Canadian tire centre (McConnell 1979: 47), to more recent phonetic developments. Boberg (2009), for instance, suggests a similar mid-way solution between the British and American extremes for foreign (a) nativization, which refers to the vowel quality of loan words such as pasta, Mazda or llama: Canadians are beginning to use a phonetically intermediate form, between the trap and palm vowels. Against this theoretical backdrop, vase appears to be developing a Canadian dimension.

Figure 4.15  Vase rhyming with days, i.e. /veɪz/ (Dialect Topography data) [chart: percentage (0–80%) by age cohort, from 80s+ to 10s; series: NB, QC, ET, MTL, OV, GTA, VCR, US-VT, US-NewEng, US-NY, US-WA]

While the Metro Vancouver data is somewhat odd, Polson’s (1969) data from Vancouver Island (Duncan, BC), the Okanagan Valley (BC Interior) and mainland BC offers interesting insights that may help explain the lower Vancouver percentages. In his mid-1960s data, Polson found that BC adults used British /vɑz/ 70% of the time and /veɪz/ only 16%. His BC high school students, however, used /veɪz/ in 50% of all cases (and /vɑz/ in only 45%). From the perspective of the late 1960s, it already looked as if /vɑz/ was in decline and /veɪz/ would become the new form in BC before too long (1969: 44).




Polson’s prediction of /veɪz/ becoming the majority form is not borne out in Figure 4.15, however. The students from the mid-1960s, then in their teens, would appear as the 50-year-olds in Figure 4.15, with an overall incidence of /veɪz/ of about 28 percent. The younger cohorts merely hover around the 30% mark, which shows that the change towards /veɪz/ has not taken off in the Metro Vancouver context. Polson offers one explanation:

Further evidence that a movement from /vɑz/ to /veɪz/ is in progress can be found in the distribution of the teen-age responses. I have found invariably that in the responses to any given item the Duncan [Vancouver Island] students tend to choose the more conservative and/or British form, the Vancouver students are always less conservative and less British, and the Hope [Eastern end of the Fraser Valley] students always move further in this direction than the Vancouver group. (Polson 1969: 46)

Polson’s regional distribution is more fine-grained than any other BC data we have to date. Together with other changes that Polson discussed, it shows that Hope was linguistically more advanced than Vancouver and much more so than Duncan, which means that the linguistic distribution in the regional town of Hope is a better predictor of linguistic innovation than that of Metro Vancouver. It might strike one as odd that the town of Hope, a municipality of 6000 at the south-western bend of the Fraser River, would feature more innovative forms than the region’s metropolis, given standard models of geographical diffusion, which predict that innovations spread from urban centres (Trudgill 1974a). The changes reported by Polson for Hope, including the spread of /veɪz/, might qualify as counter-hierarchical diffusion from country to town (Trudgill 1986) in the BC context. Given the multi-ethnic and international make-up of Vancouver, salient variables might first take hold in smaller places before they are taken up in the big city. This interpretation, of course, stands in contrast with the urban hierarchy (or cascade) diffusion model (Trudgill 1974a; Boberg 2000 in the Canadian context), which predicts that changes originate in the bigger cities and are then picked up by smaller municipalities as a function of distance and population size. Hope, 160 km from Vancouver, is separated from the metropolis by some of the worst traffic gridlock in North America, which makes daily commuting unviable and limits the frequency of contact. On closer inspection, however, the idea of Hope displaying innovative forms in a counter-hierarchical movement seems appealing in the Canadian context, where American, Canadian and British layers of influence have historically intermingled. The 20th century has seen massive non-anglophone immigration, which would add yet more layers.
It may well be that in Vancouver a number of prestige factors distort the results, given US and British influence, besides L2 Englishes that are quite prominent. The case of (rural) New Brunswick in Figure 4.15 confirms the idea that less diverse places might show an incoming Canadian form sooner than a multiethnic urban context. It may also be that Vancouver’s comparatively young settlement history as a western city allowed for more variation, especially in the 1960s, than in the Canadian east. Another explanation might relate the Vancouver-Toronto difference to the make-up of the immigrant population as a whole. When one compares the Vancouver and Toronto data for two groups, those with very strong local ties (those whose parents were already born in the city) and those with looser ties (those whose parents immigrated to the city or who are themselves immigrants), a different pattern emerges in the two data samples. Figure 4.16 shows the Toronto and Vancouver data rearranged by strong and loose local ties. What both scenarios have in common is that in both groups cause /vɑz/ is on the decline and face /veɪs/ occurs as a variant. Toronto differs in the 20-year-olds and the teenagers in that days /veɪz/ is already the majority variant for both the strong-tie and the loose-tie groups, which indicates a promising future for a new Canadian form. In Vancouver, this is not yet the case. The bottom left panel, for those with strong local ties, shows that cause /vɑz/ is still the majority variant even among the teenagers, attesting to a conservative tendency, a holding on to British norms. Those with looser local ties in Vancouver (the bottom right panel), on the other hand, show only a haphazard adoption of days /veɪz/ in the 30-year-olds and younger groups. For teenagers, face /veɪs/ seems particularly appealing, and has /væz/ makes a strong showing in the three youngest cohorts. The two Toronto graphs have more in common with each other than with the Vancouver graphs: those with looser local ties in Vancouver show more linguistic diversity than their peer group in Toronto.
To summarize, Torontonians embrace the Canadian variant with voiced /z/, days, in both the strong-tie and the weaker-tie groups. In Vancouver, however, the strong-tie group has not yet adopted a clearly dominant Canadian variant, suggesting more heterogeneity in Vancouver than in Toronto, an interpretation that is consistent with the pattern in the Vancouver loose-tie group. At least for vase, this is an unexpected finding given prevailing conceptions of homogeneity in Canadian English (see Section 6.8). Ultimately, one would need to determine whether these differences result from different immigration and assimilation pressures in Toronto and Vancouver or whether they are artefacts of the data sampling. Independent data, both quantitative and qualitative, would be needed to confirm this.
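The argument above turns on the difference between a variant that forms an outright majority in a cohort (as days /veɪz/ does among younger Torontonians) and one that merely leads. A minimal sketch of that distinction follows; the function names are hypothetical, and the counts are invented toy numbers for illustration, not figures from Polson or from Dialect Topography.

```python
from collections import Counter


def variant_shares(counts: Counter) -> dict[str, float]:
    """Percentage of responses per variant within one cohort."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cohort has no responses")
    return {variant: 100 * n / total for variant, n in counts.items()}


def leading_variant(counts: Counter) -> tuple[str, float, str]:
    """Return the most frequent variant, its share and whether it is a
    majority (> 50%) or only a plurality."""
    shares = variant_shares(counts)
    variant, share = max(shares.items(), key=lambda item: item[1])
    status = "majority" if share > 50 else "plurality"
    return variant, round(share, 1), status


# Invented toy counts for two hypothetical cohorts (not real survey data).
cohort_a = Counter(cause=10, days=30, face=5, has=5)
cohort_b = Counter(cause=12, days=9, face=10, has=9)
print(leading_variant(cohort_a))  # ('days', 60.0, 'majority')
print(leading_variant(cohort_b))  # ('cause', 30.0, 'plurality')
```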



[Figure 4.16: four panels, Toronto RI 1–3 (strong local ties), Toronto RI 4–7 (looser local ties), Vancouver RI 1–3 (strong local ties) and Vancouver RI 4–7 (looser local ties); variants cause, days, face and has; x-axis: age cohorts from the 10s to the 80s+; y-axis: percentage]

Figure 4.16  Vase in Toronto and Vancouver (%) by strength of local ties (Regionality Index)

4.5  Outlook

This completes the overview of traditional linguistic variables in social dialectological WQ studies. The types of variables shown here have been applied in WQs at least since the 1950s, although not always with the precise variables shown here. Further approaches will be dealt with in the next chapter, on extensions of established practice in a World English context, and in Chapter 7 under the rubric of question design. Lexical variables are a staple of WQ research: chesterfield (as an outgoing Canadianism), tap (as a stable Canadianism) and take up #9 (as a nationally incoming Canadianism) have illustrated three patterns. In the area of syntax and morphology, prepositional case (between you and me) and different from were combined with innovations in the past tense forms snuck and dove. Usage issues were explored with analog and digital ways of telling time, a sociocultural variable that offers insights into the increased importance of exact time keeping in everyday life over the past 150 years. The examples of yod-dropping in student and avenue were complemented with vowel incidence in vase, a variable of considerable pedigree that allows interesting theorizing on models of linguistic diffusion across space (see Britain 2010b: 148 for a summary of diffusion models). What this chapter has not addressed are newer WQ methods for the study of syntactic phenomena from the field of “socio-syntax”, which have been successfully employed in syntactic atlas projects in the Netherlands (SAND, Barbiers et al. 2004), Scandinavia (ScanDiaSyn, Lindstad et al. 2009; Johannessen et al. 2010) and Switzerland (Glaser 2000). Their methods involve indirect grammaticality judgements and translation and reformulation tasks, e.g. offering a non-standard sentence as an interrogative (a question) to the respondents and asking for the equivalent declarative sentence (a statement). Another, more recent application of WQs concerns the polling of linguistic perceptions and attitudes (e.g. Preston & Long 1999–2002), which has developed innovative question designs. These methods will be discussed in Section 7.3, while the next chapter addresses the use of WQs in the very dynamic and growing field of World Englishes.

Chapter 5

World Englishes, multilingualism and written questionnaires

The traditional WQ methods are limited in two respects. First, many WQ studies are at least implicitly based on a monolingual speaker model. This is a reflection of the traditional focus of dialect geography on what Chambers and Trudgill (1998) called NORMs, the non-mobile, older, rural, male speakers, who were taken to represent the oldest accessible speech forms in an area. This bias is of course reflected in WQ methodology. The Survey of Canadian English (SCE), discussed in Chapter 2, is a prime example of such a monolingual focus in the Canadian WQ context. SCE, for instance, ruled out the 1347 responses it received from “respondents born outside of Canada” (Scargill and Warkentyne 1972: 49); these were not merely set aside for separate analysis but were not reported in any of the publications, which suggests that they were purged from the data pool. While SCE’s goal was to capture the English typical of the ten Canadian provinces, it failed to do so by admitting, de facto, only monolingual speakers who were raised in Canada. The target was therefore defined as the speech of native speakers of Canadian English, that is, those who grew up with CanE (Chambers 1998c: 252; Boberg 2010: 107). As a result, SCE is limited in its representation of the Canadian populace, which has been a highly heterogeneous group since the country’s inception. Equally importantly, however, SCE cannot speak at all to potentially important language and dialect contact scenarios that may have contributed to the newly developing variety in a young nation. As a consequence of this pervasive monolingual focus, WQs have generally not been designed to model language-contact scenarios effectively, if they included such elements at all. When multilingual speakers were polled, the status of their multilingualism was usually not systematically surveyed. There are exceptions, of course.
The Dialect Topography of Canada introduced the “Language Use Index” (Section 8.2.2), but primarily to assess the respondents’ frequency of use of English; respondents’ other languages have been a relatively minor concern and have been used as an explanatory factor only with regard to Canada’s second official language, French. For instance, language transfer from Quebec French to Quebec English, e.g. sofa (the French term, not couch) or different from (in analogy to différent de), represents only one aspect of language change through contact.


The central question of the present chapter is whether and how a method developed for and predominantly applied in researchers’ local, usually native linguistic settings, such as Canada, the USA, Scotland (for English), German-speaking Switzerland (for German) or the Netherlands (for Dutch), can be adapted for use in diverse transnational contexts. One issue to be reckoned with concerns situations of language and dialect contact, to which traditional WQs in social dialectology have not been applied. In such scenarios, the identification of linguistic variables and their variants will be more difficult, especially if WQs are intended for use in multiple locations to allow for comparisons (as will be discussed in Section 5.3.2), as researchers will not always have first-hand local knowledge of every setting they wish to collect data from. This chapter first explores the foundations of the traditional WQ method and its implicit monolingual speaker target. For the sake of consistency, Canadian English shall again serve as the example and test case (Section 5.1). The historical expansion from monolingual to more multilingual dimensions will be tackled from three perspectives: first, by exploring practical adaptations of existing WQ formats, some “quick fixes”, so to speak, that may work in some settings; second, by incorporating a much broader reorientation of sociolinguistics in the more recent contexts of linguistic ‘super-diversity’ and globalization; and third, by querying, against the background of a theory of space, the notion of “space”, which has been central to large-scale WQs since Wenker. These reconceptualizations draw on recent work on Global Englishes and Lingua Franca approaches, problematize the implications of surveying speakers of highly diverse backgrounds who populate and create super-diverse settings, and conceptualize geographical and social spaces alike.
These speakers routinely participate in acts of mobility and construct their linguistic meanings in particular situational contexts by drawing on all the (partial) linguistic competencies available to them. Geographical space is turned into social space and is perceived as yet another kind of space. Recent theories suggest that these types of speakers and spaces will be the new normal in a globalizing world and are therefore of immediate relevance to WQs. The chapter presents Kachru’s basic circle model of English (Section 5.2) as a long-standing terminological point of reference. The model, which offers a rough categorization of WEs into three spheres, provides useful concepts and terminology that are used to introduce and explore features specific to, or perhaps typical of, New Englishes. Section 5.3 serves as a springboard for a discussion of the search for linguistic variables that are amenable to WQs and of some of the problems such extensions generate. It introduces an international WQ project (Krug, Hilbert & Fabri forthc.), which is innovative and at the same time reveals some of the problems of adapting traditional WQs to increasingly diverse settings (Section 5.3.1) and spaces (Section 5.3.2).




Since WQs in English have predominantly been employed in Inner Circle countries (countries where L1 speakers of English are the majority), morphosyntactic and lexical features of the Outer Circle (Englishes in former colonies where native speakers were the minority) that seem pollable with WQs will be addressed. Considerable attention will be given in this part of the chapter to English as a Lingua Franca (ELF) (Section 5.4), which are Englishes [sic] used for communication among non-native speakers, presently one of the most dynamic areas in the field. The section includes a brief introduction to attitudes towards ELF, for which data have been successfully collected with WQs. Following Bamgbose’s (1998) criteria for distinguishing between errors and innovations, the claim is made that WQs may be the best tool for clarifying questions of variant use, dissemination and acceptance, which are key in many newer varieties. The section concludes with a rudimentary set of principles for the identification of variables and variants in contact scenarios, such as WE and ELF contexts, that may assist with this important task more generally. An addendum is offered in Section 5.5, where “expert WQs”, a type of WQ that has been used extensively in WEs, are briefly discussed. Expert WQs are of a very different nature from those generally used in sociolinguistics, but they are included here for their prominent status in accounts of WEs. They illustrate the problem of detailed versus comparable data that was addressed in earlier chapters. As some aspects of this chapter are exploratory, not all questions pertaining to these topics can be answered. The discussion will hopefully facilitate further examination of the nature of data transfer and data exploitation beyond the traditional WQ domain. Section 5.6 concludes with a concise summary and some relevant points for the future.

5.1  Canadian English and the multilingual speaker

The monolingual focus in CanE research is of course a reflection of former research trends in the greater field. For example, a closer look at the data behind Dialect Topography, a project that expressly aimed at representative sampling (Chambers 1994: 36), suggests that the notion of representativeness did not fully capture the multilingual make-up of the region in question. In the entire Greater Toronto database (Golden Horseshoe 1991/2) of 1015 respondents, 209 are listed as speaking a language other than English. While this is a sizeable proportion (about 21%), the sample is not representative of the region in its multilingual dimension. For instance, it includes as the most frequent “additional language” group 57 speakers of Dutch, followed by 39 Italian, 20 German, 4 Hindi/Urdu, 3 Chinese and no Punjabi speakers. The Toronto census data (population of 5 million, in 2006), though, show that Chinese is the largest non-English linguistic group in the city, with more than 400,000 speakers, followed by 185,000 Italian, 132,000 Punjabi and 132,000 Hindi/Urdu speakers. By contrast, Dutch is spoken by only 11,000 Torontonians. It is clear from these data that this project’s sampling effort, while aiming at representation in terms of age and gender, was not intended to reflect the multilingual make-up of Canada’s largest city. Similar cases can be made for other projects. It needs to be stressed that while the Dialect Topography project comprises a sociolinguistically representative sample to a considerable degree, it is not representative of the multilingual fabric of Canada’s regions. The underlying target seems to have been the two official language groups of (usually) monolingual English speakers and French speakers in areas where they reach considerable proportions. Put differently, the focus appears to have slighted English speakers with more recent immigration histories, many of whom are second language (L2) speakers of English. This observation is not meant as a critique of one project, but as an illustration of a bias in the approaches to language in geographical and social space. The systematic inclusion of L2 speakers requires a different theoretical conceptualization of language in space, a globalization of sociolinguistics that is only now becoming available and that puts mobility at its centre. First, however, less extreme approaches towards including L2 speakers in survey populations will be discussed.
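The mismatch can be quantified by dividing each language group’s share of the sample by its share of the census population, so that a ratio of 1 means proportional representation. The sketch below uses only the figures quoted above: German is omitted because no census figure is given for it, and the Chinese figure of 400,000 is a lower bound. It also glosses over the fact that the sample covers the Golden Horseshoe while the census figures refer to Toronto; the function name is ours.

```python
SAMPLE_SIZE = 1015
CENSUS_POPULATION = 5_000_000

# Respondents listing each additional language (Dialect Topography, Golden Horseshoe 1991/2)
sample_counts = {"Dutch": 57, "Italian": 39, "Hindi/Urdu": 4, "Chinese": 3, "Punjabi": 0}
# Speakers in the Toronto census figures quoted in the text (Chinese: lower bound)
census_counts = {"Dutch": 11_000, "Italian": 185_000, "Hindi/Urdu": 132_000,
                 "Chinese": 400_000, "Punjabi": 132_000}


def representation_ratios(sample, n, census, population):
    """Sample share divided by population share for each language group."""
    return {lang: (sample[lang] / n) / (census[lang] / population) for lang in sample}


ratios = representation_ratios(sample_counts, SAMPLE_SIZE, census_counts, CENSUS_POPULATION)
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang:<11}{ratio:6.2f}")
```

On these figures, Dutch speakers come out roughly 25 times over-represented and Italian speakers almost exactly in proportion, while Hindi/Urdu, Chinese and Punjabi speakers appear at a small fraction of their census share, if at all.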

5.1.1  From monolingual to multilingual perspectives

Staying within the Canadian context, one can see a cline of development from very strict inclusion criteria for respondent selection to a gradual widening of scope to include all long-term residents of a survey location. Polson’s (1969) Questionnaire for British Columbia applied overly exclusive selection criteria for respondents, which are summarized below:

1. Respondents “must have been born” in the area – “or at the very least they must have come to the area at a very early age”.
2. Residence in the area for “most of their lives with very few absences”.
3. “Their parents must not have spoken any language but English in the home”.
4. Their parents “should, if possible, have been long-time residents of the area”.
5. Intelligent and knowledgeable, no difficulties reading.
6. No education beyond high-school.
7. Knowledge of farming, ranching and the outdoors “would be useful”.
8. Willingness to participate.

The monolingual focus of Polson’s questionnaire and of all Survey of BC respondents becomes evident from restrictions #1, #2 and #3. The local character is reflected in #4, while the more traditional focus is also seen in the restriction in terms of education (#6, presumably to avoid distortions through educated usage) and in the focus on rural terminology (#7).
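Criteria #1 to #4 are the ones that can be checked mechanically, and encoding them shows how they filter respondents. In the sketch below the field and function names are hypothetical. Criteria #5 to #8 are left out because they rest on the fieldworker’s judgement, and #4 is treated as a preference rather than a filter because Polson hedges it with “if possible”.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    born_in_area: bool           # criterion 1
    arrived_very_young: bool     # criterion 1, fallback clause
    mostly_resident: bool        # criterion 2: "very few absences"
    parents_english_only: bool   # criterion 3: only English spoken at home
    parents_long_resident: bool  # criterion 4: "if possible"


def polson_screen(r: Respondent) -> tuple[bool, bool]:
    """Return (admissible, preferred) under Polson's criteria 1-4."""
    admissible = ((r.born_in_area or r.arrived_very_young)
                  and r.mostly_resident
                  and r.parents_english_only)
    preferred = admissible and r.parents_long_resident
    return admissible, preferred


local_monolingual = Respondent(True, False, True, True, True)
# Born and raised in the area, but the parents spoke another language at home
local_bilingual_home = Respondent(True, False, True, False, True)
print(polson_screen(local_monolingual))     # (True, True)
print(polson_screen(local_bilingual_home))  # (False, False)
```

A lifelong local whose parents used another language at home fails on criterion #3 alone, which is precisely the monolingual bias at issue.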




Polson’s criteria seem peculiar and overly limiting from today’s perspective. His respondents share the most important characteristics of NORMs, the non-mobile, older, rural and male speakers of traditional dialect geography, above all their non-mobility. The criterion of de facto monolingualism (#3) kept the group artificially homogeneous. Combined with the exclusion of any multilingual component, the respondent selection criteria seem revealing of implicit assumptions about what kinds of language were worthy of study. Against this background, Chambers and Heisler’s (1999) “Regionality Index” (RI), conceived for the Dialect Topography project, must be regarded as highly innovative. The index represents a departure from an exclusive focus on monolingual and geographically static speakers towards more inclusive and more representative data sampling. The RI is an index value assigned to every respondent that quantifies his or her strength of local ties. Respondents are placed on a scale from 1 (very local) to 7 (recent migrant) based on birthplace, place of residence between ages 8 and 18, and parents’ birthplaces. Section 8.2.1 explains the details of the arithmetic, but at this point it suffices to say that the Regionality Index is heavily weighted towards the local population. Respondents with scores from 1 to 5 have all grown up in the target region, while only the scores of 6 and 7 are used to classify those who did not. The RI was a step forward that allowed the inclusion of some aspects of multilingualism. Compared with Polson, who categorically ruled out the multilingual, and even the bilingual, speaker, the RI is a big achievement. However, a score of 7 is a class that covers an excessively wide range of respondents and is more of a catch-all for all those who migrated to the target region after age 18.
Consider the two cases in Table 5.1, which would be treated identically by the RI:

Table 5.1  Calculating the Regionality Index: Two fictional examples from Toronto

                                            Points                                   Points
Born in Montreal                            2        Born in Germany                 2
Raised in Fredericton ages 8–18             2        Raised in Germany ages 8–18     2
One parent born in Poland, one in France    2        Parents born in Russia          2
Base point                                  1        Base point                      1
Regionality Index                           7        Regionality Index               7
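The arithmetic of Table 5.1 amounts to a base point plus the points for three components. The sketch below reproduces only that summation. The function name is ours, the per-component range of 0 to 2 is an assumption consistent with the 1–7 scale, and the actual allocation of points is left to Section 8.2.1 rather than guessed here.

```python
BASE_POINT = 1


def regionality_index(birthplace_pts: int, raised_pts: int, parents_pts: int) -> int:
    """Sum the base point and the three component scores, as in Table 5.1.

    Assumption: each component contributes 0, 1 or 2 points, so the
    index runs from 1 (very local) to 7 (recent migrant).
    """
    for pts in (birthplace_pts, raised_pts, parents_pts):
        if pts not in (0, 1, 2):
            raise ValueError(f"component score {pts} outside assumed range 0-2")
    return BASE_POINT + birthplace_pts + raised_pts + parents_pts


# Left: born in Montreal, raised in Fredericton, parents born in Poland and France
left = regionality_index(2, 2, 2)
# Right: born in Germany, raised in Germany, parents born in Russia
right = regionality_index(2, 2, 2)
print(left, right, left == right)  # 7 7 True
```

Because the function receives only point values, nothing about the respondents’ language backgrounds survives the calculation, which is why two very different respondents become indistinguishable.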

The RI was not designed to distinguish between the respondents on the left and the right side of Table 5.1, who otherwise have little in common. The respondent on the left is an anglophone with strong Canadian ties, while the one on the right migrated to Toronto only after age 18 from a German-language background. The basic principle behind the RI would certainly allow for the inclusion of (more) recent immigrants to a target region. It is at this point that even more modern approaches, approaches that are more tolerant of migration and multilingualism, reveal the RI’s focus on the “native speaker”: the RI was intended to distinguish between different kinds of native speakers, in this case of CanE, as can be seen in statements such as the following: “We would like to know how close the ties have to be before a person ‘speaks like a native’” of a given target region (Chambers 2000: 180–1). As such, the RI is only a modest advance towards including the long-term and longer-term residents of a region. It is, however, not at all suitable for modelling transitory migrations, as we will see in the next section. As Table 5.1 shows, both respondents would have been ruled out as inadmissible for Toronto English only because they moved to the region after their 18th birthday. In the context of CanE, and as far as a monolingual or an officially bilingual (English-French) perspective is concerned, the Regionality Index is useful. Some features have been shown to be adopted primarily by members of the very local population (e.g. Canadian Raising), and such an index would reveal this differential. Against the linguistically more diverse Canadian background, and indeed a global background, however, the index makes sense in only a very limited way. To capture and model a more complete picture of linguistic variation in a location, one needs to go beyond the Regionality Index, and Section 5.1.2 explores some alternative, still developing, approaches.

The project’s “Language Use Index” (LUI) is another index of interest (Chambers and Lapierre 2011). Based on questions about how often a respondent uses a language other than English in four settings (at home, at work, with relatives and among friends), answers are scored on a scale offering the options always (3), often (2), seldom (1) or never (0). The added scores produce the LUI: the higher the index, the more a language other than English is used.
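The summation can be sketched as follows. Following the endpoints given in the text (0 for a monolingual English speaker, 12 for no English in any setting), the sketch assumes that each answer records how often a language other than English is used; the setting keys and the function name are hypothetical.

```python
FREQUENCY_POINTS = {"always": 3, "often": 2, "seldom": 1, "never": 0}
SETTINGS = ("home", "work", "relatives", "friends")


def language_use_index(answers: dict[str, str]) -> int:
    """Add up the frequency points for the four settings (range 0-12).

    Assumption: each answer states how often a language other than
    English is used in that setting.
    """
    missing = [s for s in SETTINGS if s not in answers]
    if missing:
        raise ValueError(f"no answer for: {', '.join(missing)}")
    return sum(FREQUENCY_POINTS[answers[s]] for s in SETTINGS)


english_only = {s: "never" for s in SETTINGS}
mixed = {"home": "always", "work": "never", "relatives": "often", "friends": "seldom"}
print(language_use_index(english_only))  # 0
print(language_use_index(mixed))         # 6
```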
The minimum is “0”, which is a monolingual English speaker; the maximum is 12, which would be a person who does not speak English in any of these four settings (which would be suspicious – after all, how could he/she fill out the questionnaire?). Everyone else falls between the two extremes. The LUI is relevant for an understanding of how a linguistic substrate, e.g. a home or heritage language, an additional L2, or a combination of these, can influence one’s reported use of English. It is even more meaningful if the information on the use of English can be paired with information on a person’s other languages. In Section 8.2 both the Language Use Index and the Regionality Index will be explored further and applied to real-world cases. At this point, it is important to know that traditional WQs offer some tools to model multilingual competences, but that these tools go only some modest way towards a more realistic modelling of multilingual competencies. In the context of Canada’s big cities, in which non-mother-tongue speakers of an official language (English or French) generally make up about 20% of the population, a percentage that is mirrored in other countries’ metropolises, more intricate and precise ways to model multilingualism would be in order. This is especially the case, to stick with the Canadian example, in Toronto and Vancouver, where 43.8% (Toronto) and 41.5% (Vancouver) of the population do not speak English as their mother tongue (2006 data, Boberg 2010: 21, including French), which makes the systematic inclusion of L2 speakers an especially pressing issue. Some have begun to take this view in sociolinguistic interviews (e.g. Hoffman & Walker 2010; Nagy, Chociej & Hoffman 2014), while WQ projects in Vancouver have included a more focussed polling of L2 speakers, who comprise about half (46%) of a large Vancouver sample (Dollinger 2012a). In general, however, WQ methodology has yet to be explored more systematically in multilingual contexts. The question might arise why multilingualism has not been more actively integrated into social dialectology. One answer lies in the disciplinary histories. If one considers, for instance, the development of dialect geography, as outlined in Chapter 1, the focus on long-established communities and speakers with local ties seems quite logical. If one’s primary goal is to establish lineages of historical development, as was the original intention of dialectology until well into the second half of the 20th century, one is bound to poll the most local and most narrowly confined speakers in any community, the NORMs. Mobility and its effects, such as multilingualism, are clearly factors that would “dilute” the desired data. In addition to this discipline-intrinsic reason, there may be purely practical ones. For instance, the focus on the monolingual speaker-respondent in established populations avoids questions that are theoretically difficult to address. Such questions, as shown in 1–3 below, have only recently been explored in the modelling of the Canadian multilingual experience:

1. At which point does an immigrant community become an ethnic group within the larger community?
2. At which point does a group of speakers become a permanent part of the local social fabric? In the Central European context of labour migration that started in the 1960s, it took over 20 years for the first studies of migrant worker language to surface (e.g. Clahsen, Meisel & Pienemann 1983), but much longer for a general acceptance of 2nd and 3rd generation immigrant children into the societal mainstream, a process which is not yet complete. At which point would these former migrant speakers need to be included in a study of the main speech patterns in a given community?
3. To what extent, if at all, can one consider people who have emigrated from the same location or who speak the same minority language as forming a community (Walker & Hoffman 2013: 80)? In the wider macro-context that WQ studies usually provide, the question is a facet of the previous one: if one is polling the linguistic behaviour characteristic of location X, at which point would one wish and need to include more recent migrants?


As migrants make up large percentages of the population in many urban locations, their inclusion seems a logical next step. One might argue that the percentages do not even need to be as high as in the Canadian examples for L2 speakers to be included in a representative sample: depending on the research question, a 10% threshold of L2 speakers in a community might be one reasonable criterion for sampling if general statements about the linguistic variables in one location are to be made.
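One way to operationalize such a threshold is proportional allocation: once L2 speakers make up at least 10% of a community, they receive a share of the sample matching their share of the population. This is one possible reading, not a procedure proposed in the text. The function name and the sample size of 1000 are hypothetical, while the Toronto and Vancouver shares are the 2006 non-English mother-tongue figures cited above (which include French).

```python
def l2_quota(l2_share: float, sample_size: int, threshold: float = 0.10) -> int:
    """Number of L2 respondents to recruit under proportional allocation.

    Returns 0 when the community's L2 share falls below the threshold,
    i.e. when no separate L2 quota is set.
    """
    if not 0.0 <= l2_share <= 1.0:
        raise ValueError("l2_share must be a proportion between 0 and 1")
    if l2_share < threshold:
        return 0
    return round(l2_share * sample_size)


print(l2_quota(0.438, 1000))  # 438 (Toronto)
print(l2_quota(0.415, 1000))  # 415 (Vancouver)
print(l2_quota(0.05, 1000))   # 0 (below threshold)
```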

5.1.2  From the national to the transnational: The sociolinguistics of globalization

As useful as the adaptations to traditional WQs discussed so far may be, migration and multilingualism are still treated as special events, and not as events and behaviours that are bound to become more and more frequent, a new norm in a globalizing world. While questions such as 1–3 above go some way towards capturing the linguistic realities, they are still based on assumptions of single-event migratory moves from one location to another, typically from an economically less developed area to a more developed one, which has been the pattern in Canada and other countries. This perspective, however, overlooks the emerging realities of transnational, lifetime mobility. For instance, when adolescents of Bengali background in a low-income Stockholm neighbourhood report on their multilingual modes of expression, they do so in multiple codes that may include Bengali, standard Swedish, vernacular Swedish, and standard and vernacular English. Most importantly, while these adolescents may have spent most of their lives in Sweden, they do not always orient themselves within a local, Swedish framework, but rather towards a global one. This global frame of reference is the crucial difference between earlier migration waves and more recent ones that draw on affordable air travel and digital means of communication. On the one hand, these youth grow up in a multiethnic, multilingual context of super-diversity (see next section), where no single language is dominant in the neighbourhood: neither a government-supported language, such as Swedish in this example, nor any immigrant language. Faced with such diversity, the adolescents learn and employ partial linguistic competencies in a number of codes and construct identities that are decidedly mixed or transcultural. By communicating with friends from different ethnic backgrounds, with friends in digital forums, and with friends and family in diasporic contexts (e.g.
London, UK, Australia, India, South Africa), they act linguistically differently from members of traditional immigrant neighbourhoods, whose linguistic settings were less diverse and whose pre-digital communication options were rather limited. Through much of the 20th century, these traditional immigrant neighbourhoods tended to be dominated by a single ethnic group (e.g. Italians and Germans in Toronto, Cantonese in Vancouver, Turkish migrants in Vienna), whose heritage language played a special role, but otherwise assimilation into the linguistic mainstream was generally only a question of time (e.g. Hoffman & Walker 2010 for Toronto). The situation of some immigrant neighbourhoods is different today, as their ethnic make-up is decidedly more mixed as a result of more sizeable and more frequent migratory moves. The Bengali Swedes, for instance, construct their own linguistic ‘multiethnic lects’ as an expression of their transnational orientation, with Swedish as only one part of the mix. They consider the possibility of relocating to other locales, which finds expression in their linguistic and cultural competencies, competencies that will adapt and expand throughout their lifetimes (Haglund 2010). Unlike for the traditional immigrant, whose target was often “integration” into mainstream culture (which is still the model national governments stubbornly subscribe to), migratory moves today are often no longer one-time events of permanent relocation but moves for shorter periods of time. As a result, questions like 1–3 above may no longer capture the migratory realities of an increasing number of people in an age of rapid globalization. Frequent temporary moves involve the active negotiation of one’s linguistic and cultural identities, a negotiation that is more complex than can be captured by “hyphenated identities”, such as Iranian-Canadian, Chinese-Canadian or Polish-Canadian.

A new kind of sociolinguistics?

Some linguists have convincingly argued that current heightened scenarios of mobility and interconnectedness on a global level demand new models of language and language use and new methodologies (Blommaert 2010; Pennycook 2010, 2007; Fairclough 2006; Calvet 2006). Blommaert (2010), for instance, has been arguing for a new sociolinguistics of mobility, which expressly acknowledges the changing social realities of linguistic use: with an increase in mobility one is confronted primarily with “truncated” linguistic repertoires and not, as is often assumed, with full language competences. The basic premise of Blommaert’s approach is that patterns of mobility (social mobility, economic mobility, refuge and forced mobility) produce a mixing of linguistic codes and repertoires that can be referred to as “super-diversity”. Such super-diversity might be found in immigrant neighbourhoods in European cities, where few L1 speakers of an otherwise dominant language are present, as discussed above; as a result of armed conflict around the world (e.g. Rwanda, Darfur, or more recently Syria and Ukraine); as a result of political changes, e.g. the fall of the Iron Curtain in the Eastern Bloc or the opportunities and challenges in post-Apartheid South Africa; or as a result of more positive changes, such as increased student mobility, which brings speakers of more diverse linguistic backgrounds into communication with one another, creating their own communicative scenarios as they see fit.

140 The Written Questionnaire in Social Dialectology

Researchers on super-diversity often stress that partial linguistic competences are the norm rather than the exception. While this is a truism, since no native speaker has ever had full control over any language in its entirety, partial competences or “truncated repertoires” come to the fore in the current, accelerating stage of globalization. What was once described as a fairly static linguistic marketplace of symbolic capital and power (Bourdieu 1991) is being turned into a “messy” new marketplace with mobile resources, mobile speakers and mobile markets. The challenge now is to model this much more diverse linguistic marketplace adequately. Just a few remarks shall be made to that effect here. Blommaert’s (2010) proposal for modelling the new situation rests on the notions of sociolinguistic scales, varying orders of indexicality and the polycentricity of linguistic power. With polycentricity, Blommaert aims to explain why a given range of linguistic features, tied to particular linguistic norms, may occur in certain contexts but not in others. Understanding the indexicality of linguistic forms and processes is key to being understood in the desired way: it concerns the kind of register, linguistic acts and forms used to index the speaker as, e.g., a lawyer, an asylum seeker, an international scholar and the like. With sociolinguistic scales Blommaert refers to acts of communication that manifest themselves on individual and collective levels. He writes that “when people or messages move, they move through a space that is filled with codes, norms and expectations” (2010: 32). These expectations are bound to clash in one way or another, or at least not to be entirely congruent, which requires negotiation skills on the part of the interlocutors (Pennycook 2010 goes one step further with the interesting proposal that this negotiation-communication process should be at the centre of linguistic study).
Scales are important because they help explain why immigrant neighbourhoods in European cities, for instance, no longer work like the traditional single-dominant-language neighbourhoods they once were: other scales (communicative events) exert their influence and manifest themselves in a dazzlingly varied mix of communicative expectations and norms. Whether in Antwerp, Berlin, Copenhagen, London, Oslo or Vienna, language in urban neighbourhoods is created interactively by taking linguistic features from all available languages and language varieties and forming a new “multiethnic” code. Such multiethnic lects, which are adaptations of the local vernacular, have been documented in a number of cities and are shared by youth from different ethnic backgrounds (e.g. Quist & Svendsen 2010; Hansen & Pharao 2010 in Copenhagen; Kotsinas 1998 in Stockholm; Cheshire et al. 2011 in London; in Canada, no such claims have yet been made, see Hinrichs 2014 in Toronto). These speakers construct their linguistic identities in a “transnational frame of reference rather than an ethnonational one” (Haglund 2010: 102), and as such they construct their identities as novel multilingual, multivariational configurations, defying prevailing opinion within society, which generally sees multilingualism as a hindrance



Chapter 5.  World Englishes, multilingualism and written questionnaires

to integration into the mainstream society. The point is that integration is only one scenario, with the creation of transnational frames offering other options for identity formation.

Local literacy practices and WQs It should have become obvious that global and multilingual streams are highly complex and that WQ designers cannot assume that the traditional (national) linguistic frame of reference is uncontested in super-diverse contexts. One study that used WQs in such a linguistically diverse area is the study of Wesbank High, the only high school in an underprivileged neighbourhood of Cape Town, South Africa (reported in Blommaert 2010: 78–101). Distinguishing between the ‘core’ and ‘periphery’ of English practices, in which the core has access to standard varieties and the periphery does not, Blommaert explores the local writing practices in the school setting, where Afrikaans, Xhosa and varieties of English are used. In this context, students and teachers have constructed their own local form of English, a “grassroots literacy” corresponding to their limited access to standard models. Its features include:
– Erratic use of capitals, e.g. “English. because it’s the oFFicial Language in South Africa”
– Inconsistent singular and plural marking
– Variable verb inflection, esp. plural and tense marking
– A wide range of non-standard spellings, mostly the result of phonetic spelling, e.g. spesel ‘special’, dearist ‘dearest’, neve ‘never’ or pefect ‘perfect’
– Aestheticized writing, e.g. ornamented letters, “while struggling with basic writing skills” (Blommaert 2010: 84)
Grassroots literacy comprises ‘non-elite’ forms of writing, or writing “performed by people who are not fully inserted into elite economies of information, language and literacy” (Blommaert 2008: 7). Such ‘sub-elite literacy’ and ‘skeleton writing competences’ have their roles in the local context and can loosely be interpreted as locally created norms.
And while such forms are expressions of creativity on the local level and are “sociologically realistic form[s] of literacy in the sense that [they] mirror the marginalized status of the community in which they occur[]” (ibid: 90), they serve quite a different function beyond the immediate local context, where they mark the writer as not fully literate or worse. On the local level, however, they are “a level of shared literacy culture in an otherwise extremely heterogeneous community” (ibid: 90). Obviously, findings on writing systems, literacy styles and grassroots literacy are of immediate relevance to a survey tool that relies on written input, such as WQs. It is important to recognize the difference between levels of scale, such as the local and the non-local, and the social indexing that happens on each of these levels. The language choice for WQ questions thus becomes a highly sensitive issue, and what works in one setting (in Wesbank High the questions were presented in both Standard English and Standard Dutch) may not work at all in another. Although one or more forms of English are generally part of the mix in super-diverse settings, English may not be among the best language choices for a WQ. The limits of WQs that are administered from a distant location without intimate knowledge of the local situation would become clearly visible in symptoms such as low response rates and samples biased towards a particular subgroup of the population. It is important to realize that while WQs may still be employed successfully, they would best be delivered as one component of extended fieldwork and ethnographic engagement, rather than as an online WQ distributed from afar.

5.2 World Englishes, Global Englishes: Concepts

The field of World Englishes20 is a highly productive enterprise today. With at least four academic journals, World Englishes, English World-Wide, English Today (more diverse in its readership) and, more recently, English as a Lingua Franca, it has brought a global perspective to a field that prior to the 1980s dealt predominantly with British or American perspectives. One aspect that is foregrounded in World Englishes is the role of L2 speakers. We have seen that L2 speakers have entered the picture only comparatively recently, having been given more attention in social dialectology only in the past few years. One of the goals of the present chapter is to address multilingualism and English in a broader, global perspective. For the study of World Englishes, WQs seem predestined to play an important role due to their time efficiency and cost effectiveness. The term World Englishes, which is used in the plural to reflect the diverse nature of its varieties, refers to forms of English around the globe. The field has grown rapidly. While it barely existed in the early 1980s (Bailey and Görlach’s (1982) and Platt et al.’s (1984) collections are early landmark publications), this area has produced a vast amount of research over the past two decades and must now be considered one of the most prolific fields in English linguistics overall. While there are many models of World Englishes, one of the first and most influential ones is Kachru’s (1985) “circle model”.

20. Please note that, owing to space constraints, the terms World Englishes and Global Englishes are used interchangeably in this book, regardless of the recent distinction between World Englishes, focussed on geographically definable varieties, and Global Englishes, which foregrounds a transnational, lingua-franca perspective.




The Three Circle Model of English has been popular in the field and will be used throughout the remainder of this book. It was described by Braj Kachru in 1985 and is based on three concentric circles, as shown in Illustration 5.1.

Inner Circle (ENL) – “norm-providing”
Outer Circle (ESL) – “norm-developing”
Expanding Circle (EFL) – “norm-dependent”

Illustration 5.1  Visualization of Kachru’s Three Circle Model

The model is composed of an Inner Circle, comprising the countries where native speakers of English form the majority. These are the UK and the English settler colonies: USA, Canada, Australia, New Zealand.21 These countries are the traditional “norm-providing” countries, as linguistic norms and standards have traditionally been formed in reference to the Englishes used in these locations, above all in the UK and the US. The number of English speakers in this group is stagnant. The Outer Circle represents the countries that have had colonial ties to England or the USA (in the case of the Philippines) and where settlers were the (often very small) minority. In these locations the native populations often adopted English and developed their own forms of it, e.g. in India, Pakistan, the Philippines, Nigeria, Egypt and so forth. In this group, English was originally taught as a second language (not a foreign language), or ESL. These countries have, to varying degrees, been called norm-developing, as they have begun to codify their own varieties of English, with India and the Philippines the most advanced. The number of English speakers in this group is growing quickly. Finally, the Expanding Circle is the rest of the world, which has never had colonial ties with England or the US and where English has been taught or is increasingly taught as a foreign language (EFL). The use of English in these highly diverse locations, e.g. Austria, Nicaragua, Russia or China, is generally confined to international contexts, while English typically has no intranational (internal) functions within these countries. Only comparatively recently did English gain a wider foothold in some of the social and professional circles of these countries’ societies, though to differing degrees and mostly among the younger generations, such as in university courses taught in English to Austrians or business transactions carried out in English in Russia or China. In Expanding Circle countries, English is very limited in its scope and secondary to one or more vernacular languages. These countries have traditionally been called “norm-receiving”, i.e. following Inner Circle models of English (usually UK or US English) in their foreign language teaching. The number of English speakers in this group is growing rapidly as well.

21. South Africa is a more complex case due to its complex linguistic history, but is occasionally assigned to the Inner Circle.

There are a number of traditional terms describing speaker competencies that relate to the circle model in interesting ways. The term English as a Native Language (ENL) is traditionally associated with the Inner Circle and is also called First Language (L1). English as a Second Language (ESL) is associated with the Outer Circle, where English has institutional backing, and English as a Foreign Language (EFL) with the Expanding Circle. The term Second Language (L2) cuts across ESL and EFL and is sometimes complemented with Third (L3) or Fourth Language (L4) and so on, though in most cases L2 is used as a cover term for all additional languages (second and up). English as a Lingua Franca (ELF) is English used for communication among non-native (L2) speakers and cuts predominantly across the categories of the Outer and Expanding Circles, as both groups show characteristics of English language use that differ from traditional Inner Circle norms. Like ESL, ELF is also found in the Inner Circle, though ELF and ESL/EFL differ in their target norms: while ESL and EFL take the native speaker norm as the learner target, ELF is defined differently, as resulting from the communicative situations and norms created by non-native speakers.

5.3 WQ Elicitation in World English contexts

World Englishes have been studied with the traditional type of speaker questionnaire that was the basis for the discussion in Chapters 2–4. Two interesting approaches will be presented in this section; they differ from the studies presented in Chapter 4 in a number of methodological respects. The first is an application of WQs in vastly different contexts, the Bamberg Questionnaire Project (see Krug, Hilbert & Fabri forthcoming), which aims to collect information on linguistic features and their social correlates. The second uses WQs to probe into speaker attitudes towards various accents of speakers of English as a Lingua Franca (Jenkins 2007). The Bamberg project is innovative in a number of ways. It uses a very detailed 15-page questionnaire that comes in four parts:




– a personal background sheet
– an auditory part with grammatical and discourse marker features, which is played to the respondents, who are then asked to evaluate each linguistic feature
– a lexical part with 68 items
– a written part of 200 sentences
(Krug & Sell 2013: 80)

The questionnaire has been administered to university and secondary school students, originally in four locations, for a research project on Romance-English language contact: Malta, the British Channel Islands, Gibraltar and Puerto Rico.22 As a benchmark, data were also collected in the UK and the USA, which allows each variable to be positioned relative to these two quasi-standards of English. The questionnaire differs from the typical postal or written questionnaire in that it includes sections requiring assistance from an administrator. As sound clips need to be played, the questionnaire is used in an interview setting where respondents can ask questions (Krug & Sell 2013: 82). Administered over more than one day and about 80 minutes in length, the questionnaire is at the longer end of the spectrum. Krug and associates employ scaling methods that allow for graded answer options. Judgements about the currency of a given linguistic feature are elicited on a six-point scale from “everyone = 5” using the construction to “no-one = 0”, which makes the answers a type of community report rather than a self-report. A number of controls are built into the questionnaire to account for respondent behaviour, including the tendency to shy away from the extreme poles, which will be discussed, along with other aspects of questionnaire design, in Chapter 7. The data are limited to educated English, or the acrolect in each location, as a result of data collection in secondary and post-secondary settings.
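Because community-report answers are ordinal, they are usually converted to numbers before being summarized. The following is a minimal sketch, not the Bamberg project's own procedure: it assumes that the six labels heading the grammatical section (No-one, Few, Some, Many, Most, Everyone; see Table 5.3) are coded 0 to 5 at equal intervals, although the source fixes only the two endpoints. The function names are hypothetical.

```python
from statistics import median

# Assumed coding: only the endpoints ("no-one" = 0, "everyone" = 5) are given
# in the source; equal spacing of the four middle labels is an assumption.
SCALE = {"no-one": 0, "few": 1, "some": 2, "many": 3, "most": 4, "everyone": 5}


def code_answer(label):
    """Convert one community-report label into its numeric score."""
    return SCALE[label.strip().lower()]


def summarize_item(answers):
    """Return (mean, median) for one item, skipping blank answers (None)."""
    scores = [code_answer(a) for a in answers if a is not None]
    if not scores:
        return None
    return sum(scores) / len(scores), median(scores)


# Hypothetical answers from five respondents to a single sentence item.
answers = ["Some", "Many", None, "Few", "Some"]
print(summarize_item(answers))  # (2.0, 2.0)
```

Reporting the median next to the mean is a hedge against the equal-spacing assumption, since the median depends only on the order of the labels.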
22. More recent collection rounds targeted Australia, New Zealand, Sweden, Ireland, Wales and Scotland (Krug: personal correspondence, March 2015), which expands the questionnaire’s dissemination considerably.

In the grammar and discourse marker sections the respondents are asked to provide community reports on whether each sentence could be said “in their home country in an informal conversation” or could be “written in their home country in an email to a former teacher”. Krug and Sell (2013: 81) suggest that this framing helps to eliminate socially desirable answers with regard to stigmatized forms: e.g. if it were known that aluminum is coded as [– highly educated] or as [+ American], chances are that respondents who use that variant would report aluminium instead of aluminum if they wanted to be seen as [– American] or as [+ educated]. On the other hand, the link between respondents’ social categories and their linguistic answers is more indirect than if personal language use were elicited, since exposure to forms is not necessarily a function of the social categories of the user but a function of the extent and reach of their social networks. The types of lexical and grammatical variables are shown below. For lexical choices, however, personal use is polled:

Variant pairs (left / right):
a drop in the ocean / a drop in the bucket
a faucet / a tap
aluminum / aluminium
anticlockwise / counterclockwise
eggplant / aubergine

Response scale between the two variants (left to right): I always use this expression; I use this expression more often; I have no preference; I use this expression more often; I always use this expression. Further columns: I never use either expression; Explanation / Comment.

Table 5.2  Excerpt from the lexical part of the Bamberg Questionnaire (Krug & Sell 2013: 96)
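One way to turn the lexical grid into numbers is to score each answer on a bipolar scale. The sketch below assumes a coding from −2 (always the American variant) to +2 (always the British variant) for the five columns between the two variants, and treats “I never use either expression” as missing rather than as a neutral 0; this coding and all names are illustrative assumptions, not the Bamberg project's scheme. Because a drop in the ocean and anticlockwise, both usually considered British, are listed first in their pairs as reproduced here, the side of the British variant is recorded per item instead of being assumed.

```python
NEVER = "never"  # stands for "I never use either expression"

# Scores for the five response columns, read from left to right:
# always left, more often left, no preference, more often right, always right.
POSITIONS = [-2, -1, 0, 1, 2]


def code_lexical(choice, british_on_right=True):
    """Score one answer from -2 (American pole) to +2 (British pole).

    choice: column index 0-4, or NEVER for "I never use either expression".
    british_on_right: whether the British variant is printed on the right.
    Returns None for NEVER so that it is not confused with "no preference".
    """
    if choice == NEVER:
        return None
    score = POSITIONS[choice]
    return score if british_on_right else -score


def respondent_index(responses):
    """Mean score over the items a respondent placed on the scale."""
    scores = [code_lexical(choice, side) for choice, side in responses]
    answered = [s for s in scores if s is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical respondent: (column chosen, British variant on the right?)
responses = [(0, False), (3, True), (NEVER, True), (2, False), (4, True)]
print(respondent_index(responses))  # 1.25
```

Keeping “never use either” apart from “no preference” matters here: a cluster of such answers on one item, as with dummy vs. pacifier, may point to a missing third variant rather than to neutrality.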

The lexical part aims to place each variable between the (extreme) poles of American (on the left) and British English (on the right), and it is clear that, although inadvertently and as a consequence of the condensed format of WQs, a binary opposition is constructed where there is not necessarily one. The grammatical part expressly seeks to elicit judgements of the currency of a construction in “emails to a former teacher”:

G1. Over the last few years people have become less willing to do the manual work.
G2. I’m learning French because is a beautiful language.
G3. You’ve already met my father, no?
G4. She came over and speak to us.

Answer columns for each sentence: No-one / Few / Some / Many / Most / Everyone

Table 5.3  Extract from grammatical section of Bamberg Questionnaire (Krug and Sell 2013: 98)




The idea is to get respondents to report the use of non-standard features truthfully, because they are not required to report on their own or their friends’ use. Features such as article insertion (G1), pronoun deletion (G2), question tags (G3) and non-agreement of verbal tense (G4) are presented to the respondents so that they can report whether such constructions might occur in “emails to a teacher”. The strategy aims to counter the influence of prescriptive norms and socially acceptable behaviour. While it appears intuitive, one might wonder whether the educational settings of secondary and post-secondary schools would not produce a high level of standard-like behaviour despite the instructions. Preliminary results seem to suggest, however, that this problem may be confined to the higher social strata (Krug: personal correspondence, March 2015). One problem seems to be that the hypothetical setting remains vague, as the concrete communicative context is not specified: which teacher (the English teacher, the math teacher?) and which type of school could well make a difference. Two points of critique come to mind for the Bamberg questionnaire. Both are minor in the context of the project but serve to illustrate potential problems with the use of one questionnaire in multiple locations, which is one of the biggest issues for WQ methodology in global contexts. The first concerns the background sheet and the second the use of dichotomous UK/US prompts. The background questions are fairly standard in their make-up: age, gender, nationality, ethnic self-identification, country or region of identification, education and so forth, including the parents’.
Residence history is polled in full, from birth to the time of polling, which offers a good basis for geographical correlations, while multilingualism and language use in the home are reported in four discrete categories (here for Malta):
– English
– mostly English – some Maltese
– mostly Maltese – some English
– Maltese

In addition, a text field for “other” languages and fields for the mother’s and father’s native language(s) are offered. The use of one type of questionnaire in multiple settings presents challenges for the proper documentation of linguistically important factors that go far beyond the immediate project. Language use is a case in point: Italian, which is used by almost two thirds of the Maltese population and is “the superstrate language of its earlier history and one of the media languages” (Markus 2007: 204), is not listed. While respondents would likely report additional languages in the fields provided, we are left wondering whether listing all major varieties would send different signals to multilingual speakers and produce more variation than otherwise. This points to a more general problem in polling multilingual communities.


The second issue concerns the linguistic stimuli for lexical items and results from highly complex semantic distinctions in World Englishes, including homonyms (see, e.g. Görlach 1995). The lexical section polls the self-reported use of lexical variables (Table 5.2). One such variable is dummy vs. pacifier. This variable shows three dominant variants, including soother, as shown in Figure 5.1. The lack of a major variant such as soother is likely to trigger a substantial number of answers in the category “I never use either expression” or, worse, to produce erroneous data. Interestingly, soother does not produce such comments, whereas sneakers/trainers does, with respondents using the Comment/Explanation field (Krug: personal correspondence, March 2015). As Figure 5.1 shows, soother is a frequent variant and more common in the UK than dummy, which is of relevance in the European English context.

[Figure: two panels, “soother” AND “baby” and “dummy” AND “baby”, each showing a frequency index by web domain (.ca, .uk, .ie, .nz, .au, .za, US)]

Figure 5.1  Soother and dummy in six varieties (14 May 2014). Source: DCHP-2 (s.v. soother)
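Figure 5.1 rests on targeted searches restricted to national web domains. The normalization actually used for DCHP-2 is not spelled out here, so the following is only a sketch under an assumption: raw hits for a target query in a domain are divided by hits for a reference query in the same domain and multiplied by a constant `scale`, so that domains of very different sizes become comparable. The hit counts and function names are invented for illustration.

```python
def frequency_index(target_hits, reference_hits, scale=1000):
    """Hits for the target query per `scale` hits of a reference query.

    Assumed normalization for illustration; not necessarily DCHP-2's formula.
    """
    if reference_hits <= 0:
        raise ValueError("reference count must be positive")
    return target_hits * scale / reference_hits


def index_by_domain(target, reference, scale=1000):
    """Apply frequency_index to every domain present in both dictionaries."""
    return {
        domain: frequency_index(target[domain], reference[domain], scale)
        for domain in target
        if domain in reference
    }


# Invented hit counts, for illustration only.
soother = {".ca": 120, ".uk": 300}
dummy = {".ca": 30, ".uk": 240}
reference = {".ca": 60000, ".uk": 200000}

for term, hits in [("soother", soother), ("dummy", dummy)]:
    print(term, index_by_domain(hits, reference))
# soother {'.ca': 2.0, '.uk': 1.5}
# dummy {'.ca': 0.5, '.uk': 1.2}
```

The point of the normalization is to compare competing variants within a domain, and one domain with another, on a common footing rather than by raw counts.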




The case of dummy vs. pacifier vs. soother (or, if the preliminary results can be taken as established, rather the case of sneakers vs. runners vs. trainers in the Maltese context) is a reminder that variables with more than two competing forms require special consideration. In the absence of detailed linguistic studies, the method of normalized, targeted internet searches used in the Dictionary of Canadianisms (DCHP-2) may help isolate the most relevant forms in some World English contexts (Dollinger 2011b, Dollinger 2015). However, behind this issue lingers a more profound problem, as lexical questionnaires of the type shown in Table 5.2 rest on the supposition that all variants of a variable are consistent synonyms of each other. In a global context, full or near semantic equivalence between forms is less likely than in more limited contexts. The basic point has been made by Görlach (1995), who drew attention to the stylistic non-equivalence of variants in a case study of the use of trash : rubbish : refuse : garbage : waste and found that in Global Englishes these variants cannot be expected to be exclusive to individual parts of [a] country – as they often are in the older mother country. Rather, in speech communities composed of people of different origins, they are likely to be used side-by-side, as synonyms […], or with various, often idiolectal, restrictions in style or compatibility. (Görlach 1994: 267)

When one asks for trash can vs. dustbin, one needs a fairly good picture of the semantic field of garbage and waste for the results to be meaningful. There are also a number of noun compounds, such as garbage can, or BrE-AmE hybrid compounds such as garbage (AmE) bin (BrE), the latter type figuring in CanE, that complicate the picture. In addition, when one considers Italian and Maltese loanwords for these terms, since processes of language mixing would likely occur in the Maltese setting, we see that WQs with binary or even multiple-choice answers reach their limits. These reflections show that there is no one-size-fits-all approach for WQs. There are at least two ways to consider WQs in international settings: from a basic typological perspective or from a location-specific, profoundly local perspective. The former aims to use categories that are comparable across different locations (as in the Bamberg project); the latter tailors the categories as closely as possible to the language-specific situation in one location, or uses open response types, which is the preferred type in many WE settings. The typological approach results in a bird’s eye view that must make some compromises in terms of accuracy, as shown above, while the location-specific approach can fully capture the fine-grained, location-specific responses that are caught in the tight-knit net of a tailor-made WQ.


5.3.1 Some problems of WQs in contact scenarios

There are also problems with the location-specific approach. The lexical examples above suggest that in multilingual settings open text fields would likely capture linguistic variation more accurately than closed-response items, though this design would increase the administrative burden considerably (see Section 9.3.3 on the classification of open answers). While open answers would be a practical way to document the multilingual and multidialectal base more fully and to avoid too many answers that are either “do not use” or, worse, variant choices that respondents do not actually use, the time respondents need to complete the WQ would increase. As the example of dummy vs. pacifier has shown, the lack of the plausible third answer choice soother renders the results of this question too imprecise for non-typological approaches. This problem is compounded in an increasingly mobile world, where the varying uses and meanings of terms are co-determined not just by region, the primary focus of traditional WQs, or by the classical group characteristics of traditional sociolinguistics, such as age, gender, social class or ethnicity, but also reflect heterogeneous populations that are socially highly diverse. Speakers who happen to live in one location may come from different parts of the globe and may communicate with one another using terms that refer to semantic fields that only partially overlap. Görlach’s conclusion on the semantic field of garbage and waste is a call for caution: WQs would need to be designed with particular care in order to gauge the inventory of linguistic variation in a given location. An array of questions relating to one semantic field might be needed in some cases, and open answer questions would be the preferred format, barring the availability of detailed studies.
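To illustrate the mechanical first step of handling open answers (their classification proper is the subject of Section 9.3.3), the sketch below normalizes free-text responses and counts variant types. The merge table is a hypothetical, hand-built list, and the answers are invented around the sneakers vs. runners vs. trainers case discussed above.

```python
from collections import Counter
import string

# Hypothetical lookup merging singular forms into one plural type;
# in practice such a list would be built by hand from the actual answers.
MERGE = {"trainer": "trainers", "sneaker": "sneakers", "runner": "runners"}


def normalize(answer):
    """Lower-case, trim surrounding spaces and punctuation, merge variants."""
    word = answer.strip().lower().strip(string.punctuation + " ")
    return MERGE.get(word, word)


def tally(answers):
    """Count variant types among non-empty open answers, most frequent first."""
    cleaned = [normalize(a) for a in answers]
    return Counter(w for w in cleaned if w).most_common()


answers = ["Trainers", "runners", "trainer.", "sneakers", " Runners ", ""]
print(tally(answers))  # [('trainers', 2), ('runners', 2), ('sneakers', 1)]
```

Unlike a closed item, such a tally can surface a variant the designer did not anticipate; the cost is the hand-built merge table, which grows with the diversity of the sample.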
In any case, for open answer questions a reduction of the number of questions, resulting in fewer variables, would have to be considered (see Section 7.2.1). While open answer questions yield more variants and more precise data, their format increases response times by a great margin, as writing or typing an answer, perhaps a more complex answer than just a word, requires more effort and more time from the respondents. As these examples show, the WQ designer is once more confronted with the difficult choice between the comparability of data across locations and the reliability of the information, a problem that was first discussed in Section 3.1.1. This dichotomy will keep resurfacing in this book as we discuss WQ and question design. As a primarily typological and innovative approach, the Bamberg project, quite understandably, foregrounds the former. But there is a price to be paid for multiple, binary choice response formats, which in most contexts reduce data reliability and the polled range of variation by a considerable margin. We said earlier that the range of variation is likely to be greater in WE contexts. In Chapter 2, responses from traditional, Inner Circle contexts were discussed and modelled for lexical variants, and Figure 2.1 introduced the concept of the A-curve. The A-curve visualizes the finding, for an Inner Circle setting, that a very small number of variants, somewhere in the vicinity of 3 or 4 for many lexical variables, account for about 90% of all responses, while the remaining 10% are made up of a dazzling array of lower-frequency items. It is important to keep in mind that these data were gathered in the de facto largely monolingual setting of the Eastern USA in the mid-20th century. It is only to be expected, however, that in present-day multilingual, socially and geographically mobile communities, such as the ethnically diverse neighbourhoods of European cities discussed earlier in this chapter, the variant response patterns would look different. As variation increases, the curve that best matches the responses in multilingual contexts would look flatter than in monolingual ones. In other words, as variation increases, it would take more than 3 or 4 types to make up 90% of all answer tokens, perhaps 6 or 8 and potentially many more. Figure 5.2 offers a hypothetical example (squares) and compares it with Kretzschmar’s attested A-curve from Chapter 2 (diamonds).

[Figure: occurrences (y-axis, 0–1200) by variant rank (x-axis, 1–18) for two series, Monolingual and Multilingual]

Figure 5.2  Variant responses in traditional monolingual vs. hypothetical multilingual, super-diverse settings

Both curves in Figure 5.2 are based on the same overall number of responses, but their “long tails” were cut at 4 occurrences in order to highlight the important differences on the left-hand side. In the multilingual scenario this results in a “flatter curve” that may, in some contexts, no longer be an asymptotic hyperbolic curve but perhaps merely a bell curve with a flatter y-axis dimension. Figure 5.2 suggests that the curve marked by squares has five major variants and fairly strong sixth and seventh most frequent types, rather than three major variants. Such scenarios, i.e. flatter A-curves, would be expected in mixed multilingual settings, where more speakers come from more diverse social and linguistic backgrounds. With this diversity, variation would increase and WQs would need to find ways to capture it. The proposal made here is to use more open response question types, which would go some way, but other issues need to be addressed as well, such as the identification of variables as such, which will be taken up in the next sections. What follows will necessarily need to be adapted to super-diverse settings, which offer a most promising perspective on language and language as practice (e.g. Rampton et al. 2015).
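The 90% criterion used above can be made operational by counting how many top-ranked variant types are needed to reach 90% of all answer tokens. The following sketch does this for two invented frequency lists, one shaped like a steep monolingual A-curve and one flatter, as hypothesized for multilingual settings; the numbers are neither Kretzschmar's data nor the values plotted in Figure 5.2, and the function name is hypothetical.

```python
def types_for_coverage(counts, threshold=0.9):
    """Smallest number of top-ranked variant types whose tokens reach
    `threshold` (a share between 0 and 1) of all tokens."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= threshold * total:
            return rank
    return len(ranked)


# Invented token counts per variant type (1,000 answers each).
monolingual = [620, 190, 100, 30, 20, 15, 10, 5, 5, 5]
multilingual = [250, 200, 150, 120, 90, 60, 50, 30, 20, 15, 10, 5]

print(types_for_coverage(monolingual))   # 3
print(types_for_coverage(multilingual))  # 7
```

A flatter distribution shows up directly as a larger number; in an actual study the counts would come from the tallied answers to one variable.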

5.3.2 Conceptualizing space in dialect geography

The notion of super-diversity puts a very profound problem of dialect geography into sharp relief: until very recently, the last decade or so, the concept of “space” has been left undefined in the field. It was all about finding the variants that were used in geographical space. If there was village A, the space of village A was left untheorized and the focus of dialect geographers, and many variationists alike, was to fill the “space” with the interesting linguistic variation inherent in that locale. Britain (2010a) offers a most informative account of the lack of theory about space and he suggests and identifies some interesting concepts about space that go beyond the obvious, and at first sight trivial, notion of space. Landmark discoveries in this context include pioneering work by Trudgill (1974a) on a social dialect geography, integrating the social with the geographical dimensions. This was followed, with some delay, by Britain (1991), Chambers & Trudgill (1998 [1980]), among others, and showed that both social and geographical expressions of space co-determine linguistic variation. By taking a closer look at geographical transition zones between geographical isoglosses of linguistic forms, interesting socially conditioned variation could be discovered. As a result of this and relating work, space is understood today in at least three dimensions (Britain 2002, 2010a): – Geographical space (Euclidean space): the distance between two points; this is the traditional variable in dialect geography – Social space: geographical space is overlaid with social relations. As Britain (2010a: 70) puts it: [o]ur settlement and manipulation of that space, our movement and interaction within it and the relationship between individual actors and the institutions of capital and the state which govern and shape our actions within space mean that it is socially produced.

– Perceived space: the imaginary creation of space, our beliefs about and perceptions of it.

Britain (2002: 604) calls these three levels of space taken together “spatiality”, which he describes as “a key human geographic dimension”. Spatiality is thus not static but dynamic, always in the making, and it is spatiality, not merely Euclidean space, that co-determines language use: geographical space is only one component of it. Any dialect geography today will need to consider spatiality in this way, that is, space in its social manifestation and in combination with other social factors. WQs in social dialectology, which can easily be distributed over wide geographical and social spaces, need to be especially sensitive to spatiality; this makes them not only excellent tools for WEs but also, as shown before, potentially prone to egregious errors. It is all too easy to treat space as a one-dimensional, apparently objective unit on a map, a pseudo “default” setting. The danger is that blanket mass-mailings of a given WQ will treat space only as a simple constant used to anchor responses. This has been the traditional dialect-geographical approach, and most findings discussed in this book rest largely on such a static notion of space. Apart from the currently developing sociolinguistics of globalization, whose lasting parameters it is too soon to judge, perceptual dialectology is a notable exception: through its map-based approaches and its rankings of correctness and pleasantness, it already offers insights into “perceived”, subjective space. Integrating all three perspectives on spatiality in large-scale WQs, most drastically perhaps in global contexts, raises theoretical and practical questions that have yet to be addressed. The basic question is how to integrate questions on the social uses of geographical space into existing WQs effectively, and new methodologies and question types will need to be developed for this purpose. Take, for instance, a super-diverse Viennese neighbourhood: what kinds of questions can reasonably be put to a socially volatile, multilingual and multicultural teenager from a migrant background without giving her the impression that she does not “match up” to some unspoken and undefined standard of the kind most often presented in public and media discourse?
It would take a highly knowledgeable and sensitive researcher to approach linguistically and culturally super-diverse individuals on an equal footing. A mass-emailed WQ would stand little chance of being answered and, if it were, would likely not include the right kinds of questions for that person’s social reality. With the breakdown of homogeneous communities, researchers need to rethink their elicitation strategies as much as their questions, both social and linguistic.

5.3.3 Select morphosyntactic features of World Englishes

To demonstrate some linguistic features that are, in principle, pollable with WQs, it seems a good idea to turn to tried-and-tested areas of World Englishes, something that cannot yet be said of super-diverse settings. Here the Outer Circle countries take a leading role, as a large body of linguistic studies is available today that may assist in variable identification. Indeed, so much material has been collected that the abstraction and generalization of linguistic features in Outer Circle Englishes is possible. This section offers an exemplary overview of features that result from language contact and that may be successfully polled with WQs. Although the examples are drawn mostly from the Outer Circle, the processes underlying them also apply to the Expanding Circle. Lexical and morphosyntactic variables are the most readily polled with WQs. Lexis will be dealt with in the ELF context, though the same processes would apply in the Outer Circle. The morphosyntactic phenomena amenable to WQs include the following features:

(5.1) Article use:
a. SingE I want to buy Ø bag.
b. ChinE I can play Ø piano.
c. ChinE Xiao Ying is a tallest girl in the class.
d. ChinE He was in a pain. (Mesthrie & Bhatt 2008: 47–52)
e. IndE I’m staying in one house with three other.
f. EAfrE I’m not on scholarship. (Jenkins 2009: 29)

Article deletion is common, but it is not limited to Outer Circle countries (5.1a); many Expanding Circle countries, depending on the substrate language, feature it too (5.1b). Example (5.1c) shows the interchangeability of the definite and the indefinite article in some cases, and (5.1d) shows article insertion. Examples (5.1e) and (5.1f) exemplify the specific/non-specific system of the Outer Circle, which appears to have replaced the definite/indefinite article system of the Inner Circle: (5.1e) refers to a specific house (marked by one), whereas (5.1f) does not refer to a specific scholarship, since no scholarship was received (for more detail see Mesthrie & Bhatt 2008: 47–48).

An area where more descriptive work is needed is the number system. Examples (5.2a, b) relate to the deletion of plural morphemes, (5.2c, d) to the regularization in Outer Circle Englishes of plural formations that are irregular in the Inner Circle. Example (5.2e) shows the treatment of mass nouns as count nouns, a process also found in Inner Circle Englishes, e.g. AmE accommodation – accommodations, water – waters (not ‘glasses of water’).

(5.2) Number:
a. LakE One of the worksheet.
b. SingE I know people who speak with those accent.
c. IndSAfE hoof – hoofs (not hooves), knife – knifes (not knives); IndSAfE child – childrens
d. Asian and African Englishes: theses (sg.) – theses (pl.)
e. Asian and African Englishes: furniture – furnitures, equipment – equipments (Mesthrie & Bhatt 2008: 52–53)

Grammatical gender displays interesting phenomena, despite its limited role in English compared with other languages (e.g. German, Hindi): in English it is restricted to the expression of natural gender in pronouns. Here, though, Outer Circle varieties show patterns of non-agreement not usually found in Inner Circle varieties (but also reported in the Expanding Circle):




(5.3) Gender:
a. MalE My mother, he live in kampong. (‘fenced-in village’; cf. EngEng compound)
b. EAfrE My husband who was in England, she was by then my fiancé. (Mesthrie & Bhatt 2008: 55)

These examples give an impression of some of the phenomena found in the noun phrase. More are described in Platt, Weber & Ho (1984: Chapter 4), in the more recent papers in Schneider et al. (2004) and in Mesthrie & Bhatt (2008). In the verb phrase, a number of tendencies exist as well. They include the loss of overt 3rd person singular marking (5.4a), variability in the past tense suffix (5.4b) and an overlap of present and future tenses (5.4c).

(5.4) Tense:
a. PhilE She drink milk. (Jenkins 2009: 29)
b. SingE Last time she come on Thursday. (Mesthrie & Bhatt 2008: 59)
c. CapeFlatsE I take it later. (= I’ll take it later) (Mesthrie & Bhatt 2008: 60)

While variation is found in the tense system, the aspect system displays even more variability, as shown in (5.5):

(5.5) Aspect:
a. MalE He already go home.
b. MalE You eat finish, go out and play. (= When you’ve finished eating, go out and play.)
c. IndE I have read this book last month.
d. GhaE It has been established many years ago. (Mesthrie & Bhatt 2008: 62–63)

Markers such as already and finish(ed) often signal the completion of an action instead of the perfective aspect, as in (5.5a, b), which British Standard English would express as He’s already gone home and When you’ve finished eating. Examples (5.5c, d) are either unstable forms or, if stable, may signal the transfer of aspect marking from the tense system to other domains, such as adverbials (already, finish). Mesthrie and Bhatt (2008: 63), in contrast, consider examples such as (5.5c) aspectual innovations for extended temporal contexts, possibly along the lines of ‘I read the book and I know its content’. Such generalization of the auxiliary have is also common in Expanding Circle Englishes, which makes it important to assess the stability of the feature in order to rule out a possible learner effect. The distinction between stative and non-stative verbs, and their different grammatical treatment, is also often lost. In Standard English, verbs denoting actions (play, sing, swim) occur in the progressive, while verbs denoting states (know, smell, have) do not. In Outer Circle (and Expanding Circle) Englishes, the progressive is generalized across all verb classes, as shown in (5.6).


(5.6) Stative verbs:
a. BlSAfE I am having a cold.
b. NigE I am smelling something.
c. MalE She is owning two luxury apartments. (Mesthrie & Bhatt 2008: 67)
d. EAfrE She is knowing her science very well.
e. IndE Mohan is having two houses. (Jenkins 2009: 30)

Inner Circle varieties have come to accept some such forms in idiomatic expressions, e.g. with love in I’m loving every minute of it and, in advertising, I’m loving it, where formerly I love it would have been required. The basic constraints, however, remain in place, and examples such as those in (5.6) would not be possible. These examples suffice to illustrate grammatical features that may be used in WQ polling.

One of the major concerns of the field of WEs has been the legitimization of Outer Circle varieties alongside Inner Circle varieties. Where a variety of English results from historical colonial ties with England or the US, the nativized English of the Outer Circle is now a legitimate object of study: Indian English, Pakistani English, Filipino English. The same processes that have shaped the Outer Circle Englishes (language contact and substrate influence, borrowing and so forth) are also shaping another contact variety: English as a Lingua Franca, or the English varieties of the Expanding Circle.

5.4 English as a Lingua Franca (ELF)

It was mentioned above that the study of Global Englishes was established in the 1980s, when its focus was on the Outer Circle, which has come to be associated with the term “World Englishes”. The scholarly debate on whether Outer Circle Englishes should serve as teaching models (Kachru’s view) or whether the traditional varieties of the former colonizer should (Quirk’s view) is referred to here as the “Quirk-Kachru debate”, though other scholars weighed in as well (see e.g. Jenkins 2009: 66–70). In the late 1990s a new perspective enriched the field: the use of English as a global lingua franca, a contact variety that serves the communicative needs of speakers who do not share another common language.

5.4.1 Concepts

The use of English as a lingua franca for communication among non-native speakers can be considered a game-changing moment in the evolution of the English language and, given its global spread, in the history of language. English as a Lingua Franca, or ELF (usually pronounced like the fairy-tale being), can be defined as “communication predominantly among NNs [non-native speakers] rather than between NSs [native speakers] and NNS [non-native speakers]” (Jenkins 2007: 3). Seidlhofer offers the following definition of ELF:

any use of English among speakers of different first languages for whom English is the communicative medium of choice, and often the only option. (Seidlhofer 2011, Kindle edition, italics in original)

English occupies a position unlike that of any other language because of its widespread use today. While other languages typically have more native than non-native speakers, the reverse is true of English: it is spoken by more non-native (L2) speakers than native (L1) speakers. And the differential is not small. According to the best estimates, today for every five L2 speakers of English there is merely one L1 speaker (Crystal, answering a plenary question at the OED Symposium, 1 Aug 2013). By comparison, in 2003 the ratio was 3 : 1 (Crystal 2003). Given the present high rates of adoption of English, by the time this book appears in print the ratio might be closer to 6 : 1. The repercussions of this change in the social applications of English have increasingly been documented and explored. As English is used in novel contexts by multilingual speakers, it begins to develop in new directions. English is employed in addition to existing languages and in communicative settings that may have little to do with the traditional contexts in Inner Circle countries. English, therefore, differs crucially from other foreign languages such as Spanish, Russian, Japanese, and so on, which continue to be learnt predominantly for communication with L1 speakers, usually in the L1 country. (Jenkins 2000: 6)

With people studying English to communicate with non-native speakers, questions of ownership and teaching standards arise. The view that native speakers “own the language” is still widely held today. Applied linguists, however, have questioned its underlying assumptions in the light of the new speaker dynamics of English. The question of ownership has been explored most clearly by Widdowson (1994), who, after carefully examining the usual claims to ownership by native speakers, states:

How English develops in the world is no business whatever of native speakers in England, the United States or anywhere else. They have no say in the matter, no right to intervene or pass judgement. They are irrelevant. The very fact that English is an international language means that no nation can have custody over it. To grant such custody of the language, is necessarily to arrest its development and so undermine its international status. (Widdowson 1994: 385)

This often-quoted passage has stirred an important discussion in the field. David Crystal places the question of ownership in the perspective of the L1 speaker and emphasizes the importance of demographics:


The loss of ownership is of course uncomfortable to those, especially in Britain, who feel that the language is theirs by historical right; but they have no alternative. There is no way in which any kind of regional social movement, such as the purist societies which try to prevent language change or restore a past period of imagined linguistic excellence, can influence the global outcome. In the end, it comes down to population growth. (Crystal 2003: 141)

This population growth clearly favours the Outer and Expanding Circles. Considerable work has been invested in describing this international form of English, English as a Lingua Franca (e.g. Jenkins 2000; Seidlhofer 2007, 2011), and the rationale offered is clear: given that English as a Lingua Franca is, in contrast to other languages, used more frequently by non-native than by native speakers, it follows that descriptions of ELF use should be offered as teaching models to those who wish to use it. Graddol (2007: 108) predicts an increase in Lingua Franca English over the next generation or so, up to the point at which knowledge of ELF no longer offers a competitive advantage, because the “market” has become saturated with this particular competence, which has by then become a basic requirement. From that point on, ELF use is expected to decrease to a more sustainable level. While the future is always difficult to predict, Graddol’s basic idea seems to have merit, and the prediction that the demand for ELF will increase until a point of saturation is reached seems reasonable. A number of important theoretical issues result from the introduction of ELF, which has been slowly gaining acceptance in expert circles, though not yet unanimously. As Barbara Seidlhofer observes of the state of research into the variety:

This refusal [of the linguistic community by and large] to take ELF and ELF speakers seriously is all the more perverse since it flies in the face of everything that sociolinguists have held dear all along: interest in the intricate relationship between linguistic variation, contexts of use and expressions of identity, insistence on the intrinsic variability of all language, and the natural virtues of linguistic diversity. (Seidlhofer 2011, Kindle edition)

Seidlhofer then points to the fact that varieties in a WE paradigm, i.e. the Outer Circle varieties, are now “assigned legitimacy”. This is not yet the case for ELF varieties, which are frequently met with utterly different attitudes, even in some linguistic circles and, quite typically, among some ELF speakers themselves.

5.4.2 Polling language teacher attitudes towards ELF

Since the late 1950s a veritable research tradition probing speaker attitudes towards particular varieties has developed. First pioneered by Wallace Lambert and associates (Lambert et al. 1960) in what came to be known as the “matched-guise” technique, it has since produced many speaker evaluation methods. One study serves to illustrate the profound relevance of attitude studies. Jennifer Jenkins (2007) used WQs to probe the attitudes of ELF speakers towards various “accented Englishes”. Building on attitude studies (see Giles & Billings 2004 for an overview) and on perceptual dialectology (e.g. Preston 1989; Preston & Long 1999–2002), whose goal is to investigate the “cognitive states that govern the comments that people make” about language (Preston 2006: 115), Jenkins uses WQs to probe attitudes towards various EFL accents. The connection between beliefs about language and attitudes towards language is important for all languages, but especially for emerging varieties such as ELF. Wolff’s (1959) classic study of two communities in the Niger Delta, whose languages were linguistically so similar that one could consider them dialects of the same language, is a case in point for the relevance of language attitudes. The research showed that the power and status of each group had a dramatic impact on reported inter-group intelligibility: while the socially less powerful group claimed to understand the language of the other group, members of the more powerful group claimed to find the speech of their neighbours unintelligible. In the English context, attitudes towards varieties and accents would clearly be expected to influence intelligibility ratings, which is why it is essential to know people’s attitudes towards ELF. WQs in map form (see Chapter 7) have played a vital role in the exploration of linguistic attitudes and perceptions since Preston (1989).
Jenkins, placing her work expressly in line with folk and perceptual linguistics, adopts a rating scale for perceptions of the accents of 10 national varieties of English, including ELF varieties, in four domains: “correctness”, “acceptability for international communication”, “pleasantness” and the respondent’s familiarity with the accent in question. For each domain, respondents could choose among six categories, from “very correct/acceptable/pleasant/familiar” to “very incorrect, unacceptable, unpleasant, unfamiliar” (see Jenkins 2007: 190–3 for her questionnaire). Another question asked respondents to rank the “five English accents that you think are the best”, choosing from any accents, not just the 10 listed in another question. In addition to the scalar ratings on these four dimensions, a qualitative element is included by prompting respondents to write down for each accent

a word or phrase that represents for you the English accent of each numbered country on the map. You can refer to any aspect of the accent, such as its speed, its quality of tone (e.g. ‘harsh’, ‘melodious’), its pitch, its rhythm (e.g. ‘like a machine gun’), its precision, its strength, how easy it is to understand etc. etc. There is no correct answer. Please say what you think – I am interested in your views. (Jenkins 2007: 190)
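For readers planning a comparable rating task, the following is a minimal Python sketch of how six-point answers in Jenkins-style domains might be coded and averaged. It is not Jenkins’s procedure: the numeric coding (1 for the “very correct/acceptable/pleasant/familiar” end, 6 for the opposite end), the function name mean_ratings and the three respondents are all hypothetical, and the intermediate category labels are not reproduced because they are not given here.

```python
from statistics import mean

# Hypothetical coding: 1 = "very correct/acceptable/pleasant/familiar",
# 6 = "very incorrect/unacceptable/unpleasant/unfamiliar".
SCALE_MIN, SCALE_MAX = 1, 6
DOMAINS = ("correctness", "acceptability", "pleasantness", "familiarity")

def mean_ratings(responses):
    """Return {(accent, domain): mean rating} over the respondents who rated that cell.

    `responses` is a list of dicts mapping (accent, domain) to an integer code;
    blank answers are simply absent, so each mean uses only the answers given.
    """
    cells = {}
    for response in responses:
        for (accent, domain), code in response.items():
            if domain not in DOMAINS:
                raise ValueError(f"unknown domain: {domain}")
            if not SCALE_MIN <= code <= SCALE_MAX:
                raise ValueError(f"code out of range: {code}")
            cells.setdefault((accent, domain), []).append(code)
    return {cell: mean(codes) for cell, codes in cells.items()}

# Three invented respondents, for illustration only (not Jenkins's data).
sample = [
    {("BrE", "correctness"): 1, ("IndE", "correctness"): 4},
    {("BrE", "correctness"): 2, ("IndE", "correctness"): 3},
    {("BrE", "correctness"): 1},  # this respondent left IndE blank
]
result = mean_ratings(sample)
print(round(result[("BrE", "correctness")], 2))  # 1.33
print(result[("IndE", "correctness")])  # 3.5
```

Leaving blank answers out of each mean, rather than coding them as a midpoint, reflects the possibility that respondents skip accents they do not know; whether that is appropriate depends on the study design.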


By making it clear that each personal opinion is appreciated, and by giving ample space in more than one place for elaboration, Jenkins combines a quantitative with a qualitative element. More than half of her respondents made use of the offer, sometimes with elaborate comments. In addition, she offered her email address to “discuss any issues relating to the questions”. The overall goal was to elicit the beliefs of non-native-speaker teachers of English, i.e. English Language Teaching (ELT) professionals, in ten different settings across the globe. Non-native ELT professionals have themselves spent many years mastering the target language and then undergone rigorous training to become teachers, so their opinions on accents of English, including ELF accents, can serve as a bellwether for the acceptance of the nascent variety. WQs have an advantage over face-to-face interviews here, in that there is less danger of obtaining socially screened answers. As Jenkins writes, the aim in conducting the questionnaire study was

to find out in what ways and to what extent the kinds of [negative] beliefs and attitudes that typically emerge in written and spoken discussions of ELF […] would be replicated when teachers were given the opportunity to voice their thoughts privately and (if they so choose) completely anonymously. (Jenkins 2007: 147)

In total, 326 responses were received. A representative result is shown in Figure 5.3, which presents the first-ranked “best” English accents according to non-native English teachers: the gap between BrE (167 mentions) and AmE (100 mentions), on the one hand, and all other varieties, on the other, is striking, with AusE and CanE receiving only five mentions each. The overall result of this attitude study was clear:

Even when asked with the added bonus of (possible) anonymity, non-native teachers of English in 12 countries reveal strongly held positions about the correctness, pleasantness, and international acceptability of English accents (sometimes on the basis of limited familiarity), and firm linguistic beliefs about the locus of the ‘best’ English accents (i.e. the US and UK). (Jenkins 2007: 186)

These attitudes stand in stark contrast to the wide range of situations in which ELF is actually used. Citing earlier research, Jenkins suggests that language attitudes may change quickly in the light of new social settings, such as ELF’s continuing dominance in the speaker pool of English. In particular, Trudgill and Giles’ (1978) Social Connotations Hypothesis is invoked to explain these results, which are disappointing from an ELF perspective (Jenkins 2007: 187). According to the hypothesis, reactions to accents are driven not by the accents’ intrinsic features (intelligibility, versatility) but by the social connotations they evoke of the groups associated with them. The present entrenchment of results such as those in Figure 5.3 is indeed striking. Drawing on Trudgill and Giles (1978: 175), Jenkins argues that unbiased responses from the ELT teachers would be very unlikely, since they have been exposed to native target models throughout their teaching careers and especially in their teacher training programmes. In these programmes, the native-speaker norm is typically the predominant target (see, e.g., Spichtinger 2000 vs. Hüttner & Kidd 2000 for the continental European context). It seems, then, that the target model in teaching programmes may be the biggest obstacle to the legitimization of ELF.

Figure 5.3  First-ranked varieties for “best” English accents among L2 ELT teachers (Jenkins 2007: 157). Bar chart; y-axis: number of first-place rankings (n, 0–180); x-axis: BrE, AmE, AusE, CanE, IrE, DutchE, FrE, IndE, JapE, SwedE.

Group identity, mutual intelligibility and ELF

Jenkins finds reason for optimism in some individual response patterns: three L2 teachers ranked their own accents as the “best” accents, three others ranked their own as second-best and seven as third-best. In total, 37 respondents ranked their own EFL accents among the five best, a fact that “cannot be overlooked”, especially as 15 East Asian speakers are among them. Jenkins reasons that:

It is possible that this is the start of a trend, and that in the next few years, increasing numbers of expanding circle speakers, following in the path of outer circle groups, will resist the pressure to ‘aspire’ to NS [native-speaker] English accents. (Jenkins 2007: 161)
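The two counts discussed here, first-place rankings (as in Figure 5.3) and the positions at which respondents place their own accent, can both be derived from the same ranked lists. The Python sketch below assumes a hypothetical data layout of one (own accent, ranking) pair per respondent; the function names and the four respondents are invented for illustration and do not reproduce Jenkins’s data.

```python
from collections import Counter

def first_ranked_counts(rankings):
    """Count how often each accent is ranked first (the kind of tally shown in Figure 5.3)."""
    return Counter(ranking[0] for ranking in rankings if ranking)

def own_accent_positions(respondents):
    """Count the rank (1-5) at which respondents place their own accent.

    Respondents who do not list their own accent among their top five are skipped.
    """
    positions = Counter()
    for own_accent, ranking in respondents:
        top_five = ranking[:5]
        if own_accent in top_five:
            positions[top_five.index(own_accent) + 1] += 1
    return positions

# Invented respondents: (accent associated with the respondent, ranked "best" accents).
respondents = [
    ("JapE", ["BrE", "AmE", "JapE"]),
    ("SwedE", ["AmE", "BrE", "SwedE", "IrE", "CanE"]),
    ("IndE", ["IndE", "BrE", "AmE"]),
    ("FrE", ["BrE", "AmE", "AusE", "CanE", "IrE"]),
]
firsts = first_ranked_counts(ranking for _, ranking in respondents)
print(firsts.most_common(1))  # [('BrE', 2)]
print(sorted(own_accent_positions(respondents).items()))  # [(1, 1), (3, 2)]
```

Summing the values of the second counter gives the number of respondents who placed their own accent anywhere in their top five, which corresponds to the kind of total (37) that Jenkins reports.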

Such an amelioration of attitudes would be expected as ELF communication is discussed more widely and the issue of group identity is taken into account. One might ask, for example, why a Taiwanese engineer who regularly communicates in English with Japanese engineers would not want to show her Taiwanese background when using English. As Dörnyei et al. (2006: 110) point out for the Hungarian context, English represents the language for the “world at large” rather than for a narrowly defined target community from the Inner Circle. Similar processes will be in operation in other parts of the Expanding Circle, and it would be very odd if identity formations such as those outlined in Schneider’s (2007) model for the Inner and Outer Circles did not become operative (see Section 6.6.2). Such identity-supporting linguistic features may work against the mutual intelligibility of WEs and, as Crystal (2003: 22) observes, “people tend to underestimate the role of identity […]. Language is a major means of showing where we belong, and of distinguishing one social group from another”. If this is the case, ELF, or rather the various varieties of ELF, are bound to prosper.

5.4.3 Linguistic error or innovation? The case for WQs

The adoption of English by more non-native than native speakers has important social, linguistic and cultural implications. The legitimization of linguistic features that differ from standard varieties is a key issue affecting the Outer Circle and especially the ELF varieties today. Legitimization has, however, also been an issue for Inner Circle varieties. The Inner Circle variety with historically perhaps the least claim to gentility is Australian English (AusE). Originating with the first convicts, who landed in 1788, AusE, like other colonial varieties, had little claim to social refinement and prestige until recently. Before the 1970s, AusE was not considered a prestige variety even by Australians, who might have been expected to have a vested interest in improving its status. The first favourable comments on AusE are found only in the 1950s, which reflects the low opinion Australians had of their native variety. These attitudes have changed radically in the past four decades, with AusE being used as a teaching model in some parts of the Pacific region by the early 1990s (Leitner 1992: 208). If it took an Inner Circle variety like AusE more than a century to gain appreciation among its own speakers, how difficult must it be for Outer Circle varieties, such as Nigerian English or Malaysian English, or Expanding Circle varieties, such as Euro English (Jenkins 2015: A7) and Chinese English, to acquire a social status that allows them to serve anything but the less prestigious functions? One important task for the study of WEs is to assess the varieties in their own right, both as codes of communication and as codes expressing local (or regional or national) identities.
Differences between nascent varieties and standard varieties have been assessed since Kachru (1983: 2), who used the term “deviations” in an effort to distinguish them from “mistakes” or “errors”. Since deviation may carry negative connotations today, we will use the terms “feature” (Mesthrie & Bhatt 2008: 46) or “innovation” (Jenkins 2009: 266). Bamgbose (1998) introduces a framework for distinguishing linguistic innovations from what may traditionally have been called learner “errors”. It is important to keep in mind that the term “error” is never to be understood in relation to Inner Circle standard varieties, but only and exclusively in relation to the intrinsic norms of the given variety, i.e. from an endonormative rather than an exonormative perspective.




Endonormative refers to the use of variety-intrinsic norms, exonormative to the application of external norms to a variety, such as using British English norms of grammar to assess correctness in, e.g., Fiji English. To assess a given linguistic feature, one needs to ascertain whether it is typical or atypical in a given context. Bamgbose (1998) describes L2 Englishes as caught between two norms, an endonormative and an exonormative one:

Innovations in non-native Englishes are often judged not for what they are or their functions within the varieties in which they occur, but rather according to how they stand in relation to the norms of native Englishes. To this extent, it is no exaggeration to say that these innovations are torn between two sets of norms [, the exonormative and the endonormative norms, SD]. (Bamgbose 1998: 1)

In a study of WEs, the endonormative perspective is self-evidently the only one appropriate to a descriptive approach to language and language change. Bamgbose (1998) distinguishes between feature norms, i.e. the linguistic features that are common in a region, and behavioural norms, i.e. the pragmatic rules in a given setting. He stresses the importance of the drastically different behavioural norms of Outer Circle Englishes. Examples from Nigerian English, such as saying sorry! after someone sneezes, go-slow as the lexical item for ‘traffic jam’ or the expression not on seat for someone who is not in the office, testify to the different behavioural norms and pragmatic uses to which English has been put in Nigeria. To show that such features are innovations rather than errors, he proposes five criteria that may be applied to any form of English:
– Demographic factor: How many people use the innovation, and from which social background (basilectal, mesolectal or acrolectal)?
– Geographical factor: How widely dispersed is it?
– Authoritative factor: Who uses it (actual use of the feature by writers, teachers, the media and the like)?
– Codification factor: Where is its usage sanctioned (which dictionary, which grammar)?
– Acceptability factor: What is the attitude of users and non-users to it? (Bamgbose 1998: 3–5)

Bamgbose’s framework offers a way of establishing the status of each variable within the variety. Since four of these five factors require empirical work, WQs can deliver answers for all four of these “internal measures of innovation”, to use Bamgbose’s term, allowing each feature to be assessed independently of other varieties of English. The demographic and geographical factors can be established with a survey of particular features. The acceptability factor requires a kind of language attitude questionnaire that has respondents rate various constructions for acceptability.
For these three kinds of data it is, in fact, difficult to imagine any data collection method other than a WQ. Even the authoritative factor is best studied with a WQ distributed among language professionals such as writers, teachers and reporters. Only codification can be studied without WQs, simply by combing the reference works available on the market.
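As a sketch of how WQ answers might feed three of Bamgbose’s factors, the following Python code computes, for a single feature, the share of respondents reporting use by region (geographical factor) and by social background (demographic factor), and mean acceptability ratings among users and non-users (acceptability factor). The Response record, its fields, the 1–5 rating scale and the sample answers are hypothetical assumptions, not part of Bamgbose’s framework.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

@dataclass
class Response:
    """One hypothetical WQ answer about a single feature (e.g. go-slow 'traffic jam')."""
    region: str
    background: str        # e.g. "acrolectal", "mesolectal", "basilectal"
    uses_feature: bool     # self-reported use
    acceptability: int     # hypothetical scale: 1 = unacceptable ... 5 = fully acceptable

def usage_share(responses, key):
    """Share of respondents in each group (as returned by `key`) who report using the feature."""
    totals, users = Counter(), Counter()
    for r in responses:
        group = key(r)
        totals[group] += 1
        users[group] += int(r.uses_feature)
    return {group: users[group] / totals[group] for group in totals}

def acceptability_by_use(responses):
    """Mean acceptability rating among users and among non-users of the feature."""
    ratings = {"users": [], "non-users": []}
    for r in responses:
        ratings["users" if r.uses_feature else "non-users"].append(r.acceptability)
    return {group: mean(values) for group, values in ratings.items() if values}

# Invented answers for illustration only.
sample = [
    Response("Region A", "acrolectal", True, 5),
    Response("Region A", "mesolectal", True, 4),
    Response("Region A", "acrolectal", False, 3),
    Response("Region B", "mesolectal", True, 4),
    Response("Region B", "basilectal", False, 2),
]
print(usage_share(sample, key=lambda r: r.region))
print(usage_share(sample, key=lambda r: r.background))
# {'acrolectal': 0.5, 'mesolectal': 1.0, 'basilectal': 0.0}
print(acceptability_by_use(sample))
```

The authoritative factor could be approached in the same way by restricting the sample to language professionals; codification, as noted above, needs no questionnaire.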

5.4.4 Discovering variables and variants
WQs have a lot to contribute to the study of ELF and Global Englishes. For ELF the same kinds of linguistic processes that have been found in Outer Circle Englishes may be operational, yet there are specific features that may be limited to ELF, as ELF contexts may afford special opportunities to create new uses and applications of English. Two areas of innovation that have been identified in WEs in general are lexis and pragmatics. ELF is a particularly suitable context to illustrate some principles of linguistic change as “ELF users can be observed – usually quite unselfconsciously – pushing the frontiers of Standard English when the occasion, or the need, arises” (Seidlhofer 2011: Section 5.2 Kindle Edition).

Lexical innovation
Typical lexical phenomena in contact scenarios include loan transfers from L1s or semantic shifts of English words, which include changes in word classes. Pitzl, Breiteneder and Klimpfinger (2008) analyze a subsection of the Vienna-Oxford International Corpus of English (VOICE) for lexical innovations and show that lexical innovations in ELF follow the same patterns as in Inner Circle English, albeit to different degrees. They arrive at the summary of features shown in Table 5.4:

Table 5.4  Lexical innovation in ELF (Table 2, Pitzl, Breiteneder & Klimpfinger 2008: 30)

Category               Number of types (double categorization)
Suffixation            85 (10)
Prefixation            65 (2)
Multiple affixation    19 (4)
Borrowing              13 (2)
Analogy                24 (4)
Reanalysis              7 (2)
Backformation           4 (3)
Blends                  6 (2)
Addition               10 (5)
Reduction              19 (4)
Compounding             5 (1)
Truncations             3 (1)

Affixation is by far the most productive category in this sample, with 85 and 65 types of suffixation and prefixation respectively, complemented by 19 cases of multiple affixation. Examples include claustrophobicy, conformal, contentwise, cosmopolitanism, forbiddenness, imaginate, increasement, preferntly, publishist, turishhood, workal (Pitzl, Breiteneder & Klimpfinger 2008: 31). Some of these creations fill lexical gaps, such as forbiddenness, while others merely reinforce the existing word class formally, e.g. increasement.



Chapter 5.  World Englishes, multilingualism and written questionnaires 165

Example (5.7) illustrates a lexical innovation that seems to serve the principle of economy: the novel term pre-thesis, used in a working group discussion on joint degree programmes:

(5.7) VOICE: POwgd14; S1 = Swedish
971 S1: developed . er in each case (.) no? hh (.) but i think er: (.) if you talk about er interdisciplinary er er joint er programs that SOME part (.) er that wou-could be very interesting wo-would be hh (.) very interesting if it was er:m er developed as new . as a sort of an intersection of of er (.) the idea what you can contribute from different sides and make some part perhaps it’s (.) the most er sort of specialized (.) pre-thesis (.) a course so to say that could be more integrated and new. (2) i you understand what i mean (.) no? er er oh i think yeah. er (.)
972 SX: mhm

The authors argue that pre-thesis, used by a Swedish L1 speaker, is a particularly apt innovation and explain its possible motivation:
Instead of elaborating on the concept of a compulsory paper that has to be written in a certain course preceding the actual thesis – a rather complicated matter even if it is put down in writing – the speaker expresses the concept in a more economical way via coining the word pre-thesis. (Pitzl, Breiteneder & Klimpfinger 2008: 33)

Pre-thesis as a term, describing something similar to a qualifying paper, might have a chance of being used more widely. Borrowings would seem to be prime contenders for lexical enrichment in ELF contexts, due to the multilingual substrates at the speakers’ disposal. This potential is, of course, limited, as ELF speakers do not necessarily share other languages, but it is still present. With only 13 borrowings, however, this process is not as frequent as one might think. An example from VOICE is given in (5.8), from a conversation between a Dutch and a Danish L1 speaker.

(5.8) VOICE: LEcon227; S1 = Dutch (BE), S2 = Danish
65 S1: seven prime ministers =
66 S2: = but (.) how much power do they have? as (.)
67 S1: quite a lot =
68 S2: = they must have different (.) tasks
69 S1: yeah (.) but quite a lot actually it’s very much (.) so (.) erm (.) er de- a decreet{decree} (.) has the same power as a law (.)
70 S2: yeah (.)


The example shows how difficult it can be to classify lexical borrowings. The authors consider two options: that “decreet might suggest that the speaker omitted a consonant while probably aiming at discreet” or that “the speaker, […] means to say decree and borrows from his first language Dutch, where the English word decree translates as decreet in Dutch” (ibid: 37). They opt for the second explanation, a borrowing from Dutch into the speaker’s ELF. A third, and in the data more prevalent, process is analogy, which yields forms such as thinked, catched, drived, feeled, losed, putted […]

The idiom head and tails is not, as Pitzl comments, the L1 idiom can’t make heads or tails of it ‘do not understand’ but instead a “newly created metaphor using terms of embodiment” (2009: 311) that is intuitively analyzable: the draft document under discussion is not in need of an entire make-over. WQs have a role in testing such new, ad-hoc metaphors for their versatility of interpretation, their domains of use and their stability. By creating idioms that are, so to speak, tailor-made for the ELF context, ELF speakers give full weight to the cooperative principle of not throwing off one’s interlocutor with idioms from different (i.e. L1) contexts. Variability in ELF contexts, however, has been found to be highly situation-dependent:
This situationally motivated linguistic creativity leads to the linguistic forms of ELF being locally (re)coined and (re)adapted in a sequence of individual speech events but presumably never becoming as stable as the form of a nativized variety of speech.  (Pitzl 2012: 39)

This does not mean, however, that ELF has no norms; far from it. Endonormativity has been observed not at the level of “one homogeneous ELF variety, but at the level of different Communities of Practice”, where among groups that collaborate over longer periods of time “some unconventional linguistic practices may become conventional” (ibid). WQs might play a vital role in establishing “[h]ow far stabilization [of creative structures] […] in different groups of ELF speakers in particular domains of use and constellations of first language backgrounds” actually occurs (Seidlhofer 2009: 211).

Some principles for variable detection
While the choice of variables is interwoven with the specific social situations, some guidelines for the identification of linguistic variables may be offered. There are a number of processes that are expected to occur in ELF settings, as well as some general principles of linguistic change, that may be exploited in the study of WEs. These principles may guide the researcher in identifying possible variants of existing variables, or new variables as such (where no source is given, examples were gathered through informal observation).




Principle 1 – Analogical Patterns: look for forms that result from analogical patterns. These include verb forms, e.g. teached (Expanding Circle Englishes, in Austria, for taught) or prepone (IndE, in analogy to postpone, exploiting a semantic gap in Standard BrE, Widdowson 1994: 383–384), and prepositions and particles, e.g. discuss about (West AfrE, in analogy to talk about, write about, etc., Bamgbose 1998: 5) or, in L2 CanE, Please return newspaper back (instead of Please return newspaper, in analogy to turn back, give back etc.). The opposite case is also found, i.e. the loss of prepositions, e.g. CanE to protest something (vs. to protest against something).

Principle 2 – Principle of Economy: look for shortenings, replacements and abbreviations of all types, e.g. CanE Mountie for police officer (or RCMP officer or even Royal Canadian Mounted Police officer), AusE brekkie for breakfast (for hypocoristics in AusE, see Simpson 2004) and CanE EI for employment insurance (the euphemism that replaced earlier CanE UI for unemployment insurance). This includes initialisms, such as AmE JT for Justin Timberlake or AmE JLo for Jennifer Lopez, and clippings, such as (orig.) AAE bro and sis for brother and sister respectively, with all their various semantic shadings, specializations and generalizations in meaning.

Principle 3 – Transparency: trace items that have become opaque as a result of language change or are no longer used in a systematic way, as they may be reinterpreted and become potential linguistic variables. The adjective inflammable, for instance, was at some point in the 20th century interpreted in the opposite way from what was intended. It originally meant ‘may catch fire’ (paralleling German ent-, e.g. entflammbar ‘may catch fire’, entzündlich ‘may be enflamed’ etc.) but became reinterpreted as ‘cannot catch fire’, because its prefix in- was taken to be the negative in- ‘not’ of incorrect, comparable to un- in unfriendly. Etymologically this is not the case, but the form was not transparent to some speakers. Therefore, flammable ‘may catch fire’ was created to avoid potentially deadly confusion. Moreover, it seems that basic phrasal verbs such as go down, go up, go with, while not always transparent, are preferred in ESL and EFL contexts over Latinate, French and Greek word roots where semantic doublets exist, e.g. decrease, increase, join. It seems that the versatility of constructions built on one verb stem, e.g. go, plus particles has an advantage over less transparent vocabulary. Combinations with adjectives, as in go hungry instead of starve or go mad instead of become insane, extend the versatility of the basic verb go even further.

Principle 4 – Language Contact Phenomena: look for phenomena that are a direct result of language contact. This could start with loanwords for concepts that do not exist in the Anglophone world, e.g. AutE zivildienst ‘social service in lieu of compulsory military service’ or GerE BAföG ‘type of federal student grant’. Language contact phenomena include syntactic, morphological, phonetic and phonological features, ranging from final consonant devoicing in AutE busses [bʌsɪs] instead of [bʌsɪz], to the replacement of sounds with others, such as interdental fricatives [ð] with stops [d] in


words such as that, those in Euro English, and beyond to suprasegmental sound features, such as the “tonal” quality of Asian Englishes (see, e.g. Lim 2011).

Principle 5 – Substrate Influence: a special instantiation of Principle 4 is substrate influence, which here refers especially to idioms and pragmatic conventions, one important channel through which vernacular languages influence Outer and Expanding Circle Englishes (and, of course, Inner Circle Englishes as well). Examples range from opening salutations, e.g. the use of so what as a greeting in Sri Lankan English instead of hi or hello (Jenkins 2009: 32), to direct translations of proverbs and sayings from one language into the other, e.g. that’s half the rent < AutG das ist die halbe Miete, a metaphor indicating that a large part of a task has been completed. Substrate influence also affects lexis, via loanwords, calques and loan translations: items such as Kenyan/Ugandan E matatu ‘collective taxi’ (Schmied 1991: 76–7), Turkish dolmush for a related and very similar form of transportation, or GhaE dodo ‘fried plantain’ (< Hausa, Bamgbose 1998: 6) have found their way into the Englishes in their given locations via other vernaculars. While substrate influence occurs on all linguistic levels, Bamgbose reasons that “lexical and semantic innovations are easier to accept and even inevitable” (1998: 6), which is a good reason to recommend lexical and semantic variables, all readily accessible with WQs, for work on understudied varieties.

Principle 6 – Internationalisms: another special instantiation of Principle 4 concerns internationally (and increasingly globally) shared vocabulary. A considerable part of the vocabulary of the languages in a given region, e.g. the western European languages, goes back to shared roots. A substantial part of the vocabulary cuts across language boundaries and language families, and it is only natural to expect such “vocabulary sharing” within other regions as well. These terms are called “Internationalisms” and comprise terms that “work” in more than one language for obvious reasons of international transparency, e.g. hospital is internationally better understood than infirmary or German Spital, and arguably much more transparent to non-German speakers than German Krankenhaus ‘sick people’s house’, a common synonym for Spital (for Germanic languages such as English, the contributions in Hufeisen and Marx (2007) offer an interesting point of departure). A subgroup of internationalisms revolves around abbreviations in international institutions. In the European Union context, for instance, the question arises whether one adheres to the vernacular or to English names and their abbreviations. The European Central Bank is abbreviated in English as ECB, while its official German equivalent is Europäische Zentralbank, abbreviated EZB (these are calques, but in this case from German and French into English, given the history of the European Union). Its French name is Banque centrale européenne (BCE) and its Greek name Ευρωπαϊκή Κεντρική




Τράπεζα (Ε. Κ. Τ.). At times, the uses are mixed in texts: Standard AutG long forms, for instance, are increasingly paired with English initialisms, as in the quality newspaper Der Standard: “Der permanente Euro-Rettungsschirm (ESM) startet 2012 […]” (‘The permanent euro rescue fund (ESM) starts in 2012 […]’). It is striking that no German abbreviation is used, but rather the English-inspired ESM for European Stability Mechanism. Increasingly, it seems as if we are dealing with hybrid usage that cuts across language boundaries: vernacular names are combined with English abbreviations and initialisms, which renders them relevant in a World Englishes context.

These principles present some ideas that assist in the discovery of new variables. They are by no means exhaustive and may not apply in all cases, but they may provide a springboard for identifying new variables arising from language contact.

5.5 Addendum: Global Englishes and expert WQs
A very different form of WQ has been put to use for the study of Inner and Outer Circle varieties, including pidgin and creole varieties. Most prominently, the Handbook of Varieties of English used expert questionnaires to establish benchmarks for phonological/phonetic and morphological and syntactic phenomena in World Englishes (Schneider et al. 2004; Kortmann et al. 2004). The Handbook is the most complete comparison of features of World Englishes and an indispensable reference tool. A spin-off project, the electronic World Atlas of Varieties of English (eWAVE, Kortmann & Lunkenheimer 2011), focuses on morphosyntactic features. Their methodology differs from speaker-based WQs in that the questionnaires are addressed not to lay speakers of the variety but to expert linguists, who offer their assessments of a set number of features in 60 locations world-wide. The focus of these questionnaires is on the comparability of data across many varieties. eWAVE features 74 varieties and charts 235 morphosyntactic features in four categories, as assessed by linguist experts:

A – feature occurs frequently or even pervasively
B – feature occurs neither pervasively nor rarely
C – feature occurs rarely
D – feature does not occur or no answer is possible
(Anderwald & Kortmann 2013: 318)

Questions and instructions for linguist experts are, of course, framed very differently from questions for general users of the language. The questions are listed by over-arching category, e.g. negation, and use specialist terminology, e.g. from eWAVE:




154. multiple negation/negative concord (e.g. He won’t do no harm)
155. ain’t as a negated form of be (e.g. They’re all in there, ain’t they?)
156. ain’t as a negated form of have (e.g. I ain’t had a look at them yet)
157. ain’t as generic negator before a main verb (e.g. Something I ain’t know about)

As one can see, this type of question is opaque to lay people, as is best shown by the three types of ain’t, which are elicited with specialist terminology that would not work in questionnaires for general language users. Expert questionnaires are a shortcut to obtaining language data: the basic idea is to apply them in large-scale projects where one cannot start from scratch for every variable and study each pattern individually. Instead, expert knowledge is elicited for a sketch of the situation of each feature. This is evident in the Handbook of Varieties of English, which polled expert opinion on 179 phonological/phonetic features and 76 morphosyntactic features. For phonological and phonetic features, experts were asked to provide information on each vowel and consonant (with three answer categories: A – occurs normally / is widespread; B – occurs sometimes / occasionally, with some speakers / groups, in some environments; C – does not normally occur). For the KIT vowel, for instance, the short lax high front vowel in words such as bit, kid, kill, king and so forth, four realizations were polled for each variety (e.g. Surinamese Creole, Standard Ghanaian English, Standard American English or Irish English, among others):

KIT [ɪ]
KIT raised/fronted, > [i]
KIT centralized, > [ə]
KIT with offglide, e.g. [ɪə/iə]
(Handbook of Varieties of English)

It would not make much sense to ask lay language users these kinds of questions. Expert WQs depend heavily on the existing literature on a variety and on the expert’s familiarity with each feature and with the variety in question. They represent an extreme point on the elicitation spectrum, as the features are evaluated in an abstract way that has little to do with one’s own personal use. In this respect, however, they resemble community reporting, where respondents are asked to estimate what is common in a given speech community; experts make this type of assessment for fairly large regions, from national (e.g. Canadian English) to large regional varieties (e.g. Northern English English) (see Section 7.3.5). For a typological perspective, one that seeks to compare features across a large array of varieties and languages, expert questionnaires are one of the few feasible and practical ways to obtain comparable data.




5.6 Chapter summary
This introduction to some features and linguistic functions in WEs and ELF aimed to foreground common phenomena in these varieties. It is clear that WQs may have something to offer to the description of English as a World Language and as a Lingua Franca in regional and, especially, global contexts and multilingual settings. The WQ’s relative ease of administration and its potential to gather responses from large numbers of people in a relatively short time and at very low cost make it a prime method for large-scale studies that would otherwise be very difficult to conduct with the labour-intensive interview-and-transcription method. Mobility on a global scale, as in super-diverse contexts, poses challenges to many existing methods of study, including the WQ. As approaches to the sociolinguistics of globalization (Blommaert 2010) and to language as local practice (Pennycook 2010) are refined, WQs are likely to play a role in the study of these novel communication practices. These developments are informing a theory of space in which space is much more than just “dots on a map”: space is actively created and shaped not just by built structures but also by social actions and non-actions. The question of a “common core” in ELF, a lingua franca core, has been asked since Jenkins (2000), who established some shared phonological features of ELF speakers. The question of the degree to which lexical and pragmatic phenomena may be shared, given that they are created in the moment, is an interesting one in which WQs may play a role too. In some ELF contexts, such as in Europe, literacy levels are usually not a problem. The situation is different, however, in some WE contexts, where literacy levels would severely limit the administration of WQs as a survey tool. Respondents should be skilled in reading and writing, not just marginally familiar with them.
When census polling in the US switched from a fieldworker-based to a self-administered method in the 1970s (Dillman 2000: 7), it was felt that these levels had been reached. In many contexts globally these levels are likely not yet realized, which limits the versatility of WQs. Context plays a crucial role in designing a WQ, as one hopes to deliver a questionnaire that is maximally relevant to the linguistic and social context in question. This problem is compounded in superregional and global contexts, where respondents generally come from more diverse backgrounds. Caution is needed in wording the questions, which must avoid any marked constructions or locally restricted language use. Some of these issues will be explored in Chapter 7, after key concepts of sociolinguistic theory have been introduced in Chapter 6.


Chapter 6

WQ data and linguistic theory
The present chapter is the last in the history and theory part of the book. Its primary function is to act as a one-stop introduction to major concepts in sociolinguistics and historical linguistics, concepts that have proven useful in the interpretation of WQ data. Chapter 4 provided an overview of the types of variables to which WQs have been applied, showing that the method does produce interesting results, while Chapter 5 explored a number of avenues for WQs in global contexts. It is hoped that the following theoretical concepts will facilitate the interpretation of WQ data and help the reader to move beyond individual variables by offering concepts that allow one, perhaps in connection with previous studies, to speak more generally about an issue. Theoretical concepts that are built on many case studies achieve this goal by linking an individual case to previously established analogues. For this purpose, Canadian examples will once more be provided. Some of these concepts have already been implicitly used in the previous discussions; it is now time to bring the underlying assumptions to the forefront. I will address them one by one, starting with a concept that is at the core of almost all sociolinguistic studies and that we have already used a number of times, beginning with Chapter 4: the apparent-time hypothesis (6.1). This is followed by an explanation of the S-curve of linguistic change (6.2) in real and apparent time and the application of the Labovian concepts of change from above and change from below (6.3). These concepts are not new in any way, but they are essential to the understanding of language change. Five social correlates of linguistic change are then discussed and exemplified, beginning with gender (6.4), one of the most intriguing correlates, followed by a discussion of linguistic effects across and along political borders (6.5).
Additionally, two frameworks from historical linguistics that offer clues to the development of postcolonial varieties will be presented. Both aim to model koinéization, i.e. the creation of “compromise varieties”, in relation to national languages. While they are generally presented as antithetical to each other, I will aim to show that Trudgill (2004) and Schneider (2007) have more in common than is usually perceived if their limitations are kept in mind. The Canadian context shall once more serve as the test case for both models (6.6). In the last decade or so, new theories of linguistic variation have been put forward, placing greater emphasis on the performative aspects of the speaker and his or her active creation of group membership. The section on social indexing (6.7) introduces these approaches, for which WQs will also be in a position to produce new data,


either by probing into speaker motivation for language choice or by asking, as explored in the speaker evaluation tradition (see, e.g., Giles and Billings 2004, Chapters 5 and 7), for assessments of the social associations of linguistic forms in certain contexts. The chapter is rounded off with a discussion of perspectives on the homogenization and heterogenization of dialects (6.8). While the data will be Canadian, the concepts are relevant to many varieties, most obviously many postcolonial ones. As such, the chapter presents concepts that will be useful in the new field of historical sociolinguistics (e.g. Hernández-Campoy & Conde-Silvestre 2012; A. Auer et al. 2015), which combines two central fields of the present book.

6.1 Real time and apparent time
All studies presented in Chapter 4 use the concept of apparent time to interpret the development of changes in progress. Apparent time is a concept that uses respondents’ ages as a shortcut to past language behaviours. Sometimes this is done more explicitly, as in Figure 4.14 for yod-dropping in avenue; at other times it is less apparent, as in our discussion of take up the test (Section 4.1.3). But in both cases we have been working with the ages of the respondents and have made inferences based on age. Age is one of the most profound sociolinguistic variables. A respondent’s age affects his or her language use, as do gender, social class and other social correlates. The difference with age is that an underlying assumption is applied: age is taken as a window into the past by interpreting the older respondents’ speech as representing the speech of a former era. In other words, we assume that each generation of speakers reflects the state of the language they acquired when they were young adults with stable linguistic repertoires. It is important to keep in mind that this method is an inference, a short-cut that has proven to work under most conditions. In order to learn what these conditions are, we will look at two types of data: real-time and apparent-time data. Real-time studies use data from different points in time. A researcher might collect data today, return ten years later and interview the same people with the same method (a “panel study”, which is rare), or people who are socially defined in the same way (a “trend study”, which is more common). Such longitudinal studies are logistically difficult to carry out and involve long waiting times between survey repeats during which no results can be produced. Another kind of real-time study is based on historical corpus data. A historical text corpus allows the use of data composed at different points in time.
Figure 6.1 shows data from the Bank of Canadian English, a 2.2-million-word database of historical Canadian English (Dollinger et al. 2006–), charting the development of modal auxiliaries. The lines in the figure represent the changes in the ways people expressed strong obligation in writing from the early 17th century to the present: be to, as in She is to go, is rapidly declining; must, as in She must go, peaks in the mid-19th century; while have to, as in She



Chapter 6.  WQ data and linguistic theory



[Figure 6.1  Deontic Obligation Markers in Canadian English (Bank of Canadian English); series: be to, must, have to, got to]

has to go, has been steadily gaining in importance, with got to, She (’s) got to (gotta) go, making only a minor showing in the written data. Figure 6.1 uses real-time data (in this case texts composed at various points in time and arranged in categories) as evidence for language behaviour. As far as the written language is concerned, this is safe, but extending these inferences to the spoken language of the time would require further argument. Working with real-time data is different from the way in which we interpreted most of the WQ data. Figure 6.1 shows datable evidence, which has not been the case for the WQ data in Chapter 4. For instance, Figure 4.9 on the use of different from/than/to shows that the older speakers in Toronto, those over 80, report different from much more often than the younger speakers, with the age cohorts in the middle falling in between. We did not just make a factual statement like this in Chapter 4; we went beyond it by inferring from the data that “different from is no longer the ‘usual construction’ that it was in the last years of the 19th century”. We took a short-cut and interpreted the older speakers’ responses as representing a diachronically older stage of language development, and the younger speakers’ responses as a more recent one: this is the concept of apparent time. In WQs, by asking respondents to place themselves in age brackets and by asking for their residence history, we gather information that allows us to classify their answers by age cohort as well as by the region they linguistically represent. This works in the following way: in apparent time, the age of respondents is taken to reflect the state of the language at the end of their formative years, which is usually set at the age of 20. In young adults, language behaviour is generally settled enough to allow for this short-cut (there are exceptions, namely changes occurring later in life, which will concern us below).
This method provides a convenient window into the linguistic past. If we ask a 63-year-old what she calls, for example, “the upholstered piece of furniture in one’s living room”, an answer of chesterfield suggests she used the word when she was 20 (63 − 20 = 43 years ago). In some WQ studies we do not ask for the precise age but for the age bracket of the respondent,


in order to remove some of the social stigma of asking a person’s age. If, as we saw in Figure 4.1 for chesterfield, 60 percent of respondents aged 60–69 report chesterfield, we can infer that between 49 (in the case of the 69-year-olds) and 40 years ago (in the case of the 60-year-olds), people used chesterfield in 60 percent of cases. Simplifying somewhat, “four to five decades earlier” chesterfield was already the majority form. This rationale obviously works only if the assumption holds that people, on the whole, do not change their linguistic behaviour as they grow older. Luckily, evidence for apparent time is strong, so that we can approach the issue from the other end: we can assume that respondents do not change their behaviour unless we find evidence that they do. Apparent time is one of the standard ways of interpreting change in sociolinguistics, which is why we can afford this work-around. The alternative to apparent time is real time, as we have seen in Figure 6.1. Real-time studies that are not based on a historical corpus entail repeated interviewing of, or repeated administration of questionnaires to, the same or a comparable sample of respondents. Real-time studies are rarer, as they require more effort in collecting new data. We have also, tacitly, used real-time comparisons in the previous chapter. Sometimes we used data from previous studies, such as Scargill and Warkentyne’s (1972) study or Gregg’s Vancouver study (2004 [1984]). These comparisons are sometimes problematic, as different survey questions or conditions influence the findings. We still use such data because, imperfect as real-time data often are, they offer an important viewpoint into the past that is often needed to substantiate apparent-time studies.
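The apparent-time arithmetic described above can be made explicit in a few lines. The following Python sketch is illustrative only: it assumes, as the text does, a formative age of 20, and the function names years_since_formative and bracket_window are hypothetical helpers, not part of any published tool. It reproduces the two worked cases from this section, a single 63-year-old respondent and the 60–69 age bracket.

```python
# A minimal sketch of apparent-time inference, assuming (as in the text)
# that a speaker's usage stabilizes at a formative age of about 20.
# Function names are hypothetical, chosen for illustration only.

FORMATIVE_AGE = 20  # assumption: end of the formative years


def years_since_formative(age: int, formative_age: int = FORMATIVE_AGE) -> int:
    """Return how many years ago a respondent of `age` reached the formative age.

    Under the apparent-time hypothesis, the respondent's answer is taken
    to represent usage at that point in the past.
    """
    if age < formative_age:
        raise ValueError("respondent has not yet reached the formative age")
    return age - formative_age


def bracket_window(low: int, high: int,
                   formative_age: int = FORMATIVE_AGE) -> tuple[int, int]:
    """Return the (most recent, most remote) number of years ago represented
    by an age bracket running from `low` to `high`, inclusive."""
    if low > high:
        raise ValueError("bracket bounds are reversed")
    return (years_since_formative(low, formative_age),
            years_since_formative(high, formative_age))


if __name__ == "__main__":
    # Single respondent: a 63-year-old reporting "chesterfield".
    print(years_since_formative(63))  # 43
    # Age bracket 60-69: the reported rate is projected 40 to 49 years back.
    print(bracket_window(60, 69))     # (40, 49)
```

Because an age bracket only fixes an interval, the rate reported for it (60 percent chesterfield among the 60–69-year-olds) is projected onto a window of 40 to 49 years before the survey rather than onto a single year. The sketch is only as good as its one assumption, namely that speakers keep their usage after the formative age, which is exactly what age-grading calls into question.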

6.1.1 Age-grading

The apparent-time hypothesis has been shown to hold generally, with one exception: age-graded changes. Age-graded changes are changes in the use of a variant that recur at a particular age in successive generations. In other words, age-grading happens when speakers change their linguistic behaviour later in life, and real-time studies are invaluable for testing for it. The Dialect Topography of Canada replicated the Golden Horseshoe survey (Greater Toronto region and surroundings) for the two youngest age cohorts about a decade after the initial study and has therefore introduced a real-time component. Both data sets, from 1991/2 and 2000, are available on the website (Chapter 8). Based on these data we can illustrate how age-grading shows up in apparent time and identify one variable as an age-graded change. The variable is elicited in question 62:

62. Do you pronounce the letter Z as zee, or as zed?

It is important to know that zee is a variant predominantly used in American English. It has been used in CanE for a long time, but only as a marginal form. In 1846, an angry reader of the Kingston Herald wrote a letter to the editor, complaining that:



Chapter 6.  WQ data and linguistic theory 179

the instructor of youth, who when engaged in teaching the elements of the English language, direct them [the students] to call that letter ze, instead of zed, are teaching them error.  (quoted in Chambers 1993: 12)

An “error”, in other words, with American origins. An analysis of the Golden Horseshoe data from 1991/2 and 2000 for all respondents (RI 1–7) and for only the very local ones (RI 1–3) is shown in Figure 6.2. The data from 1991/2 is on the left-hand side; the data from the 2000 re-survey (for only the two youngest age cohorts) is on the right.

[Figure: y-axis 0–100; x-axis age cohorts from 14–19 to 80 and over; series: 1991/2 – RI 1–3, 1991/2 – all, 2000 – RI 1–3, 2000 – all]

Figure 6.2  Zed in the Golden Horseshoe, 1991/2 and 2000

The 1991/2 data shows an increase of zee in the two youngest age cohorts, which would suggest that the Canadian form, zed, is being replaced by the American form, zee, especially as the locals (RI 1–3) behave the same way as the overall population. With these data alone we would not be able to tell whether there was a change in progress towards zee or age-grading in operation. The data on the right shows that the 20–29-year-olds in 2000, who were the 14–19-year-olds in 1991/2, use zed at around 70 percent, and that the youngest group in 2000, the 14–19-year-olds, is almost at the same level. From this evidence it appears that zee in Canada is an age-graded phenomenon: as children get older, they tend to change their usage from zee to zed. One reason they use the American form at all is a pre-school show, Sesame Street (Chambers 2009: 201–2), which until recently was broadcast from the United States and was widely watched. The show’s “alphabet song”, practiced with zeal, rhymes Z with me, which only works with the American pronunciation:


W (double-u), X (ex), Y (why), Z (zee)
Now you know your ABC
Next time won’t you sing with me?

We have yet to find solid evidence for a persistent decline of zed. Instead, we now find versions of the “alphabet song” that break the rhyme with me and use zed instead of zee, which is quite remarkable considering the importance of rhyme in children’s songs.

WQ data can thus provide insights into linguistic change. Boberg (2004a) compares WQ data with previous studies to show that apparent-time constructs generally hold but that not all variables adhere to them uniformly. As Labov says, “variables operating at high levels of awareness are modified throughout a speaker’s life-time, with consistent age-grading in the community” (1994: 111). The shift from zee to zed in the Golden Horseshoe is such a change. In this case, zed is also linked with local identity, which drives the age-graded change back to zed: once grade-school children are told that zed is Canadian and zee is not, they generally switch. Age-grading therefore describes changes in a speaker’s repertoire. Such individual (or, in the case of zed, group-based) changes can of course co-occur with societal changes, and the question is how to tell the two apart: when is a decrease in frequency among younger speakers a sign of societal linguistic change, and when is it merely age-grading? This question is one of the most challenging aspects of apparent-time studies, and ultimately it can only be answered with real-time evidence.
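Real-time comparisons of this kind depend on lining up the same birth cohort across two surveys. The Python sketch below shows the arithmetic under simple assumptions: age brackets are inclusive integer ranges, the gap between surveys is taken as 2000 − 1991 = 9 years (the first survey ran in 1991/2, so this is an approximation), and the helper names shift_bracket and overlap_share are hypothetical.

```python
def shift_bracket(age_low, age_high, years_elapsed):
    """Ages that members of a bracket would have reached `years_elapsed` later.

    A negative value shifts the bracket back in time.
    """
    return age_low + years_elapsed, age_high + years_elapsed


def overlap_share(bracket, target):
    """Share of the age years in `bracket` that fall inside `target`.

    Both brackets are inclusive integer ranges, e.g. (20, 29).
    """
    lo = max(bracket[0], target[0])
    hi = min(bracket[1], target[1])
    covered = max(0, hi - lo + 1)
    return covered / (bracket[1] - bracket[0] + 1)


GAP = 2000 - 1991  # assumed years between the two surveys

# Where did the 1991/2 14-19-year-olds end up in 2000?
later = shift_bracket(14, 19, GAP)
print(later, overlap_share(later, (20, 29)))

# Conversely, how much of the 2000 20-29 bracket was 14-19 in 1991/2?
earlier = shift_bracket(20, 29, -GAP)
print(earlier, overlap_share(earlier, (14, 19)))
```

On these assumptions, the 1991/2 14–19-year-olds fall entirely inside the 2000 20–29 bracket (ages 23–28), but only six of the ten age years in the 2000 20–29 bracket map onto the 1991/2 14–19 bracket, so the two groups are best treated as approximately, not exactly, the same cohort.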

6.2 The S-curve of linguistic change

Chapter 4 has shown a number of S-curves or “near” S-curves of linguistic change. Figure 4.1 on the rise of couch and Figure 4.7 on snuck represent fairly regular S-curves. In an S-curve, a change starts slowly until a threshold is reached; this is followed by a period of rapid change, before the change tapers off. Figure 6.3 shows an idealized representation of the S-curve. The incoming variant spreads slowly at time points 1 and 2, while at point 3, at about 20%, a period of rapid change is triggered, which then, after time point 4, tapers off until only some residual tokens of the old variant remain at time points 5 and 6. This pattern has been discussed in the context of linguistic change since at least the 1950s (Denison 2003: 54) and has been observed in many linguistic changes since its formal establishment (Wang and Cheng 1970). While not all aspects of the S-curve are clear, for instance the relationship between real-time and apparent-time distributions (Denison 2003: 61), it is now a standard template of change. Had we complete data, we would expect incoming variants to be adopted in such a manner. However, since the rate of linguistic change is not constant, we have no way of predicting how long a change will take to be fully implemented: some changes are much faster than others (years vs. centuries). Most crucially, one can often tell only in hindsight where on an S-curve trajectory a change was caught at a given point in time.

[Figure: S-curve; y-axis: % of incoming variant (0–100); x-axis: time points 1–6]

Figure 6.3  Depiction of S-curve of linguistic change
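A common way of modelling an idealized S-curve is the logistic function, whose logit transform is a straight line. The Python sketch below is an illustration under that assumption, not the author’s method: the parameters k (steepness) and t0 (midpoint) are hypothetical, and the function names are chosen for this example.

```python
import math


def logistic(t, k, t0):
    """Share (0-1) of the incoming variant at time t on an idealized S-curve.

    k is the steepness of the curve and t0 its midpoint (the 50 percent point);
    both are hypothetical parameters, not values estimated from real data.
    """
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))


def logit(p):
    """Inverse of the logistic: maps a share strictly between 0 and 1 to a line."""
    return math.log(p / (1.0 - p))


def fit_from_two_points(t1, p1, t2, p2):
    """Recover (k, t0) from two observed shares, assuming a logistic S-curve.

    On the logit scale the S-curve is a straight line, logit(p) = k * (t - t0),
    so two points with different times and different shares determine it.
    """
    if t1 == t2 or p1 == p2:
        raise ValueError("need two different time points with different shares")
    k = (logit(p2) - logit(p1)) / (t2 - t1)
    t0 = t1 - logit(p1) / k
    return k, t0


# An idealized curve over time points 1-6 (parameters chosen for illustration)
for t in range(1, 7):
    print(t, round(100 * logistic(t, k=1.5, t0=3.5)))
```

Fitting from two points only shows the idea; with real data one would fit all periods at once. Observations from the slow early phase constrain such a fit poorly, which is one way of seeing why the position of a change on its S-curve often becomes clear only in hindsight.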

6.2.1 The case of N/V+ing + N compounds

It is quite easy to produce S-curves if one can be fairly certain that only two variants compete for a function. An example is a change in English word-formation patterns of the types noun+ing + noun or verb+ing + noun, as in frying pan → fry pan or shipping list → ship list. This change has been ongoing for more than a century (Gallinsky 1952; Gold 1969; Wald & Besserman 2002; Dollinger 2008b). Figure 6.4 shows three examples over the last 30 years or so: wait time (vs. waiting time), wait list (vs. waiting list) and dump truck (vs. dumping truck). The data come from Canadian Newsstand, a substantial collection of more than 300 Canadian newspapers from the late 1970s to the present. The tokens of the short form (e.g. wait time) and the long form (e.g. waiting time) were counted for each period and converted into percentages, which are charted in the graph:


[Figure: percent of incoming form (0–100) by period (1980–84, 1985–89, 1990–94, 1995–99, 2000–04, 2005–07); series: wait-time, wait-list, dump truck]

Figure 6.4  N/V+ing + N, 1980–2007 (data: Canadian Newsstand)
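The conversion from token counts to the percentages plotted in Figure 6.4 is simple arithmetic: the share of the incoming short form among all competing tokens in a period. The Python sketch below applies it to counts reported in this section for wait time vs. waiting time and for dump truck vs. dumping truck; the function name share_of_incoming is hypothetical, and returning None for a period without any tokens is an assumption about how to treat empty periods.

```python
def share_of_incoming(short_tokens, long_tokens):
    """Percentage of the short (incoming) form among all competing tokens."""
    total = short_tokens + long_tokens
    if total == 0:
        return None  # no competing tokens in this period: nothing to chart
    return 100 * short_tokens / total


# Counts from Canadian Newsstand as reported in the text
print(round(share_of_incoming(1, 71), 1))       # wait time, 1985-89: 1.4
print(round(share_of_incoming(4833, 1237), 1))  # wait time, 2005-07: 79.6
print(share_of_incoming(3652, 0))               # dump truck, 1980-2007: 100.0
```

Treating empty periods as missing rather than as 0 percent avoids plotting a spurious data point where neither form occurs.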

It seems that we caught wait time at just the right moment. Of its 15,266 tokens between 1980 and 2007, none were found in 1980–84 and only one in 1985–89 (against 71 tokens of waiting time in that period). By 2005–07, the ratio was inverted: 4833 tokens of wait time vs. only 1237 of waiting time. The data is strong enough (20,566 competing tokens of both forms in total) to show the S-curve pattern of wait time in full. Wait list, by contrast, seems to be on a slower trajectory: at the end of the period, waiting list is still the majority form, with more than 70% of usage (19,569 tokens for both forms in total). Dump truck, on the other hand, has already run the full S-curve, as general reports of dumping truck sounding “strange” to Canadians confirm. More solid evidence can be found in the corpus figures: there are 3652 tokens of dump truck since 1980, but not a single long form. One needs to do some digging to unearth examples of dumping truck, which used to be more common; Illustration 6.1 shows an ad from 1918 in a Canadian newspaper.23 In the case of dump truck, the time line of Figure 6.4 comes too late. What appear to be isolated changes are in fact coordinated changes from long form to short form, happening at different points in time. They are consistent with the overall movement of English from a synthetic to an analytic language, as inflected forms are increasingly pushed back: swim meet (now much more common than swimming meet) or finish line (vs. finishing line). There may be regional variation, with most short forms more advanced in North America. But even these examples do not get to the root of the competition between -ing forms and simple forms, which goes back to late Old English times and has had a ripple effect ever since (Sauer 1992).

23. It is true that the context of automatic dumping somewhat interferes. However, the existence of dumping truck is beyond doubt: in the New York Times, dumping truck was fading out at the expense of dump truck in the last two decades of the 19th century.

Illustration 6.1  Youngest example of dumping truck in The Globe (and Mail), 5 Sept. 1918: p. 5 Ad for “five-ton automatic dumping truck”

These examples illustrate the reality of the S-curve concept. As Denison (2003: 59) remarks, however, if there are more than two competitors, the concept gets ‘messy’ very quickly, and it becomes difficult if not impossible to establish S-curves. Nevertheless, as an approximation of the kinds of gradual changes that usually occur, the concept remains useful even in more complex variant situations.


6.3 Change from above, change from below: Social class

The concepts of change from above and change from below are key concepts of the Labovian sociolinguistic approach. They were first introduced in Labov’s 1966 study of the vernacular of New York City, which suggested that a linguistic change needs to be assessed along these two lines early on. “Above” and “below” refer “simultaneously to levels of social awareness and positions [of speakers] in the socioeconomic hierarchy” (Labov 1994: 78). With this concept, social hierarchies such as social class enter the picture, and linguistic changes need to be separated into two groups. First, changes may be initiated by the upper social classes (changes from “above”); speakers are generally aware of such changes, which trickle down in a top-down, hierarchical fashion. R-dropping in postvocalic contexts is historically one such feature, e.g. in New York or New England English, with the dropping of r’s enforced by the “dominant social class, often with full public awareness” (Labov 1994: 78). In late 18th-century London, for instance, it became fashionable to no longer pronounce r after vowels: car, far, four etc. In New York City, the introduction of r-forms in the middle of the 20th century changed the originally non-rhotic, r-less NYC dialect into a rhotic, r-ful one. Second, changes from below are the counterpoints to changes from above and are defined as “systematic changes that appear first in the vernacular, and represent the operation of internal linguistic factors”. It is crucial to note that “[a]t the outset, and through most of their development, they are completely below the level of social awareness” (Labov 1994: 78). Later on, individual commentators may identify such changes as being of lower social origin (i.e. originating in a group other than the highest-status one). Generally, only when these changes are nearing completion (in the sense of the S-curve) will some members of the speech community start to notice them.

These changes can be introduced by any social class, but “no cases have been reported in which the highest-status social group acts as the innovating group” (ibid.). In 19th- and 20th-century London working-class speech, which is traditionally h-less in word-initial position (e.g. hammer and hasty pronounced without the h-), the loss of word-initial h- would stem from the lower social classes. Only when the feature was widespread was it noted and turned into material for musicals (for example, Eliza Doolittle and Professor Higgins in My Fair Lady). Another, more recent example is the adoption of originally working-class speech features into British standard speech and the development of an intermediate variety between the British standard pronunciation (“RP”) and working-class dialects. This “watering down” of a rather stiff RP pronunciation is usually done with linguistic features from the lower strata, which are changes from below (see Altendorf 2003). Among the variables from Chapter 4, different than would be a change from below: we saw that different from was considered the prestige form and different than the vernacular Canadian (and North American) variant. The spread of take up (#9) would likewise be a contender for a change from below within Canada, as no commentary on the feature is found. The pronunciation of schedule, another WQ variable, with either [ʃ] (“sh in shed”) or [sk] (“sch in school”), is an interesting case. While [sk] is the older and more dominant variant in CanE, the introduction of a British layer of English in the early 19th century has led to a change toward [ʃ]. This would have been a change from above, since the British newcomers would have expressed their sense of superiority linguistically in this way. It is not always clear whether a change is from above or from below, and it may take some historical research to establish where a variant likely originated. This is one of the first facts that should be established for any linguistic variable.

6.4 Gender (sex)

Gender is a social category, while sex is a biological one. In sociolinguistics, however, gender was until fairly recently, and often still is, used to mean sex, i.e. biological sex in a binary fashion. Gender in this simplistic, binary sense has been one of the major variables in sociolinguistics. It reflects an unsophisticated view of the issue that has nonetheless revealed interesting trends, while current sociolinguistic theory is working towards re-imagining and theorizing gender as a culturally constructed variable. For present purposes, we will report on studies that use a “classic” conception of a binary gender variable. One of the most exciting findings in early sociolinguistics was the discovery of fairly consistent gender-based linguistic differences. Early studies, spearheaded by William Labov in North America and Peter Trudgill in the UK, identified three gender-related patterns that still serve as guidelines today. Meyerhoff (2011: 218–31) calls them “Labov’s principles”, and we will turn to these now.

6.4.1 Principle 1: Stable situations: Women use the standard more than men

Linguistic variables that are stable, i.e. not undergoing change, show a characteristic gender pattern. Examples of stable variables in English are “g-dropping” or -ing (talking or talkin’), and negative concord (as in I do not buy nothing/anything). Trudgill’s (1972) study of Norwich, England, shows the prototypical female pattern of more standard-like linguistic behaviour for (ing).24 Figure 6.5 shows the linguistic behaviour of men and women in both the working and the middle classes in three contextual styles: reading passage (RP), where the monitoring is

24. Sociolinguists use ( ) to mark linguistic variables.


[Figure: percent g-dropping (0–100) by style (RP, FS, CS); series: working-class women, working-class men, middle-class women, middle-class men]

Figure 6.5  (ing) in Norwich, England (Trudgill 1972, as classified by Meyerhoff 2011: 219)

the greatest, formal speech (FS) and, finally, casual speech (CS), in which the monitoring is the least. One can see that all classes and genders decrease their use of non-standard g-dropping, [ɪn] as in talking, dancing etc., as the speech style gets more formal. There is a clear gap between the working-class speakers of both genders (dotted lines) at the top and the middle-class speakers. Among the working class, the women consistently use more standard forms (more [ɪŋ]) than the men. The middle-class women do so too, with one notable exception. When reading a text passage (RP), middle-class women show a stunning 0% of g-dropping, and only 2% in formal speech. In casual speech, however, they outperform the middle-class men, shooting up to 34% non-standard forms. This range in the female middle-class speakers is remarkable and attests to their sensitivity to the social context. It also shows that middle-class women use more standard forms when monitoring allows it; in WQs, where monitoring is possible, such female behaviour would be expected. The reasons for this behaviour have given rise to a number of hypotheses, including (from Trudgill 1972: 182):

– in western society, men are evaluated more on what they do than on how they speak
– women employ the linguistic prestige forms more since they have (traditionally) lacked opportunities to gain access to power in the world outside the home

In some western working-class settings, women may have been exposed to more diverse ways of speaking (e.g. going to the doctor with the children, speaking with the teachers etc.) than men (Milroy 1987), which might explain the consistent differential shown in Figure 6.5 and for other stable variables.




6.4.2 Principle 2: Women use more standard forms in changes from above

One of the early data sets in Canadian English is Howard Woods’ Ottawa Survey, completed in 1979 but not published until Woods (1999). His face-to-face interviews included phonetic as well as grammatical questions. The results for the grammatical questions are particularly clear cases of changes from above. In response to the question in (6.1), Woods’ 100 informants from the Ottawa region responded as shown in Figure 6.6:

(6.1) Just between you and ___________, I think that they’re not telling the truth.
A. Me
B. I

[Figure: percent (0–100) by social class (lower working & lower middle; middle-middle; upper middle & lower upper); series: female, male]

Figure 6.6  Between you and me (vs. I) in Ottawa (source Woods 1999: 186)

[Figure: percent (0–80) by social class (lower working & lower middle; middle-middle; upper middle & lower upper); series: female, male]

Figure 6.7  Subjunctive were in Ottawa (source Woods 1999: 191)


Woods’ data shows that the upper social classes, including the highest “Upper-Middle & Lower-Upper” class, show categorical use of the prescribed form me, while the lowest classes display greater frequencies of me than the “Middle-Middle” class. We explored the reasons in Chapter 4: the confusion between I and me in the nominative and objective cases. It is the Middle-Middle class that hypercorrects to the “incorrect” I the most. What is important in the present context is that in both cases there is a gender differential, with the females leading in the use of the prescribed form me. The gender differential is also seen in Figure 6.7, which shows the results for was/were in the subjunctive in response to question (6.2):

(6.2) They would go for a walk if it __________ warmer.
A. were
B. was

The traditionally prescribed form for irrealis situations is were, though was is common in North America. In all three social groupings, women use were to a greater degree, with a considerable differential in the two highest classes. The spread of was, the “vernacular” form, is thus held back by were, which represents a change from above.

6.4.3 Principle 3: Women use more of the incoming variant in changes from below

The great majority of linguistic changes are changes from below. Speakers are more often successful in raising a form that used to be frowned upon to higher (or accepted) social status than the other way around. The sentence adverbial hopefully, as in Hopefully, it will be sunny tomorrow, is one such case. Another such change, noticed by commentators only when it was nearing completion, is the quotative be like. Like can have many functions, but the one of interest here is the introduction of direct quotations, as in She’s like: “They all looked so cool.”, which has been studied since the early 1990s. Tagliamonte and associates have examined this use in Canadian English. Figure 6.8 shows the frequencies of quotative be like and its competitors across age groups in apparent time. It is obvious that be like is on the rise in the younger age cohorts, the 17–29-year-olds, while say has lost most of its appeal, except among the very old. Be like is basically unrivalled among those under 30 in Toronto. What is important in the present context is that the introduction of be like as a quotative marker is led by women among those under 30 (Tagliamonte & D’Arcy 2007: 206, Table 2): among the 9–14, 15–16, 17–19 and 20–29-year-olds, women are more likely to use the form than men. That the 30–39-year-old males use the form slightly more than females is an interesting aspect, which Tagliamonte and D’Arcy explain as follows: the increase of be like began among the 30–39-year-olds, when gender effects were minimal, with men even leading slightly. Among the 20–29-year-olds, however, the women took the lead,




[Figure: % of quotatives (0–90) by age group (> 80, 79–70, 69–60, 59–50, 49–40, 39–35, 34–30, 29–25, 24–20, 19–17, 16–15, 14–9); series: be like, say, go, zero, think, misc.]

Figure 6.8  Quotatives in Toronto English (ages 9 to >80) (Tagliamonte & D’Arcy 2007: 205, Figure 2)

which then led to a clear differential among the 17–19-year-olds (2007: 209). Once such a gender differential is established, with women leading, “men either retreat or resist the change […] causing a gender-split” (ibid.).

6.4.4 Indexing social meaning: Gender

The three principles can be used as diagnostics, but must also be treated with caution. They are derived from group aggregates, in which the individual, performative behaviour of speakers is concealed by group averages. Meyerhoff (2011: 231–2) quite rightly calls this a “Gender Paradox”: on the one hand, women use more standard forms; on the other hand, they use more innovative forms. This is only a problem if it can be shown that the same women exhibit these contradictory inclinations. It may well be that some women use more standard forms, while other women use more innovative forms. Here a more fine-grained, individual approach is needed, which has been explored in the work of Penelope Eckert and associates. The concept of indexing, which conceptualizes linguistic features as ‘indexing’ social meaning (the linguistic sign acting like an arrow pointing towards social meanings and concepts), can be exploited for a more detailed approach. Indexing, as we shall see in a later section, is an important feature of individual performance. Eckert (2011) shows that Rachel, a 12-year-old girl, uses phonetic variants to index her social positioning: she positions herself linguistically either in a teenage group or in a children’s group, according to the needs of the social situation. Rachel, “a prominent, flamboyant, and central member of the popular crowd”, uses fronting of [oʊ], as in go, toe, show, for the linguistic construction of social identity. Her pronunciations of quotative go are significantly more fronted than non-quotative occurrences, which “indexes what one might call a ‘teenage stance,’ ” (p. 95) as opposed to a ‘child stance’. She seems to know how to manipulate these linguistic cues to construct her social persona. As Eckert (2011: 93) puts it, “she juggles childish and teenage personae, registering childish hurt over being wronged by her friends and registering teenage sophistication in connection with the more general workings of the social market”. The social market is in this context a heterosexual market, as girls and boys are assigned different roles, with girls being put in charge of social engineering. We mentioned the indexing of social meanings by linguistic means in Section 4.4 in relation to the variables student and news and the presence or absence of yod. For instance, females tend to resist the change towards yod-lessness in Vancouver (Table 6.1), while younger working-class males reduce their use of yod in more formal styles, which is atypical (Gregg 2004: 48).

Table 6.1  Vancouver yod-retention (Gregg 2004: 48)

Style              Older females,        Younger males,
                   upper middle class    working class
Minimal pairs      77                    27
Word list          78                    51
Reading passage    65                    43

(Values are percentages of yod forms.)
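The interpretation of Table 6.1 rests on the direction of style shifting, from the least formal style in the table (reading passage) to the most formal (minimal pairs). A small Python sketch using the table’s values makes that comparison explicit; the helper name style_shifts is hypothetical.

```python
# Percent yod retention from Table 6.1 (Gregg 2004: 48)
YOD = {
    "older females, upper middle class": {"reading passage": 65, "word list": 78, "minimal pairs": 77},
    "younger males, working class": {"reading passage": 43, "word list": 51, "minimal pairs": 27},
}

# Styles ordered from least to most formal
STYLES = ["reading passage", "word list", "minimal pairs"]


def style_shifts(row):
    """Change in percent yod between each style and the next more formal one."""
    return [row[more] - row[less] for less, more in zip(STYLES, STYLES[1:])]


for group, row in YOD.items():
    print(group, style_shifts(row))
# older females, upper middle class [13, -1]
# younger males, working class [8, -24]
```

The older females gain 13 points from reading passage to word list and stay roughly level (−1) in minimal pairs, whereas the working-class males, after a small rise, lose 24 points in the most formal style, which is the atypical pattern at issue.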

The older females have yod-forms as their target forms, since their percentages increase or stay roughly the same from Reading Passage to the more formal styles. The working-class males’ use of yod, however, drops off in the most formal style, minimal pairs. Reviewing the often conflicting data on yod in various Canadian contexts, Clarke suggests, because of the unusual gender distribution, that “the +glide [yod] variant is not the formal target for all groups” (2006: 235), and she reasons that we “are dealing with a change in indexicality: glided [yod] and glideless [no yod] variants have come to symbolize different social values for different segments of the Canadian population” (p. 236). It is therefore no longer primarily the case that [+yod] indexes “British”, as it did some decades ago, and [−yod] “American”, “North American” or “Canadian”. Instead, yod-ful variants have come to index “good breeding” and “sophistication”, which may explain the variation between student [−yod] and student [+yod]. It is, of course, possible for indexical meanings to co-occur, e.g. “British” and “educated” at the same time, but perhaps to different degrees (see 6.7 for more on indexing). Indexicality is one area that often cannot be captured well with self-reports, as speakers often do not have a clear understanding of the process: they do it, but they cannot faithfully report it in direct questions that are void of the social context. While findings on group averages, which WQs can produce, are an important step in decoding the social characteristics of a change, it is important to keep in mind that more fine-grained, individually tailored interpretations, such as in the case of Rachel, are not possible with WQs. In addition, changes that are undergoing re-indexicalization, such as news and student in Canada (as we know from Table 3.10), are not adequately measured with WQs either. Although WQs offer no access to speech style, gender patterns that deviate from the three principles reported here are signs of special circumstances that should be taken into consideration and interpreted.

6.5 Border effects: Autonomy vs. heteronomy

The typical situation in linguistic geography is that dialects change gradually as one moves from town to town. Sometimes the differences are bigger, sometimes smaller, but they are cumulative: the farther one moves from a given location, the larger the differences become. This situation is referred to as a Geographical Dialect Continuum (Chambers & Trudgill 1998: 9–12). The linguistic behaviour and attitudes of speakers along political boundaries differ from that typical situation, as the border interferes with and influences linguistic features on either side in different ways. Two concepts are of central importance in relation to language and nation: autonomy and heteronomy. Linguistic autonomy can be explained as linguistic independence from another country. Heteronomy is the opposite of autonomy and refers to dependence on another country for one’s linguistic standards. A political border often, but not always, changes speakers’ focus. Speakers on one side of the border will orient themselves towards their country’s prestige forms and, if available, norm-providing institutions, while speakers on the other side will do the same for their country. Along the Czech and Slovak border, for instance, similar dialects have been spoken for centuries, and in a unified Czechoslovakia, both regions looked towards the elites in Prague for their linguistic standard. Since the peaceful separation into a Czech and a Slovak Republic in 1993, only the Czechs continue to orient themselves towards Prague for their linguistic models, while the Slovaks look towards Bratislava, their new capital. As a result, what used to be the same language is now diverging into two different national varieties. In other words, the Slovak dialects along the border are heteronomous to Standard Slovak and autonomous from Standard Czech.
This means simply that the Slovaks consider themselves as speaking Slovak, while the Czechs consider themselves as speaking Czech. There are many other examples of linguistic effects along a border. The Austrian and Bavarian German situation is another one. Upper Austrians and Bavarians speak similar dialects, but are heteronomous to Standard Austrian German and Standard German German respectively, which creates not only different reference points but also, over time, language diversification. The Upper Austrian-Bavarian situation is about 200 years old, roughly as old as the Canada-US border, which was established in 1776 but in the western regions only as late as 1846. Canada and the U.S. present a situation similar to that of Upper Austria and Bavaria, though with the profound difference that in some parts of the country English settlement dates back only to the mid-19th century, in stark contrast to the German-speaking areas in Europe. The timelines in which the sociolinguistic conditions for an Austrian or Canadian variety would be in place are comparable and can be dated as far back as WWII (Dollinger in press, a). There are signs today that Canadians perceive themselves as speaking a distinct form of English that is linked with Canadian identity. Figure 6.9 shows the Vancouver results (n = 489, fall 2009) to the question whether “Canadian English is part of a Canadian identity”:

[Figure: percentage of respondents (0–30) by response, from 1 = strongly agree to 6 = strongly disagree]

Figure 6.9  Canadian English & Canadian identity, Vancouver 2009 (%)

The choices are arranged on a Likert scale from strongly agree (1) to strongly disagree (6), with the intermediate answers agree (2), somewhat agree (3), somewhat disagree (4) and disagree (5). More than two-thirds, or 69 percent, responded positively to CanE being part of Canadian identity by choosing somewhat agree, agree or strongly agree. Compared with studies from 30 years ago, this represents a massive attitudinal shift. Warkentyne (1983) reported that University of Victoria students rated Canadian linguistic identity and Canadian national identity very low, leading to the assessment that the scores “should allay any fears that nationalism may be carried to extreme in British Columbia” (p. 73). This heteronomous assessment is certainly no longer evident in Figure 6.9.
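Collapsing the six response categories into a single positive share, as done for the 69 percent figure, can be sketched in Python as follows. The function percent_agreeing and the sample answers are hypothetical; the actual distribution of the 489 Vancouver responses is not reproduced here.

```python
def percent_agreeing(responses, agree_codes=(1, 2, 3)):
    """Percentage of valid responses on the agree side of a six-point Likert scale.

    Coding follows Figure 6.9: 1 = strongly agree, 2 = agree, 3 = somewhat agree,
    4 = somewhat disagree, 5 = disagree, 6 = strongly disagree.
    Responses outside 1-6 (e.g. None for a blank answer) are excluded.
    """
    valid = [r for r in responses if r in (1, 2, 3, 4, 5, 6)]
    if not valid:
        raise ValueError("no valid responses")
    agreeing = sum(1 for r in valid if r in agree_codes)
    return 100 * agreeing / len(valid)


# Hypothetical answers, for illustration only (not the Vancouver data)
print(percent_agreeing([1, 2, 2, 3, 5, 6, None, 4]))
```

Because the scale has six points and no neutral midpoint, every valid answer falls on one side or the other, which makes a single “agree” percentage straightforward to compute. Whether blanks are excluded, as here, or counted in the denominator is a design choice that affects the result.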



Chapter 6.  WQ data and linguistic theory 193

Illustration 6.2  Canada-U.S. border along “0 Avenue” in Surrey, BC (left), and Blaine, WA (right) (photo: S. Dollinger)

The Canada-U.S. border is a particularly interesting linguistic divide, as traffic across it is heavy and steady: the two countries’ economies are deeply intertwined, facilitating the exchange of goods and the movement of people in great numbers. The border can also claim to be the longest unguarded border between two countries in the world; a section of it is shown in Illustration 6.2. WQ data has revealed interesting border effects, which will be discussed next.

6.5.1 Insights from Dialect Topography

The Dialect Topography of Canada database offers data from seven Canadian regions and four American border regions. This section draws on one area in the Maritimes and one area in Ontario. Normally, regional variation progresses along a continuum on which a political divide (as opposed to a geographical divide) would not play a major role. This can sometimes be seen across border regions but, more regularly, political borders produce some linguistic differentiation.

A cross-border continuum: St. Stephen (New Brunswick) and Calais (Maine)

An excellent example of a cross-border geographical dialect continuum, discussed in Miller (1989) and Burnett (2006), is found between New Brunswick (Canada) and Maine (United States). In this section, these findings will be reported and augmented with new analyses of the Dialect Topography data. The variable is the name for the generic type of athletic shoe often worn today with casual clothing. There are many words for such athletic shoes: sneakers, running shoes, runners, tennies, kicks and so forth. While

194 The Written Questionnaire in Social Dialectology

sneakers is the dominant form in almost all American regions, the standard form in Canada west of the Maritime provinces is running shoes or runners. Figure 6.10 shows the situation in the seven Canadian and four adjacent American regions:

[Stacked bar chart: shares of tennis shoes / tennies, sneakers / sneaks and running shoes / runners in the Golden Horseshoe, Vancouver, Ottawa Valley, Eastern Townships, Montreal, Quebec City, New Brunswick, Western Washington, New York, Vermont and Maine]

Figure 6.10  ‘Athletic shoe’ in seven Canadian and four American regions (%) (adapted from Berger 2005: Figure 55)

While running shoes or runners are the majority forms in most of Canada, sneakers is the standard in three US regions; in Western Washington, though, tennis shoes or its shortening tennies dominate. Figure 6.10 also shows that in New Brunswick the standard term is sneakers, unlike in the other Canadian regions. Closer inspection reveals that a dialect continuum crosses the international border, as shown by the percentages in Table 6.2, which reports sneakers and its variants (e.g. sneaks, snicks) for the youngest cohort with strong local ties to their regions.

Table 6.2  Replies for sneakers among 14–19-year-olds, RI 1–5 (%)

Country         Location             sneakers (%)
Canada          Moncton, NB          96.2
Canada          Saint John, NB       89.5
Canada          St. Stephen, NB      75
United States   Calais, Maine        75
United States   Maine elsewhere      100
United States   New Hampshire        100
United States   Massachusetts        100

The data shows that the percentages of sneakers decrease along the Moncton-Saint John-St. Stephen corridor, which runs from the northeast to the southwest of the province. It also shows that Calais, on the American side, has the same frequency of sneakers as St. Stephen, New Brunswick, at 75 percent, in otherwise categorical US regions. Together, the two towns form what may be called a cross-border dialect continuum
for the youngest speakers (St. Stephen and Calais in Table 6.2): the teenagers in Calais report like the teenagers in St. Stephen. There are very good social reasons for this somewhat atypical cross-border continuum, as St. Stephen and Calais are two border towns that are socially intertwined in a number of ways. First, they are at opposite ends of the most important border crossing in the Maritimes. Second, they share a number of services, namely a movie theatre, a civic theatre and a fire department, which means that local traffic between the two towns is frequent, more like traffic between two parts of one town. Third, and perhaps most crucially, a number of youth and social activities are shared as well, including cross-border dating (Burnett 2006). Taken together, these points make St. Stephen and Calais stand out linguistically in subtle ways when compared to the New Brunswick and Maine norms: all other Maine respondents report sneakers categorically, as do the New Hampshire and Massachusetts respondents in the US sample. In the New Brunswick hinterland, percentages of sneakers are generally higher than in St. Stephen or in Calais. These two towns, therefore, pattern together. Both terms for athletic shoes originate in the last quarter of the 19th century. Running shoe is first attested in 1894 in Canada (Wick 2003) and sneaker in 1895 in American English (Mathews 1951). The assumption that the Maritimes adopted sneakers from American English seems reasonable, while running shoes spread across the rest of Canada. For sneakers, all age groups in New Brunswick show a dissemination of 79 percent or higher, with no apparent-time progression, which suggests that the form is well established. For instance, the over-80-year-olds responded with sneakers in 92 percent of all cases. Likewise, all New Brunswick regions show a dissemination above 80 percent.
However, some indicators of a spread of sneakers in the last century can still be found in the data. Geographical dialect continua are subject to the influence of bigger cities. Some linguistic innovations spread as a function of distance from a centre and of population size (Trudgill 1974a), so that major centres acquire innovations sooner. This explains why Moncton, the biggest conurbation in New Brunswick, shows an almost categorical use of sneakers in the youngest age cohort, at 96.2%, much like the American regions (Table 6.2). Saint John, which is only about half the size of Moncton but considerably bigger than St. Stephen and Calais, shows a lower percentage but is farther advanced in its adoption of sneakers than St. Stephen and Calais. The explanation of Calais’ atypical behaviour lies in the cross-border ties between the communities. It seems that Calais is subject to Canadian influence for sneakers, in the sense that sneakers is not categorical or near-categorical there, as it is in the rest of the state. The same atypical pattern for St. Stephen and Calais is also found for the variables semi, with word-final [i] (not [aɪ]), and avenue, with higher percentages of yod, which suggests a Canadian influence across the border (Burnett 2006: 168).
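The idea that innovations travel according to population size and distance is often formalized as a gravity model. One widely cited formulation of Trudgill’s model computes the influence of centre i on centre j as I_ij = S · (P_i · P_j / d_ij²) · P_i / (P_i + P_j), where P is population, d is distance and S is an index of prior linguistic similarity. A minimal Python sketch under that assumption; the function name and all numbers below are hypothetical and are not the actual populations or distances of the New Brunswick towns:

```python
def gravity_influence(p_i, p_j, d_ij, s=1.0):
    """Influence of centre i on centre j in a Trudgill-style gravity model.

    p_i, p_j: populations of the two centres
    d_ij:     distance between them (any consistent unit)
    s:        index of prior linguistic similarity (1.0 = no adjustment)
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    # Interaction grows with the product of the populations and falls
    # with the square of the distance.
    interaction = s * (p_i * p_j) / d_ij ** 2
    # The share of the interaction credited to i is proportional to i's size,
    # so a large centre influences a small one more than the reverse.
    return interaction * p_i / (p_i + p_j)


# Hypothetical figures: a large centre and a small town 100 km apart.
big, small = 100_000, 5_000
print(gravity_influence(big, small, 100) > gravity_influence(small, big, 100))  # True
print(gravity_influence(big, small, 200) / gravity_influence(big, small, 100))  # 0.25
```

With real population and distance figures, a model of this kind predicts that larger centres adopt an innovation earlier, which is the ordering described above for Moncton, Saint John and St. Stephen; by design, it cannot capture the cross-border social ties that explain Calais.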


Political borders as linguistic divides: Shone in Ontario and New York

It is sometimes stated that Canadian English is not very different from American English varieties. Given the settlement history of Canada’s English-speaking population, which is firmly rooted in 18th-century Midland American English, a general similarity to American forms should be expected. Conversely, the existence and continuing persistence of dialectal differences should come as a surprise. Despite a general permeability of the border since its inception, profound Canada-U.S. ties, and a U.S. hegemony in popular culture that now extends across the globe, a number of shibboleths, i.e. linguistic markers, exist that distinguish Canadian English from American English. In the Greater Toronto region, a striking example of such a shibboleth is the pronunciation of the past tense (and past participle) of shine: shone. Dialect Topography’s Question 43 elicits the information in the following way:

(6.3) Does SHONE, as in ‘The sun shone brightly’, rhyme with John or Joan?

In phonological terms, the major Ontarian variant of shone rhymes with John, i.e. has [ɑ]. The diphthong in Joan, [oʊ], is not often heard there, while it is the major variant across the border in the US. The pronunciation [ʃɑn] is a variant of long standing for which we can muster both real-time and apparent-time evidence. In Avis (1956: 50), 97.1% of the 104 Ontarian respondents reported it. In apparent time, the 60-year-olds in the 1991/2 Dialect Topography of the Golden Horseshoe, who were in their twenties at the time of Avis’ poll, report virtually the same percentage, 97.3%. This result supports the apparent-time hypothesis and makes very clear that [ʃɑn] is deeply entrenched in Ontario. In a border context, [ʃɑn] becomes a Canadian shibboleth. Figure 6.11 shows the percentages by county, reported in Chambers (1994), with the two New York regions on the right.

[Bar chart: percentage of [ʃɑn] (0–100) by Golden Horseshoe county, including Oshawa, Scarborough, Toronto, Mississauga, Hamilton, St. Catharines, Welland and Niagara, followed by the New York regions NY 1 and NY 2]

Figure 6.11  Shone pronounced [ʃɑn] in the Golden Horseshoe (%, Chambers 1994: Map 4)




The distinction between the Canadian and the U.S. side is almost categorical: 90% in St. Catharines, 98% in Welland and 93% in Niagara contrast dramatically with 8% and 2% in the two New York regions. We have seen similar differences in Chapter 4 (e.g. chesterfield, tap, take up #9, avenue), and there are more: anti- and semi- are predominantly pronounced with final [ɪ] in Canada and [aɪ] in the U.S.; lever rhymes with fever in Canada but with never in the U.S.; and, as Figure 6.10 has shown, running shoes or runners (rather than sneakers) are shibboleths in most parts of the country.

Heightened differences in immediate border regions

Occasionally, but not generally, one may find amplified and exaggerated linguistic effects in the immediate vicinity of a border. One such effect was found for the variable ‘soft drink’. Using WQs, Wendy Burnett has shown that the contrast between Canadian pop and American soda becomes categorical in the border towns of St. Stephen and Calais, i.e. a qualitative rather than a quantitative difference, as Table 6.3 shows:

Table 6.3  Pop (vs. soda) in New Brunswick and Maine among 14–19-year-olds (Burnett 2006: 168)

Variable     Variant          New Brunswick   St. Stephen, NB   Calais, Maine   Maine
soft drink   pop (vs. soda)   96.1            100.0             0.0             4.3

While the average for pop is 96.1% in New Brunswick and 4.3% in Maine, the border towns of St. Stephen and Calais show categorical behaviour, an exaggeration of the general differences found in the two regions. We discussed yod-dropping in avenue in Chapter 4. In the Golden Horseshoe region, it can be shown for avenue (Chambers 2002b) that the immediate border districts show a greater contrast (here 90% vs. 32% yod-ful avenue, a gap of 58 percentage points) than the areas further removed from the border (81% and 84% vs. 48% in the U.S. hinterland, gaps of 33 and 36 points). Table 6.4 reproduces the scores of the border counties:

Table 6.4  Yod-ful avenue, east to west in % (Chambers 2002b: Map 1)

Location            Country         Yod-ful avenue (%)
Grimsby             Canada          85
Welland             Canada          84
St. Catharines      Canada          81
Niagara             Canada          90
NY 1 (Buffalo)      United States   32
NY 2 (hinterland)   United States   48

In the Golden Horseshoe, the general effects of yod-retention and yod-dropping in avenue are emphasized along the border (unlike in New Brunswick; Burnett 2006: 168). In Table 6.4 we can see that yod-fulness increases in Niagara, directly at the border, and is lower in Buffalo, on the opposite side (32%), than in the rest of New York State (48% in the hinterland). This linguistic phenomenon may correspond to social phenomena of exaggerated local pride in border regions, though it is not yet clear why it occurs only with some contexts and variables and not more generally.


These differences, illustrated with Dialect Topography data, coincide with phonetic studies of recent sound changes in the front vowels, collectively called the Canadian Shift (e.g. Boberg 2008a: 136). Simplifying greatly, one can say that the changes in Ontario operate in the opposite direction to those in the neighbouring American region, the US “Inland North” (see Boberg 2008b: 154–55). Taken together, these differences appear to be bigger along the Ontario-U.S. border than in Western Canada or perhaps in New Brunswick. This does not mean, however, that there are no linguistic differences along these other border regions, as data from yet another WQ study, Boberg’s NARVS, show; these data put assessments of the relative strength of the Canada-U.S. border in various regions into perspective.

6.5.2 Insights from NARVS: The North American lexical perspective

In Chapter 4, Charles Boberg’s North American Regional Vocabulary Survey (NARVS) was briefly introduced. This section explores how these data have enriched our understanding of the linguistic effects of the border for these lexical items. The novelty of the approach was that Boberg consistently offered statistical tests for his set of 44 lexical variables, which afford an assessment from a continental perspective and produce results of greater general applicability than can be reached with isolated variables. Table 6.5 offers such a summative overview of the data. The left column lists the pairs of Canadian (first) and American (second) regions, while the right-most column contains the number of variables showing a Canada-U.S. differential of greater than 50% (of a maximum of 44). The middle column shows the overall mean frequency difference over all 44 variables.

Table 6.5  Strength of lexical boundaries between Canada and the U.S. (Boberg 2010: 187)

Region pair                           Mean frequency difference in %   N (diff. > 50%)
Saskatchewan-Western US               17                               30
British Columbia-Western US           16                               29
Manitoba-Western US                   15                               28
Prince Edward Island-New England      17                               27
Alberta-Western US                    15                               27
Newfoundland&Labrador-New England     16                               26
Vancouver&Victoria-Western US         15                               26
Quebec-New England                    17                               25
Cape Breton-New England               16                               25
Northern Ontario-US Inland North      16                               24
Maritimes-New England                 16                               23
New Brunswick-New England             16                               23
Nova Scotia-New England               15                               23
Eastern Ontario-US Inland North       15                               22
Toronto-US Inland North               14                               21
Southern Ontario-US Inland North      15                               20

While the mean frequency difference is indicative, the number of variables with a differential above 50% shows more clearly which regions have the most lexical differences, and it reveals some surprising results: Saskatchewan, which is woefully understudied, is the region most distinct from its neighbouring U.S. region for these 44 variables, with 30 variables showing a differential greater than 50%. With BC (29) and Manitoba (28) close behind, Western Canada seems to be lexically more distinct from its U.S. neighbours than the Golden Horseshoe region; Toronto is found only in 15th place. Table 6.5 is a good indicator of the strength of the Canada-U.S. border in the lexical domain. It also reminds us that everyday impressions, such as the one reported in the section above (“Ontario appears to show a stronger linguistic border than BC”), are not necessarily accurate and that systematic study can reveal rather unexpected patterns. Having dealt with the international comparisons, we need to consider regional differentiation within Canadian English lexis. Some varieties, such as AmE, have entire multi-volume dictionaries dedicated to the regional, i.e. non-national, component of the lexis, as in the case of the Dictionary of American Regional English (Cassidy and Hall 1985–2013). In Canada, the new edition of the Dictionary of Canadianisms on Historical Principles (Dollinger forthcoming) aims to offer a regional dimension where possible. For that purpose, WQ studies such as NARVS provide the most reliable data. Table 6.6 shows a summary of the 44 variables for lexical boundaries within Canada.

Table 6.6  Strength of lexical boundaries between Canadian regions (Boberg 2010: 184)

Region pair                           Mean frequency difference in %   N (diff. > 50%)
Montreal-New Brunswick                12                               17
Eastern Ontario-Montreal              10                                9
Nova Scotia-Newfoundland&Labrador      8                                8
Cape Breton-Newfoundland&Labrador      8                                5
Northern Ontario-Southern Ontario      6                                5
New Brunswick-Prince Edward Island     7                                3
Manitoba-Northern Ontario              7                                1
Saskatchewan-Manitoba                  7                                1
Alberta-Saskatchewan                   5                                1
Prince Edward Island-Nova Scotia       7                                0
Montreal-Quebec                        6                                0
Nova Scotia-Cape Breton                6                                0
Vancouver&Victoria-British Columbia    6                                0
Southern Ontario-Toronto               5                                0
Toronto-Eastern Ontario                5                                0
New Brunswick-Nova Scotia              5                                0
Vancouver&Victoria-Alberta             4                                0


One can see that the largest differences (greater than 50%) are found between New Brunswick and Montreal: 17 variables show such a differential, while between Montreal and Eastern Ontario only 9 variables show this level of differentiation. Perhaps surprisingly, the differences between Newfoundland, generally considered Canada’s most distinct region, and Nova Scotia rank only in third place. Here, the focus on everyday concepts that are in use across the entire country certainly introduces a bias against the lexical uniqueness of Newfoundland, which is founded on traditional vocabulary, as documented in the Dictionary of Newfoundland English (Story, Kirwin & Widdowson 1982, 1999). From 10th to 17th position, no variable shows a differential greater than 50%. In summary, Montreal stands out as a lexical dialect island within Canadian English, a fact that can be explained by the unique status of English as a minority language in Quebec within an otherwise English-dominated Canada (Boberg 2012). Tables 6.5 and 6.6 show a number of interesting findings. They offer an empirical basis for a relative assessment of Canadian linguistic autonomy. As Boberg puts it:

Perhaps the most remarkable fact to emerge from Table [6.5] is that all of the numbers it displays are larger than all of the numbers in Table [6.6]. This indicates that, whatever the relative importance of lexical differences between each Canadian region and the adjacent region of the United States, Canada’s regions have more in common with each other than any of them has with the United States. (Boberg 2010: 188)
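Both columns of Tables 6.5 and 6.6 can be derived from per-variable frequencies, and Boberg’s observation can be checked against the published values, comparing like with like, column by column. The Python sketch below assumes that each variable is summarized as the percentage of respondents choosing one variant in each region, and that the mean frequency difference is the mean of the absolute per-variable differences; Boberg (2010) may compute it differently, so `boundary_strength` and its input format are hypothetical. The two lists transcribe Tables 6.5 and 6.6.

```python
def boundary_strength(freq_a, freq_b, threshold=50.0):
    """Compare two regions on the lexical variables they share.

    freq_a, freq_b: dicts mapping a variable name to the percentage of
    respondents who chose a given variant (hypothetical input format).
    Returns (mean absolute difference in %, number of variables whose
    difference exceeds the threshold).
    """
    shared = sorted(freq_a.keys() & freq_b.keys())
    if not shared:
        raise ValueError("the regions share no variables")
    diffs = [abs(freq_a[v] - freq_b[v]) for v in shared]
    return sum(diffs) / len(diffs), sum(d > threshold for d in diffs)


# (mean frequency difference in %, N with diff. > 50%) from Table 6.5 ...
CANADA_US = [(17, 30), (16, 29), (15, 28), (17, 27), (15, 27), (16, 26),
             (15, 26), (17, 25), (16, 25), (16, 24), (16, 23), (16, 23),
             (15, 23), (15, 22), (14, 21), (15, 20)]
# ... and from Table 6.6.
WITHIN_CANADA = [(12, 17), (10, 9), (8, 8), (8, 5), (6, 5), (7, 3), (7, 1),
                 (7, 1), (5, 1), (7, 0), (6, 0), (6, 0), (6, 0), (5, 0),
                 (5, 0), (5, 0), (4, 0)]


def every_value_larger(a, b):
    """True if, in each column, the smallest value in a exceeds the largest in b."""
    return all(min(row[c] for row in a) > max(row[c] for row in b)
               for c in range(2))


# Invented frequencies for three variables, for illustration only.
mean_diff, n_large = boundary_strength({"v1": 90, "v2": 40, "v3": 10},
                                       {"v1": 20, "v2": 35, "v3": 80})
print(round(mean_diff, 1), n_large)  # 48.3 2
print(every_value_larger(CANADA_US, WITHIN_CANADA))  # True
```

The column-by-column reading matters: the lowest mean in Table 6.5 (14) exceeds every mean in Table 6.6, and the lowest count (20) exceeds every count, whereas mixing columns across the two tables would not be a meaningful comparison.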

It is safe to say that without WQ data such general assessments would not be available. Because of NARVS, there is good, reliable data for a small set of lexical variables that serves as a starting point for a more comprehensive assessment of the linguistic features of CanE in cross-border perspective. Before NARVS, only isolated discussions of regional vocabulary were available (e.g. Gregg 1995; Harris 1983 for BC), which are informative in their own right but impossible to apply across the country. This typological, continent-wide perspective is made possible by WQs.

6.6 Sociohistorical frameworks and explanations

In recent years, theories have been proposed that explicitly aim to explain the development of postcolonial varieties of languages, and in some cases especially of English. These theories vary in their consideration of social factors at certain stages of development, but have in common a profound interest in the sociohistorical constraints on these developing varieties. Traditionally the domain of historical linguistics, these approaches offer insights into the genesis of New World varieties. What all models have in common is their interest in settlement patterns: whether in Trudgill’s New-Dialect Formation theory (Trudgill et al. 2000a, 2000b; Trudgill 2004), in Schneider’s Dynamic Model (Schneider 2003, 2007) or in Hickey’s (2003a) view of
dialect mixing and language mixing that foregrounds specific social characteristics in each setting, the basic demographic facts are always at the centre of focus. This section begins with a short overview of the settlement of Canada, in which the most important immigration movements are grouped into five major (and somewhat abstracted) immigration “waves” (for more details see Boberg 2010: Chapter 2 on Canada; Dollinger 2008a: Chapter 3 on Ontario). This knowledge will facilitate many interpretations of CanE data. Then we will briefly introduce the models by Trudgill and Schneider. Although often viewed as antithetical, both offer theoretical frameworks for the interpretation of data, and they have more in common than at first meets the eye.

6.6.1 Canada’s five major immigration waves

Table 6.7 summarizes the major immigration streams into Canada. First outlined in Chambers (1991), immigration waves I–IV are now complemented by a fifth one, from a diverse set of countries, with mainland China leading in absolute immigration since the 1996 census. Full details are provided in Boberg (2010: 55–105) and in Chambers (2010: 12–19, 28–32).

Table 6.7  Canada’s five major immigration waves

Wave I     1776–1812               American immigration (“United Empire Loyalists”)
Wave II    1815–1867               British & Irish immigration
Wave III   1890–1914               Continental European immigration (Germany, Italy, Scandinavia & Ukraine), & British immigration
Wave IV    1945–70s (post WW-II)   Highly diverse immigration populations, including Europe, Asia (Korea, China, Vietnam, India, Pakistan), Latin America and the US
Wave V     1990s–present           Diverse immigration continues, with Chinese immigration now peaking

Accepted opinion today is that WAVE I, which comprised American immigration in the wake of the American Revolution, was responsible for establishing the basic character of Canadian English. Bloomfield (1948) described this scenario, now known as the Loyalist Base Theory, named after the United Empire Loyalists who left the newly founded US for Canada. WAVE II, consisting of immigration from mostly non-southern locations in Britain, is responsible for foregrounding those linguistic variables of British descent that speakers have conscious access to. We have mentioned some of them before: schedule with [ʃ], fill in a form (not fill out), tap (not faucet), colour (vs. color), centre (vs. center) and first-person shall (not will) for the future tense are all forms that have (or used to have) wide currency as prestige forms in Canada. The
effects of subsequent immigration waves have been limited to cultural items, at first often limited to loan words and loan concepts for various kinds of food. Rates of maintenance of ethnic heritage differ among the immigrant groups. Boberg (2010: 97) approaches the different ethnic groups from the perspective of varying rates of assimilation into mainstream society and offers data for numerically strong immigrant groups. Taking the rates of ethnic intermarriage as a measure of assimilation, Boberg reports that the groups with the highest assimilation rates are Welsh, Americans, Swedes, Norwegians, Irish, Scots and Danes, while those with the lowest rates are Chinese, East Indians, Filipinos, Vietnamese, Portuguese, Greeks and Jamaicans. One might expect new linguistic features to develop in the more close-knit groups, provided they use enough English, aided by transfer features from their various heritage languages. This scenario differs from the monolingual Belfast working-class context of Milroy’s (1987) classic account, in which looser social networks were shown to favour linguistic innovation.

6.6.2 Trudgill and Schneider: Two complementary approaches?

Peter Trudgill’s model of New-Dialect Formation has its roots in his 1986 monograph on the contact linguistics of dialect mixing. With the help of oral history recordings from New Zealand, one of the youngest Inner Circle varieties of English, Trudgill and associates more recently aimed to reconstruct the early stages of New Zealand English by relating the variation on the recordings to 19th-century British English. This is possible because some speakers represent the second generation of New Zealanders, i.e. the first native-born generation. The project, situated at the fascinating intersection of historical linguistics and sociolinguistics, has produced a number of publications (most importantly Gordon & Lewis 1998; Trudgill et al. 2000a, 2000b) and, later, the monograph Trudgill (2004), entitled New-Dialect Formation: The Inevitability of Colonial Englishes. Edgar Schneider’s approach, termed the “Dynamic Model” (2003, 2007), describes the formation of postcolonial Englishes and only at first glance appears to explain the same processes. Based on extensive knowledge of the literature on New Englishes, Schneider proposes a five-phase model. Both models, New-Dialect Formation and the Dynamic Model, agree that the dialect and language mixing process produces new varieties that are of a compromise character and do not match any single input variety. The outcome is a mixed variety, which is called a koiné. Trudgill and Schneider disagree, however, about the role of social factors in this process. The term koiné is derived from the Greek word for “common” and refers to the common dialect that developed as a result of dialect contact among the people of Piraeus, the Athens seaport, and later spread as the language of the Greek Empire (Kerswill 2002). The term koinéization can therefore be used as a theory-neutral term.




Trudgill’s (2004) New-Dialect Formation Theory

The basic tenets of the theory are that (post)colonial varieties are products of dialect and language mixing processes, a statement that is uncontroversial today. The bone of contention is that Trudgill suggests disregarding social factors more or less completely for a well-defined period during the second and third generations after initial settlement, though not in the first generation or in later ones, a restriction that is often overlooked. By eliminating all social factors, including the concept of identity, during this period, Trudgill has managed to shock sociolinguists of many persuasions (see, for example, the debate in Language in Society 37(2) 2008). We will see below that Edgar Schneider bases his own model of koinéization on identity as a driving force in the formation of new Englishes. At face value, these models are mutually exclusive. Upon closer inspection, however, one can identify a role for both. Trudgill’s model is attractive in that it offers testable, concrete and falsifiable predictions. It is based on six linguistic processes rooted in accommodation theory (Trudgill 1986; Kerswill 2002). Accommodation theory is ultimately linked to the innate tendency of living creatures to copy one another’s behaviour. Linguistically speaking, speakers who are in contact with speakers of different dialects will inevitably pick up some of their features to a degree. An important proviso is that the model only applies in tabula rasa situations, which are situations where “no prior-existing population [is] speaking the language in question, either in the location in question or nearby” (Trudgill 2004: 26).
Based on six processes (dialect mixing, dialect levelling, dialect unmarking, interdialect development, reallocation of strong minority features and the subsequent focusing into a coherent new dialect), Trudgill postulates three stages, each corresponding to one generation (about 25 years), as shown in Table 6.8.

Table 6.8  Three stages in Trudgill’s New-Dialect Formation Theory

STAGE I     a. rudimentary levelling     & b. interdialect development
STAGE II    a. extreme variability       & b. apparent levelling
STAGE III   a. choice of majority forms  & b. reallocation

Stage I begins with the journey to the new location, where speakers of different linguistic backgrounds meet and speak with one another. In the case of speakers of different regional dialects of a language, this is the first stage of dialect mixing; in the case of speakers of different languages, it is the first stage of language mixing. From here on until the death of the emigrating generation, rudimentary levelling, i.e. linguistic accommodation among (emigrating) adults, takes place. Accommodation in face-to-face communication between adults has been shown to be usually limited to the salient features of a dialect, i.e. those features that are “most prominent in the consciousness” of speakers of other varieties (Trudgill 1986: 12). Here, social factors
do play a role, even an important one. A speaker in the socially superior position is less likely to accommodate to the forms of a socially inferior one. Interdialect development is another consequence of adults being the main agents in Stage I. This involves the production of unusual, or even novel, linguistic forms that are not present in any of the input varieties. Interdialect forms are the result of partial accommodation and/or misanalyses by adult speakers. It is in Stages II and III that most of the koinéization process takes place. Stage I merely reduces the frequency of the most extreme variants, namely those that speakers are aware of and are willing and able to reduce. It also produces some unexpected intermediate results, which change the linguistic feature pool somewhat by the time the first generation of native-born children acquires its language. In Stage II, the first native-born generation is faced with an unusually varied feature pool. Extreme variability is the result of the diverse linguistic input to which this generation is exposed while it still lacks a target dialect. This generation selects its linguistic variants, and only features that occur above a certain discourse frequency, which Trudgill suggests is 10%, have a chance of being selected. This process is called apparent levelling and further reduces the feature pool. Stage III, the further reduction of variation in the feature pool by the second native-born generation, is by far the most controversial part of Trudgill’s theory. Trudgill predicts that under normal circumstances only the majority forms are chosen by this generation (choice of majority forms).
This third-generation dialect will then consist of “features which were in a majority in the input, except in cases where unmarked features are in a large minority and win out over majority features on the grounds of their unmarkedness” (Trudgill 2004: 125). Unmarkedness refers to properties that make a feature linguistically better suited and thus give it a selection advantage over the majority feature. Examples are phonetic forms that require considerably less articulatory energy while preserving key distinctions. Overall, the New Zealand oral history data suggest that the final shape of New Zealand English is the result of a “levelling process which, for the most part, consisted of the loss of demographically [and thus linguistically] minority forms” (Trudgill 2004: 114). The children of Stages II and III do the heavy lifting, and only for these two stages is Trudgill “arguing for a total absence of social factors” (Trudgill 2004: 158):

while later, in the case of New Zealand from 1890 onwards [after Stage III], social factors may have begun to play a role – but [Trudgill’s] claim is that they were not relevant in the actual new-dialect formation process itself: New Zealand English in 1900 did not, for instance, have Diphthong Shift because it was prestigious. (Trudgill 2004: 156)
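The selection logic of Stages II and III can be made concrete with a toy feature pool. The Python sketch below is an illustration under stated assumptions, not Trudgill’s own formalization: variant counts stand in for discourse frequency, the 10% threshold follows the text, and the `large_minority` cut-off of 40% for an unmarked variant is a hypothetical value, since no figure is given here. The function names and the demo pool are likewise hypothetical.

```python
def apparent_levelling(pool, threshold=0.10):
    """Stage II sketch: keep only variants whose share of the pool exceeds the threshold."""
    total = sum(pool.values())
    return {variant: n for variant, n in pool.items() if n / total > threshold}


def choose_majority(pool, unmarked=None, large_minority=0.40):
    """Stage III sketch: select the majority variant, unless an unmarked
    variant forms a large minority (hypothetical cut-off) and wins out."""
    total = sum(pool.values())
    if unmarked in pool and pool[unmarked] / total >= large_minority:
        return unmarked
    return max(pool, key=pool.get)


# Hypothetical input pool: counts of three competing variants of one feature.
pool = {"A": 55, "B": 38, "C": 7}
levelled = apparent_levelling(pool)             # "C" falls below 10% and is lost
print(levelled)                                 # {'A': 55, 'B': 38}
print(choose_majority(levelled))                # A
print(choose_majority(levelled, unmarked="B"))  # B
```

The sketch deliberately contains no social variable, mirroring Trudgill’s claim for these two stages; whether such a variable belongs in the model is precisely what Schneider’s approach disputes.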




Schneider’s (2007) Dynamic Model

The basic tenet behind Schneider’s model, which was first published in Schneider (2003) and then in more detail in Schneider (2007), is that new national varieties are the result of the “identity-driven process of linguistic convergence” (Schneider 2007: 30). Schneider and Trudgill agree that processes of dialect mixing and koinéization lead to a new variety in (post)colonial settings. As Schneider says, “The linguistic mechanism through which the symbolic expression of group identities is achieved is accommodation […]” (Schneider 2007: 27). They disagree, one might say vehemently so, about the driving forces behind the process. Schneider continues: “Speakers who wish to signal a social bond between themselves will minimize any existing linguistic differences as a direct reflection of social proximity […]” (ibid.). Where Trudgill aims to limit social factors to a minimum, Schneider places them front and centre. Schneider’s most fundamental claim is that “identity constructions and realignments, and their symbolic linguistic expressions, are also at the heart of the processes of the emergence of PCEs [postcolonial Englishes]” (2007: 28) and that a “common core” stands behind their development. Schneider’s “Dynamic Model”, which is named to reflect the dynamic process of “creating and recreating one’s identity” (2007: 28), works with five “Phases”. The developmental cycle of a new variety from start to finish is presented in Table 6.9, as applied to Canadian English:

Table 6.9  Phases in Dynamic Model applied to Canadian English (Schneider 2007: 240–50)

Phase I     Foundation                    1713–1812
Phase II    Exonormative Stabilization    1812–1867
Phase III   Nativization                  1867–c.1910s
Phase IV    Endonormative Stabilization   c.1920–c.1970
Phase V     Diversification               c.1970–

Each phase is characterized, to the degree that the literature on the variety allows, by four aspects:
– the political and historical developments (present in all cases);
– “identity construction”, the crucial concept behind Schneider’s model, which relies on the joining of a “Settler” strand of English speakers (recall that the model is designed for postcolonial Englishes) and “Indigenes” groups, occasionally joined by “Adstrate” groups that are socially equal to the English settlers (i.e. the French in the Canadian context). The key element is the creation of a joint identity between Settlers and Indigenes that is new and shaped by both groups;

206 The Written Questionnaire in Social Dialectology

– the sociolinguistics of contact/use and attitudes;
– the linguistic developments in each phase, for each variety where available.

The phases are not temporally fixed; they may last a few years or even centuries. The Canadian example is discussed in Schneider (2007: 240–50) and will serve as the illustrative case here. The Foundation stage of Schneider’s model is the “initial stage where English is brought to a new territory by a significant group of settlers” (2007: 33), when the Indigenes (IDG) and Settler (STL) strands are each “clearly distinct from the other”. Three linguistic processes accompany this phase: koinéization (dialect levelling) among the settlers, incipient pidginization (for communication with the IDG) and toponymic borrowing. The phase of Exonormative stabilization (Phase II) is characterized by stable colonial settlements and the establishment of English as the language of administration, schooling and the like. The settlers “perceive of themselves as outposts of Britain, deriving their social identity primarily from their common territory of origin” (p. 37). In Phase II the settlers see themselves as having a “ ‘British-plus’ identity”, while the IDG strand begins to nativize the English language structurally for its own purposes and contexts, which is “linguistically the most important and interesting [process]” (p. 39). Phase III is at the heart of the model, “the central phase of both cultural and linguistic transformation” (p. 40), which gives rise to a new identity that combines the old (British-based identity) and the new (from the New World) to create something novel. To a degree, it is also the merger of the STL and IDG strands: “both population groups realize and accept the fact that they will have to get along with each other for good”, as a result of which the STL and IDG strands “become closely and directly intertwined” (p. 41).
The premise is that political independence – which is now realized to a good degree – is a precursor of linguistic independence. This stage includes many linguistic phenomena, such as word-formation innovations, new collocations, varying degrees of personal usage and innovative verb complementation patterns (p. 46), as well as structural changes that are not referential in nature, such as syntax and morphology (p. 44). In addition, mixed codes, using two languages, are expected to surface in this phase (p. 47). The phase of Endonormative stabilization, Phase IV, gives expression to the newly achieved psychological independence and the acceptance of a new, indigenous identity through the gradual adoption and acceptance of local forms of English as a linguistic symbol of that new identity, a new, locally rooted linguistic self-confidence (p. 49). This self-confidence is seen in the publication of dictionaries and grammars for the local market and in literary activity that harnesses these now standard linguistic features. An “Event X” may spur the political independence of the young nation, e.g. when expectations of the mother country are not met but end in bitter disappointment, as when Australia was left unsupported during World War II (p. 49).




Finally, in Phase V – Linguistic Diversification, the newly gained independence has been established politically as well as linguistically, and national identity is “no longer a prominent, disputed issue” (p. 53). In this climate, new, more fine-grained identity constructions come to the foreground that lead to social and linguistic diversification. As new groupings form within the nation, new social and especially regional dialects are created (p. 54). Linguistic differences remaining between STL and IDG groups are “likely to resurface as ethnic dialect markers”. Schneider uses major political benchmarks to divide the phases. Comparing the temporal scales, one notices a more widespread overlap between the phases in Canada. Defining the Foundation phase as lasting only until 1812 is an Ontario-centric point of view, as at this time the west was still uninhabited by Europeans. More importantly, prior to 1867 no one knew whether the country would remain a unit independent of the USA, which renders this phase’s end-point a post-hoc classification. The dating of Phase II prior to 1867 and the founding of a “Dominion of Canada” is intuitive, but one can argue that the phase of exonormative stabilization (Phase II) continued much longer than 1867, at least to the 1930s or, in some cases, the 1960s, when the first Canadian-made reference books came into wide circulation (the Gage Canadian Dictionaries; see DeWolf et al. 1997 for the latest edition). The first dictionary for the Canadian market appeared only in 1937 (Considine 2003). In the Canadian case, an exceptionally long overlap between Phases II and III can be seen. There is good evidence for Nativization in the early and mid-20th century, but temporary structures existed as early as the mid-19th century (Dollinger 2008a).
Diversification also plays a special role: ethnic dialects of English are either quite old (such as those of Montreal ethnic groups or First Nation Englishes) or have not yet appeared, as the Toronto studies suggest (Hoffman & Walker 2010; Hoffman 2010). These overlaps pose a problem for the model. What is interesting is that both Trudgill and Schneider emphasize accommodation as the underlying cause of koinéization. However, Trudgill construes accommodation as an automatic, unconscious process, whereas Schneider construes it as a social process, whether conscious, unconscious or a combination of both. This is one major difference between the two models. For both, the outcome is the same: a “newly emerging, more focused compromise variety” (Schneider 2007: 27). Schneider assigns identity a causal function in that process when he says that “those features [in the feature pool], which are identified as being shared within the group (and possibly not outside of it) will be used more regularly and become habitualized” (ibid). By contrast, for Trudgill (2008: 251), “if a common identity is promoted through language, then this happens as a consequence of accommodation; it is not its driving force.” The difference lies in the assessment of cause and effect: is identity the prime cause of linguistic koinéization (or levelling), or is it a result?


In scientific methodology, the strength of a model is judged by its predictive power, and by this criterion it is Trudgill’s model that offers falsifiable predictions. Schneider’s model, on the other hand, has very little predictive power, as it can only assign phases to developments in hindsight, once they are long completed: the model “does not claim to account for each and every aspect of complex realities” (Schneider 2007: 55). Trudgill does not claim to be comprehensive either, but his model is testable and makes falsifiable predictions (which was the prime reason for its application in Canada in Dollinger 2008a). Trudgill’s model also satisfies Occam’s Razor, which is used in the sciences to decide which of two models is “better”. Occam’s Razor, named after the 14th-century friar William of Ockham, states that if a phenomenon is explained by two models, the one that relies on fewer assumptions is to be preferred. Schneider’s model, while a well-written and immensely well-researched account, is a post-hoc description of varieties with limited predictive power. Its level of generality is fairly high. The prediction that all varieties will run through five phases at some point and at their own pace is a weak statement, given the general nature of the phases. The cycle of mixing (focus on motherland – creation of new variety – focus on innovations and appreciation followed by some diversification) is of the most general type. Schneider’s model has an intuitive appeal that some aspects of Trudgill’s model lack. Both models will be useful in the interpretation of WQ data: Schneider’s for Phases III, IV and V in the Canadian context, all of which, given the suggested overlap, may still be accessed via the apparent-time hypothesis.
Trudgill’s stages, likewise, are still accessible in western Canadian contexts, as these settings generally include speakers of the third generation (the second born in the country; Stage III) and, in some cases, such as on the Prairies, even of the first native-born generation (Stage II). Vancouver, for instance, was founded in 1886. Prior to that date, the bush land in its area was inhabited by the First Nations and by only a few dozen Europeans in the entire Lower Mainland. This would qualify as a tabula rasa situation on account of its distance from the nearest English-speaking areas: Victoria was a day’s journey away in good sailing weather, and Seattle, then a city of 40,000, was farther still. Those born before World War II are members of the first native-born generation of Vancouverites and are now in their 70s and 80s, which affords opportunities to test Trudgill’s model with WQs.

6.7 Indexing social meaning

So far we have reviewed what may be called current “standard” sociolinguistic theory, especially from the perspective of language variation and change. Implicitly, however, we have interpreted linguistic signs as somewhat absolute markers of social categories, such as, “if you use ‘dark L’ in Viennese German, you must be a member of the working class and may even be from the 12th district (Meidling), hence the name ‘Meidlinger L’ ”. To draw an analogy with the field of lexis, we treated linguistic variables more or less as independent entities with certain features. For instance, we defined a term, e.g. house, as signifying particular features that were independent of other linguistic variables, features such as a roof, windows, one or more doors, perhaps a front and back yard. We have, to stick with the analogy of the house, ignored other members of the domain “human dwellings”, such as high rise, skyscraper, duplex, semi-detached house, detached house and the like, which, as a field of related entities, likely influence one another’s meanings. For the Viennese German working-class marker, dark L, such a functional field approach would mean that it is not the only, and perhaps not even the most important, linguistic identifier of working-class membership. This is intuitively evident as well. While sociolinguistic features may be found more frequently in some groups, a single feature on its own says little. Most importantly, however, some presumed, stereotypical working-class speech features are shared across various social groups and classes to one degree or another. It is, in other words, more about the linguistic repertoire a person uses, which can, of course, be consciously manipulated to a degree and is also subject to subconscious adaptation. We should therefore be dealing not so much with isolated linguistic variables and the description of their social functions as with fields or arrays of linguistic features.

6.7.1 Three “waves” in sociolinguistics

It is here that recent sociolinguistic theory has developed what Eckert (2012) has termed “Third Wave Sociolinguistics”, which puts the speaker’s and writer’s performative aspect at the centre. First wave sociolinguists, by and large, worked with general and broad correlations between group characteristics (age, gender, social class) and language (e.g. Labov ²2006 [¹1966]; Trudgill 1972, 1974a).²⁵ Second wave sociolinguists took a more ethnographic, fieldwork-based approach, whose insights were used to inform analyses (e.g. Eckert 1989, 2000). The focus was on group characteristics with a profound consideration for individual behaviours. Third wave approaches have been taking a more systematic view of the construction of linguistic styles and consider linguistic variables as generally under-defined entities that take on their precise meaning in a given situational context (e.g. Eckert 2012). This last point is important: somewhat like the (Firthian) idea that words take on precise meaning in the context of other words, linguistic variables express their particular social functions in concert with other linguistic variables and in the given situation.

25. Note that the first sociolinguistic study, Labov (1963), does not fit this generalizing pattern. While it deals with group characteristics, these were the result, not the premise, of the study. The latter approach, with group characteristics as the premise, came to characterize the field for the next 20 or 30 years.

The basic idea behind Third Wave approaches is interconnectedness. As such, the idea is not new, but its application in modern variationist sociolinguistics is something of a rejuvenating force for the variationist paradigm, as it allows cross-fertilization with other, less heavily quantitative approaches to language. The same principle of interconnectedness, for instance, was applied in the field of lexical semantics as early as the late 1920s. Trier’s (1931) study on lexical fields, which is famous in Germanic linguistics but little known in English-language circles, can be seen as a spiritual predecessor of Third Wave sociolinguistic approaches, if only in a very general way. Trier’s basic idea was inspired by what later came to be known as structuralist thought: in order to identify the true meaning of a (lexical) item, he considered it necessary to describe a semantic area in toto. So, if one wanted to capture the precise meaning of nouns, e.g. house, bungalow, shed, hut, block house and the like, or of verbs of movement, e.g. walk, stagger, stumble, saunter and the like, one would need to establish the relationship between all the terms involved, a network, so to speak. This is what Trier did with Old High and Middle High German words. In the early days of semantic field theory, interconnectedness was stressed to an extreme degree:

Das einzelne Wortzeichen […] ‘bedeutet’ nur in diesem Ganzen und kraft dieses Ganzen. Außerhalb eines Feldganzen kann es ein Bedeuten überhaupt nicht geben.  (Trier 1931: 5)²⁶

This stance is clearly too extreme, yet it exemplifies the importance of interrelatedness that Eckert (2008) and others have recently begun to exploit for the social uses of linguistic variables. The relevance of interconnectedness to the social correlates of linguistic items is almost immediately apparent. If we say that Item X has social meaning Y, we can only say so in the context of other linguistic items and their social meanings in a given situation. The meanings of Item X may indeed change from one context and speaker to the next. Silverstein (2003) introduced the relevant concepts into anthropological linguistics, establishing an “indexical order” of linguistic signs, which Eckert (2008) elaborated into the notion of an indexical field; here, the intellectual connection to system-dependent approaches of the type first proposed by Trier (1931) becomes apparent. Indexicality is one way of ordering what is otherwise often called “appropriateness”, and it allows for creative uses of language to certain degrees. Silverstein’s (2003) paper, in which he analyzed a number of factors that shape real-time linguistic exchanges, including politeness and power structures, makes the dynamic, “dialectological” aspect of language use unmistakably clear:

An illuminating indexical analysis, as opposed to an incomplete or inadequate one, has to take account of the dialectical plenitude of indexicality in micro-contextual realtime, and has to situate itself with respect to the duplex quality of language use, always already both “pragmatic,” i.e., presuppositionally/entailingly indexical, and metapragmatic, i.e., in particular, ideologically informed.  (Silverstein 2003: 227, italics added SD)

26. “The stand-alone lexical item […] carries meaning only in the context of the whole [semantic field] and through the entire field. Beyond the entirety of the field there can be no meaning whatsoever” (trans. SD).

In this context it is important to point out, as Silverstein and others have shown (Eckert 2008: 455), that linguistic variables index social variables not in a straightforward, direct way – as illustrated in the above example of the “Meidlinger L” – but in a much more nuanced, indirect way that depends on the speakers involved, the discursive context and the like. Eckert goes on to construct indexical fields within which linguistic variables may code, or index, social meaning. In her own words:

I argue that the meanings of variables are not precise or fixed but rather constitute a field of potential meanings – an indexical field, or constellation of ideologically related meanings, any one of which can be activated in the situated use of the variable. (Eckert 2008: 454)

The shift from what may be called “static, direct meaning” to “indirect and potentially activatable meaning”, depending on the situational context, focuses attention on the actual discourse situation, much as Pennycook’s (2007) concept of language as practice or Blommaert’s (2010) linguistic repertoires do: all these approaches take seriously the negotiation of meaning in discourse. Eckert goes on to specify that this situational context actively shapes the social context of linguistic variables in the indexical field, which is defined as a “constellation of meanings that are ideologically linked” (Eckert 2008: 464). As all meanings in the field are linked with linguistic forms, they can be considered “as an embodiment of ideology in linguistic form” (ibid).

6.7.2 Yod-dropping in CanE as an indexical field

An example of an indexical field will illustrate the point. Yod-dropping, the absence of the palatal glide /j/ in words such as student, tune, news and Tuesday, has been presented as a typical American feature, while British English varieties typically prefer the yod. In Canada, which has been exposed to both American and British influences, a complex array of meanings has developed that has been difficult to model. An important advance was made in Clarke (2006), which revisited studies on yod-dropping since the 1970s and came to the conclusion that the traditionally associated meanings, [−yod] = American and [+yod] = British, are not representative of the variable’s scope, as other social meanings, or in newer parlance, other social indices, have developed. Figure 6.12 shows a graphic representation of some of those meanings, as discussed in Chapter 4 and based on Clarke (2006) and Dollinger (2012b); it is inspired by Eckert’s (2008) depictions.²⁷

Figure 6.12  Possible indexes of yod-ful student in Canadian English [indexical field; nodes: British, old-fashioned, sophisticated, learnedness, not North American, Canadian, poshness, non-folksyness, European, English-Canadian, pan-Asianness, Anti-American, and “?” for meanings not yet studied]

Yod-ful pronunciations of student or news as /stjuːdənt/ and /njuːz/, as we have seen in Chapter 4, can signify a number of things for Canadians: they may signal a traditional, plain British orientation, or just an Anti-American one (“I do as the Americans don’t”), or non-folksyness (an index that triggered Mario’s atypical use of yod-less [studənt] among his university-educated peers, see Section 3.3.3), or poshness and pretension, or, depending on the ethnicity of the speaker, perhaps more European or pan-Asian stances, as BrE is the teaching model in much of western Europe and much of Asia (with the exception of the Philippines). Other meanings, not yet studied, are represented by the “?”. One crucial point is that these meanings are all potentially realized at the same time and depend on the particular social context, the interlocutors and the setting. The view of indexicality also allows for discrepancies between produced and perceived meanings: one person might wish to signal “erudition” by being yod-ful, but might just be perceived as “posh” or as exotically “British” in, say, a Southern US small-college context. Equally important is the fact that these social indices are subject to change. As Eckert expresses the idea:

27. Eckert (2008) visualizes data from Campbell-Kibler (2007) on (ing) and Podesva’s work on realizations of (t).




The [indexical] field is fluid, and each new activation has the potential to change the field by building on ideological connections. Thus variation constitutes an indexical system that embeds ideology in language and that is in turn part and parcel of the construction of ideology. This concept leaves us with a new (that is, an additional) enterprise of studying variation as an indexical system, taking meaning as a point of departure rather than the sound changes or structural issues that have generally governed what variables we study and how we study them. (Eckert 2008: 454)

Indexical approaches to linguistic variability are a welcome addition that reveals much systematicity that has hitherto been relegated to anecdotal reporting (however important these reports may be). With these approaches, the speaker’s or writer’s agency is given a central role, a role that has been diminished in much variationist work until recently. Moreover, the agent’s underlying assumptions about language and linguistic structures, however subjective they may be, are now increasingly taken seriously as a factor in shaping linguistic behaviour, to the extent that they help shape the constituents of the indexical field. It does, after all, make a big difference whether I perceive “learnedness” or “Americanness” as positive, negative or more neutral, and these attitudes are shaped by past experiences and encounters that are not limited to linguistic ones. The view of the speaker and writer as an agent opens up new avenues for cross-disciplinary connections. Some researchers have linked linguistic variables with speakers’ desire to align with a group (Eckert 1989, 2000), a scenario addressed in social psychology by social identity theory (Tajfel & Turner 1979), which distinguishes between personal identity (I am Ruthie) and social identity, which foregrounds group membership (I am an urban planner). The theory’s goal is to identify the factors that lead people to think of themselves as individuals or as group members. Linguistic choice and performance may well play an important role in this process, which seems to be driven by perceived group membership, favouring in-group members over out-group members even if there are no objective criteria to warrant such differential treatment (Ellemers 2010). Socially coded linguistic features often take on a life of their own. Labov spoke, back in 1966, of “stereotypes”, those linguistic features that are associated with some groups and are publicly commented on.
In contrast to “markers”, which vary with speech style (from minimal pairs to word lists, reading passages and more or less informal conversations), and “indicators”, which do not vary with these Labovian elicitation modes (called “styles” in his terminology), stereotypes are of central importance for the construction of language ideologies (see e.g. Cameron 2003). As Campbell-Kibler (2007: 55) writes of (ing), the so-called dropping of the ‘g’ in dancing, singing and the like, reactions typically associated with accents may be associated with other linguistic characteristics as well:


although accents obviously incorporate linguistic cues, they are social constructs, ‘things in the world,’ as Cavanaugh (2005) puts it. The (ING) variable, one of these linguistic cues which has made it to the conscious awareness of speakers (a “stereotype,” Labov 1966), also may be seen as a social object with its own meanings and relationships to other social objects such as accents.  (Campbell-Kibler 2007: 55)

More on this fascinating approach can be found in Campbell-Kibler (2007) and in Eckert (2008), one of the ground-breaking papers in the variationist context, as well as in Johnstone and Kiesling (2008) and others. Variationist sociolinguists are adopting agentive perspectives on language use that have been central to many more qualitatively oriented enterprises, such as linguistic anthropology or literary studies of language, but that for a long time took a back seat in favour of group attributes.

6.8 Homogeneity and heterogeneity

The final section is dedicated to two concepts that not all linguists would consider widely applicable theoretical concepts: homogeneity (linguistic similarity) and heterogeneity (linguistic difference) between two or more regional or social loci. Homogeneity has long been considered an important feature of Canadian English (see Dollinger & Clarke 2012), and it has figured quite prominently in the discussion of linguistic features in some colonial settings, which, in contrast to Old World settlements, are often found to be less linguistically diverse. The prevalence of homogeneity in colonial contexts has been explained as a result of dialect mixing processes more generally and, in the Canadian context, by strong pan-Canadian connections and communications that outnumber other social ties (e.g. to the U.S. or the U.K.). Canadian linguistic homogeneity, it has been argued, has its roots in the westward movement from Ontario, when Ontarian speech patterns were planted in the western towns and cities, and in the connections between Canadians from coast to coast. Around 1950, homogeneity was called “the most surprising thing about the English currently spoken in Canada” (Priestley 1968 [1951]: 75). The concept has often been used to explain the persistence of linguistic markers in Canada when compared to the United States. We have used this concept as well, for instance to explain the spread of the Canadian meaning of take up #9. There are, however, a number of competing forces that operate on CanE, and homogenization is just one tendency. These are:
– Homogenization on the national level (with or without Newfoundland)
– Homogenization on a continental level (concerning CanE and AmE variants)
– Heterogenizing tendencies (working against the previous tendencies)




6.8.1 Homogenization on the national level

On the one hand, there is a homogenizing trend across mainland Canada. There is also a more recent claim that includes Newfoundland in the mainland Canadian homogenization process (proposed by Chambers 2012 but opposed by Clarke 2012), which would make this a national phenomenon. Whether the linguistic changes in Newfoundland and Labrador, the youngest province, which joined Canada as late as 1949 after centuries as a British colony in its own right, are part of this process remains to be seen. On the Canadian mainland this idea has a long pedigree in the literature, going back to the time around World War II. In an important 1948 paper, Morton W. Bloomfield was perhaps the first to state expressly that “one type of English is spread over Canada’s 3000-mile populated belt” (1975 [1948]: 8), that is, from Nova Scotia in the east to British Columbia in the west. This view was echoed in 1951 by Priestley (1968 [1951]), who remarked that a hypothetical trans-Canada traveller in the 1940s would have found that

on the whole […] the speech of young Canadians from Halifax to Victoria tended to be far more uniform than that of young Englishmen […] or of young Americans […].  (Priestley 1968 [1951]: 75)

Though Bloomfield’s and Priestley’s views of a homogeneous CanE include the Maritime provinces, Alexander (1939, 1951) expressly excluded them (Nova Scotia, New Brunswick and Prince Edward Island) from discussions of homogeneity. He suggested, however, that the regional dialect of the Great Lakes region from Ontario westwards “has to a considerable extent travelled westward with the flow of population” (1951: 13), which clearly implies homogeneity, but without distinguishing between Canadian and U.S. varieties. Lacking more sophisticated analysis techniques, studies at the time often conflated the themes of homogeneity and autonomy in the North American west, but Alexander, who collected fieldwork data for the LAUSC in Nova Scotia in the 1930s, was surely in a position to make that judgement call. The third quarter of the 20th century in Canadian dialectology was dominated by the definition of that area of homogeneity, which posed considerable logistical problems in the world’s second-largest country. Based on previous work, Chambers (1973: 114) defined a homogeneous speech area stretching from Ontario to Edmonton, the Rocky Mountains and the Canada-U.S. border. Gregg (1957: 20), however, had indirectly included British Columbia (BC) in this area a generation before Chambers by suggesting that the English of Vancouver youth is “embracing in its main features not only the province of BC, but probably most of English-speaking Canada”. The inclusion of BC in the homogeneous area was made explicit in the late 1970s (Chambers 1979: 190).


Homogeneity & Standard Canadian English

All claims about Canadian homogeneity face a methodological conundrum, as they are all grounded in descriptions of the speech of the Canadian middle class, most often the urban middle class. Boberg (2008a; 2010) notes that his phonetic studies are based on Standard Canadian English (StCanE), which he defines (2010: 199) as “essentially, the speech of middle-class people from Vancouver to Halifax”, expressly excluding working-class speech and the Newfoundland and Cape Breton dialect enclaves. The clearest definition of StCanE comes from Chambers (1998c: 252), who describes the “standard accent” as “urban, middle-class English as spoken by people who have been urban, middle-class, anglophone Canadians for two generations or more”. If this definition is taken literally, StCanE is not the variety of a majority: based on the 2006 census, only about one-third (36%) of the Canadian population would be speakers of StCanE (Dollinger 2011a: 5). Nevertheless, it is a widely spoken variety in comparison to other standard varieties of English (e.g. RP in Britain, used natively by only 3% to 5% of the population; Trudgill & Hannah 2002: 9). It remains to be seen to what extent socially more diverse data, including non-middle-class speakers, would alter the picture of Canadian English homogeneity.

6.8.2 Homogenization on the continental level Another homogenizing force operates across continental North America rather than (mainland) Canada. It alters regional variation by spreading some variants at the expense of others. By homogenizing forms across the international Canada-U.S. border, changes may work in favour of heteronomy and at the expense of the autonomy of CanE. One such set of features that once quite systematically distinguished CanE form AmE are British terms that were promoted by members of the Canadian elite who are appropriately called “Canadian Dainty” (Chambers 2004). Starting with British migration in the second quarter of the 19th century and extending its cultural influence till about the mid-20th century as the leading social group, Canadian Dainty forced British variants onto the North-American character of CanE: schedule with [ʃ], what with [hw] instead of [w], pram not stroller, serviette not napkin, dived not dove, 1st person future shall, not will among others, were all British variants that had considerable currency until the third quarter of the 20th century (Avis 1973: 62–3 for the contemporary context; Chambers 2002a: Figure 9; Dollinger 2008a: 227–248). Linguistic work from the 1980s and early 1990s, in fact, tended to be preoccupied with the “Americanization” of Canadian features or the creation of North American homogeneity. Studies that suggested increasing heteronomy, or increased usage of American-like linguistic forms, include Chambers (1980; 1998b), Clarke (1993b), Woods (1993) and Zeller (1993), while Nylvek (1992, 1993b) found both British and American influence. Woods (1993: 174) for instance, went so far as to state that: “It appears that English Canadians will have to (and may want to) define their identity by



Chapter 6.  WQ data and linguistic theory 217

means other than language”, since some linguistic features were assimilating to supraregional forms used across North America. This is a late echo of Hamilton’s prediction (1964: 459) that “the speech of educated Canadians and Americans will fall together in some general North American standard”. The loss of chesterfield, a Canadian innovation, in favour of couch, for instance, can be seen in this light. From today’s perspective, however, it is clear that the voices lamenting the loss of Canadian English overstated the case, as the next section illustrates.

6.8.3 Heterogenization (diversification)

The picture of linguistic change is more complex than simple continental homogenization and a reduction of autonomy: alongside these homogenizing tendencies, national or continental, there are signs today that heterogeneity is becoming more clearly visible in Canada. Current phenomena of diversification relate to at least three different levels:
– legacy diversification of traditional speech enclaves
– urban diversification of ethnic groups
– continuing diversification among Canadian middle-class speakers
While there is a long-standing tradition of describing dialect enclaves, beginning at least with Lighthall (1889), linguistic work has emphasized the theme of homogenization. Until the 1980s, discussions of heterogeneity usually involved only traditional Canadian settlement enclaves, areas that often did not take part in the dialect and language mixing of the rest of English Canada because their dominant settler groups came from homogeneous backgrounds. These included the English of Lunenburg, Nova Scotia (founded on a German 18th-century substrate, Emeneau 1935; Wilson 1958), Peterborough, Ontario (settled through a government-sponsored emigration scheme that brought about 2000 Irish in the 1820s), the Ottawa Valley (Pringle & Padolsky 1983), Cape Breton, Nova Scotia (where many Scots settled) and Markham, Ontario (where Germans settled early on). What these and other enclaves had in common was that they delayed the uptake of mainstream Canadian English features by virtue of their isolation and their different social and ethnic make-up. Over time, however, these communities have increasingly blended into the Canadian standard.
From the national perspective, Newfoundland can be seen as the largest enclave, one that is now changing some of its traditional dialect features because of increased contact with the mainland (van Herk, Childs & Thorburn 2010). A focus on urban diversification and ethnic networks has recently fuelled a vibrant new research field and an interest in heterogenization in urban areas. It is one of the historical linguist’s principles of language diversification that the longer-settled

218 The Written Questionnaire in Social Dialectology

areas feature higher linguistic diversity. Micro-linguistic studies among the Jewish and Italian populations of Montreal (Boberg 2004b) have shown that these relatively self-contained communities have developed identifiable phonetic features over four or five generations. In Toronto, however, studies among the Chinese and Italian populations have so far detected no recognizable linguistic features in the second generation (the first born in Canada) (Hoffman & Walker 2010; Hoffman 2010). These results suggest that a more complex array of social factors, including attitudes towards one’s heritage language and one’s “ethnic orientation”, either towards or away from one’s heritage group, will need to be considered to discover meaningful correlations with linguistic features (see Section 8.2.3). In this respect, Montreal currently seems to be the exception, with unusually tight-knit ethnic neighbourhoods that show detectable linguistic features (Boberg 2004b). As Nagy et al. (2014) suggest, “there is no one-size-fits-all approach to coding and operationalizing ethnic orientation”, which means that the social co-determinants of language use in ethnic groups need to be explored individually in each context. This research on ethnic dialects in urban areas, in both English and the heritage languages, is a highly promising and much-needed expansion of existing work on CanE. There is yet a third area in which heterogeneity enters the Canadian picture: renewed interest in Canadian dialectology in the late 1990s and early 2000s has given rise to a more nuanced view of the homogeneity of Standard Canadian English. This research tends to confirm that CanE displays many shared features from the Quebec-Ontario border to the Pacific coast. The findings of ANAE (Labov et al.
2006) indicate the existence of phonetic isoglosses from Ontario to BC, and define the Maritime provinces as well as Newfoundland as linguistic regions in their own right. The ANAE results, however, are based on just 33 Canadian speakers from urbanized communities; they were later revised on the basis of a larger sample (Boberg 2008a), which produced a clearer picture of regional diversification, shown in Table 6.10.

Table 6.10  Regional dialects in Canadian English based on Boberg (2005, 2008a) (Dollinger & Clarke 2012: 459)

Phonetics (Boberg 2008a)
  Regions: West | Ontario | Quebec | Maritimes | Newfoundland
  Subregions: BC | Prairies (within the West)
Vocabulary (Boberg 2005)
  Regions: West (BC, Prairies, NW Ontario) | Ontario | Quebec (Montreal data only) | Atlantic Canada
  Subregions: NW Ontario (within the West); New Brunswick | Prince Edward Island | Newfoundland & Nova Scotia (within Atlantic Canada)




The regional dialects of Canadian English are listed for both vocabulary (data from Boberg 2005) and phonetics (data from Boberg 2008a). While there are subtle differences between the two linguistic levels, there is also widespread agreement: the Maritimes and Newfoundland, Quebec, Ontario and the West are, on one level, dialect zones. At a lower level, northwestern Ontario stands out in vocabulary and BC in phonetics. The data in Table 6.10 represent middle-class Canadian English, so Boberg’s data show consistent, if subtle, differences within Standard Canadian English. The apparent contradiction of a standard variety with regional variation is thus empirically attested for Canadian English. Boberg sees these differences as deriving from differences in the population input, which have left durable traces in the regional varieties, and admits that this variation presents:

a challenge, if only a minor challenge, to the conventional view that Canadian English is geographically homogeneous over the vast territory extending from Vancouver to Ottawa. (Boberg 2008a: 150)

The question remains to what extent other stable regional variation has developed across the large Canadian expanse, a question that only systematic study can answer and in which WQs will play an important role. Polson (1969) and Stevenson (1976) offered some insights into dialect differences in non-urban regions of BC some 40 years ago. It remains to be shown whether these differences have homogenized or diversified further, and how other locations across the country fare in this respect. WQ data have proven able to detect exceptions to homogeneity in CanE, as the next example shows (Boberg 2004a: 187). In most of Canada, front vowels before [r] have merged into [ɛ]: Mary [eɪ], merry [ɛ] and marry [æ] fall together in a single vowel [ɛ] and are indistinguishable. As early as the 1950s, Gregg (1957: 22) said that “Some [university student] speakers […] treat as homophones Harry and hairy, marry, Mary and merry [ˈmɛri].” Question 67 in Dialect Topography elicits the vowel before [r] as follows:

(6.4) Q67. Is the first a of GUARANTEE pronounced like a of cat? or a of care? or a of car?

Figure 6.13 shows the results for all seven Canadian regions:

          

[Figure 6.13: percentages of car, cat and care responses by region: New Brunswick, Eastern Townships, Quebec City, Montreal, Golden Horseshoe, Vancouver, Ottawa Valley]


Figure 6.13  Vowels before [r] in guarantee for respondents under 40: [ɑ] as in car (bottom), [æ] as in cat (middle), [ɛ] as in care (top)

Figure 6.13 shows that the front vowel merger before [r] is operative in Canada from east to west, with the notable exception of Montreal, where more than 60% of respondents report the low front vowel [æ] in guarantee. Only Quebec City, with more than 30%, comes somewhat close to the Montreal outlier; the Eastern Townships show the third-highest use of the low front vowel, though with a considerable gap. It is possible that these three regions, all situated in the province of Quebec, form a sphere of influence of their own, which might reflect the French superstrate and the unique status of English as a minority language in the Canadian context. Clearly, Montreal bucks the national trend of a front vowel merger. Boberg (2004a: 187) contrasts the situation in Southern Ontario, where 78% of the sample reported the first vowel of guarantee as care, [ɛ], and only 13% as cat, [æ], with Montreal, where “these proportions were almost exactly reversed: 72% said it sounds like cat, while only 21% said it sounds like care.” This is not the first time that Montreal English has shown idiosyncrasies. Changes affecting the diphthongs subject to Canadian Raising have been shown to occur in Victoria, Vancouver and Toronto, but not in Montreal (Chambers 2006; Hung, Davison and Chambers 1993). Nor are the exceptions limited to pronunciation. Boberg (2005: 36) concludes for NARVS that “Montreal appears to be the most lexically distinct region in Canada”, apart from Newfoundland, and his acoustic analysis of Montreal speakers confirms that the lower quality of the front vowel before [r] is “one feature of the Montreal English vowel system that distinguishes it from other Canadian varieties” (Boberg 2004b: 550).
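The percentages reported for Q67 are simple response shares per region, restricted (as in Figure 6.13) to respondents under 40. The following Python sketch shows one way such shares might be computed from raw WQ answers. The records, ages and the (region, age, answer) layout are invented for illustration; they are not Dialect Topography data.

```python
from collections import Counter, defaultdict

# Invented example records (region, age, answer to Q67); not real survey data.
RESPONSES = [
    ("Montreal", 22, "cat"), ("Montreal", 35, "cat"),
    ("Montreal", 29, "care"), ("Montreal", 61, "care"),
    ("Golden Horseshoe", 18, "care"), ("Golden Horseshoe", 33, "care"),
    ("Golden Horseshoe", 27, "cat"), ("Golden Horseshoe", 39, "care"),
    ("Golden Horseshoe", 55, "cat"),
]


def shares_by_region(records, max_age=40):
    """Percentage of each answer per region, for respondents younger than max_age."""
    counts = defaultdict(Counter)
    for region, age, answer in records:
        if age < max_age:  # mirrors the under-40 restriction of Figure 6.13
            counts[region][answer] += 1
    return {
        region: {answer: round(100 * n / sum(tally.values()), 1)
                 for answer, n in tally.items()}
        for region, tally in counts.items()
    }


if __name__ == "__main__":
    for region, dist in sorted(shares_by_region(RESPONSES).items()):
        print(region, dist)
```

With these invented records, Montreal yields 66.7% cat and the Golden Horseshoe 75.0% care. A real analysis would also report the number of respondents per region, since percentages from small cells are unstable.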




6.9  Chapter summary

This chapter has introduced some of the most useful and versatile theoretical concepts for interpreting social dialectology WQ data. With the exception of social indexing and the model of koinéization, they are all time-tested hallmarks of empirical research into language variation and change. While age, gender, border effects, issues of homogeneity and heterogeneity, and sociohistorical koinéization models cover a lot of ground, there are other social correlates of linguistic change that were not explored here. Some of these depend on the social groups being studied and include social mobility, ethnicity, ambition or other kinds of cultural orientation, including gender constructions (does one fit or fight prevailing gender stereotypes?) and group associations (e.g. hip-hop or indie rock?). Settlement history and theories on the development of new dialects were given a central place. Here, Trudgill’s and Schneider’s models served as examples of the two most widely discussed approaches, which are unfortunately more often than not pitted against each other when they might well complement each other in all but one developmental stage. Their scope is, after all, quite different. The concepts presented here do not exhaust the spectrum: they are a starting point that should encourage the student to look for other, potentially meaningful correlates.

Part II

Practice

Chapter 7

Questionnaire design and data collection

This chapter is a guide through the design process of the methodological tool at the heart of all WQs: the questionnaire. It introduces the steps towards a custom-built WQ tailored to a particular situation, language variety or dialect. The applications of WQs are manifold: they range from large-scale projects that survey regional linguistic variation over large stretches of geographical space, down to microlinguistic studies of a single syntactic phenomenon or even small undergraduate class assignments. The method is largely language-independent and worth considering as long as the respondents are sufficiently literate. While levels of literacy obviously vary between societies globally, the method will generally be useful wherever illiteracy rates have been low over a person’s lifetime. The success of any survey depends heavily on how a question is phrased. Questionnaire design is far from “anything goes”, as this chapter will show, though it is sometimes difficult to formulate precise dos and don’ts. By the end of the chapter, you should have a fairly good idea of:
– what you can and cannot ask in WQs
– how to ask your questions
– the pros and cons of different survey formats (e.g. length, presentation, paper or online)
We will first reflect on the planning stages of questionnaire design. The research question at the base of the questionnaire defines its purpose and therefore needs to be clarified (Section 7.1). The structure of the questionnaire will be considered next (7.2), before ample discussion is dedicated to question types (7.3), including those used in newer WQ approaches, and to useful strategies for mitigating socially sanctioned answer behaviour. After these core questions, we will address some issues of population sampling (7.4).


7.1  Planning the questionnaire: Purpose & research question

The most obvious question to ask is about the purpose of the questionnaire. In practice, this process begins with the formulation of a research question, which should be broader in scope than any particular questionnaire item. At this early stage of the research, literature searches on the general research question are particularly informative. For Canadian English, this book includes many reference points; for other varieties, the researcher will need to gather the relevant background knowledge elsewhere. Examples of research questions, drawn from the previous chapters and relating to Canadian English, include:

(7.1)
a. Is Canadian English a homogeneous variety?
b. Is the autonomy of Canadian English in danger?
c. Do women in Vancouver use more standard variants than men?
d. Are traditional Canadian lexical variables on the decline?
e. Do Chinese-Canadians in Vancouver use different lexical items than Anglo-Irish Canadians?

Boberg’s (2010) assessment of lexical autonomy in Canadian English, for instance, is of immediate relevance to research questions (7.1a, b & d) and will directly inform the design of the survey in some respects. Research question (7.1e) would require theorizing the concept of ethnicity, for which Hoffman and Walker’s (2010) Ethnic Orientation Index (Section 8.2.3) or Nagy et al.’s (2014) indices will be useful. Without a proper literature search, one is bound to repeat earlier mistakes or fail to find the angle that is missing in existing work. Equally importantly, without a thorough literature review one will miss the studies that help to contextualize one’s own findings. A research question must be empirically testable, which is best achieved with a set of questions that, taken together, address it. Any of the research questions in (7.1) will need to be fleshed out further with follow-up questions, such as the following:
– How do you intend to study the phenomenon? Which variables will you study?
– From whom do you intend to collect data? Location (city, province, state, country etc.), social group(s)?
– What are comparisons based on? Is there a control group or existing data that you can use to that effect, or will a control group also need to be polled?
Looking at Example (7.1b), we see that the research question is fairly general and in need of specification, which involves the following issues. First, how does one intend to measure “linguistic autonomy”? Second, what does “endangerment” refer to (in



Chapter 7.  Questionnaire design and data collection 227

this context, obviously, the decrease or even loss of linguistic autonomy compared to a previous point in time)? Such assessments usually require a benchmark against which present-day results can be compared. These benchmarks typically come from previous studies that can serve as real-time comparisons. For Canadian English, Dialect Topography data collected at various points in the 1990s and early 2000s are one such benchmark; the Survey of Canadian English and Boberg’s NARVS are other reference points. Often, however, a real-time comparison means collecting two data sets, which doubles the effort for data collection alone. Not all comparative approaches require the collection of two data sets. Example (7.1a), for instance, can be answered without a real-time component, but it still requires a comparative component. To assess homogeneity, one needs to define the concept for the purposes of the study and show that homogeneity is greater, lesser or equal in CanE when compared with another variety of English, most obviously American English. In one sense, though, this question can be addressed with only one data set: if it can be shown that most traditional linguistic variables behave similarly across the country (within a given, pre-defined range), one would be able to postulate homogeneity. Ideally, though, one would include a comparison with, e.g., American English varieties.

7.2 Structure of the questionnaire

A very basic but important design feature of questionnaires is their structural organization. A distinction is made between an introduction that includes respondent instructions, a survey part that arranges the questions, preferably in an intuitive order, and a conclusion that expresses thanks to the respondent and provides contact details. Schleef (2013) lists seven essential parts of introductions to WQs:
1. Title of the questionnaire
2. Brief explanation of its purpose
3. Polite request to fill in the questionnaire fully and honestly
4. A short outline of what the questionnaire will cover (including a time estimate)
5. The promise of anonymity
6. The researcher’s name, institution and contact details
7. An expression of thanks

Before respondents start on the survey part, they need to give their consent to participate. Often, the WQ is preceded by an information sheet, which may be framed as a letter of informed consent. Some details from Schleef’s list may be included in a Consent Form or Information Sheet that is handed out prior to the distribution of the survey or survey link, such as the one shown in Illustration 7.1:


INFORMATION and INFORMED CONSENT FORM, Version 1.2 Department of English INFORMATION for INFORMED CONSENT Class Project: English Usage, 2013 Interested in filling out a 10-minute online questionnaire on English language features for a UBC class project? In the questionnaire you will be asked how you pronounce certain words, what you call certain items and what constructions you use. The questionnaire consists of two parts. In the first part we would collect some information on your social background, such as your age, birthplace or your parents’ birthplaces. You may be asked to self-identify your ethnicity. We need this information in order to spot trends in the language. The second part includes the linguistic questions. In most cases you will be offered a number of choices, so the process is really quick. Please be advised that your answers will be stored on a server in the European Union. All data will be used for statistical analyses only and will not (and cannot) be linked to you personally. We will not ask for your name or your contact details. Your questionnaire would be one in a pool of around 500 questionnaires. If you decide to participate in the survey, please feel free to discontinue the interview at any time without providing any reasons. If you are interested, please connect to this link to start the survey: ENTER LINK. If you have questions or concerns, you can ask me or you can contact my instructor: Dr. Stefan Dollinger Assistant Professor Department of English University of British Columbia #397-1873 East Mall Vancouver BC V6T 1Z1 stefan DOT dollinger AT ubc DOT ca (604) 822 4095 You will not directly benefit from this study, but your answers will assist us in spotting linguistic change. If you are interested in results, please contact me or my instructor in April 2013. Thanks for your consideration! 
STUDENT FIRST NAME If you have any concerns about your rights as a research subject and/or your experiences while participating in this study, you may contact the Research Subject Information Line in the UBC Office of Research Services at 604-822-8598 or if long distance e-mail [email protected] or call toll free 1877-822-859".

7 Feb. 2013

Page 1 of 1

Illustration 7.1  Informed Consent Form (UBC 2013)

This Consent Form was approved by UBC’s Behavioural Ethics Review Board – a process which is now compulsory in most university settings (at least at the M.A. and Ph.D. levels and beyond). This paper version was accompanied by a text-only version for social media advertising (without the UBC crest and formatting). Once an invitee agrees to this anonymous Consent Form, the link to the survey brings her to a site such as the one shown in Illustration 7.2. In terms of Informed Consent, an important aspect for WQs is what may be called “Implied Agreement”: after being informed about the study and its implications, the invitee has the choice to follow up with a link to a survey or to leave it. If she follows the link, she agrees to the study.




The survey in Illustration 7.2 works in conjunction with the Informed Consent Form shown in Illustration 7.1. A title is given (Washington State English Survey), a purpose is given that should be accurate but not too general (“to spot trends in the language” – Illustration 7.1), a request to fill in the questionnaire is offered, a time estimate is given (15 or 20 min) and the researcher’s name and contact details are provided, as well as an expression of thanks.

Illustration 7.2  Beginning of Washington State English Survey (2011)

There are a number of ways to skin a cat, and this applies to the order of the background information questions in a questionnaire as well. One difference between Schleef’s basic layout, which recommends listing the background questions at the end of the survey, and the Washington Survey is that the latter lists these questions upfront. In addition, the Washington Survey gives detailed instructions on how to deal with the linguistic questions only after the social background questions have been asked. The assumption here is that people are generally familiar with social background questions. Once age, gender, education, residence history and the like have been asked at the beginning of the questionnaire (and are out of the way), one can focus the respondent’s attention on the all-important linguistic part with a statement such as the following:


Illustration 7.3  Instructions for the linguistic part (Washington Survey)

The respondents are encouraged not to answer what might be prescriptively desirable but instead to reveal their more informal language use, “among friends”. This may or may not work, as we will see in Section 7.3.6, but most surveys include such instructions. This survey of 54 questions, including background questions and 8 optional questions, included no reminders about the style of language to be elicited, but in longer surveys such reminders will be very useful. Longer surveys, such as the 96-item Dialect Topography Survey, benefit from a more elaborate form and structure. Following the established model of the FI questionnaires (such as LAUSC or SED), the linguistic section is arranged into groups of semantically related questions. Dialect Topography, for instance, has five parts:

I. Around the House
II. Food and Drink
III. Outdoors
IV. Neighbours
V. Potpourri

In the first part, Around the House, typical questions are q5 “What do you call the knob you turn to get water outdoors or in the garden?” or q6 “What do you call the small cloth you use for washing your face?” Part II elicits, among others, answers to q20 “What do you call food eaten between meals or before going to bed?” or q27 “Which do you say? (a) He has drank three glasses of milk. Or (b) He has drunk three glasses of milk.” Part III asks respondents about things outside the home, e.g. q37, “In ASPHALT, the PH sounds like f. Does the S sound like sh?” or, fitting the outdoor theme, whether one says, in q40, “The cat wants to go out. Or The cat wants out.” The idea behind such ordering is to keep respondents engaged with a subject. Thematically arranged questions, with occasional changes of topic, are believed to be more entertaining than a random mix of questions, and might increase the quality of the data.



7.2.1  Questionnaire length

Length is perhaps one of the most crucial factors when designing a survey. On the one hand, researchers want to elicit as much information as possible; on the other hand, they often forget the negative effects of long questionnaires. Not only is there a great danger that respondents will discontinue or not even start a survey, which would affect the representativeness of the survey sample, but there is also good evidence that the longer the survey, the less reliable the results. A 2008 survey in Vancouver explicitly compared WQ and FI data. The survey included 32 linguistic questions following 18 social background questions and was therefore fairly short (Dollinger 2012b). Figure 7.1 shows the deviations in frequencies between the WQ and FI data that are greater than 10%, grouped by responses from the first and the second half of the survey. An interesting pattern emerges. The younger age cohorts, teens to thirties, show more deviations in the first half of the questionnaire. The 30-year-olds are in the lead with six deviations in the first half but only two in the second. The 40-year-olds show five big deviations in each half. The 50- and 60-year-olds, in contrast, show an increase in deviations in the second half: the longer they work on the questionnaire, the more their WQ answers diverge from the FI scores. The 70- and 80+-year-olds are in a category of their own: regardless of the section, they show consistently high deviations, with ten or eleven in the first half and thirteen or eleven in the second half. These figures indicate that the two oldest age cohorts may indeed be less reliable respondents.

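[Figure 7.1: Deviations ≥10% per age cohort in 1st and 2nd half; y-axis: number of deviations (0–14); x-axis: age cohort (10s, 20s, 30s, 40s, 50s, 60s, 70s, 80s+); series: 1st half, 2nd half]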

Figure 7.1  Position of questions and error rate by age of respondent (Dollinger 2012b: 90)
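The comparison behind Figure 7.1 amounts to counting, per age cohort and per half of the questionnaire, the questions on which WQ and FI frequencies differ by at least 10 percentage points. A minimal Python sketch of that count follows. It assumes that per-question variant frequencies (in %) are already available for one cohort; the question IDs and values are hypothetical, and treating the threshold as inclusive follows the figure's “≥10%” label.

```python
# Hypothetical frequencies (%) of one variant per question for a single age cohort.
WQ = {"q1": 62.0, "q2": 40.0, "q3": 15.0, "q4": 80.0, "q5": 33.0, "q6": 50.0}
FI = {"q1": 55.0, "q2": 52.0, "q3": 14.0, "q4": 65.0, "q5": 30.0, "q6": 38.0}
ORDER = ["q1", "q2", "q3", "q4", "q5", "q6"]  # order of questions in the WQ


def count_deviations(wq, fi, order, threshold=10.0):
    """Return (first_half, second_half) counts of questions with |WQ - FI| >= threshold."""
    mid = len(order) // 2

    def n_deviations(questions):
        return sum(1 for q in questions if abs(wq[q] - fi[q]) >= threshold)

    return n_deviations(order[:mid]), n_deviations(order[mid:])


if __name__ == "__main__":
    print(count_deviations(WQ, FI, ORDER))  # (1, 2)
```

Running this for each cohort and plotting the two counts side by side gives the layout of Figure 7.1. With an odd number of questions, this sketch assigns the middle question to the second half; the original study may have split the questionnaire differently.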



Figure 7.1 clusters into two levels of deviations, one at around five and the other at around twelve deviations, as represented by the dotted lines. The 60-year-olds provide a clue to what is happening. Their deviations in the second half skyrocket from five to twelve, that is, from one level to the other. Whereas sample size affects the younger age cohorts, it is the 60-year-olds, whose sample size lies between those of the younger and the older cohorts, who appear to show a fatigue effect in the second half. In terms of sample sizes, the cohorts aged seventy and older have the fewest respondents, while the 50-year-olds and younger have around fifty respondents or more. The 60-year-olds are in the middle, with twenty-four respondents. Since their deviation score is low in the first half, their deviations are not an effect of sample size but appear to point toward fatigue among respondents aged 60 and over. Length, therefore, does seem to be a critical issue. What has long been known in the opinion polling literature, the discipline that pioneered written surveys, is also true for linguistic questionnaires. As Moser and Kalton (1971: 309) put it in their classic account: “The temptation is always to cover too much, to ask everything that might turn out to be interesting.” They continue with clear words of advice: “This must be resisted.” The question, then, is how long a questionnaire should be. A definitive answer is difficult to give, as questions vary in processing time; consider the difference between a question on “place of residence” and one asking respondents to “offer your residential history from birth to present”. However, one hundred questions (including background questions) appear to be an upper limit that key surveys abide by (Davis 1948 listed 100 linguistic questions), and in most cases the number of response items should be lower than that.
Seiler (2010: 520) suggests a maximum of one hour to be spent on a questionnaire, which translates into roughly 100 questions for slower respondents. The Dialect Topography of Canada questionnaire, for instance, takes about 30 minutes to an hour to complete, depending on factors such as literacy, age and the respondent’s current level of fatigue. As such, it is much shorter than previous linguistic questionnaires, including Allen’s “check list” for the Linguistic Atlas of the Upper Midwest. Above all, questionnaire length must be balanced with the anticipated sample size: more questions may skew results through fatigue, especially in later questions, while fewer questions will provide more reliable results and, most importantly, higher response rates. If one wishes to collect substantial numbers of responses, a shorter questionnaire will help tremendously. A high response rate is one of the universally acknowledged advantages of WQs, and long WQs undercut this design advantage, which must be avoided. In some fields, such as psychology, it is customary to pay respondents, and some linguistic projects, e.g. the Roswell Voices project (Kretzschmar et al. 2007), pay their informants as a matter of policy. Such practice is also common in fieldwork with aboriginal and First Nations communities, where it has the added bonus of expressly breaking with a historical and unreflective practice of exploiting aboriginal peoples. It is very important to ensure that respondents are not in a power relationship with the researchers, e.g. students who would prefer not to fill out the form but do not dare to decline for fear of losing favour in an ongoing course. These are some of the issues that Ethical Review Boards look out for, and, at least in the North American (research) university context, they apply very high standards to prevent any kind of coercion. However, there is also a lower limit to the number of questions, based on respondent expectations: if one is asked 15 or 18 background questions, one might reasonably expect roughly an equal number of linguistic questions to “make it worth” one’s time. Respondents clearly understand that their social data are just the backdrop for the linguistic questions the researcher is interested in. It is important to meet these expectations, which means that unless one plans to elicit very few social categories, there is a minimum number of questions below which one should not go. The shortest questionnaire I have ever used included only four linguistic questions and eight background questions, twelve questions in total. This worked because the linguistic questions were time-intensive. I announced it as a “5-minute-or-less survey”, which helped increase the number of responses. In fact, I was primarily interested in only two of the four linguistic questions (which were on take up #9, as discussed in Chapter 4), while the others were used as filler and practice questions that also helped to contextualize the meanings of the variable in question (e.g. some of the more widely used meanings of the phrasal verb take up, as in take up where you left ‘continue’).
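The length guidelines above reduce to simple arithmetic: one hour for about 100 questions implies roughly 36 seconds per question for slower respondents. The Python sketch below checks a planned questionnaire against such a budget. The per-item timings are hypothetical placeholders and should be replaced by timings measured in a pilot (Section 7.3.7).

```python
# Hypothetical average completion times in seconds per item type; replace
# these with figures measured in a pilot run.
SECONDS_PER_ITEM = {"background": 15, "multiple_choice": 20, "open": 45}


def estimate_minutes(item_counts, timings=SECONDS_PER_ITEM):
    """Estimated completion time in minutes for {item_type: number_of_items}."""
    total_seconds = sum(n * timings[kind] for kind, n in item_counts.items())
    return total_seconds / 60


def max_items(budget_minutes=60, avg_seconds=36):
    """How many items fit into a time budget at a given average pace."""
    return (budget_minutes * 60) // avg_seconds


if __name__ == "__main__":
    print(max_items())  # 100
    # e.g. eight background items plus four time-intensive open items
    print(estimate_minutes({"background": 8, "open": 4}))  # 5.0
```

The sketch treats the one-hour ceiling only as an outer bound. The discussion above suggests staying well below it, since fatigue effects can set in long before that limit is reached.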

7.2.2

Choice of medium: Paper or online?

Another important decision is the choice of medium: should the survey be printed on paper, administered online, or both? Until recently, paper was the most frequently used medium, and most of the literature still refers to it (e.g. Dillman 2000; Brown 2001; Dörnyei 2003; Schleef 2013). Paper and online environments differ in their structural features. For instance, in an online questionnaire that required respondents to click “Next” to go to page 2, only slightly more than 50% actually did so, which left a small sample for the second half of the questions. What poses no problem on paper (“Please turn over”) was overlooked online by half of the respondents despite a big “NEXT” button, which illustrates that each medium has structural requirements that need to be considered. The best way to discover what works and what does not is to pilot the WQ (Section 7.3.7). Internet surveys have proven immensely useful and follow the societal trend towards self-administration (Dillman 2000: 7). It needs to be kept in mind, however, that even in the Western world not all layers of society can be reached equally well with screen-based collection methods (be it on a personal computer or on a handheld

234 The Written Questionnaire in Social Dialectology

device). The same problem applies, to a much lesser degree, to written surveys as such, since pen and paper have been available much longer. Notwithstanding this drawback, which will diminish as information technology reaches even the most disadvantaged social layers and regions, collecting data digitally has considerable advantages. While computer-savvy researchers can build their own online database and collection form quite easily with open-source software such as PHP/MySQL, the more practical approach is to use a service such as limesurvey.org. Lime Survey (2012) is one of the most interesting and versatile options and is free of charge: its open-source concept and impressive features offer great flexibility, though it takes some time to get familiar with its atypical interface. Other suites include surveymonkey.com, esurveyspro.com, fluidsurveys.com and qualtrics.com, though all of these charge a fee for the packages needed for most WQs. The output of survey suites can be imported into a spreadsheet program such as Excel (Chapter 8 serves as an introduction).
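Most survey suites can export responses as a comma-separated (CSV) file. As a hedged sketch, assume such a file has one row per respondent and one column per item; the column names and values below are invented. Python’s standard csv module can then compute the share of respondents who answered each item. This is one quick way to detect the kind of drop-off after page 1 described above.

```python
import csv
import io

# Invented export: one row per respondent, one column per item.
# An empty cell means the item was left unanswered.
EXPORT = """respondent_id,q1,q2,q3,q4
1,a,b,a,
2,b,a,,
3,a,a,b,b
4,a,,,
"""


def completion_rates(csv_text, id_column="respondent_id"):
    """Return the share of respondents who answered each item."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    items = [col for col in reader.fieldnames if col != id_column]
    return {
        item: sum(1 for row in rows if row[item].strip()) / len(rows)
        for item in items
    }


for item, rate in completion_rates(EXPORT).items():
    print(f"{item}: {rate:.0%}")
```

With real data, a sharp drop between the last item on page 1 and the first item on page 2 would flag a navigation problem rather than item difficulty.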

7.3

Question design

In English linguistics, the tradition of asking linguistic questions is deeply entrenched in the FI method. As most attention has been given to this method (as shown in Chapters 1 & 2), it is only logical that FI question design is the more advanced of the two. English dialect geography has consistently emphasized the importance of indirect questions, following the maxim never to ask an interviewee directly for a linguistic item. One should not ask interviewees how they pronounce “barn”, since the fieldworker’s pronunciation will influence them (as discussed in the previous chapter’s section on koinéization theory); it is better to ask indirectly, e.g. ‘what do you call the structure in which you keep the cows?’, which also produces more varied lexical items. What is a perfectly logical elicitation requirement in phonetics, however, has also been applied to non-phonetic domains in a blanket fashion. For instance, if one is interested in the use of a particular lexical item, it might be better simply to ask for it directly and have the interviewee elaborate on it to verify whether she really knows the word and its meaning, as argued in Pratt (1983). As a side effect, one saves a lot of time and can conduct the interview more naturally rather than skirting around a target that is hard to reach. Today, a clear trend towards more varied and less dogmatic approaches to question design can be seen, including in face-to-face interviewing. In plans for a new dialect atlas of the UK and Ireland, for instance, Kerswill et al. (1999) elaborate on the idea of a WQ component as part of an interview setting. The WQ is given to interviewees a few days before the appointment, to be filled out on their own. These WQs, called “Sense-Relation-Networks” (SRNs), ask direct linguistic questions, with “standard notion words” acting as linguistic cues for dialect words. This direct method is not just meant to decrease the time needed to elicit words, but also to influence the interview setting in a positive way, since, as the authors write,

with an indirect question, the interaction may feel more like an interview or a test of some sort, rather than a conversation, and this may have the [e]ffect of increasing the formality of the speech of the informant. (Kerswill et al. 1999: 262)

While the follow-up interview is an essential part of this method, the use of a WQ component clearly shows that more recent FI methods have begun to embrace the direct questioning that is traditionally used in WQs and has long been shunned in FIs.

Self-reporting and community-reporting

An important aspect of all types of questionnaire studies is the kind of response elicited: are respondents asked to report on their own use, assessments and attitudes, or on somebody else’s? The term self-reporting refers to reports of one’s own use, attitudes and the like, and is distinguished from community-reporting, i.e. second-hand reports on the features and usage of others. This distinction is important because the two involve different types of questions. In the DT questionnaire, for instance, respondents are informed that

We are only interested in what you say when you are among friends – not what you think you should say, and certainly not what you think other people think you should say.

Such questions are aimed exclusively at the respondent’s own language use or attitudinal assessment and are therefore self-reporting in nature. The social context is defined as being “among friends”, implying that the aim is to elicit informal language use. Self-reporting questions differ from questions that elicit an assessment of community norms or behaviour, such as the request in Krug and associates’ questionnaire, which asks respondents to assess whether a feature could

a. be said in their home country in an informal conversation or
b. be written in their home country in an email to a former teacher

by everyone / most / many / some / few / no one (Krug & Sell 2013: 81).

Community-reporting of this type is more complex than self-reporting, in particular because the social circumstances of language use vary and are difficult to define in a way that is clear to everyone, though specified contexts, as in Krug and Sell’s example, help. Depending on the respondent’s personality type and the reach of his or her social network, however, one would expect quite different answers from respondents of otherwise similar social backgrounds. Community-reporting questions are therefore likely to produce a more heterogeneous data set than self-reporting questions.

7.3.1

Types of questions

A range of question types can be used in WQs. Different classifications have been proposed for the basic types of question content; Dillman (1978) and Patton (2001), for instance, list five types that overlap to a great degree. The recurring types of content questions are

– Behaviour and Experience
– Knowledge
– Beliefs and Opinions
– Attitudes and Feelings

Whatever the classification, it is important to design questions that target one of these areas. When turning raw questions into questionnaire items, one needs to make sure that the correct basic type of question is used: if we are interested in behaviour (e.g. what word do you use for X?), we must phrase the question accordingly. The main distinction in question format is between open and closed questions. Open questions (or open-response items) provide a text field for respondents to fill in, which naturally produces variation in spelling, framing, verb forms (if the answer is more elaborate) and the like. Closed questions (or closed-response items), by contrast, offer a limited set of answer categories in one form or another. Closed questions are the more common type in linguistic WQs, but they require more thorough knowledge of the linguistic variable than most open questions, and the researcher needs a very definite idea of the answer options a question will trigger. In return, if a closed question is designed well, the analysis is much easier than with open questions, as the answers are already classified. In the social science questionnaire tradition (such as opinion polling), closed questions are the norm, and a great deal of literature has been produced on their design in sociology, psychology and market research (e.g. De Vaus 1991; Holm 1998; Dillman 2000). Dörnyei (2003), Brown (2001) and Groom and Littlemore (2011: 98–105) reflect on questionnaires in the context of applied linguistics, which is, overall, close to traditional social science methodology and of immediate relevance to linguistic attitude and perception studies, addressed in Section 7.3.4 below.
Apart from attitude and perception studies, questions in social dialectology generally aim to elicit information on the regional and social uses of linguistic variables and represent a somewhat atypical sub-type. Compared to the usual social science opinion-poll questions, language use questions target only forms of behaviour, not knowledge, attitudes, beliefs or opinions. For instance, one feature of opinion polling in sociological and psychological research is that questionnaire items “rarely take the form of actual questions that end with a question mark” (Dörnyei 2003: 28). In linguistic questionnaires, by contrast, respondents are almost always asked directly whether they say variant A or variant B, or what they call an item used for X. As we will see below, linguistic question design differs in some respects from social or psychological question design.

Closed-response items

Closed-response items are the preferred type of question in the social sciences because they are easy to analyse. There are many different types of closed items, though only some of them have been used in social dialectology and sociolinguistics. These are presented below.

Checklists

Checklists were among the earliest closed-response formats used in linguistic WQs, as we recall from Davis’ (1948) study. Checklists deliver categorical answers from a closed list of possible answer options, and respondents are asked to select (circle, click) all that apply. For instance:

(7.2) a. What do you call the wheeled conveyance you put your groceries in while shopping:
         shopping basket / shopping buggy / shopping cart / shopping trolley  (NARVS, q7)
      b. How do you spell the following word:
         colour / color

With checklists, multiple selections are usually encouraged: if a person uses both shopping buggy and shopping cart, both choices would be selected. Likewise, if a person uses both spellings, colour and color would both be marked. The advantage of this method is its speed (circling or ticking off options is fast). Its downside is that it offers no information on the relative frequency of the variants: is colour used more frequently than color? Respondents are generally not in a position to report frequency adequately beyond naming their primary variant, and checklists do not capture even that. For perception and attitude studies, however, checklists can be used quite effectively, as in the following example from Schleef (2013: 45):

(7.3) The following is a list of cities in England. In which of these do you think some locals speak a widely recognizable local dialect. Please circle.
      London / Leicester / Liverpool / Leeds / Sheffield / Nottingham / Manchester / Birmingham / Northampton / Carlisle / Norwich / Plymouth
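Because several boxes may be ticked, checklist data are usually coded per variant rather than per question. The Python sketch below uses invented responses and a hypothetical function name. It computes the share of respondents who ticked each variant. These shares can add up to more than 1, and they do not reveal which variant is a respondent’s primary one, which is exactly the limitation noted above.

```python
from collections import Counter

# Invented checklist answers for one item: each respondent ticks one or more variants.
responses = [
    {"shopping cart"},
    {"shopping cart", "shopping buggy"},
    {"shopping buggy"},
    {"shopping cart", "shopping trolley"},
]


def variant_shares(answers):
    """Share of respondents who ticked each variant.

    Because multiple ticks are allowed, the shares may add up to more than 1.
    """
    counts = Counter(variant for ticked in answers for variant in ticked)
    return {variant: n / len(answers) for variant, n in counts.items()}


shares = variant_shares(responses)
print(shares)
print(sum(shares.values()))  # 1.5: more than 1 because of multiple ticks
```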

Multiple-Choice items: Binary (nominal) & categorical

Like checklists, multiple-choice items deliver categorical answers. Binary answer choices (also known as True-False items in the wider context) are a special case of categorical answers: only two answer options are offered, and the respondent may choose only one. In multiple-choice items more generally, a number of options are offered. The instructions may ask respondents to choose only one or may allow them to choose several. Examples include:

(7.4) a. What do you call a small house in the countryside, often by a lake, where people go on summer weekends:
         cabin / camp / chalet / (summer) cottage / summer house / summer place / the lake / vacation home  (NARVS, q45)
      b. What do you call a multilevel building for parking cars:
         car park / indoor parking / parkade / (parking) garage / parking lot / parking ramp  (NARVS, q48)
      c. What do you call the sweet hard substance that covers some cakes?
         frosting / icing  (SCE, q34)
      d. What do you call milk with more than 2% fat content?
         whole milk / homo / homo milk / other __________  (Vancouver Survey, q27)

With categorical answers, it is important to add an option such as “Do not use” or “Other”, so that respondents are not stumped by answer variants they do not use. Binary options are not favoured in social science research, where “the more options an item contains, the more accurate evaluation it yields” (Dörnyei 2003: 42). There are, however, good reasons to use them for linguistic features. For instance, it would make little sense to ask respondents how long the glide is in their pronunciation of news. For such questions, one might ask:

(7.5) Does the u in STUDENT sound like the oo in too, or the u in use?  (DT, q52)

A binary answer choice, possibly with a “Do not know” option, is about as much detail on phonetic behaviour as a self-reporting WQ can obtain.

Rating scales

Rating scales are a standard answer feature in social science research. There are a number of different types, most prominently Likert scales and semantic differential scales. The original Likert scales were developed in the 1930s by the American psychologist Rensis Likert. Respondents are asked to indicate the degree of their agreement or disagreement with a given statement, e.g.

(7.6) I find Austrian German very pleasant to listen to.
      Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

For Likert scales to work, the statements need to be worded in a clearly favourable or unfavourable manner. Neutral wording, e.g. Austrian German is alright, does not work well, as it offers no incentive to use the extreme ends of the scale. Semantic differential scales are an adaptation of Likert scales. Instead of indicating degrees of agreement or disagreement, respondents choose from a set number of positions between two logical extremes, e.g.

(7.7) Canadian English is
      Widely appreciated  _____ : __X__ : _____ : _____ : _____  Not appreciated
      Well studied        __X__ : _____ : _____ : _____ : _____  Unstudied

Rating scales allow more advanced statistical processing than categorical answers do, which is one of the reasons for their popularity. In sociolinguistics, attitude and perception studies make frequent use of them, as we have seen in Jenkins’ (2007) study in Chapter 5 and as will be shown in Section 7.3.4. Whether to use an even or an odd number of categories (the latter creating a “neutral” midpoint) is highly disputed among practitioners. About 20% of respondents seem to select a “neutral” category, if one is offered, to avoid making a choice. The relative responses, however, are not affected by whether an even or odd number of options is offered (Dörnyei 2003: 38). By relative responses I mean the ratio of answers towards the two extremes. In general, I have used even-numbered categories, thus forcing respondents to take a stance. These are the most popular types of closed-response questions in social dialectology, though they represent only a fraction of the available choices. There are many more answer options and presentation modes than have been applied in linguistic contexts (see, e.g., De Vaus 1991; Holm 1998: Vol. 1). Likert scales have been used for decades to assess language attitudes, starting with the work of Lambert and Giles. With the advancement of digital technology, sound files have been used more frequently. The work of Campbell-Kibler (e.g. 2007) and her associates (Wanjema et al. 2013) is a good case in point for sound delivery online. Analog tape recorders, however, have been used for this purpose for a long time, so the decisive factor here is the ease of delivering digital formats over the internet. To give only one recent example, Watson & Clark (2014) combine time-tested methods with a digital environment. In their WQ, respondents use Likert scales to rate speech samples for attributes such as pleasantness, and these samples are presented as sound files on the internet.
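Dörnyei’s observation concerns two quantities that are easy to compute from response counts. The first is the share of respondents choosing the midpoint, where one exists. The second is the ratio of answers on the agreeing side to those on the disagreeing side. The minimal Python sketch below uses invented counts and hypothetical function names. It assumes that answers are coded from 1 (most negative pole) to n (most positive pole).

```python
from fractions import Fraction


def agree_disagree_ratio(counts, n_points):
    """Ratio of answers on the positive half of the scale to the negative half.

    counts maps scale codes 1..n_points to numbers of responses; code 1 is the
    most negative pole. With an odd n_points the middle code is the neutral
    category and belongs to neither half.
    """
    half = n_points // 2
    negative = sum(counts.get(code, 0) for code in range(1, half + 1))
    positive = sum(counts.get(code, 0) for code in range(n_points - half + 1, n_points + 1))
    if negative == 0:
        raise ValueError("no answers on the negative half of the scale")
    return Fraction(positive, negative)


def neutral_share(counts, n_points):
    """Share of all answers in the neutral midpoint (0 for even-numbered scales)."""
    if n_points % 2 == 0:
        return Fraction(0)
    return Fraction(counts.get(n_points // 2 + 1, 0), sum(counts.values()))


# Invented counts for a five-point item such as (7.6), coded 1 = Strongly disagree.
five_point = {1: 10, 2: 20, 3: 15, 4: 30, 5: 25}
print(agree_disagree_ratio(five_point, 5))  # 11/6
print(neutral_share(five_point, 5))         # 3/20
```

One way to check the claim on one’s own pilot data is to compare these figures for an odd-numbered and an even-numbered version of the same item.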

Multi-item scales in social dialectology

An interesting difference between social science and linguistic questionnaires concerns one of the most important tools of social science questionnaires, a must-have feature in that discipline: it is very difficult to incorporate into a social dialectology WQ. This feature is called multi-item scaling and refers to measuring one attribute with multiple questions. For instance, if a psychologist wants to measure a person’s level of extroversion, the questionnaire would include a number of inter-related questions and scales, not just one question on the topic. The aggregate score of the questions targeting the feature is then calculated, which yields much more reliable results than a single question on the issue. Multi-item scales are important, as it has been shown that

the actual wording of the questions assumes an unexpected importance: minor differences in how the question is formulated or framed can produce radically different levels of agreement or disagreement. (Dörnyei 2003: 33)

Multi-item scaling is the antidote to this problem. The score averaged over a number of differently worded questions on the same issue is more likely to balance out the weaknesses of individual questions. Unfortunately, this key feature of social science questionnaires is of very limited use in social dialectology, because dialectologists and sociolinguists need to ask different types of questions. In research on social behaviour, the issues polled generally relate to beliefs, opinions and attitudes, which can quite easily be asked about in five different ways (e.g. attitudes towards free speech, sympathy scores of politicians or personality traits such as honesty). In dialectology, by contrast, quite a number of linguistic items have unique conditioning factors that make it very difficult to link them with other items. Consider even related questions, such as yod-dropping in student, news and avenue. Each of the three lexical items shows its own behaviour and carries different social connotations, so they are usually not collapsed into one score. The same holds for lexical variables: tap, parkade and toque are three isolated linguistic variables that will hardly be grouped together in one measurement. A combined score for these three would indicate the use of Western Canadian over American variants. Still, devising questions that target the same linguistic item without seeming repetitive is challenging. Where multi-item scaling would be possible, it has, to my knowledge, simply not been done. For grammatical patterns, one could offer different contexts for the same grammatical rule. An example is the choice of preposition after different: my house is different from/than/to yours, asked again in another context, e.g. his car is different from/than/to hers.
Researchers probably consider such questions repetitive, and too obvious for respondents not to align their answers. After all, in Western societies consistency in linguistic forms has been taught to the population as a whole since the onset of compulsory schooling. In cases where questions can be asked in parallel forms without sounding too obvious and repetitive, however, multi-item scaling would be an area of improvement for linguistic WQs. As noted, multi-item scaling does not work well in social dialectology because of the nature of the variables. Researchers therefore need to be very sure that their questions are as unbiased as possible. The closest one can easily get to this approach is by asking inter-related questions, i.e. questions on the same type of variable in a number of contexts. An example of these is discussed below.

Inter-related items

Inter-related questions are useful for variables that are formally the same (e.g. in word form or syntactic construction) but require special situational contexts. With closed-response items, they ensure that a respondent has to respond correctly to more than one cue before we conclude that the variable is part of her grammar. One such case is the lexico-grammatical variable positive anymore, i.e. anymore used in positive sentences. In Standard English, anymore is generally restricted to negative contexts, e.g. There are no cigarettes anymore. Positive anymore, however, occurs in positive sentences with a different meaning: e.g. John smokes a lot anymore means that John smokes a lot nowadays (Chambers 2007). The semantics is complicated by the presence of negative anymore in all standard and many non-standard dialects of English (e.g. John no longer smokes anymore), which clouds the picture considerably. There are two similar constructions: negative anymore as a standard feature and positive anymore as a non-standard feature. This may lead respondents to report on the Standard English usage that was taught (or at least reinforced) in school instead of on their knowledge of the non-standard construction. With positive anymore, the questions are necessarily more ambiguous than for yod-dropping. This should not come as a surprise, as we are trying to elicit relatively fine semantic nuances. DT questions 48, 50, 51 and 54 are shown in (7.8):

(7.8) a. 48. Someone said, John smokes a lot any more. Does this mean that
             John hasn’t been able to cut down, let alone stop? or
             John wasn’t smoking much for a while but now he is? or
             John has almost quit?
      b. 50. What does any more mean in John smokes a lot any more?
             still or nowadays or negative
      c. 51. That sentence, John smokes a lot any more, does it sound like
             something you might say under the right circumstances? or
             others might say though you wouldn’t? or
             no one you know would say?
      d. 54. Someone said Harry likes rock music any more. Does this mean that
             Harry’s turned off rock? or
             Harry’s finally seen the light? or
             Harry’s always been a great rock fan?

Note that in the questionnaire these questions do not all immediately follow one another. Example (7.8a) gives three paraphrases of the entire sentence, all written in informal style. The chance that a respondent guesses the right answer (the second option) is 1 in 3. We therefore need more safeguards to avoid unacceptably skewed data. Note also that respondents are not given the option to say “I don’t know”, which would present a different scenario. Next, in (7.8b), we do not paraphrase the entire sentence but merely elicit the meaning of any more in this context. Again, the chance of guessing right is 1 in 3, and many people would rather guess than not answer at all. Assume that each guess is a blind choice among three options. The combined likelihood that all three interpretations (7.8a, 7.8b, 7.8d) would be guessed correctly is then 1/3 × 1/3 × 1/3 = 1/27, i.e. less than 4%, a minuscule chance of getting all three answers right by guessing. If one is looking for a safe method to discriminate guessers from users and does not wish to work with open-response items, this method does the job. In addition, question (7.8d) uses highly idiomatic language, such as “turn somebody off rock” and “seeing the light” (‘getting to appreciate rock music’), which is likely to test the respondent’s certainty. As the meanings are ambiguous, a respondent would have to be quite certain in interpreting Harry likes rock music any more to select the second option (‘has seen the light’). The lack of a “Don’t know” category was apparently inspired by the aim of eliciting passive rather than active knowledge. The goal was to detect all people who would passively know the correct meaning, even if they might not use the construction themselves. Example (7.8c), then, directly asks respondents whether they use the construction or are passively familiar with it.
Matching the answers to all four questions illustrates the usefulness of inter-related questions: it allows passive users to be identified to a degree that would not be possible with any of these questions in isolation.
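The reasoning above can be made explicit in a few lines of Python. The sketch assumes blind, independent guessing among three options, which is the assumption behind the 1/27 figure. It also assumes a hypothetical coding of answers as 0-based option indices, keyed by DT question number. The category labels returned by classify are invented for illustration and are not DT categories.

```python
from fractions import Fraction

# Hypothetical answer key: 0-based index of the correct option for the three
# meaning questions (7.8a = q48, 7.8b = q50, 7.8d = q54); each is the second option.
ANSWER_KEY = {"q48": 1, "q50": 1, "q54": 1}
N_OPTIONS = 3


def chance_all_correct(n_questions=len(ANSWER_KEY), n_options=N_OPTIONS):
    """Probability of answering every meaning question correctly by blind guessing."""
    return Fraction(1, n_options) ** n_questions


def classify(answers):
    """Combine the three meaning questions with the self-report question q51 (7.8c).

    q51 options: 0 = might say it myself, 1 = others might say it, 2 = no one I know.
    """
    if any(answers.get(q) != correct for q, correct in ANSWER_KEY.items()):
        return "meaning not identified"
    if answers.get("q51") == 0:
        return "reported user"
    if answers.get("q51") == 1:
        return "passive knowledge"
    return "meaning identified, no reported use"


print(chance_all_correct())  # 1/27
print(classify({"q48": 1, "q50": 1, "q54": 1, "q51": 1}))  # passive knowledge
```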



Open-response items

Open-response items are very frequently used in the field, most often for lexical variables. In social science research, however, they are considered suboptimal questionnaire items. Their major shortcomings are that answers take respondents longer to write and take significantly more time to analyse. Open-response items nevertheless play a vital role in dialectology, as only open questions offer the range of variants that some dialectological studies are interested in. There are a number of ways to elicit open responses, ranging from open text fields to semi-structured answers to more innovative response modes. For instance, Preston (1989; Preston & Long 2002) has respondents identify areas on a map for certain linguistic attributes. This is a discipline-specific form of open response, which will be discussed in Section 7.3.4. The more traditional types of questions are shown below:

(7.9) a. What do you call someone who studies “too much” and tries very hard to impress the teacher?
         _______________________________________  (DT, q53)
      b. What do you call the building where people park their cars when they go shopping?
         _______________________________________  (Vancouver Survey, q29)

The examples in (7.9) elicit lexical items. Note that describing the lexical target, rather than asking “what is your word for brown-noser, keener etc.”, renders these questions indirect rather than direct. The example in (7.9b) is an open-response version of (7.4b). Open questions trigger more lexical variation than can be listed in closed questions. Closed questions, in turn, are answered more quickly and help focus the respondent’s attention on the variable. For finer semantic distinctions, open questions are often the most effective method. The examples below come close to a short-answer type of question, where respondents are invited to comment on a variable of interest.

(7.10) a. What is the meaning of hydro in Chris bought a few hydros down the street.
          _______________________________________  (Vancouver Survey, q21)
       b. Three interlinked questions on the take up variable  (Language Survey Vancouver-Washington State, 2011)

The target meaning of hydro in (7.10a) was a type of marijuana joint, and a mere six of 429 respondents identified this special, fringe-group meaning. Example (7.10b) is an expanded version of an open-response question. It first invites a paraphrase of the meaning of a sentence in an open text field, then asks who might use this sentence, and finally invites further, optional comment. One would dedicate such space (three of the recommended maximum of 100 questions) to a variable of prime interest that is semantically rather narrow, such as take up #9. For questions asked in a direct fashion or with binary answer choices (e.g. yes/no), it is recommended to invite elaborate comment. Pratt (1983) found that a respondent (or interviewee) can only be considered to actually know how to use a linguistic item if she is able to fully contextualize it. This method offers an alternative to inter-related questions by replacing some closed-response items with one open-response item.

Mixed-response types

One technique has been used in a number of studies and may be called “major response types plus other field”. It combines some of the benefits of closed-response questions (quicker answer submission and ease of analysis) with the benefit of open-response types (greater range of variants). The examples in (7.11) show the principle: one combines a list of the most prevalent response types with an “Other” field and invites the addition of unlisted terms.

(7.11) a. Which of the following names do you use for a waterway smaller than a river?

 crick  stream  brook  rill  snye  run  creek  other ________________________

(DT, q44)

b. Please select the word you would use most often in your everyday speech. Select more than one answer only if necessary. If the word you use is not listed, please write it in.  (NARVS instructions)

Example (7.11a) offers a common way to invite more than the listed variants (either on paper, as shown, or online with a text box). Sometimes, however, researchers choose to limit the variants, as the NARVS instructions in (7.11b) show. These instructions discourage respondents from selecting every variant they know (“Select more than one answer only if necessary”). At the same time, they ask respondents to write in their word if it is not listed. Both measures are an attempt to focus respondents’ answers. This can backfire, however, if respondents interpret the instruction too narrowly and refrain from offering unlisted variants they generally use.
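Coding such items means merging ticked options with write-ins. The minimal Python sketch below assumes that each response is stored as a pair: the set of ticked options and the free text from the “other” field. This storage format and the responses are invented. Write-ins are trimmed and lower-cased. A write-in that repeats a listed option is folded into that option, and each respondent counts at most once per variant. Real write-ins will also need manual normalization of spelling variants, which this sketch does not attempt.

```python
from collections import Counter

# Listed options of (7.11a); anything else written in counts as "other".
LISTED = {"crick", "stream", "brook", "rill", "snye", "run", "creek"}


def tally_mixed(responses):
    """Tally listed options and write-ins for a "major types plus other" item.

    Each response is a (ticked_options, other_text) pair; other_text may be "".
    Each respondent contributes at most one count per variant.
    """
    listed_counts = Counter()
    other_counts = Counter()
    for ticked, other_text in responses:
        words = {option.strip().lower() for option in ticked}
        written = other_text.strip().lower()
        if written:
            words.add(written)
        for word in words:
            if word in LISTED:
                listed_counts[word] += 1
            else:
                other_counts[word] += 1
    return listed_counts, other_counts


# Invented responses.
responses = [
    ({"creek"}, ""),
    ({"creek", "brook"}, ""),
    (set(), "Burn "),
    ({"stream"}, "burn"),
    ({"creek"}, "Creek"),  # write-in repeats a ticked option
]
listed, other = tally_mixed(responses)
print(listed.most_common())
print(other.most_common())
```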

7.3.2

From raw questions to questionnaire items

Now that the most common question types have been introduced, we need to address the types of linguistic variables that can be elicited, which are first treated as “raw questions”. An obvious way to arrive at raw questions is to read the literature and look at previous studies. For newer variables, the process is less straightforward: it is generally recommended to start with a “question repository” in which one collects potential variables for the questionnaire. Raw questions are only the first step and must not be confused with final questionnaire items. Turning raw questions into items involves analysing each question with the general research question in mind, which helps decide whether to include a particular variable. Once this decision has been made, the next step is to check the question type and then to explore different kinds of question wording. The quality of the questions is crucial, because questions that are not formulated in the best possible way will bias the results. In order to turn raw questions into questionnaire items, three principles should be kept in mind:

– Accessibility: respondents must have conscious access to the given linguistic feature.
– Clarity: the questions must be clearly and straightforwardly worded and tailored towards a diverse readership.
– Conciseness: the items must be short and to the point.

A speaker’s access to the polled linguistic feature is the most basic requirement. It needs to be assessed thoroughly for each question individually. The two examples below are typical WQ questions with binary answer choices and illustrate the principal type of accessibility assessment:

(7.12) a. Which do you say?
          Just between you and me, your aunt is often wrong.
          Just between you and I, your aunt is often wrong.  (Dialect Topography, q56)
       b. Does LEVER, as in ‘Pull the lever’, rhyme with clever or cleaver?  (Dialect Topography, q35)

While (7.12a) elicits responses on a syntactic (or phraseological) variable, between you and me/I, (7.12b) polls respondents on a pronunciation variable: the lexically conditioned choice of stressed vowel in the word lever. The binary answer choices in (7.12b) are designed to tackle phonemic variation in a given lexical item. It is important to assess carefully whether respondents have conscious access to this kind of information, which will involve trial runs for new variables. For (7.12a), respondents can be expected to give a reliable answer, as prescriptive grammar points such as this one have traditionally been at the forefront of grammar classes in compulsory schooling in western countries. With this question, the challenge (and the risk) is to get respondents to report what they actually say rather than what their English teachers told them in school. In other words, we need to assess whether a social stigma attaches to one variant or another and, if so, whether that stigma is strong enough to lead respondents to report what they “ought to be saying”. If it is, we would be accessing language attitudes rather than reported behaviour. In the Canadian context, we have seen in Chapter 4 that the formerly stigmatized construction between you and I is now increasingly heard in the Canadian media landscape. It therefore seems warranted to include this variable in a self-report survey, but any interpretation would need to keep in mind a possible bias towards underreporting between you and I. In (7.12b), the accessibility problem is different. For phonemic questions, the major pitfall is eliciting information that is phonetic, not phonemic, in nature.

Chapter 7.  Questionnaire design and data collection 247

For instance, if we were to ask our Canadian respondents whether they pronounce the vowel in the word ban with a lower, higher or equal tongue position compared with the vowel in bat, the results would be untrustworthy, because this variation is phonetic and generally not accessible to language users. It takes spectrograms and a large sample of pronunciations of each token (ban and bat in certain contexts) to show that Canadians raise the vowel (a higher tongue position) before nasals (in ban) but not before voiceless obstruents (in bat) (Boberg 2010: 144). Some respondents with a good ear might notice that the vowel of ban is higher than that of bat, but most people would be confused. Respondents cannot reliably report on phonetic variation, but they can report phonemic values, such as whether they pronounce lever to rhyme with cleaver or with clever. With phonemic questions, it is crucial to give reference points that are stable in pronunciation across age groups, social groups and dialect regions. In the current example, using either and never as reference points instead of cleaver and clever would not work, as the pronunciation of either varies between [i] and [aɪ]. For variables where situational variation plays a role, and where speakers might use both variants depending on the situation, WQs are not a good method of data collection.

Clarity of the item is another important aspect, and researchers may have problems assessing it objectively without outside help. Brown (2001: 46) locates the problem in researchers’ familiarity with their own questions and their purposes, which means that they “may not be able to spot ambiguities and double meanings”. One such example is shown in (7.13).

(7.13) Do you use the expression So that’s what he thinks, eh?
       yes   no   sometimes  (SCE, q24)

Example (7.13), from the 1972 Survey of Canadian English (SCE), lacks clarity because it asks respondents to assess two issues at once (a “double-barrelled question”): first, whether the construction with eh is used at all (yes/no) and, second, how often it is used (sometimes). It would have been clearer to ask only about general use (yes/no) and possibly to add the qualifier “do you use it in the right situations” to account for the social stigma that some uses of eh carry (e.g. Avis 1972; Gold 2008). As an early study, the SCE made more than one suboptimal choice, and most of its problem questions can be attributed to a lack of clarity. The variable chesterfield can serve to illustrate a number of potential problems in question design. In the SCE it was offered in the format shown in (7.14a):


(7.14) a. What do you call a piece of furniture that seats two or three people in a row and has upholstered arms and back?
          A. sofa
          B. chesterfield
          C. davenport
          D. by any other name  (SCE, q29)

b. What do you call the upholstered piece of furniture that 3 or 4 people sit on in the living room? _________________________  (DT, q2)

The SCE offers answer options, yet does not list the incoming variant couch; a focus-group interview with younger speakers would have produced that variant, which was first reported around that time (e.g. Gregg 1973). It is paramount to offer a description that is maximally transparent for as many participants as possible, ideally all of them. However, the description cannot be too precise or too cumbersome; instead, one must work with some degree of “common sense”. As this feature is culturally sensitive, multicultural societies pose a bigger clarity issue than first meets the eye. A pilot study that examines the question format is therefore absolutely necessary. One efficient way to conduct one is to have people read the questionnaire and comment on the questions in a Think-Aloud Protocol: Are they confused? If so, where is the problem? Are they clear on what is being asked of them? For chesterfield, Dialect Topography (DT) takes a different route, shown in (7.14b), with a slightly adapted question that seems to capture the essential features of the item: 3 or 4 people are seated (to avoid confusion with the variant love seat, which generally seats two people), and the typical location in a house is given (living room), while “clutter” is removed by not specifying which parts are upholstered. The DT question is also shorter, which is generally good.

Conciseness. As Brown (2001: 45) states: “As a rule of thumb, short questions are good questions”. As long as the target form is clearly identified, which is truer of (7.14b) than of the longer question in (7.14a), a question is detailed enough. The goal is to write the shortest question that unambiguously identifies the item for as many people as possible and sounds “natural”. This is easier said than done and will not always be accomplished.
Pilot testing with speakers of the different backgrounds found in the target group is an essential step towards finding the right balance between conciseness and detail (Section 7.3.7). Testing is a crucial part of all WQ design, and it is often only at this stage that difficulties with question phrasing, and differences between alternative phrasings, are discovered.



7.3.3 Self-reporting linguistic behaviour

The questions discussed so far in this chapter have all been traditional self-report questions on a respondent’s typical linguistic behaviour, with two exceptions: Example (7.3), which asks the respondent to report on her perceptions rather than her own use (see Section 7.3.4), and (7.8c), which includes questions about community use (“others might say but you wouldn’t”) (see Section 7.3.5). Since the reader is thus fairly familiar with traditional behaviour reporting, we need only highlight its possible limitations. We first address linguistic inventories and social correlations elicited in WQs, before looking at two more recent elicitation techniques from the social study of syntax that are not found in traditional WQs: translation and reformulation tasks, and grammaticality judgements with Magnitude Estimation Tasks.

Inventories vs. social correlations
Some findings suggest that WQs are not the ideal method for data collection when one aims to study the social correlates of stigmatized linguistic phenomena. Cornips (2002) shows, in a study of the Dutch verbal complementizers om and voor, that both forms are reported in the questionnaires, yet are distributed differently from natural speech. While inventories of variants may be established with WQs, their social distribution may be skewed. It is essential to note that om is the prestige variant and that voor is categorically avoided by educated respondents. The elicitation of socially stigmatized variables (here, a feature of non-educated speech) thus requires some fact-checking. The relationship of reported use to actual use is a matter of some debate. On the one hand, some researchers treat reported data as representative of the evaluation of linguistic features. On the other hand, reported use can be treated as indicative of some form of use that is only indirectly linked to actual use. Seiler’s statement seems closest to the nature of the data:

It is impossible to compare the informant’s linguistic behaviour in the interview with his/her ordinary speech. […] It seems that the indirect method yields reliable results as far as the geographical distribution of linguistic variables per se is concerned. However, its limitations are to be found in the exploration of the exact functional properties of the variants […]. (Seiler 2010: 516, boldface SD)

In other words, WQs generally produce very reliable inventories of variants, but they seem to offer only limited cues to respondents’ linguistic behaviour in concrete functional contexts. After all, the questions presented so far do not ask respondents to “rate” the social acceptability of particular constructions. Such questions would yield different results from the “What do you call/Which do you say” type we have used so far. WQs, then, do not always yield the full social picture of variant distribution, and any WQ study must be very aware of this limitation.

Socio-syntactic reformulations
Traditionally the domain of generative linguistics, grammaticality judgements are now used in more sophisticated versions in the new field of “social syntax”, or socio-syntax. These question types increase the versatility of WQs for the study of syntactic and morphological phenomena beyond the traditional types of questions. There are two basic question types. The first is a translation or “reformulation” task, in which a test sentence is given with the instruction to “translate” it into the variety under study. The test sentence can be given in the standard variety with the request to translate it into the target variety, as in the Syntaktischer Atlas der deutschen Schweiz (Syntactic Atlas of German Switzerland, e.g. Glaser 2000). With this method, we run into problems somewhat reminiscent of Wenker’s missing transcription system, because respondents are required to write down dialectal forms, whereas writing is usually done only in the standard variety. The results nevertheless appear more promising than Wenker’s, because the focus is on syntax, not phonetics:

(7.15) GerG  Ich sprach schon
             ‘I spoke already’
       AutG  I hob scho g’redt
             I have-AUX already spoken-PastPart
             (reden [g’redt] & sprechen [g’sprochen] are synonyms)

Example (7.15) illustrates the principle with a Standard German test sentence and its Austrian German translation. Even in the absence of a normed writing system, the difference in tense (preterit spoke vs. perfect have spoken) is clearly rendered, and it is immaterial which letters are used to indicate vowel quality. An interesting variant of the reformulation task is suggested in Buchstaller and Corrigan (2011: 32–3): by offering the test sentence in the vernacular and asking respondents to perform a linguistic task, some of the issues of prescriptive influence can be bypassed. Example (7.16) shows their instructions and a test sentence:

(7.16) You will hear and then see a question, and you will be asked to turn it into the equivalent statement that sounds natural to you.
       Training session
       Question: Was John’s friend Ian at the party?
       Statement: John’s friend Ian was at the party.
       Now please do the same for the following sentences:
       Question: Will it be Susie what presents the cheque?
       Statement: ___________________________________  (Buchstaller & Corrigan 2011: 33)




It is reasonable to assume that respondents who are familiar with the relativizer what, which is used in the Northern English and Scottish locales that Buchstaller and Corrigan study, will be more likely to produce it in the statement than if a standard English prompt were provided (e.g. Will it be Susie who presents the cheque?).
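How the written statements from a task like (7.16) are coded is not described here. A minimal sketch, assuming a hypothetical coding rule that takes the word directly after the clefted noun phrase (Susie) as the relativizer, could look as follows. The function name `code_relativizer`, the relativizer set and the sample answers are invented for illustration; this is not Buchstaller and Corrigan’s procedure.

```python
import re
from collections import Counter

# Hypothetical coding rule (an assumption for illustration): the relativizer
# is the word that directly follows the clefted noun phrase, e.g. "Susie".
RELATIVIZERS = {"what", "who", "that", "which"}


def code_relativizer(statement: str, focus: str = "Susie") -> str:
    """Return the relativizer used after `focus`, 'zero' if none, or 'uncodable'."""
    words = re.findall(r"[a-z']+", statement.lower())
    try:
        following = words[words.index(focus.lower()) + 1]
    except (ValueError, IndexError):
        # The focus noun phrase is missing, or nothing follows it.
        return "uncodable"
    return following if following in RELATIVIZERS else "zero"


# Invented example answers to the Susie prompt in (7.16).
answers = [
    "It will be Susie what presents the cheque.",
    "It will be Susie who presents the cheque.",
    "It will be Susie presents the cheque.",
]
tally = Counter(code_relativizer(a) for a in answers)
```

Keeping “uncodable” as a separate category, rather than folding it into “zero”, flags unexpected reformulations for manual inspection.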

Magnitude Estimation Tasks and grammaticality judgements
Socio-syntacticians also use an alternative approach to grammaticality judgements, the magnitude estimation task (MET) (Bard et al. 1996), which has been applied over the last decade or so. MET departs from traditional assessments of grammaticality on a binary (nominal) scale (yes or no). Inspired by psychophysical methods of measurement, METs offer respondents a reference sentence, which they are required to rate. It is up to the individual to rate this sentence 1, 5 or 100, as the value in isolation is immaterial. Only when further test sentences are rated in relation to the reference sentence do the numbers gain significance. An example is given in (7.17):

(7.17) Reference sentence: I’m going home and got an umbrella.
       Your rating: 10

       Now, please rate all sentences below in relation to the sentence above:
       Sentence                                           Your rating
       1. The man put his coat on a hanger.                18
       2. That’s what I hate, is that she’s always late     8
       3. I’m not going to eat nothing hot no more          5
       (Buchstaller & Corrigan 2011: 36)
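Because only relative values matter in an MET, one simple way to make ratings comparable across respondents is to divide each rating by that respondent’s rating of the reference sentence. The sketch below assumes this division is the normalization step (published MET studies often also log-transform such ratios, which is omitted here) and applies it to the ratings in (7.17); the function name `relative_scores` is hypothetical.

```python
def relative_scores(ratings: dict[str, float], reference: float) -> dict[str, float]:
    """Express each MET rating as a multiple of the reference-sentence rating."""
    if reference == 0:
        raise ValueError("the reference rating must be non-zero")
    return {sentence: score / reference for sentence, score in ratings.items()}


# Ratings from example (7.17); the reference sentence was rated 10.
ratings_717 = {
    "The man put his coat on a hanger.": 18,
    "That's what I hate, is that she's always late": 8,
    "I'm not going to eat nothing hot no more": 5,
}
scores = relative_scores(ratings_717, reference=10)

# A hypothetical respondent using a ten-times larger scale gets the same scores.
scaled = relative_scores({s: r * 10 for s, r in ratings_717.items()}, reference=100)
```

Sentence 1 then scores 1.8 and sentence 3 scores 0.5, i.e. the respondent judged sentence 3 half as acceptable as the reference sentence. The `scaled` call illustrates why the absolute value chosen for the reference sentence is immaterial.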

An advantage of this method is that it allows respondents to express subjective feelings of correctness. Sentence 3 was rated lower than the reference sentence because of its multiple negation; the respondent could have rated it as low as 1 or even given it a negative score, which could easily be accommodated in the analysis, since it deals with relative scores only. One version of the MET collects ratings in visual form: a line represents the scale of acceptability, and respondents are asked to mark their assessment on it, as shown in (7.18):

(7.18) Please use an ‘X’ to rate the acceptability of this sentence:
       I’m going home and got an umbrella.
       |-----------------------------------------|
       Now please do the same for the following sentences using ‘X’ again – this time to represent whether you think these are better or worse than the sentence above in bold:
       I really wants to buy those red shoes.
       |-----------------------------------------|
       Sometimes the girls thinks it’s boring.
       |-----------------------------------------|


The distance of each mark from the edges of the line is then measured in millimeters (Buchstaller & Corrigan 2011: 38). A powerful argument for this format is that some respondents may have problems handling the abstract (and open-ended) rating system shown in (7.17). The visual format thus allows respondents “with low numeracy skills”, who would otherwise fail the original MET, to take part (ibid: 37). These methods represent some of the new approaches to the study of dialect syntax in which WQs play an important role. By introducing gradient scales, they allow a reassessment of traditional WQ methodology. Questions such as John smokes a lot any more (is this something you would say/someone else would say/no one would say) would thus not necessarily need to be answered categorically. Combined with indirect reporting, i.e. reporting on community behaviour rather than personal use, such scales could ease prescriptive pressure on respondents (see Section 7.3.6).
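The millimeter measurements from a visual scale such as (7.18) can be turned into comparable numbers by expressing each mark as a position between 0 and 1 and comparing it with the reference sentence’s mark. A minimal sketch, assuming (hypothetically) a 100 mm line with distances measured from its left edge; the function names and measurements are invented for illustration.

```python
LINE_LENGTH_MM = 100.0  # hypothetical length of the printed line


def mark_to_proportion(distance_mm: float, line_length_mm: float = LINE_LENGTH_MM) -> float:
    """Convert a mark's distance from the left edge into a 0-1 position."""
    if not 0.0 <= distance_mm <= line_length_mm:
        raise ValueError("mark lies outside the line")
    return distance_mm / line_length_mm


def relative_to_reference(test_mm: float, reference_mm: float,
                          line_length_mm: float = LINE_LENGTH_MM) -> float:
    """Positive: test sentence marked further right than the reference sentence."""
    return (mark_to_proportion(test_mm, line_length_mm)
            - mark_to_proportion(reference_mm, line_length_mm))


# Hypothetical measurements for the sentences in (7.18).
reference_mark = 62.0  # I'm going home and got an umbrella.
test_marks = {
    "I really wants to buy those red shoes.": 40.0,
    "Sometimes the girls thinks it's boring.": 71.5,
}
differences = {s: relative_to_reference(mm, reference_mark) for s, mm in test_marks.items()}
```

Which end of the line counts as “better” depends on the printed instructions, so the sign convention here is only an assumption.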

7.3.4 Self-reporting language attitudes and perceptions
Language attitude studies and perception studies make explicit respondents’ underlying beliefs about language. WQs have been used so extensively to assess such beliefs that they can be considered the field’s primary data collection tool. In Chapter 5, for instance, we saw Jenkins’ assessment of teacher attitudes towards English as a Lingua Franca. The prominent role of WQs in this area has long been in the making: Mather and Speitel (1975: 25, fn 80) anticipated the use of WQs for “evaluation procedures, i.e. specific questions in which the informant is called upon to comment on his linguistic responses”. Attitude and perception studies tap into the beliefs and attitudes of respondents, which differs from reporting linguistic behaviour or knowledge about the linguistic forms found in a community. In the Canadian context, for instance, data on attitudes towards Canadian English have traditionally been hard to come by. Those studies that do exist, however, have successfully employed WQs (Owens & Baker 1984; Gulden [Halford] 1979; Warkentyne 1983).

Language attitudes
Language attitude studies aim to elicit the sociolinguistic evaluation of a feature, not its use. For instance, we might ask someone to “rate this construction for pleasantness” or the like and offer a scale from “very” to “not at all”, whereas usage-based self-reporting relies on the typical questions discussed in Section 7.3.3. The study of language awareness is a central part of the WQ tradition and of sociolinguistics generally, since social evaluations of a linguistic feature are an intrinsic part of any explanation of linguistic change (Weinreich, Labov & Herzog 1968). William Labov, among others, has been very clear in making this point:




The speech community is not defined by any marked agreement in the use of language elements, so much as by participation in a set of shared norms; these norms may be observed in overt types of evaluative behaviour, and by the uniformity of abstract patterns of variation which are invariant in respect to particular levels of usage. (Labov 1972: 120–1)

Put plainly, membership in a speech community is defined not by the simple notion that people speak the same, but by the more abstract notion that they evaluate communal linguistic variation similarly and share linguistic patterns. It is here that attitude studies provide valuable insight. One attitude result, from a Vancouver sample of 429 respondents (2009), is shown in Figure 7.2. The data were collected with question (7.19):

(7.19) Is there a Canadian way of speaking?
       Strongly agree   Agree   Somewhat agree   Somewhat disagree   Disagree   Strongly disagree

The question is deliberately vague: rather than specifying details, we primarily wanted to gauge the ‘feelings, wishes, or attitudes’ of the respondents as expressed in a Likert scale rating. Figure 7.2 shows the influence of post-secondary education, with the answers of university-educated respondents in the left-hand bars and those of non-university-educated respondents on the right. The strongest believers in Canadian linguistic traits are found among the university-educated (columns 1 “strongly agree” and 2 “agree”). In column 3 “somewhat agree”, the non-university-educated take over, and they also lead the sceptics in columns 4 “somewhat disagree”, 5 “disagree” and 6 “strongly disagree”.

[Figure 7.2: bar chart titled “Canadian Way of Speaking and Education”; series Edu>=4 and Edu>=3; y-axis: proportion of respondents (0 to 0.4); x-axis: answer categories 1 (strongly agree) to 6 (strongly disagree)]

Figure 7.2  Answers to “Is there a Canadian way of speaking?” by education (left: university education, right: up to community/trade college)
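Figure 7.2 compares how answers are distributed within each education group. A minimal sketch of that tabulation follows, assuming each respondent contributes one (group, answer) pair and that the bars show within-group proportions (an assumption; the exact computation behind the figure is not described). The group labels, the function name and the responses are invented for illustration.

```python
from collections import Counter

LIKERT = ["Strongly agree", "Agree", "Somewhat agree",
          "Somewhat disagree", "Disagree", "Strongly disagree"]


def proportions_by_group(responses):
    """Map (group, answer) pairs to each group's share of answers per Likert category."""
    counts: dict[str, Counter] = {}
    for group, answer in responses:
        if answer not in LIKERT:
            raise ValueError(f"unexpected answer: {answer!r}")
        counts.setdefault(group, Counter())[answer] += 1
    # Shares are listed in scale order, from "Strongly agree" to "Strongly disagree".
    return {
        group: [c[label] / sum(c.values()) for label in LIKERT]
        for group, c in counts.items()
    }


# Invented responses, for illustration only.
sample = [("university", "Strongly agree"), ("university", "Agree"),
          ("college", "Somewhat agree"), ("college", "Agree"),
          ("college", "Disagree"), ("university", "Agree")]
shares = proportions_by_group(sample)
```

Computing shares within each group, rather than over the whole sample, keeps groups of different sizes comparable.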


Perceptions
As mentioned in Chapter 6, Wallace Lambert’s (1960) study is generally considered the starting point of speaker-assessment studies. How linguistic varieties are perceived can be assessed in a number of ways (see Giles and Billings 2004 for a succinct account of the field’s genesis). The most famous method is Lambert and associates’ matched-guise technique: bilingual speakers, speaking once in French and once in English, provided audio samples, and listeners rated the person behind each voice on a number of character traits, such as good looks, self-confidence, kindness, ambition, likability and even body height. The method was soon extended to varieties of the same language, and a strong research tradition developed from it. A more recent extension of speaker-based (in this case hearer-based) assessment is “perceptual dialectology”, pioneered in Preston (1989) and subsequent publications, which has developed a WQ methodology with ratings and map-based response types to elicit perceptions of dialects. This methodology puts the regional perspective on language front and centre, but it deals not with the “factual” distribution of dialects and dialect regions but with how they are perceived and constructed. Illustration 7.1, for instance, shows the information gathered via map-delimitation and labelling tasks, in this case from a young southern Michigan student.

Illustration 7.1  Raw material from Preston’s map task (18-year-old southern Michigan female student) (Preston 2005)

The region delimitations, and even more so the region labels, show that the method grants access to “the ordinary speaker’s understandings of language variation” (Preston 1989: 2). This ethnolinguistic (macrolinguistic) approach to linguistic perception can probe into popular taxonomical aspects of a language or variety, the social characteristics that are “overly regarded by a speaker as supporting linguistic differences” and an ordinary speaker’s belief in geographical language differences (ibid). All of these are present in the WQ answers in Illustration 7.1, from “Hillbillies: Hick talk” to Canadian slang “Ey” and the “society freaks” on the East Coast: the answers mix perceptions of people and of their varieties. When such data are analyzed consistently over a large sample, very coherent pictures of perceived dialects emerge, as we will see in Illustration 7.2. Preston’s approach has been applied to Canada in McKinnie and Dailey-O’Cain (2002), whose goal was to document young Ontarians’ and Albertans’ “perceptions of the cultural and linguistic make-up of the country, as well as their attitudes towards English speakers in other geographical areas in Canada” (p. 277). Following Preston’s (1989) methodology, respondents were asked to rate each province’s or territory’s variety for perceived ‘correctness’, ‘pleasantness’ and similarity to their own variety on a scale from 1 (most) to 5 (least). A possible attitude elicitation question is shown in (7.20):

(7.20) Rate the English of Canada’s provinces and territories on a scale from 1 “most correct” to 5 “least correct”:

       British Columbia   1   2   3   4   5
       Alberta            1   2   3   4   5
       Manitoba           1   2   3   4   5
       […]

In part two of their study, the authors provided respondents with a map of Canada showing only provincial and territorial boundaries and asked them to “demarcate areas in which they believe people speak the same as each other but differently from people in other areas of the country” (p. 279). The study yielded three types of data: quantitative data on correctness, pleasantness and ‘sameness’; quantitative data on the regions identified in the map task (considering all regions identified by at least 10% of respondents); and qualitative data from the labels given to the dialect regions. The results show some striking consistencies. For instance, young Albertans view their own province’s English as pleasant, with a mean rating of 2.05 (Table 7.3). BC English, however, is considered even more pleasant (1.95), while Ontario English is rated comparatively low (2.63). This is surprising, as Ontario is the traditional economic powerhouse of the country, and intuition would lead one to expect a more favourable assessment.
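The 10% criterion for the map task can be sketched as a simple filter. This is a minimal sketch, assuming the hand-drawn areas have already been coded into region labels, one set per respondent (a substantial step, not shown, that in practice involves overlaying and digitizing the maps). The function name, labels and data are invented for illustration.

```python
from collections import Counter


def regions_above_threshold(drawings, threshold=0.10):
    """Return {region: share of respondents} for regions drawn by >= threshold of respondents.

    `drawings` holds one collection of region labels per respondent.
    """
    n = len(drawings)
    if n == 0:
        return {}
    # set() ensures a respondent who drew the same region twice counts once.
    counts = Counter(region for regions in drawings for region in set(regions))
    return {region: k / n for region, k in counts.items() if k / n >= threshold}


# Invented data: 20 respondents whose drawn areas were already coded by region.
drawings = ([{"Quebec", "Newfoundland"}] * 8
            + [{"Quebec"}] * 11
            + [{"Yukon"}])
kept = regions_above_threshold(drawings)
```

Here Yukon, drawn by only 1 of 20 respondents (5%), falls below the threshold and is dropped.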

Table 7.3  Perceived pleasantness of dialects (McKinnie & Dailey-O’Cain 2002: 280)

Albertan respondents’ perceptions of “pleasantness”
Region   Mean   s.d.
BC       1.95   1.09
AB       2.05   1.09
MB       2.33   1.07
SK       2.39   1.09
NB       2.59   1.07
PE       2.60   1.08
ON       2.63   1.19
NS       2.64   1.11
YT       2.68   1.14
NT       2.77   1.12
NL       2.89   1.29
QC       3.06   1.22

Ontarian respondents’ perceptions of “pleasantness”
Region   Mean   s.d.
ON       2.17   1.20
BC       2.22   1.12
AB       2.57   1.05
PE       2.67   1.11
SK       2.68   0.93
MB       2.69   0.95
NB       2.76   1.06
NS       2.81   1.06
NL       2.94   1.35
YT       2.95   1.02
NT       3.02   1.07
QC       3.07   1.25
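Each half of Table 7.3 lists provinces and territories by ascending mean rating, together with the standard deviation. A minimal sketch of how such a summary can be derived from individual 1-to-5 ratings follows; it assumes the sample standard deviation (Python’s `statistics.stdev`), which may differ from the original computation, and the function name and ratings are invented for illustration.

```python
from statistics import mean, stdev


def summarize_ratings(ratings_by_region):
    """Return (region, mean, s.d.) rows sorted from the lowest (most pleasant) mean upward."""
    rows = [(region, round(mean(r), 2), round(stdev(r), 2))
            for region, r in ratings_by_region.items()]
    return sorted(rows, key=lambda row: row[1])


# Invented 1-5 ratings from four hypothetical respondents.
ratings = {"QC": [3, 4, 2, 3], "BC": [1, 2, 2, 3], "AB": [2, 2, 3, 1]}
table = summarize_ratings(ratings)
```

Because Python’s sort is stable, regions with equal means (BC and AB here) keep their input order.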

Ontarians, by contrast, rate their own English as the most pleasant (2.17), followed by BC English (2.22), and they are more generous towards Albertan English (2.57) than Albertans are towards Ontario English (2.63). They share the Albertans’ low opinion of Quebec English, but generally assign worse scores than the Albertans do. In terms of perceived “correctness”, BC English is considered the most correct by both Albertans and Ontarians (ibid: 283), with each group placing its own province’s variety second. Third place goes to Ontario among Albertans and to Alberta among Ontarians. At the other end of the spectrum, Newfoundland and Quebec English are considered the second least correct and the least correct varieties, respectively. McKinnie and Dailey-O’Cain (2002: 284) remark that the ratings of Newfoundland and Quebec English “are only slightly higher than three, on a scale where three is the mid-point, which indicates that [Canadians] find these varieties neither particularly ‘correct’ nor particularly ‘incorrect’.” The analysis of the map-drawing and labelling task affords a more nuanced view of the perception of dialect regions in Canada, which may be compared and contrasted with the regional dialect zones derived from linguistic data in Table 6.10 (p. 218). Illustration 7.2 shows the accumulated responses for boundaries drawn by at least 10% of respondents. Together with the labels assigned to these regions, this yields very interesting insights into perceptions of Canadian English. The most frequently identified regions are Quebec and Newfoundland, which are “simply the most distinct forms of English in Canada” (ibid: 292); for Quebec, respondents commented on the L2 English of francophones with labels like “Franglais” or the extremely pejorative “Bad French Bad English”. Ontario is singled out next, as are the Maritimes (New Brunswick, PEI and Nova Scotia) and the Territories. The latter are perceived collectively because their English appears to be “Influenced by First Nations”, as one respondent put it. BC and Alberta are each set apart by 10–25% of respondents, and the same percentage recognizes a Southern BC variety and a Southern Ontario variety. While Southern BC English was labelled “Canada’s most correct English”, “more proper” and “normal”, the Southern Ontario region received less favourable attributes, such as “American style” or “Industrial Northern States type accent” (ibid: 290–1).

[Illustration 7.2: map of Canada; legend: percentage of respondents drawing boundary]

Illustration 7.2  Perceived dialect regions by 100 Ontarians and 100 Albertans (McKinnie & Dailey-O’Cain 2002: 291)

These perceptions are clearly relevant, not only to the extent that they inform decisions about language in real-world contexts, but also as part of the description of a sociolinguistics that places social aspects and interactional issues at the centre. Traditional WQ data, for instance, would offer no means of gauging the radically different assessments of the Southern BC and Southern Ontario varieties. In addition, McKinnie and Dailey-O’Cain point out that the Englishes of the three economically most powerful Canadian regions, BC, Alberta and Ontario, were “consistently rated most highly in terms of ‘correctness’”. This is an important sociolinguistic finding that is matched in other varieties, though it is perhaps less obvious in the Canadian context.

7.3.5 Community-reporting of linguistic behaviour

While self-reporting questions have been used most frequently in the Canadian context, the reporting of general community-wide linguistic behaviour has a long tradition, going back to Stadler’s early 19th-century dialectological approaches and, of course, Wenker’s method of sentence transcriptions by school teachers. So far, we have seen community-reporting questions interspersed with self-reporting questions, worded in ways such as the following:

Do you use the construction under the right circumstances?
Do you know of others who use it?
No one you know uses the construction.

In contrast to self-reporting, community-reporting questions target not the linguistic behaviour but the knowledge of the respondent: does s/he know anyone who uses the variable? In Dillman’s (1978) typology, they are thus knowledge questions rather than behavioural questions and need to be framed accordingly. Most recently, Buchstaller et al. (2013) rediscovered community reporting, framing questions on non-standard grammatical items in terms of community use in order to circumvent the social stigma attached to some variables. In other words, rather than asking whether a person uses the construction They wants more cookies, they asked whether people in the community use the construction, on a scale of grammatical acceptability, as shown in (7.21):

(7.21)

       1: This type of sentence would never be used here – it seems very odd.
       2: This type of sentence is not very common here but it doesn’t seem too odd.
       3: I have heard this type of sentence locally but it’s not that common.
       4: People around here use this type of sentence a lot.

       Please rate these sentences as described above.
       The local supermarket got robbed and the police were looking for a witness. They were asking a group of children whether they had seen anything. Suzie pointed at a little girl. She said ‘That’s the girl seen it’.
       1---------------2---------------3---------------4

(Buchstaller et al. 2013: 95)

Using WQs, Buchstaller et al. (2013) assess both morphosyntactic and phonological features, obtaining detailed data on phenomena for which such data had previously been unavailable. Importantly, they approach non-standard syntactic and morphological phenomena more indirectly than earlier studies: rather than asking whether respondents (or people they know) use a non-standard construction, they ask about behavioural norms in the community. This indirect approach seems poised to open new avenues for the successful use of WQs in social dialectology.



Chapter 7.  Questionnaire design and data collection 259

As a positive side effect, providing non-standard cues is one of the simplest and most effective means available to mitigate standard-language interference. At least until the advent of digital and social media communication, written language has generally been associated with more standard-like behaviour, an association that leads respondents to give more formal responses. With non-standard cues, this kind of standard interference is reduced as far as the WQ method allows. There are further methods to reduce prescriptive influence in WQs, to which we turn next.

7.3.6 Mitigating prescriptive influence: Framing the questions

Standard sociolinguistic practice has long favoured the study of observed linguistic behaviour, whose (elusive) goal is to observe natural speech situations, usually via audio recordings, in order to capture unmonitored, vernacular style. Observation, however, is only possible with an observer present, which increases the likelihood of linguistic monitoring. A written, self-reporting questionnaire does not pose quite the same problem, although respondents know full well that somebody is going to read their answers at some point, and may even assume that the person handing them the questionnaire will do so. Still, while we face a kind of implied Observer’s Paradox, the fact that WQs do not ask for names, addresses or phone numbers, and are thus anonymous, should instill some level of confidence. If the questionnaire is returned by mail, the respondent is not asked to provide a return address; in an online format, IP addresses would not be saved or exported into the analysis sheet. In this section we explore some verbal and non-verbal measures that help mitigate prescriptive influence.
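The principle of not storing identifying data can be made concrete at the point where online responses are exported for analysis. The following is a minimal Python sketch, assuming that each response arrives as a dictionary; the function `anonymize` and the field names (`ip_address`, `email`, `name`, `phone`) are hypothetical and depend on the survey platform used. Where a platform allows it, not recording such fields in the first place is preferable; the filter is only a safeguard for the analysis sheet.

```python
# Hypothetical names of identifying fields; adapt them to the export format of
# the survey platform actually used.
IDENTIFYING_FIELDS = {"ip_address", "email", "name", "phone"}


def anonymize(responses):
    """Return new response records with all identifying fields removed.

    Each response is a dict mapping field names to answers. The input records
    are left unchanged, so the raw export can be deleted separately.
    """
    return [
        {field: value for field, value in record.items()
         if field not in IDENTIFYING_FIELDS}
        for record in responses
    ]


# Hypothetical raw export (203.0.113.7 is a documentation-only IP address)
raw = [{"ip_address": "203.0.113.7", "q1": "washroom", "age_cohort": "20-29"}]
print(anonymize(raw))  # [{'q1': 'washroom', 'age_cohort': '20-29'}]
```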

The role of instructions

WQs pose decidedly different methodological challenges from interviews. As respondents read linguistic questions, they interpret them against their various backgrounds and select answers from a set of choices (yes/no questions, sets of answer choices), sometimes filling in their own response(s) to open-answer items. The written medium and its association with schooling are a complicating factor, as some respondents will inevitably select the more formal choices that researchers generally do not want to elicit. To counter such behaviour as far as possible, respondents are informed about the purpose of the questionnaire, which is the point at which a statement against prescriptive influence is warranted. Dialect Topography, as seen in its questionnaire on the webpage (note 28: [22 August 2014]), informs respondents about the general purpose of the study – words and their pronunciation in

260 The Written Questionnaire in Social Dialectology

Canada and bordering U.S. regions; more importantly, respondents are asked to report on what they ‘say when among friends’ and not on what they think is ‘proper’ linguistic etiquette. With these instructions, one aims to circumvent the effect of schooling (“good grammar” vs. “bad grammar”) and the avoidance of linguistic behaviour that some speakers consider undesirable. DT also asks respondents not to ponder their answers, pointing out that gut reactions usually work best (third paragraph). The idea is that snap judgements produce more authentic assessments than carefully crafted and reasoned responses. In a later version, the B.C. Linguistic Questionnaire developed by Polson included quite elaborate and informal instructions, quoted in Illustration 7.3 from Stevenson (1976: 84–5). The informal style and directness of Robert Gregg’s instructions (Gregg was Stevenson’s and Polson’s supervisor and director of the BC Survey) could hardly be more pronounced. They clearly show that the “city boys” (in 1970s generic masculine wording) and “girls” were aiming to reduce prescriptive influence. In a WQ today one would probably word the instructions more concisely, but the general tone would still work.

INSTRUCTIONS FOR THE B.C. LINGUISTIC QUESTIONNAIRE
Please read these instructions before trying the questionnaire.
This is NOT a test. We do NOT presume to judge people’s speech habits; we merely record them. So when you answer the questions,
   PLEASE put down what you actually say.
   Do NOT put down what you think you SHOULD say.
   Do NOT put down what your friends and relations think you should say.
We picked YOU as our informant and we want YOUR answers. Don’t let anyone else tell you what to put down. If someone else wants to add his two-bits worth and you think the information might be interesting, add a note to your answer telling who the other person is and where he comes from. But PLEASE don’t let the other person influence your answers.
It isn’t very likely that you’ll be able to answer all the questions. Don’t worry about it. The fact that you don’t use a particular word can be just as important as the fact that you do. […] Furthermore, we’re city boys. If you would assume that we’re shockingly ignorant about most things, and try to set us straight, or give explanatory notes, or add information, it would be helpful (and don’t worry about your writing – we’re not snobs). […] [examples of questions follow]

Illustration 7.3  Instructions from Gregg’s BC Linguistic Survey (Stevenson 1976: 84)

While all WQs use some sort of instructions, their effect is less clear. Instructions to linguistic surveys have been found to offer only a limited guarantee that respondents will observe them. Reporting on a study of grammaticality judgements, Schütze (1996: 133) concludes, somewhat disappointingly, that “the exact contents of those instructions might not matter a great deal”. One set of instructions “invoked English professors marking term papers, the other emphasized the absence of right or wrong answers and appealed to personal reactions”, yet the different instruction sets “turned out to have almost no effect on the pattern of responses” (ibid.). Such results are mirrored in the social science literature, where it is widely acknowledged that “many respondents do not read the entire content of questionnaires in a thoughtful way” (Dillman 2000: 81). While we would still want to offer instructions, we might consider supplementing them with indirect verbal cues and visual aids to elicit the desired low-formality responses.

Using informal language in the questionnaire

Since instructions have been shown to have only a small effect on the results, we may turn to other aspects of the questionnaire to elicit non-standard responses. One of the most effective ways to do so is to use informal language in all texts offered: in the consent email, in the instructions and in the questions themselves. In the Canadian context, Polson’s (1969) study is credited with establishing a model for an informal question tone. Examples include:

(7.22)
a. We aim to discover → we’d like to find out
b. Dear participant → Hi there (depending on the context and target group)
c. My residence is different from yours → My house is different than yours

In short, the goal is to mirror the informality of the spoken language of a given region and target group, insofar as the written medium allows. As (7.22a) shows, lexical choice tends towards the informal (the phrasal verb find out instead of the Latinate verb discover), and contractions (’d) are common. Terms of address are more informal, as in (7.22b), insofar as social conventions allow. The examples in the questionnaire items should also remain within a common core vocabulary (house instead of residence, 7.22c). Another way to address the formality of the WQ text itself is to write the questionnaire with the reading skills of a Grade 9 student in mind.
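The text does not specify how the Grade 9 target should be checked. One widely used operationalization, assumed here and not part of the author's method, is the Flesch-Kincaid grade level, computed from counts of words, sentences and syllables in the questionnaire text. The worked example uses invented counts:

```latex
\[
\mathrm{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}}
              + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
\]
% Worked example with hypothetical counts: 100 words, 8 sentences, 140 syllables
\[
0.39 \times \frac{100}{8} + 11.8 \times \frac{140}{100} - 15.59
  = 4.875 + 16.52 - 15.59 = 5.805 \approx \text{Grade } 6
\]
```

A score at or below 9 would meet the target. Since automatic syllable counts are approximate, such a score is best treated as a rough guide alongside pilot testing, not as a guarantee of readability.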

Reliability

Some studies examine respondent consistency by presenting the same item twice. When asked to judge the same sentence again, some respondents behave somewhat erratically. Carden (1973) found that, on scales of five or six points, 87 to 92 percent of second ratings fell within one point of the first rating; the figure falls to 67% if any change counts as inconsistent and rises to 97% if only “radical” changes are counted. Greenbaum and Quirk’s (1970: 43) test of individual consistency in grammatical performance and evaluation tasks found a “very high level of consistency”, with 84 and 82 percent matching results.
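Carden's figures are percentages of agreement between two ratings of the same item, computed with different tolerances: exact identity (the 67% figure, where any change counts) versus a difference of at most one point (87 to 92%). A minimal Python sketch of this calculation follows; the function name `agreement` is hypothetical and the ratings are invented for illustration. The text does not define the threshold for a “radical” change, so the tolerance is left as a parameter.

```python
def agreement(first, second, tolerance=0):
    """Percentage of paired ratings that differ by at most `tolerance` points.

    `first` and `second` hold each respondent's rating of the same item on the
    first and the second presentation, in the same respondent order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return 100 * matches / len(first)


# Hypothetical ratings of one sentence by ten respondents on a five-point scale
first_round = [1, 2, 2, 3, 5, 4, 1, 2, 3, 4]
second_round = [1, 3, 2, 3, 4, 4, 2, 2, 5, 4]

print(agreement(first_round, second_round))               # exact agreement: 60.0
print(agreement(first_round, second_round, tolerance=1))  # within one point: 90.0
```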


The overall message seems to be that ratings are reliable within a given range, but not as absolute categories. This finding aids the interpretation of gradient data and motivates collapsing answer classes: on a six-point scale, for example, reporting results in three groups (the two most extreme classes each combined with a more neutral class) or even in two groups should yield more reliable results overall. To increase consistency further, one could opt for binary rather than gradient variables, or reduce gradient answers to two or three basic categories. This “safe” choice has a downside, however: a number of statistical tests that are common in social evaluation studies would no longer be applicable. The choice is probably best made in a pilot study for a particular set of variables and variant options (see Carden 1976: 104), since little work has been done on this important matter so far; studies that apply different methodological approaches to the same variables and social contexts are the exception. Such comparative studies, however, would be indispensable for making good judgement calls.
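Collapsing answer classes amounts to a simple recoding step before analysis. The Python sketch below assumes one possible grouping of a six-point scale (adjacent classes merged into three or two groups); the grouping, the labels and the name `collapse` are illustrative assumptions, not the scheme used in any of the studies cited.

```python
from collections import Counter

# One possible grouping of a six-point scale (an assumption for illustration):
# adjacent classes are merged so that small shifts inside a group no longer count.
THREE_GROUPS = {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high", 6: "high"}
TWO_GROUPS = {1: "low", 2: "low", 3: "low", 4: "high", 5: "high", 6: "high"}


def collapse(ratings, scheme):
    """Map each 1-6 rating to its coarser category under `scheme`."""
    return [scheme[r] for r in ratings]


ratings = [1, 2, 2, 3, 5, 6, 4, 2]  # hypothetical responses to one item
print(Counter(collapse(ratings, THREE_GROUPS)))
print(Counter(collapse(ratings, TWO_GROUPS)))
```

Collapsing reduces but does not remove inconsistency: a respondent who moves from 2 to 3 between presentations still crosses a group boundary.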

Some tentative insights: How to ask and how better not

Pilot testing of single draft questions (raw questions) is a very effective way to learn about the needs of the target population. The following findings are based on 20 respondents from fairly homogeneous subgroups of the target group (e.g. university students, retirees and the like) and were collected in the ENGLISH 489 majors seminar in the fall term of the 2014–15 academic year. Each question was handed to respondents on a slip of paper with the instruction to “answer this question”, as in the examples in (7.23):

(7.23)
a. Do Marry and Mary sound the same?   Yes   No
b. What do you call the toilet facilities in public places, such as airports, restaurants or shopping malls?   bathroom   restroom   washroom   lavatory   other (please specify)
c. Complete the sentence: He didn’t mean to bump into you, he did it _____ accident.   on   by   (Tawnie Chambers 2014)




d. Please read the following sentence, then select the statement that is most applicable.
   It is stressful not to know what to do.
   This type of sentence would never be used in my community – it seems odd.
   This type of sentence is not very common in my community, but it doesn’t seem odd.
   I’ve heard this type of sentence in my community, but it is not that common.
   People in my community use this type of sentence a lot.
   (Hirota 2014)

The examples in (7.23) represent different stages in the development of questionnaire items from a first hunch or “raw question”. We will address their strengths and weaknesses, as identified in the pilot test phase, one by one. Example (7.23a) has, in one form or another, been asked for a long time in the Canadian and American contexts. Everybody understood the text (clarity), but some asked the person administering the question whether the two words were supposed to sound the same (Jeff Ashkinasi). Since it is one of three angles on the merger of the front (first) vowel in merry, marry and Mary in CanE, it would need to be followed up with questions on Mary and merry and on merry and marry, preferably after some other questions and with the instruction not to go back and “align” or correct previous answers. Interestingly, once the overall purpose of the question was clear (being part of a set of questions), the lay respondents suggested grouping the questions together. This, however, is not recommended, as it would entice respondents to draw comparisons based on spelling, causing them to assign different phonetic values to different spellings, a common phenomenon in highly literate societies. Example (7.23b) did not trigger many problems: 18 respondents answered with the Canadian majority variant washroom, while bathroom and restroom were each reported once (Christopher Cheng’s report). It is crucial for this lexical variable, as for others, to offer a description that is precise enough without being awkward or bulky. In this context, most Canadians would call a room with a toilet at home a bathroom, but such a facility in a public place a washroom. Rookie researchers are not the only ones prone to errors of that type. More generally, lexical questions with a well-researched set of variant options raise very few problems.
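Once the three pairwise questions on Mary, marry and merry have been asked separately, a respondent's answers can be checked for logical consistency: if two pairs are reported as sounding the same, the third pair should be too. The following Python sketch is an illustrative addition, not a procedure from the studies described; the function `classify` and its labels are hypothetical. An inconsistent answer set may point to spelling-based responses of the kind just mentioned, or simply to uncertainty, and would merit a closer look rather than automatic exclusion.

```python
from itertools import combinations

WORDS = ("Mary", "marry", "merry")


def classify(same):
    """Summarize three pairwise 'Do X and Y sound the same?' answers.

    `same` maps each unordered word pair (a frozenset) to True ("yes") or
    False ("no"). Because "sounding the same" is transitive, exactly two
    "yes" answers cannot be consistent and are flagged for inspection.
    """
    merged = [pair for pair in map(frozenset, combinations(WORDS, 2)) if same[pair]]
    if len(merged) == 3:
        return "three-way merger"
    if len(merged) == 2:
        return "inconsistent"
    if len(merged) == 1:
        return "merger of " + " and ".join(sorted(merged[0]))
    return "three-way distinction"


# Hypothetical answers of one respondent
answers = {
    frozenset({"Mary", "marry"}): True,
    frozenset({"Mary", "merry"}): False,
    frozenset({"marry", "merry"}): False,
}
print(classify(answers))  # merger of Mary and marry
```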
Jasmine Chen reports on a similar type of question about the name of the sports class in school (phys-ed, gym, P.E., other [please specify]): 18 of 20 respondents answered the question quickly, without raising any questions or reporting any problems when asked. Of the other two, one respondent wanted to know whether the current name or the name used during her time in school was wanted, presumably commenting on a change in progress she had noticed; the other volunteered a suggestion on how to make the question less redundant in one minor point. Other comments ranged from “Oh, this is a good question” to “It’s definitely called [P.E.] here [in Vancouver]”. Such comments offer valuable insights into possibly missing variants and provide leads. One point that arose in a number of contexts, though by no means in all, was that respondents in the Canadian context were generally interested in the linguistic question and curious as to what it was used for. This interest can and should be harnessed to generate interest in WQs. The most problematic kind of feedback, however, was that respondents generally tried to figure out “the right answer”, a very important reminder that every effort must be made to remind respondents that their own choices are wanted, not any “right”, prescriptively sanctioned ones. It is therefore paramount to include proper instructions, despite the sometimes inconsistent effects suggested by the limited research discussed above. Questions (7.23c) and (7.23d) explore new variables and are therefore treated separately. Undergraduate student Tawnie Chambers explored question (7.23c), while MA student Tomoharu Hirota used WQs as one of several tools to explore split infinitives (7.23d). Tawnie’s hunch that there might be variation in the preposition, with traditional by beginning to be rivalled by on, is an interesting case. Her pilot study showed that all answers for innovative on accident came from Americans, while traditional by was chosen by all Canadians, one Brit and two L2 speakers of English. As a consequence of respondent feedback, the question was shortened to (7.24):

(7.24) Which do you say?
He did it by accident.
He did it on accident.
Other: __________________________

Because the sentence is unambiguous, the context scenario was felt to be unnecessary. If the target group consists of, for instance, learners at a lower level of competence, the fuller context would be helpful. These considerations obviously depend on the situational and social context and need to be taken into account and tested in careful pilot runs. Tomoharu Hirota’s quest to assess the rise of the split infinitive, the erstwhile pet peeve of many self-declared grammar mavens, took him from a traditional corpus study to exploring WQs for that purpose. The original rationale for not marking the split infinitive visually was to avoid drawing attention to a construction that is potentially heavily stigmatized, or at least prescriptively dispreferred, in some social contexts. An interesting question is how to deal with respondent feedback: many respondents, after being informed of the target split infinitive, suggested marking it visually. If this suggestion is taken up, however, one needs to be clear about the repercussion, namely an effect of increased under-reporting that is difficult to quantify.




As shown above, Buchstaller et al. (2013) deliberately chose a community-reporting frame (“in your community”) in order to allow respondents to report on less prestigious forms without the risk of losing face, which is a factor even in anonymous surveys. However, the pilot study for (7.23d) revealed confusion about what constitutes the “community”: one’s place of origin or one’s campus (in the case of university students), among other things. This led to the use of the phrase “in your circle of friends” rather than “in your community”. On the one hand, the group is better defined; on the other, the stigma-avoiding effect is lost, as one is clearly part of one’s own circle of friends. This kind of reasoning needs to be made explicit and then weighed. Tomoharu decided to put (7.25) on the survey:

(7.25)
Have you heard the underlined construction?
It’s fine to not know everything.
a. I’ve never heard this type of construction among my friends.
b. I’ve rarely heard this type of construction among my friends.
c. I’ve heard this type of construction among my friends sometimes.
d. I’ve heard this type of construction among my friends a lot.

Whatever the decision, there will be trade-offs: what is gained in one dimension (e.g. less ambiguous contextualization of the question) is at times lost in another (e.g. reduced stigma avoidance). These few examples illustrate the principle behind raw question testing. In addition to testing single questions, it is equally important to test the entire WQ with a few people from different social groups. After they have completed the questionnaire, possibly in two or three sections, the researcher needs to follow up with questions on the “flow” of the questionnaire and on any issues that seem confusing, ambiguous or simply not applicable to a particular person. It is also a good idea to check that the WQ is not too onerous, is perhaps even entertaining, and is not “bothersome” in any way unless that is intended. As WQ designers we need to give respondents as little reason as possible to discontinue our WQ or, even worse, not to start it.

Defining evaluative categories

One issue that is more often than not disregarded is the need to offer brief definitions, or at least contextualizations, of one’s evaluative categories. In evaluation tasks, criteria such as “acceptable” or “grammatical” are used frequently. There is good evidence that these criteria should be defined, as respondents interpret them in very different ways (Carden 1970). Schütze (1996: 132–3) lists a host of studies that concur with Carden’s point, yet notes the sorry state of WQ research in this respect. He reasons that “if we were to ignore all studies in which we believe the instructions to subjects were inadequate to convey the subtlety of a linguistic definition, the remaining studies could be likely counted on one hand”.


In practice, defining categories need involve little more than paraphrasing the terms used in the evaluation, e.g.:

“acceptable” means you would use the sentence in conversations with friends if the context arises

or whatever context one wishes to elicit. Binary categories (yes/no questions) might represent an exception to this rule:

Do you pronounce Variable X with the vowel in Y?   Yes   No

It seems that any further explication would only expose the limits of existing question formats, as phonetic differences have not been successfully addressed in WQs. For this reason, nominal/binary choices are best left without comment, provided that the question is phrased clearly. There might, however, be one avenue of inquiry that promises to push the envelope of WQs and phonetic study further.

Harnessing a pedagogical phonetic alphabet for social dialectology?

As has been mentioned, the view that WQs are not particularly well suited for eliciting phonetic detail is, of course, correct. But rather than ruling sounds out categorically, there may be ways to exploit the written medium creatively. As was shown in Chapter 2, the basic type of pronunciation question, despite refinements, has not changed significantly since Hempl’s late 19th-century survey; it can be summarized as using key words that are pronounced (largely) invariably to elicit information. We said at the beginning of this book that this essentially limits such pronunciation questions to phonemic information, while phonetic quality is generally beyond reach. However, recent developments in speech pathology and therapy seem promising for pushing the agenda of WQs and non-phonemic pronunciation further. Sound files played to respondents and matched with closed answer options would allow participants to align their own pronunciation with the sound option that fits most closely. This is in itself an interesting proposal that will likely be exploited in the years to come. Visualization techniques for phonetic features, which are being developed in speech therapy (Ruß 2008), offer a potentially simpler method, one that can also be used on paper. As paper remains the preferable medium for some contexts and users, pursuing this traditional form of delivery seems worthwhile. The basic idea is to break down each phoneme into some of its phonetic components and articulatory features, e.g. vowel height, voicing, lip rounding and airflow, and to find intuitive visuals to represent them. In a way, this takes the 16th-century approach of the early phoneticians (orthoepists) into the visual realm. The orthoepists described the production of speech sounds by observing the shapes of the articulators and contrasting them with those of other sounds. Their appeal lay in their mix of precision and generally understandable descriptions. In its 21st-century guise, the “visual description” of Lautbilder (sound pictures) is informed by pedagogical principles rather than by absolute precision. This mix is of interest for WQs. As the method has proven efficient even with young children, it is reasonable to believe that its adaptation might enrich the limited options for eliciting phonemic and (to a more limited extent) phonetic information.

Order of stimuli and trial items

Recently, some researchers (e.g. Buchstaller & Corrigan 2011; Buchstaller et al. 2013) have begun to reorder questions to counterbalance any error that might stem from their presentation sequence. Greenbaum and Quirk (1970: 35) show that “significantly different results” in their tasks are “undoubtedly attributable to the effect of sequence” and recommend that each test battery, or questionnaire, be presented “in more than one order”. Schütze (1996: 134–5) offers an alternative explanation for such mismatches: the lack of “warm-up trials”, which are common in psycholinguistic studies. Greenbaum and Quirk (1970: 32) suggest that, in tests in which respondents perform a linguistic operation (moving adverbs in a sentence, etc.), a question “may yield a particular result merely on account of its occurring very early in the [test]” and that the first few items “on any occasion provide in effect practice”. These findings are based on operations comparable to translation or reformulation tasks and may apply less to traditional WQ items. In any case, starting a questionnaire with two or three linguistic “practice” questions (without informing respondents) seems like sound practice.

Intuitive formatting & item ordering

The formatting of questionnaires may be considered a trivial matter, but it is one of the few features over which researchers have full control, and it may influence response rates as well as the reliability of findings. Readability is a key factor: do the fonts work well in a digital environment, and is the print version clearly legible? Choices of paper and ink colour and of margins all play a role. Most importantly, an attractive and clear design directly increases response rates in general, and does so disproportionately in those social groups that are least likely to respond to WQs (Dillman 2000: 81).
It is therefore important for the general validity of the sample (see 7.3.8) that all formatting tools are used. Moreover, since respondents have been shown not necessarily to read instructions carefully, visual design features should be thought of as one of the most important structuring devices of the WQ, offering cues to those respondents. These tools include, but are not limited to, the formatting of answer choices, the use of line breaks and indentation in the text, and the like. The order of the items is another issue. It is often recommended that more sensitive questions be asked at the end of the questionnaire rather than at the beginning, as a “respondent who has spent five or 10 minutes already answering questions is less likely to respond to an objectionable question by quitting” (Dillman 2000: 87). Objectionable questions can include questions about one’s age, income and the like. Schleef (2013) generally recommends placing the social background questions at the end of the questionnaire, though the practice may need to be weighed against other concerns. The trend, though, seems to be in line with Schleef’s recommendation. Within the questionnaire, the sequence of questions should be logical from the respondents’ point of view, as far as this can be discerned, as discussed in the section on questionnaire structure (7.2). Another important issue concerns the first question of the questionnaire, which is of extreme importance.

The WQ’s first question
– must be applicable to everyone
– must be easy to answer
– needs to be interesting

(Dillman 2000: 92)

An open-answer question would make a bad first question (open answers take more work than limited response options); good first questions are those that tap into widely discussed linguistic issues. One might even go so far as to include a question on a widely known variable that one is not interested in, such as pot[eɪ]to vs. pot[ɑ]to. Such questions can also serve as the “warm-up” questions recommended by Schütze (1996), as seen above. Easier still would be a lexical variable: in the Canadian context, chesterfield would make a good first question that is easy to answer. Dillman sums up: “No single question is more crucial than the first one” (2000: 92), in the sense that it affects the response rate of the survey. He is, of course, right: asking for age (as has been done in some social science WQs), or even income, at this point would lower the return rate.
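Presenting items “in more than one order” (Greenbaum & Quirk 1970) and opening with an easy, undeclared warm-up question can be combined in one step of questionnaire assembly. The following Python sketch works under stated assumptions: the item labels and the functions `rotated_versions` and `version_for` are hypothetical, the pot[eɪ]to question serves as the practice item suggested above, and the test items are rotated cyclically, which is only one simple way of counterbalancing position and not the procedure of the studies cited.

```python
def rotated_versions(practice_items, test_items):
    """Build one questionnaire version per cyclic rotation of the test items.

    The practice (warm-up) items always come first, in a fixed order; across
    the versions, every test item appears exactly once in every test position.
    """
    n = len(test_items)
    return [list(practice_items) + test_items[k:] + test_items[:k] for k in range(n)]


def version_for(respondent_number, versions):
    """Assign versions in turn so that each is used about equally often."""
    return versions[respondent_number % len(versions)]


# Hypothetical item labels
practice = ["pot[eɪ]to or pot[ɑ]to?"]
test = ["item A", "item B", "item C"]
versions = rotated_versions(practice, test)
for v in versions:
    print(v)
```

Cyclic rotation balances position but not which item immediately precedes which; where such carry-over effects matter, a fully balanced (Williams) Latin square or a separate random order per respondent would be alternatives.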

7.3.7 Piloting and revising the questionnaire

WQs should be tested twice: first, when a particular raw question is turned into a questionnaire item, as discussed above, and second, when the completed draft WQ is put through trial runs from start to finish. As with any empirical study, it is paramount to run these pilot tests. They can be conducted informally by the investigator or involve focus-group interviews on a larger scale. Initially, it is enough to get a sense of the strengths and weaknesses of a questionnaire item or of the completed questionnaire, which is best done with the help of volunteers from a small number of different social backgrounds who provide (oral) feedback while filling out the questionnaire. Such think-aloud protocols are immensely useful for identifying questions that work and those that do not. The feedback provides invaluable cues to problems that need attention and may dispel the researcher’s fears about anticipated problems with a given section or question. The pilot phase is one of the most important aspects of WQ design, as there is only one chance to get the questions right. This matters not only from an economic point of view but also from an ethical one. As most WQ surveys are anonymous (unless they are longitudinal studies that need to identify respondents to match their answers, or offer payment, which requires at least an email address and often many more details), respondents do not need to fear that their data will be matched against their names. Anonymity also helps when dealing with ethics review boards and in administering the survey: since we are not interested in the identity of respondents beyond their group-specific background information (age cohort, sex, occupation, education and so forth), WQs are generally considered “minimum-risk” studies. If we do not elicit the identity of respondents, safe storage of and access restrictions to the collected data can be simplified. However, anonymity also means that we need to get things “right” the first time around, as there is no second chance to go back to a respondent and complete an incomplete questionnaire. Testing or piloting the individual questions in isolation as well as the completed WQ is therefore imperative, as it offers cues for revisions that the researchers themselves may not be able to see.

7.3.8 Social background questions

In Chapter 2 we saw how demographic (social) background questions developed over time as part of the general history of WQs in dialectology. Today a ‘standard set’ of social variables can be offered, as shown below. Depending on the study, this set will of course need to be adapted or expanded:

– Gender (sex)
– Age (by cohort, or continuous in years)
– Education
– Ethnicity (self-defined)
– Place of residence
– Birth place
– Residence history with approximate ages since birth
– Occupation
– Father’s occupation
– Mother’s occupation
– Father’s birth place
– Mother’s birth place
– Languages spoken and competence level
– Information for any indices needed (see Section 8.2)

As mentioned earlier, there is some disagreement in the literature about whether to place the social background questions at the beginning or at the end of the questionnaire. On the one hand, they may be used to “warm up” respondents, as they are generally easy and straightforward to answer. The wrong background question, however, might lower the return rate (e.g. questions on age, or on education for some less educated respondents). On the other hand, asking the background questions early will almost certainly reduce response rates, as some respondents would be more likely to quit the survey, especially if socially sensitive information is elicited, e.g. income or (in some contexts) place of residence (neighbourhood). Generally, more and more surveys seem to ask these questions at the end, though one needs to ensure that respondents do not “feel tricked” when, at the end of a questionnaire and after having put in some time, they are asked social questions. It is recommended to state clearly that “social background questions (e.g. age, sex, education)” will be asked at the end. Regardless of how this is framed, some respondents will discontinue the survey after the linguistic part, which means that some “orphaned” responses will need to be discarded.
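Identifying orphaned responses is a routine data-cleaning step once the background section is placed at the end. A minimal Python sketch, assuming one dictionary per response; the field names and the function `split_orphans` are hypothetical. Here a response counts as orphaned only if no background question was answered at all; how to treat partially answered background sections is a separate decision left to the researcher.

```python
# Hypothetical field names for the background section at the end of the WQ.
BACKGROUND_FIELDS = ("sex", "age_cohort", "education")


def split_orphans(responses):
    """Separate usable responses from 'orphaned' ones.

    A response counts as orphaned if none of the background fields was
    answered, i.e. the respondent quit after the linguistic part. Partially
    answered background sections are kept for a separate decision.
    """
    kept, orphaned = [], []
    for record in responses:
        answered = any(record.get(field) not in (None, "") for field in BACKGROUND_FIELDS)
        (kept if answered else orphaned).append(record)
    return kept, orphaned


responses = [
    {"q1": "washroom", "sex": "f", "age_cohort": "20-29", "education": "BA"},
    {"q1": "bathroom"},                          # quit after the linguistic part
    {"q1": "washroom", "age_cohort": "60-69"},   # partial background section
]
kept, orphaned = split_orphans(responses)
print(len(kept), len(orphaned))  # 2 1
```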

7.4 Population sampling

By sample we mean that part of a population that can, in principle, participate in a survey; sampling is the concrete method by which the participants are chosen. Ultimately, we are interested in the behaviour of a population, which is why the sample must be representative of the population. If one wishes, for instance, to study the social embedding of linguistic change in a particular setting, one needs to ensure that the sample is as balanced and representative in its social make-up as possible (women, men, younger, older, more educated, less educated and the like should be represented as equally as possible). If we aim to describe the behaviour of traditional Canadian variables in Vancouver, we need to have an idea of the Vancouver population and of how we are going to sample it, since we cannot ask every Vancouverite to take part in the survey (and even if we could, not everybody would). While balanced sampling is the goal, in reality some groups will be over-represented in WQ data. The aim must be to reduce these imbalances as much as possible by actively reaching out to social groups the researchers are typically not in contact with.
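Over- and under-representation can be spotted by comparing the make-up of the returns with what is known about the population. The Python sketch below illustrates the arithmetic under stated assumptions: the population shares are hypothetical placeholders, not census figures, and the function name `composition_gaps` is invented for this example.

```python
# Minimal sketch: compare the social make-up of a WQ sample with the
# population it is meant to represent. The population shares below are
# hypothetical placeholders, not census figures.
from collections import Counter


def composition_gaps(sample_groups, population_shares):
    """Return sample share minus population share for each group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_shares.items()
    }


# Hypothetical sample of 10 respondents and assumed population shares.
sample = ["women"] * 7 + ["men"] * 3
population = {"women": 0.5, "men": 0.5}

for group, gap in composition_gaps(sample, population).items():
    status = "over-represented" if gap > 0 else "under-represented" if gap < 0 else "balanced"
    print(f"{group}: {gap:+.2f} ({status})")
```

A positive gap points to a group that is already well covered; a negative gap points to the social groups one would need to reach out to actively.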

7.4.1 Random or judgement sampling?

Random sampling is a key component in the social sciences. The idea is that every member of a population must have the same chance of being selected for the sample. Only 15 or 20 years ago, one could draw samples from publicly accessible (landline) telephone registries (assuming that every household had a phone line), but achieving representativeness has since become a more complex issue: which lists could be used today? One major concern in random sampling is rising nonresponse rates: if not everyone has an equal chance to return a WQ,



Chapter 7.  Questionnaire design and data collection

because some groups tend not to do so (“nonresponse error”) or are not reached in the first place (“coverage error”), the sample will be skewed and the results unreliable (Tourangeau & Plewes 2014; see Dillman 2000 for traditional solutions to this problem). It is important to point out that most sociolinguists work under special conditions. They are either intimately familiar with a speech community or work with what may be perceived as prototypical members of a group. Most sampling methods in sociolinguistics are therefore not random samples but judgement samples. The researcher uses his or her special knowledge of the community to select the informants, and once basic choices have been made, more detailed background information is often documented. In cases where the researcher is a member of the group, such documentation is often not needed, since the information is already known. Early studies in sociolinguistics, inspired by the social sciences, started with random sampling. Shuy et al.’s (1968) study recorded 702 (!) subjects, while Wolfram (1969) only had to use 36 of them to arrive at meaningful findings. A couple of years earlier, Labov applied a similar approach by selecting 122 subjects from a list of 617 provided by a social science survey (Labov 2006: 108, 117). Labov himself justified the smaller, selective sampling technique by the fundamental difference between data in the social sciences on the whole and linguistic data:

If the type of behavior which was being studied was similar to most forms of behavior that are investigated by social survey, the value of the study could be measured by how far it [the sample] fell short of the […] standards [of the social science survey]. However, linguistic behavior is far more general and compelling than many social attitudes or survey responses. The primary data being gathered […] are not subject to the informant’s control in the way that answers on voting choices would be. (Labov 2006: 114)
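The logic of random sampling and of nonresponse error can be made concrete in a few lines of Python. The sketch below is illustrative only: the sampling frame, the group labels and the pattern of returns are all hypothetical. It draws a simple random sample, in which every member of the frame has the same chance of selection, and then computes response rates per group; unequal rates are the symptom of nonresponse error described above.

```python
# Minimal sketch of simple random sampling from a sampling frame, plus a
# check for differential nonresponse across groups. The frame, the group
# labels and the returned questionnaires are all hypothetical.
import random
from collections import Counter


def simple_random_sample(frame, n, seed=None):
    """Every member of the frame has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def response_rates(invited, returned_ids, group_of):
    """Share of invited people in each group who returned the WQ."""
    invited_by_group = Counter(group_of[p] for p in invited)
    returned_by_group = Counter(group_of[p] for p in invited if p in returned_ids)
    return {g: returned_by_group[g] / invited_by_group[g] for g in invited_by_group}


# Hypothetical frame of 100 people, half "younger", half "older".
frame = list(range(100))
group_of = {p: "younger" if p < 50 else "older" for p in frame}

invited = simple_random_sample(frame, 20, seed=1)
# Suppose only the older invitees returned the questionnaire:
returned = {p for p in invited if group_of[p] == "older"}
print(response_rates(invited, returned, group_of))
```

Even a perfectly random draw yields a skewed set of returns if one group systematically fails to respond, which is why the sampling procedure alone cannot guarantee representativeness.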

While there is a wide range of ways in which someone might comment on a political leader, there are only so many alternative variants one can meaningfully identify for most linguistic variables. In these situations, an insider can select informants meaningfully without skewing the data, but it is important to keep in mind that sociolinguists work under particularly favourable conditions here. Technically speaking, most sociolinguistic surveys involve varying degrees of randomness: Gregg (2004 [1978/79]) took great pains to arrive at a randomized sample of his subjects, quite unnecessarily so, while later studies, e.g. Tagliamonte and D’Arcy (2007), took full advantage of judgement sampling. In WQ studies, it is easy for one person to reach 200 people or more in a week, provided one taps into the right network channels, whether in person, via mail, or via the internet. The Atlas of North American English, mentioned in Chapter 1, is a ground-breaking work (Labov, Ash and Boberg 2006). Its findings for mainland Canadian English in an area stretching some 6,000 kilometres from east (Halifax,


Nova Scotia) to west (Vancouver, British Columbia) are based on the acoustic analysis of only 33 speakers (p. 220). Follow-up studies (Boberg 2008b) use data from 86 speakers across the country and are at the larger end of the spectrum of studies in acoustic phonetics, where data need to be recorded, tokens extracted, and formant measurements made and normalized to ensure comparability across speakers. The same analytical effort applies to other studies in which speakers need to be recorded, such as variationist studies in morphosyntax. Table 7.4 gives an overview of sample sizes in some of the bigger sociolinguistic studies (for more large-scale studies, see Labov 2006: 380–403).

Table 7.4  Large-scale sociolinguistic studies that include audio-recordings

Study | Sample
Wolfram (1969) | 36 analyzed (702 recorded by Shuy et al. 1968)
Trudgill (1974) | 60 subjects
Woods (1999 [1979]) | 100 subjects analyzed
Gregg (2004 [1978/9]) | 300 subjects analyzed
Poplack (1985) | 120 subjects
Kerswill and Williams (2000) | 96 subjects
Eckert (2000) | 69 (“Neartown”), and 60 regional sample
Labov, Ash and Boberg (2006) | 439 acoustic analyses, 33 of which in Canada
Tagliamonte and D’Arcy (2009) | 152 speakers (350 hours)
Walker and Torres Cacoullos (2009) | 74 speakers

Some of these studies have fairly large sample sizes. What is not shown is the effort that went into the creation of these corpora. Tagliamonte and D’Arcy build on multiple years of team-based data collection, as do Labov, Ash and Boberg (2006) and Walker and Torres Cacoullos (2009); Woods (1999 [1979]) went to the upper extreme for his doctoral project by interviewing 100 speakers in Ottawa, Canada. With 270 recorded hours of speech, Poplack’s corpus is also one of the bigger ones. With 300 interviewees, Gregg’s study of Vancouver English is still one of the biggest data pools. The large sample size, however, came with compromises in the linguistic analysis, which originally did not go beyond the phonemic level. Wolfram’s study of 36 informants, carried out in the early days of sociolinguistics, shows the other extreme: a reduction in sample size and an increase in linguistic detail. Most sociolinguistic studies today work with 25 to 40 speakers and are thus at the lower end of the spectrum of sample sizes presented. WQ studies, by contrast, operate in a different range. In Dialect Topography, sample sizes range from 307 respondents (Quebec City) to 935 (Golden Horseshoe 1991/92). In studies of a given urban region, samples of about 500 respondents who meet the selection criteria have proven to work well and allow some correlations to be established. As it is quite feasible to reach around 500 respondents today, one might




decide to include more subjects should one’s variables and social stratification require it. The 533 responses to the Dialect Topography of Vancouver from 2004 (DT 2004) took only weeks to collect with a postal questionnaire. The Vancouver Survey 2008 (VS 2008) was administered over the course of two weeks by 43 students in an English dialectology class, who approached potential respondents directly and handed them a questionnaire and a pen. Table 7.5 shows the absolute figures for each age cohort:

Table 7.5  Sample sizes in two Vancouver WQ studies

Age cohort | DT 2004 | VS 2008
14–19 | 104 | 82
20–29 | 170 | 153
30–39 | 78 | 48
40–49 | 58 | 54
50–59 | 70 | 43
60–69 | 25 | 24
70–79 | 15 | 10
80+ | 13 | 9
TOTAL | 533 | 423
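Cohort tables such as Table 7.5 are built by binning each respondent’s age into a cohort and counting. The following Python sketch reproduces that step under stated assumptions: the cohort boundaries follow Table 7.5 (with ASCII hyphens in the labels), ages are recorded in whole years, and the sample ages in the usage line are hypothetical.

```python
# Minimal sketch: turn respondents' ages (in years) into the cohort counts
# reported in Table 7.5. Cohort boundaries follow the table; the ages used
# below are hypothetical.
from collections import Counter

COHORTS = ["14-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]


def cohort(age):
    """Map an age in years to its cohort label."""
    if age < 14:
        raise ValueError("age below the youngest cohort (14-19)")
    if age < 20:
        return "14-19"
    if age >= 80:
        return "80+"
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"


def cohort_table(ages):
    """Count respondents per cohort, in cohort order, plus a total."""
    counts = Counter(cohort(a) for a in ages)
    table = {c: counts.get(c, 0) for c in COHORTS}
    table["TOTAL"] = sum(table.values())
    return table


print(cohort_table([15, 19, 22, 27, 34, 81]))
```

Listing every cohort, including empty ones, makes gaps in coverage visible at a glance, much as the sparse 70–79 and 80+ rows do in Table 7.5.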

All empirical studies have to make some compromise between the depth of linguistic analysis and survey size on account of the complexity of real-life scenarios. In typical sociolinguistic studies one would reduce the sample size rather than the complexity of the linguistic analysis. The rule of thumb is that WQs, as a self-reporting or community-reporting technique, do not allow for the same level of linguistic detail as other sociolinguistic methods, but they excel at the relative ease with which data can be collected from a large array of people.

7.4.2 A combined sampling method

There are a number of ways to approach sampling. Dörnyei (2003: 72–72) distinguishes between four types of sampling: convenience sampling, snowball sampling, quota sampling, and random (stratified random) sampling. Convenience sampling is the easiest form of sampling for the researcher, who contacts groups that meet the criterion considered most salient, e.g. residence in a given location. This is likely to yield a number of responses that need to be ruled out because the respondents do not meet other criteria. It is a kind of catch-all with post-hoc control for a well-defined group, e.g. excluding visitors to a given location. Snowball sampling works with in-group contacts and depends on a chain reaction in which one contact leads to another within the same network. Quota sampling defines criteria for particular groups to be included in the sample and aims to meet these pre-set quotas: for instance, a study on Vancouver might aim for 40% non-native speakers of English, reflecting this ratio in the city. Stratified random sampling defines groups within a group (“strata”, e.g. Chinese Canadians in Vancouver) and establishes quotas for each group (stratum). Within each stratum, every potential participant must have an equal chance of being selected, so stratified sampling is logistically more challenging and depends on the availability of lists of group members that are representative of the community, which are


often no longer accessible (such as a telephone book for an entire region). It is therefore rare for random sampling in the strict sense to be applied outside the social sciences. For linguistic WQs, a combination of quota sampling and convenience sampling is most often used: if one wishes to compare the answers of Chinese Canadians and Anglo-Irish Canadians (those of UK or Irish descent), a quota must be set, and respondents who meet the criteria (membership in one of the two ethnic groups and long-term residence in Vancouver) are invited to take part in the survey. Often, calls are placed in social media and people can be invited via email. It is important, though, to go beyond one’s immediate contacts in order to ensure as broad a demographic representation as possible.
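The combined method can be summarized as an acceptance rule: take respondents in the order in which they arrive (convenience), screen out anyone who does not meet the criteria (post-hoc control), and stop accepting a group once its quota is filled. The Python sketch below is a minimal illustration under stated assumptions: the respondent records, the ten-year threshold standing in for “long-term residence” and the quota sizes are hypothetical, and `fill_quotas` is an invented helper, not a standard routine.

```python
# Minimal sketch of the combined method: convenience recruitment with
# screening criteria, stopping once each group's quota is filled.
# Criteria, group labels and quota sizes are hypothetical.


def fill_quotas(incoming, quotas, eligible, group_of):
    """Accept eligible respondents in arrival order until quotas are met."""
    accepted = {g: [] for g in quotas}
    for person in incoming:
        if not eligible(person):
            continue                      # post-hoc exclusion, e.g. visitors
        g = group_of(person)
        if g in accepted and len(accepted[g]) < quotas[g]:
            accepted[g].append(person)
        if all(len(accepted[k]) >= quotas[k] for k in quotas):
            break                         # every quota has been met
    return accepted


# Hypothetical respondents reached via email and social media.
people = [
    {"id": 1, "ethnicity": "Chinese Canadian", "years_in_city": 20},
    {"id": 2, "ethnicity": "Anglo-Irish Canadian", "years_in_city": 1},
    {"id": 3, "ethnicity": "Anglo-Irish Canadian", "years_in_city": 30},
    {"id": 4, "ethnicity": "Chinese Canadian", "years_in_city": 15},
    {"id": 5, "ethnicity": "Other", "years_in_city": 40},
]
result = fill_quotas(
    people,
    quotas={"Chinese Canadian": 1, "Anglo-Irish Canadian": 1},
    eligible=lambda p: p["years_in_city"] >= 10,   # hypothetical threshold
    group_of=lambda p: p["ethnicity"],
)
print({g: [p["id"] for p in ps] for g, ps in result.items()})
```

Because respondents are accepted in arrival order, the result inherits whatever biases the recruitment channels have, which is why reaching beyond one’s immediate contacts still matters.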

7.5 Chapter summary

This chapter has offered some guidelines on question and questionnaire design. Its most important goal has been to sensitize the reader to the major issues that affect the quality and quantity of WQ data. One main concern was to offer a basic typology of WQ questions in social dialectology, which was undertaken in Section 7.3. Some aspects of WQs are difficult to gauge, as there are many choices to be made and the literature on the topic is not plentiful. Each of these choices will affect the result, so it is hoped that the recommendations offered in this chapter will keep the remaining survey errors to a minimum. In linguistic circles, it is fair to say, methodological studies that compare WQs with one another remain an under-studied area. WQs are often not foregrounded in studies, which generally focus on results more directly linked to linguistic theory. It is to be hoped that the methodological aspects of linguistic WQs, which this chapter has shown to differ in some crucial aspects from the more general social science WQs with which we have become familiar, will become a focus in the empirical linguistics literature.

Chapter 8

Working with WQ data

The present chapter serves as an introduction to how to download, categorize, and analyze WQ data. Just as Chapter 7 was a step-by-step guide to the design of appropriate questionnaires, this chapter is a guide to working with (potentially) large data sets, which involves counting and classifying data. We will use the WQ data from the Dialect Topography of Canada project (Chambers 1994), which is available on the internet, to practice essential procedures. The reader will first be guided through the online functionality of the Dialect Topography web portal (Section 8.1) and will then be introduced to the downloading and data-readying procedures that allow the data to be manipulated in Excel and other spreadsheet programs (Section 8.3). We will introduce the most important Excel commands, of which there are only three, that assist in producing frequency tables of the kind seen in the theoretical chapters and case studies. Section 8.2 serves as an introduction to the calculation of social indices that allow meaningful subclasses of respondents to be created. These indices need to be understood before one can make full use of the downloaded WQ data. Chapter 8 aims to “keep things simple” and was expressly written with the novice in Excel and quantitative data manipulation in mind. A home computer with internet access and a version of Excel (2007 or later) is all that is needed for this introduction to data analysis. The ultimate goal of this chapter is to introduce all the techniques needed to arrive at the findings presented in the previous sections of this book, e.g. in Chapter 4.

8.1 The Dialect Topography portal

The previous chapters have shown what WQs are able to do, have detailed the variables that have been studied with them, and have used linguistic theory to illuminate the underlying principles of linguistic change shown in the data. An easy way to familiarize oneself with WQ data is to use existing data sets. The Dialect Topography (DT) of Canada is unique in the context of sociolinguistics as it offers WQ data free of charge via its web portal, which can be accessed at:

This section explains how the web portal can be used. Once you enter the portal, you can see a screen with six choices in the header: About, Questionnaire, View Results, Data Request, Tutorials and Contact Us:


Illustration 8.1  Dialect Topography of Canada Introductory Screen (“About”)

The first menu item (“About”) shows the introductory screen, which also offers information for first-time users, visitors and researchers (only partly shown in Illustration 8.1). In order to work effectively with DT data, the key features of each region’s database need to be kept in mind. Table 8.1 offers a summary of the available data that goes beyond the information provided on the site; it should be the first port of call for those interested in working with DT data:

Table 8.1  Overview of Dialect Topography data

Region | Survey director | Survey date | Age cohorts | # of respondents

Canada
Golden Horseshoe | J. K. Chambers | 1991/2 | 14–80+ | 935
Golden Horseshoe 2000 | J. K. Chambers | 2000 | 14–29 | 825
Montreal | Charles Boberg | 1998/9 | 14–80+ | 589
Ottawa Valley | André Lapierre | 1997/8 | 14–80+ | 681
New Brunswick | Wendy Burnett | 2001–2003 | 14–80+ | 758
Eastern Townships | Pamela Grant | 2002 | 14–80+ | 404
Quebec City | Troy Heisler | 1994 | 14–80+ | 307
Greater Vancouver (now Metro Vancouver) | Tony Pi | 2004 | 14–80+ | 533

USA
Golden Horseshoe New York | J. K. Chambers | 1991/2 | 14–79 | 80
Golden Horseshoe New York 2000 | J. K. Chambers | 2000 | 14–29 | 394
Vermont | Pamela Grant | 2002 | 14–80+ | 146
“across NB border” (incl. Michigan, New Hampshire) | Wendy Burnett | 2001–2003 | 14–69 | 107
Western Washington | Tony Pi | 2004 | 14–29 | 189

Cross-border
Golden Horseshoe (All) | collapsed 1991/2 Golden Horseshoe data | 1991/2 | 14–80+ | 1015



Chapter 8.  Working with WQ data 277

As the data was assembled over some 15 years, it is unavoidable that some regional surveys are better represented than others. As the project grew, adaptations were made. For instance, the original Golden Horseshoe data from Canada and from New York differ in scope: 935 records covering ages 14 to 80+ come from Canada, compared with only 80 from New York. In the Golden Horseshoe 2000 survey, only the two youngest age cohorts were polled (with some outliers in the 30–39 cohort), yet in very substantial numbers. It is important that readers familiarize themselves with these basic constraints of each data set.

8.1.1 Dialect Topography questionnaire

Previous chapters have shown various questions from the DT questionnaire. The complete questionnaire can be accessed under the appropriate heading (“Questionnaire”, see Illustration 8.1). Readers are advised to take the time to familiarize themselves with this survey tool. The questionnaire is organized in three principal parts: an introduction, the background information questions, and the linguistic questions, which are followed by a brief thank-you note and contact details. The original DT questionnaire was designed as a paper questionnaire, but in the last survey, in Vancouver, it was partly transferred to an online environment. We will need to refer to the questionnaire frequently to interpret the answers in the context in which they were elicited. The complete questionnaire also shows the exact sequence in which the questions occur, which may be of interest if one suspects that an earlier question biases the answers to a later one (as discussed in Chapter 7).

8.1.2 “View Results”: One variable in one location

The “View Results” function is both useful and convenient. It is a good means of quickly checking an aspect of the data before downloading and analyzing it further. Clicking on “View Results” (Illustration 8.1) opens a new tab with the query screen. The default Report Type is “Question by Region”. This option presents the data in tables, bar graphs and apparent-time charts for one of the eleven regions (seven Canadian plus four American). The region can be selected under “Project Region”, while “Question” allows you to choose your variable. For the latter, it is best to first look at the questionnaire, identify an item of interest there, and then use the question number (q1–q74) to locate the variable in the “Question” drop-down menu, which adds a keyword to each number to prevent confusion: in Illustration 8.2, “q1-different” refers to the variable different from/than/to.


Illustration 8.2  View Results screen

The field “Independent Variable” at the bottom of the screen is also very important. With it, the user decides against which social variable the linguistic data is calculated and plotted. DT offers the user seven options, with the “Age” of the respondent as the default. The full list of options is summarized in Table 8.2:

Table 8.2  Independent (social) variables in the drop-down menu of the DT web portal

Age
Regionality Index
Occupational Mobility Index
Sex
Language Use Index
Social Class
Education

In “Window Name” one can label the output if one wishes, for instance: “Golden Horseshoe 1991/2 – different from/than/to”. This label becomes the header of the output graph and is useful if one runs many comparisons. Let’s say we are interested in question 55 from the Vancouver questionnaire and in how the variable schedule, with its two pronunciations with initial sk- or sh-, has changed over the years. This calls for a display in apparent time, which is what the View Results function gives us with “Age” as the independent variable. Question 55 reads as follows (from the “Questionnaire”):

55. Does the sch of SCHEDULE sound like sch in school, or sh in shed?




In the screen shown in Illustration 8.2, we would select “q55 - schedule” from the drop-down menu for “Question” and select the Project Region “Vancouver”. As the “Independent Variable” we would leave the default setting “Age”. We label the window “schedule - Vancouver”. The output opens in a new tab when “Submit Query” is pressed; it is shown in Illustration 8.3. We get a table with the absolute and relative frequencies for all answers, presented in descending order. In this case, almost all respondents chose either sch as in school (/sk/) or sh as in shed (/ʃ/), with only 13 non-responses. The percentages in the table show how the answers change in apparent time. The button “graph” in the right-most column of Illustration 8.3 assists with the interpretation, as it produces a line graph that often shows the data more clearly. Pressing “graph” in the line “sch”, the majority variant, produces the line graph in Illustration 8.4. With the exception of the over-80-year-olds, a fairly regular increase of <sch> or [sk] can be seen, with some tapering off in the two youngest age cohorts, 20–29 and 14–19. The graphing function not only visualizes the data from the table in Illustration 8.3; it also allows us to manipulate the graph. For instance, the two semicircle values and the line connecting them for <sh> or [ʃ] were added manually in a second step. Here is how this is done: the right column of the last line in Illustration 8.4 offers a choice of line style. We chose “Variant 2 (orange semicircle)” in the drop-down menu and then entered the values from the table in Illustration 8.3, which drew the line for <sh>. You can leave out single data points by entering “x” in the percentage column. With any screenshot program, one can capture the chart for one’s paper or report.
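What the “Question by Region” table reports is a cross-tabulation: for each age cohort, the count and percentage of each answer. The Python sketch below reproduces that calculation on a handful of invented records, as an alternative to doing it by hand. The field names `age_cohort` and `q55` are assumptions for illustration and do not describe the actual DT download format.

```python
# Minimal sketch of what a "Question by Region" table reports: for each age
# cohort, the absolute and relative frequency of each answer. The records
# and the field names "age_cohort" and "q55" are hypothetical.
from collections import Counter, defaultdict


def crosstab(records, question, by="age_cohort"):
    """Return {cohort: {answer: (count, percent)}}; blanks count as no response."""
    counts = defaultdict(Counter)
    for r in records:
        answer = r.get(question) or "no response"
        counts[r[by]][answer] += 1
    table = {}
    for group, answers in counts.items():
        n = sum(answers.values())
        table[group] = {a: (k, round(100 * k / n, 1)) for a, k in answers.items()}
    return table


records = [
    {"age_cohort": "14-19", "q55": "sch"},
    {"age_cohort": "14-19", "q55": "sch"},
    {"age_cohort": "14-19", "q55": "sh"},
    {"age_cohort": "60-69", "q55": "sh"},
    {"age_cohort": "60-69", "q55": ""},
]
for cohort, answers in sorted(crosstab(records, "q55").items()):
    print(cohort, answers)
```

Reading the percentages for one variant from the oldest cohort to the youngest gives the apparent-time trend that the portal draws as a line graph.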

Illustration 8.3  Basic output for View Results function


Illustration 8.4  Automatic Graphing feature (full circles) with manual additions (semi-circles)

We might repeat this graphing procedure for all seven independent variables in Table 8.2. We can chart, for instance, the answers for schedule with Regionality Index as the independent variable for Vancouver, or any other area. The usefulness of the web portal quickly becomes clear: one can form hypotheses on language use and then test them directly. Alternatively, one can simply “play” with the graphing tool and interpret the findings afterwards. While the former is the scientific way of approaching the data, exploring the materials (“playing with them”) will sometimes offer interesting insights.

8.1.3 View Results: Comparing two locations

As we have seen in the previous chapters, comparisons of regions with each other are an important feature. The DT web portal has a built-in basic comparison function that allows the comparison of two locations with respect to a given variable.




Illustration 8.5  Regional comparison feature

Illustration 8.5 shows the data for the Golden Horseshoe, Canada, for the pronunciation of shone, past tense of shine, which is pronounced either with [ɑ], as in John, or with [oʊ], as in Joan. The rhyme with John is the traditional Canadian variant. Illustration 8.5 also shows a dark (on the website: purple) box that can be found at the bottom of any data table in “View Results”. In this box, the Regional Comparison Box, we can fill in Region 2, in our case Golden Horseshoe (New York). This allows us to compare the Golden Horseshoe (Canada) data with the data from across the border. We have to tell the program which variant should be plotted, in our case variant Joan, which means that we would like to see the results for shone rhyming with Joan, i.e. the non-Canadian variant. This time, we select “Regionality Index” as the Independent Variable, which quantifies the local ties of the respondents (as discussed in Section 8.2). The output is given in Illustration 8.6:


Illustration 8.6  Output of Regional comparison function

We see the table data by Regionality Index (from RI1, local, to RI7, not local), which is now automatically followed by a bar graph. In the graph, the Canadian answers are on the left and the American ones on the right, and we can see that Canadians in the Golden Horseshoe do not generally use the variant Joan: only 4.2 percent of RI1 respondents use it, and only about 11 percent of RI5 and RI6. On the American side, the picture is radically different: near-categorical and categorical reporting can be seen for RI1–4 and RI6, with high percentages of around 80 percent for RI5 and RI7. We are also shown the sample sizes in each category, e.g. just 6 in RI5, only 2 in RI6 and 9 in RI7 in New York, but 112, 102 and 160 for these RIs in Canada, as shown in the table in Illustration 8.6. These absolute figures remind us which categories have too little data to support solid claims. Illustration 8.6 reveals a lack of correlation between the dependent variable (pronunciation of shone) and the independent variable of our choice (Regionality Index): across the range of RIs, Canadians score very low for shone rhyming with Joan, and Americans very high, which makes variant John a Canadianism in the Golden Horseshoe. There are further ways to explore this variable. We might, perhaps, wish to find other correlations, for instance with Age, Sex, Social class, or any of the




other independent variables, such as Education. If we would like to find out whether the Canada/U.S. split in shone persists, we would want to use “View Results” with the independent variable “Age” (to chart an apparent-time graph), or Sex, since we know that women often lead in incoming variants (see, e.g., Labov’s gender principles in Chapter 6). The overall aim would be to link the principles from the theory chapter with the DT data.
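The regional comparison boils down to one percentage per Regionality Index value and region, read alongside the number of respondents behind it. The Python sketch below illustrates this under stated assumptions: all counts are invented, and the cut-off `MIN_N = 10` for treating a cell as reliable is an arbitrary illustrative choice, not a rule from the DT project.

```python
# Minimal sketch of a two-region comparison: percentage choosing one variant
# per Regionality Index (RI) value, flagging cells with few respondents.
# All counts and the MIN_N threshold are hypothetical.

MIN_N = 10  # hypothetical cut-off below which a cell is treated as unreliable


def variant_share(cells):
    """cells: {ri: (variant_count, total)} -> {ri: (percent, reliable?)}"""
    return {
        ri: (round(100 * k / n, 1), n >= MIN_N)
        for ri, (k, n) in cells.items()
        if n > 0
    }


# Hypothetical counts for one variant, by RI, in two regions.
region_1 = {1: (2, 50), 2: (5, 40)}
region_2 = {1: (9, 10), 2: (3, 3)}

share_1 = variant_share(region_1)
share_2 = variant_share(region_2)
for ri in sorted(set(share_1) | set(share_2)):
    print(f"RI{ri}: region 1 {share_1.get(ri)}, region 2 {share_2.get(ri)}")
```

A 100 percent figure based on three respondents is flagged here in the same spirit as the small New York cells in Illustration 8.6: it is suggestive, but not enough to make solid claims with.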

8.1.4 Tutorials

There are two tutorials online, which can be accessed via the “Tutorials” button (Illustration 8.1). The material in the basic tutorial should already be familiar; you can use this resource to practice and consolidate what has already been learned. We have also covered the issues presented in the intermediate tutorial, with the exception of the “Subregion Function” and the look-up of individual records. The Subregion Function divides a given survey region into subregions. For instance, the Quebec City DT data has been coded for six subregions: Loretteville, Quebec (the city proper), Shannon, Sillery, Ste-Foy and Valcartier. We can then graph the dependent variables for each of these subregions, as the intermediate tutorial explains. Illustration 8.7 shows the Tutorial view with the split windows. The bullet “Subregion” is selected. Upon submitting, the subregion plotting function opens in a new browser tab. Please keep in mind that for some variables in some subregions the absolute frequencies may be small, perhaps too small to arrive at meaningful results (such as the seven respondents aged 14–80+ for question 1 in Loretteville), but it never hurts to test for correlations at the subregional level if you have reason to suspect subregion to be a social correlate of language use.

Illustration 8.7  Tutorial View with Split Windows (Graphing Tool at top, tutorial text at bottom)


There is a second function that is shown in the intermediate tutorial. With the individual record lookup you can access a respondent’s particular answers. It works via the index number (the records as such are anonymous and cannot be traced back to the respondent). You may just be able to see in Illustration 8.7 that the option “Individual Record by Index” is a field below the option “By Subregion”. Each respondent received a unique identifier, an “index”, which allows researchers to look up an individual respondent’s complete set of answers. The index numbers are best identified in the downloaded datasheet, which will be explained in the next section. For now, a glance at Illustration 8.12 shows the first column on the left, “gh_index”, the unique record identifier for the Golden Horseshoe 1991/2 survey, starting with 1001. With this number, one can look up all answers for, e.g., respondent 1010, a female university student from Corfu, New York. This information might be of interest if a closer study of outliers in a group is warranted. These are the most important aspects of the portal’s online functionality. For more fine-grained analyses, the data needs to be downloaded and handled in Excel and statistical packages, which will be introduced in Section 8.3. Before we can work with the full range of options of the DT data in Excel, however, we need to understand one more concept: social indices and how they are calculated.
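Once the data has been downloaded, the same look-up can be done offline by keying every record on its identifier. The Python sketch below assumes that the datasheet has been saved as CSV. Only the column name `gh_index` comes from the text; the other columns and the two sample rows are hypothetical stand-ins for the real file.

```python
# Minimal sketch: index downloaded records by their unique identifier so
# that one respondent's complete set of answers can be inspected. Only the
# column name "gh_index" comes from the text; the rest is hypothetical.
import csv
import io

CSV_TEXT = """gh_index,sex,q1
1001,F,from
1002,M,than
"""


def index_records(csv_file, key="gh_index"):
    """Map each record's identifier to its full row (a dict of answers)."""
    return {row[key]: row for row in csv.DictReader(csv_file)}


records = index_records(io.StringIO(CSV_TEXT))
print(records["1002"])
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by a file opened with `open(path, newline="")`; note that the identifiers are read as strings, not numbers.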

8.2 Calculating social indices

Indices of one sort or another have been put to multiple uses in sociolinguistic studies. Broadly speaking, one can distinguish between social and linguistic indices. Social indices categorize and group participants along social criteria, whereas linguistic indices operationalize a linguistic variable. While the former are necessary to group informants in a study and find meaningful social correlates of linguistic features, the latter make it possible to quantify the linguistic data. In many sociolinguistic studies both types of indices are used, though in WQs mostly social indices have been employed. Indices for Social class, Education, Age and Sex (biological gender²⁹) have long been used in sociolinguistics. Four indices will be introduced here: the Regionality Index (RI) (Chambers & Heisler 1999), which identifies each individual’s ties to the locality; the Language Use Index (LUI) (Chambers & Lapierre 2011), which rates each speaker’s use of the target language in relation to his or her other languages; the Ethnic Orientation Index (EOI) (Walker & Hoffman 2010), which assesses the involvement of a member of an ethnic group in that ethnic community; and the Occupational Mobility Index (OMI), which assesses in a rudimentary way whether respondents have moved up (or down) socially compared with their parents. The RI, LUI and OMI are supplied with most DT data sets, and it is crucial to understand their construction. The EOI is offered as a new method for theorizing ethnicities in diaspora settings and has not yet been applied to WQ data. We will look at each of these indices in turn, as an understanding of them is essential for a number of processes, from questionnaire design to data collection, categorization and analysis.

29. For readers socialized with a terminological dichotomy of sex (biological) and gender (social), the term “biological gender” will necessarily feel like an oxymoron. In the North American context, however, it is quite common.

8.2.1 The Regionality Index (RI)

The most pervasive independent variable in regional dialectology is region. As with other variables, the influence of region varies with the linguistic variable and is subject to change over time. For instance, Johnson (1996) found that, for farming vocabulary, region was the least reliable determinant of language use in 1990, whereas in 1930, when the original LAUSC interviews were carried out, it had been the most pervasive one. Although Johnson’s data is regionally more confined than the older data, it followed the same interview protocol, so some change in the regional make-up seems evident: a change in the make-up of rural vocabulary that reflects the considerable social changes over the period. Present-day studies often need to go beyond the simple assignment of geographic location and instead operationalize each respondent’s ties to the local region. This is precisely what the Regionality Index does. The RI assesses each individual on a scale from 1 to 7: “1” is assigned to people with very strong local ties, i.e. those whose parents were born in the target region, who were themselves born there and who have spent at least their formative years there. An RI of “7” is assigned to newcomers to a region, even though they may have lived there for a long time. If somebody moved to the target region after age 18 and has no previous ties to the region, she is assigned a score of 7. Everyone else falls in between. The index considers four criteria: residence in the target region, place of birth, the place where the respondent was raised from ages 8 to 18, and the places where the respondent’s parents were born. One can see from these criteria that the index is blind to a number of other relevant factors that establish regional ties, such as the time spent in the target region or the region where the parents spent their formative years.
These four types of information need to be collected for each survey respondent, which can be done in a straightforward way in the questionnaire. Ly's responses (from the 2008 Vancouver Survey) are shown below:

286 The Written Questionnaire in Social Dialectology

Where do you live now? (Town, Province, Country, please)

Richmond, B.C., Canada

Your birthplace: Where were you born? (Town, Province, Country, please)

Vietnam

Where were you raised from ages 8 to 18? Please list the places, e.g. 8–9 New Delhi, India; 9–17 Toronto, Canada; 17–18 Vancouver, Canada (Town, Province, Country, please)

Vietnam

Where was your father born? (Town, Province, Country, please)

Vietnam

Where was your mother born? (Town, Province, Country, please)

Vietnam

Ly's answers now need to be converted into her RI. Ly, a woman in her 60s, was born in Vietnam and lives in Richmond, B.C., which is part of Metro Vancouver, the target area of the Vancouver Survey. She has lived there for 30 years. She grew up (ages 8–18) in Vietnam, and both her mother and her father were born there. To assess her RI, this information needs to be quantified. The schema for doing so is shown in Table 8.3:

Table 8.3  Calculation schema for the Regionality Index

                                           | Place living | Place born | Place raised | Either parent born
Target region                              | 1            | 0          | 0            | 0
Elsewhere in study region (e.g. Province)  | rule out     | 1          | 1            | 1
More distant (e.g. outside of Country)     | rule out     | 2          | 2            | 2

RI = Place living + Place born + Place raised + Either parent born

If we apply the formula in the last line of Table 8.3 to the example, we see how the RI works. The target area is Metro Vancouver, that is, the City of Vancouver and its surrounding municipalities. What counts as "elsewhere in the study region" depends on the area; in the Canadian context it can be equated with the province, and "more distant" can be interpreted as outside of the province. Ly's score is calculated as follows: Place living: Ly lives in Richmond, B.C., which is part of Metro Vancouver. Score 1. Anybody who does not live in the target area needs to be excluded from the survey. The score in this category is a "base point", as no one can score lower than 1 on the RI.



Chapter 8.  Working with WQ data 287

Place born: she was born in Vietnam, which is another country, and thus scores 2. Place raised: she was raised in Vietnam for the entire period from 8 to 18, which produces a score of 2. Either parent born: the parent born closest to the target area is the reference point. As long as one parent was born in the survey area, the score in this category is 0. In our example, both parents were born in Vietnam, which gives a score of 2.

Now we can add up the scores: RI = 1 (Place living) + 2 (Place born) + 2 (Place raised) + 2 (Either parent born) = 7.
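The scoring schema of Table 8.3 lends itself to a small function. The following is a minimal sketch, not part of the DT tools: the category labels "target", "elsewhere" and "distant" are our own shorthand for the three rows of the table, and deciding which category a given place name falls into remains the researcher's manual decision.

```python
# Points per category, following Table 8.3. The labels are illustrative shorthand:
#   "target"    = the target region (e.g. Metro Vancouver)
#   "elsewhere" = elsewhere in the study region (e.g. the province)
#   "distant"   = more distant (e.g. outside the country)
POINTS = {"target": 0, "elsewhere": 1, "distant": 2}


def regionality_index(living, born, raised, father_born, mother_born):
    """Return the RI (1-7) for one respondent, or None if they are ruled out."""
    if living != "target":
        return None  # respondents living outside the target region are excluded
    base_point = 1  # living in the target region always contributes 1
    # The parent born closest to the target region sets the parent score.
    parent = min(POINTS[father_born], POINTS[mother_born])
    return base_point + POINTS[born] + POINTS[raised] + parent


# Ly: lives in Richmond (target); born, raised and both parents born in Vietnam.
print(regionality_index("target", "distant", "distant", "distant", "distant"))  # 7
```

Taking the minimum of the two parent scores encodes the rule that the parent born closest to the target area is the reference point.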

Ly's Regionality Index is 7. From a linguistic perspective, she is thus classified as someone with fairly loose ties to the target region. It is important to add that relative newcomers such as Ly have traditionally been ignored in dialect surveys. The RI allows us to draw from the pool of all residents of a target region and therefore produces a more representative sample of the target area. This is an important step in dialectological data collection, which can be illustrated with Gregg's (2004 [1978/79]) survey of Vancouver English. Gregg wanted to describe only the English of very local residents and had to go to great trouble to find them, which indicates the artificiality of the task. To reach 240 informants for his survey (on top of 60 informants he already had), he and his team had to phone 3700 potential subjects, an acceptance rate of only 7% (Gregg 2004: 6). By not considering different degrees of local ties, Gregg arrived at a description of a minority of less than 10% of the speakers of Vancouver English, which is unfortunate indeed. Ly is at one extreme end of the RI; at the other extreme is Sarah, who was 19 at the time of the survey and had lived in Metro Vancouver her entire life. Here are her answers:

Where do you live now?

Burnaby, B.C., Canada

Your birthplace: Where were you born?

Vancouver

Where were you raised from ages 8 to 18?

Burnaby

Where was your father born?

Vancouver

Where was your mother born?

Vancouver

Sarah lives in Burnaby, which is part of Metro Vancouver (1 base point); she was born in Vancouver (plus 0); she lived in Burnaby from ages 8 to 18 (in fact, she has lived in the area her entire life) (plus 0); and neither parent was born outside of the target region (plus 0). Sarah has an RI of 1, which means she is a person with very close ties to the target area.


Everyone else falls between Ly's and Sarah's extremes. There are many possible combinations, some of which we will discuss to better illustrate the principle of the RI. Kate is a student in her 20s who now lives in Vancouver. Her answers to the relevant questions are shown below:

Where do you live now?

Vancouver, BC, Canada

Your birthplace: Where were you born?

Vancouver, BC, Canada

Where were you raised from ages 8 to 18?

8–18 Port Hardy, BC, Canada

Where was your father born?

Langley

Where was your mother born?

Vancouver

Kate's RI score is as follows: she lives in the target region (1 point) and was born in the target region (plus 0), but she grew up in Port Hardy, BC, at the northern tip of Vancouver Island, which is a 2-hour ferry ride and an 8-hour drive from Vancouver. Port Hardy is in the province but no longer in the target region (Metro Vancouver), which is why we assign a score of plus 1. As one of Kate's parents was born in Vancouver, we assign the closest category (plus 0), on the assumption that the close ties of one parent suffice. Kate's RI score is therefore 1 + 0 + 1 + 0 = 2. With an RI of 2, Kate is local, but not as local as Sarah, who spent her formative years in the metropolitan area of Vancouver, while Kate grew up in a small community in rural BC. In countries where in-migration plays an important role, such as Canada, an important cut-off point on the RI scale is 3. Often, people were born and raised in the target area, but their parents moved to Canada from abroad. In most cases, this produces an RI of 3. Mario has an RI of 3, as can be seen from his answers:

Where do you live now?

Vancouver, BC, Canada

Your birthplace: Where were you born?

Vancouver, BC, Canada

Where were you raised from ages 8 to 18?

8–18 Vancouver, BC, Canada

Where was your father born?

Israel

Where was your mother born?

Ukraine

Mario is 25 and the son of immigrants. He reaches a score of 3 the following way: RI = 1 (Place living) + 0 (Place born) + 0 (Place raised) + 2 (Either parent born) = 3.

It is important to keep in mind that an RI of 3 is still very local. After all, Mario spent his entire life in Vancouver until leaving for grad school. Still, because of his parents' more recent arrival in Vancouver, he does not reach a lower score. Chambers' RI distinguishes very finely between groups of people who are very local. This is because certain fine-grained linguistic feature distributions (such as Canadian Raising) have been suggested to be acquired by people whose parents already grew up in the target region. Probably for practical reasons, Chambers asks for the "Place born" of either parent, and not for their "Place raised". Keeping this in mind, we will at times collapse findings for RI 1–3, which coincides with the group of speakers who were born and raised in the target area. This was also Gregg's target group in the Vancouver Survey of the late 1970s, for which he had trouble finding interviewees. Another interesting cut-off point is an RI of 5. This class applies, among others, to informants who grew up in the target area but who, like their parents, were not born there. People who moved into the target area from abroad with their families as young children will frequently score 5. For example, John is in his 90s and has lived in Vancouver for 84 years. He was born in Norfolk, England, where his parents were also born. He spent his formative years, 8–18, in Vancouver, as he came to Vancouver at the age of 6. He would reach the following RI score: RI = 1 (Place living) + 2 (Place born) + 0 (Place raised) + 2 (Either parent born) = 5.

The formula, therefore, assigns one of the oldest and most local Vancouverites in any sample an RI of only 5. In other words, RI 1–5 covers the group of very local respondents: the RI is fine-grained for locals but rather crude for the diverse group of immigrants. This is not the only way to reach an RI of 5, however. For instance, Martin is in his 30s and was born in Victoria, B.C. He grew up in Nelson, B.C., and moved to Vancouver for good only 6 months before taking the survey. His father was born in Austria, his mother in the USA. His score is also 5: Place living: Vancouver, B.C.: base score of 1. Place born: Victoria, B.C., which is in the province (elsewhere in the study region): plus 1. Place raised: Nelson, B.C., within the province but not the target area: plus 1. Either parent born: both parents were born outside of the country: plus 2.

Martin's RI score is 1 + 1 + 1 + 2 = 5. Although Martin never left the province for long, the RI documents his looser ties to the specific target community. We can now see how the intermediate scores of 4 and 6 are reached: had John been born in Edmonton, Alberta, he would score 1 (outside of the province) for "Place born" and not 2 (for Norwich, UK). This would give him an RI score of 4. If, on the other hand, Martin had been born not in Victoria, BC, but in the USA, where his mother is from, his RI score would be 6, not 5. The Regionality Index is not a linear, interval index but a ranked index: Sarah and Martin, with a difference of 4 points between them on the 7-point scale, have more in common linguistically than Martin and Ly, who score 5 and 7 respectively. While only two points separate Martin and Ly, Martin has much more profound ties to the location than Ly. It is important to keep in mind that respondents with an RI of 5 still have profound ties to the province and country in the larger context of immigration. This is different for RI 6 and 7, which are the two scores for newcomers from other countries. RI 7 is a very heterogeneous class: someone like Ly, who migrated to Canada as a second-language speaker of English and learned English in Canada as an adult, is assigned the same score as someone who has lived all his life in Washington State, just south of the Canada–U.S. border, who may have visited Vancouver occasionally on weekend trips, and who decided to move to Vancouver after age 18. In such cases the RI is blind to the speakers' differences, and users of DT data need to understand these specifics of the RI.
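Because the RI is ranked rather than interval, summaries that treat it as ordinal are the safer choice. The sketch below is illustrative only: the three bands (1–3, 4–5, 6–7) are our own grouping loosely following the cut-off points discussed above, not a DT convention, and `median_low` is used because it always returns an actually observed score instead of an average between scores.

```python
from collections import Counter
from statistics import median_low


def ri_band(ri):
    """Collapse an RI score (1-7) into one of three illustrative ordinal bands."""
    if not 1 <= ri <= 7:
        raise ValueError(f"RI must lie between 1 and 7, got {ri}")
    if ri <= 3:
        return "RI 1-3"  # per the text, roughly those born and raised locally
    if ri <= 5:
        return "RI 4-5"
    return "RI 6-7"  # newcomers from other countries


# RI scores of the six respondents discussed in this section.
scores = {"Ly": 7, "Sarah": 1, "Kate": 2, "Mario": 3, "John": 5, "Martin": 5}
band_counts = Counter(ri_band(s) for s in scores.values())
typical = median_low(scores.values())  # an observed score, never a fraction
print(band_counts)
print(typical)  # 3
```

A mean RI (here 23/6, about 3.8) would suggest a precision the ranked scale does not have; counts per band and a low median avoid that.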

Complex cases
There are some cases for which assigning the RI is a bit tricky. People with complex residential histories, e.g. children of families who moved a lot, can have very complex residential records, sometimes to the extent that they cannot recall precisely where they lived during some periods of their lives. Approximate information is usually enough (the questionnaire asks for fairly precise identification of place names, "City, Province, Country", only to make identification possible). Such complex cases are fairly infrequent but need to be dealt with. The first principle to keep in mind is that one can only assign 0, 1 or 2 points for "Place raised", and no intermediate points (no 'half points'). This is necessary to keep the RI of that individual consistent with the system. The following cases illustrate an efficient approach for complex migration histories:

Jay: born Chicago; 8–9 Chicago; 9–10 London, UK; 10–13 Seattle; 13–18 Vancouver
Jack: born Guangzhou, Guangdong, China; 8–9 Guangzhou; 9–18 Burnaby, BC, Canada

Jay was born in Chicago and spent his first nine years there (we infer this from his data), then two years in the UK, followed by three years in Seattle, before his family moved to Vancouver when he was 13. They were still in Vancouver at the time of the survey. Jay's is a more complex residential history. Which score should we assign for "Place raised"? Our choices are 0, 1 or 2. Given Jay's five years in Vancouver from age 13, it would seem wrong to assign a "2", and it would not feel right to assign a "0" either. A score of "1" seems like a good compromise. Cases like Jack's, whose parents are Hong Kong Chinese and who moved to Metro Vancouver (Burnaby) at age 9, are easier to solve. Here there is some leeway in score assessment, and one will have to decide on a cut-off point. Jack could be assigned a score of "0" for "Place raised", as he de facto spent the vast majority of his formative years in the target area. In his case, the 8–18 age frame for formative years need not be interpreted all too strictly. Had he come at age 13, like Jay, we would assign him a "1". A good rule of thumb is to assign a person who spends about 50% of his formative years in the target region a "1", one who spends two years or less a "2", and one who spends more than 50% a "0". Some difficult calls will need to be made, but the idea is that such distinctions will not skew the overall picture.
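The rule of thumb for "Place raised" can be written down as a small decision function. This is a sketch under stated assumptions: it takes the 8–18 span as 10 years, treats anything between "two years or less" and "more than half" as the "about 50%" case, and assumes, as for Jay and Jack, that the remaining formative years were spent outside the country (a respondent raised elsewhere in the province would need different handling).

```python
def place_raised_score(years_in_target, span_years=10):
    """Turn years spent in the target region between ages 8 and 18 into 0, 1 or 2.

    Rule of thumb: more than half of the span -> 0, two years or less -> 2,
    anything in between ("about 50%") -> 1. Assumes the remaining years
    were spent outside the country, as for Jay and Jack.
    """
    if not 0 <= years_in_target <= span_years:
        raise ValueError("years_in_target must lie between 0 and span_years")
    if years_in_target > span_years / 2:
        return 0
    if years_in_target <= 2:
        return 2
    return 1


print(place_raised_score(5))  # Jay, in Vancouver from 13 to 18: 1
print(place_raised_score(9))  # Jack, in Burnaby from 9 to 18: 0
```

Returning only the integers 0, 1 and 2 respects the "no half points" principle, so the result can be added directly to the other RI components.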

Applying the RI
As is the case with other independent variables, the RI can be correlated with linguistic features. In Quebec City it was shown that the RI offers an explanation for the use of the Quebec English variant soft drink (rather than pop, soda etc.). Figure 8.1 shows that in Quebec City soft drink is predominantly reported by respondents with low RIs: for those with RI 1–3 it is the majority variant, while newcomers (RI 5–7) prefer the standard Canadian pop. As respondents' local ties strengthen over the generations, their use of soft drink increases as they align more closely with this local standard variant.

Adjusting the RI for multilingual respondents
It is important to keep in mind that the RI does not capture all kinds of migration histories with the same precision. It does not consider, for instance, the residential history of ages 0–8, and it is a somewhat fuzzy method for children of highly mobile families. The RI, as this introduction has shown, is geared towards a very fine-grained discrimination of people with local ties and, by comparison, lumps the interlopers together into a heterogeneous group. This has proven important in the Canadian native-speaker context, but other contexts may require adjustments to the RI. DT studies are intended for respondents aged 14 and older; studies of younger children would need a different RI and a different data collection method.

Figure 8.1  Soft drink and pop by RI in Quebec City (Chambers & Heisler 1999: 43) [chart: soft drink and pop by Regionality Index]
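A figure like 8.1 rests on a simple cross-tabulation: for each RI score, the share of respondents reporting a given variant. The sketch below shows that computation; the field names "RI" and "drink" and the five records are invented for illustration and are not Dialect Topography data.

```python
from collections import defaultdict


def variant_share_by_group(records, group_key, variant_key, variant):
    """Percentage of respondents in each group who report `variant`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[variant_key] == variant:
            hits[group] += 1
    # One decimal place, groups in ascending order (e.g. RI 1 to RI 7).
    return {g: round(100 * hits[g] / totals[g], 1) for g in sorted(totals)}


# Hypothetical toy records, NOT Dialect Topography data.
records = [
    {"RI": 1, "drink": "soft drink"},
    {"RI": 1, "drink": "soft drink"},
    {"RI": 1, "drink": "pop"},
    {"RI": 7, "drink": "pop"},
    {"RI": 7, "drink": "soft drink"},
]
share = variant_share_by_group(records, "RI", "drink", "soft drink")
print(share)  # {1: 66.7, 7: 50.0}
```

The same function works for any grouping column, so it could equally tabulate a variant by LUI or OMI.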

The RI has proven to be an effective tool for native speakers of a language. Having explained its calculation with Vancouver examples and seen in Figure 8.1 that the 7-level RI works well in some Canadian contexts, we need to add that the RI is a suboptimal tool for the study of more recent migrants. If the goal of a study is, for instance, to investigate L2 Englishes or to include them in the dataset, the RI would need to be extended to distinguish more meaningfully between these relative newcomers. In other words, RI level 7 would be subdivided further, for instance by "First language", adding levels 8, 9, 10 and higher. If the higher scores are constructed as subclasses of RI 7, they can be collapsed back into RI 7 when needed to preserve compatibility with the DT data.
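One way to build such an extension while keeping it reversible is sketched below. Everything here is a hypothetical design: the choice of first languages, their mapping to levels 8 and 9, and keeping English-L1 newcomers at 7 are assumptions for illustration, not part of DT.

```python
# Hypothetical extension: subdivide RI 7 by first language (L1).
# This mapping is an illustrative assumption, not a DT convention.
L1_SUBCLASS = {"English": 7, "French": 8, "Cantonese": 9}
OTHER_L1 = 10  # any first language not listed above


def extended_ri(ri, first_language):
    """Return an extended RI: scores 1-6 unchanged, RI 7 split by L1."""
    if not 1 <= ri <= 7:
        raise ValueError(f"RI must lie between 1 and 7, got {ri}")
    if ri != 7:
        return ri
    return L1_SUBCLASS.get(first_language, OTHER_L1)


def to_dt_ri(ext_ri):
    """Collapse extended scores (8 and higher) back to RI 7 for DT compatibility."""
    return min(ext_ri, 7)


print(extended_ri(7, "Vietnamese"), to_dt_ri(extended_ri(7, "Vietnamese")))  # 10 7
```

Because every new level is a subclass of RI 7, collapsing is a single `min` and never alters scores 1 to 6.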

8.2.2 Language Use Index (LUI)
Another index in the DT context pertains to the use of the English language. The Language Use Index assesses how frequently a respondent uses English. Based on four questions, a numerical index is established that ranges from "0" for the monolingual speaker who only speaks English to "12" for the person who never uses English (Chambers & Lapierre 2011). The matrix of questions is shown below:

How often do you use English?
At home: always, often, seldom, never
At work: always, often, seldom, never
With your friends: always, often, seldom, never
With your relatives: always, often, seldom, never

For the use of English in each specified setting, a numerical value is assigned to each answer according to the following scheme: always = 0 points, often = 1 point, seldom = 2 points, never = 3 points.

A respondent who uses English often at home, always at work, often with friends and never with relatives would thus have an LUI of 1 (at home) + 0 (at work) + 1 (with friends) + 3 (with relatives) = 5.
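The LUI calculation is a straight lookup-and-sum, sketched below. The setting keys ("home", "work", "friends", "relatives") are our own labels for the four questions; the point values follow the scheme above.

```python
LUI_POINTS = {"always": 0, "often": 1, "seldom": 2, "never": 3}
SETTINGS = ("home", "work", "friends", "relatives")


def language_use_index(answers):
    """Sum the points for the four settings: 0 = only English, 12 = never English."""
    missing = [s for s in SETTINGS if s not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    return sum(LUI_POINTS[answers[s]] for s in SETTINGS)


example = {"home": "often", "work": "always", "friends": "often", "relatives": "never"}
print(language_use_index(example))  # 5
```

Rejecting incomplete answer sets matters here: a silently skipped setting would lower the score and make a respondent look more English-dominant than they are.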

As we can see, the hypothetical respondent's LUI is 5. It is plausible that the frequency of one's use of English affects one's linguistic behaviour. An example is offered in Chambers and Lapierre's (2011) study of the Ottawa Valley. Figure 8.2 shows the correlation between LUI and the reported frequency of the variant sofa for the variable chesterfield/couch. In Chapter 4, sofa was set aside as a minor variant. Here, with the help of the LUI, we can show that sofa correlates with a specific subgroup of the data set. Figure 8.2 clearly shows that Francophones' use of sofa correlates with their frequency of use of French: the more they use French (respondents with LUI 9–11 rarely use English), the higher the transfer from French, which has sofa as its standard term. The LUI is a simple yet important diagnostic in multilingual contexts, but because of its focus on English it is blind to the linguistic background of the respondents. Such information needs to be gleaned from the respondents' remaining background questions (e.g. "other languages used", or better "Mother tongue", the latter of which is not asked in DT).

Figure 8.2  Sofa in Ottawa Valley according to LUI and mother tongue (Chambers & Lapierre 2011: 47) [chart: sofa (L-French) and sofa (L-English) by LUI]

8.2.3 Ethnic Orientation Index (EOI)
A more recent innovation, and currently a topic of great interest, is the modelling of "ethnic orientation" in sociolinguistics: rather than relying on self-identification of ethnic group membership, this approach defines ethnicity by an individual's actions towards or against association with an ethnic group. Does an individual aim to be part of an ethnic community, of the mainstream, or of both, and to what degree? Hoffman and Walker (2010) presented their Toronto interviewees with a detailed questionnaire on ethnic orientation that is divided into eight sections, each containing between 2 and 5 questions. The sections, with sample questions, are shown below:

Ethnic identification, e.g. Do you think of yourself as Italian, Canadian or Italian-Canadian? Are most of your friends Italian? Are people in your neighbourhood Italian?
Language, e.g. Do you speak Italian? How well? How often? If no: Can you understand Italian? Do you prefer to speak Italian or English? Do you prefer to listen to the radio or watch TV in Italian or English?
Language choice, e.g. What does your family speak when you get together? What language do you speak with your friends? Did/do you speak to your parents Italian? Your grandparents? Your children/grandchildren?
Cultural heritage, e.g. Where were you born? If in Italy: How old were you when you came here? How long have you lived here? If in Canada: Have you ever been to Italy? When? For how long? Where did you go to school?
Parents, e.g. Do your parents think of themselves as Italian, Canadian or Italian-Canadian?
Partner, e.g. Is your husband/wife/boyfriend/girlfriend Italian? Does she/he speak Italian? Do you speak Italian to her/him?
Italian culture, e.g. Should Italian-Canadian kids learn Italian? Italian culture? Would you rather live in an Italian neighbourhood?
Discrimination, e.g. Have you ever had a problem getting a job because you're Italian? What about renting an apartment or buying a house? Is there a lot of discrimination against Italians?
(Excerpts from Ethnic Orientation Questionnaire, Hoffman and Walker 2010: 66–67)

The basic principle of quantification is already familiar from the RI and LUI. The EOI is created by assigning a score to the answer to each of these questions to assess the level of ethnic involvement. Scores range from 1 for the least ethnically oriented answer to 3 for the most, e.g. for self-identification: Canadian = 1, Italian-Canadian = 2 and Italian = 3 (first question above). Averages are formed across all 35 questions, with 1.5 serving as the cut-off point: 1.5 and higher counts as a "high EOI" and 1.49 or lower as a "low EOI". The creation of subclasses is possible as well but was not carried out in the original study. While it was shown that the overall linguistic system of second-generation speakers of Italian and Chinese descent is the same as that of British-Irish speakers in Toronto, an overall correlation between EOI and participation in mainstream Toronto English features suggests that "at least some of them may be using overall rates of use to construct and express ethnic identities" (Hoffman & Walker 2010: 58). It seems clear that respondents' orientation towards an ethnicity may influence language choices and linguistic behaviour. The EOI may serve as a model for devising indices that allow the isolation of speaker attitudes and strategies in the construction of linguistic identities. Overall, the RI and LUI go a certain way and are useful indices. The EOI and similar indices have greater potential as more fine-grained instruments, yet with 35 questions the EOI takes up quite a bit of space on any WQ, which makes it less practical for some applications. Using a subset of what are considered the most important questions, perhaps about half (18) instead of all 35, would go a considerable way towards a more efficient modelling of the linguistic effects of ethnic orientation.
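The averaging-and-cut-off step can be sketched as follows. The item scores passed in are hypothetical; the function does not enforce the 35 questions of the original instrument, so it would equally serve a shortened subset. Using "1.5 or higher" as the test also settles averages that fall between 1.49 and 1.5.

```python
from statistics import mean

EOI_CUTOFF = 1.5  # 1.5 and higher = "high EOI"


def ethnic_orientation_index(scores):
    """Average item scores (each 1-3) and classify the result as high or low EOI."""
    if not scores:
        raise ValueError("no item scores given")
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each item score must be 1, 2 or 3")
    avg = mean(scores)
    return avg, ("high" if avg >= EOI_CUTOFF else "low")


print(ethnic_orientation_index([1, 1, 1, 2]))  # (1.25, 'low')
```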

8.2.4 Occupational Mobility Index (OMI) and Social Class (SC)
The OMI is based on social class. In DT, social class is assigned by the sole criterion of the respondent's occupation. White collar occupations (e.g. teacher, administrative assistant) are assigned a "1", blue collar occupations (e.g. construction worker, carpenter) a "2".

Social Class (SC):
1 = white collar occupation
2 = blue collar occupation

The OMI adds the social class score of the respondent's parents to the respondent's own score. This yields the following three OMI classes:

Occupational Mobility Index (OMI): add the social class scores of respondent and parents
2 = stable white collar
3 = occupationally mobile
4 = stable blue collar

If the two parents have different types of occupations, the white collar occupation of one parent is used to set the score. One variable that correlates with the OMI is the past participle form in DT question #27:

27. Which do you say?
He has drank three glasses of milk.
He has drunk three glasses of milk.
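The SC and OMI rules above translate into a short function, sketched here. Passing the parents' scores as a list is our own design choice; taking their minimum implements the rule that a white collar parent sets the parental score when the parents' occupation types differ.

```python
WHITE_COLLAR, BLUE_COLLAR = 1, 2
OMI_LABELS = {2: "stable white collar", 3: "occupationally mobile", 4: "stable blue collar"}


def occupational_mobility_index(respondent_sc, parent_scs):
    """Return (OMI, label) from social class scores (1 = white, 2 = blue collar)."""
    if not parent_scs:
        raise ValueError("at least one parent's social class is needed")
    for sc in (respondent_sc, *parent_scs):
        if sc not in (WHITE_COLLAR, BLUE_COLLAR):
            raise ValueError(f"social class must be 1 or 2, got {sc}")
    # If the parents' occupation types differ, the white collar parent sets the score.
    parent_sc = min(parent_scs)
    omi = respondent_sc + parent_sc
    return omi, OMI_LABELS[omi]


print(occupational_mobility_index(WHITE_COLLAR, [BLUE_COLLAR, BLUE_COLLAR]))
# (3, 'occupationally mobile')
```

Note that OMI 3 covers both directions of mobility: a white collar respondent with blue collar parents and a blue collar respondent with a white collar parent both score 3.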

Drunk is the standard participle form, though drank has been in use in North America for a long time. Table 8.4 shows a correlation between the non-standard form and blue collar occupations in two Canadian locations. In Quebec City, stable white collar respondents, who had at least one parent in a white collar profession, show the least use of has drank, about 20%; the occupationally mobile (moving from blue collar to white collar in the children's generation) report drank slightly more often, at just over a quarter, while stable blue collar respondents report it in more than 60% of cases. In New Brunswick the figures are generally higher: almost 50% of stable white collar respondents report has drank. In that province it is the occupationally mobile who are most sensitive to the linguistic standard and use has drank the least. This result is reminiscent of the hypercorrection of the Lower Middle Class for postvocalic [r] in New York City: those who sit at the cusp between social groups are often hyper-sensitive to the norms of the higher class and often "overdo" it. The stable blue collar respondents, by contrast, report the non-standard form at a rate of almost 3 in 4.

Table 8.4  Has drank (not has drunk) in percent by OMI (Dialect Topography database)

               | OMI 2 | OMI 3 | OMI 4
Quebec City    | 20.5  | 27    | 63.2
New Brunswick  | 49.1  | 39.7  | 73.1

While only a crude measure of social mobility, the OMI manages to produce some meaningful correlations, especially for grammatical variables. Now that we have discussed the use of indices for the social sub-classification of respondents, we have all the tools needed to fully exploit WQ data. The next section is a brief introduction, a “crash course”, in making the most of WQ data with a very basic software product: Microsoft Excel.


8.3 Data-readying in Excel
For many advanced routines, it is necessary to download the DT data from the website in order to process it in more complex ways than the website allows. Downloading and data-readying is a straightforward process that requires only a few steps. Once downloaded, the DT data provides a template of what all WQ data should look like: each respondent's answers are listed in one row, with the variables in the columns. Social variables (Soc. 1–n) and linguistic variables (Var. 1–n) are arranged as follows:

           | Soc. 1 | Soc. 2 | Soc. […] | Var. 1 | Var. 2 | Var. 3 | Var. 4 | Var. 5 | Var. […]
Resp. #1   |
Resp. #2   |
Resp. #3   |
[…]        |

8.3.1 Importing DT data into Excel

The function "data request", in the top menu in Illustration 8.1, enables us to download the data for each survey in various formats. Illustration 8.8 shows the home screen of data request. You can choose data length, region and question, and then submit your request. The screenshot also shows the beginning of the instructions section. The web portal provides some guiding documents, and for further information you can consult the "Routines for the Working With Dialect Topography Databases", which is further down on the Data Request page but not shown in Illustration 8.8. In this section, you will be introduced to the download functions that should serve most needs. An important field in Illustration 8.8 is data length, where you decide on both the format and the "length" of the downloaded records. In most cases "Full" is the option of choice, as you will want to download all available information, not just a basic subset ("Short"). You also need to choose between formats such as Excel and the VARBRUL format, which is a format for the statistical analysis tool GOLDVARB, another standard in sociolinguistic statistics.30 Region can be selected based on Table 8.1. For instance, the selection "Golden Horseshoe (All)", as shown in Table 8.1, summarizes all data from the 1991/2 survey, regardless of the Canadian or U.S. provenance of the respondents.

30. You can download Goldvarb here: , see Tagliamonte (2006) for a description of how to use it.




Illustration 8.8  Data Request home screen (Dialect Topography)

Let us download and import some data. Under data length, select "Full", which includes the complete linguistic and social data for the variable. For Region we can leave "Golden Horseshoe (ALL)", and for Question we can leave "q1-different". Illustration 8.9 shows the raw form of the output data. Each field is enclosed in double quotation marks (" ") and separated from the next by the pipe symbol, |. The data now needs to be "readied": PC and Mac users need to take the corresponding steps outlined below. The output will be a table that can be worked with in standard spreadsheet software, such as Excel.

Illustration 8.9  Raw download format (Full Data Length) (clipping)
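Outside a spreadsheet, the same raw format can be read with Python's standard `csv` module by declaring the pipe as delimiter and the double quotation mark as quote character, which mirrors the choices made in the import wizard described below. This is a minimal sketch; the two sample lines are invented placeholders in the described format, not actual DT records.

```python
import csv
import io


def read_dt_export(text):
    """Parse a DT text export: fields quoted with " and separated by |."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    return [row for row in reader if row]  # skip blank lines


# Two invented lines in the described format (placeholder values only).
sample = '"1001"|"first answer"|"field 3"\n"1002"|"second answer"|"field 3"\n'
rows = read_dt_export(sample)
print(rows[0])  # ['1001', 'first answer', 'field 3']
```

Each resulting list corresponds to one spreadsheet row, i.e. one respondent.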


PC users
Procedures may vary slightly from system to system, but PC users take the following steps (the example was generated with Mozilla Firefox and MS Excel 2010 on a Windows machine):
1. In the browser's File menu, select Save As.
2. Save the data file. A window will appear with "exceldata.php" as the default name. Use a descriptive ("talking") name, such as "GH All q1-different". Make sure that the file format is "Text file" or an equivalent such as "Plain text".
3. Open the data file with a spreadsheet program (such as Microsoft Excel).
4. The file is recognized as a text file. You will get a warning that is intended to protect you and your data from malicious code from the internet: "The file you are trying to open, 'GH ALL q1-different', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file." In our case, this warning can be disregarded, as the DT files are clean and not the product of hackers and virus programmers. Press "Yes" or "OK" to open the file.
5. In many programs, a text import wizard will open. Excel 2010 has the following one:

Illustration 8.10  Importing the text into Excel 2010




6. Choose "Delimited" as the data type, as selected in Illustration 8.10. DT data is divided by a special character and can thus be transferred into a table structure.
7. Pressing "Next" will prompt you to identify the character that delimits (separates) the fields in our data table. Illustration 8.11 shows the wizard with the correct choices:

Illustration 8.11  Copy-and-pasting the delimiting character into the Excel Text Import Wizard

8. Select the option "Other" (uncheck any other choices, such as "Tab"). Now we only need to enter the delimiting character. All DT datasets are delimited with the pipe symbol, "|". To insert it, place the cursor in the field next to "Other:". Then hold down the "Alt" key while entering "1", "2" and "4" on the numeric keypad on the right, and release the "Alt" key. The pipe symbol will appear in the box, and the preview will already show the data properly delimited. (Alternatively, you may copy and paste the pipe symbol from the browser screen; see Illustration 8.13.) The preview window in Illustration 8.11 shows that we are on the right track.
9. Make sure that the text qualifier is a double quotation mark ("), as shown in Illustration 8.11. All that is left is to press "Finish" (or "Next" and then "Finish", without any further changes), and you will see a table that has been imported cleanly into Excel, as shown in Illustration 8.12:


Illustration 8.12  Imported data in Excel

10. Illustration 8.12 shows the format we need to arrive at. In the spreadsheet, each respondent's answers are listed in one row, and all variables are listed in columns.
11. Now save the file with Save As, and select "Excel Workbook" as the format (do not save the file in a text format).
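As an alternative to the import wizard, the downloaded text file can be converted into an ordinary comma-separated (CSV) file, which Excel opens without asking for a delimiter. This is a sketch, not a DT-supplied tool: the file encoding of the downloads is an assumption you would need to check, and the one-line demo record is invented. Fields containing commas are automatically quoted in the output, so place names such as "Town, Province" survive the conversion.

```python
import csv
import io


def convert_pipe_to_csv(src, dst):
    """Copy a pipe-delimited, double-quoted text stream to comma-separated CSV.

    Returns the number of rows written; blank lines are skipped.
    """
    reader = csv.reader(src, delimiter="|", quotechar='"')
    writer = csv.writer(dst)  # default dialect: commas, quoting only where needed
    count = 0
    for row in reader:
        if row:
            writer.writerow(row)
            count += 1
    return count


# In-memory demo with one invented record; for real files, open them with
# open(path, newline="", encoding=...) and pass the file objects instead.
out = io.StringIO()
n = convert_pipe_to_csv(io.StringIO('"1001"|"Toronto, ON"\n'), out)
csv_text = out.getvalue()
print(n, repr(csv_text))  # 1 '1001,"Toronto, ON"\r\n'
```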

Mac users
Mac users follow a very similar routine. The following steps have been tested with Firefox 3.0, Excel for Mac 2008 and Mac OS X 10.4:
1. In the browser's File menu, select Save As.
2. Save the data file. A window will appear with "exceldata.php" as the default name. Use a descriptive ("talking") name, such as "GH All q1-different". Make sure that the file format is "Text file" or an equivalent such as "Plain text".



Chapter 8.  Working with WQ data 301

3. Open Excel and choose Open File. In the Finder dialogue, select “All Documents”; the default, “All Readable Documents”, won’t let you open the file.
4. Text Import Wizard, Step 1 of 3: Choose “Delimited” as the “Original Data Type” and press “Next”.
5. Text Import Wizard, Step 2 of 3: Uncheck “Tab” and check “Other:”. Now you need to insert the pipe symbol into the box next to “Other:”. To do this, go back to the browser window where you saved the data, as shown in Illustration 8.13. Mark a pipe symbol on the page and copy it with [Apple]+C. Then move the cursor into the field next to “Other:” and insert the pipe with [Apple]+V. Make sure that the “Text qualifier” is set to the double quotation mark (").

Illustration 8.13  Mac procedure for inserting the Pipe symbol via Copy-and-Paste

6. Press “Finish”, and Excel will display the file in the same layout as in Illustration 8.12.
7. Now save the file with “Save As” and select “Excel Workbook” as the format (do not save the file in a text format).

We have imported the data. Illustration 8.12 shows the first few lines of our data for different (question 1) in the Golden Horseshoe 1991/2 (there are a total of 1015 lines of data for this DT survey). We can see the index numbers, starting with “1001”, followed by the answer “Our house…” (the column can be widened to display the rest of the sentence), gender, education, occupation (teacher, student, office worker and so on), the index scores (OMI, SC [social class] and RI), the subregion within the Golden Horseshoe (starting with the U.S. locations, e.g. New York, Long Island, Albany), a broader regional classification (NY1 and NY2) and the first language of the speaker. Remember that we chose “All”, the cross-national data, for download; this is why the first lines show the American section of the sample. With this information we can look up the responses of individuals on the web portal, with the function we discussed in the previous section. If, for instance, we want to look up the male in his 40s listed as respondent “1001”, we can enter “1001” in the individual lookup and get his answer. Or we can simply look at the Excel sheet. With a little practice, you will probably find the Excel format more convenient than the webpage for individual lookups, as all the information is in one location. All of the new findings based on DT data in this book were derived from the downloaded data. While the online graphing tools are very useful for the beginner, the downloaded data is easier to work with, especially when one wants to compare all seven regions, as we did in Chapter 4. There are very few limits to how the data can be processed from here. We will look at some very versatile Excel commands next.

8.3.2 Three basic Excel commands

This section is aimed at the novice in quantitative linguistics, and no prior Excel knowledge is assumed. The goal is to introduce the reader to a very small set of commands that can be used in the analysis of WQ data, either downloaded or self-collected. We will work with only three commands (called formulae): SUM, COUNTIF and COUNTIFS, all of which we will apply frequently. Download the DT data from the previous section (Golden Horseshoe – All) and follow the steps in Excel. Please note that all data files for this practical section, as well as for the statistics section in Chapter 9, can be downloaded on the companion website or .

Principles of Excel

Excel is a table-oriented “calculator” and offers two modes: the text mode and the mathematical mode. When you open Excel, the text mode is the default. You switch into math mode by typing an equal sign in a cell: “=” tells Excel to switch into math mode. In math mode, Excel identifies cells with letter and number codes: letters (A, B, C … AA, AB, AC …) stand for columns, numbers (1, 2, 3, 4, …) for rows. So A1 defines a cell, as does K1034 or, beyond column Z, AB1, AC1, AD1 and so forth. The column labels (e.g. “AD”) can be found in the top row, and the row labels (e.g. “1”) next to the left-most column. Excel identifies cells by this grid pattern, which is why it is important to understand this labelling system. Illustration 8.14 shows the beginning and the end of the file that we downloaded in the previous section (Golden Horseshoe – All q1-different):




Illustration 8.14  Downloaded DT file, Golden Horseshoe (All), q1-different, in Excel

The line above the table grid shows “B2”, which is where I placed the cursor, and its content “Our house is different from yours.” We can see from Illustration 8.14 that the variable q1-different was coded in the following way:

different from – “Our house is very different from yours.”, with house sometimes capitalized as “House”
different than – “Our house is very different than yours.”
different to – “Our house is very different to yours.”
(Coding of q1-different; the latter variant not shown in Illustration 8.14)

It is very important to look closely at how a variable was coded, as in the inset above. We also see that social variables are coded in a particular format, such as the age categories, which are listed in the following way: 14–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79 and “over 80”. Gender is coded as “Male” and “Female”, the OMI (Occupational Mobility Index) as 2, 3 or 4, social class as 1 or 2, and the Regionality Index (RI) as a value between 1 and 7. The information for place of residence and a subdivision of place, “central place”, is also included. Some records show a “language”, presumably the first (or another) language of the respondent. The Language Use Index (LUI) was not yet applied in this first study. We can also see that the first respondents are from the US, while the last ones are from Canada (see the field “place”).
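The column-labelling scheme described under Principles of Excel (A … Z, then AA, AB, …) is a base-26 numbering without a zero digit. As a minimal Python sketch (the function names are hypothetical helpers, not Excel features), converting between labels, column numbers and cell identifiers could look like this:

```python
import re


def column_number(letters: str) -> int:
    """Excel column label -> 1-based column number (A=1, Z=26, AA=27)."""
    if not letters:
        raise ValueError("empty column label")
    n = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column label: {letters!r}")
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def column_label(n: int) -> str:
    """1-based column number -> Excel column label (27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift by one: there is no 'zero' letter
        label = chr(ord("A") + rem) + label
    return label


def split_cell(ref: str) -> tuple[int, int]:
    """Cell identifier such as 'K1034' -> (row, column number)."""
    m = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if m is None:
        raise ValueError(f"not a cell identifier: {ref!r}")
    return int(m.group(2)), column_number(m.group(1))


print(split_cell("K1034"))
print(column_label(30))
```

The "minus one" in column_label is what makes Z (26) roll over to AA (27) rather than to a non-existent "A0".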


COUNTIF and SUM

We can now begin to analyze the data. The COUNTIF command counts the cells in a range that meet a condition. We can use this command to count, for instance, how many answers of “from”, “than” and “to” we have in the data. The variable is located in column B, and the records run from line 2 to line 1016. With COUNTIF, we can let Excel count these responses for us. In Excel syntax, the following command counts all cells in the range B2 to B1016 (i.e. all our data fields) that contain the word “from”:

=COUNTIF(B2:B1016, "*from*")

Note the brackets, the colon and the comma, as well as the double quotation marks and the asterisks (*). The asterisks tell Excel that we are not interested in what else is in the field, either to the left or to the right of the string from; this trick levels out the different spellings house/House that are found in the data. Remember to start the cell with “=”, so that Excel switches into math mode:

Illustration 8.15  COUNTIF example

I place my cursor in an empty cell below the table (I choose B1018) and enter my formula, starting with =. Excel assists us by highlighting the search range. If we press ENTER at the end of the line, we get 607 as a result: 607 of our total of 1015 responses contain the from variant, which is more than half. In order to keep track of my counts, I add a legend and repeat the command, with a different search term, for the other variants, using the following formulae (see Illustration 8.16):

=COUNTIF(B2:B1016, "*than*")
=COUNTIF(B2:B1016, "*to*")




Illustration 8.16  Basic variant count for q1-different

You should get the results: from – 607, than – 354, to – 42. This result already shows that to is a minor variant and from and than are in competition with one another.
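What COUNTIF does with a wildcard criterion such as "*from*" can be sketched in Python. This is an approximation under stated assumptions: * matches any run of characters, ? a single character, matching ignores case and must cover the whole cell, and Excel’s ~ escape character is not handled. The sketch also makes one caveat visible: "*to*" matches any cell containing the letters to anywhere, so a coding that included, say, tomorrow would inflate that count, which is one more reason to check the coding first.

```python
import re


def wildcard_to_regex(criterion: str) -> re.Pattern:
    """Translate an Excel-style text criterion into a compiled regex.

    '*' matches any run of characters and '?' exactly one character;
    everything else is literal. Excel's '~' escape is not handled here.
    """
    parts = []
    for ch in criterion:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def countif(cells, criterion: str) -> int:
    """Count cells whose whole content matches the criterion (case-insensitive)."""
    pattern = wildcard_to_regex(criterion)
    return sum(1 for cell in cells if pattern.fullmatch(str(cell)))


# Invented mini-sample standing in for column B
column_b = [
    "Our house is very different from yours.",
    "Our House is very different from yours.",
    "Our house is very different than yours.",
    "Our house is very different to yours.",
]
print(countif(column_b, "*from*"), countif(column_b, "*than*"), countif(column_b, "*to*"))
```

Without the asterisks, the criterion must match the entire cell, so countif(column_b, "from") finds nothing.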

Illustration 8.17  SUM command

An important way to compare results is to express them as percentages of all responses. With the SUM command, shown in Illustration 8.17, we can now compute the sum of those three variants. Again starting with =, we type:

=SUM(B1019:B1021)

Pressing ENTER gives 1003 as the result. Note how I provided a label for the operation (SUM) in field B1022. Note also how I inserted an empty row above our calculation (mark the row, right-click, insert row) to make space for the % sign in row 1018. We are already in the middle of building our first data table. We would now like to know what percentage of respondents answered with from, than or to. We need to do this to be able to compare this result for the Golden Horseshoe with other regions: since the regions have different total numbers of responses, percentages allow us to compare the reported uses nonetheless. Excel can help us here. I instruct Excel, again by starting with =, to divide the content of field B1019 (the 607 counts for from) by field B1022 (the sum of all three variants) and multiply by 100 to yield a percentage. Illustration 8.18a shows the operation and the formula window at the top of the screen. Pressing ENTER in the field or clicking the check mark in the formula window causes Excel to compute the result.
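The percentage step is simple arithmetic: each count is divided by the same total (the SUM) and multiplied by 100. A minimal Python sketch, using the three counts reported above:

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each count as a percentage of the total of all counts.

    The total plays the role of the SUM cell (B1022); every count is
    divided by that same fixed value.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to compute percentages from")
    return {variant: 100 * n / total for variant, n in counts.items()}


shares = percentages({"from": 607, "than": 354, "to": 42})
for variant, pct in shares.items():
    print(f"{variant:5} {pct:5.1f}%")
```

Because every count is divided by the same total, the percentages always add up to 100, which is a useful check on the table.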


Illustration 8.18a  SUM command & percent calculation, with formula window on top

Pressing ENTER or the check mark at the top of the Excel screen shows that 60.51844 percent responded with from. I could repeat the same command for the other two options, but this would quickly get tedious, especially if there are more than three variants or if one needs to perform this operation many times (which you will). For repetitive tasks, the power of Excel as a spreadsheet comes into play. In math mode, Excel treats the cell identifiers in a formula as positions relative to the cell that holds the formula. Since the formula in C1019 in Illustration 8.18a is the same for all three variants, Excel can “shift down” the formula and adjust the cell identifier automatically from B1019 (content: 607) to B1020 (content: 354) and then to B1021 (content: 42). But Excel shifts all cell identifiers by the same amount unless we tell it not to. We therefore need to tell Excel that it must always use the number 1003, which is in field B1022, as the divisor. By putting $ signs in front of the column and row identifiers ($B$1022), as in Illustration 8.18b, we prevent the shift for this field only, and Excel will treat the field B1022 as a constant that is not changed or shifted:

Illustration 8.18b  Creating a percent formula that is automatically “shiftable”

The formula in cell C1019 is now ready to be “shifted down”, with $B$1022 replacing B1022 from Illustration 8.18a. First we press ENTER. The result is still 60.51844 percent for from (as it should be). In a next step, we move the mouse over the lower right corner of the box around 60.51844, as shown in Illustration 8.19: we click and hold the lower right corner and drag it all the way down to field C1022. Only then do we release the mouse button. The result is shown at the bottom of Illustration 8.19: we have used Excel’s fill function (the “fill handle”) to perform multiple calculations for us: 60.5% report from, 35.3% than, and a mere 4.2% to. The 100% sum is our check that the formula worked correctly. You can now look behind the individual formulae, and you will see that Excel “shifted” the cell identifier according to our downward movement. We have calculated our first percent table. To display only one digit after the decimal point, mark the cells you would like to have rounded, right-click, select “Format Cells…”, choose “Number” and select “Decimal Places: 1”.
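The “shifting” behaviour of relative and absolute ($) references during a fill-down can be mimicked with a short Python sketch. It is a simplification that assumes plain A1-style references; it ignores quoted strings, sheet names and function names that contain digits, and the function name fill_down is hypothetical.

```python
import re

# Column letters and row digits, each optionally preceded by a '$'
_CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def fill_down(formula: str, rows: int) -> str:
    """Rewrite a formula as it would read `rows` cells further down.

    Row numbers without '$' move by `rows`; rows marked with '$' stay
    fixed. Filling down leaves column letters unchanged.
    """
    def shift(match: re.Match) -> str:
        col_mark, col, row_mark, row = match.groups()
        new_row = int(row) if row_mark else int(row) + rows
        return f"{col_mark}{col}{row_mark}{new_row}"

    return _CELL_REF.sub(shift, formula)


start = "=B1019/$B$1022*100"
for offset in range(3):
    print(fill_down(start, offset))
```

Running the same sketch on "=B1019/B1022*100" (without the $ markers) shows the problem described in the text: the divisor would drift to B1023, B1024 and so on.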

Illustration 8.19  Shift box (top) and full percentages with “shifted” function

Exercises on the file “GH (All) – q1-different”

Let us practice these operations, as we will be using them a lot. TIP: there are many online tutorials for Excel functions. For instance, the tutorial at [23 Sept. 2015] reviews the COUNTIF command with non-linguistic examples.
1. Calculate the number of respondents per age group, using the COUNTIF and SUM commands. Present them both in absolute frequencies (n) and in percentages (%). Use the following format:

Do you notice any discrepancies among the age groups? If so, what effects might they have on the data and how can you consider any imbalances in your interpretation?


2. Calculate the number of respondents per level of education (highest completed), for both absolute frequencies (n) and percentages (%), using the following format:

Do you notice any discrepancies among the levels of education? If so, what effects might they have on the data and how can you consider any imbalances in your interpretation?

Multiple conditions: COUNTIFS

With COUNTIF and SUM we can count single text fields and produce percentages, which is very important. We cannot, however, apply multiple conditions with COUNTIF. For instance, we can count respondents per age group or per gender, but we cannot count the responses for one variant per age group, which would show whether the teenagers (14–19) or the octogenarians (80+) report a form more frequently. For that task, COUNTIFS will help us, and it will be the function we use most. Luckily, it is a logical extension of COUNTIF, so we can use the Excel syntax we have already learned. Using the Golden Horseshoe (All) file, let us continue from Exercise 1, which has shown us the number of responses per age cohort. Now we would like to know how many instances of each variant were reported in each age group, which can be done in the table format shown in Illustration 8.20. As COUNTIFS is an extension of COUNTIF, we are merely adding more conditions. We know from Illustration 8.14 that column B holds the linguistic variable and column C the age groups. We also know that the data runs from line 2 to line 1016, with line 1 containing the column headers. Using COUNTIFS, we can use this information and the following formulae to count how many 14–19-year-olds reported from, than and to:

=COUNTIFS($B$2:$B$1016, "*from*", $C$2:$C$1016, "14-19")
=COUNTIFS($B$2:$B$1016, "*than*", $C$2:$C$1016, "14-19")
=COUNTIFS($B$2:$B$1016, "*to*", $C$2:$C$1016, "14-19")

The first line reads: count all fields from B2 to B1016 that contain the answer “from”, for all those who are in age category 14–19. The age condition then needs to be changed to “20–29”, “30–39” and so on. Luckily, we do not have to retype the full formula and edit the conditions, as we can copy and paste the formula from the formula window at the top of the Excel screen (shown at the top of Illustration 8.20). Paste the formula into the next field and adapt the conditions.

Illustration 8.20  COUNTIFS in age groups

I advise you to use the constant marker $ for both rows and columns, as the variables will always need to be counted in this precisely defined area. With the “$ markers” we can shift the formula around, since Excel will always refer to the fields so defined. A yet faster way is to drag the formula in the first line all the way to the column for the over-80-year-olds, the result of which is shown in Illustration 8.21a:

Illustration 8.21a  Formula “=COUNTIFS($B$2:$B$1016, "*from*", $C$2:$C$1016, "14-19")” dragged to the right

Illustration 8.21b  First correction of dragged formula

We then have to adjust the values for the age groups in a second step, with 20–29 replacing the 14–19 age qualifier, then 30–39, and so on. The first such correction is shown in Illustration 8.21b. This method is much faster than retyping, but requires some care in adapting the conditions. Illustration 8.22 shows the full table for both n and percent, with the last condition for “over 80”-year-olds shown.
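The logic of COUNTIFS, where a row is counted only if every range/criterion pair matches, can be sketched in Python with parallel lists standing in for the Excel ranges. The wildcard handling follows the same assumptions as a COUNTIF approximation (case-insensitive, * and ? only); the function names are hypothetical, and the sample answers and age labels are invented.

```python
import re


def _matches(value, criterion: str) -> bool:
    """Excel-style text match: '*' = any run, '?' = one character, case-insensitive."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in criterion
    )
    return re.fullmatch(regex, str(value), re.IGNORECASE | re.DOTALL) is not None


def countifs(*ranges_and_criteria) -> int:
    """COUNTIFS(range1, criterion1, range2, criterion2, ...) over parallel lists.

    A position is counted only if every range matches its criterion there.
    """
    if not ranges_and_criteria or len(ranges_and_criteria) % 2:
        raise ValueError("expected range/criterion pairs")
    ranges = ranges_and_criteria[0::2]
    criteria = ranges_and_criteria[1::2]
    length = len(ranges[0])
    if any(len(r) != length for r in ranges):
        raise ValueError("all ranges must have the same length")
    return sum(
        all(_matches(r[i], c) for r, c in zip(ranges, criteria))
        for i in range(length)
    )


# Invented mini-sample standing in for columns B (answers) and C (age)
answers = ["... different from yours.", "... different than yours.",
           "... different from yours.", "... different to yours."]
ages = ["14-19", "14-19", "20-29", "14-19"]
for variant in ("from", "than", "to"):
    print(variant, countifs(answers, f"*{variant}*", ages, "14-19"))
```

As in Excel, all ranges must be the same size; the sketch raises an error instead of silently miscounting when they are not.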


Illustration 8.22  Q1-different by age group & three major variants

Illustration 8.22 is our first “real” result of linguistic significance. We can see from the first line that from is in decline: more than 80% of those over 80 report it, while just under 50% of the teenagers do so. Conversely, than has been rising steadily. Congratulations! You have just replicated the findings from Figure 4.9 in Chapter 4. TIP: There are, of course, also tutorials for COUNTIFS commands. For instance, you might want to review the non-linguistic tutorial at [23 Sept. 2015].
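Once every response has been coded as from, than or to, a table such as Illustration 8.22 is a cross-tabulation with percentages computed within each age group. A minimal Python sketch under that assumption (the function name and the toy data are hypothetical):

```python
from collections import Counter


def variant_by_group(variants, groups, group_order, variant_order):
    """Cross-tabulate coded variants by group.

    Returns {group: {variant: (n, percent)}}, where each percentage is
    taken within its group (column percentages, as in an n/% table).
    """
    if len(variants) != len(groups):
        raise ValueError("variants and groups must be parallel lists")
    cell_counts = Counter(zip(groups, variants))
    group_totals = Counter(groups)
    table = {}
    for g in group_order:
        total = group_totals[g]
        table[g] = {
            v: (cell_counts[(g, v)], 100 * cell_counts[(g, v)] / total if total else 0.0)
            for v in variant_order
        }
    return table


# Invented toy data: five already-coded responses
coded = ["from", "than", "from", "from", "than"]
age = ["14-19", "14-19", "over 80", "over 80", "over 80"]
result = variant_by_group(coded, age, ["14-19", "over 80"], ["from", "than", "to"])
for group, row in result.items():
    print(group, row)
```

Computing percentages within each age group, rather than over the whole sample, is what makes cohorts of different sizes comparable.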

Excel graphing tool

With more complex information, graphs work better than tables. Excel offers a very convenient graphing tool. You only have to mark the data you would like to chart with the mouse (highlighted at the bottom right in Illustration 8.23), then click Insert in the top menu of Excel and select the option Line (for a line graph):

Illustration 8.23  Marking data for graphing tool and chart




The graph in Illustration 8.23 visually represents the data in the marked table we just created. It can be further refined by adding axis labels (percent on the y-axis and age cohorts on the x-axis), and it can be given a title, e.g. “different from/than/to in the Golden Horseshoe 1991/2”. The chart can be saved in Excel and exported into your paper or report. As a rule of thumb, a line graph is preferred over a bar graph when the categories on the x-axis have a meaningful order and some sort of progression is visible, at least over a good part of the range. This is the case in the chart in Illustration 8.23.

More exercises on the file “GH (All) – q1-different”

3. Calculate the number of respondents per age group and gender for the than variant of q1-different, using the COUNTIFS and SUM commands. Present them both in absolute frequencies (n) and in percentages (%). Use the following format:

Since than is the incoming variant, do you see any (female) behaviour that would fit with any of Labov’s three gender principles? Are there any outliers that do not conform to these principles? (Perhaps you can theorize on ways to index social meaning; see Chapter 6.)
4. Is there an effect of education on the use of the incoming variant than? Correlate the use of than with education and gender, using the following format:

In addition, do you see any problems with the number of responses in each cell? If so, which are they? If not, why not?

COUNTIFS with indices: RI

COUNTIFS commands can be applied with as many conditions as one wishes. In practice, however, we will often see the limitations of the data, with very low respondent counts once we subclassify by a number of social characteristics. There is one special use of COUNTIFS, which employs some of the indices presented in the previous section. I will show the procedure using the mathematical operators <, >, <= and >= in the conditions for the RI, but they apply just the same to the LUI or any other index that works with a numerical value. With the RI, as shown in Section 8.2, we have a number of cut-off points. We said that RI 1–5 is for local people and that it is only with RI 6, and especially 7, that a lot of respondents of different backgrounds are included. The Golden Horseshoe file features the RI in column “I” (see Illustration 8.14). We might now calculate the results for all five stages, RI 1, RI 2, RI 3, RI 4 and RI 5, and then add the results, but luckily, we do not have to do that. Excel’s mathematical operators < (smaller than), > (greater than), <= (smaller than or equal to) and >= (greater than or equal to) assist us. To obtain only the results for all respondents whose RI is equal to or smaller than five, the following attribute needs to be added to a COUNTIFS chain:

I2:I1016, "<=5"

> source("C:/Documents and Settings/Stef/My Documents/R Data/e_scripts/05-1_hcfa_3-2.r")   (see note 35)
> hcfa()
Enter a path to choose/create a directory for all the results files from this analysis!
Press to continue …

After we enter “hcfa()” in the R console and press [ENTER], HCFA starts running and prompts us to choose or create a directory for the HCFA output files. Press [ENTER] to continue, as prompted. At the prompt “1:”, enter the path to your output directory (note: just the path, NOT the filename), in this case “C:\rspecial”, and press [ENTER]:

1: C:\rspecial
[ENTER]

R reminds us: “Please put your data file into this directory … Press to continue …”. This is precisely what we do: we copy our input file into the directory named above. Please note that HCFA does not accept directory names with spaces (e.g. “My Documents”). It is therefore advisable to create an extra working directory (all lower case), such as my “rspecial” directory in the root of the C:\ drive. We press [ENTER] (our file is already there). Next, we tell R which table format to use (i.e. the format of “differentvan.txt”):

What does your input file contain?
1: Raw data (= default)
2: Configurations and frequencies

35. The file path, in this case “C:/Documents and Settings/Stef/My Documents/R Data/e_scripts/”, needs to be adapted to fit the storage location of the file “05-1_hcfa_3-2.r” on your computer.

We select “1” for Raw data. Then R prompts us for the type of test we would like to use:

Which p’s do you want to use for significance testing?
1: conservative: binomial with Bonferroni
2: more powerful: Holm sequentially rejective
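The two options in this menu correct the p-values for the fact that many tests are run at once. The following Python sketch shows the textbook versions of both procedures as adjusted p-values; it illustrates the general idea and is not taken from the HCFA script itself. Holm’s step-down procedure is never less powerful than Bonferroni, as the toy p-values in the example show.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm's sequentially rejective (step-down) adjusted p-values.

    The smallest p is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values in the original order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.02, 0.03]  # invented p-values from three tests
alpha = 0.05
print("Bonferroni rejects:", sum(q < alpha for q in bonferroni(p)))
print("Holm rejects:      ", sum(q < alpha for q in holm(p)))
```

With these three p-values, Bonferroni keeps only the first result significant, while Holm keeps all three, because only the smallest p-value has to bear the full multiplication by the number of tests.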

Always use option “2” here, the more powerful correction procedure (option 1 is offered just for historical reasons). Then we specify how the output should be sorted (1 is recommended):

How should the output be sorted?
1: q (= coefficient or pronouncedness)
2: observed frequency
3: p
4: contributions to chi-square
5: no sort (= nested tables)

Here, we select “q” as our default option (we need not worry about the other ones). R then wants us to select the file from the working directory we provided earlier: “Where is the input file? Press to continue …”

Pressing [ENTER] will open a selection menu for the directory we provided earlier. We select the data file “differentvan.txt” with the mouse:



Chapter 9.  Statistical testing with R 343

Illustration 9.3  Input routine for HCFA testing

After we select “differentvan.txt” and press [ENTER], HCFA computes the statistic. In our case, it prompts us with:

15 subtables have to be generated; they will be put into your working directory
Press to continue …

We press ENTER and we are told that

1 done, 14 to go …
2 done, 13 to go …
3 done, 12 to go …
4 done, 11 to go …
5 done, 10 to go …
6 done, 9 to go …
7 done, 8 to go …
8 done, 7 to go …
9 done, 6 to go …
10 done, 5 to go …
11 done, 4 to go …
12 done, 3 to go …
13 done, 2 to go …
14 done, 1 to go …
15 done, 0 to go …

Highlight the subtable(s) you want to include in the analysis!
Press to continue …

After pressing [ENTER], another pop-up window appears. All possible tables, in our case 15, are numbered from 0000 to nnnn. We take all 15 tables by holding the [CTRL] key ([COMMAND] on a Mac) and marking them with the mouse, as shown in Illustration 9.4. Then we press [ENTER].


Illustration 9.4  Select all tables from 0000 to 0015 (in our case)

Then we are done, and HCFA tells us so with the following casual remark and plea for citation (which we, of course, dutifully carry out):

I’d be happy if you provided me with feedback and acknowledged/cited the use of HCFA 3.2 as
Gries, Stefan Th. 2004. HCFA 3.2. A program for R. URL: .
That’s it …!

With a few keystrokes, we have instructed R to compute complex measures and correlations. The results are presented in three output files, which HCFA wrote to the directory we specified at the beginning (in our case: C:\rspecial). The files are listed on the left; on the right you will find the names of the output files for “differentvan.txt” as they appear on the website (for comparison):

HCFA_output_complete.txt        differentvan_HCFA_output_complete.txt
HCFA_output_hierarchical.txt    differentvan_HCFA_output_hierarchical.txt
HCFA_output_sum.txt             differentvan_HCFA_output_sum.txt

You can also see the files in the menu in Illustration 9.4. How are we to interpret the result?




It is advisable to use a simple text editor that numbers the lines automatically. I can recommend Notepad++, which is freeware and works fine in Windows. First, let us look at “HCFA_output_sum.txt”, the “sum” standing for summary. You can compare your output with the one uploaded to the website, called “differentvan_HCFA_output_sum.txt”; their content should be identical. (If it is not, try using the file “differentvan.txt” from the website as input to check that HCFA is running properly on your computer.) “HCFA_output_sum.txt” has 114 lines, and lines 13 to 52 are shown in Illustration 9.5 (as displayed by Notepad++):

Illustration 9.5  HCFA_output_sum.txt for differentvan.txt

In line 34 it gets interesting. All the preceding lines deal with only one variable, but line 34 reads “q1 and age”; that is, this table (lines 34–39) shows the influence of age on q1, i.e. different from/than/to. The chi-square p-value of 0.0014 is smaller than 0.05, which yields a significant age result (age influences q1 in significant ways). Gender, as can be seen in lines 41–46, does not, as its p-value of 0.55 is much bigger than the cut-off of 0.05. Education, on the other hand, has a significant effect, like age, with p = 0.018.


Reading on in the output file, one can see that interactions of independent variables are computed as well, such as the effect of a combination of age and education on q1, which can be seen below (lines 83–88 in the original file):

Variables: q1 age education
chi-square = 360.073
G-square = 244.473
df = 114
p for chi-square = 0
p for G-square = 1.605638e-11

The p-value for chi-square is “0” (or rather, it is so small that my ThinkPad laptop displays it as 0) and therefore highly significant. For age, gender and education combined (lines 97–102), both p-values (chi-square and G-square) are “0”. Generally, we do not worry about the G-square but go by the chi-square. The 114 degrees of freedom (df) reflect the large number of cells in this three-way table, i.e. the many combinations of q1 variants, age groups and education levels that are compared. More tests are found in the complete file, but the summary file lists the most important comparisons. HCFA is a powerful pattern-finding technique. It also works well for hypothesis testing. HCFA will take care of many of our statistical needs, and some people might say that this is all we really need. Other scholars will argue that the best way to model linguistic data, where some variants occur very often and others not at all, is logistic modelling, which is introduced next.
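The chi-square and G-square values in such output both measure how far the observed counts depart from the counts expected under independence. For a two-way table they can be computed as in the following Python sketch. It assumes that every row and column total is greater than zero, and it shows why the degrees of freedom, (r − 1)(c − 1) for an r × c table, grow with the number of categories. HCFA’s own computations for configurations of three or more variables are more involved than this.

```python
import math


def chi_square_and_g(table):
    """Pearson chi-square, likelihood-ratio G-square and df for an r x c table.

    Assumes all row and column totals are greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # cells with 0 contribute nothing to G-square
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, g2, df


# Invented 2 x 2 example: two variants (rows) by two groups (columns)
chi2, g2, df = chi_square_and_g([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f}, G-square = {g2:.3f}, df = {df}")
```

A table of three variants by eight age groups, for instance, already has (3 − 1)(8 − 1) = 14 degrees of freedom; adding a third variable multiplies the number of cells again.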

9.4.2 (Non-linear) logistic regression modelling

One problem of linear models is the premise that the data should be distributed evenly across cells. We know that this is not the case with language. For instance, some sounds occur much more often than others (in English, [s] occurs more often than [kw]), and some words occur more frequently than others: in written (British) English, the occurs almost 24 times as often as has (see Table 3.3) and, of course, many more times than disestablishmentarianism (which is one of the longest common nouns in the English language and of very limited use). The present section takes further strides in the direction of data modelling, which can be understood as applying multiple rounds of statistical procedures, multiple “runs”, to the same data in order to arrive at the most elegant formula predicting as much of the data as possible. HCFA has done precisely that in a linear paradigm, and in this section we will use a logistic, non-linear method. For binary (categorical) dependent variables (and that means for a lot of the WQ data), logistic regression is often the method of choice. The example to be discussed here uses the choice of yod-ful or yod-less variants in the variable news as the dependent variable and a selection of the following independent variables:




sex (binary-categorical)*
age (categorical)*
education (ordinal)
social class (binary)
RI (ordinal) (see Section 8.2.1)
LUI (ordinal)* (see Section 8.2.2)

The list could include more, or even all, (types of) independent variables in Table 9.3, but for demonstration purposes the multifactorial logistic regression shown here is limited to only three independent variables (marked with an *). In the following, we will first attempt to predict the pronunciation of news (with and without yod) by the independent variables (or factors) and the interactions (combinations) between factors. Then we will switch to regression without interactions, i.e. model the influence of the independent variables on the dependent variable, but not in combination. This choice will allow us to show the complete screenshots of the output for each regression run and to demonstrate the principle effectively: whether you work with interactions or without, the same principles apply.

How do we go about logistic regression modelling? The basic idea is that we start with a model that includes ALL the independent variables we are interested in, in our case three. We then decide whether to include interactions or not. Interactions show us the influence on the dependent variable of two or more independent variables working in combination. So, for instance, an interaction would be how sex, age and LUI, in all their different combinations, influence the variable, rather than each independent variable’s influence on its own. First, we always put all independent variables together and run the logistic regression. This is the first step in a “step-down” approach, since we will be taking out variables one by one (hence, stepping down). (“Stepping up” is also possible, where you start with one independent variable and include more, one by one, but we will leave this slightly less frequently used strategy aside.) In the above example we will therefore include sex, age and LUI and see which percentage of the actual data (for the variants of news in the Montreal data) this model predicts.
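As a minimal illustration of what a logistic model estimates, consider the simplest case of a single binary predictor, such as sex. Here the maximum-likelihood fit reproduces the observed proportion in each group exactly, so its coefficients are simply log-odds and can be computed directly. The Python sketch below assumes this one-predictor case only; the counts are invented, and models with several factors, such as those fitted in R in this section, require iterative estimation.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Probability from log-odds (the logistic function)."""
    return 1 / (1 + math.exp(-x))


def fit_binary_predictor(hits_0: int, n_0: int, hits_1: int, n_1: int):
    """Maximum-likelihood logistic regression with one 0/1 predictor.

    With a single binary predictor the model is saturated: fitted
    probabilities equal the observed proportions in the two groups, so
    intercept = logit(p0) and slope = logit(p1) - logit(p0).
    """
    p0, p1 = hits_0 / n_0, hits_1 / n_1
    if not (0 < p0 < 1 and 0 < p1 < 1):
        raise ValueError("proportions of 0 or 1 have no finite log-odds")
    intercept = logit(p0)
    return intercept, logit(p1) - intercept


# Invented counts: yod-less "news" in group 0 (25 of 100) and group 1 (50 of 100)
b0, b1 = fit_binary_predictor(25, 100, 50, 100)
print(round(b0, 3), round(b1, 3), round(inv_logit(b0 + b1), 3))
```

The slope is the difference in log-odds between the two groups; transforming intercept plus slope back with the logistic function returns the observed proportion of the second group.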
We will obtain a multiple regression coefficient, R2 (pronounced “R-square”), that tells us how our model fares: the higher R2, the better the model’s prediction of the data. R’s output lists another index, the AIC, in lieu of R2 to assess the “model fit”. AIC stands for Akaike Information Criterion (named after its inventor), and for the AIC, the better the model, the lower the score. It is important to keep these two scores apart:

         the better model has a
R2       higher R2
AIC      lower AIC
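The AIC is computed as 2k − 2 ln L, where k is the number of estimated parameters and L the model’s likelihood; an extra parameter therefore has to buy a sufficiently large gain in likelihood. A minimal Python sketch for a binary dependent variable, with invented outcomes and invented predicted probabilities (not fitted models):

```python
import math


def bernoulli_log_likelihood(outcomes, probabilities) -> float:
    """Log-likelihood of 0/1 outcomes given each case's predicted P(outcome = 1)."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must be parallel lists")
    return sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


# Invented data: 1 = yod-less, 0 = yod-ful, for four respondents
y = [1, 0, 1, 1]
simple = aic(bernoulli_log_likelihood(y, [0.75] * 4), n_parameters=1)
richer = aic(bernoulli_log_likelihood(y, [0.9, 0.1, 0.9, 0.9]), n_parameters=2)
print(f"simple model AIC = {simple:.3f}, richer model AIC = {richer:.3f}")
```

Here the richer model wins despite its extra parameter because its likelihood gain outweighs the penalty of 2 per parameter; with a smaller gain, the simpler model would have the lower AIC.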


The step-down approach, as mentioned before, starts by removing the one factor that contributes least to the explanatory power. We then check how the AIC (R2) has changed. Then we remove the next least significant factor, and so forth, until we have removed all factors that can be removed (more on that later). The model we have at this point is called the Minimal Adequate Model (MAM) for our data. It applies Occam’s Razor, the principle going back to the medieval monk William of Ockham [sic]: if two rival models predict the same outcome (the “data”), the one that rests on fewer presuppositions is to be preferred. The MAM is that model. When modelling with six, seven or more independent variables, many more runs are possible, while our three variables allow the removal of, in principle, only two independent variables; this is enough to demonstrate the principle and the precise procedures.
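The step-down logic can be written as a short loop. The sketch below uses AIC as the removal criterion, which is one common choice (in R, functions such as drop1() and step() work along these lines); it is an illustration, not the exact procedure followed in this chapter. The function aic_of is a hypothetical stand-in for refitting the model on a given set of factors, and the AIC values in the example are invented.

```python
def step_down(factors, aic_of):
    """Backward elimination guided by AIC.

    Starting from all factors, repeatedly drop the factor whose removal
    gives the lowest AIC, as long as that AIC is lower than the current
    one. `aic_of(frozenset_of_factors)` stands in for refitting the model.
    Returns the final factor set and the sequence of (factors, AIC) steps.
    """
    current = frozenset(factors)
    current_aic = aic_of(current)
    path = [(current, current_aic)]
    while current:
        candidates = [(aic_of(current - {f}), f) for f in sorted(current)]
        best_aic, best_factor = min(candidates)
        if best_aic >= current_aic:
            break  # no removal improves the model: stop here
        current = current - {best_factor}
        current_aic = best_aic
        path.append((current, current_aic))
    return current, path


# Invented AIC values for the three-factor example (sex, age, LUI)
toy_aic = {
    frozenset({"sex", "age", "LUI"}): 100.0,
    frozenset({"age", "LUI"}): 95.0,
    frozenset({"sex", "LUI"}): 110.0,
    frozenset({"sex", "age"}): 99.0,
    frozenset({"LUI"}): 104.0,
    frozenset({"age"}): 97.0,
}
final, steps = step_down(["sex", "age", "LUI"], toy_aic.__getitem__)
print(sorted(final), [aic for _, aic in steps])
```

With these toy values, sex is dropped first, and the loop stops at age plus LUI because neither remaining removal lowers the AIC any further; that final set plays the role of the MAM.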

Importing “data frames” into R

First, we will load the data into R, which refers to such data tables as “data frames”. Data frames are nothing more than tables, as shown at the beginning of Section 9.4.1, but in a universal, bare-bones format that R can read. You can download the Montreal Dialect Topography data set yourself, with the option “Full Data Length”, and data-ready it as described in Section 8.3.1. In the interest of time, the file is also offered on the companion website (open the file “Montreal news.txt”). The R command to open the file is:

 Montreal.news comment.char="")

It’s best to copy and paste these commands into a text file and then merely adapt the names, for instance. This line states that our data will be imported into R under the name “Montreal.news”, the “data frame” that holds the data. Pressing [ENTER] will open a pop-up window, and you are then required to select the tab-delimited text (txt) file (not the Excel, xls, file) that contains the data: this is what R expects, given the attributes header=T, sep="\t", quote="", comment.char="" of the import command. After import, if you type the name of the data frame, “Montreal.news”, and press the [ENTER] key, you can check whether the data was imported properly:
> Montreal.news [ENTER]

Illustrations 9.6a and 9.6b show you what it should roughly look like. R is ruthless and direct: if you mistype a command, R will give you all kinds of error messages (and likewise if you forget to convert the readied Excel file into a “Tab delimited” text file).
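For readers who want to rehearse the import without the real file, the sketch below writes a tiny tab-delimited file to a temporary location and reads it back with read.table(), using the attributes discussed above (header, sep = "\t", quote = "", comment.char = ""). A fixed path replaces file.choose() so that the example runs without a pop-up window; the column names and values are invented for illustration. str() and head() are two quick ways of checking what R has actually imported.

```r
# Invented miniature data set; the blank q42 field for id 5234 is deliberate
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tq42\tage\tgender\tlui",
  "5233\t2. nooze [noo]\t20-29\tf\t0.4",
  "5234\t\t40-49\tm\t0.7",
  "5235\t1. nyooze [nyoo]\t70-79\tf\t0.9"
), tmp)

# Same attributes as the import command in the text, but with a fixed path
Montreal.news <- read.table(tmp, header = TRUE, sep = "\t",
                            quote = "", comment.char = "")

str(Montreal.news)   # one line per column with its type (e.g. int, num, chr)
head(Montreal.news)  # the first rows, for a visual check
```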



Chapter 9.  Statistical testing with R 349

Illustration 9.6a  Montreal.news.txt: original output in “q42”

Illustration 9.6b  Montreal news2.txt: original q42 replaced with R-compatible format

The case for data import checking: Two errors Checking, cleaning and data-readying the data are very important tasks and should be taken seriously. After all, if the input data is incorrectly formatted, the output will be wrong, and you may not even realize it. In this section, I will show you what often happens with real data when imported into R and how you can fix the most common problems. When importing the data file “Montreal.news”, we unknowingly imported data that contained two serious formatting errors. Often the user is not aware of this. In more than one way, the most difficult thing about R is getting the data into the program. Once it’s there, it is very easy and convenient to carry out the most elaborate tests, which is one of R’s (many) great strengths. So what went wrong? Illustration 9.6a shows, in the column for the dependent variable “q42”, two factor levels: “2. nooze [noo]” and “1. nyooze [nyoo]”. These names for factor levels work fine in the DT context and came with the downloaded DT file. R, however, is somewhat particular about its format requirements for names, as it does not like square brackets [] and blank spaces “ ” in them (it is okay, please note, with dots . and underscores _). If we were to run the test now, (very) faulty results would follow. It is essential to check, or at least browse, the imported data for names in formats that R does not allow. In some cases, R will tell you that something is wrong, but as there are many different file formats, R might not detect all possible problems.
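Browsing is easier when R does part of the looking. The following sketch defines a small helper, flag_bad_names() (a hypothetical name, not a built-in function), that lists the distinct values in a column that contain square brackets or blank spaces, the characters warned against above. It is a quick screen, not a complete check of everything that can go wrong in an input file.

```r
# Hypothetical helper: return the distinct values that contain
# square brackets or whitespace
flag_bad_names <- function(x) {
  x <- as.character(x)
  unique(x[grepl("[[:space:]]|\\[|\\]", x)])
}

# Labels as they come with the downloaded DT file
q42 <- c("2. nooze [noo]", "1. nyooze [nyoo]", "2. nooze [noo]")
flag_bad_names(q42)               # both original labels are flagged
flag_bad_names(c("noo", "nyoo"))  # character(0): nothing to fix
```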


Formatting problems are best checked and remedied not in R, but in Excel or another spreadsheet program with a GUI. This requires reimporting the corrected file into R, so that going back and forth between the two programs becomes an expected and common “loop” for anyone working with R. To correct errors in the format of the input file, it is often easiest to use the “Replace” function in Excel. In the present case, two replacements are required: first, we replace all “2. nooze [noo]” names with unproblematic names consisting of text characters alone, such as “noo” (without [], spaces or any special characters). We proceed likewise with “1. nyooze [nyoo]”, which is replaced with “nyoo”. The result is Illustration 9.6b, which shows column “q42” in a cleanly formatted way with the levels “noo” and “nyoo”. It is important that we do not use numbers for the new names, as can be seen in the same Illustration in column “q42binary”. As you know, q42 is a categorical (binary) variable, and categorical variables require text labels in R. Numerical levels, e.g. 0, 1 or 2 (as in “q42binary” in Illustrations 9.6a and 9.6b), would lead R to treat the variable not as categorical but as continuous, which would produce faulty results. The golden rule of name formatting in R prevents all these problems: use numerical values for variables that are continuous, e.g. response times in seconds such as 1.347, or for an index that is rank-ordered (1, 2, 3, 4, …), but use text characters for categorical variables. Never confuse the two. When in doubt, check with someone. The change in name formatting from the original input file “Montreal.news” is significant, which is why we should give the new input file a new name, such as “Montreal.news2”, to reflect the new format. With the name changes to the input file, we are halfway to a usable R input file. There is one more, very common error in “Montreal.news2”.
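Before turning to that second error, here is a sketch of how the relabelling just described could also be done inside R, for comparison with the Excel route recommended above. It also shows why numeric codes are a trap: a 0/1 column is read as a number, i.e. a continuous variable, until it is explicitly turned into a factor with text labels. The coding 1 = noo, 0 = nyoo for q42binary is an assumption made for illustration only.

```r
# Original labels, as in Illustration 9.6a
q42 <- c("2. nooze [noo]", "1. nyooze [nyoo]", "2. nooze [noo]")

# Lookup table from old labels to plain-text labels
relabel <- c("2. nooze [noo]" = "noo", "1. nyooze [nyoo]" = "nyoo")
q42_clean <- factor(unname(relabel[q42]))
levels(q42_clean)        # "noo" "nyoo"

# A 0/1 column is numeric, so R would treat it as continuous
q42binary <- c(1, 0, 1)
is.numeric(q42binary)    # TRUE

# Explicit conversion to a categorical variable with text labels
# (assumed coding: 1 = noo, 0 = nyoo)
q42binary_f <- factor(q42binary, levels = c(1, 0), labels = c("noo", "nyoo"))
is.factor(q42binary_f)   # TRUE
```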
Illustration 9.7a shows that the dependent variable, “noo” or “nyoo”, is missing for some entries (respondents). This is a serious problem for the dependent variable of a logistic regression, which requires one of its two levels for every entry. The blank field in line 234 (Illustration 9.7a) would not only trigger a warning; it would also block the regression entirely. In total, “Montreal.news2” contains four missing values in “q42” that need to be fixed (lines 234, 292, 376 and 463), which means that four of the roughly 500 respondents failed to answer this question on the (paper-based) original DT questionnaire.

Illustration 9.7a  Montreal.news2: missing value for dependent variable in line 234, id 5234




Illustration 9.7b  Montreal.news3: missing values removed

There are ways for R to consider only complete entries (see the argument “na.strings” of the import command, if you’re interested), but they are a bit tricky and require a solid understanding of R. The easiest way to solve the problem is, therefore, to delete the four incomplete entries for “q42” in Excel and save the file under a new name (e.g. “Montreal.news3.txt”). This way, you can always go back to the original data set, which is what you want. In R, we can call the newly imported file “Montreal.news3”. Illustration 9.7b shows that line 234 from Montreal.news2 (id 5234, age 40–49) is now deleted; the entry with the id 5235, age 70–79, is listed in its place. This, finally, is the file we can work with.
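For those curious about the na.strings route, the sketch below reads a small invented file in which one q42 field is blank. With na.strings = "", blank text fields become NA; the incomplete rows can then be dropped in R while the file on disk stays untouched, which keeps the same advantage as saving the cleaned data under a new name. This is an alternative to the Excel procedure in the text, not a substitute for checking the data by eye.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tq42\tage",
  "5233\tnoo\t20-29",
  "5234\t\t40-49",
  "5235\tnyoo\t70-79"
), tmp)

# na.strings = "" marks blank fields in text columns as missing (NA)
Montreal.news2 <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                             comment.char = "", na.strings = "")

sum(is.na(Montreal.news2$q42))   # how many respondents skipped q42

# Keep only respondents who answered q42; the file itself is unchanged
Montreal.news3 <- Montreal.news2[!is.na(Montreal.news2$q42), ]
Montreal.news3
```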

A first logistic regression with interactions As you may suspect by now, we will import the file with the following command. Here’s a tip: in R, the “Arrow up” key brings back previous commands and saves a lot of typing. Therefore, use “Arrow up” [↑] to bring back the import command for “Montreal.news2” (press [↑] or [↓] until you find it), then move the cursor to the beginning and replace the “2” with a “3” to read:
> Montreal.news3 <- read.table(file.choose(), header=T, sep="\t", quote="", comment.char="")

Select Montreal.news3.txt as the input file. With the next command you can display the first 234 lines, at which point R stops the output, so you can conveniently check whether the old line 234 is gone: > Montreal.news3 [1:234,]

Now that we have our file in order, we can benefit from all the convenience of R. Let us carry out a first logistic regression with three independent variables and their interactions. We could, of course, include all six variables in Illustration 9.6a, or at least all but “occupation”, which would first need to be categorized and data-readied (remember, these are original DT files not coded for R). For simplicity’s and clarity’s sake, we shall stick to age, sex and LUI. Interactions are, as discussed above, combinations of independent variables with one another: so we not only test how the 20–29-year-olds use variant “noo” compared to the 14–19-year-olds (without interaction), but we can also check, for the interaction of sex and age, how, e.g., the female 20–29ers compare with the male 30–39ers, or the like, in their use of nooz vs. nyooz. In R, interactions are created with “*”. If you do not want to include interactions, simply use + instead of *. The regression command including interactions, followed by the command that displays its results, is this:
> model.glm1 <- glm(q42 ~ gender*age*lui, family=binomial, data=Montreal.news3)
> summary(model.glm1)
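The difference between * and + can be seen directly in the names of the coefficients R estimates. The sketch below fits both versions to simulated data (the variable names mirror the text; the values are random) and then pulls out the p-value column, Pr(>|z|), from the summary. The first level of each factor in alphabetical order (“14-19”, “f”) serves as the reference level and therefore gets no row of its own, which is why no “genderf” row appears.

```r
set.seed(7)
n <- 400
d <- data.frame(
  age    = factor(sample(c("14-19", "20-29", "30-39"), n, replace = TRUE)),
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  lui    = runif(n)
)
d$q42 <- factor(ifelse(runif(n) < plogis(-1 + 2 * d$lui), "nyoo", "noo"))

# '*' = main effects plus all their interactions
model.int  <- glm(q42 ~ gender * age * lui, family = binomial, data = d)
# '+' = main effects only
model.main <- glm(q42 ~ gender + age + lui, family = binomial, data = d)

names(coef(model.main))  # (Intercept), genderm, age20-29, age30-39, lui
names(coef(model.int))   # adds terms such as "genderm:age20-29:lui"

# The p-values are the column "Pr(>|z|)" of the coefficient table
p <- summary(model.int)$coefficients[, "Pr(>|z|)"]
round(p, 3)
p[p < 0.05]              # levels below the 0.05 cut-off, if any
```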

Running summary(model.glm1) gives you the output in Illustration 9.8. The column of interest is on the far right, labelled Pr(>|z|), the statisticians’ more precise name for the p-value, which lies between 0 and 1: the closer a p-value is to 0, the stronger the evidence that a factor level influences the outcome. Unless there is a mistake in the input file, all p-values will fall somewhere between 0 and 1; rather than an exact 0, you will see R’s equivalent of 0, which is <2e-16. Any values of 1 or <2e-16 in the output should set off alarm bells and make you re-check your input data. The cut-off point for considering a factor level (the levels are listed on the far left) or an interaction of levels significant is again 0.05. R marks significant levels with symbols next to the right-most column: *** (highly significant), ** (very significant), * (significant in our sense, i.e. less than 0.05) and . (marginally significant, i.e. less than 0.1). We see that several factor levels for age are significant, while LUI is marginally significant, as is the interaction of LUI and age group 70–79. In a step-down approach we would then take out the factor that contributes the least to the model: if you study Illustration 9.8, you will notice that gender is that factor. “genderm”, male gender, as computed against the reference point “genderf” (not shown), has a p-value of 0.7086 and is thus a far cry from 0.05. In addition, the p-values of its interactions with age (e.g. “genderm:age20–29”) or with age and LUI (e.g. “genderm:ageover80:lui”) are mostly close to 1; even the lowest, “genderm:age50–59”, at 0.32, is far from significance. We would therefore re-run the regression without “gender”, which seems to have no effect whatsoever on the pronunciation of news in the data. The command would be this:
> model.glm1.1 […]
> model.glm3 <- update(model.glm2, ~. - education)
> summary(model.glm3)
The syntax, using the tilde ~, the dot . and the minus sign -, instructs R to use “model.glm2” but to take out the factor education. The summary function again shows the outcome:

Illustration 9.10  Model 3, model.glm3, without interactions

We see from Illustration 9.10 that the factors age (all levels except 30-39), social class, RI and LUI are significant. Only gender, “genderm”, is not. The AIC has also decreased, from 762.49 in model 2 to 757.67 in model 3. This is good news and exactly what you are looking for: we are on the right track. We also see that the p-values for basically all factor levels have changed. Compare, for instance, the p-values for the factor level “age70-79” in model 2 and model 3, or any other level for that matter: they are not the same, as R calculates an entirely new regression, and leaving out one variable, “education”, affects all others to a greater or lesser degree, as in any closed system. This is because in a regression everything is connected and the entire model changes if we take out (or add) one or more factors. In this fashion, we can continue to take out factors until we reach (something close to) the MAM. Judging from Illustration 9.10, the next contender would be gender, as “genderm” is high with 0.690633. We can remove “gender”, again using the update function:
> model.glm4 <- update(model.glm3, ~. - gender)
> summary(model.glm4)

Illustration 9.11  Model 4, model.glm4, without interactions

Our AIC is now, in model 4, 755.83 and thus smaller than the AIC of model 3, which was 757.67. Model 4 appears to be our MAM, since all remaining factors are significant (their p-values are lower than 0.05). We would stop here. If the AIC had gone up compared to 757.67, we would have gone back to model 3 and declared it the MAM. But this was not the case here. Alternatively, if you look back to model 2, we could take out gender first, see what happens to the AIC, and then education, possibly followed by some other factor that could be removed. In any case, you get the idea: enjoy your newly acquired skills with complex statistical procedures.
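The manual procedure of this section (remove the least significant factor, refit, and keep the smaller model only if the AIC goes down) can be wrapped in a small loop. The sketch below is one way to do it under stated assumptions: step_down_once() is a hypothetical helper, “least significant” is taken to be the term with the highest likelihood-ratio p-value reported by drop1(..., test = "Chisq") (a test of whole terms, not of the individual levels shown by summary()), and the data are simulated. Unlike step(), which removes the term with the best resulting AIC, this mirrors the p-value-then-AIC logic described above.

```r
set.seed(11)
n <- 400
d <- data.frame(
  age       = factor(sample(c("14-19", "20-29", "30-39"), n, replace = TRUE)),
  gender    = factor(sample(c("f", "m"), n, replace = TRUE)),
  education = factor(sample(c("high", "low", "mid"), n, replace = TRUE)),
  lui       = runif(n)
)
# Simulated response: depends on age and lui only
eta <- -1 + 2 * d$lui + ifelse(d$age == "30-39", 0.8, 0)
d$q42 <- factor(ifelse(runif(n) < plogis(eta), "nyoo", "noo"))

# Hypothetical helper: one round of the step-down procedure
step_down_once <- function(model) {
  tab <- drop1(model, test = "Chisq")[-1, , drop = FALSE]  # skip "<none>" row
  if (nrow(tab) == 0) return(NULL)                          # nothing left to drop
  worst <- rownames(tab)[which.max(tab[["Pr(>Chi)"]])]      # highest p-value
  candidate <- update(model, as.formula(paste("~ . -", worst)))
  if (AIC(candidate) < AIC(model)) candidate else NULL      # keep only if AIC falls
}

full <- glm(q42 ~ age + gender + education + lui, family = binomial, data = d)
model <- full
repeat {
  smaller <- step_down_once(model)
  if (is.null(smaller)) break   # AIC would rise: keep the current model
  model <- smaller
}
formula(model)
c(full = AIC(full), final = AIC(model))
```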




There are, of course, many more ways to run logistic regressions. One such way is to use a nice little program called Rbrul, which has the advantage that it translates the R regression output into the output that many, mostly North American, variationists have gotten used to. This output format is called “factor weights”, which are, again, numbers between 0 and 1: if a factor weight is less than 0.5, the factor disfavours the realization of the variant in question; if it is greater than 0.5, it favours the variant. Those who are interested in using R in ways that directly speak to many North American variationists will find everything they need, including the source code and installation instructions, here: . Note, however, that factor weights are merely a conversion of the log-odds (the regression coefficients) that also stand behind the p-values, and they offer no objective advantage whatsoever. R is by a long stretch the superior statistics tool compared to the other suites. It pays off manifold to learn a little R.
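The relation between log-odds and factor weights can be shown in a few lines. The sketch assumes the usual characterization of Goldvarb/Rbrul-style factor weights as logistic-regression coefficients under sum (“deviation”) coding, mapped onto the 0–1 scale with the inverse logit; it is not Rbrul’s own code, and the three-level “region” factor and its data are invented. A weight above 0.5 corresponds to a positive coefficient (the level favours the variant), a weight below 0.5 to a negative one.

```r
set.seed(3)
n <- 600
region <- factor(sample(c("east", "north", "west"), n, replace = TRUE))
# Invented probabilities of the variant "nyoo" per region
p_true <- c(east = 0.3, north = 0.5, west = 0.7)[as.character(region)]
y <- factor(ifelse(runif(n) < p_true, "nyoo", "noo"))

# Sum coding: each coefficient is a level's deviation from the mean log-odds
m <- glm(y ~ region, family = binomial,
         contrasts = list(region = "contr.sum"))

dev <- coef(m)[-1]             # deviations for the first k - 1 levels
dev <- c(dev, -sum(dev))       # the last level is minus the sum of the others
names(dev) <- levels(region)

factor_weights <- plogis(dev)  # inverse logit: log-odds -> 0..1 scale
round(factor_weights, 3)
```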

9.5  Chapter summary We have discussed a number of analysis techniques with R, and we concluded this chapter with logistic regressions of two types. We began the chapter by exploring the rationale behind statistical testing and modelling, and we offered three procedures for linear models: two chi-square versions, one of which (HCFA) is a powerful multifactorial linear model tool, and a conversion of a categorical variable into an ordinal variable. Ordinal variables allow for more statistical procedures than categorical ones. We did not discuss means and standard deviations, as these presuppose ratio-scaled variables, which are not very common in traditional WQ data, though quite frequent in speaker evaluation WQs. The section on non-linear analytical methods included two types of logistic regression in the native R environment, with and without interactions, and we addressed some of the peskiest, yet rarely reported and explained, problems of data import into R. These, I think, represent the biggest stumbling block for newcomers to R who were digitally raised in WYSIWYG environments (what you see is what you get, i.e. graphical user interfaces). It is my hope that the few simple tricks shown here will alleviate import problems and instill in newcomers the kind of confidence that will allow them to problem-solve and troubleshoot on their own, which are essential skills with any stats program. The introductory books on R that I worked with offered little help in that regard, and I hope, at the same time, that the problems made explicit here will not deter anyone from giving R a try. Finally, I hope this stats crash course will help students, especially students of the humanities who have traditionally not been exposed to computation, to appreciate the many wonderful sides of R, which is the most amazing stats and graphing tool available: the Lamborghini, to stick with the car metaphor, of statistics suites.
Hopefully, you will begin, step by step beyond this brief introduction and with the help of the more advanced texts, such as Gries (2009b) and Baayen (2008) – preferably in that order –, to appreciate R’s statistical prowess.

Chapter 10

Epilogue The present book has aimed to contextualize WQs in social dialectology in relation to three established methods: the FI method, corpus linguistic methods and, finally, the field’s “gold standard”, the sociolinguistic interview (Chapters 1 and 3). Unlike corpus data, WQ data is elicited, and unlike the sociolinguistic interview, the WQ is a written elicitation method that usually works with direct, metalinguistic questions. As the previous chapters have shown, WQs offer important data informing sociolinguistic and dialectological theory building. Chapter 3 has produced evidence that, certainly for non-stigmatized or comparatively lightly stigmatized variables, reported linguistic behaviour generally matches observed linguistic behaviour very well and is, in some ways, superior to it, while in others clearly not as good. Lexis is a widely acknowledged strong suit of WQs, while morphosyntax has seen recent innovations (Chapter 7) that propel the method to new heights. Similar assessments can be made for pragmatic and related phenomena. For sounds, phonological information can, despite some problems, also be elicited quite reliably and consistently across speakers, but this linguistic level would benefit from more experimentation. For the latter, visualization techniques that use non-verbal tools to depict phonetic and acoustic features unambiguously (or less ambiguously) have recently been developed in speech therapy (Ruß 2008) and beckon adaptation and pilot testing in social dialectology. While they are currently designed to aid very young children alongside verbal explanations, the basic semiotic principle seems to have considerable innovation potential for freeing the WQ from its “phonemic straitjacket”. Traditional WQ studies have produced meaningful results for a sizeable array of variable types.
More recent approaches have explored methods of community-reporting, targeting what is commonly heard in a given locale, and have thus taken linguistic reporting beyond the level of self-reporting. For attitude and perception studies, WQs are already the method of choice in many contexts, as they allow not only the self-reporting of attitudes, speaker evaluations and perceptions, but also effective community-based reporting modes. Such reports on how varieties are perceived and socially evaluated offer important information, as we have known since Lambert et al.’s (1960) early work.


10.1 The revival of WQs in social dialectology At the present point in time, WQs represent a minority approach to the study of social dialects, as elaborated in Chapters 1 and 2. In more than one sense, the WQ’s ease of administration is both its biggest asset and its biggest problem. While WQs are relatively easy to design, it is difficult to design an efficient and well-devised WQ. It is even more difficult, however, to find agreement on best practices, as local traditions have developed in isolation. This perceived “willy-nilly-ness” contributed in no small part to the WQ’s decreasing popularity among dialectologists and sociolinguists, especially when compared to the many uses of WQs in applied linguistics, pragmatics and speech act theory or typological linguistics. In the context of the social dialectology of English, Hempl’s WQ (1896a, b), for instance, probably contributed to the demise of the method in the USA: administered in haste, it included a number of suboptimal questions, and its response options were generally too unstructured. In the mid-1970s Mather and Speitel (1975: 10) suggested that social science questionnaire design would improve the quality of WQs. As the review in Chapter 7 suggests, social dialectologists do not yet seem to have agreed on universally accepted design principles for WQs, but there have been steps in the right direction, so that parts of such an agreement might come to fruition comparatively quickly. In some ways, social dialectology WQs have flourished despite their bad reputation. Chapter 4 has shown that traditional WQs of linguistic self-reporting have been successfully employed in the Canadian context. It was suggested that their application in the quickly growing area of World Englishes might help meet the need for data on an increasingly diverse English language in a cost-effective manner.
Literacy rates are a crucial constraint on the usability of WQs in a number of contexts, yet English as a Lingua Franca (ELF) studies would be one area where WQs can contribute greatly to the definition of what ELF is in a number of superregional and global contexts. Today there are clear signs that WQs are set to regain a role in social dialectology. This revival comes, in the case of English, after a long hiatus of about half a century. As shown in Chapters 2 and 3, clear signs of the versatility and validity of WQ-generated data were seen in English linguistics as early as the late 1940s and early 1950s. McDavid consistently commented very positively on the method. McDavid’s (1940) early study proved that WQs are useful in the elicitation of phonemic variables, with surprisingly high matching rates between the FI and WQ data sets he compared. Davis (1948) then proved that lexical studies could be successfully carried out with WQs. Yet somehow, WQs did not regain their status as a fully legitimate data collection method. While all assessments and comparative studies produced positive results, the WQ was always presented as a second-rate data collection method in dialectology. When sociolinguists entered the scene, their focus on the vernacular and



Chapter 10.  Epilogue 363

their uncompromising focus on one type of data – the data produced in sociolinguistic interviews – spelled the final demise of the WQ in social dialectology. Until the 1990s dialectologists were only slightly less negative towards the WQ. Sociolinguists’ interest in WQs was characterized, by one of the few practitioners who dared to employ them at the time, as simply “negative” (Chambers 1998a: 222). The current mainstream attitude can be gleaned from Boberg (2013), who is more positive but still restricts WQs in a number of areas where, as I hope to have shown in this book, they deserve more appreciation. In the crucial late 1940s and early 1950s, the period of import in English linguistics relating to WQs, Harold Allen (Chapter 3) and, more crucially so, Fred Cassidy (Chapter 2) were unfortunately among the WQ’s sceptics. In hindsight, Cassidy’s insistence on using FIs for the dialect survey behind the Dictionary of Regional American English (DARE) was on one level a logical choice, helping to raise DARE’s status and credibility in the profession. At a time when FI interviews were rolled out en masse for the linguistic atlases in North America (LAUSC) and England, and even bigger plans were devised (e.g. Kloeke’s 1952 plan to coordinate a “linguistic cartography of the world”), all of which relied on the universally accepted FI method (of which the sociolinguistic interview must be considered an adaptation), every principal investigator of a new project would have risked his or her reputation had they not settled on the FI as the primary data collection method. As a side effect, however, Cassidy’s decision, as understandable and logical as it may seem, damaged and restricted the scope of the WQ in social dialectology for decades to come, even for lexical study, which has always been considered the one linguistic level that most linguists were willing to concede to WQs.
Three decades on, sociolinguist Ron Macaulay epitomized the low status of WQs in the field when he expressed his surprise that LAS’s postal questionnaire, “despite the shortcomings of the method”, managed to produce “clear isoglosses” (1979: 227). Fast-forward another three and a half decades from Macaulay’s assessment to the now finished DARE, completed in 2013 (Cassidy & Hall 1985–2013), and we can finally hear a different tune. As part of attempts to maintain the project beyond the editing of the last lexeme (which is, as an aside, zydeco), DARE’s editor-in-chief and Cassidy’s successor Joan Hall announced the launch of a WQ (!) resurvey of American English to keep the data up to date and to probe further. While Cassidy tried hard to suppress the virtues of WQs at the time, Hall can now, in a changing linguistic climate, embrace them openly and without much criticism. It is almost as if there had never been a debate, should my reading of Hall’s announcement and reaction be accurate. Clearly, the times are a-changing and the tides are a-turning in favour of the WQ. In hindsight, it is pretty obvious that a lexical WQ would have been the smarter, more economical and more fruitful choice for DARE, which, as a massive achievement in itself, came to be the single most expensive NEH (US National Endowment for the Humanities) project in history. A DARE based on WQ data would have been not only more detailed, because of more data points, but also cheaper, because of lower labour costs and potentially faster publication cycles (compare the 58 years it took to produce about 58,000 extremely well-defined lexemes). I am firmly convinced that WQs, as my account should have made clear, would have yielded superior data for DARE, with the obvious exception of the pronunciation features. If done right in multiple rounds, like the Atlas zur deutschen Alltagssprache (see Chapter 2, Elspaß & Möller 2003–), with short questionnaires and tens of thousands of responses, DARE would have become a WQ-based dictionary that could have been supplemented with select FI queries. It is always easy to be smart in hindsight. It is not my intent at all to take away from the enormous achievement that DARE, beyond any doubt, represents. On the contrary, DARE needs to be commended for pulling off the project and seeing it, against all odds, to fruition. It is the “best” dictionary of any variety of English and has set an utterly new benchmark for the treatment of regional varieties. Cassidy’s under-appreciation of WQs must have had a profound impact on the status of WQs in the field. After conducting WELS (the Wisconsin English Language Survey), which was limited to lexis, in the late 1940s, he was the most experienced person in the use of WQs at the time, and a decision to opt for WQs would have carried considerable weight. By comparison, Mather and Speitel’s (1975) adoption of WQs for the lexical part of the Linguistic Atlas of Scotland, which is based on WQ data elicited by Angus McIntosh and associates in the 1950s, had comparatively little effect: LAS was not published until the 1970s, at which point Mather and Speitel already had to work hard to justify even the use of WQs.
Consequently, they labelled their work as merely “a tentative step towards the acquisition of knowledge” (p. 10). Like so many WQ-based projects since, they felt the need to apologize for their method. Could it really be that Cassidy did not fully exploit the advantages of WQs? Cassidy & Duckert’s (1953) book on WQ methodology shows that their WQ approach was too deeply entrenched in the FI tradition to allow a genuine WQ approach to develop. By converting the long and detailed FI questionnaire format into a WQ format and by applying FI informant selection criteria to WQs, they did not harness the strengths of the WQ method. For instance, short WQs, sent to thousands of households in multiple polling rounds, would have captured more lexical variation than DARE fieldworkers could gather across the USA. It is probably true that the data processing capabilities of the time would not easily have allowed the handling of returns from that many respondents and locations, but this logistic problem was never offered as a rationale for the rejection of WQs: it was always the allegedly lower quality of WQ data, as was concluded in Chapter 2.




Today, clear signs of a revival of WQs can be seen in the field. Chambers’ Dialect Topography was among the first projects that expressly placed WQ data in a sociolinguistic context. Boberg’s NARVS, unusual for its variationist adoption of WQs, coincided with Bert Vaux’s internet US dialect survey (2004). As such, English dialectologists of late have been widening their methodological scope to a degree long familiar from German linguistics and, even more so, from Dutch and Flemish linguistics, where WQs have always had their accepted role. As shown in Chapters 2 and 7, methodological innovations in social syntax in these languages use WQs in innovative ways for data that are difficult to obtain via observation (see, e.g., the disadvantages of corpus linguistics in Chapter 3). Perhaps one of the clearest signs that WQs have acquired a permanent role in social dialectology, however, is that WQ studies are now used as benchmark data for the evaluation of nascent methods, such as site-restricted data searches. Grieve, Asnaghi & Ruette (2013), for instance, use Vaux’s data from internet WQs as their benchmark. Today there are entire disciplines that rely almost exclusively on WQs for their main data collections, but social dialectology is not one of them. The field of perceptual dialectology is the exception, as it was founded on the methodological basis of WQs (Chapter 7) and has offered important findings on the social evaluation of varieties around the globe. Language attitude studies as such are also largely dependent on WQs and have already been put to good use in a number of contexts, including the presently very dynamic field of English as a Lingua Franca (Chapter 5). Since WQs are in general use in applied linguistics, it is not very surprising that ELF studies would draw on them.
Perception, speaker evaluation and attitude studies are areas that have benefitted from their proximity to sociological and psychological disciplines and from question types that overlap between them. As such, these linguistic fields are poised to introduce questionnaire features and methods from the social and psychological disciplines, which have been in the vanguard of WQ design for many decades now (e.g. Holm ⁴1998 and his six-volume WQ reference guide for sociology). There is a logic in the dovetailing of these areas, as the assessment of, e.g., attitudes towards political parties and the perception of dialects share a number of structural features in a way that studies of reported linguistic behaviour do not. But this does not mean, of course, that the latter are precluded from employing WQs in ways that fit the object of study. It only means that more exploratory and methodological preliminary work needs to be carried out in social dialectology, and it is precisely the dearth of this kind of work that has not helped the situation (but see, e.g., Section 7.3.6 on “some tentative insights”, Buchstaller et al. 2013; Dollinger 2012b; Chambers 1998a or McDavid 1940 for the few studies in that vein to date). Fortunately, the tide is turning, and it is to be hoped that more scholars will critically probe the reliability, validity and problems of WQs in variation studies.


10.2 WQs and linguistic variables The previous chapters have aimed to demonstrate that WQs are useful on almost all linguistic levels for the reporting of linguistic behaviour, with the exception of fine-grained phonetic variables and variables that are undergoing reindexicalization. Two issues to mitigate are adverse effects of the prescriptive tradition on WQ responses and the social stigma of variables (both addressed in Section 7.3.6 and passim), which apply in principle to all linguistic levels. If a variable is heavily stigmatized, respondents will not report faithfully what they use themselves unless mitigation procedures are devised. It was shown that some scholars prefer to use WQs for lexical variables only, an attitude that we first saw for English in the 1940s. However, the previous chapters should have demonstrated that polling phonemic variant choices is not necessarily more difficult than polling lexical information, and it remains to be seen whether advances can be made for some phonetic features using novel visual representations. It is fair to say, however, that restricting WQs to lexis is unwarranted (see Chapter 3). The area of “grammar”, i.e. syntax and morphology, was once considered not overly amenable to study with WQs, and Chapter 4 has shown that traditional WQs produced only a limited set of variables, such as variation in prepositions and verb morphology. The problem of stigmatized variants is also an issue in this domain, as prescriptive traditions have generally targeted grammatical items, especially in the written medium. For instance, it might be difficult to have speakers report their use of I seen him yesterday vs. I saw him yesterday, but the newer methods of sociosyntax can alleviate some of the problems (Sections 7.3.3 and 7.3.5).
The success rate at this linguistic level depends to a larger extent on the “linguistic stance” of the respondent. A traditional teacher of, say, English, who may have spent a lifetime vilifying vernacular constructions, would be a problematic respondent. Such obvious cases can be excluded or relativized with a detailed biographical section (e.g. “profession – please be specific”). There are other cases, such as the “hobby prescriptivist”, who may be an engineer by training but has adopted overly traditional norms of the language that would prevent him from admitting to non-standard constructions such as I seen. In such cases, some questions on language attitudes should help. On matters of usage, Chapter 4 discussed insights of relevance to social and cultural history that go beyond linguistic theory proper. The change in conventions of telling time, from analog to digital, has been traced back to its beginnings in the 17th century and has revealed an increasingly detailed division of people’s waking time over the centuries. In this case study, WQ data offered the crucial cues for interpretation. They allowed the historical data (from DARE and other sources) to be synthesized with the present-day situation, which revealed major pathways of transition from one form



Chapter 10.  Epilogue 367

to the next. Spelling conventions, which may vary across locations and contexts, are another example from social and cultural history that was not expressly discussed. The vexing Canadian spelling situation was first studied authoritatively by Ireland (1979), using a national WQ. The study of socially conditioned conventions of language use, including writing practices, is a highly promising research area at the interface of language and society, and one that WQs can inform. Some of these conventions cut across linguistic borders in a Sprachbund-like fashion and include pragmatic terms and conventions of use. Striking cases in point are found in parts of the former Austro-Hungarian Empire. There, pragmatic expressions such as Küss’ die Hand, ‘kiss your hand’ (a greeting to a woman), are (or were) used in their various vernacular translations. The same holds for the informal greeting formula servus, ‘I am your slave’, which can be found, a century after the Empire’s dissolution, in Austria, Hungary, parts of Poland, the Czech Republic and Slovakia, among others. As the example of telling time has shown, once the facts are known, it is possible to relate linguistic changes very closely, and in a non-trivial manner, to societal changes, and only secondarily to technological ones (the change from analog to digital clock displays). One area of interest, in ELF and language pedagogy alike, is the study of idiomatic expressions, as shown in Chapter 5 (Section 5.4.4). In this area, WQs would allow the collection of substantial data that would otherwise be difficult, if not practically impossible, to obtain.
With newer methods such as translation or reformulation tasks, respondents could be asked to produce language in addition to reporting it. This would provide evidence for the active use of idiomatic expressions or lexical chunks that is otherwise difficult to come by, as with many discourse-dependent and situation-dependent features. A central issue in Chapter 5 is tied to more theoretical considerations, otherwise addressed in Chapter 6. Because of the intricate connection between global perspectives and the new linguistic and cultural super-diversity that is likely to become a new norm in the young millennium, that chapter addressed three further topics: theoretical issues of globalization, the need for new research methodologies and the broader concept of spatiality (Britain 2010a). These were discussed immediately adjacent to, and interlinked with, the role that WQs seem poised to play in World and Global Englishes studies. The aim of Chapter 6, the last chapter of the historical-theoretical part, was to show that WQ findings inform linguistic theory building. Addressed to students of language and linguistics using WQs for their data collection, Chapter 6 introduced a number of well-known concepts in addition to newer theories, such as social indexing, the homogeneity and heterogeneity of national dialects and linguistic perspectives across political borders. Its section covered theories of koinéization, Trudgill’s New-Dialect Formation theory and Schneider’s Dynamic Model. It aimed to link external language history with linguistic findings in more concrete, less metaphorical and less abstract ways, and suggested that WQs are directly relevant to both kinds of models.


The second, practical part of this book was geared towards the novice in empirical linguistics. Chapter 7 (questionnaire design), Chapter 8 (working with WQ data) and Chapter 9 (statistics with R, with special consideration for the student in the humanities) were written expressly for the beginner in quantitative linguistics. They are an attempt to bridge, at least in part, the lamentable gulf between qualitative and quantitative approaches. To do language study well, we all need to be able to speak to both “camps”, at least to some degree. Chapter 7 is a first, hopefully useful attempt at a more formalized method of WQ design in social dialectology and variation studies. It offered guidelines on good questionnaire and question design, but note that it offered no best-practice examples, as I consider it too early to identify any on grounds of methodological soundness. Beyond these guidelines, considerable emphasis was placed on cultivating serious reflection on WQs, their constraints and some workaround procedures. Most of these are found in Section 7.3.6, though also elsewhere, and are supported by a number of small, student-led pilot studies into good question design. A typology of WQ questions in social dialectology was presented in Section 7.3, as were newer approaches such as attitude, social evaluation and perception studies and socio-syntax, to name some. Statistics has been gaining momentum in all areas of English language and linguistics in recent years, and probably in most modern language philologies. Chapter 9 serves as an introduction to R, which will be the software package of choice for many linguists and is poised to become the undisputed discipline standard before long. For this reason, and for the wealth of material and help available online, R was given preference over other software packages.
As R is an open-source tool that is constantly being expanded, students will be well served by a basic working knowledge of it, no matter which area they specialize in. Excel commands were reviewed economically in Chapter 8, and Chapter 9 followed the same approach. It offered R commands in small bits and only in the context of traditional WQ variables (which are mostly categorical, see Table 9.3), which constrained the range of possible tests and operations. It is my hope that this approach will help students, again especially those who consider themselves philologists or qualitative researchers, to explore R and overcome possible hesitations. In any case, students should not, I am tempted to say must not, be deterred by the command-line interface of R. Instead, they are encouraged to think of the command line as the hip equivalent of vinyl records. Vinyl was vilified in the late 1980s and slated by the big record labels to be phased out by the year 2000, yet records came back with a vengeance. In like manner, the command line is back: not only cool, but also incredibly versatile, elegant and, frankly, more practical and time-saving than graphical user interfaces (GUIs). GUIs can do a great deal, but never everything. The level of control afforded by the command line will be appreciated more and more with increasing R knowledge. Any initial effort with R will, therefore, pay off manifold in the many opportunities that R affords its users. It is a rare thing




that the best software is made available free of charge to everyone rather than being sold commercially at a prohibitive price. This fact alone would warrant a closer look and a trial.

10.3 Desiderata

It has been the overall purpose of this book to explore the possibilities that WQs afford the study of social and regional dialects. As WQs expand their fields of application, more researchers can be expected to turn to the method and, consequently, improvements will inevitably be made. This development can already be seen in the newer approaches presented in Chapter 7, for instance. This section addresses three desiderata of WQs in social dialectology and variation studies. The first is the lack of good guidelines on questionnaire design. The second is the need to reconceptualize notions of space in more modern terms. The third is the need for WQ-internal checks and controls that would allow less reliable respondents to be detected more easily. While there are other areas, these three seem to me the most promising for the immediate future.

10.3.1 Guidelines for WQ design in social dialectology

Chapter 7 aimed to link the historical-theoretical part with the practical part, while revealing that methodological work on WQs has been carried out only to a very limited degree. My attempt at a question typology is no more than that: an attempt. The literature on social science polling is somewhat useful (e.g. Gillham 2007, de Vaus 1991). However, the information elicited in social surveys differs in nature from that elicited in dialectological and variationist WQs, so more interdisciplinary work is needed. A methodology of social dialectology WQs remains a desideratum that the present text could address only in part. As outlined in the previous section, another area that would benefit from more attention is the empirical study of efficient question and questionnaire design in linguistic contexts. Work in this area is either older (from the 1970s) or only marginally applicable to social dialectology WQs. Especially needed is work on the effect of different question wordings on the elicitation of linguistic behaviour, for which Section 7.3.6 offers only a starting point. Question types in social dialectology, as Section 7.3 has shown, are relatively few and follow earlier models that were not systematically tested for their effectiveness. It would therefore be beneficial to experiment with different wordings and question styles and study their effects. An overall goal would be to arrive at question templates for various question types, similar to those available in the social sciences (e.g. de Vaus 1991). The method can exploit its advantages only to the extent that its intricacies are tested and formalized.


10.3.2 WQs, geographical space, and potential risks

In the studies presented in this book, space has been implicitly defined as a constant and static factor. As discussed in Section 5.3.2, the study of language and space has received renewed attention lately and has started to inform theories of space (see Auer & Schmidt 2010; Auer et al. 2013). Inspired by changing conceptions of space in human geography, Britain (2002) traces the development of dialect geography and of sociolinguistics along two different clines. In dialect geography, geographical space was taken as a steady constant. Early sociolinguistic studies, by contrast, operationalized social variables such as age, gender and ethnicity, but they ignored geographical space. Since dialect geography had focussed on rural speech, the new urban focus “could be seen as throwing the rural baby out with the traditional dialectological bathwater” (Britain 2002: 607). Dialect geography often failed to fully include social criteria in its data representations. Sociolinguistics, for its part, has generally worked with social variables in frameworks where geographical space played little to no role. Much sociolinguistic work can be classified as built on a dichotomy between geographical and social space, with attention given to the latter. Now that WQs are being used in the paradigms of World Englishes, Global Englishes and local Englishes (e.g. Blommaert 2008), the lack of a theory of space has become a pressing issue. Chapter 5 opened with a characterization of the – often implicit – monolingual backdrop to much early and not-so-early sociolinguistic work. As discussed there, new realities of massive and ongoing migration have emerged and changed the necessary methodologies. Today’s super-diverse, high-mobility settings render the once-in-a-lifetime migratory act of the 19th and much of the 20th century nearly obsolete.
This has entailed the re-creation and adaptation of social space into different kinds of spaces. One example is the transformation of the New World’s immigrant neighbourhoods, originally often monolingual, into more multilingual and less homogeneous communities than they ever used to be; examples include Italians in Toronto and Hong Kong Chinese in mid-century Vancouver. As I am writing these lines, I am sitting in the formerly Vietnamese neighbourhood of Kensington-Cedar Cottage in East Vancouver. The neighbourhood has been experiencing a highly diverse mix of gentrification and in-migration. Young and not-so-young professionals, more recent Mandarin- and Cantonese-speaking working-class in-migrants, Europeans, British, Americans, Filipinos, East Indians and Canadians from “back east” are now moving in, altering what was once a monolingual Vietnamese neighbourhood. On the neighbourhood’s main drag, Kingsway, one can see the formerly ubiquitous Vietnamese signage giving way, bit by bit, to signage in other languages, by which I do not mean merely English. In Europe the development seems to be further advanced, finding its theoretical expression in the phenomenon of “multiethnic urban lects” in the big European




centres. These developments mean that the idea of a dominant vernacular culture no longer reflects a more recent heterogeneity, in which 20, 30 or more cultural backgrounds co-exist in a location where there used to be only one. This holds whether that culture is native (e.g. German in Vienna’s 15th district) or an immigrant one (in that district, formerly Turkish). As a result, social space and perceived space, in addition to geographical space, are constructed anew with each new encounter: it is often impossible to know what background the next interlocutor will bring to the exchange. Consequently, multi-ethnic and multilingual koinés will be on the rise. The multi-ethnic lects, first described in Copenhagen in the 1990s and later in other European cities (Section 5.1.2), will likely give rise to new accommodation skills and competencies. ELF scholarship (e.g. Seidlhofer 2011) is already showing us what kinds of contact phenomena and skills we might expect. Clearly, all this is of profound relevance when designing WQs for mass distribution, as researchers need to keep potential pitfalls in mind. The discussion in Sections 5.2, 5.3 and 5.4 should have shown that this area presents a genuine desideratum. It starts with the most basic of questions, the choice of WQ language: what might be the right choice for one social group in a given location might send the wrong signals to other groups. The use of a supposedly “neutral”, dominant and autochthonous language, e.g. German in Vienna’s 15th district, seems an obvious compromise. It is, however, a compromise that will exclude large parts of the district’s population: those who do not speak (or write) German, and those who, in their own assessment, do not know it well enough to feel comfortable completing a WQ on their own.

10.3.3 WQs and WQ-internal checks and controls

In the absence of parallel studies, which are generally rare, it is difficult or next to impossible to determine biases in collected data of any type. Unfortunately, there has not yet been much interest in replicating studies, or in doing so to test the reliability, strengths and weaknesses of a particular method. This is somewhat of a generalization that disregards existing methodological innovators. Still, it seems fair to say that, rather than spending resources on further identifying a method’s biases, researchers study different aspects of a phenomenon with a methodology that has come to be considered “standard”. It therefore seems quite widespread and, unfortunately, somewhat acceptable for practitioners to swear by a given method and not to use other methods at all. Remedying, at least to some degree, the perceived, reported and observed general under-appreciation of WQs has been one of the driving forces behind the writing of this book. What does this silo-like treatment of methodologies mean for WQs? It means that any effort to probe the real, rather than the perceived, prejudged or preconceived, limitations of the method in the variationist context should be encouraged.



The tradition of speaker evaluation can show the way to some degree and would offer valuable points of departure, provided the different variable types are considered appropriately. A number of other desiderata can only be addressed once more basic studies comparing WQ data with other types of data have become available. These other types include FIs, sociolinguistic interviews, corpus studies and perhaps other sources. McDavid (1940) set the scene, and Davies (1948) built on it. Dollinger (2012b), with all its limitations, was a more recent step towards more objective comparisons of data quality. All three papers compared WQs to interview data, and all three found the WQ data to be of high quality, sometimes surprisingly so (see Chapter 3). Among the desiderata that may be tackled immediately, one case in point is the ubiquitous use of multi-item scaling in social and psychological WQs and its (almost) complete absence in variation studies. Multiple-item scaling has been employed only to a very limited and unsophisticated degree in linguistic self-reporting (see Section 7.3.1). Yet it is used pervasively and habitually in attitude and perception studies (Chapter 7). It must be borne in mind that the variable types differ somewhat – categorical variables are common in the variationist framework, while ratio-scaled or ordinal variables are common in the speaker evaluation tradition. This does not mean, however, that carefully crafted methodological experiments could not offer ways of incorporating Likert-type scales and multi-item scaling. Such innovative study and methodology would boost the checks and controls that can be offered within any WQ, or WQ-internally, so to speak. To a limited degree, internal checks and controls may be used already, e.g.
in questions on yod-dropping. Yet present-day variation WQs are generally not designed to offer correctives of the type found in the sociological, psychological and opinion-polling traditions. The benefit would be that linguists could assess each respondent’s WQ internally for reliability, just as is possible in opinion polling. A possible scenario would rest on the repetition of linguistically identical or equivalent items. Suppose, for instance, that a respondent reported a yod-ful variant in questions 3 and 59 but a yod-less variant in questions 27 and 31. Suppose also that there was good evidence to expect, with considerable certainty, the same type in all four contexts. In that case, the respondent’s answers could be set aside as of doubtful quality and not pooled with those of respondents who seem to report more faithfully. That respondent’s data could later be cross-checked to ensure that the established pattern of variation is really sound and not an artefact of other data. Clearly, one would need to proceed very cautiously with such regulatory diagnostics, but, I believe, this is in principle a viable way forward.
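The repeated-item check described above can be sketched in a few lines of code. The book's own statistics chapter works in R; the sketch below is in Python and is only illustrative. It rests on several hypothetical assumptions:

- questions 3, 27, 31 and 59 all poll the same yod context;
- answers are coded with the labels "yod" and "no-yod", with None for an unanswered item;
- a respondent is flagged when too small a share of their answered items agrees with their most frequent variant.

The function names, the coding scheme and the default threshold are all assumptions made for this example, not a procedure taken from the text.

```python
# Sketch of a WQ-internal consistency check on repeated, linguistically
# equivalent items. Item numbers, labels and thresholds are hypothetical.

# Hypothetical: these four questions are assumed to poll the same yod
# context, so a faithful respondent should report one variant throughout.
YOD_ITEMS = (3, 27, 31, 59)


def consistency_share(answers, items=YOD_ITEMS):
    """Share of answered items that match the respondent's most frequent
    variant (1.0 = fully consistent); None if no item was answered."""
    given = [answers[q] for q in items if answers.get(q) is not None]
    if not given:
        return None
    modal_count = max(given.count(variant) for variant in set(given))
    return modal_count / len(given)


def flag_inconsistent(respondents, threshold=1.0, items=YOD_ITEMS):
    """Return the IDs of respondents whose consistency share falls below
    the threshold. Only IDs are returned; no data are altered or deleted."""
    flagged = []
    for rid, answers in respondents.items():
        share = consistency_share(answers, items)
        if share is not None and share < threshold:
            flagged.append(rid)
    return flagged


# Invented demonstration data (not survey results).
respondents = {
    "R01": {3: "yod", 27: "yod", 31: "yod", 59: "yod"},
    # The mixed pattern from the scenario: yod-ful in 3 and 59,
    # yod-less in 27 and 31.
    "R02": {3: "yod", 27: "no-yod", 31: "no-yod", 59: "yod"},
    # One item left blank; the three answers given agree.
    "R03": {3: "no-yod", 27: "no-yod", 31: None, 59: "no-yod"},
}

if __name__ == "__main__":
    for rid, answers in respondents.items():
        print(rid, consistency_share(answers))
    print("set aside:", flag_inconsistent(respondents))
```

With the default threshold of 1.0, any deviation flags a respondent. A threshold of 0.75 would tolerate one deviant answer in four, so a respondent would be flagged only with two or more. In keeping with the caution urged above, the sketch only lists respondents for closer inspection and later cross-checking; it does not discard their answers.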




10.4 WQs: The delayed method

This book can be read as a post-hoc argument that WQs should have been canonized in the 1950s as a standard data elicitation technique in English linguistics, alongside other techniques. Traditional dialectology has never accepted WQs beyond a “supplementary” method, though this assessment was quite widespread for a while (Atwood 1962: 30). The new discipline of sociolinguistics relied on a different elicitation technique, the sociolinguistic interview, for the description of linguistic variation and change, so the role of WQs in the new field was restricted. Labov (2006 [1966]) and Trudgill (1972), much to their credit, did test self-reports: Labov on postvocalic /r/ in New York and Trudgill on yod-dropping in Norwich. Both variables were coded for overt prestige in their respective contexts, so self-reports did not yield reliable data, much like yod in student or news in the North American context. This mismatch would relegate the use of WQs in sociolinguistics to the measurement of social responses to linguistic variables, i.e. attitudes and perceptions, where they have always played a central role. Weinreich, Labov and Herzog (1968) set out five familiar dimensions for the description of a linguistic change. Of these, WQs would be admissible as evidence for only one: the evaluation of linguistic features by their speakers. Evaluative assessments are certainly a feature of linguistic WQs, even a strong suit. Nevertheless, one of the goals of the present text was to highlight that WQs can offer more than social evaluations and can and do contribute to linguistic description and theory building. Although the precise relationship between WQ data and observed data generally cannot yet be established, WQ data have been shown to provide valuable material for linguistic description. Recent activity and innovation in social dialectology have improved the versatility and reliability of WQs (e.g. Vaux 2004; Boberg 2005; Buchstaller et al. 2013; Krug & Sell 2013).
This activity may be seen as the belated methodological expansion of WQs beyond the social evaluative aspect. WQs have never completely ceased to be used in English linguistics. The problem in dialectology, and even more so in sociolinguistics, has been that the label “questionnaire” (except in the “worksheet” sense for FIs) has carried negative undertones. At times, WQs were even used as prime pieces of evidence, which has led to somewhat bizarre assessments. Despite his own scathing critique of Mitzka’s Wortatlas (Kurath 1958), Kurath drew on Atwood’s (1962) WQ data on Texan and southeastern US lexis to make important points on dialect contact in the US (Kurath 1986 [1972]). Their ease of administration kept WQs in use in classroom projects on English social dialectology. Examples include Chambers’ master’s student Christine Zeller and Boberg’s undergraduate students in the late 1990s, whose data collection exercise culminated in fine publications. They also include the present author’s practice of using WQs in undergraduate classes, starting in 1st- and 2nd-year English. In all these cases, WQs have been put to good use


in student projects since at least the days of Albert H. Marckwardt in Michigan in the early 1940s. Today, undergraduate students employ the method successfully, and more and more researchers are beginning to adopt it. WQ data are one type of data that must be taken seriously in the social and regional study of linguistic variation. It is simply no longer acceptable for social dialectologists to turn up their noses at WQs. As one of the methods in the linguist’s toolkit, WQs will not serve every purpose, but they offer an attractive choice in all the contexts presented here and beyond. WQs are extremely versatile and are poised to gain their rightful place in the methodological toolbox of the dialectologist, variationist and sociolinguist, and in the wider linguistic context.

References Ahrend, Evelyn R. 1934. Ontario speech. American Speech 9: 136–139. Alexander, Henry. 1939. Charting Canadian speech. Journal of Education (Nova Scotia) 10: 457–458. Alexander, Henry. 1951. The English language in Canada. In Royal Commission Studies, “Massey Report”, 13–24. Ottawa: King’s Printer. Allen, Harold B. 1973–76. The Linguistic Atlas of the Upper Midwest, 3 Vols. Minneapolis, MN: University of Minnesota Press. Allen, Harold B. 1959. Canadian-American differences along the middle border. Canadian Journal of Linguistics 5: 17–24. Allen, Harold B. & Linn, Michael D. (eds.) 1986. Dialect and Language Variation. Orlando FL: Academic Press. Altendorf, Ulrike. 2003. ‘Estuary English’: Levelling at the interface of RP and Southeastern British English. Tübingen: Narr. Ammon, Ulrich. 1995. Die deutsche Sprache in Deutschland, Österreich und der Schweiz: Das Problem der nationalen Varietäten. Berlin: Mouton de Gruyter. doi: 10.1515/9783110872170 American Heritage Dictionary. 2000. 4th edn. Boston MA: Houghton Mifflin Company. Anderwald, Lieselotte & Kortmann, Bernd. 2013. Applying typological methods in dialectology. In Research Methods in Language Variation and Change, Manfred Krug & Julia Schlüter (eds), 313–333. Cambridge: CUP. Atwood, E. Bagby. 1986 [1963]. The methods of American Dialectology. In Allen & Linn (eds), 63–97. Atwood, E. Bagby. 1962. The Regional Vocabulary of Texas. Austin TX: University of Texas Press. Auer, Anita, Catharina Peersman, Simon Pickl, Gijsbert Rutten & Rik Vosters. 2015. Historical sociolinguistics: the field and its future. Journal of Historical Sociolinguistics 1: 1–12. Auer, Peter & Schmidt, Jürgen Erich (eds). 2010. Language and Space: An International Handbook of Linguistic Variation [Handbücher zur Sprach- und Kommunikationswissenschaft 30], 2 Vols. Berlin: De Gruyter. Auer, Peter, (ed.-in-chief), Hilpert, Martin, Stukenbrock, Anja & Szmrecsanyi, Benedikt (eds). 2013. 
Space in Language and Linguistics: Geographical, Interactional and Cognitive Perspectives. Berlin: De Gruyter. doi: 10.1515/9783110312027 Avis, Walter S. 1954. Speech differences along the Ontario-United States border, I: Vocabulary. Journal of the Canadian Linguistic Association 1(1, Oct.): 13–18. Avis, Walter S. 1955. Speech differences along the Ontario-United States border, II: Grammar and syntax. Journal of the Canadian Linguistic Association 1(1, Mar.): 14–19. Avis, Walter S. 1956. Speech differences along the Ontario-United States border, III: Pronunciation. Journal of the Canadian Linguistic Association 1(1, Mar.): 41–59. Avis, Walter S. 1972. So Eh? is Canadian, Eh? Canadian Journal of Linguistics 17(2): 89–104. Avis, Walter S. 1973. The English language in Canada. In Current Trends in Linguistics. Vol. 10/1, Thomas Sebeok (ed.), 40–74. The Hague: Mouton.

Avis, Walter S., Crate, Charles, Drysdale, Patrick, Leechman, Douglas, Scargill, Matthew H. & Lovell, Charles J. (eds). 1967. A Dictionary of Canadianisms on Historical Principles. Toronto: Gage. Ayearst, Morley. 1939. A note on Canadian speech. American Speech 14: 231–233. Baayen, R. Harald. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge: CUP. doi: 10.1017/CBO9780511801686 Babbit, Eugene H. & Mott, Lewis F. 1896. The 1894 Circular. Dialect Notes 1/VII: 311–314. Baker, Paul. 2010. Sociolinguistics and Corpus Linguistics. Edinburgh: EUP. Baker, Paul. 2013. Corpus linguistics in sociolinguistics. In Holmes & Hazen (eds), 107–118. Bailey, Richard W. & Görlach, Manfred (eds). 1982. English as a World Language. Ann Arbor MI: University of Michigan Press. Bailey, Guy & Tillery, Jan. 1999. The Routledge effect: The impact of interviewers on survey results in linguistics. American Speech 74(4): 389–402. Bamgbose, Ayo. 1998. Torn between the norms: Innovations in World Englishes. World Englishes 17(1): 1–14. doi: 10.1111/1467-971X.00078 Bank of Canadian English. See Dollinger et al. (2006). Barber, Katherine. 2004 [1st ed. 1998]. Canadian Oxford Dictionary. 2nd ed. Don Mills ON: OUP. Bard, Ellen, Robertson, Dan & Sorace, Antonella. 1996. Magnitude estimation of linguistic acceptability. Language 72: 1–31. doi: 10.2307/416793 Barbiers, Sjef, Cornips, Leonie & van der Kleij, Susanne (eds). 2002. Syntactic Microvariation. Amsterdam: Meertens Instituut. Barbiers, Sjef, Cornips, Leonie & van der Kleij, Susanne (eds). 2004. Syntactische Atlas van de Nederlandse Dialecten. Amsterdam: Amsterdam University Press. Bart, Gabriela, Glaser, Elvira, Sibler, Pius & Weibel, Robert. 2013. Analysis of Swiss German syntactic variants using spatial statistics. In Current Approaches to Limits and Areas in Dialectology, Xosé Afonso Álvarez Pérez, Ernestina Carrilho & Catarina Magro (eds), 143–169.
Newcastle upon Tyne: Cambridge Scholars. Bauer, Laurie. 1998. You shouldn’t say ‘It is me’ because ‘me’ is accusative. In Language Myths, Laurie Bauer & Peter Trudgill (eds), 132–138. London: Penguin. Becker, Kara. 2013. The sociolinguistic interview. In Mallinson, Childs & Van Herk (eds), 91–100. Beebe, Leslie M. & Cummings, Martha Clark. 1996. Natural speech act data versus written questionnaire data: How data collection method affects speech act performance. In Speech Acts across Cultures: Challenges to Communication in a Second Language, Susan Gass & Joyce Neu (eds), 65–86. Berlin: Mouton de Gruyter. Berger, Christine Maria. 2005. The dialect topography of Canada: Method, coverage, interface and analyses. MA thesis, University of Vienna. Biber, Douglas, Johansson, Stig, Leech, Geoffrey, Conrad, Susan & Finegan, Edward. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson. Blommaert, Jan. 2008. Grassroots Literacy: Writing, Identity and Voice in Central Africa. London: Routledge. Blommaert, Jan. 2010. The Sociolinguistics of Globalization. Cambridge: CUP. doi: 10.1017/CBO9780511845307

Bloomfield, Morton W. 1948. Canadian English and its relation to eighteenth century American speech. Journal of English and Germanic Philology 47: 59–66 [reprinted in Chambers (ed.) (1975), 3–11]. Boberg, Charles. 2000. Geolinguistic diffusion and the U.S.-Canada border. Language Variation and Change 12: 1–24. doi: 10.1017/S0954394500121015 Boberg, Charles. 2004a. Real and apparent time in language change: Late adoption of changes in Montreal English. American Speech 79(3): 250–69. doi: 10.1215/00031283-79-3-250

Boberg, Charles. 2004b. Ethnic patterns in the phonetics of Montreal English. Journal of Sociolinguistics 8(4): 538–568. doi: 10.1111/j.1467-9841.2004.00273.x Boberg, Charles. 2004c. The dialect topography of Montreal. English World-Wide 25: 171–198. doi: 10.1075/eww.25.2.02bob

Boberg, Charles. 2005. The North American Regional Vocabulary Survey: New variables and methods in the study of North American English. American Speech 80: 22–60. doi:  10.1215/00031283-80-1-22

Boberg, Charles. 2008a. Regional phonetic differentiation in Standard Canadian English. Journal of English Linguistics 36(2): 129–154. doi: 10.1177/0075424208316648 Boberg, Charles. 2008b. English in Canada: Phonology. In Varieties of English, Vol. 2: The Americas and the Caribbean, Edgar W. Schneider (ed.), 144–60. Berlin: Mouton de Gruyter. Boberg, Charles. 2009. The emergence of a new phoneme: Foreign (a) in Canadian English. Language Variation and Change 21: 355–380. Boberg, Charles. 2010. The English Language in Canada: Status, History and Comparative Analysis. Cambridge: CUP. doi: 10.1017/CBO9780511781056 Boberg, Charles. 2012. English as a minority language. World Englishes 31(4): 493–502. Special Issue on Autonomy and Homogeneity in Canadian English. doi: 10.1111/j.1467-971X.2012.01776.x Boberg, Charles. 2013. The use of written questionnaires in sociolinguistics. In Mallinson, Childs & Van Herk (eds), 131–141. Bourdieu, Pierre. 1991. Language & Symbolic Power. Ed. by John B. Thompson, trans. by Gino Raymond & Matthew Adamson. Cambridge, MA: Harvard University Press. Bourhis, Richard Y., Giles, Howard & Rosenthal, Doreen. 1981. Notes on construction of a ‘Subjective Vitality Questionnaire’ for ethnolinguistic groups. Journal of Multilingual and Multicultural Development 2: 145–155. doi: 10.1080/01434632.1981.9994047 Bremer, Otto. 1895. Beiträge zur Geographie der deutschen Mundarten in Form einer Kritik von Wenkers Sprachatlas des Deutschen Reiches. Leipzig: S. Hirschel. Brinton, Laurel. In press. Using historical corpora and historical text databases. In The Handbook of Lexicography, Philip Durkin (ed.). Oxford: Oxford University Press. Bright, Elizabeth. 1971. A Word Geography of California and Nevada. Berkeley CA: University of California Press. Britain, David. 1991. Dialect and space: a geolinguistic study of speech variables in the Fens. Ph.D. dissertation. University of Essex. Britain, David. 2002. Space and spatial diffusion.
In The Handbook of Language Variation and Change, Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds), 603–637. Malden MA: Blackwell. Britain, David. 2010a. Conceptualizations of geographic space in linguistics. In Auer & Schmidt (eds), Vol. 2: 69–97. Britain, David. 2010b. Language and space: The variationist approach. In Auer & Schmidt (eds.), Vol. 1: 142–163. Brown, James Dean. 2001. Using Surveys in Language Programs. Cambridge: CUP. Buchstaller, Isabelle & Corrigan, Karen P. 2011. How to make intuitions succeed: Testing methods for analyzing syntactic microvariation. In Maguire & McMahon (eds), 30–48. doi:  10.1017/CBO9780511976360.003

Buchstaller, Isabelle, Corrigan, Karen P., Holmberg, Anders, Honeybone, Patrick & Maguire, Warren. 2013. T-to-R and the Northern Subject Rule: Questionnaire-based spatial, social and structural linguistics. English Language and Linguistics 17: 85–128. doi: 10.1017/S1360674312000330

Burchfield, Robert W. 1996. The New Fowler’s Modern English Usage. Oxford: Clarendon.

Burnett, Wendy. 2006. Linguistic resistance on the Maine-New Brunswick border. Canadian Journal of Linguistics 51(2-3): 161–76. doi: 10.1353/cjl.2008.0005

Calvet, Louis-Jean. 2006. Towards an Ecology of World Languages. Cambridge: Polity Press.

Cameron, Deborah. 2003. Gender and language ideologies. In The Handbook of Language and Gender, Janet Holmes & Miriam Meyerhoff (eds), 447–467. Malden, MA: Blackwell.

Canadian Oxford Dictionary. See Barber (2004).

Campbell-Kibler, Kathryn. 2007. Accent, (ING) and the social logic of listener perceptions. American Speech 82: 32–64. doi: 10.1215/00031283-2007-002

Carden, Guy. 1970. Discussion of Heringer 1970. In Papers from the Sixth Regional Meeting of the Chicago Linguistic Society, 296. Chicago IL: Chicago Linguistic Society.

Carden, Guy. 1973. Dialect variation and abstract syntax. In Some New Directions in Linguistics, Roger W. Shuy (ed.), 1–34. Washington, DC: Georgetown University Press.

Carden, Guy. 1976. Syntactic and semantic data: replication results. Language in Society 5(1): 99–104. doi: 10.1017/S0047404500006886

Cassidy, Frederic G. 1948. On collecting American dialect. American Speech 23: 185–193. doi: 10.2307/486917

Cassidy, Frederic G. 1985. Introduction. Dictionary of American Regional English, Vol. I: Introduction and A-C, xi–xxii. Cambridge MA: Belknap Press.

Cassidy, Frederic G. & Duckert, Audrey R. 1953. A Method for Collecting Dialect. Gainesville FL: American Dialect Society.

Cassidy, Frederic G. & Houston Hall, Joan (eds). 1985–2013. Dictionary of American Regional English, Vols I–VI. Cambridge MA: Belknap Press of Harvard University Press.

Chambers, Jack K. 1973. Canadian Raising. Canadian Journal of Linguistics 18(2): 113–135.

Chambers, Jack K. (ed.). 1975. Canadian English: Origins and Structures. Toronto: Methuen.

Chambers, Jack K. 1979. Canadian English. In The Languages of Canada [Série 3L Series 3], Jack K. Chambers (ed.), 168–204. Montréal: Didier.

Chambers, Jack K. 1980. Linguistic variation and Chomsky’s ‘homogeneous speech community’. In Papers from the Fourth Annual Meeting of the Atlantic Provinces Linguistic Association. University of New Brunswick, Fredericton, N.B., 12–13 December 1980, A. Murray Kinloch & Anthony B. House (eds), 1–31. Fredericton: University of New Brunswick.

Chambers, Jack K. 1991. Canada. In English Around the World: Sociolinguistic Perspectives, Jenny Cheshire (ed.), 89–107. Cambridge: CUP.

Chambers, Jack K. 1993. ‘Lawless and vulgar innovations’: Victorian views on Canadian English. In Clarke (ed.), 1–26. doi: 10.1075/veaw.g11.02cha

Chambers, Jack K. 1994. An introduction to dialect topography. English World-Wide 15: 35–53. doi: 10.1075/eww.15.1.03cha

Chambers, Jack K. 1995. The Canada-U.S. border as a vanishing isogloss: The evidence of chesterfield. Journal of English Linguistics 23: 155–166. doi: 10.1177/007542429002300113

Chambers, Jack K. 1998a. Inferring dialect from a postal questionnaire. Journal of English Linguistics 26: 222–246. doi: 10.1177/007542429802600304

Chambers, Jack K. 1998b. Social embedding of changes in progress. Journal of English Linguistics 26(1): 5–36. doi: 10.1177/007542429802600102

Chambers, Jack K. 1998c. English: Canadian varieties. In Language in Canada, John Edwards (ed.), 252–272. Cambridge: CUP. doi: 10.1017/CBO9780511620829.014

Chambers, Jack K. 2000. Region and language variation. English World-Wide 21(2): 169–199.

Chambers, Jack K. 2002a. Patterns of variation including change. In The Handbook of Language Variation and Change, Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds), 349–372. Malden MA: Blackwell.

Chambers, Jack K. 2002b. Yod dropping in an English accent. Journal of the Phonetic Society of Japan 6(3): 4–11.

Chambers, Jack K. 2004. ‘Canadian Dainty’: The rise and decline of Briticisms in Canada. In Legacies of Colonial English: Studies in Transported Dialects, Raymond Hickey (ed.), 224–241. Cambridge: CUP.

Chambers, Jack K. 2006. Canadian Raising: Retrospect and Prospect. Canadian Journal of Linguistics 51(2-3): 105–118. doi: 10.1353/cjl.2008.0009

Chambers, Jack K. 2007. A linguistic fossil: positive any more in the Golden Horseshoe. In LACUS Forum XXXIII: Variation, Peter Reich, William J. Sullivan, Arle R. Lommel & Toby Griffen (eds), 31–44. Houston TX: Linguistic Association of Canada and the United States.

Chambers, Jack K. 2008. The Tangled Garden: Relics and vestiges in Canadian English. Anglistik 19: 7–21. Special issue Focus on Canadian English, Matthias Meyer (ed.).

Chambers, Jack K. 2009. Sociolinguistic Theory, 3rd rev. edn. Malden MA: Wiley-Blackwell.

Chambers, Jack K. 2010. English in Canada. In Canadian English: A Linguistic Reader, Elaine Gold & Janice McAlpine (eds), 1–37. Kingston ON: Strathy Language Unit.

Chambers, Jack K. 2012. Homogeneity as a sociolinguistic motive in Canadian English. World Englishes 31(4): 467–477. Special issue on Autonomy and Homogeneity in Canadian English. doi: 10.1111/j.1467-971X.2012.01774.x

Chambers, Jack K. & Trudgill, Peter. 1998. Dialectology, 2nd edn. Cambridge: CUP. doi: 10.1017/CBO9780511805103

Chambers, Jack K. & Heisler, Troy. 1999. Dialect topography of Québec City English. Canadian Journal of Linguistics 44(1): 23–48.

Chambers, Jack K. & Lapierre, André. 2011. Dialect Variants in the Bilingual Belt. In Le français en contact. Hommages à Raymond Mougeon, France Martineau & Terry Nadasdi (eds), 35–50. Québec: Presses de l’Université Laval.

Chambers, Tawnie. 2014. On the presence of “on accident” in British Columbia. Term paper. ENGL 489, University of British Columbia.

Cheshire, Jenny. 1991. Introduction: Sociolinguistics and English around the world. In English Around the World: Sociolinguistic Perspectives, Jenny Cheshire (ed.), 1–12. Cambridge: CUP. doi: 10.1017/CBO9780511611889.001

Cheshire, Jenny, Kerswill, Paul, Fox, Sue & Torgersen, Eivind. 2011. Contact, the feature pool and the speech community: The emergence of multicultural London English. Journal of Sociolinguistics 15: 151–196. doi: 10.1111/j.1467-9841.2011.00478.x

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Clahsen, Harald, Meisel, Jürgen & Pienemann, Manfred. 1983. Deutsch als Zweitsprache: Der Spracherwerb ausländischer Arbeiter. Tübingen: Narr.

Clarke, Sandra (ed.). 1993a. Focus on Canada [Varieties of English around the World G11]. Amsterdam: John Benjamins. doi: 10.1075/veaw.g11

Clarke, Sandra. 1993b. The Americanization of Canadian pronunciation: A survey of palatal glide usage. In Clarke (ed.), 85–108. doi: 10.1075/veaw.g11.06cla

Clarke, Sandra. 2006. Nooz or nyooz?: The complex construction of Canadian identity. Canadian Journal of Linguistics 51(2-3): 225–246. doi: 10.1353/cjl.2008.0013

Clarke, Sandra. 2012. Phonetic change in Newfoundland English. World Englishes 31(4): 503–518. Special Issue on Autonomy and Homogeneity in Canadian English.

Concise Oxford Dictionary. 1990. 8th edn. Oxford: OUP.

Considine, John. 2003. Dictionaries of Canadian English. Lexikos 13: 250–270.

Cornips, Leonie. 2002. Variation between the infinitival complementizers om/voor in spontaneous speech data compared to elicitation data. In Barbiers, Cornips & van der Kleij (eds), 75–96.

CONTE = Corpus of Early Ontario English, pre-Confederation Section 1776–1850. Stefan Dollinger (ed.). 2006. University of Vienna. See Dollinger (2008a: ch. 4).

Creswell, Thomas J. 1994. Dictionary recognition of developing forms: The case of snuck. In Centennial Usage Studies, Greta D. Little & Michael Montgomery (eds), 144–54. Tuscaloosa AL: University of Alabama Press.

Crystal, David. 2003. English as a Global Language, 2nd edn. Cambridge: CUP. doi: 10.1017/CBO9780511486999

DARE = See Cassidy & Hall 1985.

Davis, Alva L. 1948. A Word Atlas of the Great Lakes Region. PhD dissertation, University of Michigan.

DCHP-1. See Avis, et al. 1967.

DCHP-1 Online. See Dollinger et al. 2013.

DCHP-2. See Dollinger. Forthcoming.

Denison, David. 2003. Log(ist)ic and simplistic S-curves. In Hickey (2003b), 54–70.

Derwing, Bruce L. 1973. Transformational Grammar as a Theory of Language Acquisition: A Study in the Empirical, Conceptual and Methodological Foundations of Contemporary Linguistics. Cambridge: CUP.

De Vaus, David. 1991. Surveys in Social Research, 3rd edn. London: Routledge.

De Wolf, Gaelan Dodds & Hasebe-Ludt, Erika. 1993. A linguistic atlas of British Columbia: A first for Canadian English. In Proceedings of the International Congress of Dialectologists, Bamberg 29.7.-4.8.1990, Vol. 2, Wolfgang Viereck (ed.), 303–342. Stuttgart: Franz Steiner.

DeWolf, Gaelan Dodds, Gregg, Robert J., Harris, Barbara P. & Scargill, Matthew H. (eds). 1997. Gage Canadian Dictionary. [5th], rev. and expanded edn. Toronto: Gage.

Dictionary of Canadianisms on Historical Principles, see Avis, et al. 1967 & Dollinger. Forthcoming.

Dieth, Eugen & Orton, Harold. 1952. A Questionnaire for a Linguistic Atlas of England. Leeds: Leeds Philosophical and Literary Society.

Dillman, Don. 1978. Mail and Telephone Surveys: The Total Design Method. New York NY: Wiley.

Dillman, Don. 2000. Mail and Internet Surveys: The Tailored Design Method, 2nd edn. New York NY: Wiley.

Dollinger, Stefan. 2006. Oh Canada! Towards the Corpus of Early Ontario English. In The Changing Face of Corpus Linguistics [Language and Computers 55], Antoinette Renouf & Andrew Kehoe (eds), 7–25. Amsterdam: Rodopi.

Dollinger, Stefan. 2008a. New-Dialect Formation in Canada: Evidence from the English Modal Auxiliaries [Studies in Language Companion Series 97]. Amsterdam: John Benjamins. doi: 10.1075/slcs.97

Dollinger, Stefan. 2008b. V/N+ing + N Compounds in North American English: On the trail of the S-curve? Paper presented at ISLE-1, First Conference of the International Society for the Linguistics of English, Freiburg, Germany, 8 October 2008.

Dollinger, Stefan. 2008c. Taking permissible shortcuts? Limited evidence, heuristic reasoning and the modal auxiliaries in early Canadian English. In Studies in the History of the English Language, IV: Empirical and Analytical Advances in the Study of English Language Change, Susan Fitzmaurice & Donka Minkova (eds), 357–385. Berlin: Mouton de Gruyter. doi: 10.1515/9783110211801.357

Dollinger, Stefan. 2010. Written sources of Canadian English: Phonetic reconstruction and the low-back vowel merger. In Varieties of English in Writing: The Written Word as Linguistic Evidence [Varieties of English around the World G41], Raymond Hickey (ed.), 197–222. Amsterdam: John Benjamins. doi: 10.1075/veaw.g41.10dol

Dollinger, Stefan. 2011a. Academic and public attitudes to the notion of ‘standard’ Canadian English. English Today 27(4): 3–9. doi: 10.1017/S0266078411000472

Dollinger, Stefan. 2011b. Lexicology Project Manual. Version 1.3 for ENGL 320. Vancouver BC: University of British Columbia.

Dollinger, Stefan. 2011c. Canadian English: ‘Can-eh-dian,’ or, the ‘continuous short-a system’. Review of Boberg, Charles. 2010. The English Language in Canada: Status, History and Comparative Analysis. Cambridge: CUP. American Speech 86(4): 480–489. doi: 10.1215/00031283-1587268

Dollinger, Stefan. 2012a. The western Canada-U.S. border as a linguistic boundary: The roles of L1 and L2 speakers. World Englishes 31(4): 519–533. doi: 10.1111/j.1467-971X.2012.01778.x

Dollinger, Stefan. 2012b. The written questionnaire as a sociolinguistic data gathering tool: Testing its validity. Journal of English Linguistics 40(1): 74–110. doi: 10.1177/0075424211414808

Dollinger, Stefan. 2015. Regional labelling and English (historical) dictionaries: Two methodological suggestions from DCHP-2. Paper presented at the Fourth International Symposium on Approaches to English Historical Lexicography and Lexicology (Ox-Lex), Pembroke College, Oxford University, UK, 25 March. [19 Oct. 2015].

Dollinger, Stefan. In press, a. National dictionaries and cultural identity: Insights from Austrian German and Canadian English. In The Handbook of Lexicography, Philip Durkin (ed.). Oxford: OUP.

Dollinger, Stefan. In press, b. Towards a pluricentric cross-border approach in English linguistics: the case of take up #9. World Englishes.

Dollinger, Stefan (editor-in-chief). Forthcoming. DCHP-2: The Dictionary of Canadianisms on Historical Principles, 2nd edn. With the assistance of Margery Fee. Vancouver, BC & Gothenburg, Sweden.

Dollinger, Stefan, Brinton, Laurel & Fee, Margery (eds). 2006–. Bank of Canadian English. Online resource.

Dollinger, Stefan, Brinton, Laurel & Fee, Margery (eds). 2013. DCHP-1 Online: A Dictionary of Canadianisms on Historical Principles, 1st edn. Based on Walter S. Avis, et al. 1967.

Dollinger, Stefan & Clarke, Sandra (eds). 2012. Special issue on Autonomy and Homogeneity in Canadian English. World Englishes 31(4): 449–548. doi: 10.1111/j.1467-971X.2012.01773.x

Dörnyei, Zoltán. 2003. Questionnaires in Second Language Research: Construction, Administration, and Processing. Mahwah NJ: Lawrence Erlbaum Associates.

Dörnyei, Zoltán, Csizér, Kata & Németh, Nóra. 2006. Motivation, Language Attitudes and Globalisation: A Hungarian Perspective. Clevedon: Multilingual Matters.

Easson, Gordon. 2000. Cross-border effects of education on ‘correct’ speech. Toronto Working Papers in Linguistics 18: 11–20.

Eckert, Penelope. 1989. The whole woman: Sex and gender differences in variation. Language Variation and Change 1: 245–267. doi: 10.1017/S095439450000017X

Eckert, Penelope. 2000. Linguistic Variation as Social Practice. Oxford: Blackwell.

Eckert, Penelope. 2008. Variation and the indexical field. Journal of Sociolinguistics 12: 453–476.

Eckert, Penelope. 2011. Language and power in the preadolescent heterosexual market. American Speech 86(1): 85–97. doi: 10.1215/00031283-1277528

Eckert, Penelope. 2012. Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual Review of Anthropology 41: 87–100. doi: 10.1146/annurev-anthro-092611-145828

Ellegård, Alvar. 1953. The Auxiliary ‘do’: The Establishment and Regularization of its Use in English. Stockholm: Almqvist & Wiksell.

Ellemers, Naomi. 2010. Social identity theory. In Encyclopedia of Group Processes and Intergroup Relations, John M. Levine & Michael A. Hogg (eds), 798–802. Thousand Oaks CA: Sage.

Ellis, Alexander J. 1869–89. On Early English Pronunciation. Parts I–V. London: Trübner.

Elspaß, Stephan. 2005. Zum Wandel im Gebrauch regionalsprachlicher Lexik: Ergebnisse einer Neuerhebung. Zeitschrift für Dialektologie und Linguistik 72: 1–51.

Elspaß, Stephan & Möller, Robert. 2003–. Atlas zur deutschen Alltagssprache. Salzburg & Liège. [2 May 2015].

Emeneau, Murray B. 1935. The dialect of Lunenburg, Nova Scotia. Language 11: 140–147.

Extra, Guus & Yagmur, Kutlay. 2004. Urban Multilingualism in Europe. Clevedon: Multilingual Matters.

Fairclough, Norman. 2006. Language and Globalization. London: Routledge.

Fee, Margery & McAlpine, Janice. 2007. Guide to Canadian English Usage, 2nd edn. Don Mills ON: OUP.

Fillmore, Charles. 1992. ‘Corpus linguistics’ or ‘Computer-aided armchair linguistics’. In Directions in Corpus Linguistics, Jan Svartvik (ed.), 35–60. Berlin: Mouton de Gruyter.

Francis, W. Nelson. 1983. Dialectology: An Introduction. London: Longman.

Fries, Charles C. 1925. The periphrastic future with shall and will in Modern English. PMLA 40: 963–1024. doi: 10.2307/457534

Fuller, Janet M. 2005. The uses and meanings of the female title Ms. American Speech 80: 180–206. doi: 10.1215/00031283-80-2-180

Gage Canadian Dictionary. See de Wolf, et al. 1997.

Galinsky, Hans. 1952. Die Sprache des Amerikaners, Vol. 2: Wortschatz und Wortbildung. Heidelberg: Kerle.

Garner, Bryan A. 2003. Garner’s Modern American Usage. Oxford: OUP.

Geikie, Rev. A. Constable. 2010 [1857]. Canadian English. In Canadian English: A Linguistic Reader, Elaine Gold & Janice McAlpine (eds), 44–54. Kingston ON: Strathy Language Unit.

Giles, Howard & Billings, Andrew. 2004. Assessing language attitudes: speaker evaluation studies. In The Handbook of Applied Linguistics, Alan Davies & Catherine Elder (eds), 187–209. Malden MA: Blackwell. doi: 10.1002/9780470757000.ch7

Gilliéron, Jules & Edmont, Edmond. 1902–1910. Atlas linguistique de la France, 9 Vols. Paris: Champion.

Gillham, Bill. 2007. Developing a Questionnaire, 2nd edn. London: Continuum.

Glaser, Elvira. 2000. Erhebungsmethoden dialektaler Syntax. In Dialektologie zwischen Tradition und Neuansätzen: Beiträge der internationalen Dialektologentagung, Göttingen, 19.-21. Oktober 1998, Dieter Stellmacher (ed.), 258–76. Stuttgart: Steiner.

Glaser, Elvira. 2008. Syntaktische Raumbilder. In Dialektgeographie der Zukunft: Akten des 2. Kongresses der Internationalen Gesellschaft für Dialektologie des Deutschen (IDGG), Franz Patocka & Peter Ernst (eds), 85–111. Stuttgart: Steiner.

Gold, David L. 1969. Frying pan versus frypan: A trend in English compounds? American Speech 44: 299–302. doi: 10.2307/454688

Gold, Elaine. 2008. Canadian Eh? From Eh to Zed. Anglistik 19(2): 141–156.

Gordon, Elizabeth & Lewis, Gillian. 1998. New-dialect formation and Southern Hemisphere English: The New Zealand short front vowels. Journal of Sociolinguistics 2(1): 35–51. doi: 10.1111/1467-9481.00029

Görlach, Manfred. 1994. Garbage in, rubbish out, or, how far can methods of traditional dialectology be applied to a world language? In Proceedings of the International Congress of Dialectologists, Bamberg 29.7.-4.8.1990, Vol. III, Wolfgang Viereck (ed.), 258–268. Stuttgart: Steiner.

Görlach, Manfred. 1995. Heteronomy in International English. In More Englishes: New Studies in the Varieties of English 1988–1994 [Varieties of English around the World G13], Manfred Görlach, 93–123. Amsterdam: John Benjamins. doi: 10.1075/veaw.g13.05het

Graddol, David. 2007. English Next: Why Global English May Mean the End of English as a Foreign Language. British Council. [11 November 2013].

Greenbaum, Sidney & Quirk, Randolph. 1970. Elicitation Experiments in English: Linguistic Studies in Use and Attitude. Coral Gables FL: University of Miami Press.

Gregg, Robert J. 1957. Notes on the pronunciation of Canadian English as spoken in Vancouver BC. Journal of the Canadian Linguistic Association 3: 20–26.

Gregg, Robert J. 1973. The linguistic survey of British Columbia: The Kootenay region. In Canadian Languages in Their Social Context, Regna Darnell (ed.), 105–16. Edmonton: Linguistic Research.

Gregg, Robert J. 1995. The survival of local lexical items as specific markers in Vancouver English. Journal of English Linguistics 23(1-2): 184–194. doi: 10.1177/007542429002300115

Gregg, Robert J. 2004 [1984]. The Survey of Vancouver English. A Sociolinguistic Study of Urban Canadian English [Strathy Language Unit Occasional Papers 5], Gaelan Dodds de Wolf, Margery Fee & Janice McAlpine (eds). Kingston: Queen’s University.

Gries, Stefan T. 2004. HCFA 3.2. A program for R.

Gries, Stefan T. 2009a. Quantitative Corpus Linguistics with R: A Practical Introduction. New York NY: Routledge. doi: 10.1515/9783110216042

Gries, Stefan T. 2009b. Statistics for Linguistics with R: A Practical Introduction. Berlin: Mouton de Gruyter. doi: 10.1515/9783110216042

Grieve, Jack, Asnaghi, Constanza & Ruette, Tom. 2013. Site-restricted web searches for data collection in regional dialectology. American Speech 88: 413–440. doi: 10.1215/00031283-2691424

Groom, Nicholas & Littlemore, Jeannette. 2011. Doing Applied Linguistics: A Guide for Students. Abingdon: Routledge.

Gulden [Halford], Brigitte K. 1979. Attitudinal factors in Canadian English usage. MA thesis, University of Victoria.

Haglund, C. 2010. Transnational identifications among adolescents in suburban Sweden. In Quist & Svendsen (eds), 96–110.

Hamilton, Donald E. 1958. Notes on Montreal English. Journal of the Canadian Linguistic Association 4(1, Spring): 70–79. Reprinted in Chambers (ed.) 1975, 46–54.

Hamilton, Donald E. 1964. Standard Canadian English: Pronunciation. In Proceedings of the Ninth International Congress of Linguists. Cambridge, Mass., August 27–31, 1962, Horace G. Lunt (ed.), 456–459. The Hague: Mouton.

Hansen, Gert Foget & Pharao, Nicolai. 2010. Prosody in the Copenhagen multiethnolect. In Quist & Svendsen (eds), 79–95.

Harnisch, Ruediger. 1992. Johann Andreas Schmeller zwischen universeller Lauttheorie und empirischer Dialektlautkunde. Historiographia Linguistica 19(2-3): 275–300. doi: 10.1075/hl.19.2-3.04har

Harris, Barbara P. 1983. Handsaw or harlot? Some problem etymologies in the lexicon of Chinook Jargon. Canadian Journal of Linguistics 28(1): 25–32.

Harris, Robin S. & Harris, Terry G. (eds). 1994. The Eldon House Diaries: Five Women’s Views of the 19th Century. Toronto: The Champlain Society.

Hempl, George. 1896a. American speech-maps. Dialect Notes 1(VII): 315–318.

Hempl, George. 1896b. Grease and greasy. Dialect Notes 1(IX): 438–444.

Hernández-Campoy, Juan Manuel & Conde-Silvestre, Juan Camilo (eds). 2012. The Handbook of Historical Sociolinguistics. Malden MA: Wiley-Blackwell. doi: 10.1002/9781118257227

Hickey, Raymond. 2003a. How do dialects get the features they have? On the process of new dialect formation. In Hickey (2003b), 213–329. doi: 10.1017/CBO9780511486937.014

Hickey, Raymond (ed.). 2003b. Motives for Language Change. Cambridge: CUP. doi: 10.1017/CBO9780511486937

Hinrichs, Lars. 2014. Diasporic mixing of World Englishes: The case of Jamaican Creole in Toronto. In The Variability of Current World Englishes, Eugene Green & Charles Meyer (eds), 169–194. Berlin: De Gruyter.

Hiorta, Tomoharu. 2014. The split negative infinitive on the move: A study based on BC linguistic survey 2014. M.A. term paper. ENGL 489, University of British Columbia.

Hoffman, Michol F. 2010. The role of social factors in the Canadian Vowel Shift: Evidence from Toronto. American Speech 85(2): 121–140. doi: 10.1215/00031283-2010-007

Hoffman, Michol F. & Walker, James A. 2010. Ethnolects and the city: Ethnic orientation and linguistic variation in Toronto English. Language Variation and Change 22: 37–67. doi: 10.1017/S0954394509990238

Hoffmann, Thomas. 2006. Corpora and introspection as corroborating evidence: The case of preposition placement in the English relative clause. Corpus Linguistics and Linguistic Theory 2(2): 165–195. doi: 10.1515/CLLT.2006.009

Holm, Kurt (ed.). 1998. Die Befragung (The Opinion-Gathering Process), 6 Vols, 4th edn. Stuttgart: UTB Taschenbuch.

Holmes, Janet & Hazen, Kirk (eds). 2013. Research Methods in Sociolinguistics: A Practical Guide. Somerset NJ: Wiley Blackwell.

Huddleston, Rodney & Pullum, Geoffrey. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.

Hufeisen, Britta & Marx, Nicole (eds). 2007. EuroComGerm – Die Sieben Siebe. Germanische Sprachen lesen lernen. Aachen: Shaker.

Hung, Marietta, Davison, John & Chambers, Jack K. 1993. Comparative sociolinguistics of (aw)-fronting. In Clarke (ed.), 247–67. doi: 10.1075/veaw.g11.12hun

Hüttner, Julia & Kidd, Sophie. 2000. Reconstructing or demolishing the “Sprechpraktikum” – A reply to Daniel Spichtinger From anglocentrism to TEIL: reflections on our English programme. VIEWS 9(2): 75–78.

Ireland, Robert J. 1979. Canadian Spelling. An Empirical and Historical Survey of Selected Words. PhD dissertation, York University, Ontario.

Jaberg, Karl & Jud, Jakob. 1928–1940. Sprach- und Sachatlas Italiens und der Südschweiz, Vols 1–8. Bern: Zofingen.

Jenkins, Jennifer. 2000. The Phonology of English as an International Language: New Models, New Norms, New Goals. Oxford: OUP.

Jenkins, Jennifer. 2007. English as a Lingua Franca: Attitude and Identity. Oxford: OUP.

Jenkins, Jennifer. 2009. World Englishes: A Students’ Guide, 2nd edn. New York NY: Routledge.

Jenkins, Jennifer. 2015. Global Englishes: A Resource Book for Students, 3rd edn. New York NY: Routledge.

Johannessen, Janne Bondi, Vangsnes, Øystein A., Laake, Signe, Lindstad, Arne Martinus & Åfarli, Tor A. 2010. The Nordic Dialect Corpus and Database: Methodological challenges in collecting data. In Proceedings of Methods XIII: Papers from the Thirteenth International Conference on Methods in Dialectology, 2008, Barry Heselwood & Clive Upton (eds), 113–122. Frankfurt: Peter Lang.

Johnson, Daniel Ezra. 2014. Rbrul. Version 2.24. (14 Aug. 2014).

Johnson, Ellen. 1996. Lexical Change and Variation in the Southeastern United States 1930–1990. Tuscaloosa AL: University of Alabama Press.

Johnson, Keith. 2008. Quantitative Methods in Linguistics. Malden MA: Blackwell.

Johnstone, Barbara & Kiesling, Scott F. 2008. Indexicality and experience: Exploring the meanings of /aw/-monophthongization in Pittsburgh. Journal of Sociolinguistics 12: 5–33. doi: 10.1111/j.1467-9841.2008.00351.x

Kachru, Braj (ed.). 1983. The Other Tongue: English Across Cultures. Oxford: Pergamon.

Kachru, Braj. 1985. Standards, codification and sociolinguistic realism: The English language in the Outer Circle. In English in the World: Teaching and Learning of Language and Literature, Randolph Quirk & Henry G. Widdowson (eds), 11–36. Cambridge: CUP.

Kennedy, Janice. 2010. The iPhone’s assault on the English language. Vancouver Sun. 13 Apr. 2010, A15.

Kerswill, Paul. 2002. Koineization and accommodation. In The Handbook of Language Variation and Change, Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds), 669–702. Malden MA: Blackwell.

Kerswill, Paul, Llamas, Carmen & Upton, Clive. 1999. The First SuRE [sic] Moves: Early steps towards a large dialect project. Leeds Studies in English n.s. 30: 257–269.

Kerswill, Paul & Williams, Ann. 2000. Creating a New Town Koine: children and language change in Milton Keynes. Language in Society 29(1): 65–115.

Kirwin, William. 2012. The background of dialect questionnaires in English Department Research: an internal report. Regional Language Studies… Newfoundland 23: 18–24.

Kloeke, G. 1952. How can we co-ordinate the Linguistic Cartography of the World? Orbis 1: 130–134.

Kortmann, Bernd & Lunkenheimer, Kerstin (eds). 2011. The Electronic World Atlas of Varieties of English [eWAVE]. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Kortmann, Bernd & Wagner, Susanne. 2005. The Freiburg English Dialect Project and Corpus. In A Comparative Grammar of British English Dialects: Agreement, Gender, Relative Clauses, Bernd Kortmann, Tanja Hermann, Lukas Pietsch & Susanne Wagner, 1–20. Berlin: De Gruyter. doi: 10.1515/9783110197518.1

Kortmann, Bernd, Burridge, Kate, Mesthrie, Rajend, Schneider, Edgar W. & Upton, Clive (eds). 2004. A Handbook of Varieties of English, Vol. II: Morphology and Syntax. Berlin: Mouton de Gruyter. doi: 10.1515/9783110197181

Kotsinas, Ulla-Britt. 1998. Language contact in Rinkeby, an immigrant suburb. In Jugendsprache, Langue de jeunes, Youth Language, Jannis K. Androutsopoulos & Arno Scholz (eds), 125–148. Frankfurt/Main: Lang.

Kretzschmar, William A. Jr. 2009. The Linguistics of Speech. Cambridge: CUP. doi: 10.1017/CBO9780511576782

Kretzschmar, William A., Jr., Childs, Becky, Anderson, Bridget & Lanehart, Sonja. 2007. The relevance of community language studies to HEL: The view from Roswell. In Managing Chaos: Strategies for Identifying Change in English [Studies in the History of the English Language 3], Christopher Cain & Geoffrey Russom (eds), 173–186. Berlin: Mouton de Gruyter.

Krug, Manfred & Sell, Katrin. 2013. Designing and conducting interviews and questionnaires. In Research Methods in Language Variation and Change, Manfred Krug & Julia Schlüter (eds), 69–98. Cambridge: CUP.

Krug, Manfred, Schlüter, Julia & Rosenbach, Annette. 2013. Introduction: Investigating language variation and change. In Research Methods in Language Variation and Change, Manfred Krug & Julia Schlüter (eds), 1–13. Cambridge: CUP.

Krug, Manfred, Hilbert, Michaela & Fabri, Ray. Forthcoming. Maltese English morphosyntax: corpus-based and questionnaire-based studies. Il-Lingwa Taghna. Special issue Towards a Description of Maltese English, Alexandra Vella & Ray Fabri (eds).

Kruijsen, Joep & van der Sijs, Nicoline. 2010. Mapping Dutch and Flemish. In Language and Space: An International Handbook of Linguistic Variation [Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science 30.2], Vol. 2, Peter Auer & Jürgen Erich Schmidt (eds), 180–202. Berlin: De Gruyter.

Kurath, Hans. 1949. A Word Geography of the Eastern United States. Ann Arbor MI: University of Michigan Press.

Kurath, Hans. 1958. Review of Deutscher Wortatlas by Walther Mitzka & Ludwig Erich Schmitt. Language 34(3): 428–434. doi: 10.2307/410937

Kurath, Hans. 1986 [1972]. The sociocultural background of dialect areas in American English. In Allen & Linn (eds), 98–116.

Kurath, Hans & McDavid, Raven I. 1961. Pronunciation of English in the Atlantic States. Ann Arbor MI: University of Michigan Press.

Kurath, Hans, et al. 1972 [1939–43]. Linguistic Atlas of New England, 3 Vols. New York NY: AMS Press.

Kurath, Hans, Hansen, Marcus, Bloch, Julia & Bloch, Bernard. 1939. Handbook of the Linguistic Geography of New England. Providence RI: Brown University Press.

Labov, William. 1963. The social motivation of a sound change. Word 18: 1–42.

Labov, William. 1972. Sociolinguistic Patterns. Philadelphia PA: University of Pennsylvania Press.

Labov, William. 1994. Principles of Linguistic Change, Vol. 1: Internal Factors. Oxford: Blackwell.

Labov, William. 2006 [1966]. The Social Stratification of English in New York City, 2nd edn. Cambridge: CUP.

Labov, William, Ash, Sharon & Boberg, Charles. 2006. The Atlas of North American English. Phonetics, Phonology and Sound Change. Berlin: Mouton de Gruyter.

Lambert, Wallace E. 1967. The social psychology of bilingualism. Journal of Social Issues 23: 91–109. doi: 10.1111/j.1540-4560.1967.tb00578.x

Lambert, Wallace E., Hodgson, Richard C., Gardner, Robert C. & Fillenbaum, Samuel. 1960. Evaluational reactions to spoken languages. Journal of Abnormal and Social Psychology 60(1): 44–51. doi: 10.1037/h0044430

Lameli, Alfred. 2010. Linguistic atlases – traditional and modern. In Auer & Schmidt (eds), Vol. I: 567–92. doi: 10.1515/9783110219166

Leitner, Gerhard. 1992. English as a pluricentric language. In Pluricentric Languages. Differing Norms in Different Nations, Michael Clyne (ed.), 179–237. Berlin: Mouton de Gruyter.

Lighthall, W. Douw. 1889. Canadian English. The Week (Toronto), 16 August 1889, 581–583.

Lillian, Donna L. 1995. Ms. Revisited: She’s Still a Bitch, Only Now She’s Older! Papers of the Annual Meeting of the Atlantic Provinces Linguistic Association 19: 149–61.

Lim, Lisa. 2011. Revisiting English prosody: (some) New Englishes as tone languages? In The Typology of Asian Englishes [Benjamins Current Topics 33], Lisa Lim & Nikolas Gisborne (eds), 97–118. Amsterdam: John Benjamins. doi: 10.1075/bct.33.06lim

LimeSurvey Project Team & Schmitz, Carsten. 2012. LimeSurvey: An Open Source survey tool. LimeSurvey Project, Hamburg, Germany.

References 387

Lindquist, Hans. 2009. Corpus Linguistics and the Description of English. Edinburgh: EUP.

Lindstad, Arne Martinus, Nøklestad, Anders, Johannessen, Janne Bondi & Vangsnes, Øystein A. 2009. The Nordic Dialect Database: Mapping microsyntactic variation in the Scandinavian languages. In NODALIDA 2009 Conference Proceedings, Kristiina Jokinen & Eckhard Bick (eds), 283–86.

Luick, Karl. 1964 [1914–40]. Historische Grammatik der englischen Sprache, Friedrich Wild & Herbert Koziol (eds), 2 Vols. Oxford: Blackwell.

Macaulay, Ronald K. S. 1977. Review of The Linguistic Atlas of Scotland: Scots Section Vol. 1 (1975) by James Y. Mather, H. H. Speitel, G. W. Leslie & I. E. Mather. Language 53(1): 224–228. doi: 10.2307/413070

Macaulay, Ronald K. S. 1979. Review of The Linguistic Atlas of Scotland: Scots Section Vol. 2 (1977) by James Y. Mather, H. H. Speitel, G. W. Leslie & I. E. Mather. Language 55(1): 224–228. doi: 10.2307/412528

Maguire, Warren & McMahon, April (eds). 2011. Analysing Variation in English. Cambridge: CUP. doi: 10.1017/CBO9780511976360

Mair, Christian. 2006. Tracing ongoing grammatical change and recent diversification in present-day standard English: The complementary role of small and large corpora. In The Changing Face of Corpus Linguistics, Antoinette Renouf & Andrew Kehoe (eds), 355–376. Amsterdam: Rodopi.

Mallinson, Christine, Childs, Becky & Van Herk, Gerard (eds). 2013. Data Collection in Sociolinguistics: Methods and Applications. New York NY: Routledge.

Markus, Manfred. 2007. Maltese English in its multicultural setting. In Tracing English Through Time: Explorations in Language Variation in Honour of Herbert Schendl on the Occasion of his 65th Birthday, Ute Smit, Stefan Dollinger, Julia Hüttner, Gunther Kaltenböck & Ursula Lutzky (eds), 203–18. Vienna: Braumüller.

Mather, James Y. & Speitel, H. H. 1975, 1977. The Linguistic Atlas of Scotland, Vol. 1–2. London: Croom Helm.

Mathews, Mitford (ed.). 1951. Dictionary of Americanisms on Historical Principles. Chicago IL: University of Chicago Press.

McConnell, Ruth E. 1979. Our Own Voice. Canadian English and How it is Studied. Toronto: Gage.

McDavid, Raven I. 1940. Low-back vowels in the South Carolina Piedmont. American Speech 15: 144–148. doi: 10.2307/486820

McDavid, Raven I. 1953a. Review of A Questionnaire for a Linguistic Atlas of England by Eugen Dieth & Harold Orton. The Journal of English and Germanic Philology 52(4): 563–568.

McDavid, Raven I. 1953b. Review of Linguistic Survey of Scotland, First Questionnaire, by Angus McIntosh, Hans J. Uldall & Kenneth Jackson. Edinburgh, 1951. Journal of English and Germanic Philology 52(4): 568–570.

McDavid, Raven I. 1986 [1980]. Linguistic geography. In Allen & Linn (eds), 117–122.

McIntosh, Angus. 1961. An Introduction to a Survey of Scottish Dialects. Edinburgh: Thomas Nelson.

McIntosh, Angus, Uldall, Hans J. & Jackson, Kenneth. 1951. Linguistic Survey of Scotland: First Questionnaire. Edinburgh: University of Edinburgh.

McKinnie, Meghan & Dailey-O’Cain, Jennifer. 2002. A perceptual dialectology of Anglophone Canada from the perspective of young Albertans and Ontarians. In Preston & Long (eds), Vol. 2: 277–94.

Mesthrie, Rajend & Bhatt, Rakesh M. 2008. World Englishes. Cambridge: CUP.

Meyerhoff, Miriam. 2011. Introducing Sociolinguistics, 2nd edn. Abingdon: Routledge.

Miller, Corey. 1989. The United States-Canadian border as a linguistic boundary: the English language in Calais, Maine and St. Stephen, New Brunswick. B.A. Essay. Department of Linguistics, Harvard University.

Milroy, Lesley. 1987. Language and Social Networks [Language in Society 2], 2nd edn. Oxford: Blackwell.

Mitzka, Walther. 1938. Der deutsche Wortatlas. Zeitschrift für Mundartforschung 14: 40–55.

Mitzka, Walther. 1939. Der Fragebogen zum Deutschen Wortatlas. Zeitschrift für Mundartforschung 15: 105–111.

Mitzka, Walther. 1952. Handbuch zum Deutschen Sprachatlas. Gießen: W. Schmitz & Elwertsche Universitätsbuchhandlung Marburg.

Mitzka, Walther & Schmitt, Ludwig Erich. 1951–80. Deutscher Wortatlas, 22 Vols. Giessen: W. Schmitz.

Moser, Claus A. & Kalton, Graham. 1971. Survey Methods in Social Investigation. London: Heinemann.

Muhr, Rudolf. 1989. Deutsch und Österreich(isch): Gespaltene Sprache – Gespaltenes Bewusstsein – Gespaltene Identität. ide (Informationen zur Deutschdidaktik) (Klagenfurt) 2 (13th year): 74–98.

Nagy, Naomi, Chociej, Joanna & Hoffman, Michol F. 2014. Analyzing ethnic orientation in the quantitative sociolinguistic paradigm. Language and Communication 35: 9–26 [Special issue: New Perspectives on the Concept of Ethnic Identity in North America, Lauren Hall-Lew & Malcah Yaeger-Dror (eds)].

Nelson, Francis W. 1983. Dialectology: An Introduction. London: Longman.

Noseworthy, Ronald. 1974. Fishing supplement – Newfoundland Dialect Questionnaire. Regional Language Studies… Newfoundland 5: 18–21.

Nylvek, Judith A. 1984. A Regional and Sociolinguistic Survey of Saskatchewan English. MA dissertation, University of Victoria, BC.

Nylvek, Judith A. 1992. Is Canadian English in Saskatchewan becoming more American? American Speech 67(3): 268–278. doi: 10.2307/455564

Nylvek, Judith A. 1993a. Canadian English in Saskatchewan. A Sociolinguistic Survey of Four Selected Regions. PhD dissertation, University of Victoria, BC.

Nylvek, Judith A. 1993b. A sociolinguistic analysis of Canadian English in Saskatchewan: A look at urban versus rural speakers. In Clarke (ed.), 201–228. doi: 10.1075/veaw.g11.10nyl

OED = OED-3 = Oxford English Dictionary. 2000–. 3rd edn. Ed. by Michael Proffitt & John Simpson. Oxford: Oxford University Press.

Orkin, Mark M. 1970 [1971]. Speaking Canadian English. An informal account of the English language in Canada. Reprint. New York NY: McKay Company.

Orton, Harold, et al. 1962–71. Survey of English Dialects: Basic Materials, 13 Vols. Leeds: E. J. Arnold & Son.

Orton, Harold, Sanderson, Stewart & Widdowson, John. 1978. The Linguistic Atlas of England. London: Croom Helm.

Owens, Thompson W. & Baker, Paul M. 1984. Linguistic insecurity in Winnipeg: Validation of a Canadian index of linguistic insecurity. Language in Society 13: 337–350. doi: 10.1017/S0047404500010538

Patton, Mildred L. 2001. Questionnaire Research: A Practical Guide. 2nd edn. Los Angeles: Pyrczak Publishing.

Pederson, Lee, McDaniel, Susan L. & Adams, Carol M. (eds). 1986–93. Linguistic Atlas of the Gulf States, 7 Vols. Athens GA: University of Georgia Press.

Pennycook, Alastair. 2007. Global Englishes and Transcultural Flows. London: Routledge.

Pennycook, Alastair. 2010. Language as Local Practice. London: Routledge.

Pepys, Samuel. n.d. The Diary of Samuel Pepys, H. B. Wheatley (ed.). doi: 10.2307/3713771

Perusse, Bernard. 2010. Yanovsky’s voice remains her own. Montreal Gazette, 22 Apr. 2010, C1.

Pi, Chia-Yi Tony. 2000. Canadians telling time: A study in dialect topography. Toronto Working Papers in Linguistics 18: 80–102.

Pitzl, Marie-Louise, Breiteneder, Angelica & Klimpfinger, Theresa. 2008. A world of words: processes of lexical innovation in VOICE. Vienna English Working Papers 17(2): 21–46.

Pitzl, Marie-Louise. 2009. “We should not wake up any dogs”: Idiom and metaphor in ELF. In English as a Lingua Franca: Studies and Findings, Anna Mauranen & Elina Ranta (eds), 298–322. Newcastle upon Tyne: Cambridge Scholars.

Pitzl, Marie-Louise. 2012. Creativity meets convention: Idiom variation and re-metaphorization in ELF. Journal of English as a Lingua Franca 1(1): 27–55. doi: 10.1515/jelf-2012-0003

Plag, Ingo. 2003. Word-formation in English. Cambridge: CUP. doi: 10.1017/CBO9780511841323

Platt, John Talbot, Weber, Heidi & Ho, Mian Lian. 1984. The New Englishes. London: Routledge & Kegan Paul.

Polson, James. 1969. A Linguistic Questionnaire for British Columbia: A Plan for a Postal Survey of Dialectal Variation in B.C., with an Account of Recent Research. MA thesis, University of British Columbia.

Pop, Sever. 1950. La dialectologie: aperçu historique et méthodes d’enquêtes linguistiques, 2 Vols. Louvain: Université de Louvain.

Poplack, Shana. 1985. Contrasting patterns of code-switching in two communities. In Methods/Méthodes V. 1984. Papers from the Fifth International Conference on Methods in Dialectology, H. J. Warkentyne (ed.), 363–385. Victoria, B.C.: University of Victoria.

Pratt, T. K. 1983. A case for direct questioning in traditional fieldwork. American Speech 58(2): 150–155. doi: 10.2307/455325

Preston, Dennis R. 1989. Perceptual Dialectology. Nonlinguists’ Views of Areal Linguistics. Dordrecht: Foris. doi: 10.1515/9783110871913

Preston, Dennis R. 2002. Language with an attitude. In The Handbook of Language Variation and Change, Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes (eds), 40–66. Malden MA: Blackwell.

Preston, Dennis R. 2005. The big sibling to the north: US views of Canadian English. Talk presented at University of Alberta, no date.

Preston, Dennis. 2006. Response to D. Deterding. 2006. Review of Nancy A. Niedzielski & Dennis R. Preston. 2000. Folk Linguistics. Berlin: Mouton de Gruyter. International Journal of Applied Linguistics 16(1): 113–15.

Preston, Dennis R. & Long, Daniel (eds). 1999–2002. Handbook of Perceptual Dialectology, Vols 1–2. Amsterdam: John Benjamins. doi: 10.1017/s0047404505210060

Priestley, Francis Ethelbert Louis. 1968 [1951]. Canadian English. In British and American English since 1900. With contributions on English in Canada, South Africa, Australia, New Zealand and India, Eric Partridge & John W. Clark (eds), 72–84. New York NY: Greenwood Press.

Pringle, Ian. 1983. The concept of dialect and the study of Canadian English. Queen’s Quarterly 90(1): 100–121.

Pringle, Ian. 1985. Attitudes to Canadian English. In The English Language Today, Sidney Greenbaum (ed.), 183–205. Oxford: Pergamon.

Pringle, Ian & Padolsky, Enoch. 1983. The linguistic survey of the Ottawa Valley. American Speech 58(4): 327–344. doi: 10.2307/455147

Prodromou, Luke. 2007. Bumping into creative idiomaticity. English Today 23(1): 14–25. doi: 10.1017/S0266078407001046

Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey & Svartvik, Jan. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Quist, Pia & Svendsen, Bente A. (eds). 2010. Multilingual Urban Scandinavia: New Linguistic Perspectives. Bristol: Multilingual Matters.

Rampton, Ben, Blommaert, Jan, Arnaut, Karel & Spotti, Massimiliano. 2015. Superdiversity and sociolinguistics. Working Papers in Urban Language & Literacies 152: 1–13.

Rau, D. Victoria. 2013. Cross-cultural issues in studying endangered languages. In Mallinson, Childs & Van Herk (eds), 101–104.

Reiffenstein, Ingo. 1981. Johann Andreas Schmeller und die heutige Dialektforschung. Zeitschrift für Dialektologie und Linguistik 48(3): 289–298.

Robinson, Jonnie. 2015. Ratching through kintle for bobby-dazzlers: Initial reflections on the British Library’s Evolving English ‘WordBank’. Paper presented at Ox-Lex, 4th International Symposium on Approaches to English Historical Lexicography and Lexicology, Oxford University, 25 March. [1 May 2015].

Rodman, Lilita. 1974. Characteristics of B.C. English. The English Quarterly 7(4): 49–82.

Rohdenburg, Günther & Schlüter, Julia (eds). 2009. One Language, Two Grammars? Differences between British and American English. Cambridge: CUP. doi: 10.1017/CBO9780511551970

Ruß, Marina. 2008. Bildliche Lautdarstellungen in Therapie, Unterricht und Lehre. Cologne: ProLog.

Sauer, Hans. 1992. Nominalkomposita im Frühmittelenglischen: Mit Ausblicken auf die Geschichte der englischen Nominalkomposition. Tübingen: Max Niemeyer. doi: 10.1515/9783110940657

de Saussure, Ferdinand. 1916 [1966]. Course in General Linguistics, Charles Bally & Albert Sechehaye (eds), Wade Baskin (transl.). New York NY: McGraw-Hill.

Scargill, Matthew H. 1974. Modern Canadian English Usage. Linguistic Change and Reconstruction. Toronto: McClelland and Stewart.

Scargill, Matthew H. & Warkentyne, Henry J. 1972. The survey of Canadian English: A report. The English Quarterly. A Publication of the Canadian Council of Teachers of English 5(3): 47–104.

Scheuringer, Hermann. 2010. Mapping the German language. In Language and Space: An International Handbook of Linguistic Variation [Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science 30.2], Vol. 2, Peter Auer & Jürgen Erich Schmidt (eds), 158–179. Berlin: De Gruyter.

Schleef, Erik. 2013. Written surveys and questionnaires. In Holmes & Hazen (eds), 42–57.

Schmied, Josef. 1991. English in Africa. London: Longman.

Schneider, Edgar W. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79: 233–281. doi: 10.1353/lan.2003.0136

Schneider, Edgar W. 2007. Postcolonial English: Varieties Around the World. Cambridge: CUP. doi: 10.1017/CBO9780511618901

Schneider, Edgar W., Burridge, Kate, Kortmann, Bernd, Mesthrie, Rajend & Upton, Clive (eds). 2004. A Handbook of Varieties of English. A Multimedia Reference Tool, Vol. I: Phonology. Berlin: Mouton de Gruyter. doi: 10.1515/9783110197181

Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago IL: University of Chicago Press.

Seidlhofer, Barbara. 2007. English as a lingua franca and communities of practice. In Anglistentag 2006 Halle: Proceedings, Sabine Volk-Birke & Julia Lippert (eds), 307–18. Trier: Wissenschaftlicher Verlag.

Seidlhofer, Barbara. 2009. Accommodation and the idiom principle in English as a lingua franca. Intercultural Pragmatics 6(2): 195–215. doi: 10.1515/IPRG.2009.011

Seidlhofer, Barbara. 2011. Understanding English as a Lingua Franca [Oxford Applied Linguistics], Kindle Locations 5915–5918. Oxford: OUP.

Seiler, Guido. 2010. Investigating language and space: Questionnaire and interview. In Language and Space: An International Handbook of Linguistic Variation, Vol. 1: Theories and Methods [Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science 30.1], Peter Auer & Jürgen Erich Schmidt (eds), 512–527. Berlin: De Gruyter.

Shuy, Roger W., Wolfram, Walter A. & Riley, William K. 1968. Field Techniques in an Urban Language Study. Washington DC: Centre for Applied Linguistics.

Silverstein, Michael. 2003. Indexical order and the dialectics of sociolinguistic life. Language and Communication 23: 193–229. doi: 10.1016/S0271-5309(03)00013-2

Simpson, Jane. 2004 [2008]. Hypocoristics in Australian English. In Varieties of English: The Pacific and Australasia, Kate Burridge & Bernd Kortmann (eds), 398–415. Berlin: Mouton de Gruyter.

Sorace, Antonella & Keller, Frank. 2005. Gradience in linguistic data. Lingua 115(11): 1497–1525. doi: 10.1016/j.lingua.2004.07.002

Spichtinger, Daniel. 2000. From anglocentrism to TEIL: reflections on our English programme. VIEWS 9(1): 69–72.

Stadler, Franz Joseph. 1819. Landessprachen der Schweiz oder Schweizerische Dialektologie; mit kritischen Sprachbemerkungen beleuchtet, nebst der Gleichniszrede von dem verlorenen Sohne in allen Schweizermundarten. Aarau: Sauerländer.

Stevenson, Roberta C. 1976. The Pronunciation of English in B.C. MA thesis, University of British Columbia.

Story, George M. 1959. A Newfoundland Dialect Questionnaire: Avalon Peninsula. Ms. Memorial University. English Language Research Centre: Memorial University.

Story, George Morley, Kirwin, William J. & Widdowson, John David Allison (eds). 1982. Online ed. 1999. Dictionary of Newfoundland English. Toronto: University of Toronto Press.

Svartvik, Jan. 1992. Corpus linguistics comes of age. In Directions in Corpus Linguistics, Proceedings of the Nobel Symposium 82, Stockholm 4–8 August 1991, Jan Svartvik (ed.), 7–13. Berlin: De Gruyter. doi: 10.1515/9783110867275

Tagliamonte, Sali A. 2006. Analysing Sociolinguistic Variation. Cambridge: CUP. doi: 10.1017/CBO9780511801624

Tagliamonte, Sali A. & D’Arcy, Alexandra. 2007. The modals of obligation/necessity in Canadian perspective. English World-Wide 28(1): 47–87.

Tagliamonte, Sali A. & D’Arcy, Alexandra. 2009. Peaks beyond phonology: adolescence, incrementation, and language change. Language 85(1): 58–108.

Tajfel, Henri & Turner, John C. 1979. An integrative theory of intergroup conflict. In The Social Psychology of Intergroup Relations, William G. Austin & Stephen Worchel (eds), 33–47. Pacific Grove CA: Brooks & Cole.

Thomas, Alan R. 1973. Linguistic Geography of Wales: A Contribution to Welsh Dialectology. Cardiff: University of Wales Press.

Tisato, Graziano G. 2009. NavigAIS: AIS Digital Atlas and Navigation Software. [Digital version of Jaberg & Jud 1928–1940]. Ver. 1.47. [7 February 2014].

Tourangeau, Roger & Plewes, Thomas J. (eds). 2014. Nonresponse in Social Science Surveys: A Research Agenda. Washington DC: The National Academies Press.

Trier, Jost. 1931. Der deutsche Wortschatz im Sinnbezirk des Verstandes: Von den Anfängen bis zum Beginn des 13. Jahrhunderts. Heidelberg: Carl Winter.

Trudgill, Peter. 1972. Sex, covert prestige and linguistic change in the urban British English of Norwich. Language in Society 1(2): 179–95. doi: 10.1017/S0047404500000488

Trudgill, Peter. 1974a. Linguistic change and diffusion: description and explanation in sociolinguistic dialect geography. Language in Society 3: 215–246. doi: 10.1017/S0047404500004358

Trudgill, Peter. 1974b. The Social Stratification of English in Norwich. Cambridge: CUP.

Trudgill, Peter. 1986. Dialects in Contact. Oxford: Blackwell.

Trudgill, Peter. 2004. New-Dialect Formation: The Inevitability of Colonial Englishes. Edinburgh: EUP.

Trudgill, Peter. 2008. Colonial dialect contact in the history of European languages: On the irrelevance of identity to new-dialect formation. Language in Society 37(2): 241–254.

Trudgill, Peter & Giles, Howard. 1978. Sociolinguistics and linguistic value judgements: Correctness, adequacy and aesthetics. In Functional Studies in Language and Literature, Frank Coppieters & Didier L. Goyvaerts (eds), 167–180. Ghent: E. Story-Scientia.

Trudgill, Peter, Gordon, Elizabeth, Lewis, Gillian & Maclagan, Margaret. 2000a. Determinism in new-dialect formation and the genesis of New Zealand English. Journal of Linguistics 36: 299–318. doi: 10.1017/S0022226700008161

Trudgill, Peter, Gordon, Elizabeth, Lewis, Gillian & Maclagan, Margaret. 2000b. The role of drift in the formation of native-speaker southern hemisphere Englishes: Some New Zealand evidence. Diachronica 17: 111–138. doi: 10.1075/dia.17.1.06tru

Trudgill, Peter & Hannah, Jean. 2002. International English. A Guide to Varieties of Standard English, 4th edn. London: Arnold.

Upton, Clive & Widdowson, John David Allison (eds). 2006 [1996]. An Atlas of English Dialects, 2nd edn. Oxford: OUP.

Van Herk, Gerard, Childs, Becky & Thorburn, Jennifer. 2010. Identity marking and affiliation in an urbanizing Newfoundland community. In Canadian English: A Linguistic Reader, Elaine Gold & Janice McAlpine (eds), 135–144. Kingston ON: Strathy Language Unit.

Vaux, Bert. 2004. Harvard Dialect Survey. [8 July 2014].

Viereck, Wolfgang. 1975. Lexikalische und Grammatische Ergebnisse des Lowman-Survey von Mittel- und Südengland, 2 Vols. Munich: Wilhelm Fink.

Viereck, Wolfgang & Ramisch, Heinrich. 1997. The Computer Developed Linguistic Atlas of England 2. Tübingen: Niemeyer.

Wald, Benji & Besserman, Lawrence. 2002. The emergence of the verb-verb compound in twentieth century English and twentieth century linguistics. In Studies in the History of English: A Millennial Perspective, Donka Minkova & Robert Stockwell (eds), 417–447. Berlin: Mouton de Gruyter. doi: 10.1515/9783110197143.3.417

Walker, James A. & Hoffman, Michol F. 2013. Fieldwork in immigrant communities. In Mallinson, et al. (eds), 80–83.

Walker, James A. & Torres Cacoullos, Rena. 2009. The present of the English future: Grammatical variation and collocations in discourse. Language 85(2): 321–354. doi: 10.1353/lan.0.0110

Wanjema, Shontael, Carmichael, Katie, Walker, Abby & Campbell-Kibler, Kathryn. 2013. The Ohiospeaks Project: Engaging undergraduates in sociolinguistic research. American Speech 88: 223–235. doi: 10.1215/00031283-2346798

Wang, William S.-Y. & Cheng, Chin-chuan. 1970. Implementation of phonological change: The Shuang-feng Chinese case. Chicago Linguistic Society 6: 552–59.

Warkentyne, Henry J. 1983. Attitudes and language behaviour. Canadian Journal of Linguistics 28: 71–6.

Warkentyne, Henry & Brett, A. C. 1993. Statistical Analysis of Variation in Canadian English. In Proceedings of the 1990 International Congress of Dialectologists, Wolfgang Viereck (ed.), 496–506. Stuttgart: Franz Steiner.

Watson, Kevin & Clark, Lynn. 2014. Exploring listeners’ real-time reactions to regional accents. Language Awareness. doi: 10.1080/09658416.2014.882346

Weinreich, Uriel, Labov, William & Herzog, Marvin. 1968. Empirical foundations for a theory of language change. In Directions in Historical Linguistics, Winfried P. Lehmann & Yakov Malkiel (eds), 95–195. Austin TX: University of Texas Press.

Wenker, Georg (ed.). 1887–1923. Sprachatlas des Deutschen Reiches: Laut- und Formenatlas. 1647 hand-drafted multicolour maps. Archived at the Forschungszentrum Deutscher Sprachatlas Marburg and at the Staatsbibliothek Preußischer Kulturbesitz Berlin.

Wenker, Georg. 2013 [1895]. Herrn Bremers Kritik des Sprachatlas. In Georg Wenker: Schriften zum Sprachatlas des Deutschen Reiches: Gesamtausgabe, Band 2, Alfred Lameli (ed.), with assistance of Johanna Heil & Constanze Wellendorf, 957–976. Hildesheim: Georg Olms.

Wick, Neil. 2003. Intra-speaker variability in dialect variation and change. Ms, LING 5240, Department of Linguistics, York University.

Widdowson, Henry G. 1994. The ownership of English. TESOL Quarterly 28(2): 377–389. doi: 10.2307/3587438

Wilson, H. Rex. 1958. The dialect of Lunenburg County, Nova Scotia. A study of the English of the county, with reference to its sources, preservation of relics, and vestiges of bilingualism. PhD Thesis, University of Michigan.

Winchester, Simon. 1998. The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary. New York NY: HarperCollins.

Winford, Don. 2003. An Introduction to Contact Linguistics. Malden MA: Wiley-Blackwell.

Wolff, H. 1959. Intelligibility and inter-ethnic attitudes. Anthropological Linguistics 1(3): 34–41.

Wolfram, Walter A. 1969. A Sociolinguistic Description of Detroit Negro Speech. Washington DC: Center for Applied Linguistics.

Woods, Howard B. 1993. A synchronic study of English spoken in Ottawa: Is Canadian English becoming more American? In Clarke (ed.), 151–178. doi: 10.1075/veaw.g11.08woo

Woods, Howard B. 1999 [1979]. The Ottawa Survey of Canadian English. [Strathy Language Unit Occasional Papers 4]. Kingston: Queen’s University.

Wrede, Ferdinand, Martin, Bernhard & Mitzka, Walther (eds). 1927–1956. Deutscher Sprachatlas. Marburg: Elwert.

Wright, Joseph (ed.). 1898–1905. The English Dialect Dictionary. London: Henry Frowde.

Wright, Joseph (ed.). 1898. The English Dialect Dictionary, Vol. I: A–C. London: Henry Frowde.

Zeller, Christine. 1990. Dialect variants from Toronto to Milwaukee. MA forum paper, Department of Linguistics, University of Toronto.

Zeller, Christine. 1993. Linguistic symmetries, asymmetries, and border effects within a Canadian/American sample. In Focus on Canada [Varieties of English around the World G11], Sandra Clarke (ed.), 179–200. Amsterdam: John Benjamins. doi: 10.1075/veaw.g11.09zel

Index

A
accommodation processes 204
age-grading 178, 179, 180
American English xv, 47, 99, 105, 149, 154, 160, 169, 199, 214, 216, 322
attitudes 3, 11, 12, 14, 62, 63, 73, 74, 75, 86, 130, 133, 144, 158, 159, 160, 161, 162, 185, 191, 206, 213, 218, 235, 236, 239, 240, 246, 252, 253, 255, 271, 294, 361, 365, 366, 373
Auer, Peter 370
avenue 50, 77, 82, 83, 84, 85, 86, 124, 130, 176, 195, 197, 240, 266
Avis 38, 39, 42, 47, 51, 77, 87, 88, 93, 94, 122, 125, 196, 216, 247

B
Baayen 69, 70, 320, 359
Bamberg Questionnaire 144, 146
Bank of Canadian English 170
Barbiers 27, 130
between you and I 115, 116, 246
Blommaert 173, 211, 370
Bloomfield, Leonard 201, 215, 321
Bloomfield, Morton 201, 215, 321
BNC xv, 57, 58
Boberg 5, 11, 20, 38, 46, 47, 74, 75, 76, 78, 87, 90, 94, 95, 99, 111, 126, 127, 131, 180, 198, 199, 200, 201, 202, 216, 218, 219, 220, 226, 227, 247, 271, 272, 276, 363, 365, 373, 374
Britain, David 28, 93, 95, 106, 130, 152, 158, 201, 206, 216, 367, 370
British English xv, 57, 58, 59, 74, 99, 146, 149, 160, 163, 169, 202, 211, 212, 324
Buchstaller 11, 49, 50, 56, 250, 251, 252, 258, 264, 265, 267, 365, 373

C
Canadian English xv, xvi, xxv, 3, 13, 14, 38, 39, 42, 43, 44, 45, 46, 47, 51, 54, 55, 62, 82, 84, 87, 88, 93, 94, 95, 98, 99, 100, 105, 106, 107, 110, 111, 113, 122, 128, 131, 132, 133, 149, 160, 169, 172, 176, 177, 178, 185, 187, 188, 192, 196, 199, 200, 201, 205, 211, 212, 214, 215, 216, 217, 218, 219, 226, 227, 239, 247, 252, 256, 263, 271, 322
Canadian Oxford Dictionary 99, 100, 103, 104
Cassidy 19, 21, 34, 35, 36, 51, 100, 125, 199, 363, 364
categorical variable 323, 324, 326, 329, 335, 340, 350, 359, 372
Chambers 6, 7, 11, 15, 35, 40, 45, 46, 48, 65, 66, 67, 68, 69, 71, 72, 78, 82, 84, 85, 87, 90, 92, 93, 95, 97, 98, 103, 105, 106, 112, 122, 124, 131, 133, 135, 152, 179, 191, 196, 197, 201, 215, 216, 220, 241, 262, 264, 275, 276, 284, 288, 291, 292, 293, 363, 365, 374
change from above 14, 175, 184, 188
change from below 14, 175, 184
checklists 30, 71, 237
chesterfield 47, 59, 87, 88, 89, 90, 92, 93, 94, 95, 96, 102, 129, 177, 197, 217, 247, 248, 268, 292
chi-square 322, 327, 328, 332, 340, 342, 345, 346, 359
Clarke, Sandra 82, 84, 85, 123, 190, 211, 214, 215, 216, 218
closed response 266
community reporting 10, 11, 12, 14, 22, 32, 49, 63, 145, 172, 235, 258, 265, 273, 361
Considine, John 207

CONTE xv, 54, 118
corpus linguistics xv, xxv, 2, 4, 5, 13, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 86, 167, 168, 176, 178, 182, 264, 272, 361, 365, 372
COUNTIF 302, 304, 307, 308, 311, 313, 316, 317
Crystal 157, 158, 162

D
data-readying 296, 349
Davis, Daniel 30, 31, 32, 33, 34, 35, 51, 66, 122, 232, 237, 362
DCHP-1 xv, 47, 93, 94
DCHP-2 xv, xxiii, 148, 149
dialect contact 102, 131, 132, 202, 373
dialect geography xxv, 6, 7, 9, 11, 12, 20, 21, 22, 23, 45, 51, 69, 117, 131, 135, 152, 153, 191, 234, 370
dialectology xxv, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 19, 20, 21, 22, 26, 27, 29, 30, 35, 36, 40, 44, 51, 53, 55, 60, 73, 75, 87, 129, 132, 153, 159, 215, 218, 221, 236, 237, 239, 240, 241, 243, 254, 258, 266, 269, 273, 274, 285, 287, 340, 361, 362, 363, 365, 368, 369, 373, 374
Dialect Topography xv, 15, 29, 40, 45, 46, 48, 51, 63, 64, 65, 82, 90, 96, 97, 98, 99, 105, 109, 111, 112, 117, 118, 122, 123, 124, 125, 126, 131, 133, 134, 135, 178, 193, 196, 198, 219, 227, 230, 232, 235, 238, 241, 243, 245, 246, 248, 259, 272, 273, 275, 276, 277, 278, 280, 283, 284, 285, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 301, 302, 303, 318, 333, 335, 336, 337, 341, 348, 349, 350, 351, 354, 365
Dieth 9, 10

different from 10, 13, 22, 49, 65, 75, 76, 90, 109, 110, 112, 113, 116, 129, 131, 145, 162, 171, 177, 184, 196, 202, 235, 240, 252, 261, 277, 278, 303, 311, 317, 321, 323, 345
Dillman 173, 233, 236, 258, 261, 267, 268, 271
Dörnyei 12, 161, 233, 236, 237, 238, 239, 240, 273
Dutch 13, 21, 27, 49, 132, 133, 165, 166, 249, 365

E
Easson 46, 115, 116
Eckert 189, 209, 210, 211, 212, 213, 214, 272
Ellegård 3
Ellis, Alexander 22, 36, 75
Elspaß 48, 364
English as a Lingua Franca 156, 173
ethnic orientation index xv, 284, 285, 293, 294
Excel exercises 166, 311
Expanding Circle 108, 143, 144, 154, 155, 156, 158, 162, 169, 170

F
Fee xxiv, 110, 113, 199
Flemish 27, 51, 365

G
Geikie 105, 106, 109
gender 14, 60, 63, 88, 89, 112, 134, 147, 150, 154, 175, 176, 185, 188, 189, 190, 191, 209, 221, 229, 283, 284, 303, 308, 311, 319, 328, 338, 346, 347, 352, 355, 357, 358, 370
Gilliéron 3, 6, 7, 8, 10, 19, 26, 72
Glaser, Elvira 49, 130, 250
globalization 132, 134, 153, 173, 367
Görlach 148, 149, 150
Graddol 158
Gregg, Robert J. 40, 47, 79, 82, 94, 110, 178, 190, 200, 215, 219, 248, 260, 271, 272, 287, 289
Gries, Stefan Th. xxiii, 69, 70, 320, 321, 323, 326, 332, 340, 344, 359

H
Hempl 28, 29, 30, 38, 67, 266
heterogeneity 128, 214, 217, 218, 221, 367, 371
homogeneity 128, 214, 215, 216, 219, 221, 227, 335, 367
hypothesis testing 318, 321, 322, 346

I
idiomaticity 166, 167
indexing 123, 175, 221, 367
interactions 328, 346, 347, 351, 352, 353, 354, 355, 356, 359
interdialect development 203

J
Jenkins 12, 63, 144, 154, 155, 156, 157, 158, 159, 160, 161, 162, 166, 170, 173, 239, 252

K
Kachru 108, 132, 143, 156, 162
Kerswill 202, 203, 234, 235, 272
koinéization 175, 202, 203, 204, 205, 206, 207, 221, 234, 367
Kortmann 55, 171
Kretzschmar 3, 8, 31, 151, 232
Krug xxiii, 5, 132, 144, 145, 146, 147, 148, 235, 373
Kurath 7, 8, 26, 29, 34, 51, 53, 60, 66, 117, 373

L
Labov 2, 5, 11, 20, 60, 73, 74, 78, 180, 184, 185, 209, 213, 214, 218, 252, 253, 271, 272, 283, 311, 373
LAGS 9
Lambert 12, 158, 239, 254, 361
LANCS 8
LANE 8, 72
language contact 33, 145, 153, 156, 169, 171
language use index xvi, 284, 285, 292, 293, 294, 303, 312, 325, 326, 347, 351, 352, 356, 357
LAUM xvi, 8, 35, 65, 68, 72, 120, 334

LAUSC xvi, 6, 8, 9, 29, 30, 33, 34, 36, 40, 51, 60, 66, 71, 72, 76, 215, 230, 285, 363
levelling
  apparent levelling 203
  rudimentary levelling 203
Lindquist 54, 55, 57, 58
linguistic attitudes 159, 236
linguistic error 72, 80, 85, 162, 167, 179, 231, 267, 271, 321, 330, 348, 350, 355
linguistic innovation 30, 102, 118, 127, 129, 162, 163, 164, 165, 166, 202, 217, 293, 361, 373
logistic regression 326, 327, 338, 339, 346, 347, 350, 351, 358, 359, 364

M
Macaulay 37, 38, 363
majority form 203
Mather 9, 36, 37, 87, 252, 362
McDavid 8, 9, 11, 29, 36, 51, 72, 76, 77, 81, 362, 365, 372
migration 216, 290, 291, 370
Mitzka 11, 23, 24, 25, 26, 27, 51, 53, 373
monolingualism 40, 111, 112, 131, 132, 133, 134, 135, 151, 202, 292, 370, 371
multi-item scaling 240, 241, 372
multilingualism 24, 31, 131, 132, 133, 134, 135, 147, 150, 151, 153, 157, 165, 173, 291, 292, 370
multivariate 319, 321, 327, 328

N
NARVS xvi, 29, 46, 47, 48, 198, 199, 200, 220, 227, 237, 238, 245, 365
new-dialect formation 14, 204
New Zealand English 204
non-linear statistics 328, 338, 339, 346, 359

O
occupational mobility index xvi, 285, 294, 295, 301, 325
open response 46, 64, 97, 150, 236, 243, 268, 325, 335

ordinal variable 323, 324, 325, 326, 328, 335, 359, 372
Orton 8, 9, 10
Outer Circle 108, 133, 143, 144, 153, 154, 155, 156, 158, 162, 163, 164, 171
Oxford English Dictionary xvi, 101, 109, 113, 157

P
Pennycook 173, 211
perception 3, 12, 236, 237, 239, 252, 254, 256, 361, 365, 368, 372
Pi 12, 46, 117, 118, 119, 120, 121, 276
pilot test 263, 268, 361
Plag 166
Pratt 61, 234, 244
Preston 12, 63, 130, 159, 243, 254, 255

Q
questionnaire length 14, 232
question type 12, 14, 49, 125, 151, 153, 225, 236, 245, 250, 370

R
ratio-scale variable 324, 325, 326, 340, 359, 372
reallocation 203
regionality index xvi, 129, 135, 179, 194, 282, 284, 285, 286, 287, 288, 289, 290, 291, 294, 301, 311, 312, 313, 347, 356, 357
Ruß 266, 361

S
sampling 4, 63, 128, 133, 135, 225, 232, 270, 271, 273, 274
Scargill 42, 43, 63, 78, 88, 89, 96, 131, 178
Schleef xxiii, 11, 12, 95, 227, 229, 233, 237, 268
Schmidt 370
Schneider 155, 162, 171, 175, 200, 201, 202, 203, 205, 206, 207, 208, 221, 367
s-curve 14
Seidlhofer xxiii, 157, 158, 164, 166, 167, 168, 371
snuck 105, 106, 107, 108, 129, 180
social class 112, 150, 176, 184, 188, 209, 294, 295, 301, 303, 347, 357
social evaluation 252, 262, 361, 365, 368, 373
space, geographical 6, 55, 152, 153, 225, 370, 371
space, perceived 153, 371
space, social 15, 132, 134, 153, 370, 371
spatiality 152, 153, 367
Stadler 21, 22, 36, 51, 258
super-diversity 132, 151, 152, 367
Survey of Canadian English xvi, 40, 42, 44, 46, 51, 63, 78, 87, 88, 89, 90, 95, 96, 97, 105, 122, 123, 125, 131, 238, 247, 248

T
take up #9 99, 100, 101, 102, 103, 104, 129, 185, 197, 214, 233, 244

tap 69, 87, 95, 96, 98, 99, 102, 129, 146, 197, 201, 240, 252, 268
telling time 117, 118, 119, 120, 121, 325
test of independence 118, 191, 206, 207, 328, 333, 334, 339
Trudgill 2, 6, 7, 11, 73, 74, 85, 112, 127, 131, 152, 160, 175, 185, 186, 191, 195, 200, 201, 202, 203, 204, 205, 207, 208, 209, 216, 221, 272, 324, 367, 373

U
unifactorial 328
univariate 319, 321, 327, 328
Upton 9, 50

V
variability
  extreme variability 203
vase 40, 125, 126, 128, 130

W
Walker, James A. 207, 218, 226, 272, 284, 293, 294
Wenker xvi, 3, 6, 10, 11, 21, 22, 23, 24, 25, 26, 51, 53, 132, 250, 258
Widdowson xxiii, 9, 157, 169, 200
Wrede 24, 25, 26

Y
yod-dropping/retention 74, 82, 84, 85, 122, 123, 124, 190, 195, 211, 322, 323, 339, 347