English Pages 257 [258] Year 2013
Eline Zenner and Gitte Kristiansen (Eds.) New Perspectives on Lexical Borrowing
Language Contact and Bilingualism
Editor Yaron Matras
Volume 7
New Perspectives on Lexical Borrowing Onomasiological, Methodological and Phraseological Innovations
Edited by Eline Zenner and Gitte Kristiansen
ISBN 978-1-61451-591-3 e-ISBN 978-1-61451-430-5 ISSN 2190-698X Library of Congress Cataloging-in-Publication data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.dnb.de. © 2014 Walter de Gruyter, Inc., Boston/Berlin Typesetting: P T P-Berlin Protago-TEX-Production GmbH Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen Printed on acid-free paper Printed in Germany www.degruyter.com
Table of contents

Eline Zenner and Gitte Kristiansen
Introduction: Onomasiological, methodological and phraseological perspectives on lexical borrowing

Ad Backus
A usage-based approach to borrowability

Eline Zenner, Dirk Speelman and Dirk Geeraerts
What makes a catchphrase catchy? Possible determinants in the borrowability of English catchphrases in Dutch

Esme Winter-Froemel
Formal variance and semantic changes in borrowing: Integrating semasiology and onomasiology

Augusto Soares da Silva
Measuring and comparing the use and success of loanwords in Portugal and Brazil: A corpus-based and concept-based sociolectometrical approach

Alexander Onysko and Andreea Calude
Comparing the usage of Māori loans in spoken and written New Zealand English: A case study of Maori, Pakeha, and Kiwi

Frank van Meurs, Jos Hornikx and Gerben Bossenbroek
English loanwords and their counterparts in Dutch job advertisements: An experimental study in association overlap

Astrid Rothe
On the variation of gender in nominal language mixings

Helge Sandøy
Linguistic globalization: Experiences from the Nordic laboratory

Index
Eline Zenner and Gitte Kristiansen
Introduction: Onomasiological, methodological and phraseological perspectives on lexical borrowing
Despite a number of terminological issues (especially concerning the applicability of the chosen metaphor; Haugen 1950), “lexical borrowing” has become the established term to describe the process of the transfer of lexical material from one language (the donor, source or model language) to another language (the receptor or replica language). Over the past century, lexical borrowing has received a great deal of attention in historical linguistics and contact linguistics. However, the focus of most existing studies has been rather limited: loanword research has so far predominantly been conducted from a systemic and structuralist perspective, with attention mainly being paid to counting and classifying types of loanwords according to the degree of morphological and phonological adaptation to the receptor language (e.g. gender assignment, plural formation …), to the diachronic evolution in the number of loanwords or to their lexicographical treatment (see Androutsopoulos 2012 and Onysko and Winter-Froemel 2011 for similar claims). The main aim of this volume, which grew out of a theme session organized at the 44th meeting of the Societas Linguistica Europaea (Logroño, Spain, September 2011), is to open up the theoretical and methodological perspective of loanword research, thereby advancing our understanding of the process of lexical borrowing. In this introduction, we first present a succinct overview of the main topics dealt with in current lexical borrowing research. In this way, we illustrate the predominantly structuralist focus of the paradigm and present some ensuing shortcomings in current approaches.
In a next section, we turn our attention to some new avenues for loanword research: despite the longstanding tradition of loanword research and the abundance of existing studies, there are still a number of ways to add to our understanding of the process of lexical borrowing. More specifically, linking up to corpus linguistic research and Cognitive Linguistics, we first advocate an expansion of the restricted focus of current accounts on single words as the unit under scrutiny. On the one hand, such an expansion can be achieved by introducing an onomasiological, concept-based approach to borrowing, where not only the loanword itself but also possible receptor language equivalents are taken into consideration. On the other hand, by paying more attention to the immediate textual context in which a loanword occurs, the focus can be expanded from single-word units towards multi-word units (such as idiomatic expressions or constructions). Second, focusing on the importance of multifactorial designs, experimental set-ups and the combination of types of linguistic evidence in patterning (contact-induced) variation and change, we will illustrate how loanword research can benefit from linking up to methodological advances made in (variational) sociolinguistics, psycholinguistics and corpus linguistics in the past decades. The final section of this introduction, then, summarizes the eight selected contributions
to the volume and positions them against the new avenues presented in the second section.
1 Prime topics in current lexical borrowing research
Ever since Sapir (1921) and Bloomfield (1933) emphasized the importance of language contact for linguistic analyses (cf. Hoffer 1996), contact-induced variation and change have received a great deal of attention, and contact linguistics became an established discipline in linguistics, with the most common result of language contact being “change in some or all of the languages: typically, though not always, at least one of the languages will exert at least some influence on at least one of the other languages” (Thomason 2001: 10). In practice, “at least some influence” involves a huge range of types of contact-induced change. From less to more impact on the languages involved, we find amongst others the introduction into the receptor language of words and structures from the source language (borrowing and imposition), the mixing of two grammars (codeswitching), the homogenization of heteronomous varieties (koineization and Sprachbünde), the creation of a common language to allow for communication between people with different mother tongues (pidginization and creolization), the theoretical possibility of truly mixed languages (e.g. Michif), and attrition, domain loss and language shift (in rare cases resulting in language death) (cf. e.g. Matras 2009 for more details). Of these phenomena, “the most common specific type of influence is the borrowing of words” (Thomason 2001: 10). Not surprisingly, lexical borrowing research has developed into a rich and longstanding tradition in (historical) linguistics, which largely developed in the late nineteenth and early twentieth century from a systemic and structuralist perspective, i.e. a perspective that focuses on the position of borrowed items within the structure of the receptor language. The study of loanwords remains in fashion today, but current studies still predominantly follow the original structuralist paradigm.
This structuralist focus of most loanword research is evident from the four main topics dealt with in the paradigm. In this section, we describe each of these four topics in some detail. A first goal frequently found in existing loanword research is to provide a precise definition of what a loanword is, making the necessary distinctions between different transfer types. In some way or other, most accounts go back to the taxonomies of borrowing proposed by Betz (1936, 1959) (cf. Duckworth 1977 for English terminology) and Haugen (1950, 1953). As such, most of the existing categorizations focus on the possible combinations of source (SL) and receptor language (RL) material in form-meaning pairs (cf. Table 1) (see Onysko 2007 for a more detailed account). The most prototypical type of lexical borrowing consists of the introduction of both form and meaning from the SL in the RL (e.g. the English loanword compact disk in Dutch). Alternatively, an SL meaning can be attached to a (new) RL form. For example, take the semantic specialization of the Greek word angelos (from ‘messenger’ to ‘religious messenger’,
Tab. 1: Form-meaning pairs in lexical borrowing
              SL meaning       RL meaning
SL form       direct loan      pseudo-loan
RL form       indirect loan    (language-internal)
mixed form    hybrid loan
i.e. ‘angel’) analogous to and based on the extension of ml’k in Hebrew (cf. Geeraerts 2010a). Next, an SL meaning can be attached to a mixed form. Consider for example Dutch webpagina, consisting of the English noun web and the Dutch pagina ‘page’ based on the English compound webpage. Finally, a (new) SL form can be given an RL meaning, such as the German use of English Handy to refer to a mobile phone (see Winter-Froemel 2011 for further examples and a classification of pseudo-loans).
Existing taxonomies of lexical borrowing also often rely on the degree to which a loanword has been adapted to follow the morphophonological and orthographic rules of the receptor language, which forms the second main topic in existing accounts. In his taxonomy, Betz (1959) for example provides only one dichotomous subcategorization of direct loans, contrasting items which are not adapted to the morphophonological rules of the receptor language (“foreign words”), and those which are (“loanwords”). However, it is in principle unlikely that a loanword is not adapted to the receptor language in any way: typically, some of the orthographical, phonological or morphological features of the borrowed form need to be changed to fit the systemic requirements of the receptor language. Most current approaches in loanword research focus on morphological nativization, and primarily on pluralization and gender assignment: given that the source language does not mark gender overtly (or not as extensively as the receptor language), what gender will be assigned to these forms in the receptor language (cf. Matras 2009 and Winter-Froemel 2011 for well-focused overviews of nativization and adaptation)? Mainly, these approaches have aimed at listing the patterns found and, less frequently, at providing some general motivations for gender assignment (e.g. Weinreich 1970: 45 on animacy, gender of possible native equivalents, productivity of genders in the receptor language etc.).
In-depth analyses of variation in and motivations for gender assignment are so far relatively rare (cf. Winter-Froemel, this volume, and Rothe, this volume).
An additional task in defining the precise object of research on lexical borrowing is finding (universal) structural criteria for demarcating borrowing from codeswitching, which forms the third topic addressed in current approaches. Put simply, codeswitching is the process of switching between different grammatical systems within a single conversation. In contrast, lexical borrowing involves the introduction of a foreign lexical item into the vocabulary of the receptor language, not the switching between grammatical systems (Muysken 2000: 70). Simple though this distinction may seem, the reality is far more complex. Especially difficult is the status of single source language content words occurring in the receptor language. Should a distinction be made between single-word switches and loanwords, and if so, how? As concerns the first question, Myers-Scotton (2002: 153) stresses that there basically is no need for a synchronic distinction between borrowing and codeswitching. Most other researchers do believe that a distinction needs to be made, for which they rely on a number of criteria. Most frequently, existing theories consider the proficiency of the speaker (cf. Thomason 2001: 133), the morphophonological nativization of the loanword (Poplack 1980; Appel and Muysken 1988: 172–173 and more recent approaches such as Arroyo and Tricker 2000 or Cacoullos and Aaron 2003), the “listedness” of the loanword (inclusion in standard lexicography of the receptor language; e.g. Muysken 2000: 71 on Di Sciullo and Williams 1989) or the frequency of the loanword as the most important diagnostics. Recently, suggestions have been made to let go of the idea of a sharp dichotomy between borrowing and codeswitching and to consider both contact-induced phenomena as related points on a continuum (Thomason 2001: 133; Matras 2009: 110). The idea of “nonce borrowing” – incidental, transient borrowings that are practically equivalent to single-word code switches – as advocated by Sankoff, Poplack, and Vanniarajan (1990) and Poplack and Meechan (1995) also fits into this trend. Promising though this approach is, an important remaining shortcoming is that the debate on the status of single-word switches seems to take up so much attention that the possibility of multi-word borrowing (i.e. the borrowing of phraseological units or constructions) is hardly considered (cf. infra).
Finally, instigated by Whitney in 1881, researchers aim to find a universal scale of receptivity to foreign material. The main question in this line of research is how we can pattern, interpret and explain variation in the borrowability of linguistic items.
So far, most attention has been paid to patterning variation in borrowability, where the most typical approach is to build (universal) clines of borrowability for different loanwords (expressed in type and token counts): the idea is to identify (universal) constraints on borrowing based on these clines, primarily as concerns the borrowability of different word classes. Consistently, these accounts reveal that nouns are the most borrowable items (e.g. Field 2002, Gómez Rendón 2008). With regard to interpreting such scales of borrowability, the main question is whether the clines should be considered as modeling an implicational, a diachronic or a quantitative pattern (Van Hout and Muysken 1994). As the clines are typically based on frequency information, a quantitative interpretation seems warranted: nouns are more frequently borrowed than verbs. However, a diachronic interpretation is also possible: in any given contact setting, receptor languages will first borrow nouns, and only then verbs. An implicational interpretation, then, holds that when a language has borrowed verbs, you can rest assured that it has also borrowed nouns. Finally, in explaining the clines, the question of which contexts and, more particularly, which contact settings, enable the borrowability of which parts of the cline is addressed. Currently, accounts addressing this point are largely based on anecdotal evidence and attention is mainly restricted to macro-social categories (cf. Matras 2009: 154) (note for example the approach taken by Thomason & Kaufman 1988, who define the ideal setting in which intense language contact is likely to occur).
In contrast, attempts to quantify the effect of micro-social predictors on variation in the use and success of loanwords are virtually absent (but see Poplack, Sankoff, and Miller 1988; Sharp 2001; and Zenner, Speelman, and Geeraerts forthcoming). Also, the importance of speaker attitudes for borrowability (the often-mentioned prestige factor, see e.g. Hock and Joseph 1996) remains largely untested (cf. Van Meurs, Hornikx, and Bossenbroek, this volume).
A final important observation on current approaches to lexical borrowing concerns the methods used to address the topics presented above. Currently, researchers rely on small corpora, observations from dictionaries or ad hoc created samples of personal observations (often without being clear on the sources from which these observations are drawn). These limited collections are mainly used to draft inventories of different types of loanwords, focusing on nativization processes, word classes or diachronic patterns (cf. Backus this volume and Winter-Froemel this volume). Considering the limited amount of data used, it is hard for researchers to draw reliable empirical conclusions concerning their research questions and hypotheses. Moreover, given the restriction in size, researchers usually refrain from conducting (advanced) quantitative analysis. As such, it is hard to determine to what extent the patterns found in a given dataset can be extrapolated to the linguistic community at large. Finally, most empirical approaches rely on (written or spoken) corpora: experimental designs are surprisingly scarce, and studies aiming to combine results from such experimental approaches with corpus-based research (cf. Arppe and Järvikivi 2011) are to our knowledge absent altogether.
This brief overview is meant to illustrate how current lexical borrowing research is primarily developed from a structuralist perspective and how this restricted focus has led to the neglect of a number of important questions and topics concerning the process of lexical borrowing. On the one hand, the restricted focus on single-word units has impeded researchers from including onomasiological (concept-based) or phraseological perspectives. On the other hand, the main goal of most studies has been to provide inventories of the types of loanwords found in the data as a means to describe the structural processes loanwords are subjected to: more in-depth variationist, quantitative corpus-based or experimental approaches are virtually absent. This volume collects eight studies that illustrate how taking such new onomasiological, phraseological and methodological perspectives on loanwords can help advance our understanding of the borrowing process. Below, the ways in which such perspectives can be developed are introduced in some more detail.
2 Advancing lexical borrowing research
2.1 Expanding on the word as the unit under scrutiny
As was mentioned in the previous section, the structural perspective of lexical borrowing research has led to a restricted focus on “the word” as the unit under scrutiny. However, it is useful to expand on this focus in at least two ways. On the one hand, a shift in attention from lexical items to concepts is advocated, which entails taking an onomasiological approach. On the other hand, we will indicate how paying more attention to the textual context surrounding the word reveals the possibility of borrowed multi-word units. Below, we briefly present each of these aspects.
2.1.1 From lexemes to concepts
A first way to open up the restricted focus on single words is to link up loanword research to the Cognitive Linguistic distinction between semasiology and onomasiology. The distinction was introduced to lexical semantics as early as 1903 (in the work of Zauner, cf. Geeraerts 2010a: 31), but the terminology was revived in Cognitive Linguistic approaches to lexical semantics, and most notably in the work of the Leuven University research unit Quantitative Lexicology and Variational Linguistics (see mainly Geeraerts, Grondelaers, and Bakema 1994; Geeraerts 1997 and Grondelaers and Geeraerts 2003).
Fig. 1: Semasiology versus onomasiology
Semasiological approaches to lexical semantics and lexicology take the word as point of departure, and study its different meanings and how these are related. In onomasiology, the focus is on naming instead of on meaning: the possible lexicalizations for a particular concept are charted. One of the most popular topics within onomasiology is outlining different mechanisms of lexicogenesis (e.g. borrowing, word formation). Consider Figure 1 as an example to illustrate the distinction between both perspectives. In the figure (like in this text), concepts are printed in small caps and lexemes
in italics. The dotted triangle represents a semasiological perspective, focusing on the Dutch word bank. From this perspective, the point is to appreciate how Dutch bank has two different meanings: it can refer to a financial institution (and also to the building in which this financial institution is situated) (‘bank’), or to a usually comfortable piece of furniture used for seating more than one person (‘couch’). The full triangle in Figure 1 represents an onomasiological perspective, focusing on the concept PIECE OF FURNITURE FOR SEATING MORE THAN ONE PERSON. In Dutch, this concept can be named by the lexemes bank, sofa, zetel, and canapé. Taking an onomasiological approach to the study of loanwords implies that attention is not only paid to (new) source language material, but also to possible existing receptor language alternatives (e.g. Dutch spijkerbroek as an alternative for the English loanword jeans). Amongst others, an onomasiological perspective helps redefine theoretical classifications like the distinction between core and cultural borrowings: “cultural borrowed forms are words for objects new to the culture (e.g. CD or compact disk, espresso), but also for new concepts (e.g. overtime). Core borrowed forms are words that more or less duplicate already existing words in the L1 (e.g. words for ‘brother’ or ‘home’ or words for time references such as le weekend in French)” (Myers-Scotton 2002: 239). This means that cultural borrowed forms are those which are the only lexicalization of a given concept. For core borrowed forms, there is onomasiological variation: they have to compete with a receptor language alternative. Also, an onomasiological approach opens up ways to introduce more reliable methods of measuring variation in the success of loanwords. Although the reliability of raw frequency (either in the form of type or token counts) has been questioned repeatedly, both in lexicology (e.g.
Speelman, Grondelaers, and Geeraerts 2003) and in lexical borrowing research (e.g. Van Hout and Muysken 1994), type and token counts still form the main type of analyses in loanword research. However, as stressed by Van Hout and Muysken (1994), the frequencies of the loanwords should be combined with some type of set-external proof. One possibility is to introduce a concept-based success-measure, in which the existence of alternative (receptor language) expressions is taken into account when quantifying the success of a loanword (e.g. schaduwschrijver as Dutch alternative for the French loanword nègre ‘ghostwriter’) (Zenner, Speelman, and Geeraerts 2012). Several contributions to this volume will provide different solutions as to how such an onomasiological approach can be included in loanword studies.
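The concept-based success measure mentioned above can be made concrete with a minimal Python sketch. It assumes one simple operationalization: the success of a loanword is its share of all corpus tokens that name the concept, so that receptor language alternatives enter the denominator. The function name success_rate and the token counts are hypothetical and serve only as illustration; they are not taken from Zenner, Speelman, and Geeraerts (2012).

```python
def success_rate(counts, loanword):
    """Share of a concept's tokens that are realized by `loanword`.

    `counts` maps every lexicalization of ONE concept (the loanword and
    its receptor language alternatives) to its token frequency.
    """
    if loanword not in counts:
        raise KeyError(f"{loanword!r} is not a lexicalization in counts")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("concept has no tokens in the corpus")
    return counts[loanword] / total


# Invented token counts for the concept JEANS in a Dutch corpus.
jeans_counts = {"jeans": 80, "spijkerbroek": 20}
print(success_rate(jeans_counts, "jeans"))  # 0.8
```

Unlike the raw token count of 80, the resulting proportion remains comparable across concepts that differ in overall frequency.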
2.1.2 From lexemes to multi-word units
An onomasiological approach to measuring borrowability is one way to expand on the restricted focus of existing lexical borrowing research on single-word units. Another possibility is to address the borrowability of multi-word units, such as idiomatic expressions or constructions.
By linking up loanword research with existing accounts on phraseology, the possibility of borrowing multi-word units, idioms and fixed phrases receives the attention it has so far been lacking. Ever since Firth (1957: 11) coined his famous slogan “you shall know a word by the company it keeps” and introduced research on collocations (e.g. cow and milk ), lexical semantics and lexicology have moved away from analyzing words in isolation towards corpus linguistic studies where the importance of context is emphasized (cf. Geeraerts 2010a). One result is a renewed attention for pre-fixed chunks (idioms such as kick the bucket, but also less figurative fixed phrases such as as good as it gets ). Such formulaic sequences, which include “[any] sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar” (Wray 2002: 9), have over the last decades become a research topic in their own right (Moon 1998; Wray 2002; Van Lancker-Sidtis and Rallon 2004). Amongst others, this is apparent in the detailed attention given to (relatively) fixed and (relatively) conventionalized units in Cognitive Linguistics, and more specifically in construction grammar (Langacker 1987; Goldberg 1995 and Croft 2001). However, in lexical borrowing research, despite there being “the generally recognized possibility of borrowing idiomatic phrases as units” (Pfaff 1979), hardly any attention has been paid to the borrowability of such fixed expressions (but note Backus 1996 on pre-fixed chunks in insertional codeswitching). 
Recently, some advances in this respect have been made in the field of anglicism research: Stefanowitsch (2002) discusses a number of examples of German-English bilingual punning, such as Message in a Zottel (Zottel ‘shaggy-haired (person)’), Van de Velde and Zenner (2010) discuss how the MTV-program Pimp my Ride gave rise to the introduction of the expression [pimp my N], and several of the contributions in Furiassi, Pulcini, and Rodríguez González (2012) present qualitative accounts on loan translations of English fixed expressions (e.g. Gottlieb 2012; Marti Solano 2012; Oncins-Martinez 2012; Fiedler 2012). One of the aims of this volume is to expand on and add to this new line of research.
2.2 Methodology: Variationist approaches, experimental designs and converging evidence
Given the relatively low average frequency of individual content words (as opposed to function words like pronouns or grammatical structures like causatives), the study of lexical variation ideally relies on large corpora (see Geeraerts 2010b; Ruette 2012: 5). However, in contact linguistic studies it is often stressed that finding sufficient data is difficult, either because the linguistic community under scrutiny is limited in size and hardly produces written material, or because the contact phenomena under scrutiny are typical of spoken language (see Backus 1996; Backus this volume). Still, over the past decades quite some sizeable newspaper corpora, with word counts running
into the billions, have been made available. Especially for weak contact settings (like in the case of the spread of English loanwords in Western-European languages), these corpora can easily be (but are usually not) applied to lexical borrowing research. Once such larger corpora are used, including the study of lectal variation in the research design becomes a sine qua non:
[t]here was a time when the progress of research required that each community should be considered linguistically self-contained and homogeneous. (…) It certainly was a useful assumption. By making investigators blind to a large number of actual complexities, it has enabled scholars, from the founding fathers of our science down to the functionalists and structuralists of today, to abstract a number of fundamental problems, to present for them solutions perfectly valid in the frame of the hypothesis, and generally to achieve, perhaps for the first time, some rigor in a research involving man’s psychic activity. (…) But we shall now have to stress the fact that a linguistic community is never homogeneous and hardly ever self-contained. (Weinreich 1970: vii)
Naturally, applying Weinreich’s quote to the current state of empirical analyses on loanwords is a bit of an anachronism, but this is in itself quite ironic. Weinreich clearly considered the monolectal fallacy (i.e. conceiving of a linguistic community as homogeneous and self-contained) as a phenomenon of the past, although the neglect of socio-lectal variation still is the norm in lexical borrowing research. Nevertheless, considering Weinreich’s reasoning, it is not hard to appreciate how truly empirical (corpus-based) approaches should be variationist: large corpora typically represent some form of regional, social or register variation. As such, the first methodological perspective we wish to add to existing borrowing research is a sociolinguistic perspective, linked to the tradition of Labovian variationist studies (e.g. Zenner, Speelman, and Geeraerts forthcoming) or to the tradition of interactional analyses (e.g. Zenner and Van de Mieroop forthcoming).
As a second step in our methodological reasoning, it is important to appreciate that any corpus can be heterogeneous in more than one way: regional, social and register variation can occur simultaneously. As such, a truly variationist study ideally follows a multifactorial design: instead of focusing on either regional variation, linguistic variation or social variation (as is currently the norm), the combined effect of speaker-related, community-related and language-related features on the linguistic phenomenon under scrutiny (in our case lexical borrowing) should be patterned. As a natural consequence, when aiming to disentangle the importance of these parameters, the analyses need to rely on some form of quantification.
Although “quantification is not the essence of empirical research, [… it] simply follows in a natural way from what an empirical methodology tries to achieve: quantification in empirical research is not about quantification, but about data management and hypothesis testing” (Geeraerts 2010c: 72). More specifically, inferential statistics are preferred. As opposed to purely descriptive statistics (such as type or token counts or basic proportions), inferential statistics aim to indicate to what extent researchers can extrapolate the patterns found beyond the sampled data towards the speech community at large,
most typically by means of some form of significance testing. Taking into account that such techniques allow researchers to find out which of the (linguistic, regional, social …) variables introduced account for most of the attested variation and to determine how these variables interact (see Baayen 2008), they are particularly useful when adopting a multifactorial variationist approach. As such, the use of inferential statistical analyses is the second methodological perspective we wish to add to the study of lexical borrowing.
For the final methodological perspective introduced in this volume, we wish to emphasize the importance of combining different types of methods and data (see e.g. Arppe and Järvikivi 2007; Backus this volume). Attitudinal data and corpus data can for example be very complementary, each serving as one piece to solve the puzzle of the process of lexical borrowing. For one thing, combining types of linguistic evidence can help test the often heard, but hardly tested claim that prestige affects borrowability. Where corpus-based analyses can help attest the actual variation in the success rates of different loanwords in different contact settings, experimental data can help quantify the attitudes towards the source and receptor languages under scrutiny. Using experimental designs (such as matched guise tests, association tasks or affective priming experiments; Speelman et al. 2013) – as opposed to questionnaires or surveys – helps the researcher to identify subconscious, covert attitudes towards the languages studied. However, up to this point, experimental designs are scarce in research on lexical borrowing. Several contributions to this volume will aim to address that gap, by showing which new insights can be attained in experimental studies on loanwords and by combining types of linguistic evidence. Summarizing, linking up with advances made in Cognitive Sociolinguistics (e.g.
Geeraerts, Kristiansen, and Peirsman 2010), sociolinguistics, corpus linguistics and psycholinguistics, the methodological improvements proposed in this volume centre around the following three points. First, the importance of sufficiently large and empirically reliable data collections is emphasized. Second, we stress the importance of quantifications and of objectively defined (dependent and independent) variables whose significance is tested by means of inferential statistical analyses. Finally, we advocate the introduction of experimental research designs and of studies combining types of linguistic evidence to the field of lexical borrowing research.
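The step from descriptive to inferential statistics described in this section can be illustrated with a deliberately simple case. The Python sketch below runs Pearson's chi-square test of independence on a 2×2 table, asking whether the proportion of loanword tokens differs between two subcorpora by more than chance would lead us to expect. The counts, the subcorpus labels and the function name chi_square_2x2 are invented for illustration, and a single test of this kind is far simpler than the multifactorial analyses advocated above.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one token")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: loanword vs. native alternative in two subcorpora.
#              loanword  native
# subcorpus A      30      70
# subcorpus B      50      50
chi2, p = chi_square_2x2(30, 70, 50, 50)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The descriptive part is the pair of proportions (30% versus 50% loanword use); the inferential part is the p-value, which indicates how plausible a difference of that size would be if both subcorpora were drawn from a community with a single underlying rate.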
3 Outline of the volume
This volume aims to illustrate the new perspectives on lexical borrowing that we presented above. Before proceeding to an outline of the contents of the individual papers, it is important to appreciate that several papers illustrate more than one of the perspectives introduced above. For example, the contribution by Frank van Meurs, Jos Hornikx, and Gerben Bossenbroek not only presents an example of experimental designs in loanword research, but also takes an explicitly onomasiological perspective. Furthermore, while the papers by Alexander Onysko and Andreea Calude, by Augusto Soares da Silva, and by Eline Zenner, Dirk Speelman, and Dirk Geeraerts illustrate an onomasiological or phraseological perspective, they also constitute prime examples of the methodological perspectives outlined above. As such, these papers illustrate how the new perspectives on lexical borrowing presented above should not be given an either/or-interpretation: they can easily be implemented in one and the same research design.
This conviction is also stressed in the first paper in the volume: in “A usage-based approach to borrowability”, Ad Backus presents a theoretical study advocating a usage-based approach to studying borrowing and borrowability, which combines insights from Cognitive Linguistics and contact linguistics. Several theoretical and methodological issues resulting from this combination are described: the opposition of single-word and multi-word units, the opposition between text corpora and experimental data, and, finally, the opposition between individual entrenchment and community-based entrenchment. As such, this programmatic paper presents some of the new perspectives introduced above against a broader and more detailed theoretical background.
Next, “What makes a catchphrase catchy? Possible determinants in the borrowability of English catchphrases in Dutch” provides a more hands-on example of the importance of attention to multi-word units in lexical borrowing research highlighted by Backus. More specifically, Eline Zenner, Dirk Speelman, and Dirk Geeraerts aim to address the current lack of attention to borrowed phraseology by patterning variation in the borrowability of 229 English catchphrases from movies and TV shows (the truth is out there; nudge nudge, wink wink; shaken, not stirred etc.) in two large lemmatized and syntactically parsed newspaper corpora of Dutch.
Using inferential statistical analyses, they disentangle the importance of features relating to the source of the catchphrase, of media-exposure and lectal variation on the borrowability of the catchphrases under scrutiny. The results of the regression analysis show the importance of the original media origin of the catchphrase, of the popularity of the catchphrase in International English and of the type of receptor language variety (i.e. the higher register of newspaper language). No statistically significant difference is found between the use of catchphrases in Belgian Dutch and Netherlandic Dutch, which is surprising given the different socio-cultural history of both regions. As was explained in the previous section, analyzing the borrowability of multiword units is only one of two ways to expand on the restricted focus of existing loanword research on the single word as the unit under scrutiny. As is illustrated by a number of papers in the volume, it is also possible to include a concept-based, onomasiological perspective to loanwords, by taking the concept expressed by the loanword as point of departure in the analyses. First, in “Formal variance and semantic changes in borrowing: Integrating semasiology and onomasiology”, Esme Winter-Froemel outlines the main arguments for contrasting semasiological and onomasiological perspectives on loanwords with rigor and attention for detail. In her study, Winter-Froemel
presents a qualitative account of variation in the orthographic, morphophonological and semantic nativization of source language lexemes. Using the Internet as a corpus, she for example finds dozens of alternatives for the English loanword people in French (e.g. pipole or pipeul ), demonstrating how the neglect of variation in current accounts is far from warranted. She accounts for the attested variation by relying on Blank’s semiotic model (2001), formalizing the distinction between ear loans (introduced in the receptor language via spoken communication) and eye loans (introduced in the receptor language via written communication) and emphasizing the long-neglected role of hearer-induced change in loanword nativization. In “Measuring and comparing the use and success of loanwords in Portugal and Brazil: A corpus-based and concept-based sociolectometrical approach”, Augusto Soares da Silva investigates to what extent an onomasiological perspective can help to fine-tune our understanding of the role played by foreign influence in accounting for variation between Brazilian Portuguese and European Portuguese. More specifically, Soares da Silva patterns the role of foreign (mainly French and English) lexicalizations in a diachronic decrease or increase in lexical uniformity between the two varieties by means of the concept-based measure of onomasiological variation (Geeraerts, Grondelaers, and Speelman 1994; Speelman, Grondelaers, and Geeraerts 2003). Relying in part on a survey-based analysis, this paper also serves as an example of the importance of combining types of linguistic evidence. Based on quantitative analyses, Soares da Silva concludes that the influence of English is stronger in Brazil than in Portugal, that foreign words have contributed to a decrease in lexical uniformity between the two varieties and that the results from the attitudinal and corpus-based data do not completely align. 
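The uniformity idea behind such a concept-based measure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation used in the chapter: it assumes that uniformity for a single concept is the summed term-by-term minimum of the relative frequencies with which two varieties name that concept, which is one common formulation in lectometric work. The function names, the term labels and all counts are hypothetical.

```python
def relative_frequencies(counts):
    """Turn raw counts of competing terms for one concept into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("concept is not attested in this variety")
    return {term: n / total for term, n in counts.items()}


def uniformity(counts_a, counts_b):
    """Overlap of two naming profiles for one concept (assumed definition).

    For every competing term, take the smaller of its two relative
    frequencies and add these minima up. The result is 1.0 when both
    varieties name the concept identically and 0.0 when they share no term.
    """
    profile_a = relative_frequencies(counts_a)
    profile_b = relative_frequencies(counts_b)
    terms = set(profile_a) | set(profile_b)
    return sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in terms)


# Hypothetical counts for one concept in two varieties (invented numbers).
variety_a = {"native_term": 60, "english_loan": 40}
variety_b = {"native_term": 30, "english_loan": 50, "french_loan": 20}

print(round(uniformity(variety_a, variety_b), 2))
```

On this reading, a loanword that is adopted in only one variety lowers the score, because its share in that variety finds no counterpart in the other. A study of many concepts would average such per-concept scores, possibly weighting each concept by its frequency; that step is left out here.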
Yet another way in which native equivalents for loanwords can be accounted for in corpus-based loanword research is illustrated by “Comparing the usage of Māori loans in spoken and written New Zealand English: A case study of Maori, Pakeha, and Kiwi”. In their contribution, Alexander Onysko and Andreea Calude focus on the diachronic and sociolinguistic spread of the Māori loanwords Maori, Pakeha and Kiwi in New Zealand English. The study is based on both spoken and written corpora and includes a wide variety of extra-linguistic features to explain variation in the frequency of these loans, such as ethnicity of speaker and hearer (and the interaction between both), age and gender of the speaker, topic and genre of the conversation for the spoken data and diachronic period, source, topic and newspaper section for the written data. For their analyses, Onysko and Calude rely on state of the art statistical techniques such as generalized linear models (Baayen 2008) and variable-based neighbor clustering analysis (Hilpert and Gries 2009). In the regression analyses, native speaker alternatives are accounted for by including the frequency of these native lexemes as offset term. The descriptive statistical analyses present the proportion of the Māori loanwords over the sum of the loanwords and the native equivalents and reveal how the ethnicity of both speaker and hearer plays a crucial role for the use of Maori in the spoken data: for Maori and Pakeha, the analyses of the spoken and written data both show how the
loans are primarily used to talk about Māori culture, and how Kiwi is mainly used as positive marker of the New Zealand identity. As concerns lectal variation, the written data reveal significant regional differences in the use of Maori and Pakeha, related to the social situation of the North and South Islands. Finally, the diachronic analyses of the written data shed new light on existing claims on the rise of Māori loanwords in New Zealand English.
The following paper, “English loanwords and their counterparts in Dutch job advertisements: An experimental study in association overlap” by Frank van Meurs, Jos Hornikx, and Gerben Bossenbroek, is one of the first to use an experimental set-up to trace semantic differences between loanwords and their native equivalents. Although quite a number of the papers presented above also introduce new methodological perspectives to lexical borrowing research, this paper deserves a special mention for its use of experimental techniques. More specifically, applying the Conceptual Feature Model (De Groot 1992) to the use of English in Dutch job advertisements, the study consists of an association task for English job-ad terms and their Dutch equivalents (completed by sixty university students), and a follow-up norming study to determine the levels of concreteness of the concept expressed and of the cognateness between the source and receptor language terms expressing the concept. Intriguingly, the results show low average overlap between the Dutch and English terms: job-ad related Dutch and English terms in general do not evoke the same types of associations. Moreover, van Meurs, Hornikx, and Bossenbroek show how cognateness (but not concreteness) contributes significantly to explain the attested variation in the degree of overlap in associations for the different word-pairs included in the study.
The penultimate paper of the volume, “On the variation of gender in nominal language mixings”, by Astrid Rothe, also relies on experimental data. More specifically, Rothe presents the results of a forced choice task, used to trace variation in gender assignment to L2 insertions by German-French, German-Italian and German-Spanish bilinguals. In accounting for the attested variation in gender assignment, attention is paid to speaker-related features like language proficiency and semantic features like gender transparency. Measuring statistical significance of the attested patterns by means of chi-square tests, Rothe is able to discern interesting patterns in the attested variation. In the discussion, the results are linked to the ongoing debate between lexical borrowing and single-word codeswitching: while the (as good as) balanced bilingual speakers follow the codeswitching pattern – using the L2 gender – the monolinguals and weakly bilingual speakers follow the borrowing pattern by opting for the gender of the nearest L1 equivalent.
In “Linguistic globalization: Experiences from the Nordic laboratory”, the final paper of the volume, Helge Sandøy demonstrates how results from newer experimental studies and more traditional corpus-based analyses can be brought together to present a more encompassing view on a contact situation. The project described in the paper is not only noteworthy for this novel combination of types of linguistic evidence, but also for the size of the conducted research project. Specifically, the paper focuses on variation in the penetration of English words in Iceland, the Faroes, Norway, Denmark, Sweden, Swedish Finland and Finnish Finland (the Nordic countries). In his paper, Sandøy discusses results from a diachronic corpus-based analysis of newspaper data (comparing the number of loanwords in 1975 and 2000). The same corpus is used to measure variation in the degree of purism found in the different countries (mainly by comparing nativization processes). Next, overt attitudes are measured by asking participants to rank the countries under scrutiny according to their alleged degree of openness to foreign languages, showing how the Danes are considered the least puristic. Interestingly, when comparing these results to those resulting from a matched/verbal guise test (which triggers covert attitudes), Sandøy finds that, subconsciously, it is precisely the Danes who are the most skeptical towards English influence. In a final section, the different patterns found for the seven regions are outlined against information on the macro-economic situation of the different regions in the past decades, illustrating how the speed of economic modernization can to some extent help explain the contrasts found amongst the central Scandinavian languages.
Overall, the papers brought together in this volume are prime examples of the promising new perspectives developed and presented for the study of lexical borrowing. Naturally, many more applications of the proposed innovations can be envisaged, and future research exploring these possibilities is warmly welcomed.
References
Androutsopoulos, Jannis. 2012. English ‘on top’: Discourse functions of English resources in the German mediascape. Sociolinguistic Studies 6(2). 209–238.
Appel, René & Pieter Muysken. 1988. Language contact and bilingualism. London: Arnold.
Arppe, Antti & Juhani Järvikivi. 2007. Every method counts: Combining corpus-based and experimental evidence in the study of synonymy. Corpus Linguistics & Linguistic Theory 3(2). 131–159.
Arroyo, José Luis Blas & Deborah Tricker. 2000. Principles of variationism for disambiguating language contact phenomena: The case of lone Spanish nouns in Catalan discourse. Language Variation and Change 2. 103–140.
Baayen, Harald. 2008. Analyzing linguistic data. A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Backus, Ad. 1996. Two in one. Bilingual speech of Turkish immigrants in The Netherlands. Tilburg: Tilburg University Press.
Betz, Werner. 1936. Der Einfluss des Lateinischen auf den althochdeutschen Sprachschatz. Heidelberg: Winter.
Betz, Werner. 1959. Lehnwörter und Lehnprägungen im Vor- und Frühdeutschen. In Friedrich Maurer & Friedrich Stroh (eds.), Deutsche Wortgeschichte, 127–147. Berlin: Schmidt.
Blank, Andreas. 2001. Einführung in die lexikalische Semantik für Romanisten. Tübingen: Niemeyer.
Bloomfield, Leonard. 1933. Language. London: George Allen & Unwin Ltd.
Cacoullos, Rena Torres & Jessi Elana Aaron. 2003. Bare English-origin nouns in Spanish: rates, constraints and discourse functions. Language Variation and Change 15. 289–328.
Croft, William. 2001. Radical Construction Grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press.
De Groot, Annette M. B. 1992. Bilingual lexical representation: A closer look at conceptual representations. In Ram Frost & Leonard Katz (eds.), Orthography, phonology, morphology, and meaning, 389–412. Amsterdam: Elsevier.
Di Sciullio, Anna-Maria & Edwin Williams. 1989. On the definition of word. Cambridge (MA): MIT Press.
Duckworth, David. 1977. Zur terminologischen Grundlage der Forschung auf dem Gebiet der englisch-deutschen Interferenz. Kritische Übersicht und neuer Vorschlag. In Herbert Kolb & Hartmut Lauffer (eds.), Sprachliche Interferenz: Festschrift für Werner Betz zum 65. Geburtstag, 35–65. Tübingen: Niemeyer.
Fiedler, Sabine. 2012. Der Elefant im Raum … The influence of English on German phraseology. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European Lexis, 239–260. Amsterdam & Philadelphia: John Benjamins.
Field, Fredric W. 2002. Linguistic borrowing in bilingual contexts. Amsterdam & Philadelphia: John Benjamins.
Firth, John R. 1957. Papers in linguistics 1934–1951. London: Oxford University Press.
Furiassi, Cristiano, Virginia Pulcini & Félix Rodriguez-González. 2012. The anglicization of European Lexis. Amsterdam & Philadelphia: John Benjamins.
Geeraerts, Dirk. 2010a. Theories of lexical semantics. Oxford: Oxford University Press.
Geeraerts, Dirk. 2010b. Lexical variation in space. In Peter Auer & Jürgen Erich Schmidt (eds.), Language in space. An international handbook of linguistic variation. Volume 1: Theories and methods, 821–837. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk. 2010c. The doctor and the semantician. In Dylan Glynn & Kerstin Fischer (eds.), Quantitative methods in Cognitive Semantics: Corpus-driven approaches, 61–78. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk, Stefan Grondelaers & Peter Bakema. 1994. The structure of lexical variation. Meaning, naming and context. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk. 1997. Diachronic prototype semantics: A contribution to historical lexicology. Oxford: Clarendon.
Geeraerts, Dirk, Gitte Kristiansen & Yves Peirsman. 2010. Advances in Cognitive Sociolinguistics. Berlin & New York: Mouton de Gruyter.
Goldberg, Adèle E. 1995. Constructions: A Construction Grammar approach to argument structure. Chicago: Chicago University Press.
Gómez Rendón, Jorge Arsenio. 2008. Typological and social constraints on language contact: Amerindian languages in contact with Spanish. Utrecht: LOT.
Gottlieb, Henrik. 2012. Phraseology in flux: Danish anglicisms beneath the surface. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European Lexis, 169–198. Amsterdam & Philadelphia: John Benjamins.
Grondelaers, Stefan & Dirk Geeraerts. 2003. Towards a pragmatic model of cognitive onomasiology. In Hubert Cuyckens, René Dirven & Johan Taeldeman (eds.), Cognitive approaches to lexical semantics, 67–93. Berlin & New York: Mouton de Gruyter.
Haugen, Einar. 1950. The analysis of linguistic borrowing. Language 26(2). 210–231.
Haugen, Einar. 1953. The Norwegian language in America: A study in bilingual behavior. Philadelphia (PA): University of Pennsylvania Press.
Hilpert, Martin & Stefan Th. Gries. 2009. Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing 24(4). 385–401.
Hock, Hans Henrich & Brian D. Joseph. 1996. Language history, language change, and language relationship: An introduction to historical and comparative linguistics. Berlin & New York: Mouton de Gruyter.
Hoffer, Bates L. 1996. Borrowing/Lehnvorgänge/Emprunt linguistique. In Hans Goebl, Peter Hans Nelde, Zdenek Stary & Wolfgang Wölck (eds.), Kontaktlinguistik: ein internationales Handbuch zeitgenössischer Forschung/Contact linguistics: an international handbook of contemporary research/Linguistique de contact: manuel international des recherches contemporaines, 541–548. Berlin & New York: De Gruyter.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar, Vol. I: Theoretical prerequisites. Stanford (CA): Stanford University Press.
Marti Solano, Ramon. 2012. Multi-word loan translations and semantic borrowings from English in French journalistic discourse. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European Lexis, 199–216. Amsterdam & Philadelphia: John Benjamins.
Matras, Yaron. 2009. Language contact. Cambridge: Cambridge University Press.
Moon, Rosamund. 1998. Fixed expressions and idioms in English. Oxford: Clarendon Press.
Muysken, Pieter. 2000. Bilingual speech: A typology of code-mixing. Cambridge: Cambridge University Press.
Myers-Scotton, Carol. 2002. Contact linguistics: Bilingual encounters and grammatical outcomes. Oxford: Oxford University Press.
Oncins-Martinez, José Luis. 2012. Newly-coined Anglicisms in contemporary Spanish: A corpus-based approach. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European Lexis, 217–238. Amsterdam & Philadelphia: John Benjamins.
Onysko, Alexander. 2007. Anglicisms in German. Borrowing, lexical productivity and written codeswitching. Berlin & New York: Walter de Gruyter.
Onysko, Alexander & Esme Winter-Froemel. 2011. Necessary loans – luxury loans? Exploring the pragmatic dimension of borrowing. Journal of Pragmatics 43(6). 1550–1567.
Pfaff, Carol W. 1979. Constraints on language mixing: Intrasentential code-switching and borrowing in Spanish/English. Language 55(2). 291–318.
Poplack, Shana. 1980. Sometimes I’ll start a sentence in Spanish y termino en español: Toward a typology of code-switching. Linguistics 18. 581–618.
Poplack, Shana & Marjory Meechan. 1995. Patterns of language mixture: Nominal structure in Wolof-French and Fongbe-French bilingual discourse. In Pieter Muysken (ed.), One speaker, two languages. Cross-disciplinary perspectives on code-switching. Cambridge: Cambridge University Press.
Poplack, Shana, David Sankoff & Chris Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26. 47–104.
Ruette, Tom. 2012. Aggregating lexical variation. Towards large-scale lexical lectometry. PhD thesis, University of Leuven.
Sankoff, David, Shana Poplack & Swathi Vanniarajan. 1990. The case of the nonce loan in Tamil. Language Variation and Change 2(1). 71–101.
Sapir, Edward. 1921. Language: An introduction to the study of speech. New York: Harcourt, Brace and Company.
Sharp, Harriet. 2001. English in spoken Swedish. A corpus study of two discourse domains. Stockholm: Almqvist and Wiksell.
Speelman, Dirk, Adriaan Spruyt, Leen Impe & Dirk Geeraerts. 2013. Language attitudes revisited: Auditory affective priming. Journal of Pragmatics.
Stefanowitsch, Anatol. 2002. Nice to miet you: Bilingual puns and the status of English in Germany. Intercultural Communication Studies 11(4). 67–84.
Thomason, Sarah Grey & Terrence Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley (CA): University of California Press.
Thomason, Sarah Grey. 2001. Language contact. Edinburgh: Edinburgh University Press.
Van de Velde, Freek & Eline Zenner. 2010. Pimp my lexis: het nut van corpusonderzoek in normatief taaladvies. In Els Hendrickx, Karl Hendrickx, Willy Martin, Hans Smessaert, William Van Belle & Joop Van der Horst (eds.), Liever meer of juist minder? Over normen en variatie in taal, 51–68. Gent: Academia Press.
Van Hout, Roeland & Pieter Muysken. 1994. Modeling lexical borrowability. Language Variation and Change 6(1). 39–62.
Van Lancker-Sidtis, Diana & Gail Rallon. 2004. Tracking the incidence of formulaic expressions in everyday speech: methods for classification and verification. Language and Communication 24. 207–240.
Weinreich, Uriel. 1970. Languages in contact. The Hague & Paris: Mouton de Gruyter.
Whitney, William D. 1881. On mixture in language. Transactions of the American Philological Association 12. 5–26.
Winter-Froemel, Esme. 2011. Entlehnung in der Kommunikation und im Sprachwandel. Theorie und Analysen zum Französischen. Berlin & Boston: Walter de Gruyter.
Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. Forthcoming. A social analysis of borrowing in weak contact situations: English loanwords and phrases in a Dutch reality TV show. Submitted to International Journal of Bilingualism.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. 2012. Cognitive Sociolinguistics meets loanword research: Measuring variation in the success of anglicisms in Dutch. Cognitive Linguistics 23(4). 749–792.
Zenner, Eline & Dorien Van de Mieroop. Forthcoming. The social function of English in weak contact situations: Ingroup and outgroup marking in the Dutch reality TV show ‘Expeditie Robinson’. Submitted to Multilingua.
Ad Backus
A usage-based approach to borrowability
Abstract: Loanwords are generally thought to be interesting for historical linguistics and for the studies of bilingualism, but they have not featured very prominently in general linguistic theory. This contribution argues that a theoretical renewal, bringing in a usage-based approach, may make looking at loanwords more broadly useful. In particular, they provide valuable data for understanding language change. This will move the field beyond the establishment and explanation of borrowability hierarchies in historical linguistics and similar generalizations on possible sites of codeswitching in contact linguistics, and into the effort to build a usage-based account of language change. Loanwords are well suited, in particular, to studying the degree to which lexical elements are conventionalized in community varieties and entrenched in individual idiolects, both valuable types of empirical data for the study of language change. Several methodological hurdles, however, need attention if we are to make the most of this paradigm shift. The chief problem is the availability of sufficient data. Since funding agencies will not easily fund the building of large corpora of bilingual speech, it is important to develop additional methods. Psycholinguistic experimentation would be a welcome addition to the field of contact linguistics, as it will allow investigating the sorts of questions about loanword integration that a usage-based perspective puts forward. In discussing these issues, the paper attempts to solidify the links between contact linguistics and Cognitive Linguistics, thereby contributing to (1) a better understanding of the phenomenon of borrowing; (2) the account of language contact phenomena in a Cognitive Sociolinguistics framework (more specifically a usage-based account of contact-induced change); and (3) a further appreciation of the methodological issues involved in researching borrowing from these perspectives.
1 Introduction
Loanwords, words that at one point were borrowed from another language, have led a relatively quiet life in linguistics in the last decades. They have figured prominently in historical linguistics and in the literature on bilingual codeswitching, but their impact on general linguistic theory has been negligible. The point I wish to make in this contribution is that this was fairly understandable given the focus on synchronic and syntactic issues in generative theory, but that the theoretical renewal that has set in with the advent of the usage-based approach makes it useful to take a fresh look at loanwords. In particular, they provide valuable data for understanding language change.
Historical linguistics has always had an interest in loanwords as a matter of course. Layers of loanwords in a language tell you something about past contacts between speakers of the host and donor languages, and the nature of the words that were borrowed also tells you something about the nature of the contact. In addition, the study of loanwords gave rise to the issue of “borrowability”: do languages just borrow anything from each other, or are some words more borrowable than others (Bynon 1977: 231)? While most discussions of this question have stuck to loanwords proper, the question has been extended to take in all linguistic elements that languages borrow from each other, including shades of meaning (“semantic extension”), lexical combinations (“loan translations”) and grammatical patterns (“interference”), cf. Weinreich (1964); Johanson (2002); Matras (2009), for recent overviews. Loanwords have most intensively been studied in the field of codeswitching, the phenomenon of mixing two or more languages that is ubiquitous in bilingual communities around the world (Bullock and Toribio 2009). In fact, they have created quite some controversy because several theories, notably those of Shana Poplack and associates (see e.g. Poplack and Meechan 1995), crucially rely on being able to make a principled distinction between a word from the other language that is a loanword and one that is a codeswitch. Though battle lines are clearly drawn (cf. Myers-Scotton 2002; Poplack and Dion 2012), the debate is essentially unresolved; I will argue below that this is because of the strictly synchronic focus that characterizes studies of codeswitching. One of the beneficial effects of adopting a usage-based approach is that this synchronic view must be integrated with a diachronic one. 
The chief reason why a usage-based approach necessitates renewed interest in loanwords is that by definition it portrays change as a design feature of language, because of the way usage directly impacts mental representations. This is a very different agenda from the one that has dominated linguistics in the generative era, when mental representations were assumed to be mostly determined by Universal Grammar, and hence not really changeable at all, except in a superficial sense. Now that the issue is on the agenda, though, some methodological hurdles need to be overcome, since the methods used in contact linguistics are not sufficient for answering the research questions a usage-based approach to loanwords forces us to ask. Given the emphasis on issues such as variation and change, it is natural to situate these efforts in the emerging paradigm of Cognitive Sociolinguistics (Geeraerts and Kristiansen forthcoming). This contribution will first go over the findings about loanwords uncovered by the historical linguistic and codeswitching literatures. Section 3 will then reinterpret these findings from a usage-based perspective and introduce some of the questions that need to be asked, also going a little deeper into the crucial notions of entrenchment and conventionalization. Section 4 outlines what a usage-based account of loanwords could look like. The methodological hurdles facing this field are reviewed in Section 5, and Section 6 discusses some theoretical challenges ahead. The whole paper can be seen as an attempt to solidify the links between contact linguistics and Cognitive Linguistics, thereby hopefully contributing to (1) a better understanding of the phenomenon
of borrowing; (2) an account of language contact phenomena in a Cognitive Sociolinguistics usage-based framework; and (3) a further appreciation of the methodological issues involved in researching borrowing from these perspectives.
2 Studying loanwords
As studies of diachronic developments in languages and of codeswitching amply illustrate, languages borrow words from each other. In intense contact settings, the number of loanwords can grow to impressive numbers, so that there are languages for which the percentage of vocabulary that is considered to have been borrowed from other languages is as high as 50% or more (Thomason and Kaufman 1988; Mufwene 2008). This section will review previous research in these two fields and will conclude that the aspect of loanwords that is particularly relevant for linguistics as a whole is that they are one of the prime manifestations of language change.
Historical linguistics and contact linguistics communicate surprisingly little with each other, given that they could easily be construed as dealing with the diachronic and synchronic aspects of language contact, respectively. However, both have established clines, or hierarchies, of borrowability, which rank parts of speech according to the ease with which they can be borrowed or “switched” (e.g. van Hout and Muysken 1994; Field 2002). This suggests that the appearance of an other-language word as a codeswitch is linked to its possible future status as a loanword; more about this later. In the case of historical studies, these clines are based on lists of borrowed words in a language, while the findings of contact linguistics are generally based on collections of recorded bilingual speech. The findings converge to a large degree: content words are borrowed more easily than function words, nouns are generally borrowed much more easily than other parts of speech, and the least borrowable of all are bound morphemes (cf. Matras 2009: 155). Naturally, various explanations have been offered for this pattern. The question is what causes content words, particularly nouns, to be more “attractive” (Johanson 2002).
Suggested explanations refer to the distinction between open and closed classes, or between content and function words (loanwords add to the lexicon, and you can only add to open classes), the fact that there are more nouns to choose from (cf. Haugen 1972 for all of these explanations), the degree of syntagmatic freedom (nouns are less tied structurally to other words in the sentence, and can therefore be borrowed more easily; cf. Field 2002), or an underlying dimension of semantic specificity: the more specific the meaning of a word, the more attractive it is for other languages, as there is a good chance it would add to that language’s expressive richness (Backus 2001). Many nouns, of course, have highly specific meanings. These factors are unlikely to be independent of each other, but a comprehensive theory of lexical borrowing that unites the various contributions still awaits formulation.
What I want to draw attention to here is that this body of work tends to focus on words only; almost no attention is paid to multi-word units or grammatical constructions. This is perhaps understandable given the traditional division between lexicon and syntax as separate modules, but it is also surprising, for two reasons: (1) codeswitching data feature more than just the use of lone words from the other language (the possible future loanwords); and (2) both synchronic and diachronic contact data show that languages also borrow grammatical constructions. I will discuss these issues in turn.
First, studies of modern bilingual settings ostensibly show how loanwords come to be: bilingual speakers often codeswitch, and one prominent type of codeswitching is the insertion of foreign words into utterances otherwise framed in the base language (Myers-Scotton 2002). Such inserted words may well be future loanwords; in fact, they may well be established loans already (see Section 5). However, if we compare lists of codeswitched items in empirical studies of bilingual corpora with the borrowability hierarchies compiled in historical studies, we see an interesting difference. While the historical loanword layers almost exclusively contain simplex words, with very few multi-word units as exceptions (e.g. le mot juste in English; cf. Bynon 1977; Thomason 2001), insertional codeswitching data, while confirming that content words, primarily nouns, are switched the most, also include many other types of insertions besides simple words. There are many attested examples of inserted phrases and collocations (cf. Muysken 2000; Myers-Scotton 2002; Backus 2003). This raises an intriguing question: what happens to these inserted chunks, phrases and expressions diachronically? Why does only a subset of insertions, i.e. simple words, end up as loanwords? We will return to this issue in Section 6.
The development of insertional switches into loanwords over time is often hinted at, but the codeswitching literature contains little serious discussion about how exactly this process unfolds.
Instead, discussion has focused on a purely synchronic issue: are lone other-language words found in a corpus of bilingual speech instances of borrowing or codeswitching? I will argue in Section 4 that the question makes no sense because the two terms refer to different phenomena that cannot be in complementary distribution. But let us first look at the controversy. Poplack (1980) suggested a few principles that were claimed to govern where and when in a sentence codeswitching is possible. As counterexamples started coming in, Poplack and associates responded with a more articulated theory, which most importantly claimed that many of these examples, usually involving lone other-language words, were actually not instances of codeswitching but of borrowing, an entirely different process (e.g. Poplack and Meechan 1995). Thus, it became important to be able to tell whether a particular case involved a codeswitch or a borrowing; in Poplack’s definition, borrowings behave morphologically and syntactically exactly like native words, i.e. they are inserted into sentences framed by the recipient language grammar without any indication that they are etymologically special. Interestingly, this follows exactly the definition of insertional codeswitching in other works (e.g. Muysken 2000; Myers-Scotton 2002), so that the continuation of the debate is somewhat surprising. Important for our purposes, this debate is purely about synchronic data, while borrowing is so obviously a diachronic notion (Backus 2009). It makes sense to distinguish between synchronic language mixing and diachronic borrowing, but it makes equal sense to relate synchronic data somehow to what happens to the languages involved diachronically. A likely hypothesis is that words inserted as codeswitches in bilingual speech may become more and more acceptable in the wider community and become established loanwords.
Second, historical linguistics includes a group of studies that look at grammatical changes in on-going contact settings. This field enjoys high vitality, and many articles and monographs have appeared in the last two decades (e.g. Aikhenvald 2002; Heine and Kuteva 2005; Verschik 2008). Generally, these studies make use of the tried and tested linguistic modes of description, featuring a strict separation of lexical and structural issues, and largely focusing on the latter. There is now an impressive database of contact-induced grammatical changes in a growing range of languages and contact settings, allowing detailed hypotheses about what is typical and what is not in how languages influence each other. Particularly relevant is that these studies make available a detailed model of how exactly contact-induced change unfolds. For linguistic approaches in which lexicon and syntax are strictly separated, this may not seem very relevant if one wants to know anything about lexical borrowing, but this does not hold for the usage-based approach. If lexicon and syntax form a continuum, the study of loanwords may well benefit from a comparison with the study of grammatical borrowing. In fact, the usage-based approach potentially provides exactly what is needed to integrate both the synchronic and diachronic studies of contact effects, and the study of both lexical and structural contact effects, into a single framework. Before sketching the outlines of this framework, we will review this usage-based approach in the next section.
3 Usage-based linguistics and language change
I will argue in this section that adopting a usage-based approach to linguistic theory (Barlow and Kemmer 2000; Bybee 2006) will help bring together contact linguistics, historical linguistics, and also sociolinguistics, and help them face the challenges outlined above. During the rise of usage-based linguistics in the previous twenty years or so, links with the concerns of sociolinguistics have repeatedly been mentioned. In a sense, it seems astounding that the fields have not embraced each other immediately, since a usage-based approach to mental representation all but calls out for attention to differences between people in their language use, as studied by sociolinguists, while it can provide sociolinguistics with a model of the cognitive organization of language that is much more in line with its central concerns, i.e. variation and change, than the long-dominant generative approach was (cf. Kristiansen and Dirven 2008).
Language change has not featured prominently in recent linguistic theorizing. For the strictly synchronic linguistics of the past decades, the goal was to model the stable and invariant components of linguistic knowledge, usually hypothesized as innate knowledge, and then language change seems a relatively superficial concern, or in any case one outside the scope of linguistics proper. Since usage-based approaches to linguistic competence do attribute direct theoretical importance to actual usage, and therefore also to the social and psychological factors that determine language use, variation and change gain in importance, being built right into the design of language. No two people have identical usage, and within an idiolect there will be constant fluctuations in the use of particular linguistic elements. Change, as a result, is often a matter of “merely” increasing or decreasing frequency of use, rather than the adoption or complete loss of particular forms (cf. Heine and Kuteva 2005). It may be important to realize this, as for many linguists who work outside a usage-based framework, language change requires that the underlying syntactic structure has changed. As usage-based frameworks do not work with “underlying” structures, they do not recognize this restriction.
The blueprint for a usage-based account of language change is given in Croft (2000); also see Backus (2005). Briefly, faced with some communicative task, a speaker has two choices: say something in a familiar way or say it in a creative way. Each utterance involves numerous such choices, as it pertains to the selection of words and how to pronounce them, word combinations, grammatical constructions, particular meanings of words, discourse structures, etc. Each selection does either of two things: it may represent an innovation (referred to by Croft as “altered replication”) or it may replicate a unit already stored in the speaker’s mental representation. In both cases, the selection raises that unit’s level of entrenchment: each instance of use contributes to the strength of the unit’s memory trace.
The frequency with which a speaker encounters a unit is assumed to correlate with its degree of cognitive entrenchment (Gries 2008: 18). Many innovations occur only once and have no further consequences for the speaker’s inventory of linguistic units, his or her linguistic competence. Others, however, are used again, and enter in competition with any already existing units that convey more or less the same meaning. It is not difficult to see that this scenario may be relevant for loanwords: a loanword enters the language as an innovation in the speech of one or more people, it gets used more and more, and at some point it has been conventionalized as a regular word in the language. If there was a native equivalent, it may still be in competition (some people use the native word, others use the loanword, or people use both words some of the time), so that both have some degree of entrenchment. If there was no native equivalent to begin with (as with “lexical gaps”), the loanword will be conventionalized if the concept it stands for is a useful one, in the sense that people talk about it often enough. In this perspective, the explanation of contact-induced language change, including the life cycle of loanwords, comes down to two things: explaining the social determinants of language use and the way our cognitive system deals with this, both in terms of synchronic processing and of diachronic storage. The latter issue will be picked up in Section 3.2; first, Section 3.1 will go into the role of social factors.
3.1 Social determinants of language use
We use language according to the needs of the communicative situations we find ourselves in. Social factors, therefore, broadly determine how we talk, and they include the person-related factors that make up one’s social identity (age, gender, social group, ethnicity, etc.), community-related factors that determine the norms of behavior (e.g. which language or register is deemed suitable for a particular exchange), and the setting-related factors that characterize the communicative situation (genre, personal relations, topic, mood, etc.). In bilingual situations, this also involves how the relative status of the languages and the intensity of the contact determine language choice.
A tough theoretical nut to crack for sociolinguistics is that there is a wide gap between these social determinants and actual language use, in the sense that these factors do not directly determine how you speak. Usage is influenced at more concrete levels than the broad-brush community-based factors commonly considered in sociolinguistics and contact linguistics. Though these factors ultimately help explain any individual’s usage, there are still many basic-level factors that determine usage at a more subtle level, such as who one’s friends are, what one’s hobbies and interests are, what job one has, and many more ephemeral aspects of life, such as the setting-related factors mentioned earlier, but also whether one is currently job hunting or not, spends much time in bars, etc. While the macro-level factors determine one’s repertoire in terms of the languages, varieties and features one masters, it is likely, at least, that the basic-level factors exert considerable influence on one’s inventory of lexical and constructional forms, particularly on the degree to which they are entrenched in one’s idiolect.
Particularly relevant for the diffusion of loanwords is whether or not the community in question is permissive of their use.
Assuming that in multicultural settings, many concepts are best conveyed by words from different languages, there may be a conceptual need for loanwords, but there may also be social blockage of them. If successful establishment of loanwords depends on their frequency of use (see also the next subsection), then it must make some difference whether or not it is deemed okay to use foreign words at all. Communities and subgroups within them differ in the degree to which they cherish purism. In so-called “focused” communities (Le Page and Tabouret-Keller 1985) linguistic boundaries are tightly controlled. In such communities, it is a prominent part of the meaning of any linguistic element, especially words, to which language they “belong”, while in “diffused” communities, on the other hand, this plays a much less important role. Diffused settings are conducive to using words and structures from other languages; focused ones are not. The limiting case of this freedom may be switching between styles of the same language, or between idiolects. Loanwords provide an obvious target for purism, for a variety of reasons. Foreign languages may stand for, or index, certain norms and values that are deemed alien or incompatible with the norms and values associated with the native language. In addition, foreign words stand out more, and are, therefore, an easier target than grammar,
or words from the same language but associated with a different register, or with a different speaker. Hill and Hill (1986) provide a telling illustration of the pragmatics of using Spanish loanwords in Mexicano, in a community characterized by this kind of linguistic tension. Tension calls for heightened awareness of how one speaks, and the effect is that the use of loanwords is monitored. Obviously, this has its implications for the degree to which different kinds of speakers use them. An extreme case of this is discussed by Aikhenvald (2002), who shows that speakers of the Amazonian language Tariana, despite intense contact with the locally dominant East Tucanoan, have borrowed almost no words from that language. At the same time, their grammar is full of borrowed constructions: the difference illustrates that metalinguistic awareness has real influence on mental representations, through its direct influence on language use.
3.2 Cognitive determinants of language use: Entrenchment and conventionalization
Above, change was conceptualized as the increase or decrease of the degree of entrenchment of a particular form-meaning unit. These entrenchment levels, in turn, influence further usage: the more entrenched a unit is, the higher the chance it will be selected again next time the meaning it codifies needs to be expressed. This process, amply demonstrated in the psycholinguistic literature, can be referred to as the cognitive determinants of language use, as opposed to the social determinants discussed in the previous subsection. At the same time, though, it needs emphasizing that selection ultimately resides in the social needs to which language is put. However, well entrenched units are selected more or less automatically in running speech, often without any conscious attention to the selection process, and for this reason it makes sense to recognize that there are both social and cognitive determinants of language use.
The theoretical model of usage-based linguistics hypothesizes that entrenchment, or strength of storage, is a cognitively real phenomenon. It further hypothesizes that degree of entrenchment is determined by frequency of use: the more an element is used, the more entrenched it is. Note that this is a strictly speaker-based notion: entrenchment is defined as a property of the individual language user’s mind. However, what is normally studied in linguistics is the degree to which linguistic units are in general use, which means we trust individuals not to differ too much from each other. That is, what we are often interested in is community-based conventionalization. This is a sociolinguistic notion which refers to the degree to which an element, say a loanword, has become a conventional lexical choice for the various members of the community. If all members use it, it is fully conventionalized as a normal word in the language.
However, this should not be confused with what we could call person-based entrenchment. This is a psycholinguistic notion that deals with the degree to which a particular speaker knows the word. Theoretically, a loanword may be the conventional choice for just a few people in the community, so that it is a convention (an established loanword) for them and a highly entrenched part of their inventories, but it might never be used by others, so that it is not entrenched for them. In that case, we could not really see it as a conventionalized loanword in the variety spoken in the bilingual community. None of this is unique to loanwords, of course: the question of how well individual entrenchment and community convention correlate holds for all words, not just borrowed ones; in fact, it holds for all linguistic units.
To assess how established a particular unit is, we essentially need two measures: (1) the social measure of how many people use it (a measure of the degree of conventionalization); and (2) the individual measure of how well entrenched it is in the linguistic competence of a representative individual speaker. The two measurements do not have to agree: the hypothetical example of the skewed distribution of loanword use mentioned in the previous paragraph shows that entrenchment and conventionalization are independent constructs, up to a point. Of course, if something is not entrenched very well in anyone’s mental representation, it is not conventionalized in the community either. For a theory of language change, however, it is desirable to be able to say something about whether a change is propagating at the community level. That opens up tricky questions: in how many individuals in a community should we see some entrenchment, and how high should these levels of entrenchment be?
Cognitive Linguistics has grappled with these questions for a while now. The field has exploded in recent years with empirical investigations into entrenchment, mostly making use of corpus analyses, psycholinguistic experimentation, and the combination of the two (see Wiechmann 2010 for a critical review). Corpora provide data on frequency, which can be seen as relevant for both conventionalization and entrenchment.
Forms that are used by many different people, in different genres, can be considered conventionalized. Frequency in corpora is also used as a measure of entrenchment, however, and this is a bit more problematic. A representative corpus pools linguistic data from many different speakers, often from different social backgrounds, across different speech genres. This means that quite some individual variation may become invisible, so that using the corpus frequency of a particular linguistic unit as a measurement of the entrenchment of that unit in a random speaker represents a leap of faith. To deal with this problem, many studies have started to look for “converging evidence”, by checking whether frequency patterns could be seen to correlate with psycholinguistic measurements of the strength of storage. Experimental techniques used for this purpose include variations on the conventional judgment task, such as Magnitude Estimation, lexical decision and speeded grammaticality judgment tasks (Schönefeld 2012). Encouragingly, studies in Cognitive Linguistics often find good results when attempting to correlate corpus frequencies and behavioral or psycholinguistic measures. Elements that are frequent elicit shorter reaction times, for example, in lexical decision experiments, or are judged more acceptable.
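The difference between the two measures, and the way pooled corpus counts can hide it, can be illustrated with a small sketch. Everything in it is hypothetical: the four speakers, the tokens, and the function names are made up for illustration, and relative frequency per speaker is only a crude stand-in for entrenchment, not a validated psycholinguistic measure.

```python
from collections import Counter

# Hypothetical toy data: speaker id -> tokens produced by that speaker.
# "LOAN" stands for some loanword, "NATIVE" for its native equivalent,
# and "x" for any other word. None of this is real corpus data.
speech = {
    "S1": ["LOAN", "x", "LOAN", "x", "LOAN", "x"],
    "S2": ["LOAN", "x", "LOAN", "x"],
    "S3": ["NATIVE", "x", "x", "x"],
    "S4": ["NATIVE", "x", "x", "x", "x"],
}


def conventionalization(word, data):
    """Share of speakers who use the word at least once (social measure)."""
    users = sum(1 for tokens in data.values() if word in tokens)
    return users / len(data)


def speaker_rates(word, data):
    """Relative frequency of the word per speaker (rough entrenchment proxy)."""
    return {s: Counter(t)[word] / len(t) for s, t in data.items()}


def pooled_rate(word, data):
    """Relative frequency after pooling all speakers, as a corpus count would."""
    hits = sum(Counter(t)[word] for t in data.values())
    return hits / sum(len(t) for t in data.values())


print(conventionalization("LOAN", speech))
print(speaker_rates("LOAN", speech))
print(round(pooled_rate("LOAN", speech), 3))
```

In this toy setup half of the speakers use the loanword, each in half of their tokens, while the other half never use it. The pooled rate (5 out of 19 tokens) describes none of the four speakers, which is the leap of faith discussed above.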
In conclusion, a usage-based approach logically entails that variation and change are essential design features of language. In fact, since it claims that usage is more or less the same as performance, and that performance directly influences competence, it provides the performance-based linguistic theory sociolinguistics has long called for. That entails, in turn, that a usage-based approach calls for the unification of sociolinguistics and general linguistics: if variation and change are central features of language, linguistic theory needs to account for them in an integrated theory of mental representation. In the rest of this contribution, I will sketch how this can be done in the domain of loanwords.
4 A usage-based account of loanwords
Sections 2 and 3 have set the stage for the usage-based account of lexical borrowing that will be developed in the present section. Loanwords instantiate contact-induced lexical change, and as such, a full model must account for their rise (the innovation aspect), their success (the propagation aspect), and possibly their fall, too. Accounting for innovation, it must be shown what makes particular words attractive in particular situations; accounting for propagation entails first of all measuring their degrees of conventionalization in the community and entrenchment in individual speakers.
Loanwords provide what may conceptually be the easiest type of contact-induced change. As their foreign origin is beyond doubt, there will rarely be discussions about the possible pre-contact presence of the word in the receiving language, an issue that makes suspected cases of contact-induced grammatical change often very hard to prove (Thomason and Kaufman 1988). The pre-contact entrenchment level of a loanword in the speech of individual bilinguals will have been zero. During contact, however, as the change they instantiate is being propagated, entrenchment levels fluctuate somewhere between low and high, depending on whether the individual uses it or not, whether people around her use it or not, and the extent to which it is used.
The issue of innovation relates to what Weinreich, Labov, and Herzog (1968) formulated as the Actuation Problem, the most basic question one can ask about language change: why did it happen when and where it did? Tailored to the domain of loanwords: why was the word borrowed?
Section 2 mentioned various aspects that help make a particular word “attractive” in a particular situation, and a usage-based approach has little to add that is specifically “usage-based”, except perhaps that both high frequency and a high degree of perceptual salience may help a foreign word’s chances of getting noticed, itself a prerequisite for getting selected as an insertional codeswitch.
Where the usage-based approach does have something to add is in another one of Weinreich, Labov, and Herzog’s (1968) problems, the Transition Problem. How can we know whether a foreign word that we encounter in a particular language represents an established change in that language, an ongoing change, or only an incipient change that we managed to catch in its early stages, and how did it acquire its current status? For example, when an individual bilingual speaker, for instance a Turkish-Dutch speaker in The Netherlands, uses a particular Dutch word in her Turkish, we do not know to what degree that word is an established loanword in her Turkish, let alone in Immigrant Turkish in general. In an analysis of bilingual speech, this occurrence will normally be analyzed as a case of insertional codeswitching (or “nonce-borrowing”, cf. Poplack and Meechan 1995, which amounts to the same thing). However, this tells us nothing about the degree of community conventionalization.
Perhaps another word is in order here about the difference between borrowing and codeswitching. Recall from Section 2 that in the codeswitching literature, this has proved to be a very divisive issue, evaluations of the value of a theoretical proposal sometimes hinging on the question whether a particular counterexample should be classified as a codeswitch or as a case of borrowing. To my mind, this debate is misguided, because a foreign-origin word can be both: borrowing and codeswitching are not mutually exclusive like that. Borrowing is a diachronic process, while codeswitching is a synchronic event. The Dutch word used in a Turkish sentence can thus be both: it might synchronically represent the use of a Dutch word in Turkish, and diachronically a more or less established loanword in this particular variety of Turkish. The synchronic use of the word in question in a particular sentence recorded for a corpus cannot tell you much about the degree to which the word is integrated into the receiving system. To assess its status as a loanword, we would need information on its degree of entrenchment in the idiolect of the speaker, and its degree of conventionality in the speech community of which the speaker is a member. A loanword is a foreign-origin word which is, to a certain extent, an accepted and established lexical item in the borrowing language.
A codeswitch is a shift in mid-utterance or mid-discourse to material from the other language. In a bilingual context, these two categories do not exclude each other. What is needed for a word to be a loanword is that it is used often enough. What is needed for something to be used as a codeswitch, is some awareness of the foreign etymological origin. It is easy to see that in a bilingual situation, where speakers are bilingual and thus know of any word whether it’s originally Dutch or Turkish, both conditions can apply to the same word at the same time. The ability to recognize a word as Dutch is independent of its conventionalization as a relatively widely used loanword. Speakers may even still actively use it for codeswitching in the literal sense, as a temporary switch to Dutch. This can be done for any number of pragmatic reasons, such as attention grabbing, emphasizing, etc. However, it stands to reason that this potential decreases with increasing entrenchment, since the effect of high entrenchment is to make the word in question more and more normal, and hence unnoticed. The more entrenched, the less its Dutch-origin nature stands out. As long as the population is bilingual, though, this potential can never be zero. Complicating the issue, the extensive literature on bilingual speech makes it clear that there is considerable variation across speakers in codeswitching patterns. From a usage-based perspective, this means there must also be considerable variation in the
degree to which particular foreign-origin words are entrenched in speakers’ mental representations. From the perspective of Cognitive Sociolinguistics, then, what is interesting about borrowing is what it can tell us about the nature of language change. The cognitive interest centers on issues of entrenchment: how entrenched is a putative loanword (and therefore, to what degree can we say that the language has undergone change)? The social interest of the issue lies in the tension between the individual nature of entrenchment and the social nature of conventionalization. If the loanword is entrenched to different degrees by different speakers, then for whom is it entrenched more, and why? These questions are all part of the usage-based reformulation of the Transition Problem. With the theoretical account more or less sketched out, we can turn to the data needed to test its claims. The next section will conclude that there are a few problems to overcome.
5 Methodological hurdles
Several methodological challenges plague the usage-based investigation of loanwords, chief among them the availability of sufficient data. The previous section has argued that it is important to find out how pervasive a particular loanword is in current language use. Of course, one can search dictionaries to see whether the loanword appears in them, and if it does, it is probably conventionalized to some degree. Loanword dictionaries, in fact, exist for many of the major languages, and they give a fine perspective on past contact situations and the degree to which the language has participated in the global flow of cultural influences. However, they are less useful for the research questions motivated by a usage-based approach: Which loanwords are really in current use? And how frequently are they used, and by how large a percentage of the population? Who uses these loanwords and who does not? Is their frequency of use purely determined by the number of times the concept they encode is needed, or are they (still) in competition with a native equivalent? To what degree is their usage dependent on communicative, contextual and stylistic factors?
For many of the major languages, large spoken corpora are of course available, and corpora of written data, such as newspaper archives, are relatively easy to come by. The spread of loanwords could be investigated by mining these monolingual corpora, but we should bear in mind that this will only provide information about loanwords in one type of context. Loanwords enter languages either through face-to-face contact between bilinguals (see above) or through the intermediary of elite bilinguals. The latter type is not unimportant, and is probably responsible for many of the Latin and Greek internationalisms in most of the world’s modern languages.
Its latest incarnation is the globalization-induced spread of English words worldwide through the media: extensive knowledge of English and daily conversational use is not necessary for English words to spread successfully around the globe. The tools of corpus linguistics can most certainly be used to investigate this type of loanword. The Corpus of Spoken Dutch (CGN), for example, a 10-million word sample representative of spoken registers in Holland and Flanders, will contain many English words; identifying the frequency and contexts of their use, and characteristics of the speakers who use them, will go some way towards answering some of the abovementioned questions. Globally, it would show the extent to which globalization affects the Dutch lexicon. On the other hand, even spoken corpora tend to be relatively limited in the amount of everyday informal interaction they can include. They tend to make liberal use of data that are easy to process, such as public lectures, so their representativeness of the ambient language should not be overstated. It is likely that significant improvements will be made in the near future, e.g. with the help of automatic speech recognition, or the use of enormous repositories of subtitles (which often represent relatively ordinary dialogue). Note, also, that large corpora are normally not tagged for etymological origin of the words that are used, so that identifying loanwords will be much work.
More generally, though, these corpora will tell us little about how loanwords spread in bilingual communities, the empirical domain of contact linguistics. It is probably unrealistic to think that the availability of bilingual corpus data will improve much. Borrowing tends to be from the dominant language in society into a dominated, low-status language, often the language of an immigrant or indigenous minority group. Immigrants are prone to shift to the majority language at some point; both factors make it unlikely that any funding agency will spend much money on building a large corpus of the minority language. On the other hand, social developments in bilingual life (use of Internet-based modes of communication) and technological developments in “E-Humanities” (e.g.
new extraction techniques) may make more tools available than can currently be envisaged. The situation is better for indigenous minority languages, though, especially if they are threatened with shift and death. Conservation and documentation of the language may be perceived as a matter of national interest, of preserving an essential heritage. On the other hand, especially for such languages the corpus that might result may not represent everyday usage very well, and thus not accurately reflect current loanword usage. Language documentation of endangered languages will often be designed to maximize monolingual language use, while everyday speech may have a high proportion of codeswitching with the language that the population is shifting to.
The Turkish corpus collected by the author and associates is typical. It is as large as any bilingual corpus one is going to find, consisting of about half a million words of spoken Turkish conversation (cf. Doğruöz and Backus 2007). About two thirds of it was collected from bilingual speakers in The Netherlands, from both first and second generation speakers. The rest of the corpus served as control data, and were collected in Turkey, in the same place as where most of the immigrants in the bilingual corpus had their roots (the central Anatolian town of Kırşehir). All data come from spoken everyday interaction: most were interviews with one to three individuals, conducted by an interviewer unknown to them before the recording. The recordings have yielded
a stylistically fairly homogeneous set of data, which has been stored in a machine-readable form. This is a fairly typical corpus for contact linguistics; similar databases have been built elsewhere, including other ones for Immigrant Turkish. The gold standard is perhaps provided by the corpora built under the supervision of Shana Poplack in Ottawa (see www.sociolinguistics.uottawa.ca).
Useful as corpora like these may be, tracing the diffusion of individual Dutch loanwords in Immigrant Turkish is impossible with them. Basically, it is a problem of numbers: the corpus is simply too small. Loanwords tend to be content words, and even frequent content words do not occur that often in a corpus of half a million words, if at all. The tools of corpus linguistics cannot easily be applied to this question, since no large-scale bilingual corpora are available, and presumably never will be. Given that loanwords tend to have relatively specific meanings (Backus 2001), the typical loanword will have a low token frequency. Often, the corpus will not turn up even a single instance of a particular foreign-origin word that may well be an established loanword in the vernacular of the community. In fact, Mollin (2009: 370) claims that even a corpus of 100 million words will be too small when it comes to the investigation of the use of medium- or low-frequency collocations. In addition, lexical diffusion tends to be determined by social factors, such as the social background of the speaker, and his or her communicative goals, while a corpus such as ours was kept as homogeneous as possible in order to be able to compare bilingual and monolingual Turkish. This typifies corpora of this kind, which aim to maximize the number of comparable utterances, rather than document the extent of variation. That is, there is little stylistic variation between recordings and little social variation between speakers.
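The size problem can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original study: it assumes a simple Poisson model of word occurrence, and the per-million rates and the function names `expected_tokens` and `prob_unattested` are hypothetical, chosen only for illustration. Real word distributions are burstier than a Poisson model allows, so the estimates are rough and likely optimistic.

```python
import math


def expected_tokens(per_million: float, corpus_size: int) -> float:
    """Expected number of tokens of a word, given its rate per million words."""
    return per_million * corpus_size / 1_000_000


def prob_unattested(per_million: float, corpus_size: int) -> float:
    """Probability that the word never occurs, under a Poisson assumption."""
    return math.exp(-expected_tokens(per_million, corpus_size))


# Roughly the size of the spoken Turkish corpus described in the text.
CORPUS_SIZE = 500_000

# Hypothetical usage rates (tokens per million words) for three loanwords.
for rate in (0.5, 2.0, 10.0):
    expected = expected_tokens(rate, CORPUS_SIZE)
    missing = prob_unattested(rate, CORPUS_SIZE)
    print(f"{rate:>5} per million: expect {expected:.2f} tokens, "
          f"P(no tokens) = {missing:.3f}")
```

Under these assumptions, a word used twice per million words is expected to occur only once in half a million words, and has a sizeable chance of not occurring at all, which is the situation the text describes for established but specific loanwords.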
So, for the vast majority of bilingual settings, there are no large and balanced corpora, and therefore there are no frequency data that provide a reliable picture of how widespread a loanword is. There are various additional problems if such corpora are to be used to investigate the question of loanword diffusion. First, the speakers captured on tape are few. Second, informants for codeswitching studies will often have been selected precisely because they codeswitch a lot, which is useful if the structure and pragmatics of codeswitching is the object of research, but it is clear that these speakers only cover part of the range of sociolinguistic variation present in the community. Loanwords used by them may not be used by everybody. Third, the conversation captured on tape may not be representative of community interaction either. Often, a corpus of bilingual speech consists of only a few, or even just one, recording. It is, therefore, unlikely to capture the full communicative repertoire of the community.
Still, corpus linguistics makes several tools available that have not been explored at all yet in connection with loanwords, as far as I know. A collostructional analysis (Gries and Stefanowitsch 2004) of loanwords could, for instance, show that speakers prefer to use loanwords in particular parts of a clause, such as the periphery (Treffers-Daller 1994) or special sites for loanwords (Poplack and Meechan 1995). The only interpretative step that may be defensible is some form of extrapolation. It stands to reason that if the use of a loanword is captured in such limited data, it
probably is a word that is in general use in the community. This can then be checked, in at least two ways. One solution would be to search for more data concerning this particular word, e.g. by browsing Internet forums and blogs using the community language; the other one, especially advocated here, is to use these words as stimulus items in judgment tasks or as the basis of discussion in focus group interviews. Essentially, the question posed to informants then would be something like “I found you use this loanword in everyday conversation; how widespread is it really? Do you and/or people around you indeed use it freely?” There are, thus, reasons to invest in alternative methods for investigating the social diffusion of loanwords beyond the difficulty of building suitable corpora. Recall that Cognitive Linguistic studies often combine corpus research with psycholinguistic experimental work, to search for converging evidence. In the study of loanwords, too, community conventionalization and individual entrenchment can be investigated with experimental data. Entrenchment levels of individual loanwords could be elicited through judgment or acceptability tasks, for example. As far as I know, this has not been done yet, though we have piloted some methods recently (Backus, Demirçay, and Sevinç forthcoming). We gave participants sentences to judge, asking them to rate how normal they sounded to them, to what degree they thought they could well hear the sentence around them, or use it themselves. The results were promising, in that participants could do the task, and rated sentences that are common in corpus data from Immigrant Turkish as very acceptable. We also think this way of asking the questions made sure that participants judged the sentences irrespective of any positive or negative attitudes they might have towards Dutch influence on their Turkish, but this is not easy to prove. There are many aspects of the design of such studies still to be worked out. 
In addition, they have not been done yet on loanwords, which, because of their salience, are even more prone to evoke attitudes rather than judgments on relative frequency of occurrence. Another task that could be envisioned is one modeled on the phrasal decision task designed by Arnon and Snider (2010), who asked participants to judge whether particular English multi-word sequences (of four words each) were common units in English or not. The results confirmed their hypothesis that frequent combinations such as don’t have to worry are processed faster than less frequent ones (e.g. don’t have any place). Such decision tasks could also be done with loanwords, testing the hypothesis that attested loanwords will be processed faster than non-attested or unlikely ones. However, here too, there are some aspects of the design that would need to be worked out first. For a general overview of experimental tasks and an assessment of their naturalness, see Gilquin and Gries (2009).
To summarize, we obviously will not have as good a basis for frequency data for contact varieties as we have for the larger world languages, but experimental measurements are surely within reach. Psycholinguistic tasks can be used to track the degree to which individual loanwords are deemed to be in common use in bilingual populations. Arguably, they provide better data on this issue than corpus data would, even if
we did have a large corpus at our disposal. Ideally, both types of data provide converging evidence.
6 Challenges
One of the more urgent tasks for contact linguistics in the immediate future is to develop the methodology for obtaining experimental data. We would then be able to empirically establish the success rate of loanwords, as well as the maintenance rate of native equivalents and rivals. As a usage-based account of lexical borrowing is developed, and the required methodology is worked out as well as possible, new questions appear on the horizon. They contribute to the potential of loanwords to inform linguistic theory with important insights.
A first challenge is that a theory of loanwords would be rather limited in scope. If lexical borrowing is indeed just one type of contact-induced change, it should be compared with other types. The innovation and propagation of loanwords needs to be placed within a larger theory of contact-induced change that also takes into account loan translation, semantic extension, and all kinds of grammatical change (Croft 2000; Backus 2005). Is there a trade-off between using loanwords and employing loan translations? Is there a direct link between loanword usage and the use of foreign-origin grammatical features? In a collostructional analysis, the collostructional strength of foreign words and a particular foreign structure could be checked: if there is a significant attraction, then using foreign words appears to push the entrenchment of the foreign construction. That would constitute evidence that lexical codeswitching is a mechanism for contact-induced grammatical change, a hypothesis sometimes hinted at, but so far not empirically demonstrated. Usage-based approaches tend to place lexical and structural elements of language on a continuum (cf. Langacker 2008), and thus one may hypothesize that the mechanisms of borrowing are the same for specific units (i.e. words and expressions), schematic units (i.e. grammatical patterns) and partially schematic units (i.e. constructions in the sense of Construction Grammar).
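The attraction check mentioned above can be sketched in a few lines. This is an illustrative sketch only: it assumes that collostructional strength is operationalized, as is common in this line of work, through a one-sided Fisher exact test on a 2x2 table of clause counts. The function name `fisher_attraction_p` and all counts are hypothetical and do not come from the Immigrant Turkish corpus.

```python
from math import comb, log10


def fisher_attraction_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for attraction in a 2x2 table.

    a: clauses with the foreign construction and a foreign word
    b: clauses with the foreign construction and no foreign word
    c: clauses without the construction but with a foreign word
    d: clauses with neither
    """
    row1 = a + b          # all clauses with the foreign construction
    col1 = a + c          # all clauses with a foreign word
    n = a + b + c + d     # all clauses
    total = comb(n, col1)
    # Sum the hypergeometric probabilities of every table in which the
    # co-occurrence count is at least as high as the observed one.
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / total


# Hypothetical counts, for illustration only.
p_value = fisher_attraction_p(30, 70, 60, 840)
# Collostructional strength is often reported as the negative log of p.
print(f"p = {p_value:.3g}, strength = {-log10(p_value):.2f}")
```

A small p-value would indicate that foreign words and the foreign construction co-occur more often than chance predicts; whether that reflects a causal push towards entrenchment remains the empirical question raised in the text.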
While above we have considered the limitations of the existing corpora of bilingual speech for the analysis of lexical borrowing, they can be, and are, used to investigate the diffusion of syntactic patterns. Analyses of sociolinguistic variation, of course, often rely on the quantitative analysis of just this kind of data. It is possible, for example, to track the use of a particular African-American English feature, such as copula be, in a large corpus of American English and check to what extent it has penetrated general usage. Similarly, even with a modest corpus of Immigrant Turkish, it is possible to track the occurrence of “native” SOV and “borrowed” SVO order (cf. Doğruöz and Backus 2007).
A second challenge involves the mechanism by which units become entrenched. In particular, it is not so clear when we should decide that the use of a particular unit is similar enough to what is stored in the mind so that it can be assumed to further
strengthen that unit’s entrenchment. Every unit combines a form and a meaning. On the form side, there is not much of a problem, especially as long as we limit the discussion to words. However, forms tend to show polysemy, and hence we have to ask: how much variation in meaning do we allow and still see the word as “the same”? Should all meanings be taken together, so that each occurrence, no matter what the specific contextually determined meaning, contributes to the entrenchment of the same form-meaning unit? There does not seem to be an easy answer to this question. What seems to make sense, though, is to assume that in case of true polysemy (rather than e.g. homonymy), all uses count. If a Turkish speaker in Holland uses the Dutch word feestje ‘party’ several times, it may alternately refer to different kinds of parties, but by and large it all contributes to the entrenchment of the unit that comprises this form and a generalized meaning of ‘party’, glossing over the differences within a range of types of party. It may or may not overlap with the meaning of a small number of Turkish equivalents, such as eğlence and parti (most likely, for most people the Dutch word will take on the specific connotation of a party done the Dutch way, thus making the meaning that is being entrenched relatively specific). Specificity helps the putative loanword in its competition with any native equivalent, as the specific meaning may make it more salient, or suitable, in many of the contexts where in principle all three words would suffice. Encyclopedic characterization of meaning is key: a foreign word’s attractiveness may lie solely in its pragmatic impact or in the fact that the language it originates from has an association with cultural change (modernization, globalization, etc.).
There are numerous analyses in the codeswitching literature that suggest this, whenever examples are presented of insertions with highly specific cultural meanings or where switching is done to achieve a reference to a more powerful code (cf. Hill and Hill 1986).
Pre-contact, it is obvious that the entrenchment of the Dutch word is zero. But what is the pre-contact level of entrenchment of the Turkish words? Should we set them at 100%? That would only be justified if entrenchment can reach levels where further activation does not really do anything anymore. In reality, it is certainly imaginable that monolingual Turkish speakers, given the right methodology, will be shown to have different levels of entrenchment for eğlence and parti. In the contact situation, the entrenchment of each of the three words will be more than zero for most bilinguals; but how high they should be set seems to be an empirical question once the methodology is in place to measure degree of entrenchment. Has the Dutch word reached the same levels as its Turkish counterparts? Has one or both of these decreased its entrenchment level? Are the figures for the words related, so that for any individual speaker the entrenchment of one word predicts the entrenchment of the others? That would be a useful hypothesis, since the words may be expected to be in competition.
Finally, it needs to be investigated what happens to the lexical chunks larger than single words that appear in such great numbers as complex insertions in codeswitching data, but that fail to make the cut in lists of loanwords. The vocabulary of a borrowing language is enriched with a number of loanwords, but borrowed phrases and
collocations are few and far between. It is because of their rarity that the occasional French phrase (such as je ne sais quoi, or le mot juste) borrowed into English becomes a contact linguistic cause célèbre. In the past, this question was never asked because codeswitching studies showed little interest in how the use of foreign-origin elements developed over a longer time period, i.e. diachronically, while for the field of contact-induced change these cases are too lexical to be of more than passing interest. For the type of usage-based approach sketched here, however, the question is interesting. If Turkish speakers in Holland sprinkle their Turkish abundantly with Dutch phrases and other larger chunks of discourse, as they have been observed to do, one would expect these phrases to become more and more entrenched in the competence of these speakers. The phrases in question are often adjective-noun (cf. short cycle) or verb-object collocations (cf. run a program), fixed and idiomatic prepositional phrases (cf. for what it’s worth) and assorted semi-idiomatic turns of phrase (cf. doesn’t matter). Could a Dutch Turkish develop in which these collocations and idioms become established loans? It would be at odds with what we normally see in loanword layers.
Various explanations are possible. It could be that most languages that incorporate this much foreign material eventually die, as their speakers simply shift completely to the other language. A usage-based hypothesis for this scenario would be that the foreign phrases tend to trigger more foreign material, such as subject and object pronouns, verb inflection, plural marking, etc., because of the strongly entrenched links to that other material. That means utterances will tend to become monolingual productions in the other language, and ultimately this leads to shift unless it is halted in some way.
Another explanation is that speakers at some point start to feel the need to oppose the influx of large chunks, for example to protect the integrity of their “native” language. Our data suggest that if speakers are forced somehow (e.g. by the choice of interlocutor) to speak monolingual Turkish, the incidence of loan translations and other forms of Dutch-influenced Turkish goes up. Phrases that could end up as multiword borrowings might instead end up as loan translations. This raises the interesting question whether the entrenchment of a foreign collocation is transferable, as it were, to that of a literally translated native equivalent that did not exist before contact.
7 Conclusions
This paper is an attempt to rethink the issue of loanwords from the perspective of an emerging Cognitive Sociolinguistics. The view put forward is that constructing a usage-based account of lexical borrowing would entail a coherent integration of synchronic and diachronic perspectives on language change that is largely absent from work on loanwords up to now. It would explore links between accounts of loanwords in historical linguistics and of insertional codeswitching in contact linguistics. I have worked out various theoretical and methodological implications of this view. One is to rely less exclusively on corpus data and make better use of speakers’ intuitions and metalinguistic knowledge. If we want to know about the degree to which a particular foreign-origin word has spread through the speech community, we can ask people. Loanwords are normally content words, and content words are normally low in frequency, so that corpus frequencies do not give a reliable picture of the overall use of these words in the speech community. This problem with corpora is exacerbated for bilingual speech because corpora will generally be relatively small in size. While the paper has focused on lexical cross-linguistic influence, the investigation of structural influence would also be better served by a combination of corpus and experimental data. While traditional approaches to linguistics, with their strict separation of lexicon and grammar, naturally focused on either one or the other, usage-based approaches call for a more integrated account.
References
Aikhenvald, Alexandra Y. 2002. Language contact in Amazonia. Oxford: Oxford University Press.
Arnon, Inbal & Neal Snider. 2010. More than words: Frequency effects for multi-word phrases. Journal of Memory and Language 62. 67–82.
Backus, Ad. 2001. The role of semantic specificity in insertional codeswitching: evidence from Dutch-Turkish. In Rodolfo Jacobson (ed.), Codeswitching worldwide II, 125–54. Berlin & New York: Mouton de Gruyter.
Backus, Ad. 2003. Units in codeswitching: evidence for multimorphemic elements in the lexicon. Linguistics 41(1). 83–132.
Backus, Ad. 2005. Codeswitching and language change: One thing leads to another? International Journal of Bilingualism 9(3/4). 307–340.
Backus, Ad. 2009. Codeswitching as one piece of the puzzle of language change: The case of Turkish yapmak. In Ludmila Isurin, Donald Winford & Kees De Bot (eds.), Interdisciplinary approaches to codeswitching (Studies in Bilingualism 41), 307–336. Amsterdam & Philadelphia: John Benjamins.
Backus, Ad, Derya Demirçay & Yeşim Sevinç. Forthcoming. Converging evidence on contact effects on second and third generation Immigrant Turkish. To appear in Ad Backus, Carol W. Pfaff & Annette Herkenrath (eds.), Turkish in Northwestern Europe versus Turkish in Turkey. Copenhagen: Copenhagen University.
Barlow, Michael & Suzanne Kemmer (eds.). 2000. Usage-based models of language. Stanford: CSLI Publications.
Bullock, Barbara & Almeida J. Toribio (eds.). 2009. The Cambridge handbook of linguistic codeswitching. Cambridge: Cambridge University Press.
Bybee, Joan. 2006. Frequency of use and the organization of language. Oxford: Oxford University Press.
Bynon, Theodora. 1977. Historical linguistics. Cambridge: Cambridge University Press.
Croft, William. 2000. Explaining language change: An evolutionary approach. Harlow: Longman.
Doğruöz, Ayşe Seza & Ad Backus. 2007. Postverbal elements in Immigrant Turkish: Evidence of change? International Journal of Bilingualism 11(2). 185–220.
Field, Fredric. 2002. Linguistic borrowing in bilingual contexts. Amsterdam: John Benjamins.
Geeraerts, Dirk & Gitte Kristiansen. Forthcoming. Cognitive Linguistics and language variation. In Jeannette Littlemore & John Taylor (eds.), Companion to Cognitive Linguistics. London: Continuum.
Gilquin, Gaëtanelle & Stefan Th. Gries. 2009. Corpora and experimental methods: A state-of-the-art review. Corpus Linguistics and Linguistic Theory 5(1). 1–26.
Gries, Stefan Th. 2008. Phraseology and linguistic theory: A brief survey. In Sylviane Granger & Fanny Meunier (eds.), Phraseology. An interdisciplinary perspective, 3–25. Amsterdam & Philadelphia: John Benjamins.
Gries, Stefan Th. & Anatol Stefanowitsch. 2004. Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9. 97–129.
Haugen, Einar. 1972. The analysis of linguistic borrowing. In The ecology of language, 79–109. Stanford: Stanford University Press.
Heine, Bernd & Tania Kuteva. 2005. Language contact and grammatical change. Cambridge: Cambridge University Press.
Hill, Jane & Kenneth Hill. 1986. Speaking Mexicano. Dynamics of syncretic language in Central Mexico. Tucson: University of Arizona Press.
Johanson, Lars. 2002. Structural factors in Turkic language contacts. London: Curzon.
Kristiansen, Gitte & René Dirven. 2008. Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin & New York: Mouton de Gruyter.
Langacker, Ronald W. 2008. Cognitive Grammar. A basic introduction. Oxford: Oxford University Press.
Le Page, Robert & Andrée Tabouret-Keller. 1985. Acts of identity. Creole-based approaches to language and ethnicity. Cambridge: Cambridge University Press.
Matras, Yaron. 2009. Language contact. Cambridge: Cambridge University Press.
Mollin, Sandra. 2009. “I entirely understand” is a Blairism. The methodology of identifying idiolectal collocations. International Journal of Corpus Linguistics 14(3). 367–392.
Mufwene, Salikoko. 2008. Language evolution. Contact, competition and change. London: Continuum.
Muysken, Pieter. 2000. Bilingual speech: A typology of codemixing. Cambridge: Cambridge University Press.
Myers-Scotton, Carol. 2002. Contact linguistics: Bilingual encounters and grammatical outcomes. New York: Oxford University Press.
Poplack, Shana. 1980. Sometimes I’ll start a sentence in English y terminó en español. Linguistics 18. 581–616.
Poplack, Shana & Nathalie Dion. 2012. Myths and facts about loanword development. Language Variation and Change 24(3). 279–315.
Poplack, Shana & Marjorie Meechan. 1995. Patterns of language mixture: nominal structure in Wolof-French and Fongbe-French bilingual discourse. In Pieter Muysken & Leslie Milroy (eds.), One speaker, two languages, 199–232. Cambridge: Cambridge University Press.
Schönefeld, Doris (ed.). 2012. Converging evidence. Methodological and theoretical issues for linguistic research. Amsterdam & Philadelphia: John Benjamins.
Thomason, Sarah Grey. 2001. Language contact: An introduction. Washington (DC): Georgetown University Press.
Thomason, Sarah Grey & Terrence Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley & LA & London: University of California Press.
Treffers-Daller, Jeanine. 1994. Mixing two languages: French-Dutch contact in a comparative perspective. Berlin & New York: Mouton de Gruyter.
Van Hout, Roeland & Pieter Muysken. 1994. Modeling lexical borrowing. Language Variation and Change 6. 39–62.
Verschik, Anna. 2008. Emerging bilingual speech: From monolingualism to code-copying. London: Continuum.
Weinreich, Uriel. 1964. Languages in contact: Findings and problems. The Hague: Mouton [1953].
Weinreich, Uriel, William Labov & Marvin Herzog. 1968. Empirical foundations for a theory of language change. In Winfried P. Lehmann & Yakov Malkiel (eds.), Directions for historical linguistics: A symposium, 95–195. Austin: University of Texas Press.
Wiechmann, Daniel. 2010. Understanding complex constructions: A quantitative corpus linguistic approach to the processing of English relative clauses. University of Jena dissertation.
Eline Zenner, Dirk Speelman and Dirk Geeraerts
What makes a catchphrase catchy? Possible determinants in the borrowability of English catchphrases in Dutch
Abstract: This paper aims to add two perspectives to the current study of English loanwords in weak contact situations. First, it aims to provide empirical support for recent theoretical claims on the borrowability of phraseological units (as good as it gets, always look on the bright side of life), as such opening up the restricted focus of existing anglicism research on single-word units (computer, manager) to multi-word units. To this end, we present a corpus-based analysis of the borrowability of English catchphrases in Dutch. Second, we wish to widen the scope of the mainly structuralist perspective of existing anglicism research by studying the impact of a wide variety of lectal, pragmatic and socio-conceptual features on borrowability. Most importantly, our analyses aim to verify how important mass media are for the spread of English in Europe. Results reveal, first, how direct media influence appears to play a role in the spread of catchphrases and, second, how the popularity of the catchphrase in International English also contributes to its borrowability.
1 Introduction
This paper presents a multifactorial corpus-based analysis of the use of English catchphrases in Dutch. We show how an interplay of socio-conceptual, encyclopaedic and linguistic characteristics of the catchphrases under scrutiny can help explain whether a given catchphrase occurs in Dutch newspapers. As such, this paper serves as a first, exploratory step to address two shortcomings in existing anglicism research. First, by incorporating a variety of lectal, pragmatic and socio-conceptual features in the analysis, we wish to widen the predominantly structuralist scope of existing anglicism research. Second, this study serves as empirical support for recent theoretical claims concerning the borrowability of (English) formulaic sequences (cf. also recent advances in Furiassi, Pulcini, and Rodriguez-González 2012). Although catchphrases form a niche of the formulaic lexicon, we will illustrate how they form a particularly useful and interesting starting point, as they can help shed light on the importance of mass media for the spread of English. Moreover, we will indicate how inquiring into the role played by mass media is especially meaningful for weak contact settings, where contact with English is indirect, remote and asymmetrical (Onysko 2004).
The structure of this paper is as follows: in a first section, we describe how this study wishes to contribute to existing anglicism research. Second, we present the design of our study: after introducing the catchphrases under scrutiny, we discuss
the selection of dependent and independent variables. For the dependent variable, we investigate whether or not a given English catchphrase occurs in two large Dutch newspaper corpora. As concerns the independent variables, a number of features are identified that might have an influence on the absence or occurrence of a catchphrase. Special attention is paid to both advantages and drawbacks of the chosen predictors compared to alternative operationalizations. In the next section, the results of the study are presented: we present the output of a logistic regression analysis, a multifactorial technique that takes the interplay of all defined features into account. After presenting the main model, we verify whether regional variation can be found, contrasting the data found for Belgian Dutch (spoken in Flanders, the northern part of Belgium) and Netherlandic Dutch (the official language of the Netherlands). Conclusions are drawn in the final section, where we also mention some possible avenues for future research.
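For readers unfamiliar with the technique, the general shape of such an analysis can be sketched as follows. This is a toy illustration, not the model or the data used in this study: the two predictors (a binary indicator of media exposure and the length of the catchphrase in words), the ten observations and the helper names `fit_logistic` and `predict_proba` are all invented for the example, and the fitting routine is a plain gradient-ascent stand-in for the estimation procedure of a statistics package.

```python
import math


def sigmoid(z: float) -> float:
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, xi):
    """Predicted probability of occurrence; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Estimate weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Invented toy data: [media_exposure (0/1), length_in_words] per catchphrase.
X = [[1, 3], [1, 4], [1, 6], [1, 2], [0, 3],
     [0, 5], [0, 7], [0, 2], [1, 7], [0, 4]]
# 1 = catchphrase attested in the corpus, 0 = not attested (also invented).
y = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]

weights = fit_logistic(X, y)
for exposure in (0, 1):
    prob = predict_proba(weights, [exposure, 4])
    print(f"media_exposure={exposure}, length=4: P(attested) = {prob:.2f}")
```

The point of the multifactorial setup is visible even in this toy case: each predictor's contribution is estimated while the other is held constant, so that the effect of one feature is not confounded with that of another.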
2 Theoretical background
In this paper, a first attempt is made to overcome two shortcomings in existing anglicism research. First, attention for socio-pragmatic and cognitive aspects is surprisingly scarce. Second, neither contact linguistic research nor studies on phraseology have as yet inquired in detail into the borrowability of formulaic sequences. In this section, these shortcomings are described in some more detail. Then, we briefly describe how our study on the use of English catchphrases in Dutch serves as a first attempt to tackle these issues. Finally, the main reasons that led us to consider this specific subset of formulaic sequences are listed.
2.1 Borrowability of formulaic sequences: A Cognitive Sociolinguistic perspective
2.1.1 A Cognitive Sociolinguistic perspective on borrowability
Our analysis focuses on the use of English in Belgian Dutch and Netherlandic Dutch, the two main national varieties of Dutch. Both regions belong to Kachru’s “expanding circle”, a cover term used for all language communities where English does not have any official status, but mainly functions as a language for international communication (see Kachru 1992, but also e.g. Pennycook 2003 and Yano 2009 for a critical appraisal of the model). Within this expanding circle, and more specifically within German linguistics, a longstanding tradition of research on English loanwords has emerged since the post-war period. Over the years, corpus-based research on English loanwords has mainly taken a structuralist perspective, focusing on exhaustive inventorization following a stepwise procedure. First, researchers collect all anglicisms found in a group of texts, often without being entirely clear on the data they rely on (e.g. Zandvoort 1964; Kurth 1998).
More recently, researchers have started proceeding more systematically, mining structured text corpora for anglicisms, but the corpora used are still limited in size, as researchers extract the anglicisms manually from their data (e.g. Fink 1997; Onysko 2007). As a second step, the extracted anglicisms are grouped according to several dimensions, such as the degree of morphophonological, orthographic or syntactic integration into the receptor language (e.g. Carstensen 1965; Filipovic 1977; Nettmann-Multanowska 2003), or according to the lexical field the items belong to (e.g. Krauss 1958; Grigg 1997). Most popular are tallies showing the number of anglicisms per part of speech (e.g. Yang 1990: 29; Onysko 2007: 131), which ties in with research on borrowability, where the aim is to identify which (universal) features influence the ease with which items can be borrowed (Whitney 1881; Haugen 1950; Field 2002: 117). Other possibly influential features for the borrowability of anglicisms are left unmentioned and the question of what factors can explain the higher borrowability of certain parts of speech is only rarely considered (but see Van Hout and Muysken 1994; Field 2002; Backus, this volume).
Although presented from a bird’s eye view, the structuralist nature of anglicism research should be clear. Nevertheless, some occasional attempts have been made to incorporate a more social perspective on (the borrowability of) anglicisms. First of all, in their study on the use of English loanwords in five neighborhoods in Canada, Poplack, Sankoff, and Miller (1988) link the degree of integration of anglicisms to several social factors. Using a large database of anglicisms and employing multifactorial statistical analyses, they illustrate the importance of social class, neighborhoods, age and sex for explaining variation in the level of integration of anglicisms. Although this approach is promising, the focus of the study is still predominantly structuralist.
Second, Leppänen and Nikula (2007) investigate how the choice between Finnish and English is a marker of “the participants’ subtle orientations towards the talk and activity” (Leppänen and Nikula 2007: 336). Next, Androutsopoulos (2012) focuses on the discourse functions of English in German media, which according to him are heading (the use of English in headlines), bracketing (the use of English to delimit the boundaries of a text) and naming (the use of English to name people, organizations or products). Finally, Zenner, Speelman, and Geeraerts (2012) conduct a multifactorial analysis on variation in the success of English loanwords, studying the combined effect of word-related (e.g. era of borrowing), concept-related (e.g. euphemism) and lectal features (e.g. regional variation) (cf. also Zenner, Speelman, and Geeraerts forthcoming).
Despite these attempts, it is safe to say that anglicism research is in need of a Cognitive Sociolinguistic approach (Geeraerts 2005; Kristiansen and Dirven 2008; Geeraerts, Kristiansen, and Peirsman 2010). More specifically, in view of the narrow focus of borrowability research on structural features, this paper pleads for the incorporation of socio-lectal features, encyclopaedic and cognitive-conceptual features tailored to the phenomenon under scrutiny (see Geeraerts, Grondelaers, and Bakema 1994), pragmatic and stylistic features (e.g. De Sutter, Speelman, and Geeraerts 2008), and
the combined effect of all of the above, by using multifactorial statistical analyses (e.g. Impe, Geeraerts, and Speelman 2008; Zenner, Geeraerts, and Speelman 2009).
2.1.2 Borrowed formulaic sequences
Lexical semantics and lexicology have moved away from analyzing words in isolation towards corpus linguistic studies where the importance of context is emphasized (cf. Geeraerts 2010). This has drawn attention to formulaic sequences, which include “[any] sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar” (Wray 2002: 9). Given these innovations, the narrow focus of anglicism research on loanwords and compounds is remarkable: up to this point, research on the use of longer stretches of foreign language material has largely been limited to the domain of codeswitching (e.g. Muysken 2000; Deuchar, Muysken, and Wang 2007). However, it is not hard to appreciate that the borrowing of formulaic sequences, either in the form of fixed expressions (like pimp my ride) or of more schematic units (like [pimp my N]), and the actual mixing of codes are two very different phenomena. First of all, codeswitching is the spontaneous mixing of ad hoc created material from different grammatical systems (example 1, drawn from Poplack 1980). The use of foreign formulaic sequences, on the other hand, does not involve this creation of new language material; the expression can be borrowed as a whole and does not have to be parsed into its grammatical components (example 2, taken from our Dutch newspaper corpus) (cf. infra).
(1) And from there I went to live pa’ muchos sitios. Después viví en la ciento diecisiete with my husband. (Poplack 1980: 597)
‘And from there I went to live in many different places. After that period, I started living at 117th with my husband.’
(2) Mijn favoriete beginsel is dat ieder mens recht heeft op the pursuit of happiness (De Morgen, 21/10/2000).
‘My favorite dogma is that every man is entitled to the pursuit of happiness.’
Second and related to this, there is a clear difference in the proficiency level required for the two phenomena. As Poplack (1980) points out, codeswitching is typical of bilinguals who are highly proficient in both codes. By contrast, bilingualism is no prerequisite for the use of foreign formulaic sequences at all. As such, it is not hard to see how there is “the generally recognized possibility of borrowing idiomatic phrases as units” (Pfaff 1979). Moreover, the rise of English as a language for international communication (e.g. Crystal 2003) makes an increase in the borrowability of English formulaic sequences in the “expanding circle” very plausible. Consequently, over the last few years, some isolated attempts to discuss the possibility
of borrowed English phraseology can be noted. For instance, Sharp (2001) pays attention to longer stretches of English in spoken Swedish. However, she uses the term codeswitching as an aggregate for all English multi-word units in her corpus, although most of these are better considered borrowed formulaic sequences (e.g. go to hell, be my guest). She hints at the formulaic nature of the phrases herself (Sharp 2001: 107), but does not link this to a theoretical distinction between codeswitching and borrowed phrases. A similar problem appears in Onysko’s work (2007), which focuses on written German. He dedicates a paragraph to “multi-word phrasal borrowings” (e.g. state of the art, just in time) but is not very consistent in distinguishing these from his examples of intra- and inter-sentential codeswitching (e.g. the place to be). Nevertheless, both approaches are worth mentioning as among the first to pay systematic attention to longer stretches of English occurring in weak contact settings. Still more promising steps have been taken by Stefanowitsch (2002), Van de Velde and Zenner (2010) and some of the contributions in Furiassi, Pulcini, and Rodriguez-González (2012). Stefanowitsch (2002) discusses a number of examples of German-English bilingual punning, such as Message in a Zottel (Zottel ‘shaggy-haired person’). Van de Velde and Zenner (2010) discuss how the MTV program Pimp my Ride gave rise to the introduction of the construction [pimp my N] in Dutch. The analysis shows how the construction was first used with cars and other vehicles (pimp my bike). Then, the construction became more open to other types of nouns, and finally also to other (Dutch) possessive pronouns (pimp je grootje ‘pimp your grandma’). As a result, [pimp my N] eventually gave rise to a new verb, pimpen ‘make more attractive’.
Finally, very recently, more systematic attention to borrowed phraseology has occurred in the field of anglicism research (Gottlieb 2012; Marti Solano 2012; Oncins-Martinez 2012; Fiedler 2012), but these (largely qualitative) analyses are mainly restricted to loan translations of English phrases: attention for direct phraseological borrowing is still relatively rare.
2.1.3 A socio-cognitive perspective on the borrowability of formulaic sequences
Above, we have shown how the scope of anglicism research needs to be expanded to include socio-pragmatic and cognitive parameters. Second, we indicated the need for a framework on the borrowability of formulaic sequences. This study brings both aspects together by raising the following question: from a group of socio-conceptual, linguistic and encyclopaedic features, which are influential in determining the borrowability of formulaic sequences? In this paper, we address this question by presenting a corpus-based, multifactorial, quantitative study on the borrowability of English catchphrases in Dutch.
2.2 The spontaneous use of English catchphrases in Dutch
Catchphrases are expressions used in (visual) media, politics, literature etc. that “catch on” and are incorporated in “the phraseological component of the native speaker’s lexicon” (Alexander 1983: 11): they are used freely in discourse, in contexts detached from the original source. Henceforward, we refer to this non-source-related use of a catchphrase as “spontaneous use”. Based on this definition and on the background sketched above, the main question for this case study is what factors influence whether an English catchphrase occurs spontaneously in Dutch, and whether these mechanisms are consistent across lects.
Before dealing with the actual methods we developed, we briefly discuss why catchphrases are chosen as the starting point for this study. Although they are a rather unprototypical subset of the formulaic lexicon (see Moon 1998: 22), catchphrases are interesting from both a practical and a theoretical point of view. First, Van Hout and Muysken (1994) state that issues of borrowability are best discussed by means of set-external proof, e.g. by consulting a source-language corpus. The point is to have both positive and negative evidence: verifying which elements from a source corpus (English) end up in the recipient language (Dutch) and which do not is a reliable way to study borrowability. For our project, this means we should create a list of English formulaic sequences and then check which elements from this list occur in Dutch. In this respect, catchphrases are particularly useful, as many people take pleasure in posting lists of English catchphrases online. Second and more importantly, the media-related origin of catchphrases makes them interesting from a theoretical point of view. The intensity and means of exposure to the source language have often been mentioned as crucial for borrowability (Haspelmath 2008).
More specifically for the expanding circle, the asymmetrical contact with English via mass media is frequently mentioned as a principal factor for its spread (see Kowner and Rosenhouse 2008; Androutsopoulos 2012). However, due to the methodological complexity of the phenomenon, no empirical evidence for this presumed importance of media influence has been given so far. Moreover, research on monolingual data minimizes the role of mass media in language change (see Labov 2001: 363–364). Hence, the actual contribution of media influence to the rise of English is an unresolved issue. The occurrence of English catchphrases in Dutch can serve as a first step in dealing with this issue: as English movies and series are subtitled instead of dubbed in Flanders and the Netherlands (Booij 2001), we can verify how important the original source of a catchphrase is for its borrowability. As such, our main research question becomes: how important is (the entrenchment of) the media source of the catchphrase for its borrowability, compared to the influence of other socio-conceptual, linguistic and lectal features?
Of course, this specific comparison of features can only serve as a first step in determining the role of mass media. Specifically, our study compares the importance of media with other features in the spread of a media-related phenomenon, but future research should also study the importance of media for the spread of other types of fixed expressions (e.g. figurative idioms).
3 Data, variables and method
The main method for this study follows the approach set out by Van Hout and Muysken (1994) and consists of four main steps. First, we create an external list of English catchphrases. Second, we verify which of these catchphrases occur (spontaneously) in a large Dutch newspaper corpus. Third, we identify a set of predictors that might explain the occurrence or absence of a catchphrase in the corpus. For this step, our Cognitive Sociolinguistic approach is crucially different from that of Van Hout and Muysken (1994), who only incorporate linguistic features. Finally, the importance of each of our predictors is determined by performing multifactorial statistical analyses. Each of these steps is described in more detail below.
3.1 Data
3.1.1 External catchphrase list
As noted above, the practical advantage of working with catchphrases is the existence of online catchphrase lists. We collected all catchphrases from nine such lists, which resulted in a first set of more than 1,000 catchphrases. To exclude personal preferences of the list makers (who sometimes show a disproportionate and unrepresentative fondness for certain movies or series), only those catchphrases that occur twice in our original set are considered for the analyses. This caused a first drastic reduction of the number of catchphrases, but note that two exceptions to this rule were made. First, one of the nine lists was created by an American cable television network (TV Land), which used the list in the TV special The 100 Greatest TV Quotes & Catchphrases. Because of the wider scope of this list, all 100 TV Land catchphrases are incorporated in the dataset. Second, to avoid underrepresentation of UK sources, all catchphrases with a UK origin (see example 3) were selected. Nevertheless, our database still has a rather unequal distribution of both regions. We will come back to this below.
(3) Nudge, nudge, wink, wink, say no more (Eric Idle in Monty Python’s Flying Circus)
As our research question focuses on the spread of media-related catchphrases, a further restriction of the final list is required: we only include catchphrases from movies
(see example 4), series (example 5) and TV shows (example 6). All other types of catchphrases (example 7) are removed.
(4) You can’t handle the truth! (Colonel Nathan Jessup in A Few Good Men)
(5) The truth is out there. (Fox Mulder in The X-Files)
(6) The tribe has spoken. (Jeff Probst in Survivor)
(7) I did not have sexual relations with that woman. (Bill Clinton)
(8) Definitely. (Raymond Babbitt in Rain Man)
(9) Hasta la vista, baby. (The Terminator in The Terminator)
Also excluded are catchphrases consisting of a single word (example 8) and catchphrases in languages other than English (example 9). After applying these criteria, the eventual list contains 229 catchphrases. Of course, the fairly small size of this set is a direct consequence of the restrictions. An alternative approach is to incorporate all the catchphrases we find in the nine online lists. However, applying the strict selection criteria we have chosen here is the most reliable way to arrive at a first indication of the borrowability of catchphrases.
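The selection criteria of this section can be summarised in a short sketch. It is an illustration under stated assumptions, not the procedure actually used for the study: the `Catchphrase` record, its field labels, the list names and the toy entries are all hypothetical, and "occurring twice" is read here as "occurring on at least two lists".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Catchphrase:
    text: str
    source_type: str  # hypothetical labels: "movie", "series", "tv_show", "other"
    origin: str       # "US" or "UK"
    language: str     # hypothetical code, "en" for English


def select_catchphrases(lists, tv_land_name="tv_land"):
    """Apply the selection criteria to a mapping {list name: catchphrases}."""
    # Count on how many different lists each catchphrase appears.
    counts = Counter(cp for entries in lists.values() for cp in set(entries))
    tv_land = set(lists.get(tv_land_name, []))
    selected = []
    for cp, n_lists in counts.items():
        # Assumption: "occurs twice" means "on at least two lists";
        # TV Land entries and UK-origin entries are the two exceptions.
        frequent_enough = n_lists >= 2 or cp in tv_land or cp.origin == "UK"
        media_related = cp.source_type in {"movie", "series", "tv_show"}
        multi_word = len(cp.text.split()) > 1
        if frequent_enough and media_related and multi_word and cp.language == "en":
            selected.append(cp)
    return sorted(selected, key=lambda cp: cp.text)


# Toy input, invented for illustration only.
truth = Catchphrase("You can't handle the truth!", "movie", "US", "en")
clinton = Catchphrase("I did not have sexual relations with that woman.", "other", "US", "en")
definitely = Catchphrase("Definitely.", "movie", "US", "en")
nudge = Catchphrase("Nudge, nudge, wink, wink, say no more", "series", "UK", "en")
toy_lists = {
    "list_a": [truth, clinton, definitely],
    "list_b": [truth, clinton, definitely],
    "list_c": [nudge],
}
kept = select_catchphrases(toy_lists)
```

In this toy input the non-media quote and the single-word catchphrase are dropped even though both occur on two lists, while the UK catchphrase is kept although it occurs only once.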
3.1.2 Corpora
The next step in the analysis is to verify which catchphrases occur (spontaneously) in Dutch. For this step, we rely on two large, syntactically parsed and lemmatized newspaper corpora, which represent the two main national varieties of Dutch. LeNC, the corpus for Belgian Dutch, consists of data from six different national newspapers from 1999 to 2005 and contains over one billion words. TwNC forms the Netherlandic Dutch counterpart for LeNC: it contains roughly 300 million words from five different national newspapers in the period from 1999 to 2002. For the main analysis, both corpora are taken together. In a next step, we verify whether any regional variation exists by comparing the occurrence of catchphrases in TwNC and in LeNC. To ensure maximal comparability, this analysis only includes LeNC material for 1999–2002.
Although we do not expect newspapers to be swamped with catchphrases, there are some reasons to base the analyses on this type of data. First, it allows us to stay true to the tradition of using newspaper corpora in anglicism research. At the same time, we are able to move away from the tradition by using a corpus that is sufficiently large to make reliable claims on low-frequency phenomena like catchphrases. So far, corpora of more than one million words are rare in anglicism research, due to the manual extraction methods we mentioned before. Second and more importantly, newspaper corpora form a stable and reliable source to determine whether a given item has penetrated the language at large. As newspaper journalists write for a big audience, they do not easily use expressions they deem unknown to their readers. Hence, any catchphrase used (spontaneously) in newspapers is most likely familiar to the average speaker of Dutch. We will come back to the verification of this claim further down in this paper.
3.2 Response variable: Spontaneous use of English catchphrases in the Dutch corpus
The identification of the response variable of this study relies on determining which of the 229 catchphrases occur in our newspaper corpus. Using a broad search, we are able to find all potential hits, including shortened (example 10) and slightly altered (example 11) versions of the catchphrases.
(10) Het wordt steeds duidelijker dat deze Russische peetvader Poetin wel eens echt ten val zou kunnen brengen. Maar niet met “an offer he can’t refuse” (De Morgen 08/12/2005).
‘It becomes more and more obvious that this Russian godfather might bring Putin down. But not with an offer he can’t refuse.’
(11) Dezelfde krant kon in elk geval de verleiding niet weerstaan om te koppen “Don’t mention the score”, met een knipoog naar de bekendste Fawlty Towers-aflevering (NRC Handelsblad 12/12/2001).
‘Either way, the same newspaper could not resist the temptation to use the headline ‘Don’t mention the score’, with a wink to the most famous episode of Fawlty Towers.’
After deleting all noise from the list, 1,598 observations for 96 catchphrases were found in the corpus. A next and very crucial step is now to distinguish three different ways in which a catchphrase can manifest itself in our corpus and to appreciate that only one of these shows the actual intrusion of English catchphrases in the Dutch phraseological lexicon. First, the source-related use of the catchphrase is found when the catchphrase occurs in a piece of text that deals with the movie or series it originates from (see 12). These 656 observations are no indication of the actual penetration of the catchphrase in the Dutch phraseological repository: they only serve as an indication of the entrenchment of the source of the catchphrase in Dutch media. What is measured here is how often journalists for example talk about James Bond, not how often they freely use the expression “shaken not stirred” in discourse.
(12) Sean Connery gaf toen gestalte aan de Britse geheime agent die zijn wodka “shaken, not stirred” drinkt en tussen de vuurgevechten door ’s werelds mooiste vrouwen in bed praat (LN 09/04/1999).
‘At that time, Sean Connery performed the role of the British secret agent who drinks his vodka ‘shaken, not stirred’ and who chats up the most beautiful women in the world in between fights.’
Second, some catchphrases are used to name companies, events or organizations. A Belgian record company for instance chose to name itself Play it Again Sam, and an
artistic photographer used there’s no place like home as the name for his exhibition. All 412 of these observations are excluded from the analyses. Finally, the spontaneous use of catchphrases is seen when the catchphrase is used in texts on subjects that are not related to the movie or series the catchphrase belongs to (see examples 10 and 11). Only these 530 observations reflect the actual intrusion of the catchphrase in the Dutch phraseological lexicon. The main question is then what features influence whether an English catchphrase is used spontaneously in our corpus or not. Before presenting the main possibly influential features we identified for this study, it is important to highlight the possibility of making a further subdivision within the category of “spontaneous use” between situations where the origin of the catchphrase is still explicitly mentioned (example 11) and situations where the catchphrase is completely detached from its origin (example 10). We have aggregated over this distinction for our analyses due to data sparseness, but it should be developed in future research.
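The three-way classification of corpus hits can be turned into type-level variables along the following lines. This is a minimal sketch, assuming each hit has already been annotated by hand; the labels `"spontaneous"`, `"source_related"` and `"naming"`, the function name and the toy observations are hypothetical.

```python
from collections import Counter


def code_types(all_phrases, observations):
    """Derive type-level variables from annotated corpus hits.

    all_phrases: every catchphrase on the external list (including those
                 without any hit in the corpus).
    observations: iterable of (catchphrase, use_type) pairs, where use_type
                  is "spontaneous", "source_related" or "naming".
    """
    counts = {phrase: Counter() for phrase in all_phrases}
    for phrase, use_type in observations:
        counts.setdefault(phrase, Counter())[use_type] += 1
    return {
        phrase: {
            # Binary response: does the catchphrase occur spontaneously at all?
            "spontaneous": c["spontaneous"] > 0,
            # Binary indicator of source-related use.
            "source_related": c["source_related"] > 0,
            # Naming uses are counted by nothing here: they are excluded.
        }
        for phrase, c in counts.items()
    }


# Toy annotations, invented for illustration only.
phrases = ["shaken, not stirred", "an offer he can't refuse", "the tribe has spoken"]
hits = [
    ("shaken, not stirred", "source_related"),
    ("an offer he can't refuse", "spontaneous"),
    ("an offer he can't refuse", "source_related"),
    ("shaken, not stirred", "naming"),
]
coded = code_types(phrases, hits)
```

Catchphrases without any hit receive `False` on both variables, which supplies the negative evidence the set-external approach relies on.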
3.3 Predictors: Possible determinants in the borrowability of catchphrases
Once the response variable has been identified, the factors that may influence its behavior have to be defined. For the study of spontaneously used English catchphrases, we compare the importance of media influence, the entrenchment of the catchphrase in English and its structural and pragmatic features. Below, we discuss the operationalization of these groups of features. We also present possible alternatives and indicate what constraints led us to implement the predictors as presented here.
3.3.1 Direct media exposure: Entrenchment of the source of the catchphrase
Our first feature, direct media exposure, is used to indicate how familiar a Dutch language user is with the source of the catchphrase. Put differently, it measures the importance of the popularity of the movie or series in which the catchphrase was first used for its borrowability. We operationalize this feature in two ways. First, we check the Google frequencies for the title of the source of the catchphrase, restricting the search to pages in Dutch. We complement the query with the words film or serie ‘series’ (e.g. for beam me up, Scotty, the query would be “ ‘Star Trek’ + serie”). The results are classified in 3 frequency bands (see 13).¹
(13) a. less than 5,000
b. [5,000 – 25,000[
c. more than 25,000
¹ All websites used for the analyses were consulted in January 2010.
Although Google frequencies are a practical and efficient operationalization of media entrenchment, they form a rather indirect and oblique measure of the popularity of the source. A more tangible alternative is to work with box office figures, audience ratings or broadcasting information: ideally, we would have information on the extent to which the series and movies were shown in movie theatres and on television in the Low Countries. We pursued this possibility, but were hindered by the unhelpfulness of one of the most important commercial broadcasting corporations in Belgium. Second, we also used the occurrence of source-related use of a catchphrase in our corpus (see example 12) as an indication for the popularity of the source. We use a binary classification (occurrence or absence of source-related use), which is more practical than working with raw frequencies. We chose to work with the occurrence of source-related use of the catchphrase (e.g. beam me up, Scotty ) over the occurrence of the title of the source (e.g. Star Trek ) in the newspaper corpus, to ensure that the two operationalizations for media entrenchment measure different phenomena.
3.3.2 Nature of media exposure: Encyclopaedic characteristics of the source
With the next group of features we focus on the encyclopaedic characteristics of the source of the catchphrase and on what these may tell us about the nature of the exposure to English media in Flanders and the Netherlands. All information is based on Wikipedia and the Internet Movie Database. A first variable is the country of origin of the source of the catchphrase. All selected movies and series come from either the UK or the US. This predictor can help verify whether the United States is, as is often claimed, the primary locus of English influence. Second, we compare source types by verifying whether movies generate more spontaneous catchphrases than series or shows. Finally, we determine the importance of the age of the source, by comparing catchphrases from sources younger than forty to catchphrases from sources older than forty. For movies, age is based on the release date in the country of origin. For series, it is based on the date of the first broadcasting of the first episode in the country of origin. Of course there are other encyclopaedic features to think of. However, most of these are only applicable to either movies or series. For instance, we tested the influence of the genre of the source, but as comedies are highly overrepresented for series, the predictor is only useful for the analysis of catchphrases from movies.
3.3.3 General exposure to the expression: Frequency of the catchphrase in English
The following feature is meant to capture the popularity of the catchphrase in (the international use of) English. The goal is to determine to what degree Dutch users are exposed to the catchphrase, aside from their contact with its original occurrence in the media. This way, we capture the possibility that Dutch users might be familiar with
the catchphrase without having had any contact with the original media source. We operationalize this factor by checking the Google frequencies of the catchphrase, limiting the search to pages in English (e.g. 213,000 hits for beam me up, Scotty). We then classified the results in three frequency bands (cf. 14).
(14) a. less than 100,000
b. [100,000 – 1,000,000[
c. more than 1,000,000
By searching all Internet pages written in English, this approach relies on the broadest possible definition of exposure to English, which is ideal for exploratory analysis. However, it obscures certain divisions between variants and varieties of English. Specifically, “pages in English” covers a load that is most likely broader than the varieties of English that Dutch speakers stand in contact with. Hence, it is advisable to obtain a more disentangled view in future research. The most straightforward approach would be to focus on the use of the expression in corpora for ELF (English as a lingua franca) and for the two “traditional” varieties of English (UK and US).
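The banding of Google frequencies in (13) and (14) amounts to a lookup against two cut points. The sketch below shows one way to express it; the function and constant names are hypothetical, and treating the upper cut point itself as part of the highest band is an assumption that follows from the half-open middle interval.

```python
from bisect import bisect_right

# Cut points taken from (13) and (14); the constant names are hypothetical.
DUTCH_SOURCE_CUTS = (5_000, 25_000)          # title of the source, pages in Dutch
ENGLISH_PHRASE_CUTS = (100_000, 1_000_000)   # catchphrase itself, pages in English


def frequency_band(hits, cut_points):
    """Return 0, 1 or 2 for the lowest, middle or highest frequency band.

    The middle band is half-open, [lower, upper[, so a count equal to the
    lower cut point falls in the middle band and a count equal to the upper
    cut point falls in the highest band (an assumption about the boundary).
    """
    return bisect_right(cut_points, hits)


# The count reported for "beam me up, Scotty" on pages in English.
band_beam_me_up = frequency_band(213_000, ENGLISH_PHRASE_CUTS)
```

Returning an ordered integer rather than a label keeps the ordinal nature of the bands available for the statistical model.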
3.3.4 Linguistic features of the catchphrase
The structural and pragmatic characteristics of the catchphrase form the final group of potentially influential features we identify in this study. The first of these predictors is the number of words in the catchphrase. We base this on the catchphrase as it is found in the online lists, not as it is used in the Dutch corpus. The second variable indicates the percentage of non-conventional vocabulary in the catchphrase. Exclamations, onomatopoeia and proper names are regarded as non-conventional. The percentage is computed by taking the ratio of the number of non-conventional words over the total number of words (see examples 15 and 16).
(15) Yippee-ki-yay, motherfucker! (John McClane in Die Hard; ratio = 50%)
(16) Use the force, Luke. (Obi-Wan Kenobi in Star Wars; ratio = 25%)
The sentence type of the catchphrase is the last of our predictors. We distinguish three groups: statements (see 17); interrogatives and requests (see 18); and exclamations and commands (see 19).
(17) I know nothing, I’m from Barcelona. (Manuel in Fawlty Towers)
(18) Permission to speak, Sir. (Jones in Dad’s Army)
(19) Allll-righty, then! (Ace Ventura in Ace Ventura)
We choose this basic classification over existing functional models of (English) formulaic sequences (e.g. Lattey 1986; Moon 1998: 215–230; and Wray 2002), because the latter are less applicable to our analyses. As we are dealing with the borrowability of catchphrases, we have to assign pragmatic function based on types, not on tokens.
Hence, our operationalization is inevitably basic. Future analyses of the corpus examples will allow us to incorporate more advanced stylistic and pragmatic analyses.
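The percentage of non-conventional vocabulary described in this section can be computed as in the following sketch. Which words count as non-conventional is assumed to be decided by hand beforehand; the function name and the way punctuation is stripped are illustrative choices, not part of the original procedure.

```python
def non_conventional_ratio(catchphrase, non_conventional):
    """Share of words that are exclamations, onomatopoeia or proper names.

    non_conventional: set of lower-cased words flagged by hand for this
    catchphrase (the flagging itself is a manual annotation step).
    """
    # Split on whitespace and strip surrounding punctuation from each word.
    words = [w.strip(".,!?").lower() for w in catchphrase.split()]
    flagged = sum(1 for w in words if w in non_conventional)
    return flagged / len(words)


# The two catchphrases given as examples (15) and (16).
ratio_15 = non_conventional_ratio("Yippee-ki-yay, motherfucker!", {"yippee-ki-yay"})
ratio_16 = non_conventional_ratio("Use the force, Luke.", {"luke"})
```

With the exclamation flagged in (15) and the proper name flagged in (16), the function reproduces the ratios of 50% and 25% reported in the examples.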
3.4 Analyses
A first glance at the response variable shows that 58 of the 229 inspected catchphrases occur spontaneously in our corpus. The crucial part of our study is to determine why it is precisely these 58 catchphrases we find. More specifically, we want to identify which of our independent variables have a true influence on the borrowability of English catchphrases in Dutch. To answer this question, we perform forward stepwise logistic regression analysis. This multifactorial statistical technique determines which independent variables have an influence on the behavior of a binomial dependent variable, taking the combined effect of all the predictors into account. Performing the analysis “stepwise” ensures that the variable with the strongest impact will be selected first, followed by the second most important factor and so on. For our analyses, this means we verify which predictors influence whether a catchphrase occurs, not how often it occurs. The 229 types form our set of observations, the response is binary (“does or does not occur spontaneously in the corpus”) and the specific token frequency of the 58 occurring types is disregarded.
Although disregarding the frequencies might appear to be a severe limitation, there are several good reasons to do so. First of all, there is not much variation in the token frequencies of the occurring catchphrases: 23 occur only once, the median of the frequencies is three and the maximal token frequency is 71. Taking the overall size of the corpus into account, it is clear that the token frequencies lie closely together. Moreover, the frequencies are not normally or Poisson distributed, which makes other multifactorial techniques less applicable. Second, Van Hout and Muysken (1994) also choose forward stepwise logistic regression analysis: when determining influential predictors in borrowability, the sheer occurrence of an expression or lexeme is at least as important as the number of times it occurs.
Nevertheless, some more information is required concerning the status of single occurrences in the corpus. As mentioned above, 23 of the 58 spontaneous catchphrases occur only once. Considering the size of our corpus, which contains over one billion words, the question arises whether these single occurrences can tell us something about language at large, or whether they are to be regarded as whimsical and unreliable. For this dataset, we have reasons to believe we can draw conclusions based on all 58 catchphrases, including the single occurrences. First of all, newspaper language is stable and traditional. Journalists take their audience into account and will not be prone to use unknown or uncommon constructions. Hence, even a single occurrence of a catchphrase can be seen as a clear indication of its overall use. Second, we have supported this claim by performing our regression analysis two times: once for the model with single occurrences, once for a model that only incorporates catchphrases with
two or more spontaneous occurrences in the corpus. Without going into the details of these regressions, it is safe to say they corroborate our claim: the model with single occurrences did not overfit the data. Hence, the single occurrences can be discussed in the analysis.
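The logic of forward stepwise selection for a binary response can be illustrated with a small self-contained sketch. It is not the analysis reported here: the fitting routine (plain gradient ascent), the entry criterion (a likelihood-ratio gain above 3.841, the 5% critical value of a chi-square distribution with one degree of freedom, which suits binary predictors only), all function names and the toy data are assumptions made for illustration.

```python
import math


def fit_logistic(rows, y, steps=2000, lr=0.5):
    """Fit a logistic regression by gradient ascent.

    rows: list of feature lists (empty lists give an intercept-only model).
    Returns (weights, log_likelihood); weights[0] is the intercept.
    """
    w = [0.0] * (len(rows[0]) + 1)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, target in zip(rows, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = target - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    ll = 0.0
    for x, target in zip(rows, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += math.log(p) if target else math.log(1.0 - p)
    return w, ll


def forward_stepwise(data, y, names, threshold=3.841):
    """Add the predictor with the largest likelihood-ratio gain until none passes."""
    selected, remaining = [], list(names)
    _, current_ll = fit_logistic([[] for _ in y], y)
    while remaining:
        best = None
        for name in remaining:
            cols = selected + [name]
            _, ll = fit_logistic([[row[c] for c in cols] for row in data], y)
            gain = 2.0 * (ll - current_ll)
            if best is None or gain > best[1]:
                best = (name, gain, ll)
        if best[1] < threshold:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        current_ll = best[2]
    return selected


def cell(sru, noise, label, n):
    """n identical toy observations (invented data)."""
    return [({"sru": sru, "noise": noise}, label)] * n


toy = (cell(1, 1, 1, 8) + cell(1, 0, 1, 7) + cell(1, 1, 0, 2) + cell(1, 0, 0, 3)
       + cell(0, 1, 1, 1) + cell(0, 0, 1, 2) + cell(0, 1, 0, 9) + cell(0, 0, 0, 8))
data = [row for row, _ in toy]
y = [label for _, label in toy]
selected = forward_stepwise(data, y, ["sru", "noise"])
```

In the toy data the predictor `sru` is strongly associated with the response and `noise` is not, so only `sru` enters the model. Because predictors enter one at a time by decreasing gain, the order of `selected` reflects the order of importance mentioned above.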
4 Results
For the discussion of the regression output, we first focus on the main model, which takes the results of both corpora together. Next, we examine possible lectal variation in the importance of the predictors, by comparing the regression models for the Belgian Dutch and the Netherlandic Dutch corpus. Before discussing the results, a final comment has to be made on how to read the output of the model. For each variable, we take one value as reference point (e.g. “declarative” for sentence type). Then, the analysis verifies which of the other values (e.g. “question/request”) has an effect on the success rate of the response variable, compared to that of the reference point. Only variables with significant effects (p-values under 0.05) are incorporated in the model. The reference point itself is captured in the intercept, and hence does not have a separate line in the output of the model. The sign of the estimate shows the direction of the effect: if this is a positive number, it means the probability of finding borrowed catchphrases is higher for the given value (e.g. “question/request”) than for the reference point (e.g. “declarative”). Negative estimates imply a lower chance of finding spontaneously occurring catchphrases than in the intercept.
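As an aid to reading such output, the estimates of a logistic regression are on the log-odds scale, so exponentiating an estimate gives an odds ratio relative to the reference point. The sketch below applies this to three estimates copied from Table 1; the dictionary and function names are hypothetical, and the odds-ratio reading assumes that these predictors are coded against a reference level as described above.

```python
import math

# Estimates copied from Table 1 (log-odds scale); the dictionary name is hypothetical.
TABLE_1_ESTIMATES = {
    "source-related use: yes": 2.67,
    "sentence: exclamation/command": -1.40,
    "type: series/show": -0.83,
}


def odds_ratio(estimate):
    """Convert a logistic regression estimate (log-odds) into an odds ratio."""
    return math.exp(estimate)


for name, estimate in TABLE_1_ESTIMATES.items():
    direction = "higher" if estimate > 0 else "lower"
    print(f"{name}: odds ratio {odds_ratio(estimate):.2f} "
          f"({direction} odds than the reference point)")
```

A positive estimate yields an odds ratio above 1 and a negative estimate an odds ratio below 1, which matches the reading of the sign given above.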
4.1 Main model
Our discussion of the main model (Table 1) starts off with an overview of the general characteristics of the regression output. We then discuss the significant predictors, to round off with some remarks on the insignificant predictors.
4.1.1 General characteristics of the model
The general diagnostics for logistic regression confirm the goodness of fit of the model. The data do not exhibit overdispersion and no multicollinearity exists (which means that there are no inordinately strong correlations between the independent variables in the model). It is thus safe to say that our model contributes to the explanation of the borrowability of catchphrases in Dutch. An important question is then how big this contribution is. We use two measures for this, which both show that our model is quite powerful. First, pseudo R² is used to indicate how much of the variation can be explained by the model. In our case, the
pseudo R² is a very acceptable 52.6%. Second, the C-measure, which is a more reliable measure for logistic regression analysis, is used to verify whether a given model has any predictive power. Our C-level of 0.89 indicates that, given new sets of catchphrases, this model will to a reasonable extent succeed in predicting which ones occur spontaneously in Dutch.
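The C-measure mentioned above is commonly computed as the proportion of (occurring, non-occurring) pairs in which the model assigns the higher predicted probability to the occurring item, with ties counted as half. The sketch below follows that common definition; whether the study computed it in exactly this way is not stated, and the function name and toy values are hypothetical.

```python
def c_measure(probabilities, outcomes):
    """Concordance index for a binary response.

    probabilities: predicted probability of spontaneous use per catchphrase.
    outcomes: 1 if the catchphrase occurs spontaneously, 0 otherwise.
    """
    positives = [p for p, o in zip(probabilities, outcomes) if o == 1]
    negatives = [p for p, o in zip(probabilities, outcomes) if o == 0]
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0   # pair ranked correctly
            elif pos == neg:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Toy predictions, invented for illustration only.
toy_c = c_measure([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of occurring and non-occurring catchphrases.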
4.1.2 Significant predictors: Entrenchment in Dutch and exposure to English
Now that we have established the overall significance and importance of the main model, we can focus on the actual effects we find. Following the stepwise technique, we present the significant predictors in order of importance. The occurrence of the source-related use of the catchphrase in our newspaper corpus, which measures direct exposure to the media source, is the most important factor in the model. We can deduce from Table 1 that catchphrases that also occur in contexts related to the source of the catchphrase (see example 12) will more often have spontaneous occurrences in the corpus (see example 11) than catchphrases without such source-related use. This entails that the familiarity Dutch speakers have with the origin of the catchphrase has a significant effect on the borrowability of the catchphrase. However, final conclusions on the importance of direct media exposure can only be drawn once we compare this factor to the importance of the popularity of the catchphrase in (International) English. Indeed, Table 1 shows that the overall exposure Dutch users may have to the expression in (International) English is also an important predictor in the model. Taking the ordinality of the Google frequency measure into account by using reverse Helmert coding for the statistical model, we find a steady rise of the number of spontaneously used catchphrases from the lowest to the highest frequency band. This means that the more popular a catchphrase is in English, the bigger the chance that it is used spontaneously in our corpus. Bringing both variables together, we see that a catchphrase is likely to occur spontaneously in the corpus if it is used in Dutch in relation to its media source and if a large group of English speakers uses the expression. Of course, it is important to verify
Tab. 1: Regression output Coefficient (Intercept) source-related use: yes exposure in English (Google): 1 exposure in English (Google): 2 sentence: question/request sentence: exclamation/command type: series/show
Estimate Standard Error −1.47 2.67 0.92 0.33 0.18 −1.40 −0.83
0.46 0.43 0.28 0.15 0.62 0.46 0.42
z-value
p-value
−3.23 6.27 3.33 2.16 0.29 −3.02 −1.99
0.001 0.000 0.001 0.031 0.773 0.003 0.047
56
Eline Zenner, Dirk Speelman and Dirk Geeraerts
whether both claims are independent of each other and if not, what this entails for the interpretation of the predictors. If a large (positive) correlation coefficient is found, this would mean that both variables are measuring something very similar. Hence, asking the question if Dutch speakers are familiar with the catchphrase through the original media source or through exposure to the expression in their contact with English, would become hazardous. Entrenchment in English vs. Entrenchment in Newspapers moreThan1000 000
NO_SRU
[100 000 - 1000 000[
SRU
Source-Related Use
lessThan100 000
Google in English
Fig. 1: Entrenchment in English vs. entrenchment in newspapers
As is visible in Figure 1, we find a significant but mild correlation between both features (p for Spearman = 0.01, rho = 0.17). Specifically, we see that catchphrases that are located in a higher frequency band (which means that they are used more often by English speakers) have a higher chance of having source-related uses (SRU) in our Dutch corpus. Although the correlation is too weak to jeopardise the conclusions we drew above, it is important to interpret it. The main question is why the popularity in International English and the media-related occurrence of a catchphrase in Dutch newspapers are linked. Is the entrenchment in English also influenced by the popularity of the series or movie the catchphrase originates from, or is the SRU in Dutch influenced by the popularity of the expression in English? Crucially, disentangling this chicken-or-egg problem is complicated by the problems we mentioned above concerning the use of Google frequencies (and cf. Winter-Froemel, this volume). Not only are we not sure to what extent they are representative of the type of English that Dutch speakers are exposed to, but more importantly, they do not allow us to make a distinction between source-related and spontaneous use of the catchphrase in English. We will thus only be able to draw definite conclusions on the interplay between both variables once we have collected more reliable information on the exposure to the expression in English by searching more stable corpora.
Overall, we see how both familiarity with the original media source of a catchphrase and the overall exposure to the expression in (International) English contribute to the borrowability of catchphrases. We find a weak correlation between these features, which can only be reliably interpreted by conducting further analyses.
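The estimates in Table 1 are on the log-odds scale, so exponentiating them gives odds ratios, and the reported p-values follow from the z-values under a standard normal distribution. The sketch below recomputes both from the published figures. The dictionary `TABLE_1` and the helper names are illustrative assumptions. Note that the two exposure rows are contrast-coded, so their exponentiated values are not simple level-versus-reference odds ratios.

```python
import math

# (estimate, standard error, z-value) as reported in Tab. 1.
TABLE_1 = {
    "(Intercept)": (-1.47, 0.46, -3.23),
    "source-related use: yes": (2.67, 0.43, 6.27),
    "exposure in English (Google): 1": (0.92, 0.28, 3.33),
    "exposure in English (Google): 2": (0.33, 0.15, 2.16),
    "sentence: question/request": (0.18, 0.62, 0.29),
    "sentence: exclamation/command": (-1.40, 0.46, -3.02),
    "type: series/show": (-0.83, 0.42, -1.99),
}


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def odds_ratio(estimate):
    """Exponentiate a log-odds estimate."""
    return math.exp(estimate)


for name, (estimate, std_error, z_value) in TABLE_1.items():
    # estimate / std_error only approximates the reported z because of rounding.
    print(
        f"{name:34s} exp(b) = {odds_ratio(estimate):6.2f}  "
        f"b/SE = {estimate / std_error:6.2f}  p = {two_sided_p(z_value):.3f}"
    )
```

Read this way, the estimate of 2.67 for source-related use corresponds to odds of spontaneous use that are roughly fourteen times higher, all else being equal, while the estimate of −0.83 for series and shows corresponds to odds that are less than half those of movie catchphrases.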
4.1.3 Significant predictors: Sentence type
For sentence type, we find a clear effect on the borrowability of catchphrases. The declarative sentences are taken as reference point. No significant difference exists with questions and requests, but imperative and exclamative catchphrases do behave significantly differently. Specifically, we find that these catchphrases have a significantly lower chance of occurring spontaneously in the corpus than declarative catchphrases. This effect is tightly linked to the formal character of newspaper language: exclamations and commands are anything but typical of this register. Hence, these types of catchphrase are less likely to occur in our corpus. What this factor shows us, then, is the importance of the characteristics of the receiving medium in patterning borrowability. Specifically, due to the formal nature of the “receiving” register, the use of exclamative and imperative catchphrases is hindered. To corroborate this claim, further analyses should be conducted, based on less formal language use.
4.1.4 Significant predictors: Source type (movies versus series & shows)
The final and least important predictor in the model compares catchphrases from two source types, movies (as reference point) on the one hand and series and shows on the other. Table 1 indicates that catchphrases from the latter are less likely to occur spontaneously in our corpus than catchphrases from the former. Specifically, 44% of the movie catchphrases are found spontaneously in the Dutch corpus, whereas this is true for only 16% of the catchphrases from series and shows. No clear-cut interpretation of this effect is available. A first important step in understanding the pattern is determining whether the effect of source type is reliable or whether it should be seen as an artefact of the data collection. Specifically, we calculate the correlation of source type with the entrenchment of the catchphrase in English, in order to verify whether it so happens that for movies, the online list makers only incorporate popular catchphrases from entrenched sources, whereas for series, they select all types of catchphrases from a broader variety of sources.
[Figure: Distribution of source type over frequency bands (English). Horizontal axis: movie, series/show. Vertical axis: proportion of total (0.0 to 1.0). Legend: lessThan100 000, [100 000 - 1000 000[, moreThan1000 000.]
Fig. 2: Association between source types and exposure in (International) English
A significant but weak correlation is found between both predictors (p = .01, rho = −.17). As is demonstrated by Figure 2, the effect is especially visible in the lowest frequency band (less than 100,000 Google hits for the catchphrase). Almost half of the catchphrases from series and shows are part of this category, whereas only about one fifth of the movie catchphrases are found here. However, the correlation between both predictors does not completely explain the higher success rate for movie catchphrases. This becomes apparent both in the rather weak correlation coefficient (−.17) and in the fact that both predictors occur side by side in the regression model without causing multicollinearity. Furthermore, if the pattern found for source type could be explained solely by the higher entrenchment of movie catchphrases, the success rate for movies and series should be similar when the frequency bands are held constant. However, Table 2 clearly shows that the higher success rate for movies occurs consistently across the three frequency bands.
Tab. 2: Success rates of movies and series/shows per frequency band
Frequency band (Google in English)  Success rate movies  Success rate series/shows
less than 100,000                   0.20                 0.06
[100,000–1,000,000[                 0.47                 0.16
more than 1,000,000                 0.61                 0.23
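The stratified comparison in Table 2 can be checked with a few lines of arithmetic: within each frequency band, the success rate for movies is divided by the success rate for series and shows. The sketch below uses only the rates printed in Table 2. The names `SUCCESS_RATES` and `movie_advantage` are illustrative assumptions.

```python
# (success rate movies, success rate series/shows) per Google frequency band, from Tab. 2.
SUCCESS_RATES = {
    "less than 100,000": (0.20, 0.06),
    "[100,000-1,000,000[": (0.47, 0.16),
    "more than 1,000,000": (0.61, 0.23),
}


def movie_advantage(rates):
    """Ratio of the movie success rate to the series/show success rate per band."""
    return {band: movies / series for band, (movies, series) in rates.items()}


ratios = movie_advantage(SUCCESS_RATES)
movies_always_higher = all(ratio > 1 for ratio in ratios.values())

for band, ratio in ratios.items():
    print(f"{band:22s} movies / series = {ratio:.2f}")
print("movies higher in every band:", movies_always_higher)
```

Because the ratio stays well above 1 in every band, the difference between source types cannot be reduced to the frequency band alone, which is the point the comparison is meant to support.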
Apparently, the effect of source type is not an artificial consequence of the nature of the database. The question then arises how we should interpret the difference between movies and series and what it can tell us about the nature of the English media that Dutch language users are exposed to. One possibility is to link the effect we see in Table 1 to a difference in cultural impact of movies and series. More specifically, movies can be said to have more cultural weight, and thus a higher symbolic value, than series. This entails that the popularity of movie catchphrases might well be linked to the higher register of the receptor language corpus (newspapers). A hypothesis is then that the success rate for catchphrases from series and shows will be higher in colloquial language and informal registers (e.g. Usenet, IRC, youth language etc.). This claim will be verified in future research.
4.1.5 Insignificant predictors
Table 3 gives an overview of the predictors that did not make it into our final model. Two comments need to be made. First, we would like to link the significance of source-related use to the lack of significance of our alternative operationalization for media entrenchment (Google frequencies in Dutch). As both predictors are used as measures for the same feature (media entrenchment), it is not remarkable that only one of them is selected in the model. Because the definition of source-related use is based on the same corpus as the response variable, this factor forms a stronger, more robust reflection of the influence of media entrenchment than the Google frequency counts. Hence, it is not surprising that the latter variable is not selected for the model.
Tab. 3: Non-significant predictors
Predictor
source: media entrenchment (Google in Dutch)
source: country of origin (US/UK)
source: age
catchphrase: number of words
catchphrase: non-conventional vocab
Second and more importantly, it is remarkable that we find no significant effect of the country of origin of the catchphrase. As we applied a less strict criterion for the initial selection of UK catchphrases, we would expect this to show up in a lower success rate for these observations. More specifically, because several of the UK catchphrases were mentioned in only one online list, the catchphrase-status of these expressions is more dubious. Furthermore, as we have a limited set of UK phrases (47 versus 182 US phrases), these “dubious” catchphrases carry quite some weight in the analyses. Moreover, an often heard claim is that English influence is predominantly US-oriented.
All of these observations lead us to expect a higher success rate for US catchphrases. Yet, this is not reflected in the data.
4.2 Lectal variation: Belgian Dutch vs. Netherlandic Dutch
A final step in determining the influential factors for the borrowability of catchphrases in Dutch is finding out whether any lectal variation exists in the importance of the different factors. In this study, we focus on regional variation by determining whether any differences can be found between the two main varieties of Dutch: Belgian Dutch (spoken in Flanders, the northern part of Belgium) and Netherlandic Dutch. Previous research on the attitudes of both regions towards foreign influence leads us to believe that differences in the borrowability of catchphrases might exist. Specifically, Geeraerts, Grondelaers, and Speelman (1999) discuss the longstanding purist tradition in Flanders. This tradition originated as a consequence of the standardization process of Dutch, which was slowed down because the language of the elite was French throughout most of the eighteenth and nineteenth century. When standardization eventually sped up, convergence with the long-established Netherlandic Dutch norm was desired, which in practice entailed a purist reaction against French loanwords (which were abundant in Belgian Dutch, but far less frequent in Netherlandic Dutch). This purist reaction then spread to all forms of foreign influence, including English. However, Geeraerts, Grondelaers, and Speelman (1999) also note that this purist tradition has started to decline since the second half of the twentieth century. The Netherlands, on the other hand, has always been characterised by an open attitude towards foreign languages in general, and towards English in particular. In order to determine possible differences between the varieties concerning the use and borrowability of catchphrases, we first compared the overall number of catchphrases borrowed in both regions. Of the 229 catchphrases, 47 occur spontaneously in the Belgian Dutch newspapers and 38 occur spontaneously in the Netherlandic Dutch newspapers.
This difference is not significant (p for Chi-square > 0.10). Although the proportion of catchphrases found in both regions is highly comparable, we might find differences concerning the effects of the specific predictors. To this end, we built a regression model for each of the two regions and compared the results. More specifically, we compared the confidence intervals around the estimates for the predictors in order to determine whether the effect sizes are equal (cf. Zenner, Speelman, and Geeraerts forthcoming for more details on the method). When the intervals overlap, the effects are comparable in both models. Given the more limited number of hits (47 for Belgian Dutch, 38 for Netherlandic Dutch), only two predictors could be included in the models: including more variables would lead to overfitting. Exposure to English and sentence type were included in the models, which both show a decent goodness of fit. The comparison shows that no differences between the regions can be noted (Table 4): the confidence intervals show overlap for all regressors.
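The regional comparison of 47 versus 38 catchphrases out of 229 can be reproduced approximately with a Pearson chi-square test on a 2×2 table. The sketch below is an illustration only: the text does not state which variant of the test was used, so the absence of a continuity correction and the treatment of the two newspaper sets as independent samples are assumptions, as is the function name `chi_square_2x2`.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


TOTAL = 229  # catchphrases in the database
belgian_hits, netherlandic_hits = 47, 38

statistic, p_value = chi_square_2x2(
    belgian_hits, TOTAL - belgian_hits,
    netherlandic_hits, TOTAL - netherlandic_hits,
)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value lies well above 0.10, in line with the non-significant difference reported in the text.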
Tab. 4: Confidence intervals (C.I.) for Belgian Dutch vs. Netherlandic Dutch
Coefficient                    C.I. Belgian Dutch  C.I. Netherlandic Dutch  Overlap?
exposure English: 1            [0.39, 1.37]        [0.49, 1.65]             YES
exposure English: 2            [0.07, 0.58]        [−0.05, 0.54]            YES
sentence: question/request     [−0.48, 1.53]       [−1.45, 0.86]            YES
sentence: exclamation/command  [−1.85, −0.28]      [−2.02, −0.33]           YES
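The overlap criterion applied in Table 4 amounts to a simple interval comparison: two intervals overlap when each one starts before the other ends. The sketch below applies that check to the intervals printed in the table. The names `CONFIDENCE_INTERVALS` and `intervals_overlap` are illustrative assumptions.

```python
# (Belgian Dutch interval, Netherlandic Dutch interval) per coefficient, from Tab. 4.
CONFIDENCE_INTERVALS = {
    "exposure English: 1": ((0.39, 1.37), (0.49, 1.65)),
    "exposure English: 2": ((0.07, 0.58), (-0.05, 0.54)),
    "sentence: question/request": ((-0.48, 1.53), (-1.45, 0.86)),
    "sentence: exclamation/command": ((-1.85, -0.28), (-2.02, -0.33)),
}


def intervals_overlap(first, second):
    """True if the closed intervals (low, high) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


overlap = {
    name: intervals_overlap(belgian, netherlandic)
    for name, (belgian, netherlandic) in CONFIDENCE_INTERVALS.items()
}

for name, result in overlap.items():
    print(f"{name:30s} {'YES' if result else 'NO'}")
```

Overlapping intervals are read here as comparable effects in the two regional models; this is a descriptive comparison rather than a formal test of the difference between coefficients.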
Hence, no striking differences in the use and borrowability of the catchphrases between both regions can be noted. As such, this part of the analyses ties in with the results presented by Gerritsen et al. (2007). The results also lend further support to Geeraerts, Grondelaers, and Speelman (1999), who claim that, since the 1950s, the lexical differences between both regions have started to decline. Finally, the results align with the observations of Zenner, Speelman, and Geeraerts (2012) and Zenner, Speelman, and Geeraerts (forthcoming).
5 Conclusions and prospects
The main point of this paper was to verify which factors influence the occurrence of spontaneously used English catchphrases in Dutch, with specific attention to the importance of media entrenchment. The regression analysis produced a stable model with quite some explanatory power, leading to four important points on the borrowability of catchphrases. First, we were able to indicate the importance of the original media origin of the catchphrase: direct media influence appears to play a role in the spread of catchphrases. Second, indirect influence was reflected in the importance of the popularity of the catchphrase in International English. Third, we indicated the importance of the characteristics of the investigated variety of the receptor language. Although no differences between the two main regional varieties (Belgian Dutch and Netherlandic Dutch) were found, the results for sentence type and source type indicated the importance of register. Specifically, we saw how exclamative and directive catchphrases are unsuccessful in our newspapers and how this is mainly due to the formal nature of newspaper language. A similar argument was used to explain the higher success rate for movie catchphrases compared to catchphrases from series and shows. Finally, no difference was found between catchphrases from US and UK sources, although the US is presumed to be more important in the spread of English as a language for international communication.
To round off, we would like to present some prospects for future research. First of all, we want to get a better grasp of the distinction between direct media influence and indirect exposure to English. This point can be developed by using a broader set of corpora: indirect exposure should be based on more stable corpora of English than Google (such as BNC, COCA and ELF-corpora such as VOICE or ELFA), and Dutch corpora representing lower registers should also be consulted. Second, now that we have a better view on what determines whether we find catchphrases in our corpus, we can also focus on how we find them in our corpus. Three issues should be addressed. First, which types of catchphrases are subject to internal variation (compare Stefanowitsch 2002)? Second, a distinction needs to be made between catchphrases that are used spontaneously but are still linked to their original source, and catchphrases that are used completely freely (cf. supra). Finally, more research is needed on the specific textual functions of spontaneously used catchphrases (compare Androutsopoulos 2012). The final, and most important, prospect is the expansion of the issue of borrowed phraseology to other types of fixed expressions. Only then will we be able to make definite claims on the importance of mass media in the spread of English.
Acknowledgements
Many thanks to Tom Ruette for his professional and cheerful help on the xml-parsing.
References
Alexander, Richard J. 1983. Catch phrases rule OK. Allusive puns analyzed. Grazer Linguistische Studien 20. 9–30.
Androutsopoulos, Jannis. 2012. English ‘on top’: Discourse functions of English resources in the German mediascape. Sociolinguistic Studies 6(2). 209–238.
Booij, Geert. 2001. English as the lingua franca of Europe: A Dutch perspective. Lingua e Stile 36(2). 347–357.
Carstensen, Broder. 1965. Englische Einflüsse auf die deutsche Sprache nach 1945. Heidelberg: Carl Winter Universitätsverlag.
Crystal, David. 2003. English as a global language. Second edition. Cambridge: Cambridge University Press.
De Sutter, Gert, Dirk Speelman & Dirk Geeraerts. 2008. Prosodic and syntactic-pragmatic mechanisms of grammatical variation: the impact of a postverbal constituent on the word order in Dutch clause final verb clusters. International Journal of Corpus Linguistics 13. 194–224.
Deuchar, Margaret, Pieter Muysken & Sung-Lan Wang. 2007. Structured variation in codeswitching: towards an empirically based typology of bilingual speech patterns. The International Journal of Bilingual Education and Bilingualism 10(3). 298–339.
Fiedler, Sabine. 2012. Der Elefant im Raum … The influence of English on German phraseology. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European lexis, 239–260. Amsterdam & Philadelphia: John Benjamins.
Field, Fredric W. 2002. Linguistic borrowing in bilingual contexts. Amsterdam & Philadelphia: John Benjamins.
Filipovic, Rudolf. 1977. English words in European mouths and minds. Folia Linguistica XI(3/4). 195–206.
Fink, Hermann. 1997. Von Kuh-Look bis Fit for Fun: Anglizismen in der heutigen deutschen Allgemein- und Werbesprache. Frankfurt am Main: Peter Lang.
Furiassi, Cristiano, Virginia Pulcini & Félix Rodriguez-González (eds.). 2012. The anglicization of European lexis. Amsterdam & Philadelphia: John Benjamins.
Geeraerts, Dirk. 2005. Lectal variation and empirical data in Cognitive Linguistics. In Francisco Jose Ruiz de Mendoza Ibáñez & Sandra M. Peña Cervel (eds.), Cognitive Linguistics: Internal dynamics and interdisciplinary interaction, 163–189. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk, Stef Grondelaers & Peter Bakema. 1994. The structure of lexical variation. Meaning, naming and context. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk, Stef Grondelaers & Dirk Speelman. 1999. Convergentie en divergentie in de Nederlandse woordenschat: een onderzoek naar kleding- en voetbaltermen. Amsterdam: Meertens Instituut.
Gerritsen, Marinel, Catherine Nickerson, Andreu Van Hooft, Frank Van Meurs, Ulrike Nederstigt, Marianne Starren & Rogier Crijns. 2007. English in product advertisements in Belgium, France, Germany, the Netherlands and Spain. World Englishes 26(3). 291–315.
Gottlieb, Henrik. 2012. Phraseology in flux: Danish Anglicisms beneath the surface. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European lexis, 169–198. Amsterdam & Philadelphia: John Benjamins.
Grigg, Peter. 1997. Toubon or not Toubon: the influence of the English language in contemporary France. English Studies 4. 368–384.
Haspelmath, Martin. 2008. Loanword typology: Steps toward a systematic cross-linguistic study of lexical borrowability. In Thomas Stolz, Dik Bakker & Rosa Salas Palomo (eds.), Aspects of language contact: New theoretical, methodological and empirical findings with special focus on Romancisation processes, 43–62. Berlin & New York: Mouton de Gruyter.
Haugen, Einar. 1950. The analysis of linguistic borrowing. Language 26(2). 210–231.
Impe, Leen, Dirk Geeraerts & Dirk Speelman. 2008. Mutual intelligibility of standard and regional Dutch language varieties. International Journal of Humanities and Arts Computing 2(1/2). 101–117.
Kachru, Braj B. 1992. The other tongue: English across cultures, second edition. Urbana: University of Illinois Press.
Kowner, Rotem & Judith Rosenhouse. 2008. The hegemony of English and determinants of borrowing from its vocabulary. In Rotem Kowner & Judith Rosenhouse (eds.), Globally speaking. Motives for adopting English vocabulary in other languages, 4–18. Clevedon: Multilingual Matters.
Krauss, Paul G. 1958. The increasing use of English words in German. The German Quarterly 31(4). 272–286.
Kristiansen, Gitte & René Dirven (eds.). 2008. Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin & New York: Mouton de Gruyter.
Kurth, Ernst-Norbert. 1998. American idiom in Modern German: socio-linguistic motives. Babel: Revue internationale de la traduction 44(3). 193–206.
Labov, William. 2001. Principles of linguistic change. Social factors. Oxford: Blackwell Publishers.
Lattey, E. 1986. Pragmatic classification of idioms as an aid for the language learner. IRAL, International Review of Applied Linguistics in Language Teaching 24(3). 217–233.
Leppänen, Sirpa & Tarja Nikula. 2007. Diverse uses of English in Finnish society: Discourse-pragmatic insights into media, educational and business contexts. Multilingua 26(4). 333–380.
Marti Solano, Ramon. 2012. Multi-word loan translations and semantic borrowings from English in French journalistic discourse. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European lexis, 199–216. Amsterdam & Philadelphia: John Benjamins.
Moon, Rosamund. 1998. Fixed expressions and idioms in English. Oxford: Clarendon Press.
Muysken, Pieter. 2000. Bilingual speech. A typology of code-mixing. Cambridge: CUP.
Nettmann-Multanowska, Kinga. 2003. English loanwords in Polish and German after 1945: Orthography and morphology. Frankfurt am Main: Peter Lang.
Oncins-Martinez, José Luis. 2012. Newly-coined Anglicisms in contemporary Spanish: A corpus-based approach. In Cristiano Furiassi, Virginia Pulcini & Félix Rodriguez-González (eds.), The anglicization of European lexis, 217–238. Amsterdam & Philadelphia: John Benjamins.
Onysko, Alexander. 2004. Anglicisms in German: from iniquitous to ubiquitous? English Today 20(1). 59–64.
Onysko, Alexander. 2007. Anglicisms in German. Borrowing, lexical productivity, and written codeswitching. Berlin & New York: Walter de Gruyter.
Pennycook, Alastair. 2003. Global English, Rip Slyme and performativity. Journal of Sociolinguistics 7(4). 513–533.
Pfaff, Carol. 1979. Constraints on language mixing: Intrasentential code-switching and borrowing in Spanish/English. Language 55(2). 291–318.
Poplack, Shana. 1980. Sometimes I’ll start a sentence in Spanish Y TERMINO EN ESPAÑOL: toward a typology of code-switching. Linguistics 18. 581–618.
Poplack, Shana, David Sankoff & Chris Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26. 47–104.
Seidlhofer, Barbara. 2009. Common ground and different realities: World Englishes and English as a Lingua Franca. World Englishes 28(2). 236–245.
Sharp, Henriette. 2001. English in spoken Swedish. A corpus study of two discourse domains. Stockholm: Almqvist and Wiksell.
Stefanowitsch, Anatol. 2002. Nice to miet you: bilingual puns and the status of English in Germany. Intercultural Communication Studies XI(4). 67–84.
Van de Velde, Freek & Eline Zenner. 2010. Pimp my Lexis: het nut van corpusonderzoek in normatief taaladvies. In Els Hendrickx, Karl Hendrickx, Willy Martin, Hans Smessaert, William Van Belle & Joop Van der Horst (eds.), Liever meer of juist minder? Over normen en variatie in taal, 51–68. Gent: Academia Press.
Van Hout, Roeland & Pieter Muysken. 1994. Modeling lexical borrowability. Language Variation and Change 6(1). 39–62.
Whitney, William D. 1881. On mixture in language. Transactions of the American Philological Association 12. 1–26.
Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yang, Wenliang. 1990. Anglizismen im Deutschen. Tübingen: Max Niemeyer Verlag.
Yano, Yasukata. 2009. English as an international lingua franca: from societal to individual. World Englishes 28(2). 246–255.
Zandvoort, Reinard W. 1964. English in the Netherlands: a study in linguistic infiltration. Groningen: Wolters.
Zenner, Eline, Dirk Geeraerts & Dirk Speelman. 2009. Expeditie Tussentaal: Leeftijd, identiteit en context in ‘Expeditie Robinson’. Nederlandse Taalkunde 14. 26–44.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. 2012. Cognitive Sociolinguistics meets loanword research: Measuring variation in the success of anglicisms in Dutch. Cognitive Linguistics 23(4). 749–792.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. In press. Macro and micro perspectives on the distribution of English in Dutch. A quantitative usage-based analysis of job ads. To appear in Linguistics.
Esme Winter-Froemel
Formal variance and semantic changes in borrowing: Integrating semasiology and onomasiology
Abstract: In this paper I adopt a usage-based approach to borrowing and investigate two fundamental kinds of change which challenge traditional approaches to the processes of borrowing. First, I will address phenomena of formal variance, i.e. different variants of a loanword which may coexist in the recipient language/RL, as given in French people, pipole, pipeul, etc. Second, I will discuss semantic changes which can be observed in the situation of borrowing proper, such as in French sombrero, whose meaning is narrowed compared to the Spanish meaning (‘hat’ → ‘a kind of hat’). In order to account for these phenomena, I will propose an approach which reconsiders language contact and borrowing at the level of speaker-hearer interaction. I will argue that two fundamentally distinct levels of analysis must be carefully distinguished: the level of the abstract or virtual signs and the level of their actualizations in discourse. The latter level is of central importance for the various processes of change, as I will show by combining semasiological and onomasiological analyses: processes of borrowing imply that the RL hearer constructs a (virtual) RL sign out of the form of the loanword which is actualized in language contact. The formal variance observed can thus be explained by assuming varying processes of integration within the RL community, and by assuming that certain degrees and modalities of loanword integration can convey specific pragmatic effects, as indicated by metalinguistic comments in corpus data. Moreover, building on the hypothesis that the borrowings take place in bridging contexts where the SL and RL meanings of the loanwords are functionally equivalent (cf. Evans and Wilkins 2000), the semantic changes can be explained by assuming situations in which different conceptual categorizations and semantic interpretations of the concrete referent are available for the (SL) speaker and the (RL) hearer. Finally, I will argue that the onomasiological perspective also offers an explanation for the fact that loanwords may show fundamentally different pragmatic behaviors, some loanwords typically having strong pragmatic effects (e.g. French pipole), others, in contrast, being more or less neutral lexical choices (e.g. Italian mouse).
1 Challenges for modeling processes of borrowing in a usage-based perspective
In recent research on loanwords we can observe a shift in the focus of analysis: while previous studies have been strongly oriented towards the classification and formal description of loanwords and loanword integration (cf. Betz 1949, 1974; Hope 1971; Duckworth 1977; Kiesler 1993; Peperkamp and Dupoux 2003; LaCharité and Paradis 2005), recent approaches stress another aspect of loanwords, which is their use in communication (cf. Onysko 2007; Shin 2010; Winter-Froemel 2011; Zenner, Speelman, and Geeraerts 2012). Moreover, the shift towards a usage-based perspective on borrowing (cf. Rohde, Stefanowitsch, and Kemmer 1999) also involves a methodological shift towards corpus-based studies, as traditional methods of analyzing loanwords, e.g. by manual extraction from dictionaries, are considered insufficient. At the same time, these new approaches suggest a picture of borrowing and loanword integration that partly diverges from traditional conceptions and highlights the complexity and dynamics of this process of neology in the speakers’ usage.
In this paper my aim is to investigate two challenges which arise from usage-based and corpus-based studies of loanwords. First, it can be shown that processes of borrowing are much more heterogeneous and complex than traditional dictionary-based studies suggest, as they frequently involve a stage where different variants of a given loanword coexist. This finding must at least to some degree hinder attempts to postulate regular processes of loanword integration, and requires more flexible models of borrowing, capable of accounting for the formal variance observed. Second, I will address basic issues in the semantics and pragmatics of borrowing. Some loanwords show semantic changes which occur in the situation of borrowing proper. These changes represent a puzzle for loanword studies, as it is difficult to explain why a source language (SL) speaker should innovate in a situation of language contact with a recipient language (RL) hearer; at the same time, for most loanwords it is not plausible to assume that the RL hearer misunderstood the SL speaker in language contact. Moreover, adopting a usage-based perspective on borrowing, it is interesting to observe that while some loanwords typically convey strong pragmatic effects (e.g. F. pipole, G. Event), similar interpretations are absent in other borrowings (e.g. I. mouse, G. Film).
These specific pragmatic effects have not been systematically addressed in traditional studies, and they thus remain mostly unexplained.
The paper is structured as follows: in the remainder of Section 1 I will set out in more detail the various kinds of challenges which arise from usage-based loanword studies. In Section 2 I will then propose a usage-based approach to borrowing in speaker-hearer interaction. In such a view semasiology and onomasiology have to be redefined with respect to the semiotic entities concerned in each perspective. Section 3 aims at explaining the formal variance observed in loanwords. Section 4 combines semasiological and onomasiological analyses in order to explain semantic changes in borrowing and specific pragmatic effects of loanwords. The results of the analyses are summarized in Section 5. The various arguments discussed here will be illustrated mainly by examples taken from French.
1.1 Formal variance
Many traditional studies on borrowing have been based on dictionaries (cf. Steuckardt et al. 2011). Following the information provided in the dictionary entries, lexical borrowing appears as a relatively simple process in which a certain word form is imported into another language, with the possibility of formal changes (in pronunciation, spelling and morphological features) by which the loanword is adapted to the structure of the RL. However, it can be shown that this picture of borrowing is biased by the methods of analysis and the data sources used, as dictionaries frequently tend to register only variants which have been conventionalized to a large degree. Moreover, some dictionaries also show normative or puristic tendencies, which means that in some cases, loanwords are only reluctantly included at all, and if they are included, variants showing a high degree of formal adaptation to the RL (e.g. F. fioul compared to fuel, cf. DAF), or native equivalents to the loanwords (e.g. F. logiciel for software, see DAF, cf. PR) are recommended. In dictionaries with strong puristic tendencies, it is also possible that certain variants of a loanword, in spite of their wide diffusion in the RL community, are not cited or are marginalized because of their – presumed – inadequate degree of formal integration (e.g. for F. fuel the DAF redirects the reader to the entry fioul).1
If we take corpus data as the starting point of analysis, a radically different picture of borrowing emerges, as in many cases we can observe that several variants of the loanword coexist for a certain period of time after the introduction of the loanword into the RL. This becomes particularly evident if we use corpora containing informal speech and texts with a (relatively) low degree of normativity. For this reason, in spite of certain methodological disadvantages, the Internet as a corpus permits us to gain interesting new insights into processes of borrowing.2
1 Such puristic tendencies can be illustrated by the following citation extracted from the preface of PR : “Certains anglicismes, on le sait, sont plus contestables dans la mesure où ils ne sont pas nécessaires.” (PR, Préface, p. XVIII). However, at the same time the authors emphasize that the main aim of the dictionary is not to establish rules concerning the use of anglicisms, but to observe the linguistic facts really attested and to point to possible problems. In the DAF, the puristic orientation is even more clear: “Nous ne faisons place aux mots étrangers qu’autant qu’ils sont vraiment installés dans l’usage, et qu’il n’existe pas déjà un honnête mot français pour désigner la même chose ou exprimer la même idée.” Nevertheless, the authors insist on the fact that this aspect is often exaggerated: “Nous sommes d’ailleurs plus accueillants qu’on ne le prétend, considérant que la langue est moins menacée par l’extension du vocabulaire que par la détérioration de la syntaxe.” (DAF, Préface). 2 An important issue which has to be addressed when using the Internet as a corpus concerns the possibility of restricting the searches to certain periods of time (cf. Kehoe 2006; Renouf, Kehoe, and Banerjee 2007). However, these difficulties can be partly overcome by using special research instruments, as provided by WebCorp and AltaVista. Nevertheless, certain problems for automated searches in a large number of documents remain, as there is no unitary norm concerning the encoding of the date of Internet pages. Moreover, it has to be acknowledged that the data available on the Internet and its retrieval by search engines is strongly influenced by marketing strategies, that is, the search engines and their search options have not been designed for linguistic purposes, but for purposes related to commercial interests.
This can be shown very clearly if we study loanwords which have only recently been conventionalized in the RL or are still in the course of conventionalization and which, therefore, are only scarcely documented in dictionaries and traditional corpora. In order to show the extent of variation attested in the Internet let us have a look at E. people, as borrowed into French. While the database Frantext only provides a restricted number of occurrences of this loanword in French and exclusively attests this borrowing in adjectival use3, the dictionary PR lists some more forms in which this borrowing may occur (F. people [pipœl] adj.inv. and n.m.inv.; n.m.pl., n.m.sg., pipeul(e), pipole ). However, even these variants still do not exhaust the broad range of actual uses, as can be shown by queries on the Internet. Here we can find the following occurrences in French language documents (the different variants of this borrowing are given in bold print)4: (1)
Les people / Dansent dans la neige et l’alcool / Ils sont tous devenus folles (from the song “Les people” performed by Marianne James, where F. people rhymes with F. idole, parasols, Nicole, gondoles, casseroles, alcool, etc.; cf. http://musique.ados.fr/Marianne-James/Les-People-t105491.html, accessed 09 February 2012)
(2)
Vacances: Où partent les peoples cet été? (title of an article mentioning both male and female celebrities, e.g. Johnny Hallyday, Beyoncé, Helena Christensen, Naomi Campbell, Clint Eastwood, Michael Schumacher, Jean-Claude Van Damme, published 18 July 2011, http://www.staragora.com/news/vacances-oupartent-les-peoples-cet-ete/428206, accessed 13 February 2012)
(3) Ségolène Royal est devenue ce qu’elle voulait: une PEOPLE!! (blog entry, dated 27 February 2009, http://www.bestbuzz.fr/2009/02/politique/segolene-royal-etandre-hadjez/, accessed 09 February 2012)
(4) Les people enceintes: les bébés stars de 2011! (title of an entry published 12 April 2011, author: Johanna, http://www.bebe-cards.com/blog/stars-enceintes/, accessed 12 February 2012)
(5) Les peoples enceintes (title of a blog entry dated 28 April 2011, http://www.clicpostal.com/blog/2011/04/28/les-peoples-enceintes/, accessed 12 February 2012)
3 In total, we find 8 occurrences of people in Frantext (the first dating from 1993), and in all of these uses, the borrowing appears in adjectival form (4 occurrences for “la rubrique ‘People’ [des journaux]”, 2 occurrences for “presse people ”, 1 occurrence for “la série people des magazines” and 1 for “un article ‘people’ ”, query from 09 February 2012). 4 And still, these occurrences do not exhaust the full range of variants attested. However, I will limit the following investigations to variants which are relatively frequently attested (more than 300 results in Google searches).
(6) Un Pipole … Des Pipeaux (title of a satirical comedy, 2009, cf. http://www.placedesprods.com/productions/un-pipole-des-pipeaux, accessed 09 February 2012)
(7) J’aime pas les Pipoles! […] Bon courage aux gentils éleveurs de bambins quasiadultes, ces derniers vous riront bien au nez quand vous leur direz “si tu veux faire quelque chose de ta vie, il faut étudier!” ou quand vous sortirez l’argument massue (et désuet) qu’on a tous entendu au moins une fois: “passe ton bac d’abord!”, vos joyeux drilles, chair de vos entrailles, vous riront bien au nez: “Rien à foutre le vieux, moi je ferai pipooooooole!!” (blog entry, dated 27 January 2012, author: Dunno [female, France], http://dunnowhattodo.over-blog.com/article-jaime-pas-les-pipoles-98039797.html, accessed 09 February 2012)
(8) Entre nous, j’ai trouvé un peu lège de la part des organisateurs de l’évènement qu’ils fassent venir une pipole [= Michelle Blanc, EWF] de l’autre côté de la planète pour la planter là toute seule dans une ville inconnue. (blog entry, dated 22 January 2012, author: Sophie Ménart, http://www.sophiemenart.info/2012/01/journee-a-sete/, accessed 09 February 2012)
(9) Salut les pipeul! J’ai acheter une basse electro acoustique (ça faisais un an que je faisais de la basse éléctrique) et je cherche une ou deux musique sympa a faire, genre autour du feu et tout, ‘fin vous voyez le genre quoi ^^. Merci d’avance les gars (filles aussi)! (posting in an Internet forum, dated 25 August 2008, author: Matouvu, http://forums.abc-tabs.com/lofiversion/index.php/t24441.html, accessed 09 February 2012)5
(10) Devenir écrivain, est-ce que ça vous fait peur ou est-ce que ça vous tente? Je sous-entends “écrivain connu”, un People parmi d’autres, poursuivi par les journalistes et les groupies du Verbe. (…) Vos réponses sont attendues dans le hall d’accueil de Fulgures, ici même. A vous! [= question, author: Mireille] Bon, moi, je suis pas prête ;-) J’aime bien qu’on me foute la paix, alors … Quand je serai vraiment très vieille, bourrée de talent et détachée (mais pas indifférente), peut-être que je me laisserai tenter par les sirènes de la renommée!!! Pour l’instant, je Fulgurise et ça me va. [= comment posted by Sylviane Kerivel, dated 17 February 2009] Réponse de l’auteur: C’est une idée, ça, Sylviane, attendre d’être une vieille mémé pour casser les pieds à tous les pipeuls et à toutes les pipeul-ettes à la ronde ;o)
5 The low degree of normalization of many of the Internet documents cited also becomes visible in the relatively high number of orthographical mistakes or grammatical errors (e.g. j’ai acheter, ça faisais ). However, for certain deviations from the “correct” spelling, it is not clear whether they should be analyzed as (unintended) errors or as intentional (creative) uses (cf. Winter-Froemel 2011: 451). I refrain here from signaling all of these various kinds of (potential) mistakes / errors in the citations and from discussing them in more detail.
en sortant des best sellers. La seule exigence: que nos livres soient bons. Il faudra absolument éviter les compromis, les copinages et le sens du courant. Alors rendez-vous plus tard : ça nous fera bien rire d’en reparler ;o) Bisous à toi. [= author’s response] (discussion in an Internet forum, dated February 2009, http://www.fulgures.com/content/view/2906/2/, accessed 09 February 2012)
(11) Un peu d’pipeule (category title, http://chateauroux-c-est-fou.hautetfort.com/un-peu-d-pipeule/, accessed 12 February 2012)
(12) Salut les pipeule! Je cherche donc un câble de ce type pour ma 360! Si vous avez un bon plan …;) Sinon je partirai à la pêche en magasin demain! (message in an Internet forum, dated 07 January 2011, author: Isimorn, http://forum.gamespirit69.com/viewtopic.php?f=40&t=1165, accessed 13 February 2012)
(13) Mais voilà que monsieur le poète photographie aussi les pipeule! (are you familiar with “les pipeule”? What the French call the beautiful “people”?!!) [= comment to a photo, dated 10 May 2009, author: cieldequimper] Yes, I shot her, she’s one of the (few) actresses I really like. But the background is quite good … Les pipeule is a new word for me, thank you! [= reaction to the comment, dated 10 May 2009, author: Vogon Poet] (http://dailyphotostream.blogspot.com/2009/05/goal.html, accessed 12 February 2012)
(14) L’été, on traque les pipeules (title of a blog entry, dated 06 September 2010, http://pouikou.over-blog.com/article-les-pipeules-se-cachent-56572254.html, accessed 29 March 2012)
These relatively few examples show that there is a broad range of variation beyond the facts registered by PR and Frantext, and this variation affects different levels of analysis: pronunciation ([pipɔl] vs. [pipœl]), spelling (e.g. ⟨people⟩, ⟨pipole⟩, ⟨pipeul⟩, ⟨pipeule⟩), and morphology (e.g. people n.m.sg./n.f.sg./n.pl., pipeul n.pl. vs. pipeuls n.pl.). (Moreover, we can see that the borrowing of E. people into French also shows an interesting semantic development, to which we will return in 1.2.) In order to distinguish this specific situation of coexisting variants from other types of variation, I have proposed the term variance (Winter-Froemel 2010: 69), by which I refer to the existence of alternative realizations of the structural features of a loanword on the levels of spelling, pronunciation and morphology (including gender assignment and inflection).
The different variants attested in the Internet documents cited above can be summarized as follows (for the remainder of this paper I will focus on the nominal use of the borrowing only):6
(15) E. people ‘people’ → F. people n.m.pl. ‘celebrities’
(16) E. people ‘people’ → F. people n.m.sg. ‘a celebrity’, pl. people
(17) E. people ‘people’ → F. people n.m.sg. ‘a celebrity’, pl. peoples
(18) E. people ‘people’ → F. people n.f.sg. ‘a (female) celebrity’, pl. people
(19) E. people ‘people’ → F. people n.f.sg. ‘a (female) celebrity’, pl. peoples
(20) E. people ‘people’ → F. pipole [pipɔl] n.m.sg. ‘a celebrity’, pl. pipoles
(21) E. people ‘people’ → F. pipole [pipɔl] n.f.sg. ‘a (female) celebrity’
(22) E. people ‘people’ → F. pipeul [pipœl] n.m.pl. ‘celebrities’
(23) E. people ‘people’ → F. pipeul [pipœl] n.m.sg. ‘a celebrity’, pl. pipeul
(24) E. people ‘people’ → F. pipeul [pipœl] n.m.sg. ‘a celebrity’, pl. pipeuls
(25) E. people ‘people’ → F. pipeule [pipœl] n.m.sg. ‘a celebrity’, pl. pipeule
(26) E. people ‘people’ → F. pipeule [pipœl] n.m.sg. ‘a celebrity’, pl. pipeules
(27) E. people ‘people’ → F. pipeulette [pipœlɛt] n.f.sg. ‘a female celebrity’, pl. pipeulettes 7
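The inventory in (15) to (27) can also be encoded as simple records, which makes the dimensions of variance easier to inspect. The following is only a minimal sketch and not part of the original study: the `Variant` record, its field names and the helper `group_values` are illustrative assumptions, the data are copied from the list above, and pronunciation is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """One attested nominal variant of F. people (field names are illustrative)."""
    example: int           # example number, (15) to (27)
    spelling: str          # citation form as spelled in the list
    gender: str            # "m" or "f"
    number: str            # number of the citation form: "sg" or "pl"
    plural: Optional[str]  # plural form, if the list gives one


VARIANTS = [
    Variant(15, "people", "m", "pl", None),
    Variant(16, "people", "m", "sg", "people"),
    Variant(17, "people", "m", "sg", "peoples"),
    Variant(18, "people", "f", "sg", "people"),
    Variant(19, "people", "f", "sg", "peoples"),
    Variant(20, "pipole", "m", "sg", "pipoles"),
    Variant(21, "pipole", "f", "sg", None),
    Variant(22, "pipeul", "m", "pl", None),
    Variant(23, "pipeul", "m", "sg", "pipeul"),
    Variant(24, "pipeul", "m", "sg", "pipeuls"),
    Variant(25, "pipeule", "m", "sg", "pipeule"),
    Variant(26, "pipeule", "m", "sg", "pipeules"),
    Variant(27, "pipeulette", "f", "sg", "pipeulettes"),
]


def group_values(variants, attribute):
    """Map each spelling to the sorted set of values found for `attribute`."""
    grouped = defaultdict(set)
    for variant in variants:
        value = getattr(variant, attribute)
        if value is not None:
            grouped[variant.spelling].add(value)
    return {spelling: sorted(values) for spelling, values in grouped.items()}


plurals = group_values(VARIANTS, "plural")
genders = group_values(VARIANTS, "gender")
for spelling in genders:
    print(spelling, genders[spelling], plurals.get(spelling, []))
```

Grouping by spelling shows, for instance, that people and pipole are listed with both genders, and that people, pipeul and pipeule each occur with an invariant plural and with a plural in -s.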
The great extent of variance attested represents an immediate challenge for traditional approaches which consider borrowing and loanword integration as simple processes of importation in which certain features of the SL form may be adapted to the RL, thereby yielding the RL form of the borrowing.8 However, as we can see, “the” RL form does not necessarily exist, as different processes of adaptation may occur, and different degrees of adaptation may be judged as adequate by different RL speakers using this loanword.
6 It can be noted, however, that for F. pipeul, the first results given by common search engines do not include documents from 2011 or 2012, and this variant thus seems to become more and more marginal. Similarly, the information provided in PR hints at a marginal status (cf. the fact that this variant is cited with different spellings – s.v. people we find the spelling , but the entry to this variant reads ), and this observation is further confirmed by the deletion of the mention of this variant in Wikipédia (“People est un terme anglais signifiant «les gens» ou «le people» (mot français à l’origine du mot anglais). Francisé en «pipole» et parfois «pipeul», […]” is modified to “People est un terme anglais signifiant «les gens» ou «le peuple», francisé en «pipole»”, modification dated 12 February 2009, http://fr.wikipedia.org/w/index.php?title=People_%28homonymie% 29&action=historysubmit&diff=37954564&oldid=34232737, accessed 09 February 2012). 7 The form pipeulette arises from a further suffixation of the loanword. 8 For example, optimality theoretic models of loanword integration assume that this process is entirely determined by the specific constraint ranking of the RL or by general perceptual features which only permit one specific structural realization of the loanword in the RL (see e.g. Jacobs and Gussenhoven 2000; LaCharité and Paradis 2005; Peperkamp and Dupoux 2003; Miao 2005; Rose and Demuth 2006). Interestingly, the variance observed thus challenges both the proponents of a phonological stance and perceptual accounts of loanword integration.
What is more, in some cases, we also find different variants used by one and the same speaker and within one and the same text (cf. un People and les pipeuls / les pipeul-ettes in example (10) and les pipeule and people in example (13)). These cases seem even more difficult to account for, as they cannot be explained by factors related to the individual speaker, nor by factors related to the communicative setting (e.g. text type, degree of formality, etc.).
Before passing over to the semantic changes which may occur in borrowing, let us consider one more example of variance. For the borrowing of E. fuel (oil) into French, the degree of variance is lower than for the example discussed above, but it is interesting to observe that again, we find variants which differ fundamentally on the levels of pronunciation and spelling.
(28) E. fuel [ˈfjuːəl] → F. fuel [fjul] (OED; PR; TLF; DHLF)
(29) E. fuel [ˈfjuːəl] → F. fioul [fjul] (OED; PR: «Recomm.[andation] offic.[ielle] pour fuel»; TLF; DHLF s.v. fuel; DAF)
(30) E. fuel [ˈfjuːəl] → F. fuel [fɥɛl] (OED; Arrivé, Gadet, and Galmiche 1986: 250; cf. TLF)
(31) E. fuel [ˈfjuːəl] → F. fuel [fyˈɛl] (OED; TLF)
1.2 Semantic changes and varying pragmatic effects of borrowings
Borrowings occur as a consequence of situations of language contact between a SL speaker and a RL hearer. In these situations, the SL form is used to designate a certain referent, and it can be borrowed by the RL hearer in this meaning. It should therefore typically be expected that borrowings occur without any changes at the semantic level. However, this is not the case. If we compare the SL and the RL meaning(s) of loanwords, we can frequently observe that these do not exactly match. The semantic deviations can be grouped in three main categories (cf. Winter-Froemel 2011: 484–485).
First, deviations can arise from further semantic (or other) change in the SL or in the RL. For example, the semantic difference between the SL and the RL form in (32) can be explained by assuming that the French form originates from a borrowing of E. fuel oil into French (without a change in meaning) and a subsequent ellipsis (F. fuel oil ‘fuel oil’ → F. fuel ‘fuel oil’).9 This kind of process is not restricted to borrowings, and I will therefore not comment on this type of semantic deviation in more detail here.
9 Alternatively, however, we could also assume that the French form is a direct borrowing from E. fuel, with a genuine semantic change (a semantic specialization) in borrowing.
(32) E. fuel ‘material burned to produce heat or power’ – F. fuel ‘fuel oil’ (OED; PR; TLF)
Second, an important observation which can be made is that loanwords are normally borrowed in one meaning only. This means that SL polysemies (and homonymies) normally disappear in processes of borrowing (see example (33)).10
(33) E. link ‘ring of a chain’, ‘person connecting two others’, ‘measure of length’, ‘link between Internet documents’ → I. link ‘link between Internet documents’ (OED; DO)
This kind of change is often presented as a “frequent tendency” in borrowing (Gusmani 1973: 95). However, I would argue that it can even be regarded as a basic characteristic, that is, as a rule of borrowing in general (cf. Pulcini 2002: 162): again, this kind of change seems relatively easy to understand from a usage-based perspective on borrowing, focused on the speakers and interpersonal communication. In the situation of language contact which represents the origin of the borrowing, the speaker uses the SL form in a certain (contextual) meaning, and the other meanings of the SL form are not (directly) relevant.11 Therefore, in this perspective, the reduction of SL polysemy in borrowing does not require any further explanation; in contrast, cases in which SL polysemies equally arise in the RL can be seen as exceptional cases which have to be explained by further investigations.12
Third, there are also genuine cases of semantic change in borrowing, that is, changes occurring in the situation of borrowing proper (in the [presumably13] first use of the loanword in the RL). A relatively recent – and particularly striking – example is the semantic change occurring in the borrowing of E. people into French. The semantic deviation between the SL and the RL form becomes clear if we have a look at the French version of Wikipedia, where a query for “people” redirects to the entry “célébrité”, which is defined as follows:
10 “Normally”, because we may find parallel polysemies in the SL and in the RL. Many of these occur in situations of high bilingualism and intense language contact. On a theoretical level, they can be explained by assuming several processes of borrowing for the various meanings, so that these phenomena do not contradict the explanations and the semiotic modeling of borrowing proposed here. 11 In the remainder of this paper, I will therefore specify only the meanings of the SL forms which are directly relevant to the situations of language contact and borrowing. 12 It would seem interesting to investigate whether there are exceptions to the general rule stated above. For example, we could think of situations of intense language contact and high bilingualism permitting interlingual wordplay on several meanings of a loanword, etc. 13 “Presumably” refers to the perspective of the innovator, who introduces (first uses) the loanword in the RL.
(34) Un personnage public ou une célébrité est une personne largement reconnue ou fameuse qui attire sur elle l’attention du public et des médias. […] (http://fr.wikipedia.org/wiki/People, accessed 09 February 2012)
The meaning is thus narrowed in the process of borrowing (see (35)). Other examples of semantic changes in the situation of borrowing proper are given in (36) to (38). Changes like these represent a puzzle for loanword studies, as it is difficult to explain why a source language (SL) speaker should innovate in a situation of language contact with a RL hearer; at the same time, for most loanwords it is not plausible to assume that the RL hearer misunderstood the SL speaker in language contact.
(35) E. people ‘people’ → F. people ‘celebrities’ (OED; PR)
(36) S. sombrero ‘hat’ → E./F. sombrero ‘a kind of hat with a very wide brim’ (OED; PR; TLF)
(37) I. grappa ‘pomace brandy’ → F. grappa ‘pomace brandy of Italian origin’ (PR; TLF)
(38) E. flipper ‘lever in the pinball game’ → F. flipper ‘pinball game’, ‘pinball machine’ (OED; PR)
Concerning possible explanations of similar changes, it seems important to note that certain kinds of semantic change clearly prevail here. Traditional research has mainly stressed the importance of semantic specializations, as given in the borrowing of F. people (‘people’/‘famous people’) (cf. among others Alexieva 2008: 43; Busse and Görlach 2002: 162; Humbley 2008: 231). In this case, the meaning of the RL form is thus taxonomically subordinated to the meaning of the SL form (see also examples (36) and (37) cited above). However, there is also another type of semantic relation which can be observed: in (38), the SL and RL meaning stand in a relation of contiguity, that is, we are dealing with a metonymic change.
Metonymies are only rarely mentioned in traditional loanword research (for example, in her study on anglicisms in Italian, Pulcini notes that “[…] the original meaning of the loanword may be changed, especially through metonymic modifications”; Pulcini 2002: 162). Quantitatively, metonymies are also less important than specializations in borrowing; nevertheless, the question arises how these deviations can be explained. Returning, once more, to the borrowing of E. people into French, we can finally observe that this loanword can assume a clearly negative value in communication. In other words, specific pragmatic effects can be triggered if a speaker uses this loanword in French (see (39)). For other loanwords, in contrast, similar pragmatic effects seem to be absent (see (40)). Therefore the question arises why pragmatic effects may arise when certain loanwords are used, and why these effects do not apply to all borrowings in the same way.
(39) Mais c’est quoi un pipole en fait? Un pipole, ca peut aussi s’appeler une starlette ou une célébrité, mais dans le mot pipole, il y a pipeau (oui, je sais, c’est léger). Le pipeau est un instrument à vent et peut donc représenter (de façon très imagée) le cerveau d’un pipole (rempli de courant d’air, donc). Le pipole est dans tous les magazines dits presse à scandales ou tabloïds, genre Voici, Gala, Paris Match ou les versions du pauvre : Closer, Oops et Public. Le pipole n’a aucun talent particulier, il fait juste parler de lui, comme ça. Et on le paie pour ça. Le pipole est souvent énervant parce que franchement stupide. Le pipole est souvent une jolie fille pleine de seins ou une joli mec plein de muscles (sachant que le mot “joli” est à l’appréciation de chacun). Le pipole a eu accès à la notoriété en s’affichant un peu n’importe où et n’importe comment. Voilà. (blog entry dated 27 January 2012, author: Dunno [female, France], http://dunnowhattodo.over-blog.com/article-jaime-pas-les-pipoles-98039797.html, accessed 09 February 2012)
(40) Passa il mouse sulle località attive per conoscerne la distanza da Le Bilodole. (“Pass the mouse over the active zones in order to know their distance from Le Bilodole”; from a description of Le Bilodole, a Bed and Breakfast in Tuscany, http://www.bilodole.it/ita/dintorni.html, accessed 28 March 2012)
2 Reconsidering language contact and borrowing at the level of speaker-hearer interaction
In order to account for the various phenomena observed, I will now propose an approach which is rooted in speaker-hearer interaction and which analyzes language contact and borrowing from the perspectives of the speakers and hearers taking part in the communicative events concerned. This implies that the approach taken here is inherently usage-based. While many previous studies on borrowing have insisted on the fact that notions like borrowing and loanword are metaphorical and that it is not languages which can borrow words from other languages, but only speakers (see e.g. Alexieva 2008: 47–48), this observation has not led to a methodological and semiotic reconsideration of the process of borrowing. A central task which has to be addressed in developing a usage-based approach to borrowing thus consists of modeling the communication process. If we analyze concrete situations of language contact and borrowing, i.e. individual acts of communication, we have to take into account the concrete realisations of these semiotic entities in communication, as given in the corpus data. These entities are sound chains (in the case of oral communication) and strings of written signs in
the case of written communication between the SL speaker and the RL hearer.14 At the level of content, these realizations refer to a concrete or actualized referent. Moreover, if we analyze communication as a kind of speaker-hearer interaction, we also have to take into account the linguistic and extra-linguistic knowledge of the speaker and hearer, which means that we will also include the abstract linguistic signs (e.g. the word as part of the SL or the RL, considered as abstract systems of signs, i.e. the signifiant/signifier and the signifié/signified), as well as the extra-linguistic concepts to which these signs may refer (cf. Winter-Froemel 2011: 259–293). While the actualized entities (sound chains/strings of written signs and communicative referents) are, in principle, accessible to both speaker and hearer, their knowledge with respect to the other kinds of entities may diverge.
Summing up these observations, we can propose the following model of communication (as illustrated in Figure 1). It is essentially based on Blank’s semiotic model (Blank 2001: 9; 1997: 102), which is extended to a genuine model of communication (however, some terminological modifications are introduced). While the abstract semiotic entities (virtuelles / ‘virtual’ in the sense used by Saussure [1916] 1969, cf. Heger 1969: 147; Hilty 1971; Coseriu 1955–56; Winter-Froemel 2011: 255) are given in grey, the actualized entities accessible to both speaker and hearer appear in white. The latter entities are actualized in a concrete situation of communication which is shared by speaker and hearer and which can be labelled as the current discourse space (CDS, Langacker 2001: 144, 2007: 425–426); these elements thus represent the common basis for communication.
[Figure 1: schematic diagram. On the speaker’s side and on the hearer’s side alike, a sign ([signifiant], ‘signifié’) appears as a linguistic entity of a particular language system and a CONCEPT as an extra-linguistic entity; both are abstract (virtual) entities. In the shared CDS, the actualized entities are the “… sequence of graphic and / or phonic signs …” and the communicative referent.]
Fig. 1: A comprehensive model of communication
14 Speaker and hearer are employed here in a general sense that includes writers and readers as well. Oral and written refer to the medium in which the speaker’s utterance is realized.
Situations of language contact between a SL speaker and a RL hearer can thus be described as follows: in the situation of language contact, the speaker wants to designate a certain referent. S/he chooses a certain conceptualization of this referent and, accordingly, chooses a SL sign which s/he judges as adequate in order to express that concept. S/he then actualizes the signifier of this sign in graphic and/or phonic form, depending on the medium of communication. In this sense, the speaker’s perspective is basically onomasiological, starting from the referent and the concept s/he wants to designate (cf. Koch 2000: 79–80). For the hearer (or reader) in turn, the actualization of a certain sequence of graphic and/or phonic signs is the starting point from which s/he constructs a RL sign in language contact. It is important to stress that we are faced here with a kind of process that is absent from intralinguistic semantic change, as the material realization of the sign uttered by the speaker is not yet part of the linguistic convention, that is, of the hearer’s knowledge of the recipient language system. The hearer thus has to adopt the new sign and make it part of his/her knowledge of the RL system. In doing this, s/he can realize different processes of loanword integration. Three different aspects or levels should be taken into account here. The signifier has a phonological as well as a graphematic side and, if the situation of language contact occurs only at the graphic or only at the phonic level, the hearer is faced with a graphic or phonic actualization of the sign exclusively and must construct on his/her own either the phonological or the graphematic realization of the sign. In this way, processes of graphophonemic and phonographemic loanword integration can be accounted for (see discussion in Section 3 of this paper). 
Moreover, there is also the possibility of other formal changes in language contact/borrowing, located at the morphological level, such as gender assignment, or agglutinations and deglutinations (e.g. F. le chien ‘the dog’ → Seychelles Creole lisyen ‘dog’; I. l’alicorno ‘the unicorn’ → Middle French la licorne ‘the unicorn’, examples taken from Detges and Waltereit 2002: 155), etc. At the same time, the hearer assigns a certain semantic interpretation to the new sign s/he adopts; in this sense, the hearer’s perspective is basically semasiological. Grounding on the presumed communicative meaning of the speaker’s utterance and, most importantly, on the (presumably) intended communicative referent, the hearer assigns a certain conceptual interpretation to this referent. This conceptual interpretation is the basis on which s/he defines the signified of the new sign. In this process, the RL hearer may choose a conceptualization of the communicative referent which is different from the speaker’s, but communicatively still adequate. For example, s/he may choose a different level of abstraction. If the hearer’s conceptual interpretation becomes the signified of the loanword in the RL, a semantic specialization with respect to the SL meaning of the loanword has occurred (in a parallel way, metonymic changes can be accounted for, see Section 4.1 below). In short, a situation of borrowing implies that the hearer has extracted a signifier as well as a certain conceptual interpretation from the entities actualized in language
contact, and in both respects, deviations from the speaker’s choices (or knowledge) are possible.
Coming back to semasiology and onomasiology, we have seen that these two terms can be related to the perspectives of the hearer and speaker respectively, referring to mental processes occurring in the speaker’s and hearer’s minds. Moreover, these two terms can also be used in order to approach and to account for various puzzles which have been observed in situations of borrowing, and this is what I will propose in the following sections of this paper. As a prerequisite, let us have a closer look at the ways in which the two perspectives are traditionally defined.
Semasiology is frequently conceived as a discipline analyzing the meaning of certain expressions. The basic question here is “What is the meaning of the word X?”, or, in more linguistic terms, “What is the signified corresponding to a certain signifier?” (Geckeler and Kattenbusch 1992: 89), “What is the meaning of a certain linguistic sign or expression?” (cf. Bußmann 2008: 618). Thus, according to these approaches, the starting point of investigation is the signifier, the linguistic sign or expression. In onomasiology, in contrast, the basic question is frequently stated as “How do you express X?”, or, “What is the signifier corresponding to a certain signified or a certain concept?” (Geckeler and Kattenbusch 1992: 89), or “By what linguistic sign can a certain meaning or content be expressed?” (cf. Geckeler and Kattenbusch 1992: 89; Bußmann 2008: 493). These definitions can be applied to a broad range of lexicological investigations, for example, diachronic analyses of semantic changes of a certain expression or comparative analyses of the ways different languages express certain concepts.15 However, processes of borrowing (and situations of lexical innovation in general) are specific domains of analysis, insofar as we are dealing here with the introduction of a new sign into the language system.
This means that at the point of innovation, there is not yet any signifier or signified in the (recipient) language, as the innovation has not yet become part of the language system concerned. This excludes the possibility of taking entities like the signifier as a starting point, and for our concerns it thus becomes necessary to take actualized semiotic entities as the starting points of our investigation. The two basic points of reference in language contact and borrowing are exactly the two kinds of entities which are accessible to both speaker and hearer in the current discourse space – the sequence of graphic and/or phonic signs as well as the communicative referent, and the two perspectives can thus be opposed to each other according to whether a sequence of graphic and/or phonic signs or the communicative referent is the starting point of investigation. Summing up, we will define semasiology and onomasiology in this paper as follows:

ONOMASIOLOGY takes the communicative referent as the starting point of analysis and investigates how this actual referent is conceptualized and by what phonic or graphic signs it is designated in a concrete situation of communication.

SEMASIOLOGY takes a certain sequence of phonic or graphic signs, such as actualized in situations of language contact, as the starting point of analysis and investigates the concepts or referents designated by this sequence of signs.

15 Nevertheless, one exception must be made here: definitions of onomasiology referring to the signified and the signifier as the two main reference points seem ill-defined as, according to Saussure’s conception of these two entities, they are inseparable, and it thus seems impossible to first isolate a signified and then look for its signifier (cf. Winter-Froemel 2011: 232). A semasiological investigation of the signified of a certain signifier, in contrast, seems possible.

Formal variance and semantic changes in borrowing
Before passing over to the analyses of our examples, let me add one further methodological remark. For many of the following onomasiological descriptions, it is sufficient to refer to concepts (and not communicative referents) as the starting points of investigation (cf. the definition of onomasiology proposed by Koch 1996: 224), even if, ultimately, the reinterpretations observed in language contact and borrowing always go back to reinterpretations in concrete situations of communication with an actual communicative referent. In the remainder of this paper I will mark the concepts designated by small caps.
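The two perspectives defined above can be pictured as two lookups over one and the same set of attested form-meaning pairings. The following minimal sketch is purely illustrative and not part of the analysis proper: it encodes only the spellings and glosses of the variants of people listed in examples (45)–(57), and the names PAIRINGS, by_form and by_gloss are hypothetical. It also simplifies by working with glosses, whereas the definitions above take communicative referents (or concepts) as the onomasiological starting point.

```python
from collections import defaultdict

# Spelling/gloss pairings taken from examples (45)-(57); pronunciation,
# gender and plural marking are deliberately left out of this sketch.
PAIRINGS = [
    ("people", "celebrities"),
    ("people", "a celebrity"),
    ("people", "a (female) celebrity"),
    ("pipole", "a celebrity"),
    ("pipole", "a (female) celebrity"),
    ("pipeul", "celebrities"),
    ("pipeul", "a celebrity"),
    ("pipeule", "a celebrity"),
    ("pipeulette", "a female celebrity"),
]

by_form = defaultdict(set)   # semasiological direction: form -> glosses
by_gloss = defaultdict(set)  # onomasiological direction: gloss -> forms
for form, gloss in PAIRINGS:
    by_form[form].add(gloss)
    by_gloss[gloss].add(form)

# Semasiological query: what is designated by the form "pipeul"?
print(sorted(by_form["pipeul"]))
# Onomasiological query: by which forms is 'a celebrity' designated?
print(sorted(by_gloss["a celebrity"]))
```

Keeping both indexes over the same list makes it visible that the two perspectives do not rely on different data, only on different starting points.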
3 Extracting the RL signifier: Processes of loanword integration

First, let us have a look at changes in situations of borrowing which relate to the hearer’s construction of the loanword’s signifier in the RL. As we have seen in Section 1.1, we can find a high degree of variance with respect to formal integration for some loanwords, and I will now focus on variants of two recent loanwords differing fundamentally with respect to the degrees and modalities of integration. For the borrowings of E. people and E. fuel into French, the following variants have been observed16:

(41) E. fuel [ˈfjuːəl] → F. fuel [fjul]
(42) E. fuel [ˈfjuːəl] → F. fioul [fjul]
(43) E. fuel [ˈfjuːəl] → F. fuel [fɥɛl]
(44) E. fuel [ˈfjuːəl] → F. fuel [fyˈɛl]
(45) E. people ‘people’ → F. people n.m.pl. ‘celebrities’
(46) E. people ‘people’ → F. people n.m.sg. ‘a celebrity’, pl. people
16 For the variants of people observed in 1.1, I will focus here only on the different signifiers (pronunciation, spelling, gender assignment, plural marking). The various semantic interpretations will be commented on in Section 4.1 below.
(47) E. people ‘people’ → F. people n.m.sg. ‘a celebrity’, pl. peoples
(48) E. people ‘people’ → F. people n.f.sg. ‘a (female) celebrity’, pl. people
(49) E. people ‘people’ → F. people n.f.sg. ‘a (female) celebrity’, pl. peoples
(50) E. people ‘people’ → F. pipole [pipɔl] n.m.sg. ‘a celebrity’, pl. pipoles
(51) E. people ‘people’ → F. pipole [pipɔl] n.f.sg. ‘a (female) celebrity’
(52) E. people ‘people’ → F. pipeul [pipœl] n.m.pl. ‘celebrities’
(53) E. people ‘people’ → F. pipeul [pipœl] n.m.sg. ‘a celebrity’, pl. pipeul
(54) E. people ‘people’ → F. pipeul [pipœl] n.m.sg. ‘a celebrity’, pl. pipeuls
(55) E. people ‘people’ → F. pipeule [pipœl] n.m.sg. ‘a celebrity’, pl. pipeule
(56) E. people ‘people’ → F. pipeule [pipœl] n.m.sg. ‘a celebrity’, pl. pipeules
(57) E. people ‘people’ → F. pipeulette [pipœlɛt] n.f.sg. ‘a female celebrity’, pl. pipeulettes

How can the emergence of the single variants be explained? And how can their coexistence (competition) in the RL – and in the utterances of single speakers – be accounted for? Concerning the first question, it immediately becomes clear from the examples cited above that for pronunciation and spelling, different options are viable for the RL speakers. The examples in (42) and (43) illustrate two fundamentally different types of integration: whereas the variant in (42) remains relatively close to the SL pronunciation, its spelling differs considerably from the SL model. In (43), in turn, the original spelling is kept, but the pronunciation of the loanword strongly deviates from the SL realization. In traditional loanword research, these two options have been distinguished by the labels ‘phonographemic’ vs. ‘graphophonemic’ integration (cf. Meisenburg 1993; for the following explanations cf. also Winter-Froemel 2011: 368–375).17 How can the semiotic model proposed in Section 2 contribute to a better understanding of these two options?
For phonographemic integration, it can be assumed that the language contact took place in the phonic medium, so that the RL hearer was confronted with a phonic realization of the sign. When integrating this phonic form into his/her language system, the hearer makes certain adjustments to the RL system: vocalic length (which is not a distinctive feature in the vocalic system of French) and the schwa disappear
17 Other terms used in the literature are ‘ear-loans’ vs. ‘eye-loans’ (Meisenburg 1993: 48), ‘phonetic spelling’ vs. ‘spelling pronunciation’ (Jabłonski 1990: 23), emprunts auditifs et phonétiques vs. emprunts visuels et graphiques (Roudet 1908: 244) (cf. Winter-Froemel 2011: 265).
(the latter change is due to phonotactic reasons). Nevertheless, the pronunciation remains relatively close to the SL model. The spelling <fioul>, in contrast, is developed independently from the SL spelling; instead, the adapted pronunciation serves as the basis for deriving the RL spelling of the loanword by applying the rules of the French writing system (phoneme-to-grapheme correspondence rules, cf. Meisenburg 1996) to the single phonic units: [f] ↔ <f>, [j] ↔ <i>, [u] ↔ <ou>, [l] ↔ <l>. This modality of integration can be summarized as shown below.18 A similar description could be given for F. pipole [pipɔl], pipeul [pipœl], pipeule [pipœl]: E. ["pip9l] → F. [pipɔl] / [pipœl] → <pipole> / <pipeul>, <pipeule>, according to the rules [p] ↔ <p>, [i] ↔ <i>, [ɔ] ↔ <o> / [œ] ↔ <eu>, [l] ↔ <l>, ([-] ↔ <e>).19

(58) E. fuel [ˈfjuːəl] → F. fioul [fjul]
[Diagram for (58), contact in the phonic medium. S [E.]: [ˈfjuːəl], ‘material burned to produce heat or power’, MATERIAL BURNED TO PRODUCE HEAT OR POWER. CDS: […ˈfjuːəl…]; referent: fuel oil. H [F.]: [fjul], spelled according to [f] ↔ <f>, [j] ↔ <i>, [u] ↔ <ou>, [l] ↔ <l>; ‘fuel oil’, FUEL OIL.]
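The phonographemic derivation described for fioul (and, analogously, for pipole and pipeul) can be sketched as a two-step procedure: adaptation of the phonic form, followed by spelling through phoneme-to-grapheme correspondences. The sketch below is a simplified illustration under stated assumptions, not a model of French orthography: the table P2G contains only the correspondences cited in the text, adaptation is reduced to deleting the length mark and the schwa, stress is left out of the input, and the names adapt, spell and final_e are hypothetical.

```python
# Phoneme-to-grapheme correspondences cited in the text for fioul and for
# pipole / pipeul(e); illustrative only, not a full account of French spelling.
P2G = {"f": "f", "j": "i", "u": "ou", "l": "l",
       "p": "p", "i": "i", "ɔ": "o", "œ": "eu"}

# Assumption of this sketch: adaptation to the RL system consists only in
# dropping vowel length and the schwa.
DROPPED = {"ː", "ə"}

def adapt(sl_phones):
    """Return the RL phonic form derived from a list of SL phones."""
    return [p for p in sl_phones if p not in DROPPED]

def spell(rl_phones, final_e=False):
    """Spell an RL phonic form; final_e adds the optional silent <e>."""
    word = "".join(P2G[p] for p in rl_phones)
    return word + "e" if final_e else word

rl = adapt(["f", "j", "u", "ː", "ə", "l"])
print("".join(rl), spell(rl))                 # phonic form and derived spelling
print(spell(list("pipɔl"), final_e=True))     # spelling with optional final <e>
print(spell(list("pipœl")))
```

The optional final_e flag mirrors the hearer's liberty, noted for pipeul vs. pipeule, of choosing between spellings that correspond to the same pronunciation.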
In order to explain the French forms fuel [fɥɛl] / [fyˈɛl], in contrast, we have to assume a completely different scenario of language contact and borrowing. Here the loanword is borrowed in graphic form (without any formal changes) and then pronounced according to the rules of the RL system (grapheme-to-phoneme correspondence rules), yielding [fɥɛl]: <f> ↔ [f], <u> ↔ [ɥ] and <e> ↔ [ɛ] (both context-sensitive), <l> ↔ [l] (see (59)). The alternative realization [fyˈɛl] can be explained by assuming that the hearer applies the context-free rule <u> ↔ [y] instead. As a result, we obtain the disyllabic realization here, which triggers a further assignment of ultimate stress (again, according to the rules of the system of French).
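The graphophonemic scenario can be sketched in the same spirit, this time reading the imported spelling off grapheme by grapheme. The following toy function covers the letters of fuel only. Two points are assumptions of the sketch rather than claims of the analysis: the context-sensitive rule for the letter u is taken to apply before a vowel letter, and final stress is marked simply by placing the stress sign before the last syllable nucleus. The names transcribe and u_context_sensitive are hypothetical.

```python
VOWEL_LETTERS = set("aeiouy")
NUCLEI = {"y", "ɛ"}  # syllable nuclei produced by this toy rule set

def transcribe(spelling, u_context_sensitive=True):
    """Toy grapheme-to-phoneme reading covering the letters of 'fuel' only."""
    phones = []
    for i, letter in enumerate(spelling):
        following = spelling[i + 1] if i + 1 < len(spelling) else ""
        if letter == "u":
            # Assumed context: glide before a vowel letter, otherwise [y].
            glide = u_context_sensitive and following in VOWEL_LETTERS
            phones.append("ɥ" if glide else "y")
        elif letter == "e":
            phones.append("ɛ")
        elif letter in ("f", "l"):
            phones.append(letter)
        else:
            raise ValueError(f"no rule for letter {letter!r} in this sketch")
    nuclei = [i for i, p in enumerate(phones) if p in NUCLEI]
    if len(nuclei) > 1:
        # Disyllabic outcome: mark stress on the final syllable.
        phones.insert(nuclei[-1], "ˈ")
    return "".join(phones)

print(transcribe("fuel"))                             # context-sensitive rule for u
print(transcribe("fuel", u_context_sensitive=False))  # context-free rule for u
```

Switching a single rule changes the syllable count, and stress assignment follows from that, which is the dependency described in the paragraph above.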
18 S = speaker, H = hearer, CDS = current discourse space, E. = English, F. = French. 19 For the following scenarios of language contact and borrowing, I assume that it is the RL hearer who imports the loanword into the RL (cf. classical definitions of borrowing as a hearer-induced process or as a case of RL agentivity as opposed to substratum interference as a speaker-induced process or SL agentivity; cf. Thomason and Kaufman 1988: 37–45; Van Coetsem 2000: 65). Nevertheless, as an anonymous reviewer points out, it is also possible that SL speakers introduce a loanword into the RL in a (partly) adapted form (e.g. for stylistic or socio-pragmatic reasons). We could thus assume parallel scenarios of language contact and borrowing where the processes of loanword integration described above are located in the mind of the speaker, so that the actualized sequence of phonic or graphic signs already shows phenomena of adaptation to the RL.
(59) E. fuel [ˈfjuːəl] → F. fuel [fɥɛl]
However, we notice that there is also a third basic option of integration apart from the solutions we have just discussed. In some cases, we can find RL spellings of a loanword which remain close or identical to the SL and at the same time, the pronunciation of the loanword also remains close to the SL model. In order to explain these borrowings, we have to assume that the language contact involves both the phonic and the graphic medium20:

(60) E. fuel [ˈfjuːəl] → F. fuel [fjul]
[Diagram for (60), contact in both the graphic and the phonic medium. S [E.]: [ˈfjuːəl], ‘material burned to produce heat or power’, MATERIAL BURNED TO PRODUCE HEAT OR POWER. CDS: «…fuel…» and […ˈfjuːəl…]; referent: fuel oil. H [F.]: [fjul], ‘fuel oil’, FUEL OIL.]
This last kind of scenario has not been mentioned in traditional research. However, it seems to be of central importance for analyzing recent loanwords. This is confirmed by the fact that several of the variants attested for the borrowing of E. people can be explained by assuming a scenario of double contact with the phonic and the graphic realization of the SL sign: the pronunciation variants [pipɔl] and [pipœl] which are attested for the spelling <people> in French21 cannot be derived from the RL spelling,
20 I would also apply this kind of scenario to situations of borrowing where there is a relatively high degree of bilingualism, so that the RL hearer may already have encountered the SL pronunciation and/or spelling of the loanword at an earlier stage of time.
21 The pronunciation [pipœl] is indicated by PR; for the variant [pipɔl] see example (1), where <people> rhymes with alcool, folles, etc.
but can only be explained by referring to the SL pronunciation. At the same time, the spelling <people> represents a direct importation of the SL spelling.

To sum up, the different spelling and pronunciation variants of a certain loanword in the RL can be explained by assuming different scenarios of language contact and borrowing, as well as different degrees of bilingualism. At the same time, we have already seen that within these scenarios, the hearer may also have a certain liberty of choosing one variant or another (e.g. the spellings <pipeul> vs. <pipeule> can both be derived from the pronunciation [pipœl] according to the rules of the French writing system).

Let us now turn to the morphological level (for the following analyses cf. Winter-Froemel 2011: 362–365 and 467–472). A first observation which emerges from the examples cited above is that we can find both masculine and feminine gender for the spelling variants <people> and <pipole>, whereas <pipeul> is attested only with masculine gender. To account for gender assignment in loanwords, different factors have been shown to be potentially relevant in previous research. These factors may contradict each other so that, once again, different outcomes may be obtained. It is generally assumed that the attribution of masculine gender is the unmarked choice for loanwords (cf. Humbley 2002: 116), so this factor could account for all of the masculine variants. Moreover, for the variant <pipeul>, masculine gender can also be motivated by suffix analogy, as the ending of this variant corresponds to the French suffix -eul determining masculine gender (cf. Callies, Ogiermann, and Szcześniak 2010: 67). The variant <pipole>, in contrast, shows a formal analogy to the suffix -ole motivating feminine gender assignment, which is equally attested in the corpus data. Another potentially relevant factor is analogy to semantically (and possibly formally) similar words in the RL (which are paradigmatically related to the loanword). In our case we could think of the RL items peuple n.m.
(motivating masculine gender assignment) on the one hand and célébrité, star, vedette, personne, personnalité, all n.f. (motivating feminine gender assignment) on the other hand. In many cases, one or several of these semantically related items appear together with the loanword in the speaker’s utterance (i.e. on the syntagmatic level), so that it seems even more plausible that these items can influence the gender assignment of the loanword: (61) Tout le monde a relevé, certains pour s’en offusquer, d’autres pour le déclarer insignifiant, la coloration très people de la soirée électorale. […] le fait marquant reste encore ce qu’il est désormais convenu de nommer peopolisation de la vie politique. […] On remarquera pourtant ce glissement qui, du peuple, nous entraîne vers le people, vers la célébrité! […] En s’anglicisant, la démocratie moderne trahit la république! (extracted from the article «People», in: Lexique 2007/Petit abécédaire du quotidien, author: Pierre-Michel Simonin, http://pierre-michel.fr/lexique/people.htm, accessed 31 March 2012, italics original, bold print EWF)
(62) Et vous Charlotte, êtes-vous plutôt une star, une célébrité ou une people? (from an interview with Charlotte Baut, in the radio broadcast 2007: La vie privée des stars, aired 24 December 2007, 19.45, RTL TVI, text version by Mara UM, http://www.tuner.be/actu.asp?id=140340&content=home, accessed 25 May 2009, emphasis original)

Finally, another factor which becomes relevant for the borrowing of E. people is the natural gender of the referent, that is, masculine gender should occur for male referents (or mixed groups of referents and generic uses), feminine gender for female referents. Many occurrences of the loanword in French confirm this tendency; however, even here we can find counter-examples:

(63) on nous montra tout de même une “pipole”: un ancien journaliste député européen et candidat malheureux, dernièrement, à Paris XII (blog entry dated 17 December 2008, http://ferragus.blog.lemonde.fr/2008/12/17/romanee-counteez/, accessed 31 March 2012, bold print EWF)

With respect to inflection, it is important to note that the following remarks concern only written realizations of this loanword (in phonic realization, plural marking is obtained by other means such as the definite article, e.g. [ləpipɔl] vs. [lepipɔl], etc.; for the following remarks cf. Winter-Froemel 2011: 473–479). We have already seen that plural forms in -s as well as invariant forms are attested in the RL. The invariant forms remain closer to the SL model: they can be motivated by referring to the situations of language contact where the SL form is used in its collective meaning. At the same time, plural is often redundantly marked in written utterances in French (cf. the different singular vs. plural forms of the article, adjectives in singular vs. plural form co-occurring with the nominal form of the loanword, etc.), so that a lack of plural marking on the noun can be compensated for.
Moreover, it can be assumed that the loanword – which is at most weakly entrenched in other RL hearers’ minds at the stage of early diffusion in the RL – can more easily be recognized (that is, related to the entry “people”/“pipole”/“pipeul” etc. in the mental lexicon of RL hearers) if it is used in an invariant form. On the other hand, we have already seen that the loanword is also used with an individuative meaning. For these uses which show a greater distance with respect to the SL form, pluralization with -s is a plausible option, as it corresponds to the rules of the French language. Along these lines, pluralization with -s can also be chosen for collective uses. Thus, both strategies can be motivated, and it seems interesting to observe that the two possible strategies correlate with the degree of integration at the phonetic/phonological and graphematic level: for the weakly integrated variant <people>, invariant plural forms by far prevail (2,930,000 hits for “les people” vs. 931,000 for “les peoples”), whereas for the strongly integrated variant <pipole>, proportions are inverted (1,530 hits for “les pipole” vs. 146,000 for “les pipoles”, queries with google.fr, 29 March 2012).
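The inversion of proportions can be made explicit by expressing the hit counts cited above as shares of invariant plurals. The short calculation below uses only the four figures reported for the google.fr queries of 29 March 2012; the names HITS and invariant_share are hypothetical, and raw search-engine hit counts are of course only a rough indicator of usage.

```python
# Hit counts as reported in the text (google.fr, 29 March 2012).
HITS = {
    "people": {"invariant": 2_930_000, "s_plural": 931_000},
    "pipole": {"invariant": 1_530, "s_plural": 146_000},
}

def invariant_share(counts):
    """Share of invariant plurals among all plural hits for one variant."""
    return counts["invariant"] / (counts["invariant"] + counts["s_plural"])

for variant, counts in HITS.items():
    print(f"{variant}: {invariant_share(counts):.1%} invariant plurals")
```

On these figures, roughly three out of four plural hits are invariant for the weakly integrated spelling, against about one in a hundred for the strongly integrated one.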
To sum up, from a usage-based perspective focusing on the single situations of language contact and borrowing, it seems obvious that different variants of a certain loanword – and different degrees and modalities of loanword integration – can be considered adequate by the speakers introducing the novel forms into the language. In this sense, variance and fluctuation in loanword integration do not appear to be exceptional cases, but rather represent the rule in borrowing (at least at a first stage of time, when the novel forms are introduced into the RL).22 This observation is particularly valid if the RL speech community is highly heterogeneous, e.g. with respect to the degrees of bilingualism, the attitudes towards the SL, diatopic variation (cf. Canadian French), etc. However, one last issue remains to be addressed: as variance is attested not only within the RL community as a whole, but also within the utterances of single RL speakers (cf. examples (64) and (65)), we can ask why individual speakers should use different variants of a certain loanword within a single utterance. (64) Fioul/fuel: obtenir la prime à la cuve. Les ménages non imposables qui ont fait livrer du fuel pour leur chauffage entre le 10 novembre 2007 et le 31 janvier 2008 peuvent demander une aide à la cuve. […] Cette prime au fioul devrait être portée à 200 euros pour l’hiver prochain. (document dated 11 June 2008, author: ericRg, http://droit-finances.commentcamarche.net/faq/sujet-1086-fioulfuel-obtenir- la-prime-a-la-cuve, accessed 31 March 2012, bold print EWF) (65) Les people aussi sont fans d’Apple. Le monde entier est en émoi après l’annonce hier soir du décès de Steve Jobs, l’ex-patron légendaire de la marque à la pomme. Apple compte plusieurs millions de fans à travers le monde et les people ne sont pas en reste. 
La direction d’Apple l’a annoncé hier soir dans un communiqué : Steve Jobs, l’ex-directeur général et co-fondateur de la marque californienne est décédé des suites d’un cancer du pancréas. Aujourd’hui, les fans de la marque ont salué la mémoire du «magicien» qu’il était, en organisant des minutes de silence. De même, les Apple Store du monde entier ont été submergés par des milliers de bouquets de fleurs en hommage au génie visionnaire qui a «révolutionné le monde». Et les peoples ne sont pas en reste! De 50cent à Nicole Kidman en passant pargeorges Clooney, tous sont fans avérés de la marque à la pomme. (article in marie claire, title: Les people aussi sont fans d’Apple, author: Emmanuelle Ringot, http://www.marieclaire.fr/,stars-fanapple-steve-jobs,20175,432899.asp, accessed 29 March 2012, bold print EWF) In some cases, the combination of different variants can be interpreted as a deliberate choice of the speaker. In example (64), we can assume that s/he mentions both
22 As a related point, it seems plausible to assume parallel and independent innovations by several RL speakers for most borrowings.
spelling variants in the title of the document in order to increase its accessibility by Internet search engines. Moreover, it can be assumed that RL speakers may judge different variants of a loanword as adequate in different communicative settings, e.g. formal vs. informal communication, etc. This means that we can assume that different degrees and modalities of loanword integration can be pragmatically or stylistically marked and interpreted as sociolinguistic variables (I will return to this issue in more detail in Section 4.2 below). Combining different realizations of the loanword can thus be seen as a strategy of adapting the utterance to a (presumed) heterogeneous audience regarding attitudes towards loanword integration. In other cases, however, the use of several variants of a loanword does not seem to be a deliberate choice: in example (65) and in the further uses of the loanword in (64), there are no clear indications of a conscious choice of one over the other variant within the text. Similar uses suggest that loanwords may be underspecified in the RL hearers’ minds, and that certain of their features (e.g. plural marking, spelling, pronunciation) may be determined ad hoc when the loanword is (re)used. Further evidence for this hypothesis is provided by metalinguistic comments of speakers who express their uncertainty about the “correct” realization (pronunciation, spelling, plural marking, etc.) of single loanwords: (66) […] Enfin … il s’adapte, travailler plus pour gagner plus, ça marche aussi pour les people (ou peoples, je ne sais pas si on met le S pour ces anglicismes!). (blog entry, dated 08 February 2008, author: moussakaonline [French nationality, female, born in 1978, living in Athens; cf. http://www.uniterre.com/membre/mous sakaonline/], http://moussakaonline.uniterre.com/&thisy=2008&thism=2&this d=8, accessed 31 March 2012, bold print EWF)
4 Semasiological and onomasiological analysis

Having investigated the different processes of loanword integration with respect to the formal realization of loanwords, let us now turn to the domain of content. The following sections will deal with two basic issues: first, the possibility of accounting for semantic changes in the situation of language contact/borrowing proper23, and second, the analysis of different pragmatic effects which may arise from varying degrees and modalities of integration as well as from two fundamentally different types of loanwords.
23 The semantic changes studied in the following sections originate in language contact, when the RL hearer interprets the loanword in a way which is different from the SL speaker’s meaning. However, the semantic shift introduced by the RL hearer may become visible only when s/he reuses the loanword in the RL.
4.1 Extracting the RL signified: How to account for semantic changes in borrowing

Based on the usage-based hypothesis that introductions of words borrowed from a SL into a RL can be traced back to single events of communication24, one should expect that borrowings normally take place without semantic changes, so that the loanwords are introduced into the RL exactly in the meaning they had in the situation of language contact, when the RL hearer was first confronted with the SL form.25 As we have seen in the previous sections of this paper, however, semantic changes in borrowing do occur. In a semasiological perspective, we can thus compare the SL and the RL meaning of a loanword and observe, in some cases, that the two meanings do not match. More specifically, we have seen that two types of semantic relations are attested: taxonomic subordination (for semantic specializations), and contiguity (for metonymic changes). If we turn back to the semiotic model proposed in Section 2, we are faced with the task of accounting for these semantic innovations taking place in language contact, even though the communication between the SL speaker and the RL hearer remains successful (that is, we assume situations of language contact which do not imply a complete misunderstanding leading to a breakdown of communication). How can speaker and hearer successfully communicate and at the same time assign different meanings to the lexical items occurring in the utterance? The speaker aims at expressing a particular content: s/he wants to designate a certain referent and chooses a certain conceptualization of this referent and a SL sign having the adequate meaning (onomasiological perspective). The actualization of this sign is the basis on which the hearer reconstructs the message (semasiological perspective). But at the same time, the hearer’s semantic interpretation of the utterance is constrained by the communicative referent.
This means that the onomasiological perspective becomes equally relevant here, as we can analyse and compare the different conceptualizations assigned to the referent by speaker and hearer, the hearer’s conceptualization becoming possibly conventionalized and thus determining the loanword’s signified in the RL. Let us recall the semantic changes observed in the loanwords mentioned above. For E. people ‘people’, the first RL meaning is ‘celebrities’, a meaning which is taxonomically subordinated to the SL meaning:
24 These events of communication are language contact on the one hand and borrowing proper on the other hand, that is, the first use of the new item in the RL, according to the speaker’s perception. 25 Reduction of SL polysemy (or homonymy), in contrast, is highly frequent, as becomes immediately clear from the usage-based approach to borrowing proposed here (cf. Section 1.2).
(67) E. people ‘people’ → F. people n.m.pl. ‘celebrities’, F. pipeul [pipœl] n.m.pl. ‘celebrities’

The basic claim made here is that the semantic shift can be explained by assuming contexts of use (in language contact) where both interpretations are compatible with the overall meaning of the utterance, or, in other words, where both meanings are functionally equivalent. Recently the notion of ‘bridging contexts’ has been proposed to describe this type of contexts (Evans and Wilkins 2000); moreover, this aspect can be related to the topic of ambiguity, as the actualized signs are potentially ambiguous in the situations of language contact which are relevant here (cf. Winter-Froemel and Zirker 2010; Winter-Froemel 2012). In order to explain the semantic shift occurring in the borrowing of E. people into French, it is generally assumed that the American magazine People plays an important role. A first example for a potentially ambiguous utterance would thus be the title of the magazine itself, or an identical heading within another magazine (example (68)). Similarly, many occurrences of the word E. people appearing in descriptions of the American magazine People are also ambiguous (example (69)): the SL speaker conceptualizes the referents (which are normally famous people, if only because they become famous for appearing in this magazine) as PEOPLE26; at the same time, the RL hearer may interpret the form in a more specific meaning (as designating the concept FAMOUS PEOPLE, CELEBRITIES) without any negative effects arising from this (re-)interpretation for the success of communication.
It is thus possible that the hearer attributes a new meaning to the loanword, so that s/he may then reuse the loanword in the RL in the more specific meaning in contexts where both meanings are no longer functionally equivalent (see example (70)).27 (68) People (69) People (originally called People Weekly) is a weekly American magazine of celebrity and human-interest stories, published by Time Inc. […] People’s website, People.com, focuses exclusively on celebrity news. […] People is perhaps best known for its yearly special issues naming “Most Beautiful People”, “The Best
26 There is also a prototypicality effect here (famous people are the most prototypical referents which are relevant here), as well as a conversational implicature towards the more specific interpretation (FAMOUS PEOPLE), which is triggered by the apparent flouting of the maxim of relevance (cf. Grice 1975; Sperber and Wilson 1986). However, this interpretation remains cancellable in the SL (which is not the case in the RL).
27 In the salutation “Salut les pipeul(e)!” (see examples (9) and (12) cited in Section 1.1), however, the loanword seems to be used not in its conventional RL meaning (referring to CELEBRITIES), but in a meaning which remains closer to the SL (PEOPLE; cf. the salutations “Salut les gars!”, “Salut les filles!”, “Salut les copains” etc.). At the same time, the use of the loanwords in the salutation formula can convey pragmatic effects (extravagance, expressivity), to which we will return in Section 4.2.
Dressed”, and “The Sexiest Man Alive”. (http://en.wikipedia.org/wiki/People_%28magazine%29, accessed 30 March 2012)

(70) J’aime pas les Pipoles! (blog entry dated 27 January 2012, author: Dunno [female, France], http://dunnowhattodo.over-blog.com/article-j-aime-pas-les-pipoles98039797.html, accessed 09 February 2012)

Beyond that, we can find many instances of variants of people used in an individuative meaning (‘a celebrity’) in the RL, and in some cases, further morphological features are used to distinguish between male and female referents (grammatical gender or suffixes, cf. un pipeul – une pipeulette28):

(71) E. people ‘people’ → F. people n.m.sg. ‘a celebrity’, pl. people / peoples, pipole [pipɔl] n.m.sg. ‘a celebrity’, pl. pipoles, F. pipeule [pipœl] n.m.sg. ‘a celebrity’, pl. pipeule / pipeules

Here again, the semantic innovations can be explained by assuming scenarios of communication where the loanword is ambiguous and permits both the collective and the individuative interpretation. In utterances like (72) containing the plural form, les people can be interpreted as a collective term, but also as referring to a sum of individual celebrities. In this way, it is possible that the speaker uses the loanword in its collective meaning, but the hearer reinterprets and reuses it him/herself in later utterances with an individuative meaning (as indicated, e.g., by uses of the singular un(e) people).

(72) Les people enceintes: les bébés stars de 2011 ! (title of an entry published 12 April 2011, author: Johanna, http://www.bebe-cards.com/blog/stars-enceintes/, accessed 12 February 2012)

Finally, the introduction of further distinctions between male and female referents is evidence of a creative use and reshaping of the imported material according to the communicative needs of the RL speakers. Similar uses are still relatively rare in the RL, but they may become conventionalized if they prove useful for a large number of speakers.
In this sense, we are faced here with an example of language change in the making. (73) E. people ‘people’ → F. people n.f.sg. ‘a (female) celebrity’, pl. people / peoples, F. pipole [pipOl] n.f.sg. ‘a (female) celebrity’
28 On the Internet we also find an utterance where the form pipol is used for male referents and pipole for female referents, but (at least for the time being, March 2012), this distinction remains an individual use: “cette femme [Cécilia Sarkozy] est une pipole, c’est normal, elle a commencé avec un pipol [Jacques Martin], elle a séduit un politique qui s’est lui-même pipolisé, puis elle est retourné au monde des pipols.” (comment in an Internet discussion, dated 2007, author: Kolderov, http://fr.answers.yahoo.com/question/index?qid=20071020222151AAEJSAR, accessed 30 March 2012).
(74) E. people ‘people’ → F. pipeul [pipœl] n.m.sg. ‘a male celebrity’, pl. pipeul / pipeuls (75) E. people ‘people’ → F. pipeulette [pipœlEt] n.f.sg. ‘a female celebrity’, pl. pipeulettes Other interesting examples of semantic shifts occurring in processes of borrowing are the following (cf. Section 1.2): (76) S. sombrero ‘hat’ → E. / F. sombrero ‘a kind of hat with a very wide brim’ (OED ; PR ; TLF ) (77) I. grappa ‘pomace brandy’ → F. grappa ‘pomace brandy of Italian origin’ (PR ; TLF ) (78) E. flipper ‘lever in the pinball game’ → F. flipper ‘pinball game’, ‘pinball machine’ (OED ; PR ) For sombrero and grappa, the nature of the changes is, again, taxonomic, and the changes can thus be explained in analogy to the scenarios proposed above for people : we can think of ambiguous utterances where both interpretations are permitted by the context of communication, that is, utterances where the communicative referents are (Mexican) hats with a very wide brim or pomace brandy of Italian origin, and where it is plausible that the speaker conceptualizes these referents as representatives of the more general category (HAT / POMACE BRANDY), relying on the context in order to make the reference clear.29 Moreover, in these cases it is very plausible that the hearer will interpret the forms in the more specific meaning, as there are already other designations for the superordinate concepts HAT and POMACE BRANDY in the RL; the introduction of the loanwords into the RL thus permits the RL speakers to introduce a concise term into the language by which the subordinate concept can be expressed.30 For flipper, however, a different innovation scenario must be proposed, as we are dealing here with a metonymic shift: the speaker’s and hearer’s interpretations do not diverge with respect to the level of abstraction, but nevertheless we can propose potentially ambiguous utterances permitting a semantic reinterpretation or reanalysis by the hearer without any negative effects on 
ongoing communication. More specifically, what is central here is a figure-ground (or part-whole) shift occurring in contexts of use where the communicative referent is a lever in the pinball game (cf. Winter-Froemel 2012: 65–69).
29 We can assume that sombrero has been borrowed in a situation of language contact with (Mexican) Spanish where the actualized referent was a prototypical Mexican hat. 30 For the loanword people, in contrast, the situation is different, as the concept CELEBRITIES can also be expressed by the native expression F. célébrité. Here, the introduction of the loanword in its new (RL) meaning seems to be mainly motivated by pragmatic factors (cf. Section 4.2).
In order to understand the possibility of a semantic reinterpretation, let us recall some facts of the history of pinball games. They developed essentially in the 20th century, one of the most important manufacturers being D. Gottlieb and Company. The flippers (the levers) were only introduced in 1947, permitting the players to control the movement of the ball. After World War II, the pinball game diffused progressively in Europe, and one of the most successful games was Flipper Fair. In this game, the (English) word flipper was written on the levers (and it also appeared elsewhere on other pinball machines; cf. example (79)). From the perspective of the SL “speaker” (the company), this inscription seems communicatively useful, as the new elements are introduced together with their designation. At the same time, the meaning of E. flipper is clear: in order to designate the game, there is already the expression pinball, and the use of flipper to designate the LEVERS can be semantically motivated by their similarity to the limbs of sea animals (which is the original meaning of E. flipper ).31 The text in (80) is taken from advertisements for this game; again, we can see here a speaker strategy of introducing the new element (which is a central element of the game) together with its designation. For the RL “readers”, in contrast (the pinball players), there is not only a need to have a designation for the LEVERS, but also for the PINBALL GAME as such, when the game spreads in Europe. As the word flipper frequently appears on the pinball machines (whereas the word pinball is only rarely mentioned), it seems plausible for them to interpret this word as a designation for the game. 
For the SL speaker, the LEVERS are thus the more prominent element – the figure – (when the new games are introduced), whereas the GAME is the ground; for the RL “readers”, the situation is reversed, as for them, the GAME is more prominent and the LEVERS are just one element within the semantic frame of the pinball game.
(79) FLIPPER [inscription on the levers] (cf. http://www.ipdb.org/images/894/image14.jpg, accessed 31 March 2012; A. Menarini, «Le lingue del mondo» XXIV, 1959, 246, cited from DELI)
(80) a Gottlieb flipper skill game (cf. http://www.ipdb.org/machine.cgi?id=894, accessed 31 March 2012)
To sum up, we have seen that the different kinds of semantic shifts in borrowing share certain basic characteristics: they arise from a hearer-induced reanalysis32 in contexts of use which are potentially ambiguous, insofar as the communicative referent permits several conceptual/semantic interpretations of the utterance. The referent thus plays a double role in this situation: it is a fixed point in communication constraining possible
31 The similarity can be seen both in the form and in the movements of the flippers (cf. also the English verb to flip, indicating similar movements). 32 For a more general account of this type of change cf. Detges and Waltereit (2002).
interpretations; at the same time it is the element offering certain margins of interpretation. Furthermore, from a Cognitive Linguistic point of view, two kinds of scenarios are central for the reanalyses which are relevant in language contact and borrowing: first, scenarios offering a choice of different levels of abstraction, and second, scenarios permitting metonymic figure-ground shifts. Interestingly, for the examples of semantic reanalyses in language contact studied here – contrary to speaker-induced semantic change – there is no conceptual association of the source concept and the target concept. This finding seems to be valid for semantic reanalysis in general. In a usage-based perspective, semantic reanalysis thus seems to be fundamentally different from other kinds of (speaker-induced) semantic change. Finally, with respect to the material realizations of the signs, we have seen that syntactically reduced utterances (titles, labels, inscriptions) are particularly important: a semantic reinterpretation is facilitated here because the syntactic context which could constrain the possible interpretations is reduced.
4.2 How to account for different pragmatic effects of borrowings
Recent approaches have emphasized that in order to measure the success of loanwords, new methods are required. A particularly important aspect to be taken into account is onomasiological variation33: for many loanwords, there are native equivalents which can be used to designate more or less the same concept. Therefore, the success of the loanwords should be determined from a concept-based, onomasiological perspective, evaluating the relative diffusion of the loanword as well as the diffusion of the native equivalent. In this respect, the question whether there is a native equivalent for the loanword in the RL becomes central. Moreover, it can be shown that the issue of onomasiological variation also has an impact on the interpretation of the loanwords. We have seen that the use of certain loanwords in the RL is pragmatically unmarked (see examples (81) and (82)). In other cases, however, the loanwords appear in contexts which are strongly marked by expressivity (the speaker formulates his/her message in an extravagant way, in order to catch the readers’ attention, etc., see examples (83) and (85) as well as the metalinguistic statement in (84)).
(81) Passa il mouse sulle località attive per conoscerne la distanza da Le Bilodole. (“Pass the mouse over the active zones in order to know their distance from Le Bilodole”; from a description of Le Bilodole, a Bed and Breakfast in Tuscany, http://www.bilodole.it/ita/dintorni.html, accessed 28 March 2012, bold print EWF)
(82) Bonjour Après avoir de nombreuses fois mal débranché ma clé usb (je n’ai pas utilisé la fonction retirer le périphérique en toute sécurité de windows xp), ma clé usb n’est plus détectée par le système, et le voyant de la clé usb ne s’allume plus. Existe t’il une solution? (question in an Internet forum, dated 2008, http://www.commentcamarche.net/forum/affich-1777890-cle-usb-comment-reparer, accessed 31 March 2012, bold print EWF)
(83) Bon les niouses pipole la? (question in an Internet forum, dated 13 September 2006, http://www.magrossesse.com/forum/bebes-janvier-2004-f133/bon-lesniouses-pipole-la-t128802, accessed 31 March 2012, bold print EWF)
(84) yes I also think it comes from the magazine “People”, featuring celebrities, hence “pipole” became the “in” way to say celebs in french. […] (contribution in a discussion forum, dated 25 October 2007, author: mogador [living in Marseille, native speaker of French and American English], http://forum.wordreference.com/showthread.php?t=694184, accessed 31 March 2012)
(85) Remate le trailer encore une fois, fais abstraction des passages “PSP” (avec l’ecran “cropé”). C’est de la next-gen pour moi ça. (comment in an Internet discussion, dated 07 June 2009, author: Hajin, http://soulcalibur.fr/index.php?threads/soul-calibur-broken-destiny-pour-psp.7215/page-3, accessed 31 March 2012, bold print EWF)
33 See the editors’ introduction to this volume as well as Speelman, Grondelaers, and Geeraerts (2003), Zenner, Speelman, and Geeraerts (2012), and Humbley, Jacquet-Pfau, and Sablayrolles (2011).
As I will argue in this section, the pragmatic differences observed can be related to the issue of (formal) onomasiological variation (cf. Speelman, Grondelaers, and Geeraerts 2003: 318), as the strong pragmatic effects arise from the relatively marked status of certain loanwords compared to native equivalents (cf.
Onysko and Winter-Froemel 2011).34 If we compare the loanwords appearing in the examples above, we can see that in the first two cases, the loanwords lack a native equivalent in the RL: they introduce a new concept into the RL for which there is no alternative designation. In this sense, their use is an unmarked choice. For the other three examples, in contrast, there are native equivalents which could equally have been used by the speakers: célébrité for pipole, nouvelle génération for next-gen. This fundamental difference can be related
34 With respect to traditional loanword studies the issues raised here can be related to the distinction between (so-called) necessary and non-necessary (luxury) loans, a distinction which has been controversially discussed due to its strong puristic underpinnings (cf. note 1 above). In Onysko and Winter-Froemel (2011) and Winter-Froemel (2011) the two groups are therefore labeled ‘catachrestic’ (~necessary) and ‘non-catachrestic’ (~non-necessary) loanwords/innovations.
to Levinson’s theory of Presumptive meanings (Levinson 2000): he distinguishes between three basic types of implicatures, and two of these are directly related to the notion of (relative) markedness.35 Levinson argues that for lexical doublets, one of the forms normally has an unmarked status while the other one is (relatively) marked. Consequently, the use of the unmarked form will only convey I-implicatures towards a stereotypical (or prototypical) interpretation36, whereas the use of the marked form will typically convey additional M-implicatures (such as extravagance, expressivity, etc.) and show a relative pragmatic markedness (Levinson 2000: 137–138). Thus, according to Levinson, a speaker’s use of the loanword mouse in Italian will be interpreted in the sense of talking about a normal, stereotypical computer mouse (and similarly for uses of USB/usb in French). Uses of F. pipole (instead of célébrités) or next-gen (instead of nouvelle génération), in contrast, are marked choices. This relative markedness becomes even more evident in utterances where both the loanword and the native equivalent are used. In (86), the loanword is used in the title, whereas in the full explanation of the question, the native equivalent appears. As one of the functions of the title of a message is to attract the readers’ attention, the choice of the marked item seems strategically motivated here. Within the message, in contrast, the referential function predominates, and the choice of the native expression thus seems to be more adequate for the speaker. The pragmatic difference between the two forms is confirmed in the metalinguistic statement in (87) where the speaker compares the pragmatic value of pipole and potential equivalents in French (starlette, célébrité); moreover, issues of motivatability (pipole – pipeau) are raised here.
(86) changé mon password Hello, J’aimerais bien changer mon mot de passe pour l’administration de mon site, mais je suis un peu peureuse … […] (question in an Internet forum, dated 28 July 2010, author: J4cd, http://forum.joomla.org/viewtopic.php?p=2214089, accessed 31 March 2012, bold print EWF) (87) Mais c’est quoi un pipole en fait? Un pipole, ca peut aussi s’appeler une starlette ou une célébrité, mais dans le mot pipole, il y a pipeau (oui, je sais, c’est léger). […] Le pipole n’a aucun talent particulier, il fait juste parler de lui, comme ça. Et on le paie pour ça. […] Le pipole a eu accès à la notoriété en s’affichant un peu n’importe où et n’importe comment. Voilà. (blog entry dated 27 January 2012,
35 According to Levinson, forms can be considered as marked if they are “more morphologically complex and less lexicalized, more prolix or periphrastic, less frequent or usual, and less neutral in register” (Levinson 2000: 137). Levinson’s relative (un)markedness thus includes onomasiological salience (cf. Geeraerts 1993), but also other aspects related to the formal features of the alternative expressions. 36 Throughout his book, Levinson speaks of stereotypical interpretations (cf. Putnam 1978), using this term in a sense which is very close to current uses of the more widespread term prototypical.
author: Dunno [female, France], http://dunnowhattodo.over-blog.com/article-jaime-pas-les-pipoles-98039797.html, accessed 09 February 2012) These observations about the pragmatic effects arising from relatively marked forms within the RL can also be applied to certain variants of loanwords, as the following examples show. In all of these cases, the speakers express metalinguistic reflections about the stylistic and pragmatic values of variants differing from the established pronunciation and/or spelling of the loanword people : the use of pipole (pronounced [pipol], as indicated by the spelling ), which was only rarely used in 2007, at the time of the message in (88), triggers reactions of strong repugnance (“EEEEEEEEEEEWASHHHHHHHHH! Mais euuuuuurk!”, “un horrible accent français exagéré”, “Ohhh que non!”); similarly, the integrated pronunciation variant [pipOl] / [pipol] is judged as inadequate by the speaker in (89). In (90) the speaker motivates his deliberate deviation from the established form by a desire for humor, and in (91), the speaker asserts different diastratic values of spelling variants of pipolisation (a derivation from the loanword people /pipole ). Thus, the deviations from the usual degree of integration can be either positively or negatively interpreted – but the basic point remains that they trigger additional pragmatic or lectal interpretations.37 (88) LES PEOPLE TROP PIPOLES Ah oui et la palme d’or de la journée. Le mot “people” … employé par les français pour désigner les gens riches et célèbres qui font la une des magazines par leur consommation de drogues ou de jeunes garçons tel Michael Jackson, ou par leurs mariages spectaculaires, leurs excès de poids et leurs régime. […] Donc le mot “people” a été francisé … maintenant, en France, le mot “pipole” ca se dit et c’est accepté. EEEEEEEEEEEWASHHHHHHHHH! Mais euuuuuurk! Pipole … pipole … dites le tous ensemble avec un horrible accent français exagéré … Pi-pôle! 
Attention le pétage de broue, ca pète fort pis ca sent pas bon! Ohhh que non! Pipôle! Maintenant on lève le petit doigt tout le monde, attention … Pipôle! Oh mais regardez-vous, vous n’êtes pas très pipôle! Ah ah ah! Regarde la Tour Eiffel, elle n’est pas très pipôle! Oh oh! Mangeons du pain … (from a blog entry, dated 14 February 2007, author: Dyrian, location: Jonquière, Saguenay-Lac-St-Jean, Canada, http://grossemerdeausaguenay.blogspot.de/ 2007/02/les-people-trop-pipoles.html, accessed 31 March 2012, bold print EWF) (89) Note de Nil: on prononce le mot People «Pi-peul », pas «Pi-Pole ». Merci de ne pas répéter cette faute de prononciation stupide xD (comment, dated 25 February 2007,
37 Another domain where the choice of a certain level of loanword integration is of central importance is publicity and product marketing. In product descriptions the choice of weakly integrated variants (e.g. G. Eiscrème /Eiscreme ‘ice cream’, Eiscafé ‘iced coffee’) often seems to be considered better for marketing reasons, as these forms can convey a flavor of exclusivity, gourmet quality, etc., which is less strongly present in integrated variants (G. Eiskreme /Eiskrem, Eiskaffee ).
posted in the section L’actualité informatique et multimédia, author: Nicolas.G, http://www.pcinpact.com/actu/news/34901-Britney-Spears-cheveux-encheres. htm, accessed 31 March 2012, italics original, bold print EWF) (90) Au lieu de faire une revue de presse people comme j’ai fait il y a 2 semaines, je préfère faire une revue de presse popole c’est plus humoristique. (blog entry, dated 07 March 2006, author: Baby Junkie [male, origin: Amiens], http://www. chartsinfrance.net/communaute/index.php?showtopic=10932&mode=threaded, accessed 31 March 2012, bold print EWF) (91) Je vais arrêter là cette croisade donquichottesque contre la pipolisation (ou peopolisation, pour du frangliche de plus haut vol). (article dated 30 June 2008, author: Captain Gloo, http://www.glooland.com/pipolisation-lauremanaudou24375, accessed 31 March 2012)
5 Conclusion
As we have seen in this paper, a usage-based approach to processes of borrowing reveals that different modalities of language contact and different strategies of loanword integration can be realized in language contact. The variance resulting from the heterogeneous situations of borrowing in speaker-hearer interaction represents a challenge for traditional approaches, as borrowing and loanword integration prove to be much less systematic than many previous approaches (e.g. optimality theoretic accounts of phonetic/phonological loanword integration) assume. As a result of the present studies, variance thus appears to be a fundamental characteristic of processes of borrowing, and theoretical models of borrowing should be conceived in a sufficiently flexible way in order to account for the variance observed. Rather than predicting the (one and only) “solution” that will be chosen by the RL speech community, the main aim of this paper has been to explain how the different variants of a certain loanword may have originated. Another important task has been to explain semantic shifts which can frequently be observed in borrowing. Their analysis involves a methodological shift towards an onomasiological perspective which permits us to explain how a new interpretation of the loanword can be introduced by the hearer. Moreover, the notion of hearer-induced reanalysis permits us to understand why only two types of semantic shifts (specialization and metonymy) can be observed in borrowing. Finally, we have seen that linguistic and extra-linguistic, cognitive and communicative factors play an important role here. Loanword integration is to a certain extent negotiated in single acts of communication between speakers and hearers, and the choice of using a loanword – and of using a particular variant of the loanword, characterized by a certain degree of integration – is a choice which may trigger additional pragmatic interpretations, positive or negative. In this respect, there is no “perfect” strategy of integration, but depending on the communicative setting, different options may be judged as optimal.
Acknowledgements
I would like to thank two anonymous reviewers for their helpful comments on this article.
References
Alexieva, Nevena. 2008. How and why are anglicisms often lexically different from their English etymons? In Roswitha Fischer & Hanna Pułaczewska (eds.), Anglicisms in Europe: Linguistic diversity in a global context, 42–51. Cambridge: Cambridge Scholars Publishing. AltaVista. http://fr.altavista.com (accessed 21 May 2010). Arrivé, Michel, Françoise Gadet & Michel Galmiche. 1986. La grammaire d’aujourd’hui: Guide alphabétique de linguistique française. Paris: Flammarion. Betz, Werner. 1949. Deutsch und Lateinisch. Die Lehnbildungen der althochdeutschen Benediktinerregel. Bonn: Bouvier u. Co. Betz, Werner. 1974. Lehnwörter und Lehnprägungen im Vor- und Frühdeutschen. In Friedrich Maurer & Heinz Rupp (eds.), Deutsche Wortgeschichte, 3rd edn., volume 1, 135–163. Berlin & New York: De Gruyter. Blank, Andreas. 1997. Prinzipien des lexikalischen Bedeutungswandels am Beispiel der romanischen Sprachen (Beihefte zur Zeitschrift für romanische Philologie 285). Tübingen: Niemeyer. Blank, Andreas. 2001. Einführung in die lexikalische Semantik für Romanisten. Tübingen: Niemeyer. Busse, Ulrich & Manfred Görlach. 2002. German. In Manfred Görlach (ed.), English in Europe, 13–36. Oxford: Oxford University Press. Bußmann, Hadumod. 2008 [1993]. Lexikon der Sprachwissenschaft, 4th edn. Stuttgart: Kröner. Callies, Marcus, Eva Ogiermann & Konrad Szcześniak. 2010. Genusschwankung bei der Integration von englischen Lehnwörtern im Deutschen und Polnischen. In Carmen Scherer & Anke Holler (eds.), Strategien der Integration und Isolation nicht-nativer Einheiten und Strukturen, 65–86. Berlin & New York: Walter de Gruyter. Coseriu, Eugenio. 1955–56. Determinación y entorno. Romanistisches Jahrbuch 7. 29–54. DAF = Dictionnaire de l’Académie française, neuvième édition [1992–], version informatisée, http://atilf.atilf.fr/academie9.htm (accessed 10 June 2009). Detges, Ulrich & Richard Waltereit. 2002. Grammaticalization vs.
reanalysis: A semantic-pragmatic account of functional change in grammar. Zeitschrift für Sprachwissenschaft 21. 151– 195. DELI = Cortelazzo, Manlio & Paolo Zolli. 1999. Il nuovo etimologico. DELI – Dizionario Etimologico della Lingua Italiana, 2nd edn. Bologna: Zanichelli. DHLF = Rey, Alain. 1998. Dictionnaire historique de la langue française. 3 vols. Paris: Dictionnaires Le Robert. DO = Devoto, Giacomo & Gian Carlo Oli. 2000. Il Dizionario della lingua italiana. Firenze: Le Monnier. Duckworth, David. 1977. Zur terminologischen und systematischen Grundlage der Forschung auf dem Gebiet der englisch-deutschen Interferenz. Kritische Übersicht und neuer Vorschlag. In
Herbert Kolb & Hartmut Lauffer (eds.), Sprachliche Interferenz. Festschrift für Werner Betz zum 65. Geburtstag, 36–56. Tübingen: Niemeyer. Evans, Nicholas & David Wilkins. 2000. In the mind’s ear: Semantic extensions of perception verbs in Australian languages. Language 76. 546–592. Frantext = Base textuelle FRANTEXT. http://www.frantext.fr/ (accessed 31 March 2012). Geckeler, Horst & Dieter Kattenbusch. 1992 [1987]. Einführung in die italienische Sprachwissenschaft, 2nd edn. Tübingen: Niemeyer. Geeraerts, Dirk. 1993. Generalised onomasiological salience. In Jan Nuyts & Eric Pederson (eds.), Perspectives on Language and Conceptualization, 43–56. Cambridge: Cambridge University Press. Grice, H. Paul. 1975. Logic and conversation. In Peter Cole & Jerry L. Morgan (eds.), Syntax and Semantics, vol. 3, Speech Acts, 41–58. New York, San Francisco & London: Academic Press. Gusmani, Roberto. 1973. Aspetti del prestito linguistico. Napoli: Libreria Scientifica Editrice. Heger, Klaus. 1969. Die Semantik und die Dichotomie von Langue und Parole. Zeitschrift für romanische Philologie 85. 144–215. Hilty, Gerold. 1971. Bedeutung als Semstruktur. Vox Romanica 30. 242–263. Hope, T. E. 1971. Lexical borrowing in the Romance languages. A critical study of italianisms in French and gallicisms in Italian from 1100 to 1900. Oxford: Blackwell. Humbley, John. 2002. French. In Manfred Görlach (ed.), English in Europe, 108–127. Oxford: Oxford University Press. Humbley, John. 2008. [Review] Jansen, Silke (2005), Sprachliches Lehngut im world wide web […]. Neologica 2. 228–233. Humbley, John, Christine Jacquet-Pfau & Jean-François Sablayrolles. 2011. Emprunts, créations “sous influence” et équivalents. In Marc Van Campenhoudt, Teresa Lino & Rute Costa (eds.), Passeurs de mots, passeurs d’espoir: lexicologie, terminologie et traduction face au défi de la diversité. Actes des 8e Journées scientifiques de chercheurs: Lexicologie, terminologie, traduction, 325–339. 
Paris: Éditions des archives contemporaines. Jabłoński, Mirosław. 1990. Regularität und Variabilität in der Rezeption englischer Internationalismen im modernen Deutsch, Französisch und Polnisch. Aufgezeigt in den Bereichen Sport, Musik und Mode. Tübingen: Niemeyer. Jacobs, Haike & Carlos Gussenhoven. 2000. Loan phonology: Perception, salience, the lexicon and OT. In Joost Dekkers, Frank van der Leeuw & Jeroen van de Weijer (eds.), Optimality Theory. Phonology, Syntax, and Acquisition, 193–210. Oxford: Oxford University Press. Kehoe, Andrew. 2006. Diachronic linguistic analysis on the web with WebCorp. In Antoinette Renouf & Andrew Kehoe (eds.), The changing face of corpus linguistics, 297–308. Amsterdam: Rodopi. Kiesler, Reinhard. 1993. La tipología de los préstamos lingüísticos: no sólo un problema de terminología. Zeitschrift für romanische Philologie 109. 505–525. Koch, Peter. 1996. La sémantique du prototype: sémasiologie ou onomasiologie? Zeitschrift für französische Sprache und Literatur 106. 223–240. Koch, Peter. 2000. Pour une approche cognitive du changement sémantique lexical: aspect onomasiologique. In Jacques François (ed.), Théories contemporaines du changement sémantique (Mémoires de la Société de Linguistique de Paris, N.S. 9), 75–95. Leuven: Peeters. LaCharité, Darlene & Carole Paradis. 2005. Category Preservation and Proximity versus Phonetic Approximation in Loanword Adaptation. Linguistic Inquiry 36(2). 223–258. Langacker, Ronald W. 2001. Discourse in Cognitive Grammar. Cognitive Linguistics 12(2). 143–188. Langacker, Ronald W. 2007. Cognitive Grammar. In Dirk Geeraerts & Hubert Cuyckens (eds.), The Oxford handbook of Cognitive Linguistics, 421–462. Oxford: Oxford University Press.
Levinson, Stephen C. 2000. Presumptive meanings. The theory of generalized conversational implicature. Cambridge (MA) & London: The MIT Press. Meisenburg, Trudel. 1993. Graphische und phonische Integration von Fremdwörtern am Beispiel des Spanischen. Zeitschrift für Sprachwissenschaft 11(1). 47–67. Meisenburg, Trudel. 1996. Romanische Schriftsysteme im Vergleich: eine diachrone Studie (ScriptOralia 82. Zugl.: Habil.-Schr. Freiburg, Breisgau, 1994). Tübingen: Narr. Miao, Ruiqin. 2005. Loanword adaptation in Mandarin Chinese: Perceptual, phonological and sociolinguistic factors. Stony Brook University / Shanghai Jiao Tong dissertation. Rutgers Optimality Archive ROA 814. http://roa.rutgers.edu/view.php3?id=1125 (accessed 15 April 2009). OED = Oxford English Dictionary. 2007. Oxford: Oxford University Press. http://www.oed.com (accessed 31 March 2012). Onysko, Alexander. 2007. Anglicisms in German. Borrowing, lexical productivity, and written codeswitching. Berlin & New York: Walter de Gruyter. Onysko, Alexander & Esme Winter-Froemel. 2011. Necessary loans – luxury loans? Exploring the pragmatic dimension of borrowing. Journal of Pragmatics 43(6). 1550–1567. Peperkamp, Sharon & Emmanuel Dupoux. 2003. Reinterpreting loanword adaptations: The role of perception. Proceedings of the 15th International Congress of Phonetic Sciences. 367–370. PR = Rey-Debove, Josette & Alain Rey. 2007. Le nouveau Petit Robert. Dictionnaire alphabétique et analogique de la langue française. Paris: Dictionnaires Le Robert. Pulcini, Virginia. 2002. Italian. In Manfred Görlach (ed.), English in Europe, 151–167. Oxford: Oxford University Press. Putnam, Hilary. 1978. Meaning, reference and stereotypes. In Franz Guenthner & Monica Guenthner-Reutter (eds.), Meaning and translation. Philosophical and linguistic approaches, 61– 82. Worcester & London: Duckworth. Renouf, Antoinette, Andrew Kehoe & Jayeeta Banerjee. 2007. Web Corp: an integrated system for web text search. 
In Marianne Hundt, Nadja Nesselhauf & Carolin Biewer (eds.), Corpus linguistics and the Web, 47–68. Amsterdam: Rodopi. http://rdues.bcu.ac.uk/publ/WebCorp_integrated_system_DRAFT.pdf (accessed 31 March 2012). Rohde, Ada, Anatol Stefanowitsch & Suzanne Kemmer. 1999. Loanwords in a usage-based model. CLS 35: The Main Session. 265–275. Rose, Yvan & Katherine Demuth. 2006. Vowel epenthesis in loanword adaptation: Representational and phonetic considerations. Lingua 116. 1112–1139. Roudet, Léonce. 1908. Remarques sur la phonétique des mots français d’emprunt. Revue de philologie française 22. 241–267. Saussure, Ferdinand de. 1969 [1916]. Cours de linguistique générale. Publié par Charles Bally et Albert Sechehaye. Paris: Payot. Shin, Naomi Lapidus. 2010. Efficiency in lexical borrowing in New York Spanish. International Journal of the Sociology of Language 203. 45–59. Speelman, Dirk, Stefan Grondelaers & Dirk Geeraerts. 2003. Profile-based linguistic uniformity as a generic method for comparing language varieties. Computers and the Humanities 37(3). 317–337. Sperber, Dan & Deirdre Wilson. 1986. Relevance. Communication and cognition. Oxford: Blackwell. Steuckardt, Agnès, Odile Leclercq, Aïno Niklas-Salminen & Mathilde Thorel (eds.). 2011. Les dictionnaires et l’emprunt. XVIe-XXIe siècle. Aix-en-Provence: Publications de l’Université de Provence.
Thomason, Sarah Grey & Terrence Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley (LA) & London: University of California Press. TLF = Le Trésor de la Langue Française informatisé. http://atilf.atilf.fr/tlf.htm (accessed 31 March 2012). Van Coetsem, Frans. 2000. A general and unified theory of the transmission process in language contact. Heidelberg: Winter. WebCorp = WebCorp – The Web as corpus. http://www.webcorp.org.uk/ (accessed 21 May 2010). Winter-Froemel, Esme. 2010. Les people, les pipoles, les pipeuls: Variance in loanword integration. PhiN 53. 62–92. http://web.fu-berlin.de/phin/phin53/p53t4.htm (accessed 26 October 2012). Winter-Froemel, Esme. 2011. Entlehnung in der Kommunikation und im Sprachwandel. Theorie und Analysen zum Französischen (Beihefte zur Zeitschrift für romanische Philologie 360). Berlin & Boston (MA): Walter de Gruyter. Winter-Froemel, Esme. 2012. Néologie sémantique et ambiguïté dans la communication et dans l’évolution des langues: défis méthodologiques et théoriques. Cahiers de Lexicologie 100. 55–80. Winter-Froemel, Esme & Angelika Zirker. 2010. Ambiguität in der Sprecher-Hörer-Interaktion. Linguistische und literaturwissenschaftliche Perspektiven. Zeitschrift für Linguistik und Literaturwissenschaft 158. 76–97. Zenner, Eline, Dirk Speelman & Dirk Geeraerts. 2012. Cognitive Sociolinguistics meets loanword research: Measuring variation in the success of anglicisms in Dutch. Cognitive Linguistics 23(4). 749–792.
Augusto Soares da Silva
Measuring and comparing the use and success of loanwords in Portugal and Brazil: A corpus-based and concept-based sociolectometrical approach* Abstract: This paper presents a quantitative study on differences in the use of loanwords in European Portuguese and Brazilian Portuguese, with specific attention to their impact on the general linguistic/lexical distance between the two national varieties. Specifically, loanwords in the field of football and clothing terminology are studied by means of a concept-based method of onomasiological variation. This variation is not due to a different conceptual classification of the same entity, but rather to the use of different synonymous terms for referring to the same concept, i.e. denotational synonyms, which may be associated with different regions, different social groups or even different registers. The study uses advanced corpus-based and sociolectometrical methods to measure the use and success of loanwords, as well as their impact on convergence and divergence between the two varieties, specifically featural measures (calculating the proportion of terms possessing a special feature) and uniformity measures (calculating onomasiological homogeneity and convergence/divergence between language varieties). These measures are based on onomasiological profiles, i.e. sets of alternative synonymous terms, together with their frequencies. The use of the onomasiological profile-based method allows for a control mechanism to avoid thematic bias in the corpus. The study combines different types of empirical data, both corpus-based and survey-based. The data include thousands of observations of the usage of alternative terms to refer to 43 nominal concepts from football and clothing terminologies, and several dozen elicitations of attitudinal intentions with regard to 15 clothing concepts by means of a survey. 
Corpus material was extracted from sports newspapers and fashion magazines from the 1950s, 1970s and 1990s/2000s, Internet chats related to football, and labels and price tags pictured from clothes shop windows. Football and clothing terms confirm the hypothesis that the influence of English, French and other foreign languages is stronger in the Brazilian variety than in the European variety. The use of loanwords has contributed towards onomasiological heterogeneity within and across the two national varieties in the last 60 years. Specifically, loanwords and their adaptations contributed to divergence between the two varieties in the lexical domain of clothing and to slight convergence in football vocabulary. The attitudinal intentions of both the Brazilian and the Portuguese respondents are very favorable to the use of loanwords.
* This study was financed by national funding through the Portuguese Foundation for Science and Technology, as part of the PEst-OE/FIL/UI0683/2011 research project. I would like to thank two anonymous reviewers and Eline Zenner for their thorough and illuminating comments. Needless to say, any remaining errors are mine alone.
1 Introduction
The aim of this study is to measure and compare the use and success of loanwords in European Portuguese (EP) and Brazilian Portuguese (BP). It is a development of the author’s previous sociolexicological and sociolectometrical research into the lexical convergence and divergence that has taken place between the two national varieties of Portuguese in the last 60 years (Soares da Silva 2010). Focusing on the lexical fields of football and clothing concepts, three main issues are addressed by means of a corpus-based and sociolectometrical approach: (i) whether the influence of English and other foreign languages is stronger in the Brazilian than the European variety; (ii) the impact of loanwords on onomasiological variation in football and clothing concepts and on the evolving relationship between the two national varieties; (iii) whether speakers’ subjective knowledge of the origins of words corresponds to actual language behavior as observed in the corpus, and, in addition, the effect of language planning on the use of loanwords.
The study is concerned with onomasiological variation between semantically equivalent terms (denotational synonyms) and therefore takes into account the concept expressed by the lexical item and the different ways of expressing it. The onomasiological method has been adopted to study language-internal variation, since denotational synonyms often display sociolinguistic differences and it is these differences that motivate the very existence of, and competition between, language varieties. In addition, looking at alternative expressions of lexical meanings provides us with a reliable control mechanism to avoid the potential statistical bias caused by an asymmetric distribution of concepts.
The data include thousands of observations of the usage of alternative terms to refer to 43 nominal concepts from football and clothing terminologies in the 1950s, 1970s and 2000s, and several dozen elicitations of attitudinal intentions with regard to 15 clothing concepts by means of a survey. The present study uses advanced corpus-based and sociolectometrical methods to measure the use and success of loanwords, specifically featural measures (calculating the proportion of terms possessing a special feature) and uniformity measures (calculating onomasiological homogeneity and convergence/divergence between language varieties). These measures are based on onomasiological profiles, i.e. sets of alternative synonymous terms, together with their frequencies. These profile-based sociolectometrical techniques were developed by the Quantitative Lexicology and Variational Linguistics research unit for Netherlandic and Belgian Dutch (Geeraerts, Grondelaers, and Speelman 1999; Speelman, Grondelaers, and Geeraerts 2003; Zenner, Speelman, and Geeraerts 2012). Focusing on the interplay between conceptual and social aspects of language-internal variation, this study subscribes to the framework of Cognitive Sociolinguistics (Kristiansen and Dirven 2008; Geeraerts, Kristiansen, and Peirsman 2010), an emerging extension of Cognitive Linguistics as a meaning-oriented and usage-based approach to language.
2 Background and material
Differences between European and Brazilian Portuguese exist at all levels of linguistic structure. Innovative and conservative trends have emerged in both varieties, such that tradition is not the privilege of EP nor is innovation the privilege of BP. For example, in terms of phonetics and phonology, BP is more conservative than EP: there has been a marked change in the system for unstressed vowels in EP towards a strong rise, reduction and even disappearance. BP is also more conservative than EP with regard to clitic placement: in BP the proclisis of Middle and Classical Portuguese still predominates, whereas EP has moved towards enclisis. For an overview of the main differences between the two national varieties, see Baxter (1992), Castilho (2010: 171–195) and Soares da Silva (in press).
BP presents a situation of diglossia – there is a clear distance between the idealized and prescriptive traditional norm and the real norm (or norms) used in big city centers – and is characterized by a wide dialectal continuum (Mattos e Silva 2004), while an increasing standardization of EP has been observed since the 1974 democratic revolution. BP is now facing two major challenges: a sociolinguistic dilemma (due to great regional and social variation) and a didactic dilemma (teaching the language to a soaring population). A population of 220 million Brazilians is foreseen in the next 15 years; this involves a 40 million increase in population (Castilho 2005). The change in recent years in the official language teaching policy in Brazil has helped reduce the impact of these problems and schools are now more receptive to sociolects than before. In addition, the intensive and rapid urbanization of Brazil has brought popular and educated varieties of BP into closer contact and therefore reduced the gap between them.
Both Brazilian and Portuguese writers, grammarians, linguists and other intellectuals have explicitly or implicitly revealed contrasting attitudes towards the unity/diversity of the Portuguese language and the convergence/divergence between EP and BP. Some believe that what is spoken in Brazil and what is spoken in Portugal are already different languages, whilst others consider them (distinctively) different varieties of the same language. Equally, some defend the idea of convergence and others that of divergence between the European and Brazilian varieties. There are, as yet, no sufficiently developed and systematic studies on the question of convergence or divergence between the two national varieties. The hypothesis of divergence currently holds the greatest consensus amongst both Portuguese and Brazilian linguists. Other hypotheses concerning the relationship between EP and BP refer to the increasing influence of BP on EP, greater stratification (the distance between formal and informal registers) in BP than EP, and, of greater relevance to this study, the idea that BP is more receptive to loanwords than EP.
It is well known that English is the main contemporary source for loanwords in Portuguese and other languages. In the past, French was the dominant source, specifically from the 18th century to the beginning of the 20th century. As with French loanwords in the past, English loanwords today arouse opposing feelings and attitudes, either of unquestioning adoption or hostility, and desire or fear (with regard to Brazilian society, see Garcez and Zilles 2004). They are coveted as a sign of modernity, knowledge, consumerism and power, and of imitating and belonging to the external model of power and consumption, whether North American or European (in terms of the more developed European societies). In Brazilian society, where there is a bigger divide between the small consumer class and the great mass of non-consumers, anglicisms are used more intensively and visibly to mark this economic, social and cultural differentiation than in Portuguese society. However, anglicisms are also dreaded and resisted, due to a fear of external invasion threatening the supposed “purity” of the language, and a fear of diversity and plurality. In the vast Brazilian territory where national unity is under greater threat, this fear is probably stronger than in Portuguese society. Nevertheless, these feelings and attitudes can be overcome. Football, the reigning sport in both countries, is a prime example in Portugal and even more so in Brazil of how English loanwords have been adopted rapidly by all social classes since the early decades of the 20th century and recognized as part of the identity of Brazilian and Portuguese societies.
The fight against foreign words, namely English loanwords, is back on the agenda, particularly in Brazil, and has awakened the same passionate and purist attitudes triggered in the past in Portugal (and also in Brazil, at the end of the 19th century) in the fight against French loanwords. The most eloquent example of the current war on English loanwords can be found in Brazil. In 1999, a Federal Bill proposed by the MP Aldo Rebelo on the “promotion, protection and defense of the use of the Portuguese language” contained provisions for banning the use of foreign words and fines for those who broke the law. Aldo Rebelo’s Bill was based on two perfectly false and prejudiced linguistic issues, namely communication difficulties resulting from the invasion of English words, and the deterioration of the Portuguese language. Underlying the Bill and many other manifestations of linguistic purism, such as the proliferation of grammar columns in Brazilian newspapers aimed at preserving Portuguese from the effects of “impurities” that were external (above all, the influence of English) and also internal (non-prestigious varieties), was a nationalist, conservative and xenophobic ideology. This ideology aims to transmit the mythical idea that the gigantic country of Brazil speaks one language that is absolutely identical and transparent to all citizens. The same ideology sees linguistic unity as the main guarantee of national unity and language as the arena for the fight against Anglo-American imperialism and the internationalism of globalization. In addition, it is an authoritarian ideology that wants to maintain the language of power by force and thus is also the ideology of social exclusion and economic discrimination. The Aldo Rebelo project prompted a lively counterreaction from Brazilian linguists, which unfortunately had little influence. It was expressed in various ways and in particular through the book Estrangeirismos: guerras em torno da língua (‘Foreignisms: the language wars’), by Carlos Alberto Faraco, published in 2001 (with a revised and extended third edition appearing in 2004).
Nationalist ideology and linguistic purism can also be found in Portugal, and in this case Brazilian language forms are considered to be the “invaders”. A collective book entitled Estão a assassinar o português! (‘Murdering the Portuguese language!’) was published in Portugal in 1983, in which the main party considered guilty of committing “outrages” against the Portuguese language was Brazilian soap operas (Moura 1983). More recently, aversion to the Brazilian language variety has re-emerged within the context of the recent spelling agreement, which implies more changes in EP than BP: many Portuguese people see the agreement as representing the unacceptable submission of Portugal to Brazil. This reflects a neo-colonialist stance still espoused by many Portuguese, according to which miscegenation leads to the corruption and impoverishment of a hypothetically “authentic” Portuguese tongue.
At this point it is worth stating in more detail the three research questions briefly mentioned above, linking them to the socio-cultural and linguistic situation of the two national varieties. Firstly, is it possible to confirm that the influence of English and other foreign languages is stronger in BP than in EP? From a stratificational point of view, is foreign influence stronger in the standard register than the substandard register? In diachronic terms, has foreign influence increased or decreased in the last 60 years? Secondly, what is the impact of loanwords on onomasiological heterogeneity and on the relationship between EP and BP?
More specifically, has the use of loanwords contributed towards the convergence or divergence of the two national varieties in the last 60 years? Finally, does the speakers’ knowledge about the origin of words correspond to actual language behavior as observed in the corpus? In addition, what is the effect of language planning on the use of loanwords? More specifically, have the purist measures that have been implemented to curb the use of loanwords been successful? The data for the present study was collected from the lexical fields of football and fashion/clothing, due to their popularity and the fact that they are susceptible to the influence of foreign languages. Corpus material was extracted from three different sources in order to respond to diachronic and synchronic research questions: (i) sports newspapers and fashion magazines from the early years of the 1950s, 1970s and 1990s/2000s; (ii) Internet Relay Chat (IRC) channels related to football (traditional chat fora); and (iii) labels and price tags pictured from shop windows in two Portuguese and Brazilian towns respectively. Material gathered from (i) can help to answer the diachronic question of convergence and divergence between the two national varieties, while material collected from (ii) and (iii) sheds light on the synchronic question of stratification in both varieties (i.e. the actual distance between the standard strata and
the substandard strata). Data referring to the Brazilian variety was collected from the two largest cities in the country, namely São Paulo and Rio de Janeiro. The sub-corpus of football contains 2.7 million tokens selected from 8 newspapers (4 Portuguese and 4 Brazilian newspapers) and 15 million tokens collected from Internet chats. The sub-corpus of clothing extends to 1.2 million tokens gathered from 24 fashion magazines (14 Portuguese and 14 Brazilian magazines) and 1,300 pictures of labels and price tags photographed from clothes shop windows. These two sub-corpora make up the CONDIVport corpus (Soares da Silva 2005, 2008a, 2008b). This corpus is structured according to geographical, diachronic and stylistic variables and currently comprises 4 million tokens for the formal register (used in sports newspapers and fashion magazines) and 15 million tokens for the informal register (of Internet football chats and clothes labels). The CONDIVport corpus is partly available on the Linguateca website www.linguateca.pt/ACDC (a distributed resource center for language technology for Portuguese; Santos and Sarmento 2003; Santos 2009).
On the basis of the CONDIVport corpus, denotational synonyms used to denote 43 nominal concepts were compiled, 21 from football terminology and 22 from clothing terminology, together with their frequencies. Concepts were selected if they were onomasiologically (formally) heterogeneous and representative of their respective lexical fields. As for the corresponding lexical items, terms with a strong popular mark were excluded to avoid inflating differences. As regards the 21 sets of synonymous terms (or onomasiological profiles) from the field of football, a total of 183 terms were studied in a database containing 90,202 observations of these terms used in sports newspapers and 143,946 observations of their use in Internet chats.
As for the 22 onomasiological profiles of clothing items for men (M) and women (F), 264 terms were studied in a database compiling 12,451 observations of their use in fashion magazines and 2,775 observations of their use in labels and price tags pictured from clothes shops. All the profiles including their denotational synonyms are listed in Appendix A. Loanwords that keep their original form are indicated in inverted commas. The name of each profile is translated into English. The profiles for football are: BACK, BALL, COACH, CORNER, DRIBBLING, FORWARD, FOUL, FREE KICK, GOAL1, GOAL2, GOALKEEPER, MATCH, MIDFIELDER, OFFSIDE, PENALTY, REFEREE, ASSISTANT REFEREE, SHOT/KICK, SHOT/PLAYING, TEAM, WINGER. The profiles for clothing are: BLOUSE F, CARDIGAN M/F, COAT F, COAT M, DRESS F, JACKET M/F, JACKET (BLOUSON) M/F, JEANS M/F, JUMPER M/F, LEGGINGS F, OVERCOAT M/F, RAINCOAT M/F, SHIRT M, SHORT JACKET F, SHORT JACKET M, SHORT TROUSERS M/F, SKIRT F, SUIT M, SUIT/OUTFIT F, TAILORED JACKET M/F, TROUSERS M/F, T-SHIRT M/F. The total number of observations for each football and clothing profile is presented in Appendix B and Appendix C.
3 Onomasiological and sociolectometrical methods
This study uses the onomasiological method in the study of language-internal variation and focuses specifically on onomasiological variation between denotational synonyms. Lexical choice in discourse may be determined by conceptual factors (differences in concepts) or by lectal factors (differences between language varieties), which gives rise to two different types of onomasiological variation. For example, the choice between avançado and atacante is a choice between two forms that express the same concept (‘forward’) but belong to different national varieties (the former is more widely used in EP while the latter is common in BP), whereas the choice between avançado (‘forward’) and jogador (‘player’) is a choice of concept (the first term is more specific, while the second is the taxonomically hyperonymous term). We may call the avançado/atacante type of variation formal onomasiological variation, in contrast with conceptual onomasiological variation illustrated in the avançado/jogador type of variation (see Geeraerts, Grondelaers, and Bakema 1994). Formal onomasiological variation is not due to a different conceptual classification of the same entity, but rather to the use of different synonymous terms for referring to the same concept, i.e. denotational synonyms, which may be associated with different regions, different social groups or even different registers. In other words, denotational synonyms are characterized by the fact that their differences are not conceptual but social in nature, namely sociolinguistic, stylistic or pragmatic. This variation is particularly interesting from a sociolinguistic point of view because the use of denotational synonyms generally gives some hints as to the relationships existing between language varieties.
This does not mean that the distinction between formal and conceptual onomasiological variation is a matter of dichotomous classification, or that it is an easy task to establish equivalence of meaning between the different expressions. Conceptual differences, in fact, may be subtle and determining when linguistic expressions can be accepted to be formal variants of each other is often hard and rather like choosing a cut-off point on a continuum. In the case of concrete lexical items such as clothing and football terms, semantic equivalence and denotational synonymy are easier to establish, since we can control the concrete referents and therefore verify whether the referent is the same. In this study all the denotational synonyms from the lexical field of clothing were determined on the basis of images of the respective items of clothing. In the case of the football terms, images and/or context also enabled denotational synonymy to be determined objectively. The difficulties increase as the focus moves from lexical items that have concrete references in the real world to abstract lexical items and to grammatical constructions. However, the important thing to determine is not whether lexical or constructional alternative expressions differ semantically but whether the semantic differences are stable between the different varieties. It can therefore be said that if the semantic differences between the lexical or constructional variables in the different varieties are stable, the remaining variation is sociolinguistic variation.
With regard to establishing the onomasiological profiles mentioned in the previous paragraphs and listed in Appendix A, certain procedures were adopted to neutralize, as far as possible, conceptual variation between the denotational synonyms in each onomasiological profile. In addition to excluding terms with a strong popular mark, as mentioned above, metaphorical expressions were also excluded and hyponymous terms avoided.
The calculation of foreign influence on European and Brazilian Portuguese varieties, together with calculating its impact on convergence and divergence and other types of distances between the two national varieties, is based on onomasiological profile-based and sociolectometrical methods. These methods have been developed by the Quantitative Lexicology and Variational Linguistics (QLVL) research unit as part of its studies on Netherlandic and Belgian Dutch (Geeraerts, Grondelaers, and Speelman 1999; Speelman, Grondelaers, and Geeraerts 2003). Sociolectometry refers to any lectometric effort to calculate distances between language varieties that explores the multifactorial nature of linguistic variation and therefore simultaneously analyses lectal varieties representing several sources of linguistic variation.
The basis for the calculations is the individual formal onomasiological profile, or profile for short. A profile for a particular concept in a particular language variety is the set of alternative linguistic means on the same taxonomical level used to designate that concept in that language variety, together with their frequencies (expressed as relative frequencies, absolute frequencies or both). For instance, the profile for the concept GOAL1 includes the alternative terms bola, goal, gol, gôl, golo, ponto and tento. Table 1 presents the absolute and relative frequencies for each of the alternative terms in the databases for EP and BP in the 1950s.
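The step from absolute to relative frequencies within a profile can be illustrated with a minimal Python sketch. The counts are the GOAL1 figures reported in Table 1; the names `GOAL1_COUNTS` and `relative_frequencies` are hypothetical and introduced here only for illustration, not taken from the study's own tooling.

```python
# Absolute frequencies for the GOAL1 profile in the 1950s, copied from Table 1.
GOAL1_COUNTS = {
    "P50": {"bola": 109, "goal": 24, "gol": 0, "gôl": 0,
            "golo": 1841, "ponto": 204, "tento": 795},
    "B50": {"bola": 0, "goal": 528, "gol": 111, "gôl": 66,
            "golo": 0, "ponto": 26, "tento": 631},
}


def relative_frequencies(counts):
    """Turn absolute term counts into percentages of the profile total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no observations")
    return {term: 100.0 * n / total for term, n in counts.items()}


# One onomasiological profile per variety: term -> share of all GOAL1 tokens.
profiles = {variety: relative_frequencies(counts)
            for variety, counts in GOAL1_COUNTS.items()}

for variety, profile in profiles.items():
    print(variety, {term: round(share, 1) for term, share in profile.items()})
```

Rounded to one decimal, the shares printed here agree with the "rel" columns of Table 1 (for example 61.9 for golo in P50 and 46.3 for tento in B50).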
The use of a profile-based method has undeniable advantages for the study of language-internal variation as well as for loanword research. Working with a variable relative frequency among alternatives rather than their absolute frequency is a convenient framework for classifying and comparing variables. Moreover, the onomasiological profiles method allows for a control mechanism to avoid thematic bias in the corpus. Token frequencies in a corpus could correlate with a formal onomasiological preference in the corpus, but they could also correlate with the thematic specificity of the corpus (Speelman, Grondelaers, and Geeraerts 2003). If, in the case of the profile COAT for instance, only the variable casaco and not its alternative blazer is included in the investigation, a high frequency of casaco in a given text is ambiguous: it could be due to a preference for casaco rather than blazer, but it could also be due to the fact that the text is avoiding the use of loanwords. Including all the alternative expressions of a concept in the investigation resolves such ambiguities. In lexical borrowing research, an even more specific way of ensuring that the onomasiological profile-based method prevents the possible distortion caused by thematic bias is pointed out by Zenner, Speelman, and Geeraerts (2012: 752–753). When comparing loanwords in two time periods, an increase in raw frequencies does not necessarily mean an increased entrenchment of those loanwords. The increase could also simply indicate that the more
recent corpus contains disproportionately more articles on a certain subject than the older corpus.
Three sociolectometrical techniques were used: uniformity measures, featural measures and attitudinal intention measures. Onomasiological heterogeneity, convergence, and divergence between lectal varieties can be calculated using uniformity (U) measures. Featural (A) measures provide the proportion of terms possessing a special feature, such as being borrowed. Attitudinal intention (C) measures calculate language users’ intentions and evaluative opinions.
Three points should be clarified with regard to the statistical significance of the results. Firstly, for each of these three aspects, both weighted and unweighted measures are calculated. For the weighted measures, the relative frequency of each concept is taken into account. As such, the weighted measures are more significant than the unweighted measures. The weighted measure implies that high frequency concepts have a more outspoken impact, whereas the unweighted measure presupposes that all the concepts hold the same status. In this study, the impact of loanwords and the relationship between the two varieties are accounted for from a pragmatic and communicative perspective (which integrates the differences in frequency of the concepts studied) rather than a structural one (which attributes the same weight to every concept). For this reason, the weighted measures and more frequent concepts are statistically more significant. Secondly, the comparison of two percentages is based on the principle that differences of less than 5% are not statistically significant. In this case it may be said that the results in question are more or less equal, and this is indicated by the symbol ≈. The 5% margin is an arbitrarily chosen value used to account for a statistical margin of error. Finally, statistical significance should obviously be based on sample size.
As would be expected, the study involves certain concepts which are used more frequently and others which are used less frequently, with the former affording greater statistical security than the latter. However, returning to the first point, the weighted measure provides a means of balancing the effects of less frequent concepts and therefore less secure data, given that a less frequent concept counts for less in the calculations. In addition, even the less frequent concepts selected are still representative of the respective lexical fields (care was taken to choose common, everyday football and clothing concepts). Therefore the unweighted calculation is also important, although less relevant than the weighted calculation.
We begin with the main measurement used in this study, the featural measure (A). This measure provides the proportion of terms with a certain feature which, in the case of this study, is the proportion of loanwords in the onomasiological profile of one concept (or in the onomasiological profiles of a set of concepts) in the research sample. Assigning a feature is not a binary issue, but rather a matter of degree along a continuum. For this study, the proportion A of all borrowed items used to name a concept is quantified as the sum of the borrowed items’ relative frequencies in the corpus, weighted by a membership value that indicates the degree to which the loanwords retain their source-language form.² To give an example, consider the profile GOAL1. The term goal looks and sounds English, so its membership value is 1. In comparison, gol (used in Brazilian Portuguese) and golo (used in European Portuguese) use the Portuguese spelling and pronunciation but are still recognizably related to the English term goal and are therefore attributed the membership value 0.5. As an example, Table 1 lists the English influence on the onomasiological profile GOAL1 in EP (P) and BP (B) in the 1950s, based on the absolute (abs) and relative (rel) frequencies of the alternative terms, the membership value (W) of the English loan, and the sum of the relative frequencies of the alternative terms weighted by the membership value (rel*W). The proportion of anglicisms in the profile GOAL1 is greater in the 1950s Brazilian database (44.8%) than in the 1950s Portuguese database (31.8%). These are the values we are primarily interested in in this paper.
Tab. 1: The impact of the English loans (A) on the GOAL1 profile in EP and BP in the 1950s

GOAL1      P50 abs   P50 rel   P50 rel*W   B50 abs   B50 rel   B50 rel*W   W
bola           109       3.7         0.0         0       0.0         0.0   0
goal            24       0.8         0.8       528      38.8        38.8   1
gol              0       0.0         0.0       111       8.1         4.1   0.5
gôl              0       0.0         0.0        66       4.8         1.9   0.4
golo          1841      61.9        31.0         0       0.0         0.0   0.5
ponto          204       6.9         0.0        26       1.9         0.0   0
tento          795      26.7         0.0       631      46.3         0.0   0
A (eng)                             31.8                            44.8
2 Technically, the featural A/A′ measures are calculated with the following formulae. The weighted A′ measure takes into account the relative frequency of each concept whereas the unweighted A measure does not. With regard to the membership value W, the highest score (1) is given to loanwords keeping their original form, and the lowest score (0.25) to strongly adapted terms and loan translations. The weightings are thus based on the extent to which the form of the borrowed item is maintained and the extent to which it has been adapted to the Portuguese language.
\[ A_{K,Z}(Y) = \sum_{i=1}^{n} F_{Z,Y}(x_i) \cdot W_{x_i}(K) \]
The proportion A of all items x with feature K in the onomasiological profile of a concept Z in the subcorpus Y equals the sum of x’s relative frequencies F weighted by the membership value W.
\[ A'_{K}(Y) = \sum_{i=1}^{n} A_{K,Z_i}(Y) \cdot G_{Z_i}(Y) \]
The proportion A′ of all items x with feature K in the subcorpus Y equals the sum of all A-measures, weighted by G, that is the relative frequency of concept Z in Y.
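As an illustration of the two formulae, the following Python sketch recomputes the A values of Table 1 from the absolute counts and membership values printed there. The function names `featural_a` and `aggregate_a` and the layout of `TABLE_1` are assumptions made for this example; they are not part of the original QLVL implementation.

```python
# Table 1 data for GOAL1: term -> (P50 count, B50 count, membership value W).
TABLE_1 = {
    "bola":  (109,    0, 0.0),
    "goal":  (24,   528, 1.0),
    "gol":   (0,    111, 0.5),
    "gôl":   (0,     66, 0.4),
    "golo":  (1841,   0, 0.5),
    "ponto": (204,   26, 0.0),
    "tento": (795,  631, 0.0),
}


def featural_a(counts, weights):
    """Per-concept A: relative frequencies (in %) times membership values, summed."""
    total = sum(counts.values())
    return sum(100.0 * n / total * weights[term] for term, n in counts.items())


def aggregate_a(a_by_concept, concept_counts):
    """A' over several concepts: each concept's A weighted by its share G
    of all observations in the subcorpus."""
    grand_total = sum(concept_counts.values())
    return sum(a * concept_counts[concept] / grand_total
               for concept, a in a_by_concept.items())


weights = {term: w for term, (_, _, w) in TABLE_1.items()}
p50 = {term: p for term, (p, _, _) in TABLE_1.items()}
b50 = {term: b for term, (_, b, _) in TABLE_1.items()}

a_p50 = featural_a(p50, weights)
a_b50 = featural_a(b50, weights)
print(round(a_p50, 1), round(a_b50, 1))  # 31.8 44.8
```

Working from the counts rather than from the rounded percentages avoids accumulating rounding error. `aggregate_a` would additionally need the A value and the number of observations for every concept in the subcorpus (the per-profile totals are given in Appendices B and C), so it is shown here without data.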
The internal uniformity measure, which is the second measure used in this study, consists of calculating uniformity within a single language variety. The internal uniformity (I/I ) reaches its highest value when all the speakers, in every circumstance, choose the same lexical item to denote a given concept. This does not signify lexical stability, but rather the fact that no alternatives exist for the dominant designation. The internal uniformity value will decrease the greater the amount of terms that compete to denote the same concept, and the more dominant some of these terms become. In practice, the internal uniformity for a concept is quantified as the sum of the squares of relative frequencies of the lexical items used to name that concept.3 Consider the example of the FORWARD profile. Table 2 shows the relative frequencies (P/B) and the measured internal uniformity (I) for the Portuguese and Brazilian database in the 1950s.4 As the table shows, for this period, the internal uniformity is greater in the Portuguese database (55.8% vs. 38.1%). This can be explained by the two factors which contribute to determine internal uniformity. First, P50 has a single term which is clearly dominant whereas B50 has two dominant terms. Second, there are more highly frequent alternative terms in B50 than in P50. The internal uniformity measure is an indicator of onomasiological homo-/heterogeneity within and across language varieties. However, this is not necessarily an indicator of standardization because it is not possible
Tab. 2: Internal uniformity (I) for the FORWARD profile in EP and BP in the 1950s

FORWARD            P50     (P50)²    B50     (B50)²
atacante            8.8     0.778    36.6    13.407
avançado           71.6    51.288     0.9     0.009
avante              0.0     0.0      48.9    23.935
dianteiro          19.2     3.692     6.8     0.458
forward             0.1     0.0       5.2     0.274
ponta-de-lança      0.3     0.001     1.5     0.024
I                          55.8              38.1
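The calculation behind Table 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the function name `internal_uniformity` and the dictionary layout are assumptions made here, and the input values are the rounded percentages printed in Table 2.

```python
# Relative frequencies (in %) of the FORWARD terms, copied from Table 2.
FORWARD_P50 = {"atacante": 8.8, "avançado": 71.6, "avante": 0.0,
               "dianteiro": 19.2, "forward": 0.1, "ponta-de-lança": 0.3}
FORWARD_B50 = {"atacante": 36.6, "avançado": 0.9, "avante": 48.9,
               "dianteiro": 6.8, "forward": 5.2, "ponta-de-lança": 1.5}


def internal_uniformity(profile_pct):
    """Sum of squared relative frequencies, returned as a percentage.

    The percentages are first converted to proportions, squared and
    summed; the result is scaled back to a percentage.
    """
    return 100 * sum((pct / 100) ** 2 for pct in profile_pct.values())


i_p50 = internal_uniformity(FORWARD_P50)
i_b50 = internal_uniformity(FORWARD_B50)
print(f"I(P50) = {i_p50:.2f}%, I(B50) = {i_b50:.2f}%")
```

Because the inputs are already rounded, the results may differ from the printed I-values in the last digit, but the ordering (P50 more uniform than B50) is the same.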
3 The internal uniformity I/I′ measures are calculated with the following formulae.

$$I_Z(Y) = \sum_{i=0}^{n} F_{Z,Y}(x_i)^2$$

The internal uniformity I for a concept Z in the sample Y equals the sum of the squares of the relative frequencies F of the lexical items x in the onomasiological profile for Z in Y.

$$I'(Y) = \sum_{i=0}^{n} I_{Z_i}(Y) \cdot G_{Z_i}(Y)$$

The internal uniformity I′ for a set of concepts Z in the sample Y equals the sum of the I-values for the Zs weighted by the relative frequencies G of Z within the total set of Zs in Y.

4 The numbers presented in the columns (P50)² and (B50)² are the squares of the proportions, expressed as percentages, and not the squares of the percentages. For instance, 0.778 derives from the square of approximately 0.088 (the proportion), not from the square of 8.8 (the percentage).
Augusto Soares da Silva
to know how much internal variation is normal or acceptable to consider whether a given linguistic situation is standardized. The external uniformity measure (U/U′), which is the third measure used in this study, calculates uniformity between language varieties. In practice, the external uniformity for a concept between two varieties is quantified as the sum of the smallest relative frequencies of the lexical items used to name that concept in the two varieties.5 Diachronically, convergence and divergence can be quantified through increasing or decreasing external uniformity. Synchronically, the greater the distance between the standard and substandard registers, the smaller the external uniformity between these two registers. Consider again the example of the FORWARD profile. Table 3 shows the external uniformity (U), in percentages, for the onomasiological profile FORWARD in the Portuguese (P) and Brazilian (B) databases in the 1950s and the 1970s (P50, B50, P70, B70). The increase in uniformity between EP and BP from 16.9% in the 1950s to 28.8% in the 1970s suggests convergence between the two varieties in relation to the FORWARD profile.
Tab. 3: Uniformity (U) for the FORWARD profile between EP and BP (1950–1970)

FORWARD            P50     B50     P70     B70
atacante            8.8    36.6    13.6    73.8
avançado           71.6     0.9    47.4     0.0
avante              0.0    48.9     0.0    11.0
dianteiro          19.2     6.8    20.1     0.7
forward             0.1     5.2     0.0     0.0
ponta-de-lança      0.3     1.5    19.0    14.5
U                      16.9            28.8
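The U-values in Table 3 can be recomputed from the term frequencies with a short Python sketch. It is given here only as an illustration of the "sum of minima" definition; the function name `external_uniformity` and the list layout are assumptions of this sketch, not part of the original study.

```python
# Relative frequencies (in %) from Table 3, in the order:
# atacante, avançado, avante, dianteiro, forward, ponta-de-lança.
P50 = [8.8, 71.6, 0.0, 19.2, 0.1, 0.3]
B50 = [36.6, 0.9, 48.9, 6.8, 5.2, 1.5]
P70 = [13.6, 47.4, 0.0, 20.1, 0.0, 19.0]
B70 = [73.8, 0.0, 11.0, 0.7, 0.0, 14.5]


def external_uniformity(profile_a, profile_b):
    """Sum, over all terms, of the smaller of the two relative frequencies."""
    return sum(min(a, b) for a, b in zip(profile_a, profile_b))


u_50 = external_uniformity(P50, B50)
u_70 = external_uniformity(P70, B70)
print(f"U(P50, B50) = {u_50:.1f}%, U(P70, B70) = {u_70:.1f}%")
```

A higher value for the 1970s pair than for the 1950s pair corresponds to the convergence reading given in the text.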
Finally, the attitudinal intention measure (C) calculates the behavioral intentions of speakers in relation to a word or construction used to express a particular concept or function. The attitudinal intention measure thus enables convergent and divergent
5 The external uniformity U/U′ measures are calculated with the following formulae.

$$U_Z(Y_1, Y_2) = \sum_{i=0}^{n} \min\bigl(F_{Z,Y_1}(x_i),\, F_{Z,Y_2}(x_i)\bigr)$$

The uniformity U for a concept Z between two samples Y1 and Y2 equals the sum of the minima of the relative frequencies F of the lexical items x in the onomasiological profiles for Z in Y1 and Y2.

$$U'(Y_1, Y_2) = \sum_{i=0}^{n} U_{Z_i}(Y_1, Y_2) \cdot G_{Z_i}$$

The uniformity U′ for a set of concepts Z between two samples Y1 and Y2 equals the sum of the U-values for the Zs weighted by the relative frequencies G of Z within the total set of Zs.
attitudes to be calculated and its application will be examined later in Section 7. For technical details of the measures used in this study, see Geeraerts, Grondelaers, and Speelman (1999: 36–64). To summarize, by using featural measures, uniformity measures and attitudinal intention measures, the intention is to combine different types of evidence to answer the research questions, i.e. both corpus-based and survey-based results will be discussed.
4 Loanwords in European and Brazilian Portuguese in the last 60 years

Two questions are addressed in this section. Firstly, is the influence of English and other foreign languages stronger in Brazilian Portuguese (BP) than in European Portuguese (EP)? Secondly, has foreign influence increased or decreased in the last 60 years? As stated in Section 2, the analysis is based on a database containing 90,202 observations of football terms used in sports newspapers and 12,451 observations of clothing terms used in fashion magazines, in both cases taken from the 1950s, 1970s and 1990s/2000s. Using the featural (A/A′) measures described in Section 3, we will now calculate the proportion of terms with the feature “English”, “French” or “loan” (regardless of origin), first within the onomasiological profile of a selected concept and then for all the concepts included in the analysis of the samples of both varieties. In the vocabulary of football, foreign loanwords are distributed into two categories: English loanwords and loanwords in general (including Spanish, Italian and French loanwords). In the vocabulary of clothing, foreign borrowings are divided into three categories, given that both English and French loanwords are relevant in this lexical field: French loanwords, English loanwords and loanwords in general. Tables 4 and 5 present the results obtained for English loanwords (A/A′Engl), French loanwords (A/A′Fr) and loanwords in general (A/A′loan) in the Portuguese (P) and Brazilian (B) varieties in the three periods under study, namely the 1950s, 1970s and 1990s/2000s. First the weighted proportion of loanwords (A′) is presented, followed by the unweighted proportion of loanwords (A).

As regards the corpus of football (Table 4), the influence of English borrowings and other loanwords is clearly stronger in BP than in EP in all the periods studied. In the 1950s, the difference between BP and EP was very large. In fact, the proportion of English loanwords in the Brazilian variety is more than twice that in the European variety, namely 18% in BP against 7.1% in EP (for the weighted measure A′). This difference results from a larger number and a higher frequency of foreign borrowings that keep their original form in the 1950s in the Brazilian variety. This is the case for referee, forward, back, team, foul, goal, keeper, match, half, shoot, corner, for instance, which are absent from the European Portuguese texts in the majority of cases. As will be discussed in the next
Tab. 4: Loanwords in the corpus of football (from the 1950s to the 2000s)

A′Engl (P50)   7.1%   <   A′Engl (P70)   9.8%   <   A′Engl (P00)   10.2%
[…]                               0.004    100      U = 78.991 – 1.352 A ENGL    0.2916   80.45
[…] A ENGL>PORT                   0.1488   94.63    U = 22.57 + 3.069 A FREN     0.2613   84.08
U = 4.433 + 2.831 A LOAN          0.0049   99.99    U = 165.8 – 4.064 A LOAN     0.4298   60.94

Football – Brazil                 p        r²       Clothing – Brazil            p        r²
U = 198.6 – 8.581 A ENGL          0.2617   84.03    U = 84.23 – 1.766 A ENGL     0.237    86.77
U = 41.07 + 0.9099 A ENGL>PORT    0.1079   97.15    U = 34.68 + 2.197 A FREN     0.3808   68.28
U = 765.8 – 30.77 A LOAN          0.5594   40.72    U = 462 – 15.74 A LOAN       0.6861   22.4
In the table, the r² value indicates the percentage of the variation in uniformity accounted for by a particular variable. Nevertheless, the results obtained from linear regression have to be viewed with caution, given the small amount of data involved (3 parameters and 3 time periods only).
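The reason for this caution can be illustrated numerically. The sketch below assumes that each regression line in the table was fitted to three data points (one per time period), which the text suggests but does not state explicitly. Under that assumption a simple regression has a single residual degree of freedom, so the p-value of the slope depends only on r²: the t statistic is the square root of r²/(1 − r²), and the t distribution with one degree of freedom is the Cauchy distribution. The function name `p_value_from_r2` is an illustrative choice made here.

```python
import math


def p_value_from_r2(r2_percent):
    """Two-sided p-value of the slope in a simple regression on 3 points.

    Assumption (not stated in the source): n = 3, hence 1 residual degree
    of freedom, for which the t distribution is the Cauchy distribution.
    """
    r2 = r2_percent / 100
    if r2 >= 1.0:
        return 0.0  # perfect fit: the t statistic is unbounded
    t = math.sqrt(r2 / (1 - r2))
    return 1 - (2 / math.pi) * math.atan(t)


# (r² in %, printed p) pairs taken from the fully legible rows of the table.
REPORTED = [(80.45, 0.2916), (84.08, 0.2613), (60.94, 0.4298),
            (84.03, 0.2617), (97.15, 0.1079), (40.72, 0.5594),
            (86.77, 0.237), (68.28, 0.3808), (22.4, 0.6861)]

for r2_pct, p_printed in REPORTED:
    print(f"r2 = {r2_pct:5.2f}%  printed p = {p_printed:.4f}  "
          f"recomputed p = {p_value_from_r2(r2_pct):.4f}")
```

If the assumption holds, even an r² above 80% leaves p well above conventional significance thresholds, which is consistent with the author's warning that the regression results should be read cautiously.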
7 Loanwords, attitudinal intentions and language planning

We will now examine whether speakers’ knowledge of the origins of words corresponds to actual language behavior as observed in the corpus. In other words, the aim is to ascertain the extent to which objective indicators taken from the corpus and subjective indicators expressed in the speakers’ attitudes correlate. In order to answer this question, a survey was produced containing 15 onomasiological profiles for clothing terms (the most frequently used profiles were selected from the 22 profiles studied in the corpus of fashion magazines and clothing labels) and administered to 120 undergraduate students from the BA courses in Modern Languages and Literatures and Psychology, of whom 60 were Brazilian and 60 Portuguese. The Brazilian students (35 females and 25 males) were from São Paulo and Rio de Janeiro, while the Portuguese students (40 females and 20 males) were from Braga and Lisbon. The survey combined both cognitive and behavioral factors of language attitudes, while an evaluative factor was included only implicitly. It contained three questions. The first, concerning the behavioral component, sought to determine the attitudinal-behavioral intention with regard to word X as a name for concept Z. Respondents were asked which onomasiological alternative they would use (usually/sometimes/never) when expressing themselves in standard Portuguese. The response would be: “In a context requiring standard Portuguese usage, I would usually/sometimes/never use this word to denote concept Z”. As this usage intention may be both positive and negative, this first question is also implicitly related to the evaluative component. That is to say, respondents were asked to indicate a behavioral intention and not an explicit assessment. The other two questions concerned the respondents’ common knowledge about the origin and typical usage of the words selected. Thus, it was the respondents’ “linguistic worldview”, which determines their language attitudes, that was being explored. Respondents were asked if the word in question is of English, French or Portuguese origin. The response would be: “In my opinion, this word is a Portuguese (therefore not foreign)/English/French term”. They were also asked if the word in question is typically used in Portugal, Brazil or in both countries (“In my opinion, this word is used only or mainly in Portugal/only or mainly in Brazil/in both Portugal and Brazil”). The first two questions are of interest to this study (for a more detailed analysis of this attitudinal study, see Soares da Silva 2012). The responses to the first question constitute the behavioral intention (C) of individual respondents and of all respondents together with regard to word X as a name for
concept Z. The calculation of individual/global attitudinal-behavioral intention (C) weights the responses as follows: “usually” (1 point), “sometimes” (0.5) and “never” (0). Individual attitudinal intention equals the relative frequency of X in the response category “usually” plus half the relative frequency of X in the category “sometimes” plus zero times the frequency in “never”. To give an example, if a respondent usually uses blazer, sometimes casaco and never paletot to denote COAT when expressing him/herself in the standard register, the three terms score 1, 0.5 and 0 points respectively, so that his/her attitudinal intention is 66.7% for blazer (1 out of 1.5 points), 33.3% for casaco (0.5 out of 1.5 points) and 0% for paletot. The global attitudinal intention is the average of all individual intentions. The attitudinal intention is used to calculate the proportion of words with a certain feature in a respondent’s intention with regard to concept Z or with regard to the total set of concepts in the intention of a single respondent or all of them. The features are provided by the cognitive component of the survey. They include (see the second question) terms of “English origin” (ENGL), “French origin” (FR) and “Portuguese origin” (PORT). The proportion of the words with, for instance, the ENGL feature in a respondent’s intention with regard to concept Z is quantified as the sum of the intentions to the words in question, weighted by a membership value that indicates the membership/non-membership of category ENGL.6 Taking the example above, if a respondent usually uses blazer, sometimes casaco and never paletot to denote COAT and among these three terms he/she recognizes blazer as an English term, the proportion of English terms in the attitudinal intentions of that respondent is 66.7% (blazer [66.7 × 1] + casaco [33.3 × 0] + paletot [0 × 0]). The proportion of English and French clothing terms in the attitudinal intentions of the informants will now be measured. Three calculations are carried out.
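The COAT example can be reproduced with a small Python sketch. It is a hedged illustration only: the function names `attitudinal_intentions` and `feature_proportion` are hypothetical, and the normalization of the points by their total within the profile is an assumption inferred from the worked example (1, 0.5 and 0 points yielding 66.7%, 33.3% and 0%).

```python
# Points per response category, as stated in the text.
WEIGHTS = {"usually": 1.0, "sometimes": 0.5, "never": 0.0}


def attitudinal_intentions(responses):
    """Turn one respondent's answers for a concept into percentages.

    `responses` maps each term of the profile to "usually", "sometimes"
    or "never". Points are normalized by their total (an assumption
    inferred from the worked COAT example).
    """
    scores = {term: WEIGHTS[answer] for term, answer in responses.items()}
    total = sum(scores.values())
    if total == 0:
        return {term: 0.0 for term in scores}
    return {term: 100 * score / total for term, score in scores.items()}


def feature_proportion(intentions, members):
    """Sum of intentions weighted by membership (1 if in `members`, else 0)."""
    return sum(value for term, value in intentions.items() if term in members)


coat = attitudinal_intentions(
    {"blazer": "usually", "casaco": "sometimes", "paletot": "never"})
engl_share = feature_proportion(coat, members={"blazer"})
print({term: round(value, 1) for term, value in coat.items()},
      round(engl_share, 1))
```

The global measure described in the text would then be the average of such individual values over all respondents.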
Firstly, the proportion of English (ENGL) and French (FR) words in the intentions of the Portuguese (SURVP) and Brazilian (SURVB) respondents is calculated in relation to the total set of clothing concepts (15 onomasiological profiles). The results can be seen in the upper section of Table 14. The proportion of English clothing terms in the intentions of Portuguese and Brazilian respondents is higher than the proportion of French clothing terms in the intentions of Portuguese and Brazilian respondents. Comparing these proportions with the proportion of English and French terms in the corpus of fashion magazines and clothes labels in the 2000s presented in Tables 5 and 12 above, it may be concluded that, in this respect, attitudinal intentions coincide with the actual language behavior observed in the corpus of fashion magazines and clothes labels. This means that Portuguese and Brazilian speakers show a preference for English clothing terms, both in language use and attitudinally.

Tab. 14: English and French clothing terms in the attitudinal intentions of the informants

A′Engl (SURVP)   24.3%   >   A′Fr (SURVP)     18.5%
A′Engl (SURVB)   25.8%   >   A′Fr (SURVB)     16.1%

A′Engl (SURVP)   24.3%       A′Engl (SURVB)   25.8%
A′Fr (SURVP)     18.5%       A′Fr (SURVB)     16.1%

6 Technically, the proportion A of the words with feature K in the intentions of a respondent with regard to concept Z equals the sum of the attitudinal intentions C with regard to the words in question weighted by membership score W (1 for membership of category K and 0 for non-membership).

The second calculation involves comparing the proportion of English and French clothing terms in the two national varieties. The results appear in the lower section of Table 14. Portuguese and Brazilian respondents have very similar attitudinal intentions with respect to both English loanwords (24.3% vs. 25.8%) and French loanwords (18.5% vs. 16.1%). This means that the attitudinal intentions do not confirm the hypothesis of a stronger foreign influence in the Brazilian variety. The last calculation involves comparing the proportion of English and French clothing terms from the survey to the proportion of English and French clothing terms from the 2000s magazines (P00, B00) and clothes labels (Psub 00, Bsub 00). To make this comparison as direct as possible, the material from the magazines and labels was reduced (R) to the concepts and terms selected for the survey. The results are presented in Table 15.
Tab. 15: English and French clothing terms from the survey and from magazines and clothes labels

A′Engl (SURVP)   24.3%   >   A′Engl (P00R)       12.1%
A′Engl (SURVP)   24.3%   >   A′Engl (Psub 00R)   15.6%
A′Engl (SURVB)   25.8%   >   A′Engl (B00R)       16.7%
A′Engl (SURVB)   25.8%   >   A′Engl (Bsub 00R)    9.3%

A′Fr (SURVP)     18.5%   >   A′Fr (P00R)         12.7%
A′Fr (SURVP)     18.5%   >   A′Fr (Psub 00R)     10.5%
A′Fr (SURVB)     16.1%   >   A′Fr (B00R)          8.5%
A′Fr (SURVB)     16.1%   >   A′Fr (Bsub 00R)     11.3%
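The comparisons in Table 15 can be summarized with a short Python sketch. The percentages are those printed in the table; the dictionary layout and the use of a mean gap in percentage points are choices made for this illustration only and are not a measure defined in the study.

```python
# Percentages from Table 15, keyed by (feature, variety).
SURVEY = {("ENGL", "P"): 24.3, ("ENGL", "B"): 25.8,
          ("FR", "P"): 18.5, ("FR", "B"): 16.1}
STANDARD = {("ENGL", "P"): 12.1, ("ENGL", "B"): 16.7,      # magazines (P00R, B00R)
            ("FR", "P"): 12.7, ("FR", "B"): 8.5}
SUBSTANDARD = {("ENGL", "P"): 15.6, ("ENGL", "B"): 9.3,    # labels (Psub 00R, Bsub 00R)
               ("FR", "P"): 10.5, ("FR", "B"): 11.3}


def gaps(survey, corpus):
    """Survey value minus corpus value, in percentage points, per cell."""
    return {key: survey[key] - corpus[key] for key in survey}


gaps_standard = gaps(SURVEY, STANDARD)
gaps_substandard = gaps(SURVEY, SUBSTANDARD)

mean_standard = sum(gaps_standard.values()) / len(gaps_standard)
mean_substandard = sum(gaps_substandard.values()) / len(gaps_substandard)
print(f"mean gap, standard register:    {mean_standard:.2f} points")
print(f"mean gap, substandard register: {mean_substandard:.2f} points")
```

On these figures every survey value exceeds its corpus counterpart, and the average gap is of a similar order for both registers, which is the pattern the following paragraph comments on.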
The proportion of clothing terms considered by respondents to be English and French loanwords is always higher than the proportion of English and French loanwords in the magazines and clothes labels. This means that, in terms of the size of these proportions, attitudinal intentions do not coincide with actual language behavior. It was expected that attitudinal intentions would be closer to the standard register of fashion magazines than to the substandard register of labels photographed in clothes shops. This expectation derives from the fact that the attitudinal intentions were elicited in relation to the standard register. However, this expectation was not confirmed either: the attitudinal intentions were as distant from the standard as from the substandard register. Two general conclusions regarding the use of loanwords may be drawn from this study of attitudinal intentions. Firstly, the attitudinal intentions of both the Portuguese
and Brazilian respondents with regard to the use of English and French clothing terms did not always correspond to their actual language behavior as observed in the corpus of fashion magazines and clothing labels. Specifically, the attitudinal intentions do not confirm the wider use of English and French clothing terms in the Brazilian variety in comparison with the European variety. Secondly, the language planning and purist measures that have been implemented to combat the use of loanwords, such as the recent Brazilian Federal Bill cited in Section 2, seem to be a complete failure. In fact, the attitudinal intentions of both the Brazilian and the Portuguese respondents are very favorable to the use of loanwords.
8 Conclusions

Certain conclusions can be drawn from this corpus-based and concept-based sociolectometrical approach to foreign influences on the European and Brazilian Portuguese language varieties in the last 60 years. Firstly, football and clothing terms confirm the hypothesis that the influence of English and other foreign languages is stronger in the Brazilian variety than in the European variety. The greater receptivity of the Brazilian language variety to loanwords is evident both in the use of directly imported items and through adaptation. In fact, Brazilian Portuguese imports a larger number of loanwords and adapts and integrates them more easily than European Portuguese. This difference between the two language varieties is stronger in the football vocabulary than the clothing vocabulary. The influence of English football borrowings and other loanwords is clearly stronger in BP than in EP in all the periods studied (the 1950s, 1970s and 1990s/2000s), both in terms of the weighted and unweighted measures and also in relation to the standard register of sports newspapers as well as the substandard register of chats. In the case of the clothing terms, the proportion of loanwords in the two language varieties is similar, although slightly biased towards the Brazilian variety. The fact that the French influence is not weaker in BP than in EP is also significant. Secondly, the use of loanwords has, in principle, contributed towards onomasiological heterogeneity within and across the two national varieties. A positive correlation was observed between English loanwords and onomasiological heterogeneity in both national varieties and in both lexical fields. In contrast, a negative correlation was found between French loanwords and onomasiological heterogeneity. However, this negative correlation can be interpreted as a specific effect of the semantic field in question. 
One even clearer result is that changes in onomasiological heterogeneity occur on a larger scale in the Brazilian variety in both lexical fields. The greater receptiveness of BP to loanwords may be one of the factors involved in this greater changeability of BP. It can also be seen that the use of loanwords contributes to a certain extent to the evolving relationship between the two national varieties. The enormous number of English football loanwords in the Brazilian variety in the 1950s helped create a
greater distance between the two varieties and the large percentage of adaptations of English football loanwords appearing in BP between 1950 and 1970 is probably one of the main factors in the slight convergence between the two language varieties. In the case of clothing vocabulary, the use of loanwords was one of the factors in the evolving divergence between the two language varieties during the period of time studied. Thirdly, the attitudinal intentions of both the Brazilian and Portuguese respondents regarding the use of English and French loanwords do not entirely correspond to actual language behavior as observed in the corpus. Specifically, attitudinal intentions do not confirm the greater use and success of English and French clothing terms in the Brazilian, as opposed to the European, language variety. In addition, the proportion of loanwords in the intentions of Portuguese and Brazilian respondents is clearly greater than the proportion of loanwords actually used in the corpus. This reduces the extent to which data from the survey enable predictions to be made about the use and success of loanwords. With regard to the effects of language planning, the success of loanwords in the Brazilian variety, both in terms of attitudinal intentions and actual language behavior as observed in the corpus, shows that attempts to reduce or even ban the use of loanwords in Brazil are a complete failure. Finally, the concept-based sociolectometrical approach allows the use, success and evolution of loanwords to be measured and compared in both national varieties of Portuguese, as well as the potential influence of loanwords on convergence and divergence and other lexical distances between the two national varieties.
References

Baxter, Alan N. 1992. Portuguese as a pluricentric language. In Michael Clyne (ed.), Pluricentric languages. Differing norms in different nations, 11–43. Berlin & New York: Mouton de Gruyter.
Castilho, Ataliba Teixeira de. 2005. Língua portuguesa e política linguística: O ponto de vista brasileiro. In Eduardo Prado Coelho (ed.), A língua portuguesa: Presente e futuro, 193–221. Lisboa: Fundação Calouste Gulbenkian.
Castilho, Ataliba Teixeira de. 2010. Nova Gramática do Português Brasileiro. São Paulo: Editora Contexto.
Faraco, Carlos Alberto (ed.). 2004. Estrangeirismos. Guerras em torno da língua. 3rd edn. São Paulo: Parábola Editorial.
Garcez, Pedro M. & Ana Maria S. Zilles. 2004. Estrangeirismos: desejos e ameaças. In Carlos Alberto Faraco (ed.), Estrangeirismos. Guerras em torno da língua. 3rd edn., 15–47. São Paulo: Parábola Editorial.
Geeraerts, Dirk, Stefan Grondelaers & Peter Bakema. 1994. The structure of lexical variation. Meaning, naming, and context. Berlin & New York: Mouton de Gruyter.
Geeraerts, Dirk, Stefan Grondelaers & Dirk Speelman. 1999. Convergentie en divergentie in de Nederlandse woordenschat. Amsterdam: Meertens Instituut.
Geeraerts, Dirk, Gitte Kristiansen & Yves Peirsman (eds.). 2010. Advances in Cognitive Sociolinguistics. Berlin & New York: Mouton de Gruyter.
Kristiansen, Gitte & René Dirven (eds.). 2008. Cognitive Sociolinguistics: Language variation, cultural models, social systems. Berlin & New York: Mouton de Gruyter.
Mattos e Silva, Rosa Virgínia. 2004. Ensaios para uma sócio-história do Português Brasileiro. São Paulo: Parábola Editorial.
Moura, Vasco Graça (ed.). 1983. Estão a assassinar o português!. Lisboa: Imprensa Nacional Casa da Moeda.
Santos, Diana. 2009. Caminhos percorridos no mapa da portuguesificação: A Linguateca em perspectiva. Linguamática 1(1). 25–59.
Santos, Diana & Luís Sarmento. 2003. O projecto AC/DC: Acesso a corpora/disponibilização de corpora. In Amália Mendes & Tiago Freitas (eds.), Actas do XVIII Encontro Nacional da Associação Portuguesa de Linguística, 705–717. Lisboa: Associação Portuguesa de Linguística.
Soares da Silva, Augusto. 2005. Para o estudo das relações lexicais entre o Português Europeu e o Português do Brasil: Elementos de sociolexicologia cognitiva e quantitativa do Português. In Inês Duarte & Isabel Leiria (eds.), Actas do XX Encontro Nacional da Associação Portuguesa de Linguística, 211–226. Lisboa: Associação Portuguesa de Linguística.
Soares da Silva, Augusto. 2008a. Integrando a variação social e métodos quantitativos na investigação sobre linguagem e cognição: Para uma sociolinguística cognitiva do português europeu e brasileiro. Revista de Estudos da Linguagem 16(1). 49–81.
Soares da Silva, Augusto. 2008b. O corpus CONDIV e o estudo da convergência e divergência entre variedades do português. In Luís Costa, Diana Santos & Nuno Cardoso (eds.), Perspectivas sobre a Linguateca/Actas do Encontro Linguateca: 10 anos. http://www.linguateca.pt/LivroL10/
Soares da Silva, Augusto. 2010. Measuring and parameterizing lexical convergence and divergence between European and Brazilian Portuguese. In Dirk Geeraerts, Gitte Kristiansen & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics, 41–83. Berlin & New York: Mouton de Gruyter.
Soares da Silva, Augusto. 2012. Comparing objective and subjective linguistic distances between European and Brazilian Portuguese. In Monika Reif, Justyna A. Robinson & Martin Pütz (eds.), Variation in language and language use: Linguistic, socio-cultural and cognitive perspectives, 244–274. Frankfurt: Peter Lang.
Soares da Silva, Augusto. In press. The pluricentricity of Portuguese: A sociolectometrical approach to divergence between European and Brazilian Portuguese. In Augusto Soares da Silva (ed.), Pluricentricity: Language variation and sociocognitive dimensions. Berlin & New York: Mouton de Gruyter.
Speelman, Dirk, Stefan Grondelaers & Dirk Geeraerts. 2003. Profile-based linguistic uniformity as a generic method for comparing language varieties. Computers and the Humanities 37. 317–337.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. 2012. Cognitive Sociolinguistics meets loanword research: Measuring variation in the success of anglicisms in Dutch. Cognitive Linguistics 23(4). 749–792.
Appendix A

Football profiles

BACK: “(full-)back”, beque, bequeira, defensor, defesa, lateral, líbero, zagueiro
BALL: balão, bola, couro(inho), esfera, esférico, pelota
COACH: mister, professor, técnico, treinador
CORNER: canto, chute de canto, “corner”, córner, escanteio, esquinado, pontapé de canto, tiro de canto
DRIBBLING: corte, drible(ing), engano, “feint”, finta, firula, ginga, lesa, manobra enganadora, simulação
FORWARD: atacante, avançado, avante, dianteiro, “forward”, ponta-de-lança
FOUL: carga, falta, “foul”, golpe, infra(c)ção, obstru(c)ção, transgressão, violação (das regras)
FREE KICK: chute (in)direto, falta, “free(-kick)”, livre (directo, indirecto), pontapé livre, tiro dire(c)to, tiro livre (direto, indireto)
GOAL1: bola, “goal”, gol, golo, ponto, tento
GOAL2: arco, baliza, cidadela, “goal”, gol(o), malhas, marco, meta, rede, redes, vala
GOALKEEPER: arqueiro, “goal-keeper”, goleiro, golquíper, guarda-meta, guarda-rede, guarda-redes, guarda-vala, guarda-valas, guardião, “keeper”, porteiro, quíper, vigia
MATCH: batalha, choque, combate, competição, confronto, desafio, disputa, duelo, embate, encontro, jogo, justa, luta, “match”, partida, peleja, prélio, prova, pugna
MIDFIELDER: alfe, central, centro-campista, centro-médio, “half”, interior, médio, meia, meio-campista, meio-campo, “midfield”, trinco, volante
OFFSIDE: adiantamento, banheira, deslocação, fora-de-jogo, impedimento, “offside”, posição irregular
PENALTY: castigo máximo, castigo-mor, falta máxima, grande penalidade, penalidade, penalidade máxima, penálti (pênalti, pénalti), “penalty”
REFEREE: apitador, árbitro, director da partida, juiz, juiz de campo, “ref(eree)”, referi, refre
ASSISTANT REFEREE: árbitro auxiliar, árbitro assistente, auxiliar, 2º/3º/4º árbitro, bandeirinha, fiscal de linha, juiz de linha, “liner”
SHOT/KICK: chute, chuto, “kick(-off)”, panázio, pelotada, pontapé, quique, “shoot”, tiro
SHOT/PLAYING: jogada, lance
TEAM: conjunto, formação, eleven, equipa/e, esquadra, esquadrão, grupo, “match”, onze, onzena, plantel, quadro, “team”, time, turma
WINGER: ala, extremo, ponta, ponteiro

Clothing profiles

BLOUSE F: “blouse”, blusa, blusinha, “bustier”, camisa, camisa-body, camisão, camiseiro(inho), camiseta/e, (blusa) “chémisier”, (blusa) chemisiê
CARDIGAN M/F: cardigã, “cardigan”, casaco/casaquinho de malha (de lã, de tricô), “gilet”, japona, malha, “twin-set”
COAT F: “blazer”, blêizer, blêiser, casaco, casaquinho/a, “manteau”, mantô, paletó, “paletot”
COAT M: “blazer”, blêizer, blêiser, casaco, paletó, “paletot”
DRESS F: camiseiro, “chemisier”, chemisiê, “shirt-dress”, traje/o, veste, vestido(inho), vestido-camisa, vestido-camiseiro, vestido-camiseta, vestido-chemiser(ê), (vestido) cai-cai, (vestido) tomara-que-caia
JACKET M/F: casaca, casaco curto, jaleca, jaqueta, “jaquette”, jaquetinha, véstia
JACKET (BLOUSON) M/F: “blazer”, blêizer, blêiser, blusão, “bluson”, camurça, camurcine, camisa esporte, casaco de pele (de ganga, etc.), colete, parka
JEANS M/F: calça(s) de ganga, calça(s) em denim, calça(s) em jeans, ganga, jeans
JUMPER M/F: blusa, blusão, blusinha, “body”, cachemir, camisa, camisa-de-meia, camiseta, camisinha, camisola, camisolinha, “canoutier”, canoutiê, malha, malhinha, moleton, “pull”, “pullover”, pulôver, suéter, “sweat”, “sweat shirt”, “sweater”
LEGGINGS F: “fuseau(x)”, fusô, “legging(s)”
OVERCOAT M/F: abafo, agasalho, balandrau, capote, casacão, casaco comprido, casaco de abafo/abafar, casaco de agasalho, casaco de/em pele, casaco-sobretudo, “duffle-coat”, gabão, “gilet”, “manteau”, mantô, manto, overcoat, paletó, “pardessus”, “pelerine”, samarra, sobrecasaca, sobretudo, sobreveste, “trench (coat)”
RAINCOAT M/F: “ciré”, “ciré-maxi”, “anorak”, canadiana, capa, capa de chuva, casaco impermeável, corta-vento, casaco-gabardina, gabardine/a, impermeável, kispo, parka
SHIRT M: blusão, camisa, camisa de gravata, camisa de manga curta, camisa desportiva, camisa esporte(iva), camisa jeans, camisa social, camiseta, camisete, “camisette”
SHORT JACKET F: bolero, carmona, casa(i)b(v)eque, casaco curto, casaquilha, colete, colete camiseiro, corpete, corpinho, garibáldi, “gilet”, manguito, mini, minicasaco, roupinha, “shortie”, vasquinha
SHORT JACKET M: casaco curto, colete, espartilho, gibão, “gilet”, jaleca, jaleco, jaqueta, véstia
SHORT TROUSERS M/F: bermuda(s), calças-capri, calça(s) corsário, calça(s) curta(s), calças 3/4, calções, “cool pants”, corsários, “hot pants”, “knikers”, “pantacourt”, “pedal pusher”, “short(s)”, “short cuts”, “short shorts”, shortinho, “slack(s)”
SKIRT F: kilt, maxi (máxi), maxissaia, micro-mini, micro-saia, míni (mini), mini-saia, minissaia, pareô, saia, saia-calça, saia-calção, saião, sainha, saiote
SUIT M: beca, completo, costume, fato, terno
SUIT/OUTFIT F: “complet”, completo, conjunto, costume, duas-peças, “ensemble”, fatinho, fato, saia-casaco, “tailleur”, “toilette”, toilete, vestido-casaco
TAILORED JACKET M/F: “black-tie”, casaca, casaco cerimónia, fraque, “manteau”, mantô, paletó, “paletot”, “pelerine”, “smo(c)king”, sobrecasaca, “tuxedo”
TROUSERS M/F: calça, calças, pantalona
T-SHIRT M/F: camisa, camiseta/e, “camisette”, camisola, licra, “singlet”, “tee-shirt”, “t-shirt”
Appendix B

The column on the right in the following tables shows the total number of observations of each profile and the absolute total of observations, thus demonstrating the sample size of each onomasiological profile.

The impact of the English loans (AEngl/A′Engl) on the 21 football profiles

AEngl/A′Engl
P50
B50
P70
0
B00
9.5
0.05 0
7.5
0.06 0.2
0
BACK
0.3
0.01 14.3 0.36 0.6
0.02 1.3
0.05 0
BALL
11.2
0.99 7.6
0.5
10.2 0.93 12.3 0.98 11.9
0.76 12.5 0.47
6542
COACH
0.7
0.01 0
0
1.8
0.1
4720
CORNER
24.9 0.36 49.3 0.53 30.5 0.26 58.6 0.22 19.8 0.14 16.1
DRIBBLING
57.0 0.2
33.3 0.07 56.0 0.22 81.5 0.22 54.5 0.09 94.3 0.21
FORWARD
0.1
5.2
FOUL
11.6 0.11
FREE KICK
25.0 0.19 2.8
GOAL1
31.8 3.54 44.8 5.19 44.9 5.95 48.0 6.13 47.0 6.87 49.9 6.49
GOAL2
0.3
GOALKEEPER
12.2 0.33 24.4 0.6
MATCH
0.1
0.14 0
A
0.03 0
0
0
40.0 0.47 10.9 0.09 11.8 0.01 24.8 0.2
0.01 17.0 0.66 0 0.02 13.7
0
0.3
A
A
A
6.6
0.05 0
0
0
0.01
0
0
2.1
0
0.06 11.6 0
0.3 0
0
33.9 0.77 0
0.02 0
0
0
0.09 8.3
0.04 5.0
0 0.07 0
0.13 12.5 0.09
23.5 0.29 0
12.3 0.27 42.5 1.29 9.7
3.91 0.1
0
A
A
Total nr.
ASSIS. REFEREE
0
A
A
P00
A
A
A
B70
A
0
0
31.9 0.4
0.21 50.0 1.42 0
0
251 3238 814 675 11294 3250 2332 21502
11.6 0.27 22.0 0.65 11.4
OFFSIDE
43.0 0.24 2.8
PENALTY
30.3 0.45 64.8 0.94 49.2 0.84 56.9 0.44 48.8 1.2
74.6 1.22
1450
REFEREE
0
0
0
3310
SHOT/KICK
5.9
0.12 29.6 0.45 2.0
0.02 42.7 0.66 1.1
0.01 47.9 0.36
SHOT/PLAYING
0
0
0
TEAM
0.8
WINGER
0
Total
13.2 7.05 18.5 18
0.8 0
0.03 0
0
0
0
0
0.15 16.1
3.5
1.9
0.41 30.5 6.27 0
0.01 28.3 5.19
0
0
0
0
0
15.0 9.76 20.4 17.1
0
0
0
0
0
0
0
0
0
0
0
27.4 0.17 0
0
0
0
0
0.23
818
MIDFIELDER
0.01 50.6 0.35 0
0.16 8.1
0
623 2791
0
0 0
2004 395
1211 4140 17642 1200
12.8 10.2 20.3 16.2 90202
The impact of the French loans (AFr/A′Fr) on the 22 clothing profiles

AFr/A′Fr
P50
B50
B70
48
3.9
51.0 6.04 30.1 2.98 26.4 2.78 18
CARDIGAN M/F
7.27 0.04 6.0
COAT F
8.2
0.47 17.0 0.69 8.2
0.6
16.1 1.15 8.6
COAT M
51.1 2.01 75.0 0.26 9.5
0.1
30.0 0.14 7.8
DRESS F
1.44 0.28 1.4
0.78 11.9 2.01 3.12 0.45 0.5
0.08
2335
JACKET M/F
26.7 0.79 48.6 0.95 45.1 0.9
50.0 1.53 24.7 0.2
51.5 0.84
198
JACKET (BLOUS.) M/F
7.69 0.05 25.0 0.04 50.4 1.0
50.0 0.72 40.3 1.3
43.1 0.23
193
JEANS M/F
0
0
0
180
JUMPER M/F
23.7 0.23 22.2 0.68 36.0 2.14 22.9 1.51 44.9 3.26 28.2 0.43
LEGGINGS F
0
OUTFIT F
61.9 6.43 63.7 7.1
0
0 0
0.02 28.9 0.2
0.42 4.8
0 0
0 0
0 0
56.3 4.7
0
0
0
0 0
49.0 2.3
A
A
1.48 20.0 2.01
13.1 0.1
0 20
A
A
Total nr.
BLOUSE F
A
A
B00
A
A
A
P00
A
0
A
P70
A
1213
0
0
0.6
5.0
0.48
896
0
50.0 0.06
138
0
0.04 0
28.4 0.9
0
573 12
15.6 0.57
849
OVERCOAT M/F
7.22 0.63 29.3 0.96 12.3 0.9
14.1 0.23 8.2
8.8
0.2
642
RAINCOAT M/F
19.8 0.59 0
0
6.5
0
0
49.7 0.7
0
0
207
SHIRT M
0
0
8.33 0.03 0
0
0
0
0
128
SHORT JACKET F
11.6 0.26 5.6
SHORT JACKET M
44
0.45 50.0 0.04 62.5 0.2
SHORT TROUSERS M/F
0
0
0
0
4
0.15 4.9
0.19 0
SKIRT F
0
0
0
0
0
0
0
0
0.85 0.13 1.5
0.24
SUIT M
4.02 0.12 0.9
0.02 0
0
0
0
0
0
0
0
TAILORED JACKET M/F
44.8 1.33 47.5 0.67 0
0
15.0 0.07 0
0
28.1 0.18
TROUSERS M/F
0
0
0
0
19.7 2.58 0
0
2.2
0.41
1327
T-SHIRT M/F
0
0
45.3 0.51 40.3 0.22 49.0 2.25 1.36 0.15 20.0 1.74
244
Total
16.7 17.6 22.2 18.5 20.6 15.9 17.6 18.1 16.1 10.2 16.9 7.86
12451
0
0
0.2
0.07 51.1 0.8
0
0
0.3
0
97
0
28.4 0.68 36.0 0.5
47.2 0.34
0
50.0 0.04
0
50.0 0.1 0
0
0
195 39 454 2337 88 106
The impact of the English loans (AEngl /A Engl ) on the 22 clothing profiles AEngl /A Engl
P50
B50
P70
B70
P00
B00
A
A
A
A
A
BLOUSE F
0
0
0
0
0
0
CARDIGAN M/F
0
0
0
0
12.5 0.1
66.7 0.57 11.5 0.1
84.2 0.44
97
COAT F
0
0
0
0
5.8
0.5
19.3 1.38 10.0 0.7
52.6 5.03
896
COAT M
0
0
0
0
4.4
0.1
51.0 0.24 18.9 0.1
28.3 0.03
DRESS F
0
0
0
0
0
0
0
0
0
0
0
JACKET M/F
0
0
0
0
0
0
0
0
0
0
JACKET (BLOUS.) M/F
0
0
0
0
0
0
0
0
9.9
0.3
JEANS M/F
0
0
0
0
100 0.03 0
0
77.8 4.18 100
JUMPER M/F
52.6 0.51 50.1 1.55 21.0 1.25 22.2 1.46 5.5
0.4
LEGGINGS F
0
0
0
OUTFIT F
0
0
0
OVERCOAT M/F
11.7 1.02 1.6
0.05 7.1
RAINCOAT M/F
4.31 0.13 0
0
SHIRT M
0
SHORT JACKET F SHORT JACKET M
0
0
0
0
0
0 0.5
7.3
0.2
A
A
0.5
0.05 1.81 0.15 0.8
0
A
A
A
A
Total nr.
A
0.08
1213
138
0
2335
0
0
198
6.5
0.03
193
1.54
180
16.1 0.25
573
0
80
0.17 100
0
0
0
0
0
0
849
2.9
0.05 13.1 0.5
2.6
0.06
642
2.5
0.02 0
12.5 0.01
207
0
0.04
12
0
0
0
0
0
0
0
5.43 0.11 0
0
128
9.3
0.2
0
0
0
0
0
0
0
0
0
0
195
0
0
0
0
0
0
0
0
0
0
0
0
SHORT TROUSERS M/F
60
0.46 66.0 2.17 82
3.04 82.9 3.25 10.3 0.35 82.8 4.5
SKIRT F
0
0
0
0
0
0
1.0
0.19 2.56 0.38 0
0
SUIT M
0
0
0
0
0
0
0
0
0
TAILORED JACKET M/F
31
0.92 30.0 0.42 100 0.1
80.0 0.38 97.0 0.2
59.7 0.39
TROUSERS M/F
0
0
0
0
0
0
0
0
0
T-SHIRT M/F
0
0
0
0
11.1 0.06 0
0
84.4 9.29 50.7 4.42
244
Total
7.68 3.25 6.7
15.0 7.61 19.5 16.9 27.1 16.8
12451
0
4.19 16.0 5.8
0
0 0
0
0
39 454 2337 88 106 1327
Adaptations/translations of English loans in the 21 football profiles AEngl.ad. /A Engl.ad.
P50
B50 A’
A
0.05 0
0
6.5
0
0.02 0
B70 A’
A
0.05 0.2
0
4.5
0
9.5 0
0.05 0
BALL
11.2 0.99 7.6
0.5
10.2 0.93 12.3 0.98 11.9 0.76 12.5 0.47
6542
COACH
0
0
0
4720
CORNER
24.4 0.35 7.2
0.08 21.1 0.18 58.6 0.22 19.8 0.14 16.1 0.07
DRIBBLING
0
0
0
0
0
0
0
0
0
0
0
0
251
FORWARD
0
0
0
0
0
0
0
0
0
0
0
0
3238
FOUL
11.2 0.11 6.4
0
1.3 0
0
0
A’
A
A’
Total nr.
BACK
0
A
B00
ASSIS. REFEREE
0.8
A’
P00
A’
0
A
P70
A
0.03 0.0
0
0
0.01
0
0.3 0
0
0.07 10.9 0.09 11.8 0.06 11.6 0.13 12.5 0.09
FREE KICK
25.0 0.19 2.8
0.01 24.8 0.2
GOAL1
31.0 3.45 6.0
0.7
GOAL2
0.3
0.01 8.0
0.31 0
GOALKEEPER
11.1
0.3
MATCH
0
0
0.3
0
23.5 0.29 0.0
0
44.9 5.95 47.9 6.12 47.0 6.87 49.9 6.49 0
33.9 0.77 0
0
11.3 0.28 9.1
0.2
42.5 1.29 9.7
0.21 50.0 1.42
0
0
0
0
0
0
0
0
31.9 0.4 0
814 675 11294 3250 2332 21502
11.6 0.27 11.1
0.33 11.4 0.09 8.3
0.04 5.0
OFFSIDE
7.9
0.05 0
0
10.3 0.07 0
0
PENALTY
9.3
0.14 7.1
0.1
7.1
0.12 56.9 0.44 6.9
0.17 74.6 1.22
1450
REFEREE
0
0
0
0
0
0
3310
SHOT/KICK
5.6
0.11 10.6 0.16 2.0
0.02 42.7 0.66 1.1
0.01 47.9 0.36
SHOT/PLAYING
0
0
0
0
0
0
0
0
0
0
TEAM
0
0
1.3
0.28 0
0
30.5 6.27 0
0
28.3 5.19
WINGER
0
0
0
0
0
0
0
0
Total
7.5
6.02 3.8
7.9
16.5 16.9 7.8
0
2.84 7.5
0
0 0 0
22.6 0.14 0 0
0
0
0.23
818
MIDFIELDER
0
0.16 8.1
0
623 2791
0 0 0 0
8.91 15.8 16
2004 395
1211 4140 17642 1200 90202
Appendix C
The column on the right in the following tables shows the total number of observations of each profile and the absolute total of observations in the substandard register of chats and clothes shops; the middle column shows the number of observations of each profile in the standard register of sport newspapers and fashion magazines in the 1990s/2000s.
English loans for the 21 football profiles in the standard and substandard varieties (AEngl / A'Engl)
                      P00             B00             Total   Psub00          Bsub00          Total
Profile               A       A'      A       A'      nr.     A       A'      A       A'      nr.
ASSIS. REFEREE        6.6     0.05    0       0       287     4.43    0.01    0       0       324
BACK                  0       0       0.3     0.01    1126    0.08    0       0.15    0.01    5247
BALL                  11.9    0.76    12.5    0.47    1451    12.5    0.97    12.5    0.53    10005
COACH                 2.1     0.1     0       0       3130    7.92    0.42    0.13    0.01    7469
CORNER                19.8    0.14    16.1    0.07    166     25.3    0.24    14.4    0.07    1208
DRIBBLING             54.5    0.09    94.3    0.21    57      9.91    0.04    79.1    0.16    531
FORWARD               0       0       0       0       1117    0       0       0       0       5153
FOUL                  11.6    0.13    12.5    0.09    267     12.4    0.12    12.5    0.13    1464
FREE KICK             23.5    0.29    0       0       300     25      0.24    1.47    0       1086
GOAL1                 47.0    6.87    49.9    6.49    3992    49.9    9.12    50      8.95    26163
GOAL2                 0       0       31.9    0.4     665     0       0       46.4    1.06    2551
GOALKEEPER            9.7     0.21    50.0    1.42    738     13.4    0.22    49.8    1.47    2792
MATCH                 0       0       0       0       6325    0.01    0       0.01    0       35952
MIDFIELDER            5.0     0.16    8.1     0.23    876     4.88    0.24    8.06    0.28    6570
OFFSIDE               27.4    0.17    0       0       98      31.1    0.31    0       0       1151
PENALTY               48.8    1.2     74.6    1.22    586     49.2    2.26    49.9    1.57    6142
REFEREE               0       0       0       0       918     0.05    0       0       0       4871
SHOT/KICK             1.1     0.01    47.9    0.36    210     10.6    0.03    41.4    0.24    455
SHOT/PLAYING          0       0       0       0       1211    0       0       0       0       1714
TEAM                  0       0.01    28.3    5.19    5439    0.76    0.09    48.1    12.4    21290
WINGER                0       0       0       0       77      0       0       0       0       1808
Total                 12.8    10.2    20.3    16.2    29036   12.3    14.3    19.7    26.9    143946
English loans for the 22 clothing profiles in the standard and substandard varieties (AEngl / A'Engl)
                      P00             B00             Total   Psub00          Bsub00          Total
Profile               A       A'      A       A'      nr.     A       A'      A       A'      nr.
BLOUSE F              1.81    0.15    0.8     0.08    440     0       0       0       0       169
CARDIGAN M/F          11.5    0.1     84.2    0.44    39      11.1    0.09    68.8    1.81    34
COAT F                10.0    0.7     52.6    5.03    390     21      1.73    34      0.56    188
COAT M                18.9    0.1     28.3    0.03    12      11.1    0.24    85      0.14    47
DRESS F               0       0       0       0       762     0       0       0       0       71
JACKET M/F            0       0       0       0       57      0       0       0       0       4
JACKET (BLOUS.) M/F   9.9     0.3     6.5     0.03    91      21.5    1.46    28.3    0.14    150
JEANS M/F             77.8    4.18    100     1.54    164     67.7    0.97    100     2.96    49
JUMPER M/F            5.5     0.4     16.1    0.25    208     12.7    1.29    5       0.89    329
LEGGINGS F            80      0.17    100     0.04    6       0       0       100     0.66    4
OUTFIT F              0       0       0       0       164     0       0       0       0       47
OVERCOAT M/F          13.1    0.5     2.6     0.06    141     30      0.07    50      0.16    7
RAINCOAT M/F          0       0       12.5    0.01    33      80      0.18    0       0       7
SHIRT M               5.43    0.11    0       0       51      0       0       0       0       344
SHORT JACKET F        0       0       0       0       53      0       0       0       0       2
SHORT JACKET M        0       0       0       0       5       0       0       0       0       2
SHORT TROUSERS M/F    10.3    0.35    82.8    4.5     214     18.4    0.74    27      2.79    150
SKIRT F               2.56    0.38    0       0       759     0       0       0       0       146
SUIT M                0       0       0       0       30      0       0       0       0       118
TAILORED JACKET M/F   97.0    0.2     59.7    0.39    21      28.6    0.09    0       0       7
TROUSERS M/F          0       0       0       0       696     0       0       0       0       595
T-SHIRT M/F           84.4    9.29    50.7    4.42    473     91.3    11.1    0       0       305
Total                 19.5    16.9    27.1    16.8    4809    17.9    18      22.6    10.1    2775
French loans for the 22 clothing profiles in the standard and substandard varieties (AFr / A'Fr)
                      P00             B00             Total   Psub00          Bsub00          Total
Profile               A       A'      A       A'      nr.     A       A'      A       A'      nr.
BLOUSE F              18      1.48    20.0    2.01    440     23.1    1.73    0       0       169
CARDIGAN M/F          13.1    0.1     0       0       39      8.89    0.07    5.63    0.15    34
COAT F                8.6     0.6     5.0     0.48    390     7.53    0.62    6       0.1     188
COAT M                7.8     0       50.0    0.06    12      8.7     0.18    0       0       47
DRESS F               3.12    0.45    0.5     0.08    762     0       0       0       0       71
JACKET M/F            24.7    0.2     51.5    0.84    57      50      0.05    50      0.16    4
JACKET (BLOUS.) M/F   40.3    1.3     43.1    0.23    91      38.2    2.59    33.3    0.16    150
JEANS M/F             0       0       0       0       164     0       0       0       0       49
JUMPER M/F            44.9    3.26    28.2    0.43    208     33.2    3.38    31.8    5.64    329
LEGGINGS F            20      0.04    0       0       6       0       0       0       0       4
OUTFIT F              28.4    0.9     15.6    0.57    164     12.5    0.17    20.6    0.57    47
OVERCOAT M/F          8.2     0.3     8.8     0.2     141     2.2     0.01    0       0       7
RAINCOAT M/F          49.7    0.7     0       0       33      17      0.04    0       0       7
SHIRT M               0       0       0       0       51      1.39    0.16    0       0       344
SHORT JACKET F        36.0    0.5     47.2    0.34    53      50      0.02    0       0       2
SHORT JACKET M        50.0    0.1     50.0    0.04    5       0       0       50      0.16    2
SHORT TROUSERS M/F    0       0       0       0       214     0       0       0       0       150
SKIRT F               0.85    0.13    1.5     0.24    759     0       0       0       0       146
SUIT M                0       0       0       0       30      0       0       1.52    0.08    118
TAILORED JACKET M/F   0       0       28.1    0.18    21      53.6    0.17    0       0       7
TROUSERS M/F          0       0       2.2     0.41    696     0       0       0       0       595
T-SHIRT M/F           1.36    0.15    20.0    1.74    473     4.36    0.53    50      3.37    305
Total                 16.1    10.2    16.9    7.86    4809    14.1    9.74    11.3    10.4    2775
Alexander Onysko and Andreea Calude
Comparing the usage of Māori loans in spoken and written New Zealand English: A case study of Maori, Pakeha, and Kiwi *
Abstract: This paper takes an in-depth approach to analyzing the usage of three common Māori loans (Maori, Pakeha, and Kiwi) in New Zealand English in order to empirically address some hypotheses that have been mentioned in previous studies on Māori loans but have not yet been confirmed by close analysis. First, the paper focuses on the question of whether the ethnicity of the interlocutors as Māori or New Zealand European can influence the usage of the loans. Secondly, we provide a short-term diachronic view of how the usage frequencies of the selected Māori loans developed between 1996 and 2011. Two datasets are analyzed in order to investigate these questions: the Wellington Corpus of Spoken New Zealand English (WSC) and the recently compiled New Zealand English Press Corpus (NZEPC). To consider the effect of interlocutor ethnicity on the use of the loans, we extracted all instances of the loans from the WSC and coded their occurrence according to speaker and addressee ethnicities while controlling for the usage of English equivalents. In order to trace the recent development in the frequency of use, the selected loans were counted in the NZEPC, which comprises a sample of the major New Zealand newspapers in three-year intervals between 1996 and 2011. The analyses of the data are based on a combination of quantitative and qualitative methods, including different statistical procedures as well as close contextual readings. The results confirm the hypothesis that the ethnicity of the speaker as well as of the addressee has a significant effect on the use of the Māori loans. While Maori and Pakeha are preferably used in discourse constellations involving Māori or mixed interlocutors, Kiwi shows a preference for occurring among New Zealand Europeans.
As far as the recent diachronic trend in the usage of the selected Māori loans is concerned, results confirm earlier findings on a regional difference in usage between the South and North Island of New Zealand. Furthermore, a mixed picture emerges for the three loans, with Pakeha showing a marked decrease in usage, Maori remaining on a similar level of usage with variations between intervals, and Kiwi exhibiting a slight but not significant upward trend in usage. These observations are followed up by contextual analyses of the loans. Overall, the study highlights the fact that it is important to consider the development and use of Māori loans in New Zealand English on an individual basis in order to do justice to the unique scope of usage that Māori borrowings have. In addition, the study emphasizes the necessity to combine quantitative and qualitative methods when carrying out research on borrowings in order to obtain an empirically well-founded and contextually precise picture of the usage of loanwords.
* We would like to thank two anonymous reviewers and the editors of the volume for cogent remarks on earlier versions of this article. Any remaining errors are ours.
1 Introduction: Contact between Māori and English
I ngā rā ō namata, ko te mahi a te reo Ingarihi he tango kupu mai i te reo Kariki, i te reo Rātani hoki. Nā konei i ora roa ai tae mai ki tēnei rā. Mēhemea kāore i pēnei te mahi a te reo Ingarihi, kua ngaro noa atu kē (Pōtatau 1991: 93)1
[Previously, what English did was to borrow words from Greek and Latin. That is why it has survived so long right up to the present. If it had not done this, it would have been lost long ago] (translation by Ray Harlow 2001: 105)
These words from Hēmi Pōtatau’s autobiography express what appears as a truism among researchers in language contact. This is the fact that languages change and that taking words (to render the literal meaning from the Māori original) from another language, i.e. borrowing, is a regular process of language development. However, Pōtatau’s words also have to be read while being mindful of the cultural and linguistic contact between Māori and English. In this sense his words imply an attitude that sees the borrowing of English words into Māori as a natural, even necessary process for the survival of the Māori language. While many linguists studying contact scenarios where English functions as a source of loanwords, particularly for European languages, might readily agree with the implications of this statement, language contact between Māori and English cannot be compared with that stable type of contact which frames the influence of English on German, Italian, French, Russian, Spanish, and so on. In fact, the colonial history of New Zealand has established English as a dominant majority language and has brought the native Pacific tongue to the brink of extinction.
Initiated by Māori activism, Te Reo Māori (‘Māori language’) has experienced a resurgence from the 1980s, which found social and legal support in New Zealand society. Particularly following a Treaty of Waitangi claim in 1987, the acknowledgment of Te Reo Māori as a taonga (‘treasure’) and as an official language of New Zealand has led to continued efforts of language planning that range from supporting Māori immersion and bilingual schools as well as Māori tertiary education institutions, to broadcasting in Māori and special initiatives to promote the daily use of the language in Māori families and homes. Despite these successful measures, Te Reo Māori is still in a critical
1 In this article, we follow Māori orthographic conventions and include macrons as symbols of vowel length whenever use of Māori terminology is made. However, macrons are not applied when referring to the actual loanwords of Maori and Pakeha, which cohere with English orthography in the corpora of this study and do not appear with macrons.
M¯aori loans in spoken and written New Zealand English
state today as only about a fifth of the Māori population consider themselves as conversationally fluent speakers of their language with considerable regional differences (see Harlow 2007; Bauer 2008). This continued alarming state of the language is due to a series of factors, among which the most crucial ones are a lack of intergenerational transmission (see Harlow 2003; Chrisp 2005) and a lack of exposure to the Māori language in the daily life of mainstream New Zealand. At the same time, speakers of Māori emphasize the tight link between the language and central aspects of Māori culture, which makes the struggle for the language also a struggle for the survival of Māori ways of thinking and being.
Thus, from a linguistic point of view, the relation between Māori and English is an example of contact between a dominant and a subdominant language (to use Van Coetsem’s terminology, 2000: 38). This uneven balance of power between the languages gives rise to different scenarios of language contact. For bilingual speakers of Māori and English, contact between the languages can occur as individual phenomena of codeswitching and transfers in their speech (see Eliasson 1990). On the macrolevel of the Māori speech community, the dominance of English has not only caused widespread borrowing of words, but there are also indications that the Māori language is changing under the influence of English (see Harlow et al. 2009 for changes in the phonology of Māori and Harlow 2001: 45 for examples of grammatical transfers from English into Māori). This is consistent with Van Coetsem’s (2000) model of language contact, which predicts that the dominant language can exert intense influence on the subdominant language at the phonological and grammatical level. On the other hand, contact-induced influence from the subdominant to the dominant language is largely restricted to lexical borrowing.
Indeed, from the perspective of New Zealand English, Māori has contributed a number of loanwords to its lexicon over their period of contact. As summarized in Macalister (2007: 493, referring to Belich 2001), each of the three major phases of first colonization (until 1880s), recolonization (between 1880s and 1970s), and decolonization (from 1970s on) had its impact on the types of Māori loans entering New Zealand English (henceforth NZE). Initial contact and openness to Māori culture and language facilitated some borrowing of terms relating to culturally specific items and customs (e.g. patu ‘club’, pa ‘fortification’, haka ‘ritual posture dance’, hui ‘meeting’, whare ‘house’) as well as the integration of terms relating to the local environment such as place names and Māori designations for flora and fauna (e.g. kauri and totara as types of trees and kiwi, tui, and weka as types of birds). While the end of the Land Wars and the Native Schools Act at the end of the 1860s initiated a phase of English-only attitudes and policies, limiting the amount of borrowing from Māori, the Māori Renaissance in the 1970s came with a revival and an expansion in the use of Māori loans. Since then terms from traditional Māori society and culture have been revived, and the use of Māori borrowings can assume the status of an identity marker for speakers of New Zealand English. According to Deverson, the use of Māori borrowings remains the most distinctive feature of New Zealand English and gives New Zealanders
the opportunity to mark their own identity (1991: 18–19). This observation has also been emphasized in other studies on Māori loanwords (see Kennedy 2001; Macalister 2007; Degani 2010).
The linguistic and cultural restoration of Māori has raised new interest in investigating the use of Māori loanwords in English, and a variety of studies have been undertaken in the last two decades. Kennedy (2001: 60) points out that previous research on Māori borrowings has been approached eclectically through the listing and discussion of illustrative examples of Māori loans (see Deverson 1991). Apart from that, there is more systematic work of lexicographers as evident in Orsman’s Dictionary of New Zealand English (1997) and Macalister’s more recent Dictionary of Maori Words in New Zealand English (2005). The latter features a total of 981 entries of Māori terms occurring in New Zealand English. Some studies have been concerned with the understanding of Māori loans among New Zealanders and their attitudes towards the use of Māori loans (see Bellet 1995; Macalister 2006a, 2008). As an outcome of these investigations, Macalister estimates that the average New Zealander knows about 70 to 80 Māori loans other than proper nouns (2008: 75).
A substantial amount of research on Māori loanwords has been based on corpora and large, searchable collections of texts. By extracting all Māori language items in the Wellington one-million word corpora of written and spoken New Zealand English, Kennedy (2001) gives an estimate of the frequency of Māori loans in general New Zealand English. He determines an average of five Māori terms per 1000 spoken words and a nearly equal rate of six Māori words per 1000 terms in the written language. In a diachronic corpus study based on newspapers, Hansards (parliamentary debates), and issues of the School Journal, Macalister finds a steady increase in the use of Māori loanwords from 3.29 per 1000 in 1850 to 8.8 in 2000 (2006b: 11).
In a short-term diachronic study investigating the frequency of selected Māori borrowings from 1997 to 2004, Davies and Maclagan (2006) do not find an overall significant increase in the use of Māori loans. Since their frequency counts are based on the number of articles in which a Māori loan occurred, the results are not comparable to the previous findings. Daly’s (2007) analysis of Māori lexical items in children’s picture books is an example of how genre, context and authorship can influence the use of Māori loans in New Zealand English. In her analysis of thirteen picture books written by Māori authors, dealing with Māori topics, and published by a publishing house that focuses on Māori and Pacific experiences, Daly counts an average of 56 Māori terms per 1000 words (2007: 23). This is about six times as many as Macalister’s figure of 8.8 Māori loans in 2000. In a study on the occurrence of Māori borrowings in New Zealand TV news, De Bres (2006) does not find an increase in the use of Māori terms when comparing news items from 1984 with 2004. This is a further indication that the long-term diachronic trend for an increasing use of Māori terms in New Zealand English does not seem to capture what has been happening in the period of the last three decades. Altogether, these mixed results on the rate of Māori loans in New Zealand English call for further investigation whereby close attention has to be paid to the factors of context, users, and topics as being determinative for the occurrence of Māori borrowings.
Besides the major concern with how dispersed Māori loans are in general New Zealand English, recent research focuses on the use of selected Māori loans. Thus, Degani (2010) discusses the use of aroha (‘love, affection’), mana (‘prestige, power, authority’), and marae (‘traditional meeting grounds’) for their different textual functions and connotational aspects, highlighting their use for commercial purposes, as names, and as markers of identity. Research by Degani and Onysko (2010) on hybrid compounds of English and Māori terms investigates the productivity of highly frequent Māori loans in New Zealand English and shows the semantic clusters pertaining to compounds involving the ten most productive borrowings. Their findings show the type of semantic fields that Māori terms inhabit when they combine with English bases and how certain functions are manifest in these creations (e.g. Kiwi as a marker of New Zealand identity triggering a great number of compounds in the domains of economics, entertainment, and sports; 2010: 225–226).
While these diverse concerns of research on Māori loans have added to understanding their presence and use in New Zealand English today, previous studies also raise a number of questions which call for further comprehensive, empirically founded investigation. One of the issues is the observation that the use of Māori borrowings depends on the ethno-cultural background of the language user (speaker/author) as well as on the topic of the discourse. Preliminary findings in Kennedy (2001: 74), who counts a higher incidence of Māori terms in the English spoken by Māori people, and the results from Daly’s (2007) study on children’s picture books as summarized above indicate that the ethnicity of the language user and the type of topics talked about influence the frequency of Māori loans in New Zealand English.
However, these findings have yet to be tested for significance, and, crucially, a more diversified picture of the potential influence of ethnicity and topic on the use of Māori loans in spoken language has to consider the whole discourse constellation, i.e. the ethnicities of the interlocutors involved in the speech situation. In order to determine the relation between topics and Māori loans, a close contextual analysis is necessary. This is also imperative for determining the range of functions borrowings from Māori can have in New Zealand English. A second important question emerges from the different results concerning the numbers of Māori loans over time. As indicated above, findings regarding the recent history of Māori loans conflict with Macalister’s claim for a diachronic increase of the rate of Māori loans. Shedding more light on this issue demands a close analysis tracking the usage frequencies of Māori loans over the last two decades.
In line with these research questions on Māori borrowings, the current contribution aims to investigate the role of interlocutor ethnicity, discourse topic, and recent trends in the rate of occurrence by comparing the use of three well-established Māori loans (Kiwi, Maori, and Pakeha) in spoken and written New Zealand English. The investigation is based on an analysis of the Wellington Corpus of Spoken New Zealand English (WSC) and of the New Zealand English Press Corpus (NZEPC) and combines
statistical analyses with a close examination of usage contexts. In this way, the case studies are designed to provide insight into the complex relations between factors influencing the use of M¯aori loans.
2 Methods: Determining Māori loans in spoken and written corpora
Owing to our combination of a narrow focus (examining close usage contexts) and a wide focus (considering many different factors) regarding the use of Māori loans in New Zealand English (NZE), we are forced to limit the number of loans investigated. Hence, we restrict the discussion to three loans, namely, Kiwi, Maori, and Pakeha. The reasons for choosing these specific loanwords are four-fold. First, they are well-established loans and not without good reasons: one of the primary distinctions to be established in the kind of contact situation that we find in New Zealand is the way in which to describe everyone present: New Zealanders in general (Kiwi), New Zealanders of Māori origin (Maori), and New Zealanders of European origin (Pakeha). Secondly, the terms denoting ethnicities of the various members of New Zealand society are important identity markers and, at the same time, indicative of the general groupings within it. Thus, they form a coherent semantic group. Thirdly, we wanted to focus on loans that are used frequently (see Kennedy 2001) and are among the most productive loans in hybrid compounds (Degani and Onysko 2010: 218) in order to be able to carry out statistical analysis, i.e. so that we could be confident that the trends observed are not the effect of random variation. Related to this is also the observation that the three loans have been part of New Zealand English for some time now (Kiwi first entering NZE as a fauna term, and Pakeha and Maori probably entering NZE from the very beginning of the contact situation in the late 1700s). This means that we have a better chance of investigating their use and occurrence in the diachronic data collected. Fourth, previous research raises expectations with respect to the use of these loans.
Kennedy finds that there are differences in the use of all three loans between Māori and Pākehā people, and also between males and females (2001: 20, Table 10). This led us to a more fine-grained analysis which goes beyond analyzing differences in the ethnicity and gender of the speakers, to investigating differences in their age and taking into account the ethnicity of the interlocutors present at the time of the interaction. Given Kennedy’s initial usage counts, we wanted to build a statistical model to test whether the differences observed are indeed significant, by controlling for the various relevant factors (age, gender, ethnicity, total word count, genre, and so on).
In this study, we make use of two different sets of corpus data. We discuss each set below. First, we analyzed transcripts of the Wellington Corpus of Spoken New Zealand English (henceforth WSC), which has the advantage of providing comprehensive information about the speakers (gender, ethnicity, age, education, job, and total word count contributed) involved in the data collection (see Holmes, Vine, and Johnson 1998 for a guide). The WSC contains one million words of spoken interaction of various types, formal and informal, scripted and unscripted, such as telephone conversation, radio talkback, teacher monologues, weather reports, judges’ summations, etc. Half of this data consists of face-to-face spontaneous conversation.
We identified all the instances of each of our three loans (Maori, Pakeha, and Kiwi) in the WSC transcripts using a regular expression search. We then manually inspected each instance to ensure that it was indeed one with the intended meaning (for example, we did not count instant kiwi or kiwi (calling) card as instances of Kiwi because they were not used to mean ‘New Zealander’2), and also in order to eliminate (the few) cases of codeswitching. All plural versions of the three loans were included in the data, that is, Kiwis, Pakehas, Maoris (and they all occur in the WSC).
Secondly, we also identified all the uses of New Zealand English variants for each of our three loans (“New Zealander(s)” for Kiwi, “European/White New Zealander(s)” and “New Zealand European” for Pakeha, and “native(s)/indigenous people(s)” for Maori). As with the loans, we manually inspected each instance in order to make sure we only compare the intended variant, e.g. we did not include “native plant” or “indigenous tree”. The reason we kept track of the New Zealand English counterparts was so that we could attempt to control for topic-hood, in other words, to be sure that we were indeed investigating lexical preference (of loans versus existing English words) and not topic preference (whether or not people talk about Kiwis/New Zealanders at all).
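The regular expression search and the exclusion of non-target uses described above can be sketched as follows. This is a minimal illustration and not the authors' actual script: the pattern, the exclusion list (modelled on the "instant kiwi" and "kiwi (calling) card" examples), and the function name `find_loans` are all assumptions, and such a filter would only be a first pass before the manual inspection the text describes.

```python
import re

# Hypothetical pattern: singular and plural forms of the three loans.
# "non-Maori" is listed first so that it is matched as one token.
LOAN_PATTERN = re.compile(r"\b(non-Maori|Maoris?|Pakehas?|Kiwis?)\b", re.IGNORECASE)

# Hypothetical exclusion list based on the examples given in the text.
EXCLUDED = re.compile(r"\binstant kiwi\b|\bkiwi (?:calling )?card\b", re.IGNORECASE)


def find_loans(utterance):
    """Return loan tokens in an utterance, skipping excluded collocations."""
    excluded_spans = [m.span() for m in EXCLUDED.finditer(utterance)]
    hits = []
    for m in LOAN_PATTERN.finditer(utterance):
        # Drop a hit if it lies entirely inside an excluded collocation.
        inside = any(s <= m.start() and m.end() <= e for s, e in excluded_spans)
        if not inside:
            hits.append(m.group(1))
    return hits
```

Called on a transcript line, `find_loans` returns the candidate tokens that would then go to manual checking; codeswitches and other ambiguous cases cannot be caught by a pattern like this.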
Once finalized, the data were manipulated using Python coding3 to obtain for each use of each loan the profile of the speaker uttering it (gender, age, ethnicity, and total word count) and in the case of the spontaneous conversation data, the ethnicity(ies) of the addressee(s) present in the respective interaction. In accordance with findings from the large body of work in the area of Accommodation Theory (e.g. Coupland 1995; Giles 1973; Meyerhoff 1998; Trudgill 1986) and Audience Design (Bell 1984; Bell 1991), we decided to test whether the ethnicity of the addressees present in our interactions had any effect on the presence of the three loans4. Hence, we not only wanted to gauge the effect of
2 There is a case to be made for counting these in the analysis, but we decided to restrict our analyses to the meanings intended. In addition, we controlled for any occurrences of Kiwi with reference to the endemic New Zealand bird in the WSC and excluded the only reference to the bird in the corpus. In the NZEPC, a mere 53 out of 1000 hits refer to the bird, and thus confusion of Kiwi ‘bird’ with Kiwi ‘New Zealander’ can be considered as negligible in the corpus.
3 We thank Paul James for writing the Python code.
4 Ethnicity seems to play the biggest role in Kennedy’s study, and it is noted as an important potential factor in the work of Macalister, so we only tested this factor in relation to the addressee(s). We could have also attempted to test the effects of gender and age of our addressee(s); however, we had to be careful in building our statistical model: coding too many factors in our data would restrict the predictive power of the model.
the ethnicity of the speaker, but also that of the addressee(s) present. In particular, we hoped to learn whether or not a Māori addressee was more likely to be associated with a higher number of loans, regardless of the ethnicity of the speaker. The strength of our model comes from the ability to control for the various factors in order to see which (if any) are significant predictors of loan use.
The second dataset analyzed comes from a diachronic corpus of New Zealand newspapers, namely the New Zealand English Press Corpus (henceforth NZEPC). This corpus is more recent than the WSC, which was collected in 1998, and it provides a diachronic view of the use of the three loans. The NZEPC contains five million words of newspaper language from major newspapers in Auckland (New Zealand Herald and the Sunday Star Times), Wellington (The Dominion Post), Christchurch (The Press), and the Southland region, including Invercargill and neighboring parts of Otago (The Southland Times) (see Calude and James 2011 for full details). The newspapers were sampled every three years starting with 1996 until 2011 (that is, 1996, 1999, 2002, 2005, 2008, 2011), and they include a range of fifteen newspaper sections, e.g. news, features, sports, advertising, entertainment, business, employment, and so on. One merit of the corpus is that it is balanced across geographical regions, years, and newspapers. Unfortunately, we do not have any information about the writers of the articles included (with a few exceptions). However, given the nature of newspaper language, we do have a certain amount of homogeneity within the data, i.e. the language is written by people of a similar background and education (mostly trained newspaper writers), and the content includes a diverse array of topics intended for a wide readership.
The advantage of newspaper language is that it reflects general societal norms and attitudes (Garrett and Bell 1998: 3), and it is thus a valuable means of assessing the current prescriptive view regarding the use of Māori loans in written New Zealand English. The three loans were identified in the NZEPC by regular expression searches, but this time the results were not all manually checked. This is because these data are much “cleaner” than the spoken transcripts (i.e. there are no tags denoting overlapping speech, pauses, etc., only running text), and also because of the large size of the data.

In using the two corpora, we hope to obtain a deeper and, at the same time, more holistic understanding of the use of the loans Maori, Kiwi, and Pakeha in New Zealand English. On the one hand, the WSC provides us with information about the sociolinguistic factors which come into play with regard to the use of the loans; on the other hand, the NZEPC reveals a bigger picture in terms of their geographic spread and diachronic progress.
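The search step can be illustrated with a short sketch. The chapter does not give the regular expressions that were actually used, so the pattern, the function names and the sample sentence below are assumptions made purely for illustration. The sketch counts capitalised loan forms and scales a raw count to a rate per million words, the unit used in the tables of Section 3.

```python
import re
from collections import Counter

# Hypothetical pattern (not the authors' own): capitalised loan forms only,
# with "non-Maori(s)" listed first so it is not also counted as "Maori".
LOAN_PATTERN = re.compile(r"\b(non-Maoris?|Maoris?|Pakehas?|Kiwis?)\b")


def count_loans(text):
    """Count each loan form found in a stretch of running text."""
    return Counter(match.group(1) for match in LOAN_PATTERN.finditer(text))


def per_million(count, corpus_size):
    """Scale a raw count to a rate per million words."""
    return count * 1_000_000 / corpus_size


# Invented sample sentence, used only to exercise the functions.
sample = "Kiwis love the Kiwi accent; Maori and non-Maori readers agree."
counts = count_loans(sample)
```

A purely form-based pattern of this kind cannot tell Kiwi 'New Zealander' from Kiwi 'bird', which is why the share of bird references has to be estimated separately, as in footnote 2.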
M¯aori loans in spoken and written New Zealand English
3 Results: Maori, Pakeha, and Kiwi in spoken and written New Zealand English

This section provides the results obtained from the analysis described above. Each dataset and its associated findings are discussed in turn, beginning with our oldest dataset, namely the spoken extracts from the WSC.

3.1 Results in the WSC data

The first finding of the WSC data concerns the overall counts of the three loans investigated. These are summarized in Table 1 below.
Tab. 1: Overall use of the loans Maori, Kiwi, Pakeha in the WSC data (per million words)

Loan      Form         Counts (per million words)   Total
Kiwi      Kiwi           55
          Kiwis          41                            96
Pakeha    Pakeha         84
          Pakehas        11                            95
Maori⁵    Maori         857
          Maoris         43
          non-Maori       7                           907
Total uses of loans analyzed from the WSC             1098
We ran a Generalized Linear Mixed Model (GLMM) (Baayen 2008) to test whether the ethnicity, age, and gender of the speakers, or the genre, were significant predictors of the use of the loan Maori in the WSC⁶ (unfortunately, we could not run the same analyses for the other two loans because their numbers were insufficient for a GLMM analysis). The analysis was conducted using the statistical software package R (R Development Core Team 2009). The GLMM was used because the same speaker was sometimes involved in more than one interaction (Baayen 2008). In running the test, we controlled for two important factors. First, we controlled for the total number of words uttered by each speaker in each interaction (by including its
5 We excluded 31 uses of the loan Maori from the analysis because no information was available about the speakers using them (this situation arose in cases where people turned up unexpectedly during the recording process and were accidentally recorded). 6 We are greatly indebted to Roger Mundry for his invaluable help in running the R code and discussions regarding the methods used. Any remaining errors are our own.
logarithm) so as not to bias results in favor of more verbose speakers. We anticipated that if one speaker uttered more words than another, the chances of a loan being used would also (potentially) increase. Secondly, we controlled for the equivalent lexical variant(s) in (New Zealand) English as a way of accounting for lexical choice rather than topic choice. For some speakers, the loan or its English equivalents may simply never come up in conversation, so we wanted a measure of “potential” versus “actual” use (how many times a given speaker could have used the loan, compared to how many times they actually used it).

The GLMM was used to test our predictor variables, in order to see which (if any) characteristics of our speakers were significant predictors of loan use. We were able to test the speakers’ ethnicity, gender, and age. We also had information about the specific genre (e.g. talkback radio transcripts, judge’s summation, conversation, parliament transcripts), and this was also coded in the model. Genre was not of interest per se, but we wanted to control for the fact that certain genres may lend themselves more to the use of the loan. Almost half of the spoken data was spontaneous conversation, and for this portion of the corpus we also had information about the ethnicity of the addressee(s) present during the interactions. Because of the strong link identified by previous literature between ethnicity of speakers and loan use, we hypothesized that there might also be a link between the ethnicity of the addressee(s) involved and loan use. Therefore, for the conversational data, we also tested the ethnicity of the addressee(s) present. We verified that a Poisson distribution fitted the GLMMs (we used a Poisson dispersion test; the p-values obtained were 1, and the largest dispersion parameter was 0.235).
As an overall test of the effect of the predictors on the use of the loans, we used a likelihood ratio test comparing the full model (the model with all predictors) against the null model comprising only the random effect (the speaker) and the controls for the use of New Zealand English variants and total word count. See below for a summary of the model.

GLMM Model Summary

Fixed effects
  NZE Frequency (frequency of loan counterpart) – as control
  Total word count (logged) – as control
  Ethnicity of speaker
  Gender of speaker
  Age of speaker
  Genre of speech excerpt (for the full spoken data only)
  Ethnicity of addressee(s) present (for the conversational data only)

Random effect
  Speaker
We discuss the results below. The GLMM for the loan Maori was significantly better than the null model (χ² = 151.67, df = 10, p < 0.001); see Table 2 for the results of the model (the significant factors are marked by one, two, or three asterisks, with three representing the highest level of significance).
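To make the likelihood ratio test concrete, the following sketch shows how a χ² statistic and its degrees of freedom translate into a p-value. It is a standard-library illustration, not the R code used for the chapter; the function names are hypothetical, and the only figure taken from the text is the reported statistic for the full versus null model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        if z > 0:
            q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return the LR statistic 2 * (llf - ll0) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)


# Statistic reported in the text for the full model against the null model.
p_full_vs_null = chi2_sf(151.67, 10)
```

The same function can be applied to any of the single-predictor comparisons reported in this section, since each of them is stated as a χ² value with its degrees of freedom.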
Tab. 2: Fixed effects of the model for the use of the loan Maori

Fixed Effects                             Estimate   St. Error   z-value   Pr(>|z|)
(Intercept)                                 −7.383       1.275    −5.791    < 0.001   ***
NZEFre                                       0.268       0.308     0.872      0.383⁷
Logged_word_count                            1.105       0.164     6.735    < 0.001   ***
ETHNICITYpakeha                             −3.518       0.371    −9.473    < 0.001   ***
GENDERmale                                  −0.360       0.339    −1.063      0.288
GENREdialogue_conversation                  −1.237       0.272    −4.551    < 0.001   ***
GENREdialogue_meetings                      −1.588       0.594    −2.676      0.007   **
GENREdialogue_oral_history_interview        −2.347       1.583    −1.482      0.138
GENREdialogue_parliament                    −0.635       1.019    −0.623      0.533
GENREdialogue_social_dialect_interview      −0.488       0.630    −0.774      0.439
GENREdialogue_telephone                     −1.478       0.470    −3.143      0.002   ***
AGE                                          0.021       0.011     2.015      0.044   *
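The columns of Table 2 are linked by simple arithmetic, which the sketch below spells out. It assumes the usual set-up for a Poisson model with a log link: the z-value is the estimate divided by its standard error, the p-value is the two-sided tail of the standard normal distribution, and exponentiating an estimate gives a rate ratio. The function names are illustrative, and the input figures are those of the ETHNICITYpakeha row.

```python
import math


def wald_z(estimate, std_error):
    """Wald statistic: the estimate measured in standard errors."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def rate_ratio(estimate):
    """Multiplicative effect on the expected count in a log-link model."""
    return math.exp(estimate)


# Values taken from the ETHNICITYpakeha row of Table 2.
z_ethnicity = wald_z(-3.518, 0.371)
p_ethnicity = two_sided_p(z_ethnicity)
rr_ethnicity = rate_ratio(-3.518)
```

Under these assumptions the rate ratio comes to roughly 0.03: other things being equal, the expected rate of the loan for Pākehā speakers is about 3% of that for the reference ethnicity level. The recomputed z-value differs from the printed one in the second decimal because the table entries are rounded.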
We proceeded to test each predictor we were interested in separately in order to see whether it could significantly improve the model’s power in predicting the use of the loan, by comparing the full model with the model comprising all predictors other than the one being tested (i.e. taking out each predictor one by one). The significant predictors were: a) ethnicity of the speaker (χ² = 101.69, df = 2,⁸ p < 0.001), such that speakers who identified themselves either as Māori or Māori/Pākehā used the loan more than speakers who self-identified as Pākehā, b) age (χ² = 5.342, df = 1, p = 0.021), such that older speakers used the loan more than younger speakers, and c) genre (χ² = 32.744, df = 6, p < 0.001), with genres such as interviews (dialect and broadcast), telephone conversations and face-to-face conversations having a higher occurrence of the loan in comparison with other genres. Gender was not a significant predictor (χ² = 1.496, df = 1, p = 0.221). This came as a surprise, because a comparison of raw numbers of loans in the same corpus by
7 We also ran the analyses using the model reduction approach, and taking out the non-significant predictors does not change the results in any way. 8 The ethnicity categories “Asian” and “Pasifika” were excluded since speakers of these ethnicities never used the loanword.
Kennedy (2001: 20, Table 10) showed some differences across genders. Our data may not show significant differences across genders either because these differences are not relevant to the use of the loan Maori, or because they dissipate once other factors are taken into account (such as ethnicity of the speaker, age and genre). Finally, we tested the interaction between the speaker’s ethnicity and their age and the interaction between the speaker’s ethnicity and their gender, neither of which was significant (χ² = 1.434, df = 2, p = 0.488 and χ² = 0.581, df = 2, p = 0.748, respectively). However, although significance could not be detected, a visual inspection of the interaction plots shows some divergent trends for the different types of speakers, as can be seen in Figures 1 and 2. These figures are purely descriptive, showing the proportion of use of the loan over the sum of the loans and the (New Zealand) English alternatives.

Figure 1 shows the interaction between age and ethnicity of the speakers with respect to the use of the loan Maori, while controlling for its New Zealand English equivalents mentioned earlier. The figure shows that while Pākehā speakers tend to use the loan more as they grow older, Māori speakers have three noticeable spikes in use, namely in the age groups 25–29, 55–59, and 75+. The speakers who self-identify as both Māori and Pākehā appear to follow the trend of the Māori speakers (with the exception of the 16–19 age group), but unfortunately, our data do not contain Māori/Pākehā speakers older than the age range of 35–39, so we cannot be sure whether this continues throughout the older age groups.
[Figure 1: line plot. x-axis: Age (16–19 to 85–89); y-axis: mean use of the loan, controlling for the NZE variants (0.0–1.0); lines by Ethnicity: pakeha, maori, maori/pakeha]
Fig. 1: Interaction plot showing the use of the loan Maori for speakers of different ethnicities and different ages
[Figure 2: line plot. x-axis: Gender (female, male); y-axis: mean use of the loan, controlling for the NZE variants (0.1–0.6); lines by Ethnicity: maori, maori/pakeha, pakeha]
Fig. 2: Interaction plot showing the use of the loan Maori for speakers of different ethnicities and across males and females
Figure 2 shows that, unlike in the interaction between age and ethnicity, Māori/Pākehā speakers do not follow the same pattern as Māori speakers across males and females. Female Māori/Pākehā speakers use the loan Maori more than male Māori/Pākehā speakers, while female Māori speakers use the loan less than their male counterparts. Nevertheless, the effect is not statistically significant, possibly because we do not have sufficient data points to test the interaction reliably. These issues would need to be investigated further with more data.

We now turn to the conversational data, with the particular aim of testing the influence of the ethnicity of the addressee(s) on the use of the loan. This time, we ran a similar GLMM as before, taking ethnicity (of the speaker), gender, age, and addressee as fixed effects, and controlling for the use of the New Zealand English alternatives “native(s)” and “indigenous” as well as for the total number of words used by each speaker in each interaction. Because some conversations involved interactions between more than two speakers, the addressee data were coded as Māori (when a speaker was addressing a Māori-only audience), non-Māori (when a speaker was addressing an audience where no Māori was present), mixed (when a speaker was addressing a group of mixed ethnicities including a Māori speaker), and Māori/Pākehā (when the speaker was addressing
an audience containing exclusively Māori/Pākehā participants).⁹ This coding reflects the cautious anticipation that the presence of a Māori person in the interaction may influence (probably increase) the use of the loan. We also did not want to make any assumptions about the effects of a Māori/Pākehā audience, hence we coded this group (albeit small) separately. Table 3 summarizes the results of this model (as before, the significant factors and their level of significance are marked by asterisks).
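The addressee coding scheme can be restated as a small decision procedure. The sketch below is only an illustration of the four categories as they are described in the text: the function name and the lower-case labels are assumptions, and the handling of the one unattested configuration (see footnote 9) is a choice made for this sketch, not something the authors specify.

```python
def code_addressee(audience):
    """Map the ethnicities of the addressees present to one category."""
    present = set(audience)
    if not present:
        raise ValueError("at least one addressee is required")
    if present == {"maori"}:
        return "maori"              # Maori-only audience
    if present == {"maori/pakeha"}:
        return "maori/pakeha"       # exclusively Maori/Pakeha audience
    if "maori" in present:
        return "mixed"              # mixed group with a Maori participant
    if "maori/pakeha" in present:
        # Unattested in the data according to the authors; flagged here
        # rather than silently assigned (an assumption of this sketch).
        raise ValueError("mixed audience with Maori/Pakeha but no Maori")
    return "non-maori"              # no Maori participant present
```

For example, `code_addressee(["pakeha", "maori"])` falls into the mixed category, whereas `code_addressee(["pakeha", "asian"])` is coded as non-Māori.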
Tab. 3: Fixed effects of the model¹⁰ for the use of the loan Maori in the conversation data

Fixed Effects              Estimate   St. Error   z-value   Pr(>|z|)
(Intercept)                 −12.097       2.267    −5.336    < 0.001   ***
NZEFre                        0.758       0.384     1.972      0.049   *
Logged_word_count             1.604       0.300     5.338    < 0.001   ***
ETHNICITYmaori/pakeha        −0.677       0.985    −0.687      0.492
ETHNICITYpakeha              −1.793       0.633    −2.834      0.005   **
GENDERmale                   −0.356       0.510    −0.698      0.485
AGE                           0.024       0.015     1.588      0.112
ADDRESSEEmaori/pakeha        −1.734       0.765    −2.266      0.024   *
ADDRESSEEmixed                0.857       0.615     1.394      0.163
ADDRESSEEnon-maori           −2.508       0.569    −4.409    < 0.001   ***
The full model performed significantly better than the null model (χ² = 94.625, df = 7, p < 0.001). As before, the speaker’s ethnicity (χ² = 9.806, df = 2, p = 0.007) was a significant predictor, and their age was borderline significant (χ² = 3.365, df = 1, p = 0.067). A further significant predictor, and one which we were especially interested in testing here, was the ethnicity of the addressee (χ² = 38.096, df = 3, p < 0.001). Our data suggest that Māori addressees and mixed addressees where a Māori interlocutor was present co-occur with significantly higher numbers of the loan than audiences where no Māori participant was present. Finally, we tested the interaction between the ethnicity of the speaker and that of the addressee,¹¹ and, although this was itself not significant (χ² = 4.496, df = 6, p = 0.610), a visual inspection of the interaction plot suggests some interesting trends, see
9 There were no interactions of mixed audiences containing no Māori participants but one (or more) Māori/Pākehā participants.
10 There were no speakers of Asian or Pasifika ethnicities who uttered the Māori loan in any of our conversational data (and they were not present as addressees either in any conversations where the loan was used). Similarly, there were no mixed groups without any Māori addressees present in conversations where the loan was used. So, we eliminated these categories from the analysis. Leaving these in would increase the standard error values because the model struggles to predict anything in these rare cases.
11 We thank an anonymous reviewer for pointing us in this direction.
[Figure 3: line plot. x-axis: Ethnicity of Addressee (maori, maori/pakeha, mixed, non-maori); y-axis: mean use of the loan, controlling for NZE variants (0.0–1.0); lines by Ethnicity of Speaker: maori, pakeha, maori/pakeha]
Fig. 3: Interaction plot showing the use of the loan Maori for speakers and addressees of different ethnicities
Figure 3. Figure 3 shows that addressees who are non-Māori or Māori/Pākehā tend to co-occur with fewer instances of the loan, whereas audiences exhibiting the presence of Māori participants (either exclusively, or a mixture of ethnicities including at least one Māori participant) co-occur with an increased use of the loan, regardless of the ethnicity of the speaker. Furthermore, the observed increase in use of the loan with an exclusively Māori audience is most pronounced for Māori speakers, somewhat less pronounced for the Māori/Pākehā speakers and least of all for the Pākehā speakers. Interestingly, the rise in use of the loan with mixed audiences which consist of at least one Māori speaker is highest for the Māori/Pākehā speakers this time, and less so for the Māori speakers. As we have seen with the previous interactions tested, the models are not able to detect significant differences in this regard, possibly because of insufficient data. The visual representations do, however, suggest that these effects are worth investigating in further detail when larger datasets become available.

We now turn our attention to the other two loans investigated, namely Kiwi and Pakeha. As mentioned earlier, we are not in a position to build similar GLMMs to test the use of these loans because of their low frequencies of occurrence. Instead, we content ourselves with some descriptive remarks regarding their use in the WSC data. Unlike the loan Maori, the loan Kiwi is used more by Pākehā than by Māori, as shown in Table 4.
Tab. 4: Use of Kiwi in the WSC, compared to the variant New Zealander

Ethnicity of speaker   Number of uses of Kiwi   Number of uses of New Zealander   Proportion of use of loan compared to NZE variant
Māori                    11                       12                                48%
Māori/Pākehā              1                        2                                33%
Pākehā                   80                       72                                53%
Asian                     0                        2                                 0%
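The last column of Table 4 follows directly from the two count columns: it is the share of Kiwi among all uses of either Kiwi or New Zealander. A minimal sketch of that calculation is given below; the dictionary and function names are illustrative, the counts are copied from the table, and the macrons are dropped in the keys only for convenience.

```python
# (Kiwi, New Zealander) counts per speaker ethnicity, copied from Table 4.
TABLE_4 = {
    "Maori": (11, 12),
    "Maori/Pakeha": (1, 2),
    "Pakeha": (80, 72),
    "Asian": (0, 2),
}


def loan_share(loan_count, variant_count):
    """Share of the loan among all uses of the loan or its NZE variant."""
    total = loan_count + variant_count
    if total == 0:
        return None  # the group never had occasion to choose either form
    return loan_count / total


shares = {group: loan_share(*counts) for group, counts in TABLE_4.items()}
percentages = {
    group: round(100 * share)
    for group, share in shares.items()
    if share is not None
}
```

The same ratio underlies the "potential versus actual use" control described for the GLMM, where the frequency of the English counterpart stands in for the opportunities a speaker had to use the loan.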
Looking exclusively at the conversational data, we can provide a visual representation of the interaction of the ethnicity of the speaker and that of the addressee with respect to the use of the loan Kiwi. This is similar to the one given for the loan Maori in Figure 3. The use of Kiwi is controlled for as a proportion of the total combined use of the loan and its New Zealand English equivalent. The plot in Figure 4 seems almost reversed for the loan Kiwi when compared to Maori. Pākehā speakers only use the term Kiwi when talking to non-Māori audiences. Māori speakers use it only with Māori/Pākehā participants, while Māori/Pākehā speakers use Kiwi only with other Māori participants (the latter being similar to their use of the loan Maori).

Finally, we turn our attention to the loan Pakeha. This time, the New Zealand English equivalent terms (White New Zealander, European New Zealander, or New Zealand European) never occurred in the corpus. So, we have no way of accounting for lexical choice of a loanword over an existing term in the language, above and beyond
[Figure 4: line plot. x-axis: Ethnicity of Addressee (maori, maori/pakeha, mixed, non-maori); y-axis: mean use of the loan, controlling for the NZE variant (0.00–0.12); lines by Ethnicity of Speaker: pakeha, maori, maori/pakeha]
Fig. 4: Interaction plot showing the use of the loan Kiwi for speakers and addressees of different ethnicities
topic of conversation. While this limitation prevents us from making any comparisons with the previous loans as regards the WSC, the data generally show that both Māori and Pākehā speakers use this loan. In order to investigate this issue further, we manually checked all the occurrences of Pakeha in the conversations and the dialogue data, counting out the individual discourse constellations. The manual counts established a marked difference between discourse constellations that involve a Pākehā or a Māori addressee. Thus, a Pākehā addressee was present in 9 conversations (amounting to a total of 13 uses of the loan Pakeha) while Māori and Māori/Pākehā addressees were involved in 22 conversations (covering 49 uses of the loan). The preferential use of the loan among Māori interlocutors is also supported by the fact that only in two cases did a Māori speaker use the loan when conversing with a Pākehā interlocutor and the reverse was the case in only one instance (a female Pākehā professor talking to a Māori broadcaster).
3.2 Results in the NZEPC data

We searched the NZEPC data for the same variants as we did with the WSC data. In total, we found 420 instances of Maori, 22 instances of Pakeha and 282 instances of Kiwi (on average, per million words; see Table 5). The rank ordering of the three loans in the NZEPC is the same as in the WSC data: the most frequent loan is Maori, followed by Kiwi, and Pakeha. However, the loan Kiwi has increased in use almost threefold in the newspaper data, whereas the use of Maori is only half as frequent as in the spoken data. Similarly, Pakeha has also decreased considerably. It must be noted, however, that the data of the two corpora are not completely comparable because the older data come from spoken language collected during a short period of time, whereas the more current data represent written language (and a more homogeneous sample) collected over a period of fifteen years.

Tab. 5: Overall use of the loans Maori, Kiwi, and Pakeha in the NZEPC (per million words)

Loan      Form          Counts (normalized per million words)   Total
Kiwi      Kiwi          189.04
          Kiwis          93.42                                    282.46
Pakeha    Pakeha         20.12
          Pakehas         1.59                                     21.71
Maori     Maori         398.00
          Maoris         18.72
          non-Maori       2.39
          non-Maoris      0.80                                    419.91
Total uses of loans analyzed from the NZEPC                       724.08

The three loans are unevenly distributed across newspapers, particularly between the North and the South Island (see Table 6). The Wellington area, represented by The Dominion Post, exhibits the highest counts of the loans, followed closely by the New Zealand Herald/Sunday Star Times located in Auckland (see shaded cells in Table 6). The South Island newspapers consistently show much lower figures, with the southernmost region of Southland in particular demonstrating a marked drop in the usage frequency of these loans.
Tab. 6: Overall use of the loans in the NZEPC across newspapers (per million words, rounded)

Newspaper                                   MAORI(S)   PAKEHA(S)   KIWI(S)
New Zealand Herald & Sunday Star Times        483         30         305
Dominion Post                                 594         31         307
The Press                                     300         10         296
Southland Times                               145          6         169
We also noticed that the loans were more likely to occur in certain newspaper sections than in others. Focusing on the major sections present in all newspapers, the loans Maori and Pakeha exhibit their highest occurrences in News and Features. Kiwi, on the other hand, is generally more evenly spread throughout these four sections and shows a peak in both Business and Sport (see Table 7¹²).
Tab. 7: Distribution of the loans across major newspaper sections (per million words, rounded)

Newspaper section   MAORI(S)   PAKEHA(S)   KIWI(S)
Business               97          3         457
Features              340         23         197
News                  523         27         241
Sport                  93          0         485
Finally, a further important analysis carried out on the NZEPC data relates to the diachronic use of the three loans (see Table 8). The Kendall’s Tau tests show that the trends for both Maori and Pakeha are decreasing (though neither p-value falls below the 0.05 threshold; the one for Pakeha comes close). There is some hint of an upward trend in the use of
12 Since not all the newspaper sections contain the same amount of words, we only calculated the projected value per million words for the sufficiently large sections. In detail, Business holds 361,000 words, Features consists of 839,000 words, News amounts to 3,209,000 words, and Sport contains 503,000 words.
Tab. 8: Distribution of the three loans across years (per million words)

Year         1996   1999   2002   2005   2008   2011   Kendall’s τ, p-value
MAORI(S)      662    681    326    353    212    298   τ = −0.60, p = 0.13
PAKEHA(S)      50     25     18     23      6      9   τ = −0.73, p = 0.06
KIWI(S)       246    294    152    265    374    361   τ = +0.50, p = 0.26
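For readers who wish to retrace the trend statistics, the sketch below computes Kendall's τ and an exact two-sided permutation p-value from the rounded per-million figures in Table 8. It is a plain illustration and not the procedure the authors ran in R: the function names are hypothetical, the τ-a variant (no tie correction) is assumed, and the exact test simply enumerates all orderings of the six values.

```python
from itertools import combinations, permutations

YEARS = [1996, 1999, 2002, 2005, 2008, 2011]
TABLE_8 = {
    "MAORI(S)": [662, 681, 326, 353, 212, 298],
    "PAKEHA(S)": [50, 25, 18, 23, 6, 9],
    "KIWI(S)": [246, 294, 152, 265, 374, 361],
}


def kendall_s(x, y):
    """Concordant minus discordant pairs (ties contribute zero)."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s


def kendall_tau(x, y):
    """Kendall's tau-a: S divided by the number of pairs."""
    n = len(x)
    return kendall_s(x, y) / (n * (n - 1) / 2)


def exact_two_sided_p(x, y):
    """Share of all orderings of y whose |S| is at least the observed |S|."""
    s_obs = abs(kendall_s(x, y))
    orderings = list(permutations(y))
    hits = sum(1 for p in orderings if abs(kendall_s(x, p)) >= s_obs)
    return hits / len(orderings)


tau = {loan: kendall_tau(YEARS, freq) for loan, freq in TABLE_8.items()}
p_value = {loan: exact_two_sided_p(YEARS, freq) for loan, freq in TABLE_8.items()}
```

Run on the rounded table values, this should give τ = −0.60 for Maori and τ ≈ −0.73 for Pakeha, in line with the table, and about +0.47 for Kiwi, slightly below the printed +0.50. The exact p-values come out close to, but not identical with, the printed ones, which may reflect rounding of the input data or a different variant of the test.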
Kiwi (tau = +0.50); however, the p-value is too high to be confident that this is a real effect. In light of such unclear Kendall’s Tau scores, we decided to pursue a variability-based neighbour clustering analysis (VNC) (cf. Hilpert and Gries 2009¹³). The VNC analysis is a complementary technique to the Kendall’s Tau test, designed to find hidden patterns in diachronic data. It is a clustering analysis which takes into consideration the temporal ordering of frequency data, thus only allowing the clustering of values which are temporally adjacent. The resulting dendrograms for our three loans are included in Figures 5–7 (see below) alongside the plots of the number of clusters assumed. The VNC plots provided in the right-hand side panels give scree plots showing the main clusters and the progression of each loan diachronically over the period from 1996 to 2011. The cluster graphs on the left-hand side show how many different stages should be proposed over this time period – the steeper the curve from one stage to the next, the more substantial the separation to a new stage (or cluster) (see the paper by
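[Figure 5: two-panel VNC plot for the loan Maori. Axis labels recoverable from the original: Clusters (1–6), Time (1996–2011), Distance in standard deviations, Distance in summed standard deviations]
Fig. 5: VNC dendrogram from the NZEPC for the loan Maori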
13 We are grateful to Martin Hilpert for his help with the VNC analysis (any errors are our own).
[Figure 6: two-panel VNC plot for the loan Pakeha. Axis labels recoverable from the original: Clusters (1–6), Time (1996–2011), Distance in standard deviations, Distance in summed standard deviations]
Fig. 6: VNC dendrogram from the NZEPC for the loan Pakeha

[Figure 7: two-panel VNC plot for the loan Kiwi. Axis labels recoverable from the original: Clusters (1–6), Time (1996–2011), Distance in standard deviations, Distance in summed standard deviations]
Fig. 7: VNC dendrogram from the NZEPC for the loan Kiwi
Hilpert and Gries 2009 for detailed information on the VNC plots and their interpretation). In general, we find that while the three loans exhibit slightly different trends of use over time, some common stages in their progression emerge. One such similarity is that all three loans decrease in use by the year 2002 compared to the previous years. A second general pattern is that of an increase between 2002 and 2005. The point of divergence for our loans seems to be in what happens to their respective use following 2005. Maori decreases in 2008 and increases in 2011, and Pakeha decreases
in 2008 and remains close to that low value in 2011. By contrast, Kiwi increases in 2008 and shows a similarly high value in 2011.
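The core idea of neighbour clustering, merging only temporally adjacent periods, can be shown in a few lines. The sketch below is a deliberately simplified approximation and not the VNC implementation of Hilpert and Gries: it assumes that the distance between two neighbouring clusters is the standard deviation of their pooled values, and the function name is hypothetical. The input is the Maori series from Table 8.

```python
from statistics import stdev

YEARS = [1996, 1999, 2002, 2005, 2008, 2011]
MAORI = [662, 681, 326, 353, 212, 298]  # per million words, from Table 8


def neighbour_clustering(labels, values):
    """Repeatedly merge the most similar pair of adjacent clusters.

    Returns a list of (merged labels, distance) in merge order.
    Assumption of this sketch: distance = std. dev. of the pooled values.
    """
    clusters = [([label], [value]) for label, value in zip(labels, values)]
    merges = []
    while len(clusters) > 1:
        # Only neighbours in time are candidates for merging.
        dists = [
            stdev(clusters[i][1] + clusters[i + 1][1])
            for i in range(len(clusters) - 1)
        ]
        i = min(range(len(dists)), key=dists.__getitem__)
        merged = (
            clusters[i][0] + clusters[i + 1][0],
            clusters[i][1] + clusters[i + 1][1],
        )
        merges.append((merged[0], dists[i]))
        clusters[i:i + 2] = [merged]
    return merges


merges = neighbour_clustering(YEARS, MAORI)
```

The merge order and the distances at which merges happen are what a dendrogram displays; a large jump in distance between successive merges marks a boundary between stages.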
4 Discussion: Contextualizing the results on Maori, Pakeha, and Kiwi

This section discusses the major findings from the WSC and NZEPC and draws on insights gained from a contextual analysis of the three loans. For the spoken data in the WSC, the major question was to determine statistically whether the ethnicities of the interlocutors (as speaker and addressee) would play a role in how frequently the loans were used. In addition, the application of a Generalized Linear Mixed Model allowed the testing of other potential factors influencing the use of the loan Maori, such as age and gender. The most significant finding from the statistical analysis is that the ethnicities of both interlocutors make a decisive difference in how often Maori is used. Thus, the presence of Māori speakers and addressees increases the likelihood of the term Maori being used. Descriptive analysis of the spoken data for the loan Kiwi indicates that it is used more often by Pākehā speakers, in particular when talking to non-Māori interlocutors. The loan Pakeha, on the other hand, is again favored among Māori, as a manual count of all the interlocutor constellations in the WSC shows.

In general terms, the higher rate of occurrence of Maori, which is also the most frequent loan in the written newspaper corpus (NZEPC), can, to some extent, be explained by its referential function. Thus, the loan Maori represents the only commonly used term to denote the indigenous population of New Zealand. So, whenever reference is made to people pertaining to that ethnic group, the loan Maori is the most appropriate term to mark their ethnic belonging. This is quite different for the loans Pakeha and Kiwi. In both cases, these terms of ethnic and national identity are not the only means to refer to a non-Māori New Zealander (typically of European descent, i.e. Pakeha) or to a New Zealander in general (Kiwi).¹⁴
The loan Pakeha can be used interchangeably with the term New Zealand European, and, as controversy over the use of Pakeha in recent New Zealand censuses shows, the expression New Zealand European is available as an alternative (see, e.g. Ethnicity Profile of New Zealander Responses in the 2006 Census, 2007). Furthermore, identification with the loan Pakeha can be problematic for some New Zealanders who perceive the term negatively. This is quite the opposite for Kiwi which tends to be perceived positively when used synonymously with the term New Zealander. When it comes to the use of the three loans in both the WSC and the NZEPC, they show some similar characteristics and functions in spoken and written discourse.
14 This is why we controlled for the occurrence of New Zealand English equivalents of the loans in the analysis of the data in WSC (see Section 3).
Apart from their nominal use, all three loans show a predilection for forming compounds (e.g. Kiwi guy, Kiwi accent, Kiwi life; Maori people, Maori language, Maori culture; Pakeha majority, Pakeha people, Pakeha looking). Their occurrence in the determinant position of compounds is closely related to the fact that they can also be used generally as attributive adjectives. More rarely, the predicative use of being Pakeha, Maori or Kiwi also occurs in the data. The loans can also refer to the respective languages spoken in New Zealand, with Maori as short for Māori language/Te Reo Māori, Pakeha, albeit rarely, representing the majority tongue English, and Kiwi referring in particular to the variety of New Zealand English (e.g. the term Kiwi accent is frequently used in the data in discussions of differences between New Zealand and Australian English).

Apart from these general similarities, the three loans show marked differences in their most common usage contexts. Among the spoken data, typical topics of the loan Maori include Māori language (e.g. discussions of the ability to speak Māori as well as the revitalization, acquisition, and teaching of Māori in schools), Māori culture and Māori people (including specific needs, health issues, family, and sports), law and politics (e.g. the frequently mentioned compound Maori seats) as well as Māori television. While these topic areas are also pervasive in the written newspaper corpus, data from the NZEPC also feature the loan in a great number of proper names (Maori Land Court, Maori Language Commission, Ministry of Maori Development, Maori Party, and so on). On the other hand, the loan Pakeha is used much less frequently and only occurs in a restricted range of topics, almost all of which relate to a Māori context.
The binominal Maori and Pakeha is a recurrent construction in both spoken and written data, and it is striking to observe the pervasive co-textual presence of the term Maori in both written and spoken language. An example from the WSC illustrates the fact that the main function of the loan Pakeha is to emphasize bipolar ethnic differences in New Zealand (e.g. “do you think knowing Maori helps Pakehas understand Maori people?” asked by a M¯aori interviewer during a social dialect interview in the WSC, DPP008). Thus, Pakeha is a term that reflects the M¯aori perspective of the world (Te Ao M¯aori), and it occurs primarily among the topics of M¯aori language, NZ politics, ethnic distinctions in NZ, and M¯aori cultural events/concepts (e.g. when talking about a tangi ‘M¯aori funeral’ or the concept of tapu ). These contexts of use are confirmed in the written data where Pakeha can additionally assume the function of an ethnic label as in the description of a suspected criminal. In contrast to Pakeha, Kiwi is used in a whole range of usage contexts in the data. This is a token of its generally positive connotations as a marker of national and in some cases even ethnic identity. The function of Kiwi as prime marker of New Zealand identity shows up in the recurrent topic of competition between Australia and New Zealand in the spoken data. In this case Kiwi is an affectionate term for a New Zealander (e.g. “[…] fundamental about the Kiwi character, you’ll see the rest of the world before you see Australia” WSC, DPP308). Furthermore, the loan is often used in the spoken data in the context of international sports competitions or when mostly non-
M¯aori people talk about their relations to fellow New Zealanders, when reminiscing in their stories of the past (such as growing up in New Zealand), and when referring to their New Zealand accent or variety of English. The versatility of Kiwi is confirmed in the newspaper data (see Table 7), where the loan is shown to occur consistently in all major newspaper sections with peak frequencies in both Business and Sport. Its overall function as a marker of national identity is evident in recurrently used compounds and combinations that refer to core elements of New Zealand national identity (e.g. Kiwi living standards, Kiwi culture, Kiwi lifestyle, Kiwi identity, Kiwi production, Kiwi athletes, Kiwi bloke, Kiwi soldiers, Kiwi dollar, Kiwi passports, Kiwi ingenuity and spirit ). A particularly telling example in this regard is the use of Kiwi in the following comment as part of a tribute to Sir Edmund Hillary: “[…] but most of all he was a quintessential Kiwi. He was ours from his craggy appearance and laconic style to his directness and honesty.” (Dominion Press, January 12, 2008). The newspaper data also give some insight into the pluralization patterns of the loanwords. While Kiwi is generally pluralized in the English fashion of adding the plural suffix -s, there is some controversy over whether the terms Maori and Pakeha should retain their original morphologically unmarked plural form or be fully integrated into the English plural system. Interestingly, the Anglicized plural forms Maoris and Pakehas mostly occur prior to 2000 and totally disappear after 2002 in written newspaper language. The only occurrences of Maoris in 2005 and 2011 are part of a proper name and of an oral quotation. This hints at a difference in the use of the English plural for these M¯aori loans between written and spoken language in New Zealand today. 
Thus, written media assume a prescriptive attitude of correctness by using Māori loanwords according to Māori conventions, whereas spoken discourse more likely reflects the regular integration processes that generally operate when words are borrowed into another language. As illustrated in Table 8 and in Figures 5–7, the newspaper data also show differences in the occurrence of the three loans over the last fifteen years. The most significant trend concerns the loan Pakeha, whose usage has decreased five-fold in recent years. This development is most likely a sign of an increased stigmatization of the term among non-Māori New Zealanders, who prefer to revert to other ethnic designations such as New Zealander, New Zealand European, and Kiwi instead. The frequency of Maori also decreases from 1999 to 2002 but has remained fairly stable since then, with slight ups and downs. Kiwi, on the other hand, is the only loan among the three whose frequency has slightly increased from 1996 to 2011. While this could indicate the rising popularity of Kiwi to denote a person from New Zealand and things New Zealand in general, the lack of statistical significance of this tendency should caution against any attempt at a conclusive interpretation of the numbers. Indeed, when looking at the usage contexts of Kiwi, recent fluctuations in its frequency appear to be influenced by specific events during the period of observation. Thus, the rise in the number of occurrences of Kiwi in 2008 coincides with New Zealand’s first title in the Rugby League World Cup, and a number of articles and reports on that event
refer to this Kiwi achievement. The same seems to hold true for the slight fluctuations in the number of occurrences of Maori in recent years. In the first months of 2011, for example, the term Maori Party occurred particularly often in the newspapers, as reports on Hone Harawira’s split from the Māori Party and his setting up of the Mana Party were important topics in New Zealand domestic news. Finally, the analysis of the written data in the NZEPC shows a significant divide in the use of Maori and Pakeha between the North and South Islands. This substantiates earlier findings on selected Māori loans in Macalister (2001: 39), Davies and Maclagan (2006: 94), and Degani (2010: 179), confirming the difference in the media representation of certain Māori loans in the two major islands of New Zealand.¹⁵ As pointed out by Macalister (2001: 39), the reason for this difference between North Island and South Island newspapers relates to the demographic fact that the vast majority of Māori tribes and people live on the North Island. Therefore, most of the events and issues concerning Māori people happen on the North Island. Furthermore, as the political centre, the capital city of Wellington harbours most of the political and legal concerns between the Māori and Pākehā population of New Zealand. By comparison, the more evenly spread usage of Kiwi across the newspapers indicates, first of all, that the term is used to denote New Zealand identity in general and, secondly, that regional differences in the frequency of Māori loans have to be considered on an individual basis. It is thus a task of further research to investigate whether other established borrowings from Māori in New Zealand English show a similar divide in their newspaper occurrences between the North and the South Island and whether a more general trend can be found.
5 Conclusion
This contribution has set out to provide an in-depth account of the use of three prominent Māori loans in spoken and written New Zealand English. Relying on statistical measures, the study shows the significant role of interlocutor ethnicities for the use of the loan Maori in spoken New Zealand English as recorded in the WSC. Descriptive statistics and manual screening of the data have added further evidence for an ethnic bias in the use of the three loans. An investigation of their usage contexts coincides with the observed ethnic bias in that Maori and Pakeha usually occur with topics relating to Māori people, culture, language, ethnic controversies, and politics while Kiwi assumes the function of a positive marker of New Zealand identity, particularly among
15 Like these studies, our analysis also suffers from a possible limitation which is that we did not control for inter-writer variability. This means that the difference between North Island and South Island newspapers might be influenced by personal stylistic preferences of certain writers who are particularly active in their use of these loans and who happen to be writing for the Dominion Post and the New Zealand Herald/Sunday Star Times. We hope to address this issue in future work.
non-Māori people. In the written data, the major topic areas and textual functions of the loans are similar to the spoken corpus with the exception of plural formation. In this case, newspaper language has abandoned the use of English plural suffixation since about 2002, thus following a shift in attitude towards representing Māori terms in their original forms. Spoken language appears to be more lenient in this respect and is more likely to follow linguistically intuitive patterns of integration. Further statistical analyses of the written data have shown a significant regional difference in the usage frequency of Maori and Pakeha, which is indicative of the demographic situation in the North and the South Island. These findings substantiate earlier claims in research on selected Māori borrowings. The recent diachronic development in the usage frequencies of the loans does not cohere with a general trend that the rate of Māori loans in New Zealand English is constantly increasing. In more recent data, Pakeha has shown a significant decrease over the last fifteen years, and the numbers for Maori dropped at the beginning of the new millennium and have remained more or less at that level since then. Among the three loans, Kiwi is the only term that has slightly increased in numbers; however, this increase remains below the level of significance, and contextual analyses indicate that specific events have influenced its slight numerical fluctuation. It will thus be a task of further research to investigate the diachronic development of Māori loans in New Zealand English on a comprehensive scale. As a guideline for such research, this case study demonstrates that Māori borrowings have to be considered on an individual basis and that an overall trend for the frequency of Māori loans might not coincide with the numerical development of individual borrowings.
In addition, this study emphasizes that it is necessary for empirical research into borrowing to strike a balance between methods of statistical data analysis and close contextual inspection of the data. This combination will allow the researcher to look beneath the numbers and explain phenomena from the root of linguistic research.
References
Bauer, Winifred. 2008. Is the health of Te Reo Māori improving? Te Reo 51. 33–73.
Baayen, Harald. 2008. Analyzing linguistic data. A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Belich, James. 2001. Paradise reforged: A history of the New Zealanders from the 1880s to the year 2000. Auckland: Allen Lane/Penguin Press.
Bell, Allan. 1984. Language style as audience design. Language in Society 13. 145–204.
Bell, Allan. 1991. The language of news media. Oxford & Cambridge: Blackwell.
Bellet, Donella. 1995. Hakas, hangis and kiwis: Maori lexical influence on New Zealand English. Te Reo 38. 73–103.
Calude, Andreea S. & Paul James. 2011. A diachronic corpus of New Zealand newspapers. New Zealand English Journal 25. 1–14.
Chrisp, Steven. 2005. Maori intergenerational language transmission. International Journal of the Sociology of Language 172. 149–181.
Coupland, Nikolas. 1995. Accommodation theory. In Jef Verschueren, Jan-Ola Östman & Jan Blommaert (eds.), Handbook of Pragmatics, 21–26. Amsterdam & Philadelphia: John Benjamins.
Daly, Nicola. 2007. Kūkupa, koro, and kai: The use of Māori vocabulary items in New Zealand English children’s picture books. New Zealand English Journal 21. 20–33.
Davies, Carolyn & Margaret Maclagan. 2006. Māori words – read all about it: Testing the presence of 13 Māori words in four New Zealand newspapers from 1997 to 2004. Te Reo 49. 73–99.
De Bres, Julia. 2006. Maori lexical items in the mainstream television news in New Zealand. New Zealand English Journal 20. 17–34.
Degani, Marta. 2010. The Pakeha myth of one New Zealand/Aotearoa: An exploration in the use of Maori loanwords in New Zealand English. In Roberta Facchinetti, David Crystal & Barbara Seidlhofer (eds.), From International to Local English – and Back Again, 165–196. Frankfurt am Main: Peter Lang.
Degani, Marta & Alexander Onysko. 2010. Hybrid compounding in New Zealand English. World Englishes 29 (2). 209–233.
Deverson, Tony. 1991. New Zealand English lexis: The Maori dimension. English Today 26. 18–25.
Eliasson, Stig. 1990. English–Maori language contact: Code-switching and the free-morpheme constraint. In Rudolf Filipović & Maja Bratanić (eds.), Languages in contact, 33–49. Zagreb: Institute of Linguistics, University of Zagreb.
Garrett, Peter & Allan Bell. 1998. Media and discourse: A critical overview. In Allan Bell & Peter Garrett (eds.), Approaches to media discourse, 1–20. Oxford: Blackwell.
Giles, Howard. 1973. Accent mobility: A model and some data. Anthropological Linguistics 15. 87–109.
Harlow, Ray. 2001. A Māori reference grammar. Auckland: Pearson Longman.
Harlow, Ray. 2003. Issues in Māori language planning and revitalisation. Journal of Māori and Pacific Development 4 (1). 32–43.
Harlow, Ray. 2007. Māori: A linguistic introduction. Cambridge: Cambridge University Press.
Harlow, Ray, Peter Keegan, Jeanette King, Margaret Maclagan & Catherine Watson. 2009. The changing sound of the Māori language. In James Stanford & Dennis Preston (eds.), Variation in indigenous minority languages, 129–152. Amsterdam & Philadelphia: John Benjamins.
Hilpert, Martin & Stefan Th. Gries. 2009. Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing 24 (4). 385–401.
Holmes, Janet, Bernadette Vine & Gary Johnson. 1998. Guide to the Wellington Corpus of Spoken New Zealand English. Wellington (New Zealand): University of Wellington.
Kennedy, Graeme. 2001. Lexical borrowing from Maori in New Zealand English. In Bruce Moore (ed.), Who’s centric now? The present state of post-colonial Englishes, 59–81. Oxford: Oxford University Press.
Macalister, John. 2001. Introducing a New Zealand newspaper corpus. New Zealand English Journal 15. 35–41.
Macalister, John. 2005. A dictionary of Maori words in New Zealand English. Oxford & New York: Oxford University Press.
Macalister, John. 2006a. Of weka and waiata: Familiarity with borrowings from Te Reo Māori. Te Reo 49. 101–124.
Macalister, John. 2006b. The Maori presence in the New Zealand English lexicon, 1850–2000: Evidence from a corpus-based study. English World-Wide 27 (1). 1–24.
Macalister, John. 2007. ‘Weka’ or ‘woodhen’? Nativization through lexical choice in New Zealand English. World Englishes 26. 492–506.
Macalister, John. 2008. Tracking changes in familiarity of borrowings from Te Reo Māori. Te Reo 51. 76–97.
Meyerhoff, Miriam. 1998. Accommodating your data: The use and misuse of accommodation theory in sociolinguistics. Language and Communication 18. 205–225.
Orsman, Harry. 1997. The dictionary of New Zealand English. Auckland: Oxford University Press.
Pōtatau, Hēmi. 1991. He hokinga mahara [An autobiography]. Auckland: Longman Paul.
R Development Core Team. 2009. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Profile of New Zealander responses, ethnicity question: 2006. 2007. Wellington: Statistics New Zealand/Tatauranga Aotearoa. http://www.stats.govt.nz (accessed February 2012).
Trudgill, Peter. 1986. Dialects in contact. Oxford: Blackwell.
Van Coetsem, Frans. 2000. A general and unified theory of the transmission process in language contact. Heidelberg: Winter.
Frank van Meurs, Jos Hornikx and Gerben Bossenbroek
English loanwords and their counterparts in Dutch job advertisements: An experimental study in association overlap
Abstract: A question which has not yet been addressed in loanword studies is to what extent people perceive loanwords as having different meanings than their native-language equivalents, and on what factors this may depend. According to the Conceptual Feature Model (De Groot 1992b), translation-equivalent words in a first and second language are linked to both the same and different conceptual features, which can be elicited by asking speakers to write down their associations with these words. Two factors determining association overlap between equivalent L1 and L2 words are their concreteness and cognateness. The aim of the current study was to determine experimentally to what extent English loanwords from Dutch job ads evoke the same associations as their Dutch equivalents, and to what extent this association overlap is predicted by the degree of concreteness and cognateness of these words. In an experiment, 60 Dutch participants wrote down associations with 30 English loanwords selected from corpora of Dutch job ads and with their Dutch counterparts, in two sessions separated by a six-week interval. As a baseline, they also wrote down three associations with English/Dutch word pairs which Van Hell and De Groot (1998) had found evoked a relatively small and a relatively large proportion of overlapping associations, respectively. The degree of concreteness and cognateness of these words was determined in separate norming studies involving 129 Dutch participants. The results showed that the mean overlap in associations between the English loanwords from Dutch job ads and their Dutch equivalents was 21.6%. This was significantly less than the percentage for the word pairs for which large overlap had been expected (30.6%), and similar to the percentage for the word pairs for which little overlap had been expected (21.4%).
Regression analyses revealed that the degree of association overlap was significantly predicted by cognateness but not by concreteness. It can be concluded that Dutch people to a large extent have different associations with English loanwords from Dutch job ads than with their Dutch equivalents, and therefore to a large extent link them to different conceptual features. This finding, and the finding that the degree of association overlap was predicted by the degree of cognateness of the words, are in line with the Conceptual Feature Model.
1 Introduction
Dutch job advertisements frequently contain English words. A corpus analysis of job ads published in the Dutch national quality newspaper de Volkskrant in August 2001
revealed that 39% contained at least one English word (Korzilius, Van Meurs, and Hermans 2006). Similarly, 88.5% of job ads appearing on the Dutch job site Monsterboard in February 2004 were found to contain at least one English word (Van Meurs, Korzilius, and Den Hollander 2006b). These high percentages are particularly striking, since for a number of these English loanwords, Dutch equivalents are available, such as verkoopleider for sales manager, and personeelszaken for human resources. A number of reasons have been given for the use of loanwords generally and in job ads in particular. Hock (1986: 408–409) and Wetzler (2006: 27–28) provide two general, but distinct reasons: loanwords fill a lexical gap in the recipient language, and loanwords are perceived to be more prestigious than equivalents in the recipient language. Interviews with Dutch job ad makers show that they indeed use English terms for these reasons, but also for other reasons, such as the fact that they are commonly used in their organisation or their sector (Van Meurs 2010: 157–166). Corpus analyses have revealed factors that determine the ease and frequency with which words are borrowed. Such factors include features of the words themselves – e.g. word class (Poplack, Sankoff, and Miller 1988), word length (Zenner, Speelman, and Geeraerts 2012), and inflexion in the donor language (Van Hout and Muysken 1994) – and characteristics of the language users, e.g. their social class, gender, and neighborhood (Poplack, Sankoff, and Miller 1988). Corpus analyses of Dutch job advertisements have revealed that the extent to which English words are used depends on the international (versus domestic) nature of the organization (Korzilius, Van Meurs, and Hermans 2006; Van Meurs, Korzilius, and Den Hollander 2006b), the educational level of the ads’ target group, and the sector in which the organization operates (Van Meurs, Korzilius, and Den Hollander 2006b). 
The studies about reasons for loanword use and the corpus analyses yield insights into loanwords from the perspective of those who produce them. Another important perspective on loanwords is their perception. A number of different approaches have been taken to study the perception of loanwords. Some studies have taken a direct approach by asking people what they think about the phenomenon of loanwords (e.g. a survey by Thøgersen 2004), or about specific instances of loanwords that the researchers provided (e.g. Fink 1977; Wetzler 2006: 354). This last approach was also taken in the domain of loanwords in job advertisements in Van Meurs et al. (2007), who compared Dutch people’s evaluations of English and Dutch job titles. Other studies have taken an indirect approach by asking people to evaluate texts containing either loanwords or their equivalents in the recipient language (e.g. Hassall et al. 2008). In the domain of job ads, this indirect approach was taken by Van Meurs, Korzilius, and Hermans (2004) and Van Meurs, Korzilius, and Den Hollander (2006a), who conducted experimental studies in which Dutch participants were asked to evaluate job ads with English loanwords or their Dutch equivalents. The experimental studies by Van Meurs, Korzilius, and Hermans (2004) and Van Meurs, Korzilius, and Den Hollander (2006a) show that the use of English loanwords or their Dutch equivalents in Dutch job advertisements had no effect on the evaluation of
the job ads, the jobs advertised and application intentions. However, Van Meurs et al. (2007) found that, in isolation, the use of some English instead of Dutch job titles led to worse evaluations of the job titles themselves, but also to more positive evaluations of the jobs, and to the jobs being considered more international and being associated with higher salaries. Thus, in a number of cases, English loanwords from job ads are evaluated differently than their Dutch equivalents. The question remains what causes these differences in perception. Are there underlying differences in meaning between the English loanwords and their Dutch counterparts? As far as we know, no study to date has investigated this fundamental question on the perception of loanwords. The present study aims to answer this question for loanwords from the specific domain of job advertisements. The perspective from which this question is addressed is the Conceptual Feature Model, a psycholinguistic model of the way people process words in two languages (De Groot 1992a; Kroll and De Groot 1997; Van Hell and De Groot 1998).
1.1 The Conceptual Feature Model and associations with L1 and L2 words
The Conceptual Feature Model is a further specification of the Revised Hierarchical Model (RHM). These models apply to bilingual speakers of different types: not only fully balanced bilinguals (who speak two languages equally well), but also bilinguals who have a better mastery of their first language than of their second language. This chapter focuses on bilinguals from the second category. According to the Revised Hierarchical Model, the form of words and their conceptual meanings are represented at different levels in the memory of speakers, the lexical level and the conceptual level, respectively (Dufour and Kroll 1995; Kroll and De Groot 1997; see also Brysbaert and Duyck 2010; Kroll et al. 2010; Smith 1997). At the lexical level, there are separate representations for a speaker’s languages, but at the conceptual level there is one unitary system, with links to the word forms in the different languages. De Groot’s (1992a) Conceptual Feature Model (CFM) further specifies the links between word forms and concepts in bilingual memory. The basic idea behind the CFM is that translation-equivalent words in a bilingual’s first language (L1) and in his/her second language (L2) may share certain conceptual aspects, but that other aspects of meaning may be unique to either the L1 word or the L2 word. In accordance with the CFM, a number of studies have found that respondents have both identical and different associations with translation-equivalent L1 and L2 words (Kolers 1963; Taylor 1976; Van Hell and De Groot 1998), the assumption being that a word association task reflects conceptual processing (De Groot 1989; Van Hell and De Groot 1998), and thus reveals the conceptual features linked to a particular word.
Figure 1 illustrates the CFM with a hypothetical example, adapted from Luna and Peracchio (2002: 460), in which the translation-equivalent English and Spanish words friend and amigo share the conceptual features ‘honesty’ and ‘toys’, while the conceptual features ‘McDonalds’, ‘cycle’ and ‘love’ are unique to the English word and the conceptual feature ‘male’ is unique to the Spanish word.
Fig. 1: A hypothetical example of conceptual features shared by and unique to the English word friend and its Spanish translation equivalent amigo (adapted from Luna and Peracchio 2002: 460)
Since the CFM specifies that words in a bilingual’s L1 and L2 may be linked to shared and unique conceptual features, the first research question for this study is to what extent loanwords, a specific form of L2 words (in this case English loanwords in the domain of job ads), have the same conceptual features as their equivalents in the recipient language, the bilingual’s L1 (in this case Dutch). Answering this question is an important addition to knowledge in the field of loanword studies, particularly concerning the perception of loanwords compared to their equivalents in the recipient language. In view of the link between underlying concepts and associations posited in the CFM, we aim to investigate this question by studying the extent to which English loanwords from Dutch job ads evoke the same associations as their Dutch equivalents. The first research question, therefore, is:
RQ1: To what extent do English loanwords from Dutch job ads evoke the same associations as their Dutch equivalents?
1.2 Cognateness and abstractness as predictors of association overlap
Two factors that have been shown to determine the degree of overlap in associations between translation-equivalent words in two languages are their degree of cognateness and abstractness. Studies have found that concrete words were translated faster than abstract words, and that cognates were translated faster than non-cognates (e.g. De Groot 1992b; for an overview of such studies, see Kroll and De Groot 1997; Van Hell and De Groot 1998). In these studies, concrete and abstract words were operationalized as words with high versus low imageability, respectively (De Groot 1992b:
1002), and cognates were defined as “translation pairs in which the words are similar in sound and spelling” (Van Hell and De Groot 1998: 193). Examples of concrete words are the English/Dutch translation pairs tree/boom and apple/appel, while the English/Dutch translation pairs duty/plicht and quality/kwaliteit are examples of abstract words (Van Hell and De Groot 1998: 210). Examples of cognates that Van Hell and De Groot (1998: 210) provide are the English/Dutch translation pairs insight/inzicht and shoulder/schouder, while favor/gunst and bottle/fles are examples of non-cognate English/Dutch translation pairs. The CFM takes the differences in translation times found between translation pairs belonging to these different categories as evidence that concrete and cognate translation-equivalent words share more conceptual features than do abstract and non-cognate translation-equivalent words. As an explanation for the larger similarity between the associations with translation-equivalent cognates than non-cognates, Van Hell and De Groot (1998: 194) suggest that the similarity in form between cognates may lead language learners to “simply map the to-be-learned L2 word onto the existing conceptual representation of its translation in the native language”, whereas for non-cognates the dissimilarity in form does not invite them to do so. One possible explanation which Van Hell and De Groot offer for the larger overlap in associations between translation pairs of concrete words than between translation pairs of abstract words is that the meaning of abstract words may depend more on their linguistic context, and is therefore more likely to be language specific. In contrast, as Kroll and De Groot (1997: 187) point out, “[b]ecause concrete words refer to perceptual referents that are, for the most part, shared across languages, they will access similar or identical subsets of conceptual features, regardless of the language in which they are presented”. 
Van Hell and De Groot (1998) compared participants’ associations with translation-equivalent L1 and L2 words to the associations they gave at different times for the same word in the same language. They found that, although overlap between the associations with the same words in the same language was relatively small, it was larger than that between translation pairs, which provides some evidence that associations with words may be language specific. In order to shed light on possible underlying factors that may explain the degree to which loanwords and their equivalents in the recipient language differ in meaning, the current study aims to establish whether cognateness and concreteness can also explain the degree of overlap between the specific kinds of translation-equivalent L1 and L2 words under study here, i.e. English loanwords in the domain of job ads and their Dutch equivalents. The second research question, therefore, is:
RQ2: To what extent can the degree of overlap in associations between English loanwords from Dutch job ads and their Dutch equivalents be explained by the degree to which they are cognates and the degree to which they are concrete?
1.3 Contribution of the current study
By investigating RQ1 and RQ2, the current study answers the call by Luna and Peracchio (2002) to apply the CFM to language choice in advertising, in this case job advertising. It adds to the field of loanword studies by applying a psycholinguistic model to the study of loanwords and their native-language equivalents. It is also innovative in the field of loanword studies in that it uses an experimental approach to elicit associations with loanwords and their native-language counterparts. It is similar to studies in Cognitive Sociolinguistics in its interest in a statistical approach to variation in meaning as perceived by members of a speech community (cf. Geeraerts, Kristiansen, and Peirsman 2010; Kristiansen and Dirven 2008), in this case the meaning of English loanwords from job ads and that of their Dutch equivalents as perceived by Dutch people. However, where most studies in the field of Cognitive Sociolinguistics use corpus analyses to study variation (for an example relating to loanwords, see Zenner, Speelman, and Geeraerts 2012), the current study takes a psycholinguistic experimental approach.
2 Method
Participants were asked to write down associations with English words and their Dutch translation equivalents, in two sessions separated by a six-week interval so as to prevent a repetition effect between sessions. In order to have a baseline against which the overlap in associations with Dutch-English word pairs from job ads could be measured, three types of English and Dutch equivalent word pairs were included in the experiment: (1) English words from Dutch job ads and their Dutch equivalents, (2) English/Dutch word pairs which on the basis of an earlier psycholinguistic study could be expected to evoke relatively few overlapping associations, and (3) English/Dutch word pairs which, on the basis of that same study, could be expected to evoke a relatively large proportion of overlapping associations. For each word, participants were asked to write down the first three associations they thought of. For each participant, the degree of overlap in each word pair was calculated.
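The per-participant overlap computation described above can be illustrated with a minimal sketch. The scoring rule is not spelled out at this point in the text, so everything below is an assumption made for illustration: the function names are hypothetical, associations are assumed to have been made comparable across languages beforehand (for instance by translating them into one language), matching is by exact string after lowercasing, and the overlap is expressed as the number of shared associations divided by the size of the larger response set.

```python
def association_overlap(english_assocs, dutch_assocs):
    """Proportion of associations shared by the two members of a word pair.

    Assumption: both lists are already expressed in a common language,
    so that identical concepts appear as identical strings.
    """
    english = {a.strip().lower() for a in english_assocs}
    dutch = {a.strip().lower() for a in dutch_assocs}
    if not english or not dutch:
        return 0.0  # no responses for one word: no measurable overlap
    shared = english & dutch
    return len(shared) / max(len(english), len(dutch))


def mean_overlap(word_pairs):
    """Average overlap over a participant's (english, dutch) response pairs."""
    scores = [association_overlap(eng, dut) for eng, dut in word_pairs]
    return sum(scores) / len(scores)


# Hypothetical responses of one participant for one word pair
# (three associations per word, as in the experiment).
example = association_overlap(
    ["money", "career", "work"],
    ["Work", "money ", "boss"],
)
print(round(example, 2))
```

Under these assumptions, two shared associations out of three give an overlap of two thirds for the word pair; other scoring rules (for example weighting by response order) would give different values.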
2.1 Material
2.1.1 Job ad word pairs
The first type of word pairs consisted of English words from Dutch job ads and their Dutch equivalents. Thirty English words were chosen from corpora of Dutch job ads aimed at highly educated readers: 119 job ads published in the national quality newspaper de Volkskrant in August 2001 (Korzilius, Van Meurs, and Hermans 2006), 113 job ads published on the Dutch job site Monsterboard in February 2004 (Van Meurs,
Korzilius, and Den Hollander 2006b), and 33 job ads from Carp (June 2006), nrc.next (July 2006) and de Volkskrant Banen (April 2006). These words were selected from the English words that occurred at least twice in these corpora. Words were considered English when they were listed as entries in a bilingual English-Dutch dictionary (Martin and Tops 2002), in the Longman Dictionary of English Language and Culture (Gadsby 1998), or in a monolingual Dutch dictionary (Den Boon and Geeraerts 2005) with the indication that the word was of English origin (with the abbreviation Eng ). In order to be considered for inclusion in the experiment, a word had to be unambiguously identifiable as an English word. Consequently, a word such as ICT was discarded, because it could be either a Dutch word (an abbreviation for Informatie- en Communicatietechnologie ) or an English word (an abbreviation for Information and Communication Technology ). The English words that were ultimately included in the experiment were those English words for which Dutch equivalents were available. These equivalents were found in a bilingual English-Dutch dictionary (Martin and Tops 2002), a dictionary of English job titles with Dutch translations (Schreiner 1990), and the EU terminological translation database InterActive Terminology for Europe (http://iate.europa.eu). Three criteria were used for selecting the Dutch equivalents of the English words from the job ads. Firstly, the Dutch equivalent had to be one word; one exception was made for non-profit where the only Dutch translation consists of two words (zonder winstoogmerk ). Secondly, the English back-translation of the Dutch equivalent of the original English term had to be the same as the original English word. For example, the English term part time and the Dutch deeltijd were considered good equivalents, because the only possible English back-translation of deeltijd is part time. 
Control, for instance, did not meet this criterion because one possible Dutch equivalent of control (as in quality control) is controle, which can also be back-translated as check or verification. Thirdly, the Dutch equivalent should not be or include a word of English origin. For instance, a possible Dutch translation of team player is teamspeler, which includes the English word team, and therefore did not meet this last criterion. Two pretests were conducted to test the equivalence of the English words and their Dutch translations: an evaluation of translation equivalence and a back-translation check. In the first pretest, the quality of the translations was evaluated by seven raters who all had wide experience in using English in their academic studies and in their work, using a five-point semantic differential scale (1 = very poor; 5 = very good). In a second pretest, the Dutch translations were back-translated into English by two Dutch near-native speakers of English who both were teachers of English language proficiency at Dutch universities. A word was only included in the main experiment if the quality of a translation was evaluated with a score of 3.5 or higher in the first pretest, and if at least one of the back-translations resulted in the original English word in the second pretest.
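The inclusion rule based on the two pretests can be sketched as follows. This is only an illustration and not the authors' own procedure in code: the function name, the example ratings and the example back-translations are hypothetical, and the sketch assumes that the score of 3.5 refers to the mean of the seven raters' scores.

```python
from statistics import mean


def include_pair(english, ratings, back_translations, threshold=3.5):
    """Return True if a word pair passes both pretests.

    ratings: scores on the five-point scale (1 = very poor, 5 = very good).
    back_translations: English back-translations of the Dutch equivalent.
    Assumption: the pretest score is the mean of the raters' scores.
    """
    good_translation = mean(ratings) >= threshold
    normalized = {bt.strip().lower() for bt in back_translations}
    back_translated = english.strip().lower() in normalized
    return good_translation and back_translated


# Hypothetical ratings and back-translations, for illustration only.
print(include_pair("retail", [4, 4, 5, 3, 4, 4, 5], ["retail", "retail trade"]))
print(include_pair("control", [3, 3, 4, 3, 3, 4, 3], ["check", "verification"]))
```

Both conditions have to hold, so a well-rated translation is still excluded when neither back-translator returns the original English word.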
Frank van Meurs, Jos Hornikx and Gerben Bossenbroek
Some examples of the English-Dutch word pairs resulting from this process are sales manager/verkoopleider, fulltime/voltijd, human resources/personeelszaken, job/baan and retail/detailhandel.
2.1.2 Word pairs with little expected association overlap
The second type of word pairs that were included in the experiment were twenty-nine English-Dutch word pairs which were expected to evoke few overlapping associations on the basis of the study conducted by Van Hell and De Groot (1998). Van Hell and De Groot (1998) do not provide exact overlap scores for the word pairs they included in their experiment, but they indicate to which category each word pair belongs: abstract/concrete, cognate/non-cognate. In the current study, all word pairs were chosen that they categorized as both abstract and non-cognate, supplemented with other abstract word pairs to arrive at twenty-nine word pairs. Examples of word pairs with little expected association overlap include duty/plicht, memory/geheugen, guess/raden, and principle/principe.
2.1.3 Word pairs with much expected association overlap
The third type of word pairs included in the experiment were twenty-nine English-Dutch word pairs which on the basis of Van Hell and De Groot (1998) could be expected to evoke a large proportion of overlapping associations. Consequently, word pairs were chosen that Van Hell and De Groot categorized as both concrete and cognate, supplemented with other concrete word pairs to arrive at twenty-nine word pairs. Examples of word pairs with much expected association overlap are shoulder/schouder, apple/appel, climb/klimmen, and gun/geweer. The full list of the word pairs of all three types of the English-Dutch word pairs used in the experiment can be found in Appendix A.
2.2 Participants
In all, 60 Dutch university students or graduates took part in the experiment (68% female). Their mean age was 24.04 years (SD = 2.68), with a minimum age of 20 and a maximum of 31. All participants spoke Dutch as their first language and all indicated that they spoke English in addition to their mother tongue. Their university background and mastery of English as a foreign language were comparable to those of participants in earlier RHM experiments (Kroll et al. 2010). Participants were paid 10 Euros after having completed the second word association test.
2.3 Research design and instrumentation
The experiment had a within-subject design. Each participant wrote down associations with all English words and with their Dutch equivalents, in two separate sessions with a six-week interval between them. Half of the participants were presented with the Dutch words in the first session and with the English words in the second session, and vice versa for the other half of the participants. The words originating from the three categories were presented in five different sequences, generated by the Random Sequence Generator utility (http://www.random.org/sequences). Each individual participant was presented with the same sequence for the Dutch and the English words in the two sessions. This ensured that if associations with a word influenced the associations with the following word, the preceding word was the same for both the English loanword and its Dutch equivalent and therefore the carryover effect was kept constant. The participants were asked to write down three associations for each word. To this end, underneath each word three lines were printed in the questionnaire. Psycholinguistic research indicates that frequency of occurrence can be an important factor in determining the associations evoked by L1 and L2 words (De Groot 1989: 825–826). The frequency of the English and Dutch words used in the experiment was determined on the basis of corpora of English and Dutch newspapers, respectively. The English corpus was the Reuters RCV1 corpus (see Lewis et al. 2004), consisting of over 130 million words, and the Dutch corpus was a selection of exactly the same number of words from the Twente News Corpus (see Twente News Corpus n.d.). The frequency that was determined was that of the exact word forms as used in the experiment.
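The counterbalanced design described above can be sketched in a few lines. This is a hypothetical reconstruction for illustration: the sequences in the study came from the Random Sequence Generator utility, whereas the sketch uses Python's random module, and the round-robin assignment of participants to sequences, the seed and all names are assumptions that the text does not specify.

```python
import random

N_PAIRS = 88  # 30 job ad pairs + 29 low-overlap pairs + 29 high-overlap pairs


def build_design(n_participants=60, n_sequences=5, seed=2014):
    """Create presentation sequences and a counterbalanced session plan."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    sequences = []
    for _ in range(n_sequences):
        order = list(range(N_PAIRS))  # word pairs are referred to by index
        rng.shuffle(order)
        sequences.append(order)

    design = []
    for pid in range(n_participants):
        # Half of the participants start with Dutch, the other half with English.
        first = "Dutch" if pid % 2 == 0 else "English"
        second = "English" if first == "Dutch" else "Dutch"
        design.append({
            "participant": pid,
            # One sequence index per participant: the same word order is
            # used in both sessions, which keeps carryover effects constant.
            "sequence": pid % n_sequences,
            "sessions": (first, second),
        })
    return sequences, design


sequences, design = build_design()
dutch_first = sum(1 for row in design if row["sessions"][0] == "Dutch")
print(dutch_first, "of", len(design), "participants start with the Dutch words")
```

Storing a single sequence index per participant, rather than one per session, is what guarantees that the preceding word is identical for an English loanword and its Dutch equivalent.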
2.4 Analysis of overlap
The following six criteria were used to determine whether associations with a word pair were the same between two sessions. Two associations were considered the same if
(1) the associations were literally identical, e.g. pop and vlok in the associations with snow and sneeuw respectively: sneeuw/pop/vlok (‘snow/man/flake’) and wit/vlok/pop (‘white/flake/man’)
(2) the associations were the singular and plural form of the same word, e.g. kogel – kogels (‘bullet – bullets’)
(3) the associations were the regular and the diminutive form of a word, e.g. ei – eitje (‘egg – little egg’)
(4) the associations were a verb and its corresponding noun, e.g. schot – schieten (‘shot – shoot’)
(5) the associations were the infinitive and another form of the same verb, e.g. mislukken – mislukt (‘fail – failed’)
(6) the associations were synonyms, e.g. cijfer – getal (‘figure – number’), pc – computer, snappen – begrijpen (‘grasp – understand’), dokter – arts (‘doctor – physician’)
The degree of overlap was determined by counting the number of overlapping associations and dividing them by the highest number of possible overlapping associations. If one association in one session overlapped with an association in the other session, the number of overlapping associations was counted as two. The highest number of possible overlapping associations was the highest number of associations given in either session multiplied by two. For instance, one participant wrote down three associations for the English word retail: winkel, producten, boodschap (‘shop, products, shopping’). For the corresponding Dutch word detailhandel, s/he wrote down two associations: klanten, producten (‘customers, products’). The number of overlapping associations was 2 (producten/producten). This was divided by the highest number of possible overlapping associations, which was 3 (the number of associations given in the first session) times 2, which equals 6. Consequently, the overlap was calculated to be 2/6 = 33.33%. In order to determine inter-rater reliability, a second, independent rater calculated overlap for a random selection of 10% of the questionnaires (six questionnaires), as is recommended for social science research (Neuendorf 2002: 158). Cohen’s κ values ranged from 0.79 to 0.97 (M = 0.88; SD = 0.06), which can be qualified as good or excellent (Neuendorf 2002: 143).
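The overlap calculation described above can be sketched as a small function. The sketch is illustrative rather than the authors' scoring procedure: by default it implements only criterion (1), literal identity, and the `same` parameter is a hypothetical hook through which the judgements required by criteria (2) to (6) could be supplied.

```python
def overlap_percentage(session_a, session_b, same=None):
    """Percentage of association overlap for one word pair and one participant.

    Every matching pair of associations counts as two overlapping
    associations; the maximum is twice the larger number of associations
    given in either session.
    """
    if same is None:
        # Criterion (1) only: literal identity, ignoring case and spacing.
        same = lambda x, y: x.strip().lower() == y.strip().lower()

    remaining = list(session_b)
    matches = 0
    for a in session_a:
        for b in remaining:
            if same(a, b):
                matches += 1
                remaining.remove(b)  # each association can match only once
                break

    possible = 2 * max(len(session_a), len(session_b))
    if possible == 0:
        return 0.0
    return 100 * (2 * matches) / possible


# The retail/detailhandel example from the text: 2 / 6.
retail = ["winkel", "producten", "boodschap"]
detailhandel = ["klanten", "producten"]
print(round(overlap_percentage(retail, detailhandel), 2))
```

Counting each match twice and doubling the denominator leaves the ratio unchanged, but it mirrors the description in the text, where an overlapping association is counted once for each session.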
2.5 Norming studies for cognateness and concreteness In order to determine the degrees of cognateness and concreteness of the words from the word pairs in the experiment, two norming studies were conducted with participants who were different from those who wrote down the associations. It was necessary to conduct these norming studies, since there were no cognateness and concreteness scores available for the job ad specific word pairs, and the exact scores for cognateness and concreteness for the other word pairs could not be retrieved from the study from which these word pairs were taken (Van Hell and De Groot 1998). The materials for the norming studies consisted of the same word pairs as those used in the main experiment. In the cognateness norming study, each English-Dutch word pair was presented in bold letters, with a scale underneath. The English word was always presented first. In the concreteness norming study, either the English words or their Dutch counterparts were presented, also in bold letters, with a scale under each word. In both norming studies, the words or word pairs were presented in the same random sequence, generated by the Random Sequence Generator utility. In total, 129 Dutch university students took part in the norming studies. Of these participants, 45 evaluated the concreteness of the English words, and 42 evaluated the
concreteness of the Dutch words. The degree of cognateness of the English and Dutch items in the word pairs was evaluated by 42 participants. The participants were on average 19.87 (SD = 1.83) years old, and 74.4% were female. The first language of all participants was Dutch. The participants took part on a voluntary basis, and were not given a reward. Filling in the questionnaires took less than 15 minutes. The scales used in the two norming studies were the same as those used in De Groot, Dannenburg, and Van Hell (1994), and in Van Hell and De Groot (1998). In the cognateness study, participants were asked to indicate how similar they thought the English and Dutch words in each word pair were on a seven-point scale, where 1 = very small similarity and 7 = very large similarity. Based on the instructions in the earlier studies, they were asked to base their evaluation on a combination of similarity in spelling and in sound. The exact wording of these instructions was based on those in Tokowicz et al. (2002: 444). In the concreteness norming study, participants were asked to rate how easily a word evoked a mental image, on a seven-point scale, where 1 = does not evoke an image at all and 7 = evokes an image very quickly. The instructions at the beginning of the questionnaire about concreteness, like those in the earlier studies, were taken from Paivio, Yuille, and Madigan (1968), in the Dutch translation by Van Loon-Vervoorn (1985).
2.6 Statistical procedure After association overlap for each word pair had been calculated for each participant, the average overlap for the three word groups (job ad specific words, words with an expected low degree of overlap and words with an expected high degree of overlap) was compared using a one-way ANOVA (RQ1). For RQ2, a regression analysis was employed to investigate the potential effect of cognateness and concreteness on association overlap for the job ad specific word pairs (there were no outliers, and the residuals followed a normal distribution). The cognateness scores for the word pairs were taken from the norming study (n = 42). The concreteness scores were based on the mean scores for the English and the Dutch word in a word pair (n = 97). In addition to these two potential predictors, a word pair’s log frequency was added as a control. Log frequency was computed on the basis of the sum of the frequencies of the English and Dutch word; the sum frequency was converted to a log10 frequency. The regression model with the three predictors was tested with 30 word pairs (which is the minimum number of observations, according to Field 2009: 10 times the number of predictors). It should be noted, however, that each word pair’s score on cognateness and concreteness was based on 42 and 97 (participant) observations, respectively, adding to the model’s robustness.
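The frequency control and the sample-size rule of thumb mentioned above can be illustrated with a short sketch. The frequencies are taken from Appendix A; the function names are hypothetical, and the code shows only the arithmetic, not the statistical package that was used for the analysis.

```python
import math


def log_frequency(f_english, f_dutch):
    """Log10 of the summed frequencies of the English and the Dutch word."""
    return math.log10(f_english + f_dutch)


def meets_rule_of_thumb(n_observations, n_predictors, per_predictor=10):
    """Check the rule of at least ten observations per predictor."""
    return n_observations >= per_predictor * n_predictors


# (F English, F Dutch) for three job ad word pairs from Appendix A.
pairs = {
    "Job/Baan": (12527, 14481),
    "Retail/Detailhandel": (15967, 2612),
    "Supply Chain/Toeleveringsketen": (88, 0),
}
for name, (f_en, f_nl) in pairs.items():
    print(name, round(log_frequency(f_en, f_nl), 2))

# Three predictors (cognateness, concreteness, log frequency), 30 word pairs.
print(meets_rule_of_thumb(30, 3))
```

Because the two frequencies are summed before the logarithm is taken, a pair such as supply chain/toeleveringsketen, whose Dutch member has a corpus frequency of 0, still receives a defined score.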
3 Results
3.1 RQ1
In order to address RQ1 regarding the extent to which English loanwords from Dutch job ads evoke the same associations as their Dutch equivalents, the essential indicator is the mean degree of overlap in associations. Results show that English loanwords taken from job ads and their Dutch equivalents had a mean association overlap of 22% (M = 21.61, SD = 8.39). Two other groups of word pairs were included in the study as baselines against which this mean can be compared: a group with low expected overlap, and a group with high expected overlap. The results demonstrate that the mean overlap of these three differed significantly: F(2, 85) = 13.30, p < .001, η² = .24. Post hoc Bonferroni tests further indicated that the job ad word pairs showed the same amount of overlap as the low-overlap group (M = 21.37, SD = 7.72) and a lower amount of overlap than the high-overlap group (M = 30.63, SD = 7.28). Table 1 gives the descriptive statistics for the three groups of word pairs.
Tab. 1: Degree of overlap, cognateness and concreteness of three groups of English/Dutch word pairs
group                     n    overlap M  overlap SD  cognateness M  cognateness SD  concreteness M  concreteness SD
Job ad word pairs         30   21.61      8.39        2.88           1.87            3.55            1.14
Low-overlap word pairs    29   21.37      7.72        3.35           1.97            3.50            1.18
High-overlap word pairs   29   30.63      7.28        4.13           1.89            6.04            0.75
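The one-way ANOVA on association overlap can be approximated from the group sizes, means and standard deviations in Table 1 alone. The following sketch is a plausibility check under the assumption that the reported SDs are sample standard deviations; it is not the authors' analysis, which was run on the underlying data, and the function name is hypothetical. Because the inputs are rounded, the result can only be expected to lie close to the reported F and effect size.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within, eta_squared).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from the sample SDs.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared


# Overlap column of Table 1: (n, M, SD) per group.
overlap_groups = [
    (30, 21.61, 8.39),  # job ad word pairs
    (29, 21.37, 7.72),  # low-overlap word pairs
    (29, 30.63, 7.28),  # high-overlap word pairs
]
f_value, df1, df2, eta_sq = anova_from_summary(overlap_groups)
print(f"F({df1}, {df2}) = {f_value:.2f}, eta squared = {eta_sq:.2f}")
```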
3.2 RQ2
In RQ2, the question of interest was to what extent the degree of association overlap between English and Dutch job ad word pairs could be explained by their degrees of cognateness and concreteness. An indication for the role of concreteness and cognateness in predicting the degree of association overlap can be found in the fact that the job-ad-specific word pairs were significantly less concrete (F(1, 57) = 96.89, p < .001, η² = .63) and significantly less cognate (F(1, 57) = 6.59, p < .05, η² = .10) than the word pairs for which a large degree of overlap was expected, and that the job-ad-specific word pairs did not differ significantly in these respects from the word pairs for which a small degree of overlap was expected (concreteness: F(1, 57) < 1; cognateness: F(1, 57) < 1). However, to answer RQ2 with a more stringent test, a regression analysis for the job ad word pairs was conducted with cognateness and concreteness as predictors, word frequency as a control predictor, and degree of overlap as the dependent variable. This
Tab. 2: Regression for the predictors of percentage of association overlap for job-ad-specific word pairs
predictor        B      SE     β     t      p
cognateness      2.54   0.64   .56   3.96   .001
concreteness     1.63   1.24   .22   1.32   .198
word frequency   0.98   1.31   .12   0.75   .459
model proved to be significant: F(3, 26) = 10.48, p < .001, adjusted R² = .50. Further analysis showed that the degree of overlap was significantly predicted by cognateness (β = .56, p < .01), but not by concreteness (β = .22, p = .20) or word frequency (β = .12, p = .46). This means that word pairs with a higher degree of cognateness had a higher degree of association overlap. Table 2 gives the results for the regression analysis, and Figure 2 is a visualization of these results. The frequencies, cognateness scores, concreteness scores, and percentages of association overlap for the individual words and word pairs can be found in Appendix A.
Fig. 2: Results of the regression analysis for percentage of association overlap for job-ad-specific word pairs (** p < .01)
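The t values in Table 2 are related to the unstandardized coefficients and their standard errors by t = B / SE. The sketch below recomputes them from the rounded values in the table as a consistency check; small deviations from the reported t values are to be expected because B and SE are themselves rounded. The variable and function names are hypothetical.

```python
def t_statistic(b, se):
    """t value of a regression coefficient: estimate divided by its SE."""
    return b / se


# (B, SE, reported t) per predictor, as given in Table 2.
table2 = {
    "cognateness": (2.54, 0.64, 3.96),
    "concreteness": (1.63, 1.24, 1.32),
    "word frequency": (0.98, 1.31, 0.75),
}
for predictor, (b, se, reported_t) in table2.items():
    recomputed = t_statistic(b, se)
    print(f"{predictor}: recomputed t = {recomputed:.2f}, reported t = {reported_t:.2f}")
```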
4 Conclusion and discussion The aim of this study was to determine the extent to which English loanwords from Dutch job ads evoke the same associations as their Dutch equivalents (RQ1) and the extent to which degree of overlap in associations between English loanwords from Dutch job ads and their Dutch equivalents can be explained by the degree to which they are cognates and the degree to which they are concrete (RQ2). In answer to RQ1, the present experimental investigation has shown that translation-equivalent job-ad-related English and Dutch words to a large extent do not evoke the same associations. These findings are in line with the Conceptual Feature Model (CFM), according to which translation-equivalent L1 and L2 words are linked to both
the same and different concepts in a bilingual’s mind (De Groot 1992b; Van Hell and De Groot 1998). On the basis of the assumption in the CFM that associations reveal the conceptual features linked to a particular word (De Groot 1989; Van Hell and De Groot 1998), it can, therefore, be concluded that the findings of the current study are in line with lexical gap theory as an explanation for the use of loanwords (e.g. Hock 1986: 408–409; Wetzler 2006: 27–28). That is, the English loanwords are linked to conceptual features which are to a large extent different from those of their Dutch equivalents. Whereas the lexical gap theory explains loanword use from the perspective of the loanword users, the current study has demonstrated that loanwords and their counterparts in the recipient language are perceived as having largely different associations, and hence largely different associative meanings, in the minds of those who read them, i.e. the receivers of the loanwords. This has been demonstrated to be the case even for word pairs in which a loanword has a translation equivalent in the recipient language. These findings do not imply that loanwords and their translation equivalents have completely different denotational meanings, but they would certainly appear to be linked to different concepts. In answer to RQ2, the current study has demonstrated that degree of cognateness explains differences in association overlap between English loanwords from Dutch job ads and their Dutch equivalents, which is in line with what is predicted by the CFM (Van Hell and De Groot 1998). Concreteness, however, was not found to predict degree of association overlap, which is inconsistent with earlier findings (see Van Hell and De Groot 1998). Two potential explanations for this discrepancy can be ruled out: concreteness scores were neither very low nor very high, and there was sufficient random variation around the mean. 
Word frequency was also not found to predict association overlap, which is consistent with the experimental studies in De Groot (1989). The present study has provided evidence that English loanwords from Dutch job advertisements and their Dutch translation equivalents have largely different associative meanings. While it has investigated to what extent these associations were the same, it has not explored how they differed. Future studies should investigate the nature of the associations evoked by loanwords and their equivalents in the recipient language, for instance the extent to which they are evaluative and their valence, i.e. the extent to which they are positive or negative (cf. Hornikx, Van Meurs, and Starren 2007). Another topic that should be explored in future research is how differences in associations between English loanwords and their Dutch equivalents affect the interpretation and evaluation of the job ad messages of which they are part (cf. Hornikx, Van Meurs, and Starren 2007). The findings of experimental studies which investigated the evaluation of English versus Dutch in job ads and in job titles suggest that factors that should be taken into account in such future research are text length and proportion of loanwords used in the text. Van Meurs et al. (2007) found that some English job titles were evaluated differently from their Dutch counterparts, while Van Meurs, Korzilius, and Den Hollander (2006a) and Van Meurs, Korzilius, and Hermans (2004) found that in entire job ads the use of English loanwords (and even of entire English
texts) did not result in evaluations that were different from the evaluations of equivalent fully Dutch job ads. It may be that differences in association have an impact on evaluation when the English loanwords are presented in concentrated form, such as in job titles in isolation, but not in larger texts, such as in entire job ads, because in larger texts the associations with the loanwords interact with the associations with all the other words. In conclusion, the present study aimed to contribute to loanword studies, and particularly to the study of English loanwords in the domain of job advertisements, by presenting a novel experimental approach to investigating differences in the perception of loanwords and their equivalents in the recipient language, which is based on the CFM. The current study has demonstrated that loanwords and their equivalents in the recipient language indeed behave in accordance with the CFM. English loanwords from Dutch job ads were found to evoke different associations than their Dutch translation equivalents, and differences in association overlap could be explained by differences in degree of cognateness. It is hoped that the CFM will prove to be a fruitful framework for future studies in the area of loanword studies.
Acknowledgements
We thank Antal van den Bosch, Albertine Bosselaar, Béryl Hilberink-Schulpen, Henk van Jaarsveld, Michael Snijders and two anonymous reviewers for their help and advice. We also thank Johan Oosterman and Mathijs Sanders for allowing us to ask their students to take part in the norming studies.
References
Brysbaert, Marc & Wouter Duyck. 2010. Is it time to leave behind the Revised Hierarchical Model of bilingual language processing after fifteen years of service? Bilingualism: Language and Cognition 13(3). 359–371.
De Groot, Annette M. B. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory and Cognition 15(5). 824–845.
De Groot, Annette M. B. 1992a. Bilingual lexical representation: A closer look at conceptual representations. In Ram Frost & Leonard Katz (eds.), Orthography, phonology, morphology, and meaning, 389–412. Amsterdam: Elsevier.
De Groot, Annette M. B. 1992b. Determinants of word translation. Journal of Experimental Psychology: Learning, Memory and Cognition 18(5). 1001–1018.
De Groot, Annette M. B., Lucia Dannenburg & Janet G. van Hell. 1994. Forward and backward word translation by bilinguals. Journal of Memory and Language 33(5). 600–629.
Den Boon, Ton & Dirk Geeraerts. 2005. Van Dale groot woordenboek van de Nederlandse taal [Van Dale large dictionary of the Dutch language]. Utrecht & Antwerpen: Van Dale Lexicografie.
Dufour, Robert & Judith F. Kroll. 1995. Matching words to concepts in two languages: A test of the concept mediation model of bilingual representation. Memory and Cognition 23(2). 166–180.
Field, Andy. 2009. Discovering statistics using SPSS (and sex and drugs and rock ‘n’ roll) (3rd ed.). London: Sage.
Fink, Hermann. 1977. “Texas-Look” und “Party-Bluse”: Assoziative Effekte von Englischem im Deutschen [“Texas-Look” and “Party-Bluse”: associative effects of English in German]. Wirkendes Wort 27(6). 394–402.
Gadsby, Adam. 1998. Longman dictionary of English language and culture. Harlow: Addison Wesley Longman.
Geeraerts, Dirk, Gitte Kristiansen & Yves Peirsman. 2010. Introduction. Advances in Cognitive Sociolinguistics. In Dirk Geeraerts, Gitte Kristiansen & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics, 1–19. Berlin & New York: Mouton de Gruyter.
Hassall, Tim, Elisabeth Titik Murtisari, Christine Donnelly & Jeff Wood. 2008. Attitudes to western loanwords in Indonesian. International Journal of the Sociology of Language 189. 55–84.
Hock, Hans Henrich. 1986. Principles of historical linguistics. Berlin & New York: Mouton de Gruyter.
Hornikx, Jos, Frank van Meurs & Marianne Starren. 2007. An empirical study of readers’ associations with multilingual advertising: The case of French, German, and Spanish in Dutch advertising. Journal of Multilingual and Multicultural Development 28(3). 204–219.
Kolers, Paul A. 1963. Interlingual word associations. Journal of Verbal Learning and Verbal Behavior 2. 291–300.
Korzilius, Hubert, Frank van Meurs & José Hermans. 2006. The use of English in job advertisements in a Dutch national newspaper - on what factors does it depend? In Rogier Crijns & Christian Burgers (eds.), Werbestrategien in Theorie und Praxis: Sprachliche Aspekte von deutschen und niederländischen Unternehmensdarstellungen und Werbekampagnen [Advertising strategies in theory and practice: linguistic aspects of German and Dutch company representations and advertising campaigns], 147–174. Tostedt: Attikon.
Kristiansen, Gitte & René Dirven. 2008. Introduction. Cognitive Sociolinguistics: Rationale, methods and scope. In Gitte Kristiansen & René Dirven (eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems, 1–17. Berlin & New York: Mouton de Gruyter.
Kroll, Judith F. & Annette M. B. de Groot. 1997. Lexical and conceptual memory in the bilingual: Mapping form to meaning in two languages. In Annette M. B. de Groot & Judith F. Kroll (eds.), Tutorials in bilingualism: Psycholinguistic perspectives, 169–200. Mahwah (NJ): Lawrence Erlbaum.
Kroll, Judith F., Janet G. van Hell, Natasha Tokowicz & David W. Green. 2010. The Revised Hierarchical Model: A critical review and assessment. Bilingualism: Language and Cognition 13(3). 373–381.
Lewis, David D., Yiming Yang, Tony G. Rose & Fan Li. 2004. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research 5. 361–397.
Luna, David & Laura A. Peracchio. 2002. Uncovering the cognitive duality of bilinguals through word association. Psychology and Marketing 19(6). 457–476.
Martin, Willy Jules Rogier & Guy August Juliaan Tops (eds.). 2002. Van Dale groot woordenboek Engels Nederlands [Van Dale large dictionary of English Dutch] [electronic version]. Utrecht & Antwerpen: Van Dale Lexicografie.
Neuendorf, Kimberley A. 2002. The content analysis guidebook. Thousand Oaks (CA): Sage Publications.
Paivio, Allan, John C. Yuille & Stephen A. Madigan. 1968. Concreteness, imagery and meaningfulness for 925 nouns. Journal of Experimental Psychology. Monograph Supplement 76(1). Part 2, 1–25.
Poplack, Shana, David Sankoff & Christopher Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26. 47–104.
Schreiner, N. A. F. M. 1990. Zogenaamd zogenoemd: Sollicitatiewoordenboek van functiebenamingen [So-called: application dictionary of job titles]. Baarn: Fontein.
Smith, Marilyn Chapnik. 1997. How do bilinguals access lexical information? In Annette M. B. de Groot & Judith F. Kroll (eds.), Tutorials in bilingualism: Psycholinguistic perspectives, 145–168. Mahwah (NJ): Lawrence Erlbaum.
Taylor, Insup. 1976. Similarity between French and English words – a factor to be considered in bilingual language behaviour? Journal of Psycholinguistic Research 5(1). 85–94.
Thøgersen, Jacob. 2004. Attitudes towards the English influx in the Nordic countries: A quantitative investigation. Nordic Journal of English Studies 3(2). 23–38.
Tokowicz, Natasha, Judith F. Kroll, Annette M. B. de Groot & Janet G. van Hell. 2002. Number-of-translation norms for Dutch-English translation pairs: A new tool for examining language production. Behavior Research Methods, Instruments, and Computers 34(3). 435–451.
Twente News Corpus. n.d. Twente News Corpus: A multifaceted Dutch news corpus. Retrieved 19 October 2012 from http://hmi.ewi.utwente.nl/TwNC
Van Hell, Janet G. & Annette M. B. de Groot. 1998. Conceptual representation in bilingual memory: Effects of concreteness and cognate status in word association. Bilingualism: Language and Cognition 1(3). 193–211.
Van Hout, Roeland & Pieter Muysken. 1994. Modeling lexical borrowability. Language Variation and Change 6. 39–62.
Van Loon-Vervoorn, Willemina Angenita. 1985. Voorstelbaarheidswaarden van Nederlandse woorden [Imageability values of Dutch words]. Lisse: Swets & Zeitlinger.
Van Meurs, Frank. 2010. English in job advertisements in the Netherlands: Reasons, use and effects. Utrecht: LOT.
Van Meurs, Frank, Hubert Korzilius & Adriënne den Hollander. 2006a. Testing the effect of a genre’s form on its target group. In Paul Gillaerts & Philip Shaw (eds.), The map and the landscape: Norms and practices in genre, 91–117. Bern: Peter Lang.
Van Meurs, Frank, Hubert Korzilius & Adriënne den Hollander. 2006b. The use of English in job advertisements on the Dutch job site Monsterboard.nl and factors on which it depends. ESP across Cultures 3. 103–123.
Van Meurs, Frank, Hubert Korzilius & José Hermans. 2004. The influence of the use of English in Dutch job advertisements: An experimental study into the effects on text evaluation, on attitudes towards the organization and the job, and on comprehension. ESP across Cultures 1. 93–110.
Van Meurs, Frank, Hubert Korzilius, Brigitte Planken & Steven Fairley. 2007. The effect of English job titles in job advertisements on Dutch respondents. World Englishes 26(2). 189–205.
Wetzler, Dagmar. 2006. Mit Hyperspeed ins Internet. Zur Funktion und zum Verständnis von Anglizismen in der Sprache der Werbung der Deutschen Telekom [With hyper speed on the Internet: about the function and comprehension of Anglicisms in the advertising language of Deutsche Telekom]. Frankfurt am Main: Peter Lang.
Zenner, Eline, Dirk Speelman & Dirk Geeraerts. 2012. Cognitive Sociolinguistics meets loanword research: Measuring variation in the success of anglicisms in Dutch. Cognitive Linguistics 23(4). 749–792.
Appendix A: English/Dutch word pairs used in the experiment
Job ad specific word pairs (F = Frequency; Concrete = Concreteness, Cognate = Cognateness)
English word / Dutch word             F English  F Dutch  Concrete  Cognate  Overlap
Account Manager/Relatiebeheerder      8          5        2.77      1.64     8.33
Assistant/Assistant                   2165       1169     4.29      6.33     33.89
Banking/Bankieren                     16837      784      3.97      5.19     39.17
Business Unit/Bedrijfsonderdeel       306        169      2.74      2.79     15.00
Client/Cliënt                         1899       167      3.92      6.43     28.89
Core Business/Kernactiviteit          756        203      2.15      1.79     13.61
Customer/Klant                        4552       5145     5.11      1.51     28.89
Development/Ontwikkeling              24684      12637    2.63      1.76     17.78
Financial/Financieel                  49310      5442     3.07      6.19     30.00
Full Time/Voltijd                     298        31       2.75      3.48     27.78
Group/Groep                           82462      26978    4.95      6.02     26.94
Human Resources/Personeelszaken       272        167      2.20      1.48     13.89
Industry/Industrie                    48000      8204     5.79      6.43     33.33
Insurance/Verzekering                 18606      2013     2.90      1.62     19.17
Job/Baan                              12527      14481    4.45      1.64     34.44
Non-Profit/Zonder Winstoogmerk        353        40       2.60      1.69     21.11
Office/Kantoor                        23012      5138     5.69      1.81     22.78
Part Time/Deeltijd                    216        388      2.40      2.93     31.39
People/Mensen                         72816      123136   6.16      1.69     19.72
Professionals/Vakmensen               891        127      3.93      1.62     16.11
Recruitment/Personeelswerving         341        24       2.80      1.60     13.06
Representatives/Vertegenwoordigers    4550       3007     3.28      1.52     11.94
Retail/Detailhandel                   15967      2612     2.55      2.64     19.17
Sales Manager/Verkoopleider           423        33       3.51      1.69     18.33
Services/Diensten                     1539       2210     2.92      1.88     10.28
Solutions/Oplossingen                 29761      5641     2.80      1.79     17.22
Supply Chain/Toeleveringsketen        88         0        2.30      1.64     10.83
Support/Ondersteuning                 39069      1528     3.46      1.93     14.44
Tax/Belasting                         52772      4570     3.54      1.60     22.78
Work/Werk                             30618      56008    4.87      5.98     28.05
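The descriptive statistics for the job ad word pairs can be recomputed from the overlap percentages listed above. The sketch below copies the thirty overlap values from this part of Appendix A and computes their mean and sample standard deviation; on the assumption that the published SD is the sample standard deviation, the results should come out close to the values reported for this group in Table 1. The variable name is hypothetical.

```python
from statistics import mean, stdev

# Overlap percentages of the 30 job ad specific word pairs, in table order.
job_ad_overlap = [
    8.33, 33.89, 39.17, 15.00, 28.89, 13.61, 28.89, 17.78, 30.00, 27.78,
    26.94, 13.89, 33.33, 19.17, 34.44, 21.11, 22.78, 31.39, 19.72, 16.11,
    13.06, 11.94, 19.17, 18.33, 10.28, 17.22, 10.83, 14.44, 22.78, 28.05,
]

print("n =", len(job_ad_overlap))
print("M =", round(mean(job_ad_overlap), 2))
print("SD =", round(stdev(job_ad_overlap), 2))  # sample SD (n - 1 in the denominator)
```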
Word pairs with little association overlap (based on Van Hell and De Groot 1998; F = Frequency; Concrete = Concreteness, Cognate = Cognateness)
English word / Dutch word             F English  F Dutch  Concrete  Cognate  Overlap
Admit/Toegeven                        1278       1714     2.50      1.43     11.39
Arrest/Arresteren                     5073       554      5.72      5.50     33.89
Attempt/Poging                        7174       7694     2.75      1.88     28.61
Chance/Kans                           8945       25670    2.61      4.57     21.11
Circle/Cirkel                         730        976      6.68      6.10     38.61
Conscience/Geweten                    289        2921     2.37      1.61     14.72
Dare/Durven                           203        2936     2.74      3.48     22.22
Demand/Eis                            48898      3752     2.75      1.48     11.11
Disturb/Storen                        162        490      3.93      1.74     15.55
Duty/Plicht                           3980       1686     3.30      1.52     16.39
Favor/Gunst                           5851       416      2.67      1.79     13.61
Figure/Figuur                         12290      2346     5.12      5.95     15.55
Forget/Vergeten                       769        8619     2.92      4.21     24.44
Guess/Raden                           1080       3996     2.98      1.64     31.67
Insight/Inzicht                       195        4106     2.53      5.48     13.89
Irritate/Irriteren                    181        194      4.64      5.57     23.05
Memory/Geheugen                       1846       2151     3.41      1.66     25.00
Metal/Metaal                          6779       826      5.94      6.10     21.11
Method/Methode                        1797       3103     2.55      5.98     31.94
Motive/Motief                         530        1156     2.53      5.76     23.89
Opportunity/Gelegenheid               6233       7473     2.47      1.40     11.39
Panic/Paniek                          1563       2721     5.23      5.83     30.56
Principle/Principe                    3271       4651     2.70      5.71     15.55
Promise/Beloven                       2400       1058     3.38      1.79     29.72
Refuse/Weigeren                       1095       2862     3.37      1.62     16.39
Revenge/Wraak                         607        1950     3.87      1.67     21.11
Succeed/Slagen                        2348       3898     3.45      2.57     10.00
Truth/Waarheid                        1646       4874     2.84      1.43     22.22
Understand/Begrijpen                  3133       5094     3.49      1.74     25.00
Word pairs with much association overlap (based on Van Hell and De Groot 1998; F = Frequency; Concrete = Concreteness, Cognate = Cognateness)

English word / Dutch word             F English  F Dutch  Concrete  Cognate  Overlap
Apple/Appel                                 293      790      6.78     6.29    26.11
Baptise/Dopen                                 1      159      5.00     1.50    28.89
Calculate/Rekenen                           720     6170      5.01     1.76    25.55
Captain/Kapitein                           3966     1157      6.33     5.56    39.44
Climb/Klimmen                              2666      921      5.83     4.48    35.00
Coffee/Koffie                             12475     4045      6.76     6.36    28.89
Cry/Huilen                                  452     1676      6.20     1.45    32.78
Daughter/Dochter                           2125    10278      6.06     5.07    34.44
Doctor/Dokter                              1899     2739      6.43     6.60    16.94
Finger/Vinger                               471     1867      6.75     6.21    39.44
Flower/Bloem                                337     1000      6.74     1.76    25.00
Frown/Fronsen                                21      134      5.33     5.05    23.06
Gun/Geweer                                 1799      896      6.51     2.76    32.22
Hospital/Ziekenhuis                        9096    10244      6.52     1.67    28.89
Listen/Luisteren                            586     3805      4.24     4.64    24.72
Mirror/Spiegel                              298     1544      6.55     1.67    31.39
Pepper/Peper                                215     2053      6.25     5.86    36.67
Police/Politie                            40521    43930      6.61     5.71    24.44
Potato/Aardappel                            252      496      6.74     1.52    20.83
Season/Seizoen                            16609    21437      5.09     5.24    41.11
Shoulder/Schouder                           883     2603      6.61     5.57    27.22
Sing/Zingen                                 174     4329      6.02     5.48    28.89
Slave/Slaaf                                 281      254      6.48     5.62    28.61
Sneeze/Niezen                                13       36      6.17     3.76    35.00
Snow/Sneeuw                                2256     2951      6.72     5.17    51.11
Steal/Stelen                                461     1040      5.26     5.17    22.78
Swim/Zwemmen                                174     2327      6.38     4.86    36.11
Throw/Gooien                               1210     3191      5.34     1.55    38.05
Tremble/Beven                                16      193      4.37     1.48    24.72
Astrid Rothe
On the variation of gender in nominal language mixings

Abstract: This paper investigates gender assignment in nominal language mixings. It analyzes German-French, German-Spanish and German-Italian mixings between determiners and nouns to determine whether there are any patterns in the gender assignment and which factors influence this process. Data from a survey given to approximately 400 students with multilingual backgrounds were analyzed statistically to examine the distribution of the assigned gender and to detect influencing factors. The results show systematic variation in the assigned gender. There are mainly two variants, one that is predominantly chosen by monolingual respondents and one that is mostly produced by bilingual respondents. Thus, speaker-related factors, such as language proficiency, seem to be important. Other detected factors concern the other-language noun: its gender transparency, its sociolinguistic integration and its biological gender.
1 Introduction

In the seventeenth century the German language took over the French noun fricassée, meaning a dish of pieces of meat. In a text from about 1670, cited in the revised edition of the Deutsche Wörterbuch, one finds the following:

(1) als eine fricassee in eine pfanne bringen
    as a fricassee into a pan put
    (Courtisan, hocus pocus, p. 284)
    ‘put into a pan as a fricassee’
    eine = indefinite feminine determiner

To further illustrate this noun the Deutsche Wörterbuch lists other examples, like the following:

(2) wenn man ein fricassee machen will
    if one a fricassee to make wants
    (vollständiges hannöver. kochb., p. 49)
    ‘if one wants to make a fricassee’
    ein = indefinite masculine/neuter determiner

(3) der säuerliche duft des frikassees hing […]
    (mädchenkrieg, p. 45)
    ‘the acidulous odor of the fricassee hung’
    des = definite masculine/neuter determiner, genitive
This indicates two things: first, the spelling of the noun apparently changed over time, and was assimilated to the German pattern by using a ⟨k⟩ instead of a ⟨c⟩. The lexical entry of this noun in a current dictionary is spelled as in the last example and with a capital letter: Frikassee (cf. e.g. the online dictionary of the Institut für Deutsche Sprache OWID, www.owid.de). Second, it shows that the gender of the noun has changed from feminine (in the first example), the original gender of the noun in French, to neuter (in the last two examples), as indicated by the accompanying article. This is corroborated by the Deutsche Fremdwörterbuch. The entry there specifies that the noun was feminine at first but was used as neuter as early as 1688. In current German, Frikassee is a completely integrated neuter noun¹ and is no longer recognized as a loanword. Thus at first, the gender of the noun was the same as that of the French noun (feminine), and then after some years it switched to neuter, which is the gender of the German equivalent noun Fleisch ‘meat’ (see other examples for French nouns with gender changes after being integrated into German in Brunt 1983 and Wawrzyniak 1985).

As another example, a German-Italian bilingual child was found to produce the following mixed utterance alternating between both languages:

(4) il schwanz (Cantone and Müller 2008: 820, as (47))
    the_DET.IT.MAS tail_N.D.MAS ²

The bilingual child apparently chooses masculine – the Italian determiner is a masculine one – in accordance with the gender of the German noun Schwanz, although the Italian equivalent noun coda is feminine. This pattern seems to be a systematic one as it is evident in several more comparable examples in language mixing data. These examples show that the language used may alternate between determiner and noun. In the course of this alternation gender needs to be assigned.
It is necessary, virtually without exception, to assign a gender when the other-language noun is used with a determiner in a language that also has gender. For this reason Eisenberg and Baurmann (1984) call gender the admission ticket for other-language nouns. Interestingly, gender assignment does not seem to be a straightforward process and hence merits further investigation. Language pairs such as the ones above – i.e. German and French or German and Italian – are well suited for this kind of investigation because there are various options for the alternation between determiner and noun regarding gender. Given that German has three genders (masculine, feminine and neuter) the French noun fricassée could be alternated either with a masculine article (5), a feminine one (6) or a neuter one (7):
1 The revised edition of the Deutsche Fremdwörterbuch does not list an entry for Frikassee anymore. Integrated and assimilated loanwords such as Frikassee do not meet the criteria for a listing.
2 Key to the abbreviations: DET = determiner, N = noun, D = German, F = French, IT = Italian, FEM = feminine, MAS = masculine, NEU = neuter.
(5) der fricassée
    the_DET.D.MAS fricassee_N.F.FEM

(6) die fricassée
    the_DET.D.FEM fricassee_N.F.FEM

(7) das fricassée
    the_DET.D.NEU fricassee_N.F.FEM

The same is true for the alternation between an Italian determiner and a German noun. The noun could be alternated with a masculine (8) or a feminine Italian determiner (9):

(8) il Schwanz
    the_DET.IT.MAS tail_N.D.MAS

(9) la Schwanz
    the_DET.IT.FEM tail_N.D.MAS

In addition to the choice of the determiner, the constellation of the genders in both languages (noun and translation equivalent) needs to be taken into account. There are nouns for which gender is the same as that of their respective translation equivalent (e.g. the French noun solution is feminine, its German translation equivalent Lösung is feminine; gender-congruent combination) as opposed to nouns whose gender is different (e.g. the German noun Schwanz is masculine, its Italian translation equivalent is feminine; gender-incongruent combination).

This kind of investigation is done in loanword research (e.g. Poplack, Pousada, and Sankoff 1982, Poplack and Sankoff 1984; Gregor 1983) as well as in codeswitching research (e.g. Gonzalez 2005; Cantone 2007). However, both of these approaches impose assumptions on the set of acceptable nominal alternations or mixings, i.e. combinations of determiner and noun. Furthermore, there is no consensus on the definitions of borrowing and codeswitching respectively (cf. e.g. Clyne 2003: 70). Therefore this paper takes a two-tiered approach applying variationist methodology (cf. Meechan and Poplack 1995, Poplack and Meechan 1998). First, nominal alternations are observed on the basis of elicited data without restricting the data. Afterwards the obtained results are interpreted with regard to loanword and codeswitching research. The advantage of this approach is that no possible noun and determiner combination is excluded beforehand.
The research questions underlying this analysis are the following: what gender is assigned to other-language nouns in language mixings; are there any patterns in the gender assignment, and what factors are relevant in this process? In the first section of this contribution, the topic of nominal language mixings and the relevant research questions are introduced. Secondly, the survey on which the investigation is based is presented, with a description of the respondents, the material used and the statistical analysis. Then, the general pattern of variation in the results is explained before presenting the results that show which factors influence
the gender assignment. Finally, the results are discussed in the context of the distinction between lexical borrowing and single-word codeswitching.
2 Methodology

To explore the gender assignment to other-language nouns a large set of data was collected. Three language pairs were targeted to study the mixings of determiner and noun: German-French, German-Spanish and German-Italian. These languages and combinations were chosen because of the way they exhibit grammatical gender. In German there are three genders – masculine, feminine and neuter – and the article agrees in gender with the noun that follows. German nouns are not gender transparent even though there are regularities between both a noun’s form and meaning and its gender (cf. Köpcke and Zubin 1984, 1995; e.g. certain suffixes correlate with genders). The three chosen Romance languages, French, Spanish and Italian, have two genders: masculine and feminine, and the article agrees with the noun that follows. The strength of the correlation between form and gender varies in these three languages. In French, Spanish and Italian many nouns are gender transparent through their ending (cf. Schwarze 1988; Teschner and Russell 1984; Tucker, Lambert, and Rigault 1977).

These languages were chosen because their nouns have grammatical gender, the articles in the languages agree with the noun in gender, and there are nouns for which the gender of the translation equivalent differs. For example:

(10) Schuh (German, masculine) – Italian translation equivalent scarpa, feminine [‘shoe’]
(11) canard (French, masculine) – German translation equivalent Ente, feminine [‘duck’]
(12) mesa (Spanish, feminine) – German translation equivalent Tisch, masculine [‘table’]

The German noun Schuh in (10) is masculine; its Italian translation equivalent scarpa, however, is feminine. The French noun canard in (11) has the same gender combination: it is masculine and its German translation equivalent is feminine. The Spanish noun mesa in (12) is also an example of a gender-incongruent combination; it is feminine whereas its German translation equivalent is masculine.
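The distinction between gender-congruent and gender-incongruent combinations can be made explicit in a small sketch. The class and function names below are illustrative assumptions and are not taken from the study; the noun pairs and their genders are the ones given in the examples of this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPair:
    """A tested noun together with its translation equivalent (hypothetical structure)."""
    noun: str
    language: str
    gender: str
    equivalent: str
    equivalent_gender: str


def combination(pair: NounPair) -> str:
    """Label a pair as gender-congruent or gender-incongruent."""
    if pair.gender == pair.equivalent_gender:
        return "gender-congruent"
    return "gender-incongruent"


# Noun pairs and genders as given in the chapter's examples.
PAIRS = [
    NounPair("Schuh", "German", "masculine", "scarpa", "feminine"),
    NounPair("canard", "French", "masculine", "Ente", "feminine"),
    NounPair("mesa", "Spanish", "feminine", "Tisch", "masculine"),
    NounPair("solution", "French", "feminine", "Lösung", "feminine"),
]

labels = {pair.noun: combination(pair) for pair in PAIRS}

for noun, label in labels.items():
    print(f"{noun}: {label}")
```

Only the comparison of the two gender values matters here; whether a category such as masculine is really the same across languages is a separate question that the sketch leaves open.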
To investigate which article is chosen in nominal mixings – the choice between der mesa, die mesa and das mesa – a survey was conducted by means of questionnaires for each language pair. More than 400 respondents with different language backgrounds participated in this survey. Of the returned questionnaires, 213 German-French questionnaires, 107 German-Spanish questionnaires and 76 German-Italian questionnaires entered the analysis. The data were then coded digitally and analyzed statistically.
The respondents were chosen with respect to their language proficiency. They all have at least one of the tested languages of each pair as their first language (L1) and the other as L1 or as their second language (L2).³ For each pair there is a subsample of bilinguals, respondents for whom both tested languages are first languages. The categorization of the respondents is based on the given sociolinguistic information.⁴ Table 1 at the end of the description of the participants lists the information on the subsamples.

The German-French dataset was collected in winter 2004/2005 and in 2007. For the first round German and French students from two international German-French schools in the surroundings of Paris and German and French students from a German-French university program at a Paris university were questioned (167 students, 111 female, 55 male, 1 no entry on gender, mean age: 16). The second round was conducted with German students who follow a French language course at the University of Cologne in Germany (46 students, 27 female, 19 male, mean age: 31). However, these German students are not enrolled in French language studies. Based on the given sociolinguistic information the respondents from these two rounds were divided into four subsamples according to their language proficiency. The respondents whose data were collected in France during the first round are divided into three subsamples: a bilingual subsample and two monolingual subsamples, consisting of French monolinguals (L1 = Fr) and German monolinguals (L1 = D). For the bilingual respondents both German and French are an L1 (D = L1, Fr = L1). The French monolingual subsample has French as L1 and German as L2 (Fr = L1, D = L2). This group consists of French students who go to a school or university with German-French bilingual programs, and their proficiency in German is relatively high. The German monolingual subsample has German as L1 and French as an early L2 (D = L1, Fr = early L2).
3 Instead of an L2 the language in question can as well be an L3 or an L4. The term L2 is preferred for this paper because the important distinction is whether the language in question is a native language (L1) or not. Languages that are not an L1, i.e. that are learned later, e.g. in school or at university, are considered as an L2 (foreign language) here without focusing on the order of acquisition, thus without further distinction between L3, L4 etc. It has not been tested whether the results differ for L2, L3 or L4 learners since the sample size would have been too small.
4 The definition of bilingual in this paper is a rather strict one. To be considered bilingual, a person has two first languages, in contrast to a person with only one first language and the other language being a second language. The latter is often labeled as monolingual. This strict definition is preferred to a wide sense definition according to which someone who speaks two languages to a certain degree is already categorized as a bilingual. The classification of the respondents as to their mono- or bilingual proficiency is based on the sociolinguistic information elicited in the questionnaire. This procedure is obviously not an exhaustive means to detect bilingual proficiency, as it is possible to become nearly bilingual in an L2 and having two L1s does not guarantee a bilingual proficiency. It is however a pragmatic way to classify respondents – even extensive and time-consuming testing is not able to detect the turning point at which monolingual proficiency turns into bilingual proficiency.

French is the language of their environment and of their school or university; thus, their proficiency in French is rather high. The respondents whose data were collected in Germany in 2007 during the second round are the fourth subsample of the German-French dataset. They all have German as L1 and French as a late foreign language (D = L1, Fr = late L2). German is the language of the environment and French is only taught once a week. Their proficiency in French is rather low compared to that of the German monolingual subsample collected in France, whose members have French as an early L2.

The data for the German-Spanish part were gathered in spring 2010 in Salamanca (Spain) amongst Spanish students in a German program, with further data taken from a master's thesis (Weiser 2008).⁵ The respondents from the Spanish university of Salamanca have Spanish as their L1, and German as a late L2 (Sp = L1, D = L2; 17 students, 13 female, 3 male, 1 no entry on gender, mean age: 22). Their proficiency in German is rather low. The three subsamples from the Weiser corpus, however, all have German as their L1. There is a bilingual subsample with both German and Spanish as L1 (D = L1, Sp = L1) and there are two monolingual subsamples with Spanish as an L2. They differ according to their proficiency in Spanish. The respondents from one of these subsamples have been learning Spanish for more than five years, are currently studying Spanish, or have immigrated to Spain. Their Spanish level is rather advanced (D = L1, Sp = early L2). The respondents of the second German monolingual subsample have been learning Spanish for less than five years (D = L1, Sp = late L2). The Spanish level of the first subsample (Sp = early L2) is higher than the level of the second (Sp = late L2).

The German-Italian dataset consists of two rounds. The first part was collected in November 2005 at an Italian school in Cologne (Germany; 44 students, 22 female, 22 male, mean age: 18).
The second part was gathered at the university of Cagliari in Sardinia (Italy) amongst Italian students of German in spring 2010 (32 students, 20 female, 10 male, 2 no entry on gender, mean age: 21). The students from Cologne are divided into two subsamples: a bilingual group and an Italian monolingual group. For the bilingual subsample both German and Italian are first languages (D = L1, It = L1). For the second subsample Italian is the first language and German an early learned second language (It = L1, D = early L2). The respondents of this subsample live in Germany, so the language of their environment is German, and attend a German-Italian school. Italian is the first language for the third subsample, the respondents from Cagliari. They study German, but their proficiency in German, especially compared to the other Italian monolingual subsample from Germany, is rather low (It = L1, D = late L2).

The following table summarizes the information on all the subsamples for the three tested language pairs: their language proficiency (the same abbreviations are
5 I thank Elisabeth Weiser for allowing me to analyze her data. Unfortunately, no information on age and gender is available for this subsample.
used in the following figures), the language of the nouns that were tested for each corresponding subsample, and their number (for the nouns see next paragraph).
Tab. 1: The tested subsamples per corpus (D = German, Fr = French, Sp = Spanish, It = Italian)

Corpus          Subsamples                       N   Language and total of tested nouns
German-French   D = L1, Fr = L1 (bilinguals)    70   D (32), Fr (32)
                D = L1, Fr = early L2           53   Fr (32)
                D = L1, Fr = late L2            46   Fr (64)
                Fr = L1, D = L2                 44   D (32)
German-Spanish  D = L1, Sp = L1 (bilinguals)    29   D (23), Sp (18)
                D = L1, Sp = early L2           30   Sp (18)
                D = L1, Sp = late L2            31   Sp (18)
                Sp = L1, D = L2                 17   D (23)
German-Italian  D = L1, It = L1 (bilinguals)    11   D (64)
                It = L1, D = early L2           33   D (64)
                It = L1, D = late L2            32   D (64)
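As a quick consistency check, the subsample sizes in Table 1 can be summed per corpus. The sketch below only restates the N column of the table; the dictionary layout and variable names are assumptions made for illustration. The per-corpus totals come out as the numbers of questionnaires that entered the analysis (213, 107 and 76).

```python
# Subsample sizes (N) as listed in Table 1.
SUBSAMPLES = {
    "German-French": {
        "D = L1, Fr = L1 (bilinguals)": 70,
        "D = L1, Fr = early L2": 53,
        "D = L1, Fr = late L2": 46,
        "Fr = L1, D = L2": 44,
    },
    "German-Spanish": {
        "D = L1, Sp = L1 (bilinguals)": 29,
        "D = L1, Sp = early L2": 30,
        "D = L1, Sp = late L2": 31,
        "Sp = L1, D = L2": 17,
    },
    "German-Italian": {
        "D = L1, It = L1 (bilinguals)": 11,
        "It = L1, D = early L2": 33,
        "It = L1, D = late L2": 32,
    },
}

# Respondents per corpus and in the whole analysis.
totals = {corpus: sum(groups.values()) for corpus, groups in SUBSAMPLES.items()}
overall = sum(totals.values())

for corpus, n in totals.items():
    print(f"{corpus}: {n}")
print(f"All corpora: {overall}")
```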
The paper-and-pencil questionnaire consisted of several parts: tasks on article elicitation and questions on the sociolinguistic background.⁶ The respondents received the questionnaire while they were in class. The instructions were mostly given in both languages of the respective pair. Care was taken that each respondent filled out the questionnaire without discussing it with others. The elicitation task in each questionnaire comprised a list of nouns from one language and a box with given articles from the other language. The assignment for the respondents was to select an article for each noun and write it in front of the noun. This task consisted of two parts, as there were lists with nouns from each of the two languages (a list with nouns from language A with a box containing articles from language B and a list with nouns from language B with a box containing articles from language A). The articles presented in the box are from the other language, so that the respondents had to produce nominal language mixings⁷ (e.g. for the noun mesa on the Spanish list the German-Spanish respondents could select from three German definite
6 The questionnaire also comprises an utterance judgment task. However, the results of this task are not analyzed here (for a description and an analysis of these see Rothe 2012).
7 As Haugen (1950: 212) points out for loanwords, “we shall rarely if ever be able to catch a speaker in the actual process of making an original borrowing, it is clear that every loan now current must at some time have appeared as an innovation. Only by isolating this initial leap of the pattern from one language to another can we clarify the process of borrowing.” The aim of this questionnaire is to catch the borrowing or more generally the alternation process where it happens by constraining respondents to utter alternations between determiner and noun.
articles der, die, das; they then produced either der mesa, die mesa or das mesa).⁸ The last column of Table 1 above shows the languages and the respective number of nouns tested for each subsample. For the German-French dataset three lists were evaluated: a list with 32 German nouns by the bilinguals (D = L1, Fr = L1) and the French monolinguals (Fr = L1, D = L2), a list with 32 French nouns by the bilinguals (D = L1, Fr = L1) and the German monolinguals with early L2-French (D = L1, Fr = early L2) and a list with 64 French nouns by the German monolinguals with late L2-French (D = L1, Fr = late L2). For the German-Spanish dataset two lists were tested: a list with 23 German nouns for the bilinguals (D = L1, Sp = L1) and the Spanish monolinguals (Sp = L1, D = L2) and a list with 18 Spanish nouns for the bilinguals (D = L1, Sp = L1) and both German monolingual subsamples (D = L1, Sp = early L2 and D = L1, Sp = late L2). For the German-Italian dataset one list with 64 German nouns was analyzed for all three subsamples.⁹ The nouns analyzed in this paper are listed in tables in the appendix below. For the exhaustive lists of all 233 nouns that were tested, see Rothe (2012: 105–106, 116, 125–126, 140–141, 151–152 and 165–166).

The lists consist of nouns from each possible combination of gender, i.e. the noun’s gender and the gender of the article of its translation equivalent in the language. There are gender-congruent combinations where the gender of the other-language noun and the gender of its translation equivalent are the same (e.g. masculine noun – masculine translation equivalent)¹⁰ and gender-incongruent combinations where both genders differ (e.g. masculine noun – feminine translation equivalent). Several nouns were chosen for each combination and then presented in a randomized order to counterbalance any effects of the nouns’ combination of gender.
By using the questionnaire to elicit data on a targeted phenomenon a large set of data becomes available which can be used to systematically analyze the effect of certain factors. The articles written down by the respondents were coded digitally with respect to their gender and then analyzed statistically. Two aspects were tested by means of Chi-square tests: the distribution of the articles, i.e. which gender was assigned, and whether there is a correlation between the assigned gender and the respondents’ language proficiency. Chi-square tests were chosen because they are appropriate for categorical data. The distribution of the assigned gender is tested with a Chi-square test for goodness-of-fit, thus testing whether there is a preference for one of the nominal variable’s values in the distribution of the frequencies, such as whether
8 The chosen articles were not analyzed systematically for all the respondents in the datasets, except for the bilinguals. To be analyzed the other-language noun that is tested needs to be from the L2 of the respondents (see Table 1).
9 The data were gathered between 2004 and 2010 in the context of a PhD thesis, and were improved and adapted throughout the study. For the German-Spanish and the German-Italian datasets for example another test condition was added where the nouns were presented with a translation equivalent.
10 See Sebba (1998, 2009) and Deuchar (2005) for a discussion on the assumption that categories are the same across languages, e.g. masculine in French and masculine in German.
there is a preference for a particular gender. The Chi-square test of independence tests whether the two nominal variables (assigned gender, language proficiency) are independent. Unfortunately one cannot perform a Chi-square test for the totality of the nouns from a certain combination because one of the conditions of the Chi-square test is the independence of the observations. Therefore the tested frequencies can only contain one observation per respondent (cf. Field 2009: 691; Larson-Hall 2010: 209). That is why the Chi-square tests needed to be computed for each single noun and for each subsample. Then, in a second step cross-tabulations and Chi-square tests of independence were computed for each noun regarding the variables (gender and language proficiency). However, the large amount of data allows for a thorough analysis of these variables. The Chi-square tests display highly significant results with considerable effect sizes (for the frequencies and the results of the Chi-square tests see the tables in the appendix of Rothe 2012: 255–285). The nouns and results chosen for this contribution all have highly significant results.
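The two tests described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the analysis script of the study: the function names are hypothetical, the counts are invented for illustration only, the p-value uses closed forms that hold only for one or two degrees of freedom, and Cramér's V stands in as one common effect size measure. Each table contains one observation per respondent for a single noun, in line with the independence condition.

```python
import math


def chi_square_gof(observed):
    """Goodness-of-fit against equal expected frequencies; returns (statistic, df)."""
    n = sum(observed)
    expected = n / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def chi_square_independence(table):
    """Test of independence for a cross-tabulation; returns (statistic, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def cramers_v(stat, n, n_rows, n_cols):
    """Effect size for a cross-tabulation."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


# Invented counts for ONE noun in ONE subsample: [masculine, feminine] articles.
gof_stat, gof_df = chi_square_gof([30, 3])
gof_p = p_value(gof_stat, gof_df)

# Invented cross-tabulation for ONE noun: rows = two proficiency groups,
# columns = [masculine, feminine] articles.
table = [[10, 22], [28, 5]]
ind_stat, ind_df = chi_square_independence(table)
ind_p = p_value(ind_stat, ind_df)
effect = cramers_v(ind_stat, sum(map(sum, table)), len(table), len(table[0]))

print(f"goodness-of-fit: chi2 = {gof_stat:.2f}, df = {gof_df}, p = {gof_p:.2g}")
print(f"independence: chi2 = {ind_stat:.2f}, df = {ind_df}, p = {ind_p:.2g}, V = {effect:.2f}")
```

Running one test per noun and per subsample, as described in the text, keeps the observations independent, at the cost of a large number of separate tests.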
3 Results

3.1 Variation in the data

The corpus thus consists of a large set of elicited data of nominal language mixings. The obtained data show a fair amount of variation regarding the selected gender for the article. Nonetheless, the analysis reveals certain patterns. First of all, the language proficiency of the respondents is an important factor: monolingual and bilingual respondents tend to produce different variants of nominal language mixings regarding the assigned gender. This will be shown in Section 3.2.1. Other influencing factors such as the noun’s gender transparency, its sociolinguistic integration and its biological gender will be presented subsequently (Sections 3.2.2 to 3.2.4). These results will be elucidated with exemplary results from the data. It is not the aim of this contribution to exhaustively analyze all the data but to provide representative results that give a general idea of the data and the results (for an exhaustive overview of all results cf. Rothe 2012).

Subsequently, some exemplary results will be analyzed to illustrate this variation, starting with examples from the German-Italian dataset. The respondents of this set had to produce mixed noun phrases with an Italian article and a given German noun. They had the choice between Italian masculine articles like il, lo, uno, un¹¹ and Italian feminine articles like la, una, un’.¹² Figure 1 displays the results from the German-Italian dataset for all the tested German masculine nouns from all respondents of the German-Italian dataset (the tested nouns are listed in Appendix A in Table 2). The pair of bars on the left represents the results for the gender-congruent nouns (German masculine nouns with a masculine Italian translation equivalent) and the pair of bars on the right represents the results for the gender-incongruent nouns (German masculine nouns with a feminine Italian translation equivalent). The bars show the relative amount of elicited masculine and feminine Italian articles for all the German masculine nouns that were tested.

Fig. 1: Elicited Italian articles for German masculine nouns, all German-Italian respondents (in percent) It. = Italian, trans. equiv. = translation equivalent, FEM = feminine, MAS = masculine

Figure 1 shows a clear difference between the two sets of nouns. For the gender-congruent nouns there is, not very surprisingly, a clear choice for one gender. For the gender-incongruent nouns the choice for the article’s gender is less clear. There is variation between masculine and feminine articles with a majority of masculine articles. However, 35% of the respondents name a feminine Italian article. This reveals the effect of the translation equivalent’s gender, which in this case is feminine.

The same pattern is visible for the German masculine nouns in the German-French and in the German-Spanish dataset. Figure 2 shows the elicited French articles on the left of the figure (masculine: le, un; feminine: la, une; the tested nouns are listed in the appendix in Table 3; respondents: bilinguals D = L1, Fr = L1, French monolinguals Fr = L1, D = L2) and the Spanish ones on the right (only definite articles were proposed here: masculine: el; feminine: la; the tested nouns are listed in the appendix in Table 4; respondents: bilinguals D = L1, Sp = L1, Spanish monolinguals Sp = L1, D = L2).

Fig. 2: Elicited French and Spanish articles for German masculine nouns, all German-French respondents on the left and all German-Spanish respondents on the right (in percent) Fr. = French, Sp. = Spanish, trans. equiv. = translation equivalent, FEM = feminine, MAS = masculine

While there is no variation for the gender-congruent nouns – nearly all the respondents in both datasets choose masculine articles – there is a considerable amount of feminine articles for the gender-incongruent nouns. However, in the German-French dataset the proportion of feminine articles is considerably lower than in the German-Spanish dataset.

Thus, in general the variation pattern for the choice of the article’s gender is similar for all three language pairs. Taking the responses of all the informants together, without splitting them into subsamples, the results are the following: for gender-congruent nouns almost all elicited articles are of the same gender as the other-language noun, whereas for the gender-incongruent nouns the elicited articles vary between the gender of the other-language noun and the gender of its translation equivalent. This can be examined in further detail by looking more closely at factors like language proficiency (Section 3.2.1), the noun’s gender transparency (Section 3.2.2), its sociolinguistic integration (Section 3.2.3) and its biological gender (Section 3.2.4).

11 Nevertheless, some of the respondents wrote l’/lo or lo/il. These double entries were analyzed as masculine articles.
12 Nevertheless, some of the respondents wrote l’/la, which was analyzed as a feminine article.
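The percentages plotted in figures of this kind can be derived from the coded articles by counting the assigned gender within each noun condition. The sketch below assumes a simple list of coded responses; the function name and all counts are invented for illustration (the 7 out of 20 feminine responses were chosen only to mirror the 35% share mentioned for the gender-incongruent nouns) and are not data from the survey.

```python
from collections import Counter


def gender_shares(responses):
    """Return the percentage of each assigned gender per noun condition."""
    counts = {}
    for condition, gender in responses:
        counts.setdefault(condition, Counter())[gender] += 1
    return {
        condition: {g: 100 * n / sum(c.values()) for g, n in c.items()}
        for condition, c in counts.items()
    }


# Invented coded responses: (noun condition, gender of the chosen article).
responses = (
    [("congruent", "MAS")] * 19
    + [("congruent", "FEM")] * 1
    + [("incongruent", "MAS")] * 13
    + [("incongruent", "FEM")] * 7
)

shares = gender_shares(responses)

for condition, by_gender in shares.items():
    for gender, pct in by_gender.items():
        print(f"{condition:12s} {gender}: {pct:.0f}%")
```

Splitting the same list further by subsample would yield the per-proficiency breakdown discussed in Section 3.2.1.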
3.2 Factors influencing gender assignment
3.2.1 Language proficiency of the speaker
To elucidate the cause of these patterns the respective samples of each dataset are split into subsamples to compare the chosen articles. The respondents have different degrees of proficiency in the language of the noun that is tested. For example, for some respondents German is an L1 while for others it is an L2 (for the description of the subsamples see Section 2 and Table 1 above). The same holds for the corresponding subsamples for the tested French nouns and the Spanish nouns and vice versa for the tested German nouns in each dataset (cf. Section 2 and Table 1). The following figures illustrate the selected articles for the German masculine nouns with a gender-incongruent translation equivalent according to the respondents’ language proficiency in German. In Figure 3 the subsamples of the German-Italian dataset are presented so that the proficiency in German is lowest on the left side and highest on the right side of the figure. The first subsample on the left has Italian as L1 and German as a late L2. The subsample in the middle also has Italian as L1, but German as an early L2. The proficiency in German of this subsample is considerably higher than that of the late-L2 subsample. The third subsample on the right displays the highest proficiency in German since both Italian and German are the respondents’ first languages (bilinguals).
Fig. 3: Elicited Italian articles for German masculine nouns per subsample (in percent) D. = German, It. = Italian, FEM = feminine, MAS = masculine
Overall there seem to be two different selection patterns for these three subsamples: the majority of the respondents of the low-proficiency subsample (German = late L2) choose a feminine article for the gender-incongruent noun. By contrast, the majority of the respondents of the two other subsamples select masculine articles – with the bilingual sample displaying an even lower proportion of feminine articles than the second subsample (German = early L2). Again, the results for the German-French dataset and the German-Spanish dataset are quite similar. The following figure illustrates the selection of the respective respondents. As in Figure 3, the respondents are split into subsamples according to their language proficiency in German. The level of proficiency in German is highest on the right side of the figure for each dataset. In the German-French dataset there is a group of respondents with French as L1 and German as L2 (on the left). The second subsample has both French and German as L1 (bilinguals on the right). For the German-Spanish respondents there are two subsamples as well: one with respondents having Spanish as L1 and German as L2 (on the left) and the other with Spanish and German as L1 (bilinguals on the right).
Fig. 4: Elicited French and Spanish articles for German masculine nouns per subsample (in percent). D. = German, Fr. = French, Sp. = Spanish, FEM = feminine, MAS = masculine
The selected articles of the low-proficiency group in the French dataset (French = L1, German = L2) are evenly divided into masculine and feminine articles. In contrast, the majority of the bilingual subsample selects a masculine article. The Spanish dataset shows the same tendency: while the majority of the low-proficiency group (Spanish = L1, German = L2) chooses a feminine article, the majority of the bilingual subsample selects a masculine article. These analyses show that there is a correlation between the choice of the article’s gender and the respondents’ language proficiency. While the respondents with low proficiency in the language of the noun tend to choose the gender of the noun’s translation equivalent, the high-proficiency respondents, especially the bilingual ones, select the other-language noun’s gender. This correlation can be found throughout the data. The difference between the subsamples according to their language proficiency is not surprising as the respondents were selected according to this criterion. This same criterion, however, also appears to be significant when a cluster analysis is performed on
all the data without splitting it into subsamples. In a cluster analysis a set of unordered objects is assigned to groups, or clusters. The objects of each group are more similar to each other than to objects of other groups. Such a cluster analysis can be performed with the respondents as objects by using the selected articles’ gender to find out if the computed clusters correspond to the language proficiency subsamples. The respondents are assigned to one of the obtained clusters through the cluster analysis. In the next step, the obtained clusters are cross-tabulated with the language proficiency groups. Doing so shows how the allocation based on the sociolinguistic information matches the computed clusters, which are based on the elicited language data. This analysis is performed on part of the German-Italian and German-Spanish data. For the cluster analysis on the German-Italian data all selected Italian articles for gender-incongruent nouns were taken into account (cf. Table 2 in the appendix), except that nouns with more than ten missing values were discarded from the analysis. A TwoStep Cluster Analysis was run on these data using the computer program SPSS. The analysis results in two clusters. Almost every respondent is assigned to a cluster; the allocation cannot be computed for every respondent because the cluster analysis discards objects with missing values. The resulting cluster membership can then be cross-tabulated with the classification based on language proficiency. Thus, the subsamples in Figure 5 (and in Figure 6 as well) do not include every respondent. The result for the German-Italian dataset is visualized in the subsequent figure. According to this analysis the respondents of the German-Italian dataset are divided into clusters as follows (cf. Figure 5): the majority (75%, n = 18) of the low-proficiency subsample (German = late L2) belongs to the first cluster. Only a quarter is assigned to the second cluster (25%, n = 6). The result for the early L2-respondents
Fig. 5: The subsamples’ allocation to the computed clusters (German-Italian dataset) D. = German, It. = Italian
Fig. 6: The subsamples’ allocation to the computed clusters (German-Spanish dataset) D. = German, Sp. = Spanish
is almost an exact mirror image of this distribution: the majority (77%, n = 20) is in the second cluster and the minority (23%, n = 6) in the first cluster. All the respondents of the bilingual subsample are in the second cluster. The allocation of these three subsamples to the two computed clusters is in line with the results discussed above (see Figure 3): the early L2-respondents resemble the bilingual subsample more closely than the late L2-respondents. The same kind of analysis is run on the German-Spanish data, i.e. with the chosen German articles for gender-incongruent Spanish nouns (cf. Table 5 in the appendix). This means that only respondents with German as L1 (i.e. the bilinguals: D = L1, Sp = L1, and the German monolinguals: D = L1, Sp = early L2, and D = L1, Sp = late L2; cf. Section 2 and Table 1) are included in this analysis. The TwoStep Cluster Analysis results in two clusters as well. Nearly all respondents are assigned to one of those clusters. This classification is then cross-tabulated with the classification according to language proficiency. Figure 6 displays the relative proportion of both cluster types for each subsample. The allocation of the German-Spanish subsamples to the clusters is somewhat different from that in the German-Italian dataset. Nevertheless one can see the same kind of association with the degree of proficiency. The majority of the late-L2 respondents – the subsample with the lowest proficiency in Spanish – is assigned to cluster 1 (74%, n = 23), and only 26 percent (n = 8) of them are assigned to cluster 2. The next subsample (Spanish = early L2) is evenly distributed across both clusters (each with n = 15). Most of the respondents from the bilingual subsample are in cluster 2 (76%, n = 22). But contrary to the German-Italian data not all the bilingual respondents are in the same cluster, and some of them are in the other cluster (24%, n = 7).
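The percentages reported for Figure 6 follow directly from the cluster counts given above. As a minimal sketch (written in Python, not the SPSS procedure used in the study; the names `counts` and `row_percentages` are illustrative assumptions), the cross-tabulation of proficiency subsample by cluster can be converted into row percentages as follows:

```python
# Cluster counts for the German-Spanish data as reported in the text (Figure 6).
# The dictionary layout and all names are illustrative, not taken from the study.
counts = {
    "Sp = late L2": {"cluster 1": 23, "cluster 2": 8},
    "Sp = early L2": {"cluster 1": 15, "cluster 2": 15},
    "bilingual": {"cluster 1": 7, "cluster 2": 22},
}


def row_percentages(table):
    """Return, for each subsample, the share of respondents per cluster in whole percent."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {cluster: round(100 * n / total) for cluster, n in row.items()}
    return result


percentages = row_percentages(counts)
for group, row in percentages.items():
    print(group, row)
```

Percentages are computed within each subsample (row-wise) because the question is how each proficiency group is distributed over the two clusters, not how each cluster is composed.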
The cluster analysis shows that based on the elicited language data (the chosen articles), the sample is allocated to two clusters, or subsamples: a high-proficiency one and a low-proficiency one. Accordingly, the intermediate-proficiency respondents are distributed across both clusters. The apparent difference between the allocations of the two intermediate subsamples can be explained by their differing proficiency in the L2. In the Spanish subsample the intermediate group is evenly allocated to both clusters, while the majority of the Italian intermediate subsample is allocated to the cluster that contains all the bilinguals. The respondents of the Italian subsample have a higher proficiency in their L2 German than the respondents of the Spanish subsample in their L2 Spanish (cf. above Section 2). These exemplary cluster analyses emphasize the significance of language proficiency for the choice of an article with other-language nouns and corroborate its status as a decisive factor.
3.2.2 Gender transparency of the other-language noun
In addition to the respondents’ language proficiency there are other factors influencing the choice of the article for other-language nouns. One of these is the gender transparency of the other-language noun through its morphonological form (i.e. its morphological or phonological ending). Evidence for the effect of this factor has been presented by Poplack, Pousada, and Sankoff (1982) and Poplack and Pousada (1981). In Spanish for example many feminine nouns end in -a, whereas many masculine nouns end in -o (cf. Teschner and Russell 1984). Other clues for the noun’s gender are suffixes which can be associated with a gender. In French, for example, there are typically feminine endings -ion and -ette (cf. Tucker, Lambert, and Rigault 1977; Lyster 2006). Some of the tested nouns in the data are clearly gender transparent through their morphophonological ending. It can thus be checked whether this factor influences the choice of an article. Three Spanish feminine nouns were tested in the German-Spanish dataset. Two of them are gender transparent by their final -a (mesa (13) and luna (14)) and one is not (torre (15)). All three nouns are of the same gender combination; they are gender-incongruent:
(13) mesa (feminine) – German translation equivalent Tisch, masculine [‘table’]
(14) luna (feminine) – German translation equivalent Mond, masculine [‘moon’]
(15) torre (feminine) – German translation equivalent Turm, masculine [‘tower’]
The following figure compares the results for the two gender transparent nouns with the results for the opaque noun torre. The selected German articles (only definite German articles could be chosen) – masculine: der, feminine: die, neuter: das – are displayed for three subsamples. As in the preceding figures the subsamples are split and ordered along the respondents’ language proficiency in their L2, in this case Spanish, starting on the left with a group of respondents having German as L1 and Spanish as a late L2 (cf. Figure 6 above). The subsample in the middle has German as L1 and Spanish
Fig. 7: Elicited German articles for Spanish feminine nouns: torre vs. gender transparent nouns luna and mesa per subsample (in percent). D. = German, Sp. = Spanish
as an early L2; their proficiency in Spanish is higher than that of the late-L2-Spanish subsample. The subsample on the right is composed of bilingual respondents with German and Spanish as L1. The difference between both kinds of nouns is clearly visible. For the opaque noun the choice of articles by the two L2-Spanish subsamples is different from the one by the bilingual subsample. The majority of the low-proficiency group (Spanish = late L2) selects a masculine article. This applies to the intermediate-proficiency group (Spanish = early L2) as well: the majority picks masculine. However, the proportion of selected feminine articles is considerably higher in this group, which brings it closer to the bilingual subsample, which clearly prefers the feminine article. This shows the effect of a noun’s gender transparency: for an opaque noun almost no one from the low- and the intermediate-proficiency subsamples selects the other-language noun’s gender, but when it is transparent they do, thus resembling the choice of the bilinguals. The effect of gender transparency can also be seen in the German-French dataset. Some of the tested French nouns end in the typically feminine suffixes -ion or -ette:
(16) solution (feminine) – German translation equivalent Lösung, feminine [‘solution’]
(17) fourchette (feminine) – German translation equivalent Gabel, feminine [‘fork’]
(18) serviette (feminine) – German translation equivalent Handtuch, neuter [‘towel’]
Two of these French gender transparent nouns are gender-congruent, (16) and (17), while the other – serviette (18) – is gender-incongruent since its German translation equivalent is neuter. The subsample for which these nouns have been tested has German as L1 and French as late L2. The following figure contrasts the results for nouns
of the same gender combination, which are opaque, with the ones for the transparent nouns (the tested nouns are listed in the appendix in Table 7). The first two groups on the left show the results¹³ for feminine gender-congruent nouns (French feminine nouns with a German feminine translation equivalent); the two groups on the right show them for feminine gender-incongruent nouns with the German translation equivalent being neuter.
Fig. 8: Elicited German articles for French feminine nouns: opaque vs. gender transparent nouns, German monolingual respondents (D = L1, Fr = late L2; in percent) FEM/F = feminine, MAS = masculine, NEU/N = neuter, transp. = transparent
The influence of the noun’s gender transparency is rather obvious; it is strongest for the gender-incongruent nouns. That is because for the gender-congruent nouns the choice is already relatively clear since the gender of the other-language noun and the gender of the translation equivalent are both feminine. Nonetheless, the majority of feminine articles for the non-transparent gender-congruent nouns is less pronounced (70%) than for the transparent ones (96%). For the nouns with a neuter translation equivalent the difference is more substantial: while for the non-transparent nouns the majority (55%) of the chosen articles is neuter like the German translation equivalent, the majority of the chosen articles for the transparent noun serviette is – with a clear 93% – feminine. This choice could, however, also be due to the fact that the French noun serviette has a second possible German translation equivalent: Serviette (‘table napkin’), which is feminine.
13 Masculine determiner: der, feminine determiners: die, eine, neuter determiner: das and the indefinite article for both masculine and neuter: ein.
These examples from two different datasets show that the bilingual subsample is apparently not affected by the transparency of the other-language noun, since the bilingual respondents select the gender of the other-language noun for the article independently from the noun’s gender transparency. On the other hand, the transparency has an impact on the L2-subsamples. If the noun is opaque, they prefer the gender of the translation equivalent. If the noun is gender transparent, they lean towards the gender of the other-language noun.
3.2.3 Sociolinguistic integration of the other-language noun
The majority of the nouns presented in the questionnaire are common, everyday nouns, such as Fisch (‘fish’), Zucker (‘sugar’), Stuhl (‘chair’), nuage (‘cloud’), solution (‘solution’), luna (‘moon’), mesa (‘table’). They may easily be mixed into the language of the article in common language mixing situations but they are not established loanwords in the language of the article and should therefore not be recognized as such by monolingual speakers of the language of the article. These nouns are neither recurrent nor widespread in the speech community of the article’s language and they are merely uttered ad hoc and probably once for this elicitation situation. All these features are features of so-called ad hoc or nonce borrowings as defined by Poplack and colleagues (cf. Poplack, Sankoff, and Miller 1988, Sankoff, Poplack, and Vanniarajan 1990, Poplack 2012). The counterpart of these ad hoc borrowings is the established loanword. Established loanwords are accepted in the language of the article (the recipient language) and may be listed in the respective dictionaries of the recipient language (see the criterion of listedness proposed by Muysken 2000). They are to a certain degree widespread, recurrent and accepted, depending on the degree of their sociolinguistic integration that makes them lexemes of the recipient language with the additional feature of being foreign. The French noun métro is one of these nouns. It is a French noun but it is also an established loan in German with differing degrees of grammatical integration: it may be written with a capital letter, without the accent on the second letter (Metro) and stressed on the first syllable¹⁴ instead of being written in lowercase and stressed on the second syllable as in French.
14 It is listed as such in the German dictionaries Duden (cf. www.duden.de), for example, as well as in OWID (www.owid.de) and DWDS (www.dwds.de). In these entries the noun is specified as a feminine noun: “Dies war auch der Grund für den Bau der Metro.” ‘This was also the reason for the construction of the Metro’ (Mannheimer Morgen 22.11.2003, in OWID; der Metro = feminine genitive singular).
15 The masculine French nouns that were tested differ for the three subsamples (see Tables 6 and 7 in the appendix).
Fig. 9: Elicited German articles for the French masculine nouns and the established loanword métro per subsample (in percent). D. = German, Fr. = French, trans. equiv. = translation equivalent, FEM = feminine, MAS = masculine, NEU = neuter
Figure 9 contrasts the elicited German articles for French masculine nouns with feminine German translation equivalents (the tested nouns are listed in the appendix in Tables 6 and 7)¹⁵ with the results for the French noun métro, which has the same gender combination. The results are listed for three subsamples which differ regarding their degree of proficiency in French; the proficiency in French is lowest for the subsample on the left and highest for the bilingual subsample on the right (cf. Section 2 and Table 1 for a more detailed description of the subsamples). In each of the two L2-French subsamples a majority selects feminine German articles for all the nouns even though the majority is not that pronounced in the intermediate group. For the loanword métro the proportion of feminine articles is considerably higher and there is less variation in the choice of other articles. The fact that the noun is an established loanword¹⁶ visibly affects the gender selection for the article of the L2-subsamples inasmuch as there is a clear choice and less variation. The selection of the bilingual subsample is affected as well, but the pattern is rather different, especially for the non-established nouns. The majority of the bilingual respondents chooses masculine articles for the French masculine nouns even though their choice is not that clear: there is a considerable portion who uses the indefinite German article ein, and only a small portion who uses the feminine articles.¹⁷ This is completely
16 Even though the noun is presented in lowercase and with an accent in the questionnaire, the noun may have been treated as the established loanword (the stressing cannot be controlled).
17 The reason for the bilinguals’ uncertainty in the article selection is that the adequate choice for French (as well as Spanish and Italian) masculine nouns is the German underspecified indefinite article ein. The definite articles der and das are overspecified and therefore a less fitting choice (cf. Rothe 2012; Gonzalez 2005). That is one reason why there is so much variation in the bilingual subsample.
different for the bilinguals’ choice for the noun métro: here the feminine articles outnumber the others. Thus, while there is more variation for the nonce-loans there is less of it for the established noun. For this established – now German – noun the bilinguals seemingly switch more easily to feminine, which is the gender of the established loanword in German and of the French noun’s German translation equivalent. Thus, for the noun métro, which can be counted among established loanwords in German, the article selections of all three subsamples resemble each other, with the bilingual subsample displaying somewhat more variation, which indicates uncertainty in the selection of gender. The factor of the sociolinguistic integration of the other-language noun plays a role insofar as there is more variation regarding its gender when the noun is still an ad hoc borrowing (i.e. a young loanword) than there is when the noun is conventionalized and integrated sociolinguistically (established loanword) – especially for the L2-subsamples.¹⁸
3.2.4 Biological gender of the other-language noun
Another factor for the gender assignment of other-language nouns is their biological gender (cf. e.g. Poplack, Pousada, and Sankoff 1982, Poplack and Pousada 1981: 23, 33; and confirming evidence by Fuller and Lehnert 2000; Violin-Wigent 2006). A noun is said to have biological gender when it is an animate noun referring to a male or female being. Some of the tested French nouns in the German-French dataset have a biological gender insofar as they are animate nouns referring to a being with biological sex. Most of these nouns are gender-congruent nouns, that is, these nouns and their respective translation equivalents are of the same gender.
(19) prince (masculine) – German translation equivalent Prinz, masculine [‘prince’]
(20) roi (masculine) – German translation equivalent König, masculine [‘king’]
(21) femme (feminine) – German translation equivalent Frau, feminine [‘woman’]
(22) reine (feminine) – German translation equivalent Königin, feminine [‘queen’]
(23) vendeuse (feminine) – German translation equivalent Verkäuferin, feminine [‘saleswoman’]
The French masculine nouns prince and roi, as well as their German translation equivalents, refer to a male being. The French feminine nouns femme, reine and vendeuse and their German translation equivalents refer to a female being. These nouns have
18 This phenomenon has been observed for other language pairs as well; it has been labelled Genusvaszillation or Genusschwankung (vacillation of gender, e.g. Schulte-Beckhausen 2002), and typically occurs at the beginning of the integration of a loanword. Once it is an integrated established loanword the gender is rather invariable.
Fig. 10: Elicited German articles for French gender-congruent nouns regarding biological gender, German monolingual respondents (D = L1, Fr = late L2; in percent) FEM/F = feminine, MAS/M = masculine, NEU = neuter, biol. = biological
been tested with only one subsample, respondents with German as L1 and French as late L2 (cf. Figures 8 and 9). Figure 10 displays the proportion of elicited articles per gender for French gender-congruent nouns – masculine nouns on the left and feminine nouns on the right – comparing all non-animate nouns of this gender combination¹⁹ to the animate nouns (the tested nouns are listed in the appendix in Table 7). One can see that the choice for the gender-congruent non-animate nouns is not clear even though it should be. There is obviously a fair amount of uncertainty about which article to select. For the masculine nouns there is a majority of 59% in favor of the masculine article and 10% for the feminine articles, 25% for the neuter article and 6% for ein (cf. footnote 17). For the feminine nouns a majority of 70% is for feminine articles, but there are also 18% for the masculine article, 3% for the neuter article and 2% for ein. By contrast, for the animate nouns the majority for masculine and feminine is higher: for the masculine nouns prince and roi, 83% are for the masculine article, and for the feminine nouns femme, reine and vendeuse, 91% are for the feminine articles.
19 The displayed results do not include those for the gender transparent nouns (cf. above).
4 Discussion: Classification of language mixings, borrowing or codeswitching?
The analysis of the corpus of elicited data on nominal language mixings has shown that the following factors influence the gender assignment to the other-language noun: language proficiency, gender transparency of the other-language noun, the sociolinguistic integration of the other-language noun and the biological gender of the other-language noun. Regarding the gender assignment to other-language nouns there are two variants. First, the variant mostly produced by respondents highly proficient in both languages (i.e. bilinguals) is that they mix determiners and nouns such that the determiner has the same gender as the other-language noun in its source language. Second, the variant mainly produced by respondents who are highly proficient in the language of the determiner (their L1) but less proficient in the language of the noun (their L2) is that they select determiners for the other-language noun that have the same gender as its translation equivalent in the language of the determiner (their L1). However, this is not a clear-cut correlation and the results of the proficiency subsamples are not unanimous. The majority of the high-proficiency respondents produces the first variant, but there are also instances of the second variant. The same holds for the low-proficiency subsamples as well: the majority produces the second variant, but some also produce the first one. The L2-respondents with a greater proficiency in the L2 (e.g. the intermediate subsamples) produce more of the first variant, though not to the same degree as the bilinguals. This demonstrates that proficiency is a continuous and dynamic variable, and that with increasing proficiency in the language of the noun the frequency of the bilingual variant of the nominal mixings also increases.
The gender transparency of the noun affects the article selection of the L2-subsamples while the selection of the bilingual respondents is not influenced. The choice of the L2-subsamples for gender transparent nouns resembles the choice of the bilingual subsamples, but for gender opaque nouns it does not. The sociolinguistic integration of the other-language noun, i.e. whether it is an established loanword, also influences the article selection of all the subsamples. To assess this factor more precisely, further research is required as the evidence in this corpus is not conclusive. There is also an effect of the biological gender, but it cannot be interpreted systematically as it has only been tested with one L2-subsample. Even though it has a strong impact it does not seem to have the clear overriding value as described, for example, by Poplack, Pousada, and Sankoff (1982) and Poplack and Pousada (1981). This may be due to the general uncertainty of this L2-subsample about the article selection. The respondents of this subsample have a lower proficiency in the language of the noun than the respondents in Poplack, Pousada, and Sankoff (1982). This difference in proficiency could cause the varying findings for the influence of the biological gender.
The factors of gender transparency and biological gender touch upon the form and meaning of the other-language noun. The factor of sociolinguistic integration is of both sociological and linguistic quality: the speech community agrees on the assigned gender. The noun is listed with a specific gender in monolingual dictionaries of the language of the determiner. Thus, the noun is a lexeme of this language. Language proficiency, finally, is a characteristic of the speaker. This is the most influential factor as subsamples split along this criterion show a systematic variation with regard to gender assignment. The other factors merely attenuate this variation. Following Poplack’s (e.g. 2001, 2004) definition of codeswitching and borrowing in terms of variation analysis, the two variants can be identified as codeswitching and borrowing. In codeswitching each item is internally grammatical by the rules of its language; there is no morphosyntactic integration. The grammars of both languages are equally active and there is no asymmetry. For nominal mixings this means that there is a language switch between a determiner and a noun. Unlike in borrowing, the noun is not integrated into the patterns of the language of the determiner. The noun keeps its gender, and the determiner merely agrees with it. Both the noun and its determiner behave as in their own language, as the function of a determiner as an agreement target is to match the gender of the noun (cf. Corbett 1991). For a nominal mixing between an Italian determiner and the German noun Schuh (masculine) this would mean the following:
(24) lo Schuh
     the_DET.IT.MAS shoe_N.D.MAS
The gender in which the determiner and noun agree is masculine, the gender of the German noun. The Italian translation equivalent (scarpa, feminine) has no influence. By contrast, in the borrowing process a donor language item is integrated morphosyntactically according to the recipient language’s patterns. Only the grammar of the recipient language is active, as it determines the structure and position of the lexemes. For nominal mixings this means that a noun is taken from a donor language and adapted to the patterns of the recipient language. It is put into a position that is designated by the syntax of the recipient language and its gender is assigned according to the rules of the recipient language. In most cases this is the gender of the translation equivalent. For the example with the noun Schuh, we have the following:
(25) la Schuh
     the_DET.IT.FEM shoe_N.D/IT.FEM
The noun Schuh is no longer masculine. It has been adapted to the patterns of Italian, the recipient language, and assigned feminine according to its translation equivalent.
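The contrast between examples (24) and (25) can be restated as a simple decision rule over three genders: that of the determiner, that of the noun in its source language, and that of the translation equivalent. The following Python sketch is only an illustration of this rule; the function name `classify_mixing` and the labels it returns are assumptions made here, not part of the study's method.

```python
def classify_mixing(determiner_gender, noun_gender, equivalent_gender):
    """Label a determiner + other-language noun mixing by the determiner's gender.

    noun_gender: gender of the noun in its source language
    equivalent_gender: gender of its translation equivalent in the determiner's language
    (Function name and labels are illustrative assumptions.)
    """
    if noun_gender == equivalent_gender:
        # Gender-congruent nouns cannot tell the two variants apart.
        return "indistinguishable" if determiner_gender == noun_gender else "other"
    if determiner_gender == noun_gender:
        return "codeswitching"  # determiner agrees with the noun's own gender
    if determiner_gender == equivalent_gender:
        return "borrowing"  # determiner takes the translation equivalent's gender
    return "other"


# Example (24): lo Schuh; Schuh is masculine, Italian scarpa is feminine.
print(classify_mixing("MAS", "MAS", "FEM"))
# Example (25): la Schuh.
print(classify_mixing("FEM", "MAS", "FEM"))
```

The rule only separates the two variants for gender-incongruent nouns, which is why the analyses above concentrate on that noun class.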
The noun as well may no longer be a German noun but an Italian one (cf. Frikassee in Section 1). The Italian article then simply agrees in this “new” gender.²⁰ This description coincides with the findings in codeswitching and loanword research. The central idea of codeswitching models concerning nominal mixings is that the noun determines the gender (Gonzalez 2005; Cantone 2007).²¹ Determiner and noun agree in the noun’s gender. This corresponds to the variant described above that is mainly produced by the high-proficiency subsamples. The other variant found in this corpus is categorized by these models as a rare exception. Its instances are identified as such because they have the so-called wrong gender (the translation equivalent’s).²² Codeswitching research is based on data from bilinguals. It is worth noting that the exceptions are produced by bilinguals, but not by prototypical bilinguals in the strict sense, since they have an unbalanced proficiency (cf. Cantone 2007). According to loanword research the most important factor in gender assignment to other-language nouns is the gender of their translation equivalent (also called analogical gender; cf. e.g. Poplack, Pousada, and Sankoff 1982; Carstensen 1980; Gregor 1983; Lee 1996; Thiel 1984; Volland 1986). Typically, research on loanwords focuses on monolinguals, i.e. respondents with the language of the determiner as L1 and with no or only a low proficiency in the language of the other-language noun. This corresponds to the findings in this corpus. Respondents with low proficiency in the language of the noun mostly produce mixings between determiners and nouns where the determiner exhibits the gender of the translation equivalent of the noun. Gender transparency has also been described as an influencing factor. Poplack, Pousada, and Sankoff (1982) for instance show that for English loanwords in Puerto Rican Spanish and Montreal French, morphology as well as phonology of the nouns influence their assigned gender.
20 The only fly in the ointment is that the origin of the variation in nominal mixings is not a morphosyntactic phenomenon but rather a lexical one. Whether a noun has one gender or another is determined in the lexicon, although the consequences are apparent in the selected determiner which agrees with the noun. The origin of the variation lies in the dynamic structure of the mental lexicon and in the different lexical and conceptual connections as a function of the proficiency in both languages (see Chapter 5.2 in Rothe 2012 for an explanation based upon the Revised Hierarchical Model proposed by Kroll and de Groot 1997).
21 There are other codeswitching models that claim the determiner to be crucial in gender assignment to other-language nouns (e.g. Jake, Myers-Scotton, and Gross 2002, 2005; Radford et al. 2007) but it is questionable whether these models really deal with codeswitching or rather borrowing. For the nominal mixings of the language pairs considered here they would predict only one variant, i.e. the variant that can be labelled as borrowing (cf. Rothe 2012, Chapter 3).
22 See for example Cantone (2007: 223): “It should just be pointed out that the frequency of wrong gender selection is very low.” In a study by Jorschick et al. (2011) as well, one variant – when the determiner has the gender of the translation equivalent – is labelled from the start as correct and the other as incorrect – when the determiner does not agree with the gender of the translation equivalent.

In this study the factor of biological gender is shown to be influential as well. While all these factors do have a considerable effect, especially analogical gender, the original gender of the other-language noun in the donor language is shown to be not as important (also called gender copy or Genusentlehnung ‘borrowing of gender’, Thiel 1959; Talanga 1987; Lee 1996). It is only significant in certain multilingual settings, e.g. in speech communities with strong language contact (cf. e.g. Brunt 1983 and Wawrzyniak 1985 on the impact of French-German language contact on French loanwords in German; Treffers-Daller 1994 on French loans in Brussels Dutch, as well as Stolz 2008; Petersen 2009 on the effect of strong language contact in multilingual speech communities). This reflects the impact of the factor of language proficiency and how it correlates with the variants of nominal mixings.

Thus, there are parallels with codeswitching and loanword research which corroborate the correlation between language proficiency and the variants in nominal mixings. The structure that is described as the regular one in codeswitching research is described as the exception in borrowing research and vice versa. This coincides with the investigated speakers, who are bilinguals in codeswitching research and so-called monolinguals (i.e. speakers with a rather low proficiency in the language of the other-language noun) in borrowing or loanword research. The models in codeswitching and borrowing research do not emphasize this correlation. The codeswitching models in particular offer an explanation for only one of the possible variants and dismiss the other variant by qualifying it as a rare or even wrong exception. It is not included or explained in the models. A neutral analysis describing the variation and the factors involved is possible only if the collected data are not constrained, i.e. if data are collected from respondents with different levels of language proficiency and no restrictions are placed on the “correctness” of the nominal mixings.
This analysis can likewise show that the alleged clear-cut correlation between language proficiency and the production of a certain variant of nominal mixing is not that clear. Depending on the factors involved, high-proficiency speakers (i.e. bilinguals) can also produce borrowing variants and low-proficiency speakers (i.e. monolinguals) can produce codeswitching variants.
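The distinction between the two variants can be made concrete with a small sketch. It is an illustration only, not the coding scheme used in the study: the class `NounItem`, the function `classify_mixing` and the gender codes "m"/"f" are hypothetical names, and the three items are taken from Tab. 2 in Appendix A. A determiner that carries the noun's own gender is labelled as the codeswitching variant; a determiner that carries the gender of the translation equivalent is labelled as the borrowing variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounItem:
    """A tested noun with its own gender and that of its translation equivalent."""
    noun: str
    noun_gender: str          # gender in the language of the noun
    equivalent: str
    equivalent_gender: str    # gender in the language of the determiner


def classify_mixing(determiner_gender: str, item: NounItem) -> str:
    """Label a determiner + other-language noun mixing (illustrative labels)."""
    if item.noun_gender == item.equivalent_gender:
        # Both sources predict the same determiner, so the variant cannot be told apart.
        return "ambiguous"
    if determiner_gender == item.noun_gender:
        return "codeswitching variant"
    if determiner_gender == item.equivalent_gender:
        return "borrowing variant"
    return "other"


# Items from Tab. 2 (German nouns with Italian determiners).
turm = NounItem("Turm", "m", "torre", "f")
muehle = NounItem("Mühle", "f", "mulino", "m")
fisch = NounItem("Fisch", "m", "pesce", "m")

print(classify_mixing("f", turm))    # "la Turm": gender of torre
print(classify_mixing("m", turm))    # "il Turm": gender of Turm
print(classify_mixing("m", fisch))   # genders coincide
```

Items whose genders coincide, such as Fisch and pesce, cannot distinguish the two variants, which is why the sketch returns a separate label for them.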
References

Brunt, Richard J. 1983. The influence of the French language on the German vocabulary (1649–1735). Berlin & New York: Mouton de Gruyter.
Cantone, Katja F. 2007. Code-Switching in bilingual children. Dordrecht: Springer.
Cantone, Katja F. & Natascha Müller. 2008. Un nase or una nase? What gender marking within switched DPs reveals about the architecture of the bilingual language faculty. Lingua 118. 810–826.
Carstensen, Broder. 1980. The gender of English loan-words in German. Studia Anglica Posnaniensia 12. 3–25.
Clyne, Michael. 2003. Dynamics of language contact. Cambridge: Cambridge University Press.
Corbett, Greville. 1991. Gender. Cambridge: Cambridge University Press.
Deuchar, Margaret. 2005. Congruence and Welsh-English code-switching. Bilingualism: Language and Cognition 8(3). 255–269.
Deutsches Fremdwörterbuch von Hans Schulz, Erster Band A-K. 1913/1974. Berlin & New York: Mouton de Gruyter.
Deutsches Fremdwörterbuch, Band 5 Eau de Cologne-Futurismus, 2nd edition (revised at the Institut für Deutsche Sprache), 2004. Berlin & New York: Mouton de Gruyter.
Deutsches Wörterbuch von Jacob Grimm und Wilhelm Grimm, 9. Band F-Fux, revised edition by the Berlin-Brandenburgische Akademie der Wissenschaften and the Akademie der Wissenschaften zu Göttingen, 2006. Stuttgart: Hirzel.
Eisenberg, Peter & Jürgen Baurmann. 1984. Fremdwörter – fremde Wörter. Praxis Deutsch 67. 15–26.
Field, Andy. 2009. Discovering statistics using SPSS, 3rd edition. London: Sage Publications.
Fuller, Janet M. & Heike Lehnert. 2000. Noun phrase structure in German-English codeswitching: Variation in gender assignment and article use. International Journal of Bilingualism 4(3). 399–420.
Gonzalez, Kay E. 2005. Die Syntax des Code-Switching. Esplugisch: Sprachwechsel an der Deutschen Schule Barcelona. Köln: Universität zu Köln. PhD Thesis.
Gregor, Bernd. 1983. Genuszuordnung. Tübingen: Niemeyer.
Haugen, Einar. 1950. The analysis of linguistic borrowing. Language 26(2). 210–231.
Jake, Janice L., Carol Myers-Scotton & Steven Gross. 2002. Making a minimalist approach to codeswitching work: Adding the Matrix Language. Bilingualism: Language and Cognition 5(1). 69–91.
Jake, Janice L., Carol Myers-Scotton & Steven Gross. 2005. A response to MacSwan (2005): Keeping the Matrix Language. Bilingualism: Language and Cognition 8(3). 271–276.
Jorschick, Liane, Antje Endesfelder Quick, Dana Glässer, Elena Lieven & Michael Tomasello. 2011. German-English-speaking children’s mixed NPs with ‘correct’ agreement. Bilingualism: Language and Cognition 14(2). 173–183.
Köpcke, Klaus-Michael & David Zubin. 1984. Sechs Prinzipien für die Genuszuweisung im Deutschen: Ein Beitrag zur natürlichen Klassifikation. Linguistische Berichte 93. 26–50.
Köpcke, Klaus-Michael & David Zubin. 1995. Prinzipien für die Genuszuweisung im Deutschen. In Ewald Lang & Gisela Zifonun (eds.), Deutsch typologisch. Jahrbuch des Instituts für Deutsche Sprache, 473–491. Berlin & New York: Mouton de Gruyter.
Kroll, Judith F. & Annette M. B. de Groot. 1997. Lexical and conceptual memory in the bilingual: mapping form to meaning in two languages. In Annette M. B. de Groot & Judith F. Kroll (eds.), Tutorials in bilingualism: Psycholinguistic perspectives, 169–200. Mahwah (NJ): Lawrence Erlbaum.
Larson-Hall, Jenifer. 2010. A guide to doing statistics in second language research using SPSS. New York & London: Routledge.
Lee, Jinhee. 1996. Die graphematische und morphologische Integration von Fremdwörtern im Deutschen. Untersuchungen anhand von Wörterbüchern des 19. und 20. Jahrhunderts. Erlangen: Universität Erlangen dissertation.
Lyster, Roy. 2006. Predictability in French gender attribution: A corpus analysis. French Language Studies 16. 69–92.
Meechan, Marjory & Shana Poplack. 1995. Orphan categories in bilingual discourse: Adjectivization strategies in Wolof-French and Fongbe-French. Language Variation and Change 7. 169–194.
Muysken, Pieter. 2000. Bilingual speech: A typology of code-mixing. Cambridge: Cambridge University Press.
Petersen, Hjalmar P. 2009. Gender assignment in Modern Faroese. Hamburg: Kovač.
Poplack, Shana. 2001. Code-switching (Linguistic). In Neil Smelser & Paul Baltes (eds.), International encyclopedia of the social and behavioral sciences, 2062–2065. Amsterdam: Elsevier.
Poplack, Shana. 2004. Code-switching. In Ulrich Ammon, Norbert Dittmar, Klaus Mattheier & Peter Trudgill (eds.), Sociolinguistics. An international handbook of the science of language and society, 589–596. Berlin & New York: Mouton de Gruyter.
Poplack, Shana. 2012. What does the Nonce Borrowing Hypothesis hypothesize? Bilingualism: Language and Cognition 15(3). 644–648.
Poplack, Shana & Marjory Meechan. 1998. Introduction: How languages fit together in codemixing. International Journal of Bilingualism 2(2). 127–138.
Poplack, Shana & Alicia Pousada. 1981. A comparative study of gender assignment to borrowed nouns. CENTRO Working Papers 10.
Poplack, Shana, Alicia Pousada & David Sankoff. 1982. Competing influences on gender assignment: variable process, stable outcome. Lingua 57. 1–28.
Poplack, Shana & David Sankoff. 1984. Borrowing: the synchrony of integration. Linguistics 22(1). 99–135.
Poplack, Shana, David Sankoff & Chris Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26(1). 47–104.
Radford, Andrew, Tanja Kupisch, Regina Köppe & Gabriele Azzaro. 2007. Concord, convergence and accommodation in bilingual children. Bilingualism: Language and Cognition 10(3). 239–256.
Rothe, Astrid. 2012. Genus und Mehrsprachigkeit. Zu Code-Switching und Entlehnung in der Nominalphrase. Heidelberg: Winter.
Sankoff, David, Shana Poplack & Swathi Vanniarajan. 1990. The case of the nonce loan in Tamil. Language Variation and Change 2. 71–101.
Schulte-Beckhausen, Marion. 2002. Genusschwankung bei englischen, französischen, italienischen und spanischen Lehnwörtern im Deutschen: Eine Untersuchung auf der Grundlage deutscher Wörterbücher seit 1945. Frankfurt am Main: Peter Lang.
Schwarze, Christoph. 1988. Grammatik der italienischen Sprache. Tübingen: Niemeyer.
Sebba, Mark. 1998. A congruence approach to the syntax of codeswitching. International Journal of Bilingualism 2(1). 1–19.
Sebba, Mark. 2009. On the notions of congruence and convergence in code-switching. In Barbara E. Bullock & Almeida Jacqueline Toribio (eds.), The Cambridge handbook of linguistic code-switching, 40–57. Cambridge: Cambridge University Press.
Stolz, Christel. 2008. Loanword gender: A case of romancisation in Standard German and related enclave varieties. In Thomas Stolz, Dik Bakker & Rosa Salas Paloma (eds.), Aspects of language contact. New theoretical, methodological & empirical findings with special focus on romancisation processes, 399–440. Berlin & New York: Mouton de Gruyter.
Talanga, Tomislav. 1987. Das Phänomen der Genusschwankung in der deutschen Gegenwartssprache – untersucht nach Angaben neuerer Wörterbücher der deutschen Standardsprache. Bonn: Universität Bonn dissertation.
Teschner, Richard V. & William M. Russell. 1984. The gender patterns of Spanish nouns: An inverse dictionary-based analysis. Hispanic Linguistics 1. 115–132.
Thiel, Rudolf. 1959. Über die Geschlechtsgebung bei Fremdwörtern. Muttersprache 69. 263–266.
Thiel, Rudolf. 1984. Die Behandlung englischer Wörter im Deutschen. Sprachpflege 5. 64–66.
Treffers-Daller, Jeanine. 1994. Mixing two languages: French-Dutch contact in a comparative perspective. Berlin & New York: Mouton de Gruyter.
Tucker, G. Richard, Wallace E. Lambert & André Rigault. 1977. The French speaker’s skill with grammatical gender. An example of rule-governed behavior. The Hague: Mouton.
Violin-Wigent, Anne. 2006. Gender assignment to nouns codeswitched into French: Observations and explanations. International Journal of Bilingualism 10(3). 253–276.
Volland, Brigitte. 1986. Französische Entlehnungen im Deutschen. Transferenzen und Integration auf phonologischer, graphematischer, morphologischer und lexikalisch-semantischer Ebene. Tübingen: Niemeyer.
Wawrzyniak, Udo. 1985. Das Genus französischer Lehnwörter im Deutschen. Zeitschrift für Sprachwissenschaft 4(2). 201–217.
Weiser, Elisabeth. 2008. Aspekte des Code-Switchings bei spanisch-deutschen Bilingualen und Sprachlernern. Köln: Universität zu Köln. MA thesis.
Astrid Rothe
Appendix A

Tab. 2: Tested German nouns with Italian determiners, all German-Italian respondents

German noun         Italian translation equivalent  English translation

masculine nouns     masculine equivalents
Fisch               pesce                           ‘fish’
Held                eroe                            ‘hero’
Löffel              cucchiaio                       ‘spoon’
Schinken            prosciutto                      ‘ham’
Stiefel             stivale                         ‘boot’
Tisch               tavolo                          ‘table’
Zucker              zucchero                        ‘sugar’
Zug                 treno                           ‘train’

masculine nouns     feminine equivalents
Bahnhof             stazione                        ‘station’
Besen CA            scopa                           ‘broom’
Dreisatz CA         regola del tre                  ‘rule of three’
Drucker             stampante                       ‘printer’
Platz               piazza                          ‘square’
Rücken CA           schiena                         ‘back’
Schuh               scarpa                          ‘shoe’
Stein               pietra                          ‘stone’
Stuhl               sedia                           ‘chair’
Sturm               tempesta                        ‘storm’
Turm CA             torre                           ‘tower’

feminine nouns      masculine equivalents
Aufgabe CA          compito                         ‘assignment’
Heizung CA          riscaldamento                   ‘heating’
Mühle CA            mulino                          ‘mill’
Straßenbahn CA      tram; tramvia (f.)              ‘tramway’
Stufe CA            grado                           ‘grade’

neuter nouns        feminine equivalents
Fest CA             festa                           ‘feast’
Mehl CA             farina                          ‘flour’
Papier CA           carta                           ‘paper’
Tor CA              porta                           ‘gate’
Türschloss CA       serratura                       ‘lock’
Unternehmen CA      impresa                         ‘company’

For the noun Tisch there is a second translation equivalent, tavola (feminine), but its meaning is restricted to a table when it is set. For the noun Stein there is a second translation equivalent as well: sasso (masculine). All nouns that are taken into account in the cluster analysis (see Figure 5) are marked CA.
Tab. 3: Tested German nouns with French determiners, respondents: bilinguals (D = L1, Fr = L1) and French monolinguals (Fr = L1, D = L2)

German noun         French translation equivalent   English translation

masculine nouns     masculine equivalents
Fisch               poisson                         ‘fish’
Roman               roman                           ‘novel’
Schinken            jambon                          ‘ham’
Zug                 train                           ‘train’

masculine nouns     feminine equivalents
Dreisatz            règle de trois                  ‘rule of three’
Löffel              cuillère                        ‘spoon’
Platz               place                           ‘square’
Schuh               chaussure                       ‘shoe’
Turm                tour                            ‘tower’
Wald                forêt                           ‘forest’
Tab. 4: Tested German nouns with Spanish determiners, respondents: bilinguals (D = L1, Sp = L1) and Spanish monolinguals (Sp = L1, D = L2)

German noun         Spanish translation equivalent  English translation

masculine nouns     masculine equivalents
Gürtel              cinturón                        ‘belt’
Mantel              abrigo                          ‘coat’
See                 lago                            ‘lake’
Umschlag            sobre                           ‘envelope’
Zahn                diente                          ‘tooth’

masculine nouns     feminine equivalents
Kopf                cabeza                          ‘head’
Schlüssel           llave                           ‘key’
Teppich             alfombra                        ‘carpet’
Tab. 5: Tested Spanish nouns with German determiners, respondents: bilinguals (D = L1, Sp = L1) and both German monolingual subsamples (D = L1, Sp = early L2 and D = L1, Sp = late L2)

Spanish noun        German translation equivalent   English translation

masculine noun      feminine equivalent
sol                 Sonne                           ‘sun’

feminine nouns      masculine equivalents
luna                Mond                            ‘moon’
mesa                Tisch                           ‘table’
torre               Turm                            ‘tower’

feminine nouns      neuter equivalents
bicicleta           Fahrrad                         ‘bicycle’
cara                Gesicht                         ‘face’
sal                 Salz                            ‘salt’
Tab. 6: Tested French nouns with German determiners, respondents: bilinguals (D = L1, Fr = L1) and German monolinguals with early L2-French (D = L1, Fr = early L2)

French noun         German translation equivalent   English translation

masculine nouns     feminine equivalents
beurre              Butter                          ‘butter’
drapeau             Fahne, Flagge                   ‘flag’
métro               (U-)Bahn                        ‘metro’
nuage               Wolke                           ‘cloud’
souci               Sorge                           ‘sorrow’
Tab. 7: Tested French nouns with German determiners, respondents: German monolinguals with late L2-French (D = L1, Fr = late L2)

French noun         German translation equivalent   English translation

masculine nouns     masculine equivalents
bâton               Stab                            ‘baton’
bras                Arm                             ‘arm’
chiffon             Lappen, Lumpen                  ‘rag’
crayon              Stift                           ‘pencil’
four                Ofen                            ‘oven’
lac                 See                             ‘lake’
manteau             Mantel                          ‘cloak’
pinceau             Pinsel                          ‘brush’
tapis               Teppich                         ‘carpet’
with biological gender:
prince              Prinz                           ‘prince’
roi                 König                           ‘king’

masculine nouns     feminine equivalents
métro               (U-)Bahn                        ‘metro’
nuage               Wolke                           ‘cloud’

feminine nouns      feminine equivalents
craie               Kreide                          ‘chalk’
école               Schule                          ‘school’
main                Hand                            ‘hand’
montre              Uhr                             ‘watch’
pince               Spange                          ‘clasp’
plume               Feder                           ‘feather’
poire               Birne                           ‘pear’
gender transparent nouns:
fourchette          Gabel                           ‘fork’
solution            Lösung                          ‘solution’
with biological gender:
femme               Frau                            ‘woman’
reine               Königin                         ‘queen’
vendeuse            Verkäuferin                     ‘saleswoman’

feminine nouns      neuter equivalents
chemise             Hemd                            ‘shirt’
épée                Schwert; Degen (m.)             ‘sword’
fenêtre             Fenster                         ‘window’
jambe               Bein                            ‘leg’
maison              Haus                            ‘house’
mer                 Meer                            ‘sea’
voiture             Auto (n.); Wagen (m.)           ‘car’
gender transparent noun:
serviette           Handtuch (n.); Serviette (f.)   ‘towel’
Helge Sandøy
Linguistic globalization: Experiences from the Nordic laboratory

Abstract: Since the 1960s the Nordic language boards have been concerned about the Anglo-American influence on the languages of the Nordic countries. In 2001 a research group started a comprehensive empirical study both of the foreign influence on the languages in question and of people’s attitudes towards loanwords (import words). To operationalize the notion of loanword, we defined it as a word that had entered the languages after the Second World War (i.e. ‘modern import words’). The aim of the project reported in the article was to investigate whether the foreign influence differs at the various levels of linguistic description, and whether the degree of influence correlates with internal linguistic or external, i.e. social, factors. These questions were examined by using the Nordic countries as a laboratory, i.e. by making systematic comparative studies with identical research methods in the seven main language communities, each of which has its own history and cultural characteristics but a very similar economic and social structure. Thus the Nordic countries make up a well-suited laboratory. The seven communities are: Iceland, the Faroe Islands, Norway, Denmark, Sweden, Swedish-speaking Finland and Finnish-speaking Finland. The article describes quantitatively both how English-positive vs. purist each language community is from various perspectives, and how the situation changed in a surprising way in some of the communities from 1975 to 2000. It is thoroughly attested that the different levels of description demonstrate different degrees of structural adaptation of import words, and that the pattern of each structural level is not identical in the closely related languages.

The results demonstrate historical changes in the import rate that are very interesting with respect to testing hypotheses about social forces causing different inclinations to import foreign words, and about possible restrictions from linguistic structure on the acceptance of import words. We found that Norway in particular had changed during the last quarter of the 20th century from being a rather “purist” country to being the leading language community with respect to using modern import words. This Norwegian situation is discussed with respect to the possible indirect effect of changes in economic structure and wealth. This study demonstrates the advantage of using the Nordic communities as a language laboratory, and it has provided us with solid comparative data which we can exploit when trying to test hypotheses about linguistic change and causes of change.
1 Aim and project

The influence of Anglo-American culture in general and the English language in particular has been a topic of discussion for both language boards and general cultural discourse in all Nordic countries since the early 1950s. In the first decades, the debate focussed on the cultural sphere in which Nordic countries participated after World War II and on American hegemony. In recent decades, however, the situation has been understood from the perspective of globalization. From the perspective of linguists, there is a need for practical information about the degree of influence of the English language and for a research approach studying the causes and consequences of borrowing. Various explanations have been suggested for why languages in the Nordic countries import English words differently. The broad and vivid engagement in this cultural debate has inspired research projects. Several projects were conducted in the 1980s and 1990s (Ljung 1985, 1988; Chrystal 1988; Jarvad 1995; Graedler 1998; Sharp 2001) on the question of the degree of influence, but all of these studies were restricted to one national language. Furthermore, their methods differed, making comparisons difficult, as demonstrated in Graedler (2007).

Therefore, in 2001, the project Modern import words in the Nordic countries (Moderne importord i språka i Norden – MIN, cf. http://folk.uib.no/hnohs/MIN/) was launched. This project involved 30 researchers in various positions (university staff, doctoral students, and graduate students) in seven communities: Iceland, the Faroes, Norway, Denmark, Sweden, Swedish Finland and Finnish Finland. Finnish does not typologically belong to the Nordic languages; this is the reason for the wording “in the Nordic countries” in the project title. Within the Nordic languages, there are significant structural differences, so potential structural restrictions or influences on import tendencies can be identified.
We normally consider all seven relevant communities to have similar societal structures, so the “Nordic community” can function as a laboratory in which other factors, including culture, historical background, and language structure, can be tested. As a semantically or metaphorically more neutral term for loanword, we prefer ‘importord’ = import word or import, as used in the project title.1 One of our main objectives is to benefit from the laboratory situation constituted by the seven Nordic language communities. As a written language, Finland-Swedish adopts Sweden-Swedish norms in orthography, morphology and syntax, but it often deviates in vocabulary. The spoken language has its own independent tradition. The main aim of the MIN project is to gain insight into the degree of linguistic influence and the societal and linguistic conditions for the influence on our languages. Attitudes towards English influence are considered essential for borrowing. In particular, we investigate the correspondence between various types of attitudes and borrowing. By designing a project that comprised parallel subprojects in all seven communities, we hoped to generate comparative descriptions that could be a testing tool for hypotheses and ideas about the relations between society and language.

1 Some subprojects were financed by the Norwegian Research Board, NordForsk, the Joint Committee for Nordic Research Councils for the Humanities and the Social Sciences (NOS-HS), Nordplus and the Nordic language boards.

Fig. 1: The six Nordic countries (and the seven language communities) under study

Based on the aim of this study, we modeled five topics and seven subprojects to be conducted with a parallel methodology in all communities to allow a systematic comparison. To investigate these topics, we collected the following types of data:
A: the volume of imports in the individual languages, taken from a database of 2.6 million words in 19 newspapers from 1975 and from 2000 (Selback and Sandøy 2007);
B: the adaptation of imports to the domestic languages in i) morphology, ii) pronunciation and iii) spelling, taken from the same database as in A and complemented with results from a questionnaire for spoken language (Jarvad and Sandøy 2007; Omdal and Sandøy 2008);
C: the frequency and usage of native replacement words, studied by contrasting frequencies of 40 pairs of imports vs. substitute words in previously established databases (newspapers from the 1990s) (Kvaran 2007);
D: the official standardization tradition described on the basis of historical documents and descriptions (Sandøy and Östman 2004);
E: i) conscious and ii) subconscious attitudes towards imports and substitute forms, investigated by 264 in-depth, tape-recorded interviews (Nyström Höög 2005; Thøgersen 2007; Óladóttir 2009; Mattfolk 2011; í Lon Jacobsen 2012), 6000 Gallup poll interviews (Kristiansen and Vikør 2006) and a matched guise test with 4200 respondents (Kristiansen 2006).

In Section 2, I will discuss some theoretical topics that form the point of departure for our research questions. Section 3 presents a broad description of the situation in the Nordic countries regarding linguistic influence (cf. subprojects A, B and C above). Against the background of this description, in Section 4, I will test whether the general factors discussed in Section 2 have any explanatory power. Section 5 presents our studies on different aspects of attitudes towards linguistic influence, and Section 6 suggests explanatory factors at the macro-level for both attitudes and linguistic influence.
2 Theoretical background and research questions

There are various claims about the factors that can hamper or increase the degree of practised linguistic purism (read: resistance to foreign influence) or linguistic openness. Some linguists assert that the structure of a language is a decisive factor because a complex structure cannot integrate alien linguistic structures (Kramer 1983: 314 in the Italian case, Ottósson 1997: 32 in the Icelandic case). A language with a case system, for instance, will tend to reject borrowings if they do not fit the case morphology. In the Nordic context, this should mean that Icelandic and Finnish are maximally purist. The counter-argument is that language users have the competence to adapt borrowings to the structure of their mother tongue. In our project, adaptation is measured and regarded as an aspect of purism. George Thomas (1991) describes the social and societal nature of purism and claims that purism can be directed against both internal elements (i.e. dialect features in the standard language) and external elements:

Purism is the manifestation of a desire on the part of a speech community (or some section of it) to preserve a language from, or rid it of, putative foreign elements or other elements held to be undesirable (including those originating in dialects, sociolects and styles of the same language). It may be directed at all linguistic levels but primarily the lexicon. Above all, purism is an aspect of the codification, cultivation and planning of standard languages. (Thomas 1991: 12)
Normally, the notion of purism refers to external purism, as it does in the present article. Because the situation of purism versus openness towards foreign influence varies from community to community, it is a typically cultural phenomenon. On the ideological level, we can observe that the notion of purism has negative connotations in Sweden and Denmark, whereas “purism” and “pure language” trigger positive associations in the Icelandic community. Historians are inclined to look for social and political functions of purism, which may vary from the subordinate’s need for defence against foreign dominance to the cultural expression of hegemony by a social elite in a given context.

The most central historical factor that influences the degree of purism is whether people experience a situation as a threat from abroad or as a necessary element of progress. In some political contexts, people find it essential to demonstrate their sovereignty or national characteristics. The struggle for political sovereignty is relatively recent in all Nordic states, with the exception of Sweden and Denmark, which emerged during the high Middle Ages and have remained sovereign states (although they formed a union during 1397–1523). Norway gained full sovereignty in 1905, Finland in 1917, and Iceland in 1944, whereas the Faroe Islands continue to struggle for independence from Denmark. If political conditions exert influence, we should expect that the speed of borrowing will fluctuate in accordance with a changing political situation. Therefore, in subproject A of the MIN, we collected data from both 1975 and 2000 with the intention of studying this potential correspondence.

A well-known sociolinguistic insight is that people’s opinions or conceptions of language do not necessarily correspond to the actual state of the relevant language, as shown, for example, by the difference between reported and actual speech (Trudgill 2000). This discrepancy can be interpreted as an effect of the cultural focus in a community (or the cultural discourse of a community, which may be an effect of dominance or hegemony). This situation may also apply to foreign influence. Therefore, in the MIN, we examined people’s opinions of the situation and compared them with our statistical data from studies on linguistic structure (cf. Section 5.1).
Language contact is normally considered the most evident mediating factor for language influence, especially when the contact is direct, as in face-to-face situations where language accommodation between individuals is a normal socio-psychological consequence (Giles, Coupland, and Coupland 1991). This force seems to be more relevant for influences from a dominant language than from a dominated language, in accordance with accommodation theory. This situation is evidenced in history: Low German had an overwhelming influence on all Scandinavian languages during the centuries when the Hanseatic League controlled much of the commercial traffic in Northern Europe. Therefore, we may expect that the currently increasing contact with English will have a similar impact.

A social factor that is claimed to be an important condition for openness towards language influence is urbanization (Lund 1986). Because urban societies do not create close-knit networks as rural communities do, urban societies are generally more open. The urbanization rate may be theoretically relevant in this context if we assume that urbanization is a way of measuring modernization and that modernization disseminates English imports into a language. The Nordic countries represent a laboratory in this respect as well because the degree of urbanization varies considerably from country to country.
It is difficult to identify a direct causal connection between the political situation and purism. The intermediate link must be that historical and political situations form people’s attitudes, and these attitudes hamper or increase the foreign influence on language. In our project, we have collected different types of data to test this (cf. overt and covert attitudes in Section 5). This topic is of interest both because of its theoretical content and because of the political question of whether authorities should spend money on campaigns intended to change people’s attitudes. Based on these theoretical topics, our research questions in this project are as follows:
1. What is the degree of influence in the seven Nordic language communities, and is the influence different or the same at the various levels of linguistic description? (cf. Section 3)
2. Does the degree of linguistic influence correlate with any of the linguistic, historical, social or attitudinal (socio-psychological) factors suggested above? (cf. Sections 4 and 5)

A theoretical issue that had to be operationalized prior to the data collection was the definition of an import word (loanword). Our approach was to study the situation in modern history. In our attitude studies, we were dependent on what people understood and experienced as import words. Linguistic and etymological definitions were not relevant to our purpose; thus, we operationalized the notion of modern imports as words entering our languages after the Second World War. Modern meant that we “reset” language history and began in 1945.
3 Foreign influence on different linguistic levels – the situation in 2000
Figure 2 presents the simplest description of foreign influences on the languages in question, showing import frequencies in selected newspapers from 2000. These newspapers were selected for the same two days to reduce the risk that different top news stories would influence the vocabulary used in newspapers from different countries. The newspapers chosen were both national and local, and the imports, with context, were excerpted from all articles and adverts and tagged in a database with information about linguistic categories, genre, and topic. Which languages characterize linguistic globalization in the Nordic countries? The distribution is strikingly similar in all seven communities for export (donor) languages, as Table 1 demonstrates.
Linguistic globalization: Experiences from the Nordic laboratory
[Figure 2: bar chart, 2000. Icelandic 12; Faroese 27; Norwegian 88; Danish 82; Sw-Swedish 70; Fi-Swedish 68; Finnish 35]
Fig. 2: Imports per 10,000 text words
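The measure in Figure 2 is a normalised frequency: the number of excerpted import tokens divided by the number of running words, scaled to 10,000. A minimal sketch of that arithmetic follows; the newspaper names and counts are invented for illustration and are not project data.

```python
def per_10k(import_tokens: int, text_words: int) -> float:
    """Return the number of imports per 10,000 running text words."""
    if text_words <= 0:
        raise ValueError("text_words must be positive")
    return 10_000 * import_tokens / text_words


# Hypothetical counts: (import tokens, running words) per newspaper sample.
sample = {"paper_a": (176, 20_000), "paper_b": (35, 10_000)}

rates = {name: per_10k(imports, words) for name, (imports, words) in sample.items()}
print(rates)
```

Normalising in this way is what makes samples of different size comparable across the seven language communities.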
Tab. 1: Imports from various languages (percentage of tokens in editorial texts) (Sandøy 2007: 130)
Columns: Export language | Icelandic | Faroese | Norwegian | Danish | Sweden-Swedish | Finland-Swedish | Finnish | Average percentage
Rows: English | International words | Other languages | Italian | German | French | Japanese | Spanish | Greek | Russian | Arabic | Finnish | Swedish | Danish
71.1 1.2 9.6 7.6 1.2 1.6 2.8
Total
100
89.1 6.0 1.1 1.9 0.8 1.1
3.6 0.4
87.3 3.2 2.4 3.1 0.8 0.8 0.7 0.5
82.9 10.2 2.7 0.1 1.0 1.1 0.2 1.2
91.5 2.7 2.5 0.7 0.9 0.5 0.3 0.7
89.8 1.9 1.7 0.7 1.5 1.3 0.2 0.3
1.2
0.1 0.5
0.2 0.2
1.3 0.7 0.7
85.0 6.9 0.5 1.1 0.2 1.5 0.6 1.3 0.1 0.5
85.2 4.6 2.9 1.9 1.1 1.1 0.8 0.6 0.5 0.5 0.2 0.1
2.4 0.8 100
100
100
100
100
100
1
Table 1 shows that English has a dominant position. As most international words (e.g. televisjon, mega-) have diffused throughout the world through English, we can conclude that roughly 90% of modern imports in our languages have English as their last harbour before being imported to our languages. Of the other export languages, no
1 Averages are not relevant for the Scandinavian words because they are counted only in the non-Scandinavian languages.
single language plays an interesting role; they are therefore ignored in the following discussion. Table 1 is based on token frequency (i.e. occurrences in texts). In many lexicological contexts, focusing on types (= lexemes) may be a more relevant approach and may provide another picture. In this dataset, calculating the imported lexemes used in the texts gives us almost the same percentages. Therefore, we will use token frequency in this study. The dominance of English is an effect of international economic and political affairs. The distribution of imports in word classes, shown in Table 2, may also reflect conditions made by the linguistic structure. An essential part of linguistic influence is the introduction of new notions, which are most naturally expressed by nouns. However, the different percentages of word classes may also be an effect of linguistic properties, and the notion of open classes is probably relevant.
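The difference between token frequency and type frequency mentioned above can be made concrete with a small sketch. The word list below is invented for illustration (it is not taken from the project database), and the helper name `shares` is hypothetical.

```python
from collections import Counter

# Hypothetical excerpted imports as (lexeme, export language) pairs,
# one pair per occurrence in running text.
tokens = [
    ("weekend", "English"), ("weekend", "English"), ("weekend", "English"),
    ("design", "English"), ("design", "English"),
    ("pizza", "Italian"), ("pizza", "Italian"),
    ("sushi", "Japanese"),
]


def shares(pairs):
    """Percentage of items per export language, rounded to one decimal."""
    counts = Counter(language for _, language in pairs)
    total = sum(counts.values())
    return {language: round(100 * n / total, 1) for language, n in counts.items()}


token_shares = shares(tokens)      # every occurrence counts
type_shares = shares(set(tokens))  # each lexeme counts only once

print(token_shares)
print(type_shares)
```

In this toy sample the two counts diverge because a few English lexemes are repeated often; the chapter reports that in the actual dataset the two percentages are almost the same.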
Tab. 2: Modern imports distributed by word classes (percentage of tokens)
Columns: Word class | Icelandic | Faroese | Norwegian | Danish | Sw-Swed | Fi-Swed | Finnish | Average of % | Distr. of word classes in Norw. texts
Rows: Nouns | Adjectives | Verbs | Adverbs | Interjections | Prepositions | Other classes
88.4 7.2 3.6
Total
1000
89.8 8.3 0.4 1.5
88.4 7.7 3.7 0.1 0.1
88.6 8.0 3.2 0.1 0.1
89.3 7.6 3.0 0.1
87.5 8.3 3.8 0.3 0.1
94.3 3.2 1.9 0.6
100
100
100
100
100
100
0.4 0.4
89.5 7.2 2.8 0.4 0.1 0.1
22.6 8.2 20.8 7.4 0.2 14.8 26.1 100
Table 2 indicates that the distribution is similar in all languages, except for a higher percentage of nouns in Finnish. Whether this difference can be traced to the different structure of Finnish cannot be addressed here. To put the figures in perspective, we have added a column for the general word class distribution in Norwegian texts (Vestbøstad 1989: XVI). Comparing the two rightmost columns, we can conclude that nouns are obviously the most open class for imports. It is surprising that imports to the verb class make up only 2.8% because this class comprises 20.8% of running words in texts and is normally regarded as an open class. If we measure the opposite of purism, “openness towards foreign language influence”, by counting modern imports in newspaper texts, the situation in our language communities is as visualised in Figure 2 above. For our discussion, we ignore adverts and consider only articles and letters because the language used there is the most representative of factual prose and daily language use. Against the background of the common Nordic stereotypes discussed below in Section 5.1 most of our research group was surprised that Norwegian was at the top of the ranking list. As expected, Icelandic, Faroese and Finnish were the most purist
in language practice. Not all differences in Figure 2 are significant. A Pearson’s test shows that the difference between Danish and Norwegian is not significant. We should thus rephrase our conclusion so that Norwegian and Danish are at the top of the Nordic openness ranking and are significantly ahead of Sweden-Swedish and Finland-Swedish: Norwegian = Danish *** > Sw-Swedish = Fi-Swedish *** > Finnish ** > Faroese *** > Icelandic.²
Influenced by George Thomas (1991), we wanted to study the different effects of purism on the various description levels: lexical, phonetic, spelling and morphological. When designing the project, we dismissed the idea of studying the syntactic level as there would be little measurable frequency of English structures. Morphological purism is typically strong for all of our languages, and we found low frequencies when we operationalized it as the number of running words (tokens) with a foreign suffix. Danish ranks highest, with 5 instances per 10,000 words. Almost all instances are related to the plural suffix of nouns, where the plural -s has been accepted for more words in Danish than in the other languages. Verbs never show a foreign suffix (we comment on adjectives below). Figure 3 demonstrates openness to both foreign morphological influence and spelling, and the percentage level difference between the two is striking. In these two cases, the method measures the percentage of instances with foreign vs. national spelling and foreign vs. national suffixes. For example, we counted how often juice was spelled juice or jos in Swedish texts or whether displays was written with the plural -s or with the Danish plural suffix -er, as in displayer. In Figure 3, Danish is the most open language, with a significant difference from Norwegian.
Almost 60% of the imports were spelled in accordance with the export language and differently from the prescriptions of the orthographic principles of the Danish spelling system. This different tradition can easily be observed in actual Danish usage, which may be an effect of the fact that the Danish language board seldom recommends that an import be spelled differently from English. All other language boards often do so, such as tape, where tejp is the Swedish spelling and teip is an accepted Norwegian spelling (Omdal 2008: 164). Written language represents the easiest accessible source when studying imports. However, it represents only a minor part of the language use in a community, and, moreover, it is more loyal to language institutions in society, whether a language board or editors’ ideology and guidelines. We thus also aimed to study spoken language but had to develop a more selective and strategic approach, as accessible oral language
2 = ‘no statistically significant difference’, ** ‘significance p < .01’, *** ‘significance p < .001’. (Sandøy 2007: 147)
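[Figure 3: bar chart (percentage). Series “B1 Ortho”: Icelandic 28, Faroese 43, Norwegian 42, Danish 59, Sw-Swedish 36, Fi-Swedish 25. Series “Ba Morpho”: Icelandic 2, Faroese 2, Norwegian 3, Danish 5, Sw-Swedish 3, Fi-Swedish 2. Finnish is listed on the axis without values.]
Fig. 3: Orthographic and morphological purism (percentage)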
databases did not exist in all communities during the planning stage in 2001. Because of our strict principle of using completely parallel methods for all languages, we had to elicit our own data by developing tests. These were constructed as a questionnaire, in which we asked respondents to produce certain sentences or pronounce specific words that were elicited by either giving synonyms or asking the respondents to read words written on a piece of paper. All surveys were tape-recorded and analyzed for each variable, which should reveal how randomly selected respondents chose to “solve a conflict” between a relevant English structure/pronunciation and a corresponding domestic (or mother-tongue) structure and pronunciation, e.g. jobs vs. jobber in the plural and juice with initial [dʒ-] or [j-]. The variables were similar though not always identical for each language community because of different linguistic structures. When interpreting Figure 4, we must notice that the percentages given are averages of the
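[Figure 4: chart (percentage). Series “B2 Phonol”: Icelandic 9, Faroese 2, Norwegian 26, Danish 44, Sw-Swedish 38, Fi-Swedish 37, Finnish 16. Series “B2 Morpho”: Icelandic 56, Faroese 22, Norwegian 21, Danish 48, Sw-Swedish 37, Fi-Swedish 33, Finnish 18.]
Fig. 4: Purism in spoken language, phonology and morphology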
specific phonetic and morphological variables used in the test and therefore do not represent text frequency. The curve for phonetic variables repeats the general tendency that the “peripheral” languages in the Nordic countries are less open than the central ones in Scandinavia (= Norway, Denmark, and Sweden). We observe here, however, that Norwegian stands out from the others by being fairly puristic (26%), whereas Danish has the greatest number of imports with its 44%. There are two deviating tendencies for the morphological variables. That Faroese is, unexpectedly, at the same level as Norwegian possibly reflects that many words are “copied” from Danish, which has a more consistent pattern of being open. Danish newspapers and literature are central in Faroese culture, and the English (and Danish) plural suffix -s is therefore frequently used, whereas the plural -s is more restricted in written Faroese because of the traditional purism. Figure 4 is relevant only as a comparison between the languages, not between the descriptive levels or between written and oral language. The oral tests had a different design, as, in the questionnaire, we asked for selected words that could reveal the “borders” for English as a morphological model, and the comparability principle was handled using the same words or phenomena in all languages. Great differences in percentage between the tests on written vs. oral data can thus partly reflect an effect of a different method. The frequency in running words is demonstrated in Figure 3, and, e.g. written Faroese has a low rate of English morphology there. The two different methodological approaches produce very different scores for Icelandic in Figures 3 and 4. In the oral test, we focused on the concordance principle, which Icelandic, interestingly, seems to suspend when using imported adjectives, e.g. in Stelpurnar eru kúl ‘the girls are cool’, where we should expect *Stelpurnar eru kúlar in accordance with traditional grammar.
The high percentage (56%) of openness in oral morphology is caused by this feature, whereas the other languages are more conservative and retain the concordance principle to a much higher extent. The figures above demonstrate that linguistic purism can be imposed differently on the various linguistic structure levels in different languages. This is an interesting observation in both structural linguistics and the sociology of language. If we extend our notion of purism, in accordance with Thomas (1991), by applying it to the internal language standardization situation (endoglottic purism), this could be demonstrated further; we omit this here. Figure 5 summarizes and simplifies the total situation concerning exoglottic purism. The figure displays the various kinds of purism, and it is obvious that purism is practised most consistently in Icelandic and Finnish – i.e. on all levels. Danish shows the greatest contrast, whereas Norwegian and Swedish differ concerning the structural level on which purism is demonstrated. An alternative and quantitative way of describing the Nordic situation is to sum all levels by averaging the rank numbers for each language community in the previous tables. Figure 6 gives the result.
[Figure 5: chart of the levels of purism (morphological, spelling, phonetic, lexical) for Icelandic, Faroese, Norwegian, Danish, Sw-Swedish, Fi-Swedish and Finnish]
Fig. 5: Exoglottic purism
[Figure 6: bar chart. Icelandic 1.6; Faroese 2.4; Norwegian 4.2; Danish 5.4; Sw-Swedish 3.8; Fi-Swedish 3.6; Finnish 2.1]
Fig. 6: Average of rank numbers in Figures 2–4.
4 Are there linguistic, historical and social explanations?
It is obvious that Icelandic and Finnish are the languages that differ most from English, structurally speaking. This finding coincides with these two languages being the least open towards English. Thus, this claim initially has some support in the empirical data of our project. However, all of our languages are rather purist with regard to morphology, despite the significant differences in morphological complexity, and all have some phonological adaptation strategies, although they are not used to the same extent (cf. Figure 3). Furthermore, as shown below, Finnish has changed its import rate without changing its structure. The suggestion that political history can play a role and influence cultural life and people’s mentality is within the traditional historical interpretation. This suggestion seems likely for Iceland because this country struggled for independence for a long time. Since the late Middle Ages, there have been incidents that indicate an awareness that Iceland had to defend its society both culturally and economically (e.g. Iceland has never had military forces). With the struggle to restore its parliament (Alþing) in the middle of the 19th century, a strong national feeling was developed, crowned by the
success of the Home Rule Act of 1917 and full sovereignty in 1944. Finland experienced a parallel political struggle in the first half of the 19th century and obtained sovereignty in 1919. The Faroe Islands are still not completely independent of Denmark, and there is a strong cultural discussion and awareness of conflicting interests, which may nourish the “purist mood”. (It can be added that Icelandic and Finnish as languages did not play a role in the struggle for independence. In the Faroes, in contrast, the acceptance of the Faroese language was a political topic until around 1980.) The three geographically peripheral language communities (Iceland, Faroes and Finnish-Finland) thus have a political history that can plausibly explain language purism in accordance with our suggestion. Moreover, the official language policy pursued by the language board in Iceland and the Faroes is in full accordance with our expectation and concentrates on developing replacement words. That the two central Scandinavian national states, Denmark and Sweden, have never been in a constitutionally subordinated position further supports this position. In these states, we find openness towards imports. Correspondingly, the official language policy has not focused on restrictions to import words. However, Norway is a problem in this line of reasoning because its political history during the 19th century was strongly characterized by a struggle for independence, which it achieved in 1905. This is not reflected in the language situation presented in Figure 2. If we want to take this suggestion about political history further, we must develop an argument about how long the cultural effect of a mood based on the struggle for independence can last and when this effect can be overruled by other forces. 
To investigate historical changes in the tendency to import foreign words, we excerpted newspapers from 1975 from the same dates that we used for 2000, following the same criteria and guidelines for both periods. We counted how many modern imports had accumulated in the language since 1945. The time span was 30 years for the 1975 figures and 55 years for the 2000 figures. We should expect almost a doubling in quantity if the speed – or amount per year – is the same (specifically, an increase of 83%). Figure 7 displays the result from Nordic newspapers, and the 1975 results for Swedish newspapers correspond to those found in Ljung (1985). From the expectation of doubling mentioned above, our figures from Icelandic, Faroese and Swedish (in both countries) need no comment, except that we can conclude that there is a slight growth in speed. Based on an increasing tendency of globalization and worldwide integration over this period, we should be surprised that this evolution has not accelerated more. Conversely, Danish, Norwegian and Finnish have sped up, the last two dramatically so. Finnish has increased the amount of imports by 483% (from 6 to 35) and Norwegian by 319%; the other increases are Danish 165%, Faroese 145%, Icelandic 149%, Fi-Swedish 113%, and Sw-Swedish 112%. The central Scandinavian languages were already well ahead of Icelandic, Faroese and Finnish in 1975, and they had increased their lead by 2000 if measured in absolute numbers. The Finnish and Norwegian situation has changed over one generation. These striking and clear empirical results were especially surprising from the perspective discussed in the suggestion concerning the social and societal nature of purism in Section 2.
[Figure 7: bar chart. 1975: Icelandic 5, Faroese 11, Norwegian 21, Danish 31, Sw-Swedish 33, Fi-Swedish 32, Finnish 6. 2000: Icelandic 12, Faroese 27, Norwegian 88, Danish 82, Sw-Swedish 70, Fi-Swedish 68, Finnish 35.]
Fig. 7: Historical development in amount of modern imports in newspapers in 1975 and 2000 (per 10,000 text words)
If we seek societal explanations for this change, we can assume that, in the Finnish case, the shift in political orientation has had an effect on the general cultural climate. In the 1970s, Finland was still eager to underline its political neutrality and good relations with Moscow, but the “grasp from the east” was soon loosened, and the country has become westernised. This may have affected the language culture as well. At this stage, this is only a tentative assumption. Section 6 will discuss the Norwegian case. Because Icelandic and Finnish are the most geographically peripheral languages in the Nordic countries, it is sometimes assumed that an axis from the centre to the peripheries reflects a reduction in the extent of general contact and external influence. However, this issue of the centre and the periphery does not seem to correspond perfectly with our data. In our Gallup poll, we collected information on how often the respondents had been in contact with the English language in the last week and what kind of contact (spoken, written or read) they had had. Figure 8 shows the average frequency of having used English last week. The most peripheral language community, Iceland, ranks highest in this respect, well ahead of the central Scandinavian countries. It is also the most purist country. In the eastern periphery, there is a corresponding relationship between Finnish and Finland-Swedish, where users of the former are in contact with English more often than users of the latter. This indicates that “peripheries” are not peripheries in our context and that the notion of periphery is unproductive. More importantly, when comparing Figure 8 and Figure 6, we find that a high degree of contact with English does not correspond with being strongly influenced by English.
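The growth argument built on the 1975 and 2000 counts can be retraced with simple arithmetic. The sketch below uses the rounded values shown in Figure 7; the variable and function names are illustrative only. Because the inputs are rounded, the results need not match the quoted percentages exactly: for Icelandic the rounded values give 140% rather than the 149% quoted, which may reflect unrounded underlying data.

```python
# Accumulation periods since the 1945 "reset" of language history.
YEARS_TO_1975 = 1975 - 1945  # 30 years
YEARS_TO_2000 = 2000 - 1945  # 55 years

# Expected increase (in %) if imports accumulate at a constant yearly rate.
expected_increase = 100 * (YEARS_TO_2000 / YEARS_TO_1975 - 1)

# Rounded imports per 10,000 text words as shown in Figure 7: (1975, 2000).
fig7 = {
    "Icelandic": (5, 12),
    "Faroese": (11, 27),
    "Norwegian": (21, 88),
    "Danish": (31, 82),
    "Sw-Swedish": (33, 70),
    "Fi-Swedish": (32, 68),
    "Finnish": (6, 35),
}


def pct_increase(old, new):
    """Percentage increase from old to new."""
    return 100 * (new - old) / old


observed = {lang: round(pct_increase(a, b), 1) for lang, (a, b) in fig7.items()}

print(f"expected at constant speed: {expected_increase:.0f}%")
for lang, value in sorted(observed.items(), key=lambda item: item[1]):
    print(f"{lang:<11} {value:>6.1f}%")
```

Every language exceeds the constant-speed baseline of about 83%, but by very different margins, which is the contrast the text draws between the slight acceleration in Icelandic, Faroese and Swedish and the sharp one in Finnish and Norwegian.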
As far as the urbanization hypothesis is concerned, Iceland is again the counterevidence because this country is centralized and urbanized, with more than half of its
[Figure 8: bar chart, “Used last week” (percentage). Iceland 74.8; Faroe Islands 43.7; Norway 58.8; Denmark 55.2; Sweden 61.2; Sw-Finland 46.8; Fi-Finland 53.9]
Fig. 8: Percentage of respondents who used English “last week”
population in and around the capital. In contrast, Norway, which is open to imports, has a geographically scattered population.
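The comparison of Figure 8 with Figure 6 made above can be summarised in a single descriptive number. The following sketch computes a Pearson correlation between the share of respondents who used English last week and the average openness rank. Pairing the communities of Figure 8 with the languages of Figure 6 in the order listed is an assumption made here, and with only seven data points the coefficient is purely descriptive, not an analysis carried out in the project.

```python
from math import sqrt

# Order: Icelandic, Faroese, Norwegian, Danish, Sw-Swedish, Fi-Swedish, Finnish.
used_english_last_week = [74.8, 43.7, 58.8, 55.2, 61.2, 46.8, 53.9]  # Figure 8 (%)
average_rank = [1.6, 2.4, 4.2, 5.4, 3.8, 3.6, 2.1]                   # Figure 6


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(used_english_last_week, average_rank)
print(f"r = {r:.2f}")
```

The coefficient comes out weakly negative, in line with the observation that frequent contact with English does not go together with strong English influence.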
5 Perception of foreign influence
5.1 Nordic stereotypes
As mentioned above, influence from the Anglo-American world has been an issue of public discourse since the Second World War, and we may expect a high awareness about this among people. As the Nordic countries have developed collaboration in all societal areas and people have rather good contacts and experience with the neighboring countries, we could expect them also to have an opinion or impression of how each language culture was oriented with respect to imports. We thus asked our informants in the in-depth interviews to rank all relevant languages by the amount of imports and eagerness to replace imports with national words (for instance e-post for e-mail). Figure 9 visualizes the results of lay people’s views on their own and their neighbors’ languages. Figure 9 shows that lay people in all countries tend to rank Danish as the most open language (only people in Finland and Sweden think Swedish is more open). The curve shapes are similar, and the figure can be an illustration of a rather great homogeneity across state borders in how we view the various languages. The figure may inform us to what extent these countries share a common culture, i.e. including common stereotypes. This applies both to the beliefs that Danish is on top and that Norwegian is a fairly puristic language. If we count the averages of stereotypes of each language in Figure 9, we obtain Figure 10. Interestingly, linguists (i.e. experts) also normally share this opinion with laymen about how open or puristic the languages are (Lund 1986:
[Figure 9: chart with one curve per country: Iceland, Faroe Islands, Norway, Denmark, Sweden, Sw-Finland, Fi-Finland]
Fig. 9: Nordic stereotypes of how open the various languages are towards imports
[Figure 10: chart over Icelandic, Faroese, Norwegian, Danish, Swedish and Finnish]
Fig. 10: Averages of stereotypes about the languages
35; Vikør 1995: 181). However, as Figure 2 shows, these stereotypes are not true when we investigate empirical data.
5.2 Conscious (overt) attitudes
In the Gallup Poll, we put these questions (among others) to the respondents:
A. To what degree do you agree with the following claims?
1) “People use far too many English words currently”
2) “One ought to create new [national] words to replace the English words that enter our language”
3) “The best thing should be that English was the common mother-tongue for all human beings”
B. What is your attitude to using English as the working-place language in national enterprises?
When calculating the scores for these general questions, we found an average for each language community, as displayed in Figure 11.
[Figure 11: bar chart for Ic, Fa, No, Da, SS, FS and Fi; bar labels, in no recoverable order: 10.7, 9.7, 9.2, 9, 8.2, 8.4, 9]
Fig. 11: Average of openness in conscious attitudes (i.e. agreement with some positive claims) (Sandøy and Kristiansen 2010: 156)
Danish attitudes are obviously most open in the Nordic countries, and Swedish is second. Icelandic attitudes are most puristic. We can now observe a striking similarity between Figure 11 (conscious attitudes) and Figure 10 (stereotypes). However, these curves do not fit well with the contemporary pattern of purism/openness in language usage, cf. Figure 2. Conversely, the pattern of attitudes and stereotypes corresponds better with the pattern of usage in 1975, cf. Figure 12, which applies first and foremost to the central Scandinavian language ranking, where Norwegian was not an open language. An interesting question to include in the follow-up to this study is whether stereotypes and conscious attitudes represent a time-lagged pattern, i.e. a product of what we experienced a generation ago. An argument supporting such a suggestion of a time-lagged pattern is that both conscious attitudes and stereotypes represent utterances that we control and reflect on, and it would be reasonable if they are based on previous experiences either from language use itself or public discourse. We can therefore expect stereotypes and conscious attitudes to be conservative. Furthermore, conservatism is in this respect synonymous with consistency, and consistency is certainly preferable for our (conscious)
Fig. 12: A comparison of conscious attitudes and stereotypes in 2002 and language use in 1975
self-image. However, we leave this question and further speculations open here. Our results could be interesting in a wider theoretical discussion of the nature of attitudes and awareness.
5.3 Subconscious (covert) attitudes
When respondents consciously offer attitudes, they have the opportunity to interpret a question in context and to respond in accordance with what is politically correct or in accordance with their self-image. Therefore, we found it interesting to collect data on people’s subconscious attitudes when they provide evaluations on the basis of linguistic stimuli without knowing that the test is related to language (Kristiansen 2001, 2006). A large-scale subproject examined people’s subconscious attitudes using a matched/verbal guise test with news reading. The guises were identical texts with different numbers of imports. The context presented to the respondents was that we wanted help to find the best newsreader to be employed at a radio station. This test was performed in various gatherings, meetings and classes, and the number of respondents was well over 600 in each community. At the end of the test situation, the respondents were invited to discuss the voices and the test itself. If we discovered during the discussion that the respondents tended to be sceptical about our alleged purpose and that they had discovered that the study was about language and not the ability to read news, we considered the test in that gathering invalid. An interesting consequence of this method was that our test did not work in Iceland and the Faroes. The respondents in these areas tended to doubt our story quickly and claimed that this study must be about language and imports. Our guises were, so to say, disguised. This experience illustrates how essential language and purism are in the public discourse of these language communities. Following our strict principles, we thus had to discard the Icelandic and Faroese tests in the comparisons across language borders. Figure 13 displays the averages of evaluations of the guises with most English imports. The averages indicate surprising findings based on the background of our results presented above.
On the subconscious level, Norwegians and Finland-Swedes are most positive towards the use of English words, whereas Danes and Finns are most sceptical. The fact that Danes are subconsciously the most sceptical towards imports in Scandinavia is surprising, but it represents a rather normal situation in which conscious and subconscious attitudes provide opposite results. These results do not represent a conflict because we are dealing with different objects of study (i.e. two types of attitudes), and Kristiansen’s view is that subconscious attitudes are the fundamental driving force in language change (Kristiansen 2001, 2009). Norwegians (and Finland-Swedes) are most in favor of imports. This finding corresponds to the situation in our data from 2000, presented in Figure 7. Swedes are significantly below Norwegians both with respect to actual usage in 2000 and their preferences for imports, as shown in Figures 7 and 13.
[Figure 13: bar chart for No, Da, SS, FS and Fi; bar labels, in no recoverable order: 0.1, 0.07, -0.22, -0.44, -0.65]
Fig. 13: National averages of differences between evaluations of texts with and without English imports in news reading (Sandøy and Kristiansen 2010: 157 and Kristiansen 2010: 83)
Danes’ subconscious attitudes deviate fundamentally. Whether this has an effect on their willingness to accept imports can only be tested reliably in the future. However, if there is a connection or effect of subconscious attitudes on language use, this can be indicated by the slow rate of word import in the Danish case, as in the discussion in Section 4 (an increase of only 165% in 25 years in Danish, whereas Norwegian had a 319% increase). The next theoretical question is how subconscious attitudes are related to other concepts in this study. For instance, do subconscious attitudes keep better pace with contemporary trends in usage than conscious attitudes because they cannot be controlled by consciousness and self-image? Furthermore, do subconscious attitudes affect usage, or are they an effect of usage? These issues can only be examined based on historical data in future studies.
6 A macro-economical factor?
In Section 4, we found it difficult to provide a simple explanation for the various purist situations in the Norwegian language community. Therefore, we must further develop our understanding of possible macro-factors. An interesting correlation will be developed into a suggestion about a possible driving force. Sweden demonstrates a falling ranking in foreign word imports from 1975 to 2000, whereas Norway overtakes Denmark and Sweden and climbs to the top. Why do we see two opposite tendencies in these neighboring countries with similar social structures and considerable contact among people? Do our data indicate the nature of the shift in Figure 14?
[Figure 14: chart. Norwegian: 21 (1975), 88 (2000); Danish: 31 (1975), 82 (2000); Sw-Swedish: 33 (1975), 70 (2000)]
Fig. 14: Increase in imports in the three Scandinavian countries
When the situations are tested for statistical significance, the situations in 1975 and 2000 are presented as follows:
1975: Norwegian *** < Danish = Swedish
2000: Swedish *** < Danish = Norwegian.
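How such a significance ranking can arise from rates per 10,000 words may be illustrated with a Pearson chi-square test on a 2×2 table (imports vs. other words in two languages). This is a sketch under loud assumptions: the chapter mentions a Pearson's test but does not report token totals here, so the corpus size of 100,000 words per language is invented, and the resulting p-values are illustrative only, not the project's results. The rates are the 2000 values from Figure 14.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


HYPOTHETICAL_WORDS = 100_000  # assumed corpus size per language, not from the source


def compare(rate_a, rate_b, words=HYPOTHETICAL_WORDS):
    """p-value for two import rates given per 10,000 words."""
    imports_a = round(rate_a * words / 10_000)
    imports_b = round(rate_b * words / 10_000)
    stat = chi_square_2x2(imports_a, words - imports_a, imports_b, words - imports_b)
    return p_value_df1(stat)


p_norwegian_danish = compare(88, 82)   # Figure 14, year 2000
p_norwegian_swedish = compare(88, 70)  # Figure 14, year 2000

print(f"Norwegian vs Danish:  p = {p_norwegian_danish:.3f}")
print(f"Norwegian vs Swedish: p = {p_norwegian_swedish:.2e}")
```

With this assumed corpus size, the Norwegian and Danish rates are not distinguishable while the Norwegian and Swedish rates are; a larger or smaller corpus would shift the p-values, so the outcome depends on the invented token total.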
Our data from newspapers can be broken down based on genre and topic. For genre, the distributional import pattern seems, interestingly, to be similar in 1975 and 2000. Figure 15 demonstrates the Swedish situation during these two years, but the Norwegian and Danish curves show almost the same pattern (Sandøy 2009). This indicates remarkable stability. When the figures are broken down into topics, they demonstrate different patterns, as shown in Figures 16–18. (The topics are lei(sure), dom(estic issues), crim(e),
[Figure 15: chart by genre (feature, interview, editorials, letters, announcements, notices, news, reviews, paragraphs, portraits, reports). 1975: 1.4, 6.27, 14.01, 4.16, 5.07, 15.15, 25.49, 7.63, 3.63, 2.28, 14.91. 2000: 0.19, 6.71, 8.83, 3.7, 5.25, 18.74, 27.14, 6.97, 2.95, 5.01, 14.52.]
Fig. 15: Frequency of imports in various genres in Swedish newspapers (per 10,000 text words)
[Figure 16: bar chart by topic (lea, dom, crim, cul, loc, eco, per, pop, pol, spo, ent, you, for). 1975: 32, 17, 18, 25, 15, 55, 8, 43, 8, 17, 69, 10, 19. 2000: 225, 73, 36, 129, 39, 98, 10, 159, 37, 63, 128, 98, 92.]
Fig. 16: Frequencies of imports in Norwegian newspapers on different topics
[Figure 17: bar chart by topic. 1975 (lea to ent): 0, 42, 14, 30, 18, 22, 32, 30, 21, 20, 80. 2000 (lea to ent): 95, 80, 57, 52, 106, 111, 44, 115, 82, 55, 227. Remaining labels for you and for: 45, 638, 54.]
Fig. 17: Frequencies of imports in Danish newspapers in different topics
[Figure 18: bar chart by topic (lea, dom, crim, cul, loc, eco, per, pop, pol, spo, ent, you, for). 1975: 69, 9, 5, 56, 0, 10, 11, 57, 20, 39, 85, 0, 29. 2000: 85, 32, 21, 59, 32, 140, 32, 115, 72, 52, 117, 246, 34.]
Fig. 18: Frequencies of imports in Swedish newspapers in different topics
cul(ture), loc(al issues), eco(nomy), per(sonal issues), pop(ular science), pol(itics), spo(rt), ent(ertainment), you(th), for(eign policy).) When viewing Figures 16–18, we should consider that the volume of texts is low for some topics. Thus, we ignore results from topics with fewer than 10,000 words. We find obvious changes in the Norwegian newspapers in the topics leisure, culture and foreign countries, where the number of tokens is high enough to be considered interesting (for
246
Helge Sandøy
popular science, the text volume is too low). In Swedish and Danish, we do not find a parallel increase in any topics with acceptable text volumes in either year. A frequently mentioned idea is that the Norwegian oil industry has introduced a dependence on Anglo-American culture and language. If we count occurrences of oil industry imports in 1975 and 2000 in our Norwegian texts, we find that the figure decreased from 14 to 8. Oil import words were typically replaced by Norwegian words over the relevant period, such as pipe carrier, which was one of the 14 occurrences in 1975. We do not find this word in 2000, but an Internet search shows that this type of ship is now called forsyningsskip (for rør). When the oil age began in the late 1970s in Norway, a deliberate effort was made to nationalize relevant terminology. This work has been fairly effective, and today, the oil industry is rather norwegianized. Conversely, Norwegian society was radically transformed during the last two or three decades of the twentieth century. This was evident for ordinary people, and a way of quantifying this change is to look at the Gross National Income per inhabitant. Figure 19 compares Sweden and Norway by calculating the Norwegian GNI per inhabitant as a percentage of the corresponding Swedish GNI.
Fig. 19: Norwegian GNI / inhabitant as a percentage of the Swedish GNI (Cappelen and Larsen 2005: 9)
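The measure plotted in Figure 19 is a simple ratio, and a minimal sketch may make it concrete. The function names and the input values below are assumptions introduced for illustration; the numbers are invented placeholders and are not the data from Cappelen and Larsen (2005).

```python
from typing import Dict, Optional, Tuple


def relative_gni(gni_a: float, gni_b: float) -> float:
    """Return gni_a expressed as a percentage of gni_b."""
    if gni_b <= 0:
        raise ValueError("reference GNI must be positive")
    return 100.0 * gni_a / gni_b


def first_period_above_parity(
    series: Dict[int, Tuple[float, float]]
) -> Optional[int]:
    """Return the earliest period in which the first value exceeds the second.

    `series` maps a period label to a (country_a, country_b) pair of
    GNI-per-inhabitant values. Returns None if country_a never takes the lead.
    """
    for period in sorted(series):
        gni_a, gni_b = series[period]
        if relative_gni(gni_a, gni_b) > 100.0:
            return period
    return None


# Invented placeholder values (Norway, Sweden) for three abstract periods.
# They only illustrate the calculation and are NOT taken from the source.
PLACEHOLDER = {1: (80.0, 100.0), 2: (100.0, 100.0), 3: (110.0, 100.0)}

if __name__ == "__main__":
    for period in sorted(PLACEHOLDER):
        norway, sweden = PLACEHOLDER[period]
        print(period, relative_gni(norway, sweden))
    print("first period above parity:", first_period_above_parity(PLACEHOLDER))
```

A value below 100 means that the first country lags behind the second; a value above 100 means that it has taken the lead.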
Figure 19 shows clearly that, historically, Sweden was ahead of Norway in economic development. Around 1980, when the oil age began, this situation suddenly changed: the Norwegian GNI caught up with the Swedish one within a decade and took the lead in the 1990s. This economic growth was primarily produced by the oil industry, but it also stemmed from the fishing industry. It has triggered a Norwegian feeling of being modernized both socially and culturally. Such effects can easily be illustrated by comparative data from the Scandinavian countries. For education, for instance, 16.7% of Norwegian cohorts take exams from universities or university colleges, whereas the corresponding values are 15.3% in Sweden and 15.0% in Denmark. In Norway, 80% of the population takes holidays abroad; the corresponding figure is 60% for both Denmark and Sweden. The figures on holidays indicate that GNI may also influence culture: being able to afford holidays abroad now plays a more essential role in people’s lives and experiences. Above, we noted a surprising increase of imports in Norwegian articles on leisure and foreign countries, two topics that may reflect the influence of these new habits.

Modern general welfare and individual prosperity have extensive effects on culture; this may be assumed, and it seems reasonable. So far, however, our theoretical framework lacks the elements and details needed to trace the indirect relationship that runs from GNI growth, via a feeling of modernization, a new construction of identity and a new way of life, to people’s inclinations in choosing their words and in choosing between structural adaptation and non-adaptation.
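The increase noted above for leisure and foreign countries can be made concrete with the Norwegian values read from Figure 16. The sketch below computes, for each topic, how many times higher the 2000 value is than the 1975 value. The function names are assumptions introduced here, and the text-volume filter described earlier (ignoring topics with fewer than 10,000 words) is not applied, because the figure does not report text volumes; topics such as youth and popular science may therefore rank high on very little text.

```python
from typing import Dict, List, Tuple

# Import frequencies in Norwegian newspapers, (1975, 2000), as read from Figure 16.
FIG16_NORWEGIAN: Dict[str, Tuple[int, int]] = {
    "leisure": (32, 225),
    "domestic issues": (17, 73),
    "crime": (18, 36),
    "culture": (25, 129),
    "local issues": (15, 39),
    "economy": (55, 98),
    "personal issues": (8, 10),
    "popular science": (43, 159),
    "politics": (8, 37),
    "sport": (17, 63),
    "entertainment": (69, 128),
    "youth": (10, 98),
    "foreign policy": (19, 92),
}


def growth_factors(freqs: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the ratio of the later value to the earlier value for each topic.

    Topics whose earlier value is zero are skipped, since the ratio is undefined.
    """
    return {
        topic: later / earlier
        for topic, (earlier, later) in freqs.items()
        if earlier > 0
    }


def rank_by_growth(freqs: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the topics ordered from the largest to the smallest growth factor."""
    factors = growth_factors(freqs)
    return sorted(factors, key=factors.get, reverse=True)


if __name__ == "__main__":
    factors = growth_factors(FIG16_NORWEGIAN)
    for topic in rank_by_growth(FIG16_NORWEGIAN):
        print(f"{topic}: x{factors[topic]:.2f}")
```

On these values, leisure, culture and foreign policy all show more than a fourfold increase, whereas economy and entertainment less than double.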
7 Conclusion

The project Modern import words in the languages in the Nordic countries has provided large databases of information on imports and on attitudes towards foreign influence on the mother tongue. From these empirical data, we have been able to describe the situation in each country using a method that allows precise comparison and therefore studies leading to general insights into factors and effects in language culture. This article is a short presentation of the design of the project and of some main results. Based on our empirical data, we have been able to correct the general description (stereotypes) of the situation in our communities, and we have documented that language purism has different structural expressions in different languages.

How far we can explain the different situations in the seven language communities remains unclear. No single factor seems to correlate perfectly with our descriptive data. Perhaps the most important benefit of our study is that we have been able to discard some previous explanations, and that our different data types have allowed us to develop theories about the relations between language usage, societal conditions, discourse, conscious attitudes and subconscious attitudes. We presented some macro-economic data to examine whether rapid economic modernization has some explanatory power for the interesting contrasts between the central Scandinavian languages. This is a possible direction for understanding the macro-conditions for developing language culture, but solid and comparable data are needed to shed light on the many stages in the indirect relations between GNI and puristic vs. open language practice.
Helge Sandøy
References

Cappelen, Ådne & Erling R. Larsen. 2005. Økonomisk utvikling og verdiskaping [Economic growth and wealth creation]. In Ragnhild Rein Bore (ed.), Hundre års ensomhet? Norge og Sverige 1905–2005, 6–14. Oslo: Statistisk sentralbyrå.
Chrystal, Judith-Ann. 1988. Engelskan i svensk dagspress [English influence on Swedish newspapers]. (Skrifter utgivna av Svenska språknämnden 74.) Göteborg: Esselte studium.
Giles, Howard, Justine Coupland & Nikolas Coupland. 1991. Contexts of accommodation: Developments in applied sociolinguistics. Cambridge: Cambridge University Press.
Graedler, Anne-Line. 1998. Morphological, semantic and functional aspects of English lexical borrowings in Norwegian. (Acta Humaniora 40.) Oslo: Universitetsforlaget.
Graedler, Anne-Line. 2007. MIN-prosjektet jamført med tidligere studier [The MIN project compared to previous studies]. In Bente Selback & Helge Sandøy (eds.), Fire dagar i nordiske aviser. Ei jamføring av påverknaden i ordforrådet i sju språksamfunn [Four days in Nordic newspapers. A comparison of influence on the vocabulary of seven language communities] (Moderne importord i språka i Norden 3), 157–172. Oslo: Novus.
Jacobsen, Jógvan í Lon. 2012. Ærligt talt, who cares? En sociolingvistisk undersøgelse af holdninger til og brug af importord og afløsningsord i færøysk [A sociolinguistic study on attitudes to and use of import words and replacement words in Faroese]. (Moderne importord i språka i Norden 13.) Oslo: Novus.
Jarvad, Pia. 1995. Nye ord – hvorfor og hvordan? [New words – why and how?] København: Gyldendal.
Jarvad, Pia & Helge Sandøy (eds.). 2007. Stuntman og andre importord i Norden. Om udtale og bøjning [Stuntman and other import words in the Nordic countries. On pronunciation and inflection]. (Moderne importord i språka i Norden 7.) Oslo: Novus.
Kramer, Johannes. 1983. Language planning in Italy. In István Fodor & Claude Hagège (eds.), Language reform: History and future, Vol. 2, 301–316. Hamburg: Buske.
Kristiansen, Tore. 2001. Two standards: One for the media and one for the school. Language Awareness 10 (1). 9–24.
Kristiansen, Tore. 2009. The macro-level social meanings of late-modern Danish accents. Acta Linguistica Hafniensia 41. 167–192.
Kristiansen, Tore. 2010. Conscious and subconscious attitudes towards English influence in the Nordic countries: Evidence for two levels of language ideology. In Tore Kristiansen & Helge Sandøy (eds.), The linguistic consequences of globalization: The Nordic Countries (International Journal of the Sociology of Language 204), 59–96. Berlin & New York: Mouton de Gruyter.
Kristiansen, Tore (ed.). 2006. Nordiske sprogholdninger. En masketest [Nordic language attitudes. A matched guise test]. (Moderne importord i språka i Norden 5.) Oslo: Novus.
Kristiansen, Tore & Lars S. Vikør (eds.). 2006. Nordiske språkhaldningar. Ei meiningsmåling [Nordic language attitudes. A Gallup Poll]. (Moderne importord i språka i Norden 4.) Oslo: Novus.
Kvaran, Guðrún (ed.). 2007. Udenlandske eller hjemlige ord? En undersøgelse af sprogene i Norden [Foreign or national words? A study on the languages of the Nordic countries]. (Moderne importord i språka i Norden 6.) Oslo: Novus.
Ljung, Magnus. 1985. Lam anka – ett måste? En undersökning av engelskan i svenskan, dess mottagande och spridning [A study on English influence on Swedish, its reception and diffusion]. (EIS Report No. 8.)
Ljung, Magnus. 1988. Skinheads, hackers & lama ankor. Engelskan i 80-talets svenska [English influence on Swedish in the 1980s]. Stockholm: Trevi.
Lund, Jørn. 1986. Det sprogpolitiske klima i de nordiske lande [The language political climate of the Nordic countries]. Sprog i Norden. 17–30.
Mattfolk, Leila. 2011. Attityder till det globala i det lokala. Finlandssvenskar om importord [Attitudes towards the global in the local community. Finland-Swedes on import words]. (Moderne importord i språka i Norden 12.) Oslo: Novus.
Nyström Höög, Catharina. 2005. Teamwork? Man kan lika gärna samarbeta! Svenska åsikter om importord [Swedish opinions on import words]. (Moderne importord i språka i Norden 9.) Oslo: Novus.
Óladóttir, Hanna. 2009. Shake, sjeik eller mjólkurhristingur? Islandske holdninger til engelsk språkpåvirkning [Icelandic attitudes towards English language influence]. (Moderne importord i språka i Norden 11.) Oslo: Novus.
Omdal, Helge. 2008. Tilpassing eller ikke tilpassing? Sammenlikning og resultater [Adaptation or non-adaptation? Comparison and results]. In Helge Omdal & Helge Sandøy (eds.), Nasjonal eller internasjonal skrivemåte? Om importord i seks nordiske språksamfunn [National or international spelling? On import words in six Nordic language communities] (Moderne importord i språka i Norden 8), 163–186. Oslo: Novus.
Ottósson, Kjartan. 1997. Purisme på islandsk [Purism in Icelandic]. Purisme på norsk? (Norsk språkråds skrifter 4), 31–37. Oslo: Norsk språkråd.
Sandøy, Helge. 2007. Avisspråket i Norden – ei jamføring. In Bente Selback & Helge Sandøy (eds.), Fire dagar i nordiske aviser. Ei jamføring av påverknaden i ordforrådet i sju språksamfunn [Four days in Nordic newspapers. A comparison of influence on the vocabulary of seven language communities] (Moderne importord i språka i Norden 3), 127–155. Oslo: Novus.
Sandøy, Helge. 2009. Forskjellar i likskapane? Om importord i skandinavisk [Differences in the similarities? On import words in the Scandinavian languages]. In Henrik Hovmark, Iben Stampe Sletten & Asgerd Gudiksen (eds.), I mund og bog. 25 artikler om sprog tilegnet Inge Lise Pedersen på 70-årsdagen d. 5. juni 2009, 263–276. København: Afdeling for Dialektforskning.
Sandøy, Helge & Jan-Ola Östman (eds.). 2004. “Det främmande” i nordisk språkpolitik. Om normering av utländska ord [The ‘foreign’ in Nordic language policy. On the codification of foreign words]. (Moderne importord i språka i Norden 2.) Oslo: Novus.
Sandøy, Helge & Tore Kristiansen. 2010. Conclusion. Globalization and language in the Nordic countries: Conditions and consequences. In Tore Kristiansen & Helge Sandøy (eds.), The linguistic consequences of globalization: The Nordic Countries (International Journal of the Sociology of Language 204), 151–159. Berlin & New York: Mouton de Gruyter.
Sharp, Harriet. 2001. English in spoken Swedish: A corpus study of two discourse domains. (Stockholm Studies in English 95.) Stockholm: Almqvist & Wiksell International.
Thøgersen, Jacob. 2007. Det er meget godt som det er ... er det ikke? En undersøgelse af danskernes holdninger til engelsk [A study of the Danes’ attitudes towards English]. (Moderne importord i språka i Norden 10.) Oslo: Novus.
Thomas, George. 1991. Linguistic purism. London & New York: Longman.
Trudgill, Peter. 2000. Sociolinguistics. An introduction to language and society (4th edn.). London: Penguin.
Vestbøstad, Per. 1989. Nynorsk frekvensordbok [Dictionary of word frequencies in Nynorsk]. Bergen: Alma Mater.
Vikør, Lars S. 1995. The Nordic languages. Their status and interrelations (2nd edn.). Oslo: Novus.
Index

abstractness 174
anglicism see English loanword
association overlap 174, 178, 181–183, 184–185
attitudes
– conscious see overt attitudes
– covert 10, 228, 242–243
– overt 228, 240–242
– subconscious see covert attitudes
bilingual 195, 213, 215–216
– balanced 173
– corpus 22, 31–32, 34
– memory 173
– punning 45
– speech 22, 29
bilingualism 44
borrowability 4–5, 8, 20–22, 41, 42–44, 46, 61–62
borrowing 31, 66–67, 72–73, 75, 77–78, 87, 91–92, 96, 144, 167
– and codeswitching 3–4, 20–22, 29, 35–36, 44–45, 193, 213–216
– cultural 7
– grammatical 23
– lexical 1, 2–5, 21, 34–36
– nonce 209
– phraseological 45
bridging contexts 65, 88
catchphrase 46, 47–48, 49–50, 61–62
CFM see Conceptual Feature Model
codeswitching 20, 22
– bilingual 19
– insertional 22, 28–29
cognateness 174–175, 180–181, 183–184
Cognitive Sociolinguistics 20–21, 30, 42–44, 176
Conceptual Feature Model 173–174, 176, 184
concreteness 175, 180–181, 183–184
conventionalization 19, 26–28, 29–30, 68
– community 33
corpus-based sociolectometry 102, 108–109, 130–131
entrenchment 24, 26–28, 29–30, 34–36, 46, 50–51, 55–56, 59, 108
– individual 33
– media 51, 59
ethnicity 147, 150, 152, 154–158
– interlocutor 147, 166
European and Brazilian Portuguese 102, 103–105, 130–131
gender assignment 3, 77, 83, 192–194, 201, 213–215
Generalized Linear Mixed Model 151–153, 155, 163
GLMM see Generalized Linear Mixed Model
import word see loanword
job advertisements 171–173, 176–178, 184–185
Kendall’s Tau 160–161
language
– change 19, 21, 23–24, 27, 28, 36, 46, 89, 242
– culture 238, 239, 247
– dominant 31, 145, 229
– donor 1, 20, 46, 66, 74, 110, 172, 213, 214, 230
– export see donor language
– host 1, 2, 3, 7, 20, 43
– minority 31
– receptor see host language
– source see donor language
– subdominant 145
loan translation 8, 20, 34, 36, 45, 117
loanword 2, 5, 19, 22–23, 30–31, 172, 174–175, 184–185, 226, 230
– adaptation 3, 4, 65, 71, 77, 79–86, 96–97, 109, 225
– diffusion of 25, 32–33
– established 29, 148, 209, 213
– English 9, 42–43, 104, 113–117, 124–126, 172, 179, 183–185
– integration see loanword adaptation
– research 1, 2, 65–66, 74, 108, 176, 193, 215–216
logistic regression analysis 42, 53, 54–55
macro-economy 243–247
mass media 41, 46, 62
multi-word unit 1, 4, 7–8, 21–22, 33, 45
New Zealand English 145–147, 149, 166–167
New Zealand English Press Corpus 147, 150
New Zealand identity 147, 164, 166
noun 4, 21, 22, 192–193, 198, 199, 213–216, 232
NZEPC see New Zealand English Press Corpus
onomasiology 6–7, 77, 78–79, 86, 92–93, 102, 107
onomasiological profile 101, 106, 108–113
phraseology 4, 5, 8, 42, 45, 46, 49–50, 62
pragmatic effects 66, 72–75, 92–96
purism 25, 104–105, 119, 228–230, 235, 247
– exoglottic 235
– morphological 233–234
– orthographic 233–234
quantitative analysis 5, 34, 45, 143–144, 225
Revised Hierarchical Model 173, 178
RHM see Revised Hierarchical Model
semantic change 66, 72–75, 78, 87–92
semasiology 6–7, 77–79, 87
set-external proof 7, 46
stereotypes 232, 239–241, 247
synonyms 102, 106, 107, 180
Te Reo Māori 144, 164
usage-based approach 20, 23–24, 26, 28–30, 34, 66, 75, 85, 92
variable-based neighbor clustering 161
variance 66–72, 79, 85, 96
variation
– lectal 9, 60–61, 108–109
– onomasiological see onomasiology
– semasiological see semasiology
VNC see variable-based neighbor clustering
Wellington Corpus of Spoken New Zealand English 147, 148, 149, 151
WSC see Wellington Corpus of Spoken New Zealand English
word class 4, 232