Codeswitching Worldwide: II 9783110808742, 9783110167689



English Pages 377 [380] Year 2000



Codeswitching Worldwide II


Trends in Linguistics Studies and Monographs 126

Editor

Werner Winter

Mouton de Gruyter Berlin · New York

Codeswitching Worldwide II

edited by

Rodolfo Jacobson

Mouton de Gruyter Berlin · New York

2001

Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.

∞ Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data Codeswitching worldwide II / edited by Rodolfo Jacobson. p. cm. - (Trends in linguistics. Studies and monographs ; 126) Chiefly papers presented at the 14th World Congress of Sociology, held 1998, University of Montreal. Includes bibliographical references and index. ISBN 3-11-016768-9 (alk. paper) 1. Codeswitching (Linguistics) - Congresses. I. Title: Codeswitching worldwide 2. II. Title: Codeswitching worldwide two. III. Jacobson, Rodolfo. IV. World Congress of Sociology (14th : 1998 : University of Montreal) V. Series. P115.3 .C646 2000

00-033864

Die Deutsche Bibliothek - Cataloging-in-Publication Data Codeswitching worldwide / ed. by Rodolfo Jacobson. - Berlin ; New York : Mouton de Gruyter 2. - (2001) (Trends in linguistics : Studies and monographs ; 126) ISBN 3-11-016768-9

© Copyright 2000 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher. Printing and Binding: Hubert & Co., Göttingen. Cover design: Christopher Schneider, Berlin. Printed in Germany.

Contents

Introduction
Rodolfo Jacobson

Section 1
Theoretical issues revisited

The matrix language frame model: Development and responses
Carol Myers-Scotton

Language alternation: The third kind of codeswitching mechanism
Rodolfo Jacobson

Section 2
Linguistic aspects: From morphosyntax to semantics

Contrastive sociolinguistics: Borrowed and codeswitched past participles in Romance-Germanic language contact
Jeanine Treffers-Daller

Functional categories and codeswitching in Japanese/English
Shoji Azuma

Linguostatistic study of Bulgarian in the Ukraine
Ol'ga S. Parfenova

The role of semantic specificity in insertional codeswitching: Evidence from Dutch-Turkish
Ad Backus

Section 3
Codeswitching as oral and/or written strategy

Oral and written Assyrian-English codeswitching
Erica McClure

Written codeswitching: Powerful bilingual images
Cecilia Montes-Alcalá

Section 4
Emergence of new ethnicities

Talking in Johannesburg: The negotiation of identity in conversation
Robert K. Herbert

Codeswitching in the language of immigrants: The case of Franbreu
Miriam Ben-Rafael

Section 5
Communication codes in education

Towards a new understanding of codeswitching in the foreign language classroom
Diana-Lee Simon

References

Index

Introduction Rodolfo Jacobson

The advancement, over the last four years, in the field of codeswitching has suggested that a follow-up volume of the Editor's Codeswitching Worldwide (Mouton de Gruyter, 1998) is in order and that the present publication of Codeswitching Worldwide II will allow the interested reader to realize to what extent scholars have come to grips with the alternation between two languages as an ordered phenomenon of language use. The core of the studies included in this volume consists again of papers that were delivered at the World Congress of Sociology, the fourteenth congress held in 1998 at the University of Montreal in Canada. In addition to the papers selected, a few additional papers have here been incorporated as they closely relate to the very topics discussed at the session Languages, codes and codeswitching, chaired by the Editor of the present volume. The international focus has once more been one of the major goals in our selection and the authors, hailing from six different countries, discuss bilingual language use in over a dozen different settings in such diverse areas as France (including Alsace), Israel, Japan, Malaysia, Mexico, the Netherlands, Russia, Central and South Africa, Spain, and Turkey. The volume has been subdivided into five main sections: Section 1: Theoretical issues revisited, Section 2: Linguistic aspects: from morphosyntax to semantics, Section 3: Codeswitching as oral and/or written strategy, Section 4: Emergence of new ethnicities and Section 5: Communication codes in education. These sections are followed by a general bibliography compiled from the individual references supplied by each author and by a well-organized index to assist the reader in locating the terms and topics that relate to the various chapters. The objective of the present introduction is to highlight some of the issues discussed in the chapters that follow.

The volume starts out with a contribution from Carol Myers-Scotton entitled The matrix language frame model: Development and responses. The inclusion of Myers-Scotton's study in this volume is in no way accidental as her research in the field of codeswitching has been remarkable in the sense that her work has raised the theoretical level of investigation to a new, higher plane. The placement of her chapter at the beginning of this anthology is then intended to set the framework of what codeswitching means in the eyes of sociolinguistic scholars today. Her chapter is quite unique in that she takes a step backward to reexamine, as if an outsider, her own findings over the years and tries to deal - in a most scholarly way - with some of the issues or critiques of her work that colleagues in the field have raised in their books, articles or even personal conversations. Finally, she advances as a bonus some thoughts on the refinement of her Matrix language frame model, a new model that she calls the 4-M model. In the introductory part of her chapter, Myers-Scotton defines the two basic hierarchies of classical codeswitching, the Matrix language vs. embedded language opposition and the Content morpheme vs. system morpheme opposition, and refers then to some more recent elaborations on these hierarchies in publications by herself or in collaboration with Janice Jake. Thereafter, Myers-Scotton elaborates on the findings reported in each of these publications. Later in this chapter, she explores the notion of congruence checking which goes beyond her earlier blocking hypothesis. Congruence, she alleges, is apparent at the level of abstract entries in the mental lexicon known as lemmas and each lemma must be checked for congruence at various abstract levels. In one of her subsections, Myers-Scotton discusses several compromise strategies, two of which are embedded language islands and bare forms, strategies that she illustrates with examples from Swahili-English.
She then elaborates on how the original Matrix language frame model has been modified. Several parts of that model had in fact been clarified in the revised edition of Duelling languages, certain claims revised and the notion of Composite matrix language introduced. Myers-Scotton ties the model to the abstract level of linguistic competence, thus showing her indebtedness to Chomskyan postulates. Most important, however, is another subsection in which the author clarifies that her unit of analysis is neither the sentence nor the clause but the Projection of complementizer, for short CP. Myers-Scotton then shows how a CP may qualify as a bilingual unit and the reader will realize that "a bilingual CP contains minimally a mixed constituent or at least one embedded language island". The notion of classic codeswitching becomes clearer when Myers-Scotton compares it to composite matrix language where the latter allows for a degree of convergence toward either the matrix or the embedded language. The very notion of matrix language becomes significant when Myers-Scotton specifies its nature as an abstract frame rather than an actual language event. In other words, matrix language exists only as a morpho-syntactic abstraction. Here she distances herself from her earlier conception of identifying the matrix language on the basis of a frequency metric alone. The new attempt at subcategorizing system morphemes now leads Myers-Scotton to describe, in the newly conceived 4-M model, a four-way distinction that preserves the content vs. system morpheme opposition but subdivides the system morpheme category into three types: early system morphemes, bridge late system morphemes and outsider late system morphemes. In other words, one is here dealing with two different levels of morphemes, one that distinguishes between content and system morphemes and another that subcategorizes system morphemes into three different types as seen in the following diagram:

    Content  -  System
              /    |    \
         Early  Bridge late  Outsider late

This subdivision of system morphemes, Myers-Scotton argues, "allows for a fuller explanation of why certain types of congruence problems arise in codeswitching". It is in particular here that Myers-Scotton returns to the issues or critiques referred to above in order to show how some queries by sociolinguistic scholars can be satisfied on the basis of a subcategorization of system morphemes. In fact, she takes great pains to respond to the various arguments of experts like Bentahila-Davies, Poplack, and Boumans and seems to settle the issues raised quite satisfactorily. On the other hand, it should be interesting to learn to what extent the objections of these scholars have now been withdrawn or mitigated. In the final portion of the chapter where she discusses the composite matrix language, Myers-Scotton attempts to clarify further how the abstract lexical structure from more than one variety is involved in building the frame and how these levels of structure can actually be split and recombined. In summary, the discussions in this chapter are highly informative elaborations on the status of today's codeswitching theories and it is particularly gratifying how Carol Myers-Scotton addresses the issues raised by some of the colleagues in the field and refines the model that may solve several legitimate concerns on how to analyze the global phenomenon of code alternation.
Compared to Myers-Scotton, who covers her research and that of her associates during almost two decades, the following chapter is of a more limited scope. Rodolfo Jacobson, who is also Editor of the present volume, basically addresses one single issue alone, that of whether, in addition to matrix-embedded language constructs, one can also make a case for another type of codeswitching strategy. Jacobson's study entitled Language alternation: the third kind of codeswitching mechanism is included in this section because of his concern for theoretical issues, even though his approach to codeswitching stresses to a large degree its pragmatic nature and its sociocultural significance. In his introduction, Jacobson refers to a number of studies, his own as well as those of others, that point to the fact that the two participating languages in bilingual discourse may at times play equal roles in the unfolding of the message rather than functioning in a superordinate-subordinate relationship and supports this assertion with codeswitching data from English-Spanish and English-Malay discourse.
Jacobson then suggests that data of this sort give credence to the fact that the so-demonstrated notion of equality actually points to a third kind of codeswitching mechanism, one that, following Bentahila and Davies, he calls language alternation. On the other hand, Jacobson does recognize the fact that the most common type of switching is "one in which one language occupies a dominant position and the other is subordinated to the former". He cites to this effect additional examples from English-Spanish and English-Malay and reminds the reader that he had suggested - as early as in 1983 - a crude form of frame analysis where an imaginary frame would allow chunks of the dominating language [matrix], joined by chunks of a subordinated language [embedding], to be embedded in such a frame. Since Myers-Scotton conceptualized at a later time her Matrix language frame model, Jacobson emphasizes some of the differences between these two conceptualizations, in particular the fact that he uses the actual sentence as unit of analysis, whereas Myers-Scotton uses the CP, that is, the projection of complementizer. Another difference can be seen in the fact that in Myers-Scotton's view only two mechanisms can operate, whereas for Jacobson there are three of them. In this context, Jacobson makes ample reference to the work of Abdelâli Bentahila and Eirlys Davies when he continues to stress the importance of the kind of codeswitching that reveals an equal relationship between the two participating languages. Later, he reports on his own research in Malaysia and cites various English-Malay utterances that he alleges are valid examples of language alternation. The criteria to identify language alternation used by Bentahila-Davies are subsequently expanded and concrete steps are suggested that would identify some bilingual discourse as such. The sociocultural implications of finding instances of language alternation are then discussed and Jacobson offers a panoramic view of Malaysia's language situation. Although the cited country had ruled, at the time of its independence, that Bahasa Malaysia would be the national language, English, the language of its colonial predecessor, is still present in many language events. Jacobson argues therefore that language alternation might be a feasible way of allowing both languages to co-exist as, in this language mixing strategy, Malay would never occupy a subordinate position in regard to English. A subordination of the Malay language would certainly run counter to the political demands of this independent nation.
Section Two focuses more directly on the linguistic aspects of language contact.
Its four chapters encompass such diverse issues as morphosyntactic change from a historical perspective, switchability of items at the synchronic level, lexical innovations as a result of cultural thrust, and the role of semantic specificity in the adoption of host language lexemes by a migrant population. More specifically, Jeanine Treffers-Daller argues in her chapter entitled Contrastive linguistics: Borrowed and codeswitched participles in Romance-Germanic language contact that her contribution arises from "a comparison of the linguistic consequences of language contact between the Germanic and Romance language varieties" spoken along the linguistic frontier. She touches upon a vital argument that, before her, had already been considered by Muysken, namely whether the patterns observed are due to structural differences between the languages or are merely the result of a series of sociolinguistic factors and characteristics of interlocutors. Treffers-Daller then points to the limitation of actual typological differences between the Romance and Germanic varieties but still recognizes the impact of such sociolinguistic variables as the amount of support for a given variety and the general attitude toward these variables. Treffers-Daller roughly bases her study on a language contact model developed by Thomason and Kaufman, a model that she considers a powerful tool for describing contact differences. However, she finds two exceptions to the similarities of otherwise parallel situations, those of Brussels and Strasbourg. In describing these situations, she deviates to some extent from the general focus of this volume, which is codeswitching, and proceeds to describe one single grammatical characteristic that has resulted from the borrowing patterns of the speakers of French and of two Germanic varieties (Dutch and Alsatian) in the formation of past participles. The nature of this Romance-Germanic merger in single words is however close enough to the codeswitching process as to allow the inclusion of her study in the present volume. At the beginning of the chapter, Treffers-Daller provides some basic notions on borrowing and interference through shift, here drawing on the earlier work of Weinreich as well as the more recent elaborations by Thomason and Kaufman.
Her definitions of these two types of externally motivated language change help the reader conceptualize her arguments and it is, in particular, the discussion of the five levels of Thomason and Kaufman's interference scale that clearly shows how shift can be measured quantitatively. Treffers-Daller then applies the model to data from her and Gardner-Chloros' database and formulates three hypotheses that lead her to conclude that the data from Brussels and Strasbourg may not lend support to the claim that "the sociolinguistic history of the speakers is the primary determinant of the linguistic outcome of language contact". It is the structure of the languages that plays the more important role, an argument that she corroborates with findings from Brussels Dutch and Alsatian, even though she restricts these to the formation of the past participles that show different integration strategies, some having integrated and others, unintegrated forms.
In the following chapter entitled Functional categories and codeswitching in Japanese/English, Shoji Azuma invites the reader to consider, within the context of universal grammar, what is switchable in human languages. Although he considers this issue from the vantage point of Japanese-English codeswitching, he hopes that his findings can be generalized to apply to all codeswitching events. The Principles and parameters approach on which his study is based is defined in the introductory part of the chapter and a functional parametrization hypothesis is subsequently framed in order to then formulate the constraint that closed class items cannot be switched. The switching of open class items, Azuma alleges, is widely illustrated in the literature and the dichotomy between these two classes can be related to some recent discussions by Chomsky on lexical vs. functional categories. He then argues with Fukui that only functional, that is, [+F] elements are subject to parametric variation. Azuma elaborates on this point that according to the suggested hypothesis "if an element has the feature [+F], then it is parameterized for its specific language and (it) is not interchangeable". If this hypothesis proves valid, then only [-F] elements in the lexicon can participate in codeswitching. In the main part of this chapter, Azuma discusses the switchability in Japanese of the four non-functional elements of the lexicon, noun (N), verb (V), adjective (A) and pre/postposition (P). N is recognized as the single most commonly switched element but V, A and P require extensive discussion in order to determine under what circumstances switching becomes viable.
Then, Azuma turns his attention to the functional categories agreement (AGR), tense (T), determiner (D), and complement (C) and demonstrates that neither T nor C switches have ever been accounted for and that AGR and D, both absent from the Japanese language, cannot even be investigated. One isolated example of D (determiner) insertion however seems to suggest that D may be less resistant to codeswitching. At the end of the chapter, Azuma provides the reader with Fukui's feature specification of functional categories which shows [+N] to be specified for both AGR and D, a fact that may explain a slight potential for switchability.

The following chapter takes the reader back to the European continent, not to Treffers-Daller's Romance-Germanic borderland but to the eastern frontier. Olga S. Parfenova acquaints the reader in her chapter entitled Linguostatistic study of Bulgarian in Ukraine with a comprehensive picture of the survival of the Bulgarian language in the Ukraine and the area of the Sea of Azov. The language has undergone there a great deal of relexification as a result of the strong Russian influence, mainly during the communist era. Parfenova makes an interesting distinction between discourse mode and discourse strategy where the former reflects the nature of intra-ethnic communication alone. However, she points to the difficulty of determining the extent to which borrowings from Russian occur as the close genetic relationship between the two languages often obscures the difference between varieties. Parfenova then provides a careful description of the background of the language situation of the area in order for the reader to capture the relationship between language and ethnicity. The following discussions deal with the description of data and methodology where she specifies four types of text units on the basis of Gerov's Bulgarian word level data. Her overall approach of analysis follows the linguostatistic model and Parfenova takes pains to describe how the idiolectal and sociolectal data have actually been collected. The quantitative characteristics of Russian words in Bulgarian speech are given in percentages and also shown in diagrams in order to allow the reader to gain an insight into the language situation of two groups of Bulgarians in the context of their use of russisms. Parfenova's study becomes more specific on this issue when she discusses the functional characteristics of the cited russisms, especially where she gives examples of codeswitched utterances and distinguishes there Bulgarian lexemes from non-adapted [bold] and adapted [capitals] Russian borrowings.
Furthermore, Parfenova discusses temporal, locality as well as several modal expressions which all show the interesting mosaic of the preference of Bulgarian items for certain expressions and of Russian expressions for others. Parfenova concludes her chapter qualifying the language use of Bulgarians in the Ukraine as a codeswitched mode with primarily Bulgarian characteristics as Russian constitutes a presence of reasonably small percentage in intra-ethnic communication, mostly in non-adapted form. She ascribes the presence of whatever Russified variants that are found

in local speech to the influence of the socially active population. As for the overall process of relexification, Parfenova suggests that the total absence of Bulgarian instruction in local schools is the main cause of the presence of Russian words in Bulgarian speech. The following chapter contains some similarities - at least on its surface - with the two preceding chapters of this section in the sense that its author also focuses on the insertion of embedded lexemes, a field of study that he calls insertional codeswitching. On the other hand, Ad Backus investigates in his contribution to the volume entitled The role of semantic specificity in insertional codeswitching: Evidence from Dutch-Turkish a setting that is quite different from those of the two earlier chapters. In effect, Backus' data describe the clash of two cultures (Turkish/Dutch) that has resulted, not from the contact between resident cultures as discussed by Treffers-Daller and Parfenova, but from recent waves of migration to a distant country, Holland. Backus argues in the introduction to his chapter that the reasons for lexical borrowing have rarely been studied and yet a semantic-pragmatic study of this nature is a valuable undertaking, since it sheds light on the process of lexical renewal. The purpose of such a study, he states, is "to develop some ideas what it is that makes a content word borrowable" and after studying the switching strategies of Turkish immigrants, he suspects that it is the high degree of specificity that seems to stimulate insertional codeswitching. In his section on Specificity, Backus suggests that semantic congruence is higher for general concepts and requires little if any embedded language words. Words, however, that are not semantically equivalent in the two languages are more prone to be used within a matrix language utterance and several semantic-pragmatic factors may indeed underlie the selection of embedded words.
The reader may find it profitable at this point to interrelate Azuma's notion of switchability and Backus' thoughts on semantic specificity. Both address the issue of what can be borrowed in codeswitched discourse, but one does so with a syntactic and the other, with a semantic perspective in mind. In a later segment of the chapter, Backus suggests a tentative definition of what he calls the Specificity hypothesis and makes a clear distinction between the inherent semantics of lexical elements and referential specificity. His primary argument for insertional codeswitching is that "borrowing

speakers only take from another language what they need" in order to fill the gaps in their discourse. Furthermore, specificity is best seen in gradient terms so that the terms high specificity and low specificity can be equated to higher-level and basic-level vocabulary. Backus then attempts to give a pragmatic definition of the mentioned notion in the sense that certain semantic fields or even certain topics stimulate insertional codeswitching. "If semantic domain," he argues, "is an important predictor of switches, then it must be part of our definition of specificity." The preceding arguments lead Backus to now focus on the main issue of the chapter, i.e., the semantic specificity in Turkish-Dutch codeswitching. At this point, he describes the data studied and the roles assigned to semantic domains and provides the reader with a summary of the findings. In the course of his deliberations, Backus upgrades his semantic domain hypothesis when he refers more specifically to the embedded language semantic domain. In later subsections he deals with the selection of embedded language elements by groups of immigrants, groups that he classifies as first generation, intermediate generation and second generation. There is a wealth of specific data here that point to the semantic domain effects, even though the predictability of embedded language content word selection seems to diminish across generations. In the section Role of specificity, Backus seeks to account for other-language embeddings that can apparently not be justified by means of the notion of semantic fields but can through what Backus calls certain auxiliary constructs. Here, Backus refers to words that are intimately related to Dutch and are preferred by second generation speakers who have lived in Holland all their lives.
Other words - even though unmotivated by domain membership - can be explained through semantic specificity and still others through various psycholinguistic mechanisms, such as activation levels, just to mention one. There is one word however that Backus finds totally unexplainable by semantic or psycholinguistic means but the unexpected use of the Turkish word can be justified simply by the fact that this word always appears in the matrix language. In Specificity within a semantic domain the reader finds an in-depth analysis of content words as used by one member of the intermediate generation. A table of the distribution of content words by semantic field and matrix language of clause accompanies the descriptive analysis. Two

additional factors that also promote codeswitching are discussed, i.e., flagging and focusing. Flagging is the notion that had been proposed earlier by Poplack and refers to the speaker's intent to call the listener's attention to an imminent codeswitch, whereas focusing has to do with the inherent meaning of morpho-syntactic constructions. Backus' conclusion reiterates - as it should - the findings of this study, even though he still makes some tentative comments on the difference between core and culture borrowings and the selection of embedded language words, the latter being guided by a combination of factors, such as bilinguality, personal preferences, and current accessibility. The study as a whole reflects the writer's insight into the sociolinguistic working of the bilingual mind.
As the reader moves on to Section Three, Codeswitching as oral and/or written strategy, there are two matters that may come to his/her mind: one, in order to encounter codeswitching data, even those of a quite exotic kind, a person does not have to seek them out in faraway lands like Russia or Japan, as he/she can find them right here in the United States where many immigrant communities seek to maintain their ancestral language; two, codeswitching, which was believed to be an oral language performance only, also exists at the written level as McClure and Montes-Alcalá have shown, in particular when those sharing the same two varieties hardly ever come into personal contact, as is the case for Assyrian-English switchers, or when an individual switches languages in his/her personal record keeping, as is the case in the Spanish-English diary entries. Erica McClure tells the reader in her chapter entitled Oral and written Assyrian-English codeswitching that "relative little attention has been accorded to the difference between oral and written codeswitching" and to the nature of the social and political features of codeswitching communities.
Her study on Assyrian-English codeswitching aims at filling this gap in the codeswitching literature. Most important for this endeavor is her reference to Gal's proposal that an "integration of conversational, ethnographic and social historic evidence" is called for and McClure's study of modern Assyrian reveals the extent to which she is following Gal's postulate. The introduction of the chapter deals mostly with the overall background of the West Semitic language that is little known to outsiders except for a handful of linguists specializing in Semitic languages. A number of interesting characteristics of the Assyrian people are here revealed among which their multilingualism, dialect variation and diglossia will capture the readers' interest. Despite the cited dialect variation, McClure also mentions forces of dialect unification, even though there is no Assyrian nation as such to sponsor or legislate language unification. Four types of Assyrian data are collected among natives or descendants of natives of these Middle-Easterners now residing in the Chicago area and yield the data base for the present study. In her discussion of what codeswitching is, McClure reviews some of the sociolinguistic literature with special attention to the difference between a codeswitch and a loanword, a difference that has baffled numerous linguistic scholars, who have never found a conclusive answer to their query. From oral codeswitching McClure moves on to the written counterpart and points to several factors that distinguish the written from the oral type. She specifies here what, in the context of her research, qualifies as a codeswitch and that words are considered codeswitches when they meet certain characteristics. In her discussion of oral and written codeswitching, McClure uses a format that differs from other contributions in the volume in the sense that she first comments on the characteristics found in her data and later on provides the actual data, a slight inconvenience for the reader, but the care that McClure takes to explicate her data is impressive and makes up for any inconvenience that this format might have caused. At the same time, McClure's data provide a worthwhile means of acquainting the interested reader with some aspects of this unfamiliar code. The actual switches illustrated in the chapter contain intersentential, full clausal, embedded-between-languages and subordinate switches.
Also, various grammatical categories are discussed and, most interestingly, one such category concerns bicodal words, where English nouns carry Assyrian plurals or, in one case, an Assyrian possessive pronoun postposition. McClure's comments on written codeswitching are also interesting for the distinction that she draws between the various codeswitching strategies according to the media type or genre in which they occur, such that switching on the internet is quite different from switching in printed materials. In her later discussion of the functions of oral Assyrian-English codeswitching, McClure

distinguishes between situational and conversational switching and, as for the latter, she subcategorizes the lexical switches on the basis of gaps, connotation differences, clarification, emphasis and switches of interlocutors. In this part of the chapter, the format of data citation changes and the examples are now inserted in the text, a practice that makes the elements easier to read and identify. McClure's discussion of the functions of written Assyrian-English codeswitching is mainly an elaboration of earlier arguments but includes some interesting comments on the significance of postings on the internet, to the effect that cyberspace is allowing Assyrians who are spread out all over the world to reunite in a certain way and thereby experience, for the first time, a kind of togetherness they never dreamed of during the many centuries of diaspora.

Cecilia Montes-Alcalá offers the reader a different aspect of written codeswitching, since she focuses not on a whole community, like the preceding author, but on the performance of a single individual, herself as the keeper of a personal journal. Montes-Alcalá first briefly discusses the different levels of codeswitching, refers to its social stigmatization and stresses its inherent regularity at the grammatical as well as pragmatic levels. She also deplores the paucity of studies on written codeswitching. Although the object of her study and the methodology used are only touched upon slightly, the actual nature of her approach is the pragmatic analysis of data collected in the past and not a search for theoretical elaborations on language alternation. In effect, most of her sources date from the seventies and eighties, with very few from the nineties. The significance of her study should therefore not be seen in the advancement of abstract schemes but in the citations of codeswitching events that reflect the creativity of a truly bicultural person.
It also reveals an interesting progression from intersententiality to intrasententiality as the writer attains the necessary balance between two cultures, without which such intrasententiality would not emerge. Montes-Alcalá subcategorizes her Spanish-English switches into ten types: direct quotes, emphasis, clarification or elaboration, parenthetical comments, idiomatic expressions, linguistic routines or cliches, symmetric alternations, triggers, stylistic matters and lexical need. These types are reminiscent of some of the earlier work on oral codeswitching. However, what is different in

Montes-Alcalá's data is the superb mixture of two languages that all but creates a new language of its own, melting the two codes into one single variety, as suggested by her eloquent example of the player of a guitar with eight strings - rather than one using two guitars with four strings each. She later discusses some syntactic constraints in written codeswitching and focuses on such constraints as cited by Gumperz, Lipski, Poplack and Timm some two decades ago, for which Montes-Alcalá finds some corroboration but also counterexamples. She notes in this regard that "written codeswitching is not subject to each and every syntactic constraint of the natural discourse [but] is still far from a random phenomenon..." In her conclusion, Montes-Alcalá stresses the degree of bilingualism and biculturalism as the clue to the refinement and complexity of the switches, to the extent that bilingual bicultural balance is ultimately responsible for the quality of switches that the speaker generates.

Section Four of the present volume focuses on the likelihood that some manifestations of codeswitched discourse may lead to the emergence of new ethnicities. Two important chapters are included in this particular section: one dealing with the multilingual and multicultural setting in South Africa, where indigenous and colonial languages are woven into a unique fabric of inter-ethnic communication, and the other dealing with the rise of Franbreu [short for Français-Hébreu or French-Hebrew] in Israel, where one immigrant language finds its way into the national language of the country, thus creating a communication code of its own. Both types of language mixture seem to suggest that those who engage in this hybrid kind of communication set themselves off from the remainder of the population as a different ethnicity.

Robert K.
Herbert entitles the following chapter Talking in Johannesburg: The negotiation of identity in conversation and tells the reader about the linguistic and ethnic complexity of one South African city, where the languages of settlers, Afrikaans and English, join ranks with several African languages and some more recent immigrant varieties, so that speakers can draw their linguistic resources from a dozen or more different codes. He notes, furthermore, that there is a bias in favor of rural, conservative African varieties, a factor that tends to blur the actual language boundaries in urban centers. Herbert's chapter offers the reader a social and pragmatic interpretation of language use rather

than a theoretical discussion of codeswitching and its constraints. Nevertheless, Herbert finds Myers-Scotton's research [see chapter in this volume] useful as an analytical framework, but he draws, in addition, on the disciplines of ethnography of speaking, conversational analysis and ethnomethodology. After recognizing a three-way distinction between borrowing, code-mixing and codeswitching, Herbert goes on to discuss the markedness model developed earlier by Myers-Scotton but is apparently unaware of the more recent extensions and refinements of her work. Herbert agrees with many of Myers-Scotton's earlier postulates, offers some interesting data and suggests some minor changes to Myers-Scotton's model that may already have been considered in her upgraded version. Herbert also stresses the difference between South Africa and other settings in Africa where English is associated in many studies with higher status. In South Africa, with its two settler languages and scores of indigenous varieties, the settler languages share, in their range of significance, mutual but distinct prestige, to the extent that education and status call for the use of English, and authority for the use of Afrikaans. In addition, several African languages are associated with specific values, as Herbert shows in the switched discourse of husband and wife or among university students. Herbert then raises an interesting point in his discussion of codeswitching as unmarked choice, arguing that "the overall pattern of conversation carries social marking, such that there is actually no true neutrality involved," for which reason he suggests the label codeswitching as a linguistic variety to find a way out of the dilemma that he poses. The wealth of his examples helps the reader follow his arguments and, at the same time, gain insights into some of the features of African languages.
To Myers-Scotton's three conditions for code-switching to occur, Herbert adds a fourth one, i.e., presumably positive evaluation of identities associated with each code. There is a need, Herbert alleges, to distinguish between two types of conversation, one that is characterized by lexical intrusions and the other by actual shifting of the operative language. And he notes at the end of his chapter that codeswitching is not restricted to informal contexts but also occurs in university tutoring groups and in urban classrooms. Many of the points that he makes here seem to involve some analytical judgments concerning codeswitching and constraints, more than what he may have

wanted to bargain for. On the other hand, the detailed citations of conversations make good on his promise to address the social and pragmatic interpretation to which he alluded at the beginning. Herbert concludes the chapter by mentioning some valuable issues to be solved by future researchers, in particular the interesting question as to whether some highly mixed urban discourse should be thought of as operating on a continuum of language mixing or rather as the combination of "all languages into one language".

Miriam Ben-Rafael describes another aspect of an equally complex linguistic setting, that of Israel, where the code alternation between French and Hebrew is creating a language variety known as Franbreu. In the introduction to the chapter entitled Codeswitching in the immigrant's language: The case of Franbreu, Ben-Rafael reviews some of the earlier tenets on codeswitching, mostly of the seventies and eighties and with special emphasis on the work of Gumperz, Auer and the earlier Myers-Scotton. Issues such as personalization vs. objectivization, normative association between linguistic practices, values and social relations, marked vs. unmarked models, and autonomy of the bilingual discourse are all cited there in her attempt to define the nature of codeswitching. Her comments on borrowing suggest an intent, like that of others before her, to differentiate borrowing from codeswitching and to view the former merely as a form of switching placed at a specific point on the bilingual continuum. Since a distinction must however be made between lexical intrusions and clausal insertions, she proposes such terms as unitary codeswitching and segmental codeswitching, notions that she discusses in depth on the basis of her Franbreu data.
After some brief reference to the methodological aspects of her study, Ben-Rafael focuses in Analysis of the corpus on several types of unitarian codeswitching with examples of interactive repetitions and alternations - as opposed to rigidity - within unitarian codeswitching. Most interesting here are the instances where Hebrew never alternates with its French counterparts if they represent culture-specific images. In the discussion of segmental codeswitching, Ben-Rafael mainly distinguishes two types, reported speech and unflaggedness but also refers briefly to ready-made expressions, idioms of religious character, greetings or congratulations, rhetorical juxtapositions and alternation of speech turns. As for codeswitching as an

unflagged element, Ben-Rafael describes it as fluid and uninterrupted. She argues in this regard that "the alternation of languages is neither the expression of a lack of linguistic competence nor the attrition of French" but that it is more than anything else a means of clarification. At the end of the chapter, Ben-Rafael provides an overall interpretation of codeswitching and mentions such characteristics as constellation of participants, codeswitching as topic marker, sequential subordination, change of topics, recovering an earlier reference, preferences and cohesion, reformulations and expressive support. She concludes by summarizing the series of queries addressed in the chapter, thus allowing the reader to review the various issues in light of the broader question as to whether or not a new ethnicity is emerging here.

Section Five, Communication codes in education, consists of a single contribution by Diana-Lee Simon in which she explores the role of codeswitching in the language educational context. Her chapter entitled Toward a new understanding of codeswitching in the foreign language classroom traces new paths into a mainly uncharted territory, that of alternating in the foreign language classroom between the target language and the language of the community. Simon first describes the development from macro- to microcodeswitching contexts, a development triggered by the need for a more dynamic model which would focus on the meaning of codeswitching in social interaction. She then points to a somewhat parallel development in language education that seeks to better understand what kind of relationship should exist between the language to be learned in the classroom and the language spoken outside, with special attention to the alternation between codes.
The use of the learner's native language in the foreign language classroom, she continues, is in theory a practice to be avoided but in reality its presence is often felt regardless of the teacher's guilt feeling that it creates. Hence, a "frank reversal of perspective on the role of codeswitching in foreign language teaching and learning" may be in order. Simon then reviews the research literature describing the ongoing change from quantitative studies to studies that are ethnographically sound to include observation and analysis of classroom talk, getting hereby closer to the interactional work being accomplished. Simon further notes that she wishes to consider the foreign language classroom a microcosm of the community

outside but with certain specific features, and explores in this context such notions as the foreign language classroom as bilingual community, the verbal repertoires, the status and role of participants and the purpose of communication in the foreign language classroom. Most interesting in regard to the latter is her reference to the interaction at multiple levels, communicative and metacommunicative, linguistic and metalinguistic, shifts that she later demonstrates in her data. After a brief theoretical elaboration on social and classroom switching, Simon gives in a later section of the paper a description of her data collection and stresses in that context the existence of a formal (institutional) learning frame and a social (interpersonal) frame, whose interchange at times blocks and at other times promotes the alternation of codes. Simon includes at this point two examples of classroom talk in Thailand that contain codeswitches initiated by the teacher, discusses in this context the methodological vs. personal motivation for the code choice and stresses the significance of codeswitching as a marked choice. She then makes three final points: first, that "code choice is very frequently closely associated with the type of task"; second, that "determining what the matrix language and the embedded language are in a foreign language class is difficult"; and finally, that "sociocultural values are encoded by the languages of the repertoire". Three examples of classroom talk in France with codeswitches initiated by the learner illustrate Simon's contention that a number of issues justify switching to the other, usually avoided code, such as double identity, request for grammatical clarification, request for clarification of meaning, response to social needs and others.
Simon concludes her chapter by stressing the complexity of the process of negotiation undertaken by teachers and learners as they jointly exercise their freedom to break with methodologically imposed code constraints. If one accepts, she reiterates, the view that the classroom is in effect a social situation, then the switching can be seen to serve a specific pedagogical objective.

The longer-than-usual introduction to the present volume has the dual purpose of helping the less experienced reader to know which arguments international sociolinguistic scholars consider crucial in their contributions and, at the same time, of revealing to experts in the field what is being debated in the various chapters. The Editor hopes that

this objective is accomplished but of course the full appreciation of the work of these authors can only be obtained by examining in depth each chapter and verifying the extent to which the included data do truly back the assertions that the writers make in each individual chapter.

Section 1 Theoretical issues revisited

The matrix language frame model: Developments and responses

Carol Myers-Scotton

1. Introduction

The goal of this chapter is to summarize developments in the Matrix Language Frame model of codeswitching since its introduction in Duelling Languages (Myers-Scotton 1993a) and to clarify the model in regard to problematic codeswitching data. Some examples or issues raised by other researchers as presenting problems for the model will be discussed. Solutions to perceived problems regarding system morphemes (functional elements) will be presented in terms of a new sub-model, the 4-M model. In addition, the chapter will try to explain more fully the theoretical construct of the matrix language. While the model has been favorably received by some observers, the matrix-language construct has remained the source of some criticisms and/or misunderstandings. Most of the explication will refer to the subject of Myers-Scotton (1993a), what I now call classic codeswitching, but other language contact phenomena that are the subject of more recent proposals and analyses will also receive some attention.

Classic codeswitching is defined as the alternation between two varieties in the same constituent by speakers who have sufficient proficiency in the two varieties to produce monolingual well-formed utterances in either variety. This implies that speakers have sufficient access to the abstract grammars of both varieties to use them to structure codeswitching utterances as well. However, there are many types of language contact phenomena for which speakers do not have such full access to abstract grammatical structures. These phenomena receive attention under applications of recent extensions of the model: the 4-M model that accounts for the distribution of different types of morphemes and their election in language production; and the Abstract Level Model that explains how features from two or more varieties structure utterances in convergence

data (Myers-Scotton and Jake 1999; Myers-Scotton 1999b). Types of data studied include attrition and language shift.

Even with several important elaborations and some revisions, the major original theoretical claims of the Matrix Language Frame model remain the same. The two hierarchies of the original Matrix Language Frame model can be modified to explain distributions in other types of language contact as well as in classic codeswitching. Note that this claim is not the same thing as saying that all bilingual speech is no different from codeswitching or words to that effect; applications of the model's basic hierarchies to other language contact phenomena, such as Creole development, sometimes have been misinterpreted in this way. Rather, the claim is that the same principles and processes underlie all contact phenomena, although the results differ, depending on the socio-historical and psycholinguistic factors present where the contacts occur (Myers-Scotton 1998). The basic hierarchies referred to are:

• The Matrix Language vs. Embedded Language opposition. When two or more varieties come together within a single bilingual constituent, they do not participate equally. However, the division between the varieties differs, depending on the type of contact phenomena studied. In classic codeswitching, the division is strict: one, and only one, of the participating varieties is the source of the abstract grammatical frame of the constituent. This frame is called the Matrix Language. In classic codeswitching, the other participating variety - termed the Embedded Language - can only contribute limited material (largely only content morphemes and Embedded Language islands within the larger constituent, but see section 4 on permissible Embedded Language system morphemes). Example (1) illustrates classic codeswitching between Swahili as the Matrix Language and English as the Embedded Language.
Note that in the mixed Verb Phrase constituent zinafunction right now, English can supply content morphemes (the verb stem -function) as well as the adjunct Embedded Language island (right now). However, grammatical elements (system morphemes such as the verbal prefixes) come only from Swahili.

(1) It's only essential services amba-zo    zi-na-function right now.
                                 Comp-Cl.10 Cl.10-Pres
    'It's only essential services that function right now.'
    (Swahili/English; Myers-Scotton 1993a [1997]: 130)

In other contact phenomena, the varieties contributing to a bilingual utterance still are in a hierarchical relationship. However, although one variety may still be the main source of the abstract grammatical frame of this utterance, other participating varieties also contribute grammatical structure. In such cases, one must speak of a composite Matrix Language. This condition results when one variety's position in a speaker's repertoire is diminishing and that of another variety is gaining ground. For example, consider the dynamics of language shift in Pennsylvania German, a Germanic variety spoken in certain religious communities in the United States. Example (2) shows that a composite Matrix Language, with abstract grammatical patterns from English as well as from German, is structuring the verbal system. As Fuller (1996: 504) notes, "[o]ngoing and future actions are expressed, as in English, with the present progressive or with gehn 'going to', as opposed to the historically German use of the present tense or future auxiliary waerre/warre."

(2) Ich hab     geglaubt     - es geht  ihm      happene
    1S  have/1S believe/PART - it go/3s him/Dat. happen/INF
    'I thought - it's going to happen to him!'
    (Pennsylvania German; Burridge 1992: 206, cited in Fuller 1996: 504)

• The content morpheme vs. system morpheme opposition. Content morphemes from the Embedded Language occur more freely in mixed bilingual constituents than system morphemes (functional elements). Most inflections and functional elements qualify as system morphemes. Content morphemes participate in the thematic grid, either receiving or assigning thematic roles; system morphemes do not.1 Extensive discussion of system morpheme types follows in section 4.
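
The interaction of the two hierarchies in classic codeswitching can be illustrated with a small sketch. The Python below is an illustration only and not part of the Matrix Language Frame model's own formalism: the `Morpheme` record, the function name and the content/system labels attached to each morpheme are assumptions introduced here. The check encodes only the strict case described above, and sets aside the permissible Embedded Language system morphemes taken up in section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str
    language: str
    kind: str  # "content" or "system" (labels assigned by hand for this sketch)


def system_morphemes_from_matrix(constituent, matrix_language):
    """Return True if every system morpheme comes from the matrix language."""
    return all(
        m.language == matrix_language
        for m in constituent
        if m.kind == "system"
    )


# The mixed verb zi-na-function from example (1).
verb = [
    Morpheme("zi", "Swahili", "system"),         # class 10 prefix
    Morpheme("na", "Swahili", "system"),         # present tense prefix
    Morpheme("function", "English", "content"),  # Embedded Language verb stem
]

print(system_morphemes_from_matrix(verb, "Swahili"))  # True
print(system_morphemes_from_matrix(verb, "English"))  # False
```

Treating Swahili as the Matrix Language satisfies the check, whereas treating English as the Matrix Language does not, which mirrors the asymmetry the two hierarchies are meant to capture.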

Recent elaborations and modifications affecting these two hierarchies are found in three main publications: (1) Myers-Scotton and Jake (1995) amplifies the original Matrix Language Frame model in an extended discussion of how congruence figures as a constraint on classic codeswitching data; (2) the Afterword in the 1997 paperback edition of Duelling Languages modifies and clarifies several of the arguments in the original volume (Myers-Scotton 1993a); (3) in Myers-Scotton and Jake (1999), two new sub-models that have been implied in earlier work are explicated in detail, the Abstract Level Model (echoing ideas from the 1995 article) and the 4-M Model (elaborating on observations in Myers-Scotton (1993a) about divisions within the category of system morpheme). Space limitations permit discussion of only some of the points raised in these articles.

The chapter has the following organization: each article will be discussed in its own section along with relevant issues raised by other researchers. Section Two considers Myers-Scotton and Jake (1995), Section Three explicates the Afterword to the 1997 edition of Duelling Languages, and Section Four considers the 4-M model as a new development in Myers-Scotton and Jake (1999). Section Five briefly illustrates the notion of a composite Matrix Language, and Section Six is the conclusion.

2. Congruence checking

Myers-Scotton and Jake (1995) goes well beyond the Blocking Hypothesis (Myers-Scotton 1993a: 120-121) in making specific how the concept of congruence figures in constraining codeswitching. Congruence is salient at the level of lemmas, abstract entries in the mental lexicon. The claim is that when a speaker's intentions select the lemma underlying an Embedded Language content morpheme, that lemma must be checked for congruence with its Matrix Language counterpart at three abstract levels. The first level refers to lexical-conceptual structure (language-specific semantic/pragmatic features). The second level, predicate-argument structure, deals with how thematic structure is mapped onto grammatical relations (e.g., mapping Agent to Subject, etc.). The

third level is the level of morphological realization patterns, referring to how grammatical relations are realized on the surface (agreement morphology, morpheme order, etc.).2 Thus, in (1) above, with Swahili as the Matrix Language, the claim is that function from English is congruent enough with its Swahili counterpart to be inserted into a Swahili frame.

2.1. Compromise strategies

The apparent lack of congruence at one or more of these levels results in compromise strategies in codeswitching rather than mixed constituents (constituents consisting of morphemes from two or more varieties), all occurring within a larger syntactic unit. The two compromise strategies discussed are embedded language islands and bare forms. Embedded language islands were defined in Myers-Scotton (1993a) as constituents that are entirely well-formed in the embedded language and that show structural dependency relations (i.e., not simply two adjacent morphemes), but within a larger matrix language-framed constituent. In (3), early this month is an embedded language island.

(3) Hata siyo mwezi jana.  I-li-ku-w-a            early this month.
    Not even last month.   Cl.9-past-infin-cop-fv
    'Not even last month. It was early this month.'
    Note: fv = final vowel in this and other Swahili examples.
    (Swahili/English; Myers-Scotton 1993a [1997]: 147)

Bare forms are Embedded Language content morphemes that do not show all the system morphemes that would make them well-formed in the Matrix Language. In (4) wife is an example of a bare form. For nouns to be well-formed in Swahili, the Matrix Language here, they must show noun class prefixes. Here, the relevant noun class prefix would be wa- for class 2. Note that the class 2 agreement prefix does appear on the numerical modifier watatu 'three'.

(4) Lakini hu-yo    jamaa  na-siki-a  a-na          wife wa-tatu.
    But    Dem-Cl.9 person 1S-hear-fv 3s/Pres-assoc wife Cl.2-three
    'But that guy, I hear, has three wives.'
    (Swahili/English; Myers-Scotton 1993a [1997]: 114)

Both embedded language islands and bare forms were described in Myers-Scotton (1993a), as permitted under the matrix language frame model; Myers-Scotton and Jake (1995) tries to explain why these structures occur. Further, in the 1993a volume, embedded language islands were simply referred to as optional; the 1995 article makes a stronger claim, arguing that embedded language islands must occur if the speaker's intentions to convey certain semantic/pragmatic information are to be satisfied. Both islands and bare forms result when there is incongruence regarding any of the three levels of abstract structure just discussed.

For example, consider wife in (4); because lemma entries for English nouns do not provide information allowing them to receive an appropriate Swahili noun class prefix, they receive none at all and occur on the surface as bare forms. In this case, there is incongruence at the level of morphological realization patterns. However, because there is sufficient congruence between wife and a Swahili counterpart at the level of lexical-conceptual structure, it can appear in a Swahili-framed constituent. In (3) early this month is an adverbial adjunct that shares lexical-conceptual structure with a Swahili counterpart. However, units of time are more often discussed in English than in Swahili by Swahili/English bilinguals, especially when they are preceded by degree adverbials, such as early. That is, early this month has a slightly different pragmatic force in English than the Swahili counterpart mwezi huu mapema 'month this early'. Thus, this example shows some incongruence at the level of lexical-conceptual structure, resulting in an English island as the preferred selection.
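
The three-level congruence check and its two outcomes can be sketched in a few lines. This is a deliberately crude illustration, not the authors' formalism: the `Lemma` record, the feature labels and both function names are hypothetical, and strict equality of feature sets stands in for what the text describes as sufficient congruence, which is a matter of degree.

```python
from dataclasses import dataclass

# The three abstract levels at which lemmas are checked for congruence.
LEVELS = ("lexical_conceptual", "predicate_argument", "morphological_realization")


@dataclass(frozen=True)
class Lemma:
    label: str
    lexical_conceptual: frozenset
    predicate_argument: frozenset
    morphological_realization: frozenset


def incongruent_levels(embedded, matrix):
    """List the abstract levels at which the two lemma entries differ."""
    return [lvl for lvl in LEVELS if getattr(embedded, lvl) != getattr(matrix, lvl)]


def predicted_outcome(embedded, matrix):
    """Incongruence at any level predicts a compromise strategy."""
    if incongruent_levels(embedded, matrix):
        return "compromise strategy (island or bare form)"
    return "integrated mixed constituent"


# Toy feature sets, invented for illustration, for wife in example (4).
wife = Lemma(
    "wife (English)",
    frozenset({"female spouse"}),
    frozenset({"receives thematic role"}),
    frozenset({"no noun class prefix"}),
)
counterpart = Lemma(
    "Swahili counterpart",
    frozenset({"female spouse"}),
    frozenset({"receives thematic role"}),
    frozenset({"noun class prefix"}),
)

print(incongruent_levels(wife, counterpart))  # ['morphological_realization']
print(predicted_outcome(wife, counterpart))   # compromise strategy (island or bare form)
```

The sketch does not try to predict which compromise strategy is chosen; as the discussion of (3) and (4) shows, that depends on the level at which the incongruence arises and on the speaker's intentions.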

2.2. Implications regarding cross-linguistic differences

Myers-Scotton and Jake (1995) goes on to claim that codeswitching data including compromise strategies have implications for cross-linguistic differences in lexical entries. Further, the entire notion that congruence checking at different levels is possible implies the modular nature of abstract grammatical structures. These ideas about abstract levels are further developed and formalized in the abstract level model (Myers-Scotton and Jake 1999). Studies by Myers-Scotton (1997) and Bolonyai (1998) show how levels of abstract structure can be split in one variety and combined with levels from another variety. A composite matrix language structures bilingual constituents in a type of language change that would result in a matrix language turnover if it went to completion (Myers-Scotton 1998).

3. The matrix language frame model modified

The Afterword in the 1997 edition of Duelling Languages clarifies several parts of the Matrix Language Frame model, revises certain claims, and introduces the idea of a composite Matrix Language.

3.1. The matrix language vs. embedded language distinction

The first issue addressed is how the Matrix Language Frame model should be characterized. While the original discussion does refer to the model as a production-based model (1993a: 6), the Afterword is at pains to make it clear that "those descriptions [by interpreters of the model] of the MLF [Matrix Language Frame] model as production-based are at best incomplete and have distracted attention from the fact that the heart of the model explains codeswitching data by referring to the abstract level of linguistic competence" (1997: 241). Further, the claims that the matrix language vs. embedded language distinction holds in all bilingual utterances and that the content vs. system morpheme distinction applies to linguistic data in general (not just to codeswitching data) are, in effect, claims that these are universal distinctions. By implication, these distinctions can only derive from underlying linguistic competence (part of the universal linguistic faculty). While I agree with researchers who

argue that codeswitching can be explained without invoking a third grammar (i.e., in addition to the grammars of the participating languages), I do argue that because codeswitching involves two or more varieties, an additional opposition - the Matrix Language vs. the Embedded Language - is necessary to explain the differential roles of the participating grammars.3

3.2. Projection of complementizer (CP)

Second, the Afterword clarifies the unit of analysis for codeswitching as the Projection of Complementizer (CP), not the sentence. This idea was implicit in the 1993a volume, but admittedly at times the analysis there implies the sentence as the analytic unit.4 By the writing of Myers-Scotton and Jake (1995), I realized that sentence or even clause was too inexact to apply unambiguously. In contrast, what constitutes a CP (Projection of complementizer) is clearer: it is the highest unit projected by lexical items. Another reason to employ the CP in analyzing bilingual speech is that examining utterances based on the sentence may tell little about the constraints actually structuring mixed constituents because many sentences qualify as bilingual by including two or more monolingual CPs in different languages. In such cases, the two languages are not necessarily really in contact from a structural point of view. A CP may qualify as a bilingual unit in several ways. While of course the CP itself is a constituent, it may contain one or more constituents that it dominates that are Embedded Language islands (e.g., a prepositional phrase) or one or more that are mixed constituents (with morphemes from both participating varieties). Thus, a bilingual CP contains minimally a mixed constituent or at least one Embedded Language island. For example, even though (1) can be considered a single sentence, there are two CPs, only one of which is bilingual (ambazo zinafunction right now 'that function right now'), based on the fact it contains a mixed constituent.
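
The minimal condition for a bilingual CP can be expressed as a short sketch. The representation is an assumption made for illustration: each CP is a flat list of constituents, each constituent a list of (morpheme, language) pairs, whereas the analysis above treats the island right now as part of the larger mixed Verb Phrase. An Embedded Language island is approximated here as any multi-morpheme constituent drawn wholly from a language other than the matrix language, and the matrix language of the monolingual first CP is taken to be English.

```python
def is_mixed(constituent):
    """A mixed constituent contains morphemes from more than one language."""
    return len({language for _, language in constituent}) > 1


def is_embedded_island(constituent, matrix_language):
    """Rough proxy: several morphemes, one language, not the matrix language."""
    languages = {language for _, language in constituent}
    return (
        len(constituent) > 1
        and len(languages) == 1
        and matrix_language not in languages
    )


def is_bilingual_cp(cp, matrix_language):
    """A bilingual CP has at least one mixed constituent or one island."""
    return any(
        is_mixed(c) or is_embedded_island(c, matrix_language) for c in cp
    )


# The two CPs of example (1), segmented by hand for this sketch.
cp_english = [
    [("It's", "English"), ("only", "English"),
     ("essential", "English"), ("services", "English")],
]
cp_relative = [
    [("amba", "Swahili"), ("zo", "Swahili")],
    [("zi", "Swahili"), ("na", "Swahili"), ("function", "English")],
    [("right", "English"), ("now", "English")],
]

print(is_bilingual_cp(cp_english, "English"))   # False
print(is_bilingual_cp(cp_relative, "Swahili"))  # True
```

Counting CPs rather than sentences in this way keeps a sentence made of two monolingual CPs in different languages from being treated as structurally bilingual.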

The matrix language frame model 31

3.3. Classic codeswitching

Third, the Afterword makes clear that the original Matrix Language Frame model was intended as a model only of classic codeswitching. Classic codeswitching consists of bilingual CPs (Projections of Complementizer) whose constituents either (a) conform entirely to Matrix Language constraints (mixed constituents or Matrix Language islands) or (b) are well-formed Embedded Language islands that remain under Matrix Language control in some ways, such as placement of the island in the larger CP. It follows that speakers who engage in classic codeswitching are able to produce well-formed utterances in both of the participating varieties. However, this does not mean the speakers are necessarily equally proficient in both varieties. Nor does it mean that the varieties they speak are necessarily the standard dialects of the languages.5 Even when the bilingual utterances of speakers qualify as classic codeswitching, it is likely that differences in proficiency will affect which options are taken up from among permissible patterns. For example, Finlayson, Calteaux and Myers-Scotton (1998: 412-413) show that more educated speakers produce more Embedded Language islands than less educated ones (in Zulu/English or Sotho/English codeswitching). This difference holds, even though there is no significant difference across the groups in the percentage of bilingual CPs against the overall total of CPs in their conversations. Thus, for the less educated speakers, singly-occurring Embedded Language morphemes rather than Embedded Language islands are what make their CPs bilingual. While instances of classic codeswitching clearly exist (e.g., the Swahili/English codeswitching data base in Duelling Languages and the Arabic/English data base in Jake and Myers-Scotton 1997), it may be a less frequent phenomenon than bilingual speech that shows convergence at one or more of the three levels of abstract structure mentioned in section two. For an example of such convergence, refer back to example (2).
The situation is complicated by the real possibility in the same corpus that some bilingual CPs will show classic codeswitching and others will show codeswitching with convergence. While the 1993a volume considered deep grammatical borrowing in Chapter Seven, the Afterword more explicitly recognizes the need to posit a composite Matrix Language if one is to explain differences in the structural makeup of bilingual speech in different types of contact situations.

3.4. Matrix language as a notion for codeswitching analysis

Fourth, and perhaps most important, the Afterword addresses the most misunderstood part of the matrix language frame model, what the notion of having a matrix language means. Myers-Scotton (1993a [1997]: 3) states, "the ML [matrix language] sets the morphosyntactic frame of sentences showing CS [codeswitching]." The idea of the matrix language as the frame remains central to the notion of the matrix language, but now with the CP (projection of complementizer), not the sentence, as the most appropriate unit of analysis. Unfortunately - because of lack of specificity on my part - the construct of Matrix Language has been taken by some observers to be a specific language. However, rather than equating the matrix language with an existing language, one should view the matrix language as an abstract frame, the source of grammatical structure for the bilingual CP. If the matrix language is not a language, what is its relation to its source variety, and what does it mean to say that it is an abstract frame? In classic codeswitching, the matrix language is identical to the frame of one of the varieties (languages) involved. Because speaker intentions select this language as the source of the frame for bilingual CPs and because speakers have sufficient access to the frame of the selected language, calling the matrix language by the name of this language should be seen as a convenient shortcut (i.e., a convention of the model) - but as no more than that. The reason that equating the matrix language with a language is inexact is that the matrix language exists only as a morphosyntactic abstraction. In contrast, languages exist as full linguistic systems when they are realized as their dialects. As an abstract frame, the matrix language does include specifications at the three levels of grammatical structure outlined above (lexical-conceptual structure, predicate-argument structure, and morphological realization patterns).
Thus, the matrix language includes slots for permissible surface-level morphemes, but it is not synonymous with a fully fleshed-out linguistic variety. Related to its abstract nature is the possibility for the matrix language to be a composite of abstract structure from more than one source variety. Such a matrix language is called a composite matrix language to distinguish it from the matrix language of classic codeswitching. A composite matrix language arises when speakers do not have full access to the grammatical frame of the language that is the selected matrix language. For example, such a situation occurs in second language acquisition when speakers produce an interlanguage variety. Or, speakers may have ambiguous intentions about a desired matrix language (cf. Jake and Myers-Scotton, forthcoming). This may occur in immigrant groups who are shifting from their L1 to the dominant language of their new community.

3.4.1. Matrix language defined

How is the matrix language identified? The Afterword points out that earlier attempts (Myers-Scotton 1993a [1997]: 66-69) to define the matrix language outside of the bilingual CP were wrong. Specifically, the claim that the matrix language will be the source of the most morphemes in the discourse is discounted. However, some observers, such as Poulisse (1998: 379), erroneously link the status of the system morpheme and the morpheme order principle to morpheme counting; dropping morpheme counting in no way affects the status of these principles.6 Rather, the error was to link the matrix language with a frequency metric and to not state that the nature of the matrix language is necessarily only structural. Some researchers, such as Boumans (1998), argue that various lower level structures within a CP (Projection of complementizer) can each have their own matrix language. That is, Boumans is recognizing what are embedded language islands in the matrix language frame model as independent units, at least in relation to the language of the larger CP. Of course, Boumans's position is supported by the fact that within an embedded language island, all the morphemes come from the embedded


language - even though, as stated above, the position of the island within the CP may be under matrix language control.7 The problem with adopting Boumans's approach is that it misses the generalization that the roles of the participating languages in bilingual production are different, with one language having greater structural import. Considering every embedded language island as having a separate matrix language, as Boumans suggests, ignores two asymmetries: (1) the fact that only one variety is the source of grammatical structure in mixed constituents within the same CP and (2) the fact that mixed constituents are much more frequent than embedded language islands across most data sets in the literature. Further, new empirical evidence suggests a relevant hypothesis. The hypothesis is this: Once a matrix language is established within a bilingual CP, that matrix language is not subject to change within that CP. Quantitative data cited in Finlayson et al. (1998: 412) support the hypothesis. Across 124 bilingual CPs studied in their corpus, the matrix language is sometimes English and sometimes either Zulu or Sotho. However, within a single CP, the matrix language remains the same. That is, it is not just within a single mixed constituent that one language sets the frame, but rather for all mixed constituents within the same bilingual CP.

3.4.2. Discourse-based distinctions

While it is important to think of the matrix language vs. embedded language distinction as a structure-based opposition, the distinction may well coincide with discourse-based distinctions. Those discourse-based distinctions identifying the unmarked choice (Myers-Scotton 1993b) or dominant language (Lanza 1997) may also be referring to the source variety of the matrix language in classic codeswitching. Also, especially when a bilingual CP contains an apparent embedded language island (a constituent with Embedded Language system morphemes that would constitute a violation of the system morpheme principle if they were not in an island), it is simply useful - as a check on the researcher's preliminary analysis - to look beyond the single bilingual CP to see if the same language supplies syntactically relevant system morphemes to


neighboring bilingual CPs (e.g., in the same conversational turn). Boumans (1998: 37-38) seems to criticize the validity of using such discourse information to point to the matrix language (Myers-Scotton 1997: 223). Yet, he seems to miss my use of the word generally in regard to the value of such information. No criterion outside the structural makeup of the bilingual CP under examination always identifies the matrix language. Also, even though they do not identify or determine it, factors outside the bilingual CP do affect the choice of the matrix language, an observation from Duelling Languages that remains intact. Changes in socio-political conditions in a community or in speakers' proficiency levels across languages can affect choice of the matrix language. These changes also can lead to a turnover of the matrix language over time (Myers-Scotton 1998). Also, the matrix language can change within a conversation if situational factors (e.g., participants, topic) change.

3.4.3. Matrix language as opposition to embedded language

In line with its structure-based nature, I stress that the matrix language is only identified in opposition to the embedded language. Further, this opposition only exists in a bilingual CP; therefore, looking for a matrix language in monolingual CPs (as some researchers have done) misses the point of the construct. The basis of the opposition is that the role of the matrix language in the bilingual CP is different from that of the embedded language. The differences in their roles are formalized in the system morpheme principle and the morpheme order principle. They are repeated here from Myers-Scotton (1993 [1997]: 83):

The Morpheme-Order Principle: In ML + EL constituents [mixed constituents] consisting of singly-occurring EL lexemes and any number of ML morphemes, surface morpheme order (reflecting surface syntactic relations) will be that of the ML.

The System Morpheme Principle: In ML + EL constituents [mixed constituents], all system morphemes which have grammatical relations external to their head constituent (i.e., which participate in the sentence's thematic role grid) will come from the ML.

These principles are testable hypotheses, and data from classic codeswitching largely support them. For example, consider again (1) in which the verbal inflections come from Swahili, the matrix language, and the morpheme order follows that of Swahili. In turn, this support for the principles is evidence for the usefulness of the constructs of matrix language and the embedded language as heuristic devices. Their relationship, as expressed in the two principles, captures a generalization across diverse data sets, providing for a parsimonious analysis of bilingual data.
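Because the two principles are stated as testable hypotheses, they can be read as checks on an annotated mixed constituent. The following Python sketch is a hedged illustration only: the data structure, the function names, the slot labels and the feature tags are all hypothetical, and the tagging of -ake is an assumption made for the example rather than a claim of the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Morpheme:
    form: str
    lang: str                 # language label, e.g. "sw" or "en"
    is_system: bool           # system morpheme (True) or content morpheme (False)
    external_relations: bool  # grammatical relations external to its head constituent
    slot: str                 # abstract position label used to compare surface order


def system_morpheme_principle_holds(morphemes: List[Morpheme], ml: str) -> bool:
    """Every system morpheme with external relations must come from the ML."""
    return all(
        m.lang == ml for m in morphemes if m.is_system and m.external_relations
    )


def morpheme_order_principle_holds(
    morphemes: List[Morpheme], ml_slot_order: List[str]
) -> bool:
    """Surface order must match the order of slots that the ML frame supplies."""
    ranks = [ml_slot_order.index(m.slot) for m in morphemes]
    return ranks == sorted(ranks)


# Toy annotation of 'results z-ake'; the tags on '-ake' are assumed for illustration.
SWAHILI_SLOTS = ["noun", "agreement prefix", "possessive stem"]
constituent = [
    Morpheme("results", "en", False, False, "noun"),
    Morpheme("z-", "sw", True, True, "agreement prefix"),
    Morpheme("-ake", "sw", False, False, "possessive stem"),
]

print(system_morpheme_principle_holds(constituent, ml="sw"))
print(morpheme_order_principle_holds(constituent, SWAHILI_SLOTS))
```

Under this reading, a mixed constituent falsifies the principles if either check fails, for instance if the agreement prefix carried the embedded language label or if the morphemes surfaced in an order other than the one the matrix language frame supplies.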

3.4.4. The system morpheme principle

The system morpheme principle has been misinterpreted by many observers. Some observers apparently assume that the principle requires that any system morpheme come from the Matrix Language; however, the principle clearly restricts its scope to system morphemes having grammatical relations external to their head constituent.8 However, I may be responsible for part of any misunderstandings because in my discussion of data supporting the principle (Myers-Scotton 1993a [1997]: 105-109) I give a mixed bag of system morphemes. Most examples that I cite do meet the specifications of the principle; i.e., they have grammatical relations external to their head constituent (e.g., the Swahili class 10 agreement prefixes z- and zi- in example (28), reproduced here as (5)). The form of the prefix on the possessive pronoun (z-ake) and the forms of prefixes on the verb (ha-zi-ku-w-a) depend on information external to their own heads. For example, the form of the agreement prefixes (z- and zi-) depends on the noun class prefix (on the noun) with which they are coindexed - even though in this case the class 10 noun class prefix on results is not overt.

(5) . . . Ø-results z-ake ha-zi-ku-w-a n-zuri . . .
    Cl.10/Pl-results Cl.10-his 3s/Neg-Cl.10-Infin-Cop-Fv Cl.10-fine
    '. . . his results were not good . . .'
    (Swahili/English; Myers-Scotton 1993 [1997]: 106)

However, in my discussion of evidence for the principle I also cite


some examples of system morphemes that, indeed, are system morphemes, but not that type of system morpheme that is relevant to the principle. That is, they are not system morphemes that have relations external to their heads. For example, in (31) on page 107, the French article le in the mixed noun phrase (NP) le brain is indeed a system morpheme, but - as will become clear in section 4.3 in the discussion of the 4-M model - this article is an early system morpheme, a morpheme that does not depend on having grammatical relations external to its head constituent for its form. The 4-M model (section 4) divides system morphemes into three types. This new discussion clarifies the limits of the system morpheme principle much better than I did in Duelling Languages. In Section Four, I show how many of the supposed counter-examples to the system morpheme principle cited in the literature are not counter-examples at all - because they do not include system morphemes of the type specified by the principle.

3.4.5. Identity of constructs

Still, some have criticized the principles as circular, claiming that it is not clear what type of data would falsify them. However, this claim seems to be a red herring since it is obvious that if either (a) embedded language system morphemes with relations external to their heads or (b) embedded language morpheme order (if it is different from matrix language order) occurs in mixed constituents, the principles are falsified. Rather, I suspect what really bothers critics is that they want the matrix language and embedded language to have an identity independent of the two principles (cf. Boumans 1998: 39). However, as I have argued above, these constructs are abstractions. They only become real as the terms of hypotheses (principles). If the hypotheses are supported, there are two results. First, in an empirical sense, the terms Matrix Language and Embedded Language are useful labels for the languages against which the hypotheses have been tested. Second, more important, in a theoretical sense, the terms can be employed in the explanatory generalization that two or more varieties in bilingual CPs showing classic codeswitching have asymmetrical roles.


3.5. The role of congruence

The Afterword also contains a lengthy discussion of the role congruence plays in affecting codeswitching patterns, especially the occurrence of embedded language islands. This is an amplification of the arguments in Myers-Scotton and Jake (1995), although it also refers to later work that appeared as Jake and Myers-Scotton (1997). This article argues that the extraordinary number of large embedded language islands (in this case, IP islands) in an Arabic/English corpus results from lack of congruence between the two languages in regard to information in the lemmas supporting verbs (at the level of the mental lexicon). This argument is further developed in Myers-Scotton and Jake (1999). Still, some observers have misunderstood the Myers-Scotton and Jake position on congruence. In a rather complicated passage, Meechan and Poplack (1995: 190) argue that when there is non-equivalence and therefore a lack of congruence across the two languages, the matrix language frame model would predict little codeswitching. They seem to go on to say that because, in fact, codeswitching does occur in the Fongbe/French corpus they analyze, the matrix language frame model is not supported. The authors claim that Fongbe and French adjectival expressions are incongruent; yet, French adjectives occur and are designated as codeswitching forms by the authors. In fact, the matrix language frame model would not claim a lack of codeswitching when there is incongruence; rather, the model predicts codeswitching with compromise strategies in such cases.9 One compromise strategy discussed under the matrix language frame model (in many places, e.g., in Jake and Myers-Scotton 1997) is the do verb construction. This is a means of incorporating embedded language verbs into a mixed constituent by inflecting a matrix language do verb with the necessary affixes and following it with an embedded language bare infinitive or other non-finite form.
(It seems clear that the do construction in Fongbe/French codeswitching is also a compromise strategy, operating as a means of incorporating French adjectival expressions into a Fongbe frame). Whether they recognize it or not, the conclusion of Meechan and Poplack implies that strategies come into play that are similar to the compromise strategies referred to in discussions of the matrix language frame model. They write, "[s]peakers [Fongbe/French CS speakers] proceed directly to solution (vi)" (p. 191). Solution (vi) reads: "Equivalence is established via a language-internal mechanism specialized for the incorporation of nonequivalent categories" (p. 172).

3.6. Status of morphemes

The Afterword elaborates on the status of morphemes in two ways: (a) It introduces a distinction among system morphemes that is further developed in the 4-M model in Myers-Scotton and Jake (1999) and (b) it summarizes an argument from Myers-Scotton and Jake (1995) that discourse markers should be considered content morphemes. While discourse markers do not participate in the thematic grid of a CP, the new claim is that these markers are content morphemes at the discourse level because they assign discourse-thematic roles, such as consequence, reason, etc. For example, in (6) entonces 'so' is such a discourse marker, restricting interpretation of the following IP to consequence:

(6) Entonces he can use his calling card
    So
    'So, he can use his calling card.'
    (Spanish/English corpus M5.5, cited in Myers-Scotton and Jake 1999: 30)

3.7. Status of borrowed words

Also, the status of borrowed forms vs. singly-occurring CS forms is touched on in the Afterword, but the position taken in Myers-Scotton (1993a) is not changed. That position is that singly-occurring CS forms (content morphemes) are difficult, if not impossible, to distinguish from established loan words in a synchronic sense because they both are morphosyntactically integrated into the Matrix Language or recipient language. The two forms may differ phonetically in that codeswitching forms typically retain much, if not all, of the phonetic features of their production in the embedded language; many established loans, of course (but by no means all!), take on the phonetic features of the recipient language. Presumably, the two forms differ psycholinguistically in that a codeswitching form only has an entry in the mental lexicon tagged for the embedded language, while an established borrowed form has an entry in the matrix language and probably retains one in the embedded language.

3.7.1. Nonce borrowings

Poplack and her associates continue to analyze singly-occurring forms (from the embedded language under the matrix language frame model) in bilingual CPs as nonce borrowings when they are morphosyntactically integrated into the language in which they occur (Budzhak-Jones–Poplack 1997).10 Under the matrix language frame model such forms are singly-occurring codeswitching forms, precisely because they are morphosyntactically integrated into the matrix language.

3.7.2. Claims on the distinction between borrowing and codeswitching

Poplack and her associates also support their claims about distinctions between borrowed forms and codeswitching forms by interpreting data in ways that I find unconvincing. In Meechan and Poplack (1995), the authors argue that in Fongbe/French codeswitching (with Fongbe as the matrix language under a matrix language frame model analysis), if French adjectives occur as fully inflected for French gender and number, they are codeswitching forms. That is, because they show French inflections, they are codeswitching forms, not borrowed forms. For example, in (7) they cite importante, inflected for gender and singular number to agree with langue, as a codeswitching form:

(7) Donc ɔ nyɛ mɔ do que langue ɔ e do importante
    So Top I see tell that language Def she be important
    'So, me, I see that language is important.'


    (Fongbe/French; Meechan–Poplack 1995: 187)

Another reason Meechan and Poplack seem to think that these French adjectives must be codeswitching forms and not nonce borrowings is that they differ in a number of ways from unmarked Fongbe constructions. They state, "[I]t is clear that adjective constructions with do do not follow the dominant pattern of Fongbe adjectival expressions" (1995: 187). Elsewhere (1995: 191) they state that this construction is "virtually nonexistent in monolingual discourse." As noted above in section 3.5, the do verb construction in many other codeswitching pairs is reminiscent of the Fongbe do verb construction because both seem to be compromise strategies so that the speaker can incorporate an embedded language content morpheme into the utterance. Thus, while the way that French adjectives are treated in the Fongbe/French data may not follow the unmarked Fongbe pattern, evidence from other codeswitching corpora with do constructions indicates that the Fongbe treatment of embedded language adjectives is not remarkable. Along with many other researchers studying codeswitching (e.g., Backus 1996), I accept as codeswitching forms any singly-occurring embedded language forms that follow matrix language morpheme order and are either (a) fully morphosyntactically integrated into the matrix language or (b) forms, such as these French adjectives, that are bare forms from the standpoint of the matrix language.11 Recall that for Meechan and Poplack, fully morphologically integrated forms would be nonce borrowings (see section 3.7.1).

4. New approaches to morphemes

What is innovative about Myers-Scotton and Jake (1999) is that it spells out the formal characteristics of morphemes so that a four-way distinction results. This distinction is called the 4-M model. This model builds on the matrix language frame model because it preserves the content vs. system morpheme distinction. What is new is that the model subdivides system morphemes into three types. However, this model is not so much a classification of morphemes but rather constitutes a


hypothesis on how morphemes are conceived in linguistic competence and accessed in production. This article also elaborates on what is involved in congruence crosslinguistically, synthesizing its characterization as the abstract level model. This model refers to the three levels of abstract grammatical structure first related to the matrix language frame model in Myers-Scotton and Jake (1995), as discussed in Section Two. The 1999 article applies these two models to several troublesome data sets and shows how one of the models, or a combination of the two, helps explain the distributions of data in classic codeswitching corpora. The analysis also shows briefly how the abstract level model allows for a more precise description of convergence data in which a composite matrix language is the source of the abstract frame (i.e., specifications from two or more varieties, not a single one).

4.1. The 4-M Model

In the current chapter, the discussion emphasizes the 4-M model because its description is relevant to most of the putative counter-examples to the matrix language frame model. Subdividing system morphemes allows for a fuller explanation of why certain types of congruence problems arise in codeswitching between certain language pairs. The premise that underlies the 4-M model is that languages differ in the level at which different classes of system morphemes are activated or elected. The model hypothesizes that system morphemes are activated at two different levels.

4.1.1. The early system morpheme

Along with content morphemes, one type of system morpheme, labeled early, is activated at the lemma level. That is, both are conceptually activated. Content morphemes, of course, assign or receive thematic roles; the lemmas that underlie them (level of the mental lexicon or the lemma level) are directly elected by the semantic/pragmatic feature bundle that is selected at the conceptual (pre-linguistic) level. Early or indirectly-elected system morphemes are also activated at the lemma level because the lemmas underlying content morphemes point to them (cf. Bock–Levelt 1994).12 Unlike content morphemes, early system morphemes do not receive or assign thematic roles.

4.1.2. Late system morphemes

There are two types of late system morphemes. They are called bridges and outsiders. They are neither activated at the lemma level nor do they receive/assign thematic roles. They are structurally-assigned when information about the constituent structure of morphemes and their assembly as parts of larger constituents is available at the level of the formulator. While there are slots for such late system morphemes at the lemma level, their form depends on information only available when constituent assembly occurs in the formulator. (The formulator assembles the constituent structure of maximal projections, based on information sent to it from the lemmas.)

4.2. Types of system morphemes

The three types of system morphemes can be distinguished formally:

• Early system morphemes are always realized without going outside the maximal projection of the content morpheme that elects them. Thus, the French determiners le and la (respectively masculine and feminine singular) depend on their head nouns for their form. Similarly, when English the expresses definiteness, it also is an early system morpheme.

• Bridge late system morphemes are similar to early system morphemes in that they depend on information within the maximal projection in which they occur. Yet, they differ in that they do not add conceptual structure to a content morpheme; rather, what they do is unite elements in a maximal projection. Hence, the name bridge. For example, in English the genitive possessive of as in friend of the family is a bridge system morpheme. Such a morpheme is not coindexed with any other form; in regard to of, its form depends on directions at the level of the formulator that require case to be realized in this way in this type of English construction.

• Outsider late system morphemes depend on grammatical information outside of their own maximal projection. The result is that the form of such morphemes is only available when the formulator sends directions to the positional/surface level for how the larger constituent (the CP) is unified. For example, in English, in the present tense, the form of a verb that is coindexed with a 3rd person singular noun or pronoun must look to that noun or pronoun for its form (e.g., the dog bite-s the burglar). These morphemes are called outsiders because they look outside their immediate maximal projections for information about their form.

4.3. Constraints on codeswitching

How is this model relevant to explaining the constraints on codeswitching? First, the distinctions that the 4-M model makes are very relevant to the system morpheme principle. With a little thought, it should become clear that those system morphemes that the principle requires to be in the matrix language are outsider late system morphemes, and only those. Recall that the principle refers to all system morphemes that have grammatical relations external to their head constituent. Admittedly, the original formulation is less exact than the new version under the 4-M model, but both clearly refer to the same type of morpheme. For example, in (8), the form of the Shona prefix on the verb (-kasika 'hurry') must agree with the subject (vana 'children', a class 2 noun) of the verb. The form of this subject prefix is not available until the noun phrase is assembled with the verb phrase; that is, the verb form must look outside its immediate maximal projection for the form of the prefix. Also in example (9), the form of the subject prefix on the verb - even though the verb is French - follows well-formedness rules for Lingala, the matrix language in this case. As such it shows the outsider late system morpheme agreeing with a 3rd person singular subject in (9a) and a first person singular subject in (9b). Again, the form of the subject prefixes on the verb cannot be established until the Verb Phrase is assembled into the larger constituent including the Noun Phrase subject of the verb.

(8) Va-na va-no-kasik-a ku-absorb zvi-nhu
    Cl.2/Pl-child Cl.2/Pl-Pres/Habit-hurry-Fv Inf-absorb Cl.8/Pl-thing
    'Children hurry to absorb things.' [Note: Fv = final vowel]
    (Shona/English; Bernsten and Myers-Scotton 1988 corpus)

(9a) Stephane a-telephoner lobi tel
     Stephane 3S-telephone yesterday interrog
     'Stephane didn't telephone yesterday?'

(9b) ngai moto na-telephoner. na-telephon-aki na tongo
     1S person 1S-telephone 1S-telephone-Past at morning
     'I am the one who called. I called this morning.'
     (Lingala/French; Meeuwis and Blommaert 1998: 86; [glosses added by CMS])13

In most languages with overt case marking, the form of the case-bearing affix is assigned by the main verb in the complementizer phrase. Consider this example from an American Finnish/English corpus.14 The partitive case affix -tä is assigned to the noun napkin by the governing verb of possession or owning onks 'have':

(10) Onks sulla vähän napkin-ei-tä
     have/Q you-Ade some napkin-Pl-Part [Note: Ade = adessive case]
     'Do you have some napkins?'
     (Finnish/English; Halmari 1997: 80)
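The four-way distinction and its link to the system morpheme principle can be summarized as a short decision procedure. The Python sketch below is an illustration under stated assumptions: the three boolean features and all names are hypothetical simplifications of the criteria described in sections 4.1 to 4.3, not the model's own formalism.

```python
from enum import Enum


class MorphemeType(Enum):
    CONTENT = "content"
    EARLY = "early system"
    BRIDGE = "bridge late system"
    OUTSIDER = "outsider late system"


def classify(
    thematic_role: bool,
    conceptually_activated: bool,
    looks_outside_projection: bool,
) -> MorphemeType:
    """Assign a 4-M type from three simplified features (assumed for illustration)."""
    if thematic_role:
        # Content morphemes assign or receive thematic roles.
        return MorphemeType.CONTENT
    if conceptually_activated:
        # Early system morphemes are activated at the lemma level.
        return MorphemeType.EARLY
    if looks_outside_projection:
        # Outsiders need information from outside their maximal projection.
        return MorphemeType.OUTSIDER
    # Remaining late system morphemes unite elements inside one projection.
    return MorphemeType.BRIDGE


def must_come_from_matrix_language(kind: MorphemeType) -> bool:
    """Under the 4-M reading, the principle targets outsiders and only those."""
    return kind is MorphemeType.OUTSIDER


french_le = classify(False, True, False)    # determiner 'le'
english_of = classify(False, False, False)  # 'of' in 'friend of the family'
english_3sg = classify(False, False, True)  # '-s' in 'the dog bite-s the burglar'

for label, kind in [("le", french_le), ("of", english_of), ("-s", english_3sg)]:
    print(label, kind.value, must_come_from_matrix_language(kind))
```

On this reading, the Lingala subject prefixes in (9) pattern with English -s: they are outsiders, so the principle requires them to come from the matrix language even though the verb stem is French.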


4.4. Multi-morphemic forms

Sometimes, of course, a form is multi-morphemic. In such cases, Myers-Scotton and Jake (1999) hypothesizes that if a form includes an outsider system morpheme, the entire form is activated as an outsider form (i.e., its form is not accessed until the formulator assembles larger constituents).15 For example, in Swahili/English codeswitching, in (11), the form y-a consists of a bridge morpheme -a (the a of association to Bantuists) and an outsider morpheme y- that receives its form from the class 9 noun. While the form of the a of association is invariant (compare it with of in a similar construction in English or de in the French construction beaucoup de gens 'much of people' (lit.) = 'many people'), it receives a prefix that must agree with the Noun Phrase in the larger constituent.

(11) . . . Lakini Ø-scale y-a chini . . .
     but CL.9/SING CL.9/SING-of below
     '. . . But a lower scale . . .'
     (Swahili/English; Myers-Scotton 1993 [1997]: 104)
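The hypothesis about multi-morphemic forms amounts to a simple rule over the types of a form's component morphemes. The sketch below is a minimal illustration; the function name and the string labels are hypothetical, and the rule is left undefined (None) for forms without an outsider, since the text states the hypothesis only for forms that contain one.

```python
from typing import List, Optional


def activation_of_form(part_types: List[str]) -> Optional[str]:
    """Return 'outsider' if any component morpheme is an outsider, else None."""
    if "outsider" in part_types:
        return "outsider"
    return None  # the hypothesis makes no statement about other combinations


# y-a in (11): outsider prefix y- plus bridge morpheme -a
print(activation_of_form(["outsider", "bridge"]))
```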

4.5. Counterexamples

The literature presents relatively few presumed counter-examples to the system morpheme principle; however, it becomes clear that most of them do not include outsider late system morphemes, meaning they are not counter-examples at all. For example, Bentahila and Davies (1998: 40) cite a number of possible counter-examples to the predictions of the principle from corpora with Arabic and French as the participating languages. One of these involves what they refer to as the Arabic possessive preposition djal. Boumans (1998: 48) also cites an example of djal, again with Arabic and French as the participating languages. Example (12) comes from Bentahila and Davies:

(12) ... de quel degre de connaissance djal la personne ...
'... on which degree of knowledge of the person ...'

The matrix language frame model 47

(Moroccan Arabic/French Bentahila-Davies 1998: 38)

While Bentahila and Davies are correct in stating that the discussion in Myers-Scotton (1993a [1997]: 106) clearly identifies djal as a system morpheme, with the implication that its presence in the cited example supports the system morpheme principle, given the 4-M model I would now argue otherwise. While I maintain that djal is a system morpheme, it is now clear that it is not an outsider, but rather a bridge late system morpheme. Recall that although bridge system morphemes are not conceptually-activated and therefore are accessed late in the production process (at the level of the formulator), they do not look outside their own immediate maximal projection for their form. This is a crucial difference between them and late outsider system morphemes.

In example (12), the apparent matrix language is French. Why should Arabic supply the bridge morpheme in this case? One of the puzzles needing more study is that the system morpheme principle underpredicts the incidence of matrix language system morphemes. Not only late outsiders come from the matrix language; most early and most bridge system morphemes also come from the matrix language. Whatever the reason for its presence, djal is not a counter-example to the system morpheme principle since only outsider late system morphemes meet the specifications of the principle.

Another potential counter-example cited by Bentahila and Davies leads to a recognition that the same phonetic shape may be classified as two different morphemes in different functions. Bentahila and Davies point out a lone example of the Arabic particle f, which corresponds to English 'in', in their example (3) on page 37. They say it is a form "which Myers-Scotton would certainly identify as a system morpheme" (p. 40), referring to a comment in Myers-Scotton (1993a [1997]: 123) about the Spanish equivalent of 'in' (en). My response now would be yes, in this case, f is a system morpheme, but - like djal - it is a bridge late system morpheme as it occurs in Bentahila and Davies' example:

(13) Du moment ou tu n'as pas de reduction f le billet....
'From the moment where you have no reduction in the ticket...'


(Moroccan Arabic/French; Bentahila-Davies, 1998: 37)

In this case, reduction f le billet means 'ticket reduction'. That is, it is synonymous with other bridge system morphemes that refer to relationships between two noun phrases that are often partitive relationships (compare beaucoup de pain 'much bread'). As such, f is not a counter-example to the system morpheme principle.

In other instances, f may be considered a content morpheme that assigns a thematic role to its complement. In example (14), f appears to assign the thematic role of goal or location to s serwal 'the trousers'.16 This bilingual CP occurs in example (2) in the Bentahila and Davies article:

(14) D'ailleurs, hadi ma tadxul š f s serwal.
Besides this not enters not in the trousers
'Besides, this one does not tuck inside the trousers.'
(Moroccan Arabic/French Bentahila and Davies 1998: 37)

There is no reason why the same form cannot function as several different types of morphemes in different contexts. For example, English the is an early system morpheme, conveying definiteness in most contexts; however, in American English at least, it is a bridge late system morpheme in such an utterance as Haven't you heard? John's in the hospital. Note that what distinguishes f and the when they are bridge system morphemes from other system morphemes is that they do not add conceptual material (what early system morphemes do) nor do they look outside their immediate maximal projection for their form (what outsider late system morphemes do).17
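The distinctions drawn in this section can be restated as a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the three yes/no features, the class and function names, and the feature values given to each example form are a hypothetical encoding of the judgments argued for above, not part of the 4-M model's formal statement.

```python
from dataclasses import dataclass
from enum import Enum


class MorphemeType(Enum):
    CONTENT = "content morpheme"
    EARLY_SYSTEM = "early system morpheme"
    BRIDGE_LATE_SYSTEM = "bridge late system morpheme"
    OUTSIDER_LATE_SYSTEM = "outsider late system morpheme"


@dataclass(frozen=True)
class Morpheme:
    # Hypothetical feature bundle; the values encode the judgments in the text.
    form: str
    has_thematic_role: bool         # assigns or receives a thematic role
    conceptually_activated: bool    # adds conceptual material to its head
    looks_outside_projection: bool  # form depends on a larger constituent


def classify(m: Morpheme) -> MorphemeType:
    """Assign one of the four morpheme types from the three features."""
    if m.has_thematic_role:
        return MorphemeType.CONTENT
    if m.conceptually_activated:
        return MorphemeType.EARLY_SYSTEM
    if m.looks_outside_projection:
        return MorphemeType.OUTSIDER_LATE_SYSTEM
    return MorphemeType.BRIDGE_LATE_SYSTEM


def under_system_morpheme_principle(m: Morpheme) -> bool:
    """Only outsider late system morphemes fall under the principle."""
    return classify(m) is MorphemeType.OUTSIDER_LATE_SYSTEM


# Example forms from this section, with feature values assumed from the discussion.
DJAL = Morpheme("djal in (12)", False, False, False)
F_BRIDGE = Morpheme("f in (13)", False, False, False)
F_CONTENT = Morpheme("f in (14)", True, True, False)
THE_DEFINITE = Morpheme("the (definite)", False, True, False)
Y_PREFIX = Morpheme("y- of y-a in (11)", False, False, True)

for m in (DJAL, F_BRIDGE, F_CONTENT, THE_DEFINITE, Y_PREFIX):
    print(f"{m.form}: {classify(m).value}")
```

On this encoding, djal and f in (13) come out as bridge late system morphemes and so lie outside the scope of the system morpheme principle, whereas the prefix y- in (11) is the only form listed that the principle covers.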

4.6.

Revisiting early system morphemes

Finally, in light of the 4-M Model and its discussion in Myers-Scotton and Jake (1999), one can re-visit what are now called early system morphemes and consider their distribution in codeswitching data sets.


Recall that these system morphemes differ from other system morphemes because they are conceptually-activated in that in some sense they flesh out the semantics of the head that they modify. Recall also that they are accessed at the same time as their heads. Thus, it should be no surprise that early system morphemes from the embedded language occur at times in codeswitching corpora. That is, because they are accessed with content morphemes in monolingual speech, mistiming in accessing an early system morpheme from the Embedded Language along with a selected Embedded Language content morpheme seems very feasible.

4.6.1. Mistiming

Considering possible mistiming helps explain the presence in mixed constituents of internal embedded language islands. Such islands have a matrix language element as their head; the island itself often consists of a determiner that is an early system morpheme and its noun phrase (NP) head. Such islands are often misunderstood by other researchers. For example, Meechan and Poplack (1995: 193), mentioned above, claim that codeswitching "in the vicinity of copular do [Fongbe/French codeswitching] constitutes a counterexample to the Matrix Language Frame Model . . . which stipulates that 'embedded language islands' (unambiguous code switches in our terminology) may not occur modified by system morphemes from the 'matrix language'." This is not the claim of the matrix language frame model. First, the example to which the footnote is attached (their example 30d on page 188) is an internal embedded language island under the matrix language frame model. Meechan and Poplack's example is:

(15) Trop extravagant ä
too extravagant Neg
'not too extravagant'

Internal Embedded Language islands are discussed in Myers-Scotton (1993a [1997]: 151-153). To qualify as an internal Embedded Language island, the island must have a node dominating the Embedded Language


island filled by the matrix language; in this case, the negative ä fills that node. Meechan and Poplack misinterpret the relation of the negative element to the island. It does not modify the internal island; rather, the negative element dominates the island in the phrase structure. Second, whether, in fact, the negative element is a system morpheme is problematic; in many languages, one can argue that negative elements are content morphemes. Like all embedded language islands, internal embedded language islands must show structural dependency (trop extravagant does this with the adverb trop dependent on the adjective extravagant).

Other researchers disparage an analysis allowing internal embedded language islands as impoverishing the system morpheme principle. Bentahila and Davies (1998: 36) assert that they "considerably weaken the original claim of the model" regarding system morphemes in mixed constituents. Later, they (1998: 41) refer to the "problem posed by the occurrence of French determiners in mixed language noun phrases." Also, Boumans (1998: 44-45) states, "I have no quarrel with the notion of internal EL islands, (sic) however, it undermines the authority of the System Morpheme Principle."

Contrary to these claims, the system morphemes in internal embedded language islands have no effect on the system morpheme principle because they do not come under its purview. In line with the discussion in Myers-Scotton (1993a [1997]: 152), the construction housing the internal embedded language island is a matrix language constituent. The constituent itself consists of a matrix language morpheme in Specifier position and the embedded language internal island as a sister node. The morpheme in specifier position is an early system morpheme. In (16), with Shona as the matrix language, ka- is a noun class prefix (class 13, a diminutive class). Like other early system morphemes, ka- depends on its head for its form (in this case the noun house). Also, the embedded language system morpheme in the island, the -s plural on houses, is an early system morpheme, likewise depending on its head for its form.

(16) ka-small house-s
Cl.13/PL-small house-PL
'small houses'


(Shona/English Myers-Scotton 1993a [1997]: 152) Similarly in dak la chemise 'this the shirt', the Arabic dak is an early system morpheme, as is the French article la in the French noun phrase (Bentahila and Davies, 1998: 36). Based on such data, I hypothesize that all system morphemes in constructions involving internal embedded language islands will be early system morphemes, not the late outsider system morphemes to which the system morpheme principle refers. Thus, I do not see how analyzing such constructions as internal islands has any effect on the system morpheme principle at all. I would go on to predict that early system morphemes that are embedded language determiners may occur frequently in some data sets in internal islands, given the relation of some determiners to their heads. Not only do they add definiteness but in some languages (for example, Romance languages), determiners make visible the phi features of number and gender that are properties of their head nouns.

4.6.2. Double morphological features

Similarly, it is no surprise if nouns in other data sets often occur with an embedded language plural marker doubling a matrix language plural marker (as in ka-small house-s above). Myers-Scotton and Jake (1999) propose the hypothesis that only early system morphemes can double in codeswitching data sets. We argue that mistiming in the production process is a likely reason for such doubling (i.e., the early system morpheme is indirectly-elected along with its content morpheme head that is directly-elected). Double morphology involving the plural affix18 is most common, as in (16) where lesson receives both the matrix language prefix ma- and the embedded language suffix -s. The plural affix in many languages is a prototypical example of an early system morpheme because such affixes are realized without going outside the maximal projection whose head elects them and their form depends on this head.

(16) ... va-no-nok-a ku-it-a catch-up mu-ma-lesson-s
Cl.2/Pl-Hab-be late-Fv Inf-do-Fv catch-up Cl.18-Cl.6/Pl-lesson-Pl


'...they are late to catch up in [their] lessons.' [Note: Fv = final vowel]
(Shona/English; Bernsten-Myers-Scotton 1988 corpus)
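The two claims at issue in this section, the system morpheme principle and the doubling hypothesis, lend themselves to a simple check over an annotated constituent. The Python sketch below is a minimal illustration only; the Token structure, the labels "ML" and "EL", the function names, and the hand annotation of the doubled plural on lesson are assumptions made for the sake of the example, not an implementation drawn from the model itself.

```python
from dataclasses import dataclass

# Hypothetical labels for the four morpheme types of the 4-M model.
CONTENT, EARLY, BRIDGE, OUTSIDER = "content", "early", "bridge", "outsider"


@dataclass(frozen=True)
class Token:
    form: str
    language: str      # "ML" (matrix language) or "EL" (embedded language)
    kind: str          # one of CONTENT, EARLY, BRIDGE, OUTSIDER
    feature: str = ""  # grammatical feature marked, e.g. "plural"


def principle_violations(tokens):
    """Outsider late system morphemes supplied by the embedded language."""
    return [t for t in tokens if t.kind == OUTSIDER and t.language == "EL"]


def doubled_features(tokens):
    """Features marked by both a matrix and an embedded language morpheme."""
    by_feature = {}
    for t in tokens:
        if t.feature:
            by_feature.setdefault(t.feature, []).append(t)
    return {
        feature: marks
        for feature, marks in by_feature.items()
        if {t.language for t in marks} == {"ML", "EL"}
    }


def doubling_is_early_only(tokens):
    """True if every morpheme involved in doubling is an early system morpheme."""
    return all(
        t.kind == EARLY
        for marks in doubled_features(tokens).values()
        for t in marks
    )


# The doubled plural on 'lesson', annotated by hand (an assumption of this sketch).
LESSONS = [
    Token("ma-", "ML", EARLY, "plural"),
    Token("lesson", "EL", CONTENT),
    Token("-s", "EL", EARLY, "plural"),
]

print(principle_violations(LESSONS))
print(sorted(doubled_features(LESSONS)))
print(doubling_is_early_only(LESSONS))
```

Under this annotation the constituent contains no embedded language outsider morpheme, and the only doubled feature, plural, is marked by early system morphemes on both sides, which is the pattern the doubling hypothesis predicts.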

5. The composite matrix language

The notion of a composite matrix language is discussed in several works (e.g. Afterword 1997; Myers-Scotton 1998); it is considered specifically in terms of the abstract level model in Myers-Scotton and Jake (1999). As indicated in Section One, the notion of classic codeswitching assumes that speakers are proficient enough in the variety selected (generally unconsciously) as the matrix language to employ its abstract grammatical frame to structure the bilingual CP when they engage in classic codeswitching. However, there are many cases of bilingual speech for which speakers do not have sufficient access to the frame of a target matrix language to employ it consistently/completely in their codeswitching. In such cases, I argue that the result is a composite matrix language that structures the bilingual CP (projection of complementizer). That is, abstract lexical structure from more than one variety is involved in building the frame. Such is the case in convergence and potential language shift, for example.

The abstract level model provides for a composite outcome because it is based on the premise that levels of structure can be split and recombined. That is, one level - or parts of one level - may come from one variety and other levels - or their parts - from another variety. The result is a composite matrix language.

Consider (17) from the speech of a Hungarian child whose L1 is Hungarian, but who is growing up in the United States, with English becoming her dominant language. Bolonyai (1998: 34) shows how English influence at the level of lexical-conceptual structure can be seen in the way the English expression to play school is mapped onto the largely-Hungarian frame. Bolonyai points out that "[i]n English, lexical-conceptual structure projects a Locative thematic role (i.e., school), whereas in Hungarian, an actor is the required thematic role (i.e., iskolás 'schooler')."


(17) játsz-ok school-Ø
play-1Sg/Pres/Sub.Conj school-Acc
'I'm playing school.'
Standard Hungarian: iskolá-s-at játsz-ok
school-N-Acc play-1Sg/Pres/Obj.Conj
[Expected: school-os-at játszok]
(Hungarian/English Bolonyai 1998: 34)

At the level of lexical-conceptual structure school is treated as a locative, not as the basis for a derived actor. At the level of morphological realization patterns, verb placement is according to English, not Hungarian. Also, the Hungarian suffix for actor -s is missing before the accusative marker on the codeswitched form school. Thus, in this example there is evidence that a composite matrix language, which consists of English as well as Hungarian abstract structure, is structuring the utterance.

Research under the abstract level model (in combination with the 4-M model) is in its early stages. In various ways, these new models interact with the original matrix language frame model. For example, one can formulate hypotheses about the sources of various parts of a composite matrix language, either regarding morpheme types or levels of structure. These models have promise in showing how combining structures in language contact phenomena other than codeswitching can be explained rather than simply described as transference. These combinations have analyzable, even predictable, internal structure.

6. Conclusion

In summary, first, I have tried to clarify the original provisions and modifications of the matrix language frame model as it applies to classic codeswitching. Second, I have briefly explained two new sub-models that complement the original model; these are the 4-M model of morpheme types and the abstract level model of complex grammatical structure. Third, I have emphasized two specific goals: making plain my conception of the matrix language and spelling out the intended referents


of the system morpheme principle. The discussion has emphasized that the matrix language is best understood, not as an actual language, but rather as a theoretical construct referring to the abstract morphosyntactic frame that structures bilingual utterances. The value of the concept of a matrix language-as-frame and its opposition to the role of the embedded language is that it explains codeswitching patterns and asymmetries across diverse CS data sets.

Thanks to the development of the 4-M model, the system morpheme principle can now be better understood. It becomes clear that the system morphemes to which this principle applies are only one of the three types of system morpheme; these are the outsider late system morphemes that must look outside their immediate maximal projection for information about their form. I argue here that some, if not most, of the supposed counter-examples to the principle involve other types of system morphemes. An obvious subject for further research is the status of these other morphemes in relation to the matrix language frame; that is, why is the matrix language still the source of most of them? The explanation may be two-fold: (1) While both languages are on, the matrix language has a higher level of activation than the embedded language; this promotes selection of matrix language morphemes in general. (2) Because they are naturally congruent with the requirements of the abstract grammatical frame projected by the matrix language, they are easily selected first.

Still, I hope I have answered some questions here. At the same time, not all of the questions regarding the structure of codeswitching and other language contact phenomena have been answered yet; there is always room for new analyses and insights from many researchers.

Notes

1. Exactly which lexical categories assign thematic roles is still under much discussion among syntacticians/semanticists. This means that I must admit that there are more grey areas than I would like regarding which lexical categories, or parts of lexical categories, are content vs. system morphemes. Further, there may be disagreement as to the exact identification of which thematic role is assigned in a given instance.


Still, the content vs. system morpheme distinction remains viable because at least there is consensus that certain lexical categories (verbs and some prepositions) are prototypical thematic role assigners and that certain other categories (nouns and attributive adjectives) are prototypical thematic role receivers.

2. The levels of lexical-conceptual structure and predicate-argument structure are based on discussion in Rappaport-Levin (1988). The conception of the level of morphological realization patterns draws on Talmy (1985).

3. A key premise for researchers, such as MacSwan (1999), who are attempting to explain codeswitching within a minimalist framework (Chomsky 1995) is that only the same constraints that explain monolingual syntactic structures are necessary to explain codeswitching data.

4. For example, in the discussion of embedded language islands (1993a [1997]: 128, 130) several supposed examples of islands (e.g., examples 14 and 15) are, in fact, full monolingual CPs, not islands at all.

5. Meeuwis-Blommaert (1998: 76) state that they question my frame of reference in Myers-Scotton (1993a and 1999b), implying that my use of the term language implies a monolectal view. They also ask, "what particular varieties of English, Swahili, or Shona are being used?" (p. 79). I hope it is clear to most observers that I use the term language only as a common conventional term for a linguistic variety that often is not the standard dialect. Further, if it needs saying, of course I acknowledge that the status of any one variety and its structure vis a vis other varieties is always in flux.

6. In her review of the 1997 edition, Poulisse (1998) raises several puzzling points. First, she seems to have missed the discussion in the Afterword of the CP as the proper unit of analysis because she writes, "... it is not always clear which part of the discourse should be used as the unit of analysis...." (p. 379). Second, she is right to say that if "the predictions made by the Morpheme Order Principle and the System Order (sic) [system morpheme] principle are not always correct", then "the MLF model is falsified to some extent" (p. 379). But she fails to go on and give evidence that the principles are falsified across data sets. She does state that in her own L2 learner data, there are 33 instances of Dutch determiners. However, such determiners are not the type of system morphemes to which the system morpheme principle applies. Determiners in Dutch - like determiners in many languages - are early system morphemes (their form depends on their relation to the heads of their immediate maximal projection), not late outsider system morphemes. See section 4.

7. Later, Boumans (1998: p. 76 ff.) seems to support the claim that the matrix language is the language of the inflections of the tensed verb. As long as the tensed verb is in a mixed constituent, this claim is acceptable because most (if not all) of such verbal inflections are the type of system morpheme that must come from the matrix language under the system morpheme principle. However, it is quite another story if the tensed verb of the CP is in an embedded language island. While such examples are not frequent across many data sets, they are extremely frequent in some data sets, such as the Palestinian Arabic/English corpus discussed in Jake and Myers-Scotton (1997)

and Myers-Scotton and Jake (1999). That is, English IP embedded language islands (N = 86) make up nearly 20% of the English elements in the data set.

8. My use of syntactically relevant as an abbreviated way to say have grammatical relations external to their head constituent clearly was a source of confusion (Myers-Scotton 1993a [1997]: 83). Even as a short form, syntactically relevant was intended as a technical term. However, Boumans (1998: 37) seems to have interpreted relevant as only important when he writes (regarding the presence of EL plural affixes in a mixed constituent), "[I]f there is just an EL system morpheme marking plurality in a mixed constituent, it cannot be said to be syntactically irrelevant."

9. A problem arises regarding the interpretation of Meechan and Poplack regarding the position of the matrix language frame model on bare forms. Their reference is to Jake and Myers-Scotton (1994), a poster presentation in which embedded language islands are the only topic (as compromise strategies). However, elsewhere, Myers-Scotton (1993a [1997]: 92ff.; 110ff.; 1999a), as well as Jake-Myers-Scotton (1997), discuss bare forms and the do construction as ways to preserve the constraints of the model (i.e., as compromise strategies). Thus, while the matrix language frame model would predict that incongruence may prevent some embedded language forms from receiving full morphosyntactic integration into the matrix language frame, the model has always allowed such forms as bare forms in mixed constituents.

10. Budzhak-Jones and Poplack (1997: 251) write, "It is now clear that the lone English-origin nouns with overt inflections for Ukrainian number, gender, and/or case, whether standard or not, have been integrated into the grammar of Ukrainian. This means that they are borrowings, if only for the nonce." They go on to imply that for singly-occurring embedded language forms to be considered codeswitching forms, they should "display some features of English grammar which is (sic) at the same time not Ukrainian" (p. 252). The matrix language frame model takes a very different position. If singly-occurring forms in this data set show English morpheme order or syntactically-relevant system morphemes, the matrix language frame model would make an opposite argument. They would not be considered as codeswitching forms, but rather as counter-examples to the model because Ukrainian, not English, system morphemes are predicted with codeswitching forms in mixed constituents.

11. Further, there are two arguments as to why the French adjectives occur with French agreement morphology and are not more morphosyntactically integrated into Fongbe. (a) Fongbe has few real adjectives (according to Meechan and Poplack p. 176) and apparently no gender-agreement morphology. Therefore, if one assumes Fongbe as the Matrix Language, French adjectives inserted into Fongbe frames could still hardly show Fongbe agreement affixes. (b) French adjectives, as surface forms, must agree in number and gender with their head nouns; that is, there are no French surface forms without this agreement. Yet, the French adjectives are still bare forms from the standpoint of Fongbe.

12. Bock and Levelt (1994: 953) refer to indirect election of some lemmas by lemmas for lexical concepts. They do not specify which lemmas are indirectly elected; however,

the example they give is of what is an indirectly-elected or early system morpheme under the 4-M model (their example is to in listen to). Of course, under the 4-M model there are three types of system morphemes (not something that Bock and Levelt consider) and only early system morphemes are indirectly-elected.

13. Meeuwis and Blommaert (1998: 88) state that "... more than any other instance of Lingala-French code-switching, speech such as example 1 [examples 9a and 9b in my text] breaches all of the possible syntactical constraints suggested in the grammatical literature on code-switching." I pointed out to them (personal communication) that the example did not violate the matrix language frame model; indeed, as I have shown here, their data support the system morpheme principle. In response, they replied (in a personal communication) that their comment about breaches did not refer to the matrix language frame model, even though Myers-Scotton (1993a) was the only model of syntactic constraints listed in the references. Their generalization in print about "all the possible syntactical constraints" could mislead readers.

14. Halmari (personal communication) states that the partitive-assigning verb olla 'have' becomes onks in the colloquial question form.

15. Myers-Scotton-Jake (1999) discusses German determiners as multi-morphemic. While they include morphemes for gender and number (early system morphemes), they also include a morpheme for case. Case is a late system morpheme in German because verbs or prepositions assign case to a noun and its modifiers. Thus, the German determiner is not accessed until the level of the formulator when larger constituents are assembled. Certain codeswitching distributions involving German are attributed to this lateness of German articles.

16. I thank Janice L. Jake for a discussion and some useful ideas about the status of f when Arabic is not the matrix language.

17. Some of the other putative counter-examples that Bentahila and Davies (1998) cite could well be performance errors. For example, the demonstrative and determiner from Arabic in the Noun Phrase dak l materiel 'that the material', which is part of a conversational turn that is otherwise almost entirely in French, are hard to explain as anything but idiosyncratic errors. Of course, if such examples were numerous, then they would require a revision of the model. A possible explanation, though, for Arabic demonstratives/determiners when Arabic is the embedded language in a bilingual CP is that demonstratives and determiners are early system morphemes (their form depends on the head of their maximal projection, the noun they modify, and they add the conceptual structure of definiteness). Since they are indirectly elected at the lemma level, it is easy for them to reach salience when a noun phrase is being accessed. Thus, in Bentahila and Davies' example (2) (cited as (1) below), dak may surface as an emphatic demonstrative:

(1) Ça te rappelle quelque chose, dak la chemise?
that you recall something that the shirt
'Does that remind you of something, that shirt?'
(Moroccan Arabic/French Bentahila and Davies 1998: 36)

The same argument applies to hadik in their example (6) on p. 39, ses idees hadik


'his ideas these'. Certainly, such lone embedded language morphemes as discourse markers (walafdn 'but' and li'?anna 'because' in their example (5) on their p. 38 and biianna 'in me' in their (6) on their p. 40) can be explained as content morphemes at the discourse level. Also, my colleagues and I are concluding that complementizers, at least in some languages, are also content morphemes because they behave similarly to discourse markers (i.e., they restrict the interpretation of what follows).

18. Boumans (1998: 36) acknowledges that double morphology involving plural affixes occurs in many data sets without weakening the system morpheme principle. However, he argues that lone embedded language plurals (no matrix language plural is present) still pose a problem for the system morpheme principle. I have several quarrels with his analysis. First, he states that such lone plurals "are at least as common as double plural morphology". I would like to see some quantitative evidence of this claim because it contrasts with my experience. Second, he assumes that the embedded language plural affix governs matrix language agreements in the constituent. In his example, the mixed Noun Phrase duk artikel-en 'those articles' (Arabic demonstrative and Dutch noun and plural suffix) in a constituent for which Arabic is the Matrix Language, the noun is marked only for plural by Dutch, not Arabic. Yet, the Arabic demonstrative duk 'those' is marked for plural, as is the Arabic verb with which the noun agrees, t-terzem-hüm (2-translate-3PL 'you translate them'). My claim is that it is the underlying matrix language counterpart noun (i.e., Arabic noun) with which these plural forms are coindexed. This argument is in line with the claims in Myers-Scotton and Jake (1995) that embedded language forms are checked for congruence with a matrix language counterpart. However, I also admit that this is a problematic example for which Boumans' analysis may be right.

Language alternation: The third kind of codeswitching mechanism Rodolfo Jacobson

1. Introduction

Recent studies are suggesting that some codeswitched sentences may not be analyzable on the basis of matrix and embedded language utterances, as the two participating languages may play equal roles in the unfolding of the message. Consider the following mixed sentences in which chunks of English and Spanish occur together in the construction of an utterance:

(1) Los pensamientos de uno del otro lado es take over where you are working at, eh?
The thoughts of one from the other side is ...
'The thoughts of one from the other side is [to] take over where you are working at, eh?'
(RJ/fm, 2B)1

or

(2) Y lo logran, they continue helping their own family, tienen su señora y familia.
And they succeed in it ... (they) have their spouse and family
'And they succeed, they continue helping their own family, they have their own spouse and family.'
(RJ/rm, 1.3)

One would be hard pressed here to determine whether the English portion dominates the Spanish segment or the Spanish portion dominates the English one. A similar relationship can be found in utterances in which English and Malay make up a sentence, as can often be overheard among speakers in Malaysia. Consider the following mixed sentences:

60 Rodolfo Jacobson

(3) Jadi apakah pendapat orang-orang seni apabila Roslan Aziz kata you need professional, you do what you do best, you menyanyi.
So what-Q opinion men art when Roslan Aziz say ... sing
'So, what is the artist's opinion when Roslan Aziz says you need a professional, you do what you do best, you sing.'
(OKS, 2[IIA])2

or

(4) Okay, what Mike Bernie is saying, kita ini tak ada problem sebenarnya.
... we here not have ... actually
'Okay, what Mike Bernie is saying, we do not have a problem here actually.'
(OKS, 3[IIB])

In the light of data of this nature, the author is arguing that some mixed sentences do not display a superordinate-subordinate relationship but rather reflect a balance between the two participating languages, even though this particular pattern may not represent the great majority of codeswitching data. If this notion of equality can be shown to exist also in other language pairs, then it might be worth formalizing the occurrence of Language Alternation as a kind of mechanism that differs from the customary matrix-embedded language relationship. It is the purpose of the present chapter to briefly describe the first two mechanisms, in which L1 dominates L2 or the reverse, and then argue, on the basis of data from Moroccan Arabic as well as from further examples from Spanish-English and Malay-English codeswitching, for the existence of a balanced distribution between the two languages in bilingual discourse as a mechanism worthy of consideration.

Language alternation as measure 61

2. Matrix-embedded language constructs

The most common type of codeswitching observed in multilingual societies is one in which one language occupies a dominant position and the other is subordinated to it. Observe the dominance of English over Spanish in the following example:

(5) Getting to what age? Well, always the reason,...the reasons que el mejicano trabaja; there is always on their minds, for my sons,
    [...] that the Mexican works [...]
    'Getting to what age? Well, always the reason,..the reasons that the Mexican works is always on their minds, for my sons.' (RJ/rm, 1.5)

Quite often, it is the Spanish language portion that dominates over the English segment:

(6) Eso es todo lo que hace el mejicano no mas; they go; es todo.
    That is all what does the Mexican only [...] is all
    'That is all that the Mexican ever does; they go, that's all.' (RJ/rm, 1.7)

In turn, the dominance of English over Malay becomes obvious in such mixed events as:

(7) Sayang, we have to define what is bahasa tinggi and rendah. You can't just go and write.
    Love [...] language high [...] low
    'Honey, we have to define what high and low language is. You just can't go and write.' (OKS, 5 [IIIA])

or that of Malay over English:

(8) Hey, Feminin is so cute lah. You dengar tak lagu dia 'Kini'?
    Feminin (emph.) [...] hear not song his Now?
    'Hey, Feminin is so very cute. Did you not hear his song "Now"?' (OKS, 8 [IVA])

Examples like the above show strong support for the notion that the dominant language portion serves as the matrix within a given sentence, whereas the sub-dominant segment is the embedded string. By the same token, it makes sense to argue that a first step in the construction of a mixed discourse utterance is a frame into which the matrix chunk and the embedded chunk are both inserted. Conceptualizations along this line of frame constructions have been made for some time now. The writer proposed a crude frame model in a paper delivered at a professional meeting in Fort Worth, Texas (South Central Modern Language Association) over twenty years ago, and this notion has since been refined by others into a model known as the matrix language frame model (Shoji Azuma, Carol Myers-Scotton). Although significant differences exist between the earlier and current postulations - Jacobson used the sentence as a basic unit of analysis while Myers-Scotton has recently proposed that the unit of analysis should be the projection of complementizer - the overall idea remains the same, that is, there is an unequal relationship between the two language segments in that one represents the dominant language and the other the dominated or embedded language. The ongoing professional debate, however, now raises the fundamental question of whether the matrix-embedded language relationship is the only viable mixed language manifestation or whether, in addition to it, there is still another relationship in which the two languages are balanced, without either one showing superordination with respect to the other.
Those who believe in the exclusiveness of the former recognize only two mechanisms of codeswitching (L1 dominating L2 and L2 dominating L1). Those who believe that, in addition to the matrix-embedding manifestation, there is an equal relationship pattern recognize three: the two previously mentioned mechanisms and a third one, Language Alternation, as suggested by a well-known team of Moroccan linguists (Bentahila and Davies).

This conceptualization is, however, opposed by others, in particular by Myers-Scotton, who contends that there is always a dominant language in codeswitched speech. This difference in opinion is partly due to the fact that the syntactic unit of analysis for the latter is the projection of complementizer CP, whereas for the former it is the sentence as a whole. Myers-Scotton elaborates on this issue in her earlier correspondence with the writer by saying that

    [a] CP is a syntactic unit headed by a Complementizer position; in what we called independent clauses, the COMP position is empty...Within many sentences, there are more than one COMP. Generally, the CP has a finite verb...Therefore, what others (and me, too) called intra-sentential CS [codeswitching] should be called intra-CP switching, I now state. This is the domain of the MLF [Matrix Language Frame] model and it is only within the CP that the ML [Matrix Language] vs. EL [Embedded Language] distinction makes any sense. Switching across CPs is something else and is outside my purview.

Differences between researchers are not unusual and, for the time being, this author has remained faithful to the sentence as unit of analysis. Within such a framework it seems to make sense, as it does for Bentahila and Davies, to speak of three different mechanisms: one in which language A occupies a dominant position, another in which language B occupies that position, and a third in which language A and language B share an equal load of responsibility, with neither dominating the other.

3. Codeswitching as an equal relationship

The work of Bentahila and Davies does not only raise a theoretical issue, the existence of a third mechanism, but carries the research to a new language group, Moroccan Arabic/French. Jacobson argued in his introduction that, if a balance between the two participating languages could be found in mixed language discourse, it might well be worthwhile to formalize the concept of a third codeswitching mechanism. Bentahila and Davies (1997:25-49) have actually done so when they analyzed a

passage of Arabic-French discourse in their recent study Codeswitching, an unequal partnership? After examining the following sample,

(9) wahed nmtba kunt ana w Thami. On s'est arrêté juste au feu rouge, on parlait kunna brina nmsiw I meraks ma nmsiw I meraks, w hunt qrit. Il m'a vu enseigner w daksi, w zajin hna, on habitait ici. waqef, il faut voir,Afo le dix-septième étage f dak le feu rouge fas zawlu zzerda hvstanija. ad sawbulha Igas et j'étais devant, il y avait une centaine de voitures derrière moi, w ana waqef. J'attends le feu rouge pour changer, wahedsar, comme ça, je démarre, je démarre, yaw/, w kant dak la semaine djal tajzawlu les permis. Je démarre hakda w nnas kulhum waqfin muraj.
    ['Once there were Thami and I, we stopped right at the red light, we were talking. We were wondering whether to go to Marrakesh or not, and I had been teaching. He watched me teaching and so on, and we were coming here, we lived here. I was waiting, you should have seen, near the seventeen-story building, at that traffic light where they had taken away the garden in the middle, they have just put concrete there, and I was in the front, there were about a hundred cars behind me, and I was waiting. I was waiting for the traffic lights to change. After a while, just like that, I moved off, I moved off, I mean, and it was that week where they take driving licenses away. I moved off like that and all the people were waiting behind me.']

Bentahila and Davies made a word count to determine the number of words in each language. More relevant than the almost perfect balance between Arabic (57) and French (59) words was for them the distribution of each language at clause level. They identified 26 clauses, 11 entirely in Arabic and nine entirely in French. They went on to examine the remaining six clauses and found three of them in French (except for an Arabic filler) and three, in Arabic (except for some French nouns preceded by their determiners). Furthermore, they noticed frequent alternation between whole statements in one as well as the other language and interpreted the alternations as a means to show that both languages seemed to have equal parts to play in the unfolding of the story. In view of all this, they concluded that "it does not seem plausible to view the French clause as insertions into an Arabic matrix, or the Arabic ones as being embedded in essentially French discourse." This finding matches well Jacobson's earlier view concerning the

mixing of two languages (Jacobson 1977, 1983) to the effect that some mixed sentences may not qualify as instances of codeswitching or codemixing, for the simple reason that the portion rendered in language A (L1) cannot be said to be embedded in language B (L2) or vice versa; rather, the two language segments A and B (L1 and L2) maintain equal status in a given bilingual discourse. As a matter of fact, the author had suggested, already in 1983, that some utterances might be classified as AB Frames because such utterances were encoded in two languages, A (L1) and B (L2), each sharing with the other the total load of the message to be conveyed (Jacobson, 1983). In a more recent study, Jacobson elaborates further on the potential of identifying the cited third mechanism by (a) examining some recent recordings in Malay-English, (b) selecting a number of utterances that qualify as instances of language alternation and (c) describing the criteria to be used in order to distinguish language alternation utterances (third mechanism) from common matrix-embedded language occurrences. With the assistance of Ong Kin Suan, a Chino-Malaysian instructor residing in Kajang, Selangor, Malaysia, fifteen dialogs or conversations were recorded and transcribed, yielding a total of 62 pages of material containing 413 switches of various kinds. The recordings amounted to approximately 15 hours of audiotaping, and each informant was carefully screened in terms of seven variables, i.e. gender, age, ethnicity, socio-economic status, education, native language and occupation. Of 53 informants, 19 were male and 34 female. Except for 8 informants whose ethnicity was Chinese, Indian or Indo-Malaysian, the individuals were all Malays speaking Bahasa Malaysia as their native language.
The educational level of all was either post-secondary or tertiary [college level], and most were Bilingual Education students, with a sprinkling of other occupations such as media persons, businessmen and politicians. The informants were middle or upper-middle class members of Malaysian society, although the exact wages earned were difficult to assess. Observe now the following mixed language utterances, found among the Malay-English discourse samples, that can only be identified as instances of language alternation:

(10) I feel artis-artis veteran lebih berpengalaman untuk beri dorongan kepada artis-artis muda sekarang dan saya rasa, ini untuk artis-artis muda ya, saya rasa dia orang tak perlu tanya what the association can do for them, they should ask apa yang dia orang boleh buat untuk menaikkan tahap persatuan itu.
    [...] artists veteran more experienced to give encouragements to artists young now and I feel, this for artists young yes, I feel he person not need ask [...] what that he person can do to raise status association that
    'I feel that veteran artists are more experienced to encourage young artists now and I feel this is for young artists for sure, I feel a person does not need to ask what the association can do for them, they should ask what a person can do to raise the status of that association.' (OKS, 2 [IIA])

(11) Jadi, Y.B. [Yang Berhormat]. Ada perlu compromised/? di antara popular culture...compromise between popular culture dengan perlunya nilai-nilai tradisi ini.
    So the One honored there is need [...] Q in between [...] with its need values tradition this
    'So, my distinguished friend. Is there a need to compromise between popular culture..compromise between popular culture and the need for the values of this tradition?' (OKS, 3 [IIB])

(12) Culture is something like...you look this group of people, sebuah masyarakat and you look at their way of life, cara kehidupan mereka, macam mana berinteraksi sesama mereka dan ciri yang unik terhadap masyarakat tersebut, the unique ways of the society or the group.
    [...] one society [...] way living their how much interact among them and a characteristics which unique toward society cited [...]
    'Culture is something like.. you look [at] this group of people, a society, and you look at their way of life, their way of life, how they interact among themselves and [display] unique characteristics toward [the] cited society or the group.' (OKS, 4 [IIIA])

By the same token, the following Spanish-English samples could also be assigned to language alternation:

(13) Try to get a house and to have this and to have this other one; trying to gain lo que ya perdio de juventud. Ese es el pensamiento de el.
    [...] that which already lost in youth. That is the thought of him
    'Try to get a house and to have this one and have the other one; trying to gain what he had lost when he was young. That is his thinking.' (RJ/rm, 1.11)

(14) Pos depende si quieres American-made, - I'll stick to a Chevrolet six cylinder; they are very reliable, - pero, if you are interested in a gas economy, no mas, este, el Datsun o el Toyota, either one.
    Then [it] depends if [you] want [...] but [...] only eh the [...] or the [...]
    'Then, it depends on whether you want American-made, - I'll stick to a Chevrolet six cylinder; they are very reliable, - but, if you are interested in gas economy alone, eh, the Datsun or the Toyota, either one.' (RJ/ec, 17)

The identification of these text samples as instances of language alternation seems to have resulted from an intuitive judgment by the observer, as it is virtually impossible to determine which language dominates and which is embedded. An intuitive judgment, however, is not good enough in the eyes of the linguistic analyst, and specific criteria should be identified to help him/her recognize the balanced manifestation not only in the above examples but also in scores of other text samples. Following the guidelines proposed by Bentahila and Davies in the study mentioned above, Jacobson suggests the following set of criteria in order to assess Language Alternation status in a more objective way:

(1) There are a number of language elements that can be properly quantified, such that the numerical balance (or near-balance) of these elements suggests that one of the prerequisites of Language Alternation has been satisfied.

(2) From the purely lexical viewpoint, the number of words contributed by each language to a given string is approximately the same.

(3) From the morphosyntactic viewpoint, there are roughly as many clauses, prepositional/adverbial phrases, noun phrases and isolated morphemes in one language as there are in the other.

(4) There is an equal (or almost equal) number of monolingual utterances in either language.

(5) Finally, the actual message to be conveyed relies strongly on the two languages, such that both contribute significantly to what Bentahila has called the unfolding of the story. A valid test in this regard might be to exclude all the material uttered in one of the languages and then assess whether the total message can still be derived from the other language material, that is, from what remains after the exclusion. If it cannot, one is certainly dealing with the joint contribution of both languages to the unfolding of the story.
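
The quantitative criteria (1) to (4) lend themselves to a simple computational check. The following Python sketch is one possible operationalization and is not part of the criteria as proposed: the function names, the ratio measure and the 0.75 threshold are all illustrative assumptions. The figures are the word and clause counts that Bentahila and Davies report for sample (9).

```python
# Illustrative sketch only: the ratio measure and the threshold are
# assumptions made for this example, not part of the proposed criteria.

def balance_ratio(count_a: int, count_b: int) -> float:
    """Return smaller count / larger count; 1.0 means perfect balance."""
    if count_a < 0 or count_b < 0:
        raise ValueError("counts must be non-negative")
    larger = max(count_a, count_b)
    if larger == 0:
        return 1.0  # nothing to compare, treated as trivially balanced
    return min(count_a, count_b) / larger


def looks_balanced(counts: dict, threshold: float = 0.75) -> bool:
    """True if every counted element type meets the hypothetical threshold."""
    return all(balance_ratio(a, b) >= threshold for a, b in counts.values())


# Counts reported for sample (9), given as (Arabic, French).
sample_9 = {
    "words": (57, 59),
    "monolingual_clauses": (11, 9),
}

for element, (arabic, french) in sample_9.items():
    print(f"{element}: {balance_ratio(arabic, french):.2f}")
print("balanced:", looks_balanced(sample_9))
```

Criterion (5) is qualitative and cannot be reduced to counts in this way; the exclusion test it describes still calls for the analyst's judgment.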

4. Sociocultural implications

Language alternation, and not matrix-embedded language codeswitching, stands an excellent chance of receiving the approval of governmental agencies responsible for the language planning efforts of a nation, as the equality status is important in allowing some other-language elements into the nation's linguistic network. This is particularly true for situations where an indigenous language has risen to become the national language of the country. To be sure, a subordinate role of the new national language, that is, subordinate to, say, a western language, would run counter to the desire of the newly independent nation to establish the standardized variety of its vernacular as the official language of the land. Malaysia is a case in point and serves as an example of how language alternating techniques may be tolerated whereas others would not be. A brief reference to the Malaysian language situation may therefore be in order. When the former British colony of Malaya achieved its independence in 1957, the Constitution of the newly created Kingdom of Malaysia decreed that Malay (now known as Bahasa Malaysia/Bahasa Melayu) would serve as the national language of the country. The new national language would be a standardized version of regional Malay dialects, mainly based on the Johor-Riau dialect, and it was to be expected that, within a period of ten years, all functions that were at that time carried out in English would be taken over by the new standard variety. A Malaysian language academy, the Dewan Bahasa dan Pustaka [Institute

of Language and Literature], was, and still is, in charge of standardizing and upgrading the Malay vernacular, spoken in different varieties on the Archipelago, so that schools might teach the subjects in Malay, first at the elementary level and later at secondary and tertiary levels. The decree has, however, been fully implemented only now, not ten but almost 40 years after Independence. Standard Malay began to be used as the official language of the country as soon as most state employees could be properly trained. Some educational activities were allowed to continue in English, e.g., the teaching of law at the university level (especially international law), but as new Malay terms emerged, coined specifically through the work of various language standardization committees and other related agencies, some of them also including Indonesian and Brunei language experts, more and more English language activities became Malay language activities. At a less official level, however, both languages continued to occupy a very important position among, at least, the educated members of Malaysian society. This is where the issue of codeswitching mechanisms comes into focus again and where Malaysians had to show their preferences for one of the three strategies discussed earlier. As for Malaysian attitudes to codeswitching in educational/official circles, Language Alternation tends to emerge as a kind of code alternation that may not run into conflict with governmental attitudes. The matrix/embedded sentence mechanism, in turn, is still relegated to personal and informal interchanges, with which language planning agencies have little desire to meddle, just as they do not interfere with the use of Chinese or Indian languages or regional Malay dialects in the home or on the street.
One can therefore envision that two means of communication could find the approval of the Government for formal conversation, Bahasa Malaysia - or Bahasa Melayu, as it is also often referred to - and Malay-English Language Alternation. In view of the important role that the joint use of the two languages is here predicted to play, it is important that future research address the nature of Malay-English mixture that is designed to reveal an equal status between a national language and a non-indigenous, international language, so as not to endanger by the use of this mixed variety the role

of Malay as official language but rather to emphasize by it Malaysia's new role in a worldwide setting.

5. Conclusion

It has been the purpose of this chapter to direct the reader's attention to the fact that some mixed language utterances do not reveal the dominant status of one language and the subordinate status of the other. Text samples of this nature have been found, for some time now, in the bilingual speech of Mexican-Americans, Malaysians and Moroccans. This equality status of two languages, here called Language Alternation, does not, however, characterize the majority of codeswitching occurrences. In their majority, mixed structures do indeed reveal a situation where one language is dominant (the matrix language) and the other is subordinate (the embedded language). In the analysis of matrix-embedded language occurrences, codeswitching has been interpreted as frame construction where matrix language elements and embedded language elements both enter into such frames. Disagreement, however, exists concerning whether the unit of analysis should be the sentence (Bentahila and Davies, Jacobson) or the projection of complementizer CP (Myers-Scotton). For those who support the latter, all codeswitching operates according to two mechanisms, one in which L1 dominates L2 and the other where L2 dominates L1. For those who support the former (sentence as unit of analysis), however, three mechanisms emerge, that is, (a) L1 dominates L2, (b) L2 dominates L1 and (c) neither dominates the other (equality status). Codeswitching as an equal relationship has accordingly come to represent the third mechanism and has been called Language Alternation by Jacobson and his Moroccan colleagues. A lengthy excerpt from their study (Bentahila-Davies in Jacobson (ed.) Codeswitching Worldwide) shows how French and Arabic both share in the unfolding of the story. Jacobson has cited some of his own data from the two language pairs of his competency.
The identification of text samples of this kind in Spanish-English and Malay-English constructions leads him to formalize, on the basis of Bentahila and Davies' work, what should be the criteria to

properly identify sentences as instances of language alternation. The final section of the paper examines language alternation from a sociocultural perspective, with special emphasis on the extent to which newly independent nations that have chosen to upgrade their own vernaculars to official languages are allowing two languages to function together in the conveyance of a message. Language alternation may thus find acceptance at the formal/official level. Malaysia has been cited here as a case in point. Accordingly, the balanced presence of the two languages is here considered to be an excellent opportunity for new nations to incorporate portions of an international language like English or French into their own linguistic framework without thereby endangering the status of their own national language.

Notes

1. The initials RJ identify Jacobson's codeswitching database and are followed by the initials of the student researcher at the University of Texas at San Antonio together with corresponding file numbers.
2. The initials OKS identify Ong Kin Suan's recorded data followed by the number of dialog and tape identification number.

Section 2 Linguistic aspects: From morphosyntax to semantics

Contrastive sociolinguistics: Borrowed and codeswitched past participles in Romance-Germanic language contact

Jeanine Treffers-Daller

1. Introduction

The main aim of this article¹ is to discuss issues arising from a comparison of the linguistic consequences of language contact between the Germanic and Romance language varieties that are spoken along the linguistic frontier. The mutual contacts between the language varieties spoken in Northern Italy, Switzerland, France, Luxembourg and Belgium have been studied by many different researchers, from a sociolinguistic or a structural point of view. The majority of these studies are purely descriptive and little effort is made to explain the facts in the framework of theories on contact linguistics, such as Thomason and Kaufman's theory of contact-induced language change. In view of the fact that much information has already become available, it seems important to come to a synthesis of the facts that have been published in a range of different journals and books. The purpose of the present article is to develop a further understanding of the contact patterns found along the linguistic frontier and to come to a better understanding of the similarities and the differences between these contact patterns. The variability in language contact phenomena found all over the world is such that the search for general constraints on these phenomena has become very problematic. As Muysken (1991) points out, it remains to a large extent unclear whether the patterns observed are due to structural differences between languages, to sociolinguistic factors and characteristics of the interlocutors, or to a conventionalization of patterns that are in principle arbitrary. A careful comparison of results of language contact phenomena along the linguistic frontier, in which a certain

number of variables are kept constant, may help to clarify the patterns observed. The typological differences between the Romance varieties spoken on one side of the frontier are relatively small and the same is true for the Germanic varieties of Dutch and German spoken on the other side. Of course, subtle differences do exist and we will come back to some of them below. Still, I believe that the varieties under study are typologically close. There is a reasonable amount of comparability in the type of communities along the frontier, in that we are studying indigenous groups with a long tradition of bilingualism. Furthermore, in most cases, the French varieties are considered to be more prestigious than the Germanic varieties. The situation of Rhaeto-Romansh/Germanic contact, as described by Weinreich (1953), is probably one of the exceptions to this situation. Differences exist in the amount of support for the individual varieties, the presence or absence of standard varieties in the immediate environment and the attitudes towards the different varieties, and these differences need to be taken into account in a description of the language contact phenomena. In Treffers-Daller (in press) I have given a detailed overview of some differences in the sociolinguistic situation of Strasbourg and Brussels and I have tried to argue that there are important differences between the two cities from a sociolinguistic point of view. It is probably because of the sociolinguistic differences between language communities that Nelde (1986) doubts whether it is possible to compare language contact situations. I agree that language contact situations are never entirely the same from a sociolinguistic point of view, but I do think that it is possible to describe these differences systematically, as Bister-Broosen (1996) did for Colmar and Freiburg and Willemyns (1996) for different bilingual communities along the linguistic frontier in Belgium and French Flanders.
This article aims at further contributing to the development of contrastive sociolinguistics by a detailed analysis of the contact patterns in different communities along the linguistic frontier. A full understanding of the contact patterns along the linguistic frontier is only possible when both the sociolinguistic and the structural aspects of language contact in the bilingual communities are taken into account. The language contact model presented by Thomason and Kaufman (1988 and Thomason 1998) offers a very useful framework for a

comparison of language contact phenomena, and allows us to go beyond a simple description of the facts towards an explanation of the variability found. In Treffers-Daller (forthcoming) I have shown that the model correctly predicts the asymmetries between the mutual influences in the Romance and the Germanic varieties spoken in Brussels and Strasbourg. Thomason and Kaufman's model is thus a very powerful tool for describing these differences. On previous occasions (Treffers 1988, Treffers-Daller 1995, 1997 and forthcoming) I have discussed the similarities and the differences between the borrowing and interference processes in the Romance and the Germanic varieties spoken in Brussels and Strasbourg. I will summarize the main points of that analysis below. The present paper focuses on two exceptions to the similarities between the contact patterns in Brussels and Strasbourg. These exceptions relate to the ways in which French past participles are integrated into Brussels Dutch and Brussels French. The aim of the paper is to show that these differences can be explained on the basis of structural differences between Brussels Dutch and Alsatian and that sociolinguistic factors have very little explanatory power in this matter. Before that I will summarise the main points of Thomason and Kaufman's model.

2. Borrowing and interference through shift

Since the publication of Weinreich's study Languages in Contact (1953) many studies into language contact have been carried out in bilingual communities across the world. As the results of those studies revealed an immense variability in the outcome of language contact, it has become an almost impossible task to predict with any certainty which language contact phenomena are to be expected in a specific situation. The central problem Weinreich already discussed is to discover to what extent the quantity and the quality of language contact phenomena are determined by structural language-internal factors on the one hand and socio-cultural factors on the other hand. Thomason and Kaufman (1988: 35) take a very clear point of view in this discussion when they say that "it is the sociolinguistic history of the speakers, and not the structure of

their language, that is the primary determinant of the linguistic outcome of language contact." The importance these authors attach to the sociolinguistic history of the speakers is reflected in the definition they give of the two basic processes of external linguistic change they distinguish, that is, borrowing and interference through shift. Borrowing is defined as "the incorporation of foreign features into a group's native language by speakers of that language: the native language is maintained but is changed by the addition of the incorporated features." (1988: 37). In a situation of borrowing, it is generally words that are borrowed first. If language contact becomes more intense, due to social pressure, more intimate forms of borrowing (phonological, syntactic and even morphological borrowing) are predicted to occur. Thus, the type and the quantity of borrowing depend on the intensity of contact between two languages. Thomason and Kaufman specify their predictions in a borrowing scale, which consists of five levels. The first level of this hierarchy represents a casual form of contact, in which only cultural elements and non-basic vocabulary are borrowed, and the fifth level corresponds to intense forms of language contact, which may include important typological changes to the borrowing language. The second type of externally motivated language change, interference through shift, is defined as "the result of imperfect group learning during a process of language shift. That is, in this kind of interference a group of speakers shifting to a target language fails to learn the target language (TL) perfectly." (1988: 38-39)². Contrary to a situation in which borrowing takes place, "interference through imperfect learning does not begin with vocabulary: it begins instead with sounds and syntax, and sometimes includes morphology as well before words from the shifting group's original language appear in the TL." (1988: 39).
Thomason and Kaufman do not present an interference scale in their book, but distinguish slight interference from moderate to heavy interference. Although there is no sharp dividing line between slight and moderate to heavy interference, the authors suggest that slight interference includes phonological and syntactic features, whereas moderate to heavy interference "will have more examples of these and, in addition, some interference in the inflectional morphology" (1988: 121).

Contrastive sociolinguistics 79

Given the limitations of the present chapter, it is not possible to discuss the model in more detail here, but it is clear that the model is so explicit that a number of interesting research questions and hypotheses for future research can be derived from it. As the Romance-Germanic language contacts along the linguistic frontier are not discussed in Thomason and Kaufman's book, they form an interesting test case for their model.

3. The application of Thomason and Kaufman's model to Brussels and Strasbourg

In a previous paper (Treffers-Daller, forthcoming), in which I compared language contact phenomena in Brussels and Strasbourg, I have explored several hypotheses derived from Thomason and Kaufman's model. In the first place I have argued that the phenomena found in the Dutch variety spoken in Brussels and in the Alemannic variety spoken in Strasbourg are the result of a process of borrowing, whereas the contact phenomena found in the French varieties spoken in these cities are the result of interference through shift. As a result, lexical borrowing is far more important in the Germanic varieties than in the French varieties. Syntactic and phonological interference, on the other hand, have been shown to be more prominent in the French varieties than in the Germanic varieties. Morphological interference is relatively restricted in comparison to interference on the syntactic and phonological levels (Thomason and Kaufman, 1988: 38). In the second place, I have shown that the borrowing patterns are equally intimate in Brussels and Strasbourg. The borrowing patterns found in both cities are restricted to level two (and perhaps some aspects of level three) on Thomason and Kaufman's borrowing scale. This means that borrowing is fairly limited in structural terms, and mainly restricted to the level of the lexicon. There is no evidence for important typological contact-induced changes in the structure of the contact languages. In the third place I have tried to argue that the similarities in the outcome of language contact are remarkable in view of the fact that there are considerable differences between the two communities from a sociolinguistic point of view. These differences relate to the status and

80 Jeanine Treffers-Daller

function of standard languages in both cities, the numbers of speakers of the Germanic varieties, the tensions between the language groups and a number of other points. I have claimed that the striking similarities in the borrowing and interference patterns in both cities find a plausible explanation in the fact that the language contact situations in both cities are similar from a typological point of view: in each city a variety of French is in contact with a Germanic variety and the French variety is considered to be the more prestigious of the two. Thus, I have claimed that the data from Brussels and Strasbourg do not lend support to Thomason and Kaufman's claim that the sociolinguistic history of the speakers is the primary determinant of the linguistic outcome of language contact. Although the sociolinguistic history of the speakers in both cities is clearly different, the outcome of language contact is strikingly similar. Thus, the claim of my previous paper is that the structure of the languages plays a more prominent role in the outcome of language contact than the sociolinguistic history of the speakers. It is the aim of the current article to further corroborate this claim by a comparative analysis of the integration patterns of borrowed past participles. I hope to show that there are differences between the integration patterns of French past participles in Brussels Dutch and Alsatian and that these differences can be explained on the basis of structural factors only. Sociolinguistic factors have very little explanatory value in this matter.

4. Some differences between language contact in Brussels and Strasbourg

In Treffers-Daller (1995 and forthcoming) I have shown that the linguistic outcome of language contact is very similar in Brussels and Strasbourg, both from a quantitative and a qualitative point of view. I would now like to point to two differences that I found between the language contact phenomena in both cities, as this sheds an interesting new light on the different role of structural and sociolinguistic factors in language contact patterns. The differences concern the occurrence of


French past participles in Brussels Dutch and Alemannic as spoken in Strasbourg. German and Dutch have borrowed many French forms and one can show that the strategies used for the integration of these forms are very similar. Both German and Dutch attach a suffix to the root of the French verb. In Standard Dutch as well as in Brussels Dutch this suffix is [e:r] and in Standard German and Alsatian the suffix is [i:r]. Thus, for example, the French verb arranger has been borrowed into Dutch as arrangeren [aranze:re] and into German as arrangieren [aranzi:re] (cf. Treffers-Daller 1994 for more details). The suffix [e:r] is also attached to verbs that do not belong to the French -er class, such as finir 'to finish' and traduire 'to translate'. These verbs become finiss-er-en and traduis-er-en in Brussels Dutch. There is an important difference, however, between the integration strategies in both languages. The past participle of the Dutch arrangeren is ge-arrangeer-d, whereas the past participle of the German arrangieren is arrangier-t. Thus, the prefix ge- is absent in the case of the German past participle. The second difference concerns the fact that French past participles may keep French morphology in Alemannic sentences, but can only occur in a morphologically integrated form in Brussels Dutch. Thus, structures such as the following have been attested for Strasbourg but not for Brussels.

(1) Sie  sind condamnés worre³
    They are  condemned been
    'They have been condemned.' (Gardner-Chloros 1991: 131)

(2) Noh  het er remercié
    Then has he thanked (us)
    'Then he has thanked us.' (corpus Gardner-Chloros, conversation B, page 7)

(3) Tee het er als zamme    melangé
    Tea has he all together mixed
    'He mixed all sorts of tea together.' (Gardner-Chloros 1991: 167)


(4) De Larouge het ne aa schunn soigné
    De Larouge has him already taken care of
    'De Larouge has already taken care of him.' (Gardner-Chloros 1991: 141)

(5) Noch schlimmer, wenn de  client    recalé wurde am
    Even worse,     when the candidate failed was   at the
    permis          weje       de  panne    d'essence
    driving licence because of the shortage of fuel
    'Even worse, when the candidate was failed for his driving license because of a fuel shortage.' (Gardner-Chloros 1991: 151)

In Brussels Dutch we only find the forms ge-condamneerd 'condemned', ge-remercieerd 'thanked', ge-melangeerd 'mixed' and ge-soigneerd 'taken care of', which are fully morphologically integrated. I never found gerecaleerd 'failed', but that is probably due to the fact that the occurrence of a particular verb depends on the topic of conversation and the topics of conversation are never exactly the same in two conversations. From now on French past participles that keep the French morphology in Alsatian sentences will be called unintegrated past participles and those that receive German or Dutch pre- and suffixes will be called integrated past participles. The unintegrated participles can be considered to be examples of code-switching or nonce-borrowing, whereas the integrated forms are probably examples of established borrowings. This terminological issue is not very important for the argumentation of this paper. The difference with respect to the prefixes can be traced back to subtle differences in the rules for past participle formation for German and Dutch verbs, as described by Kiparsky (1966), Wiese (1992) and Geilfuss-Wolfgang (1998) for German, and by Schultink (1973) for Dutch. Although we cannot go into the details of the rules for participle formation here, the most important difference between the German and the Dutch rules should be mentioned. In German the prefix ge- is


deleted in participles of all verbs that have an unstressed first syllable, whether or not this first syllable is part of an unstressed prefix.⁴ In Dutch, however, the prefix ge- is only deleted if the first syllable of the verb is part of an unstressed prefix. Thus, the prefix ge- is deleted in both Dutch and German in the following example, where the unstressed part of the verb is a prefix: (German) ver-kauft and (Dutch) ver-kocht 'sold'. Dutch and German differ, however, for the following example, in which the unstressed part of the verb is not a prefix: (German) marschiert and (Dutch) gemarcheerd 'marched'. The first syllables of French verbs generally being unstressed, French verbs do not receive the ge- prefix when borrowed into German. In Dutch, as the unstressed first syllable of marcheren is not a prefix, the past participle receives the prefix ge-. Thus, we get arrangiert in German and ge-arrangeerd in Dutch. The French verbs are integrated strictly according to the existing rules in each language, and there is no need to formulate a special rule for the creation of past participles of these verbs. The above facts are important for two reasons. In the first place, they show that the variability in morphological integration patterns observed here is due to a subtle structural difference between the two borrowing languages, Dutch and German, and that sociolinguistic factors do not play any role in this issue. In the second place, the absence of the prefix ge- in the past participle forms of French verbs may help explain why the unintegrated past participle forms we have seen in examples (1) through (5) are only found in Strasbourg but not in Brussels. It is a generally held assumption that elements can be switched or borrowed because speakers perceive these elements to be more or less equivalent or congruent in each language (Muysken 1990; Myers-Scotton 1993).
Thus, nouns can easily be switched or borrowed, because it is relatively easy to recognize a French and a Dutch noun as equivalent. Past participles, on the other hand, may not be considered equivalent in French and Dutch, due to the different morphology they carry. There is a considerable distance between the number and type of affixes that are attached to a regular French past participle (march-é) and the morphologically integrated form of that past participle in Dutch (ge-march-eer-d). Since the past participle form of French verbs does not receive the prefix ge- in German, the distance between the integrated and


the unintegrated forms of French past participles in German is smaller. In other words, the morphologically integrated form arrang-ier-t and the unintegrated form arrang-é may be considered to be more equivalent. This, in turn, could facilitate switching of unintegrated French past participles into Alemannic sentences, as in (1) through (5) above. In view of the striking similarities between the borrowing patterns of German and Dutch, it is remarkable that the two exceptions discussed here both concern French past participles. This suggests that both phenomena must be linked. The explanation given above does make a link between the absence of the prefix ge- in German and the occurrence of unintegrated French participles in Alemannic. An account that links both phenomena is probably to be preferred over one in which each receives an independent explanation.
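The difference between the German and Dutch rules for the prefix ge- described above can be restated as a small decision procedure. The following Python fragment is only an illustrative sketch of the rule as summarized in this section: the class `Verb`, its fields, and the functions `takes_ge` and `past_participle` are hypothetical names introduced here, stress and prefix status are supplied by hand rather than computed, and verbs with a stressed, separable particle (which follow a different rule) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    """Hypothetical, simplified verb entry used only for this sketch."""
    stem: str                       # participle stem, e.g. "arrangier" or "arrangeer"
    suffix: str                     # participle ending, e.g. "t" or "d"
    first_syllable_stressed: bool   # is the first syllable stressed?
    first_syllable_is_prefix: bool  # is the first syllable part of a prefix such as ver-?


def takes_ge(verb: Verb, language: str) -> bool:
    """Return True if the participle keeps the prefix ge- under the rule in the text."""
    if verb.first_syllable_stressed:
        # Assumption of this sketch: a stressed first syllable keeps ge-
        # (separable-particle verbs are not modelled).
        return True
    if language == "German":
        # German: ge- is deleted whenever the first syllable is unstressed.
        return False
    if language == "Dutch":
        # Dutch: ge- is deleted only if the unstressed syllable belongs to a prefix.
        return not verb.first_syllable_is_prefix
    raise ValueError(f"unsupported language: {language}")


def past_participle(verb: Verb, language: str) -> str:
    """Build the integrated past participle from stem, suffix and the ge- rule."""
    prefix = "ge" if takes_ge(verb, language) else ""
    return prefix + verb.stem + verb.suffix


# Borrowed French verb: unstressed first syllable that is not a prefix.
german = Verb("arrangier", "t", first_syllable_stressed=False, first_syllable_is_prefix=False)
dutch = Verb("arrangeer", "d", first_syllable_stressed=False, first_syllable_is_prefix=False)
print(past_participle(german, "German"))  # arrangiert
print(past_participle(dutch, "Dutch"))    # gearrangeerd
```

On this restatement the two languages diverge only in the branch for unstressed first syllables that are not prefixes, which is exactly the class into which borrowed French verbs fall.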

5. Past participles in other bilingual communities along the Romance-Germanic linguistic frontier

It is interesting to compare the facts from Brussels and Strasbourg with those from other bilingual communities along the linguistic frontier. If structural factors are of overriding importance in language contact, there must be clear similarities between the contact patterns in the different French-German language communities along the frontier. The evidence I know of appears to show that this is indeed the case. Riehl (1996) shows that borrowing in South Tyrol (contact between varieties of German and of Italian) and Eastern Belgium (contact between varieties of German and French) is equally intimate when measured against Thomason and Kaufman's borrowing scale. Borrowing is limited to level two in Thomason and Kaufman's scale, just as we found for Brussels and Strasbourg. She also shows that French past participles can occur in integrated form in German as spoken in East Belgium, see (6).

(6) Die Jugend hat das Englische adoptiert
    The youth  has the English   adopted
    'The youth has adopted English.' (Riehl 1996: 196)


This example shows that French past participles are integrated in precisely the same way into the German varieties spoken in Strasbourg and in East Belgium (that is: without the prefix ge- and with the suffix [i:r]). Riehl (1996) does not provide any examples of unintegrated past participles. Biegel (1996), on the other hand, found both integrated and unintegrated past participles in Walscheid (Lothringen). The examples are interesting because Biegel found not only unintegrated French past participles in the German variety spoken in Walscheid (see (7) and (8)) but also the reverse: unintegrated German past participles in the French variety spoken in Walscheid (see (9) and (10)).

(7) Er isch jetzt decidé,  unn do    wird nett...
    He is   now   decided, and there is   no...
    'He has decided now, and there is no...' (Biegel 1996: 196)

(8) Manchmol  sahn se   ma  se  isch nimme
    Sometimes say  they but she is   no more
    decidée odder se  will  jetzt noch nett
    decided or    she wants now   yet  not⁵
    'Sometimes they say (it), but she is no more decided, or she doesn't want yet.' (Biegel 1996: 196)

(9) Quand ma fille    est partie sans    rien    dire, il l' a   geschnitt, il l' a   sentie
    When  my daughter is  left   without nothing say,  he it has ignored,   he it has felt
    et  il a   tout       payé.
    and he has everything paid
    'When my daughter left without saying anything, he ignored it, he felt it and he paid everything.' (Biegel 1996: 196)


(10) C'est fou   quand-même,  mais c' est ussgehängt,
     It is crazy nevertheless but  it is  put up
     c' est quand-même   publié.
     it is  nevertheless published
     'It's crazy nevertheless, but it has been announced, it has nevertheless been published.' (Biegel 1996: 196)

The facts from Lothringen are different from the Strasbourg data in the sense that German past participles containing the prefix ge- are considered to be equivalent to French past participles. There are no examples of this type in Gardner-Chloros' data. I only have access to my own French-Dutch corpus from Brussels and it is therefore difficult to prove that unintegrated French past participles are not found in any other Dutch varieties spoken in Belgium. There are no examples of this phenomenon in the literature about language contact in Belgium. Deneckere (1954) only gives examples of integrated past participles occurring in Dutch as spoken in Flanders. In the data I have been able to trace, integrated past participles can be found since the 18th century. I found the following example in Deneckere (1954), who quotes from a satirical dialogue:

(11) Monsieur l'avocat,   ik heb  d'eer     u   te salueeren,
     Mister   the lawyer, I  have the honor you to greet
     ik heb  lange gedesidereert u   eens over  interessante zaken   te
     I  have long  wished        you PART about interesting  matters to
     spreken en  altijd g'echoueert in die  entreprise⁶
     speak   but always failed      in that undertaking
     'Mister lawyer, I have the honor of greeting you and I have wanted to speak to you about interesting matters for a long time, but was never successful in this undertaking.' (Deneckere 1954: 327)


It would be interesting to study whether unintegrated French past participles are found in Dutch as spoken in French Flanders (northern France) or other varieties spoken in Belgium. If my assumption is correct, unintegrated past participles do not occur in the Dutch varieties, whereas they do occur in the German varieties.

7. Conclusion

Thomason and Kaufman's model correctly predicts that the language contact phenomena in Brussels and Strasbourg are to a large extent similar, both from a quantitative and from a qualitative perspective. The model correctly predicts that lexical borrowing is far more important in the Germanic varieties spoken in Brussels and Strasbourg, as the language contact phenomena in the Germanic varieties are the result of a process of borrowing. Structural interference is more prominent in the French varieties, and this can be explained by assuming that structural interference is the result of a process of interference through shift. However, borrowing is not as intimate as one could have expected on the basis of Thomason and Kaufman's model. The speakers of the Germanic varieties in Strasbourg and Brussels underwent such cultural pressure that many of them shifted to French. If one takes the sociolinguistic history of the speakers to be the primary determinant of the outcome of language contact, as Thomason and Kaufman do, one would expect more dramatic forms of borrowing and interference. Thomason (in press) acknowledges that predicting the outcome of language contact remains an almost impossible task as "[g]reat intensity of contact is a necessary condition for certain kinds of interference, especially structural interference, but it is by no means a sufficient condition" (Thomason 1998: 3). Thus, in some cases structural borrowing remains very limited, despite the presence of intensive language contact and important cultural pressure. The differences we found between the Strasbourg and Brussels language contact patterns could be shown to be linked to structural factors rather than to the sociolinguistic history of the speakers. Thus, the results obtained so far give us a more accurate picture of the structural determinants of language contact than of its sociolinguistic determinants.


It is not the aim of this paper to deny the influence of sociolinguistic factors on language contact. In Treffers-Daller (1994) I have shown that different micro-sociolinguistic factors such as social networks, age, the area where an informant lives and whether an informant attended a French or a Dutch school have a bearing upon the frequency with which codeswitching and borrowing occur in the speech of individuals. The aim of this paper is to show that sociolinguistic factors have little explanatory value when we are discussing qualitative differences in language contact patterns. Clearly, sociolinguistic factors have a bearing upon quantitative differences between speakers, such as the fact that some speakers in the Strasbourg corpus produce more unintegrated past participles than others. It is very interesting to see that unintegrated past participles are mainly found in the speech of informants belonging to the middle generation in the Strasbourg corpus. This group is also the group in which we find most balanced bilinguals (Gardner-Chloros 1991). They codeswitch more than the younger and the older generations. Either French or Alsatian can be the matrix language of their utterances. The absence of Alsatian morphology on French past participles can be interpreted as an indication of ongoing language shift. The use of Alsatian goes down, especially in the cities (Bothorel-Witz and Huck 1996). The presence of unintegrated French past participles in Alsatian can be interpreted as a sign that Matrix Language turnover (Myers-Scotton 1998) is taking place. The middle generation is on its way to switching to a new matrix language: French. Similar patterns have frequently been found at the level of phonology. The more fluent a bilingual becomes in the two languages, the less s/he adapts loan words to the phonological system of the borrowing language (Haugen 1950; Poplack et al. 1988).
In Brussels the situation is different because of the presence of Standard Dutch. Language shift is extremely complex: the use of Brussels Dutch is going down (De Vriendt and Goyvaerts 1989), but some speakers shift to French and Standard Dutch and not only to French. The presence of a related standard language can function as a support mechanism for the Brussels Dutch dialect, especially in those parts of the grammar that are very similar. In this chapter I limited the discussion to borrowing or the insertion of single words from one language into the other and did not discuss


codeswitching. There are indications, however, that codeswitching is a more wide-spread phenomenon in Strasbourg than in Brussels and I have argued in the past that this may well be due to factors of a sociolinguistic nature (Treffers-Daller 1992). Future research needs to show whether there are any qualitative differences between the switching patterns in both cities. We may then be able to give a more complete picture of the interaction of linguistic and sociolinguistic factors in this language contact area.

Notes

1. I am very grateful to Penelope Gardner-Chloros for having allowed me to study her corpus from Strasbourg. For further information about the corpus, the reader is referred to Gardner-Chloros (1991). An earlier version of this paper appeared in Gramma/TTT, volume 19, number 1.
2. Recently, Thomason (1998: 3) pointed out that "the crucial factor is not whether or not shift takes place but whether or not there is imperfect learning by a group of people."
3. According to Philip and Bothorel-Witz (1989: 313), "Low Alemannic is the traditional name for the type of dialect spoken in Alsace and Baden (...)." It differs from High Alemannic, which is spoken in the extreme south of Alsace and Baden as well as in Switzerland. Most authors use the term Alsatian dialects for those dialects spoken in the Alsatian part of the Upper Rhine region. As Bister-Broosen (1996: 136) puts it, "[w]hile Alsatian dialects still share many similarities with the other Alemannic dialects of German, they also differ from them in important respects, mainly because of the close contact with French and because of the fact that French, and not German, is used as the standard language." For further discussion of similarities and differences between Alsatian dialects and other Alemannic dialects, the reader is referred to the literature.
4. I follow Gardner-Chloros' (1991) transcription of the examples. The plural -s on condamnés is not audible in oral data.
5. The rules for verbs beginning with a stressed (and separable) particle differ from the rule given here. The prefix ge- appears between the particle and the root of the verb, thus the past participle of zuhören 'listen' is zugehört. As the French verbs borrowed into Dutch or German do not fall into this category, I do not discuss this any further.
6. It is not entirely clear what sahn se means. The author does not provide translations of his examples.
7. Deneckere gives the following source of this quote: the dialogue was written by De Foere and it was published in the first volume, number 4, pages 162-163 of the Spectateur Belge. I assume that the dialogue was then written shortly after the Belgian Independence in 1830. It is important to realize that data from satirical literature were intended to ridicule language mixing. The data from the literature and the spontaneous data discussed here are thus of a different nature.

Functional categories and codeswitching in Japanese/English

Shoji Azuma

1. Introduction

As more studies on codeswitching have been carried out, it has become clearer that codeswitching is not a random alternation of two languages but rather a patterned behavior. As a result, adequately describing these behaviors has become one of the areas of focus in the study of codeswitching among bilinguals. The present study is an attempt to examine the behavior of codeswitching in Japanese/English with respect to the Principles-and-parameters approach, specifically that of Fukui (1995: 327-372). The data for this study are drawn from Japanese/English codeswitching as well as other data documented in the codeswitching literature.

Around 1980, the Principles-and-parameters theory emerged from the camp of the theory of generative grammar, which previously had predominantly studied monolingual speakers. This approach is based on the premise that human languages can be characterized as a general set of principles, each of which is associated with an open parameter to be set by each individual language. The set of principles is the core of the Universal Grammar (UG) or the mental organ which is shared by all humans. This approach is attractive in terms of language learnability. All humans seem to have the innate ability to learn any language as long as the proper input is provided.

2. Functional parametrization hypothesis

One of the much discussed characteristics of the lexicon is that it is roughly divided into open class and closed class. This dichotomy of the word class has attracted the attention of general linguists as well as psycholinguists (e.g., Bock 1989: 163-186; Garrett 1990: 133-175;


Schachter 1985: 3-61; Taft 1990: 245-257). It is Joshi (1985: 190-204) who first pointed out the relevance of word class in codeswitching. He formulated the constraint that "[c]losed class items (e.g., determiners, quantifiers, prepositions, possessive, Aux, Tense, helping verbs, etc.) cannot be switched" (1985: 194). Indeed, the idea that switched items are limited to open class items (i.e., not closed class items) is repeatedly observed in various language pairs, in addition to Marathi/English on which Joshi's study is based. Some of the examples are given below.

(1) Parents te depend-honda e
    Parents on depend-be    AUX
    'It depends on the parents.' (Panjabi/English, Romaine 1989: 124)

(2) Leo   si-ku-come              na   books z-angu
    today 1st/neg-Past/neg-come   with books Cl 10-my
    'Today I didn't come with my books.' (Swahili/English, Myers-Scotton 1993: 80)

(3) Moo  shaa-nai kara    compromise-shite-age-ta    wa.
    Emph way-Neg  because compromise-do-give-Past    Emph
    'Because there was no way, I compromised (with him).' (Japanese/English, Azuma 1997b: 6)

In (1), the codeswitched elements parents and depend are noun and verb stems. Closed class items such as prepositions and operator verbs, which carry tense, do not switch. In (2), the codeswitched elements come and books do not belong to the closed class. In (3), the codeswitched element compromise is suffixed by the Japanese helping verb -suru 'do' and other morphemes to form a compound verb. In all of the examples, a common feature shared by the typologically different language pairs is that open class items participate in codeswitching but closed class items do not. This insightful observation by Joshi (1985: 190-205) led other researchers to refine the constraints on codeswitching (e.g., Myers-Scotton 1993: 75-119; Azuma 1993: 1071-1093). Interestingly, the dichotomy of open vs. closed class is not confined just


to codeswitching but widely observed in various language phenomena such as speech error and language acquisition (e.g., Brown 1973; Garrett 1975: 133-177; Petersen 1988: 479-494; Vihman 1985: 297-324). Thus, the characterization of codeswitching using the above dichotomy can be applied more widely. This dichotomy in the lexicon has been attracting the attention of linguists who work in the theoretical framework of the so-called Principles-and-parameters approach as well. For example, in his discussion of the lexicon, Chomsky (1995: 54) states as follows:

    Items of the lexicon are of two general types: with or without substantive content. We restrict the term lexical to the former category; the latter are functional. Each item is a feature set. Lexical elements head NP, VP, AP, and PP, and their subcategories (adverbial phrases, etc.).

Although it is not clear what substantive content is, it is fair to say that the thesis put forth in the statement can be related to the dichotomy we have been observing in our codeswitching data. Our data suggest that the lexical category (e.g., N, V) may be codeswitched but the functional category may not switch. One interesting question is why the functional category does not participate in codeswitching but the lexical category does. It appears that the lexical category is interchangeable between two languages through codeswitching. On the other hand, the functional category in one language cannot be replaced by another language, making the functional category un-interchangeable. This suggests that the functional category (not the lexical category) is the core of its language or is what makes the language different from other languages. When languages differ from each other, it is the functional category which makes language X different from language Y. In the framework of the Principles-and-parameters approach, it can be stated that the functional category is the one to be parameterized for each individual language. As Chomsky (1995: 6) remarks,

    Within the P & P [principles-and-parameters] approach the problems of typology and language variation arise in somewhat different form than before. Language differences and typology should be reducible to choice of values of parameters.... A still stronger one is that they are restricted to formal features of functional categories (see Borer 1984, Fukui 1986, 1988).


As suggested by Chomsky's above remark, Fukui (1995: 327-372) makes the following hypothesis.

(4) Functional parametrization hypothesis:
    Only [+F] elements in the lexicon are subject to parametric variation. (Fukui 1995: 337)

This functional parametrization hypothesis essentially claims that languages differ from each other only in the functional category. If we assume that codeswitching occurs between interchangeable items, then the hypothesis coincides with the fact about codeswitching as well. If an element has the feature of [+F], then it is parameterized for its specific language and it is not interchangeable. Thus, it cannot be codeswitched. The thesis of the functional parametrization hypothesis fits well with our codeswitching phenomena. It is entirely possible to postulate the following hypothesis as a consequence of the functional parametrization hypothesis.

(5) Functional Hypothesis for Codeswitching 1:
    Only [-F] elements in the lexicon participate in codeswitching.

The hypothesis is welcome because it is a natural consequence of a more universal hypothesis (i.e., the functional parametrization hypothesis) which is claimed to define the nature of human languages in general. Our next question is then, what are the members of [+F]? Fukui (1995: 338-339) states that there are essentially four major lexical categories, N (noun), V (verb), A (adjective) and P (preposition/postposition), and four major functional categories, AGR (agreement), T (tense), D (determiner) and C (complementizer). We may assign a feature [+F] to the functional categories and a feature [-F] to the lexical categories. If we apply the functional hypothesis for codeswitching, the hypothesis predicts that N, V, A and P are freely codeswitched but AGR, T, D and C do not codeswitch. In what follows, I will examine this prediction in the Japanese/English environment.
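The prediction derived from hypothesis (5) can be written out as a simple lookup. The Python fragment below is a hedged sketch rather than part of Fukui's or the present proposal: the names `LEXICAL`, `FUNCTIONAL`, `has_feature_f` and `predicted_switchable` are hypothetical, and the code encodes only the prediction as stated, not whether it is borne out by the data.

```python
# Category inventory as listed in the text (after Fukui 1995: 338-339).
LEXICAL = frozenset({"N", "V", "A", "P"})         # assigned [-F]
FUNCTIONAL = frozenset({"AGR", "T", "D", "C"})    # assigned [+F]


def has_feature_f(category: str) -> bool:
    """Return True for [+F] (functional) categories, False for [-F] (lexical) ones."""
    if category in FUNCTIONAL:
        return True
    if category in LEXICAL:
        return False
    raise ValueError(f"unknown category: {category}")


def predicted_switchable(category: str) -> bool:
    """Hypothesis (5): only [-F] elements participate in codeswitching."""
    return not has_feature_f(category)


for cat in ["N", "V", "A", "P", "AGR", "T", "D", "C"]:
    print(cat, predicted_switchable(cat))
```

Stating the prediction this way makes explicit that it treats all four lexical categories alike, which is the point that the Japanese/English data are used to test.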


3. [+N] as a relevant feature for switching

First, I will examine the major lexical categories in terms of their codeswitchability. Although the hypothesis predicts that N, V, A and P are freely codeswitched, this prediction is not straightforwardly borne out. It is true that N is the single most commonly codeswitched element in Japanese/English (e.g., Nishimura 1997: 90, Azuma 1997b: 4) as well as other language pairs (e.g., McClure 1997: 133). However, the other categories are not as straightforward as the case of N. As a matter of fact, we do not observe the following switching patterns where English V, A and P are simply inserted in Japanese discourse as an instance of codeswitching.

(6) * Kinoo     5 jikan watashi wa  studied.
      yesterday 5 hours I       Top studied
      'Yesterday, (for) five hours, I studied.'

(7) * Are  wa  big yama     da.
      that Top big mountain is
      'That's a big mountain.'

(8) * Sono gakusei wa  gakkoo from modot-ta.
      that student Top school from returned
      'That student came back from school.'

(9) * Sono gakusei wa  from gakkoo modot-ta.
      that student Top from school returned
      'That student came back from school.'

Example (6) illustrates a switching of V, (7) a switching of A, and (8) and (9) a switching of P. As the examples suggest, V, A and P are not directly codeswitched from English to Japanese. In the case of P, Japanese has postpositions and English has prepositions, so there is a confounding factor of word order. Nevertheless, examples such as (8) and (9) are not attested in the codeswitching literature. Thus, it appears that the prediction is borne out only in the case of N, but not in the cases of V, A, and P.

However, further examination shows that Japanese has the capability of switching at least V and A. First, V switching is accomplished through the use of the so-called helping verb or light verb, which is -suru 'do' in Japanese. Observe the following attested examples.

(10) Watashi mo nani retire-suru made sixty-five ninaru made hataraki-mashita.
     I too what retire-do until sixty-five become until work-Past
     'I, too, worked until I retired, until I became 65.' (Nishimura 1997: 78)

(11) Moo shaa-nai kara compromise-shite-age-ta wa.
     Emph way-Neg because compromise-do-give-Past Emph
     'Because there was no way, I compromised (with him).' (same as (3), Japanese/English, Azuma 1997b: 6)

Switching of V is accomplished through the verbal noun construction (Martin 1975: 869-880). English verbs are recategorized as verbal nouns (which are nouns, as discussed later) and are suffixed with the helping verb -suru, which carries tense and other grammatical information.1 Interestingly, the literature suggests that this pattern of switching by means of a helping verb is not confined to Japanese/English but is widely observed in other language pairs as well (e.g., Romaine 1989: 142-143; Myers-Scotton 1993: 87-89; Backus 1996: 211-283).

For A (adjective), as for V, there is no straightforward switching, as examples (6) and (7) showed. First of all, Japanese has two categories which are equivalent to English adjectives. One is called adjective and the other is called adjectival noun (Martin 1975: 754-766). Adjectives directly modify following nouns, but adjectival nouns require the -na suffix to modify following nouns. Observe the two types illustrated in (12) and (13).

(12) Adjective:
     a. yasashii hito
        kind person
        'kind person'
     b. * yasashii-na hito

(13) Adjectival noun:
     a. * kirei hito
        beautiful person
        'beautiful person'
     b. kirei-na hito

The word yasashii 'kind' is an adjective and does not take -na suffixation. On the other hand, the word kirei 'beautiful' is an adjectival noun and does require -na suffixation. Whether a given word is an adjective or an adjectival noun is always lexically specified. For example, words such as kitanai 'dirty', kibishii 'strict', akai 'red' and amai 'sweet' are all adjectives, and words such as shizuka 'quiet', hen 'strange' and yuumei 'famous' are all adjectival nouns. In terms of switching pattern, we notice that English adjectives are recategorized as adjectival nouns. Observe the following examples.

(14) Dirty-na tokoro dat-ta ne.
     dirty place is-Past Tag
     '(It) was a dirty place, wasn't it?' (Azuma 1997b: 8)

(15) Modern-na tookyoo nante daikirai da.
     modern Tokyo Emph hate
     '(I) hate Tokyo that is modern.' (Nishimura 1997: 93)

In both examples, the adjectives are suffixed by -na, which shows that they are treated as adjectival nouns. Interestingly, in Japanese, words equivalent to English 'dirty' and 'modern' (i.e., kitanai and atarashii) are classified as adjectives but not adjectival nouns. The following examples illustrate the point.

(16) a. kitanai tokoro
        dirty place
        'dirty place'
     b. * kitanai-na tokoro

(17) a. atarashii tokoro
        modern place
        'modern place'
     b. * atarashii-na tokoro

In codeswitching, English adjectives whose Japanese equivalents are adjectives are thus recategorized as adjectival nouns, and they are codeswitched by suffixing -na to the switched English adjective. This recategorization parallels what we observed in the case of V.

Finally, P itself is never attested in codeswitching. Observe the following example in (18).

(18) I slept with her basement de
     I slept with her basement at
     'I slept with her in the basement.' (Nishimura 1986: 128)

The sentence in (18) appears to involve a switching of an English preposition to the Japanese postposition de. However, it is important to note that the word basement is not accompanied by an English article and that the placement of the adposition follows the word order of Japanese. These facts suggest that first the entire adpositional phrase ('in the basement') is switched into Japanese and then, within the switched phrase, the word for 'basement' is switched from Japanese to English. In other words, the sentence in (18) does not involve a switching of a pre/postposition. The switching of an entire adpositional phrase is commonly observed in Japanese/English data. Observe the following example in (19).

(19) What do you call it nihongo de?
     What do you call it Japanese in
     'What do you call it in Japanese?' (Nishimura 1986: 128)


The sentence in (19) shows that the entire Japanese adpositional phrase nihongo de is switched from English to Japanese. Thus, we can maintain the thesis that P itself does not switch.2

We have observed the following facts so far. Among N, V, A, and P, the only freely switched category is N. V and A do switch, but they have to undergo recategorization: V has to be recategorized as a verbal noun, and A as an adjectival noun. P simply never switches. Our next task is to account for these phenomena in a unified way. One way to do this is to examine the lexical features of the categories. Using the [+/- N] and [+/- V] features in Chomsky (1970: 184-221), Miyagawa (1987: 30) argues that Japanese major lexical categories can be characterized as follows:

(20) Verb:             [+V, -N]
     Noun:             [-V, +N]
     Verbal noun:      [-V, +N]
     Adjective:        [+V]
     Adjectival noun:  [+V, +N]
     Postposition:     [-V, -N]

The feature shared by noun, verbal noun and adjectival noun is [+N], and this is not a feature of the postposition. We can state that the relevant feature for codeswitching is not just [-F]; it has to be [-F, +N]. Thus, we can modify the functional hypothesis for codeswitching as follows:

(21) Functional Hypothesis for Codeswitching 2: Only [-F, +N] elements in the lexicon participate in codeswitching.

One of the most common observations about codeswitched elements is that nouns are the most dominant category in switching. The [+N] feature in the hypothesis directly reflects this simple fact. The hypothesis also predicts that any item can be codeswitched as long as a language has a mechanism to change the feature of the relevant item into [+N], possibly via recategorization as in the case of Japanese (i.e., verbal noun, adjectival noun).
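The two versions of the hypothesis can be restated as a small feature check. The following Python sketch is only an illustration and not part of the original analysis: the table layout and the names `LEXICAL_FEATURES`, `RECATEGORIZATION`, `switchable_h1`, `switchable_h2` and `switchable_after_recategorization` are assumptions introduced here. The [V] and [N] values are those listed in (20), and every lexical category is taken to be [-F].

```python
# Illustrative sketch only: names and data layout are assumptions made
# for this example; the [V]/[N] values follow (20) in the text.

# Every lexical category is [-F]. A feature that (20) leaves unspecified
# (the N value of Adjective) is stored as None.
LEXICAL_FEATURES = {
    "verb":            {"F": False, "V": True,  "N": False},
    "noun":            {"F": False, "V": False, "N": True},
    "verbal noun":     {"F": False, "V": False, "N": True},
    "adjective":       {"F": False, "V": True,  "N": None},
    "adjectival noun": {"F": False, "V": True,  "N": True},
    "postposition":    {"F": False, "V": False, "N": False},
}

# Recategorization routes described for Japanese in the text.
RECATEGORIZATION = {"verb": "verbal noun", "adjective": "adjectival noun"}


def switchable_h1(category):
    """Hypothesis 1: only [-F] elements participate in codeswitching."""
    return LEXICAL_FEATURES[category]["F"] is False


def switchable_h2(category):
    """Hypothesis 2: only [-F, +N] elements participate in codeswitching."""
    feats = LEXICAL_FEATURES[category]
    return feats["F"] is False and feats["N"] is True


def switchable_after_recategorization(category):
    """Apply Hypothesis 2 after any recategorization available in Japanese."""
    target = RECATEGORIZATION.get(category, category)
    return switchable_h2(target)


if __name__ == "__main__":
    for cat in LEXICAL_FEATURES:
        print(cat, switchable_h1(cat), switchable_h2(cat),
              switchable_after_recategorization(cat))
```

Under these assumptions, Hypothesis 1 admits all six categories, whereas Hypothesis 2 admits noun, verbal noun and adjectival noun directly, verb and adjective only via recategorization, and postposition not at all.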

4. [+F] and switchability

Next, we will turn our attention to the category of [+F]. As the hypothesis predicts, no switching is expected among the functional categories AGR, T, D, and C. This prediction is largely borne out, as the codeswitching literature shows (e.g., Nishimura 1985: 135, Azuma 1993: 1071-1093). For example, the following switching patterns with respect to T have never been attested.

(22) a. * Moo zenbu oboe-ed.
          already all memorize-Past
          '(I) already memorized all.'
     b. * I already memorize-ta all.
                    -Past

(23) a. * Ano hito wa zutto hanashi-ing.
          that person Top continuously speak-Prog
          'That person is continuously speaking.'
     b. * That person is continuously speak-te-iru.
                                      -Prog

In (22a), the past tense morpheme -ed is switched from English. In (22b), the past tense morpheme -ta is switched from Japanese. Likewise, in (23a), the progressive morpheme -ing is switched from English, and in (23b), the progressive morpheme -te-iru is switched from Japanese. The fact that none of these patterns has been attested suggests that T is indeed a nonswitchable category.

Next, in terms of C, the Japanese equivalent of the English complementizer that is to. Observe the following examples of codeswitching of C, which are never attested.

(24) * Watashi wa Taroo ga tensai da that omou.
       I Top Taroo Nom genius is that think
       'I think that Taroo is a genius.'

(25) * I think to Taroo is a genius.
       I think that Taroo is a genius
       'I think that Taroo is a genius.'

Example (24) exhibits the switched English complementizer that. Example (25) exhibits the switched Japanese complementizer to. As the examples show, there is a confounding factor of word order: the English complementizer precedes its complement, whereas the Japanese complementizer follows its complement. Thus, it is not clear whether the observation is due to the functional status of C or simply to word order. However, the observation does not contradict our hypothesis based on the feature [+/- F].

As for the other two categories, AGR and D, we cannot say much about them because Japanese lacks AGR and D (e.g., Fukui 1995: 353). There are no attested examples of switching AGR and D in the literature. However, Azuma (1997a: 122) notes that the English definite article the is sometimes borrowed into otherwise completely Japanese discourse. Observe the following example.

(26) Yookoso, minasama no the dendoo e.
     welcome you Gen the palace to
     'Welcome to your palace.'

The fact that the D category is borrowed is interesting in the light of the present discussion. By contrast, borrowing of C is never attested in Japanese/English. This suggests that not all members of the functional category are uniformly [+F]; there may be some difference in the strength of the [+F] feature. In other words, D may be weaker than C in terms of its [+F] feature. In his discussion of the functional category, Fukui (1995: 339) suggests the following feature specification for the major functional categories:

(27) AGR = [+F, +N, +V]
     T   = [+F, -N, +V]
     D   = [+F, +N, -V]
     C   = [+F, -N, -V]

Without going into the details of the feature specification, one feature relevant to the present study is [+N]. Among the four major categories, AGR and D are claimed to have the [+N] feature. This fits our observation nicely. T and C do not have the [+N] feature, and they are the categories which strongly resist codeswitching (as well as borrowing). This leaves room for the argument that AGR and D are less resistant to codeswitching (as well as borrowing), and we have just observed in (26) that D can be borrowed. As for AGR, we do not have any data to decide one way or the other, although the feature specification suggests that AGR may be susceptible to codeswitching.
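The grouping suggested by the feature specification in (27) can be made explicit with a short Python sketch. It is an illustration only: the names `FUNCTIONAL_FEATURES` and `carries_n` are assumptions introduced here, and the split into two groups merely restates the reasoning above (a [+F] category that also carries [+N] may be less resistant); it is not an additional empirical claim.

```python
# Illustrative sketch only: names are assumptions made for this example;
# the feature values follow (27) in the text.
FUNCTIONAL_FEATURES = {
    "AGR": {"F": True, "N": True,  "V": True},
    "T":   {"F": True, "N": False, "V": True},
    "D":   {"F": True, "N": True,  "V": False},
    "C":   {"F": True, "N": False, "V": False},
}


def carries_n(category):
    """True if a [+F] category is also specified as [+N]."""
    return FUNCTIONAL_FEATURES[category]["N"]


# [+F, -N]: categories said to resist switching and borrowing strongly.
strongly_resistant = sorted(c for c in FUNCTIONAL_FEATURES if not carries_n(c))

# [+F, +N]: categories for which the text leaves room for lower resistance.
possibly_less_resistant = sorted(c for c in FUNCTIONAL_FEATURES if carries_n(c))

if __name__ == "__main__":
    print("strongly resistant:", strongly_resistant)
    print("possibly less resistant:", possibly_less_resistant)
```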

5. Conclusion

The present study has shown that the behavior of codeswitching in Japanese/English can be captured in terms of feature specification, within the framework of the Principles-and-Parameters approach. According to this approach, human languages are analyzed in terms of a set of universal principles, each of which is parameterized for an individual language. One of its important claims is that only functional categories are parameterized. In other words, non-functional or lexical categories are shared by all languages. For our codeswitching study, this can be taken to suggest that codeswitching may be possible where languages share their systems, that is, in the area of lexical categories. In the area of functional categories, by contrast, each language is parameterized, and codeswitching does not occur because these categories are not shared among languages and are not interchangeable. This thesis was tested in the light of Japanese/English data. The cases of recategorization (i.e., verbal noun and adjectival noun) have shown that the relevant feature is not just [-F] but that [+N] should be added. In codeswitching, Verbs ([+V, -N]) are recategorized as Verbal Nouns ([-V, +N]) and Adjectives ([+V]) are recategorized as Adjectival Nouns ([+V, +N]). In both cases, the feature [+N] is acquired through recategorization. The feature [-F, +N] is presented as a hypothesis for codeswitchability. Because [+/-F] is parameterized for each language, it is almost impossible to present feature specifications and their members that are universal to all human languages. However, it would be interesting to examine other language pairs with respect to this hypothesis.

Finally, it has to be pointed out that a comprehensive account of codeswitching is not complete without the factor of word order. Although the present study has not discussed word order, it is well attested that word order plays a crucial role in codeswitching (e.g., Myers-Scotton 1993: 83-85, Poplack 1980: 518-618). Fukui (1995: 336) argues that parameters with ordering restrictions should be postulated outside of the lexicon. The present study focused only on the lexicon; discussion of ordering must await future study.

Notes

1. There is a very small number of V switches which do not involve the helping verb -suru 'do'. Nishimura (1997: 121) reports the following examples:

   Don't suu.
   Don't slurp
   'Don't slurp.'

   Can I nigeru?
   Can I escape
   'Can I escape?'

   In both cases, a Japanese infinitive verb occurs. Interestingly, information such as tense and aspect is not carried by the codeswitched element.

2. This does not mean that P never switches in other languages. In other languages, some items in P may codeswitch. For example, Myers-Scotton (1993: 124-125) reports that prepositions such as before and between participate in codeswitching in Swahili/English.

Linguostatistic study of Bulgarian in the Ukraine1

Ol'ga S. Parfenova

1. Introduction

The Bulgarian population that has lived on the territory of the modern Ukraine for about two centuries has preserved its native language, despite the lack of any contact with its compatriots in Bulgaria over a long period of time and in spite of the Soviet government's policy of Russification. Used only for interethnic communication, Bulgarian has undergone a strong Russian influence, and in this sense it can be characterized as a mixed language containing adapted and non-adapted Russian items. Such a mode of discourse is characteristic not only of representatives of the junior and middle generations but also of the older generation of Bulgarians, and it is acknowledged by the whole ethnic group as their mother tongue. When speaking their Bulgarian, the people incorporate a great many Russian items. So, I tend to agree with Poplack's finding (1980: 614) that codeswitching, as it emerges in communication within an endogroup, is a discourse mode, and not a discourse strategy "to achieve certain interactional effects at specific points during a conversation." How rapidly the process of relexification in Bulgarian discourse has evolved, and what the prospects are for maintaining this language variety, is the first issue of the present chapter.

The second issue of the chapter concerns the status of Russian lexical items in Bulgarian, some of which represent switches and others borrowings. Many scholars studying languages in contact recognize the lack of universal operational criteria for distinguishing between these two phenomena (see Appel and Muysken 1987: 172-173). In our case the distinction seems to be even more difficult, given the genetic relationship of the languages in contact, which makes them mutually intelligible. The use of a Russian word with Bulgarian formatives can be not only a simple borrowing, widely spread among native speakers and acknowledged by them as part of the Bulgarian lexicon, but also a discourse strategy, as when an individual attempts to speak pure Bulgarian, for instance with a Bulgarian from Bulgaria or with anybody speaking literary Bulgarian. At the same time, Russian words without any adaptation do not always represent switches into Russian, because of their wide spread and the lack of Bulgarian equivalents in the variety under consideration. The frequency of Russian and Bulgarian equivalents in our data can help to solve this problem.

The third issue in which I am interested is the ongoing process in the core vocabulary. I have observed in Bulgarian speech the frequent use of Russian temporal adverbs, modal words, particles and numerals, suggesting that not only single words of the core vocabulary but whole semantic groups of the lexicon are undergoing shift.

To investigate these issues concerning the present-day state of Bulgarian in the Ukraine, I will use an approach known as lexicostatistics. The present study sets itself several tasks: (1) to evaluate the presence of adapted and non-adapted Russian items at the idiolect and sociolect levels; (2) to establish correlations between the frequency of Russian items and the speakers' social background; and (3) to reveal the role and status of some Russian words in expressing temporal, local, and some modal meanings.

2. Ethnic background and history of Bulgarian in the Ukraine

The influence of Russian on Bulgarian is related not only to the ethnic contacts between the speakers of the two languages but, more importantly, to the status of Russian as the state language of the Russian Empire and later of the Soviet Union, and to its exclusive use in the spheres of education, culture, information and all phases of official communication. The appearance of Bulgarian settlements goes back to the period of the wars between Russia and Turkey from the mid-eighteenth to the nineteenth century. The region of the most compact settlement of Bulgarians is Bessarabia, a former province of the Russian Empire that had been annexed by military action. Today the territory of Bessarabia is divided between two states, the larger part belonging to Moldavia and the coastal part belonging to the Odessa region of the Ukraine. Apart from Bessarabia, Bulgarian settlements also exist along the coast of the Sea of Azov. Their emergence dates from the same period. It should be noted that by the time the migration of Bulgarians began, Bessarabia was almost uninhabited. Then, during the nineteenth century, there appeared Bulgarian, Serbian, German, Ukrainian and other settlements, some of them ethnically pure and others mixed. Until the 1940s the Bulgarians lived in comparative isolation from the neighboring population. In the post-war period the intensification of migration processes resulted in the disappearance of purely Bulgarian villages, and now Bulgarians mostly live side by side with Ukrainians and Moldavians. In the Odessa region approximately half of the Bulgarian population lives in towns, but there the Bulgarians represent only an insignificant percentage of the whole population. The exception is the district center Bolgrad, where Bulgarians constitute 40% of the population. By and large, according to the census of 1989, 165,800 Bulgarians live in the Odessa region, and they rank third in number after Ukrainians and Russians.

Neither the Bulgarian language nor its culture was subjected to any restrictions in its development by the Tsarist authorities or during the first ten years of Soviet power. During the period of the building of the multinational state (the 1920s to the mid-1930s), the Bulgarian language in the Azov region2 was given all the prerogatives of an official language: it was used in school education and in the office work of local government bodies, and books and magazines were published in Bulgarian as well. The decrease in the social significance of Bulgarian during the Russification period, which started in the middle of the 1930s, predetermined both the language assimilation of a part of the Bulgarians and the wide spread of Bulgarian-Russian bilingualism. The results of a questionnaire survey that I conducted in 1993 among 370 rural Bulgarians have demonstrated that 96% of the informants speak both languages, Bulgarian and Russian, fluently.
Today the Ukrainian authorities have taken measures to support Bulgarian by including it in the spheres of education, culture, and mass media. However, the active government policy of Ukrainization3 casts some doubt on the effectiveness of these measures. In the national schools, where the teaching of Bulgarian has been introduced, the language of instruction is Russian, but in higher educational institutions and colleges the entrance examinations and instruction are only in Ukrainian. As a result, young Bulgarians increasingly prefer the ordinary schools, where Bulgarian is not taught, over the national schools.

3. The data

As data for the investigation I used the Bulgarian speech of 23 informants from the Odessa region, recorded in June-July of 1993, and data from five informants from the Azov region, whose speech was recorded by the Moscow State University student V. Domontovic for her diploma in dialectology in 1980. The main purpose of recording Bulgarian speech was to obtain texts representing the state of Bulgarian in the Ukraine. In some cases I asked local Bulgarians to interview their fellow citizens according to my instructions. The overall volume of audio recordings of the speech was 6.5 hours (3.5 hours of speech for the Odessa region residents and 3 hours for the residents of the Azov region).

The informants belong to different social groups according to the following characteristics:

(1) Nationality: 27 Bulgarians and one Ukrainian who grew up in a Bulgarian village and speaks Bulgarian;
(2) Residence: seven live in towns and 21 in villages;
(3) Age: six persons under 20 years of age, twelve middle-aged persons (20 to 55), and ten elderly persons (above 55).

Communication in Bulgarian was carried out in dialogue form on topics of everyday life. Each informant was asked to speak about his/her family, his/her work, and the history and traditions of his/her village. The situation in which the Bulgarian speech was recorded remained the same in all instances, which excluded any influence of such factors as the communicants' role behavior, the environment, the type of address, and so forth.

4. Methodology

The Bulgarian dialects of Bessarabia and the Azov region go back to the Eastern group of dialects, on the basis of which literary Bulgarian developed during the second half of the nineteenth century. The speech of the Bessarabian Bulgarians has retained some specific features of these dialects, in particular those which are also characteristic of Russian. In the analysis of the Bulgarian speech and the singling out of Russian items, I used materials on historical lexicology (Myzlekova 1990) as well as the dictionary of colloquial Bulgarian written by Najden Gerov in the nineteenth century (Gerov 1977). The analysis of all the textual data at word level has shown that the Bulgarian speech contains the following types of text units:

(1) Interlanguage synonyms, i.e., words belonging to both Bulgarian and Russian which in the given context bear the same lexical meaning and the same phonetic shape (some conjunctions, prepositions, and nouns); they are designated as elements of Bulgarian;
(2) Elements of Bulgarian: word forms belonging to Bulgarian in their lexical, phonetic, and morphological aspects in the given context;
(3) Elements of Russian: word forms belonging to Russian in the given context in light of their lexical, phonetic, and morphological characteristics;
(4) Adapted Russisms, which also include the so-called loanshifts (Haugen 1972: 344), that is, cases with no phonetic adaptation where a word under the influence of another language acquires meanings and uses new to it, as well as Russian lexemes used in Bulgarian speech with only phonetic adaptation, with only morphological adaptation, or with both morphological and phonetic adaptation.4

The given data contain mainly intrasentential codeswitching, i.e., the Russian items are single words, idiomatic expressions, or short phrases. The presence of Ukrainian and Moldavian language elements is insignificant: in all the data studied we found two Moldavian and five Ukrainian lexemes.
I carried out statistical data processing at the level of idiolects and sociolects, uniting several idiolects according to the informants' social features and their residence. Thus statistical samples were obtained representing:

1. Bulgarian town residents;
2. Bulgarian villagers of school age from the Odessa region;
3. Middle-aged Bulgarian villagers from the Odessa region;
4. Bulgarian villagers of older age from the Odessa region;
5. Bulgarian villagers of older age from the Azov region.

All the texts were divided into samples of 100 word usages. At the level of idiolects, we worked with the speech texts of 17 informants with more than 300 word usages in total. At the level of sociolects we worked with samples of 6,000 sounds (or 6 kilophones)5. Each sociolect and idiolect was represented by three statistical (variation) series: (1) the total number of Russisms, (2) the number of non-adapted Russian words, and (3) the number of adapted Russian words in each sample. The main properties of the variation series were represented by the following values: the arithmetic mean frequency (X) and the standard deviation (σ²) as the dispersion factor of the parameter investigated. To check the type of frequency distribution (normal, i.e., Gaussian, or non-normal) we applied Pearson's criterion (χ²). When the distribution of the factors investigated was normal, Student's criterion (t) was used to determine the significance of the differences between two series, i.e., whether they are statistically invariant or not. The threshold between chance and significant differences lies within the range 2.23 < t < 3.17 (for n = 6). So, if the result falls into this interval, no definite conclusions can be made.

Using the transcribed records of speech from Bessarabia and the Azov region, we compiled a frequency dictionary of Russisms containing all parts of speech. For each word the number of its uses in the investigated block of texts was calculated. The approach to the description of foreign words in Bulgarian speech was suggested by the ideas of functional grammar, which proposes descriptions of language according to the semantic classification of its elements (Bondarko 1983: 57-66).
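The comparison of two variation series described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the text does not say which form of Student's criterion was applied, so the pooled-variance two-sample form is assumed here; the function names (`describe`, `student_t`, `interpret`) are hypothetical; and the sample values are invented and are not data from the study. The two thresholds are the ones quoted in the text; they match the standard two-sided critical values of Student's t for 10 degrees of freedom at the 5% and 1% levels, which would be consistent with comparing two series of six samples each.

```python
# Illustrative sketch only: the sample values below are invented for
# this example and are NOT data from the study.
import math
import statistics

# Thresholds quoted in the text for n = 6.
T_LOWER, T_UPPER = 2.23, 3.17


def describe(series):
    """Return the arithmetic mean and the sample standard deviation."""
    return statistics.mean(series), statistics.stdev(series)


def student_t(series_a, series_b):
    """Two-sample Student's t with pooled variance (an assumed form)."""
    n_a, n_b = len(series_a), len(series_b)
    var_a = statistics.variance(series_a)
    var_b = statistics.variance(series_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return abs(statistics.mean(series_a) - statistics.mean(series_b)) / std_err


def interpret(t):
    """Classify a t value against the interval given in the text."""
    if t <= T_LOWER:
        return "no significant difference"
    if t < T_UPPER:
        return "inconclusive"
    return "significant difference"


# Hypothetical numbers of Russisms in six samples for each of two sociolects.
group_a = [12, 15, 11, 14, 13, 16]
group_b = [13, 14, 12, 15, 14, 15]

if __name__ == "__main__":
    print(describe(group_a), describe(group_b))
    t_value = student_t(group_a, group_b)
    print(round(t_value, 2), interpret(t_value))
```

With these invented series the t value falls well below the lower threshold, so the two series would be treated as belonging to one parent population; a value between the two thresholds would leave the question open.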

5. Quantitative characteristics of Russian words in Bulgarian speech

The processing of the text data by statistical methods has shown that the mean frequency of adapted and non-adapted Russian elements in Bulgarian varies, in a 100-word text, from 7.6% to 30%; non-adapted Russian elements range from 2.5% to 25.7%, and adapted Russian words from 4% to 13.7%. Thus, the frequency of non-adapted Russisms is greater than that of the adapted ones. Furthermore, the insignificant differences in the use of adapted Russian words demonstrate that this characteristic of the speech samples is less relevant than the frequency of the non-adapted ones and of all Russian words (see diagrams 1, 2 and 3).

Diagram 1. Frequency of adapted and non-adapted Russian words in Bulgarian (informants 1-3: junior generation; 4-12: middle generation; 13-17: elder generation; groups shown: urban Bulgarians, Bulgarians from the Odessa region, Bulgarians from the Azov region, Ukrainian)

Diagram 4. Frequency of adapted and non-adapted Russian words in Bulgarian (horizontal axis: kilophones, i.e., texts of 1000 sounds; groups shown: urban Bulgarians, Bulgarians of school age from the Odessa region, middle-aged Bulgarians from the Odessa region, elder Bulgarians from the Odessa region, elder Bulgarians from the Azov region)

The graphic representation shows that there seem to be two groups according to their use of Russisms: on the one hand, urban Bulgarians and the demographic group of middle-aged villagers; on the other hand, the demographic groups of younger and elderly villagers. Thus, the Bulgarian speech of the older generation does not differ substantially from that of school-age people, and the changes going on in the language are not linear but cyclic in nature. The result obtained reaffirms the viewpoint of the German dialectologist Karl Nahrings, who pointed out that the characteristics of dialect speech are influenced not only by biological age but also by social age, i.e., the involvement of the person in the sphere of production and his/her relationship to the more prestigious language variety (Krjučkova 1991: 21). The calculation of Student's criterion (t-criterion), aimed at detecting noticeable differences between sociolects, has revealed that all variation series corresponding to sociolects belong to one parent population. The only uncertainty left concerns the difference between the urban Bulgarians and those from the Azov region, for there the t-criterion lies within the uncertainty interval (t = 2.43). The negative result signifies the maintenance of Bulgarian in the Ukraine, accompanied, however, by an increase of Russian lexical elements used predominantly in their non-adapted form.

6. Functional characteristics of Russisms in Bulgarian

The processing of the dictionary data revealed that, of the total foreign-language usage, 40% of the words belong to the class of nouns, 25% are adverbs and 13% are verbs, while the other parts of speech each make up less than 8%. The most frequent (from 15 to 40 usages) Russian words are očen 'very', takže 'also', sejčas 'now', semja 'family', ran'še 'formerly', voobšče 'generally' or 'so', škola 'school', uže 'already' or 'no longer', [...] 'collective farm', vse 'everything', konečno 'of course', daže 'even', sorok 'forty', vot 'here' (with the meaning of 'here it is'). The main task of the analysis of the Bulgarian vocabulary was to find out the meanings expressed by: a) lexemes of Russian only; b) lexemes of Bulgarian only; c) lexemes of both Russian and Bulgarian with equal or different frequencies.

6.1. Means for expression of temporal semantics

In the group of adverbs of time are some of the most frequent Russian words found in Bulgarian: sejčas 'now' and ran'še 'earlier' (see example (1a)). Informants of the elder generation utter these words with Bulgarian phonetic characteristics (i.e., sounds ζ and s are palatalized). The adverb sejčas is used on a par with its Bulgarian equivalent sega, but ran'še does not have any equivalent. Example (1b) demonstrates variability in expressing the meaning 'now' within the same idiolect, a phenomenon characteristic of the speech of many Bessarabia residents.

(1)* a. Seas UŽE nesesbirat. RAN'ŠE gu praznuvaxa - njamaše AKSERKI.
        now no longer gathers together-Negat. formerly it-Acc. celebrated - had not obstetricians
        'Now nobody gathers together. Earlier it (a holiday) was celebrated - there were no obstetricians.'
     b. A pyk seas poluča pet xiliadi, sega mi nabavi ošte dve xiliadi.
        and well now have got five thousands, now me added else two thousands
        'Now I have already got five thousand; now they have added two thousand more for me.'

As the reader can see, the adverb ran'še can be qualified as a borrowing, whereas sejčas is a switched item. The use of both Bulgarian and Russian words was, however, found in expressions of other temporal meanings (for 'when', 'after', 'while', 'never'), and the use of Russian equivalents reveals instances of codeswitching. Russian adverbs having no Bulgarian equivalents are: vsegda 'always', nedavno 'recently', pozdno 'late', davno 'long ago', inogda 'sometimes', snačala 'first', spustja 'ago', vnačale 'in the beginning', vskorosti 'soon' (see examples (2a, b)). They are mostly pronounced according to Russian phonetics, with the exception of the adverb pozdno 'late'. Most probably it is perceived by Bulgarians as being Bulgarian: it is pronounced according to Bulgarian phonetics and was used in speech by many informants.

(2) a. Ami, imaxme nedavno koncert "Proletni praznici".
       well we had recently concert spring-Adj. holidays
       'Recently, we had a concert here: "Spring Holidays".'
    b. Davno ne sym gu viždala.
       long ago Negat. be-Aux. him saw
       'It is long ago that I saw him.'

Bulgarian adverbs having no Russian equivalents belong either to dialects or to literary Bulgarian, and their number is no less than that of the analogous Russian ones. The low frequency of adverbs without equivalents in the data makes it impossible to determine their status. To sum up, the analyses of usage of Russian and Bulgarian time adverbs have shown that they play an equal role in Bulgarian speech, something that cannot be said about other thematic groups within temporal semantics. Thus, Bulgarians use Russian names for the months, usually without any morphological adaptation; on the other hand, they use Bulgarian ones for the days of the week. It is worth noting that in Bulgarian speech the date of an event is indicated with numerals either by a mixed construction or simply in Russian (see examples (3a) and (3b)).

(3) a. Rabotia v ŠKOLATA, učitelka, ot vosem'desiat sedmata godina.
       (I) work in school, teacher, since eighty seven year.
       'I have been working in school, as a teacher, since 1987.'
    b. No nasite oste diadovci dosli.... v tridcatyje gody dosli.
       But our else grand-fathers came... in thirty's years came.
       'But our ancestors came... they came in the 30s.'

Generally speaking, the indigenous lexemes designating numerals (with the exception of the numerals for 'forty' and 'eight') are preserved, and they enter into synonymic relations with the Russian numbers. We noticed that the usage of particular numerals is related to their function in the sentence: to indicate dates, speakers use Russian more often, and to express quantitative semantics they use both Bulgarian and Russian.

6.2. Means to express locality meanings

The most frequent Russian word with locality semantics is the adverb doma 'at home'. The word belongs to Russian and also to some Bulgarian dialects. As synonyms Bulgarians use two other combinations, which are characteristic of the mother-country language (u doma and v kysti), but their frequency is limited. The fixation and spread of the adverb doma took place under the impact of Russian. In general, if we consider adverbs with locality semantics, the predominance of Bulgarian words stands out. In our materials there were only three Russian adverbs which had no equivalents: krugom 'around', kuda 'where', rjadom 'near' (see examples (4a) and (4b)).

(4) a. A pyk tuk ima nemcite - rjadom tija nemci.
       And Part. there were Germans - near these Germans.
       'There were also Germans here - these Germans were near.'
    b. Razvalili tova čerkva... Krugom na selata gi vyzstanovljavat.
       Destroyed this church ... Around in villages them rebuild.
       'They destroyed this church... In villages all around they are being rebuilt.'

There are many more Bulgarian adverbs of place which have no Russian equivalents (for example, for 'far', 'from there', 'on the other side', 'on the top', 'outside', etc.). Furthermore, geographical names are used by Bulgarians mainly in adapted morphological forms; nevertheless the phonetic character of the words does not change. As in some dialects of the mother country, in the speech of the older generation geographical names are sometimes used in a definite form. Consequently, mostly Bulgarian lexical means are used in locative functions in the Bulgarian speech in the Ukraine, and this group of Bulgarian adverbs is therefore less accessible to foreign-language influence than temporal adverbs are.

6.3. Means of expression of modal meanings

In regard to modality as reflected in the speech of our subjects, we shall discuss only those meanings for which we have sufficient material for analysis.

6.3.1. Means to express negation

In the Bulgarian speech of both the Odessa and the Azov regions we met the Bulgarian ne or njama, Russian net, and Ukrainian ni or nema in different types of negative statements. The Russian negative particle net is used only as a separate utterance (see example (5)) or as a part of idiomatic expressions. It is more often met in Bessarabian texts.

(5) a. Net, nie srazu go kupixme.
       No, we right away it bought.
       'No, we bought it (the flat) right away.'

The Ukrainian means of negation (the particle ni 'no', the verb nema 'there is not') are much rarer than the Russian ones, but they are used in various types of negation: (a) in a negative sentence; (b) in the negation of the predicate expressed by a compound verbal predicate; and (c) as a negative expression (the verb nema) in an existential sentence (see example (6)).

(6) Ne, nema TAKI slucaja.
    No, do not have such case
    'No, there was no such case.'

A fundamental difference between Ukrainian and Bulgarian on the one hand and Russian on the other lies in how existential sentences are formed: Russian uses the verb 'to be', whereas Ukrainian and Bulgarian use the verb 'to have'. In our recordings of Bulgarian speech we did not find examples where the Russian particle was used in sentences of this kind. Generally, the use of the Bulgarian means of negation is more typical of the recorded speech than the use of the Russian or Ukrainian ones.

6.3.2. Means of expression of assertion and supposition

In Bulgarian speech modal words expressing assertion and supposition are, as a rule, Russian. Among the most frequently used modal words, Russian konečno 'of course' can be qualified as a borrowing; others, like možet byt' 'it is possible' and navernoe 'probably', are switches, because their Bulgarian equivalents occur elsewhere in the data. More rarely used Russian modal expressions are the following: pravda 'in truth', dopustim 'let us assume', dejstvitel'no 'really', kazalos' by 'it seemed', kažetsja 'it seems', možet 'maybe' (see examples (7a) and (7b)).

(7) a. Razbiram, pravda, ne vsickoto.
       Understand-1st p.S., in truth, not everything.
       'In truth, I do not understand everything.'
    b. To misli, ce tova, dejstvitel'no, tvojto lapi.
       He thought that this, really, your guy.
       'He thought that this is really your guy.'

In contrast to Bulgarian modal adverbs, which are hardly used by Bulgarians, the particle maj, which expresses supposition ('perhaps', 'very likely'), is well known to Bulgarians of both regions. It is possible that its preservation was favored by the lack of an exact equivalent in Russian, so that in different contexts it is translated into Russian by different means.

6.3.3. Means to express generalization and specification

To express generalization and specification, Russian modal words are mostly used. Among them the most frequent are v obščem 'in general' (see example (8a)) and voobšče 'generally speaking'; they have no Bulgarian equivalents, so we can treat them as borrowings. Less frequently used Russian modal words are: v principe 'in principle', glavnoe 'most important', nakonec 'at last', po krajnej mere 'at least', v smysle 'in a sense', and sobstvenno 'in fact'.

(8) a. ...v kluba odi - po selata, v obščem, peje.
       ..in club goes-3rd p.S. around villages, in general, sings,
       '..she goes to the club - around villages, she sings, in general.'

The data also contain Russian-Bulgarian equivalents for 'so', 'mainly', 'thus', which all have a common origin.


6.3.5. Emphatic means of Bulgarian speech

Among the means of marking and emphasizing there are, first and foremost, some determinative adverbs. The most frequent Russian adverb was found to be očen' 'very'. In Bulgarian the adverb mnogo corresponds to it, but the Bulgarian word has more than one meaning, as it is simultaneously an equivalent of the Russian mnogo 'much'. In recordings of Bessarabian speech mnogo is used only according to Russian usage; in the Azov region it takes both meanings. In the function of feature intensification the adverb sil'no 'strongly' is also used, which is characteristic of southern Russian speech, especially in Bessarabia (see example (9)).

(9) Toj sil'no posedja.
    He strongly became grey.
    'He turned very grey.'

High frequency of usage is also characteristic of the Russian emphasizing particle uže 'already'. We found the Bulgarian adverb veče, which corresponds to it, but only in Bessarabian texts recorded in one and the same village. In other villages only the Russian uže is used (see example (10a)).

(10) a. Tja uže vsicko razbira.
        She already everything understands.
        'She already understands everything.'

The adverb tože 'also' is displacing its Bulgarian equivalent from Bulgarian speech, a fact that is shown by the proportion of its usage. In contrast to the adverbs tože 'also' and očen' 'very', which are displacing the traditional Bulgarian words, the Bulgarian adverb for 'else' maintains itself firmly in place. Other Russian words and word combinations with emphatic semantics are much rarer in Bulgarian speech, but their large number attracts attention: vse-taki 'just the same', vse ravno 'nevertheless', prjamo 'right away', opjat' že 'again', bolee-menee 'more or less', dovol'no 'enough', osobenno 'in particular', bukval'no 'literally', konkretno 'exactly', soveršenno 'absolutely', kak raz 'just', odin 'one', že [particle] (see examples (11)).

(11) a. A pyk moja sin ziveje kak raz naprotiv s Ivan.
        And Part. my son lives just opposite with Ivan.
        'And my son lives just opposite Ivan.'
     b. I takyv pomidor se polucava xubav - prjamo gu jades i vodata pies kakto vino.
        And such tomato get-Refl. good - right away it eat-2nd p.S. and water drink as wine.
        'And tomatoes get so good - right away you eat them and drink water as wine.'

Bulgarian emphatic adverbs are less diverse. In fact, some of the meanings are only expressed by the Bulgarian words for 'only' and 'again'. As a general characteristic of the means of emphasis in Bulgarian speech one notes that both lexica, the Bulgarian and the Russian, play an equal role.

6.3.6. Means to express obligation

In the Bulgarian speech of the Ukraine, the modality of obligation is expressed by the Bulgarian modal verb trjabva, just as in the language of the mother country. The Russian equivalent nado 'one should', which belongs to the class of modal adverbs, was encountered in only one text. Commonness of origin and functioning of the verbs accounts for the use of the Ukrainian verb treba in the narration of a Bessarabian schoolgirl (see example (12)).

(12) Lapito mu treba systoto v tozi grad.
     The boy he-Dat. needs also in this town.
     'The young man also has to get to this town.'

Nevertheless, the subjective assessment of obligation is expressed only by the Russian lexeme objazatel'no 'without fail' (see example (13)). Nowhere in our Bulgarian speech samples did we find a Bulgarian analogue.

(13) Davat na majkata otrez, tam, sitec, kojto kakvoto ima. Objazatel'no takovo besi.
     Give-3rd p.Pl. to mother length, well, calico who what has got. Without fail this was.
     'Mother is given a length, well, calico, everybody gave what he had. It was without fail.'

7. Conclusion

The present investigation has confirmed my suggestion that Bulgarian in the Ukraine shows, in the widest sense of the term, the effects of a kind of codeswitching, in which elements of Russian constitute 7-30% of a 100-word text and are mostly used in their non-adapted form. The most russified variant is the speech of the socially active population. Differences between the older and the younger generations in their use of Russian lexemes are not as important as it would seem at first glance. Nevertheless, the analysis of a part of the core vocabulary has shown that many of the Russian words play a significant role in Bulgarian speech: some of them enter into synonymic ties and have almost equal usage with their Bulgarian equivalents; others are replacing or have already replaced the indigenous words. The process of relexification of the core vocabulary is due to both social and linguistic conditions. The meanings which are expressed by bound forms are hardly subject to shifting, as the state of affairs in the domain of negation and obligation demonstrates. Some processes of replacing free forms from the core vocabulary may have social implications. The significant role of Russian items for expressing generalization, specification, assertion, and supposition can be explained by the absence of Bulgarian as a tool language in education, a situation that results in the development of logical thinking in a foreign language. Probably this fact, together with the absence of Bulgarian in official settings, has influenced the use of Russian in dates. We cannot explain here the causes of the frequent use of Russian items for temporal references and their rare use for localities. It may be a specific feature of Russian-Bulgarian codeswitching, or it may be typical of the speech behavior of representatives of other ethnic groups. A study within a functional framework, as proposed here, might help to identify both common and specific features of the codeswitching mechanism. As regards the state of Bulgarian, in the near future we can expect the strengthening of the Ukrainian influence as well as the upgrading of Bessarabian Bulgarian through vocabulary building from literary Bulgarian. Which tendency will manifest itself more actively depends on the main goals of a language policy to be decreed by the Ukrainian authorities.
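The figure of 7-30% Russian elements per 100-word text can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the function name russian_share, the language tags "ru" and "bg", and the invented 100-token sample are all hypothetical, and nothing here reproduces the linguostatistical procedure or software actually used in the study.

```python
from typing import List, Tuple


def russian_share(tokens: List[Tuple[str, str]]) -> float:
    """Return the percentage of tokens tagged as Russian in a text sample.

    `tokens` is a list of (word, language_tag) pairs; the tag "ru" marks a
    Russian item. Both the tagging scheme and the function name are
    hypothetical and are not taken from the study.
    """
    if not tokens:
        raise ValueError("the sample must contain at least one token")
    russian = sum(1 for _, lang in tokens if lang == "ru")
    return 100.0 * russian / len(tokens)


# Invented 100-word sample: 7 tokens tagged Russian, 93 tagged Bulgarian.
sample = [("uže", "ru")] * 7 + [("sega", "bg")] * 93
print(russian_share(sample))
```

Under these assumptions, a sample with 7 Russian-tagged tokens out of 100 sits at the lower end of the range reported above, and one with 30 at the upper end.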

Notes

*  In the displayed examples all Russian words - not only those being discussed - are either bolded (non-adapted borrowings) or capitalized (adapted borrowings).
1. This survey forms part of the project Non-indigenous minority languages on the territory of the former USSR (96-04-06360) supported by the Russian Foundation of Humanities.
2. Bessarabia was joined to the Soviet Union in 1940; before, it had been occupied by Rumania since 1918.
3. In the southern and eastern Ukraine Ukrainian is not widespread among the population, and many Ukrainians acknowledge Russian as their mother tongue. As my investigation has revealed, only 16% of the informants speak Ukrainian.
4. One of the most interesting cases of adapted Russian items is when the word contains both Russian and Bulgarian reflexive morphemes. For example, se sčitaetsja 'to be considered'; se - Bulgarian reflexive particle, sja - Russian reflexive suffix.
5. In our investigation we use the linguostatistical method in the interpretation of Nadezda V. Kotova, Moscow, and Miroslav Janakiev, Bulgaria (see Janakiev, 1977). I would like to thank Nadezda V. Kotova and Miroslav Janakiev for their help and comments on linguostatistics and the software for computer processing.

The role of semantic specificity in insertional codeswitching: Evidence from Dutch-Turkish

Ad Backus

1. Introduction

The literature on codeswitching tends to deal more with the morphosyntactic integration of elements from another language than with the motivation for using exactly these elements and not others. This chapter describes a semantic-pragmatic study of insertional codeswitching. Patterns of lexical selection can yield new insights into the process of lexical renewal, and, ultimately, contribute to a fuller explanation of observed codeswitching patterns. Lexical borrowing can be seen as the conceptually most simple type of linguistic change. New words are added to the lexicon, either as pure additions or else as replacements for older (L1) words. In contrast to the change itself, the reasons for it have not been studied very often, although Weinreich, Labov and Herzog (1968) have called on sociolinguists to place the study of the actuation problem high on their research agendas. This article is an attempt to address this issue. Some typical examples of insertional codeswitching are given in (1).

(1) a. Swiss German-Italian (Preziosa di Quinzio 1992, quoted from Franceschini 1998)
       perche, meinsch che se tu ti mangi emmenthaler o se tu ti mangi una fontina, isch au en unterschied, oder? Schlussändlich e sempre dentro li pero il gusto isch andersch.
       'because, you mean, if you eat Emmental cheese or if you eat Fontina cheese, there is also a difference, isn't there? Actually, it's still there, but the taste is different.'
    b. English-Swedish (Boyd, Andersson and Thornell 1991)
       but there were, I think, four or five foreign lecturer tjänsts.
       'but there were, I think, four or five foreign lecturer positions.'
    c. Irish-English (Stenson 1990)
       Nior thog se ach split second
       'It only took a split second.'

These examples are not only typical in that the morphosyntactic integration of the Embedded Language elements is completely directed by the Matrix Language. They are also typical in that the inserted embedded language words are such logical codeswitches. Emmenthaler is a Proper Noun, the name of a cheese, and thus maximally specific as to what sort of cheese is being referred to. The Swedish word tjänst is probably so intimately associated with professional life in Sweden that it just doesn't feel right to refer to it with its English equivalent. Example (1c) illustrates a different source of unique reference: the composite expression split second is a conventionalized way of saying 'very quickly'. Figurative language normally doesn't have an exact equivalent in the other language because the particular metaphor underlying the idiom is unlikely to have been used in that language as well. The purpose of this chapter is to develop some ideas about what it is that makes a content word particularly borrowable, on the basis of data from a contemporary contact setting, the Turkish immigrant community in Holland. From the literature on codeswitching one gets the impression that very general words are rarely taken from the other language. This has led me to expect that a high degree of semantic specificity stimulates codeswitching. The next section provides a review of the literature on language contact with respect to the topic touched upon above, and introduces the Specificity Hypothesis. Section 3 will present the Turkish-Dutch data; Section 4 is a case study of one semantic field. Some implications for the theory of language contact are discussed in the final section.


2. Specificity

The basic idea to be expressed is that high semantic specificity enhances a word's chances of being used as a codeswitch. Something is considered highly specific if it is hard to replace with another lexical item. If it can only be paraphrased with a novel expression, it is maximally specific. I further assume that familiar lexical items are generally preferred over novel expressions, even if that familiar item is part of a language other than the one used in building up the present clause. This seems to be a plausible general assumption about how people go about conducting conversations, perhaps relatable to a maxim of relevance (Ariel 1998: 202). Cross-linguistically, it may be assumed that semantic congruence is higher for general concepts. Although connotations and associations will differ from language to language, even for the most basic words, bilinguals will generally see enough overlap in meaning between matrix language and embedded language words for something as general as tree to effectively warrant equivalence (Myers-Scotton and Jake, 1995: 988). On the other hand, it is likely that codeswitching will be easier for embedded language words that are not equivalent with anything in the matrix language. Otherwise, why would one want to use a foreign word for something that is quite straightforwardly expressed with a familiar word from the matrix language (Meechan 1995)? That is, of course, not to say that such a thing would be impossible; I merely suggest it would not be likely. One may wonder whether it is semantic or rather semantic-pragmatic factors which are most important in explaining the selection of embedded language words. If it is primarily a matter of semantics, the mere referential characteristics of a word should force its selection. Alternatively, it could be assumed that the totality of a word's meaning, i.e., including its connotations, is what is relevant to the speaker selecting it.

2.1. Semantic definition

As Thomason and Kaufman (1988) note in their taxonomy of borrowing, every contact setting involves at the very least borrowing of non-basic vocabulary. It is the aim of this chapter to make the phrase non-basic vocabulary more precise, by classifying instances of this broad category along a cline going from highly specific to more general. The specificity hypothesis in (2) claims that at the specific end of this continuum, borrowing is most likely.

(2) Specificity Hypothesis
    Embedded language elements in codeswitching have a high degree of semantic specificity.

First of all, let me point out that the notion of specificity as it is used here refers to the inherent semantics of lexical elements. It is not to be confused with referential specificity, a discourse-pragmatic effect achieved by such elements as case markers and definite articles, and used to single out one instance of a concept as the one talked about. That is not to say that the two uses of the term are unrelated: modification with an adjective or a relative clause, for instance, makes a noun more specific in both senses of the word. This entails further that, even though most of this chapter deals with inherent specificity, the semantic characteristics of a lexical item in isolation, its degree of specificity in context is modified by that context: as Van Schaaijk (1996: 60) points out, the marriage in John's marriage is a lot more specific than the one in Marriage is a good thing. The ultimate basis for the hypothesis in (2) is the assumption that borrowing speakers only take from another language what they need. They react to their circumstances by filling in the gaps they perceive in their vocabulary when they attempt to use it to talk about the world around them. The other side of the coin is that the process of borrowing does not affect what the speakers already have: basic vocabulary and the ways of combining words, i.e., syntax and morphology. To be sure, syntax and even morphology can be borrowed too, but this occurs either very late in a language shift scenario, where the first language is undergoing attrition and the speakers can thus be said to have gaps that can/must be filled, or it occurs as a by-product of lexical borrowing, cf. Johanson (1992) on mixed copies, i.e., calques that include at least one borrowed morpheme as well. Gaps must be perceived before they can be filled, and the nature of language is such that lexical gaps are perceived more easily than any other gap: looking for the right word is surely a more frequent phenomenon than looking for the right construction. Codeswitching, then, is seen in this chapter as either filling such a gap, or as reflecting an earlier filling of a gap, one that was so successful that the solution has been conventionalized, i.e., it spread through the speech community to become what we call an established loanword. Gaps don't involve basic vocabulary, as the latter is likely to be shared by all or most languages. The forms are different, but the underlying concepts for things like man, tree, nose and do are probably almost identical cross-linguistically. Hence a bilingual speaker will not perceive a gap here in his matrix language. A Dutch immigrant in the United States may briefly compare Dutch vrouw and English woman, and decide they are similar enough to warrant the Dutch word in his developing variety of American Dutch. But when he compares an American high school with a Dutch middelbare school, he will notice certain crucial differences, especially in their non-central meanings (differences in school environment, athletic programs, grading system, language of instruction etc.), and accordingly decide to refer to an American high school as high school in his Dutch from now on. Though the considerations mentioned so far may sound plausible in an intuitive way, terms like specific and basic, or general, will have to be defined in a more abstract way in order to be useful as anything more than a superficial yardstick. Specificity is best cast in gradient terms, since it makes more sense to say that one word is more or less specific than another, than to say how specific it is in isolation.
When a word is said to be highly specific, it cannot be replaced by something else that is even more specific, except when it is paraphrased. A general word, on the other hand, is easily replaced by something more specific. Therefore, oak is specific, but tree is general. Similar pairs are the lumberjack versus the guy, Sue versus the woman, and burrito versus Mexican food. As I do not have much more to offer at this point, the terms high specificity and low specificity can be equated with higher-level vocabulary and basic-level vocabulary, respectively, but I wish to emphasize that a cline of semantic specificity is likely to be a more accurate way of describing subtle differences between elements, especially in semantic fields that are not easily split into dichotomies. Perhaps a better term would be an element's complexity, but I would like to retain that term for the build-up of morphemes, that is, for structural complexity, rather than for inherent semantics.1 Along similar lines, Field (1998) emphasizes the importance of the degree of semantic independence in determining an item's borrowability: the higher this degree, the easier it is to borrow that item. Myers-Scotton and Jake (1995: 988) concentrate on whether an element is an "easily accessible concrete entity". If it is, as in the case of nose, this eases the calling of the lexeme along with the concept. Van Coetsem (1997) uses the term stability for what I have called specificity, concentrating more on the effect than on what it is that brings this effect about. Especially "contentive vocabulary", he claims, is less stable, and therefore more borrowable, than structural elements. The next section will discuss some problems involved with this purely semantic definition. It will be shown that the hypothesis can be made more promising if meaning is defined in broader terms, taking into account, in particular, culture-bound connotations.

2.2. Pragmatic definition

The codeswitching literature has come to the conclusion that nouns are by far the easiest element to switch. But, in addition, it is often pointed out that within the class of content words, certain nouns are typical candidates for borrowing, at least implying that certain others are not. Certain semantic fields predominate among the loanwords in any language. Identifying these semantic fields yields information about the language contact situations the borrowing language has found itself in in the past. Similarly, synchronic contact situations show that talking about certain topics stimulates codeswitching. This is at odds with the thought expressed in the previous subsection, that only the referentially determined level of specificity determines the likelihood that an embedded language word will be inserted into a matrix language clause. If that were true, codeswitching should not show a skewed distribution across semantic domains, which, alas, it often seems to do. The idea was that codeswitching can proceed if an embedded language candidate for insertion means something sufficiently different from any matrix language equivalent. However, one may wonder whether true translation equivalents exist at all in the grammars of bilinguals. Connotations often differ, if only because the languages themselves are evaluated differently by members of a bilingual community (Becker 1997). English words convey modernity at Taiwan universities, for example, irrespective of whether they also fill a lexical gap (Chen 1996). Kamwangamalu (1992: 177) shows that monolingual speech may yield a pedantic (embedded language) or old-fashioned (matrix language) image in Tanzania. It has often been observed, on the other hand, that mixing the languages may function as a signal of dual identity (Myers-Scotton 1993a), so that basically any embedded language element will do, regardless of its meaning. Even so, translation equivalents often have differing connotations on their own. Translatability is probably hardest for culturally loaded words, which may simply fill complete gaps in the other language (Lauttamus 1990). A typical reflex of this pattern is the use of borrowings from a language associated with learning in all kinds of intellectual fields. These words presumably inherit the sense of modernity from the values the language of their provenance indexes. Sometimes this works the other way round: there is less emotional attachment to words in a foreign language, so that taboos in the matrix language may promote the use of embedded language equivalents (Necef 1994). Knowledge of such subtle semantic differences between translation equivalents is a hallmark of proficient bilinguals (Oksaar 1972: 442; Singh 1995).
Myers-Scotton (1995: 82), in a view of the lexicon based on Levelt's work, but also similar to the one embraced in Langacker (1987: 55), recognizes that this kind of connotational, or encyclopedic, information is part of the meaning when she argues that "differences in semantic fields and socio-pragmatic features are salient in the lemma selection process." Although in the matrix language frame model this notion of congruence is mainly applied to cases where there is a lack of morphosyntactic congruence (such as different subcategorization frames), it certainly applies to lack of semantic congruence as well (referred to by Myers-Scotton and Jake (1995) as "pragmatic mis-matches").2 If semantic domain is an important predictor of switches, then it must be part of our definition of specificity: being saliently connected with the embedded language culture enhances a word's specificity. Such topics have been experienced and talked about in the embedded language most of the time, and are therefore identified with the embedded language. Embedded language vocabulary is better developed in these fields. However, with increasing levels of bilingualism, these domain effects get weaker. For instance, Field (1998) notes that pretty much every Spanish content word or function word can appear in Mexicano, and estimates that about 60% of the words in Mexicano speech are Spanish in origin. It is also noteworthy that matrix language words are retained better in some semantic domains than in others (Lauttamus and Hirvonen 1990). The specificity hypothesis in (2) must be read with this broad definition of specificity in mind. Codeswitching is likely for embedded language words that are high in specificity, where highly specific means both that the word has a highly specific referential meaning, and that its matrix language equivalent, if there is one, conjures up quite different connotations.

3. Semantic specificity in Turkish-Dutch codeswitching

3.1. The data

The data I will review below derive from the Turkish immigrant community in Holland. I have reported on them extensively elsewhere (Backus 1992, 1996, 1999), and will only concentrate here on the aspect outlined in Section 2. The community has its roots in the labor migration that started in the late sixties. Today, it is a sizable minority community, concentrated in the urban centres. Seven social networks were investigated. Conversations between two to seven members of a network were recorded and transcribed. The informant pool represented variable immigration histories. For classificatory purposes, a distinction was made between a first, an intermediate, and a second generation. Intermediate generation informants arrived in Holland when they were between 5 and 12 years old. If they were younger than 5 on arrival, they were considered second generation; if they arrived older than 12, they were classified as first generation. The conversations were entirely unguided. No instructions were given beforehand as to language choice or topics of conversation, apart from the one general request to talk "as they always do". This resulted in seven recordings which are widely different in temperament and style, but which nevertheless allowed for substantial generalizations. Generally, first generation informants spoke Turkish with just a few Dutch content words thrown in. The most general pattern for the intermediate generation was a mix of intrasentential codeswitching with Turkish as the matrix language and actual alternation. The second generation engaged in frequent intersentential codeswitching (see Backus 1996, for details). Relevant for our purposes here is the question which particular embedded language elements get chosen in insertional codeswitching. Given the rarity of codeswitching from a Dutch base, I will only discuss Dutch insertions in Turkish clauses. For reasons of space, I will limit the discussion to three of the seven networks. Dutch insertions into Turkish clauses will be listed first according to their semantic domain membership.
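The classification rule for the three generations can be written out as a small function. This is only an illustrative sketch of the age thresholds stated above; the function name and the use of `None` for informants born in Holland are assumptions made here for illustration and are not part of the original study.

```python
from typing import Optional


def classify_generation(age_on_arrival: Optional[int]) -> str:
    """Return 'first', 'intermediate', or 'second' for one informant.

    age_on_arrival is the age in years on arrival in Holland.
    None stands for informants born in Holland (an assumption of this
    sketch; they are treated as second generation).
    """
    if age_on_arrival is None or age_on_arrival < 5:
        return "second"
    if age_on_arrival <= 12:
        # Arrived between 5 and 12 years old.
        return "intermediate"
    # Arrived older than 12.
    return "first"


if __name__ == "__main__":
    for age in (None, 3, 5, 12, 13, 30):
        print(age, classify_generation(age))
```

The boundary ages 5 and 12 are treated as intermediate here, following the wording "between 5 and 12 years old".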

3.2. The role of semantic domain

As we will see, many of the embedded language insertions classify as cultural borrowings: they have a high degree of specificity because they are the only viable candidates for encoding a given concept. However, we will also see that borrowings tend to predominantly come from certain semantic domains which are typically associated with the embedded language. The relation between high specificity and semantic domain is investigated in more detail in this subsection.3 Perhaps semantic domain is the only concept needed to explain the selection of embedded language elements in insertional codeswitching. In that case, the hypothesis in (3), which is conceptually more simple than (2), would have to be preferred.

(3) Semantic Domain Hypothesis
Every embedded language insertion is used by virtue of its belonging to a typically embedded language semantic domain.

This hypothesis is easily tested by looking at all insertions and checking whether they belong to predictable domains. Bearing in mind that much of what we get is dependent on the topics which happened to be covered during the recordings, most Dutch domains in the Turkish-Dutch data are fairly predictable. They are connected to Holland in an intimate way; examples include education, job hunting, work (cf. the hospital terms in Ayhan's speech), and various aspects of social life in Holland, such as fashion, dating and sports. What makes these fields typically embedded language is that speakers have experience with them through the embedded language. Dutch is used in interactions associated with the semantic field, so that much of the vocabulary belonging to it has made its way into their idiolects. The next subsections each describe domain effects in one of the networks, each one representing a different immigrant generation.

3.2.1. First generation

The insertions produced by one of the two first generation networks, referred to as the Tilburg women, are given in (4). All five speakers contributed one or more cases.

(4)
A. school terms: Hemelvaart 'Ascension Day'; Nieuwkomers 'newcomers'; toets 'test'; herhaling 'repetition'; zeer goed 'very good'; procent 'percent' (2x); pauze 'break'; vakantie 'holiday'; hoeveel procent 'what percentage'
B. Dutch culture: Tilburg-Noord 'Northern Tilburg'; Burger King (2x); MacDonalds; gulden 'guilder' (2x); friet 'fries' (3x); terras 'open-air cafe' (2x); hamburger (2x)
C. others: bijna 'almost'; direct 'immediately'

Most of the Dutch elements (16 out of 18) reflect aspects of these women's lives that involve the Dutch language. These include the Dutch class and certain areas of social life. Many of the Dutch words fill lexical gaps. Of the school list, two words are proper nouns (Hemelvaart, the name of a holiday, and Nieuwkomers, the name of the Dutch class they have been assigned to). Other words seem more general at first sight, but in school contexts they are used in a very specific meaning: toets and herhaling are names for certain types of exercises, and zeer goed is a grade which can be achieved for them. These are clear examples of cultural borrowings. The other words listed under school terms, though still obviously related to school life, are more general, since their meaning does not change very much outside the school context. On the other hand, for the informants these words are likely to be school terms. Similar considerations apply to the other semantic field to which Dutch contributes words in this conversation: Dutch social life. Words like friet, hamburger and terras are not lexical gap fillers in the narrow sense, but they are indisputably cultural borrowings. Their connotational, and to a certain extent even their referential meaning, differs from that of their near-equivalents in Turkish. Dutch friet is slightly different from Turkish pomfrit, as it is different from English chips or American French fries. Similar considerations apply to terras, referring as it does to a Dutch outdoor cafe, which differs in certain architectural, social and other aspects from similar places abroad.4 Several other semantic fields figure in the conversation, and these do not contribute any Dutch words. Most relate to the overall theme of cultural differences between Holland and Turkey, more in particular how the immigrant community deals with them. Subthemes within this field are: male-female relations, the first days in Holland, daily life in Turkey, an incident which had occurred in a Turkish shop in Tilburg, and the trip from Turkey to Holland. The women also briefly discuss what to talk about and what they should do immediately after the recording. The negative evidence that none of these fields yields Dutch material is the strongest indication that semantic field is a fairly accurate predictor of insertional codeswitching for the first generation. This entails that the notion of specificity can only be maintained as a relevant one if it makes reference to semantic domains.
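The check behind the figure of 16 out of 18 can be made explicit. The sketch below is merely illustrative: it takes the word types listed in (4), treats the two groups "school terms" and "Dutch culture" as typically embedded language domains, and counts how many inserted types the Semantic Domain Hypothesis covers. The variable and function names are hypothetical and introduced only for this illustration.

```python
# Word types from list (4), grouped by the labels used there.
FIRST_GENERATION_INSERTIONS = {
    "school terms": [
        "Hemelvaart", "Nieuwkomers", "toets", "herhaling", "zeer goed",
        "procent", "pauze", "vakantie", "hoeveel procent",
    ],
    "Dutch culture": [
        "Tilburg-Noord", "Burger King", "MacDonalds", "gulden",
        "friet", "terras", "hamburger",
    ],
    "others": ["bijna", "direct"],
}

# Domains assumed here to be typically associated with Dutch.
EMBEDDED_LANGUAGE_DOMAINS = {"school terms", "Dutch culture"}


def domain_coverage(insertions, embedded_language_domains):
    """Return (covered, total): types in embedded language domains vs. all types."""
    total = sum(len(words) for words in insertions.values())
    covered = sum(
        len(words)
        for domain, words in insertions.items()
        if domain in embedded_language_domains
    )
    return covered, total


if __name__ == "__main__":
    covered, total = domain_coverage(
        FIRST_GENERATION_INSERTIONS, EMBEDDED_LANGUAGE_DOMAINS
    )
    print(f"{covered} out of {total} types ({covered / total:.0%})")
```

The count is over word types, not tokens; repeated uses such as friet (3x) are counted once, which is what yields 18 types in total.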

3.2.2. Intermediate generation

The main intermediate generation informant, Ayhan, contributed the following Dutch insertions in Turkish clauses. Not listed are the insertions used by his interlocutor, who was a second generation immigrant. The insertions are divided by semantic field and the list includes everything that could possibly be construed as an insertional codeswitch, including full constituents. While the list given for the first generation informants in (4) above pretty much exhausts all the Dutch elements used in their conversation, Ayhan also used a lot of inter-clausal and intersentential codeswitching. Roughly speaking, he used about twice as many Turkish as Dutch clauses.

(5)
A. education/job hunting:
A1. job hunting proper: arbeidsbureau 'employment agency' (5x); advertentie 'ad'; politie-academie 'police academy' (2x); administratief medewerker 'administrative employee'; vast 'permanent'
A2. higher education/bureaucracy of the ed. system: HBO 'higher vocational education'; decaan 'dean' (4x); HBO-opleiding 'a course at the level of higher education'; verkorte opleiding 'shortened study'; van één jaar, twee jaar 'of one year, two years'; als jij administratief opleiding 'if you [do] an administrative training course'
A3. education in general: computer; inzicht 'insight'; taalniveau 'language proficiency level'; college 'class'; praktijk 'practice'; theorie 'theory'
B. hospital: foto 'X-ray'; begeleid 'supervisor'; begeleidster 'supervisor'; laborant 'lab worker'; particulier 'private'; met de begeleiding 'with supervision'; van mijn begeleider 'of my supervisor'
C. Dutch social life: kerst 'Christmas'; samenwonen 'to cohabit'; bowlen 'to play bowling'; bowling
D. bureaucracy: vergunning 'permit'; met wettelijke ... 'with legal ...'
E. others: hotel; migraine; getuige 'witness' (2x); pakket 'package'; kijken 'to look'; wegnemen 'to steal'; eenvoudig 'simple'; moeilijk 'difficult'; verantwoordelijk 'responsible'; plus 'and'; en 'and'; vriendin 'girlfriend'; de mogelijkheden 'the possibilities'; elke meter 'every meter'; map 'folder'; handigheid 'trick'; binnen twee weken 'within two weeks'; met alle, iedereen 'with all, everybody'; voor alle zekerheid 'to be on the safe side'; als getuige 'as a witness'; nog nooit getuige 'never as witness'; en ja 'and well'

The number of insertions that have to be qualified under others (21 out of 51) is much higher than for the first generation, where only two words were so classified. Still, many of the words are related to a semantic field in which Dutch naturally reigns supreme: the job market in Holland. This was also the source of many of the Dutch insertions in earlier data from similar informants, reported on in Backus (1992). Many of these terms are part of larger Dutch stretches, for example administratief opleiding in (6a). Yet others, of course, occur in Dutch sentences, such as politie-academie in (6b).

(6)
a. o bana söylemişti ki: "als jij administratief opleiding yaparsan, dat is ook op een HBO-niveau en dan kijk ik ook weer met Nederlandse problemen." (82)5
'she said to me "if you do an administrative training course, that is also at HBO level, and then I will help you with your Dutch again".'

b. dus ik had naar politiebureau gebeld, je kunt daar solliciteren want ze hebben liever allochtonen bij politieacademie als... (110)
'so I had called the police office, you can apply for jobs there because they'd rather have foreigners in the police academy than...'

Some of those words belong to job hunting proper; others to the more bureaucratic sides of the educational system, which is intimately tied in with job hunting. These educational terms belong more specifically to the domain of higher education, the branch of the system with which Ayhan has had most contact recently. However, there are also several educational words from Dutch in (5), listed under A3, that are more general. They seem to be established elements in the Immigrant Turkish of Ayhan and his peers, possibly because they entered their idiolects when they were still in school and became firmly entrenched because of frequent past usage. Possibly terms like eenvoudig 'simple' and moeilijk 'difficult', here classified as others, also belong to this group. Since Ayhan talks a lot about an in-house training he had recently been doing in a hospital, there is a lot of hospital jargon in his speech (cf. Section 4.2). Most of the words involved are Dutch, as is to be expected given that the program was in Dutch. That Dutch social life and bureaucracy are associated with Dutch is hardly surprising. Job hunting in general and its subfields (the importance of proficiency in Dutch, work conditions in a factory, the hospital training program, and, as an aside related to the last topic, his supervisor's relationship with a Turkish man), is one of the two main topics of discussion in this conversation. The other one is Turkish weddings, more in particular the tasks of a witness at one. This part of the conversation yields very few Dutch insertions into Turkish clauses, even though the number of Turkish clauses is much higher in this part of the conversation than elsewhere. This suggests that the Turkish content vocabulary in this semantic domain is still strongly entrenched. Once more it is this negative evidence which truly brings out the importance of the notion of semantic field in explaining the selection of embedded language content words.

3.2.3. Second generation

This conversation is conducted by three young women, all born in Holland. The language of choice fluctuates: there is much intersentential codeswitching throughout the recording, with Dutch being used about four times as much as Turkish. Insertional codeswitching in Turkish clauses, in accordance with the general second generation picture, is relatively rare: most clauses are Dutch and most codeswitching is alternational. The insertions are listed in (7):

(7)
A. fashion: föhn 'blow-dryer' (2x); schuim 'foam' (2x); maat 'size'; glanzen 'to shine'; donker 'dark'; dof 'dull'; van die, met die bontkragen 'with those fur collars'
B. others: vakkenvullen 'to stock shelves'; lenen 'to borrow'; geest 'ghost'; deel 'part'; echt spontaan 'real spontaneously'; waarom 'why'; en 'and'; ofzo 'or something'

Dutch words are most likely to appear in Dutch utterances and Turkish words in Turkish ones, which makes an account of specificity such as was done for the other two generations slightly irrelevant, if only because insertional codeswitching itself is a relatively marginal pattern in the speech of these speakers. It is difficult to give a list of things talked about, because it is a very loose conversation. The girls interrupt each other, they tease each other, topics get dropped and picked up again, etc. The topics include stories about things that happened at work, gossip about mutual friends, plans for the holiday, hair fashion, discussion of TV programs, a recently held wedding, the plans for the next day, and a lot of small talk. Only the wedding (a Turkish one) and the discussion of plans involve quite a lot of Turkish clauses; for the other topics Dutch predominates. Of the topics talked about, only fashion contributes some Dutch words to Turkish discourse. None of them is a clear cultural loan.

3.3. Summary

So far, we have seen that the predictability of embedded language content word selection diminishes across the generations, with the first generation switching in the most predictable way. There, all codeswitching is insertional, and insertions come from a few, typically Dutch, semantic domains. The intermediate generation uses embedded language insertions from a variety of domains, some of them not typically associated with Dutch. Predictable semantic domain effects still account for more than half of the insertions, however. For the second generation, finally, while semantic domain still accounts for some embedded language insertions, the predominant pattern of codeswitching is such that embedded language insertions are not used much to begin with. Rather, the whole sentence is likely to be in Dutch when it includes a word that could very well have surfaced as an embedded language insertion if the sentence had been in Turkish. Obviously, we can never know whether it would have. An interesting kind of negative evidence for the semantic domain hypothesis in (3) is that typically Turkish semantic domains, as was to be expected, feature a more than average number of Turkish clauses with a less than average number of Dutch insertions. However, the semantic domain hypothesis cannot account for those embedded language insertions that do not belong to domains typically associated with the embedded language, and can also not tell us much about how many and which words within a particular semantic domain are going to end up as embedded language insertions. The next section offers a way to deal with this gap.

4. The role of specificity

It was argued in Section 2 that translation equivalents may have encyclopedic meanings which differ enough from each other to effectively render them not equivalent at all (recall the discussion of Dutch friet 'French fries', and its English and Turkish equivalents). Such details - any semantic details really - add to a word's specificity. The specificity hypothesis, as formulated in Section 2, predicts that the more specific a word is, the higher the chance it stands to become used as a codeswitch, and, alternatively, that embedded language general words will rarely be inserted into matrix language clauses. In Section 3.2., it was made clear that one can go a long way towards explaining insertional codeswitching by focusing on semantic domain only. It is likely that specificity and semantic field interact. One could also argue, on the other hand, that specificity is irrelevant, that switchability is determined by the semantic field a word belongs to. In its strong version, this cannot be true, as words get switched that do not belong to semantic domains dominated by the embedded language (cf. the words in the lists above that are classified as others). If specificity has anything to add to what can already be attributed to semantic domain, we should see specificity effects within one such domain. I will present one such analysis in Section 4.2., using one of the conversations that were focused on in Section 3.2. But first, we will see whether the specificity hypothesis can shed light on those insertions not covered by the semantic domain hypothesis.

4.1. Words not explainable through semantic domain

In (8), all words listed as others in the lists of insertions given in Section 3.2. are brought together:6

(8)
FIRST GENERATION: bijna 'almost', direct 'directly';
INTERMEDIATE GENERATION: migraine, de mogelijkheden 'the possibilities', map 'folder', handigheid 'trick', verantwoordelijk 'responsible', eenvoudig 'simple', moeilijk 'difficult', kijken 'to look', wegnemen 'to steal', hotel, pakket 'package', getuige 'witness', vriendin 'girlfriend', plus 'and', en 'and', en ja 'and well';
SECOND GENERATION: vakkenvullen 'to stock shelves in a supermarket', lenen 'to borrow', geest 'ghost', deel 'part', echt spontaan 'real spontaneously', waarom 'why', en 'and', ofzo 'or something'.

Several of these words cannot be explained through semantic specificity either, but can be through certain auxiliary constructs. Some of these will be briefly illustrated in Section 4.3. Here, I will focus on the nouns and verbs, as prototypical content words. The first generation does not provide any relevant cases. Recall from Section 3.2.1. that the Semantic Domain Hypothesis already took care of most insertional switches. The second generation conversation provides only four cases: the verbs vakkenvullen and lenen, and the nouns geest and deel. Only one of those is highly specific: vakkenvullen stands for a specialized activity that workers in supermarkets are called on to do. It could normally be replaced by a more general word, such as 'work', and it can be an answer to a question such as 'What kind of work?', which explicitly asks for a more specific term. Lenen seems to be a truly established borrowing in Dutch Turkish, as it has been attested many times before. I am not sure how to explain the semantic attractiveness of this word. A tentative suggestion, however, is that the Turkish equivalent ödünç almak carries associations of borrowing money, while Dutch lenen is more neutral as to what gets borrowed. The two nouns relate to TV programs. The first one, geest, describes a character in Twin Peaks, who has turned into a ghost. A compelling semantic reason for using the Dutch word seems impossible to give. I will advance a different motivation for this switch in Section 4.3., however. The second noun, deel, refers to the second episode of a two-part show. Why it is used in a Turkish clause is hard to explain, especially since a little earlier the Turkish equivalent bölüm was used by the same speaker, cf. (9a and b). Their usage makes unequivocally clear that they are true synonyms, since they both refer to episodes of the same TV show.

(9)

a. ondan sonra ne oldu? ben ikinci bölümünü kaçırdım. (P. 148)
'what happened then? I missed the second episode'

b. ilkine baktın mı, ilk deel-ine? (P. 156)
'did you see the first one, the first part?'

The specificity hypothesis can thus help in explaining the selection of some, but not all, of these embedded language words. Recall that they are the only ones used by the second generation as embedded language insertions that could not be explained through semantic domain alone. The majority of the insertions to be explained is provided by the intermediate generation. How does the Hypothesis fare with these? Closer examination reveals that many of these words are intimately associated with Dutch after all, even though they don't belong to a typically Dutch semantic domain. Consider, for example, migraine. The two times that the concept surfaces, the Dutch word is used, one of those times in a Turkish sentence.7 This is not surprising, since the speaker who introduces it, Ayhan's second generation interlocutor Hatice, has been living in Holland all of her life, and has presumably gone to a Dutch doctor to whom she talks about her condition in Dutch. Similar considerations can be brought forward for mogelijkheden (typical jargon word of the Dutch employment bureaucracy), map (folders with job ads, called maps, are the most central element for any visitor to an employment center), and the education terms handigheid, verantwoordelijk, eenvoudig, and moeilijk. However, at least for map, eenvoudig and moeilijk it would be far-fetched, or at least unmotivated, to designate them as semantically highly specific. This still leaves five words as unmotivated by domain membership. Only two of those can be explained through semantic specificity. Wegnemen is a stylistically more restricted, and thus more specific, variant of stelen 'to steal'. The other word that is explainable is kijken, normally one of the most general verbs imaginable. Its Turkish equivalents, bakmak and görmek, are used very often throughout the data.
The first step in the analysis is to establish the exact meaning of kijken in its context. The sentence comes up during a description of the witness' role during the wedding ceremony. Millet kijken yap- in (10b) means that everybody's eyes are on you during the ceremony, the sort of watching alluded to in 'Big Brother's watching you'. This is clearly a more specific meaning than basic looking. To complete the analysis, two more questions should be asked: first, does this concept appear elsewhere? If so, how is it coded? It turns out that this concept is indeed coded two utterances earlier, also by Ayhan. There he uses only Turkish, cf. (10a).

(10)

a. milletin gözünde sende oluyor, weet je. (A, 236)
'and everybody's eyes are on you, you know.'

b. ja, maar toch, millet kijken yapıyor (A, 240)
'yeah, but still, people will be watching you.'

The context reveals that Ayhan is warning Hatice that the witness at a wedding, a function she has been invited to fill but which she has serious doubts about, is the center of attention. He interprets her response 'I'm not such an attention seeker, so...' as meaning 'it won't be that bad for me, because I'm such a quiet person' and repeats his warning that all eyes will be upon her. It is in this second warning that the Dutch verb replaces the Turkish idiom. The Dutch verb seems to help in reinforcing the warning, along with the sheer repetition. Obviously, kijken has a more specific meaning here than in its basic sense of 'look', but its selection is at least partially motivated by pragmatic considerations as well (cf. Section 4.3). The words hotel and pakket have almost identical cognates in Turkish (both Dutch and Turkish borrowed these words). The Dutch form may have replaced the Turkish form in the variety of Turkish spoken in Holland, a common process in language contact (Johanson 1993: 215). The word getuige, finally, plays an important role in the last half of the conversation, and is repeatedly focused on in a semantic discussion about the similarities and differences between a Dutch getuige and a Turkish sağdıç, both witnesses at a wedding. Therefore, it enjoyed a high level of activation when Ayhan uttered (11). This in turn shows that various psycholinguistic mechanisms, such as activation levels, are an integral part of the process of lexical selection.

(11)

yanıma bir de başka getuige aldım (A, 250)
'and I took another witness by my side'

The one word, in addition to deel in (9b), that is totally unexplainable through any semantic and/or psycholinguistic analysis is vriendin in (12). We will see in the next subsection that such general terms denoting people are normally always in the matrix language.

(12)

bir sene beraber tanışıyorduk ya, o-nun vriendin-i van mijn begeleider, die had een vriend in Turkije. (A, 180)
'one year they met each other, a friend of my supervisor, who had a friend in Turkey.'

The conclusion is that a word's chances of being used as a codeswitch are high if it has Dutch connotations. For the specificity hypothesis to work, that means that the definition of specificity must include connotational meaning, and it seems this is done best using the concept of domain. In terms of Section 2, it has to include the pragmatic level. Purely semantic specificity only plays a supporting role, for instance in the case of kijken in (10b) above. Many Dutch words are in free variation with Turkish equivalents, and semantic specificity plays a limited role in predicting for which words this will hold. However, the fact remains that truly basic vocabulary seems to be rare among embedded language insertions. The next section will investigate this in more detail.

4.2. Specificity within a semantic domain

In this section, I will consider the use of content words by Ayhan (intermediate generation) and Hatice (second generation) in the fragment of their conversation in which they talk about Ayhan's hospital training program. Of the 71 clauses devoted to the topic, 41 are Dutch (25 by Ayhan; 16 by Hatice), and 30 Turkish (26 by Ayhan; only 4 by Hatice).8 The list in (13) below shows how the content words in this stretch of the conversation are distributed over the language of provenance, the language of the clause they are part of, and the semantic field (hospital versus general). For reasons of space, I give only English translations for the more numerous general words. The various categories are represented schematically in the table below.

(13) Content words in the hospital fragment

A. True hospital training content words: 36
i. Dutch words in Dutch clauses: 23
HBO-opleiding 'higher vocational training', geslaagd 'passed' (2x), baan 'job' (2x), opleiding 'training' (2x), budget (2x), praktijk 'practice', zelfstandig 'unassisted' (2x), foto's 'X-rays', thorax, longfoto 'X-ray of the lungs', onderste buik 'belly', blaasfoto 'X-ray of the bladder', extremiteiten 'extremities', met bescherming 'with protective clothing', radiologen 'radiologists', collega's 'colleagues', begeleidster 'supervisor' (2x).
ii. Turkish words in Turkish clauses: 1
fotoğraf çekmek 'to make an X-ray'.
iii. Dutch words in Turkish clauses: 10
laborant 'lab worker', particulier 'private', praktijk 'practice', theorie 'theory', handigheid 'skills', leren 'to learn', begeleiding 'supervision', foto 'X-ray', begeleid[ster] 'supervisor' (2x).
iv. Turkish words in Dutch clauses: 2
devlet hastanesi 'state hospital', kollar 'arms'.

B. General content words used in this fragment: 74
i. Dutch words in Dutch clauses: 39
Nouns: contact (2x), point, risk, government, people, reason, months, theory, orders, everyone, pictures, signature, Turk, friend, Turkey.
Verbs: fired, to cut money, hired, to pay, to come.
Adjectives: good (5x), fun, married.
Adverbs: even (2x), too bad (2x), normally, nothing.
Idioms: I like ..., exactly, look, I mean, that's right.
ii. Turkish words in Turkish clauses: 30
Nouns: girl (2x), year, waiter.
Verbs: go on holiday (2x), disappointed, get (=learn), did, said, looked, was disappointed, went, ate, sent a card, had met, see, took, won't go, fall in love, working, comes, can't stay.
Adverbials: in a short time, four months, in two weeks, with most, home, New Year's, alone.
iii. Dutch words in Turkish clauses: 5
Nouns: girlfriend, hotel, Christmas.
Adjectives: simple.
Adverbial: in two weeks.
iv. Turkish words in Dutch clauses: 0

Table 1. Distribution of content words in the hospital fragment, divided by semantic field and matrix language of clause.

                        In Turkish clauses    In Dutch clauses
                        General   Hospital    General   Hospital
Dutch content words        5         10         39         23
Turkish content words     30          1          0          2

What we can conclude from these data is that content words in this semantic field tend to be Dutch, regardless of what the matrix language is in any given clause. Two things that are especially noteworthy are the following: first, while there are ten Dutch content words that belong to the semantic field in question in Turkish clauses, there is only one such content word from Turkish. Obviously, the Dutch content vocabulary specifically denoting concepts relevant to the hospital training is used much more readily by Ayhan than the Turkish equivalents. Another way in which this is borne out is that eleven times as many (33 versus 3) of the content words within the semantic field are Dutch. That is way more than can be expected given the 41:30 division into Dutch and Turkish clauses in this fragment. For the general content words in B, the division is indeed as expected: 44:30. In other words: choice of Dutch words is likely if the targeted word belongs to a semantic field that is associated with the embedded language.9 Second, in contrast to that lone Turkish hospital word under Aii, there are 30 general Turkish content words in Turkish clauses within this stretch of talk (Bii). These words do not themselves belong to the semantic field. Are they less specific than the five Dutch non-hospital content words that appear in Turkish clauses (listed under Biii)? The answer has to be negative. As noted before, the simple fact that a word is typically associated with a semantic field strongly associated with Dutch, makes it highly specific. The five Dutch words (for 'girlfriend', 'hotel', 'Christmas', 'simple', and 'within two weeks') are not radically more specific than the 30 non-hospital Turkish words used in this fragment. Only kerst 'Christmas', being a name, may be considered highly specific. On the other hand, virtually all of the Turkish words are very basic too, with the exception of garson 'waiter' (could be replaced by more general 'guy') and üzüldü 'was disappointed' (could be replaced by a vaguer emotion verb). We must conclude, once again, that specificity, if it is to play a role, must include the notion of semantic field in its definition, since codeswitching in this fragment mainly involves words that have to do with the hospital training program. Among the content words that do not belong to this field and that are part of Turkish clauses, Dutch does not contribute the more specific ones. Therefore, we must conclude that it is not specificity itself that enhances the likelihood of insertional codeswitching, but rather one aspect of specificity: membership in a semantic domain that has strong associations with the embedded language. Connotations and other aspects of encyclopedic meaning are an integral part of the meaning of a lexical item, which in effect entails that semantics and pragmatics are indivisible. We have seen that talking about a topic that is dominated by an embedded language vocabulary can increase the amount of insertional codeswitching, but it can also have another effect: use of Dutch as the main language of communication as long as the topic prevails. The speakers here have chosen the first option, since the division of Turkish and Dutch clauses follows the average of the whole conversation (cf. note 9). In the second generation data, however, the other option is used far more often, which partly explains the low incidence of insertional codeswitching in those conversations.
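The arithmetic behind the comparison of 33 versus 3, 44:30 and 41:30 can be retraced from the counts in (13). The following is only an illustrative sketch: the dictionary layout and the function names are assumptions made for this illustration, and the figures are simply those reported for the hospital fragment.

```python
# Counts of content words from (13): (semantic field, language of the word)
# mapped to the number of words found in Dutch and in Turkish clauses.
CONTENT_WORDS = {
    ("hospital", "Dutch"): {"Dutch clauses": 23, "Turkish clauses": 10},
    ("hospital", "Turkish"): {"Dutch clauses": 2, "Turkish clauses": 1},
    ("general", "Dutch"): {"Dutch clauses": 39, "Turkish clauses": 5},
    ("general", "Turkish"): {"Dutch clauses": 0, "Turkish clauses": 30},
}

# Clauses devoted to the hospital topic, by language.
CLAUSES = {"Dutch": 41, "Turkish": 30}


def word_totals(field):
    """Return (Dutch words, Turkish words) for one semantic field."""
    dutch = sum(CONTENT_WORDS[(field, "Dutch")].values())
    turkish = sum(CONTENT_WORDS[(field, "Turkish")].values())
    return dutch, turkish


def dutch_share(dutch, turkish):
    """Proportion of Dutch items among all Dutch and Turkish items."""
    return dutch / (dutch + turkish)


if __name__ == "__main__":
    for field in ("hospital", "general"):
        dutch, turkish = word_totals(field)
        print(f"{field}: {dutch}:{turkish}, Dutch share {dutch_share(dutch, turkish):.3f}")
    print(f"clauses: {CLAUSES['Dutch']}:{CLAUSES['Turkish']}, "
          f"Dutch share {dutch_share(CLAUSES['Dutch'], CLAUSES['Turkish']):.3f}")
```

On these figures, the Dutch share among general content words stays close to the Dutch share among clauses, while the share among hospital words is far higher, which is the contrast the argument relies on.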

4.3 Other factors

In this section, I wish to illustrate briefly two additional factors that promote codeswitching. Both are pragmatic in nature, based on the awareness-raising effect of focusing. The first has been discussed by others under the heading of flagging (see, especially, Poplack, Wheeler and Westwood, 1989). An embedded language word that is itself the focus of attention is obviously a good candidate for a switch. At that point, the speaker is maximally aware of lexical selection, which means that the language of provenance of the word he/she wants to use is irrelevant, as long as the interlocutors know it too. Flagging is mostly demonstrated as an explicit device, illustrated by the word-searching

dummy şey 'thing' in (14), but it can also be implicit. The context then makes clear that the word in question is in focus. Recall what was said in Section 4.1 about the high activation level of the Dutch word getuige 'witness' at some point during the conversation between Ayhan and Hatice.

(14) O da düşündü taşındı, biraz üç ay falan şey kaldılar samenwonen. (A, 184)
'and she thought about it and moved, and they have been what's it called, living together for a while, for three months or so.'

The other factor relies on the inherent meaning of morphosyntactic constructions, and certainly deserves more attention than I can give it here. Certain positions lend themselves particularly well to focusing, so much so that they are referred to as focus positions in monolingual grammars. The complement position of the copula is such a position in many languages. Quite a few of the insertions in my data occur at this position, cf. geest in (15). They represent the new information in the clause they are part of, and are thus the part of the clause where the speaker is most aware of lexical selection. It seems plausible that words with high semantic load gravitate towards focused positions.

(15) het is zo dat 't eh, geest var, değil mi şimdi (Ş, 132)
'it's like this, it uhm, there's this ghost now, isn't there?'

5. Conclusion

We have seen that the selection of embedded language elements is most predictable in the first generation data, in the sense that those Dutch words that were selected within Turkish clauses were not surprising choices. In the second generation data, however, the Dutch words that get inserted into Turkish clauses do not seem to have been selected because of their semantic attractiveness to the matrix language speaker. Instead, there seems to be a certain randomness in the selection of Dutch words. Presumably, this has to do with the more balanced bilingualism

displayed by the second generation: they know more Dutch words, but they are also better at retrieving the Turkish equivalents of those words. We have also seen that typically Dutch semantic domains are not just responsible for the majority of insertional switches; within such a domain virtually all of the content words are Dutch. Finally, non-Dutch semantic domains yield few switches.

The specificity hypothesis claims that insertional codeswitching mainly occurs with words that have a high degree of semantic specificity. That is, codeswitching is not just determined by what is syntactically possible, but also by what speakers wish to say (Myers-Scotton 1996). By and large, this hypothesis was supported by the data. Very few general Dutch words were inserted into Turkish clauses. What this chapter has demonstrated is that specificity, or whatever it is called, is increased by domain boundedness (Halmari 1997: 189), since this adds particular connotations to the referential meaning the element already possesses.

Though languages in contact borrow words which are seemingly unnecessary, closer examination often shows us that there was a good reason after all to borrow the word in question. As Weinreich (1953: 59) writes: "a bilingual is perhaps even more apt than the unilingual to accept loanword designations of new things because, through his familiarity with another new culture, he is more strongly aware of their novel nature."

Core borrowings (i.e., in this case, borrowed content words that have general meaning) are generally held to be typical of intense bilingualism. They differ from cultural borrowings in that there is no "urgent consensus" (Myers-Scotton 1993b: 175) that they are very useful to the matrix language. Examples can be found in most articles about codeswitching, in which words such as mother (Bhatt 1997) or drink eat (Lauttamus – Hirvonen 1995) are inserted.
Kamwangamalu (1996: 301) notes that siSwati has borrowed all kinds of words from English that are not highly specific, and for which there are equivalents in siSwati. Core borrowing becomes really rampant in codeswitching between closely related languages. Norde (1997) reports many borrowed German function words in Middle Swedish. Function words are prototypical general words, so, for instance, the borrowing of conjunctions is held to be a sign of "deep influence", due to "intense long-term contact" (Johanson 1999). It should be kept in mind that, in addition to a list of

core borrowings in any given context of language contact, one could probably draw up an even bigger list of general embedded language vocabulary which has not been borrowed. Owens (1996), in a study on an isolated African Arabic dialect, notes that most of the calques involve basic vocabulary. Specific semantic fields contain borrowings (p. 301), though, all in all, not many (p. 302). Basic vocabulary, however, remains Arabic. In Myers-Scotton (1993b: 194), the Preferential Path Principle is advanced to describe how, even when there is a lot of insertional codeswitching, many content morphemes will still be from the matrix language. This is the reverse of a conceivable Relexification Principle, which could be suggested to take over in situations of more intense contact. However, actual relexification would be an extreme outcome. Since we are dealing with bilingual lects, it would be counterintuitive to expect the languages involved to be working towards an economical division between the two lexicons. Obviously, bilinguals know the translation equivalents of many words in their two languages. Selection is probably guided by a combination of factors, which include the bilinguality of the current mode of speaking (Grosjean 1992), personal preferences of the speaker, and current accessibility. Words that have been used recently in the conversation are perhaps more accessible than their counterparts (Halmari – Cooper 1998). In my data, sometimes the use of the embedded language word seems to activate the matrix language equivalent (Backus 1996: 206).

Now why would it be that basic words apparently do not make good insertional switches? I suggest the answer lies in how we go about planning our utterances. Basic vocabulary doesn't tend to attract much attention from either speaker or hearer because it doesn't require much mental effort to make mental contact with its referent.
Either it functions pronominally (as alternatives to personal and demonstrative pronouns), or it is general and familiar enough to have its referent be conjured up automatically. However, basic vocabulary can be made less basic if it is modified in some way. Modification can come about in several independent ways; I have discussed three of them, one semantic and two pragmatic in nature. The semantic factor is that modification with modifiers raises an

element's specificity. Specific elements, including modified basic words, are switched relatively easily.10 I suggest that of the following two hypothetical codeswitching examples, the second is far more likely to occur (in fact, 16b has been attested, in Backus (1992); 16a has not):

(16)

a. ΊEngeldi bir tane a