Variation in Language: System- and Usage-based Approaches 9783110346855, 9783110343557

Where is the locus of language variation? In the grammar, outside the grammar or somewhere in between? Taking up the deb


English Pages 321 [322] Year 2015



Variation in Language: System- and Usage-based Approaches

linguae & litterae

Publications of the School of Language & Literature Freiburg Institute for Advanced Studies Edited by Peter Auer, Gesa von Essen, Werner Frick Editorial Board Michel Espagne (Paris), Marino Freschi (Rom), Ekkehard König (Berlin), Michael Lackner (Erlangen-Nürnberg), Per Linell (Linköping), Angelika Linke (Zürich), Christine Maillard (Strasbourg), Lorenza Mondada (Basel), Pieter Muysken (Nijmegen), Wolfgang Raible (Freiburg), Monika Schmitz-Emans (Bochum)

Volume 50

Variation in Language: System- and Usage-based Approaches
Edited by Aria Adli, Marco García García and Göz Kaufmann

ISBN 978-3-11-034355-7 e-ISBN (PDF) 978-3-11-034685-5 e-ISBN (EPUB) 978-3-11-038457-4 ISSN 1869-7054 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2015 Walter de Gruyter GmbH, Berlin/Boston Typesetting: epline, Kirchheim unter Teck Printing: Hubert & Co. GmbH & Co. KG, Göttingen ♾ Printed on acid-free paper Printed in Germany www.degruyter.com

Contents

System and usage: (Never) mind the gap (Aria Adli, Marco García García, Göz Kaufmann)

Part 1: System, usage, and variation
Language variation and the autonomy of grammar (Frederick J. Newmeyer)
The grammar of use and the use of grammar (Gregory R. Guy)
Looking for structure-dependence, category-sensitive processes, and long-distance dependencies in usage (Richard Cameron)
Variation in syntax: Two case studies on Brazilian Portuguese (Mary A. Kato)

Part 2: Rare phenomena and variation
Rare phenomena revealing basic syntactic mechanisms: The case of unexpected verb-object sequences in Mennonite Low German (Göz Kaufmann)
The no man’s land between syntax and variationist sociolinguistics: The case of idiolectal variability (Leonie Cornips)
What you like is not what you do: Acceptability and frequency in syntactic variation (Aria Adli)

Part 3: Grammar, evolution, and diachrony
“Intelligent design” of grammars – a result of cognitive evolution (Hubert Haider)
Syntactization, analogy and the distinction between proximate and evolutionary causations (Guido Seiler)
Gradual loss of analyzability: Diachronic priming effects (Rena Torres Cacoullos)
How usage rescues the system: Persistence as conservation (Malte Rosemeyer)

Aria Adli, University of Cologne Marco García García, University of Cologne Göz Kaufmann, University of Freiburg

System and usage: (Never) mind the gap¹

1 System- and usage-based approaches

1.1 What is at stake?

At least since the Saussurian distinction between langue and parole, the relation between grammar and language use has been a central topic of linguistic thought. The present volume deals with this relation by focusing on language variation. The improved possibilities of working with large corpora and the increased refinement of experimental designs make this – once again – a worthwhile undertaking. Quite unsurprisingly, different linguistic subfields make different uses of these new possibilities, uses which reflect their respective theoretical frames. Many sociolinguists apply usage-based approaches while most, though not all, syntacticians adhere to system-based approaches. However, both usage and system are heterogeneous and even somewhat fuzzy concepts. Therefore, the question arises whether this distinction is at all meaningful. It may be more appropriate to conceive a continuum between system- and usage-based approaches. Such a continuum includes intermediate positions, several of which can be found in this volume. On the system-side, the endpoint of such a continuum may be seen in generative grammar. An important common denominator of generative (system-based) approaches is the assumption that grammar is independent from usage and that language use obeys the rules of a grammatical system. On the usage-side, the endpoint may be represented by the model of emergent grammar, which refers to the idea that linguistic structures and regularities are no more than an epiphenomenon, i. e. “not the source of understanding a communication but a by-product of it” (Hopper 1998: 156). An important common denominator of

1 The editors would like to thank the authors for their contributions and for their willingness to participate in the process of internal reviewing. Our thanks also go to two external reviewers, to Peter Auer for his continuous help and to Elin Arbin for checking the English of authors whose native language is not English. Obviously, all remaining shortcomings are our responsibility. Finally, we would like to thank the Freiburg Institute for Advanced Studies (FRIAS) for financing both the workshop System, Usage, and Society, which took place in Freiburg in November 2011, and the publication of this volume.


usage-based approaches is that grammar is essentially shaped by usage patterns and frequency. When bringing together system, usage and variation, some basic questions have to be answered:
(i) Given that systematic variation is one of the most stable findings in the analysis of language(s), how do (more) system-based and (more) usage-based approaches explain such variation?
(ii) How do (more) system- and (more) usage-based approaches define the relation between theory and data?
(iii) Are there empirical facts that can only be explained satisfactorily by an autonomous grammatical system?
(iv) Can we find distributional patterns whose quantitative analysis constitutes strong evidence for the assumption that frequency, i. e. usage, shapes the speakers’ cognitive representation of language?
The first two questions will be dealt with in this introductory chapter, the last two in the contributions to this volume.

1.2 Gradience and change

Experience and gradience are defined differently in usage- and system-based approaches: According to Bybee (2006: 711), experience with language is the basic element in usage-based models: “While all linguists are likely to agree that grammar is the cognitive organization of language, a usage-based theorist would make the more specific proposal that grammar is the cognitive organization of one’s experience with language”. However, experience cannot be the trigger for an abrupt change of the cognitive representation of language; it rather leads to gradual processes. Bybee (2010: 2) compares language to a constantly and smoothly changing sand dune:

The primary reason for viewing language as a complex adaptive system, that is, as being more like sand dunes than like a planned structure, such as a building, is that language exhibits a great deal of variation and gradience. Gradience refers to the fact that many categories of language or grammar are difficult to distinguish, usually because change occurs over time in a gradual way, moving an element along a continuum from one category to another.

By contrast, generative syntacticians assume the existence of an autonomous syntactic core module. Importantly, this module is not shaped by the individual’s experience with language since it is considered to be part of the biological equipment of humans. Universal Grammar is the genotype and each adult grammar constitutes a possible phenotype. Thus, experience is only the factor that explains the development of a specific phenotype within the restricted limits imposed by universal grammar. During first language acquisition, parameters are


set to specific values, a process which is seen as fairly robust (Meisel 2011). Lightfoot (2006: 6) points out that “a person’s system, his/her grammar, grows in the first few years of life and varies at the edges depending on a number of factors” [highlighting added by us]. By way of illustration, Lightfoot (2006: 4–5) uses the mold, not the sand dune, as metaphor: The biology of life is similar in all species, from yeasts to humans. Small differences in factors like the timing of cell mechanisms can produce large differences in the resulting organism, the difference, say, between a shark and a butterfly. Similarly the languages of the world are cast from the same mold, their essential properties being determined by fixed, universal principles. The differences are not due to biological properties but to environmental factors.

Within this approach, variation and gradience require other explanations than in usage-based theories. Many generative syntacticians see their locus at the community level in the sense that during a period of change, multiple competing grammars coexist. Introducing the constant-rate hypothesis, Kroch (1989: 200) presents a quantitative corpus study of the rise of periphrastic do in English questions and negations. He claims that “when one grammatical option replaces another with which it is in competition within the community across a set of linguistic contexts, the rate of replacement, properly measured, is the same in all of them”.² However, for usage-based linguists, the locus of change (and of gradience) is not exclusively the community where typically a generational change between caretakers and children takes place, but also the mature individual whose linguistic knowledge undergoes changes over lifetime. Usage-based theorists also point out that frequency should not be seen in isolation, but rather in “interaction and competition with various other factors of language use, such as recency, salience and context” (Behrens et al. to appear: section 9).

2 Yang (2000: 248) extends Kroch’s (1989) interpretation, assuming that multiple grammars do not only exist within a community but also within an individual’s mind. He claims that “there is evidence of multiple grammars in mature speakers during the course of language change.”

Taking a usage-based stance, Torres Cacoullos (this volume) discusses both change over lifetime and recency or priming effects. She studies the historical development of complex verbal constructions in Spanish (locative estar + gerund) to a single periphrastic unit of progressive aspect. In doing so, she shows that progressive estar-constructions are primed by preceding non-progressive estar-constructions. Torres Cacoullos argues that the priming effect (including its changing intensity over time) is the result of the analyzability of the progressive estar-construction. The capacity of this type of analyzability and the gradual change over time connected to it is assumed to exist within an individual’s grammar, given that priming is a psycholinguistic phenomenon based on individual cognitive processes. Furthermore, Torres Cacoullos argues that Kroch’s (1989) above-mentioned constant-rate hypothesis does not hold for the probability of selecting the progressive variant. Rosemeyer (this volume) is another contribution from a usage-based perspective. On the basis of a quantitative historical corpus, he studies Spanish split auxiliary selection, i. e. the question whether writers chose BE or HAVE in analytic perfect constructions. Like Torres Cacoullos, he analyzes priming effects, namely effects of persistence (linked to temporally close activation) and entrenchment (linked to repeated activation) (in the sense of Langacker 1987: 59; Bybee 2002; Szmrecsanyi 2005). Rosemeyer points out that both persistence and entrenchment have conserving effects on diachronic grammatical development, thereby creating systematicity in the patterns of change.

The generative view on frequency effects is quite different. It is crucial to distinguish, as Meisel (2011: 3) has pointed out, between grammatical change that involves parameter resetting in the sense of Universal Grammar (see e. g. Lightfoot 2006) and change that is not attributable to new parameter values:

As Sankoff (2005) and Sankoff and Blondeau (2007) have demonstrated, individuals may, in fact, adapt their language use during adulthood to innovative patterns resulting from generational change. Such lifespan changes may have profound consequences, but they do not involve reanalysis of grammars, i. e. we do not find evidence suggesting that mental representations of parameterized grammatical knowledge are subject to modifications after childhood.
Even attrition of syntactic knowledge only seems to affect a person’s ability to use the knowledge developed early on in life.

For example, the frequency of subject pronoun realization can vary substantially from one null subject language (NSL) to another (Otheguy, Zentella and Livert 2007). However, from a generative point of view a critical threshold must be reached which leads to a parameter resetting from [+NSL] to [–NSL] or vice versa (or possibly also to [+partial NSL] in the sense of Holmberg, Nayudu and Sheehan 2009). Beyond that critical threshold, change is not gradual but abrupt, because in this case I-grammar changes. This has been clearly expressed by Lightfoot (2006: 158): I submit that work on abrupt creolization, the acquisition of signed languages, and on catastrophic historical change shows us that children do not necessarily converge on grammars that match input. This work invites us to think of children as cue-based learners: they do not rate the generative capacity of grammars against the sets of expressions they encounter but rather they scan the environment for necessary elements of I-language in unembedded


domains, and build their grammars cue by cue. The cues are not in the input directly, but they are derived from the input, in the mental representations yielded as children understand and “parse” the E-language to which they are exposed [...]. We may seek to quantify the degree to which cues are expressed by the PLD [Primary Linguistic Data], showing that abrupt, catastrophic change takes place when those cues are expressed below some threshold of robustness and are eliminated.

Thus, one crucial question is whether there are different types of change, gradual non-parametric ones that are better accounted for with usage-based models and abrupt parametric ones that can be better explained within the generative model.³ Newmeyer (2003: 693–694) highlights one aspect that contradicts the usage-based view: He shows that grammars are not always useful in the sense of optimally responding to users’ pragmatic and social needs, i. e., languages often lack lexical and/or grammatical properties that may arguably be useful and possess properties that are not useful at all. As an example of the lack of distinctions that would be useful in everyday communication, Newmeyer (2003: 693) mentions the conspicuous rarity of the inclusive/exclusive pronoun distinction in the world’s languages. If grammar was as adaptive and fluid as suggested by the sand dune metaphor (most clearly expressed by the idea of emergent grammar in Bybee and Hopper 2001), one would expect, as he states, such useful features to occur in the great majority of languages. Conversely, characteristics of dubious usefulness such as the homonymy between English you (2sg) and you (2pl) should not occur. Likewise, we would not expect abrupt phenomena of change under the assumption of a fully adaptive and fluid grammar.

1.3 Frequency and probabilities in grammar

The debate in the journal Language in the years 2003 to 2007 collected diametrically opposed opinions about the (ir)relevance of frequency and probabilities in grammar. In his paper, Newmeyer (2003) doubted the representativeness of empirical data in many usage-based corpus studies on syntax (cf. also the discussion on the relation between theory and data in section 2.1). Guy (2005, 2007) replied that this is no more (but also no less) than a challenge that can be overcome. Indeed, quantitative sociolinguistic research has already established, as

3 We know from research in dynamic modeling that phenomena of change in very different fields do incorporate both gradual and abrupt changes (cf. e. g. Thom 1980). Future research needs to show whether grammatical change follows a similar pattern and whether gradual and abrupt language change can be integrated into one theory.


Guy points out, high methodological standards with regard to corpus data. Furthermore, the explanatory power of linguistic theory would be unnecessarily diminished when quantitative correlations with other language-internal and social factors are not taken into account. At this point, it is important to highlight that variationist sociolinguistics is not “inherently usage-based”. However, the importance of frequency or probability represents an important zone of overlap between variationist and usage-based models. It does not come as a surprise that one sociolinguistic subfield, namely cognitive sociolinguistics (Kristiansen and Dirven 2008; Geeraerts, Kristiansen and Peirsman 2010), builds on the premises of usage-based linguistics. Likewise, relying on corpus data from actual language use in formal-syntactic research does not by itself lead to the incorporation of usage-based positions into generative thinking. The difference between theorists from both persuasions with regard to the role of experience on the cognitive organization of the (child and adult) speaker remains. Yet, taking corpus data seriously leads to a more serious consideration of frequency and related phenomena such as gradience, recency, and variation in usage. To put it in Barbiers’ (2013: 3) words, we could then “shift away from the methodology of idealization of the data in search of the universal syntactic properties of natural language, towards a methodology that takes into account the full range of syntactic variation that can be found in colloquial language”. However, most generative linguists still do not analyze social variation in their research and those who do have abandoned classic generative premises.4 Seeing the social perspective as irrelevant goes back to Chomsky’s (1965: 3–5) notion of the ideal speaker-listener. 
The following quotation, which is embedded in Chomsky’s (2000: 31) critique of externalist philosophy, highlights this: Suppose, for example, that “following a rule” is analyzed in terms of communities: Jones follows a rule if he conforms to the practice or norms of the community. If the “community” is homogeneous, reference to it contributes nothing (the notions norm, practice, convention, etc. raise further questions). If the “community” is heterogeneous – apart from the even greater unclarity of the notion of norms (practice, etc.) for this case – several problems arise. One is that the proposed analysis is descriptively inaccurate. Typically, we attribute rule-following in the case of notable lack of conformity to prescriptive practice or alleged norms. […] The more serious objection is that the notion of “community” or “common

4 However, not all generative syntacticians follow this approach. Wilson and Henry (1998: 8) point out that social variation and change is constrained by universal grammar, which defines the set of possible grammars. A corresponding observation is that grammatical introspection – the most important empirical source in generative syntax – is subject to systematic social variation (Adli 2013: 508; cf. also Bender 2007; Eckert 2000: 45).


language” makes as much sense as the notion “nearby city” or “look alike”, without further specification of interests, leaving the analysis vacuous.

The classic generative position, particularly widespread during the early years of generative grammar, is that frequency effects and correlations between the use of a construction and other language-internal and language-external factors were (and regrettably sometimes still are) considered to be epiphenomenal, a position most clearly expressed in Chomsky’s (1965: 3) famous stance: Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance.

According to mainstream generative theory, understanding language use is secondary in the sense that any theory of use presupposes a theory of the system (Chomsky 1965: 9). Actual linguistic experience and general non-linguistic capacities are rather “third-factor effects” in this research enterprise, as becomes clear in one of Chomsky’s (2009: 25) recent writings: Assuming that language has general properties of other biological systems, we should be seeking three factors that enter into its growth in the individual: (1) genetic factors, the topic of UG; (2) experience, which permits variation within a fairly narrow range; (3) principles not specific to language. The third factor includes principles of efficient computation, which would be expected to be of particular significance for systems such as language. UG is the residue when third-factor effects are abstracted.

1.4 Variation as a bone of contention

In usage-based approaches variation does not constitute a serious problem because it is seen as a core property of language and of the speakers’ knowledge of language. As we have seen, variationist sociolinguists, who are often but not always in line with usage-based positions, describe variation by means of variable rules, calculating the probability p for a given form in a subpopulation q (Guy 1991; Labov 1969). Guy (this volume) writes that “any grammar of a language, or theory of grammar, that fails to account for variability is inadequate on its face – it does not even reach Chomsky’s most elementary level of ‘observational’ adequacy”. Generative syntacticians usually do not adhere to the idea of “optional” variants of one underlying form, which is a major conceptual obstacle in integrating


variation into their model. Rather, optionality is described as “apparent”, either by assuming that the input of the sentences differs (Adger and Smith 2010) – with or without differences in meaning – or by assuming that the sentences are produced by multiple competing grammars (Tortora and den Dikken 2010; Kroch 1989). One example illustrating the debate on optionality is provided by French wh-questions, in which the wh-element can appear in-situ as in (1a) or fronted as in (1b).

(1a) Ton frère est allé où ?
     your brother has gone where
(1b) Où ton frère est allé ?
     where your brother has gone
     ‘Where did your brother go?’

It is telling that optionality, the central notion of variationist sociolinguistics (“alternate ways of saying the same thing”, cf. Labov 1972: 118), is refuted by mainstream generative syntax. Belletti and Rizzi (2002: 34) express the generative position very clearly with regard to word order variation: “The movement-as-last resort approach implies that there is no truly optional movement. This has made it necessary to reanalyze apparent cases of optionality, often leading to the discovery of subtle interpretive differences”. The wh-fronted order in (1b) could be an example of this. Some linguists would claim that the word order differences in (1a) and (1b) represent pragmatic or semantic differences, for example in terms of givenness of the non-wh-entities or presupposition of the wh-element, yet it is far from clear whether such differences are systematic. The idea, as pointed out by Newmeyer (2006: 705), is that “as far as syntactic variation is concerned, since variants typically differ in meaning, the probabilities are likely to be more a function of the meaning to be conveyed than a characteristic inherent to the structure itself”. Functionally motivated variation, “the more systematic aspects of one’s extragrammatical knowledge”, is taken into account by Newmeyer (2003: 698); however, it is not seen as part of the system but of the “user’s manual”.5

5 The notion of a user’s manual goes back to Culy (1996: 112), who sees its roots in pragmatics. His initial idea was to explain systematic grammatical differences between registers/styles, such as the distinctive use of zero objects in English recipes or the frequency of use of particular grammatical forms or constructions. The user’s manual in Culy’s (1996: 114) terms is roughly described as specifying “the characteristics of registers and, within each register, […] characteristics of different styles”. The user’s manual essentially carries information on frequency of use of items and constructions and default interpretations of variables in valency relations. This notion was later taken up by Zwicky (1999), yet without proposing a notably refined definition.


The criteria that have to be fulfilled for most syntacticians with regard to optionality are much stricter than those usually applied in variationist sociolinguistics. While many syntacticians refute optionality if two variants are not fully identical in meaning and distribution, it is good enough for sociolinguists if two variants are optional in most contexts. This phenomenon is called “neutralization in discourse” (Sankoff 1988: 153). In his debate with Newmeyer, Guy (2007: 3) points out that “the prevailing consensus is that, while certain structures may have different meanings in some of the contexts they occur in, there are often other contexts in which they function as alternants. Therefore, productive variationist analyses can be conducted, given careful attention to contexts and meaning”. Syntacticians who try to incorporate optionality into the grammar system represent a minority. Kato (this volume) is one example. She presents a study within the generative paradigm which nevertheless takes a critical stance towards standard minimalist assumptions. Kato attempts to account for variation and optionality in Brazilian Portuguese syntax. First, she discusses the variation between null and overt subject pronouns. It is noteworthy that she sees the locus of this variation inside a person’s I-language without resorting to the idea of multiple internal grammars (such as Yang 2000; see also Roeper’s 1999 more radical approach of “universal bilingualism”). In doing so, she refers to the distinction, introduced by Chomsky (1981: 8), between the core and the extended periphery of grammar, both constitutive of a person’s I-language. Kato builds on Kato (2011), where core grammar is linked to early childhood acquisition and the extended periphery in syntax to late childhood acquisition. It turns out that overt subject pronouns, typical of current Brazilian Portuguese, are acquired before schooling and null subjects during schooling.
This means that the null subject is used by older children and adults, yet its late acquisition “does not affect grammar as a system”. The second phenomenon Kato discusses concerns optional surface orders of Brazilian Portuguese wh-questions, namely the variation between “fronted” and “in-situ” wh-constituents. Her study suggests that the positioning of the wh-constituent is acquired in early childhood. Thus, Kato argues that the variants belong to the child’s core grammar. Barbiers (2005) is another example of a generative syntactician who incorporates the notion of optionality into grammar.6 He assumes that “variation and

6 Other examples are Fukui (1993), Saito and Fukui (1998), Henry (2002), Haider and Rosengren (2003), Biberauer and Richards (2006), and Adli (2013). Barbiers (2005) builds his conclusions on the data from SAND (Syntactische Atlas van de Nederlandse Dialecten, cf. www.meertens.knaw.nl/sand and Barbiers 2013: 2–3 on the European Dialect Syntax Project). Other projects


optionality are an inherent property of grammatical systems. Individual speakers and communities pick their choice from the options provided by their grammatical system, but they never pick beyond these options” (cf. as a possible counterexample Cornips’ (this volume) example of non-V2-root clauses in Germanic varieties).⁷ We have already seen in the previous section that the issue of frequency and probability constitutes a dividing line between generative syntacticians and variationist sociolinguists. This question is also an essential aspect in the discussion on how to account for variation: In essence, the question is whether grammar contains numbers or not. Variationist sociolinguists believe that “variability and quantitative properties are found in the system, inside the grammar” (Guy, this volume). Guy underlines the fact that functional and usage-based approaches that do not work with probabilities and limit their explanations to notions such as functional load “fail to predict any specific quantitative relation”. In this sense, variationist sociolinguists claim to go further than usage-based linguists by integrating mathematical operations and probabilities into the grammar. These regular patterns of probability have been described by Weinreich, Labov and Herzog (1968: 100) as “orderly heterogeneity”, which is considered a core property of language and a basic aspect of speakers’ knowledge. Newmeyer (this volume) rejects this view, claiming that “grammars do not contain numbers”. He states that probabilistic observations in variation are an “interaction of the formal grammar and extragrammatical faculties, as modulated by the user’s manual”. Newmeyer uses the notion of core and interface in order to further pinpoint the user’s manual. He adopts what he calls a “modular approach to variation” with grammatical competence (the system) as its core.
According to him, the user’s manual presents “one face to grammatical competence and one face to the external factors that shape grammar. In a nutshell, it tells us what to do with our grammars and how often to do it”. It is not surprising that Guy (2007: 4) strongly criticizes “an ill-defined ‘user’s manual’, from which the quantitative generalizations emerge epiphenomenally, without any necessary connection to the principles contained in the grammar”.
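What it means for a variable rule to assign a probability to a variant can be illustrated with a minimal sketch. It assumes the logistic form often used in variable rule analysis, in which an overall input value and one weight per contextual factor are combined on the log-odds scale. The function names and all numerical values are hypothetical and chosen purely for illustration; they are not taken from any of the studies cited here.

```python
import math


def logit(p):
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def variant_probability(input_prob, factor_weights):
    """Probability of a variant in one context (assumed logistic model).

    input_prob: overall tendency to use the variant (hypothetical value).
    factor_weights: one weight per contextual factor, each in (0, 1);
        0.5 is neutral, above 0.5 favors the variant, below disfavors it.
    """
    log_odds = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return inv_logit(log_odds)


# Invented values: an input probability of 0.3 in two kinds of context.
neutral = variant_probability(0.3, [0.5, 0.5])   # neutral factors leave 0.3 unchanged
favoring = variant_probability(0.3, [0.7, 0.6])  # favoring factors raise the probability
```

In a model of this kind the numbers are part of the rule itself, which is the position Guy defends; on Newmeyer's view the same quantities would instead belong to the user's manual, outside the grammar.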

which combine quantitative analyses of microdialectal variation with modern syntactic theory are the Dialect Syntax of Swiss German (University of Zurich), and Kaufmann’s (2007) study on the verbal syntax of Mennonite Low German.
7 For an interesting formal proposal that accounts for (genuine or apparent) optionality within the minimalist framework, see the algorithm dubbed “combinatorial variability” presented by Adger (2006) and Adger and Smith (2010).


1.5 A dash of epistemology

At the present stage of knowledge it is hard to see how the dispute for or against a clear distinction of grammar and usage can be empirically settled. One way to proceed can be to take a step back in order to engage in an epistemological discussion. Cameron (this volume) takes a critical stance against a binary view of system and usage. He emphasizes that this distinction is better described as a “fundamental assumption that contributes to theory building” and that “as such, the distinction itself may not actually be falsifiable in a broad sense”. Essentially, he takes up three phenomena cited by Newmeyer (2003) in favor of a binary distinction (long-distance dependencies, category-sensitive processes and structure dependency) and shows that these phenomena have analogues or parallels in usage. Cameron concludes: “I guess what I am arguing for is a new set of terms, something other than grammar and usage or competence and performance, something not binary, something n-nary”.

The contributions of Seiler and Haider go one step further by integrating central thoughts from other fields of science. Seiler (this volume) engages in a conceptual discussion on the foundation and the implications of the system-usage debate. He states that usage-based linguistics has a certain kinship with functionalist approaches to grammar (“in a formalist view on syntax, syntactic structure is to some degree immune against usage”), while formalist theories often embody the idea of an autonomous syntactic system. Yet he points out that formalist and functionalist approaches to language should not be seen as antagonistic since they may explain different aspects of language. For Seiler (this volume), a strict system-usage dichotomy would be ill-fated, just as ill-fated as the unproductive formalist-functionalist dichotomy in biology:

The fundamental structure of the debates in biology and linguistics is astonishingly similar. In both disciplines, two schools defended their way of explaining aspects of nature as the only possible one at their time: proximate vs. evolutionary in biology, formal vs. functional in linguistics. The main difference between biology and linguistics lies in the fact that the complementarity (and compatibility) of the two kinds of explanation has been widely accepted by biologists since the modern evolutionary synthesis some seventy years ago. A modern linguistic synthesis is still yet to come. For linguists, this is not exactly a reason to be proud of.

On the surface, Haider’s (this volume) opinion with regard to functionalist and structuralist (formalist) schools in linguistics seems comparable to Seiler’s, but Haider is more radical, considering both approaches to be wrong: “The dispute [between structuralism and functionalism in biology] turned out to be completely irrelevant after Darwin’s theory of evolution gained ground”. Unlike Seiler, Haider does not expect improvement from the “complementarity (and compatibility) of the two kinds of explanation”, i. e., functionalist and structuralist approaches, thus thwarting a central goal of this volume to a certain degree. He sees in grammar a cognitive organism which undergoes cognitive evolution and claims that “the descent of species and the descent of languages encompass the same abstract mechanism (self-replication, variation, selection) in two different domains”, a comparison suggesting that linguistics will need a figure on a par with Darwin in order to advance.

So far, we have dealt with the first question mentioned at the end of section 1.1, namely the question of how usage- and system-based approaches tackle variation in language. Bringing contributions from disparate theoretical perspectives together in one volume, however, is also a good opportunity to raise fundamental methodological issues. The rest of this introduction will, therefore, be dedicated to the second question mentioned above, namely the relationship between theory and data.

2 Looking behind the scenes: The old problem of theory and data

2.1 Empiricism and methodological issues

One aspect in the discussion of system- and usage-based approaches concerns empiricism. The central importance of empiricism for any linguistic school is well expressed by Labov (1975: 7), who writes that “if one linguist cannot persuade another that his facts are facts, he can hardly persuade him that his theory is right, or even show him that he is dealing with the same subject matter”. Although at first glance it may seem somewhat far-fetched to insinuate that linguists of different orientations do not deal with the same subject matter, a closer look at one part of Chomsky’s (1965: 3) famous dictum, which has already been cited above, does not leave much room for discussion: “Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community”. Section 1 has already stressed the central role of variation in the discussion about system and usage. The issue of whether one accepts variation as a core fact of language, whether one locates variation inside or outside grammar proper, and whether one deems it necessary to abstract away from variation is present in several of the papers in this volume. We therefore must address the following questions: What kind of empirical data should linguists use and what role should these data play in theory-building and theory-testing? These questions will be approached by discussing an older paper by Labov (1975) and a more recent one by Featherston (2007a) (cf. also Schütze 1996). Focusing on the empirical base of generative grammar and the critique it spawned will make the positions of both system- and usage-based (variationist) approaches clear. The fact that Featherston (2007a) criticizes some of the same things Labov (1975) criticized 32 years earlier shows that the latter was right in concluding that there was little room for optimism with regard to a possible rapprochement between the different positions: “Ideological positions are too well established, and habits of work are too firmly set to believe that there will be an immediate convergence of thinking on these issues” (Labov 1975: 54). With regard to the question of what kind of empirical data linguists should use, Featherston (2007a: 271 and 308) considers the frequently applied practice of using judgments of a single person to be “inadequate”, adding that “[t]here can be little satisfaction in producing or reading work which so clearly fails to satisfy scientific and academic standards”. Besides many critical reactions, Featherston also receives support. Haider (2007: 389) writes:

Generative Grammar is not free of post-modern extravagances that praise an extravagant idea simply because of its intriguing and novel intricacies as if novelty and extravagance by itself would guarantee empirical appropriateness. In arts this may suffice, in science it does not. Contemporary papers too often enjoy a naive verificationist style and seem to completely waive the need of independent evidence for non-evident assumptions. The rigorous call for testable and successfully tested independent evidence is likely to disturb many playful approaches to syntax and guide the field eventually into the direction of a serious science. At the moment we are at best in a pre-scientific phase of orientation, on the way from philology to cognitive science.8

8 Pullum (2007: 36) discusses the same point: “Looking back at the syntax published a couple of decades ago makes it rather clear that much of it is going to have to be redone from the ground up just to reach minimal levels of empirical accuracy. Faced with data flaws of these proportions, biology journals issue retractions, and researchers are disciplined or dismissed”.

The fact that both Featherston and Haider work in the generative framework and nevertheless take such a critical attitude, and that some generative linguists use historical corpora or elicited data for their analyses (Lightfoot 1999; Kroch 1989; Barbiers 2005, etc.), may raise our hopes, but in general the empirical base of much generative work continues to be of a rather dubious nature. Newmeyer (2007: 395) still describes the use of a “single person’s judgments” as “standard practice in the field” not just in generative linguistics, but in cognitive and functional linguistics as well. The problem with a “single person’s judgments” is that generative linguists work with the judgments of conscious and – even more importantly – self-conscious human beings and not with unconscious matter as classical natural sciences like physics or chemistry do. For this reason, the problem of separating acceptability from grammaticality, i. e., of filtering out the “noise” of acceptability from the supposedly pure essence of grammaticality, may not be solvable in principle (cf. Schütze 1996: 25–27 and 48–52; Featherston 2007b: 401–403; Newmeyer 2007: 396–398), regardless of whether one aggregates judgments of hundreds of disinterested informants or whether one uses the introspection of a definitely not disinterested linguist. Besides the acceptability-grammaticality issue, there is a whole array of further problems; for example, the still unclear relationship between speakers’ judgments and speakers’ language production:9 Kempen and Harbusch (2005: 342) analyze sequences of (pronominal) arguments in the midfield of German clauses and write that “[a]rgument orderings that embody mild violations of the [linearization] rule, receive medium-range grammaticality scores […] but are virtually absent from the corpora because the grammatical encoding mechanism in speakers/writers does not (or hardly ever) produce them”. Unfortunately, the theoretical relevance of such grammaticality-production mismatches is rarely the focus of research (but cf. Adli this volume). Why do sequences that are rated as medium-range grammatical not occur more often, and what exactly does it mean if a sentence is of medium-range grammaticality? Be this as it may, the mismatch between medium-range grammaticality (judgment) and lack of occurrence (production) does not constitute a fundamental problem for generative grammar. A more threatening issue, however, is the mismatch of supposed ungrammaticality (judgment) and occurrence (production). A case in point is the “prohibition against the deletion of the relative pronouns which are subjects” (Labov 1975: 41–42).
In spite of this judgment-based prohibition, this kind of deletion occurred in fourteen out of 336 possible tokens in a corpus from Philadelphia (4.2 %). Granted, 4.2 % is not a very high share, but fourteen occurrences is a robust enough number for linguists to wonder how one could account for the existence of these tokens. As many rare phenomena will not form part of the idiolects of linguists, these linguists will simply not (be able to) submit them to their own grammaticality judgments. Thus, the linguist who refuses to work with performance data (E-language) is bound to overlook possibly crucial linguistic facts since many of these rare phenomena are only detected in large corpora (cf. section 2.2 for a more thorough discussion of rare phenomena). Without analyzing such phenomena, we will not be able to produce a grammar which generates all possible sentences of a language.

9 Schütze (1996: 48) comments: “Over the history of generative grammar, much has made [sic!] of its heavy reliance on introspective judgments and their nonequivalence to production and comprehension”. Featherston (2007a: 271) also mentions production data as a possible source for studies in the generative framework: “This focus [on judgment data] is in no way intended to belittle the value of corpus data or make out that this data type is any less relevant”.

Another problem with regard to the use of grammaticality judgments can be illustrated by means of the reactions of Grewendorf (2007) and Den Dikken et al. (2007) to Featherston’s (2007a) article. Grewendorf (2007: 376) notes:

Nevertheless, it may eventually turn out that differences in grammaticality judgments between a group and an individual linguist cannot be attributed to “inadequate research practice” of the latter but clearly exhibit differences between I-languages. In this respect, the grammatical intuitions of the individual cannot be falsified by the results of acceptability experiments carried out with a group.

Even more pointedly, Den Dikken et al. (2007: 343 – Footnote 4) state:

As a side point, we do not see how the mean value of the judgments of a group of speakers can confirm or disconfirm an individual’s judgments: one’s judgments are one’s judgments, no matter what other speakers of ‘the same language’ might think.

With regard to this “‘my idiolect’ gambit” (Featherston 2007a: 279; Schütze 1996: 4–5), one can only hope that Haider (2007: 382 – Footnote 1) is not right when he claims: “More often, the problem [the risk of having to give up a “dearly fostered hypothesis”] is solved pragmatically. Conflicting evidence is simply ignored or repressed”.10 In any case, accepting Grewendorf’s and Den Dikken et al.’s convictions would constitute the end of science as we know it. This was already seen by Labov (1975: 14, 26, and 30), who writes that “[t]he study of introspective judgments is thus effectively isolated from any contradiction from competing data.” He adds that “[u]ntil more solid evidence is provided by those who have no theoretical stake in the matter, the most reasonable position is to assume that such dialects [idiosyncratic dialects, i. e. idiolects] do not exist” and that “the uncontrolled intuitions of linguists must be looked on with grave suspicion”. What is at stake here is not Labov’s question of whether idiolects exist or not – they probably do; what is at stake is the lack of control in the ‘my idiolect’ gambit and the conflict of interest of researchers who base their theories on their evaluation of sentences constructed by them. The fact that Grewendorf and Den Dikken et al. still mention positions which were convincingly rejected thirty years ago is telling proof of the existence of what Labov calls an “ideological position”.

10 Pullum (2007: 38) adds another possible technique for saving a “dearly fostered hypothesis”: “In syntax, if you want some sequence of words to be grammatical (because it would back up your hypothesis), the temptation is to just cite it as good, and probably you won’t be challenged. If you are challenged, just say it’s good for you, but other dialects may differ”.

In spite of these problems, one must not forget the unprecedented progress our understanding of language has achieved thanks to generative grammar. Fanselow (2007: 353) rightly emphasizes that “it [generative syntax] has broadened the data base for syntactic research in a very profound way”. But while one’s own intuitions may have been sufficient in the beginning of generative linguistics and may still be sufficient for basic syntactic phenomena (cf. Labov’s 1975: 14 and 27 discussion of Chomsky’s clear cases), the ‘my idiolect’ gambit cannot be applied to rare or controversial phenomena. One’s own intuitions simply do not fit a field aiming to overcome a “pre-scientific phase of orientation” (cf. Haider’s 2007: 389 quote above). This does not mean that Grewendorf’s (2007: 373) grammar is deviant “because [his] judgments do not correspond to the judgments of Featherston’s group”, it just means that nobody should devise a theory based exclusively on his/her own judgments; i. e., like Pullum (2007: 38) we should stop thinking that the “how-does-it-sound-to-you-today method can continue to be regarded as a respectable data-gathering technique”. Due to their still widespread lack of interest in the empirical side of their research, generative grammarians have not yet made the methodological and analytical progress sociolinguistics and variation linguistics have achieved. With regard to quantification and especially with regard to categorization, Labov’s early methodology must be regarded as naïve (e. g., reading word lists and recounting near-death experiences representing formal and informal styles, respectively). But this objection has to be qualified, since these methods were used to establish a new field and have been improved dramatically ever since.
In contrast, many generativists still use the same empirical base their colleagues used fifty years ago, despite the fact that the field has gone through at least four major theoretical phases (from Standard Theory to Minimalism). Leaving the question of what kind of empirical data linguists should use, we will briefly focus on the second point mentioned at the beginning of this section: the question of which role empirical data should play in theory-testing and theory-building. It does not come as a surprise that many generativists do not foster a balanced view of theory-oriented and data-driven approaches, i. e., we are still far away from Featherston’s (2007b: 408) conviction that “data and theory are indeed in a mutually dependent relationship, both affecting the credibility of the other”. Even for Barbiers (2005: 258), a generative linguist with much experience in the analysis of elicited judgment/production data, sociolinguistics seems secondary to generative linguistics: “Finally, it was argued that there are certain patterns in individual and geographic variation about which generative linguistics has nothing to say. That is where sociolinguistics comes in”. Surprisingly, Featherston (2007a: 310 and 314) himself also shows a rather biased opinion with regard to this interplay, putting data first: The most criticized stances in his paper are that “[l]inguists need to look at the data first and develop their models afterwards […]” and that “[d]ata is a pre-condition for theory, and the quality of a theory can never exceed the quality of the data set which it is based on”. One can be sure that few linguists, let alone generative linguists, would subscribe to this division of labor (cf. especially the comments of Grewendorf 2007: 377–379). In any case, as long as data-driven approaches only come in when formal approaches fail and as long as elicited data are only seen as a source for checking hypotheses at best, we cannot take full advantage of their potential. Therefore, the question is not only whether to use new types of empirical data in system-based approaches, but how to use them and how to correctly evaluate their use.

The present volume contains analyses of different types of elicited language data, some of which are analyzed within the framework of system-based approaches: Cornips (this volume) uses elicited language data, but also judgment tests in the case of clusters with three verbal elements. Adli (this volume) combines judgment and production data and offers comparisons between these data types for wh-questions in French. With this, he tries to tackle the grammaticality-production mismatch mentioned above. Kato (this volume) analyzes data from child language acquisition and Kaufmann (this volume) uses translation data. His informants were asked to translate Spanish, Portuguese and English stimulus sentences into Mennonite Low German. As with all research methods, there exist advantages (amount and comparability of the data, cf. Schütze 1996: 2) and disadvantages to such an approach (no natural speech, possible priming effects, cf. for the latter Kaufmann 2005).
Torres Cacoullos (this volume) and Rosemeyer concentrate on structural changes in the verbal domain of Spanish by analyzing historical texts. As mentioned above, both of them work within a usage-based framework (unlike Cornips, Adli, Kato and Kaufmann). In this case, too, the methodological problems are manifold, but Torres Cacoullos and Rosemeyer can hardly be held responsible for this. Historical data reflect oral speech only to a certain degree and one can only use the data one has, i. e., one cannot go back and ask speakers/writers how they rate constructions for which one does not find evidence in the written record.

2.2 Anti-frequency or the problem of rare data

One especially interesting case with regard to linguistic data is that of rare phenomena, i. e., generally uncommon but nevertheless robust linguistic facts. Rare phenomena raise some essential empirical and theoretical questions for the study of language, in particular concerning system- and usage-based approaches. But what exactly are rare phenomena? In our view, it seems reasonable to distinguish between two different types, namely (i) rare phenomena in a typological sense and (ii) rare or anti-frequent phenomena in a given language (or language family). The former meaning is by far the more common one. Following Plank’s characterization in the introductory notes to his famous Raritätenkabinett,11 a rare phenomenon (rarum)12 can be defined as a trait (of any conceivable sort: a form, a relationship between forms, a matching of form and meaning, a category, a construction, a rule, a constraint, a relationship between rules or constraints, ...) which is so uncommon across languages as not even to occur in all members of a single […] family or diffusion area (for short: sprachbund), although it may occur in a few languages from a few different families or sprachbünde.

For several reasons, the study of rare phenomena remains an important linguistic task (cf. also Cysouw and Wohlgemuth 2010: 3–4). First of all, it seems obvious that the consideration of rara will provide an empirically much more detailed picture of what is (im)possible in the languages of the world. Given that rare phenomena may contradict or even falsify cross-linguistic assumptions and linguistic universals, they may help us to formulate more adequate generalizations. As a consequence, we may get better linguistic descriptions, which in turn may offer more adequate explanations. This also holds true for the other type of rare phenomena, namely those that are anti-frequent in a given language (or language family). Under this label, we refer to all kinds of linguistic traits that are very infrequently attested with respect to other paradigmatic alternatives in the language(s) under consideration. The rareness of these linguistic traits is, in principle, irrespective of the distribution and frequency of these traits in other languages, i. e., a phenomenon that hardly occurs in a given language may or may not be rare in other languages. Some examples of this kind of rare phenomena are differential object marking with inanimate objects in Spanish (cf. García García 2014), wh-clefts and other wh-variants in French (cf. Adli this volume), verb-second order violations in Germanic languages (cf. Cornips this volume) or non-verb-final dependent clauses in Mennonite Low German (cf. Kaufmann this volume).

11 Das grammatische Raritätenkabinett is an online database comprising at present 147 rare phenomena (http://typo.uni-konstanz.de/rara/intro/index.php).
12 In addition to rarum, Plank also uses the terms rarissimum and singulare, which refer to even rarer or uniquely attested traits, respectively. For further definitions and specifications of this kind of rare phenomena, see Cysouw and Wohlgemuth (2010: 1–6).


Anti-frequent phenomena present a special problem for usage-based approaches given the leading idea of this framework which “seeks explanations in terms of the recurrent processes that operate in language use” (cf. Bybee 2010: 13). Since frequency plays a decisive role in usage-based approaches (for instance, for the conventionalization of a linguistic structure), one wonders how scarcely attested phenomena can be(come) grammatical. Of course, this type of rare phenomena is also problematic for formal approaches to language, at least as long as the phenomenon in question cannot be derived from the interaction of other, more frequent phenomena (cf. Kaufmann this volume). This volume presents three studies that touch on these questions. All of them pertain to phenomena that are rare in a given language (or language group). The first one concerns violations of V2 in Germanic languages. This is an especially intriguing case because the rare variant in question (non-V2) is the marked variant within most Germanic languages, but it is the common variant from a global typological point of view (V2 is rarum number 79 in the Raritätenkabinett). Cornips (this volume) describes these violations as the consequence of both system- and society-based factors. About the latter she writes that “these ‘violations of V2’, or to be more precise, Adv-S-Vfin instances [adverb-subject-finite verb] occur only in peer conversations”. This restriction shows that an exclusively formal, system-based argumentation cannot explain the existence of non-V2-main clauses in Germanic languages since probably all speakers would deem such clauses outright ungrammatical (perhaps even the very persons who use them). Thus, we are faced with a synchronic mismatch between grammaticality and acceptability. This mismatch may be “resolved” by the speech community provided the hitherto rare phenomenon occurs in a sufficiently robust number (cf.
Lightfoot 1999: 156) and provided it starts occurring outside of peer conversations. Current acceptability in peer conversation may thus eventually turn into grammaticality. This type of language change cannot be explained (or predicted) by systemic considerations because at a certain moment in time, the phenomenon in question is ungrammatical. However, as such rare phenomena are detected by meticulous data analysis, they have to be accounted for.

The second example of a rare phenomenon can be found in Kaufmann (this volume). Kaufmann deals with a likewise marked syntactic variant, namely the occurrence of dependent clauses in Mennonite Low German, where the only verbal element surfaces before its internal complement. In German varieties, which are all SOV, one would expect the finite verb to surface after its internal complement in a dependent clause. Kaufmann shows that the tokens of the rare phenomenon can be explained as an analogical extension of the informants’ derivational preferences with regard to verb projection raising and scrambling; the two syntactic mechanisms he claims are responsible for different cluster variants in dependent clauses with two verbal elements. The problem in analyzing the phenomenon in question is therefore not the lack of formal explanations, but the necessity of finding the grammatical mechanisms whose interaction causes the rare phenomenon.

The third case involving rare phenomena is presented in Adli (this volume). It deals with the variation of wh-constructions attested in Modern French, where nine different types of wh-variants can be distinguished, among them the wh-in-situ construction (e. g. Tu fais le dessin quand ? ‘When do you do the drawing?’), the whVS construction (e. g. Quand fais-tu le dessin ?) or the wh-cleft construction (e. g. Quand c’est que tu fais le dessin ?). As already mentioned, Adli’s study draws on production data as well as gradient acceptability judgments (both types of linguistic evidence were provided by the same set of individuals). The results show that some variants, such as the wh-in-situ construction, are very frequent, while others, such as the whVS construction or the wh-cleft construction, are only rarely attested or do not occur at all. Yet all of these variants were rated as acceptable. What is more, the variants belonging to a rather formal register were evaluated as being more acceptable than those pertaining to a colloquial register. For example, the formal whVS construction received the highest acceptability scores. However, this preference in acceptability is not reflected in the production data. Thus, Adli’s study reveals an interesting mismatch between usage and speaker judgments, showing that some variants hardly occur although they are rated as acceptable. Adli suggests that this frequency-acceptability mismatch is at least partly due to register: “While frequency data from spontaneous speech […] provide insight into colloquial language, acceptability data reflect the entire range of registers available to a speaker”.
Moreover, he concludes that acceptability judgments are influenced by normative pressure, especially in the case of French. Comparing the findings of rare phenomena presented in Cornips (this volume), i. e., V2-violation in Germanic languages and the rare wh-variants studied in Adli (this volume), one sees that both are socially dependent. However, there exists an obvious difference. While the V2 violation is the result of a colloquial innovation process that is at present confined to peer conversations, some of the scarcely produced wh-variants (e. g. the whVS construction) seem to be the result of a socially determined conservation process.

3 Structure of the volume

The volume is divided into three parts. The first part, entitled “System, usage, and variation”, opens with two central and complementary points of view: Frederick Newmeyer’s “Language variation and the autonomy of grammar” and Gregory Guy’s “The grammar of use and the use of grammar”. Newmeyer discusses the question of whether language variation calls into question the hypothesis of the autonomy of grammar. On the basis of a modular approach to variation, he argues that variation and probabilities do not pertain to linguistic knowledge proper. Rather, they should be viewed as the result of the interaction between grammatical competence and extragrammatical factors such as processing pressure or social factors. As already mentioned above, Newmeyer proposes that this interaction is modulated by the user’s manual, which is conceived of as an interface between grammatical competence and extragrammatical factors. Guy takes a different stance on the relation between grammar and variation and argues that language is “a uniquely social phenomenon”. For him, linguistic knowledge is exclusively derived from usage and interaction with other users. Accordingly, the core linguistic knowledge is assumed to include knowledge about variation, probabilities and social factors. This does not mean that it is devoid of abstract operations and mental representations. However, abstract operations and mental representations are inferred from usage and are thus inherently probabilistic and variable in nature. The following two contributions are the papers of Richard Cameron and Mary Kato. Cameron’s article “Looking for structure-dependence, category-sensitive processes, and long-distance dependencies in usage” deals with specific problems of the system-usage dichotomy, while Mary Kato’s paper “Variation in syntax: Two case studies on Brazilian Portuguese” introduces the dimension of variation as the central topic.

The dimension of variation is most strongly concentrated in the second part of this volume, entitled “Rare phenomena and variation”. All contributions in this part have a strong empirical orientation and take into account rare phenomena.
Göz Kaufmann writes about “Rare phenomena revealing basic syntactic mechanisms: The case of unexpected verb-object sequences in Mennonite Low German.” He stresses the importance of a thorough analysis of the possible interplay of seemingly unrelated syntactic mechanisms. Leonie Cornips’ paper “The no man’s land between syntax and variationist sociolinguistics: The case of idiolectal variability” deals with the central question of intra-speaker variation which she exemplifies by means of four case studies. Finally, Aria Adli’s contribution “What you like is not what you do: Acceptability and frequency in syntactic variation” is especially important in view of the topics dealt with in section 2.1, where advantages and disadvantages of different types of empirical data are discussed.

All papers in the third part of this volume, entitled “Grammar, evolution, and diachrony”, deal with the dimension of time. The papers “Gradual loss of analyzability: Diachronic priming effects” by Rena Torres Cacoullos and “How usage rescues the system: Persistence as conservation” by Malte Rosemeyer analyze


the causes of historical change in the Spanish progressive and Spanish auxiliary selection, respectively. Longer time periods and more abstract questions are the focus of Hubert Haider and Guido Seiler. Both their contributions, “‘Intelligent design’ of grammars – a result of cognitive evolution” and “Syntactization, analogy and the distinction between proximate and evolutionary causations”, apply the biological concept of evolution to language and offer hints as to how the gap between functional and formal approaches may be narrowed.

References

Adger, David (2006): Combinatorial variability. Journal of Linguistics 42: 503–530.
Adger, David and Jennifer Smith (2010): Variation in agreement: A lexical feature-based approach. Lingua 120(5): 1109–1134.
Adli, Aria (2013): Syntactic variation in French wh-questions: a quantitative study from the angle of Bourdieu’s sociocultural theory. Linguistics 51(3): 473–515.
Barbiers, Sjef (2005): Word order variation in three-verb clusters and the division of labour between generative linguistics and sociolinguistics. In: Leonie Cornips and Karen P. Corrigan (eds.), Syntax and Variation: Reconciling the Biological and the Social, 233–264. Amsterdam: John Benjamins.
Barbiers, Sjef (2013): Where is syntactic variation? In: Peter Auer, Javier Caro Reina and Göz Kaufmann (eds.), Language Variation – European Perspective IV: Selected Papers from the Sixth International Conference on Language Variation in Europe (ICLaVE 6), 1–26. Amsterdam/Philadelphia: John Benjamins.
Behrens, Heike, Stefan Pfänder, Peter Auer, Daniel Jacob, Rolf Kailuweit, Lars Konieczny, Bernd Kortmann, Christian Mair and Gerhard Strube (to appear): Introduction. In: Heike Behrens and Stefan Pfänder (eds.), Experience counts: Frequency effects in language, Berlin/Boston: Mouton de Gruyter.
Belletti, Adriana and Luigi Rizzi (2002): Editors’ introduction: some concepts and issues in linguistic theory. In: Adriana Belletti and Luigi Rizzi (eds.), Noam Chomsky: On Nature and Language, 1–44. Cambridge: Cambridge University Press.
Bender, Emily M. (2007): Socially meaningful syntactic variation in sign-based grammar. English Language and Linguistics 11(2): Special Issue on Variation in English Dialect Syntax: Theoretical Perspectives, 347–381.
Biberauer, Theresa and Marc Richards (2006): True optionality: When the grammar doesn’t mind. In: Cedric Boeckx (ed.), Minimalist Essays, 35–67. Amsterdam: John Benjamins.
Bybee, Joan L. (2002): Sequentiality as the basis of constituent structure. In: Talmy Givón and Bertram F. Malle (eds.), The Evolution of Language out of Pre-language, 109–132. Amsterdam/Philadelphia: John Benjamins.
Bybee, Joan L. (2006): From Usage to Grammar: The Mind’s Response to Repetition. Language 82(4): 711–733.
Bybee, Joan L. (2010): Language, Usage and Cognition. Cambridge: Cambridge University Press.


Bybee, Joan L. and Paul J. Hopper (2001): Introduction to frequency and the emergence of linguistic structure. In: Joan L. Bybee and Paul J. Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 1–24. Amsterdam: John Benjamins.
Chomsky, Noam (1965): Aspects of the Theory of Syntax. Cambridge: MIT Press.
Chomsky, Noam (1981): Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam (2000): New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.
Chomsky, Noam (2009): Opening remarks. In: Massimo Piattelli-Palmarini, Juan Uriagereka and Pello Salaburu (eds.), Of Minds and Language: A Dialogue with Noam Chomsky in the Basque Country, 13–43. Oxford: Oxford University Press.
Culy, Christopher (1996): Null objects in English recipes. Language Variation and Change 8(01): 91–124.
Cysouw, Michael and Jan Wohlgemuth (2010): The other end of universals: theory and typology of rara. In: Jan Wohlgemuth and Michael Cysouw (eds.), Rethinking Universals, 1–10. Berlin/New York: Mouton de Gruyter.
Den Dikken, Marcel, Judy B. Bernstein, Christina Tortora and Raffaella Zanuttini (2007): Data and grammar: Means and individuals. Theoretical Linguistics 33(3): 335–352.
Eckert, Penelope (2000): Linguistic Variation as Social Practice. Malden, MA/Oxford: Blackwell.
Fanselow, Gisbert (2007): Carrots perfect as vegetables, but please not as a main dish. Theoretical Linguistics 33(3): 353–367.
Featherston, Sam (2007a): Data in generative grammar: the stick and the carrot. Theoretical Linguistics 33(3): 269–318.
Featherston, Sam (2007b): Reply. Theoretical Linguistics 33(3): 401–413.
Fukui, Naoki (1993): Parameters and optionality. Linguistic Inquiry 24: 399–420.
García García, Marco (2014): Differentielle Objektmarkierung bei unbelebten Objekten im Spanischen. Berlin/Boston: de Gruyter.
Geeraerts, Dirk, Gitte Kristiansen and Yves Peirsman (eds.) (2010): Advances in cognitive sociolinguistics. Berlin/New York: Mouton de Gruyter.
Grewendorf, Günther (2007): Empirical evidence and theoretical reasoning in generative grammar. Theoretical Linguistics 33(3): 369–380.
Guy, Gregory R. (1991): Explanation in variable phonology: an exponential model of morphological constraints. Language Variation and Change 3(1): 1–22.
Guy, Gregory R. (2005): Grammar and usage: A variationist response (Letters to Language). Language 81(3): 561–563.
Guy, Gregory R. (2007): Grammar and usage: The discussion continues (Letters to Language). Language 83(1): 2–4.
Haider, Hubert (2007): As a matter of facts comments on Featherston’s sticks and carrots. Theoretical Linguistics 33(3): 381–394.
Haider, Hubert and Inger Rosengren (2003): Scrambling: nontriggered chain formation in OV languages. Journal of Germanic Linguistics 15(3): 203–267.
Henry, Alison (2002): Variation and syntactic theory. In: Jack K. Chambers, Peter Trudgill and Natalie Schilling-Estes (eds.), The Handbook of Language Variation and Change, 267–282. Oxford: Blackwell.
Holmberg, Anders, Aarti Nayudu and Michelle Sheehan (2009): Three partial null-subject languages: a comparison of Brazilian Portuguese, Finnish and Marathi. Studia Linguistica (Special Issue: Partial Pro-drop) 63(1): 59–97.


Hopper, Paul (1998): Emergent grammar. In: Michael Tomasello (ed.), The New Psychology of Language: Cognitive and Functional Approaches to Language Structure, 155–175. Mahwah: Lawrence Erlbaum.
Kato, Mary A. (2011): Acquisition in the context of language change: the case of Brazilian Portuguese null subjects. In: Esther Rinke and Tanja Kupisch (eds.), The Development of Grammar: Language Acquisition and Diachronic Change – In Honour of Jürgen M. Meisel, 309–330. Amsterdam/New York: John Benjamins.
Kaufmann, Göz (2005): Der eigensinnige Informant: Ärgernis bei der Datenerhebung oder Chance zum analytischen Mehrwert? In: Friedrich Lenz and Stefan Schierholz (eds.), Corpuslinguistik in Lexik und Grammatik, 61–95. Tübingen: Stauffenberg.
Kaufmann, Göz (2007): The verb cluster in Mennonite Low German: A new approach to an old topic. Linguistische Berichte 210: 147–207.
Kempen, Gerard and Karin Harbusch (2005): The relationship between grammaticality ratings and corpus frequencies: a case study into word order variability in the midfield of German clauses. In: Stephan Kepser and Marga Reis (eds.), Linguistic Evidence: Empirical, Theoretical and Computational Perspectives, 329–349. Berlin/New York: Mouton de Gruyter.
Kristiansen, Gitte and René Dirven (eds.) (2008): Cognitive Sociolinguistics. Language Variation, Cultural Models, Social Systems. Berlin/New York: Mouton de Gruyter.
Kroch, Anthony (1989): Reflexes of grammar in patterns of language change. Language Variation and Change 1: 199–244.
Labov, William (1969): Contraction, deletion, and inherent variability of the English copula. Language 45(4): 716–762.
Labov, William (1972): Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, William (1975): What is a Linguistic Fact. Lisse: The Peter de Ridder Press.
Langacker, Ronald W. (1987): Foundations of Cognitive Grammar, Vol. 1: Theoretical prerequisites. Palo Alto: Stanford University Press.
Lightfoot, David (1999): The Development of Language: Acquisition, Change, and Evolution. Oxford/Malden, MA: Blackwell.
Lightfoot, David (2006): How new languages emerge. Cambridge: Cambridge University Press.
Meisel, Jürgen M. (2011): Bilingual language acquisition and theories of diachronic change: Bilingualism as cause and effect of grammatical change. Bilingualism Language and Cognition 14(2): 121–145.
Newmeyer, Frederick J. (2003): Grammar is grammar and usage is usage. Language 79(4): 682–707.
Newmeyer, Frederick J. (2006): Grammar and usage: A response to Gregory R. Guy (Letters to Language). Language 82(4): 705–708.
Newmeyer, Frederick J. (2007): Commentary on Sam Featherston, Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33(3): 395–399.
Otheguy, Ricardo, Ana Celia Zentella and David Livert (2007): Language and dialect contact in Spanish in New York. Language 83: 770–802.
Pullum, Geoffrey K. (2007): Ungrammaticality, rarity, and corpus use. Corpus Linguistics & Linguistic Theory 3(1): 33–47.
Roeper, Thomas (1999): Universal bilingualism. Bilingualism: Language and Cognition 2(3): 169–186.
Saito, Mamoru and Naoki Fukui (1998): Order in phrase structure and movement. Linguistic Inquiry 29(3): 439–474.


Sankoff, David (1988): Sociolinguistics and syntactic variation. In: Frederick J. Newmeyer (ed.), Linguistics: the Cambridge Survey. Vol IV: The Socio-cultural Context, 140–161. Cambridge: Cambridge University Press.
Sankoff, Gillian (2005): Cross-sectional and longitudinal studies. In: Ulrich Ammon, Norbert Dittmar, Klaus J. Mattheier and Peter Trudgill (eds.), An International Handbook of the Science of Language and Society, Volume 2, 2, 1003–1013. Berlin/New York: Mouton de Gruyter.
Sankoff, Gillian and Hélène Blondeau (2007): Language change across the lifespan: /r/ in Montreal French. Language 83(3): 560–588.
Schütze, Carson T. (1996): The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. Chicago: University of Chicago Press.
Szmrecsanyi, Benedikt (2005): Language users as creatures of habit: A corpus-based analysis of persistence in spoken English. Corpus Linguistics and Linguistic Theory 1(1): 113–150.
Thom, René (1980): Modèles mathématiques de la morphogenèse. Paris: C. Bourgois.
Tortora, Christina and Marcel den Dikken (2010): Subject agreement variation: Support for the configurational approach. Lingua: 1089–1108.
Weinreich, Uriel, William Labov and Marvin I. Herzog (1968): Empirical foundations for a theory of language change. In: W. P. Lehmann and Yakov Malkiel (eds.), Directions for Historical Linguistics: A Symposium, 95–195. Austin, TX: University of Texas Press.
Wilson, John and Alison Henry (1998): Parameter setting within a socially realistic linguistics. Language in Society 27: 1–21.
Yang, Charles D. (2000): Internal and external forces in language change. Language Variation and Change 12(3): 231–250.
Zwicky, Arnold M. (1999): The grammar and the user’s manual. Paper presented at ‘LSA’s Linguistic Institute (Forum Lecture)’, University of Illinois-Champaign & Urbana.

Part 1: System, usage, and variation

Frederick J. Newmeyer, University of Washington, University of British Columbia and Simon Fraser University Canada

Language variation and the autonomy of grammar1

Abstract: This paper takes on the question of whether the facts of language variation call into question the hypothesis of the autonomy of grammar. A significant number of sociolinguists and advocates of stochastic approaches to grammar feel that such is the case. However, it will be argued that there is no incompatibility between grammatical autonomy and observed generalizations concerning variation.

1 Introduction

The point of departure of this paper is a set of propositions which, while not universally accepted among linguists, have at least a wide and ever-increasing currency. They are, first, that a comprehensive theory of language has to account for variation (Weinreich, Labov and Herzog 1968 and much subsequent work); second, that much of everyday variability in speech is systematic, showing both social and linguistic regularities (Labov 1969 and much subsequent work); third, that language users are highly sensitive to frequencies, a fact that has left its mark on the design of grammars (Hooper 1976 and much subsequent work); and fourth, that an overreliance on introspective data is fraught with dangers (Derwing 1973 and much subsequent work). The question to be probed is whether, given these propositions, one can reasonably hypothesize that grammar is autonomous with respect to use.

The paper is organized as follows. Section 2 introduces the concept of the ‘autonomy of grammar’, along with some theoretical and methodological considerations relevant to its understanding. The central section 3 examines and attempts to refute recent claims that the facts surrounding language variation show that autonomy is untenable. Section 4 is a brief conclusion.

1 I would like to thank Marco García García and Hubert Haider for their comments on the entire pre-final manuscript. Thanks also to Ralph Fasold, David Odden, and Panayiotis Pappas for fruitful discussion on the topic of this paper. They are not to be held responsible for any errors.


2 The autonomy of grammar

I characterize the Autonomy of Grammar as follows:

(1) The Autonomy of Grammar (AG): A speaker’s knowledge of language includes a structural system composed of formal principles relating sound and meaning. These principles, and the elements to which they apply, are discrete entities. This structural system can be affected over time by the probabilities of occurrence of particular grammatical forms and by other aspects of language use. However, the system itself does not directly represent probabilities or other aspects of language use.

Put informally, grammars do not contain numbers. My conclusion will be that even adopting the points of departure above, AG is a motivated hypothesis. Three important methodological considerations will be assumed throughout the paper:

(2) Three considerations of methodology:
a. It is incorrect to attribute to grammar per se what is adequately explained by extragrammatical principles.
b. Given that grammars are models of speaker knowledge, facts that a speaker cannot reasonably be expected to know should not be attributed to grammar.
c. Knowledge of the nature of some grammatical construct is not the same kind of knowledge as that of how often that grammatical construct is called upon in language use.

I begin with some fairly obvious and somewhat trivial examples of these points and then turn to more complex cases. As far as (2a) is concerned, nobody would suggest that speakers with a serious head cold should be endowed with a separate grammar, even though their vowels are consistently more nasalized than those of the healthier members of their speech community. Appeal to the partial blockage of the passages involved in speech production suffices to explain the phenomenon. Many different types of generalizations fall under (2b). For example, speakers might know that adjectives like asleep, awake, and ajar are different from most other adjectives in that they do not occur prenominally. But they can hardly know that the reason for their aberrant behavior derives from the fact that these adjectives were historically grammaticalizations of prepositional phrases (awake was originally at wake). A child in acquiring his or her language does not learn the history of that language. Along the same lines, children acquiring German learn the principles involved in V2 order and those acquiring English learn to produce the retroflex ‘r’ sound. But neither learn that these elements of their languages are typologically quite rare. Likewise, speakers cannot be assumed to know epiphenomenal facts, that is, properties of their language that are the byproduct of other properties (which may or may not be part of knowledge). Speakers know


principles of Universal Grammar and they know (implicitly) that such-and-such a sentence is ungrammatical. But they do not know that the ungrammaticality results in part from a particular principle. It takes complex scientific reasoning to arrive at such a conclusion. In other words, not everything that a linguist knows is necessarily known by the speaker. As two linguists whose attention to data is legendary put it:

Not every regularity in the use of language is a matter of grammar. (Zwicky and Pullum 1987: 330, cited in Yang 2008: 219)

Finally, to exemplify (2c), I know the meaning of the definite article the, its privileges of occurrence, and its pronunciation. I also happen to know that I am more likely to use that word than any other word of English. These however are different ‘kinds of knowledge’. I learned the former as an automatic consequence of acquiring competence in English. The latter is a metalinguistic fact that arose from conscious observation and speculation about my language. So given these considerations, how can we know what to include in models of grammatical competence and what to exclude from it? In particular, given the theme of this volume, how can we know to what extent (if at all) variability is encoded in the grammar itself? As it turns out, classical formal grammar has nothing to say about probabilistic aspects of grammatical processes, except to hypothesize that where we find variability we have ‘optional’ grammatical rules. For example, in Chomsky (1957) active and passive pairs were related by an optional transformational rule. No attempt was made to capture as part of the rule the fact that actives are used more frequently than passives and are used in different discourse circumstances. In fact, an approach to grammar excluding the direct representation of probabilities might be the best one to take if it could be shown, in line with (2a–c), that the probabilities in question are a different sort of knowledge from grammatical knowledge or are not in any reasonable sense ‘knowledge’ at all. So the crucial question is to what extent speakers actually ‘know’ the probabilities associated with points of variation and if they do know them, then what kind of knowledge that is. One alternative to their knowing probabilities might be that quantitative aspects of speaker behavior are no more than a reflection of principles that, in their interaction, lead them to act in a certain way a certain percentage of the time. 
Let me give an example of an epiphenomenal consequence of interacting principles that is drawn from everyday life. My place of work is four blocks north of where I live and four blocks west. I could construct a ‘grammar of my walk to work’ to characterize my procedure for proceeding from my home to my office. Each intersection that I cross has a traffic light. If the first light that I come to is green, I continue straight on to the north. If the light is red, I turn


to the left (to the west). I continue this procedure at each intersection up to the point where I don’t overshoot my mark to the west or to the north. This leads to the possibility of more than a dozen different routes for getting to work. But in practice they are not equally frequent, because traffic lights differ from each other in the percentage of time that they are green or red. It would not be hard, if I wanted to do it, to calculate the percentage of time that I take each route. But I do not ‘know’ these percentages, in any reasonable sense of the word ‘know’. I certainly have a vague feeling that I take some routes more than others. But the percentages are not encoded in my ‘grammar of walking to work’. They fall out as an epiphenomenal by-product of the interaction of my grammar of walking and the timing of traffic lights.

I think that at this point the reader can see where I am heading, as far as probabilistic generalizations in linguistics are concerned. To the extent that variability is predictable externally, it does not need to be encoded in the grammar. But before turning to linguistic examples, the reader might well ask about my walking to work story: ‘Is the percentage of the time that I take any particular route to work completely predicted by reference only to my grammar of walking and to the timing of traffic lights?’ The answer is ‘no’, and the reason for that negative answer is quite relevant to how we should handle linguistic data that are not fully predictable. We return to this problem below.

Turning to language, a huge number of facts that one might be tempted to put in the grammar no more belong there than probabilities belong in my grammar of walking to work. So consider a pair of sentences from Manning (2002):

(3) a. It is unlikely that the company will be able to meet this year’s revenue forecasts.
    b. That the company will be able to meet this year’s revenue forecasts is unlikely.

Manning points out that we are far more likely to say (3a) than (3b) and suggests that this likelihood forms part of our knowledge of grammar. No it does not. It is part of our use of language that, for both processing and stylistic reasons, speakers tend to avoid sentences with heavy subjects (see Hawkins 1994; 2004). As a consequence, one is more likely to say things like (3a) than (3b). It would be superfluous to repeat in the grammar what is adequately accounted for outside of it. The probability of using some grammatical element might arise as much from real-world knowledge and behavior as from parsing ease. For example, Wasow (2002) notes that we are much more likely to use the verb walk intransitively than transitively, as in (4a–b):

(4) a. Sandy walked (to the store). [frequent intransitive usage]
    b. Sandy walked the dog. [infrequent transitive usage]


He takes that fact as evidence that stochastic information needs to be associated with subcategorization frames. But to explain the greater frequency of sentence types like (4a) than (4b), it suffices to observe that walking oneself is a more common activity than walking some other creature. It is not a fact about grammar. What I offer then is a classically modular approach to variation, that is, an approach which posits an autonomous grammatical module at its core. The observed complexities of language result from the interaction of core competence with other systems involved in language. Figure 1 illustrates:

Figure 1: A modular approach to variation2

In this view, the observed probability of a particular variant results from the interaction of the formal grammar and extragrammatical faculties, as modulated by the user’s manual. What then is the nature and function of the user’s manual? This construct presents one face to grammatical competence and one face to the external factors that shape grammar. In a nutshell, it tells us what to do with our grammars and how often to do it. For example, it might tell an English speaker to avoid stranding prepositions in formal writing, to extrapose heavy subjects, and to avoid certain vulgar expressions in polite conversation. These usage conventions are not totally arbitrary, of course. They are shaped by stylistic level, processing pressure, and

2  A referee poses the question of where information structure fits into this picture. While space does not permit a comprehensive reply, in my view, generalizations pertaining to the grammar-discourse interface are partly subsumed under ‘grammatical competence’, partly handled in the user’s manual, and are partly shaped by external factors such as the exigencies of constructing a coherent discourse.


social factors respectively. But they are not totally predictable either. Clearly language varieties differ in the degree to which one factor predominates in a particular situation. So there is a certain degree of arbitrariness in the user’s manual. Let me give a concrete example of the functioning of the user’s manual. For several decades there has been a debate on the nature of Subjacency and other constraints on extraction.3 Two counterposed views have been put forward:

(5) Two views on the nature of Subjacency:
a. It is a universal constraint on (competence) grammars (Chomsky 1973 and most subsequent formalist work).
b. It is a performance condition resulting from parsing (and other) pressure (Kuno 1973 and most subsequent functionalist work).

In support of (5a), it is typically pointed out that the effects of Subjacency differ somewhat from language to language and that there is no one-to-one correspondence, in any particular language, between parsing and other pressure and what is ruled out by the constraint (see Fodor 1984; Newmeyer 2005). In support of (5b) it is typically pointed out that, to a very great extent, the effects of Subjacency do follow from external pressure (Deane 1992; Kluender 1992). Furthermore, the effects of Subjacency are variable, in that some island effects are stronger than others. In English, for example, it is more difficult to extract from finite clauses than from non-finite clauses (see Szabolcsi and den Dikken 1999). In my view, both the formalists and the functionalists are partly right and partly wrong. Subjacency is just the sort of phenomenon we would expect to find localized in the user’s manual. The interaction between external pressure, in this case largely parsing pressure, and the language-particular grammar, generates Subjacency effects. But I use the term ‘generates’ advisedly. The interaction of grammar and performance invites, so to speak, the existence of Subjacency effects, but it does not fully predict them. The user’s manual, for reasons of historical accident, the vagaries of use, and so on, necessarily accommodates a certain degree of arbitrariness.

3  Subjacency is a constraint that rules out extractions of elements in particular syntactic configurations. For example, the deviant English sentences *What did you wonder where Bill put? and *The woman who I believe the claim that Mary talked to are Subjacency violations.


3 Language variation is compatible with grammatical autonomy

Let us now turn to some key studies of language variation that might seem to call into question the correctness of the hypothesis of grammatical autonomy. Throughout this section, I elaborate on the point, discussed in the previous section, that not all generalizations about grammatical patterning are necessarily handled the same way. Some are encoded directly in the grammar and some are not. Take /-t, -d/ deletion in English, that is, the deletion of a coronal stop in final clusters:

(6) a. I don’ think so.
    b. Then he pass’ me his plate.
    c. She tol’ a lie.

The probability of deletion is tied to the nature of the preceding and following segment, as is partly illustrated in Table 1:

Table 1: Following segment effect on English /-t, -d/ deletion (Guy 1980: 14)

Following context                              Obstruent   Liquid   Glide   Vowel
Rate of deletion (Varbrul 1 factor weights)    1.0         .77      .59     .40
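To make concrete what a variable rule would have to encode, the following is a minimal sketch in Python. It assumes a multiplicative application model of the kind used in early variable rule analysis, in which the probability of deletion is an input probability multiplied by one weight per factor group. Only the four weights come from Table 1; the input probability of 0.5, the function name, and the `other_weights` parameter are hypothetical and serve purely as illustration.

```python
# Factor weights for the following segment, copied from Table 1 (Guy 1980: 14).
FOLLOWING_SEGMENT_WEIGHT = {
    "obstruent": 1.0,
    "liquid": 0.77,
    "glide": 0.59,
    "vowel": 0.40,
}


def deletion_probability(following, input_probability=0.5, other_weights=()):
    """Probability of /-t, -d/ deletion under an ASSUMED multiplicative model.

    input_probability is a hypothetical overall tendency to delete; the
    default of 0.5 is invented for illustration. other_weights stands in for
    further factor groups (for example the preceding segment), which Table 1
    does not report.
    """
    probability = input_probability * FOLLOWING_SEGMENT_WEIGHT[following]
    for weight in other_weights:
        probability *= weight
    return probability


# Print the assumed deletion probability for each following context.
for context in FOLLOWING_SEGMENT_WEIGHT:
    print(f"{context:10s} {deletion_probability(context):.3f}")
```

On this sketch the numbers sit inside the rule itself, which is exactly the design choice that the discussion below calls into question.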

Does that mean we need to complicate the rule of deletion by including this information? In other words, do we need a variable rule? Not necessarily, if the knowledge of how often we delete is a different kind of knowledge from whether we are allowed to delete at all. And it is a different kind of knowledge. Let’s look at that point in more detail. Guy (1997) has constructed several arguments with the goal of demonstrating that the variable weights need to be stated in the rule itself. For example, he has argued that if the regularities of variability were stated in some separate performance component, then we would need to state the same constraint twice, once in the grammar and once in performance. So consider the fact that final /t/ and /d/ are never pronounced after another /t/ and /d/:

(7) a. *paint#t    *raid#d
    b. painted     raided     (with epenthesis)


More generally, as in (8):

(8) The more shared features between the /t/ and /d/ and what precede them, the less likely the sequence will be realized in actual speech (Guy and Boberg 1994: n. p.).

As Guy notes, these appear to be Obligatory Contour Principle (OCP) effects, and he remarks:

Now, if the variable data arise not because of the competence OCP constraint, but stem from a separate performance OCP constraint, it should come as a theoretical surprise, a random coincidence, that the two are so similar in nature and direction of effect. (Guy 1997: 134)

But there is no ‘competence OCP constraint’, in the sense of there being a principle of universal grammar called the OCP. As Odden (1986) and others have pointed out, what are called OCP effects differ wildly from language to language and are not even present in some languages. What we have is universal articulatory- and acoustic-based pressure to avoid sequences of segments that are ‘too close’ to each other. English grammaticalizes this pressure to a certain extent more than some languages and less than others. This pressure is responsible for the impossibility of forms like (7a). But that has nothing to do with the rule of /t, d/-deletion per se, much less a variable condition that needs to be imposed on it. In fact, we never find geminate /t/’s and /d/’s in English. What we have then is an interface principle of the user’s manual that looks at English phonology in one direction and looks at external pressure in the other direction and generates the statistical generalization. Is this interface principle an automatic exceptionless consequence of the interaction of English phonology and phonetic pressure? Certainly not, but the fact that variable data cannot be derived in their entirety from universal principles, does not mean that they need to be stated ad hoc in the competence grammar. Let me draw another analogy with the grammar of my walking to work. There are, in fact, more factors than the timing of traffic lights that affect the probability of my taking one route more often than another. Some are ‘global’, in that they would affect anybody following the same strategy. For example, one intersection might be blocked by construction and therefore likely to be avoided. Some constraints are what one might call ‘local’ or ‘personal’. For example, I might opt more often for a particular route because the view appeals to me. 
But the fact that the probabilities are not fully predictable does not mean that one needs to revert to encoding them in the grammar of walking per se. One derives what one can with the understanding that there will always be a residue of contributing factors that are unexplained and perhaps inexplicable.
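The walking-to-work analogy can be made concrete with a short sketch in Python. It enumerates every route through a grid of four blocks north by four blocks west and derives each route's frequency from the rule "go north on green, turn west on red" together with the timing of the lights. The green-light percentages returned by `p_green` are invented for illustration, as are all function and variable names; nothing here comes from actual traffic data.

```python
BLOCKS_NORTH = 4  # the office lies four blocks north ...
BLOCKS_WEST = 4   # ... and four blocks west of home


def p_green(north, west):
    """Hypothetical chance that the light at this intersection is green."""
    return 0.40 + 0.05 * ((north + 2 * west) % 5)


def route_probabilities():
    """Map each route (a string of 'N' and 'W' steps) to its probability."""
    routes = {}

    def walk(north, west, path, probability):
        if north == BLOCKS_NORTH and west == BLOCKS_WEST:
            routes[path] = probability          # arrived at the office
        elif north == BLOCKS_NORTH:             # must not overshoot to the north
            walk(north, west + 1, path + "W", probability)
        elif west == BLOCKS_WEST:               # must not overshoot to the west
            walk(north + 1, west, path + "N", probability)
        else:
            green = p_green(north, west)
            walk(north + 1, west, path + "N", probability * green)
            walk(north, west + 1, path + "W", probability * (1 - green))

    walk(0, 0, "", 1.0)
    return routes


routes = route_probabilities()
# Show the three most frequent routes under the invented light timings.
for path, probability in sorted(routes.items(), key=lambda item: -item[1])[:3]:
    print(path, round(probability, 4))
```

The walking rule in `walk` contains no percentages; the unequal route frequencies fall out of its interaction with `p_green`. Global factors such as a blocked intersection, or personal ones such as a preferred view, are deliberately left out and would surface as a residue between these derived frequencies and observed ones.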

Language variation and the autonomy of grammar  

Another argument from Guy (1997) involves the fact that deletion is much more frequent before /l/ than before /r/. Without deletion, resyllabification would result in /tl-/ /dl-/ onsets, which, as Guy points out, are lexically impossible in English. But they are not impossible universally, so, he argues, there is no hope of deriving the facts from articulatory universals. But it is not necessary to derive them from articulatory universals. They can be derived in part from the fact, already noted, that English bans /tl-/ and /dl-/ onsets. In other words, the probability-computing function of the user’s manual has access to the mental grammar. The user’s manual is also aware that the rule of /-t, -d/ deletion, which can produce such onsets, is optional. Taking into account these two facts, it instructs the speaker to delete /t/ and /d/ before the relevant /l/’s at a high rate. Of course there is a lot more to be said than that. For example, the user’s manual also has information from the other direction. For phonetically-based reasons, /tl-/ and /dl-/ onsets are relatively rare. That fact surely influences the probability of /-t, -d/ deletion in this case, though I am not in a position to specify precisely how.

When we turn to syntax, it is even easier to pinpoint the problems with purely grammatical approaches to variability. At least in phonology, we can usually say with confidence that two variants are just different ways of saying the same thing. That is much less true in syntax. In an important paper, Beatriz Lavandera (1978) pointed out that the choice of syntactic variants is determined in part by the meaning that they convey. Viewed from that angle, assigning probabilities to rules, structures, or constraints seems especially problematic. The probabilities may be more a function of one’s intended meaning than of some inherent property of the linguistic unit itself. No, it is not the case that all syntactic variants differ in meaning.
But the great majority do, if our definition of ‘meaning’ includes the full range of discourse-pragmatic aspects of interpretation. Let’s take the various possibilities of post-verbal orderings of elements in English as an example:

(9) Heavy NP Shift
a. The waiter brought the wine we had ordered to the table.
b. The waiter brought to the table the wine we had ordered.

(10) Dative Alternation
a. Chris gave a bowl of Mom’s traditional cranberry sauce to Terry.
b. Chris gave Terry a bowl of Mom’s traditional cranberry sauce.

(11) Verb-Particle
a. Sandy picked the freshly baked apple pie up.
b. Sandy picked up the freshly baked apple pie.

 Frederick J. Newmeyer

Arnold, Wasow, Losongco and Ginstrom (2000) calculated the probabilities for speaker choice of one ordering variant or the other and found a complex interaction of meaning factors, in particular whether the constituent is new to the discourse or not, and processing factors, such as the ‘heaviness’ (that is, the processing complexity) of the constituent. In other words, the (a) and (b) variants in (9–11) can be used to convey different meanings. Hence we have a good example here of why we would not want to tie variability to particular rules or grammatical elements. Since the heavy NP shift alternants, the dative alternants, and the verb-particle alternants do not mean the same thing, the alternant that is chosen in discourse is a function in part of the meaning that the speaker wishes to convey. That fact would be obscured by a probabilistic rule relating the variants in question. So we see how incorporating variability into a particular rule can mask an explanation of the underlying generalization.

Another example can be drawn from variable subject-verb agreement in Brazilian Portuguese (BP; see Guy 2005 for a summary). In that language, subjects can occur both preverbally and postverbally. But interestingly, subject-verb agreement is disfavored, but not categorically impossible, with postposed subjects. Guy, following a popular, but not universally accepted, analysis, suggests that subjects of unaccusative verbs are originally VP-internal and have to raise across the verb to trigger the feature checking that accomplishes agreement. In his analysis, the variability in agreement is a property of the feature checking process (and hence purely grammar-internal). I offer as an alternative the idea that agreement in post-verbal position is completely optional, as far as the competence grammars of BP speakers are concerned. Why is that a better alternative? As a first point, it needs to be stressed that preverbal and postverbal subjects do not have the same meaning.
In BP, as in all Romance languages, preverbal and postverbal subjects differ in their discourse properties (Naro and Votre 1999). These meaning differences would be obscured by a variable rule relating the two subject positions. But there is more to be said than that. If postverbal subjects do in fact originate in object position, then we have an independent explanation for the agreement facts. Verb-object agreement is crosslinguistically significantly less common than subject-verb agreement (Siewierska and Bakker 1996), a fact which is rooted ultimately in the greater topicality of subjects vis-à-vis objects (Corbett 2005). In the approach advocated here, all of the relevant generalizations can be accommodated. The grammar of BP allows agreement with arguments in both positions. The user’s manual interfaces discourse-based and functional factors on the one hand and the grammar on the other hand to derive the statistical generalizations.

As far as syntactic rules and meaning are concerned, there certainly are processes that have little or no effect on meaning. So it is sometimes claimed that there is no meaning difference between sentences in English where the subordinate clause is marked by the complementizer that and those where it is not. (12a–b) are examples:

(12) a. I think that I’ll make a shopping list today.
     b. I think I’ll make a shopping list today.

So at first thought, that-deletion might seem to be an appropriate candidate for a variable rule. I do not think that such would be the most promising way to proceed. The variants might be identical in meaning strictly speaking, but nevertheless there are a huge number of interacting factors that determine the retention or omission of that:

(13) The presence or absence of that is affected by (Bolinger 1972; Quirk, Greenbaum, Leech and Svartvik 1985; Thompson and Mulac 1991; Biber, Johansson, Leech, Conrad and Finegan 1999; Hawkins 2001; Dor 2005; Kaltenböck 2006; Kearns 2007; Dehé and Wichmann 2010):
a. the type and frequency of the matrix verb
b. the type of the main clause subject (pronominal vs. full noun phrase)
c. the choice of matrix clause pronoun
d. the length, type, and reference of the embedded subject
e. the position and function of the embedded clause
f. the voice of the main clause (active vs. passive)
g. ambiguity avoidance
h. the linear adjacency or not of the matrix verb and that
i. the speech register
j. the ‘truth claim’ (Dor 2005) to the proposition of the embedded clause
k. the rhythmic pattern of the utterance

No doubt with sufficient ingenuity one could write a variable rule of that-deletion sensitive to all of the conditioning factors in (13a–k). But that would be a mistake, since each of the conditioning factors conditions other processes in English. For example, consider (13a). The matrix verbs that inhibit that-deletion, for example, factive verbs like regret, are the same ones that resist infinitival complements and resist extraction from the complement:

(14) a. I regret *(that) he left.
     b. *I regret to have left.
     c. *What did he regret that he saw?

Clearly, the generalization is much broader than something expressible by a probabilistic condition on that-deletion. Such a condition would in fact mask the crosslinguistic generalization that factive verbs are less malleable, so to speak, than nonfactive ones (see Givón 1980).

Even when variants have the same meaning, it is clear that they can differ stylistically. That fact poses more than a small problem for handling variation grammar-internally. Put simply, it would lead to a different set of probabilities for each genre, carrying the idea of handling variation grammar-internally to an unacceptable conclusion. It is sometimes claimed that stylistic variation poses no problems, since it is said to be quantitatively simple, involving raising or lowering the selection frequency of socially sensitive variables without altering other grammatical constraints on variant selection (Boersma and Hayes 2001; Guy 2005). In fact, Guy (2005: 562) has written that “it is commonly assumed in VR analyses that the grammar is unchanged in stylistic variation.” The research on register does not support such an idea. Biber has shown that there are at least six ‘dimensions’ in which genres interact:

(15) The 6 ‘dimensions’ in which genres interact (Biber 1988):
a. Involved versus informational production
b. Narrative versus non-narrative concerns
c. Explicit versus situation-dependent reference
d. Overt expression of persuasion
e. Abstract versus non-abstract information
f. On-line informational elaboration

Different genres, and the grammatical variability that they manifest, map differently onto each dimension. Along Dimension (15e), for example, we find differences within press reportage genres. Passives and other past participial constructions are much more probable in spot news broadcasts than in financial reporting. We find similar statistical differences between scientific and humanistic writing. As far as spoken language is concerned, there are systematic differences along Dimensions (15a), (15c), and (15f) with respect to different types of telephone conversations. What all of this shows, and Biber gives many more examples, is that each speaker of English would need to be endowed with a multitude of different variable rule-containing grammars if one were serious about handling variation grammar-internally.

The question that has to be raised is: ‘If variable rules are so well motivated and have been so successful, then why have people all but stopped formulating them?’ As long as twenty years ago, Ralph Fasold was writing about ‘The quiet demise of variable rules’ (Fasold 1991). It is true that there are a lot of people doing probabilistic approaches to grammar these days. But by and large they have engineering tasks as their ultimate goal. They are not building models of grammatical knowledge. The models mix speech forms from different speech communities and styles willy-nilly. As one well-known practitioner of this approach has remarked: ‘As far as I’m concerned, if I can Google it, it’s English’ (attributed, perhaps apocryphally, to Christopher Manning). As far as sociolinguistics is concerned, what one sees in the majority of papers analyzing variable phenomena are tables of constraints with their associated VARBRUL probabilities and no indication of where these numbers fit into an explicit statement of linguistic structure.

Probably one reason that one sees fewer and fewer variable rules is that there has been an increasing realization that the units of variation do not mesh very well with the units of analysis arrived at by grammarians in their grammatical models. This problem has been known since the 1980s. For example, Romaine (1982) looked at the variation among three possible occupants of the complementizer position in English:

(16) Possible occupants of the complementizer position in English relative clauses (Romaine 1982):
a. She’s the person who I saw (Wh-phrase)
b. She’s the person that I saw (that-complementizer)
c. She’s the person ___ I saw (φ)

She toyed with the idea of writing a variable rule associating the three options, but soon realized that formal grammatical analysis does not relate the three options by means of the same rule. Who is generally regarded as belonging to the system of fronted wh-elements, while that is a complementizer. So a variable rule relating the three options would not be a simple matter of adding a set of probabilities to an existing motivated grammatical rule. Rather, it would involve adopting a grammatical analysis accepted by few if any grammarians. One could make the same point about the relationship between sentences like (3a–b) above, which I repeat here as (17a–b):

(17) a. It is unlikely that the company will be able to meet this year’s revenue forecasts.
     b. ?That the company will be able to meet this year’s revenue forecasts is unlikely.

In older versions of transformational grammar, it is true that a rule of Extraposition derived (17a) from (17b). No doubt such a rule could have been reinterpreted as a variable rule. But movement approaches to the relationship between these sentence types are no longer current. It is not clear how the variability could be encoded into a rule, or what that rule would be.

I find it both very interesting and very puzzling that practically everybody who has proposed a probabilistic rule has implicitly or explicitly kept social factors out of the statement of the rule. Such factors are there in principle. Consider David Sankoff’s characterization of what a variable rule is and does:

Whenever a choice among two (or more) discrete alternatives can be perceived as having been made in the course of linguistic performance, and where this choice may have been influenced by factors such as features in the phonological environment, the syntactic context, discursive function of the utterance, topic, style, interactional situation or personal or sociodemographic characteristics of the speaker or other participants, then it is appropriate to invoke the statistical notions and methods known to students of linguistic variation as variable rules. (Fodor 1984; Sankoff 1988: 986; emphasis added)

I can hardly pretend to have mastered the entire body of sociolinguistic literature, but I am not aware of a paper in which gender, class, identity, and so on have been incorporated into the statement of a variable rule. In other words, advocates of variable rules themselves have adopted, to an extent, a modular approach to linguistic variation. So I am not suggesting anything radical to variationists – just that they follow through and make their approach a consistently modular one.4
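The ‘statistical notions’ behind the tables of VARBRUL probabilities mentioned above can be made concrete with a small piece of arithmetic. What follows is only a minimal sketch, assuming the logistic form of the variable rule model, in which an overall input probability and one weight per factor group are added together on the log-odds scale. The function names, the two factor groups, and every numerical weight are invented for illustration; none of them is taken from Guy (1997), Sankoff (1988), or any other study cited here.

```python
from math import exp, log
from typing import Sequence


def logit(p: float) -> float:
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + exp(-x))


def rule_probability(input_prob: float, factor_weights: Sequence[float]) -> float:
    """Combine an input probability with one weight per factor group.

    Weights above 0.5 favor application of the rule, weights below 0.5
    disfavor it, and a weight of exactly 0.5 leaves the odds unchanged.
    """
    total = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return inv_logit(total)


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_INPUT = 0.5
FOLLOWING_SEGMENT = {"l": 0.8, "r": 0.4}        # invented values
SPEECH_STYLE = {"casual": 0.6, "careful": 0.4}  # invented values

p_before_l = rule_probability(
    HYPOTHETICAL_INPUT, [FOLLOWING_SEGMENT["l"], SPEECH_STYLE["casual"]]
)
p_before_r = rule_probability(
    HYPOTHETICAL_INPUT, [FOLLOWING_SEGMENT["r"], SPEECH_STYLE["casual"]]
)

print(round(p_before_l, 3))  # odds of 4 x 1.5 = 6, i.e. a probability of about 0.857
print(round(p_before_r, 3))  # the two weights cancel, leaving a probability of about 0.5
```

Nothing in such a calculation says where the weights reside. The same numbers could be attached to a rule of the competence grammar or computed by a separate module that merely consults the grammar, and which of these it is to be is exactly the point at issue here.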

4 Conclusion

To conclude in one sentence, there is no incompatibility between the facts of language variation and the correctness of the hypothesis of the autonomy of grammar. Lest there be any doubt on the question, I feel that the discovery of systematic variability in language is one of the great breakthroughs of 20th-century linguistics and I have said as much in print (Newmeyer 1996). The only issue is its formal implementation. I hope to have made a convincing case that a treatment of systematic variability centered on a grammatical system interacting with usage-based facts, but not itself incorporating those facts, is the best-motivated approach.

References

Anttila, Arto (1997): Deriving variation from grammar. In: Frans Hinskens, Roeland van Hout and W. Leo Wetzels (eds.), Variation, change, and phonological theory, 35–68. Amsterdam: John Benjamins.
Anttila, Arto (2002): Variation and phonological theory. In: Jack K. Chambers, Peter Trudgill and Natalie Schilling-Estes (eds.), Handbook of language variation and change, 206–243. Oxford: Blackwell.

4  Hubert Haider has observed (personal communication) that “[f]rom a European perspective, a sociolinguistic concept of variable rules for covering language variation appears to be amusingly naïve. Only in a context like that of the US, without historically grown, easily identifiable, regional dialects, could such a position be at all tenable.”

Arnold, Jennifer E., Thomas Wasow, Anthony Losongco and Ryan Ginstrom (2000): Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76: 28–55.
Biber, Douglas (1988): Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan (1999): Longman grammar of spoken and written English. London: Longman.
Boersma, Paul and Bruce Hayes (2001): Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45–86.
Bolinger, Dwight (1972): That’s that. The Hague: Mouton.
Chomsky, Noam (1957): Syntactic structures. The Hague: Mouton.
Chomsky, Noam (1973): Conditions on transformations. In: Stephen R. Anderson and Paul Kiparsky (eds.), A festschrift for Morris Halle, 232–286. New York: Holt Rinehart and Winston.
Corbett, Greville G. (2005): Number of genders. In: Martin Haspelmath, Matthew S. Dryer, David Gil and Bernard Comrie (eds.), The world atlas of language structures, 126–129. Oxford: Oxford University Press.
Deane, Paul D. (1992): Grammar in mind and brain: Explorations in cognitive syntax. Berlin/New York: Mouton de Gruyter.
Dehé, Nicole and Anne Wichmann (2010): Sentence-initial I think (that) and I believe (that): Prosodic evidence for use as main clause, comment clause and discourse marker. Studies in Language 34: 36–74.
Derwing, Bruce L. (1973): Transformational grammar as a theory of language acquisition: A study in the empirical, conceptual, and methodological foundations of contemporary linguistic theory. Cambridge: Cambridge University Press.
Dor, Daniel (2005): Toward a semantic account of that-deletion in English. Linguistics 43: 345–382.
Fasold, Ralph (1991): The quiet demise of variable rules. American Speech 66: 3–21.
Fodor, Janet D. (1984): Constraints on gaps: Is the parser a significant influence? In: Brian Butterworth, Bernard Comrie and Östen Dahl (eds.), Explanations for language universals, 9–34. Berlin/New York: Mouton.
Givón, Talmy (1980): The binding hierarchy and the typology of complements. Studies in Language 4: 333–377.
Guy, Gregory R. (1980): Variation in the group and the individual: The case of final stop deletion. In: William Labov (ed.), Locating language in time and space, 1–36. New York: Academic Press.
Guy, Gregory R. (1997): Competence, performance, and the generative grammar of variation. In: Frans Hinskens, Roeland van Hout and W. Leo Wetzels (eds.), Variation, change, and phonological theory, 125–143. Amsterdam: John Benjamins.
Guy, Gregory R. (2005): Grammar and usage: A variationist response. Language 81: 561–563.
Guy, Gregory R. and Charles Boberg (1994): The obligatory contour principle and sociolinguistic variation. Toronto Working Papers in Linguistics: Proceedings of the Canadian Linguistics Association 1994 Annual Meeting.
Hawkins, John A. (1994): A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hawkins, John A. (2001): Why are categories adjacent? Journal of Linguistics 37: 1–34.

Hawkins, John A. (2004): Efficiency and complexity in grammars. Oxford: Oxford University Press.
Hooper, Joan B. (1976): Word frequency in lexical diffusion and the source of morphophonemic change. In: William M. Christie (ed.), Current progress in historical linguistics, 95–106. Amsterdam: North-Holland.
Kaltenböck, Gunther (2006): ‘…That is the question’: Complementizer omission in extraposed that-clauses. English Language and Linguistics 10: 371–396.
Kearns, Katherine S. (2007): Epistemic verbs and zero complementizer. English Language and Linguistics 11: 475–505.
Kluender, Robert (1992): Deriving island constraints from principles of predication. In: H. Goodluck and M. Rochemont (eds.), Island constraints: Theory, acquisition, and processing, 223–258. Dordrecht: Kluwer.
Kuno, Susumu (1973): Constraints on internal clauses and sentential subjects. Linguistic Inquiry 4: 363–385.
Labov, William (1969): Contraction, deletion, and inherent variability of the English copula. Language 45: 716–762.
Lavandera, Beatriz R. (1978): Where does the sociolinguistic variable stop? Language in Society 7: 171–182.
Manning, Christopher D. (2002): Probabilistic syntax. In: Rens Bod, Jennifer Hay and Stefanie Jannedy (eds.), Probabilistic linguistics, 289–341. Cambridge, MA: MIT Press.
Naro, Anthony J. and Sebastião J. Votre (1999): Discourse motivations for linguistic regularities: Verb/subject order in spoken Brazilian Portuguese. Probus 11: 75–100.
Newmeyer, Frederick J. (1986): Linguistic theory in America: Second edition. New York: Academic Press.
Newmeyer, Frederick J. (1996): Benchmarks: 35 years of linguistics. The Sciences 36: 13.
Newmeyer, Frederick J. (1998): Language form and language function. Cambridge, MA: MIT Press.
Newmeyer, Frederick J. (2002): Optimality and functionality: A critique of functionally-based optimality-theoretic syntax. Natural Language and Linguistic Theory 20: 43–80.
Newmeyer, Frederick J. (2005): Possible and probable languages: A generative perspective on linguistic typology. Oxford: Oxford University Press.
Odden, David (1986): On the role of the Obligatory Contour Principle in phonological theory. Language 62: 353–383.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik (1985): A comprehensive grammar of the English language. Harlow: Longman.
Romaine, Suzanne (ed.) (1982): Sociolinguistic variation in speech communities. London: Edward Arnold.
Sankoff, David (1988): Variable rules. In: Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier (eds.), Sociolinguistics: An international handbook of the science of language and society, 984–997. Berlin/New York: Walter de Gruyter.
Saussure, Ferdinand de (1916/1966): Course in general linguistics. New York: McGraw-Hill. [Translation of Cours de linguistique générale. Paris: Payot, 1916].
Siewierska, Anna and Dik Bakker (1996): The distribution of subject and object agreement and word order type. Studies in Language 20: 115–161.
Szabolcsi, Anna and Marcel den Dikken (1999): Islands. Glot International 4–6: 3–8.
Thompson, Sandra A. and Anthony Mulac (1991): The discourse conditions for the use of the complementizer that in conversational English. Journal of Pragmatics 15: 237–251.

Wasow, Thomas (2002): Postverbal behavior. Stanford, CA: CSLI Publications.
Weinreich, Uriel, William Labov and Marvin I. Herzog (1968): Empirical foundations for a theory of language change. In: W. Lehmann and Y. Malkiel (eds.), Directions for historical linguistics, 95–188. Austin, TX: University of Texas Press.
Yang, Charles D. (2008): The great number crunch. Journal of Linguistics 44: 205–228.
Zwicky, Arnold M. and Geoffrey K. Pullum (1987): Plain morphology and expressive morphology. Berkeley Linguistics Society 13: 330–340.

Gregory R. Guy, New York University

The grammar of use and the use of grammar

Abstract: Language is a uniquely social phenomenon – it is acquired exclusively through interaction with other users. The use of language is further characterized by orderly heterogeneity: regular patterns of social and linguistic conditioning. Since a speaker’s knowledge of language derives from and includes knowledge of usage, it incorporates knowledge about variability, frequency, and social significance. An adequate linguistic theory must account for this social, interactive nature of language. Dichotomous oppositions between system and usage, competence and performance, I-language and E-language, etc., are misleading, if they incline us to favor puristic concepts of the mental grammar as an idealized, invariant, categorical system. But an entirely usage-based model, devoid of abstract operations, is also inadequate. Rather, the empirical evidence indicates that speakers do construct mental representations, abstractions, and operations to guide their production, but these are probabilistic and variable, rather than deterministic and discrete. From probabilistically patterned input, speakers infer inherently variable grammars, from which they generate productions that display orderly diversity.

1 Language as a social phenomenon

Language is a social phenomenon, seated in society, and acquired by the individual only through social interaction – i.e., usage. This means that the primary information people possess about their languages is information about how they are used in communities by speakers. Such information about social usage provides the base for learning language, and for using it productively and competently; ultimately this is what we know when we know a language. Consequently, knowledge of language necessarily incorporates some level of familiarity with all the linguistic diversity, complexity, and variability that we encounter in the world around us. Therefore I will argue that the linguistic systems we construct during the course of language acquisition and use are necessarily formed by, sensitive to, and generative of, variability and use. This has the consequence that the traditional distinctions in linguistics that oppose system vs. usage, competence vs. performance, etc., are false dichotomies – false in the sense that they are not very helpful in constructing coherent and adequate theories of language. This does not mean, however, that there is only usage, or that there is no linguistic
system or grammar; rather, I conclude that linguistic systems – in the sense of abstract representations and processes – clearly do exist, but they emerge from use and experience, are constructed out of it and dependent upon it. The empirical evidence will belie both kinds of theoretical purism – that which denies the relevance of usage and that which denies abstraction and system. Recognizing the fundamentally social nature of language – its status as both the product and medium of social interaction – is essential to achieving an adequate account of language, and hence an adequate linguistic theory. If linguists conceptualize language as primarily or exclusively the property of the individual mind, we are bound to go wrong. Consider, first, the origins of language, both phylogenetically in the species, and ontogenetically in the individual. In the individual, it is clear that social interaction is the sine qua non of language development: without it, there is no language. This is evident in several ways. Humans who have no social interaction as children, who are not part of a speech community, do not develop any language at all. Child language development occurs exclusively through social interaction, so that the fortunately rare cases of isolated individuals – so-called ‘wild’ children, and children isolated due to abuse – do not develop language (cf. Fromkin 1974). But given a human community and social interaction, some communicative system will emerge, even if no prior language is available, or if there are other hindrances to communication. Thus we have the emergence of contact languages, pidgins, home signing systems, and the like, appearing in situations where people interact but cannot do so on the basis of normal language acquisition. 
On this evidence we may further conclude that the phylogenetic situation was the same: the raison d’être, the ultimate motivation for language evolution in the first place, was social interaction and communication, not as some would have it, as an aid to the solitary activity of thought. The evolutionary competitive advantage of homo loquens ‘the talking hominid’ was social communication (Mufwene 2011). The appropriate conclusion is that societies have languages, individuals do not. Indeed, specific ‘languages’ such as English, Cantonese, and Kikuyu only persist and continue to have regular patterns and continuities because they are the ongoing means of communication in complete human societies. When a language ceases to be used by a speech community, it dies. But when an individual dies, the languages he or she spoke survive, so long as they are used by a community. Thus the structures and system of language do not exist in isolation from human interaction, from usage. Usage and interaction therefore provide both the data and the motivation for language acquisition: for developing the mental capacity to speak ourselves. And let us be clear, usage provides not just some of the data, but all of the data: we have in fact no other kinds of evidence about how our languages work or how

to be a speaker except what we hear and perceive from those around us.1 This is true of the child language learner, and it is also true of the linguist; even a linguist’s intuition about grammaticality is a product of the system. The idea that intuition and introspection give us a usage-free pathway to inspect the inner workings of language directly is misguided. But given that our information comes only from usage, the next questions for the linguist are, what do we do with that information, and what does the mental capacity to be a speaker consist of? These are the questions that point us to the ‘system’ part of the title of this volume. But we must be cautious in how we use the terminological distinction between ‘system’ and ‘usage’, lest we fall into the essentialist trap of believing that two different labels must refer to two essentially distinct things – that the ‘system’ is something separate from all the evidence obtained from usage. I take this pair of terms to be a contemporary update of the familiar dichotomy that has beguiled and bedeviled linguistic theory since Saussure famously distinguished langue from parole. In my view, the grand Swiss scholar, widely considered to be the father of modern linguistics, did the discipline a disservice with his love of dichotomies, particularly those opposing synchronic and diachronic linguistics, and langue and parole. Linguistics has been mesmerized ever since by the seductive metaphorical opposition between a system and its products, subsequently restated as competence vs. performance, I-language vs. E-language, grammar vs. usage. Before analyzing the substance of this dichotomy, let us consider how these concepts have worked in terms of the sociology of the field – what have they meant for the people and practices in linguistics? From this perspective, I believe the dichotomy has flourished for two reasons. 
First, it’s a simplifying assumption: it designates all variability, heterogeneity, idiosyncrasy, and other messy stuff as belonging to something other than ‘real’ language, and allows us to set it aside while we figure out the general patterns. This is a typical step in the early stages of a scientific field: it is easier to work out generalizations and models if we can start by ignoring some of the complexity of reality. Thus a simple model of mechanical motion might start by ignoring friction, and a simple model of gravity might start by ignoring relativity. These were in fact the ways those theories developed in physics, so we can surely forgive our predecessors in linguistics for doing the same thing when they postulated a categorical, homogeneous, abstract mental grammar, ignoring diversity in society and variability in the individual. But when

1  I neglect here the still-debated issue of innateness; even if individuals possess an innate mental faculty that aids in language acquisition, it does not aid them in acquiring any particular language in the absence of linguistic interaction with other speakers.

a science achieves greater maturity and self-confidence, it needs to revisit the simplifications and incorporate the inconvenient facts previously ignored, if it is to continue to progress. I think linguistics is clearly at that point: we need to marry our models of language with linguistic reality – focusing on questions like: how does language work, how can we communicate and understand each other by means of speech, and what do we do when we are doing that? However, it is obvious that not all linguists agree on this point. There is a tendency in the field to reify these simplifying assumptions as if they were a fundamental truth about the nature of reality; this approach leads to Chomsky’s (1965: 4) surprising position (hardly altered in the last half-century) that “observed use of language … surely cannot constitute the actual subject matter of linguistics, if this is to be a serious discipline”. This remarkable, but once widely accepted position deserves serious reflection. Why would linguists adopt such a stance towards the very substance of language? To me, it seems to contradict reality, basic scientific empiricism, and even common sense, to assert that a serious science cannot be based on the study of what people do when they communicate with language. So why has such a position been so influential in our discipline? One possible answer that merits consideration is that the Chomskyan position – opposing competence/system to performance/usage while simultaneously defining the study of competence as the ‘serious’ science – is a self-justifying ideology in the service of the interests of a particular school and a particular methodology. It licenses the practitioner to ignore messy data from production, and avoids tiresome empirical testing of one’s theories. It saves one the trouble of doing fieldwork, and allows one to work while introspecting comfortably in an armchair. Even better, it exalts theory constructors over data analyzers and fieldworkers. 
This is all very attractive – Chomsky offers linguists a way to make our problems more tractable, to make our working conditions more pleasant, and to feel superior, all at once. Of course, even within generative syntax there has been increasing discomfort with the shakiness of badly gathered introspection (cf. Schütze 1996), but I-language, not E-language, continues to constitute the fundamental focus of Chomskyan linguistics. The problem that constantly confronts such an approach is that the diversity of linguistic reality keeps undermining the dichotomy. Looking at the world, it is clear that system and usage interpenetrate: within the linguistic system there is variability, reflecting patterns of usage, and within variable usage there is system and structure. There are social, usage-based constraints on the grammar, and systematic, grammar-based constraints on social usage. So at the present state of our knowledge, the dichotomy has little explanatory or interpretive value. It doesn’t even have much practical value, if it is now doing more to obstruct progress in the

The grammar of use and the use of grammar 

field than to facilitate it. It is time to abandon this conceptualization of our problems, and move to dealing with language as it is, not as we would imagine it to be.

2 Orderly heterogeneity: The systematic nature of usage

A more illuminating approach begins with the two observations enunciated by Weinreich, Labov and Herzog (1968) as elements of their “Empirical Foundations for a Theory of Language Change”. These are “inherent variability” and “orderly heterogeneity”. Orderly heterogeneity is the observation that variability in language is not random and arbitrary, but structured and systematic. It is linguistically structured, quantitatively constrained by the linguistic system. It is also socially structured: speakers in a community are not randomly different from one another; rather, linguistic diversity systematically reflects social organization: people’s usage follows regular patterning by age, sex, class, ethnicity, linguistic experience, interlocutor, purpose, context, acts of identity, and so on. Our data about language, and our knowledge of it as users, look like the following examples. In New York City, where I live and work, users of English hear the social patterning in the use of coda /r/ that Labov famously reported in 1966, reproduced here as Figure 1. The rates of /r/ production show simultaneous class and stylistic differentiation, such that higher status speakers use more /r/, and everybody uses more /r/ in their more careful or formal styles. Crucially, these patterns are regular, systematic, and pervasive, and they are regularly produced, perceived, and interpreted by NYC English speakers.

[Figure 1 plot: (r) on the vertical axis (0–80), style on the horizontal axis, one line per socioeconomic class group (0, 1, 2–3, 4–5, 6–8, 9)]

Figure 1: Coda /r/ production in New York City English: Stratification by socioeconomic class and speech style (A: casual speech, B: careful speech, C: reading style, D: word lists, D’: minimal pairs), from Labov (2006: 152)

Another pattern of class differentiation is regularly encountered when something about the language is changing – specifically, when the change is a spontaneous innovation emerging within the speech community. In such situations we typically encounter a curvilinear distribution, with a peak in the lower middle class or upper working class, as illustrated in Figure 2 from Labov’s (1980: 261) study of two ongoing vowel changes in Philadelphia English, the fronting of the nucleus of (aw) (e. g., ounce, house) and the raising of the nucleus of (ey) in closed syllables (e. g., made, take).


4b: External constraint: Following segment effect on coronal stop deletion

                      – – – Morphological Class – – –
Following Segment     Underived       Regular Past
Consonants            .73             .65
Vowels                .31             .24
Pause                 .45             .63
Range:                .42      =      .41
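The Range figures in 4b can be checked by hand: in each column, the range is the largest value minus the smallest. The following is a minimal Python sketch of that arithmetic, using only the values printed in the table; the variable and function names are illustrative assumptions, not part of the original analysis.

```python
# Values copied from 4b: one dictionary per morphological class,
# keyed by the following segment.
weights = {
    "Underived": {"Consonants": 0.73, "Vowels": 0.31, "Pause": 0.45},
    "Regular Past": {"Consonants": 0.65, "Vowels": 0.24, "Pause": 0.63},
}


def effect_range(column):
    """Difference between the largest and smallest value in one column."""
    values = column.values()
    return round(max(values) - min(values), 2)


# Compute the range separately for each morphological class.
ranges = {cls: effect_range(col) for cls, col in weights.items()}
print(ranges)
```

The two results differ by only .01, which is the sense in which the table treats the following-segment effect as equal across the two morphological classes.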

What this implies is that variability and quantitative properties are found in the system, inside the grammar. And as we saw in the previous section, systematic, regular ‘grammatical’ properties are also found within the use of language. So the dichotomy that opposes system and usage, assigning invariant and categorical properties to the system/grammar, and variable and probabilistic properties to usage, is turning into an obstacle to explanation, rather than a facilitator.

4 Towards an integrated theory: Grammar emerges from experience

What then are the elements of a more coherent vision that eschews the facile system vs. usage dichotomy in pursuit of a model of the fundamental unity of grammar and use? We can start where everyone starts, as a child encountering the language in use in the community around us. Usage constitutes our entire input. We have an intelligent mind, perhaps even endowed with specialized neural networks that facilitate language processing. But whether or not language is cerebrally special, we face the general problem of identifying units, collocations, and productive principles that will allow us not simply to reproduce the specific utterances that we have heard, but to form our own novel utterances that will be correctly interpreted by others. We must make our output well-formed, so we have to figure out what ‘well-formedness’ consists of. Basically, we have to find patterns. The patterns are the grammar, the system. Where do they come from? The usage-based perspective on this issue, associated with linguists like Bybee (2001, 2002) and Pierrehumbert (2001, 2006), argues that system is emergent, consisting of generalizations across observed usage. Let me present an example from my research with Daniel Erker on Spanish pro-drop (Erker and Guy 2012). Spanish has optional use of subject personal pronouns (SPPs). They can occur overtly or be omitted, as in (3), where both full and omitted forms communicate the same meaning.

(3) Overt subject personal pronoun
    Yo quiero ‘I want’
    Omitted subject personal pronoun
    Quiero ‘I want’

So how does a speaker know or decide where to use one or the other? Much previous research on this topic has turned up some systematic, widely general patterns of use that are governed by morphosyntactic and discursive structures of Spanish. For example, SPP occurrence is regularly constrained by verbal morphology, verb semantics, and discourse reference (cf., inter alia, Otheguy and Zentella 2012). The morphological constraint contrasts different tense/mood/ aspect forms, with the result that TMA categories with more distinctive verbal inflection (e. g., the preterit, where every person/number category has a distinctive inflected form) are associated with lower probabilities of pronoun occurrence than those with less distinctive inflection (e. g., the imperfect, where first and third person singular forms are systematically identical). The verbal semantics constraint favors SPP use for verbs of mental activity, while those of external activity show less SPP occurrence. And the discourse level constraint considers the flow of reference in a text: a subject which makes reference to a different person than the subject of a preceding sentence (switch reference) is more likely to be expressed by an overt pronoun. These patterns are confirmed in our data, as shown in Table 5.


Table 5: Three constraints on Spanish SPP occurrence (from Erker and Guy 2012: 540–541)

   N     % overt SPP
  708    43 %
 2695    36 %
  877    29 %

  840    45 %
 1438    36 %
 2601    31 %

 2653    40 %
 2233    29 %

Tense-Mood-Aspect form of verb: Imperfect Indicative, Present Indicative, Preterit Indicative
F = 9, p

Patient/Theme > Locative

Syntactization, analogy and distinction between causations 


(7) a. AGENT >> PATIENT:
       sie öffnet die Tür
       she opens the door
    b. BENEFICIARY >> PATIENT:
       sie bekommt ein Auto
       she gets a car
    c. PATIENT:
       die Tür geht auf
       the door goes open
       ‘the door opens’

Crucially, subjects are not inherently linked to any specific thematic role. This means that subjecthood does not contribute anything to semantic interpretation. The subject condition is a constraint on clause structure as such. Similarly, subjects are not inherently linked to any specific discourse function. If we adopt Choi’s (1997: 75) distinction of Topic, Focus, and Tail in German (adapted from Vallduví 1992), it follows that subjects may be linked to each of the discourse functions:

(8) a. Subject = Topic:
       Obama wird die nächsten Wahlen GEWINNEN
       Obama will the next elections win
    b. Subject = Focus:
       die nächsten Wahlen wird OBAMA gewinnen
       the next elections will Obama win
    c. Subject = Tail:
       die nächsten Wahlen wird Obama SICHER gewinnen
       the next elections will Obama surely win

SpecCP: Let us now turn to the other type of expletive insertion. German is a so-called verb-second language (more precisely: a finite-second language). German declarative main clauses obey a verb-second constraint: There is one (and only one) constituent position (SpecCP, the German prefield position) before the finite verb which must be filled. We have already seen that SpecCP is linked neither to any syntactic function (subject, object, etc.) nor to any particular thematic role. As regards information structure, the SpecCP position does not seem to be linked to a particular discourse function: The relationship between discourse-pragmatic functions and the respective means of expression is unusually complex, disparate and partly contradictory in German (see Musan 2002 for an overview). However, it is uncontroversial that the SpecCP position can be used for topicalization:


 Guido Seiler

(9) a. [DP dieses Buch] hat Anna heute in die Bibliothek gebracht
       this book has Anna today to the library brought
    b. [PP in die Bibliothek] hat Anna heute dieses Buch gebracht
       to the library has Anna today this book brought
    c. [VP dieses Buch in die Bibliothek gebracht] hat Anna heute
       this book to the library brought has Anna today

Most interestingly, SpecCP-topicalization is limited to one constituent only. From the perspective of communicative function one could easily think of contexts where topicalization of two constituents would be appropriate, e. g. dieses Buch and in die Bibliothek. The resulting sentence is understandable and even discourse-pragmatically interpretable for speakers, but it is ill-formed due to its violation of the verb-second constraint:

(9) d. [DP dieses Buch] [PP in die Bibliothek] hat Anna heute gebracht
       this book to the library has Anna today brought
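The verb-second constraint, as stated above, amounts to a purely positional check: exactly one constituent precedes the finite verb. The following Python sketch illustrates this under simplifying assumptions. The clauses are bracketed by hand, and the category labels "FIN", "ADV" and "V" are introduced here for illustration only; nothing in the sketch parses German.

```python
# Each clause is a hand-bracketed list of (category, words) pairs.
# "FIN" is an assumed label marking the finite verb.
def is_verb_second(clause):
    """True if exactly one constituent precedes the single finite verb."""
    categories = [category for category, _ in clause]
    return categories.count("FIN") == 1 and categories.index("FIN") == 1


# Example (9a): one constituent in the prefield.
ex_9a = [("DP", "dieses Buch"), ("FIN", "hat"), ("DP", "Anna"),
         ("ADV", "heute"), ("PP", "in die Bibliothek"), ("V", "gebracht")]

# Example (9d): two constituents in the prefield.
ex_9d = [("DP", "dieses Buch"), ("PP", "in die Bibliothek"), ("FIN", "hat"),
         ("DP", "Anna"), ("ADV", "heute"), ("V", "gebracht")]

print(is_verb_second(ex_9a))  # True
print(is_verb_second(ex_9d))  # False
```

The check makes no reference to thematic roles or discourse functions, which mirrors the point that the constraint holds regardless of what the prefield constituent contributes communicatively.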

Case marking: Why does a language have case marking? Blake (2001: 1) defines the function of case as “a system of marking dependent nouns for the type of relationship they bear to their heads”. Applied to transitive predicates, this means that nominative and accusative (or ergative and absolutive) have the function of morphologically distinguishing subjects from objects. In a predicate HIT(Hans, Peter) it is functional to be able to distinguish between the hitting and the hit participant. This can be achieved by means of word order, verbal agreement, or morphological or adpositional case marking. However, in many transitive predicates grammatical means of subject-object distinction are completely useless:

(10) a. BUY(Hans, the car)
     b. EAT(my brother, a tomato)
     c. WRITE(Anna, a letter)

A typical transitive predicate combines an argument which is high in definiteness and animacy with an argument which is low in definiteness and animacy. Languages with differential subject or object marking have grammaticalized the prototypical decrease in definiteness and animacy from subject to object insofar as only less typical subjects or objects require an overt marker of the grammatical function (cf. Aissen 2003 among others). Thus, the fact that such languages make use of overt case marking especially in those predications with potential (not necessarily actual) ambiguities is not too surprising from a functional perspective on case. What is surprising indeed is the obligatoriness of case marking in strict case marking languages (i. e. languages without differential subject or


object marking) like German: Here, objects are required to be expressed in the accusative case regardless of potential or actual ambiguities. In a sense, strict case marking languages like German are by far too explicit with respect to the subject-object distinction. It seems that these languages make use of the available case morphology with no regard to communicative function but simply because the structural configuration requires it. As for German, this observation is even true if looked at from a different perspective. In Modern German, the nominative-accusative distinction of noun phrases containing a determiner is very weak, for the distinction is overtly expressed only in the masculine singular. In addition, and even worse, German notoriously lacks any case morphology in proper names. Yet in the written standard variety proper names are not accompanied by a determiner. Nominative and accusative of proper names are thus identical in their form. But proper names are high in both animacy and definiteness, so if we had the freedom to distribute case morphemes over parts of the lexicon, we would certainly give them to proper names in the first place! In sum, German case is dysfunctional in two ways. On the one hand, German uses too much case (in unambiguous transitive predicates); on the other hand, case is too limited (as far as proper names are concerned). To conclude this section, let us briefly come back again to the general issue of (absence of) extrasyntactic functionality of the discussed examples. We have argued that each example poses great difficulties in terms of functional motivation as it is generally understood in the functionalist literature. However, we are convinced that expletives, verb-second, case marking, etc. do fulfill a certain function in the linguistic system – just not a function at the level of semantics, pragmatics, or iconic encoding. 
In order to identify this function it is necessary to overcome the bilateral-semiotic view as it is practiced e. g. in Construction Grammar. At the core of the bilateral-semiotic view lies the assumption that grammar at all of its levels consists of a collection of constructional schemes, i. e. form–meaning pairings of varying size. The intuition is that grammar has the job of packing particular meanings into particular constructions. But grammar has other functions than just that. It guarantees structural well-formedness. Whereas particular patterns of well-formedness are not related to any particular meaning or communicative function, grammar as a whole has a function indeed: It makes processing easier. Thus, ultimately the purpose of structural well-formedness as such is to make communication more efficient. This can be achieved in various ways, but it has to be done somehow. How exactly well-formedness is realized is to a great extent specific to a language (as long as the language remains within the constraints of possible cross-linguistic variation); some languages put their objects in front of the verb, others after it – the important thing is that they have to do it somehow. If we think of typological variation in terms of different rankings


of Optimality-theoretical constraints, it is random how exactly a single language ranks the constraints. But it has to rank them somehow in order to efficiently function in communication: It is not so much the particular constraint ranking which is functional but the mere fact that constraints are ranked in a language at all. Having accepted that well-formedness as such can ultimately be motivated in terms of communicative function, it is a surprising fact about theoretical linguistics that the notion of well-formedness does not play a role in the functionalist paradigm, which has left the territory of well-formedness to the formalist school.
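The point about constraint rankings can be made concrete with a toy evaluator. The sketch below assumes the standard Optimality-theoretic procedure of comparing candidates' violation counts constraint by constraint, starting from the highest-ranked constraint. The two constraints, OBJ-LEFT and OBJ-RIGHT, and the two candidates are hypothetical names invented for this illustration; they are not taken from the text or from any particular analysis.

```python
def optimal(candidates, ranking):
    """Return the candidate with the best violation profile under a ranking.

    Violation counts are compared constraint by constraint, highest-ranked
    first, which is a lexicographic comparison of tuples.
    """
    return min(candidates,
               key=lambda name: tuple(candidates[name][c] for c in ranking))


# Hypothetical constraints: OBJ-LEFT penalizes an object after the verb,
# OBJ-RIGHT penalizes an object before the verb.
candidates = {
    "object-verb": {"OBJ-LEFT": 0, "OBJ-RIGHT": 1},
    "verb-object": {"OBJ-LEFT": 1, "OBJ-RIGHT": 0},
}

print(optimal(candidates, ["OBJ-LEFT", "OBJ-RIGHT"]))  # object-verb
print(optimal(candidates, ["OBJ-RIGHT", "OBJ-LEFT"]))  # verb-object
```

Either ranking yields a single well-formed order. With no ranking at all, the two candidates would tie and the evaluator could not decide between them, which is one way to read the claim that what matters is that constraints are ranked at all.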

2.2 The structure–function paradox and its possible solution

In a formalist view on syntax, syntactic structure is to some degree immune against usage. Autonomous syntactic structure is taken as evidence for the innateness of fundamental, abstract principles of structural organization (Universal Grammar). In our proposal we defend the existence of autonomous syntax (in line with formalist linguistics, contra functionalism) but we do not follow formalism in interpreting autonomy as evidence for innate principles. This does not inevitably mean that the idea of Universal Grammar must be rejected altogether. We only believe that the hypothesis of Universal Grammar does not necessarily (though possibly) follow from the existence of autonomous syntax. In other words, autonomy of syntax is not very convincing evidence of Universal Grammar. A much more striking type of evidence is cross-linguistic generalizations. Syntactic structure, autonomous or not, must come from somewhere – if not from innate principles, it must stem from the function of language in communication and constraints on language use. As far as autonomous parts of syntax are concerned this is, of course, a paradox: How can functional factors shape structural traits which are functionally unmotivated? The solution lies in a literal understanding of the above-mentioned verb “stem from”: Functional constraints on usage influence the ways syntax develops over time, how variants are selected by speakers increasingly often until obligatorization. However, the results of these processes may well be independent of the functional factors which formerly drove the emergence and spread of a variant. Thus, there is a causal relationship between structure and function, but an indirect one: Usage drives change, but is rather irrelevant for the synchronic structural shape of a syntactic pattern. The idea that diachrony is essential for our understanding of the structure–function relationship is not new. 
It is most explicitly formulated by Haspelmath (1999a: 183–184), who observes that constraints on structural markedness as assumed by Optimality Theory are often functionally motivated. He proposes that the


link between structure and function can be constructed only via diachrony, i. e. processes of variation and selection (1999a: 187–189). What is new in the present proposal is the claim that functional factors may become obsolete over time, thus enhancing autonomy.

Our central assumptions, which will be exemplified in detail in Section 3, are the following:

(i) There is autonomy of syntax.
(ii) Autonomy of syntax is the result of language use and diachronic development.
(iii) The cognitive mechanism by which autonomous syntactic structure is diachronically implemented is analogy.

Ad (i): In Section 2.1 above, we defined autonomy of syntax as a cover term for those aspects of syntax which cannot be motivated by anything extrasyntactic such as meaning, communicative function or general cognitive principles (e. g. iconicity). We thus assume that there are aspects of syntax which are arbitrary from a functional perspective. Arbitrariness of the linguistic sign is one of the most fundamental design features of human language (Hockett 1960) and one of the central insights of modern structural linguistics. It is a surprising fact that arbitrariness has been disputed at all in the area of syntax, given the fact that arbitrariness is a standard assumption about vocabulary, but also about morphology (cf. morphomes, Aronoff 1994). Even in phonology we find arbitrary traits, i. e. traits which cannot be motivated in terms of articulation or perception, namely opacity (Kiparsky 1973). So, if arbitrariness is a fundamental property of human language structure as a whole, why should syntax be an exception?

Ad (ii): Having accepted that syntactic autonomy exists, one has to ask where it comes from. I do not see any obvious reason to conclude from autonomy to innateness of basic principles of syntactic organization. Rather, I propose again that we should learn from phonology and morphology. 
Phonological opacity goes back to formerly transparent, phonetically motivated alternations which persist even at a time when the motivating factor is lost, thus gaining a certain degree of autonomy, or phonetic arbitrariness. Morphomes, such as e. g. arbitrary inflectional classes, are often the synchronic reflex of transparent, e. g. semantically motivated distinctions at an earlier stage of the language. Again, the relevance of the motivating factor has decreased or even been lost entirely. Phonological and morphological autonomy have in common that they both emerge through diachronic processes, namely diachronic processes of a special kind: loss of conditioning environment (henceforth LOCE). LOCE is a very common pathway of language change, and it would be surprising if syntax were an exception. The loss of semantic or pragmatic conditioning in the development of syntactic structure was a central observation in early grammaticalization research,


under the term “syntactization”. What Givón (1979) describes in the following citation can be understood as LOCE at the syntactic level: “Loose, paratactic, ‘pragmatic’ discourse structures develop – over time – into tight, ‘grammaticalized’ syntactic structures. […] Language […] takes discourse structure and condenses it – via syntactization – into syntactic structure” (Givón 1979: 108; emphasis mine). From the perspective of LOCE, we can paraphrase syntactization as follows: At some earlier time, the use of an expression was dependent on the presence of a particular pragmatic context. It required an extrasyntactic trigger. Later, the expression gained a certain degree of autonomy with regard to extrasyntactic factors.

Ad (iii): Strictly speaking, LOCE does not necessarily mean that once an old distributional pattern is lost a new one emerges: The distribution of expressions may become totally random from a synchronic point of view. However, in the interesting cases the new distribution of expressions also follows a certain pattern, but one which no longer reflects the old motivating factor. Thus, in order for syntactization to work a new distributional pattern must be established which is syntactic in essence. It is often the case that an expression is formerly used only under certain extrasyntactic contextual conditions which are then dropped such that the syntactic environment alone triggers the use of that expression. That is, a grammatical pattern is extended from a source environment to other cases. This is, of course, the classical definition of analogical extension. Analogical extension starts from a source context and affects items in a (larger) target context which shows some functional or structural similarity to the source context. Analogical extension may affect all items within a given target context, in which case we speak about obligatorization, or syntactization as far as a syntactic pattern is concerned. 
I assume that analogical extension is the mechanism by which autonomous syntactic structure is implemented diachronically. The term “analogy” describes both a cognitive mechanism and a common pathway of diachronic change, as Bybee (2010) emphasizes: “It is important to note that analogy as a type of historical linguistic change is not separate from analogy as a cognitive processing mechanism” (Bybee 2010: 72). The literature on analogy is abundant. There is some agreement among authors that analogy is a more general, domain-independent cognitive principle (cf. Blevins and Blevins 2009; Itkonen 2005; cf. also Gentner, Holyoak and Kokinov 2001, without particular reference to language structure). Also, authors emphasize the importance of similarity relations in analogy (Itkonen 2005: chapter 1.1; Bybee 2010: 57; de Smet 2012: 603). Non-technically speaking, we might understand analogical extension as an instance of the general tendency to use similar strategies for similar tasks. If you have learned to eat spaghetti by rotating a fork you will rotate the fork for linguine, too, thus eat linguine in analogy to spaghetti. More


specifically, and with regard to language structure, we can distinguish between similarities in terms of communicative function and similarities in terms of structural makeup. It is often very difficult to tell functional and structural similarities apart, and it is probable that both closely interact in analogical change (Itkonen 2005: 1). I illustrate the interaction of functional and structural similarities by reference to a fraction of Old High German inflectional morphology. In early Old High German a small number of neuter nouns of the so-called “strong” inflectional class displayed a stem allomorphy, e. g. chalb- / chelbir- ‘calf’, whereby the second allomorph was used when a suffix followed: chalb (Nom.Sg.), but chelbir-e (Dat.Sg.) (Braune 2004: 188). The majority of nouns of the strong inflectional class did not display this kind of stem allomorphy, cf. wort ‘word’ (Nom. Sg.), wort-e (Dat.Sg.) (Braune 2004: 184). In the later development of Old High German the stem allomorphy vanished: chalb (Nom.Sg.), chalb-e (Dat.Sg.). In terms of functional similarity we can define dative singular formation as the task which is common to both wort-e and chelbir-e/chalb-e. However, it is interesting to note that other existing patterns of dative singular formation did not serve as models here, cf. e. g. hërza–hërzen (‘heart’, neuter, “weak” inflectional class) or anst–ensti (‘favor’, feminine, strong inflectional class) (Braune 2004: 203, 207). Obviously, the model for dative singular formation was selected within a specific structural environment, namely within the limits of the strong inflectional class of non-feminines (class distinction and gender were morphomic (thus meaningless) already in Old High German). We might therefore understand functional similarity as the dimension along which similar tasks are grouped together and structural similarity as the dimension along which the potential models for fulfilling the task are grouped together. 
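The Old High German levelling just described has the shape of a four-part proportion, wort : wort-e = chalb : X. The following is a minimal Python sketch of that proportion, assuming plain suffixation and writing the forms without hyphens. The function name is a hypothetical label for illustration, and the sketch deliberately leaves aside how the model is chosen, which the text attributes to structural similarity within the strong non-feminine class.

```python
def extend_by_analogy(model_base, model_derived, target_base):
    """Solve model_base : model_derived = target_base : X.

    The model's suffix is read off the model pair and attached to the target.
    """
    if not model_derived.startswith(model_base):
        raise ValueError("model pair is not related by suffixation")
    suffix = model_derived[len(model_base):]
    return target_base + suffix


# wort : worte = chalb : X
print(extend_by_analogy("wort", "worte", "chalb"))  # chalbe
```

The result corresponds to the later form chalb-e, which replaces the older chelbir-e with its stem allomorphy.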
In other cases the distinction between functional and structural similarities is even more difficult to draw. Taking German subject expletives as an example (see Section 2.1), we might understand the insertion of the dummy pronoun es in finite clauses with weather verbs (lacking any argument positions) as follows: The task is the formation of a finite clause. This is fulfilled in analogy to the prototypical case which here serves as the structural model, i. e. predications involving at least one argument position (of which the one with the most prominent thematic role is assigned the subject function by default). Finally, with regard to analogy-driven change in particular, we understand analogy as grammar optimization, as proposed by Kiparsky (1982, 2012). Kiparsky defines analogical change as “the elimination of unmotivated grammatical complexity or idiosyncrasy” (Kiparsky 2012: 21). Thus, analogical change makes a pattern more general by removing contextual restrictions. This is exactly what happens in cases of syntactization where an expression’s dependence upon specific semantic or pragmatic contexts is weakened and eventually dropped.


It is worth noting that the concept of syntactic analogy is not new at all, although its role has perhaps been underestimated. According to Percival (1971), it goes back to Neogrammarian concepts of change, in particular to Blümel (1914). What does the proposed scenario mean for the relationship between structure and function, and for the division of labor between formal and functional explanations? Morphosyntactic change is driven by forces which are well understood and described in functionalist terms, such as reanalysis, grammaticalization, iconicity and analogical extension. It seems that (the direction of) change is a direct reflection of the ways language is used by speakers to achieve their communicative goals. However, frequent use of grammatical patterns may entrench their structural makeup to such a degree that functional motivations (which enabled the process to get into play) become obsolete. What a concrete example of such usage-driven syntactization (with autonomy as its result) might look like will be discussed in greater detail in Section 3. If the proposal made here is correct, it follows that functional explanations are especially powerful with regard to patterns of variant selection and thus ongoing change, but too limited for a deeper understanding of the resultant, synchronic grammatical structure. It is formal theories of syntax in the first place that are suitable to predict grammatical well-formedness (and this, of course, is exactly what they are designed for).

3 Variation and change: A case study

3.1 The importance of variation and change for theoretical linguistics

In order to capture how syntactization works, it is essential to understand how extrasyntactic triggering of an expression may turn into a syntactic one. There is no better source of data than cross-dialectal variation for a deeper investigation into the subtle differences in the conditioning factors for various expressions. We adopt an approach to language change inspired by evolutionary theory (Haspelmath 1999a; Croft 2000; Seiler 2002, 2003, 2004; de Vogelaer 2007; Rosenbach 2008; cf. Haider, in this volume). According to this view, change is a two-step process: emergence of new variants and selection among available variants. Croft (2000) terminologically distinguishes between “innovation” (≈ emergence) and “propagation” (≈ selection). Croft’s “propagation” is limited to the success of a variant in terms of its sociolinguistic function only. In earlier work (Seiler 2002, 2003, 2004) I proposed supplementing Croft’s “propagation” with the concept of “implementation” which refers to the status of the variant in the respective linguistic system (its valeur linguistique), for Croft’s limitation to social factors in the selection process turned out to be insufficient and entirely ignores linguistic


factors which may become crucial for the selectional success of a variant (cf. de Vogelaer 2007 for a similar point). So, why is cross-dialectal variation important in this context? New variants emerge at some place at some time (often via processes of relatively mechanical, “blind” structural reanalysis, as we will see in the following section). They then gradually spread over larger areas. The first consequence of variant spread for the infected grammars is just the addition of a new option, i. e. spread leads to variant competition in larger areas. However, different dialects often deal with a given set of competing variants in different ways, according to social, functional or structural factors (one might say that dialects do different things with the same set of available expressions). Dialects may develop different functional arrangements between those variants. A variant may become obligatory in dialect A under certain contextual conditions, but not in dialect B, whereas in dialect C other conditions are relevant than in dialects A and B, etc. In short, dialects differ not only in their inventories of variants, but also in the ways variants are implemented in their respective systems of grammar.6 Therefore, cross-dialectal variation offers us the most direct insight into the rise and fall of functional motivations of variant selection.

3.2 Prepositional dative marking in Upper German

The phenomenon under discussion in this section is relatively widespread in Upper (southern) German dialects. It occurs in dialects of Alsace, Baden-Württemberg, German-speaking Switzerland, Bavaria, Austria and South Tyrol. For all kinds of details I refer to previous work (Seiler 2002, 2003, 2004). In these dialects, a dative noun phrase can be preceded by a prepositional marker (DM = dative marker):

(11) sàg’s in der frau
     say-it DM the:Dsf woman
     ‘say it to the woman’
     (Bavarian: Upper Inn Valley; Schöpf 1866: 286)

(12) er git dr Öpfel a mir, statt a dir
     he gives the apple DM me:D instead DM you:D
     ‘he gives the apple to me, not to you’
     (Alemannic: Glarus; Bäbler 1949: 31)

6 We assume that different dialects have different grammars. Dialect variation is just cross-linguistic variation between closely related languages.

The dative marker is homophonous with the local/directional prepositions an ‘at’ or in ‘in’. The distribution of the two sound forms is geographically determined. However, the dative marker is entirely meaningless, and its historical source is probably not a local/directional preposition (see below). The examples above demonstrate that prepositional dative marking is not a periphrasis, i. e. not a strategy to avoid the dative case since dative case morphology is used in this construction, too. Prepositional dative marking is therefore rather a reinforcement of the dative, which is somewhat surprising since Upper German dialects have generally preserved dative inflections anyway (in many dialects the dative is even the only case which is clearly morphologically distinct from the nominative). Thus, prepositional dative marking cannot be interpreted as a compensation for eroding case inflections either. As for the grammatical status of the dative marker, it is not entirely clear whether we are dealing with a preposition or something else (e. g. a prefix). In most respects, the dative marker indeed behaves like prototypical prepositions. Most strikingly, dative marker and preposition occur in complementary distribution:

(13) mit der frau       ‘with the:Dsf woman’
     in der frau        ‘DM the:Dsf woman’
     *mit in der frau   ‘with DM the:Dsf woman’

Other observations suggest that the dative marker is less independent than other prepositions. For example, it does not allow scope over two conjuncts. In Seiler (2003: 148) I analyze the dative marker as an element of the class of prepositions, whereby it is a special property of the dative marker that it is not able to project a prepositional phrase but is rather head-adjoined to the following determiner. As for the emergence of the dative marker, it is argued in Seiler (2003: 215) that reanalysis of article forms plays a crucial part. Already in Middle High German, dative article forms, e. g. the singular masculine dëme, formed fusional morphs with prepositions, whereby the initial dental of the article was dropped: (14) obem 1280, uf(f)em 1270, am 1277, im 1258, underm 1276, us(s)em 1409, vom 1277, vorem 1280, hinderm 1403, bim 1280, zem 1245 (Idiotikon XIII: 1191–1192).

In Upper German the form without an initial dental has been generalized over all other contexts, also in dialects without prepositional dative marking. Thus (with the exception of extremely conservative dialects) the article form became əm, with some variation in the vocalism. There exists a whole paradigm of fusional morphs of preposition plus article, some of which are homophonous with the bare dative article in unstressed position (namely the equivalents of Standard German im ‘in_the’, am ‘at_the’; cf. Seiler 2003: chapter 8.1 for details). It is a relatively obvious step to reanalyze a form əm, which is etymologically just the article, as having the morphological structure preposition + article. This is exactly what happened in a subset of Upper German dialects which developed prepositional dative marking. But why should this reanalysis take place after all? According to Nübling (1992: 221), the most frequent and thus prototypical context for datives is post-prepositional anyway. More than 90 % of datives are governed by a “true” preposition in Upper German. Developing prepositional dative marking means that the prototypical context for dative forms is generalized even over those contexts where no other preposition is there already (e. g. in indirect object function). We might interpret this process as analogical extension: Formerly bare datives are realized in analogy to the more frequent, i. e. post-prepositional occurrence type. Reanalysis as a process of mechanical structural variation produces an element without any particular meaning or function, but with a category label: the dative marker as an expletive preposition. In light of the evolutionary framework as outlined in Section 3.1, language change is a two-step process. Reanalysis simply adds a new variant; indeed, prepositional dative marking and bare datives still coexist in most dialects. However, different dialects deal with this variant competition in different ways, i. e. they show different patterns of variant selection. Moreover, the distribution of the bare vs. prepositional dative can be attributed to more general functional (extrasyntactic) principles. I will focus on the influence of information structure and iconicity here (see Seiler 2003: chapter 7 for other factors). In Alemannic dialects of northern Switzerland there is a strong tendency to insert the dative marker only if the dative noun phrase is focused and bears main sentence stress. 
It is not inserted if another constituent is focused (cf. Seiler 2003: 177–186):

(15) a. Dative ≠ focus:
        dasmal han ich etz dr Marte es BUECH gschänkt
        this_time have I now the:Dsf Martha a book given
        ‘this time, I gave Martha a BOOK’ (Alemannic: Schaffhausen)
     b. Dative = focus:
        ich han s buech i dr MARTE ggëë
        I have the book DM the:Dsf Martha given
        ‘I gave the book to MARTHA’ (Alemannic: Schaffhausen)

Is there a functional motivation available for this kind of distribution? According to Givón (1984), indirect objects are typically secondary topics. It is therefore unusual for a dative to be the focus of the sentence. Prepositional dative marking is more explicit and involves more phonological material than bare datives. Thus,

the more marked situation (dative = focus) is expressed by means of the more marked variant (= prepositional dative marking). This distribution is (constructionally) iconic. A similar point is made by Lambrecht (1994) about the correlation between prosodic prominence and communicative importance:

The interpretation of sentence prosody in terms of communicative intentions is based on the notion of a correlation between prosodic prominence and the relative communicative importance of the prosodically highlighted element, the prosodic peak pointing to the communicatively most important element in the utterance. Prosodic marking is thus in an important sense iconic, since it involves a more or less direct, rather than purely symbolic, relationship between meaning and grammatical form. (Lambrecht 1994: 242)

Thus, in northern Switzerland, where both bare and prepositional datives coexist, their distribution nicely reflects extrasyntactic factors such as information structure and sentence stress, the concrete realization of which corresponds to more general cognitive principles such as iconicity. However, in other dialects prepositional dative marking is obligatory in all contexts. This is the case e. g. in the Muotathal valley of central Switzerland. Here, all dative noun phrases are preceded by the dative marker or another preposition, regardless of discourse function, stress pattern or other factors (distinctiveness of dative morphology, thematic roles, position, determiner category, etc.). The dative marker serves as an expletive which is inserted whenever no other preposition is there already, without respect to any other (in particular extrasyntactic) factor. We interpret this state of affairs as full implementation of prepositional dative marking. Diachronically speaking, dative marker insertion is analogically extended to all datives. Recall that, according to Kiparsky (1982, 2012), analogical extension can be understood as grammar simplification since contextual constraints are dropped.7 A strategy is extended to the whole of a certain context – and in our case this context is purely syntactic, i. e. the target environment of analogical extension is defined on purely structural grounds. Assuming that analogy relies on a similarity relation, similarity here is based on a purely structural description without any reference to function or meaning. How do we get from the Schaffhausen to the Muotathal variant of prepositional dative marking? Is there a way of motivating the analogical extension of the

7 Whereas constraint removal can be understood as the impetus for analogy, its result may also (and paradoxically) be a complexification of the system, as long as obligatorization is not yet reached: “As every working historical linguist knows, analogical changes tend towards improving the system in some way (even if incomplete regularization may paradoxically end up complicating it)”. (Kiparsky 2012: 21)

variant that involves more phonological material? Perhaps it is due to the maxim of “extravagance” which, according to Haspelmath (1999b), plays a central role in grammaticalization processes. Haspelmath discusses why grammaticalization is irreversible. Pursuing a usage-based approach to change in the spirit of Keller (1994), he introduces a maxim of extravagance (“Extravagance: talk in such a way that you are noticed”, Haspelmath 1999b: 1055), which may ultimately cause grammaticalization processes as the unintended cumulative effect of communicative actions: “Grammaticalization is a side effect of the maxim of extravagance, that is, speakers’ use of unusually explicit formulations in order to attract attention” (Haspelmath 1999b: 1043). As an unintended side-effect of increasing use, the more explicit expression may become obligatory. Increasing obligatoriness, however, is nothing else than what we called syntactization earlier, i. e. applied to our case: dative marker insertion due to a purely morphosyntactic constraint on possible environments of dative forms. In sum, every single step of the gradual implementation of prepositional dative marking can be relatively easily motivated on the basis of very general, extrasyntactic, highly usage-based mechanisms such as analogical extension, iconicity and “extravagance”. However, the result of these processes cannot. Muotathal speakers certainly do not focus their datives all the time. The example of prepositional dative marking shows that functional factors provide a plausible explanation for selectional preferences during a phase of variant competition and for further implementation of the variant in question. At the same time, it is true that functional motivations which promote the implementation of a variant may become obsolete once the variant is implemented further. 
As for obligatory, fully syntactisized prepositional dative marking, it is not only impossible to ascribe it any extrasyntactic function: It is unnecessary. The dative marker is inserted because the syntax wants it. Any search for a functional motivation within the synchronic state of the language misses the generalization.
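The contrast between the two selection patterns described above can be restated as a minimal sketch in Python. All names in it (DativeNP, realize, the system labels "bare", "focus_sensitive" and "obligatory") are hypothetical and serve illustration only. Two simplifications are assumptions rather than claims of the analysis: the Schaffhausen-type pattern is treated as categorical although the text describes a strong tendency, and the marker form defaults to i as in example (15b), since its form varies geographically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DativeNP:
    words: str                         # dative-marked noun phrase, e.g. "dr Marte"
    preposition: Optional[str] = None  # governing "true" preposition, if any
    focused: bool = False              # True if the NP is focused and bears main stress


def realize(np: DativeNP, system: str, marker: str = "i") -> str:
    """Return the surface string of a dative NP under one selection pattern.

    The system labels and the default marker form are illustrative assumptions.
    """
    if np.preposition is not None:
        # Marker and preposition are in complementary distribution (*mit in der frau).
        return f"{np.preposition} {np.words}"
    if system == "focus_sensitive":    # Schaffhausen type, idealized as categorical
        insert = np.focused
    elif system == "obligatory":       # Muotathal type: purely structural trigger
        insert = True
    elif system == "bare":             # dialect without prepositional dative marking
        insert = False
    else:
        raise ValueError(f"unknown system: {system}")
    return f"{marker} {np.words}" if insert else np.words


marte = DativeNP("dr Marte")
marte_focus = DativeNP("dr MARTE", focused=True)
for system in ("bare", "focus_sensitive", "obligatory"):
    print(system, "|", realize(marte, system), "|", realize(marte_focus, system))
print(realize(DativeNP("der frau", preposition="mit"), "obligatory", marker="in"))
```

In the "obligatory" branch the focused attribute is never consulted. This is the sense in which analogical extension drops a contextual constraint: the only condition left is the structural one, namely the absence of another preposition.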

4 Concluding remarks: Lessons from evolutionary biology

In this chapter it was argued that both formal and functional approaches in linguistics are explanatory, but at different levels. It was shown that syntax contains traits which cannot be motivated on the basis of extrasyntactic function in any direct way. We called this class of phenomena syntactic autonomy. Methodologically, it seems fully appropriate to us to make use of the analytical tools provided by the formalist tradition in order to capture abstract, purely structural regularities and relationships. Functionalist argumentations run the risk of overinterpreting autonomous traits of syntax by searching for extrasyntactic motivations where none exist. Based on the example of prepositional dative marking in Upper German, we have shown that the patterns of variant selection found in some dialects can indeed be motivated extrasyntactically whereas in other dialects dative marker insertion is purely syntactically triggered, which makes the search for a functional motivation not only a difficult, but also a pointless task: Here, prepositional dative marking is due to syntactic well-formedness. We have hypothesized that well-formedness as such does have a communicative function insofar as it makes communication more efficient, yet the concrete instantiations of well-formedness in a particular language are often independent of concrete functional motivations. According to our hypothesis, autonomy of syntax is the result of diachronic development – processes of changes in variant selection which often reflect more general, i. e. extrasyntactic cognitive principles such as analogical extension, iconicity or “extravagance”. These must be understood in functionalist or usage-based terms. Paradoxically, analogical extension may lead to syntactization which makes the functional factors formerly promoting the selection of a particular variant obsolete: Whereas pathways of change may be motivated by language use and communicative function, these processes may ultimately enhance syntactic autonomy. If this reasoning is on the right track, it means that functional explanations are actually diachronic explanations. Extrasyntactic motivations are at play especially as long as a variant is not yet fully syntactisized. Another consequence is the fact that the synchronic structural makeup of a syntactic pattern is not determined by its function. Knowing the function of a construction tells us little or nothing about its formal structure. Interestingly, a similar point can be made from the perspective of evolutionary biology. 
Venomous snakes use their poison for hunting and digesting their prey in the first place. It is functional for the snake not to waste the poison for defense. There are two basic strategies which limit the use of poison for defense: camouflage and warning. As for warning, different species display different patterns: warning gestures (cobras), warning sounds (rattlesnakes), or warning colors (coral snakes). Important in our context is the fact that the function of those patterns does not determine their structural makeup and therefore leaves space for formal variation. Also, from a diachronic perspective, form may follow function only on the basis of inherited traits. Languages can never invent things ex nihilo (even if that would be extremely functional); they can only transform devices which are there already. Most aspects of the structure of a language are determined by the fact that they are inherited from the language spoken by the preceding generation, regardless of whether they are functional or not, whether they are good representatives of a language universal or not, whether they reflect cross-linguistic preferences

or not. Only changes in that structure are in a more direct way interpretable as adjustments towards more general, structural or functional tendencies. Things cannot be invented ex nihilo in biology, either. The predecessors of sea urchins were sessile and had no limbs. Later, pre-sea urchins began to move, perhaps as a reaction to a change in their environment. Evolution did not invent new limbs because there was nothing which could be transformed into limbs, due to the pentaradial-symmetric structure of the pre-sea urchin’s body. But pre-sea urchins had spines, and indeed today’s sea urchins use their spines for motion (Knop 2008: 9). Finally, if the synchronic grammar allows for a great degree of autonomy, i. e. independence of functional motivations, one question remains: Is all syntactic structure just historical contingency? Given our assumptions, shouldn’t it be the case that anything goes in syntax, without respect to limitations of possible cross-linguistic variation? Probably not. First, certain types of change are likely to occur and produce certain kinds of synchronic structure. This idea has been elaborated in great detail in the field of phonology (Blevins 2004). According to Blevins’ theory of evolutionary phonology, cross-linguistically recurrent patterns are not so much due to (innate) language universals but rather due to the fact that these patterns are the results of common types of phonological change. It is worth considering to what extent this approach is applicable to syntax as well (evolutionary syntax in analogy to evolutionary phonology). Second, even linguists who are generally skeptical about the idea of Universal Grammar must acknowledge the striking fact that the syntaxes of all languages have something to say about constituent structure, recursion, grammatical function, lexical classes and basic principles of case marking and agreement. 
Whereas Culicover and Jackendoff (2005) reject the concrete instantiation of Universal Grammar as suggested by mainstream generativist linguistics in its technical detail, they nonetheless maintain the idea that limitations of cross-linguistic variation cannot be understood without reference to a downsized version of Universal Grammar, which consists exactly of the ingredients quoted above (Culicover and Jackendoff 2005: 40). Let us now construct a last, more far-reaching analogy to evolutionary biology. We have tried to show that both structure-driven and function-driven explanations are justified in linguistics: Both structural and functional causations are at play in syntactic patterning, variation and change. Having accepted that both explanations are necessary, a central question of theoretical linguistics must be: In what ways do structure and function interact, and in what sense are they independent of each other? How can we talk about structural and functional causations in an objective, non-sectarian way? The answer is clear: by acknowledging that they are complementary. Structural and functional approaches explain different aspects of language. They are, so to speak, in complementary

distribution, and this is exactly the reason why they are ultimately compatible with each other. Evolutionary biology could serve as a model for the integration of different, but compatible levels of explanation. According to Nesse (2009), biologists distinguish between two levels of explanation – proximate and evolutionary – which coexist side by side and are complementary to each other: “The most fundamental distinction in biology is between proximate and evolutionary explanations. Proximate explanations are about a trait’s mechanism […]. Evolutionary explanations are about how the mechanism came to exist. These two kinds of explanation do not compete. They are fundamentally different. Both are essential for a complete explanation” (Nesse 2009: 158). Based on the fundamental distinction between proximate and evolutionary explanations, Tinbergen (1963) even distinguishes between four questions a biologist must deal with in order to arrive at a complete explanation of a trait. Tinbergen’s questions enhance the proximate–evolutionary distinction with the dimensions of ontogeny and phylogeny. They are “now nearly universal as a foundation for the study of animal behavior […]. Textbooks all begin by explaining the need for all four kinds of explanation” (Nesse 2009: 159):

(16) Tinbergen’s Four Questions (following Nesse 2009: 159):
     1. What is the mechanism? [proximate]
     2. What is the ontogeny of the mechanism? [proximate]
     3. What is the phylogeny of the mechanism? [evolutionary]
     4. What selection forces shaped the mechanism? [evolutionary]

The proximate–evolutionary distinction was introduced and promoted mainly by evolutionary biologist Ernst Mayr. As Nesse (2009: 159) points out, Mayr’s terminology has caused confusion insofar as he sometimes calls “ultimate” what is usually called “evolutionary”, and “functional” what is usually called “proximate”, as we will see below. Despite the terminological and technical details, the crucial point about the proximate–evolutionary distinction is its role in the historical development of the discipline. As Mayr (1997) himself points out, there was an immense controversy in biology, too, which is surprisingly reminiscent of the formalist vs. functionalist divide in theoretical linguistics. One camp of biologists claimed that biological traits must be explained on the basis of the instructions given by genetic programs. This is the type of biological explanation which we called proximate. The other camp defended the view that explanations must be formulated in terms of the function of a trait in its evolutionary context. This is the type of biological explanation which we called evolutionary. (In Mayr’s own terminology, “functional” refers to proximate explanations, which is the source of the terminological confusion mentioned above.) Mayr (1997) states:

Every phenomenon or process in living organisms is the result of two separate causations, usually referred to as proximate (functional) causations and ultimate (evolutionary) causations. All the activities or processes involving instructions from a program are proximate causations. [...] Ultimate or evolutionary causations are those that lead to the origin of new genetic programs or to the modification of existing ones – in other words, all causes leading to the changes that occur during the process of evolution. [...] It is nearly always possible to give both a proximate and an ultimate causation as the explanation for a given biological phenomenon. [...] Many famous controversies in the history of biology came about because one party considered only proximate causations and the other party considered only evolutionary ones. (Mayr 1997: 67)

The debates in biology and linguistics do not, of course, match in detail. For example, one might ask whether functional explanations in linguistics are analogous to evolutionary explanations of phylogeny in biology, to phenotypic plasticity of organisms (van Buskirk and Schmidt 2000), or to both.8 However, the fundamental structure of the debates in biology and linguistics is astonishingly similar. In both disciplines, two schools defended their way of explaining aspects of nature as the only possible one at their time: proximate vs. evolutionary in biology, formal vs. functional in linguistics. The main difference between biology and linguistics lies in the fact that the complementarity (and compatibility) of the two kinds of explanation has been widely accepted by biologists since the modern evolutionary synthesis some seventy years ago. A modern linguistic synthesis is yet to come. For linguists, this is not exactly something to be proud of.

References

Aissen, Judith (2003): Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory 21: 435–483.
Aronoff, Mark (1994): Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA: MIT Press.
Bäbler, Heinrich (1949): Glarner Sprachschuel: Mundartsprachbuch für die Mittel- und Oberstufe der Glarner Schulen. Glarus: Verlag der Erziehungsdirektion.
Blake, Barry (2001): Case. 2nd ed. Cambridge: Cambridge University Press.
Blevins, James P. and Juliette Blevins (eds.) (2009): Analogy in Grammar. Form and Acquisition. Oxford et al.: Oxford University Press.

8 Phenotypic plasticity leaves room for direct interactions between traits and environment, whereas in phylogeny this interaction is mediated by evolution. Nevertheless, the existence of phenotypic plasticity simultaneously calls for proximate and evolutionary explanations: How does it work, and how did it come into being?

Blevins, Juliette (2004): Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.
Blümel, Rudolf (1914): Einführung in die Syntax. Heidelberg: Winter.
Braune, Wilhelm (2004): Althochdeutsche Grammatik. Edited by Ingo Reiffenstein. Tübingen: Niemeyer.
Bresnan, Joan (2001): Lexical-Functional Syntax. Malden, MA/Oxford, UK: Blackwell.
Bybee, Joan (2010): Language, Usage and Cognition. Cambridge et al.: Cambridge University Press.
Choi, Hye-Won (1997): Optimizing Structure in Context. Scrambling and Information Structure. Stanford: Center for the Study of Language and Information.
Croft, William (1995): Autonomy and functionalist linguistics. Language 71: 490–532.
Croft, William (2000): Explaining Language Change. Harlow et al.: Longman.
Culicover, Peter W. and Ray Jackendoff (2005): Simpler Syntax. Oxford et al.: Oxford University Press.
de Smet, Hendrik (2012): The course of actualization. Language 88: 601–633.
de Vogelaer, Gunther (2007): Darwinian or Lamarckian change: innovative 2pl.-pronouns in English and Dutch. In: Frank Brisard (ed.): Papers of the Linguistic Society of Belgium, 1–14. Bruxelles: Linguistic Society of Belgium.
Fanselow, Gisbert and Sascha W. Felix (1993): Sprachtheorie I: Grundlagen und Zielsetzungen. 3rd ed. Tübingen: Francke.
Gentner, Dedre, Keith J. Holyoak and Boicho N. Kokinov (eds.) (2001): The Analogical Mind: Perspectives from Cognitive Science. Cambridge, MA/London: MIT Press.
Givón, Talmy (1979): On Understanding Grammar. New York: Academic Press.
Givón, Talmy (1984): Direct object and dative shifting: semantic and pragmatic case. In: Frans Plank (ed.), Objects. Towards a Theory of Grammatical Relations, 151–182. London/New York: Academic Press.
Halliday, Michael A. K. (1973): Explorations in the Functions of Language. London: Arnold.
Haspelmath, Martin (1999a): Optimality and diachronic adaptation. Zeitschrift für Sprachwissenschaft 18: 180–205.
Haspelmath, Martin (1999b): Why is grammaticalization irreversible? Linguistics 37: 1043–1068.
Hockett, Charles F. (1960): The origin of speech. Scientific American 203: 88–96.
Idiotikon (1881–): Schweizerisches Idiotikon. Wörterbuch der schweizerdeutschen Sprache. Begonnen von Friedrich Staub und Ludwig Tobler und fortgesetzt unter der Leitung von Albert Bachmann, Otto Gröger, Hans Wanner, Peter Dalcher, Peter Ott, Hanspeter Schifferle. Frauenfeld: Huber.
Itkonen, Esa (2005): Analogy as Structure and Process: Approaches in Linguistics, Cognitive Psychology and Philosophy of Science. Amsterdam/Philadelphia: John Benjamins.
Keller, Rudi (1994): Language Change: The Invisible Hand in Language. London: Routledge.
Kiparsky, Paul (1973): Abstractness, opacity and global rules. In: Osamu Fujimura (ed.), Three Dimensions of Linguistic Theory, 57–86. Tokyo: Tokyo Institute for Advanced Studies of Language.
Kiparsky, Paul (1982): Explanation in Phonology. Dordrecht: Foris.
Kiparsky, Paul (2012): Grammaticalization as optimization. In: Dianne Jonas, John Whitman and Andrew Garrett (eds.), Grammatical Change: Origins, Nature, Outcomes, 15–51. Oxford et al.: Oxford University Press.
Knop, Daniel (2008): Seeigel im Meerwasseraquarium. Münster: Natur und Tier.

Lambrecht, Knud (1994): Information Structure and Sentence Form. Topic, Focus, and the Mental Representations of Discourse Referents. Cambridge: Cambridge University Press.
Mayr, Ernst (1997): This is Biology. The Science of the Living World. Cambridge, MA/London: Harvard University Press.
Musan, Renate (2002): Informationsstrukturelle Dimensionen im Deutschen. Zur Variation der Wortstellung im Mittelfeld. Zeitschrift für germanistische Linguistik 30: 198–221.
Nesse, Randolph M. (2009): Evolutionary and proximate explanations. In: David Sander and Klaus R. Scherer (eds.), The Oxford Companion to Emotion and the Affective Sciences, 158–159. Oxford: Oxford University Press.
Nübling, Damaris (1992): Klitika im Deutschen. Tübingen: Narr.
Percival, Keith W. (1971): The Neogrammarian approach to syntactic change. Manuscript presented at the Twenty-Fourth Annual University of Kentucky Foreign Language Conference in Lexington, Kentucky, 22–24 April 1971. http://people.ku.edu/~percival/NeogramSyntax.html.
Rosenbach, Annette (2008): Language change as cultural evolution: Evolutionary approaches to language change. In: Regine Eckardt, Gerhard Jäger and Tonjes Veenstra (eds.), Variation, Selection, Development: Probing the Evolutionary Model of Language Change – Proceedings of Blankensee Colloquium 2005, 23–72. Berlin/New York: Mouton de Gruyter.
Schöpf, Johann Baptist (1866): Tirolisches Idiotikon. Innsbruck: Wagner.
Seiler, Guido (2002): Prepositional dative marking in Upper German: a case of syntactic microvariation. In: Sjef Barbiers, Susanne van der Kleij and Leonie Cornips (eds.), Syntactic Microvariation, 243–279. Amsterdam: Meertens Instituut. Available at: www.meertens.knaw.nl/projecten/sand/synmic/.
Seiler, Guido (2003): Präpositionale Dativmarkierung im Oberdeutschen. Stuttgart: Steiner.
Seiler, Guido (2004): The role of functional factors in language change. An evolutionary approach. In: Ole Nedergaard Thomsen (ed.), Competing Models of Linguistic Change. Evolution and beyond, 163–182. (Current Issues in Linguistic Theory 279.) Amsterdam/Philadelphia: John Benjamins.
Siewierska, Anna (1991): Functional Grammar. London: Routledge.
Vallduví, Enric (1992): The Informational Component. New York: Garland.
van Buskirk, Josh and Benedikt R. Schmidt (2000): Predator-induced phenotypic plasticity in larval newts: trade-offs, selection, and variation in nature. Ecology 81: 3009–3028.

Rena Torres Cacoullos, Penn State University

Gradual loss of analyzability: Diachronic priming effects

Abstract: Competing accounts of the formation of grammatical units are tested by deploying the facts of variation of the Spanish Progressive. First, unithood and frequency measures support usage-based chunking as more tenable than formal reanalysis as an account of change in constituency. Second, comparison of multivariate models of variation over time reveals that the spread of the Spanish Progressive relative to the simple Present has been differential, as shown in change in the linguistic conditioning of variant choice, in disagreement with an abrupt-reanalysis, constant-rate hypothesis but in support of gradual change in diachrony and inherent variability in synchrony. Third, a priming effect – such that selection of a given construction is favored by previous use of a related construction (here, priming of the estar Progressive by non-Progressive estar constructions) – is introduced as a measure of internal structure, in particular, of (loss of) analyzability.

1 Introduction

How do grammatical units come about, and how can change in constituency be observed? Reanalysis is widely invoked by linguists of otherwise different persuasions as a pivotal mechanism of syntactic change. Reanalysis is understood to change underlying structure, including constituency and syntactic-category labels (Campbell 1998: 284). For example, the English future auxiliary is said to result from reanalysis of the purposive motion construction of main verb go with a non-finite clause complement, as represented by rebracketing of some kind: [BE going [to Verb]] > [BE going to Verb] (Hopper and Traugott 1993: 3). A material indication of such reanalysis would be phonetic reduction of going to to gonna. Reanalysis has been seen as abrupt, following from the view that each word sequence must have a unique constituent analysis, which in turn follows from the formalist (generative) view that proposed syntactic rules or constraints are categorical and that syntactic categories, for example, main vs. auxiliary verb, are discrete. But the facts of synchronic variation, as between going to and gonna (e. g., Poplack and Tagliamonte 1999: 328–332), disturb an understanding of grammatical change as abrupt reanalysis.

The probabilistic aspects of grammar (Labov 1969; Cedergren and Sankoff 1974) are now being recognized by more linguists, who are exploring usage-based and emergentist theories of grammar. Bybee (2010) proposes that constituent structure is derivable from domain-general mechanisms in operation as speakers produce and process language. Pertinent here is the fusing of sequential experiences that occurs with repetition or, for language, the chunking of frequent word sequences as single processing units (Bybee 2010: 34 and references therein). For example, the vowel of don’t is more likely to reduce to a schwa in I don’t know than when the main verb is a less frequent one, as in I don’t inhale, even though the two expressions are apparently of the same syntactic structure (Scheibman 2001: 114). In a usage-based view, a consequence of frequent repetition and ensuing chunking of contiguous linguistic units is the loss of analyzability of the sequence of (erstwhile) units (Bybee 2010: 44–45; see also Croft and Cruse 2004: 250–253; Langacker 1987: 292). Analyzability is seen as a morphosyntactic parameter that has to do with the degree to which the internal structure is discernable (akin to the morphological “decomposability” of complex words (cf. Hay 2001)), which is not subsumable under a semantic criterion. For example, while pull strings has a non-transparent meaning that is not predictable from pull and strings, it is syntactically analyzable, in that speakers presumably recognize the component parts as individual words and the relation between them, here a verb with its object. With schematic (productive) constructions that have open classes of items, such as [BE going to Verb], loss of analyzability is understood as the weakening of the association between the erstwhile individual components with other instances of the same items. In Bybee’s (2003: 618) example, as going to reduces to gonna, its composite morphemes lose their association with go, to or -ing. 
But what observations provide evidence for “association” and its loss? In this paper, the facts of variation are deployed to tackle the question of how grammatical units come about. I use variability in the Spanish Progressive to test gradualness vs. abruptness in the formation of grammatical units and put forward diachronic priming effects as a gauge of analyzability and its erosion over time. After presenting the linguistic variable in Section 2, I begin in Section 3 with a recapitulation of unithood indices and frequency measures, which score an initial point in favor of usage-based chunking as more tenable than formal reanalysis. I then present a multivariate model of variation between the Progressive and simple Present, in Section 4. The shift in the relative importance of aspectual reading and locative co-occurrence scores a further point in favor of gradual change in diachrony and inherent variability in synchrony. In Section 5 I introduce priming effects as a measure of (erosion of) analyzability.

Gradual loss of analyzability: Diachronic priming effects  


2 Spanish Progressive ESTAR + Verb-ndo

Latin did not have a dedicated morpheme or construction for progressive aspect, the simple Present serving this function among others (Allen and Greenough 1916: 293, § 465). Probably the most common source for progressives crosslinguistically is locative expressions (Bybee, Perkins and Pagliuca 1994: 127–133; cf. Comrie 1976: 98–105). Beginning from the earliest Spanish texts, we find gerunds (-ndo forms) combining with finite forms of spatial (locational, postural or movement) verbs. Besides estar ‘be (at)’, these were usually ir ‘go’, andar ‘walk, go around’, venir ‘come’, salir ‘go out’, quedar ‘remain, stand still’. Examples are (1a), with ir, and (1b), with venir.

(1a) déxa=me dezir, que se va hazie-ndo noche
     let.imp=acc.1sg say.inf that refl go.prs.3sg make-ger night
     ‘let me speak, it is [literally: goes] becoming night’
     (15th c., Celestina, Act VI)

(1b) ¿No oyes lo que viene canta-ndo ese villano?
     neg hear.prs.2sg that.rel come.prs.3sg sing-ger that rustic
     ‘Don’t you hear what that rustic is [literally: comes] singing?’
     (17th c., Quijote II, Ch. IX)

Allen and Greenough (1916: 819, § 507) give a medieval Latin example of this general Spatial Verb + Verb-ndo (gerund) construction, cum una dierum flendo sedisset, quidam miles generosus iuxta eam equitando venit (Gesta Romanorum, 66 [58]) ‘as one day she sat weeping, a certain knight came riding by’. In Torres Cacoullos (2000) I adduced evidence for the origins of Spanish Progressive ESTAR (< Latin stare ‘stand’) + Verb-ndo (gerund) as a locative expression ‘be located somewhere Verb-ing’ from its early distributions across co-occurring locatives (most frequently with en ‘in’) and verbs (most frequently hablando ‘talking’, other verbs of speech, esperando ‘waiting’, and verbs of body activity). These co-occurring elements are consonant with being stationary in a given place. A 13th century example appears in (2). In contrast, gerund combinations with motion verbs ir ‘go’ and andar ‘walk (around)’ tended to co-occur with other kinds of locatives (a ‘to’, por ‘along’) and verb classes (motion, process, general activity).

(2) uros hebreos estan aqui razona-ndo
    prs.3pl here discourse-ger
    ‘your Hebrews are here conferring’
    (13th c., General Estoria I, fol. 151r)


The key construct in variation theory is the linguistic variable (Labov 1969), a set of variants which “are used interchangeably to refer to the same states of affairs” (Weiner and Labov 1983: 31), i. e. “alternative ways of saying the same thing” (Labov 1982: 22). In the pair of examples from a 19th century play in (3), the “same thing”, or grammatical function, is present progressive and the “alternative ways”, or variants, are the Progressive and simple Present forms. In the English translation, PROG designates the Progressive – ESTAR + Verb-ndo – as in (3a), PRS the simple Present, as in (3b). Both forms here express a situation in progress at the moment of speech.

(3a) EDUARDO. – No me muestres esa compasión. Yo no la merezco. ¿Sabes tú con quién estás habla-ndo?
     know.prs.2sg you with rel be.prs.2sg speak-ger
     ‘EDUARDO: Don’t show me such compassion. I don’t deserve it. Do you know who you are talking to (PROG)?’
     (19th c., Amor de padre, Act 5, Scene 2)

(3b) AGENTE. – ¿Cómo tienes valor? Olvidas que hablas con un republicano?
     forget.prs.2sg comp speak.prs.2sg with a republican
     ‘AGENT: How do you have the courage? Do you forget that you are talking (PRS) to a republican?’
     (19th c., Amor de padre, Act 3, Scene VII)

The variable context is the sum of contexts where distinctions in grammatical function among different forms may be “neutralized in discourse” (Sankoff 1988a: 153). This is defined here broadly as the domain of present temporal reference, since the Progressive and simple Present also compete as expressions of non-progressive present situations (e. g., (8), below). We circumscribe a variable context in order to adhere to the principle of accountability, that not only occurrences but also non-occurrences of a given variant be noted (Labov 1982: 30), here, where the Progressive could have materialized but the simple Present did instead, as in (3b). Tokens of Present-tense ESTAR + Verb-ndo were exhaustively extracted from a corpus comprised of 60 texts from three time periods, the 13th–15th, 17th and 19th centuries (traditionally, Old Spanish, Golden Age Spanish, and Modern Spanish; the texts are listed in the Appendix). Tokens of the “non-occurrences”, i. e., of the simple Present, were extracted by taking simple Present-tense occurrences of the same lexical types that appear in the Progressive in a given text. From this sample, Present-tense forms with future or past temporal reference were excluded, for example, estaba […] para montar a caballo […], cuando oigo ¡tras tris, tras tras! ‘I was […] about to get on the horse, when I hear tras tris, tras tras!’ (Pazos, Ch. XXI). Also discarded were first- or second-person singular discourse routines (e. g., digo ‘I say’, ya ve(s) ‘you see’) or prefabs involving ser ‘be’ (e. g. es que ‘it’s that’).


Following these protocols, a total of 1,656 tokens of the Progressive or simple Present were retained for the analyses of variation. Table 1 depicts the number of texts, word counts and Ns for the three time periods.

Table 1: Data for the study of Progressive – simple Present variation in present temporal reference contexts

                     13th–15th century   17th century   19th century
No. texts            17                  15             28
Word count           2,500,000           600,000        900,000
N Progressive        119                 180            317
N simple Present     429¹                564            663
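As a rough illustration of how the counts in Table 1 relate to text frequency, the following Python sketch normalizes the Progressive Ns per 10,000 words. It assumes that the word counts in Table 1 are rounded figures and that normalization per 10,000 words is an acceptable convention; the names `TABLE_1` and `per_10k` are introduced here for illustration only. The simple Present Ns are left out because they come from a sample restricted to lexical types attested in the Progressive, not from exhaustive extraction.

```python
# Counts copied from Table 1; word counts are the rounded figures given there.
TABLE_1 = {
    "13th-15th c.": {"words": 2_500_000, "n_progressive": 119},
    "17th c.": {"words": 600_000, "n_progressive": 180},
    "19th c.": {"words": 900_000, "n_progressive": 317},
}


def per_10k(count: int, words: int) -> float:
    """Normalized frequency: occurrences per 10,000 words of running text."""
    return count / words * 10_000


# Illustrative normalization only; the base of 10,000 words is an assumption.
rates = {
    period: per_10k(row["n_progressive"], row["words"])
    for period, row in TABLE_1.items()
}

for period, rate in rates.items():
    print(f"{period}: {rate:.2f} Progressive tokens per 10,000 words")
```

On these figures the normalized frequency of Present-tense ESTAR + Verb-ndo rises from under 0.5 per 10,000 words in the earliest period to about 3.5 in the 19th century, though the rounded word counts make this an approximation at best.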

All tokens of both forms were coded according to a number of hypotheses about variant choice, operationalized as factors based on the presence or absence of linguistic elements of the context in which the token occurs. Included in the factor groups (independent or predictor variables, or constraints) are co-occurrence of locative adverbials, aspectual reading and priming. The linguistic conditioning of variant selection is instantiated in probabilistic associations of forms with contextual elements. A multivariate model of the variation is presented in Section 4, after we first consider evidence from distributional analysis and frequency counts in Section 3.
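To make the coding scheme concrete, here is a minimal Python sketch of how tokens might be represented and how the rate of the Progressive can be tabulated within one factor group. Everything in it is an assumption made for illustration: the `Token` class, its field names and the helper `progressive_rate_by` are hypothetical, and the three tokens are simply examples (3a), (3b) and (5a) from this chapter, coded according to my reading of the accompanying discussion and not taken from the coded dataset. Three tokens are of course no basis for any actual rate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    example: str    # example number in the chapter
    variant: str    # "PROG" (ESTAR + Verb-ndo) or "PRS" (simple Present)
    locative: bool  # co-occurring locative adverbial in the clause?
    aspect: str     # aspectual reading of the token


# Hypothetical codings of the chapter's own examples, for illustration only.
TOKENS = [
    Token("3a", "PROG", locative=False, aspect="progressive"),
    Token("3b", "PRS", locative=False, aspect="progressive"),
    Token("5a", "PROG", locative=True, aspect="progressive"),  # allá 'there'
]


def progressive_rate_by(tokens, factor):
    """Return {factor level: (n PROG, n total, rate of PROG)} for one factor group."""
    totals, progs = Counter(), Counter()
    for token in tokens:
        level = getattr(token, factor)
        totals[level] += 1  # accountability: every token counts, whichever variant
        if token.variant == "PROG":
            progs[level] += 1
    return {level: (progs[level], n, progs[level] / n) for level, n in totals.items()}


by_locative = progressive_rate_by(TOKENS, "locative")
for level, (n_prog, n_total, rate) in by_locative.items():
    print(f"locative={level}: {n_prog}/{n_total} Progressive ({rate:.0%})")
```

The design point is that the denominator for each factor level includes both variants, so that non-occurrences of the Progressive are counted alongside occurrences.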

3 Unithood and frequency

Spanish Progressive ESTAR + Verb-ndo would seem a good candidate for change via either reanalysis or loss of analyzability. The change in constituent structure would be from a sequence of two independent units – a finite form of main verb estar ‘to be (located)’ with a gerund -ndo complement – to a single periphrastic unit, in which estar is an auxiliary and the gerund is the main verb (4).

(4) reanalysis / loss of analyzability
    [ESTAR]verb + [Verb-ndo (gerund)]complement > [ESTARaux + Verb-ndoverb]Progressive

1 For the 13th–15th century simple Present sample, tokens of lexical types appearing in the Progressive were not extracted from Grimalte y Gradissa and Crónica de los Reyes Católicos, for which electronic versions were not available (three Progressive tokens each); also omitted were Present tokens of frequent decir ‘say’ in Corbacho (of which there was one Progressive token). More on the texts, the simple Present sampling and exclusions is given in Torres Cacoullos (2012).


Whereas in going to the items are contiguous, here we have a schematic construction with an intervening slot for the open class of items, the Verb. In the absence of surface phonetic reduction of the kind seen with English future gonna, what evidence could be assembled for the status of ESTAR + Verb-ndo as a unitary constituent? We may take the obverse of analyzability to be unithood, operationalized as the proportion of the instances of the construction in which the adjoining items behave as a single unit, i. e. as one word. In previous work (e. g., Torres Cacoullos 2006; Torres Cacoullos and Walker 2011) we developed unithood indices from distributional analysis, which tracks proportions of tokens of an expression across its contexts of occurrence. Increasing unithood of ESTAR + Verb-ndo has been inferred from a decreasing proportion of occurrences with elements intervening between estar and the gerund, with more than one gerund per estar, or with the gerund preceding estar (Torres Cacoullos 2000: 31–55; Bybee and Torres Cacoullos 2009: 201–203; Torres Cacoullos 2012: 79).

A more direct index of unithood is the positioning of object pronouns, which precede finite verb forms in modern Spanish (Torres Cacoullos 1999b). In (5a) the object pronoun le is postposed to the gerund (is telling him), in (5b) the object pronoun lo is preposed to estar (literally, ‘it are saying’). The latter configuration, known as “clitic climbing”, has been viewed in generative syntax as a restructuring of a series of verbs into a single verbal complex (e. g., Rizzi 1982). In a functionalist view, “clitic climbing” has been seen as a manifestation of the grammaticalization of auxiliaries, as a verb comes to express grammatical (e. g., aspectual, progressive) more than lexical (e. g., spatial, locative) meaning (Myhill 1988).

(5a) [ESTAR] + [Verb-ndo + object pronoun/clitic]
     Yo voy con tu cordon tan alegre: que se me figura que esta dizie-ndo le alla su coraçon la merced que nos heziste
     be.prs.3sg tell-ger dat.3sg there his heart […]
     ‘I’ll go with your cord so happily, I can almost imagine that his heart there is telling him of the great favor you have done us’
     (15th c., Celestina, Act IV, fol. 32r)

(5b) [object pronoun/clitic + ESTAR + Verb-ndo]
     – Pero que nosotros tampoco les vamos a dar cien días. Vamos a decir lo que nos parezca desde hoy.
     – Ya lo estamos dicie-ndo.
     already acc.3sg be.prs.1pl tell-ger
     ‘But we’re not going to give them a hundred days. We’re going to say what we think starting today.’ ‘We [it] are already saying it’
     (20th c., CORLEC, CDEB014A, p215–p216)


In the 15th century example in (5a) ESTAR + Verb-ndo is compatible with locative meaning, indicated by co-occurring allá ‘there’ in the same clause and the motion verb voy ‘I go’ in a previous clause: the speaker will go to where the person represented metonymically by his heart is located (está…allá ‘is…there’). There is at the same time aspectual meaning, as conveyed by se me figura ‘I can imagine’: the situation referred to by the gerund is in progress at speech time. In comparison, spatial meaning appears at best attenuated in the 20th century example in (5b), where most prominent is aspectual meaning, indicated by co-occurring temporal adverbial ya ‘already’: the speaker asserts that the verbal situation (diciendo ‘saying’) is in progress.

In Table 2, though the count of all eligible cases is low, there is a clear trend of increased rates of placement before estar (proclisis). Increasing placement of object pronouns before the whole complex (as with single finite verbs), rather than attached to the gerund, can be taken as an indication of enhanced unithood.²

Table 2: Increasing unithood of ESTARPresent + Verb-ndo: placement of object pronouns before estar (“clitic climbing”)³

13th–15th century   17th century    19th century    20th century⁴
71 % (10/14)        72 % (18/25)    89 % (58/65)    97 % (100/103)
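The percentages in Table 2 and a χ² comparison of two periods can be recomputed from the raw counts with a short Python sketch. This is an illustration under stated assumptions, not a reproduction of the tests reported in the note to Table 2: it uses an uncorrected Pearson χ² on a 2 × 2 table (pronoun before estar vs. attached to the gerund, by period), and the names `TABLE_2`, `proclisis_rate` and `chi_square_2x2` are hypothetical. With cell counts this low the χ² approximation is rough, so the p-value should be read with caution.

```python
import math

# (pronoun placed before estar, all eligible cases), copied from Table 2
TABLE_2 = {
    "13th-15th c.": (10, 14),
    "17th c.": (18, 25),
    "19th c.": (58, 65),
    "20th c.": (100, 103),
}


def proclisis_rate(period):
    """Proportion of eligible tokens with the object pronoun before estar."""
    preposed, eligible = TABLE_2[period]
    return preposed / eligible


def chi_square_2x2(period_a, period_b):
    """Pearson chi-square without continuity correction (an assumption), df = 1."""
    a, n_a = TABLE_2[period_a]
    c, n_b = TABLE_2[period_b]
    b, d = n_a - a, n_b - c  # tokens with the pronoun attached to the gerund
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / (n_a * n_b * (a + c) * (b + d))
    # For df = 1 the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


for period in TABLE_2:
    print(f"{period}: {proclisis_rate(period):.0%} proclisis")

chi2, p = chi_square_2x2("17th c.", "19th c.")
print(f"17th vs. 19th c.: chi-square = {chi2:.2f}, p = {p:.3f}")
```

The same function can be called on any other pair of periods in `TABLE_2`; an exact test would be a reasonable alternative given the small Ns.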

Unithood is a theory-neutral measure, compatible with either reanalysis or loss of analyzability. However, the two accounts are distinguished by the expected (non) role of frequency. On the one hand, loss of analyzability attributable to chunking depends on repetition. Applied to the case at hand, with frequent repetition the sequence ESTAR + Verb-ndo would become a new chunk – more of a fused unit. On the other hand, a theory of syntactic change based on reanalysis in terms of

2 The 19th and 20th century rate of preposed object clitics shown in Table 2 is higher than for all tenses of ESTAR + Verb-ndo (respectively, 70 %, 54/77 in the same texts (reported in Bybee and Torres Cacoullos 2009: 203) and 89 %, 103/115 in Mexico City “habla popular” (UNAM 1976) (reported in Torres Cacoullos 1999b: 146)). This is consonant with Progressive grammaticalization advancing in present before past tenses (Torres Cacoullos 2012: 110, n. 3) (whereas habitual markers are said to appear in past before generalizing to present temporal reference contexts (Bybee et al. 1994: chapter 5)).
3 In χ² tests, difference between 17th and 19th p