Linguistic Evidence: Empirical, Theoretical and Computational Perspectives 9783110197549, 9783110183122

The renaissance of corpus linguistics and promising developments in experimental linguistic techniques in recent years h


English Pages 588 [589] Year 2005

Table of contents :
Frontmatter
Contents
Evidence in Linguistics
Gradedness and Consistency in Grammaticality Judgments
Null Subjects and Verb Placement in Old High German
Beauty and the Beast: What Running a Broad-Coverage Precision Grammar over the BNC Taught Us about the Grammar — and the Corpus
Seemingly Indefinite Definites
Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese
Aspectual Coercion and On-line Processing: The Case of Iteration
Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study
Processing Negative Polarity Items: When Negation Comes Through the Backdoor
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs
The Decathlon Model of Empirical Syntax
Examining the Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus
A Quantitative Corpus Study of German Word Order Variation
Which Statistics Reflect Semantics? Rethinking Synonymy and Word Similarity
Language Production Errors as Evidence for Language Production Processes – The Frankfurt Corpora
A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions
The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses
The Emergence of Productive Non-Medical -itis: Corpus Evidence and Qualitative Analysis
Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity
Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence
The Plural is Semantically Unmarked
Coherence – an Experimental Approach
Thinking About What We Are Asking Speakers to Do
A Prosodic Factor for the Decline of Topicalisation in English
On the Syntax of DP Coordination: Combining Evidence from Reading-Time Studies and Agrammatic Comprehension
Lexical Statistics and Lexical Processing: Semantic Density, Information Complexity, Sex, and Irregularity in Dutch
The Double Competence Hypothesis On Diachronic Evidence
Backmatter


Author / Uploaded
Stephan Kepser (editor)
Marga Reis (editor)

Linguistic Evidence

Studies in Generative Grammar 85

Editors

Henk van Riemsdijk Harry van der Hulst Jan Koster

Mouton de Gruyter Berlin · New York

Linguistic Evidence Empirical, Theoretical and Computational Perspectives

Edited by

Stephan Kepser Marga Reis

Mouton de Gruyter Berlin · New York

Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.

The series Studies in Generative Grammar was formerly published by Foris Publications Holland.

Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data
Linguistic evidence : empirical, theoretical, and computational perspectives / edited by Stephan Kepser, Marga Reis. p. cm. – (Studies in generative grammar ; 85) Includes bibliographical references. ISBN-13: 978-3-11-018312-2 (cloth : alk. paper) ISBN-10: 3-11-018312-9 (cloth : alk. paper) 1. Linguistics – Methodology. I. Kepser, Stephan, 1967– II. Reis, Marga. III. Series. P126.L48 2005 410.72–dc22 2005031124

Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

ISBN-13: 978-3-11-018312-2 ISBN-10: 3-11-018312-9 ISSN 0167-4331 © Copyright 2005 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.

Contents

Evidence in Linguistics
Stephan Kepser and Marga Reis

Gradedness and Consistency in Grammaticality Judgments
Aria Adli

Null Subjects and Verb Placement in Old High German
Katrin Axel

Beauty and the Beast: What Running a Broad-Coverage Precision Grammar over the BNC Taught Us about the Grammar – and the Corpus
Timothy Baldwin, John Beavers, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen

Seemingly Indefinite Definites
Greg Carlson and Rachel Shirley Sussman

Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese
Sonia M. L. Cyrino and Ruth E. V. Lopes

Aspectual Coercion and On-line Processing: The Case of Iteration
Sacha DeVelle

Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study
Serge Doitchinov

Processing Negative Polarity Items: When Negation Comes Through the Backdoor
Heiner Drenhaus, Stefan Frisch, and Douglas Saddy

Linguistic Constraints on the Acquisition of Epistemic Modal Verbs
Veronika Ehrich

The Decathlon Model of Empirical Syntax
Sam Featherston

Examining Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus
Christiane Fellbaum

A Quantitative Corpus Study of German Word Order Variation
Kris Heylen

Which Statistics Reflect Semantics? Rethinking Synonymy and Word Similarity
Derrick Higgins

Language Production Errors as Evidence for Language Production Processes – The Frankfurt Corpora
Annette Hohenberger and Eva-Maria Waleschkowski

A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions
Mary Aizawa Kato and Carlos Mioto

The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses
Gerard Kempen and Karin Harbusch

The Emergence of Productive Non-Medical -itis: Corpus Evidence and Qualitative Analysis
Anke Lüdeling and Stefan Evert

Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity
Wiltrud Mihatsch

Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence
Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus

The Plural is Semantically Unmarked
Uli Sauerland, Jan Anderssen, and Kazuko Yatsushiro

Coherence – an Experimental Approach
Tanja Schmid, Markus Bader, and Josef Bayer

Thinking About What We Are Asking Speakers to Do
Carson T. Schütze

A Prosodic Factor for the Decline of Topicalisation in English
Augustin Speyer

On the Syntax of DP Coordination: Combining Evidence from Reading-Time Studies and Agrammatic Comprehension
Ilona Steiner

Lexical Statistics and Lexical Processing: Semantic Density, Information Complexity, Sex, and Irregularity in Dutch
Wieke M. Tabak, Robert Schreuder, and R. Harald Baayen

The Double Competence Hypothesis. On Diachronic Evidence
Helmut Weiß

List of Contributors

Evidence in Linguistics
Stephan Kepser and Marga Reis

As is well known, the central objects of linguistic enquiry – language, languages, and the factors/mechanisms systematically (co-)governing language acquisition, language processing, language use, and language change – cannot be directly accessed; they must be reconstructed from the accessible manifestations of linguistic behaviour. These manifestations constitute the realm of possibly usable linguistic data. Since they fall into many types – introspective data, corpus data, data from (psycho-)linguistic experiments, synchronic vs. diachronic data, typological data, neurolinguistic data, data from first and second language learning, from language disorders, etc. –, and since each type, apart from historical data, can be instantiated by infinitely many tokens, the linguist’s central task of building theories about the above-mentioned linguistic objects is invariably bound up with several empirical tasks as well: (i) collecting/selecting a representative as well as reliable database from one or more data types, (ii) evaluating the various data types as to how they reflect linguistic competence (recall that even so-called primary data from introspection as well as authentic language production are complex performance data involving different nonlinguistic factors), (iii) assessing the relationship between the various data types such that comparison between studies of the same issue based on different data types is possible, and potential conflicts in results can in principle be resolved.

As will be obvious, the three empirical tasks are largely interdependent. However, they are to a considerable degree dependent on linguistic theorising as well: Task (i) must typically be solved for specific linguistic problems, the specific shape of which is determined by linguistic theory proper. Tasks (ii) and (iii) must be related to theories about the interaction of linguistic competence with nonlinguistic faculties and factors in performance.
Thus, gaining relevant linguistic evidence from the mass of potentially available data is neither a trivial matter nor a purely methodical one that can be pursued in isolation from concrete linguistic enquiry and its theoretical concerns. Moreover, providing useful data collections (be it appropriately annotated corpora, collections of controlled speaker judgements, experimentally elicited data, etc.) is also a linguistically challenging ‘practical’ task. In short, linguistic evidence is an extremely important topic as well as a challenging problem for linguists of all persuasions.

Given the fundamental nature of the problem, linguistic evidence is a remarkably new topic of linguistic discussion. Traditionally, concrete speech events, i.e., naturally occurring written or spoken utterances, were taken without further ado as the only relevant source of linguistic data, although the need for ‘abstracting’ the linguistically relevant traits from these data was by no means unknown (cf. Bühler 1932: 97, 1934: 14–15). Within structuralism, this tradition gained explicit methodological and theoretical status (‘distributionalism’). Thus the explicit mentalistic turn of generative grammar which claimed the priority of explanatory over descriptive goals and introspective over corpus data was bound to inspire a heated debate concerning the status of linguistics as an empirical science in general and the nature of proper linguistic evidence in particular. This debate, however, died down after the seventies with virtually no consequences for linguistic practice: Generative linguists continued relying more or less on introspective data gained in rather informal ways, non-generative linguists continued relying more or less on corpus data that were often just as informally obtained.

In recent times, this has begun to change. Regarding the use of introspective data, an important turning point was the book by Schütze (1996), who was the first to argue forcefully for a systematic approach to the collection of speaker judgements. Since then, many authors have followed his lead and shown in various ways the necessity of controlling the many factors that influence speaker judgements in order to obtain more reliable data.
As a consequence, there is a growing awareness among generative linguists that it is imperative to collect introspective data in systematically controlled ways, and moreover useful to complement them by data from other sources, both of which increasingly influences their linguistic practice. Regarding corpus data, the importance of this source of evidence has grown significantly since about the mid nineties, when really large amounts of language data of many types became electronically available and easily accessible for the first time. Frequently, these data were annotated in linguistically relevant ways which made these sources even more valuable. At the same time, computational linguists developed methods of accessing and evaluating these corpora. Consequently, linguists now have access to corpora that are several orders of magnitude larger than they were before. And the size and number of such corpora are still rapidly growing. Hence the renaissance of corpus linguistics to be observed since the nineties is by no means a coincidence.

Both developments, by voiding mutual reservations concerning solidity and practicability of method, have also paved the way for a rapprochement between introspective and corpus linguists, as evidenced by several recent publications in which the question of what should count as linguistic evidence is discussed from either perspective, on the whole opting for using corpus as well as introspective evidence (see, e.g., the recent special issues of Lingua and Studies in Language). But an astonishing number of participants in the discussion are still trying to argue that one of these types of linguistic evidence is generally significantly superior to the other (see, e.g., Lehmann (2004) and Borsley (2005b)).

It is one of the main aims of this volume to overcome the corpus data versus introspective data opposition and to argue for a view that values and employs different types of linguistic evidence each in their own right. Evidence involving different domains of data will shed different, but altogether more, light on the issues under investigation, be it that the various findings support each other, help with the correct interpretation, or by contradicting each other, lead to factors of influence so far overlooked. This ties in naturally with the fact we started out with that there are more domains and sources of evidence that should be taken into account than just corpus data and introspective data. These insights may sound simple, but, unfortunately, a look into the discussion on evidence in linguistics shows that they are not generally accepted.

Apparently, it is not so much the origin of evidence that counts. What is more important is adequacy and the status of the data as true ‘evidence’. Adequacy means that the data put forward to support a certain claim actually do so. This can only be decided on an individual level, i.e., for the particular linguistic problem in question. It is therefore of no concern to us here.
Whether certain data can be regarded as true evidence touches the key questions of reliability and reproducibility of data. Reproducibility of data is a basic demand in all areas of science for these data to be considered true evidence for something. Typical counterexamples are example sentences held to be (un)acceptable by virtue of the linguist’s own judgement only (especially if fortified by the belief in individual ‘dialects’), or quoting a single occurrence of a construction found in the world wide web, which is by some regarded as the largest accessible corpus, as support for this construction’s grammatical existence. Reliability encompasses reproducibility, but requires more. A proper analysis and control of the factors that influence the constitution of the data are necessary as well. With reproducibility and reliability secured, data can be fruitfully used as evidence for strengthening or refuting hypotheses.

The contributions to the present book are examples of how this can be done in linguistic practice. An important aspect of this book, and a consequence of what we pointed out at the outset about the theoretical underpinnings of issues of linguistic evidence, is the absence of purely abstract discussions of methodologies. Rather, all issues concerning linguistic evidence taken up in the various contributions are addressed in relation to specific linguistic research problems. The main reason for this is our belief that it is only with respect to concrete problems that the quality of the method and of the various types of evidence brought to bear on them can be evaluated. Apart from that it is just more convincing to see how using different types of evidence and different methods of obtaining it may in fact further our understanding of such concrete problems.

It stands to reason then that a volume on ‘Linguistic Evidence’ should cover a wide range of data types (and methods for turning data into evidence) to be applied to an equally wide range of linguistic phenomena. The present volume does: As for data types, many sources of evidence come into play: corpus data, introspective data, psycholinguistic data, data from computational linguistics, language acquisition data, data from historical linguistics, and sign language data. In several contributions, different data types are comparatively evaluated, which yields particularly insightful results. What is remarkably absent is any quarrel about the status of introspective vs. corpus data; both are recognised throughout as equally valid sources of evidence. We take this as a hopeful sign that the longstanding but fruitless either-or confrontation of these data types will finally be overcome. Different ways of gaining linguistic evidence are also well represented in this volume, papers applying/exploring psycholinguistic methods forming perhaps the largest group.
A good part of them is concerned with experimental data from language processing, exploring systematic ways for measuring and interpreting these data. But there are also papers exploring methods for collecting reliable as well as reproducible grammaticality judgements. These data types and methods are applied insightfully to phenomena from such diverse areas as syntax, semantics, phonology, morphology, psycholinguistics, historical linguistics, language acquisition, corpus linguistics, computational linguistics, and patholinguistics. For books, such diversity of topics is not always a virtue. But in this case, it serves to underline the fundamental importance issues of linguistic evidence have for all fields of linguistics. It also indicates that awareness of these issues has by now reached almost all these fields.

The present book is based on the conference on Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives that took place in Tübingen, January 29 – February 1, 2004. It was organised by the Collaborative Research Centre (SFB) 441 on “Linguistic Data Structures. On the Relation between Data and Theory in Linguistics” at the University of Tübingen, which has supported in-depth studies of linguistic evidence in all its aspects since 1999. The contributions to this volume are elaborated versions of the conference presentations, plus a paper by H. Weiß designed to complement the historical section. Unfortunately, four papers presented at the conference were not submitted for publication.

The editors of this volume wish to express their gratitude to the members of the collaborative research centre (SFB) 441 on Linguistic Data Structures at the University of Tübingen for many interesting discussions on key issues of evidence in linguistics, and for their vigorous support when organising the above-mentioned conference. In this regard we owe particular thanks to Sam Featherston, Beate Starke, and Dirk Wiebel. We also want to thank the members of the conference programme committee for their excellent work. When preparing the present volume we again received generous support from many, to whom we are very grateful. In particular, we wish to thank the colleagues who reviewed the papers for publication, for their extremely useful comments and criticisms, and the group of helpers without whom editing this volume might have become a mission impossible: Iris Banholzer, Ansgar Höckh, Chris Sapp, and Bettina Zeisler. We are also grateful to the German Science Foundation (DFG) for their generous support of the collaborative research centre 441 and of the conference on Linguistic Evidence.

Stephan Kepser and Marga Reis

November 2005

References

Borsley, Robert D. (ed.) 2005a Data in Theoretical Linguistics, volume 115(11) of Lingua.
Borsley, Robert D. 2005b Introduction. Lingua, 115(11): 1475–1480.
Bühler, Karl 1932 Das Ganze der Sprachtheorie, ihr Aufbau und ihre Teile. In Bericht über den XII. Kongreß der Deutschen Gesellschaft für Psychologie, pp. 95–122. Fischer, Jena.
Bühler, Karl 1934 Sprachtheorie. Die Darstellungsfunktion der Sprache. Fischer, Jena. 2nd edition Stuttgart, 1965.
Lehmann, Christian 2004 Data in linguistics. The Linguistic Review, 21: 175–210.
Penka, Martina and Anette Rosenbach 2004 What counts as evidence in linguistics. Studies in Language, 28(3): 480–526.
Schütze, Carson 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press.

Gradedness and Consistency in Grammaticality Judgments
Aria Adli

1 The importance of graded grammaticality judgments: a case study of que → qui in French

The methodological issue of the unreliability of certain introspective data circulating in the syntactic literature has already been mentioned by several authors (e.g. Schütze 1996; Adli 2004). One particularly problematic phenomenon is that questionable judgments are sometimes quoted in theoretical studies without prior critical empirical verification, contributing to the formation of “myths” in the literature. One case is the que → qui ‘rule’ in French. This rule, which has been introduced into the literature solely on the basis of uncontrolled introspective data, is not confirmed by an experimental study in which a controlled process of data collection is applied to a whole sample of test subjects and which makes use of a graded concept of grammaticality.

The que → qui rule essentially states that an ECP violation can be avoided in French if qui is used instead of the usual complementizer que in sentences where a wh-phrase has been extracted from the subject position (see Perlmutter 1971; Kayne 1977). This rule rests on the empirical ‘premise’ that there should be a clear difference in grammaticality between (2a) and (2b) (all four sentences are taken from Hulk and Pollock 2001).

(1) a. Quel livre crois-tu que les filles vont acheter.
       which book think-you COMPque the girls will buy
    b. *Quel livre crois-tu qui les filles vont acheter.
       which book think-you COMPqui the girls will buy

(2) a. *Quelles filles crois-tu que vont acheter ce livre-là.
       which girls think-you COMPque will buy that book-there
    b. Quelles filles crois-tu qui vont acheter ce livre-là.
       which girls think-you COMPqui will buy that book-there

The que → qui rule has been an often-used argument in syntactic theorizing.1 The assumption is that this rule is a sort of loophole to avoid ungrammaticality, or in Pesetsky’s words (1982: 308): “Qui does not occur freely as a complementizer, but only ‘when needed’ to avoid an NIC violation. [...] In other words, qui is a form of que which provides an ‘escape hatch’ from the effects of the NIC.” Chomsky (1977) compares it with free deletion in COMP in English. Rizzi (1990; 1997) supports his assumptions concerning the agreement process in the COMP system with this rule. He states that in cases of felicitous subject extraction in French the agreeing complementizer is not 0, but the overt form qui. He assumes that an ECP violation is produced if the agreeing form does not occur and C is in what he considers as the unmarked form que. He further states that this rule is a morphological reflex of Spec-head-agreement between a trace and the head of COMP. Therefore Rizzi (1990: 56) assumes:

(3) qui = que + Agr

Rizzi (1990) accounts for the ungrammaticality of the object extraction (1b) by assuming that Spec-head-agreement requires a C-adjacent position of the extracted element. Furthermore, Rizzi (1990) assumes that the que → qui rule only applies when agreement occurs between C⁰ on the one hand and its specifier as well as its complements on the other hand. (Such a double agreement had already been described for Bavarian German by Bayer 1984 concerning sentences like Wenn-st du kumm-st). The result would be as shown in (4): t’ agrees with C⁰, t with I⁰ and – due to the identity of t and t’ – C⁰ with the maximal projection of I⁰ (by transitivity).

(4) [t’ C⁰ [ t I⁰ ...

One aim of this paper was to test this assumption in an experimentally controlled process of data collection using a graded concept of grammaticality. Such a graded concept is assumed in Chomsky (1964), but it is already given up in Chomsky (1965) in favour of a distinction between grammaticality and acceptability. However, a rather pre-theoretic concept of gradedness persists in the syntactic literature, sometimes tacitly through the use of symbols like “?”, “??”, etc. Furthermore, some principles even make use of theoretical predictions in line with a graded concept (e.g. ECP vs. subjacency violation).

In order to measure graded grammaticality judgments, an instrument based on the principle of graphic rating (cf. Guilford 1954: 270; Taylor and Parker 1964) has been developed. Part of the design is an extensive instruction and training phase. Judgments are expressed by drawing a line on a bipolar scale (and not by marking one of several boxes with a cross). Within the limits of a person’s differential capacity of judgment, a theoretically infinite number of gradations are therefore possible. The test was presented in an A4 ring binder containing two horizontally turned A5 sheets (see Figure 1).

Figure 1. [Diagram of the ring binder, with sheets labelled “comparaison” and “Jugement (510B)”. Sample sentences shown: “Quel avion, pouvez-vous penser, prennent les touristes chinois ?”, “Le gros buffet en chêne doit être retapé.”, “Quelle est l’armoire que refont les employés de la scierie ?”]

The upper sheet contained the reference sentence, the lower sheet the experimental sentence. The sentence, with the graphic rating scale under it, was printed in the middle of each sheet. After the subject had rated the experimental sentence on the lower sheet, he or she turned this page to go on with the next sentence. The upper sheet with the reference sentence was not turned and remained visible during the whole test. The judgments were given relative to the reference sentence judged in the beginning by the subject himself, within both endpoints (obviously well-formed and obviously ungrammatical) given by the design. It was, therefore, a bipolar, anchored rating scale with the characteristic that the subjects choose the anchor for themselves. The reference sentence consisted of a suboptimal, but not extremely ungrammatical, sentence. The dependent variable was the difference between the judgment of a particular sentence and the judgment of the reference sentence.

The test started, after the presentation of written instructions, with an interactive instruction and training phase of about 10 to 15 minutes. During this phase, two main concepts were introduced in a 9-step procedure: isolated grammaticality and gradedness (cf. Adli 2004: 85–88 for details). A pre-test revealed the importance of such an additional training phase. Although not directly visible to the naked eye, the concept of grammaticality was often confounded with extra-grammatical factors (e.g., the plausibility of the situation described by the sentence). The understanding of the concept of isolated grammaticality is necessary to reduce interferences with semantic and pragmatic effects. Furthermore, subjects had to replace the common distinction between grammatical and ungrammatical, or "good" and "bad", sentences with a truly graded notion of grammaticality. They were introduced to these two main concepts, among other things, by rating different training sentences and by explaining the reasons for their ratings to the experimenter, who could therefore adapt the instructions to the level of understanding of each subject. After instruction and training, the experimenter left the room.

Given that reliability can generally be improved by the use of several items, each syntactic structure was presented in 4 lexical variants. Since the use of experimental methods in grammar research is recent, and not much experience exists yet, the evaluation of the instrument with regard to its reliability is important. A reliability analysis indicates the limits of an instrument concerning the precision of its measurements. Furthermore, the only three studies on the reliability of experimentally collected, graded grammaticality judgments I know of, namely Bard, Robertson and Sorace (1996: 61), Cowart (1997: 23) and Keller (2000: 215), rely on erroneous or improper calculations.2 Reliability is evaluated by Cronbach’s α, which is a measure of internal consistency (see Cronbach 1951). It indicates the consistency between the different lexical variants of a sentence without taking into consideration mean differences between the variants. Indeed, the reliability of the measurements turned out to be sufficiently high (Cronbach’s α = 0.85).
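The two quantities just described, the difference score used as dependent variable and Cronbach’s α over the lexical variants, can be illustrated with a minimal sketch. It is not the analysis code of the study: the function names difference_scores and cronbach_alpha, the data layout (one row per subject, one column per lexical variant) and all numbers are assumptions made for illustration, and α is computed with the standard formula from Cronbach (1951).

```python
from statistics import variance


def difference_scores(ratings, reference):
    """Dependent variable: each rating minus the subject's own
    rating of the reference sentence (hypothetical helper)."""
    return [r - reference for r in ratings]


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items table.

    scores: list of rows, one per subject; each row holds one score
    per lexical variant (item). Layout is an assumption.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two subjects")
    # Variance of each item (column) across subjects.
    item_vars = [variance(col) for col in zip(*scores)]
    # Variance of the per-subject sum over all items.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Invented data: three subjects, two lexical variants. The second
    # variant is rated one point higher by everyone, i.e. the variants
    # differ in mean but are perfectly consistent across subjects.
    invented = [[1, 2], [2, 3], [3, 4]]
    print(cronbach_alpha(invented))
```

In the invented data the constant shift between the two variants leaves α at its maximum, which mirrors the remark that the coefficient disregards mean differences between variants and reflects only how consistently subjects are ordered across them.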

Gradedness and Consistency in Grammaticality Judgments


78 French native speakers participated in the experiment. Validity was ensured by means of a special index (called violation of trivial judgments), reflecting the capability of the subject to give graded grammaticality judgments (cf. Adli 2004: 89-91). By means of this criterion, those subjects who were deemed unable to perform this task could be identified and excluded; the data of 65 subjects could be utilized for the subsequent statistical analyses.

Given that the measure of graded grammaticality does not reflect the categorical distinction between well-formed and ill-formed sentences, and given that such information is still, for theory-internal reasons, important, grammatical as well as ungrammatical constructions were included in the test design in order to make available comparative scale points for the interpretation process: The experiment did not cover only subject-initial and object-initial interrogatives with long extraction over que and/or qui. The clearly felicitous constructions (5a) and (5b) with a PP-parenthetical “d’après vous” and the sentences (6a) and (6b) with the expression “croyez-vous” at the position after the wh-phrase were also included; some aspects of their syntax are discussed in section 3 (see Adli 2004 for full details).3

(5) a. Quel appache, d'après vous, méconnaît les obstacles de l'hiver?
       which Appache according you ignores the difficulties of the winter
    b. Quel animal, d'après vous, rôtissent les esquimaux de l'igloo?
       which animal according you grill the Eskimos of the Igloo

(6) a. (?)Quel architecte, croyez-vous, conçoit les demeures du président?
       which architect think you designs the residences of the president
    b. (?)Quel argent, croyez-vous, investissent les organisateurs du bal?
       which money think you invest the organisers of the ball

(7) a. ??Quel ingénieur, pensez-vous, qui conçoit la fusée de l'Aérospatiale?
       which engineer think you quiCOMP designs the rocket of Aérospatiale
    b. *Quel idiot, pensez-vous, que perd les clefs de la maison?
       which idiot think you queCOMP loses the keys of the house


Aria Adli

    c. ?Quel appel, pensez-vous, que reçoivent les policiers du quartier?
       which call think you queCOMP receive the police officers of the district

The data was analysed with a two-way repeated measures ANOVA (variable A: “d’après vous” / “croyez-vous” / “pensez-vous qu-”; variable B: subject / object). I took into consideration not only information about the significance level, but also about the effect size of the differences (in terms of partial η², cf. Cohen 1973; see also Keren & Lewis 1979: 119). The hypothesis was tested at α = 5%, which approximately allows for α = β.4 In the following, only the relevant results concerning the que → qui issue will be given: In order to take the full detail of the results into account, a complete set of orthogonal simple effects was tested as regards the subject interrogatives (cf. Bortz 1999: 254), contrasting (i) (5a) vs. (6a), (ii) (7a) vs. (7b), as well as (iii) (5a) and (6a) vs. (7a) and (7b).

Figure 2. [Plot: grammaticality on the vertical axis, running from [-]grammatical (-60) to [+]grammatical (60); subject questions and object questions plotted for the conditions "...d'après vous...", "...croyez-vous...", "...pensez-vous qui..." and "...pensez-vous que..."]
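For readers less familiar with the effect-size measure used here, the following is a small sketch of how partial η² is conventionally defined, either from sums of squares or from an F value and its degrees of freedom. The function names and the numbers passed to them are illustrative assumptions, not values taken from the study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of variance attributable to an effect, relative to that
    effect plus its error term: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent formulation based on the F statistic."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Invented sums of squares, chosen only to show the arithmetic.
example = partial_eta_squared(ss_effect=20.0, ss_error=80.0)
```

Both functions describe the same quantity, since F is the ratio of the two mean squares; the second form is convenient when only the test statistic is reported.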

The results show a partial η² of 0.183 (p...@ swing even when standing can I swing

(14) ***File "CHI020628.cha": line 535; Deontic Reading
     *CHI: wa kann ich malen ## wann kann ich malen ?
           ‘whe[n] can I paint when can I paint?’
     *MOT: erst wenn du dich hinsetzt #2.
           ‘only when you sit down.’

Linguistic Constraints on the Acquisition of Epistemic Modal Verbs


(15) ***File "CHI020613.cha": line 136; (MOT and CHI are making puppets)
     *CHI: da kann man des durchstecken # und ein Clown machen.
           there can one that through-put and a clown make
     *MOT: Clown ist schon ziemlich schwer # .

The difference with respect to ORDERING SOURCE correlates with overt subject realization. We classified the MVs of the corpus by their ORDERING SOURCE and checked how many occurrences of each class were used with an overt subject. Figure 4 presents the results: deontic readings, which require an external ORDERING SOURCE, are relatively rare at MLU I (13% of all MV-occurrences), but their frequency increases (from 25% at MLU II to 34% at MLU III). Realistic MVs, which also have an external ordering source, are the least frequent MVs (no occurrence at MLU I, 7% at MLU II, and 9% at MLU III), but they hardly ever occur without a subject.
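The counting procedure described in this paragraph can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the token representation (a reading label plus a flag for an overt subject), the function name `overt_subject_rates` and the toy tokens are all hypothetical and do not come from the Caroline corpus.

```python
from collections import defaultdict

def overt_subject_rates(tokens):
    """Percentage of occurrences with an overt subject, per reading.

    `tokens` is an iterable of (reading, has_overt_subject) pairs,
    a hypothetical coding scheme assumed for this sketch.
    """
    with_subject = defaultdict(int)
    total = defaultdict(int)
    for reading, has_subject in tokens:
        total[reading] += 1
        if has_subject:
            with_subject[reading] += 1
    return {r: 100 * with_subject[r] / total[r] for r in total}

# Invented tokens, for illustration only.
toy_tokens = [
    ("deontic", True), ("deontic", True), ("deontic", False),
    ("dispositional", True), ("dispositional", False),
]
rates = overt_subject_rates(toy_tokens)
```

Running the same function separately on the tokens of each MLU stage would give one percentage per reading and stage, which is the kind of breakdown plotted in Figure 4.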

Figure 4. Overt Subjects and Ordering Source [bar chart: percentages (0-100) of overt subjects for the series Dispo-sub, Deo-sub, Real-sub, MV-sub and FV-sub at MLU I, MLU II and MLU III]

The distribution of overt subjects is quite different: MVs in general as well as MVs in dispositional readings are used with an overt subject in about 50% of their occurrences, whereas the frequency of overt subjects for deontic MVs (with external ordering source) increases from 63% at MLU I to 83% at MLU III. Caroline, obviously, has the capacity to produce either


Veronika Ehrich

full clauses with subjects or elliptical ones without, but she seems to avoid the effort of producing a full clause whenever the semantics of the MV ensures that the intended message will be recognized anyway. Subjects of FVs and dispositional MVs (with internal ordering source) are randomly omitted between MLU I and MLU III, whereas subjects of deontic and realistic MVs (with external ordering source) are spelled out on a quite regular basis by stage MLU II. In other words, the increasing availability of different MV readings goes along with the development of a more elaborate syntax.

3.4 Bare infinitives and strict coherence

According to Reis (2001), STRICT COHERENCE is the syntactic correlate of epistemicity, its defining feature being the embedding of a bare infinitive. Thus, on the STRICT COHERENCE account, Caroline should have acquired the bare infinitive constraint on MVs before she produces her first epistemic MV readings. In fact, even some of her earliest MVs combine with a bare infinitive (see (11) above for illustration), and, by age 2;3, Caroline even distinguishes bare infinitives and zu-infinitives (16-18). But she is not very consistent in this respect: sometimes she omits the infinitive ending (18), and in 38% to 46% of her MV-productions she fails to embed an infinitive at all.

(16) ***File "CHI020302.cha": line 418;
     *MOT: guck mal wer wohnt denn in dem schwarzen Haus?
     *CHI: ja #2 ein Dach ## musst du malen #1.
           yes a roof must you paint-Inf
     *CHI: ich mal #3 Dach # ein Dach #1.

(17) *** File "90-02-17.cha": line 235.
     *CHI: ja #2 brauch keine Angst zu haben die Ente # .
           yes need not be afraid the duck
     *MOT: ja aber die Eulen die fressen nämlich manchmal Enten #

(18) *** "CHI020325.cha": line 30;
     *MOT: aber du kannst zum Beispiel # ne Strumpfhose naehen.
     *CHI: strumpf # trumpf # Hose naeh kann doch nicht # .
           panty panty hose sew can yet not


While being able to obey the bare infinitive constraint in principle, she avoids the infinitive quite frequently. This suggests that integration of MVs and infinitives into a strict coherent construction poses considerable difficulties for her (compare Table 3).

Table 3. Distribution of Infinitives. The first figure gives the absolute number of MVs in each group, the second figure gives the absolute number of MVs embedding bare infinitives in that group, percentages of embedded infinitives in parentheses.

          MVtotal          Dispositional     Deontic          Realistic
MLU I     81, 31 (38%)     46, 13 (10%)      10, 9 (90%)      0
MLU II    540, 263 (46%)   340, 123 (36%)    136, 83 (61%)    43, 42 (97%)
MLU III   282, 122 (43%)   127, 57 (44%)     71, 61 (81%)     27, 22 (81%)

Again, there are remarkable differences with respect to the different MV readings. Deontic and realistic MV occurrences, which are based on an external ordering source, occur with bare infinitives almost twice as often as dispositional MVs based on an internal ordering source. Obviously, Caroline’s performance on the bare infinitive constraint and her performance on the overt subject requirement follow the same strategy: dispositional MVs occur in more elliptical constructions, whereas deontic and realistic MVs tend to be used in fully integrated structures with overt subjects plus embedded infinitives.

We measured Caroline’s MV-productions for the degree of integration. MV-constructions containing a bare infinitive in addition to an overt subject are counted as fully integrated (Integration Factor = 1); MVs accompanied by either a bare infinitive or an overt subject count as partially integrated (= 0.5). A zero-degree of integration (= 0) is assumed where an MV construction lacks a bare infinitive as well as a subject. We calculated the Mean Integration Factor for each MLU stage by adding up the values obtained for the individual MVs at a given stage and dividing the sum by the total number of MVs occurring at that stage. Figure 5 shows that the degree of integration is lowest for MVs in dispositional readings, and highest for MVs in realistic readings (= 0.79 at MLU III).
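The scoring scheme just described can be written out as a short sketch. The token representation (two boolean flags per MV occurrence), the function names and the toy data are assumptions made for illustration; only the scoring values 1, 0.5 and 0 and the averaging step come from the text.

```python
def integration_factor(has_bare_infinitive, has_overt_subject):
    """1 if both properties are present, 0.5 if exactly one is, 0 if neither."""
    return (int(has_bare_infinitive) + int(has_overt_subject)) / 2

def mean_integration_factor(tokens):
    """Sum of the individual factors divided by the number of MV tokens.

    `tokens` is a list of (has_bare_infinitive, has_overt_subject) pairs
    for the MVs of one MLU stage (a hypothetical coding, assumed here).
    """
    values = [integration_factor(inf, subj) for inf, subj in tokens]
    return sum(values) / len(values)

# Invented tokens for one stage: fully, partially, partially, not integrated.
toy_stage = [(True, True), (True, False), (False, True), (False, False)]
mif = mean_integration_factor(toy_stage)
```

Multiplying the result by 100 gives the scale used on the vertical axis of Figure 5, where an Integration Factor of 1 corresponds to 100.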


Figure 5. Integration Factors for MVs in Different Readings [bar chart: Integration Factors on a 0-100 scale (Integration Factor 1 = 100) for the series MV-total, MV-disp, MV-deo and MV-real at MLU I, MLU II and MLU III]

These data, again, are evidence for a close interaction between the syntax and the semantics of MVs in child language. This does not necessarily entail that syntax is the source of MV semantics or vice versa. It may very well be the case that semantic and syntactic capacities, while having developed separately up to a certain time (each in its own way and temporal order), converge at a certain point in bringing about a growing variety of MV readings with their specific syntactic shapes.

3.5 Evidence for a developing THEORY OF MIND

In order to find out whether Caroline had developed a THEORY OF MIND when producing the first epistemic MVs, the corpus was checked for occurrences of mental verbs like wissen (‘know’), denken (‘think’), meinen (‘mean’), verstehen (‘understand’), glauben (‘believe’), finden (‘judge’) and vergessen (‘forget’). These verbs are used in reference to mental states in 40% of their overall occurrences (Benz 2004). See (19) for illustration:

(19) *** "CHI020413.cha": line 15.
     *MOT: sprichst du mit deiner Puppe #1?
     *CHI: ja # ja #1 [=! stoehnt] .
     *CHI: kann nich hinstelln # .
     *CHI: weisst du genau #6.
           know you exactly

Caroline’s use of sentential adverbs like vielleicht (‘perhaps’) is further evidence of her capacity for modal reasoning. She uses vielleicht in a deliberating function at almost the same age at which she produces her first epistemic MVs (20).

(20) ***CHI020814: line 81.
     *MOT: und da # hat er da sein Taschentuch #1?
     *CHI: nein ein Baby #1.
     [...]
     *MOT: und warum sitzt es da #1?
     *CHI: vielleicht # ist da #2 in Papis Bauch #2.
           perhaps is there [a baby] in daddy’s belly
     *MOT: aber Papis haben keine Babys im Bauch ##

Caroline talks about consequences resulting from a possible action and uses her first conditionals. The one in (21), though connected to an ongoing action, shows that she starts reflecting on alternative futures resulting from her actions as early as age 2;7.

(21) *** "CHI020710.cha": line 237.
     *MOT: dis # ah ja ich probier es jetzt mal nur mit falten .
     *CHI: nein dis # brauchst du ## zum Kleben #3.
     *CHI: wenn dis #gar nich geht ## machen wir ohne ##Klebe #1.
           if this not works do we without glue
     *CHI: ich mach #1 [=! stoehnt].

There is, thus, firm evidence that Caroline has acquired an elementary THEORY OF MIND by age 2;7. She has not only acquired inferencing capacities but is also able to express her reasoning in appropriate linguistic terms.


4 Conclusion

MV-acquisition studies of the last twenty years have been mainly concerned with cognitive constraints on the rise of epistemic meanings, whereas the form-meaning correlation has hardly been tackled. By contrast, the present study is focussed on the interaction between the syntax and the semantics of modal verbs as evidenced by Caroline’s development. Caroline’s production of MVs shows that form and meaning of MVs are indeed tightly connected. This is not primarily a function of the MODAL BASE contrast between circumstantials and epistemics, but seems to depend on the contrast between INTERNAL (ability and bouletic readings) and EXTERNAL ORDERING SOURCEs (deontic, realistic and epistemic readings).

Before using her first epistemic MVs by age 2;7, Caroline starts varying her syntax for circumstantial MVs. While elliptical constructions lacking a subject phrase, a bare infinitive, or both, are predominant in bouletic and ability readings of MVs even beyond age 2;10, Caroline uses a more elaborate syntax for deontic and realistic readings by age ≥ 2;4. The increase in semantic MV variation goes along with the production of more full-fledged syntactic structures. Caroline’s growing capacity for handling semantic polyfunctionality and her growing command of MV syntax converge in the period from age 2;4 to 2;10. This is also the age when she produces her first epistemic MVs.

But syntactic development in general, and Caroline’s growing command of strict coherence in particular, are probably not the only source of epistemicity. The fact that reference to mental states, first epistemic adverbs and conditionals temporally overlap with her first epistemic MVs indicates that the cognitive basis for modal reasoning develops across various grammatical categories. Obviously, syntactic progress, semantic diversification and cognitive development are all necessary prerequisites for the rise of epistemicity, but none seems to be sufficient by itself.
The data reported here do not support any monodirectional account in terms of syntactic vs. semantic boot-strapping, nor in terms of strict cognitivism. Caroline’s first epistemic MV uses seem to arise from converging developments in syntax, semantics and cognition. She makes use of whatever evidence is available to her, in order to gain access to the grammar of MVs, and, of whatever capacity she has, in order to make herself understood. This is, perhaps, just the way language development works.


Acknowledgements

I would like to thank the editors and two anonymous reviewers for their helpful comments and suggestions. I am greatly indebted to the members of the ‘Modal Verb Project’ in the SFB 441, especially to Marga Reis, who shared their ideas about the syntax, semantics, and acquisition of modal verbs with me.

References

Benz, Judith 2004 Epistemische Ausdrücke in der Kindersprache. Zulassungsarbeit, Tübingen: Deutsches Seminar.
Clahsen, Harald and Martina Penke 1992 The Acquisition of Agreement Morphology and its Syntactic Consequences: New Evidence on German Child Language from the Simone-Corpus. In Jürgen M. Meisel (ed.), The Acquisition of Verb Placement, 181-223. Dordrecht: Kluwer.
Doitchinov, Serge 2001 „Es kann sein, dass der Junge ins Haus gegangen ist“. Zum Spracherwerb von können in epistemischer Lesart. In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 109-134. Hamburg: Buske.
    2005 Why do children fail to understand weak epistemic terms? An experimental study. [This volume].
Jordens, Peter 1990 The Acquisition of Verb Placement in Dutch and German. Linguistics 28: 1407-1448.
    2002 The acquisition of verb placement in Dutch and German. Linguistics 40: 687-765.
Kiss, Tibor 1995 Infinitive Komplementation – Morphologie, thematische und syntaktische Relationen. Neue Studien zum deutschen verbum infinitum. Tübingen: Niemeyer.
Kratzer, Angelika 1991 ‘Modality’. In Arnim v. Stechow and Dieter Wunderlich (eds.), Semantik. Ein internationales Handbuch der zeitgenössischen Forschung, 639-650. Berlin/New York: de Gruyter.
MacWhinney, Brian B. 2000 The CHILDES project: Tools for Analysing Talk. Third edition. Mahwah, NJ: Lawrence Erlbaum Associates. http://childes.psy.cmu.edu
Müller, Reimar and Marga Reis (eds.) 2001 Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9. Hamburg: Buske.
Öhlschläger, Günther 1989 Zur Syntax und Semantik der Modalverben des Deutschen. Tübingen: Niemeyer.
Papafragou, Anna 2002 Modality and theory of mind. Perspectives from language development and autism. In Sjef Barbiers, Frits Beukema, and Wim v.d. Wurff (eds.), Modality and its interaction with the verbal system, 185-204. Amsterdam: Benjamins.
Reis, Marga 2001 Bilden Modalverben im Deutschen eine syntaktische Klasse? In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 287-318. Hamburg: Buske.
    2004 Modals, so-called Semi-Modals and Grammaticalization. Unpublished ms., Tübingen University.
Roberts, Ian and Anna Roussou 1999 A formal approach to grammaticalization. Linguistics 37: 1011-1041.
Ross, John 1969 Auxiliaries as main verbs. In William Todd (ed.), Studies in Philosophical Linguistics. Series I: 77-102. Evanston, Ill.: Great expectations Press.
Shatz, Marilyn, Henry M. Wellman, and Sharon Silber 1983 The acquisition of mental verbs. A systematic investigation of the first reference to mental state. Cognition 14: 301-321.
Shatz, Marilyn and Sharon A. Wilcox 1991 Constraints on the acquisition of English modals. In Susan A. Gelman and James P. Byrnes (eds.), Perspectives on language and thought: interrelations in development, 319-353. New York: Cambridge University Press.
Stechow, Arnim v. and Wolfgang Sternefeld 1988 Bausteine syntaktischen Wissens. Ein Lehrbuch der generativen Grammatik. Opladen: Westdeutscher Verlag.
Stephany, Ursula 1995 Function and Form of Modality in First and Second Language Acquisition. In Anna Giacalone Ramat and Grazia Crocco Galeas (eds.), From Pragmatics to Syntax. Modality in Second Language Acquisition, 105-120. Tübingen: Narr.
Wurmbrand, Susanne 1999 Modal verbs must be raising verbs. In Proceedings of the 18th West Coast Conference on Formal Linguistics (WCCFL 18): 599-612. Somerville, MA: Cascadilla Press.

The Decathlon Model of Empirical Syntax

Sam Featherston

1 Introduction

This paper reports our investigations into the data base of syntactic theory, specifically addressing the similarities and differences between corpus data and experimentally obtained well-formedness judgements and sketching the implications for the construct of grammaticality and the architecture of the grammar which our findings have. The motivation for these studies was a dissatisfaction with the state of affairs in syntax, when two syntacticians can look at the same phenomenon and come up with widely differing analyses of what is going on. Another disappointment is the lack of any real forward movement in theory: alternative analyses seem to succeed each other more due to fashion than due to falsification. We might say that syntactic description, let alone syntactic explanation, is underdetermined by its data base.

This is in part due to the nature of the available evidence: most data feeding into syntactic theory has significant flaws: it is fuzzy and it reflects multiple factors, only some of which are relevant to theory. These factors are difficult to identify and even more difficult to distinguish (e.g. Schütze 1996). Judgements have been particularly criticized as a data type, partly because of their inherent qualities, but partly for the way that they have been used (e.g. Labov 1996). One problem is that, faced with the impreciseness of judgement intuitions, researchers have idealized this data type to a very great degree, reducing the scale to a binary opposition, with marginal values as unclear cases.

In part as a response to this situation, some syntacticians have sought other data sources, such as corpus frequencies and processing studies, which has tended to split the field and furthered the development of schools of syntax, who have neither a common formalism nor a common data base which would permit rapprochement between them.

Related to this diversification into schools, a range of different grammar architectures have arisen. Generative syntacticians most commonly still use judgements, and assume a “live rail” grammar, in which any infringement of a grammatical rule causes a structure to be excluded absolutely. Those interested in competition models such as Optimality Theory (OT, Prince and Smolensky 1993) will tend to use frequency data and allow some idealization, while those favouring probabilistic models will tend to take a more fine-grained approach to frequency data, and account for the variants using probabilities (e.g. Manning 2003). It need hardly be said that these models of the architecture of the grammar cannot all be right.

Our view was that a more detailed study of data types and their characteristics might provide a way forward. When we have a more detailed understanding of the factors which each reflects and thus how the different types relate to each other, then we shall be in a better position to judge the evidence of each for syntax. This should also allow us to establish a well-founded procedure for idealization for each. With more finely grained data, we should be in a better position to determine how the grammar functions, and which architecture is therefore the correct one.

In the following, we first sketch the sort of studies we have undertaken, outline the broad picture of the results, and then move on to the implications that these findings have for the relationship between data and theory and for the nature of the grammar. It turns out that the grammar has a rather different architecture to what is generally assumed. Note that this article aims to provide an overview of our results and our interpretation of their wider implications for theory; space does not permit discussion of the individual studies (see Featherston 2002, 2003, 2004, 2005a, 2005b).

2 Our studies

We have carried out many studies using both frequencies and judgements, aiming firstly to clarify these issues of data and data type, secondly to clear up outstanding questions in the syntax, and thirdly to clarify the nature of the grammar. We have performed experiments on German and English, and have addressed a range of syntactic structures, among others island constraints, reflexives, reciprocals, word order, parenthetical insertions and echo questions. Our frequency data for German is drawn from the COSMAS I corpus of German (IDS, Mannheim), and for English, the British National Corpus (Oxford).

We have generally elicited our judgement data using a variant of the magnitude estimation procedure (Bard et al. 1993). This method has three main differences to standard judgement elicitation. First, only relative judgements are gathered. Subjects are asked whether example A sounds more or less “natural” than example B, and by how much, but no absolute criterion of well-formedness is used. This distinction between relative and categorical judgements is important (see Section 5.2 below), but it also has the simple


practical advantage that it defuses the problem of having to define a cut-off point between well-formed and ill-formed. Second, to anchor the judgements, subjects give their judgements relative to reference items and to their own previous judgements. Third, there is no imposed scale; no top or bottom limit nor minimum division between scores. Judgements are expressed in numerical form, and decimal fractions are allowed. This method allows informants to express all the differences in “naturalness” that they perceive, with no coercion to a given scale.

When the limitation to a scale selected by the linguist is removed, the results exhibit more differentiation than conventional judgements are assumed to contain. But this additional information is an inherent part of grammaticality judgements, which was always potentially present. Previous collection methods were insufficiently sensitive to reveal this detail; deliberately so, since asking for categorical judgements is a form of idealization, of simplifying the explanatory task by reducing the amount of information gathered. In this paper we argue that this idealization has not only, as intended, simplified the job of explanation, but also distorted the picture, and led to some false conceptions of the way that the grammar and associated systems work.

3 Relative judgements

In this section we sketch out the general pattern that the results of the judgement studies showed. This is necessary, since these experimentally obtained judgements reveal a very different pattern to that often assumed (but see Keller 2000 and Cowart 1996 for discussion and much insight).

Firstly, judged well-formedness is a continuum. Figure 1 shows the results of a typical experiment gathering judgements from informants who are not forced to use a particular scale. In the graph, the four different syntactic conditions tested appear along the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores reflecting “better” judgements. The error bars show the mean values and 95% confidence intervals of the scores for each condition. Let us be clear that these mean judgements can show no differential effect of lexis, context or plausibility, since these factors are fully controlled for. These error bars show only effects of structure.

This experiment looked at the effect of discourse linking (Pesetsky 1987), which one might loosely paraphrase here as the effect of wh-item type on the permissible order of wh-items in multiple wh-questions. The standard view of the data is that (1a), (1b), and (1d) are good, but (1c) is ungrammatical, so syntacticians look for a factor affecting (1c).


Figure 1. Given freedom of judgement scale, informants do not just distinguish ‘good’ and ‘bad’ structures, but also ‘good’ ones and ‘better’ ones.
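Error bars of the kind shown in such graphs, a mean with a 95% confidence interval per condition, can be computed along the following lines. This is a sketch under stated assumptions: the chapter does not say how the intervals were calculated, so the normal approximation used here, the function name `mean_with_ci` and the toy scores are illustrative choices only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(scores, level=0.95):
    """Mean and confidence interval of one condition's judgement scores.

    Uses a normal approximation (an assumption of this sketch); with few
    informants a t-based interval would be somewhat wider.
    """
    m = mean(scores)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width

# Invented scores for a single condition.
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
centre, lower, upper = mean_with_ci(toy_scores)
```

Applied once per condition, this yields the triple of values that an error-bar plot displays.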

(1) a. Who ate what?
    b. Which person ate which food?
    c. *What did who eat?
    d. Which food did which person eat?

The results of this study confirmed that (1c) is plainly worse than the other conditions, but the data also reveals that (1b) is not only good, but also clearly better than (1a) and (1d). What is more, (1b) is just about as much better than (1a), as (1d) is better than (1c). The factor we should be looking for therefore applies to both (1b) and (1d). If we use a model of well-formedness idealized to a binary opposition, in which (1a), (1b), and (1d) are all just good, not only do we do serious violence to the data, but we will also be looking in the wrong place for the correct syntactic account. In order to deal with this data, we must have a model of well-formedness as a continuum, on which there is not only good and bad, but also good and better. A model with good, bad, and intermediate positions (such as example structures in syntax with a question mark) will not suffice here. It follows that there are cases where the correct syntactic analysis of a structure can only be represented with a model of well-formedness as a continuum.

In the following, we will illustrate our points about the data with reference to a particular piece of work in which both judgements and frequencies were collected. Let us be clear that this is just one example study, but other studies of the same type show the same basic patterns. The focus of this work was


Figure 2. The data pattern from judgement studies. Given the choice, informants do not choose to bunch structures as good or bad; instead they produce a continuum of well-formedness.

the realizations of coreferent objects in the mittelfeld in German. Example (2) shows just eight of the possibilities; we also tested full NPs as antecedents. Full linguistic details of this work are in Featherston (2002), but these are not necessary for a full understanding of the present paper. Note that we tested 16 conditions in the original study, but here we shall sometimes just report on eight of them, for clarity.

(2) a. weil ich ihm_i ihn_i (selbst) im Spiegel gezeigt habe
       as I him.DAT him.ACC self in.the mirror shown have
    b. weil ich ihn_i ihm_i (selbst) im Spiegel gezeigt habe
       as I him.ACC him.DAT self in.the mirror shown have
    c. weil ich ihm_i sich_i (selbst) im Spiegel gezeigt habe
       as I him.DAT REFL self in.the mirror shown have
    d. weil ich ihn_i sich_i (selbst) im Spiegel gezeigt habe
       as I him.ACC REFL self in.the mirror shown have

Figure 2 shows the results of this study, this time with eight conditions. Here we have ordered the conditions not by their linguistic features (for this see Figure 3 below), but in order of their judged well-formedness, from best to worst. The judged well-formedness of these structures descends gradually. Looking at this continuum, it will be clear that the choice of any point at which to locate the cut-off point between well-formed and ill-formed will


Figure 3. This graph shows the results of the same judgement study on eight structural variants defined by three binary features. All syntactic features have an effect upon the judgements, and these effects are cumulative.

be arbitrary. These examples straddle the putative location of the cut-off point, since the best among them would be regarded as well-formed and the worst ones as ill-formed. In fact informants show no sign of using categorical well-formedness when given the option not to; instead they always use a continuum. We put forward our explanation of the intuition of categorical well-formedness in Section 5.2.

We now turn to Figure 3, which shows the same data set but with the conditions ordered by their grammatical features. This was a 2x2x2 experimental design, that is, we tested eight structural variants differing on three binary parameters: a, b and c. The graph shows the conditions across the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores indicating better judgements. Each pair of error bars linked by a line is a minimal pair differing only in one of the three features a, b and c. In each case one of the pair violates a constraint and the other does not. We annotate the conditions in the graphic with the numbers 1 to 8 for easy identification, but the values of the three syntactic features for each condition are given on the baseline. For example, condition 1 has the values a:1, b:0 and c:0, which means that it violates constraints b and c, but not a.

Let us look at the pattern of data. The clear finding is that well-formedness judgements directly correspond to the syntactic conditions, that is, the conditions are judged well-formed to the extent that they do not violate the syntactic constraints, but any and every constraint which is violated affects the judgements. It is evident that each constraint differentiating a minimal pair has a consistent effect upon the judgements: the relationship between the scores assigned to each pair differentiated by a given constraint is the same. So the relationship between conditions 1 and 2 is the same as between 3 and 4, and between 5 and 6, and between 7 and 8. Put differently, the relation between all 1bc and all 0bc is consistent. Notice also that this is generally true: for each pair ab0 and ab1, and a0c and a1c, the relationship is the same. Whether the structure was good or bad before does not matter: the application of these constraint violation costs is blind and automatic.

Notice also that the pairs close to each other (e.g. 1 and 2, 3 and 4, ...), linked by the short broken line, differ in their ratings only moderately, which shows that this particular constraint has a relatively small violation cost. The other two sets of minimal pairs (1 and 3, 2 and 4, etc. and 1 and 5, 2 and 6, etc.) have greater violation costs, and consistently so, but the systematicity is just as evident. The violation of a given linguistic constraint entails a given difference in judgements. We can say that each linguistic factor has a quantifiable, and constraint-specific, effect upon judged well-formedness.

An additional important point (Keller 2000) is that these violation costs are cumulative. The violation of any constraint entails a violation cost in judged well-formedness, to which any further violation costs are added. It is thus systematically the case that more violations cause a structure to be judged worse.

This raises the question whether any of these constraints could be regarded as a ‘hard’ constraint. Perhaps the traditional definition of a ‘hard’ constraint is one which excludes a structure from being part of the language.
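The pattern of constraint-specific, cumulative and blindly applied violation costs can be made concrete with a small additive sketch. It is an illustration only: the cost values, the baseline and the names `VIOLATION_COST`, `predicted_judgement` and `cost_of` are hypothetical, and the sketch is not a model fitted to the study's data. As in the text, a feature value of 0 marks a violated constraint.

```python
from itertools import product

# Hypothetical violation costs on an arbitrary judgement scale
# (invented values, not estimates from the study).
VIOLATION_COST = {"a": 3.0, "b": 2.0, "c": 0.5}

def predicted_judgement(features, baseline=0.0):
    """Baseline minus the summed costs of all violated constraints.

    `features` maps each constraint to 1 (satisfied) or 0 (violated).
    """
    violated = [name for name, value in features.items() if value == 0]
    return baseline - sum(VIOLATION_COST[name] for name in violated)

# The eight conditions of a 2x2x2 design.
conditions = [dict(zip("abc", values)) for values in product((1, 0), repeat=3)]

def cost_of(constraint):
    """Set of score differences over all minimal pairs for one constraint."""
    others = [name for name in VIOLATION_COST if name != constraint]
    differences = set()
    for values in product((1, 0), repeat=len(others)):
        rest = dict(zip(others, values))
        differences.add(
            predicted_judgement({**rest, constraint: 1})
            - predicted_judgement({**rest, constraint: 0})
        )
    return differences
```

Under this additive assumption, `cost_of` returns a single value for each constraint, that is, every minimal pair differing in a given constraint shows the same difference whatever the status of the other two, and each further violation lowers the predicted score again.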
In judgement data we might expect a ‘hard’ constraint to cause a violating structure to drop to the bottom of the scale. One might predict that a ‘hard’ constraint would cause a structure to be judged so bad that no further additional constraint violation could make it any worse. Perhaps surprisingly, experience with this data type has shown that there is apparently no such thing as a ‘hard’ constraint on this definition. The effect of a violation is only ever to make a structure worse, by an identifiable amount; no constraint violation makes a structure so bad that it cannot be made worse by an additional violation.

We refer to this quality of linguistic constraints in judgement studies as survivability, which is best understood in contrast to the OT concept of violability. OT’s violability means that under certain circumstances, constraints have no effect on the output, that is, they fail to apply. This is in part necessary because the only effect that a constraint can have in OT is to exclude categorically the violating structures – OT only has ‘hard’ constraints. Our survivability means that all constraints always apply, exceptionlessly, and a given violation always has the same effect – there is no probabilistic element at all. The effect of a constraint violation is to cause a structure to be judged worse, but no violation excludes a structure. We lay out what can exclude a structure under Section 5.2 below. Notice that this is strong evidence that our informants are not using occurrence as a criterion when they give a judgement – they must be assumed to be responding to something else.

In the light of the finding that violation costs as measured in judgements of well-formedness are cumulative, survivable and blindly applied, on the one hand, but, as we shall see in Section 5.1 below, not directly related to output frequency on the other, it seems reasonable to assume that these violation costs, and hence well-formedness as measured by judgements, are related to computational workload. This raises questions about what psycholinguistically plausible mechanism might allow us to convert cognitive workload into judgements, and why we have such an ability. We take up these questions in Section 5.2, but we now turn to consider the evidence of frequency data.

4 Judgement data and frequency data

Frequency data reveals a very different pattern. Table 1 contains the data pattern of the frequency study looking at the same variants of object coreference structures as those judged in Figures 2 and 3. The important point here is the distribution of forms found in the corpus: one structure is found fourteen times, another one is found once, but none of the others appear at all.

194 Sam Featherston

Table 1. Data from COSMAS, IDS, Mannheim (531 million word forms)

Structure              Gloss                        Hits
ihn_i ihm_i            “him.ACC him.DAT”            0
ihm_i ihn_i            “him.DAT him.ACC”            0
ihm_i sich_i           “him.DAT REFL.ACC”           0
ihn_i sich_i           “him.ACC REFL.DAT”           1
ihn_i ihm_i selbst     “him.ACC him.DAT SELF”       0
ihm_i ihn_i selbst     “him.DAT him.ACC SELF”       0
ihm_i sich_i selbst    “him.DAT REFL.ACC SELF”      0
ihn_i sich_i selbst    “him.ACC REFL.DAT SELF”      14

Frequency data shows evidence of a competitive interaction of candidate forms, which would seem to indicate that the “best” structure of a comparison set usually wins through to be produced. Intuitively, this seems to be evidence that there is a competition function in the grammar, which in particular Optimality Theory has raised to its central operating principle. Interestingly, slightly less “good” alternatives are sometimes produced, which would suggest that the competition for output functions probabilistically. This is the motivation for stochastic versions of OT (eg Boersma and Hayes 2001).

Figure 4 allows us to compare the two data patterns directly, as it superimposes the two different measures of the same sixteen structures on a single graph. The error bars show the mean normalized judgements obtained for the sixteen structures tested (left-hand scale). These can be seen to increase steadily from the very bottom to the very top, while the frequencies (right-hand scale), represented by the line without error bars, creep across the bottom at zero, and only rise sharply at the right-hand end. The comparison of these two measurements of the same structures brings the contrast of the data patterns into sharp focus.

Figure 4. The contrast between COSMAS frequency data and experimental judgement data on the same phenomenon.

The first point to notice is their similarity: the same structures come top in both data types. The highest frequency structure is judged best and the next highest is judged second best, which makes it seem likely that the two data types are at least in part measuring the same underlying factor. But we should also note the key difference: the judgement data demonstrates that at least some part of the human linguistic computation mechanism is sensitive to differences among structures which are so bad that they would never be produced, for the structural variants on the left are surely so bad that they would never appear in any corpus, no matter how big. Since this is the case, it is plain that the two data types are also in part not measuring the same factor. We can therefore exclude categorically the possibility that relative judgements merely reflect frequency or probability of occurrence in some way. The attested frequency and probability of occurrence of the worst two thirds of these structural variants is exactly the same, and it is zero. These structures have in all likelihood never been used in all of human history, but our subjects can readily distinguish them in judgements, and do so very consistently. Our Decathlon Model of well-formedness and the architecture of grammar attempts to specify what process differentiates the two types of data, frequencies and relative judgements.

5 The Decathlon Model

The name of this model derives from the athletic discipline of the decathlon. In this event, competitors take part in ten different sub-disciplines, and their performances are converted into a numerical form according to a set of standard scoring tables. The sum of these scores decides who wins the medals. But the scores are calculated not on their relative performance in the sub-disciplines, but on their absolute performances, which means that whether an athlete comes first, second, or third in a sub-discipline is of no significance; what matters is that they perform at their personal best. In a sense therefore, they are not so much competing against each other at this stage as against themselves. Competition between competitors takes place at the second stage, where the ten numerical scores are totalled, and the highest scorer takes the gold. Something similar seems to us to be happening in human linguistic processing, as will become clear in this section. The Decathlon Model is at once an outline architecture of a grammar and at the same time an account of the differences between data types.
Our finding that gradience reflects a real psychological phenomenon related to constraint violation cost (see Section 3) demands that the architecture of syntax reflect this reality, which current models generally do not do. An empirically adequate and psychologically real grammar must have the following features: quantifiable violation costs, a continuum of well-formedness, and survivable constraints (ie no constraint violation necessarily results in the exclusion from the language of the violating structure); all this to account for our judgement data. It must also generate output competitively and probabilistically so as to reflect the data patterns observed in frequencies.

The obvious way to achieve this is for our syntax model to distinguish between a grammatical module which applies syntactic constraints and another which selects output. Our Decathlon Model thus has a Constraint Application module, which applies constraints, assigns violation costs, and outputs form/meaning pairs, weighted with violation costs. We know certain things about the internal functioning of this module: constraints are applied blindly and exceptionlessly, and violation costs are cumulative. We may think of this module as containing the grammar, though it also contains the other factors which affect well-formedness in judgements.

The second module, Output Selection, functions quite differently. Its task is to select from the possible form/meaning pairs the form which is to be output (in production processing) or the interpretation to be assigned to an input (in receptive processing), and exclude the others. It functions competitively and selects the best candidate on the basis of the weightings assigned by the Constraint Application module. This selection occurs probabilistically, however, which accounts for occasional production of sub-optimal versions.

Figure 5. The Decathlon Model of the grammar and grammaticality.

In Figure 5 we see the computational steps which generate frequency data and judgements. In production we assume that an unformed message is delivered for formulation in the Constraint Application module, drawing on the resources of the lexicon. Incrementally, perhaps phrase by phrase, candidates for the linguistic representation together with their weightings are proposed to the Output Selection function, which selects the best, or one of the best. The arrows exiting the left-hand module show the candidate continuations of the structure passing to the selection module, their weightings represented by their offset positions. Sometimes two continuations will be roughly equally good: She turned the light off vs She turned off the light, in which case both will have about equal weightings and both will occur. Receptive processing makes use of the same two modules, Output Selection choosing in this case what form/meaning pair to assign to a given form, the input, rather than choosing what form/meaning pair to assign to a given meaning, the message the speaker wishes to convey.

Giving judgements is a little different. The example is input and processed as usual to determine its structure and meaning, but instead of returning the output of the selection module, relative judgements consist of returning the output of the Constraint Application function. Recall that Constraint Application outputs form/meaning pairs with a weighting. This of course requires the claim that the output of this module can be consciously accessed, as well as merely passed on as usual for selection. The capacity to be aware of fine-grained cognitive workload is not something which we might have predicted for ourselves, but it is nevertheless not implausible, since we are certainly aware of more coarse-grained thinking effort. The difference between frequency measures and relative judgements can therefore be attributed to their being the outputs of two different modules of linguistic processing, both of which are independently motivated.

This model has a number of explanatory advantages. First, it is firmly based on the primary data of syntax. It accounts for the differences in outcome patterns between data types, an outstanding question in linguistics. Frequency data reflects the output of the Output Selection module, which is (necessarily, since we produce only one form of an utterance) competitive. Since this module uses the weightings which are output by the Constraint Application module, we account for the fact that judgements and frequencies agree in identifying the same forms as optimal. These weightings are themselves functionally motivated by their identification with computational complexity, an explanatorily economical association, since we know of the existence of workload effects from other sources, such as processing data.
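The division of labour between the two modules can be sketched in a few lines of code. This is a toy illustration and not an implementation proposed in this chapter: the candidate forms, constraint names and cost values are hypothetical, and the use of a softmax with a temperature parameter to make selection competitive but probabilistic is an assumption made only for the example.

```python
import math
import random
from collections import Counter

# Hypothetical constraints and violation costs (illustrative values only).
COSTS = {"c1": 2.0, "c2": 1.0, "c3": 0.5}


def constraint_application(candidates):
    """Apply every constraint blindly and sum the costs of all violations.

    Returns a weighting (cumulative violation cost) for each candidate form.
    Nothing is excluded here, however many constraints a form violates.
    """
    return {
        form: sum(COSTS[v] for v in violations)
        for form, violations in candidates.items()
    }


def relative_judgement(weightings, form):
    """A judgement reads the weighting directly (higher means better)."""
    return -weightings[form]


def output_selection(weightings, rng, temperature=0.4):
    """Select one form for output, favouring low-cost candidates.

    The softmax rule is an assumption of this sketch; the text only says
    that selection is competitive and probabilistic.
    """
    forms = list(weightings)
    weights = [math.exp(-weightings[f] / temperature) for f in forms]
    return rng.choices(forms, weights=weights, k=1)[0]


# Five hypothetical structural alternatives for one semantic content.
candidates = {
    "A": [],
    "B": ["c3"],
    "C": ["c2", "c3"],
    "D": ["c1", "c2"],
    "E": ["c1", "c2", "c3"],
}

weightings = constraint_application(candidates)
rng = random.Random(0)  # fixed seed so that the run is repeatable
corpus = Counter(output_selection(weightings, rng) for _ in range(1000))

for form in candidates:
    print(form, relative_judgement(weightings, form), corpus[form])
```

In a run of this kind the judgements separate all five candidates, including the two worst ones, while the simulated corpus is dominated by the best candidate and contains the poorest ones rarely or not at all. That is the qualitative contrast between the two data types which the model is intended to capture.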
The fact that output selection occurs probabilistically accounts for the occasional production of sub-optimal versions: rare but documented counterexamples in corpus data are thus no threat to grammatical generalizations in this model. Note that this is not an unprincipled method of accounting for awkward data; on the contrary, it makes strong and testable predictions: the most frequently occurring variant should be that which is judged best, but the much lower frequency of alternative variants should be strictly in order of their judged well-formedness.

Second, it ties the grammar in to evidence from sentence processing. It is generally agreed that syntactic processing operates on-line, incrementally, and applies information from multiple sources in order to take decisions. It has often been suggested that the processor consists of a constraint component and a decision component which prunes less optimal interpretations or outputs (see Featherston 2001 for discussion of parser types). Our model is compatible with the evidence that we readily understand structures with errors, for example. This makes it necessary that we should be able to assign a structure to input which contains faults. Our Constraint Application module can account for this well-documented characteristic, since structures with constraint violations are not immediately excluded from the language but merely given more negative weightings. No model which assumes that a grammatical violation cost is identical with exclusion from the language can do this. This fault-tolerant quality greatly extends the range of linguistic data that the grammar can account for.

A third strength is that it provides some explanation of the wide variation in grammar architectures that we find competing in linguistic theory. Each of these captures a part of the fuller picture that we have sketched: until the recent interest in competition in syntax (eg Müller and Sternefeld 2001), it was generally the case that all constraints were thought to apply to all structures, unorderedly, blindly, and automatically. Our judgement data confirms the empirical reality of this and it is reflected in our model. OT, by contrast, is entirely committed to competition, motivated by the insight that it is generally the best of any competing set of structural alternatives which is produced. This too reflects a real aspect of the empirical data: the process of selecting a form to produce necessarily results in a competitive interaction – the non-occurrence of anything but the best. This is thus included in the Output Selection module in our model. Probabilistic grammars (cf Manning 2003) too have their motivation: there is indeed a probabilistic component in the linguistic production system, although our relative judgements suggest that it is no part of the grammar, which operates blindly and exceptionlessly, but is located further downstream at the selection stage.
Each of these grammar types can achieve some success because each reflects an aspect of the data: the Decathlon Model shows that they need not be contradictory, and includes all three features simultaneously.

Our fourth and last explanatory advantage concerns the position of the grammar in the wider picture of evidence about the way language works. Our model allows the syntax to cover a much wider range of phenomena. Such issues as linguistic variation and language acquisition can be accounted for in a model with exceptionless constraint application but a parameter of violation cost strength. For example, Aissen & Bresnan’s (2002) Stochastic Generalization notes that similar constraints may be found cross-linguistically, but they appear grammatical and categorical in one language while being mere statistical tendencies in another. We have a ready account of these findings: the same factors exist across languages, but their violation costs vary, due to interactions of constraints (for the superiority effect in German and English as an example of this, see Featherston 2005b). Not only the differences between languages, but also regional, sociolinguistic and even idiomatic variation can be encoded as differences in violation cost amplitude. The learning of the language-specific parts of these violation strengths can thus be seen as a part of the acquisition of syntax.

Our model thus offers a far wider view of the linguistic environment than most approaches to syntax. In this it bears a resemblance to the syntax of the sixties and seventies, when questions about the position of grammar in a more general cognitive setting were a standard issue for syntacticians. More recently they have tended to see their role as developing grammars within a psycholinguistic framework which in the meantime has become not merely a consensus, but rather a part of the set of basic assumptions of syntactic theory. Syntacticians now tend to devise syntactic analyses within this given conceptual space, rather than question the shape and extent of the space itself. In our work we have aimed to re-open this debate, and revisit these assumptions in the light of the new data available.

5.1 Well-formedness does not directly trigger occurrence

Our model is also supported by data from the interaction of well-formedness and occurrence. The standard assumption is that the functions of constraint application and output selection are not to be distinguished, and that they take place within the same module. In generative grammar, this would predict that any structure which is generated and does not violate any constraint on structure is grammatical and may be produced, while in OT the last candidate remaining, the only well-formed one, is produced. Both of these thus assume that production depends directly on the grammar, and that well-formedness directly determines occurrence. The Decathlon Model however claims that production competition determines output, so that there is no single level of well-formedness that triggers occurrence in the output.

In the light of this, consider the results of the experiment in Figure 6. This figure shows the results of an experiment which contained three unrelated sub-experiments, with their mean judgements indicated by error bars as before, arranged in ascending order of well-formedness, by sub-experiment. Each group of error bars is thus a set of structural alternatives competing to represent given semantic contents. This is clearest in the set on the right-hand side, where all are competing to represent a single semantic content, whereas the middle group are competing for three different semantic contents, and the left-hand group are competing for four different semantic contents. In each set, those structures which were found to occur in the COSMAS I corpus (IDS, Mannheim) are above the line, while those which do not occur are below the line.

Figure 6. The mismatch of well-formedness and occurrence: Production is competitive.

It is striking that the structures which occur always appear in a solid block, from the top of the group. This alone is strong evidence of competition for production, based on the weighting information which we can access as judgements. However, notice that the best two structures from the right-hand group, which are those which occur in the language, are nevertheless judged worse than some of the lower structural alternatives in the other groups, which do not occur. Let us be clear that these judgements were given by the same participants in the course of the same experiment, whose items were ordered randomly. The implication is clear: occurrence is not directly dependent upon well-formedness, but rather upon a competition function based on these weightings. This finding supports the distinction of the grammar and the production function, as in the Decathlon Model, but it is not compatible with an architecture in which these two are merged.

5.2 Categorical judgements and relative judgements

This insight into human linguistic processing offers an account of another outstanding question: Why do judgements, elicited under strictly controlled conditions, show that informants, given a free choice of scale, do not use a binary division or end points which might represent “fully grammatical” and “fully ungrammatical”? Our solution to this quandary is to distinguish the categorical judgements commonly used in syntactic work from the relative judgements obtained from our experimental studies. Our assumption of this dissociation is based upon several pieces of evidence.

The strongest evidence for the reality of categorical judgements is quite simply our intuition that there are such things as “full grammaticality” (= “I would expect to hear this”) and “full ungrammaticality” (= “I would never expect to hear this”). Every speaker seems to have this, and neither its reality nor its relevance can be doubted: any naive informant, given a binary choice whether an example is good or bad, can immediately make sense of the question. It seems likely that the existence of this intuition is the reason for the standard linguistic assumption of dichotomous grammaticality. On the other hand, the results of carefully controlled experimental studies such as our own demonstrate conclusively that relative judgements exist too.

Further evidence for the distinction is offered by a frequent comment in judgements of sets of sub-optimal structures: “I would never say it, but it is better than the other one”. The frequency of this type of reaction suggests that this intuition too is common to all speakers. With this response the informant is giving both types of judgement information: a categorical judgement and a relative one. This typical comment also gives us a clue about the difference between the two types: categorical judgements concern occurrence, while relative judgements reflect computational cost. Let us take these in turn. The categorical judgement, we argue, is an expression of the likelihood that a structure is good enough to occur in practice.
As such it is probably dependent on one or both of two factors: firstly, our internal corpus of the language, made up of the effects of language exposure, which feeds information into every process which makes use of frequency. The question that the informant is internally answering (at least sometimes consciously) is: “Have I heard structures like this?” The second possible factor is our Output Selection function. The internal question here is: “Would this structure be produced or is there a better alternative which would be chosen in preference to it?” Either way, categorical judgements reflect occurrence, and produce an essentially binary output in the same way that other occurrence-based data types do; a structure either does or does not occur.

The relative judgement, on the other hand, reflects the cognitive workload in processing the form and semantic content of the structure, and relating the two. It reflects the function of the Constraint Application module, and consists of its standard output of a candidate form-and-meaning pair with an assigned weighting. This provides an account of why relative judgements can distinguish between sets of structures which are all seriously ill-formed and none of which would ever occur. Such data cannot possibly reflect occurrence or frequency, since this is consistent across all such structures, but they nevertheless differ in computational workload.

Notice that this too provides an explanation of our failure to find any reflection of the intuition that certain linguistic constraints are ‘hard’ constraints in our judgements (see Section 3). Constraints felt to be ‘hard’ are those which have such high violation costs that structures violating them will, in practice, tend not to win the competition for output and thus not occur. We find no correlate of ‘hard’ constraints in our relative judgements because their ‘hardness’ is a feature of occurrence, not computational complexity. Short people do not tend to become professional basketball players; in fact there may be no short basketball players. A ‘hard’ restriction? Plainly not. Short people can become basketball players if their other qualities (agility, speed, good aim) can make up for the disadvantage of shortness in this context. This may in practice never occur, but nevertheless the restriction is not a ‘hard’ one. Restrictions on syntactic structure work in the same way: certain violations may mean that violating structures will rarely or never be selected for output: but the link is not direct, and there is no ‘hard’ restriction.

6 The nature of well-formedness

If further work confirms that relative judgements reflect computational load and categorical judgements reflect possible occurrence, then a number of implications for the architecture and nature of the grammar would follow. First, at least a proportion of the restrictions on linguistic structure are ultimately functionally motivated, since they relate to the factor ease of use, and are ultimately emergent, in that the factors which drive the division into “better” and “worse” structures are themselves value-free.
It should be clear that this conception of the cognitive roots of grammar has little in common with approaches more generally associated with the label emergent (eg Bybee and Hopper 2001), which use the factor frequency as the causal factor in the emergence of structure. Our ambition here is to account for (among other things) occurrence frequencies, not use them as explanations.

We are associative and competition-driven thinkers: put differently, we are lazy thinkers, and we therefore prefer computationally easier processing tasks. But the processing of every word and every syntactic relation comes with a cost: this can be readily seen in judgement studies, where longer sentences are systematically judged worse than shorter sentences (more words mean more computational load). There is of course nothing actually “wrong” with longer sentences: the interpretation of computational load as “badness” comes only at the stage when the production system has to deal with structural alternatives, and, at this stage but not before, forms which incur higher computational costs are dispreferred. Thus longer sentences are computationally more costly, but sometimes necessary and so they occur, whereas forms for which a more economical structural alternative exists may only be a little more costly, but even this little additional computational cost is unnecessary, which makes the structure unlikely to be selected for output.

This model of well-formedness can perhaps be understood as resembling the world of economics. All expenses are dispreferred. Nevertheless, we are prepared to pay more for a motorbike than for a bicycle, because a motorbike does more than a bicycle. We therefore produce long or complex structures when these are appropriate, even though these are computationally costly. On the other hand, we are not prepared to buy the more expensive of two objects which perform the same function. Equivalently, when two structural variants communicate the same semantic content, we choose the less computationally costly alternative. But there is nothing per se about any structure which makes it good or bad – computational workload is not bad until we lazy thinkers judge it so.

This analysis of the nature of judged well-formedness accounts for cumulativity, violation costs, survivability etc, but at the same time goes some way to explaining why there is evidence for the universality of grammatical restrictions (architecture-related factors are by their nature universal), and it does this within a psycholinguistically and empirically motivated framework.
7 Implications for data types and their relation to theory

It seems fair to state that a fundamental assumption underlying the use of frequencies as a source of evidence for syntax is that “good” structures are produced, and thus found in corpus data, while “bad” structures are not produced and thus not found. In a second step we might generalize that there is an assumption that better structures are produced more often than less good structures. These assumptions are confirmed by our findings, but they reveal that this is not the full picture: frequencies correlate with well-formedness in judgements among the very “best” structures, but provide no information about “poorer” candidates, because these undifferentiatedly do not occur.

Or rather, they do not occur in the size of corpus to which we have access. If we are right in our suggestion that output competition is probabilistic, then, in a big enough corpus, we should find not only the best and second-best candidates but also the third and fourth and so on. The fact that linguists are always finding structures in corpus data which they had assumed to be categorically excluded, but which do not appear to be mere slips of the tongue, must strongly support this suggestion (just search for “What did who” in Google). It would follow that frequency measures and judgement data are mathematically related, since we could predict the score of a given item in a comparison set on the basis of the set’s scores from the other data type. They are not practically related, however, since the corpus size required would increase exponentially as we proceeded down the order of preferredness.

It follows from our arguments here that the data type of choice for syntax must be relative judgements. Frequency measures give us the same information as relative judgements about the best (couple of) structural alternatives in each comparison set, but they give us no information about any of the others. Since the interaction of linguistic constraints is demonstrably cumulative, this is a severe disadvantage, especially as it tends to make linguists interpret relative restrictions on structure as absolute restrictions. Put briefly: if you want to know what people say, choose frequencies, but if you want to know why, you are better off with relative judgements.

8 Implications for syntactic theory

These new, but empirically founded perspectives on data types and their implications for the nature of grammaticality and the structure of the grammar are in some ways revolutionary, since they require a number of conventional assumptions to be abandoned or revised. Much can remain unchanged, however, since linguists in the past, on the basis of the much more partial data they had available, often nevertheless correctly identified characteristics of the data set.
For example, with only an individual’s judgements and without the immediate access to corpus data that we have now, the abstraction to an essentially binary model of grammaticality was a reasonable step, which has in many ways served the field well. On the other hand, our findings should make it clear to every syntactician that the current model of syntax has significant weaknesses. We can well understand how these came about, but that cannot be a reason not to move on.

In fact the necessary reformulation of syntactic theory requires only two major steps. Syntacticians must first recognize that production processing has a role in deciding what linguistic forms are produced, and that occurrence only indirectly reflects well-formedness. This entails that output selection and the grammar are two separate processes, and we must decide which of these we are modelling.

There are three possibilities. First, we can look specifically at the system of the constraints which apply to syntactic structures, our Constraint Application module, and disregard production factors. To do this we should use data types which exclude the effects of occurrence as far as possible, ie relative judgements, and refine our theory to reflect the attested data patterns more accurately. This is narrow syntax. Others will be more interested in the processing system: there is an extensive literature on sentence processing and numerous data-near models of how we go about using our embedded grammar. This work concerns the aspects of what we have called here the Output Selection module. The third approach is to look at the cumulative effect on output of the two modules, Constraint Application and Output Selection. This is what many syntacticians are currently doing, assuming themselves to be looking at just one system, but the mismatch in data patterns between frequencies and relative judgements reveals this work to be treating two heterogeneous objects as one. Nevertheless, this is an interesting and worthwhile field of study in its own right, one closely related to traditional descriptive linguistics, in which the occurring patterns of a language are the issue, rather than the underlying causes of these patterns. Frequencies will be the data of choice for this study since they represent the selection of the output processing system from the candidates made available and weighted by the grammar.

The insight that we can and should distinguish between the functioning of Constraint Application and Output Selection should bring about a major improvement in the empirical adequacy of syntax models, for the division of these two modules resolves at a stroke many of the inconsistencies which obscure the nature of the interaction of linguistic constraints.
Syntactic theory will be far closer to the data, and hypotheses about the grammar will be far more constrained, surely a welcome development. Having cleared the picture by factoring out the competitive effects of output selection, we can take a look at the module containing the grammar, which we have called Constraint Application. The second major step in the revision of theory applies here, and consists of the specification of constraint violation costs. Each violation must have a quantified cost, since there are stronger and weaker violation constraints. The introduction of this parameter should alone bring about many of the changes in architecture which are necessary to adjust current theory to gradient grammaticality, as is demonstrated to be necessary in work such as Keller (2000) and Featherston (2005b). As soon as violation costs are accepted as a real variable, the other adjustments (survivability of constraint violations, cumulativity of violation costs, dissociation of categoricity and grammar-relevance) follow automatically.
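The division of labour between the two modules can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the chapter's own formalization: the constraint names and cost values are hypothetical, and Output Selection is simplified to picking the single best-scoring candidate, which ignores any probabilistic component of production.

```python
from dataclasses import dataclass

# Hypothetical constraints and violation costs, for illustration only.
VIOLATION_COSTS = {
    "weak_constraint": 1.0,    # a weaker violation
    "strong_constraint": 2.5,  # a stronger violation
}


@dataclass(frozen=True)
class Candidate:
    form: str
    violations: tuple  # names of violated constraints; repeats allowed


def well_formedness(candidate, costs=VIOLATION_COSTS):
    """Constraint Application: violation costs add up (cumulativity).

    The result is a gradient score. A violation lowers the score but
    does not remove the candidate (survivability).
    """
    return -sum(costs[name] for name in candidate.violations)


def select_output(candidates, costs=VIOLATION_COSTS):
    """Output Selection (simplified assumption): the best-scoring
    candidate among those available is the one produced."""
    return max(candidates, key=lambda c: well_formedness(c, costs))
```

On this view, relative judgements would track `well_formedness` for every candidate, while occurrence data would reflect only what `select_output` returns, which is why the two data types can diverge.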


These then are the lessons which we argue that syntax theory needs to draw from the closer inspection of its data base. First, we must redraw the boundary between grammar and production so as to distinguish between the effects of linguistic constraints and the effects of our need to select just one way of formulating each utterance. Second, we must add the additional parameter of violation cost to our models of syntax. Not words and rules, therefore, are the basic components of the grammar, but words, rules and sanctions.

Acknowledgements

This work was carried out in the project Suboptimal Syntactic Structures of the SFB 441 Linguistic Data Structures supported by the Deutsche Forschungsgemeinschaft. Thanks are due to project leader Wolfgang Sternefeld, my colleague Tanja Kiziak and many other members of the SFB 441, as well as to Frank Keller for WebExp. All errors are mine.

References

Aissen, Judith and Joan Bresnan 2002 Categoricity and variation in syntax: The Stochastic Generalization. Talk at Potsdam Gradience Conference, 22.2.2002.
Bard, Ellen, Dan Robertson, and Antonella Sorace 1993 Magnitude estimation of linguistic acceptability. Language, 72: 32–68.
Boersma, Paul and Bruce Hayes 2001 Empirical tests of the gradual learning algorithm. Linguistic Inquiry, 32: 45–86.
Bybee, Jane and Paul Hopper 2001 Frequency and the Emergence of Linguistic Structure. Benjamins, Amsterdam.
Cowart, Wayne 1996 Experimental Syntax: Applying Objective Methods to Sentence Judgements. Sage, Thousand Oaks, California.
Featherston, Samuel 2001 Empty Categories in Sentence Processing. Benjamins, Amsterdam.
Featherston, Samuel 2002 Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte, 192: 457–484.
Featherston, Samuel 2003 That-trace in German. Lingua, 1091: 1–26.
Featherston, Samuel 2004 Bridge verbs and V2 verbs: The same thing in spades? Zeitschrift für Sprachwissenschaft, 23: 181–209.
Featherston, Samuel 2005a Magnitude estimation and what it can do for your syntax. Lingua, 115.
Featherston, Samuel 2005b Universals and grammaticality: Wh-constraints in German and English. Linguistics, 43.

Keller, Frank 2000 Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph.D. thesis, Edinburgh University, Edinburgh.
Labov, William 1996 When intuitions fail. In Lisa McNair, Kora Singer, Lise Dolbrin, and Michelle Aucon, (eds.), Papers from the Parasession on Theory and Data in Linguistics 32, pp. 77–106. Chicago Linguistics Society, Chicago.
Manning, Christopher 2003 Probabilistic syntax. In Rens Bod, Jennifer Hay, and Stephanie Jannedy, (eds.), Probabilistic Linguistics, pp. 289–341. MIT Press, Cambridge, MA.
Müller, Gereon and Wolfgang Sternefeld 2001 Competition in Syntax. Mouton de Gruyter, Berlin.
Pesetsky, David 1987 Wh-in-situ: Movement and unselective binding. In Eric Reuland and Alice ter Meulen, (eds.), The Representation of (In)Definiteness, pp. 98–129. MIT Press, Cambridge, MA.
Prince, Alan and Paul Smolensky 1993 Optimality Theory: Constraint interaction in generative grammar. Technical Report No. 2, Center for Cognitive Science, Rutgers University, New Brunswick.
Schütze, Carson 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago.

Examining the Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus

Christiane Fellbaum

1 Introduction

This paper asks whether data gathered from the web can provide new insight into speakers' grammars and serve as evidence for linguistic theories. Our case study examines the poorly understood English Benefactive Alternation and the various constraints that have been proposed to account for its distribution. Web data show these constraints to be soft and subject to frequent violation and extension, raising the possibility of a constraint with "fuzzy edges." Our case study argues for the need to substitute or augment constructed data with web data to avoid theoretical biases and capture the full range of rule-governed linguistic behavior.

The Benefactive Alternation relates two syntactic variants for the expression of an argument that bears the semantic role of Beneficiary. This argument can be realized either in a PP headed by for as in (1) or as the first of two direct objects in a double object construction as in (2):

(1) Chris baked/bought/stole a cake for Kim.
(2) Chris baked/bought/stole Kim a cake.

The direct object alternant of the Benefactive is subject to constraints:

(3) Chris baked/bought/decorated/sliced a cake for Kim.
(4) Chris baked/bought Kim a cake.
(5) *Chris decorated/sliced Kim a cake.

Prior explanations for contrasts like that between (2, 4) and (5) have been formulated in terms of semantic constraints on the verb's semantic class membership, the aspectual nature of the event, the Beneficiary and/or the Agent arguments, as well as the verb's morphophonological make-up.


1.1 The data

Previous investigations of the ill-understood Benefactive alternation are based on introspection and the examination of data generated by the investigator. Such data are often constructed in order to highlight a particular phenomenon, and theoretical bias may well influence and limit the range of data that should be considered for a full account. Judgements can be unreliable and vary both across speakers and within a given speaker, depending on factors such as the context in which a specific structure is embedded. Given the current availability of naturally occurring data, it seems timely to re-examine the alternation and the constraints that have been proposed in an empirical fashion.1

1.1.1 Using the web as a corpus

We used the World Wide Web as a corpus and a source for naturally occurring examples in order to examine a range of claims specifying how the Benefactive Alternation is semantically constrained. The web is a rich source of freely available linguistic data covering a wide range of speakers, topics, and styles, with much of the data generated spontaneously and unedited. The web is thus a logical alternative to conventional corpora, which tend to be hard to come by, small, or limited to a few domains. At the same time, there are no controls on web postings, and some linguistic data may not be suitable as evidence for linguistic theories. Web data are vulnerable to the charge of unreliability for several reasons. A major concern is data posted by non-native speakers. This is particularly worrisome in the case of English data, as English, unlike, say, Hungarian or Japanese, serves as the lingua franca for worldwide communication, for which the web is a preferred channel. To safeguard against non-native data, each URL needs to be examined, and those that are clearly of foreign origin must be excluded. Moreover, for each type of construction, we collected several data points.
Perhaps most importantly, the data discussed in this chapter were presented at several conferences and in colloquia where they went unchallenged by the native English speakers in the audience. Some of the data cited in this paper could be dismissed as non-standard. Certainly, quite a few examples reflect language that one might not find in official publications or written language at all. But the kind of language that is often spontaneously generated in postings to chat groups nevertheless reflects speakers' grammars. Our work shows that data generated "naturally" and outside the context of a linguistic investigation force a rethinking of previously proposed constraints on the Benefactive alternation, which we argue are too narrowly formulated.

Not just anything goes, as far as web data are concerned. The fact that we could not find web data like (5) indicates that the construction is constrained in a principled way that needs to be understood. Of course, not finding a web example of a Benefactive with a particular verb or noun argument does not mean that it is categorically ruled out and ungrammatical. First of all, our search was unsophisticated in that we could not look for abstract syntactic patterns or entire verb classes, but had to search for specific strings, necessarily missing other, similar ones. But even the absence of hits to more exhaustive searches cannot be overinterpreted. The asterisks in this chapter must therefore be interpreted not as strictly "ungrammatical," but as "unattested on the web." The motivation for this work was less a definitive formulation of the constraints on the Benefactive than a confrontation of the proposed constraints with attested data.

1.1.2 Searching for relevant structures

The PP alternant of the Benefactive, exemplified in (1) and (3), is essentially unconstrained, while (5) shows that the alternant where the Beneficiary is projected as the direct object (DO) is restricted. To determine the scope and nature of the constraints, we searched for examples of the DO alternants of the corresponding PP alternants. As this work was carried out prior to the development of the Linguist's Search Engine (Resnik and Elkiss 2004), we had no intelligent tool for targeted searches and had to rely on simple pattern-matching searches. Using Google, we formulated queries of the forms

(6) "she Ved me some"
(7) "he Vs her a"

where V was filled with specific verbs. We used a variety of verbs that occurred with high to medium frequency in the Brown Corpus, and whose semantic make-up was relevant to the constraints formulated. When testing hypotheses concerning the nature of the Agent or the Beneficiary, we looked for examples with specific nouns filling the argument slots. We excluded sexually explicit sites or other inappropriate data.
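The query patterns in (6) and (7) can be sketched as simple string templates. The following is only an illustration of the pattern-matching approach described above, not the procedure actually used: the verb list and its inflected forms are hypothetical (the chapter does not list the verbs searched), and forms are listed by hand rather than generated, to avoid errors with irregular verbs.

```python
# Templates modelled on (6) and (7); "{v}" marks the verb slot.
# The double quotes request an exact-phrase match.
TEMPLATES = {
    "past": '"she {v} me some"',
    "3sg": '"he {v} her a"',
}

# Hypothetical sample of verbs with hand-listed inflected forms.
VERB_FORMS = {
    "bake": {"past": "baked", "3sg": "bakes"},
    "buy": {"past": "bought", "3sg": "buys"},
    "wash": {"past": "washed", "3sg": "washes"},
}


def build_queries(verb_forms=VERB_FORMS, templates=TEMPLATES):
    """Return (verb, query) pairs, one per verb and template."""
    queries = []
    for verb, forms in verb_forms.items():
        for slot, template in templates.items():
            queries.append((verb, template.format(v=forms[slot])))
    return queries
```

Because each query is a fixed string, a search of this kind finds only the exact pronoun and determiner sequence in the template, which is the limitation noted above: similar strings with other arguments are necessarily missed.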


2 Beneficial events

We now examine some aspects of constructions that express events with a benefit.

2.1 What kinds of events can be beneficial?

The argument expressing a Beneficiary is always optional and is not part of the verb's theta grid. This suggests that verbs selecting for Beneficiaries do not inherently denote beneficial events, but can receive such an interpretation. A question that arises is, are there any constraints on the class of verbs that can express benefits? Many transitive and intransitive verbs from a wide variety of semantic classes can add a Beneficiary as a PP adjunct; the actions expressed by these verbs are not inherently actions performed for someone's sake or benefit. The sentences below, found on the web with the Beneficiary, are just as good without:

(8) There'll be an unloading zone at the transition area if you wish to have someone drop you off and park your car for you
    home/att.net/~ata-jc/kaprules.html
(9) There is also a system-wide startup file which is run for you first
    Orbit-net.nesdis.noaa.gov/ora/oraintranet/ctst/unix/c15.html

Similarly, many verbs permitting the DO alternant do not denote events with an inherent benefit:

(10) Peel me a grape
(11) Hurry, get my red shirt
(12) It feels as though someone had designed me a custom dress...
     www.between-theshadows.com/shadows/fire/transformations/aboutme.html
(13) I asked Mom to wash me some clothes,
     www.bad-krama.net/archive/arc39.html
(14) I ask Roberto if he can change me some money
     www.newbury.net/deanwood/doc/Greece.htm
(15) And try to find me some aspirin while you're at it.
     www.geocities.com/chocofeathers/multifics/2ndc_chap8.htmll


In some cases, construing a beneficial reading is difficult. We could not find examples such as the following, though they are perfectly interpretable in the right context (e.g., where the Beneficiaries are a nurse and a stage director, respectively).

(16) I'll take a walk/swim/nap for you
(17) She fell down the steps for him

While these structures seem perfectly grammatical, given an appropriate context, the corresponding DO alternants do not:

(18) ??I take you a walk/swim/nap
(19) ??She fell him down the stairs

Unlike the unrestricted PP alternant, the DO alternant seems to be reserved for events where a beneficial reading can be constructed more easily and naturally.

2.2 No benefit

The alternation also occurs with events that have undesirable consequences for the DO:

(20) They have done nothing but ruin me my whole life
     www.piedmont.tec.sc.us/worldlit/andr1.htm
(21) They have done nothing but ruin my whole life for me
(22) So they set you a trap
     hot.ee/fanfic/thirteefull.html
(23) So they set a trap for you

There is a straightforward semantic contrast between benefactives and these "malefactives." It has often been observed that contrast is a particular kind of semantic similarity, as contrastive concepts tend to represent different values of a shared attribute or distinct points on the same scale. The fact that Benefactives and some "Malefactives" participate in the same alternation is consistent with a view in which they are semantically related.


2.3 Beneficiary or replaced Agent?

The for-phrase in the PP alternant is potentially ambiguous. Green (1974) cites, besides the Benefactive, the "instead-of" reading. In (24) and (25) respectively, the missionary taught the class, and Kahler gave the speech, in place of the writer:

(24) But on Tuesday, I stayed home, in bed part of the day! Another missionary taught the class for me.
     www.jacklynes.com/russia/letter16.htm
(25) ...had developed a very hoarse sore throat. So with the approval of my hosts, Kahler gave the speech for me and did very well indeed.
     www.nobel.se/noble/events/eyewitness/hench/

The distinction between the Benefactive and the "instead-of" readings is not always sharp, as the substitution seems to imply a benefit for the substituted Agent; this reading is avoided only when the PP is headed by instead of.

(26) Complete Grocery Shopping... We do all the shopping for you. There's no need for you to spend your valuable time ...
     www.shadowlief.com/what_we_offer.htm
(27) E-mail Software does all the work for you.
     www.homeuniverse.com/bulk.htm

In some cases, the context supplies world knowledge that disambiguates between the two readings of the for-phrase:

(28) ...pianist Vladimir Horowitz. After hearing two of John's compositions, which he played for the maestro one evening after dinner, Mr. Horowitz looked over to John ...
     www.johnsciullo.com

Most likely, John is performing for Horowitz's benefit here, not in his place. Under the "instead-of" reading, for can receive heavy stress, so long as no other constituent in the VP is focused. Thus, (29) means (30) and not (31):

(29) I'll do it FOR you
(30) I'll do it in your place
(31) I'll do it for your benefit

Green notes that in the direct argument alternant, no replacement interpretation is possible, and the NP is always a Beneficiary. This can be seen in (32) and (33), where only the beneficial context seems felicitous:

(32) Mary played Mr. Horowitz two of her compositions
     ...and the maestro listened attentively
     ??...while he was away on tour
(33) Mom washed me some shirts
     ...so I'd look neat for the job interview
     ??because I don't know how to operate the washing machine

2.4 The Agent as Beneficiary

Some verbs, including consumption and perception verbs like eat, drink, watch, listen, etc., denote events with an inherent benefit for the subject, the "ingester." It is difficult to construct an additional Beneficiary argument for these verbs. Nevertheless, some dialects of American English can add an explicit DO Beneficiary, in constructions that Curme (1986) calls a "personal dative."2 The DOs here must be Beneficiaries and not Recipients, as the verbs do not denote transfers. These Beneficiaries are either reflexives or object pronouns, necessarily coreferent with the subject. Web examples are:

(34) I'll have myself a little snack before bed
(35) I'll eat me some potted meat
(36) gonna listen me some Guns and Roses
(37) gonna watch me some uneducational TV
(38) Have yourself a merry little Christmas

In contrast to the Benefactives where subject and object have distinct referents, the corresponding PP alternants seem ungrammatical:

(39) *I'll have a little snack before bed for myself
(40) *I'll eat some potted meat for me/myself
(41) *gonna listen to some Guns and Roses for me/myself
(42) *gonna watch some uneducational TV for me/myself
(43) *Have a merry little Christmas for yourself


Non-ingestion verbs can have a reflexive beneficiary in a for-phrase, but only when there is a contrast:

(44) I'll wash some shirts for myself (but not for you)

The reflexives seem therefore to constitute a different phenomenon with these ingestion verbs; we will return to these data later.

2.4.1 Some Spanish data

In what appear to be related cases, Nishida (1994) examines Spanish sentences with transitive verbs that have alternants with an additional Reflexive:3

(45) Juan SE tomó una copa de vino
     John (REFL) drank a glass of wine
(46) Yo ME comí diez manzanas
     I (REFL) ate ten apples

The Spanish data seem similar to the English ones, and interestingly, Nishida's examples cover the same semantic classes of verbs: verbs of consumption/ingestion (eat, drink, smoke, read), what could be described as semantically contrasting verbs (skip over, miss out), and verbs of acquisition (steal, gain, learn). Nishida claims that the reflexive clitic in cases like the above overtly marks quantitatively delimited events. The reflexive variants of (45) and (46) thus can be translated as eat up/drink up, whereas the non-reflexives do not have this completive aspect. This contrast is very clearly shown in (47). Only the sentence with the reflexive necessarily refers to an event where the entire book was read (the English gloss is underspecified with respect to aspect):

(47) Juan (SE) leyó el libro anoche
     John REFL read the book last night

Given the meaning difference, it seems surprising that sentences like (46), which have a delimited object (as opposed to, say, a bare plural NP), allow both the reflexive and the non-reflexive form. Not surprisingly, Nishida states that the native speakers he consulted preferred the reflexive form.


Despite the superficial similarity, Nishida's explanation for the Spanish data does not hold for English, where the sentences with reflexive Beneficiaries like (35)-(37) have a DO with a partitive, which marks them as nondelimited, and where the event is necessarily open-ended. We will return to the question of aspect in Benefactives in Section 4.4.

3 Argument status of the Beneficiary

The Beneficiary is always optional, hence it cannot be considered to be a subcategorized argument of the verb. Several explanations have been offered to account for the licensing of the Beneficiary. Larson (1990) suggests a mechanism he calls Argument Augmentation, which adds a Beneficiary to the verb's theta grid. Marantz (1984) proposes an affix-mediated increase in the verb's valency. Baker (1988) suggests that a zero affix allomorph of for attaches to the verb when the Beneficiary is in direct argument position.4 In support of the claim that a Beneficiary is not a true argument, it has been pointed out that it fails a standard test for argumenthood, passivization. In this respect, Beneficiaries contrast with Recipients:

(48) *Kim was selected/designed/sewn a wedding dress (Kim = Beneficiary)
(49) Kim was given/sent/mailed a present (Kim = Recipient)

However, a web search turned up numerous examples of passivized Beneficiaries (these may be more characteristic of British than American English):

(50) I was made coffee and sat and talked to Ella
     www.punternet.com/frs/fr_view.php?recnum=6798
(51) ...until early morn, when I was made tea and toast, ...
     pws.prserv.net/usinet/declair.diary1.htm
(52) Today, the teachers were fixed breakfast
     www.switzerland.k12.in.us/~pacerps/pdf
(53) "Oh," Nick said ...watching as he was poured a drink. "Brian?"
     http://cgi.allihave.net/fiction/hom/short/ForMyWedding.shmtl
(54) He was built a house at pool-side, to keep him in the shade.
     www.bronxbard.com/specials.html
(55) His 'friend' who came with him insisted that he was bought some trainers...
     duvel.lowtem.hukudai.ac.jp/~jim.climbing/manda3_report/node15.html

We will not further pursue the question concerning the argument status of the Beneficiary but, clearly, passivization does not distinguish Beneficiaries from real arguments like Recipients. We will return to a comparison of the Beneficiaries and Recipients later.

4 Constraints on the Benefactive alternation

A number of explanations have been proposed to account for the restrictions on the Benefactive alternation, specifically, the distribution of the DO alternant. These explanations have been formulated in terms of the verbs' lexico-semantic and morphophonological properties, the aspect of the beneficial event, and the semantics of the Agent and the Beneficiary arguments. We examine each of the proposed constraints in the light of pertinent web data.

4.1 Lexico-semantic constraints

Green presents the most extensive analysis of verbs that show the Benefactive alternation. She classifies these verbs into distinct groups: Verbs of creation, selection, performance, and obtaining, as well as "symbolic actions." In parallel with Green's classification, benefits have been characterized as created entities (including entities created as a by-product of acting on another entity), prepared entities, (future) possessions and obtained entities (Green 1974; Levin 1993; Larson 1990; Jackendoff 1990; inter alia). We will examine each of these classes in turn with respect to the alternation.

4.1.1 Verbs of creation

A creation usually requires effort, and efforts tend to be undertaken only when they are associated with a benefit. A created entity can therefore readily be interpreted as a Benefit, and creation verbs generally allow Beneficiaries in direct argument position:

(56) In 1818-1819, Benjamin Henry Latrobe built the couple a house, which came to be known as Decatur House, next to Lafayette ...
     www.library.georgetown.edu/dept/speccoll.decatur
(57) My friend Ola fixed me a job.
     www.trance.org/sensphere/
(58) She made them waffles
     www.womenspace.ca/Fabrications/lorr3.htm
(59) and she bore him a son, Hasumat.
     www.sacred-texts.com/oah/oah/oah360.htm

An interesting subclass is constituted by cases where the creation is a by-product or result of another event, which is not a creation event:

(60) only if you clean me some room on this desk to work, right?
     www.geocities.com/dbzasuri/epics.itsoadc1.htm

The room is the result of the desk cleaning event; even though the room is the DO of the verb, what is cleaned is not the room but the desk.

4.1.2 Destruction verbs

It has been claimed that the Benefactive is not available for verbs of destruction (Wechsler 1991). However, we found quite a few web examples that refute this claim, including the following:

(61) ...kick the crap outta saint nick and burn him some pagans
     www.geocities.com/s1xlet/apathy19.html
(62) ...an idealistic 18-year-old eager to go kill him some Redcoats
     www.dvdjournal.com/reviews/p/patriot.shtml

These sentences show that the destruction of an entity (the Theme) may result in a benefit (for a Beneficiary that may or may not be coreferent with the Agent). Destruction and creation verbs are semantically related by virtue of the contrast between them, and an extension of the alternation from creation to destruction verbs could be attributed to this similarity. In (63) and (64), the destruction events do not appear to entail a benefit:

(63) Herons or other wild fowl shall destroy them their nest or eggs
     www.nprwc.usgs.gov/resource/1999/eastblue/ebexotic.htm


(64) The white missionary is trying to ruin them their way of life
     www.piedmont.tec.sc.us/worldlit/andr1.htm

Here, the DO seems to emphasize the "malefactive" effect of the destruction.

4.1.3 Verbs of preparation

Verbs expressing events where an Agent acts on an entity such that this entity is prepared for use or consumption easily take a Beneficiary argument and exhibit the alternation.

(65) Peel me a grape, Crush me some ice. Skin me a peach, save the fuzz for my pillow.
     www.amyandfreddy.com/cd/track5.html
(66) I asked Mom to wash me some clothes
     www.bad-karma.net/archive/arc39.html
(67) Honey, can you iron me a shirt?
     www.epinions.com/hmgd-review-6689-32384DB-3A231D50-prod1

4.1.4 Verbs of performance

Verbs of performance can be described as re-creations of a work of art such as a composition, a poem, or a song. Performances therefore resemble creations. The Beneficiary here is an Experiencer.

(68) and that's where I met Mel and Shaz, I played them some tunes
     www.portowebbo.co.uk/nottinghilltv/faces-kgee.htm
(69) Anyways, Herman has sang me some of his banana-fried lyrics
     members.aol.com/bumingler/set1/songs.html
(70) Morgane donned wooden clogs and danced us a dance
     beyondthebrochure.homestead.com/britnorm.html

Note that some performance verbs take both for- and to-phrases in the PP alternant:

(71) Moses and his sister Miriam both sang a song to the Lord...
     www.pcusa.org/ega/music/favoritesongs.htm


(72) That week at ending campfire, we sang a song for Cassie.
     kidsaid.com/stories/cassie.html
(73) ...anyone who would like to can play a piece to the school.
     www.childokeford.dorset.sch.uk/clubsandactivities/music.htm
(74) When I actually sit down to play a piece for others...
     www.violinist.com/discussion/response.cfm?ID=3688

4.1.5 Symbolic actions

In addition to the classes listed above, Green notes that certain "symbolic actions" performed for someone's benefit can undergo the Benefactive alternation. Green does not explain the exact nature of "symbolic actions;" the benefit is specific to the context. Among the examples we found are these:

(75) God said to Abraham: Kill me a son
     www.ieor.berkeley.edu/~goldberg/lecs/kierkegaard.html
(76) Baby open me your door
     www.geocities.com/SunsetStrip/Pit/8508/songs/chameleo.html

The verbs here (and those cited by Green) do not fit neatly into any semantic class. Moreover, the symbolic actions most clearly show the distinction between Benefactive and Recipient, since these cases do not involve a created/prepared/performed/obtained entity that is moved, transferred, or that comes into the possession of the Beneficiary.

4.1.6 (Future) possession

Many verbs of obtaining, describing a resultant possession for the Beneficiary, participate in the alternation:

(77) He said he'd rather go out and grab him some food
     www.dreamwater.net/art.jtdoc/hasil.html
(78) Retrieve me some cream cakes!
     Home.talkcity.com/ekochap8.htm

A subclass of the future possession verbs are the verbs of selection, where a to-be-possessed entity is chosen for the Beneficiary:


(79) ...please select me a good singer for about twelve shillings
     www.classicreader.com/read.php/sid3/bookid1506.
(80) I've written to Sylvia asking her to choose me a coat
     www.gerty.ncl.ac.uk/letters/l1170.htm

Obtained Entities, including abstract ones like emotions, can become the Beneficiary's possession:

(81) Radical Red: Get Me Some Self-esteem. by Laura Jones.
     www.thebody.com/tpan/julaug_01/self_esteem.html
(82) I was bought loads of drinks and got quite drunk.
     www.wrecked.co.uk/norames/ott3.html

4.2 Benefit and possession

Larson (1990), Daultrey (1997), and Krifka (1999), inter alia, attribute to the DO alternant the requirement for a created/prepared/obtained entity that becomes the Beneficiary's possession. The benefits in the different verb classes conform to the notion of possession to varying degrees. Future possession verbs clearly do. The products of creation events also become straightforwardly possessions of the Beneficiary:

(83) Anyone who can create me some copies in other formats, please give me a shout!
     www.cunningham-king.freeserve.co.uk/YCC/Fornt%20Page.htm
(84) She read the recipes and cooked her husband some Spam
(85) composed me a few lovely haiku
     www/ghostinthemachine.net/weblog4200.html

Performance verbs are a subclass of creation verbs that involve no physical entity; Green argues that the audience's perception constitutes a kind of possession:

(86) Henn Parn with his dancing partner performed us the professional ballroom dances
     www.euroiniv.ee/evana/eng/ball2001.htm

The notion of possession is somewhat stretched here, as other ways of referring to a possessed entity seem infelicitous. With a possessive adjective, the noun in (87), interpreted as an activity rather than as a result, is odd, as compared with the created noun in (88):

(87) You/I/she watched your/my/her dances
(88) Here is your/my/his spam

The product of a preparation or transformation event extends the notion of possession even further, as the Theme is already in the possession of the Beneficiary prior to the preparation or transformation:

(89) Well, the rest is his story? Honey, can you iron me a shirt??
     www.epinions.com/hmgd_review-6689-32484DB-3a231D60-prod1
(90) You're a good boy, Joe. Now get busy and wash me some dishes.
     www2.xlibris.com/bookstore.book-excerpt.asp?bookid=902
(91) I asked Mom to wash me some clothes,
     www.bad-karma.net/archive/arc39.html

Sentences like (92) and (93) might be described as referring to the repossession of an entity that the Beneficiary owns:

(92) The captain shouted to the first mate, "Hurry, go to my cabin and get me my red shirt!"
     www.skywaystools.com/jokes1/html
(93) his segundo would fetch him his French hat, morning frock coat and a birch tree chair.
     Collections.ic.ca/skeena/Cataline.htm

Clearly, to equate Benefit with Possession is stretching the notion of possession considerably in many cases. Moreover, such an equation would not account for why the Benefactive exists as a phenomenon distinct from the Dative Alternation and applies to a different set of verbs.

4.3 The Latinate constraint

As in the case of Dative Shift, verbs of Latinate origin are said to be generally ineligible for the Benefactive Alternation (Levin 1993, inter alia).5 However, we found the following examples on the web, which include verbs from all the lexico-semantic classes associated with the Benefactive Alternation:


(94) Anyone who can create me some copies in other formats, please give me a shout
     www.cunningham-king.freeserve.co.uk/YCC/Front%20Page.htm
(95) Please compose me a short piece.
     www.uen.org/utahlink/activities/view_activity.cgi?activity_id=7511
(96) ...promised to procure me seeds
     mnlg.com/gc/species/c/cau_pla.html
(97) I shall decline your invitation to purchase me a beverage
     www.fabulamag.com/contest/august_html
(98) I am going to japan to acquire me a new slave
     home1.gte.net/methnews1/GLA.txt
(99) this networking helped to secure me a position
     www.geocities.com/SouthBeach.Jetty/9001/collegedays.html
(100) She produced me two gorgeous sows
      cavyclub.tripod.com/satinperuvian.html
(101) Can anyone..photocopy me the manual.
      www.driverforum.com/harddrive/1267.html
(102) Is there someone who could construct me a set of replicas
      www.taxidermy.net/forums/FeerTaxiArticles/
(103) ... a group of students performed us sketches about their school
      www.fast-trac.ofw.fi/report14.htm

All these examples have a clitic pronoun in direct argument position, which might suggest that for clitics, the Latinate constraint is relaxed. Indeed, we found fewer Latinate verbs with a full noun DO Beneficiary than with a pronoun, but our searches turned up quite a few examples, including the following:

(104) ... To secure our customers success in using our technology...
      Support.reachin.se/Downloadable/brochures/Core_technology.pdf
(105) Her aggressive and well planned marketing concepts, combined with her personable selling skills, guarantee her customers outstanding results.
      rearch_realtors.com/pennsylvania/wyomissing/Liz_Egner_94768886.html
(106) ...in order to ensure future generations an opportunity to appreciate and enjoy the West's rich heritage ...
      www.wstpc.org/About/Facts/htm
(107) SA has obtained his clients recognition all over the world...
      www.pontsoft.com/empresa/plasticm/eng/welcome.htm

We conclude that there is no restriction on the Benefactive alternation that can be formulated in terms of the etymological or morphophonological properties of the verb. Rather, speakers seem to employ this construction with certain Latinate verbs just as they do with semantically similar verbs that are of Germanic origin. In this respect, too, the Benefactive alternation resembles the Dative Shift. The restrictions on the Dative Shift have been formulated in terms of the Latin vs. Germanic origin of verbs like donate/contribute on the one hand and give/hand on the other. But this explanation does not hold, as sentences like (108) and (109) with Latinate verbs show:

(108) at her death she bequeathed him her whole property
     www.fordham.edu/halsall/pwh/plut_sull1.html
(109) he offered us some hope.
     www.ucsfhealth.org/childrens/profiles/sieberSamuel.html

Offer and bequeath are verbs of future possession. Pinker (1989) attributes their apparently exceptional behavior with respect to the Dative Shift to this distinction. But this explanation does not account for the Dative Shift with a verb like render, which denotes a transfer contemporaneous with the event time but which behaves syntactically like a verb of future possession:

(110) having rendered us the slightest service
     www.wtj.com/archives/suchet/suchet03a.htm

The distinction between future possession and possession at the time of the event cannot account for the distribution of the Benefactive alternation with non-Latinate verbs, either. While for many verbs, in particular the creation and preparation verbs, there is necessarily a delay between the event and the benefit derived from it, benefits derived from a performance must necessarily be contemporaneous with the performance, and Pinker's proposed constraint for the Dative Shift would not work for verbs like sing, dance, and recite in sentences like these:

(111) She sang them a song
(112) He danced us a little jig
(113) Recite us your latest poem

4.4 Aspect

Green states that the Benefactive alternation is compatible only with accomplishments. But again, attested data appear to refute this claim. As we saw in connection with the reflexives and the Spanish data, DO Benefactives occur with events that have stative character:

(114) I always keep me some balled up paper by the phone.
     www.jolenestrailerpark.com/Storys/4htm
(115) We all loved the flavour and the development of this wine and I said "Keep me some of this for my seafood starter". ...
     www.wineoftheweek.com/hist/food200806.html

In the following example, the event denotes an open-ended activity:

(116) I'm gonna have me some fun.
     ww.atlyrics.com/quotes/p/predator.html

In fact, all the creation, preparation, and performance verbs can be turned into activities, as can be seen by their compatibility with temporal for-adjuncts:

(117) She baked them waffles for hours
(118) Mom washed me my shirts for years
(119) She sang us Christmas songs for weeks

It might be argued that (117-119) refer to repeated accomplishments; however, this is not the case in (114-116). We conclude that the restriction on the Benefactive alternation cannot be formulated in terms of the aspectual properties of the event. Indeed, it is not clear how such a constraint would be semantically motivated.6

4.5 Constraints on the noun arguments

Several restrictions have been formulated in terms of the semantics of the nouns expressing the Agent, the Benefit, and the Beneficiary.

4.5.1 Devotion or intention to please

Green states that the Benefactive construction expresses the Agent's devotion or desire to please the Beneficiary. The Agent intends the event to benefit the Beneficiary. We already saw examples that involve no benefit but rather an undesirable consequence. Moreover, devotion and desire to please presuppose the animacy, or more precisely the sentience, of both Agent and Beneficiary. The Agent must not only be the instigator and in control of the event, but also capable of intent and of the feeling or attitude of devotion; conversely, the Beneficiary must be capable of appreciation. While a majority of the attested Benefactive data conforms to this assertion, we found cases with inanimate Beneficiaries, which are clearly incapable of being pleased. One could argue that in the examples below, the Agents are devoted to a doll or a car, which are anthropomorphized; the "devotion constraint" is extended here.

(120) Brandy found a shirt sleeve...and made her doll a dress
     www.geocities.come/Haertland/Esates/3147/RPLOTN/julkids.html
(121) ... Bought my car some new boots...
     www.strangely.org/diary/200008/

There are also cases where the subject of the Benefactive is an inanimate Cause or a causing event; these data invalidate the claim that the event demonstrates devotion or a desire to please.

(122) ...the mixture of sand and clay and then let it stand in summer, the sun bakes you a brick.
     www.growise.com/Articles/sprhtm/bestsoilamendments.htm
(123) Luck Found Me a Friend in You. ...
     www.gocollect.com/p/cherished-teddie/special-occasions.html
(124) That deal saved me $6
     www.100megsfree4.dom/blahthings/2001/march/mar16.html

These data suggest that the devotion/appreciation constraint is not a hard one. While in (124), the Beneficiary might also have played an agentive role in the deal and thus have been at least partially responsible for his own action and the ensuing consequences, no intention can be ascribed to the sun or luck.

4.5.2 Contemporaneous existence

Green includes in her constraints on the Benefactive the contemporaneity of Agent and Beneficiary; the reasoning is that the Beneficiary must be able to benefit from the product or result of the event or from the entire event. But a web search reveals that this constraint does not hold. People perform actions for the benefit of not-yet-born Beneficiaries:

(125) This again will save future generations time in collecting data
     ilil.essortment.com/craftstimecaps_rlmd.htm
(126) ...industrious cultivators of Abbasid times save their descendants expense and labour by providing them with building materials
     www.gerty.ncl.ac.uk/letters/1462htm
(127) agreement the Air Force and Raytheon and Hughes have negotiated on the Advanced Medium-Range Air-to-Air Missile will save future customers about $180 million. ..
     www.aerotechnews.com/starc/102797/102997d.html

We also found examples of actions performed to benefit deceased, i.e., no longer existing, Beneficiaries:

(128) We will see what we can do to get him a gravestone marker
     www.islesoford.com/idcgues.html
(129) if you don't buy a gravestone for Khveodor. You kept saying, it's winter, winter ... if you don't buy him one, he will come again,
     www.geocities.com.cmcarpenter28/Works/3deaths.txt
(130) I've saved my last dime to buy him a casket.
     amsterdam.nettime.org/Lists-Archives/nettime 9908/msg00038.html

Presumably, the speakers/writers of these sentences consider the deceased as being still among them and they extend the "contemporaneity" constraint proposed by Green accordingly.

4.5.3 The Beneficiary as employer

Green further states that, with the Beneficiary as the DO, the action performed for someone's benefit cannot be carried out when the referent of the subject is employed by the referent of the DO. Her (constructed) examples are (131-132), where Mr. Lubin pays the speaker:

(131) I baked cakes for Mr. Lubin.
(132) *I baked Mr. Lubin cakes.

But the "employment constraint" is more subtle. We found numerous examples on the web with DO Beneficiaries where the action is performed under conditions of employment:

(133) Happy Customers .. He built me a very fully loaded system
     www.grovecomputerservices/com/happy.htm
(134) Tom Mullins Web Design Studio prepares you a bid
     www.tommullinsdesign.com/flag-prices.htm
(135) In 1818-1819, Benjamin Henry Latrobe built the couple a house
     www.library.georgetown.edu/dept/speccoll/decatur/

What seems to account for the contrast between Green's examples and the web data is not just a broader context of employment or the commercial setting of the event, but the difference between the roles of an employer and a customer (I am grateful to Anthony Kroch and Philippa Cook, who, on separate occasions, suggested this interpretation). When the DO is unambiguously an employer, as in (136-139), the Benefactive alternant seems ruled out and only the for-PP alternant is grammatical; when the DO is a customer (a kind of temporary employer), the DO alternant is felicitous. We could find no sentences of the following kind:

(136) ??She stacked Wal-Mart shelves
(137) ??We cleaned The Maids houses
(138) ??They sold AIG insurance (cf. They sold insurance for AIG)
(139) ??Maradona played the Naples club soccer

Interestingly, the customer is also the Recipient of the product or result of the event, while the employer, who presumably passes the product on to his customers, is not. These data show once again the close semantic relation between Recipient and Benefactive.

5 Benefactive vs. Dative alternation

The Benefactive alternation resembles the Dative alternation both syntactically and semantically, and the two are often lumped together. We discuss some similarities and differences and argue for a distinction between the two constructions.

5.1 Beneficiary and Recipient

Both Recipients and Beneficiaries can occur freely in PP adjuncts and, with as yet ill-understood restrictions, as direct objects of many verbs. We already saw the semantic similarity between the two kinds of roles in cases where a Beneficiary is also a Recipient:

(140) ...where he bought him some meat and a big loaf of bread.
     www.penguinreaders.com/downloads.Spreads/Olivertwist167.pdf
(141) I got her some cute summer dresses.
     www.livejournal.com/users/piazza_rox31/

Oliver and she both receive something and benefit from it. The semantics of verbs like buy and get include both transfer and benefit, and this may account for the syntactic similarity of these verbs with respect to the PP/DO alternation. Only the PP alternants for verbs like buy and get, which must be headed by for, not to, show the difference between the two kinds of arguments. Some verbs of transfer do not strictly subcategorize for a Recipient. They may select for an adjunct with either a Recipient or a Beneficiary:

(142) Kim mailed/sent/faxed a poem for John on his birthday. (John = Beneficiary)
(143) Kim mailed/sent/faxed a poem to John on his birthday. (John = Recipient)

Both arguments can co-occur as adjuncts in either order:

(144) Kim mailed/sent/faxed a poem to Mary for John.
(145) Kim mailed/sent/faxed a poem for John to Mary.

But when one of these arguments is in direct argument position, it must be the Recipient:

(146) Mary mailed/sent/faxed John a poem for Kim. (John = Recipient)
(147) *Mary mailed/sent/faxed John a poem to Kim. (John = Beneficiary)

This fact seems to suggest some kind of "primacy" of the Recipient over the Beneficiary with respect to argument status. Passivization data reinforce this intuition. Our search results indicate that verbs that can take both a Recipient and a Beneficiary can passivize only the Recipient:

(148) I sent/mailed flowers for/to John. (John = Recipient or Beneficiary)
(149) John was sent/mailed flowers (John = Recipient only)

By contrast, many verbs that do not (also) select for a Recipient can passivize the Beneficiary (cf. also the examples in (50)-(55)):

(150) The host poured drinks for us/*to us (us = Beneficiary)
(151) The host poured us drinks
(152) We were poured drinks

Nevertheless, some verbs with a strong transfer meaning component that undergo the Benefactive alternation apparently cannot passivize the Beneficiary argument:

(153) He got her shoes for her
(154) She fetched some clothes for him
(155) He got her her shoes
(156) She fetched him his clothes
(157) ??She was gotten her shoes
(158) ??He was fetched his clothes

Another difference between Recipients and Beneficiaries shows up in sentences where they are the sole argument. Some verbs that select for a Theme and a Recipient may delete the Theme when it is background knowledge shared among the discourse participants; here, the Recipient can be the sole DO:

(159) I paid him (the money)
(160) She served them (dinner)
(161) I trade you (my stamp collection)
(162) He showed me (the trick)

But verbs in double object Beneficiary constructions cannot delete the Theme:

(163) I'll cook you *(dinner)
(164) She prepared them *(lunch)
(165) We danced the children *(a folk dance)
(166) He bought her *(the ring)

Passives with Beneficiaries seem moreover constrained to events denoting the preparation of an entity and/or a transfer of possession. We found many non-transfer verbs in active constructions with a Beneficiary argument (either in a PP or as a DO), but the web yielded no corresponding passives:

(167) he composed her a song
(168) ??she was composed a song
(169) create me a website
(170) ??I was created a website
(171) wash me a shirt
(172) ??I was washed a shirt
(173) kill me some Redcoats
(174) ??I was killed some Redcoats
(175) ruin them their way of life
(176) ??they were ruined their way of life
(177) strike me a fire
(178) ??I was struck a fire

Like ingestion verbs, verbs of transfer can also select for a reflexive Recipient in DO position:

(179) Mr Graham-Cumming sent himself the same message 10,000 times...
     www.spamfo.co.uk/The_News/Scams_&_Fraud/How_to_make_spam_unstoppable/2/
(180) She had promised herself a night on the town
     www.skaro.com/write/trish/trish27.html
(181) she granted herself permission to lie.
     www.creativenonfiction.org/thejournal/articles/issue05/05editor.htm

For Recipients, the PP alternant is attested, too, in contrast to the ingestion verbs with a reflexive Beneficiary:

(182) ...a copy he sent to himself turned up in his own spam folder. ...
     www.careerjournal.com/jobhunting/resumes/20040413maher.htm
(183) But Lindo had promised to herself that she would never forget...
     www.hh.shuttle.de/hh/gyha/Facher/Englisch/joyluckmarl.htm
(184) ...the little birthday present she'd granted to herself.
     www.grandt.com/XanderZone/stories/read.php?story=Rejoined

The data show that there is some overlap between Beneficiaries and Recipients. First, a number of verbs select for both arguments. Second, the semantic role of a noun phrase often includes aspects of both Beneficiary and Recipient, and the two cannot always be clearly distinguished. Third, the passivization data indicate a kind of "competition" between the Beneficiary and the Recipient, but suggest that only the latter has full argument status. A possible interpretation of the data is that the Beneficiary is a kind of sub-role of the Recipient, semantically more specified and syntactically more constrained. The Benefactive may be reserved for those cases not covered by the broader Dative/Recipient, namely, cases where no change of possession is necessarily involved or where the emphasis is on the benefit rather than a change of possession. Previous analyses of the Benefactive alternation, including Green's, have cast its semantics in terms of a change of possession, characterizing verbs of performance, creation, and preparation as metaphorical possession transfer. But this does not account for the fact that the alternations differ and are available for different verb classes.

6 Semantics of the alternants

We examined the constraints that have been proposed to account for the double object alternant of the Benefactive alternation. Web data demonstrate that many of the previously formulated constraints do not strictly hold and that speakers violate them regularly. However, the violations are not random but appear to be extensions demonstrating the "softness" of the semantic constraints. Given the semantic and syntactic overlap between the Dative Shift and the Benefactive Alternation, one might ask whether the explanations proposed for the constraints on the former can also help in understanding the latter. Krifka (1999, 2003), in his study of a large number of Dative-shifting verbs, argues for distinct meanings associated with the two alternants in many cases. He proposes that the DO syntax expresses a change of possession, where an Agent causes a Goal (or Recipient) to be in a state of possessing the Theme. The DO construction does not provide for a movement event, in contrast to the PP alternant, which expresses an event where the Agent causes the motion of the Theme towards a Goal. Assigning specific semantics to these constructions, in the spirit of Goldberg (1995), seems to work well for the wide range of verbs showing the Dative alternation that Krifka discusses. An extension of this explanation to the Benefactive Alternation might be formulated roughly as follows. Analogously to Krifka's proposed analysis for the Dative Shift, the DO alternant causes a change of state in the Beneficiary, namely one where the referent necessarily becomes a Beneficiary and incurs the benefit. The PP alternant on the other hand simply expresses an event where an Agent intends a benefit for a potential Beneficiary; intention here could be interpreted as a kind of metaphorical movement of the benefit.7 Interestingly, the data from the reflexive Beneficiaries, often considered substandard or dialectal, provide some support for this analysis. Recall that these sentences involve verbs of ingestion and perception, where the Agent or Experiencer is necessarily coreferent with the Beneficiary:

(185) I'll have myself a little snack before bed
(186) I'll eat me some potted meat
(187) gonna listen me some Guns and Roses
(188) gonna watch me some uneducational TV

Unlike with the other verb classes that show the alternation, the PP alternant is not available for these verbs:

(189) *I'll have a little snack before bed for myself
(190) *I'll eat some potted meat for me/myself
(191) *gonna listen to some Guns and Roses for me/myself
(192) *gonna watch some uneducational TV for me/myself

Ingestion and perception events necessarily affect, and change the state of, the ingesting or perceiving entity, making these data consistent with a theory that says that the semantics of the double object alternant, but not those of the PP alternant, provide for a change of state. Green's constraint requiring the contemporaneous existence of Agent and Beneficiary constitutes a prerequisite for this analysis, while the data we found where the DO Beneficiaries are dead or as yet non-existent (sentences (125)-(130)) would be counterexamples. But it seems plausible that the speakers of these sentences conceptualize the Beneficiaries as living entities, capable of benefiting and undergoing a change of state.

7 Restrictions on the Benefactive alternation consistent with the data

We saw that the previously proposed constraints on the Benefactive cannot fully account for the naturally occurring data found on the web. The web data indicate that the constraints, as they have been formulated, are too rigid, and speakers regularly extend them. But clearly, restrictions on the DO Benefactive do exist. The data we examined do not permit us to formulate any hard constraints. Instead, we can state only one necessary but not sufficient condition for the DO Benefactive alternation. The attested data all appear to show at least one common semantic feature, the control of the subject over the event.

7.1 Control with transfer verbs

We saw that the Benefactive is allowed in many cases where the subject acts in order to bestow a benefit on another person (or entity). Some verbs of obtaining, like buy, get, fetch, and steal, when used as simple transitives, imply that the Agent is also the Beneficiary or Recipient of the obtained entity. But the default Beneficiary or Recipient can be cancelled in the presence of another Beneficiary argument:

(193) ...santa said you guys gotta buy me my presents this year.
     www.sassyandseksi.com/buystuff.htm
(194) He wants someone to fetch him his shoes...
     www.washingtonpost.com/wpsrv/style/books/features/11980621.htm
(195) Gabby's mom stole me some pants from the hospital
     www.dyve.com/springman/avi/art/artnav.htm

Other verbs, like receive, denote events where the potential benefit must remain with the subject and cannot be passed on to another (non-subject) Beneficiary. The subject here is not only a necessary, but also a passive Beneficiary, and is not in control of the event where the possession changes ownership. We could not find any examples of these verbs with the Benefactive, either in the PP or in the DO alternant:

(196) *I'll receive me/you/her a little present

The subject's control over the event appears to be one (necessary but not sufficient) requirement for the Benefactive alternation. Further evidence comes from the interesting case of polysemy presented by the verb find. It shows the Benefactive alternation (and a Benefactive reading of the corresponding for-NP phrase) when it refers to the result of a search effort that implies a goal or intent, but we could not find instances where it refers to an accidental or serendipitous finding, as in the constructed (200):

(197) Find Me My Property.
     Www.marinatradingpost.com/form1.html
(198) My husband ...made it his mission to find me my pink shoes.
     www.epinions.com/sprt-Basketball-Adidas_superstarII
(199) Find me my Perfect Mate! ...
     www.cutecards.net/platinum/icq/funlovetest.html
(200) ??Find me a wallet in the street

7.2 Control with consumption and perception verbs

For verbs of consumption and perception, the subject, the ingester, is necessarily the Beneficiary. An explicit Beneficiary, coreferent with the subject, can be added to emphasize the subject's causation of, or control over, the event:

(201) I'll have myself a little snack before bed
     www.dedecountryhome.com/BuddyBoy3.html
(202) I reckon I'll eat me some potted meat.
     www.math.gatech.edu/~mullikin/res/respics.htm
(203) gonna listen me some Guns and Roses
     www.angelfire.com/me3/NovaSparkle/xjournal02.html
(204) Gonna watch me some uneducational TV, damnit.
     www.champuru.com/08-2000/08-29-2000.html

Our web searches turned up no examples of Benefactives with verbs where the perception event is not caused or controlled by the subject, as with hear and see, which do not imply intention and hence control by the perceiver:

(205) ??I hear me some noises in the street
(206) ??I saw me an accident on the road

7.3 Inanimate controllers

Sentences like (207-208) below show that an inanimate Cause can have control over an event, even though it is incapable of intention and volition:

(207) The sun baked you the bricks
(208) Still, the fact is the current budget only bought us time. ...
     www.americanprogress.org/site/

A Cause may control an event because of its specific properties, much as in middle constructions, where an entity's particular property enables a potential event. No sentience, volition, or intention is required to cause a benefit. Control is thus the one common semantic component of the wide range of subjects in the DO alternant; all other previously proposed constraints were shown to be violated by attested data. While control does not seem like a satisfactory semantic characterization, we expect to better understand the nature of the arguments in the alternation as more sophisticated web searches yield pertinent data.

8 Conclusions and future work

The web data show that most of the constraints that have been proposed on the basis of constructed data are soft and speakers frequently violate and extend them, though most data fall into the kinds of patterns that previous researchers have suggested. The work reported here raises the question as to the core of a constraint and its "fuzzy edges." This case study shows up the need for attested data, as constructed contrastive data, often labeled either "grammatical" or "ungrammatical," fail to capture the fuzziness of real constraints and often reflect the theoretical biases of the investigators who construct the data.8 Our main goal here was to test proposed constraints against attested data; we are not able to offer a revised full explanation of the alternation. The data are consistent with two observations: one, that the DO alternant requires that the subject have the abilities or properties required to bestow the benefit; two, that in the DO alternant, unlike in the PP alternant, a benefit is necessarily bestowed, resulting in a change of state of the affected entity, the Beneficiary.

Traditionally, linguistic research had to rely on data based largely on the investigator's intuitions; attested, unsolicited, and naturally occurring data could not be obtained in a systematic fashion. Corpora represent a first step toward research based on non-constructed data. In particular, the World Wide Web represents a very large and domain-independent corpus that can be mined easily and efficiently. Our method was clumsy, and we cannot claim to have found all the relevant data. Therefore, we are careful not to propose a definitive account of the Benefactive construction. We plan to re-examine the Benefactive construction, as well as other ill-understood grammatical phenomena, with the help of a sophisticated search tool (Resnik and Elkiss 2004). Resnik and Elkiss's Linguist's Search Engine allows the user to search data from Internet Archives for specific syntactic structures and to build custom-tailored corpora with pertinent hits for the purposes of empirical investigation. This tool will allow the testing and possible refinement of linguistic theories, and permit their formulation in the light of relevant data that might not otherwise be considered.

Acknowledgements

This work was supported in part by NSF Grant Number IIS-0112429. I thank Mari Olsen, Philip Resnik, Usama Soltan, Manfred Krifka, Philippa Cook, Adele Goldberg, Artemis Alexiadou, Hans Kamp, Effi Georgialou, Anthony Kroch, and Ben Haskell for their critique and helpful comments.

Notes

1. Lapata's (1999) interesting study of the Dative and Benefactive alternations, using the British National Corpus, investigates the relative frequencies of the two alternations, the preference of the alternating verbs for the DO vs. the PP alternant, and the representative members of the participating classes, based on Levin (1993). Her quantitative focus is however quite different from ours and does not directly challenge the proposed constraints on the Benefactive alternation.
2. These constructions are described by Christian (1991) in her study of Appalachian speech. Christian states that they carry a "light benefactive meaning," but offers no further evidence for this assertion.
3. I thank Manfred Krifka for pointing these data out to me.
4. In principle, non-argument status would preclude the occurrence in direct argument position.
5. Verbs like contribute and donate are generally considered to be restricted from the Dative Alternation. While a web search showed up no sentences with contribute and DO Recipients, it turned up well over a thousand sentences like can anyone donate me some ice cubes? and you can donate me some money. Apparently, whatever semantic constraint blocks the alternation for contribute does not (or no longer) block it for donate.
6. If Benefit is equated with Change of Possession, then the aspectual constraint would be better motivated, as a change of possession tends to denote an accomplishment.
7. Such an explanation seems related to the holistic constraint, which is often assumed to account for the spray-load alternation. The argument projected as the direct object is fully (holistically) affected by the event, in contrast to the argument in the PP.
8. In an examination of the distribution of the Dative Shift on the basis of attested data, Bresnan and Nikitina (2003) claim that much data that is labeled "ungrammatical" is merely "improbable" and that the probability of their occurrence is linked to information structure.

References

Anderson, Stephen R. 1971 On the Role of Deep Structure in Semantic Interpretation. Foundations of Language 7: 387-396.
Baker, Carlos L. 1979 Syntactic theory and the projection problem. Linguistic Inquiry 10: 533-581.
Bresnan, Joan and Tatiana Nikitina 2003 On the Gradience of the Dative Alternation. Ms. Stanford, CA: Stanford University.
Christian, Donna 1991 The personal dative in Appalachian speech. In Dialects of English, Peter Trudgill and J. Chambers (eds.), 11-19. London: Longman.
Curme, George O. 1986 A Grammar of the English Language. Vol. II: Syntax. Essex, CT: Verbatim Printing.
Daultrey, Bethan 1997 The Structure of the Double Object Construction in English. www.ucd.ie/~pages/97/daultrey
Goldberg, Adele 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Green, Georgia 1974 Semantics and Syntactic Regularity. Bloomington, IN: Indiana University Press.
Jackendoff, Ray 1990 Semantic Structures. Cambridge, MA: MIT Press.
Krifka, Manfred 1999 Manner in Dative Alternation. In: Proceedings of the 18th West Coast Conference on Formal Linguistics, Sonya Bird, Andrew Carnie, Jason D. Haugen, and Peter Norquest (eds.), Tucson, AZ: University of Arizona.
Krifka, Manfred 2003 Semantic and pragmatic conditions for the Dative Alternation. Korean Journal of English Language and Linguistics 4: 1-32.
Lapata, Maria 1999 Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations. In: Proceedings of the 37th Meeting of the Association for Computational Linguistics, 266-274. College Park, MD.
Larson, Richard 1990 On the Double Object Construction. Linguistic Inquiry 19: 335-391.
Levin, Beth 1993 English Verb Classes and Alternations: A Preliminary Investigation. Chicago, IL: University of Chicago Press.
Marantz, Alec 1984 On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Nishida, Chiyo 1994 The Spanish reflexive clitic se as an aspectual class marker. Linguistics 23: 425-258.
Pinker, Steven 1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Resnik, Philip and Aaron Elkiss 2004 The Linguist's Search Engine: Getting Started Guide. Technical Report: LAMP-TR-108/CS-TR-4541/UMIACS-TR-2003-109. University of Maryland, College Park.
Wechsler, Stephen 1995 A Non-Derivational Account of the English Benefactive Alternation. Paper presented at the 65th LSA Annual Meeting. Chicago, IL.

A Quantitative Corpus Study of German Word Order Variation

Kris Heylen

1 Introduction

Word order variation in German is an area of syntactic research in which the limitations of the types of data traditionally used in theoretical linguistics have become apparent. In a case study, we present a quantitative corpus analysis as a possible alternative to overcome these shortcomings. Traditionally, theoretical linguists have not had to worry much about the validity of the data on which they based their theories. For most linguists, obtaining relevant data seemed fairly unproblematic and easy. In the generative tradition, linguists could rely on the introspective grammaticality judgments of any single native speaker (including themselves) to uncover the principles of grammar. Researchers taking a cognitive-functional approach offered hermeneutic interpretations of “encountered” examples of language use to show how discourse properties or general cognitive abilities shape the grammar. Most research into word order variation in German was also based on these types of linguistic evidence. Recently, however, there has been a growing awareness across the field of theoretical linguistics that this kind of data has limited reliability and is insufficient to deal with the complexity of linguistic phenomena. Several options are being pursued to provide grammar research with a more solid empirical basis. One of them is the use of large electronic corpora for collecting representative data samples in which theoretically interesting patterns of linguistic structure can be discovered and validated. This kind of corpus study will be applied here to word order variation in the Mittelfeld of the German clause.

2 Word order variation in German

Word order variation has been a longstanding issue within the study of German syntax. In the first section, we will briefly outline the major research questions involved, and we will point out how problems with data reliability have played an important role in this area of research. The second section introduces the specific type of word order variation that we will pursue further in the case study that follows.

2.1 A challenge to traditional data types

German clause topology is characterized by the so-called Klammer construction. The German clause has two fixed positions, called Klammer (German for ‘brackets’), that are typically occupied by elements of the verbal group in main clauses or by a complementizer and the verbal group in subordinate clauses. These fixed positions subdivide the clause into three main “fields”. Of interest here is that the field between the two fixed positions, called the Mittelfeld (middle field), can contain multiple constituents and that these constituents do not always occur in the same order. Especially the relative order of verbal arguments like subject, direct and indirect object, co-occurring in the Mittelfeld, has been the subject of a lively debate within the German linguistics community. In this debate, problems of linguistic evidence have played a major role.

At the debate’s climax in the mid-1980s, linguists mainly differed in opinion as to whether the word order variation in the Mittelfeld was determined primarily by grammatical or by pragmatic factors.1 Both sides kept on coming up with examples that seemed to confirm the importance of the factors they had put forward, while refuting the effect of those suggested by the other side. Two main problems seemed to keep the debate from being settled: firstly, a great many factors were involved simultaneously, and secondly, each factor taken individually rarely had a categorical effect. This posed enormous problems for the traditional types of linguistic evidence on which the researchers relied. Without a categorical effect of the factors, grammaticality judgements were not unambiguous and not at all stable across speakers. Moreover, there was no obvious scale to interpret the resulting graded differences in grammaticality. The fact that multiple factors were involved made it well nigh impossible to control for all of these factors simultaneously in test sentences, let alone assess the contribution of each factor to the graded grammaticality. Towards the beginning of the 1990s, there was an increasing awareness that the main problem was indeed a methodological one: the traditional introspective data was unreliable and could not cope with the phenomenon’s complexity. As a consequence, several new empirical methods were tried out.


One approach used psycholinguistic experiments based on processing time differences (e.g. Pechmann et al. 1996, Poncin 2001), and a second type of study looked at corpus material (e.g. Primus 1994; Kurz 2000). A third and more recent approach uses a sophisticated version of grammaticality judgments with a strict design, taken from multiple test subjects and analyzed with advanced statistical techniques (e.g. Keller 2000). Yet these approaches have problems of their own. Both the psycholinguistic experiments and the corpus studies continued struggling with the variation’s multifactorial complexity: the psycholinguistic studies had to limit the number of factors because of the time-consuming and costly way of collecting data. Results were highly reliable but dealt with only one or two factors simultaneously. The corpus studies could investigate the effect of multiple factors in large amounts of actual usage data, but they lacked the statistical apparatus to deal with multifactorial complexity. The third method of enhanced grammaticality judgments can gather sufficient data relatively easily and has the appropriate statistics to deal with multifactoriality, but the heuristic status of grammaticality judgments themselves is not unproblematic. They certainly give a reliable, reproducible estimate of speakers’ post-hoc introspective judgments of sentence acceptability, but it is unclear whether this, as claimed, directly reflects speakers’ linguistic competence of grammatical constraints used in on-line production, while at the same time filtering out “performance2 noise”. It seems more probable that the relation of acceptability judgements to the grammatical system is a complex and indirect one, because assessing acceptability is a separate and complex cognitive activity that potentially introduces new biases. The case study presented below opts for corpus material as data source for several reasons.
Firstly, usage data as collected in corpora can be seen as the primary data in linguistics. Actual usage is what can be directly observed about language in reality. For a usage-based approach to grammar3, usage is primary because it fundamentally shapes the grammar. But even in a modular, autonomous grammar, usage data is not less primary than judgments because both are biased by performance, and in both performance noise can be filtered out in principle. Secondly, electronic corpora are getting larger by the day, so that gathering large amounts of data is relatively easy. Thirdly, corpus data deals with the problem of gradient effects fairly automatically by studying relative frequencies. Finally, the simultaneous effect of multiple factors can be studied directly by looking at actual usage, and these effects can be explored given the appropriate statistical apparatus. It is in this last respect that the case study below will try to improve on previous corpus studies that only used monofactorial analyses.

2.2 A specific type of word order variation

The case study focuses on a specific type of word order variation in the Mittelfeld, viz. the variation that occurs when both a full NP-subject and a pronominally realized object are present in the Mittelfeld. In this case, the pronominal object can either precede the full subject NP (as in ex. 1) or follow it (ex. 2).4 The variation occurs with both direct and indirect object pronouns.

(1) Ein paar Tage später nahm ihn der SED-Chef der Uni beiseite
    a few days later took him the SED-chief of the Uni aside
    ‘A few days later the university's SED-chief took him aside’

(2) Später, als die Kommission ihn entlassen hat, sagt er, ...
    later when the commission him dismissed has says he
    ‘Later, when the commission has dismissed him, he says ...’

Although most reference grammars of German consider the word order with object-first (ex. 1) to be more common, both word orders seem to be freely interchangeable without any obvious difference in grammaticality or meaning. Because of this, traditional heuristic methods like grammaticality judgments cannot discriminate between examples and thus cannot detect the effect of relevant factors. Even the method of enhanced grammaticality judgments cannot detect a difference in acceptability between the two variants (Keller 2000: 108ff). The few other studies that discuss this type of variation (Lenerz 1994; Zifonun 1997:1511ff) admit that influencing factors are hard to identify. Classifying any syntactic variation as “free” variation is explanatorily highly unsatisfying and probably means more often than not that our methods for studying the phenomenon are insufficient. Admittedly, this type of variation might seem a hard nut to crack and not an obvious choice to start investigating German word order variation, but it also has a methodological advantage: By keeping the object pronominal, we reduce the multifactorial complexity because pronouns vary less in e.g. lexical form, length or discourse given/new status than full NPs. In what follows, we look at whether a quantitative corpus study can overcome the deadlock of traditional grammaticality judgments.

3 A quantitative corpus study

In the first section we discuss the corpus that was used to collect data and how this data collection was done. The second section introduces the different factors whose effect on word order we will examine in section 4.

3.1 Collecting the data

The case study is based on data from the NEGRA corpus.5 The corpus was compiled at the University of Saarbrücken and consists of 20,602 morphosyntactically annotated sentences (355,096 tokens) taken from the 1992 editions of a local Frankfurt-based daily newspaper (Frankfurter Rundschau). Strictly speaking, this means that any of our findings will only hold for this specific type of language use, i.e. language use from this specific register (newspaper articles), location (region of Frankfurt) and time period (1992). Yet newspaper texts cover many topical domains and are written by multiple authors, often with different backgrounds, so that the patterns we find in this type of data might well be representative of modern German usage in general. Of course, whether there are regional, sociolinguistic or register differences is a question that is open to further empirical investigation.

For the data collection, relevant observations were defined as all clauses with a nominal subject and one pronominal object in the Mittelfeld. Taking advantage of NEGRA’s morpho-syntactic annotation, we used a self-programmed Perl script to extract these observations automatically, and then we did a manual check for precision on all retrieved observations and a manual check for recall on 20% of the corpus. The script had found all relevant observations but also 15% spurious ones. After removing irrelevant clauses, we were left with a total number of 995 observations. This means that the construction is quite common and present in about 5% of the corpus’ sentences.

The observations were then annotated for the response variable word order (subject first vs. object first) and for a number of factors that are traditionally mentioned as relevant in the literature on word order variation. These factors had to be operationalized, so that each observation could be assigned a unique value during a process of mostly manual annotation. Basically, this process takes the form of an iterative annotation loop: as a new observation comes up that does not fit into the initial operationalization of a factor, this operationalization has to be revised to accommodate this observation. Then, all previously annotated observations have to be checked again to see whether they are still in compliance with the new operationalizations of the different factors. This reiteration goes on until all observations are annotated adequately. It goes without saying that this is a very time-consuming process, but it is probably the price that has to be paid for reliable data. The final outcome of this annotation is a so-called data matrix, which states for every observation which word order was attested and which values were observed for each factor. This data matrix is then amenable to further statistical analysis. But before turning to this analysis, the next section discusses the factors that were investigated.

3.2 The factors

Because the differences in grammaticality and meaning between the word order variants in our study are so elusive, it is very hard to make an initial guess about which factors might have an influence. Therefore we consulted the (vast) literature on word order and selected a set of the many factors that are mentioned as relevant. Seven of those are discussed in this study (see table 1 for an overview). Most of them pertain to properties of the subject because its realization as a full NP allows for more diversity than the pronominal object.

Table 1. Factor overview

    FACTORS                                        VALUES
1   Case of the pronominal object                  [dative] [accusative]
2   Semantic role of the subject                   [agent] [recipient] [theme]
3   Length difference between subject and object   number of syllables
4   Given/new status of the subject                nine-point ordinal scale
5   Animacy of the subject                         [animate] [inanimate]
6   Pronoun type of the object                     [personal] [reflexive]
7   Clause type                                    [main] [subordinate]
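One row of the data matrix described in section 3.1 could be represented roughly as follows. This is only a sketch: the field names, the string labels and the example row are hypothetical and are not taken from the annotated NEGRA data.

```python
from dataclasses import dataclass

# Allowed values per categorical variable, following table 1.
# Field names and labels are hypothetical choices for this sketch.
ALLOWED = {
    "word_order": {"object_first", "subject_first"},
    "object_case": {"dative", "accusative"},
    "subject_role": {"agent", "recipient", "theme"},
    "subject_animacy": {"animate", "inanimate"},
    "pronoun_type": {"personal", "reflexive"},
    "clause_type": {"main", "subordinate"},
}


@dataclass(frozen=True)
class Observation:
    """One annotated clause: the response variable plus the seven factors."""
    word_order: str       # response variable
    object_case: str      # factor 1
    subject_role: str     # factor 2
    length_diff: int      # factor 3, in syllables
    givenness: int        # factor 4, nine-point ordinal scale
    subject_animacy: str  # factor 5
    pronoun_type: str     # factor 6
    clause_type: str      # factor 7

    def __post_init__(self):
        # Reject values outside the operationalization of each factor.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}: unexpected value {value!r}")
        if not 1 <= self.givenness <= 9:
            raise ValueError("givenness must lie on the nine-point scale")


# A made-up row, for illustration only.
row = Observation("object_first", "accusative", "agent", 7, 6,
                  "animate", "personal", "main")
```

Rejecting unexpected values at construction time mirrors the requirement that every observation receives a unique, admissible value for each factor.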

3.2.1 Case of the pronominal object

Grammatical case is probably the most basic factor for word order phenomena. Indeed, most German grammars explicitly refer to a constituent’s case to describe preferred orderings. Grammatical case like nominative, accusative or dative is mostly morphologically marked in German. Because our study keeps the presence of a nominative subject constant, the only variation in case occurs with the pronominal object. This can either be a pronoun in accusative or dative case.

3.2.2 Semantic role of the subject

The semantic role or theta role of a constituent refers to the role the referent of a constituent plays in the action denoted by the verb. It is a factor that figures prominently in many generative accounts of word order. In this study we discriminate between three roles for the subject referent: Agent, when the referent performs an action or causes an action to take place; Recipient, when a referent is the active recipient of objects or stimuli; and Theme, when the referent is itself inactive in the action / state denoted by the verb.

3.2.3 Length difference between subject and object

The idea that length difference has an effect on word order was introduced by Jacob Wackernagel in the late 19th century and recapitulated by Behaghel as his “law of growing constituents”, which claimed that short constituents tended to precede longer ones. More recently, Hawkins’ (1994) EIC-theory claims that length difference is the main factor determining word order. In this study we measured the difference between subject and object in syllables rather than number of words, because in a compound-friendly language like German individual words can already differ greatly in length. Because the object is always a one- or two-syllable word, this measure mainly reflects the length of the subject.

3.2.4 Given/new status of the subject

Given/new status refers to whether a referent was previously mentioned in the discourse or not. Its influence on word order was a basic assumption of the Prague School. It is a factor that is notoriously difficult to operationalize because of the many intermediate categories between completely new and totally given. The nine-point scale we use here for the subject referent was developed as an opportunistic tool by Grondelaers (2000) based on earlier work by Ellen Prince (1981) and Mira Ariel (1991) and it has since proven its applicability in several analyses. The scale (table 2) classifies a referent by its degree of accessibility, given the current discourse model.

Table 2. Given/new scale

VALUE   REFERENT ACCESSIBILITY IN CURRENT DISCOURSE
1       Not accessible and unconstrained
2       Not accessible but constrained by the context
3       Not accessible but constrained by an anchor referent
4       Accessible through encyclopaedic knowledge
5       Inferable from an anchor referent
6       Accessible in the wider linguistic context
7       Inferable from the near linguistic context
8       Accessible in the near linguistic context
9       Accessible in the immediate speech context

3.2.5 Animacy of the subject

Whether a constituent’s referent is animate or inanimate is proposed as the main determiner of German constituent order in the reference grammar by Zifonun et al. (1997). In this study, we only look at the animacy of the subject referent because a substantial part of the objects, viz. those realised as reflexive pronouns, would only mirror the subject’s animacy.

3.2.6 Pronoun type of the object

Although reflexive pronouns also participate in the variation, they have a different semantics from personal pronouns, which could influence word order. Because of the reporting style of newspaper text, most pronouns are third person pronouns. For reflexives, the third person form sich is indeed the only one that was observed.

3.2.7 Clause type

In this study, clause type refers to the difference between main clauses and subordinate clauses. Whereas main clauses in German have the finite verb in second position (occupying the first Klammer), subordinate clauses have the finite verb in (near) final position (the second Klammer). Hawkins’ (1994) theory predicts this will lessen the tendency in subordinate clauses to put shorter constituents before longer ones. In our case, this would mean fewer short pronominal objects before long full-NP subjects in subordinate clauses.

4 Statistical exploration

After the annotation process described above, we had obtained a data matrix which states for every observation which order the subject and object appeared in, and which value was observed for each of the seven factors. Now, statistical analysis allows us to examine the correlations between word order and the factors. This can be done from two perspectives: if some theory of grammar has led us to formulate a hypothesis that makes an explicit prediction about the correlation between word order and some factor(s), we can test whether this hypothesis is confirmed by the data or not. This is called confirmatory analysis. On the other hand, if we have annotated our observations for a number of factors, but we do not yet have an explicit hypothesis about which factors determine the word order and we would just like to know a bit more about the effects of these factors, we can explore the correlations between factors and word order in the data. This is called exploratory analysis. This exploration is meant to give a better insight into the data, which may well lead to new theoretical understandings and explicit hypotheses. In their turn, these hypotheses can again be tested.

With the word order variation studied here, the main problem was precisely its elusive character, which prevented us from formulating an explicit hypothesis about what determined the variation. Instead we chose to look at a number of factors suggested by the literature. An exploratory statistical analysis can now give us an idea about which of these factors are actually relevant, in what way and to what extent. Note that a statistical analysis will not in itself provide an explanation; rather, it uncovers patterns that themselves need explaining. These analyses help to expose empirical facts that are not apparent at first sight. These facts should be the input for explanation finding and theory building. A good theory will try to generalize and make predictions for other cases than the ones it started from. Whether these predictions are borne out by the “empirical facts” is then a question to be addressed in additional analyses.

The analyses presented below explore the data at levels of increasing complexity with increasingly advanced statistical techniques. The main concern will be the kind of information that these statistical techniques provide, not their technical details.6 First, we look at the relative order of subject and object per se to see which order is dominant. Next, the effect on word order of each factor separately is investigated. Then, we examine the effect of one factor while controlling for a second factor. Finally, we assess the effect on word order of multiple factors simultaneously.

4.1 The proportion of object-first and subject-first

In studying syntactic variation, an obvious first question seems to be: how much variation is there? Are both orderings of subject and object equally frequent, or is there a clear default, dominant order? In our data, 889 out of 995 observations have object-first whereas only 106 observations have subject-first. This proportion of 89.3% object-first confirms what most grammars of German say, viz. that object-first is the default order. For a future theoretical interpretation, this probably means that subject-first will be considered a marked order whereas object-first is the unmarked order.

We might also be interested to know how reliable the information about this proportion is. How sure can we be that the proportion of object-first we find in our data is a good estimate for the proportion of object-first in general? In fact, this is the basic question underlying all of statistics: how reliable are the results obtained from a sample of observations when compared to all possible observations? Intuitively, it is clear that the more observations we take into consideration, the more reliable our results will be. Statisticians use this property to determine confidence intervals from a sample. The 95% confidence interval for a proportion is the interval in which the true proportion for all possible observations will be situated with 95% certainty. The more observations we take into account, the more we can narrow down the interval. This confidence interval holds for all observations made under similar conditions as those under which the sample was collected. In our case, these conditions would be something like all observations that come from newspaper articles that appeared in local newspapers from central Germany in the early 1990s. For these conditions, the 95% confidence interval for the proportion of object-first is situated between 87.1% and 91.0%. We now can reliably say that object-first is indeed the default order.

4.2 The effect of separate factors
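Before turning to the individual factors, the confidence interval for the overall proportion of object-first (889 out of 995) can be checked with a short sketch. The text does not say which interval method was used, so two common ones are shown here as assumptions: the normal-approximation (Wald) interval and the Wilson score interval. Their bounds may differ slightly from the reported values.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z = 1.96 for 95%)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval, better behaved for proportions near 0 or 1."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


object_first, total = 889, 995  # counts from section 4.1
for name, interval in (("Wald", wald_interval), ("Wilson", wilson_interval)):
    low, high = interval(object_first, total)
    print(f"{name}: {low:.1%} to {high:.1%}")
```

Both intervals shrink as the number of observations grows, which is the property the text appeals to when it says that more observations narrow the interval.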

Above, we introduced seven factors that we think might influence the relative ordering of a full subject NP and a pronominal object. In this section, we examine for each of the seven factors separately what its effect on word order is. For each factor, we look at two statistics: the χ² test7 tells us whether there is an association between the factor and word order. If there is an association, a “measure of association” tells us how strong the association is and what direction it takes.

4.2.1 Case of the pronominal object

Table 3 makes clear that there is not much of an effect of case on word order. The proportion of object-first versus subject-first cases is virtually the same for observations with an accusative and a dative pronoun. The χ² test confirms that there is no significant association (p = 0.94).8 This lack of effect is somewhat unexpected, because case is considered to be relevant for word order by nearly all reference grammars. However, this may lead us to consider that although case per se is not important, some more specific interpretation of case might well have an effect, as we will see below (4.3).

Table 3. Word order by Case of object

CASE         OBJECT FIRST      SUBJECT FIRST
ACCUSATIVE   724 / 810 (89%)   86 / 810 (11%)
DATIVE       165 / 185 (89%)   20 / 185 (11%)
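The test of association for table 3 can be sketched as a Pearson χ² test of independence, assuming no continuity correction. The counts come from table 3, with the accusative subject-first cell taken as 86 (= 810 − 724), which is also what the totals in table 10 imply. For one degree of freedom the p-value can be computed with the complementary error function, so no external library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: accusative, dative; columns: object-first, subject-first.
table3 = [[724, 86], [165, 20]]
chi2, p_value = chi_square_2x2(table3)
print(f"chi2 = {chi2:.4f}, p = {p_value:.2f}")  # p rounds to 0.94
```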

4.2.2 Semantic role of the subject

The semantic role of the subject has a significant effect on word order (χ², p < 0.01). Agent subjects precede the object relatively more often than recipient subjects, and in their turn recipient subjects precede objects more often than theme subjects. Increased agentivity of the subject seems to favour subject before object ordering, something we indeed expect from the literature. If we consider the three semantic roles as levels on a scale of agentivity, the so-called gamma index gives a measure for the strength of the association between agentivity and word order. The index ranges from -1 (perfect inverse linear association) over 0 (no association) to 1 (perfect linear association). Here the gamma index is -0.49. The negative sign means that high levels of agentivity correspond to relatively lower levels of object-first (i.e. relatively more subject-first). The absolute value of 0.49 indicates that there is a moderate association.

Table 4. Word Order by Semantic role of subject

ROLE        OBJECT FIRST      SUBJECT FIRST
AGENT       466 / 547 (85%)   81 / 547 (15%)
RECIPIENT   104 / 116 (90%)   12 / 116 (10%)
THEME       319 / 332 (96%)   13 / 332 (04%)
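The gamma index for table 4 can be reproduced with a small sketch of Goodman and Kruskal's gamma, which compares concordant and discordant pairs of observations. The coding is an assumption chosen to match the sign convention in the text: agentivity increases from theme over recipient to agent, and subject-first is coded lower than object-first.

```python
def goodman_kruskal_gamma(table):
    """Gamma for an ordinal-by-ordinal table.

    table[i][j] holds counts; rows and columns must both be listed
    in ascending order of the underlying scale.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # pairs with a higher row
                for m in range(n_cols):
                    if m > j:                   # also higher column
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower column
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Rows: theme < recipient < agent; columns: subject-first, object-first.
table4 = [[13, 319], [12, 104], [81, 466]]
gamma = goodman_kruskal_gamma(table4)
print(f"gamma = {gamma:.2f}")  # -0.49
```

Reversing the order of either scale flips the sign but not the absolute value, which is why the coding has to be stated along with the index.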

4.2.3 Length difference between subject and object

Length difference between subject and object has a significant effect on word order (χ², p < 0.01). Smaller length differences lead to relatively more subject-first, as we can also see in table 5. Because the pronominal object is always short, length difference mainly reflects the length of the subject. This means that shorter subjects precede the object relatively more often than longer ones, which is what we expect from Behaghel’s “law of growing constituents”. The gamma index of -0.44 reflects a moderate inverse association between object-first and smaller length differences.

Table 5. Word Order by Length difference

SYLLABLES   OBJECT FIRST      SUBJECT FIRST
0-3         299 / 362 (82%)   63 / 362 (18%)
3-6         212 / 233 (91%)   21 / 233 (09%)
>6          378 / 400 (95%)   22 / 400 (05%)

4.2.4 Given/new status of the subject

Table 6 shows that the given/new status of the subject referent as measured by its degree of accessibility does not have a perfect linear effect.9 Indeed, empirical data does not always show the neat results we would like. However, more accessible subjects do seem to precede the object relatively more often, which we would expect from the theories of the Prague school. The (MH) χ² test confirms that there is linear association (p < 0.01). A gamma index of 0.28 indicates that this linear association is relatively weak.

Table 6. Word order by given/new status of the subject

VALUE   OBJECT FIRST       SUBJECT FIRST
1       162 / 169 (96%)    7 / 169 (04%)
2       48 / 55 (87%)      7 / 55 (13%)
3       24 / 24 (100%)     0 / 24 (00%)
4       103 / 117 (88%)    14 / 117 (12%)
5       101 / 106 (95%)    5 / 106 (05%)
6       255 / 291 (88%)    36 / 291 (12%)
7       140 / 159 (88%)    19 / 159 (12%)
8       56 / 74 (76%)      18 / 74 (24%)
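The test for linear association in table 6 can be sketched as follows, assuming that "(MH) χ² test" refers to the Mantel-Haenszel test of linear trend, whose statistic is M² = (n − 1) r², with r the Pearson correlation between the ordinal scores and a 0/1 coding of word order. Both that reading and the use of the scale values 1 to 8 as equally spaced scores are assumptions of this sketch.

```python
import math


def linear_trend_test(scores, object_first, subject_first):
    """Mantel-Haenszel linear-by-linear test for an ordinal factor.

    x = ordinal score, y = 1 for subject-first and 0 for object-first.
    Returns (r, M2, p) with M2 = (n - 1) * r**2 on 1 degree of freedom.
    """
    n = sum(object_first) + sum(subject_first)
    totals = [o + s for o, s in zip(object_first, subject_first)]
    sum_x = sum(x * t for x, t in zip(scores, totals))
    sum_x2 = sum(x * x * t for x, t in zip(scores, totals))
    sum_y = sum(subject_first)            # y is binary, so sum(y^2) = sum(y)
    sum_xy = sum(x * s for x, s in zip(scores, subject_first))
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r ** 2
    p_value = math.erfc(math.sqrt(m2 / 2))  # chi-square tail, 1 df
    return r, m2, p_value


scores = [1, 2, 3, 4, 5, 6, 7, 8]
object_first = [162, 48, 24, 103, 101, 255, 140, 56]
subject_first = [7, 7, 0, 14, 5, 36, 19, 18]
r, m2, p_value = linear_trend_test(scores, object_first, subject_first)
print(f"r = {r:.3f}, M2 = {m2:.2f}, p = {p_value:.5f}")
```

A positive r here means that higher accessibility goes with relatively more subject-first, in line with the direction described in the text.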

4.2.5 Animacy of the subject

In table 7, animate subjects precede pronominal objects more often than inanimate subjects. This effect is statistically significant (χ², p < 0.01) and fits in with the effect of animacy that Zifonun (1997) predicts. Both word order and animacy have only two values and the measure of association generally used in such cases is the odds ratio. Here, this is the odds in favour of subject-first with animate subjects divided by the odds in favour of subject-first with inanimate subjects, which gives a value of 2.33. The odds of having subject-first with animate subjects are more than twice the odds with inanimate subjects, a moderately strong association.

Table 7. Word order by Animacy of the subject

ANIMACY     OBJECT FIRST      SUBJECT FIRST
ANIMATE     532 / 614 (87%)   82 / 614 (13%)
INANIMATE   357 / 381 (94%)   24 / 381 (06%)

4.2.6 Pronoun type of the object

Personal pronouns follow the subject significantly more often than reflexive pronouns (χ², p < 0.01). Apparently, the fact that reflexive pronouns do not introduce a separate referent in the sentence’s meaning has consequences for their ordering. This finding from our data exploration can now lead us to search for a theoretical interpretation. The odds ratio of 2.96 indicates a moderately strong association.

Table 8. Word order by pronoun type of the object

PRONOUN TYPE   OBJECT FIRST      SUBJECT FIRST
PERSONAL       141 / 179 (79%)   38 / 179 (21%)
REFLEXIVE      748 / 816 (92%)   68 / 816 (08%)

4.2.7 Clause type

The marked order subject before object is much more frequent in subordinate clauses than in main clauses. There is indeed a significant association between clause type and word order (χ², p < 0.01). This is a finding we will also have to interpret further after completing our data exploration. The odds ratio of 7.41, meaning that the odds for subject-first in subordinate clauses are more than 7 times those odds in main clauses, indicates a very strong association.

Table 9. Word order by Clause type

CLAUSE TYPE   OBJECT FIRST      SUBJECT FIRST
MAIN          646 / 674 (96%)   28 / 674 (04%)
SUBORDINATE   243 / 321 (76%)   78 / 321 (24%)
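The odds ratios reported for pronoun type (table 8) and clause type (table 9) follow directly from the cell counts. The sketch below uses the plain sample odds ratio; whether the reported values were computed with exactly this estimator is an assumption.

```python
def odds_ratio(sf_group_a, of_group_a, sf_group_b, of_group_b):
    """Odds of subject-first in group A divided by the odds in group B.

    sf_* are subject-first counts, of_* are object-first counts.
    """
    odds_a = sf_group_a / of_group_a
    odds_b = sf_group_b / of_group_b
    return odds_a / odds_b


# Table 8: personal pronouns (A) versus reflexive pronouns (B).
or_pronoun_type = odds_ratio(38, 141, 68, 748)
# Table 9: subordinate clauses (A) versus main clauses (B).
or_clause_type = odds_ratio(78, 243, 28, 646)

print(f"pronoun type: {or_pronoun_type:.2f}")  # 2.96
print(f"clause type:  {or_clause_type:.2f}")   # 7.41
```

An odds ratio of 1 would mean that the two groups have the same odds of subject-first; swapping the groups gives the reciprocal value.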

4.3 Stratified analysis

In the one-by-one analysis of factors, a surprising result was the lack of effect of object case on word order. Although there is no general effect, there might be an effect for specific types of pronominal objects. We therefore consider the effect of case for personal and reflexive pronouns separately. This is done in a so-called stratified analysis: we examine the effect of one factor (case) on word order, while controlling for a second factor (pronoun type). Table 10 now tells us that there is a significant, moderately strong effect of case with personal pronouns (χ², p < 0.01, odds ratio = 2.92), but there is no such effect with reflexive pronouns (χ², p = 0.60). There is also a test statistic, the Breslow-Day test, to check whether the effect of case is indeed significantly different for reflexives and personal pronouns. With a p-value of 0.02, we can say that the probability of the effect being the same is very small. One reason might be case syncretism: the reflexive sich has the same form for dative and accusative, whereas personal pronouns do have different forms for these cases. There may also be other reasons, but in any case, the stratified analysis has revealed an interesting difference that we might want to interpret theoretically.

Table 10. Word order by pronoun case, stratified for pronoun type

PRON. TYPE   CASE         OBJECT FIRST      SUBJECT FIRST
PERSONAL     ACCUSATIVE   69 / 97 (71%)     28 / 97 (29%)
PERSONAL     DATIVE       72 / 82 (88%)     10 / 82 (12%)
REFLEXIVE    ACCUSATIVE   655 / 713 (92%)   58 / 713 (08%)
REFLEXIVE    DATIVE       93 / 103 (90%)    10 / 103 (10%)
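The stratified comparison in table 10 can be sketched by computing the odds ratio of case within each pronoun type and then testing whether the two odds ratios differ. Note that the sketch does not implement the Breslow-Day test named in the text; it substitutes Woolf's test of homogeneity on the log odds ratios, a simpler test that addresses the same question, so its p-value need not match the reported one exactly.

```python
import math


def log_odds_ratio(sf_acc, of_acc, sf_dat, of_dat):
    """Log odds ratio (accusative vs. dative) of subject-first, and its variance."""
    log_or = math.log((sf_acc / of_acc) / (sf_dat / of_dat))
    variance = 1 / sf_acc + 1 / of_acc + 1 / sf_dat + 1 / of_dat
    return log_or, variance


def woolf_homogeneity(strata):
    """Woolf's chi-square test that all strata share one odds ratio."""
    estimates = [log_odds_ratio(*cells) for cells in strata.values()]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / sum(weights)
    x2 = sum(w * (lo - pooled) ** 2 for w, (lo, _) in zip(weights, estimates))
    df = len(estimates) - 1
    if df != 1:
        raise ValueError("this sketch only computes p-values for two strata")
    return x2, math.erfc(math.sqrt(x2 / 2))  # chi-square tail, 1 df


# (subject-first acc., object-first acc., subject-first dat., object-first dat.)
strata = {
    "personal": (28, 69, 10, 72),
    "reflexive": (58, 655, 10, 93),
}
for name, cells in strata.items():
    log_or, _ = log_odds_ratio(*cells)
    print(f"{name}: odds ratio = {math.exp(log_or):.2f}")
x2, p_value = woolf_homogeneity(strata)
print(f"homogeneity: X2 = {x2:.2f}, p = {p_value:.3f}")
```

For the personal pronouns the odds ratio comes out at 2.92, as reported, while for the reflexives it lies below 1, which is the kind of difference between strata that a homogeneity test is meant to detect.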

4.4 Multifactorial analysis

In the previous sections, we have looked at the effect of the seven factors separately, or at the effect of one factor while controlling for a second factor. However, in the actual data, these seven factors are at work simultaneously. To investigate simultaneous effects, multifactorial statistical techniques are used. They address questions like: considering all factors at the same time, which ones actually have an effect, what is their combined effect, what is each factor’s contribution to the combined effect, which factor is the most important one, and how well can we model the variation by the factors we have considered so far? First, we will look at a logistic regression model. Next, we discuss a Classification and Regression Tree (CART).

4.4.1 Logistic regression model

A logistic regression model is an advanced statistical technique that estimates the simultaneous effect of the factors on word order. First, a stepwise selection procedure determines which factors actually have an effect, given that all seven factors are considered simultaneously. The procedure selects the factors in order of effect strength and adds these to the model until no factors are left that still make a significant contribution to the effect on word order. In table 11, we see that five factors with a significant effect (p-value < 0.01) are selected for the model. Clause type has the strongest effect, followed by length difference, subject animacy, pronoun type and subject givenness. The procedure also selects one interaction, between clause type and pronoun type. Apparently, the effect of pronoun type is not the same in main and subordinate clauses. The model now states the combined effect of all selected factors on the odds of having subject-first (because odds cannot be negative, the effect is modelled on a logarithmic scale, where it can take any value).

Table 11. Logistic regression model

Factor

DF

Estimate

Odds ratio

p

INTERCEPT

1

-5.114

1) Clausetype (subordinate)

1

2.512

2) Length diff. (small)

1

0.731

2.078