Productivity in Argument Selection: From Morphology to Syntax 9783110303919, 9783110300796

This book centers on the idea that some verbs and other argument structure constructions have an inherently different productivity.




Productivity in Argument Selection

Trends in Linguistics
Studies and Monographs 260

Editor
Volker Gast

Founding Editor
Werner Winter

Editorial Board
Walter Bisang
Hans Henrich Hock
Heiko Narrog
Matthias Schlesewsky
Niina Ning Zhang

Editor responsible for this volume
Volker Gast

De Gruyter Mouton

Productivity in Argument Selection
From Morphology to Syntax

by

Amir Zeldes

De Gruyter Mouton

ISBN 978-3-11-030079-6
e-ISBN 978-3-11-030391-9
ISSN 1861-4302

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2012 Walter de Gruyter GmbH, Berlin/Boston
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com

For Nina and Lia

Preface

The present work started with a naive question: does productivity as it has been described in morphological word formation end at the word level, and if it does not, what would that mean for syntax? As a result, it grew together from two sides like a bridge, from morphology on the one side and the syntax-semantics interface on the other. It is the author’s hope that a coherent account has been arrived at in the middle.

The subject of this book, argument selection, initially seems like a domain of limitless productivity: we cannot possibly enumerate every conceivable argument of a verb like drink or eat. But at the same time it is easy to observe empirically that the arguments of the former are fewer in number, more repetitive and less prone to innovation than those of the latter in virtually any sample of usage data. Once a certain sample size has been reached, it becomes difficult to observe novel material for any argument selection process, but how quickly this point is reached differs for different lexemes and constructions. Though there are clear pragmatic reasons why this may be, some cases seem arbitrary. For example, if we look at large amounts of text, why do speakers of English seem to shake so many different things but jog so few (other than [someone’s] memory)? Why should near synonyms like start, begin and commence exhibit a significantly different likelihood of admitting novel objects? And what makes speakers generate different sized vocabularies for the verbal complement of help to [VERB] versus help [VERB]? These cases and many others like them are pervasive in usage data, and motivate a concept of productivity in argument selection just as it has been defended in morphology.

In presenting the case for productivity in argument selection, two issues will have to be addressed. To begin with, it will be shown that productivity is part of speakers’ (implicit) knowledge of their language (Chapters 3–4). Productivity effects would have to be robust and resist reduction to extralinguistic factors, such as the difference between eating and drinking in real life (see Chapter 5). The sense of productivity meant here will comprise multiple dimensions. In the first instance, I refer to the likelihood of a speaker to produce novel forms in a construction, a quantity that has been studied especially in the sense of Baayen’s (1993, 2001, 2009) potential productivity (marked 𝒫) in morphology. However, as I will discuss at length, other aspects of what I shall call the Productivity Complex (PC) are closely related to this in many ways, with the attested and projected total vocabulary size of a construction’s argument slot or slots, also discussed by Baayen for morphology, playing an important role.

The main line of argumentation leading to the conclusion that speakers have meaningful knowledge about syntactic productivity will run along the following three points:

1. Empirical productivity estimates for the same construction behave consistently within and across comparable datasets.
2. The properties of productivity cannot be accounted for on semantic grounds without resorting to ‘per head semantic classes’ (as criticized e.g. by Dowty 1991). This is supported by the fact that (near) synonymous constructions and lexical heads have different, inexplicably idiosyncratic productive behavior.
3. Many partially filled exponents of productive patterns have a near-saturated vocabulary of arguments which must be stored as such in the mental lexicon to account for the data. Linguistic knowledge of these facts complements speakers’ knowledge of the preferred arguments for each construction (cf. Stefanowitsch & Gries 2003), and is meant to explain the differential generation of novel arguments in familiar slots.

A second issue to be addressed in this book, especially in Chapter 6, is how this kind of knowledge may be acquired and stored in a model of grammar. Here I make a first attempt, which is by no means complete, to integrate the empirical findings into current theories of usage acquisition based on properties of the input distribution. Within the framework of construction grammar (Fillmore 1985, 1988; Goldberg 1995, 2006a; Croft 2001 and many others), recent work (Casenhiser & Goldberg 2005; Goldberg 2006b, 2009; Boyd & Goldberg 2009; and for second language acquisition Ellis & Ferreira-Junior 2009; Wulff et al. 2009) has shown how constructions with certain skewed distributional properties are more readily acquired by both children and adults. In particular, constructions with a few very frequent types and many infrequent ones (a typical Zipf or LNRE distribution, see Baayen 2001; Evert 2004) are acquired more quickly and can be extended more easily by speakers. In line with these findings, this book aims to show that productive constructions exhibit precisely these properties and are therefore acquired for productive use by speakers, who then reinforce their extensibility by producing similar distributions in subsequent usage.

It is my hope that this book will contribute to the explanation of the mechanisms of productivity in the syntactic domain, which open up new questions about usage-based views of grammar.

Berlin, September 2012
Amir Zeldes

Acknowledgments

This book could not have been written without the help of very many people. First and foremost my gratitude goes out to Anke Lüdeling and Stefan Gries, who supervised my doctoral thesis on which this book is based. Anke has not only taught me how to use and understand corpora but has also guided, helped and encouraged me to publish my work at every turn. Stefan has been my role model in statistical work and, in spite of the distance between continents and time zones, was always there to answer my questions and provide detailed feedback.

Special thanks are due to the many people who agreed to comment on various parts of the work described here at different stages, in particular: Peter Adolphs, Jasmine Bennöhr, Hans C. Boas, Peter Bosch, Stefan Evert, Livio Gaeta, Hagen Hirschmann, Marc Reznicek, Sören Schalowski and Stefanie Wulff. I thank Felix Golcher for help on some of the more complicated statistical issues that developed during research and for helping me get R to do what I wanted it to. Needless to say, any errors or inaccuracies still found in the final version are entirely my own responsibility.

I would also like to express my appreciation for the authors whose work this volume further develops and owes a great debt to: to Harald Baayen for developing the morphological productivity paradigm that has fascinated me for the past few years; to Stefan Evert and Marco Baroni, who introduced me to Zipf-Mandelbrot models, for implementing the software to process word frequency distributions in R; and again to Marco and his colleagues for making the WaCky corpora available. I am also thankful to the many authors working on productivity in particular and on the acquisition and modeling of usage-based grammar in general whose work I am building upon and which cannot all be mentioned here, but the majority of which is cited in the course of the text.

My further gratitude is due to the helpful staff at De Gruyter for answering questions and advising me on formal aspects of the publication. I especially wish to thank series editor Volker Gast for many valuable comments and corrections to the manuscript.

Finally I would like to thank my family for their love and support and for teaching me about language and science from early on. And most importantly, I want to thank Nina and Lia, to whom I dedicate this work, for making life wonderful.

Contents

List of abbreviations and symbols .... xiv
List of tables .... xvi
List of figures .... xviii

1 Introduction .... 1
  1. The problem in a nutshell: How do speakers know how productive each slot is? .... 1
  2. Preliminary remarks on usage-based theories .... 5
  3. Argument structure, argument selection and adjuncts .... 12
  4. Requirements for a theory of syntactic productivity .... 14
  5. Chapters in this book .... 15

2 (Re-)defining productivity: From morphology to syntax .... 17
  1. General definitions in previous work .... 17
  2. What productivity applies to: Morphology versus syntax .... 21
  3. Granularity and grades of productivity .... 26
  4. Criteria for productivity .... 33
  5. Productivity versus creativity .... 39
  6. Roadmap: Towards a productivity complex .... 45

3 Morphological productivity measures .... 48
  1. Methodological remarks on testing productivity measures .... 48
  2. Using type counts: V .... 49
  3. Token counts and in-category vocabulary: N(C), f(C) and VC .... 57
  4. Using hapax legomena: Baayen’s 𝒫* and 𝒫 .... 60
  5. Vocabulary growth, frequency spectrums, 𝒜 and θ .... 68
  6. Estimating total vocabulary: Zipf’s Law, LNRE models and S .... 76
  7. Measuring global productivity: I, ℐ and P* .... 85
  8. Summary: Measuring morphological productivity .... 92

4 Adapting measures to the syntactic domain .... 96
  1. Methodological remarks on using corpus data .... 96
  2. Types and type counts in syntax .... 98
  3. Argument selection in competing constructions: Prepositional and postpositional wegen in German .... 106
  4. Different heads, different measures: Ranking productivity for direct object selection in English transitive verbs .... 114
  5. Productivity in multiple slots: The case of comparative correlatives .... 125
  6. Interim conclusion: Measuring productivity for syntactic argument slots .... 135

5 Lexical semantics and world knowledge .... 138
  1. Semantic approaches to argument selection .... 138
  2. Can lexical semantics and world knowledge explain novel argument selection? .... 147
  3. Argument selection in (near) synonymous heads and constructions .... 150
  4. Semantic and selectional effects in derivations from the same stem .... 166
  5. Semantic-pragmatic motivation and syntactic alternations .... 172
  6. World knowledge and argument selection in translational equivalents .... 180
  7. Interim conclusion: Towards a usage-based account of novel argument selection .... 187

6 Representation within a usage-based productivity grammar .... 190
  1. Productivity as knowledge and the innocent speaker .... 191
  2. A formalization of the Productivity Complex .... 193
  3. Explicitly modeling entrenchment and productivity .... 196
  4. Why do skewed distributions lead to productivity? A Hebbian cognitive account of argument categorization .... 201
  5. Lexical choice and the structure of the mental lexicon .... 210
  6. Relation types in the mental lexicon .... 219
  7. Interim conclusion: Outline of rules in a productivity grammar .... 226

7 Conclusion .... 230
  1. Main results of this study .... 230
  2. What models of grammar are compatible with these results? .... 234
  3. Outlook .... 237

Appendices .... 244
  A Queries .... 244
  B Linear regression model with quadratic term for -sam/-bar .... 249
References .... 251
Author index .... 273
Subject index .... 277

List of abbreviations and symbols

#            Infelicitous or inappropriate, but grammatical utterance
?            An utterance of questionable acceptability
*            A clearly ungrammatical utterance
𝒜            Measure of activation or parsability
a-structure  Argument structure
ACC          Accusative
BCC          Bare comparative correlative (the Xer the Yer)
CC           Comparative correlative (the Xer … the Yer …)
CNCN         Comparative correlative with NP (the Xer NP the Yer NP)
CNVCNV       Comparative correlative with NP+VP (the Xer NP VP the Yer NP VP)
CONJ         Conjunction
CLASS        Classifier (for Japanese numeral classifiers)
CxG          Construction grammar
DAT          Dative
EXCL         Exclamative particle (e.g. Japanese yo)
fZM          Finite Zipf-Mandelbrot model
f(C)         Normalized frequency of tokens from a category C
GEN          Genitive
HL           Hapax legomena, types occurring only once in a corpus
I            Aronoff’s index of global productivity, the ratio V/S
ℐ            Baayen’s index of global productivity, the ratio S/V
IA           Item and arrangement
IP           Item and process
N(C)         Token count from category C
NEG          Negation
NOM          Nominative
𝒫            Potential productivity (proportion of HL in a sample)
P*           Two-dimensional measure of global productivity based on 𝒫 and V
𝒫*           Measure of productivity based on the proportion of HL from a category within all HL in a sample
PC           Productivity Complex
PL           Plural
pos.         Position within a corpus (running token number)
POT          Potential form (of Japanese verbs)
PTC          Particle
REFL         Reflexive pronoun
S            Total vocabulary size
SC           Synthetic compound
SPC          Frequency spectrum, type count Vm for each value of m
ZM           Zipf-Mandelbrot model
V            Vocabulary size, type count
V1           Vocabulary size of items with a frequency of 1
Vm           Vocabulary size of items with a frequency of m
VO           Verb-object pair
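To illustrate how the count-based symbols above relate to one another, the following R sketch computes them for an invented mini-sample (the data are made up for this illustration and do not come from the corpora used in this book):

    # invented mini-sample: adjective tokens in -bar and -sam from a pretend corpus
    bar_tokens <- c("essbar", "machbar", "essbar", "denkbar", "lesbar", "machbar")
    sam_tokens <- c("langsam", "langsam", "gemeinsam", "langsam")

    N_C <- length(bar_tokens)            # N(C): token count for the category -bar (6)
    V   <- length(unique(bar_tokens))    # V: vocabulary size, i.e. type count (4)
    V1  <- sum(table(bar_tokens) == 1)   # V1: hapax legomena within -bar (2)
    P   <- V1 / N_C                      # 𝒫: potential productivity, proportion of HL

    # 𝒫*: share of -bar hapaxes among all hapaxes in the combined sample
    all_hl <- sum(table(c(bar_tokens, sam_tokens)) == 1)
    P_star <- V1 / all_hl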

List of tables

Table 1. Examples of entries in the constructional lexicon or ‘constructicon’. .... 9
Table 2. Previous approaches to the application domain of productivity. .... 21
Table 3. Survey of approaches to grades of morphological productivity described in Bauer (2001: 15–20). .... 27
Table 4. Ranking of English verbs with certain sets of objects according to intuitive set size. .... 34
Table 5. Type counts for -sam and -bar in each of the 10 subcorpora. .... 53
Table 6. Type counts for words beginning with Q- and T- in each of the 10 subcorpora with total sums and mean values. .... 56
Table 7. Type counts for -sam and -bar in 10 equally sized samples from each process. .... 58
Table 8. Token counts for -sam and -bar in 10 equal sized subcorpora of some 1.5 million tokens each. .... 59
Table 9. Hapax legomena, 𝒫* and 𝒫 for -bar and -sam. .... 65
Table 10. Parameters and goodness-of-fit for the fitted fZMs of -sam and -bar. .... 84
Table 11. Summary of productivity measures and representation techniques and their dependence on the sample size N. .... 93
Table 12. Productivity measures for prepositional and postpositional German wegen ‘because’. .... 109
Table 13. Productivity measures for wegen ‘because’ governing dative and genitive, pre- and postpositionally. .... 112
Table 14. Error analysis for automatic extraction of 1000 objects per verb and manual correction. .... 116
Table 15. Productivity measures for the different verbs. .... 123
Table 16. Productivity rankings for verbs according to different criteria. .... 123
Table 17. Observed and expected 𝒫 values for two-slot constructions under an assumption of hapax probability independence. .... 131
Table 18. Comparative frequencies in CC slots and outside of them. .... 133
Table 19. Near synonym sets and their expected shared argument classes. .... 158
Table 20. Top 5 objects for the hate set. .... 160
Table 21. Top 5 objects for the understand set. .... 161
Table 22. Top 5 objects for the start set. .... 162
Table 23. Top 5 modified adjectives for the intensifier set. .... 164
Table 24. Rankings for sets of near synonyms according to the PC dimensions. .... 165
Table 25. Examples of verbal phrases and corresponding SCs extracted from deWaC. .... 168
Table 26. V1 for some SC heads along with the subset of items attested as VO and the proportion of VO / SC attestation. .... 172
Table 27. Frequencies and odds ratio for the most common verbal complements of help in either construction. .... 177
Table 28. Productivity measures for harbor and its German translational equivalent hegen. .... 184
Table 29. Entries in the mental lexicon together with entrenchment values. .... 197
Table 30. Adding estimated VGC data to the constructicon explicitly using fZM parameters. .... 198
Table 31. Coefficients for a least squares regression model with a quadratic term fitted to the frequency distributions of -sam and -bar. .... 249

List of figures

Figure 1. Schematicity, productivity and entrenchment in Barðdal’s (2008: 48) adaptation of Clausner and Croft (1997: 271). .... 36
Figure 2. Barðdal’s (2008: 50) approach to hierarchical levels of schematicity as a correlate of productivity. .... 37
Figure 3. Boxplots of V for -bar and -sam. .... 55
Figure 4. Boxplots for the variance of 𝒫* and 𝒫 for -sam and -bar. .... 64
Figure 5. Vocabulary growth curves for -bar and -sam. .... 69
Figure 6. VGCs with standard deviations and 95% confidence intervals for 10 equal sized samples of -bar and -sam. .... 70
Figure 7. Development of V and V1 for -bar and -sam. .... 71
Figure 8. Frequency spectrums for -bar and -sam adjectives in the c’t corpus. .... 72
Figure 9. Frequency spectrum for personal pronouns in the c’t corpus. .... 73
Figure 10. Frequency spectrums and VGCs for words in T- and Q- in the c’t corpus. .... 75
Figure 11. Ranked frequencies for word forms in c’t and corresponding expected Zipf distributions. .... 77
Figure 12. Ranked frequencies for -bar adjectives in c’t. .... 79
Figure 13. Comparison of the frequency distributions for -bar and -sam adjectives in the c’t corpus. .... 81
Figure 14. VGCs for -bar and -sam based on observed data and a ZM interpolation. .... 82
Figure 15. Empirical VGCs for -bar and -sam and fZM curves, extrapolated for -sam to an equal size. .... 83
Figure 16. Global productivity (P*) for some English affixes from Baayen and Lieber (1991) and for -bar, -sam and -lich according to the same scheme. .... 87
Figure 17. P* diagram for -bar, -sam and -lich at the maximal equal sample size. .... 88
Figure 18. 3-dimensional representation of the development of P* as a function of N(C) for -bar, -sam and -lich. .... 89
Figure 19. VGCs and fZM extrapolation for prepositional and postpositional wegen. .... 109
Figure 20. Distribution of V and V1 across 10 equal sized samples of arguments for prepositional wegen. .... 110
Figure 21. VGCs and fZM extrapolations for prepositional genitive, dative and postpositional genitive wegen with masculine singular arguments. .... 113
Figure 22. P* diagram for eight verbs at N(C)=1000. .... 118
Figure 23. Barplots for N(C), V and V1 of direct objects for selected verbs in ukWaC. .... 119
Figure 24. VGCs for objects of different verbs and fZM extrapolations. .... 120
Figure 25. Dynamic P* diagrams for multiple verbs at 1000 token intervals. .... 122
Figure 26. VGCs for symmetric bare, NP and NP+VP comparative correlatives using a ‘naive’ type definition based on slot concatenation. .... 128
Figure 27. VGCs for symmetric bare, NP and NP+VP comparative correlatives using a non-permutational novel type definition based on innovation in at least one slot. .... 129
Figure 28. VGCs for each comparative slot in symmetric bare, NP and NP+VP comparative correlatives, as well as non-CC comparatives. .... 131
Figure 29. VGCs for drink and spill with and without the object beans. .... 154
Figure 30. VGCs for the near synonyms in Table 19. .... 159
Figure 31. Log-log plot of V for verbal lexemes in well-attested verb-object and synthetic compound constructions. .... 169
Figure 32. Log-log plot of N for well-attested lexeme pairs as SCs and VO pairs. .... 171
Figure 33. VGCs for bare and to-infinitive complements of help. .... 176
Figure 34. VGCs for gerund and to-infinitive complements of start. .... 180
Figure 35. VGCs for the comparatives in the apodosis of German and English bare comparative correlatives. .... 186
Figure 36. S, V and 𝒫 as properties of the vocabulary growth function. .... 194
Figure 37. Representation of specific plants and plants in general using overlapping Hebbian assemblies. .... 203
Figure 38. The semantic class [+liquid] as an assembly of subnetworks. .... 206
Figure 39. A less productive network for the 2nd comparative in a bare comparative correlative. .... 208
Figure 40. Lexical choice for a correlation of risk and profit. .... 211
Figure 41. Interlocking representation of bare and NP-VP CC apodosis adjective slots within the sphere of the comparative construction. .... 220
Figure 42. The CC apodosis with syntagmatic relations between constructional constituents. .... 222
Figure 43. Syntagmatic, paradigmatic and associative links on the path from conceptualization to the processing of a productive utterance. .... 224
Figure 44. Linking lexical usage information to an HPSG feature structure. .... 236

Chapter 1
Introduction

1. The problem in a nutshell: How do speakers know how productive each slot is?

This book aims to answer a series of related questions, which can be summed up as follows: if all syntactic rules can generate an infinity of utterances from a limited vocabulary of morphemes, words or constructions, why is it that we are inclined to use some rules more often (but not exclusively) with familiar material, which is presumably already stored in memory, while other syntactic patterns tend to fill out their empty slots with items the speaker has never before used or heard in that position? Do speakers know when they are ‘allowed’ to make use of the infinitely productive facility of syntax and when they ‘should’ repeat familiar forms? And if they do have such knowledge, how do they acquire it? These questions will be addressed here by focusing on the empirical behavior of lexical argument selection, a central domain for the definition of syntactic constructions.

In order to reach satisfactory answers, we must first address the difficult problem of the place of productivity in syntactic theory. Most linguistic theories to date, and most notably generative grammar, have simply regarded all phrasal structures that are not lexicalized as part of a fully productive generative apparatus, where one rule is as liable to produce novel utterances as the next, the lexical material being chosen independently by nonsyntactic modules of the language faculty or extralinguistic semantic needs.¹ Indeed, one of the defining characteristics of human language in essentially every theoretical account is its capacity to generate an unbounded range of novel forms never seen or heard before (especially as contrasted with forms of animal communication, see e.g. Bickerton 1992: 8, among many others).

1. This is not to say that generative approaches are not aware of selectional preferences, such as collocations and other types of multi-word units. Generative grammar is at its core simply not production-oriented in that, unlike usage-based approaches, it does not have the pretense of producing naturally distributed performance output: only a competence-level description of all possible structures is aimed at.


Put another way in the famous words of Wilhelm von Humboldt, language is said to “make infinite use of finite means”² (von Humboldt 1963 [1836]: 477), an idea which was also referred to early on by Chomsky (e.g. 2009 [1966]: 71–72) and which remains central in more recent generative approaches as well (Chomsky 1995: 14). Whether or not a syntactic construction embeds novel material or simply repeats already familiar lexemes in its open positions is not viewed as important, especially because in many approaches the unit of syntactic analysis has generally been the sentence, and “most of the ‘normal sentences’ of daily life are uttered for the first time in the experience of the speaker-hearer” (Chomsky 1966: 35). Large electronic corpora, which were not available at the time the latter statement was made, can now show that it is not true for many types of texts and sentences (Erman and Warren 2000 estimate that on average more than half the words in a text are part of a prefabricated unit or ‘prefab’; cf. also the criticism in Manning 2003). However, even for rather complex text types or registers with very little wholesale repetitiveness, forming sentences is not a monolithic process in any syntactic theory, and the process of selecting one or even multiple lexemes, e.g. by a head filling its argument structure (or a-structure for short, see Section 3), is quite likely to repeat familiar material in some cases and innovate in others. Thus if it can be shown that some heads are more selective in preferring familiar arguments, a linguistic theory that claims to explain language usage would also have to explain how these tendencies develop and how they are stored, since unlike the case of lexicalization or the emergence of collocations, behavior with novel forms cannot be explained by assigning specific combinations to the mental lexicon.

2. The original in context reads: “Denn [die Sprache] steht ganz eigentlich einem unendlichen und wahrhaft grenzenlosen Gebiete, dem Inbegriff alles Denkbaren gegenüber. Sie muss daher von endlichen Mitteln einen unendlichen Gebrauch machen, und vermag dies durch die Identität der Gedanken- und Spracheerzeugenden Kraft.” In Heath’s English translation (von Humboldt 1988: 91): “For [language] is quite peculiarly confronted by an unending and truly boundless domain, the totality of all that can be thought. It must therefore make infinite use of finite means, and is able to do so in virtue of the identity of the force that engenders both thought and language.”

To illustrate this point more clearly, one can consider the different usage of near synonyms in novel and lexicalized phrases. For example, why is it that speakers of English use a wide variety of direct object arguments for a verb such as shake, but restrict the similar transitive verb jog overwhelmingly (but not entirely) to the lexicalized object phrase jog (one’s) memory? How do speakers know to prefer the verb shake over the verb jog with novel arguments, despite the fact that most of these are semantically compatible with both verbs,³ and both verbs can exhibit novel objects? For example, the object elbow in example (1) occurs only once even in a fairly large corpus like the BNC⁴ and is probably not lexicalized or previously stored in memory.

3. This in turn raises many further questions about which and what kind of semantic classes or roles one should assume, a topic which I will discuss in depth in Chapter 5 in evaluating to what extent lexical semantics can account for such phenomena.
4. The British National Corpus, approx. 100 million words of balanced British English from the 1990s, see http://www.natcorp.ox.ac.uk/.

(1) A hand jogged his elbow and he turned. [BNC, doc. FS8]

But if we take the first ten examples of this verb in the BNC, we will find the idiomatic memory eight times, while the first ten examples of shake contain the frequent object hand(s) only four times besides six other, seemingly non-lexicalized objects (e.g. rags). From this data alone it is difficult to determine whether speakers are actually disinclined to select one verb when the argument they have in mind is unfamiliar in that position, or whether the precise meaning of the one word happens to fit their purpose in a particular case better than the other. One could also argue that such idiomatic cases as jog (one’s) memory should be removed from the discussion entirely as representing another sense, and therefore another verb, but this raises several problems, including the identification of lexicalized cases in the first place and degrees of lexicalization as well (see Chapter 5, Section 3 in detail). It is therefore important to establish a rigorous methodology and find out whether there are systematic preferences that cannot be reduced to differences in meaning. In the following we will see that differences in productivity are pervasive in every manner of construction, including the following examples which will be discussed in detail:

– Objects of unrelated verbs, such as incur (very unproductive) versus eat (very productive, see Chapter 4, Section 4), but also of very nearly synonymous verbs like start, begin and commence (Chapter 5, Section 3).
– Objects of competing synonymous adpositions, such as pre- and postpositional wegen ‘because’ in German (Chapter 4, Section 3).

– Deverbal derivations which preserve argument structure, as in collect X versus X collector (Chapter 5, Section 4).
– The same constructions embedded in different syntactic environments, such as the comparative adjectives in variants of the comparative correlative construction (e.g. the faster the better, Chapter 4, Section 5 and Chapter 5, Section 6).
– Syntactic variants or alternations such as help (to/Ø) [VERB] or start ([VERB]ing/to [VERB]) (Chapter 5, Section 5).
– Translational pairs expressing the same or similar concepts, e.g. harbor [an emotion or mental state] and its German equivalent hegen (Chapter 5, Section 6).
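By way of preview, the measures that Chapters 3 and 4 apply to such slots can be computed with the zipfR package of Evert and Baroni (see the acknowledgments); the sketch below is illustrative only, and uses zipfR’s bundled frequency spectrum of Italian ri- derivatives as a stand-in for the argument-slot data discussed in this book:

    # productivity measures of the kind adapted in Chapter 4, using zipfR
    library(zipfR)

    data(ItaRi.spc)                   # bundled frequency spectrum (Italian ri-)
    N(ItaRi.spc)                      # N(C): sample size in tokens
    V(ItaRi.spc)                      # V: observed vocabulary size (types)
    Vm(ItaRi.spc, 1)                  # V1: hapax legomena
    Vm(ItaRi.spc, 1) / N(ItaRi.spc)   # Baayen's potential productivity for the slot

    # fit a finite Zipf-Mandelbrot (fZM) LNRE model and extrapolate
    fzm <- lnre("fzm", ItaRi.spc)
    EV(fzm, 2 * N(ItaRi.spc))         # expected V if the sample size were doubled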

Although part of the explanation for the imbalances in preferences between competing constructions will inevitably involve semantic subtleties and world knowledge, I will aim to show that a substantial part of these imbalances must be seen as idiosyncratic, specified by a language-specific grammar and firmly a part of the syntactic selectional system itself. Teasing apart the semantic, pragmatic and grammatical aspects of these differences is one of the major challenges for the work at hand.

Beyond simply showing that differences in productivity permeate the syntactic-lexical interface in argument selection, I will also be concerned with the representation of information necessary for these properties of selectional processes in general to be represented in grammar, and especially in cases where lexical semantics cannot fully account for the range of selections in practice. The search for such a representation is informed by previous work admitting gradual distinctions in productivity as part of speakers’ knowledge of their language. For the most part such approaches have grown in the area of morphological productivity, roughly the relative extent of the theoretically inexhaustible ability to form new words in certain patterns (see Baayen 2001, 2009; Bauer 2001; Booij 1977; van Marle 1985; Plag 1999, 2003; Schultink 1961 to name a few; see Chapter 2 in detail), though a corresponding notion of syntactic productivity has emerged in some recent studies (most notably Barðdal 2006, 2008; Kiss 2007). Much of the first half of this book is dedicated to the adaptation of methods from the well-developed literature on morphological productivity to the domain of syntactic argument selection, and the exploration of the theoretical consequences within a usage-based model of grammar.

The remainder of this chapter is structured as follows:


The next section introduces the foundations of usage-based grammar models, in particular within the framework of construction grammar, which my interpretation of the empirical data will build on. The following Section 3 broadly discusses the scope of the terms argument structure and argument selection, as well as the argument-adjunct distinction, as they apply to the analyses in this book. Section 4 gives a sketch of the goals for the endeavor of answering the questions raised so far, providing an overview of the requirements that a theory describing and explaining productivity in syntactic argument selection should satisfy. Section 5 then lays out the structure of the following chapters, which aim to fulfill these goals.

2. Preliminary remarks on usage-based theories

The issues raised in the previous section are related to what has been labeled ‘usage’ in the theoretical discussions of recent years, since they pertain to what speakers are likely to say in practice, and not only what they could or could not say in principle. As such, the interpretation of the facts presented in this study must be rooted in a usage-based framework; in particular I orient my terminology to the general framework of construction grammar (Goldberg 1995, 2006a; Kay and Fillmore 1999; Croft 2001 among others; I use the abbreviation CxG without singling out those authors who also use it), which may be seen as a family of related usage-based theories of grammar. Although there have been some attempts to classify construction grammar approaches (e.g. Fischer and Stefanowitsch 2006), there is still a fairly wide spectrum of more or less formal representations which refer to CxG as an underlying model or declaration of intent (more formal approaches are also said to be converging with HPSG, cf. Fischer and Stefanowitsch 2006: 3–4, and see in particular Boas and Sag 2011 for sign-based construction grammar or SBCG). The exact nature of the grammatical system that best fits the results of the present study will be discussed in Chapters 6 and 7, once the data has been presented and some fundamental concepts have been defined. The aim of this section is to give some of the foundational ideas common to all models of CxG as contrasted with other frameworks, so that these can already be referred to in the interim.

As stated above, the body of generative approaches to grammar has aimed to determine the set of rules which account for the grammatical utterances in a language through different combinations acting on an inventory of lexical items and syntactic categories. A traditional formulation of this view can be found in Chomsky’s early work, e.g.:

The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences. (Chomsky 1957: 13)

In this view, there is little room for the concept of productivity as anything other than the binary feature of whether a structure is part of the grammar of a language (and hence productive) or not. Though there has certainly been much debate in linguistics since this statement was made about the nature of grammar and what it should or shouldn’t contain, this notion of a grammar or ‘analysis’ of a language is still found in many contemporary introductory texts, as pointed out e.g. by Sampson (2007: 1–2).

Usage-based models of grammar, by contrast, claim that grammar must account not only for the structures that occur in language themselves, but also for the way they are used. Some authors go so far as to argue that among other facts of usage, the different frequency and likelihood of competing constructions in certain situations are expected to be predicted by the grammar (especially Manning 2003: 325), or else cast the task of grammar acquisition as predicting the choice of grammatical construction given the message to be conveyed (Goldberg 2006b: 46).⁵ The term ‘usage-based’ as it is used in the above context goes back to Langacker’s cognitive grammar:⁶

6.

This contrasts explicitly with statements such as Newmeyer’s (2003: 692), who stresses: “[n]o generative grammarian ever claimed that sentences generated by the grammar should be expected to reveal directly what language users are likely to say”. Note that taking Newmeyer’s view does not deny that frequency information is somehow represented in the brain: it is probably undisputed that priming studies can show this unequivocally. The issue is rather whether this information need be part of the grammatical description. In fact CxG approaches are themselves sometimes designated as cognitivelinguistic (e.g. Wulff 2008: 14).

Preliminary remarks on usage-based theories

7

speakers from an array of specific plural forms (toes, beads, walls, etc.), including some learned previously as fixed units; in fact the rule is viewed simply as a schematic characterization of such units. Speakers do not necessarily forget the forms they already know once the rule is extracted, nor does the rule preclude their learning additional forms as established units. [Boldface in the original - AZ] (Langacker 1987: 46)

In other words, usage-based grammar is conceived of as “nonreductionist and maximalist” (Behrens 2009: 385; cf. also Bybee 2010: 14–32): [I]t does not strive to reduce language to as abstract a rule system as possible, because particular (lexically specific) and abstract phenomena are the same in kind, namely symbolic form-function units. Parsimony of storage and representation is not the goal of the theory, nor the underlying assumption of how grammar works. Consequently, usage-based grammar is maximalist because it considers idiosyncratic phenomena, low-level schemas, as well as very productive schemas with general, rule-like properties. (Behrens 2009: 385)

These properties are important for the study of productivity, since the phenomenon is by definition idiosyncratic: if the range of items and the extensibility of a particular slot in a construction are somehow predictable (e.g. from its meaning) then there is no need to assume a separate notion of productivity. This notion only becomes interesting if we believe that there is something conventionalized about productivity, where one construction can be productive and another construction less productive or not productive at all without a clear extralinguistic reason. The maximalist nature of usage-based approaches is also important, since it lends itself to the idea that frequent constructions are remembered by speakers as whole units, which in turn get used more repetitively, constraining the range of lexical choice and reducing productivity. CxG approaches build on the usage-based paradigm and all share the notion that language is built up from ‘constructions’, pairings of form and meaning at all levels of grammatical representation. Earlier approaches restricted construction status to patterns whose meaning could not be completely predicted from that of their components, even if they are partly or largely transparent (notably Goldberg 1995: 4; see Fischer and Stefanowitsch 2006: 5–6 for discussion). This echoes older ideas such as

8

Introduction

Fillmore’s (1979) notion of the ‘innocent speaker’, who could not be expected to come to the correct compositional interpretation even for seemingly transparent constructions. For example, a completely innocent speaker with only a lexicon of morphemes and general composition rules would not be able to predict the different meanings of prisoner and jailer based on their constituent morphemes alone (see Chapter 6, Section 1 for further discussion). Most current CxG approaches assume that even completely regular complex units can achieve an independent construction status in the mental lexicon if they are frequent enough: Any linguistic pattern is recognized as a construction as long as some aspect of its form or function is not strictly predictable from its component parts or from other constructions recognized to exist. In addition, patterns are stored as constructions even if they are fully predictable as long as they occur with sufficient frequency. (Goldberg 2006a: 5)7

All constructions at all levels of abstraction are assumed to be stored together within a mental lexicon sometimes referred to in CxG as the ‘constructicon’, which contains abstract patterns, patterns with partly specified lexical material, and fully specified lexemes. The representation in Table 1 (adapted from Goldberg 2006a: 5) exemplifies these different types of constructions which are stored uniformly regardless of their complexity. Constructions are stored along with their semantics and information about their distribution, though the kind of semantics assumed to be represented in the mental lexicon covers a very wide spectrum of phenomena which go beyond traditional truth value-based formal semantics. Different approaches accept for example frame semantic meanings, pragmatic meanings, information structure, and more, as aspects which may be coded on the meaning side of a construction (see Fischer and Stefanowitsch 2006: 8–10). Common to most approaches is the assumption that quantitative distributional facts about the usage of a construction are also stored mentally, i.e. that speakers know in which linguistic and extralinguistic context an utterance may be appropriate, as well as how appropriate. This may form a further kind of stored meaning, especially for proponents of distributional approaches to semantics, which postulate that the meaning of a linguistic sign may be equated with knowledge of the environments in 7.

For earlier approaches viewing non-compositionality as immaterial cf. Langacker (1987: 409–411) and Clausner and Croft (1997: 252), among others.

Table 1. Examples of entries in the constructional lexicon or ‘constructicon’ (adapted from Goldberg 2006a).

unit                               examples
Morpheme                           e.g. pre-, -ing
Word                               e.g. avocado, anaconda, and
Complex word                       e.g. daredevil, shoo-in
Complex word (partially filled)    e.g. [N-s] (for regular plurals)
Idiom (filled)                     e.g. going great guns, give the Devil his due
Idiom (partially filled)           e.g. jog [someone's] memory, send [someone] to the cleaners
Comparative correlative            The Xer the Yer (e.g. the faster the better)
Ditransitive                       Subj V Obj1 Obj2 (e.g. she gave him a taco; he baked her a muffin)
Passive                            Subj aux VPpp (PPby) (e.g. the armadillo was hit by a car)

This claim, which is sometimes referred to as the Distributional Hypothesis,⁸ will interest us in the present discussion insofar as the productivity of a construction can be seen as part of the knowledge about its distribution. If speakers have implicit knowledge about productivity in their selection of constructions and their arguments, then this may be a further type of distributional knowledge, about compatibility with novel environments, which must also be stored somehow in the mental lexicon.

8. See Sahlgren (2008). The approach is generally attributed to the distributional semantics of Zellig Harris (1954, 1970: 785f), though John Firth’s (1957: 11) maxim “you shall know a word by the company it keeps” is often taken as a point of reference as well. The exact origin of the idea is difficult to pin down, since much of the conceptual substance of the hypothesis can already be found in an informal way in the earlier structuralists, as well as in Wittgenstein’s (2009: 41) “Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache” [The meaning of a word is its usage in language].

The mental lexicon is built hierarchically in all CxG approaches so that more complex constructions can refer to other constructions as constituents (e.g. a verbal argument structure requiring a nominal phrase, which is itself a construction). The maximalism of the usage-based approach ensures that both specific and abstract entries can be represented side by side, so that e.g. the ditransitive pattern can be stored next to frequent examples such as fully or partially filled instances of give and its arguments as in (2)–(6).

(2) give me a chance [40 times in the BNC]

(3) give me a [N]

(4) give me [NP]

(5) give [NP] [NP]

(6) [V] [NP] [NP]
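As a purely schematic illustration of this layering (regular expressions are of course only a crude stand-in for constituent structure, and the patterns below are invented for this sketch), the way a single utterance instantiates all of (2)–(6) at once can be mimicked in R:

    # toy rendering of (2)-(6), with open slots approximated by regular expressions
    constructions <- c(
      filled       = "^give me a chance$",   # (2) fully specified
      partial_N    = "^give me a \\w+$",     # (3) give me a [N]
      partial_NP   = "^give me .+$",         # (4) give me [NP]
      ditrans_give = "^give .+ .+$",         # (5) give [NP] [NP]
      ditrans      = "^\\w+ .+ .+$"          # (6) [V] [NP] [NP]
    )
    utterance <- "give me a chance"
    sapply(constructions, grepl, x = utterance)  # TRUE at every level of abstraction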

This property too is important for the question of productivity in the syntactic context, since constructions offer a convenient locus or point of reference for the definition of candidates for more or less productive argument selection processes: we could speak about the productivity of the lexical selection of material for the THEME slot in indefinite NPs (the construction in (3)), about any such NP (as in (4)) or even of both slots together (as in (5)). The last pattern in (6) shows that at the most abstract level the verb itself can be selected for use in the ditransitive construction, including verbs not generally lexicalized as ditransitive, e.g. Sally baked her sister a cake (Goldberg 1995: 141). Conversely, every utterance performed by a speaker is considered to instantiate the entire spectrum of matching constructions (see Goldberg 2006a: 10; this can also be applied to fully lexicalized idioms which instantiate their regular components, cf. Wulff 2008: 17–18), so that uttering (2) is simultaneously tantamount to uttering all of (2)–(6), in the sense that these constructions are instantiated or activated in the mind of the speaker.⁹ Whether or not these particular candidates merit construction status, or which ones are interesting for the study of productivity, are of course difficult questions deserving separate discussion. But if we assume that a theory of grammar can inform us about the constructions in a language, which is a goal of CxG, then the assumption of a corresponding mental lexicon of constructions is a useful theoretical tool to attach the property of productivity to (see Chapter 6 for discussion).

9. This is also evidenced by the fact that both lexemes and syntactic constructions create priming effects, which may also interact (see Gries 2005). Some approaches prefer to speak of more specific constructions inheriting the properties of more schematic ones, with similar consequences, cf. Michaelis and Lambrecht (1996).

A further common denominator of CxG approaches is the cognitively founded notion of degrees of ‘entrenchment’, that is the extent to which a construction is lexicalized or its weight in the mental lexicon. More entrenched constructions have a higher token¹⁰ frequency (in both production and perception, see Langacker 1987: 59–60; Goldberg 2006a: 93, 2006b), though it is often suggested that salience may also interact with entrenchment (cf. Schmid 2007). Entrenchment is of particular interest for the study of productivity since entrenched constructions are said to be retrieved more easily from memory and lead to repetition. They may also have a tendency to pre-empt variation by constraining productivity, since one is reluctant to modify a familiar form (cf. Goldberg 2006a: 94–98), and this applies to language acquisition as well (e.g. children are more likely to innovate and misuse vanish transitively than its more frequent, and presumably more entrenched, synonym disappear, see Brooks et al. 1999).

Finally, usage-based approaches are often probabilistic and lend themselves to views of graded grammaticality. For many writers (notably Manning 2003; Sampson 2007), the question of what structures are possible in a language is really just tantamount to the implicit formulation of a cutoff point on very unlikely constructions, so that what was once viewed as ungrammatical is recast as highly unlikely. By taking their usage into account, it becomes possible to predict probabilities for attested constructions in context. The importance of this goal of at least some usage-based grammars for the study of productivity is that assessing the productivity of a construction may offer insights into the likelihood of unattested constructions as well, rather than uniformly treating it as an indistinct, very low probability.

With these properties of the theoretical framework in mind, we may now turn to discuss what phenomena should be treated under the heading of argument selection.

10. ‘Tokens’ in this context are specific cases of attestation, and ‘types’ are unique classes of those tokens. For example, the sentence “The blackness of darkness is darker than the blackness of coal” can be said to contain three particular tokens of words ending in -ness, which belong to two distinct types: blackness and darkness. Therefore nouns in -ness have a token frequency of 3 and a type frequency of 2 in that sentence.
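The counts in footnote 10 are easy to operationalize. The following minimal sketch (Python, with a deliberately naive tokenizer; an illustration only, not the extraction procedure used for the studies in this book) reproduces the type and token frequencies for -ness in the example sentence:

```python
import re
from collections import Counter

sentence = "The blackness of darkness is darker than the blackness of coal"

# Naive tokenization: lowercased alphabetic strings (toy example only)
tokens = re.findall(r"[a-z]+", sentence.lower())

# Restrict attention to the category in question: nouns in -ness
ness_tokens = [t for t in tokens if t.endswith("ness")]

token_frequency = len(ness_tokens)       # 3: blackness, darkness, blackness
type_frequency = len(set(ness_tokens))   # 2: blackness, darkness

print(Counter(ness_tokens))              # Counter({'blackness': 2, 'darkness': 1})
print(token_frequency, type_frequency)   # 3 2
```

In-category token counts (N(C)) and type counts (V) of this kind form the basis of the measures discussed in Chapter 3.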

3. Argument structure, argument selection and adjuncts

For the majority of this work, the notion of argument structure (a-structure, see Grimshaw 1990) will be taken as a given, with the exact nature and identity of arguments for a particular case being supplied by some existing grammar. Arguments are understood to be constructions in the sense discussed above, which fill empty slots in hierarchically more complex constructions as lexically unspecified constituents thereof. The a-structure of a construction therefore consists of the lexical representation of the empty slots of that construction.

It is generally assumed that a lexical entry for an a-structure must subsume at least two kinds of information. On the one hand, the structural or syntactic aspect of the a-structure must be stored, i.e. the syntactic categories required by an open ‘slot’ in a construction and their configuration. On the other hand, the semantic aspect must be represented, which assigns meaning to the argument with regard to the construction (this is no different from the traditional generative approach to a-structure, which refers to a syntactic d-structure and a lexical semantic structure, cf. Grimshaw 1990: 1–6). In a CxG approach, the semantic aspect may also include a semantic contribution from an entire a-structure, such as the abstract ditransitive pattern, which may have a meaning above and beyond the compositional meaning of the lexical constituents filling its slots, such as the verb bake or object NPs like her and a cake (Goldberg 1995). The exact nature of the semantics of the arguments themselves is left underspecified, and moves between coarsely defined thematic roles such as e.g. AGENT11 for the subject of drink (see Dowty 1991) and more fine grained semantic classes such as [+liquid] for its direct object (see Jackendoff 1990). The granularity assumed for the semantics of an argument slot will play a significant role in the discussion of the range of all possible arguments for the productive use of that slot and will be discussed in detail in Chapter 5.

The term ‘argument structure’, also referred to as a subcategorization frame or valency,12 will be used to refer not only to the number and syntactic/semantic types of the objects of verbs, but more generally, to any construction specifying empty slots. Thus we find a-structures for verbs requiring NPs, PPs etc., but also for the nominal arguments of adpositions, verbal complements such as infinitives and gerunds, or completely idiosyncratic constructions such as comparative correlatives, which require at the very least two comparative adjectives as arguments (Xer and Yer in the Xer the Yer above). Knowledge about a-structure is accordingly implied by the structure of the constructicon directly.

11. I will use small capitals to denote thematic roles throughout.

12. There are, of course, subtle differences in the intended meaning of these terms for different authors, but for the present purpose they may be used interchangeably.

Argument selection is the process by which the empty slots of a construction are filled, resulting in an argument realization. Depending on the definition of the construction, this can mean specifying a single lexeme to fill a single slot, multiple lexemes for multiple slots, or specifying a lexeme together with its modifiers (e.g. a construction may specify that an object NP must be definite or indefinite, in which case a specific range of determiners becomes possible, one of which must be realized). In most cases which will interest us, a lexical head will be specified which directly requires only argument heads. Arguments or modifiers of those arguments can be specified hierarchically by the construction occupying the initial argument slot (see Chapter 4, Section 2 in detail). In one particular case I will compare syntactic argument selection in VPs to argument selection for deverbal nominalizations in synthetic compounds (Chapter 5, Section 4), but otherwise the term argument selection will be used to refer to arguments from the word level and upwards.

Finally, the dichotomic distinction between arguments and adjuncts, often called into question in the past as ultimately undecidable (or at least for a sufficiently problematic number of open cases, see esp. Vater 1978; Somers 1984; Jacobs 1994; Przepiórkowski 1999), will play a very small role in the present work. Since modification via adjuncts must also be mapped onto constructions in a constructionist approach (e.g. an attributive adjective construction, a PP modifier construction, etc. all assume filling some open slots in a predetermined configuration), both modification and argument filling will be approached using the same mechanisms. It is equally possible to speak of productivity in argument selection and in adjunct selection, and the kind of distinction one may wish to draw between the two is orthogonal to the approach I will be taking. Nevertheless, the focus in this work will be on the filling of slots traditionally viewed as arguments, such as verbal or prepositional objects, with few exceptions, such as a study of adjunct intensifiers like very, extremely, highly, and others in Section 3 of Chapter 5.
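The two kinds of information posited above for an a-structure entry can be pictured concretely as a record pairing syntactic and semantic constraints for each open slot. The following mock-up (the field and class names are hypothetical illustrations, not a formalism adopted in this book) uses the drink example discussed above:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Slot:
    category: str                    # structural/syntactic aspect, e.g. "NP"
    role: Optional[str] = None       # thematic role, e.g. "AGENT"
    sem_class: Optional[str] = None  # finer-grained semantic class, e.g. "[+liquid]"

@dataclass
class Construction:
    name: str
    slots: List[Slot] = field(default_factory=list)

# A toy a-structure for transitive 'drink', following the text's example
drink = Construction(
    name="drink-transitive",
    slots=[
        Slot(category="NP", role="AGENT"),
        Slot(category="NP", role="THEME", sem_class="[+liquid]"),
    ],
)
print(drink)
```

How coarse or fine the sem_class values should be is precisely the granularity question taken up in Chapter 5.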

4. Requirements for a theory of syntactic productivity

For a theory of productivity in syntactic argument selection to be valid and useful, it must make valuable predictions as well as adequately describe and satisfactorily explain the phenomena discussed above, while at the same time remaining falsifiable. Such a theory must first of all demonstrate that productivity is indeed a measurable feature of syntactic structures in a reproducible, reliable way, and that one can generalize from the findings of a particular study onto the properties of the relevant syntactic class. This means first and foremost that operationalizable criteria must be developed to measure syntactic productivity in much the same way as such measures have been developed for morphological productivity (see Chapter 3). However, this can only be achieved if a theoretical understanding is developed of what it is we wish to measure, or more generally, what we mean by ‘productivity’. Any measures suggested must therefore take account of both qualitative criteria (when, for a particular case, is something productive?) and quantitative strategies to estimate how productive different cases might be. We may then study what the probability of productive behavior is for unseen cases, based on empirical data. This type of procedure will also contribute to a possible falsification of any results, since data can be gathered to show estimates are incorrect or that samples behave inconsistently, casting doubt upon the validity of either the measure being used or the underlying phenomenon being assumed. Finally, a theory of productivity for syntactic constructions should be integrated into a more general grammatical framework to explain how productivity interacts with the system constraining speakers’ grammatical choices, and how speakers acquire more or less productive constructions based on input. This also implies that the theory must be language independent, applying across typologically diverse languages by referring to non-language-specific notions such as frequency, lexicalization, innovation, semantic classes, etc. Although I will be taking an explicitly CxG oriented approach to the language-independent integration of productivity in grammar, an outline will be given in Chapter 7 (Section 2) of what this integration could mean for other formalisms and what kind of usage-based theories about grammar are compatible with the data presented here. The final section of this chapter lays out the structure for the remainder of the discussion, which is meant to fulfill the above expectations.

5. Chapters in this book

The remaining chapters of this book are structured as follows. Chapter 2 gives an overview of the literature on productivity to date. Since the focus in the productivity paradigm has been on morphological word formation, for which very many good sources exist, the discussion primarily aims to summarize these sources and then redefine the pertinent concepts from previous work in the context of their applicability to the field of syntactic argument selection. Chapter 3 presents concrete empirical productivity measures based mainly on the work of Harald Baayen and colleagues in morphology, with examples from word formation to illustrate the behavior of each measure with actual data. Productivity is established as a multidimensional complex of phenomena, each aspect of which can be captured by a certain measure. Chapter 4 applies a selection of the measures presented in the previous chapter to English and German a-structures, including verbal objects, adpositional phrases and, as an example of a family of multi-slot constructions, the choice of comparative adjectives in comparative correlatives.

With the applicability of the productivity measures conceived for morphological word formation established in the domain of syntactic argument structure, Chapter 5 considers lexical-semantic and world knowledge-based alternatives for the explanation of argument realization without reference to productivity phenomena. Views based on semantic classes and decompositional semantics are presented and confronted with idiosyncratic productivity effects that show semantically inexplicable differences in the productive usage of different types of near synonyms. These include synonymous verbs in the ordinary sense (distinct lexical verbs with similar meaning), but also sets of related verbs and constructions including derivates from the same lexeme stem, as well as constructional alternations using the very same choice of lexical heads.

Chapter 6 follows with an interpretation of the empirical results of the previous two chapters by offering an exemplar-based account of constructional categorization represented in a network model of the mental lexicon. It is argued that Hebb’s Law, which states that coactivated networks develop stronger connections, can be made responsible for the acquisition of some constructions as more productive than others based on input frequency distributions, independent of their meaning. Finally, Chapter 7 concludes by summarizing the main findings of the previous chapters, discussing the kind of models of grammar that would be compatible with the present results and giving an overview of open theoretical and methodological questions regarding productivity in argument selection. A number of further methods from a variety of sub-disciplines are also discussed, which may be used to address some of the questions that remain open.

Chapter 2 (Re-)defining productivity: From morphology to syntax

This chapter introduces the main concepts underlying the study of productivity, especially as they have been applied to morphological productivity in previous work. It makes extensive reference to the survey of morphological productivity in Bauer (2001) and attempts to single out and adapt the concepts most relevant for the discussion of corresponding phenomena in syntax. Section 1 reviews and compares major definitions of productivity, encompassing both morphology-centered and general approaches. Section 2 discusses the question of which theoretical objects productivity should be seen as a property of: phrases/constructions, slots, processes etc. Section 3 establishes the necessity of distinguishing degrees of productivity and discusses the distinction between discrete classes and a productivity continuum. The following Section 4 then examines the principled criteria which may be used in assigning phenomena to a class or a point on such a continuum. In Section 5 the scope of cases to be covered in this work is narrowed down to exclude the syntactic equivalents of what has been referred to in the morphological literature as ‘creativity’. The final section of this chapter builds on the notions introduced in the other sections to draw a concrete roadmap for the formalization of different aspects of productive behavior under one theoretical framework.

1. General definitions in previous work

Though long considered to be one of the ‘most unclear terms in linguistics’ (Mayerthaler 1981: 124)13 or even a ‘mystery’ (Aronoff 1976: 35),14 productivity has been repeatedly defined along lines that seem to converge on multiple criteria, at least insofar as the term applies to word formation. Some important and often cited definitions include the following:

We see productivity as a morphological phenomenon as the possibility for language users to coin unintentionally an in principle unlimited number of new formations, by using the morphological procedure that lies behind the form-meaning correspondence of some known words. (Schultink 1961: 113)15

Productivity is a feature of morphological innovation. It is a feature of morphological processes which allow for new coinages, but not all coining necessarily indicates productivity. To be shown to be productive, coining must be repetitive in the speech community: isolated instances of coining from individuals do not in themselves necessarily indicate productivity. Various factors appear to aid productivity: type frequency of appropriate bases, phonological and semantic transparency, naturalness, etc., but these are aids to productivity, not productivity itself. Productivity can be distinguished from creativity, although it is hard to draw a consistent line between the two. It may be the case that productivity can be seen as rule-governed, and creativity seen as rule-changing and equated with the use of analogy, but this is not settled. In sum, the productivity of a morphological process is its potential for repetitive non-creative morphological coining. (Bauer 2001: 97–98)

13. In German: „Produktivität zählt zu den unklarsten Begriffen der Linguistik“.

14. “Productivity is one of the central mysteries of derivational morphology”.

15. Translation from Evert and Lüdeling (2001: 167) of the Dutch: “Onder produktiviteit als morfologisch fenomeen verstaan we dan de voor taalgebruikers bestaande mogelijkheid door middel van het morfologisch procédé dat aan de vorm-betekeniscorrespondentie van sommige hun bekende woorden ten grondslag ligt, onopzettelijk een in principe niet telbaar aantal nieuwe formaties te vormen.”

And specifically with reference to word formation through affixation:

Any theory of word-formation would therefore ideally not only describe existing complex words but also determine which kinds of derivative could be formed by the speakers according to the regularities and conditions of the rules of their language. […] The property of an affix to be used to coin new complex words is referred to as the productivity of that affix. Not all affixes possess this property to the same degree [...] Even among affixes that can in principle be used to coin new words, there seem to be some that are more productive than others. [Boldface in the original - AZ] (Plag 2003: 44)

Though the above authors turn out to have some differences on subsequent details (these will be discussed below), all these definitions share the notion of ‘coining new words’ and some reference to regularity or repetitiveness (if Schultink’s “morphological procedure that lies behind the form-meaning correspondence of some known words” is to be read this way). The notions of ‘coining’ and repetitiveness are deeply rooted in word formation, since ‘coining’ is hardly ever applied to novel sentences or argument selection (perhaps sometimes to novel idiomatic phrases) and repetitiveness of a regular syntactic pattern usually goes unnoticed (recall the early view from Chomsky on ‘normal sentences’ in the first section of the previous chapter). Nonetheless, a relatively early exception to the restriction of the term productivity to morphology can be found in Hockett:

The productivity of any pattern – derivational, inflectional, or syntactical – is the relative freedom with which speakers coin new grammatical forms by it. (Hockett 1958: 307)

Hockett goes on to demonstrate the term in inflection as applying to the English plural -s, in derivation to the adverb forming suffix -ly and, further on, to the ‘subject-predicate pattern’, i.e. copular sentences of the type X is Y. The often cited earliest definition of productivity in probabilistic terms, from Bolinger’s (1948: 18) treatment of morphemes, is also not explicitly restricted to morphology, though this is its intended application: he defines it as “the statistical readiness with which an element enters into new combinations”. Baayen and Renouf (1996: 69) take this definition as a starting point for measuring morphological productivity for derivational affixes and attempt to estimate this readiness in terms of the probability of encountering an unseen form from the category in question (Baayen and Renouf 1996: 73–74; see also Chapter 3 in detail).
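The hapax-based estimate alluded to here is developed in detail in Chapter 3. As a rough preview, and with invented counts purely for illustration, Baayen’s category-conditioned measure can be sketched as the proportion of hapax legomena (once-occurring types) among the category’s tokens:

```python
from collections import Counter

# Invented token counts for -ness types in some corpus sample (illustration only)
ness_counts = Counter({
    "awareness": 120, "darkness": 85, "happiness": 60,
    "kafkaesqueness": 1, "upstandingness": 1,
})

N = sum(ness_counts.values())                        # in-category token count N(C)
V = len(ness_counts)                                 # type count V
V1 = sum(1 for f in ness_counts.values() if f == 1)  # hapax legomena

P = V1 / N  # Baayen's category-conditioned productivity measure
print(f"N={N}, V={V}, V1={V1}, P={P:.4f}")           # N=267, V=5, V1=2, P=0.0075
```

The intuition, returned to below, is that categories still producing novel forms are exactly those whose samples keep yielding once-attested types.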

Another definition, completely unrestricted by the type of construction, can be found in Langacker:

Productivity is a matter of how available a pattern is for the sanction of novel expressions. Though productivity tends to correlate with generality, they are often dissociated and have to be distinguished. Patterns of comparable generality can easily differ in their degree of productivity. (Langacker 1999: 114)

This definition simply refers to ‘patterns’. In fact, Langacker’s subsequent example deals with a hypothetical construction forming causatives for intransitive verbs – there is no explicit mention of whether this construction is realized morphologically (e.g. as an affix) or syntactically (a periphrasis); both options are of course well attested typologically (e.g. Japanese or Hebrew for the former, English or German for the latter). This illustrates how the generalized device of the construction in construction grammar (or ‘pattern’ or ‘schema’ in similar approaches) ‘levels the playing field’ between syntax and morphology for the application of the notion of productivity.

Finally, a specific definition of productivity for syntax can be found in the work of Barðdal, who sees syntactic productivity as “an argument structure construction’s ability to attract new or existing lexical items” (Barðdal 2008: 1), and refers specifically to two different aspects of syntactic productivity (Barðdal 2008: 29–30, 53):

i. speakers’ ability to generate sentences never heard before; and
ii. speakers’ extensions of a-structure constructions (i.e. abstract constructions without a lexical head, such as the ditransitive construction) to new verbs.

The latter aspect, explored in depth in Barðdal’s work, is closely related to morphologically based definitions (i.e. the coining of new types by introducing novel verbs), though she restricts it to apply only to novel verbs as instantiating types in abstract schemas. Novel verbs are certainly one of the sources of productive behavior in syntax, and also partially responsible for the former aspect, novel sentences, but this does not cover e.g. the selection of novel verbal objects and other arguments. The question of what productivity can and should be applied to in syntax (verbs, constructions etc.) will be discussed in the next section, while the definition of types especially in the context of argument selection will be dealt with in Chapter 4, Section 2.

The common points between all of the above definitions are rather few and somewhat vague – it is clear that productivity is a feature describing the propensity associated with some linguistic pattern for the regular production of novel forms, where these patterns may or may not extend from morphology to include syntactic ones. Some definitions have very broad coverage (e.g. Langacker 1999) while others are more restricted and specific (e.g. Schultink 1961). However, there are also a great many differences of principle between the views of all the above authors in what kind of variable productivity is (usually binary, ordinal or ratio-scaled); what productivity applies to exactly (e.g. words, affixes, constructions, processes etc.); how this ‘propensity’ is to be understood (e.g. realized type counts so far, freedom from restrictions, or some corpus-based or psycholinguistically established estimate of future novel behavior); and the possible borders of productive phenomena (especially the distinction between productivity and creativity mentioned in Bauer’s definition). These aspects have already been discussed in depth for morphological productivity (see the overview in Bauer 2001: 33–99) but not as such for the domain of syntactic selectional processes.

In the following subsections I will therefore only introduce each point as it has been debated in morphology and concentrate on the interpretation within a syntactic context. The final section in this chapter offers an integrative summary of the elements presented in other sections and proposes the adoption of a multidimensional model I will refer to as the ‘Productivity Complex’ (PC), the components of which the rest of this work will try to establish and quantify on empirical grounds.

2. What productivity applies to: Morphology versus syntax

In studies of morphological productivity there has been some debate as to what productivity is a feature of: affixes, processes, rules or words. An overview of these conceptions in previous studies is given in Bauer (2001: 12–15) and can be summed up in different groups as follows (I add additional views not discussed by Bauer in cursive type):

Table 2. Previous approaches to the application domain of productivity.

studies | productive element
Lulofs (1842 [1833]: 156)16; Fleischer (1975: 71); Plag (2003: 44) | affixes
Uhlenbeck (1978: 4); Anderson (1982: 585) | [morphological] processes
Aronoff (1976: 36); Zwanenburg (1980: 248); Bakken (1998: 28) | rules
de Saussure (1969 [1915]: 228) | words
Al and Booij (1981: 32); Anderson (1982: 585) | groups of processes
Hockett (1958: 307); Langacker (1999: 114) | pattern

16. Bauer refers to an edition (1835: 157) of the same title (with some spelling variations) published in Groningen and cited by Schultink (1992: 189), which is not available to me.

With the exception of de Saussure’s position, which treats complex words as themselves being productive insofar as they form a prototype for novel formations by analogy,17 most views concentrate on the marker common to both extant formations of the type in question and novel ones of the same sort, in most cases an affix. That is to say, if a familiar noun like awareness in (7) and a novel one like Kafkaesqueness in (8) share -ness as a common denominator, it is -ness which is productive (emphasis in examples is mine throughout):

17. In de Saussure’s (1969: 228) words, cited in Bauer (2001: 13): “On pourrait classer les mots d’après leur capacité relative d’en engendrer d’autres selon qu’ils sont eux-mêmes plus ou moins décomposables” ‘It would be possible to classify words according to their relative capacity to generate others, depending on their own degree of analysability’ [Bauer’s translation - AZ]. However Bauer (2001: 14) notes that de Saussure was not consistent in his view of productivity and also cites Schultink (1992: 207) to the same effect.

The piece tests the actor’s awareness and imagination to the full, but nevertheless makes precise demands on him [BNC, doc. A06]

(8)

I would have smiled at the Kafkaesqueness of the situation, had I not broken out in a cold sweat and had I at that stage not been ignorant of Kafka’s existence [BNC, doc. AA8]

Bauer (2001: 12–15) criticizes the focus on affixes since it does not cover e.g. productive vowel alternations, though this could arguably be addressed by extending the concept of ‘affix’ simply to ‘morpheme’, covering vowel alternations if these are interpreted as infixes, or even non-concatenative morphology or ‘transfixation’ as in e.g. the Semitic languages, if one allows discontinuous morphemes. Non-overtly marked processes, such as productive conversion as in (9), offer greater difficulties, but can possibly be resolved by referring to a null morpheme, as in the analysis in (10). (9)

Do you want to sellotape him up? [BNC, doc. KBW]

(10) [[sellotape]N ØV]V The main distinction therefore seems to lie in whether productivity is a feature of concrete linguistic signs or of the cognitive processing steps responsible for combining these signs with the relevant bases. These views also parallel the distinction between item-and-arrangement (IA) and itemand-process (IP) approaches to morphology (see Hockett 1954), i.e. 17. In de Saussure’s (1969: 228) words, cited in Bauer (2001: 13): “On pourrait classer les mots d’après leur capacité relative d’en engendrer d’autres selon qu’ils sont eux-mêmes plus ou moins décomposables” ‘It would be possible to classify words according to their relative capacity to generate others, depending on their own degree of analysability’ [Bauer’s translation - AZ]. However Bauer (2001: 14) notes that de Saussure was not consistent in his view of productivity and also cites Schultink (1992: 207) to the same effect.

What productivity applies to: Morphology versus syntax

23

whether word formation is simply a concatenation of stored units (including regular allomorphic variants) or a derivational process producing surface forms ‘on the fly’ from deeper representations. As Bauer also points out, the difference between reference to ‘rules’ or ‘processes’ in the latter family of views is largely related to the authors’ views of grammar, and the same could probably be claimed for the use of ‘pattern’ in Hockett and Langacker’s cases. Schultink’s (1961: 113) use of the word ‘procedure’ in the definition of productivity earlier is perhaps ambiguous between these senses of rule and process. I will therefore put these differences aside pending some choice of a specific model of grammar (see Chapter 6 and Section 2 of Chapter 7), and concentrate for now on the application to the syntactic domain of a more restricted question: ‘cognitive process in the widest sense’, or ‘lexically stored sign’ (simply as a form-meaning correspondence, including infixes, vowel alternation, discontinuity and possible null elements). For syntactic constructions, it is obvious that we do not always have a concrete outward realization of the common denominator for a productive class. If we regard structures like prepositional phrases as a class, which always contains a preposition and its argument, we can perhaps say of one specific preposition that it is productive, but for PPs in general we cannot claim that productivity resides in any one morpheme. At the same time, it is clear that the generation of prepositional phrases is a productive process in English, and if we choose to regard PPs as a productive class, we mean that we assume a construction PP, much like any unit in the mental lexicon, which can have this property, or else that the process involved in the construction of PPs is a productive one. Although Bauer notes these views for morphology are largely equivalent and do not generate any conflicting predictions in theory (this is probably also the consensus about the IA or IP debate), it seems to me that the terminological choice of ‘process’ is preferable for syntax, since it brings us one step further from the terminological level of morphemes vs. complex constructions, which is conducive to the level of generalization aimed at here. This nomenclature can be used whether one wishes to view an IA process of arrangement or an IP process of derivation as productive, but in the end it may be a matter of personal preference, since in any case we must accept that syntax in general and syntactic argument selection in particular, much like morphology, involve choice, and productivity means the possibility of new choices. If we see word formation and phrase building as comparable constructions, the most straight-forward translation of morphological processes to syntactic processes is to equate an invariable recurrent part of

24

(Re-)defining productivity

the construction as indicative of the productive process (much like a morphological affix) and the variable part (a content lexeme filling some slot of the construction) as the item operated upon by the process (much like a morphological stem). Kiss (2007) does just this by examining a subclass of German prepositional phrases with bare singular nouns that have no determiner, such as unter Androhung ‘under threat’. These constructions are particularly interesting since German generally does not allow singular nominal phrases to appear without articles (with some exceptions, see e.g. Helbig and Buscha 2001: 338–347 for German; Himmelmann 1998 for a cross-linguistic survey of adpositions with zero articles, and TrawiĔski, Sailer, and Soehn 2006 for German PPs in particular). Here the preposition is the marker for the PP construction, and the noun may alternate (e.g. with Beachtung in unter Beachtung ‘under consideration [of], allowing [for]’). Kiss then suggests this formation should be treated as productive, much like a morphological process, on the basis of corpus evidence (this study will be described in more depth in Chapter 4, Section 2). At the same time it is not necessary to keep a lexical part of the construction, in this case the head, invariable or literally identical. Barðdal (2008: 29–30) suggests defining types for verbal a-structures, e.g. dative or accusative transitive constructions, based on the head verb, while ignoring its arguments. For her, new arrangements of arguments for a familiar verb count only as instances of regularity in syntax, which she compares with inflection in morphology, while novel verbs in a familiar construction constitute an ‘extension’ of the construction, which is equated with morphological derivation (see also Chapter 4, Section 2). Thus a verbal construction such as the ditransitive is productive insofar as it may be extended to house novel verbs (see also Barðdal 2006 for a summary). The decision determining what slots are considered to be relevant is thus flexible. The latter approach can also be combined with the prepositional example above, as it is possible to define the construction PP to involve any preposition (Kiss 2007 examines only the preposition unter ‘under’, but also mentions others such as mit ‘with’), so that any prepositional phrase with a singular determinerless noun is a token for that construction, and the type is determined either by the noun or by both the noun and the preposition. 18 Summing up these approaches, we arrive at the following preliminary parallels between morphological and syntactic productivity: 18. Recall that multiple constructions may apply simultaneously (Goldberg 2006a: 10): on a more abstract level, speakers instantiate ditransitive or PP constructions, while at the same time they instantiate partially-filled

What productivity applies to: Morphology versus syntax

Constant

Variable

Morphology Process (e.g. affix requiring a base) Morphological base

:

:

25

Syntax Process (construction with one[?] slot) Slot (argument)

Seen in this way, the syntactic representation of productivity is just a more generalized way of saying that a neologism arises when a novel argument is used to fill an open slot within an established construction (leaving aside for now the question of what exactly constitutes an argument in a construction). This preliminary presentation is certainly a considerable simplification of argument filling in syntax. It can be argued that it is dissatisfactory since morphological processes typically involve only one stem, whereas syntactic patterns can have multiple slots. However for morphological processes like compounding too, it can be debated whether the compounding process selects the head, which then selects a modifier, or whether it operates on two stems at once. (11) Compounding process for any nouns X, Y: [[X]N [Y]N]N (12) Modifier selection for noun X and specific head A: [[X]N [A]N]N The process represented in (11) corresponds to the general prepositional phrase case above, or to the ditransitive construction entry in the mental lexicon, where the abstract construction specifies no lexical material at all in its entry, whereas the construction in (12) is partly lexically filled and describes a compounding template for a specific lexical head A (this is more generally the account of compounding at large as a construction in the schema model as suggested by construction morphology in Booij 2005, 2010). Partially filled constructions are, incidentally, not restricted to

constructions with the relevant verb (e.g. give) or preposition (e.g. under); cf. Section 2 in the previous chapter. This implies that several layers of productive behavior may be at play simultaneously.

26

(Re-)defining productivity

argument selection; the same principles would apply for compound families based on a common modifier.19 The simplest point of departure is probably to suggest that we may examine productivity for each slot separately, or even hierarchically, positing the choice of a complement at each junction of a syntactic derivation tree as a single-slot decision. Only in cases where there is evidence of a non-hierarchical constraining of lexical choice is there really any reason to consider productivity in multiple slots at the same time (this will turn out to be the case, for example, in comparative correlatives, see Chapter 4, Section 5). Since the issue has far reaching consequences for empirical productivity measures, it cannot be discussed in any more detail before these are presented. Further discussion of the latter point will therefore be deferred for the moment. 3.

Granularity and grades of productivity

Is productivity a yes or no property or are there distinguishable categories or degrees of productivity? If so, how many, or how granular is this property? These questions have become prominent mainly in the context of theories attempting to counter algebraic approaches to word formation, which state that a word formation rule, if present and active in the morphology of a language, should simply be seen as productive. Bauer (2001: 15–20) offers a good survey of some significant studies with regard to word formation, which can be summed up in the following table in chronological order (again, additional studies are added in cursive type). For each study I note the number or scale of productivity grades assumed, the nomenclature assigned to any intermediate grades of productivity, and 19. However it should be noted that some cases of compounding arguably resist hierarchical analysis from the outset. In German, cases where a derivation is applied to a complex base that is otherwise not attested or even acceptable are referred to as Zusammenbildung “forming together” (see Erben 1993: 34, 114–115). A classic example is blauäugig ‘blue eyed’ where the adjective ?äugig ‘eyed’ which appears to head the compound is hardly attested. If this is viewed as an extensible pattern with two slots (cf. grünäugig ‘green eyed’ and langohrig ‘long eared’), then it is difficult to establish which process occurs first, derivation from Auge ‘eye’ or compounding. The pattern may then be regarded as a word formation with two slots. Other scholars reject a special status for Zusammenbildungen altogether (e.g. Fleischer and Barz 2007: 47; Leser 1990).

Granularity and grades of productivity

27

most importantly, where applicable, the criterion for membership in such a grade. Table 3.

Survey of approaches to grades of morphological productivity described in Bauer (2001: 15–20).

study

grades

de Saussure (1969: 228)

gradient

term for intermediate grade --

Dik (1967: 370)

3 grades

semi-productive

Pike (1967: 170)

3+ grades binary with restrictions index

semi-active

Matthews (1974)

3 grades

semi-productive

Fleischer (1975: 71)

3 grades

active

Booij (1977: 5) Zwanenburg (1983: 29) Baayen (1989 i.a.)

binary binary gradient

----

Fox (1990: 124)

3 grades

semi-productive

3 grades

semi-productive

gradient

--

Jackendoff (1997: 115– 121)

3 grades

semi-productive

Plag (1999)

gradient

--

Botha (1968)

Pinker and Prince (1991: 231) Bauer (1992 i.a.)

--

criterion for intermediate grade -applicable to some novel, but not all possible bases ? productive process may be subject to fewer or more restrictions non-automatic acceptability? processes creating only a moderate amount of novel words ---rules only usable for special purposes, e.g. technical terms (register restriction?) extensibility ‘to some degree’ -output of a rule unknown for some cases --

As already mentioned in Chapter 1, Section 2, binary approaches are tightly linked to algebraic models of grammar, where all rules are by

28

(Re-)defining productivity

definition open to extension within the conditions imposed by the rule, and everything else is listed in the lexicon. The appeal of these approaches has been in clearly defined borders for the domain of grammar: a finite set of rules operating on a (synchronically) static lexicon, which together form a complete generative system. Algebraic approaches therefore by definition do not attempt to assess the likelihood that a certain rule will or won’t be used to produce novel output. This would introduce a completely new feature into the system, weakening it only to describe a property that is not considered to be part of grammar at all (this view is defended very clearly by Newmeyer 2003, but see Behrens 2009 for the view that the resulting acquisition system would actually be less parsimonious). The pivotal point in explaining the difference between the prolific attestation of some processes over others, and especially in an evolving, diachronic context, therefore lies in specifying what kind of categorical conditions or restrictions a rule may impose. I will return to this point in the discussion of semantic argument classes in Chapter 5, but for the present discussion it should be noted that binary models attempting to assign an index of restriction to processes (i.e. one productive process may be constrained by more restrictions than another), such as Botha (1968), are already leaning towards a gradient concept of productivity within the generative approach. In any case, complete denials of any intermediate grades of productivity seem to be in decline in recent years, with the rise of usage-based models and the availability of corpus data. A key reason for this may be that given enough data, novel examples can be found for almost any transparent process for which it might be interesting to claim that it is not or no longer productive. In fact, even Zwanenburg’s (1983: 29) prototypical example of the English nominal -al in arrival as completely unproductive faces counterevidence when confronted with very rare examples in large Web corpora, for example the form improval, which is not listed in any dictionary I have consulted, but appears twice in ukWaC, a large Web corpus of English (approx. 2.25 billion tokens, see Baroni et al. 2009):20 (13) more than 20 new prestige classes , each with new feats & a variety of special schools, that allow improval of skills & abilities [ukWaC, pos. 16240506] 20. It should also be noted that, based on preceding and following context, both of these examples come from native English speakers and are in no way jocular, so that there is no reason to suppose anything other than natural native productivity is responsible for these two independent word formations.

Granularity and grades of productivity

29

(14) Managing 6 team leaders, 18 engineers, to progress the business through performance improval of machinery/production lines [ukWaC, pos. 65828836] But even using more controlled (and smaller) corpora like the BNC, some examples that occur only once can be found, such as pursual and accusal beside much more common pursuit and accusation, though it is hard to tell if they are simply archaisms or real examples of productivity. The claimed etymological restrictedness of the suffix to only Latinate stems has also been questioned by established examples such as overthrowal (twice in the BNC; see Plag 2003: 109). It therefore seems that there might be very few cases of absolute unproductivity that can nevertheless be transparently analyzed (for another example see also the discussion of the German adjective forming suffix -sam in the next chapter, roughly equivalent to English -some in e.g. lonesome. These suffixes are hardly productive at all in either language, but some neological examples can be found nevertheless). Among those scholars who postulate discrete categories of productivity, most tend to assume three classes (Bauer 2001: 15–17): fully productive, fully unproductive, and something in between, most often called “semiproductive”. What exactly falls under the intermediate category varies from anything that is not fully applicable to all members of an open class (Dik 1967) or just cases where novel forms are not uncommon, but clearly not automatically acceptable (Matthews 1974 and many others). For the unproductive category, the criterion is generally a closed class of bases, or put more formally enumerability of the produced forms (as in Karcevski 1932: 85, cited in Bauer 2001: 16), which are presumably stored in the mental lexicon. As Bauer (2001: 16) notes, even the processes most generally acknowledged as fully productive, such as English plural inflection, are not applicable to all conceivable bases, either on account of blocking (e.g. *childs) or semantics (?musics), consequently assigning virtually any process to the semi-productive grade. Regardless of the acceptability of such forms, the objection becomes even more critical when it is made clear that the unacceptable or infelicitous cases are probably not enumerable, which would then inevitably require recourse to (probably semantic) classes fitting the observed restrictions. In gradient approaches, some differences between semi-productive cases are acknowledged, so that productivity becomes either an ordinal variable (one process can be more productive than the other), or an attempt is made to directly quantify it, producing a continuous scalar variable (e.g. as a

30

(Re-)defining productivity

bounded number on a scale between 0 and 1, see especially Baayen 1993, 2001, 2009, which will be discussed in depth below alongside some alternative suggestions). A gradient view of productivity for morphology can be argued for on many grounds, including explanatory power for diachronic change (intermediate steps between productive and nonproductive, see e.g. Bauer 2001: 163–172; Scherer 2005; Štichauer 2009), explaining intuitive judgments on relative productivity which are reproducible to a considerable extent (cf. Baayen 1993, 2009: 904–907 for Dutch; Plag 1999: 5–35 for English deadjectival nominalization in -ness, -ity or -cy; and in general Bauer 2001: 1–32), differential performance of test subjects in neologism generation tasks (Anshen and Aronoff 1988; Baayen 1994), a correlation with parsability in human perception (Hay and Baayen 2002) and the consistent relative behavior of corpus-based productivity measures in different but comparable datasets (that is to say, measurements showing a word formation process to be more productive than another in a corpus have predictive power for the behavior of that process in further data of the same sort; for an example see Zeldes 2011 and Chapter 3 of this work which reproduce results from Evert and Lüdeling 2001; Chapter 4 presents comparable examples for syntactic argument selection). Since the question of granularity in the case of word formation has already been discussed in depth in the sources above, I will not go into the arguments for each position in any more detail at present and turn to the matter at hand: does binary or ordinal categorial productivity suffice for the description of syntactic selectional processes? What would each approach entail for the overall view of grammar? Taking the binary approach first, it would follow that any syntactic construction that is not productive is fully lexicalized. This results from the rather wide consensus on the definition of non-productivity on the basis of enumerability, and corresponds to the classic algebraic view of generative grammar. Full productivity, on the other hand, seems to require both an open class of bases and complete applicability to this class along the lines proposed by Dik (1967). Clear examples of both classes would be idioms for the former, e.g. kick the bucket, with the sense ‘to die’,21 and very unrestricted constructions for the latter, such as [VERB] [OBJECT] or even with particular verbs eat 21. In fact, many idioms and idiosyncratic constructions are very much extensible, but I will ignore this for the sake of discussion here (see Philip 2008 and Section 5 below). It is also not likely that idiomaticity itself is a binary property, see Wulff (2008).

Granularity and grades of productivity

31

[OBJECT], which can be argued to be fully productive for any object of the class [+edible]. For syntax this approach may be more resistant to Bauer’s criticism than it is for morphology, since we may be able to find many more syntactic patterns of the intuitively unrestricted kind than morphological ones. However, even if we find no sensible interpretation for non-binary productivity in most cases, one counterexample, or ideally several, should suffice to make such a model inadequate. To demonstrate the existence of an intermediate class based on the criteria discussed above, we require constructions for which we cannot enumerate all possible argument realizations, but for which we cannot predict the members based on a ‘trivial’ semantic class such as [+edible] (if one does not accept the existence of such a class, then eat [OBJECT] already constitutes a counterexample). In other words, the process has to apply to an open set of bases which defies definition as a semantic class for which membership is decidable. This apparent contradiction brings up a question which is central to a theoretical account of syntactic productivity – what is the relevant class of bases? Arguably, we could postulate an appropriate semantic class for every process, [+edible] for the objects of eat and [+cross-referenceable] for the verb cross-reference. It should be stressed at this point that any classification can always be reduced to one category per case (maximal granularity), so that the question is distinctly ‘how granular should classes of bases be?’ and not ‘how granular can they be?’ (cf. Dowty 1991: 550– 551 on the “individual-thematic-role escape hatch” and the articles in Bornkessel et al. 2006). The goal of semantic classes, in my view, should be: a. predictive, that is if two positions admit the same class, then what can occupy one can occupy the other; and b. to allow for entailments, e.g. that for the object of both kill and a paraphrase like cause to die the entailment that it is consequently dead can be derived (this seems to me to correspond to the granularity and functionality envisioned in Jackendoff’s (1990) Conceptual Structure, see Chapter 5, Section 1 in detail). It therefore becomes evident that a main opposition to a theory of non-binary syntactic productivity may come from an absolutist view of lexical semantics, in which classes can be invoked to explain any differences in the behavior of two constructions, and conversely, that a key danger of degrees of productivity is losing the generalizing power of semantic classes of bases. However as already alluded to, taking an unequivocal class-based view

32

(Re-)defining productivity

runs into problems in explaining differential behavior for synonyms22 and quickly results in very small semantic classes, to the point of reducing lexical semantics to a tautology (verbi takes arguments that are verbi-able, and that class is made of those things which can be verbi-ed). This is of course not so different from a CxG approach allowing different lexicalized behavior for every item on grounds of entrenchment (this also leads to perconstruction ‘classes’), but it is explicitly consistent with the premise of usage-based grammar, and so forms part of that theory – classes would then be dictated for entrenched cases at least in part not only by real-world semantics (what is drunk/eaten), but by convention (how the English verbs drink, eat are used productively). However undesirable this may be, assuming that differential behavior can be shown in slots for which we would not want to formulate different extralinguistic semantic classes, I see no way around integrating language internal non-binary productivity into any account wishing to describe usage. The prerequisites for discussing this point and demonstrating this state of affairs will be analyzed in Chapter 4, but as already mentioned in Section 1 of the previous chapter, semantically problematic cases can be found in a variety of constructions. These will be put to the test in Chapter 5 once appropriate methodological tools have been laid out. An attempt to reconcile the empirical findings with the admitted utility and cognitive reality of semantic classes will be made in Chapter 6. If we provisionally accept the need for intermediate classes, the final point left to raise for the granularity of syntactic productivity is that of discrete categories versus a continuum of semi-productivity. For discrete categories one must assume some qualitative difference in the way a process is used to generate novel forms. Apart from completely unproductive and productive, conceivable intermediate categories might relate to certain restrictions on unbounded productivity (as suggested by Botha 1968), i.e. productive in some context or for some class of bases. However, if we assume the legitimacy of classes of bases, the latter sort of categories becomes untenable, since as we have considered above, as long as a process universally applies to a non-circular, well-defined class of bases, it is considered fully productive. Productivity only in a certain context, on the other hand, suggests competing systems, e.g. different registers (cf. Plag, Dalton-Puffer, and Baayen 1999, Grabar and Zweigenbaum 2003). Certainly we would also accept such cases for morphology, even for very productive categories like inflection: a language might keep an archaic 22. However these may be defined; see Chapter 5, Section 3 for discussion and some concrete examples.

Criteria for productivity

33

form in one register but not in another, such as the second person singular thou forms for English verbs (thou knowest etc.). But in these cases one would probably assume the system allowing such forms employs a different grammar, so that in different registers or contexts, the same process may be fully, partially or hardly productive, with extreme cases allowing only lexicalized forms or disallowing any realization of the process altogether. This would be equally applicable to fully gradient views of productivity, since the same process may have a different productivity level in different registers (insofar as this is assessed using corpus data, one might in fact expect precisely such results). It therefore seems to me that admitting more than three categories of productivity very quickly leads to using a gradient scale, since the circumstances leading to the difference cannot be stated on qualitative grounds other than the class of bases (an internal selectional criterion) or context (an external selectional criterion), i.e. that which embeds the process or that which the process embeds. Reverting back to three categories also seems dissatisfactory, since if a process is semi-productive by virtue of our not knowing which future members of its base class it may apply to, then we are simply admitting our failure to find the appropriate class of bases. If all we want to say is that a process is less likely to realize novel forms, or that these are more difficult for speakers to accept, we immediately return to at least an ordinal, multi-categorial view, in which different processes can all be more or less productive than other ones. The crucial question thus becomes: what do we mean by ‘productive’ or, compounding the difficulty further, by ‘more productive’? I therefore turn next to the examination of the criteria for productivity, especially as envisioned by gradient approaches. 4.

Criteria for productivity

Even if one accepts multiple ordinally arranged grades or a continuum of productivity, it remains rather unclear what the criteria are for being more or less productive on that ordinal or ratio scale. As we have seen above, it is fairly clear that enumerability translates to complete non-productivity, and automatic applicability to an open class of bases of predictable membership translates into total productivity (this last statement is not without its problems, but can be accepted for now). But what does it mean for one process to be more productive than another in an intermediate productivity grade? This has been the subject of much debate in morphology.

34

(Re-)defining productivity

The first question one should ask on this topic is perhaps why one would one want to designate one process as ‘more productive’ than another. The initial motivation in this case seems to be a strong intuition on the part of some morphologists working within a graded view as to what the relative order of productivity between word formation processes might be.23 Thus, despite disagreements on the empirical measurement of productivity, Baayen (1993) largely concurs with van Marle (1992) on an intuitive ranking of Dutch derivational suffixes, distinguishing not only coarse classes, but a suffix for suffix order (e.g. for the processes supplying the nominal suffixes -tje > -ing > -heid). Plag’s (1999: 93–118) survey of English verb deriving processes (e.g. affixation with -ize, -ate, circumfixation of en+ -en, conversion etc.) claims “it seems clear that some of the processes are more productive than others” (Plag 1999: 96), and similar statements for nominal processes can be found in Plag (2003: 44): “there appear to be some [affixes] that are more productive than others”, with -ness (as in cuteness) exemplifying a more productive affix than -ish (as in apish). But are there similar intuitions for syntactic constructions? Returning to the analogy syntactic slot : morphological base, it is possible to sense that, for example, different verbs have intuitively very different sized classes of possible arguments. Consider my intuitive ranking of the verbs in Table 4 according to perceived sizes of the suggested classes fitting the direct object slot of their respective argument structure. Table 4. rank 1 2 3 4

Ranking of English verbs with certain sets of objects according to intuitive set size. verb eat achieve harbor sift

direct object class [+edible] [+achievement] [+mental state] [+fine-grained]

It may seem intuitive that eat should have more possible objects than harbor (in the sense of harboring an emotion, e.g. resentment), simply because 23. Aronoff (1976: 35) has commented on the vagueness of this intuition: “The term productivity is widely used in the studies of derivational morphology, and there is obviously some intuition behind the usage, but most of the discussion is rather vague”. In the decades since, however, repeated attempts at formalizing this intuition attest to its being a desirable feature for a theory of grammar to capture, despite the difficulties involved.

Criteria for productivity

35

there are more edible things in the world than sentiments. But if we consider such a statement against the background of the enumerability criterion introduced earlier, it becomes clear that formally speaking, the processes of direct object selection for eat and harbor are both productive to some extent, since it is not possible to enumerate their possible arguments (consider the non-lexicalized use of arguments such as harboring violence, superstition, hunger or grief to name but a few examples occurring only once in the BNC; see Chapter 4, Section 4 and Chapter 5, Section 6 for detailed studies involving harbor). Speaking from a number-theoretical point of view, both these verbs have a set of potential objects with a cardinality of Cantor’s aleph-null (š0), that is an infinite set which can be mapped to the natural numbers (see Dauben 1990: 179). Why, then, might we feel that this ranking makes sense? Can this perceived difference be assessed consistently by speakers? And does it correspond to demonstrable effects in corpus or experimental data from these a-structures, e.g. different type-token ratios or different sized argument vocabularies, or diverging reaction times in elicitation tasks? These are all empirical questions which can and should be addressed, but at this stage they will simply serve to show that a discussion of criteria for an ordinal (if not ratio scaled) ranking of productivity in syntax is motivated by intuitions comparable to those from morphology.24 The question therefore remains crucial: what do we mean by “more productive”, or even “has more arguments”, between slots that seem equally inexhaustible? Going back to morphological productivity, it appears that this can mean different things. Bauer contrasts two senses of morphological productivity, availability and profitability: “Productivity” deals with the number of new words that can be coined using a particular morphological process, and is ambiguous between the sense “availability” and the sense “profitability.” The availability of a morphological process is its potential for repetitive rule-governed morphological 24. A different type of objection could be raised that these intuitions may be available, but correspond to cognitive concepts or world knowledge, and not a part of the linguistic system proper (i.e. there are more edible things in the world than there are siftable ones, or at least insofar as our functional perception of the world is concerned). This will be dealt with in further chapters, though the two main problems with such approaches have already been alluded to: different argument realization in synonyms, and different behavior across languages in translations of the same concepts. For now, the reality of this intuition and the need to explain it are the main issue.

36

(Re-)defining productivity coining, either in general or in a particular well-defined environment or domain. The profitability of a morphological process reflects the extent to which its availability is exploited in language use, and may be subject unpredictably to extrasystemic factors. (Bauer 2001: 211)

These notions go back to Corbin (1987: 177), who distinguishes profitability and availability in much the same sense, but also regularity,25 which Bauer collapses into the definition of availability (with the words "repetitive rule-governed" above). Regularity is also closely related to schematicity, which is a central concept for productivity for Clausner and Croft (1997), who view the establishment of a regular schema from an ideally large number of (semantically) closely related types as a prerequisite for productivity (particularly in the context of novel metaphors). In their approach, a large base of regular exemplars leads to stronger abstractions, which may become entrenched independently from their surface realizations, while less coherent schemas will be much less entrenched than their realizations, leading to semi-productivity. This is illustrated in Figure 1, as adapted by Barðdal (2008: 48) from the presentation in Clausner and Croft (1997: 271), by bold boxes for the more entrenched construction.

25. In French rentabilité, régularité and disponibilité. I follow Plag (1999: 34) and Bauer (2001) in adopting Carstairs-McCarthy's (1992: 37) translation of the terms.

[Figure 1: schema-instance diagrams for (a) a productive pattern, with the schema more entrenched than its instances, (b) a semi-productive pattern, with instances more entrenched than the schema, and (c) an unproductive pattern, consisting of an isolated instance]

Figure 1. Schematicity, productivity and entrenchment in Barðdal's (2008: 48) adaptation of Clausner and Croft (1997: 271). Bold boxes stand for greater entrenchment (reproduced with permission from the author)

Barðdal (2008: 48–49) applies the same notions of schematicity to morphological productivity as well, where e.g. the regular English past simple suffix -ed is fully productive (corresponding to predictable and transparent forms), while the strong past formation by vowel alternation is only semi-productive (novel forms are possible, but the lexicalized instances predominate, and though forms may be transparent, their existence cannot be predicted a priori). Finally, suppletive forms are completely unproductive, unpredictable and non-transparent, such as go : went. Barðdal also extends Clausner and Croft's representation by explicitly allowing related schemas to be joined together at a higher level of schematicity, which will be more productive than its subschemas, cf. Figure 2. In her approach, productivity (syntactic or otherwise) corresponds to maximal schematicity, i.e. a construction is more productive the higher its level of schematicity, which corresponds to more types being generated regularly and coherently (i.e. with consistent meaning) from the schema. To my mind, these may be indicators or even causal reasons for productivity, but not productivity itself: schematicity may facilitate the generation of new forms, but it is the coining of new forms that constitutes productive usage. The role of schematicity as a criterion for productivity can thus be limited to the sense of regularity that is implied by the term schema, since productive forms are understood by association with an existing schema, which dictates a predictable form for the output of a process.26

[Figure 2: as in Figure 1, but with the two productive subschemas in (a) united under a higher, more schematic and more productive schema]

Figure 2. Barðdal's (2008: 50) approach to hierarchical levels of schematicity as a correlate of productivity. Bold boxes stand for greater entrenchment. (Reproduced with permission from the author)

26. In fact Barðdal (2008: 3–4) also views regularity (predictable output), as well as generality (unrestricted applicability), as secondary for productivity beside extensibility (roughly the same as profitability above), and suggests the other two terms have become entangled with productivity by way of association.

Beyond the question of how reliably the shape of the output of a process can be predicted (regularity), which is perhaps less often an issue for syntax than for morphology, we are left with Corbin's other two aspects, which relate to the question of how many types belonging to a process are instantiated in data versus how many could conceivably be instantiated. This distinction is also relevant for the grammatical status of the criteria for productivity, since it mirrors the competence / performance distinction in generative grammar (capacities of the system versus output in practice). As pointed out e.g. by Plag (1999: 34), Corbin's distinction is valuable in teasing apart these, at least in principle, independent aspects of productivity, but gives no way of determining how available a process is quantitatively. Baayen (1993, 2009) recaptures the distinction in more operationalizable terms in what he calls realized productivity, i.e. how many types actually get formed by speakers based on data, and potential productivity, i.e. the probability of finding a neologism of a certain type after a certain amount of data has been seen, and consequently, how many neologisms we believe can be created using a process from a class of bases. He attempts to estimate the latter based on extrapolations of the predictably decreasing probability of neologisms in growing amounts of data. Baayen further proposes that the relative contribution of a process to neologisms in the language in general also plays a role in our perception of productivity, and refers to this quantity with the term expanding productivity (see Baayen 2009: 901–902 for an overview of these terms and their development). In other work based on the Morphological Race Model (Frauenfelder and Schreuder 1992), working under the assumption that frequent types are retrieved from memory and infrequent ones are parsed before this can occur, Baayen (1993) suggests that the ratio of parsed to unparsed members of a category may also be relevant (i.e. how many fixed or established expressions a class has), and contrasts rankings based on this criterion with ones arrived at by his other measures.

All of these concepts can be interpreted in a syntactic framework: realized and potential types in a slot according to the analogy in Section 2 above, expansion rate compared to other slots, and the appearance of lexicalized units such as collocations (see Evert 2005, 2009 for recent surveys), colligations (Stefanowitsch and Gries 2003), lexical bundles27 (see Salem 1987; Altenberg and Eeg-Olofsson 1990; Biber and Conrad 1999; Biber, Conrad, and Cortes 2004), or other types of multi-word units. The exact methodology behind Baayen's approach and the different measures proposed both by him and his collaborators, as well as some other views, are reviewed in depth in Chapter 3. It will be a central goal for the present work to establish whether this methodology and related approaches can be adapted to syntactic studies on theoretical and empirical grounds, e.g. whether or not the amount of neological slot members can be predicted from data consistently and reliably. However, even if we accept these criteria as relevant, as soon as we identify two or more factors relevant for productivity, it is not at all certain that they should correlate, making it unclear which criterion the ranking in Table 4 above might be based on. This is not necessarily a bad thing: it is also possible that different rankings based on different criteria will correspond to different intuitions and explain different aspects of productive behavior, both synchronically and possibly also diachronically.

As an interim conclusion it may already be stated that all of the aforementioned factors found in the morphological literature seem potentially relevant for syntax: regularity/predictability, number of types in theory and in practice, number of tokens in data, proportion of parsed to lexicalized types and tokens in data, probability of neological types and the proportion within all neological types in the language. Since these factors already refer to concrete sets of tokens and types, it becomes imperative to ask "types/tokens/neologisms etc. of which phenomena?". What constitutes a slot can only be decided by a syntactic theory, the role of which is after all precisely to delineate the structures available in each language. But can and should the construction of each and every kind of structure in a language be ascribed to productivity? To determine if some types of utterances should be excluded from coverage under a theory of productivity for syntactic processes, I turn next to the discussion of the scope of applicability for the term 'productive', aiming to separate it from certain cases largely labeled as 'creative' in morphological equivalents.

27. Lexical bundles are frequent sequences of multiple words, usually four or more. Since they can cross syntactic brackets, they may be relevant both to the interpretation of complex processes filling multiple slots and to interaction between syntactically interrelated constructions and their productivity.

5. Productivity versus creativity

So far I have equated productive use of a process with the realization of a novel type, previously unseen. However, as Bauer (2001: 62–71) suggests, we may not want to treat all novel formations as instances of productive use, that is, not every possible type should be examined. Such exceptions have largely been addressed under the heading of creativity as distinct from productivity, and largely seem to revolve around cases where the regularity of the underlying process is questioned (Bauer 2001: 64). Bauer distinguishes three types of creative processes: the generation of novel simplex words after Baayen and Lieber (1991: 815); figurative extension; and 'non-productive creativity', which seems to involve the formation of isolated new items in a pattern which is otherwise considered unproductive. Are these reservations applicable to syntactic slots?

The case of novel simplex items can perhaps be carried over to syntax in the way of idiosyncratic constructions which can occur e.g. under external influence, such as borrowing (which is also one of the most prominent sources for novel simplex words). For example, at the time of its inception, the English expression long time no see was a calque,28 and can hardly be said to have been modeled after an extant syntactic structure by putting together existing elements into a pattern previously housing other elements. Thus there is no selectional process at play of the 'argument filling' type discussed so far. If this structure is subsequently extended, e.g. into long time no hear/write/phone… then it may have evolved in its own right into a construction with a productive process filling the appropriate slot. But its birth is owed to a process of interference that lies outside of the scope of syntactic productivity in the current sense, which makes up the overwhelming majority of novel realized types in syntax, analogously to 'normal' word formation. These may therefore be put aside much the same as in morphology on grounds of non-selectionality (they form a lexically-filled construction with a valency of zero arguments).

28. Probably a pidgin loan translation into American English from Cantonese 好耐冇見 hou2 noi6 mou5 gin3, literally 'long time no see' (see the online American Heritage Dictionary of Idioms at http://dictionary.reference.com/browse/long+time+no+see?r=75; the Mandarin equivalent is 好久不見 hao2 jiu3 bu2 jian4, 'id.'), though the OED gives a first citation attributing the words to a Native American (see long in the online Oxford Dictionary of Word Origins at http://www.oxfordreference.com).

With 'figurative extension' Bauer refers to the use of a familiar word in a new sense, e.g. bypass being extended from a road to a blood vessel in bypass operation. Bauer then contends that this type of 'coinage' need not be classed as morphological productivity, but rather a type of 'creativity'. The complete reinterpretation of a lexically specified, filled-out syntactic construction seems to parallel this, firstly in that lexicalized expressions with a special sense should result in a separate examination of productivity (e.g. pulling one's leg should not be synchronically viewed as an instance of the 'general' verb pull unless the general sense is intended), and secondly, a novel, but synchronically unpredictable extension of meaning, e.g. technologically motivated neologisms such as the 'online' verbs chat or post on an Internet forum, merits a separately defined argument spectrum. Certainly one would expect different collocations for such innovations, thus different behavior in novel argument selection should come as no surprise, and should be accounted for in a theory explaining syntactic productivity.

A related sense of 'figurative extension', which is perhaps much more relevant to syntax, is metaphor. Since, even in a lexical semantic approach, semantic classes may be extended ad hoc, metaphors may result in otherwise unacceptable arguments occupying a slot. Consider these examples from Lakoff (1993):

(15) He flew through his work

(16) He clawed his way to the top

Clearly there is a non-arbitrary connection between the metaphors in these examples and the underlying literal senses, though it is not clear how easily or how often such extensions occur generally. If we subscribe to Glucksberg and Keysar's (1993) "class-inclusion" view, such metaphors are supposedly understood literally on some level and are enabled by an 'attributive' creation process, whereby the verbs above are not subtypes of more general notions of 'flying' or 'clawing', but rather certain properties are projected from a source domain to a target domain: in Lakoff's terms, speed of movement and climbing up (to a higher status) for the two verbs respectively. This implies that any subset of attributes can be extracted to create a new ad hoc semantic class to extend a slot, meaning the extent of this process may be considerable.29 At the same time, the sense intended in the examples above is intuitively different from a literal one of flying in the air or using one's actual claws, and naively examining the realized objects of these verbs without accounting for this would lead to an increase in the diversity of a slot's repertoire thanks only to such examples. Since it is difficult to rule out all metaphoric uses, which may be difficult to judge for all cases, it might be better to encapsulate the problem and separate it provisionally by deciding to regard different senses as reason enough to identify the existence of a different slot. The problem of how such senses may be identified in general must be left open for a further discussion.

29. This is also in line with Dowty's (1991) view of semantic classes as sets of entailments, with each such property corresponding to an entailment.

Finally, the case of novel formations in patterns that are otherwise unproductive seems to be applicable to obsolete constructions, e.g. current English use of subject inversion in questions with non-auxiliary verbs, as in (17).

(17) Dare she push her bike through that gate — past those fierce horned heads? [BNC, doc. B0B]

Since such usage is by definition anomalous in the system, its description could also be separated from the propensity for variation and innovation in 'normal' selectional processes for syntax much as for morphology, provided one has a way to tell the two cases apart, a task which ultimately falls to the grammar (i.e. we must decide if subject inversion with lexical verbs is part of the synchronic grammar we wish to describe). It thus seems that, with the exception of the problem of extensions to a slot's semantics, the cases set aside by Bauer under the label of creativity can be separated for syntax as well given the right criteria.

In attempting to find such criteria, Bauer (2001: 65–66) first examines and subsequently dismisses the idea of 'regularity' as a decidable solution. Indeed, although ad hoc extension and revival of archaisms are (again, by definition) irregular, and novel simplex forms arguably too (if one assumes syntax is a combinatory system which operates on extant minimal signs), it becomes a circular matter of identifying such extensions as that which is irregular, and vice versa. And in principle, a rule can be proposed for even the smallest set of cases (perhaps even for a single case). Bauer therefore examines the theoretically appealing idea that deliberately coined words should be viewed as creatively, not productively formed, in keeping with Schultink's definition (recall the wording in Section 1 above: "the possibility for language users to coin unintentionally […]" [my emphasis]). Earlier in his survey (Bauer 2001: 40) Bauer gives cognitive plausibility to this distinction by demonstrating speakers' awareness of 'borderline' coinages in such examples as the following:

(18) The very unattainableness… is there such a word? Anyway, you can see what I mean (Ormerod, Roger, The Second Jeopardy. London: Constable, 1987: 154; cited in Bauer 2001: 40)

(19) If a guy who plays around is called a womanizer, what do you call a woman who does the same thing – a manizer? (Sanders, Lawrence, Timothy's Game. Sevenoaks: New English Library, 1988: 147; cited in Bauer 2001: 40)

This type of awareness can probably be demonstrated for syntax as well if we consider utterances of the sort "can you [verb] a [noun]?" (e.g. "can you escalate a dilemma?") or the negative response "you can't [verb] [noun]s", though since novel syntagms are less conspicuous than novel words, as already discussed, it is likely these cases are considerably more rare. However, as Bauer points out, for the overwhelming majority of cases we do not know how meta-linguistically aware speakers are of their choice of words, and this criterion is therefore simply impractical: we cannot ask authors or speakers in every case and they may not have answers anyway. Two other suggested criteria, namely restriction to subclasses of bases (e.g. only non-native words, as suggested by van Marle 1985: 59, cited in Bauer 2001: 68) or a very small number of bases (e.g. a novelty based on one example, forming a class of two), are also discussed briefly and rejected as unreliable, though these are probably not as interesting for syntax, since restrictions based on origin are not generally apparent in syntax30 and classes of one or two members are usually treated as idioms or other types of multiword units. Bauer (2001: 214) compares the latter to the fossilized products of obsolete morphological processes, citing examples like come of age (but *come of maturity), few and far between (but *they are far between) or if you please, all of which go back to patterns that are now obsolete.

30. This is not to say that origin does not interact with syntax. For example, Japanese has two types of adjectives, suffixed in attributive use with -i and -na, where the latter are generally of non-native origin. The two are used in different constructions, with the -i type serving as inflectable predicates and the -na type being limited to predication with a copula (Shibatani 1990: 215–221). However, such differences should be seen as operating already on the level of morphology, having different parts-of-speech which are then exposed to syntax. The conceivable case referred to here would mean that two words are morphologically indistinguishable, but syntax makes a distinction of productivity for novel cases in some embedding slots based on their origin.

Of course it is a given that recognizing idioms is far from a trivial task: as already mentioned, many idioms are not strictly invariable (see Philip 2008 on slight variations in fixed expressions in corpus data, Partington 1998: 121 on 'unusuality' in unexpected variants of idioms and McGlone, Glucksberg, and Cacciari 1994 for experimental evidence on compositional idiomatic comprehension of variants like shatter the ice for break the ice) and idiomaticity is likely a scalar variable (see Wulff 2008 for a data-based approach to its quantification). Conceivably, any idiom that encompasses a selectional process is potentially subject to productivity according to the criteria in the previous section. As long as our syntactic theory regards its surface expressions as non-enumerable, the concepts discussed here continue to apply despite the presence of any non-compositional components, which are in reality likely more often than not present in supposedly 'non-idiomatic' constructions too.

Bauer concludes that productivity in the more traditional sense, and creativity as distinguished from it, are prototype-centered categories, which no reliable decision procedure can tease apart. An application of these notions to syntactic selectional processes seems to be just as undecidable, though fortunately, the morphological literature on this topic seems to suggest that most of the issues are not as dominant at the phrase level, if only because syntactic formations seem to be more regular and used innovatively less consciously. At the same time the more egregious problem, it seems to me, is the one discussed in relation to 'metaphorical creativity', namely that of ruling out cases where a syntactic head is 'misused' to create an atypically filled a-structure. This is in essence the more prevalent type of 'irregular' use of bases in syntactic derivation, or in other words the syntactic locus where an otherwise 'inadmissible' building block is used to derive a higher structure, by being made valid for an exceptional case. Identifying such 'misuse' or coercion (see Chapter 5, Section 1) reliably is probably just as intractable as the morphological distinction which Bauer abandons as impractical, but as I have argued above, the more important issue may be a matter of determining which evidence belongs to which attested slot. This problem is just as relevant in telling apart the behavior of any two homonymous structures, lexically identical on the surface, but which refer to different senses of a slot filling process. If we wish to separate the cases where one can fly through [something] figuratively, as in Lakoff's example, from those where a bird or other flying entity physically flies through something, we should most certainly wish to distinguish the possible arguments of fire (e.g. pottery in a kiln) and fire (dismiss from work), or any other homonyms. This problem, word sense disambiguation (or even determining which senses there are to disambiguate, cf. Cruse 2002), will thus be one of the main difficulties in operationalizing and automating productivity measures for syntax, a problem I will return to briefly in Chapters 4 and 5. It therefore seems that identifying, and hopefully also quantifying, productivity in syntactic argument selection is a feasible endeavor, comparable with its equivalent in morphological word formation, as long as the relevant slots can be defined both syntactically and semantically.

6. Roadmap: Towards a productivity complex

The discussion up to this point has hopefully made it clear that productivity is not a single property but a manifold collection of phenomena with rather fuzzy borders, or a "syndrome of properties" (Plag 2006: 549). It is therefore important to clarify what an empirically based quantification of productivity for syntax should set out to do and what some of its applications might be. Given the multi-faceted nature of productivity, the goal will not be finding a formula to produce a single number relativizing productivity for all constructions. This is not only difficult or impossible, since many of the objects under investigation are not comparable (different positions on the prototype scale from productive to creative, admissible for different productivity criteria), but in my opinion also undesirable, since we would be losing all of the insight gained by breaking down the factors associated with productivity. In this sense I agree with Baayen in comparing and interpreting multiple measures of productivity as different aspects of the phenomenon, constructing a multidimensional space for productivity (especially as in the two dimensional model of vocabulary size versus proportion of hapax legomena, words occurring only once in a corpus, in Baayen and Lieber 1991: 819, see Section 7). At the same time I will intentionally avoid the term 'global productivity' with which Baayen has aimed to reach one optimal productivity ranking for word formation processes based on complex measures factoring in the more simple measures that make up the dimensions of productivity.

The goal for the following chapters is therefore to find a well-defined multi-dimensional model of syntactic productivity which addresses the aspects mentioned so far along the lines of Baayen's multiple measures for morphology. For each of the multiple dimensions it must be shown that they correspond to actual phenomena in data (i.e. they have predictive power statistically) and that they have a linguistic interpretation (i.e. they have theoretically explanatory power), either by corresponding to reproducible intuitions or by predicting other established theoretical linguistic precepts, conceivably either in synchronic grammar or in diachronic development. Measures fulfilling these criteria will become part of the Productivity Complex (PC). Their validity may be falsified by presenting counterevidence to their predictive power or linguistic soundness. The latter would be undermined if the measures predict nothing, but also if they predict something which is meaningless to us. In this respect it is important to compare the performance of the measures for target constructions as well as for 'meaningless' data (e.g. arbitrary sequences of characters defined in some way), so that we can understand what it is that the measures predict. Finally, since a theory of syntactic productivity should be parsimonious, even if a measure can be shown to be predictive and interpretable, it should not be admitted if it is redundant next to another measure. To guard against this possibility, possible correlations between measures should be empirically evaluated and linguistically interpreted so that the most orthogonal and expressive dimensions are chosen. With the optimal set of dimensions at hand, it can be hoped it will become possible to cluster data from particular areas in the n-dimensional space and interpret such areas as certain theoretically justifiable syntactic types.

The applications of measurable criteria for syntactic productivity range from concrete uses in explaining certain phenomena and predicting certain types of behavior in language data, to very abstract uses in theory building, in particular with regard to their implications for the way speakers learn what the extensible rules are in their language. At the most basic level, productivity measures should allow us to compare two or more slots with respect to their repertoire and potential to innovate. They should thereby also indicate how close a structure is to exhausting itself, which is also applicable to diachronic explanations and predictions about lexicalization and idiomatization over time. Beyond such descriptive comparisons, valid productivity measures can provide us with predictive knowledge, recasting the fact that we encounter novel forms at a certain position in a new light. For instance, knowing how productive a slot is can be used to disambiguate syntactic structures. Given that we encounter an unfamiliar item we should assume we are dealing with the more productive of several possible constructions.31 Thus syntactic productivity allows us to treat encountering a new item as a sort of knowledge: knowing that we do not recognize an argument becomes informative if we can show that some structures are more prone to such arguments.

31. Whether hearers use this knowledge in parsing utterances is an open question; but such probabilities can certainly be used in computational tasks to make better predictions for unseen types, which are particularly problematic for natural language processing, see Zeldes (2009).

However, empirically demonstrating valid measures of syntactic productivity has the most important consequences for explaining language acquisition, stability and change over time. Given that successive generations of speakers produce more or less the same productivity ratings for the same constructions, it must be assumed that these result directly or indirectly from something that they learn. If it can be shown that productivity in syntax is distinct and not fully predictable from lexical semantics and world knowledge, it would follow that it belongs to the combinatory language system itself. Productivity measures could then trace the properties of usage that lead children, but to some extent also adults, to prefer to extend the same slots as their linguistic role models (their parents, peers etc.), and in turn produce the same sort of distributions, thus perpetuating the system. Because each neological case in a role model's linguistic performance is perceived as attested by the learner, slight changes are bound to occur in argument selection input (this also applies to adult speakers to the extent that their grammar is pliable), which in turn lead to diachronic change. In this way a theory of syntactic productivity becomes an essential tool for explaining how speakers learn to identify productive rules in language input, and how new rules come about or how old ones cease to function. Needless to say, this can only be part of the equation for selectional processes, especially next to semantics and world knowledge; how central a role this tool plays in the acquisition and constant renegotiation of rules remains to be shown as the next chapters present and develop concrete measures for case studies.

Chapter 3
Morphological productivity measures

In this chapter, empirical measures to quantify different aspects of productivity will be put forth and tested according to the first criterion laid out in Section 6 of the previous chapter: consistency and reproducibility of results that are in line with linguistic intuition. Empirical productivity measures have an added value and go beyond intuition by objectively and automatically quantifying differences in the various aspects of productivity. The discussion here will be driven by a survey of work in morphological productivity, focusing on Baayen's measures already mentioned in Chapter 2, Section 4, as well as some statistical models predicting the behavior of unseen data. For space reasons I will forego well-documented details such as mathematical proofs for the more complex models presented here; the interested reader is referred to the literature on morphological productivity for the relevant equations where appropriate. After some methodological remarks, measures will be tested for consistency on morphological data, and a selection of these will be adapted to, and tested on, syntactic argument constructions in the next chapter. This chapter ends with a shortlist of candidate measures for the multidimensional Productivity Complex (PC) outlined in the previous chapter.

1. Methodological remarks on testing productivity measures

Before approaching the task of measuring productivity in syntactic argument selection, it is worth examining the usage of productivity measures in the morphological literature in depth, and verifying their meaning and validity. This will give us an idea of what we can expect as a 'good result' for syntax, based on the morphological domain, in which each of the measures under consideration in this chapter has been deemed useful in previous studies. Although it seems plausible that many word formation processes will exhibit consistent values for productivity measures, the methodology outlined in Section 6 demands that the following be demonstrated for each measure we use:

a. The measure is reproducible (at least for comparable text types)
b. It has a linguistic interpretation
c. Its results are not nonsensical when applied arbitrarily to linguistic data which does not result from any productive process

Condition a. can be tested by partitioning data into multiple samples and comparing the values of the measure in each sample. Condition b. is more difficult to establish beyond question, but an attempt will be made to find an operationalized mathematical description for each measure and a plausible intuition linking it to the theoretical concepts described in the previous chapter (e.g. profitability, availability etc.). Finally condition c. provides a special challenge, since although it may seem as though one could simply take random tokens and apply a measure to them, randomness assumptions dictate that any totally random sample will contain about equal proportions of each productive and unproductive process in the language, leading to some sort of average productivity across the board (in fact, if we believe each process is used with a certain probability, this is precisely what we should expect). I will demonstrate that this is so by applying the measures below to an apparently 'nonsensical' process and one which is arguably no process at all: items selected by a meaningless orthographic criterion and non-productive series of pronouns. Results for these items can then be compared with other known more or less productive processes.

2. Using type counts: V

The most naive approach to quantitatively comparing productivity for (morphological) processes is probably to simply compare type counts, i.e. how many distinct realizations a process has (usually lemmas, though in the context of inflectional morphology individual word forms are of interest as well). It remains to be operationalized, of course, what exactly constitutes a type and whether these types are counted in a corpus (and if so in which or what kind of corpus?), elicited under experimental conditions from speakers or estimated from lexical resources such as dictionaries. Especially the last option has often been criticized as unreliable for morphological studies, as dictionaries generally do not list compositionally transparent formations because "dictionary-users need not check those words whose meaning is entirely predictable from its [sic] elements, which by definition is the case with productive formations" (Plag 1999: 96); such cases are however precisely the locus of productive behavior. For regular syntactic structures, large scale lexical resources are not generally available, so that the relevant data for counting types can essentially only come from performance data (as we shall see in Chapter 4, because of the combinatory nature of syntax, the amounts of data required for analyses comparable to those in morphological studies are an order of magnitude larger, and steer the effort strongly towards very large, written electronic corpora).

The type-counting approach, but also its criticism, form the starting point for many discussions of empirical productivity measures, and there is general agreement that high type frequency correlates with productivity (see e.g. Goldberg 1995: 134, Croft and Cruse 2004: 308–313, Bybee 2006, Barðdal 2006, 2008 among many others). Bauer (2001: 145) attributes the first formalized attempt at an empirical productivity measure to Aronoff (1976: 36), who points out that 'it isn't fair' to equate type frequency with productivity, since, as already discussed in the previous chapter, some intuitively very productive processes exhibit few types on account of a small class of available bases. Nevertheless, Baayen and Lieber (1991) use empirical type counts as one dimension of a two dimensional model of global productivity (see Section 7 below), and Baayen (2009: 901–902) justifies the intuitive meaning behind the observed vocabulary or 'realized productivity' in his terms (also 'extent of use' in Baayen 1993: 181), by equating it with the 'past achievement' aspect of Corbin's (1987) notion of profitability discussed in the previous chapter. That is to say, according to Baayen, a process has been productive in the language system to the extent that types stemming from it are in evidence in contemporary data, quite independently of the question of whether these types are currently being derived using the relevant synchronic process by speakers. Baayen formalizes the notion of a type count or vocabulary size for a category C in a corpus of the size N tokens as:

V(C, N)    [1]
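To make the notation concrete, the following minimal sketch (in Python, which is used for illustration throughout; the toy corpus and the naive suffix-based membership test are hypothetical stand-ins for the hand-filtered extraction described below) computes V(C, N) as the number of distinct types of a category found in the first N tokens of a corpus:

def V(tokens, in_category, N):
    """Naive realized vocabulary V(C, N): the number of distinct
    types from category C observed in the first N corpus tokens."""
    return len({t for t in tokens[:N] if in_category(t)})

# Hypothetical toy corpus; real studies use millions of tokens.
corpus = ["das", "backlight", "ist", "austauschbar", "und",
          "skalierbar", "aber", "langsam", "austauschbar"]

# Stand-in membership test for the -bar process (no filtering of
# false positives such as 'taskbar' is attempted at this point).
is_bar = lambda tok: tok.endswith("bar")

print(V(corpus, is_bar, len(corpus)))  # -> 2 (austauschbar, skalierbar)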

In order to test this measure I will attempt to reproduce and cross-validate results from Lüdeling, Evert, and Heid (2000) for German adjective derivations, specifically for the processes generating adjectives with the suffixes -bar (roughly equivalent to English -able), and -sam (related to English -some), as in (20) and (21) respectively (relevant forms are emphasized in boldface throughout).

(20) Das Backlight ist austauschbar.
     the backlight is replaceable
     'The backlight is replaceable.' [c't, pos. 599961]


(21) die Tatsache, daß das Programm […] sehr langsam wird
     the fact that the program very slow becomes
     'the fact that the program […] becomes very slow' [c't, pos. 3770829]

The latter process is considered to be 'at best marginally productive' (Lüdeling, Evert, and Heid 2000: 58) and rather limited in its vocabulary, and has even been used in school grammars to exemplify incorrect productive formations (e.g. an exercise in Marenbach and Gärtner 2010: 35 where incorrect novel forms such as dehnsam 'stretchsome' and schwatzsam 'babblesome' must be replaced with correct, established adjectives).32 By contrast, -bar has a large vocabulary and generates novel forms routinely from transitive verb stems.33 Note that in comparing these suffixes, the question for now is not whether or not the class of bases available for -bar is larger than for -sam, but rather whether or not we can get similar results from our productivity measure, in this case similar type counts, in multiple datasets with acceptably low variance.

To examine the two suffixes I will use a corpus of approx. 14.5 million tokens from 5 years (1998–2002) of the German computer magazine c't-Magazin,34 which contains a variety of texts such as editorials, reviews, guides, letters to the editor etc. The corpus was tagged for part-of-speech and lemmatized automatically using the freely available TreeTagger (Schmid 1994) and searched using the Corpus Workbench (Christ 1994). As pointed out by Lüdeling, Evert, and Heid (2000), naively searching for all adjectives ending in -bar or -sam could lead to errors and requires special attention in several respects:

1. Automatic lemmatization cannot be relied on for neologisms, thus unlemmatized word forms should be used and possible inflectional endings must be removed (e.g. machbares 'doable-NOM.SG.NEUT.WEAK' should be reduced to the base form machbar).
2. Forms coincidentally ending in the same string must be removed, e.g. taskbar '(a/the) taskbar' is not an adjective and does not contain the suffix -bar.
3. Processes applying after the derivation has transpired should not be considered, namely:
   - further derivations, e.g. negation with the prefix un- in unmachbar 'not doable'
   - compounding, e.g. schneckenlangsam 'snail-slow, slow like a snail'
   These must be reduced to the initial derivation types (i.e. unmodified machbar and langsam respectively). Note that it is irrelevant whether or not a modified form can be assumed to be the primary or more entrenched form in some cases, e.g. unabdingbar 'indispensable' but only very rarely abdingbar 'dispensable'. Since neither choice alters the type count, both forms can simply be collapsed into one type.
4. Orthographic errors must be corrected, since these can inflate the type count by making two instances of a word appear to be different, e.g. konigurierbar > konfigurierbar 'configurable'.

32. As it turns out, such forms are actually attested, albeit rarely, see Chapter 3, Section 4.

33. There are in fact several subtypes of derivation with -bar, not all of which are productive or operate on transitive verb stems, e.g. a few denominal cases like fruchtbar 'fruitful', or various non-transitive ones like verfügbar 'available', whose corresponding verb takes a prepositional argument (see Riehemann 1998, Siebert 1999: 17–88 for in depth analyses). For the sake of simplicity and to ensure comparability with previous work I will not distinguish between subtypes here, neither for -bar nor for -sam. The question whether lexicalized cases such as the above should be dismissed will be returned to in the syntactic context in Chapter 5, Section 3.

34. To gain access to the corpus, a free login can be applied for under http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/institutkorpora.

These problems mean that results must be filtered and corrected by hand, though some of the work can be done semi-automatically as well. To assess variability in the measure V, the c't corpus is divided into ten equal portions,35 a search for adjectives ending in -bar or -sam plus possible inflectional endings is carried out, and post-processing is applied to correct the above problems. Table 5 gives the results for V in each subcorpus and in total.

35. It is important that the subcorpora be continuous texts and not each a random tenth of the word forms. Randomizing the order of words would force the process to approach a binomial distribution with a homogeneous mean approaching the total mean and a variance of n·p·(1-p) in each subcorpus, where p is the probability or proportion of words from the process in the corpus and n is the sample size. The corpora would thus be indistinguishable and tell us nothing about what we can expect in natural data. I thank Felix Golcher for commenting on this point.
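A rough sketch of how the semi-automatic part of this extraction might look is given below; the false-positive blacklist, spelling-correction table and inflection pattern are illustrative stand-ins for the hand-curated resources described above, not the ones actually used in the study:

import re

# Illustrative stand-ins for hand-curated resources:
FALSE_POSITIVES = {"taskbar", "nachbar"}           # coincidental -bar endings
SPELLING_FIXES = {"konigurierbar": "konfigurierbar"}
# Strip German adjectival inflection (-e, -er, -es, -en, -em).
INFLECTION = re.compile(r"(e|er|es|en|em)$")
PREFIX_NEG = re.compile(r"^un")                    # crude: would also hit unter-

def extract_types(tokens, suffix="bar"):
    """Collect candidate types for a suffix, undoing inflection,
    un- prefixation and known spelling errors; compound reduction
    (e.g. schneckenlangsam) would still require manual work."""
    types = set()
    for tok in tokens:
        base = INFLECTION.sub("", tok.lower())
        if not base.endswith(suffix) or base in FALSE_POSITIVES:
            continue
        base = SPELLING_FIXES.get(base, base)
        base = PREFIX_NEG.sub("", base)            # unmachbar -> machbar
        types.add(base)
    return types

print(len(extract_types(["machbares", "unmachbar", "taskbar"])))  # -> 1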

Table 5. Type counts for -sam and -bar in each of the 10 subcorpora. The count for all corpora together is not a sum of the other columns, since many of the same types appear in more than one subcorpus.

        1    2    3    4    5    6    7    8    9    10   mean   all
-sam    27   23   30   27   28   27   25   33   26   26   27.2   43
-bar    284  255  287  290  264  282  275  288  297  300  282.2  716

As we can see, V recognizes the fact that -bar is intuitively more productive than -sam. Encouragingly, numbers for each process do not differ very much between subcorpora. Statistically speaking, there are no significant differences in the proportion of different types between the subcorpora, neither for -sam nor for -bar (p=0.9813 and p=0.7088 respectively in a binomial test of equal proportions). Use of these suffixes is rather homogeneous in the subcorpora, and the variety of types one can expect to find in approx. 1.5 million tokens of similar text (the size of each subcorpus) is fairly constant. This is good news for V as a reliable measure, as it can be shown to fulfill condition a. above: it is reproducible within the same text type or register.

At the same time V performs similarly to results in Lüdeling, Evert, and Heid (2000), where V values of 544 and 53 were observed for -bar and -sam respectively, in approx. 36 million tokens of newspaper text. However, taking a same sized sample from Lüdeling et al.'s data for comparison,36 it turns out that -sam receives a slightly higher score of 47 rather than 43 types, and -bar performs much lower with 400 instead of 716. Assuming the difference is owing to the text type opposition 'newspaper' versus 'computer magazine', this clearly shows that while the ranking and order of magnitude for differences between processes remains comparable, genre does have a marked effect on individual scores (cf. Plag, Dalton-Puffer, and Baayen 1999 for a systematic study of productivity and register). A very tentative explanation can be offered in that computer magazines tend to require adjectives describing capabilities for reviewing properties of computer hardware and software, and what is more, they may produce a variety of technology related neologisms which will be absent from newspapers. Some typical contexts for such cases can be found in the following examples.

36. I thank the authors for making the original hand filtered data sets available to me.


(22) schließlich basieren sie auf PCI, sind skalierbar, leicht synchronisierbar und kostengünstig
     finally base they upon PCI are scalable easy synchronizable and cost-cheap
     'after all they are based on PCI, are scalable, easily synchronizable and cheap.' [c't, pos. 10823286]

(23) libstdc++.a, die sich noch in Entwicklung befindet und derzeit nur statisch linkbar ist.
     libstdc++.a which REFL still in development situates and currently only statically linkable is
     'libstdc++.a, which is still in development and currently only statically linkable' [c't, pos. 3790917]

Thus a varying pragmatic need for a certain process can influence productivity, and consequently also productivity measures. To quantify the degree of variation we can expect from V within the computer magazine genre, we note that the variances of the two c't samples are also low for the average type counts (197.73 and 7.51 for averages of 282.2 and 27.2 for -bar and -sam respectively, meaning standard deviations of only 14.06 and 2.74 types), as the boxplot in Figure 3 illustrates graphically. Even though there is one marginal outlier (the point above the -sam plot's top whisker, showing subcorpus 8 with the maximal 33 types), variance is relatively low for -sam, and -bar spans a range of only 50 types difference, proportionally even lower considering the mean. If we compare the normalized coefficients of variation cv, dividing each standard deviation by its mean, we find values of only 0.049 for -bar and 0.1007 for -sam, i.e. variation between 5–10% of the mean. If we consider the reasons for variance in V, it is quite possible that some of the differences do not come from the vocabulary of each process itself, but may rather arise if a subcorpus happens to contain more or fewer hits of the relevant category, even though the process would generate equally varied types if it were as common in each subcorpus. The measure V thus conflates two types of variance: variance introduced by communicative needs (whether or not the need was felt to produce as many potentiality adjectives in -bar in each subcorpus etc.) and variance coming from productivity in the process itself (how apt it is to produce further types, see below).
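The consistency figures just discussed are easy to verify computationally. The following short sketch uses the Table 5 counts; since the ten subcorpora are of equal size, the equal-proportions test can be approximated by a chi-square goodness-of-fit test over the raw type counts:

from statistics import mean, stdev
from scipy.stats import chisquare

counts = {
    "-sam": [27, 23, 30, 27, 28, 27, 25, 33, 26, 26],
    "-bar": [284, 255, 287, 290, 264, 282, 275, 288, 297, 300],
}

for suffix, v in counts.items():
    m, sd = mean(v), stdev(v)
    chi2, p = chisquare(v)  # H0: types are spread evenly over subcorpora
    print(f"{suffix}: mean={m:.1f} sd={sd:.2f} cv={sd / m:.4f} p={p:.4f}")

# Expected output (up to rounding):
# -sam: mean=27.2 sd=2.74 cv~0.101 p~0.981
# -bar: mean=282.2 sd=14.06 cv~0.050 p~0.709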


Figure 3. Boxplots of V for -bar and -sam. The thick lines in each box represent the median, the cross is at the mean value, while the boxes represent the area between the first and third quartile. The whiskers above and below the box represent results within 1.5 times the interquartile range, while individual points outside this area are considered outliers.

Given that V results from both choice of a process and the variety of forms it generates, its mathematical meaning becomes clear: V(C,N) is a sample-based estimator for the number of types from a category C which we expect to find in N tokens of a certain type of text. With growing N, the measure V will grow increasingly slowly (indeed the totals for each process from all subcorpora are much less than 10 times the average above). Linguistically speaking, V characterizes the need for forms from C in speakers' language usage to date: how many different types, and correspondingly concepts, does C encode, and how often are they used? While clearly this is in part due to the distribution of corresponding


ontological types in the external world (e.g. agent noun formations expressing professions are common since we encounter and discuss many humans with relevant professions), there is also a language internal dimension determining how we encode elements of reality, which also influences V. Chapter 5 will pick up the debate about this distinction in detail. If V is linguistically meant to quantify the extent of use in terms of concepts expressed by forms of C, then we might expect that V does not deliver coherent results for arbitrary collections of data. To test this, we can repeat the procedure used on -bar and -sam with ‘meaningless’ categories, e.g. all words beginning with Q- vs. T- in the same corpus. From a semantic point of view, these groups of words should not represent any common concept, though phonologically and etymologically speaking they have important commonalities. As it turns out, Q- and T- deliver somewhat similarly stable results: Table 6.

QT-

Type counts for words beginning with Q- and T- in each of the 10 subcorpora with total sums and mean values.

1 2 3 4 5 6 7 8 9 10 mean all 359 341 322 326 354 388 398 356 360 362 356.6 2089 3437 3280 3188 3365 3203 3324 3201 3344 3385 3216 3294.3 17520

The differences between the subcorpora are statistically significant for Į=0.05 only for T- (p-values of 0.1092 and 0.009726 for Q- and Trespectively in a binomial test of equal proportions), but they are rather small in both cases, with standard deviations of only 23.87 and 89.20 types and coefficients of variation of only 0.066 and 0.027 respectively, i.e. 2–7% of the mean. This small difference is rather disappointing. It seems that having a more or less consistent vocabulary size is no special property of meaningful processes: there is no reason for one 1.5-million-token sample to have many more words beginning with Q- than another. At the same time it also makes sense linguistically that there are more types in German beginning with T- than with Q-, since the latter is much more restricted, appearing almost exclusively in words of foreign origin. In this sense, Q- truly is less productive than T-. There are less German words beginning with Q, but these words are in turn generated by a variety of more or less productive processes including different derivations, compounding, and so forth.

Token counts and in-category vocabulary

57

To sum up, V is a reliable, reproducible measure of the extent of use of a process (at least at the level of words), although it cannot distinguish ‘sensible’ processes from nonsensical ones, since even these will encompass activity from various meaningful processes. It remains up to other factors in the grammar to decide which processes are evaluated, and especially to ensure that a sensible meaning or corresponding concept is expressed by the process in question. Another problem with V is the conflation of the extent of use for the process with the variety of types whenever it is used, a problem which will be addressed in the next section. Finally, V also deals indiscriminately with all lexemes – all words that have already been generated, novel or established – and hence it cannot tell us how likely it is that novel words will be created using a certain morphological process. This issue will be discussed further in Section 4. 3.

Token counts and in-category vocabulary: N(C), f(C) and VC

As mentioned above, use of V to assess vocabulary size is rather common as a starting point for empirical studies of productivity. When used on an entire corpus, V is an estimator for mean vocabulary size for the sample size in question. However, V only tells us how many types of the category C a process produces, provided that category C is used in the corpus. This factor can be influenced by meaning or communicative needs: it is quite possible that some texts contain a high number of formations expressing potentiality with -bar simply because they deal with what a certain subject can or can’t do. If another text is less prone to express this meaning, the extent of -bar will be lower, telling us something about how frequent -bar is in total, but not how varied -bar adjectives are provided that a -bar adjective is used. For this reason, it makes sense to take a look at typetoken ratios or in-category vocabulary, that is, how many different types a process produces in N tokens of the relevant category. Let us refer to this quantity as a function Vc. Vc is defined as the vocabulary size of a category C in N(C) tokens belonging to the category C, and may be written as follows: Vc (C , N (C ))

[2]

The quantity N(C), using Baayen’s terms, is different from a simple corpus size N in that it does not include any other tokens from the corpus, except those that come from C.

58

Morphological productivity measures

In the previous section we counted a total of 7,672 hits for -sam and 26,797 hits for -bar after post-processing in the c’t corpus. In order to get an idea of the variance in Vc, the hits for each process are divided into ten equal sized samples.37 The data for the ten samples for each process can summarized as in the following table. Table 7.

-sam -bar

1 26 278

Type counts for -sam and -bar in 10 equally sized samples from each process. 2 23 259

3 31 280

4 25 282

5 28 274

6 26 276

7 25 275

8 33 293

9 26 295

10 26 299

mean all 26.9 43 281.1 716

Vc has very slightly higher variance than V for -sam at 8.98, but substantially lower for -bar at 140.98. Comparing the normalized coefficients of variation, we find cv(-sam) = 0.1114553 > cv(-bar) = 0.04224075, meaning -bar is more stable than -sam. Note that despite a variation of some 11% of the mean for -sam, more than that found for Tand Q- above, differences between the samples are not significant at this sample size (Į=0.05, test of equal proportions). Since Vc eliminates the variance contributed by different frequency of use for each process in each subcorpus, it appears that -bar was more equally distributed in the c’t corpus. While the difference is not enormous, it may become more significant in more sparsely attested data categories, which will be of interest to us when applying the measures to syntax. In measuring in-category vocabulary across subcorpora, Vc should thus be preferred, since it isolates just one aspect of productivity: the size of the argument spectrum. Since most studies deal with a single sample, V and Vc can be used interchangeably, though Vc can also be implied by specifying that V is being measured for a sample of items of the size N(C), rather than a corpus of N tokens. In order to avoid introducing unnecessary new 37. This means that two hits must be dropped for -sam, so that the ten samples are exactly equal. This is important, since type counts grow with increasing difficulty the more types we have seen: the rate of addition for new types is expected to drop as the sample size grows (see Section 5 for a more detailed discussion). The order of items has been retained, so that each tenth of the sample comes from contiguous data. This means the data gives a realistic idea of any effects of topic bias, i.e. skewed dispersion of the category due to particular texts using more or less forms from a certain process (see also Section 6).

Token counts and in-category vocabulary

59

notations and to ease comparability with other works, this practice will be followed here too, and the notation V will be used for in-category vocabulary both in single samples and in multiple samples where the reference size is given as N(C). Data will therefore be divided into an equal sample size N(C) and not equal sized subcorpora with N tokens. This leaves the other aspect of the extent of use, namely the frequency with which a process is used in the language. If Vc is the extent of type variety given that a process is used, then the sheer number of tokens from a category C within a corpus is an estimator for the probability that the next item, if we increase our corpus by one token, will belong to C. Thus N(C) itself, or for better comparability f(C)=N(C)/N (the relative frequency of the process, token count divided by corpus size), estimates the extent of use for a category: how many tokens in the corpus are -bar adjectives? Measurements of N(C) for our two processes in the equal sized c’t subcorpora are given below (for f(C) the figures may be divided by the size of the corpus, approx. 14.5 million tokens). Table 8.

Table 8. Token counts for -sam and -bar in 10 equal sized subcorpora of some 1.5 million tokens each.

Subcorpus   1      2      3      4      5      6      7      8      9      10     mean    all
-sam        796    693    715    735    803    857    758    782    779    754    767.2   7672
-bar        2763   2596   2844   2650   2510   2766   2626   2580   2706   2756   2679.7  26797

For N(C) the differences between the subcorpora turn out to be significant: -sam and -bar are not used with the same frequencies throughout the corpus (for α=0.005). However, unlike V_C, N(C) does not grow or change pace over the course of the corpus – it does not get progressively harder to use the same process, as long as we do not demand novel output. Therefore estimates of N(C), while more sensitive to content, are in principle valid for any corpus of a substantial size. To ensure stable measurements we would ideally need a larger sample and a more homogeneous corpus; but even in the present samples, we can see that individual probabilities are quite similar to the total value for the corpus. For better reliability it is also possible to take a confidence interval based on multiple samples instead of estimating the exact value.
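The variance and significance comparisons above are easy to reproduce. The following minimal base-R sketch recomputes the coefficients of variation from Table 7 and applies a test of equal proportions to the token counts in Table 8; the per-subcorpus size of 1.45 million tokens is an approximation based on the total corpus size of roughly 14.5 million tokens given above:

    # Type counts per sample (Table 7) and token counts per subcorpus (Table 8)
    V_sam <- c(26, 23, 31, 25, 28, 26, 25, 33, 26, 26)
    V_bar <- c(278, 259, 280, 282, 274, 276, 275, 293, 295, 299)
    N_sam <- c(796, 693, 715, 735, 803, 857, 758, 782, 779, 754)
    N_bar <- c(2763, 2596, 2844, 2650, 2510, 2766, 2626, 2580, 2706, 2756)

    # Normalized coefficient of variation: cv = sd/mean
    cv <- function(x) sd(x) / mean(x)
    cv(V_sam)   # ~0.111
    cv(V_bar)   # ~0.042

    # Test of equal proportions across the 10 subcorpora
    # (subcorpus size approximated as one tenth of ~14.5 million tokens)
    subcorpus_size <- rep(1450000, 10)
    prop.test(N_sam, subcorpus_size)
    prop.test(N_bar, subcorpus_size)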

4. Using hapax legomena: Baayen’s 𝒫* and 𝒫

As mentioned above, type counts are problematic for assessing productivity since they treat established forms and novel forms indiscriminately: both a familiar word like goodness and a neologism like dancerliness38 are equally types of the category -ness. To address this problem, Baayen (1989) developed the idea of using hapax legomena (Classical Greek for ‘said once’), i.e. words which appear only once in a corpus, which he uses in order to estimate the amount of neologisms in the corpus. The intuitive idea behind looking at such words is that productively created items are one-off unique occurrences, and therefore they must form a subset of the hapax legomena (hence HL) in a corpus.39 That this assumption is not strictly true should be obvious: neologisms arise because speakers require them for their communicative needs, and speakers who form neologisms are quite likely to use them several times in succession, as in Japaneseness in (24), which appears only twice in the BNC.

(24) The trappings of Japaneseness are worn differently. British workers felt uncomfortable with the slogans common in Japanese factories […] Superficially, there are other signs of Japaneseness — but they are often more ‘Japanese’ than many factories in Japan itself. [BNC, doc. ABE]

At the same time, neologisms can only be a subset of HL, since some words may coincidentally appear only once yet still be familiar to the speaker or writer without being ‘productively’ formed neologisms. In fact, if we look at HL for the suffix -sam in the c’t corpus, filtering for compounding etc., we find only seven types, of which six are listed in a medium-sized dictionary (Duden Bedeutungswörterbuch, Müller 1985): furchtsam ‘tremulous’, betriebsam ‘industrious’, sittsam ‘demure’, strebsam ‘striving’, arbeitsam ‘hard-working’ and duldsam ‘acquiescent’. These are simply very rare, but lexicalized words that may be familiar to many (but possibly no longer all) speakers of German. The one unlisted case, zweisam, literally ‘two-some’, is possibly formed by analogy to einsam ‘lonesome’ using the numeral zwei ‘two’ and can be used as in the following example:

(25) Zweisame Abende bei Kerzenlicht sollte man also tunlichst nicht von AMP3-Musik untermalen lassen.
     two-some evenings by candlelight should one so doably not from AMP3-music accompany let
     ‘As far as possible one should therefore not let evenings for two by candlelight be accompanied by AMP3 music’ [c’t, pos. 4417180]

While some speakers find this word familiar, others do not, but accept it as a neologism and also accept further unattested forms of the sort dreisam ‘three-some, for three’, viersam ‘four-some, for four’ etc., making this series of formations a possible productive enclave within this otherwise very unproductive derivational process.

Using HL as stand-ins for neologisms can therefore only be regarded as a heuristic. Yet their greatest advantage is that they can be unequivocally operationalized for a given corpus, whereas there is no reliable decision procedure for whether or not any particular word is novel. It is therefore an empirical question whether the results of this heuristic are satisfactory, i.e. whether intuitively more productive processes also score higher in the attestation of HL. It is also easy to imagine that the problem of low accuracy (i.e. not every HL is perceived as novel) can be minimized by using a larger corpus (progressively rarer words should not be familiar, though in practice they will also include some noise such as typos), whereas the converse problem of missing neologisms used multiple times can be addressed by looking at higher frequency bands (dis or tris legomena, viz. words that appear twice, three times etc.).

If we look at Web data, it is quite possible to find novel adjective forms with the suffix -sam which hardly any speakers would rate as familiar, although their grammaticality might be called into question. For instance, example (26) shows a formation which is explicitly ruled out as erroneous in a school grammar by Marenbach and Gärtner (2010: 35):

38. This example in context reads “The work’s appeal is also the ‘dancerliness’ of its movement - rare in the recent climate of physical theatre”, and is taken from the April project website, see http://rdues.bcu.ac.uk/newwds/19943.html (last accessed 6.9.2012).

39. Ironically, the term hapax legomenon itself was originally used by Hellenistic scholars in the analysis of Greek Classics such as the works of Homer, and usually referred to obscure words of unknown meaning or unusual form, often archaisms preserved for metrical reasons. These forms were hardly ever productive formations, but just the opposite. Nevertheless, in modern electronic corpora, most neologisms will probably be hapax legomena.


(26) das kannst Du dann mit Sicherheit mit dem Piercer abklären was er empfiehlt.. ich denke wenn der Kanal verheilt ist, ist er ja auch sehr dehnsam
     that can you then with certainty with the piercer check what he recommends I think when the channel healed is, is he PTC also very stretchsome
     ‘You can definitely check with the piercer what he recommends.. I think once the channel is healed it’s after all very stretchsome’ [From the nail and piercing forum http://www.schoenere-naegel.de, accessed 28.4.2009]

Though most speakers of German would expect a familiar adjective like dehnbar ‘stretchable, flexible’ here, there is nothing to indicate that the writer has not coined this word spontaneously, either more or less intentionally (i.e. possibly involving conscious ‘creativity’), but the word is certainly very unusual and rare.40

40. The writer is also, to the best of my ability to ascertain this from her posts to the forum, a native speaker of German.

The aforementioned formation with numbers as a base is also attested sporadically with higher numbers, as in (27), which would probably be accepted by most speakers:

(27) auf jeden Fall zuviel Alkohol in manch viersamer Stunde
     on every case too-much alcohol in some foursome hour
     ‘in any case too much alcohol in some foursome hour’ [From http://www.myspace.com/eatlessbread, accessed 6.9.2012]

In any event, the scarcity of hapax legomena in the rather small corpus used here seems to work well as an estimate of the unproductivity of the -sam formation, making a measure based on HL an attractive option to test for a correlation between intuitions on productivity and empirical data.

Based on the assumption ‘neologisms ⊆ HL’, Baayen uses two numbers in deriving productivity measures for processes in a corpus: in his terminology, V(1,C,N) is the number of types occurring once from category C in a corpus of N words, and V(1,N) is the number of types occurring once in a corpus of N words (see Baayen 2009 for an overview). From these numbers Baayen derives a measure 𝒫* termed the hapax-conditioned degree of productivity, which is said to measure expanding productivity, i.e. the

rate at which a process is currently creating neologisms in the corpus.41 It is defined as the proportion of hapax legomena from the examined category C within the hapax legomena from all categories in the corpus, and is written as in [3].

    𝒫* = V(1,C,N) / V(1,N)    [3]

Intuitively, if the amount of hapax legomena could be replaced by ‘true’ neologisms only, this would be the relative contribution of a process to productivity in the corpus, which could then be compared between different processes.42 In order to compute this measure, we must know in general how many HL are attested in each subcorpus (again, this number will be smaller for the entire corpus than for the subcorpora, since HL from one subcorpus may be attested again in another). The subcorpora must of course be of equal size, as for the calculation of V.

A further measure based on hapax legomena is referred to as the category-conditioned degree of productivity, which measures the potential productivity of a process (Baayen 2009: 902), meaning how likely it is to produce new members, or how saturated a process is. The measure is defined as the proportion of hapax legomena from category C divided by N(C), the total token count from this category:

    𝒫 = V(1,C,N) / N(C)    [4]

41. Confusingly, Baayen also uses the notation P* for a measure of global productivity (e.g. Baayen 1993, see Section 7 below; this confusion has also been pointed out by Bauer 2001: 155). However, in some publications the measure 𝒫* is given as P* with a non-cursive ‘P’, perhaps for typographical reasons (e.g. in Baayen 2009). The same applies to the measure 𝒫, sometimes written as P, which is presented in [4] above. A strict separation of these notations is upheld here, and readers may orient themselves to the definitions in the respective equations.

42. This statement must be restricted somewhat: in items showing multiple processes, e.g. bullishness, the processes associated with the suffixes -ish and -ness are not statistically independent, creating a difficulty in using such cases for the comparison of such processes (see Baayen 2009: 903–904). The comparisons of -bar and -sam so far have been unproblematic in this respect, since they are mutually exclusive.
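To make the quantities in [3] and [4] concrete, here is a minimal base-R sketch deriving V, V(1,C,N) and 𝒫 from a vector of category tokens; the six-token vector is invented purely for illustration:

    # Hypothetical tokens of a category C, in corpus order
    tokens <- c("machbar", "lesbar", "dehnbar", "lesbar", "machbar", "essbar")

    freqs <- table(tokens)        # token frequency of each type
    V     <- length(freqs)        # type count: 4
    V1    <- sum(freqs == 1)      # hapax legomena V(1,C,N): 2 (dehnbar, essbar)
    P     <- V1 / length(tokens)  # category-conditioned degree of productivity [4]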


This measure is probably the most frequently applied empirical productivity measure beside type counts in morphological studies (e.g. Plag, Dalton-Puffer, and Baayen 1999; Lüdeling, Evert, and Heid 2000; Lüdeling and Evert 2005; Gaeta and Ricca 2006; Štichauer 2009; Vegnaduzzo 2009, to name a few, and sometimes using different notation, e.g. Bolozky 1999: 83) and has also been applied in some of the few studies addressing syntactic issues (Kiss 2007, Zeldes 2009, 2011). Here the basic idea is that at every point within a growing sample, processes have an inherent probability of producing previously unseen forms, and that this probability decreases the more we have seen of a process’s output. As before, comparisons between processes for either measure can only take place at an equal sample size (see Gaeta and Ricca 2006): if one process is more common than another and we simply compare the proportion of HL after 100 items from process A and 1000 items from process B, then the comparison is unfair.43 This forces us to reduce the 1000-item process to just 100 items – whichever process then has the higher proportion of HL should be considered more productive. For this reason we use the ten equal sized samples from each process, as we did for V_C. For both measures there is some variance, but there are no outliers (Figure 4) and the data follows a normal distribution (insignificant Shapiro-Wilk test at α=0.05).


Figure 4. Boxplots for the variance of 𝒫* and 𝒫 for -sam and -bar.

43. It is also possible to draw comparisons between models predicting/estimating the development of the vocabulary (Evert 2004) and to form confidence intervals for the range of possible values of a measure for a subset of a sample using a larger sample (Säily 2011); however, these methods do not allow significance testing of the difference between two unrelated processes (more on this in the next two sections).


Table 9 gives the counts of hapax legomena for each process in each subcorpus and in the corpus in total, as well as the values of 𝒫* and 𝒫.

Table 9. Hapax legomena, 𝒫* and 𝒫 for -bar and -sam.

Sample  HL(ALL)   HL(bar)  HL(sam)  𝒫*(bar)      𝒫*(sam)      𝒫(bar)       𝒫(sam)
1       70195     105      2        0.00149583   2.8492E-05   0.03919373   0.00260756
2       69411     89       3        0.001282     4.32E-05     0.033221     0.003911
3       69607     120      8        0.001724     0.000115     0.044793     0.01043
4       70415     112      3        0.001591     4.26E-05     0.041807     0.003911
5       69324     110      6        0.001587     8.66E-05     0.04106      0.007823
6       70654     123      7        0.001741     9.91E-05     0.045913     0.009126
7       71736     107      5        0.00149158   6.97E-05     0.03994028   0.0065189
8       71267     127      11       0.001782     0.000154     0.047406     0.014342
9       70882     127      6        0.001792     8.46E-05     0.047406     0.007823
10      70925     126      5        0.001777     7.05E-05     0.047032     0.006519
mean    70441.6   114.6    5.6      0.001626     7.94E-05     0.042777     0.007301
all     356075    260      7        0.00073      1.97E-05     0.009703     0.000912

The values for 𝒫* are of necessity very low, since these processes contribute only a very small amount to the productivity in the entire corpus. But in principle, both measures are bounded between 0 and 1, where 0 results from no HL whatsoever, and 1 means that all HL in the corpus come from C (for 𝒫*) or that all tokens of C are hapax legomena (for 𝒫). As already mentioned, in order to compare results from two processes it is necessary to reduce the data from the more frequent process to the amount attested for the other. For example, since we have 7 HL in 7672 tokens of -sam adjectives, we need to see how many HL are found in the first 7672 tokens of -bar adjectives, which turns out to be 153. At this amount of data, the HL ratio stands at 7 : 153, and 𝒫 compares at 0.0009 : 0.0199, meaning that -bar is over 20 times more productive than -sam at this point (or more exactly, the probability that the next -bar adjective in the corpus after 7672 instances is novel is over 20 times greater than for the next -sam adjective after as many attestations). Since 𝒫* is also based on the HL count, and the total HL count in the corpus from any process is identical for both processes, 𝒫* will always deliver the same ratio between processes as 𝒫 if used on same-sized samples. It is therefore more logical to apply 𝒫* to the full samples and compare values, since its meaning relates to the portion of innovation each process contributes as a whole in a certain mass of data.
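As a quick check, the equal-sample comparison just described amounts to two divisions, with the counts taken from the text above:

    # P at the largest common sample size N(C) = 7672
    N_common <- 7672
    V1_sam   <- 7     # HL among all 7672 -sam tokens
    V1_bar   <- 153   # HL among the first 7672 -bar tokens

    V1_sam / N_common   # ~0.0009
    V1_bar / N_common   # ~0.0199, over 20 times higher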

Comparing the same processes in larger or smaller corpora and other registers again delivers somewhat different results: for example, Lüdeling et al. (2000) report lower 𝒫 values, though of the same order of magnitude, for -bar (0.0053) and -sam (0.0002), which is understandable since their corpus was larger and their genre seems to be less productive (especially for -bar). This does not have to be a problematic result: it is well established that different text types use different constructions more or less productively (Plag, Dalton-Puffer, and Baayen 1999), and the different results represent the slightly different usage-based conventions and communicative needs of each text type, just as for V. The important result, through which the validity of the measures could have been compromised, is that the distribution of these measures within each subcorpus from the same dataset is remarkably consistent, despite foreseeable differences in communicative needs between different texts.

The advantages of HL-based measures, as opposed to the vocabulary and frequency measures presented earlier, have already been alluded to: they do not treat all types alike but give those types which are likely to correspond to neologisms a special role. In this sense Baayen has likened 𝒫* to the current expansion rate of novelties in a process, while suggesting that 𝒫 gives us an idea of what we can expect from a process in the future: if the process has a low proportion of HL then it is almost saturated and not many neologisms are possible. A high proportion of HL means that the probability of further neologisms is still high, thereby implying more productive behavior in the future.

To concretely understand the difference between these measures and vocabulary-based ones, we can consider a third process: derivation of adjectives with the suffix -lich, which is intuitively quite frequent with a wide range of lexicalized adjectives as in (28), but can also be used productively as in (29), which shows a likely novel hapax legomenon constructed with this suffix.

(28) Er setzte ein freundliches Lächeln auf.
     he set a friendly smile up
     ‘He put on a friendly smile.’ [c’t, pos. 389199]


(29) Beim frühstücklichen Lesen unserer Lokalzeitung stieß ich auf die mitgeschickte Karstadt-Anzeige
     at-the breakfastly reading our.GEN local-newspaper bumped I upon the with-sent Karstadt-advertisement
     ‘During our breakfastly reading of the local paper, I ran into the enclosed Karstadt ad’ [c’t, pos. 3429848]

Since this suffix and the other two are mutually exclusive, we can directly compare the processes, which may be considered independent. For the entire c’t corpus, a corresponding manually filtered result set for -lich exhibits V=829, a somewhat higher type count or realized productivity than -bar. But although -lich is a very frequent suffix in German morphology, it is intuitively not as easy to generate novel forms which speakers might assess as previously unheard as with -bar. This is mirrored by its rate of HL: only 201 are attested in 120,458 tokens, leading to a 𝒫 value of 0.0016. However, for comparison with the other two processes we must measure 𝒫 after the same amount of tokens; after 7672 tokens, there are 124 HL, leading to 𝒫 = 0.0161 at N(C) = 7672, which, although higher than the full-sample value, is still lower than that of -bar, but much higher than the almost completely unproductive -sam. Thus we can say that -lich shows a slightly higher realized productivity than -bar in the c’t corpus, but a lower potential productivity.

Finally, to return to the apparently nonsensical example of words in T- and Q-, we discover here too that these ‘processes’ also have substantial proportions of HL. These will of course contain many productive results of a variety of processes, for items which happen to begin with T- or Q- – certainly every HL must begin with some letter or other. But here too, T- beats Q-: with 8183 to 1392 HL at the minimal common sample size, T- simply produces a greater proportion of HL, again explainable by the fact that Q- characterizes mostly foreign roots, which enter into fewer novel native combinations.

This does not mean that absolutely any process we choose will exhibit many, or indeed any HL. The selection of personal pronouns, for instance, exhibits only a few types in Modern English, depending on how one counts, perhaps seven: I, you, he, she, it, we and they (for simplicity we may treat the other case forms as an inflectional process, though the same could be said for person and number). If we posit a process generating these types, it will exhibit no HL even in a relatively small corpus. Thus HL-based measures are quite capable of assuming values of zero for truly unproductive processes with an enumerable output. But they are also inadequate for the unequivocal identification of


‘meaningful’ processes: they only tell us if a process is productive under the assumption that it is sensibly defined.

The meaning of HL-based measures in general should now become clear: they focus on quantifying the potential for innovation based on the probability of a new, previously unseen type, as opposed to the repetition of familiar types. In this function they are consistent, linguistically interpretable, and importantly correspond to intuitions about ease of generation for new forms. What they do not take into account, however, is the frequency spectrum beyond unique words (do items with the frequency 2 or 3 also matter? Or certain patterns of frequencies?), nor the development of HL within the corpus and beyond the current corpus size (how often and how uniformly do HL arise? When can we expect further HL?). But perhaps most importantly, HL-based measures are merely descriptive, in that they apply to a given observed sample size – they do not directly tell us how the category will behave for larger samples, or even more interestingly, how many types the entire vocabulary of a process can potentially manifest. These points will be discussed in the following sections.

5. Vocabulary growth, frequency spectrums, 𝒜 and θ

Though in measuring V we simply end up with a certain number of types in an entire corpus, these do not arise uniformly, since as discussed above, it becomes progressively harder to come up with new items as more and more data is taken into account. To chart the development of vocabulary across time, or across a written corpus (the best approximation of time which we have large amounts of data for), we can simply plot the number of tokens we have seen versus the number of types within them.44 Figure 5 shows the development of vocabulary for the suffixes -bar and -sam; each curve is referred to as a vocabulary growth curve or VGC (see Baayen 1992: 113, Evert and Baroni 2007).

44. Much like in the case of V, we can choose to look at the total number of tokens we examine N, or only tokens from the relevant category, N(C). It is standard practice to observe V for N(C), and I will follow this practice here, though in some contexts it may be interesting to consider N as well. Note also that in many publications N(C) is abbreviated with N, which is unfortunate, since this notation already refers to the tokens of the entire corpus.


Figure 5. Vocabulary growth curves for -bar and -sam. The vertical line marks the end of the data for -sam.

The x-axis charts our progress through the adjectives of each category in the sample, while the y-axis shows how many types have been accumulated. At first, almost every token provides a novel type and the curve rises, but as the sample progresses, more and more familiar items are seen again, leading to a gradual ‘flattening out’ of each curve. For the unproductive -sam the curve is almost completely flat already; for -bar it is hard to tell how much longer it would take for the curve to become completely flat, if ever. This property of VGCs is useful for getting a general impression of the saturation of vocabulary from a process and offers a good rule of thumb: a flat curve means unproductive, a rising one means more productive (cf. Evert and Lüdeling 2001).

As for reliability, VGCs are by definition as reliable as V, as they tally its values at multiple points. The curves appear to be quite consistent for the 10 equal sized samples in the c’t corpus: Figure 6 shows the average curves for a tenth of the data, which naturally contains the first segment of the entire corpus curve (that is, the first of the 10 samples). The mean VGC exhibits the same level of variance as V for any given sample size (the dashed lines around each curve give the area of ±1 standard deviation and the dotted lines show a 95% confidence interval, i.e. the area in which 95% of all VGC samples are expected).45

45. An even more accurate technique is described in Säily (2011), where all documents in the corpus are taken as subset samples and reordered randomly across their possible permutations. This allows Säily to calculate a confidence interval for significantly deviant samples among all possible samples, which is valuable for her scenario of using the entire dataset for a process as a reference point for deviating subsets of that process which would otherwise only let one use partial data for the comparison (in her case, the smaller samples are of male and female speakers). It should be pointed out, however, that these intervals do not allow a comparison of measure values beyond the smaller, maximal common sample size, since the interval for the less attested process, in this case -sam, cannot be extended further to match that of -bar.
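An empirical VGC of this kind is straightforward to compute. The following base-R sketch tallies V at regular intervals over a token vector in corpus order; the randomly generated pseudo-types stand in for real category tokens:

    # Empirical vocabulary growth curve: V after every `step` tokens
    vgc_points <- function(tokens, step = 100) {
      n <- seq(step, length(tokens), by = step)
      v <- sapply(n, function(i) length(unique(tokens[1:i])))
      data.frame(N = n, V = v)
    }

    # Illustration with random pseudo-types; replace with real category tokens
    tokens <- sample(paste0("type", 1:500), 5000, replace = TRUE)
    plot(vgc_points(tokens), type = "l", xlab = "N(C)", ylab = "V")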

Figure 6. VGCs with standard deviations and 95% confidence intervals for 10 equal sized samples of -bar and -sam.

Aside from the use of such charts as a visual heuristic for productivity judgments (steeper curves show greater productivity), they offer even more accurate insight in that the exact growth rate for the vocabulary at a certain point in the corpus could theoretically be given by the derivative (i.e. the slope) of the curve at that point, if there were a simple formula to produce exact VGCs (Baayen 1992: 115). In fact, the measure 𝒫 can estimate the slope of the curve at its end, or any other point at which it is measured: it tells us how likely a novel type (a new HL) would be at this point in the


data, which is precisely the likelihood that the curve will rise by one type. It is now even more apparent why 𝒫 cannot be directly compared for the two processes – the slope changes with sample size, meaning a fair comparison of the slopes must take place, at the very latest, at the largest common sample size, signified in Figure 5 by the vertical line perpendicular to the end of the -sam curve.

It is equally possible to use VGCs to describe individual frequency bands separately from V. Thus we could chart the development of V_m (types with the frequency m), or more particularly V_1, viz. hapax legomena. Figure 7 adds HL data to the plot in Figure 5.


Figure 7. Development of V and V_1 (the lower curve in each pair) for -bar and -sam.

Unlike the V curve, which can only rise and possibly become flat at some point, V_1 (the lower curve for each process) can fall, since HL can be subtracted as soon as they are encountered a second time. Thus the shape of the V_1 curve is expected to be something like a very shallow inverse parabola: it rises for a while, but for enumerable processes it should eventually reach zero again as soon as the vocabulary is exhausted and every type has been seen more than once. In fact, when N(C) or N approach infinity, not only V_1 approaches zero, but V_m for any m (all items seen twice will eventually occur a third time, a fourth time and so on), which Baayen (2001: 51–52) formulates as follows, using the same notation as in [3] above, but with the variable m replacing the frequency 1 that was used for hapax legomena:

    lim_{N→∞} V(m, N) = 0    [5]

Baayen thus describes the effect of diminishing productivity on all frequency bands as a gradual elimination of the lower ranks m of V_m. A more visual way of understanding this is by plotting the amount of types V_m that each and every frequency m has: from hapax, dis and tris legomena (m = 1, 2, 3…) to the highest frequencies. This is the purpose of frequency spectrums (sometimes referred to as SPCs; see Baayen 2001: 8–12, Evert and Baroni 2007), which are shown for our two processes in Figure 8.
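Computing a frequency spectrum amounts to a two-step tabulation, sketched below in base R; the token vector is again invented purely for illustration:

    # Frequency spectrum: how many types V_m occur with each frequency m
    tokens   <- c("machbar", "lesbar", "dehnbar", "lesbar", "machbar", "essbar")
    freqs    <- table(tokens)   # frequency of each type
    spectrum <- table(freqs)    # names = m, values = V_m
    spectrum                    # here: V_1 = 2, V_2 = 2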


Figure 8. Frequency spectrums for -bar and -sam adjectives in the c’t corpus (the x-axis is logarithmically scaled). Lines between points do not imply continuity and are merely meant to emphasize differences in the shapes of the distributions.

The spectrum for -bar is typical of a productive process, with very many items for m=1, fewer for m=2 etc. (this is expected under Zipf’s Law, which will be discussed in the next section). -sam is more irregular, with more tris than dis legomena, and generally very small differences between frequency bands. Baayen (1992: 118) has described a shift of values in the frequency spectrum toward the right as characteristic of waning productivity, since as the products of a process approach enumerability, the most prevalent frequency will cease to be 1 with growing corpus size. For a completely unproductive process, such as the pronouns discussed earlier, one would expect no HL and only sporadic points at V_m=1 for a variety of m values (e.g. one pronoun appears 100 times, another 150 times etc.). This


is illustrated in Figure 9 by the spectrum for German personal pronoun lemmas in the c’t corpus.


Figure 9. Frequency spectrum for personal pronouns in the c’t corpus.

All personal pronouns are very frequent, but no two pronouns occur exactly as often. This is the expected result for a completely enumerable category, one that no productive process is generating. The fact that pronouns differ from -sam in their much worse adherence to the typical spectrum shape in Figure 8 shows us that they are not productive, and have not been productive in recent times. The distribution of -sam is one of a process that is dying or has recently died, but was productive until that point. It is plausible that it has inherited a vestige of an older productive distribution, but it is still noticeably different from -bar.

At this point it might be justified to ask again why hapax legomena should have a special status, given that dis and tris legomena may also represent productively formed novel items. In fact, it is conceivable that even more frequent types, which speakers are familiar with, may be produced compositionally and productively by a process instead of being retrieved from memory, meaning these may need to be factored in as well, possibly with a lower weighted contribution. Baayen (1993) attempts to answer this challenge by adopting the Morphological Race Model (Frauenfelder and Schreuder 1992) mentioned in Section 5, in which items can be interpreted (or produced) simultaneously by parsing and retrieval from memory, with the faster procedure ‘winning the race’. He suggests a cutoff frequency θ, starting with which items are so familiar that retrieval from memory is faster than parsing, and defines an activation level 𝒜 (a


cursive capital A) for each process based on the count of parsed tokens it exhibits (the notation has been adapted from Baayen 1993: 196–197 for consistency with the formulas above):

    𝒜 = Σ_{m=1}^{θ−1} m · V_m    [6]

This is akin to taking the left portion of the frequency spectrum up to a certain value of m and weighting each point by its rank in linear succession. Though the idea is in principle theoretically motivated, the problems with this approach are numerous, even if we accept the Morphological Race Model: it is not clear what θ should be for a given sample size (Baayen estimates it heuristically at a token count of 8 for an English sample of 17,979,343 tokens and at 20 for a Dutch sample of 42,380,000 tokens); it is doubtful that every single type switches between parsing and retrieval at precisely the same frequency; it is unlikely that activation levels are static (e.g. priming can lead to an otherwise parsed item being retrieved and vice versa); and other factors like semantic coherence (i.e. how well the semantics of an item fit with the compositional behavior of the construction) may also lead to differences between items. Additionally, this approach inevitably introduces many more familiar, non-neological items (even HL often contain many familiar forms that are also listed in dictionaries), which is a concern if we are interested in neologisms. It is probably for these reasons that 𝒜 and θ have found little acceptance in the morphological literature since, whereas the more intuitive and operationalizable, but admittedly coarse, use of total vocabulary growth and HL to represent productive formations has enjoyed considerable popularity.46

46. A different approach to establishing parsability, represented in Hay and Baayen (2003) and related work, is neglected here, namely the use of the frequency ratio of base forms to derived forms (e.g. derived illegible is more frequent than its base legible, while illiberal is more rare than liberal, see Hay and Baayen 2003: 102). While interesting for morphology, this approach is difficult to apply to syntax, since most syntactic constructions require certain complements in order to be instantiated. Thus the frequency ratio of with without any argument to that of with with the direct object salad is immaterial to the parsing of the a-structure in with salad (see also Chapter 4, Section 2 and Chapter 5, Section 3 on hierarchical selection and collocations).
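For completeness, a small base-R sketch of equation [6]; θ = 8 follows Baayen’s heuristic estimate for English quoted above, and the spectrum values are a hypothetical V_m table, not data from the c’t corpus:

    # Activation level A: tokens in frequency bands below the threshold theta
    theta <- 8
    Vm    <- c(120, 45, 20, 11, 8, 5, 4, 3, 2)   # hypothetical V_m for m = 1..9

    A <- sum((1:(theta - 1)) * Vm[1:(theta - 1)])  # sum of m * V_m for m < theta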



Finally, turning to vocabulary growth for ‘ill-defined’ processes, the situation for the data from words in T- and Q- is once again unsurprisingly similar to that of any productive process: since, as we have seen, V and V_1 behave similarly for these open classes, it should come as no surprise that their VGCs and frequency spectrums for V_m also react in a similar way. However, as Figure 10 shows, it is again clear that Q- is less productive than T-.


Figure 10. Frequency spectrums and VGCs for words in T- and Q- in the c’t corpus.

We can therefore say that words in T- and Q- arise productively just like adjectives in -bar, though in fact we expect the types belonging to these classes to belong to several productive and unproductive processes (indeed, some of these will be -bar and -sam adjectives). Identification of these processes is again the task of the grammar.

So far we have looked at descriptive, summary statistics for the types exhibited by a process. These allow us to make some predictions about the behavior of population data for a certain sample size: average VGCs can for instance tell us how many types we can expect in a corpus up to the sample size, with a confidence interval based on the variance from multiple samples. The measure 𝒫 can even tell us how quickly vocabulary is growing at each point, expressing the slope of the observed VGC. But while 𝒫 can be used in lieu of a formula to describe the curve, it is still inferior to such a formula, since it gives us no information on the behavior of V beyond the observed area. It is therefore desirable to find a statistical model which approximates the VGC for a given process, allowing us to reliably extrapolate values beyond our observed sample. This would allow


us to predict V or V_m at an arbitrary sample size and to find the asymptotic limit of the VGC, which corresponds to the total expected vocabulary size (i.e. the point in the future when the VGC may become flat). In order to do so it is necessary to consider the probability of each frequency type: how quickly do HL turn into dis legomena? How much more frequent are the most frequent types? When will V_1 reach zero, or any V_m for that matter? To answer these questions and estimate vocabulary size we now turn to a more thorough consideration of the frequency distribution that governs productive processes.

6. Estimating total vocabulary: Zipf’s Law, LNRE models and S

Beyond characterizing attested vocabulary, the notion of the total possible number of types for a process, referred to as S by Baayen,47 was discussed as a measure of productivity by Aronoff (1976: 36), who suggested using the ratio of realized types to conceivable types to quantify global productivity (see Section 7 about such complex measures). The main difficulty is finding a decidable criterion for ‘possible types’,48 which for productive processes are expected to be non-enumerable (Baayen 1989: 30 therefore refers to Aronoff’s suggestion as an index of unproductivity). Nonetheless, if we could understand how vocabulary grows over time and formulate a statistical model which can predict vocabulary growth for an arbitrary sample size, it might be possible to find an upper limit for V of a given process which could not be realistically exceeded. In order to do so, it is necessary to examine the way V behaves more closely, or in particular, how its frequency bands behave in our data.

47. The notation S comes from the word ‘Species’ in the study of population growth in animals, as opposed to a specific sample (for word types, observed vocabulary V). See Good and Toulmin (1956) and Efron and Thisted (1976).

48. Actually, defining the attested types is also non-trivial, especially since one might only wish to consider productive, unlexicalized formations as characterizing a process – cf. Bauer (2001: 144), who points out that many Latinate affixes are very frequent in English, but present almost entirely in early French loan words, which were already borrowed as a whole and not derived actively within the English system. For syntactic equivalents, this may be less of an issue, though collocations may play a similar role.

To consider the frequency distributions of all types from a process, and not just the HL, the most intuitive approach is to count tokens of each and every type and sort types by frequency. As noticed already by Zipf (1949),


plotting these frequencies for all words in a corpus typically leads to a discrete, power law-like distribution as in Figure 11A. The empirical data is approximately log-linear (Panel C), seeming to adhere quite closely to a population model defined by the ideal Zipf distribution on the right (Panel D).


Figure 11. Ranked frequencies for word forms in c’t: the top 1000 ranks (A) and all words (C, logarithmic); and corresponding expected Zipf distributions on the right for the top 1000 (B) and as many ranks as there are in c’t (D, logarithmic).

The law behind this distribution describes the relative frequency of progressively more infrequent items in a data sample, which decreases in a harmonic series: the second item is half as frequent as the first, the third is a


third as frequent as the first, and so on.49 Put formally and applied to probabilities following Baayen (2001: 15–16), Zipf’s Law describes the relationship between the probability of an item π_z and its rank z. Since these probabilities should sum up to 1 (whereas the harmonic series eventually diverges to infinity), the relationship is modified by a normalizing constant C, resulting in [7].

    π_z = C / z    [7]

Thus the probability of a word is systematically related to its rank. The relevance of this law to the distribution of word types was originally formulated by Zipf as resulting from the principle of minimum effort between speaker and hearer and has been similarly explained by Mandelbrot (1962) in terms of information theory as optimizing the cost of coding per unit of transmitted information (see also Powers 1998): words that are less informative are more frequent and vice versa, corresponding also to basic precepts of markedness theory (i.e. lower Zipf ranks are less marked).50 But perhaps most interesting for the present discussion is the fact that frequent, less marked items are also the ones which are most strongly lexicalized, whereas novel items are unfamiliar, less predictable and therefore transmit a higher amount of information. The consequence of this state of affairs is that Zipf’s Law applies not only to the vocabulary of all words in a text, but also to all types from a certain productive process: the frequencies of progressively less common items decay according to the law, until we reach the minimal frequency of 1, that of the hapax legomena. For example, the vocabulary of all -bar adjectives in the c’t corpus behaves similarly to the general vocabulary we saw in Figure 11.

49. Incidentally, this law applies to randomly generated texts as well, as shown for example by Miller (1957) and Li (1992).

50. At least in the more recent non-binary sense of markedness: markedness theory was originally meant to distinguish the more marked member of a binary opposition, especially in phonology (Trubetzkoy 1989 [1939]), but has since been applied to various areas of linguistics and larger sets of alternatives, see Waugh and Lafford (1994). Baayen also subscribes to the idea that markedness correlates with (potential) productivity, see Baayen (1993: 190–191).
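As a quick illustration of [7], the following base-R sketch computes normalized Zipf probabilities over a finite set of ranks; the cutoff of 1000 ranks is arbitrary, whereas the rank set is treated as unbounded in the discussion above:

    # Ideal Zipf distribution over ranks 1..1000 (cf. equation [7])
    z    <- 1:1000
    C    <- 1 / sum(1 / z)   # normalizing constant so probabilities sum to 1
    pi_z <- C / z

    # On a double logarithmic scale the curve is a straight line
    plot(z, pi_z, log = "xy", type = "l", xlab = "rank", ylab = "probability")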


Figure 12. Ranked frequencies for -bar adjectives in c’t on an ordinary (Panel A) and double logarithmic (Panel B) plane.

As in Figure 11, a cluster of hapax legomena leads to a straight line at the bottom right of Panel B which ends the graph at freq=1 (and a shorter line on the step above it for the dis legomena), but this is simply a result of the smallest possible frequency being exactly one. Rephrased in terms of probabilities, as in [7] above, we could even imagine probabilities for unseen items, which at the current corpus size have an expected occurrence rate of less than 1 (hence they do not occur). According to Mandelbrot (1962: 194), it can thus be assumed that z is unbounded, that is, the rank of the least probable word approaches infinity just as its probability approaches zero, which fits nicely with the definition of productivity on the basis of non-enumerability. The types for productive processes should correspondingly show a Zipf distribution, with ever more, progressively infrequent items, which leads us to believe that the next frequency class of items, those with an expected value below 1 in our corpus, are the neologisms we have not yet seen but expect for productive formations. The complete set of these neologisms and all attested types comprises S items; in other words, S is the limit of V when N approaches infinity (cf. Baayen and Lieber 1991: 817), at which point π for the next rank approaches zero:

    S = lim_{N→∞} V    [8]

However, Zipf’s Law is neither the only model suited to describing word frequency distributions, nor the best one. Two problems with the


application of Zipf’s Law to natural language data were observed quite early (see Mandelbrot 1953; Baayen 2001: 17–18):

a. Zipf’s Law overestimates the frequency of the lowest ranks (the frequencies of the second, third etc. items are a little too similar, cf. the points at the top left of Panel A in Figure 11, which would have to be much farther apart to resemble Panel B); and

b. the exact distribution varies not only by process, but is systematically dependent on corpus size: the larger the corpus, the higher the probability of the most frequent items and the sharper the transition to infrequent items (i.e. the log-linear slope as seen in Panel C becomes steeper with decreasing velocity as the corpus grows), corresponding to the already discussed progressive difficulty in innovating as vocabulary grows.

To correct these problems, several models have been suggested, which all belong to the class of what Baayen (2001) calls LNRE models, i.e. models suited to explaining a Large Number of Rare Events. Since, as we have seen, vocabularies for a process can be quite large, while each type is relatively rare (especially neologisms, or more operationally HL), such models specialize in describing (or more formally ‘fitting’) frequency distributions where only a few types are frequent. Following Evert (2004), I will concentrate on the empirically most successful model, the Zipf-Mandelbrot distribution defined by Mandelbrot (1953, 1962), which refines Zipf’s Law with two additional parameters, a and b (see also Evert 2004; for a discussion and comparison of different models see Chapter 3 in Baayen 2001):

    π_z = C / (z + b)^a    [9]

The parameter b compensates for the unexpectedly similar frequency of the first few items by lowering their probability: while z is small, the addition of a constant b to the denominator lowers π_z substantially, whereas for higher ranks the effect becomes negligible. The parameter a regulates the steepness of the curve by making progressively larger increases to the denominator as z increases – how quickly these grow depends on the exponent. Thus Zipf’s Law can be seen as a specific case of the family of distribution models described by the more general Zipf-Mandelbrot model (hence ZM), where a=1 and b=0.

Using these parameters it becomes possible to fit the optimal ZM to an observed distribution, which may depend both on corpus size, but also on


the productivity of the process in question, since subtly different distributions will apply to different processes. For instance, if we compare the frequency distribution of -bar adjectives to that of the -sam adjectives, we notice that the -sam curve is much steeper, and conforms less convincingly to the ideal Zipf distribution (Figure 13).51

51. Since the Zipf distribution is log-linear, the deviation from it can be quantified by fitting a linear model with a quadratic term to each observed distribution in the double logarithmic plane. The more variance is explained by a linear correlation of rank and frequency, the less significant the contribution of the quadratic term to the model. Fitting a model with ordinary least squares regression (see Baayen 2008: 169–195) reveals that the log-linear correlation is significant for -bar but not for -sam (α=0.001), for which the quadratic term is also over three times as large (see Appendix B for details on the regression model).


Figure 13. Comparison of the frequency distributions for -bar and -sam adjectives in the c’t corpus in normal (Panel A) and logarithmic scales (Panel B).

Just as in Section 4 above, we notice the paucity of hapax legomena for -sam at the bottom of Panel B, as well as large gaps at the lower ranks (more clearly visible in Panel A). How well parameters for the model fit these curves can be estimated using an appropriate χ²-test for goodness-of-fit up to a finite rank, and the optimal curve is found by minimizing a cost function associated with the fit, given the observed data (for details refer to Baayen 2001: 118–124 and Evert 2004). The freely available zipfR library (Evert and Baroni 2007) for the statistics program R (R Development Core


Team 2003) contains an implementation of this function for the Zipf-Mandelbrot model, which allows us to compute the best parameters for our processes. These can then be used to describe the frequency distribution and vocabulary growth of each process with a continuous, predictable density function which deviates minimally from the observed data. Figure 14 shows the superimposed interpolated and empirical VGCs, as calculated based on the fitted density functions.
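A minimal sketch of this workflow with zipfR might look as follows; the spectrum file name is hypothetical, standing in for a frequency spectrum exported from the filtered -bar data:

    library(zipfR)

    bar.spc <- read.spc("bar.spc")      # observed frequency spectrum for -bar
    bar.fzm <- lnre("fzm", bar.spc)     # fit the model parameters
    summary(bar.fzm)                    # reports the estimated population size S

    # Expected vocabulary at twice the observed sample size
    EV(bar.fzm, 2 * N(bar.spc))

    # Interpolated/extrapolated VGC, to superimpose on the empirical curve
    bar.vgc <- lnre.vgc(bar.fzm, seq(0, 2 * N(bar.spc), length.out = 100))
    plot(bar.vgc)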


Figure 14. VGCs for -bar and -sam based on observed data and a ZM interpolation.

Evidently the interpolated curves succeed in following the empirical data quite closely, but smooth out certain irregularities. In principle, the function described by the ZM can be used to extrapolate the further development of the VGC, predicting the values of V at greater, unobserved N(C) values. However, as pointed out by Evert (2004), for high N(C) the ZM becomes unrealistic, since it necessarily assumes an infinite vocabulary for any process (this is a consequence of z being unbounded) and eventually diverges to infinity, even for clearly finite processes. Evert therefore introduces an extension of the model called the finite Zipf-Mandelbrot model (fZM), which adds a parameter A to the ZM density function that forms a lower cutoff point for the probability π. This model has the advantage of being able to predict a finite vocabulary and has been shown to converge reliably on unseen data in large samples (see Evert 2004). With this model at hand, we can now extrapolate values for V at larger sample sizes. For


instance, we can answer the question “how many types can we expect for -sam if we had as many tokens for it as we do for -bar?”. Though this estimate becomes less reliable as the extrapolation target size grows, Figure 15 shows that even immediately after the observed curve, vocabulary is hardly expected to increase.
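In zipfR terms, this extrapolation question is a single call; as before, the spectrum file name is hypothetical:

    library(zipfR)

    sam.spc <- read.spc("sam.spc")    # observed frequency spectrum for -sam
    sam.fzm <- lnre("fzm", sam.spc)   # fitted fZM model

    # Expected -sam vocabulary at the -bar sample size of 26,797 tokens
    EV(sam.fzm, 26797)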


Figure 15. Empirical VGCs for -bar and -sam and fZM curves, extrapolated for -sam to an equal size.

We can also use the parameters of the fZM’s density function g(π), given by [10], to directly estimate the limit of total vocabulary S using the model’s limit of V, as in [11]:

    g(π) := C · π^{−α−1} for A ≤ π ≤ B, and 0 otherwise    [10]

    S = C · (A^{−α} − B^{−α}) / α    [11]

where C is the normalizing constant ensuring probabilities sum up to 1, A is the minimal probability estimated by the model, B is the maximal probability (fitted to a point near the attested probability π_1 of the most frequent type) and α is a parameter between 0 and 1, derived from the ZM’s


a.52 For our processes, S is estimated at 48.20738 for -sam, meaning that at the current rate only about 5 further types are expected for -sam, and at 1691.171 for -bar, meaning we can expect its vocabulary to more than double before exhausting itself (since -bar exhibits 716 types in our corpus so far).

52. The new parameter α arises instead of a since solving for the value of z in equation [9] requires us to raise π to the power of 1/a. The derivative of the resulting function, g(π), raises π to the power of −α−1, and because of the division 1/a, the corresponding parameter in the derivative must be constrained between 0 and 1; this parameter is called α. The interested reader is referred to Evert (2004: 414–416) for details of the solution.

At this point it is worth asking what these numbers mean exactly. Would the validity of the fZM model be refuted if we could find more than 5 -sam adjectives or more than 975 -bar adjectives which do not appear in our corpus? A closer look at the logic behind the fZM’s estimation will show that this formulation is too harsh. Firstly, the accuracy of the model should be evaluated. A multivariate χ² test for goodness-of-fit suggests that the -bar model is much less adequate than the -sam model. While the -sam curve does not differ significantly from the fZM curve, the χ² score for -bar shows a significant difference, which could already be discerned to an extent in the VGC plot above.

Table 10. Parameters and goodness-of-fit for the fitted fZMs of -sam and -bar.

       α       A         B         C         S          χ²         df   p-value
-sam   0.2062  2.39E-05  0.555212  1.266582  48.20738   3.626294   3    0.304748
-bar   0.4248  2.14E-06  0.062562  2.839161  1691.171   50.04167   11   6.15E-07
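As a sanity check, plugging the fitted -sam parameters from Table 10 into equation [11] reproduces the reported vocabulary limit:

    # Evaluating equation [11] with the fZM parameters fitted for -sam
    alpha <- 0.2062
    A     <- 2.39e-05
    B     <- 0.555212
    C     <- 1.266582

    S <- C * (A^(-alpha) - B^(-alpha)) / alpha   # ~48.2, as in Table 10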

Secondly, the LNRE models discussed here assume that types are generated at random, which is certainly not the case. Types tend to cluster together for thematic reasons or communicative needs: even a newly coined word is likely to be repeated several times within the text that introduces it, possibly never to occur again. This condition is called underdispersion, i.e. certain types are not dispersed across the entire corpus evenly but are rather only present in a certain segment of the corpus (see Baayen 1996, 2001: 162–173; Evert 2004: 420–421). This leads the model to overestimate probabilities in the observed data but underestimate vocabulary beyond our observations, since the model expects disproportionately more occurrences


throughout the corpus of those items that cluster, even though types which form clusters may in fact be unique phenomena that will not be repeated again, or at least only very rarely.

Finally, we must keep in mind that LNRE models, like all corpus-based models, only apply to ‘more data of the same kind’. This means that the fZM’s prediction of only 5 more -sam adjectives is not a prediction for the German language as a whole, but for the type of language represented by c’t Magazin. Put more exactly, it predicts that if we continue reading new issues of c’t Magazin or another comparable text which is written in the same register or ‘style’, for a hundred years or even indefinitely (barring language change in the interim, of course), we should expect to encounter about 5 more -sam adjectives and about 975 further -bar adjectives, a prediction that is much easier to accept as plausible than the formulation above (even more so if we consider that a magazine’s editorial process may eliminate some unusual words).

As for the behavior of S for nonsensical processes, here too it should be clear that S can be computed for any class exhibiting an LNRE distribution (otherwise the goodness-of-fit for the underlying model would be very low). Its reliability is a direct result of the reliability of the model, which in turn results from V_m, as discussed above. Before ending the survey of morphological measures, it is worth taking a look at previous attempts to find ‘global measures’, which try to assess an overall productivity index for processes based on several criteria.

7. Measuring global productivity: I, ℐ and P*

Early on in the debate in morphological studies there was an attempt to reduce the notion of productivity to a single global scale, either ordinal or ratio-scaled, on which all processes could be arranged. The quality of empirical measures was then evaluated according to whether or not they matched intuitive expectations of how that absolute scale should behave (see for example the discussion between Baayen 1992 and van Marle 1992 about the correct order of productivity for certain processes in Dutch). Since, as we have seen, productivity involves several distinct aspects, notions of global productivity must factor in multiple measurements. The first such attempt can be found in Aronoff (1976), who suggested taking the ratio of observed types to the possible amount of types that could potentially occur, based on an estimate of the size of the class of bases. Since the size of the class of potential bases corresponds exactly to the


potential vocabulary size, Baayen and Lieber (1991: 803) formalize this ratio in terms of V and S as:

    I = V / S    [12]

where I stands for the ‘Index of Productivity’. Although at the time it was suggested it was still rather unclear how the quantity S could be estimated with precision, we can now use predicted S values based on the LNRE models presented in the previous section in order to calculate I. This measure would then correspond to the proportion of the vocabulary from a particular process which has already been realized in a sample corpus of a certain size, out of the maximum possible vocabulary, at least for a predetermined type of text. This factors in two separate aspects of productivity: both the potential and realized productivity measures discussed in the previous sections, though other aspects such as the token frequency and the proportion of neologisms within the vocabulary (represented by hapax legomena or estimated otherwise) are not directly addressed. Nevertheless, when using the LNRE models above, these would form part of the estimate of S, since LNRE models are calculated based on frequency spectrum data. Theoretically speaking, if our corpus were to comprise all utterances so far in the entire ‘language’ (however we may define this set), then I would be the proportion of vocabulary produced by a process in the past, out of the collected vocabulary it already has and will ever produce in that language.

Baayen (1989, 1993) and Baayen and Lieber (1991) concur with Aronoff’s view that two different aspects must be integrated in the consideration of global productivity measures, namely the probability or frequency with which new types arise (estimated in their case by 𝒫) and some measure of the vocabulary size, either realized (V) or potential (S). In Baayen and Lieber (1991), V and 𝒫 are regarded as two distinct dimensions of a multidimensional model of productivity. Following their representation of global productivity, which they refer to with the term P* (non-cursive), we can plot the data on -sam, -bar and -lich from the previous sections as in Figure 16, which shows our data next to Baayen and Lieber’s original example.

[Figure 16 appears here: two scatterplots of V (y-axis) against 𝒫 (x-axis); Panel A shows Baayen and Lieber’s English affixes (-er, -ation, -ness, -ity, N-ish, V-al, -esque, en-, in-, -ous, un-, -ment, -ee, de-N, Adj-ian), Panel B shows -lich, -bar and -sam.]

Figure 16. Global productivity (P*) for some English affixes from Baayen and Lieber (1991) in Panel A; and for -bar, -sam and -lich according to the same scheme in Panel B. Each point in each panel corresponds to a class of vocabulary items, the attestation of words in a corpus generated by a particular affixation process. The V dimension on the y-axis gives as before the vocabulary size seen so far for that process in the corpus, while the x-axis gives the value of potential productivity 𝒫 as measured at that point in the sample, for however many tokens were found in the corpus for each affix.

Panel B on the right makes it clear that -bar and -lich realize similar vocabulary sizes in our dataset, but -lich is much closer to exhausting itself than -bar, as evidenced by the horizontal proximity to -sam. Yet as we have already seen, comparisons of 𝒫 for different sample sizes are problematic (as criticized e.g. by Gaeta and Ricca 2006). I therefore prefer to plot 𝒫 and V in the P* diagram for the largest common sample size available, namely that of the smallest sample of the three, that is the sample of -sam, i.e. for N(C) = N(sam) = 7672. The results of the new P* plot based only on this subset of the data are shown again for comparison in Figure 17.
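A comparison at the largest common sample size can be computed by randomly subsampling each process down to the smallest N(C) and measuring V and 𝒫 on the subsamples. A minimal sketch, using randomly generated stand-in distributions rather than the real data (all parameters here are invented for illustration):

    import random
    from collections import Counter

    def sample_process(n, zipf_a, seed):
        # Draw n tokens from a Zipf-like type distribution, a crude
        # stand-in for corpus attestations of one process.
        rng = random.Random(seed)
        return ["type%d" % int(rng.paretovariate(zipf_a)) for _ in range(n)]

    def v_and_p_at(tokens, n, seed=0):
        # V and P = V1/n measured on a random subsample of size n.
        sub = random.Random(seed).sample(tokens, n)
        freq = Counter(sub)
        v1 = sum(1 for f in freq.values() if f == 1)
        return len(freq), v1 / n

    procs = {"bar": sample_process(34000, 1.2, seed=1),
             "lich": sample_process(12000, 1.5, seed=2),
             "sam": sample_process(7672, 3.0, seed=3)}
    n_common = min(len(t) for t in procs.values())   # here 7672, as for -sam
    for name, toks in procs.items():
        v, p = v_and_p_at(toks, n_common)
        print(name, v, round(p, 4))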

[Figure 17 appears here: VN(C)=7672 (y-axis) against 𝒫N(C)=7672 (x-axis) for -lich, -bar and -sam.]

Figure 17. P* diagram for -bar, -sam and -lich at the maximal equal sample size.

As we can see, at the smaller sample size -bar and -lich are much closer together. Taken together with Figure 16, this implies that -lich exhausts its vocabulary and slips to the left faster than -bar, which is intuitively plausible and correlates with greater difficulty in generating novel forms from this category. The fact that -lich overtakes -bar in V with growing sample size means that we have not yet seen all of -lich’s established vocabulary at the smaller sample size, whereas -bar’s higher 𝒫 values imply that it will overtake -lich despite a slower growth rate in the interim. Although the two-dimensional representation succeeds in combining two aspects of productivity, it has some failings. Firstly, the weakness resulting from the interdependence between the dimensions and the sample size N(C) is problematic: what sample size should we choose? Secondly, as Baayen and Lieber point out, the two-dimensional representation does not allow us to directly assess which process is more productive globally. And finally, it is not clear that all relevant dimensions have been explored here – what about each category’s token frequency, our predictions regarding the estimated potential vocabulary size S, or the activation level 𝒜? The first problem could be addressed by adding N(C) as a further dimension, tracing the shift in the two-dimensional positions of each process ‘across time’, as it were (i.e. with the development of the sample size). This is illustrated by Figure 18, which also marks the point at which -bar overtakes -lich in potential productivity (rather early on, after only 125 tokens, 60 types).

[Figure 18 appears here: three-dimensional plot of 𝒫 (vertical axis, 0.0–1.0) against N(C) (0–140000) and V (0–1000) for -lich, -bar and -sam, with a circle marking the point where -bar’s 𝒫 value exceeds that of -lich.]

Figure 18. 3-dimensional representation of the development of P* as a function of N(C) for -bar, -sam and -lich. The point where -bar’s 𝒫 value exceeds that of -lich is marked with a circle. The non-productivity of -sam in every respect is shown by its immediate collapse to the bottom left corner, with almost only N growing.

Further aspects which deserve attention, such as S and other measurements, could likewise be made part of a multidimensional model (apart from questions of an adequate visualization for this). But Baayen and Lieber’s main problem with their formulation of P* lies rather in their declared goal of finding a single measure of global productivity. In Baayen and Lieber (1991) they therefore eventually suggest using an estimate of S for this purpose. Baayen (1993) revises this approach, at first comparing S with a new measure, the vocabulary remaining to be realized, i.e. S-V. Baayen ascribes to the latter the advantage of giving special consideration to the part played by novel types (thus of two processes with the same potential vocabulary S, a process for which more types have already been realized has fewer chances for innovation in the future, which a comparison based on S-V captures). However, these measures are not satisfactory, since they lead to a rating based purely on vocabulary size (processes with many types get higher scores), and neglect processes with a small vocabulary but a high relative productivity (i.e. many potential types considering the paucity of types in general). Like Aronoff, he then proposes using a relative measure which is expressed in terms of a proportion of realized vocabulary, but suggests taking precisely the inverse of Aronoff’s I, which he designates as ℐ:53

ℐ = Ŝ/V [13]

where Ŝ is estimated using the Waring-Herdan-Muller model (Muller 1979; however, we can directly replace this with an fZM-based estimate, which has been shown to perform better, see Evert 2004). This proportion, conversely to I, tells us how much larger S is than V, and tends to evaluate marked processes as more productive than unmarked counterparts (Baayen 1993: 184–186), since these typically have fewer realized types, but could potentially be used to derive many novel types from unmarked counterparts. Thus a suffix deriving feminine agent nouns is ranked above a masculine counterpart in Dutch (Baayen 1993: 185), because although masculine derivations, which also function as generic names for professions etc., are far more common and diverse, there are many more feminine ones that could be derived but have not been so far. Similarly, Dutch diminutives are evaluated as extremely productive, because although only few such nouns are derived in practice, very many could be derived if one wished to do so. It might therefore be questioned whether the proportion of possible to realized types is really a desirable measure for global productivity, since very many possible forms may in fact end up never being used by actual speakers (though we cannot know in advance which). There is thus a tension between the correct weighting of evidence for realized and potential productivity in Baayen’s work, and he explicitly

53. In fact, this measure already appears briefly in Baayen and Lieber (1991: 838), where it is written as V/S, but still described as the inverse of Aronoff’s I. Presumably this is a typographical error for the same measure found in Baayen (1993) and given here as [13].


seeks to define the global productivity P* in terms of a function of 𝒫 and V (Baayen 1993: 190). Since the probability for innovation is a flexible quantity which depends on the amount of data we have seen, the weight of 𝒫 should be increased with sample size, that is together with rising V (since it gets progressively harder to produce high values of 𝒫). However, neither component captures the fact that a process may have many types or even a high proportion of new types, but that these arise more or less frequently, i.e. they may have a very different token frequency (or to put it as Baayen does in terms of a Poisson distribution, a different mean ‘interarrival time’). To account for all these different factors, Baayen (1993) has suggested using the already discussed hapax-conditioned degree of productivity or expanding productivity 𝒫*. By examining the proportion of HL in the corpus which come from a process C, this measure factors in both the prevalence of the category and its innovativeness (token frequency as evidenced by hapax legomena qua neologisms). But as Bauer (2001: 155–156) has noted, not only is this measure susceptible to the weaknesses of HL-based measures (dependence on corpus size, neglect of realized established vocabulary and the actual relationship between HL and neologisms, itself dependent on corpus size), it is also intuitively unclear if the question (in Bauer’s words) ‘what proportion of new coinages use affix A’ is really more relevant than the question which 𝒫 seeks to answer, namely ‘what proportion of words using affix A are new coinages’. For Baayen, in any case, it appears that the motivation is mainly mathematical, since 𝒫* elegantly incorporates the common corpus underlying two samples from different processes even if they are of a different size. In summary, it is difficult to definitively answer the question ‘which measure is best for assessing global productivity’, since different scholars intuitively rank processes differently (but not arbitrarily), based on different criteria. Baayen’s work also presents the different measures not as supplanting each other, but as complementary, emphasizing different aspects of productivity. As such, I concur with Bauer (2001: 154), who feels that P* as a multidimensional representation shows precisely that type frequency and the probability of new types emerging are not directly related, and should therefore be kept entirely separate. Attempting to factor them into one number is thus an undesirable loss of information. The much more interesting conclusion from P* is that some processes are more alike in their productive behavior than others: in Baayen and Lieber’s own data in Figure 16A, the similar status of -er and -ation is revealed (both have a large vocabulary but produce unseen types more rarely), -ee and de- (as in addressee and delouse, both of which have very few types and a relatively


high proportion of unique ones), the cluster of relatively saturated and moderately type-frequent -ous and -ment, or the remarkably similar results for the negative prefixes un- and in-. It therefore appears that the potential disadvantage in the inability of multidimensional models to deliver a ranking on a single scale is compensated for by their ability to indicate groups of processes with similar features of productivity.

8.

Summary: Measuring morphological productivity

This chapter has presented an overview of the most prominent empirical productivity measures referred to in the morphological literature. As we have seen, the measures attempt to quantify one or more of the aspects of productivity explored in the previous chapter, using operationalizable criteria to approximate relevant quantities that cannot be measured directly. Thus vocabulary size and frequency are estimated by V or S and N(C) or f(C), and the probability of neological or productive formations is estimated based on hapax legomena, possibly together with other frequency bands below a certain threshold (e.g. θ for 𝒜) with decreasing weight. The measures are generally well-defined for any operationalizable category C, regardless of whether it corresponds to a morphological process that might be considered intuitively sensible (though completely unproductive processes can be ruled out, as in the example of pronouns). The task of delineating relevant constructions is thus relegated to the grammar. Most of the measures isolate one aspect of productivity, but a few mix contributions of different aspects, and global measures intentionally attempt to balance multiple factors or dimensions to create a global ranking. Of these, multidimensional representation techniques, such as VGCs and P*, are of particular interest, since they allow us to cluster together processes with similar behavior. Most (but not all) of the measures are highly dependent on sample size, though this can be mitigated by integrating them into a multidimensional model which regards sample size as one of the relevant factors for productivity judgments (indeed type frequencies, which are central in most views of productivity, are firmly rooted in and intertwined with empirical sample size). Table 11 summarizes the different measures and representation techniques discussed above, giving the relevant aspects of productivity addressed by each method, a concise description of their mathematical meaning, and their dependency on sample size. For a discussion of the aspects of productivity see Chapter 2, Section 4.


Table 11. Summary of productivity measures and representation techniques and their dependence on the sample size N.

measure  aspects                            meaning                               depends on N
f(C)     profitability (usage)              prob. that next token is from C       no
V        profitability (usage/concepts)     expected types of C in N tokens       yes
Vm       profitability (concepts)           expected types of C with freq. m      yes
VC       profitability (concepts)           expected types of C in N(C) tokens    yes
𝒫        availability (relative)            probability that next C is HL         yes
𝒫*       availability (absolute)            probability that next HL is C         yes
𝒜        transparency/regularity?54         probability that next C is parsed     yes
S        profitability (concepts)           expected types of C in ∞ tokens       no55
S-V      availability (relative/absolute)   types left to be generated            yes
I        global (multiple aspects)          how much of S is covered in V         yes
ℐ        global (multiple aspects)          how much bigger S is than V           yes
P*       global                             (multidimensional; see 𝒫 and V)       yes
VGC      global                             (multidimensional; see N and V)       no56
SPC      global                             (multidimensional; see Vm and V)      yes
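Most of the count-based entries in Table 11 fall out of the frequency spectrum of a sample directly. Purely by way of illustration, a minimal Python sketch with invented figures (the corpus-wide totals are likewise hypothetical):

    from collections import Counter

    # hypothetical frequency list for one process C (type -> frequency)
    freq = Counter({"machbar": 40, "lesbar": 12, "essbar": 3,
                    "brennbar": 1, "klickbar": 1})

    N_C = sum(freq.values())
    spectrum = Counter(freq.values())   # frequency spectrum: m -> Vm
    V = len(freq)                       # realized vocabulary
    V1 = spectrum[1]                    # hapax legomena
    P = V1 / N_C                        # availability (relative)

    # with (invented) corpus-wide totals, the usage and absolute measures:
    N_corpus, V1_corpus = 10000, 120    # all tokens / all HL in the corpus
    f_C = N_C / N_corpus                # profitability (usage), f(C)
    P_star = V1 / V1_corpus             # expanding productivity

    print(N_C, V, dict(spectrum), P, f_C, P_star)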

Some aspects have multiple facets. For instance, the aspect of profitability is divided into usage (how often a process is used, i.e. token frequency) and concepts (how many different concepts it is used to encode, i.e. type

54. The activation level 𝒜 is a function of regularity insofar as parsed forms, as defined by the Morphological Race Model, are considered regular, and unparsed forms irregular, since they are not processed using the regular, compositional formation. However, clearly this is no measure of the percentage of forms for which the output of a process is predictable from the constituents, which is the more usual definition of regularity. This measure is in principle independent of sample size as long as the threshold θ is scaled together with the sample. But there is no reason to assume that a higher or lower percentage of forms is parsed as the sample grows.
55. The estimation of S is improved by increasing the sample size, but the measure is essentially sample size independent as soon as it becomes stable (see Evert 2004).
56. VGCs incorporate the sample size N and give information on V for all available observed sample sizes; beyond the observed curve extrapolation must be used (see S).


frequency, notwithstanding polysemy). Availability is seen as either relative (how open to innovation relative to the extent of attestation) or absolute (how much innovation is afforded in a sample of absolute size, regardless of the base frequency of the category in question). Conspicuously absent is the aspect of regularity, which is only mirrored partly in the transparency aspect measured by 𝒜. This is probably due to the difficulty in operationalizing regularity, which depends on uniform meaning (the strictly formal aspect of regularity is satisfied a priori by the procedure determining the type, e.g. that we are looking at a -bar adjective). As we have seen, all of the measures in Table 11 are reliable in that they can be reproduced for a certain process in different samples from the same type of data. We have also seen that the measures are interdependent to a certain extent, and that conflation of different aspects of productivity should be avoided since these can behave independently (e.g. -lich has a higher profitability but lower availability than -bar). We must therefore leave global measures such as I and ℐ behind, but also the use of 𝒫*, since it conflates the availability aspect measured by 𝒫 with the usage/profitability aspect associated with frequency (this was the property that made it desirable for Baayen in searching for a global measure taking both into account). For the concepts/profitability aspect we have seen that VC, i.e. V depending on N(C), is more orthogonal than V for N, since it is independent of the frequency aspect. S-V mixes relative and absolute availability by referring to the fixed amount of types left to be realized depending on the type count we have seen so far – if we are prepared to look at both V and S this can add no information. Finally, Vm and 𝒜 encounter theoretical and methodological problems, since it is unclear which frequency bands are important, and in which cases. In measuring availability, the heuristic behind 𝒫 is both generally accepted and somewhat motivated by treating hapax legomena as an estimate for neological usage, while the measurement of transparency runs into problems of operationalization already discussed above. This leaves us with four measures which are not easily replaced by each other and which will be used primarily in the following: V (for N(C)), 𝒫, S, and frequency.

To conclude this chapter, what we have seen so far is that the above measures indicate real, stable facts about morphological processes. Working in a usage-based framework and in keeping with the literature on scalar morphological productivity, I will take these results, along with those from the studies cited above, as evidence that some part of a speaker’s linguistic system causes them to realize statistically comparable word formation patterns for particular processes in otherwise independent


samples, and that not only the frequency of familiar items is predictable, but also the frequency for rare, productively formed types. This means that speakers possess implicit knowledge about which morphological process may or should be used more productively, and to what extent. With these results at hand, it is now possible to approach the question of productivity in syntactic selectional processes from a quantitative perspective. The next chapter will apply the methods presented so far to a variety of syntactic constructions and test the performance and interpretation of these measures in the syntactic domain.

Chapter 4 Adapting measures to the syntactic domain

The purpose of this chapter is to demonstrate that the same productivity measures computed in the morphological literature can apply to syntactic argument selection, that is, that argument selection in syntax also exhibits a consistent variety and distribution of arguments depending on sample size. Much like in morphology, growth for the range of arguments diminishes along a predictable path with growing sample size, as does the probability of encountering novel argument types, which can be modeled statistically as discussed in the previous chapter. It will also be shown that these measurements ‘make sense’, in that they correspond to some intuitive productivity rankings in syntax for processes that tend to generate more or less conventionalized output. After some methodological remarks on using corpus data to extract syntactic argument realizations in Section 1, I will examine the necessary conditions for applying the measures presented so far to syntactic constructions in Section 2, which discusses such questions as: what are the syntactic equivalents of types, tokens and hapax legomena? Is argument selection really the same as applying a morphological process to a base? The following sections then discuss three case studies: Section 3 presents differential argument distributions for competing constructions with the German adposition wegen ‘because of’; Section 4 examines the different productivity rankings that can be achieved for the accusative slot of nine English transitive verbs depending on the productivity measure selected; and Section 5 deals with multiple slot filling by looking at comparative correlative constructions in English. Section 6 summarizes the findings of this chapter and outlines the questions that they raise for the next chapter.

1.

Methodological remarks on using corpus data

Collecting vocabulary data for syntactic processes is, methodologically speaking, no simple task, because types must be defined according to operationalizable criteria. Since some of the processes involved are rather


infrequent, 57 and especially those with waning productivity, data will be collected from very large corpora harvested from the Internet. Unless otherwise specified, data in the following sections uses the ukWaC corpus for English (UK Web as Corpus, approx. 2.25 billion tokens) and the deWaC corpus for German (Deutsch Web as Corpus, approx. 1.63 billion tokens), 58 both automatically part-of-speech tagged and lemmatized, and representing some of the largest freely available corpora for their respective languages at the time of writing (see Baroni et al. 2009 for information on both corpora). This type of automatically retrieved data is rather heterogeneous and possibly error-prone, so a main priority in searching through such corpora is to ensure high accuracy of results by formulating precise queries and manually evaluating error rates as required. Failure to do so inevitably results in an artificially high number of hapax legomena and types in general. However, this necessarily comes at a price of lower recall rates. This means that for large datasets which cannot be examined entirely by hand, conservative queries must be formulated that retrieve as few unwanted cases as possible. Many of the less easily identifiable cases are thereby ignored, an approach commonly adopted in natural language processing, where large corpus sizes can be used to compensate for very strict, accurate queries (cf. Kawahara and Kurohashi 2005). If the criteria for the accurate queries are not biased against especially productive or unproductive environments for each process, then evidence collected in this manner can be regarded as a legitimate subset sample from our data – it is however crucial to make the assumption explicit, or even better to demonstrate, that the exclusion of ambiguous material does not interact with productivity, since this alternative explanation of results is then always possible. It is also important for our purposes to make sure that the queries for each construction in a certain comparison are as comparable as possible. Finally, whenever doubt may arise as to the accuracy of a search, it is necessary to evaluate a sample of the results to estimate error rates. To maintain the transparency and reproducibility of the search process, the

57. For example, the English comparative correlatives examined in Section 5 below occur only about twice for every million tokens of data.
58. Various versions of these corpora are available, resulting in different reported token counts in some papers. I used the unparsed, but tagged and lemmatized versions available from http://wacky.sslmit.unibo.it/, with 1,627,169,557 tokens for deWaC and 2,251,569,613 tokens for ukWaC, downloaded on 25.8.2010.


queries for each study are described schematically in each section and given in full in Appendix A. Cases where manual filtering of the data was carried out will be mentioned individually.
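One way to carry out the error-rate evaluation mentioned above is to hand-check a random sample of a query’s hits and compute a binomial confidence interval for its precision. A minimal sketch (the inspected counts here are invented):

    import math

    def precision_ci(hits_checked, true_positives, z=1.96):
        # point estimate and normal-approximation 95% confidence interval
        # for query precision, from a manually inspected random sample
        p = true_positives / hits_checked
        half = z * math.sqrt(p * (1 - p) / hits_checked)
        return p, max(0.0, p - half), min(1.0, p + half)

    # e.g. 200 randomly drawn hits inspected by hand, 191 of them genuine
    print(precision_ci(200, 191))   # -> (0.955, ca. 0.926, ca. 0.984)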

2.

Types and type counts in syntax

Applying the vocabulary concept to a syntactic argument selection process requires first and foremost a definition of the type concept. To return to the generalization:

(morphological) affix : base   ~   (syntactic) head : argument(s)

from Chapter 2, Section 2, it would seem that a syntactic construction must be identified by some invariable part shared by all of its exponents, whereas the argument(s) identify the type. The problem with this is that it is not always clear which parts are variable and what exactly different types must have in common. In working within a hierarchical lexicon framework which incorporates partly unfilled constructions, as briefly described in Chapter 1, Section 2, we assume that every overt form can, and usually does instantiate multiple constructions simultaneously at different levels of abstraction (cf. Goldberg 2006a: 10, Clausner and Croft 1997, Wulff 2008: 165–169). For instance in examining the productivity of transitive verbs in Icelandic, Barðdal (2006, 2008) looks at verb lemma types occupying the verbal slot governing accusative and dative objects, and groups together meaningful subclasses of such verbs, but does not subsequently differentiate types based on each verb’s arguments. She points out that different classes of verbs may be more or less productive, and chooses to focus on the extensibility of these classes with novel verbal head lexemes, at certain levels of abstraction. Thus the following example (Barðdal 2008: 83):

(30) Leiðinlegt að msna fólki sem situr við hliðina á mér.
     boring to MSN people.DAT who sit with side on me
     ‘It is so uninteresting to MSN people who sit beside me.’

simultaneously instantiates both the nominative-dative verb class and the ‘verbs of (means of) verbal communication’ class with the novel verb msna ‘MSN (to send a message via Microsoft Network Messenger)’. The processes postulated to generate members of these classes can understandably have different degrees of productivity. But in either case, it


is the identity of the verb that determines the type, while the abstract construction, including both the argument case assignment and the overall meaning, determines the intended process for which a type is being instantiated. Kiss’s (2007) work on articleless prepositional phrases in German, by contrast, makes the productivity of the construction dependent on argument selection, where the identity of the head category, i.e. the preposition, identifies the process in question, along with a feature of the argument, namely the lack of an expected article for a singular noun. The particular type is then determined by the head noun of the prepositional phrase. Therefore the following two examples:

(31) unter Androhung physischer Gewalt (Kiss 2007: 318)
     under threat physical.GEN violence.GEN
     ‘under threat of physical violence’

(32) unter Aufsicht der UN (Kiss 2007: 324)
     under surveillance the.GEN UN.GEN
     ‘under surveillance of the UN’

constitute different types, whereas (33) is the same type as (31), since it has the same prepositional argument head noun, and (34) is not a token of the construction, since the noun is qualified by a definite article.

(33) unter Androhung aller rechtsgültigen Strafen
     under threat all.GEN legally-valid.GEN penalties.GEN
     ‘under threat of all legally applicable penalties’

(34) unter der Androhung von Folter
     under the threat of torture
     ‘under the threat of torture’

This type definition implies that the article (or lack thereof) is not selected by any member of the construction, but rather that the construction itself specifies the realization of the determiner, i.e. there is a construction ‘PP with articleless singular noun’. Working in a construction grammar framework, this type of construction is not problematic: it would be stored in the mental lexicon with exactly the necessary slot, at abstraction levels with and without a lexically specified preposition (unter ‘under’ or a place holder [PREP]).


The notion of ‘extension of a construction to a new verb’ as in the nominative-dative verb class above is actually not very different from argument selection, especially if we wish to consider constructions with specific types of arguments or modifiers (e.g. ‘head nouns of indefinite NPs’). The common denominator is the idea of an empty slot to be filled, be it the verb in an a-structure at a level of abstraction that does not specify its lexical arguments, or a concrete NP argument. Both cases are subsumed under the notion of lexical choice invoked in Chapter 2, Section 2.59 For any process, a type is realized whenever all free slots are occupied for that construction, although it is not trivial to determine how many slots each construction has and where novelty should be looked for. For instance, in a phrase like

(35) [MSN [people [who sit next to me]CP]NP]VP

it is clear that the verb MSN takes an object NP, namely: people who sit next to me. But is this phrase of a different type than in the instance of the VP in (36)?

(36) [MSN [people]NP]VP

Lexically speaking, we could say that if a speaker has encountered MSN people, the extension to the version with the relative clause is trivial, and vice versa, and does not necessarily have anything to do with the behavior of MSN.60 Syntactically, this corresponds to the hierarchical position in the

59. Of course there is no necessity to measure types only between heads and modifiers. For example Baayen (2001: 221) gives a distributional analysis of all bigrams in a corpus, regardless of whether they signify a meaningful coherent choice. His analysis shows, unsurprisingly, that the distribution of bigrams is much steeper than that of single words, since there are many more possible combinations and hence many more types, and especially rare types. However I would put it that the analysis becomes theoretically interesting by following a model of grammar, i.e. by addressing the slots that we postulate – otherwise we are simply dealing with the syntactic equivalent of the words in T- and Q- in the previous chapter.
60. Admittedly, this is a simplification which might not always hold: certain modifiers may become lexicalized, as can a construction specifically lacking them, so that presence or absence of the modifier must be seen to constitute a separate type, especially in the case of collocations. For example, the objects in idiomatic and unmodifiable kick the bucket vs. freely modifiable kick the (green) bucket or conversely the free climate change brought about the (terrible) heat vs. the necessarily modified climate change brought about the *(dead) heat (in the sense of stalemate or narrow race). However it can also be argued that it is the modified element itself that is polysemous; for the present I will ignore such cases.


derivation at which we are making the lexical choice: the relative clause would be adjoined lower in the syntax tree and remains ‘invisible’ to the VP, therefore the VP selects only the NP head, for which modifiers or arguments are selected separately in turn. What exactly is ‘visible’ to the selectional process is all the same no simple matter. For instance in English, noun-modified nouns are sometimes analyzed as syntactically complex NPs and sometimes as morphologically derived compounds. Some early analyses (e.g. Bloomfield 1935: 228) limited morphological status to combinations with stressed modifiers (viz. ‘compound stress’), as opposed to the expected phrase final stress. This is expressed in the well-known Compound Stress and Nuclear Stress Rules in Chomsky and Halle (1968: 17), i.e. initial stress for binary compounds like bláckbird as opposed to the phrase blàck bírd, and recursive initial stress on the head’s modifier in more complex cases. Later on, many analyses accepted compounds with unmarked stress patterns (e.g. Bauer 1983: 104), meaning that many more head nouns could be modified on a morphological, pre-syntactic level (for an overview see also Spencer 1991: 309–344; for a current data-based approach to the prediction of compound stress see Plag 2010). This creates a problem for selectional processes in many grammatical models, particularly those that adhere to the ‘Lexicalist Hypothesis’ (Jackendoff 1972: 12–13), which posits that syntactic processes, and by extension argument selection, cannot interfere with or are not ‘aware’ of processes below the word level. Thus the analyses in (37) and (38) expose different lexemes in the a-structure.

(37) [polish [the [kitchen]NP [table]N]NP]VP
(38) [polish [the [kitchen table]N]NP]VP

In (37) we would probably say that, much like in (35), the verb selects an NP head, which is then expanded by a modifier noun at the next hierarchical level: polish is compatible with the argument table, and the specific type of table is secondary, selected by table and not by the verb. In


(38) however, the head of the NP is seen to be a compound kitchen table, and if we accept a lexicalist model, we are forced to admit an argument type kitchen table as well. This also means that a novel form such as polish the treasury table would be considered a familiar instance of polish(table) in the first analysis, with a new type of modifier for table, but two distinct object types in the second analysis. For languages like German, where all combinations of this sort, including Küchentisch ‘kitchen table’, are considered to exhibit a purely morphological process of compounding, the first analysis is not generally accepted for any noun-modified nouns. The reasons for this are both principled (especially allomorphy of the modifier stem within the compound, see Fuhrhop 1996) and conventional (compounds are written together, without spaces), though the issues are certainly related. Notwithstanding a variety of other problems with the Lexicalist Hypothesis, in the case of identifying novel arguments, this type of analysis can be very disadvantageous. We have already seen for morphology that some results of compounding and derivation should be ignored (cf. Chapter 3, Section 2), e.g. suffixation of German -bar ‘able’ exhibits the same type in machbar ‘doable’ and unmachbar ‘not doable’, just as -sam has the same type in langsam ‘slow’ and schneckenlangsam ‘snail-slow’. In both cases, the suffixation is said to take place before the prefixation or compounding process, and treating these cases as separate types leads to an overestimation of the productivity of the suffix (e.g. -sam would actually be quite productive, against all expectations, if we did not remove such forms from the count, cf. Lüdeling, Evert, and Heid 2000). Yet in a standard syntactic analysis of the German translation of (37),61

(39) [[den Küchentisch]DP polieren]VP
     the.ACC kitchen-table polish
     ‘polish the kitchen table’

polieren selects a phrase containing the compound noun Küchentisch. Despite the different morphologies of English and German, usage of the selectional process can be viewed as similar in both cases. From a usage-based perspective, the relevant question is what selects for or constrains the compound modifier: the compound head independently of the verb

61. The analysis follows standard practice by assuming a determiner phrase DP instead of NP for German, following Abney (1987). The argument phrase’s internal structure may be ignored for present purposes.


governing its phrase, the verb which selects the entire compound while ignoring its internal structure, or a combination of both. Let us attempt to defend each of these positions. If we do not adhere to the Lexicalist Hypothesis, it is possible to assume that Tisch independently receives a modifier Küchen- via an ordinary selectional process, which one must assume anyway to explain the existence of that compound in general, and that polieren can select the head Tisch regardless of whether or not it is modified. The appearance of Küchentisch as an argument of polieren could then be predicted from these facts, all other things being equal, resulting in a more parsimonious description. For the productivity of some verbal arguments, in which compounding is common, this may make a lot of sense. For example, the rare transitive German verb anstrengen ‘pursue (a formal procedure)’ exhibits only 98 occurrences in the rather large deWaC corpus, of which a substantial portion of 38 cases are compounds.62 If we include all distinct compounds as separate types we get V=28 and V1=21. However, if we look at the arguments, it is clear that most are headed by Prozess ‘trial’, Verfahren ‘(legal) procedure’ or Klage ‘lawsuit’, as illustrated by the following examples (boldface is used to emphasize the relevant lexemes throughout).

(40) Eines dieser Länder wird eventuell ein solches Verfahren anstrengen.
     one these.GEN countries will possibly a such procedure pursue
     ‘One of these countries will possibly pursue such a [legal] procedure’ [deWaC, pos. 111758441]

(41) Der dbb wird nun entsprechende Musterverfahren anstrengen.
     the dbb will now corresponding example-procedures pursue
     ‘The dbb will now pursue corresponding example trials’ [deWaC, pos. 60971887]

62. The search encompassed all sentences containing the lemma anstrengen or strengen + an (for separable cases) where the reflexive sich was not adjacent (to avoid the verb sich anstrengen ‘exert oneself’), and then manually filtered to rule out errors.


(42) Krause strengt einen Prozess an, der "40 Tage und 40 Nächte" dauert.
     Krause pursues a trial PTC, which 40 days and 40 nights lasts
     ‘Krause is pursuing a trial which will take “40 days and 40 nights”.’ [deWaC, pos. 111758442]

(43) Will er tatsächlich einen Mobbing-Prozess anstrengen?
     wants he really a bullying-trial pursue
     ‘Does he really want to pursue a bullying trial?’ [deWaC, pos. 498653619]

If we treat compounds headed by Verfahren, Prozess and Klage as representing only three types, we measure V=7 and V1=3. This also drastically alters 𝒫 for N=98 (from 21/98 to 3/98), and other HL-based measures accordingly. Whether or not this makes sense depends on whether we see cases like the above Mobbing-Prozess ‘bullying trial’ or others like Verwaltungsgerichtsprozess ‘administrative court trial’ as merely types of trials, or as distinct forms of legal procedures that can be pursued. In the latter case, we must keep in mind that we may be conflating the productivity of compounds with the head ‘trial’ together with the productivity of the verb’s direct object slot, since compounding is itself a productive process. This would also blur the difference between innovations due to compounding and independent neological arguments, such as Auslieferung anstrengen ‘pursue extradition’:

former

genommen wird,

in arrest taken

BAW in

letztem Fall die Auslieferung an.

BAW in

latter

case

the extradition

strengt

die

becomes pursues the PTC

‘While the former is arrested, the BAW is pursuing extradition in the latter case’ [deWaC, pos. 902276595] This is one of the only three HL in the non-lexicalist account, which intuitively appears to be a more productive or innovative case than ‘bullying trial’ or ‘administrative court trial’, which merely expand on the already familiar ‘trial’ argument. On the other hand, moving back to the less clear case of polieren, we must admit that unlike ‘administrative court trial’, Küchentisch is a frequent, familiar word (or collocation if we extend the term to German compounds),


which suggests it can be selected as a whole without the modifier being selected separately from the head. Even if it is considered completely compositional, a maximalist usage-based grammar (cf. Chapter 1, Section 2) will have no qualms about giving such an entrenched unit its own accessible lexical status, while in the case of opaque or non-compositional compounds even a classical generative grammar will have to allow for exceptions. For instance, the verb essen ‘eat’ exhibits both the argument Schweinchen ‘piglet’ and Meerschweinchen ‘guinea pig’ in deWaC, but we would not want to treat these as one type since guinea pigs are not actually a kind of pig, regardless of the morphological identity of the noun head. Nor would we believe that the speaker first selected ‘pig’ as an argument in this case and then decided to elaborate on it with a further modifier. It is also difficult to say if compounds of the ‘kitchen table’ sort are absolutely compositional – would Fillmore’s innocent speaker (recall Chapter 1, Section 2) know exactly what a kitchen table is just by knowing the meaning of kitchen and table? And if we are not certain about that, who is to say whether ‘administrative court trial’ is too unfamiliar or compositional to be selected as a whole? For a lawyer it may have a very distinct meaning, and may be accessible as a monolithic unit. It therefore seems somewhat hopeless to try to distinguish in compositional cases like (37)–(39) whether the modifier kitchen expands the vocabulary of the construction X table or rather that of polish X, or even both (the ‘combination’ analysis). In a model admitting degrees of entrenchment or lexicalization, there may be no clear cut answers, since any type of modification, even article use, can become entrenched, assume a special meaning and undermine the hierarchical selection model (cf. the articleless PP example above). Thus interaction on all levels to some degree or other must be reckoned with, and multiple productivity estimates can be arrived at depending on our vocabulary definitions. Whether or not it is advantageous to analyze compounds or other modified heads as types should be tested empirically – do we receive more satisfactory productivity rankings by regarding (at least largely transparent) compounds as distinct types or not? Is it even possible to decide if a particular compound is sufficiently compositional for the compositional analysis to apply (cf. also the discussion of productivity in context in Chapter 6)? Finally it should be noted that in some scenarios we may wish to view distinct lexical items as realizing the same type, for example in the case of human referents or proper names. If two verbs have the exact same frequency and number of objects in a sample, e.g. VN(C)=25=12, but one has all different proper names as object lexical heads and the other has 5


humans, 4 concrete objects and 3 abstract nouns as objects, we may wish to say that the second verb has a more diverse object vocabulary and is therefore more productive in some sense.63 Whether or not such class-based type definitions should be utilized is again a matter of the research question or application we are interested in: for the obsolete pattern long live X, the identity of X is very restricted, and masking the possible realizations of X behind a class such as ‘human’ would be misleading. If we are interested in the applicability of a construction across ontological domains, e.g. how varied or extensible the kinds of things are which one typically builds (abstract, concrete, or even more fine grained classes such as machines, buildings, tools…), we may wish to collapse types into classes, or use multiple type definitions to figure out the relative diversity within each class. Caution must nevertheless be exercised, since our results will then naturally depend on our class definitions. No matter how we decide for a particular case, the concept of a type for a syntactic construction which we will use from now on can now be simply defined as: a distinct realization which fills all open slots in a construction, where ‘distinct’ and ‘construction’ should be understood in the CxG sense, as entries in the mental lexicon, and where distinctness can furthermore apply to either lexical identity or identity on a more abstract class-based level. This applies both to the extension of lexically unspecified constructions (e.g. the verb MSN extending the dative argument construction in Icelandic), at an abstraction level which disregards further arguments (thus these are not open slots at that level), and the selection of an argument for a lexically specified head, for which the argument position is the empty slot. With this definition, and bearing in mind the difficulties above, it is now time to discuss some concrete cases of gradient syntactic productivity and see how some construction and type definitions fare in practice.
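The competing type definitions discussed in this section can be operationalized as interchangeable ‘key’ functions over one and the same list of argument heads, so that V and V1 can be recomputed under each definition. A minimal sketch with toy data (note that a solid compound like Musterverfahren would require real morphological segmentation, which is mocked by hand here):

    from collections import Counter

    # invented argument heads for one verbal slot
    args = ["Prozess", "Mobbing-Prozess", "Verfahren", "Musterverfahren",
            "Klage", "Prozess", "Auslieferung"]

    def type_counts(tokens, key=lambda t: t):
        freq = Counter(key(t) for t in tokens)
        return len(freq), sum(1 for f in freq.values() if f == 1)

    # 1) every distinct lexical item is a type
    print(type_counts(args))                      # (6, 5)

    # 2) collapse compounds to their head noun; hyphenated compounds are
    #    split mechanically, the solid compound is segmented by hand
    segmented = {"Musterverfahren": "Verfahren"}
    head = lambda t: segmented.get(t, t.split("-")[-1])
    print(type_counts(args, key=head))            # (4, 2)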

3.

Argument selection in competing constructions: Prepositional and postpositional wegen in German

In German, the adposition wegen ‘because of’ can be used prepositionally (45a) or postpositionally (45b), usually governing the genitive, but prepositionally also the dative (45c) in colloquial use (Helbig and Buscha 2001: 356, Zifonun et al. 1997: 2080; see Petig 1997 on use with the dative):

63. I thank Stefan Gries for this example.


(45) a. wegen des Vaters
        because-of the.GEN father.GEN
     b. des Vaters wegen
        the.GEN father.GEN because-of
     c. wegen dem Vater
        because-of the.DAT father.DAT
     ‘because of the father’

The postpositional variant is older, stemming from a phrase of the type von [N.GEN] wegen ‘from N’s side/way’, attested first in the 13th century and still preserved in expressions like von Amts wegen ‘officially, ex officio’, lit. ‘from ways of office’. The preposition von ‘from, of’ was subsequently dropped starting in the 17th century, at which time prepositional variants also begin to be recorded, along with the prepositional dative form shortly thereafter (see Paul 1959: vol. 4, 43–44; Braunmüller 1982: 200–207). In modern usage, all three variants are attested, though they carry a different degree of formality or colloquialism: the postpositional variant appears in the most formal registers and the prepositional dative is the most colloquial. There is however no difference in the sense of ‘because of’ between the pre- and postpositional variants (Zifonun et al. 1997: 2084), and all variants can be used productively (Braunmüller 1982: 200). At the same time, there is a common notion that use of the postpositional variant is waning, and my own intuition is that it is more attracted to fixed formulaic expressions, whereas the two prepositional variants are more likely to be selected when a noun previously unused in either construction is required. This is perhaps not surprising considering the fact that the postpositional genitive construction (henceforth post-gen) is essentially an archaism, whereas the prepositional genitive (pre-gen) is the younger synchronic norm, and the dative (pre-dat) is the emerging colloquial standard. But given that all three variants are undeniably productive, a model not taking scalar productivity in argument selection into account could not describe a state of affairs where all three variants coexist in the grammatical system, but some are more prone to neologisms than others. If this should be the case, such a model would simply ignore these differing levels of productivity. To test the hypothesis about the differential productivity of wegen constructions, I will extract a subset of the attestations for each construction. Since the oblique feminine article der is ambiguous between the dative and genitive, and the plural genitive article is homonymous with the feminine


one, I will follow two separate search strategies, with and without case distinctions:

1. All cases of wegen will be compared for all arguments with any compatible article (plural NPs without articles must be ignored, as they lead to unreliable matches); no attempt is made to distinguish dative or genitive.
2. Only masculine and neuter singular arguments are examined; in this case all three constructions, including dative and genitive cases, can be distinguished reliably.

A special difficulty in obtaining a large volume of high accuracy results without manual inspection is the fact that wegen can stand between two compatible phrases quite often, e.g. two genitives in:

(46) Verkehrswert des Objekts wegen des Mangels
     market-value the.GEN object.GEN because the.GEN defect.GEN
     ‘market value of the object because of the defect’ [deWaC, pos. 193047726]

ART ATTR* NN wegen PUNCT wegen ART ATTR* NN PUNCT

where ART stands for a case-compatible article, ATTR* stands for zero or more attributive adjectives or participles, NN stands for a normal noun (a common, but not a proper noun) and PUNCT stands for punctuation marks such as ‘.’, ‘?’, ‘!’, or ‘,’ etc. Since the amount of results using this strategy was still over 26,000 and cursory examination revealed no errors, arguments were not manually inspected and compounds were considered distinct types (though see the next section on this topic).

Argument selection in competing constructions

109

Following the first search strategy, which does not distinguish grammatical cases, productivity measures from Chapter 3 show that the postpositional construction is less productive and generally rarer than the prepositional forms taken together (Table 12).64

Table 12. Productivity measures for prepositional and postpositional German wegen ‘because’.

       N(C)    V      VN(C)=2744   V1     V1N(C)=2744   𝒫          𝒫N(C)=2744   S
pre    23555   9971   1958         6848   1604          0.290724   0.584548     67796.01
post   2744    1420   1420         1008   1008          0.367347   0.367347     9084.91


Plotting the VGCs for each construction (Figure 19) shows this difference is expected to hold at any sample size, with the gap growing progressively (as predicted also by the fZM extrapolation and the projected value of S).
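An empirical VGC of this kind simply records V (and V1) as the argument tokens are read in corpus order. A minimal sketch, again with generated stand-in data rather than the real wegen attestations:

    import random
    from collections import Counter

    def vgc(tokens, step=1000):
        # empirical vocabulary growth curve: (N, V, V1) at fixed intervals
        seen = Counter()
        curve = []
        for i, tok in enumerate(tokens, start=1):
            seen[tok] += 1
            if i % step == 0 or i == len(tokens):
                v1 = sum(1 for f in seen.values() if f == 1)
                curve.append((i, len(seen), v1))
        return curve

    rng = random.Random(42)   # toy stand-in for extracted wegen arguments
    args = ["noun%d" % int(rng.paretovariate(1.3)) for _ in range(5000)]
    for n, v, v1 in vgc(args):
        print(n, v, v1)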

[Figure 19 appears here: vocabulary growth curves (V against N(C)) for pre and post, with the fZM extrapolation fzm(post) and the horizontal line S(post).]

Figure 19. VGCs and fZM extrapolation for prepositional and postpositional wegen. The vertical line gives the end of empirical data for postpositional wegen and the beginning of the extrapolation. The horizontal dotted line gives S(post), the predicted limit for the postposition in the present corpus type.

64. The results for 𝒫 for the entire datasets cannot be compared directly since, as discussed in Chapter 3, Section 4, hapax legomena become much harder to find in larger samples (cf. Gaeta and Ricca 2006). The values at the largest common sample size (N(C)=2744), also given in the table, should therefore be consulted for the comparison instead.


Before examining the significance of the observed difference, it is important to establish that measurements of V and V1 are as reliable for the syntactic domain as they are for morphology (that 𝒫 is reliable follows from this directly; experiments with the fZM extrapolation will be dealt with in more depth below). This turns out to be very much the case, as illustrated by Figure 20.

[Figure 20 appears here: V (upper curve) and V1 (lower curve) against N(C) for prepositional wegen, showing mean(VGC) +/- 1 SD and the 95% confidence interval.]

Figure 20. Distribution of V and V1 across 10 equal-sized samples of arguments for prepositional wegen. Samples are normally distributed throughout the curve except where vertical notches appear at the top (for V) or at the bottom (for V1).

We divide the larger dataset (prepositional cases) into 10 equal samples so we can test the distributions of V and V1. For the maximum N(C)=2300, V and V1 are both normally distributed (Shapiro-Wilk test at α=0.01). If we test for normality at every point along the curve, we will of course get some non-normal results: these are marked by vertical notches in Figure 20. This result is expected for some cases randomly (1% of all independent samples, corresponding to α) even if the null hypothesis is true that their distribution is normal. However the samples here are not independent: if one sample of N tokens has particularly varied or repetitive arguments for wegen, the same sample at N+1 tokens is still likely to deviate. But assuming the


underlying statistic is normally distributed, these results should become increasingly rare with growing sample size: the central limit theorem stipulates that data should occur close to the mean most often, making the deviations become insignificant as more and more data behaves normally. This is precisely what we observe: starting from about N(C)=1000, V is continuously normal, and V1 already much earlier (below N(C)=400). Any short bursts of unusual repetitiveness do not last long enough and are distributed randomly enough across the samples to maintain normality in larger samples. Given this normal distribution, we can compute a 95% confidence interval, which, as the figure shows, is very tight around the sample mean displayed at the center as a solid curve.65 This means that there is actually an expected mean value for V and V1 for arguments of prepositional wegen at a particular sample size, and that future samples of comparable data should also exhibit values within that interval (with 95% certainty). What we need to do is therefore to test whether data for one process appears within the interval of the other, in this case for pre- and postpositional arguments. If we accordingly treat ‘finding hapax legomena in a fixed sample size’ as a binomial process, as we did in Chapter 3,66 we can now find a very significant difference between the two constructions at the maximal common sample size (N(C)=2744, p sift) corresponds to the ranking according to V, which may roughly be interpreted as the number of types one has encountered in one’s experience (insofar as massive amounts of data from the Internet can


be thought to reflect linguistic experience).69 Absolute frequency, at least in this corpus, would have ranked achieve above all other verbs, whereas the chance to encounter a new argument after seeing all the data (𝒫), provided the next verb we encounter is the one in question, is actually highest for sift. This is because sift is so rare that we have seen hardly any of its possible arguments yet, even though these are estimated at a meager S≈438. The ranking based on S itself is different again, prioritizing verbs for which there is reason to believe that new arguments outside of this experience are expected (judging by the rate these are added at as the corpus grows). Achieve beats push here despite a marginally smaller vocabulary, just as spend is overtaken by incur. This can be taken to mean that the number of things one spends, mainly expressions for time and money, does not grow as productively as things that can be incurred, notwithstanding the relative rarity of incur compared to spend, the second most common verb. At N(C)=1000, both 𝒫 and V result in the same ranking for all verbs, as they show a linear correspondence (cf. Figure 22). This is because the proportion of hapax legomena is very high in a sample of only 1000 tokens, so that V and 𝒫 = V1/1000 are closely related. The interim conclusion that can be drawn from these results is first and foremost that productivity cannot be reduced to a single scale for syntax any more than it can be for morphology. An attempt to find a global productivity ranking, for example to correspond to a single intuitive ranking, leads to an undesirable loss of information. The measures used here can therefore be seen as complementary dimensions of productivity with different meanings, not mutually redundant indicators of a single productivity score. This fits with, and motivates, the search for a multidimensional ‘productivity complex’. What these results do not yet clarify at all, however, is what the theoretical status of that complex is. It should by now be clear that the same phenomena that in the previous section made pre- and postpositional wegen have different productivity ratings are also at work here. But unlike in that case study, there is no expectation that there should be ‘as many’ drinkable or edible things as there are achievable ones – insofar as this statement makes sense for non-enumerable categories. Indeed, as already remarked on in Chapter 2, Section 4, this could be a matter of lexical semantics and world knowledge (and doubtless this is at least partly the case, though as I will be suggesting

69. Interestingly, V also plays a central role in Barðdal’s (2008: 52) view of productivity, in that it correlates strongly with ‘schematicity’ (cf. the discussion in Chapter 2, Section 4) – in her view the central property of productivity.


later, not entirely). Before moving on to examine these problems and find criteria for determining the theoretical status of our measurements, let us consider one last aspect of the application of the morphologically rooted methodology to argument selection – the case of multiple argument slots.
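To make the testing procedure used in the last two sections concrete, the hapax-based comparison can be sketched in a few lines of code. This is a minimal illustration with invented toy data, not the actual corpus queries of this study; the helper names are hypothetical, and SciPy’s binomtest is my own choice of tool here:

```python
from collections import Counter
from scipy.stats import binomtest  # available in SciPy >= 1.7

def measures(tokens):
    """Return N(C), V, V1 and potential productivity P = V1/N(C)."""
    freqs = Counter(tokens)                        # frequency list of types
    v1 = sum(1 for f in freqs.values() if f == 1)  # hapax legomena
    return len(tokens), len(freqs), v1, v1 / len(tokens)

# Toy samples of head nouns for two competing constructions,
# trimmed to a common N(C) so that the measures are comparable:
pre  = ["Regen", "Krankheit", "Streik", "Nebel", "Umbau", "Krankheit"]
post = ["Umbau", "Umbau", "Platzmangel", "Umbau", "Regen", "Regen"]

n_pre, v_pre, v1_pre, p_pre = measures(pre)
n_post, v_post, v1_post, p_post = measures(post)

# Treat 'finding a hapax legomenon' as a binomial success and ask whether
# the second construction's hapax count is compatible with the first's P:
test = binomtest(k=v1_post, n=n_post, p=p_pre)
print(v1_pre, v1_post, test.pvalue)
```

With realistic samples of several thousand tokens, as in the wegen comparison above, such a test amounts to asking whether one construction’s hapax count is plausible under the other’s hapax probability at the same N(C).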

5. Productivity in multiple slots: The case of comparative correlatives

In Chapter 2, Section 2, I suggested that one difference between syntactic and morphological selectional processes is that syntactic processes often select for multiple slots, whereas typical morphological processes, such as affixation, only select for a morphological base (possible exceptions such as compounding have already been mentioned there). From a CxG point of view, satisfying the selectional requirements of a construction means filling all available slots for a mental lexicon entry that is not entirely lexically specified. Looking at representations of such constructions in CxG literature, we find more or less schematic or lexicalized entries with multiple slots, such as comparative correlatives (the Xer the Yer), or verbal patterns such as ditransitives or the passive (cf. Chapter 1, Section 2). Similarly in Langacker’s (1987: 35) earlier formulation of cognitive grammar, it is said that “lexicon, morphology, and syntax form a continuum of symbolic units”, and his corresponding notion of the inseparability of grammar from meaning suggests that complex syntactic structures are also stored in the lexicon with lexically unspecified positions. In the previous section, we already singled out the realization of the accusative object of transitive verbs as a separate productive process, independent from, say, the realization of the subject for the same verb. To an extent, this can be justified by syntactic accounts that posit a higher adjoined generation for subjects in various languages. In such accounts we could claim the subject is selected at one point, and the object at another. This argument can of course be applied to almost any construction. We may be inclined to view the selectional process as nested or hierarchical (as suggested for a verb selecting a compound head, which then selects a modifier in a separate process, in Section 2 above) and simply look at productivity one slot at a time. To the extent that lexical choices in two slots, such as subject and object, can be shown to be statistically independent (which they are often not), we could disregard measuring productivity for the construction as a whole. It may in any case appear somewhat unclear what it would mean for the ditransitive to be more or less


productive than the passive as a whole, whereas reference to the variety of arguments for the transitive object slot alone versus the passive subject slot alone may be of some more interest, for example to answer questions like ‘is usage of passives lexically more conservative in extending to new THEME arguments than actives?’ It is therefore not just a matter of taking two slots in a construction and computing measures (as we saw in Chapter 3, this can yield results even for non-well-defined linguistic processes). We would ideally like a case that is compelling not only for the comparison of single slots but also for the multi-slot constructions that contain them as well. In Goldberg’s example lexicon or ‘constructicon’, the case of comparative correlatives (henceforth CCs) seems to me to be the most interesting (for an in-depth discussion of their syntactic properties see Culicover and Jackendoff 1999; for a crosslinguistic survey of this type of construction see den Dikken 2005). Especially in the ‘bare’ form given by Goldberg, the Xer the Yer (without the optional NPs and VPs in the Xer NP VP, the Yer NP VP), we are dealing with a construction that involves two slots that are interdependent not only in their semantic interpretation (the correlation), but also in their being obligatory to the constitution of the construction. We could easily imagine a ditransitive verb with a ‘missing’ argument, e.g.:

(49) I would give good money to know what happened there

where a referent for a supposedly elided dative object of give is difficult to envision. For the bare CC construction, as in (50), diverging from the the Yer pattern in the apodosis is only licensed if a VP is realized in the protasis clause (51).

(50) The faster the better.
(51) It would be better the faster it gets done.
(52) *It would be better, the faster.
(53) *The faster, it would be better.

This interdependence is just one of the factors which make bare CCs especially attractive as a candidate for evaluating a two-slot construction. At the same time, the contrast with non-bare CCs on the one hand and the behavior of the identical comparative category in either clause on the other

hand brings us some natural points for comparison, giving the measurements a relative meaning. My own intuition, which will be tested below, is that the shorter CC variants are less productive than the long ones, which is mirrored by the existence of some intuitively frequent, conventionalized short forms (the more the merrier, the faster the better) but a much less constrained feeling for the full VP variant. Collecting data for CCs in English is somewhat difficult on account of their similarity to NPs (their overt markers being the word the and a comparative), though bare CCs are comparatively easy to find. The search strategy will use the following schematic patterns (see Appendix A.4 for exact queries):

C:  the ((more|less) ADJ | COMP)
CC: C NP? VP? ,? C NP? VP?

meaning each clause must contain the pattern C, which begins with the (possibly capitalized, upper case was also considered), followed by either analytic more/less and a positive adjective, or a morphological comparative (including forms ending with -er and the irregular forms more, less). The entire pattern CC contains two clauses, optionally separated by a comma, which may contain an NP and possibly a VP using a high accuracy but flexible definition for both (including modification by adjectives, adverbs, PPs and genitive attributes). Though it is of course possible to mix clause types (e.g. VP in the second but not in the first clause), I will limit the scope of this study to the three symmetrical types: bare CCs (abbreviated BCC), CCs with nominal phrases but no VPs (CNCN), and CCs with full VPs in each clause (CNVCNV). In order to compute vocabulary growth curves for each pattern, we must reconsider the type definition arrived at in Section 2 above. What constitutes a new type or a hapax legomenon for a construction with two open slots? Working naively from the definitions discussed above, we may be inclined to simply concatenate lexemes from the two slots to produce a single fused ‘type’ (e.g. faster_better), which is then tested for novelty – if some exact combination has not been observed before, it is a hapax legomenon. Observing the same combination again removes it from the HL list, but the type count V remains the same. Figure 26 gives VGCs based on this definition.
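Schematically, this fused-type bookkeeping can be sketched as follows. This is a toy illustration with invented pairs; the function name is hypothetical:

```python
from collections import Counter

def naive_vgc(pairs):
    """Vocabulary growth for two-slot data using fused types:
    each (slot1, slot2) combination is one type, e.g. faster_better."""
    seen = Counter()
    growth = []                      # V after each token, for a VGC
    for s1, s2 in pairs:
        seen[f"{s1}_{s2}"] += 1
        growth.append(len(seen))
    v1 = sum(1 for f in seen.values() if f == 1)   # remaining hapaxes
    return growth, len(seen), v1

pairs = [("faster", "better"), ("more", "merrier"),
         ("faster", "better"), ("longer", "better")]
growth, v, v1 = naive_vgc(pairs)
print(growth, v, v1)   # [1, 2, 2, 3] 3 2
```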

[Figure: vocabulary growth curves (V against N(C)) for CNVCNV, CNCN and BCC.]

Figure 26. VGCs for symmetric bare, NP and NP+VP comparative correlatives using a ‘naive’ type definition based on slot concatenation.

The results correspond to the intuition above: besides being less frequent in general, BCCs exhibit much lower realized and potential productivity (lower V, V1 and 𝒫). CNCNs come in second, and CNVCNVs are the most varied in their choice of comparative combinations.

However, is the concatenation-based approach really reflecting what we mean by novelty? Looking more closely at the reasons for a hapax legomenon in that approach, we can identify two distinct cases: on the one hand, it is possible that (at least) one of the slots exhibits a novel lexeme, thereby leading to innovation. On the other hand, since we are counting unique concatenations, it is possible that both slots are realizing familiar arguments, but in a novel combination. We may indeed want to treat both these cases as forms of innovation, but it might be useful to understand what the impact of each case is on the idea of measuring multiple-slot productivity. We therefore chart a second set of VGCs where only innovation in at least one slot counts as a new type: novel combinations of lexemes already observed in their respective slots are considered repetitions, but lexemes appearing in a slot for the first time are considered novel even if they have already appeared in the other. The results for vocabulary growth as measured using this revised type definition are shown in Figure 27, together with the previous curves in grey to illustrate the difference.
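The revised novelty test can be sketched under the same assumptions as the previous snippet, tracking each slot’s inventory separately:

```python
def per_slot_vgc(pairs):
    """Vocabulary growth where a pair counts as a new type only if at
    least one slot realizes a lexeme not yet seen in that slot."""
    slot1, slot2 = set(), set()
    v = 0
    growth = []
    for s1, s2 in pairs:
        if s1 not in slot1 or s2 not in slot2:
            v += 1                   # innovation in at least one slot
        slot1.add(s1)
        slot2.add(s2)
        growth.append(v)
    return growth

pairs = [("faster", "better"), ("more", "merrier"),
         ("more", "better"), ("longer", "better")]
print(per_slot_vgc(pairs))   # [1, 2, 2, 3]
```

Note that the third pair, more + better, no longer counts as an innovation: both lexemes are familiar in their respective slots, so only the novel combination is removed from the count, exactly the difference between the two definitions.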

[Figure: vocabulary growth curves (V against N(C)) for CNVCNV, CNCN and BCC under the revised type definition, with the curves from Figure 26 in grey.]

Figure 27. VGCs for symmetric bare, NP and NP+VP comparative correlatives using a non-permutational novel type definition based on innovation in at least one slot.

As we can see, the different type definition has had a drastic impact on the more productive constructions, though the already rather unproductive BCC is hardly affected. This is because BCC was very conventionalized to begin with – most of its repetitions are due to verbatim recurrence of the most common phrases (the more the merrier, the faster the better etc.). The proximity of CNCN to BCC shows that the range of lexemes used is not that much higher for CNCN – the flexibility to configure known lexemes in new pairings makes a significant contribution to its apparent variety. CNVCNV, while strongly impacted by the removal of productivity via permutation, is still clearly above the other constructions, implying a lower degree of conventionalization in its choice of comparatives.

Yet to understand just how conventionalized or limited these constructions are, we must look at comparatives at large: how much of the spectrum of conceivable comparatives which appear outside of CCs do these constructions employ? In order to make this comparison, we might naively compare productivity measures for two-slot CCs with one-slot comparatives. However, the choice of a single comparative cannot be fairly compared with a two-slot construction, since the latter has ‘two chances’ to innovate. A simple solution would be to compare CCs with two instances of comparative choice. But this would only apply if we assume no interaction between the two CC choices – otherwise we distort the meaning


of a potential comparison, since one CC clause may constrain the choice in the other, whereas two textually unrelated comparatives do not constrain each other. That CC slots are interdependent can be shown using the following argumentation: if the probabilities of hapax legomena in the protasis C1 and the apodosis C2 are independent, we would expect the probability of two-slot hapax legomena for a certain sample size N to be complementary to the probability of getting two familiar lexemes at N. In other words, the probability of a CC hapax legomenon should be 1 minus the product of the complements of 𝒫 in each slot (for N tokens), as shown in Equation [14].

P(HL_C1 ∪ HL_C2) = 1 − P(~HL_C1 ∩ ~HL_C2) ≈ 1 − (1 − 𝒫_C1) · (1 − 𝒫_C2)   [14]

Or more generally, for n independent slots designated by S_i with the hapax probability 𝒫_i (at a fixed sample size N(C)) we could write:

P(HL_S1 ∪ … ∪ HL_Sn) = 1 − P(~HL_S1 ∩ … ∩ ~HL_Sn) ≈ 1 − ∏ᵢ (1 − 𝒫ᵢ)   [15]

If we compare 𝒫 measurements for each CC slot, as in Table 17, the prediction in Equation [14] turns out to be fairly good for the most productive construction, CNVCNV (only ~4% deviation), and even for CNCN (~9% deviation, both results showing an insignificant binomial test result compared to the expected hapax probability), but not so good for BCC (17% deviation from the expected probability of hapax legomena, a significant difference for α = 0.05). Clearly, lexical choice is mutually constrained in a substantial way at least for BCC slots, and perhaps even for other CCs to some degree, and this seems to extend to the probability of hapax legomena in each slot. Intuitively, this means that if we see a protasis like ‘the more …’, we have a better idea or stronger expectation about the apodosis (the familiar ‘the better’ or ‘merrier’) than if we see ‘the more you do’.


Table 17. Observed and expected 𝒫 values for two-slot constructions under an assumption of hapax probability independence.

construction    N      V     V1    𝒫          E(𝒫)       E(𝒫)/𝒫
BCC1            824    166   110   0.133495
BCC2            824    103    74   0.089806
BCC             824    206   148   0.179612   0.211312   1.176496
CNCN1          1921    228   131   0.068194
CNCN2          1921    212   111   0.057782
CNCN           1921    378   216   0.112441   0.122036   1.085326
CNVCNV1        1923    282   159   0.082683
CNVCNV2        1923    381   221   0.114925
CNVCNV         1923    567   349   0.181487   0.188106   1.036467
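As a worked example, the expectation in Equation [14] can be recomputed directly from the BCC counts in Table 17 (a small sketch; all figures below come from the table):

```python
# Per-slot hapax probabilities for BCC at N(C) = 824 (Table 17):
p_c1 = 110 / 824          # protasis slot, P = 0.133495
p_c2 = 74 / 824           # apodosis slot, P = 0.089806
p_obs = 148 / 824         # observed two-slot P = 0.179612

# Expected two-slot P under independence, Equation [14]:
p_exp = 1 - (1 - p_c1) * (1 - p_c2)
print(round(p_exp, 6))            # 0.211312
print(round(p_exp / p_obs, 6))    # 1.176496, i.e. ~17% above the observed P
```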

This further supports the idea that the two-slot productivity measurement is of interest: bare CCs are more conventional as a whole than longer CCs. At the same time, for reasons discussed above, this interdependence forces us to use single CC slots for a direct comparison with comparatives outside of CCs. This will also tell us a great deal about how constrained each slot is in itself. Figure 28 shows vocabulary growth in the different slots.

[Figure: vocabulary growth curves (V against N(C)); curves from top to bottom: CNV2, comp-q, BCC1, CNV1, comp+q, CN1, CN2, BCC2.]

Figure 28. VGCs for each comparative slot in symmetric bare, NP and NP+VP comparative correlatives, as well as non-CC comparatives including and excluding comparative quantifiers. The order of the legend reflects the order of the curves from top to bottom.


As it turns out, the behavior of comparatives outside of CCs depends heavily on the operationalization chosen for synthetic comparatives. If we define synthetic comparatives based on the comparative part-of-speech tag (JJR), we get a massive amount of hits for more, which is very common as a comparative quantifier in environments where it is not interchangeable with a comparative adjective.70 If we remove the comparative quantifiers (more, less, fewer), then comparatives outside CCs are as productive as the least restricted CC slot, CNV2 (top two curves, no significant difference). With quantifiers included (comp+q), we receive a much flatter curve, since the high probability p(more) lowers p(HL), i.e. it lowers 𝒫. The result is then as productive as the middle curves around the more restricted CNV1. Comparing the curves for CC slots amongst themselves reveals some bigger surprises: while in the second clause, C2 (black curves), the ranking from before is maintained (CNV2 > CN2 > BCC2, p […]), whether we rank whole constructions (CNVCNV > CNCN > BCC) or individual ones (BCC1 > BCC2). The selectional properties of the comparative formation are distinctly different depending on the embedding syntactic context. If this were not so, we should expect CC slots to exhibit the morphological productivity inherent to the comparative formation at large. But as the data in Table 18 suggests, this is plainly not the case, for reasons that remain to be explored.
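Before concluding, the dilution effect of a single dominant type like more can be illustrated with a toy computation (invented counts, merely to show how a high p(more) depresses 𝒫):

```python
from collections import Counter

def potential_productivity(tokens):
    """P = V1 / N(C): the proportion of hapax legomena in the sample."""
    freqs = Counter(tokens)
    return sum(1 for f in freqs.values() if f == 1) / len(tokens)

# Toy comparative sample dominated by the quantifier 'more':
comps = ["more"] * 60 + ["less"] * 10 + ["bigger", "faster", "nicer",
                                         "older", "greener", "higher",
                                         "higher", "worse"]
quantifiers = {"more", "less", "fewer"}

print(potential_productivity(comps))  # comp+q: many 'more' tokens lower P
print(potential_productivity([t for t in comps if t not in quantifiers]))
# comp-q: P rises sharply; note that N(C) also shrinks here, which is why
# real comparisons are made at a fixed common sample size, as throughout
# this chapter.
```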

6. Interim conclusion: Measuring productivity for syntactic argument slots

The aim of this chapter has been to give a first look at how the morphological productivity measures from Chapter 3 can be implemented for syntactic argument structures. By now, the following claims should be considered substantiated:

i. Syntactic productivity is scalar, in precisely the same ways as morphological productivity. For example, as shown in Section 3, it makes sense intuitively that prepositional wegen exhibits more varied head nouns than postpositional wegen, and this intuition can be corroborated in data, despite the fact that neither construction is unproductive (i.e. the possible arguments are not enumerable).

ii. Syntactic productivity is consistent. Different slots have typical, distinguishable and predictable values for V and V1 as a function of N(C), and consequently also for 𝒫 and estimates of S (with the caveat already noted for morphology that productivity is always dependent on ‘text type’ or register in the widest sense: measures only tell us what we can expect for ‘more of the same kind of data’). These phenomena should be explained, at least in a framework purporting to describe and predict patterns of usage.

iii. Syntactic productivity is multidimensional. As shown in Section 4, different aspects of productivity are not necessarily correlated (though they often are). Measures based on token and type frequency or hapax legomena, even for a same-sized sample, may result in different rankings and must therefore be included in the description of the syntactic Productivity Complex. Some measures are independent of the sample size at which they are measured and should scale with it (frequency, and, to the extent that it can be approximated, S), but others depend on the N(C) at which they are measured (V, 𝒫). A complex of measures capturing these aspects must therefore give values as a function of N(C).

iv. Productivity for multiple slots is meaningfully measurable, though it does produce a loss of information. In the case of comparative correlatives shown in Section 5, there was intuitive logic to the notion that bare comparative correlatives, such as the more the better, are more formulaic and less productive as a whole than ones with NPs and VPs. However, looking at each slot still provided additional information, showing that the second comparative in the bare construction is even less productive, and that the productivity of clauses with two full VPs is more owing to the first slot. Nevertheless, we have seen that productivity measured for both slots together is not necessarily predictable from each slot’s behavior due to interaction, which disproves the prediction made by Equation [14].

v. Productivity is context-sensitive. A byproduct of the comparison of productivity in CC comparative slots and comparatives outside CCs was the result that what is apparently one process, namely comparative formation, can behave quite differently based on embedding context. While this fact was not discussed in depth in this chapter, it will occupy a significant role in the discussion of the place of productivity in grammar in Chapter 6. For now, we will note the fact that unequivocal productivity measures of the sort assumed in Chapter 3 may be too crude, since they imply that each formation, be it morphological (comparative formation, nominalization in -ness) or syntactic (choice of an argument), has only one rating for each aspect of productivity (realized vocabulary, chances of a new type, etc.). This simplification misses out on possible systematic differences in argument selection as a function of the syntactic environment.

However, we have not yet established what might be responsible for the varying productivity in different slots. I have already made reference several times to the role of lexical semantics and world knowledge in constraining the possible argument spectrum for some given selectional process, which has become perhaps most evident in Section 4. Clearly, the number of different edible and drinkable things in the world, which can be denoted linguistically by some semantic class, is relevant for the variety of arguments we encounter as objects for the verbs eat and drink. Yet the


results in this chapter have implied that productivity in argument filling has a place in the grammatical representation of constructions, i.e. that competing variants of a construction (pre- vs. postposition, bare vs. longer CCs) may have inherently different productive behavior in usage. In the next chapter, I therefore turn to directly examine the relationship between lexical semantics, world knowledge and productivity.

Chapter 5
Lexical semantics and world knowledge

Up to this point, I have attempted to make the applicability of the notion of productivity to syntactic selectional processes plausible, showing descriptively that certain phenomena, in particular argument filling, show reproducible idiosyncratic behavior in the amount, the variety and the novelty of arguments that are realized in a certain slot. However, a direct theoretical examination of the reasons for these phenomena has been avoided. In this chapter I will try to show that generally assumed semantic factors, such as feature- or entailment-based semantic classes and decompositional lexical semantics, fail to fully explain or predict the phenomena explored so far. Similarly, reliance on world knowledge will be shown to be insufficient in explaining selectional behavior for novel arguments. The chapter begins with a brief overview of semantic approaches to word meaning and their implications for argument selection, focusing on definitions using semantic primes or primitives, semantic classes and features or entailments. The second section attempts to constrain what a semantic account of productivity might look like and what would constitute counterevidence for a purely semantic view. The following three sections are concerned with demonstrating idiosyncratic productivity in three progressively closer families of argument structures: synonyms and other items taking a similar class of arguments (Section 3), different derivations from the same morphological stem (Section 4), and syntactic alternations involving the same verb with varying degrees of productivity (Section 5). Section 6 examines the interplay of productivity, semantic classes and pragmatics in translational equivalents across languages, aiming to establish productivity as a language-specific phenomenon. Section 7 concludes the chapter with some claims that will be used in the model of the mental lexicon presented in Chapter 6.

1. Semantic approaches to argument selection

Since the potential for argument selection has been seen as an intrinsic property of words or morphemes, and especially of verbs, its semantic aspect (as opposed to the details of its syntactic realization) has been largely discussed within the theoretical domain of lexical semantics (but from a practical standpoint within lexicography and computational linguistics as well, see below). Put simply, lexical semantics is the study of the ‘meaning of words’,71 which includes paradigmatic aspects of word meaning (e.g. synonymy, antonymy and other relations between paradigmatically interchangeable words) and syntagmatic ones, i.e. how the meanings of individual words combine in ways that are prescribed by the mental lexicon (see Geeraerts 1994a, b). Both of these aspects interact in the process of argument selection, since heads may lexically specify a semantic a-structure which they require to be filled by other lexemes (i.e. a list of thematic relations, or in generative grammar θ-roles in a θ-grid, Chomsky 1981: 35), but may also impose semantic restrictions on the paradigm of lexemes which are semantically suitable to fill that slot. It is therefore an important task of lexical semantics with regard to a-structure to specify what kind of information is stored in the mental lexicon to ensure semantically correct output. To illustrate the constraints that arise due to a-structure restrictions, consider the following examples, reproduced from Schulte im Walde (2009: 953):

(55) Elsa bakes a chocolate cake
(56) *Elsa bakes that she likes cakes

The verb bake requires certain lexically predetermined thematic roles to be filled by formally appropriate syntactic arguments, in generative grammar satisfying the θ-criterion, viz. that every role must correspond to exactly one argument and vice versa (notwithstanding the fact that the identification of the precise role and inventory of roles may not be uncontroversial, cf. Levin and Rappaport Hovav 2005: 38–49). Bake may appear in an a-structure with two nominal arguments: an AGENT and a THEME, which can be schematically represented as bake(AGENT, THEME), or instantiated with particular arguments as in (55), bake(Elsa, a chocolate cake). Replacing the THEME NP with a that-clause CP as in (56) is not possible, as this violates the verb’s syntactic a-structure. Crucially, the same head can appear with multiple distinct a-structures, which can make reference to the same thematic roles even if they are realized differently in

71. Though it is recognized that this definition is problematic without the possibility of reference to clear definitions of ‘word’ and ‘meaning’, cf. Geeraerts (1994a: 2160).


syntax (i.e. not necessarily as syntactic subject and object). This makes it possible to explain alternations of the structure in (55) with those in the following examples (Schulte im Walde 2009: 953):

(57) Elsa bakes Tim a chocolate cake
(58) The chocolate cake baked for 1 hour

In (57) we have an additional GOAL argument, realized by the NP Tim, whereas in (58) we have the same THEME, this time realized syntactically as a subject rather than an object, but no AGENT, showing the relative independence of semantically determined thematic roles from syntactic structure. The assumption that we are not dealing with distinct, homophonic verbs is motivated by the fact that such alternations retain the relationships of meaning between arguments and heads, and the fact that they are extensible to many other verbs (see Levin 1993 for alternations in English and Goldberg 1995 for their interpretation as constructions). On the other hand, it may not be enough that an argument be present to fill each thematic role, as Schulte im Walde (2009: 954) points out:72

(59) #Elsa bakes a stone

Though the a-structure in this example is superficially comparable to (55), the verb’s selectional preferences are said to make it appear strange, because a stone is not typically baked. Even if the sentence is accepted as grammatical on some level, there are many practical problems for which we may be interested in delineating the expected class of arguments for an a-structure. For example, in computational linguistics, lexicalized parsing allows us to identify verbal objects more reliably by replacing individual lexical items with abstract classes. If a parser knows examples of the sort drink a beverage and is confronted with a novel beverage in that position, as in drink cocoa, knowledge of the semantic class can help it solve the data sparseness problem (see Schulte im Walde 2009: 960). In some early accounts, selectional preferences were seen as categorical restrictions (most prominently in Katzian semantics, Katz and Fodor 1963), which were also responsible for the selection of the correct compositional

72. This example is marked with a ‘?’ by Schulte im Walde, but I mark it with ‘#’ here for consistency with other grammatical but infelicitous examples, reserving ‘?’ for questionable grammaticality.


interpretation of sentences containing lexemes with multiple readings. For example, in the following sentence (Katz and Fodor 1963: 198):

(60) The man hits the colorful ball

colorful is interpreted as meaning ‘abounding in contrast or variety of bright color’ (as in the gift came in a colorful wrapper), while a possible reading ‘having distinctive character, vividness, or picturesqueness’ (as in no novel is less colorful than Middlemarch, excepting Silas Marner) is rejected. This is attributed to the fact that the first reading qualifies, among other things, the class ⟨physical object⟩ (e.g. ball), but the second reading qualifies the class ⟨aesthetic object⟩ instead.73 It should be said at the outset that adhering to this approach leads to some very specific semantic classes, as noted already by McCawley (1968: 134), who points out the classes [+shrimp or prawn] for the verb devein or [+matrix] for the verb diagonalize. Whether one understands such class notations as implying cognitively founded classes, however difficult this might be to establish, or merely as an expression of the semantic meaning of the verb through which object selection is restricted (regardless of what sort of actual cognitive representation is assumed), the feature-based approach to semantic classes has remained influential as a starting point for explaining semantic restrictions on argument selection, and more so for cognitively plausible classes encompassing foods as in [+edible] or liquids as in [+liquid].

73. I reproduce the Katzian notation here, which is essentially equivalent to the feature notation used elsewhere in this book, e.g. [+physical object].

For other authors, the line between such selectional restrictions and a-structure in the sense of thematic roles seems less clear. Dowty (1991) distinguishes ‘thematic role types’, which are generalizable across many verbs, from ‘individual thematic roles’, which apply to specific roles for specific verbs (e.g. a BUILDER role for the verb build and a KILLER role for the verb kill). He cautions that the invocation of individual roles trivially leads to maximal differentiation and satisfaction of the θ-criterion for any argument, at the price of a loss of generalization power. However, Dowty does not rule out that such individual roles exist beside more general roles. In fact, he sees all thematic role types as arising from semantic entailments:

From the semantic point of view, the most general notion of thematic role (type) is A SET OF ENTAILMENTS OF A GROUP OF PREDICATES WITH RESPECT TO


ONE OF THE ARGUMENTS OF EACH. (Thus a thematic role type is a kind of second-order property, a property of multiplace predicates indexed by their argument positions.) (Dowty 1991: 552)

He then continues by demonstrating that various verbs, e.g. murder, nominate and interrogate, all entail for their AGENT argument that it is volitional, intends for the predicate in question to take place, causes it to take place, etc., which other verbs might not (e.g. kill need not have a volitional AGENT). Thus typical agenthood (and likewise patienthood) can be decomposed into several common entailments, which Dowty uses as a foundation for a theory of multiple entailment-based proto-agent and proto-patient roles. Naturally, individual thematic roles can also trivially be defined as entailments: build requires an argument that implies not only proto-agenthood on a coarse-grained level through the entailments of volition, etc., but also ‘one that builds’, i.e. a BUILDER. Thus the ontology of thematic roles can exist at different grains, where individual thematic roles such as BUILDER are the most granular level.74 In whatever granularity we assume to be necessary, the relevant entailments or semantic classes must be stored in the lexicon entry of the verb or a-structure in question (cf. ‘a property of predicates’ above), since they cannot be predicted on the basis of general principles.75

Another approach to semantic classes may be found in the framework of decompositional semantics, which is concerned with the definition of the meaning of words through the use of a small, closed inventory of basic concepts, which should ideally be language independent. Two prominent approaches can be found in Jackendoff’s (1990) Lexical Conceptual

74. In some CxG work, notably Goldberg (1995: 50–66), general thematic roles such as AGENT are assumed to be specified by more abstract a-structure constructions, which fuse together with specific roles like BUILDER when a concrete verb instantiates the construction. However it seems possible to conceive of any number of intermediate levels of granularity without a qualitative difference in the type of restrictions involved, so that the distinction between the most general and most specific roles need not be seen as a dichotomy (see also Boas 2011 for a recent discussion).

75. For a more explicit formulation of this idea see the Inhaltsmerkmal or ‘content feature’ postulated for the mental lexicon by Jacobs (1994: 22–28), with the corresponding INSP (inhaltliche Spezifizität ‘content specificity’) criterion for felicity.


Structure (LCS) and Wierzbicka’s (1996) Natural Semantic Metalanguage (NSM).76 In Jackendoff’s work, conceptual structure is one of three levels of grammatical representation beside phonological and syntactic structure (Jackendoff 1990: 16–19). It is not completely language-dependent, as it interfaces with general perception and non-linguistic processes of reasoning, but is connected to the syntactic and phonological components through correspondence rules, which presumably lead to language-specific concepts. As a consequence, linguistic entities such as words and sentences reflect conceptual structure. For Jackendoff, the semantic differences between these types of entities are not ones of principle – words have an internal semantic structure which corresponds largely to semantic relationships that can be expressed by complex syntagms, which he carefully but very explicitly expresses as follows:

to the extent that twice paraphrases two times, or kill paraphrases cause to die, or smash paraphrases break violently, or sell paraphrases give away in return for money, the extralexical semantic structures expressed by the paraphrases must be reproduced internal to unitary lexical items (Jackendoff 1987: 373)

That is to say, there may certainly be differences between kill and cause to die (and presumably hence “to the extent that…”; see also Fodor 1970 on causative decomposition and kill in particular), but if I understand Jackendoff correctly, then from a semantic point of view they may be seen as equivalent.77 Jackendoff also explicitly defends the idea that “thematic relations are part of a level of semantic/conceptual structure, not part of syntax” (Jackendoff 1987: 372), so that one can assume that the same

76. The brief discussion of these particular approaches here is meant as an example and is by no means exhaustive. For an overview in the context of argument realization and event structure see Levin and Rappaport Hovav (2005: 68–75).

77. This is especially pertinent in his criticism of Fodor’s (1998 i.a.) notion that words can never be accurately defined by a paraphrase and consequently have no decompositional structure, summarized in Jackendoff (1990: 37–41). For Jackendoff, some aspects of meaning are not part of lexical conceptual structure but may be stored in a non-linguistic perceptual or encyclopedic knowledge which should not be relevant for syntactic behavior (Jackendoff 1990: 32–37; though see Taylor 1996 for criticism of this separation of knowledge bases on the grounds that aspects of encyclopedic meaning always find reflexes in argument selection, among other things).

That is to say, there may certainly be differences between kill and cause to die (and presumably hence “to the extent that…”; see also Fodor 1970 on causative decomposition and kill in particular), but if I understand Jackendoff correctly, then from a semantic point of view they may be seen as equivalent.77 Jackendoff also explicitly defends the idea that “thematic relations are part of a level of semantic/conceptual structure, not part of syntax” (Jackendoff 1987: 372), so that one can assume that the same 76. The brief discussion of these particular approaches here is meant as an example and is by no means exhaustive. For an overview in the context of argument realization and event structure see Levin and Rappaport Hovav (2005: 68–75). 77. This is especially pertinent in his criticism of Fodor’s (1998 i.a.) notion that words can never be accurately defined by a paraphrase and consequently have no decompositional structure, summarized in Jackendoff (1990: 37–41). For Jackendoff, some aspects of meaning are not part of lexical conceptual structure but may be stored in a non-linguistic perceptual or encyclopedic knowledge which should not be relevant for syntactic behavior (Jackendoff 1990: 32-37; though see Taylor 1996 for criticism of this separation of knowledge bases on the grounds that aspects of encyclopedic meaning always find reflexes in argument selection, among other things).

144

Lexical semantics and world knowledge

thematic roles are implied by paraphrases of the sort cited above (these are simply different syntactic realizations of the same role, just as passivization does not alter the roles of the underlying arguments). Formally, Jackendoff (1987: 386) envisions selectional restrictions as ‘semantic markers’, such as the conceptual class [+liquid] for the object of drink, and these are indexed to particular syntactic constituents. Words incorporating e.g. drink into their decomposition would therefore impose the same semantic marker on the relevant argument position as well. However this is not to say that Jackendoff does not recognize differences in meaning between paraphrases such as the above quite explicitly: The fact that almost every word meaning has fuzzy boundaries and various sorts of indeterminacies does not threaten a theory of lexical decomposition, despite frequent claims to the contrary. It is however necessary to develop a theory of decomposition in which such indeterminacies are part of the fabric of conceptual structure. […] On the other hand, such conceptual indeterminacies seem to play a relatively minor role in the relation between conceptual structure and syntax. That is, the correspondence rules between these two levels of representation make reference primarily (and perhaps exclusively) to those aspects of conceptual structure that are more or less discrete and digital. (Jackendoff 1990: 283–284)

In other words, these differences are simply not reason enough to give up on the benefits of decompositional analysis, which can account for entailments (e.g. x killed y entails y is dead) and conversely for the inappropriateness of utterances contrary to such entailments. A similar, but more minimalistic approach can be found in Wierzbicka’s (1996) Natural Semantic Metalanguage (NSM), which attempts to reduce the inventory of ‘semantic primes’, axiomatic primitive concepts, to a small selection of 55 items from which the meaning of all other lexical units in any language should be derivable. Semantic primes themselves are such basic elements of human cognition that they plausibly do not require any definition. Among these are terms of pronominal/substantive reference (I, YOU, PEOPLE), relations (PART, KIND), evaluations (GOOD, BAD), spatiotemporals (ABOVE, ELSEWHERE, NOW) or logical operators (NOT, IF). Wierzbicka (1996: 258) sees lexicography as a particular area that could benefit from semantic primes, as a methodology to do away with the circularity often found in dictionary definitions, which define words in

Semantic approaches to argument selection

145

terms of related or partially synonymous words, which then call upon the first set of words in their own definition.78 Since each language may lexicalize different configurations of these primitives, languages may develop distinct lexicons. In Wierzbicka’s model, meanings are “unique and culture-specific configurations of universal primitives” (Wierzbicka 1996: 257). Though Wierzbicka does not specifically refer to argument realization, this approach has the same characteristics as Jackendoff’s analysis insofar as meanings that are based on the same decomposition cannot formally differentiate their a-structure. In subsequent sections I will refer to this property as ‘a-structure inheritance’, which is to my mind a direct consequence of decompositional approaches to argument semantics. Finally, before beginning to disentangle grammatical, semantic and pragmatic aspects of argument selection, we must consider an alternative view which has been put forward especially in the study of coercion phenomena, i.e. cases where seemingly inappropriate argument selection is interpreted contextually. 79 For example, according to Weinreich (1966: 459), in (61), the object carrots for the verb drink, though semantically odd (since it does not qualify for the [+liquid] class mentioned above), is given an interpretation, in violation of the predictions made by Katzian semantics: (61) I drink carrots Since the object of drink must be a liquid, the verb imposes an appropriate interpretation on the object to the extent that one is imaginable (e.g. that carrot refers to carrot juice). In Weinreich’s approach, the verb imposes a ‘transfer feature’ of liquidity onto the object, overriding conflicting features (such as carrots being solid in this case, though this leads to problems in predicting which features are overridden in which ways, as pointed out by Geeraerts 1994b: 4476). Later studies have treated coercion as more of an exception, rather than assuming transfer features as the basis of selectional 78. In this respect, Wierzbicka’s main concerns are very close to Fodor’s (1998), whose work she very strongly criticizes. Fodor rejects definitions in terms of other words as a priori dissatisfactory, since each and every word’s meaning is distinct and unique. The conclusions of the two authors are thus diametrically opposite: whereas Fodor sees the effort to define lexical meaning as futile, Wierzbicka views definitions as imperative and realizable only by resorting to a set of axioms. 79. The recognition of coercion as a clearly defined or even useful linguistic category is however not unchallenged, see Ziegeler (2007).

146

Lexical semantics and world knowledge

restrictions. According to Jackendoff and Culicover (2003: 542), coercion is largely restricted to certain conventionalized cases. For example, the pattern encountered in (62) can easily be extended: (62) (One waitress says to another) The ham sandwich over in the corner wants another coffee Clearly, ham sandwich can be extended to anything a customer has ordered in referring to that customer, and coffee, though normally a mass noun, is referring to a count noun, presumably a cup of coffee. The pattern (including the context involving the waitress) coerces the otherwise inappropriate subject and object (which must correspond to a volitional AGENT and a countable physical object THEME) to be interpreted correctly, but not any coercion is possible (see also Nunberg 1995 on the conditions in which systematic polysemy of this sort becomes possible): It is crucial to recognize that coercions are conventionalized—it is not as if anything goes. For instance, the coercion responsible for the interpretation of coffee in [(62)] is sometimes called the UNIVERSAL PACKAGER, but it is far from universal. It is truly productive only when applied to edible portions of liquid or semiliquid food (water, pudding, etc.). It is far less appropriate applied to, say, the portion of water necessary to fill a sprinkling can or to a truckload-sized portion of cement (in such a context, *I’ll bring you a water/cement is out). That is, generally a coercion is restricted to certain (conventionalized) contexts, within which it is fully productive. (Jackendoff and Culicover 2003: 543)

Other accounts, notably work by Pustejovsky and colleagues, distinguish between coercions due to polysemy of the arguments involved, where a predicate coerces one correct reading among many readings available for an argument lexeme, and coercions in which a predicate ‘wraps’ an argument in an appropriate semantic type; these two types of coercion are then called ‘type exploitation’ and ‘type introduction’ in Pustejovsky’s terms (see e.g. Pustejovsky and Jezek 2008). Jackendoff’s ‘conventionalized coercion’ is probably more similar to the latter case, whereas the former case may perhaps be dealt with as a type of polysemy under the current approach (cf. Chapter 2, Section 5). In many cases it may not be completely clear which type of coercion is taking place since the basic set of senses for a word is not known. In fact, in some approaches it


might also not be entirely knowable as a discrete list to begin with, cf. Cruse’s (2002) ‘micro-structure of word meanings’. Regardless of whether one treats coercion as an exception that proves the rule, i.e. coercion marks the border of a standard semantic class, or as the rule itself (a transfer feature approach, where coercion is not exceptional), either approach accepts a set of semantic restrictions for a certain slot, be it defined for the slot itself or as a consequence of the syntagmatic relationship to the head (or more generally the construction) that requires it. For the present discussion, all such restrictions that result from somewhat general, not per-construction conceptual classes will be classed as semantic constraints. This class of constraints is tightly interrelated with the way the real world around us is, but is perhaps not identical with it: for example, what is considered edible or drinkable may be a contingent matter of conceptual structure (e.g. whether we eat or drink soup, see Section 6 below). But it is objectively true that we ingest different liquids and solids (based on some operationalizable, say, chemical definition), and possibly that the liquids are less various than the solids. I will refer to the latter type of constraint, in cases where we wish to distinguish it from the conceptual semantic type of constraints, as a pragmatic constraint, which is rooted more deeply in the world or ‘world knowledge’ than in the way we perceive it. Nevertheless, I will not claim that these classes are categorically distinguishable: the line may be very difficult to draw in many cases, since speaking in a strict Cartesian sense, we do not have any knowledge of the world except through our often biased perception. With these foundations in place, the next section will establish what must be the case for semantic and pragmatic constraints to fully explain productivity, and what would constitute a counterexample to this possibility, before moving on to the empirical examination of the problem.

2. Can lexical semantics and world knowledge explain novel argument selection?

The views about semantic constraints on argument realization presented in the previous section all imply that the meaning of a word is what determines the arguments it may occur with. However, this does not tell us much about the range of lexemes we can actually expect to see realized in each such position, nor is there any distinction between the realization of familiar items (which may be lexicalized in a certain structure) and novel


ones. In such accounts, which do not necessarily seek to explain actual observed tendencies in usage, examples (63)–(64) are equally odd:

(63) #Elsa bakes a stone
(64) #Elsa eats a stone

The object stone is certainly not considered [+edible] or [+bakeable] in any normal semantic account. But to my mind, the former example is considerably more surprising than the latter in some ways: I can more easily conceive of someone eating something inedible than baking something unbakeable. It also transpires that eat is much more productive than bake in every respect. The verb bake is a lot more conventionalized: we only bake certain things, which may of course vary regionally, but probably never amount to a very large repertoire. Also, something that is baked typically leads to something edible, so it is not surprising that eat should be more productive in the empirical sense discussed in previous chapters. But is there really a difference in principle? Certainly one could ingest many inedible things and describe that action as eating, or one can put the same or other things in an oven and call it baking (in this case a stone). The difference is therefore patently one of usage, not of principle: what we use the verbs eat or bake, etc., to express depends in large part on pragmatics or world knowledge. From a usage-based point of view, in order to have explanatory power semantic classes must predict the range of arguments we see in practice, for given pragmatic contexts. If the class of liquids is actually very big, but humans only drink very few of these in practice, and consequently converse about drinking fewer different liquids, then the semantic class [+liquid] is not yet at fault. Pragmatics can supply a plausible explanation why only certain liquids are said to be drunk. Similarly, objects of eat are more productive than those of drink, despite the fact that any lexical item [food] could generally be pureed and referred to as liquid [food] (cf. Weinreich’s coercion of carrots in the previous section). In reality, foods eaten are repeated less frequently than beverages drunk, which mirrors the productivity measures above nicely.80 But is this sort of explanation always

80. At least for V and V1. The fact that eat is more token-frequent than drink (in the corpora used here) does not reflect the likely truth that we engage in more drink than eat activities in a typical day. Nevertheless it appears one speaks, or rather writes in our corpus, more often about eating something than about drinking something. The problem is that it is very difficult to separate what portion of the difference is explainable by ‘knowledge about the world’ and what portion, if any, constitutes ‘knowledge about language’ (cf. Manning 2003 for a discussion of the distinction in the context of probabilistic grammar).


possible? Here I would like to suggest that if semantic classes reflect our categorial perception of the world, and if the world pragmatically motivates our utterances, then a direct explanation of syntactic productivity as discussed here, based on purely lexical semantic considerations, would have to answer this question with ‘yes’. Indeed, this may not seem so implausible: some corpus-based studies show that verbs sharing certain features of meaning cluster well through similar usage in particular constructions, e.g. the (dis)preference of different verbs for specific tense, aspect or mood constructions in Stefanowitsch and Gries (2003: 230–235), and results in the computational field of corpus-based subcategorization acquisition suggest similar possibilities for argument selection (see e.g. Korhonen 2002, and Schulte im Walde 2009 for an overview). But if it is only the semantic meanings of words and the pragmatic or communicative need to use those meanings that determine argument realization, then any difference between two argument spectrums should be reducible to a difference in meaning. The most direct way to test this hypothesis has already been alluded to in the previous chapter: constructions which are synonymous (whatever that may mean, see the next section) should exhibit no significant differences in argument realization. Notwithstanding the fact that slightly different contexts will lead to different arguments in individual instances, there should in particular be no consistent tendency for one member of a synonymous set of constructions to favor or disfavor novel arguments, which can be assessed using hapax legomenon proportions in large, equal-sized data sets. We have already seen in the previous chapter that this is not always the case: different variants of the German adposition wegen ‘because of’ presented in Chapter 4, Section 3 are synonymous and can all be used with any argument conceptualized to fill a CAUSE role, but productivity for the more archaic postpositional variant is significantly lower; and different forms of the comparative correlative construction exhibit the same semantic interpretation (a monotonic correlation of two properties), but some variants are less productive than others, with different slots all having distinct tendencies. Are these exceptions or are the meanings of each construction not exactly the same? And if they are not


exactly the same, should they not still require the same classes of arguments (e.g. a selectional restriction to [+cause])? And what would constitute perfectly synonymous constructions otherwise? The following sections attempt to clarify these issues for unrelated words with either synonymous meaning or an apparent requirement of the same semantic class as well as for derivations and alternations involving the same morphological stem which preserve a-structure but have significantly different realized spectrums and productivity. A second approach to test the semantic/pragmatic hypothesis for novel argument selection above is to focus on ‘the world’ as a common denominator. If pragmatics determines our choice of lexemes we should expect productivity to be quantitatively comparable for all speakers living in the same world, language independently, given a certain extralinguistic meaning. Though it is well known that different languages lexicalize different collocations, there is no reason to assume that productivity for the object of a verb and its translation in another language should be substantially different. This hypothesis will be tested briefly in Section 6 by comparing argument spectrums across languages. To sum up, for lexical semantics and world knowledge to explain novel argument selection entirely, items with the same meaning and the same pragmatic utility (be it synonyms or translations) must exhibit productivity ratings that are not significantly different. If this turns out not to be the case, we must reject the claim that semantics and world knowledge can fully predict syntactic productivity and seek a language-specific explanation for the phenomena examined here elsewhere, in the grammar or lexicon of each language.

3. Argument selection in (near) synonymous heads and constructions

If the semantic class model of argument selection is sufficient to explain realized selectional spectrums in practice, and semantic classes are not trivially defined on a per slot basis, it follows that all slots filled or constrained by the same class should have realized inventories that do not differ significantly. Importantly, deviations from this expectation would not mean that lexical semantics has nothing to do with argument selection – on the contrary, as we will see, semantics are intimately related to the function of constructional slots and slot filling processes. But it would refute an absolutist view of semantic classes, showing that semantically arbitrary


productivity also modulates the choice of unseen arguments from a non-enumerable set. In testing potential same-class slots, two major types of slot identity can be distinguished:

1. Synonyms with synonymous a-structures – if two constructions mean the same and have equivalent slots, then from a semantic point of view their arguments can be expected to be filled in the same way. The second condition, synonymous a-structures, is important, since it is conceivable that two constructions can have the same semantic meaning, except for the fact that one can specify further or fewer arguments, and that this affects the choice of arguments realized in both constructions as well. For example, devour requires a THEME argument which may be omitted for eat, notwithstanding other differences in meaning between the two. This makes a comparison difficult, since it is possible that some cases of eat with no object render object meanings elliptically that would have to be spelled out for devour.

2. Heteronyms with synonymous a-structure – if two unrelated constructions take the same argument class, then class-based semantics should also predict no difference in their argument realization. This condition is much weaker, since differences may still be explained pragmatically. For instance, it would not surprise us to find out that although boil by definition takes a [+liquid] object, it is much less varied in its choice of objects than drink, which takes the same class in Jackendoff’s account (see Section 1 above). Even in a formalism purporting to explain usage (which Jackendoff 1990 does not – the focus is on what can appear in a slot, not what does), this can simply be explained by our experience of these activities in life, without necessarily having to refer to grammar.

What exactly should constitute synonymy is of course the major difficulty in finding concrete test cases. There is a long-standing tradition in linguistics, going back at least as far as structuralism, that there are no synonyms whatsoever in natural language (cf. Bloomfield 1935: 145), in the sense of units that are interchangeable in any context salva veritate (‘with saved truth’, i.e. without changing the truth conditions of a statement). De Saussure’s original formulation of the value of a ‘linguistic sign’ stressed the idea that signs receive their meaning through opposition


to their ‘synonyms’, though clearly he used this term in the sense of words with similar, not identical meaning, i.e. so-called ‘near synonyms’: Within the same language, all words used to express related ideas limit each other reciprocally; synonyms like French redouter ‘dread’, craindre ‘fear’, and avoir peur ‘be afraid’ have value only through their opposition: if redouter did not exist, all its content would go to its competitors. (de Saussure 1966: 116)81

Similarly within lexical semantics, the synonymy of two or more words, as opposed to a mere similarity (which can always apply to some extent between any arbitrary pair of words), has been regarded as a ‘true exception’. According to Bosch (1993), such exceptions occur in only two types of cases:

[…] one are pairs of a full expression and its abbreviated counterpart (such as laboratory - lab, influenza - flue [sic]) and the other are pairs with elements from different vocabularies, different sub-languages, or just different languages that happen to co-exist in the linguistic behaviour of a population. Typical examples are pairs with one more popular and one more technical word, such as smallpox - variola, poisonous - toxic. In both types of cases we are dealing with exceptions, arguably even with diachronically instable cases, that should not play a central role for the theory of lexical semantics. (Bosch 1993: 21–22)82

Doubts as to the possibility of the existence of complete synonyms have been voiced in the context of lexicography as well, as in the following succinct statement:

81. The original French reads: "Dans l'intérieur d'une même langue, tous les mots qui expriment des idées voisines se limitent réciproquement: des synonymes comme redouter, craindre, avoir peur n'ont de valeur propre que par leur opposition; si redouter n'existait pas, son contenu irait à ses concurrents".

82. In fact, from a usage-based point of view even Bosch's examples are not completely synonymous, e.g. toxic waste has a specific, established meaning in usage, which is not the same as poisonous waste, quite independently from technical language or register in general.


The first principle of semantic analysis of lexical items is that there are "no synonyms", in the sense that no two lexical items ever have completely the same meanings in all of the contexts in which they might occur. (Louw and Nida 1988: xvi)

Even if we accept the latter formulation as true, which many linguists (and the present author) may be inclined to do, this does not preclude that there might still be semantic classes characterizing the selectional space of arguments for near synonyms, or even for entirely unrelated heads that happen to take the same feature-based class of arguments (e.g. [+drinkable] or [+edible], etc.). However, using (near) synonyms is probably preferable to distinct heads requiring the same class, since it gives us a minimal pair for comparing productive behavior: any differences we find for such a pair which is accepted by a semantic account will have to be due to an extrasemantic, conventionalized or inherent productivity, which this chapter seeks to demonstrate. Conversely, any similarities we find can then be explained on semantic grounds.

A comparatively uninteresting counterexample to a purely semanticist expectation can be found in more or less fixed sequences, such as collocations and idiomatic expressions (see Evert 2005 and Wulff 2008 respectively). These lead to two problems for measuring productivity: firstly, an out-of-class argument may be collocated or used idiomatically in a slot normally forbidding it. For example, if we compare drink with spill, whose objects should plausibly also satisfy the class [+liquid], we discover that about half the cases of spill in ukWaC are accounted for by the idiomatic collocation spill + beans, as in (65) (the relevant argument structures are emphasized in bold here and below).

(65) Now some of Britain's leading professional crime writers have joined together to spill the beans about their work [ukWaC, pos. 64685483]

If we admit these cases, objects of drink are measured to be equally (or insignificantly more) productive compared to those of spill. Yet if we rule these out, spill beats drink significantly […] N(C) > 5 in the search strategy outlined above.90

90. Unfortunately, the size of the corpus makes it impossible to verify manually that all types with a lower frequency are in fact synthetic compounds / verb-object pairs. I therefore concentrate on well-attested vocabulary here: if there is a strong linear correlation between V(SC) and V(VO) this should apply […]
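The kind of filtering and comparison described here for drink and spill is easy to operationalize. The following sketch (mine, not the original study's code) computes Baayen's 𝒫 = V1/N for two object slots and compares them at equal sample sizes by repeated downsampling; the verb names and miniature token lists are invented placeholders for lemmatized parser output, from which idiomatic cases like spill + beans could be filtered beforehand.

```python
import random
from collections import Counter

def potential_productivity(tokens):
    """Baayen's P = V1/N: proportion of hapax legomenon types among the slot fillers."""
    counts = Counter(tokens)
    v1 = sum(1 for f in counts.values() if f == 1)
    return v1 / len(tokens)

def compare_slots(tokens_a, tokens_b, trials=1000, seed=0):
    """Compare two argument slots at equal sample sizes by downsampling
    both to the size of the smaller sample and averaging over trials."""
    rng = random.Random(seed)
    n = min(len(tokens_a), len(tokens_b))
    p_a = sum(potential_productivity(rng.sample(tokens_a, n)) for _ in range(trials)) / trials
    p_b = sum(potential_productivity(rng.sample(tokens_b, n)) for _ in range(trials)) / trials
    return p_a, p_b

# Hypothetical object lemmas standing in for parsed ukWaC data:
drink_objs = ['water'] * 50 + ['tea'] * 20 + ['coffee'] * 15 + ['kombucha', 'mead', 'ouzo']
spill_objs = ['beans'] * 60 + ['milk'] * 10 + ['coffee'] * 5 + ['ink', 'paint']
print(compare_slots(drink_objs, spill_objs))
```

Downsampling to a common N is essential here, since V1/N is sample-size dependent and values measured at different N are not directly comparable.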

[Figure 31 appears here: log-log scatterplot of V(VO) (x-axis, 1–10,000) against V(SC) (y-axis, 1–100) with a fitted line (r² = 0.2161); highlighted stems include mach- 'do', herstell- 'produce', leit- 'lead', seh- 'see', verlier- 'lose', anspitz- 'sharpen' and verbind- 'connect'.]

Figure 31. Log-log plot of V for verbal lexemes in well-attested (N(C)>5) verb-object and synthetic compound constructions (some example stems are highlighted). The horizontal curve gives a linear model for the correlation of V in both constructions.

Surprisingly, the correlation of the counts of well-attested types is not strong, accounting for just under 22% of the variance in the dataset (Spearman's r² = 0.2161), and there is no significant correlation at all (p > 0.05) between N(SC) and N(VO). Frequent compounds do not necessarily have frequent corresponding VO attestation, and vice versa. In fact, it is likely that high frequency in one pattern leads to a lexicalization or collocation which is connected to a preference for that pattern.
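To make this kind of check reproducible in principle, the correlation can be computed as in the following sketch; the per-stem values are invented stand-ins for the actual V(SC) and V(VO) counts plotted in Figure 31.

```python
from scipy.stats import spearmanr

# Hypothetical per-stem type counts for well-attested verbal lexemes:
v_sc = [95, 60, 22, 7, 5, 1, 1]               # compound non-head types per stem, V(SC)
v_vo = [9000, 700, 1200, 2500, 400, 12, 800]  # verbal object types per stem, V(VO)

rho, p = spearmanr(v_sc, v_vo)
print(f"Spearman rho = {rho:.3f}, r^2 = {rho**2:.4f}, p = {p:.3g}")
```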

Some of the most extreme points in the data exemplify this in Figure 32: die Wahrheit sagen 'tell the truth' is a very common collocation, whereas Wahrheitssager 'truth sayer' is rare.

[Figure 32 appears here: log-log scatterplot of N(VO) (x-axis, 1–1,000) against N(SC) (y-axis, 1–1,000,000) with a fitted line (r² = 0.0006); highlighted pairs include Arbeit+nehm- 'take work, be employed' and Wahrheit+sag- 'tell truth'.]

Figure 32. Log-log plot of N for well-attested (N(C)>5) lexeme pairs as SCs and VO pairs.

Conversely, lexicalized Arbeitnehmer 'employee', lit. 'work taker', is hardly ever paralleled by Arbeit nehmen 'take work'. Evidently, frequent items can develop both their own meanings and frequency distributions, independently of the semantic classes involved in the a-structure. This leaves open the question of infrequent items – is productive behavior in novel argument selection similar for SCs and corresponding VOs? Is it possible to observe similar effects in the behavior of 𝒫 and the development of V1? Here it is impossible to give an accurate assessment based on all lexemes, since hapax legomena in the corpus contain very many errors. Instead, I will concentrate on showing some concrete counterexamples that can be filtered manually. Since transitive verbs are much more frequent than SCs, it is hardly surprising that we can find many VO hapax legomena that are not attested as SCs. However, the opposite might be somewhat surprising: are there heads which show up with many novel non-heads that are not attested as VOs? As Table 26 shows, this is exactly the case for many of the lexemes we find showing a high V(SC) in Figure 31.


Table 26. V1 for some SC heads along with the subset of items attested as VO and the proportion of VO / SC attestation.

SC head                           V1(SC)   attested as VO   R_VO/SC
Hersteller 'manufacturer'           1130               92   0.081416
Leiter 'head, leader, manager'      1057               51   0.04825
Führer 'head, leader, manager'       867              147   0.16955
Anbieter 'provider, offerer'         716              136   0.189944
Vertreter 'representative'           664               71   0.106928
Betreiber 'operator'                 568               57   0.100352
Bewohner 'inhabitant'                381               12   0.031496
Sender 'sender, transmitter'         366               21   0.057377
Sammler 'collector'                  344                1   0.002907
…
Sucher 'searcher'                    294               96   0.32653
Finder 'finder'                      108               69   0.638888
Seher 'seer'                          41               33   0.804878
Abbrecher 'quitter'                   21                9   0.428571

The heads in the top part are typical examples of lexicalized deverbal nouns. Though in principle they preserve the argument structure of their related verbs, they are extremely prolific as compound heads, appearing with a wide range of non-heads unattested as corresponding verbal objects. It seems speakers are very ready to coin compound names for new types of Sammler 'collector', Hersteller 'manufacturer' or Leiter 'head, leader' without choosing the VO realization for the corresponding meaning.

The heads at the bottom of the table behave differently: here a large portion of SC hapax legomena seems to be motivated by, or at the very least to parallel, VO attestation. If the semantic relationship between agent SCs and VOs is a paraphrase of the sort 'one who Vs O', then these facts suggest that speakers often choose to coin their innovations in the former pattern for certain lexemes, although semantically speaking the latter construction is equally possible. In this respect, the usage of SC lexemes is a good example of the inability of semantic classes to predict productive behavior, even though we are dealing with the same argument class from a decompositional point of view.
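The quantities in Table 26 can be derived mechanically from pairs of heads and their dependents. The sketch below is an illustration under simplified assumptions (shared stem keys for compounds and verbs); the stems and pairs are hypothetical stand-ins for lemmatized corpus data.

```python
from collections import Counter

def v1_per_head(pairs):
    """For each head stem, the set of dependent lemmas occurring with it exactly once (V1)."""
    freq = Counter(pairs)                     # (stem, dependent) -> frequency
    hapax = {}
    for (stem, dep), f in freq.items():
        if f == 1:
            hapax.setdefault(stem, set()).add(dep)
    return hapax

def vo_attestation_ratio(sc_pairs, vo_pairs):
    """Proportion of each stem's SC hapax non-heads also attested as verbal objects
    (the rightmost column of Table 26)."""
    sc_hapax = v1_per_head(sc_pairs)
    vo_deps = {}
    for stem, dep in vo_pairs:
        vo_deps.setdefault(stem, set()).add(dep)
    return {stem: len(deps & vo_deps.get(stem, set())) / len(deps)
            for stem, deps in sc_hapax.items()}

# Hypothetical observations keyed by stem (e.g. Briefmarkensammler ~ sammel- + Briefmarke):
sc = [('sammel-', 'Briefmarke'), ('sammel-', 'Käfer'), ('seh-', 'Zukunft')]
vo = [('sammel-', 'Käfer'), ('seh-', 'Zukunft'), ('seh-', 'Film')]
print(vo_attestation_ratio(sc, vo))   # {'sammel-': 0.5, 'seh-': 1.0}
```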

5. Semantic-pragmatic motivation and syntactic alternations

Although the previous section has shown that semantic classes perform differently within different constructions, there is little doubt that the semantic differences between, say, verbal phrases and compounds are substantial. Differences in productive usage are not arbitrary, but have to do with the meaning of e.g. agent nominalizations and the kind of lexemes and associated real-world referents that are likely to be used within that construction. For example, we do not see many types of compounds headed by seer, since seeing is not generally a profession or (intentional) habitual action in the real world – this case can therefore be said to be rooted in pragmatics. In this section I will focus on variant constructions that differ only minimally in their surface form, with no discernible differences in meaning or pragmatic context. The goal will be to find forms that have distinct productive behavior that cannot be motivated by corresponding differences in world knowledge.

Most syntactic alternations, such as the English locative alternation (see Levin 1993: 49–55, Iwata 2008), are accompanied by subtle differences in meaning. A well-known example is the verb spray, which in (85) implies that the entire wall is covered with paint, while the variant in (86) only describes some paint being sprayed either on some part of the wall or on the entire wall. The distinction between the two cases is sometimes referred to as 'holistic' versus 'partitive', and can lead to a truth-conditional difference in meaning under the right circumstances.

(85) Jack sprayed the wall with paint

(86) Jack sprayed paint on the wall

On the face of it, the thematic roles of the verb should nevertheless be inherited by either construction and accept the same semantic classes, as in the case of heteronyms taking the same object class mentioned in Section 3. However, given the subtle difference in meaning, a difference in productivity could be attributed to pragmatic reasons, e.g. that many things get sprayed on other things, but spraying an entire surface or object is usually done with only a few substances.91 It is therefore imperative to find cases of alternations which truly do not alter the meaning of the verb in question or its interaction with its arguments.92

91. Interestingly, though cases for this alternation in the examined corpora are too few for substantial results, it appears that the opposite may be true. The on variant is more repetitive in its GOAL/THEME argument (the target of spraying) than the with variant in the small range of attested cases in ukWaC. However, because of the above differences in meaning, this alternation will not be pursued further here.

92. There are some examples of seemingly equivalent constructions not amenable to the present purposes for these reasons. Wulff (2006) uses collostructional and distinctive collexeme analysis (Stefanowitsch and Gries 2003, Gries and Stefanowitsch 2004) to show that despite a very similar interpretation, English go VERB and go and VERB prefer semantically distinct verbal complements. By contrast, the English verb particle alternation between pick up the book and pick the book up, which has been referred to as a case of 'allostructions' with the same meaning (Cappelle 2006), shows little in the way of semantic bias, but since the length of the direct object is known to interact with the choice of construction (see Gries 2003 i.a.), a bias can be expected depending on the length of novel arguments and the phrases they head.

One such example which has received considerable attention in the literature is the alternating use of to and bare infinitives as verb complements, for example in the alternation help [NP] (to/Ø) [VP], or without an object, help (to/Ø) [VP] (Mair 1995, 2002; McEnery and Xiao 2005). The following BNC examples cited by McEnery and Xiao (2005: 162) illustrate the different constructions in question:

(87) a. HELP to V: Perhaps the book helped to prevent things from getting even worse.
     b. HELP NP to V: I thought I could help him to forget.
     c. HELP V: Savings can help finance other Community projects.
     d. HELP NP V: We helped him get to his feet and into a chair.

Historically, it is generally accepted that the bare infinitive first became preferred in American English (see Quirk et al. 1985: 1205, Kjellmer 1985, Algeo 1988: 22, Biber et al. 1999: 73), but multiple studies of more recent corpora confirm Mair's (2002: 122) view that "while this may have been true in the 1960s, it is no longer so now" (this conclusion is also upheld by McEnery and Xiao 2005: 166). Synchronically, the bare variant has become the more frequent one in both British and American English. McEnery and Xiao (2005: 169–170) also dispute a presumed contrast between spoken and written usage (cf. Kjellmer 1985) in a study of different components of the BNC, while attributing this misconception at least in part to diachronic interference – contemporaneous spoken and written British data shows no statistically significant differences in usage (cf. also Mair 2002: 123 for the same view).

Syntactic constraints on the choice of variant have also been suggested: while there is a definite trend to avoid to complementation when help itself is a to-infinitive (to avoid the horror aequi in the sequence to help to [V], cf. McEnery and Xiao 2005: 180), there is considerable counterevidence as well (Mair 2002: 125). Similarly, an intervening object NP raises the likelihood that the bare infinitive will be used, though counterexamples abound (McEnery and Xiao 2005: 176–178).93

93. Incidentally, all of the above studies have only examined which construction is more frequent (the N(C) aspect of the Productivity Complex). I am not familiar with any previous work on how variable and extensible argument selection is in each variant.

For the present discussion, however, the important question is whether or not there are any semantic or pragmatic differences in the usage of the different constructions. Claims of real differences in meaning are rare in recent literature, the chief example being a supposed greater 'involvement' of the agent in the helping action in the case of the to-infinitive (Quirk et al. 1972: 841) or claims that to "can be omitted only when the 'helper' does some of the work" (Wood 1962: 107). As McEnery and Xiao (2005: 170) point out, such suggestions have been largely abandoned – they are absent in the later Quirk et al. (1985: 1206), which simply refers to the regional difference US:UK, and counterexamples to the claims made by Wood are easy to find. Large-scale corpus studies suggest that a difference in meaning cannot be identified quantitatively in corpus data (McEnery and Xiao 2005: 170–171), nor am I aware of any qualitative examples where the two constructions are not interchangeable salva veritate.

If we accept that there is no discernible difference in meaning, it should follow that there can be no pragmatic difference leading to diverging productivity for these variants: anything one can help to do in the real world, one can also help do. We should therefore expect no pragmatically motivated significant differences in V or in 𝒫 for equal sample sizes, which also normalize for differences in N(C) should regional differences affect frequencies. Similarly, we can expect the prediction for S to be the same. My own intuition is, however, that the bare infinitive may be used more readily with frequent verbs that are more established with help, meaning it should be less productive. This does not mean that we can find any verbs for which one construction is grammatical and the other is not: merely that given a choice when selecting a particular argument, I believe the speaker may be more inclined to select the bare infinitive for a construction-familiar complement than for an unfamiliar one.94

Intuitions aside, the null hypothesis that there is in fact no difference in productivity can be tested using the vocabulary data plotted in the growth curves in Figure 33.

[Figure 33 appears here: vocabulary growth curves, V (0–3,500) against N(C) (0–400,000), with separate curves for to-infinitive and bare infinitive complements.]

Figure 33. VGCs for bare and to-infinitive complements of help. The vertical lines mark the border between the empirical data and the fZM extrapolations, given by the dotted curves.

As the data shows, the claims made by Mair (2002) and McEnery and Xiao (2005) are borne out – the bare infinitive is more frequent in ukWaC, and insofar as this corpus represents British usage, this reflects the assimilation of the construction in that region. However, the bare infinitive is indeed more repetitive, matching the intuition above. The to-infinitive exhibits about 400 more complements at the largest common sample (V values of 2462 : 2848), a significant difference of some 16% more items […]
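The empirical portion of such growth curves requires no special machinery: it suffices to track the cumulative type count while iterating over the attested slot fillers of each construction, as in the following sketch (the function names and sampling policy are my own; for robust comparisons one would average over random permutations of the tokens rather than rely on a single order).

```python
import random

def vocabulary_growth_curve(tokens, step=1000):
    """Empirical VGC: cumulative type count V recorded after every `step` tokens."""
    seen, curve = set(), []
    for i, tok in enumerate(tokens, 1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

def v_at_common_sample(tokens_a, tokens_b, seed=0):
    """V for two constructions at the largest common sample size N,
    shuffled so that document order does not bias the counts."""
    rng = random.Random(seed)
    a, b = list(tokens_a), list(tokens_b)
    rng.shuffle(a)
    rng.shuffle(b)
    n = min(len(a), len(b))
    return len(set(a[:n])), len(set(b[:n]))
```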

108. The form ___er is used as a placeholder for comparatives, though we can allow it to represent analytic comparatives with more as well.


In other words, constructional interaction during the fusion of constituents into complex constructions alters productivity. There is therefore no single productivity value that can be represented for comparatives: it is markedly different depending on whether or not the adjective is embedded in a CC clause, which clause (the protasis C1 or the apodosis C2), and what other constituents the clause contains (subject NP, VP). The observed context-dependency of productive behavior thus joins observations on the sensitivity to both linguistic and non-linguistic context of other usage effects, such as priming and preferred argument or construction selection (see Elman 2009 for a review of such evidence).

Secondly, it seems implausible that speakers retain representations of VGCs, which are a chronological record of vocabulary growth in the input they have received. While there is ample evidence that there is some storage of raw frequencies (reflected in the notion of entrenchment), there is no evidence to suggest that people are particularly good at telling when or in which order they acquired their vocabulary or certain arguments (beyond declarative knowledge of an anecdotal nature, e.g. I learned what sushi is when I …), and cognitively speaking such a 'time-stamp' for each example seems unlikely. Although recency effects, which lead to a higher likelihood of speakers using constructions they have recently heard, are well known (including both short-term and longer-term effects, see Bock and Loebell 1990, Gries 2005, Szmrecsanyi 2006), it has never been suggested that the time and order of exposure are retained, but only that entrenchment of constructions may decay with disuse (Bybee 1985: 118), and that recent exposure bolsters the entrenchment or activation level of the relevant forms.

If speakers do not retain an actual representation of the VGC's chronology, how can we explain the productivity measurements found so far? Since the information contained in PCN is equivalent to knowledge of frequency and the form of the VGC, the latter must somehow be explainable as an epiphenomenon of the structure of the mental lexicon (cf. Plag 2006: 553). Staying with fZMs as an approximation of the knowledge contained in the VGC, it is worthwhile taking a closer look at the function producing these models from Chapter 3, Section 6, reproduced here as [19].

g(π) = C · π^(−α−1)  if A ≤ π ≤ B;  g(π) = 0 otherwise.   [19]


The four parameters which the function g(π) relies on are in fact not based on VGC data: recall that A and B estimate the minimum and maximum probabilities for members of the relevant category, while C is merely a normalization constant. The parameter α is derived from the exponent parameter a in the denominator of the standard Zipf-Mandelbrot model, which regulates the steepness of the frequency distribution, especially for the most frequent ranks (see Chapter 3, Section 6 and Evert 2004 for details). Importantly, none of these parameters require knowledge about the order or time of arrival of the different observed types, but only knowledge about the relative frequencies of the different ranks π. In other words, fZMs are not modeled based on VGCs, but rather based on another object altogether: the frequency spectrums (SPCs) discussed in Chapter 3, Section 5. Predicted vocabulary growth curves can be extrapolated by estimating the probability of each type using the entire spectrum and its rank π, while keeping in mind that there are many more, increasingly improbable types up to the minimum estimated probability rank (B).

Importantly, SPCs seem to correspond almost directly to the knowledge provided by entrenchment, since every type is stored in the mental lexicon as a potential construction, and every construction has a degree of entrenchment corresponding to its frequency (modulo recency effects or priming, and possibly other factors modulating the equation 'frequency = entrenchment', such as salience). It therefore seems possible that entrenchment information alone is enough to build a representation of all aspects of PCN, since these are either directly related to frequency or are retrievable from a model that can be derived from entrenchment indirectly.
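Under the independence (Poisson) assumptions standard in LNRE modeling (cf. Evert 2004), the expectations needed for such extrapolations follow from [19] by numerical integration alone, with no chronological information. The following sketch is my own illustration, and the parameter values are invented:

```python
import numpy as np

def fzm_expectations(alpha, A, B, N, grid=200_000):
    """Expected V, V1 and population size S under an fZM model with
    type density g(pi) = C * pi**(-alpha - 1) on [A, B] (see [19])."""
    pi = np.geomspace(A, B, grid)                    # log-spaced grid over [A, B]
    dens = pi ** (-alpha - 1.0)
    C = 1.0 / np.trapz(pi * dens, pi)                # normalize total probability mass to 1
    g = C * dens
    S = np.trapz(g, pi)                              # total number of types in the population
    V = np.trapz((1.0 - np.exp(-N * pi)) * g, pi)    # expected types after N tokens
    V1 = np.trapz(N * pi * np.exp(-N * pi) * g, pi)  # expected hapax legomena after N tokens
    return V, V1, S

# Invented parameters: a skewed distribution with a long tail of rare types
V, V1, S = fzm_expectations(alpha=0.5, A=1e-9, B=0.05, N=100_000)
print(f"E[V] = {V:.0f}, E[V1] = {V1:.0f}, P = {V1 / 100_000:.4f}, S = {S:.0f}")
```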

One problem with this account is that entrenchment is only supposed to apply to somewhat frequent constructions. It is unlikely that accurate, or perhaps even any, frequency information is retained over long periods of time for items appearing very rarely in a single slot, meaning the higher ranks in the frequency spectrum may be missing. For example, if we are familiar with the infrequent color term aquamarine, we can well imagine that we might have heard it in a phrase like paint X aquamarine. But it is difficult to be certain of this, and even if we have episodic memory of a similar proposition, it is quite possible that another verb or construction was used. Such non-lexicalized, very infrequent arguments can end up leaving no discernible traces over time. This is of course not to say that single exposures have no impact on the brain (otherwise no construction would ever become entrenched), only that we cannot expect every hapax or dis legomenon we have ever experienced to leave retrievable information for our entire lives. However, the constitution of SPCs does not actually require a list of hapax legomena, dis legomena etc., but only the number of types for hapax legomena, dis legomena etc., or more generally, Vm for every m. If we accept fZM extrapolations as a sufficient approximation of the vgc() function, it is possible to claim that knowledge of Vm for each m is sufficient to explain productive behavior based directly on input distributions, while type-specific knowledge can be restricted to the entrenched, prototypical or highly frequent low ranks π, as well as more recent or salient rare items.109

109. In fact, recent studies suggest that even for high ranks, i.e. very rare items, such as constructions or entire sentences heard only once, there may still be non-negligible traces in memory, at least for some period of time (Gurevich, Johnson, and Goldberg 2010).

But is this kind of knowledge about Vm and m, beyond knowledge of concrete types, cognitively plausible? And how would a plausible architecture of the mental lexicon cause processes with data distributed in one way to be more productive than others with different distributions? In order to explain this, I will now turn to consider some of the properties of general cognitive mechanisms of productive generalization.
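The spectrum itself is trivially computable, and the point that it suffices for N, V, V1 and hence 𝒫, without any record of when each type first arrived, can be made concrete in a few lines (the token list is invented):

```python
from collections import Counter

def frequency_spectrum(tokens):
    """SPC: for each frequency m, the number of types V_m occurring exactly m times."""
    type_freqs = Counter(tokens)            # type -> frequency
    return Counter(type_freqs.values())     # m -> V_m

tokens = ['water'] * 5 + ['tea'] * 2 + ['mead', 'ouzo', 'kvass']
spc = frequency_spectrum(tokens)
N = len(tokens)            # 10 tokens
V = sum(spc.values())      # 5 types
P = spc[1] / N             # potential productivity from the spectrum alone
print(spc, V, P)           # Counter({1: 3, 5: 1, 2: 1}) 5 0.3
```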

4. Why do skewed distributions lead to productivity? A Hebbian cognitive account of argument categorization

Semantic approaches of the sort reviewed in Chapter 5, Section 1 rely on existing conceptual structures corresponding to classes with which we categorize our experience: once a class [+liquid] is present, both familiar items such as water and unfamiliar ones, such as a drink whose name we hear for the first time, are categorized as [+liquid]. There need not be anything particular to language about this process of categorization of new experience on the basis of categories containing already acquired knowledge – humans categorize their perceptions constantly, and even form and judge membership for categories ad hoc (e.g. things to sell at a garage sale, cf. Barsalou 1983). In cognitive science, there has been much work on modeling the acquisition and formation of categories on the basis of input, with implementations especially within the connectionist tradition (Rumelhart and McClelland 1986, McClelland and Rumelhart 1986). Work in neural networks showed early on that both basic linear discrimination tasks (e.g. the early Perceptrons, Rosenblatt 1958, Minsky and Papert 1969) and more complex categorization based on input (see Bishop 1995 for an overview) can succeed using configurations that emerge implicitly from the properties of input signals.

Hopfield networks (Hopfield 1982), and later on the family of Recurrent Neural Networks (e.g. Simple Recurrent Networks, Elman 1990), have shown that common properties of similar input exemplars can cause the creation of abstract 'spurious representations' more powerful than any of the actual input cases, a concept we will be able to use for the representation of constructions below. The principle at the heart of connectionist approaches to categorization is the general neural process referred to as Hebbian Learning or Hebb's Law,110 which was originally expressed by Donald Hebb as follows:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells, such that A's efficiency, as one of the cells firing B, is increased. (Hebb 2002 [1949]: 62)

In other words, neural pathways that are used often become stronger through some metabolic change, meaning that usage physically changes the architecture of the brain to make subsequent usage along the same lines more likely. The converse of this principle, which states that cells that repeatedly activate separately enter into an inhibitory relationship, is sometimes assumed explicitly as part of Hebb's Law (e.g. Marshall 1995), while other researchers restrict Hebb's Law to the original formulation above (e.g. Kohonen 2001), referring inhibitory relationships to other processes, such as lateral inhibition.111 By strengthening the links between co-activated, and hence functionally related, neurons, Hebb's Law predicts the emergence of so-called 'neuron assemblies', which can represent perceptual categorizations, and secondarily the cognitive concepts that they give rise to.

110. Sometimes also Hebb's Rule or Hebb's Hypothesis.

111. Lateral inhibition is a process by which neighboring neurons inhibit each other, especially in perceptual systems. Networks with lateral inhibition have been described as useful in 'pinpointing' a stimulus, e.g. in tactile sensation or vision, since the most powerfully activated receptors inhibit their neighbors, leading to better localization of the stimulus. As we shall see below, both extended Hebbian and lateral inhibition can equally be used to model some linguistic selectional effects, though this might not be strictly necessary. See also Pulvermüller (1999: 254–258) on more recent formulations and suggested modifications of Hebb's Law based on neuroscience evidence.
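In connectionist implementations this principle is commonly formalized, in one simple textbook variant among several, as a weight update proportional to the coactivation of two units:

\[ \Delta w_{ij} = \eta \, x_i \, x_j \]

where w_ij is the strength of the connection between units i and j, x_i and x_j are their current activation levels, and η is a learning rate.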


To illustrate this general cognitive categorization mechanism, we can consider the acquisition of a pre-linguistic conceptual class of plants. As we perceive different instances of plants, certain neural pathways will be activated involving our senses (visual, olfactory etc.), but also our behavior (related motor actions, such as smelling or picking flowers). Different plants will have more or less exceptional properties unique to that species of plant or even instances of plants (special characteristics of my own rosemary plant, as opposed to all rosemary plants). But many properties will be shared across plants, such as the presence of green leaves, stalks, flowers, roots etc., leading to an overlap in neural activation as illustrated schematically at the top of Figure 37.

[Figure 37 appears here: a network diagram of overlapping Hebbian assemblies labeled [+water lily], [+tomato] and [+bamboo], with annotations for schematicity, assembly of subnets, common features ([+plant]), mutually exclusive features, and a core that is stronger than any instance.]

Figure 37. Representation of specific plants and plants in general using overlapping Hebbian assemblies.

The black network of 'water lily' at the top left represents an assembly of neural subnets which often fire during experiences related to that plant. The grey network for 'tomato', superimposed on the water lily network at the top center, has some subnets in common with the 'water lily' network (the overlapping nodes), but also subnets that are not shared (the bottom right node).

If we continue to add somewhat similar, partly overlapping networks like 'bamboo' (top right, white nodes and edges), we can expect the common parts of all three networks to receive the most activation, since these are fired during processing of any of these plant types. This results in the overlapping part, which can be thought of as representing the common features of the above networks, developing its own categorial representation and meaning, a network which might represent 'plants' and is more abstract or schematic than any of the constituent networks. Since it is strengthened by all plants, it is stronger than any instance of a plant, and by virtue of Hebbian learning it should automatically fire whenever a constituent category, such as 'water lily', is accessed. This should apply even if a few of its nodes are not shared by 'water lily' but are shared by many other plants. Finally, note that the special features of lilies and tomatoes should not be coactivated, since we normally either perceive lilies or tomatoes, but not both at once.

In a model incorporating inhibition, i.e. the possibility for negative weights between nodes or subnets, we can expect, according to the inhibitory principle outlined above, a tendency for mutual exclusivity between these categories to develop, so that although the common 'plant' representation is activated by either plant type, a choice must be made where something most reminiscent of a tomato would laterally inhibit all other plants. This 'sharpening' effect is typical of lateral inhibition in perception and could explain the categorical nature of class-based reasoning: although there are more or less prototypical members of a class, once the most fitting class has been chosen it excludes other, conflicting classes. However, whether or not our network model should include special inhibitory connections that develop between the peripheral nodes of the 'plant' network in Figure 37 is not an open-and-shut question. The competition between candidate representations for activation can be modeled as a pure 'shouting match', where the most strongly activated representation wins (e.g. Rumelhart and Zipser 1985, Földiák 1991, Roelofs 1992), or explicitly using lateral connections between competing nodes (e.g. Földiák 1990, Marshall 1995), so that nodes share an activation space in which the more powerful activation of one connection automatically results in the inhibition of others.112

112. Another distinction in models implementing inhibition, namely whether inhibition applies to the output of excited units or to their input prior to activation, will not be discussed here (see Spratling and Johnson 2002).

Since this point of implementation is not pertinent to the current discussion, I will sidestep this issue by assuming that the inhibitory mechanism can be either implicit or explicit, but that in either case the activation of the most relevant representation ends up being powerful enough to rule out any competitors whose unique features are in conflict (i.e. rarely or never coactivated) with the input pattern. This creates a paradigm effect in which a choice must be made in the process of categorization between similar, but distinct subclasses which are acquired and then lead to an overarching abstraction such as 'plant' through Hebbian learning.

Returning to the specifically linguistic domain, neural assemblies formed by Hebbian learning can be seen as a model for lexical knowledge, including abstract semantic classes,113 words, or even smaller structures such as syllables, as promoted e.g. by Pulvermüller (1996, 1999; cf. also Barsalou 1999: 584):

The Hebbian framework suggests that different gestalts, syllables, or word forms have distinct cortical assemblies because perception of these entities will activate different (but possibly overlapping) populations of neurons […] After an assembly has been formed, its strong internal connections will allow the whole assembly to become active if a substantial fraction of its neurons has been activated. Thus, a gestalt can be perceived even if the stimulus object is only partly visible, and a stimulus word can be identified in a noisy environment that masks some of its phonetic information. (Pulvermüller 1996: 318)
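The emergence of a strong common core can be simulated with a minimal Hebbian learner. The sketch below is an invented toy, with binary feature vectors whose first three positions stand for the shared [+plant] features:

```python
import numpy as np

rng = np.random.default_rng(0)
water_lily = np.array([1, 1, 1, 1, 0, 0], dtype=float)
tomato     = np.array([1, 1, 1, 0, 1, 0], dtype=float)
bamboo     = np.array([1, 1, 1, 0, 0, 1], dtype=float)

eta, W = 0.1, np.zeros((6, 6))
for _ in range(100):
    x = [water_lily, tomato, bamboo][rng.integers(3)]   # perceive one random plant
    W += eta * np.outer(x, x)                           # Hebb: co-active units strengthen
np.fill_diagonal(W, 0)

# Links among the shared core (e.g. units 0-1) are strengthened on every trial;
# core-to-peripheral links (e.g. 0-3) only on roughly a third of trials; peripheral
# features of different plants (e.g. 3-4) never co-activate, so their weight stays 0.
print(W[0, 1], W[0, 3], W[3, 4])
```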

Using this reasoning it is possible to transfer the ideas in Figure 37 to the domain of paradigmatic, language-specific semantic classes. Figure 38 shows an analogous representation of a semantic class [+liquid] which in this case comprises networks corresponding to lexical representations (more specifically, peripheral nodes represent the properties unique to those units, though needless to say this and similar renditions below are stark simplifications compared to any real-life correlates).

113. In fact, there is evidence that introducing a linguistic label to distinguish two categories of similar visual stimuli can alter the categorization of unseen, similar stimuli already in ten-month-old infants, cf. Plunkett, Hu, and Cohen (2008).


[Figure 38 appears here: a network diagram with nodes labeled water, wine, lava and kombucha linked to a central [+liquid] assembly.]

Figure 38. The semantic class [+liquid] as an assembly of subnetworks.

The stronger connection of water as a prototype for the [+liquid] class is represented by the thicker lines connecting water to the center of the assembly, and its own more powerful entrenchment is signified by the relative size of its node. Less prototypical elements, such as lava, are used less often in the context of the class, and have very weak connections to the network and less entrenchment.114 Novel members, which are categorized based on their properties being perceived as similar to a liquid, are connected by a dashed line, such as kombucha, a drink one may encounter for the first time at some later point in life. Since all of these liquids have unique properties, input matching wine will activate the wine sub-network best, overcoming the activation of other liquids based on shared properties (or in a model with explicit inhibition, we may say that the special features of wine will cause an inhibitory relationship to develop with other members of the [+liquid] network). Thus any specific realization of the feature-based class [+liquid] will force a choice among its exponents, while previously unobserved types will categorize as [+liquid] and strengthen the network core based on shared features, but will build a unique representation with their own feature combination that is not compatible with previously encountered types.

114. I shall leave the length and orientation of the edges between nodes without signification, though there may be some reasons to believe that semantic proximity might be reflected by physical proximity of subnets in the brain (cf. Pulvermüller 1996). For the present discussion the relative localization of subnets can be set aside.

In this way Hebbian learning can lead to linguistic paradigm building, whereby a slot can require any member of an extensible semantic class such as [+liquid] based on properties of the input observed for that slot. The space covered by the dashed circle in the diagram, which delineates those items judged to be similar in the relevant respect, can then be equated with a category that gains an independent cognitive representation through recurring activation.115

At the same time, we can expect slightly different behavior for different constructions requiring the 'same' class, as we have seen in the previous chapter. While the general [+liquid] class may arise from all cognitive pathways relating to liquids in our experience, there may be more specific subnets correlating with the activation of particular verbs or constructions,116 including synonyms or members of syntactic alternations. Hebbian learning will lead to the subtly different strengthening of different subnets which have been experienced with particular constructions or lexemes, such that although the connection between all liquids can cause excitation of the entire class in any case, a particular verb may have an added or lowered activation corresponding to words and other constructions it often or rarely coactivates with. In this way we can explain and defend the usefulness of open semantic classes despite subtle differences between synonymous constructions.

With this basic building block of our model we can now return to an important question raised earlier: why do skewed distributions, with a roughly Zipfian frequency spectrum including very many rare types and a few frequent types, lead to productivity? If we recall that every instance of a semantic class activates that class (those subnets common to all or most members), it should become clear that rare types, such as hapax legomena, strengthen the relevant class representation with hardly any noticeable change in their own lexical representation.

115. This is equivalent in Langacker's (2000: 7) terms to a schema encompassing its instantiations: "A schema is immanent in its instantiations in the sense that being located in a point-like region of state space entails being located in a broader region that encompasses it." In other words, instances instantiate their schema by being located within the region defined by its activation pattern.

116. There is in fact some evidence from self-paced reading studies that the thematic fit of specific lexemes as particular arguments with particular verbs is variable, and interacts with the embedding context (McRae, Spivey-Knowlton, and Tanenhaus 1998).

Each mention of 500 rare beverages makes little change in the entrenchment of those beverages, but the construction housing those mentions, be it a semantic class like [+liquid] or [+beverage] or a corresponding syntactic one such as the direct object of drink, is strengthened substantially in sum.117 A construction with few or no hapax legomena, by contrast, will also be strengthened by each instance of its usage. But if all or most instances of its usage involve a small set of items, it will be quickly identified with those items exclusively – the prototypes become the construction, and the more schematic abstraction, removed from its instances, is left comparatively weak, as illustrated in Figure 39.

[Figure 39 appears here: a network diagram with nodes labeled better, merrier, more ergonomic and easier linked to a central 'bare c2 comparative' assembly.]

Figure 39. A less productive network for the 2nd comparative in a bare comparative correlative.

In this case the networks for better or merrier in the second CC clause are almost the same as the network for the abstract comparative position in this construction. In the case of merrier we may even want to assume an idiomatic network with more in the bare c1 clause for the idiom the more the merrier, so that these lexemes have a particularly powerful coactivation pattern. For other comparatives, such as more ergonomic, this type of network means a much smaller chance to be realized when the construction is selected, since the construction cooccurs too often or too exclusively with its prototypes and, as a consequence of Hebbian learning, excites their representations too easily.

117. This mechanism would also explain the observation in morphology that "low-frequency items that are morphologically complex form stronger [morphological relations] tha[n] high-frequency complex items" (Bybee 1985: 123). Since low-frequency items are not entrenched, their processing activates the abstract construction more powerfully.

Interestingly, the resulting weakening of the more schematic level in this case recalls Clausner and Croft's (1997) and Barðdal's (2008) view of the maximal level of schematicity reflecting productivity, especially as it applies to extensibility in the sense of the likelihood of novel forms (the 𝒫 dimension of the PC).

In light of the above Hebbian model of network formation, it becomes clear why skewed distributions, and hapax legomena in the input in particular, lead to the acquisition of productive usage. If a construction (i.e. one network) is acquired from only a few very frequent prototypes (i.e. other, very strong constituent sub-networks), they only ever get activated together – the exemplar paths are too strong. If there are many rare types, then their exemplar-based paths may not be fortified sufficiently to persist in the mental lexicon over time. But the construction path expressing their commonalities is fortified, and is likelier to be activated in the future, and less likely to activate one of the frequent prototypes. This leads to a similarity in the frequencies of the input and output distributions, above and beyond the similarity caused for pragmatic reasons (e.g. drinks being more repetitive than foods in reality for objects of drink and eat respectively, etc.).

Tying this back to the PC model, we see that 𝒫 at any given point in time corresponds to the proportion of network members in a speaker's mental lexicon with the weakest possible link to the schematic representation linking all exemplars of a certain process.118 This proportion correlates with the relative strength of the more abstract network core as compared with the strength of its prototypes, and as we have seen in the previous chapters, the proportion itself is predictive of the probability of novel types. Still, it remains unclear what part the facts of distribution play in a concrete instance of argument selection: after all, speakers have a free choice of semantic content, and if they wish to speak about a correlation of ergonomics with some other attribute, they will presumably not be forced to select the adjective better just because the comparative correlative construction is rather unproductive. I therefore turn next to ask how lexical choice based on communicative needs proceeds within the structure of the mental lexicon.

118. This statement must of course be qualified with the reservation "in as much as corpus data is our model of linguistic experience". I will briefly return to this fallacy in Chapter 7, Section 3, but note that this is not necessarily a problem from a theoretical point of view: if we were in a position to record linguistic experience more fully and accurately, then proportions from that data would be expected to be predictive of the network configuration of the actual mental lexicon. The measurements observed here can be thought of as (flawed) approximations of those real values.
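The distributional side of this claim can be checked in simulation: by Good-Turing reasoning, 𝒫 = V1/N estimates the chance that the next token instantiates a novel type, and Zipf-like input keeps this value high, while a balanced distribution over a few frequent types drives it to zero. Both distributions below are invented for illustration:

```python
import random
from collections import Counter

def sample_tokens(weights, n, rng):
    """Draw n tokens from a type distribution given as {type: weight}."""
    types, w = zip(*weights.items())
    return rng.choices(types, weights=w, k=n)

rng = random.Random(1)
n = 5000
zipf = {f"t{i}": 1.0 / i for i in range(1, 2001)}   # one prototype, long tail of rare types
flat = {f"t{i}": 1.0 for i in range(1, 11)}         # ten equally frequent types, no tail

for name, dist in [("skewed", zipf), ("balanced", flat)]:
    toks = sample_tokens(dist, n, rng)
    counts = Counter(toks)
    v1 = sum(1 for f in counts.values() if f == 1)
    print(name, "V =", len(counts), "P = V1/N =", v1 / n)
```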

5. Lexical choice and the structure of the mental lexicon

Why do speakers say … the better or … the more ergonomic in a comparative correlative apodosis to begin with? Clearly language is a communicative capacity which we use to express ideas that are selected freely and contextually. In order to deal with productivity in the context of free lexical choice, we must make clear what it is we are free to choose. It is not the case that speakers can have a meaning in mind and are then completely free to choose any construction or lexeme they please to express it. Some compositional combinations resist alteration or combination altogether, as in (109a), or else different combinations may be possible but simply very unlikely compared with collocational alternatives, as in the canonical (109b) versus non-canonical (109c).

(109) a. ? drop lifeless, ? blithering moron (cf. drop dead, blithering idiot)
      b. highly skilled, extremely difficult
      c. # extremely skilled, # highly difficult

The preference for using more productive constructions to accommodate novel argument lexemes is similar to the latter case: using a less productive construction with novel material may not be ungrammatical, but given the meaning the speaker wishes to express it is marked and odd, as discussed in Section 1 at the beginning of this chapter. What is 'free' for the choosing is therefore primarily the pre-linguistic conceptualization or 'message' (cf. Levelt 1989: 70–106).119 Once a message has been selected, language offers a constrained repertoire of constructions, with choices being guided by a large number of factors (cf. Gries 2003: 132–156 for a discussion of the complex selection process for English verbal particle placement).

119. In order to remain within the scope of the discussion centering on productivity, I will forgo a detailed discussion of some of the specifics of Levelt’s and other theories of language production – the present interest is mainly on the place that lexicalized productivity can assume in modulating selectional processes. The reader is referred to the literature for more details where relevant.


If we look at the beginning of the process of lexical choice on the conceptual level, we must avoid lexical representations altogether and consider some abstract level of prelexical meaning (an onomasiological approach, cf. Croft 2010: 7). Let us consider, for example, what might lead a speaker to utter a sentence like (110).

(110) The greater the risk, the higher the profit.

At the most basic level, this sentence indicatively expresses a correlation between risk and profit. It may also have additional pragmatic undertones, such as providing a comment, and perhaps even advice, about a risky venture the speaker has been made aware of, and it is interesting to note that it is phrased positively (as opposed to the equally grammatical the smaller the risk the lower the profit). As a very simplistic approximation of the pre-lexical meaning of the sentence I offer the graphic representation on the left-hand side of Figure 40.

[Figure 40 appears here: a diagram in which a pre-lexical representation of a risk-profit correlation (left) is linked to language-specific semantic categories such as [+risk], [+profit] and [+correlation] (center), which in turn connect to candidate constructions like [conditional clause] and [the Xer the Yer], alongside attested tokens such as The more the better, The sooner the better and The higher the more dangerous (right).]

Figure 40. Lexical choice for a correlation of risk and profit.

The image to the left of the black arrows is meant to convey the composite conceptual notion of a correlation between risk and profit. At this point, no lexical representation has been selected yet. At the center of the figure we see some possible language-specific semantic structures that might be activated by the abstract concepts on the left, such as the categories of constructions with the properties [+profit] and [+risk], which may comprise either just those lexemes or possibly larger sets of near synonyms (profit, gain etc.), and perhaps also an abstract language-specific semantic concept of [+correlation] which represents an abstraction over multiple constructions which cooccur with the expression of a correlation. On the very right we see possible specific constructions that might be selected to express the desired meaning, which are connected to the categories at the center.

Once the relevant concepts have been activated, activation spreads in parallel to all connected representations in the vein of the 'spreading activation' model (Collins and Loftus 1975; see Carroll 2008: 115–117 for discussion, including more recent related models), so that different alternatives are possible during processing, and even representations not selected in the end are activated (that this is so can be shown e.g. through evidence of priming, or in the case of syntactic structures, late-resolved ambiguities). The connections guide the implicit comparison the speaker must make between the concepts he or she wishes to express and the available constructions: the notion of 'profit' can be more or less like better, just as an abstraction expressing our experience of 'correlation' can be similar to CCs, conditionals, etc. Similarity could be measured in different ways, but we can start by assuming it amounts to the extent of overlap in neural activations (see also the next section). Connections have been schematically rendered as bi-directional since concepts activating production can be activated in turn during perception, and priming effects (semantic or structural) are observable in both perception and production. However, in actuality it is clear that neural links are always directed (i.e. a bi-directional connection is a set of at least two paths), and that the strength of connections need not be equal in each direction (e.g. priming from self-production can be stronger than priming from perception).

Once the level of individual constructions has been reached, the activation of parallel options that are present so far must be narrowed down until a single form of expression is chosen, presumably the most active representation of all possible combinations (cf. Gries 2003: 157–184 for a formalized model and discussion). Importantly, there need be no one-to-one mapping between prelinguistic concepts and linguistic constructions at the conceptual-semantic interface (cf. Jackendoff 1997: 31–36). The speaker may find that a conditional clause and a comparative correlative clause are both suitable for the expression of a correlation, and must choose between an utterance like (110) and the option in (111), or a PP realization as in (112). The same also applies to the choice of concept to accommodate in either the protasis or apodosis of a CC, as in (113), or the choice of positive or negative polarity for the comparison, as in (114).

(111) If you raise the risk you also increase the profit.

(112) You can increase profit by raising the risk.

(113) The greater the profit the higher the risk.

(114) The lower the risk the lower the profit.
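As a toy rendering of this parallel activation and narrowing down, a single spreading-activation pass might look as follows; all nodes, link weights and the decay constant are invented, and real models are of course far richer:

```python
# Concepts seed activation, which flows along weighted links through semantic
# categories to constructions; the most active construction wins the competition.
links = {
    'RISK':           [('[+risk]', 1.0)],
    'PROFIT':         [('[+profit]', 1.0)],
    'CORRELATION':    [('[+correlation]', 1.0)],
    '[+risk]':        [('the Xer the Yer', 0.4), ('conditional clause', 0.3)],
    '[+profit]':      [('the Xer the Yer', 0.4), ('conditional clause', 0.3)],
    '[+correlation]': [('the Xer the Yer', 0.8), ('conditional clause', 0.6)],
}

def spread(seeds, links, decay=0.5):
    activation = dict(seeds)
    frontier = list(seeds)
    while frontier:                         # propagate until no new boosts remain
        node, act = frontier.pop()
        for target, weight in links.get(node, []):
            boost = act * weight * decay
            activation[target] = activation.get(target, 0.0) + boost
            frontier.append((target, boost))
    return activation

act = spread([('RISK', 1.0), ('PROFIT', 1.0), ('CORRELATION', 1.0)], links)
candidates = {k: v for k, v in act.items() if k in ('the Xer the Yer', 'conditional clause')}
print(max(candidates, key=candidates.get), candidates)
```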

These sentences can be seen as instances of interrelated constructions with overlapping semantics and lexical items that form a complex network (cf. Langacker 2000), the members of which will certainly have subtly different meanings, and consequently there will be different reasons for choosing each one. The important point for the issue of productivity of argument selection is that we may be more or less inclined to make one of these choices, all other things being equal, if it forces us to use unfamiliar lexical material in a less productive slot.120 Though we have already seen the preference for more productive slots when embedding novel material quantitatively in the previous chapter, consider the qualitative naturalness of the examples in (115)–(118), showing the relatively unproductive but still extensible [Nx P Nx] distributive construction (e.g. house by house, Williams 1994; Jackendoff 2008), which may be thought to compete with quantifiers such as each or every.

(115) John painted everything on the block red, house by house.

(116) John painted each/every house on the block red.

(117) ? John told everyone in the kindergarten 'hi', child by child.

(118) John told each/every child in the kindergarten 'hi'.

120. Whether or not the above sentences are ‘different ways of saying the same thing’, or in a variationistic approach different variants of the same variable (cf. Labov 2004), and also to what extent this can be maintained, are of course difficult questions to decide in each case. Such an interchangeability might only apply within a certain context. However if we believe speakers have any freedom to vary their utterances for a given communicative need, we must assume such choices occur, and as soon as they do, productivity can play a role in tipping the balance. Establishing that ‘all other things are equal’ may also be difficult for the comparison of such variants, though some ideas for experiments under laboratory conditions are outlined in Chapter 7, Section 3.

Although (117) may not be ungrammatical, it seems very marked and is probably highly unlikely to occur in natural conversation. The problem is, however, not the mapping of the [Nx P Nx] construction to the conceptual structure behind the sentence, but rather the unfamiliarity of the argument child, as we can see from the much more natural-sounding (119) and (120).

(119) John told everyone in the kindergarten 'hi', one by one.

(120) John told each child in the kindergarten 'hi', one by one.

Thus the construction itself is quite compatible with the conceptual meaning to be expressed, but for reasons of productivity, it may be dispreferred.121

121. Again I wish to emphasize that it is not impossible to form an acceptable example with the combination child by child in some other environment. The point is that it is qualitatively more difficult to do this here than one might naively assume based on the naturalness of the example with one by one, and that quantitatively, such combinations are predicted by productivity measures to occur significantly more rarely than expected by chance.

In the Hebbian approach this is simply a consequence of the input-driven form of the construction's network. Productivity is thus a function of how well the relevant construction is networked to conceptual structures, how strong its representation is, and at lower hierarchical levels, how well networked its lexicalized exemplars are. It also becomes clear why the different dimensions of the PC contribute to productivity: a high token frequency leads to the entrenchment of the construction as an option to begin with; a large vocabulary V leads to productivity since more types mean more sub-networks which could attract similar novel lexemes to the construction; and a skewed distribution helps on the one hand to entrench the construction around powerful 'general' exemplars to which many potential vocabulary items may be similar, and on the other hand prevents those prototypical exemplars from overpowering the possibility of novel forms.

This fits well with insights from training studies which examine how quickly adults and children learn novel constructions when exposed to exemplars (for an overview of studies see Goldberg 2006a: 74–92).

It has been shown that nonce verbs and constructions which have skewed distributions are acquired more quickly by children (Casenhiser and Goldberg 2005, Maguire et al. 2008) and adults (Goldberg, Casenhiser, and Sethuraman 2004),122 and also that the generalization of argument structure constructions in first language acquisition proceeds from frequent prototypes before achieving extensibility to novel lexemes (i.e. children acquire a-structures like the ditransitive exclusively with a frequent, general prototype like give and only later make the abstraction allowing them to embed new verbs in the construction). As Goldberg puts it (cf. also Tomasello 2003: 144–161):

The present hypothesis is that the high frequency of particular verbs in particular constructions facilitates children's unconsciously establishing a correlation between the meaning of a particular verb in a constructional pattern and the pattern itself, giving rise to an association between meaning and form. (Goldberg 2006a: 79)

This is understandable if we believe that a large part of the physical infrastructure representing a construction is shared with its prototype(s), i.e. they share 'representational real estate'.123 As increasingly numerous rare exemplars are categorized as similar to a prototype, the construction develops an independent representation which can attract novel material (cf. Goldberg, Casenhiser, and White 2007).124 The results in the previous chapters suggest that skewed distributions not only facilitate the faster learning process evidenced in training studies, but also result in a different kind of learning, which leaves long-lasting traces in its effects on productive usage.

122. Similarly, skewed input distributions may be involved in successful second language acquisition (Ellis and Ferreira-Junior 2009, Wulff et al. 2009, Madlener 2011), which should have commonalities with adult learning of completely novel constructions.

123. The latter formulation is due to Pinker and Prince’s (1991: 232) description of connectionist models. Compare also Langacker (2000: 7), who postulates that schemas and their instantiations, which we may represent as separate objects, are actually stored in partly shared structures: “discrete representations […] are not to be taken as implying that a schema and its instantiation are wholly distinct and separately stored”.

124. This process of category building through iterative application of similarity judgments is a pervasive, domain-general cognitive mechanism evident also in the non-linguistic reasoning of adults and children (see Gentner and Medina 1998).
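The claim above about input shape can be made tangible with a deliberately minimal toy simulation. Everything below is invented for illustration: the input samples, the reduction of a construction to one shared ‘abstract’ sub-network plus one sub-network per argument type, and the unit Hebbian increment. It is a sketch of the reasoning, not an implementation of the model argued for in this chapter.

```python
from collections import Counter

def hebbian_network(tokens):
    """Toy Hebbian learner: each token co-activates (a) the features
    shared by all exemplars of the construction, standing in for its
    abstract representation, and (b) the features specific to its own
    argument type. Co-activation strengthens both (Hebb's Law)."""
    shared = 0             # strength of the abstract sub-network
    specific = Counter()   # strength of each exemplar's sub-network
    for arg in tokens:
        shared += 1
        specific[arg] += 1
    return shared, specific

# Invented input samples, 20 tokens each.
conditions = {
    "idiom-like (one type)  ": ["said-done"] * 20,
    "balanced (two types)   ": ["give"] * 10 + ["send"] * 10,
    "skewed (prototype+rare)": ["give"] * 10 + ["send", "hand", "toss",
                                "mail", "wire", "lob", "fax", "slip",
                                "loan", "feed"],
}
for label, tokens in conditions.items():
    shared, specific = hebbian_network(tokens)
    prototype = specific.most_common(1)[0][1]
    hapaxes = sum(1 for f in specific.values() if f == 1)
    # If the abstraction is no stronger than its best exemplar, the
    # construction's representation is identical with that exemplar.
    print(label, "V =", len(specific),
          "V1/N =", round(hapaxes / len(tokens), 2),
          "abstraction/prototype =", round(shared / prototype, 1))
```

Only the skewed condition combines an abstraction that outweighs its strongest prototype with many minimally entrenched sub-networks (high V and high V1/N); on the present account, that is precisely the configuration that can both anchor acquisition and attract novel lexemes.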


A formal account fitting this process can be found in explicitly exemplar-based computational models of syntax acquisition, such as Data-Oriented Parsing (DOP, Bod 2006, 2009).125 The DOP framework assumes that any input substructure can become a rule in a parser’s grammar, and all possible parses are assigned to every input the DOP parser is exposed to. However, the DOP parser comes to prefer parses that are more useful for analyzing additional input by considering the probabilities of all subtrees it is exposed to. If a fragment and its superstructure are perceived on separate occasions, a parse containing the fragment subtree as a separate constituent comes to be evaluated as more likely. At the same time, larger subtrees which appear in a consistent form are preferred by the DOP parser, since they are supported both by recurrence of their entire structure and by the probabilities of their constituent fragments. Bod equates the acquisition of such subtree probabilities with a process of exemplar-based similarity checking:

The notion of probability may be viewed as a measure for the average similarity between a sentence and the exemplars in the corpus: it correlates with the number of corpus trees that share fragments with the sentence, and also with the size of these shared fragments. (Bod 2006: 307)126
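The probabilistic core of this idea can be caricatured in a few lines. The sketch below is not Bod’s implementation; the fragment inventory and counts are invented, and full DOP additionally sums over all derivations of a parse. It only illustrates the standard DOP-style estimate: a fragment’s weight is its relative frequency among fragments sharing its root category, and a single derivation’s probability is the product of its fragments’ weights.

```python
from collections import Counter

# Invented (root category, subtree fragment) counts from exposure.
fragment_counts = Counter({
    ("S",  "[S [NP] [VP]]"):            40,
    ("S",  "[S [NP] [VP drink [NP]]]"): 30,  # larger stored chunk
    ("VP", "[VP drink [NP]]"):          12,
    ("VP", "[VP eat [NP]]"):            20,
    ("NP", "[NP water]"):               25,
    ("NP", "[NP kombucha]"):             1,
})

root_totals = Counter()
for (root, _), n in fragment_counts.items():
    root_totals[root] += n

def weight(root, frag):
    """Relative frequency among fragments with the same root."""
    return fragment_counts[(root, frag)] / root_totals[root]

def derivation_prob(frags):
    """One derivation's probability: product of fragment weights."""
    p = 1.0
    for root, frag in frags:
        p *= weight(root, frag)
    return p

# Deriving 'NP drink water' from small rules vs. the larger chunk.
small = [("S", "[S [NP] [VP]]"), ("VP", "[VP drink [NP]]"),
         ("NP", "[NP water]")]
big   = [("S", "[S [NP] [VP drink [NP]]]"), ("NP", "[NP water]")]
print(round(derivation_prob(small), 3))   # ~0.206
print(round(derivation_prob(big), 3))     # ~0.412: larger fragment wins
```

The well-attested larger chunk contributes fewer probability factors, so derivations reusing it tend to win; this is the bias towards maximal familiar fragments discussed next.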

The converse of the conditional probability P(analysis|form) in language production should then be found in the probabilistic selection of a form of expression given certain concepts the speaker wishes to express. This probability could equally be calculated using exemplar-based similarity checking, in which the most strongly attested constructions for the desired meaning are activated. In Bod’s model, there is a bias in favor of the larger attested fragments, since they are supported both by activation of their constituents and by exemplars representing larger chunks; transferring the idea to the cognitive domain, if we find a well-attested (in cognitive terms, a well-entrenched) exemplar, we may be inclined to use it outright.

125. Similar ideas are also being promoted in the framework of fluid construction grammar (FCG), cf. Steels and de Beule (2006) i.a.

126. The precise details of the computational model cannot be discussed here, and the interested reader is referred to the references above. My intention here is mainly to show that approaches with very different starting points, both computational and developmental/psychological, reach converging results with different methods, all indicating the existence of prototype-centered, probabilistic competition between constructions generalized from input, which typically shows more or less skewed distributions having different effects on acquisition.


Failure to find a fitting exemplar leads to a compositional process of retrieving progressively smaller fragments until a high similarity is reached between the meaning to be expressed and the structured contents of the mental lexicon (cf. Walsh et al. 2010 for a computational model implementing this notion for both phonology and syntax).

If we understand constructional schemas as traditional rules, we find that Bod’s preference for maximal well-attested fragments corresponds to the observation that rules that are applied more often (in either comprehension or production) are also more productive in a gradual model of productivity, as described e.g. by Hay (2001) and Hay and Baayen (2003), who identify a high proportion of morphological bases to derived morphological forms as an index of productivity, in particular with regard to the semantic aspect of transparency (see also Plag 2006 for discussion). If a derived form is more frequent than the base it is derived from, as is the case e.g. for derived illegible vs. its morphological base legible, then the derived form is more easily accessible as a whole, without the need to resort to compositional parsing (recall also the measure of activation A based on parsability in Chapter 3, Section 5). In production, if no familiar expression is found to express the currently desired meaning, the speaker is forced to produce output productively, using combinations of whatever smaller units are activated in the present context.

At the level of subunits, again, more abstract constructions based on the occurrences of multiple types, which will be largely rare ones with an LNRE distribution, are more powerful in their potential for activation with unseen material. This follows from a cognitive interpretation of the principle operating in the computational DOP model of preferring larger, higher structures (in DOP, syntax trees) which can account for the data and are more strongly entrenched, since they are activated by all types belonging to the relevant construction. Insofar as these constructions have open argument positions, they activate other constructional subnets in turn (in DOP, the ‘label substitution’ operation attaching subtrees at an open position), which can select lexical material never before encountered in that position. The constructions filling those argument slots can correspond to parts of speech in a traditional generative grammar (a transitive verb activates the NP construction, which in turn activates the noun construction at its head), but can also be much more subtle and semantically sensitive, corresponding e.g. to networks activated in the context of our experience of liquids (that these should also have an associated subfamily of NPs is not implausible – liquids, for example, belong grammatically to the class of mass nouns in many languages and form good prototypes for that class).
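Hay’s relative-frequency criterion mentioned above is easy to state operationally. The frequencies below are invented placeholders; an actual application would use corpus counts:

```python
# Invented corpus frequencies for base and derived forms.
freq = {"legible": 138, "illegible": 443,
        "livable": 620, "unlivable": 65}

def whole_form_access(base, derived):
    """Hay (2001): a derived form more frequent than its base tends
    to be accessed directly as a whole; a base outnumbering its
    derivative keeps the derivation parsable, and the rule that
    produced it transparent and productive."""
    return freq[derived] > freq[base]

print(whole_form_access("legible", "illegible"))  # True: whole-form
print(whole_form_access("livable", "unlivable"))  # False: parsed
```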


There is thus considerable converging support, both from a variety of sources of empirical evidence and from computational models simulating linguistic category acquisition through data, for a model of the mental lexicon in which lexical and grammatical classes arise from exemplars centered around prototypes and bolstered by variable input from a multitude of types following a skewed distribution.127 The present results fit with this model by showing how it could lead to the productivity effects we observe in data, and how the input distributions of more productive constructions can perpetuate themselves. The Hebbian account of productive construction acquisition from skewed input containing a high proportion of rare types, as contrasted with less productive, though possibly frequent, constructions, is both cognitively plausible and predictive of the multiple aspects of productivity we have seen. The resulting network of dynamically updated representations biases speakers during the selectional process away from certain less productive constructions when called upon to embed novel material compositionally, and towards others that are more productive. Conversely, collocational arguments will favor the constructions in which they are familiar, though of course a wide range of factors, from priming, through information structure, to register, can also play a role in the competition of activation between sub-networks.

An important question which this general framework leaves open is that of the nature of the relationships between constructions which are activated during construction and argument selection. Is there only one type of link between networks, which arises by Hebbian learning? How are different variant constructions linked, such as more or less elaborate constructions embedding comparative adjectives? Does every construction have an independent representation, or is there some sort of inheritance in the mental lexicon which is also reflected in the details of productivity? Before concluding the discussion of the mental lexicon in this chapter, I now turn to these questions and a more concrete view of relations in the mental lexicon.

127. Cf. also Abbot-Smith and Tomasello’s (2006) emphasis on the development of abstracted representations in a hybrid model of prototypes and exemplars for first language acquisition.

6. Relation types in the mental lexicon

If we accept the assumptions of usage-based models about the storage of frequencies for every exemplar (and frequency effects on priming in simple tasks such as lexical decision seem to support this), we already have a model with the necessary infrastructure for a network such as the one found in Figure 38 above for the semantic class [+liquid], with its own abstract categorial representation. Still, given that two verbs or other a-structures may select for liquids, it is unclear how the argument slot for one construction can differ from the other, especially in cases where the meanings are so similar as to make extralinguistic, world knowledge-based influences on choice less likely. We might thus be willing to accept that spill and drink behave differently, as the activation of nodes within the liquid class may receive differing levels of excitation depending on the real-world nature of the relevant objects (e.g. oil may often be spilled, but is rarely drunk). But how can we account for the different behavior of syntactic alternants like gerund versus infinitive complements of start (Chapter 5, Section 5), or the comparatives in different variants of the comparative correlative, which behaved differently as shown in (108) above? Are there separate networks representing each and every class redundantly in each and every slot? While this may be a possible solution from a theoretical point of view, it seems implausible, since it would lead to an explosion of information in the mental lexicon through the combinatorics of embedding environments (cf. Elman 2009).128

A less redundant representation is suggested in Figure 41. In this interpretation, the comparative slot in different constructions, such as a bare CC apodosis or a full CNV apodosis, is represented by a sub-network of the nodes involved in the general comparative construction, whose identity is represented by the outermost dashed circle ‘General COMP’. The constructional slots ‘CNV2 COMP’ and ‘Bare C2 COMP’ are both represented by a subset of the nodes making up the general construction, accounting for their shared, similar semantics, and for the possibility that use of comparatives outside of CCs may prime the latter construction as well.

128. This is not to say that speakers do not retain a great deal of ‘redundant’ information – usage-based approaches generally assume that much of the accompanying context of each utterance is stored. The issue is whether each and every embedding syntactic environment is likely to be represented by separate structures indefinitely, which may result in astronomical numbers of such structures, or whether some overlap in the architecture allows that not every context representation be separate, as I will suggest below.


Some ‘representational real estate’ is shared in lexical terms – for example, the node representing the unique contribution of better to the network is enclosed by all circles. Other nodes, such as merrier, are very strong within the representation of the bare C2, but not included in CNV2, corresponding to the fact that merrier almost exclusively appears in the bare version. Inheritance relations in this model are not an explicit type of link, as suggested in some earlier CxG work (esp. Goldberg 1995: 72–100). They are rather an inevitable result of structure sharing, which can nonetheless account for idiosyncratic differences between more general and more specific constructions, since not all structures need be shared. It is entirely possible for there to be nodes in bare C2 that are not part of COMP, for example if merrier were to cease to be used outside of CCs entirely. In this way, networks are at once prototype-centered, with the most entrenched, thickest connections in the diagram being most likely to activate, but have fuzzy edges, which may fall within some sub-constructions of a more abstract schema, but not within others.

Figure 41. Interlocking representation of bare and NP-VP CC apodosis adjective slots within the sphere of the comparative construction. [Diagram: overlapping dashed regions ‘General COMP’, ‘CNV2 COMP’ and ‘Bare C2 COMP’ over comparative nodes including more cynical, more unconvincing, greater, more likely, more ergonomic, more convincing, more expensive, easier, better, merrier and more enigmatic; better is enclosed by all regions, merrier only by the bare C2 region.]


However, the diagram above only refers to the punctual choice of the comparative argument within the relevant constructions. Can the same type of links also represent the entire sequence of selections filling the entire CC construction? Since the type of network envisaged so far only models an entry for a single lexical choice, adding further choices runs into the problem of modeling temporal progression (first one choice, then another), and risks another explosion of redundant networks if all possible spelled-out constructions must be coded as such a network.129 I will therefore assume some sort of link between successive constituents of a construction, which can simply be termed ‘syntagmatic’ for convenience, and the strength of which may be interpreted as describing either transitional probabilities or, more cognitively, a contribution to the activation of subsequent constructions or co-constituents in a construction.

Figure 42 gives the syntagmatic relations within the entire CC apodosis in its three variants, bare, CN and CNV, which form a family of constructions with differing productivity.130 For better legibility, the internal structure of the COMP slot from Figure 41 has been hidden and some further possible constructions in the network have been omitted (e.g. a clause construction encompassing NP and VP at a higher level). The network for the conjunction the at the bottom left (which is distinct from the network for the article the) does not involve any selection. It has no open slots and is unproductive, as signified by the lack of rare item nodes connected by weak links.131 A syntagmatic relation (dotted arrow) leads from the to the comparative adjective slot COMP, though the latter is shared among many constructions.

129. The same problem haunted earlier connectionist neural network models as well, which were initially able to make decisions based on input statically, but could not represent progressive processing well. One of the major contributions of network models such as the Simple Recurrent Network (SRN, Elman 1990) was the introduction of dynamic units representing network working memory to overcome this problem.

130. Cf. Goldberg and Jackendoff (2004), who discuss the family of resultative constructions in English (see also Boas 2003). Resultatives too have commonalities and idiosyncrasies of usage, including different degrees of productivity (Goldberg and Jackendoff 2004: 535).

131. Although the has only one possible form, its network is schematically represented by two connected nodes for consistency: one for the ‘abstract construction’ and one for the realization of the. It is equally possible to postulate just one node uniting the abstract function and the form of the conjunction the.


The connection from the to COMP is particularly strong (thicker arrow), since the conjunction the mandatorily requires the comparative complement; other syntagmatic relations in the network are not quite as strong, as serializations without NP and VP are possible.132 Note also that not all nodes in the COMP network need be part of the area related to bare C2 or CN2 – different parts can belong to different constructional networks (this corresponds in part to Goldberg’s 1995: 97–98 multiple inheritance, though it is necessary for many reasons, including differential productivity).
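On the transitional-probability reading, the strength of such syntagmatic links can be estimated directly from counts of attested serializations. The counts below are invented, and END is a hypothetical marker for the apodosis simply stopping, as in the bare variant:

```python
from collections import Counter, defaultdict

# Invented counts of adjacent constituents inside CC apodoses.
bigrams = Counter({("the", "COMP"): 1000,  # the always precedes COMP
                   ("COMP", "NP"):   620,  # CN and CNV variants
                   ("COMP", "END"):  380,  # bare variant
                   ("NP", "VP"):     180,  # CNV variant only
                   ("NP", "END"):    440})

totals = defaultdict(int)
for (left, _), n in bigrams.items():
    totals[left] += n

def syntagmatic_strength(left, right):
    """Transitional probability P(right | left) as one possible
    correlate of syntagmatic link strength."""
    return bigrams[(left, right)] / totals[left]

print(syntagmatic_strength("the", "COMP"))         # 1.0: obligatory
print(round(syntagmatic_strength("NP", "VP"), 2))  # 0.29: VP omissible
```

On this view, the gradience of the argument-adjunct distinction mentioned in footnote 132 falls out as gradience in link strength.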

Figure 42. The CC apodosis with syntagmatic relations between constructional constituents. [Diagram: node clusters the, COMP (containing better and merrier), NP and VP, linked by syntagmatic arrows, with the constructional regions Bare C2, CN2 and CNV2 encompassing progressively larger portions of the sequence.]

132. Seen in this way, the strength of syntagmatic connections may also be related to the argument-adjunct distinction, which is known to be gradual, or at least to have fuzzy boundaries (see Vater 1978, Somers 1984, Przepiórkowski 1999, among others).


It may at first appear as though there is no way to decide which combinations of sub-networks will enter into constructions, as marked by the ovals in the diagram. In a sense, any combination is a possible construction, much as a formalism like DOP regards all possible parses of an input as possibly informative for parsing further input. Repeated exposure to the same combinations (especially to the most common exemplars, i.e. the prototypes) is what can initially entrench a construction such as the ones above, or can lead to re-analysis at some later stage in the event that an erroneous construction is initially gleaned from the input. A speaker of English must learn that CCs exist, what they mean, that there are variants with NP and no VP, and bare variants without either phrase. Once this has been learned based on such common exemplars as the earlier the better, exemplars similar to these, with a substantial overlap in activation patterns but using novel lexical material, will be classified as CCs by strengthening bonds with the existing network. Networks with many entrenched exemplars can reach a high vocabulary (the V aspect of the PC), while networks attracting many rare items are the ones speakers are more willing to extend (the 𝒫 aspect of the PC).

An aspect still missing in the representation in Figure 42 is the activation of appropriate nodes depending on communicative needs in a situation, i.e. the network’s interface to conceptual structure (cf. Jackendoff 1997). In other words, we still require inputs to constructional representations, linking them to the general cognitive system, as suggested in Figure 40. These links may also develop based on Hebb’s Law (cf. Barsalou 1999), and can also apply within the linguistic representational plane, since not only prelinguistic concepts, but also linguistic ones such as lexical units, should be able to raise activation levels for related constructions in a way that is neither paradigmatic nor syntagmatic. For example, using or hearing a word with a certain morpheme may activate other derivations using that morpheme, and indirect relations between words may be called upon as well, such as the evocation of rhyming words, semantically related words, etc. Honoring structuralist tradition, we can call these connections ‘associative links’ (cf. de Saussure 1966: 125–127), which are represented by two-sided straight arrows in Figure 43.133

133. The images on the right stand for extralinguistic concepts and should not be confused with literal mental images (cf. Barsalou 1999: 582, 593). For simplicity, the straight arrows are two-sided to express bi-directionality between conceptual and linguistic structures in production and comprehension. However, as noted earlier, actual neural architectures are directed and not symmetrical in strength, so that two separate pathways should be assumed in actuality.


Figure 43. Syntagmatic, paradigmatic and associative links on the path from conceptualization to the processing of a productive utterance. [Diagram: conceptual nodes RISE, HIGH, PRICE, CORRELATION and MORE linked by associative arrows to the constructional network of the, COMP, NP and VP and the CC variants Bare C2, CN2 and CNV2.]

The abstract concept of correlation may activate the entire family of comparative correlative constructions and their constituents to a high degree, but it may also activate other competing constructions not depicted here, such as conditional clauses (cf. Section 5 above). If the speaker wishes to express a correlation between something and a rise in price, they have multiple options, including selecting an appropriate adjective from COMP (e.g. more expensive), mentioning the higher price (selection of CN2 with higher in COMP and an NP headed by price), or speaking of the actual rise in price as a process (selection of CNV2, e.g. with a VP headed by rise together with higher in COMP and an NP headed by price). We can imagine that all of these constructions are activated in parallel, as in the spreading activation model, with one pattern achieving the highest activation and overcoming the others.
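This competition can be sketched as follows, with hand-set association strengths standing in for the learned links of Figure 43; all values are invented, and the winner-take-all choice is a gross simplification of parallel activation dynamics:

```python
# Invented strengths of associative links from active concepts to
# competing constructions (COND = a rival conditional construction).
links = {
    "CORRELATION": {"CNV2": 0.9, "CN2": 0.8, "BareC2": 0.7, "COND": 0.6},
    "PRICE":       {"CNV2": 0.5, "CN2": 0.6, "BareC2": 0.1, "COND": 0.3},
    "RISE":        {"CNV2": 0.7, "CN2": 0.2, "BareC2": 0.1, "COND": 0.4},
}

def compete(active_concepts, bias=None):
    """Sum the activation each construction receives from all active
    concepts; the most activated one wins production. The optional
    bias stands in for priming, register, alertness and the like."""
    scores = {}
    for concept in active_concepts:
        for cxn, w in links[concept].items():
            scores[cxn] = scores.get(cxn, 0.0) + w
    for cxn, b in (bias or {}).items():
        scores[cxn] = scores.get(cxn, 0.0) + b
    return max(scores, key=scores.get), scores

winner, scores = compete(["CORRELATION", "PRICE", "RISE"])
print(winner, scores)   # CNV2: best networked to a rise in price
```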


Activation levels are not deterministically related to the conceptual structures being activated, but are modulated both by extralinguistic factors (e.g. alertness, emotion, circadian rhythms) and by intralinguistic ones such as recency effects (priming, parallelism), interactions (expensive and rise might both be associatively related to the desired meaning, but do not work well together), information status (e.g. givenness), and, as we have seen in previous chapters, also by productivity. If the conceptual level activates linguistic constructions which have not been combined before, less productive constructions may resist concurrent activation with those nodes, leading to the preference we have observed for the more open, 𝒫-productive construction.

Thus it seems that the Hebbian network model presented in previous sections fits well with three well-known traditional types of linguistic relations, paradigmatic, syntagmatic and associative, which are needed to connect the lexicon’s network of representations with the dynamics of production and comprehension in a cognitively plausible framework.134 The productivity phenomena observed and investigated in this book can all be seen as epiphenomenal in such a model, an indirect result of the topology of a network-based mental lexicon. Productivity ratings in this model are pliable, since the network is dynamic, and can vary across contexts or registers, as slightly different linguistic subnets develop stronger links in different situations that distinguish scenes in our experiential world, and the representations conceptualizing these contexts can modulate activation themselves.

In the present model, there is no feasible way of separating the world-knowledge reasons for the differential productivity of the objects of eat and drink from the purely linguistic, conventionalized reasons. It seems that we cannot “take the world out of language”, and that we might rather have to “put language in the world” (Elman 2009: 568). This result is perhaps not surprising from a usage-based perspective, which sees grammar as an organization of our experience with a language, inextricably connected to all other aspects of cognition. As Bybee puts it:

While all linguists are likely to agree that grammar is the cognitive organization of language, a usage-based theorist would make the more specific proposal that grammar is the cognitive organization of one’s experience with language. (Bybee 2006: 711)

That said, we are nevertheless in a position to show that conventionalized factors do exist in all aspects of productivity, as the semantic minimal pairs in Chapter 5 were able to show by providing a dissociation between meaning and productivity.

134. A fourth type of relation, that of inheritance, does not need a separate representation but is implicitly present if we relegate its meaning to the physical structure-sharing of sub-networks, which can also be partial, or multiple.


This need not mean that we must give up on the very useful notion of semantic classes and even decompositional semantics, which are useful tools e.g. in the context of identifying textual entailment. As we have seen, semantic classes can have fuzzy boundaries, and different variants of a class can come into existence through repeated activation of subnets within a more general class, just as some lexemes, like merrier, may be well anchored to one constructional variant (viz. bare C2) but not to others.

7. Interim conclusion: Outline of rules in a productivity grammar

This chapter has sketched out a cognitively founded, usage-based model of the mental lexicon which gives rise to the multidimensional productive behavior explored throughout this book. In storing productive constructions along the lines laid out by CxG approaches, the mental lexicon is understood to house what is traditionally seen as the ‘grammar’, i.e. the set of rules which give rise to novel combinations with lexical items in order to produce the infinite generative capacity we find in language. In this view the lexicon is not “an appendix of the grammar” or “a list of basic irregularities” (Bloomfield 1935: 274; this view has been referred to as “a naive hope” by Jackendoff 1997: 151), but rather a structured network of both simple and complex signs which also accounts for the productive, rule-like constructions of the language. Rules of grammar are thus those entries in the mental lexicon, i.e. those constructions, which have open slots in their representation.

The commitment to a usage-based grammar implies that knowledge about the usage of rules, including information about their productivity, is somehow stored together with these constructional ‘entries’. The architecture presented in the previous three sections suggests that knowledge about the productivity of usage for different constructions can be acquired from the input through the effects of entrenchment on the configuration of links representing prototypes and other exemplars witnessed in the slots of each construction. In a sense this should be unsurprising if we consider the task of the language learner in distinguishing unproductive idioms from fully productive syntactic structures. There is no way of knowing, on the face of it, that the idiom long time no see (recall Chapter 2, Section 5) is not an exemplar of a productive class allowing whole paradigms of adjectives, nouns and verbs in the first, second and fourth positions of that phrase. In fact, as soon as we are exposed to long time no hear, we may become willing to extend the verb in the fourth position to a productive slot, but not the other positions (e.g. a conceivable extension of the sort *great distance no rest).


Once a prototype is entrenched, it becomes possible for an extensible construction to arise by adding exemplars that strengthen its representation as independent from its prototype(s).

The present theoretical model’s components offer direct interpretations of the dimensions of the PC and the quantities evaluated by productivity measures. The vocabulary aspect V corresponds to the number of argument exemplar types stored in individual subnets of a constructional slot’s network. Their frequency corresponds to the entrenchment of each subnet, with the consequence that frequency for the entire slot can be equated with the sum of their entrenchment (since the slot is activated for each token, and activation frequency is correlated with entrenchment). The 𝒫 aspect of the PC corresponds to the proportion of minimally connected exemplar types, which is predictive of the rate of addition of further novel items and is reflected in the relative strength of the more abstract network representation and its prototypes. The S aspect of the PC estimates the attraction potential of the network by predicting the point at which it will become saturated, regardless of the rate of addition of novel items, which can be slow or fast.

Hebb’s Law seems to present a good candidate for explaining the genesis of the structures discussed in this chapter. Beyond its role in the creation of paradigms, it may be assumed to fortify syntagmatic relations, which are learned by observing linearizations in language input, and also associative relations, which, much like paradigmatic relations, should develop between neural assemblies representing semantically related, and hence often co-activated, concepts. The different kinds of links must nevertheless be assumed, since paradigmatic knowledge is never observed in sequence, and associative connections can apply beyond paradigm borders, and also to non-linguistic representations, such as sensory information, which is well known to prime lexical items. In fact, associative links can also connect semantically related members of a paradigm, although clearly these cannot be co-selected within that particular slot. Inheritance links, by contrast, do not seem to be necessary in the current model, since shared constituents of neural networks can be assumed to produce the effects of inheritance, including multiple and partial inheritance (inheritance is thus implicit in this model).

The admission of gradual productivity at every level of the mental lexicon has important repercussions for the architecture of the grammar. We can call a usage-based grammar accounting for the facts of gradual levels of productivity a ‘productivity grammar’.135

135. This term may also be subsumed under usage-based grammar in general, as knowledge about productivity can certainly be seen as part of our knowledge of usage in many usage-based approaches. My intention is merely to give a label for usage-based approaches that explicitly seek to enrich the model of rules of grammar, schemas, or the ‘constructicon’, with information about the idiosyncratic aspects of productivity.


In such a grammar, the traditional property of belonging to a grammar or ‘being a rule of grammar’ is no longer binary. There are ‘better’ rules and ‘worse’ rules, though a rule can be better or worse in any one or more aspects of productivity. Constructions like the [Nx P Nx] construction in English have rather low productivity in every respect, and compete in many semantic fields with much more productive counterparts, such as quantifiers like every N or each N; the latter form ‘better rules’ in English. Similarly, the German double passive participle construction in gesagt – getan ‘said – done’ or versprochen – gehalten ‘promised – kept’ rarely admits novel lexical material and competes in some contexts with extremely open constructions such as temporal or conditional subordinate clauses. What vocabulary is established in such constructions, and how ready they are to accept innovative arguments, are part of speakers’ knowledge about usage. Thus the aspect of frequency, which is increasingly accepted as a stored feature of grammatical constructions (e.g. in Manning’s 2003 ‘probabilistic syntax’), is joined in a productivity grammar by the aspects of extensibility, familiar vocabulary size and potential vocabulary size, i.e. the changing probabilities and distributions of repetition and innovation in usage, all of which derive from the structure of the mental lexicon, as shaped by speakers’ linguistic and extralinguistic experience.

Even though treating what we conceive of as rules as having different degrees of ‘ruleness’ may seem like a drastic break with tradition, it is a step worth taking if it gives us a more precise treatment of our data. Failing to do so on account of tradition may be little more than an assertion of dogma, which is how Jackendoff, for example, treats the unwillingness to part with the argument-adjunct distinction:

The argument-adjunct distinction, while it has been useful as a rough-and-ready criterion, has on the whole simply been assumed. If it should turn out that a more precise treatment of the distinction reveals intermediate cases, so what? That is, I believe that objecting to [Jackendoff’s analysis of the a-structure of rid – AZ] on the grounds that it undermines what was assumed to be a clear distinction amounts to little but an assertion of dogma. (Jackendoff 1990: 176)


Similarly, although a grammar not accounting for the facts of productivity is satisfactory for many purposes, and even if the majority of cases can be divided into rules and fully lexicalized idioms (and this is probably not the case, cf. Wulff 2008), a more delicate treatment recognizing degrees in between, especially when backed by empirical data with predictive power, should not be abandoned. Much as we can cast binary grammaticality as a cut-off point between the probable and the very improbable (cf. Sampson 2007), rulehood may be more accurately described, especially for borderline cases, as a matter of degrees of productivity.

Chapter 7: Conclusion

This chapter concludes the discussion by summarizing the previous chapters and putting their results in a broader context. Section 1 gives a brief overview of the main results presented over the course of this book. Section 2 discusses the kinds of models of grammar that are compatible with those results and the prospects of integrating findings with different types of models. Finally, Section 3 offers an outlook on central questions that arise from the previous discussion but could not be addressed within the scope of the present work. Some possible directions for further research and chances for converging results using non-corpus-based methods are also discussed.

1. Main results of this study

The results presented in this book can be divided into two types of findings: firstly, empirical corpus-based evidence on the nature of productivity in argument selection and the methodology we can use to study it, and secondly, a theoretical model that can account for the evidence found using this methodology.

The empirical results have shown that productivity in argument selection is a consistent effect, exhibiting statistically significant differences in the lexical behavior of argument structure slots that cannot be predicted a priori from any categorical aspect of either form or meaning. Constructions with progressively more similar meaning have been compared, including near-synonym heads, different argument structure-preserving morphological derivations from the same stem, and syntactic alternations using one and the same verb or adposition, and in each case substantial divergences have been found. Such findings are not new in the area of morphological productivity, as discussed in Chapters 2–3, but have yet to be demonstrated in detail for syntactic argument selection. If one accepts gradient morphological productivity and a CxG approach in which the combinatory mechanisms governing productive syntactic and morphological processes are not substantially different, this is precisely what one would expect: either gradient productivity does not exist at all, or it can apply to any construction, below or above the word level. At the same time, the reservation should be made that some of the present results demonstrate that the view searching for ‘the’ productivity of a morphological process, such as affixation of a certain morpheme, is too simplistic, since productivity can be very sensitive to the embedding environment (cf. the discussion of comparative formations in Chapter 6, Section 3 and Chapter 4, Section 5).


Beyond being empirically robust, productivity rankings discovered in data often appeal to an intuitive judgment about competing constructions (perhaps most apparently in the case of the different head lexemes in Chapter 5, Section 3, but also in some of the syntactic alternations, notably the case of the German archaic postpositional wegen in Chapter 4, Section 3). Importantly, differences apply not only to the most common, (partially) lexicalized or collocational arguments of each head lexeme, but are evident in the propensity of different argument slots to admit novel lexical material, information about which cannot be stored in the mental lexicon. It has also been shown that the nature of this information is language-specific (Chapter 5, Section 6), and that it is not only (though certainly also) a direct result of world knowledge or abstract, pre-linguistic conceptual semantic representations. The implication of these findings is that productivity is something that speakers have an implicit knowledge of, a knowledge which is idiosyncratic and requires an explanation in our model of grammar. While that knowledge is difficult to demonstrate qualitatively (but not impossible, cf. Section 1 in the previous chapter), it is very much in evidence quantitatively.

Over the course of the previous chapters, it has become apparent that productivity in argument selection is a multi-dimensional phenomenon, much as is the case in morphology. Regarding different aspects of productivity separately, it is possible to reach not only very different relative scores for different processes, but also entirely different rankings. I have claimed that at least four measures from the morphological productivity literature can be successfully adapted to the syntactic domain, forming a so-called Productivity Complex (PC, Chapter 2, Section 6; Chapter 3, Section 8; Chapter 6, Section 2): frequency, vocabulary size, the proportion of rare or unlexicalized items (in practice, hapax legomena) and the projected total of possible vocabulary. While the first and last properties are in principle invariable for each slot, the other two measures are dependent on sample size. Apart from frequency, the three vocabulary-based measures can be estimated, and their development with sample size modeled, using statistical LNRE models (Chapter 3, Section 6; Chapter 6, Section 2; in some cases correction techniques for unevenly dispersed data should be applied, as has been shown repeatedly in the morphological literature and discussed several times throughout this work).
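The sample-dependent quantities involved can be made concrete with a short sketch; the token sample is invented, and the fourth component, the projected total vocabulary S, is omitted, since it requires fitting an LNRE model to the frequency spectrum rather than direct counting:

```python
from collections import Counter

def pc_components(tokens):
    """Observable components of the Productivity Complex for one
    slot: token count N, vocabulary size V, the frequency spectrum
    V_m (number of types occurring m times), and Baayen-style
    potential productivity P = V1/N (hapax proportion)."""
    freqs = Counter(tokens)
    N = len(tokens)
    V = len(freqs)
    spectrum = Counter(freqs.values())
    return N, V, dict(spectrum), spectrum[1] / N

# Invented object tokens for a 'drink'-like slot.
sample = ["water"] * 6 + ["tea", "tea", "coffee", "mead", "kombucha"]
print(pc_components(sample))   # (11, 5, {6: 1, 2: 1, 1: 3}, 0.2727...)
```

The spectrum V_m is also exactly the quantity that the next paragraphs argue speakers could plausibly store.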


In adapting productivity measures and corresponding type and token definitions to argument selection, key characteristics of syntactic, as opposed to morphological, constructional processes have been examined. A particular challenge is offered by the filling of multiple slots in a-structure constructions. In many cases it is reasonable to assume that each slot is filled separately and that productivity can be measured for those slots in isolation. Within each slot as well, there may be further lexical or argument selectional choices, though these too can be modeled as consecutive decisions with their own productive processes. It thus makes sense to examine the possible object heads of a verb like drink, such as the noun water, without assigning a separate type to instances with distinct attributes such as cold water; the addition of the adjective can be attributed to a separate productive process and can be thought to function similarly in other contexts. However, in certain cases a clear interaction can be shown in practice between the choice of novel items in one slot and another, such as the choice of adjectives in bare comparative correlatives in English (Chapter 4, Section 5). If the behavior of multi-slot choices significantly deviates from expectations based on measurements from individual slots, a multi-slot model may be more valuable.

Turning to the theoretical results of this study, the statistical model-based formalization of the PC raises the question of how speakers might represent an equal amount of information in the brain in a way that is cognitively possible. While it is by now largely uncontroversial that speakers store type frequencies by means of entrenchment,136 there is no data to support an internal representation of vocabulary growth in human memory. It has therefore been argued that a similar type of knowledge is stored in the cognitive system to that which an LNRE model can be based on: the vocabulary size Vm for every m, i.e. the number of types in each slot with a frequency of m (in addition to explicit storage of a wide range of types, including frequent prototypes, but probably also a significant portion of rare items for some duration, see Chapter 6, Section 3). This allows us to account for a phenomenon mirrored by the prevalence of hapax legomena without postulating the explicit long-term verbatim storage of each and every hapax legomenon in each slot, which would be cognitively much less plausible.

136. There is in fact evidence that speakers store much more than just entrenchment, including very specific knowledge about selective distributions, cf. e.g. Boyd and Goldberg (2011) on the predominantly attributive use of a-adjectives like alive as a quantitative reflex of the distribution of the old, no longer transparent PP construction at its base. The current results fit with Boyd and Goldberg’s view in showing that synchronically arbitrary input distributions are reproduced by speakers in a self-perpetuating way that mutates only slowly over time.


As a mechanism responsible for the storage of such knowledge and for the crystallization of differentially productive constructions, I suggest the well-established, general and language-independent cognitive principle of Hebb’s Law, which, interpreted linguistically, suggests that rare types contribute to productivity by strengthening the neural representation of the abstract constructions they are categorized to, without simultaneously bolstering a prototype that overshadows the abstraction itself (Chapter 6, Sections 4–6). Constructions with overly dominant prototypes and few or no rare types develop networks that greatly overlap and are largely identical with their prototypes, while constructions with many rare types develop a strong and distinct representation overlapping the common features of all or most of those rare types. This fits well with recent research on the role of skewed distributions in language acquisition and with computational models acquiring grammar from examples (see Chapter 6). Within the discussion I suggest in particular that Hebb’s Law could be made responsible for the creation of three types of structural constructs in a network-based model of the mental lexicon: paradigms of non-co-activatable items (i.e. the vocabulary), syntagms of sequentially co-activated constructional constituents, and associative networks linking the syntactic representation of constructions to conceptual semantic structure and to other related constructions. Seen in this way, productivity is an epiphenomenal function of the configuration of networks acquired through the storage of usage data. This forms a natural component of a usage-based grammar, which assumes the storage of precisely the information needed for such a general architecture.

A final consequence of the model suggested in this book is that the property of ‘being a rule of grammar’ (or ‘being a stored construction’ in a CxG approach) is not binary. Beyond the long-recognized idea that constructions have different levels of entrenchment, it is suggested that the notion should be formalized that constructions which are in essence productive (in the sense of admitting an open vocabulary) can be so to a greater or lesser extent. A successful application of the gradient view of productivity to argument selection implies that syntactic rulehood itself is gradual. The question sometimes directed at CxG as a theory regarding how one knows what is a construction and what isn’t is in this view seen to misinterpret the nature of constructional lexicon entries: anything can be a construction, and perhaps every possible, even incorrect, analysis a speaker entertains becomes a construction at a certain, very preliminary level (much as exemplar-based computational models such as DOP postulate and implement, cf. Chapter 6, Section 5).


Instead we can ask a different question: how do speakers know which constructions they may extend and to what degree? Very productive constructions, like the passive construction, are those we usually regard as rules, while less productive ones, such as the distributive [Nx P Nx] construction (house by house) or the German double participle construction (gesagt – getan ‘said – done’), are marginal rules, but insofar as their vocabulary cannot be fully specified, rules nonetheless.

Before discussing the many more questions that the present approach raises and leaves unanswered, the next section briefly sketches out what kinds of theoretical frameworks can be used to integrate the current findings about productivity into a model of grammar, a prospect which is essential if we do not wish to restrict the study of productivity to isolated discussions of the phenomenon itself.

2. What models of grammar are compatible with these results?

The approach taken in the present study has been rooted in the general framework of construction grammar: it has been assumed that any configuration of linguistic categories and lexical material can not only carry its own meaning, but also bring with it its own behavioral profile. At the same time, compositional semantics and semantic classes need not be done away with – much if not most of the meaning of complex utterances can be derived directly from the meanings of their constituents, and the recursive model of selection that allows a quantification of productivity on a slot-by-slot basis also assumes a sort of compositionality. The current results do, however, speak for a more central position for non-algebraic, prototype-based categories, both as part of the explanation as to why certain constructions are acquired as more productive than others (the exemplar-based, prototype-centered representation created through Hebbian learning in the previous chapter), and as a means of reconciling the different behavior of slots taking a seemingly identical or near-identical argument class in usage. By assuming prototype-centered semantic classes with slightly different usage-based profiles (as in constructions with a high representational overlap in Section 6 of the previous chapter) we can retain the many advantages of decompositional lexical semantics, including predictable similarities in usage and semantic entailments.

The type of grammatical formalism we postulate to host these features of constructional categories and the facts about productive usage is in fact quite open. While it is clearly impossible to represent gradient productivity in a traditional algebraic generative grammar, any theory which moves beyond asking which constructions are possible in a language, towards the more detailed question of how they are used, can add productivity as a further type of usage information.


Nevertheless, some types of theories seem better suited to representing the evidence we have reviewed. The fact that closely related members of syntactic alternations exhibit distinct productive behavior suggests that transformational accounts of alternations are probably ill-suited for the description of the idiosyncratic nature of productivity (this applies to morphology as well as syntax; recall the discussion of deverbal synthetic compounds and related verbal arguments in Chapter 5, Section 4). It can therefore be assumed that non-derivational theories such as LFG (Kaplan and Bresnan 1982, Bresnan 2001) or HPSG (Pollard and Sag 1994) can more easily accommodate a mechanism specifying productivity for each construction.137 It should, however, be noted that some of the assumptions made here, especially in the last chapter, conflict with precepts from non-CxG theories, though this need not mean those theories could not incorporate productivity all the same. The retention of the generative Competence Hypothesis in LFG, stating that a grammar must represent a speaker’s knowledge about language independently of the facts of the processor used to interpret the grammar and other parameters influencing the processor’s behavior (cf. Kaplan and Bresnan 1982), can be adhered to even in a model specifying productive behavior (as a part of grammar); nevertheless, the view adopted here to explain the emergence of productivity phenomena is deeply intertwined with the postulated structure of the processing architecture, and the pliability of productivity in context pleads for a representation that is constantly updated by the facts of usage (the need for such influences to model priming, structural parallelism etc. has also become apparent in more recent studies of grammatical alternation phenomena, e.g. Bresnan et al. 2007).

In either case, admitting usage-based information about productivity into a formalism is possible through the addition of features for the relevant slots within the functional or semantic structure, quite apart from the assumed parser responsible for the serialization of syntactic categories.

137. In fact, the conceptual proximity of HPSG to CxG has often been noted (e.g. Fischer and Stefanowitsch 2006: 3–4), ultimately resulting in the use of the formalism to represent CxG ideas in sign-based construction grammar (SBCG, see the articles in Boas and Sag 2011). In general, model-theoretical approaches such as LFG and HPSG, which concentrate on describing the constraints on the limitless possibilities of the grammar of a language, rather than enumeratively generating the set of permitted utterances (cf. Müller 2010: 325–331 for a discussion contrasting these families of formalisms), are more amenable to the integration of the kind of information discussed here.


For example, if we take an HPSG feature structure representing a verb like drink, with its individual thematic roles in the CONTENT feature, it is conceivable that one could specify that one aspect of the content of the THEME argument (the feature DRUNK) is on the one hand information about actual vocabulary items appearing in this slot, including both common prototypes central to its meaning and less common but still lexicalized cases, and on the other hand some quantitative information about the prevalence of rare items (see Figure 44). For each lexicalized item a number can be stored representing entrenchment, including for very rare items that have nevertheless been stored (e.g. the hapax legomenon kombucha below). However, many other rare items, in the lowest box in the figure, leave such faint traces that their lexical representations in that position are not entrenched, or are forgotten at some point, perhaps up to a frequency rank Vθ, similarly to Baayen’s threshold activation model in Chapter 3, Section 5.

Figure 44. Linking lexical usage information to an HPSG feature structure. [Diagram: a box labeled ‘Lexical usage information’ containing ‘common prototypes’ (e.g. water 6154, …), ‘other stored exemplars’ (e.g. martini 87, …, kombucha 1) and ‘traces of unlexicalized rare items’ given as Vm for each m (V1 250, V2 135, …, Vθ n).]

In order to remain fully within the HPSG framework, it would be equally possible to specify lexical usage information as a feature structure within DRUNK, with a list of lexicalized arguments each containing an entrenchment feature and a list of non-lexicalized activation strengths Vm, though this would of course necessitate the definition of an appropriate ontological type for such features within the formalism.


For the sake of simplicity, the presentation in Figure 44 makes no such attempt, opting for a separate representation outside the HPSG entry proper. Such structures, fusing entries in some formalism with lexical usage data, contain the information necessary for the representation of productivity phenomena, namely speakers’ experience of Vm for each rank m, and do not interfere with other aspects of the formalism. The lexical entry can then compete with other entries as the most likely given the conceptual semantic content to be expressed in a certain situation and the vocabulary already stored in memory. In this view, mapping functional structures onto surface constituent structures can be seen to express the associative activation of constructions discussed in Section 6 of the previous chapter.

However, the most natural representation for productivity information in the present approach remains an exemplar-based one, as suggested throughout Chapter 6. Formalisms extracting structure from data and storing, at least initially, all witnessed exemplars, such as DOP (Bod 2006, 2009) or fluid construction grammar (FCG, de Beule and Steels 2005), could most naturally incorporate and lead to the effects considered here. Yet exemplar-based models and other usage-incorporating theories of grammar are not mutually exclusive. In fact, Bod (2009) suggests an implementation of an induced LFG grammar using DOP which he calls LFG-DOP, and as we have just considered, the integration of productivity as a notion modulating lexical choice need by no means contradict the mechanisms underlying many theories of grammar and lexical description. Bringing productivity into such formalisms and computational models is both an important challenge and an exciting prospect for productivity research in the future.
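The hybrid object sketched in Figure 44 is also straightforward to render as a plain data structure. The sketch below merely echoes the invented values from the figure and makes no attempt at a faithful HPSG encoding; the split between stored and unlexicalized hapaxes is resolved here by simply summing both.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SlotUsage:
    """Usage information for one argument slot (the THEME of drink),
    kept alongside whatever feature structure the host formalism
    provides. Values are invented, echoing Figure 44."""
    entrenched: Dict[str, int] = field(default_factory=dict)  # stored exemplars
    spectrum: Dict[int, int] = field(default_factory=dict)    # V_m traces

    def potential_productivity(self):
        # P = V1/N over the slot's remembered experience, counting
        # stored hapaxes as well as unlexicalized hapax traces.
        n = sum(self.entrenched.values()) + sum(
            m * v for m, v in self.spectrum.items())
        v1 = self.spectrum.get(1, 0) + sum(
            1 for f in self.entrenched.values() if f == 1)
        return v1 / n

drunk = SlotUsage(
    entrenched={"water": 6154, "martini": 87, "kombucha": 1},
    spectrum={1: 250, 2: 135},   # traces of rare, unlexicalized types
)
print(round(drunk.potential_productivity(), 4))   # 0.0371
```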

3. Outlook

In this section I will mention some of the remaining difficulties inherent in the present work and try to outline some of the questions that most urgently need to be answered to extend the results above. These fall largely into two categories: 1. methodological concerns about the corpus studies presented here and how further methodologies could be used to complement them; and 2. theoretical questions raised by the approach I have taken that can and should be studied further to either confirm or falsify the claims made in the previous chapters.


A first problem that should be addressed is that of the type of Web corpus data used in the studies presented here. The need for a truly large database to demonstrate lexical effects in syntactic constructions is understandable because of the relative rarity of the phenomena – bare comparative correlatives, for example, occur only 1.146 times per million words in the German deWaC corpus and only 0.365 times per million in the English ukWaC corpus (cf. Zeldes 2009, 2011). However, while Web data is currently the only practicable way of obtaining large enough samples of these processes for quantitative work, it forms a very distorted model of linguistic experience. As Biber and Jones (2009: 1288) estimate, a real English speaker’s daily intake of language data is more probably comprised of some 80% conversation, 10% television, 1% newspapers, 1% novels, 2% meetings, 2% radio broadcasts, 2% texts that they wrote (memos, e-mails, letters), and 2% other texts. While we may have little idea how much Web data the average speaker of English consumes, it is safe to say that the distributions shaping productive usage are mainly acquired elsewhere.

Nevertheless, it can be said in defence of the data used in this book that the results in many cases match at least the present author’s intuitions, and that Web data may well offer reflexes of the general input distribution, insofar as this also directs the language of writers on the Internet.138 In the worst case, these results indicate productivity in argument selection on the Internet, a phenomenon that a model of grammar should be able to explain in itself, and in the best case, many of the results found here can be extended to registers other than those found in Web data (though some quantitative differences should always be expected). In any case, the empirical methods explored here should apply across text types, and the theoretical model presented in the previous chapters predicts that further corpora will also show consistent productivity effects. Perhaps most interesting would be the discovery of completely different rankings based on the same criteria across genres, which one would only expect for either truly close competitors or extremely different genres. In this regard, further corpus-based studies on more text types are definitely needed to explore the general cross-domain robustness of productivity effects in syntax.

138. It has also been shown that, used carefully, Web data can quantitatively correspond to and extend analyses achieved with carefully designed corpora such as the BNC (see Keller and Lapata 2003).

There is furthermore a substantial need for more experimental and psycholinguistic studies of productivity phenomena in general and in argument selection in particular.

However, while Web data is currently the only practicable way of obtaining large enough samples of these processes for quantitative work, it forms a very distorted model of linguistic experience. As Biber and Jones (2009: 1288) estimate, a real English speaker's daily intake of language data is more probably composed of some 80% conversation, 10% television, 1% newspapers, 1% novels, 2% meetings, 2% radio broadcasts, 2% texts that they wrote (memos, e-mails, letters) and 2% other texts. While we may have little idea how much Web data the average speaker of English consumes, it is safe to say that the distributions shaping productive usage are mainly acquired elsewhere.

Nevertheless, it can be said in defence of the data used in this book that the results in many cases match at least the present author's intuitions, and that Web data may well offer reflexes of the general input distribution, insofar as this distribution also directs the language of writers on the Internet.138 In the worst case, these results indicate productivity in argument selection on the Internet, a phenomenon that a model of grammar should be able to explain in itself; in the best case, many of the results found here can be extended to registers other than those found in Web data (though some quantitative differences should always be expected). In any case, the empirical methods explored here should apply across text types, and the theoretical model presented in the previous chapters predicts that further corpora will also show consistent productivity effects. Perhaps most interesting would be the discovery of completely different rankings based on the same criteria across genres, which one would only expect for either truly close competitors or extremely different genres. In this regard, further corpus-based studies on more text types are definitely needed to explore the general cross-domain robustness of productivity effects in syntax.

There is furthermore a substantial need for more experimental and psycholinguistic studies of productivity phenomena, in general and in argument selection in particular.

138. It has also been shown that, used carefully, Web data can quantitatively correspond to and extend analyses achieved with carefully designed corpora such as the BNC (see Keller and Lapata 2003).

Since productivity has been treated as a largely quantitative phenomenon, observable mainly in large amounts of textual data, it would be especially interesting to know whether intuitions about relative productivity are consistent, and if so under what conditions. To examine these questions, experimental elicitation studies like those already carried out for word formation (e.g. Baayen 1994) are needed in the syntactic context. Assuming consistent intuitions are found, at least for a subset of productivity phenomena, which aspects of productivity do they best correspond to? Can such knowledge be shown to exist for trained linguists, or perhaps even for every speaker? What kinds of experimental designs can be used to extract consistent, meaningful judgments?

At the same time there is a need for empirical validation from converging sources of non-introspective evidence. For example, can productivity effects be detected using ERP, eye tracking or neural imaging studies? As one possible study, consider an experiment measuring reaction times in a lexical decision task contrasting in- and out-of-data items, or common and very rare items, in syntactic slots with significantly different corpus-based potential productivity measurements. It might be expected that subjects would perform faster, or exhibit less surprisal in ERP measurements, at the use of novel arguments in a more productive slot, since a novel item is more expected at that position. An absence of any difference would certainly challenge the current results, and especially their cognitive explanation in the previous chapter, while a confirmation of differential performance would strengthen the case for a neural reality of representational differences in more or less productive slots, especially if (near) synonymous slots are studied.
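Corpus-based estimates of the relevant measures for selecting such slots can be obtained, for instance, with the zipfR package (Evert and Baroni 2007). The following sketch uses zipfR's bundled sample frequency spectra rather than the slots studied in this book:

library(zipfR)            # Evert and Baroni (2007)
data(ItaRi.spc)           # sample frequency spectra shipped with zipfR
data(ItaUltra.spc)
Vm(ItaRi.spc, 1) / N(ItaRi.spc)         # potential productivity V1/N
Vm(ItaUltra.spc, 1) / N(ItaUltra.spc)   # the same for a competing process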


Another discipline which plays a central role in showing the plausibility of usage-based, exemplar-theoretical models for the acquisition of more or less extensible rules is computational linguistics, which has already been mentioned in the last chapter. Both connectionist models and data-driven symbolic models of grammar are being developed which infer rules from the structure of more or less richly annotated input. In this context, annotation should not be shied away from as adding information not available to the language learner, or as possibly obscuring the patterns in the raw language data (as cautioned against e.g. by Sinclair 2004: 190–191): no amount of annotation can compare to the richly structured sensory input that accompanies language acquisition, which is by no means driven by a stream of pure language data. A substantial challenge will be the characterization and schematic categorization of the extralinguistic information relevant for grammar/lexicon acquisition that the language learner receives about language usage. While some abstractions must of necessity be made (we are not in a position to model 'the world'), some form of knowledge base, statistically inferred or explicitly modeled, will have to be part of any practical system predicting the extensibility of argument structures to out-of-data lexemes.

Conversely, productivity information may turn out to be useful for a variety of computational systems dealing with unseen lexemes: for example, if an ambiguous structure is encountered with a previously unseen lexeme, a system with knowledge about productivity might do well to assume that it has encountered the more productive of the possible constructions it must choose between (cf. Zeldes 2009 for a discussion of parsing PP attachment ambiguity and productivity139).

139. Put briefly, the idea is that if a parser encounters a sentence like Yesterday I ate rice with furikake but has no knowledge of what furikake is, the PP can be read as attached high (e.g. an instrumental reading, ate … with furikake) or low (rice with furikake). However, the instrumental reading has a very unproductive head noun vocabulary (fork, chopsticks…), most of which should have been encountered in a large training corpus. The parser should therefore correctly prefer the low reading, based on the knowledge that the noun modifier PP construction is more productive and on the fact that it does not know the noun lexeme.
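The heuristic in the preceding footnote can be caricatured in a few lines of R; this is a sketch of the idea only, not the implementation discussed in Zeldes (2009), and all vocabularies and productivity values are invented:

# Prefer the more productive construction for an unseen head noun;
# p_instr and p_nmod stand in for corpus-derived potential productivity.
attach_pp <- function(noun, seen_instr, seen_nmod,
                      p_instr = 0.001, p_nmod = 0.25) {
  if (noun %in% seen_instr) return("high (instrumental)")
  if (noun %in% seen_nmod) return("low (noun modifier)")
  if (p_nmod > p_instr) "low (noun modifier)" else "high (instrumental)"
}
attach_pp("furikake", seen_instr = c("fork", "chopsticks", "spoon"),
          seen_nmod = c("sauce", "butter"))   # -> "low (noun modifier)"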


From the theoretical standpoint, too, many questions remain open at the end of this discussion. The problem of multi-slot interactions has been touched upon only briefly, as a sort of exception, in Chapter 4. In reality it can be assumed that very many cases of interaction will be found, both between distant cousin or sister slots, such as in the case of comparative correlatives, and in subordinated and subordinating selectional processes, such as a verb constraining its object while attributes of the object may be constrained by both the object head noun and the verb itself (the problem of non-hierarchical argument selection). In these cases I have largely pleaded for a decision about slot definition that is guided by deviation from statistical independence, but is ultimately decided by the needs and goals of the analysis. Here we must ask: how granular is our interest in, e.g., what kind of official processes one can 'pursue' in German with the verb anstrengen (cf. Chapter 4, Section 2)? Do we care if we conflate productivity from nominal head selection and compounding? And so on, depending on our research question or the purpose of the investigation.

A particular point of interest not touched upon here is the position of verbal subjects, which are often much less lexically constrained than objects, while showing strong natural class-based effects such as limitation to animate or human referents (especially in the case of the volitional AGENT role for transitive verbs, see Plank 1984: 310).140 While in some theories subjects are generated at a different hierarchical level, as specifiers etc., it is easy to show that the choice of novel subjects can be more or less productive for different verbs, much as in the case of objects. In many cases it will be hard to decide whether the subject constrains the verb or the other way around: the class of subjects for bark is rather limited,141 but the range of verbs a dog appears as the subject of is also rather small; yet in both cases we can expect some possibility for novel lexical material. In a sense we have already seen the symmetry of productive slot definition in the study of intensifiers like very, extremely etc. Although these are seen as modifiers of adjectives (and are therefore subordinate and non-argumental), it is not senseless to speak of the extent of their usage by examining the variability of the adjective lexemes which they modify. Even though this problem of motivating our slot definitions can be ignored in a variety of specific studies of productivity phenomena, a major theoretical concern for a grammar aiming to describe productivity in a language fully will therefore be to determine at what points choices are made which should be subjected to productivity ratings, and which choices constrain which.

140. As already mentioned in Section 2 of Chapter 4, in certain scenarios it also makes sense to collapse certain distinct lexemes into a single type based on these classes in order to get more satisfactory productivity ratings, e.g. in the case of distinct proper names, which could be viewed as unproductively realizing the type [NAME] in a slot.

141. Though such restricted cases are not overly common, in many languages they are also not particularly few: e.g. horses neigh and gallop, bells toll, unmarried couples can elope etc. (the latter case is less symmetrical, since the verb gives a lot of information about the subject, whereas unmarried people can be the subjects of a very wide range of verbs; see Plank 1984 for an extensive discussion).
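The symmetry at issue can be illustrated with a toy computation over invented subject-verb pairs:

# Type counts in both directions: distinct verbs per subject noun and
# distinct subject nouns per verb (pairs are invented for illustration).
pairs <- data.frame(
  subj = c("dog", "dog", "dog", "man", "man", "bell", "bell"),
  verb = c("bark", "sleep", "run", "run", "bark", "toll", "ring")
)
tapply(pairs$verb, pairs$subj, function(x) length(unique(x)))  # verb types per subject
tapply(pairs$subj, pairs$verb, function(x) length(unique(x)))  # subject types per verb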


Another theoretical question that requires further discussion regards the way in which productivity knowledge is acquired from input. Studies of first language acquisition can extend our understanding of how frequency distributions develop in the input that children attend to, and many of the current efforts mentioned in the previous chapter are examining precisely this area (see also the overview in Abbot-Smith and Tomasello 2006). Here time will tell whether their findings converge with the account presented here, but results thus far seem to suggest that ambient language in first language acquisition follows an even more starkly skewed and prototype-centered distribution. The model presented in this book predicts that for younger speakers not yet exposed to distributional facts about vocabulary items not present in their ambient language, productive behavior could be primed under controlled conditions by exposure to real examples of near synonyms in use (and not just nonce verbs, as in previous training studies). Such exposure to unfamiliar examples should have a facilitatory effect on productivity for whichever item is presented with a more characteristically LNRE distribution (i.e. rich in rare types, with few more common prototypes), independently of adult data for those competing items and of the meanings gleaned from the examples. As the data presented here have concentrated on a static view of adult-produced distributions, a developmental cognitive account could complement them by explaining how the way people learn language changes over time and how previously acquired knowledge builds up and affects further learning longitudinally.

Yet another intriguing direction for further studies of productivity is the typological perspective. The cognitive mechanisms assumed to underlie the acquisition of productive constructions have been kept very general in this work, with the interpretation in Chapter 6 focusing on Hebb's Law as motivating the self-perpetuating nature of lexical distributions. Challenges to this approach could come from languages with phenomena that are more difficult to model under the current assumptions. Are there, typologically speaking, grammatical configurations with a facilitatory effect on the acquisition of productive selectional processes? Does greater variability of word order, or the interruptibility of constructions, change the relationship between input configuration and productivity in production? Some such flexibility in the morphological productivity paradigm has already been shown in studies of Semitic morphology (notably Bolozky 1999 on Israeli Hebrew word formation), and an extension of the current methodology to such difficult syntactic phenomena would be very welcome.

Finally, this work has only addressed productivity in first language use. It remains to be seen whether, and to what extent, similar productivity phenomena can be found in adult language learner data, and how these can best be explained and theoretically modeled. The study of productivity and distributional effects in second language acquisition is still at a more preliminary stage, but some recent work seems promising (e.g. Ellis and Ferreira-Junior 2009; Wulff et al. 2009; Madlener 2011) and shows clearly that second language learners are also sensitive to the extensibility and vocabulary effects found in input frequency distributions, though probably in somewhat different ways (to name one effect, disruptive effects on the acquisition of variable surface word order are less present in native speakers; cf. Zeldes, Lüdeling, and Hirschmann 2008). It is hoped that the present investigation will help open up the methodology of the morphological productivity paradigm for these and other areas of study engaged in explaining the usage of syntactic argument selection processes.

Appendices

A Queries

The queries below were processed with the Corpus Work Bench (Christ 1994), using the relevant corpora as described in each section.

A.1 Queries for German adjectives in -bar, -sam and -lich

This query allows for predicative/adverbial forms (pos="ADJD") and attributive forms (pos="ADJA") with case/number/gender inflection, as well as comparative and superlative formations. For -sam and -lich, replace -bar with the relevant suffix. Results for these queries were manually filtered as described in Chapter 3, Section 2.

[pos="ADJ." & word=".*bar(er|st)?(e[srnm]?)?"]

A.2 Queries for German pre- and postpositional wegen

Postpositional (a) and prepositional (b) wegen followed by punctuation (note that in all queries "." stands for the regular expression 'exactly one character', and therefore matches ",", "!" etc. as well):

(a) "(des|der|eines|einer)" [pos="ADJA"]* [pos="NN"] "wegen" "."

(b) "[Ww]egen" "(des|der|eines|einer|dem|einem|den)" [pos="ADJA"]* [pos="NN"] "."

Prepositional wegen with unambiguous masculine/neuter singular arguments in the genitive (c), the dative (d), and the genitive with postpositional wegen (e), all followed by punctuation:

(c) "[Ww]egen" "(des|eines)" [pos="ADJA"]* [pos="NN"] "."


(d) "[Ww]egen" "(dem|einem)" [pos="ADJA"]* [pos="NN"] "."

(e) "(des|eines)" [pos="ADJA"]* [pos="NN"] "wegen" "."

A.3 Queries for English transitive verbs and objects

These find a certain lexical verb (pos="VV.*"; replace VERB with the relevant lemma), optionally followed by an article and an arbitrary number of adjectives. Finally, the object noun must appear, followed by a token which is not a possessive marker ('s) or another noun, to prevent matching genitive attributes or compound modifiers of the actual head noun.

[pos="VV.*" & lemma="VERB"] [pos="(DT|PP\$)"]? [pos="JJ.*"]* [pos="NN."] [pos!="(POS|NN.)"]

A.4 Queries for English comparative correlatives and comparatives

(a) Bare CC (e.g. the faster the better):

"[Tt]he" ("(more|less)" [pos="JJ"]|[pos="JJR"]) ","? "the" ("(more|less)" [pos="JJ"]|[pos="JJR"]) [pos!="(DT|CD|PP\$|DT|JJ|NN.?)"]

(b) CC with subject NP only (the fresher the beans, the stronger the coffee):

"[Tt]he" ("(more|less)" [pos="JJ"]|[pos="JJR"]) ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* ","? "the" ("(more|less)" [pos="JJ"]|[pos="JJR"]) ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* [pos!="V.*"]* "[\.!\?]"


(c) CC with subject NP and predicate VP (the faster the driver goes, the sooner the bus gets there):

"[Tt]he" ("(more|less)" [pos="JJ"]|[pos="JJR"]) ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* [pos="(V.*|TO)"]+ ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])? (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* ","? "the" ("(more|less)" [pos="JJ"]|[pos="JJR"]) ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* [pos="V.*"]

(d) Comparatives in general (analytic like more/less probable or synthetic like faster). Cases of analytic comparatives followed by a noun were excluded to avoid scope errors, e.g. more heavy losses, where more modifies losses, not heavy:

("(more|less)" [pos="JJ"] [pos!="N.*"]|[pos="JJR"])

A.5 Queries for English adjective intensifiers

Replace INTENSIFIER with absolutely, completely, entirely, extremely, highly, totally, very in the following query:

[lemma="INTENSIFIER"] [pos="JJ" & word="[a-z\-]+"]

The additional condition that the following adjective consist only of lower case alphabetic characters and possible hyphens was motivated by high error rates due to typos and mistokenizations which contained nonalphabetic characters.


A.6 Queries for German synthetic compounds and verbal objects

(a) Query for synthetic compounds with a nominal head suffixed with -er. Compounds must be tagged as nouns, capitalized and contain only alphabetic German characters or a hyphen. The ending -er may be followed by the genitive singular or dative plural markers -s or -n.

[pos="NN" & word="[A-ZÜÖÄ][a-züöäß-]+er[sn]?"]

After retrieval, results were post-processed to exclude -er following i, unless this was part of the diphthong written ei. Items with fewer than three syllables were ruled out (one each for the modifier, the head and the suffix).

(b) High precision query for German verbal objects in subordinate clauses. Main clauses were avoided since a possible topicalized object makes subject and object difficult to distinguish. The expression looks for a conjunction followed by a possible reflexive and a subject phrase (pronominal or nominal, but not es 'it', which may be a scrambled object), then any non-verbal tags, followed by an NP which must be a possible direct object. This is done by ruling out a preposition preceding the phrase, and by allowing only determiner forms which can be accusative. The verb must follow this object NP, the nominal head of which is extracted automatically.

[pos="KOUS"] [pos="PRF"]? [pos="ART"]? [pos="ADJA"]* [pos="(PIS|PPER|N.)" & word!="es"] ([pos!="(V.*|\$.)"]* [pos!="(PDAT|KOKOM|APPR|APPRART|ART|ADJA|CARD|\$.)"])? ([pos="(ART|PIAT|PPOSAT|PDAT)" & word="(den|die|das|([mdDsk]?ein|[Ii]hr|[Ee]uer|unser|dies)e?n?)"]|[pos="CARD"])? [pos="ADJA"]* [pos="N."] [pos="VV.*"]

A.7 Queries for help with different kinds of infinitive complements

(a) help with a possible object NP followed by an infinitive with to:

[pos="VV.*" & lemma="help"] (([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])*)? "to" [pos="VV"]


(b) the same as (a), without to:

[pos="VV.*" & lemma="help"] (([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])*)? [pos="VV"]

(c) subset of (a), help to [VINF]:

[pos="VV.*" & lemma="help"] "to" [pos="VV"]

(d) subset of (a), help [NP] to [VINF]:

[pos="VV.*" & lemma="help"] ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* "to" [pos="VV"]

(e) subset of (b), help [VINF]:

[pos="VV.*" & lemma="help"] [pos="VV"]

(f) subset of (b), help [NP] [VINF]:

[pos="VV.*" & lemma="help"] ([pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"]) (([pos="(CC|POS|PP)"]|"of") [pos="(DT|CD|PP\$|DT)"]? [pos="RB"]? ([pos="(JJ.?)"] (","|[pos="CC"])?)* [pos="(NN.?|PP)"])* [pos="VV"]


B Linear regression model with quadratic term for -sam/-bar

A possible way of quantifying the difference between entire frequency distributions (and not just the proportion of HL) is to compare them to an ideal log-linear Zipfian distribution. We can attempt to fit a linear model with a quadratic term to each observed distribution in the double logarithmic plane: the more variance in the data can be explained by a linear correlation of rank and frequency, the less significant the contribution of the quadratic term to the model (I thank Felix Golcher for pointing this out). Fitting such a parabola to the data from -sam and -bar using ordinary least squares regression (see Baayen 2008: 169–195) reveals that the -bar curve is far more log-linear.

Table 31. Coefficients for a least squares regression model with a quadratic term fitted to the frequency distributions of -sam and -bar.

                      -sam                                -bar
           value      std.err.   p(>|t|)       value      std.err.   p(>|t|)
intercept   7.57221   0.2216     0              7.7510    0.097719   0
rank       -0.02686   0.1998     0.8937        -0.1933    0.041658   4.13e-06
rank^2     -0.53007   0.0432     3.775e-15     -0.1601    0.004375   0

As the table shows, the linear coefficient rank is not significantly correlated (p=0.8937) with frequency for -sam, though it is highly significant for -bar. The quadratic term rank^2 is very significantly correlated with frequency for both processes, but the absolute value of the quadratic term is over three times as high for -sam, signifying a substantially more curved parabola. It follows that -bar is much closer to the log-linear progression predicted by Zipf's Law than -sam.

The advantage of comparing the entire frequency distribution to an expected one is that all frequency bands are taken into consideration, and not just HL. However, the main disadvantage is an unclear linguistic interpretation. In this model all ranks are equally important, though HL, by virtue of their large numbers, play an important role. It is not clear that a relationship between, say, ranks 5 and 6 which follows Zipf's Law is as important a characteristic as the relationship between ranks 6 and 7, or even important at all. In the Morphological Race Model of Frauenfelder and Schreuder (1992), for example, only one distinction is made: either a word is parsed or it is retrieved from memory, and intermediate frequencies beyond the threshold are not important (this is modeled by Baayen's threshold θ). In other words, for this measure to have a linguistic interpretation, we would need to assume that having an ideal Zipf distribution is not just an empirical symptom of productivity, but a causally related factor. In practice, this measure seems to have no clear advantages in assessing potential productivity above and beyond P, at least insofar as the ordinal ranking of processes is concerned.

References

Abbot-Smith, Kirsten, and Michael Tomasello 2006 Exemplar-learning and schematization in a usage-based account of syntactic acquisition. The Linguistic Review 23(3): 275–290. Abney, Steven P. 1987 The English noun phrase in its sentential aspect. Ph. D. diss., Massachusetts Institute of Technology. Al, Bernard P. F., and Geert E. Booij 1981 De productiviteit van woordvormingsregels [The productivity of word formation rules]. Forum der Letteren 22: 26–38. Algeo, John 1988 British and American grammatical differences. International Journal of Lexicography 1(1): 1–31. Altenberg, Bengt, and Mats Eeg-Olofsson 1990 Phraseology in spoken English. In Theory and Practice in Corpus Linguistics, Jan Aarts, and Willem Meijs (eds.), 1–26. Amsterdam: Rodopi. Anderson, Stephen R. 1982 Where’s morphology? Linguistic Inquiry 13(4): 571–612. Anshen, Frank, and Mark Aronoff 1988 Producing morphologically complex words. Linguistics 26(4): 641– 655. Aronoff, Mark 1976 Word Formation in Generative Grammar. Cambridge, MA: MIT Press. Baayen, R. Harald 1989 A Corpus-Based Approach to Morphological Productivity. Statistical Analysis and Psycholinguistic Interpretation. Ph. D. diss., Vrije Universiteit, Amsterdam. 1992 Quantitative aspects of morphological productivity. In Yearbook of Morphology 1991, Geert E. Booij, and Jaap van Marle (eds.), 109– 149. Dordrecht: Kluwer. 1993 On frequency, transparency and productivity. In Yearbook of Morphology 1992, Geert E. Booij, and Jaap van Marle (eds.), 181– 208. Dordrecht: Kluwer. 1994 Productivity in language production. Language and Cognitive Processes 9(3): 447–469. 1996 The effects of lexical specialization on the growth curve of the vocabulary. Computational Linguistics 22(4): 455–480.


2001

Word Frequency Distributions. (Text, Speech and Language Technologies 18.) Dordrecht/Boston/London: Kluwer. 2008 Analyzing Linguistic Data. A Practical Introduction to Statistics using R. Cambridge: Cambridge University Press. 2009 Corpus linguistics in morphology: Morphological productivity. In Corpus Linguistics. An International Handbook. Vol. 2, Anke Lüdeling, and Merja Kytö (eds.), 899–919. Berlin: Mouton de Gruyter. Baayen, R. Harald, and Rochelle Lieber 1991 Productivity and English derivation: A corpus-based study. Linguistics 29(5): 801–843. Baayen, R. Harald, and Antoinette Renouf 1996 Chronicling the Times: Productive lexical innovations in an English newspaper. Language 72(1): 69–96. Bakken, Kristin 1998 Leksikalisering av Sammensetninger [Lexicalization of compounds]. (Acta Humaniora 38.) Oslo: Universitetsforlaget. Baroni, Marco, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta 2009 The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora. Language Resources and Evaluation 43(3): 209–226. Baroni, Marco, and Stefan Evert 2007 Words and echoes: Assessing and mitigating the non-randomness problem in word frequency distribution modeling. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 904–911. Prague. Barsalou, Lawrence W. 1983 Ad hoc categories. Memory and Cognition 11(3): 211–227. 1999 Perceptual symbol systems. Behavioral and Brain Sciences 22(4): 577–660. Barðdal, Jóhanna 2006 Predicting the productivity of argument structure constructions. Proceedings of the Berkeley Linguistics Society 32. 2008 Productivity: Evidence from Case and Argument Structure in Icelandic. (Constructional Approaches to Language 8.) Amsterdam/Philadelphia: John Benjamins. Bauer, Laurie 1983 English Word Formation. Cambridge: Cambridge University Press. 1992 Scalar productivity and -lily adjectives. In Yearbook of Morphology 1991, Geert E. Booij, and Jaap van Marle (eds.), 185–191. Dordrecht: Kluwer. 2001 Morphological Productivity. (Cambridge Studies in Linguistics 95.) Cambridge: Cambridge University Press.


Beck, Sigrid 1997 On the semantics of comparative conditionals. Linguistics and Philosophy 20: 229–271. Behrens, Heike 2009 Usage-based and emergentist approaches to language acquisition. Linguistics 47(2): 383–411. de Beule, Joachim, and Luc Steels 2005 Hierarchy in fluid construction grammar. In KI 2005: Advances in Artificial Intelligence. Proceedings of the 28th German Conference on AI, Ulrich Furbach (ed.), 1–15. Berlin: Springer. Biber, Douglas, and Susan Conrad 1999 Lexical bundles in conversation and academic prose. In Out of Corpora: Studies in Honour of Stig Johansson, Hilde Hasselgård, and Signe Oksefjell (eds.), 181–190. (Language and Computers 26.) Amsterdam: Rodopi. Biber, Douglas, Susan Conrad, and Viviana Cortes 2004 If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25(3): 371–405. Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, Edward Finegan, and Graeme Hirst 1999 The Longman Grammar of Spoken and Written English. London: Longman. Biber, Douglas, and James K. Jones 2009 Quantitative methods in corpus linguistics. In Corpus Linguistics. An International Handbook. Vol. 2., Anke Lüdeling, and Merja Kytö (eds.), 1286–1304. Berlin: Mouton de Gruyter. Bickerton, Derek 1992 Language & Species. Chicago: University of Chicago Press. Bishop, Christopher M. 1995 Neural Networks for Pattern Recognition. Oxford: Oxford University Press. Bloomfield, Leonard 1935 Language. London: George Allen & Unwin. Boas, Hans C. 2003 Resultative Constructions in English and German. Stanford: CSLI Publications. 2011 Zum Abstraktionsgrad von Resultativkonstruktionen. In Sprachliches Wissen zwischen Lexikon und Grammatik, Stefan Engelberg, Anke Holler, and Kristel Proost (eds.), 37–70. (Institut für Deutsche Sprache Jahrbuch 2010.) Berlin/New York: De Gruyter. Boas, Hans C., and Ivan A. Sag (eds.) 2011 Sign-Based Construction Grammar. Stanford: CSLI Publications. Bock, Kathryn, and Helga Loebell 1990 Framing sentences. Cognition 35(1): 1–39.


Bod, Rens 2006

Exemplar-based syntax: How to get productivity from examples. The Linguistic Review 23(3): 291–320. 2009 From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science 33(5): 752–793. Bolinger, Dwight 1948 On defining the morpheme. Word 4: 18–23. Bolozky, Shmuel 1999 Measuring Productivity in Word Formation: The Case of Israeli Hebrew. Leiden: Brill. Booij, Geert E. 1977 Dutch Morphology. A Study of Word Formation in Generative Grammar. Dordrecht: Foris. 2005 Compounding and derivation: Evidence for construction morphology. In Morphology and its Demarcations, Wolfgang U. Dressler, Dieter Kastovsky, Oskar E. Pfeiffer, and Franz Rainer (eds.), 109–132. Amsterdam/Philadelphia: John Benjamins. 2009 Compounding and construction morphology. In The Oxford Handbook of Compounding, Rochelle Lieber, and Pavol Štekauer (eds.), 201–216. (Oxford Handbooks in Linguistics.) Oxford: Oxford University Press. 2010 Construction Morphology. Oxford: Oxford University Press. Bornkessel, Ina, Matthias Schlesewsky, Bernard Comrie, and Angela D. Friederici (eds.) 2006 Semantic Role Universals and Argument Linking: Theoretical, Typological, and Psycholinguistic Perspectives. (Trends in Linguistics 165.) The Hague: Mouton de Gruyter. Bosch, Peter 1993 Lexical meaning and conceptual representation. In Discourse and Lexical Meaning. Proceedings of Workshop of the DFG Sonderforschungsbereich 340, November 30th - December 1st, 1992, Peter Bosch, and Peter Gerstl (eds.), 19–33. (Arbeitspapiere des Sonderforschungsbereichs 340, Band 30.) Heidelberg: IBM. Botha, Rudolf P. 1968 The Function of the Lexicon in Transformational Generative Grammar. The Hague/Paris: Mouton. Boyd, Jeremy K., and Adele E. Goldberg 2009 Input effects within a constructionist framework. The Modern Language Journal 93(3): 418–429. 2011 Learning what NOT to say: The role of statistical preemption and categorization in a-adjective production. Language 87(1): 55–83. Braunmüller, Kurt 1982 Syntaxtypologische Studien zum Germanischen. Tübingen: Gunter Narr.


Bresnan, Joan 2001 Lexical-Functional Syntax. (Blackwell Textbooks in Linguistics 16.) Oxford: Blackwell. Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen 2007 Predicting the dative alternation. In Cognitive Foundations of Interpretation, Gerlof Bouma, Irene Kraemer, and Joost Zwarts (eds.), 69–94. Amsterdam: Royal Netherlands Academy of Science. Brooks, Patricia J., Michael Tomasello, Kelly Dodson, and Lawrence B. Lewis 1999 Children’s overgeneralizations with fixed transitivity verbs. Child Development 70(6): 1325–1337. Bybee, Joan L. 1985 Morphology: A Study of the Relations between Meaning and Form. (Typological Studies in Language 9.) Amsterdam/Philadelphia: John Benjamins. 2006 From usage to grammar: The mind’s response to repetition. Language 82(4): 711–733. 2010 Language, Usage and Cognition. Cambridge: Cambridge University Press. Cappelle, Bert 2006 Particle placement and the case for ‘allostructions’. Constructions SV1(7). Carlson, Gregory 1977 Reference to kinds in English. Ph. D. diss., University of Massachusetts Amherst. Carroll, David W. 2008 Psychology of Language. 5th ed. Belmont, CA: Thomson Wadsworth. Carstairs-McCarthy, Andrew 1992 Current Morphology. London: Routledge. Casenhiser, Devin, and Adele Goldberg 2005 Fast mapping between a phrasal form and meaning. Developmental Science 8(6): 500–508. Chomsky, Noam 1957 Syntactic Structures. The Hague/Paris: Mouton. 1966 Topics in the Theory of Generative Grammar. The Hague: Mouton. 1981 Lectures on Government and Binding. Dordrecht: Foris. 1995 The Minimalist Program. Cambridge, MA: MIT Press. 2009 Reprint. Cartesian Linguistics: A Chapter in the History of Rationalist Thought. 3rd ed. Cambridge: Cambridge University Press. Original edition, New York: Harper & Row, 1966. Chomsky, Noam, and Morris Halle 1968 The Sound Pattern of English. New York: Harper & Row.


Christ, Oliver 1994 A modular and flexible architecture for an integrated corpus query system. In Proceedings of Complex 94. 3rd Conference on Computational Lexicography and Text Research, 23–32. Budapest: Hungarian Academy of Sciences, Linguistics Institute. Clausner, Timothy C., and William Croft 1997 Productivity and schematicity in metaphors. Cognitive Science 21(3): 247–282. Collins, Allan M., and Elizabeth F. Loftus 1975 A spreading-activation theory of semantic processing. Psychological Review 82(6): 407–428. Corbin, Danielle 1987 Morphologie dérivationelle et structuration de lexique. Tübingen: Niemeyer. Croft, William 2001 Radical Construction Grammar. Oxford: Oxford University Press. 2010 The origins of grammaticalization in the verbalization of experience. Linguistics 48(1): 1–48. Croft, William, and D. Alan Cruse 2004 Cognitive Linguistics. Cambridge: Cambridge University Press. Cruse, D. Alan 2002 Aspects of the micro-structure of word meanings. In Polysemy: Theoretical and Computational Approaches, Yael Ravin, and Claudia Leacock (eds.), 30–51. Oxford: Oxford University Press. Culicover, Peter W., and Ray Jackendoff 1999 The view from the periphery: The English comparative correlative. Linguistic Inquiry 30(4): 543–571. Dauben, Joseph W. 1990 Georg Cantor: His Mathematics and Philosophy of the Infinite. Princeton, NJ: Princeton University Press. Dik, Simon 1967 Some critical remarks on the treatment of morphological structure in transformational generative grammar. Lingua 18: 352–383. Dikken, Marcel Den 2005 Comparative correlatives comparatively. Linguistic Inquiry 36(4): 497–532. Dowty, David R. 1991 Thematic proto-roles and argument selection. Language 67(3): 547– 619. Duffley, Patrick J. 1999 The use of the infinitive and the -ing after verbs denoting the beginning, middle and end of an event. Folia Linguistica 33(3-4): 295–331.


Efron, Bradley, and Ronald Thisted 1976 Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63(3): 435–447. Ellis, Nick C., and Fernando Ferreira-Junior 2009 Construction learning as a function of frequency, frequency distribution, and function. The Modern Language Journal 93(3): 370–385. Elman, Jeffrey L. 1990 Finding structure in time. Cognitive Science 14(2): 179–211. 2009 On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science 33(4): 547–582. Erben, Johannes 1993 Einführung in die deutsche Wortbildungslehre. 3. Auflage. Berlin: Erich Schmidt. Erjavec, Irena Srdanović, Tomaž Erjavec, and Adam Kilgarriff 2008 A Web corpus and word sketches for Japanese. Information and Media Technologies 3(3): 529–551. Erman, Britt, and Beatrice Warren 2000 The idiom principle and the open choice principle. Text 20(1): 29–62. Evert, Stefan 2004 A simple LNRE model for random character sequences. In Proceedings of JADT 2004, 411–422. Louvain-la-Neuve, Belgium. 2005 The statistics of word cooccurrences: Word pairs and collocations. Ph. D. diss., University of Stuttgart. 2006 How random is a corpus? The library metaphor. Zeitschrift für Anglistik und Amerikanistik 54(2): 177–190. 2009 Corpora and collocations. In Corpus Linguistics. An International Handbook. Vol. 2, Anke Lüdeling, and Merja Kytö (eds.), 1212–1248. Berlin: Mouton de Gruyter. Evert, Stefan, and Marco Baroni 2007 zipfR: Word frequency distributions in R. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Posters and Demonstrations Session, 29–32. Prague. Evert, Stefan, and Anke Lüdeling 2001 Measuring morphological productivity: Is automatic preprocessing sufficient? In Proceedings of Corpus Linguistics 2001, Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie, and Shereen Khoja (eds.), 167–175. Lancaster. Fillmore, Charles J. 1979 Innocence: A second idealization for linguistics. Proceedings of the Berkeley Linguistics Society 5: 63–76. 1985 Syntactic intrusion and the notion of grammatical construction. Proceedings of the Berkeley Linguistics Society 11: 73–86.


1988

The mechanisms of ‘construction grammar’. Proceedings of the Berkeley Linguistics Society 14: 35–55. Fillmore, Charles J., and Hiroaki Sato 2002 Transparency and building lexical dependency graphs. Proceedings of the Berkeley Linguistics Society 28: 87–99. Firth, John R. 1957 Papers in Linguistics. London: Oxford University Press. Fischer, Kerstin, and Anatol Stefanowitsch 2006 Konstruktionsgrammatik: Ein überblick. In Konstruktionsgrammatik: Von der Anwendung zur Theorie, Kerstin Fischer, and Anatol Stefanowitsch (eds.), 3–17. Tübingen: Stauffenburg. Fleischer, Wolfgang 1975 Wortbildung der deutschen Gegenwartssprache. 4th ed. Tübingen: Niemeyer. Fleischer, Wolfgang, and Irmhild Barz 2007 Wortbildung der deutschen Gegenwartssprache. 3rd ed. Tübingen: Niemeyer. Fodor, Jerry A. 1970 Three reasons for not deriving “kill” from “cause to die”. Linguistic Inquiry 1(4): 429–438. 1998 Concepts: Where Cognitive Science Went Wrong. (Oxford Cognitive Science Series.) Oxford: Oxford University Press. Földiák, Peter 1990 Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics 64(2): 165–170. 1991 Learning invariance from transformation sequences. Neural Computation 3(2): 194–200. Fox, Anthony 1990 The Structure of German. Oxford: Oxford University Press. Frauenfelder, Ulli, and Robert Schreuder 1992 Constraining psycholinguistic models of morphological processing and representation: The role of productivity. In Yearbook of Morphology 1991, Geert Booij, and Jaap van Marle (eds.), 165–183. Dordrecht: Kluwer. Fuhrhop, Nanna 1996 Fugenelemente. In Deutsch – typologisch, Ewald Lang, and Gisela Zifonun (eds.), 525–550. Berlin/New York: Walter de Gruyter. Gaeta, Livio 2010 Synthetic compounds. With special reference to German. CrossDisciplinary Issues in Compounding. In Sergio Scalise, and Irene Vogel (eds.), 219–235. Amsterdam/Philadelphia: John Benjamins. Gaeta, Livio, and Davide Ricca 2006 Productivity in Italian word formation: A variable-corpus approach. Linguistics 44(1): 57–89.


Gaeta, Livio, and Amir Zeldes 2012 Deutsche Komposita zwischen Syntax und Morphologie: Ein korpusbasierter Ansatz. In Das Deutsche als kompositionsfreudige Sprache: Strukturelle Eigenschaften und systembezogene Aspekte, Livio Gaeta, and Barbara Schlücker (eds.), 197–217. (Linguistik – Impulse & Tendenzen 46.) Berlin: De Gruyter. in prep. Between VP and NN: On the constructional types of German -er compounds. Gahl, Susanne, and Susan M. Garnsey 2004 Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language 80(4): 748–775. Geeraerts, Dirk 1994a Lexical semantics. In The Encyclopedia of Language and Linguistics, Ronald E. Asher, and Joy M. Y. Simpson (eds.), 2160–2164. Oxford/New York/Seoul/Tokyo: Pergamon Press. 1994b Syntagmatic lexical relations. In The Encyclopedia of Language and Linguistics, Ronald E. Asher, and Joy M. Y. Simpson (eds.), 4475– 4476. Oxford/New York/Seoul/Tokyo: Pergamon Press. Gentner, Dedre, and José Medina 1998 Similarity and the development of rules. Cognition 65(2-3): 263–297. Glucksberg, Sam, and Boaz Keysar 1993 How metaphors work. In Metaphor and Thought. 2nd ed, Andrew Ortony (ed.), 401–424. Cambridge: Cambridge University Press. Goldberg, Adele E. 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago/London: University of Chicago Press. 2006a Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press. 2006b Learning linguistic patterns. Psychology of Learning and Motivation 47: 33–65. 2009 The nature of generalization in language. Cognitive Linguistics 20(1): 93–127. Goldberg, Adele E., Devin M. Casenhiser, and Nitya Sethuraman 2004 Learning argument structure generalizations. Cognitive Linguistics 15(3): 289–316. Goldberg, Adele E., Devin Casenhiser, and Tiffani R. White 2007 Constructions as categories of language. New Ideas in Psychology 25(2): 70–86. Goldberg, Adele E., and Ray Jackendoff 2004 The English resultative as a family of constructions. Language 80(3): 532–568. Good, Irving John, and George H. Toulmin 1956 The number of new species and the increase in population coverage, when a sample is increased. Biometrika 43(1/2): 45–63.


Grabar, Natalia, and Pierre Zweigenbaum 2003 Productivité à travers domaines et genres: Dérivés adjectivaux et langue médicale. Langue française 140: 102–125. Gries, Stefan Th. 2003 Multifactorial Analysis In Corpus Linguistics: A Study of Particle Placement. London/New York: Continuum. 2005 Syntactic priming: A corpus-based approach. Journal of Psycholinguistic Research 34(4): 365–399. 2008 Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics 13(4): 403–437. 2010 Dispersions and adjusted frequencies in corpora: Further explorations. In Corpus Linguistic Applications: Current Studies, New Directions, Stefan Th. Gries, Stefanie Wulff, and Mark Davies (eds.), 197–212. Amsterdam: Rodopi. Gries, Stefan Th., and Anatol Stefanowitsch 2004 Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9(1): 97– 129. Grimshaw, Jane 1990 Argument Structure. (Linguistic Inquiry Monographs 18.) Cambridge, MA/London, England: MIT Press. Gurevich, Olga, Matthew A. Johnson, and Adele E. Goldberg 2010 Incidental verbatim memory for language. Language and Cognition 2(1): 45–78. ten Hacken, Pius 2009 Early generative approaches. In The Oxford Handbook of Compounding, Rochelle Lieber, and Pavol Štekauer (eds.), 54–77. (Oxford Handbooks in Linguistics.) Oxford: Oxford University Press. Harris, Zellig S. 1954 Distributional structure. Word 10(2-3): 146–162. 1970 Papers in Structural and Transformational Linguistics. Dordrecht: D. Reidel. Hay, Jennifer 2001 Lexical frequency in morphology: Is everything relative? Linguistics 39(6): 1041–1070. Hay, Jennifer, and R. Harald Baayen 2002 Parsing and productivity. Yearbook of Morphology 11. In Geert E. Booij, and Jaap van Marle (eds.), 203–235. Dordrecht: Kluwer. 2003 Phonotactics, parsing and productivity. Italian Journal of Linguistics/Rivista di Linguistica 15(1): 99–130. Hebb, Donald O. 2002 Reprint. The Organization of Behavior. New York: Wiley & Sons. Original edition, New York: Wiley & Sons, 1949.


Helbig, Gerhard, and Joachim Buscha 2001 Deutsche Grammatik: Ein Handbuch für den Ausländerunterricht. Berlin/Munich: Langenscheidt. Himmelmann, Nikolaus P. 1998 Regularity in irregularity: Article use in adpositional phrases. Linguistic Typology 2(3): 315–354. Hockett, Charles F. 1954 Two models of grammatical description. Word 10: 210–231. 1958 A Course in Modern Linguistics. New York: Macmillan. Hopfield, John J. 1982 Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA 79(8): 2554–2558. von Humboldt, Wilhelm 1963 Reprint. Werke in fünf Bänden. Vol. 3, Andreas Flitner, and Klaus Giel (eds.), Schriften zur Sprachphilosophie. Stuttgart: Cotta’sche Buchhandlung. Original edition, Berlin: F. Dümmler, 1836. 1988 On Language: The Diversity of Human Language-Structure and its Influence on the Mental Development of Mankind. Translated by Peter Heath. (Texts in German Philosophy.) Cambridge: Cambridge University Press. Iwata, Seizi 2008 Locative Alternation: A Lexical-Constructional Approach. (Constructional Approaches to Language 6.) Amsterdam/Philadelphia: John Benjamins. Jackendoff, Ray 1972 Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press. 1987 The status of thematic relations in linguistic theory. Linguistic Inquiry 18(3): 369–411. 1990 Semantic Structures. (Current Studies in Linguistics 18.) Cambridge, MA: MIT Press. 1997 The Architecture of the Language Faculty. (Linguistic Inquiry Monographs 28.) Cambridge, MA/London: MIT Press. 2008 Construction after construction and its theoretical challenges. Language 84(1): 8–28. Jackendoff, Ray, and Peter W. Culicover 2003 The semantic basis of control in English. Language 79(3): 517–556. Jacobs, Joachim 1994 Kontra Valenz. (Linguistisch-Philologische Studien 12.) Trier: Wissenschaftlicher Verlag Trier.


Kaplan, Ronald, and Joan Bresnan 1982 Lexical-Functional Grammar: A formal system for grammatical representation. The Mental Representation of Grammatical Relations. In Joan Bresnan (ed.), 173–281. Cambridge, MA: MIT Press. Karcevski, Serge 1932 Autour d’un problème de morphologie. Annales Academiæ Scientiarum Fennicæ, Ser. B 27: 84–91. Katz, Jerrold. J., and Jerry A. Fodor 1963 The structure of a semantic theory. Language 39(2): 170–210. Kawahara, Daisuke, and Sadao Kurohashi 2005 PP-attachment disambiguation boosted by a gigantic volume of unambiguous examples. In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-05), Robert Dale, Kam-Fai Wong, Jian Su, and Oi Yee Kwong (eds.), 188–198. Berlin/Heidelberg: Springer. Kay, Paul, and Charles J. Fillmore 1999 Grammatical constructions and linguistic generalizations: The what’s X doing Y? Construction. Language 75(1): 1–33. Keller, Frank, and Mirella Lapata 2003 Using the web to obtain frequencies for unseen bigrams. Computational Linguistics 29(3): 459–484. Kiss, Tibor 2007 Produktivität und Idiomatizität von Präposition-SubstantivSequenzen. Zeitschrift für Sprachwissenschaft 26(2): 317–345. Kjellmer, Göran 1985 Help to/help Ø revisited. English Studies 66(2): 156–161. Kohonen, Teuvo 2001 Self-Organizing Maps. 3rd ed. Berlin/Heidelberg/New York: Springer. Korhonen, Anna 2002 Subcategorization Acquisition. Univeristy of Cambridge Computer Laboratory, Technical Report 530, Cambridge. Kürschner, Wilfried 1974 Zur syntaktischen Beschreibung deutscher Nominalkomposita. Auf der Grundlage generativer Transformationsgrammatiken. (Linguistische Arbeiten 18.) Tübingen: Niemeyer. Labov, William 2004 Quantitative analysis of linguistic variation. In Sociolinguistics. An International Handbook of the Science of Language and Society. Vol. 1, Ulrich Ammon, Norbert Dittmar, Klaus J. Mattheier, and Peter Trudgill (eds.), 6–21. Berlin: Walter de Gruyter.


Lakoff, George 1993 The contemporary theory of metaphor. In Metaphor and Thought. 2nd ed, Andrew Ortony (ed.), 202–251. Cambridge: Cambridge University Press. Langacker, Ronald W. 1987 Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites. Stanford: Stanford University Press. 1999 Grammar and Conceptualization. (Cognitive Linguistics Research 14.) Berlin: Mouton de Gruyter. 2000 A dynamic usage-based model. In Usage Based Models of Language. Michael Barlow, and Suzanne Kemmer (eds.), 1–63. Stanford: CSLI Publications. Lees, Robert B. 1960 The Grammar of English Nominalizations. The Hague: Mouton de Gruyter. Leser, Martin 1990 Das Problem der 'Zusammenbildungen'. Eine lexikalistische Studie. Trier: Wissenschaftlicher Verlag Trier. Levelt, Willem J. M. 1989 Speaking: From Intention to Articulation. (ACL-MIT Series in Natural Language Processing.) Cambridge, MA: MIT Press. Levin, Beth 1993 English Verb Classes and Alternations. Chicago: University of Chicago Press. Levin, Beth, and Malka Rappaport Hovav 2005 Argument Realization. (Research Surveys in Linguistics.) Cambridge: Cambridge University Press. Li, Wentian 1992 Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory 38(6): 1842–1845. Louw, Johannes P., and Eugene A. Nida 1988 Greek-English Lexicon of the New Testament Based on Semantic Domains. 2nd ed. Vol. 1: Introduction & Domains. New York: United Bible Societies. Lüdeling, Anke, and Stefan Evert 2005 The emergence of productive non-medical -itis. Corpus evidence and qualitative analysis. In Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives, Stephan Kepser, and Marga Reis (eds.), 351–370. Berlin: Mouton de Gruyter. Lüdeling, Anke, Stefan Evert, and Ulrich Heid 2000 On measuring morphological productivity. In KONVENS-2000 Sprachkommunikation, Werner Zühlke, and Ernst G. SchukatTalamazzini (eds.), 57–61. Berlin: VDE-Verlag.


Lulofs, Berthold H. 1842 Reprint. Gronden der Nederlandsche woordafleidkunde, voor zoo verre dezelve eenigzins zeker is, of woordontleedkundige beschouwing van de wyze, waerop in het Nederduitsch de woorden uit elkander voorspruiten, en met elkaer vermaegschapt of verbonden zyn [Foundations of Dutch morphology, for as far as this is somewhat certain, or the word-derivational investigation of the manner in which the words derive from each other and are wedded or connected with each other in Low German]. Gent: Hemelsoet. Original edition Groningen: Oomkens 1833. Madlener, Karin 2011 Developing productivity with a new construction: Are there frequency effects in instructed second language acquisition (SLA)? In Proceedings of Quantitative Investigations in Theoretical Linguistics 4 (QITL-4), Amir Zeldes, and Anke Lüdeling (eds.), 56–58. Berlin. Maguire, Mandy J., Kathy Hirsh-Pasek, Roberta Michnick-Golinkoff, and Amanda C. Brandone 2008 Focusing on the relation: Fewer exemplars facilitate children’s initial verb learning and extension. Developmental Science 11(4): 628–634. Mair, Christian 1995 Changing patterns of complementation, and concomitant grammaticalisation, of the verb help in present-day British English. In The Verb in Contemporary English. Theory and Description, Bas Aarts, and Charles F. Meyer (eds.), 258–272. Cambridge: Cambridge University Press. 2002 Three changing patterns of verb complementation in late modern English: A real-time study based on matching text corpora. English Language and Linguistics 6(1): 105–131. 2003 Gerundial complements after begin and start: Grammatical and sociolinguistic factors, and how they work against each other. Determinants of Grammatical Variation in English. In Günther Rohdenburg, and Britta Mondorf (eds.), 329–345. Berlin: Mouton de Gruyter. Mandelbrot, Benoît 1953 An information theory of the statistical structure of language. In Communication Theory, Willis E. Jackson (ed.), 503–512. New York: Academic Press. 1962 On the theory of word frequencies and on related Markovian models of discourse. In Structure of Language and its Mathematical Aspects, Roman Jakobson (ed.), 190–219. (Proceedings of Symposia in Applied Mathematics 12.) Providence, RI: American Mathematical Society.


Manning, Christopher D. 2003 Probabilistic syntax. In Probabilistic Linguistics, Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), 289–341. Cambridge, MA: MIT Press. Marenbach, Dieter, and Hans Gärtner 2010 Ich kann’s! 4. Klasse Deutsch: Grammatik. München: Mentor Verlag. van Marle, Jaap 1985 On the Paradigmatic Dimension of Morphological Creativity. Dordrecht: Foris. 1992 The relationship between morphological productivity and frequency: A comment on Baayen’s performance-oriented conception of morphological productivity. In Yearbook of Morphology 1991, Geert E. Booij, and Jaap van Marle (eds.), 151–163. Dordrecht: Kluwer. Marshall, Jonathan A. 1995 Adaptive perceptual pattern recognition by self-organizing neural networks: Context, uncertainty, multiplicity, and scale. Neural Networks 8(3): 335–362. Matsumoto, Yo 1993 Japanese numeral classifiers: A study of semantic categories and lexical organization. Linguistics 31(4): 667–713. Matthews, Peter H. 1974 Morphology. Cambridge: Cambridge University Press. Mayerthaler, Willi 1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion. McCawley, James D. 1968 The role of semantics in a grammar. In Universals in Linguistic Theory, Emmon Bach, and Robert T. Harms (eds.), 124–169. New York: Holt, Rinehart and Winston. McClelland, James L., and David E. Rumelhart (eds.) 1986 Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Vol. 2: Psychological and Biological Models. Cambridge, MA: MIT Press. McEnery, Anthony, and Zhonghua Xiao 2005 HELP or HELP to: What do corpora have to say? English Studies 86(2): 161–187. McGlone, Matthew S., Sam Glucksberg, and Cristina Cacciari 1994 Semantic productivity and idiom comprehension. Discourse Processes 17(2): 167–190. McRae, Ken, Michael J. Spivey-Knowlton, and Michael K. Tanenhaus 1998 Modeling the influence of thematic fit (and other constraints) in online sentence comprehension. Journal of Memory and Language 38(3): 283–312.


Meinunger, André 2011 Der Wortartenstatus des Elements je in der komparativen Korrelativkonstruktion. Zeitschrift für germanistische Linguistik 39(2): 217– 238. Michaelis, Laura A., and Knud Lambrecht 1996 Toward a construction-based theory of language function: The case of nominal extraposition. Language 72(2): 215–247. Miller, George A. 1957 Some effects of intermittent silence. The American Journal of Psychology 70(2): 311–314. Minsky, Marvin, and Seymour Papert 1969 Perceptrons. Cambridge, MA: MIT Press. Muller, Charles 1979 Du nouveau sur les distributions lexicales: La formule de WaringHerdan. In Langue française et Linguistique quantitative, Charles Muller (ed.), 177–195. Geneva: Slatkine. Müller, Stefan 2010 Grammatiktheorie. (Einführungen Bd. 20.) Tübingen: Stauffenburg. Müller, Wolfgang (ed.) 1985 Duden Bedeutungswörterbuch. 2nd ed. Vol. 10. Mannheim/Vienna/Zurich: Dudenverlag. Munday, Jeremy 2008 Introducing Translation Studies. Theories and Applications. 2nd ed. London/New York: Routledge. Newmeyer, Frederick J. 2003 Grammar is grammar and usage is usage. Language 79(4): 682–707. Nunberg, Geoffrey 1995 Transfers of meaning. Journal of Semantics 12(2): 109–132. Partington, Alan 1998 Patterns and Meanings: Using Corpora for English Language Research and Teaching. Vol. 2. (Studies in Corpus Linguistics.) Amsterdam/Philadelphia: John Benjamins. Paul, Hermann 1959 Deutsche Grammatik. Halle: Niemeyer. Petig, William E. 1997 Genitive prepositions used with the dative in spoken German. Unterrichtspraxis 30(1): 36–39. Philip, Gill 2008 Reassessing the canon. ‘Fixed’ phrases in general reference corpora. In Phraseology: An Interdisciplinary Perspective, Sylviane Granger, and Fanny Meunier (eds.), 95–108. Amsterdam: John Benjamins. Pike, Kenneth L. 1967 Language in Relation to a Unified Theory of the Structure of Human Behavior. The Hague/Paris: Mouton.


Pinker, Steven, and Alan Prince 1991 Regular and irregular morphology and the psychological status of rules of grammar. Proceedings of the Berkeley Linguistics Society 17: 230–251. Plag, Ingo 1999 Morphological Productivity. Structural Constraints in English Derivation. (Topics in English Linguistics 28.) Berlin/New York: Mouton de Gruyter. 2003 Word-formation in English. (Cambridge Textbooks in Linguistics.) Cambridge: Cambridge University Press. 2006 Productivity. In The Handbook of English Linguistics, Bas Aarts, and April M. S. McMahon (eds.), 537–556. (Blackwell Handbooks in Linguistics.) Malden, MA: Blackwell. 2010 Compound stress assignment by analogy: The constituent family bias. Zeitschrift für Sprachwissenschaft 29(2): 243–282. Plag, Ingo, Christiane Dalton-Puffer, and R. Harald Baayen 1999 Productivity and register. English Language and Linguistics 3(2): 209–228. Plank, Frans 1984 Verbs and objects in semantic agreement: Minor differences between English and German that might suggest a major one. Journal of Semantics 3(4): 305–360. Plunkett, Kim, Jon-Fan Hu, and Leslie B. Cohen 2008 Labels can override perceptual categories in early infancy. Cognition 106(2): 665–681. Pollard, Carl, and Ivan A. Sag 1994 Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press. Powers, David M. W. 1998 Applications and explanations of Zipf’s Law. In NeMLaP3/CoNLL98: New Methods in Language Processing and Computational Natural Language Learning, David M. W. Powers (ed.), 151–160. Stroudsburg, PA. Przepiórkowski, Adam 1999 Case assignment and the complement / adjunct dichotomy: A nonconfigurational constraint-based approach. Ph. D. diss., Universität Tübingen, Germany. Pulvermüller, Friedemann 1996 Hebb’s concept of cell assemblies and the psychophysiology of word processing. Psychophysiology 33(4): 317–333. 1999 Words in the brain’s language. Behavioral and Brain Sciences 22(2): 253–279.


Pustejovsky, James, and Elisabetta Jezek 2008 Semantic coercion in language: Beyond distributional analysis. Special Issue on "Distributional Models of the Lexicon in Linguistics and Cognitive Science", Italian Journal of Linguistics/Rivista di Linguistica 20(1): 181–214. Quirk, Randolph, Sydney Greenbaum, Geoffrey Leech, and Jan Svartvik 1972 A Contemporary English Grammar. London: Longman. 1985 A Comprehensive Grammar of the English Language. London: Longman. R Development Core Team 2003 R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Reis, Marga 2009 Zur Struktur von Je-desto-Gefügen und verwandtem im Deutschen. In Koordination und Subordination im Deutschen, Veronika Ehrich, Christian Fortmann, Ingo Reich, and Marga Reis (eds.), 223–244. (Linguistische Berichte Sonderheft 16.) Hamburg: Buske. Riehemann, Susanne Z. 1998 Type-based derivational morphology. Journal of Comparative Germanic Linguistics 2(1): 49–77. Roelofs, Ardi 1992 A spreading-activation theory of lemma retrieval in speaking. Cognition 42(1-3): 107–142. Roeper, Thomas, and Muffy E. A. Siegel 1978 A lexical transformation for verbal compounds. Linguistic Inquiry 9(2): 199–260. Rosenblatt, Frank 1958 The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6): 386–408. Rumelhart, David E., and James L. McClelland (eds.) 1986 Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press. Řeřicha, Václav 1987 Notes on infinitival and -ing complements of the verbs begin and start. Philologica Pragensia 30(3): 129–133. Rumelhart, David E., and David Zipser 1985 Feature discovery by competitive learning. Cognitive Science 9(1): 75–112. Sahlgren, Magnus 2008 The distributional hypothesis. Italian Journal of Linguistics/Rivista di Linguistica 20(1): 33–53.
Säily, Tanja
    2011    Variation in morphological productivity in the BNC: Sociolinguistic and methodological considerations. Corpus Linguistics and Linguistic Theory 7(1): 119–141.
Salem, André
    1987    Pratique des segments répétés. Paris: Institut National de la Langue Française.
Sampson, Geoffrey R.
    2007    Grammar without grammaticality. Corpus Linguistics and Linguistic Theory 3(1): 1–32.
de Saussure, Ferdinand
    1966    Course in General Linguistics. New York/Toronto/London: McGraw-Hill.
    1969    Reprint. Cours de linguistique générale. Paris: Payot. Original edition, Paris: Payot, 1915.
Scherer, Carmen
    2005    Wortbildungswandel und Produktivität. Eine empirische Studie zur nominalen -er-Derivation im Deutschen. Tübingen: Niemeyer.
Schmid, Hans-Jörg
    2007    Entrenchment, salience, and basic levels. In The Oxford Handbook of Cognitive Linguistics, Dirk Geeraerts, and Hubert Cuyckens (eds.), 117–138. Oxford: Oxford University Press.
Schmid, Helmut
    1994    Probabilistic part-of-speech tagging using decision trees. In Proceedings of the Conference on New Methods in Language Processing, 44–49. Manchester.
Schulte im Walde, Sabine
    2009    The induction of verb frames and verb classes from corpora. In Corpus Linguistics. An International Handbook. Vol. 2, Anke Lüdeling, and Merja Kytö (eds.), 952–972. Berlin: Mouton de Gruyter.
Schultink, Henk
    1961    Produktiviteit als morphologisch fenomeen [Productivity as a morphological phenomenon]. Forum der Letteren 2: 110–125.
    1992    Produktiviteit, competence en performance [Productivity, competence and performance]. Forum der Letteren 33(3): 206–213.
Shibatani, Masayoshi
    1990    The Languages of Japan. (Cambridge Language Surveys.) Cambridge: Cambridge University Press.
Siebert, Susann
    1999    Wortbildung und Grammatik. Syntaktische Restriktionen in der Struktur komplexer Wörter. (Linguistische Arbeiten 408.) Tübingen: Niemeyer.
Sinclair, John
    2004    Trust the Text. London/New York: Routledge.
Somers, Harold L.
    1984    On the validity of the complement-adjunct distinction in valency grammar. Linguistics 22(4): 507–530.
Spencer, Andrew
    1991    Morphological Theory. Oxford, UK/Cambridge, MA: Blackwell.
Spratling, M. W., and Mark H. Johnson
    2002    Preintegration lateral inhibition enhances unsupervised learning. Neural Computation 14(9): 2157–2179.
Steels, Luc, and Joachim de Beule
    2006    Unify and merge in fluid construction grammar. In Symbol Grounding and Beyond: Proceedings of the Third International Workshop on the Emergence and Evolution of Linguistic Communication, Paul Vogt, Yuuga Sugita, Elio Tuci, and Chrystopher L. Nehaniv (eds.), 197–223. (Lecture Notes in Artificial Intelligence 4211.) Berlin: Springer.
Stefanowitsch, Anatol
    2008    Negative entrenchment: A usage-based approach to negative evidence. Cognitive Linguistics 19(3): 513–531.
Stefanowitsch, Anatol, and Stefan Th. Gries
    2003    Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8(2): 209–243.
Štichauer, Pavel
    2009    Morphological productivity in diachrony: The case of the deverbal nouns in -mento, -zione and -gione in Old Italian from the 13th to the 16th century. In Selected Proceedings of the 6th Décembrettes, Fabio Montermini, Gilles Boyé, and Jesse Tseng (eds.), 138–147. Somerville, MA: Cascadilla Proceedings Project.
Summers, Della (ed.)
    2003    Longman Dictionary of Contemporary English. 4th ed. Harlow: Longman.
Szmrecsanyi, Benedikt
    2006    Morphosyntactic Persistence in Spoken English. A Corpus Study at the Intersection of Variationist Sociolinguistics, Psycholinguistics, and Discourse Analysis. (Trends in Linguistics. Studies and Monographs 177.) Berlin/New York: Mouton de Gruyter.
Taylor, John R.
    1996    On running and jogging. Cognitive Linguistics 7(1): 21–34.
Terrell, Peter, Veronika Calderwood-Schnorr, Wendy V. A. Morris, and Roland Breitsprecher (eds.)
    1980    Collins German-English English-German Dictionary. London/Glasgow: Collins.
Tomasello, Michael
    2003    Constructing a Language. A Usage-based Theory of Language Acquisition. Cambridge, MA/London: Harvard University Press.
Trawiński, Beata, Manfred Sailer, and Jan-Philipp Soehn
    2006    Combinatorial aspects of collocational prepositional phrases. In Syntax and Semantics of Prepositions, Patrick Saint-Dizier (ed.), 181–196. (Text, Speech and Language Technology 29.) Dordrecht: Springer.
Trubetzkoy, Nikolai S.
    1989    Reprint. Grundzüge der Phonologie. Göttingen: Vandenhoeck & Ruprecht. Original edition, Prague, 1939.
Uhlenbeck, Eugenius M.
    1978    Studies in Javanese Morphology. The Hague: Martinus Nijhoff.
Vater, Heinz
    1978    On the possibility of distinguishing between complements and adjuncts. In Valence, Semantic Case and Grammatical Relations, Werner Abraham (ed.), 21–45. (Studies in Language Companion Series 1.) Amsterdam: Benjamins.
Vegnaduzzo, Stefano
    2009    Morphological productivity rankings of complex adjectives. In Proceedings of the NAACL HLT Workshop on Computational Approaches to Linguistic Creativity, June 5, Boulder, CO, 79–86. Stroudsburg, PA: Association for Computational Linguistics.
Walsh, Michael, Bernd Möbius, Travis Wade, and Hinrich Schütze
    2010    Multilevel exemplar theory. Cognitive Science 34(4): 537–582.
Waugh, Linda R., and Barbara A. Lafford
    1994    Markedness. In The Encyclopedia of Language and Linguistics, Ronald E. Asher, and Joy M. Y. Simpson (eds.), 2378–2383. Oxford/New York/Seoul/Tokyo: Pergamon Press.
Weinreich, Uriel
    1966    Explorations in semantic theory. Current Trends in Linguistics 3: 395–477.
Wierzbicka, Anna
    1996    Semantics: Primes and Universals. Oxford: Oxford University Press.
Williams, Edwin
    1994    Remarks on lexical knowledge. Lingua 92: 7–34.
Wittgenstein, Ludwig
    2009    Philosophical Investigations. 4th ed., translated by Gertrude E. M. Anscombe. Peter M. S. Hacker, and Joachim Schulte (eds.). West Sussex, UK: Wiley-Blackwell.
Wood, Frederick T.
    1962    Current English Usage. London: Macmillan.
Wulff, Stefanie
    2006    Go-V vs. Go-and-V in English: A case of constructional synonymy? In Corpora in Cognitive Linguistics: Corpus-based Approaches to Syntax and Lexis, Stefan Th. Gries, and Anatol Stefanowitsch (eds.), 101–125. (Trends in Linguistics. Studies and Monographs 172.) Berlin/New York: Mouton de Gruyter.
    2008    Rethinking Idiomaticity: A Usage-based Approach. (Research in Corpus and Discourse.) London/New York: Continuum.
Wulff, Stefanie, Nick C. Ellis, Ute Römer, Kathleen Bardovi-Harlig, and Chelsea LeBlanc
    2009    The acquisition of tense-aspect: Converging evidence from corpora and telicity ratings. The Modern Language Journal 93(3): 354–369.
Zeldes, Amir
    2009    Quantifying constructional productivity with unseen slot members. In Proceedings of the NAACL HLT Workshop on Computational Approaches to Linguistic Creativity, June 5, Boulder, CO, 47–54. Stroudsburg, PA: Association for Computational Linguistics.
    2011    On the productivity and variability of the slots in German comparative correlative constructions. In Grammar & Corpora, Third International Conference, Mannheim, 22.–24.09.2009, Marek Konopka, Jacqueline Kubczak, Christian Mair, František Štícha, and Ulrich H. Waßner (eds.), 429–449. Tübingen: Narr.
Zeldes, Amir, Anke Lüdeling, and Hagen Hirschmann
    2008    What’s hard? Quantitative evidence for difficult constructions in German learner data. In Proceedings of Quantitative Investigations in Theoretical Linguistics 3 (QITL-3), Antti Arppe, Kaius Sinnemäki, and Urpo Nikanne (eds.), 74–77. Helsinki.
Ziegeler, Debra
    2007    Arguing the case against coercion. In Aspects of Meaning Construction, Günter Radden, Klaus-Michael Köpcke, Thomas Berg, and Peter Siemund (eds.), 99–123. Amsterdam/Philadelphia: John Benjamins.
Zifonun, Gisela, Ludger Hoffmann, and Bruno Strecker (eds.)
    1997    Grammatik der deutschen Sprache. Berlin/New York: De Gruyter.
Zipf, George K.
    1949    Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
Zwanenburg, Wiecher
    1980    Regards du dix-septième siècle français sur la productivité morphologique. In Linguistic Studies Offered to Berthe Siertsema, Dick J. van Alkemade, Anthonia Feitsma, Willem J. Meys, Pieter van Reenen, and Jacob J. Spa (eds.), 243–254. Amsterdam: Rodopi.
    1983    Productivité morphologique et emprunt. (Lingvisticae Investigationes Supplementa 10.) Amsterdam/Philadelphia: Benjamins.

Author index

Abbot-Smith, Kirsten, 218, 242
Abney, Steven P., 102
Al, Bernard P. F., 21
Algeo, John, 174
Altenberg, Bengt, 39
Anderson, Stephen R., 21
Anshen, Frank, 30
Aronoff, Mark, 21, 30, 34, 50, 76, 85
Baayen, R. Harald, 4, 19, 27, 30, 32, 34, 38, 40, 45, 50, 53, 60, 62–64, 66, 68, 70–74, 76, 78–81, 84–87, 89–91, 194–195, 217, 236, 239, 249
Bakken, Kristin, 21
Barðdal, Jóhanna, 4, 20, 24, 36–37, 50, 98, 124, 209
Bardovi-Harlig, Kathleen, 215, 243
Baroni, Marco, 28, 68, 72, 81, 97, 115, 195
Barsalou, Lawrence W., 201, 205, 223
Barz, Irmhild, 26
Bauer, Laurie, 4, 17–18, 20–23, 27, 29–30, 36, 39–40, 42–43, 50, 63, 76, 91, 101
Beck, Sigrid, 186
Behrens, Heike, 7, 28
Bernardini, Silvia, 28, 97, 115
de Beule, Joachim, 216, 237
Biber, Douglas, 39, 160, 174, 238
Bickerton, Derek, 1
Bishop, Christopher M., 201
Bloomfield, Leonard, 101, 151, 226
Boas, Hans C., 142, 221
Bock, Kathryn, 199
Bod, Rens, 196, 216, 237
Bolinger, Dwight, 19
Bolozky, Shmuel, 64, 242

Booij, Geert E., 4, 21, 25, 27, 167
Bosch, Peter, 152
Botha, Rudolf P., 27–28, 32
Boyd, Jeremy K., 193, 232
Brandone, Amanda C., 215
Braunmüller, Kurt, 107
Bresnan, Joan, 235–236
Brooks, Patricia J., 11
Buscha, Joachim, 24, 107
Bybee, Joan L., 7, 50, 196, 199, 208, 225
Cacciari, Cristina, 44, 154
Cappelle, Bert, 174
Carlson, Gregory, 167
Carroll, David W., 212
Carstairs-McCarthy, Andrew, 36
Casenhiser, Devin M., 215
Chomsky, Noam, 2, 6, 101, 139
Christ, Oliver, 51, 244
Clausner, Timothy C., 8, 36, 98, 209
Cohen, Leslie B., 205
Collins, Allan M., 212
Conrad, Susan, 39, 160, 174
Corbin, Danielle, 36, 50
Cortes, Viviana, 39
Croft, William, 5, 8, 36, 50, 98, 184, 196, 209, 211
Cruse, D. Alan, 44, 50, 147
Cueni, Anna, 236
Culicover, Peter W., 126, 146
Dalton-Puffer, Christiane, 32, 53, 64, 66
Dauben, Joseph W., 35
Dik, Simon, 27, 29–30
den Dikken, Marcel, 126
Dodson, Kelly, 11
Dowty, David R., 12, 31, 41, 141–142
Duffley, Patrick J., 162
Eeg-Olofsson, Mats, 39
Efron, Bradley, 76
Ellis, Nick C., 215, 243
Elman, Jeffrey L., 199, 202, 219, 221, 225
Erben, Johannes, 26
Erjavec, Irena Srdanović, 181
Erjavec, Tomaž, 181
Erman, Britt, 2
Evert, Stefan, 18, 30, 38, 50–51, 53, 64, 68–69, 72, 80–82, 84, 90, 93, 102, 111, 153, 195, 200
Ferraresi, Adriano, 28, 97, 115
Ferreira-Junior, Fernando, 215, 243
Fillmore, Charles J., 5, 8, 163, 182, 190–191
Finegan, Edward, 160, 174
Firth, John R., 9
Fischer, Kerstin, 5, 8, 235
Fleischer, Wolfgang, 21, 26–27
Fodor, Jerry A., 140–141, 143, 145
Földiák, Peter, 204
Fox, Anthony, 27
Frauenfelder, Ulli, 38, 73, 249
Fuhrhop, Nanna, 102
Gaeta, Livio, 64, 87, 109, 166, 168
Gahl, Susanne, 193
Garnsey, Susan M., 193
Gärtner, Hans, 51, 61
Geeraerts, Dirk, 139, 145
Gentner, Dedre, 215
Glucksberg, Sam, 41, 44, 154
Goldberg, Adele E., 5–6, 8, 10–12, 24, 50, 98, 140, 142, 193, 196, 201, 214–215, 220–222, 232
Good, Irving John, 76
Grabar, Natalia, 32

Greenbaum, Sidney, 174–175
Gries, Stefan Th., 10, 38, 149, 174, 195, 199, 210, 212
Grimshaw, Jane, 12, 166
Gurevich, Olga, 201
ten Hacken, Pius, 167
Halle, Morris, 101
Harris, Zellig S., 9, 160, 188
Hay, Jennifer, 30, 74, 217
Hebb, Donald O., 202
Heid, Ulrich, 50–51, 53, 64, 102
Helbig, Gerhard, 24, 107
Himmelmann, Nikolaus P., 24
Hirschmann, Hagen, 243
Hirsh-Pasek, Kathy, 215
Hirst, Graeme, 160, 174
Hockett, Charles F., 19, 21–22
Hopfield, John J., 202
Hovav, Malka Rappaport, 139, 143
Hu, Jon-Fan, 205
von Humboldt, Wilhelm, 2
Iwata, Seizi, 173
Jackendoff, Ray, 12, 27, 31, 101, 126, 142–144, 146, 151, 212–213, 221, 223, 226, 229
Jacobs, Joachim, 13, 142
Jezek, Elisabetta, 146
Johansson, Stig, 160, 174
Johnson, Mark H., 204
Johnson, Matthew A., 201
Jones, James K., 238
Kaplan, Ronald, 235
Karcevski, Serge, 29
Katz, Jerrold J., 140–141
Kawahara, Daisuke, 97
Kay, Paul, 5
Keller, Frank, 238
Keysar, Boaz, 41
Kilgarriff, Adam, 181
Kiss, Tibor, 4, 24, 64, 99

Kjellmer, Göran, 174
Kohonen, Teuvo, 202
Korhonen, Anna, 149
Kurohashi, Sadao, 97
Kürschner, Wilfried, 167
Labov, William, 213
Lafford, Barbara A., 78
Lakoff, George, 41
Lambrecht, Knud, 10
Langacker, Ronald W., 7–8, 11, 19–21, 125, 196, 207, 213, 215
Lapata, Mirella, 238
LeBlanc, Chelsea, 215, 243
Leech, Geoffrey, 160, 174–175
Lees, Robert B., 167
Leser, Martin, 26
Levelt, Willem J. M., 210
Levin, Beth, 139–140, 143, 173
Lewis, Lawrence B., 11
Li, Wentian, 78
Lieber, Rochelle, 40, 45, 50, 79, 86–87, 89–90
Loebell, Helga, 199
Loftus, Elizabeth F., 212
Louw, Johannes P., 153
Lüdeling, Anke, 18, 30, 50–51, 53, 64, 69, 102, 243
Lulofs, Berthold H., 21
Madlener, Karin, 215, 243
Maguire, Mandy J., 215
Mair, Christian, 174–179
Mandelbrot, Benoît, 78–80
Manning, Christopher D., 2, 6, 11, 148, 193, 228
Marenbach, Dieter, 51, 61
van Marle, Jaap, 4, 34, 43, 85
Marshall, Jonathan A., 202, 204
Matsumoto, Yo, 182
Matthews, Peter H., 27, 29
Mayerthaler, Willi, 17
McCawley, James D., 141
McEnery, Anthony, 174–176
McGlone, Matthew S., 44, 154
McRae, Ken, 207
Medina, José, 215
Meinunger, André, 185
Michaelis, Laura A., 10
Michnick-Golinkoff, Roberta, 215
Miller, George A., 78
Minsky, Marvin, 201
Möbius, Bernd, 217
Muller, Charles, 90
Müller, Stefan, 235
Munday, Jeremy, 182
Newmeyer, Frederick J., 6, 28
Nida, Eugene A., 153
Nikitina, Tatiana, 236
Nunberg, Geoffrey, 146
Papert, Seymour, 201
Partington, Alan, 43
Paul, Hermann, 107
Petig, William E., 107
Philip, Gill, 30, 43
Pike, Kenneth L., 27
Pinker, Steven, 27, 215
Plag, Ingo, 4, 18, 21, 27, 29–30, 32, 34, 36, 38, 45, 49, 53, 64, 66, 101, 217
Plank, Frans, 241
Plunkett, Kim, 205
Pollard, Carl, 235
Powers, David M. W., 78
Prince, Alan, 27, 215
Przepiórkowski, Adam, 13, 222
Pulvermüller, Friedemann, 202, 205–206
Pustejovsky, James, 146
Quirk, Randolph, 174–175
R Development Core Team, 82
Reis, Marga, 185
Renouf, Antoinette, 19
Řeřicha, Václav, 178
Ricca, Davide, 64, 87, 109
Riehemann, Susanne Z., 51
Roelofs, Ardi, 204
Roeper, Thomas, 166
Römer, Ute, 215, 243
Rosenblatt, Frank, 201
Rumelhart, David E., 204
Sag, Ivan A., 235
Sahlgren, Magnus, 9
Sailer, Manfred, 24
Säily, Tanja, 64
Salem, André, 38
Sampson, Geoffrey R., 6, 11, 229
Sato, Hiroaki, 182
de Saussure, Ferdinand, 21–22, 27, 151–152, 223
Scherer, Carmen, 30
Schmid, Hans-Jörg, 11
Schmid, Helmut, 51
Schreuder, Robert, 38, 73, 249
Schulte im Walde, Sabine, 139–140, 149
Schultink, Henk, 4, 18, 20–23
Schütze, Hinrich, 217
Sethuraman, Nitya, 215
Shibatani, Masayoshi, 43, 181
Siebert, Susann, 51
Siegel, Muffy E. A., 166
Sinclair, John, 239
Soehn, Jan-Philipp, 24
Somers, Harold L., 13, 222
Spencer, Andrew, 101
Spivey-Knowlton, Michael J., 207
Spratling, M. W., 204
Steels, Luc, 216, 237
Stefanowitsch, Anatol, 5, 8, 38, 149, 174, 193, 235
Štichauer, Pavel, 30, 64
Svartvik, Jan, 174–175
Szmrecsanyi, Benedikt, 199

Tanenhaus, Michael K., 207
Taylor, John R., 143
Thisted, Ronald, 76
Tomasello, Michael, 11, 215, 218, 242
Toulmin, George H., 76
Trawiński, Beata, 24
Trubetzkoy, Nikolai S., 78
Uhlenbeck, Eugenius M., 21
Vater, Heinz, 13, 222
Vegnaduzzo, Stefano, 64
Wade, Travis, 217
Walsh, Michael, 217
Warren, Beatrice, 2
Waugh, Linda R., 78
Weinreich, Uriel, 145
White, Tiffani R., 215
Wierzbicka, Anna, 143–145
Williams, Edwin, 213
Wittgenstein, Ludwig, 9
Wood, Frederick T., 175
Wulff, Stefanie, 6, 10, 30, 44, 98, 153, 174, 215, 229, 243
Xiao, Zhonghua, 174–176
Zanchetta, Eros, 28, 97, 115
Zeldes, Amir, 30, 46, 64, 134, 168, 238, 240, 243
Ziegeler, Debra, 145
Zipf, George K., 76
Zipser, David, 204
Zwanenburg, Wiecher, 21, 27–28
Zweigenbaum, Pierre, 32

Subject index

abhor. See verbs of hate
-able, 50
absolutely. See intensifiers
acceptability, 26–27, 29
achieve, 34, 115–124
affixation, 18–20, 22, 24, 34, 87, 125, 231
agent
   agent nouns, 90, 166–167, 170, 173
   in syntax, 12, 139, 142, 146, 175, 241
-al, 28
amplifiers. See intensifiers
an X statt (German), 114
analogy, 22, 61
anspitzen (German), 169
anstatt (German), 114
anstreben (German), 192
anstrengen (German), 103, 192, 193, 196, 240
argument-adjunct distinction, 5, 13, 228
-ate, 34
-ation, 91
aufschäumen (German), 169
auswerten (German), 169
availability, 35, 93–94
bake, 139, 140, 148
-bar (German), 50–75, 78, 81–83, 86–89, 94, 102, 249
bark, 241
begin. See verbs of beginning
beschreiben (German), 169
blocking, 29
boil, 151
build, 106, 141

causative, 19
coercion, 44, 145–147, 148, 154
cognitive grammar, 6–7, 125
collocation, 2, 38, 41, 76, 100, 104, 150, 153–155, 157, 162, 170, 191–193, 210, 218, 231
commence. See verbs of beginning
communicative needs, 1, 54–55, 57, 60, 66, 84, 134, 149, 210, 223
comparative adjective, 129–135, 198, 208, 219, 221, 231
comparative correlative, 4, 9, 13, 125–135, 136–137, 149, 185–187, 197, 199, 208–210, 212, 219–223, 232, 238, 240
competence, 1, 38, 235
completely. See intensifiers
compositionality, 8, 12, 73, 74, 93, 105, 140, 157, 191, 217–218, 234
compounds, 25, 52, 60, 101–105, 109, 115, 117, 125, 155, 166–173, 184, 241
   synthetic compounds, 13, 166–172, 235
comprehend. See verbs of understanding
connectionism, 190, 201–202, 215, 221, 239
construction grammar, 5–11, 14, 106, 125, 142, 196–197, 216, 220, 226, 230, 233, 234–235
construction morphology, 167
conversion, 22
copula, 19
creativity, 20, 39–45, 62
-cy, 30
Data-Oriented Parsing, 216–218, 223, 237
de-, 91
decomposition, 138, 142–145, 166–167, 172, 226, 234
defy, 115, 116–124
despise. See verbs of hate
detest. See verbs of hate
devein, 141
deverbal derivation. See nominalization
devour, 151
diachrony, 30, 46–47, 85
diagonalize, 141
dictionaries, 49, 60, 74, 144, 183
diminutive, 90
discontinuous morphemes. See transfixation
dislike. See verbs of hate
dispersion, 58, 84, 195, 231
Distributional Hypothesis, 9, 161, 186
ditransitive, 9, 10, 12, 20, 24, 25, 125–126, 215
drink, 32, 115–124, 136, 140, 144–145, 147–148, 151, 153, 180, 187–188, 209, 219, 225, 232, 236
drive, 166–167
Dutch, 30, 34, 74, 85, 90
eat, 3, 32, 34, 115–124, 136, 147–148, 151, 180, 209, 225
-ed, 36
-ee, 91
en- -en, 34
English, 20, 23, 29, 30, 33–34, 36, 40, 67, 76, 87, 97, 101, 114–115, 134, 140, 167, 173–174, 181, 183–185, 187, 193, 210, 228, 232, 238
entirely. See intensifiers
entrenchment, 11, 32, 36, 37, 52, 105, 163, 196–197, 199, 200, 206, 208, 214, 217, 220, 223, 226–227, 232–233, 236–237

enumerability, 29, 30, 31, 33, 35, 67, 72–73, 76, 79, 135
-er, 8, 91
-er (German), 167–168, 170
essen (German), 105
etymology, 29, 56
evade, 157
experimental approaches, 30, 239
extremely. See intensifiers
fathom. See verbs of understanding
first language acquisition, 11, 47, 190, 214–215, 218, 233, 239, 241–242
fluid construction grammar, 216, 237
French, 76
frequency, 6, 14, 74, 86, 88, 91–95, 105, 119, 120, 124, 132, 135, 153, 158, 160–161, 163, 169, 170–171, 175, 179, 190, 193–197, 199–200, 208–209, 214, 219, 227–228, 231
frequency spectrum, 68, 72–73, 75–76, 86, 200
generative grammar, 1, 2, 28, 30, 38, 105, 139, 167, 217, 235
genre. See register
German, 4, 20, 24, 29, 50–51, 56, 61, 67, 73, 85, 97, 99, 102–104, 106–114, 134, 149, 155, 167, 168, 170, 183–186, 191, 196, 228, 231, 234, 238, 240
grammaticality, 6, 11, 61, 140, 160, 175, 179, 193, 210–211, 214, 229
hapax legomena, 60–68, 71, 73–74, 78–79, 81, 86, 92, 94, 97, 109, 111, 113, 116, 124, 127–128, 130, 134–135, 149, 161, 171, 181, 184–186, 191, 200–201, 207–209, 231, 233, 236, 249
harbor, 4, 34, 115, 116–124, 183–185

Subject index hate. See verbs of hate Hebb’s Law, 190, 202–210, 218, 223, 225, 227, 233–234, 242 Hebrew, 20, 185, 242 hegen (German), 4, 183–184 -heid (Dutch), 34 help (to) V, 4, 174–177, 187 herstellen (German), 169–170, 187 hierarchical argument selection, 13, 26, 100–101, 105, 125, 240 highly. See intensifiers hinterziehen (German), 155–157 homonymy, 44, 108 horror aequi, 175, 178 HPSG, 5, 235–237 Icelandic, 98, 106 idiom, 9, 10, 19, 30, 46, 100, 153– 155, 186, 191, 197, 208, 226, 229 in-, 92 in X’s stead, 114 incur, 3, 115–124 infinitive complements, 174–180 inflection, 32, 49, 67 information theory, 78 -ing (Dutch), 34 innocent speaker, 8, 105, 163, 190, 191–193 instead of X, 114 intensifiers, 13, 157–158, 241 -ish, 34 item-and-arrangement morphology, 22, 23 item-and-process morphology, 22, 23 -ity, 30 -ize, 34 Japanese, 19, 181–182 jog, 2, 3, 9, 154–155 Katzian semantics, 140–141, 145 kick the bucket, 100 kill, 31, 141–144
language change. See diachrony
Large Number of Rare Events. See LNRE distributions
Latinate affixes, 29, 76
leiten (German), 169
Lexical Conceptual Structure, 143–144
lexical semantics, 4, 47, 136–150, 152, 157, 165–166, 187, 190, 234
Lexicalist Hypothesis, 101–104
lexicalization, 2–3, 10, 14, 30, 40, 46, 61, 78, 100, 105, 125, 134, 150, 155, 157, 163–164, 170, 191, 214, 236–237
LFG, 235, 237
-lich (German), 66–67, 86–89, 94
LNRE distributions, 80–85, 86, 134, 217, 231–232, 242
loathe. See verbs of hate
locative alternation, 173
long live X construction, 106
long time no see construction, 40, 226–227
-ly, 19
machen (German), 170
markedness, 78, 90
-ment, 92
mental lexicon, 2, 8–11, 23, 28, 29, 98, 106, 125, 139, 142, 154, 187–190, 193, 196–201, 209–210, 217–219, 225–226, 228, 231, 233
metaphor, 36, 41–42, 155
minimieren (German), 169
Morphological Race Model, 38, 73, 93, 249
MSN, 100, 106
msna (Icelandic), 98
multiple slots, 13, 25, 125–135, 136, 196, 232, 240
Natural Semantic Metalanguage, 143, 144
nehmen (German), 171
neologism. See novel, words
-ness, 22, 30, 34, 60, 136
neural networks, 201–202, 221, 227. See also connectionism
nominalization, 4, 13, 30, 117, 136, 166–167, 170, 173
nomu (Japanese), 181–182
non-concatenative morphology. See transfixation
noun-modified nouns. See compounds
novel
   arguments, 20, 25, 41, 46, 96, 104, 113, 117, 124, 127–128, 138, 147, 149–150, 157, 163, 166, 171, 174, 191–193, 210, 231, 239
   combinations of arguments, 128
   construction heads, 20, 24, 98, 100
   constructions, 214–215
   environments, 9
   utterances, 1–2, 19, 20
   words, 19, 29, 40, 51, 57, 59–61, 67, 69, 73, 78, 88, 90, 98
N-P-N construction, 213–214, 228, 234
numeral classifier, 182
opacity, 105
-ous, 92
parsing
   computational, 46, 140, 216, 223, 240
   in humans, 38, 46, 73–74, 93, 236, 249
passive, 9, 126, 144, 234
performance, 1, 38, 47, 50
personal pronouns, 67, 72–73
phonology, 56, 78, 143, 168, 217
polieren (German), 102, 104
polish, 101–102, 105
polysemy, 101, 146
pose, 192–193
PP. See prepositional phrase
pre-emption, 11
prefab, 2
prefixation, 52, 102
prepositional phrase, 23–25, 99, 106–114, 137, 167, 212, 232, 240
priming, 10, 74, 200, 212, 218–219, 225, 227, 235, 242
Productivity Complex, 21, 45–46, 135, 154, 164–172, 187, 189, 190, 193–196, 197, 209, 214, 223, 227, 231–232
productivity measures
   activation level (A), 73–74, 92–94, 217, 250
   expanding productivity (P*), 38, 62–68, 91–94
   global productivity (I, I*, P*), 45, 50, 63, 76, 85–92, 93–94, 117, 121, 124
   potential productivity (P), 38, 45, 63–68, 70, 86, 89, 92–94, 104, 109, 110, 117, 120–124, 128, 132, 135–136, 165, 171, 175–176, 184, 193–196, 198, 209, 223, 225, 227, 239, 250
   total vocabulary (S), 64, 76–85, 88–89, 92–94, 109, 112, 120–121, 124, 135–136, 164–165, 175, 177, 193–196, 227, 231
   vocabulary (V), 38, 45, 67, 69, 71, 86, 90, 92–94, 104, 106, 110–112, 117, 120–124, 127–128, 135–136, 158, 162, 165, 168, 175–176, 179, 184, 193–196, 214, 223, 227, 231
profitability, 35, 50, 92–94
push, 115, 116–124
recency, 199–201, 225

polysemy, 101, 146 pose, 192–193 PP. See prepositional phrase pre-emption, 11 prefab, 2 prefixation, 52, 102 prepositional phrase, 23–25, 99, 106–114, 137, 167, 212, 232, 240 priming, 10, 74, 200, 212, 218–219, 225, 227, 235, 242 Productivity Complex, 21, 45–46, 135, 154, 164–172, 187, 189, 190, 193–196, 197, 209, 214, 223, 227, 231–232 productivity measures activation level ($ ), 73–74, 92– 94, 217, 250 expanding productivity (3 *), 38, 62–68, 91–94 global productivity (I, , , P*), 45, 50, 63, 76, 85–92, 93–94, 117, 121, 124 potential productivity (3 ), 38, 45, 63–68, 70, 86, 89, 92–94, 104, 109, 110, 117, 120–124, 128, 132, 135–136, 165, 171, 175– 176, 184, 193–196, 198, 209, 223, 225, 227, 239, 250 total vocabulary (S), 64, 76–85, 88–89, 92–94, 109, 112, 120– 121, 124, 135–136, 164–165, 175, 177, 193–196, 227, 231 vocabulary (V), 38, 45, 67, 69, 71, 86, 90, 92–94, 104, 106, 110– 112, 117, 120–124, 127–128, 135–136, 158, 162, 165, 168, 175–176, 179, 184, 193–196, 214, 223, 227, 231 profitability, 35, 50, 92–94 push, 115, 116–124 recency, 199–201, 225
281
Subject index

vocabulary growth curve, 68–72, 75–76, 92–93, 109, 112, 120, 127–128, 131, 153, 158–159, 176, 194–195, 197, 199–201 wegen (German), 4, 106–114, 135, 149, 162, 191–192, 231

world knowledge, 4, 47, 114, 134, 136–138, 147–150, 157, 173, 182, 180–189, 219, 225, 231 zero article, 24 Zipf’s Law, 72, 76–81, 249, 250