Syntax − Theory and Analysis HSK 42.2
Handbücher zur Sprach- und Kommunikationswissenschaft
Handbooks of Linguistics and Communication Science
Manuels de linguistique et des sciences de communication

Co-founded by Gerold Ungeheuer
Co-edited (1985−2001) by Hugo Steger
Herausgegeben von / Edited by / Edités par Herbert Ernst Wiegand
Band 42.2
De Gruyter Mouton
Syntax − Theory and Analysis
An International Handbook
Volume 2

Edited by Tibor Kiss and Artemis Alexiadou
ISBN 978-3-11-035866-7
e-ISBN (PDF) 978-3-11-036370-8
e-ISBN (EPUB) 978-3-11-039316-3
ISSN 1861-5090

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2015 Walter de Gruyter GmbH, Berlin/Munich/Boston
Typesetting: Meta Systems Publishing & Printservices GmbH, Wustermark
Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen
Cover design: Martin Zech, Bremen
Printed on acid-free paper
Printed in Germany
www.degruyter.com
This handbook is dedicated to the memory of our dear friend Ursula Kleinhenz (1965−2010). The light that burns twice as bright burns half as long.
Contents

Preface

Volume 1

I. Introduction: Syntax in Linguistics
1. Syntax − The State of a Controversial Art · Tibor Kiss and Artemis Alexiadou
2. Syntactic Constructions · Peter Svenonius
3. Syntax and its Interfaces: An Overview · Louise Mycock

II. The Syntactic Tradition
4. The Indian Grammatical Tradition · Peter Raster
5. Arabic Syntactic Research · Jonathan Owens
6. Prestructuralist and Structuralist Approaches to Syntax · Pieter A. M. Seuren

III. Syntactic Phenomena
7. Syntactic Categories and Subcategories · Hans-Jürgen Sasse
8. Grammatical Relations · Beatrice Primus
9. Arguments and Adjuncts · Peter Ackema
10. The Morpho-Syntactic Realisation of Negation · Hedde Zeijlstra
11. The Syntactic Role of Agreement · Stephen Wechsler
12. Verb Second · Anders Holmberg
13. Discourse Configurationality · Katalin É. Kiss
14. Control · Barbara Stiebels
15. Pronominal Anaphora · Silke Fischer
16. Coordination · Katharina Hartmann
17. Word Order · Werner Frey
18. Ellipsis · Lobke Aelbrecht
19. Syntactic Effects of Cliticization
20. Ergativity · Amy Rose Deal
21. Relative Clauses and Correlatives · Rajesh Bhatt
22. Voice and Valence Change · Edit Doron
23. Syntax and Grammar of Idioms and Collocations · Christiane Fellbaum

Volume 2

IV. Syntactic Models
24. Minimalism · Marc Richards
25. Lexical-Functional Grammar · Miriam Butt and Tracy Holloway King
26. Optimality-Theoretic Syntax · Gereon Müller
27. HPSG − A Synopsis · Stefan Müller
28. Construction Grammar · Mirjam Fried
29. Foundations of Dependency and Valency Theory · Michael Klotz
30. Dependency Grammar · Timothy Osborne
31. Categorial Grammar · Jason Baldridge and Frederick Hoyt

V. Interfaces
32. Syntax and the Lexicon · Artemis Alexiadou
33. The Syntax-Morphology Interface · Heidi Harley
34. Phonological Evidence in Syntax · Michael Wagner
35. The Syntax-Semantics Interface · Winfried Lechner
36. The Syntax-Pragmatics Interface · George Tsoulas

VI. Theoretical Approaches to Selected Syntactic Phenomena
37. Arguments and Adjuncts · Daniel Hole
38. Models of Control · Tibor Kiss
39. Theories of Binding · Silke Fischer
40. Word Order · Klaus Abels

Volume 3

VII. Syntactic Sketches
41. German: A Grammatical Sketch · Stefan Müller
42. Hindi-Urdu: Central Issues in Syntax · Alice Davison
43. Mandarin · Lisa L.-S. Cheng and Rint Sybesma
44. Japanese · Takao Gunji
45. Georgian · Alice C. Harris and Nino Amiridze
46. The Bantu Languages · Leston Chandler Buell
47. Tagalog · Paul Schachter
48. Warlpiri · Kenneth L. Hale, Mary Laughren and Jane Simpson
49. Creole Languages · Derek Bickerton
50. Northern Straits Salish · Ewa Czaykowska-Higgins and Janet Leonard
51. Syntactic Sketch: Bora · Frank Seifart

VIII. The Cognitive Perspective
52. Syntax and Language Acquisition · Sonja Eisenbeiss
53. Syntax and Language Disorders · Martina Penke
54. Syntax and Language Processing · Claudia Felser

IX. Beyond Syntax
55. Syntax and Corpora · Heike Zinsmeister
56. Syntax and Stylistics · Monika Doherty
57. Syntax and Lexicography · Rainer Osswald
58. Computational Syntax · Emily M. Bender, Stephen Clark and Tracy Holloway King
59. Reference Grammars · Irina Nikolaeva
60. Language Documentation · Eva Schultze-Berndt
61. Grammar in the Classroom · Anke Holler and Markus Steinbach

Indexes
Language index
Subject index
IV. Syntactic Models

24. Minimalism

1. The nature of the program
2. Merge and the Edge Feature
3. Agree and uninterpretable features
4. Transfer and phases
5. Outlook
6. References (selected)
Abstract

This article offers an overview of some of the key concepts and developments of Chomsky’s Minimalist Program (Chomsky 1995), up to and including Chomsky (2008). Focussing on the conceptual underpinnings of the program, I examine in turn the three fundamental operations of Chomsky’s system: Merge, Agree (probe-goal), and Transfer (phases). For each of these, I highlight some of the ways in which minimalist principles, operations and interface conditions emerge from, replace, and above all simplify their GB antecedents, achieving greater explanatory depth, before also considering the empirical gains that they can afford. Some ongoing and unresolved issues and debates are also flagged up, illustrating the kinds of questions and concerns that characterize a minimalist approach to the study of language.
1. The nature of the program

Minimalism is a program for extending and developing linguistic theories beyond data coverage, holding them to a higher level of scrutiny and raising the bar for what counts as a genuine explanation of a linguistic phenomenon. Minimalist questions of the kind to be outlined below can be asked of any theoretical framework; the Minimalist Program (MP) itself, however, grew naturally out of the Principles and Parameters approach to transformational generative grammar (Chomsky 1981), and it is here that its greatest advances have been made and its most promising future prospects arguably lie.

As the most recent incarnation of Principles and Parameters Theory (PPT), it shares with its predecessor, Government and Binding Theory (GB), the realist-mentalist conception of language fundamental to Chomsky’s work. That is, the object of study is I-language, a state of mind and a property of individual brains (‘I’ denoting internal, intensional, individual), and not language in its wider, social sense of a set of utterances or corpus of external tokens to which any individual has only partial access. The goal of rational linguistic inquiry is then to characterize the nature of that internal knowledge.

Traditionally, this characterization has sought to answer two questions: (i) What constitutes knowledge of language in the mind of a speaker-hearer (such that we have a
correct description of the possible grammatical structures of a language)? And (ii) how does that knowledge arise in the mind of the speaker-hearer given the environmental stimulus (linguistic input) − how is it acquired, how much of it is acquired, and what is a possible (acquirable) human language? Insofar as a theory can provide satisfactory answers to these questions, it meets the respective benchmarks of descriptive and explanatory adequacy. Doing so is no mean feat, since there is a clear tension between the two desiderata: reconciling a sufficiently restrictive and universal theory of the initial state of Universal Grammar (UG) − the genetic endowment which defines the range of possible, attainable human languages − with the diversity of variation attested across the world’s languages.

PPT offered the first real breakthrough in reconciling these antagonistic demands. By factoring out the universal (principles) from the variable (parameters), PPT simplified the acquisition procedure considerably − instead of selecting among entire, fully specified grammars (as on earlier generative approaches), acquisition now proceeds by the fixing of values of a limited set of open parameters on the basis of salient input data. In this way, PPT solves Plato’s problem at least in principle (if not yet in practice).

The state of the field at the end of the eighties, then, could be roughly described as follows. A decade of research in GB theorizing had led to a richly specified view of UG as comprising a highly articulated modular architecture (Case Filter, Theta Criterion, X-bar Theory, Subjacency and the Empty Category Principle, Binding Theory, Control Theory, etc.), with each module composed of its own vocabulary, rules, conditions, constraints and principles. All of this technology was specific to UG, that is, to the faculty of language (FL), and it allowed for vast empirical coverage, thus attaining a high level of descriptive adequacy.
Through parametrization of these principles under PPT, descriptive adequacy was attained for all possible human languages, as well as an account of their acquisition, thus reaching the level of explanatory adequacy. With GB thus yielding descriptive adequacy, and its umbrella framework PPT enabling explanatory adequacy, the stage was set for a deeper level of explanatory adequacy to be sought. That is, PPT was now in a unique position to seriously address minimalist concerns of the kind that had already been raised in the earliest days of generative grammar but which had been beyond the scope of feasible inquiry at that stage.

Essentially, minimalist concerns are those which relate to a third research question regarding the nature of linguistic knowledge (beyond questions [i] and [ii] above). If we are taking the biological foundations of FL seriously, then to the ontogenetic question in (ii), that of explanatory adequacy and the logical problem of language acquisition, must be added the phylogenetic question of how UG arose in the species. That is, we have to meet the desideratum of what we might call “evolutionary adequacy” (a term used independently by Longobardi 2004 and Fujita 2007; cf. the “natural adequacy” of Boeckx and Uriagereka 2007; and the coinage “beyond explanatory adequacy” of Chomsky 2004a) or Darwin’s problem (Boeckx 2009; Hornstein 2009) − the logical problem of language evolution.

Just as knowledge exceeds experience, demanding innate, language-specific mental structures in order to fill in the gap between linguistic input and the acquired final state of linguistic competence, so it seems that that knowledge (the richly specified UG of GB) exceeds what we can reasonably or plausibly attribute to natural selection, given the tiny evolutionary window within which human language is assumed to have appeared (suddenly, around 60,000 years ago; Chomsky 2008).
The time frame is simply too short for a highly articulated, richly specified UG replete with myriad FL-specific rules, constraints and principles to have emerged by selective pressure, with each piece of FL-specific technology (each rule, principle, module) separately selected for. Such a view is in any case implausible given that an FL containing, e.g., Principle A of the Binding Theory or a null-subject parameter would hardly be competitively fitter than one without. To the extent that FL accords a survival advantage, as it surely does, it does so as a whole, i.e. as a system for generating hierarchically structured sound-meaning pairs allowing thought, planning, evaluation and, ultimately perhaps, communication through externalization.

Every claim about the content of UG, then, is a claim about the brain; therefore, the more we attribute to UG, the greater becomes the problem of accounting for how it all got there, evolutionarily speaking. This problem simply cannot be ignored if we are taking seriously the claim that language (syntax) is a biological entity, a natural object to be investigated as part of the natural sciences. Unfettered proliferation of FL-specific technology ushers us ever further from a naturalistic approach to language and a genuine understanding of the properties of FL. We thus have another tension between levels of adequacy to grapple with (descriptive/explanatory versus evolutionary).

In light of such considerations, the minimalist hypothesis is that UG must be maximally empty and maximally simple, perhaps containing just a single innovation that takes us from the pre-linguistic state to the fully functioning FL. A leading hypothesis, put forward by Hauser, Chomsky and Fitch (2002), is that this single innovation was (context-free) recursion, yielding phrase structure and providing a link between the two extralinguistic performance systems of sound (the articulatory-perceptual system, AP) and meaning (the conceptual-intentional system, CI).
FL is thus reduced to its bare, conceptual minimum as a system for generating sound-meaning pairs.

Clearly, a minimal UG leaves a lot of gaps to fill if we are to retain anything like the empirical coverage that the maximalist GB view of UG afforded. Minimalism attempts to fill those gaps by looking outside of FL itself to find general computational and cognitive rationalizations of the kind of FL-specific technology uncovered in GB studies, thus reducing that technology to the level of principled explanation. (On MP as a rationalization of GB, see also Hornstein 2001, 2009; Hornstein, Nunes and Grohmann 2005; Lasnik and Lohndal 2010.) The object of study thus shifts from data coverage (explaining judgements and intuitions) to the properties and principles of the language system itself, those identified under earlier PPT/GB models. The question to be answered is why the computational system of human language has these properties and not other ones, by finding principled explanations for these properties in terms of general factors nonspecific to the language faculty. Pursuing this line of enquiry, Chomsky (2005) identifies three factors that determine the growth and form of FL as a mental organ:

(1) Three factors in language design
    I. Genetic endowment
    II. Experience
    III. Principles not specific to FL, the human faculty of language
Factor I is the domain of Universal Grammar (UG); Factor II is the external data that constitutes the linguistic environment in which language acquisition takes place; and Factor III comprises “general properties of organic systems” (Chomsky 2004a: 1), the
result of physical constraints on the form and development of living organisms which define, limit and channel the range of evolutionary options. Minimalism is then an attempt to remove as much technology as possible from the first factor, either by showing it to be redundant and thus eliminable altogether, or by moving it to the third factor by finding cognitive correlates in other domains. Perhaps surprisingly, then, and somewhat controversially, the Minimalist Program is interested in what the study of language has traditionally abstracted away from, namely those properties of language which are not unique to this faculty.

In the case of FL, the computational system of human language, such third-factor constraints might plausibly include principles of efficient computation. Thus notions of economy (of derivations and representations) − least effort, shortest movement steps, and so on − have played a large role in the development of the MP, as has the notion of optimality. If every property of FL contributes to the efficiency of the mapping to the interfaces with the external systems of AP and CI, then FL is in that sense an optimal solution to the conditions imposed on it by these external systems.

The strong minimalist thesis (SMT) then holds that language is indeed perfect in this way − UG is maximally empty; the only conditions on FL are interface conditions; and FL satisfies these conditions optimally. By emphasizing the role of third-factor (principled) explanations in linguistic science, the role of the first factor (UG) is thus correspondingly diminished, since this is where the imperfections reside (the unexplained, unprincipled residue). In this way, the SMT takes us from methodological minimalism − the search for elegant, simple, economical, nonredundant theories common to all scientific enterprise − to a substantive claim about the object of study itself: FL is itself elegant, simple, economical, and nonredundant.
The addition of economy considerations to explanations of linguistic phenomena (i.e. for determining convergent derivations) is one of the key developments that sets MP apart from its forebears. It has also attracted considerable criticism, most notably in a series of provocative articles initiated by Lappin, Levine and Johnson (2000a), to which various prominent minimalists replied, with further responses by Lappin, Levine and Johnson (2000b, 2001). Two such criticisms have particular merit.

The first is that the formal economy principles of earlier minimalism (up to Chomsky 1995) involve the comparison of derivations, where the most economical derivation, such as that involving fewer or less costly operations (cf. Chomsky 1991), would win out over its less economical competitors. Such comparison only adds to the complexity of computations, contra the minimalist desideratum of easing the computational burden. Secondly, many of the formal economy constraints postulated in early minimalism have a rather language-internal, FL-specific flavour (e.g. those imposing locality constraints on movement, as Freidin and Vergnaud 2001: 644 [fn. 10] note, and Procrastinate, which avoids overt categorial movement as a more costly operation than covert feature movement and which requires for its implementation the postulation of descriptive “strong” features to enforce PF-convergence). These constraints must therefore be encoded as principles of UG after all rather than following independently as “third-factor” effects.

Comparing the situation with the status of minima and maxima principles in physics, Lappin, Levine and Johnson (2000a: 666) write: “Minimization/maximization principles are derived from deeper physical properties of the particles (waves, vectors, etc.) which satisfy them. They follow from the subatomic structure and attributes of these particles, and are not themselves basic elements of the theory”.
Both of these criticisms have been addressed in more recent developments of the minimalist program (Chomsky 2000). Procrastinate is eliminated in favour of single-cycle generation and earliness of feature-checking (cf. Pesetsky 1989), and all that remains of comparison of alternatives is a fully localized preference for “Merge over Move” that immediately rules out any non-convergent options (see section 4; also Groat 1999 on localizing Merge over Move; and Collins 1997 on the localization of economy in general).

Formal economy principles that might have been taken to be FL-specific additions to UG are now replaced with general third-factor properties. For example, instead of a specific “Shortest Attract” economy metric for constraining movement paths, we have a general computational “minimal search” condition on probes, implementing minimality effects, such that a probe returns the first value it finds (see section 3). Minimality thus reduces to the principled, FL-independent minimization of computational complexity, a plausible third-factor property, assuming the human brain to be a computing device.

The only FL-specific economy conditions that now remain are interface conditions, in line with the SMT, essentially just Full Interpretation (FI) and Last Resort (LR), corresponding to economy of representation and economy of derivation, respectively (cf. Chomsky 2000: 99). Both FI and LR militate against superfluous elements in syntactic structures − symbols and operations with no interpretive effect at the AP and CI interfaces. Informally, we can characterize LR as the requirement “don’t do too much” and FI as “don’t do too little”. FI is “the convergence condition” (Chomsky 1995: 194), ensuring the legibility of syntactic expressions at the interfaces by barring features that are without an interpretation at the two interface levels, Phonetic Form (PF) and Logical Form (LF).
Such uninterpretable features include Case features on nouns and verbal agreement features. LR adds a kind of inertia to the system, ruling out vacuous steps in derivations − all operations must be triggered, again by uninterpretable features. The picture of syntax that emerges is one of an otherwise inert system spurred into action by (i.e. operating on) uninterpretable features (uF). Without uFs, LR dictates that nothing would happen; once uFs are introduced, FI ensures that the system acts immediately to get rid of them, by movement and agreement, like a “virus” triggering the immune system into action (cf. Uriagereka 1998).

The twin, antagonistic interface-economy principles FI and LR thus provide a rationale for the existence of uFs: without LR, operations would be free, without the need for a formal uF trigger (cf. the Affect-α model of Lasnik and Saito 1992), but without FI, uFs would be impotent and ineffectual as triggers, since there would be no need to eliminate them − the interfaces would simply tolerate them.

Note that optionality of operations is thus largely eliminated through the conspiracy of FI and LR (a triggered operation is obligatory, an untriggered one excluded). However, optionality may still arise in (at least) two ways. Firstly, the trigger (uF) might itself be optional, added to some derivations but not to others, creating derivational minimal pairs. Since, by Full Interpretation, any uF must have an effect on output (Chomsky 1995: 294), these pairs must be distinguishable at the interface by leading to different interpretations (for scope, discourse structure, and other “surface semantic” properties) − see, e.g., Chomsky (2001) on Object Shift, and Reinhart (1995) and Fox (2000) on interface economy. The other possibility for optionality arises where multiple derivational options exist for satisfying the same formal imperative (uF) − see Biberauer and Richards (2006) for discussion.
The resulting system implies a version of the Activity Condition (see section 3) − items are only syntactically active for case/agreement purposes for as long as they contain uFs. This captures the old idea that case and agreement go hand in hand (Martin 1999): once the uninterpretable Case feature on a nominal has been eliminated (via agreement with a functional head), that nominal is no longer active for agreeing with further functional heads (thus ruling out structures such as *John is believed [that t is happy], attributed to the Empty Category Principle in GB).

The effects of the Case Filter are also immediately derived, without appeal to a special GB-style module − nominals must be Case-licensed by having their Case features eliminated, for reasons of FI. The effects of the GB Theta Criterion are also subsumed: (a) a single nominal cannot be Case-valued more than once, due to Activity, and so cannot relate to two functional heads, thus ruling out structures such as *John hit t_John, where John would receive both the agent and patient theta-roles; (b) cases where a nominal fails to receive a theta-role, such as Mary in *John danced Mary, reduce to the Case Filter and thus FI.

In this way, the interface economy principles FI and LR become general conditions on optimal derivations that do the work previously attributed to multiple GB modules, including the Case Filter, the Theta Criterion and some instances of the Empty Category Principle, allowing such GB machinery to be dispensed with (cf. Freidin and Vergnaud 2001) and thus bringing us closer to the minimalist ideal of a maximally empty UG.
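The computational character of minimal search and the Activity Condition can be made concrete with a toy sketch. The encoding below (the Node class, feature names, and helper functions) is entirely my own illustrative assumption, not a formalization from the literature: a probe scans its sister subtree and Agrees with the first active goal it finds; once its Case feature is valued, that goal is inactive for any later probe.

```python
# Illustrative sketch only: a toy model of probe-goal Agree with minimal
# search and the Activity Condition. The tree encoding and feature names
# are assumptions for exposition, not part of the theory's definitions.

class Node:
    def __init__(self, label, phi=None, case=None, children=()):
        self.label = label
        self.phi = phi          # interpretable phi-features (on nominals)
        self.case = case        # 'unvalued' = active uF; any value = inactive
        self.children = list(children)

def minimal_search(domain):
    """Return the first active goal found in the probe's c-command domain;
    search halts at the first match (the 'minimal search' condition)."""
    frontier = [domain]
    while frontier:
        node = frontier.pop(0)  # breadth-first: closest nodes inspected first
        if node.phi is not None and node.case == 'unvalued':
            return node         # a probe returns the first value it finds
        frontier.extend(node.children)
    return None

def agree(case_value, domain):
    """Value (and thereby deactivate) the closest active goal, if any."""
    goal = minimal_search(domain)
    if goal is None:
        return None             # no active goal left: Agree fails
    goal.case = case_value
    return goal

# T probes its sister vP and finds the closest active nominal:
john = Node('John', phi={'3sg'}, case='unvalued')
vp = Node('vP', children=[john, Node('V')])
agree('NOM', vp)
print(john.case)                # 'John' is now Case-valued, hence inactive
# A second, higher probe finds no active goal (Activity Condition):
print(agree('NOM', vp))         # None: no further agreement possible
```

On this toy rendering, the ill-formedness of *John is believed [that t is happy] falls out directly: the embedded finite T has already valued John's Case feature, so the matrix probe finds no active goal.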
Other phenomena explained by the Empty Category Principle in GB receive minimalist reinterpretations in terms of the locality of movement (minimal search and the Phase Impenetrability Condition − see sections 3−4); these include Rizzi’s (1990) relativized minimality effects, such as superraising, the Head Movement Constraint, and superiority (wh-islands) − see Chomsky (1995: 181). Such results are a valid achievement in refining linguistic theory − essentially, we take an empirically motivated GB module, like the Case Filter, and explain it in deeper, language-independent terms, thereby rendering FL a more biologically plausible object (in the sense defined above).

Nevertheless, the validity of this enterprise has been challenged by many on the basis that the shift from GB to MP was not empirically motivated (cf. Lappin, Levine and Johnson 2000a), bringing no new empirical facts under the scope of PPT (Newmeyer 2003). Such criticisms seem to overlook two facts: firstly, that there are other criteria for the advancement of theories than just empirical payoff, and secondly, that there are other empirical domains for linguistic inquiry and explanation than just natural language data.

On the first point, Freidin and Vergnaud (2001) cite Dirac’s (1968) distinction between two procedures for the development of hypotheses in the natural sciences: the “experimental” and the “mathematical”. In its focus on explanatory depth rather than data coverage, the MP is firmly aligned with the mathematical procedure, sometimes also referred to as the Galilean style of science (the search for mathematical perfection in nature).
Chomsky, in a discussion held with Riny Huybregts and Henk van Riemsdijk between November 1979 and March 1980 (reproduced in Chomsky 2004b), had the following to say on the matter, responding to Katz and Bever’s (1976) paper, “The Fall and Rise of Empiricism”:

    I think that the title of the paper captures a rather significant point about the nature of the field. There is a strong opposition to the idea that there might be abstract theories and principled explanations and deductive chains of arguments from principles that do not look like descriptions of phenomena. The almost inevitable tendency of people when any such proposal is made is to try to shoot it down (…). That is usually pretty easy; there is just so much unexplained data. (Chomsky 2004b: 70)
As we have seen, the aim of MP is to do just this − to deduce GB-style machinery from deeper principles. Should this be achieved (or achievable), then the MP will actually be explaining more than GB, since not only will there be no loss of data coverage, but the tools used to explain those data will themselves be explained. As Hornstein, Nunes and Grohmann (2005: 256) neatly put it, “a minimalist reanalysis of the data (…) need not cover more empirical ground to be preferable: a tie goes to the minimalist!”.

By reconceptualizing barriers as phases (see section 4 below), movement as Internal Merge (see section 2), government as Probe-Goal (section 3), the Case Filter as (reducible to) FI, etc., we bring GB tools, and the phenomena they accounted for, into the fold of principled explanation − a significant result, but one that is apparently missed by those who dismiss phases (etc.) as just barriers redux. These are not merely cosmetic or terminological changes, but conceptual ones representing a considerable step forward in our understanding of these properties, especially if the restatement in minimalist terms allows problematic aspects of the original conceptions to be resolved (thus the stipulative voiding of barrierhood by adjunction receives a natural explanation in terms of the edge in phase theory, which is a notion independently implied on grounds of optimal design − see section 4.3 below).

The goal of the MP, then, is to reconceive as much descriptive technology as possible in this way, in order to meet more stringent explanatory demands. However, there is no expectation that we will be able to explain everything in these deeper terms. Nobody anticipates that the SMT should hold in its strongest form (a totally empty UG − see below).
Rather, the SMT provides us with a heuristic that draws a principled line between those properties and phenomena that are amenable to principled explanation in terms of third-factor considerations, and those that are not (at our present level of understanding) and which must therefore be attributed to UG, complicating the evolutionary picture.

Returning to the second point mentioned above, that of the empirical basis of the MP, it should be borne in mind that the MP is no less empirically founded than its predecessors. The object of study might have changed from external phenomena (language data) to the properties of the mind responsible for those phenomena (the properties of I-language); however, the latter is still an empirical object in the natural world (indeed, more so than the tokens of E-language; see, e.g., Chomsky 2001: 41−42; Anderson and Lightfoot 2002). To properly understand this object, data can only take us so far; the MP investigates the possibility that a better theory of FL can be attained by taking conceptual, cognitive, and computational factors into account than that which is arrived at on the basis of data alone.

Nevertheless, it cannot be denied that the MP has had considerable success on the empirical stage, too. Not only has it yielded refined analyses of familiar phenomena (such as expletive-associate constructions, relativized minimality effects, multiple wh-movement, crosslinguistic differences in verb placement, etc.), but it has also allowed new phenomena to be described and explained that were formerly beyond the scope of GB analysis (despite the critical assertions to the contrary mentioned above). Some of these uniquely MP-spawned empirical consequences will be reviewed in the following sections.
IV. Syntactic Models

Given the impossibility of covering almost two decades' worth of wide-ranging minimalist research in a single, short survey, the remainder of this article must necessarily be superficial and selective in content, leaving out far more than it can include. In order to give just a sense of the fruitful directions in which the MP has taken us since Chomsky (1993), the focus will be on the major developments in Chomsky's own formulations and conceptions of the field (though given the richness of these works, the overview will still be far from exhaustive). To shape the presentation, we should first identify what must remain of UG (the first factor) under the reductive, SMT-driven approach. Here, considerations of virtual conceptual necessity come into play − what must a maximally simple FL minimally comprise in order to be usable at all as a system for generating convergent sound-meaning pairs? In order for the syntax to provide legible expressions to the two external performance systems with which it interfaces, at least two primitive operations would seem indispensable. Firstly, there must be an operation for building structured expressions; secondly, there must be an operation connecting those expressions with the external systems. This would seem the conceptual bare minimum. The first operation has become known as Merge, the second as Transfer. Further, it is generally assumed that a third operation must exist, the dependency-forming operation Agree, which arguably plays a role both in Merge (e.g. for implementing movement) and in Transfer (for deleting uninterpretable features before they reach the interface). Each of these operations is associated with a particular type of formal feature that triggers it − Merge operates on Chomsky's (2007, 2008) structure-building Edge Feature (EF); Agree and Transfer are triggered by uninterpretable (phi-)features (case and agreement features), uFs.
In the following, we review each of these basic operations and feature types in turn: Merge and EF (section 2); Agree and uF (section 3); Transfer and phases (section 4). Although the emphasis will be on their conceptual motivation and development, major empirical consequences that go beyond the capabilities of GB will also be highlighted. Finally, section 5 concludes with a brief mention of some ongoing debates and current issues that are shaping the immediate development of the program and which bear on the future prospects of minimalism as providing a viable model of the human language faculty.
2. Merge and the Edge Feature

2.1. Bare phrase structure

By the GB era, the phrase-structure rules of earlier generative grammar had largely been eliminated as redundant, duplicating information already encoded in lexical information (e.g. subcategorization frames) and the X-bar schemata to which they conformed, principally those given in (2), where YP corresponds to the structural notion of specifier (of XP) and WP to complement (of X).

(2) a. XP → (YP) X′
    b. X′ → X (WP)
24. Minimalism
X-bar theory provided representational constraints on the form of phrase structure, a template into which lexical material could be inserted. What was missing, however, was a procedure for deriving those representations. Chomsky (1995: 189−191) sets the scene for a derivational, bottom-up view of phrase structure in line with minimalist desiderata by proposing a return to an elementary form of generalized transformations (albeit with an added notion of cyclicity − Chomsky 1995: 190; cf. below). A binary operation maps a pair of phrase markers (K, K1) to the new phrase marker K*; a singulary operation maps the phrase marker K to K*, where K* includes K as a proper subpart (that is, we extend K). These operations would in later work become Merge − external and internal, respectively. At its simplest, Merge takes two syntactic objects, X and Y, and combines them to form the unordered set {X, Y}, which can then provide the input to further instances of Merge (forming {W, {X, Y}}, and so on). In merging X to Y, there are two logical possibilities: either X originates inside Y (as a proper subpart), or else it originates outside Y (either in the Lexicon/Numeration − see [4] below − or as a complex term, i.e. the output of a separate, parallel derivation): these are Internal and External Merge, respectively. The latter does the work of erstwhile phrase-structure rules, forming predicate-argument structures and the like, whereas the former yields movement without further ado, thus implementing the transformational component of GB and earlier generative models. Interestingly, then, the displacement property of human language, long viewed as an imperfection of FL owing to its lack of conceptual necessity (cf. its absence from logical and artificial languages), is immediately given under Merge; in a twist of perspective, its absence or exclusion would now require departure from the SMT (Chomsky 2008: 7). 
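The set-based conception of Merge can be made concrete in a few lines of code. The following is a toy sketch of our own (the names `merge`, `contains` and `merge_type` are invented for illustration, not drawn from the literature): syntactic objects are modelled as unordered sets, and an application of Merge counts as Internal or External according to whether X is already contained in Y.

```python
# A toy model of Merge as unordered set formation (an illustration of our
# own; `merge`, `contains` and `merge_type` are invented names).

def merge(x, y):
    """Combine two syntactic objects into the unordered set {X, Y}."""
    return frozenset({x, y})

def contains(y, x):
    """True if X is a proper subpart of the syntactic object Y."""
    if not isinstance(y, frozenset):
        return False
    return any(x == part or contains(part, x) for part in y)

def merge_type(x, y):
    """Internal Merge if X originates inside Y; External Merge otherwise."""
    return "internal" if contains(y, x) else "external"

# External Merge builds predicate-argument structure bottom-up:
vp = merge("reads", "books")        # {reads, books}
clause = merge("John", vp)          # {John, {reads, books}}

# Re-merging an item already contained in the object is movement:
print(merge_type("John", clause))   # internal
print(merge_type("Mary", clause))   # external
```

Nothing beyond set formation and a containment check is needed to distinguish the two cases, mirroring the point that displacement comes for free once Merge is available.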
We should briefly mention at this point some other conceivable ways of combining X and Y that are arguably excluded as departures from this minimal conception of Merge and the SMT (i.e. if they exist, they must be extensions to UG). Foremost amongst these is Parallel Merge (Citko 2005), which would concatenate X and Y through a combination of the properties of External Merge (X and Y are separate objects) and Internal Merge (X is merged with a subpart of Y), yielding structures like (3).
(3)      X       Y
        / \     / \
      (X)  \   /  (Y)
            \ /
             W
Parallel Merge is claimed by Chomsky (2007: fn. 10) to be a departure from the SMT: “Merge cannot create objects in which some object [W] is shared by the merged elements X, Y.” This exclusion is not uncontroversial, however. Such a variant of Merge has considerable empirical support; Citko (2005) shows how it yields an advantageous analysis of Across-The-Board wh-movement (as in What did John recommend and Mary read?), such that a single wh-item is extracted from multiple conjuncts. Further, insofar as it is X that is the selecting head here (i.e. the head whose Edge Feature is satisfied through Merge), Merge of W to X in (3) conforms to the Extension Condition and/or the No-Tampering Condition (see below for discussion of these concepts). It is important, therefore, to distinguish the question of the legitimacy of the structure in (3) from the question of the legitimacy of the operation(s) that would derive it. Chomsky’s (2007: fn. 10) statement, cited above, is surely correct in the strict sense that (3) cannot be derived
in one fell swoop from a single instance of Merge(X, Y). That is, the structure in (3) cannot be the result of a Merge operation that combines X and Y; it does not yield the set {X, Y}, and so is not an instance of Merge(X, Y). Internal Merge and External Merge thus remain the only two logical possibilities for Merge(X, Y) per se − either X is contained in Y or it is not. However, this reasoning is silent as to the legitimacy of (3) as a structure, which in fact cannot be excluded − it is the legitimate product of a derivation involving two instances of Merge: external merge of W to Y and external (re)merge of W to X (see De Vries 2009 on parallel merge as external remerge). That is, even though (3) cannot be an instance of Merge(X, Y), it is still a possible output of Merge(X, W), where W is contained in Y as a result of Merge(W, Y); in fact, this is all that is crucial for Citko's analysis. If (3) is to be excluded, then, independent grounds must be sought. For example, it is questionable whether the double-rooted object in (3) that arises from external remerge is a viable syntactic object for further derivational operations − see Epstein et al. (1998); Epstein, Kitahara and Seely (2009), to which we briefly return in section 2.2.

Since both Internal and External Merge interleave freely throughout the derivation, the two internal syntactic levels of D-structure and S-structure, and the compositional cycle that mapped the former to the latter in GB, are eliminated as unformulable. This is a desirable result, since the only conceptually necessary levels of representation are those interfacing with the external systems of AP and CI, i.e. PF and LF, respectively. Indeed, even these should be eliminated from a strictly derivational system, in favour of constant, unrestricted access by the interfaces, as Epstein et al. (1998) argue, paving the way for phase-cyclic computation (see section 4).
We thus arrive at the following architecture of the grammar, a refinement of the GB T-model. (4)
The minimalist architecture of the language faculty Lexicon → Numeration (overt syntax) ← Spell-Out (covert syntax) LF/CI PF/AP (“meaning”) (“sound”)
It is worth noting here that the status of the numeration (lexical array) in (4) has changed with more recent developments to the framework (see, in particular, the phase system outlined in section 4 below), and is now widely abandoned. As originally conceived in Chomsky (1995: Chapter 4), the numeration is a collection of lexical items selected from the lexicon that defines the workspace of a derivation, and is motivated on numerous grounds: It ensures that PF and LF expressions or representations are drawn from the same vocabulary and are thus compatible, so that we do not get arbitrary mismatches between sound and meaning; it defines the reference set of derivations that compete for
economy purposes (for example, we do not want the availability of less costly expletive constructions (there was heard a loud explosion) to uniformly block overt subject-raising derivations (a loud explosion was heard); therefore it must be that only those derivations are compared that proceed from the same numeration − in this case, a numeration containing the expletive is not comparable with one that does not); it allows for a type−token distinction to be made, thus distinguishing multiple selections of the same lexical item (e.g. coreferential he in he thinks he is happy) from multiple occurrences arising through movement (e.g. he seems he to be happy), which are clearly treated differently at the AP interface. Furthermore, it is assumed that, for a derivation to be sent to the interfaces for interpretation (to AP at the point marked "Spell-Out" in [4], and to CI at the end of the covert cycle), the numeration must be exhausted (Chomsky 1995: 226). Spelling out before the numeration is exhausted would again compromise compatibility between PF and LF representations, since lexical items could be added in the covert computational cycle, yielding, e.g., a PF John left with an LF interpretation They wonder whether John left before finishing his work (Chomsky 1995: 189). Whether these same results can be achieved without numerations − and if so, how − is a matter of ongoing debate.

A further important consequence of the Merge-based system is its elimination of templatic X-bar restrictions on phrase structure − there are no obligatory intermediate projections, and so no trivial (unary) projections. Instead, what emerges from Merge is a bare phrase structure in which no structure exists independently of the lexical items that project it (thus structural artefacts of X-bar representations, such as bar levels, are also eliminated, as is the head−terminal distinction and the need for separate lexical insertion rules).
Such a model thus conforms to a further minimalist desideratum, Inclusiveness, which requires that no new features be added in the course of the derivation beyond those already present on the lexical items in the Numeration − only rearrangements and deletions of the features of these items should be possible (Chomsky 2000: 113). To illustrate the differences, the two trees in (5) represent the structure of the simple verb phrase John reads books under X-bar theory (5a) and under Merge/bare phrase structure (5b). (We assume here for purely expository purposes a naïve VP-internal subject position, and treat nominals as NPs.) (5)
a. X-bar theory

         VP
        /  \
      NP    V′
      |    /  \
      N′  V    NP
      |   |    |
      N reads  N′
      |        |
    John       N
               |
             books

b. Merge (bare phrase structure)

        reads
       /     \
    John     reads
            /     \
         reads   books
The bar levels of X-bar theory can be read off the structure in (5b) as derived, relational properties (maximal/nonmaximal, minimal/nonminimal). Similarly, complement and specifier now simply refer to first-merged and second-merged items. As numerous researchers have pointed out, bare phrase structures such as (5b) are more problematic than the richer representations in (5a) in one key area: linearization. The base pair {α, β} of every subtree stands in a symmetric c-command relation (sisterhood) that cannot be ordered under Kayne's (1994) Linear Correspondence Axiom (LCA), which essentially maps asymmetric c-command (sister-of-contain) onto precedence. Thus reads precedes books in (5a) by virtue of the asymmetric c-command relation between the V and N′ (/N) nodes, but no such asymmetry holds between the heads reads and books in (5b). If we adopt Bare Phrase Structure, then we are forced to conclude that the LCA cannot be a constraint on phrase structure itself as a property of narrow syntax, but must be a linearization strategy operative only after Spell-Out in the mapping of syntactic hierarchy onto phonotemporal order in the PF wing of the grammar. As Chomsky (1995: 340) puts it: "We take the LCA to be a principle of the phonological component that applies to the output of Morphology". This unlinearizable point of symmetry must then be resolved at (or by) PF, to which end numerous proposals have been made in the literature. If movement leaves behind an empty category (trace) that does not need linearizing, then movement of one of the two offending sisters is one option (Kayne 1994: 133; Chomsky 1995: 337; Moro 2000).
Other possibilities include cliticization of one of the sisters to the other (via head-adjunction in the syntax, leading to word-internal restructuring in the morphological component − Chomsky 1995: 337; Uriagereka 1998; Nunes 1999), or a head-complement directionality parameter of the kind familiar from GB (see Saito and Fukui 1998 for a syntactic parametrization of Merge, and Groat 1997; Epstein et al. 1998; Richards 2004 for parametric approaches located at the PF-interface).
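The linearization problem can be illustrated with a toy LCA-style linearizer, a deliberate simplification of Kayne's system with invented names and conventions: trees are nested lists, a one-element list stands in for a nonbranching projection, and two terminal sisters constitute an unorderable point of symmetry.

```python
# A toy linearizer in the spirit of the LCA (a deliberate simplification of
# Kayne 1994, with names of our own): a tree is a string (terminal) or a
# list of one or two subtrees; a one-element list is a nonbranching
# projection, and two terminal sisters cannot be asymmetrically ordered.

def terminals(t):
    return [t] if isinstance(t, str) else sum((terminals(c) for c in t), [])

def linearize(t, pairs=None):
    """Map (asymmetric) c-command to precedence, collecting ordered pairs."""
    if pairs is None:
        pairs = []
    if isinstance(t, str):
        return pairs
    if len(t) == 1:                  # nonbranching projection: just recurse
        return linearize(t[0], pairs)
    left, right = t
    if isinstance(left, str) and isinstance(right, str):
        raise ValueError(f"symmetry point {{{left}, {right}}}: no asymmetric c-command")
    pairs += [(x, y) for x in terminals(left) for y in terminals(right)]
    linearize(left, pairs)
    linearize(right, pairs)
    return pairs

# (5a)-style X-bar tree: [VP [NP [N' [N John]]] [V' [V reads] [NP [N' [N books]]]]]
xbar = [[[["John"]]], ["reads", [[["books"]]]]]
print(linearize(xbar))   # [('John', 'reads'), ('John', 'books'), ('reads', 'books')]

# (5b)-style bare structure: {John, {reads, books}} -- reads and books are sisters
bare = ["John", ["reads", "books"]]
try:
    linearize(bare)
except ValueError as err:
    print(err)           # symmetry point {reads, books}: no asymmetric c-command
```

On this sketch, the nonbranching projections of an X-bar tree like (5a) supply the asymmetry needed for a total order, whereas a bare structure like (5b) crashes on the {reads, books} symmetry point, which must then be resolved at PF by one of the strategies just mentioned.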
2.2. The Extension Condition

Given Last Resort (see section 1), all operations require a formal trigger in the form of an uninterpretable feature. Chomsky (2007, 2008) proposes that the feature responsible for Merge is the Edge Feature, a property of lexical items. Arguably, EF is the single evolutionary innovation that takes us from the pre-linguistic stage of inert, isolated lexical items (concepts) to a system that enables their combination into larger objects, and those larger objects with further lexical items, and so on − that is, EF yields iterative Merge, and thus recursion ("a discrete infinity of structured expressions", Chomsky 2007: 5).
The Edge Feature captures a further important property of Merge, namely its monotonicity: whilst Merge adds new structural properties (relations) to the two objects merged together (X and Y), namely sisterhood and c-command (as a result of Merge, X and Y are sisters, and X c-commands into Y and vice versa), all previously existing structural properties and relations remain unchanged. Such conservation of information is arguably a natural computational principle (a third-factor consideration; see Lasnik, Uriagereka, and Boeckx 2005 on conservation laws). Chomsky (2007, 2008) calls this the No-Tampering Condition (NTC), an early formulation of which can be found in Chomsky (2000: 137 [59]): “Given a choice of operations applying to α and projecting its label L, select one that preserves R(L, γ)”. It follows from the NTC that Merge involving the syntactic object X (either a lexical item or the output of previous applications of Merge) must always apply to the edge of X, extending the tree upwards, hence the Edge Feature. In this way, EF captures the Extension Condition on Merge, barring countercyclic Merge operations that would take X and Y and merge X to W contained in Y, replacing W with {X, W} (Chomsky 1995: 248). More generally, X cannot merge inside Y, altering the sisterhood relations of Y, as in (6a); rather it must merge outside Y (i.e. to its edge), as in (6b). (6)
Merge X to the Edge:

a. * [Y Y W]  →  [Y Y [X W]]      (X merged countercyclically to W inside Y)

b.   [Y Y W]  →  [X [Y Y W]]      (X merged outside Y, i.e. to its edge)

(based on Chomsky 2000: 136 [57])

Empirically, the Extension Condition is borne out in the form of Relativized Minimality effects (Rizzi 1990), since without it, countercyclic derivations would be possible in which the intervener is merged only after the movement operation it is meant to block has taken place. Chomsky (1995: 190) illustrates with the following examples:

(7)
a. [I′ seems [I′ is certain [John to be here]]]
b. [C′ C [VP fix the car]]
c. [C′ C [John wondered [C′ C [IP Mary fixed what how]]]]
Without the Extension Condition (cyclicity of Merge), the availability of the intermediate structures in (7) would allow violations of superraising, the head movement constraint (HMC), and the wh-island constraint, respectively, to be derived. Thus, in (7a), John
could move to matrix SpecIP across the empty embedded finite subject position; merging it to that position after movement of John to the matrix clause would then yield the illicit superraising violation *John seems it is certain t to be here. Similarly, moving fix to C followed by merger of the auxiliary can in (7b) would yield the HMC-violating question *Fix John can t the car?, and movement of how to matrix SpecCP could precede movement of what to embedded SpecCP in (7c), falsely deriving a wh-island violation (*How did John wonder what Mary fixed t_what t_how?). The minimalist "shortest move" economy principles (Minimal Link Condition) that would rule out such minimality violations must therefore be bolstered by the notion of the strict cycle (Chomsky 1973), applied to Merge in the form of an extension condition, without which they would be ineffective. Indeed, Brody (2002) invokes this reliance on the supplementary notion of cycles and extensionality as an argument against the derivational approach of minimalism and in favour of purely representational alternatives. However, as argued above, the cyclicity of Merge follows from independent principles of structural optimality (the NTC), and so would not be an addition to UG but rather a third-factor effect. The Edge Feature immediately ensures this effect, i.e. that Merge conforms to the NTC, thus barring countercyclic Merge. Furthermore, in excluding countercyclic Merge to a position inside the complement of a head, the EF/NTC also subsumes much of the work done by the Projection Principle and Theta Criterion in GB (the ban on raising to complement positions); cf. Chomsky (1995: 191). In this way, we once again see the SMT in action: heterogeneous and overlapping GB technology is shown to be redundant, reducible to a much smaller set of general, simple, FL-independent computational principles (here, the NTC).
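The conservation effect of the NTC can be rendered as a small executable check (again a toy model of our own, with sets standing in for syntactic objects): Merge to the edge preserves every previously constructed set, whereas countercyclic Merge into the structure destroys one.

```python
# A toy rendering (our own) of the NTC as conservation: every set built so
# far must survive subsequent applications of Merge. Merge to the edge obeys
# this; countercyclic Merge "into" the structure does not.

def merge(x, y):
    """Merge as unordered set formation."""
    return frozenset({x, y})

def sets_in(t):
    """All sets (syntactic objects) contained in t, including t itself."""
    if not isinstance(t, frozenset):
        return set()
    return {t} | set().union(*(sets_in(c) for c in t))

yp = merge("Y", "W")                           # [Y Y W]

# (6b): Merge X to the edge of Y -- the old set {Y, W} is conserved:
edge = merge("X", yp)
print(sets_in(yp) <= sets_in(edge))            # True

# (6a): countercyclic Merge of X to W inside Y yields {Y, {X, W}} --
# the original sisterhood {Y, W} has been destroyed:
countercyclic = merge("Y", merge("X", "W"))
print(sets_in(yp) <= sets_in(countercyclic))   # False
```

The failing subset check in the countercyclic case is precisely the "tampering" that the NTC, and hence EF-driven Merge, excludes.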
Before turning to a further advantageous result of the NTC in the next section (the copy theory of movement), two additional empirical consequences of EF should be remarked. The first is that, in order to allow for specifiers to be projected (second-merge), EF must remain undeleted in the syntax, so that it can be satisfied a second time (Chomsky 2007: 11). However, since it remains undeleted, restriction to just a single specifier would require a stipulation. Thus, under EF, if one specifier is possible, then any number of specifiers are possible. Multiple specifiers therefore come for free from undeletable EF (in sharp contrast to X-bar theory and approaches based on Kayne's LCA). This is a desirable consequence, as multiple specifiers have played an important part in minimalist analyses of various phenomena, including Object Shift, in which the light verb head v projects both its usual thematic specifier, to which the external argument is merged, and a nonthematic specifier to which the object is raised (Chomsky 1995: 352); Transitive Expletive Constructions, analysed in Chomsky (1995: 342−344) as involving multiple specifiers of T, one hosting the raised external argument and the other the expletive; and, under phase theory (section 4), successive-cyclic movement of items out of phases via multiple specifiers of CP and vP. Secondly, although Merge via EF sharply constrains the range of possible Merge sites for X merging to Y, it has been argued that there is room for at least a little flexibility precisely in the case of multiple specifiers. This wiggle room is exploited by N. Richards (1999), who contends that a conspiracy of economy considerations (Attract Closest and Shortest Move) should result in the "tucking-in" of an outer specifier under an inner specifier when both are the product of movement. That is, via tucking-in, multiple XP-movements targeting a single head should exhibit the surface effect of crossing paths rather than nested ones.
Empirically, this is borne out in the form of order-preservation
effects amongst XPs that move to the same head, such as multiple wh-movements to SpecCP in languages like Bulgarian, and multiple object shift (of direct and indirect object) in Icelandic. Merge-by-EF allows for this possibility since tucking-in conforms to the NTC as originally defined by Chomsky (2000: 136−137) in terms of the preservation of sisterhood and c-command relations (as cited above): whether we merge W above or below Z in (8), the c-command and sisterhood relations holding among X, Y and Z remain untampered-with. (8)
Merge W:   W + [X Z [X X Y]]   →

a. [X W [X Z [X X Y]]]      (W merged above Z, as outer specifier)

b. [X Z [X W [X X Y]]]      (W "tucked in" below Z, as inner specifier)
Note, however, that the "tucked-in" structure in (8b) does not conform to the NTC if the latter is construed more strictly, and indeed simply, as in Chomsky (2007, 2008), such that Merge of X and Y leaves X and Y unchanged: that is, (8b) does not conform to the schema Merge(X, Y) → {X, Y}. Further, (8b) would also be ruled out under a derivational approach to c-command (Epstein et al. 1998), in which c-command relations are established "online" at point of merger rather than defined representationally on the tree. On this view, Merge(W, X) in (8b) would establish c-command between W, X and Y but would fail to establish any c-command relation between Z and W, since W enters the tree only after Z merges with {X, Y} and forms its c-command relations. Such lack of a derivational relation between Z and W may pose linearization problems in the PF component of the grammar, insofar as the mapping to precedence relies on c-command amongst terminals (cf. Kayne 1994): Z and W then cannot be ordered with respect to each other (Epstein et al. 1998: 142−143). Epstein, Kitahara and Seely (2009) propose that all such objects arising from countercyclic merge are essentially the same as those that arise from Parallel Merge / external remerge (see [3] above), i.e., they are doubly-rooted structures; as such they cannot provide the input for further derivational operations. They are, in a sense, unstable (see Moro 2000 on the [PF-]instability of multiple specifiers and other "symmetrical" structures, and Gallego 2007 and Chomsky 2008 for
related ideas, which Chomsky 2013 develops in an interesting new direction). This instability must be resolved if the derivation is to continue. For Moro (2000), movement resolves the instability, as mentioned above; Epstein, Kitahara and Seely (2009) make the intriguing proposal that immediate Transfer to the interfaces does so (yielding phases − see section 4).

In sum, the NTC implies that Merge (EF-driven operations) cannot alter the properties of the objects it applies to, hence its "edginess". No new features can be added to X or Y by Merge(X, Y) (NTC subsumes Inclusiveness here), nor can its output, the set {X, Y}, be altered or broken up by later applications of Merge. It follows from this that the operation of movement (internal Merge) cannot insert traces into base positions, as it did under GB's Move-α. Nor can the base position simply be entirely vacated, since this would tamper with the structural relations established by the moved item in its original position (sisterhood with its original merge-partner). Rather, movement must leave the original position unchanged, implying that when an item moves, it also stays behind. That is, under the NTC, we arrive at a copy theory of movement, such that a single item becomes associated with multiple positions, leaving copies of itself in each position through (to, from) which it moves. Since this has important empirical consequences that extend the range of data coverage in MP vis-à-vis GB, let us consider how copies work in more detail.
2.3. Empirical advantages: Copy deletion and Spell-Out

The traces left by movement in GB violate Inclusiveness/NTC in at least three ways: (i) they modify the structure, replacing the lexical material of the moved item with a new kind of element (an empty category, of which there are various kinds depending on the type of movement); (ii) they are coindexed with the moved item, and this index is itself a further violation of Inclusiveness; (iii) they are not part of the Numeration but rather are introduced only in the course of the derivation, generated by the movement operation itself. Instead, as described above, the copy theory of movement emerges as the null assumption: "Internal Merge leaves a 'copy' in place" (Chomsky 2004a: 8), creating multiple occurrences of the same item. In (9), John merges (at least) twice: once as V's complement, once as T's specifier.

(9) a. John was arrested.
    b. [John was [arrested (John)]]
In the case of overt movement, it is assumed that the lower copy is deleted in the phonological component (PF/AP); however, it remains available for interpretation at the semantic interface, yielding an advantageous implementation of reconstruction without the need for lowering and trace-replacement operations:

(10) a. John_i wondered [which picture of himself_i/j] Bill_j saw t_wh
     b. John wondered (which picture of himself) Bill saw (which picture of himself)
Setting aside certain details (see Chomsky 1995: 202−210), the two interpretations of himself indexed in (10a) correspond to the interpretation of different copies of pictures of himself at LF. Reference to syntax-internal levels, such as S-structure or van Riemsdijk and Williams’s (1981) NP-Structure, is thus no longer necessary, in line with minimalist desiderata and the architecture in (4). Comparing the different treatment of copies at the two interface levels, the question arises as to why only a single copy of the moved item is pronounced (i.e. why the phonological features of all other copies are deleted at PF). The most successful and influential account of why more copies are not pronounced is that of Nunes (1999, 2004), who approaches the problem from the perspective of linearization and Kayne’s (1994) LCA (see section 2.1 above). The LCA maps asymmetric c-command onto precedence. To illustrate, the simplified post-movement structure of (9) is given in (11). (11)
[TP John [T′ was ... [VP arrested John]]]
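Nunes's linearization argument can be simulated with a small toy linearizer (our own simplification, with invented helper names): reading precedence pairs off (11) with both copies of John present yields a violation of asymmetry, which disappears once the lower copy is stripped of phonological content.

```python
# A toy reconstruction (our own) of the ordering paradox created by copies:
# structures are nested lists, terminals are words, a one-element list marks
# a nonbranching projection, and precedence pairs are read off sisterhood.

def terminals(t):
    return [t] if isinstance(t, str) else sum((terminals(c) for c in t), [])

def pairs_of(t):
    """All precedence statements induced by the structure."""
    if isinstance(t, str):
        return []
    if len(t) == 1:
        return pairs_of(t[0])
    left, right = t
    cross = [(x, y) for x in terminals(left) for y in terminals(right)]
    return cross + pairs_of(left) + pairs_of(right)

def linearizable(pairs):
    """Asymmetry: nothing precedes itself, and no pair holds both ways."""
    return all(x != y and (y, x) not in pairs for (x, y) in pairs)

# (11) with both copies of John pronounced: [TP John [T' was [VP arrested John]]]
both_copies = ["John", ["was", ["arrested", ["John"]]]]
print(linearizable(pairs_of(both_copies)))    # False: John would precede itself

# Deleting the lower copy's phonological features removes it from linearization:
top_copy_only = ["John", ["was", ["arrested"]]]
print(linearizable(pairs_of(top_copy_only)))  # True
```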
The top copy of John asymmetrically c-commands was (since was is contained inside John's sister, T′); in turn, was asymmetrically c-commands the lower copy of John. By the LCA, the ordering instructions determined at PF therefore include:

(12) <John, was>, <was, John>

This means that John must both precede was and follow was, an ordering paradox violating the asymmetry requirement on linear ordering. Nunes's suggestion is that the phonological features of one of the copies of John must be deleted, thus exempting it from the need to be linearized. More specifically, Nunes (1999: 229) proposes a metric of Formal Feature Elimination to explain why it is usually the lower copy that is deleted; essentially, the copy with the most unvalued features is deleted. However, this implies a dedicated operation, Copy, that creates new copies that can be distinguished on the basis of their individual featural properties, rather than treating copies as multiple instances of the self-same item with identical features in every position. (See Chomsky 2008: fn. 16 on the distinction.) A unique prediction of the copy theory of movement under Nunes's analysis is that, should the LCA be overridden or not apply (e.g. because an alternative linearization strategy is available), more than one copy may be realized. This sets it sharply apart from the trace theory of movement, in which lower chain links are intrinsically devoid of phonological content by virtue of being empty categories. The copy theory would therefore find empirical confirmation over the trace theory if evidence of multiple spell-out of chain links (realization of multiple copies) could be found. Such evidence, Nunes suggests, comes from wh-copying in a variety of languages, in which intermediate "traces" of successive-cyclic wh-movement may be overtly realized; some Germanic data, as cited by Nunes (2004: 38−39, 42), are reproduced in (13)−(15).

(13) a. Wen     glaubst du, wen     sie getroffen hat?        [German]
        who.ACC believe you who.ACC she met       has
        'Who do you think she met?'
     b. Mit wem      glaubst du mit wem      Hans spricht?
        with who.DAT believe you with who.DAT Hans talks
        'With whom do you think Hans is talking?'

     c. Wovon   glaubst du, wovon   sie träumt?
        whereof believe you whereof she dreams
        'What do you think she is dreaming of?'

(14) Wêr   tinke jo  wêr=t      Jan wennet?                   [Frisian]
     where think you where=that Jan lives
     'Where do you think Jan lives?'

(15) a. Waarvoor dink  julle waarvoor werk ons?               [Afrikaans]
        why      think you   why      work we

     b. Met wie  het jy  nou weer  gesê met wie  het Sarie gedog   met wie  gaan Jan trou?
        with who did you now again said with who did Sarie thought with who go   Jan marry
        'Whom did you say (again) that Sarie thought Jan is going to marry?'

Crucially for Nunes's linearization account, such wh-copying is only possible with simple pronominal forms (wh-pronouns) and not with morphologically complex, full wh-phrases:

(16) a. *Wessen Buch glaubst du wessen Buch Hans liest?       [German]
         whose  book believe you whose  book Hans reads
         'Whose book do you think Hans is reading?'

     b. *Welchen Mann glaubst du welchen Mann sie liebt?
         which   man  believe you which   man  she loves
         'Which man do you believe she loves?'

Assuming that the simplex intermediate wh-copies in (13)−(15) can undergo morphological reanalysis under adjacency or via head-adjunction in the syntax, yielding a [wh+C°] adjunction complex that is restructured into a single phonological word, such wh-copies are no longer subject to the LCA, which does not apply word-internally (Chomsky 1995: 337). The illegitimacy of (16) then follows if we assume that XPs (maximal projections) cannot undergo adjunction to heads in the syntax (Chomsky 1995: 319), hence no morphological reanalysis is possible in the case of complex wh-copies. Converging evidence of multiple copy realization in cases where independent PF requirements override the LCA comes from the domain of verb copying (see, e.g., Aboh 2004; Landau 2006). Many languages, including Gungbe, Hebrew, Russian and Yoruba,
exhibit "verb-focusing" (predicate-cleft) structures in which the verb is fronted and interpreted contrastively.

(17) Đù  Sέná ɖù  blέɖi l'ɔ bléblé.                           [Gungbe]
     eat Sena eat bread DET quickly
     'Sena ATE the bread quickly.'  (Aboh 2004: 20 [11a])

(18) Liknot,  hi  kanta  et  ha-praxim.                       [Hebrew]
     buy.INF she bought ACC the-flowers
     'Buy the flowers, she did.'  (Landau 2006: 37 [8b])
Child language "errors" display a similar phenomenon, in the form of auxiliary doubling (Nunes 1999: 247; from Guasti, Thornton and Wexler 1995):

(19) a. Why did the farmer didn't brush his dog?
     b. What kind of bread do you don't like?
     c. Why could Snoopy couldn't fit in the boat?

Landau (2006) proposes that both copies have to be pronounced in cases such as (17)−(18) as each satisfies a distinct PF requirement: the top copy bears a high-pitch accent, signalling focus, whilst the lower copy (in T) bears inflection (cf. do-support in English). Whatever the correct analysis turns out to be, and whether or not V-copying can be unified with wh-copying, it is clear that such phenomena become far more transparent under the copy theory of movement than they would be under trace theory. With multiple copies realizable under certain PF-defined circumstances, a further possibility that arises once movement is viewed as copying is for different parts of each copy to be realized (so-called scattered deletion; cf. Bošković 2001), rather than one single entire copy. Fanselow and Ćavar (2002) propose such an analysis for certain Left-Branch Extraction effects (split constituents) in Slavic and Germanic (see also Bošković 2005), as illustrated by the following Serbo-Croatian examples.

(20) [Crveno auto] je on [crveno auto] kupio.                 [Serbo-Croatian]
     red    car   be he  red    car   bought
     'He bought a red car.'  (Bošković 2005: 13 [41])
(21) [Na kakov krov] je Ivan [na kakov krov] bacio loptu [na kakov krov]?
     on what.kind.of roof be Ivan on what.kind.of roof thrown ball on what.kind.of roof
     ‘On what kind of roof did Ivan throw the ball?’
     (Nunes 2004: 29 [52b])
[Serbo-Croatian]
Of course, scattered deletion must be constrained so as not to overgenerate and yield phonological forms corresponding to unacceptable left-branch extractions such as those in (22b) and (23a−b), from Bošković (2005: 14 [44]−[47]).
822
(22) a. [Visoke djevojke] je on vidio [visoke djevojke]
        tall girls is he seen tall girls
        ‘Tall girls, he saw.’
[Serbo-Croatian]
     b. *?[Visoke djevojke] je on vidio [visoke djevojke]
          tall girls is he seen tall girls

(23) a. *[Visoke lijepe djevojke] je on vidio [visoke lijepe djevojke]
        tall beautiful girls is he seen tall beautiful girls
        ‘Tall beautiful girls, he saw.’
[Serbo-Croatian]
     b. *[Visoke lijepe djevojke] je on vidio [visoke lijepe djevojke]
         tall beautiful girls is he seen tall beautiful girls

To this end, Bošković (2001) proposes that scattered deletion is characterized by the kind of inertia we attributed to the syntax in section 1: operations do not apply freely, but only if forced to do so. Thus scattered deletion might best be viewed as a last-resort PF strategy that applies only if independent PF constraints block full pronunciation of the top copy (i.e. full deletion of lower copies). Such a constrained approach has been usefully applied to a range of languages and phenomena (cf. Lambova 2004 on participle-auxiliary orders in Bulgarian). In sum, considerations of structural optimality and computational simplicity (monotonicity, the NTC) lead to a simpler approach to movement (as copying) that finds compelling empirical substantiation. Other notable empirical applications of the copy theory of movement include Polinsky and Potsdam’s (2002) treatment of backward control, Grohmann’s (2003) analysis of resumption as lower-copy spell-out, and Fujii’s (2005) approach to copy raising in English; see also Bobaljik (2002) on Holmberg’s Generalization (Holmberg 1986, 1999) and covert movement as lower-copy spell-out in a “single output” model of the syntax.
3. Agree and uninterpretable features

3.1. Probe-Goal

A unified picture of subject and object agreement had started to emerge by the end of the GB era. In earlier GB, subjects were assigned nominative case by the finite inflectional head I (combining Tense and Agreement) in the SpecIP position; this Specifier-Head relation was then also responsible for subject-verb agreement. Objects, on the other hand, received case via government by the lexical verb. These differing configurations for subject versus object case corresponded to the difference between subjects and objects with regard to verbal agreement in languages like English. However, work such as Baker (1988) on Chichewa and other languages showed that object agreement was also a possibility; around the same time, Kayne (1989) and Christensen and Taraldsen (1988)
had argued that participle agreement with objects in French and Scandinavian was contingent on movement, indicating a specifier-head relation for object agreement lower in the clause. In order to unify participle agreement with finite verb agreement, Kayne (1989) postulated an Agr head for the former. Pollock (1989) also identified a low Agr head as the landing site of short-distance infinitive raising in French. This projection was adopted as AgrOP in Chomsky 1991 (Chapter 2 of Chomsky 1995). Completing the symmetry, subject agreement was now attributed to an AgrS head, following Pollock’s (1989) splitting of IP into AgrS and Tense. Since case and agreement were now licensed together, for objects and subjects alike, in uniform specifier-head configurations, the GB notion of case assignment was replaced in early minimalism with one of checking (of case and agreement features) with functional heads. Thus nouns and verbs enter the derivation fully inflected for case and agreement features: nouns bear uninterpretable Case features as well as their inherent, interpretable phi-features (person and number), whilst verbal features include uninterpretable phi-features, thus eliminating the need for problematic lowering operations of earlier approaches such as Affix-Hopping in English. Agreement then becomes a process of checking these uninterpretable features with matching ones in a local checking configuration defined by a functional head. Movement therefore feeds agreement, since checkees have to move into a relevant checking domain in order to check their features. This movement could be either overt or covert, depending on whether it took place before or after Spell-Out (cf. [4]). This in turn was determined via the postulation of “strong” categorial features on functional heads. 
Since these were stipulated to be PF-uninterpretable, such features had to be checked pre-Spell-Out, resulting in overt movement of verbs into head-adjoined checking configurations and of arguments into specifier-head checking configurations. In the absence of a strong feature, checking would be delayed until the covert component, where formal features alone would raise and adjoin to the relevant functional head (in accordance with the least effort principle Procrastinate; see section 1). In this way, differences in overt word order could be captured as parametric differences in the featural composition of functional heads. For example, the requirement in languages like English that the derived subject position (SpecIP) be overtly filled in finite clauses (cf. [24]), known as the Extended Projection Principle (EPP), could be formally stated as a strong D-feature on the head I. Similarly, the difference between English and French in terms of verb movement to I (cf. English [25] versus the French equivalent in [26]) could be captured by saying that English I has a weak V-feature and French a strong V-feature. (See Adger 2003 for a summary of many further crosslinguistic parametric differences stated in terms of feature strength.) (24) a. There appeared a face at the window. b. A face appeared at the window. c. *Appeared a face at the window. (25) a. John often [VP kisses Mary]. b. *John kisses often [VP tV Mary]. (26) a. *Jean souvent [VP embrasse Marie]. Jean often kisses Marie
[French]
     b. Jean embrasse souvent [VP tV Marie].
        Jean kisses often Marie
        ‘Jean often kisses Mary.’

It is fair to say that controversy remains, however, as to the explanatory value of simply restating word-order differences in terms of feature strength. Not least, the approach sheds little light on why something like the EPP requirement should exist in the first place in languages like English. The EPP remains a poorly understood condition, resisting principled explanation, and various researchers have tried to eliminate it (cf. Grohmann, Drury, and Castillo 2000) or to reduce it to other mechanisms, such as Case (Martin 1999; Epstein and Seely 2006). Perhaps the greatest insight and innovation in minimalist work on the EPP is to be found in Alexiadou and Anagnostopoulou (1998), who propose that rich verbal agreement in null-subject languages like Greek and Italian is essentially nominal (bears a D-feature), enabling T’s EPP feature (strong D-feature) to be checked via movement of the finite verb (V-to-I) in these languages, extending the phenomenology and parametrization possibilities of the minimalist EPP in interesting new directions (some of which are explored in Richards and Biberauer 2005). Feature-strength aside, the alternation in (24a−b) has had a particular impact on the development of minimalist models of checking and agreement (see Boeckx 2006: 186−190 for a survey of the earliest and intermediate stages). The question of greatest concern here is how subject agreement is established between the associate argument (a face in [24]) and the agreement head (I/AgrS/T). (That the verb agrees with the associate can be seen in, e.g., There seem(*s) to be several men in the garden.) In (24b), agreement is established overtly in a specifier-head configuration through raising of the associate to the EPP/subject position. In (24a), it is established covertly, via raising of agreement features at LF.
However, as pointed out by den Dikken (1995), the raised features do not yield new binding possibilities at LF (*There seem to each other to be some applicants eligible for the job), nor can they license negative polarity items (*There seem to any of the deans to be no applicants eligible for the job). This perhaps suggests that no raising in fact takes place, even covertly. At the same time, these structures throw up questions as to where the trigger for movement lies. Last Resort (section 1) dictates that there must be an uninterpretable feature (a morphological deficiency) somewhere that drives the movement. Given the well-formedness of (24a) with the associate a face remaining in situ (overtly), it cannot be any deficiency on the associate that forces it to move in (24b). Rather, as we have seen, movement in (24b) is driven by the EPP − the associate satisfies a strong feature of the target head; it thus moves for “altruistic” reasons. On the other hand, Chomsky (1993) argues on the basis of examples like (27) that movement must be for reasons of Greed, i.e. to satisfy a deficiency on the moving XP. (27) *There seems to [a strange man] [that it is raining outside] Case on a strange man is licensed by the prepositional head to; it therefore has no further need to move or do anything. Consequently, it is unable to raise overtly to check agreement with the matrix T/Infl head. Were properties of the target alone enough to license movement (and thus agreement) at LF, this sentence should converge, wrongly. This suggests that Greed is at stake. Yet Greed alone cannot explain (24b). Lasnik (1993)
24. Minimalism suggests a compromise view, Enlightened Self-Interest (ESI), in which the uninterpretable feature that is checked through movement may be located either on the target or on the moving item. The illegitimacy of (27) then suggests a further restriction such that case-checked nominals are inert for further movement and agreement operations (Lasnik 1995). These empirical considerations, combined with the conceptual problems associated with the computational complexities of Greed and Procrastinate (lookahead and global comparison of derivations; cf. Chomsky 1995: 201, 261), indicated that a rethink of checking theory was in order. The simple system that replaced checking − Probe-Goal Agree of Chomsky (2000) − addresses all the above problems, dispensing with associate movement in expletive constructions and replacing ESI with the Activity Condition: both items entering an agreement relation must have as yet unsatisfied featural requirements (i.e. both must at least have the potential of having a feature satisfied by the agreement operation in question), allowing agreement to be asymmetric (as under Greed and ESI) but also symmetric (as under ESI). Under Probe-Goal Agree, uninterpretable features are modelled as features that lack a value (addressing problems raised by Epstein and Seely 2002). Unvalued phi-features on functional heads, called probes, then seek to find a matching set of interpretable (valued) phi-features, a goal, inside the existing structure, i.e. inside their complement (thus Probe-Goal Agree has the additional advantage of dispensing with special checking relations such as specifier-head and replacing them with the general, independent relation of c-command/sisterhood). 
Should the closest matching goal be active (by virtue of having an unvalued feature of its own − Case), then Agree(Probe, Goal) takes place, as a result of which the unvalued phi-features of the probe receive values from the goal, and the unvalued Case feature of the goal is valued by the probe (nominative by T, accusative by transitive v). Once valued, probe and goal are no longer active, and so cannot participate in any further Agree operations. Case- and agreement-valuation thus go hand in hand: nominals can agree only once, capturing (27) and “Inverse Case Filter” effects (Bošković 2002) formerly attributable to the ECP (*John is believed that tJohn is happy); and the same holds for probes, yielding Case Filter effects, such as *It seems John to be happy: the matrix probe is valued by it and so is inactive for valuing Case on the lower argument, John, leading to a violation of FI. Nevertheless, we should note that the validity and necessity of the Activity Condition on Agree have been disputed by many, who take the null assumption to be that any valued or interpretable phi-set, active or otherwise, should be sufficient to value a probe (see, e.g., Rezac 2004 for such a view, and Nevins 2005 for an influential attempt to eliminate the Activity Condition). Thus it is still widely assumed that inactive phi-sets can intervene for Agree between a probe and a more remote, active goal, yielding so-called defective intervention (Chomsky 2000: 123, 129; Boeckx 2003; Hiraiwa 2005), which has been claimed to underlie certain Minimal Link Condition effects, such as superraising and wh-islands (see [7] above).
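The probe-goal valuation logic just described can be rendered as a toy computational model. The following Python sketch is purely illustrative − the class names, the feature inventory and the list-based stand-in for c-command “closeness” are simplifications of my own, not part of the theory; it merely shows phi-valuation and Case-valuation going hand in hand, and the Activity Condition blocking a second Agree with a Case-valued (inert) nominal, as in (27).

```python
# Toy model of Probe-Goal Agree (after Chomsky 2000). Illustrative only:
# the names and data structures below are invented simplifications.
from dataclasses import dataclass

UNVALUED = None  # an unvalued feature is modelled as a missing value

@dataclass
class Goal:
    """A nominal: interpretable (valued) phi-features, unvalued Case."""
    name: str
    person: str
    number: str
    case: object = UNVALUED

    @property
    def active(self):
        # A goal is active as long as its Case is still unvalued.
        return self.case is UNVALUED

@dataclass
class Probe:
    """A phi-probe on a functional head (T values NOM, v values ACC)."""
    name: str
    case_value: str
    person: object = UNVALUED
    number: object = UNVALUED

    @property
    def active(self):
        return self.person is UNVALUED

def agree(probe, domain):
    """Agree(Probe, Goal): the probe matches the closest goal in its
    c-command domain; valuation goes through only if both items are
    active (the Activity Condition)."""
    if not probe.active or not domain:
        return False
    goal = domain[0]  # minimal search: only the closest goal is seen
    if not goal.active:
        return False  # inert goal, cf. (27): no further Agree possible
    probe.person, probe.number = goal.person, goal.number
    goal.case = probe.case_value  # Case valued under the same operation
    return True

T = Probe('T', 'NOM')
john = Goal('John', '3', 'sg')
agree(T, [john])                  # T's phi valued; John's Case valued NOM
agree(Probe('T', 'NOM'), [john])  # fails: John is now inert
```

Running the two calls at the bottom shows the one-Agree-per-nominal behaviour: the first succeeds and values both items; the second fails because the Case-valued goal is no longer active.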
3.2. Empirical advantages

Probe-Goal Agree severs the tie between movement and agreement. The configuration in (24a), in which the goal remains in situ, becomes primary, and movement to (spec-)probe obtains only in the presence of an additional movement trigger (the generalized
EPP-feature of Chomsky 2000, or else the OCC-feature of Chomsky 2001; the Edge Feature of Chomsky 2007 now subsumes this role, albeit as a more fundamental property of lexical items enabling Merge in general − see section 2.2). In terms of the empirical payoff that this affords, we have seen that a superior account of expletive-associate constructions emerges, as well as (Inverse) Case Filter effects in terms of FI. However, the price we pay for this is the loss of the connection between agreement and movement witnessed in the kinds of Romance and Germanic participle agreement facts that originally motivated the postulation of Agr(O) and spec-head object agreement in works such as Kayne (1989). To address this deficit, numerous attempts have been made to revive this connection within a phase-based probe-goal system; see, for example, Svenonius (2001a); Holmberg (2002); D’Alessandro and Roberts (2008); Richards (2012). On balance, it seems fair to claim that probe-goal Agree attains an overall net increase in empirical coverage, allowing a considerable range of new agreement phenomena to be accounted for that were beyond the scope of earlier minimalist and GB approaches to agreement. Space permits just the briefest mention of a few of them. Firstly, since all agreement under Probe-Goal Agree is “at a distance”, nonlocal agreement phenomena are much more readily captured. Such long-distance, cross-clausal agreement, in which the matrix verb registers agreement with an in-situ embedded argument, can be found in such languages as Itelmen (Bobaljik and Wurmbrand 2005), Chukchee (Stjepanović and Takahashi 2001; Bošković 2007), Blackfoot (Legate 2005), Tsez (Polinsky and Potsdam 2001) and Hindi (Boeckx 2004; Bhatt 2005) − examples of the latter two are given in (28) − and all have been given Agree-based analyses in the above-cited works.

(28) a. Eni-r [už-ā magalu bāc’ruɫi] b-iyxo. [Tsez]
        mother.DAT [boy-ERG bread.III.ABS III.ate] III-know
        ‘The mother knows the boy ate the bread.’
        (III = Class 3; Polinsky and Potsdam 2001: 584)

     b. Vivek-ne [kitaab parh-nii] chaah-ii.
        Vivek-ERG book.F read-INF.F want-PFV.F
        ‘Vivek wants to read the book.’
        (Boeckx 2004: 25 [5])
[Hindi]
Agree has also opened up a well-known class of agreement restrictions, the Person-Case Constraint (PCC; Bonet 1991, 1994), to perspicuous analyses in terms of probe-sharing (multiple Agree). The PCC bans combinations of dative arguments with local-person (1/2) direct objects (where both arguments agree with the verb); the French me-lui restriction in (29) is an instance of this, though other variants of the PCC are found (see Boeckx 2000; Anagnostopoulou 2005; Alexiadou and Anagnostopoulou 2006; Nevins 2007; Rezac 2008; Heck and Richards 2010 and others on the “weak PCC” and its resemblance to direction marking in languages like Algonquian; many of these works also offer analyses in terms of Probe-Goal Agree). (29) Jean le/*me lui a recommandé. Jean it/me him has recommended ‘Jean has recommended it/me to him.’
[French]
The key insight afforded by a Probe-Goal perspective on agreement is that PCC restrictions obtain where two goals relate to a single functional head (probe) for Case-valuation (Anagnostopoulou 2003; Rezac 2004):

(30) PCC: single probe, multiple goals
     [P … GDAT/ERG … GNOM/ACC/ABS] → *NOM/ACC/ABS-1/2

Although the Activity Condition (see previous section) prevents a probe from entering a full agreement relation with more than one goal (since once its features are valued, they are no longer active), partial or split agreement is possible wherever the functional head bears a composite probe. Thus Person and Number may probe separately (Béjar 2003; Rezac 2003) and value distinct arguments. By minimal search, the first (closest) goal encountered by the probe gets the first bite of the cherry, valuing as many features on the probe as it can. Any residue left over may then probe further, valuing a second, more remote argument; however, because the remaining probe is diminished, there will be fewer agreement (matching) possibilities for that second argument, which places restrictions on its featural composition (if its Case is to be valued and thus FI to be met). PCC effects then arise when the first goal values Person on the probe, leaving only Number to probe, match and value the second goal. The second goal must therefore lack a Person feature, if it is to fully match the remaining probe. On the common assumption that third-person is the absence of Person (variants of this assumption are made in most of the above-cited works), and that dative arguments are obligatorily [+Person] (see, e.g., Adger and Harbour 2007), it follows that the remoter goal (object) is barred from being 1/2-person (and/or animate; see Ormazabal and Romero 2007, who argue that only animate objects take part in the verbal Case/agreement system, with the PCC thus reducing to a ban on multiple object agreement). The general configuration is depicted in (31).

(31)
     [FP Probe{Pers = □, Num = □} … [XP DP1[+Person] … [VP V DP2[−Person, +Num]]]]
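The split-probe derivation of the PCC can likewise be expressed as a small illustrative calculation. In the following Python sketch, features are modelled as bare sets and the function name is invented; it encodes only the reasoning above: a dative goal consumes Person first, so a local-person object cannot be fully matched by the Number residue.

```python
# Toy calculation of the composite-probe account of the PCC. Features
# are bare sets; third person is treated as the absence of Person.
def converges(has_dative, object_person):
    """Can the direct object's Case be valued? has_dative says whether
    a closer dative goal (obligatorily [+Person]) intervenes;
    object_person is '1', '2' or '3'."""
    probe = {'Person', 'Number'}
    if has_dative:
        probe.discard('Person')  # the dative values Person on the probe first
    # Only the residue probes the object, whose features must be fully
    # matched for Case-valuation (and hence FI) to succeed:
    obj = {'Number'} | ({'Person'} if object_person in ('1', '2') else set())
    return obj <= probe

converges(False, '1')   # no dative: a 1st-person object is fine
converges(True, '3')    # dative + 3rd-person object: 'le lui', fine
converges(True, '1')    # dative + 1st-person object: *'me lui' (PCC)
```

The subset test `obj <= probe` stands in for full matching of the object against the remaining probe: with a dative present, the residue is {Number}, so any object bearing Person fails.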
Many interesting variants exist, such as those based on the cyclic expansion of search space for probes (Rezac 2003; Béjar and Rezac 2009). Here, the probing head that values the two goals is situated between them (as in [32]), rather than above them (as in [31]). The probing head (e.g. v) is assumed to first seek a goal in its sister (complement) domain; should its features not be fully valued by the object it finds there, projection of the probe under bare phrase structure allows it then to search inside its specifier, so that the head that normally agrees with the internal argument (valuing internal case) may exceptionally agree with its external argument for certain features. Béjar and Rezac (2009) show how the Probe-Goal configuration in (32) yields a transparent analysis of ergative displacement in Basque (where the external argument
(32) [vP DP2 [v' vprobe = {□, □} [VP V DP1]]]
controls absolutive agreement just in case the internal argument is third-person) and related direct−inverse phenomena. More generally, the idea that the functional head v may, under certain circumstances, probe into its specifier has been argued by Müller (2008) to yield the basic difference between ergative−absolutive languages, on the one hand, and those with nominative−accusative alignment on the other. In sum, fine-grained approaches to agreement are a unique empirical selling point of minimalist Probe-Goal Agree, allowing more exotic and complex agreement patterns and restrictions to be derived relatively straightforwardly.
4. Transfer and phases

4.1. The cyclicity of Agree

In section 3 we saw how the Edge Feature captures the cyclicity of Merge, conforming to the NTC. Every head is a Merge cycle, with no merger possible to heads already passed in the derivation. Similar conclusions arise regarding the cyclicity of Agree and the uninterpretable features that drive it. Thus, in order to ensure that subjects are islands (Condition on Extraction Domain [CED] effects, Huang 1982), passivization (A-movement) must precede wh-movement in (33) (Chomsky 1995: 328).

(33) *Who was [a picture of twho]i taken ti?

In terms of Agree, the Probe-Goal relation between T and the subject (yielding subject-verb agreement and raising to SpecTP) must precede that between C and who, in order to rule out a countercyclic derivation of (34b) in which the subject-island effect would be bled (by first moving who to SpecCP then the remnant subject to SpecTP).

(34) a. [CP Who was [TP T [VP taken [DP a picture of twho]]]]?
     → b. [CP Who was [TP [DP a picture of twho]i T [VP taken ti]]]?

Every head must define a cycle not only for EFs, then, but also for uFs, so that the phi-probe and strong-D (or EPP) feature on T must be satisfied before the derivation proceeds to the next head (C). In terms of the original Strict Cycle Condition (Chomsky 1973), every head with featural requirements must now count as a cyclic node. The
resultant featural cyclicity (N. Richards 1999) is formulated by Chomsky (1995: 234 [3]) as in (35), where α is an unvalued (active) feature.

(35) D[erivation] is cancelled if α is in a category not headed by α.

Active features are thus possible only on the root node (or the “locus” in terms of Collins 2002, who develops and extends this idea). In GB, the island effect in the cyclic derivation of (33) (also [6c] above) was captured by Subjacency, a locality condition on movement: two blocking categories, DP and IP, are crossed in a single step in the cyclic version of (34b). Subjacency also yielded the effect of the Strict Cycle Condition, forcing long-distance, unbounded movement dependencies to proceed via shorter, bounded steps through intermediate CPs. In minimalism (since Chomsky 2000), the relevant cycles for successive cyclic movement are defined as phases (C and transitive v). A phase is both a point at which the syntax is accessed and evaluated by the interfaces (“transferred”), and the entity transferred at that point. Like Subjacency, phases combine cyclicity and locality. Transferred material becomes opaque to further syntactic computation, in accordance with the Phase Impenetrability Condition, yielding absolute locality effects − the phase boundary represents an upper limit on the length of movement and agreement dependencies. This in turn forces movement to proceed cyclically via an escape hatch at the edge of the phase (SpecCP, SpecvP). In this way, phases “yield a strong form of subjacency” (Chomsky 2000: 108). Nevertheless, phases are still wanting as a theory of locality − they clearly do not make for very good islands, precisely because they are designed to be escapable, via the phase edge (see also Boeckx and Grohmann 2007 on this point).
What is required to bring phases up to the level of descriptive adequacy attained by barriers in GB, then, is a theory of precisely when the phase edge (specifier region) is and is not available (i.e. able to be projected). Müller (2010) offers a possible way of addressing this problem. The resultant phase cyclicity is not just relevant for movement, however. That phases define cycles for Agree(ment) too is apparent from the problem in (36) (cf. Chomsky 2004a; Anagnostopoulou 2003; Müller 2004). (36) a. What did John read? b. [CP What did [TP John T [vP (what) [vP (John) [VP read (what)]]]]] The problem is how T can probe the subject John across the wh-copy/trace what in SpecvP. The wh-copy contains the kind of features that the T probe is looking for (i.e. phi-features) and should therefore block Agree(T, John) as a closer potential goal (by defective intervention). The solution proposed by Chomsky (and elaborated in Anagnostopoulou 2003) is that phases are cyclic domains (i.e. domains of rule ordering), so that all “operations within the phase are in effect simultaneous” (Chomsky 2004a: 116, 128 [fn. 63]), taking place at the phase level (at the point of Transfer to the interfaces). Therefore, C can wh-probe what and displace it to SpecCP “before” T probes John, thus removing the intervener out of the way. The upshot is that phase-internal countercyclicity effects are expected, such as those holding between T and C (also V and v). However, this means that the featural/locus cyclicity needed to account for the CED effect in (33), where every head is a cycle, is lost. Further, as Epstein and Seely (2002) discuss at length, delaying the valuation and
deletion of T’s uFs until the phase level is conceptually problematic. Chomsky (2007, 2008) addresses the latter problems by making the phase heads themselves the locus of the agreement probes (with these features being passed down onto their complements, T and V, by feature inheritance). It is thereby the phase heads that define agreement cycles. The facts in (33) and (7), however, indicate that we must maintain that every head (phase and nonphase) is still a separate Merge cycle. I would therefore suggest that what we arrive at is a kind of relativized featural cyclicity in which each primitive feature type defines its own cycle: EFs define Merge cycles and uFs define phase cycles; every head is an EF cycle (since every head has EF, in order to be able to Merge), whilst every phase head is a uF cycle (since phase heads are the uF sites).
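The effect of the Phase Impenetrability Condition can be sketched schematically as follows. The Python fragment below is not a parser − its data structures and names are invented for illustration; it encodes only the generalization that, after Transfer, the phase head and its edge remain visible to higher operations while the transferred complement is frozen, which is what forces successive-cyclic movement through the edge.

```python
# Schematic illustration of the Phase Impenetrability Condition (PIC).
# A phase is modelled as (head, edge_items, complement_items); these
# structures are invented simplifications, not a real grammar model.
def visible(phase):
    """Return what the next phase up can still probe or remerge after
    Transfer: the head and its edge, but not the complement, which has
    been handed over to the interfaces."""
    head, edge, complement = phase
    return {head, *edge}

# vP phase of 'What did John read?': the wh-copy has moved to the phase
# edge, so it remains accessible when C probes; VP-internal material,
# including the lower wh-copy, does not.
vP = ('v', {'what', 'John'}, {'read', 'what-copy'})
'what' in visible(vP)   # True: escape hatch, movable on to SpecCP
'read' in visible(vP)   # False: transferred, hence frozen
```

A wh-phrase that fails to reach the edge before Transfer is simply no longer in the visible set, so no higher probe can ever attract it − the phase boundary as an absolute locality limit.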
4.2. Conceptions of phases

The model of the grammar given in (4) has each interface accessing the syntax only once, at the end of the derivation and after the Numeration has been exhausted. However, to restrict the operation Transfer in this way is arbitrary and stipulative. Further, as argued by Epstein et al. (1998) and Epstein and Seely (2002), eliminating the internal levels of D-structure and S-structure is not enough from the minimalist perspective. Rather, LF and PF should also be replaced as integral levels of representation (as should the special PF-only operation Spell-Out, a residue of S-structure), with every derivational step (transformation) being accessed and evaluated by the interfaces as the computation proceeds (“invasive”, online, dynamic interpretation). That is, we should aim for a multiple-Transfer model of the grammar:

(37)  Lexicon → Numeration
          |
          ├─ Transfer → PHON, SEM
          |
          ├─ Transfer → PHON, SEM
          |
          ├─ Transfer → PHON, SEM
          ⋮
          └─ Transfer → PHON, SEM
Breaking down the numeration into smaller subarrays would be one way to achieve such multiple Transfer − the derivation proceeds on a subarray-by-subarray basis, with Transfer occurring upon exhaustion of each subarray. Alternatively, uFs might act as the trigger for Transfer − as argued by Epstein and Seely (2002), valued uFs must be immediately transferred and deleted, without delay, in order for convergence to be possible (the CI interface must be able to distinguish between valued uninterpretable and interpretable
24. Minimalism features in order to delete the former). The former view of phases, as lexical subarrays, is put forward in Chomsky (2000), where each subarray contains exactly one instance of one of the phase heads (C, v). The latter conception of phases might be called the convergence view, since it is premised on the idea that cyclic Transfer is required in order to produce legitimate interface objects − either legitimate PF-objects (Uriagereka 1999, who proposes that complex left branches must be spelled out separately for linearization reasons, in order to integrate them with the main tree by the LCA) or legitimate CI-interpretable objects (Epstein and Seely 2002; Chomsky 2007, 2008). See also Svenonius (2001a, b) in this connection, who offers an alternative convergence-based notion of phases in which structure is transferred as soon as it is internally convergent (i.e. with all features valued). Whether conceived of as subarrays or as uF-sites, phases (and cyclic Transfer) have been argued by Chomsky to facilitate the syntactic computation and to economize on cognitive resources in various ways, and thus to be a central third-factor property in an optimally designed FL conforming to the SMT. 
For example, under the subarray conception of Chomsky (2000: 99−101), phases minimize the workspace (the amount of lexical information that has to be “carried along” at any given point of the derivation); in Chomsky (2001: 5), phases minimize the delay between valuation and deletion of valued uFs, in the manner reviewed above; in Chomsky (2004a: 4), phases minimize working memory and the search space available to probes through the periodic “forgetting” of structure (interpreted material cannot be further modified by Merge or Agree); Chomsky (2007: 5, 16) argues that phases contribute to optimal design by implementing strict cyclicity; and Chomsky (2008: 4, 8−9) emphasizes the elimination of redundant internal levels and cycles (LF and the mappings to the interfaces) that phases afford, yielding “single cycle generation” (cf. [37]).
4.3. Empirical advantages

Chomsky’s (2000) original formulation of phases and its immediate development (the revised Phase Impenetrability Condition in Chomsky 2001) were guided by empirical considerations. Lexical subarrays are first motivated as a way to localize the domain in which Merge of an expletive pre-empts movement of a noun phrase.

(38) a. There is likely [α to be a proof discovered]
     b. *There is likely [α a proof to be (a proof) discovered]
     c. There is a possibility [α that proofs will be (proofs) discovered]

The pair in (38a−b) shows the relevant effect (Merge-over-Move): the expletive there must merge to the embedded subject position (subsequently raising to the matrix subject position), blocking movement of a proof inside the embedded clause α. The question is why the presence of there in the numeration does not similarly block raising of proofs in (38c). The problem is solved if α in (38c), but not in (38a−b), is built from a distinct subarray of the numeration, from which it is then possible to exclude the expletive. Finite clauses (CPs) are thus phases (the product of a separate subarray); non-finite TPs are not. It is worth noting, however, that the status of the Merge-over-Move preference is contentious
(see Castillo, Drury and Grohmann 1999 for early criticism), and becomes especially dubious once Move is reconceived as simply internally-applied Merge (section 2.1). The pre-emption effect on which it hinges followed under the earlier assumption that Merge was a simpler operation than Move (which comprised the subcomponents Agree and Pied-pipe; Chomsky 2000: 101), which is no longer valid. Any differentiation in the “cost” of applying internal versus external merge now has the flavour of a stipulation (see, e.g., Narita 2009b). To capture the effects of successive cyclicity, Chomsky (2000: 108) proposes that the complement of the phase head is transferred, becoming inaccessible to further operations − TP in the case of the C-phase, and VP for the v-phase. Transfer of the complement leaves the specifier(s)/edge of the phase available as an escape hatch to which active material inside the complement can move, thereby allowing the derivation to continue and converge (this property of the phase edge thus follows from optimal design). Some of the most convincing evidence that phase-cyclic computation does indeed proceed in this way, and thus for the reality of phases as units of interpretation, has arguably come from the PF-interface. In addition to Uriagereka’s (1999) reduction of CED effects to the workings of PF-motivated multiple spell-out, several researchers have derived syntactic order preservation constraints such as Holmberg’s Generalization (Holmberg 1986, 1999) from phase-cyclic linearization (Richards 2004; Fox and Pesetsky 2005). Others have explored the possibility that phase-cyclic mapping to the PF-interface should be detectable in the form of phonological phrase boundaries coinciding with syntactic phase boundaries. Thus Franks and Bošković (2001) show how the placement of second-position clitics in Bulgarian is sensitive to the phase boundary that occurs between C and its complement, TP.
In a similar vein, Richards (2004) proposes that the distribution of weak pronouns in VO (head-initial) versus OV (head-final) Germanic provides evidence for the reality of the phase boundary between v and its complement, VP. Consider the paradigm in (39), based on Thráinsson (2001).

(39) a. Nemandinn las (hana) ekki (*hana).            [Icelandic: VO]
        student.the read it not it
        ‘The student didn’t read it.’
     b. Nemandinn hefur (*hana) ekki lesið (hana).
        student.the has it not read it
        ‘The student hasn’t read it.’
     c. Der Student las (es) nicht (*es).             [German: OV]
        the student read it not it
        ‘The student didn’t read it.’
     d. Der Student hat (es) nicht (*es) gelesen.
        the student has it not it read
        ‘The student hasn’t read it.’

Weak pronouns are forced to undergo obligatory object shift in VO Scandinavian (39a), but only in those environments in which the finite verb raises out of VP (39b), in accordance with Holmberg’s Generalization. However, in OV Germanic (German, Dutch, Afrikaans), this shifting of weak pronouns is obligatory irrespective of the position of the
finite verb (the object must shift in [39d] no less than in [39c]). If weak pronouns are phonologically enclitic, requiring incorporation into a leftward-neighbouring prosodic word at PF, then the facts in (39) follow immediately: Transfer imposes a phase boundary across which cliticization is impossible (since phases are separate spell-out units). (40)
[CP C [TP Subj [T′ T [vP (Subj) [v′ v [VP V Obj]]]]]]
(Transfer applies to TP, the complement of C, and to VP, the complement of v; the Xs on the dotted line mark these two spell-out boundaries)
............X..................................X.....................
Due to the impossibility of cross-phasal cliticization, a weak pronoun cannot be the leftmost element inside a phasal domain. Its host must be a phase-mate; for in-situ weak pronouns, this means the host must be inside VP. In VO languages (41), if the verb raises, so must the weak pronominal object, as only an in-situ verb can meet this requirement. In OV languages (42), nothing can meet it, since the verb is to the right; therefore, movement of the pronoun out of VP is always forced, no matter whether the verb moves out of VP or not.
(41) Icelandic (VO) (= [39a])
     a. Unshifted pronoun
        *[CP [Nemandinn] [C′ [las] ... [vP ekki [VP tV hana]]]
        ( )( )( )φ
     b. Shifted pronoun
        [CP [Nemandinn] [C′ [las] ... [vP hana [vP ekki [VP tV tObj]]]
        ( )( )φ

(42) German (OV) (= [39d])
     a. Unshifted pronoun
        *[CP [Der Student] [C′ [hat] ... [vP nicht [VP es gelesen]]]
        ( ... )( )φ
     b. Shifted pronoun
        [CP [Der Student] [C′ [hat] ... [vP es [vP nicht [VP tObj gelesen]]]
        ( ... )( )φ
5. Outlook

In terms of longevity, Minimalism has already surpassed all previous transformational−generative models (Standard and Extended Standard Theory, GB). The results of nearly twenty years of minimalist research show that the MP’s attempt to reduce GB-style descriptive technology to interface conditions and general computational and cognitive principles is a viable, realistic and worthwhile enterprise. The list in (43) summarizes some of the results reviewed above, which are themselves a far from comprehensive selection.

(43) Minimalism (third factor)      GB (first factor)
     Full Interpretation            Case Filter, Theta-Criterion, ECP
     No-Tampering Condition         Projection Principle, Theta-Criterion, strict cycle, extension condition
     Phases                         Barriers (Subjacency, CED), strict cycle
To the list in (43) might reasonably be added the reduction of GB’s Control module and the empty category PRO to Internal Merge (the so-called Movement Theory of Control, MTC; cf. Hornstein 1999) as another important example of a successful and significant minimalist result, with notable additional empirical payoffs under the copy theory of movement (e.g. backward control; cf. Polinsky and Potsdam 2002): see Boeckx, Hornstein and Nunes (2010) for a comprehensive survey and presentation of the conceptual and empirical arguments in favour of the MTC as a minimalist theory of control. For a wider perspective and alternative views, see Kiss (Chapter 38, this volume).

In addition, we saw that Merge eliminates D- and S-structure, and phases eliminate LF and PF as representational levels. What remains is a largely empty UG, perhaps comprising just the basic inventory of formal features (uFs and EFs) required to yield the three minimal operations, Merge, Agree and Transfer. The biolinguistic ideal of the SMT would therefore seem a serious prospect after twenty years of progress.

Of course, innumerable challenges still stand in its path, and not just the obvious empirical ones. Among the questions currently shaping the minimalist program are the following. If UG is maximally devoid of FL-specific principles, then what about parameters? Chomskyan minimalism has always adopted the view that variation must be limited to the properties of lexical items and to the mapping to the AP interface (Chomsky 1995: 169, 221, etc.), domains outside the purview of the SMT; nevertheless, the role of the third factor in determining the form and fixing of parameters is ripe for investigation (cf. Boeckx 2008; Holmberg and Roberts 2008). A related question is whether the interfaces are “created equal”. Chomsky (2007, 2008) and Berwick and Chomsky (2009) suggest that language design is asymmetric, with SMT holding only between syntax and CI.
In support of this idea, we might add that the very fact that Merge is symmetrical (see section 2.1) indicates that syntax is indeed not perfectly designed to meet conditions imposed by the AP interface − AP requires asymmetry (for linear order), yet the syntax is not geared towards providing this. Consequently, the AP interface has to do the best it can, allowing head-directionality macroparameters to naturally arise here (cf. Richards 2004). Things are very different at the CI interface, where the SMT holds. But many questions arise here too: most intriguingly, should the nature of the optimal mapping to CI be viewed functionally in terms of the CI system bending the syntax to its will (the
“I-functional” view of the SMT; cf. Epstein 2007), or is the semantics in fact shaped in the image of the syntax, with perfection at the CI interface really a reflection of optimal syntactic computation (Hinzen 2006; Uriagereka 2008; Narita 2009a)? If interface conditions do hold sway (i.e. the former view), then does that imply that syntax should be crash-proof at the level of individual derivations (Frampton and Gutmann 2002), or simply at the level of FL (Chomsky’s “usable at all” criterion)? An example of the latter kind of “weak” crash-proofing is the need for uFs on phase heads to be offloaded onto their complements (via feature inheritance) in order to prevent every phase from crashing (and thus non-usability of FL); see Richards (2007). However, this conceptually superior view of phases (as the locus of uFs) is incompatible with the empirically superior version of the Phase Impenetrability Condition which Chomsky (2001: 13−14) proposes in order to allow for agreement relations between T and internal arguments, as demanded by facts from Icelandic and, if passive/unaccusative v is also a phase (cf. Legate 2003), by expletive-associate constructions. See Richards (2012) for further discussion of this incompatibility.

This last point demonstrates particularly acutely the tension between data coverage and explanatory depth that lies at the heart of the minimalist program. The minimalist instinct is to favour the view of phases that better conforms to the SMT, and thus to seek alternative explanations for the recalcitrant data rather than abandoning the enterprise. To echo Chomsky (1995: 10), whether such instincts are on the right track, only time will tell.
Acknowledgements

For helpful comments and suggestions on an earlier draft of this chapter, many of which have been taken into account in the version you now see before you, I am particularly grateful to Cedric Boeckx, Erich Groat, and Terje Lohndal, as well as to an anonymous reviewer. Any egregious oversights or gross distortions of the field remain my own.
6. References (selected)

Adger, David 2003 Core Syntax. A Minimalist Approach. Oxford: Oxford University Press.
Alexiadou, Artemis, and Elena Anagnostopoulou 1998 Parametrizing Agr: Word order, V-movement and EPP-checking. Natural Language and Linguistic Theory 16: 491−539.
Anagnostopoulou, Elena 2003 The Syntax of Ditransitives. Evidence from Clitics. Berlin: de Gruyter.
Anderson, Stephen, and David Lightfoot 2002 The Language Organ. Linguistics as Cognitive Physiology. Cambridge: Cambridge University Press.
Béjar, Susana, and Milan Rezac 2009 Cyclic Agree. Linguistic Inquiry 40: 35−73.
Berwick, Robert, and Noam Chomsky 2009 The biolinguistic program: The current state of its evolution and development. In: Anna-Maria di Sciullo and Cedric Boeckx (eds.), The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty. Oxford: Oxford University Press.
Biberauer, Theresa, and Marc Richards 2006 True optionality: When the grammar doesn’t mind. In: Cedric Boeckx (ed.), Minimalist Essays, 35−67. Amsterdam: John Benjamins.
Bobaljik, Jonathan 2002 A-chains at the PF-interface: Copies and “covert” movement. Natural Language and Linguistic Theory 20: 197−267.
Boeckx, Cedric 2003 Islands and Chains. Resumption as Stranding. Amsterdam: John Benjamins.
Boeckx, Cedric 2006 Linguistic Minimalism. Origins, Concepts, Methods, Aims. Oxford/New York: Oxford University Press.
Boeckx, Cedric, Norbert Hornstein, and Jairo Nunes 2010 Control as Movement. Cambridge: Cambridge University Press.
Boeckx, Cedric, and Juan Uriagereka 2007 Minimalism. In: Gillian Ramchand and Charles Reiss (eds.), The Oxford Handbook of Linguistic Interfaces, 541−574. Oxford/New York: Oxford University Press.
Bošković, Željko 2001 On the Nature of the Syntax-Phonology Interface: Cliticization and Related Phenomena. Amsterdam: Elsevier.
Chomsky, Noam 1973 Conditions on transformations. In: Stephen Anderson and Paul Kiparsky (eds.), A Festschrift for Morris Halle, 232−286. New York: Holt, Rinehart & Winston.
Chomsky, Noam 1995 The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, Noam 2000 Minimalist inquiries: the framework. In: Roger Martin, David Michaels and Juan Uriagereka (eds.), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, 89−156. Cambridge, Mass.: MIT Press.
Chomsky, Noam 2001 Derivation by phase. In: Michael Kenstowicz (ed.), Ken Hale: a Life in Language, 1−50. Cambridge, Mass.: MIT Press.
Chomsky, Noam 2004a Beyond explanatory adequacy. In: Adriana Belletti (ed.), Structures and Beyond. The Cartography of Syntactic Structures, Volume 3, 104−131. Oxford: Oxford University Press.
Chomsky, Noam 2004b The Generative Enterprise Revisited. Berlin: de Gruyter.
Chomsky, Noam 2005 Three factors in language design. Linguistic Inquiry 36: 1−22.
Chomsky, Noam 2007 Approaching UG from below. In: Uli Sauerland and Hans-Martin Gärtner (eds.), Interfaces + Recursion = Language? Chomsky’s Minimalism and the View from Syntax-Semantics, 1−30. Berlin: Mouton de Gruyter.
Chomsky, Noam 2008 On phases. In: Robert Freidin, Carlos P. Otero and Maria-Luisa Zubizarreta (eds.), Foundational Issues in Linguistic Theory, 133−166. Cambridge, Mass.: MIT Press.
Chomsky, Noam 2013 Problems of projection. Lingua 130: 33−49.
Citko, Barbara 2005 On the Nature of Merge: External Merge, Internal Merge, and Parallel Merge. Linguistic Inquiry 36: 475−496.
Collins, Chris 1997 Local Economy. Cambridge, Mass.: MIT Press.
Dikken, Marcel den 1995 Binding, expletives, and levels. Linguistic Inquiry 26: 347−354.
Epstein, Samuel David, Erich Groat, Ruriko Kawashima, and Hisatsugu Kitahara 1998 A Derivational Approach to Syntactic Relations. New York: Oxford University Press.
Epstein, Samuel David, and T. Daniel Seely 2002 Rule applications as cycles in a level-free syntax. In: Samuel David Epstein and T. Daniel Seely (eds.), Derivation and Explanation in the Minimalist Program, 65−89. Oxford: Blackwell.
Epstein, Samuel David, and T. Daniel Seely 2006 Derivations in Minimalism. Cambridge/New York: Cambridge University Press.
Fox, Danny, and David Pesetsky 2005 Cyclic linearization of syntactic structure. Theoretical Linguistics 31: 1−46.
Franks, Steven, and Željko Bošković 2001 An argument for multiple spell-out. Linguistic Inquiry 32: 174−183.
Freidin, Robert, and Jean-Roger Vergnaud 2001 Exquisite connections: some remarks on the evolution of linguistic theory. Lingua 111: 639−666.
Gallego, Ángel 2007 Instability. Cuadernos de Lingüística del Instituto Universitario Ortega y Gasset 14: 87−99.
Groat, Erich 1997 A derivational program for syntactic theory. Ph.D. dissertation, Harvard University.
Grohmann, Kleanthes K. 2003 Prolific Domains: On the Anti-Locality of Movement Dependencies. Amsterdam: John Benjamins.
Hauser, Marc, Noam Chomsky, and W. Tecumseh Fitch 2002 The Faculty of Language: What is it, who has it, and how did it evolve? Science 298: 1569−1579.
Hinzen, Wolfram 2006 Mind Design and Minimal Syntax. Oxford: Oxford University Press.
Holmberg, Anders 1986 Word order and syntactic features in the Scandinavian languages and English. Ph.D. dissertation, University of Stockholm.
Hornstein, Norbert 2009 A Theory of Syntax. Minimal Operations and Universal Grammar. Cambridge: Cambridge University Press.
Hornstein, Norbert, Jairo Nunes, and Kleanthes K. Grohmann 2005 Understanding Minimalism. Cambridge: Cambridge University Press.
Katz, Jerry, and Thomas Bever 1976 The fall and rise of empiricism. In: Thomas Bever, Jerry Katz and D. T. Langendoen (eds.), An Integrated Theory of Linguistic Ability, 11−64. New York: Thomas Y. Crowell Company.
Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Landau, Idan 2006 Chain Resolution in Hebrew V(P)-Fronting. Syntax 9: 32−66.
Lappin, Shalom, Robert Levine, and David Johnson 2000a The structure of unscientific revolutions. Natural Language and Linguistic Theory 18: 665−671.
Lappin, Shalom, Robert Levine, and David Johnson 2000b The revolution confused: a response to our critics. Natural Language and Linguistic Theory 18: 873−890.
Lasnik, Howard 1993 Lectures on Minimalist Syntax. University of Connecticut Occasional Papers in Linguistics 1. [Reprinted in: Howard Lasnik 1999 Minimalist Analysis. Oxford: Blackwell.]
Lasnik, Howard, and Terje Lohndal 2010 Government-Binding/Principles and Parameters Theory. Wiley Interdisciplinary Reviews: Cognitive Science 1: 40−50.
Lasnik, Howard, and Juan Uriagereka, with Cedric Boeckx 2005 A Course in Minimalist Syntax. Foundations and Prospects. Oxford: Blackwell.
Legate, Julie Anne 2003 Some interface properties of the phase. Linguistic Inquiry 34: 506−516.
Martin, Roger 1999 Case, the Extended Projection Principle, and Minimalism. In: Samuel David Epstein and Norbert Hornstein (eds.), Working Minimalism, 1−25. Cambridge, Mass.: MIT Press.
Moro, Andrea 2000 Dynamic Antisymmetry. Cambridge, Mass.: MIT Press.
Müller, Gereon 2010 On deriving CED effects from the PIC. Linguistic Inquiry 41: 35−82.
Nevins, Andrew 2005 Derivations without the Activity Condition. In: Martha McGinnis and Norvin Richards (eds.), Perspectives on Phases, 283−306. (MITWPL 49) Cambridge, Mass.: MIT Press.
Newmeyer, Frederick J. 2003 Review of Chomsky, “On nature and language”; Anderson and Lightfoot, “The language organ”; Bichakjian, “Language in a Darwinian perspective”. Language 79: 583−599.
Nunes, Jairo 1999 Linearization of chains and phonetic realization of chain links. In: Samuel David Epstein and Norbert Hornstein (eds.), Working Minimalism, 217−249. Cambridge, Mass.: MIT Press.
Pollock, Jean-Yves 1989 Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry 20: 365−424.
Reinhart, Tanya 1995 Interface strategies. OTS Working Papers in Linguistics. Utrecht.
Rezac, Milan 2003 The fine structure of Cyclic Agree. Syntax 6: 156−182.
Rezac, Milan 2004 Elements of Cyclic Syntax: Agree and Merge. Ph.D. dissertation, University of Toronto.
Richards, Marc 2004 Object shift and scrambling in North and West Germanic: A case study in symmetrical syntax. Ph.D. dissertation, University of Cambridge.
Richards, Marc 2007 On feature-inheritance: an argument from the Phase Impenetrability Condition. Linguistic Inquiry 38: 563−572.
Richards, Norvin 1999 Featural cyclicity and the ordering of multiple specifiers. In: Samuel David Epstein and Norbert Hornstein (eds.), Working Minimalism, 127−158. Cambridge, Mass.: MIT Press.
Riemsdijk, Henk van, and Edwin Williams 1981 NP-structure. Linguistic Review 1: 171−217.
Rizzi, Luigi 1990 Relativized Minimality. Cambridge, Mass.: MIT Press.
Saito, Mamoru, and Naoki Fukui 1998 Order in phrase structure and movement. Linguistic Inquiry 29: 439−474.
Uriagereka, Juan 1998 Rhyme and Reason. Cambridge, Mass.: MIT Press.
Uriagereka, Juan 1999 Multiple spell-out. In: Samuel David Epstein and Norbert Hornstein (eds.), Working Minimalism, 251−282. Cambridge, Mass.: MIT Press.
Uriagereka, Juan 2008 Syntactic Anchors: On Semantic Structuring. Cambridge: Cambridge University Press.
Marc Richards, Belfast (UK)
25. Lexical-Functional Grammar

1. Introduction
2. LFG basics: c- and f-structure
3. Syntactic phenomena
4. Argument structure
5. Interfaces/other projections
6. OT-LFG
7. Computational issues and resources
8. Psycholinguistic research
9. Conclusion
10. References (selected)
Abstract

LFG is a constraint-based theory with the goal of combining linguistic insight with computational tractability and implementability. As LFG separates surface form from functional dependency, restrictive assumptions about configurationality dominate analyses. LFG has been used to analyze diverse phenomena, including discontinuous constituents and long-distance dependencies in a large number of typologically diverse languages. As a result, a broad range of theoretical, descriptive and computational linguists work within LFG.
1. Introduction

Lexical-Functional Grammar (LFG) took shape in the late 1970s when Joan Bresnan’s linguistic concerns about the continued viability of Transformational Grammar met up with Ron Kaplan’s ideas about psycholinguistics and computational modelling. The collection of papers in Bresnan (1982b) sets out the fundamental ideas of LFG. The theory
has since been extended to include new ideas and cover more data from a wide array of languages, but the fundamental ideas put forth in the late 1970s and early 1980s continue to be valid. LFG has the goal of combining linguistic sophistication with computational implementability (Dalrymple et al. 1995). A broad range of theoretical, descriptive and computational linguists work within LFG, with some work also being done in psycho- and neurolinguistics. Several current textbooks are available (Bresnan 2001; Dalrymple 2001; Falk 2001), as is a description of a major computational grammar development effort (Butt et al. 1999).

This chapter is structured as follows. Section 2 discusses the basic LFG architecture. With just this basic architecture, interesting linguistic phenomena can be dealt with simply and elegantly. This is demonstrated in section 3, where we provide sample analyses for some core syntactic phenomena. In section 4, we introduce LFG’s linking theory and its application to data that have been discussed centrally within LFG. In section 5, we discuss interfaces to other modules of grammar, such as prosody, information-structure, semantics and morphology, as well as newer developments with respect to combining Optimality Theory (OT) with LFG. The chapter closes with a look at computational issues and resources in section 7.
2. LFG basics: c- and f-structure

LFG posits two distinct syntactic representations: c(onstituent)-structure, which encodes surface precedence and dominance in a tree structure, and f(unctional)-structure, which encodes grammatical relations and other syntactic features in an attribute-value matrix that is basically a dependency representation. Simplified c- and f-structures for a simple transitive sentence are shown in (1). Throughout this chapter f-structures are simplified for expository purposes. In many instantiations of LFG, all semantically relevant material such as information about telicity, definiteness or animacy is recorded in the f-structure to then become part of the semantics. See (63) for how the different levels of representation feed into the semantic meaning of a sentence.

(1)
a. Yassin will watch the movie.
b. c-structure [tree diagram not reproduced]
c. f-structure [attribute-value matrix not reproduced]
Differences in word order are reflected directly at the c-structure: sentences with different word orders may therefore correspond to identical, or extremely similar, f-structures, especially in “free” word order languages. (Languages vary as to ordering possibilities and as to how word order is connected up to information structural differences. See section 5.2 for a short discussion of proposals within LFG which integrate a representation of information-structure.) For example, the German equivalent of (1) can have (at least) the two forms in (2). (2)
a. Yassin wird den           Film      ansehen.   [German]
   Yassin will the.M.SG.ACC  film.M.SG watch
   ‘Yassin will watch the movie.’

b. Den           Film      wird Yassin ansehen.   [German]
   the.M.SG.ACC  film.M.SG will Yassin watch
   ‘Yassin will watch the movie.’
These will be analyzed with different c-structures, but identical f-structures except for the choice of sentence topic, cf. Yassin in (2a); the movie in (2b). Indeed, for applications involving machine translation, f-structure is considered to be the appropriate level to work with, because this level of representation abstracts away from information that is too language particular, but still contains enough syntactic information to be useful (e.g., Kaplan et al. 1989; Sadler 1993; Frank 1999; Riezler and Maxwell 2006; Graham et al. 2009). While we have used a very simplified c-structure representation in (1), LFG does assume a version of X′-theory that goes back to Bresnan (1977) and that includes restrictions on the correspondence between different types of c-structure positions and the grammatical functions they can correspond to in the f-structure. For current assumptions about c-structural representations and constraints on c-structure, see Bresnan (2001); Asudeh and Toivonen (2009) and references therein. F-structures are referred to as a projection from the c-structure because they are related to the c-structure via a formal system of annotations. A typical (simplified) example for a fragment of English is shown in (3). The c-structure is built up on the basis of phrase structure rewrite rules. The rule in (3a), for example, says that a sentence consists of two major constituents, an NP and a VP. In effect, the S (sentence) is rewritten as an NP and a VP. Rewrite rules are part of the common heritage of modern generative syntactic theories (cf. Chomsky 1965) and continue to be a standard part of computational linguistics. As the mathematical basis of LFG is defined in terms of model theory, rewrite rules as in (3) are formally realized as a system of constraints that are satisfied by linguistic structures; see Kaplan (1995) and Pullum and Scholz (2001) for further discussion. (3)
a. S  →  NP           VP
         (↑SUBJ)=↓    ↑=↓

b. VP →  V            NP
         ↑=↓          (↑OBJ)=↓
The phrase structure rules in (3) are annotated with functional equations. These equations provide the basis for computing the f-structure (information about number, person, tense,
etc. is part of the lexical entries of the nouns and the verb in our example). The up arrow refers to the f-structure of the mother node, i.e., VP in (3b), the down arrow references the current node, i.e., V or NP in (3b). The annotations relate the c-structure in (1b) and the f-structure in (1c) via a formal mathematical projection φ. Without going into too much detail, the representations in (4)−(6) show how the information provided by the annotations is projected onto the f-structure. In fact, the up and down arrows stand for variables, which have been instantiated by integers in (4)−(6). These arbitrary integers are used to label pieces of the f-structure. (4)
(2 PRED) = PETER
    NP                    2: [PRED PETER]
    |
    Peter

(5) (4 PRED) = DRINK<SUBJ, OBJ>
    VP                    4: [PRED DRINK<SUBJ, OBJ>]
    |
    drinks

(6) [f-structure corresponding to the S node, labeled 1; diagram not reproduced]
The f-structure corresponding to the NP node is labeled 2 in (4), the f-structure corresponding to the S node is labeled 1 in (6). The functional annotations are treated as equations in a mathematical system, which need to be solved. Solving the set of equations is equivalent to combining bits of f-structural information into one large f-structure as in (7).

To return to our simple example, the picture in (7) shows the correspondence between the c-structure and f-structure via the φ-function. That is, the f-structure annotations on the c-structure have all been solved. The arrows from the individual c-structure nodes to parts of the f-structure are an informal illustration of the correspondence function. Note that several c-structure nodes may correspond to a single part of the f-structure.

LFG is a constraint-based theory. One effect of this is that when information from different parts of the c-structure contribute to a single part of the f-structure, the different pieces of information must be unified with one another and must therefore be compatible. For example, if a verb specifies information about its subject’s person and number, these values (e.g., third person singular) must be compatible with the values for person and number provided by the subject noun. For a sentence like The boy hops. the subject f-structure would be as in (8a) where the number and person information are specified by both the noun boy and the verb hops. An ungrammatical sentence like The boys hops. would provide conflicting number values, which is shown informally in (8b).
(7) [diagram: the c-structure of (1) mapped onto its f-structure via the φ-function; several c-structure nodes map to the same piece of the f-structure]
(8) a. [subject f-structure for The boy hops., with person and number specified by both boy and hops]
    b. [number clash for *The boys hops.: plural from boys vs. singular from hops]
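The unification step just described can be sketched concretely. The following Python function is a minimal illustration of my own (not code from any actual LFG system); it treats f-structures as nested dicts and deliberately ignores refinements such as structure sharing and the semantic indices on PRED values discussed below:

```python
def unify(f, g):
    """Unify two feature structures (nested dicts).

    Returns the merged structure, or None when two atomic values
    clash (as with the singular/plural conflict in (8b))."""
    result = dict(f)
    for attr, g_val in g.items():
        if attr not in result:
            result[attr] = g_val
        elif isinstance(result[attr], dict) and isinstance(g_val, dict):
            sub = unify(result[attr], g_val)
            if sub is None:              # clash inside an embedded f-structure
                return None
            result[attr] = sub
        elif result[attr] != g_val:      # incompatible atomic values
            return None
    return result

# 'The boy hops.': noun and verb both contribute third person singular
from_noun = {"PRED": "boy", "NUM": "sg", "PERS": 3}
from_verb = {"NUM": "sg", "PERS": 3}
print(unify(from_noun, from_verb))   # {'PRED': 'boy', 'NUM': 'sg', 'PERS': 3}

# '*The boys hops.': NUM pl (from boys) clashes with NUM sg (from hops)
print(unify({"PRED": "boy", "NUM": "pl", "PERS": 3}, from_verb))   # None
```

The sketch makes the constraint-based character of the formalism tangible: information from different c-structure nodes either merges into one consistent structure or the analysis fails.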
The semantically unique PREDicate values provided by most words cannot unify, even if they appear identical superficially. This is because they carry a semantic index as part of their value and this unique index cannot unify with a different index. The index reflects the semantic reference of predicates, which is considered to be unique.

F-structures must obey three constraints: uniqueness, completeness, and coherence. Uniqueness states that for each f-structure, there is a single attribute of a given type, e.g., only one subject, only one number; if more than one attribute of a given type is required, it must be in a set (see section 3.2 on modifiers and section 3.3 on coordination). Completeness requires every argument of a predicate to be filled, e.g., if a verb requires a subject and an object, both must be present in the f-structure (*Yassin devoured.). Coherence requires governable grammatical functions, i.e. the ones licensed as arguments of predicates, to be licensed by a predicate if they appear, e.g., an object cannot appear in an f-structure unless the predicate requires it (*Yassin slept Nadya.).

The f-structures we have seen so far have included two basic grammatical relations: SUBJ(ect) and OBJ(ect). In LFG, grammatical relations are assumed as part of the syntactic inventory of every language and are referred to as grammatical functions (GF) to indicate their functional status, which is the relation of arguments and predicational elements to one another. Because GFs are assumed to not be subject to crosslinguistic variation, but are a basic part of the syntactic description language for every language, they are represented at f-structure (and only at f-structure). However, tests for different types of grammatical functionhood may play out differently across languages; see the discussion in Dalrymple (2001) for the LFG perspective (also Kroeger 2005; Croft 2001; Evans and Levinson 2009 for a different perspective). LFG assumes the GFs in (9).

(9)
Grammatical Functions: SUBJ, OBJ, OBJθ, OBL(ique)θ, COMP(lement), XCOMP(lement), ADJUNCT
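The completeness and coherence conditions lend themselves to a mechanical check. Below is a deliberately simplified Python sketch (hypothetical; real LFG implementations are far richer, handling sets, control, and the full attribute inventory), in which a PRED value is modelled as a pair of a relation name and the list of GFs it governs:

```python
# Governable grammatical functions, cf. (9); ADJUNCT is not governable.
GOVERNABLE = {"SUBJ", "OBJ", "OBJθ", "OBLθ", "COMP", "XCOMP"}

def well_formed(fs):
    """Check completeness and coherence of a flat f-structure.

    fs["PRED"] is a (relation, required_gfs) pair: every required GF must
    be present (completeness), and every governable GF that is present
    must be required by the predicate (coherence)."""
    _relation, required = fs["PRED"]
    present = {attr for attr in fs if attr in GOVERNABLE}
    complete = set(required) <= present   # no missing arguments
    coherent = present <= set(required)   # no unlicensed arguments
    return complete and coherent

devour = ("devour", ["SUBJ", "OBJ"])
sleep = ("sleep", ["SUBJ"])
yassin = {"PRED": ("Yassin", [])}
nadya = {"PRED": ("Nadya", [])}

print(well_formed({"PRED": devour, "SUBJ": yassin, "OBJ": nadya}))  # True
print(well_formed({"PRED": devour, "SUBJ": yassin}))                # False: *Yassin devoured. (incomplete)
print(well_formed({"PRED": sleep, "SUBJ": yassin, "OBJ": nadya}))   # False: *Yassin slept Nadya. (incoherent)
```

The two set inclusions mirror the two conditions directly: completeness is "required ⊆ present", coherence is "present ⊆ required".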
Dalrymple (2001: 11−27) provides a useful discussion of the GFs as well as several syntactic tests by which they can be identified. COMP and XCOMP represent clausal arguments. Canonically, the COMP is used for finite clauses (e.g., the English that-clause) while the XCOMP encodes nonfinite, open embedded clauses. An example is the to win in John wants to win. Here, the embedded verb win does not have an overt subject, rather, its subject is controlled by the matrix subject John (see Bresnan 1982a for the classic discussion on control and complementation). Finite clauses like the English that-clause are considered to be closed because all of the arguments are realized internal to the clause (i.e. there is no control from the matrix clause).

The canonical example for OBJθ is the indirect dative object in languages like German and the second object in the English double object construction (10a). The OBL is used for prepositional phrases which are arguments of the verb. A classic example is the English indirect to object (10b). Other instances of OBL occur with verbs of motion as in (11), where the location is subcategorized for by the verb.

(10) a. Kim gave the dog a bone.
     b. Kim gave a bone to the dog.

(11) Chris went to/in/behind the house.

The OBJ and OBL are subscripted with a θ to indicate that these GFs are sensitive to thematic role information. That is, PP arguments as in (11) generally reflect a specific spatial semantics. Similarly, indirect objects are generally tied to goals. In fact, there are regularities between the semantics of arguments and their syntactic expression in terms of GFs. Linking Theories in general attempt to capture these regularities (section 4).

We close this section by noting that the effect of LFG’s projection architecture is that the levels of representation constrain each other mutually. That is, an analysis can only be successful if the f-structure information is complete and consistent, and if the phrase structure rules license the structure.
Because the relation between c-structure and f-structure is stated in terms of a mathematical projection (φ), its inverse can also be computed. That is, not only can one see into the f-structure from the c-structure, but the f-structure can refer back to the c-structure. No level is truly primary and no information is ever lost via derivations. LFG thus falls under the label of constraint-based declarative theories of syntax (Head-driven Phrase Structure Grammar also falls under this category; see the HPSG chapter in this volume). In contrast to the fundamental derivational assumptions of GB/MP (see the Minimalism chapter in this volume), LFG assumes no derivations from one structure to another. Indeed, this is one of the characteristics which makes LFG computationally tractable. With the basic LFG architecture and constraints on the basic c- and f-structure representations in mind, the next section examines how LFG analyzes several well known syntactic phenomena.
3. Syntactic phenomena

For purposes of concrete illustration, we go through several core syntactic phenomena in this section and show how they are analyzed in LFG. In many cases, the analyses will share aspects with other syntactic theories, especially within generative syntax and
particularly within constraint-based frameworks. However, the analyses also show marked differences, which are driven by the unique architecture and assumptions of LFG. The phenomena discussed include long-distance dependencies, modifiers, coordination, agreement, and case. Argument structure and other projections are described in subsequent sections.
3.1. Long-distance dependencies

3.1.1. Functional control: Equi and raising

Functional control is described in detail in Dalrymple (2001), Bresnan (2001) and Falk (2001); the seminal article is Bresnan (1982a). In functional control, a single part of an f-structure plays two or more roles in the f-structure. Classic examples are so-called equi verbs as in (12), where the subject of want is identical to the subject of eat, but is only expressed once, namely in the matrix clause. (Equi verbs can also be analyzed as anaphoric control, see section 3.1.3.)

(12) a. Kim wants to eat beans.
     b. [f-structure in which the matrix SUBJ and the SUBJ of the XCOMP are identified; diagram not reproduced]
This functional sharing is expressed at the f-structure by identifying two parts of an f-structure with one another. In (12) and also generally in the theoretical literature, this is graphically illustrated by having a line that connects the two parts of the f-structure with one another. Bresnan (1982a) shows that this type of functional control is a lexical property. That is, the type of matrix verb determines the type of functional control that happens. The verb want, for example, shows both subject and object control, where the matrix object controls the embedded subject, as in (13a). The verb persuade shows object control, as in (13b), but the verb promise is a subject control verb, as in (13c).

(13) a. Kim wanted Sandy to eat beans.
     b. Kim persuaded Sandy to eat beans.
     c. Kim promised Sandy to eat beans.

The X in the XCOMP signifies that it is an open function which expects its subject to be controlled. Given that this type of functional control is a lexical phenomenon, the lexical entries of the verbs have to state what kind of functional control the verb allows. A
typical equation, found in subject control verbs like promise, is shown in (14). Object control verbs would contain the equation in (15), and verbs like want contain a disjunction which allows for both options. The equations basically state that the subject (or object) of the matrix verb is the same as the subject of the embedded clause.

(14) (↑ SUBJ) = (↑ XCOMP SUBJ)
(15) (↑ OBJ) = (↑ XCOMP SUBJ)
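Structure sharing of this kind can be sketched in Python by letting the matrix and embedded subject be literally the same object (the lexical entries and structures below are invented toy examples, not an actual implementation):

```python
# Functional control as token identity: the subject-control equation
# (up SUBJ) = (up XCOMP SUBJ) is modelled by making the matrix SUBJ and the
# embedded SUBJ one and the same object. Toy structures, illustrative only.

kim = {"PRED": "Kim"}

# f-structure skeleton for "Kim promised Sandy to eat beans"
promise = {
    "PRED": "promise<SUBJ,OBJ,XCOMP>",
    "SUBJ": kim,
    "OBJ": {"PRED": "Sandy"},
    "XCOMP": {
        "PRED": "eat<SUBJ,OBJ>",
        "SUBJ": kim,                 # shared with the matrix SUBJ
        "OBJ": {"PRED": "beans"},
    },
}

# One PRED fills two roles in the larger f-structure: identity, not copying.
assert promise["SUBJ"] is promise["XCOMP"]["SUBJ"]
```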
Early generative syntax argued for a distinction between equi verbs of the type above and raising verbs as in (16). The difference is that in equi verbs, the relevant arguments are licensed thematically by both the matrix and the embedded verb. For verbs like seem, on the other hand, it appears that no semantic selectional restrictions are imposed on the subject and that therefore an expletive subject as in (16) can be used. Verbs like seem were therefore identified as raising verbs.

(16) a. There seems to be a unicorn in the garden.
     b. It seems to be raining.

In LFG, this difference in semantic content (or thematic content, as it has also been called) is expressed by writing the non-thematic (semantically empty) argument outside of the angle brackets of the subcategorization frame. This is illustrated in (17b).

(17) a. David seemed to yawn.
     b.
The formal mechanism of functional control, by which two parts of an f-structure are identified with one another, however, remains the same. So the lexical entry of seem would contain the functional equation in (14). The main distinction posited in LFG between equi and raising verbs is a semantic one that is expressed within the subcategorization frame of the verb in question.
3.1.2. Functional uncertainty

So far, we have seen instances of functional control that are local. However, functional control can occur over several layers of embedding, as exemplified by the topicalization of beans in (18).

(18) Beans, John asked Sandy to persuade Kim to eat.
Other dependencies that can be local as well as long-distance are wh-questions and relative clauses in languages like English. These dependencies, in which the interrogative, relative, or topicalized phrases do not appear in their canonical c-structure position, are also analyzed via functional control in LFG. Often, the displaced constituent can play different roles in different sentences, and these roles may be at different levels of embedding (19).

(19) a. Who hopped?                         (↑ FOCUS-INT) = (↑ SUBJ)
     b. Who did John say hopped?            (↑ FOCUS-INT) = (↑ COMP SUBJ)
     c. What did he eat?                    (↑ FOCUS-INT) = (↑ OBJ)
     d. What did John say that he ate?      (↑ FOCUS-INT) = (↑ COMP OBJ)
     e. What did he want to eat?            (↑ FOCUS-INT) = (↑ XCOMP OBJ)
The control equations needed to identify the role of the displaced constituent with its GF can be listed in any individual case. However, these possibilities are in theory infinite. (In practice, limits are imposed by performance restrictions; for more discussion of this topic see Pullum and Scholz 2010.) LFG captures this empirical fact via functional uncertainty equations (Kaplan and Zaenen 1989). For example, Kaplan and Zaenen (1989) proposed that the rule for English topicalization is as in (20), which says that there is an XP or S′ which precedes the main sentence, that this XP or S′ is the overall topic of the clause, and that it may correspond to some GF that may be arbitrarily deeply embedded in a series of COMPs and/or XCOMPs. The technical definition of functional uncertainty is provided by Kaplan and Zaenen (1989: 26): Functional uncertainty: (f α) = v holds iff f is an f-structure, α is a set of strings, and for some s in the set of strings α, (f s) = v. When the string s is longer than one: (f as) ≡ ((f a) s) for a symbol a and a (possibly empty) string of symbols s; (f ε) ≡ f, where ε is the empty string. (20)
S′  →   XP or S′                                         S
        (↑ TOPIC) = ↓                                    ↑ = ↓
        (↑ TOPIC) = (↑ {COMP, XCOMP}* (GF − COMP))
In the analysis of a given sentence, the choice of path will depend on how other arguments satisfy the subcategorization frames of the verbs. For example, in (19d) what cannot be the subject of say or eat because they already have subjects and coherence would be violated; however, eat has no object and so what can be equated with its object position. Functional uncertainty paths vary cross-linguistically, reflecting island conditions and other language-particular constraints. In addition, different phenomena within a language (e.g., interrogatives, topicalization, relative clauses) may have different functional uncertainty paths. See Dalrymple (2001) for detailed discussion of functional uncertainty paths in a number of languages. Some versions of LFG theory posit traces for certain constructions in certain languages, while others never posit traces in the c-structure. Bresnan (2001) and Falk (2001, 2007) both argue for traces in LFG for capturing phenomena such as English weak-crossover constructions and wanna contraction. (But see Dalrymple et al. 2001 for an alternative account of weak crossover in terms of linear prominence constraints that do not rely on traces.) Such accounts generally posit traces only in highly configurational languages like English, which use structural positions to encode grammatical functions instead of case marking or head marking.
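A minimal solver for a functional uncertainty path such as {COMP, XCOMP}* GF can be sketched as a recursive enumeration over a toy f-structure (the dictionary encoding and the attribute inventory below are invented for illustration; real systems solve such regular path expressions far more efficiently):

```python
# Enumerate every grammatical function reachable through a (possibly empty)
# chain of COMPs and XCOMPs, i.e. the solutions of {COMP, XCOMP}* GF.
# Toy representation; illustrative only.

BODY = {"COMP", "XCOMP"}                 # attributes the path may pass through
GFS = {"SUBJ", "OBJ", "COMP", "XCOMP"}   # toy inventory of grammatical functions

def uncertainty_paths(fstr, prefix=()):
    """Yield attribute paths of the form {COMP, XCOMP}* GF into fstr."""
    for attr, value in fstr.items():
        if attr in GFS:
            yield prefix + (attr,)
        if attr in BODY and isinstance(value, dict):
            yield from uncertainty_paths(value, prefix + (attr,))

# f-structure skeleton for "What did John say that he ate?" (19d)
f = {"SUBJ": {}, "COMP": {"SUBJ": {}, "OBJ": {}}}
paths = set(uncertainty_paths(f))
# ("COMP", "OBJ") is among the solutions: the displaced constituent may be
# equated with the object of the embedded COMP, as in (19d).
```

Which of the enumerated paths survives in an actual analysis is then decided by completeness and coherence, as described above.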
3.1.3. Anaphoric control

Functional control involves a single PRED filling two roles in the larger f-structure. However, not all cases of control are of this type. Consider (21), which at first sight would appear to involve an embedded XCOMP, just as in the examples discussed under functional control. However, here the subject (David) does not necessarily control the subject of the embedded clause. That is, there is arbitrary anaphoric control in that the person leaving is some person that the syntax cannot determine and that needs to be computed on the basis of discourse or world knowledge. This computation is thought of as being on a par with anaphora resolution. (LFG posits cross-linguistic binding domains within sentences for anaphora. Many accounts define appropriate f-structure domains for the binding relation in conjunction with restrictions on precedence in the c-structure. These discussions only apply tangentially to anaphoric control.)

(21) David gestured to leave.

An analysis of (21) is provided in (22). The difference from functional control is that the embedded subject is not empty, but filled with a null pronominal subject (PRO) and, since the subject is filled, the embedded clause is analyzed as a COMP (a closed function) rather than an XCOMP (an open function).

(22) David gestured to leave.
Anaphoric control can be obligatory in certain constructions. Equi constructions have been analyzed as obligatory anaphoric control. In the case of equi, the overt controller controls a null pronominal subject of the embedded clause. This controlled subject is absent at c-structure but present in the f-structure. An example is shown in (23) from Dalrymple (2001: 324).
(23) David tried to leave.
Obligatory anaphoric control has also been posited for English tough-constructions (Dalrymple and King 2000), as in (24).

(24) Moths_i are tough to kill pro_i.
3.2. Modifiers

Modifiers or adjuncts are not subcategorized by predicates, and in general multiple modifiers are allowed in a given f-structure. In order to satisfy the uniqueness requirement of f-structures, modifiers belong to a set, even when there is only one of them. Adjectival modification is illustrated in (25), clausal modification in (26).

(25) the happy little girl
(26) Monday, Nadya walked quickly.
C-structure analyses of modifiers can involve adjunction, but modifiers may also appear in much flatter structures. Which type of c-structure is appropriate depends on the language and is generally determined by constituency tests such as coordination possibilities. The φ-mapping between c-structure and f-structure for modifier sets involves specifying them as elements of the set, as in the rule for adjectival modifiers in (27), where the Kleene * represents zero or more instances of the AP. (27)
N′  →   AP*                     N
        ↓ ∈ (↑ ADJUNCT)         ↑ = ↓
There are proposals within LFG for constraining the mapping from c- to f-structure in such a way to allow modifier annotations only in specific configurations. See, for example, Bresnan (2001).
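The set-valued ADJUNCT feature can be sketched as follows (a toy Python model; the set is represented as a list because its members are dictionaries, and all names are invented for illustration):

```python
# Adjuncts as a set-valued feature: each AP under the annotation
# "down ∈ (up ADJUNCT)" adds its f-structure to the head's ADJUNCT set, so
# multiple modifiers never clash with the uniqueness requirement.
# The set is modelled as a list because its members are (unhashable) dicts.

def add_adjunct(fstr, adjunct):
    """Implement the membership annotation: the adjunct's f-structure becomes
    a member of the head's ADJUNCT set (created on demand)."""
    fstr.setdefault("ADJUNCT", []).append(adjunct)
    return fstr

girl = {"PRED": "girl", "NUM": "SG"}
add_adjunct(girl, {"PRED": "happy"})
add_adjunct(girl, {"PRED": "little"})
# 'the happy little girl': both adjectives sit in one ADJUNCT set, so neither
# competes with the other for a unique attribute slot.
```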
3.3. Coordination

Coordination also involves the formation of sets in f-structure. A set will have an element for each conjunct in the coordination. The canonical example of this is resolved person and number features in noun phrase coordination, where the coordinated set may have a different person and number than the individual elements. The features which can be features of the set itself are called non-distributive features. Consider the sample coordinated NP in (28).

(28) the dog and me
The difference between distributive and non-distributive features is very important for phenomena like agreement (section 3.4) and case (section 3.5). If a requirement is placed on a coordinated f-structure for a distributive feature, then that requirement will distribute to each element of the set, and all the elements must unify with that requirement. For example, if a verb requires a dative object and the object is coordinated, then each element in the coordinated object must be dative because case is distributive. In contrast, number and person are generally not distributive, and so a verb that requires a plural subject will be compatible with an f-structure as in (28) in which the set is plural even though each conjunct is singular.
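The distributive/non-distributive split can be sketched as follows (a toy Python model of (28); the feature inventory and representation are invented for illustration):

```python
# Distributive vs. non-distributive features in a coordinate structure:
# the set of conjuncts plus features (like NUM and PERS) that belong to the
# set itself. Toy representation; illustrative only.

coord = {
    "conjuncts": [
        {"PRED": "dog", "NUM": "SG", "PERS": 3, "CASE": "NOM"},
        {"PRED": "pro", "NUM": "SG", "PERS": 1, "CASE": "NOM"},
    ],
    # resolved, non-distributive features of the whole coordination
    "NUM": "PL",
    "PERS": 1,
}

DISTRIBUTIVE = {"CASE"}  # CASE distributes to each conjunct; NUM and PERS do not

def satisfies(coord, feature, value):
    """Check a feature requirement against a coordinated f-structure."""
    if feature in DISTRIBUTIVE:
        # a distributive requirement must unify with every conjunct
        return all(c.get(feature) == value for c in coord["conjuncts"])
    # a non-distributive requirement checks the set's own (resolved) value
    return coord.get(feature) == value

# A verb requiring a plural subject is satisfied by the set's resolved NUM,
# even though each conjunct is singular; a nominative requirement distributes.
```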
3.4. Agreement

Agreement is generally analyzed as an instance of feature unification in LFG. Both elements in the agreement relation will specify a value for a feature, e.g., singular number, and these values must be compatible, see (8). This can be done for verb-argument agreement as well as for adjective-noun agreement. Note that the feature can either be present in only one place, e.g., on the subject noun phrase, or in two places with a constraint that the values must be identical, e.g., an adjective may state that its number value is the same as that of its head noun. A form can show agreement in one of two ways. Building on work in HPSG (Wechsler and Zlatić 2003), LFG posits two types of syntactic agreement features: CONCORD features are closely related to morphological form, while INDEX features are closely related to semantics. Different predicates will target specific feature sets. In general, CONCORD features are used for noun-phrase internal agreement, while INDEX features are used outside of the noun phrase. Consider (29).

(29) a. This boy and girl hop.
     b.
The verb hop agrees with the plural INDEX value of the coordinated NP, which is a resolved value from the agreement features on the individual conjuncts, while the determiner this agrees with the singular CONCORD value of each of the coordinated arguments. Recent LFG research has been interested in asymmetric agreement, especially variants of closest-conjunct agreement (Sadler 2003; Kuhn and Sadler 2007).
3.5. Case

LFG has its own theory of case, developed in a series of papers by Butt and King and by Rachel Nordlinger. LFG’s approach to case assignment is also often mediated by the mapping of a(rgument)-structure to GFs, discussed in section 4, whereby certain cases are associated with certain thematic roles or features at a-structure, e.g., Alsina’s (1996) work on Romance. We here briefly present the inventory of case types posited by Butt and King (2003, 2005) and discuss the related proposal by Nordlinger (1998). See Butt (2006, 2008) for a more complete discussion of case in LFG, including in OT-LFG (section 6).

Positional case: Positional case is associated only with syntactic information. That is, there is assumed to be a syntactic configuration which requires a particular case marking. An example of positional case is the adnominal genitive in English (see King 1995 for examples from Russian). As shown in (30), the prenominal NP position is identified as genitive as part of the positional information in the syntax (the ↑=↓ notation indicates that the noun is the head of the phrase).
(30) English Adnominal Genitives (simplified structure)

     NP  →   NP                  N
             (↓ CASE) = GEN      ↑ = ↓
             Boris's             hat
Structural and default case: Structural case is often an instance of default case and hence functions as the Elsewhere Case (Kiparsky 1973). For languages which require that all NPs have case, this can be stated as in (31a), analogous to the Case Filter in GB (Rouveret and Vergnaud 1980). If a given NP is not already associated with case due to some other part of the grammar, then default case assignment principles as in (31b−c) apply.

(31) a. Wellformedness principle: NP: (↑ CASE)
     b. Default: (↑ SUBJ CASE) = NOM
     c. Default: (↑ OBJ CASE) = ACC

Default case only applies to the core grammatical relations subject and object. The other grammatical relations tend to involve specialized semantics and therefore do not involve defaults. The content of the default assignment principles may vary from language to language, but the existence of a default case for subjects and objects is expected to hold crosslinguistically.

Quirky case: The term quirky case is used only for those situations in which there is no regularity to be captured: the case assignment is truly exceptional to the system and no syntactic or semantic regularities can be detected. Under the assumption that case morphology plays a large role in the fundamental organizing principles of language, quirky case is expected to be fairly rare. Instead, case morphology is part of a coherent system, with only a few exceptions along the way. These exceptions are generally due to historical reasons and have not been eradicated or reanalysed as part of a regularization of the case system (Butt 2006).

Semantic case: The defining characteristics of semantic case in the sense of Butt and King (2003) are semantic predictability and subjection to syntactic restrictions, such as being confined to certain GFs. Indeed, most cases cannot appear on just any GF, but are restricted to one or two. Under Butt and King’s (2003) analysis, most instances of case involve semantic case.
This is because the bulk of the crosslinguistic case marking phenomena involve an interaction between syntactic and semantic constraints (including the quirky case found in Icelandic, which is actually mostly subject to semantic regularities). Consider the accusative/dative ko in Urdu. On direct objects, it signals specificity. That is, a combination of syntactic (direct objects only) and semantic factors (specificity) are involved. The ko can also appear on subjects and on indirect objects, as in (32). In either case, the dative is associated with a more or less abstract goal. Within Butt and King’s system, the ko is therefore analysed as a semantic case. Butt and King furthermore pursue a fundamentally lexical semantic approach to case. That is, lexical entries are posited for individual case markers and these lexical entries contain the bulk of the information associated with the presence of the case markers. The lexical entry for ko, for example, is shown in (32).
(32) Accusative ko:                       Dative ko:
     (↑ CASE) = ACC                       (↑ CASE) = DAT
     (OBJ ↑)                              (GOAL ↑_a-str)
     (↑_sem-str SPECIFICITY) = +          (SUBJ ↑) ∨ (OBJ_go ↑)
The entry for ko specifies that it can be used either as an accusative or a dative. As an accusative, it can only appear on a direct object and is associated with specificity in the semantic projection. Note the use of the ↑ in the lexical entry of the case marker: the second line involves inside-out functional designation (Dalrymple 1993, 2001); the ↑ following the specification of a GF formulates a requirement that the constituent should be analysed as an object. As a dative, it can only appear on either experiencer subjects or indirect objects (OBJ_go) and requires a goal argument at a-structure. (32) illustrates that the information associated with case morphology interacts with information at several levels of representation, e.g., f-structure, semantic projection (section 5.4), and a-structure (section 4). (Inside-out functional designation can also be applied over functional uncertainty paths, just like the outside-in functional control discussed in section 3.1.2.)

Constructive case: Further evidence for the above type of lexical approach to case comes from Australian languages. Nordlinger (1998, 2000) analyzes two phenomena found in Australian languages: discontinuous constituents as in (33) and case stacking as in (34). In the Wambaya example in (33), the NP big dog is a discontinuous constituent. Generally, Australian languages are known for their free word order, and in Wambaya the only requirement is that there be a finite verb in second position. (I = masculine gender class; A = transitive subject; O = object)

(33) galalarrinyi-ni  gini-ng-a          dawu  bugayini-ni.    [Wambaya]
     dog.I-ERG        3SG.M.A-1.O-NFUT   bite  big.I-ERG
     ‘The big dog bit me.’ (Nordlinger 1998: 96)
Now also consider the phenomenon of case stacking found in Martuthunira. In (34) the word thara ‘pouch’ is marked with three cases: one to show that it is signalling a location, one to show that it is part of a possessive or accompanying relation to another word (the proprietive case), and one to show that it is part of (modifying) an accusative case marked noun. The word mirtily ‘joey’ (a baby euro − a type of kangaroo) has two cases. The proprietive shows that it stands in an accompanying relationship with another (it is with the euro), and the accusative shows that it is part of (modifying) an accusative case marked noun. Finally, ‘euro’ is accusative as the direct object of the clause, while the first person pronoun (‘I’) is nominative (unmarked). (PROP = Proprietive)

(34) Ngayu  nhawu-lha  ngurnu    tharnta-a  mirtily-marta-a    [Martuthunira]
     I      saw-PST    that.ACC  euro-ACC   joey-PROP-ACC
     thara-ngka-marta-a.
     pouch-LOC-PROP-ACC
     ‘I saw the euro with a joey in (its) pouch.’ (Dench 1995: 60)
These facts prompted Nordlinger (1998) to formulate a new perspective on case. She sees morphology as constructing the syntax of the clause. For example, under her analysis, the Wambaya ergative ni carries the information that there be a subject and that it be ergative. These pieces of information are encoded as part of the lexical entry of the ergative, as shown in (35).

(35) ni:  (↑ CASE) = ERG
          (SUBJ ↑)

With this lexical approach to case, the effect of the analysis is that the combination of information from the lexical entries of big, dog and the ergative case in (35) results in the two partial f-structures shown in (36) and (37). Both the ergative dog and the big specify that they are parts of the subject because of the information associated with the ergative case marker in (35). In addition, the dog serves as the head of the phrase and the big as an adjunct which modifies it (the details of how the adjunct analysis is accomplished are left out here). (36)
(37)
These two sets of information are unified into the structure shown in (38) as a routine part of the clausal analysis within LFG. The problem of discontinuous constituents is thus solved by using the case morphology as a primary source of information about clausal structure. (38)
The same approach also serves well for instances of case stacking. Since every case marker contributes not only a value for the case feature at f-structure, but also imposes a requirement as to what GF it must appear with, the effects of case stacking can be easily explained via the lexicalist, constructive approach to case.
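The unification step that combines the two partial f-structures can be sketched as follows (a toy recursive unifier over Python dicts; the encoding of (36) and (37), including representing the adjunct as a plain attribute rather than a set, is simplified for illustration):

```python
# Unification of the partial f-structures contributed by ergative-marked
# 'dog' (the head) and 'big' (an adjunct): each case marker constructs a
# SUBJ containing its host. Toy representation; illustrative only.

def unify(f, g):
    """Unify two feature structures; raise on clashing atomic values."""
    if isinstance(f, dict) and isinstance(g, dict):
        out = dict(f)
        for attr, value in g.items():
            out[attr] = unify(out[attr], value) if attr in out else value
        return out
    if f == g:
        return f
    raise ValueError("clash: %r vs %r" % (f, g))

# partial contribution of ergative-marked 'dog', cf. (36)
f36 = {"SUBJ": {"PRED": "dog", "CASE": "ERG"}}
# partial contribution of ergative-marked 'big', cf. (37)
f37 = {"SUBJ": {"CASE": "ERG", "ADJUNCT": {"PRED": "big"}}}

f38 = unify(f36, f37)
# Both pieces end up inside a single SUBJ, cf. (38): the case morphology
# itself resolves the discontinuous constituent.
```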
4. Argument structure

In addition to the basic c- and f-structural representations, LFG’s architecture potentially allows for several other projections. One standard additional projection is the a(rgument)-structure. The a-structure encodes predicate-argument relationships in terms of thematic roles. These thematic roles are arranged in a thematic hierarchy, shown in (39) (based on Bresnan and Kanerva 1989).
(39) Thematic Role Hierarchy
     agent > beneficiary > recipient/experiencer > instrument > theme/patient > location

The GFs as introduced in (9) are also arranged in a hierarchy. Linking is assumed to prefer a mapping between the highest thematic role and the highest GF (SUBJ). The default mapping is therefore straightforward: agents should map to subjects, themes should map to objects, etc. However, because languages exhibit many phenomena where this default mapping does not apply, an explicit linking or Lexical Mapping Theory was formulated in LFG to account for systematic deviations and argument alternations. The a-structure can be formally represented as an AVM, like the f-structure (e.g., Butt 1998), but in keeping with the bulk of the literature on argument structure, representations like the following are used here: pound < agent theme >.
4.1. Standard LFG Mapping Theory

LFG’s Lexical Mapping Theory grew out of early work like Zaenen et al.’s (1985) analysis of Icelandic and German and Levin’s (1987) work on English. The discussion in this section is based on Bresnan and Zaenen (1990), Alsina and Mchombo (1993), Bresnan and Kanerva (1989), and Bresnan and Moshi (1990). Further reading and discussion can be found in Bresnan (1990) and Bresnan (1994). As in Levin’s first formulation for linking, thematic roles and GFs are cross-classified by features in standard LFG, which posits just two relevant features. The feature [±restricted] is a semantically grounded feature, which indicates whether a given thematic role or GF is sensitive to semantic restrictions. The feature [±o(bjective)] marks whether thematic roles are likely to be linked to objectlike GFs. (Alsina [1996: 19] instead proposes two different features: [±subj(ect)] and [±obl(ique)].) The [±r,±o] features classify GFs as shown in (40). The clausal COMP and XCOMP are not considered in this classification (see Berman 2003 and Dalrymple and Lødrup 2000 on the status of clausal arguments).

(40)         [−o]     [+o]
     [−r]    SUBJ     OBJ
     [+r]    OBLθ     OBJθ
The thematic roles are classified by these same features, as shown in (41) (Bresnan and Zaenen 1990).

(41) Classification of Thematic Roles
     Patientlike roles:            [−r]
     Secondary patientlike roles:  [+o]
     All others:                   [−o]

The possible correspondences between thematic roles and GFs are regulated by the Mapping Principles in (42) and wellformedness conditions, some of which are shown in (43)
(Bresnan and Zaenen 1990). The θ stands for a thematic role and θ̂ refers to the highest argument on the thematic role hierarchy.

(42) Mapping Principles
     a. Subject roles:
        (i) θ̂ [−o] is mapped onto SUBJ; otherwise:
        (ii) θ [−r] is mapped onto SUBJ
     b. Other roles are mapped onto the lowest compatible function on the markedness hierarchy, where the subject is the least marked: SUBJ < OBJ, OBLθ < OBJθ

(43) Wellformedness Conditions
     a. Subject Condition: Every (verbal) lexical form must have a subject.
     b. Function-argument biuniqueness: Each a-structure role must be associated with a unique grammatical function, and conversely.

The Function-argument biuniqueness condition in (43b) is reminiscent of GB’s θ-Criterion, which maps arguments to theta-roles. It has been challenged within LFG (Mohanan 1994; Alsina 1996) for not allowing the necessary flexibility to account for Argument Fusion in complex predicates (section 4.2). In addition, the Subject Condition may not hold universally (Bresnan 2001; Dalrymple 2001; Falk 2001). The feature classifications together with the mapping and wellformedness principles constitute the essence of LFG’s linking theory. Consider the following examples of a transitive, a passive, an unaccusative and an unergative (taken from Bresnan and Zaenen 1990: 51−52). In (44), the transitive verb pound has two arguments, an agent and a theme. These are featurally classified according to (41) and then mapped to SUBJ and OBJ straightforwardly according to the mapping principles. (44)
Passivization suppresses the highest thematic role, as shown in (45). The only argument available for linking into the syntax is the theme. This could potentially be linked either to a subject or an object, but because of the principle in (42), it is linked to the subject. (45)
(46)
The single argument of unaccusatives is a theme, as in (47). This is classified by the feature [−r] and is linked to a subject rather than an object because of the mapping principles in (42). The unaccusative situation is thus parallel to the passive in (46). In the unergative, the only argument of an unergative verb like bark is an agent. This is classified as a [−o] thematic role and links to a subject. (47)
The basics of LFG’s mapping theory are thus very simple and yet make for a very powerful system that has been used to analyze complex case marking and argument structure phenomena in Germanic, Bantu, Romance and South Asian languages. Some of the more complex phenomena are briefly described in the remainder of this section.
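A deliberately simplified sketch of this linking procedure can be given in Python (the classification and mapping functions below compress (41) and (42) into a few lines and ignore locatives, suppressed arguments, and much else; the function names are invented for illustration):

```python
# A toy rendering of the mapping principles in (42): classify thematic roles
# with the [r]/[o] features of (41), then link the highest role to SUBJ and
# the rest to the lowest compatible function. Highly simplified.

def classify(role, secondary=False):
    """Intrinsic feature classification, cf. (41)."""
    if role in ("theme", "patient"):
        return {"+o"} if secondary else {"-r"}
    return {"-o"}   # agents and other non-patientlike roles

def link(roles):
    """Map an ordered list of thematic roles (highest first) to GFs."""
    gfs = {}
    for i, role in enumerate(roles):
        feats = classify(role, secondary=(i > 1))
        if i == 0 and ("-o" in feats or "-r" in feats):
            gfs[role] = "SUBJ"           # (42a): the highest role maps to SUBJ
        elif "-r" in feats:
            gfs[role] = "OBJ"            # an unrestricted non-subject role
        elif "+o" in feats:
            gfs[role] = "OBJ-theta"      # a secondary patientlike role
        else:
            gfs[role] = "OBL-theta"      # restricted, non-objective roles
    return gfs

# transitive 'pound': agent -> SUBJ, theme -> OBJ, cf. (44)
# a sole theme (passive, unaccusative): theme -> SUBJ, cf. (45)-(47)
```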
4.2. Argument alternations and complex predicates

One reason LFG allows for a flexible mapping between thematic roles and GFs is the crosslinguistic recurrence of argument alternations. One famous alternation investigated within LFG is locative inversion in Chichewa, shown in (48) (Bresnan and Kanerva 1989; Bresnan 1994). (REC.PST = recent past)

(48) a. a-lendȏ-wo         a-na-bwér-á              ku-mu-dzi.     [Chichewa] (visitors = subject)
        2-visitor-2.those  2.SBJ-REC.PST-come-IND   17−3-village
        ‘Those visitors came to the village.’ (Bresnan and Kanerva 1989: 2)
     b. ku-mu-dzi     ku-na-bwér-á              a-lendo-wȏ.        [Chichewa] (village = subject)
        17−3-village  17.SBJ-REC.PST-come-IND   2-visitor-2.those
        ‘To the village came those visitors.’ (Bresnan and Kanerva 1989: 2)

Chichewa is a Bantu language, which does not mark arguments via case, but instead uses a complex noun class system. The noun classes have a rough semantic/cognitive basis, but only a rough one, so the different classes are usually indicated by numbers, e.g., 2 for visitors and 17 for village in (48). Bresnan and Kanerva (1989) amass evidence which shows that visitors is the subject in (48a), but not in (48b). In (48b) the subject is village. One piece of evidence for subject status is verb agreement: in (48a) the verb agrees with visitors via the class 2 marker; in (48b), the verb agrees with village (class 17). Locative inversion is triggered by focus, so that the location (village) is focused in (48b). The possibility for locative inversion follows from the standard linking principles. The thematic roles are classified as shown in (49) via the general classification principles in (41). Both the theme and the locative could link to either SUBJ or OBL. In (49a) default linking occurs according to the mapping principles in (42): the theme is linked to the
subject because it is higher on the thematic role hierarchy than the locative. In (48b), the locative argument is linked to the subject due to the special focus context. In this context, locatives are associated with the [−r] feature, which means they can only be linked to a subject, preempting the theme. Since there cannot be two subjects in a clause, the theme is linked to the OBJ. (49) a.
b.
Another instance of an argument alternation that follows from LFG’s linking theory is a crosslinguistic pattern of causative formation. In causatives, an event is caused by the action of a causer. In the Chichewa examples in (50) (Alsina and Joshi 1991; Alsina 1997), there is a cooking event which is caused or instigated by a causer external to the cooking event. There are three syntactic arguments in (50): a causer/agent (the porcupine); another agent (the owl), who is also the causee; and a theme/patient of the caused event (the pumpkins).

(50) a. Nǔngu      i-na-phík-ítsa      kadzīdzi  maûngu.         [Chichewa]
        porcupine  SBJ-PST-cook-CAUS   owl       pumpkins
        ‘The porcupine made the owl cook the pumpkins.’ (Alsina and Joshi 1991: 8)
     b. Nǔngu      i-na-phík-ítsa      maûngu    kwá kádzīdzi.   [Chichewa]
        porcupine  SBJ-PST-cook-CAUS   pumpkins  by  owl
        ‘The porcupine had the pumpkins cooked by the owl.’ (Alsina and Joshi 1991: 8)

Causatives also show an argument alternation. As seen in (50), the causee alternates between a direct argument and an oblique PP in Chichewa. The argument alternation coincides with a semantic difference. When the causee is realized as a direct argument, it is interpreted as affected by the action. That is, in (50a) the focus is on the owl having to cook the pumpkins and how it might feel about that. In (50b), the focus is on the pumpkins and that they be cooked. It is not important who cooks them, or how they might feel about it, just that they become cooked. This semantic difference holds for Urdu (Butt 1998) and Romance (Alsina 1996) as well. Alsina and Joshi (1991) model this semantic difference via a difference in Argument Fusion. They examine causatives in Chichewa and Marathi, and propose an analysis by which two argument structures are combined and one of the arguments of each
argument structure is identified with an argument in the other. Causative morphemes and verbs are taken to have three arguments: a causer agent, a patient and a caused event. This is shown in (51). When a causative morpheme or verb is combined with another verb, it embeds this verb’s argument structure in its own, as shown in (52). (51)
CAUSE < agent patient event >

(52) CAUSE < agent patient ‘cook’ < agent patient >>
There are four semantic arguments in (52). However, only three arguments are expressed in the syntax. Two of these arguments fuse at argument structure before being mapped into the syntax. Alsina and Joshi (1991) posit parameters which allow fusion of the matrix patient argument with either the embedded agent or the embedded patient. When the causee (the matrix patient) is fused with the embedded agent, the embedded agent is no longer available for linking, as shown in (53a). In this case the causee is considered to be the affected argument of the causation and is mapped to the direct object. The embedded patient is mapped to a secondary object (53a). When the matrix patient is fused with the embedded patient, then this argument is no longer available for linking and the agent of the embedded predicate is linked to an oblique (53b). (53) Object Causee a.
b.
LFG’s linking theory is thus primarily concerned with the relationship between argument structure and grammatical relations (f-structure). A final point with respect to causatives and linking theories in general relates to the domain of linking. As the name already indicates, Lexical Mapping Theory assumed that the linking from thematic roles to GFs was based on a single lexical entry, i.e., was taken care of entirely within the lexical component. However, data from causatives show that argument alternations and complex argument structures can arise either in the lexicon or in the syntax − see Alsina (1997) for an explicit comparison between the morphological causatives in Bantu and the periphrastic causatives in Romance. Alsina (1996) and Butt (1995) extended LFG’s linking theory to account for argument linking where the arguments are contributed by two distinct words in the syntax. The analyses assume a complex interaction between c-structure, a-structure and f-structure, where one structure cannot be built up without information present at another structure. The a-structure is thus not primary, nor is the c-structure.
IV. Syntactic Models

That is, the levels of representation constrain one another within LFG’s projection architecture. Further work on complex predicates which addresses architectural and linking issues is represented by Manning and Andrews (1999) and Wilson (1999).
4.3. Incorporation of Proto-Roles

So far, we have presented the standard version of LFG’s mapping theory, albeit extended into the syntax in order to capture complex predication. However, many different versions of this mapping or linking theory have been proposed. One interesting development has been the incorporation of Proto-Role information (Dowty 1991), as proposed by Zaenen (1993), for example. Zaenen (1993) conducts a detailed study of the interaction between syntax and verbal lexical semantics in Dutch. Dutch auxiliary selection is one syntactic reflex of unaccusativity: unaccusative verbs in Dutch select for zijn ‘be’, while unergatives select for hebben ‘have’. Zaenen shows that semantic factors are at the root of the auxiliary selection patterns. The have auxiliary is associated with control over an action, whereas the be auxiliary is selected when an argument is affected or changed (change of state). These properties are included in Dowty’s (1991) system of Proto-Role entailments: control is a Proto-Agent property and change of state is a Proto-Patient property. Zaenen therefore proposes to incorporate Dowty’s Proto-Role entailments into linking theory as shown in (54).

(54) Association of Features with Participants
1. If a participant has more patient properties than agent properties, it is marked −r.
2. If a participant has more agent properties than patient properties, it is marked −o.
3. If a participant has an equal number of properties, it is marked −r.
4. If a participant has neither agent nor patient properties, it is marked −o.
(Zaenen 1993: 150, 152)

The Proto-Role information allows Zaenen to dispense with thematic roles and the thematic role hierarchy. Linking is accomplished via the default association of [±o,r]-marked arguments with the GF hierarchy, as shown in (55).
(55) Association of Features with GFs
Order the participants as follows according to their intrinsic markings:
−o < −r < +o < +r
Order the GR [grammatical functions] as follows:
SUBJ < OBJ < OBJθ (< OBL)
Starting from the left, associate the leftmost participant with the leftmost GR it is compatible with.
(Zaenen 1993: 151)
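The two procedures in (54) and (55) can be rendered as a small algorithm. The sketch below is our own illustration, not Zaenen’s formalization: the property counts, the feature−GF compatibility table (derived from the standard LMT decomposition SUBJ = [−r,−o], OBJ = [−r,+o], OBJθ = [+r,+o], OBL = [+r,−o]) and all function names are assumptions made for expository purposes.

```python
# Intrinsic marking per (54), based on counts of Proto-Agent and
# Proto-Patient properties (Dowty 1991).
def intrinsic_marking(agent_props, patient_props):
    if patient_props > agent_props:
        return "-r"
    if agent_props > patient_props:
        return "-o"
    if agent_props == patient_props and agent_props > 0:
        return "-r"  # equal number of properties
    return "-o"      # neither agent nor patient properties

# Ordering and compatibility for (55).  Compatibility of a single
# feature follows the standard LMT feature decomposition of the GFs.
FEATURE_ORDER = ["-o", "-r", "+o", "+r"]
GF_ORDER = ["SUBJ", "OBJ", "OBJtheta", "OBL"]
COMPATIBLE = {
    "-o": {"SUBJ", "OBL"},
    "-r": {"SUBJ", "OBJ"},
    "+o": {"OBJ", "OBJtheta"},
    "+r": {"OBJtheta", "OBL"},
}

def link(participants):
    """participants: list of (name, feature) pairs.  Sort by intrinsic
    marking, then associate each participant with the leftmost
    still-available compatible GF, as in (55)."""
    ordered = sorted(participants, key=lambda p: FEATURE_ORDER.index(p[1]))
    taken, links = set(), {}
    for name, feature in ordered:
        for gf in GF_ORDER:
            if gf in COMPATIBLE[feature] and gf not in taken:
                links[name] = gf
                taken.add(gf)
                break
    return links

# The sole argument of an unaccusative (more patient properties) and of
# an unergative (more agent properties) both link to SUBJ:
print(link([("theme", intrinsic_marking(0, 2))]))  # {'theme': 'SUBJ'}
print(link([("agent", intrinsic_marking(2, 0))]))  # {'agent': 'SUBJ'}
# A simple transitive: agent -> SUBJ, patient -> OBJ.
print(link([("agent", "-o"), ("patient", "-r")]))
```

Note that the greedy left-to-right association makes a transitive patient land on OBJ only because SUBJ has already been claimed by the [−o] argument; with a single [−r] argument, SUBJ remains free, which is exactly the unaccusative pattern.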
25. Lexical-Functional Grammar

Unaccusatives and unergatives can now be analysed as follows: the single argument of unaccusatives such as fall has more patient properties than agent properties, is thus classified as a [−r] role, and is therefore linked to the SUBJ GF. In contrast, the single argument of an unergative such as dance has more agent properties than patient properties, is therefore classified as a [−o] role, and is also linked to SUBJ. The difference in auxiliary selection is sensitive to the [−r] feature, as shown in (56).

(56) Auxiliary Selection
When a [−r]-marked participant is realized as a subject, the auxiliary is zijn ‘be’.
(Zaenen 1993: 149)

Zaenen’s linking architecture remains true to the basic spirit of linking theory, but allows a better integration of relevant semantic factors. Other approaches have also integrated Proto-Role properties into an analysis of the relationship between a-structure and GFs. In his treatment of Romance causatives, for example, Alsina (1996) revises LFG’s standard linking theory considerably and also includes Proto-Role information. Ackerman’s (1992) ideas are similar in spirit to Zaenen’s analysis, but he offers a different way of integrating Proto-Roles for a treatment of the locative alternation. Finally, Ackerman and Moore (2001) incorporate Proto-Role properties into the selection of arguments without making explicit reference to LFG’s standard linking theory, though they assume their ideas are compatible with it.
4.4. Lexical rules

Lexical rules manipulate the argument structure of lexical items in systematic ways. Lexical rules were introduced to capture regular alternations in the lexicon before a-structure was introduced as part of LFG theory (Bresnan 1982b). Some lexical rules have now been replaced by a-structure alternations. However, alternations that affect GFs, e.g., SUBJ and OBJ, rather than thematic roles or argument slots must still be stated as lexical rules. The standard example of a lexical rule is the passive, which rewrites the subject as NULL or as an oblique agent and rewrites the object as the subject. Thus, the lexical rule in (57) rewrites the lexical entry in (58a) as the one in (58b).

(57) Passive lexical rule:
SUBJ ↦ NULL
OBJ ↦ SUBJ

(58) a. PRED = ‘persuade⟨SUBJ, OBJ, XCOMP⟩’
They persuaded the boys to leave.
b. PRED = ‘persuade⟨NULL, SUBJ, XCOMP⟩’
The boys were persuaded to leave.
Rewrites like the passive rule in (57), where arguments are renamed or deleted, are easier to formulate than rules which introduce an argument, e.g., benefactive constructions. When arguments are introduced, it is unclear where in the PRED structure the new argument should appear. For this reason, such argument-adding constructions are usually dealt with within a-structure.
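For concreteness, a GF rewrite like (57) can be mimicked over a toy dictionary representation of a lexical entry. This is our own simplification, not LFG’s formal AVM machinery: the keys, the OBL-AG label, and the flag for the oblique-agent variant are all illustrative assumptions.

```python
def passivize(entry, agent_as_oblique=False):
    """Apply the passive lexical rule of (57): SUBJ -> NULL (or an
    agentive oblique), OBJ -> SUBJ; other arguments are untouched."""
    mapping = {"SUBJ": "OBL-AG" if agent_as_oblique else "NULL",
               "OBJ": "SUBJ"}
    return {"pred": entry["pred"],
            "args": [mapping.get(gf, gf) for gf in entry["args"]]}

active = {"pred": "persuade", "args": ["SUBJ", "OBJ", "XCOMP"]}
print(passivize(active)["args"])        # ['NULL', 'SUBJ', 'XCOMP']
print(passivize(active, True)["args"])  # ['OBL-AG', 'SUBJ', 'XCOMP']
```

The sketch also makes the asymmetry noted above concrete: renaming or deleting an existing slot is a simple substitution, whereas an argument-adding rule (e.g., a benefactive) would have to decide where in the argument list the new slot goes, which is why such alternations are handled at a-structure instead.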
5. Interfaces/other projections

The f-structure is the primary syntactic projection from the c-structure. However, other structures can be projected off of the c-structure, off of the f-structure, or off of other projections. Here we briefly discuss proposals for a morphosyntactic projection (section 5.1), information structure (section 5.2), prosodic structure (section 5.3), and semantic structure (section 5.4).
5.1. Morphology-syntax interface

The morphology-syntax interface can be partially factored out into an m-structure that is distinct from the f-structure. As originally formulated, the m-structure was meant to hold features that are only relevant for morphosyntactic well-formedness, while the f-structure contains features needed for syntactic analysis and for semantics. The m-structure of a clause may thus vary significantly from its f-structure. In the original m-structure proposal, for example, m-structure is used to account for auxiliary stacking restrictions in English, which allows for a flatter, simpler f-structure (Butt et al. 1996).

(59) a. Nadya will be hopping.
b. m-structure
c. f-structure
M-structure has also been used extensively to analyze French auxiliaries and clitic distribution (Frank 1996). Most LFG analyses do not include m-structure; however, some approaches have taken up the m-structure idea and have invested it with more theoretical significance than it had originally. See Sadler and Spencer (2004) for such proposals and a more detailed discussion of the issues.
5.2. Information-structure

I(nformation)-structure encodes clause-internal discourse information such as topic and focus. I-structure often incorporates information from the c-structure as well as the f-structure, and possibly the prosodic structure. A simple example is shown in (60), where the non-canonical word order reflects the discourse functions of the arguments. Annotations on the c-structure rules create both the f-structure in (60b) and the i-structure in (60c).
(60) a. Èto plat’e šila Inna. [Russian]
this dress sewed Inna
Inna-FOC sewed this dress-TOP
‘It was Inna who sewed this dress.’
b. f-structure
c. i-structure
This is accomplished technically by having annotations such as those in (61).

(61) IP → NP                    IP
          (↑ {SUBJ|OBJ}) = ↓    ↑ = ↓
          ↓i ∈ (↑i TOPIC)       ↑i = ↓i
Some types of topic and focus are GFs and hence appear in the f-structure (Bresnan and Mchombo 1987). However, for many languages, a separate projection is needed in order to account for mismatches between the constituency of the f-structure and of the i-structure. For example, focused verbs cannot be encoded in the f-structure because they are the heads of their clauses, and marking them as focus would result, incorrectly, in the entire clause being focused (King 1997). Note that the topic is represented as a set in (60c); this is because there can be multiple topics in a clause. There are a number of differing proposals as to how exactly to integrate the i-structure projection and what exactly should be represented at this level of analysis; see for example Choi (1999), King (1995), O’Connor (2004) and Mycock (2006).
5.3. Prosodic-structure

Work on the analysis of clitics and on discourse functions has led to the incorporation of prosodic information into the LFG architecture, generally by means of a p(rosodic)-structure projection (Butt and King 1998; O’Connor 2004; Mycock 2006). The projection is generally represented as an AVM similar to the f-structure, but there is still discussion of whether an AVM is the best form for this. For example, Bögel et al. (2009, 2010) argue for using finite-state power to create prosodically bracketed strings which can then guide c-structure formation and hence influence the f-structure. In related work, Asudeh (2009) proposes that linear string adjacency be part of the syntax-phonology interface in LFG, thereby accounting for complementizer-adjacent extraction effects without reference to traces. Dalrymple and Mycock (2011) propose a modular architecture with lexical entries containing phonological and syntactic forms to explain declarative questions and comma intonation in non-restrictive relative clauses. Many analyses of clitics account for their unusual behavior by exploiting the morphology-syntax interface, instead of or in addition to using prosodic-structure for this purpose. Wescoat (2005) proposes lexical sharing to account for English auxiliary clitics,
while Luís and Otoguro (2005) argue that morphology and phrase structure are separate levels of analysis. In addition, OT-LFG (section 6) can use the interaction of OT constraints in order to govern clitic placement (Estigarribia 2005; Lowe 2011).
5.4. Semantic-structure

Halvorsen (1983) was the first to propose that semantic interpretation in LFG be done by projecting an s(emantic)-structure from the f-structure. In turn, this s-structure could be mapped to formulae in intensional logic and given a model-theoretic interpretation. Note that f-structures have since been shown to be equivalent to quasi-logical forms (QLF; van Genabith and Crouch 1996), confirming that f-structures provide a very good input for further semantic analysis. Halvorsen’s example (1) is shown simplified in (62). Note that the f-structure represents an analysis whereby English auxiliaries were taken to embed a VCOMP (infinitive clause), an analysis that is no longer standard in LFG; rather, a flat f-structure is now assumed, with the auxiliary only registering tense/aspect information (Butt et al. 1999).

(62) a. John was flattered by Mary.
b. f-structure
c. s-structure
d. formula of intensional logic
flatter*ʹ(m, j)

In Halvorsen’s original example, the complex f-structure is projected into the flat s-structure in (62c), which can then be further processed into an intensional logic formula, as shown in (62d). Halvorsen and Kaplan (1988) also argue for an s-structure but propose a co-description analysis whereby the s-structure is not projected in its entirety from the f-structure. Instead, it is created simultaneously as a parallel projection to the f-structure, with certain
mutually constraining factors. Co-description analyses have enjoyed a steady popularity in LFG. In newer approaches, however, semantic analyses based on LFG f-structures have largely abandoned the s-structure approach and instead use Glue semantics (Dalrymple 1999). Under this approach, the meanings of words are separate from the mechanisms which combine them. Lexical entries contain meaning constructors that specify how the word contributes to the meaning of larger syntactic constituents. The meanings of the larger constituents are derived using linear-logic deduction on the meanings of the parts. Combining all of the premises according to a resource logic (linear logic) results in the meaning of an utterance. Because the composition is governed by this logic, it does not have to follow the rules of phrasal (e.g., c-structure) composition. Glue semantics is resource sensitive, because in the deduction each premise is consumed as it is used and all premises must be consumed. Extensive examples of Glue semantic analyses for different syntactic and semantic constructions can be found in Dalrymple (2001). Within the computational community, yet another approach to semantic construction has recently been developed, namely a system called XFR (Crouch et al. 2011) involving ordered rewrite rules, which are used to efficiently and robustly produce semantic structures from f-structures (Crouch 2006; Crouch and King 2006). The resulting semantics gives a flat representation of the sentence’s predicate-argument structure and the semantic contexts in which those predications hold. These semantic structures in turn are input to additional ordered rewrite rules which produce Abstract Knowledge Representations (AKRs) (Bobrow et al. 2005; Bobrow et al. 2007).
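The resource-sensitive deduction at the heart of Glue semantics can be sketched in a few lines. This is a drastic simplification made for illustration only: meanings are plain strings, types are nested tuples rather than genuine linear-logic formulae, and the only inference rule is implication elimination.

```python
def glue_derive(premises):
    """Naive linear-logic deduction by repeated implication elimination.
    A premise is (meaning, type); an implicational type A -o B is the
    tuple ('-o', A, B).  Each premise is consumed exactly once; the
    derivation succeeds iff exactly one premise remains."""
    prems = list(premises)
    progress = True
    while progress and len(prems) > 1:
        progress = False
        for i, (func, ftype) in enumerate(prems):
            if not (isinstance(ftype, tuple) and ftype[0] == "-o"):
                continue
            for j, (arg, atype) in enumerate(prems):
                if j != i and atype == ftype[1]:
                    combined = ("%s(%s)" % (func, arg), ftype[2])
                    prems = [p for k, p in enumerate(prems) if k not in (i, j)]
                    prems.append(combined)
                    progress = True
                    break
            if progress:
                break
    return prems[0] if len(prems) == 1 else None

# Three meaning constructors: 'flatter' consumes the resources g and h
# contributed by the two names, yielding one meaning for the utterance.
premises = [("john", "g"), ("mary", "h"),
            ("flatter", ("-o", "g", ("-o", "h", "f")))]
print(glue_derive(premises))  # ('flatter(john)(mary)', 'f')
```

Resource sensitivity falls out directly: if a premise cannot be consumed (or would have to be consumed twice), no single conclusion remains and the function returns None, mirroring the requirement that every meaning constructor be used exactly once.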
5.5. The overall LFG projection architecture

We have now briefly introduced a number of different proposals for additional projections. Within the LFG community, there are a number of proposals as to how these various structures relate to one another. A constant is that the f-structure is a projection from the c-structure, as in the original proposal (Bresnan 1982b). A possible architecture is represented in (63). Here, the c-structure and f-structure are modulated by the m(orphosyntactic)-structure and the a(rgument)-structure. In addition, the s(emantic)-structure comprises information from the f-structure and the c-structure, with the p(rosodic)-structure and i(nformation)-structure mapping between the c-structure and s-structure. There are many minor variants of the architecture shown in (63); see Asudeh (2006: 369) for discussion.

(63) The Correspondence Architecture (Asudeh 2006)
6. OT-LFG

An additional architectural component was added to LFG with the advent of Optimality Theory (OT). OT was originally formulated to solve problems with respect to alignment constraints in prosodic morphology (McCarthy and Prince 1993). The establishment of OT as a serious domain of investigation within syntax is due mainly to articles by Jane Grimshaw (Grimshaw 1997) and Joan Bresnan (Bresnan 1998). In particular, Bresnan showed how OT could be used in conjunction with standard LFG. The next section introduces the basic architecture and assumptions of OT, and the following section shows how LFG theory can be integrated with OT.
6.1. OT basics

Within OT, the goal is to determine an optimal output (surface form) with respect to a given input. The optimal output is picked from a set of candidates that compete with one another. The competition between the candidates is resolved by an evaluation of constraint violations, as shown in (64) (adapted from Vogel 2001).

(64)
The nature of the input is still in need of precise definition in much of OT. Grimshaw’s (1997) original paper assumed that the input encompasses the basic argument structure of a predicate and a specification of tense/aspect. In later work (Grimshaw and Samek-Lodovici 1998), the focus/topic specifications were included as part of the input. A typical input might look as in (65), where the argument structure of give is specified, along with information about which argument is the topic, and which is in focus.

(65) give(x,y,z), x=topic, z=focus, x=Kim, z=dog, y=bone
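The evaluation schematized in (64) amounts to a lexicographic comparison of violation profiles under the constraint ranking. The sketch below is purely illustrative: the two toy constraints are loosely modelled on Grimshaw’s OP-SPEC and STAY, and the candidate strings are invented, not taken from Vogel’s tableau.

```python
def evaluate(candidates, ranked_constraints):
    """Return the candidate whose violation profile is lexicographically
    smallest under the ranking (higher-ranked constraints first).
    Each constraint maps a candidate to its number of violations."""
    def profile(cand):
        return tuple(c(cand) for c in ranked_constraints)
    return min(candidates, key=profile)

# Toy constraints over schematic candidate strings:
op_spec = lambda c: 0 if c.startswith("wh ") else 1  # wh-operator in SpecCP
stay    = lambda c: c.count("t")                     # one mark per trace

candidates = ["wh subj v t", "subj v wh"]
print(evaluate(candidates, [op_spec, stay]))  # 'wh subj v t'
```

Because comparison is lexicographic, a single violation of the higher-ranked constraint outweighs any number of violations of lower-ranked ones, which is exactly the strict-domination logic of an OT tableau.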
6.2. Optimality Theory and LFG

In OT-LFG (Bresnan 2000; Kuhn 2003), the input is assumed to be an underspecified f-structure in the sense that the GFs are not yet specified. An example for the transitive verb drive is shown in (66).

(66)
25. Lexical-Functional Grammar The skeletal f-structure inputs are passed on to a function GEN, which generates a set of possible output candidates that could correspond to the input. Within OT-LFG, GEN is assumed to be equivalent to a standard LFG grammar. LFG grammars can be used to both parse and generate; so the idea is that an underspecified input as in (66) is passed on to an existing grammar for English. This searches through its rule space and produces all possible pairings of c-structures and surface strings that could correspond to the input f-structure. Kuhn (2003) shows that this is computationally viable. Quite a bit of work is done within OT-LFG, some of the early landmark contributions are collected in Sells (2001).
7. Computational issues and resources

Right from its inception, LFG was designed as a theory and formalism whose mathematical basis is solid and well-understood. This property means that LFG is particularly suitable for computational linguistic research, and there has been computational linguistic work based on LFG for over twenty years. Interest in using LFG in computational linguistics, as well as in natural language processing applications, is increasing steadily (see the chapter on computational syntax in this volume). Perhaps the most visible computational effort involving LFG is the Parallel Grammar (ParGram) group (Butt et al. 1999; Butt et al. 2002), which implements LFG grammars of different languages on the XLE grammar development platform (Crouch et al. 2011). The guiding motivation of ParGram is an effort at parallel analyses across languages using parallel implementation techniques. That is, much effort goes into sorting through possible alternative analyses and feature spaces for phenomena across languages and trying to agree on f-structure analyses that are as parallel as possible across a diverse number of languages. Grammars that have been developed so far include Arabic, English, French, German, Hungarian, Indonesian, Japanese, Malagasy, Murrinh-Patha, Norwegian, Tigrinya, Turkish, Urdu and Welsh. The English and the Japanese grammars have been used for industrial purposes in developing query systems, and the Norwegian grammar was used in a machine translation project named LOGON. Since f-structures are already very close to semantic forms (see section 5.4), the idea in LFG is that if analyses are kept as parallel across languages as possible, then applications like machine translation should be able to produce good results more easily (Frank 1999). The bulk of the computational work done within LFG is (naturally) symbolic and rule-based. However, statistical methods can be integrated in several ways, e.g.,
to pick the most likely parse or sentence to be generated among a forest of possibilities (Riezler and Maxwell 2006; Cahill et al. 2007; Graham et al. 2009). All of the projections discussed above are in principle implementable via the grammar development platform XLE, though this is rarely done in practice, with most computational grammars confining themselves to just c- and f-structures. However, an OT-projection is routinely used in the computational grammars (Frank et al. 2001; Crouch et al. 2011). Its purpose is to help constrain the grammar by dispreferring or preferring certain rules or lexical items over others.
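The statistical ranking mentioned above can be sketched as a log-linear model over features of the competing analyses. The feature names and weights below are invented for illustration; real systems train the weights on treebank data rather than stipulating them.

```python
def best_parse(parses, weights):
    """Return the parse with the highest log-linear score.  Since the
    normalization constant is shared by all parses in the forest, it
    can be ignored when taking the argmax."""
    def score(parse):
        return sum(weights.get(f, 0.0) * n
                   for f, n in parse["features"].items())
    return max(parses, key=score)

# Two hypothetical analyses of an attachment ambiguity:
weights = {"attach-high": -0.4, "attach-low": 0.9}
parses = [
    {"id": "p1", "features": {"attach-high": 1}},
    {"id": "p2", "features": {"attach-low": 1}},
]
print(best_parse(parses, weights)["id"])  # p2
```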
8. Psycholinguistic research

LFG has also been used as the basis for psycholinguistic research, though not to the extent that was once envisioned. The first standard reference for LFG work, namely Bresnan (1982b), contains papers on psycholinguistics by Marilyn Ford and Steven Pinker. The basic tenets of LFG continued to inform their psycholinguistic research (e.g., Levelt 1989; Pinker 1989; Gropen et al. 1991). However, most of the psycholinguistic LFG-related work seems to have taken place in the 1980s and early 1990s. An exception is a series of recent publications by Ford and Bresnan (Ford and Bresnan 2013; Bresnan and Ford 2010), which take up Bresnan’s recent work on stochastic approaches to syntax (e.g., Bresnan 2007).
9. Conclusion

From its inception, LFG has had the goal of combining linguistic insights with solid mathematical foundations, computational tractability and implementability (Dalrymple et al. 1995). Much work in LFG has focused on typological diversity, with major efforts in a large number of language families. This is reflected in some of the core areas of research, such as case and agreement systems, causatives and complex predicates, coordination, and anaphora. LFG minimally posits two levels of representation: the c-structure and the f-structure. The c-structure encodes linear order, constituency and hierarchical relations, while the f-structure focuses on the dependency structure of a clause. The mapping between the two is realized in terms of a mathematical function and need not be one-to-one. This fact allows for a flexible architecture which can deal with long-distance dependencies and other complex linguistic phenomena in a straightforward and elegant manner. A broad range of theoretical, descriptive and computational linguists work within LFG, with some work also being done in neuro- and psycholinguistics. Several current textbooks are available (Bresnan 2001; Dalrymple 2001; Falk 2001), as is a description of the ParGram grammar development effort (Butt et al. 1999).
10. References (selected)

Ackerman, Farrell 1992 Complex predicates and morpholexical relatedness: Locative alternation in Hungarian. In: I. Sag and A. Szabolcsi (eds.), Lexical Matters, 55−83. Stanford: CSLI Publications.
Ackerman, Farrell, and John Moore 2001 Proto-properties and Grammatical Encoding: A Correspondence Theory of Argument Selection. Stanford: CSLI Publications.
Alsina, Alex 1996 The Role of Argument Structure in Grammar: Evidence from Romance. Stanford: CSLI Publications.
Alsina, Alex 1997 A theory of complex predicates: evidence from causatives in Bantu and Romance. In: A. Alsina, J. Bresnan, and P. Sells (eds.), Complex Predicates, 203−246. Stanford: CSLI Publications.
Alsina, Alex, and Smita Joshi 1991 Parameters in causative constructions. In: Papers from the 27th Regional Meeting of the Chicago Linguistic Society, 1−15. University of Chicago.
Alsina, Alex, and Sam Mchombo 1993 Object asymmetries and the Chicheŵa applicative construction. In: S. Mchombo (ed.), Theoretical Aspects of Bantu Grammar, 17−45. Stanford: CSLI Publications.
Asudeh, Ash 2006 Direct compositionality and the architecture of LFG. In: M. Butt, M. Dalrymple, and T. H. King (eds.), Intelligent Linguistic Architectures: Variations on Themes by Ronald M. Kaplan. Stanford: CSLI Publications.
Asudeh, Ash 2009 Adjacency and locality: A constraint-based analysis of complementizer-adjacent extraction. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2009 Conference, 106−126. Stanford: CSLI Publications.
Asudeh, Ash, and Ida Toivonen 2009 Lexical-Functional Grammar. In: B. Heine and H. Narrog (eds.), The Oxford Handbook of Linguistic Analysis. Oxford: Oxford University Press.
Berman, Judith 2003 Clausal Syntax of German. Stanford: CSLI Publications.
Bobrow, Danny, Bob Cheslow, Cleo Condoravdi, Lauri Karttunen, Tracy Holloway King, Rowan Nairn, Valeria de Paiva, Charlotte Price, and Annie Zaenen 2007 PARC’s bridge and question answering system. In: Proceedings of the Grammar Engineering Across Frameworks (GEAF07) Workshop, 46−66. Stanford: CSLI Publications.
Bobrow, Danny, Cleo Condoravdi, Richard Crouch, Ronald Kaplan, Lauri Karttunen, Tracy Holloway King, Valeria de Paiva, and Annie Zaenen 2005 A basic logic for textual inference. In: Proceedings of the AAAI Workshop on Inference from Textual Question Answering, Pittsburgh, PA.
Bögel, Tina, Miriam Butt, Ronald M. Kaplan, Tracy Holloway King, and John T. Maxwell III 2009 Clitics and prosodic phonology in LFG. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2009 Conference. Stanford: CSLI Publications.
Bögel, Tina, Miriam Butt, Ronald M.
Kaplan, Tracy Holloway King, and John T. Maxwell III 2010 Second position clitics and the prosody-syntax interface. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2010 Conference, 106−126. Stanford: CSLI Publications.
Bresnan, Joan 1977 Transformation and categories in syntax. In: R. E. Butts and J. Hintikka (eds.), Basic Problems in Methodology and Linguistics. Part Three of the Proceedings of the Fifth International Congress of Logic, Methodology, and Philosophy of Science, 261−282. Dordrecht: Reidel.
Bresnan, Joan 1982a Control and complementation. In: J. Bresnan (ed.), The Mental Representation of Grammatical Relations. Cambridge, MA: The MIT Press.
Bresnan, Joan (ed.) 1982b The Mental Representation of Grammatical Relations. Cambridge, MA: The MIT Press.
Bresnan, Joan 1990 Monotonicity and the theory of relation-changes in LFG. Language Research 26(4): 637−652.
Bresnan, Joan 1994 Locative inversion and the architecture of universal grammar. Language 70(1): 72−131.
Bresnan, Joan 1998 Morphology competes with syntax: Explaining typological variation in weak crossover effects. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, and D. Pesetsky (eds.), Is
the Best Good Enough? Optimality and Competition in Syntax, 59−92. Cambridge, MA: The MIT Press and MIT Working Papers in Linguistics.
Bresnan, Joan 2000 Optimal syntax. In: J. Dekkers, F. van der Leeuw, and J. van de Weijer (eds.), Optimality Theory: Phonology, Syntax, and Acquisition, 334−385. Oxford: Oxford University Press.
Bresnan, Joan 2001 Lexical-Functional Syntax. Oxford: Blackwell.
Bresnan, Joan 2007 Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In: S. Featherston and W. Sternefeld (eds.), Roots: Linguistics in Search of Its Evidential Base, 77−96. Berlin: Mouton de Gruyter.
Bresnan, Joan, and Marilyn Ford 2010 Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language 86(1): 186−213.
Bresnan, Joan, and Jonni Kanerva 1989 Locative inversion in Chichewa: A case study of factorization in grammar. Linguistic Inquiry 20: 1−50.
Bresnan, Joan, and Sam Mchombo 1987 Topic, pronoun, and agreement in Chichewa. Language 63(4): 181−254.
Bresnan, Joan, and Lioba Moshi 1990 Object asymmetries in comparative Bantu syntax. Linguistic Inquiry 21(2): 147−185.
Bresnan, Joan, and Annie Zaenen 1990 Deep unaccusativity in LFG. In: K. Dziwirek, P. Farrell, and E. Mejías-Bikandi (eds.), Grammatical Relations: A Cross-Theoretical Perspective, 45−57. Stanford: CSLI Publications.
Butt, Miriam 1995 The Structure of Complex Predicates in Urdu. Stanford: CSLI Publications.
Butt, Miriam 1998 Constraining argument merger through aspect. In: E. Hinrichs, A. Kathol, and T. Nakazawa (eds.), Complex Predicates in Nonderivational Syntax, 73−113. New York: Academic Press.
Butt, Miriam 2006 Theories of Case. Cambridge: Cambridge University Press.
Butt, Miriam 2008 Case in Lexical-Functional Grammar. In: A. Malchukov and A. Spencer (eds.), The Handbook of Case, 59−71. Oxford: Oxford University Press.
Butt, Miriam, Helge Dyvik, Tracy Holloway King, Hiroshi Masuichi, and Christian Rohrer 2002 The parallel grammar project. In: J. Carroll, N. Oostdijk, and R. Sutcliffe (eds.), Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, 1−7.
Butt, Miriam, and Tracy Holloway King 1998 Interfacing phonology with LFG. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 1998 Conference. Stanford: CSLI Publications.
Butt, Miriam, and Tracy Holloway King 2003 Case systems: Beyond structural distinctions. In: E. Brandner and H. Zinsmeister (eds.), New Perspectives on Case Theory, 53−87. Stanford: CSLI Publications.
Butt, Miriam, and Tracy Holloway King 2005 The status of case. In: V. Dayal and A. Mahajan (eds.), Clause Structure in South Asian Languages, 153−198. Berlin: Springer Verlag.
Butt, Miriam, Tracy Holloway King, María-Eugenia Niño, and Frédérique Segond 1999 A Grammar Writer’s Cookbook. Stanford: CSLI Publications.
Butt, Miriam, María-Eugenia Niño, and Frédérique Segond 1996 Multilingual processing of auxiliaries in LFG. In: D. Gibbon (ed.), Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference, 111−122. Bielefeld.
Cahill, Aoife, Martin Forst, and Christian Rohrer 2007 Stochastic realisation ranking for a free word order language. In: S. Busemann (ed.), Proceedings of the European Workshop on Natural Language Generation (ENLG-07), Dagstuhl.
Choi, Hyewon 1999 Optimizing Structure in Context: Scrambling and Information Structure. Stanford: CSLI Publications.
Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: The MIT Press.
Croft, William 2001 Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Crouch, Dick, Mary Dalrymple, Ronald M. Kaplan, Tracy King, John Maxwell, and Paula Newman 2011 XLE Documentation. On-line documentation, Palo Alto Research Center (PARC).
Crouch, Richard 2006 Packed rewriting for mapping text to semantics and KR. In: M. Butt, M. Dalrymple, and T. H. King (eds.), Intelligent Linguistic Architectures: Variations on Themes by Ronald M. Kaplan, 389−416. Stanford: CSLI Publications.
Crouch, Richard, and Tracy Holloway King 2006 Semantics via F-structure rewriting. In: Proceedings of the LFG 2006 Conference, 145−165. Stanford: CSLI Publications.
Dalrymple, Mary 1993 The Syntax of Anaphoric Binding. Stanford: CSLI Publications.
Dalrymple, Mary (ed.) 1999 Semantics and Syntax in Lexical Functional Grammar. Cambridge, MA: The MIT Press.
Dalrymple, Mary 2001 Lexical Functional Grammar. Vol. 34 of Syntax and Semantics. New York: Academic Press.
Dalrymple, Mary, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen (eds.) 1995 Formal Issues in Lexical-Functional Grammar. Stanford: CSLI Publications.
Dalrymple, Mary, Ronald M. Kaplan, and Tracy Holloway King 2001 Weak crossover and the absence of traces. In: M. Butt and T.
H. King (eds.), Proceedings of the LFG 2001 Conference. Stanford: CSLI Publications.
Dalrymple, Mary, and Tracy Holloway King 2000 Missing-object constructions: Lexical and constructional variation. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2000 Conference. Stanford: CSLI Publications.
Dalrymple, Mary, and Helge Lødrup 2000 The grammatical functions of complement clauses. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2000 Conference. Stanford: CSLI Publications.
Dalrymple, Mary, and Louise Mycock 2011 The prosody-semantics interface. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2011 Conference. Stanford: CSLI Publications.
Dench, Alan 1995 Martuthunira: A Language of the Pilbara Region of Western Australia. Canberra: Pacific Linguistics.
Dowty, David 1991 Thematic proto-roles and argument selection. Language 67(3): 547−619.
Estigarribia, Bruno 2005 Direct object clitic doubling in OT-LFG: A new look at Rioplatense Spanish. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2005 Conference. Stanford: CSLI Publications.
Evans, Nicholas, and Stephen Levinson 2009 The myth of language universals: Language diversity and its importance for cognitive science. The Behavioral and Brain Sciences 32: 429−448.
Falk, Yehuda 2001 Lexical-Functional Grammar: An Introduction to Parallel Constraint-based Syntax. Stanford: CSLI Publications.
Falk, Yehuda 2007 Do we wanna (or hafta) have empty categories? In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2007 Conference, 184−197. Stanford: CSLI Publications.
Ford, Marilyn, and Joan Bresnan 2013 Using convergent evidence from psycholinguistics and usage. In: M. Krug and J. Schlüter (eds.), Research Methods in Language Variation and Change, 295−312. Cambridge: Cambridge University Press.
Frank, Anette 1996 A note on complex predicate formation: Evidence from auxiliary selection, reflexivization, and past participle agreement in French and Italian. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 1996 Conference. Stanford: CSLI Publications.
Frank, Anette 1999 From parallel grammar development towards machine translation. In: Proceedings of the MT Summit VII: MT in the Great Translation Era, 134−142. Kent Ridge Digital Labs.
Frank, Anette, Tracy Holloway King, Jonas Kuhn, and John T. Maxwell III 2001 Optimality Theory-style constraint ranking in large-scale LFG grammars. In: P. Sells (ed.), Formal and Empirical Issues in Optimality-Theoretic Syntax, 367−398. Stanford: CSLI Publications.
Graham, Yvette, Anton Bryl, and Josef van Genabith 2009 F-structure transfer-based statistical machine translation. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2009 Conference, 317−337. Stanford: CSLI Publications.
Grimshaw, Jane 1997 Projection, heads, and optimality. Linguistic Inquiry 28(3): 373−422.
Grimshaw, Jane, and Vieri Samek-Lodovici 1998 Optimal subjects and subject universals. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, and D. Pesetsky (eds.), Is the Best Good Enough?, 193−219. Cambridge, MA: The MIT Press.
Gropen, Jess, Steven Pinker, Michelle Hollander, and Richard Goldberg 1991 Affectedness and direct objects: The role of lexical semantics in the acquisition of verb argument structure. Cognition 41: 153−195.
Halvorsen, Per-Kristian 1983 Semantics in Lexical-Functional Grammar. Linguistic Inquiry 14: 567−615.
Halvorsen, Per-Kristian, and Ronald M. Kaplan 1988 Projections and semantic description in Lexical-Functional Grammar. In: Proceedings of the International Conference on Fifth Generation Computer Systems. ICOT.
Kaplan, Ronald M. 1995 The formal architecture of Lexical-Functional Grammar. In: M. Dalrymple, R. M. Kaplan, J. T. Maxwell III, and A. Zaenen (eds.), Formal Issues in Lexical-Functional Grammar, 7−27. Stanford: CSLI Publications.
Kaplan, Ronald M., Klaus Netter, Jürgen Wedekind, and Annie Zaenen 1989 Translation by structural correspondences. In: Proceedings of the 4th Meeting of the European Association for Computational Linguistics, 272−281. University of Manchester. Reprinted in Mary Dalrymple, Ronald M. Kaplan, John Maxwell, and Annie Zaenen (eds.), Formal Issues in Lexical-Functional Grammar, 311−329. Stanford: CSLI Publications. 1995.
Kaplan, Ronald M., and Annie Zaenen 1989 Long-distance dependencies, constituent structure, and functional uncertainty. In: M. Baltin and A. Kroch (eds.), Alternative Conceptions of Phrase Structure. Chicago: Chicago University Press.
King, Tracy Holloway 1995 Configuring Topic and Focus in Russian. Stanford: CSLI Publications.
King, Tracy Holloway 1997 Focus domains and information structure. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 1997 Conference. Stanford: CSLI Publications.
Kiparsky, Paul 1973 'Elsewhere' in phonology. In: S. R. Anderson and P. Kiparsky (eds.), A Festschrift for Morris Halle, 93−106. New York: Holt, Rinehart and Winston.
Kroeger, Paul R. 2005 Analyzing Grammar: An Introduction. Cambridge: Cambridge University Press.
Kuhn, Jonas 2003 Optimality-Theoretic Syntax − A Declarative Approach. Stanford: CSLI Publications.
Kuhn, Jonas, and Louisa Sadler 2007 Single conjunct agreement and the formal treatment of coordination in LFG. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2007 Conference, 302−322. Stanford: CSLI Publications.
Levelt, W. J. M. 1989 Speaking: From Intention to Articulation. Cambridge, MA: The MIT Press.
Levin, Lori 1987 Toward a linking theory of relation changing rules in LFG. Technical Report CSLI-87-115, Stanford.
Lowe, John J. 2011 Ṛgvedic clitics and 'prosodic movement'. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2011 Conference. Stanford: CSLI Publications.
Luís, Ana R., and Ryo Otoguro 2005 Morphological and syntactic well-formedness: The case of European Portuguese proclitics. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2005 Conference. Stanford: CSLI Publications.
Manning, Christopher D., and Avery Andrews 1999 Complex Predicates and Information Spreading. Stanford: CSLI Publications.
McCarthy, John, and Alan Prince 1993 Generalized alignment. In: G. Booij and J. van Marle (eds.), Yearbook of Morphology 1993, 79−153. Berlin: Kluwer Academic Publishers.
Mohanan, Tara 1994 Argument Structure in Hindi. Stanford: CSLI Publications.
Mycock, Louise 2006 The typology of constituent questions: A lexical-functional grammar analysis of wh-questions. PhD thesis, University of Manchester.
Nordlinger, Rachel 1998 Constructive Case: Evidence from Australian Languages. Stanford: CSLI Publications.
Nordlinger, Rachel 2000 Australian case systems: Towards a constructive solution. In: M. Butt and T. H. King (eds.), Argument Realization, 41−72. Stanford: CSLI Publications.
O'Connor, Rob 2004 Information structure in lexical-functional grammar: The discourse-prosody correspondence in English and Serbo-Croatian. PhD thesis, University of Manchester.
Pinker, Steven 1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: The MIT Press.
Pullum, Geoffrey K., and Barbara C. Scholz 2001 On the distinction between generative-enumerative and model-theoretic syntactic frameworks. In: P. de Groote, G. Morrill, and C. Retoré (eds.), Logical Aspects of Computational Linguistics, 17−43. Berlin: Springer Verlag.
Pullum, Geoffrey K., and Barbara C. Scholz 2010 Recursion and the infinitude claim. In: H. van der Hulst (ed.), Recursion in Human Language, 113−138. Berlin: Mouton de Gruyter.
Riezler, Stefan, and John T. Maxwell 2006 Grammatical machine translation. In: Human Language Technology Conference − North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL'06).
Rouveret, Alain, and Jean Roger Vergnaud 1980 Specifying reference to the subject: French causatives and conditions on representations. Linguistic Inquiry 11: 97−202.
Sadler, Louisa 1993 Co-description and translation. In: F. van Eynde (ed.), Linguistic Issues in Machine Translation, 44−71. London: Pinter Publishers.
Sadler, Louisa 2003 Coordination and asymmetric agreement in Welsh. In: M. Butt and T. H. King (eds.), Nominals: Inside and Out, 85−118. Stanford: CSLI Publications.
Sadler, Louisa, and Andrew Spencer (eds.) 2004 Projecting Morphology. Stanford: CSLI Publications.
Sells, Peter (ed.) 2001 Formal and Empirical Issues in Optimality-Theoretic Syntax. Stanford: CSLI Publications.
van Genabith, Josef, and Richard Crouch 1996 F-structures, QLFs and UDRSs. In: M. Butt and T. H. King (eds.), Proceedings of the First International Conference on Lexical-Functional Grammar, 190−205.
Vogel, Ralf 2001 Case conflict in German free relative constructions: An Optimality Theoretic treatment. In: G. Müller and W. Sternefeld (eds.), Competition in Syntax, 341−375. Berlin: Mouton de Gruyter.
Wechsler, Stephen, and Larisa Zlatić 2003 The Many Faces of Agreement. Stanford: CSLI Publications.
Wescoat, Michael T. 2005 English nonsyllabic auxiliary contractions: An analysis in LFG with lexical sharing. In: M. Butt and T. H. King (eds.), Proceedings of the LFG 2005 Conference. Stanford: CSLI Publications.
Wilson, Stephen 1999 Coverbs and Complex Predicates in Wagiman. Stanford: CSLI Publications.
Zaenen, Annie 1993 Unaccusativity in Dutch: Integrating syntax and lexical semantics. In: J. Pustejovsky (ed.), Semantics and the Lexicon, 129−161. Berlin: Kluwer Academic Publishers.
Zaenen, Annie, Joan Maling, and Höskuldur Thráinsson 1985 Case and grammatical functions: The Icelandic passive. Natural Language and Linguistic Theory 3: 441−483. Reprinted in Joan Maling and Annie Zaenen (eds.), Syntax and Semantics 24: Modern Icelandic Syntax, 95−164. New York: Academic Press. 1990.
Miriam Butt, Konstanz (Germany)
Tracy Holloway King, San Jose (USA)
26. Optimality-Theoretic Syntax

1. Model of grammar
2. Evidence for OT analyses in syntax
3. Problems for OT analyses in syntax
4. Optimization domains
5. Conclusion
6. References (selected)
Abstract

This chapter lays out the basic assumptions and workings of optimality-theoretic syntax. After sketching central aspects of the model of grammar presupposed in optimality theory, I address four pieces of evidence for optimality-theoretic approaches to syntax in the first core section (viz., the ubiquity of constraint conflict, the existence of repair phenomena, the notion of default, and cross-linguistic variation). The second core section is devoted to three potential problems with optimality-theoretic syntax that have been identified in the literature (viz., complexity, ineffability, and optionality). In the third core section, the concept of optimization domain is subjected to scrutiny, and with it the question of whether syntactic optimization proceeds serially or in parallel. Finally, some conclusions are drawn concerning the prospects of optimality-theoretic syntax, particularly with respect to recent developments in the minimalist program.
1. Model of grammar

Optimality Theory (OT) has been developed since the early nineties, by Alan Prince, Paul Smolensky, John McCarthy and others. At first, the focus was mainly on phonology; but the approach has since been extended to morphology, syntax, semantics, and pragmatics. The most comprehensive (and best) exposition of the theory is still Prince and Smolensky (1993, 2004). Early groundbreaking work in syntax includes Grimshaw (1997), Pesetsky (1998), and Legendre, Smolensky, and Wilson (1998). Introductions include Kager (1999) (with little material on syntax), Müller (2000b) (in German), Legendre (2001), and McCarthy (2002) (with quite a bit on syntax). OT shares with most other grammatical theories the assumption that constraints are crucial in restricting the class of possible linguistic expressions (LEs) in natural languages; however, it differs in important ways from virtually all other grammatical theories in that it envisages a nontrivial interaction of constraints. More specifically, OT rests on four basic assumptions: First, constraints are universal (universality). Second, constraints are violable (violability). Third, constraints are ranked (ranking). And fourth, the wellformedness of an LE cannot solely be determined on the basis of LE's internal properties. Rather, external factors (more precisely, the competition of LE with other linguistic expressions) determine whether LE is grammatical or not (competition): LEs are candidates. (Here and henceforth, LE stands for a grammatical unit that is subject to an optimization procedure
deciding on its wellformedness. LE is the basic unit of a grammatical domain (phonology, morphology, syntax, semantics); e.g.: the sentence in syntax (but see below).) None of these assumptions is shared by standard grammatical theories like Chomsky's (1981) Government-Binding (GB) theory or Pollard and Sag's (1994) Head-Driven Phrase Structure Grammar. Taking GB theory as a typical example, we can first observe that here, not all constraints are universal (there are parameters and language-specific filters − but cf. third-factor meta-constraints on constraints in recent work in the minimalist program, as in Chomsky 2007, 2008). Second, constraints cannot be violated. Third, constraints are not ranked (i.e., all are equally important and do not interact). (It has sometimes been argued that there is a difference between, e.g., weak and strong violations of constraints on movement, such as the Subjacency Condition vs. the Empty Category Principle (ECP) in Chomsky (1986). However, this is just stipulated on top of the grammatical decision procedure (yes/no), and does not reflect a genuine interaction of constraints, let alone a ranking.) Finally, the wellformedness of a linguistic expression LE (e.g., a sentence) can standardly fully be determined on the basis of LE's internal properties. External factors (i.e., the properties of other LEs) are irrelevant. At the heart of OT is the concept of optimality of a candidate LE, which can be defined as in (1). (1)
Optimality: A candidate Ci is optimal with respect to some constraint ranking iff there is no other candidate Cj in the same candidate set that has a better violation profile.
For now, we can assume that optimality equals grammaticality (or wellformedness). (1) introduces two additional concepts − that of a violation profile, and that of a candidate set. The violation profile determines which of two competing candidates is to be preferred. A concept of violation profile that in principle permits more than one candidate to be optimal is given in (2) (this is in contrast to Grimshaw 1997, which presupposes that only one candidate can be optimal in any given candidate set). (2)
Violation profile: Cj has a better violation profile than Ci if there is a constraint Conk such that (i) and (ii) hold: (i) Cj satisfies Conk better than Ci. (ii) There is no constraint Conl ranked higher than Conk for which Ci and Cj differ.
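Definitions (1) and (2) amount to a lexicographic comparison of violation counts. The following sketch is mine, not the chapter's: a candidate's violation profile is represented as a tuple of violation counts listed from highest- to lowest-ranked constraint, and the candidate names and profiles are invented for illustration.

```python
def better(profile_j, profile_i):
    """Definition (2): profile_j is better than profile_i iff, at the
    highest-ranked constraint on which the two differ, profile_j has
    fewer violations."""
    for v_j, v_i in zip(profile_j, profile_i):
        if v_j != v_i:
            return v_j < v_i
    return False  # identical profiles: neither is better than the other

def optimal(candidates):
    """Definition (1): a candidate is optimal iff no competitor in the
    same candidate set has a better violation profile. Ties are allowed,
    so more than one candidate can be optimal."""
    return [name for name, prof in candidates.items()
            if not any(better(other, prof) for other in candidates.values())]

# Hypothetical candidate set with ranking A >> B >> C:
# O1 violates only C once, O2 violates C twice, O3 violates B once.
print(optimal({"O1": (0, 0, 1), "O2": (0, 0, 2), "O3": (0, 1, 0)}))
# -> ['O1']: O3 loses at the higher-ranked B, O2 loses on C-counts.
```

Note that the definition deliberately permits ties: two candidates with identical profiles are both returned as optimal, in line with the remark above that more than one candidate may be optimal.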
We can assume that a candidate Cj satisfies a constraint Con better than a candidate Ci if Cj violates Con less often than Ci. This includes, as a special case, the situation that Cj does not violate Con at all, whereas Ci does. Turning to candidate sets next, the basic task of this concept is to clarify what competes with what. Various different versions of the concept have been proposed for syntax. (3) lists some of the more widely adopted definitions. (Similar questions, and similar kinds of variation, can be found in (mostly early) versions of the minimalist program (Chomsky 1995, 2001) that rely on transderivational constraints which choose among a set of competing derivations in a candidate set; see Sternefeld (1996) and references cited there. In the minimalist tradition, candidate sets are usually referred to as reference sets.)
(3)
Candidate set:
Two candidates are in the same candidate set iff
a. they have the same content words
b. they have the same words (see Chomsky's 1995 numeration)
c. they have the same meaning
d. they have the same content words and the same meaning
e. they have the same words and the same meaning
f. they have the same content words and a sufficiently similar meaning
g. they have the same f-structure (see work in OT-LFG, where OT is combined with Lexical Functional Grammar; cf. Choi 1999; Sells 2001b; Bresnan 2001; papers in Sells 2001a; and Butt and King, this Volume)
h. they have the same D-structure (see work in the GB tradition)
i. they have the same predicate/argument structures and the same logical forms
j. they have an identical index (a "target predicate-argument structure, with scopes indicated for variables; operators mark scope") (Legendre et al. 1998: 258)
In order to be able to check candidate LEs against a set of violable and ranked constraints and resolve the competition by determining the optimal candidate in a candidate set, one must have the candidates first. In other words: The approach to syntax sketched so far presupposes that there is another, prior, component that generates the candidates. (As a side remark, note that this interaction of components, including the order (generation → optimization), instantiates an abstraction as it is standard in most grammatical theories. It should not be understood as making claims about OT as a model of linguistic cognition in the narrow sense, such that, e.g., first a (possibly infinite; see below) set of candidates is generated in actual language processing that is then subjected to optimization, with the best candidate chosen. Here, techniques have been suggested (at least for certain empirical phenomena) that make it possible to determine the optimal candidate without building all competitors first; see Tesar (1995) and Riggle (2004).) The truly optimality-theoretic component of a grammar that selects a candidate with a best violation profile is often referred to as the H-EVAL (Harmony Evaluation) part of the grammar; this component is fed by a simple standard grammar with inviolable and non-ranked constraints that is called GEN (Generator). The full structure of the syntax component of an OT grammar is given in (4). It is clear what the H-EVAL component takes as its input (viz., the candidate set of competing output candidates generated by GEN); but, as indicated in (4), a major open question (in fact, arguably one of the biggest unresolved problems of OT syntax) is what GEN in turn takes as its input. For phonology, the standard OT assumption is that GEN creates output candidates on the basis of an input; i.e., inputs also define the candidate set (see Prince and Smolensky 2004).
Outputs then differ from their underlying input in various ways (giving rise to faithfulness violations; see below), but inputs are standardly assumed to be of roughly the same type as outputs (e.g., underlying representations [URs]), and may even be identical. This seems hardly tenable for syntax (or for morphology) because it does not take into account the effect of structure-building operations: If outputs for H-EVAL are syntactic structures, and structures are generated by GEN, then where does the input structure come from if inputs are also syntactic structures? Consequently, it is at present completely unclear what the input in syntax should look like.
(4)
Structure of an optimality-theoretic syntax component

  input Ii  →  Gen(erator)  →  output candidates O1, O2, O3, ..., Oi, ..., On  →  H(armony)-Eval(uation)  →  optimal output: well-formed candidate

  input: numeration? pred/arg structure? nothing?
  part (i) of the grammar (Gen): inviolable, unordered constraints; simple standard grammar
  part (ii) of the grammar (H-Eval): violable, ranked, universal constraints; genuine OT grammar
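The Gen → H-Eval division of labour in (4) can be sketched as a toy pipeline. Everything in the following sketch (the candidate language, the particular constraints, the function names) is my own invention for illustration, not the chapter's formalism: Gen enumerates candidates subject to inviolable, unranked restrictions; H-Eval then selects the candidate(s) with the best violation profile under a ranking of violable constraints.

```python
from itertools import permutations

def gen(words):
    """Gen: build output candidates; the single inviolable constraint
    assumed here is that the subject 'John' precedes the verb."""
    return [" ".join(p) for p in permutations(words)
            if p.index("John") < p.index("bought")]

def h_eval(candidates, ranking):
    """H-Eval: select candidates with the lexicographically best
    profile of violation counts under the given constraint ranking."""
    profile = lambda c: tuple(con(c) for con in ranking)
    best = min(profile(c) for c in candidates)
    return [c for c in candidates if profile(c) == best]

# Two invented violable constraints on the candidate strings.
verb_object_adjacent = lambda c: 0 if "bought a book" in c else 1
subject_first = lambda c: 0 if c.startswith("John") else 1

candidates = gen(["John", "bought", "a book"])
print(h_eval(candidates, [subject_first, verb_object_adjacent]))
# -> ['John bought a book']
```

The point of the sketch is only the architecture: Gen's restrictions cannot be violated by any candidate it emits, whereas H-Eval's constraints are violated freely and merely compared.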
Suggestions range from relatively poorly structured inputs (e.g., predicate/argument structures in Grimshaw 1997) to extremely richly structured inputs (e.g., the index of Legendre et al. 1998); in fact, given that one task standardly attributed to the input is that of defining candidate sets, many of the proposals in (3) can also be viewed as proposals for concepts of inputs. What is more, it might be that there is no input in syntax at all. Thus, in Heck et al. (2002) it is argued that the two basic motivations for inputs in phonology − viz., (i) defining candidate sets and (ii) providing information for faithfulness constraints (see below) − are either unavailable or irrelevant in syntax. More specifically, (i) is unavailable because candidate sets cannot adequately be defined by resorting to input information only, and (ii) is irrelevant because syntax, unlike phonology, is an information-preserving system, with, e.g., subcategorization information present on a verb throughout the derivation. In what follows, I will leave this issue undecided. I will continue to presuppose that inputs exist (but I will not presuppose any specific concept of input); I will equate candidates with outputs. Standardly, two basic types of H-EVAL constraints can be distinguished in OT that often give rise to conflicts. On the one hand, there are faithfulness constraints that demand that input and output are identical with respect to some property. There are three basic subtypes: First, DEP (Dependency) constraints (sometimes also referred to as FILL constraints, with subtle differences related to the overall organization of grammar that need not concern us here) state that there can be no items in the output that are not present in the input. Assuming, for instance, expletives to be absent in syntactic inputs, the occurrence of an expletive in an output will violate a DEP constraint. 
The same may hold for traces (or copies), assuming that syntactic inputs (whatever they ultimately look like) are unlikely to involve movement. Second, MAX (Maximality) constraints
(sometimes also referred to as PARSE constraints, with the same qualification as above) demand that all items that are present in the input are also present in the output. Thus, all kinds of deletion phenomena will incur violations of MAX constraints. Third, IDENT (Identity) constraints prohibit the modification of items from input to output. Note that DEP, MAX, and IDENT constraints can be formulated for items of various complexity levels (e.g., feature values, features, feature bundles, lexical items, perhaps complex syntactic categories). Accordingly, MAX/DEP constraints for items with complexity n can often be reformulated as IDENT constraints at the next-higher complexity level n+1, and vice versa. E.g., deletion of a feature (a MAX violation) will give rise to a different lexical item bearing this feature in the input (an IDENT violation). Next to faithfulness constraints, there is a second basic type of H-EVAL constraint: Markedness constraints impose requirements on outputs that may necessitate a deviation from the input. (A side remark: Under the input-free conception of OT syntax mentioned above, DEP, MAX and IDENT constraints all have to be reformulated as constraints that are purely output-oriented. In the case of DEP constraints, this will involve markedness constraints banning items with property P, where P is the property that kept the item from appearing in the input in the first place. For instance, rather than violating faithfulness qua not appearing in the input, expletives, under the input-free view, would violate markedness qua being semantically empty, which would be just the property responsible for their non-occurrence in inputs in the standard OT model; see Grimshaw 1997 for such an approach.) Optimality-theoretic competitions are often illustrated by tables, so-called tableaux. The basic principle is illustrated in (5).
There are three constraints A, B, and C, with A ranked higher than B, and B ranked higher than C (A ≫ B ≫ C). The candidate set contains five candidate outputs O1−O5 (typically, there are many more, but let us focus on these five for now). Violations incurred by a candidate are marked by a star (*). A decisive violation of some constraint that is responsible for eliminating the candidate by classifying it as suboptimal is here accompanied by an exclamation mark (!); this is strictly speaking redundant and is accordingly sometimes left out in tableaux. Finally, an optimal candidate is identified by the so-called pointing finger: ☞. Given the constraint violations induced by the candidates, and given the ranking of the three constraints A ≫ B ≫ C, O1 turns out to have the (sole) best violation profile in tableau (5) (see definition [2]), and is therefore predicted to be optimal (see definition [1]). (5)
The basic principle

            A     B     C
  ☞ O1                  *
    O2                  **!
    O3            *!
    O4      *!
    O5      *!          *
Consider next the issue of cross-linguistic variation. An assumption that is not made in most minimalist approaches, but virtually everywhere else in syntactic theory (including GB theory), is that languages differ with respect to their grammars (i.e., not just the make-up of lexical items). Grammatical differences between languages are often assumed not to be completely arbitrary; this is then captured by assuming some kind of principled variation, or parametrization. Parametrization in optimality theory is simply viewed as constraint reranking. Thus, suppose that the ranking of constraints B and C is reversed in tableau (5), with the violation profile of the competing outputs remaining identical. In that case, O3 (rather than O1) is predicted to be optimal. This is shown in tableau (6). (6)
Parametrization

            A     C     B
    O1            *!
    O2            **!
  ☞ O3                  *
    O4      *!
    O5      *!    *
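The reranking effect illustrated by tableaux (5) and (6) can be simulated directly: the same violation profiles, evaluated under two rankings, yield different winners. This sketch is mine, not part of the chapter; the per-constraint violation counts are chosen so as to reproduce the outcomes described in the text (O1 optimal under A ≫ B ≫ C, O3 optimal under A ≫ C ≫ B).

```python
def optimal(candidates, ranking):
    """Select all candidates with the best violation profile under the
    given ranking; lexicographic tuple comparison implements strict
    domination."""
    def profile(name):
        return tuple(candidates[name].get(con, 0) for con in ranking)
    best = min(profile(name) for name in candidates)
    return [name for name in candidates if profile(name) == best]

# Per-constraint violation counts (constraints not listed are unviolated).
violations = {
    "O1": {"C": 1},
    "O2": {"C": 2},
    "O3": {"B": 1},
    "O4": {"A": 1},
    "O5": {"A": 1, "C": 1},
}

print(optimal(violations, ["A", "B", "C"]))  # -> ['O1'], as in tableau (5)
print(optimal(violations, ["A", "C", "B"]))  # -> ['O3'], as in tableau (6)
```

Parametrization thus requires no change to the candidates or their violations at all; only the ranking argument differs between the two calls.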
OT was developed out of so-called harmonic grammar approaches, which are instantiations of a more general theory of neural networks; see Prince and Smolensky (2004: ch. 10) and Smolensky and Legendre (2006: part I) for detailed elaboration of the differences between these two kinds of approaches. The main innovation of OT is that quality comes before quantity, in the sense that no number of violations of a lower-ranked constraint can outweigh a single violation of a higher-ranked constraint. This property (which is also known as strict domination) is encoded in the definition of violation profile in (2); it is illustrated by the abstract competition in tableau (7): Even though candidates O3 and O4 each incur only one constraint violation in total (and O5 only two), O1, with four constraint violations all in all, emerges as optimal because its violations only concern the lowest-ranked constraint C. Quantity does become relevant when quality cannot decide between candidates; thus, O2 is blocked by O1 because it incurs more violations of the highest-ranked constraint on which the two candidates differ. (7)
Irrelevance of constraint violation numbers as such

            A     B     C
  ☞ O1                  ****
    O2                  *****!**
    O3            *!
    O4      *!
    O5      *!          *
However, there is a caveat. In some versions of OT, a means has been introduced that undermines the irrelevance of constraint violation quantity as such, viz., local conjunction of constraints; see Smolensky (1996, 2006). Local conjunction can be defined as in (8).
(8)
Local Conjunction:
a. Local conjunction of two constraints Con1, Con2 with respect to a local domain D yields a new constraint Con1&DCon2 that is violated iff there are two separate violations of Con1 and Con2 in a single domain D.
b. Universal ranking: Con1&DCon2 ≫ {Con1, Con2}
c. It may be that Con1 = Con2. (Local conjunction is reflexive.)
d. Notation: B2 = B&B, B3 = B2&B, etc.
Given local conjunction, the situation can arise that the joint violation of two low-ranked constraints B, C may in fact outweigh the violation of a higher-ranked constraint A (because the complex constraint B&DC derived from local conjunction may be ranked higher than A). Moreover, local conjunction can be reflexive (see [8c]); this means that multiple violations of a single constraint may also suffice to outweigh the violation of a higher-ranked constraint. This is illustrated in tableau (9), which differs minimally from tableau (7) in that C4 (the result of iterated reflexive local conjunction applying to C which is violated when C is violated four times or more) is present, and which produces a different winner (viz., O3). (9)
A consequence of reflexive local conjunction

            C4    A     B     C
    O1      *!                ****
    O2      *!                *******
  ☞ O3                  *
    O4            *!
    O5            *!          *
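Reflexive local conjunction can be added to such a simulation by deriving the violations of C4 from the violations of C. The sketch below is my own, not the chapter's; the violation counts are chosen to reproduce the competition described for tableaux (7) and (9), and returning a single winner is a simplification for illustration.

```python
def optimal(candidates, ranking):
    """Best violation profile under a ranking (strict domination via
    lexicographic comparison); returns a single winner for simplicity."""
    return min(candidates,
               key=lambda name: tuple(candidates[name].get(con, 0)
                                      for con in ranking))

violations = {
    "O1": {"C": 4},
    "O2": {"C": 7},
    "O3": {"B": 1},
    "O4": {"A": 1},
    "O5": {"A": 1, "C": 1},
}

# Derive C4 = C&C&C&C: one violation iff C is violated four times or more.
for profile in violations.values():
    profile["C4"] = 1 if profile.get("C", 0) >= 4 else 0

print(optimal(violations, ["A", "B", "C"]))        # -> 'O1', as in (7)
print(optimal(violations, ["C4", "A", "B", "C"]))  # -> 'O3', as in (9)
```

Because C4's violations are a function of C's, the only genuine change between the two evaluations is again the ranking: placing the conjoined constraint on top lets accumulated violations of low-ranked C decide the competition.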
Tableau (9) should make it clear that local conjunction is far from harmless. This conclusion is reinforced by the observation that an unrestricted system of local conjunction automatically leads to a non-finite set of constraints, which is otherwise unheard of in grammatical theory. Still, it can be noted that local conjunction (reflexive or otherwise) has given rise to a number of insightful analyses of various grammatical phenomena; so there is in fact quite a bit of empirical evidence for it (see, for instance, Legendre et al. 1998 on displacement, Fischer 2001 on quantifier scope, Aissen 1999, 2003a and Keine and Müller 2008, 2011 on differential argument encoding, and Keine 2010 on eccentric instances of case and agreement). In what follows, I will highlight central aspects of OT syntax, focussing on issues where OT syntax substantially differs from other syntactic theories. In doing so, I will first discuss types of empirical evidence that would seem to support an OT perspective (section 2); after that I turn to kinds of data that may qualify as problematic for an OT perspective (section 3). Section 4 then addresses a topic that strikes me as potentially very important for future work in OT syntax, particularly when compared to recent developments in the minimalist program, viz., the issue of optimization domains. Finally, in section 5, I turn to the prospects for OT syntax “as a framework”.
2. Evidence for OT analyses in syntax

Central pieces of evidence for OT analyses come from the following four domains: (i) constraint conflict, (ii) repair phenomena, (iii) default contexts (emergence of the unmarked), and (iv) cross-linguistic variation by constraint reranking. I will address these issues in turn.
2.1. Constraint conflict

Here the profile of the empirical evidence looks as follows. The facts show that two general and far-reaching constraints are well motivated, independently of one another. However, in some contexts the two constraints may end up being in conflict, with the evidence suggesting that one may selectively, and systematically, be violated in favour of the other. In standard approaches to grammar, this state of affairs automatically gives rise to an undesirable consequence: One of the two constraints must be abandoned; or there has to be an explicit exception clause in the definition of one of the constraints; or the application of one of the two constraints has to be relegated to some other (typically more abstract) level of representation; etc. In an OT grammar, the constraint conflict can be systematically resolved by constraint ranking. Simple wh-movement in English is a case in point; consider (10).

(10) a.  I don't know [CP which book John bought]
     b. *I don't know [CP John bought which book]

Any grammar of English will recognize (something like) (11a) and (11b) as two plausible constraints: On the one hand, in simple questions, a wh-phrase moves to a clause-initial position (SpecC, e.g.); on the other hand, a direct object shows up in the immediate vicinity of the verb that it is an object of.

(11) a. WH-CRITERION (WH-CRIT): Wh-items are in SpecC[wh].
     b. θ-ASSIGNMENT (θ-ASSIGN): Internal arguments of V are c-commanded by V.

In (10), (11a) and (11b) cannot both be satisfied, and the well-formedness of (10a) suggests that it is (11b) that has to give in the case of conflict. This conclusion cannot be drawn in standard models of grammar (that do not envisage constraint violability), though.
The consequence here has to be that either θ-ASSIGN does not hold; or the constraint is enriched by an exception clause ("does not hold for wh-items"); or both constraints hold, but not at the same level of representation (WH-CRIT may hold for surface representations or S-structure, θ-ASSIGN may hold for an abstract level of predicate argument structure or D-structure). In contrast, in OT, both constraints can be assumed to hold, but they are ranked as in (12).

(12) Ranking: WH-CRIT ≫ θ-ASSIGN
The competition underlying (10) is illustrated in tableau (13) (for the sake of clarity, a specification of the input is provided in the form of a numeration; this is of no further importance in the present context).

(13) Simple wh-question formation in English

  Input: John, bought, which, book, v, T, C[+wh]

                                        WH-CRIT    θ-ASSIGN
  ☞ O1: ... which book John bought                    *
    O2: ... John bought which book        *!
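The evaluation in (13) can likewise be expressed as a toy computation. The encoding below is my own illustration, with the two constraints hand-coded as functions over invented boolean candidate descriptions; nothing in it is meant as the chapter's formalism.

```python
# Candidate word orders, described by two invented boolean properties.
candidates = {
    "... which book John bought": {"wh_in_specC": True,  "obj_under_V": False},
    "... John bought which book": {"wh_in_specC": False, "obj_under_V": True},
}

def wh_crit(props):       # (11a): wh-items are in SpecC[wh]
    return 0 if props["wh_in_specC"] else 1

def theta_assign(props):  # (11b): internal arguments c-commanded by V
    return 0 if props["obj_under_V"] else 1

ranking = [wh_crit, theta_assign]  # (12): WH-CRIT ranked above THETA-ASSIGN

winner = min(candidates,
             key=lambda c: tuple(con(candidates[c]) for con in ranking))
print(winner)  # -> '... which book John bought', as in tableau (13)
```

As in the tableau, the winner does violate θ-ASSIGN; it merely violates it less damagingly than its competitor violates the higher-ranked WH-CRIT.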
Note that the displacement of the wh-item can be analyzed in terms of a syntactic movement transformation that moves the wh-item from its base position into the target SpecC position. Movement may be assumed to leave a trace (t) or a copy. On this view, the role of θ-ASSIGN can be taken over by the more general constraint ECONOMY (see Grimshaw 1997; Legendre et al. 1998; Ackema and Neeleman 1998; among others, for versions of this constraint). Given that OT, like other grammatical theories, strives for maximally simple and elegant constraints, this would seem to be a step in the right direction.

(14) ECONOMY: Traces (copies) are prohibited.

As a matter of fact, it turns out that (14) can − and, if so, arguably should − be derived from yet more general constraints and their interaction; Grimshaw (2001, 2006) has come up with promising attempts to achieve this (also see Steddy and Samek-Lodovici 2011 for an application of the underlying logic to universal constraints on DP-internal order of D, Number, A and N): On the one hand, it can be observed that all syntactic constituents violate so-called alignment constraints that dictate the left-peripheral or right-peripheral placement of items. Given dichotomies like HEAD-LEFT/HEAD-RIGHT and COMPLEMENT-LEFT/COMPLEMENT-RIGHT, with (due to the universality of constraints) both inherently conflicting constraints of a pair active in every language even if only one of the two actually determines a given order, it is clear that more structure will invariably imply more violations of alignment constraints (viz., the ones which are violated in any given structure). Movement is structure-building; therefore, any ECONOMY violation will also trigger a violation of alignment; see Grimshaw (2001). On the other hand, as remarked above, all movement chains in outputs are trivial (i.e., single-membered) in the input. Movement gives rise to non-trivial (i.e., multi-membered) chains.
This implies a violation of faithfulness (IDENT/UNIQUENESS); see Grimshaw (2006). The main conclusion concerning the role of constraint conflict is summed up in the following quote.

    Whether UG constraints conflict or not is an empirical issue. If they do, and they do appear to do so, a formally precise theory of their interaction becomes necessary for a proper understanding of grammar because simultaneous satisfaction of all constraints ceases to be a viable definition of grammaticality. (Samek-Lodovici 2006a: 94)
IV. Syntactic Models
2.2. Repair phenomena

With repair phenomena, the profile of the empirical evidence is this: The facts suggest that some well-formed complex LE exhibits properties that are not normally permitted in the grammar. It seems that, in the case at hand, these properties are permitted as a last resort (given that all alternatives qualify as even worse, in a sense to be made precise). Consider the distribution of resumptive pronouns in English, as indicated by the examples in (15).

(15) a. (the man) who(m) I saw t
     b. *(the man) who(m) I don’t believe the claim that anyone saw t
     c. *(the man) who(m) I saw him
     d. ?(the man) who(m) I don’t believe the claim that anyone saw him
The insertion of resumptive pronouns may (often) be viewed as a repair phenomenon, i.e., as a last resort operation that can only take place if a well-formed sentence cannot otherwise be generated (see Shlonsky 1992; Hornstein 2001). Here, a resumptive pronoun is possible if movement is blocked by an island constraint (the Complex NP Constraint, in the case at hand; see Ross 1967); compare (15b) (movement) with (15d) (resumption). If movement is possible, resumption is blocked; cf. (15a, c). The insertion of a resumptive pronoun (which, by assumption, is not part of the input) violates a DEP faithfulness constraint, but is required by a higher-ranked markedness constraint. OT analyses of resumptive pronouns that employ this general logic have been developed in Pesetsky (1998), Legendre et al. (1998), and Salzmann (2006). Let us look at what a (simplified) account of the pattern in (15) could look like. Suppose that there is a constraint like REL-CRIT in (16a) that triggers displacement in relative clauses; and that there is an island constraint like CNPC in (16b). (There may eventually be much more general constraints, or sets of constraints, replacing both REL-CRIT and CNPC, but this is immaterial for the logic of the argument.) Furthermore, there is a DEP constraint blocking insertion of resumptive pronouns in outputs. Following Chomsky (2000, 2001), this constraint may be referred to as INCLUSIVENESS; see (16c).

(16) a. REL-CRITERION (REL-CRIT): Relative pronouns are in SpecC of a relative clause.
     b. COMPLEX NP CONDITION (CNPC): A moved item must not be separated from its trace by an intervening DP.
     c. INCLUSIVENESS (INCL, a DEP constraint): Every element of the output must be present in the input.

Suppose next that the ranking is as in (17) (the ranking of REL-CRIT and CNPC is actually not crucial here but will be assumed to be strict to simplify matters).

(17) Ranking: REL-CRIT ≫ CNPC ≫ INCL

This accounts for the pattern in (15). Tableau (18) shows two things.
First, the highest-ranked REL-CRIT is not violable in an optimal output (i.e., relative operator movement is obligatory). And second, a resumptive pronoun that violates INCL is blocked if movement is possible (i.e., compatible with CNPC).

(18) Trace vs. resumptive pronouns; transparent context
Input: I, who(m), saw, C[rel], the, man

       Candidates                      | REL-CRIT | CNPC | INCL
  ☞ O1: the man who(m) I saw t        |          |      |
     O2: the man who(m) I saw him      |          |      | *!
     O3: the man I saw who(m)          | *!       |      |
In contrast, tableau (19) illustrates that if movement would have to violate CNPC, resumption becomes optimal: INCL is violable as a last resort.

(19) Trace vs. resumptive pronoun, opaque CNPC context
Input: anyone, who(m), saw, I, do, not, believe, the, claim, that, C[rel], the man

       Candidates                                                     | REL-CRIT | CNPC | INCL
     O1: the man who(m) I don’t believe the claim that anyone saw t   |          | *!   |
  ☞ O2: the man who(m) I don’t believe the claim that anyone saw him |          |      | *
     O3: the man I don’t believe the claim that anyone saw who(m)     | *!       |      |
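The evaluation step implicit in tableaux (18) and (19) is mechanical: each candidate's violation marks are collected into a tuple ordered by the ranking in (17), and the candidate with the lexicographically smallest tuple wins. The following Python sketch is not part of the original analysis; the candidate labels are abbreviations, and the violation counts are read directly off the two tableaux:

```python
# Generic OT evaluation (EVAL): each candidate is mapped to a tuple of
# violation counts ordered by the constraint ranking; the optimal candidate
# is the one with the lexicographically smallest tuple.

RANKING = ["REL-CRIT", "CNPC", "INCL"]  # ranking (17)

def profile(violations, ranking=RANKING):
    """Turn a {constraint: count} dict into a ranked violation tuple."""
    return tuple(violations.get(c, 0) for c in ranking)

def optimal(candidates, ranking=RANKING):
    """Return the candidate with the best (smallest) violation profile."""
    return min(candidates, key=lambda name: profile(candidates[name], ranking))

# Tableau (18): transparent context; movement is possible.
transparent = {
    "O1: trace":      {},                 # the man who(m) I saw t
    "O2: resumptive": {"INCL": 1},        # the man who(m) I saw him
    "O3: in situ":    {"REL-CRIT": 1},    # the man I saw who(m)
}

# Tableau (19): opaque context; movement would cross a complex NP island.
opaque = {
    "O1: trace":      {"CNPC": 1},
    "O2: resumptive": {"INCL": 1},
    "O3: in situ":    {"REL-CRIT": 1},
}

print(optimal(transparent))  # O1: trace (resumption is blocked)
print(optimal(opaque))       # O2: resumptive (last resort)
```

The same generic evaluation function works for any of the tableaux discussed below; only the ranking and the violation profiles change.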
There are many more such instances of repair phenomena in syntax, and some of them have been given OT analyses that reflect the basic mechanism just presented. The first repair phenomenon to receive an optimality-theoretic account is do-support in English; see Grimshaw (1997) (Grimshaw shows that insertion of an expletive do, which violates a faithfulness constraint, is only possible in wh-contexts and negation contexts, where (partially different sets of) higher-ranked constraints conspire so as to make verb raising obligatory, and an auxiliary that can raise is not available), and also Grimshaw (2010) and references cited there. Other phenomena include the so-called Ersatz-infinitive (Infinitivus pro Participio) in German and the opposite phenomenon of Participium pro Infinitivo in Swedish (see Wiklund 2001; Schmid 2005; Vogel 2009a); R-pronouns in German, Dutch and (Middle) English (see Müller 2000a); wh-scope marking in German and Hungarian (see Müller 1997); expletives in SpecC and SpecT (see Müller 2000b; Grimshaw 2006); repair-driven quantifier raising in VP ellipsis contexts in English, as identified in Fox (2000) (see Heck and Müller 2000, 2003); repair-driven intermediate movement steps as required by the PIC of Chomsky (2001, 2008) (see Heck and Müller 2000, 2003); repair-driven multiple wh-movement in German sluicing constructions, as identified in Merchant (2001) (see Heck and Müller 2000, 2003); and so on. For all these phenomena, the idea that a repair or last resort effect is involved looks fairly natural, and has been widely pursued in various types of syntactic theories. However, as pointed out by Grimshaw (2010), theories that lack the concept of constraint violability
and constraint ranking (like virtually all current non-OT approaches) “generally appeal to the last resort idea by word and not by deed”; “the words ‘last resort’ are employed but the concept plays no role in the analysis.” Consider briefly three of these phenomena. The German Ersatz-infinitive pattern is given in (20).

(20) a. dass sie das gewollt hat.                            [German]
        that she that want.PTCP has
        ‘that she wanted it.’
     b. *dass sie das hat wollen.
        that she that has want.INF
        ‘that she wanted it.’
     c. *dass sie das Lied singen gewollt hat.
        that she the song sing.INF want.PTCP has
        ‘that she wanted to sing the song.’
     d. dass sie das Lied hat singen wollen.
        that she the song has sing.INF want.INF
        ‘that she wanted to sing the song.’

The perfect auxiliary haben normally selects a past participle; it is incompatible with an infinitival form of its dependent main verb; see (20a) vs. (20b). However, when the dependent verb of the perfect auxiliary is itself (used as) a modal verb that further subcategorizes for another verb (which in turn regularly shows up as an infinitive), it has to take on the infinitival form (see [20d]); the expected participial form cannot show up (see [20c]). In addition, the change to the Ersatz-form is obligatorily accompanied by a reversal of order (at least in Standard German): VP-Aux becomes Aux-VP. The gist of an OT analysis will then consist in postulating the interaction of a faithfulness constraint demanding selection requirements to be respected (the perfect auxiliary selects a past participle) on the one hand, and a higher-ranked markedness constraint banning configurations where past participle modals embed verbal categories on the other hand; the optimal violation profile will then (ideally) automatically emerge as one with a reversal of word order.

Next, consider the case of repair-driven quantifier raising as it is documented in English VP ellipsis constructions as in (21) (deletion is indicated by crossing out words).

(21) a. [CP₁ Some boy admires every teacher ], [ and [CP₂ some girl does [VP admire every teacher] too ]]   (dc, cd)
     b. [CP₁ Some boy admires every teacher], [ and [CP₂ Mary does [VP admire every teacher] too ]]   (dc, *cd)
     c. [CP₁ Mary admires every teacher], [ and [CP₂ some boy does [VP admire every teacher] too ]]   (dc, cd)

The observation to be explained here is that whereas (21a, c) are scopally ambiguous, (21b) is not.
Suppose, following Fox (2000), that scope reversal from the linear order requires quantifier raising (QR), and that there is an economy constraint blocking QR if the same interpretation is reached without it (in particular, QR of an object quantifier is thus blocked if the subject is a proper name, as in the second (CP2) conjunct in (21b) and in the first (CP1) conjunct in (21c)). Furthermore, note that VP ellipsis obeys strict parallelism: What happens in one conjunct must also happen in the other one. Finally, suppose, again following Fox, that the two conjuncts are generated one after the other, in a bottom-up, right-to-left fashion: CP2 is then optimized before CP1 is. (As a matter of fact, this presupposes that optimization can be serial and local. See section 4 below.) On this view, the ranking of the parallelism requirement above scopal economy will produce the pattern in (21). Both constraints can be fulfilled by both optimizations (applying first to CP2, and then to CP1) in (21a). In (21b), parallelism is not yet an issue in CP2 (for CP1 does not yet exist); so QR is blocked in CP2. Subsequently, parallelism becomes relevant during CP1 optimization, and since CP2 cannot be changed anymore, it blocks scope reversal in CP1 even though this would be semantically non-vacuous. Finally, in (21c), QR may apply in CP2 (since it is semantically non-vacuous), and if it does, parallelism will force it to also apply later in CP1, even though it is not semantically motivated there − in other words, it is repair-driven.

The final repair phenomenon to be considered here is repair-driven multiple wh-movement in German sluicing constructions. Assume that sluicing is analyzed as wh-movement to a SpecC position, accompanied by deletion of the TP sister of an interrogative C.

(22) a. Irgendwer hat irgendwas geklaut, aber Kirke weiß nicht mehr          [German]
        someone has something stolen but Kirke knows not more
        [CP wer1 was2 C t1 t2 geklaut hat ]
            who what stolen has
        ‘Someone stole something, but Kirke does not know anymore who stole what.’
     b. *Irgendwer hat irgendwas geklaut, aber Kirke weiß nicht mehr
         someone has something stolen but Kirke knows not more
         [CP wer1 C t1 was2 geklaut hat ]
             who what stolen has
        ‘Someone stole something, but Kirke does not know anymore who stole what.’

The interesting observation is that (assuming that all alternative parses, e.g., via wh-scrambling, can be excluded) (22a) instantiates a case of multiple wh-movement, which is not available outside of sluicing contexts in German and would therefore seem to qualify as a repair phenomenon. An OT analysis can postulate an interaction of a general recoverability requirement that precludes a wh-phrase from being deleted, and a second constraint (or set of constraints) ensuring that only one wh-phrase can undergo movement to the specifier of an interrogative C in German; as shown by the contrast in (22a) vs. (22b), the first constraint outranks the second one, leading to multiple wh-movement in the case of conflict.
2.3. Default contexts

The notion of default is a core concept in linguistics. The profile of the empirical evidence looks as follows: The data suggest that there is a concept like unmarked case (default, elsewhere case): Some linguistic property P of LEs counts as the unmarked case if it shows up whenever something else (that is incompatible with P) is not explicitly required. In standard conceptions of grammar, the theoretical implementation of this concept is far from unproblematic. (Whenever it seems to be unproblematic, as in approaches to syntax that envisage blocking (see Williams 1997; Fanselow 1991), or in Distributed Morphology (Halle and Marantz 1993), this is due to the fact that the approach in fact shares crucial features with OT − in the case at hand, that it is based on competition and candidate sets, too.) In OT, an unmarked case signals the presence of a constraint C that is ranked very low, and that is typically rendered inactive by higher-ranked, conflicting constraints. However, when these latter constraints do not distinguish between the candidates, C becomes decisive; this state of affairs is usually referred to as the emergence of the unmarked. As an example, consider the following empirical generalization: In the unmarked case, a DP bears nominative case in German; i.e., nominative is the default case. Default nominative shows up in all contexts in which the regular rules of case government do not apply. This includes the contexts in (23). (23a) instantiates a construction in which an appositive DP introduced by als (‘as’) further specifies a DP. In principle, there is an option (not shown here) for the appositive DP to show up with the same case as the other DP (here, the genitive), via a process of case agreement (see Fanselow 1991). However, if this option is not chosen, the appositive DP receives nominative case, as a default. A second context involves infinitival constructions with a (case-less) PRO subject (see again Fanselow 1991); it looks as though there is no possible case-government or case-agreement source for the DP einer in (23b); so it receives default nominative case.
The third example in (23c) involves left dislocation. As in the first context, there is an option of case agreement, but if this option is not chosen, the left-dislocated item bears default nominative case. Finally, (23d) is an instance of a predicative use of und (‘and’), which here connects a subject with an infinitival VP (see Sailer 2002). Standardly, subjects in German bear nominative case in the presence of finite T, and accusative case if embedded under exceptional case marking (AcI; accusativus cum infinitivo) verbs. Since neither context is present in (23d), there is a resort to default case.

(23) a. die Ehrung des Kanzlers als großer Politiker / *großen Politiker     [German]
        the homage the.GEN chancellor.GEN as great.NOM politician.NOM / great.ACC politician.ACC
        ‘the homage paid to the chancellor as a great politician’
     b. Wir baten die Männer [CP PRO einer / *einen nach dem anderen durch die Sperre zu gehen ]
        we asked the men one.NOM / one.ACC after the other through the barricade to go
        ‘We asked the men to go through the barricade one after the other.’
     c. Der Kaiser / *Den Kaiser, dem verdanken wir nichts
        the emperor.NOM / the emperor.ACC him owe we nothing
        ‘As for the emperor, we don’t owe him anything.’
     d. Der / *Den und ein Buch lesen? (Dass ich nicht lache!)
        the.M.NOM / the.M.ACC and a book read (that I not laugh)
        ‘Him read a book? You must be joking.’

The examples in (24) show that the nominative in (23a−d) is indeed a default case; it is overridden in all contexts in which rules of case-government apply (accusative assignment by, perhaps, the category v in unmarked object case contexts in [24a], genitive assignment by a V which is lexically specified for this case in [24b], dative assignment by, perhaps, an applicative functional head in a double object construction in [24c]).

(24) a. dass ich *er / ihn getroffen habe                    [German]
        that I he.NOM / him.ACC met have
        ‘that I met him.’
     b. dass man *der Mann / des Mannes gedachte
        that one the man.NOM / the man.GEN remembered
        ‘that people remembered the man.’
     c. dass wir *der Mann / dem Mann das Buch geben
        that we the man.NOM / the man.DAT the book.ACC give
        ‘that we give the man the book.’

The distribution of cases in (23) and (24) can (partially) be accounted for by the system of case-related constraints in (25), accompanied by the ranking in (26).

(25) a. GEN(ITIVE) CONSTRAINT (GEN): The object of a verb that is lexically marked as governing genitive case bears genitive.
     b. ACC(USATIVE) CONSTRAINT (ACC): The object of a transitive verb bears accusative case.
     c. NOMINATIVE CONSTRAINT (NOM): A DP bears nominative case.

(26) Ranking: GEN ≫ ACC ≫ NOM

The competition in a typical case-government context is illustrated in tableau (27): Nominative case is blocked on the object because a higher-ranked constraint demands accusative here.

(27) Accusative government
Input: dass, getroffen, habe, 1.Sg./Agent, 3.Sg./Patient

       Candidates                             | GEN | ACC | NOM
  ☞ O1: dass ich.NOM ihn.ACC getroffen habe  |     |     | *
     O2: dass ich.NOM er.NOM getroffen habe   |     | *!  |
     O3: dass mich.ACC ihn.ACC getroffen habe |     |     | **
In contrast, consider tableau (28): Here, all higher-ranked case-related constraints are satisfied vacuously, so the low-ranked constraint NOM springs into action and ensures nominative case on the subject of und.

(28) Nominative as the unmarked case
Input: und, ein, Buch, lesen, 3.Sg./Agent/Dem

       Candidates                     | GEN | ACC | NOM
     O1: Den.ACC und ein Buch lesen?  |     |     | *!
  ☞ O2: Der.NOM und ein Buch lesen?  |     |     |
     O3: Dem.DAT und ein Buch lesen?  |     |     | *!
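Tableaux (27) and (28) can likewise be checked mechanically; note that gradient violation counts matter here, since O3 in (27) loses to O1 by incurring a second NOM violation. A brief Python sketch follows (the violation counts are read off the two tableaux; the evaluation function is a generic OT EVAL, not taken from the text):

```python
# Violation profiles from tableaux (27) and (28), evaluated under the
# ranking GEN >> ACC >> NOM from (26).

RANKING = ["GEN", "ACC", "NOM"]

def optimal(candidates):
    # Lexicographic comparison of ranked violation-count tuples.
    return min(candidates, key=lambda n: tuple(
        candidates[n].get(c, 0) for c in RANKING))

# Tableau (27): a case-government context.
government = {
    "O1: dass ich.NOM ihn.ACC getroffen habe":  {"NOM": 1},  # object not NOM
    "O2: dass ich.NOM er.NOM getroffen habe":   {"ACC": 1},  # object not ACC
    "O3: dass mich.ACC ihn.ACC getroffen habe": {"NOM": 2},  # two NOM violations
}

# Tableau (28): no case governor; GEN and ACC are satisfied vacuously,
# so low-ranked NOM decides (emergence of the unmarked).
default = {
    "O1: Den.ACC und ein Buch lesen?": {"NOM": 1},
    "O2: Der.NOM und ein Buch lesen?": {},
    "O3: Dem.DAT und ein Buch lesen?": {"NOM": 1},
}

print(optimal(government))  # O1 wins: one NOM violation beats O3's two
print(optimal(default))     # O2 wins: default nominative
```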
Note that if nominative (or absolutive in ergative alignment patterns) is inherently a default case across languages, free reranking of the constraints in (26) must be blocked in some way, which may then be related to the more primitive feature structures of the cases (see, e.g., Wunderlich 1997; Kiparsky 1999, 2001). Then again, a look at English may already suggest that other cases may also act as the default case in a language (the accusative in the case at hand). There are many other default phenomena in natural languages, and most of them can arguably be treated straightforwardly in the same way, as an instance of emergence of the unmarked. To name just one further phenomenon: Movement often seems to obey an order preservation constraint. However, to permit permutation at all (as it arises, e.g., when an object wh-DP moves to a local SpecC position across a subject DP), such a constraint must clearly be violable (see Williams 2003). This suggests an OT approach where the constraint demanding order preservation is ranked low but springs into action when all pertinent higher-ranked constraints do not distinguish the candidates; see Müller (2001) and Engels and Vikner (2013). Before proceeding, let me briefly come back to the nature of the constraint GEN. (25a) may plausibly be viewed as a subcase of a more general constraint demanding faithfulness to lexical case specifications. At this point, it can be noted that despite initial appearances to the contrary, OT is arguably not perfectly well designed to capture lexical exceptions via faithfulness to lexical specifications. Here is why: Suppose that a lexical item α is lexically specified as demanding property P in the output (e.g., a verb governs genitive case on its internal argument DP). If a faithfulness constraint demanding preservation of this information in the output is sufficiently highly ranked, P shows up in the output, as desired (e.g., the DP is marked genitive). 
However, there is no intrinsic requirement for faithfulness to lexical specifications to outrank conflicting constraints in a language, and this means that the situation may well occur that exceptional lexical specifications may be present on lexical items without ever showing up in optimal outputs. To take a far-fetched example: All transitive verbs in German might be lexically specified as governing ergative case for their subject, or as governing instrumental case on a direct object, but with high-ranked case-government constraints outranking the respective faithfulness constraints demanding ergative (or instrumental), this information can never make it to the surface. On this view, peculiar ambiguities may arise: A grammar of German with ergative specifications on transitive verbs and another grammar without will yield the same output; this would give rise to principled redundancies. Potential problems of this type show up systematically in OT. They have been addressed
by invoking a meta-optimization procedure (input optimization) that is related to learnability considerations in Prince and Smolensky (2004). Also see Prince (2006: 52−58) for pertinent discussion.
2.4. Cross-linguistic variation

The approach to case sketched in the preceding subsection relies on a system of ranked constraints demanding the realization of individual cases. Interestingly − and this reveals a more general pattern of OT analyses −, similar effects can be obtained under a system of ranked constraints prohibiting the realization of individual cases, as long as these constraints are accompanied by an inherently highest-ranked (or, as part of GEN, inviolable) constraint that states that all DPs have case. Such an approach is pursued by Woolford (2001), and it may serve to illustrate the simple way that cross-linguistic variation can be handled by reranking in OT. Here are Woolford’s (2001) background assumptions. First, there are (ordered) markedness constraints that block the realization of cases; see (29a, b, c). Second, there are faithfulness constraints that demand the realization of case specifications in the input (i.e., the realization of lexical, inherent case). Case faithfulness constraints come in two versions: a general one that covers both intransitive and (as we will see, irrelevantly) transitive contexts in (29d), and a more specific one for transitive contexts only in (29e). Third, nominative (= absolutive) and accusative are assumed to be structural cases; but both dative and ergative (as well as genitive) are considered inherent cases (that must be specified on a verb). Finally, it is presupposed that every DP must be case-marked; at least for present purposes, this may be viewed as a requirement imposed by GEN.

(29) a. *DAT (*Dative): Avoid dative case.
     b. *ACC (*Accusative): Avoid accusative case.
     c. *NOM (*Nominative): Avoid nominative case.
     d. FAITH-LEX: Realize a case feature specified on V in the input.
     e. FAITH-LEXtrans: Realize a case feature specified on transitive V in the input.

Against this background, Woolford shows that cross-linguistic variation with respect to lexical (quirky) case on subjects can be easily derived.
The distribution of lexically case-marked subjects in Icelandic, Japanese, and English follows directly from the rankings assumed in (30).

(30) a. Ranking in Icelandic: FAITH-LEXtr ≫ FAITH-LEX ≫ *DAT ≫ *ACC ≫ *NOM
     b. Ranking in Japanese: FAITH-LEXtr ≫ *DAT ≫ FAITH-LEX ≫ *ACC ≫ *NOM
     c. Ranking in English: *DAT ≫ FAITH-LEXtr ≫ FAITH-LEX ≫ *ACC ≫ *NOM
(30a) correctly predicts that lexically specified case-marking on subjects in the input will always be realized in an optimal output in Icelandic (i.e., in both transitive and intransitive contexts), even if this implies a violation of a fairly high-ranked case markedness constraint like *DAT; see (31a) (intransitive context) and (31b) (transitive context).

(31) a. Bátnum hvolfdi                                       [Icelandic]
        boat.DAT capsized
        ‘The boat capsized.’
     b. Barninu batnaði veikin
        child.DAT recovered.from disease.NOM
        ‘The child recovered from the disease.’
The competition underlying (31a) is illustrated in tableau (32).

(32) Intransitive V in Icelandic; inherent dative

       Candidates       | FAITH-LEXtr | FAITH-LEX | *DAT | *ACC | *NOM
  ☞ O1: DP.DAT V[+dat] |             |           | *    |      |
     O2: DP.NOM V[+dat] |             | *!        |      |      | *
     O3: DP.ACC V[+dat] |             | *!        |      | *    |
The competition underlying the transitive context in (31b) is illustrated in tableau (33). Note that given the verb’s lexical dative specification for the external argument, the fact that the remaining (internal) argument must receive nominative case comes for free. This is an instance of emergence of the unmarked that is completely parallel to the approach to default case specified in the previous subsection (despite the move from demanding case realization to prohibiting case realization): An accusative (or dative, genitive, etc.) realization on the internal argument would fatally violate a markedness constraint (*ACC, etc.) that is ranked higher than the one violated by the optimal output (viz., *NOM). (Why, then, do transitive clauses without lexical case specification on the verb not always just involve nominative marking on all arguments? One possible answer might be that there is a high-ranked (or inviolable) case distinctness requirement counter-acting such a state of affairs; crucially, though, such a requirement would always be fulfilled in quirky subject contexts like the ones currently under consideration.)

(33) Transitive V in Icelandic; inherent dative on DPext

       Candidates              | FAITH-LEXtr | FAITH-LEX | *DAT | *ACC | *NOM
  ☞ O1: DP.DAT V[+dat] DP.NOM |             |           | *    |      | *
     O2: DP.DAT V[+dat] DP.ACC |             |           | *    | *!   |
     O3: DP.NOM V[+dat] DP.ACC | *!          | *         |      | *    | *
Furthermore, note that the more specific version of the FAITH-LEX constraint (viz., FAITH-LEXtrans) is strictly speaking not yet needed for Icelandic − if it were absent, FAITH-LEX as such would suffice to exclude O3 in tableau (33). The situation is different in Japanese, where quirky case can only show up on subjects of transitive clauses; see (34a, b).

(34) a. Akatyan-ga / *-ni moo arukeru                        [Japanese]
        baby-NOM / DAT already walk.can
        ‘The baby can already walk.’
     b. Taroo-ni eigo-ga hanaseru
        Taro-DAT English-NOM speak.can
        ‘Taro can speak English.’
A minimal reranking of *DAT and FAITH-LEX (see [30b]) yields the Japanese pattern. Tableau (35) shows how lexical dative is now blocked in intransitive contexts (assuming a dative case specification on the verb).

(35) Intransitive V in Japanese; no inherent dative

       Candidates       | FAITH-LEXtr | *DAT | FAITH-LEX | *ACC | *NOM
     O1: DP-DAT V[+dat] |             | *!   |           |      |
  ☞ O2: DP-NOM V[+dat] |             |      | *         |      | *
     O3: DP-ACC V[+dat] |             |      | *         | *!   |
In contrast, high-ranked FAITH-LEXtrans still ensures lexically marked dative case on subjects in transitive clauses; see tableau (36).

(36) Transitive V in Japanese; inherent dative on DPext

       Candidates              | FAITH-LEXtr | *DAT | FAITH-LEX | *ACC | *NOM
  ☞ O1: DP-DAT V[+dat] DP-NOM |             | *    |           |      | *
     O2: DP-DAT V[+dat] DP-ACC |             | *    |           | *!   |
     O3: DP-NOM V[+dat] DP-ACC | *!          |      | *         | *    | *
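The effect of the minimal reranking that distinguishes (30a) from (30b) can be verified directly: the candidate sets and violation profiles of tableaux (32)/(33) and (35)/(36) are identical, and swapping *DAT and FAITH-LEX flips the winner in the intransitive competition while leaving the transitive one unaffected. A Python sketch (the profiles are read off the four tableaux; the evaluation function is generic OT EVAL, not from the text):

```python
# Violation profiles from tableaux (32)/(33) (Icelandic) and (35)/(36)
# (Japanese); only the constraint ranking differs between the two languages.

def optimal(candidates, ranking):
    # Winner = lexicographically smallest ranked violation-count tuple.
    return min(candidates, key=lambda n: tuple(
        candidates[n].get(c, 0) for c in ranking))

ICELANDIC = ["FAITH-LEXtr", "FAITH-LEX", "*DAT", "*ACC", "*NOM"]  # (30a)
JAPANESE  = ["FAITH-LEXtr", "*DAT", "FAITH-LEX", "*ACC", "*NOM"]  # (30b)

# Intransitive V with a lexical dative specification.
intransitive = {
    "DP.DAT V": {"*DAT": 1},
    "DP.NOM V": {"FAITH-LEX": 1, "*NOM": 1},
    "DP.ACC V": {"FAITH-LEX": 1, "*ACC": 1},
}

# Transitive V with inherent dative on the external argument.
transitive = {
    "DP.DAT V DP.NOM": {"*DAT": 1, "*NOM": 1},
    "DP.DAT V DP.ACC": {"*DAT": 1, "*ACC": 1},
    "DP.NOM V DP.ACC": {"FAITH-LEXtr": 1, "FAITH-LEX": 1, "*ACC": 1, "*NOM": 1},
}

print(optimal(intransitive, ICELANDIC))  # DP.DAT V: quirky dative realized
print(optimal(intransitive, JAPANESE))   # DP.NOM V: quirky dative blocked
print(optimal(transitive, ICELANDIC))    # DP.DAT V DP.NOM
print(optimal(transitive, JAPANESE))     # DP.DAT V DP.NOM: same winner
```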
Turning finally to English, it is clear that the ranking in (30c), with the prohibition against dative case outranking all case faithfulness constraints, ensures that there can be no lexical case on subjects in this language: Even if there is an inherent dative specification on a verb, high-ranked *DAT will not let it become optimal in the output (note that issues of input optimization of the type discussed above become relevant here again). To sum up, reranking of violable constraints offers a promising approach to parametrization of grammars. I have exemplified this with a tiny empirical domain, viz., cross-linguistic variation with respect to subjects that bear lexically marked case. Of course,
there is a lot more to be said about case, and about the cross-linguistic variation encountered in this area, from an OT perspective; see Aissen (1999, 2003a), Kiparsky (1999), Wunderlich (2000, 2003), Stiebels (2000, 2002), Woolford (2001), Lee (2003), de Hoop and Malchukov (2008), Swart (2007), Keine and Müller (2008, 2011), and references cited in these works. Furthermore, case is by no means the only empirical domain in which parametrization by constraint reranking has proven successful; see, e.g., Legendre et al. (1998), Ackema and Neeleman (1998), and Müller (1997) on cross-linguistic variation in wh-movement, Grimshaw (1997) and Vikner (2001a, b) on cross-linguistic variation in verb movement, and Samek-Lodovici (1996, 2005), Costa (1998), Choi (1999), Büring (2001), Engdahl et al. (2004), Gutiérrez-Bravo (2007), and Gabriel (2010) on cross-linguistic variation in the placement of subjects, direct objects and indirect objects.

With n constraints, there is the logical possibility of n! (n factorial) rerankings. If no additional assumptions are made, n! therefore defines the possible number of grammars that can be created on the basis of a set of n constraints. This property of OT is accordingly often referred to as factorial typology. It is clear that the number of possible grammars for natural languages can become quite large this way (e.g., with a mere 12 constraints, free reranking produces more than 479 million grammars; with 13 constraints, it’s already more than 6.2 billion, and so forth). In view of this (and although it is a priori far from clear whether the large number of grammars generated on the basis of a small set of constraints should in fact be viewed as problematic), strategies have been devised to narrow down the options for reranking. One such attempt relies on fixed subhierarchies of constraints, i.e., pairs of constraints whose ranking must be invariant across languages.
We have encountered one such case above (see [8b]): Local conjunction of two constraints A and B gives rise to a constraint C (= A&DB) that inherently outranks the individual constraints of which it is composed. Another restriction on free reranking follows from the concept of harmonic alignment (see below). In some cases, the fixed order of related constraints that differ along some dimension may simply have to be stipulated (see Baković 1995, 1998). Moreover, it turns out that in quite a number of cases, reranking of two constraints does not actually produce an extensionally different grammar because exactly the same candidates are predicted to be optimal under two rankings. In the case at hand (i.e., concerning the five constraints in [29] that played a role in the licensing of lexically case-marked subjects), factorial typology as such would predict not 3, but 120 different grammars. Some of the variation will be empirically innocuous, and not lead to extensionally different grammars; e.g., all rankings in which FAITH-LEX outranks FAITH-LEXtrans will be such that the actual position of FAITH-LEXtrans is irrelevant for the outcome. (The reason is that the two constraints are in a special-to-general relation. However, one must be careful here. As observed by Prince and Smolensky (2004), two cases must be distinguished with constraints that are in a special-to-general relation: On the one hand, the two constraints may impose conflicting requirements on candidates. In that case, they form a Paninian relation, and ranking the more specific one lower than the conflicting, more general one will invariably imply that the former one becomes inactive. On the other hand, the two constraints may actually push candidates in the same direction, as in the case currently under consideration. The constraints can then be said to form a stringency relation, see Baković 1995; here, the more specific constraint can in principle be the lower-ranked one and still carry out some work if a more complex system of constraints is considered.)
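The point about extensional equivalence can be verified by brute force: enumerating all 5! = 120 rankings of the constraints in (29) and computing the winners of the two competitions from tableaux (32) and (33) yields only a handful of distinct grammars. The following Python sketch does this (the violation profiles are read off the tableaux; the resulting count is a computed result, not a figure from the text):

```python
# Brute-force factorial typology over the five constraints in (29): compute
# the winners of the two competitions under every possible ranking and count
# the extensionally distinct grammars.
import math
from itertools import permutations

CONSTRAINTS = ["FAITH-LEXtr", "FAITH-LEX", "*DAT", "*ACC", "*NOM"]

def optimal(candidates, ranking):
    return min(candidates, key=lambda n: tuple(
        candidates[n].get(c, 0) for c in ranking))

# Violation profiles from tableau (32) (intransitive) and (33) (transitive).
intransitive = {
    "DP.DAT V": {"*DAT": 1},
    "DP.NOM V": {"FAITH-LEX": 1, "*NOM": 1},
    "DP.ACC V": {"FAITH-LEX": 1, "*ACC": 1},
}
transitive = {
    "DP.DAT V DP.NOM": {"*DAT": 1, "*NOM": 1},
    "DP.DAT V DP.ACC": {"*DAT": 1, "*ACC": 1},
    "DP.NOM V DP.ACC": {"FAITH-LEXtr": 1, "FAITH-LEX": 1, "*ACC": 1, "*NOM": 1},
}

rankings = list(permutations(CONSTRAINTS))
assert len(rankings) == math.factorial(5) == 120  # factorial typology

# A "grammar" is identified extensionally by its pair of winners.
grammars = {(optimal(intransitive, r), optimal(transitive, r))
            for r in rankings}
print(len(grammars))  # far fewer extensionally distinct grammars than 120
```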
Other rerankings will give rise to peculiar language types that may not be attested; e.g., reversing the ranking of *DAT, *ACC, and *NOM will predict a language in which dative is the default case, and nominative is highly marked. To avoid grammars of this type, it can be assumed that, in the present context (and notwithstanding the above remarks on unmarked case in English), the order of case markedness constraints is invariantly *DAT ≫ *ACC ≫ *NOM, and the order of case faithfulness constraints is invariantly FAITH-LEXtrans ≫ FAITH-LEX. This will then ideally be derivable from the internal make-up of the constraint families (e.g., they might all be derivable from more basic abstract constraints by techniques like local conjunction and harmonic alignment; also see Wunderlich 2003; Stiebels 2002 for a somewhat different, but also principled, approach).
3. Problems for OT analyses in syntax

Of the problems for OT analyses that have been raised in the literature, three can be singled out as both potentially troublesome and highly illuminating. First, there is the issue of complexity of competition-based grammars (with possibly infinite candidate sets). Second, a problem arises that is practically unheard of in most other syntactic approaches, viz., that of deriving instances of ineffability (or absolute ungrammaticality). And finally, accounting for syntactic optionality remains an intricate issue in OT syntax to this day. As with the evidence in support of OT analyses, I address the issues one by one. (A further objection to OT analyses that one can hear every now and then is that the theory is inherently unconstrained in the sense that “anything goes”; e.g., potentially problematic predictions of an existing analysis can be avoided by adding another high-ranked ad hoc constraint. While technically correct, such a criticism misses a fundamental point: Criteria of elegance and simplicity hold for OT syntax in the same way that they hold for other syntactic theories. Consequently, adding stipulative, highly specific constraints is ultimately not an option in OT for the same reason that, say, adding stipulative, highly specific exceptions to constraints is rightly frowned upon in non-OT work. This is tacitly presupposed in all good work in OT syntax; it is explicitly stated in, inter alia, Grimshaw 1998 and Smolensky and Legendre 2006.)
3.1. Complexity

The potential problem here is very easy to grasp. Competition adds complexity; and because of the general option of recursion in syntax, candidate sets are not finite in most analyses. From the very beginning, Prince and Smolensky had anticipated this criticism. Here is their reaction:

This qualm arises from a misapprehension about the kind of thing that grammars are. It is not incumbent upon a grammar to compute, as Chomsky has emphasized repeatedly over the years. A grammar is a function that assigns structural descriptions to sentences; what matters formally is that the function is well-defined. The requirements of explanatory adequacy (on theories of grammar) and descriptive adequacy (on grammars) constrain and evaluate the space of the hypotheses. Grammatical theorists are free to contemplate any kind of formal device in pursuit of these goals; indeed, they must allow themselves to range freely if there is to be any hope of discovering decent theories. Concomitantly, one is not free to impose arbitrary additional meta-constraints (e.g. 'computational plausibility') which could conflict with the well-defined basic goals of the enterprise. In practice, computationalists have always proved resourceful. All available complexity results for known theories are stunningly distant from human processing capacities (...) yet all manner of grammatical theories have nonetheless been successfully implemented in parsers, to some degree or another, with comparable efficiency. (...) There are neither grounds of principle nor grounds of practicality for assuming that computational complexity considerations, applied directly to grammatical formalisms, will be informative. (Prince and Smolensky 1993: 197, 2004: 233)
I have nothing to add to this statement, except for the observation that if there is a problem here, OT shares the problem with other competition-based theories of syntax; e.g., minimalist approaches like those of Chomsky (1993, 1995), Collins (1994), and Bošković (1997), which rely on transderivational constraints applying to candidate derivations in large (typically infinite) reference sets − note that versions of transderivational constraints are arguably still adopted in more recent minimalist analyses, e.g., those that rely on a constraint like Merge before Move; see section 5 below.
3.2. Ineffability (absolute ungrammaticality)

Basically, a sentence (more generally, any LE) can only qualify as ungrammatical in OT if there is some other sentence (or LE) that blocks it by being the optimal candidate. However, sometimes it is far from obvious what this other sentence (LE) should look like. Consider illicit extraction from an adjunct island, as in the German example in (37).

(37) *Was ist Fritz eingeschlafen [CP nachdem er t gelesen hat]?        [German]
      what is  Fritz fallen.asleep     after   he   read    has
      'For which thing is it the case that Fritz fell asleep after reading it?'
This is a clear case of ineffability, or absolute ungrammaticality. At least five different approaches to ineffability can be distinguished in OT syntax. In what follows, I introduce them in turn, based on the problem posed by (37).
3.2.1. The generator

A first approach relocates the problem with (37) from the H-EVAL system to GEN (see Pesetsky 1997). One might simply assume that GEN contains constraints like (38) that preclude a generation of outputs like (37) in the first place.

(38) ADJUNCT CONDITION:
     Movement must not cross an adjunct clause.

This way, the problem of accounting for ineffability is solved outside the OT system proper.
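The division of labor just described can be sketched in a few lines: a GEN component treats constraints like (38) as inviolable filters, so that island-violating candidates never reach H-EVAL at all. The candidate encoding and attribute names below are my own illustrative inventions, not part of the chapter's formalism.

```python
# Sketch (not from the chapter): GEN with an inviolable ADJUNCT CONDITION.
# Candidates are toy records; crosses_adjunct=True marks movement out of
# an adjunct clause, as in example (37).

def adjunct_condition(cand):
    """Inviolable GEN constraint: no movement across an adjunct clause."""
    return not cand["crosses_adjunct"]

def gen(candidates):
    """GEN only ever emits candidates that satisfy its hard constraints."""
    return [c for c in candidates if adjunct_condition(c)]

candidates = [
    {"name": "O1: was ... [nachdem er t V]", "crosses_adjunct": True},
    {"name": "O2: - ... [nachdem er was V]", "crosses_adjunct": False},
]

survivors = gen(candidates)
print([c["name"] for c in survivors])  # O1 never reaches H-EVAL
```

On this view, ineffability is a fact about the candidate space, not about constraint interaction.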
3.2.2. Empty outputs

A second approach relies on the assumption that each candidate set contains a candidate that leaves the input completely unrealized. This candidate is the empty output or null parse: ∅ (see, e.g., Ackema and Neeleman 1998). By definition, the empty output does not violate any faithfulness constraints; in fact, the only constraint that it violates is *∅ in (39).

(39) *∅ (Avoid Null Parse):
     The input must not be completely unrealized.

Tableau (40) shows how the empty output (here, O3) can become optimal, and successfully block both a candidate with wh-movement across an adjunct island as in (37) (O1), and a candidate that fails to carry out wh-movement altogether (O2).

(40) Ineffability and empty outputs

                                       | ADJUNCT CONDITION | WH-CRIT | *∅
       O1: was ... [ nachdem er t V ]  |        *!         |         |
       O2: − ... [ nachdem er was V ]  |                   |   *!    |
     ☞ O3: ∅                           |                   |         | *

The constraint *∅ defines a strict upper bound in constraint rankings: Constraints that outrank *∅ are not violable by optimal outputs.
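The way the null parse wins in tableau (40) can be made concrete with a small evaluation routine. Under strict domination, a violation profile is a tuple of violation counts in ranking order, and the optimal candidate is simply the lexicographic minimum. This is an illustrative sketch of the evaluation logic, not an implementation from the OT literature; candidate labels follow the tableau.

```python
# Sketch of tableau (40): ranking ADJUNCT-CONDITION >> WH-CRIT >> *NULL.
# Under strict domination the winner is the candidate whose violation
# profile (a tuple ordered by the ranking) is lexicographically smallest.

RANKING = ["ADJUNCT-CONDITION", "WH-CRIT", "*NULL"]

candidates = {
    "O1: was ... [nachdem er t V]": {"ADJUNCT-CONDITION": 1},
    "O2: - ... [nachdem er was V]": {"WH-CRIT": 1},
    "O3: null parse":               {"*NULL": 1},
}

def profile(viols):
    """Violation counts in ranking order; missing constraints count 0."""
    return tuple(viols.get(c, 0) for c in RANKING)

winner = min(candidates, key=lambda name: profile(candidates[name]))
print(winner)  # the null parse blocks both flawed realizations
```

Raising *∅ above WH-CRIT in `RANKING` would instead let O2 win, which is the reranking logic behind the "bad winners" approach discussed next.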
3.2.3. Bad winners

A third kind of approach to ineffability assumes that the optimal candidate cannot be interpreted by other components of grammar (phonology, semantics), or by the interfaces with these components; see Grimshaw (1994) and Müller (1997), among others. Thus, one might posit that (without the null parse present in candidate sets), a candidate that leaves the wh-phrase in situ throughout the derivation might be optimal, as in tableau (41).

(41) Ineffability and bad winners

                                       | ADJUNCT CONDITION | WH-CRIT
       O1: was ... [ nachdem er t V ]  |        *!         |
     ☞ O2: − ... [ nachdem er was V ]  |                   |   *
However, the optimal candidate O2 of tableau (41), which corresponds to (42), might be semantically uninterpretable as a regular question.

(42) #Fritz ist eingeschlafen [CP nachdem er was gelesen hat]?        [German]
      Fritz  is  fallen.asleep     after   he  what read    has
      'For which thing is it the case that Fritz fell asleep after reading it?'
Arguably, this approach corresponds to recent trends in minimalist syntax to attribute much of the work standardly done by syntactic constraints to interface requirements; see Chomsky (2007, 2008), and particularly Boeckx (2009).
3.2.4. Repair

The null hypothesis might be that there is in fact an optimal repair candidate for (37); i.e., that extraction from an adjunct island is blocked in favour of a repair strategy that exhibits properties which are not otherwise tolerated in the language. The sentences in (43a, b) are two potential repair candidates for (37) in German.

(43) a. Fritz ist eingeschlafen [CP nachdem er was gelesen hat] (= etwas)        [German]
        Fritz  is  fallen.asleep     after   he  something read has
        'Fritz fell asleep after reading something.'
     b. Bei was ist Fritz eingeschlafen [CP nachdem er es gelesen hat]?
        with respect.to what is Fritz fallen.asleep after he it read has
        'For which thing is it the case that Fritz fell asleep after reading it?'

In (43a), the function of the incriminating item is changed: The wh-pronoun was ('what') is reinterpreted as an indefinite pronoun was ('something', a short form of etwas). In (43b), the form of the incriminating item is changed: Instead of a moved wh-pronoun, there is a wh-PP (bei was, roughly 'with respect to what') in the interrogative SpecC position outside the island, together with a resumptive pronoun es ('it') within it. Let us focus mainly on the first alternative here (the theoretical issues raised by the other repair approach are more or less identical). Suppose that there is a faithfulness constraint like (44).

(44) IDENT([wh]):
     A feature [+wh] in the input must not be changed to [−wh] in the output.

(43a) would then, as a last resort, violate IDENT([wh]) by reinterpreting the wh-pronoun as an indefinite pronoun, as shown in tableau (45).

(45) Ineffability and repair

                                             | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
       O1: was[+wh] ... [ nachdem er t V ]   |        *!         |         |
       O2: − ... [ nachdem er was[+wh] V ]   |                   |   *!    |
     ☞ O3: − ... [ nachdem er was[−wh] V ]   |                   |         |     *
However, unfortunately the repair approach to ineffability does not work (at least not in the way just sketched). The problem is that the "repair" strategy is also available outside island contexts, e.g., with successive-cyclic wh-movement from a declarative clause embedded by a bridge verb; see (46a) (where wh-movement is possible) vs. (46b) (where the indefinite interpretation of was is also available). Similarly, the wh-PP/resumptive pronoun strategy is possible without an island being present; see (46c).

(46) a. Was glaubt Fritz [CP dass er t lesen sollte]?        [German]
        what thinks Fritz     that he   read should
        'What does Fritz think that he should read?'
     b. Fritz glaubt [CP dass er was lesen sollte]
        Fritz thinks     that he what (= something) read should
        'Fritz thinks that he should read something.'
     c. Von was glaubt Fritz [CP dass er es lesen sollte]?
        of  what thinks Fritz     that he it read should
        'For which thing is it the case that Fritz thinks he should read it?'

As shown in tableau (47) for the indefinite interpretation of was, considering (43a) or (43b) to be repair options does not make the right predictions: O3 is blocked by O1 in tableau (47), but it must be an optimal candidate.

(47) A wrong prediction

                                             | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
     ☞ O1: was[+wh] ... [ dass er t V ]      |                   |         |
       O2: − ... [ − dass er was[+wh] V ]    |                   |   *!    |
       O3: − ... [ dass er was[−wh] V ]      |                   |         |    *!
To conclude, clauses with wh-indefinites are not repair forms; they are available even if long wh-movement is permitted. Similar conclusions hold in the case of sentences with optional wh-argument generation in the matrix clause; see Koster (1987), Cinque (1990), Sternefeld (1991), Barbiers (2002), Gallego (2007). However, there is a more fine-grained version of the repair approach that allows one to both have the cake and eat it. It is based on neutralization.
3.2.5. Neutralization

The fifth and final approach to ineffability to be discussed here centers around the concept of input neutralization. The main premise is that there can be two competitions based on minimally differing inputs (e.g., inputs that differ only with respect to some feature value). These input differences can then be neutralized by some high-ranked markedness constraint in the output; i.e., two different competitions (based on two candidate sets) may converge on a single optimal candidate. Approaches of this type have been developed by Legendre et al. (1995), Legendre et al. (1998), Keer and Baković (2004), Keer and Baković (2001), Vogel (2001), and Wilson (2001), among others. For the case at hand, consider first a competition with a transparent context and a wh-item that bears the feature [+wh] in the input, as in tableau (48).

(48) Transparent contexts without neutralization: 'was[+wh]' in the input

     Input: was[+wh], ...                    | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
     ☞ O1: was[+wh] ... [ dass er t V ]      |                   |         |
       O2: − ... [ − dass er was[+wh] V ]    |                   |   *!    |
       O3: − ... [ dass er was[−wh] V ]      |                   |         |    *!
       O4: was[−wh] ... [ dass er t V ]      |                   |         |    *!
O1, which leaves the +-value of the wh-feature intact and moves the wh-phrase to SpecC, emerges as optimal. O3, which changes the value from + to −, fatally violates IDENT([wh]). Consider next a competition with a transparent context where the wh-item is an indefinite (i.e., [−wh]) in the input, as in tableau (49). Again, the faithful candidate wins − changing the feature value does not lead to an improved behaviour with respect to higher-ranked constraints. In both competitions, there is a further output O4 that applies movement of a [−wh] phrase to SpecC. As it stands, O4 has the same violation profile as O3 with respect to the three constraints given here. However, it is suboptimal because it violates ECONOMY (see [14]) in addition without contributing to a better behaviour with respect to any other constraint. As a matter of fact, O4 is not expected to be grammatical under any reranking of the four pertinent constraints; the technical expression for such a state of affairs is that O4 is harmonically bounded by O3 (see Prince and Smolensky 2004) (also see Prince and Samek-Lodovici 1999 for the extended concept of collective harmonic bounding, i.e., cases where it is not a single candidate but a set of candidates that harmonically bounds a candidate, which can therefore never become optimal in any language). Harmonically bounded candidates are predicted to be universally unavailable.

(49) Transparent contexts without neutralization: 'was[−wh]' in the input

     Input: was[−wh], ...                    | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
       O1: was[+wh] ... [ dass er t V ]      |                   |         |    *!
       O2: − ... [ − dass er was[+wh] V ]    |                   |   *!    |    *
     ☞ O3: − ... [ dass er was[−wh] V ]      |                   |         |
       O4: was[−wh] ... [ dass er t V ]      |                   |         |
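Harmonic boundedness of the kind just invoked for O4 can be verified mechanically: candidate A harmonically bounds candidate B exactly when A's violations are less than or equal to B's on every constraint and strictly less on at least one, which entails that B wins under no reranking whatsoever. The sketch below uses toy profiles in the spirit of tableau (49), with ECONOMY added as the fourth constraint that separates O3 and O4; the encoding is my own.

```python
from itertools import permutations

# Toy profiles after tableau (49), with ECONOMY added: O4 matches O3 on
# the three constraints shown there but adds an ECONOMY violation.
CONSTRAINTS = ["ADJ-COND", "WH-CRIT", "IDENT-WH", "ECONOMY"]
O3 = {"ECONOMY": 0}
O4 = {"ECONOMY": 1}

def bounds(a, b):
    """True if a harmonically bounds b: at least as good on every
    constraint, strictly better on at least one."""
    diffs = [b.get(c, 0) - a.get(c, 0) for c in CONSTRAINTS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

def wins_somewhere(cand, rival):
    """Is there any reranking under which cand beats rival?"""
    prof = lambda v, order: tuple(v.get(c, 0) for c in order)
    return any(prof(cand, order) < prof(rival, order)
               for order in permutations(CONSTRAINTS))

print(bounds(O3, O4))          # O3 harmonically bounds O4
print(wins_somewhere(O4, O3))  # O4 is optimal in no language
```

The brute-force check over all 24 rerankings and the pointwise comparison necessarily agree, which is the formal content of the claim that harmonically bounded candidates are universally unavailable.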
This solves the problem with the pure repair approach: Both strategies (wh-movement, wh-indefinite) can survive in transparent contexts because they go back to minimally
different inputs, and thus different competitions. However, in opaque contexts where a locality constraint like the ADJUNCT CONDITION becomes active and distinguishes between candidates, neutralization takes place. Under present assumptions, (50a) and (50b) compete with one another, both with a [+wh]-specification in the input and with a [−wh]-specification in the input.

(50) a. *Was ist Fritz eingeschlafen [CP nachdem er t gelesen hat]?        [German]
         what is  Fritz fallen.asleep     after   he   read    has
         'For which thing is it the case that Fritz fell asleep after reading it?'
     b. Fritz ist eingeschlafen [CP nachdem er was gelesen hat]
        Fritz  is  fallen.asleep     after   he  what (= something) read has
        'Fritz fell asleep after reading something.'

If there is a [+wh]-specification in the input, as in tableau (51), the two higher-ranked constraints ADJUNCT CONDITION and WH-CRIT eliminate the faithful candidates O1, O2 (hence, [50a]), and the unfaithful candidate O3 becomes optimal (i.e., [50b]; note that the harmonically bounded output O4 is ignored here and in the following tableau).

(51) Island contexts with neutralization, unfaithful: 'was[+wh]' in the input

     Input: was[+wh], ...                     | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
       O1: was[+wh] ... [ nachdem er t V ]    |        *!         |         |
       O2: − ... [ nachdem er was[+wh] V ]    |                   |   *!    |
     ☞ O3: − ... [ nachdem er was[−wh] V ]    |                   |         |    *
However, O3 (= [50b]) also emerges as optimal in the minimally different context where the input specification is [−wh] to begin with; see tableau (52).

(52) Island contexts with neutralization, faithful: 'was[−wh]' in the input

     Input: was[−wh], ...                     | ADJUNCT CONDITION | WH-CRIT | IDENT([wh])
       O1: was[+wh] ... [ nachdem er t V ]    |        *!         |         |    *
       O2: − ... [ nachdem er was[+wh] V ]    |                   |   *!    |    *
     ☞ O3: − ... [ nachdem er was[−wh] V ]    |                   |         |
Thus, the difference in the input between tableau (51) and tableau (52) is neutralized in the output. (Ultimately, a bit more will have to be said in this kind of neutralization approach, concerning, e.g., whether C is also marked [+wh] vs. [−wh] in the two inputs, and this feature value must then also be altered. However, this issue does not affect the logic of the neutralization approach as such.) As before, the question arises of whether a sentence like (50b) must then be assumed to have two (or, perhaps, many more) possible sources; and as before, the standard answer given in OT is that input optimization may
compare the two optimal candidates, and filter out one of them. Much more will eventually have to be said about absolute ungrammaticality in OT syntax, but I will leave it at that (see Müller 2000b; Fanselow and Féry 2002a; Legendre 2009; Vogel 2009b).
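The convergence of tableaux (51) and (52) can be replayed schematically: two candidate sets, keyed to minimally different inputs, are evaluated under one and the same ranking and yield the same surface output in the island context. The violation profiles below follow the two tableaux; the dictionary encoding is an illustrative assumption of mine, not the chapter's notation.

```python
# Sketch of neutralization: two inputs (was[+wh] vs. was[-wh]), one ranking
# ADJUNCT-CONDITION >> WH-CRIT >> IDENT(wh); profiles follow tableaux (51)/(52).

RANKING = ["ADJ-COND", "WH-CRIT", "IDENT-WH"]

def optimal(candidate_set):
    """Winner = lexicographic minimum of violation profiles."""
    prof = lambda v: tuple(v.get(c, 0) for c in RANKING)
    return min(candidate_set, key=lambda name: prof(candidate_set[name]))

# Input 1: was[+wh] (tableau 51) -- the winner is unfaithful.
comp_plus = {
    "O1: wh-movement from island": {"ADJ-COND": 1},
    "O2: was[+wh] in situ":        {"WH-CRIT": 1},
    "O3: was[-wh] in situ":        {"IDENT-WH": 1},
}

# Input 2: was[-wh] (tableau 52) -- the winner is faithful.
comp_minus = {
    "O1: wh-movement from island": {"ADJ-COND": 1, "IDENT-WH": 1},
    "O2: was[+wh] in situ":        {"WH-CRIT": 1, "IDENT-WH": 1},
    "O3: was[-wh] in situ":        {},
}

print(optimal(comp_plus))   # O3 wins
print(optimal(comp_minus))  # O3 wins again: the input difference
                            # is neutralized in the output
```

Swapping in the transparent-context profiles of tableaux (48) and (49) makes the two competitions diverge again, which is exactly the asymmetry the neutralization approach exploits.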
3.3. Optionality

In general, only one candidate should be optimal in a given candidate set (however, recall that the definition of optimality in [1] is in principle compatible with there being more than one winner). Thus, the question arises of what to do about situations where it looks as though several outputs can co-exist as optimal. Let us go through a number of potentially relevant phenomena. Consider first complementizer deletion in English. The example pair in (53a, b) shows that a complementizer that can be left out in declarative object clauses (at least under certain circumstances, which include the right choice of matrix predicate).

(53) a. I think − John will leave
     b. I think that John will leave

Next, (54a, b) show that German has an alternation between so-called partial wh-movement (where a wh-phrase moves to an intermediate declarative SpecC position, and the final SpecC position of the interrogative clause is filled by a scope marker was 'what'), and standard long-distance wh-movement. At least in certain varieties of German, (54a) and (54b) are both perfectly legitimate, unmarked ways of producing exactly the same kind of question.

(54) a. Wen glaubst du [CP dass man t einladen sollte]?        [German]
        whom think  you     that one   invite  should
        'Who do you think that one should invite?'
     b. Was glaubst du [CP wen man t einladen sollte]?
        what think  you     whom one  invite  should
        'Who do you think that one should invite?'

A third example involves wh-movement in French. In certain contexts (viz., root questions), and with certain wh-items (viz., arguments), wh-movement is optional in this language; cf. (55a, b).

(55) a. Qui as-tu vu t?        [French]
        whom have-you seen
        'Whom have you seen?'
     b. − Tu as vu qui?
        you have seen whom
        'Whom have you seen?'

Extraposition is also an operation that often applies optionally; compare the example with an in situ relative clause modifying the head of the subject in German in (56a)
with the minimally different example in (56b), where the relative clause has undergone extraposition.

(56) a. dass eine Frau [die ich mag] zur Tür reingekommen ist        [German]
        that a    woman whom I like  to.the door in.come   is
        'that a woman who I like came in through the door'
     b. dass eine Frau t zur Tür reingekommen ist [die ich mag]
        that a    woman   to.the door in.come  is  whom I  like
        'that a woman who I like came in through the door'

It is entirely unproblematic to continue this list with more examples from many more languages. For present purposes, it may do to give one more relevant example: In free word order languages, there are many contexts where various orders can co-exist; this is shown for the optional permutation of subject and object in Korean in (57) (from Choi 1999: 172).

(57) a. Swuni-nun Inho-lul manna-ss-e        [Korean]
        Swuni-TOP Inho-ACC meet-PST-DECL
        'Swuni met Inho.'
     b. Inho-lul Swuni-nun manna-ss-e
        Inho-ACC Swuni-TOP meet-PST-DECL
        'Swuni met Inho.'

Various kinds of approaches to optionality can be distinguished in OT syntax. The taxonomy of analysis types in (58) is based on Müller (2003b).

(58) Analyses of optionality of two candidates Ci, Cj:
     a. Pseudo-optionality: Ci, Cj belong to different candidate sets and do not interact.
     b. True optionality: Ci, Cj have an identical violation profile.
     c. Ties: Ci, Cj differ only with respect to two (or more) constraints that are tied.
        Ties can be interpreted in various ways:
        (i)   ordered global tie
        (ii)  ordered local tie
        (iii) conjunctive local tie
        (iv)  disjunctive local tie
        (v)   disjunctive global tie
     d. Neutralization: Ci, Cj belong to different candidate sets, but interact nonetheless.
     e. Stochastic optimality theory

I cannot go through all these different strategies for dealing with optionality in detail here (see Müller 2003b for a comprehensive exposition covering all but the stochastic analyses). In the following subsections, I will confine myself to pseudo-optionality and neutralization, true optionality, two kinds of ties, and finally stochastic optimality theory.
3.3.1. Pseudo-optionality and neutralization

The basic assumption underlying pseudo-optionality analyses is that instances of optionality are only apparent: The two optimal candidates are winners of two different competitions. To achieve this, candidate sets must be defined in such a way that there is little competition. Suppose, for instance, that for the example pairs in (55), (56), and (57), a movement-inducing feature can optionally be present in the input; if it is present, it triggers a movement operation that creates the different word order (wh-movement, extraposition, and scrambling, respectively). (For [53] and [54], invoking different lexical material may suffice to generate two separate competitions.) Assuming that candidate sets are (at least partly) defined by input identity, the candidate with movement will be the optimal output of the candidate set that has the relevant feature in the input, and the candidate without movement will be the optimal output of the candidate set that lacks this feature in the input. This gives rise to an obvious problem, though: If there is not much competition, this weakens the overall theory and increases the problem of accounting for ineffability. To see this, consider, e.g., the case of partial wh-movement in German. Whereas the data in (54) show that partial and long-distance wh-movement can coexist, the examples in (59a, b) show that the distribution of the two construction types is not fully identical. If there is negation in the matrix clause, partial wh-movement ceases to be possible, while long-distance wh-movement is much more acceptable for many speakers. From an optimality-theoretic perspective, this strongly suggests that partial wh-movement and long-distance wh-movement do belong to one and the same competition after all, with the latter option blocking the former one in certain island contexts.
Here and henceforth, I will refer to instances of optionality that break down in certain contexts as alternations.

(59) a. ?Wen glaubst du nicht [CP dass man t einladen sollte]?        [German]
         whom think  you not      that one   invite  should
         'Whom don't you think that one should invite?'
     b. *Was glaubst du nicht [CP wen man t einladen sollte]?
         what think  you not      whom one  invite  should
         'Whom don't you think that one should invite?'

The same problem shows up with the co-existence of wh-movement and wh-in-situ in French (cf. [55]). As shown in (60a, b), in embedded clauses only the former strategy is available (the same asymmetry arises in − embedded or matrix − contexts where the wh-phrase is an adjunct; see Aoun et al. 1981). Again, this alternation suggests that the two construction types are part of the same competition after all, which excludes a pseudo-optionality approach.

(60) a. Je me demande [qui C tu as vu t]        [French]
        I  myself ask   whom  you have seen
        'I ask myself who you saw.'
     b. *Je me demande [ − (que) tu as vu qui]
         I  myself ask       that you have seen whom
         'I ask myself who you saw.'
Of course, there is one potential way out of this dilemma (see Legendre et al. 1995; Keer and Baković 2004, 2001): As with ineffability, one can adopt a neutralization approach. On this view, each of the two optional variants in alternations like those in (54) and (55) is a faithful winning candidate of one competition, and a fatally unfaithful losing candidate of the other competition. In contexts like those in (59) and (60), where optionality breaks down, the two separate competitions converge on a single output that is faithful (with respect to the relevant input property defining the candidate set) in one case and unfaithful (with respect to this property) in the other case − the input difference is neutralized in the output. Basically, in this kind of approach, there is no relevant difference between the ineffability problem and the optionality problem: The relation between, say, (46a) and (46b) emerges as an instance of optionality in the same way that the relation between, say, (53a) and (53b) does. As with ineffability, the neutralization approach to optionality presupposes that the generator (GEN) is sufficiently powerful to create strongly unfaithful candidates; and as before, the issue of input optimization arises.
3.3.2. True optionality

Here the assumption is that two (or more) candidates can in fact have the same (optimal) violation profile; given an appropriate definition of optimality that does not presuppose that there is a single, unique winner (see [1]), they can then both be optimal. Approaches of this type have been pursued by Grimshaw (1994) and Vikner (2001a), among others. For instance, Grimshaw (1994) suggests deriving the optionality of English complementizer drop in (53) from an identical violation profile of the two candidates. This approach solves the alternation problem (i.e., it straightforwardly captures a selective breakdown of optionality). However, adopting such an approach proves very hard (or indeed impossible) in practice: Because of faithfulness constraints for lexical items and features, and because of symmetrical markedness constraints that invariably incur violations for all pieces of structure present in a candidate (like ALIGN-X-LEFT being accompanied by ALIGN-X-RIGHT), there will always be constraints where two non-surface-identical candidates differ. Given a low ranking of these constraints, they may not be active in the sense that they can determine an optimal output's properties; but they will suffice to create a distinct profile of two candidates. Along these lines, analyses involving true optionality have been widely criticized; see, e.g., Keer and Baković (2001), Grimshaw (1999), and, indeed, Grimshaw (1997: 411) already; and they do not seem to be regularly pursued anymore.
3.3.3. Ties

The central idea behind ties is that two (or more) constraints are equally important, i.e., tied. If two candidates differ only with respect to a tie of constraints, they can both be optimal, even if their violation profiles are not completely identical. In what follows, I will render a tie of two constraints A and B as A◦B. Various concepts of tie have been proposed in the literature. A basic distinction is between what can be called global tie and what can be called local tie. Global ties are abbreviations for multiple constraint rankings; local ties are essentially special constraint types. I am aware of at least five distinct concepts of tie (two of them global, three local) that can be shown to be both conceptually different, and empirically incompatible, viz.: ordered global ties (see Sells et al. 1996; Ackema and Neeleman 1998; Schmid 2001, 2005; Prince and Smolensky 2004); disjunctive global ties (see Müller 1999); ordered local ties (see, e.g., Pesetsky 1997, 1998); conjunctive local ties (see, e.g., Prince and Smolensky 1993, 2004; Legendre et al. 1995, 1998; Müller 1997; Tesar 1998; Legendre 2001); and disjunctive local ties (see Broihier 1995). Still, in the abstract tableau (61), with the ranking A ≫ B◦C ≫ D, all these concepts of constraint tie turn out to make identical predictions: O1 and O2 are both optimal, whereas O3 and O4 are blocked as suboptimal.

(61) Constraint tie: B◦C

            |  A  |  B   |  C   |  D
     ☞ O1   |     |      |  *   |
     ☞ O2   |     |  *   |      |
       O3   |     | *(!) | *(!) |
       O4   |  *! |      |      |
As noted, the basic distinction is between global ties and local ties. Global ties can be viewed as abbreviations for the simultaneous presence of different constraint rankings in a language; they thus essentially correspond to the multiple-grammar approach to (temporary) optionality in syntax as it has been proposed in studies in historical linguistics; see Kroch (2001) and references cited there. In contrast, local ties can be viewed as special constraint types. (Terminology is not uniform in this domain. Legendre 2001 reserves the term tie for what I call local tie, and refers to what I call global tie as floating constraints or partial orderings. Prince and Smolensky 1993 label what I refer to as a local tie crucial nonranking.) The most widespread concepts of tie in the literature are arguably ordered global ties and conjunctive local ties. Ordered global ties essentially work like this: A constraint ranking that exhibits an ordered global tie of two constraints B◦C is underspecified; it is an abbreviation that encodes the simultaneous presence of two hierarchies that exhibit the rankings B ≫ C and C ≫ B, respectively. A candidate is grammatical if it qualifies as optimal under one of the possible resolutions of the tie. With global ties (ordered or not), the optimal outputs of one candidate set may have a different violation profile below the tie. This is illustrated for ordered global ties in (62): An output is optimal if it is optimal with respect to at least one of the two rankings (A ≫ B ≫ C ≫ D or A ≫ C ≫ B ≫ D).

(62) Diagram of an ordered global tie B◦C

           B ≫ C
     A ≫ {       } ≫ D
           C ≫ B
As a concrete example, consider the optionality of wh-movement and scope marking in German in (54), and its breakdown (alternation) in (59). Each movement step may be assumed to violate ECONOMY (see above), and scope marker insertion may be assumed to violate a DEP constraint blocking expletive insertion (like FULL-INT(erpretation) in Grimshaw 1997); this presupposes that expletives are never part of an initial input (see Hornstein 2001 for a comparable assumption concerning numerations in minimalist syntax). Assuming these two constraints to be part of an ordered global tie, optionality may emerge. However, in (say) negative island contexts, the partial movement output may violate a locality constraint ranked above the tie more often than the long-distance movement candidate does (given the approach in Müller 1997, the reason might be that the former candidate incurs two violations of the locality constraint, one in syntax and one in logical form, whereas the latter candidate incurs only one violation, in syntax). In contrast, conjunctive local ties are not abbreviations for the simultaneous presence of more than one ranking. They basically act like ordinary constraints, and there is no resolution of the tie into suborders. Rather, the two constraints are merged into a single constraint that is interpreted via logical conjunction: A candidate violates a conjunctive local tie if it violates a constraint that is part of this tie, and multiple violations add up. With local ties, two outputs can only be optimal if they have an identical behaviour with respect to constraints that are ranked below the tie. Otherwise, a breakdown of optionality is predicted. (Thus, a somewhat less severe version of the central problem for true optionality approaches persists.) The working of a conjunctive local tie is illustrated in (63).

(63) Diagram of a conjunctive local tie B◦C

     A ≫ { B ◦ C } ≫ D
Taking again the optionality (and alternation) of wh-movement in German as an example, a conjunctive local tie approach might rely on the assumption that ECONOMY and DEP (FULL-INT) are locally tied (like B and C in [63]). The locality constraint underlying negative islands might then be ranked either above or below the tie. Whereas the data in (54) and (59) do not differentiate between these two (and other) approaches to ties per se, an argument is brought forward in Müller (1997) to the effect that more complex data favour the conjunctive local tie approach: In cases where there are two intervening SpecC domains between the base position and the ultimate SpecC[+wh] target position, three outputs can emerge as optimal; see (64).

(64) a. Wann1 meinst du [CP t1″ dass sie gesagt hat [CP t1′ dass sie t1 kommen würde]]?  [German]
        when  think  you         that she said  has         that she    come   would
        'When do you think that she said that she would come?'
(64) b. Was1 meinst du [CP wann1 (dass) sie gesagt hat [CP dass sie t1 kommen würde]]?
        what think  you     when   that  she said   has     that she    come   would
        'When do you think that she said that she would come?'
     c. Was1 meinst du [CP was1 sie gesagt hat [CP wann1 (dass) sie t1 kommen würde]]?
        what think  you     what she said  has     when   that  she    come   would
        'When do you think that she said that she would come?'

(64a) incurs three violations of ECONOMY (and no violations of DEP); (64b) incurs two violations of ECONOMY and one violation of DEP; and (64c) incurs one violation of ECONOMY and two violations of DEP. A conjunctive local tie permits all three outputs to be optimal (because they all incur three violations of the single merged constraint ECONOMY◦DEP); an ordered global tie ceteris paribus makes the wrong prediction that (64b) should be suboptimal (because the ranking ECONOMY ≫ DEP will favour maximal use of scope markers, as in [64c], and the reverse ranking DEP ≫ ECONOMY will favour maximal use of movement, as in [64a]). That said, there is also conflicting evidence from other empirical domains which would seem to suggest that (ordered) global ties form the superior concept. At present, it is an open question which version of tie (if any) is to be preferred in OT syntax. (Of course, several concepts of tie may also co-exist.) Interestingly, there is a version of the concept of ordered global tie which has received some attention in more recent years even though the close connection is usually not made explicit: The concept shows up in stochastic approaches to optimality theory.
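The contrast just drawn for (64) can be checked mechanically: under a conjunctive local tie, ECONOMY and DEP are summed into one merged constraint, whereas under an ordered global tie a candidate is grammatical if it wins under at least one resolution of the tie. With the violation counts given in the text (3/0, 2/1, 1/2 for ECONOMY/DEP), only the local tie lets all three outputs through. The encoding below is a sketch of mine, not the chapter's formalism.

```python
# (64a-c) with ECONOMY/DEP violation counts as given in the text.
cands = {"64a": (3, 0), "64b": (2, 1), "64c": (1, 2)}

def local_tie_winners(cands):
    """Conjunctive local tie: ECONOMY and DEP merge; violations add up."""
    best = min(e + d for e, d in cands.values())
    return {n for n, (e, d) in cands.items() if e + d == best}

def global_tie_winners(cands):
    """Ordered global tie: optimal under ECONOMY>>DEP or DEP>>ECONOMY."""
    winners = set()
    for key in (lambda v: (v[0], v[1]), lambda v: (v[1], v[0])):
        best = min(key(v) for v in cands.values())
        winners |= {n for n, v in cands.items() if key(v) == best}
    return winners

print(sorted(local_tie_winners(cands)))   # all three outputs optimal
print(sorted(global_tie_winners(cands)))  # (64b) wrongly excluded
```

The run reproduces the argument from Müller (1997): the global tie only ever rewards the two extreme candidates, while the merged constraint treats all three 3-violation profiles alike.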
3.3.4. Stochastic Optimality Theory

Stochastic optimality-theoretic analyses of phonological phenomena have been developed in Anttila (1997), Boersma and Hayes (2001), and Hayes (2001). Syntactic applications include Aissen (2003a, b) (on optionality with differential object marking and with DP-internal possessor placement, respectively), Bresnan, Dingare, and Manning (2001) (on optionality in passive formation), and Bresnan, Deo, and Sharma (2007) (on types of inflection of the verb be, including negation, in varieties of English). The basic observation is that quite often, the constructions that co-exist as optional and may participate in an alternation (with one selectively blocking the other in certain contexts) are not equally frequent, or equally unmarked (or, for that matter, equally well formed − i.e., they may exhibit different degrees of acceptability). For instance, the positioning of possessors in English DPs, while often (though not always) optional, also often shows clear preferences for one or the other option that can be detected by checking relative frequency in corpora, and also by consulting native speaker intuitions. Preferences are indicated by > in (65) (and ?*/* signals cases of (near-)complete ungrammaticality of one output, i.e., the breakdown of optionality).
26. Optimality-Theoretic Syntax

(65) a. the result of the accident > the accident's result
     b. Mary's sister > the sister of Mary
     c. the boy's uncle > the uncle of the boy
     d. the door of the building > the building's door
     e. someone's shadow > the shadow of someone
     f. the shadow of something > *something's shadow
     g. her money > ?*the money of her
Evidently, the position of a possessor on the animacy and definiteness scales, which are independently motivated (see Hale 1972; Silverstein 1976), plays an important role in its DP-internal positioning. Aissen (2003b) sets out to derive the pattern in (65) − both the preferences for positioning in cases of optionality, and the categorical unavailability of some of the options. To this end, she first assumes that the underlying animacy and definiteness hierarchies can be used as primitives to generate sequences of constraints with a fixed internal order (sometimes called subhierarchies), via a process of harmonic alignment of scales; see Prince and Smolensky (2004), and Aissen (1999) for an influential application in syntax. Harmonic alignment is defined as in (66) (cf. Prince and Smolensky 2004: 161).

(66) Harmonic Alignment:
     Suppose given a binary dimension D1 with a scale X > Y on its elements {X,Y}, and another dimension D2 with a scale a > b > ... > z on its elements {a,b,...,z}. The harmonic alignment of D1 and D2 is the pair of Harmony scales HX, HY:
     a. HX: X/a ≻ X/b ≻ ... ≻ X/z
     b. HY: Y/z ≻ ... ≻ Y/b ≻ Y/a
     The constraint alignment is the pair of constraint hierarchies CX, CY:
     a. CX: *X/z ≫ ... ≫ *X/b ≫ *X/a
     b. CY: *Y/a ≫ *Y/b ≫ ... ≫ *Y/z

Thus, given an animacy scale [ human > animate > inanimate ] and a definiteness scale [ pronoun > name > definite DP > indefinite DP ], harmonic alignment of these scales with the binary scale [ SpecN > CompN ] will automatically produce the four constraint subhierarchies in (67). (Here, SpecN and CompN are abbreviations for pre-nominal placement and post-nominal placement of the possessor in a DP, respectively, with the former realized by genitive 's and the latter by an of-PP.) Note that the order within a subhierarchy is universally fixed. This derives varying degrees of markedness of certain options.
For instance, from (67a [i]) it follows that a pre-nominal inanimate possessor will always violate a higher-ranked constraint in the subhierarchy (and therefore qualify as more marked) than a pre-nominal animate possessor.

(67) a. (i)  *SpecN/inanimate ≫ *SpecN/animate ≫ *SpecN/human
        (ii) *CompN/human ≫ *CompN/animate ≫ *CompN/inanimate
     b. (i)  *SpecN/indef ≫ *SpecN/def ≫ *SpecN/name ≫ *SpecN/pron
        (ii) *CompN/pron ≫ *CompN/name ≫ *CompN/def ≫ *CompN/indef
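The mapping from scales to subhierarchies in (66) is mechanical enough to be stated in a few lines of code (a sketch, not Aissen's or Prince and Smolensky's formalization; constraint names follow the conventions of [67]):

```python
# Harmonic alignment as in (66), sketched: align a binary scale [X > Y]
# with a markedness scale [a > ... > z] and return the two constraint
# subhierarchies, each listed from highest- to lowest-ranked.

def harmonic_alignment(binary_scale, scale):
    x, y = binary_scale
    c_x = [f"*{x}/{e}" for e in reversed(scale)]  # *X/z >> ... >> *X/a
    c_y = [f"*{y}/{e}" for e in scale]            # *Y/a >> ... >> *Y/z
    return c_x, c_y

animacy = ["human", "animate", "inanimate"]
spec_hier, comp_hier = harmonic_alignment(("SpecN", "CompN"), animacy)
# spec_hier reproduces (67a-i), comp_hier reproduces (67a-ii)
```

Running the same function on the definiteness scale yields (67b [i]) and (67b [ii]); the fixed internal order of each subhierarchy falls out from the order of the input scale.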
Given that DP-internal possessors must be placed either in a pre-nominal or in a post-nominal position, the constraints in (67a [i]) and the constraints in (67a [ii]) impose conflicting requirements on outputs (e.g., *SpecN/inanimate requires inanimate possessors to show up post-nominally, whereas *CompN/inanimate requires inanimate possessors to show up pre-nominally); as do the constraints in (67b [i]) and (67b [ii]). In a standard OT system without ties, interleaving of the two hierarchies in (67a [i]) and (67a [ii]), and of the two hierarchies in (67b [i]) and (67b [ii]), will determine a single optimal output for each input. As regards (67a), if *CompN/inan ≫ *SpecN/inan, all inanimate possessors will be realized pre-nominally; if *SpecN/human ≫ *CompN/human, all human possessors will be realized post-nominally; otherwise, mixed patterns will result which, however, respect implicational generalizations (e.g., if an animate (non-human) possessor is realized pre-nominally, a human possessor also has to be realized pre-nominally; or if an animate (non-human) possessor is realized post-nominally, an inanimate possessor also has to be realized post-nominally). Similar conclusions apply in the case of (67b). Furthermore, by locally conjoining the members of two similar subhierarchies (e.g., (67a [i]) and (67b [i]), both precluding pre-nominal possessor placement) in an order-preserving way, a two-dimensional picture arises: In the case at hand, the highest-ranked constraint then is the one blocking pre-nominal placement of inanimate indefinite possessors (*SpecN/inanimate & *SpecN/indef), and the lowest-ranked constraint bans pre-nominal placement of human pronominal possessors (*SpecN/human & *SpecN/pron); and whereas there is no fixed ranking between, say, *SpecN/inanimate & *SpecN/def and *SpecN/animate & *SpecN/indef, the ranking of *SpecN/inanimate & *SpecN/def and *SpecN/inanimate & *SpecN/name is fixed again (as is the ranking between *SpecN/animate & *SpecN/indef and *SpecN/animate & *SpecN/def). In view of the (partial) optionality visible in (65), this approach does not yet seem correct.
In principle, one could now assume ties to derive optionality where it occurs. However, Aissen (2003b) does not pursue this approach because it does not offer a way to integrate the finding that in the cases of optionality in (65), one of the two options is typically more frequent (and less marked) than the other one. This state of affairs can be derived by adopting a stochastic OT approach. The basic idea of stochastic OT is that constraints are not necessarily categorically ordered with respect to each other. Rather, their application domains may overlap. An overlap of application domains gives rise to optionality. Categorical vs. overlapping application domains of constraints are illustrated in (68a, b), where the boxes signal the possible application domains of constraints on an abstract continuum from more strict to less strict (interpreted from left to right).

(68) a. Categorical order of application domains of constraints:
        B: [────────]
        C:                [────────]

     b. Overlapping order of application domains of constraints:
        B: [────────]
        C:       [────────]
Here is how the approach works technically: A candidate is evaluated at an evaluation time; it is well formed if it is optimal at that point. For an evaluation, an arbitrary point is chosen in the application domain of a constraint. A constraint B is ranked higher than another constraint C at a given evaluation time if the point chosen for B is above the point chosen for C. If the domains of B and C are categorically ordered, then the point for B is always going to be above the point for C, and there will be no optionality. However, if the domains of B and C overlap, optionality arises; the winning candidate is determined by whether the point chosen for B is above the point chosen for C or vice versa. So far, this is basically identical to the concept of ordered global tie. However, in addition to permitting an account of optionality, the new system also captures preferences: The choice of an evaluation point at a given evaluation time is free as such. However, the smaller the common domain of B and C is, the more likely it is that the point chosen for the higher-ranked constraint (say, B) is above the point chosen for the lower-ranked constraint (say, C). Accordingly, the more likely a higher position of B points vis-à-vis C points at a given evaluation time is, the more the construction favoured by B is going to be preferred over the construction favoured by C; similarly, the more frequent the construction favoured by B will be in corpora. This is illustrated in (69).

(69) a. Typical result: B ≫ C
        B: [──b──────]
        C:       [──────c──]    (point b above point c)

     b. Rare result: C ≫ B
        B: [──────b──]
        C:       [─c───────]    (point c above point b)
Thus, by assuming that the constraints determining possessor placement may have both non-overlapping and overlapping (but typically non-identical) application domains in English, Aissen (2003b) succeeds in deriving both categorical ungrammaticality of some options (the composite constraints *SpecN/indef & *SpecN/inanimate and *CompN/pron & *CompN/hum properly outrank their respective antagonists), and preferences among the two basically optional placement strategies (e.g., in [65a], the application domains of *SpecN/inanimate & *SpecN/def and *CompN/inanimate & *CompN/def overlap, with the likelihood of choosing a higher evaluation point in the former constraint's domain being greater than the likelihood of choosing a higher evaluation point in the latter constraint's domain). More generally, since stochastic OT can be viewed as a way to assign preferences to options permitted by globally tied constraints (conceived of as constraints with overlapping application domains), it should in principle be possible to transfer all analyses that rely on (ordered) global ties to stochastic OT analyses; and indeed, it has often been noted for cases like those in (53)−(57) that one of the two options tends to be less marked than the other one (with markedness degrees subject to micro-variation, possibly even idiolectal variation).
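The stochastic evaluation procedure can be sketched as a simple sampling experiment (the interval bounds below are invented purely for illustration and stand in for the application domains of (68b)):

```python
import random

# Sketch of stochastic OT evaluation: each constraint has an application
# domain (an interval on the strictness continuum). At each evaluation
# time a point is sampled from each domain, and the sampled points fix
# the ranking for that evaluation.

def sample_ranking(domain_b, domain_c, rng):
    b = rng.uniform(*domain_b)
    c = rng.uniform(*domain_c)
    return "B>>C" if b > c else "C>>B"

rng = random.Random(0)
counts = {"B>>C": 0, "C>>B": 0}
# B's domain sits mostly above C's, but the two overlap:
for _ in range(10000):
    counts[sample_ranking((0.4, 1.0), (0.0, 0.6), rng)] += 1
# The B-favoured output comes out far more often, yet the C-favoured
# output remains possible: preference plus optionality, as in (65a-e).
```

Shrinking the overlap drives the frequency of "C>>B" towards zero, which corresponds to the (near-)categorical cases (65f, g).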
4. Optimization domains

The previous two sections have addressed syntactic evidence that supports an optimality-theoretic approach, and syntactic evidence that may turn out to be problematic. When discussing these issues, I have presented each syntactic analysis in some actual framework. Typically, this has been the one in which it was presented in the original literature; and typically, this has been a version of the Principles and Parameters (P&P) approach (see Chomsky 1981; Chomsky and Lasnik 1993). However, OT is a theory of constraint interaction, not a theory of the basic building blocks that create (or license) LEs (sentences, in the case at hand) as such. So, while it may be true that much of the groundbreaking work in OT syntax has assumed a P&P perspective on syntactic candidates and the makeup of syntactic constraints of both the GEN and the H-EVAL components (see, e.g., Grimshaw 1997; Pesetsky 1998; Legendre et al. 1998), there is no intrinsic relation between the P&P approach and OT. Indeed, it would seem that most syntactic theories could be enriched by an OT component; and whereas theories like HPSG or TAG seem to have largely withstood the impact of OT, LFG in particular seems to have embraced OT, at least for a while; see Choi (1999), Sells (2001b, a), Bresnan (2001), and Kuhn (2001), among many others. Against this background, one may ask whether optimization might also be compatible with minimalist approaches; see Chomsky (1993, 2001, 2008) and much related work. In this section, I will address this question via the related issue of optimization domains.
4.1. Background

A fundamental question is whether optimization of an LE applies only once (so-called harmonic parallelism) or more than once (harmonic serialism). To some extent (but see below), this distinction also manifests itself in the similar distinction between a representational and a derivational organization of grammar. Whereas in classical rule-based generative phonology the concept of ordered application of rules is crucial (giving rise to feeding, bleeding, and opacity effects in the guise of counter-feeding and counter-bleeding), OT phonology can for the most part do without derivations (with potential problems arising in the area of opacity, though), and thus qualifies as an instance of harmonic parallelism. In fact, it still seems to be a widespread assumption (particularly, but not exclusively, among those who work outside OT) that OT is inherently representational, and characterized by harmonic parallelism. However, this assessment is most certainly incorrect, as the following quote from Prince and Smolensky (1993, 2004) makes clear:

Much of the analysis given in this book will be in the parallel mode, and some of the results will absolutely require it. But it is important to keep in mind that the serial/parallel distinction pertains to GEN and not to the issue of harmonic evaluation per se. It is an empirical question (...) Many different theories (...) can be equally well accommodated in GEN, and the framework of Optimality Theory per se involves no commitment to any set of such assumptions. (Prince and Smolensky 2004: 95−96)
As a matter of fact, having first addressed the issue in McCarthy (2000), John McCarthy has recently come to embrace an approach to OT phonology that relies on harmonic serialism; see McCarthy (2008, 2010) and much related recent work (though, crucially, not McCarthy 2007). The same goes for syntax: There is no deep reason why OT syntax should have to be strictly representational, and qualify as an instance of harmonic parallelism. The following quote makes it clear that there is no fundamental obstacle to reconciling OT with the derivational approach to syntax envisaged in the minimalist program:

While some see a major divide between the derivationally-oriented MP and OT, we do not. Of course, there are likely to be differences of empirical import between the non-derivational, chain-based theory of Shortest Move developed here and a particular derivational MP proposal, but such differences seem comparable to those between different approaches to syntax within OT, or to those between different proposals within MP: they do not seem to follow from some major divide between the OT and MP frameworks. In fact, derivational theories can be naturally formalized within OT. Harmonic serialism is a derivational version of OT developed in Prince and Smolensky (1993) in which each step of the derivation produces the optimal next representation. Another approach, seemingly needed to formalize MP within OT, has Gen produce derivations; it is these that are evaluated by the constraints, the optimal derivation being determined via standard OT evaluation. Thus, on our view, while the issue of derivations is an important one, it is largely orthogonal to OT. (Legendre et al. 1998: 285−286)
What is more, Legendre et al. (1998) point out that there are actually two ways to reconcile a derivational approach to syntax with OT − either via standard, parallel optimization of full derivations (also cf. the role of candidate chains and so-called rLUMSeqs in McCarthy 2007), or via serial optimization. In the latter case, another issue becomes relevant: In classic transformational grammar (e.g., Chomsky 1965), syntactic transformations applying to the output of the base component effect derivational steps where the input and the output have roughly the same size, exactly as in phonology. For instance, a wh-movement transformation may reorder a wh-phrase with respect to the rest of the clause, but the transformation does not per se create additional structure (many transformations are structure-preserving). Things are different in the minimalist program, where the operations of the base component and of the transformational component are systematically interspersed; syntactic structures start with two lexical items and grow throughout the derivation by iterated application of (external or internal) Merge. From a serial OT perspective, this implies that iterated optimization in syntax cannot apply to objects of (roughly) the same size (as it still is the case with serial optimization in phonology, which involves no structure-building) − the optimal output of one optimization procedure will have to be smaller than the optimal output of the next optimization procedure (assuming there is more than one). This in turn means that we have to introduce a second fundamental difference in optimization options: Optimization may be parallel (i.e., apply once) or serial (i.e., apply more than once); and optimization may be global (applying to the full LE) or local (applying also to smaller domains). Whereas serial optimization in phonology is typically global (phonology restricts the shape of LEs − words or morphemes − but it does not create them; however, cf. 
Kiparsky 2000; Itô and Mester 2002; and Bermúdez-Otero 2008 for stratal OT, where this reasoning does not hold), serial optimization in minimalist syntax must be local (the LEs created by Merge grow successively). In fact, given the more recent concept of phase-based spell-out (see Chomsky 2001), there is no final representation of the full sentence in minimalist syntax. Under this assumption, Legendre et al.'s first option is not available on principled grounds. Given that syntactic optimization can be both serial and local, the question arises of how the local domain that optimization applies to is defined. (70) lists a number of options.

(70) Optimization domains:
     a. sentence (parallel or serial optimization, derivational or representational)
     b. minimal clause (e.g., CP; potentially serial optimization, derivational)
     c. phase (CP, vP (AgrOP), DP: serial optimization, derivational)
     d. phrase (XP: serial optimization, derivational)
     e. derivational step (serial optimization, derivational)

As noted, the standard assumption in OT syntax is that the whole sentence is subject to a single, parallel global optimization procedure (Grimshaw 1997; Pesetsky 1998; Legendre, Smolensky, and Wilson 1998; etc.). The output candidates are usually taken to be representations; but they can also be full derivations (as, e.g., in Müller 1997). In contrast, serial global optimization of whole sentences is proposed in Wilson (2001) and Heck (1998, 2001). Finally, serial local optimization in syntax is closely related to developments in the minimalist program. Conceptually, there are trade-offs. An argument for small optimization domains might be this: The smaller the optimization domain is, the more the complexity of the overall system is reduced (i.e., there is a reduction of the size of candidate sets). Furthermore (as noted by a reviewer), smaller optimization domains are also likely to require simpler constraints than larger domains. On the other hand, an argument for larger optimization domains might be that the larger the optimization domain is, the less often optimization procedures have to be carried out.
Assuming that iterated optimization in small domains is ultimately cheaper than single optimization of extremely large domains, one might perhaps make a case that local optimization is conceptually preferable. It is also worth noting that there is evidence outside of language for optimization of small domains (see, e.g., Gigerenzer et al. 1999 on "fast and frugal" decision-making, which relies on the availability of very little information). However, ultimately empirical arguments are needed to decide whether optimization domains should be viewed as small or large (possibly global). Such arguments are of the following type: If the ranked constraints have access to more/less structure, a wrong winner is predicted, ceteris paribus. All the options in (70) have been pursued, and arguments for the specific notion of optimization domain chosen in each case have typically taken this form. Let me list a few examples: (i) The minimal clause is identified as the optimization domain for syntax in Ackema and Neeleman's (1998) study of wh-movement in Czech, and in an analysis of extraction from verb-second clauses in German that I develop in Müller (2003a). (ii) Arguments for the phase as the optimization domain are presented in Fanselow and Ćavar's (2001) investigation of MeN-deletion in Malay, and in Müller (2000a, 2002), studies that deal with R-pronouns in German. (iii) Next, the phrase (XP) is argued to be the syntactic optimization domain in the crosslinguistic study of reflexivization (including long-distance reflexivization) documented in Fischer (2004, 2006); similarly, the approach to secondary remnant movement in Müller (2000c) and the approach to wh-movement, superiority, quantifier raising, and sluicing developed in Heck and Müller (2000, 2003) make crucial reference to the phrase as the domain of optimization. (iv) Finally, empirical arguments for optimization of individual derivational steps are given in Heck and Müller (2007, 2013) (based on gender agreement with dative possessors in German DPs and expletives in German verb-second clauses), in Müller (2009) (based on ergative vs. accusative argument encoding patterns), in Lahne (2008, 2009) (based on the incompatibility of SVO order and ergative systems of argument encoding; but cf. Deal, this volume, where the incompatibility is disputed), and in Georgi (2009) (based on global case splits in Tauya). In what follows, I will address two arguments for serial, local optimization in a bit more detail.
4.2. Clauses as optimization domains

Ackema and Neeleman (1998) are concerned with multiple question formation in typologically different systems. Based on earlier work by a number of authors, they identify two general possibilities for the analysis of multiple wh-movement as it can be found in Slavic languages, viz. wh-cluster formation (all wh-phrases are adjoined to one, which then undergoes movement) on the one hand and multiple separate movement on the other. Czech can be shown to follow the latter strategy. This must also be the case when multiple wh-movement applies long-distance, as in (71) (note that the sequence of fronted wh-phrases can be interrupted by parentheticals and the like, as in [71], which is incompatible with wh-cluster formation).

(71) [VP Co1 [VP podle tebe [VP komu2 [VP Petr řekl [CP že Jan dal t1 t2 ]]]]]?
         what    according.to you  whom     Petr said    that Jan gave
     'According to you, what did Petr say that Jan gave to whom?'
[Czech]
The analysis is based on the three constraints in (72). Both Q-MARK and Q-SCOPE trigger wh-movement. Q-MARK requires movement to a designated specifier of a functional head, whereas Q-SCOPE can be satisfied via movement to a local VP-adjoined position. STAY is a gradient version of ECONOMY that minimizes the length of movement paths (with no movement at all emerging as the ideal option).

(72) a. Q-MARK: Assign [+Q] to a propositional constituent. (This can only be done by an overt functional head, which in turn needs to inherit this capacity in the matrix clause from some wh-phrase in its specifier.)
     b. Q-SCOPE: [+Q]-elements must c-command the constituent representing the proposition.
     c. STAY: Every node crossed by movement induces a violation.

The ranking Q-SCOPE ≫ STAY ≫ Q-MARK in Czech predicts multiple separate wh-movement to VP-adjunction sites in matrix questions, as opposed to local wh-cluster formation and movement of the wh-cluster to a specifier position (as is predicted for Bulgarian-type languages, where Q-MARK is also ranked high). The reason is that separate wh-movements involve shorter movement paths if the target position is in the same clause. However, if the ultimate target is outside the minimal clause, and long-distance wh-movement is called for (as in [71]), the analysis requires local optimization in order to predict the right outcome. Here is what Ackema and Neeleman (1998: 478) have to say in the footnote where they tackle the potential problem:

Evaluation of movement constraints proceeds cyclically. That is to say, STAY is first evaluated with respect to the embedded clause, then to the combination of the embedded clause and the matrix clause. In the embedded clause, STAY favours separate movement of the two wh-expressions (...) This means that clustering can only take place when the larger cycle is taken into account, i.e., when the two whs have already been adjoined to the embedded VP. However, it is no longer possible then, because it would have to take place within the embedded clause (the initial landing site of the whs), which would go against strict cyclicity. (Ackema and Neeleman 1998: fn. 25)
Thus, optimization first applies to the embedded CP; see tableau (73). O1 (with separate movement) is optimal because O3 (with wh-cluster formation and movement to a clause-initial specifier) incurs fatal violations of STAY.

(73) Long multiple wh-movement in Czech, optimization of embedded CP

     Input: part of the numeration                         | Q-SCOPE | STAY     | Q-MARK
     ☞ O1: [CP že [VP co1 [VP komu2 [VP Jan dal t1 t2 ]]]] |         | ***      |
       O2: [CP komu2 že [VP co1 [VP Jan dal t1 t2 ]]]      |         | ****!*   |
       O3: [CP co1 komu2 že [VP Jan dal t1 t2 ]]           |         | ****!*** |
       O4: [CP že [VP Jan dal co1 komu2 ]]                 | *!      |          |
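The evaluation in tableau (73) can be sketched computationally (not the authors' implementation; the STAY counts follow the tableau's violation marks, and Q-MARK marks, which are not at issue in this cycle, are set to zero):

```python
# Sketch of the ranked evaluation in tableau (73): violation counts per
# candidate for (Q-SCOPE, STAY, Q-MARK), compared lexicographically under
# the Czech ranking Q-SCOPE >> STAY >> Q-MARK.

TABLEAU_73 = {
    "O1": (0, 3, 0),  # separate movement to VP-adjoined positions
    "O2": (0, 5, 0),  # one wh-phrase in SpecC, one VP-adjoined
    "O3": (0, 7, 0),  # wh-cluster formation and movement to SpecC
    "O4": (1, 0, 0),  # wh-in-situ: fatal Q-SCOPE violation
}

# Python's tuple comparison is lexicographic, which matches strict
# constraint domination: a single Q-SCOPE violation outweighs any
# number of STAY violations.
winner = min(TABLEAU_73, key=TABLEAU_73.get)
```

The winner is O1, the separate-movement candidate, exactly because its STAY profile is the smallest among the candidates that satisfy Q-SCOPE.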
The optimal output O1 is subsequently taken as the sole input for optimization of the matrix CP; all competing candidates are descendants of O1 (this is signalled by labelling them O11, O12, etc.). Again, wh-movement has to apply, and again, separate movement emerges as optimal because it involves shorter movement paths and thereby minimizes STAY violations. (Q-MARK is now violated, but this is harmless given the ranking.) This is shown in tableau (74). If there had not been serial, local optimization of CPs, but rather parallel, global optimization of the whole sentence as it is standardly assumed, a wrong winner would have been predicted. In the words of Ackema and Neeleman: "It seems to be predicted that when the distance to be covered by the wh-expressions in a multiple question increases, clustering [as in Bulgarian, with a high-ranked Q-MARK] will be favoured." This is illustrated in tableau (75). The output that would wrongly be predicted to be optimal (due to fewer nodes crossed in the course of movement overall) is marked by ☛. The underlying logic is this: Two short separate movements may be better than a short movement (creating a wh-cluster) plus a longer movement of the cluster. E.g., 2 + 2 = 4 nodes may be crossed in the first case, and 1 + 5 = 6 nodes in the second.
(74) Long multiple wh-movement in Czech, optimization of matrix clause

     Input: [CP že [VP co1 [VP komu2 [VP Jan dal t1 t2 ]]]], Petr, řekl                   | Q-SCOPE | STAY           | Q-MARK
     ☞ O11: [VP co1 [VP komu2 [VP Petr řekl [CP že [VP t1 [VP t2 [VP Jan dal t1 t2 ]]]]]]] |         | *** ******     | *
       O12: [CP co1 řekl [VP komu2 [VP Petr [CP že [VP t1 [VP t2 [VP Jan dal t1 t2 ]]]]]]] |         | *** *******!** |
       O13: [CP co1 komu2 řekl [VP (t1 t2) Petr [CP že [VP t1 [VP t2 [VP Jan dal t1 t2 ]]]]]] |      | *** *******!** |
(75) Global optimization: Long multiple wh-movement in Czech, wrong winner

     Input: numeration                                               | Q-SCOPE | STAY        | Q-MARK
       O1: [VP co1 [VP komu2 [VP Petr řekl [CP že [VP Jan dal t1 t2 ]]]]] |     | *********!* | *
     ☛ O2: [CP co1 komu2 řekl [VP Petr [CP že [VP Jan dal t1 t2 ]]]]      |     | ********    |
However, two medium-sized separate movements can still be worse than a short movement (creating a cluster) and a very long movement. E.g., 7 + 7 = 14 nodes may be crossed in the first case, and 1 + 10 = 11 nodes in the second. (Note that these numbers are given solely for the purpose of illustration; they may or may not come close to the actual state of affairs, depending on a variety of further decisions about clause structure that are orthogonal to present concerns.)
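The arithmetic behind the wrong winner can be put in one place (a sketch using only the illustrative node counts given in the text):

```python
# STAY counts for the two strategies under the text's illustrative numbers:
# separate movement = two paths summed; clustering = one short clustering
# step plus one long movement of the cluster.
separate = {"short": 2 + 2, "long": 7 + 7}
cluster  = {"short": 1 + 5, "long": 1 + 10}

# Global evaluation simply compares total STAY violations:
global_winner = {
    k: ("separate" if separate[k] < cluster[k] else "cluster")
    for k in separate
}
# In the long-distance case, global evaluation flips to clustering -- the
# wrong winner of tableau (75). Cyclic (serial, local) optimization fixes
# separate movement on the embedded cycle, before the long paths can ever
# be compared.
```

Short paths favour separate movement (4 < 6); long paths reverse the comparison (14 > 11), which is exactly what local optimization of the embedded CP prevents.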
4.3. Derivational steps as optimization domains

Heck and Müller (2007, 2013) suggest that the minimalist program and optimality theory can be fruitfully combined (also see Pesetsky 1998; Broekhuis and Dekkers 2000; Broekhuis 2000, 2006, 2008; Fanselow and Ćavar 2001; Heck and Müller 2000). A basic assumption is that syntactic structure is built up derivationally and is subject to repeated local optimization: Structure-building and optimization apply in a cyclic interleaving fashion. More specifically, based on a given input, the minimalist operations Merge, Move, and Agree create various output candidates α1, ..., αn: the candidate set M. M is subject to optimization. The optimal output αi then serves as the input for the next cycle, and so on, until the final root node has been reached and the sentence is complete. Thus, in this approach every derivational step is subject to optimization. (Also see Epstein and Seely 2002: 77, who argue that "each transformational rule application constitutes a 'phase'.") The empirical arguments for extremely local optimization domains presented in Heck and Müller (2007, 2013) take the following form: Sometimes, the order of applying Agree and Merge is under-determined. If there are no simultaneous rule applications in the grammar (see Epstein and Seely 2002; contra Pullum 1979; Chomsky 2008), then a conflict arises: Only one of them can be executed at each step. The conflict can be resolved by ranking the requirements: The highest-ranked requirement is satisfied immediately; lower-ranked ones are not yet satisfied at the current derivational step. Such unsatisfiability does not lead to a crash of the derivation and thus suggests an analysis in terms of violable constraints. However, if the optimization domain is larger than the step level, then, ceteris paribus, the order of elementary operations that is imposed by the ranking under step-level optimization cannot be preserved. This is the wrong result because sentences would be predicted to be well formed that aren't. One of the relevant phenomena is the pre-nominal dative possessor construction in German, which is fairly widespread but still considered substandard (see, e.g., Haider 1988; Zifonun 2004). Here, a dative-marked possessor DP2 shows up in SpecD of a matrix DP1; and there is evidence that it has been moved there from the complement position of the noun (de Vries 2005; Chomsky 1970). D1 in turn is realized by a possessive pronoun with a dual role: The root of the pronoun agrees with DPdat (the possessor) with respect to [num(ber)] and [gend(er)]; and the inflection of the pronoun agrees with its complement NP (the possessum) with respect to [num], [gend], and [case]; see (76a). A basic assumption is that the [gend] features of the possessive pronoun are not inherent; rather, they are determined in the course of the derivation, by Agree relations with gender-bearing nominals; so in principle, the possibility might arise that gender agreement could be reversed, with the root of the pronoun agreeing with the possessum and the inflection agreeing with the possessor. This would yield (76b), which needs to be excluded. In fact, taking (76b) to be a serious competitor of (76a) is presumably not an artefact of the theory.
Young children as well as second language learners of German have well-documented problems with getting the two types of gender agreement (i.e., root vs. inflection) with third-person possessive pronouns right; see, e.g., Ruff (2000). Furthermore, gender mistakes with possessive pronouns regularly occur even in adult speech, and are then frowned upon by language mavens (see, e.g., Sick 2006: 108).

(76) a. [DP1 [DP2 dem Fritz] [D′ [D1 sein-e] [NP [N′ Schwester ] t2 ]]]
         the.M.DAT Fritz his.M-F sister.F
         'Fritz's sister'
[German]
     b. *[DP1 [DP2 dem Fritz] [D′ [D1 ihr-∅] [NP [N′ Schwester ] t2 ]]]
         the.M.DAT Fritz her.F-M sister.F
         'Fritz's sister'

The analysis is based on three constraints. First, the AGREE CONDITION (AC) demands an immediate valuation of so far unvalued features on an item if the structural context for Agree (roughly, m-command) is available. Second, the MERGE CONDITION (MC) requires structure-building operations (including movement, as an instance of internal Merge; see Chomsky 2008) to take place immediately when the structural context for this operation is present. And third, the MINIMAL LINK CONDITION (MLC) states that all grammatical operations (like Agree and Merge) involve the smallest possible path between the two items involved. By assumption, the MLC is undominated (or belongs to GEN), and the ranking for German (or, at least, for derivational steps in the nominal domain in German) is AC ≫ MC. With this in mind, consider (77), which is the relevant stage (here called stage Σ) of the derivation of the pre-nominal dative possessor construction.
26. Optimality-Theoretic Syntax
(Some remarks on notation: [*GENDr:□*] and [*GENDi:□*] are unvalued gender features of the root and inflectional parts of the possessive pronoun, respectively, that require valuation by Agree with a gender-bearing nominal; [•EPP•] is a property of the possessive pronoun that triggers movement of the possessor DP2 to SpecD, yielding the eventual surface order of constituents.)

(77) The Σ stage of the derivation and the subsequent order of operations:

     [DP1 [D1 root[•EPP•],[*GENDr:□*] infl[*GENDi:□*]] [NP N[GEND:fem] DP2dat[GEND:masc]]]
     (1) Agree: infl ↔ N[GEND:fem]
     (2) Agree: root ↔ DP2dat[GEND:masc]
     (3) Move: DP2dat → SpecD1 (triggered by [•EPP•])
At the Σ stage, various operations (signalled by [1], [2], and [3] in [77]) could in principle be carried out in the next step, because the contexts for Agree and Move to apply are all met. However, given the ranking AC ≫ MC, gender valuation rather than movement has to apply next; and given the MLC, gender agreement must take place between the inflectional part of the possessive pronoun and the head noun of the construction, since this minimizes path lengths for syntactic dependencies (operation [1] in [77]). This is shown in tableau (78). (Note that AC is still violated once by the optimal output O1; the reason is that one gender feature of the pronoun is still unvalued even though the context for Agree to apply is present.)

(78) Valuation of the inflection's gender, step 1 (Σ as input): Agree (with possessum NP)
     Input: [D1′ D[*GENDr:□*],[*GENDi:□*],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]]

                                                                              | MLC | AC  | MC
     ☞ O1: [D1′ D[*GENDr:□*],[GENDi:fem],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]] |     | *   | *
       O2: [D1′ D[GENDr:fem],[*GENDi:□*],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]] | *!  | *   | *
       O3: [DP1 DP2[GEND:masc] D[*GENDr:□*],[*GENDi:□*] … [NP N[GEND:fem] t2]]      |     | **! |
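The evaluation logic behind a tableau like (78) can be made concrete with a small sketch. The following Python fragment is not part of the original analysis; the constraint names and violation counts are simply transcribed from tableau (78). It implements strict domination by comparing violation profiles lexicographically along the ranking MLC ≫ AC ≫ MC:

```python
# Schematic sketch of step-level OT evaluation (illustrative only).
# A candidate's violations are mapped to a tuple ordered by the ranking;
# lexicographic tuple comparison then implements strict domination.

RANKING = ["MLC", "AC", "MC"]

def profile(violations):
    # Map a {constraint: count} dict to a tuple ordered by the ranking.
    return tuple(violations.get(c, 0) for c in RANKING)

def optimal(candidates):
    # Lexicographic comparison: any number of violations of a lower-ranked
    # constraint is preferred to a single violation of a higher-ranked one.
    best = min(profile(v) for v in candidates.values())
    return {name for name, v in candidates.items() if profile(v) == best}

# Step 1 of the derivation (violation counts from tableau (78)):
step1 = {
    "O1": {"AC": 1, "MC": 1},            # Agree(infl, N): one feature still unvalued
    "O2": {"MLC": 1, "AC": 1, "MC": 1},  # Agree(root, N): non-minimal path
    "O3": {"AC": 2},                     # Move(DP2): both gender features unvalued
}

print(optimal(step1))  # {'O1'}
```

Because the comparison is lexicographic, two AC violations (O3) are worse than one AC violation plus an MC violation (O1): the lower-ranked MC violation plays no role once AC has decided the competition.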
The optimal output O1 of this optimization is then used as the input for the next optimization procedure. As before, agreement is given preference over movement of DP2 (because of AC ≫ MC), resulting in valuation of the remaining gender feature on the root of the possessive pronoun (operation [2] in [77]); see tableau (79).
(79) Valuation of the root's gender, step 2: Agree (with possessor DP)
     Input: [D1′ D[*GENDr:□*],[GENDi:fem],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]]

                                                                              | MLC | AC  | MC
     ☞ O11: [D1′ D[GENDr:masc],[GENDi:fem],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]] |     |     | *
       O12: [DP1 DP2[GEND:masc] D[*GENDr:□*],[GENDi:fem] … [NP N[GEND:fem] t2]]       |     | *!  |
Finally, movement can and must take place (operation [3] in [77]); O111 in tableau (80) is the sole remaining candidate (at least among those that have any chance of becoming optimal). The resulting order of operations is the one given by the numbered operations in (77).

(80) Possessor raising, step 3: Move
     Input: [D1′ D[GENDr:masc],[GENDi:fem],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]]

                                                                              | MLC | AC  | MC
     ☞ O111: [DP1 DP2 D[GENDr:masc],[GENDi:fem] … [NP N[GEND:fem] t2]]               |     |     |
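The whole serial derivation in tableaux (78)−(80) can likewise be sketched as a loop in which the winner of each step-level competition feeds the next step. In the following Python sketch (an illustration, not the original formalism), the candidate operations and their violation profiles at each step are transcribed from the tableaux, and the ranking MLC ≫ AC ≫ MC is again applied lexicographically:

```python
# Sketch of serial (step-level) optimization: at each step the applicable
# operations compete, the ranking picks one, and its output defines the
# candidate set of the next step.

RANKING = ["MLC", "AC", "MC"]

def prof(violations):
    # Violation profile ordered by the ranking, for lexicographic comparison.
    return tuple(violations.get(c, 0) for c in RANKING)

# Candidate operations and violation profiles per step, transcribed from
# tableaux (78), (79), and (80):
steps = [
    {"Agree(infl,N)":   {"AC": 1, "MC": 1},
     "Agree(root,N)":   {"MLC": 1, "AC": 1, "MC": 1},
     "Move(DP2)":       {"AC": 2}},
    {"Agree(root,DP2)": {"MC": 1},
     "Move(DP2)":       {"AC": 1}},
    {"Move(DP2)":       {}},
]

derivation = [min(cands, key=lambda op: prof(cands[op])) for cands in steps]
print(derivation)  # ['Agree(infl,N)', 'Agree(root,DP2)', 'Move(DP2)']
```

The derived order, with both Agree operations preceding movement of DP2, matches the numbering (1)−(3) in (77).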
Suppose now that optimization applied to phrases, to phases, to clauses, or to full sentences − i.e., to any domain that is larger than the derivational step. An optimal DP will always involve raising of DPdat. But with DPdat raised, both DPdat and NP are equally close to the pronoun; the input for optimization will then involve the full structure in (77), after movement to SpecD. Now, ceteris paribus, the unvalued gender feature on the inflectional part of the pronoun can receive the value [masc], which derives (76b). Thus, the approach overgenerates; see tableau (81), where O2, in addition to O1, is wrongly classified as optimal.

(81) DP (vP, CP, …) optimization: wrong result
     Input: D[*GENDr:□*],[*GENDi:□*],[•EPP•] [NP N[GEND:fem] DP2[GEND:masc]]

                                                                              | MLC | AC  | MC
     ☞ O1: [DP1 DP2[GEND:masc] D[GENDr:masc],[GENDi:fem] … [NP N[GEND:fem] t2]]      |     |     |
     ☞ O2: [DP1 DP2[GEND:masc] D[GENDr:fem],[GENDi:masc] … [NP N[GEND:fem] t2]]      |     |     |
       O3: [DP1 D[GENDr:masc],[GENDi:fem],[•EPP•] … [NP N[GEND:fem] DP2[GEND:masc]]] |     |     | *!
       O4: [DP1 DP2[GEND:masc] D[*GENDr:□*],[GENDi:fem] … [NP N[GEND:fem] t2]]       |     | *!  |
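The overgeneration problem can be seen by running the same kind of evaluation over complete DP outputs instead of derivational steps. In this sketch (again illustrative; candidate names and violation counts are transcribed from tableau [81]), the two gender-agreement patterns tie once movement has applied, because neither direction of agreement violates the MLC in the global input:

```python
# Sketch of global-domain optimization over complete DP outputs,
# showing the overgeneration problem of tableau (81).

RANKING = ["MLC", "AC", "MC"]

def optimal(candidates):
    # Lexicographic evaluation along the ranking (strict domination).
    prof = lambda v: tuple(v.get(c, 0) for c in RANKING)
    best = min(prof(v) for v in candidates.values())
    return {name for name, v in candidates.items() if prof(v) == best}

# After DP2 has moved to SpecD, NP and DP2 are equally close to the pronoun,
# so neither direction of gender agreement incurs an MLC violation:
dp_outputs = {
    "O1": {},          # root=masc, infl=fem: the grammatical (76a)
    "O2": {},          # root=fem, infl=masc: the ungrammatical (76b)
    "O3": {"MC": 1},   # no movement: [•EPP•] left unchecked
    "O4": {"AC": 1},   # movement, but root gender left unvalued
}

print(optimal(dp_outputs))  # both O1 and O2 come out optimal
```

O2 corresponds to the ungrammatical (76b); under a global optimization domain it is incorrectly on a par with O1, whereas step-level optimization never generates it.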
From a more general perspective, the argument presented here is a standard counter-feeding argument against strictly representational analyses (see Chomsky 1975; Kiparsky 1973): Movement of DP2 to SpecD1 could in principle feed agreement of the inflectional part of D1 with the possessor DP2, but such movement comes too late in the derivation and therefore doesn't. Many arguments for serial local optimization are of this general type, involving either counter-feeding (where properties of the ultimate output representation suggest that an operation should have been able to apply even though it could not, the reason being that the context for application was not yet there at the crucial stage of the derivation) or counter-bleeding (where properties of the ultimate output representation suggest that an operation should not have been able to legitimately apply even though evidently it could, the reason being that the properties that would block it were not there at an earlier stage in the derivation).
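The counter-feeding configuration itself can be miniaturized. In the following Python fragment (a deliberately simplified illustration; the dictionary representation of the pronoun is hypothetical), valued features are frozen, so the late movement of DP2 cannot feed a different valuation of the inflection:

```python
# Minimal sketch of the counter-feeding configuration: Agree only targets
# so-far unvalued feature slots, so an early valuation cannot be undone
# or re-fed by a later operation.

def agree(pronoun, slot, gender):
    # Valuation applies only to an unvalued slot; valued features are frozen.
    if pronoun[slot] is None:
        pronoun[slot] = gender
    return pronoun

pronoun = {"root": None, "infl": None}
agree(pronoun, "infl", "fem")   # step 1: Agree with the possessum N
agree(pronoun, "root", "masc")  # step 2: Agree with the possessor DP2
# step 3: Move(DP2). DP2 is now close enough to the inflection, but the
# attempted Agree comes too late in the derivation and is vacuous:
agree(pronoun, "infl", "masc")
print(pronoun)  # {'root': 'masc', 'infl': 'fem'}
```

The reversed pattern (76b), with a masculine inflection, is thus underivable in the serial account even once DP2 has moved into the relevant position.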
4.4. Problems for local domains for competition resolution

Serial local optimization makes a number of interesting predictions, opens up new areas for research, and chimes in well with recent developments in the minimalist program. Nevertheless, it faces challenges in domains where it looks as though more information must be available to optimization procedures than would be permitted under local optimization. Repair or last resort phenomena that seem to involve long-distance dependencies, like resumptive pronouns in island contexts and instances of long-distance binding, are a case in point. Thus, recall from subsection 2.2 that resumptive pronouns that show up with movement dependencies across islands (and only there) can straightforwardly be analyzed in terms of OT, as involving a violation of a DEP constraint (like INCLUSIVENESS) that can be tolerated in an optimal output only if all competing outputs violate higher-ranked constraints (i.e., those that trigger movement, and those that block movement across an island). However, there is a problem with a naive transfer of the standard (parallel, global) OT approach to resumptive pronouns to a serial local OT approach: On the one hand, the crucial decision (trace or resumptive pronoun) must be made very early in the derivation, when only little structure has yet been built. On the other hand, an island that serves as the trigger for last-resort resumption may come quite late in the derivation; the island may be separated from the extraction site by a huge amount of intervening structure. Consequently, the island will typically be too far away from the extraction site to successfully trigger resumption. The problem is illustrated in (82).
(82) ?(the man) who(m) I don't believe [DP₁ the claim that anyone thinks that Mary believes that Bill [CP₂ said that John [vP₃ t [VP₄ saw him ]]]]

Here, VP₄ is the domain where the decision must be made under extremely local optimization of derivational steps (or of phrases); vP₃ is the domain if phases are optimization domains; and the decision must be made in CP₂ if clauses are optimization domains. However, the domain in which the relevant information (viz., the presence of an island) becomes available is DP₁, which is far beyond any of the local domains for optimization that have been proposed. One can either consider problems of this type to be fatal, or one can take them to pose an interesting challenge to the local optimization enterprise. Assuming the latter,
there are various ways to look for solutions. For instance, it has been proposed that morphological realization in (extended) chains permits exceptions to the Strict Cycle Condition (see Chomsky 1973), so that the derivation may in fact backtrack, and selectively change earlier material (see particularly Fischer's 2006 local OT approach to binding in terms of a wormhole theory; also compare multidominance approaches to movement, which require a similar assumption; see Gärtner 2002; Frampton 2004; among others). Alternatively, one might argue that the relevant information (concerning islands) is in fact already present presyntactically (in the numeration); the decision can be made before the syntactic derivation starts. Irrespective of these considerations, though, data like (82) suggest that potential problems with local optimization arise independently of the exact size of the optimization domain (given that it is not the entire sentence). Arguably, for conceptual reasons, this might then favour the choice of the smallest possible optimization domain, at least as a plausible research strategy.
5. Conclusion

What is the current status of OT syntax in the field of linguistics? One cannot help but notice that as a common research program, OT syntax is not well. Various considerations support this conclusion: First, at the time of writing, there do not seem to be regular workshops expressly devoted to OT syntax anymore. (There were such workshops on OT syntax between 1997 and 2002, originally initiated by Sten Vikner at Stuttgart University, and there were several such meetings in the US in the second half of the 1990s.) Second, very few OT syntax papers have appeared in leading journals over the last few years. Third, the few papers that have appeared in leading journals in recent years do not seem to share common research goals, do not tackle similar questions, and regularly do not cite other recent work in OT syntax. Fourth, whereas new edited volumes with a focus on OT syntax came out on a regular basis for some time (see, e.g., Archangeli and Langendoen 1997; Dekkers et al. 2001; Fanselow and Féry 2002b; Legendre et al. 2001; Müller and Sternefeld 2001a; Sells 2001a), this seems to have all but stopped. Note also that the working paper volumes Vogel and Broekhuis (2006) and Broekhuis and Vogel (2008) on Optimality Theory and Minimalism both have only a few contributions that might rightfully be subsumed under the label of OT syntax. Also, the book series Advances in Optimality Theory, edited by Vieri Samek-Lodovici and Armin Mester, has so far produced virtually nothing that could be classified as being (mainly) on syntax, the single exception being Broekhuis and Vogel (2013). However, this volume is to some extent based on the earlier two working paper volumes; and note that there is not a single syntax monograph in this series so far. (Another possible exception is the collection of papers in Samek-Lodovici 2006b that goes back to an LSA workshop in 2005.)
Fifth, few influential dissertations on OT syntax have appeared in recent years (since, say, Zepter 2004; Engels 2004; Fischer 2004), and virtually none (as far as I can tell) in the US. All this is very different from the situation in morphology, semantics, pragmatics (here see particularly the work on bidirectional OT going back to Blutner 2000 and Jäger and Blutner 2000) and, especially, phonology, where OT thrives to this day.
Thus, the immediate prospects for OT syntax as a self-sufficient, viable research programme can be viewed as bleak. However, there is a legacy of OT syntax: In a sense, it lives on in other theories. In particular, its key mechanisms are implicit in much recent (and not so recent) work in the Principles and Parameters tradition, and optimization procedures arguably form an important part of the minimalist program, even though this is typically not acknowledged. For reasons of space, I cannot possibly go through a substantial number of the cases of hidden optimization here, or provide detailed argumentation to support the claim that hidden optimization is often involved in work that purports to do without optimization; but it is clear that many of the relevant analyses are concerned with phenomena that suggest constraint conflict, repair (last resort), or defaults. Let me confine myself to listing a few examples where implicit optimization procedures (which must be construed with violable and ranked constraints if made explicit) show up in work in the Principles and Parameters tradition. In Müller (2000b), I argue for hidden optimization in Chomsky's (1981) analysis of pronoun vs. PRO in English gerunds based on the transderivational constraint Avoid Pronoun (reconstructed ranking: OBLCONTROL ≫ *PRON); in Haegeman's (1995) analysis of pro vs. overt pronoun in pro-drop languages based on Avoid Pronoun (reconstructed ranking: TOP/PRO ≫ *PRON); in Stechow and Sternefeld's (1988) analysis of lexical vs. structural control in German (reconstructed ranking: FAITH(LEX) ≫ OBLCONTROL); in Kayne's (1994) analysis of complementizer-finality and the absence of overt wh-movement in Japanese (reconstructed ranking: IP-CRIT ≫ WH-CRIT); in Grewendorf's (2001) analysis of multiple wh-questions in German (reconstructed ranking: *COMPLEX-WH ≫ WH-REAL); and in Roberts's (1997) approach to phonological realization in head chains (reconstructed ranking: *COMPLEX-HEAD ≫ HEAD-REAL). In Heck, Müller, and Trommer (2008), we show that the analysis of definiteness marking in Swedish DPs in Embick and Noyer (2001) relies on an implicit ranking of various constraints: N-DEF, D-DEF, HMC ≫ N-TO-D ≫ *DISSOCIATION, FULL-INT. Lahne (2009) observes that the analysis of Agree relations in Haegeman and Lohndal (2008) depends on a ranking MINIMALITY, FEATURE MATCHING ≫ AGREE. Samek-Lodovici (2006a) points out that the analysis of strong and weak pronouns in Cardinaletti and Starke (1999) is ultimately based on a ranking CHECK-F, PARSE ≫ *STRUC ≫ STAY. And so on. As remarked above, optimization procedures play an important role in the minimalist program, independently of particular analyses of linguistic phenomena of the type just mentioned. First, earlier versions of the minimalist program regularly employed transderivational constraints like Fewest Steps and Shortest Paths, which involve optimization of a type that is very similar to that adopted in standard OT (see Müller and Sternefeld 2001b for an overview, and Chomsky 1993, 1995; Collins 1994; and Bošković 1997 for some relevant cases). Second, at the heart of the minimalist program are elementary operations like Agree, Merge, Move, Delete, Transfer, and Select. Given that each operation is supposed to apply as soon as its context for application is present (a general Earliness requirement on derivations), it is clear that there will be conflicts. These conflicts have to be resolved by postulating ranking and minimal violability of constraints.
This is what Heck and Müller (2007, 2013) argue for in the case of Agree vs. Move (or, more generally, Agree vs. Merge; see above). A far more widespread interaction of requirements for elementary operations concerns Merge vs. Move operations, as they have been discussed under the label of Merge before Move in Chomsky (1995, 2000, 2001, 2005), Frampton and Gutmann (1999), Hornstein (2001, 2009) and many other
minimalist analyses, for a variety of phenomena including expletive sentences and adjunct control. In its original conception, Merge before Move is a transderivational constraint. Frampton and Gutmann (1999) suggest the formulation in (83), which brings the constraint closer to the perspective adopted in the previous section.

(83) Merge before Move:
     Suppose that the derivation has reached stage Σn, and Σn+1 is a legitimate instance of Merge, and Σ′n+1 is a legitimate instance of Move. Then, Σn+1 is to be preferred over Σ′n+1.

The optimality-theoretic reconstruction is straightforward: A MERGE CONDITION outranks a more specific MOVE CONDITION, as in (84), with the derivational step as the (extremely local) optimization domain (the MC put to use in subsection 4.3 would have to be adapted accordingly).

(84) a. MERGE CONDITION (MC):
        Merge (external Merge) applies if its context for application is met.
     b. MOVE CONDITION (MoveC):
        Move (internal Merge) applies if its context for application is met.

Does the ranking that is required to derive the effects of (83) have to be universal, or can it be reversed in principle (as suggested by Broekhuis and Klooster 2001)? If the latter is the case, can the ranking vary from one syntactic domain (or category) to another? At present, these are open questions which, however, strike me as quite important, and which should invite further interesting research. Another example illustrating hidden optimization in core parts of minimalist syntax concerns the Inclusiveness condition adopted in Chomsky (2001) and much subsequent related work (see above).
An INCLUSIVENESS constraint demands that nothing may enter the syntactic derivation which is not part of the original numeration; however, this DEP-type constraint must be minimally violable in favour of the requirement that intermediate steps of successive-cyclic movement proceed via edge feature insertion: Edge features on phase heads are not part of the numeration. Arguably, the same conclusion can be drawn for the mechanism of feature valuation as part of Agree; the copy mechanism required here gives rise to a straightforward DEP violation. Similarly, the copy theory of movement (Chomsky 1993) would seem to systematically require violability of INCLUSIVENESS. Finally, it is worth pointing out that implicit optimization in the minimalist program is not confined to conflicting demands imposed by basic operations. For instance, an idea that has been widely pursued in recent years is that attempts at carrying out an Agree operation may in principle fail without necessarily giving rise to ungrammaticality. Rather, a second, different attempt can be made to establish an Agree operation; see Béjar and Řezáč (2009), Bošković (2009), Patel (2010), and d'Alessandro (2012), among others, on such second-cycle Agree (Georgi 2010 even argues for third-cycle Agree effects). On the simplest interpretation, this clearly presupposes violability of the constraint that triggers Agree in a well-formed output. A version of this is proposed by Preminger (2011): If Agree fails, it does not follow that ungrammaticality arises; rather, the Agree-triggering probe feature can be deleted. Again, this presupposes constraint violability. In
the same way, Chomsky's idea (presented at various talks in recent years) that movement is triggered by the necessity to break an otherwise existing symmetry in syntactic structure (which then may preclude a labelling of constituents; see Moro 2007; Boeckx 2008; Ott 2012 for some applications of this or a similar idea) would seem to strongly suggest an optimality-theoretic mechanism at its very core. Thus, OT syntax may be endangered as a research programme sui generis, but based on the preceding remarks, I would like to contend that minimalist syntax is inherently optimality-theoretic at its very core. Independently of this, OT syntax is, in my view, well worth pursuing, and not just for the more obvious reasons having to do with the existence of repair phenomena, constraint conflict, and default forms in natural languages: OT syntax permits a radically new perspective on various kinds of phenomena, one that would not be available in approaches that do not envisage constraint violability and constraint ranking. To see this, consider, finally, the gist of the account of wh-island effects developed in Legendre, Smolensky, and Wilson (1998); unlike most other accounts, this analysis does not rely on a concept of intervention (as in Rizzi 1990, 2004). In this alternative account, all movement from an embedded clause significantly violates locality constraints. Such a violation is fatal if the ultimate target position of the wh-phrase that is supposed to undergo long-distance movement can be changed from the matrix clause to the embedded clause without triggering a violation of selection requirements. This is possible, hence obligatory, with embedded wh-clauses, which are objects of [+wh]-selecting verbs.
However, such a locality violation with movement from a clause is permissible as a last resort if the ultimate target position of the wh-phrase that is supposed to undergo long-distance movement cannot be relocated to the embedded clause without violating selection requirements. This is the case with embedded declarative clauses, which are objects of [−wh]-selecting verbs. So, surprisingly, what rules out wh-island constructions is the fact that a violation of locality can be avoided by relocating the wh-scope to the embedded clause; and what permits extraction from declarative complements is the fact that a violation of locality cannot be avoided here. Evidently, there is no room for elegant and highly innovative reasoning of this type in non-optimality-theoretic approaches. For reasons like this, a renaissance of OT syntax, however unlikely, might do the field good.
6. References (selected) Ackema, Peter, and Ad Neeleman 1998 Optimal questions. Natural Language and Linguistic Theory 16: 443−490. Aissen, Judith 1999 Markedness and subject choice in Optimality Theory. Natural Language and Linguistic Theory 17: 673−711. Aissen, Judith 2003a Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory 21: 435−483. Aissen, Judith 2003b Harmonic alignment in morphosyntax. Ms., University of California, Santa Cruz. Anttila, Arto 1997 Variation in Finnish phonology and morphology. Ph.D. thesis, Stanford University.
Aoun, Joseph, Norbert Hornstein, and Dominique Sportiche 1981 Aspects of wide scope interpretation. Journal of Linguistic Research 1: 69−95. Archangeli, Diana, and Terence Langendoen (eds.) 1997 Optimality Theory. An Overview. Oxford: Blackwell. Baković, Eric 1995 A markedness subhierarchy in syntax. Ms., Rutgers University. Baković, Eric 1998 Optimality and inversion in Spanish. In: Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis and David Pesetsky (eds.), Is the Best Good Enough?, 35−58. Cambridge, Mass.: MIT Press and MITWPL. Barbiers, Sjef 2002 Remnant stranding and the theory of movement. In: Artemis Alexiadou, Elena Anagnostopoulou, Sjef Barbiers and Hans-Martin Gärtner (eds.), Dimensions of Movement, 47−67. Amsterdam: Benjamins. Béjar, Susana, and Milan Řezáč 2009 Cyclic Agree. Linguistic Inquiry 40: 35−73. Bermúdez-Otero, Ricardo 2008 Stratal Optimality Theory. Book Ms., University of Manchester. To appear: Oxford University Press. Blutner, Reinhard 2000 Some aspects of optimality in natural language interpretation. Journal of Semantics 17: 189−216. Boeckx, Cedric 2008 Bare Syntax. Oxford: Oxford University Press. Boeckx, Cedric 2009 Merge Alpha: Movement and Filtering. Ms., ICREA/UAB, Barcelona. Boersma, Paul, and Bruce Hayes 2001 Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45−86. Bošković, Željko 1997 The Syntax of Nonfinite Complementation. An Economy Approach. Cambridge, Mass.: MIT Press. Bošković, Željko 2009 Unifying first and last conjunct agreement. Natural Language and Linguistic Theory 27: 455−496. Bresnan, Joan 2001 The emergence of the unmarked pronoun. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 113−142. Cambridge, Mass.: MIT Press. Bresnan, Joan, Ashwini Deo, and Devyani Sharma 2007 Typology in variation: A probabilistic approach to 'be' and 'n't' in the 'survey of English dialects'. English Language and Linguistics 11: 301−346.
Bresnan, Joan, Shipra Dingare, and Christopher Manning 2001 Soft constraints mirror hard constraints: Voice and person in English and Lummi. In: Proceedings of the LFG 2001 Conference. CSLI Publications. Broekhuis, Hans 2000 Against feature strength: The case of Scandinavian object shift. Natural Language and Linguistic Theory 18: 673−721. Broekhuis, Hans 2006 Derivations (MP) and evaluations (OT). In: Ralf Vogel and Hans Broekhuis (eds.), Optimality Theory and Minimalism: A Possible Convergence?, Vol. 25, 137−193. Potsdam: Linguistics in Potsdam.
Broekhuis, Hans 2008 Derivations and Evaluations. Object Shift in the Germanic Languages. Berlin: Mouton de Gruyter. Broekhuis, Hans, and Joost Dekkers 2000 The Minimalist Program and Optimality Theory: Derivations and evaluation. In: Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer (eds.), Optimality Theory: Phonology, Syntax, and Acquisition, 386−422. Oxford: Oxford University Press. Broekhuis, Hans, and Wim Klooster 2001 On Merge and Move/Attract. In: Progress in Grammar: Articles at the 20th Anniversary of the Comparison of Grammatical Models Group in Tilburg, 1−20. Tilburg University. Broekhuis, Hans, and Ralf Vogel (eds.) 2008 Optimality Theory and Minimalism: Interface Theories, Vol. 28. Universität Potsdam: Linguistics in Potsdam. Broekhuis, Hans, and Ralf Vogel (eds.) 2013 Linguistic Derivations and Filtering. (Advances in Optimality Theory.) London: Equinox. Broihier, Kevin 1995 Optimality theoretic rankings with tied constraints: Slavic relatives, resumptive pronouns and learnability. Ms., MIT, Cambridge, Mass. Büring, Daniel 2001 Let's phrase it. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 69−105. Berlin: Mouton de Gruyter. Cardinaletti, Anna, and Michal Starke 1999 The typology of structural deficiency: A case study of the three classes of pronouns. In: Henk van Riemsdijk (ed.), Clitics in the Languages of Europe, 145−235. Berlin: Mouton de Gruyter. Choi, Hye-Won 1999 Optimizing Structure in Context. Scrambling and Information Structure. Stanford: CSLI Publications. Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press. Chomsky, Noam 1970 Remarks on nominalization. In: R. Jacobs and P. Rosenbaum (eds.), Readings in English Transformational Grammar, 184−221. Waltham, Mass.: Ginn and Company. Chomsky, Noam 1973 Conditions on transformations. In: Stephen Anderson and Paul Kiparsky (eds.), A Festschrift for Morris Halle, 232−286. New York: Academic Press.
Chomsky, Noam 1975 The Logical Structure of Linguistic Theory. New York: Plenum Press. Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris. Chomsky, Noam 1986 Barriers. Cambridge, Mass.: MIT Press. Chomsky, Noam 1993 A minimalist program for syntactic theory. In: Ken Hale and Samuel Jay Keyser (eds.), The View from Building 20, 1−52. Cambridge, Mass.: MIT Press. Chomsky, Noam 1995 The Minimalist Program. Cambridge, Mass.: MIT Press. Chomsky, Noam 2000 Minimalist inquiries: The framework. In: Roger Martin, David Michaels and Juan Uriagereka (eds.), Step by Step, 89−155. Cambridge, Mass.: MIT Press.
Chomsky, Noam 2001 Derivation by phase. In: Michael Kenstowicz (ed.), Ken Hale. A Life in Language, 1−52. Cambridge, Mass.: MIT Press. Chomsky, Noam 2005 Three factors in language design. Linguistic Inquiry 36: 1−22. Chomsky, Noam 2007 Approaching UG from below. In: Uli Sauerland and Hans-Martin Gärtner (eds.), Interfaces + Recursion = Language?, 1−31. Berlin: Mouton de Gruyter. Chomsky, Noam 2008 On phases. In: Robert Freidin, Carlos Otero and Maria Luisa Zubizarreta (eds.), Foundational Issues in Linguistic Theory, 133−166. Cambridge, Mass.: MIT Press. Chomsky, Noam, and Howard Lasnik 1993 Principles and parameters theory. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld and Theo Vennemann (eds.), Syntax. Ein internationales Handbuch zeitgenössischer Forschung, Vol. 1, 506−569. Berlin: de Gruyter. Cinque, Guglielmo 1990 Types of A-bar Dependencies. Cambridge, Mass.: MIT Press. Collins, Chris 1994 Economy of derivation and the Generalized Proper Binding Condition. Linguistic Inquiry 25: 45−61. Costa, João 1998 Word order variation. Ph.D. thesis, Universiteit Leiden. d'Alessandro, Roberta 2012 Agreement, Ergativity, and the Parametrization of Probes. Ms., Leiden University. de Hoop, Helen, and Andrej Malchukov 2008 Case-marking strategies. Linguistic Inquiry 39: 565−587. de Vries, Mark 2005 Possessive relatives and (heavy) pied-piping. Journal of Comparative Germanic Linguistics 9: 1−52. Dekkers, Joost, Frank van der Leeuw, and Jeroen van de Weijer (eds.) 2001 Optimality Theory. Phonology, Syntax, and Acquisition. Oxford: Oxford University Press. Embick, David, and Rolf Noyer 2001 Movement operations after syntax. Linguistic Inquiry 32: 555−595. Engdahl, Elisabet, Maia Andréasson, and Kersti Börjars 2004 Word order in the Swedish midfield − an OT approach. In: Fred Karlsson (ed.), Proceedings of the 20th Scandinavian Conference of Linguistics, Vol. 36, 1−12. University of Helsinki: Department of General Linguistics.
Engels, Eva 2004 Adverb placement. An Optimality-Theoretic approach. Ph.D. thesis, Universität Potsdam. Engels, Eva, and Sten Vikner 2013 Derivation of Scandinavian object shift and remnant VP-topicalization. In: Hans Broekhuis and Ralf Vogel (eds.), Linguistic Derivations and Filtering, 193−220. Sheffield: Equinox. Epstein, Samuel David, and T. Daniel Seely 2002 Rule applications as cycles in a level-free syntax. In: Samuel David Epstein and T. Daniel Seely (eds.), Derivation and Explanation in the Minimalist Program, 65−89. Oxford: Blackwell. Fanselow, Gisbert 1991 Minimale Syntax. Habilitation thesis, Universität Passau.
Fanselow, Gisbert, and Damir Ćavar 2001 Remarks on the economy of pronunciation. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 107−150. Berlin: Mouton de Gruyter. Fanselow, Gisbert, and Caroline Féry 2002a Ineffability in grammar. In: Gisbert Fanselow and Caroline Féry (eds.), Resolving Conflicts in Grammars, 265−307. Hamburg: Buske. Fanselow, Gisbert, and Caroline Féry (eds.) 2002b Resolving Conflicts in Grammar. (Special volume of Linguistische Berichte.) Hamburg: Buske. Fischer, Silke 2001 On the integration of cumulative effects into Optimality Theory. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 151−173. Berlin: Mouton de Gruyter. Fischer, Silke 2004 Towards an optimal theory of reflexivization. Ph.D. thesis, Universität Tübingen. Fischer, Silke 2006 Matrix unloaded: Binding in a local derivational approach. Linguistics 44: 913−935. Fox, Danny 2000 Economy and Semantic Interpretation. Cambridge, Mass.: MIT Press. Frampton, John 2004 Copies, traces, occurrences, and all that. Ms., Northeastern University. Frampton, John, and Sam Gutmann 1999 Cyclic computation. Syntax 2: 1−27. Gabriel, Christoph 2010 On focus, prosody, and word order in Argentinean Spanish: A minimalist OT account. Revista Virtual de Estudos da Linguagem 4: 183−222. Gallego, Ángel 2007 Phase theory and parametric variation. Ph.D. thesis, Universitat Autònoma de Barcelona, Barcelona. Gärtner, Hans-Martin 2002 Generalized Transformations and Beyond. Berlin: Akademie Verlag. Georgi, Doreen 2009 Local modelling of global case splits. Master's thesis, Universität Leipzig. Georgi, Doreen 2010 Third cycle agree effects in Mordvin. In: Sebastian Bank, Doreen Georgi and Jochen Trommer (eds.), 2 in Agreement, Vol. 88 of Linguistische Arbeitsberichte, 125−161. Universität Leipzig. Gigerenzer, Gerd, P. M. Todd, and the ABC Group 1999 Simple Heuristics that Make us Smart. New York: Oxford University Press.
Grewendorf, Günther 2001 Multiple wh-fronting. Linguistic Inquiry 32: 87−122. Grimshaw, Jane 1994 Heads and optimality. Ms., Rutgers University. (Handout of talk, Universität Stuttgart). Grimshaw, Jane 1997 Projection, heads, and optimality. Linguistic Inquiry 28: 373−422. Grimshaw, Jane 1998 Constraints on constraints in Optimality Theoretic Syntax. Ms., Rutgers University, New Brunswick, New Jersey. Grimshaw, Jane 1999 Heads and clauses. Ms., Rutgers University. Grimshaw, Jane 2001 Economy of structure in OT. Ms., Rutgers University.
IV. Syntactic Models

Grimshaw, Jane 2006 Chains as unfaithful optima. Ms., Rutgers University. Also in Wondering at the Natural Fecundity of Things: Essays in Honor of Alan Prince; ROA 844, 97−110. Grimshaw, Jane 2010 Do-support and last resorts. Ms., Rutgers University. Gutiérrez-Bravo, Rodrigo 2007 Prominence scales and unmarked word order in Spanish. Natural Language and Linguistic Theory 25: 235−271. Haegeman, Liliane 1995 An Introduction to Government and Binding Theory. Oxford: Blackwell, 2nd edition. Haegeman, Liliane, and Terje Lohndal 2008 Negative Concord is Not Multiple Agree. Poster presentation, NELS 39, Cornell University. Haider, Hubert 1988 Zur Struktur der deutschen Nominalphrase. Zeitschrift für Sprachwissenschaft 7: 32−59. Hale, Ken 1972 A new perspective on American Indian linguistics. In: Alfonso Ortiz (ed.), New Perspectives on the Pueblos, 87−103. Albuquerque: University of New Mexico Press. Halle, Morris, and Alec Marantz 1993 Distributed Morphology and the pieces of inflection. In: Ken Hale and Samuel Jay Keyser (eds.), The View from Building 20, 111−176. Cambridge, Mass.: MIT Press. Hayes, Bruce 2001 Gradient well-formedness in Optimality Theory. In: Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer (eds.), Optimality Theory. Phonology, Syntax, and Acquisition, 88−120. Oxford: Oxford University Press. Heck, Fabian 1998 Relativer Quantorenskopus im Deutschen − Optimalitätstheorie und die Syntax der Logischen Form. Master's thesis, Universität Tübingen. Heck, Fabian 2001 Quantifier scope in German and cyclic optimization. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 175−209. Berlin: Mouton de Gruyter. Heck, Fabian, and Gereon Müller 2000 Successive cyclicity, long-distance superiority, and local optimization. In: Roger Billerey and Brook D. Lillehaugen (eds.), Proceedings of WCCFL, Vol. 19, 218−231. Somerville, MA, Cascadilla Press.
Heck, Fabian, and Gereon Müller 2003 Derivational optimization of wh-movement. Linguistic Analysis 33: 97−148. (Volume appeared 2007). Heck, Fabian, and Gereon Müller 2007 Extremely local optimization. In: Erin Bainbridge and Brian Agbayani (eds.), Proceedings of the 26th WECOL, 170−183. Fresno, California State University. Heck, Fabian, and Gereon Müller 2013 Extremely local optimization. In: Hans Broekhuis and Ralf Vogel (eds.), Linguistic Derivations and Filtering, 136−165. Sheffield: Equinox. Heck, Fabian, Gereon Müller, and Jochen Trommer 2008 A phase-based approach to Scandinavian definiteness marking. In: Charles B. Chang and Hannah J. Haynie (eds.), Proceedings of the 26th West Coast Conference on Formal Linguistics, 226−233. Somerville, MA, Cascadilla Proceedings Project. Heck, Fabian, Gereon Müller, Ralf Vogel, Silke Fischer, Sten Vikner, and Tanja Schmid 2002 On the nature of the input in Optimality Theory. The Linguistic Review 19: 345−376. Hornstein, Norbert 2001 Move. A Minimalist Theory of Construal. Oxford: Blackwell.
Hornstein, Norbert 2009 A Theory of Syntax: Minimal Operations and Universal Grammar. Cambridge: Cambridge University Press. Itô, Junko, and Armin Mester 2002 Lexical and postlexical phonology in Optimality Theory. In: Gisbert Fanselow and Caroline Féry (eds.), Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology, 183−207. Hamburg: Buske. Special issue of Linguistische Berichte. Jäger, Gerhard, and Reinhard Blutner 2000 Against lexical decomposition in syntax. In: Adam Wyner (ed.), Proceedings of IATL, Vol. 15, 113−137. University of Haifa. Kager, René 1999 Optimality Theory. Cambridge: Cambridge University Press. Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press. Keer, Ed, and Eric Baković 2001 Optionality and ineffability. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 97−112. Cambridge, Mass.: MIT Press. Keer, Ed, and Eric Baković 2004 Have faith in syntax. In: E. Curtis, J. Lyle and G. Webster (eds.), Proceedings of the 16th WCCFL, 255−269. Stanford, California, CSLI Publications. Keine, Stefan 2010 Case and Agreement from Fringe to Core. Impoverishment Effects on Agree. (Linguistische Arbeiten.) Berlin: Mouton de Gruyter. Keine, Stefan, and Gereon Müller 2008 Differential argument encoding by impoverishment. In: Marc Richards and Andrej Malchukov (eds.), Scales, Vol. 86 of Linguistische Arbeitsberichte, 83−136. Universität Leipzig. Keine, Stefan, and Gereon Müller 2011 Non-zero/non-zero alternations in differential object marking. In: Suzi Lima, Kevin Mullin and Brian Smith (eds.), Proceedings of the 39th Meeting of the North East Linguistics Society, 441−454. Amherst, Mass., GLSA. Kiparsky, Paul 1973 'Elsewhere' in phonology. In: Steven Anderson and Paul Kiparsky (eds.), A Festschrift for Morris Halle, 93−106. New York: Academic Press. Kiparsky, Paul 1999 Analogy and OT: Morphological change as emergence of the unmarked.
Talk at the 21st Annual Meeting of the DGfS, Konstanz. Ms., Stanford University. Kiparsky, Paul 2000 Opacity and cyclicity. The Linguistic Review 17: 351−367. Kiparsky, Paul 2001 Structural case in Finnish. Lingua 111: 315−376. Koster, Jan 1987 Domains and Dynasties. Dordrecht: Foris. Kroch, Anthony 2001 Syntactic change. In: The Handbook of Contemporary Syntactic Theory, 699−729. Kuhn, Jonas 2001 Formal and computational aspects of optimality-theoretic syntax. Ph.D. thesis, Universität Stuttgart. Lahne, Antje 2008 Excluding SVO in ergative languages. In: Fabian Heck, Gereon Müller and Jochen Trommer (eds.), Varieties of Competition, Vol. 87 of Linguistische Arbeitsberichte, 65−80. Universität Leipzig.
Lahne, Antje 2009 Where there is fire there is smoke. Local modelling of successive-cyclic movement. Ph.D. thesis, Universität Leipzig. Lee, Hanjung 2003 Parallel optimization in case systems. Ms., University of Minnesota, Twin Cities. Legendre, Géraldine 2001 An introduction to Optimality Theory in syntax. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 1−27. Cambridge, Mass.: MIT Press. Legendre, Géraldine 2009 The neutralization approach to ineffability in syntax. In: Curt Rice and Sylvia Blaho (eds.), Modeling Ungrammaticality in Optimality Theory. (Advances in Optimality Theory.) London: Equinox Publishing. Legendre, Géraldine, Jane Grimshaw, and Sten Vikner (eds.) 2001 Optimality-Theoretic Syntax. Cambridge, Mass.: MIT Press. Legendre, Géraldine, Paul Smolensky, and Colin Wilson 1998 When is less more? Faithfulness and minimal links in wh-chains. In: Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis and David Pesetsky (eds.), Is the Best Good Enough?, 249−289. Cambridge, Mass.: MIT Press and MITWPL. Legendre, Géraldine, Colin Wilson, Paul Smolensky, Kristin Homer, and William Raymond 1995 Optimality and wh-extraction. In: Jill Beckman, Laura Walsh-Dickie and Suzanne Urbanczyk (eds.), Papers in Optimality Theory, 607−635. (Occasional Papers in Linguistics 18.) Amherst, Massachusetts: UMass. McCarthy, John 2000 Harmonic serialism and parallelism. In: M. Hirotani, A. Coetzee, N. Hall and J.-Y. Kim (eds.), Proceedings of NELS 30, 501−524. Amherst, Mass., GLSA. McCarthy, John 2002 A Thematic Guide to Optimality Theory. Cambridge: Cambridge University Press. McCarthy, John 2007 Hidden Generalizations. Phonological Opacity in Optimality Theory. London: Equinox. McCarthy, John 2008 The serial interaction of stress and syncope. Natural Language and Linguistic Theory 26: 499−546. McCarthy, John 2010 An introduction to harmonic serialism. Language and Linguistics Compass 4: 1001−1018.
Merchant, Jason 2001 The Syntax of Silence − Sluicing, Islands, and the Theory of Ellipsis. Oxford: Oxford University Press. Moro, Andrea 2007 Some notes on unstable structures. Ms., Università Vita-Salute San Raffaele. Müller, Gereon 1997 Partial wh-movement and Optimality Theory. The Linguistic Review 14: 249−306. Müller, Gereon 1999 Optimality, markedness, and word order in German. Linguistics 37: 777−818. Müller, Gereon 2000a Das Pronominaladverb als Reparaturphänomen. Linguistische Berichte 182: 139−178. Müller, Gereon 2000b Elemente der optimalitätstheoretischen Syntax. Tübingen: Stauffenburg. Müller, Gereon 2000c Shape conservation and remnant movement. In: M. Hirotani, A. Coetzee, N. Hall and J.-Y. Kim (eds.), Proceedings of NELS 30, 525−539. Amherst, Mass., GLSA. Revised
and extended version appeared 2002 in Artemis Alexiadou, Elena Anagnostopoulou, Sjef Barbiers, and Hans-Martin Gärtner (eds.), Dimensions of Movement, 209−241. Amsterdam: Benjamins. Müller, Gereon 2001 Order preservation, parallel movement, and the emergence of the unmarked. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality Theoretic Syntax, 279−313. Cambridge, Mass.: MIT Press. Müller, Gereon 2002 Harmonic alignment and the hierarchy of pronouns in German. In: Horst Simon and Heike Wiese (eds.), Pronouns: Grammar and Representation, 205−232. Amsterdam: Benjamins. Müller, Gereon 2003a Local vs. global optimization in syntax: A case study. In: Jennifer Spenader, Anders Eriksson and Östen Dahl (eds.), Variation within Optimality Theory. Proceedings of the Stockholm Workshop, 82−91. Stockholm University, Department of Linguistics. Müller, Gereon 2003b Optionality in optimality-theoretic syntax. In: Lisa Cheng and Rint Sybesma (eds.), The Second GLOT International State-of-the-Article Book. The Latest in Linguistics, 289−321. Berlin: Mouton de Gruyter. Müller, Gereon 2009 Ergativity, accusativity, and the order of Merge and Agree. In: Kleanthes K. Grohmann (ed.), Explorations of Phase Theory. Features and Arguments, 269−308. Berlin: Mouton de Gruyter. Müller, Gereon, and Wolfgang Sternefeld (eds.) 2001a Competition in Syntax. (Studies in Generative Grammar.) Berlin: Mouton de Gruyter. Müller, Gereon, and Wolfgang Sternefeld 2001b The rise of competition in syntax: A synopsis. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 1−68. Berlin: Mouton de Gruyter. Ott, Dennis 2012 Local Instability, Vol. 544 of Linguistische Arbeiten. Berlin: Mouton de Gruyter. Patel, Pritty 2010 Disagree to agree. Ms., MIT, Cambridge, Mass. Pesetsky, David 1997 Optimality Theory and syntax: Movement and pronunciation. In: Diana Archangeli and Terence Langendoen (eds.), Optimality Theory. An Overview, 134−170.
Oxford: Blackwell. Pesetsky, David 1998 Some optimality principles of sentence pronunciation. In: Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis and David Pesetsky (eds.), Is the Best Good Enough?, 337−383. Cambridge, Mass.: MIT Press and MITWPL. Pollard, Carl J., and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press. Preminger, Omer 2011 Agreement as a fallible operation. Ph.D. thesis, MIT, Cambridge, Mass. Prince, Alan 2006 Implication and Impossibility in Grammatical Systems. Ms., Rutgers University. ROA: roa.rutgers.edu/files/880-1006/880-PRINCE-0-0.PDF. Prince, Alan, and Vieri Samek-Lodovici 1999 Optima. Ms., ROA 363. Prince, Alan, and Paul Smolensky 1993 Optimality Theory. Constraint Interaction in Generative Grammar. Book ms., Rutgers University.
Prince, Alan, and Paul Smolensky 2004 Optimality Theory. Constraint Interaction in Generative Grammar. Oxford: Blackwell. Pullum, Geoffrey 1979 Rule Interaction and the Organization of a Grammar. New York: Garland. Riggle, Jason 2004 Generation, recognition, and learning in finite state Optimality Theory. Ph.D. thesis, University of California, Los Angeles. Rizzi, Luigi 1990 Relativized Minimality. Cambridge, Mass.: MIT Press. Rizzi, Luigi 2004 Locality and left periphery. In: Luigi Rizzi (ed.), The Structure of CP and IP. The Cartography of Syntactic Structures, vol. 2. Oxford: Oxford University Press. Roberts, Ian 1997 Restructuring, head movement, and locality. Linguistic Inquiry 28: 423−460. Ross, John 1967 Constraints on variables in syntax. Ph.D. thesis, MIT, Cambridge, Mass. Ruff, Claudia 2000 Besitz und Besitzer in der Sprachentwicklung von deutschen und italienischen Kindern. Ph.D. thesis, Universität Braunschweig. Sailer, Manfred 2002 The German Incredulity Response Construction and the Hierarchical Organization of Constructions. Ms., Universität Göttingen. Talk, 2nd International Conference on Construction Grammar. Salzmann, Martin 2006 Variation in resumption requires violable constraints. A case study in Alemannic relativization. In: Hans Broekhuis and Ralf Vogel (eds.), Optimality Theory and Minimalism: Interface Theories, Vol. 28, 99−132. Potsdam: Linguistics in Potsdam. Samek-Lodovici, Vieri 1996 Constraints on subjects. An optimality-theoretic analysis. Ph.D. thesis, Rutgers University, New Brunswick, New Jersey. Samek-Lodovici, Vieri 2005 Prosody-syntax interaction in the expression of focus. Natural Language and Linguistic Theory 23: 687−755. Samek-Lodovici, Vieri 2006a Optimality Theory and the Minimalist Program. In: Ralf Vogel and Hans Broekhuis (eds.), Optimality Theory and Minimalism: A Possible Convergence?, Vol. 25: 77−97. Potsdam: Linguistics in Potsdam. Samek-Lodovici, Vieri 2006b Studies in OT Syntax and Semantics.
Amsterdam: Elsevier. Lingua Special Issue 9, vol. 117. Schmid, Tanja 2001 OT accounts of optionality: A comparison of global ties and neutralization. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 283−319. Berlin: Mouton/de Gruyter. Schmid, Tanja 2005 Infinitival Syntax. Infinitivus Pro Participio as a Repair Strategy. Amsterdam: Benjamins. Sells, Peter (ed.) 2001a Formal and Empirical Issues in Optimality Theoretic Syntax. Stanford: CSLI. Sells, Peter 2001b Structure, Alignment and Optimality in Swedish. (Stanford Monographs in Linguistics.) Palo Alto: CSLI Publications.
Sells, Peter, John Rickford, and Thomas Wasow 1996 An optimality theoretic approach to variation in negative inversion in AAVE. Natural Language and Linguistic Theory 14: 591−627. Shlonsky, Ur 1992 Resumptive pronouns as a last resort. Linguistic Inquiry 23: 443−468. Sick, Bastian 2006 Der Dativ ist dem Genitiv sein Tod. Noch mehr Neues aus dem Irrgarten der deutschen Sprache. Folge 3. KiWi Paperback, 5th edition. (2007). Silverstein, Michael 1976 Hierarchy of features and ergativity. In: R. M. W. Dixon (ed.), Grammatical Categories in Australian Languages, 112−171. Canberra: Australian Institute of Aboriginal Studies. Smolensky, Paul 1996 On the comprehension/production dilemma in child language. Linguistic Inquiry 27: 720−731. Smolensky, Paul 2006 Harmonic completeness, local constraint conjunction, and feature domain markedness. In: Paul Smolensky and Géraldine Legendre (eds.), The Harmonic Mind, Vol. II, chap. 14, 27−160. Cambridge, Mass.: MIT Press. Smolensky, Paul, and Géraldine Legendre 2006 The Harmonic Mind. Cambridge, Mass.: MIT Press. Stechow, Arnim von, and Wolfgang Sternefeld 1988 Bausteine syntaktischen Wissens. Opladen: Westdeutscher Verlag. Steddy, Sam, and Vieri Samek-Lodovici 2011 On the ungrammaticality of remnant movement in the derivation of Greenberg's Universal 20. Linguistic Inquiry 42: 445−469. Sternefeld, Wolfgang 1991 Syntaktische Grenzen. Chomskys Barrierentheorie und ihre Weiterentwicklungen. Opladen: Westdeutscher Verlag. Sternefeld, Wolfgang 1996 Comparing reference sets. In: Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (eds.), The Role of Economy Principles in Linguistic Theory, 81−114. Berlin: Akademie Verlag. Stiebels, Barbara 2000 Linker inventories, linking splits and lexical economy. In: Barbara Stiebels and Dieter Wunderlich (eds.), Lexicon in Focus, 211−245. Berlin: Akademie Verlag. Stiebels, Barbara 2002 Typologie des Argumentlinkings: Ökonomie und Expressivität.
Berlin: Akademie Verlag. Swart, Peter de 2007 Cross-linguistic variation in object marking. Ph.D. thesis, Nijmegen University. LOT Dissertations 168. Tesar, Bruce 1995 Computational Optimality Theory. Ph.D. thesis, University of Colorado. Tesar, Bruce 1998 Error-driven learning in Optimality Theory via the efficient computation of optimal forms. In: Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis and David Pesetsky (eds.), Is the Best Good Enough?, 421−435. Cambridge, Mass.: MIT Press and MITWPL. Vikner, Sten 2001a V-to-I movement and do-insertion in Optimality Theory. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 424−464. Cambridge, Mass.: MIT Press.
Vikner, Sten 2001b Verb Movement Variation in Germanic and Optimality Theory. Habilitation thesis, Universität Tübingen. Vogel, Ralf 2001 Case conflict in German free relative constructions. In: Gereon Müller and Wolfgang Sternefeld (eds.), Competition in Syntax, 341−375. Berlin: Mouton de Gruyter. Vogel, Ralf 2009a Skandal im Verbkomplex. Betrachtungen zur scheinbar inkorrekten Morphologie in infiniten Verbkomplexen im Deutschen. Zeitschrift für Sprachwissenschaft 28: 307−346. Vogel, Ralf 2009b Wh-islands: A view from correspondence theory. In: Curt Rice and Sylvia Blaho (eds.), Modeling Ungrammaticality in Optimality Theory. (Advances in Optimality Theory.) London: Equinox Publishing. Vogel, Ralf, and Hans Broekhuis (eds.) 2006 Optimality Theory and the Minimalist Program, Vol. 25. Universität Potsdam: Linguistics in Potsdam. Wiklund, Anna-Lena 2001 Dressing up for vocabulary insertion: The parasitic supine. Natural Language and Linguistic Theory 19: 199−228. Williams, Edwin 1997 Blocking and anaphora. Linguistic Inquiry 28: 577−628. Williams, Edwin 2003 Representation Theory. Cambridge, Mass.: MIT Press. Wilson, Colin 2001 Bidirectional optimization and the theory of anaphora. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 465−507. Cambridge, Mass.: MIT Press. Woolford, Ellen 2001 Case patterns. In: Géraldine Legendre, Jane Grimshaw and Sten Vikner (eds.), Optimality-Theoretic Syntax, 509−543. Cambridge, Mass.: MIT Press. Wunderlich, Dieter 1997 Der unterspezifizierte Artikel. In: Christa Dürscheid, Karl Heinz Ramers and Monika Schwarz (eds.), Sprache im Fokus, 47−55. Tübingen: Niemeyer. Wunderlich, Dieter 2000 Optimal case in Hindi. Ms., Universität Düsseldorf. Wunderlich, Dieter 2003 The force of lexical case: German and Icelandic compared. Ms., Universität Düsseldorf. To appear in Kristin Hanson and Sharon Inkelas (eds.) The Nature of the Word: Essays in Honor of Paul Kiparsky.
Cambridge, Mass.: MIT Press 2003. Zepter, Alexandra 2004 Phrase structure directionality: Having a few choices. Ph.D. thesis, Rutgers University, New Brunswick, New Jersey. Zifonun, Gisela 2004 Dem Vater sein Hut − Der Charme des Substandards und wie wir ihm gerecht werden. Deutsche Sprache 03: 97−126.
Gereon Müller, Leipzig (Germany)
27. HPSG − A Synopsis
1. Formal foundations
2. Valence and constituent order
3. Semantics
4. Nonlocal dependencies
5. Lexical rules
6. Idioms and phrasal lexical items
7. Generalizations
8. Convergence of theories and differences
9. Conclusion
10. References (selected)
Abstract

Head-Driven Phrase Structure Grammar (HPSG) is a linguistic theory that aims to be psycholinguistically plausible and compatible with language acquisition facts. It is a model-theoretic approach and hence constraint-based. Constraints are formulated as feature value pairs, identity statements (structure sharing), and relational constraints that relate several values. HPSG employs types that are organized in inheritance hierarchies, which makes it possible to capture both lexical and syntactic generalizations. HPSG includes all linguistic levels of description, but this article focuses on syntax and semantics. It describes how valence information is represented, how it is linked to semantic information, how certain constituent structures are licensed, and how non-local dependencies can be analyzed. HPSG is a lexical theory, that is, the lexicon is a rich and structured object. Apart from the organization in inheritance hierarchies, lexical rules are used to capture productive and semi-productive processes in the lexicon.

Head-Driven Phrase Structure Grammar (HPSG) was originally developed by Ivan Sag and Carl Pollard in the mid-1980s. The main publications are Pollard and Sag (1987, 1994). International conferences have been held since 1994, and there is a rich collection of publications regarding analyses of linguistic phenomena (in the areas of phonology, morphology, syntax, semantics, and information structure), formal foundations of the framework, and computational issues like efficient parsing and generation. See http://hpsg.fu-berlin.de/HPSG-Bib/ for bibliographic data. Since HPSG analyses are usually sufficiently formalized, they can be and have been implemented as computer-processable grammars. This makes it possible to check the interactions of analyses with other phenomena and to use the linguistic knowledge in practical applications. See Bender et al., this volume, for further details.
An overview of implementations and pointers to demos and downloadable systems can be found in Müller (2013b: chapter 8).
1. Formal foundations

HPSG assumes feature structures as models of linguistic objects. These feature structures are described by feature descriptions, which are also called Attribute Value Matrix
(AVM). Such AVMs consist of feature value pairs. The values can be atomic or feature descriptions. Every feature structure is of a certain type. Types are ordered in hierarchies with the most general type at the top of the hierarchy and the most specific types at the bottom. Figure 27.1 shows an example hierarchy for the type case and its subtypes. (1)
Fig 27.1. Subtypes of case in a grammar of German
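The compatibility logic behind such a hierarchy can be sketched in a few lines of Python. This is purely an illustrative sketch, not part of the HPSG formalism or of any implementation mentioned in this article; the type names follow Fig. 27.1 and the prose, while the child-to-parent encoding is an assumption made for the example.

```python
# Hedged sketch: the 'case' hierarchy of Fig. 27.1 as a child -> parent map.
PARENT = {"nom": "case", "gen": "case", "dat": "case", "acc": "case"}

def ancestors(t):
    """Return the set containing t and all of its supertypes."""
    out = {t}
    while t in PARENT:
        t = PARENT[t]
        out.add(t)
    return out

def unify_types(t1, t2):
    """Two types are compatible iff one subsumes the other; the result is
    the more specific of the two (None signals a clash)."""
    if t1 in ancestors(t2):
        return t2   # t1 is a supertype of (or equal to) t2
    if t2 in ancestors(t1):
        return t1
    return None

print(unify_types("case", "dat"))  # dat: an underspecified case value is compatible with dat
print(unify_types("nom", "acc"))   # None: two distinct maximally specific types clash
```

The point mirrored here is that an underspecified description (type case) is compatible with any of its subtypes, while two distinct maximally specific types such as nom and acc cannot describe the same object.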
Types in a model of a linguistic object are maximally specific, that is, a noun or an attributive adjective in a model of an actual utterance has a case value that is nom, gen, dat, or acc. The linguist develops theories that describe possible feature structures. In contrast to feature structures, feature descriptions can be partial. For instance, it is not necessary to specify a case value for the German word Frau 'woman' since Frau 'woman' can be used in NPs of all four cases. (2) shows a simplified description of the nominal agreement information for the German noun Frau 'woman' (see Kathol 1999 for details and Wechsler and Zlatić 2003 for a comprehensive overview of agreement in HPSG). Frau 'woman' has feminine gender, is compatible with all four cases, and is singular. The AVM has the type nom-agr. Types are written in italics. nom-agr is a complex type which introduces the features GEN, CASE, and NUM. fem, case, sg are also types, but they are atomic. fem and sg are maximally specific, since they do not have subtypes, but case does have subtypes. (2)
The purpose of descriptions is to constrain possible models. Since the specification of the CASE feature in (2) does not add any information, it can be omitted entirely. The AVM without the CASE feature is shown in (3): (3)
(2) and (3) describe the same set of feature structures. One very important part of the formalism is structure sharing. It is used to express that information in feature structures is identical, that is, token identical rather than just type identical. Structure sharing is indicated by boxed numbers in feature descriptions. An identical number at several places in an AVM expresses the fact that the respective values are identical. To give an example of structure sharing, the agreement information of a noun in German has to be compatible with the agreement information of the adjective and the
determiner. This compatibility is established by identifying a part of the structure that represents a noun with parts of the structure for the adjective and the determiner in an NP. In an analysis of (4), the definite article has to be compatible with the description in (3). (4)
die Frau [German]
the woman
'the woman'
die ‘the’ is ambiguous between feminine singular nominative/accusative and plural nominative/accusative. (5)
Since Frau is singular, only feminine singular nominative/accusative is compatible with this noun. The result of identifying the feature bundles of die and Frau therefore is (6): (6)
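The effect of identifying the agreement bundles of die and Frau can be simulated with a deliberately simplified model in which each description maps features to the set of values it allows, and combination is feature-wise intersection. This is only a rough stand-in for HPSG unification: it flattens the disjunction in (5), ignoring the correlation between the feminine-singular and the plural readings of die, which happens not to matter here because Frau forces singular. The feature and value names follow examples (3) and (5).

```python
# Hedged sketch: descriptions as feature -> set-of-allowed-values maps.
ALL_CASES = {"nom", "gen", "dat", "acc"}

frau = {"GEN": {"fem"}, "CASE": ALL_CASES, "NUM": {"sg"}}             # cf. (3)
die  = {"GEN": {"fem"}, "CASE": {"nom", "acc"}, "NUM": {"sg", "pl"}}  # cf. (5), readings collapsed

def unify(d1, d2):
    """Combine two partial descriptions by intersecting value sets;
    an empty intersection for some feature means unification fails."""
    result = {}
    for feat in d1.keys() | d2.keys():
        vals = d1.get(feat, set()) & d2.get(feat, set())
        if not vals:
            return None
        result[feat] = vals
    return result

print(unify(die, frau))   # fem, sg, nom/acc -- matching the result in (6)
```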
While structure sharing is the most characteristic expressive means in HPSG, there is one extension of the basic formalism that plays a crucial role in most HPSG analyses: relational constraints. Relational constraints are used to relate several values in a feature structure to each other. The relational constraint that is used most often in HPSG is append (4). append is used to concatenate two lists. The schema in (14), which will be discussed in section 2.2, is an example of an application of such a constraint. This brief sketch mentioned many essential concepts that are used in HPSG. Of course a lot more could be and has been said about the properties of the formalisms, but this introductory article is not the place to discuss them in detail. However, it cannot be emphasized enough that it is important that the formal details are worked out, and the interested reader is referred to the work of Shieber (1986), Pollard and Sag (1987: chapter 2), Johnson (1988), Carpenter (1992), King (1994, 1999), Pollard (1999) and Richter (2004, 2007). The work of King, Pollard, and Richter reflects current assumptions, that is, the model-theoretic view on grammar that is assumed nowadays. Before I start to discuss several phenomena and their analyses in HPSG in the following sections, I want to give an overview of the general feature geometry as it was developed in Pollard and Sag (1994). (7) shows parts of the lexical item for Mannes 'man', the genitive form of Mann 'man'. The first feature value pair describes the phonological form of the word. The value of PHON is a list of phonemes. For reasons of readability usually the orthographic form is given in HPSG papers and phonological structure is omitted, but see Bird and Klein
(7)
(1994), Bird (1995), Orgun (1996), Höhle (1999), Klein (2000), and Asudeh and Klein (2002) for analyses. The second feature is SYNTAX-SEMANTICS (SYNSEM) and its value is a description of all properties of a linguistic object that are syntactically and semantically relevant and can be selected by other heads. Information that is locally relevant (LOCAL) is distinguished from information that plays a role in non-local dependencies (NONLOCAL, see section 4). Syntactic information is represented under CATEGORY (CAT) and semantic information under CONTENT (CONT). The example shows the HEAD value, which provides information about all aspects that are relevant for the external distribution of a maximal projection of a lexical head. In particular, the part of speech information (noun) is represented under HEAD. The value of AGREEMENT (AGR) is parallel to the one given in (2). As well as information regarding the head features, valence information also belongs under CAT. The example shows the SPR feature, which is used for the selection of a specifier (see the next section for details on valence). The boxed number is an example of structure sharing. It ensures that the specifier that is realized together with the noun has compatible agreement features. The AVM in (7) shows a description of phonological, morpho-syntactic, and semantic aspects of a word. But of course other aspects can be and have been described by feature value pairs as well. For instance, Engdahl and Vallduví (1996), Wilcock (2001), De Kuthy (2002), Paggio (2005), Bildhauer (2008), and Bildhauer and Cook (2010) show how information structure can be modeled in HPSG in general and how the interaction between phonology, syntax, and information structure can be captured in a constraint-based setting.
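To make the nested feature geometry concrete, the attribute paths can be mimicked with nested dictionaries and a small path-following helper. This is a hedged illustration only: the attribute names (PHON, SYNSEM, LOC, CAT, HEAD, AGR, SPR) follow the prose, but the concrete values and the exact placement of AGR are simplifying assumptions made for the example, not the AVM from (7) itself.

```python
# Hedged sketch: a toy lexical entry for 'Mannes' as nested dicts.
mannes = {
    "PHON": ["Mannes"],
    "SYNSEM": {
        "LOC": {
            "CAT": {
                "HEAD": {"pos": "noun",
                         "AGR": {"GEN": "masc", "CASE": "gen", "NUM": "sg"}},
                "SPR": ["DetP"],   # 'DetP' is a made-up placeholder label
            },
            "CONT": {},
        },
        "NONLOC": {},
    },
}

def path_value(sign, path):
    """Follow an attribute path written as 'A|B|C' through a description."""
    value = sign
    for attr in path.split("|"):
        value = value[attr]
    return value

print(path_value(mannes, "SYNSEM|LOC|CAT|HEAD")["pos"])   # noun
```

The helper reflects the standard HPSG path notation (e.g. SYNSEM|LOC|CAT|HEAD), where each attribute selects a value inside the value of the previous one.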
2. Valence and constituent order

2.1. Valence

Descriptions of lexical elements contain a list with descriptions of the syntactic and semantic properties of their arguments. This list is called Argument Structure (ARG-ST). (8) gives some prototypical examples of ARG-ST values. (8)
Verb     ARG-ST                        SPR          COMPS
sleeps   ⟨NP[nom]⟩                     ⟨NP[nom]⟩    ⟨⟩
likes    ⟨NP[nom], NP[acc]⟩            ⟨NP[nom]⟩    ⟨NP[acc]⟩
talks    ⟨NP[nom], PP[about]⟩          ⟨NP[nom]⟩    ⟨PP[about]⟩
gives    ⟨NP[nom], NP[acc], NP[acc]⟩   ⟨NP[nom]⟩    ⟨NP[acc], NP[acc]⟩
In (8) items like NP[nom] are abbreviations that stand for feature descriptions. The elements in the ARG-ST list are ordered according to the obliqueness hierarchy suggested by Keenan and Comrie (1977) and Pullum (1977). (9)
SUBJECT < DIRECT OBJECT < INDIRECT OBJECT < OBLIQUES < GENITIVES < OBJECTS OF COMPARISON
In grammars of configurational languages like English, the ARG-ST list is mapped onto two valence features: SPR (SPECIFIER) and COMPS (COMPLEMENTS). Examples of the respective values are also given in (8). The HPSG representation of valence is reminiscent of Categorial Grammar (see Baldridge, this volume, for an overview), where each head comes with a description of its arguments. Figure 27.2 shows the saturation of the subject valence: a head that requires a subject can be combined with a subject that matches the description in the SPR list. The boxed number indicates that the single element of the SPR list and the subject NP are identical. Therefore, accusative NPs like him are excluded as a subject of sleeps. (10)
Fig 27.2. Analysis for Peter sleeps.
The elements in valence lists are canceled off once the combination with an appropriate item has taken place, that is, the SPR list of Peter sleeps is empty since the SPR element of sleeps is realized as a sister of sleeps. Figure 27.3 shows a more complex example with a transitive verb.
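The mapping from ARG-ST to the valence features and the cancellation just described can be sketched as follows. This is an informal illustration of the bookkeeping, not an implementation of the HPSG formalism, and it uses plain strings like 'NP[nom]' in place of real feature descriptions.

```python
# Hedged sketch of the mapping in (8) for a configurational language:
# the least oblique ARG-ST member becomes SPR, the rest become COMPS.
def map_arg_st(arg_st):
    return {"SPR": arg_st[:1], "COMPS": arg_st[1:]}

def combine_with_complement(head):
    """Saturate the first COMPS element (cf. the VP 'likes Sandy'):
    the realized complement is canceled off the valence list."""
    assert head["COMPS"], "head does not select a complement"
    return {"SPR": head["SPR"], "COMPS": head["COMPS"][1:]}

likes = map_arg_st(["NP[nom]", "NP[acc]"])
vp = combine_with_complement(likes)   # likes Sandy
print(vp)                             # {'SPR': ['NP[nom]'], 'COMPS': []}
```

The resulting VP still has a non-empty SPR list, mirroring the fact that a verbal projection with an empty COMPS list is not yet a clause.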
(11)
Fig 27.3. Analysis for Kim likes Sandy.
likes and Sandy form a VP (a verbal projection with an empty COMPS list) and this VP is combined with its subject to form a fully saturated verbal projection, that is, a clause.
2.2. Constituent structure

As was explained in section 1, HPSG exclusively uses feature descriptions with structure sharing and relational constraints for describing linguistic objects. As a consequence of this, the theory does not use phrase structure rules. (Of course a phrase structure component can be used and is used in computational implementations of HPSG for efficiency reasons, but phrase structure rules are not necessary on theoretical grounds and hence are replaced by constraints, which results in a leaner architecture.) Instead, the dominance relation between linguistic objects is modeled with feature structures. Trees are used for visualization purposes only. The attribute value matrix that expresses the dominance relations in the tree in Figure 27.4 is shown in (13). (12)
Fig 27.4. the man
(13)
For explanatory purposes, (13) shows the phonological information only. Part of speech information and valence information that is contained in the tree in Figure 27.4 is omitted. The value of PHON is a list of the phonological contributions of the daughter signs. The feature HEAD-DTR is appropriate for headed structures. Its value is the sign that contains the head of a complex expression (the verb in a VP, the VP in a clause). The value of NON-HEAD-DTRS is a list of all other daughters of a sign. The following implication shows the constraints that hold for structures of type head-complement-phrase:
(14) Head-Complement Schema (fixed order, head-initial):
This constraint splits the COMPS list of the head daughter into two parts: a list that contains exactly one element and a list containing the remaining elements. The first element of the COMPS list is identified with the SYNSEM value of the non-head daughter. It is therefore ensured that the description of the properties of the complement of a transitive verb like likes in Figure 27.3 is identified with the feature value bundle that corresponds to the properties of the object that is combined with the head, Sandy in the case of Figure 27.3. Since the Head-Complement Schema in (14) licenses structures with exactly one head daughter and exactly one non-head daughter, head-complement structures will be binary. This is not the only option for defining head-complement structures. The constraints can be specified in a way that allows for the realization of any number of complements in one go. See for instance Pollard and Sag (1994) for an analysis of English with a flat VP and Bouma and van Noord (1998) for an absolutely flat analysis of Dutch, including a flat verbal complex. The Head-Complement Schema in (14) licenses the VP in Figure 27.3. The combination of the VP and its specifier is licensed by the Head-Specifier Schema:
(15) Head-Specifier Schema:
Note that the non-head daughter is taken from the end of the SPR list, while the non-head daughter in head-complement phrases is taken from the beginning. For heads that have exactly one specifier this difference is irrelevant, but in the analysis of object shift that is suggested by Müller and Ørsnes (2013), the authors assume multiple specifiers and hence the difference in order of combination is relevant.
IV. Syntactic Models

Note that Pollard and Sag (1994: chapter 9) use a special valence feature for the subject and a special schema for combining subjects with their heads. As Pollard and Sag convincingly argued, information about subjects has to be represented in addition to the information about specifiers in order to account for predication structures, but this does not mean that a SUBJ feature has to be a valence feature. Instead, I follow Pollard (1996) and Kiss (1995) in assuming that SUBJ is a head feature. This change makes it possible to analyze determiner-noun combinations and NP-VP combinations with one schema rather than with two schemata as in earlier versions of HPSG.
2.3. Constituent order

In the simple NP example above the order of the elements is fixed: the head follows the non-head. However, this is not always the case. For instance, there are mixed languages like Persian that allow some heads to the left of their arguments and some heads to the right (prepositional phrases are head-initial and verb phrases are head-final in Persian). For such reasons HPSG assumes a separation between immediate dominance (ID) constraints and linear precedence (LP) constraints, as was common in GPSG (Gazdar et al. 1985). For instance, the Head-Complement Schema in (14) does not impose any order on the head and the non-head; this is taken care of by a set of separate constraints. Heads that precede their complements can be marked as INITIAL+ and those that follow their complements as INITIAL−. The following LP constraints ensure the correct ordering of heads with respect to their complements:

(16) a. HEAD [ INITIAL+ ] < COMPLEMENT
     b. COMPLEMENT < HEAD [ INITIAL− ]
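A toy rendering of these LP constraints (my own simplification; the Persian-style word forms are merely illustrative) orders the daughters according to the head's INITIAL value:

```python
def serialize(head, complement):
    """Order a head and its complement according to (16):
    INITIAL+ heads precede their complements (16a),
    INITIAL- heads follow them (16b)."""
    if head["initial"]:
        return head["phon"] + complement["phon"]
    return complement["phon"] + head["phon"]

# Persian-style mix: head-initial PPs, head-final VPs
prep = {"phon": ["ba"], "initial": True}     # hypothetical preposition
verb = {"phon": ["did"], "initial": False}   # hypothetical verb
pp = serialize(prep, {"phon": ["Maryam"]})   # head precedes
vp = serialize(verb, {"phon": ["ketab-ra"]}) # head follows
```

The same ID schema thus yields both orders; only the INITIAL marking differs.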
2.4. Free constituent order languages

The Head-Complement Schema in (14) allows for the combination of a head with its complements in a fixed order, since the first element on the COMPS list is combined with the head before all other elements. (This is similar to what is known from Categorial Grammar.) Taken together with the linearization constraint in (16a), this results in a fixed constituent order in which the verb precedes its complements and the complements are serialized according to their obliqueness. However, there are languages with much freer constituent order than English. If one does not want to assume a base order from which other orders are derived by movement or equivalents of movement, one has to find ways to relax the constraint on head-complement structures. One way of doing this is to allow the non-head daughter to be an arbitrary element from the COMPS list of the head daughter. The respective modification of the schema in (14) is given as (17). The COMPS list of the head daughter is split into three parts: a list of arbitrary length ([1]), a list containing one element (⟨ [2] ⟩), and another list of arbitrary length ([3]). [1] and [3] can be the empty list or contain one or more arguments. For non-configurational languages it is assumed that the subject of finite verbs is treated like the other arguments, that is, it is mapped to COMPS instead of being mapped
(17) Head-Complement Schema (free constituent order):
to SPR as in English. Having explained the difference in the HPSG analysis of configurational and non-configurational languages, we can now give an example of an analysis of a language with rather free constituent order: Figures 27.5 and 27.6 show the analysis of the German sentences in (18):

(18) a. [weil] jeder das Buch kennt [German]
        because everybody the book knows
        ‘because everybody knows the book’
     b. [weil] das Buch jeder kennt
        because the book everybody knows

(19)
Fig 27.5. Analysis of jeder das Buch kennt ‘everybody the book knows’.
(20)
Fig 27.6. Analysis of das Buch jeder kennt ‘the book everybody knows’.
In Figure 27.5 the object is combined with the verb first and the subject is represented in the COMPS list of the mother; in Figure 27.6 the subject is combined with the verb first and the object is represented in the COMPS list of the mother. As far as constituent ordering is concerned, this analysis is equivalent to proposals that assume a set for the representation of valence information: any element from the set can be combined with its head. Such analyses were suggested very early in the history of HPSG by Gunji (1986) for Japanese. See also Hinrichs and Nakazawa (1989), Pollard (1996), and Engelkamp, Erbach, and Uszkoreit (1992) for set-based approaches to constituent order in German. A crucial difference between a set-based analysis and the list-based analysis advocated here is that the elements of the lists are ordered according to obliqueness. This order is used in various subparts of the theory, for instance for the assignment of structural case and for expressing constraints on pronoun binding. So the obliqueness ordering would have to be represented elsewhere in set-based approaches. For authors who assume binary branching structures, the difference between languages with fixed constituent order and languages with free constituent order lies in the value of [1] and [3] in the schema in (17). If either [1] or [3] is the empty list, one gets a fixed constituent order, with head-complement combination either in order of obliqueness or in the reverse order of obliqueness. A more radical approach to constituent order was first developed by Mike Reape (1994) for German and later adopted by other researchers for the analysis of constituent order in German (Kathol 1995, 2000; Müller 1995, 1999, 2002, 2004) and other languages. (Note, though, that Müller 2005a, b argued that clause structure in German should not be analyzed in a domain-based way, since this approach cannot account for the multiple frontings documented in Müller (2003).
Instead, a head-movement approach (Kiss and Wesche 1991; Frank 1994; Kiss 1995; Meurers 2000) was found more appropriate.) Another phenomenon for which these so-called linearization-based analyses seem to be needed is coordination (Crysmann 2008; Beavers and Sag 2004). In Reape’s approach, constituent structure is entirely separated from linear order. Dependents of a head are inserted into a list, the so-called order domain. Elements that are inserted into the list can be ordered in any way, provided no linearization rule is violated. In such a setting the structural order in which arguments are combined with their head is kept constant, but the phonological serialization differs. Figure 27.7 shows an example analysis of the sentence in (21):

(21) [dass] der Mann der Frau das Buch gibt
        that the man the woman the book gives
        ‘that the man gives the book to the woman’
[German]
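The order-domain idea can be sketched as follows (a deliberately minimal model of DOM insertion; the function name and list encoding are mine, and the filtering by LP rules is omitted):

```python
def insertions(dom, element):
    """All ways to insert an element into an order domain (DOM list).
    In Reape's approach each such serialization is admitted unless it
    violates a linearization rule."""
    return [dom[:i] + [element] + dom[i:] for i in range(len(dom) + 1)]

# der Frau and gibt combine first; das Buch may later be inserted
# BETWEEN them, yielding a discontinuous constituent "der Frau ... gibt".
vp_dom = ["der Frau", "gibt"]
orders = insertions(vp_dom, "das Buch")
# three serializations, among them ["der Frau", "das Buch", "gibt"]
```

Constituent structure stays constant across all three serializations; only the phonological order varies, which is the hallmark of linearization-based analyses.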
The arguments of gibt are represented in the COMPS list in the order of obliqueness (nom, acc, dat). They are combined with the verb in this order, and representations for the arguments are inserted into the constituent order domain of the head (the DOM list). In the analysis of (21) this results in a discontinuous constituent for der Frau gibt, since the dative object der Frau and the verb are combined first and only later is the accusative object das Buch inserted between der Frau and gibt. Such domain-based analyses have been suggested for languages with even freer order. For instance, Warlpiri is a language that allows attributive adjectives to be serialized
(22)
Fig 27.7. Linearization analysis of the sentence der Mann der Frau das Buch gibt ‘the man the woman the book gives’.
independently of the noun they modify. This can be modeled by inserting the adjective independently of the noun into a serialization domain (Donohue and Sag 1999). While such linearization analyses are very interesting, the machinery that is required is rather powerful. Bender (2008) has shown how Wambaya − a language with similar constituent order freedom − can be analyzed in the framework of HPSG without assuming discontinuous constituents: instead of canceling the valence requirements in the way that is standard in HPSG and that was illustrated above, Bender assumes a non-cancellation approach to valence (Meurers 1999; Przepiórkowski 1999; Müller 2008), that is, all the information in the ARG-ST list is projected to the highest node in the clause. By doing this, information about the syntactic and semantic properties of the arguments is available at each projection in the clause. Making the ARG-ST information available at higher nodes is similar to what LFG does with the f-structures: all elements in a head domain can contribute to and access the same f-structure node. Since this information is available, case agreeing attributive adjectives can be realized far away from the head they modify (see Nordlinger 1998 for an LFG analysis of Wambaya that relies on accessing f-structure information). To sum up, there are three approaches to free constituent order: Flat structures, linearization domains with discontinuous constituents, and the non-cancellation of syntactic and semantic properties of arguments.
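Returning to the schema in (17), its effect can be sketched by letting the non-head daughter match any element of the head's COMPS list, not just the first one (again a hypothetical Python encoding of my own, not the HPSG formalization):

```python
def satisfies(sign, description):
    """True if the sign has every feature value the description demands."""
    return all(sign.get(f) == v for f, v in description.items())

def head_complement_free(head, non_head):
    """Free-order Head-Complement Schema (17): COMPS is split into
    <before> + <matched element> + <after>, so ANY complement may
    combine with the head first."""
    for i, desc in enumerate(head["comps"]):
        if satisfies(non_head, desc):
            return {"cat": head["cat"],
                    "comps": head["comps"][:i] + head["comps"][i + 1:]}
    raise ValueError("no matching COMPS element")

# German kennt with the subject mapped to COMPS (non-configurational style):
kennt = {"cat": "verb", "comps": [{"case": "nom"}, {"case": "acc"}]}
das_buch = {"case": "acc"}
vp = head_complement_free(kennt, das_buch)  # object combines first,
                                            # as in Figure 27.5
```

Restricting the split so that either the "before" or the "after" sublist must be empty recovers the fixed-order behavior discussed above.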
2.5. Heads and projection of head features

Section 1 introduced head features, and Figure 27.7 shows that the information about the part of speech and finiteness of the head is present at every projection, but until now nothing has been said about head feature propagation. The identity of the head features of a head and of a mother node is taken care of by the following principle:

(23) Head Feature Principle:
     In a headed phrase, the HEAD value of the mother and the HEAD value of the head daughter are identical.

This can be formalized by the following implicational constraint:

(24) headed-phrase ⇒
     [ SYNSEM|LOC|CAT|HEAD [1]
       HEAD-DTR|SYNSEM|LOC|CAT|HEAD [1] ]
The head daughter is the daughter that contains the syntactic head; that is, in the phrase der Frau gibt in Figure 27.7 it is the lexical item gibt, and in the phrase der Frau das Buch gibt it is the constituent der Frau gibt. The constraint is a constraint on structures of type headed-phrase. Types like head-complement-phrase are subtypes of headed-phrase, and hence the constraint in (24) applies to them too.
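Structure sharing — token identity of the mother's and the head daughter's HEAD value — can be mimicked in Python by letting both signs hold the very same object (an illustrative encoding, not the HPSG formalization):

```python
def headed_phrase(head_dtr, non_head_dtrs):
    """Head Feature Principle: the mother's HEAD value IS the head
    daughter's HEAD value (the same shared object, not a copy)."""
    return {"head": head_dtr["head"],        # structure sharing
            "head_dtr": head_dtr,
            "non_head_dtrs": non_head_dtrs}

gibt = {"head": {"pos": "verb", "vform": "fin"}}
vp = headed_phrase(gibt, [{"head": {"pos": "noun"}}])
s = headed_phrase(vp, [{"head": {"pos": "noun"}}])

# Part of speech and finiteness are visible at every projection:
assert s["head"] is gibt["head"]
```

Because the value is shared rather than copied, any further constraint on the head features at one node automatically holds at all projections.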
3. Semantics

The first publications on HPSG assumed Situation Semantics (Barwise and Perry 1983) as the underlying semantic framework (Pollard and Sag 1987, 1994). While there are also more recent publications in this tradition (Ginzburg and Sag 2000), many current analyses use semantic formalisms that allow for the underspecification of scope constraints, such as Underspecified Discourse Representation Theory (UDRT, Frank and Reyle 1995), Constraint Language for Lambda Structures (CLLS, Egg et al. 2001), Minimal Recursion Semantics (MRS, Copestake, Flickinger, Pollard, and Sag 2005), and Lexical Resource Semantics (LRS, Richter and Sailer 2004). Minimal Recursion Semantics is widely used in the theoretical literature and in computational implementations of HPSG grammars and is also adopted by researchers working in other frameworks (see Kallmeyer and Joshi 2003 for an MRS semantics for TAG, Kay 2005 for a CxG fragment with an MRS semantics, and Dyvik, Meurer, and Rosén 2005 for the combination of LFG and MRS). In what follows I will briefly explain the basic assumptions of MRS and their AVM encoding. (25) shows examples of the semantic contribution of a noun and a verb in Minimal Recursion Semantics. An MRS consists of an index, a list of relations, and a set of handle constraints, which will be introduced below. The index can be a referential index of a noun (25a) or an event variable (25b). In the examples in (25) the lexical items contribute the dog relation and the chase relation. The relations can be modeled with feature structures by turning the semantic roles into features. The semantic index of
(25) a. dog
     b. chases
(26) chase:
(27)
Fig 27.8. Analysis for Every dog chases some cat.
nouns is basically a variable, but it comes with an annotation of person, number, and gender since this information is important for establishing correct pronoun bindings.
The arguments of each semantic relation (e.g. agent, patient) are linked to their syntactic realization (e.g. NP[nom], NP[acc]) in the lexicon. (26) shows an example. NP[nom] stands for a description of an NP with the semantic index identified with [1]. The semantic indices of the arguments are structure shared with the arguments of the semantic relation chase′. Generalizations over linking patterns can be captured elegantly in inheritance hierarchies (see section 7 on inheritance hierarchies and Wechsler 1995; Davis 2001; Davis and Koenig 2000 for further details on linking in HPSG). Before turning to the compositional analysis of (28a), I want to introduce some additional machinery that is needed for the underspecified representation of the two readings in (28b, c).

(28) a. Every dog chased some cat.
     b. ∀x(dog(x) → ∃y(cat(y) ∧ chase(x, y)))
     c. ∃y(cat(y) ∧ ∀x(dog(x) → chase(x, y)))

Minimal Recursion Semantics assumes that every elementary predication comes with a label. Quantifiers are represented as three-place relations that relate a variable and two so-called handles. The handles point to the restriction and the body of the quantifier, that is, to two labels of other relations. (29) shows a (simplified) MRS representation for (28a).

(29) ⟨ h0, { h1: every(x, h2, h3), h2: dog(x), h4: chase(e, x, y), h5: some(y, h6, h7), h6: cat(y) } ⟩

The three-place representation of quantifiers is a syntactic convention. Formulae like those in (28) are equivalent to the results of the scope resolution process that is described below. The MRS in (29) can best be depicted as in Figure 27.9. h0 stands for the top element. This is a handle that dominates all other handles in a dominance graph. The restriction of every points to dog and the restriction of some points to cat. The interesting thing is that the body of every and some is not fixed in (29). This is indicated by the dashed

(30)
Fig 27.9. Dominance graph for Every dog chases some cat.
lines in Figure 27.9, in contrast to the straight lines connecting the restrictions of the quantifiers with the elementary predications for dog and cat, respectively. There are two ways to plug an elementary predication into the open slots of the quantifiers:

(31) a. Solution one: h0 = h1 and h3 = h5 and h7 = h4. (every dog has wide scope)
     b. Solution two: h0 = h5 and h7 = h1 and h3 = h4. (some cat has wide scope)

The solutions are depicted in Figure 27.10 and Figure 27.11.

(32)
Fig 27.10. every(x, dog(x), some(y, cat(y), chase(x, y))) ≡ (28b).
(33)
Fig 27.11. some(y, cat(y), every(x, dog(x), chase(x, y))) ≡ (28c).
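The two pluggings can be enumerated mechanically. The little resolver below (an illustrative simplification of my own, not the MRS algorithm of Copestake et al.) identifies each scoping with an ordering of the quantifiers and builds the corresponding formula from the inside out:

```python
from itertools import permutations

def scopings(quantifiers, core):
    """Each ordering of the quantifiers (widest scope first) yields one
    resolved formula; the core relation fills the innermost body."""
    readings = []
    for order in permutations(quantifiers):
        formula = core
        for name, var, restriction in reversed(order):
            formula = f"{name}({var}, {restriction}, {formula})"
        readings.append(formula)
    return readings

readings = scopings([("every", "x", "dog(x)"),
                     ("some", "y", "cat(y)")],
                    "chase(x, y)")
# two readings, corresponding to (31a)/Fig. 27.10 and (31b)/Fig. 27.11
```

This simplification ignores restrictions with open slots of their own; the qeq constraints introduced below are what a full MRS resolver uses to handle those cases.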
There are scope interactions that are more complicated than those we have been looking at so far. In order to be able to underspecify the two readings of (34) both slots of a quantifier have to stay open (Copestake et al. 2005: 296).
(34) a. Every nephew of some famous politician runs.
     b. every(x, some(y, famous(y) ∧ politician(y), nephew(x, y)), run(x))
     c. some(y, famous(y) ∧ politician(y), every(x, nephew(x, y), run(x)))

In the analysis of example (28a), the handle of dog′ was identified with the restriction of the quantifier every′. This would not work for (34a), since either some′(…) or nephew′(x, y) can be the restriction of every′. Instead of direct specification, so-called handle constraints (qeq or =q) are used. A qeq constraint relates an argument handle and a label: h =q l means that the handle is identified with the label either directly or with one or more quantifiers inserted between h and l. Taking this into account, we can now return to our original example. The correct MRS representation of (28a) is given in (35).

(35) ⟨ h0, { h1:every(x, h2, h3), h4:dog(x), h5:chase(e, x, y), h6:some(y, h7, h8), h9:cat(y) }, { h2 =q h4, h7 =q h9 } ⟩

The handle constraints are associated with the lexical entries for the respective quantifiers. They are represented in a list as the value of a feature called HCONS. Figure 27.12 shows the analysis.

(36)
Fig 27.12. Analysis for Every dog chases some cat.
The RELS value of a sign is simply the concatenation of the RELS values of the daughters. Similarly, the HCONS value is the concatenation of the HCONS values of the daughters. See Copestake et al. (2005) for an extension of this mechanism that allows for the inclusion of additional semantic information at the phrasal level. An interesting application of the underspecification of scope constraints is the treatment of the ambiguity of (37a).

(37) a. dass Max alle Fenster aufmachte
        that Max all windows opened
        ‘that Max opened all windows’
[German]
     b. ∀x(window(x) → CAUSE(max, open(x)))
     c. CAUSE(max, ∀x(window(x) → open(x)))

The first reading corresponds to a situation in which all windows were closed and Max opens each window; the second reading corresponds to a situation in which some windows were open already and Max opened the remaining ones, which results in a situation in which all windows are open. Egg (1999) suggests specifying the meaning of öffnen ‘to open’ in an underspecified way. (38) gives an MRS version of his analysis:

(38) ⟨ h0, { h1:CAUSE(x, h2), h3:open(y) }, { h2 =q h3 } ⟩

The CAUSE operator embeds the open′ relation, but the embedding is not direct: it is stated as a dominance constraint h2 =q h3. This allows quantifiers to scope between the CAUSE operator and the embedded predicate and therefore admits the readings in (37b, c). The analysis also extends to the readings that can be observed for sentences with adverbials like wieder ‘again’. The sentence in (39a) has three readings that originate from different scopings of CAUSE, ∀, and wieder ‘again’:

(39) a. dass Max alle Fenster wieder aufmachte [German]
        that Max all windows again opened
        ‘that Max opened all windows again’
     b. CAUSE > ∀ > again′ > open′
     c. ∀ > CAUSE > again′ > open′
     d. ∀ > again′ > CAUSE > open′
The first two readings are so-called repetitive readings and the third one is a restitutive reading. See Dowty (1979: section 5.6) on this phenomenon. Since only the relative scope of CAUSE and open′ is fixed in the lexical representation in (38), other scope-taking elements can intervene. With such a semantic representation the syntax-semantics interface can be set up as follows: the adverbial combines with aufmachen ‘to open’, and the resulting phrase is combined with the object alle Fenster ‘all windows’ and the subject Max. The scoping of the universal quantifier and the adverbial wieder ‘again’ depends on the ordering of the elements, that is, in (39a) only readings in which ∀ outscopes again′ are available. See Kiss (2001) for more information on the treatment of quantifier scope in German in the framework of HPSG. Egg (1999) suggests the underspecification analysis as an alternative to von Stechow’s (1996) analysis in the Minimalist Program. Von Stechow assumes a decomposition in syntax in the style of Generative Semantics and relies on several empty heads and movement operations that are necessary to derive the readings. As was pointed out by Jäger and Blutner (2003), this analysis does not get all attested readings. Apart from such empirical problems, the underspecification analysis is to be preferred for conceptual reasons: the syntactic structures directly correspond to observable facts, and hence it seems more likely that a performance model can be paired with such a surface-oriented approach. Apart from this, surface-oriented structures do not require a rich innate UG to explain the acquisition of language (see section 8). See also Richter and Sailer (2008) for arguments for a richer syntax-semantics interface as opposed to proposals that derive certain readings via syntactic movement operations.
4. Nonlocal dependencies

The basic ingredients for the analysis of nonlocal dependencies like the one in (40) are taken over from GPSG analyses (Gazdar 1981; Gazdar, Klein, Pullum, and Sag 1985).

(40) [People like him]i, everybody knows I dislike _i.

In (40) the object of dislike is realized outside of the clause. While examples like (40) were seen as a motivation for movement transformations, Gazdar found a way to analyze such nonlocal dependencies as a series of local dependencies. The main idea is that the information that something is missing in the object position next to dislike is recorded locally and passed up to the mother nodes that embed the VP until the phrase people like him is found and fills in the missing information about the object of dislike. (I explain the analysis in a bottom-up fashion starting with lexical items, but this is just for explanatory reasons; HPSG grammars are a collection of constraints without a specific order of application.) HPSG uses features, values, and structure sharing to establish the link between a missing element (a gap or trace) and its filler (Pollard and Sag 1994: chapter 5). Figure 27.13 shows the basic mechanism.

(41)
Fig 27.13. Percolation of nonlocal information.
The figure shows an empty element for the introduction of the nonlocal dependency, but alternatives to trace-based approaches have been suggested. One such alternative is the assumption of additional dominance schemata that do not require certain arguments to be realized overtly but instead introduce an element into SLASH. This is basically equivalent to the traceless metarule approach in the framework of GPSG (Uszkoreit 1987: 76−77). Another traceless analysis is the lexical approach of Bouma, Malouf, and Sag (2001), in which only those arguments that are not extracted are mapped from ARG-ST to the valence features; the extracted arguments are mapped to SLASH instead. See Müller (2009) on problems of this approach with raising and Levine and Hukari (2006) for an elaborate discussion of various extraction analyses in HPSG. In the following I want to discuss the trace-based approach in more detail. The following AVM is a description of a trace. The local properties of a linguistic object are identified with an element in a list, which is the value of the SLASH feature under the path INHERITED (INHER). (The original HPSG treatment of nonlocal dependencies assumed that the value of the SLASH feature is a set. Recent versions − for instance Müller 1999 and Sag 2010 − assume a list-valued feature.)

(42)
The LOCAL value of the trace is not constrained at all. Hence it is compatible with whatever is required in a given context. In the example under discussion the trace is used in place of the object. The verb dislike selects an accusative object, and by identifying the description in the COMPS list of the verb with the SYNSEM value of the trace, the LOCAL value is more restricted than in the trace in (42). Since the LOCAL value of the trace is identical to the element on the INHER|SLASH list, the information about the missing object is represented in SLASH too. The information is passed upwards and identified with the LOCAL value of the filler. Since the LOCAL value of the filler is identical to the SLASH element of the trace, and hence also identical to the LOCAL value of the trace, syntactic and semantic constraints of the governing verb can be checked as if the combination of people like him and dislike happened locally. The percolation of nonlocal features is taken care of by the following principle:

(43) Nonlocal Feature Principle:
     For each nonlocal feature, the INHERITED value of the mother is the union of the INHERITED values of the daughters minus the TO-BIND value on the head daughter.

This explains the introduction of the nonlocal dependency and the percolation of the information. What is missing is an immediate dominance schema that binds off the nonlocal dependency. This schema is given as the Head-Filler Schema in (44). It licenses combinations of a finite clause (a linguistic object that is fully saturated) with another element that has the same LOCAL properties as the element in the list under TO-BIND|SLASH. Due to the specification of the TO-BIND|SLASH value, the SLASH information is not projected any further in the tree, but is bound off in the head-filler phrase.
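The percolation and binding-off can be sketched operationally (an illustrative list-valued simplification of my own; real SLASH elements are LOCAL values, not strings):

```python
def percolate(daughters, to_bind=()):
    """Nonlocal Feature Principle: the mother's INHERITED|SLASH combines
    the daughters' SLASH values, minus what the head daughter's
    TO-BIND|SLASH binds off."""
    inherited = [el for d in daughters for el in d["slash"]]
    return [el for el in inherited if el not in to_bind]

trace   = {"slash": ["NP[acc]"]}       # introduces the dependency
dislike = {"slash": []}
vp      = {"slash": percolate([dislike, trace])}    # SLASH passed up
s       = {"slash": percolate([vp])}                # ... and further up
filler  = {"slash": []}                             # people like him
top     = {"slash": percolate([s, filler], to_bind=["NP[acc]"])}
```

At the top node the dependency is bound off, so nothing is projected any further, exactly as the Head-Filler Schema requires.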
(44) Head-Filler Schema:
5. Lexical rules

Since HPSG is a lexicalist theory, the lexicon plays an important role. The lexicon is not just a prison for the lawless, as suggested by Di Sciullo and Williams (1987: 3), but is structured, and lexical items are related to each other. One means of capturing generalizations is lexical rules. A lexical rule says: if there is a lexical item with certain properties, then there is also another lexical item with certain other properties. An example of the application of lexical rules is morphology (Pollard and Sag 1987: chapter 8.2; Orgun 1996; Riehemann 1998; Ackerman and Webelhuth 1998; Kathol 1999; Koenig 1999). The HPSG lexicon (of inflecting languages) consists of roots that are related to stems or fully inflected words. The derivational or inflectional rules may influence part of speech (e.g. adjectival derivation) and/or valence (-able adjectives and passive). (45) is an example of a lexical rule. It was suggested by Kiss (1992) to account for the personal passive in German. (For a more general passive rule that unifies the analyses of personal and impersonal passives, see Müller 2002: chapter 3. This more general rule for the passive uses the distinction between structural and lexical case.) The rule in (45) takes as input a verbal stem that governs both a nominative and an accusative. The nominative argument is not represented in the COMPS list of the output. The

(45) Lexical rule for the personal passive following Kiss (1992):
case of the object is changed from acc to nom. The remaining arguments (if there are any) are taken over from the input. The stem is mapped to a word, and the phonology of the input is mapped to the passive form by a function f. During the past decades there has been some discussion concerning the status of lexical rules. One way to formalize them is to simply use the formalism of typed feature structures (Krieger and Nerbonne 1993: chapter 7.4.1; Copestake and Briscoe 1992; Briscoe and Copestake 1999; Meurers 1995, 2001; Riehemann 1998). In one type of such feature structure-based proposals, the input of the lexical rule is a daughter of the output. This is basically equivalent to a unary branching immediate dominance rule. (46) shows the lexical rule in (45) in a format that directly reflects this approach.

(46) Lexical rule for the personal passive (formalized as a typed feature description):
An advantage of this formalization is that lexical rules are constraints on typed feature structures, and as such they can be integrated into an inheritance hierarchy so that generalizations over various linguistic objects can be captured. See section 7 on inheritance hierarchies and generalizations. For instance, it was argued by Höhle (1997) that complementizers and finite verbs form a natural class in German.

(47) a. dass Karl das Buch liest
        that Karl the book reads
        ‘that Karl reads the book’
[German]
     b. Liest Karl das Buch?
        reads Karl the book
        ‘Does Karl read the book?’

In head-movement-inspired approaches (Kiss and Wesche 1991; Kiss 1995; Meurers 2000), the verb in (47b) is related by a lexical rule to a lexical item for the verb as it occurs in (47a). The complementizer and the lexical rule are subtypes of a more general type capturing the commonalities of dass in (47a) and liest in (47b).
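The effect of the passive lexical rule in (45) can be sketched as a function from lexical entries to lexical entries (my own toy encoding; the lambda below stands in for the morphological function f and is not a serious model of German participle formation):

```python
def personal_passive(stem, passive_form):
    """Personal passive following (45): the nominative argument is
    suppressed, the accusative argument is promoted to nominative,
    and any remaining arguments are carried over unchanged."""
    nom, acc, *rest = stem["comps"]
    assert nom["case"] == "nom" and acc["case"] == "acc"
    promoted = dict(acc, case="nom")          # acc -> nom
    return {"phon": passive_form(stem["phon"]),
            "comps": [promoted] + rest}

lieb = {"phon": "lieb", "comps": [{"case": "nom"}, {"case": "acc"}]}
geliebt = personal_passive(lieb, lambda phon: "ge" + phon + "t")
```

Formulated as a unary immediate dominance rule instead, the input entry would appear as a daughter inside the output, which is what the typed-feature-structure version in (46) expresses.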
6. Idioms and phrasal lexical items

While early treatments of idioms assumed that they are just complex words, it has been pointed out by Nunberg, Sag, and Wasow (1994) that there are different types of idioms and that some classes of idioms are rather flexible. Some allow passivization, some allow fronting of idiom parts, some relativization, and so on, while others like kick the bucket are frozen and lose their idiomatic interpretation when they are passivized or rearranged in some other way. (48) shows some examples of fronting and relativization of the idiom pull strings.

(48) a. Those strings, he wouldn’t pull for you.
     b. Pat pulled the strings [that got Chris the job].
     c. The strings [that Pat pulled] got Chris the job.

For examples like pull strings, Sag (2007) suggested a lexical analysis in the framework of Sign-Based Construction Grammar, which is a version of HPSG. Sag assumes a special lexical entry for pull that selects strings and ensures that strings contributes an idiomatic strings relation (i_strings_rel′). The special lexical item for pull contributes an idiomatic pull relation (i_pull_rel′). The analysis is translated into the feature geometry that is used in this overview article as follows:

(49)
strings_rel is a type that has two subtypes: one for the literal reading of strings, namely l_strings_rel, and one for the idiomatic reading, namely i_strings_rel. The specification above says that the meaning of strings is l_strings_rel′ in the default case. This is marked with the /p. The little p stands for persistent and indicates that this default is not resolved in the lexicon but remains in the syntax. See Lascarides and Copestake (1999) for defaults in HPSG. The default may be overridden, and such an overriding is enforced in the lexical item of the idiomatic pull, which is given as (50):
The idiomatic lexical item for pull requires the LID value of the selected NP to be i_strings_rel. This ensures that this lexical item for pull is combined with the noun strings and not, for instance, with books, and the default specification for the semantic contribution of strings in (49) is overridden. This brief sketch shows that a lexical analysis works well for some idiom classes, and such lexical analyses have been suggested in other frameworks too (see for instance G. Müller 2011 on a Minimalist analysis). Of course there are several other kinds of idioms, and this brief overview article cannot do justice to the colorful world of idioms. The interested reader is referred to Sailer (2000) and Soehn and Sailer (2008) for an extensive discussion of idioms and various instances of relations between idiom parts. While this type of lexical analysis can also be extended to non-flexible idioms like kick the bucket, this is usually not done in HPSG. The lexical analysis would involve an expletive with the form bucket (see G. Müller 2011), which is rather unintuitive. A further problem for the lexical analysis is that it does not extend to idioms that span clause boundaries, since under standard assumptions heads do not select arguments of arguments. However, as Richter and Sailer (2009) pointed out, such idioms exist. For instance, the German example in (51) requires a complement clause with verb-second order in which the patient of the kicking relation is fronted and expressed as a pronoun that is coreferent with the subject of the matrix clause, that is, the one who utters the sentence. The matrix verb has to be glauben ‘to believe’ or denken ‘to think’.

(51) Ich glaube, mich /# dich tritt ein Pferd.
     I believe me you kicks a horse
     ‘I am very surprised.’
[German]
What is needed here is an extended domain of locality. This is generally the strength of Tree Adjoining Grammar (TAG) (see Abeillé 1988; Abeillé and Schabes 1989 on idioms in TAG), but the TAG analyses can be taken over to HPSG: Richter and Sailer (2009) developed an account for the analysis of sentences like (51) that uses a phrasal lexical entry. The gist of the analysis is that one uses partially specified feature descriptions to constrain the aspects of the idiom that are fixed. The respective AVMs can describe a tree with some non-terminal tree nodes fixed, with some terminal nodes fixed, or with both terminals and non-terminals fixed. In the case of the example above, there has to be a head-filler phrase in which the filler is a pronoun with accusative case that is an argument of treten ‘to kick’ and there has to be an indefinite NP with the head noun Pferd ‘horse’. It is possible to use regular expressions involving the daughter features. For instance, HEAD-DTR+ stands for one or more occurrences of the feature HEAD-DTR. This allows for an underspecification of structure: while HEAD-DTR refers to a node that is directly embedded in a certain structure − as for instance man in the man, see (13) − HEAD-DTR|HEAD-DTR refers to a head that is embedded two levels deep − as man in the [happy man]. So one can leave open the exact number of embeddings and just require that there has to be a head of a certain kind. This allows for adjuncts or arguments to be realized along the head path provided no other constraints in the grammar are violated. I hope to have shown that HPSG is well-equipped to handle various types of idioms, something that plays an important role in current theorizing. See for instance Jackendoff (1997: chapter 7), Culicover (1999), Ginzburg and Sag (2000: 5) and Newmeyer (2005: 48) on the arbitrariness of the Core/Periphery distinction and consequences for linguistic theories.
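Going back to the flexible-idiom analysis in (50), the LID-based selection can be illustrated as follows (a hypothetical mini-encoding; the dictionaries and the function selectable are my own, not Sag's formalization):

```python
def selectable(head, arg):
    """True if arg satisfies the description of the head's complement,
    including the required LID value."""
    return all(arg.get(f) == v for f, v in head["comps"][0].items())

# Idiomatic "pull" demands a complement whose LID is i_strings_rel:
idiomatic_pull = {"phon": "pull",
                  "comps": [{"cat": "np", "lid": "i_strings_rel"}]}
strings = {"phon": "strings", "cat": "np",
           "lid": "i_strings_rel"}   # persistent default overridden
books = {"phon": "books", "cat": "np", "lid": "l_books_rel"}
```

Because the selection goes through ordinary valence features, the idiom parts remain syntactically free, which is what licenses the fronting and relativization data in (48).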
IV. Syntactic Models
7. Generalizations
HPSG is a theory that places a lot of information in the lexicon. For instance, lexical entries of verbs contain detailed descriptions of their arguments, information on how the arguments are linked to the semantic contribution of the verb, information about semantic roles, and so on. A good way to capture generalizations with respect to this lexical knowledge is to use type hierarchies with multiple inheritance (Pollard and Sag 1987: chapter 8.1). Sag (1997) argued for several different immediate-dominance schemata for variants of English relative clauses and modified the feature geometry of HPSG in a way that made it possible to capture the generalizations over the various schemata in an inheritance hierarchy. Figure 27.14 gives an example of how (a part of) an inheritance hierarchy that includes both lexical and phrasal types may look.
(52)
Fig 27.14. Part of an inheritance hierarchy that contains lexical entries and immediate dominance schemata.
In section 2.5 we discussed constraints on phrases of type headed-phrase. Since head-complement-phrase is a subtype of headed-phrase, structures of this type inherit all the constraints from their supertype. Hence, head features at the mother node of a head-complement phrase are identified with the head features of the head daughter. Similarly, the constraint that there is a nominative and an accusative argument is represented at the type transitive-verb. The type strict-transitive-verb adds the information that there is no further argument, and the type ditransitive-verb adds the information about an additional dative argument.
It should be noted that such inheritance hierarchies cannot capture all generalizations that one wants to capture in a grammar. Inheritance hierarchies capture so-called vertical generalizations, but horizontal generalizations are left to lexical rules (Meurers 2001). It was argued by Krieger and Nerbonne (1993), Koenig (1999), and Müller (2006, 2010) that all linguistic phenomena that interact with valence and/or derivational morphology should be treated by lexical rules rather than inheritance hierarchies. The reason for this is that these phenomena can be iterated, as for instance in great-great-grandfather or preprepreversion. The double combination of a prefix with a stem cannot be handled by inheritance, since inheriting information about the presence of a prefix twice would not add new information. In contrast, a lexical rule can combine a stem with an affix and apply recursively to its output, for instance, adding pre- to preversion. Hence, both lexical rules and inheritance hierarchies are needed.
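The contrast between vertical and horizontal generalizations can be made concrete with a small sketch. The class and function names below are invented for illustration, and encoding argument lists as plain Python lists is a simplification:

```python
# Illustrative sketch (names invented): inheritance states a constraint once
# and passes it down (vertical), while a lexical rule maps entries to new
# entries and can reapply to its own output (horizontal, iterable).

class TransitiveVerb:
    args = ["nominative", "accusative"]  # stated once at the supertype

class StrictTransitiveVerb(TransitiveVerb):
    pass  # adds: no further argument

class DitransitiveVerb(TransitiveVerb):
    args = TransitiveVerb.args + ["dative"]  # adds a dative argument

def pre_prefix_rule(stem):
    """Lexical rule: maps a stem to a new stem bearing the prefix pre-."""
    return "pre" + stem

# Inheriting "has the prefix pre-" twice would add no information, so
# iteration is beyond inheritance; a rule simply applies to its own output:
word = "version"
for _ in range(3):
    word = pre_prefix_rule(word)
assert word == "preprepreversion"
assert DitransitiveVerb.args == ["nominative", "accusative", "dative"]
```

The inheritance part contributes each constraint exactly once, whereas the rule is a function whose output is again a possible input, which is exactly the property needed for iterable morphology.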
27. HPSG − A Synopsis
8. Convergence of theories and differences
It is interesting to see that various theories converge on similar analyses despite differences regarding fundamental assumptions. For instance, many analyses in GB/MP, LFG, TAG, and HPSG assume a head-movement analysis for German or something that is equivalent. The “base-generation” approach to constituent order that assumes that arguments can be combined with their head in any order (see section 2.4) can also be found in Minimalist work (Fanselow 2001). Since it was argued that movement-based analyses make wrong predictions with respect to quantifier scope (Kiss 2001: 146; Fanselow 2001: section 2.6) and that feature-driven, movement-based accounts of constituent order make wrong predictions (Fanselow 2003), the base-generation analyses seem to be the only option. Furthermore, the importance of phrasal constructions has been noted across frameworks (Sag 1997; Ginzburg and Sag 2000; Culicover and Jackendoff 2005; Jackendoff 2008; Jacobs 2008), and inheritance hierarchies have been found to be a good tool to capture generalizations (HPSG, CxG, Simpler Syntax, LFG, some versions of CG, and TAG).
However, the question to what extent phrasal constructions should be used in linguistic descriptions is currently under discussion. Goldberg (1995, 2006), Goldberg and Jackendoff (2004), Culicover and Jackendoff (2005), Alsina (1996), and Asudeh, Dalrymple, and Toivonen (2008) suggest analyzing resultative constructions and/or caused motion constructions as phrasal constructions. As was argued in Müller (2006), this is incompatible with the assumption of Lexical Integrity, that is, the assumption that word formation happens before syntax (Bresnan and Mchombo 1995). Let us consider a concrete example, such as (53):
(53) a. Er tanzt die Schuhe blutig / in Stücke.
        he dances the shoes bloody  into pieces
        ‘He dances the shoes bloody / into pieces.’
[German]
b. die in Stücke / blutig getanzten Schuhe
   the into pieces bloody danced    shoes
   ‘the shoes that were danced bloody/into pieces’
c. *die getanzten Schuhe
    the danced    shoes
The shoes are not a semantic argument of tanzt ‘dances’. Nevertheless, the NP that is realized as an accusative NP in (53a) is the element that the adjectival participle in (53b) predicates over. Adjectival participles like the one in (53b) are derived from a passive participle of a verb that governs an accusative object. If the accusative object is licensed phrasally by configurations like the one in (53a), it cannot be explained why the participle getanzte can be formed despite the absence of an accusative object. See Müller (2006: section 5) for further examples of the interaction of resultatives and morphology. The conclusion, which was drawn in the late 1970s and early 1980s by Dowty (1978: 412) and Bresnan (1982: 21), is that phenomena that feed morphology should be treated lexically. The natural analysis in frameworks like HPSG, CG, CxG, and LFG is therefore one that assumes a lexical rule for the licensing of resultative constructions. See Verspoor (1997);
Wechsler (1997); Wechsler and Noh (2001); Kay (2005) and Simpson (1983) for lexical proposals in some of these frameworks.
If resultative constructions are lexical constructions, one needs rather abstract schemata for the combination of the respective lexical items. The grammatical examples in (53) are licensed by the rather general schemata for head-complement phrases and head-adjunct phrases that are assumed in HPSG. As I have shown in Müller (2013c), the Head-Specifier Schema and the Head-Complement Schema can be seen as formalized versions of External Merge, and the Head-Filler Schema as a version of Internal Merge, as assumed in the Minimalist literature and informally stated in Chomsky (2008, 2013). So the truth seems to lie somewhere in the middle: on the one hand we need phrasal constructions for some of the phenomena that Jackendoff and others discussed, e.g. the NPN construction (Jackendoff 2008), but on the other hand we need abstract schemata for combining lexical items.
While there are many commonalities between frameworks and the descriptive tools that are used, there are huge differences in the ways arguments for the specific analyses are stated. Given the evidence that has accumulated over the past few decades, it cannot be ignored any longer that Chomsky’s claims regarding innateness of language-specific knowledge are too strong. For instance, Bod (2009) shows that Chomsky’s Poverty of the Stimulus argument (Chomsky 1971: 29−33) is not convincing, since the appropriate grammars can be acquired from the input even if the input does not contain sentences of the type Chomsky claims to be necessary for correct acquisition. Gold’s formal proof that natural languages are not identifiable in the limit from positive data (Gold 1967) was shown to be irrelevant for the problem of language acquisition (Pullum 2003; Johnson 2004). All other arguments for innate linguistic knowledge have been challenged too.
See for instance Tomasello (1995, 2003); Dąbrowska (2004); Goldberg (2004, 2006) and Müller (2013b) for an overview. While early publications in HPSG assumed a theory of UG, researchers are much more careful nowadays as far as claims regarding innateness are concerned. A consequence of this is that theoretical entities are only stipulated if there is language-internal evidence for them (Müller 2013a), that is, if it is plausible that language learners can acquire knowledge about certain linguistic objects or structures from their input. To take an example, empty nodes for object agreement (AgrO) in German cannot be justified on the basis of object agreement data from Basque. German does not have object agreement, and hence AgrO is not assumed to be an entity of grammars of German. While Fodor’s work on UG-based language acquisition (1998) is compatible with HPSG, recent work on language acquisition tries to explain language acquisition without recourse to innate linguistic knowledge (Green 2011). The insights of Tomasello (2003) and Goldberg et al. (2004) can be directly transferred to a Greenian approach. For details see Müller (2013b: section 11.4.4, section 11.11.8.1).
Another important point in comparing frameworks is that HPSG is a model-theoretic and hence constraint-based approach (King 1999; Pollard 1999; Richter 2007). HPSG shares this view with certain formalizations of LFG (Kaplan 1995), GPSG (Gazdar et al. 1988; Rogers 1997), GB (Rogers 1998), and the Minimalist Program (Veenstra 1998). In model-theoretic approaches, well-formedness constraints on objects are formulated. Everything that is not ruled out explicitly is allowed. As Pullum and Scholz (2001) pointed out, this makes it possible to assign structure to fragments of utterances, something that is impossible for generative-enumerative approaches, which enumerate an infinite set of well-formed utterances. Since partial phrases such as and of the from (54) are not in this set, they do not get a structure.
(54) That cat is afraid of the dog and of the parrot.
Furthermore, increased markedness of utterances can be explained with reference to the number and strength of violated constraints. Gradedness has not been accounted for in generative-enumerative approaches. Chomsky’s attempts to introduce gradedness into Generative Grammar (1964) are discussed in detail by Pullum and Scholz (2001: 29). Another advantage of surface-oriented constraint-based theories is that they are compatible with performance models (Sag and Wasow 2011). Psycholinguistic experiments have shown that hearers use information from phonology, morphology, syntax, semantics, information structure, and world knowledge in parallel (Crain and Steedman 1985; Tanenhaus et al. 1996). This is evidence against strong modularity in the sense of Fodor (1983) and therefore against theories that assume that the phonological realization of an utterance is the spell-out of a syntactic structure and the meaning is the interpretation of that syntactic structure. Rather, constraints from all descriptive levels interact and are used for parsing and disambiguation as soon as the relevant information is available to the hearer. In particular, interpretation is not delayed to the end of a phrase or a phase (see Chomsky 2008 on phases and Richards, this volume, for several different approaches to the transfer of linguistic objects to the semantics component that is assumed in Minimalism).
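The model-theoretic stance just described, with local constraints, admission of everything not explicitly ruled out, and gradedness as a count of violations, can be illustrated with a toy example. All constraint names and attributes below are invented and do not correspond to an actual grammar fragment:

```python
# Toy illustration of the model-theoretic stance (constraints and attribute
# names invented): constraints are local checks, everything not explicitly
# ruled out is admitted, and markedness is graded by counting violations.

def applicable(keys, sign):
    """A constraint is only evaluated on signs that instantiate its
    attributes, so partial structures (fragments) can still be admitted."""
    return all(k in sign for k in keys)

constraints = [
    (("det_pos", "noun_pos"), lambda s: s["det_pos"] < s["noun_pos"]),
    (("case",), lambda s: s["case"] in {"nom", "acc", "dat", "gen"}),
]

def violations(sign):
    """Count violated constraints; 0 means the sign is admitted."""
    return sum(1 for keys, check in constraints
               if applicable(keys, sign) and not check(sign))

fragment = {"det_pos": 0, "noun_pos": 1}                   # case not resolved
ill_formed = {"det_pos": 1, "noun_pos": 0, "case": "acc"}  # noun before det

assert violations(fragment) == 0    # nothing rules the fragment out
assert violations(ill_formed) == 1  # one violation: graded markedness
```

A generative-enumerative grammar would have nothing to say about the fragment at all, whereas here the fragment simply satisfies every constraint that applies to it, and degrees of markedness fall out of the violation count.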
9. Conclusion
This brief introduction to HPSG showed that typed feature descriptions can be used to formulate constraints on all levels of linguistic description. In particular, it was shown how to capture valence, dominance and precedence, long-distance dependencies, and the linking between syntax and semantics. It was discussed how lexical rules (themselves described with typed feature descriptions) can be used to analyze morphological phenomena, and how certain idioms can be analyzed lexically while others require a flexible conception of the lexicon that includes phrasal lexical items among the entities stored in the lexicon. Pointers to publications dealing with phonology and information structure were provided. Due to the uniform representation of all information as feature-value pairs, generalizations regarding roots, words, lexical rules, and phrases can be captured by inheritance hierarchies. The consistency of many of the analyses has been verified in small-, medium-, or large-scale computer implementations.
I hope to have shown that HPSG is a full-blown framework with worked-out analyses on every linguistic level. It can be paired with psycholinguistically adequate performance theories and is compatible with results from language acquisition research. Hence HPSG fulfills all the desiderata for a cognitively plausible linguistic theory.
Acknowledgements
I thank Emily M. Bender, Philippa Cook, Bjarne Ørsnes, Frank Richter, and an anonymous reviewer for comments on an earlier version of this paper.
10. References (selected)
Abeillé, Anne 1988 Parsing French with Tree Adjoining Grammar: Some linguistic accounts. In: Proceedings of COLING, 7−12. Budapest. Abeillé, Anne, and Yves Schabes 1989 Parsing idioms in Lexicalized TAG. In: Harold Somers and Mary McGee Wood (eds.), Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, 1−9. Manchester, England: Association for Computational Linguistics. Ackerman, Farrell, and Gert Webelhuth 1998 A Theory of Predicates. (CSLI Lecture Notes 76.) Stanford, CA: CSLI Publications. Alsina, Alex 1996 Resultatives: A joint operation of semantic and syntactic structures. In: Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG ’96 Conference, Rank Xerox, Grenoble. Stanford, CA: CSLI Publications. Asudeh, Ash, Mary Dalrymple, and Ida Toivonen 2008 Constructions with lexical integrity: Templates as the lexicon-syntax interface. In: Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG 2008 Conference, 68−88. Stanford, CA: CSLI Publications. Asudeh, Ash, and Ewan Klein 2002 Shape conditions and phonological context. In: Frank Van Eynde, Lars Hellan and Dorothee Beermann (eds.), The Proceedings of the 8th International Conference on Head-Driven Phrase Structure Grammar, 20−30. Stanford, CA: CSLI Publications. Barwise, Jon, and John Perry 1983 Situations and Attitudes. Cambridge, MA/London, England: MIT Press. Beavers, John, and Ivan A. Sag 2004 Coordinate ellipsis and apparent non-constituent coordination. In: Stefan Müller (ed.), Proceedings of the 11th International Conference on Head-Driven Phrase Structure Grammar, 48−69. Stanford, CA: CSLI Publications. Bender, Emily M. 2008 Radical non-configurationality without shuffle operators: An analysis of Wambaya. In: Stefan Müller (ed.), Proceedings of the 15th International Conference on Head-Driven Phrase Structure Grammar, 6−24. Stanford, CA: CSLI Publications.
Bildhauer, Felix 2008 Representing information structure in an HPSG grammar of Spanish. Dissertation, Universität Bremen. Bildhauer, Felix, and Philippa Cook 2010 German multiple fronting and expected topic-hood. In: Stefan Müller (ed.), Proceedings of the 17th International Conference on Head-Driven Phrase Structure Grammar, 68− 79. Stanford, CA: CSLI Publications. Bird, Steven 1995 Computational Phonology: A Constraint-Based Approach. (Studies in Natural Language Processing.) Cambridge: Cambridge University Press. Bird, Steven, and Ewan Klein 1994 Phonological analysis in typed feature systems. Computational Linguistics 20(3): 455− 491. Bod, Rens 2009 From exemplar to grammar: Integrating analogy and probability in language learning. Cognitive Science 33(4): 752−793.
Bouma, Gosse, Robert Malouf, and Ivan A. Sag 2001 Satisfying constraints on extraction and adjunction. Natural Language and Linguistic Theory 19(1): 1−65. Bouma, Gosse, and Gertjan van Noord 1998 Word order constraints on verb clusters in German and Dutch. In: Erhard W. Hinrichs, Andreas Kathol and Tsuneko Nakazawa (eds.), Complex Predicates in Nonderivational Syntax, 43−72. (Syntax and Semantics 30.) San Diego: Academic Press. Bresnan, Joan 1982 The passive in lexical theory. In: Joan Bresnan (ed.), The Mental Representation of Grammatical Relations, 3−86. (MIT Press Series on Cognitive Theory and Mental Representation.) Cambridge, MA/London: MIT Press. Bresnan, Joan, and Sam A. Mchombo 1995 The lexical integrity principle: Evidence from Bantu. Natural Language and Linguistic Theory 13: 181−254. Briscoe, Ted J., and Ann Copestake 1999 Lexical rules in constraint-based grammar. Computational Linguistics 25(4): 487−526. Carpenter, Bob 1992 The Logic of Typed Feature Structures. (Tracts in Theoretical Computer Science.) Cambridge: Cambridge University Press. Chomsky, Noam 1964 Degrees of grammaticalness. In: Jerry A. Fodor and Jerrold J. Katz (eds.), The Structure of Language, 384−389. Englewood Cliffs, NJ: Prentice-Hall. Chomsky, Noam 1971 Problems of Knowledge and Freedom. London: Fontana. Chomsky, Noam 2008 On phases. In: Robert Freidin, Carlos P. Otero and Maria Luisa Zubizarreta (eds.), Foundational Issues in Linguistic Theory. Essays in Honor of Jean-Roger Vergnaud, 133−166. Cambridge, MA: MIT Press. Chomsky, Noam 2013 Problems of projection. Lingua 130: 33−49. Copestake, Ann, and Ted J. Briscoe 1992 Lexical operations in a unification based framework. In: James Pustejovsky and Sabine Bergler (eds.), Lexical Semantics and Knowledge Representation, 101−119. (Lecture Notes in Artificial Intelligence 627.) Berlin/Heidelberg/New York, NY: Springer Verlag. Copestake, Ann, Daniel P. Flickinger, Carl J. Pollard, and Ivan A.
Sag 2005 Minimal recursion semantics: an introduction. Research on Language and Computation 4(3): 281−332. Crain, Stephen, and Mark J. Steedman 1985 On not being led up the garden path: The use of context by the psychological syntax processor. In: David R. Dowty, Lauri Karttunen and Arnold M. Zwicky (eds.), Natural Language Parsing, 320−358. (Studies in Natural Language Processing 23.) Cambridge, UK: Cambridge University Press. Crysmann, Berthold 2008 An asymmetric theory of peripheral sharing in HPSG: Conjunction reduction and coordination of unlikes. In: Gerhard Jäger, Paola Monachesi, Gerald Penn and Shuly Wintner (eds.), Proceedings of Formal Grammar 2003, Vienna, Austria, 47−62. Stanford, CA: CSLI Publications. Culicover, Peter W. 1999 Syntactic Nuts: Hard Cases, Syntactic Theory, and Language Acquisition, Vol. 1 of Foundations of Syntax. Oxford: Oxford University Press.
Culicover, Peter W., and Ray S. Jackendoff 2005 Simpler Syntax. Oxford: Oxford University Press. Dąbrowska, Ewa 2004 Language, Mind and Brain: Some Psychological and Neurological Constraints on Theories of Grammar. Washington, D.C.: Georgetown University Press. Davis, Anthony R. 2001 Linking by Types in the Hierarchical Lexicon. Stanford, CA: CSLI Publications. Davis, Anthony R., and Jean-Pierre Koenig 2000 Linking as constraints on word classes in a hierarchical lexicon. Language 76(1): 56−91. De Kuthy, Kordula 2002 Discontinuous NPs in German. (Studies in Constraint-Based Lexicalism 14.) Stanford, CA: CSLI Publications. Di Sciullo, Anna-Maria, and Edwin Williams 1987 On the Definition of Word. (Linguistic Inquiry Monographs 14.) Cambridge, MA/London, England: MIT Press. Donohue, Cathryn, and Ivan A. Sag 1999 Domains in Warlpiri. In: Sixth International Conference on HPSG−Abstracts. 04−06 August 1999, 101−106. Edinburgh. Dowty, David R. 1978 Governed transformations as lexical rules in a Montague Grammar. Linguistic Inquiry 9(3): 393−426. Dowty, David R. 1979 Word Meaning and Montague Grammar. (Synthese Language Library 7.) Dordrecht/Boston/London: D. Reidel Publishing Company. Dyvik, Helge, Paul Meurer, and Victoria Rosén 2005 LFG, Minimal Recursion Semantics and Translation. Paper presented at the LFG conference 2005. Egg, Markus 1999 Derivation and resolution of ambiguities in wieder-sentences. In: Paul J. E. Dekker (ed.), Proceedings of the 12th Amsterdam Colloquium, 109−114. Egg, Markus, Alexander Koller, and Joachim Niehren 2001 The constraint language for lambda structures. Journal of Logic, Language and Information 10(4): 457−485. Engdahl, Elisabet, and Enric Vallduví 1996 Information packaging in HPSG. In: Claire Grover and Enric Vallduví (eds.), Edinburgh Working Papers in Cognitive Science, Vol. 12: Studies in HPSG, chap. 1, 1−32. Scotland: Centre for Cognitive Science, University of Edinburgh.
Engelkamp, Judith, Gregor Erbach, and Hans Uszkoreit 1992 Handling linear precedence constraints by unification. In: Henry S. Thomson (ed.), 30th Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference, 201−208. Newark, DE: Association for Computational Linguistics. Also appeared as CLAUS-Report, No. 19, University of the Saarland. Fanselow, Gisbert 2001 Features, θ-roles, and free constituent order. Linguistic Inquiry 32(3): 405− 437. Fanselow, Gisbert 2003 Free constituent order: A Minimalist interface account. Folia Linguistica 37(1−2): 191−231. Fodor, Janet Dean 1998 Unambiguous triggers. Linguistic Inquiry 29(1): 1−36. Fodor, Jerry A. 1983 Modularity of Mind: An Essay on Faculty Psychology. Cambridge, MA: MIT Press.
Frank, Anette 1994 Verb Second by Lexical Rule or by Underspecification. Arbeitspapiere des SFB 340 No. 43, IBM Deutschland GmbH, Heidelberg. Frank, Anette, and Uwe Reyle 1995 Principle based semantics for HPSG. In: Steven P. Abney and Erhard W. Hinrichs (eds.), Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics, 9−16. Dublin: Association for Computational Linguistics. Gazdar, Gerald 1981 Unbounded dependencies and coordinate structure. Linguistic Inquiry 12: 155−184. Gazdar, Gerald, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag 1985 Generalized Phrase Structure Grammar. Cambridge, MA: Harvard University Press. Gazdar, Gerald, Geoffrey K. Pullum, Bob Carpenter, Ewan Klein, Thomas E. Hukari, and Robert D. Levine 1988 Category structures. Computational Linguistics 14(1): 1−19. Ginzburg, Jonathan, and Ivan A. Sag 2000 Interrogative Investigations: the Form, Meaning, and Use of English Interrogatives. (CSLI Lecture Notes 123.) Stanford, CA: CSLI Publications. Gold, Mark E. 1967 Language identification in the limit. Information and Control 10(5): 447−474. Goldberg, Adele E. 1995 Constructions. A Construction Grammar Approach to Argument Structure. (Cognitive Theory of Language and Culture.) Chicago/London: The University of Chicago Press. Goldberg, Adele E. 2004 But do we need universal grammar? Comment on Lidz et al. (2003). Cognition 94: 77−84. Goldberg, Adele E. 2006 Constructions at Work. The Nature of Generalization in Language. (Oxford Linguistics.) Oxford, New York: Oxford University Press. Goldberg, Adele E., Devin Casenhiser, and Nitya Sethuraman 2004 Learning argument structure generalizations. Cognitive Linguistics 15(3): 289−316. Goldberg, Adele E., and Ray S. Jackendoff 2004 The English resultative as a family of constructions. Language 80(3): 532−568. Green, Georgia M. 2011 Modelling grammar growth: Universal grammar without innate principles or parameters.
In: Robert Borsley and Kersti Börjars (eds.), Non-Transformational Syntax: Formal and Explicit Models of Grammar: A Guide to Current Models, 378−403. Oxford, UK/Cambridge, MA: Blackwell Publishing Ltd. Gunji, Takao 1986 Subcategorization and word order. In: William J. Poser (ed.), Papers from the Second International Workshop on Japanese Syntax, 1−21. Stanford, CA: CSLI Publications. Hinrichs, Erhard W., and Tsuneko Nakazawa 1989 Subcategorization and VP structure in German. In: Aspects of German VP Structure. (SfS-Report-01-93.) Eberhard-Karls-Universität Tübingen. Höhle, Tilman N. 1997 Vorangestellte Verben und Komplementierer sind eine natürliche Klasse. In: Christa Dürscheid, Karl Heinz Ramers and Monika Schwarz (eds.), Sprache im Fokus. Festschrift für Heinz Vater zum 65. Geburtstag, 107−120. Tübingen: Max Niemeyer Verlag. Höhle, Tilman N. 1999 An architecture for phonology. In: Robert D. Borsley and Adam Przepiórkowski (eds.), Slavic in Head-Driven Phrase Structure Grammar, 61−90. Stanford, CA: CSLI Publications.
Jackendoff, Ray S. 1997 The Architecture of the Language Faculty. (Linguistic Inquiry Monographs 28.) Cambridge, MA/London: MIT Press. Jackendoff, Ray S. 2008 Construction after construction and its theoretical challenges. Language 84(1): 8−28. Jacobs, Joachim 2008 Wozu Konstruktionen? Linguistische Berichte 213: 3−44. Jäger, Gerhard, and Reinhard Blutner 2003 Competition and interpretation: The German adverb wieder („again“). In: Ewald Lang, Claudia Maienborn and Cathrine Fabricius-Hansen (eds.), Modifying Adjuncts, 393−416. (Interface Explorations 4.) Berlin: Mouton de Gruyter. Johnson, Kent 2004 Gold’s theorem and cognitive science. Philosophy of Science 71(4): 571−592. Johnson, Mark 1988 Attribute-Value Logic and the Theory of Grammar. (CSLI Lecture Notes 14.) Stanford, CA: CSLI Publications. Kallmeyer, Laura, and Aravind K. Joshi 2003 Factoring predicate argument and scope semantics: Underspecified semantics with LTAG. Research on Language and Computation 1(1−2): 3−58. Kaplan, Ronald M. 1995 The formal architecture of Lexical-Functional Grammar. In: Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III and Annie Zaenen (eds.), Formal Issues in Lexical Functional Grammar, 7−27. (CSLI Lecture Notes 47.) Stanford, CA: CSLI Publications. Kathol, Andreas 1995 Linearization-based German syntax. Ph.D. thesis, Ohio State University. Kathol, Andreas 1999 Agreement and the syntax-morphology interface in HPSG. In: Robert D. Levine and Georgia M. Green (eds.), Studies in Contemporary Phrase Structure Grammar, 223−274. Cambridge, UK: Cambridge University Press. Kathol, Andreas 2000 Linear Syntax. New York, Oxford: Oxford University Press. Kay, Paul 2005 Argument structure constructions and the argument-adjunct distinction. In: Mirjam Fried and Hans C. Boas (eds.), Grammatical Constructions: Back to the Roots, 71−98. (Constructional Approaches to Language 4.) Amsterdam/Philadelphia: John Benjamins Publishing Co.
Keenan, Edward L., and Bernard Comrie 1977 Noun phrase accessibility and universal grammar. Linguistic Inquiry 8(1): 63−99. King, Paul 1994 An Expanded Logical Formalism for Head-Driven Phrase Structure Grammar. Arbeitspapiere des SFB 340 No. 59, Eberhard-Karls-Universität, Tübingen. King, Paul 1999 Towards truth in head-driven phrase structure grammar. In: Valia Kordoni (ed.), Tübingen Studies in Head-Driven Phrase Structure Grammar, 301−352. (Arbeitsberichte des SFB 340 No. 132.) Tübingen, Universität Tübingen. Kiss, Tibor 1992 Variable Subkategorisierung. Eine Theorie unpersönlicher Einbettungen im Deutschen. Linguistische Berichte 140: 256−293. Kiss, Tibor 1995 Infinite Komplementation. Neue Studien zum deutschen Verbum infinitum. (Linguistische Arbeiten 333.) Tübingen: Max Niemeyer Verlag.
Kiss, Tibor 2001 Configurational and relational scope determination in German. In: Walt Detmar Meurers and Tibor Kiss (eds.), Constraint-Based Approaches to Germanic Syntax, 141−175. (Studies in Constraint-Based Lexicalism 7.) Stanford, CA: CSLI Publications. Kiss, Tibor, and Birgit Wesche 1991 Verb order and head movement. In: Otthein Herzog and Claus-Rainer Rollinger (eds.), Text Understanding in LILOG, 216−242. (Lecture Notes in Artificial Intelligence 546.) Berlin/Heidelberg/New York, NY: Springer Verlag. Klein, Ewan 2000 A constraint-based approach to English prosodic constituents. In: 38th Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, 217−224. Hong Kong: Association for Computational Linguistics. Koenig, Jean-Pierre 1999 Lexical Relations. (Stanford Monographs in Linguistics.) Stanford, CA: CSLI Publications. Krieger, Hans-Ulrich, and John Nerbonne 1993 Feature-based inheritance networks for computational lexicons. In: Ted Briscoe, Ann Copestake and Valeria de Paiva (eds.), Inheritance, Defaults, and the Lexicon, 90−136. Cambridge, UK: Cambridge University Press. A version of this paper is available as DFKI Research Report RR-91−31. Also published in: Proceedings of the ACQUILEX Workshop on Default Inheritance in the Lexicon, Technical Report No. 238, University of Cambridge, Computer Laboratory, October 1991. Lascarides, Alex, and Ann Copestake 1999 Default representation in constraint-based frameworks. Computational Linguistics 25(1): 55−105. Levine, Robert D., and Thomas E. Hukari 2006 The Unity of Unbounded Dependency Constructions. (CSLI Lecture Notes 166.) Stanford, CA: CSLI Publications. Meurers, Walt Detmar 1995 Towards a semantics for lexical rules as used in HPSG. In: Glyn V. Morrill and Richard T. Oehrle (eds.), Proceedings of the Formal Grammar Conference Barcelona, Spain. Meurers, Walt Detmar 1999 Raising spirits (and assigning them case).
Groninger Arbeiten zur Germanistischen Linguistik (GAGL) 43: 173−226. Meurers, Walt Detmar 2000 Lexical Generalizations in the Syntax of German Non-Finite Constructions. Arbeitspapiere des SFB 340 No. 145, Eberhard-Karls-Universität, Tübingen. Meurers, Walt Detmar 2001 On expressing lexical generalizations in HPSG. Nordic Journal of Linguistics 24(2): 161−217. Müller, Gereon 2011 Regeln oder Konstruktionen? Von verblosen Direktiven zur sequentiellen Nominalreduplikation. In: Stefan Engelberg, Anke Holler and Kristel Proost (eds.), Sprachliches Wissen zwischen Lexikon und Grammatik, 211−249. (Institut für Deutsche Sprache, Jahrbuch 2010.) Berlin/New York, NY: de Gruyter. Müller, Stefan 1995 Scrambling in German − extraction into the Mittelfeld. In: Benjamin K. T’sou and Tom Bong Yeung Lai (eds.), Proceedings of the tenth Pacific Asia Conference on Language, Information and Computation, 79−83. City University of Hong Kong. Müller, Stefan 1999 Deutsche Syntax deklarativ. Head-Driven Phrase Structure Grammar für das Deutsche. (Linguistische Arbeiten 394.) Tübingen: Max Niemeyer Verlag.
Müller, Stefan 2002 Complex Predicates: Verbal Complexes, Resultative Constructions, and Particle Verbs in German. (Studies in Constraint-Based Lexicalism 13.) Stanford, CA: CSLI Publications. Müller, Stefan 2003 Mehrfache Vorfeldbesetzung. Deutsche Sprache 31(1): 29−62. Müller, Stefan 2004 Continuous or discontinuous constituents? A comparison between syntactic analyses for constituent order and their processing systems. Research on Language and Computation, Special Issue on Linguistic Theory and Grammar Implementation 2(2): 209−257. Müller, Stefan 2005a Zur Analyse der deutschen Satzstruktur. Linguistische Berichte 201: 3−39. Müller, Stefan 2005b Zur Analyse der scheinbar mehrfachen Vorfeldbesetzung. Linguistische Berichte 203: 297−330. Müller, Stefan 2006 Phrasal or lexical constructions? Language 82(4): 850−883. Müller, Stefan 2008 Depictive secondary predicates in German and English. In: Christoph Schroeder, Gerd Hentschel and Winfried Boeder (eds.), Secondary Predicates in Eastern European Languages and Beyond, 255−273. (Studia Slavica Oldenburgensia 16.) Oldenburg: BIS-Verlag. Müller, Stefan 2009 On predication. In: Stefan Müller (ed.), Proceedings of the 16th International Conference on Head-Driven Phrase Structure Grammar, 213−233. Stanford, CA: CSLI Publications. Müller, Stefan 2010 Persian complex predicates and the limits of inheritance-based analyses. Journal of Linguistics 46(3): 601−655. Müller, Stefan 2013a The CoreGram project: A brief overview and motivation. In: Denys Duchier and Yannick Parmentier (eds.), Proceedings of the Workshop on High-level Methodologies for Grammar Engineering (HMGE 2013), Düsseldorf, 93−104. Müller, Stefan 2013b Grammatiktheorie. (Stauffenburg Einführungen 20.) Tübingen: Stauffenburg Verlag, 2. edition. Müller, Stefan 2013c Unifying everything: Some remarks on Simpler Syntax, Construction Grammar, Minimalism and HPSG. Language 89(4): 920−950.
Müller, Stefan, and Bjarne Ørsnes 2013 Towards an HPSG analysis of object shift in Danish. In: Glyn Morrill and Mark-Jan Nederhof (eds.), Formal Grammar: 17th and 18th International Conferences, FG 2012, Opole, Poland, August 2012, Revised Selected Papers, FG 2013, Düsseldorf, Germany, August 2013. Proceedings, 69−89. (Lecture Notes in Computer Science 8036.) Berlin/ Heidelberg/New York, NY: Springer Verlag. Newmeyer, Frederick J. 2005 Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press. Nordlinger, Rachel 1998 Constructive Case: Evidence from Australia. (Dissertations in Linguistics.) Stanford, CA: CSLI Publications. Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow 1994 Idioms. Language 70(3): 491−538.
27. HPSG − A Synopsis

Orgun, Cemil Orhan 1996 Sign-based morphology and phonology. Ph.D. thesis, University of California, Berkeley.
Paggio, Patrizia 2005 Representing information structure in a formal grammar of Danish. In: Proceedings of the 2nd International Workshop on Logic and Engineering of Natural Language Semantics (LENLS2005). Kitakyushu, Japan. June 13−14.
Pollard, Carl J. 1996 On head non-movement. In: Harry Bunt and Arthur van Horck (eds.), Discontinuous Constituency, 279−305. (Natural Language Processing 6.) Berlin/New York, NY: Mouton de Gruyter. Published version of a 1990 manuscript.
Pollard, Carl J. 1999 Strong generative capacity in HPSG. In: Gert Webelhuth, Jean-Pierre Koenig and Andreas Kathol (eds.), Lexical and Constructional Aspects of Linguistic Explanation, 281−298. (Studies in Constraint-Based Lexicalism 1.) Stanford, CA: CSLI Publications.
Pollard, Carl J., and Ivan A. Sag 1987 Information-Based Syntax and Semantics. (CSLI Lecture Notes 13.) Stanford, CA: CSLI Publications.
Pollard, Carl J., and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. (Studies in Contemporary Linguistics.) Chicago, IL/London: The University of Chicago Press.
Przepiórkowski, Adam 1999 On case assignment and “adjuncts as complements”. In: Gert Webelhuth, Jean-Pierre Koenig and Andreas Kathol (eds.), Lexical and Constructional Aspects of Linguistic Explanation, 231−245. (Studies in Constraint-Based Lexicalism 1.) Stanford, CA: CSLI Publications.
Pullum, Geoffrey K. 1977 Word order universals and grammatical relations. In: Peter Cole and Jerrold M. Sadock (eds.), Grammatical Relations, 249−277. (Syntax and Semantics 8.) New York, San Francisco, London: Academic Press.
Pullum, Geoffrey K. 2003 Learnability: Mathematical aspects. In: William J. Frawley (ed.), Oxford International Encyclopedia of Linguistics, 431−434. Oxford: Oxford University Press, 2nd edition.
Pullum, Geoffrey K., and Barbara C. Scholz 2001 On the distinction between generative-enumerative and model-theoretic syntactic frameworks. In: Philippe de Groote, Glyn Morrill and Christian Retoré (eds.), Logical Aspects of Computational Linguistics: 4th International Conference, 17−43. (Lecture Notes in Computer Science 2099.) Berlin/Heidelberg/New York, NY: Springer Verlag.
Reape, Mike 1994 Domain union and word order variation in German. In: John Nerbonne, Klaus Netter and Carl J. Pollard (eds.), German in Head-Driven Phrase Structure Grammar, 151−198. (CSLI Lecture Notes 46.) Stanford, CA: CSLI Publications.
Richter, Frank 2004 A mathematical formalism for linguistic theories with an application in Head-Driven Phrase Structure Grammar. Phil. dissertation (2000), Eberhard-Karls-Universität Tübingen.
Richter, Frank 2007 Closer to the truth: A new model theory for HPSG. In: James Rogers and Stephan Kepser (eds.), Model-Theoretic Syntax at 10 − Proceedings of the ESSLLI 2007 MTS@10 Workshop, August 13−17, 101−110. Dublin: Trinity College Dublin.
Richter, Frank, and Manfred Sailer 2004 Basic concepts of lexical resource semantics. In: Arnold Beckmann and Norbert Preining (eds.), ESSLLI 2003 − Course Material I, 87−143. (Collegium Logicum 5.) Wien: Kurt Gödel Society.
Richter, Frank, and Manfred Sailer 2008 Simple trees with complex semantics: On epistemic modals and strong quantifiers. In: Maribel Romero (ed.), What Syntax Feeds Semantics?, 70−81. Konstanz: Fachbereich Sprachwissenschaft, Universität Konstanz.
Richter, Frank, and Manfred Sailer 2009 Phraseological clauses as constructions in HPSG. In: Stefan Müller (ed.), Proceedings of the 16th International Conference on Head-Driven Phrase Structure Grammar, University of Göttingen, Germany, 297−317. Stanford, CA: CSLI Publications.
Riehemann, Susanne Z. 1998 Type-based derivational morphology. Journal of Comparative Germanic Linguistics 2(1): 49−77.
Rogers, James 1997 “Grammarless” phrase structure grammar. Linguistics and Philosophy 20: 721−746.
Rogers, James 1998 A Descriptive Approach to Language-Theoretic Complexity. (Studies in Logic, Language and Information.) Stanford, CA: CSLI Publications.
Sag, Ivan A. 1997 English relative clause constructions. Journal of Linguistics 33(2): 431−484.
Sag, Ivan A. 2007 Remarks on locality. In: Stefan Müller (ed.), Proceedings of the 14th International Conference on Head-Driven Phrase Structure Grammar, 394−414. Stanford, CA: CSLI Publications.
Sag, Ivan A. 2010 English filler-gap constructions. Language 86(3): 486−545.
Sag, Ivan A., and Thomas Wasow 2011 Performance-compatible competence grammar. In: Robert Borsley and Kersti Börjars (eds.), Non-Transformational Syntax: Formal and Explicit Models of Grammar: A Guide to Current Models, 359−377. Oxford, UK/Cambridge, MA: Blackwell Publishing Ltd.
Sailer, Manfred 2000 Combinatorial semantics and idiomatic expressions in Head-Driven Phrase Structure Grammar. Dissertation, Eberhard-Karls-Universität Tübingen.
Shieber, Stuart M. 1986 An Introduction to Unification-Based Approaches to Grammar. (CSLI Lecture Notes 4.) Stanford, CA: CSLI Publications.
Simpson, Jane 1983 Resultatives. In: Lori S. Levin, Malka Rappaport and Annie Zaenen (eds.), Papers in Lexical Functional Grammar. Indiana University Linguistics Club. Reprint: Simpson (2005).
Simpson, Jane 2005 Resultatives. In: Miriam Butt and Tracy Holloway King (eds.), Lexical Semantics in LFG, 149−161. Stanford, CA: CSLI Publications.
Soehn, Jan-Philipp, and Manfred Sailer 2008 At first blush on tenterhooks. About selectional restrictions imposed by non-heads. In: Gerhard Jäger, Paola Monachesi, Gerald Penn and Shuly Wintner (eds.), Proceedings of Formal Grammar 2003, Vienna, Austria, 149−161. Stanford, CA: CSLI Publications.
Tanenhaus, Michael K., Michael J. Spivey-Knowlton, Kathleen M. Eberhard, and Julie C. Sedivy 1996 Using eye movements to study spoken language comprehension: Evidence for visually mediated incremental interpretation. In: Toshio Inui and James L. McClelland (eds.), Information Integration in Perception and Communication, 457−478. (Attention and Performance XVI.) Cambridge, MA: MIT Press.
Tomasello, Michael 1995 Language is not an instinct. Cognitive Development 10(1): 131−156.
Tomasello, Michael 2003 Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.
Uszkoreit, Hans 1987 Word Order and Constituent Structure in German. (CSLI Lecture Notes 8.) Stanford, CA: CSLI Publications.
Veenstra, Mettina Jolanda Arnoldina 1998 Formalizing the minimalist program. Ph.D. thesis, Groningen.
Verspoor, Cornelia Maria 1997 Contextually-dependent lexical semantics. Ph.D. thesis, University of Edinburgh.
von Stechow, Arnim 1996 The different readings of wieder “again”: A structural account. Journal of Semantics 13(2): 87−138.
Wechsler, Stephen Mark 1995 The Semantic Basis of Argument Structure. (Dissertations in Linguistics.) Stanford, CA: CSLI Publications.
Wechsler, Stephen Mark 1997 Resultative predicates and control. In: Ralph C. Blight and Michelle J. Moosally (eds.), Texas Linguistic Forum 38: The Syntax and Semantics of Predication. Proceedings of the 1997 Texas Linguistics Society Conference, 307−321. Austin, Texas: University of Texas Department of Linguistics.
Wechsler, Stephen Mark, and Bokyung Noh 2001 On resultative predicates and clauses: Parallels between Korean and English. Language Sciences 23(4): 391−423.
Wechsler, Stephen Mark, and Larisa Zlatić 2003 The Many Faces of Agreement. (Stanford Monographs in Linguistics.) Stanford, CA: CSLI Publications.
Wilcock, Graham 2001 Towards a discourse-oriented representation of information structure in HPSG. In: 13th Nordic Conference on Computational Linguistics, Uppsala, Sweden.
Stefan Müller, Berlin (Germany)
28. Construction Grammar

1. Introduction
2. Basic notions
3. Notational conventions
4. Explanatory potential beyond traditional syntactic analyses
5. Concluding remarks
6. References (selected)
Abstract

The article presents an overview of the fundamental features that characterize Construction Grammar as a distinct approach to linguistic analysis. The central notion is that of a construction, defined as a theoretical entity and the basic unit of analysis, which allows both for a holistic view of linguistic patterning and for analyzing the internal properties of larger patterns. The article discusses the theory’s conceptual underpinnings, illustrates the basics of its representational apparatus, and provides a brief demonstration of the ways in which a constructional analysis can be carried out. The constructional approach is shown to be expanding into several areas beyond its original aim (syntactic description), including language change, typology, acquisition, and corpus and computational applications.
1. Introduction

1.1. Conceptual underpinnings

Construction Grammar (CxG) is a theoretical approach in which generalizations about linguistic structure are formulated in terms of constructions, i.e., conventionalized clusters of features (syntactic, prosodic, pragmatic, semantic, textual, etc.) that recur as further indivisible associations between form and meaning (meaning is broadly understood, see below). The constructional approach developed out of a confluence of interests − linguistic, cognitive, anthropological, philosophical, computational − which all revolved around the idea that linguistic form is inextricably bound up with its meaning and its communicative function, and that this connection must be the basis for any descriptively and explanatorily adequate theory of linguistic structure. The conceptual origins of CxG can be traced most directly to Fillmore’s Case Grammar, a case-role based approach to syntactic analysis laid out in his seminal (1968) paper.

The goal of CxG is to account for the defining properties of all types of linguistic expressions. This is based on the assumption that any kind of linguistic structure − whether considered regular or relatively unusual − has equal informational value in our quest for understanding the nature of language as a particular kind of cognitive and social behavior. The explicitly stated objective is to study language in its totality, without making any distinction between core and periphery or assuming that certain structures
are inherently more deserving of an analyst’s attention. The justification for such an approach can be articulated in terms of the following two hypotheses: (i) a model that can handle complicated, out-of-the-ordinary patterns can surely handle the common ones as well and (ii) the study of unusual patterning can also help us understand the nature of grammar organization in general.

A research program of this kind necessarily calls for a relatively complex basic unit of analysis, one that can accommodate features of various kinds (syntactic, morphological, semantic, pragmatic, etc.) in a single integrated and internally structured whole. Consistent with this requirement is the idea of a sign as a symbolic unit that represents a conventional association between form and meaning/function; in CxG, the sign is called construction and applies to all types of linguistic entities. Form in constructions may refer to any combination of syntactic, morphological, or prosodic features and meaning/function is understood in a broad sense that includes reference to lexical semantics, event structure, diathesis, pragmatics, and discourse structure (a detailed explication and exemplification of the combinatorial possibilities can be found in Fried and Östman 2004b: 18−22).

A grammar in this view consists of a repertoire of constructions, which are organized in networks of overlapping and complementary patterns. The central importance of constructions is motivated by two empirical observations: (i) even semantically opaque expressions (idioms) may share certain aspects of regular syntactic structure with fully productive expressions (Fillmore, Kay and O’Connor 1988) and (ii) even seemingly transparent syntactic structures may involve all sorts of unpredictable constraints that cannot be simply derived from the syntax alone (cf. Fillmore’s 1986a analysis of English conditionals).
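To make the sign notion concrete, the following toy sketch encodes a construction as a pairing of form features with meaning/function features. The class, the attribute names, and the let alone values are illustrative assumptions only (the pragmatic gloss loosely paraphrases the scalar analysis of Fillmore, Kay and O’Connor 1988); they are not the official CxG attribute inventory or notation.

```python
# A construction as a sign: a conventional pairing of form and meaning/function.
# All field names and feature values below are invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Construction:
    name: str
    form: dict     # e.g. syntactic, morphological, prosodic features
    meaning: dict  # e.g. semantic, pragmatic, discourse-functional features


# A rough stand-in for the 'let alone' construction discussed in
# Fillmore, Kay and O'Connor (1988):
LET_ALONE = Construction(
    name="let alone",
    form={"syntax": "X let alone Y", "prosody": "paired focus accents"},
    meaning={"pragmatics": "Y is a stronger point on a scale than X"},
)
```

The point of the encoding is that neither dictionary alone identifies the construction; only the conventional pairing of the two does.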
A fundamental claim of construction grammarians is the following: to ignore either of these two observations would mean to miss important generalizations about the nature of linguistic patterning and the nature of speakers’ linguistic knowledge. The work of different constructional analysts may emphasize one or the other perspective, but systematic research of the last thirty years has shown that both perspectives are inextricably interconnected and that the notion of idiomaticity requires a much more nuanced approach than the traditional division based essentially on the semantic non-compositionality of particular expressions.

The difficulties of drawing the line between what should count as an idiom in the traditional sense and a productive syntactic pattern have been addressed explicitly in numerous articles (e.g. Fillmore 1989 on the expression type the greener the better, Lambrecht 1988 on There’s a farmer had a dog, Kay and Fillmore 1999 on What’s Bill doing inspecting the car?). But the challenge in addressing this distinction can also manifest itself in more subtle ways, involving expressions that are syntactically quite simple. For example, blue ink, blue sweater, or blue paper are evidently instances of a regular modification structure [Mod − N]; semantically, the modifier slot could be filled not only by other color terms but also by any other semantically compatible adjective, and the meaning would be a composition of the meaning of the adjective and the meaning of the noun, the former restricting the eligible referents of the latter. Thus, the meaning of the actual phrases can be figured out if we know what blue means and what ink, sweater, or paper mean outside of any context. In contrast, the meaning of the expression blue moon is not predictable from the meanings of its components; it is an idiom in the traditional sense.
At the same time, it shares the same syntactic structure [Mod − N], and even in other ways the phrase follows the behavior of English NPs (e.g., it takes an
article). There just is no flexibility with respect to the fillers of the two syntactic slots, and the expression thus falls into the same class of expressions as the more familiar VP idioms (spill the beans, hit the road, etc.) in terms of its semantic non-compositionality coupled with a transparent syntactic structure shared with productive instances of that structure. But then in expressions like blue eyes, we detect features of both. On the one hand, this phrase still has the same syntactic structure and even a certain degree of productivity (blue/hazel/green/dark/etc. eyes) shared with the semantically fully compositional expressions. On the other, it has idiosyncratic properties of its own: the interpretation of the whole is not quite the same as with the combinations blue ink/sweater/paper in that the color modifier conventionally applies only to a particular part of the object denoted by the noun (the iris), and in this metonymic meaning, the noun slot is dedicated to a particular lexical item (eye/eyes). In cases like these, it becomes impossible to apply a simple binary categorization into (lexical) idioms vs. fully transparent syntactic combinations of words. The point of a constructional approach is to allow us to treat all these different types of expressions as related along identifiable shared properties, while also keeping in focus the dimensions in which they constitute distinct grammatical entities. In section 2.4, I will illustrate how this can be done.

Especially in early CxG analyses, the commitment to describing all of language manifested itself in a strong focus on studying unusual grammatical patterns, with little or no systematic attention devoted to more ordinary ones (say, basic transitive sentences, passives, wh-structures, etc.). The perception, within some schools of thought, that CxG is good only for studying idioms may have its origin, at least in part, in this early bias toward the conspicuously irregular.
However, such an assessment either reduces the scope of CxG research to a domain delimited by a very restricted understanding of idiomaticity or, more likely, is based on a fundamental misunderstanding of the nature of the approach as a whole. At any rate, since the early preoccupation with the nature of standard idioms, CxG practitioners have turned their attention to a broad range of topics in grammatical description and the model has developed into a robust, well-established theoretical tool for analyzing and representing linguistic behavior in general. Moreover, the approach is now used and further developed in several additional areas of research that were not part of the original design. Among them are language acquisition, typological studies, language change, text linguistics and certain strands of interactional linguistics, and most recently also in computational linguistics for modeling language evolution and language understanding. I will comment on each of these areas in section 4, after first explaining the inner workings of CxG in sections 2 and 3.
1.2. Basic assumptions, methods, research goals

CxG shares with other grammatical approaches − whether cognitive or mainstream generative − the assumption that language is a learnable cognitive system that is internally structured and provides means for producing and interpreting novel utterances. Beyond this, however, the methodology and research goals associated with CxG are in marked contrast with the generative approach and are shaped by particular assumptions concerning the relationship between grammar and lexicon, the sources of explanation, and the nature of empirical data.
First off, CxG, like other cognitively oriented approaches, does not draw a categorical distinction between lexicon and grammar, thereby providing the necessary analytic and representational flexibility for accommodating the amply documented gradience in categorial distinctions. This feature is inherent in extending the idea of signs from the domain of words (the lexicon), where it has resided traditionally, into the domain of grammatical structure. While CxG does not reject the intuitive and pre-theoretically useful notions of grammatical vs. lexical, the conceptual basis and the architecture of the model do not force the analyst to impose any arbitrary boundaries between what counts as a lexical item and what is a lexically independent (morpho)syntactic structure. Instead, lexical items on the one hand and highly schematic, abstract grammatical patterns on the other are seen as two poles of a continuum along which much of our linguistic knowledge can be arranged. Likewise, linguistic categories are treated as functional prototypes, as specific focal points along a continuum of categoriality. Nonetheless, construction grammarians find it convenient and useful to use the traditional terms lexical and grammatical for general reference to the endpoints of the continuum (or, more precisely, to typical, uncontroversial examples of the endpoints); the labels thus are not intended as theoretical claims expressing a categorical distinction. Put differently, lexicon and syntax are not to be thought of as something in addition to constructions. While we may conceptualize language as encompassing these two domains (mostly for practical expository purposes), they both consist of nothing but constructions of various kinds and of varying degrees of schematicity. The way constructions may differ in the degree of specificity is illustrated in Table 28.1.
The examples are listed in order from fully specific lexical items, in which nothing is left to variation and which may consist of a single word or be multiword units (i.e., lexical items, as linguistic objects, are constructions as well), to fully schematic syntactic or morphological patterns, in which perhaps only (morpho)syntactic or lexical categories, their structural and linear position, and their mutual relationship need be specified explicitly. The partially schematic constructions form a continuum between these two poles in that some part of each construction is fixed and the rest is schematic.

(1) Tab. 28.1: Examples of English constructions on the lexicon-grammar continuum

Degrees of schematicity − Examples:
− fully filled and fixed: blue moon, by and large, children, ink, blue
− fully filled and partially flexible: go[tense] postal, hit[tense] the road
− partially filled: the [AdjP] (e.g. the rich/hungry/young); [time expression] ago (e.g. six days/beers ago); adj-ly (e.g. richly, happily)
− fully schematic: [V NP]VP, [NP VP]S; stemV-PAST (e.g. walk-ed, smell-ed)
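As a rough computational illustration of the continuum in Tab. 28.1 (not part of the CxG formalism), constructions can be modeled as sequences of slots, each either a fixed string or an open category label; a construction’s position on the continuum then corresponds to the share of its slots that remain open. The representation and all names below are invented for the sketch.

```python
# Toy encoding of the lexicon-grammar continuum: a construction is a list of
# slots; a slot is either FIXED (a concrete word) or SCHEMATIC (a category
# label that any matching item can fill). Illustrative only.

FIXED, SCHEMATIC = "fixed", "schematic"


def slot(kind, value):
    return {"kind": kind, "value": value}


CONSTRUCTIONS = {
    # fully filled and fixed: nothing is left to variation
    "blue_moon": [slot(FIXED, "blue"), slot(FIXED, "moon")],
    # partially filled: 'ago' is fixed, the time expression is open
    "time_ago": [slot(SCHEMATIC, "TimeExpr"), slot(FIXED, "ago")],
    # fully schematic: only categories are specified
    "vp": [slot(SCHEMATIC, "V"), slot(SCHEMATIC, "NP")],
}


def schematicity(cxn):
    """Fraction of open slots: 0.0 = fully fixed, 1.0 = fully schematic."""
    open_slots = sum(1 for s in cxn if s["kind"] == SCHEMATIC)
    return open_slots / len(cxn)
```

On this toy measure, blue moon scores 0.0, [time expression] ago scores 0.5, and the fully schematic VP pattern scores 1.0, mirroring the ordering of the table.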
Second, CxG does not work with any notion of an a priori established universal structure that would be the basis of every grammatical pattern; instead, it seeks explanations for any universal as well as language-specific properties in certain combinatorial strategies that are based on general cognitive principles and regular communicative strategies. The relevant cognitive principles include categorization, focus of attention, types of reasoning and inferencing strategies (including metonymy and metaphor), associative memory,
planning ahead, etc. The communication-based explanations are concerned with information flow, the nature of speaker-hearer relations, subjective/affective involvement, principles of politeness, text-cohesion strategies, etc.

Third, in rejecting the idea that the true nature of language can be best studied and grasped on the basis of an idealized subset of core linguistic expressions, CxG makes a commitment to exploring language in its authentic manifestations and puts emphasis on empirically grounded analysis. Methodologically this translates into an inductively oriented approach: a search for recurring patterns about which we can formulate adequate generalizations. The usage-based aspirations of CxG are also reflected in its attention to issues of context in grammatical descriptions, analyzing data that can shed light on the role of discourse structure and the socio-pragmatic dimension of linguistic organization.

Finally, constructions are not only the basic units of linguistic analysis and representation, but are also taken to be hypotheses about speakers’ linguistic knowledge. All of CxG research is motivated by one basic general question (whether stated as such explicitly, or just tacitly assumed): what constitutes speakers’ native-like knowledge and understanding of any given linguistic structure?
1.3. Frame Semantics

The focus on incorporating the semantic and pragmatic dimension of linguistic structure is most visibly manifested in the semantic sister theory of CxG known as Frame Semantics (e.g., Fillmore 1982, 1984; Fillmore and Atkins 1992; Atkins, Fillmore and Johnson 2003), as well as in the semantic orientation of the construction grammarians’ interest in pursuing computational applications. Frame Semantics is concerned with the semantics of understanding. Linguistically relevant semantic information is schematized in interpretive frames (Fillmore 1982), which are structured representations of speakers’ conceptualizations of the experienced world and contain information, organized in clusters of frame elements (FEs), that reflects speakers’ native understanding of what the lexical item means and how it can be used in context. A single linguistic expression may be associated with multiple frames and, conversely, a single frame may be shared by multiple expressions; each such expression, then, represents a particular conceptualization of certain parts of the larger background scene. The frame also carries information about the conventional expression of the syntactically relevant participants as they manifest themselves in the syntactic organization of sentences (see section 3.4). This is a unique feature of Frame Semantics as a lexical semantic model: the built-in connection between the lexical meaning of an item and the canonical (morpho)syntactic expression of its frame elements, which, again, may differ in their degree of specificity/schematicity.
Taking predicates as an example, some frames may only specify the number and type of event roles and those unify with general linking patterns (linking constructions, discussed in section 3.3), which give them an appropriate syntactic form (what such very general frame patterns might be is explored in Fried 2005), while other frames have to make an explicit connection to a particular form (these kinds of frame-syntax associations are analyzed in Fillmore and Atkins 1992, inter alia).
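The idea that one frame can be shared by several lexical units, each with its own canonical syntactic realization of the frame elements, can be sketched in a few lines. The sketch uses Fillmore’s classic commercial-transaction scene; the class names, the FE inventory, and the linking labels are simplifying assumptions for illustration, not the actual data model of a resource like FrameNet.

```python
# Illustrative sketch of a Frame Semantics entry: a frame groups frame
# elements (FEs); a lexical unit pairs the frame with a canonical
# (morpho)syntactic realization of those FEs. Names are simplified stand-ins.
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    frame_elements: list  # FE names reflecting the background scene


@dataclass
class LexicalUnit:
    lemma: str
    frame: Frame
    linking: dict = field(default_factory=dict)  # FE -> grammatical function


COMMERCE = Frame("Commercial_transaction", ["Buyer", "Seller", "Goods", "Money"])

# Two verbs sharing one frame, each profiling the scene differently:
buy = LexicalUnit("buy", COMMERCE, {"Buyer": "subject", "Goods": "object"})
sell = LexicalUnit("sell", COMMERCE, {"Seller": "subject", "Goods": "object"})
```

Here buy and sell point to the very same frame object but foreground different participants as subject, mirroring the claim that a single frame may be shared by multiple expressions, each conceptualizing parts of the larger scene.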
CxG and Frame Semantics together offer a model for representing lexico-grammatical networks in which the relative stability of grammatical form does not conflict with the relative flexibility of meaning and expressive richness, and vice versa.
1.4. Construction Grammar(s) and related models of language

Berkeley-based CxG is not a monolithic framework, and constructional work is associated with several recognizable strands; the differences among them, however, are more a matter of focus or emphasis than of any fundamental divergence. Its original conception, developed by Ch. J. Fillmore and his associates, is characterized by its focus on issues of phrasal, clausal, and sentential syntax (e.g. Fillmore 1986a, 1988, 1999; Michaelis and Lambrecht 1996; papers in Fried and Östman 2004a; Lambrecht and Lemoine 2005) and by accepting the importance of a more or less consistent formal notation as a way of maintaining analytic rigor. This is also the strand that forms the basis of the present chapter. Other well-known variants are represented by Goldberg’s work, which is dedicated predominantly to issues of argument structure, primarily in the context of language acquisition, and by Croft’s (2001) Radical Construction Grammar, motivated primarily by typological issues. For a thorough overview of the breadth in constructional thinking, the interested reader is referred to the encyclopedic compendium by Hoffmann and Trousdale (2013).

There is also a good degree of compatibility between CxG and other theories that work with (some version of) the notion of construction. For example, much of the CxG notational system for representing constructions is similar to the representational practices in Head-Driven Phrase Structure Grammar (HPSG; Müller, this volume); both also use elaborate inheritance networks for capturing relationships among constructions. Some construction grammarians explicitly adopt the HPSG formalism and analytic principles (e.g. Ruppenhofer and Michaelis 2010).
However, there are also significant differences between the two models, particularly in their fundamental focus and articulated goals. HPSG does not share the explicitly stated concern of Fillmorean CxG for integrating the semantic, cognitive, and interactional dimensions in constructional representations, nor does it include provisions for incorporating the insights of Frame Semantics concerning the interplay between word meaning and (morpho)syntactic structure: HPSG takes the formal pole as central and as the starting point of analysis, while CxG recognizes function and meaning as a crucial source of insight concerning the shape of linguistic expressions. The two approaches also differ with respect to more specific phenomena, such as endocentricity (CxG does not operate with the concept known as the Head Feature Principle) and locality (CxG does not make use of this concept and its logical consequence − the positing of unary branching structures), as pointed out by Ruppenhofer and Michaelis (2010).

Another approach that overlaps with many of the basic characteristics of CxG is Cognitive Grammar (Langacker 1987). In general, Cognitive Grammar emphasizes the indispensability of the conceptual dimension of constructions as the central element in linguistic structure, rather than their grammatical form or the details of the mapping between the two poles. Goldberg’s and Croft’s constructional approaches show some convergence with the Cognitive Grammar tradition. Fillmorean CxG, on the other hand, does not accord the conceptual layer a privileged status relative to the formal and/or communicative dimensions.
2. Basic notions

2.1. Constructions and constructs

The notion and term construction has the status of a theoretical entity in CxG: it is defined as a symbolic sign that provides a general, multidimensional blueprint for licensing well-formed linguistic expressions and that applies to units of any size or internal complexity (morphological units, words, phrases, clauses, etc.). Constructions capture generalizations about conventional linguistic knowledge that cannot be derived or predicted on the basis of knowing any other pieces of a given language. It is also crucial to keep in mind that constructions are distinct from constructs (or instances of constructions, in another terminological practice): the former are abstractions, “pieces of grammar” (Kay and Fillmore 1999: 2), while constructs are physical realizations of constructions in actual discourse. A construction is thus a generalization over constructs of the same type. To illustrate, Table 28.2 lists examples of constructs and the corresponding constructions that license them.

(2) Tab. 28.2: Some English constructions and corresponding constructs

− Passive: be greeted by the Prime Minister
− Object-Control Co-instantiation: persuade the children to come
− Modification: new candy, tall tree, large houses
− Plural Noun: students, cars, beers
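The construction/construct distinction can be made concrete with a deliberately naive licensing check: the construction is the abstract pattern, and a construct is a concrete token that the pattern licenses. The representation, the tiny lexicon, and the category-matching test below are invented stand-ins for the much richer feature unification of actual CxG analyses.

```python
# Toy model of licensing: a construction (abstract pattern) vs. a construct
# (a concrete token licensed by it). Categories and lexicon are illustrative.

PLURAL_NOUN = {"name": "Plural Noun", "pattern": ["Nstem", "PL"]}

LEXICON = {"student": "Nstem", "car": "Nstem", "s": "PL"}


def licenses(construction, token_parts):
    """A construct is licensed if its parts match the pattern's categories."""
    categories = [LEXICON.get(part) for part in token_parts]
    return categories == construction["pattern"]
```

On this sketch, the single abstract Plural Noun pattern licenses the constructs students and cars alike, which is the sense in which a construction is a generalization over constructs of the same type.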
CxG recognizes the fact that constructional specifications also differ from each other according to the function they serve and to the type of linguistic entity they describe. E.g. lexical constructions capture the properties of lexical items (persuade, children); linking constructions specify conventional patterns of argument realization (Passive, Active Transitive, Motion, etc.); other constructions make generalizations about constituent structure (Plural Noun, Modification, Object-Control); there may also be linearization constructions for capturing word order patterns that are independent of dominance relations (such as, perhaps, verb-second phenomena, clitic clusters, topic-focus articulation, etc.).
2.2. Constructional meaning

It is a definitional property of any construction to be more than just a sum of its parts and thus to have a meaning that cannot be derived compositionally from the properties of its constituents. Constructions are signs (recall the explication in section 1.1) and, therefore, never compositional. This characterization seems to have created some confusion among non-constructionists about the nature of constructional meaning and the status of non-compositionality in CxG. The key to understanding these definitions lies in returning to the early explications of what constitutes a construction and in making the distinction between constructions and constructs.

Constructions are defined as objects of syntactic representation that “are assigned one or more conventional functions [emphasis mine] (…) together with whatever is conventionalized about its contribution to the meaning or the use of structure containing it” (Fillmore 1988: 36). This definition suggests a distinction between the function of a construction as a piece of grammar and the meaning of a linguistic expression (i.e. a construct); the distinction is made quite explicit in a subsequent wording of the definition, which states that a construction is “dedicated to a particular function in the creation of meaningful utterances in the language” (Fillmore 1989: 18). It follows that constructions are not necessarily expected to have a meaning in the sense of specific semantic content. Some do, as was later shown by Goldberg (1995) for particular types of argument structure constructions, but such examples represent only one type of construction: those which manipulate the inherent meaning of predicates by elaborating their valence structure in particular ways. However, not all syntactic patterns involve meaning in the same sense as certain (though not all) argument structure manipulations do. Such patterns include the Modification construction (instantiated by new car, blue ink), Subject-Predicate (Dogs bark, The boat capsized), the VP construction (reads poetry, found a mistake), or, say, constructions that capture the linear organization of sentences into field slots determined by information-structure considerations in languages with flexible word order.
Yet, these syntactic patterns illustrate precisely the phenomena that motivated the constructional definitions quoted above: what is conventional about these patterns is their syntactic function, such as government, grammatical relation, determination, modification, agreement, headedness, or functions motivated by information flow. These constructions represent dependencies and configurations that are accepted by the speakers as regular grammar expressing constituent structure or linear organization, although once we know what the constructs licensed by them consist of lexically, the semantic content of the whole expression (i.e. a construct, not construction!) may very well be obtained compositionally, by adding up the meanings of the words that instantiate the grammatical pattern in discourse. What is constructional, i.e. in some sense non-compositional, about these kinds of configurations may be, for example, the functional relationship between the constituents. Thus the meaning of (one type of) Modification construction could be labeled as “restrict reference of the noun by the property expressed by the modifier”. It is not an inherent and automatically projected feature of nouns that their referential range will be restricted and that the restriction will take this particular form, nor is it necessarily a feature of every adjective that it will be used as a modifier. Putting the two next to each other will not alert speakers to interpret them as a modification pattern unless the speakers operate with the conventional, shared knowledge that such a pattern exists and that it provides an interpretive clue as to the mutual relation between the two items. Similarly, the constructional status of the English Subject-Predicate pattern comes from the fact that it encodes a particular, otherwise unpredictable event-participant relation, which is different, for example, from an event-participant relation encoded by a verb phrase. 
Put differently, it is not an inherent feature of nouns or noun phrases that they serve the subject
function; it is only by virtue of appearing in a specific, convention-based combination (construction) with a finite verb that they acquire this grammatical role. What, thus, constitutes the meaning of the pattern [noun − finite verb] is the fact that the combination expresses a subject−predicate relation. It is in this rather abstract sense that these constructions can be considered non-compositional. Whether there will also be specific (semantic, pragmatic, or other) constraints on the internal constituents, such as, say, animacy or contextual boundedness of the subject referent, will depend on the language. It is thus not the case that CxG somehow establishes an inventory of constructions that can be compositional and then “adds” meaning to them, in order to make them non-compositional. The very fact that two (or more) components form a conventional grammatical pattern that serves, as a whole, an identifiable function in a larger syntagmatic context, and that speakers recognize as such, gives the combination its constructional status. A slightly more elaborate case is presented by the English Determination construction: its non-compositionality consists in the fact that the combination of a determiner and a noun designates a semantically bounded entity, whether or not its constituents are inherently compatible with boundedness. For example, the phrase much snow consists of semantically unbounded items but as a whole, the combination is compatible with contexts in which boundedness is required and the phrase thus behaves the same way as phrases in which boundedness may be part of the inherent meaning of their parts (a car, the cars, etc.): the sentences I couldn’t clear much snow in half an hour / *I couldn’t clear snow in half an hour present the same kind of contrast as I fixed the car(s) in a week / *I fixed cars in a week.
The point is that the presence of the completive adverbial (in half an hour, in a week) requires a bounded interpretation of the substance that is being manipulated (cleared, fixed, etc.) in order for the whole proposition to be semantically coherent. The Determination construction is thus non-compositional in the sense that the combination as a whole is necessarily bounded, while its constituents are unspecified for boundedness and may, therefore, even be in conflict with this constructional requirement, i.e., they may not simply add up. (For a full analysis of the construction and the argumentation, cf. Fried and Östman 2004b: 33−37.) For further illustration of the range of constructional meanings that are outside the domain of argument realization patterns, consider the following pair of sentences:

(3) a. Why don’t you be the leader?
    b. Why didn’t you become the leader?
Superficially, both constructs could be viewed as instances of a negative wh-question. However, a closer analysis would reveal that (3a) differs from (3b) in ways that make the (3a) pattern unpredictable. Most conspicuous among those features are the following: (3a) can only occur with a present tense verb, while (3b) is unrestricted; (3a) allows a do-support negation with the verb be, which is not normally the case, whether in questions of the type (3b) or elsewhere; and, crucially, (3a) has the pragmatic force of a positive suggestion (“I suggest that you be the leader”), whereas (3b) can only be a genuine information question. Each sentence in (3) is thus licensed by a different construction (Positive wh-Suggestion and Negative Information Question, respectively). Both are highly schematic and fully productive syntactic patterns, but they differ in what speech-act function they conventionally express, i.e. in their constructional meaning.
(The Positive Suggestion meaning is also in conflict with various features of its own constituents, but that is a separate issue; such conflicts are addressed in section 3.5.) The relevant question thus is not whether constructions always have meaning but, instead, whether they can also license expressions whose propositional content may be compositional in the sense in which formal theories understand this notion. In other words, we must ask whether a particular string of words, or morphemes, in actual utterances may reveal a grammatical construction in the technical, theoretical sense if the meaning of the string (the construct) is actually a sum of the meanings of its parts. The answer is that non-compositionality in this narrowly semantic (i.e. propositional) sense is not a necessary condition for constructional status. Part of the problem is the term meaning, which can be misleading since the label has to apply to lexical as well as pragmatic and grammatical meaning. But part of the problem surrounding the status of (non-)compositionality may also arise simply from attempts to translate constructional analyses into the meta-language of formal theories. Such translations may be another contributing factor in the erroneous conclusion among non-constructionists that constructions cannot capture any generalizations about predictable (i.e. compositional) phrasal meanings and are thus good only for describing idioms. It is important to emphasize that in many grammatical constructions (i.e. outside of argument structure constructions), non-compositionality concerns the functional dimension of a particular schematic (syntactic or morphological) configuration, such as the examples just discussed, rather than the meaning of the words that can fill the constructional slots in actual utterances.
To summarize, when construction grammarians talk about the meaning of constructions, they have in mind the following range of possibilities: idiomatic (lexical) meaning, e.g. the rich, blue moon, go postal, etc.; grammatical function or dependency, such as determination, modification, government, diathesis, etc.; or pragmatic function, e.g. speech-act functions, politeness, etc.
2.3. Rules vs. constraints

Since CxG is a monotonic, declarative, unification-based framework, it does not work with any notion of generating linguistic expressions by applying grammatical rules; there is no mechanism that would derive one construction from another. Constructions can be freely combined with one another as long as they are not in conflict. Complex linguistic structures are thus accounted for by identifying what combination of smaller constructions is at work in licensing the expression as a whole. As a simple example, consider the following English sentence:

(4) Can I change the reservation that my colleague made?
This sentence is licensed by the combination of several highly schematic constructions, listed in (5b−h). The order in the list is arbitrary; accounting for the actual construct is independent of the order in which the constructions are combined:

(5) a. lexical constructions associated with the lexical items that fill the constructional slots (e.g. can [with its valence], I, change [with its valence], etc.)
    b. Subject-Auxiliary Inversion construction to form a Y/N question (instantiated by can I)
    c. Post-Nominal Modification construction (instantiated by reservation that my colleague made)
    d. Restrictive Relative Clause construction, which combines a wh-word (that) with a Subject−Predicate construction with a “missing” non-subject argument (instantiated by that my colleague made)
    e. Subject-Predicate construction (here licensing two clauses)
    f. Determination construction (instantiated by the reservation and my colleague)
    g. VP construction (directly instantiated by change the reservation)
    h. Transitive (linking) construction, to ensure that both arguments of each verb (change, make) are realized in an active transitive pattern

It is also obvious that each of the constructions involved in licensing the sentence in (4) can be found in an infinite number of other expressions, in combination with any number of other constructions. Each construction simply specifies constraints on what types of entities can fill its slots and what combinatorial conditions may be imposed on them.
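The order-independence of constructional combination can be illustrated with a toy sketch in Python (this is not part of CxG itself; the construction names and feature labels below are my own simplifications): constructions are modeled as constraint sets, and combining them is a monotonic, order-insensitive merge that fails only on a conflict.

```python
# Toy illustration: constructions as sets of feature constraints.
# Combining constructions is a monotonic merge; the result does not
# depend on the order of combination (names/features are simplified).

def combine(*constructions):
    """Merge constraint sets; fail on any conflicting value."""
    result = {}
    for cxn in constructions:
        for attr, value in cxn.items():
            if attr in result and result[attr] != value:
                raise ValueError(f"unification conflict on {attr!r}")
            result[attr] = value
    return result

sai_question = {"clause_type": "interrogative", "aux_first": True}
subj_pred    = {"has_subject": True}
transitive   = {"objects": 1}

# Any order of combination licenses the same construct description.
a = combine(sai_question, subj_pred, transitive)
b = combine(transitive, sai_question, subj_pred)
assert a == b
```

Because the merge only ever adds compatible information, the licensing of a construct can be stated declaratively, without any derivational sequence of rule applications.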
2.4. Networks of grammatical patterns

The grammar of a language is seen as an inventory of constructions (not assumed to be the same for all languages), which are organized in structured networks of varying degrees of complexity. The networks capture relationships across constructions based on feature overlap and can be either onomasiological or semasiological in nature. An important concept in setting up the networks is that of inheritance, which provides a coherent way of organizing constructional specifications in terms of those properties that individual constructions have in common and those that set them apart as distinct objects. So far, the networks have been conceived of in two ways, both motivated by the kind of empirical data they cover and the analytic perspective we take. One type works with strictly hierarchical trees. A root, which is the most general pattern, is inherited by all its descendants, each of which is a more specialized and narrowly applicable variant (e.g. Michaelis and Lambrecht 1996); such hierarchies are motivated primarily by accounting for similarity in form. The nature of such a hierarchy can be illustrated with the (fairly simple) example of integrating the modification expressions blue ink, blue eyes, blue moon into a network of related patterns. As was discussed in section 1.1, they represent a continuum of form-meaning integration, in which all the expressions appear to be instances of the same syntactic pattern (modifier-noun). This is a generalization worth capturing, but the representation still has to preserve the equally salient fact that they differ in productivity and in the degree of compositionality in computing the meaning of each expression.
Such a generalization can be articulated as a hierarchy of increasingly restricted variants of the most general, schematic Modification construction at the root, with each new generation of daughters introducing particular constraints, all the way to the fully filled and fully fixed combinations (such as blue moon, black eye, or red eyes), which share with the rest of the network only the syntactic configuration and either a specific color adjective or, at the bottom level, also the noun eye(s). The hierarchy, in a simplified form, is sketched informally in Figure 28.1; the bracketing is just a shortcut for a full representation of each construction. The nodes on
the left are added to show that the Modification construction can have other variants based on different semantic types of modifiers; the boldface indicates new constraints that hold in a particular variant.

[ Adj N ]
├── [ size-Adj N ]
├── [ ... ]
└── [ color-Adj N ]
    ├── [ blue-Adj moon-N ]
    └── [ [color-Adj] eye(s)-N ]
        ├── [ black-Adj eye-N ]
        └── [ red-Adj eye(s)-N ]

Fig. 28.1: Hierarchical inheritance network of Modification constructions
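The strict-inheritance hierarchy in Figure 28.1 can be mimicked computationally as constraint accumulation: each daughter copies its mother's specifications and adds more specific ones of its own. This is a hypothetical sketch, not the actual constructional representations; the attribute names are my own shorthand.

```python
# Sketch of full (strict) inheritance as constraint accumulation:
# a daughter construction inherits every constraint of its mother
# and adds its own, more specific ones (attribute names simplified).

def specialize(parent, **new_constraints):
    child = dict(parent)           # inherit everything from the mother
    child.update(new_constraints)  # add the daughter's own constraints
    return child

modification = {"pattern": "Adj N"}                        # the root
color_mod = specialize(modification, adj_class="color")    # [ color-Adj N ]
color_eye = specialize(color_mod, noun_lexeme="eye(s)")    # [ [color-Adj] eye(s)-N ]
black_eye = specialize(color_eye, adj_lexeme="black")      # fully fixed
blue_moon = specialize(color_mod, adj_lexeme="blue", noun_lexeme="moon")

# A fully fixed expression still carries every ancestral constraint:
assert black_eye == {"pattern": "Adj N", "adj_class": "color",
                     "noun_lexeme": "eye(s)", "adj_lexeme": "black"}
```

The monotonic copy-and-extend step is what makes the bottom-level idioms share the syntactic configuration of the root while being lexically fixed.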
Another type of network is needed for capturing partial inheritance, in which constructions are related through family resemblance relationships. This concerns cases where it is evident that a group of constructions is related through various subsets of shared features but where a true hierarchy of increasingly more constrained variants, or an empirically attested root, cannot be established. Family resemblance is often at play in capturing diachronic relationships among constructions; in those cases we are confronted with various residues and drifts, which can leave pieces of the putative hierarchy missing in the synchronic data. But it also plays an important role in capturing associations between a particular functional domain and the constructions that may encode it (thus taking the opposite starting point as compared to the tree hierarchies). In this respect, the unifying element in the network is not some root construction, but a functional (or conceptual) space onto which given constructions can be mapped. The direction toward forming constructional maps of this kind has been taken in Fried (2007, 2008 and elsewhere). A relatively simple example of the general idea is provided by a small set of correlative patterns, in English instantiated by the sentences in (7−9); I will abbreviate them as [as A as N] patterns:
(7)  Jack is as old as my brother is.          Schematically:  as A as N is
(8)  a. Jack is strong as an ox.                               A as N
     b. *Jack is old as my brother.
(9)  a. Jack is as old as my brother.                          as A as N
     b. Jack is as strong as an ox.
(10) *Jack is old as my brother is.                            * A as N is
The examples in (7−9) are clearly related both in form and the general correlative meaning, as well as in the syntactic function served by the [as A as N] pattern: it is used as a non-verbal predicate after a copula, expressing a property of the subject. But each variant is also associated with special features of its own. The one in (7) includes a second instance of the copula (turning the second correlate into a clause) and has only a literal reading (the N is referential). (8a) does not contain the first as, prohibits the presence of a second copula (*Jack is strong as an ox is), and allows only a figurative
reading: the N must be non-referential, as confirmed by (8b). Finally, (9) does not have the second copula and can be read both literally (9a) and figuratively (9b). It follows from all this that the presence of the second copula is a clear signal of a literal (i.e., referential) reading of the correlative pattern, while the absence of the first as (8) is associated with a figurative (i.e., non-referential) reading only. The configuration in (10) thus fails because it combines two incompatible properties, one that goes with a non-referential N only (absence of the first as) and one that goes with a referential N only (presence of the second copula). It is desirable to capture the relatedness of the three variants but it would be impossible to arrange them into a hierarchical tree: first, selecting the root node would be a wholly arbitrary decision and second, the variants only display partial overlaps, not the kind of inheritance shown in Figure 28.1, where the same syntactic configuration is preserved throughout the network. It is more accurate to conceptualize the relationships exemplified in (7−10) as a network of overlapping constructions, a constructional map, shown in Figure 28.2. The representation here is simplified to the bare minimum, abstracting away from additional details that go hand-in-hand with the difference in referentiality and would, of course, have to be part of the full representation. The present purpose of the picture is to show only that one construction has to be specified as exclusively referential, one as exclusively non-referential, and one is not specified for referential status of the N, thus allowing both interpretive possibilities, while all three of them share the function of expressing a correlative relationship and serving as nonverbal predicates.

[Figure: three overlapping spans over the string “as A as N is” − one labeled Non-referential, one Referential, and one Correlative; Predicative]

Fig. 28.2: A partial inheritance network of Correlative constructions
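The licensing logic behind (7)−(10) can be expressed as a small compatibility check. This is a sketch under my own simplification: only the two formal cues (the first as, the second copula) and the referentiality of the N are modeled.

```python
# Sketch of the interpretive cues in (7)-(10), simplified:
# the second copula signals a referential N; a missing first "as"
# signals a non-referential N. A form is licensed only if its cues
# can agree on a single value for the N's referentiality.

def referentiality(first_as, second_copula):
    cues = set()
    if second_copula:
        cues.add("referential")        # cf. (7)
    if not first_as:
        cues.add("non-referential")    # cf. (8)
    if len(cues) > 1:
        return None                    # conflicting cues: unlicensed, cf. (10)
    return cues.pop() if cues else "either"   # (9): unspecified

assert referentiality(first_as=True,  second_copula=True)  == "referential"      # (7)
assert referentiality(first_as=False, second_copula=False) == "non-referential"  # (8)
assert referentiality(first_as=True,  second_copula=False) == "either"           # (9)
assert referentiality(first_as=False, second_copula=True)  is None               # *(10)
```

The failure of (10) thus falls out of the same conflict-detection logic that governs unification in general, rather than from a dedicated filter.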
In sum, based on the available research, it seems likely that most often, all of these types of networks will be necessary for a full description and representation of a particular syntactic phenomenon.
3. Notational conventions

Constructional literature shows a variety of notational practices, the major ones being the hallmark box-style notation, HPSG-style notation, and Goldberg’s notation for argument-structure constructions. This relative freedom can be seen as a by-product of the fact that CxG does not work with any predefined structure that should apply to all constructions, and that there is no fixed set of features that would have to be present in all representations of all types of constructions. In the rest of this section, I will present the basics of the relatively detailed box-style notation that is used as a way of forcing analytic precision within the Fillmorean strand of CxG. (The present coverage draws on the exposition in Fried and Östman 2004b, to which the interested reader is referred for more in-depth discussion and exemplification of the full notational system.)
3.1. Structural relations

The boxes-within-boxes notation captures hierarchical relations and the linear order of constituents. The notation can be viewed as a more elaborate version of nested brackets. Grammatical constructions that capture very simple syntactic configurations might be more or less replaceable by nested brackets or tree diagrams. However, the point of constructions is to address the empirically supported observation that most grammatical patterns, including some fairly simple ones (such as, say, English determination structures or subject-predicate relations) may require a representation enriched by reference to additional layers of information (semantic, pragmatic, etc.); using boxes is simply a more convenient way of keeping all the information (relatively) transparently organized. Moreover, CxG makes a systematic distinction between two layers of specification: the holistic, constructional level (a set of constraints on how a given unit fits in larger syntagmatic contexts) and the constraints that apply to its constituents. The former is referred to as the external properties of a construction and the latter establishes the internal make-up of a construction. For example, the Positive Suggestion construction, instantiated in (3a), would have among its external specifications the unexpected pragmatic force (positive suggestion), while its internal structure would mostly consist of features inherited from the Negative Question construction (which alone would consist of inheritance links to a number of other constructions, such as do-support, Subject-Auxiliary Inversion, Imperative, VP, etc.), with certain idiosyncrasies imposed explicitly on some of its constituents as features of this construction alone (restriction in tense, no semantic restriction on the head verb, obligatory contraction on the auxiliary, etc.).
A skeletal example of the box notation that in some version appears in all constructional representations is in Figure 28.3, here showing a headed phrasal construction. (It should also be noted that headedness is not taken as a required feature of phrasal constructions since there are also non-headed structures of various types. A simple and familiar example can be taken from the English coordination structures ([[tall] [and] [thin]], [[returned to California] [and] [started a new business]]), which consist of three elements none of which syntactically dominates the other two. CxG only posits headed structures when such an analysis is warranted by the data.) The outer box represents the whole construction, which in this case has two structural daughters (the inside boxes), the head preceding its dependent(s); the “Kleene +” symbol following the right daughter indicates that the head expects one or more of these dependents. Each box (outer or inner) will contain various subsets of the types of features listed here generically: syn(tactic), prag(matic), sem(antic), val(ence), etc. The ones in the outer box are the external features, relevant to the holistic specification of the construction; the ones in the inside boxes are the features that are associated with individual constituents (internal features). The order in which the features are listed has no theoretical status, although CxG practitioners tend to follow certain general conventions, reflected also in this overview.

Outer box (external features of the construction as a whole):
  syn   [ external syntactic & categorial properties ]
  prag  [ constructional pragmatics, information-structure specifications ]
  sem   [ semantics of the construction as a whole ]
  val   { arguments and/or adjuncts required by the construction, not provided by the head predicate }
  phon  [ phonological & prosodic properties of the construction ]

Head daughter (internal features):
  role  head
  syn   [ syntax & lex. cat of the head ]
  prag  [ pragmatics & information-structure of the head ]
  sem   [ semantic properties of the head ]
  val   { valence requirements of the head }
  phon  [ phonological & prosodic properties of the head ]
  lxm   [ specific lexeme ]

Non-head daughter(s), marked “+” (internal features):
  role  filler
  syn   ...
  prag  ...
  sem   ...
  etc.

Fig. 28.3: Skeletal structure of constructional representation

Finally, let it be noted that just as CxG does not assume the existence of a universal inventory of certain (or all) constructions, not all constructions of a given language, let alone cross-linguistically, are expected to list constraints within all the categories shown in Figure 28.3; for any given construction, only the minimal subsets that are empirically justified for a descriptively adequate generalization about that construction will be specified. For example, the val(ence) statement (to be discussed in section 3.4) at the constructional level will be necessary in the English VP construction, which does not provide a subject slot; or in cases such as applicative constructions, where an additional participant role must be incorporated into the structure of the sentences; etc. On the other hand, a valence statement is not present in the Modification construction since this construction is not concerned with constraints on licensing arguments of predicates. Similarly, the lxm specification only applies in lexically partially filled constructions, such as will be shown in section 3.5. And the same holds for all the other categories.
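For readers who find nested data easier to parse than box diagrams, the external/internal layering of Figure 28.3 can be approximated as a nested structure. The attribute names below follow the figure; the string values are placeholders of my own, not real CxG specifications.

```python
# A sketch of Figure 28.3's outer/inner boxes as nested data.
# Attribute names follow the figure; string values are placeholders.

construction = {
    # external features (the outer box): how the whole unit behaves
    "syn":  "[ external syntactic & categorial properties ]",
    "prag": "[ constructional pragmatics, information structure ]",
    "sem":  "[ semantics of the construction as a whole ]",
    "val":  "{ arguments/adjuncts required by the construction }",
    "phon": "[ phonological & prosodic properties ]",
    # internal features: constraints on the individual constituents
    "constituents": [
        {
            "role": "head",
            "syn": "[ syntax & lexical category of the head ]",
            "val": "{ valence requirements of the head }",
            "lxm": "specific lexeme",
        },
        {
            "role": "filler",
            "repeatable": True,   # the "Kleene +" daughter
        },
    ],
}

# External vs. internal specifications are kept apart by nesting:
assert "syn" in construction and "syn" in construction["constituents"][0]
```

The nesting makes explicit that a constraint such as syn can apply once to the whole construction and again, separately, to each constituent.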
3.2. Feature structures

The content of the domains (syntactic, semantic, prosodic ...) listed in Figure 28.3 is presented in the form of attribute-value pairs (enclosed in square brackets), which serve to organize all the grammatically relevant information and to specify unification relationships. The attributes correspond to linguistic categories, each of which is specified for a particular value. Since CxG is an inductively oriented enterprise, the categories/attributes must be motivated by linguistic facts; there is no a priori determined set of attributes that would function as universal primitives. Examples of attributes and their values that can be found in existing constructional analyses are given in Table 28.3; the list is not exhaustive, of course.
Tab. 28.3: Examples of attributes and their values

Domain      Attribute                 Values
Syntactic   lexical category          n, adj, v, p, …
            finiteness                +/−
            grammatical function      subj, obj, obl, …
Semantic    number                    sg, du, pl, …
            definiteness              +/−
            semantic role             agent, patient, goal, …
Prosodic    prosodic constituent      word, phrase, clitic, …
            intonation                falling, rising, …
            stress                    primary, secondary, null
Pragmatic   activation in discourse   active, accessible, null
            register                  formal, informal
            speech act                question, request, …
            genre                     informational, argumentative, …
            discourse role            theme, rheme
            shift in topic            yes/no
The values are assigned in one of three ways, depending on the nature of the attribute. If the feature is binary (e.g. definiteness, finiteness), the value will be + or −. A nonbinary attribute (e.g. lexical category, semantic role) will get its value from a list of possibilities. The list, however, is not a random and freely expandable inventory; its members must make up a coherent set in which each member is defined in relation to other possible members of the set. Finally, CxG allows a value of any attribute, binary or not, to be left unspecified; this is marked by a pair of empty brackets []. For example, in many languages, the members of a Modification construction must agree along any number of features, as illustrated by the Czech examples in (14); throughout the string, all the constituents agree in number, gender, and case.

(14) a. můj            nov-ý           román                [Czech]
        my.NOM.SG.M    new-NOM.SG.M    novel.NOM.SG.M
        ‘my new novel’

     b. moj-e          nov-á           knih-a
        my-NOM.SG.F    new-NOM.SG.F    book-NOM.SG.F
        ‘my new book’

     c. mým            nov-ým          knih-ám
        my.DAT.PL.F    new.DAT.PL.F    book.DAT.PL.F
        ‘[to] my new books’

In order to state this as a generalization independent of any specific instantiations, the relevant categories within that construction will all be marked as [], shown in Figure 28.4. The underspecification of the lexical category of the modifier (cat []) represents the fact that in Czech, this syntactic slot can be filled with items of various categorial value (adjectives, possessive pronouns or adjectives, demonstratives, ordinal numbers, etc.). The “Kleene +” symbol again indicates that there can be a whole string of these
modifiers. (This notation ignores the fact that some members of the string have to be arranged in a particular order depending on their lexical category but I will not address this issue here; the simplified representation is sufficient for our immediate purposes.)

Modification
  cat  n
  sem  ['restrict reference of the head by the property expressed by the modifier']

  Modifier constituent, marked “+” (one or more):
    role      modifier
    cat       [ ]
    morphol.  case    #i [ ]
              number  #j [ ]
              gender  #k [ ]

  Head constituent:
    role      head
    cat       n
    morphol.  case    #i [ ]
              number  #j [ ]
              gender  #k [ ]

Fig. 28.4: Czech Modification construction
Figure 28.4 illustrates a few additional properties of the notational system. One, the attribute-value pairs are organized in attribute-value matrices (AVM) if a particular linguistic category requires reference to a cluster of features. Two, the AVMs can be nested; the value of an attribute can thus also be an AVM, not just an individual value. In the Modification construction, the morphological features form a coherent cluster and it is then the cluster (an AVM) that is the value of the attribute morphol(ogical categories). And finally, the representation shows one particular use of the co-indexing mechanism (#i, #j, etc.), which is a formal way to keep track of unification relations. In general, co-indexation marks features that must match or at least must not be in conflict either within a single construction or across constructions; this is at the heart of the unification mechanism, which ensures that pieces of linguistic material that do not match along any number or types of properties will not be licensed as possible constructs. Successful unification comes in two shapes, schematically summarized in (16). Two specifications can unify (i.e. fit together) either if they are identical in their requirements (16a) or if one is unspecified (16b); in contrast, conflicting specifications (16c) normally cannot unify.

(16) a. [attr x]   [attr x]
     b. [attr x]   [attr []]
     c. *[attr x]  [attr y]

Thus, for example, the definite article in English is unspecified for the semantic feature number, [num []], and can thus combine with any noun, regardless of its grammatical number (sg/pl); this is the configuration in (16b). In contrast, the indefinite article is specified as [num sg] and can thus unify only with a (countable) noun in the singular, i.e. one that has exactly the same specification (16a). On the other hand, CxG takes language and linguistic structure to be inherently dynamic and as such not immune to constant potential for variability and change.
Consequently, strict and absolute unification is not a realistic expectation and, in fact, would
contradict one of the basic CxG tenets, namely, that linguistic analysis must be sensitive to the interactional basis of linguistic structure (Fillmore [1974] 1981). Constructions are assumed to be stretchable to some degree, and the stretching includes cases where a particular combination, produced and accepted by speakers as a possible utterance, involves a unification conflict. A relatively straightforward case of a conflict in a single feature can be drawn from expressions such as the London (of my youth) or (This is) a different London (from the one I know). The combination of a determiner and a proper noun should be ruled out under strict unification, since such a combination violates the constraint that only common nouns can fill the slot of the head noun in a regular determination pattern. In formal terms, the noun London is specified as [proper +] while the slot of the head noun in the construction must be specified as [proper −], in order to capture the robust generalization that normally we do not say things like Tomorrow I’m flying to a London or The London is one of her favorite cities. Yet, the conflict evidently need not always result in an ungrammatical structure. At the same time, it is clear that in order for this combinatorial conflict to be accepted by speakers as meaningful and syntactically possible, certain contextual conditions must obtain. Notice that the combination necessarily evokes the image of a kind of partitioning (in this case temporal), as if dividing the entity London into discrete phases of its existence, which can be fully individuated and, hence, restrictively referenced one at a time and in a mutual contrast. However, this construal, which is imposed by the determination pattern itself, automatically requires the additional context that explicitly encodes this (otherwise unexpected) restrictive reading of an explicitly determined entity.
This seemingly trivial kind of conflict is instructive in that it highlights a fundamental feature of CxG that sets it apart from other syntactic theories: grammatical generalizations (i.e. constructions) are treated as functional prototypes in the sense of relatively stable, recurrent patterns shared across a speech community, but not as inviolable rules that result either in a grammatical structure if everything is in full harmony, or in a failure. This conceptual flexibility is cognitively supported by reference to prototype-based categorization and to the goal-oriented nature of normal communication, in which speakers are motivated to interpret even less-than-perfect matches between the abstract grammatical patterns and the words that fill them in concrete expressions. This, in turn, naturally allows for the often observed fact that there is also a cline in what degree of stretching is likely to be acceptable in a given communicative situation, and at which point the novel combination will be rejected. The unification relationships exemplified in Figure 28.4 express grammatical agreement, but the same basic mechanism applies in capturing government (i.e. argument expression), discussed below.
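The three unification outcomes in (16), together with the article example, can be sketched as a toy implementation of flat attribute-value unification (not the full recursive AVM machinery with co-indexing; here None plays the role of the empty brackets []).

```python
# Sketch of the unification possibilities in (16): two values unify
# if they are identical (16a) or if one is unspecified (16b), marked
# here as None; distinct specified values conflict (16c).

UNSPEC = None  # corresponds to [] in the notation

def unify(a, b):
    """Unify two attribute-value dicts, or return None on conflict."""
    out = dict(a)
    for attr, v in b.items():
        if attr not in out or out[attr] is UNSPEC:
            out[attr] = v          # (16b): the unspecified side yields
        elif v is UNSPEC or out[attr] == v:
            continue               # (16a): identical requirements
        else:
            return None            # (16c): conflicting specifications
    return out

the  = {"num": UNSPEC}   # definite article: unspecified for number
a_   = {"num": "sg"}     # indefinite article
car  = {"num": "sg"}
cars = {"num": "pl"}

assert unify(the, cars) == {"num": "pl"}   # (16b): 'the cars'
assert unify(a_, car)   == {"num": "sg"}   # (16a): 'a car'
assert unify(a_, cars)  is None            # (16c): *'a cars'
```

Note that this strict version always fails on a conflict; modeling the "stretchable" unification discussed above would require an extra layer that lets a contextually supported construal override a feature clash.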
3.3. Valence

In dealing with regular associations between the lexical meaning of predicates (i.e., argument-taking lexemes) and their role in sentence structure, CxG incorporates reference to semantic frames, each of which represents the complete background scene associated with a given linguistic expression: the scene’s participants, settings, props, and any other unique semantic features. The scene-based conception of predicate semantics
IV. Syntactic Models

(Fillmore 1977: 73) provides a natural connection between predicate-specific participant roles and the more abstract notion of semantic roles, which are generalizations over the specific roles, based on shared linguistic behavior. Frame-semantic lexical representation of predicates thus may consist of two layers of information: a frame and a valence. The frame contains all the idiosyncratic information about the meaning of a given predicate, while the valence consists of the syntactically minimal set of semantically more abstract roles (agent, patient, theme, path, etc.) that capture the generalized event type instantiated by the predicate. The association between the frame-specific participants and the corresponding semantic roles is not always fully predictable from the frame, as is well known from various alternation phenomena. In schematic representations, the two layers are linked directly for each predicate by coindexing; the formalization includes grammatically relevant lexical information about a specific lexeme (here lxm buy) and − in the case of predicates − a canonical morphosyntactic form in a particular syntactic pattern (e.g. active, passive, antipassive, causative, reflexive, etc.). This is exemplified in Figure 28.5, with the verb buy as it would appear in an active transitive pattern (this particular representation is again a slightly simplified rendition that leaves out certain minor details concerning features not discussed in this abbreviated survey). The symbol θ stands for semantic role, rel for relation, gf for grammatical function, n+ for a full NP.

(17)  lxm   buy
      syn   [ cat v ]
      sem   frame COMMERCIAL_TRANSACTION
            FE #1 [ Buyer ]  FE #2 [ Seller ]  FE #3 [ Goods ]  FE #4 [ Money ]
      val   { #1 [ rel [ θ agt, gf sub ], syn n+ ],
              #3 [ rel [ θ pat, gf obj ], syn n+ ] }

Fig. 28.5: Fully specified valence of the English verb buy
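The two-layer lexical entry in Figure 28.5 can be rendered as a simple data structure. The field names below (lxm, frame, fes, val) mirror the figure’s labels, but the classes themselves are invented for this sketch and are not an official CxG data format:

```python
# Sketch of the two-layer lexical entry for "buy" (Figure 28.5-style);
# the dataclass layout is an illustrative assumption, not a CxG standard.
from dataclasses import dataclass, field

@dataclass
class ValenceElement:
    fe: str        # coindex linking to a frame element, e.g. "#1"
    theta: str     # semantic role (agt, pat, ...)
    gf: str        # grammatical function (sub, obj, obl)
    syn: str       # canonical form, e.g. "n+" for a full NP

@dataclass
class LexicalEntry:
    lxm: str
    cat: str
    frame: str                        # frame name
    fes: dict                         # frame elements, keyed by index
    val: list = field(default_factory=list)

buy = LexicalEntry(
    lxm="buy", cat="v",
    frame="COMMERCIAL_TRANSACTION",
    fes={"#1": "Buyer", "#2": "Seller", "#3": "Goods", "#4": "Money"},
    # Active transitive pattern: only Buyer and Goods enter the minimal valence.
    val=[ValenceElement("#1", "agt", "sub", "n+"),
         ValenceElement("#3", "pat", "obj", "n+")],
)

# Coindexing links each valence element back to its frame participant:
for v in buy.val:
    print(v.fe, buy.fes[v.fe], v.theta, v.gf)
# → "#1 Buyer agt sub" and "#3 Goods pat obj"
```

Note how the frame layer lists all four participants of the commercial-transaction scene, while the valence layer selects only the two that the active transitive pattern realizes syntactically.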
Predictability in the mappings between semantic roles of arguments and their syntactic function and form in a sentence is captured through linking constructions, which are generalizations about argument realization. The types of links and the level of detail that needs to be spelled out will, to some degree, differ across languages, and the form pole may also involve various categories (grammatical functions, case markers, verbal morphology, prosody), depending on the typological properties of a given language. A simple example of a linking construction would be the English passive, shown in Figure 28.6; p+by stands for a PP introduced by the preposition by, (fni) stands for optional free null instantiation, indicating that the agent need not be expressed and when it is not, its interpretation is free (i.e., depends on specific context in which the passive is used). Notice that linking constructions do not specify concrete lexical items; their job is to apply to whole classes of eligible lexemes.
(18)  Passive
      syn   [ cat v, voice passive ]
      sem   [ 'an entity is affected by a potentially unidentified cause' ]
      prag  [ 'discourse prominence of the result of an action' ]
      val   { [ rel [ θ agt, gf obl ], syn [ cat p+by ] (fni) ] }
Fig. 28.6: English passive linking construction
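Treating the linking construction procedurally, as a remapping over a canonical valence, gives a rough picture of its effect. In CxG proper the construction is a declarative constraint, not a function, so the code below is only an expository assumption:

```python
# Sketch of the Passive linking construction as an operation over a valence
# (an illustrative simplification; CxG representations are declarative).

def apply_passive(val):
    """Remap the agent to an optional by-PP oblique (fni) and
    promote the patient to subject."""
    out = []
    for elem in val:
        elem = dict(elem)                 # do not mutate the input valence
        if elem["theta"] == "agt":
            # agent: oblique by-phrase, freely null-instantiable
            elem.update(gf="obl", syn="p+by", inst="fni")
        elif elem["theta"] == "pat":
            elem.update(gf="sub")         # patient: promoted to subject
        out.append(elem)
    return out

# Canonical active-transitive valence (e.g. of "buy"):
active = [{"theta": "agt", "gf": "sub", "syn": "n+"},
          {"theta": "pat", "gf": "obj", "syn": "n+"}]

for e in apply_passive(active):
    print(e)
# agent:   gf=obl, syn=p+by, inst=fni ("...bought by her", or omitted)
# patient: gf=sub ("The book was bought")
```

Because the construction mentions no lexeme, the same remapping applies to every eligible verb, which is exactly the point made above: linking constructions operate over whole classes of lexemes.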
3.4. Instantiation principles

Structural dependencies − in CxG called instantiation patterns − as well as constraints on linearization patterns are captured by appropriate phrasal constructions; the Modification construction in Figure 28.4 exemplifies a type of phrasal construction. Instantiation patterns fall into two major types: direct instantiation and a set of non-direct patterns. Direct instantiation means that each constituent of a given phrasal construction corresponds to a discrete syntactic unit in the actual linguistic expression. This will always be the case in modification structures: both the head and the dependent(s) must be physically expressed by an appropriate syntactic unit, and in English this also means that the constituents are expressed locally, in the immediate proximity of their phrase-mates. However, in the case of complementation, other types of instantiation must be accounted for as well. CxG works with the following general patterns (at least for English): null instantiation, left-isolation, double instantiation, and co-instantiation. Null instantiation refers to instances in which a valence element that is required by the predicate semantics is left unexpressed in certain environments. The omission in concrete expressions is licensed either by a particular predicate (e.g. the patient argument of read, eat, cook) or by a construction, e.g. imperative (licensing a null subject), passive (null expression of the agent), etc. Constructional representations also specify what kind of interpretation is conventionally associated with the unexpressed referent, by attaching the appropriate label to the corresponding valence element (cf. the Passive Linking construction in Figure 28.6). Indefinite null instantiation (labeled as ini in the representations) appears in cases where the referent is some unidentified, indefinite entity whose existence is conventionally understood as the kind of participant required by the predicate in question (e.g.
reading material in a sentence such as He spent the whole morning reading). Definite null instantiation (dni) concerns referents that are present in the discourse and can be assumed by the speaker to be identifiable by the hearer; examples include the null subject of imperatives (a property of a grammatical construction), a null argument of certain predicates (e.g. the frame participant role Contest with the verb win, in the sentence He won), and the like. Free null instantiation (fni), exemplified by the optional agent in the Passive construction, is licensed in cases where either a definite, indefinite, or generic (i.e. folks in general) interpretation is possible.
Left-isolation patterns (also known as distant instantiation) account for dependencies that correspond, roughly, to wh-movement phenomena in the transformational tradition. Thus among the constructions that together license our examples Why don’t you be the leader (3a) or that my colleague made (4) are left-isolation constructions for forming, respectively, wh-questions and relative clauses. Double instantiation (also known as extraposition) constructions account for patterns in which the properties of a single valence element are distributed over two discrete syntactic units in actual expressions; a single argument thus appears to be instantiated twice. This concerns sentences such as It is annoying that they have such short business hours, in which the semantic content of the subject complement of annoying is expressed by a that-clause (that they have such short business hours), extraposed after the verb, while its syntactic status (subject) is expressed by the sentence-initial it (as a syntactic place-holder). Finally, co-instantiation refers to the opposite configuration, one in which a single syntactic element simultaneously expresses (co-instantiates) two distinct arguments supplied by two distinct predicates; this includes various control phenomena, known in the transformational literature as raising and equi structures. Thus in the example persuade the children to come (Table 28.2), the NP the children co-instantiates the object of persuade and the subject of to come. A general co-instantiation pattern that covers both object and subject control can be formulated as an abstract construction, shown in Figure 28.7.
This representation specifies that co-instantiation involves two valence elements − one (#1) is syntactically unspecified (can be subject or object of the main predicate) and the other is a subjectless clause, whose main predicate brings along a subject complement that will be co-instantiated by the first element; this is indicated by the co-indexing. The distinction between “raising” and “equi” types will correspond to, respectively, the absence or presence of semantic requirements in the embedded val(ence) statement. (19)
      Coinstantiation
      syn   [ lex + ]
      val   { [ syn #1 [ ] ],
              [ syn [ subj − ], val { #1 [ rel [ gf sub ] ] } ] }
Fig. 28.7: English Co-instantiation construction
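The coindexing (#1) in Figure 28.7 amounts to structure sharing: one token fills two argument slots at once. A minimal sketch, using Python object identity to stand in for re-entrancy (the dictionary keys are illustrative assumptions):

```python
# Sketch of co-instantiation as structure sharing: the coindexed element #1
# is literally the same object in both valence sets (re-entrancy).

np_children = {"form": "the children"}   # one syntactic unit

# "persuade the children to come":
persuade_val = {"obj": np_children, "comp": "to come"}
come_val     = {"sub": np_children}      # subjectless clause: subject supplied
                                         # by the matrix object

# Token identity, not mere string equality -- one constituent, two roles:
print(persuade_val["obj"] is come_val["sub"])   # True
```

Object identity (`is`) rather than equality (`==`) is the crucial point: the construction does not require two look-alike constituents but a single constituent serving two predicates, which is what the #1 coindex expresses.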
Instantiation issues have been discussed in some detail by various scholars, cf. Kay and Fillmore (1999) on left isolation, Michaelis and Lambrecht (1996) on double instantiation, Fillmore (1986b), Ruppenhofer and Michaelis (2010), or Lambrecht and Lemoine (2005) on null instantiation (the first two for English, the last one for spoken French), or Lyngfelt (2009) on a broad range of control patterns in Swedish.
3.5. External vs. internal properties of constructions

The nested boxes reflect constituent structure, but also allow us to make a principled and systematic distinction between the external properties of a construction and its internal, constituent-level properties. This distinction is essential for capturing the fact that a
complex expression (morphological or syntactic) as a whole may have its own idiosyncratic features that do not follow from its internal composition. The internal and external levels may share certain attributes and/or values, but need not share all of them; hence the non-compositionality effects. The non-sharing manifests itself in various ways: there may be a direct conflict between certain requirements of the construction as a whole and the specifications of its constituents, in which case the constructional properties override the constituent-level properties, or some aspect − syntactic, semantic, pragmatic, or a combination of any of these − of the construction is added beyond what a simple concatenation of the constituents contributes. Very often, both of these possibilities co-occur, as in the example of determined proper nouns, where the conflict is not purely between external and internal specifications but between the type of an internal constituent and its lexical filler in actual expressions. The Positive Suggestion construction (Why don’t you be the leader? in 3a) presents a conflict between, on the one hand, the function and meaning of the wh-expression, the negation, and the syntax of questions and, on the other hand, the positive-suggestion interpretation at the constructional level, which clearly cannot be attributed to any (subset of) features of the constituents themselves, nor does it arise from a simple concatenation of the constituent meanings. The constructional meaning thus represents an idiosyncratic external contribution that is added independently of the internal specifications. How we get from the literal, compositional meaning of a negative why-question to this novel interpretation is a separate question, one for diachronic analysis, but from the synchronic point of view, this external/internal discrepancy is fully conventionalized.
A more intricate type of conflict can be illustrated by phrases such as the poor, the affluent, the hungry, the very naive, etc. Externally, in larger grammatical patterns, these expressions behave like regular noun phrases. However, this behavior cannot be so readily projected from their internal composition, especially if we keep the analysis at the categorial level, as shown in (20):

(20)  [?? Det [AdjP Degree Adj]]
Whether we arbitrarily replace the question marks at the root of the tree by N or by Det, we will fail to capture the true nature of this fully productive, yet somewhat irregular pattern. Treating the whole phrase as an NP will put its categorial status in conflict with the category of the head, whether we decide to accord the head status to the adjective or to the determiner. And the seemingly obvious (although controversial on other grounds) alternative of treating the phrase as a DP headed by the determiner fails in other respects, since an adequate description of this pattern has to go beyond solving a categorial mismatch across the external/internal dimension, let alone the internal question of headedness. The full challenge becomes apparent when we also consider constructs that have to be ruled out as regular, conventionally expected instantiations of this pattern: *the spacious, *the expensive, *a hungry, *these affluent, *many naïve, *poor, etc. The choice
of both the determiner and the adjective is constrained: the former is restricted to the definite article (other determiners, including null, are not part of the conventional usage) and the latter must come from a semantic class denoting properties attributable to human beings; it also appears that the adjective must be one that can be used predicatively (cf. *the main). The adjectival semantics must be compatible with the idiosyncratic interpretation of the phrases (i.e. the external semantics), which cannot be predicted from the categorial, semantic, or combinatorial properties either of the definite article the or of the adjectives: the phrase can only be used in reference to people, who are necessarily understood generically and as a group (cf. The poor have been migrating to this neighborhood vs. *The poor next door moved in last month). The group identity manifests itself also formally by plural agreement, which the phrase forces when used as a subject (The poor were treated with disdain). We thus must posit a construction that spells out all these features, as shown in Figure 28.8; the representation is very slightly simplified for the purposes of this chapter. The point of the representation is to capture the fact that this is a fully productive pattern, which, however, has some unpredictable properties, including its categorial configuration: externally, the phrase plays the role of a noun phrase, through one of the features inherited from (i.e., shared with) the Determination construction that licenses regular NPs, and internally consists of an otherwise unpredictable combination of a determiner and an adjectival phrase. The feature lex(ical) [ ] indicates that the adjectival slot may or may not be further expanded by a modifier (e.g. the truly clueless). (21)
      Group Identity Noun Phrase
      inherit Determination
      cat   n+
      sem   frame    [ 'a group of people defined by property x and understood generically' ]
            num      plural
            animacy  human
      constituents:
        [ syn [ lxm the ] ]
        [ syn [ cat adj, pred +, lex [ ] ], sem [ frame [ ... ] ] ]
Fig. 28.8: Group Identity Noun Phrase construction
We can also use this construction as an example of a particular place on the continuum of constructions discussed in section 1.2: it is a partially filled syntactic idiom.
4. Explanatory potential beyond traditional syntactic analyses

4.1. Corpus and text linguistics

With the growing availability and greater reliability of electronic corpora, CxG research has been increasingly putting emphasis on empirical methods, and particularly on statistical
methods known from corpus analysis. A distinct domain of interest has been the study of collocations; this work has been pioneered by an approach developed by Gries and Stefanowitsch (Gries 2003, 2005; Gries and Stefanowitsch 2004), who coined the terms distinctive collexemes and collostructions to capture the relative strength of collocational preferences between phrase mates and the degree of conventionalization in these preferences. The importance of a discourse-related dimension in syntactic analyses was recognized quite early in Fillmore’s work on “text semantics/text comprehension” (Fillmore 1981). More focused attention to the communicative underpinnings of linguistic structure is now rapidly emerging as a very active area of research, drawing also on advances in certain strands of interactional linguistics and conversation analysis (e.g. Selting 1996; Linell 1998). A systematic study of the grammar of spoken language is framed by the hypothesis that native-like linguistic knowledge and understanding must include recurring, conventionally expected socio-pragmatic patterns and structure, not just the knowledge of words and grammatical rules. Work in this domain thus explores the relationship between grammar and interaction, focusing on a number of relevant areas: structures beyond sentences; the nature and role of non-propositional meanings in spontaneous conversation, as a way of maintaining conversational coherence; interactional properties of linguistic categories; the degree of conventionalization in incorporating contextual clues into recurrent grammatical patterning; etc.
A representative sample of contextually oriented, corpus-based constructional research can be found in Bergs and Diewald (2009), but also in the work of many other scholars, such as Nikiforidou and Katis (2000); Fischer (2010); Lambrecht (2004); Östman (2005); Fried and Östman (2005); Matsumoto (2010); Fried (2011); Antonopoulou and Nikiforidou (2009); Nikiforidou (2010); or Terkourafi (2010).
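The association measures behind collostructional analysis are standard corpus statistics over lexeme-by-construction frequency tables. One common choice is the log-likelihood ratio, sketched below; the counts are invented for illustration, and the published collostructional work typically uses the Fisher-Yates exact test rather than this measure:

```python
# Sketch of one association measure usable for collostructional analysis:
# the log-likelihood ratio (G^2) over a 2x2 contingency table of
# lexeme-by-construction frequencies. All counts below are invented.
from math import log

def g2(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.
    k11: lexeme L in construction C;  k12: L in other constructions;
    k21: other lexemes in C;          k22: everything else in the corpus."""
    total = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22      # row marginals
    c1, c2 = k11 + k21, k12 + k22      # column marginals

    def term(obs, row, col):
        exp = row * col / total        # expected count under independence
        return obs * log(obs / exp) if obs > 0 else 0.0

    return 2 * (term(k11, r1, c1) + term(k12, r1, c2) +
                term(k21, r2, c1) + term(k22, r2, c2))

# e.g. a verb strongly attracted to a construction (hypothetical counts)
# vs. the remainder of the corpus:
print(round(g2(461, 4000, 574, 995000), 1))
```

The higher the G² value, the stronger the lexeme's attraction to the construction relative to chance; a table whose cells exactly match the independence expectation yields 0.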
4.2. Language variation and change

The usage-based orientation of CxG suggests itself also as a link to the study of language variation and change. CxG has only recently started to be tested on diachronic data, but it is becoming evident that constructional analysis can help us be more precise in articulating the emergence of grammatical structure and in capturing the inherently dynamic nature of language. CxG seems like a useful model for addressing the central problem of diachronic syntax: the gradual nature of linguistic change, which follows from the tension between, on the one hand, discrete, partial transitions occurring in language use and involving specific features of larger patterns and, on the other, new constructions (i.e., clusters of features) that may arise from these partial changes. Interest in constructional analysis as a potentially useful tool has been rising particularly in grammaticalization research; the connection is most explicitly stated and directly explored in Traugott’s work (e.g. Traugott 2008; cf. also the papers in Traugott and Trousdale 2010). A broad range of diachronic problems has also recently been addressed in Bergs and Diewald (2008) and Leino (2008). An explicitly CxG-based treatment can be found in Fried (e.g. 2008, 2009); this work examines specifically the representational potential of CxG in capturing the gradualness of syntactic and morphosyntactic changes. The usefulness of CxG in tracking diachronic shifts consists primarily in three features of the model. (i) Maintaining the distinction between constructions and constructs
is relevant to the hypothesis that shifts in grammatical structure originate in language use (one of the basic tenets of grammaticalization theory). A series of partial transitions in an expression may ultimately give rise to a new construction, but the changes themselves necessarily originate in constructs (i.e. in actual utterances). (ii) The network-based view of grammar is particularly relevant to capturing diachronic relationships across grammatical forms. It provides a basis for capturing the well-documented layering effects in language change. And (iii), the systematic, theoretically grounded distinction between external and internal properties of constructions offers a coherent way of resolving the conflict between maintaining a transparent internal structure of a linguistic form and developing new functional associations that result in idiosyncratic form-function pairings, i.e., new constructions.
4.3. Typology

CxG does not operate with any explicitly articulated assumptions about the universality of specific grammatical categories or syntactic patterns, but this does not mean it has no aspirations for uncovering cross-linguistic generalizations or universal properties of language. On the one hand, by not assuming any universal syntactic structure, the model has the flexibility that is needed for capturing typologically diverse grammatical patterns, as demonstrated in CxG-based typological research and also in detailed studies of various constructions in languages other than English (cf., for example, various papers in Fried and Östman 2004a; Fried and Boas 2005). On the other hand, universal validity may be found in particular types of meaning-form patterns and/or in the way constructions map onto a conceptual space; the latter has been explored particularly in Croft’s (2001) studies of various grammatical categories that can be organized in networks (“conceptual maps”) of related constructions across languages.
4.4. Language acquisition

In CxG, knowing a language with a native-like fluency means knowing (and learning) the constructions of that language. Constructional research has been vigorously pursued in language acquisition, particularly in the work of Goldberg and Tomasello (Tomasello 2003). Their general approach is conceptually closer to the Langackerian conception of constructional analysis, but the theoretical foundations are shared across both theoretical variants: language acquisition is hypothesized to crucially depend on cognitive and interactional principles, learning is facilitated by language use in particular communicative and social contexts, and the basic domain of learning is a construction in the CxG sense. The topics that have attracted the most focused attention so far center mostly on the acquisition of verbs and argument structure patterns, but other structures have been working their way into acquisition research as well (e.g. Diessel and Tomasello 2001; Katis and Nikiforidou 2010). Constructional analyses can be found in L2 acquisition research as well (e.g. Ellis 2003; Ellis and Ferreira-Junior 2009). According to Goldberg (2006), the usefulness of a constructional approach in acquisition research can be justified on a number of grounds. (i) The learners’ reliance on
multiple cues (syntactic, semantic, pragmatic, phonetic) in the learning process can best be captured by a multidimensional object, such as a construction, which can be − at various stages of acquisition − processed at the holistic (external) level as a prefab chunk, or as having a transparent internal structure that then aids in a more productive use of language. (ii) Constructions can be shown to have a predictive value in learning sentence meaning. (iii) The learning process suggests a direction from concrete constructs (exemplars or instances in the cognitive-linguistic terminology) to constructions.
4.5. Computational applications

Most recently, CxG has also served as a theoretical starting point for designing computational systems that simulate language development and language interpretation, and that aim at integrating conceptual structure in systems of natural language processing. One application is known as Fluid Construction Grammar (FCG), which is being developed by Luc Steels and his associates and which extends the constructional approach into the domain of artificial intelligence (cf. papers in Steels 2011, 2012). The main concern of FCG is to develop computer simulations and robotic experiments that study the development of shared grammar across multiple agents. Constructional thinking is taken as very well suited to this task for the following reasons: (i) its multidimensional architecture and unification-based representations; (ii) the shared fundamental assumption about the interactional basis of language evolution; (iii) the expectation that speakers within a single community (agents in the robotic experiments) may not always have exactly the same inventories of grammatical constructions; instead, their grammars are assumed to be fluid to some degree. Another computational extension of mainstream CxG is known as Embodied Construction Grammar (ECG), associated with the work of Bergen and Chang (2005). ECG is a model of dynamic inferential semantics, whose central concept is that of embodied schemas (akin to frames). The aim of this model is to develop simulations of the interpretive processes involved in on-line interaction, in which the knowledge of conventionalized structures and meanings (i.e. constructions and words) must be integrated with implicit and open-ended inferences based on situational and interactional context; the latter are generated by the simulations.
ECG’s focus on the dynamic nature of linguistic behavior thus explicitly takes issue with the notion of static associations between phonological form and conceptual structure as posited in Cognitive Grammar.
5. Concluding remarks

CxG belongs in a family of approaches that are based on one fundamental claim about linguistic structure, namely, that the defining properties of a grammatical pattern form a conventional pairing of form and function/meaning. Construction Grammar has now developed into a mature framework with a solid cognitive and functional grounding, an established architecture, and a consistent notational system for developing schematic representations. It is a constraint-based, non-derivational, mono-stratal grammatical model that also seeks to incorporate the cognitive and interactional foundations of language. It is inherently tied to a particular model of the semantics of understanding, known as Frame Semantics, which offers a way of structuring and representing meaning while taking into account the relationship between lexical meaning, interactional meaning, and grammatical patterning. The appeal of Construction Grammar as a holistic and usage-oriented framework lies in its commitment to treating all types of expressions as equally central to capturing grammatical patterning (i.e. without assuming that certain forms are more basic than others) and in viewing all dimensions of language (syntax, semantics, pragmatics, discourse, morphology, phonology, prosody) as equal contributors in shaping linguistic expressions.
6. References (selected)

Antonopoulou, Eleni, and Kiki Nikiforidou 2009 Deconstructing verbal humour with Construction Grammar. In: Geert Brône, and Jeroen Vandaele (eds.), Cognitive Poetics: Goals, Gains and Gaps, 289−314. Berlin: Walter de Gruyter.
Atkins, B. T. S., Charles J. Fillmore, and Christopher R. Johnson 2003 Lexicographic relevance: selecting information from corpus evidence. International Journal of Lexicography 16: 251−280.
Bergen, Benjamin K., and Nancy Chang 2005 Embodied Construction Grammar in simulation-based language understanding. In: Jan-Ola Östman, and Mirjam Fried (eds.), Construction Grammars: Cognitive Grounding and Theoretical Extensions, 147−190. Amsterdam: John Benjamins.
Bergs, Alexander, and Gabriele Diewald (eds.) 2008 Constructions and Language Change. Berlin: Mouton de Gruyter.
Bergs, Alexander, and Gabriele Diewald (eds.) 2009 Contexts and Constructions. Amsterdam: John Benjamins.
Croft, William 2001 Radical Construction Grammar. Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Diessel, Holger, and Michael Tomasello 2001 The acquisition of finite complement clauses in English: a usage-based approach to the development of grammatical constructions. Cognitive Linguistics 12: 1−45.
Ellis, Nick C. 2003 Constructions, chunking, and connectionism: the emergence of second language structure. In: Catherine Doughty, and Michael H. Long (eds.), Handbook of Second Language Acquisition, 33−68. Oxford: Blackwell.
Ellis, Nick C., and Fernando Ferreira-Junior 2009 Constructions and their acquisition: islands and the distinctiveness of their occupancy. Annual Review of Cognitive Linguistics 7: 188−221.
Fillmore, Charles J. 1968 The case for case. In: Emmon Bach, and Robert T. Harms (eds.), Universals in Linguistic Theory, 1−88. New York: Holt, Rinehart and Winston.
Fillmore, Charles J. [1974] 1981 Pragmatics and the description of discourse. In: Peter Cole (ed.), Radical Pragmatics, 143−166 (reprint of Berkeley Studies in Syntax and Semantics, 1974). New York: Academic Press.
Fillmore, Charles J. 1977 The case for case reopened. In: Peter Cole, and Jerry Sadock (eds.), Grammatical Relations, 59−82. New York: Academic Press.
Fillmore, Charles J. 1982 Frame Semantics. In: The Linguistic Society of Korea (ed.), Linguistics in the Morning Calm, 111−137. Seoul: Hanshin.
Fillmore, Charles J. 1984 Frames and the semantics of understanding. Quaderni di Semantica 6: 222−254.
Fillmore, Charles J. 1986a Varieties of conditional sentences. Proceedings of the Third Eastern States Conference on Linguistics, 163−182. Columbus, OH: Ohio State University Department of Linguistics.
Fillmore, Charles J. 1986b Pragmatically controlled zero anaphora. Berkeley Linguistic Society 12: 95−107.
Fillmore, Charles J. 1988 The mechanisms of ‘Construction Grammar’. Berkeley Linguistic Society 14: 35−55.
Fillmore, Charles J. 1989 Grammatical Construction Theory and the familiar dichotomies. In: Rainer Dietrich, and Carl F. Graumann (eds.), Language Processing in Social Context, 17−38. Amsterdam: North-Holland/Elsevier.
Fillmore, Charles J. 1999 Inversion and constructional inheritance. In: Gert Webelhuth, Jean-Pierre Koenig, and Andreas Kathol (eds.), Lexical and Constructional Aspects of Linguistic Explanation, 113−128. Stanford, CA: CSLI Publications.
Fillmore, Charles J., and B. T. S. Atkins 1992 Toward a frame-based lexicon: the semantics of RISK and its neighbors. In: Adrienne Lehrer, and Eva Feder Kittay (eds.), Frames, Fields, and Contrasts: New Essays in Semantic and Lexical Organization, 75−102. Hillsdale, NJ: Lawrence Erlbaum Associates.
Fillmore, Charles J., Paul Kay, and Mary Catherine O’Connor 1988 Regularity and idiomaticity in grammatical constructions. The case of Let alone. Language 64(3): 501−538.
Fischer, Kerstin 2010 Beyond the sentence: constructions, frames and spoken interaction. Constructions and Frames 2(2): 185−207.
Fried, Mirjam 2005 A frame-based approach to case alternations: the swarm-class verbs in Czech. Cognitive Linguistics 16(3): 475−512.
Fried, Mirjam 2007 Constructing grammatical meaning: isomorphism and polysemy in Czech reflexivization. Studies in Language 31(4): 721−764.
Fried, Mirjam 2008 Constructions and constructs: mapping a shift between predication and attribution. In: Alexander Bergs, and Gabriele Diewald (eds.), Constructions and Language Change, 47−79. Berlin: Mouton de Gruyter.
Fried, Mirjam 2009 Construction Grammar as a tool for diachronic analysis. Constructions and Frames 1(2): 261−290.
Fried, Mirjam 2011 The notion of affectedness in expressing interpersonal functions. In: Marcin Grygiel, and Laura Janda (eds.), Slavic Linguistics in a Cognitive Framework, 121−143. Frankfurt: Peter Lang.
Fried, Mirjam, and Hans C. Boas (eds.) 2005 Construction Grammar: Back to the Roots. Amsterdam: John Benjamins.
Fried, Mirjam, and Jan-Ola Östman (eds.) 2004a Construction Grammar in a Cross-language Perspective. Amsterdam: John Benjamins.
Fried, Mirjam, and Jan-Ola Östman 2004b Construction Grammar: a thumbnail sketch. In: Mirjam Fried, and Jan-Ola Östman (eds.), Construction Grammar in a Cross-language Perspective, 11−86. Amsterdam: John Benjamins.
Fried, Mirjam, and Jan-Ola Östman 2005 Construction Grammar and spoken language: the case of pragmatic particles. Journal of Pragmatics 37(11): 1752−1778.
Goldberg, Adele E. 1995 A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Goldberg, Adele E. 2006 Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press.
Gries, Stefan Th. 2003 Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1−27.
Gries, Stefan Th. 2005 Syntactic priming: a corpus-based approach. Journal of Psycholinguistic Research 34(4): 365−399.
Gries, Stefan Th., and Anatol Stefanowitsch 2004 Extending collostructional analysis: a corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9(1): 97−129.
Hoffmann, Thomas, and Graeme Trousdale 2013 The Oxford Handbook of Construction Grammar. Oxford: Oxford University Press.
Katis, Demetra, and Kiki Nikiforidou 2010 Indeterminacy in grammar and acquisition: an interdisciplinary approach to relative clauses. Review of Cognitive Linguistics 8(1): 1−18.
Kay, Paul, and Charles J. Fillmore 1999 Grammatical constructions and linguistic generalizations: the What’s X doing Y? construction. Language 75: 1−33.
Lambrecht, Knud 1988 There was a farmer had a dog: syntactic amalgams revisited. Berkeley Linguistic Society 14: 319−339.
Lambrecht, Knud 2004 On the interaction of information structure and formal structure in constructions: the case of French Right-Detached comme-N. In: Mirjam Fried, and Jan-Ola Östman (eds.), Construction Grammar in a Cross-language Perspective, 157−199. Amsterdam: John Benjamins.
Lambrecht, Knud, and Kevin Lemoine 2005 Definite null objects in (spoken) French: a Construction-Grammar account. In: Mirjam Fried, and Hans C. Boas (eds.), Construction Grammar: Back to the Roots, 13−55. Amsterdam: John Benjamins.
Langacker, Ronald 1987 Foundations of Cognitive Grammar, vol. 1: Theoretical Prerequisites. Stanford: Stanford University Press.
Leino, Jaakko (ed.) 2008 Constructional Reorganization. Amsterdam: John Benjamins.
Linell, Per 1998 Approaching Dialogue. Talk, Interaction and Contexts in Dialogical Perspectives. Amsterdam: John Benjamins.
28. Construction Grammar
1003
Lyngfelt, Benjamin 2009 Towards a comprehensive Construction Grammar account of control: a case study of Swedish. Constructions and Frames 1(2): 153−189. Matsumoto, Yoshiko 2010 Interactional frame and grammatical descriptions: the case of Japanese noun-modifying constructions. Constructions and Frames 2(2): 135−157. Michaelis, Laura, and Knud Lambrecht 1996 Toward a construction-based theory of language function: the case of nominal extraposition. Language 72: 215−247. Nikiforidou, Kiki 2010 Viewpoint and construction grammar: the case of past + now. Language and Literature 19(3): 265−84. Nikiforidou, Kiki, and Demetra Katis 2000 Subjectivity and conditionality: the marking of speaker involvement in Modern Greek. In: Αd Foolen, and Frederike van Der Leek (eds.), Constructions in Cognitive Linguistics, 217−237. Amsterdam: John Benjamins. Östman, Jan-Ola 2005 Construction discourse: a prolegomenon. In: Jan-Ola Östman, and Mirjam Fried (eds.), Construction Grammars: Cognitive Grounding and Theoretical Extensions, 121−144. Amsterdam: John Benjamins. Ruppenhofer, Josef, and Laura Michaelis 2010 A constructional account of genre-based argument omission. Constructions and Frames 2(2): 158−207. Selting, Margret 1996 On the interplay of syntax and prosody in the constitution of turn-constructional units and turns in conversation. Pragmatics 6(3): 357−388. Steels, Luc 2011 Design Patterns in Fluid Construction Grammar. Amsterdam: John Benjamins. Steels, Luc 2012 Experiments in Cultural Language Evolution. Amsterdam: John Benjamins. Terkourafi, Marina 2010 Don’t go V-ing in Cypriot Greek: a speech-act construction at the interface of semantics, pragmatics and intonation. Constructions and Frames 2(2): 208−241. Tomasello, Michael 2003 Constructing a Language: A Usage-based Theory of Language Acquisition. Boston: Harvard University Press. 
Traugott, Elizabeth Closs 2008 ‘All that he endeavoured to prove was ...’: On the emergence of grammatical constructions in dialogic contexts. In: Robin Cooper, and Ruth Kempson (eds.), Language in Flux: Dialogue Coordination, Language Variation, Change and Evolution, 143−177. London: Kings College Publications. Traugott, Elizabeth Closs, and Graeme Trousdale (eds.) 2010 Gradience, Gradualness and Grammaticalization. Amsterdam: John Benjamins.
Mirjam Fried, Prague (Czech Republic)
29. Foundations of Dependency and Valency Theory

1. Introduction
2. Dependency and constituency
3. Tesnière's Dependency Grammar
4. Some issues in valency research
5. References (selected)
Abstract

Dependency, which can be defined by the notions of reducibility and subcategorisation, is one of two fundamental principles of syntactic arrangement. It differs from the other principle, constituency, with respect to the syntactic information expressed. Many more recent approaches to syntax therefore contain dependency as well as constituency elements. While Tesnière's Syntaxe Structurale is usually considered the first dependency grammar, centring around the notions of connexion, translation and junction, not all relations between words in this framework are dependency relations, and some of them remain in fact quite ill-defined. One central notion in Tesnière's theory is that of valency, which results in the subdivision of dependents into complements and adjuncts. While the notion is intuitively quite convincing, it has remained somewhat elusive and has sparked a considerable amount of ongoing research.
1. Introduction

This article attempts to give a brief outline of some of the basics of dependency and valency theory. Lucien Tesnière's Eléments de Syntaxe Structurale (1959) is generally seen as the first dependency grammar, although the core notions of dependency and valency can be traced back to medieval and early modern grammarians such as Petrus Helias and Matthäus Herben (Ágel 2000: 16−19) and have been taken up by more recent dependency approaches to syntax such as Hudson (1984, 1990), Starosta (1988, 2003) or Heringer (1996). Tesnière's dependency grammar will be discussed in section three of this article. It is preceded by a discussion of dependency and constituency as basic and possibly alternative notions of syntactic structure. This survey will be concluded by a discussion of valency-related issues.
2. Dependency and constituency

2.1. Two basic models of syntactic structuring

Theories attempting to describe how syntactic constructions are created or can be analysed usually draw upon (at least) one of two rivalling principles − that of constituency
and that of dependency, cf. Korhonen (1977: 31−83). A third principle, transformation (Chomsky 1957, 1965), can be disregarded in the present discussion, since it does not deal with the formation of syntactic constructions out of smaller elements (e.g. words). Constituency is best explained as a part-whole relationship (Heringer 1993b) where the parts are related to the whole by “consists of”-rules. Thus, a syntagm such as the yellow flowers could be explained in terms of a noun phrase which consists of a determiner, an adjective and a noun. It is important to note that the concept of constituency leads to the creation of higher-ranking syntactic entities such as phrases, but (in this very basic version of a constituency grammar) no assumptions about the status of the parts relative to each other are being made; they are all of equal importance as parts of the larger construction, as Matthews (1981: 73) points out: “In the crudest form of constituency model, a unit a is related to a neighbouring unit b solely by their placement within a larger unit c”. Diagrammatically, this can be expressed by bracketing or a stemma:

(1) [the yellow flowers]NP    [Det Adj N]NP

            NP                      NP
          /  |   \                /  |  \
       the yellow flowers      Det  Adj  N
In contrast, dependency addresses the question of the relationship between the elements of the syntagm, the basic assumption being that some elements are structurally more important than others. Thus, in the yellow flowers the noun flowers is given special status as the core of the construction, on which the other two elements depend in the sense that the presence of flowers opens up syntactic slots for the other two elements. Thus the flowers and yellow flowers are both acceptable constructions whereas the yellow is not (unless it is seen as elliptical; cf. Halliday and Hasan 1976: 147−166). The construction the yellow flowers is therefore interpreted as consisting of the governor flowers and its dependents the and yellow. In this basic dependency model, no assumption regarding the existence of higher-ranking entities such as phrases is being made. Diagrammatically, dependency relations are commonly expressed by stemmata of the following kind:

(2)     flowers                 N
        /     \               /   \
     the     yellow        Det     Adj
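The contrast between the two basic models can be made concrete in a few lines of code. The encodings below are our own minimal, illustrative choices (not a standard formalism): a constituency analysis as a labelled grouping of categorised words, a dependency analysis as a plain governor-to-dependents mapping with no phrasal node at all.

```python
# Two toy encodings of "the yellow flowers" (illustrative only).

# Constituency: a part-whole structure; the NP node groups the three
# words but says nothing about their status relative to each other.
constituency = ("NP", [("Det", "the"), ("Adj", "yellow"), ("N", "flowers")])

# Dependency: no phrasal node; the noun flowers governs the other words.
dependency = {"flowers": ["the", "yellow"]}

def words_of(tree):
    """Flatten a constituency analysis into its word sequence."""
    _, parts = tree
    return [word for _, word in parts]

print(words_of(constituency))   # ['the', 'yellow', 'flowers']
print(dependency["flowers"])    # ['the', 'yellow']
```

Note that the dependency encoding directly answers "what depends on what?" but, unlike the constituency tuple, preserves no phrase node and no linear order, which is exactly the asymmetry discussed in 2.3 below.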
2.2. Dependency defined: reducibility, endocentricity and subcategorisation

Dependency can be defined as the conjunction of two aspects: reducibility and subcategorisation. Reducibility, a formalized account of which is given in Heringer (1993b: 316−318), concerns the idea that a governor can exist without its dependents but not vice versa. Yellow flowers is a reducible construction with flowers as governor because they commute in the same syntagm: Harry bought yellow flowers − Harry bought flowers. It
is evident that the notion of reducibility is closely connected to Bloomfield's ([1933]1984: 195) notions of endocentricity and headedness, and in fact the close relationship between dependency and endocentricity has been pointed out by Vennemann (1977: 262−266). Some authors like Welke (1995: 163) and Uzonyi (2003) even equate reducibility and endocentricity: Endozentrik bedeutet nämlich in diesem Zusammenhang nur soviel, dass das Vorkommen von a das Vorkommen von b voraussetzt. (…) Nicht endozentrisch ist daher das Verhältnis zwischen dem Valenzträger und seinen obligatorischen Aktanten, weil beide das Vorhandensein des anderen voraussetzen bzw. keines ohne das andere eliminierbar ist. [In this context endocentricity only means that the presence of a presupposes the presence of b. (…) The relationship between a valency carrier and its obligatory complement cannot count as endocentric because each requires the presence of the other; none can be deleted without the other.] (Uzonyi 2003: 231, 233)
Against this one might argue that any dependency construction is by definition endocentric irrespective of its reducibility, because the notions “head of an endocentric construction” and “governor of a dependency construction” are ultimately the same. Endocentricity and reducibility can therefore be seen as closely connected but not identical notions. The reducibility of a construction can be seen as proof of its endocentricity; the absence of reducibility, however, does not prove that the construction is not endocentric. With respect to subcategorisation, it is the governor which is subcategorised by its dependents. For example, a construction with verb + directional adverbial is only possible for a certain subclass of verbs (put into the box, walk along the river) but not all verbs (*stay into the box; *sit along the river). In other words, it is the governor which provides slots for certain types of dependents. The notion of subcategorisation includes the traditional concepts of government and some cases of congruence, though not all of them (Jacobs 2003: 384−385). An example of the latter is the relationship of adjective + noun in German, where the adjective takes its inflectional form according to the gender and number of the noun: ein prächtiger Bau (M.SG) ‘a beautiful building’, eine prächtige Kirche (F.SG) ‘a beautiful church’, prächtige Bauten (M.PL) ‘beautiful buildings’. The adjective subcategorises the class of nouns in the sense that inflected adjectival forms such as prächtiger only occur as attributes of the subclass of masculine singular nouns. The adjective is therefore a dependent of the noun. The conjunction of reducibility and subcategorisation is necessary for the definition of dependency because cases of reducibility and subcategorisation are not co-extensive. For example, defended him in We defended him is not a reducible construction; it cannot be reduced to a single element (its head): We defended him − *We defended.
However, defend belongs to that subcategory of verbs which is followed by a nominal complement and can therefore, by the criterion of subcategorisation, be considered the governor in this construction. In contrast, the dependency relation in appeared yesterday cannot be based on the notion of subcategorisation, because time adverbs like yesterday can co-occur with virtually all verbs. If the verb is considered the governor in this case, it is because of the reducibility of the construction, i.e. the time adverbial presupposes the presence of the verb but not vice versa: He appeared yesterday − He appeared − *He yesterday. Note, however, that such a statement presupposes a particular view of the sentence as a
unit of grammatical description. In certain contexts such as When did the book come out? an utterance such as yesterday could also be argued to be a well-formed sentence. However, since the direction of subcategorisation is a matter of interpretation to some degree (cf. Welke 2011: 26), the question of the direction of dependency remains a matter of debate for some irreducible constructions. This is the case for constructions of the type the book, where the noun is traditionally seen as the governor or head of the construction, but some linguists, such as Hudson (1984: 90−92, 2003: 519−520, 2007: 190−192) or Welke (2011: 31−32), make a case for seeing the determiner as governor. Still other linguists, such as Eroms (1985: 216−217, 1988), Lobin (1995: 127−129) and Herbst and Schüller (2008: 26, 43), assume interdependency in this case. Herbst and Schüller here talk about a nominal head complex, a notion which closely resembles Tesnière's nucleus (see 3.1 below).
2.3. Dependency and constituency compared

The relationship between constituency grammar and dependency grammar has been discussed frequently, for example by Baumgärtner (1970), Vennemann (1977), Matthews (1981), Welke (1995) and Uzonyi (2003). According to Baumgärtner (1970: 59) and Matthews (1981: 84), both types of grammar are weakly equivalent in that every syntagm that can be generated by a constituency grammar can also be generated by a dependency grammar and vice versa. However, with respect to strong equivalence in the sense that both types of description also give identical information about a construction, there seems to be some disagreement, which, according to Uzonyi (2003: 236), is at least partly due to the fact that both dependency grammar and constituency grammar exist in a variety of versions, so that the result of a comparison always depends on the particular versions compared. For that reason, it seems reasonable to initially compare the very basic “pure” versions of constituency and dependency outlined above in 2.1. Here it can be argued that both models differ in some respects and are therefore complementary rather than truly equivalent. First, a “pure” version of constituency grammar gives no indication of the governor-dependent relations of the elements of a phrase (Matthews 1981: 84−85) and is, in that respect, less powerful than a dependency description. Second, discontinuities of the type a difficult book to read are problematic for constituency grammars, since there is a general assumption that words that form a phrase are adjacent to each other (cf. Matthews 1981: 90−91 for a similar argument). “Pure” dependency, however, makes no such assumption. Third, by the same token, a constituency grammar does make a statement of linear sequencing or seriality (cf.
Tarvainen 1981: 13; Vennemann 1977: 259), which is not expressed in the dependency stemma (Uzonyi 2003: 237), as Tesnière (1959: 19) himself makes clear by distinguishing l’ordre structural from l’ordre linéaire (cf. also Vennemann 1977: 267−268). Compare the phrase structure and dependency stemmata of these yellow flowers and these flowers on the table (see [3] and [4]). The fact that yellow precedes flowers but on the table follows it is not expressed in the dependency stemmata, but it does appear from the constituency trees; compare the wider discussion of this in Heringer (1993b: 321−323) and in Eroms and Heringer (2003).
(3) constituency:

           NP                            NP
         /  |  \                       /  |  \
      Det  Adj   N                  Det   N    PP
       |    |    |                   |    |   /  \
    these yellow flowers          these flowers prep  NP
                                                 |   /  \
                                                on Det    N
                                                    |     |
                                                   the  table

(4) dependency:

       flowers                  flowers
       /     \                  /     \
    these   yellow           these     on
                                        |
                                      table
                                        |
                                       the
Finally, it is questionable whether one should assume (as a “pure” dependency model does) that the governor in a dependency relationship is always a word. Thus Matthews (1981: 90) argues that the adverb in Obviously he did it is not dependent on the verb did but on the entire clause he did it, which should be considered its governor. In the same vein Quirk et al. (1985: 511−512) distinguish predication adjuncts which are verb related (She kissed her mother on the cheek) from sentence adjuncts which “relate to the sentence as a whole” (She kissed her mother on the platform). A “pure” dependency model cannot express this difference, because there are no phrases and sentences which would serve as governors of sentence adverbials. More recently, however, constituency- and dependency-based models of syntax have shown a clear tendency of convergence (Schmidt 1991: 212; Hudson 1993: 329−330; Uzonyi 2003: 238−240). This is especially clear for the influence of dependency-related ideas on constituency-based models. An important step in that respect was the development of X′-theory (Jackendoff 1977), which forms a part of such mainstream syntactic theories as Government-Binding Theory and Generalised Phrase-Structure Grammar (cf. Hudson 1993: 329). It introduced the notion of a “head”, symbolised as X, into phrase structure and interpreted the phrase as a projection of its head (Horrocks 1987: 63−64; Müller 2010: 53−54). Up to a point this was foreshadowed by the introduction of strict subcategorisation rules in the standard theory of Transformational Grammar (cf. Klotz 2000: 28−29; Müller 2010: 65). Both express the idea that the syntactic structuring around a lexical element is dependent on (or at least interacts with) some lexical properties of this element. Obviously, there is a close correspondence between the notion of head in X′-theory and the dependency-based notion of governor. 
X′-theory also parallels the dependency approach in that different elements around the head are introduced at different ranks as complements (sisters of X), adjuncts (sisters of X′) and specifiers (sisters of X″). This parallels the distinction between valency-bound dependents (complements) and free dependents (adjuncts), which is a standard part of many versions of
dependency theory (Uzonyi 2003: 239). This distinction will be discussed in more detail below (3.2 and 4.5). While X′-theory may be the most important development with respect to the introduction of dependency-based notions in mainstream generative grammars, it is by no means the only one. Hudson (1993: 329−330) also mentions the use of relational categories such as “subject” in Lexical Functional Grammar and the recognition of “case” and “government” as relevant notions in Government-Binding Theory (cf. also Schmidt 1991: 214). We may add the concept of θ-roles in GB, which corresponds to (one version of) semantic valency, to this list. Uzonyi (2003: 241) finally also mentions the interpretation of sentences as projections of INFL in Government-Binding theory, which is to some extent parallel to the traditional dependency notion of the verb as the central node in a sentence and is even more closely mirrored by Eroms’ (1988: 292−297) account of the dependency structure of a sentence, in which not the verb itself but its inflectional morpheme is the central node (cf. Ágel 2000: 92−95). On the other hand there are also dependency and valency researchers who have included elements of constituency into their models. For example, Herbst and Schüller’s (2008) valency-based approach to English syntax acknowledges the special status of the subject by observing the traditional subdivision of the clause into subject and predicate, which are understood as constituency or construction-based terms. The predicate is the clause constituent which contains the verb and all its complements except for one (usually the agent), which is mapped onto the subject. The grammar thus accounts for the structural obligatoriness of the subject in the English finite declarative clause as opposed to imperatives or non-finite clauses. 
The approach is reminiscent of the GB analysis, which interprets the subject as an external argument which is not part of the VP but is tied to the verb by a θ-role (cf. Sells 1985: 35−36). According to Uzonyi (2003: 240−245), dependency and constituency stemmata can be converted into each other by a simple algorithm provided they contain information about the following aspects: (i) headedness of constructions, (ii) linear order of elements, (iii) category membership of elements, (iv) type of relationship between head and dependents (complements or adjuncts) and (v) type of connection (element + element; element + phrase; phrase + phrase).
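The core of such a conversion can be sketched briefly: given heads (i), linear order (ii) and categories (iii), every governor projects a phrase over itself and its dependents. The data structures, the phrase-labelling rule (category letter plus "P") and the example are our own illustrative choices; Uzonyi's further ingredients (iv) and (v) are omitted here for brevity.

```python
# Sketch of a dependency-to-constituency conversion: each governor
# projects a phrase spanning itself and its dependents in surface order.
from dataclasses import dataclass

@dataclass
class Word:
    form: str   # the word itself
    cat: str    # category membership, e.g. "N", "Det"
    head: int   # index of the governor; -1 marks the root

def project(words, i):
    """Return the bracketed phrase projected by word i."""
    deps = [j for j, w in enumerate(words) if w.head == i]
    if not deps:                       # a bare word projects nothing
        return words[i].form
    span = sorted(deps + [i])          # linear order of head + dependents
    parts = [words[j].form if j == i else project(words, j) for j in span]
    return f"[{words[i].cat}P {' '.join(parts)}]"

sent = [Word("these", "Det", 1), Word("flowers", "N", -1)]
print(project(sent, 1))   # [NP these flowers]
```

The sketch makes the direction of the information flow visible: the dependency analysis alone does not suffice; linear order and categories have to be supplied before a constituency bracketing can be derived.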
3. Tesnière’s Dependency Grammar

Many aspects of Tesnière’s syntactic model were published in writing as early as 1934, but the Eléments de Syntaxe Structurale, which was published posthumously in 1959, five years after his death, is generally seen as his central achievement. It is subdivided into three parts which discuss the three central notions of connexion, translation and jonction (cf. Heringer 1993a; Ágel 2000: 34; Askedal 2003: 80).
3.1. Connexion

Connexions or dependency relations are fundamental to a sentence in that they establish its structural order. This they do independently of any morphological markings (Tesnière
1959: 11), so although connexion is akin to the traditional concept of government, it is more general in that connexion is not a morpho-syntactic but a purely syntactic relationship (cf. Askedal 2003: 81). The sentence Alfred parle ‘Alfred speaks’ consists of three elements: two words and the syntactic connection which exists between them and integrates them into a whole which is more than the sum of its parts (Tesnière 1959: 11−12). Dependency relations are directional in that they connect a governor and a dependent. Whereas a governor can have several dependents, each dependent can only be controlled by one governor (Tesnière 1959: 14), unless the governors are coordinated in jonction (see 3.4 below). Dependency relations are also recursive (Askedal 2003: 81) in that a word can act as dependent and governor at the same time; this establishes a dependency hierarchy (Tesnière 1959: 13−14), which is the structural order of a sentence. The structural order of the sentence mon vieil ami chante cette fort jolie chanson ‘my old friend sings that very pretty song’ can be represented in the following dependency stemma (Tesnière 1959: 15):

(5)            chante
              /      \
           ami        chanson
          /   \      /     \
       mon   vieil cette   jolie
                             |
                           fort
Governors establish a nexus (nœud), which consists of the governor and all its (direct and indirect) dependents. As Askedal (2003: 86) points out, the nexus is, up to a point, reminiscent of the phrase in constituency-based grammars, which is also why Engel prefers the translation Nexus (nexus) rather than Knoten (node) in his German translation of Tesnière’s Eléments (1980: 390, fn. 4). Dependency relations exist between words of the four categories noun (O), verb (I), adjective (A) and adverb (E) (Tesnière 1959: 64), which are similar to but not entirely co-extensive with the traditional word classes whose names they share, and combine into the class of constitutive words (mots constitutifs) (Tesnière 1959: 55−56; cf. Askedal 2003: 84). The four constitutive categories form a generalised dependency hierarchy, in that I dominates the sentence as the nœud central (Tesnière 1959: 15) and governs O, which in turn governs A. E, finally, can be governed by I or A and can only govern another E. Askedal (2003: 84) summarises this generalised hierarchy of dependencies in the following chart (cf. also Ágel 2000: 38):

(6)      I
        / \
       O   E
       |   |
       A   E
       |   |
       E   E
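The chart amounts to a small table of licensed governor-dependent pairs. A minimal sketch (our own encoding, using Tesnière's category letters) makes the restrictions explicit:

```python
# Askedal's generalised hierarchy as a table of licensed
# governor -> dependent pairs (I = verb, O = noun, A = adjective,
# E = adverb, following Tesnière's category symbols).
MAY_GOVERN = {
    "I": {"O", "E"},   # the verb governs nouns and adverbs
    "O": {"A"},        # nouns govern adjectives
    "A": {"E"},        # adjectives govern adverbs
    "E": {"E"},        # an adverb can only govern another adverb
}

def licensed(governor: str, dependent: str) -> bool:
    """Is this dependency permitted by the generalised hierarchy?"""
    return dependent in MAY_GOVERN.get(governor, set())

print(licensed("I", "O"))   # True: a verb may govern a noun
print(licensed("O", "O"))   # False: a noun governing a noun needs transference
```

The failing case in the last line is exactly the configuration that Tesnière's notion of transference (3.4 below) is designed to repair, by re-categorising a nucleus so that an otherwise unlicensed dependency becomes possible.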
Words outside the constitutive word classes, such as auxiliaries or articles, form the class of subsidiary words (mots subsidiaires). They attach to full words to form a nucleus (nucléus), e.g. in the formation of complex tenses est arrivé ‘has arrived’, the combination of copula and adjective est grande ‘is tall’ (Tesnière 1959: 46), or the combination of article and noun le livre ‘the book’ (Tesnière 1959: 56). The nucleus takes its category from the constitutive word at its centre. However, subsidiary words are not connected to full words by dependency relations; rather, Tesnière (1959: 56) describes them as satellites of the constitutive word with a tendency to agglutination. Apparently, he also considers inflections as subsidiary elements which form a nucleus with their stem, because Fr. de Pierre and Engl. Peter’s are shown as equivalent syntagms (1959: 368). In that vein Tesnière (1959: 57−58) also discusses the historical development from constitutive word via subsidiary word to inflection; Pasierbsky’s (1981) distinction between macro- and micro-valence is clearly inspired by this. Finally, constitutive words can form nuclei by themselves. The elemental building block of the structural order of the sentence is therefore the nucleus (Tesnière 1959: 45). However, whereas extranuclear syntactic relations are well defined in Tesnière’s theory, the exact nature of intranuclear relations remains unclear. Below in 3.4 it will become apparent that the division of syntactic structure into an extra- and an intranuclear part also leads to problems with respect to jonction. Stemmatically, nuclei can be expressed in the following way (Tesnière 1959: 46, 56):

(7)   parle        est arrivé      est grand      il regarde
        |              |               |              |
     Alfred         Alfred          Alfred        le livre

   ‘Alfred speaks’  ‘Alfred has arrived’  ‘Alfred is tall’  ‘he looks (at) the book’
It will be noticed that the personal pronoun il in the last stemma is considered part of the verbal nucleus. Tesnière distinguishes the emphatic French pronouns moi, toi, lui, etc., which he considers nouns (O), from the non-emphatic ones je, tu, il, etc., which are subsidiary words and mere morphological markers of the category person in the verb (Tesnière 1959: 131−137). However, it is not entirely clear how this distinction could be applied to other languages such as German and English, which do not have two sets of personal pronouns. Tesnière (1959: 133) does notice that in English the subject pronoun can be left out in cases such as Here’s Mr. Maldon, begs the favour of a word, Sir, which can be seen as an argument in favour of considering it merely subsidiary.
3.2. Valency: actants and circumstantials

Tesnière (1959: 102−103) subdivides the direct dependents of the verb functionally into actants (actants) and circumstantials (circonstants). Categorially, actants are the direct noun dependents (O) of the verb, circumstantials the direct adverb dependents (E). Semantically, actants are the participants involved in some process expressed by the verb
whereas circumstantials indicate time, place, manner etc. A verbal nexus can involve up to three actants, which serve the semantic roles of agent (O′), affected (O″) and beneficiary (O‴) (Tesnière 1959: 115). Depending on the particular language, actants are formally identified by their morphology or position (Tesnière 1959: 107−111). Since actants form a semantic unit with the verb they are often but not always obligatory; in contrast, circumstantials are essentially optional (Tesnière 1959: 128). It is clear that many aspects of Tesnière’s distinction between actants and circumstantials are rather traditional, and he himself notices that the distinction set out in this way is not entirely watertight. For example à Charles in Alfred donne le livre à Charles ‘Alfred gives the book to Charles’ is considered third actant despite the fact that the nucleus contains a directional preposition and is therefore not strictly a noun anymore. On the other hand, de veste in Alfred change de veste, literally Alfred changes of jacket ‘Alfred changes his jacket’, is considered a circumstantial because of its preposition and the fact that it does not fit the semantic description of second (affected) or third (beneficiary) actant. This is despite the fact that de veste is an obligatory dependent of the verb change (Tesnière 1959: 128). Depending on its valency (valence), a verb can take between zero and three actants; verbs can therefore be classified as avalent, monovalent, divalent (or transitif in Tesnière's terminology) and trivalent. Tesnière (1959: 238–239) emphasises that not all valency slots of a verb need to be filled, although the exact conditions of optionality are not discussed. Furthermore, since the actants are defined in terms of their semantic roles, the numbering O′–O‴ does not imply any precedence of occurrence; for example, the fact that a verb takes O‴ does not necessarily imply that it is trivalent and also takes O′ and O″. 
This is the case in the German sentence Mir ist warm, literally me is warm ‘I am warm’, where mir is the only actant but is considered O‴ because of its case morphology and semantic role. Similarly, the divalent French verb plaire takes a first and a third actant in le livre me plaît ‘the book pleases me’ (Tesnière 1959: 246), and the trivalent Latin docere takes two O″ but no O‴ in Antonius docet pueros grammaticam ‘Anthony teaches the boys grammar’ (Tesnière 1959: 257). From a semantic point of view pueros may well be seen as the recipient or beneficiary of the teaching, so that Tesnière’s interpretation of this syntagm seems to be primarily guided by the morphological form. However, in other cases, such as the interpretation of à Charles above, he seems to give precedence to semantic over morphological considerations. Overall, it therefore seems fair to say that Tesnière’s classification of actants suffers from the same problem that also besets traditional grammatical concepts such as “object”: both types of classification employ a mixed bag of semantic and formal criteria which, in many cases, simply do not coincide.
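Tesnière's valency classes can be sketched as a small lexicon that maps each verb to its actant slots. The verb entries and the use of the O′/O″/O‴ labels as slot names are our own illustrative assumptions in the spirit of the text, not Tesnière's own lexicon format; the point is that the class name counts slots, while which slots a verb takes is an independent matter.

```python
# A toy valency lexicon in the spirit of Tesnière's classification
# (entries are illustrative; roles use the O'/O''/O''' convention).
VALENCY = {
    "pleuvoir": [],                      # avalent: il pleut 'it rains'
    "tomber":   ["O'"],                  # monovalent
    "aimer":    ["O'", "O''"],           # divalent (transitif)
    "plaire":   ["O'", "O'''"],          # divalent, yet with a first and
                                         # a third actant (le livre me plait)
    "donner":   ["O'", "O''", "O'''"],   # trivalent
}

LABELS = {0: "avalent", 1: "monovalent", 2: "divalent", 3: "trivalent"}

def classify(verb: str) -> str:
    """Valency class = number of actant slots, not which slots they are."""
    return LABELS[len(VALENCY[verb])]

print(classify("donner"))   # trivalent
print(classify("plaire"))   # divalent
```

The plaire entry illustrates the point made above: taking O‴ does not imply trivalency, since the class is determined by the number of slots alone.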
3.3. Diathesis

Tesnière (1959: 242−246) treats the issue of passivisation under the traditional term diathèse (diathesis), which he interprets as a reallocation of semantic roles to actants: under passivisation, O′ is the affected entity. This reallocation of semantic roles is also at the heart of reflexivity, where one entity is given two semantic roles, and reciprocity, where two entities take the role of agent and affected respectively in two converse activities. Reflexivity and reciprocity are thus also included under the heading of diathesis, as are the causative and recessive diatheses. However, the latter two are of a different kind because they change the number of actants around the verb. Under causative diathesis the verb gains one actant, under recessive diathesis it loses one actant. The causative diathesis can be expressed periphrastically as in Alfred apprend la grammaire ‘Alfred learns grammar’ > Alfred fait apprendre la grammaire à Charles, literally Alfred makes learn the grammar to Charles ‘Alfred teaches grammar to Charles’, or lexically as in Alfred voit l’image ‘Alfred sees the picture’ > Charles montre l’image à Alfred ‘Charles shows the picture to Alfred’ (Tesnière 1959: 260−261). A case of recessive diathesis is Alfred lève quelque chose ‘Alfred lifts sth.’ > Alfred se lève, literally Alfred lifts himself ‘Alfred gets up’, where se is interpreted not as an actant but as part of the verbal nucleus.
3.4. Transference and junction

Dependency relations of the kind discussed above establish the simple sentence; the expansion of this is brought about by junction (jonction) and transference (translation) (Tesnière 1959: 323). By transference, a constitutive word or nucleus can be transferred to another category. Morphologically, transference is effected by the addition of subsidiary words of the subclass translatif (transferent) to the nucleus. For example, the noun (O) génie has been transferred to the category adjective (A) in un poète de génie ‘poet of genius’; the morphological marker of the transference is the transferent de. In this way, the noun génie can be governed by another noun, poète, without violating the generalised dependency hierarchy above. The notion of transference can therefore be seen as a necessary consequence of the axiomatic correlation of constitutive classes and dependency relations, which is expressed in the generalised dependency hierarchy (Askedal 2003: 85). In effect, transference creates a double-faced nucleus, which acts as an instance of one category with respect to its upward dependencies, but of another category in its downward dependencies (Weber 1996: 250−251; cf. also Askedal 2003: 85). Thus the nucleus around park in They met in a large park simultaneously acts as an adverb (E) dependent of the verb (I) meet and a noun (O) governor of the adjective (A) large. Whereas transference is a qualitative phenomenon in that it changes the categorial status of a nucleus, junction is a quantitative phenomenon, i.e. it connects elements of the same kind (Tesnière 1959: 324). Junction occurs when two or more nuclei in a sentence share the same dependency relation. Thus, in Alfred et Bernhard tombent ‘Alfred and Bernhard fall’ the first (and only) dependent of the verb tombent is realised by two nuclei simultaneously (Tesnière 1959: 325−326).
Junction is morphologically marked by jonctifs (junctives), which alongside transferents form the second major class of subsidiary words. Junction can result from two dependents sharing the same dependency relation with a governor, as in the example above, or, vice versa, from two governors sharing a dependency relation with one dependent, as in les enfants rient et chantent ‘the children laugh and sing’ (Tesnière 1959: 340). By combining several junctions of either kind, very complex stemmata can result, which also allow crossing dependency lines, as in le père et la mère achètent et donnent des livres et des cahiers à Alfred et à Bernard, literally the father and the mother buy and give books and writing pads to Alfred and to Bernard ‘Father and mother bought and gave Alfred and Bernard books and writing pads’ (Tesnière 1959: 345):
IV. Syntactic Models
(8) [Stemma: the coordinated verbs achètent et donnent jointly govern three junctions (le père et la mère, des livres et des cahiers, à Alfred et à Bernard), each verb being linked to each conjunct, which produces the crossing dependency lines]
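For readers who think of stemmata computationally, the shared dependency relations in (8) can be sketched as a small graph structure. This is an illustrative encoding of my own, not part of Tesnière's formalism; the junction labels (J_verbs, J_subj, etc.) are invented for the sketch.

```python
from collections import defaultdict

# Each junction groups the nuclei that jointly fill one dependency slot
# in stemma (8); the J_* labels are invented names, not Tesnière's.
junctions = {
    "J_verbs": ["achètent", "donnent"],
    "J_subj":  ["le père", "la mère"],
    "J_obj":   ["des livres", "des cahiers"],
    "J_iobj":  ["à Alfred", "à Bernard"],
}

# Junction means one shared dependency relation, so every coordinated
# verb ends up governing every conjunct of every dependent junction;
# this many-to-many sharing is what yields the crossing lines in the
# printed stemma.
dependents = defaultdict(set)
for verb in junctions["J_verbs"]:
    for slot in ("J_subj", "J_obj", "J_iobj"):
        dependents[verb].update(junctions[slot])

print(sorted(dependents["achètent"]) == sorted(dependents["donnent"]))  # -> True
```

The point of the sketch is only that both verbs govern exactly the same six dependent nuclei, i.e. the dependency relation is shared rather than duplicated.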
Tesnière (1959: 327) stresses the fact that junction is strictly extranuclear. This is a consequence of the definition of junction as a phenomenon where two elements share one dependency relation, because such relations only exist between but never inside nuclei. The descriptive adequacy of the notion of strictly extranuclear junction, however, must be doubted. For English, it is easy enough to show that there are structures which show junction inside a construction that in Tesnière’s theory would be considered a nucleus. For example, the last part of the sentence above translates into English as to Alfred and Bernard. Here the nouns Alfred and Bernard share the transferent to, which is a case of intranuclear junction. Similarly, two nouns can share one determiner, as in peel the potatoes and cucumbers. Furthermore, the copula + adjective construction also counts as one nucleus in Tesnière’s system (see 3.1 above), so that John is young and foolish also illustrates intranuclear junction. Simply admitting intranuclear junction, however, would not solve the problem, because junction is based on the notion of shared dependency relations, which by definition hold between, never inside, nuclei (see 3.1 above). To resolve this situation, one would either have to redefine junction (but on what basis?) or admit intranuclear dependency relations, which amounts to abolishing the idea of the nucleus. It thus becomes apparent that the real problem does not reside in Tesnière’s theory of junction, but rather in his distinction between extranuclear (i.e. dependency-based) and intranuclear constructions.
4. Some issues in valency research

General surveys of concepts of valency can be found in Korhonen (1977), Welke (1988, 2007), Helbig (1992) and Ágel (2000). For an extended notion of valency where valency carrier and governor/head of construction do not necessarily coincide, see Welke (2011: 80−81).
4.1. Valency as a theory-neutral and descriptive notion

As we have seen in 3.2 above, valency, i.e. the notion that a lexical head determines the number and formal properties of elements in its syntactic environment, and the consequent distinction between actants and circumstantials (or complements and adjuncts in more current terminology), were developed in the framework of Tesnière’s theory of dependency. However, as has been pointed out in section 2.3 above, similar notions such as subcategorisation, the complement-adjunct distinction in the X′-scheme or θ-roles have found their way into a variety of constituency-based syntactic frameworks. As a result, the notion of valency has become generalised or theory-neutral to a certain extent. In that vein, Müller (2010) discusses the inclusion of valency information in Lexical-
29. Foundations of Dependency and Valency Theory
Functional Grammar, Categorial Grammar, Head-Driven Phrase Structure Grammar, Construction Grammar and Tree Adjoining Grammar. Nevertheless, it seems fair to say that the ramifications of valency theory have been worked out especially by those linguists who see themselves in a Tesnièrian tradition. For this reason the present article will conclude with a brief survey of valency-related issues.
4.2. Valency dictionaries

The lexicalism inherent in the notion of valency has led to a considerable amount of descriptive, lexicographic work. For the Romance languages, Busse (2006: 1424) lists fourteen dictionaries, the oldest of which, Cuervo’s Diccionario de Construcción y Régimen de la Lengua Castellana, goes back to the 19th century. It is clear, however, that these early works can only be called valency dictionaries in the broadest sense in that they deal with the syntagmatic behaviour of the words they describe. A more recent well-known valency dictionary for a Romance language is Busse and Dubost’s Französisches Verblexikon (1977). In Germany, where valency research has a strong base, valency lexicography began with Helbig and Schenkel’s pioneering Wörterbuch zur Valenz und Distribution deutscher Verben ([1968] ²1973), which was followed by companion volumes dealing with the valency of adjectives (Sommerfeldt and Schreiber 1974) and the valency of nouns (Sommerfeldt and Schreiber 1977). A much more recent example is VALBU (Schumacher, Kubczak, Schmidt, and de Ruiter 2004), which is based on an analysis of the corpora of the Institut für deutsche Sprache (IdS). In English-speaking countries, notions of valency have traditionally played a minor role. However, there is a strong tradition of learner’s dictionaries following the example of the ground-breaking Idiomatic and Syntactic English Dictionary (Hornby et al. 1942), which include sophisticated descriptions of syntactic patterns around verbs, adjectives and nouns and can be seen as predecessors of the modern notion of valency (Heath 1984: 333). The first dedicated English valency dictionary, however, was published as recently as 2004 under the title A Valency Dictionary of English (VDE) by Herbst, Heath, Roe, and Götz. Like VALBU it is corpus-based and takes its descriptions from the analysis of data from the COBUILD Bank of English.
4.3. Valency carriers

While it is uncontroversial that the property of valency should be applied to lexical verbs, there is much less agreement as to which other word classes can or should be regarded as valency carriers. While some linguists argue that the term valency should be restricted to the verb (e.g. Eichinger 1995: 37), others have applied it to other traditional word classes such as adjectives (Sommerfeldt and Schreiber 1974; Tarvainen 1981; Herbst 1983; Welke 1988, 2011) and nouns (Sommerfeldt and Schreiber 1977; Tarvainen 1981; von Randow 1986; Herbst 1988; Welke 1988, 2011) as well as to adverbs (Herbst and Schüller 2008; Welke 2011).
The concept of valency can also be applied to other words if one breaks with traditional word class categories. Thus, for example, Herbst and Schüller (2008: 61−62) suggest subsuming traditional subordinating conjunctions, traditional prepositions and some words traditionally classified as adverbs under one word class called particles and explain the differences between these in terms of valency:

(9) ... I should have thought of it before. (BNC)
(10) ... I went out and exercised the dog before lunch ... (BNC)
(11) Even before autumn sets in, gardeners start thinking about spring ... (BNC)

Traditionally, before is considered an adverb in (9), a preposition in (10) and a conjunction in (11). In valency terms, however, it could be considered a monovalent particle that takes one optional complement, which is not realized in (9), realized by an NP in (10) and by a finite clause in (11). This parallels the analysis of verbs in that the latter can also occur in a variety of complementation patterns without changing their word class (cf. Huddleston and Pullum 2002: 599−601, 1011−1014; Herbst and Schüller 2008: 64−67, 143−145; for a critical view see Breindl 2006: 938; Matthews 2007). In 3.1 above the nucleus rather than the word was identified as the elemental building block of the structural order of the sentence in Tesnière’s dependency theory, which raises the question to what extent multi-word units can be seen as valency carriers as well. This concerns the analysis of phrasal and prepositional verbs in English such as look after or take off or cases such as anmontieren or auflegen in German (Emons 1974: 123−129, 1995; Ágel 2000: 141−143) as well as phrasal combinations of the type had better or call into question (Emons 1995: 281; Herbst and Schüller 2008: 146) or idioms such as reinen Wein einschenken (literally pour clear wine ‘tell the truth’). Ágel (2003: 33−34) analyses this idiom holistically as a semantic valency carrier (semantischer Ausdrucksvalenzträger) containing a lexical valency carrier (Wortvalenzträger), i.e. the verb einschenken, which is reminiscent of Somers’ notion of integral complements mentioned below in 4.5.
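The monovalent-particle analysis of before sketched above can be restated as a toy lexical entry. The field names and realisation labels below are my own invention for illustration, not Herbst and Schüller's notation.

```python
# Hypothetical lexical entry for the particle 'before': one optional
# complement slot with three possible realisations, corresponding to
# (9) no complement, (10) NP complement, (11) finite-clause complement.
BEFORE = {
    "word_class": "particle",
    "complements": [
        {"optional": True, "realisations": {"zero", "NP", "finite_clause"}},
    ],
}

def licenses(realisation):
    """True if 'before' licenses the given realisation of its single slot."""
    slot = BEFORE["complements"][0]
    return realisation in slot["realisations"]

print(licenses("zero"), licenses("NP"), licenses("finite_clause"))  # -> True True True
```

On this view the three traditional word classes collapse into one entry whose slot is simply realised differently, just as a verb keeps its word class across complementation patterns.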
4.4. Valency and meaning

Valency is commonly assumed to be a property not of lexemes but of lexical units in the sense of Cruse (1986: 80). Helbig (1992: 10−13), for example, discusses the interrelatedness of the meaning of a valency carrier and its semantic and syntactic valency, and in a similar vein Welke (2011: 64) sees the syntactic valency properties of verbs as a formal reflex of their semantic character as predications. Thus (12) and (13) can be analyzed as exemplifying different meanings of the lexeme get, each with its own set of valency properties:

(12) Jane got wet. (‘came to be’)
(13) Jane got a bicycle for her birthday. (‘received’)

However, the question of how many and which lexical units a given lexeme subsumes frequently does not allow for a straightforward answer (Ágel 2000: 115−117; Klotz
2000: 120−126). For example, one might argue that write in (14) and (15) are instances of different lexical units, since the latter has a habitual meaning (‘be a writer’):

(14) John writes beautiful letters.
(15) John writes.

Although a distinction between divalent write ‘set down on paper’ and monovalent write ‘be a writer’ could be made, it is clear that such an analysis is far more subtle and therefore less compelling than the get-example above. Note, for example, that the habitual interpretation disappears if monovalent write occurs in the progressive or perfective (Herbst and Roe 1996: 185−187):

(16) John is writing.
(17) John has written.

This observation leads to the additional question of whether it would not be more appropriate to specify valency properties for word forms, or for the occurrence in particular grammatical constructions such as the progressive or perfective, rather than for lexemes (cf. Ágel 2000: 113−114). A related problem concerns the extent to which the valency properties of a lexical unit can be generalized as a function of its meaning. Certain parallels can no doubt be observed; for instance, Francis, Hunston, and Manning (1996) in COBUILD Grammar Patterns list verbs occurring in particular syntactic patterns in semantically defined groups. On the other hand, Klotz (2007) investigates verbs that take propositional complements in the form of that-clauses and N to-infinitive clauses and concludes that there is no significant statistical correlation between the semantic class a verb belongs to and the form of its propositional complement. Similarly sceptical views about the reducibility of valency properties to facts of meaning are raised by Gazdar et al. (1985: 32), Helbig (1992: 9) and Noël (2003). In fact, empirical research on valency provides ample evidence to show that valency properties must be considered item-specific to a very high degree (Herbst 2009; Faulhaber 2011).
4.5. The character of valency and the complement-adjunct distinction

Although the distinction between complements and adjuncts is a fundamental one in valency theory, since only complements add to the valency description of a lexeme, it has been a difficult one to draw from the very beginnings of valency theory in Tesnière (1959: 127−129). He uses three criteria to distinguish the two types of dependents: form (complements are nominal, adjuncts adverbial), semantic role (complements take the role of agent, affected or beneficiary) and obligatoriness (complements tend to be obligatory, adjuncts tend to be optional). As shown in 3.2, these criteria give conflicting evidence in cases like Alfred change de veste, where de veste is considered an adjunct because of its prepositional form despite its obligatoriness. It is fair to say that over the years valency grammarians have spent a considerable amount of time and effort in an attempt to clarify and formalize the distinction, which
Somers (1984: 508) calls “[b]y far the most researched question in valency grammar”, without arriving at a solution which could be generally considered satisfactory. Indicative of this is Emons’ (1978: 21) conclusion that, despite all operational tests, making the distinction requires a good measure of enlightened intuition. Similarly, Welke (2011: 62) argues that formal tests can only be considered as indicators of the complement or adjunct status of a dependent without clarifying the intuitive notion of valency as such, while Jacobs (2003: 387) argues that formal tests only make sense if they are embedded in a theoretical framework, i.e. the notion of valency must be defined before formal tests for the complement-adjunct distinction can be devised. More recently, valency has increasingly come to be seen as a gradient phenomenon. Ágel (2000: 199−200) divides prototype approaches to valency into mono-dimensional ones, which see complement and adjunct as end points along a single gradient, and multi-dimensional ones, which see valency as a cover term for a variety of syntactic relations, which cumulatively determine the degree of complementhood of a dependent. Mono-dimensional approaches are exemplified by Somers (1984), Heringer (1984) and Engel (1992). Somers suggests a transition from complements to adjuncts in six discrete steps. He augments the traditional trinity of obligatory complements, optional complements and adjuncts by three more types of dependent: integral complements, which are lexically bound to a general verb (put sb. at risk); middles, which are not valency-bound but semantically specific (speak French with a Scottish accent); and extraperipherals, which modify an entire proposition (usually). Heringer (1984) develops a cognitively based valency concept according to which a verb triggers a mental scene of some state or process which includes the participants. Participation in such a scene is a matter of gradience and can be tested by an association test.
Engel (1992: 64−66) suggests three tests of complementhood: complements cannot be deleted, their form is determined by their governor, and they cannot be peripherally attached to the clause by und zwar. The complement-adjunct gradient results from the fact that dependents can pass all, several or none of the suggested tests. A multi-dimensional approach is suggested by Hudson (1990: 208). For him, complements are a prototype category within the general category of dependents, which can be more or less complement-like in a variety of ways. By avoiding the concept of adjunct, Hudson effectively replaces the mono-dimensional complement-adjunct gradient by a multi-dimensional concept, in which dependents approximate the “ideal” of complementhood to various degrees from various sides. The most elaborate multi-dimensional model was originally suggested by Jacobs in 1987, formally published with a postscript in 1994, and presented by Ágel (2000) with some modifications. Jacobs (2003) discusses a more recent version of this, but since it is not possible to do justice to this extended model in the short space available, the older version will be presented here. In spirit, however, both versions are compatible. Jacobs suggests that the term valency itself may be misleading in that it subsumes a variety of syntactic dependency relations which are independent of each other and therefore should be kept separate. Ágel (2000: 171−191) factorises valency into five different syntactic relations, which are based on whether the dependent is obligatory (NOT), an argument of the valency carrier (ARG), formally specified by the valency carrier (FOSP) or semantically specified by the valency carrier (INSP); the fifth criterion is subcategorisation of the valency carrier by the dependent (SUBKLASS).
These criteria must, to some extent, be seen independently of each other because, for example, the second dependent of live in they live in Chester is +NOT but −FOSP (there, on top of a mountain, beyond the river are also possible) whereas the second dependent of smoke in he smokes expensive
cigars is −NOT (he smokes is acceptable) but +FOSP (it is always an NP). However, in the postscript to his original suggestion, Jacobs (1994: 71) suggests that some dependency relations are more fundamental than others. For example, there appear to be no dependency relations which are based on +NOT or +FOSP but are not +ARG at the same time. Jacobs therefore assumes a hierarchy of dependency relations of the following kind:

(18) NOT, FOSP > INSP > ARG

Thus all dependents which are obligatory and/or formally specified are also semantically specified, and all dependents which are semantically specified are also arguments of the valency carrier. Jacobs interprets this finding in terms of a gradient of grammaticalisation. While Jacobs and Ágel’s suggestions are generally quite convincing, there are cases which do not easily fit into this general hierarchy of valency relations. For example, benefactive objects such as me in John poured me a beer are arguably +FOSP (always NPs or for-PPs) and also +SUBKLASS (generally, monovalent verbs do not admit them: *John smiles/sleeps/sneezes me), but it is doubtful whether they can be seen as arguments. A similar point could be made with respect to Somers’ “middles” (see above); the PP in speak French with a Scottish accent seems to be semantically specific in relation to speak without being an argument of speak. Note, however, that the model proposed in Jacobs (2003) defines argument in uncommonly broad terms which would include the cases discussed above.
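The implicational hierarchy in (18) can be stated as a simple consistency check over the five relations. The boolean encoding below is my own sketch of the Jacobs/Ágel proposal, not their formalism; the example feature assignments follow the discussion in the text.

```python
def respects_hierarchy(dep):
    """Check the chain NOT, FOSP > INSP > ARG from (18): an obligatory
    (NOT) or formally specified (FOSP) dependent must be semantically
    specified (INSP), and an INSP dependent must be an argument (ARG)."""
    if (dep["NOT"] or dep["FOSP"]) and not dep["INSP"]:
        return False
    if dep["INSP"] and not dep["ARG"]:
        return False
    return True

# 'in Chester' in "they live in Chester": obligatory but not formally specified
live_loc = {"NOT": True, "FOSP": False, "INSP": True, "ARG": True, "SUBKLASS": True}
# 'expensive cigars' in "he smokes expensive cigars": optional but always an NP
smoke_obj = {"NOT": False, "FOSP": True, "INSP": True, "ARG": True, "SUBKLASS": True}
# benefactive 'me' in "John poured me a beer": +FOSP yet arguably not an argument,
# i.e. the kind of counterexample discussed in the text
benefactive = {"NOT": False, "FOSP": True, "INSP": True, "ARG": False, "SUBKLASS": True}

print(respects_hierarchy(live_loc), respects_hierarchy(smoke_obj),
      respects_hierarchy(benefactive))  # -> True True False
```

The benefactive case fails the check precisely because it is +FOSP but −ARG, which is the problem the text raises for the hierarchy.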
4.6. Optionality

One central aspect in the classification of complements concerns questions of optionality. All valency models seem to agree that not only adjuncts can be regarded as valency-optional but also certain types of complement (cf. Tesnière’s characterisation of complements discussed above in 3.2). Therefore a distinction between obligatory and optional complements is made, but the exact nature of optionality remains a matter of debate. Helbig and Schenkel (²1973: 36), for example, see optionality as a surface phenomenon, i.e. they interpret it in terms of a deletion process from a semantically complete underlying representation. However, it is debatable to what extent this notion of deletion applies to cases such as the non-expressed second complement of read in John read all morning, where the speaker may simply not have conceptualised the object of reading in the first place rather than deleting it in the process of formulating his or her message. Helbig and Schenkel (²1973: 54) also classify cases of lexical ellipsis such as lay in (19) as a divalent use with deletion:

(19) One of our hens started to lay before she was paired to a cock. (VDE)

Alternatively, one might consider this as a case of encapsulation where one complement is encapsulated in the verb (cf. Lyons 1977: 262), which therefore assumes a special interpretation and establishes a different lexical unit (cf. 4.4 above). Allerton (1975, 1982: 68−69) also employs the concept of deletion but distinguishes between indefinite deletion (Oliver was reading) and contextual deletion (Oliver was
watching), which is only possible if the unexpressed argument is retrievable from context. Similar distinctions are drawn by Matthews (1981: 125), Sæbø (1984) and Fillmore (2007: 146−148), who distinguishes indefinite null instantiations of complements from definite null instantiations. Klotz (2000: 13−14) adds to Allerton’s typology of optionality by introducing stereotypically optional complements to account for cases where some lexically specified complements are optional but not others:

(20) We can’t afford to buy a house, so we have to rent.
(21) *He arrived without a car, so he had to rent.

The second complement of rent is optional only if it signifies living space (flat, house, etc.) but not other things (car, etc.). Still another type of optionality is mentioned by Welke (1988: 26, 2011: 136), who, following Pasch (1977), points out that some otherwise obligatory complements need not be realized in contexts of modality and contrast:

(22) Er kann (gut) beobachten.          [German]
     he can (well) observe
     ‘He knows how to observe.’
The Valency Dictionary of English (Herbst, Heath, Roe, and Götz 2004; cf. 4.2 above) attempts to account for such facts by identifying not only the minimum and maximum valency in active and passive uses but also in a general use, such as that of survive in (23):

(23) Plants can adapt themselves to their environment and survive. (VDE)

Another question of optionality concerns the relationship between valency and clause structure. Many valency descriptions are based on active declarative clauses and therefore consider the subject of the active clause an obligatory complement. The passive is accordingly seen as a case of valency reduction (e.g. Korhonen 1977: 190; Welke 1988: 69) from the Grundvalenz [fundamental valency], which Welke (1988: 63) describes as “das Wissen der Sprecher/Hörer über das übliche Argumentenpotential” [the speaker’s/hearer’s knowledge of the usual argument potential]. A similar approach is also taken by Heringer (1996: 67−69), who posits a valency frame as a kind of standard realization of valency, which he takes as a default that allows alternations by processes such as ellipsis or reduction. In that vein, Helbig and Schenkel (²1973: 56) point out that the obligatory nominative complement of the active clause “becomes” an optional (prepositional) complement in the passive. While this is compatible with Tesnière’s notion of diathesis above (3.3), other researchers view the derivation of passives from actives more critically (cf. Leiss 1992: 132; Ágel 2000: 119). Herbst, Heath, Roe, and Götz (2004) consider the degree of optionality of a complement purely from a lexical point of view and independently of any clause type: in other words, this approach does not make any assumptions about primary clausal structures in terms of a valency description. Complements which take the subject position in active declarative clauses are considered structurally necessary because the clause type requires a subject.
From a valency point of view, however, they are considered optional since these
complements are optional in other clause types such as passives, imperatives or non-finite clauses. Subjects in finite declarative clauses are therefore structurally necessary but valency-optional (cf. Herbst et al. 2004; Herbst and Roe 1996; compare also Ágel 2000). Generally, however, it may be debatable whether valency slots considered in themselves can always be classified with respect to their degree of optionality. For example, the verb hear allows for a from_NP-complement (SOURCE) followed by an about_NP-complement (TOPIC):

(24) We’d love to hear from you about it. (VDE)

Here the SOURCE-complement is optional, allowing for (25):

(25) I love to hear about people who do things like that. (VDE)

However, if the TOPIC-slot is realised by an on_NP, the SOURCE-complement is obligatory:

(26) Some Liberal Democrats seem eager to hear from me again on the subject of their party. (VDE)
(27) *Some Liberal Democrats seem eager to hear on the subject of their party.

Evidence like this can be taken as an argument for the necessity of valency patterns in addition to complement inventories in a valency description. For that reason both complement inventories and valency patterns are indicated in the VDE. They represent different types of abstraction and complement each other descriptively (cf. also Herbst 2007). For German, a similar notion can be seen in the Satzbaupläne utilized in the Kleines Valenzlexikon deutscher Verben (Engel and Schumacher 1976; cf. also Engel 1977) and in VALBU (Schumacher, Kubczak, Schmidt and de Ruiter 2004).
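The hear facts show why a flat complement inventory is not enough: the co-occurrence restriction between the SOURCE slot and the on_NP realisation of TOPIC lives at the level of whole patterns. The encoding below (slot labels, pattern sets) is a hypothetical sketch of that idea, not the VDE's actual notation.

```python
# Complement inventory for 'hear' (sketch): slots with their possible
# formal realisations. The inventory alone cannot express that SOURCE
# becomes obligatory when TOPIC is realised by an on_NP.
INVENTORY = {"SOURCE": {"from_NP"}, "TOPIC": {"about_NP", "on_NP"}}

# Valency patterns: the licensed slot combinations, which is where the
# co-occurrence restriction is stated.
PATTERNS = [
    frozenset(),                              # zero-complement use
    frozenset({"SOURCE"}),                    # hear from you
    frozenset({"TOPIC:about_NP"}),            # hear about people        (25)
    frozenset({"SOURCE", "TOPIC:about_NP"}),  # hear from you about it   (24)
    frozenset({"SOURCE", "TOPIC:on_NP"}),     # hear from me on ...      (26)
    # no frozenset({"TOPIC:on_NP"}) alone: *hear on the subject ...      (27)
]

def licensed(slots):
    """A realisation is licensed only if it matches a listed pattern."""
    return frozenset(slots) in PATTERNS

print(licensed({"TOPIC:about_NP"}), licensed({"TOPIC:on_NP"}))  # -> True False
```

Listing patterns alongside the inventory, as the VDE does, thus captures restrictions that no per-slot optionality label could.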
Acknowledgements

I would like to thank Thomas Herbst, Timothy Osborne and Kevin Pike for their advice. While I have immensely benefitted from their expertise, the faults in the present article are all mine.
5. References (selected)

Ágel, Vilmos 2000 Valenztheorie. (Narr Studienbücher.) Tübingen: Narr.
Ágel, Vilmos 2003 Wort- und Ausdrucksvalenz(träger). In: Alan Cornell, Klaus Fischer, and Ian F. Roe (eds.), Valency in Practice, 17−36. Oxford: Lang.
Allerton, D. J. 1975 Deletion and proform reduction. Journal of Linguistics 11: 213−238.
Allerton, D. J. 1982 Valency and the English Verb. London: Academic Press.
Askedal, John Ole 2003 Das Valenz- und Dependenzkonzept bei Lucien Tesnière. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 80−99. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
Baumgärtner, Klaus 1970 Konstituenz und Dependenz. Zur Integration der beiden grammatischen Prinzipien. In: Hugo Steger (ed.), Vorschläge für eine strukturale Grammatik des Deutschen, 52−77. (Wege der Forschung 146.) Darmstadt: Wissenschaftliche Buchgesellschaft.
Bloomfield, Leonard 1984 Language. New York: Holt, Rinehart and Winston. First published New York: Henry Holt [1933].
Breindl, Eva 2006 Präpositionalphrasen. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/2, 936−951. (Handbücher zur Sprach- und Kommunikationswissenschaft 2.) Berlin, New York: Walter de Gruyter.
Busse, Winfried 2006 Valenzlexika in anderen Sprachen. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/2, 1424−1435. (Handbücher zur Sprach- und Kommunikationswissenschaft 2.) Berlin, New York: Walter de Gruyter.
Busse, Winfried, and Jean-Pierre Dubost 1977 Französisches Verblexikon. Die Konstruktion der Verben im Französischen. (2nd edition 1983.) Stuttgart: Klett-Cotta.
Chomsky, Noam 1957 Syntactic Structures. (Janua Linguarum 4.) The Hague: Mouton.
Chomsky, Noam 1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Cruse, D. Alan 1986 Lexical Semantics. Cambridge: Cambridge University Press.
Eichinger, Ludwig M. 1995 Von der Valenz des Verbs und den Abhängigkeiten in der Nominalgruppe. In: Ludwig M. Eichinger, and Hans-Werner Eroms (eds.), Dependenz und Valenz, 37−52. (Beiträge zur germanistischen Sprachwissenschaft 10.) Hamburg: Buske.
Emons, Rudolf 1974 Valenzen englischer Prädikatsverben. Tübingen: Niemeyer.
Emons, Rudolf 1978 Valenzgrammatik für das Englische. Tübingen: Niemeyer.
Emons, Rudolf 1995 Prädikate im Englischen und Deutschen. In: Ludwig M. Eichinger, and Hans-Werner Eroms (eds.), Dependenz und Valenz, 275−285. (Beiträge zur germanistischen Sprachwissenschaft 10.) Hamburg: Buske.
Engel, Ulrich 1977 Syntax der deutschen Gegenwartssprache. Berlin: Schmidt.
Engel, Ulrich 1992 Der Satz und seine Bausteine. In: Vilmos Ágel, and Regina Hessky (eds.), Offene Fragen, offene Antworten in der Sprachgermanistik, 53−76. (Reihe Germanistische Linguistik 128.) Tübingen: Niemeyer.
Engel, Ulrich, Helmut Schumacher, and Joachim Ballweg 1976 Kleines Valenzlexikon deutscher Verben. (2nd revised edition 1978.) (Institut für deutsche Sprache, Mannheim: Forschungsberichte 31.) Tübingen: Narr.
Eroms, Hans-Werner 1985 Eine reine Dependenzgrammatik für das Deutsche. Deutsche Sprache 13: 306−326.
Eroms, Hans-Werner 1988 Der Artikel im Deutschen und seine dependenzgrammatische Darstellung. Sprachwissenschaft 13: 257−308.
Eroms, Hans-Werner, and Hans Jürgen Heringer 2003 Dependenz und lineare Ordnung. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 247−263. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
Faulhaber, Susen 2011 Verb Valency Patterns − A Challenge for Semantics-based Accounts. Berlin, New York: De Gruyter.
Fillmore, Charles J. 2007 Valency issues in FrameNet. In: Thomas Herbst, and Katrin Götz-Votteler (eds.), Valency: Theoretical, Descriptive and Cognitive Issues, 129−160. (Trends in Linguistics. Studies and Monographs 187.) Berlin: Mouton de Gruyter.
Francis, Gill, Susan Hunston, and Elizabeth Manning 1996 Collins COBUILD Grammar Patterns 1: Verbs. London, Glasgow: HarperCollins.
Gazdar, Gerald, Ewan Klein, Geoffrey K. Pullum, and Ivan Sag 1985 Generalized Phrase Structure Grammar. Cambridge, MA: Harvard University Press.
Halliday, Michael A. K., and Ruqaiya Hasan 1976 Cohesion in English. London: Longman.
Heath, David 1984 Grammatische Angaben in Lernerwörterbüchern des Englischen. In: Henning Bergenholtz, and Joachim Mugdan (eds.), Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, 28.−30.6.1984, 332−345. Tübingen: Niemeyer.
Helbig, Gerhard 1992 Probleme der Valenz- und Kasustheorie. Tübingen: Niemeyer.
Helbig, Gerhard, and Wolfgang Schenkel 1973 Wörterbuch zur Valenz und Distribution deutscher Verben. 2nd edition. Leipzig: VEB Bibliographisches Institut.
Herbst, Thomas 1983 Untersuchungen zur Valenz englischer Adjektive und ihrer Nominalisierungen. (Tübinger Beiträge zur Linguistik 233.) Tübingen: Narr.
Herbst, Thomas 1988 A valency model for nouns in English. Journal of Linguistics 24: 265−301.
Herbst, Thomas 2009 Valency − item-specificity and idiom principle.
In: Ute Römer, and Rainer Schulze (eds.), Exploring the Lexis-Grammar Interface, 49−68. Amsterdam, Philadelphia: Benjamins.
Herbst, Thomas, David Heath, Ian Roe, and Dieter Götz 2004 A Valency Dictionary of English. Berlin, New York: De Gruyter. (= VDE)
Herbst, Thomas, and Ian Roe 1996 How obligatory are obligatory complements? − An alternative approach to the categorization of subjects and other complements in valency grammar. English Studies 2: 179−199.
Herbst, Thomas, and Susen Schüller 2008 Introduction to Syntactic Analysis. A Valency Approach. (Narr Studienbücher.) Tübingen: Narr.
Heringer, Hans Jürgen 1984 Neues in der Verbszene. In: G. Stickel (ed.), Pragmatik in der Grammatik, 34−64. Düsseldorf: Schwann.
Heringer, Hans Jürgen 1993a Dependency syntax − basic ideas and the classical model. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann (eds.), Syntax. Ein internationales Handbuch zeitgenössischer Forschung = An International Handbook of Contemporary Research, 298−316. (Handbücher zur Sprach- und Kommunikationswissenschaft 9.) Berlin, New York: Walter de Gruyter.
Heringer, Hans Jürgen 1993b Dependency syntax − formalized models. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann (eds.), Syntax. Ein internationales Handbuch zeitgenössischer Forschung = An International Handbook of Contemporary Research, 316−328. (Handbücher zur Sprach- und Kommunikationswissenschaft 9.) Berlin, New York: Walter de Gruyter.
Heringer, Hans Jürgen 1996 Deutsche Syntax − Dependentiell. Tübingen: Stauffenburg.
Hornby, Albert S., Edward V. Gatenby, and Harold Wakefield 1942 Idiomatic and Syntactic English Dictionary. Tokyo: Kaitakusha.
Horrocks, Geoffrey 1993 Generative Grammar. (Longman Linguistics Library.) London: Longman.
Huddleston, Rodney D., and Geoffrey K. Pullum 2002 The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Hudson, Richard A. 1984 Word Grammar. Oxford: Blackwell.
Hudson, Richard A. 1990 English Word Grammar. Oxford: Blackwell.
Hudson, Richard A. 1993 Recent developments in dependency theory. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann (eds.), Syntax. Ein internationales Handbuch zeitgenössischer Forschung = An International Handbook of Contemporary Research, 329−338. (Handbücher zur Sprach- und Kommunikationswissenschaft 9.) Berlin, New York: Walter de Gruyter.
Hudson, Richard A. 2003 Word Grammar. In: Vilmos Ágel, Ludwig M.
Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 508−526. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
Hudson, Richard A. 2007 Language Networks. The New Word Grammar. (Oxford Linguistics.) Oxford: Oxford University Press.
Jackendoff, Ray 1977 X′-Syntax: A Study of Phrase Structure. Cambridge, Ma.: MIT Press.
Jacobs, Joachim 1994 Kontra Valenz. (Fokus 12.) Trier: Wissenschaftlicher Verlag Trier.
Jacobs, Joachim 2003 Die Problematik der Valenzebenen. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 378−399. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
29. Foundations of Dependency and Valency Theory
Klotz, Michael 2000 Grammatik und Lexik. Studien zur Syntagmatik englischer Verben. (ZAA Studies 7.) Tübingen: Stauffenburg.
Klotz, Michael 2007 Valency rules? The case of verbs with propositional complements. In: Thomas Herbst, and Katrin Götz-Votteler (eds.), Valency. Theoretical, Descriptive and Cognitive Issues, 117−128. (Trends in Linguistics. Studies and Monographs 187.) Berlin: Mouton de Gruyter.
Korhonen, Jarmo 1977 Studien zur Dependenz, Valenz und Satzmodell − Teil I. Bern: Peter Lang.
Leiss, Elisabeth 1992 Die Verbalkategorien des Deutschen. Ein Beitrag zur Theorie der sprachlichen Kategorisierung. (Studia Linguistica Germanica 31.) Berlin: De Gruyter.
Lobin, Henning 1995 Komplexe Elemente − Indizien aus Nominalphrase und Verbalkomplex. In: Ludwig M. Eichinger, and Hans-Werner Eroms (eds.), Dependenz und Valenz, 117−133. (Beiträge zur germanistischen Sprachwissenschaft 10.) Hamburg: Buske.
Lyons, John 1977 Semantics. Cambridge: Cambridge University Press.
Matthews, Peter H. 1981 Syntax. Cambridge: Cambridge University Press.
Matthews, Peter H. 2007 The scope of valency in grammar. In: Thomas Herbst, and Katrin Götz-Votteler (eds.), Valency. Theoretical, Descriptive and Cognitive Issues, 3−14. (Trends in Linguistics. Studies and Monographs 187.) Berlin: Mouton de Gruyter.
Müller, Stefan 2010 Grammatiktheorie. Tübingen: Stauffenburg.
Noël, Dirk 2003 Is there semantics in all syntax? The case of accusative and infinitive constructions vs. that-clauses. In: Günter Rohdenburg, and Britta Mondorf (eds.), Determinants of Grammatical Variation in English, 347−377. (Topics in English Linguistics 43.) Berlin: De Gruyter.
Pasch, Renate 1977 Zum Status der Valenz. In: Beiträge zur Semantischen Analyse, 1−50. (Linguistische Studien, Reihe A 42.) Berlin: Akademie der Wissenschaften der DDR, Zentralinstitut für Sprachwissenschaft.
Pasierbsky, Fritz 1981 Sprachtypologische Aspekte der Valenztheorie unter besonderer Berücksichtigung des Deutschen.
Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 34: 160−177.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik 1985 A Comprehensive Grammar of the English Language. Harlow: Longman Pearson Education.
Randow, Elise von 1986 Valente Substantive des Englischen. (Tübinger Beiträge zur Linguistik 294.) Tübingen: Narr.
Sæbø, K. J. 1984 Über fakultative Valenz. Deutsche Sprache 12/2: 97−109.
Schmidt, Jürgen Erich 1991 Konvergenzen zwischen neueren Grammatiktheorien und Deskriptionsgrammatiken. Zum Verhältnis von Konstituenz, Rektion (Government), Valenz und Dependenz. In: Elisabeth Feldbusch, Reiner Pogarell, and Cornelia Weiß (eds.), Neue Fragen der Linguistik. Akten des 25. Linguistischen Kolloquiums, Paderborn, 1990. Bd. 1. Bestand und Entwicklung, 211−218. (Linguistische Arbeiten.) Tübingen: Niemeyer.
Schumacher, Helmut, Jacqueline Kubczak, Renate Schmidt, and Vera de Ruiter 2004 VALBU, Valenzwörterbuch deutscher Verben. (Studien zur deutschen Sprache 31.) Tübingen: Narr.
Sells, Peter 1985 Lectures on Contemporary Syntactic Theories. Stanford: Center for the Study of Language and Information.
Somers, Harold L. 1984 On the validity of the complement-adjunct distinction in valency grammar. Linguistics 22: 507−530.
Sommerfeldt, Karl-Ernst, and Herbert Schreiber 1974 Wörterbuch zur Valenz und Distribution deutscher Adjektive. Leipzig: Bibliographisches Institut.
Sommerfeldt, Karl-Ernst, and Herbert Schreiber 1977 Wörterbuch zur Valenz und Distribution der Substantive. Leipzig: Bibliographisches Institut.
Starosta, Stanley 1988 The Case for Lexicase. An Outline of Lexicase Grammatical Theory. (Open Linguistics Series.) London: Pinter.
Starosta, Stanley 2003 Lexicase Grammar. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 526−545. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
Tarvainen, Kalevi 1981 Einführung in die Dependenzgrammatik. Tübingen: Niemeyer.
Tesnière, Lucien 1959 Eléments de Syntaxe Structurale. Paris: Klincksieck.
Tesnière, Lucien 1980 Grundzüge der Strukturalen Syntax. Übersetzt von Ulrich Engel. Stuttgart: Klett-Cotta.
Uzonyi, Pál 2003 Dependenzstruktur und Konstituenzstruktur. In: Vilmos Ágel, Ludwig M. Eichinger, Hans-Werner Eroms, Hans Jürgen Heringer, and Henning Lobin (eds.), Dependenz und Valenz/1, 230−247. (Handbücher zur Sprach- und Kommunikationswissenschaft 1.) Berlin, New York: Walter de Gruyter.
Vennemann, Theo 1977 Konstituenz und Dependenz in einigen neueren Grammatiktheorien. Sprachwissenschaft 2: 259−301.
Weber, Heinz 1996 Translation, Rekursivität und Valenz bei Lucien Tesnière. In: Gertrud Gréciano, and Helmut Schumacher (eds.), Lucien Tesnière − Syntaxe Structurale et Opérations Mentales. Akten des deutsch-französischen Kolloquiums anläßlich der 100. Wiederkehr seines Geburtstages, Strasbourg 1993, 249−261. (Linguistische Arbeiten 348.) Tübingen: Niemeyer.
Welke, Klaus 1988 Einführung in die Valenz- und Kasustheorie. Leipzig: Bibliographisches Institut.
Welke, Klaus 1995 Dependenz, Valenz und Konstituenz. In: Ludwig M. Eichinger, and Hans-Werner Eroms (eds.), Dependenz und Valenz, 163−175. (Beiträge zur germanistischen Sprachwissenschaft 10.) Hamburg: Buske.
Welke, Klaus 2011 Valenzgrammatik des Deutschen. Eine Einführung. Berlin: De Gruyter.
Michael Klotz, Erlangen-Nürnberg (Germany)
30. Dependency Grammar

1. Binary division
2. Words to nodes
3. Distribution: heads and dependents
4. A family of grammars
5. Catenae
6. Dependency parsers
7. Conclusion
8. References (selected)
Abstract

Dependency grammar (DG) is a family of grammars that proceed from the foundational assumption that dependency, not constituency, is the basic relation that groups syntactic units. At the core of the dependency vs. constituency distinction is the (putative) binary division of the clause into a subject NP and a predicate VP (e.g. S → NP VP). Constituency grammars almost unanimously build on this division in some form or another, whereas DGs reject it. Instead of the binary division, DGs, following Tesnière (1959), take the verb as the root of all clause structure (in matrix clauses). One recent development that is closely associated with DG is the catena unit. The catena is defined as any word or any combination of words that is continuous with respect to dominance. A number of phenomena of syntax have been shown to be sensitive to catenae, e.g. idiosyncratic meaning, ellipsis, discontinuities, predicate-argument structures. There has been an upsurge of interest in DG in recent years. This upsurge is due mainly to the increasing use of dependency parsers in computational applications.
1. Binary division

Lucien Tesnière (1893−1954), the father of modern dependency grammar (DG) approaches to syntax and grammar, argued vehemently against the binary division of the clause that is at the heart of all constituency-based systems (e.g. S → NP VP). Tesnière passed away just three years before Chomsky’s (first) seminal work on phrase structure grammar (= constituency grammar) appeared in 1957. Hence Tesnière was no longer alive at the time when Chomsky’s ideas were being adopted almost unanimously. This situation allowed linguistics and the study of syntax to largely overlook and ignore one of Tesnière’s primary insights (1959/69: 103−105). This insight was that the binary subject-predicate division of the clause stems from the term logic of antiquity and has no place in the study of syntax. In place of the binary division, Tesnière viewed the verb as the root of all clause structure. The choice that one makes in this area has a profound impact on the entire rest of the theory of syntax and grammar. A theory that acknowledges the binary division is likely to become a constituency grammar (a phrase structure grammar), and a theory that rejects this division will likely be a DG.
The fundamental choice between dependency and constituency and the competing stances toward the (putative) subject-predicate division are illustrated as follows:

(1)  a. Dependency          b. Constituency

            V                       S
            |                      / \
            N                   N(P) V(P)

     Structure exists.       Structure exists.
The binary, subject-predicate division is present in the constituency tree on the right; it is apparent in the equi-level appearance of the subject (NP) and the predicate (VP); the two are in a symmetric relation. The verb-as-root approach is obvious in the dependency tree on the left; the verb is positioned above the subject noun in such a manner that a clear dominance relation obtains. The dependency between the finite verb exists and the subject noun structure is asymmetric and directed, since the verb dominates the noun. Tesnière’s claim was that empirical considerations support the verb-as-root approach of dependency over the binary, subject-predicate division of constituency. Constituency grammars can point to subject−object asymmetries as motivation for the binary division. The subject NP appears as a sister of VP, whereas the object NP, when present, appears as a sister of V. This asymmetry across subject and object is then the basis for the numerous differences in behavior across subjects and objects. Since DGs often view subject and object as sisters in syntax, as shown in (2), they must seek another means of addressing subject-object asymmetries. The standard assumption in this regard is that the syntactic functions are primitive, whereby each and every dependency bears a function that is at least somewhat independent of the structural configuration at hand, e.g.

(2)          loves
        SUBJ/     \OBJ
         Luke     liver

     Luke loves liver.
Subject−object asymmetries are then addressed in terms of a ranking of syntactic functions, the subject being ranked of course higher than the object. While constituency grammars may, with good reason, reject the notion that the syntactic functions are independent of the configuration, they should also acknowledge that there is little theory-neutral empirical support for the existence of a finite VP constituent. Many widely employed constituency tests for English, for instance, deliver little evidence in favor of a finite VP-constituent, e.g.

(3) The King opened parliament.
    a. *… and opened parliament the King.           − Topicalization
    b. ??What the King did was opened parliament.   − Pseudoclefting
    c. The King did so (did so ≈ opened parliament) − Proform substitution
    d. What did the King do? − ?Opened parliament.  − Answer fragment
Topicalization and pseudoclefting fail to identify the finite VP opened parliament as a constituent. Concerning proform substitution, it is not clear whether did so (or did it or did that) can qualify as a proform taking the place of opened parliament, since the omission of the non-finite VP open parliament may be the better analysis: The King did so (open parliament), The King did (open parliament). And while answer fragments like the one in (3d) are generally taken to be acceptable, I prefer to include the subject: He opened parliament. In such cases, the possibility of topic drop, a common occurrence in colloquial speech, may be favorably influencing the acceptability of the fragment. In sum, the data do not deliver solid evidence in favor of a finite VP constituent. One can compare this issue surrounding the presence/absence of a finite VP constituent with the data focusing on a non-finite VP constituent. The four constituency tests are much clearer in this area, e.g.

(4) The King will open parliament.
    a. … and open parliament the King (certainly) will. − Topicalization
    b. What the King will do is open parliament.        − Pseudoclefting
    c. The King will do so (do so = open parliament)    − Proform substitution
    d. What will the King do? − Open parliament.        − Answer fragment
The four constituency tests now agree that non-finite VP is a constituent. This agreement is consistent with both types of structure, dependency and constituency, since both acknowledge non-finite VP as a constituent (= complete subtree). Another empirical insight challenging the initial binary division occurs with one-word sentences, such as with imperatives, e.g. Help! The dependency relation easily produces a structural analysis of such utterances, whereas the constituency relation cannot do the same without introducing additional assumptions, such as an elided subject, e.g.

(5)  a.   V          b.     S
          |                / \
                          N   V

        Help!          (You) Help!
The dependency relation on the left, which is a one-to-one relation, easily matches the verb Help to the V node, whereas the constituency tree on the right employs an empty word and node to maintain the constituency-based analysis, where the nodes should outnumber the words by at least one. Constituency might point out that the lack of inflection on the verb allows the absence of the subject. That point might be correct for English, but it is not valid for other languages in which the inflection is present, e.g. German Helf-t mir!, French Aid-ez-moi!. Furthermore, it does nothing to address the underlying difficulty. This difficulty is that in order to maintain a constituency-based analysis, the nodes should outnumber the words. The tendency for constituency-based systems to employ a greater theoretical apparatus than DGs is evident in this simple example and shall be illustrated and emphasized again below.
2. Words to nodes

The dependency relation is a one-to-one relation; for every word in the sentence at hand − this contribution focuses on words − there is exactly one corresponding node in the structure that one assumes for that sentence. The constituency relation, in contrast, is a one-to-one-or-more relation; for every word in the sentence, there are one or more nodes in the structure that one sees as corresponding to that word. The one-to-one-or-more relation of constituency enables constituency-based systems to grow the amount of structure, whereas the one-to-one dependency relation places a major restriction on the amount of structure that the syntax can assume. The result is that dependency-based trees tend to contain half as many nodes and edges as the corresponding constituency-based trees. The inherent minimalism of dependency is illustrated with the following trees. The dependency tree appears above the corresponding constituency tree:

(6) a. Dependency

                  is
                /    \
            Fred     trying
                        |
                        to
                        |
                   understand
                        |
                   difference
                    /       \
                 the       between
                              |
                          relations
                              |
                             the

       Fred is trying to understand the difference between the relations.

    b. Constituency (strictly binary branching; shown here as labeled bracketing, each bracket labeled with its head word)

       [is Fred [is is [trying trying [to to [understand understand
         [difference the [difference difference [between between
           [relations the relations]]]]]]]]]

       Fred is trying to understand the difference between the relations.
Both trees employ the convention whereby the words themselves appear as the node labels. The dependency tree contains 10 words and thus there are 10 nodes in the tree. The constituency tree, in contrast, contains the same 10 words, but there are 19 nodes in the tree. A strictly binary branching analysis is assumed for the constituency tree. This assumption increases the number of overt constituents to the upper limit of what is possible. There is an irony in these trees. The desire to reduce the amount of sentence structure associated with the introduction of Minimalism and bare phrase structure (BPS)
in generative grammar − see Chomsky (1995) − has left the tree structures there far short of the minimalism that is inherent in the dependency relation. Another noteworthy aspect of these trees and of the distinction between dependency- and constituency-based structures in general concerns the (non-)ability to acknowledge exocentric structures. Dependency by its very nature cannot acknowledge exocentric constituents. In other words, every word in a dependency hierarchy has a head (the one word that immediately dominates a given word), the one exception being the root (the highest node). Constituency, in contrast, can and does at times acknowledge constituents without heads. In fact, in its purest form, all constituency-based structure is exocentric, i.e. no heads are identified. Imagine tree (6b) with X each time in place of the labels on the phrasal nodes. Such a tree structure would fail entirely to show heads and dependents, but it would still be grouping the words and thus acknowledging constituents. The same alteration applied to tree (6a) would have a much lesser impact, since the heads and dependents would still be clearly visible if only X were to appear as the label on each node in the tree. What this means is that dependency inherently conveys important information about syntactic structure that constituency does not. Constituency-based structures must be augmented with identifying labels on the phrasal nodes if they are to distinguish heads from dependents in tree structures. Since dependency-based systems are restricted in the amount of sentence structure that they can posit, they are also incapable of acknowledging much of the theoretical apparatus associated with many constituency grammars. Notions in the GB/MP tradition such as the X-bar schema, c-command, and functional categories that exist separately from lexical items are hardly possible, if possible at all, in dependency-based systems.
Other means must be sought to account for the phenomena that these concepts were designed to shed light on.
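The node-count contrast between the two trees in (6) reduces to simple arithmetic. The sketch below is not part of the chapter; the function names are invented for illustration. A dependency tree has exactly one node per word, while a strictly binary-branching constituency tree over n words has n leaves plus n − 1 phrasal nodes:

```python
# Illustrative sketch (not from the chapter): node counts for an n-word
# sentence under the two relations discussed in the text.

def dependency_nodes(n_words: int) -> int:
    """One node per word: the dependency relation is one-to-one."""
    return n_words

def binary_constituency_nodes(n_words: int) -> int:
    """Leaves plus internal nodes of a strictly binary-branching tree."""
    return n_words + (n_words - 1)

# The ten-word sentence in (6): 10 nodes vs. 19 nodes.
print(dependency_nodes(10), binary_constituency_nodes(10))  # 10 19
```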
3. Distribution: heads and dependents

Syntactic dependencies are directed, which means that if a dependency obtains between two words, there is a strict head-dependent (≈ mother-daughter) relation between them: the one word is the head, and the other its dependent. Dependency is therefore an organizing principle that performs a major role in syntax; it is grouping words together in such a manner that units of structure can be acknowledged and thus meaning conveyed. A challenge facing all theories of syntax in this area, though, is to determine how the units forming a given sentence are grouped. For DGs, the challenge of identifying syntactic units is tantamount to identifying the syntactic dependencies present and the directions in which these dependencies point. This task is not always an easy one, since it has not been easy to provide a single criterion that DGs can agree upon as the primary means for identifying and thus defining syntactic dependencies. One criterion that is widely employed to identify the presence and direction of syntactic dependencies, however, is distribution (e.g. Owens 1984: 36; Schubert 1988: 40), which is also known as passive syntactic valency (Mel’čuk 2003: 200). Given two words that are connected by a dependency, the one word of the two that is more responsible for determining the environments in which the two together can appear is deemed to be head over the other. For instance, given the two-word combination very happy, one
intuitively senses that there is a dependency connecting the two, in particular because very reduces the set of entities that happy can be predicated of to those that are extremely happy. Determining that happy is head over very is relatively simple, since in many sentences where very happy can appear, happy alone can also appear, but not vice versa, e.g. Tom is very happy → Tom is happy → *Tom is very. In other words, happy is more responsible for determining the distribution of very happy than very is. The omission diagnostic just employed is of limited value, though, since there are many cases where it does not promote the identification of head and dependent, for instance concerning many determiner-noun combinations, e.g. The mouse scurried away → *Mouse scurried away → *The scurried away. Such data make it difficult to determine whether the is head over mouse, or vice versa, and indeed, some work in DG has assumed determiner phrases (DPs, determiner as head), e.g. Hudson (1984, 1990), whereas others (most) assume noun phrases (NPs, noun as head). Confronted with this difficulty, other facets of distribution (beyond simple omission) must be accessed. Building on a long tradition of diagnostics for determining syntactic structure, the substitution test is perhaps the most widely employed additional diagnostic. If a two-word combination can be replaced by a single word, then the one word of the two that has most in common with the substitute is the head, e.g. Mice scurried away. Since mice is more like mouse than it is like the, mouse must be head over the in the original sentence. There are certainly difficulties facing the substitution diagnostic as well, but these difficulties will not be considered here. Suffice it to say that experts can disagree about how “dependency” should be defined.
This disagreement is, however, no more severe than the difficulties facing any theory of syntax that endeavors to discern syntactic units in such a manner that structure can be acknowledged.
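The omission diagnostic just discussed can be framed as a small decision procedure. The sketch below is purely illustrative and not from the chapter: the judgment table stands in for speaker intuitions, and all names (`acceptable`, `head_by_omission`) are invented:

```python
# Illustrative sketch (not from the chapter): the omission diagnostic as
# a decision procedure over an acceptability "oracle". The table below
# stands in for the speaker judgments cited in the text.

acceptable = {
    "Tom is very happy": True,
    "Tom is happy": True,
    "Tom is very": False,
}

def head_by_omission(pair, frame):
    """Return the head of a two-word combination, if omission decides it."""
    w1, w2 = pair.split()
    ok1 = acceptable.get(frame.format(w1), False)  # keep w1, omit w2
    ok2 = acceptable.get(frame.format(w2), False)  # keep w2, omit w1
    if ok1 == ok2:
        return None  # diagnostic inconclusive (cf. determiner-noun pairs)
    return w1 if ok1 else w2

print(head_by_omission("very happy", "Tom is {}"))        # happy
print(head_by_omission("the mouse", "{} scurried away"))  # None
```

The inconclusive case mirrors the determiner-noun problem in the text: when neither word can stand alone, omission alone cannot decide headedness, and further diagnostics such as substitution are needed.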
4. A family of grammars

The dependency relation is compatible with the variety of principles and concepts that grammars adopt. A DG can be representational or derivational, mono- or multistratal (in the syntax), rule- or construction-based.
4.1. Mono- vs. multistratal

Most DGs reject the transformations associated with the tradition of generative grammar, which makes them representational (as opposed to derivational). This fact does not prevent DGs, however, from being multistratal. Two prominent dependency-based frameworks are multistratal in syntax: Meaning-Text Theory (MTT, Mel’čuk 1988, 2003) and Functional Generative Description (FGD, Sgall et al. 1986). MTT posits seven strata, two of which are syntactic and lack linear order (i.e. the lexical units are organized hierarchically but not linearly). FGD also assumes two levels of syntax, a surface analytical stratum and a deep tectogrammatical stratum, both of which encode hierarchical and linear order. Many other DGs, in contrast, are monostratal in syntax, e.g. Word Grammar (Hudson 1984, 1990, 2007), Lexicase (Starosta 1988), the German schools (e.g. Engel 1994; Heringer 1996; Groß 1999; Eroms 2000), etc.
4.2. Linear order

Tesnière drew a foundational distinction between hierarchical and linear order. This distinction is expressed in the following passage:

“Construire, ou établir le stemma d’une phrase, c’est en transformer l’ordre linéaire en ordre structural (…) Inversement, relever un stemma, ou en faire la mise en phrase c’est en transformer l’ordre structural en ordre linéaire (…)” [To construct, or establish the stemma of, a sentence is to transform its linear order into structural order (…) Conversely, to read off a stemma, or to turn it into a sentence, is to transform its structural order into linear order (…)] (Tesnière, 1959/69: 19)
This statement separates the two ordering dimensions. Linear order and hierarchical order should be viewed as separate according to Tesnière. For language production, linear order is preceded by hierarchical order, and vice versa for language comprehension. What this meant concretely for Tesnière’s theory is that his stemmas (trees) abstracted away from linear order and focused entirely on hierarchical order. Hierarchical order was thus deemed primary and more central to syntax than linear order. The primacy of hierarchical order is evident in the numerous stemmas that Tesnière included in his Éléments, e.g. (7)
(7) a. Tantae molis erat Romanam condere gentem.                [Latin]
       such   mass  was  Roman   to.found race
       ‘It was such a massive task to establish the Roman race.’

    b.          erat
              /      \
          molis      condere
            |           |
         tantae      gentem
                        |
                    Romanam
(Tesnière 1959/69: 20, stemma 11)

The stemma is intended to represent hierarchical order only (just dominance relations), not linear order as well (not precedence relations). Tesnière’s Éléments includes 366 of these stemmas. The impact and influence of Tesnière’s ideas concerning linear order left DG with the strong tendency to relegate linear order to secondary status. Indeed, one of DG’s perceived strengths has been its ability to abstract away from linear order and focus intently on the role of hierarchical order. This ability has been particularly beneficial for the analysis of languages with freer word order than that of English. At the same time, however, this traditional emphasis on the role of hierarchical order has undoubtedly contributed to the impression that DG has little to say about certain word order phenomena, such as those associated with discontinuities (e.g. wh-fronting, topicalization, scrambling, and extraposition). In this regard, one can compare the traditional emphasis on hierarchical order of DGs with the strong tendency among constituency grammars to always include linear order in their tree structures. In fact, producing constituency trees that abstract away from linear order with the goal of focusing entirely on hierarchical order is difficult, since constituency and linear order are more intimately intertwined. But the emphasis on hierarchical order (to the detriment of linear order) may have actually been a contributing factor to the secondary status of dependency-based syntax in many linguistics circles. As mentioned above, Tesnière’s ideas actually predate Chomsky’s early works, yet the dominance of constituency-based systems in theoretical syntax is undeniable. In this respect, the traditional focus on hierarchical order may have actually decreased interest in DG in general, since explanations of the actual word orders that are and are not possible for a given language or language phenomenon are certainly a central desideratum of theoretical syntax in general. One should also be aware of the fact, however, that despite the traditional emphasis on hierarchical order, nothing prevents DGs from focusing just as intently on linear order as constituency grammars. Certainly the monostratal DGs mentioned above do just this. They produce syntactic representations (usually trees) that encode both hierarchical and linear order simultaneously and in so doing, they are striving to understand and explain which word orders are and are not possible. The traditional emphasis of DGs on hierarchical order should not be misinterpreted to mean that DGs are less capable of addressing word order phenomena.
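Tesnière's separation of the two ordering dimensions can be made concrete with a small sketch (not from the chapter; the representation and names are assumptions of this illustration). A stemma records only dominance, here as a word-to-head map for (7); hierarchical order, such as each word's depth below the root verb, follows from that map alone, while linear order must be supplied separately:

```python
# Illustrative sketch (not from the chapter): dominance stored separately
# from precedence, as in Tesniere's stemmas. The map below encodes only
# the dominance relations of stemma 11 in (7).

parent = {
    "erat": None,                       # root verb
    "molis": "erat", "condere": "erat",
    "tantae": "molis", "gentem": "condere",
    "Romanam": "gentem",
}

def depth(word):
    """Number of dominance links between word and the root."""
    d = 0
    while parent[word] is not None:
        word = parent[word]
        d += 1
    return d

# A separate sequence fixes precedence; permuting it would leave every
# dominance relation, and hence every depth, untouched.
linear = ["tantae", "molis", "erat", "Romanam", "condere", "gentem"]
print([depth(w) for w in linear])  # [2, 1, 0, 3, 1, 2]
```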
4.3. Trees vs. networks

Many DGs assume rooted trees to represent syntactic structure, just as many constituency grammars do. A rooted tree is a directed graph that has all edges extending down and away from the root word. An interesting fact about dependency graphs in this area is that they need not be trees, but rather they can be networks (or more precisely, DAGs = directed acyclic graphs). If a given node in a dependency graph has two or more parent nodes, that graph is no longer a tree, but rather it is a network. The benefit of assuming networks is that the theory becomes capable of simultaneously acknowledging both syntactic and semantic dependencies, e.g.

(8)            has
             /     \
        puppy       been
          |           |
        The        chewing
                      |
                      on
                      |
                  furniture
                      |
                     the

     The puppy has been chewing on the furniture.
     (a second edge, not shown above, links puppy to chewing)
The word puppy is shown with two heads, has being its syntactic head (government) and chewing being its semantic head (selection). As stated, most DGs assume trees, not networks. Word Grammar (Hudson 1984, 1990, 2007), however, takes advantage of this ability of dependency hierarchies to easily
allow for multi-headedness. The convention that Word Grammar employs for representing dependencies is different from that in graph (8), though. Word Grammar employs directed arcs above and below the string of words:

(9) The puppy has been chewing on the furniture.
    [in the original diagram, directed arcs above the string mark the surface syntactic dependencies, and one arc below the string links puppy to chewing]
The arcs above the string of words are purely surface syntactic dependencies, whereas the arc below the string of words represents a deep semantic dependency. The subject noun puppy again has two heads, which makes this dependency hierarchy a network, just as (8) is a network. The point of the varying formats shown in (8) and (9) is that DGs differ with respect to the conventions they employ to represent dependencies. Despite the varying conventions, the underlying commitment to a dependency-based understanding of sentence structure is consistent. The extent to which constituency-based systems can employ networks is not clear. Certainly constituency graphs could be augmented to allow a given projection to have two (or more) heads, but the increased complexity of the resulting graphs would likely be daunting.
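The tree/network distinction lends itself to a simple mechanical check. The sketch below is not from the chapter; the word-to-heads mapping and the function name are invented for illustration. It encodes the hierarchy in (8), with puppy bearing both a syntactic and a semantic head, and tests for multi-headedness:

```python
# Illustrative sketch (not from the chapter): a dependency hierarchy as a
# mapping from each word to the set of its heads. If any word has more
# than one head, the graph is a network (DAG) rather than a rooted tree.

heads = {
    "has":       set(),               # root
    "puppy":     {"has", "chewing"},  # syntactic head + semantic head
    "The":       {"puppy"},
    "been":      {"has"},
    "chewing":   {"been"},
    "on":        {"chewing"},
    "furniture": {"on"},
    "the":       {"furniture"},
}

def is_network(heads):
    """True if some node has two or more parents."""
    return any(len(hs) > 1 for hs in heads.values())

print(is_network(heads))  # True: puppy has two heads
```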
5. Catenae

A recent and promising development associated with DG is the catena unit. O’Grady (1998) observed that the fixed words of many idioms cannot be stored in the lexicon as constituents, but that they are always stored as (what he called) “chains”. O’Grady’s “chain” has since been renamed catena (Latin for ‘chain’, plural catenae) in an attempt to avoid confusion with the chains of derivational theories − see Osborne et al. (2012). A catena is defined as any word or any combination of words that is continuous with respect to dominance. The definition of the catena can be compared with the definition of the string: any word or any combination of words that is continuous with respect to precedence. A more formal definition of the catena is as follows:

(10) Catena: A single w(ord) or a set of w(ord)s C such that for all w in C, there is another w′ in C that either immediately dominates or is immediately dominated by w.

According to this definition, any given tree or any given subtree of a tree qualifies as a catena. The utility of the catena concept has been firmly established (Osborne 2005, 2012; Groß and Osborne 2009; Osborne and Groß 2012; Osborne et al. 2011, 2012). By acknowledging catenae, the door is open to a parsimonious account of numerous phenomena of syntax. Idiosyncratic meaning chunks are stored in the lexicon as catenae (not necessarily as constituents). The elided material of ellipsis mechanisms (e.g. gapping, stripping, VP-ellipsis, pseudogapping, answer ellipsis, sluicing, comparative deletion, etc.) should be a catena (but not necessarily a constituent). The analysis of discontinuities (topicalization, wh-fronting, scrambling, extraposition) becomes more efficient if the role of
catenae is acknowledged. And a direct bridge is built to semantics by the insight that predicates and their arguments are always catenae in surface syntax (given continuous structures). The following subsections support these claims, however briefly.
5.1. An illustration

As just stated, the definition identifies any combination of words that are linked together by dependencies as a catena. If the standard definition of the constituent is adopted (any node/word plus all the nodes/words that that node/word dominates), then every constituent is also a catena. These units are illustrated with the following tree:

(11)        illustrates C
            /           \
       tree B           unit F
          |             /    \
      This A        the D   catena E

     This tree illustrates the catena unit.
The capital letters serve to abbreviate the words. There are 24 distinct catenae in (11): A, B, C, D, E, F, AB, BC, CF, DF, EF, ABC, BCF, CDF, CEF, DEF, ABCF, BCDF, BCEF, CDEF, ABCDF, ABCEF, BCDEF, and ABCDEF. There are 39 non-catena word combinations in (11), e.g. AC, BF, BD, ABD, BCE, ABDE, BCDE, etc. There are a mere six constituents in (11): A, D, E, AB, DEF, and ABCDEF. Each of these constituents is a catena, but it should be apparent that there are many catenae that are not constituents. Thus the constituent is a subtype of catena.
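The counts just given can be verified mechanically. The following sketch is not part of the chapter; the parent map and helper names are assumptions of this illustration. It encodes tree (11) and enumerates exactly the word sets that are connected via immediate-dominance links:

```python
from itertools import combinations

# Illustrative sketch (not from the chapter): verifying the catena count
# for tree (11). Each letter maps to its parent (None for the root); a
# word set is a catena iff it is connected via immediate dominance.

parent = {"A": "B", "B": "C", "C": None, "D": "F", "E": "F", "F": "C"}

def is_catena(words, parent):
    words = set(words)
    # undirected immediate-dominance links restricted to the word set
    edges = {w: {w2 for w2 in words
                 if parent.get(w2) == w or parent.get(w) == w2}
             for w in words}
    # graph search from an arbitrary member: connected <=> all reached
    seen, frontier = set(), [next(iter(words))]
    while frontier:
        w = frontier.pop()
        if w not in seen:
            seen.add(w)
            frontier.extend(edges[w] - seen)
    return seen == words

catenae = [c for r in range(1, 7)
           for c in combinations(sorted(parent), r)
           if is_catena(c, parent)]
print(len(catenae))  # 24, matching the count in the text
```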
5.2. Idiosyncratic meaning

Units of idiosyncratic meaning (= non-compositional meaning) of every sort are stored in the lexicon as catenae, whereby many of these stored catenae are not constituents. This fact is illustrated here briefly with a few idioms:

(12)
a. throw X to the wolves
b. take X to the cleaners
c. pull X's leg
d. step on X's toes

[tree diagrams (12a−d) garbled in extraction]
e. make fun of X
f. scare the daylights out of X

[tree diagrams (12e−f) garbled in extraction]
The X in each of these idioms marks an argument that is outside of the idiom, and due to the presence of this argument, the words of the idiom do not form a constituent. They do, however, form a catena each time. If space allowed, data of this sort could easily be produced for idiosyncratic meanings of every sort. What this means is that units of meaning are being stored in the lexicon as catenae, and as such, they are concrete units of syntax.

There is a potential source of confusion that should be noted concerning idiosyncratic meaning. While units of meaning are stored as catenae in the lexicon, they can be broken up in actual syntax when they appear with other, more functional units of meaning (as opposed to lexical units of meaning), e.g. Sam spilled the beans vs. The beans have been spilled by Sam. The parts of the idiom spill the beans no longer form a catena in the passive sentence due to the intervention of have been in the hierarchy of words. When this occurs, it does not change the fact that all meaning units are stored as catenae. In other words, spill the beans is still a catena in the lexicon. See Osborne and Groß (2012: 196−201) for discussion.
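The active/passive contrast just described can be checked mechanically against definition (10). In the sketch below, the head assignments are one plausible DG analysis (finite auxiliary as root, with the chain have → been → spilled); the trees and names are ours, not the chapter's:

```python
def is_catena(words, head):
    """True iff `words` is connected under immediate dominance (def. 10).
    `head` maps each word to its immediate dominator (None for the root)."""
    words = set(words)
    seen, stack = set(), [next(iter(words))]
    while stack:
        w = stack.pop()
        if w in seen:
            continue
        seen.add(w)
        stack += [v for v in words
                  if v not in seen and (head.get(v) == w or head.get(w) == v)]
    return seen == words

# Assumed head assignments (one plausible DG analysis, not the chapter's trees):
active = {"sam": "spilled", "spilled": None, "the": "beans",
          "beans": "spilled"}
passive = {"the": "beans", "beans": "have", "have": None, "been": "have",
           "spilled": "been", "by": "spilled", "sam": "by"}

idiom = {"spilled", "the", "beans"}
print(is_catena(idiom, active))   # True: the idiom is a catena
print(is_catena(idiom, passive))  # False: "have been" breaks it up
```

The idiom words form a catena in the active sentence but not in the passive one, exactly as the prose states.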
5.3. Ellipsis

An examination of ellipsis phenomena motivates the hypothesis that the elided material of ellipsis mechanisms is a catena. If it is not a catena, the attempt at ellipsis fails. This hypothesis is illustrated here just briefly with examples of gapping and answer ellipsis. Gapping usually elides a finite verb at the very least, and often other words are included in the gap. The gapped word combinations form a catena, e.g. (13)
a. Should I come visit you, or you me?
b. He always compliments her, and she him.

[tree diagrams garbled in extraction; the gapped words are Should … come visit in (a) and always compliments in (b)]
The elided word combinations Should…come visit and always compliments are not constituents, but they are catenae. The following data illustrate the same point using answer fragments (see [14]). The acceptable answer fragments in examples (14a−e) have the elided material forming a catena, whereas the unacceptable answer fragments in (14f−g) have the elided material as a non-catena.
(14) a. Who wants Bill to drive our car? − Tom.
     b. Who does Tom want to drive our car? − Bill.
     c. Whose car does Tom want Bill to drive? − Ours.
     d. What does Tom want Bill to drive? − Our car.
     e. What does Tom want Bill to do? − Drive our car.
     f. What of ours does Tom want Bill to drive? − *Car.
     g. What does Tom want Bill to do to our car? − *Drive.

[the dependency tree for Tom wants Bill to drive our car is garbled in extraction]
The sort of data illustrated with examples (13−14) could, if space allowed, easily be expanded to other acknowledged ellipsis mechanisms (e.g. stripping, VP-ellipsis, pseudogapping, sluicing, comparative deletion, etc.). These mechanisms are eliding catenae. One caveat must be mentioned, however: the hypothesized catena condition on the elided material of ellipsis is a necessary condition, but not a sufficient one. Nevertheless, the catena (not the constituent) appears to be the key syntactic unit for the theory of ellipsis in general.
5.4. Discontinuities

The catena is the key unit for the theory of discontinuities. Many DGs address discontinuities in terms of a flattening of structure. The displaced unit takes on a word as its head that is not its governor. This practice is illustrated with the following trees:

(15)
a. That claim nobody is buying.        b. That claim nobody is buying.

(16) a. Who did you speak to?          b. Who did you speak to?

[tree pairs (15) and (16) garbled in extraction: in each a-tree the displaced unit (That claim; Who) attaches to its governor (buying; to), producing crossing lines; in each b-tree it has risen to attach to the root (is; did), with the governor marked by the subscript g]
(17)
a. Someone arrived with curly hair.    b. Someone arrived with curly hair.

[tree pair garbled in extraction: in the a-tree the extraposed with curly hair attaches to its governor Someone with crossing lines; in the b-tree it has risen to attach to arrived]
Examples (15a−b) illustrate an instance of topicalization, examples (16a−b) an instance of wh-fronting, and examples (17a−b) an instance of extraposition. The discontinuities are clearly visible in the trees on the left in terms of crossing lines (= projectivity violations). The trees on the right show how these discontinuities are “rectified”. Rising is assumed; the displaced unit (in bold) rises to take on a word as its head that is not its governor. In other words, a flattening of structure occurs. The bolded words are the risen catena and the non-italicized words are the rising catena. A rising catena is the minimal catena that extends from the root of the risen catena to the governor of that catena. The g subscript marks the governor of the risen catena. By examining the nature of the risen catena and the rising catena, a given discontinuity can be identified and categorized. The rising catenae of topicalization, for instance, can be (or are) significantly different from the rising catenae of wh-fronting, scrambling, and extraposition. Concrete blocks on a given type of discontinuity, e.g. Ross’ Right Roof Constraint or Left Branch Condition (1967), can be identified and formalized in terms of the risen and rising catenae involved.
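Projectivity violations of the kind shown by the crossing lines in (15)−(17) can be detected mechanically: two dependencies cross iff their string spans overlap without nesting. A minimal sketch (the position-based encoding and the particular head assignments for (16) are ours, one plausible rendering of those trees):

```python
def crossing_arcs(heads):
    """Return pairs of crossing dependencies (projectivity violations).
    `heads` maps 1-based word positions to head positions; 0 is the root."""
    spans = [(min(d, h), max(d, h)) for d, h in heads.items()]
    return [(a, b) for i, a in enumerate(spans) for b in spans[i + 1:]
            if a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]]

# (16) Who did you speak to?   (1 Who, 2 did, 3 you, 4 speak, 5 to)
before = {1: 5, 2: 0, 3: 2, 4: 2, 5: 4}   # who attached to its governor "to"
after = {1: 2, 2: 0, 3: 2, 4: 2, 5: 4}    # who risen to attach to "did"
print(len(crossing_arcs(before)), len(crossing_arcs(after)))  # 1 0
```

The pre-rising tree contains one crossing pair; after rising, the structure is projective, which is just what the flattening analysis is designed to achieve.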
5.5. Predicate-argument structures

That syntactic dependency-based structures are close to semantic predicate-argument structures has long been emphasized by DGs (see e.g. Mel'čuk 1988, 2008; Nivre 2010). A typical subject-verb-object sentence, e.g. Bill likes Susan, contains a predicate (likes) and two arguments (Bill and Susan), whereby the arguments are equi-level dependents of the predicate.

(18)              likes − predicate
                 /     \
   argument − Bill     Susan − argument

      Bill likes Susan.
Each of the semantic units corresponds to a concrete unit of syntax. If the catena is taken as the fundamental unit of syntactic analysis, this insight extends to sentence structures that are much more complex. Predicates are catenae in surface syntax, and so are arguments. The only exceptions to this observation occur when certain discontinuities are involved. The following sentences illustrate the extent to which matrix predicates are always catenae in continuous structures:
(19) a. Bill likes Susan.
     b. Bill's liking Susan.
     c. Bill has liked Susan.
     d. Does Bill like Susan?
     e. Susan is liked by Bill.
     f. Is Susan liked by Bill?
     g. Susan will have been liked by Bill.
     h. Will Susan have been liked by Bill?

[tree diagrams (19a−h) garbled in extraction]
The matrix predicate in each of these sentences includes at least one form of the content verb like and up to three auxiliary verbs as well. The auxiliary verbs add only functional meaning to the main predicate, and they are therefore not separate predicates, but rather they are included in the matrix predicate. Each of these verb combinations forms a catena, but many of them are not constituents. The arguments of predicates are also always catenae (in continuous structures), and at times, these catenae can be non-constituents. This fact is especially apparent with reduced relative clauses, where the head of the relative clause serves as an argument of the predicate in the reduced clause:
(20) the things being avoided
(21) the animals kept in cages

[tree diagrams garbled in extraction; the relative-clause predicates being avoided and kept appear in bold in the original trees]
The predicates of the relative clauses are in bold. These predicates predicate over the head noun and its determiner. Thus being avoided is a predicate that takes the things as its one argument. Similarly, kept is a predicate taking the two arguments the animals and in cages. The noteworthy aspect of these examples is that the two arguments the things and the animals are not normally construed as constituents. They are, however, catenae.
The point at hand is perhaps most vividly illustrated using adjuncts. Adjuncts select their governors, which means that the governor is one (or part of an) argument of the predicate expressed by (or contained in) the adjunct. This situation is illustrated using a modal adverb and a locative PP. To mark the adjunct, an arrow convention is now employed (see Mel’čuk 1988); an arrow points away from the adjunct toward the governor that that adjunct selects (this convention has not been employed above):
(22) Sam certainly knows the answer.
(23) Larry sleeps under his bed.

[tree diagrams garbled in extraction; arrows point from the adjuncts certainly and under toward the governors they select]
The adverb certainly in (22) is viewed as a predicate that takes the entire rest of the sentence (Sam knows the answer) as its one argument. A similar situation obtains in (23), where the preposition under can be construed as a predicate that takes Larry sleeps and his bed as its arguments. Notice that each of the named arguments in these examples is a catena, but only his bed enjoys the status of a constituent. This discussion of the catena unit is now concluded. For a much more comprehensive discussion of the points covered here, see the sources listed in section 5.
6. Dependency parsing

The upsurge in interest in DG in recent years is not so much due to work in theoretical syntax, but rather it is coming from computational linguistics. Dependency-based parsers are increasingly being used in computational applications. While dependency-based parsers have been around as long as constituency-based ones − since Hays (1964) and Gaifman (1965) − they have not (until recently) enjoyed the same prominence. An increasing body of study and applications is now demonstrating, however, that dependency-based parsers are competitive with respect to accuracy and superior with respect to efficiency − see Nivre (2010, 2011). Important in this area is that accuracy across parser types is measured by comparing parser output to manual annotations on held-out data, and in order to make the comparison possible across parsing results, constituency-based parse trees have to be converted to dependency-based ones. This conversion is not a straightforward matter. Despite the evaluation difficulties, dependency seems to be competitive. To parse a sentence is to construct a syntactic structure for that sentence. More formally, dependency parsing is the task of automatically mapping an input sentence S = W1, …, Wn to one or more dependency trees T = (V, A), where W = word, T = tree, V = set of nodes, A = set of arcs/edges (Nivre 2010). In other words, dependency parsing is the computational task of determining the head of every word (with the exception of the root word), and further, assigning to each dependency a syntactic function (subject, object, oblique, determiner, attribute, etc.). Taking the claim at face value that dependency-based parsers parse faster than constituency-based ones, one can explain this increased efficiency in terms of the amount of structure involved. Dependency-based parse
trees contain on average approximately half the number of nodes and edges. This fact was illustrated above with trees (5a−b) and is shown again here with the following trees: (24)
a. DG parse trees contain many fewer nodes. − dependency tree
b. DG parse trees contain many fewer nodes. − constituency tree

[tree diagrams (24a−b) garbled in extraction; the dependency tree labels its seven word nodes with their categories (V, N, D, A), while the strictly binary-branching constituency tree adds phrasal nodes (S, NP, VP)]
Unlike trees (6a−b) above, the category labels are used here to mark the nodes. A strictly binary-branching analysis is again employed for the constituency tree in order to best illustrate the inherent minimalism of dependency-based structures. The dependency tree contains seven nodes and six edges, whereas the constituency tree contains 13 nodes and 12 edges. From these numbers it should be apparent that the task of producing a dependency-based parse tree is likely to require fewer decisions of the processor than for a constituency-based parse tree. Furthermore, the parser establishes dependencies by linking words to each other, whereas constituencies are created recursively by grouping constituents. Recursively grouping constituents is a greater task than linking words because larger stretches of the input string must be manipulated by each successive grouping. Of course the question arises as to whether the minimal dependency-based parse trees contain all pertinent information for a variety of computational applications. The answer at present seems to be that they do. Dependency-based parsing is currently being used in a wide range of different applications, e.g. language modeling, information extraction, machine translation, textual entailment, ontology learning, question answering, etc. − see Nivre (2010) and the sources he cites. In addition to this wide range of applications, the actual evidence demonstrating high accuracy and efficiency of dependency parsers has been established at various meetings of experts in the field; two significant milestones in this regard were the multilingual evaluation campaigns for dependency parsers organized by the Conference on Computational Natural Language Learning (CoNLL) (Buchholz and Marsi 2006; Nivre et al. 2007).
The results presented at these conferences helped establish the greater accuracy of dependency-based parsers for languages with free word order, and the greater speed with which dependency parsers accomplish their task has never really been in doubt.
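The transition-based approach used by many of the parsers evaluated in these campaigns can be illustrated with an arc-standard simulation. The sketch below is our own code, not Nivre's formulation: it assumes a projective gold tree (a hypothetical analysis of the seven-word example sentence above) and recovers the arc set A of T = (V, A) via SHIFT, LEFT-ARC, and RIGHT-ARC moves:

```python
def parse_oracle(heads):
    """Arc-standard transition sequence recovering a projective gold tree.
    `heads` maps 1-based word positions to head positions (0 = root).
    Returns the transitions and the arc set A (pairs (head, dependent))."""
    stack, buf, arcs, trans = [0], sorted(heads), set(), []
    done = lambda w: all(d in {dep for _, dep in arcs}
                         for d, h in heads.items() if h == w)
    while buf or len(stack) > 1:
        if len(stack) > 1 and heads.get(stack[-2]) == stack[-1]:
            d = stack.pop(-2)                       # second item's head is top
            arcs.add((stack[-1], d)); trans.append("LEFT-ARC")
        elif (len(stack) > 1 and heads.get(stack[-1]) == stack[-2]
              and done(stack[-1])):                 # top fully attached
            d = stack.pop()
            arcs.add((stack[-1], d)); trans.append("RIGHT-ARC")
        else:
            stack.append(buf.pop(0)); trans.append("SHIFT")
    return trans, arcs

# One plausible tree for "DG parse trees contain many fewer nodes":
heads = {1: 3, 2: 3, 3: 4, 4: 0, 5: 6, 6: 7, 7: 4}
trans, arcs = parse_oracle(heads)
print(arcs == {(0, 4), (3, 1), (3, 2), (4, 3), (4, 7), (6, 5), (7, 6)})
```

The parser makes exactly 2n decisions for n words (one shift and one arc per word), which gives a concrete sense of the efficiency claim made in the text.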
7. Conclusion

To conclude, some speculation about the future of DG is offered. The increasing acceptance and use of dependency in computational circles is demonstrating that dependency
has practical value. It remains to be seen, however, whether dependency will fully catch on in theoretical linguistics and thus start appearing more often on the syllabi for courses on syntax (and semantics and morphology) and otherwise in linguistics departments in general. The Chomskian schools still dominate the stage in Anglo-American theoretical linguistics, and they do so despite the fact that a number of alternative, non-transformational frameworks have managed to gain widespread acknowledgment and acceptance (e.g. HPSG, LFG, CxG, etc.). The interesting thing about these other frameworks, however, is that they by and large assume constituency and are therefore constituency grammars. In this regard, the distinction between derivational and representational grammars is minor compared to the dependency vs. constituency distinction. This may be part of the reason why most constituency grammars take the constituency relation for granted, that is, they do not attempt to motivate it via theory-neutral considerations. Once one becomes fully aware of the dependency vs. constituency distinction, the inherent minimalism of dependency can pose an existential challenge to most any constituency grammar. Constituency grammarians may therefore prefer not to think about this challenge. Due to the acceptance and spread of dependency in computational linguistics, however, overlooking DG is becoming more difficult. In fact a critical mass may be reached at some point, in which case extensive awareness of the dependency vs. constituency distinction could extend from computational linguistics to theoretical linguistics and from theoretical linguistics to pedagogical applications. Therein lies further potential for dependency-based systems; DG structures are so simple that they are well suited for teaching kids and teenagers the concept of syntax. 
Dependency-based parse trees could even supplant the Reed-Kellogg system of sentence diagramming that is still employed on occasion by some English teachers in middle schools in North America. If that were to occur, a large scale move away from the more complex constituency-based theories of syntax and grammar to the more economical dependency-based theories would be complete.
8. References (selected)

Buchholz, Sabine, and Erwin Marsi 2006 CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), 149−164.
Chomsky, Noam 1957 Syntactic Structures. The Hague/Paris: Mouton.
Chomsky, Noam 1995 The Minimalist Program. Cambridge, Mass.: The MIT Press.
Engel, Ulrich 1994 Syntax der Deutschen Gegenwartssprache, 3rd revised edition. Berlin: Erich Schmidt.
Eroms, Hans-Werner 2000 Syntax der Deutschen Sprache. Berlin: de Gruyter.
Groß, Thomas 1999 Theoretical Foundations of Dependency Syntax. Munich: Iudicium.
Groß, Thomas, and Timothy Osborne 2009 Toward a practical dependency grammar theory of discontinuities. SKY Journal of Linguistics 22: 43−90.
Hays, David 1964 Dependency theory: A formalism and some observations. Language 40: 511−525.
Gaifman, H. 1965 Dependency systems and phrase-structure systems. Information and Control 8: 304−337.
Heringer, Hans Jürgen 1996 Deutsche Syntax Dependentiell. Tübingen: Staufenberg.
Hudson, Richard 1984 Word Grammar. Oxford: Blackwell.
Hudson, Richard 1990 English Word Grammar. Oxford: Blackwell.
Hudson, Richard 2007 Language Networks: The New Word Grammar. Oxford: Oxford University Press.
Kahane, Sylvain 2008 Le rôle des structures et représentations dans l'évolution des théories syntaxiques. In: G. Lecointre, J. Pain (eds.), Evolution: Méthodologie, Concepts. Les cahiers de l'Ecole, Ecole Doctorale « Connaissance, Langage, Modélisation », Université Paris X − Nanterre.
Mel'čuk, Igor 1988 Dependency Syntax: Theory and Practice. Albany: SUNY Press.
Nivre, Joakim 2010 Dependency parsing. Language and Linguistics Compass 4, 3: 138−152.
Nivre, Joakim 2011 Bare-bones dependency parsing. In: Security and Intelligent Information Systems. Lecture Notes in Computer Science, Volume 7053, 20−32. Springer.
Nivre, Joakim, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Denis Yuret 2007 The CoNLL 2007 shared task on dependency parsing. Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007: 915−932.
O'Grady, William 1998 The syntax of idioms. Natural Language and Linguistic Theory 16: 279−312.
Osborne, Timothy 2005 Beyond the constituent: A dependency grammar analysis of chains. Folia Linguistica 39(3−4): 251−297.
Osborne, Timothy and Thomas Groß 2012 Constructions are catenae: Construction Grammar meets dependency grammar. Cognitive Linguistics 23(1): 163−214.
Osborne, Timothy, Michael Putnam, and Thomas Groß 2011 Bare phrase structure, label-less trees, and specifier-less syntax: Is Minimalism becoming a dependency grammar? The Linguistic Review 28: 315−364.
Osborne, Timothy, Michael Putnam, and Thomas Groß 2012 Catenae: Introducing a novel unit of syntactic analysis. Syntax 15(4): 354−396.
Owens, Jonathan 1984 On getting a head: A problem in dependency grammar. Lingua 66: 25−42.
Ross, John 1967 Constraints on variables in syntax. Doctoral dissertation, MIT.
Schubert, Klaus 1988 Metataxis: Contrastive Dependency Syntax for Machine Translation. Dordrecht: Foris.
Sgall, Peter, Eva Hajičová, and Jarmila Panevová 1986 The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: D. Reidel Publishing Company.
Starosta, Stanley 1988 The Case for Lexicase: An Outline of Lexicase Grammatical Theory. New York: Pinter Publishers.
Tesnière, Lucien 1959 Éléments de Syntaxe Structurale. Paris: Klincksieck.
Tesnière, Lucien 1969 Éléments de Syntaxe Structurale, 2nd edition. Paris: Klincksieck.
Timothy Osborne, Kirkland, Washington (USA)
31. Categorial Grammar

1. Introduction
2. The Ajdukiewicz-Bar-Hillel Calculus
3. Combinatory Categorial Grammar
4. Categorial Type Logic
5. Relating type-logical and combinatory approaches
6. Related formalisms
7. Computational applications
8. Conclusion
9. Further reading
10. References (selected)
Abstract

This chapter provides the historical backdrop to work in categorial grammar, leading up to current approaches and current synergy between those approaches. We begin by describing the AB calculus (Ajdukiewicz 1935; Bar-Hillel 1953) − categorial grammar in its most basic form − and provide example analyses and derivations with it. We then discuss the Lambek calculus (Lambek 1958), which was the first to define categorial grammar as a logic, followed by Categorial Type Logics (CTL) (Morrill 1994; Moortgat 1997; Oehrle 2011), a resource-sensitive evolution of Lambek's deductive approach. Then, we turn to Combinatory Categorial Grammar (CCG) (Ades and Steedman 1982; Steedman 1996a, 2000b; Steedman and Baldridge 2011), the best known of the rule-based extensions of the AB calculus, which adds rules based on some of the combinators of Combinatory Logic (Curry and Feys 1958). We also discuss the incorporation of resource-sensitivity into CCG, and other connections with CTL. In these sections, we will highlight some of the linguistic analyses which have been proposed for phenomena such as long-distance relativization, right-node raising, argument cluster coordination, and parasitic gaps. The chapter also discusses connections between categorial grammar and other frameworks, with reference both to general perspectives which are shared and to specific efforts to formulate the mechanisms or analyses of a given framework in a categorial one. Finally, we briefly highlight some recent computational work which supports grammar engineering with categorial grammar or which utilizes categorial grammar for both practical and research-oriented applications.
1. Introduction

Categorial grammar is an umbrella term for a family of grammatical formalisms which handle syntactic and semantic analysis via type-dependent analysis. Syntactic types and semantic interpretations are assigned to complex expressions that are compositionally determined from the types and interpretations of their subexpressions. Modern categorial grammar was first proposed by Ajdukiewicz (1935) as his "calculus of syntactic connection." It arose from the theory of semantic categories developed by Edmund Husserl and the Polish school of logicians, which included Ajdukiewicz and Stanisław Leśniewski (Casadio 1988). Bar-Hillel (1953) provided an order-sensitive formulation of categorial grammar, and Lambek (1958) provided its first formulation as a logic. Since then, categorial grammar has been extensively developed into variants that have greater descriptive, explanatory and computational adequacy for dealing with natural language grammar. This chapter focuses on the two current main branches of categorial grammar: Combinatory Categorial Grammar (Ades and Steedman 1982; Szabolcsi 1992; Jacobson 1992b; Steedman 1996a, 2000b; Steedman and Baldridge 2011; Steedman 2012) and Categorial Type Logics (also known as Type Logical Grammars: van Benthem 1989; Morrill 1994; Moortgat 1997; Carpenter 1998; Vermaat 1999; Oehrle 2011). Compared with other grammar formalisms such as Head-Driven Phrase-Structure Grammar (HPSG: Pollard and Sag 1994; Sag et al. 2003) or Lexical-Functional Grammar (LFG: Kaplan and Bresnan 1982), categorial grammars are extreme lexicalist formalisms, meaning nearly all grammatical information is contained within the entries of the lexicon while syntactic derivation is modeled with a small set of very general rules.
In this respect, categorial grammars share common ground with proposals within the Minimalist Program (MP: Chomsky 1995), according to which syntactic derivation involves a very small number of rules operating over lexical categories richly specified with syntactic information. Another core property of CG frameworks is semantic transparency. Syntactic category types (such as s\np) correspond directly to semantic types, typically expressed as terms of the lambda calculus (such as λy.λx.[Pxy]). Meaning assembly is therefore strictly compositional. Essentially, syntactic categories simply constrain the linear order in which semantic functions can combine with their arguments. Syntactic structure itself is simply an artifact of derivation, not a level of representation (Steedman and Baldridge 2011). Typically, CG frameworks allow for much freer surface constituency than is typically assumed in context-free grammar formalisms (as well as in Ajdukiewicz's original formalism). This provides straightforward analyses of many interesting problems in syntax, including unifying analyses of unbounded constructions such as coordination and relative clause formation, an analysis of intonation structure as part of surface syntactic derivation, and algorithms that allow for incremental processing as a part of syntactic derivation with the competence grammar. This chapter presents the very earliest form of categorial grammar, the AB calculus (Ajdukiewicz 1935; Bar-Hillel 1953), and then discusses predominant modern categorial grammar frameworks in a way that emphasizes their similarities rather than their differences. The AB calculus provides an intuitive and particularly simple example of categorial grammar that allows many of the core intuitions and mechanisms of the approach to be explicated in terms that will be more familiar to newcomers than its modern descendants.
Throughout the discussion, examples are provided of some of the linguistic analyses which have been proposed for phenomena such as long-distance relativization, right-node raising, argument cluster coordination, and parasitic gaps. Finally, brief discussion is provided of generative capacity, the connections to other frameworks, and recent computational work based on categorial grammar.
2. The Ajdukiewicz-Bar-Hillel Calculus

The common starting points of categorial grammar formalisms are Ajdukiewicz's (1935) syntactic connection calculus and Bar-Hillel's (1953) directional adaptation thereof. It is thus generally referred to as the AB calculus. It has two components: categories and rules.

Syntax: Categories may be either atomic categories (such as s for sentences and n for nouns) or complex categories which are functions defined over atomic categories and/or other complex categories. Complex categories directly encode the subcategorization requirements of the expressions they are associated with, along with specifications of the linear order in which arguments will be found. The linear order of arguments is encoded by means of "slash" operators \ and /, which mean "find an immediately preceding argument" and "find an immediately following argument" respectively. A simple example is the transitive verb category (s\np)/np for English verbs. In words, this category indicates that it seeks a noun phrase to its right (with rightward leaning slash "/" and category np), then another noun phrase to its left (via the leftward leaning slash "\" and category np); upon consuming both of these arguments, it yields a sentence (s). Note that the categories provided above use the "rightmost" notation for categories commonly employed in work in Combinatory Categorial Grammar. In this notation, the leftmost category in a function type is the result category: In (s\np)/np, s is the leftmost category in the function and it indicates the result of applying the verb's category to its two np arguments. An alternative notation used in the Lambek calculus tradition places leftward arguments to the left of the result category. As such, the transitive category is written (np\s)/np. There are advantages and disadvantages to both conventions which we will not address here, but the reader should expect to see both alternatives in the literature.
English verbal categories, which will be used in later derivations, include: (1)
a. intransitive verbs: s\np
b. transitive verbs: (s\np)/np
c. ditransitive verbs: ((s\np)/np)/np
d. sentential complement verbs: (s\np)/s
Categories are combined by means of two directionally sensitive rules of function application, forward application and backward application: (2)
a. X/Y : f   Y : x   ⇒   X : f(x)   (FA)
b. Y : x   X\Y : f   ⇒   X : f(x)   (BA)
In words, forward application allows two word sequences, ωa with category of type X/Y and interpretation f of type στ and ωb with category of type Y and interpretation x of type σ, to form the word sequence ωa + ωb of type X and interpretation f(x) of type τ. Backward application is just the directional converse. With the application rules, and a lexicon containing Ed and Ted with category np and saw with transitive category (s\np)/np, the derivation for Ed saw Ted is (3). Each step is annotated by underlining the combining categories and labeling the underline with the symbol for the rule used: ">" for forward application and "<" for backward application.
(3)   Ed        saw        Ted
      np     (s\np)/np     np
             -------------- >
                 s\np
      ------------------- <
                s

[text is missing from the source at this point; surviving fragments include a derivation of to go via (sinf\np)/(sbase\np) and sbase\np]
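The AB machinery just described is small enough to implement in a few lines. The following sketch (the tuple encoding and function names are ours, not the chapter's) replays derivation (3):

```python
# Categories: atoms are strings; a function category is a tuple
# (result, slash, argument).
NP, S = "np", "s"
SAW = ((S, "\\", NP), "/", NP)       # transitive verb category, as in (1b)

def combine(left, right):
    """Forward and backward application, rules (2a) and (2b)."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]               # X/Y  Y   =>  X   (FA)
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]              # Y   X\Y  =>  X   (BA)
    return None

vp = combine(SAW, NP)   # saw + Ted     =>  s\np  by FA
s = combine(NP, vp)     # Ed + saw Ted  =>  s     by BA
print(vp, s)
```

The two calls mirror the two steps of derivation (3): first the verb consumes its object to the right, then the resulting s\np consumes the subject to the left.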
a. Forward type-raising
   X : a ⇒T Y/i(Y\iX) : λf.f a   (>T)
b. Backward type-raising
   X : a ⇒T Y\i(Y/iX) : λf.f a   (<T)
        Ed                      saw                    Ted
s/(s\np) : λP.P Ed    (s\np)/np : λxλy.see(y, x)    np : Ted
-------------------------------------------- >B
        s/np : λx.see(Ed, x)
------------------------------------------------------ >
        s : see(Ed, Ted)
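Type-raising and forward composition can be sketched in the same style, as rewrite operations on category terms (the tuple encoding and function names are ours); the snippet replays the first step of the derivation above, composing type-raised Ed with saw:

```python
# Categories: atoms are strings; a function category is a tuple
# (result, slash, argument).
def compose_fwd(left, right):
    """Forward composition >B:  X/Y  Y/Z  =>  X/Z."""
    if (isinstance(left, tuple) and left[1] == "/"
            and isinstance(right, tuple) and right[1] == "/"
            and left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

def raise_fwd(x, y):
    """Forward type-raising >T:  X  =>  Y/(Y\\X)."""
    return (y, "/", (y, "\\", x))

ed = raise_fwd("np", "s")               # s/(s\np)
saw = (("s", "\\", "np"), "/", "np")    # (s\np)/np
result = compose_fwd(ed, saw)           # s/np
print(result == ("s", "/", "np"))       # True
```

The composed s/np constituent can then take the object np by ordinary forward application, exactly as in the derivation.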
Associativity in the grammar permits elegant analyses of extraction and coordination by allowing constituents to be derived that cannot be created in the AB calculus. For example, a typical kind of extraction is relative clause formation, illustrated in the following derivation for the relative clause that Sam wants to eat, in which two successive applications of forward composition allow for Sam wants to eat to be derived as a constituent which is then taken as the argument to the relative pronoun: (24)
        that                  Sam                wants                  to eat
 (np\np)/(s/np)            s/(s\np)          (s\np)/(s\np)            (s\np)/np
 : λQλPλx.Px ∧ Qx        : λR.R(Sam)       : λPλy.want y(P y)       : λzλx.eat(x, z)
                                           ---------------------------------- >B
                                           (s\np)/np : λzλy.want y(eat(y, z))
                         ------------------------------------------ >B
                         s/np : λz.want(Sam)(eat(Sam, z))
 -------------------------------------------------------- >
 np\np : λPλx.Px ∧ want(Sam)(eat(Sam, x))
A further example that cannot be handled elegantly with the limited apparatus of AB (and context-free grammar) involves the coordination of verbal complexes such as that in (25).

(25) Ed will see and should hear Ted.

A standard analysis of modal verbs is that they are functions from intransitive verb phrases into intransitive verb phrases (see Hoyt and Baldridge 2008 for a different analysis). However, for this coordination to proceed, the modal verbs will and should must combine with see and hear respectively and coordinate before their shared object argument is consumed. Because the only rule available in the AB calculus is functional application, there is no way to do this:

(26) will (s\np)/(sbase\np)   see (sbase\np)/np

The rule >B, however, does allow this: for the combination in (26), X is s\np, Y is sbase\np, and Z is np. The result, X/Z, is thus (s\np)/np.

Crossed Composition: The function application and composition rules discussed so far are all order-preserving, meaning that they require that syntactic functions (such as
verbs) be directly adjacent to their arguments in order to be able to combine with them. Thus, they are unable to derive sequences of expressions in which function categories are not immediately adjacent to their arguments. In the CCG literature, non-adjacent arguments are said to have been permuted. An example of argument permutation in English is heavy-NP shift, in which an adverb comes between a verb and its direct object, such as Ed saw today his tall friend Ted. Assuming that today is of type (s\np)\(s\np) and that Ted and Ed have raised categories, then the rules given so far provide no way of combining the categories in the sentence because the directions of the slashes do not match those specified in the rules: (27)
      Ed          saw           today           his tall friend Ted
   s/(s\np)    (s\np)/np    (s\np)\(s\np)           s\(s/np)
               ---------------------------- <B
                           ***
For this reason, "non-harmonic" or "crossed" composition rules are provided:

(28) a. Forward crossed composition
        X/Y  Y\Z  ⇒B  X\Z   (>B×)
     b. Backward crossed composition
        Y/Z  X\Y  ⇒B  X/Z   (<B×)
[derivations garbled in this copy: backward crossed composition (<B×) combines saw with today, after which the heavy-NP shift order Ed saw today his tall friend Ted can be derived; a further derivation shows that, left unrestricted, crossed composition also licenses ungrammatical orders]
Modalized Slashes: One way of blocking derivations like this is to define language-specific rule restrictions (or outright bans) in order to limit the applicability of rules (Steedman 2000b). This is unattractive because it means that the grammars for languages can vary both in their lexicon and their rule set. An alternative that has become standard practice in CCG is to define a set of modes of combination that can be used to create slash types which selectively license some, but not all, of a set of truly universal combinatory rules. Jacobson (1992b) was perhaps the first to use this strategy in a CCG-like system; there, she uses slash-types to force composition (e.g., for raising verbs) and disallow application. Baldridge (2002) provides a general framework for creating modalized CCG rules using underlying logics that generate the rules as proofs. Most work in CCG now uses the set of modalities M = {-, >, ×, ·} defined by Baldridge and Kruijff (2003). They have the following behaviors:

(32) a. -: non-associative and non-commutative
     b. >: associative and non-commutative
     c. ×: non-associative and commutative
     d. ·: associative and commutative
These modalities allow typed slashes such as /- and /> to be defined. How their behaviors are projected is explained in the remainder of this section. Their basis in Categorial Type Logics is discussed in section 4. With slashes typed according to different modes, the rules must be defined with respect to those types. The application rules are the same as with AB, but may be used with any of the slash types, as indicated with the i subscript, where i can be any of the modalities given in M:

(33) a. Forward application: X/iY  Y  ⇒  X   (for i ∈ M)   (>)
     b. Backward application: Y  X\iY  ⇒  X   (for i ∈ M)   (<)
The harmonic composition rules, in contrast, are restricted to slashes typed with the associative modalities:

(35) a. Forward harmonic composition (for i, j ∈ {>, ·}): X/iY : f  Y/jZ : g  ⇒B  X/jZ : λx.f(gx)   (>B)
     b. Backward harmonic composition (for i, j ∈ {>, ·}): Y\jZ : g  X\iY : f  ⇒B  X\jZ : λx.f(gx)   (<B)
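Modality gating can be sketched as follows. This is our own toy encoding (modes written in ASCII as '-', '>', 'x', '.'), not code from the chapter: a complex category is (result, slash, mode, argument), and each rule checks that both slash modes lie in the licensed set.

```python
# Modality-gated composition: harmonic composition needs modes in {>, .},
# crossed composition needs modes in {x, .} (ASCII stand-ins for the text's
# {-, >, x, .} modalities).

HARMONIC = {'>', '.'}
CROSSED = {'x', '.'}

def fcomp(x, y):
    """>B: X/iY  Y/jZ  =>B  X/jZ, only for i, j in {>, .}."""
    if (x[1] == '/' and y[1] == '/' and x[3] == y[0]
            and x[2] in HARMONIC and y[2] in HARMONIC):
        return (x[0], '/', y[2], y[3])
    return None

def bxcomp(x, y):
    """<Bx: Y/jZ  X\\iY  =>B  X/jZ, only for i, j in {x, .}."""
    if (x[1] == '/' and y[1] == '\\' and y[3] == x[0]
            and x[2] in CROSSED and y[2] in CROSSED):
        return (y[0], '/', x[2], x[3])
    return None

# powerful := n/-n, by Ronaldo := n\xn: the '-' slash rejects crossed
# composition, so *"powerful by Ronaldo shot" is blocked.
powerful = ('n', '/', '-', 'n')
by_ronaldo = ('n', '\\', 'x', 'n')
assert bxcomp(powerful, by_ronaldo) is None

# saw := (s\>np)/.np crossed-composes with a '.'-typed adverb freely:
s_np = ('s', '\\', '>', 'np')
saw = (s_np, '/', '.', 'np')
today = (s_np, '\\', '.', s_np)
assert bxcomp(saw, today) == (s_np, '/', '.', 'np')
```

The same universal rules apply everywhere; only the lexically specified slash types decide which combinations go through.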
With these rules, lexical entries can be given typed slashes:

saw := (s\>np)/·np   today := (s\>np)\·(s\>np)   powerful := n/-n   by Ronaldo := n\×n

[derivations garbled in this copy: the --typed slash of powerful cannot enter crossed composition with by Ronaldo's ×-typed slash, so *powerful by Ronaldo shot is blocked, while saw's ·-typed slash still allows the heavy-NP shift order Ed saw today his tall friend Ted to be derived]
(38)−(39) [derivations garbled in this copy: in (38), Ed type-raises to s/(s\np) and forward-composes with saw := (s\np)/·np to give s/·np, the category required by the relative pronoun in the object relative clause man whom Ed saw; in (39), Ed saw and Ned heard each derive s/·np in the same way and coordinate before the shared object is consumed (right-node raising)]
This analysis is also consistent with intonational constituency and incremental parsing with the competence grammar (Steedman 2000b). Importantly, in addition to deriving these constituents, CCG provides compositional interpretations for them. Other types of "odd" constituent coordinations appear cross-linguistically, such as argument cluster coordination in English: Ed gave Ted tea and Ned bread. This phenomenon has also been called non-constituent coordination, reflecting the difficulty of assigning a sensible phrase structure constituency that groups indirect objects with direct objects. Again, CCG's increased associativity allows a constituent to be formed in nonstandard ways. To deal with such an example, we need the backward versions of the composition and type-raising rules. These rules conspire in the analyses of Steedman (1985) and Dowty (1988) (presented in 1985) to create the necessary constituents for argument cluster coordination. After type-raising each of the two objects, they are composed, resulting in a function that seeks a verb still missing its indirect object and direct object arguments. This function can then be coordinated with other functions of the same type. The derivations in (40−41), in which the subject has already composed with the verb, show this.
(40−41) [derivations garbled in this copy: the type-raised objects Ted and tea backward-compose into a single function over ditransitive verb categories, and the argument clusters then coordinate via and := (X\X)/X]

filed := (s\>np)/·np   [without reading] := ((s\>np)\·(s\>np))/·np

The fact that sequences like filed without reading can coordinate with transitive verbs as in the example indicates that it must be possible to combine their categories to form a transitive category. The following rule provides exactly this functionality:
(46) Backward crossed substitution: Y/jZ  (X\iY)/jZ  ⇒S  X/jZ   (for i, j ∈ {×, ·})   (<S×)
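Rule (46) can be sketched with the same toy tuple encoding used above (our own illustration, with slash modalities suppressed for brevity):

```python
# Backward crossed substitution (<Sx): Y/Z  (X\Y)/Z  =>S  X/Z.
# Atomic categories are strings; complex categories are (result, slash, argument).

def bxsubst(left, right):
    """<Sx: Y/Z  (X\\Y)/Z  =>S  X/Z."""
    if not (isinstance(left, tuple) and left[1] == '/'):
        return None
    y, z = left[0], left[2]
    if (isinstance(right, tuple) and right[1] == '/' and right[2] == z
            and isinstance(right[0], tuple) and right[0][1] == '\\'
            and right[0][2] == y):
        return (right[0][0], '/', z)
    return None

# filed := (s\np)/np, without reading := ((s\np)\(s\np))/np
s_np = ('s', '\\', 'np')
filed = (s_np, '/', 'np')
without_reading = ((s_np, '\\', s_np), '/', 'np')

# The combination has the transitive category (s\np)/np, so filed without
# reading can coordinate with an ordinary transitive verb.
assert bxsubst(filed, without_reading) == (s_np, '/', 'np')
```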
[derivations and intervening text garbled in this copy: backward crossed substitution combines filed with without reading to yield a single transitive category, which then enters the relative clause derivation like a simple transitive verb; the following derivations show that a further example, of the form what you can and what you should not (eat), cannot be derived with the rules given so far]
Hoyt and Baldridge propose the following rules for deriving examples like this, based on proofs in an underlying categorial type logic:

(50) a. X/>(Y/>Z) : f   Y/>W : g   ⇒D   X/>(W/>Z) : λh.f(λx.g(hx))   (>D)
     b. Y\>W : f   X\>(Y\>Z) : g   ⇒D   X\>(W\>Z) : λh.g(λx.f(hx))   (<D)
(51) [derivation garbled in this copy: what := s/(s/np) combines with you can := s/(s\np) by >D to yield s/((s\np)/np); the conjuncts what you can and what you should not coordinate at this category and then apply to a transitive verb of category (s\np)/np]
(52) a. Forward Division: X/Y : fστ  ⇒G  (X_Z)/(Y_Z) : λPδσ.λxδ.f(P(x))   (>G)
     b. Backward Division: X\Y : fστ  ⇒G  (X_Z)\(Y_Z) : λPδσ.λxδ.f(P(x))   (<G)
In addition to the G-rules, Jacobson also assumes the familiar type-raising (T) rules as well as two further rules to provide analyses for a variety of subtle binding data. Rule (53a) is a unary version of the substitution rules and is used to model pronominal binding. Rule (53b) is a "de-Curry-ing" rule which changes a function of type σ(δτ) to a function of type (σδ)τ. Jacobson uses the M-rule to model functional interpretations of relative clauses (Jacobson 2000, 2002).

(53) a. Rule Z: (X\Y)/Z : λy.λx.fxy  ⇒Z  (X\Y)/(Z_Y) : λg.λx.f(g(x))(x)   (Z)
     b. Rule M: (X|Y)|Z : λy.λx.fxy  ⇒M  X|(Y|Z) : λg.∀x[x ∈ dom(g) → fx(gx)]   (M)
Jacobson treats pronouns as semantic identity functions λx.x (of type ee) and syntactically as functions of category np_np. Binding is realized by means of the Z-rule, which turns predicate categories into categories taking as arguments what we will call "pronominal functions", namely functions with the _-operator looking for an np argument. To illustrate with a simple example, consider the sentence Every mani thinks Mary loves himi, interpreted with the quantifier every man binding the pronoun him. This is derived in (54), with successive application of the G-rule to the categories above the pronoun in the binding dependency. The type-changing rule Z changes the type for think into one taking a pronominal function as its argument.
(54) [derivation garbled in this copy: G applies to loves := (s\np)/np and to the raised subject Mary, propagating the pronominal argument of him := np_np : λz.z upward; Z applies to thinks := (s\np)/s, which then consumes the pronominal function, yielding s\np : λx.think x (love (Mary) x)]
The quantifier every man then takes this expression as its argument, binding the pronoun indirectly (see Szabolcsi 2003 for an extension of Jacobson's framework to cross-sentential anaphora):

(55)
every man := s/(s\np) : λPet.∀x[man x → Px]   thinks Mary loves him := s\np : λx.think x (love (Mary) x)
⇒ s : ∀x[man x → think x (love (Mary) x)]   (>)
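Jacobson's g and z can be illustrated as pure higher-order functions. The encoding below is our own toy (individuals and propositions are strings; the names love, think, him are assumptions for the demonstration), not the chapter's code:

```python
# Jacobson's g and z as operations on meanings (syntactic categories omitted).
# g lifts f to take a pronoun-containing argument; z additionally feeds the
# binder's own argument to that function. Pronouns denote the identity map.

def g(f):
    # g(f) = lambda P. lambda x. f(P(x))
    return lambda P: lambda x: f(P(x))

def z(f):
    # z(f) = lambda P. lambda x. f(P(x))(x): binds the pronoun to f's subject
    return lambda P: lambda x: f(P(x))(x)

him = lambda x: x                       # identity function of type <e,e>

# Toy meanings, with propositions represented as strings:
love = lambda obj: lambda subj: f"love({subj},{obj})"
think = lambda prop: lambda subj: f"think({subj},{prop})"

# "loves him" via g: still dependent on the pronoun's value.
loves_him = g(love)(him)                # lambda x. love(x)
mary_loves_him = lambda x: loves_him(x)("Mary")

# "thinks Mary loves him" via z: the matrix subject binds the pronoun.
thinks_mary_loves_him = z(think)(mary_loves_him)
assert thinks_mary_loves_him("every-man") == "think(every-man,love(Mary,every-man))"
```

The final assert mirrors the bound reading of (54)−(55): the same variable fills both the matrix subject and the pronoun position.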
Note that Jacobson's analysis of pronominal binding can be recreated within BTS-CCG, given that the combination of the G-rule and function application corresponds directly to function composition (B) in BTS-CCG. Likewise, the Z-rule can be assumed to be a lexical rule, in line with the standard treatment of unary type-changing rules in the BTS-CCG literature. The binding effect in (54) can then be captured in BTS-CCG as follows (we subscript slash-operators corresponding to the _-operator above with a "g"):

(56)
thinks := (s\np)/(s/gnp) : λPet.λy.think y (Py)   Mary := s/(s\np) : λQet.Q(Mary)   loves := (s\np)/np : λy.λx.love x y   him := np/gnp : λz.z

loves him ⇒ (s\np)/gnp : λz.λx.love x z   (>B)
Mary loves him ⇒ s/gnp : λz.love (Mary) z   (>B)
thinks Mary loves him ⇒ s\np : λy.think y (love (Mary) y)   (>)
From this perspective, analyses in G-CCG can be straightforwardly captured in BTS-CCG, and vice versa. However, one potential point of difference between the two formulations of CCG involves locality restrictions on application of the rules. As was discussed above, BTS-CCG assumes that application of the harmonic and crossed composition rules is restrained by the modalities. Baldridge (2002) argues that restrictions on extraction out of adjuncts and coordinate structures should be modeled in terms of modal restrictions. In particular, he argues that adjuncts such as relative clauses, as well as the conjunction and, have types specified with the --modality:
(57) a. that, which := (np\np)/-(s/np) : λQet.λPet.λx.[Px ∧ Qx]
     b. and := (X\-X)/-X : λqστ.λpστ.λxσ.[p(x) ∧ q(x)]

The presence of the --modality on these categories prevents the composition rules from applying to them and so prevents extraction dependencies from being established from within their arguments. Jacobson, however, used her framework to analyze data showing binding from within conjunct (58a) and adjunct islands (58b):

(58) a. Every mani thinks that [ hei lost and Mary won ].
     b. Every mani hopes that the woman [ that hei wants to marry ] loves him.

For example, in Jacobson's framework the coordinate structure in (58a) would be analyzed as follows.

(59)
[derivation garbled in this copy: G applies to lost := s\np : λy.lose y, so that he lost derives s_np : λy.lose y; Mary won derives s : win (Mary); the coordination then yields a pronominal function over both conjuncts]

[prose and rule statement partly garbled in this copy; the recoverable fragment is a composition rule propagating a pronominal slash \+ across any modality: X/iY  Y\+Zpro  ⇒B+  X\+Zpro   (for i ∈ {-, ×, >, ·})   (>B+)]
For the sake of notational felicity, we abbreviate \+ with the _-symbol used above: \ + = _. With this additional rule, Jacobson’s examples can be derived while still retaining the precise modal control over long-distance dependencies that BTS-CCG offers. (62)
[derivation garbled in this copy: he lost derives s_np : λy.lose y via <B+, coordinates with Mary won ⊢ s : win (Mary), and thinks := (s\np)/(s_np) : λPet.λx.think x (Px) consumes the result, yielding s\np : λx.think x (lose x ∧ win (Mary))]
Augmenting BTS-CCG with this rule would make it possible to capture other kinds of binding relationships that have not received much attention in the framework, such as the extensive use of resumptive pronouns in languages like Arabic, Hebrew, and others (Demirdache 1991, 1997; Aoun and Benmamoun 1998; Aoun and Choueiri 2000; Aoun et al. 2001; Ouhalla 2001; Choueiri 2002; Aoun and Li 2003; Asudeh 2005, 2012; Hoyt 2010; a.o.). Honnibal's (2009) and Honnibal and Curran's (2009) "hat categories", which allow a form of lexically encoded type-changing, may be relevant for the analysis of data like this.

The Wrap Rule: Other combinators have been considered in other contexts, most notably the commuting combinator (Cfxy ≡ fyx) in the guise of the wrap rule (Bach 1979). In particular, wrap rules have been employed extensively in work by Dowty (e.g.,
Dowty 1982, 1997). One of the uses of wrap rules is that they allow verbal categories to be defined according to a hierarchy of grammatical functions, where each of the arguments corresponding to those functions is represented by adding it to the subcategorization based on the previous function (Dowty 1997). For example, a subject is a noun phrase that combines with s/np (or s\np) to form an s, an object is a noun phrase that combines with (s/np)/np (or (s\np)/np) to form s/np (or s\np), and so on. Wrap-enabling slashes then take care of mismatches in word order. Concretely, consider a verb-initial language, where the intransitive would be s/nps. According to Dowty's recipe for constructing verbal categories, this would lead to the transitive category (s/nps)/npo, which produces VOS word order. For VSO languages, Dowty takes the category to be (s/nps)/w npo, where the /w slash does not produce adjacent concatenation of the verb and the object, but instead "wraps" them around the as-yet-unconsumed subject. In CCG, this effect could be achieved with a unary rule that reorders such categories:

(63) Forward Wrap: (X/Y)/wZ  ⇒Wrap  (X/Z)/Y   (>W)
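The unary rule in (63) can be sketched directly (our own toy encoding again: categories are tuples (result, slash, argument), with the wrap slash marked '/w'):

```python
# Forward wrap as a unary category-reordering rule: (X/Y)/wZ => (X/Z)/Y.

def fwrap(cat):
    """>W: (X/Y)/wZ  =>  (X/Z)/Y; None if the category has no wrap slash."""
    if (isinstance(cat, tuple) and cat[1] == '/w'
            and isinstance(cat[0], tuple) and cat[0][1] == '/'):
        x, y, z = cat[0][0], cat[0][2], cat[2]
        return ((x, '/', z), '/', y)
    return None

# Dowty-style VSO transitive: (s/np_s)/w np_o wraps to (s/np_o)/np_s,
# which concatenates as verb-subject-object.
vso = (('s', '/', 'np_s'), '/w', 'np_o')
assert fwrap(vso) == (('s', '/', 'np_o'), '/', 'np_s')
```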
However, in CCG, the effect of such a rule is assumed to be a process that could be carried out in the lexicon (Steedman and Baldridge 2011), meaning that one could define category hierarchies in the lexicon in the manner suggested by Dowty, but still use the linearized category (s/npo)/nps for VSO verbs in syntax. Dowty provides a different characterization of wrap in terms of commutative rules in multi-modal type-logical grammar (see the next section). One of the punchlines of the wrap story is that the binding theory can be defined in terms of the structures defined by categories themselves and the derivations they license. This contrasts with Steedman's account of binding in CCG, which is based on c-command defined over logical forms (Steedman 2000a). Dowty suggests that this use of logical form is at odds with Montagovian assumptions and should thus be dispreferred to the wrap analysis. Steedman argues that binding of true reflexives and bound anaphors is a strictly clause-bounded phenomenon, and thus constraints on binding may be specified in the lexicon. Furthermore, he suggests that whether this is done in terms of constraints over derivation structures (categories) or a logical form corresponding to them is immaterial: whereas Dowty makes no intrinsic use of logical form, Steedman makes no intrinsic use of derivation structure.
4. Categorial Type Logic

CCG adds to the AB calculus a finite set of rules of category combination. These rules selectively introduce more inference patterns that support greater associativity and/or commutativity. An alternative is to define a fully logical characterization of categorial grammar. To begin on this path, note first the similarity of the rules of the AB calculus with Modus Ponens: both types of rules involve the use of inference. However, in the forward application rule (X/Y Y ⇒ X), the resource Y is eliminated. This suggests that the AB calculus is related to resource-sensitive linear logic (Girard 1987). Viewed this way, the AB calculus is a system of inference; however, unlike linear logic, the AB
calculus is a partial system of inference because it has no corresponding ability to introduce resources: in other words, it does not support hypothetical reasoning. Furthermore, linear logic also allows different types of logical connectives to be defined; each connective may exhibit different behaviors with respect to associativity and commutativity (and other dimensions). The characterization of categorial grammar as a logic was first proposed by Lambek (1958) and has since been developed into the framework known as Categorial Type Logics (CTL, a.k.a. Type-Logical Grammar and subsuming various forms of the Lambek calculus) (Morrill 1994; Hepple 1995; Carpenter 1998; Moortgat 1999; Oehrle 2011). CTL supports multiple unary and binary modes of grammatical combination that can exhibit quite different logical properties, which in turn lead to different syntactic possibilities (Moortgat and Oehrle 1994).

Residuation: From the perspective of CTL, slashes are directionally sensitive implications, in the full logical sense of the term. As with CCG, slashes may be typed, where each type corresponds to a particular mode of combination with different logical properties (e.g., associativity, commutativity). Each slash pair \i, /i is related to a product operator •i; the three operators are related by the residuation laws:

(64) A ⊢ C/iB   iff   A •i B ⊢ C   iff   B ⊢ C\iA
These laws say that if one can conclude the formula C when the formula A is next to the formula B (i.e., A •i B), then one also knows that from A it is possible to conclude C/iB and likewise from B to conclude C\iA. The slash \i is the right residual of •i, and /i is the left residual. This may not seem particularly intuitive, but as an analogy, think of multiplication: from A × B = C one knows that A = C/B and that B = C/A. Because multiplication is commutative, the operator ÷ is both the left and right residual of ×. In linguistic terms, a category formed with the product like np •- np represents the juxtaposition of two noun phrases under the modality -. Slash categories are the same as with CCG: incomplete expressions seeking other expressions. A category that uses all of the operators is (s\>np)/>(np •- np), which would be a candidate category for ditransitive verbs in English. Nonetheless, because the actual linguistic use of the product is fairly rare in the CTL literature, the rules for using it are omitted in this discussion. See Moortgat (1997) or Vermaat (2005) for further details on the use of product.
One is referred to as the natural deduction presentation and more closely resembles the notation used for CCG categories and derivations, for which reason it is used here. The other notation is referred to as the (Gentzen) sequent presentation; see Carpenter 1998: Ch. 5 for an accessible introduction to the two presentations):
(65) Slash elimination schemas (with i ∈ M):
     a. Γ ⊢ X/iY   Δ ⊢ Y   ⟹   (Γ ◦i Δ) ⊢ X   [/iE]
     b. Δ ⊢ Y   Γ ⊢ X\iY   ⟹   (Δ ◦i Γ) ⊢ X   [\iE]
Note how the direction of the slash is reflected in the order of the components of the structured antecedent. With these rules, we can provide the following proof that the sentence Ed saw Ted today is of type s:
(66) saw ⊢ (s\np)/·np   Ted ⊢ np
     (saw ◦· Ted) ⊢ s\np   [/·E]
     today ⊢ (s\np)\·(s\np)
     ((saw ◦· Ted) ◦· today) ⊢ s\np   [\·E]
     Ed ⊢ np
     (Ed ◦ ((saw ◦· Ted) ◦· today)) ⊢ s   [\E]
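The elimination schemas and the structure they build can be sketched as follows. This is our own toy (premises are (antecedent, category) pairs; antecedents are binary tuples of words), not the chapter's notation:

```python
# CTL slash elimination with structured antecedents: a premise is a pair
# (antecedent, category); elimination records the combination with a tuple,
# mirroring the o-connected antecedents of (65)-(66).

def felim(fn, arg):
    """[/E]: from Gamma |- X/Y and Delta |- Y, conclude (Gamma o Delta) |- X."""
    (g, xcat), (d, ycat) = fn, arg
    if isinstance(xcat, tuple) and xcat[1] == '/' and xcat[2] == ycat:
        return ((g, d), xcat[0])
    return None

def belim(arg, fn):
    """[\\E]: from Delta |- Y and Gamma |- X\\Y, conclude (Delta o Gamma) |- X."""
    (d, ycat), (g, xcat) = arg, fn
    if isinstance(xcat, tuple) and xcat[1] == '\\' and xcat[2] == ycat:
        return ((d, g), xcat[0])
    return None

s_np = ('s', '\\', 'np')
saw = ('saw', (s_np, '/', 'np'))
ted = ('Ted', 'np')
today = ('today', (s_np, '\\', s_np))
ed = ('Ed', 'np')

# Proof (66): Ed saw Ted today
step1 = felim(saw, ted)        # (saw o Ted) |- s\np
step2 = belim(step1, today)    # ((saw o Ted) o today) |- s\np
step3 = belim(ed, step2)       # (Ed o ((saw o Ted) o today)) |- s
assert step3 == (('Ed', (('saw', 'Ted'), 'today')), 's')
```

Note how the derived antecedent tree records exactly the bracketing shown in (66), which is what the structural rules below manipulate.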
This is the standard presentation format for CTL analyses. Note that the structured antecedents contain lexical items, but these are actually standing in for the categories that they licensed from the lexicon. This greatly enhances readability, but readers should keep in mind that the categories are in the antecedents. This matters for a number of ways in which structures are manipulated, starting with hypothetical reasoning, which we turn to next. Proof (66) only builds structure, using the elimination rules. However, the base logic also supports hypothetical reasoning: hypothesized elements can be consumed during the course of a proof and thus become part of the structured antecedent built during the process. In order for the proof to succeed, these hypotheses must eventually be discharged. This requires them to be on the periphery of the structured antecedent in order for the slash introduction rules (67) to apply. A hypothesized element on the right periphery may be discharged with a rightward slash (67a), and one on the left periphery may be eliminated with a leftward slash (67b).

(67) Slash introduction schemas:
     a. If (Γ ◦i y) ⊢ X for a hypothesis [y ⊢ Y], then Γ ⊢ X/iY   [/iI]
     b. If (y ◦i Γ) ⊢ X for a hypothesis [y ⊢ Y], then Γ ⊢ X\iY   [\iI]
31. Categorial Grammar
1069
Note that just as eliminating a slash with a modality i builds structure by connecting two antecedents with ◦i , an introduced slash inherits its modality from the structure which produced it. Interestingly, a consequence of hypothetical reasoning combined with slash introduction is that the type-raising rules given in the previous section are theorems of the base logic. This is shown by hypothesizing a function which consumes the argument, and then subsequently withdrawing the assumption. (68)
Ed ⊢ np   [x1 ⊢ s\np]¹
(Ed ◦ x1) ⊢ s   [\E]
Ed ⊢ s/(s\np)   [/I]¹
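The CCG counterpart of this theorem, type-raising as a unary rule, can be sketched in the toy tuple encoding used earlier (our own illustration):

```python
# Forward type-raising as a unary rule: X => T/(T\X).
# Atomic categories are strings; complex categories are (result, slash, argument).

def traise_forward(x, t='s'):
    """>T: X  =>  T/(T\\X), with T defaulting to s for this sketch."""
    return (t, '/', (t, '\\', x))

# Ed := np raises to s/(s\np), the category used throughout the text:
assert traise_forward('np') == ('s', '/', ('s', '\\', 'np'))
```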
The reasoning in this proof, in words, is that if one had an intransitive verb, then it could consume the np (introduced by Ed from the lexicon) and derive an s. But the assumed verb x1 must then be withdrawn. This introduces the rightward slash (since x1 is on the right in the structured antecedent) with the intransitive verb category as the argument. Note that hypotheses are marked by numbers when they are introduced and discharged, to improve readability of proofs which have multiple hypothesized elements.

Structural Reasoning: The base logic still has a fairly hands-off approach to the structured antecedents of proof terms, and as such it is not more flexible than the AB calculus. However, it is possible to augment the base logic by defining structural rules that reconfigure the antecedent set of premises and thereby create systems with varying levels of flexibility. For example, the following rules permit structures built via the modalities > and · to be associatively restructured.

(69) a. Right Association: (Δa ◦i (Δb ◦j Δc)) ⊢ X  ⟹  ((Δa ◦i Δb) ◦j Δc) ⊢ X   [RA]   (for i, j ∈ {>, ·})
     b. Left Association: ((Δa ◦i Δb) ◦j Δc) ⊢ X  ⟹  (Δa ◦i (Δb ◦j Δc)) ⊢ X   [LA]   (for i, j ∈ {>, ·})
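Since antecedents are just binary trees in the toy encoding above, [RA] amounts to tree rebracketing (modes suppressed; our own sketch):

```python
# The structural rule [RA] of (69a) as rebracketing on a structured
# antecedent: A o (B o C)  =>  (A o B) o C.

def right_assoc(tree):
    """[RA]: (A o (B o C))  =>  ((A o B) o C); None if the shape doesn't match."""
    if isinstance(tree, tuple) and isinstance(tree[1], tuple):
        a, (b, c) = tree
        return ((a, b), c)
    return None

# In the object relative clause proof, RA moves the hypothesized object x1
# to the periphery so that slash introduction can discharge it:
antecedent = ('Ed', ('saw', 'x1'))
assert right_assoc(antecedent) == (('Ed', 'saw'), 'x1')
```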
Rules such as these are one component of providing the flexibility which the AB calculus lacks and which CCG gains with rules based on combinators. For example, consider the object relative clause man whom Ed saw, using the same categories assumed in the CCG derivation (38). We begin by introducing entries from the lexicon and hypothesizing the missing object of saw. We then combine the premises using the slash elimination rules of the base logic, and restructure the binary tree built up during the proof using the structural rule of right association. Finally, we discharge the assumption using the rightward slash introduction rule, leaving the category s/·np required by the relative pronoun.

(70) saw ⊢ (s\np)/·np   [x1 ⊢ np]¹
     (saw ◦· x1) ⊢ s\np   [/·E]
     Ed ⊢ np:  (Ed ◦ (saw ◦· x1)) ⊢ s   [\E]
     ((Ed ◦ saw) ◦· x1) ⊢ s   [RA]
     (Ed ◦ saw) ⊢ s/·np   [/·I]¹
     whom ⊢ (n\n)/(s/·np):  (whom ◦ (Ed ◦ saw)) ⊢ n\n   [/E]
     man ⊢ n:  (man ◦ (whom ◦ (Ed ◦ saw))) ⊢ n   [\E]

The crucial step for the extraction is where the structural rule RA applies and puts the assumption on the periphery so that it may be released by slash introduction − before that, it is buried in an inaccessible position. The parallel with traces and movement operations in mainstream generative grammar should be clear from this example (see section 5 for pointers to work on such connections). However, it should be stressed that
this is not actual movement, but reasoning about syntactic types in a structured set of premises. This sort of associative restructuring also allows the system to handle right-node raising: Ed saw and Ned heard both have the type s/·np (as they did in CCG derivation [39]). If the categories of coordinators such as and are of the form (X\-X)/-X, then the coordination Ed saw and Ned heard has the type s/·np, as desired. The set of structural rules can be expanded to permit other ways of reconfiguring structured antecedents. For example, the following rules support reordering:

(71) a. Left Permutation: (Δa ◦i (Δb ◦j Δc)) ⊢ X  ⟹  (Δb ◦j (Δa ◦i Δc)) ⊢ X   [LP]   (for i, j ∈ {×, ·})
     b. Right Permutation: ((Δa ◦i Δb) ◦j Δc) ⊢ X  ⟹  ((Δa ◦j Δc) ◦i Δb) ⊢ X   [RP]   (for i, j ∈ {×, ·})
Consider heavy-NP shift examples such as Ed saw today his tall friend Ted. The proof would be like the one for saw Ted today in (66) but with an application of right permutation: (72)
((saw ◦· his-tall-friend-Ted) ◦· today) ⊢ s\np   (as in (66))
((saw ◦· today) ◦· his-tall-friend-Ted) ⊢ s\np   [RP]
Ed ⊢ np:  (Ed ◦ ((saw ◦· today) ◦· his-tall-friend-Ted)) ⊢ s   [\E]
Similar use of the permutation rules would allow a number of other permutation possibilities, such as those needed for non-peripheral extraction and scrambling. Of course, there are constructions which do not permit such freedom, and it is here that the multimodal system shines, since it allows selective access to the structural rules. What this means is that the types of slashes specified on specific lexical entries will interact with the universal grammar (base logic plus structural rules) to obtain the right behavior. Notice that the - modality is not referenced in the associative and permutative restructuring rules: it is thus limited to the base logic and thereby forces strict non-associativity and non-permutativity, as it did with the CCG examples in the previous section. See Moot (2002) for proofs connecting different types of structural rules to the generative capacity they engender.

With the base logic and the structural rules given above, many schematic rules that are commonly employed in rule-based approaches can be shown to be theorems, similarly to the above proof for type-raising. For example, the following is a proof of the backward crossed composition rule of CCG.

(73)
Δ ⊢ Y/×Z   [z1 ⊢ Z]¹
(Δ ◦× z1) ⊢ Y   [/×E]
Γ ⊢ X\×Y:  ((Δ ◦× z1) ◦× Γ) ⊢ X   [\×E]
((Δ ◦× Γ) ◦× z1) ⊢ X   [RP]
(Δ ◦× Γ) ⊢ X/×Z   [/×I]¹
This proof shows that the rule Y/×Z  X\×Y  ⇒B  X/×Z is valid given the base logic and the structural rule of Right Permutation. The fact that such rules can be shown to be theorems of the base logic plus a set of structural rules provides a crucial connection to CCG and the use of modalities there to control applicability of its finite set of combinatory rules (Baldridge and Kruijff 2003). This issue is discussed further in the next section.

Researchers in CTL consider a much wider range of logical operators than the binary ones considered here. Most commonly used are the unary connectives, which are used for both features and fine-grained structural control. In fact, some eschew the sort of multimodal binary system given above in favor of unary modalities which govern the applicability of structural rules (e.g., Vermaat 2005). Bernardi and Szabolcsi (2008) is a detailed syntactic study of quantifier scope and negative polarity licensing in Hungarian that makes extensive use of unary operators, including Galois connectives, which were introduced by Areces and Bernardi (2004) and developed further by Areces et al. (2004). Bernardi and Moortgat (2010) discuss recent directions in CTL, including Galois connectives and the Lambek-Grishin calculus. Another recent, related line of work is that of pregroup grammars (Casadio and Lambek 2008).
5. Relating type-logical and combinatory approaches

By the late 1960's, interest in the AB calculus had been greatly reduced due to the perception that it was basically context-free grammar in different clothing. This was fair to the extent that it had the same generative capacity as context-free grammar (Bar-Hillel et al. 1964), but it was unfair in that it provided a very different architecture for framing theories of grammar and very different means of extending it. However, the success of Montague grammar in the 1970's and its close connection with rule-based extensions of categorial grammar led to a revival of interest in the syntactic potential of categorial grammars. This in turn spurred a renewed interest in the Lambek calculus in the 1980's, culminating in the type-logical approach introduced in the previous section.
Development in the logical and combinatory traditions proceeded largely independently of one another until the early 2000's. The former was largely concerned with linguistic applicability of different logical operators and proof-theoretic properties of the proposed logical systems. The latter focused more on obtaining linguistically expressive systems with low automata-theoretic power and attractive computational properties. The logically-minded categorial grammarian will complain that the combinatory approaches are incomplete, partial systems of type-based inference (and therefore of less interest). The combinatory-minded one will complain that the logical approaches are impractical for computational applications (and therefore of less interest). Despite these historical differences, it is important to realize that the actual linguistic ramifications of both approaches are largely compatible. Furthermore, we can generate rule-based systems from an underlying logic (Baldridge and Kruijff 2003; Hoyt and Baldridge 2008); from this perspective, we can investigate and develop the underlying logic while enjoying the computational advantages of a finite rule set whose rules ensure that each derivational step leads to a reduction, rather than expansion, of categories. As noted earlier, Jacobson (1992b) used slash-types in a rule-based categorial grammar that made it possible to force some categories to compose but not to apply. This is an interesting alternative that is available to a multimodal rule-based system. However, from the CTL perspective all slashes can be eliminated. To achieve this effect in a rule-based system derived from a CTL system, if desired, would require appropriate use of unary modalities (the lock-and-key strategy of Moortgat 1997). With a CCG rule set derived from a CTL system, we obtain the same basic analyses as we would with the original system. In one sense, the CCG derivations can be seen as abbreviated versions of the corresponding CTL proofs.
Hypothetical reasoning and explicit structural reconfiguration do not play a role: instead, they are folded into the composition and type-raising rules. Of course, since CTL systems are not finitely axiomatizable, they offer derivational possibilities beyond those of the finite CCG rule set. The claims about the grammars (and the possible analyses they support) are for most purposes identical given a CTL system defined with the structural rules suggested in the previous section and the standard CCG rule set. Regardless of the categorial framework, the linguistic work is done in the categories − they are where the bulk of the linguistic claims are made. This is to say that despite many apparent surface differences, logical and rule-based categorial systems are mostly compatible with respect to their treatment of syntactic phenomena. Viewed in this way, CTL provides a microscope that allows us to peer into very low level details concerning properties such as associativity and commutativity; it then tells us how we can prove rule schemata that are part of the CCG rule base, or which have a similar nature to CCG rules. With those rules in hand, we can essentially short-cut many of the logical steps needed to complete certain CTL inferences. This means not only shorter derivations, but also many advantages for practical parsing with categorial grammars.

Generative Capacity: The perceived position of natural language on the Chomsky hierarchy (for those who believe it has such a position) has fluctuated during the past five decades. During the 1960's and 1970's, it was generally accepted by linguists that grammars for natural languages required greater than context-free power. Due to this belief, AB categorial grammar was somewhat sidelined after it was proved to be context-free by Bar-Hillel, Gaifman, and Shamir in 1964 − even Bar-Hillel himself gave up on categorial grammar because of this result. Bar-Hillel should not have despaired
so easily − both the belief that categorial grammar was just a notational variant of context-free grammar and the apparent supposition that the categorial approach could not somehow be generalized to create systems of greater power were incorrect. Context-free grammars are indeed now known to be not even weakly adequate to handle all natural language constructions, such as crossing dependencies in Swiss German (Huybregts 1984; Shieber 1985). The combinatory rules employed by CCG increase the power of AB beyond context-free by a small but linguistically significant amount. This places CCG into the class of mildly context-sensitive formalisms (Vijay-Shanker and Weir 1994), which are able to capture crossing dependencies, both with respect to the string languages that characterize them (weak generative capacity) and the structural descriptions which capture the appropriate dependencies (strong generative capacity). As a mildly context-sensitive formalism, CCG enjoys worst-case polynomial parsing complexity, O(n⁶) (Vijay-Shanker and Weir 1990).
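The chart-based source of this polynomial behavior can be sketched with a toy CKY-style parser (our own illustration with application plus forward composition, not the O(n⁶) CCG algorithm of Vijay-Shanker and Weir):

```python
# CKY-style chart parsing over a categorial lexicon: cell (i, j) holds every
# category derivable for the span i..j, so work is polynomial in sentence length.
# Atomic categories are strings; complex categories are (result, slash, argument).

def combine(x, y):
    """Return all categories derivable from adjacent categories x and y."""
    out = []
    if isinstance(x, tuple) and x[1] == '/' and x[2] == y:     # X/Y  Y  => X
        out.append(x[0])
    if isinstance(y, tuple) and y[1] == '\\' and y[2] == x:    # Y  X\\Y => X
        out.append(y[0])
    if (isinstance(x, tuple) and x[1] == '/' and               # X/Y Y/Z => X/Z
            isinstance(y, tuple) and y[1] == '/' and x[2] == y[0]):
        out.append((x[0], '/', y[2]))
    return out

def parse(cats):
    """Fill a CKY chart over a sequence of lexical categories."""
    n = len(cats)
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for x in chart[(i, k)]:
                    for y in chart[(k, j)]:
                        cell.update(combine(x, y))
            chart[(i, j)] = cell
    return chart[(0, n)]

s_np = ('s', '\\', 'np')
ed, saw, ted = 'np', (s_np, '/', 'np'), 'np'
assert 's' in parse([ed, saw, ted])   # Ed saw Ted derives s
```

Adding more combinatory rules to combine leaves the chart loop, and hence the polynomial shape of the computation, unchanged.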
After a long period of relative dormancy as regards research into generative power, there has been a renewed interest in the strong generative capacity of CCG (and Tree-Adjoining Grammar). The upshot of this work is that the equivalences are not so tidy in this regard (Hockenmaier and Young 2008; Koller and Kuhlmann 2009). Furthermore, even the weak generative capacity can be greatly affected by subtle choices, such as the use of rule restrictions or a particular set of modalities in CCG (Kuhlmann et al. 2010). The issue of generative capacity was a historical sticking point between the CCG and CTL traditions, though it is not a central concern now. Early attempts to extend the Lambek calculus to allow for permutation led to full permutation collapse (Moortgat 1988). CTL, however, does not suffer from such a collapse since commutative operations can be introduced in a controlled fashion with modal control. Nonetheless, generative capacity does still underlie a difference of explanatory philosophy between linguistic applications of the two frameworks. For CTL researchers, issues of generative capacity are generally not considered to be of prime theoretical importance. In contrast, work in both CCG and its closest sibling formalism, Tree-Adjoining Grammar (TAG: Joshi 1988), takes a committed stance on minimizing generative power. The basic linguistic claim for such formalisms is that their restricted formal power provides inherent limitations on theories which are based on them (e.g., see Frank 2002). In this way, they enforce a wide range of formal universals. Such formalisms sit on the lower bound of natural language complexity without venturing any further − they are expressive enough, but cannot do everything. With respect to generative capacity, a key attraction of the multimodal approach is that it is able to mix systems of varying power in a resource-sensitive manner.
Thus, more powerful operations of grammatical combination − should their inclusion be warranted by linguistic evidence − are introduced in a controlled manner.
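The cross-serial pattern that separates context-free from mildly context-sensitive power can be seen in its bare string form. In the following sketch, Swiss German subordinate clauses are abstracted (purely for illustration) to the schema a^m b^n c^m d^n, where the a's and b's stand for NPs and the c's and d's for the verbs they respectively depend on; no context-free grammar generates this language, while mildly context-sensitive formalisms such as CCG and TAG do:

```python
# Illustrative recognizer for the copy-like language {a^m b^n c^m d^n : m, n >= 1},
# the string schema behind Swiss German cross-serial dependencies: the i-th 'a'
# pairs with the i-th 'c', crossing over the b/d pairs.
import re

def is_cross_serial(s):
    m = re.fullmatch(r'(a+)(b+)(c+)(d+)', s)
    return (bool(m)
            and len(m.group(1)) == len(m.group(3))
            and len(m.group(2)) == len(m.group(4)))

assert is_cross_serial('aabccd')      # m = 2, n = 1
assert not is_cross_serial('aabcdd')  # counts do not match crosswise
```

A Turing machine (or the regex-plus-counting check above) recognizes the string language trivially; the linguistic point is that a grammar formalism must be more than context-free to *generate* it while assigning the crossing dependency structure.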
6. Related formalisms Categorial grammars of all kinds share deep connections to many other approaches to natural language grammar. Tree Adjoining Grammar (Joshi 1988) is much like CCG in that it is also highly lexicalized and assumes a small, universal rule component. As mentioned above, it is also mildly context-sensitive, and like CCG it has been the focus of a great deal of computational work. In general, there has been a great deal of intellectual crossover between CCG and TAG. Categorial grammar’s extreme lexicalism ties it closely to the tradition of dependency grammar (Hudson 1984; Sgall et al. 1986), which also focuses on the way in which patterns of semantic linkage hold a sentence together, rather than segmenting sentences according to analytic patterns such as phrase structure. The use of typed-feature structures in CCG and related rule-based CGs (Zeevat et al. 1987; Villavicencio 2002; Baldridge 2002; Beavers 2004; McConville 2006) is informed by much work in Head-driven Phrase Structure Grammar (Pollard and Sag 1994; Sag et al. 2003) and its predecessors. This is especially true with respect to providing a theory of the lexicon. In this exposition, we have provided a flat lexicon where no information is shared between the categories. While useful for explicating how categorial grammar works at the derivational level, it is clearly an unrealistic way to organize what is, after all, the core of grammar. There are a number of solutions to manage redundancy and category relatedness in a lexicon. Using typed-feature structures with inheritance as is common in HPSG, it is possible to define a structured lexicon that eliminates a great deal of redundancy between categories of different arity. For example, the ditransitive category can be based on the transitive category, which in turn can be based on the intransitive category, and so on (Villavicencio 2002; Baldridge 2002; McConville 2006). 
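The inheritance idea just described − the ditransitive category built on the transitive, which is built on the intransitive − can be sketched with ordinary class inheritance. The class names and string encoding below are our own illustrative invention, not the typed-feature-structure machinery of the cited systems:

```python
# Illustrative sketch of a structured categorial lexicon: each verb class
# inherits its parent's argument list and adds one argument, so the category
# string is derived rather than stipulated separately for each arity.
class Intransitive:
    args = []                       # no complements beyond the subject
    def category(self):
        cat = 's\\np'               # subject is shared by all verb classes
        for a in self.args:         # each inherited/added complement wraps the category
            cat = f'({cat})/{a}'
        return cat

class Transitive(Intransitive):
    args = Intransitive.args + ['np']

class Ditransitive(Transitive):
    args = Transitive.args + ['np']

assert Intransitive().category() == 's\\np'
assert Transitive().category() == '(s\\np)/np'
assert Ditransitive().category() == '((s\\np)/np)/np'
```

The redundancy saving is the point: information stated once on a supertype (here, the subject-seeking core s\np) is shared by every category beneath it, exactly the role type inheritance plays in the HPSG-style lexicons cited above.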
Another related strategy is to use lexical rules that produce additional categories based on core lexical category assignments (Carpenter 1992); such rules are similar to those used in Generalized Phrase Structure Grammar (Gazdar et al. 1985) and Lexical Functional Grammar (Kaplan and Bresnan 1982). There are other strong connections with HPSG and LFG. HPSG analyses typically rely on a very abstract set of phrase-structure rules that are similar to the rule schemata used in CCG. As with categorial grammar, HPSG lexical items trigger derivational steps via their subcategorization requirements. Interestingly, Morrill (2007) comments that HPSG is in fact a variety of categorial grammar: this is hardly a stretch given Pollard’s type-logical formulation of HPSG (Pollard 2004). His recent Convergent Grammar (Pollard 2008) synthesizes ideas from both categorial grammar and HPSG, and is closely related to Abstract Categorial Grammars (de Groote 2001) and λ-Grammar (Muskens 2007). Many researchers working in constraint-based approaches, especially HPSG, have adopted construction grammar (Goldberg 2006) as an organizing philosophy. It might seem that construction grammar, with its emphasis on conventionalized and non-compositional grammatical processes, would be incompatible with categorial grammar. However, there seems to be every reason to use categorial grammar as a formalism for analyses compatible with construction grammar. As an example in this direction, Steedman and Baldridge (2011) discuss the “way” construction as in Marcel slept his way to the top. They suggest that one of the categories for his to license this construction would essentially incorporate way as an item that is subcategorized for.
(74) his ⊢ ((s\np)\LEX(s\np))/pp/np_way

This strategy, which is related to the use of trees with multiple lexical anchors in TAG, can be used to lexicalize many other phenomena identified by construction grammarians. If pursued seriously, it would undoubtedly involve an examination of the theory of the categorial lexicon and ways of managing the consequent proliferation of category types. Arguably, this is where the focus of attention of linguistic work in categorial grammar should be anyway. Johnson (1999) provides a resource-sensitive interpretation of LFG that builds on many ideas from CTL. See Muskens (2001) for further discussion of the relationship between categorial grammar and LFG and a proposal for providing a hybrid of the two approaches, building on ideas from Oehrle (1999). The basic architectures of theories of grammar based on both CTL and CCG (mostly unchanged since the late 1980s) are largely in accord with many of the principles later advocated in some versions of the Minimalist program (Chomsky 1995). A deeper and perhaps surprising (to some) connection appears when Minimalism is viewed through the lens of Minimalist grammars (Stabler 1997; Cornell 1999). As noted earlier, hypothetical reasoning in CTL combined with structural reasoning has many parallels with movement operations as construed in Minimalism. For several perspectives on the relation between CTL and Minimalism, see the collection of papers in the journal Research on Language and Computation, 2004, Volume 2(1), particularly Retoré and Stabler (2004), Lecomte (2004), Vermaat (2004) and Cornell (2004).
7. Computational applications Along with much progress in formalizing varieties of categorial grammar with desirable linguistic properties, there has also been considerable development in computational applications of categorial grammar. The success of categorial grammar in these arenas is in great part based on its high degree of lexicalization and its semantic transparency. Like many other computationally amenable frameworks, there are grammar development environments available for testing analyses. The Grail system allows CTL structural rule packages and lexicons to be defined and tested (Moot 2002). The OpenCCG system similarly supports (multi-modal) CCG grammar development and performs both sentence parsing and realization; it has also been used for a wide range of dialog systems − see Baldridge et al. (2007) for discussion of OpenCCG grammar development and applications and White (2006) for specific discussion of efficient realization with OpenCCG. A major development was the creation of CCGbank (Hockenmaier and Steedman 2007), which has allowed the creation of fast and accurate statistical CCG parsers for producing deep dependencies (Hockenmaier 2003; Clark and Curran 2007). A key feature of categorial grammar, the fact that lexical categories are highly informative for overall syntactic analysis, is used in the C&C CCG parser of Clark and Curran (2007) to make it among the fastest of wide-coverage statistical parsers that produce deep dependencies. A fast supertagger is used to perform assignment of lexical categories before
parsing begins, thereby drastically reducing the structural ambiguity that must be considered during parsing. This adaptive supertagging strategy is exploited by Kummerfeld et al. (2010) in combination with self-training to achieve fast parsing times with no loss in accuracy. In a related study, Auli and Lopez (2011b) combine adaptive supertagging with A* search, both to examine the trade-offs introduced by supertag cutoffs and to obtain faster parsing times. In another paper, the same authors integrate the features of the supertagger and the parser into a single model (Auli and Lopez 2011a). The best performance to date on CCG parsing for CCGbank is obtained by this model optimized for F-measure (Auli and Lopez 2011c). CCGbank has provided a basis for applications other than parsing. For example, supertaggers learned from CCGbank have been used to improve statistical machine translation systems (Birch et al. 2007; Hassan et al. 2007), and discriminative models over CCG derivations have been used for word ordering (generating a sentence from a multiset of input words: Zhang et al. 2012). Analyses from parsers have been used for semantic role labeling (Gildea and Hockenmaier 2003; Boxwell et al. 2011). OpenCCG grammars that support wide-coverage sentence realization have been bootstrapped from CCGbank, reducing the effort that goes into creating larger grammars while taking advantage of the deep representations supported by OpenCCG (Espinosa et al. 2008). A number of augmentations of CCGbank have been created, such as improving the representation of noun phrases (Vadas and Curran 2008), fully lexicalizing type-changing rules (using hat categories: Honnibal and Curran 2009), and adding verb-particle constructions (Constable and Curran 2009).
Many of these augmentations were integrated in the rebanking of CCGbank completed by Honnibal et al. (2010). Most recently, the resources − both data and processing tools − that have been built up around CCGbank are being used to bootstrap broader and deeper annotations for the Groningen Meaning Bank, using an ongoing, collaborative, semi-automated annotation environment (Basile et al. 2012). Unlike CCG, CTL has seen little use in computational applications. This is in large part due to significant challenges in efficiently dealing with the options made available by using the full logic, which typically allows many more ways to bracket a string than CCG's (structurally incomplete) finite rule set permits (for recent work on parsing restricted CTL systems, see Capelleti 2007 and Fowler 2008). Interestingly, Capelleti also considers parsing a variant of CCG with the product operator. This sort of strategy seems to be the most expedient way to efficiently parse CTL systems: basically, one could compile CCG-like rules on an as-needed basis and then use standard parsing algorithms that have been successfully used with CCG. Of course, the work mentioned above assumes that we have defined a grammar manually, either explicitly in a grammar development environment or implicitly in the derivations of sentences in a corpus. It is naturally an interesting question whether we can learn categorial grammars from less informative starting points. Villavicencio (2002) learns lexicons for a rule-based CG given child-directed speech annotated with logical forms. There are recent efforts in this direction that induce CCG parsers from sentences paired with logical forms (Zettlemoyer and Collins 2007; Kwiatkowski et al. 2011). There, a small set of category schemata and paired, abstract logical forms are assumed, and the mapping from words to the appropriate categories and lexical semantics is then
learned. Other work has considered how to extend an initial seed lexicon using grammar-informed Hidden Markov Models (Baldridge 2008; Ravi et al. 2010) and inference from a partially completed parse chart (Thomforde and Steedman 2011).
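The adaptive supertagging strategy discussed in this section can be sketched as follows. The probabilities, beam values, and function names here are invented for illustration and do not reflect the C&C parser's actual implementation; the idea is only that each word's category distribution is pruned to a beam before parsing, with the beam widened if parsing fails:

```python
# Toy sketch of adaptive supertagging: keep only lexical categories whose
# probability is within a factor beta of that word's best category, and
# back off to a wider beam (smaller beta) only when parsing fails.
def supertag(tag_probs, beta):
    """Prune each word's category distribution to the beta-beam."""
    pruned = []
    for probs in tag_probs:
        best = max(probs.values())
        pruned.append({c for c, p in probs.items() if p >= beta * best})
    return pruned

def parse_with_backoff(tag_probs, try_parse, betas=(0.1, 0.01, 0.001)):
    """Start with an aggressive beam; retry with wider beams on failure."""
    for beta in betas:
        result = try_parse(supertag(tag_probs, beta))
        if result is not None:
            return result
    return None        # no beam level yielded a parse

# Toy category distributions for "Marcel proved it":
tag_probs = [
    {'np': 0.9, 'n': 0.1},
    {'(s\\np)/np': 0.7, 's\\np': 0.3},
    {'np': 0.95, 'n': 0.05},
]
cats = supertag(tag_probs, 0.5)
assert cats == [{'np'}, {'(s\\np)/np'}, {'np'}]   # one category per word survives
```

The speed-up comes from the first line of `supertag`: with only one or two categories per word, the chart the parser must fill is drastically smaller, and the backoff loop restores coverage for the rare sentences the tight beam cannot analyze.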
8. Conclusion Categorial grammar has a long, but punctuated, history that has coalesced in the last thirty years to provide unique perspectives on natural language grammar. There are connections not only to combinatory logic and resource-sensitive linear logic, but also to category theory more generally. The controlled associativity and permutativity available to modern categorial grammars allow them to enjoy straightforward analyses for a number of otherwise troubling issues in constituency without being overly permissive. Work in Categorial Type Logics continues to explore new type constructors that may have linguistic application and to investigate their logical/mathematical properties. Work in Combinatory Categorial Grammar remains focused on practical applications and grammar acquisition, both from annotated resources such as CCGbank and from text alone using machine learning methods. Given the connections between the two traditions, briefly sketched out here, it is now easier to translate innovations from one to the other. With the current set of formal and computational advances and their application to language phenomena, categorial grammar is well placed to further deepen our understanding of natural language grammar, both through standard linguistic investigation into specific languages and constructions and as the basis for acquisition of grammars using machine learning methods on text or in communicative agents such as chatbots and robots.
9. Further reading There is an incredibly diverse range of systems based on or related to categorial grammar, and a bewildering (and almost equally diverse) set of notations for those systems. Due to space limitations, much of this work has been only briefly touched on; the interested reader can remedy some of these gaps by looking further into some of the following pointers. See Steedman's article on categorial grammar in the previous edition of this handbook for pointers to further reading prior to 1993. That article covers many issues and topics that the present one does not, such as more detailed consideration of combinators, the relation of categorial grammar to linear-indexed grammars, treatments of quantification, and responses to common criticisms of categorial grammar at the time. Published around the same time, Wood (1993) provides a very complete and balanced introduction to many ideas, analyses and issues in categorial grammar. The papers collected in Kruijff and Oehrle (2003) are recent contributions within both the combinatory and type-logical traditions. Morrill (2007) gives a literature survey covering major developments in categorial grammar from 1935 to 1994, with a particular emphasis on type-logical work. Morrill's most recent book, Categorial Grammar: Logical Syntax, Semantics, and Processing (Morrill 2011), is an up-to-date introduction and extended exposition on the
logical tradition in categorial grammar. In particular, it develops discontinuous Lambek grammars, an extension of type-logical grammars that supports a new analysis of gapping, following Hendriks (1995), and it covers processing and parsing from the type-logical perspective, building on ideas in Morrill (2000). Some earlier books covering the type-logical approach are Moortgat (1988), Morrill (1994), and Carpenter (1998). A notable aspect of Morrill's book is the thorough discussion of Montague grammar and how it can be formulated within a modern categorial grammar framework. There are also several notable article-length discussions of the type-logical approach. Moortgat's (2010) chapter in the Handbook of Logic and Language provides an up-to-date and definitive overview of CTL (updating the previous version, Moortgat 1997, in the first edition of that handbook). Oehrle's article in Borsley and Börjars (2011) provides a particularly readable introduction to CTL. It is worth noting that much of the literature on CTL is highly technical and difficult going for newcomers. In this regard, Barker (2003) is an excellent and friendly source for understanding many fundamental concepts in CTL, and Vermaat (2005) gives one of the most accessible introductions to CTL as a whole. A fairly recent development in the space of type-logical approaches is Lambek's pregroup grammar (Lambek 1999), a type-based algebraic approach that seeks to use standard mathematical structures and concepts for modeling natural language grammar. See Lambek (2000, 2007, 2010) and the collection of papers in Casadio and Lambek (2008) for articles on the formal and linguistic properties of pregroup grammars. Pregroup grammars have also been used to provide a solution to a recent problem in natural language semantics, in which distributional models of lexical semantics − based on vector spaces induced from large corpora − have been combined with traditional compositional methods. Coecke et al.
(2011) show how a type-driven semantic compositional model can be developed in the vector-space setting, in which the syntactic types of the pregroups correspond to tensor product spaces. Vectors for relational words, such as verbs and adjectives, live in the appropriate tensor product space. The framework also provides a semantic operation analogous to function application in which a verb vector, for example, can be "applied" to its arguments, resulting in a vector living in a space for sentences. The ultimate goal of such efforts is to devise systems that are able to assign detailed (possibly hierarchical), data-driven meaning representations to sentences. In these representations, the meaning of life is not (the constant) life′: instead, words and phrases have rich internal representations that allow one to calculate their similarity with other words and phrases. Such similarity comparisons can then be used for other computations such as making inferences based on the substitutability of different predications with respect to one another. See Erk (2010) for an overview of much current work in this vein. Steedman's Surface Structure and Interpretation monograph provides a detailed account of core theoretical linguistic concerns from the perspective of Combinatory Categorial Grammar, with a special emphasis on extraction asymmetries, coordination, and their interaction with binding (Steedman 1996a). His later book The Syntactic Process (Steedman 2000b) is a thorough exposition on both formal and theoretical aspects of Combinatory Categorial Grammar that encapsulates the content of many of the papers written on CCG through the 1980s and 1990s. The article by Steedman and Baldridge in Borsley and Börjars (2011) gives a current and detailed account of CCG and analyses of a wide range of bounded and unbounded language phenomena. Finally, Steedman's
latest book, Taking Scope (Steedman 2012), develops a surface-compositional account of quantification using CCG. In doing so, it covers a great deal of linguistic ground and touches on both computational and human sentence processing.
10. References Ades, Anthony and Mark Steedman 1982 On the order of words. Linguistics & Philosophy 7: 639–642. Ajdukiewicz, Kazimierz 1935 Die syntaktische Konnexität. In: Storrs McCall (ed.), Polish Logic 1920–1939, 207– 231. Oxford: Oxford University Press. translated from Studia Philosophica 1: 1–27. Aoun, Joseph and Elabbas Benmamoun 1998 Minimality, reconstruction, and pf movement. Linguistic Inquiry 29(4): 59–597. Aoun, Joseph and Lena Choueiri 2000 Epithets. Natural Language and Linguistic Theory 18: 1–39. Aoun, Joseph, Choueiri, Lina, and Norbert Hornstein 2001 Resumption, movement, and derivational economy. Linguistic Inquiry 32(3): 371–403. Aoun, Joseph and Audrey Li 2003 Essays on the Representational and Derivational Nature of Grammar. Cambridge, Massachusetts: MIT Press. Areces, C. and R. Bernardi 2004 Analyzing the core of categorial grammar. Journal of Logic, Language and Information 13(2): 121–137. Areces, C., Bernardi, R., and M. Moortgat 2004 Galois connections in categorial type logic. In: R. Oehrle and L. Moss (eds.), Electronic Notes in Theoretical Computer Science. Proceedings of FGMOL’01, 1–12. Elsevier Science B.V. volume 53. Asudeh, Ash 2005 Relational nouns, pronouns, and resumption. Linguistics and Philosophy 28: 375–446. Asudeh, Ash 2012 The Logic of Pronominal Resumption. Oxford: Oxford University Press. Auli, Michael and Adam Lopez 2011a A comparison of loopy belief propagation and dual decomposition for integrated CCG supertagging and parsing. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 470–480. Portland, Oregon, USA: Association for Computational Linguistics. Auli, Michael and Adam Lopez 2011b Efficient CCG parsing: A* versus adaptive supertagging. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 1577–1585. Portland, Oregon, USA: Association for Computational Linguistics. 
Auli, Michael and Adam Lopez 2011c Training a log-linear parser with loss functions via softmax-margin. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 333–343. Edinburgh, Scotland, UK: Association for Computational Linguistics. Bach, Emmon 1979 Control in montague grammar. Linguistic Inquiry 10: 513–531. Baldridge, Jason 2002 Lexically specified derivational control in Combinatory Categorial Grammar. Ph.D. thesis. University of Edinburgh.
Baldridge, Jason 2008 Weakly supervised supertagging with grammar-informed initialization. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), 57– 64. Manchester, UK: Coling 2008 Organizing Committee. Baldridge, Jason, Chatterjee, Sudipta, Palmer, Alexis, and Ben Wing 2007 DotCCG and VisCCG: Wiki and programming paradigms for improved grammar engineering with openccg. In: Proceedings of the Workshop on Grammar Engineering Across Frameworks, 5–25. Stanford, CA: CSLI Publications. Baldridge, Jason and Geert-Jan Kruijff 2002 Coupling CCG and hybrid logic dependency semantics. In: Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, 319–326. Philadelphia, Pennsylvania. Baldridge, Jason and Geert-Jan Kruijff 2003 Multi-modal combinatory categorial grammar. In: Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, 211–218. Budapest, Hungary. volume 1. Bar-Hillel, Yehoshua 1953 A quasi-arithmetical notation for syntactic description. Language 29: 47–58. Bar-Hillel, Yehoshua, Gaifman, Chaim, and Eliyahu Shamir 1964 On categorial and phrase structure grammars. In: Yehoshua Bar-Hillel (ed.), Language and Information, 99–115. Reading MA: Addison-Wesley. Barker, Chris 2003 A gentle introduction to type logical grammar, the curry-howard correspondence, and cut-elimination. semanticsarchive.net. Basile, Valerio, Bos, Johan, Evang, Kilian, and Noortje Venhuizen 2012 A platform for collaborative semantic annotation. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 92– 96. Avignon, France. Beavers, John 2004 Type-inheritance combinatory categorial grammar. In: Proceedings of COLING-04, Geneva, Switzerland. Bernardi, Raffaella and Michael Moortgat 2010 Continuation semantics for the lambek-grishin calculus. Information and Computation 208(5): 397–416. 
Bernardi, Raffaella and Anna Szabolcsi 2008 Optionality, scope, and licensing: An application of partially ordered categories. Journal of Logic, Language and Information 17(3): 237–283. Birch, Alexandra, Osborne, Miles, and Philipp Koehn 2007 CCG supertags in factored translation models. In: Proceedings of the Second Workshop on Statistical Machine Translation, 9–16. Prague. Borsley, Robert and Kersti Börjars (eds.) 2011 Non-Transformational Syntax: Formal and Explicit Models of Grammar. New York: Wiley-Blackwell. Boxwell, Stephen, Brew, Chris, Baldridge, Jason, Mehay, Dennis, and Sujith Ravi 2011 Semantic role labeling without treebanks? In: Proceedings of 5th International Joint Conference on Natural Language Processing, 192–200. Chiang Mai, Thailand: Asian Federation of Natural Language Processing. Capelleti, Matteo 2007 Parsing with structure-preserving categorial grammars. Ph.D. thesis. Utrecht. Carpenter, Bob 1992 Lexical and unary rules in categorial grammar. In: Robert Levine (ed.), Formal Grammar: Theory and Implementation, 168–242. Oxford: Oxford University Press. volume 2 of Vancouver Studies in Cognitive Science.
Carpenter, Bob 1995 The Turing-completeness of multimodal categorial grammar. ms, http://www.illc.uva.nl/j50/contribs/carpenter/index.html. Carpenter, Bob 1998 Type-Logical Semantics. Cambridge, Massachusetts: MIT Press. Casadio, Claudia 1988 Semantic categories and the development of categorial grammars. In: Richard T. Oehrle, Emmon Bach, and Deirdre Wheeler (eds.), Categorial Grammars and Natural Language Structures, 95–123. Dordrecht: Reidel. Proceedings of the Conference on Categorial Grammar, Tucson, AR, June 1985. Casadio, Claudia and Joachim Lambek (eds.) 2008 Computational Algebraic Approaches to Natural Language. Monza: Polimetrica. Chomsky, Noam 1995 The Minimalist Program. Cambridge, Massachusetts: MIT Press. Choueiri, Lena 2002 Issues in the syntax of resumption: Restrictive relatives in Lebanese Arabic. Ph.D. thesis. University of Southern California. Clark, Stephen and James Curran 2007 Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33(4): 493−552. Coecke, Bob, Sadrzadeh, Mehrnoosh, and Stephen Clark 2011 Mathematical foundations for a compositional distributional model of meaning. Linguistic Analysis 36: A Festschrift for Joachim (Jim) Lambek: 345–384. Constable, James and James Curran 2009 Integrating verb-particle constructions into CCG parsing. In: Proceedings of the Australasian Language Technology Association Workshop 2009, 114–118. Sydney, Australia. Copestake, Ann, Lascarides, Alex, and Dan Flickinger 2001 An algebra for semantic construction in constraint-based grammars. In: Proceedings of the 39th Annual Meeting of the Association of Computational Linguistics, 132–139. Toulouse, France. Cornell, Thomas 1999 Representational minimalism. In: Hans-Peter Kolb and Uwe Mönnich (eds.), The Mathematics of Syntactic Structure: Trees and Their Logics, 301–340. Berlin: Walter de Gruyter. Cornell, Thomas 2004 Lambek calculus for transformational grammar.
Research on Language and Computation 2(1): 105–126. Curry, Haskell B. and Robert Feys 1958 Combinatory Logic: Vol I. Amsterdam: North Holland. Demirdache, Hamida 1991 Resumptive chains in restrictive relatives, appositives and dislocation structures. Ph.D. thesis. MIT. Demirdache, Hamida 1997 Dislocation, resumption, and weakest crossover. In: Elena Anagnostopoulou, Henk Van Riemsdijk, and Frans Zwarts (eds.), Materials on Left-Dislocation, 193–231. Philadelphia: John Benjamins. Dowty, David 1982 Grammatical relations and montague grammar. In: P. Jacobson and G. K. Pullum (eds.), The Nature of Syntactic Representation, 79–130. Dordrecht: Reidel. Dowty, David 1988 Type-raising, functional composition, and nonconstituent coordination. In: Richard T. Oehrle, Emmon Bach, and Deirdre Wheeler (eds.), Categorial Grammars and Natural
Language Structures, 153–198. Dordrecht: Reidel. Proceedings of the Conference on Categorial Grammar, Tucson, AR, June 1985. Dowty, David 1997 Non-constituent coordination, wrapping, and multimodal categorial grammars. In: M. L. Dalla Chiara et al. (ed.), Structures and Norms in Science, 347–368. Dordrecht: Kluwer. Erk, Katrin 2010 What is word meaning, really? (and how can distributional models help us describe it?). In: Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics, 17–26. Uppsala, Sweden: Association for Computational Linguistics. Espinosa, Dominic, White, Michael, and Dennis Mehay 2008 Hypertagging: Supertagging for surface realization with CCG. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), 183–191. Columbus, OH. Fowler, Timothy A. D. 2008 Efficiently parsing with the product-free lambek calculus. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), 217–224. Manchester, UK: Coling 2008 Organizing Committee. Frank, Robert 2002 Phrase Structure Composition and Syntactic Dependencies. Cambridge, Massachusetts: MIT Press. Gazdar, Gerald, Klein, Ewan, Pullum, Geoffrey K., and Ivan A. Sag 1985 Generalised Phrase Structure Grammar. Oxford: Blackwell. Gildea, Daniel and Julia Hockenmaier 2003 Identifying semantic roles using combinatory categorial grammar. In: Michael Collins and Mark Steedman (eds.), Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 57–64. Girard, Jean-Yves 1987 Linear logic. Theoretical Computer Science 50: 1–102. Goldberg, Adele 2006 Constructions at Work. Oxford: Oxford University Press. de Groote, Philippe 2001 Towards abstract categorial grammars. In: Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, 252–259. Toulouse, France: Association for Computational Linguistics. 
Hassan, Hany, Sima’an, Khalil, and Andy Way 2007 Supertagged phrase-based statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 288–295. Prague. Hendriks, Petra 1995 Comparatives and categorial grammar. Ph.D. thesis. Rijksuniversiteit Groningen. Hepple, Mark 1995 Hybrid categorial logics. Bulletin of the IGPL 3(2,3): 343–356. Special Issue on Deduction and Language. Hockenmaier, Julia 2003 Parsing with generative models of predicate-argument structure. In: Proceedings of the 41st Meeting of the ACL, 359–366. Hockenmaier, Julia and Mark Steedman 2007 CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics 33(3): 355–396. Hockenmaier, Julia and Peter Young 2008 Non-local scrambling: the equivalence of TAG and CCG revisited. In: Proceedings of The Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9), 41–48. Tübingen, Germany.
Hoffman, Beryl 1995 Computational analysis of the syntax and interpretation of ‘free’ word-order in Turkish. Ph.D. thesis. University of Pennsylvania. IRCS Report 95–17. Honnibal, Matthew 2009 Hat categories: Representing form and function simultaneously in combinatory categorial grammar. Ph.D. thesis. University of Sydney. Honnibal, Matthew and James R. Curran 2009 Fully lexicalising CCGbank with hat categories. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 1212–1221. Singapore: Association for Computational Linguistics. Honnibal, Matthew, Curran, James R., and Johan Bos 2010 Rebanking CCGbank for improved NP interpretation. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 207–215. Uppsala, Sweden: Association for Computational Linguistics. Hoyt, Frederick and Jason Baldridge 2008 A logical basis for the D combinator and normal form constraints in combinatory categorial grammar. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 326–334. Columbus, OH. Hoyt, Frederick M. 2010 Negative concord in Levantine Arabic. Ph.D. thesis. University of Texas at Austin. Hudson, Richard 1984 Word Grammar. Oxford: Blackwell. Huybregts, Riny 1984 The weak inadequacy of context-free phrase-structure grammars. In: Ger de Haan, Mieke Trommelen, and Wim Zonneveld (eds.), Van Periferie naar Kern, 81–99. Dordrecht: Foris. Jacobson, Pauline 1990 Raising as function composition. Linguistics and Philosophy 13: 423–475. Jacobson, Pauline 1992a Antecedent-contained deletion in variable-free semantics. In: Chris Barker and David Dowty (eds.), Proceedings of the Second Conference on Semantics and Linguistic Theory, 193–213. OSU Working Papers in Linguistics. Ohio State University. Jacobson, Pauline 1992b Flexible categorial grammars: Questions and prospects. In: Robert Levine (ed.), Formal Grammar, 129–167. Oxford: Oxford University Press.
Jacobson, Pauline 1993 i-within-i effects in a variable-free semantics and a categorial syntax. In: Paul Dekker and Martin Stokhof (eds.), Proceedings of the Ninth Amsterdam Colloquium, 349–369. Jacobson, Pauline 1999 Towards a variable-free semantics. Linguistics and Philosophy 22: 117–184. Jacobson, Pauline 2000 Paycheck pronouns, bach-peters sentences, and variable-free semantics. Natural Language Semantics 8: 77–155. Jacobson, Pauline 2002 Direct compositionality and variable-free semantics: The case of binding into heads. In: Brendan Jackson (ed.), Proceedings of the Twelfth Conference on Semantics and Linguistic Theory (SALT XII), 144–163. Jacobson, Pauline 2003 Binding without pronouns (and pronouns without binding). In: Geert-Jan M. Kruijff and Richard T. Oehrle (eds.), Resource-Sensitivity, Binding, and Anaphora, 57–96. Dordrecht, Boston and London: Kluwer.
1084
IV. Syntactic Models
Johnson, Mark 1999 A resource sensitive interpretation of lexical functional grammar. Journal of Logic, Language and Information 8(1): 45–81. Joshi, Aravind 1988 Tree Adjoining Grammars. In: David Dowty, Lauri Karttunen, and Arnold Zwicky (eds.), Natural Language Parsing, 206–250. Cambridge: Cambridge University Press. Kaplan, Ronald and Joan Bresnan 1982 Lexical-Functional Grammar: A formal system for grammatical representation. In: The Mental Representation of Grammatical Relations, 173–281. Cambridge, Massachusetts: MIT Press. Koller, Alexander and Marco Kuhlmann 2009 Dependency trees and the strong generative capacity of CCG. In: Proceedings of the 12th Conference of the European Chapter of the ACL, 460–468. Athens. Kruijff, Geert-Jan M and Richard T. Oehrle (eds.) 2003 Resource Sensitivity, Binding and Anaphora. Dordrecht, Boston and London: Kluwer. Kuhlmann, Marco, Koller, Alexander, and Giorgio Satta 2010 The importance of rule restrictions in CCG. In: Proceedings of the 48th ACL, 534– 543. Uppsala. Kummerfeld, Jonathan K., Roesner, Jessika, Dawborn, Tim, Haggerty, James, Curran, James R., and Stephen Clark 2010 Faster parsing by supertagger adaptation. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 345–355. Uppsala, Sweden: Association for Computational Linguistics. Kwiatkowski, Tom, Zettlemoyer, Luke, Goldwater, Sharon, and Mark Steedman 2011 Lexical generalization in CCG grammar induction for semantic parsing. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1512– 1523. Edinburgh, Scotland, UK: Association for Computational Linguistics. Lambek, Joachim 1958 The mathematics of sentence structure. American Mathematical Monthly 65: 154–169. Lambek, Joachim 1999 Type grammar revisited. In: Logical Aspects of Computational Linguistics, 1–27. Spring. volume 1582 of Lecture Notes in Computer Science. Lambek, Joachim 2000 Type grammar meets german word order. 
Theoretical Linguistics (26): 19–30. Lambek, Joachim 2007 From word to sentence: a pregroup analysis of the object pronoun who(m). Journal of Logic, Language and Information 16: 303–323. Lambek, Joachim 2010 Exploring feature agreement in French with parallel pregroup computations. Journal of Logic, Language and Information 19(1): 75–88. Lecomte, Alain 2004 Rebuilding MP on a logical ground. Research on Language and Computation 2(1): 27–55. McConville, Mark 2006 Inheritance and the CCG lexicon. In: Proceedings of the European Association for Computational Linguistics, 1–8. Trento. Moortgat, M. and R. Oehrle 1994 Adjacency, dependency, and order. In: P. Dekker and M. Stokhof (eds.), Proceedings of the Ninth Amsterdam Colloquium, 447–466. Amsterdam: ILLC. Moortgat, Michael 1988 Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Dordrecht, The Netherlands: Foris.
31. Categorial Grammar
1085
Moortgat, Michael 1997 Categorial type logics. In: Johan van Benthem and Alice ter Meulen (eds.), Handbook of Logic and Language, 93–178. Amsterdam, New York etc.: Elsevier Science B.V. Moortgat, Michael 1999 Constants of grammatical reasoning. In: Gosse Bouma, Erhard W. Hinrichs, Geert-Jan M. Kruijff, and Richard T. Oehrle (eds.), Constraints and Resources in Natural Language Syntax and Semantics, 195–219. Stanford, CA: CSLI Publications. Moortgat, Michael 2010 Categorial type logics. In: Johan van Benthem and Alice ter Meulen (eds.), Handbook of Logic and Language, 95−180. Amsterdam, New York etc.: Elsevier Science B.V. Second edition. Moot, Richard 2002 Proof nets for linguistic analysis. Ph.D. thesis. UiL OTS, University of Utrecht. Morrill, Glyn 2000 Incremental processing and acceptability. Computational Linguistics 26: 319–338. Morrill, Glyn 2007 A chronicle of type logical grammar: 1935–1994. Research on Language and Computation 5(3): 359–386. Morrill, Glyn 2011 Categorial Grammar: Logical Syntax, Semantics, and Processing. Oxford: Oxford University Press. Morrill, Glyn V. 1994 Type Logical Grammar: Categorial Logic of Signs. Dordrecht, Boston, London: Kluwer Academic Publishers. Muskens, Reinhard 2001 Categorial grammar and lexical-functional grammar. In: Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG01 Conference, 259–279. Stanford, CA: CSLI Publications. Muskens, Reinhard 2007 Separating syntax and combinatorics in categorial grammar. Research on Language and Computation 5: 267–285. Oehrle, Richard T. 1999 LFG as labeled deduction. In: M. Dalrymple (ed.), Semantics and Syntax in Lexical Functional Grammar, 319–357. Cambridge, Massachusetts: MIT Press. Oehrle, Richard T. 2011 Multi-modal type-logical grammar. In: Robert Borsley and Kersti Börjars (eds.), NonTransformational Syntax: Formal and Explicit Models of Grammar, 225–267. New York: Blackwell. Ouhalla, Jamal 2001 Parasitic gaps and resumptive pronouns. 
In: Peter Cullicover and Paul Postal (eds.), Parasitic Gaps, 147–180. Cambridge: MIT Press. Pollard, Carl 2004 Type-logical HPSG. In: G. Jaeger, P. Monachesi, G. Penn, and S. Wintner (eds.), Proceedings of Formal Grammar 2004 (Nancy), 107–124. Pollard, Carl 2008 Hyperintensional questions. In: W. Hodges and R. de Queiroz (eds.), Proceedings of the 15th Workshop on Logic, Language, Information, and Computation (WoLLIC ’08), 261– 274. volume 5110 of Springer Lecture Notes in Artificial Intelligence. Pollard, Carl and Ivan Sag 1994 Head Driven Phrase Structure Grammar. Chicago: CSLI/Chicago University Press. Ravi, Sujith, Baldridge, Jason, and Kevin Knight 2010 Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons. In: Proceedings of the 48th Annual Meeting of the Association for
1086
IV. Syntactic Models
Computational Linguistics, 495–503. Uppsala, Sweden: Association for Computational Linguistics. Retoré, Christian and Edward Stabler 2004 Generative grammars in resource logics. Research on Language and Computation 2(1): 3–25. Ross, John Robert 1967 Constraints on variables in syntax. Ph.D. thesis. MIT. Published as “Infinite Syntax!”, Ablex, Norton, NJ. 1986. Sag, Ivan A., Wasow, Tom A., and Emily Bender 2003 Syntactic Theory: A Formal Introduction. Stanford, California: Center for the Study of Language and Information. second edition. Schönfinkel, Moses 1924 Über die Bausteine der mathematischen Logik. Mathematische Annalen 92: 305–316. Sgall, Petr, Hajicˇova´, Eva, and Jarmila Panevova´ 1986 The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht, Boston, London: D. Reidel Publishing Company. Shieber, Stuart 1985 Evidence against the context-freeness of natural language. Linguistics and Philosophy 8: 333–343. Stabler, Edward 1997 Derivational minimalism. In: Christian Retoré (ed.), Logical Aspects of Computational Linguistics, 68–95. Berlin and Heidelberg: Springer. Steedman, Mark 1985 Dependency and coordination in the grammar of Dutch and English. Language 61: 523–568. Steedman, Mark 1996a Surface Structure and Interpretation. Cambridge, Massachusetts: MIT Press. Linguistic Inquiry Monograph, 30. Steedman, Mark 1996b Surface Structure and Interpretation. Cambridge, Massachusetts: MIT Press. Steedman, Mark 2000a Implications of binding for lexicalized grammars. In: Anne Abeill´e and Owen Rambow (eds.), Tree Adjoining Grammars: Formalisms, Linguistic Analysis, and Processing, 283–301. Stanford: CSLI. Steedman, Mark 2000b The Syntactic Process. Cambridge, Massachusetts: The MIT Press. Steedman, Mark 2012 Taking Scope. Cambridge, Massachusetts: MIT Press/Bradford Books. Steedman, Mark and Jason Baldridge 2011 Combinatory categorial grammar. 
In: Robert Borsley and Kersti Börjars (eds.), NonTransformational Syntax: Formal and Explicit Models of Grammar, 181–224. New York: Blackwell. Szabolcsi, Anna 1987 Bound variables in syntax: Are there any? In: Stokhof Gronendijk and Veltman (eds.), Proceedings of the Sixth Amsterdam Colloquium, 331–351. Institute for Language, Logic, and Information. Amsterdam. Szabolcsi, Anna 1992 On combinatory grammar and projection from the lexicon. In: Ivan Sag and Anna Szabolcsi (eds.), Lexical Matters, 241–268. Stanford, CA: CSLI Publications. Szabolcsi, Anna 2003 Binding on the fly: Cross-sentential anaphora in variable-free semantics. In: Richard Oehrle and Geert-Jan Kruijff (eds.), Resource Sensitivity, Binding, and Anaphora, 215– 229. Dordrecht, Boston and London: Kluwer.
31. Categorial Grammar
1087
Thomforde, Emily and Mark Steedman 2011 Semi-supervised CCG lexicon extension. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1246–1256. Edinburgh, Scotland, UK: Association for Computational Linguistics. Vadas, David and James R. Curran 2008 Parsing noun phrase structure with CCG. In: Proceedings of ACL-08: HLT, 335–343. Columbus, Ohio: Association for Computational Linguistics. van Benthem, Johann 1989 Logical constants across varying types. Notre Dame Journal of Formal Logic 30(3): 315–342. Vermaat, W. 1999 Controlling movement: Minimalism in a deductive perspective. Doctorandus thesis, University of Utrecht. Vermaat, Willemijn 2004 The minimalist move operation in a deductive perspective. Research on Language and Computation 2(1): 69–85. Vermaat, Willemijn 2005 The logic of variation: A cross-linguistic account of wh-question formation. Ph.D. thesis. Utrecht University. Vijay-Shanker, K. and David Weir 1990 Polynomial time parsing of combinatory categorial grammars. In: Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, 1–8. Pittsburgh, PA: Association for Computational Linguistics. Vijay-Shanker, K. and David Weir 1994 The equivalence of four extensions of context-free grammar. Mathematical Systems Theory 27: 511–546. Villavicencio, Aline 2002 The acquisition of a unification-based generalised categorial grammar. Ph.D. thesis. University of Cambridge. White, Michael 2006 Efficient realization of coordinate structures in combinatory categorial grammar. Research on Language and Computation 4(1): 39–75. Wood, Mary McGee 1993 Categorial Grammar. London and New York: Routledge. Zeevat, Henk, Klein, Ewan, and Jo Calder 1987 An introduction to unification categorial grammar. In: N. Haddock et al. (ed.), Edinburgh Working Papers in Cognitive Science, 1: Categorial Grammar, Unification Grammar, and Parsing, 195–222. University of Edinburgh. 
Zettlemoyer, Luke and Michael Collins 2007 Online learning of relaxed CCG grammars for parsing to logical form. In: Proceedings of EMNLP-CoNLL 2007, 678–687. Zhang, Yue, Blackwood, Graeme, and Stephen Clark 2012 Syntax-based word ordering incorporating a large-scale language model. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 736–746. Avignon, France.
Jason Baldridge, Austin TX (USA) Frederick Hoyt, Kittery ME (USA)
V. Interfaces

32. Syntax and the Lexicon

1. Introduction
2. How does the lexicon feed syntax?
3. Decomposition of verb meaning
4. Syntactic approaches to the lexicon
5. Construction grammar
6. Conclusion
7. References (selected)
Abstract

The aim of this chapter is to offer an overview of the different approaches to the complex interaction between syntax and the lexicon. As will be shown, the way linguists conceive of the interaction between syntax and the lexicon has been constantly re-examined throughout the development of linguistic theory. This chapter will review some proposals in greater detail than others, focussing on the ones considered to be most influential. However, this chapter will not discuss questions concerning the representation of world knowledge, its storage, or ways to access this information. For an overview, see the contributions in Wunderlich (2006).
1. Introduction

Bloomfield (1933: 269) stated that a complete description of language will include a lexicon. Though Bloomfield himself thought of the lexicon as an appendix of grammar, linguists have been debating ever since over what an adequate theory of grammar should look like, how much information is contained in the lexicon, and how the lexicon interacts with the other components of grammar, primarily syntax. While most linguists would agree that syntax is the module of grammar that contains the rules for combining words to form phrases and sentences, there is no general agreement on what the scope of the lexicon is. One point of disagreement concerns the level of word formation/morphology: should it be seen as part of the lexicon or not? If yes, then syntax is viewed as building phrases out of words, while the lexicon builds words out of morphemes (see Wasow 1977). For several linguists, the lexicon is generally characterized as a list of forms that speakers of a language know or memorize. For instance, Di Sciullo and Williams (1987) refer to the items listed in the lexicon as listemes. The term listeme highlights the fact that words in this sense must be listed in the lexicon because they have idiosyncratic properties (not governed by general principles) that speakers have to memorize. In their work, these authors point out that not only underived words can be listed in the lexicon.
Some syntactic phrases, e.g. idioms, and complex words with non-compositional meaning must also be listed. From this perspective, the division of labor between the two components focuses on compositionality. Elements that are built on the basis of general principles need not be listed. A related way to express this division of labor is given by Wasow (1977) and is represented here in (1).

(1) Syntax: not subject to exceptions
    Lexicon: has idiosyncrasies
The linguistic schools in which the relationship between syntax and the lexicon has been studied are quite diverse: next to generativist approaches, we have frameworks such as Role and Reference Grammar and Construction Grammar. Clearly, this chapter cannot do full justice to all important contributions in this area; thus the presentation will review some proposals in greater detail than others. Before I begin my discussion, let me delineate the general concerns of a theory of the syntax-lexicon interface. Any theory of the relation between the lexicon and syntax must deal with two issues: first, how lexical entries are structured and organized; and second, how lexical information is made visible to syntax. The behavior of verbs is a particularly illuminating case for addressing the complex interaction between syntax and the lexicon, especially because of the existence of argument alternations. Given that a noun phrase bearing a particular theta-role can appear in different syntactic positions with the same verb, we need to develop a theory of what controls this mapping. Argument alternations will thus play a central role in my discussion. With this in mind, we can now explore how different schools of linguistics have approached this interaction.
2. How does the lexicon feed syntax?

2.1. The importance of lexical information

In the early 1980s, the idea emerged that major aspects of the syntax of a sentence are projected from the lexical properties of the elements it contains (e.g. Chomsky 1981; Pesetsky 1982; Stowell 1981). That lexical information must somehow be related to the syntax of verbs can be seen in the following set of examples, borrowed from Levin (1993: 3). Levin points out that a speaker of, e.g., English seems to have an idea about the syntactic behavior of a verb; crucially, speakers are aware of whether or not a verb may participate in one of various argument alternations. These include alternations that involve a change in the transitivity of a verb. (2) is an example of the causative alternation; break permits both transitive and intransitive uses, while appear permits only an intransitive use. The transitive use of break can be paraphrased as cause to break:

(2) a. The window broke.
    b. The little boy broke the window.
    c. The rabbit appeared.
    d. *The magician appeared the rabbit out of his hat.
But what enables speakers to provide such judgments? Levin (1993: 4), citing Hale and Keyser (1987), suggests that what enables a speaker to determine the behavior of a verb is its meaning. Some supportive evidence for this conclusion comes from the behavior of the archaic verb gally. As Hale and Keyser argue, speakers who are not familiar with the meaning of this verb may conclude that it means one of two things in an example such as (3): a) see or b) frighten (examples below from Levin 1993: 4−5):

(3) The sailors gallied the whales.

For those speakers who determine the meaning of the verb to be (b), (3′) will be grammatical; however, those speakers who determine the meaning of the verb to be the one in (a) will judge (3′) as ungrammatical. The reasoning here is that (3′) is an example of the middle construction, and the formation of middles is restricted to a well-defined semantic class of verbs.

(3′) Whales gally easily.
A further illustration of the influence of a verb's meaning on its syntactic behavior is provided by the following set of examples, from Levin (1993: 6). While all the following verbs are transitive, they differ in two respects: i) whether or not they can participate in the middle alternation (5), and ii) whether or not they can take an object introduced by the preposition at (6); this latter alternation is referred to as the conative alternation.

(4) a. Mary cut the bread.
    b. Jane broke the vase.
    c. Terry touched the cat.
    d. Carla hit the door.

(5) a. The bread cuts easily.
    b. Crystal vases break easily.
    c. *Cats touch easily.
    d. *Door frames hit easily.

(6) a. Mary cut at the bread.
    b. *Janet broke at the vase.
    c. *Terry touched at the cat.
    d. Carla hit at the door.
(7−9) show that break is the only verb in this group that alternates in the causative alternation, cf. (2):

(7) a. Mary cut the string.
    b. *The string cut.

(8) a. The little boy broke the window.
    b. The window broke.

(9) a. Terry touched the cat.
    b. *The cat touched.
    c. Carla hit the door.
    d. *The door hit.
The behavior of these four verbs is summarized by Levin (1993) as in (10):

(10)            touch   hit   cut   break
     conative   no      yes   yes   no
     middle     no      no    yes   yes
     causative  no      no    no    yes
Naturally, these verbs are representatives of larger classes, as suggested already in Fillmore (1970), cf. Levin (1993) for a detailed discussion. Further examples of verbs belonging to each of these classes are given in (11):

(11) break verbs: break, rip, snap, …
     cut verbs: cut, hack, saw, …
     touch verbs: touch, pat, stroke, …
     hit verbs: hit, bash, kick, …

We can thus conclude, in agreement with Levin, that verbal behavior in alternations is related to the meaning of the individual verbs/verb classes. As a first approximation, one could classify touch as a pure contact verb, while the meaning of hit suggests a verb of contact by motion; cut, on the other hand, is a verb of causing a change of state by moving something into contact with the entity that changes state; and finally, break is a pure verb of change of state. These components of meaning explain the behavior of verbs in alternations. For example, the conative alternation is arguably sensitive to both the contact and motion components, and as expected, only hit and cut verbs are acceptable in this construction. The causative alternation, on the other hand, is sensitive to the property of change of state; hence only break verbs can participate in the alternation. See Levin (1993) for further discussion and references.
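The logic behind table (10) can be made concrete in a small program. The following is an illustrative sketch, not an analysis given in this chapter: the component labels (contact, motion, change_of_state) and the three rules are assumptions that simply encode the generalizations just stated in the text.

```python
# Illustrative sketch: deriving table (10) from meaning components.
# Component labels and rules are assumptions encoding the chapter's
# informal generalizations, not notation from the handbook.

CLASSES = {
    "touch": {"contact"},                              # pure contact
    "hit":   {"contact", "motion"},                    # contact by motion
    "cut":   {"contact", "motion", "change_of_state"},
    "break": {"change_of_state"},                      # pure change of state
}

def alternations(components):
    """Predict alternation behavior from a verb's meaning components."""
    return {
        # the conative alternation is sensitive to contact AND motion
        "conative": {"contact", "motion"} <= components,
        # middles require a change-of-state component
        "middle": "change_of_state" in components,
        # the causative alternation wants a pure change-of-state verb
        "causative": components == {"change_of_state"},
    }

for verb, comps in sorted(CLASSES.items()):
    print(verb, alternations(comps))
```

Running this reproduces table (10) exactly for the four classes; whether such rules generalize beyond them is, of course, the empirical question Levin's work addresses.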
2.2. How fine-grained should verb classes be?

The examples in (4−9) make clear that when dealing with verbal behavior, the postulation of verb classes is a useful tool for the linguist, as it enables systematic generalizations. The question arises, however, how fine-grained the classification of verbs by their meaning should be. In more recent work, Levin (to appear) summarizes this discussion and observes that we can recognize three main tendencies in verb classification. She takes the classification of the verb run as an example to illustrate this. A first type of classification is labeled coarse-grained, and its main representatives are Rappaport Hovav and Levin (2010). On this view, verbs are divided into manner and result verbs, and run is a manner verb:
(12) manner verbs: specify a manner of performing an action
     run, hit, cry, jump, walk, smear, spray, …
     result verbs: specify the result of an event
     arrive, break, clean, come, cover, die, go, open, put, …

As shown in (13−14), taken from Levin (to appear), this classification influences argument realization: manner verbs appear in a greater number of alternations than result verbs.

(13) Pat ran. (activity)
     Pat ran to the beach. (directed motion)
     Pat ran herself ragged. (change of state)
     Pat ran her shoes to shreds. (change of state)
     Pat ran clear of the falling rocks. (directed motion)
     The coach ran the athletes around the track. (causation)
     (likewise many manner of motion verbs)

(14) The students went.
     The students went to the beach.
     *The jetsetters went themselves ragged.
     *The runner went his shoes to shreds.
     *The pedestrian went clear of the oncoming car.
     *The coach went the athletes around the track.
     (likewise many directed motion verbs)

A second type of classification is the one offered in Fillmore (1970), labeled medium-grained by Levin (op. cit.). According to this classification, run is a manner of motion verb, and it contrasts with other manner verb subclasses, e.g. verbs of manner of speaking:

(15) a. manner of motion verbs: amble, crawl, fly, hop, jog, jump, gallop, limp, run, scamper, skip, swim, trudge, walk, wander, …
     b. manner of speaking verbs: holler, mumble, murmur, mutter, scream, shout, stammer, whisper, yell, …
     c. manner of hitting verbs: beat, hit, kick, pound, rap, tap, …

This classification is argued to determine finer-grained argument realization options among manner verbs, such as transitivity, choice of preposition, and participation in object alternations, which cannot be captured within a coarse-grained classification. This is illustrated in (16−17), with examples involving different prepositional phrases, again taken from Levin (to appear):

(16) Directional phrases (specify the path of the subject):
     a. Tracy ran into the room/over the hill/up the stairs.
     b. *Tracy shouted into the room/over the hill/up the stairs.
(17) Addressee to/at phrases (specify the path of the understood content of communication):
     a. Tracy shouted to/at Sandy.
     b. *Tracy ran to/at Sandy.

Finally, we also find a fine-grained classification, according to which, among manner of motion verbs, run lexicalizes a manner of motion that causes directed displacement towards a goal (as opposed to verbs that do not do so) (Allen, Özyürek, Kita, Brown, Furman, Ishizuka, and Fujii 2007; Biberauer and Folli 2004; Folli and Ramchand 2005).

(18) a. displacement-implying manner verbs: fly, jump, roll, run, slide, walk, …
     b. other manner verbs: amble, dance, float, meander, scamper, stroll, wander, …

The verbs in (18a) involve manners characteristic of animates, which are typically used with the intention to reach a goal; thus, they may at least implicate a path (Beavers, Levin, and Tham 2010). This distinction accounts for the fact that these are the manner of motion verbs most likely to allow directional interpretations in the context of purely locative PPs, as in (19a), again taken from Levin (to appear). Note that (19b) is odd under a directional interpretation, but is fine under a locative one (see e.g. Gehrke 2008).

(19) a. John walked in the room.
     b. #John danced in the room.

This is a consequence that the other two classification systems cannot readily account for. It seems, however, that although each of the above classifications can explain subsets of the data, no single one is sufficient to account for all generalizations.

Assuming that verbs can be classified into e.g. manner and result verbs, two issues arise. First, are manner and result aspects of meaning projected into syntax? Hale and Keyser (1993), Erteschik-Shir and Rapoport (2010), and Zubizarreta and Oh (2007), among others, give a positive answer to this question. Second, how is this information represented in the lexical entry of a verb in approaches that assume a lexicon as a distinct grammar component? In the next section, I turn to a discussion of the projection problem, and in section 3 to issues of representation of verbal meaning.
2.3. Lexical entries and mapping rules

In the previous sections, we saw that the lexical properties of verbs determine to a large extent the syntax of the sentences in which they appear. In the early 1980s (see Chomsky 1981 and subsequent work), this was captured in terms of the Projection Principle in (20):

(20) Lexical information is syntactically represented.

What does this lexical information look like, and how rich is it?
Within the 1980s generative framework, lexical entries contain certain syntactic, semantic and phonological information. The syntactic information of a lexical item includes its category (e.g. verb) and its selectional restrictions. It is further assumed that each verb is associated with a predicate-argument structure (AS) (Bresnan 1982; Grimshaw 1990), which signals the number of syntactic arguments a verb has, and some information about how the arguments are projected onto syntax, for example, as internal or external arguments (Marantz 1984; Williams 1981). The lexicon and the syntax interact at only one fixed point: the output of the former is the input to the latter (cf. 21):

(21)        Lexicon
               |
             Syntax
            /      \
  postlexical phonology    LF

From this perspective, a lexical entry gives information on the number of arguments a given verb takes, which are then projected in the syntax. A simple illustration of this view is given in the examples in (22) and (23). In terms of the number of arguments, we can recognize three main groups of verbs: i) intransitive verbs (only 1 argument), further subdivided into unergatives, e.g. sing, and unaccusatives, e.g. arrive (see Perlmutter 1978; Burzio 1981; Levin and Rappaport Hovav 1995); the two differ from one another in that the single argument of unergatives is an external argument, while that of unaccusatives is an internal one (see section 4); ii) transitive verbs (2 arguments); and iii) ditransitive verbs (3 arguments). (22c) is an example of an argument alternation not mentioned in the previous section, namely the dative alternation; this differs from e.g. the causative alternation, as it involves an alternation in the realization of the same set of arguments (see Anagnostopoulou 2003 for a thorough cross-linguistic discussion of the dative alternation):

(22) a. John is dithering (*the crime).
     b. John met his friend.
     c. John gave a gift to Bill. / John gave Bill a gift.
(23) offers examples of argument structures for the examples in (22). Usually the argument marked 1 is taken to be the external argument, i.e. the argument that is not, strictly speaking, an argument of the verb but rather of the VP, cf. Bresnan (1982) and Grimshaw (1990), who argue that the external argument is simply special, as it combines last with the verb; see also the discussion in section 4:

(23) a. dither: verb   1
                       NP

     b. meet: verb     1    2
                       NP   NP

     c. give: verb     1    2    3
                       NP   NP   PP
                       NP   NP   NP
The predicate-argument structure is often called a theta-grid (Stowell 1981; Williams 1981), in that it also gives information about the semantic relations between the arguments and the verb. The specific semantic relations between the verb and its arguments are referred to as theta-roles. So we say that the verb kill takes two arguments to which it assigns theta-roles; thus in (24) John refers to the entity which is the agent of the activity of killing, and Bill refers to the entity which is the patient of the activity. This is encoded in the form of a theta-grid, as in (25):

(24) John killed Bill.

(25) kill: verb   AGENT   PATIENT

It is assumed that the kind of theta-role determines the position of the NP in the syntactic tree, i.e. whether it functions as a subject or as an object:

(26) kill: verb   AGENT   PATIENT
                  NP      NP
The distribution of theta-roles is regulated by the theta-criterion in (27) (Chomsky 1981):

(27) Theta-criterion
     Each argument is assigned one and only one theta-role.
     Each theta-role is assigned to one and only one argument.

The theta-criterion ensures that no predicate can end up with the wrong number of arguments and that no argument can remain without an interpretation. The AS representation is not unique to individual predicates or classes of predicates: two different predicates, like walk and sleep, will probably have the same AS, although they differ in meaning. AS is part of the lexical information of a verb, and not a syntactic layer. AS is often seen in connection with another level of lexical representation, the so-called lexical conceptual structure (LCS), which refers to a structured lexical representation of verb meaning (see the overview in Levin and Rappaport Hovav 2011). Several linguists assume that in addition to a verb's argument structure, it is possible to isolate a small set of recurring meaning components which determine the range of argument alternations a particular verb can participate in. These meaning components are embodied in the primitive predicates of predicate decompositions such as LCSs. LCS is the deep semantic description, which is probably unique to any particular predicate, or class of predicates. I will examine these approaches in section 3. Before I discuss the projection problem, let me note here, however, that there is a considerable amount of tension concerning the relationship between AS, LCS and syntax. For some authors, LCS properties are directly reflected in AS and the mapping from AS to syntax is, in most cases, trivial. Crucially, this view entails that there is no direct relation between syntax and the LCS of predicates, but only between LCS and AS.
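A lexical entry with its theta-grid, and the bijection demanded by the theta-criterion, can be sketched as follows. This is an illustrative sketch only; the class and function names are hypothetical, not notation from the chapter.

```python
# Illustrative sketch: a lexical entry with a theta-grid (25), plus a
# check enforcing the theta-criterion (27): arguments and theta-roles
# must stand in a one-to-one correspondence exhausting the grid.
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalEntry:
    form: str
    category: str
    theta_grid: tuple  # theta-roles the predicate assigns

KILL = LexicalEntry("kill", "verb", ("AGENT", "PATIENT"))

def satisfies_theta_criterion(entry, assignment):
    """assignment maps each argument to its (single) theta-role; the
    criterion holds iff the assigned roles exhaust the theta-grid,
    each role being used exactly once."""
    return sorted(assignment.values()) == sorted(entry.theta_grid)

# (24) John killed Bill
print(satisfies_theta_criterion(KILL, {"John": "AGENT", "Bill": "PATIENT"}))  # True
print(satisfies_theta_criterion(KILL, {"John": "AGENT"}))                     # False: PATIENT unassigned
print(satisfies_theta_criterion(KILL, {"John": "AGENT", "Bill": "AGENT"}))    # False: AGENT assigned twice
```

The dictionary representation already enforces half of (27) by construction: an argument (a key) cannot bear two theta-roles.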
This is the view on lexical representations taken by authors like Carrier and Randall (1992), Grimshaw (1990), Levin and Rappaport (1986, 1988), Tenny (1994), and Zubizarreta (1987). In frameworks such as Lexical Functional Grammar (LFG) and Role and Reference Grammar (e.g. van Valin 1990), a lexical entry contains information about the
meaning of the lexical item, its AS, and the grammatical functions that are associated with the arguments. That is, the lexicon directly interacts with syntactic structure.

Turning now to the projection problem, an explanation for the correspondence between the semantic properties of verbs and their syntactic behavior has been provided in terms of the postulation of certain linking principles, between LCS and AS, on the one hand, and between AS and syntax, on the other. All such principles ensure that the appropriate participant in the event ends up in the appropriate place in the syntactic tree, thus accounting for theta-role/syntactic structure regularities. The general format such proposals adopt is as in (28), which represents the Universal Alignment Hypothesis of Perlmutter and Postal (1984).

(28) There exist principles of universal grammar which predict the initial relation [= syntactic encoding] borne by each nominal in a given clause from the meaning of the clause.

Other views, similar in spirit, include the Uniformity of Theta-Assignment Hypothesis (UTAH) (cf. Baker 1988; Pesetsky 1995), and the Linking Rules in Levin and Rappaport Hovav (1995). UTAH in particular attempted to constrain the interface between conceptual representations and syntactic representations in a particularly tight way (Baker 1988: 46):

(29) UTAH
     Identical theta relationships between items are represented by identical structural relationships between those items at the level of D-structure.

In later work, Baker (1997) argued that UTAH is sensitive to a medium-coarse-grained version of theta theory, one that distinguishes three primary (proto-)roles: agent/causer, theme/patient, and goal/path/location. The conditions that it puts on the structural realization of these roles seem to be absolute, rather than relative, and they map the theme to a higher position than the goal.
Finally, aspectual notions converge with theta ones in an important range of cases, but seem not to be adequately general. The linking principles that give content to the UTAH can be stated as follows:

(30) (i) An agent is the specifier of the higher VP of a Larsonian structure.
(ii) A theme is the specifier of the lower VP.
(iii) A goal, path or location is the complement of the lower VP.
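The principles in (30) amount to a fixed mapping from coarse (proto-)roles to positions in a Larsonian VP-shell, which can be sketched as follows. The function and dictionary names here are my own illustration, not part of the theory:

```python
# Sketch of the UTAH-style linking principles in (30): each coarse
# (proto-)role is assigned a fixed position in a Larsonian VP-shell.
# Names ('project', 'LINKING') are illustrative, not from the text.

LINKING = {
    "agent":    ("higher VP", "specifier"),
    "theme":    ("lower VP", "specifier"),
    "goal":     ("lower VP", "complement"),
    "path":     ("lower VP", "complement"),
    "location": ("lower VP", "complement"),
}

def project(arguments):
    """Map {role: phrase} pairs to (shell, position) slots."""
    return {phrase: LINKING[role] for role, phrase in arguments.items()}

# 'John gave the book to Mary': agent above theme, theme above goal
print(project({"agent": "John", "theme": "the book", "goal": "Mary"}))
```

On this picture the theme necessarily ends up structurally higher than the goal, as noted above for Baker (1997).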
Grimshaw (1990) proposes that two hierarchies are responsible for the projection of arguments into syntax: the theta hierarchy (31) and the aspectual hierarchy (32). (31) entails that the highest theta role in the theta hierarchy is mapped to subject, while lower roles are mapped to less prominent syntactic functions, following this hierarchy.

(31) AGT > EXP > SOURCE/GOAL/LOCATION > TH

Grimshaw's concern was the behavior of experiencer verbs such as fear and frighten. These differ from one another in that for fear, the theme argument is projected as the object and the experiencer as the subject (Mary fears John), while for frighten the reverse
32. Syntax and the Lexicon
holds (John frightens Mary). The latter is not predicted by the hierarchy in (31), but is accounted for in terms of the aspectual hierarchy in (32), which takes precedence over the theta hierarchy. Grimshaw's solution is to employ an aspectual hierarchy in which CAUSE is the highest role, as specified in (32):

(32) CAUSE > others (aspectual dimension)
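The interaction of the two hierarchies can be sketched as a small decision procedure: if some argument carries the aspectual role CAUSE, it wins; otherwise the theta hierarchy decides. This is my own toy illustration of the fear/frighten contrast, not Grimshaw's formalism:

```python
# Toy sketch of Grimshaw's two-hierarchy subject selection: the
# aspectual hierarchy (CAUSE first) takes precedence over the theta
# hierarchy. Function and variable names are mine, for illustration.

THETA_ORDER = ["AGT", "EXP", "SOURCE/GOAL/LOCATION", "TH"]

def pick_subject(args, aspect):
    """args: {theta_role: phrase}; aspect: {theta_role: 'cause'|...}."""
    causes = [r for r in args if aspect.get(r) == "cause"]
    if causes:                       # aspectual prominence wins
        return args[causes[0]]
    for role in THETA_ORDER:         # otherwise the theta hierarchy
        if role in args:
            return args[role]

# fear: stative, no cause component -> the experiencer is the subject
print(pick_subject({"EXP": "Mary", "TH": "John"}, {}))               # Mary
# frighten: the theme is aspectually a cause -> it becomes the subject
print(pick_subject({"EXP": "Mary", "TH": "John"}, {"TH": "cause"}))  # John
```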
Since fear is a stative verb, there is no cause component involved and the projection of arguments follows the theta hierarchy. Tenny (1994) argues that only the aspectual properties of arguments are visible to linking principles. This is illustrated in the aspectual interface hypothesis in (33):

(33) The universal principles of mapping between thematic structure and syntactic argument structure are governed by aspectual properties. Constraints on the aspectual properties associated with direct internal arguments, indirect internal arguments, and external arguments in syntactic structure constrain the kinds of event participants that can occupy these positions. Only the aspectual part of thematic structure is visible to the universal linking principles.

In Tenny's approach, two aspectual properties are relevant for the realization of arguments: measuring out and delimitedness. Tenny postulates two constraints that ensure that the argument that measures out the event is expressed as a direct internal argument, and that the one delimiting the event will be either a direct or indirect internal argument:

(34) Measuring-out constraint on direct internal arguments:
a. The direct internal argument of a simple verb is constrained so that it undergoes no necessary internal motion or change, unless it is motion or change which measures out the event over time (where measuring out entails that the direct argument plays a particular role in delimiting the event).
b. Direct internal arguments are the only overt arguments which can measure out the event.
c. There can be no more than one measuring out for any event described by a verb.

(35) The terminus constraint on indirect internal arguments:
a. An indirect internal argument can only participate in aspectual structure by providing a terminus for the event described by the verb. The terminus causes the event to be delimited.
b. If the event has a terminus, it also has a path, either implicit or overt.
c. An event as described by a verb can only have one terminus.

Thus, according to Tenny, the arguments that are prototypical direct objects are those that measure out an event, as in (36):

(36) John washed the cart for half an hour.
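The constraints in (34) and (35) can be read as well-formedness conditions on the aspectual roles of a verb's arguments: at most one measure, realized as a direct internal argument, and at most one terminus, realized as an indirect internal argument. A toy validator (my own formulation, not Tenny's) makes this explicit:

```python
# Toy validator for Tenny's constraints (34)/(35): at most one measure
# and one terminus per event; measures must be direct internal
# arguments, termini indirect internal arguments. Illustrative only.

def check_tenny(args):
    """args: list of (position, aspectual_role) pairs;
    position in {'external', 'direct', 'indirect'},
    aspectual_role in {'measure', 'terminus', None}."""
    measures = [p for p, r in args if r == "measure"]
    termini = [p for p, r in args if r == "terminus"]
    ok_measure = len(measures) <= 1 and all(p == "direct" for p in measures)
    ok_terminus = len(termini) <= 1 and all(p == "indirect" for p in termini)
    return ok_measure and ok_terminus

# 'John washed the cart': the direct object measures out the event
print(check_tenny([("external", None), ("direct", "measure")]))   # True
# an external argument cannot be a measure
print(check_tenny([("external", "measure")]))                     # False
```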
Concerning the external argument, Tenny proposes the following constraint:

(37) The non-measuring constraint on external arguments: An external argument cannot participate in measuring out or delimiting the event described by a verb. An external argument cannot be a measure, a path or a terminus.

In the next section, I turn to issues of representation of verbal meaning.
3. Decomposition of verb meaning

In general, a consensus has been reached that verbal meaning can be decomposed into more primitive units. We recognize two main approaches to verbal meaning decomposition, one based on theta-roles, and one based on event structure or Aktionsart properties (Vendler 1967). Mixed approaches such as Grimshaw (1990) also exist, as already mentioned. In section 3.1, I will review earlier approaches based on theta roles; in section 3.2 I turn to Reinhart's theta-system; in 3.3 I focus on an event structure approach put forth by Rappaport Hovav and Levin (2001); and in section 3.4 I briefly discuss other event-based approaches.
3.1. Decomposition based on theta-roles

The decomposition of verb meaning based on theta-roles can be further sub-divided into two main sub-groups. A first group of researchers posits a semantic role list, such as Gruber (1965) and Fillmore (1970), included in a theta-grid (Stowell 1981) in the Government and Binding framework, as we saw in the previous section. Here the verbal meaning is represented by a list of labels identifying the semantic roles that each of the verb's arguments plays in the event it denotes. For instance, break is associated with the list Agent, Patient (John broke the window). These semantic roles are taken not to be further analyzable, to be independent of the verb meaning, and to be limited in number. A second group posits generalized semantic roles; see most notably van Valin (1999) and Dowty (1991). Approaches of this kind recognize two such roles, one associated with the subject and one with the object, but see Primus (1999), who recognizes a third role associated with the first object in the double object construction. Dowty (1991) argues that one should see semantic roles as clusters or prototypes, which he refers to as proto-roles. Within Role and Reference Grammar, these are described as macro-roles and are taken to be semantic neutralizations of finer-grained semantic roles. In Dowty's approach, there are two main proto-roles: the Agent and the Patient. The contributing properties of the Agent Proto-Role are defined as follows: a. volitional involvement in the event or state, b. sentience (and/or perception), c. causing an event or change of state in another participant, d. movement (relative to the position of another participant), and e. exists independently of the event named by the verb. On the other hand, the contributing properties for the Patient Proto-Role are: a. undergoes change of state, b. incremental theme, c. causally affected by another participant, d. stationary relative to
movement of another participant, and e. does not exist independently of the event, or not at all. An argument selection principle, as in (38), regulates subject and object realization:

(38) Argument selection principle: In predicates with grammatical subject and object, the argument for which the predicate entails the greatest number of Proto-Agent properties will be lexicalized as the subject of the predicate; the argument having the greatest number of Proto-Patient entailments will be lexicalized as the direct object.

In the next section, I turn to a discussion of the theta-system proposed by Reinhart (2000, 2002), as this system is viewed as overcoming some of the problems raised for earlier approaches based on theta roles; see also Rozwadowska (1988). As has often been noted, first, there are serious doubts concerning the definability and empirical adequacy of theta role classifications; and second, using principles like the thematic hierarchy to regulate mapping to the syntax does not always give the correct empirical results (for instance, dative alternation verbs and psych predicates with either experiencer objects or experiencer subjects are cases in point). See Levin and Rappaport Hovav (2005) and Ramchand (2008) for further discussion.
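Dowty's principle in (38) is essentially a counting procedure over entailments, and can be sketched as follows. The property names and the counting logic are my own illustration of the principle, not Dowty's formal statement:

```python
# Sketch of Dowty's Argument Selection Principle (38): count the
# Proto-Agent and Proto-Patient entailments each argument carries;
# the Proto-Agent-richest argument is the subject, the
# Proto-Patient-richest the direct object. Names are illustrative.

AGENT_PROPS = {"volition", "sentience", "causes", "moves", "independent"}
PATIENT_PROPS = {"change_of_state", "incremental_theme",
                 "causally_affected", "stationary", "dependent"}

def select(arguments):
    """arguments: {phrase: set of entailed properties}."""
    subject = max(arguments, key=lambda a: len(arguments[a] & AGENT_PROPS))
    obj = max(arguments, key=lambda a: len(arguments[a] & PATIENT_PROPS))
    return subject, obj

# 'John broke the window'
args = {"John": {"volition", "sentience", "causes"},
        "the window": {"change_of_state", "causally_affected"}}
print(select(args))   # ('John', 'the window')
```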
3.2. The theta-system

Reinhart (2002) develops the theta-system, which she defines as follows. The theta-system contains (at least) lexical entries, which are coded concepts, the theta-relations of verb-entries, and a set of operations on lexical entries. The inputs of the CS (syntax) are lexical items selected from the theta-system. Its outputs are representations legible to the inference, context, and sound systems. For the outputs of the theta-system to be legible to all these systems, they need to be formally coded. She proposes that theta-relations can be encoded in terms of two binary features, of the kind assumed in phonology. These define eight feature clusters, which have been labeled as theta-roles. These features are also legible to the inference systems, and hence they are not erased in the CS, but are passed on through the derivation. Two features are used: ±c for cause change and ±m for mental state. All roles can be defined on the basis of these two primitive features:

(39) [−c−m] theme
     [−c]   goal
     [−m]   subject matter
     [+c+m] agent
     [+m]   experiencer I
     [−c+m] experiencer II
     [+c−m] cause (and maybe instrument)
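Since the clusters in (39) are built from two binary features, an underspecified cluster such as [−c] is compatible with either value of the missing feature. The following sketch, with a numeric encoding of my own (+1/−1 for +/−, 0 for "unspecified"), makes this compatibility relation explicit:

```python
# The labels in (39) fall out of the two binary features ±c and ±m;
# underspecified clusters match any value of the missing feature.
# The numeric encoding below is my own illustration.

CLUSTERS = {
    (-1, -1): "theme",
    (-1, 0):  "goal",
    (0, -1):  "subject matter",
    (+1, +1): "agent",
    (0, +1):  "experiencer I",
    (-1, +1): "experiencer II",
    (+1, -1): "cause (and maybe instrument)",
}

def compatible(underspecified, full):
    """A cluster matches a fully specified one if every specified
    feature agrees; 0 marks an unspecified feature."""
    return all(u == 0 or u == f for u, f in zip(underspecified, full))

# [-c] (goal) is consistent with [-c+m], but not with [+c+m] (agent)
print(compatible((-1, 0), (-1, +1)))   # True
print(compatible((-1, 0), (+1, +1)))   # False
```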
In Reinhart’s system the realization of arguments is guided by the Lexicon Uniformity Principle:
(40) Lexicon Uniformity Principle: Each verb-concept corresponds to one lexical entry with one thematic structure. → The various thematic forms of a given verb are derived by lexicon-operations from one thematic structure.

(40) suggests that each verb is associated with one thematic structure, as e.g. in (41). In (41) the external argument corresponds to cluster 1, while the internal argument corresponds to cluster 2.

(41) The wind damaged my apple tree.
     damage: [+c−m] [−c−m]
             cause   theme

From this thematic structure, other thematic forms can be derived by a limited set of lexicon operations. In Reinhart's view there are two main operations: saturation and reduction. Another, more restricted one, is the operation of causativization, to which I turn in (44). The most obvious example of saturation is passivization; the most obvious example of (intrinsic) reduction is reflexivization (cf. Grimshaw 1990). Their semantic effects are best analyzed in Chierchia (2004): the operation of saturation closes one of the verb's arguments existentially. Thus, it is realized semantically, though it does not project as a syntactic argument. A reduction operation applies to a two-place relation, identifies two arguments, and reduces the relation to a property. The two operations are illustrated in (42).

(42) Operations on θ-roles:
a. wash: θ1, θ2
b. Saturation: ∃x (wash (x, θ2))
   Max was washed ↔ ∃x (x wash Max)
c. Reduction: R(wash) θ1
   Max R(washed) ↔ Max λx (x wash x)
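The semantics of the two operations in (42) can be modeled in a toy extensional setting, with a finite domain and a relation standing in for wash. This is my own illustration of Chierchia's analysis, not the formal system itself:

```python
# Toy model of the two lexicon operations in (42): saturation closes
# one argument existentially; reduction identifies the two arguments
# of a two-place relation. The model and names are illustrative.

DOMAIN = {"Max", "Lucie", "soap"}
washes = {("Lucie", "Max")}        # who washes whom in this model

def wash(x, y):
    return (x, y) in washes

def saturated(y):
    """Saturation (passive): 'y was washed' = ∃x. wash(x, y)."""
    return any(wash(x, y) for x in DOMAIN)

def reduced(x):
    """Reduction (reflexive): 'x washed' = wash(x, x)."""
    return wash(x, x)

print(saturated("Max"))   # True: someone washes Max
print(reduced("Max"))     # False: Max does not wash himself here
```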
When saturation applies to e.g. θ1, and Max is selected for θ2, we get the passive derivation, which is interpreted as in (42b). Reduction creates an intransitive entry, with one role to fill in the syntax. In the case of reflexives, reduction is interpreted (schematically) as in (42c). There is a further option to reduce the external argument, corresponding to cluster 1 in this system, namely expletivization, which simply reduces an external [+c] role, as in (43a), and is the operation behind the causative alternation. The outcome is an anticausative, as in (43b):

(43) a. The storm broke the window.
     b. The window broke.
In addition, there is an operation that adds a theta role, namely causativization; according to Reinhart, causativization consists of two parts: i) the feature specification of a given cluster changes and ii) an agent role is added, as in (44):
(44) John walks the dog to the park.
a. Decausativize: change a /+c feature to a /−c feature: walk ([+c−m]) → walk ([−c+m])
b. Agentivize: add an agent role: walk ([+c−m]) ([−c+m])

In related work, Reinhart and Siloni (2005) argue that arity operations may apply in the Lexicon or in the Syntax. A case in point for this partition comes from the possibility of reflexivization into ECM (exceptional case marking) predicates. Marantz (1984) noted for Icelandic, and Reinhart and Siloni (2005) reiterate this point for French, that languages differ as to whether they permit this operation or not. Hence (45b) is possible in French, but there are no English parallels to this example. Languages like English must insert an anaphor in the subject position of the embedded clause to obtain the relevant interpretation. (45b) is contrasted with (45a). Pierre in (45a), to which considère assigns accusative Case, is the subject of the small clause and receives its theta-role from the adjective intelligent. As it is not an argument of considère, a lexical operation on the theta-grid of the latter cannot affect it:

(45) a. Jean considère Pierre intelligent. [French]
        Jean considers Pierre intelligent
        'Jean considers Pierre intelligent.'
     b. Jean se considère intelligent.
        Jean REFL considers intelligent
        'Jean considers himself intelligent.'

(46) *John considers intelligent.

Reinhart and Siloni argue in detail that reflexive verbs do not have a derived subject, contra Marantz (1984), but I will not enter into this discussion here. The important thing is that in their view reflexivization is not a reduction operation, as suggested by Chierchia (2004), but an operation that takes two theta-roles and forms one complex theta-role; this operation is called bundling. When it applies in the lexicon, the accusative Case of the verb is reduced. When it applies in the Syntax, Case is reduced by the appropriate morphology (such as the clitic se) and bundling applies to unassigned theta-roles, upon merger of the external theta-role. Thus, in the French ECM structure, the unassigned role of the embedded verb is still available when the matrix external role is merged, and therefore these two roles can bundle together. See Kallulli (2006), Papangeli (2004), Rákosi (2006) among others for implementations of this cluster system in languages such as Albanian, Greek and Hungarian, respectively. Kallulli further combines the theta-system with a structural decomposition in VP shells.
3.3. Roots and event templates

A number of lexical semanticists have pursued another method to deal with the problems and limitations of semantic roles. The claim is that verb meaning should be decomposed
in terms of event structure templates. Since verbs name events (a notion that goes back to Davidson 1967) and verb meaning is decomposed into primitive predicates, predicate decomposition theories are regarded as theories of events. For example, a primitive predicate CAUSE is often argued to be present in the meaning of transitive verbs such as break, open and dry. These approaches made their way into the syntactic representation of verb meaning in the form of e.g. VP shells. Predicate decomposition approaches have been pursued in the work of Jackendoff (1990) and Wunderlich (2000), in the work of Croft (1991), and in Role and Reference Grammar, cf. van Valin and LaPolla (1997). The localist approach is primarily associated with Jackendoff (1983) and is a system of predicate decomposition where location is the key notion and predicates are decomposed on the basis of primitives such as BE (for stative verbs), GO (for motional verbs) and STAY (for verbs of remaining in a location). On the other hand, the causal approach takes the so-called causal chain as the main feature of verb meaning. Each event can be described in terms of a causal chain, and the participants are realized as e.g. subject and object; see Croft (1991). I refer the reader to Levin and Rappaport Hovav (2005, 2011) for a comprehensive introduction to predicate decomposition and a detailed discussion of other approaches. Concentrating on Rappaport Hovav and Levin (2001), these authors develop a system according to which verb meaning may be represented as a predicate decomposition consisting of two components.
These are: (i) an event schema, which is assumed to be the structural component of meaning representing an event type, drawn from a limited inventory consisting of the event types encodable in language; (ii) the root, called constant in earlier work, which is the idiosyncratic component of verb meaning, characterized by an ontological categorization chosen from a fixed set of types: e.g., state, result state, thing, stuff, container, manner, instrument. In addition, the authors posit so-called canonical realization rules, which express how the ontological category of a root determines its integration into an event schema. According to these authors, roots and templates interact as follows: the roots are coarse-grained classifications of verb meaning, and are introduced into two kinds of event templates, a simple one and a complex one. Naturally, different roots enter into different templates; (47) below follows Levin and Rappaport Hovav's (2008) notation:

(47) a. manner → [ x ACT<MANNER> ] (e.g., jog, run, creak, whistle, …)
     b. instrument → [ x ACT<INSTRUMENT> ] (e.g., brush, hammer, saw, shovel, …)
     c. container → [ x CAUSE [ y BECOME AT <CONTAINER> ] ] (e.g., bag, box, cage, crate, garage, pocket, …)
     d. internally caused state → [ x BECOME <STATE> ] (e.g., bloom, blossom, decay, flower, rot, rust, sprout, …)
     e. externally caused, i.e. result, state → [ [ x ACT ] CAUSE [ y BECOME <STATE> ] ] (e.g., break, dry, harden, melt, open, …)
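The canonical realization rules in (47) are in effect a lookup from the root's ontological category to an event schema, with the root slotting in as modifier of ACT or argument of BECOME. The following sketch (the dictionary, function names, and angle-bracket rendering of roots are my own illustration) shows the mechanics:

```python
# Sketch of the canonical realization rules in (47): the ontological
# category of a root selects an event schema, and the root is slotted
# in as a modifier of ACT or an argument of BECOME. Illustrative only.

SCHEMAS = {
    "manner":     "[ x ACT<{root}> ]",
    "instrument": "[ x ACT<{root}> ]",
    "container":  "[ x CAUSE [ y BECOME AT <{root}> ] ]",
    "internally caused state": "[ x BECOME <{root}> ]",
    "externally caused state":
        "[ [ x ACT ] CAUSE [ y BECOME <{root}> ] ]",
}

def realize(root, category):
    """Instantiate the event schema for a root of a given category."""
    return SCHEMAS[category].format(root=root.upper())

print(realize("jog", "manner"))
print(realize("break", "externally caused state"))
```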
Roots are integrated into schemas as arguments (e.g., [47c]−[47e]) or modifiers (e.g., [47a]−[47b]) of predicates; following Levin and Rappaport Hovav, roots are italicized and in angle brackets, and are notated via subscripts when they are modifiers. The classification the authors propose amounts to the proposal that manner roots modify ACT, while result roots are arguments of BECOME. This enables them to offer an explanation for the distinct behavior of verbs in alternations. In their system an event template participant must be syntactically realized. Since e.g. sweep verbs only have a simple event structure, they do not require an internal argument as an event participant, contrary to break verbs:

(48) a. Kim swept. / *Kim broke. (unspecified objects)
     b. Kim scrubbed/*broke her fingers raw. (non-subcategorized objects)
     c. Kim broke/wiped the window.
     d. The window broke/*wiped. (causative alternation)
One constraint and one principle restrict this system: the lexicalization constraint and the Argument Realization Principle, given in (49) and (50) respectively; see the discussion in section 4.4. As Rappaport Hovav and Levin (2001) state, in their system events are sets of temporally anchored properties, and a complex event has subevents which are not necessarily temporally aligned. But crucially, a causative event is a complex event, and hence has two arguments:

(49) The lexicalization constraint: A root can only be associated with one primitive predicate in an event structure schema, as either an argument or a modifier.

(50) Argument Realization Principle (ARP): There must be one argument XP in the syntax to identify each sub-event in the event structure template. (Rappaport Hovav and Levin 2001: 779)

The ARP has been invoked in order to account for the unacceptability of example (51a) (Rappaport Hovav and Levin 1998: 120). (51a) is intended to be a caused change of location: an accomplishment in the Dowty/Vendler classification. As illustrated in (51b), the analysis assumes that there are two independent sub-events: the sweeping action and the motion of the dust onto the floor that is caused by the sweeping. The sweeping action is identified by the subject argument; the motion subevent demands that the theme argument (dust) be overtly realized as well. That is, the ARP requires that both arguments in (51b) be overtly expressed, as they are in (51c).

(51) a. *Phil swept onto the floor.
     b. [ [ Phil ACT<SWEEP> ] CAUSE [ BECOME [ dust <ONTO FLOOR> ] ] ]
     c. Phil swept the dust onto the floor.
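The ARP amounts to a simple well-formedness check over event templates: every subevent must be identified by an overtly realized argument XP. A toy formulation of my own:

```python
# Toy check for the Argument Realization Principle (50): every
# subevent in the template must be identified by an overt argument XP.
# The event-template encoding below is my own illustration.

def arp_ok(subevents, overt_xps):
    """subevents: {subevent: identifying argument};
    overt_xps: the set of arguments actually pronounced."""
    return all(arg in overt_xps for arg in subevents.values())

sweep_onto = {"ACT<sweep>": "Phil", "BECOME<onto floor>": "dust"}

print(arp_ok(sweep_onto, {"Phil"}))          # False: *Phil swept onto the floor
print(arp_ok(sweep_onto, {"Phil", "dust"}))  # True:  Phil swept the dust onto the floor
```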
Ideas similar but not identical to the ARP are also found in Grimshaw and Vikner (1993), van Hout (1996), and Kaufmann and Wunderlich (1998).
3.4. Aspectual approaches

In the approach taken in Rappaport Hovav and Levin (2001), the key semantic factor determining argument realization is the notion of event complexity. This is not the case in other event-related approaches, such as van Hout (1996) and Pustejovsky (1995), where the key notion is telicity. From this perspective, any event which is aspectually telic is a complex event. Event complexity also differs from a pure aspectual approach to verb classification, as in e.g. Tenny (1994). Some examples of event roles, from Grimshaw (1990), Tenny (1994), and van Voorst (1988), are given in (52), from Rosen (1999). Event roles describe the part of the event that the argument is linguistically involved in. For example, an originator (cf. van Voorst) begins, or instigates, an event; a delimiter (cf. Tenny, van Voorst) determines the extent, or unfolding, of the event; a terminus (Tenny) determines the endpoint of the event.

(52) a. Ned ate the apple.
     b. Fred pushed the cart to the gas station.
It is important to keep in mind that event roles are independent of semantic roles. A particularly suggestive case for this necessary separation is offered on the basis of examples containing instruments and locatives. As is well known, instruments and locatives generally appear in oblique position, but if an instrument is interpreted as an originator, it will map to subject position (The key opened the door), and if a locative delimits an event, it will map to direct object position (The farmer loaded the truck with hay). I refer the reader to Levin and Rappaport Hovav (2005) for a thorough discussion of the different approaches.
3.5. Lexical semantics: some criticism

Although lexical semantics has clarified a number of issues concerning verbal behavior, it seems that such approaches also have their limitations; see also Doron (2003). Three main arguments against semantic classification have been presented. First, it has been observed that semantically similar verbs may behave differently across languages (cf. C. Rosen 1984). A case in point here is the verb kill: while in English it cannot enter the causative alternation, it does in Hebrew (cf. Reinhart 2002) and in Greek (cf. Alexiadou, Anagnostopoulou and Schäfer 2006). Second, a given verb within a language may have multiple syntactic realizations, as is the case with the verb break (C. Rosen 1984; S. T. Rosen 1996). Third, semantically similar verbs may allow different mappings: for example, kill and break both contain a CAUSE component in terms of LCS, but only the latter is licit in the causative alternation in English (cf. S. T. Rosen 1996). It is precisely this variability that motivates purely syntactic proposals, as we will see in section 4.
Importantly, context-based variability directly contradicts hypotheses about semantically based universal alignment. The line of reasoning is as follows: a verb classification model claims that two verbs are semantically similar only if they are syntactically identical. As a consequence, whenever two verbs behave differently in the syntax, one must posit ever more detailed semantic classes, and, indeed, this is the direction that several researchers took. Others, in contrast, have argued that the lexical semantics of the verb has limited influence on the syntactic behavior of the arguments or the semantic interpretation of the clause. Whereas event mapping models claim that verb semantics tightly controls the syntax, Ritter and Rosen (1996) show that the syntactic position of the arguments and the specific semantics of the arguments themselves play a large role in verb interpretation: verbs, at least in part, mean what the syntax allows them to mean. This point is made explicit in recent work by Borer (2005). Contrary to what lexical models assume, verbs have variable argument realizations, the extent of argument variability differs across verbs, and variability is correlated with just how detailed a particular verb's lexical representation is. The less detailed a given verb's semantic specification, the more variability the verb allows in its argument realization and event interpretation, and the more the syntactic context contributes to the interpretation. If we assume that (a) lexical semantics is not enough to explain the forces at work in argument mapping, and (b) verbs are often used in a fashion that violates the canonical assumptions about their lexical structure, then in order to characterize verbal behavior one needs to minimize the lexicon. This is indeed the direction taken in what I call syntactic approaches to the Lexicon, and these will be discussed in section 4.
The term syntactic approaches to the Lexicon characterizes a number of approaches where it is the syntax that determines the meaning of a verb and the interpretation of arguments and not a particular encoding of information within a lexical entry. In fact, the various approaches eliminate the notion of a rich/informative lexical entry to a varying degree. Note here that Levin and Rappaport Hovav (2011) view some of these approaches as descendants of the original LCSs. In this group, they include Hale and Keyser (1992, 1993), Mateu (1999), Travis (to appear), Zubizarreta and Oh (2007) and Ramchand (2008), although these authors differ from one another significantly.
4. Syntactic approaches to the lexicon

In view of the problems that lexical semantics faces, a number of authors have argued for syntax-based models. This turn was partially triggered by developments within syntactic theorizing, which led to a situation where the older and simpler picture of the syntax of VPs and argument realization could no longer be maintained. Once VP-shells were introduced (Larson 1988), theta-roles were no longer forced to occur in unique positions. Moreover, with the introduction of elaborated functional structures (Pollock 1989; Ouhalla 1988), the conception of the realization of verb meaning changed. Syntactic approaches to the lexicon use the variability in argument realization as their point of departure, as illustrated here with an example of the causative alternation (53) and the flexible behavior of the verb eat (54), from Ramchand (2008); cf. also Goldberg (1995):
(53) a. The glass broke.
     b. John broke the glass.

(54) a. John ate the apple.
     b. John ate at the apple.
     c. The sea ate into the coastline.
     d. John ate me out of house and home.
     e. John ate.
     f. John ate his way into history.
Ideally, one would not want to stipulate ambiguity with multiple lexical items to explain this phenomenon (cf. Levin and Rappaport Hovav 1995; Reinhart 2002). This flexibility, however, is not completely general; e.g. the verbs arrive and weigh do not enter transitivity alternations:

(55) a. John arrived.
     b. *Bill arrived John.

(56) a. Mary weighs 100 pounds.
     b. *Mary weighs. (from Ramchand 2008)
A further problem for semantic approaches is the issue of cross-linguistic variation: not all languages permit such variability with the verb eat; one does not find the same semantic class of verbs in the causative alternation (e.g. kill alternates in Greek but not in English; Alexiadou, Anagnostopoulou, and Schäfer 2006); and several languages, e.g. Japanese, allow the verb arrive to enter transitivity alternations. Since lexical semantics cannot capture the cross-linguistically divergent behavior of semantically related classes of verbs, syntactic approaches deny that there is a clear-cut distinction between syntax and semantics. Semantics is, on these approaches, not a primitive; rather, verbal meaning is expressed exclusively on the basis of syntactic configurations, see section 4.5. In order to appreciate more recent work on the syntax of verbs, it is important to contrast it with what I consider to be the standard approach. Let us take unaccusativity as the case in point here. In the early 80s, the single argument of unaccusative verbs such as arrive was generated as a D-structure object (VP-internally) and moved to an A(rgument)-position in IP, i.e. the position where the subject of unergatives such as run was generated. The theta-grid of each verb determined the number of arguments, and the theta-criterion ensured that each NP would receive exactly one theta-role. As explained previously in this chapter, this lexical information was then projected into syntax.

(57) a. [IP NP [VP V]] (unergative) John ran.
     b. [VP V NP] (unaccusative) John arrived.

However, this picture has radically changed. I already mentioned in the beginning of this section the introduction of VP-shells, which no longer required theta-roles to occur in unique positions. A further outcome of this introduction is that the subject-object asymmetry can no longer be expressed as a specifier-complement asymmetry. Moreover,
with the introduction of elaborated functional structures (Pollock 1989; Ouhalla 1988) and the VP-internal subject hypothesis (Koopman and Sportiche 1991), the conception of A-movement has also changed. A great deal of the work done nowadays in the syntax of verbs finds its predecessor in Larson's shells. A common thread is that there is no projection of lexical information. Theta-roles are defined structurally; see the discussion in section 2.3 and the principles in (30). Some researchers go as far as to deny the theta-criterion its status, allowing an NP to collect more than one theta-role; this is especially the case in Ramchand (2008); see also Hornstein (1999) and Manzini and Roussou (2000) for arguments in support of this step from the area of Control. In what follows, I will discuss syntactic approaches that can be distinguished from one another as to how they represent the syntax of event structure. As in the other areas, I will concentrate on some influential systems, making reference to other sources as I expound my point. I will begin with Hale and Keyser, as they introduce a distinction between l(exical) syntax, roughly corresponding to the area of the VP, and the syntactic component. This label is adopted also in Travis's (1994) work, as well as in Zubizarreta and Oh (2007) and Erteschik-Shir and Rapoport (1997, 2005, 2010), and is implicit in Ramchand's First Phase Syntax model, but it does not seem to be so relevant for e.g. Borer (2005) and work within Distributed Morphology, where no such distinction is made. Note that in all syntactic approaches the issue of alternation is treated in terms of presence vs. absence of certain structural layers that introduce further arguments.
4.1. L-syntax

Hale and Keyser (1993, 2002) put forth a distinction between lexical syntax and the syntactic component. These authors define the elementary structural types on the basis of the fundamental relations in argument structure, i.e. the relations head-complement and specifier-head. These relations permit certain lexical structures. In other words, the representations in (58) are syntactic, but they are part of lexical syntax. A head which takes a complement but projects no specifier is called monadic, corresponding to (58a) below, in which h represents the head and cmp the complement. The system also permits a structure that consists of the head alone, as in (58d). In addition, it also permits a basic dyadic structure, as in (58b), in which spc represents the specifier. A fourth type is permitted in which the head projects a structure embodying both the head-complement and the head-specifier relation, as in (58c). In this particular structure, the *h functions as a predicate that establishes a relationship between the DP and the adjective. The structural configurations in (58) are neutral with respect to morphosyntactic category. In English, there is a favored categorial realization of these heads, as indicated under the configurations in (58a−d).
(58) a. [h h cmp] (h favored as V)
     b. [h spc [h h cmp]] (h favored as P)
     c. [*h spc [*h *h cmp]] (cmp favored as A)
     d. h (favored as N)
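The combinatorics of these structural types can be sketched with two small constructors for the monadic and dyadic configurations; the functions and the list encoding of constituency are my own illustration, not Hale and Keyser's formalism:

```python
# Toy composition of the structural types in (58): a monadic
# head-complement structure and a dyadic specifier-head-complement
# structure, combined to build unaccusative and transitive VPs.

def monadic(head, comp):
    """(58a): a head taking a complement, projecting no specifier."""
    return [head, comp]

def dyadic(spec, head, comp):
    """(58b)/(58c): a head with both a specifier and a complement."""
    return [spec, [head, comp]]

# unaccusative 'the leaves turn red': the composite dyadic type (58c)
turn_red = dyadic("the leaves", "turn", "red")

# transitive 'the wind turned the leaves red': a causative V (58a)
# taking the dyadic structure as its complement
caused = monadic("V", turn_red)

print(caused)   # ['V', ['the leaves', ['turn', 'red']]]
```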
In this system, unergative verbs are created by a combination of (58d) and (58a), N incorporated into V. Hale and Keyser do not discuss in detail how the external argument of this verb is introduced, one can assume that a higher level will introduce that, but they crucially want the VP in (59) to differ from the VP in (60) in that it does not project a specifier, and, hence, it can never have a transitive construal: (59)
(59) [V [V walk] [N t]]
Unaccusatives, on the other hand, are prototypically instances of (58c). (60) a.
[VP [Spec the leaves] [V' [V turn] [A red]]]
Transitives are created by a combination of (58c) and (58a).
32. Syntax and the Lexicon

(60) b.
[VP [Spec the wind] [V' [V turned] [VP [Spec the leaves] [V' V [A red]]]]]
From this perspective, the causative alternation involves causativization, i.e. the addition of a VP layer introducing the external argument. However, as Hale and Keyser note, in cases where the intransitive variant of the alternation is morphologically marked, e.g. via a reflexive as in Italian, the process is labeled de-transitivization and involves the elimination of the external argument; cf. Levin and Rappaport Hovav (1995), Reinhart (2002), Kallulli (2006) among others. Location-locatum predicates are created by a combination of (58a) and (58b), where h in (b) is P:
(61) [V [V put] [P [the books] [P' [P on] [NP the shelf]]]]
In this system, predicates like put are basically transitive, while unaccusative verbs are basically intransitive. In recent years, a number of approaches have emerged that were influenced by Hale and Keyser to varying degrees. See for instance Mateu (1999), Harley (2005), Zubizarreta and Oh (2007) and Erteschik-Shir and Rapoport (1997, 2005, 2010). Erteschik-Shir and Rapoport in particular go as far as to decompose Vs into atomic meaning components (semantic morphemes): M (Manner), S (State), and L (Location). On this view, a verb like laugh has a manner component and is an activity verb; arrive has a location component and implies change, while rust has a state component and also implies change. Each of these components projects a VP-shell in the syntax. This is part of a framework the authors term Atom Theory (AT). According to AT, verbs are decomposed into atomic meaning components whose syntactic projection derives aspectual interpretation and argument selection; this is done without recourse to linking rules, functional projections, or movement. There is a restricted universal inventory of atoms of the type M, S, and/or L from which a verb's meaning is chosen. Each atom ranges over the same set of concepts as an equivalent morpho-syntactic category: M is equivalent to adverbials (manner, means, instrument), S to adjectives, and L to the full range of prepositions. A verb's meaning is composed entirely of its atoms, as outlined with the laugh, arrive, and rust examples. A verb may have one or two atoms. This is a
universal constraint which follows from the fact that there are only two types of atoms, manner (M) and possible results (S and L), cf. the discussion in section 4.4 and in section 3.3.
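The atom inventory and the two-atom limit can be given a small executable rendering. The encoding below is my own illustrative sketch, not part of Erteschik-Shir and Rapoport's formalism; in particular, the restriction to at most one result atom is inferred from the two-types argument just given.

```python
# Toy sketch of Atom Theory (AT); the encoding is illustrative, not the
# authors' formalism. Atoms: M (manner), S (state), L (location);
# S and L are the "possible result" atoms.

ATOMS = {"M", "S", "L"}
RESULT_ATOMS = {"S", "L"}

def well_formed(atoms):
    """A verb has one or two atoms and (by assumption here, following the
    two-types argument) at most one result atom."""
    atoms = set(atoms)
    return (atoms <= ATOMS
            and 1 <= len(atoms) <= 2
            and len(atoms & RESULT_ATOMS) <= 1)

# The text's examples: laugh = manner; arrive = location (change);
# rust = state (change).
verbs = {"laugh": {"M"}, "arrive": {"L"}, "rust": {"S"}}
assert all(well_formed(a) for a in verbs.values())
assert not well_formed({"M", "S", "L"})  # three atoms are excluded
```

A manner-plus-result combination such as {M, L} is still admitted by this check, matching the claim that the two-atom limit follows from there being only two atom types.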
4.2. Structural representations of events

As already discussed, there are several approaches that adopt an event decomposition of verb meaning. Such views have also been adopted by syntacticians, with an important twist. While syntactic approaches base themselves on event decomposition (von Stechow 1995), they do not assume a mapping relation between a lexical event structure and syntax. Rather, syntax is event structure. In other words, as will become evident in this section, syntactic approaches to a certain extent assume decomposition structures of the type put forth in the tradition of lexical semantics, but they defend the view that it is the syntactic composition that gives rise to a particular event interpretation.
4.2.1. The introduction of event phrases

Travis (1994, to appear) was among the first researchers to propose that events are encoded in the clausal functional projections and are related to the mechanism of agreement. Her work focuses on verbal behavior in Malagasy and Tagalog. On the basis of these languages, she argued that two functional projections encode the event: an Event Phrase (EP) dominated by T, and an AspP sandwiched between a VP-shell introducing the external argument (containing transitivizers, causatives and other such light verbs) and a second VP. The details of the analysis have changed throughout the development of Travis's work, but the basic function of these projections has remained the same. In the most recent version of her work, AspP encodes delimitation or telicity, and EP binds the event variable [e] of the verb's argument array (Davidson 1967) and provides event information. The structure is as in (62):
(62) [EP [E' E [V1P [V1' V1 [AspP [Asp' Asp [V2P [V2' V2 XP]]]]]]]]
V1 is a lexical category that introduces the external argument and, when it does, it has a meaning similar to CAUSE. Asp, depending on its feature content, has a meaning similar to BE/BECOME. V2 introduces the Theme argument and the endpoint of the event, XP. Note here that VP is a label used for many disparate constituents in the current literature. For Travis, it is used to designate the VP of the late 1980s, the projection that contains all the merged arguments of the verb. This is probably closest to the notion of vP in Chomsky (1995), PredicateP in Bowers (1993), and VoiceP in Kratzer (1996).
4.2.2. First phase syntax

We have seen in the discussion of the system put forth by Levin and Rappaport Hovav (2001) that event complexity is a determining factor in argument realization. Ramchand (2008) can be characterized as a syntactic representative of this idea. The system put forth is labeled first phase syntax, i.e. syntax that does not interact with Case, agreement and the IP/CP domain, and its central idea is that it decomposes the information classically seen to reside within lexical items into a set of distinct categories with specific syntactic and semantic modes of combination. Lexical items in English are seen as complex in features, with their argument structure properties and flexibility deriving ultimately from the association rules that link the particular feature bundle to the syntactic combinatoric system. In particular, Ramchand proposes that first phase syntax contains three sub-evental components: a causing sub-event, a process-denoting sub-event and a sub-event corresponding to a result state. Each of these sub-events is represented as its own projection, ordered in the hierarchical embedding relation shown in (63).

(63) [InitiatorP [ProcessP [ResultP]]]

From this view, procP is the heart of the dynamic predicate, since it represents change through time, and it is present in every dynamic verb. In other words, a procP is present regardless of whether we are dealing with a process that is extended (i.e. consisting of an indefinite number of transitions) or with the limiting case of a single minimal transition, such as that found with achievement verbs. The initP exists when the verb expresses a causational or initiational state that leads to the process. The resP only exists when there is a result state explicitly expressed by the lexical predicate; it does not correlate with semantic/aspectual boundedness in a general sense.
Specifically, the telicity that arises from entailments based on DP structure and the verbal relation does not mean that resP exists, unless the event structure itself is specified as expressing a result state. Conversely, the expression of result can be further modified by auxiliaries, PPs etc. outside the first phase syntax to create predications that are atelic, but this does not warrant the removal of resP from the syntactic representation.
In addition to representing subevental complexity, as motivated by work on verbal aktionsart (Vendler 1967; Parsons 1990; Pustejovsky 1991), this structure is also designed to capture the set of core argument roles, as defined by the predicational relations formed at each level. In some sense, each projection represented here forms its own core predicational structure with the specifier position being filled by the subject or theme of a particular (sub)event, and the complement position being filled by the phrase that provides the content of that event. The complement position itself of course is also complex and contains another mini-predication, with its own specifier and complement. In this way, the participant relations are built up recursively from successively embedded event descriptions and subject predications. So what do these projections do? According to Ramchand, a) initP introduces the causation event and licenses the external argument (subject of cause = initiator); b) procP specifies the nature of the change or process and licenses the entity undergoing change or process (subject of process = undergoer); c) resP gives the telos or result state of the event and licenses the entity that comes to hold the result state (subject of result = resultee). In this system, verb classes are derived on the basis of the features they contain, e.g. transitive open contains the features init, proc and res. Examples of these roles, taken from Ramchand (2008), are given in (64−66). The subjects in (64) are initiators, the objects in (65) are undergoers and the objects in (66) are resultees:

(64) a. The key opened the lock.
     b. The rock broke the window.
     c. John persuaded Mary.
(65) a. Karena drove the car.
     b. Michael dried the coffee beans.
     c. The ball rolled.
     d. The apple reddened.
(66) a. Katherine ran her shoes ragged.
     b. Alex handed her homework in.
     c. Michael threw the dog out.
This idea has antecedents in the work of Kaufmann and Wunderlich (1998), who argue for a level of semantic structure (SF) which is crucially binary and asymmetric and in which possible verbs are formed by constrained embedding, see (67).

(67) In a decomposed SF representation of a verb, every more deeply embedded predicate must specify the higher predicate or sortal properties activated by the higher predicate. (Kaufmann and Wunderlich 1998: 5)

Kaufmann and Wunderlich see their SF level as a sub-part of the lexical semantics, not represented directly in syntax, but the internal structure of their representations is very similar to what Ramchand proposes. A further characteristic of the first phase syntax system is that noun phrases can bear more than one role (again, all examples are from Ramchand 2008). For instance, undergoer-initiator is a composite role which arises when the same argument is the holder of
initiational state and of a changing property homomorphic with the event trace of the proc event, see (68). For example, in (68a), Karena is interpreted as the initiator of the event, but at the same time she is also understood as the undergoer of a process.

(68) a. Karena ran to the tree.
     b. The diamond sparkled.
     c. Ariel ate the mango.
     d. Kayleigh danced.
Finally, resultee-undergoer is a composite role which arises when the same argument is the holder of a changing property homomorphic with the event trace of the proc event, and the holder of the result state, see (69).

(69) a. Michael pushed the cart to the store.
     b. Katherine broke the stick.

Alternations in this system result from variability in projection. Thus, break can lexicalize all three layers, giving rise to a transitive variant, or only proc and res, giving rise to an intransitive variant.
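The mapping from feature bundles to licensed participant roles, and the subset-lexicalization view of the break alternation, can be sketched as follows. This is a toy rendering of the exposition above, not Ramchand's own notation.

```python
# Toy sketch of first-phase feature bundles: each lexicalized projection
# licenses the participant role named in the text (not Ramchand's notation).
ROLE_BY_FEATURE = {
    "init": "initiator",   # subject of the causing subevent
    "proc": "undergoer",   # subject of the process subevent
    "res":  "resultee",    # subject of the result state
}

def licensed_roles(features):
    """Return the roles licensed by a verb's feature bundle,
    in the init > proc > res hierarchical order."""
    return [ROLE_BY_FEATURE[f] for f in ("init", "proc", "res") if f in features]

# Transitive 'open' is specified as init, proc, res in the text; the
# break alternation is variability in which layers are lexicalized:
transitive_break = {"init", "proc", "res"}
intransitive_break = {"proc", "res"}
assert licensed_roles(transitive_break) == ["initiator", "undergoer", "resultee"]
assert licensed_roles(intransitive_break) == ["undergoer", "resultee"]
```

Composite roles such as undergoer-initiator would correspond to one argument filling more than one of these specifier positions; the flat list above deliberately abstracts away from that.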
4.3. Structuring telicity

Borer (2005) develops a system according to which there is no hierarchical or thematic information associated with arguments in lexical entries. For Borer, the syntactic structure gives rise to a template which determines the interpretation of arguments (see also Ritter and Rosen 2000; van Hout 1996). Her view can be schematically summarized as in (70).

(70) syntactic structure → event structure → interpretation of arguments

In particular, as encyclopedic items, i.e. listemes, do not in and of themselves have arguments, Borer assumes that argument structure emerges through functional syntactic structure, which has the effect of verbalizing a listeme, in the intended sense; this functional structure is eventive/aspectual in nature. Thus, the functional projections that are relevant for the interpretation of arguments are certain aspectual specifiers, which enforce subject-of-result or subject-of-process readings. This accounts for the different aspectual properties associated with each verb class. Two specifiers are of importance: the specifier of the Aspect-of-Quantity phrase, which is interpreted as subject of quantity, in the sense that it undergoes a structured change, and the specifier of the Eventive Phrase, which is interpreted as the originator. Borer offers the following example to illustrate her point. Let us say that we have three listemes, dog, boat and sink. These have no categorial information and in principle can combine in different ways to produce a sentence. They form a conceptual array as in (70a), and together with the grammatical formatives will, the, three, could, in principle, give rise to a number of sentences:

(70) a.
[V dog] [DP boat] [DP sink]
In principle, some are more compatible with world knowledge, or with selectional restrictions, than others. All of them, however, can be interpreted in view of the presence of this particular functional structure. Borer believes this to be outside the domain of the computational grammatical system, and strictly within the conceptual domain. Syntactically, note, they are all unambiguous. For instance, the sentence The boat will dog three sinks, although presumably semantically odd, is interpretable in a particular way. Specifically, by virtue of being in the specifier of ASPQ (quantity aspect), sink in the structure in (70b) is assigned a DP structure, thus allowing the merger of functional DP-internal material (in this case, three). In turn, three sinks in SpecASPQ is assigned a subject-of-quantity interpretation, in essence equivalent to an interpretation associated with undergoing a structured change. Boat, in turn, is assigned DP structure in SpecTP, thus licensing the merger of DP-internal functional material (i.e. the). It then moves from SpecTP to SpecEP, where it is assigned the role of an originator (of a non-stative event). Finally, and more crucially from the perspective of our focus here, all the functional nodes in (70b) are verbalizers, turning the lexical domain (L-D) into a VP and categorizing dog as a verb (for concreteness, Borer assumes overt short movement of the verb in English to a functional position above ASPQ). (70c−f) offer further sentences that can be formed on the basis of the listemes dog, boat and sink:

(70) b. [EP [DP the boat] (Originator) [T' [T will] [F F+V dog [AspQmax [DP three sinks] (Subject of quantity) [AspQ' AspQ VP]]]]]
     c. The boat will sink three dogs.
     d. The sink will boat three dogs.
     e. The dog will sink three boats.
     f. The dog will boat three sinks.
This system permits a given noun phrase to be projected into the specifier position of different phrases, thereby deriving different verb classes. Thus, in (71), the flower could be projected either in SpecASPQ, in which case the verb is interpreted as an unaccusative, or in SpecEP, in the absence of AspQ, in which case the verb is interpreted as unergative:

(71) The flower wilted.

Hence, again, the variability of syntactic projections accounts for alternations in this system.
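Borer's overgeneration point, with category-free listemes slotted into one functional frame, can be made concrete with a toy generator. The template string below is my own simplification of the frame in (70b), not Borer's notation.

```python
# Toy illustration of Borer's point: listemes carry no category, so every
# way of slotting them into the functional frame is syntactically
# well-formed; oddness is filtered out by the conceptual system, not the
# grammar. The frame string is a simplification of (70b).
from itertools import permutations

listemes = ["dog", "boat", "sink"]
sentences = [f"The {subj} will {verb} three {obj}s."
             for subj, verb, obj in permutations(listemes)]

print(len(sentences))  # 6 candidate sentences from 3 listemes
assert "The boat will dog three sinks." in sentences  # the text's example
assert "The dog will boat three sinks." in sentences  # (70f)
```

All six candidates are grammatical under the frame; a lexicalist system with projected argument structure would instead have to rule most of them out in the lexicon.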
4.4. Argument realization in Distributed Morphology

In the previous sections, we examined approaches which make very explicit claims concerning the event-syntactic templates that are responsible for argument realization. Recent work within Distributed Morphology (Halle and Marantz 1993) bears a certain amount of resemblance to these views. To begin with, the architecture of the model of grammar adopted in Distributed Morphology (DM) is as in (72). The syntax consists of a set of rules that generate syntactic structures, which are then subject to further operations in the course of the derivation of the PF and LF interface levels:

(72) The Grammar

     Syntactic Derivation
             |
        (Spell-Out)
         /       \
       PF         LF

In this framework, every word is formed by syntactic operations (Merge, Move). The principles of morphology are, therefore, to a large extent the principles of syntax. In the default case, the morphological structure at PF simply is the syntactic structure. In more complex cases, which are in no way exceptional, some further operations apply at PF to modify the syntactic structure. The units that are subject to the syntactic operations Move and Merge are the morphemes. On the widely held view according to which syntactic structures are hierarchical tree structures, the morphemes are the terminals of such trees. There are two types of morphemes: the Roots, i.e. items like √CAT or √SIT, which make up the members of the open class, and the so-called abstract morphemes, such as [pl] or [past], which are the (contents of the) familiar functional categories of syntactic theory. Roots have no grammatical category. Roots in this system never appear bare; they must always be categorized by virtue of being in a local relationship with one of the category-defining functional heads:

(73) Categorization assumption: Roots cannot appear without being categorized; Roots are categorized by combining with category-defining functional heads. (Embick 2010)

Several researchers working within this set of assumptions agree upon the fact that v is a category-defining head that categorizes the root as a verb (see e.g. Marantz 2005; Pylkkänen 2002, 2008; Alexiadou, Anagnostopoulou, and Schäfer 2006). This v-layer is taken to be responsible for the introduction of the event interpretation we normally associate with verbs.
While DM-based approaches, similarly to e.g. Borer and Ramchand, assume that the lexical verb (in this model the root) does not introduce the external argument, they do not posit an event/aspect-related position for its introduction. They crucially rely on the so-called Voice hypothesis, introduced in Kratzer (1996):

(74) The Voice hypothesis: Voice is responsible for the introduction of external arguments. The same head introduces a DP in the active and licenses a PP in the passive.

In this system, what we call transitive verbs contain at least VoiceP and vP, and the causative alternation is a Voice alternation, as explicitly argued by Alexiadou, Anagnostopoulou, and Schäfer (2006).

(75) [VoiceP [vP Root]]

The motivation for severing the external argument in Kratzer's work builds on earlier work, e.g. by Marantz (1984), according to which the theta-role of this argument is determined by the Verb + object complex (see 76). Kratzer (2003) substantiates the claim that the external argument is not introduced by the verb by also looking at the properties of nominalization and participle formation (see 77).

(76) a. kill a cockroach
     b. kill a conversation
     c. kill an evening watching T.V.
     d. kill a bottle
     e. kill an audience        (Marantz 1984: 25)

(77) a. The climbers are secured with a rope.
     b. The climbers are being secured with a rope.
The data in (76) suggest that transitive verbs express a wide range of predicates depending on the choice of direct object, and assign different roles to their subjects. In addition, the external argument seems to be excluded from idioms, which are composed on the basis of the verb and the direct object. Turning to (77), (77a), which can be interpreted as an adjectival passive, is compatible with the climbers having secured themselves. On the other hand, (77b), which must be a verbal passive, requires the climbers to be secured by somebody else. This suggests that adjectival passives are deverbal constructions where the verb’s external argument can be missing. If the verb’s external argument is not obligatorily present in adjectival passives, we might be tempted to weaken the requirement, Kratzer argues, that lexical items must syntactically realize all of their (non-event) arguments. Within event semantics, another possible explanation is available, however. External arguments might be
neo-Davidsonian in the syntax, hence might not be true arguments of their verbs at all. If this was so, we might be able to account for the occasional absence of a verb’s external argument without having to give up the hypothesis that lexical items must realize their (non-event) arguments wherever they occur. Authors working within DM, however, make slightly different claims concerning the introduction of internal arguments. Marantz (1997), Harley (2005), Harley and Noyer (2000), Alexiadou (2001), and Alexiadou, Anagnostopoulou, and Schäfer (2006) assume that the internal argument is introduced by the root, while Marantz (2005), and Alexiadou (2009) claim that this is not the case. Embick (2004) claims that the internal argument is sometimes introduced by the root and sometimes by the v-layer. Alexiadou and Schäfer (2010) claim that the internal argument is introduced in SpecvP for break verbs, while within a PP/ResultP for arrive type verbs, an approach crucially building on Hale and Keyser (2002). In recent work, the role of roots in introducing the internal argument has been criticized, see e.g. Borer (2005) and Acquaviva (2008). In particular, Acquaviva points out that lacking syntactically legible information, roots cannot project: there can be, then, no RootP, and no argument may therefore appear in the specifier or complement position of a root. This means that only functional heads/particles/small clause structures introduce arguments. Assuming that Voice is responsible for the external argument and Voice modifiers, we must conclude that internal arguments are licensed via particles/prepositions/functional heads/small clauses (see Pylkkänen 2002; Embick 2004; Marantz 2005; Alexiadou 2009; Alexiadou and Schäfer 2010), cf. Ramchand (2008). One important point should be kept in mind. DM approaches are not based on telicity. Furthermore, they do not assume the syntactic representation of two sub-events. 
All the structures referred to include only one event layer, namely v. The idea is that a particular structural configuration will receive a particular semantic interpretation at the relevant level. Two such structures have been recognized and are related to the ways in which roots can combine with v. Embick (2004) argues that these two ways are as in (78): roots can modify v, via direct Merge, or be introduced as complements of v:

(78) a. modifiers of v, direct Merge:  [v √ v]    e.g. hammer
     b. complements of v:             [v v √]    e.g. flatten
The structure in (78a) can license secondary resultative predication. In that case, the element that appears in the complement of v cannot be a bare root (79). (79)
[vP [v √ v] aP]    e.g. hammer flat
As Embick suggests, direct Merge has semantic consequences: it specifies the means component of the complex predicate, cf. Hale and Keyser (2002). Pattern (78b) seems to be reserved for 'state' roots in English. The distinction introduced in (78) is very much reminiscent of the division into manner vs. result roots advanced in Rappaport Hovav and Levin's (2001) work, discussed in section 3.3, as well as of the l-syntax-based approaches discussed in section 4.1. DM as well as Borer (2005) contrast with Ramchand, who preserves the idea that verbs carry some minimal lexical specification. This takes the form of specifying the structural layers the verbs can combine with: e.g. a verb like push is specified for certain layers, and verbs can have more than one specification. As already mentioned, in order to account for transitive and intransitive uses of the verb break, Ramchand must assume two possible entries: one containing all three layers, or optional subparts. For Borer, the lexical item names a concept and does not carry any sort of information. A given concept is merged with the aspectual layers. Depending on the layers it merges with, different verb classes arise; e.g. a transitive verb will project both EP (originator) and AspQ (aspect of quantity). Unaccusatives are similar, with the argument moving from AspQ to EP; unergatives/activities lack AspQ, as mentioned in section 4.3. Some DM-inspired authors, see in particular Harley (2005) and Embick (2004), assume that roots fall into different categories, e.g. state, manner, event; see also Doron (2003). This is in a sense similar to Rappaport Hovav and Levin's classification. This is not the case for e.g. Acquaviva (2008), where the combination of the roots with a particular structure gives rise to different verb classes and interpretations.
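The two combination modes in (78) and the licensing asymmetry in (79) can be stated as a small toy predicate. The tuple representation is my own encoding, not Embick's notation.

```python
# Toy encoding of Embick's two modes in (78); representation is mine.
def combine(root, mode):
    """Return a nested-tuple structure for a root combining with v."""
    if mode == "modifier":      # (78a) direct Merge: [v root v], e.g. hammer
        return ("v", (root, "v"))
    if mode == "complement":    # (78b): [v v root], e.g. flatten
        return ("v", ("v", root))
    raise ValueError(f"unknown mode: {mode}")

def licenses_resultative(mode):
    """Per (79), only the direct-Merge (manner) configuration leaves the
    complement of v open for a secondary resultative aP ('hammer flat')."""
    return mode == "modifier"

assert combine("hammer", "modifier") == ("v", ("hammer", "v"))
assert combine("flat", "complement") == ("v", ("v", "flat"))
assert licenses_resultative("modifier") and not licenses_resultative("complement")
```

The boolean mirrors the manner/result split: manner roots (modifiers of v) support resultative secondary predication, while state roots (complements of v) occupy the position a resultative aP would need.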
4.5. Summary

To summarize, a property shared by all syntactic approaches to the lexicon is the departure from the idea of projection of lexical information, a feature that lexical semantic approaches must assume. A second important characteristic is that verb meaning consists of certain building blocks, to a certain extent reminiscent of the primitives of lexical decomposition assumed within lexical semantics, that freely combine to generate verb meanings. Three points should be made here in comparison to the lexical semantics approaches discussed earlier in this paper. The first one concerns the extent to which syntactic approaches assume a classification into different types of verb classes. This is the case for certain approaches which more or less explicitly assume such a classification regulated by the type of root involved, see e.g. Marantz (1997), Harley and Noyer (2000), Embick (2004), Alexiadou, Anagnostopoulou, and Schäfer (2006) and perhaps also Ramchand (2008), in the sense that verbs are lexically specified for realizing one or all of the features in her type of meaning decomposition. This is clearly not the case for Borer (2005). The classification systems assumed by these authors are coarse-grained for the authors working within DM and could be viewed as fine-grained in the case of Ramchand. The second issue concerns the overgeneration that syntactic approaches unavoidably bring into the picture. A clear illustration of this was given with the examples in (70), borrowed from Borer (2005). Such overgeneration is avoided within lexical semantics, which adopts very rich templates and mapping principles to derive verb meaning. For syntactic approaches, overgeneration is not a drawback in itself, considering that semantically ill-formed sentences will be filtered out by the conceptual system. The third issue concerns how lexical and syntactic approaches deal with cross-linguistic differences. Let us consider the following example from the area of the causative alternation as an illustration. The core of verbs that undergo the causative alternation is stable across languages. As already mentioned, there is, however, some interesting variation in two domains, namely verb restrictions and selection restrictions. With respect to the first domain, there are verbs that are predicted by Levin and Rappaport Hovav (1995) to allow the alternation but do not in English, while they do in other languages such as Greek, e.g. destroy and kill. Consider (80) and (81), from Alexiadou, Anagnostopoulou and Schäfer (2006):

(80) a.  John / the fire / the bomb destroyed the manuscript.
     a′. *The manuscript destroyed.
     b.  John / the fire / the bomb killed Mary.
     b′. *Mary killed.

(81) a. O   Petros / i   fotia / i   vomva katestreps-e  to  paketo.        [Greek]
        the Peter  / the fire  / the bomb  destroyed-3SG the package.ACC
        'Peter/the fire/the bomb destroyed the package.'
     b. To  paketo  katastraf-ik-e     apo / me  tin fotia / me  tin vomva.
        the package destroyed-NACT-3SG by  / with the fire  / with the bomb
        'The package got destroyed by the fire/by the bomb.'

Alexiadou, Anagnostopoulou and Schäfer proposed to account for this variation in terms of the building blocks of anticausatives available cross-linguistically. In particular, they proposed that Voice [−AG], realized via the non-active morphology (NACT) above, can be present in the anticausative structure in Greek, while this is not possible in e.g. English.
5. Construction Grammar

The structural representations discussed in section 4 were introduced in order to capture the variability in argument realization which lexicon-based approaches have problems with. In doing this, however, they are regarded by some researchers as similar in spirit to the proposals made within the framework of Construction Grammar; see e.g. Levin and Rappaport Hovav (2005). In this framework (Goldberg 1995, 2005), an approach to grammar is put forth which takes the speakers' knowledge of language to consist of a network of learned pairings of form and function, or constructions. Constructions are posited whenever there is evidence that speakers cannot predict some aspect of their form, function, or use from other knowledge of language (i.e., from other constructions already posited to exist). Goldberg (1995: 4) defines constructions as follows:
(82) C is a construction iffdef C is a form-meaning pair <Fi, Si> such that some aspect of Fi or some aspect of Si is not strictly predictable from C's component parts or from previously established constructions.

In Goldberg's view, constructions can be understood to correspond to the listemes of Di Sciullo and Williams (1987), that is, the entities of grammar that must be listed. In Goldberg's work, the collection of constructions that forms the lexicon is taken to constitute a highly structured lattice of interrelated information. (83) offers some examples of constructions in the sense of (82) (from Goldberg 1995: 3).

(83) Ditransitive:    x causes y to receive z
        Subj V Obj Obj2:   Pat faxed Bill the letter.
     Caused motion:   x causes y to move z
        Subj V Obj Obl:    Pat sneezed the napkin off the table.
     Resultative:     x causes y to become z
        Subj V Obj Xcomp:  She kissed him unconscious.
Construction Grammar is a non-transformational approach to grammar in which there is no strict division between the lexicon and the syntax. The importance of constructions is further highlighted by Goldberg's (2005) criticism of the ARP, which was introduced in (50). She notes the following: the ARP predicts that causative events, which have two subevents, should necessarily always have two overt arguments. In (84), however, we see that causative verbs often actually allow patient arguments to be omitted under certain discourse conditions. The examples in (84) instantiate a different set of constructions that are available in English under a set of well-defined restrictions:

(84) a. The chef-in-training chopped and diced all afternoon.
     b. Owls only kill at night.
     c. The famous lecturer always aimed to dazzle/please/disappoint/impress/charm.
     d. Pat gave and gave, but Chris just took and took.
To account for such data, Goldberg proposes the principle in (85):

(85) Principle of omission under low discourse prominence: Omission of the patient argument is possible when the patient argument is construed to be deemphasized in the discourse vis-à-vis the action. That is, omission is possible when the patient argument is not topical (or focal) in the discourse, and the action is particularly emphasized (via repetition, strong affective stance, contrastive focus, etc.).
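The form-meaning pairings in (83) and the omission principle in (85) can be given a crude executable rendering. The field names and the boolean simplification below are mine, not Goldberg's.

```python
# Toy form-meaning pairs in the spirit of (83); field labels are my own.
CONSTRUCTIONS = {
    "ditransitive":  {"form": ["Subj", "V", "Obj", "Obj2"],
                      "meaning": "x causes y to receive z"},
    "caused_motion": {"form": ["Subj", "V", "Obj", "Obl"],
                      "meaning": "x causes y to move z"},
    "resultative":   {"form": ["Subj", "V", "Obj", "Xcomp"],
                      "meaning": "x causes y to become z"},
}

def patient_omissible(patient_topical, action_emphasized):
    """Crude boolean rendering of (85): omission is possible when the
    patient is not topical/focal and the action is emphasized."""
    return (not patient_topical) and action_emphasized

# 'Owls only kill at night.': the patient is non-topical, and the habitual
# action is what is emphasized, so omission is licensed.
assert patient_omissible(patient_topical=False, action_emphasized=True)
assert not patient_omissible(patient_topical=True, action_emphasized=False)
```

The point of pairing form with meaning in one record is exactly the constructionist claim: neither column alone is predictable from the verb's lexical entry.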
6. Conclusion

As Levin and Rappaport Hovav (2005) also pointed out, we can identify two main approaches in the research on argument realization. In one approach, a verb has a structured lexical entry which alone determines the projection of its arguments. Multiple realizations of arguments pose a severe problem for this view, and the only solution seems to be to assume that such verbs are polysemous. In the other approach, the lexical entry of the verb registers only its core meaning, and this core meaning combines with the event-based meanings contributed by syntactic constructions themselves. On such views, polysemy is eliminated, but the problem of overgeneration arises. This is the reason why most syntactic systems make use of a kind of filter to block massive variability: for instance, Ramchand describes verbs in terms of the features init, proc, and res, while Embick, Alexiadou, Anagnostopoulou, and Schäfer, and Harley classify roots into distinct categories, which specify the structures they can appear in. I believe that further research into the interaction between the primitive elements, i.e. roots/listemes, and the (event) structures/structural templates will help us understand which aspects of the verb's behavior are determined by the root, and which by the structure in which this root appears.
Acknowledgements

I am indebted to an anonymous reviewer as well as to Gianina Iordachioaia, Susanne Lohrmann, Tibor Kiss, Terje Lohndal, Sabine Mohr, and Florian Schäfer for their useful comments and suggestions.
7. References (selected)

Alexiadou, Artemis 2001 Functional Structure in Nominals: Nominalization and Ergativity. Amsterdam: John Benjamins.
Alexiadou, Artemis 2009 On the role of syntactic locality in morphological processes: the case of Greek nominals. In: Anastasia Giannakidou and Monika Rathert (eds.), Quantification, Definiteness and Nominalization, 253−280. Oxford: Oxford University Press.
Alexiadou, Artemis, Elena Anagnostopoulou, and Florian Schäfer 2006 The properties of anticausatives crosslinguistically. In: Mara Frascarelli (ed.), Phases of Interpretation, 187−212. Berlin: Mouton de Gruyter.
Alexiadou, Artemis, and Florian Schäfer 2010 Unaccusatives at the syntax-semantics interface: there-insertion, indefinites and restitutive again. In: Proceedings of Sinn und Bedeutung 15.
Allen, Shanley, Asli Ozyürek, Sotaro Kita, Amanda Brown, Reyhan Furman, Tomoko Ishizuka, and Mihoko Fujii 2007 Language specific and universal influences in children’s syntactic packaging of manner and path: A comparison of English, Japanese, and Turkish. Cognition 102: 16−48.
Anagnostopoulou, Elena 2003 The Syntax of Ditransitives: Evidence from Clitics. Berlin: Mouton de Gruyter.
1122
V. Interfaces
Baker, Mark 1988 Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press.
Baker, Mark 1997 Thematic roles and syntactic structure. In: Liliane Haegeman (ed.), Elements of Grammar, 73−137. (Handbook of generative syntax.) Dordrecht: Kluwer.
Beavers, John, Beth Levin, and Shiao Wei Tham 2010 A morphosyntactic basis for variation in the encoding of motion events. Journal of Linguistics 46: 331−377.
Belletti, Adriana, and Luigi Rizzi 1988 Psych-verbs and theta theory. Natural Language and Linguistic Theory 6: 291−352.
Biberauer, Teresa, and Raffaella Folli 2004 Goals of motion in Afrikaans. In: Proceedings of Journées d’Etudes Linguistiques 2004, 19−26.
Bloomfield, Leonard 1933 Language. New York: Holt.
Borer, Hagit 2003 Exo-skeletal vs. endo-skeletal approaches: syntactic projections and the lexicon. In: John Moore and Maria Polinsky (eds.), The Nature of Explanation in Linguistic Theory, 31−67. Stanford: CSLI Publications.
Borer, Hagit 2005 Structuring Sense, vol. II: The Normal Course of Events. Oxford: Oxford University Press.
Borsley, Robert, and Jaklin Kornfilt 2000 Mixed extended projections. In: Robert Borsley (ed.), The Nature and Function of Syntactic Categories, 101−113. New York: Academic Press.
Bowers, John 1993 The syntax of predication. Linguistic Inquiry 24: 591−656.
Bresnan, Joan (ed.) 1982 The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press.
Burzio, Luigi 1981 Intransitive verbs and Italian auxiliaries. Ph.D. dissertation, Department of Linguistics, MIT.
Carrier, Jill, and Janet Randall 1992 The argument structure and syntactic structure of resultatives. Linguistic Inquiry 23: 173−234.
Chierchia, Gennaro 2004 A semantics for unaccusatives and its syntactic consequences. In: Artemis Alexiadou, Elena Anagnostopoulou, and Martin Everaert (eds.), The Unaccusativity Puzzle: Explorations of the Syntax-Lexicon Interface, 22−59. Oxford: Oxford University Press.
Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris.
Croft, William 1991 Syntactic Categories and Grammatical Relations. Chicago: The University of Chicago Press.
Davidson, Donald 1967 The logical form of action sentences. In: Essays on Actions and Events, 105−148. Oxford: Clarendon Press.
Di Sciullo, Anna-Maria, and Edwin Williams 1987 On the Definition of Word. Cambridge, MA: MIT Press.
Doron, Edit 2003 Agency and voice: the semantics of the Semitic templates. Natural Language Semantics 11: 1−67.
Dowty, David 1979 Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and in Montague’s PTQ. Dordrecht: Reidel.
Dowty, David 1991 Thematic proto-roles and argument selection. Language 67: 547−619.
Embick, David 2004 On the structure of resultative participles in English. Linguistic Inquiry 35: 355−392.
Embick, David 2010 Localism vs. Globalism in Morphology and Phonology. Cambridge, MA: MIT Press.
Erteschik-Shir, Nomi, and Tova Rapoport 1997 A theory of verbal projection. In: Gabriela Matos and Matilde Miguel (eds.), Interfaces in Linguistic Theory, 129−148. Lisboa: APL/Edições Colibri.
Erteschik-Shir, Nomi, and Tova Rapoport 2005 Path predicates. In: Nomi Erteschik-Shir and Tova Rapoport (eds.), The Syntax of Aspect, 65−86. Oxford: Oxford University Press.
Erteschik-Shir, Nomi, and Tova Rapoport 2010 Contacts as results. In: Malka Rappaport Hovav, Edit Doron, and Ivy Sichel (eds.), Syntax, Lexical Semantics and Event Structure, 59−75. Oxford: Oxford University Press.
Fillmore, Charles J. 1970 The grammar of hitting and breaking. In: Roderick A. Jacobs and Peter S. Rosenbaum (eds.), Readings in English Transformational Grammar, 120−133. Waltham, MA: Ginn.
Folli, Raffaella, and Gillian Ramchand 2005 Prepositions and results in Italian and English: an analysis from event decomposition. In: Henk Verkuyl, Henriette De Swart, and Angeliek van Hout (eds.), Perspectives on Aspect, 81−105. Dordrecht: Kluwer.
Gehrke, Berit 2008 Ps in Motion: On the Semantics and Syntax of P Elements and Motion Events. LOT Dissertation Series, Netherlands Graduate School of Linguistics.
Goldberg, Adele 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Goldberg, Adele 2005 Constructions, lexical semantics and the correspondence principle: accounting for generalizations and subregularities in the realization of arguments.
In: Nomi Erteschik-Shir and Tova Rapoport (eds.), The Syntax of Aspect, 215−236. Oxford: Oxford University Press.
Grimshaw, Jane 1990 Argument Structure. Cambridge, MA: MIT Press.
Grimshaw, Jane, and Sten Vikner 1993 Obligatory adjuncts and the structure of events. In: Eric Reuland and Werner Abraham (eds.), Knowledge and Language, Vol. II: Lexical and Conceptual Structure, 145−159. Dordrecht: Kluwer.
Gruber, Jeffrey S. 1965 Studies in lexical relations. Ph.D. dissertation, Department of Linguistics, MIT.
Hale, Ken, and Samuel J. Keyser 1987 A view from the middle. Lexicon Project Working Papers 10, Center for Cognitive Science, MIT.
Hale, Ken, and Samuel J. Keyser 1993 On argument structure and the lexical expression of syntactic relations. In: Ken Hale and Samuel J. Keyser (eds.), The View from Building 20, 53−108. Cambridge, MA: MIT Press.
Hale, Ken, and Samuel J. Keyser 2002 Prolegomenon to a Theory of Argument Structure. Cambridge, MA: MIT Press.
Halle, Morris, and Alec Marantz 1993 Distributed Morphology and the pieces of inflection. In: Ken Hale and Samuel J. Keyser (eds.), The View from Building 20, 111−176. Cambridge, MA: MIT Press.
Harley, Heidi 2005 How do verbs take their names? Denominal verbs, manner incorporation and the ontology of roots in English. In: Nomi Erteschik-Shir and Tova Rapoport (eds.), The Syntax of Aspect, 42−64. Oxford: Oxford University Press.
Harley, Heidi, and Rolf Noyer 2000 Formal vs. encyclopedic properties of vocabulary: evidence from nominalization. In: Bert Peeters (ed.), The Lexicon-Encyclopedia Interface, 349−374. Amsterdam: Elsevier.
Hornstein, Norbert 1999 Movement and control. Linguistic Inquiry 30: 69−96.
Horvath, Julia, and Tal Siloni 2002 Against the Little-v Hypothesis. Rivista di Grammatica Generativa 27: 107−122.
Jackendoff, Ray 1983 Semantics and Cognition. Cambridge, MA: MIT Press.
Jackendoff, Ray 1990 Semantic Structures. Cambridge, MA: MIT Press.
Kallulli, Dalina 2006 Argument demotion as feature suppression. In: Benjamin Lyngfeldt and Torgrim Solstad (eds.), Demoting the Agent, 143−166. Amsterdam: John Benjamins.
Kaufmann, Ingrid, and Dieter Wunderlich 1998 Cross-linguistic patterns of resultatives. (Theorie des Lexikons, Sonderforschungsbereich 282, Bericht 109.) Düsseldorf: University of Düsseldorf.
Koopman, Hilda, and Dominique Sportiche 1991 The position of subjects. Lingua 85: 211−258.
Kratzer, Angelika 1996 Severing the external argument from its verb. In: Johan Rooryck and Laurie Zaring (eds.), Phrase Structure and the Lexicon, 109−137. Dordrecht: Kluwer.
Kratzer, Angelika 2003 The event argument and the semantics of verbs. Ms., University of Massachusetts at Amherst.
Larson, Richard 1988 On the double object construction. Linguistic Inquiry 19: 335−391.
Levin, Beth 1993 English Verb Classes and Alternations.
Chicago: University of Chicago Press.
Levin, Beth to appear Verb classes within and across languages. In: Bernhard Comrie and Andrej Malchukov (eds.), Valency Classes: A Comparative Handbook. Berlin: Mouton de Gruyter.
Levin, Beth, and Malka Rappaport 1986 The formation of adjectival passives. Linguistic Inquiry 17: 623−661.
Levin, Beth, and Malka Rappaport 1988 Non-event -er nominals: A probe into argument structure. Linguistics 26: 1067−1083.
Levin, Beth, and Malka Rappaport Hovav 1995 Unaccusativity: At the Syntax-Lexical Semantics Interface. Cambridge, MA: MIT Press.
Levin, Beth, and Malka Rappaport Hovav 2005 Argument Realization. Cambridge: Cambridge University Press.
Levin, Beth, and Malka Rappaport Hovav 2011 Lexical conceptual structure. In: Klaus von Heusinger, Claudia Maienborn, and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 418−438. Berlin: Mouton de Gruyter.
Levin, Beth, and Malka Rappaport Hovav to appear Lexicalized meaning and manner/result complementarity. In: Boban Arsenijević, Berit Gehrke, and Rafael Marín (eds.), Subatomic Semantics of Event Predicates. Dordrecht: Springer.
Manzini, Maria-Rita, and Anna Roussou 2000 A minimalist theory of A-movement and control. Lingua 110: 409−447.
Marantz, Alec 1984 On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Marantz, Alec 1997 No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. In: Alexis Dimitriadis, Laura Siegel, Clarissa Surek-Clark, and Alexander Williams (eds.), University of Pennsylvania Working Papers in Linguistics, Vol. 4.2, 201−225. Philadelphia.
Marantz, Alec 2005 Objects out of the lexicon: objects as events. Paper presented at the University of Vienna.
Mateu, Jaume 1999 Universals of semantic construal for lexical syntactic relations. Distributed as GGT-99-4 Research Report, Universitat Autònoma de Barcelona.
Ouhalla, Jamal 1988 The syntax of head movement: a study of Berber. Ph.D. dissertation, UCL.
Papangeli, Dimitra 2004 The Morphosyntax of Argument Realization: Greek Argument Structure and the Lexicon-Syntax Interface. LOT Dissertation Series, Netherlands Graduate School of Linguistics.
Parsons, Terence 1990 Events in the Semantics of English: A Study in Subatomic Semantics. Cambridge, MA: MIT Press.
Perlmutter, David 1978 Impersonal passives and the unaccusative hypothesis. In: Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, 157−189.
Pesetsky, David 1982 Paths and categories. Ph.D. dissertation, Department of Linguistics, MIT.
Pesetsky, David 1995 Zero Syntax. Cambridge, MA: MIT Press.
Perlmutter, David, and Paul Postal 1984 The 1-advancement exclusiveness law. In: David Perlmutter and Carol Rosen (eds.), Studies in Relational Grammar 2, 81−125. Chicago: University of Chicago Press.
Pollock, Jean-Yves 1989 Verb-movement, universal grammar, and the structure of IP. Linguistic Inquiry 20: 365−420.
Primus, Beatrice 1999 Cases and Thematic Roles: Ergative, Accusative and Active. Tübingen: Niemeyer.
Pustejovsky, James 1991 The syntax of event structure. Cognition 41: 7−81.
Pustejovsky, James 1995 The Generative Lexicon. Cambridge, MA: MIT Press.
Pylkkänen, Liina 2002 Introducing arguments. Ph.D. dissertation, Department of Linguistics, MIT.
Pylkkänen, Liina 2008 Introducing Arguments. Cambridge, MA: MIT Press.
Rákosi, György 2006 Dative Experiencer Predicates in Hungarian. LOT Dissertation Series, Netherlands Graduate School of Linguistics.
Ramchand, Gillian 2008 Verb Meaning and the Lexicon: First Phase Syntax. Cambridge: Cambridge University Press.
Rappaport Hovav, Malka, and Beth Levin 1998 Building verb meanings. In: Miriam Butt and Wilhelm Geuder (eds.), The Projection of Arguments: Lexical and Compositional Factors, 97−134. Stanford: CSLI Publications.
Rappaport Hovav, Malka, and Beth Levin 2001 An event structure account of English resultatives. Language 77: 766−797.
Rappaport Hovav, Malka, and Beth Levin 2010 Reflections on manner/result complementarity. In: Malka Rappaport Hovav, Edit Doron, and Ivy Sichel (eds.), Syntax, Lexical Semantics, and Event Structure, 21−38. Oxford: Oxford University Press.
Reinhart, Tanya 2000 The Theta System: syntactic realization of verbal concepts. OTS Working Papers, 00.01/TL.
Reinhart, Tanya 2002 The theta system: an overview. Theoretical Linguistics 28: 229−290.
Reinhart, Tanya, and Tali Siloni 2005 The lexicon-syntax parameter: reflexivization and other arity operations. Linguistic Inquiry 36: 389−436.
Ritter, Elizabeth, and Sara T. Rosen 1998 Delimiting events in syntax. In: Miriam Butt and Wilhelm Geuder (eds.), The Projection of Arguments: Lexical and Compositional Factors, 135−164. Stanford, CA: CSLI Publications.
Rosen, Carol 1984 The interface between semantic roles and initial grammatical relations. In: David M. Perlmutter and Carol Rosen (eds.), Studies in Relational Grammar 2, 38−77. Chicago: University of Chicago Press.
Rosen, Sara T. 1996 Events and verb classification. Linguistics 34: 191−223.
Rosen, Sara T. 1999 The syntactic representation of linguistic events: a state-of-the-article. Glot International 4: 3−10.
Rozwadowska, Bozena 1988 Thematic restrictions on derived nominals.
In: Wendy Wilkins (ed.), Syntax and Semantics 21: Thematic Relations, 147−165. New York: Academic Press.
Stowell, Tim 1981 Origins of phrase structure. Ph.D. dissertation, Department of Linguistics, MIT.
Tenny, Carol 1987 Grammaticalizing aspect and affectedness. Ph.D. dissertation, Department of Linguistics, MIT.
Tenny, Carol 1992 The aspectual interface hypothesis. In: Ivan Sag and Anna Szabolcsi (eds.), Lexical Matters, 1−27. Stanford: CSLI Publications.
Tenny, Carol 1994 Aspectual Roles and the Syntax-Semantics Interface. Dordrecht: Kluwer.
Travis, Lisa 1994 Event phrase and a theory of functional categories. In: Paivi Koskinen (ed.), Proceedings of the 1994 Annual Conference of the Canadian Linguistic Association (Toronto Working Papers in Linguistics), 559−570. Toronto.
Travis, Lisa 2000 Event structure in syntax. In: Carol Tenny and James Pustejovsky (eds.), Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax, 145−185. Stanford: CSLI Publications.
Travis, Lisa to appear Inner Aspect.
van Hout, Angeliek 1996 Event semantics of verb frame alternations: a case study of Dutch and its acquisition. Ph.D. dissertation, Tilburg University.
van Hout, Angeliek, and Tom Roeper 1998 Events and aspectual structure in derivational morphology. MIT Working Papers in Linguistics 32: 175−220.
Van Valin, Robert 1990 Semantic parameters of split intransitivity. Language 66: 221−260.
Van Valin, Robert 1999 Generalized semantic roles and the syntax-semantics interface. In: Francis Corblin, Carmen Dobrovie-Sorin, and Jean-Marie Marandin (eds.), Empirical Issues in Formal Syntax and Semantics 2, 373−389. The Hague: Thesus.
Van Valin, Robert, and Randy LaPolla 1997 Syntax: Structure, Meaning, and Function. Cambridge: Cambridge University Press.
Vendler, Zeno 1967 Linguistics in Philosophy. Ithaca, NY: Cornell University Press.
von Stechow, Arnim 1995 Lexical decomposition in syntax. In: Urs Egli, Peter Pause, Christoph Schwarze, Arnim von Stechow, and Götz Wienold (eds.), Lexical Knowledge in the Organization of Language, 81−117. Amsterdam: John Benjamins.
Voorst, Joost van 1988 Event Structure. Amsterdam: John Benjamins.
Wasow, Thomas 1977 Transformations and the lexicon. In: Peter Culicover, Thomas Wasow, and Adrian Akmajian (eds.), Formal Syntax, 327−360. New York: Academic Press.
Williams, Edwin 1981 Argument structure and morphology. The Linguistic Review 1: 81−114.
Wunderlich, Dieter 2000 Predicate composition and argument extension as general options: a study in the interface of semantic and conceptual structure. In: Barbara Stiebels and Dieter Wunderlich (eds.), The Lexicon in Focus, 247−270. Berlin: Akademie Verlag.
Wunderlich, Dieter (ed.) 2006 Advances in the Theory of the Lexicon. Berlin: Mouton de Gruyter.
Zubizarreta, Maria-Luisa 1987 Levels of Representation in the Lexicon and in the Syntax. Dordrecht: Foris.
Zubizarreta, Maria-Luisa, and Eunjeong Oh 2007 On the Syntactic Composition of Manner and Motion. Cambridge, MA: MIT Press.
Artemis Alexiadou, Stuttgart (Germany)
33. The Syntax-Morphology Interface
1. Introduction
2. Overview: morphology and syntax
3. Frameworks
4. Syntacticocentric morphology
5. Conclusion
6. References (selected)
Abstract

This article first briefly exemplifies a number of morphological phenomena which have syntactic repercussions, to illustrate the range of issues investigators confront. It then sketches a variety of grammatical architectures and their general approach to the interaction between morphology and syntax, paying attention to what empirical territory is considered the domain of independent morphological processes vs. the domain of properly syntactic processes in each. Finally, it provides a more in-depth look at the theory of Distributed Morphology, and briefly describes specific problems associated with syntacticocentric approaches to derivational morphology at the interface.
1. Introduction

Morphological structure and syntactic structure are clearly mutually dependent. Within a given language, plural subjects can require plural forms of finite verbs, derivational affixation can change a verb’s argument structure requirements, and comparative clauses can be introduced by adjectives which must be appropriately inflected. Complex word-internal structure in polysynthetic languages represents the same logical content as sentence-internal structure in isolating languages, with broad but not infinitely variable gradience between these extremes. Morphology and syntax share a vocabulary of categories and features − nouns, verbs, tenses, cases, etc. They both exhibit hierarchical structure and headedness. Any adequate grammatical theory must provide an account of the interaction between the two and explicate in what ways it is constrained. In this article, I first briefly exemplify a number of morphological phenomena which have syntactic repercussions, to illustrate the range of issues investigators confront. I then sketch a variety of grammatical architectures and their general approach to the interaction between morphology and syntax, paying attention to what empirical territory is considered the domain of independent morphological processes vs. the domain of properly syntactic processes in each. Finally, I provide a more in-depth look at the theory of the morphosyntactic interface with which I am most familiar, Distributed Morphology, and briefly describe specific problems associated with syntacticocentric approaches to derivational morphology at the interface.
2. Overview: morphology and syntax

There is often a sense of directionality to the morphology-syntax interface; there are cases where the morphology seems to drive the syntax, and other cases where the syntax appears to drive the morphology. This intuition corresponds roughly to the derivation/inflection distinction, where that distinction is relevant.
2.1. Derivational morphology and syntax

Derivational morphology can produce complex word-forms which have radically different syntactic properties than those of the stem to which the morphology attaches. It can, for example, change the syntactic category or subcategory of a word, and thereby alter its selectional properties. Derivational morphology can also radically alter a word’s selectional properties without changing its category. To take an example from English: as a verb, invite requires an internal argument (1); when nominalized as invitation, both the internal and external arguments are apparently optional (2). When the arguments do appear, however, their syntactic properties are appropriate to the nominal syntax in which they find themselves; the internal argument, for example, is prepositionally case-licensed by of (2b) rather than receiving accusative case from invite (1b); see Chomsky (1970, 1981), Lebeaux (1986), Alexiadou (2001), and references in section 4 of this article, among many others.

(1) a. *Mary invited to the party.
    b. Mary invited Susan to the party.

(2) a. the invitation to the party
    b. Mary’s invitation of Susan to the party
In other languages, argument-structure-changing morphology, such as causative, desiderative, applicative or passive suffixes, can adjust a sentence’s thematic structure so as to require additional arguments, or suppress extant arguments. The Hiaki desiderative suffix in (3b) introduces a new external argument, Peo; the applicative suffix in (3c) introduces a new internal argument, Jose. Some analyses of applicative constructions cross-linguistically include Baker (1992), Marantz (1993), McGinnis (2001), O’Herin (2001), Pylkkänen (2002), Georgala et al. (2008), among others. For discussion of desiderative affixes (glossed as DESID in [3]) in various languages, see Gerdts (1988), Johns (1999), Kim and Maling (1994), among others.

(3) a. Juan nooka.                                    [Hiaki]
       Juan speak
       ‘Juan is speaking.’
    b. Peo Juan-ta nok-ii’aa.
       Peo Juan-ACC speak-DESID.TR
       ‘Peo wants Juan to speak.’
    c. Juan Jose-ta nok-ria.
       Juan Jose-ACC speak-APPL
       ‘Juan is speaking for Jose.’
Similarly, English out-prefixation creates a transitive verb from an intransitive one, adding an internal argument; compare to swim and to outswim. The direction of influence can go the other way, from syntax to morphology, even with derivational morphology, however: English re-prefixation is incompatible with adjectival resultative constructions (and verb-particle constructions), even when the verb that is the target of prefixation is independently compatible with them. Out-prefixation has been discussed in Bresnan (1982), Keyser and Roeper (1984), Roberts (1985), Levin (1999), among others; for discussion of re-prefixation see Wechsler (1989), Keyser and Roeper (1992), Harley (2004), Lieber (2004), Williams (2006), Marantz (2010), among others.

(4) a. John opened the discussion (up).
    b. John reopened the discussion (*up).
    c. Bill cooked the meat (tender).
    d. Bill recooked the meat (*tender).
This is a small sampling of the kinds of patterns that an adequate theory of the morphology-syntax interface must account for in the derivational realm. The choice of a derivational structure is usually optional in the way that lexical choices are optional; if a Hiaki speaker decides not to use the applicative, for example, a periphrastic alternative is available in which the sentence in (3a), with an underived verb, is simply supplemented with a postpositional phrase, Peo-vetchi’ivo, ‘for Peo’. Nothing about the grammar forces the choice of a particular derived form. This optionality of marking is consistent with the observation that morphosyntactic derivation can be “gappy”. Although the Hiaki cases discussed above are completely productive, other syntactically relevant derivational processes can fail with particular stems or classes of stems. Such failures can be specific to particular affixes in combination with particular stems (compare, e.g., English electric~electricity, plastic~plasticity with metallic~*metallicity, dramatic~*dramaticity); alternatively, a gap may be due to a semantic/selectional clash between the affix and a class of stems. English agentive nominalization fails with stative transitive verbs (*knower, *wanter, *resemblant vs. teacher, writer, applicant), since such verbs are not compatible with agentive external arguments. A theory of the syntax-morphology interface thus also needs to attend to such morphosyntactic “gaps”.
2.2. Inflectional morphology and syntax

Inflectional morphology, in contrast, is generally mandatory, and is often driven by the presence of a particular syntactic configuration, rather than the other way around. A second person plural subject requires a particular form of the finite verb in French; there is no optionality:
(5) a. Vous parlez.                                   [French]
       you.PL speak.2PL.PRS
       ‘You are speaking.’
    b. *Vous parle/parles/parlent/parlons/parler
        you.PL speak/speak.2SG/speak.3PL/speak.1PL/speak.INF

Similarly, the Icelandic preposition frá ‘from’ requires a dative DP complement; other case forms are not appropriate:

(6) frá borginni/*borgina                             [Icelandic]
    from the.city.DAT/the.city.ACC
    ‘from the city’
Such relationships are paradigm cases of syntactic phenomena, in that they depend on structural, rather than linear, configurations, and persist over unbounded dependencies; the extracted plural subject which books in (7) triggers plural were no matter how many clause boundaries intervene:

(7) Which books_i did John say Martha claimed Bill thought … e_i were/*was on the table?
Case alternations represent one of the most salient interactions between inflectional morphology and syntactic structure, from the familiar patterns of the English passive to the complex relationship between case, transitivity and finiteness in the ergative/absolutive language Warlpiri (data from Woolford 2006 and Legate 2008). For an overview of approaches to ergative/absolutive case systems, see Aldridge (2008) and Deal’s chapter in this volume. (In the Warlpiri examples below, rlarni, glossed as OBVC, corresponds to the default non-finite complementizer, used for control by a matrix adjunct or when the non-finite clause has an overt subject.)

(8) a. Intransitive finite clause: Absolutive on subject      [Warlpiri]
       Ngaju ka-rna parnka-mi.
       I.ABS PRS-1SG run-NPST
       ‘I am running.’
    b. Intransitive nonfinite clause: Dative on subject
       … ngaju-ku jarda-nguna-nja-rlarni
       I-DAT sleep-lie-NFIN-OBVC
       ‘… while I was asleep’
    c. Transitive finite clause: Ergative on subject
       Karnta-ngku ka-rla kurdu-ku miyi yi-nyi.
       woman-ERG PRS-3.DAT child-DAT food.ABS give-NPST
       ‘The woman is giving the child food.’
    d. Transitive nonfinite clause: Dative on subject
       … karnta-ku kurdu-ku miyi yi-nja-rlarni
       woman-DAT child-DAT food.ABS give-NFIN-OBVC
       ‘… while the woman is giving food to the baby’
    e. Transitive nonfinite clause: Ergative on subject
       … ngati-nyanu-rlu karla-nja-rlarni
       mother-POSS-ERG dig-NFIN-OBVC
       ‘… while his mother was digging (for something)’
Any theory of the syntax-morphology interface must be able to provide a mechanism to account for these and other complex interactions between syntactic configuration and morphological form. Theoretical approaches to alternations between periphrastic and affixal representations of the same morphosyntactic feature, for example, vary significantly between different grammatical architectures; consider English comparative affixation, exemplified in (9), and Korean negative do-support in (10). For approaches to English comparatives, see Poser (1992), Kennedy (1999), Bhatt and Pancheva (2004), Kiparsky (2005), Embick (2007), among others. On Korean do-support, see Hagstrom (1996), Jo (2000), among others.

(9) a. smarter/more intelligent than Bill             *intelligenter
    b. greener/more verdant than my lawn              *verdanter
    c. heavier/more massive than an elephant          *massiver

(10) a. Chelswu-ka chayk-ul ilk-ess-ta.               [Korean]
        Chelswu-NOM book-ACC read-PST-DECL
        ‘Chelswu read the book.’
     b. Chelswu-ka chayk-ul ilk-ci ani ha-ess-ta.
        Chelswu-NOM book-ACC read-CI NEG do-PST-DECL
        ‘Chelswu did not read the book.’

In the former, the syntactic construction apparently fills a morphological paradigm gap created by the restrictive morphophonological selectional properties of the English comparative -er suffix, while in the latter, the morphology seems to provide an alternative verbal host when the syntax interposes a negative particle between the exponents of tense and the main verb stem, resulting in a periphrastic expression of tense in a negative clause and an affixal expression in an affirmative one.
2.3. Clitics and dimensions of wordhood

In the above discussion, we have not yet considered the notion of “word”, taking it for granted that the distinction between word-internal structure (morphology) and word-external structure (syntax) is not problematic. The default division of labor assumes that morphology governs the hierarchical and linear arrangement of word-internal structure, while syntax governs the hierarchical and linear arrangement of words themselves. Given this intuition about the machinery, it follows that there should be a clear notion of what a “word” is, that is, the kind of thing that morphology constructs and syntax manipulates. Syntactic manipulation of word-internal elements should be by definition impossible.
There are many contentious questions associated with this intuitive division, however. Perhaps the most salient cases which represent a direct challenge to this set of assumptions are clitics. Clitics are morphologically dependent bound forms, like other affixes, but are (often) linearly positioned as if they were syntactically independent. They form single word-sized units with their hosts, yet it is prima facie implausible that the host+clitic word represents a single syntactic terminal node. In many cases, it is implausible that these single phonological words represent a syntactic constituent of any kind. In such cases, it seems necessary to allow the syntax to manipulate and position these bound, word-internal exponents independently of the hosts to which they attach. Consider the formation of the phonological words in (11).

(11) a. That man over there’s working this morning.   /ðɛɹz/
     b. You aren’t helping.                           /ɑɹnt/
     c. You’re not helping.                           /jʌɹ/
(11a) illustrates a canonical example of a promiscuous "leaner" enclitic (Zwicky 1985; Klavans 1980; Selkirk 1996). It suffixes to a stem to its left. The stem which hosts it is not a member of the same syntactic constituent as the clitic itself; the clitic is the tensed auxiliary verb, and its stem there is embedded within a prepositional phrase embedded within the subject DP. Although it contains fewer phonological words, a sentence like (11a) receives the same syntactic analysis as the synonymous sentence with the full auxiliary is rather than the clitic 's − that is, not many analyses treat there's in such a sentence as a paradigmatically inflected form of there. Similarly, the syntax and truth-conditional semantics of (11b) and (11c) are identical, despite the fact that they contain significantly different sets of phonological words, [ˈjʌɹ ˈnɑt ˈhɛlpɪŋ] in one case, and [ˈjuw ˈɑɹnt ˈhɛlpɪŋ] in the other. The problems posed by clitics, however, extend beyond the treatment of apparently easily-characterized "leaners" like 's, or the treatment of bound variants of free forms whose placement is linearly identical to that of the corresponding free words, like 're. In the case of "special" clitics, like the Romance pronominal clitics, the syntax treats the bound forms significantly differently from the corresponding free forms, and, in fact, seems to use a dedicated variety of syntactic movement to get the clitic to its host. This phrasal movement can be recursively iterated, like other phrasal movements, as illustrated by the Spanish clitic-climbing data in (12) below (examples from Bok-Bennema 2005):

(12) a. Juana  quisiera             poder  hacer=lo.        [Spanish]
        Juana  want.COND.PST.3SG   can    do=it
        'Juana would want to be able to do it.'
     b. Juana  quisiera             poder=lo_i  hacer e_i
        Juana  want.COND.PST.3SG   can=it      do

     c. Juana  lo_i=quisiera          poder  hacer e_i
        Juana  it=want.COND.PST.3SG  can    do

Similar questions are raised by 2nd-position clitics, a relatively frequent phenomenon crosslinguistically. Clitics suggest that the foundational idea that phonological words are syntactic constituents − in particular, that phonological words correspond to syntactic terminal nodes − should not go unexamined. For example, if one were to draw a tree in which the phonological word you're (/jʌɹ/) occupied a single syntactic terminal node, what would the syntactic category of that node be? Is it a V of some kind, perhaps Aux or Infl? Or is it an N of some kind, a pronominal D, for example? From a larger perspective, such facts could even cause one to consider whether the evidence of affixation is relevant for the syntactic analysis at all. In any case, tightly interwoven sets of assumptions concerning the nature of wordhood and the locus of word-formation sharply distinguish many of the current principal frameworks for morphosyntactic analysis.
3. Frameworks

Theories of morphology vary considerably in the details of articulation of the interface between morphology and syntax. Compounding this variation is considerable variation in the theories of morphology and syntax themselves. This can often make it difficult to compare theories, but certain dimensions of analytical variation are useful in categorizing approaches; we survey some of them here.
3.1. ‘Word’ as an independent level of grammatical organization: lexicalism

One primary dimension of variation is that of being a “lexicalist” theory − a theory in which the notion “word” has an independent status, on a par with the notion “sentence”. Lexicalist theories subscribe to some version of the Lexical Integrity Hypothesis (Lapointe 1980), according to which words are built by distinct mechanisms, which are encapsulated from the mechanisms that create syntactic structure. In such theories, there are levels of representation and rules of grammatical structure dedicated to word-formation. Lexical insertion introduces these word forms into sentential syntactic structure, along with their concomitant feature structures. Conventions governing featural relationships in the syntactic structure enforce the required matches between, for example, the form of an inflected verb and its subject’s phi-features − if the features do not match, the syntactic configuration cannot be established. In lexicalist theories, syntax has no access to word-internal structure, and whole words must be syntactic constituents; word structure and syntactic structure are only related to each other by the lexical insertion operation. Most “correspondence”-type grammatical architectures, in which parallel and independent grammatical representations are placed in correspondence by interface constraints relating each submodule to the others, are lexicalist in this sense; Autolexical Syntax (e.g. Sadock 1991), the Parallel Architecture (e.g. Jackendoff 1997), Representation Theory (Williams 2003), Lexical Functional Grammar (e.g. Bresnan 2000), Role and Reference Grammar (e.g. Van Valin and LaPolla 1997) and the approach described in Ackema and Neeleman (2004) are all examples. Jackendoff (2010) describes Lamb’s (1966) Stratificational Grammar as the “granddaddy of them all.” An oversimplified schema of this kind of approach is provided in (13):
33. The Syntax-Morphology Interface
(13)  Word Formation Component     → OUTPUT: words
              ↕ correspondence rules
      Sentence Formation Component → OUTPUT: sentences
In such theories, mismatches between wordhood and syntactic constituency can be characterized by allowing nonoptimal correspondences to surface in cases where optimal morphological structure cannot be mapped to optimal syntactic structure. The theoretical action is in the development and explication of the constraints governing the correspondences, and their interactions and relative importance with respect to each other.
3.2. Derivation, inflection, and weakly lexicalist theories

Lexicalist theories can nonetheless vary with respect to the units that they identify as word-forms. A strongly lexicalist theory treats both inflectional and derivational forms as internally impenetrable to syntax; even the formation of clitic+host word-forms may be accomplished within the morphological component, as in the Head-driven Phrase Structure Grammar analysis of French clitics of Miller and Sag (1997), or the Paradigm Function Morphology theory described in Stump (2001). Lexical Functional Grammar is also a strongly lexicalist theory (see, e.g., Bresnan 2001). The ‘checking’ version of Minimalism proposed in Chomsky (1995) was also a species of strongly lexicalist theory. In contrast, weakly lexicalist theories treat derivational morphology as encapsulated with respect to syntax, but allow inflectional morphology to be determined by the syntactic component (rather than simply verifying a correspondence between inflectional marking and syntactic configuration). This corresponds to the intuition that derivational morphology can determine syntactic outcomes, while syntax seems to determine inflectional outcomes. In such theories, a word-formation component produces derivationally complex but uninflected word-forms (often termed “lexemes”). Lexemes are complete stems ready for inflection; their internal structure is insulated from syntactic inspection. Weakly lexicalist theories thus conform to the lexicalist hypothesis at the lexeme level. These word-forms acquire the morphosyntactic features relevant for their inflection by virtue of their position in the clause structure and their participation in certain syntactic relationships. The syntactic representation interacts with a second morphological component, which applies inflectional morpholexical rules to the newly featurally enriched stems to derive the appropriate inflected form of the lexeme. An oversimplified schema is provided in (14).
(14)  Word Formation Component I     → OUTPUT: lexemes
          ↓
      Sentence Formation Component   → OUTPUT: featurally annotated lexemes in sentential structure
          ↓
      Word Formation Component II    → OUTPUT: fully inflected lexemes in sentential structure

Inflectional morphology, then, is not encapsulated with respect to the syntax, though derivational morphology is; hence the “weak” in “weakly lexicalist hypothesis”. The theories proposed by Aronoff (1976, 1994) and Anderson (1992) fall into this general family. Both types of lexicalist theory are compatible with a relatively sparse view of the syntactic component (à la Jackendoff and Culicover 2005), using a minimum of functional superstructure. Neither derivational affixes nor inflectional features head their own projections in the syntactic tree. An internally morphologically complex noun like celebration and a monomorphemic one like party are equivalent in terms of their internal syntactic structure − neither has any − though not in their internal morphological structure.
3.3. The syntax of words: non-lexicalist approaches

Many approaches to complex word-formation, however, adopt a non-lexicalist architecture, in which the syntactic component constructs words and phrases alike. In such theories, there is no distinct word-formation module, and word-internal structure participates in syntactic operations and semantic interpretation in the same way that word-external structure does. Morphemes, not phonological words, are typically taken to correspond to individual syntactic terminal nodes. An oversimplified schema is provided in (15).

(15)  Structure Formation Component → OUTPUT: sentences

Most such approaches adopt a broadly Minimalist view of the structure-building component (Chomsky 1995, 2000, et seq.), but can vary considerably in the nature and amount of special mechanisms they employ to create the complex structures that will end up being realized as phonological word-size units. In such approaches, the more stringent selectional requirements of morphemes and their typically more rigid ordering requirements are not ascribable to differences between the operations available in different components of the grammar, but must have some other source; all structure is built by the iteration of the single operation which Chomsky calls Merge, to be carefully distinguished from the DM-specific postsyntactic operation of (M-)Merger.

At one end of the syntacticocentric spectrum one can find proposals which derive morpheme orders without any non-syntactic operations at all, often involving Kayne’s Antisymmetry Theory (Kayne 1994). Antisymmetry deterministically maps c-command relationships onto linear order. In this as in other syntacticocentric approaches, morphological pieces head the terminal nodes of syntactic projections. Because the Antisymmetry requirement forces all head-movement to be leftward and left-adjoining, head movement − movement of V° to T°, for example − will necessarily result in a [V°-T°]T° sequence, never in a [T°-V°]T° sequence. Complex words which apparently involve suffixation of a lower to a higher constituent (e.g. Zulu [Agr°-V°] sequences), or which appear to involve rightward head-movement (e.g. Japanese [V°-T°] sequences, which follow all other phrasal material), must then be derived via phrasal remnant movement, rather than head-movement. The repeated application of remnant movement results in a linearly adjacent string of heads in the appropriate syntactic position, which can then agglutinate into a single word-form, motivated by their affixal morphophonology. The V° preceding T° in Japanese, for example, would head a VP out of which all arguments have been moved. That VP occupies some specifier c-commanding the TP, headed by the T° element in situ. All arguments and other constituents occupy still higher specifier positions. In such a configuration, V° is the only phonologically realized element in a constituent which c-commands TP, whose only phonologically realized element is T°. By the LCA, c-command determines precedence, and so V° precedes T°.
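The LCA's mapping from asymmetric c-command to precedence is mechanical enough to model compactly. The sketch below is my own illustrative simplification, not this chapter's (or Kayne's) formalism: pairs of terminals standing in asymmetric c-command are fed to a topological sort, and the resulting order is the predicted linearization.

```python
# Illustrative simplification of LCA linearization (an assumption-laden
# toy model, not the chapter's formalism): each pair (x, y) records
# that x asymmetrically c-commands y; under the LCA, x then precedes y.
from graphlib import TopologicalSorter

def linearize(acc_pairs):
    ts = TopologicalSorter()
    for x, y in acc_pairs:
        ts.add(y, x)  # y is ordered after x
    return list(ts.static_order())

# Japanese-style remnant derivation: after remnant movement, the VP
# (containing only V) c-commands the TP (containing only T), so the
# LCA orders V before T.
print(linearize([("V", "T")]))  # → ['V', 'T']
```

The same function orders any chain of heads, e.g. `linearize([("C", "T"), ("T", "V")])` yields `['C', 'T', 'V']`; a cycle (symmetric c-command) raises an error, mirroring the LCA's requirement that linearizable structures be antisymmetric.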
See Koopman and Szabolcsi’s (2000) analysis of Hungarian verbs for a detailed example of such an approach, or Harbour (2008) for an analysis of discontinuous agreement morphology which appeals to the LCA to ensure that person morphemes appear as prefixes while number morphemes appear as suffixes.

At perhaps the opposite end of the syntacticocentric spectrum, the original Distributed Morphology framework (Halle and Marantz 1993) adopts the basic premise that individual morphemes instantiate terminal nodes in syntactic structure proper, but introduces several additional post-syntactic operations to manipulate the ordering, hierarchy and content of syntactic terminal nodes as they are mapped to morphophonology. The operations include reordering (“Morphological Merger”), insertion of new terminal nodes (“Dissociated Morphemes”), deletion (“Impoverishment”), unification (“Fusion”) and splitting (“Fission”), as well as allowing the stipulation of the prefixal or suffixal status of particular items. The net effect is to allow for a considerably looser correspondence between the order and hierarchy determined by the syntactic structure proper and the ultimate morphophonological form. However, in the unmarked case, individual morphemes are still expected to exhibit a corresponding correlation with the structure and interpretation of the clause; the additional operations introducing deviations from this idea are marked options, employed by a grammar only when necessary to accommodate otherwise unexpected data. Syntactic and morphological structure are interrelated because they are, fundamentally, the same thing. Many variations on the syntacticocentric approach occupy the non-lexicalist spectrum between these two extremes; one feature they tend to have in common, however, is a relatively elaborate functional superstructure in the syntactic tree. This is a natural consequence of assuming an overall one-to-one correspondence between syntactic terminal nodes and morphemes. Other clearly articulated syntacticocentric theories are represented in the work of Emonds (2000), Alexiadou (2001), Borer (2005), Baker (1996), Ramchand (2008), and Travis (forthcoming), among others; most of the above adopt the general syntacticocentric program within broadly Chomskyan syntactic theory but do not elaborate as extensively on morphophonological interface issues. See section 4 below for further discussion and exemplification of certain issues in the syntactic approach to word-formation. First, however, we briefly consider one more dimension of theoretical variation: realizational vs. projection-based approaches to morphology.
3.4. Late vs. Early Insertion theories: realization vs. projection and percolation

Weakly lexicalist theories, in which at least some phonological spell-out of word forms follows the syntactic derivation, were dubbed “realizational” by Anderson (1992), since morpholexical rules simply “realize” the morphological and morphosyntactic features of the grammar, rather than introducing features on their own. Beard’s (1995) Lexeme-Morpheme Base Morphology is realizational in this sense, although more radically so; all spell-out, of both derivation and inflection, occurs post-syntactically, interpreting the morphological and morphosyntactic features associated with a stem; see also Borer’s (1998: 153) discussion of type (2) vs. type (3) models. Realizational theories can be contrasted with “early insertion” approaches, like the syntacticocentric proposal of Lieber (1992, 2004), where sub-word phonological exponents are treated as true Saussurean signs (what Stump 2001 refers to as incremental theories of morphology). For Lieber, affixes, like roots and stems, are phonological strings listed in the lexicon with particular morphosyntactic features and meanings, and are built up into complex structures whose semantics derive from the features associated with each exponent by percolation. Distributed Morphology, in contrast, is aggressively realizational, like Beard’s Lexeme-Morpheme Base theory. All types of word-internal structure, inflectional and derivational, are assembled by the syntactic component. Rather than assembling actual affixes, as in Lieber’s theory, the syntactic component assembles abstract feature bundles, with associated interpretations. These bundles are only provided with phonological exponence postsyntactically, when a series of Vocabulary Items relating phonological information with morphosyntactic features compete for insertion into terminal nodes of the morphosyntactic structure. The realizational character of Distributed Morphology is characterized by the term “Late Insertion”, referring to the fact that insertion of phonological exponents follows all syntactic operations. This process is illustrated with a sample derivation in section 4.2 below.
3.5. Typology of theories of the morphosyntactic interface We can thus categorize theories of morphosyntax according to their theoretical choices on a number of dimensions. Are they lexicalist or non-lexicalist? Non-lexicalist theories have only a single structure-building component (the syntax), while lexicalist theories involve some form of generative lexicon, with a separate system of principles and operations, which builds complex word-forms, as well as a generative syntactic component that builds sentences. If a theory is lexicalist, is it strongly lexicalist or weakly lexicalist? Strongly lexicalist and non-lexicalist theories share the important characteristic of treating derivational and inflectional morphology in the same component of the grammar, making use of the same theoretical machinery, while weakly lexicalist theories posit a word-formation component which treats derivation and a separate postsyntactic component which treats inflection. On the other hand, non-lexicalist and weakly lexicalist theories share the property that syntax is responsible for some or all morphological structure building. Finally, one can ask of any theory, is it realizational or Saussurean? A further dimension of variation with some repercussions for the morphology-syntax interface is that between process-based and piece-based theories of morphological marking, which Stump (2001) terms inferential vs. lexical theories. This distinction depends on whether a theory views affixation as the central morphological operation, treating non-concatenative morphological processes such as ablaut as essentially incidental decoration, implemented via readjustment rules or the like. Such a concatenative view would be piece-based (“lexical”). 
The process-based (“inferential”) approach, in contrast, treats affixation, ablaut, truncation, and other morphophonological markers on an equal footing, viewing all morphological marking, concatenative or otherwise, as instantiating a rewriting function that changes the shape of a stem in a specified way in a particular featural context. The debate concerning this theoretical choice, however, has been little attended to in syntactic circles, being a more purely morphological issue, somewhat orthogonal to morphosyntax. It is worth noting, however, that most syntacticians tend to think in terms of piece-based, concatenative morphology, and hence most syntacticocentric morphological theories adopt this kind of view. If one’s morphological component is realizational and postsyntactic, however, there is no technical obstacle to adopting a process-based/inferential approach, e.g. like that of Anderson (1992). I now turn to a more detailed exposition of the development of syntacticocentric morphology, ultimately focusing on the non-lexicalist, realizational theory of Distributed Morphology (DM henceforth), and illustrate how typical DM analyses interact with the Minimalist syntactic architecture within which they are couched. The combination of Distributed Morphology and Minimalist syntactic theory allows the development of fully articulated analyses in which the mapping from syntax to both morphophonological form and to logical form can be (though often is not) quite fully specified. Significant theoretical problems nonetheless remain, of course, but the attraction of a fully-integrated system, in which morphological considerations have clear implications for specific syntactic and even model-theoretic semantic analyses, and vice versa, should be clear.
4. Syntacticocentric morphology

4.1. Brief history within Chomskyan linguistic theory

The line of syntactic analysis pursued in generative linguistic theory has undergone considerable alteration on many separate occasions, and the generative view of the relationship between syntax and morphology is no exception. The framework in Syntactic Structures was non-lexicalist (proposing, e.g., the Affix Hopping transformation to attach English verbal suffixes to their hosts), as was the generative semantics approach which emerged in the late 1960s and early 1970s (see, e.g., McCawley 1968), in which complex multiclausal structures could be collapsed via Predicate Raising and spelled out as a single morphological word. Partially in reaction to such proposals, Chomsky (1970) argued that certain derivational processes, such as nominalization, must occur in the presyntactic Base component, ushering in an era of Lexicalism within generative grammar. A strongly Lexicalist architecture, including an argument-structure-manipulating word-formation component as well as the syntactic component, was broadly adopted within generative grammar until the late 1980s; perhaps the most comprehensive proposal concerning the properties of that word-formation component and its interaction with syntax is put forward in Di Sciullo and Williams (1987).

At this point, two lines of research conspired to suggest that a non-lexicalist framework was perhaps better suited to capture certain properties of the syntax-morphology interface. Within Government and Binding theory, the work of Baker (1985, 1988) played a central role in this development. He showed that morphological structures “mirrored” syntactic structures in languages where morphological reordering is possible (Baker 1985). Consider, for example, the following two sentences from Hiaki:

(16) a. Maria-ta=ne          uka      uusi-ta    bwik-tua-sae.     [Hiaki]
        Maria-ACC=1SG.NOM   the.ACC  child-ACC  sing-CAUS-DIR
        ‘I am telling Maria to make the child sing.’

     b. Maria-ta=ne          uka      uusi-ta    bwik-sae-tua.
        Maria-ACC=1SG.NOM   the.ACC  child-ACC  sing-DIR-CAUS
        ‘I am making Maria tell the child to sing.’

Depending on the order of affixation of the directive (glossed DIR) and causative suffixes, the nominative first person subject is interpreted as the director or the causer, respectively, in precisely the same way that the periphrastic English translations vary in interpretation depending on the embedded or matrix position of the corresponding verbs. Based on similar facts concerning the relative scope and order of argument-structure affixes in various languages, Baker argued that the theory must explain why morphological structure should so closely track syntactic structure and semantic interpretation; identifying morphological and syntactic structure provided one way to do that. A related line of work was developed by Lisa Travis (1984). She observed that in structures where heads undergo reordering with other syntactic constituents (English auxiliary movement in question formation, for example), that reordering seemed to be subject to a rigid structural constraint, namely that it was limited to movement to the closest c-commanding head, being unable to skip any intervening c-commanding head.
(Such skipping would result in an XP intervening between the moved head and its trace, creating an Empty Category Principle (ECP) violation.) The properties of what Travis called the Head-Movement Constraint showed that head-movement was sensitive to the same kinds of constraints and structural relations as phrasal movement. The clinching development was Baker’s employment of head movement to derive noun-incorporation and verbal complex predicate structures in Mohawk and other languages. Baker pointed out that noun-incorporation in Mohawk was limited to incorporation of a direct object argument into its governing verb:

(17) Owira’a  waha’-wahr-ake’     [Mohawk]
     baby     AGR-meat-ate
     ‘The baby ate meat.’ (Baker 1988)
Incorporation of an external agent argument into the verb is ungrammatical in Mohawk. Baker pointed out that if noun-incorporation structures were created from a syntactic representation by head-moving independent morphemes, Travis’s Head Movement Constraint allows the agent-incorporation restriction to be explained syntactically. The explanation went essentially as follows: Because the trace of head-movement is subject to the ECP, like other traces of syntactic movement, ungoverned traces result in ungrammaticality. Subject noun-incorporation, moving “down” to incorporate into a verb which the subject c-commands, would leave such an ill-formed, ungoverned trace; this formulation is not without problems, as pointed out by, e.g. Borer (1998: 168). The corollary, of course, is that the approach also provided a way to derive the Mirror Principle. If syntactic head-movement assembles independent morphemes, obeying strict locality constraints, the Mirror Principle is predicted. The linear order of morphemes should match their position in the syntactic hierarchy, since the former are derived by head-movement of the latter, which must always be stepwise and upwards in the tree, due to the Head Movement Constraint. Consequently, variation in the order of morphemes as in the Hiaki example (16) above is predicted to reflect a variation in the syntactic structure which generated them, and hence is predicted to correlate with a variation in the interpretive scope of the various morphemes at LF. The combination of Travis’s and Baker’s proposals showed that treating certain complex morphological forms as syntactically derived could result in compelling accounts of some patterns of ungrammaticality in word-building, such as the ban on agent-incorporation. 
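The Mirror Principle prediction can be made concrete with a toy model of my own (not Baker's formalism): if each suffix attaches by one step of upward head movement, then listing the heads bottom-up fixes both the linear order and the scope, and the two Hiaki orders in (16) fall out from the two possible syntactic hierarchies.

```python
# Toy Mirror Principle model (an illustrative sketch, not Baker's
# formalism): successive upward head movement suffixes each higher
# head onto the complex formed so far, so the outermost suffix is the
# structurally highest head and takes widest scope.
def derive(root, heads_bottom_up):
    form = root
    for h in heads_bottom_up:
        form += "-" + h  # one step of head movement adds one suffix
    return form

# Hiaki (16): 'tua' = CAUS, 'sae' = DIR.
print(derive("bwik", ["tua", "sae"]))  # → bwik-tua-sae (DIR over CAUS)
print(derive("bwik", ["sae", "tua"]))  # → bwik-sae-tua (CAUS over DIR)
```

Because the affix string is simply the bottom-up record of the derivation, any reordering of the suffixes necessarily corresponds to a different hierarchy, and hence a different scope — which is exactly the Mirror Principle.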
At least some word-formation processes, besides sharing a common vocabulary of categories with syntax, seem to be subject to independently motivated, purely syntactic locality constraints, and are sensitive to purely syntactic structural relationships such as c-command. Consequently, the approach represented an explanatory advance over the postulation of independent mechanisms for the two. Once Rizzi (1990) unified the locality constraints on all three types of movement (head, A and A′) with the idea of Relativized Minimality, the era of economy-driven Minimalist syntactic architecture was ushered in.
4.2. Distributed Morphology

Distributed Morphology (Halle and Marantz 1993) is a syntacticocentric, realizational, piece-based, non-lexicalist theory of word (and sentence) formation. A Minimalist syntactic component constructs complex hierarchical representations from meaningful and syntactically atomic abstract items, such as [+pl]Num, [+def]D, √CAT, [+past]T, etc. These elements have selectional and featural requirements which must be satisfied in order for any representation containing them to converge at the interfaces with the sensory/motor apparatus and the conceptual/intentional apparatus − that is, at PF and LF, respectively. The abstract items undergo Merge, Move and Agree operations, incrementally forming larger and larger syntactic structures, until some pre-determined point at which Spell-Out occurs. The complex structure is then sent to Spell-Out − shipped off for interpretation at the interfaces, both PF and LF. It is the interface with PF that concerns us here.
4.2.1. Late Insertion, the Elsewhere Principle and Positions of Exponence

Let us consider a simplified sample derivation of a sentence in Distributed Morphology. Imagine an initial Numeration, input to the syntactic structure-building operations, consisting of abstract feature bundles such as those listed in (18):

(18) { [D +1, +pl, uNOM], [T +past, +NOM], [D +3, +pl, +ACC], [V KEEP, uACC] }

The syntax merges and moves these feature bundles to create a syntactic tree in which all the necessary feature-checking has been accomplished. For example, the uninterpretable Nominative case feature on [+1, +pl]D has been valued and deleted under an Agree relationship with the T° head; similarly the uninterpretable Accusative case feature on [+3, +pl]D has been valued and deleted under Agree with the transitive verb. The (simplified) tree in (19) is then handed off to Spell-Out, for interpretation at both LF and PF. At LF, the denotations of the abstract elements from the Numeration are accessed and composed into an interpretable representation via a finite set of licit composition operations − function application, predicate modification, type-lifting, and possibly others; each binary constituent is interpreted by composing the denotations of its daughter constituents.

(19) [TP [D +1, +pl, uNOM]i [T′ [T° +past, +NOM] [VP ti [V′ [V° KEEP, uACC] [D +3, +pl, +ACC]]]]]
Analogously, at PF, certain purely morphological operations apply to transform the abstract syntactic hierarchical structure into a linear string of phonologically interpretable segments. Certain operations can apply to the morphological terminal nodes to adjust their content or position: Impoverishment, Fission, and Fusion, among others. “Morphological Merger” is another example of such a postsyntactic operation. In English, the [+past] and [+present] T° nodes are ill-formed as free words on their own, so a morphological repair operation of M-merger applies prior to phonological realization. It attaches the T° terminal node to the V° terminal node, a modern implementation of Chomsky’s (1957) Affix-hopping operation, illustrated below:

(20) [TP [D +1, +pl, uNOM]i [T′ [T° +past, +NOM] [VP ti [V′ [V° KEEP, uACC] [D +3, +pl, +ACC]]]]]

     ⟶ M-merger ⟶

     [TP [D +1, +pl, uNOM]i [VP ti [V′ [V° [V° KEEP, uACC] [T° +past, +NOM]] [D +3, +pl, +ACC]]]]
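The adjacency-sensitive character of this repair, and the 'do'-insertion fallback discussed in the text below, can be sketched as a short program. This is my own simplified model, not the chapter's formalism: the string of terminal heads is scanned, T° suffixes onto an immediately following V°, and any intervener blocks the merger and triggers insertion of a dissociated 'do' node.

```python
# Simplified model of M-merger / do-support (an illustrative sketch):
# T attaches to a linearly adjacent V; an intervener such as Neg
# blocks the merger, triggering insertion of a dissociated 'do'.
def m_merger(heads):
    out, i = [], 0
    while i < len(heads):
        if heads[i] == "T" and i + 1 < len(heads) and heads[i + 1] == "V":
            out.append("V+T")   # M-merger: T suffixes onto adjacent V
            i += 2
        elif heads[i] == "T":
            out.append("do+T")  # repair: dissociated V node supports T
            i += 1
        else:
            out.append(heads[i])
            i += 1
    return out

print(m_merger(["T", "V"]))         # → ['V+T']
print(m_merger(["T", "Neg", "V"]))  # → ['do+T', 'Neg', 'V']
```

The two runs correspond to English "kept" (T hops onto the adjacent verb) versus "did not keep" (Neg intervenes, so a dissociated 'do' is inserted to support T).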
This operation, by hypothesis, is sensitive to adjacency in the hierarchical syntactic structure, so, for example, if a Neg° head intervenes, or if T° has head-moved to the C° position, it cannot occur, and instead an alternative morphological repair applies, involving insertion of a “dissociated” V° terminal node to support T° (‘do’-insertion, see Bobaljik 1994). At this point, the structure is ready for phonological material, and Vocabulary Insertion can occur. Each terminal node in the syntactic structure has an associated position-of-exponence, which must be discharged by insertion of a Vocabulary Item. To accomplish this, a list of phonological pieces is accessed. These pieces, termed Vocabulary Items, are pairs of phonological and morphosyntactic information. Crucially, the morphosyntactic information can be underspecified: a Vocabulary Item’s morphosyntactic features must be a subset of the features in the terminal node it realizes: it can have fewer features than the terminal node, for example, but it cannot have more features than are present in the node, or conflicting features. During the Vocabulary Insertion operation, all Vocabulary Items which are compatible with the terminal nodes of the representation under consideration compete to realize the available positions-of-exponence. In keeping
with Kiparsky’s (1973) Elsewhere Principle, only the most highly specified VI will be inserted and discharge the position of exponence, though see Caha (2009) for an interesting alternative formulation of competition in terms of a superset principle. So, for example, assuming that the VI we (/wij/) is specified for [+1, +pl, +nom], and the VI I (/aj/) is specified for [+1, +nom], both will be compatible with the terminal node of the D° head in subject position in our toy derivation, and both will compete for insertion at that node. However, we will be inserted, blocking I, as it is more highly specified. Let us assume the following Vocabulary Items and feature specifications:

(21) a. we    ↔ [+1, +pl, +nom]
     b. I     ↔ [+1, +nom]
     c. them  ↔ [+3, +pl, +acc]
     d. it    ↔ [+3]
     e. kep-  ↔ [KEEP] / ____ [+past]
     f. keep  ↔ [KEEP]
     g. -ed   ↔ [+past]
The Vocabulary Items I, we, it, kep-, keep, -ed, and them are all compatible with the available positions, but only the most highly specified VI at each slot succeeds in actually realizing the terminal node. The competition is illustrated at the bottom of the tree. Note that in earlier Distributed Morphology proposals, lexical items like KEEP would have been treated as undifferentiated roots, acquiring both Encyclopedic content and phonological content at the Late Insertion operation. However, later argumentation, notably that presented in Pfau (2009), suggests that roots must be differentiated in the syntactic component. This allows me here to treat the irregular past stem kep- as a competing allomorph of keep, rather than with a morphological readjustment rule; see e.g. Siddiqi (2009) for discussion of this and other alternatives. Following Vocabulary Insertion, then, the representation includes a hierarchically organized linear string of phonogical segments which can then be interpreted by the phonogical component, undergoing phonologically conditioned allomorphy (in this case, past tense [d] will be devoiced to [t] in the environment of voiceless [p] by virtue of the phonotactic requirements on English complex codas), stress assignment, phrasal phonology and reduction processes. The approach allows a natural account of mirror principle effects, while still providing mechanisms which permit structures to surface in which the mirror principle fails to hold. It provides an explicit account of the relationship between syntax and morphology, and most it calls for only a single generative engine − it requires no special wordbuilding mechanisms in the lexical component. That is, there is no need for a separate level of paradigmatic structure to generate inflected and/or derived word forms, or to capture the blocking effect. 
Further, it should be clear that the syntactic Agree operation of Minimalist theory, being subject to locality constraints, will make clear predictions concerning the possible positions and featural content of terminal nodes in the structure. The mechanisms by which inflectional morphology is interpreted, positioned, and realized in the theory, as well as the relevance of syntactic operations for morphological form and vice versa, should be clear. For more explicit discussion of many of the details, see Embick and Noyer (2006) and references cited therein.
33. The Syntax-Morphology Interface

(22)
[Tree diagram (22): the TP of the toy derivation, showing the positions-of-exponence at the subject D ([+1, +pl, uNOM]), V° ([KEEP]), T° ([+past]), and the object D ([+3, +pl, +ACC, uACC]). Winning VIs: we, kep-, -ed, them. Competing but losing VIs, eligible for insertion but not the most highly specified: (I), (keep), (it).]
4.3. High/low derivation and the Elsewhere Principle

One of the most productive types of analysis that has emerged from the above research program focusses on word-formation on the idiosyncratic/productive cusp. Individual derivational affixes can be very polyfunctional, instantiating what seem to be distinct but related processes. English -ed, for example, forms adjectives, verbal passives and perfective participles as well as the past tense. The -ing suffix generates deverbal nouns, gerunds, and the progressive participle. Japanese -sase can generate both lexical causation in inchoative/causative pairs and productive causation in other contexts. Typically,
the domain of application of such polyfunctional morphology is broader in some of its functions than others; -sase, for example, can generate a productive causative from almost any independent verb stem, but is blocked from forming lexical causatives with many inchoative verb roots by irregular causative suffixes specific to a small class of roots. English adjectival participles are more irregular than English passive participles, e.g. adjective molten vs. participle melted and adjective open vs. participle opened, so regular -ed has a narrower domain of application in the former than the latter. In modern syntacticocentric approaches, this pattern in which one affix can span the derivational/inflectional divide in a series of increasingly productive functions can be analyzed as involving the lower or higher attachment of a head containing the relevant conditioning feature in the extended projection of a given category. The mechanism of morphological competition means that a polyfunctional lexical item can be treated as the “elsewhere”, most underspecified vocabulary item which realizes that feature. Consequently, that item will realize that same feature in multiple different syntactic contexts, when it is not blocked by a more specific, locally conditioned vocabulary item. The lower in the tree, closer to the lexical root, the greater the likelihood of both semantic and morphological irregularity and the more “lexical” the alternation; the higher in the tree, the more functional structure intervening between the relevant node and the lexical root, the more productive and regular the alternation; see Embick (2010) for discussion of the idea that conditioning of irregular morphology by the root is impossible across a phase boundary.
In the case of higher attachment we expect to see evidence of the presence of these functional projections in the form of syntactic and semantic properties associated with the functional categories which are included in the derived form, and we expect the absence of such properties in the low-attachment forms. An example of an early proposal of this type is Kratzer (1996), in which Abney’s distinction between of-ing and acc-ing gerunds in English is reinterpreted as an argument concerning the locus of attachment of the -ing suffix. In Kratzer’s proposal, there are two separate projections in the syntax of the VP: the VP itself, which selects for the internal argument, and a higher VoiceP, which introduces the external argument and is responsible for the assignment of accusative case to the object. Kratzer proposes that -ing can attach high, to the VoiceP projection above the VP, or lower, to the simple VP, and that this variation in size of the complement of -ing accounts for the variable presence of accusative case: The subject noun phrase in Mary’s reading Pride and Prejudice upset Bill involves an -ing attached outside VoiceP, so accusative case is available for the object DP Pride and Prejudice within the nominal, while the -ing attaches low, to the VP, in Mary’s reading of Pride and Prejudice upset Bill. In the latter case, although the verb in the VP is able to select for its internal argument Pride and Prejudice, there is no accusative case available due to the absence of VoiceP in the internal structure of the nominal; a Last Resort operation thus must insert of to inherently case-mark the object. The same piece of morphology is inserted into both the higher and lower nominalizing heads, as no more specific nominalizer is present to block its insertion in one or the other environment, but the resulting forms have importantly distinct properties.
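The dependency Kratzer posits between the size of the -ing complement and the availability of accusative case can be caricatured in a few lines. The sketch below is entirely my own illustration (the function and its arguments are invented, not a published formalism); it encodes only the conditional: VoiceP in the complement licenses accusative on the object, and a bare VP triggers of-insertion as a Last Resort.

```python
# Toy sketch of Kratzer-style high vs. low attachment of -ing.
# "VoiceP" complement: accusative case available for the object DP.
# "VP" complement: no accusative; Last Resort of-insertion applies.

def gerund(subject, verb, obj, complement):
    assert complement in ("VoiceP", "VP")
    if complement == "VoiceP":
        # VoiceP licenses accusative on the object.
        return f"{subject}'s {verb}ing {obj}"
    # No VoiceP, no accusative: insert "of" to case-mark the object.
    return f"{subject}'s {verb}ing of {obj}"

print(gerund("Mary", "read", "Pride and Prejudice", "VoiceP"))
# Mary's reading Pride and Prejudice
print(gerund("Mary", "read", "Pride and Prejudice", "VP"))
# Mary's reading of Pride and Prejudice
```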
For high/low attachment analyses of participial, nominalizing and causativizing morphology in a variety of languages, see Miyagawa (1994, 1998), Marantz (1997), Travis (2000), Embick (2004), Fortin (2004), Jackson (2005), Alexiadou and Anagnostopoulou (2005), Svenonius (2005), among others.
33. The Syntax-Morphology Interface
1147
4.4. Hybrid approaches: Borer

Despite the productivity of the conceptual framework described in the previous section, important problems remain, which certain types of alternative approaches are to date considerably more adept at accommodating. For example, the important contrast between Argument Structure deverbal nominals and morphophonologically identical Result nominals identified by Grimshaw (1990) is difficult to account for purely in terms of high vs. low attachment of nominalizing morphology. AS nominals, as in The Romans’ destruction of the city took three days, exhibit a number of verbal properties, including mandatory selection of internal arguments and rich event-structural semantics; R nominals, as in The destruction was extensive, on the other hand, exhibit no such properties, despite having identical word-internal structure ([[de-struct]-ion]). Particularly difficult for the DM approach to account for are R nominals which contain overt verbal morphology, such as nominalization: [[[[nomin]√-al]a-iz]v-ation]n. These nominals are systematically ambiguous between an AS reading (as in Frequent nominalization of verbs is typical of academic writing) and an R reading (as in That nominalization bears stress on the penultimate syllable). As pointed out by Borer (2003), Ackema and Neeleman (2004) and Alexiadou (2009), these structures pose a significant problem, in that they contain overt verbalizing morphology (-ize). In DM, since each morpheme realizes a syntactic terminal node, the mere presence of -ize should definitively diagnose the presence of verbal structure. However, on the R reading of such nominalizations, no verbal syntactic structural properties are present; that was the crucial result discovered by Grimshaw (1990). Borer argues that this type of alternation can be handled more perspicuously in a model in which the same primitives can enter into word-formation operations either presyntactically or in the syntax proper.
That is, a lexical mechanism of word-formation is available in the grammar alongside the syntactic mechanism. This parallel model is a natural development of earlier approaches to the kind of high/low attachment data described in the previous section. Earlier approaches characterized this kind of variation in terms of affix attachment to lexical items vs. affix attachment to phrases, as in the treatments of adjectival vs. verbal passive forms proposed in Jackendoff (1977) and Borer (1984). The hybrid morphological system proposed in Borer (1991, 1998), called Parallel Morphology, develops this line of thought. In the Parallel Morphology framework, lexical entries (affixes and roots) can participate in structure-building operations in two ways. They may be assembled into words presyntactically, in a word-formation component which does not involve projection of any phrase-level constituents. The output of this presyntactic structure building enters the phrasal syntactic derivation just like any other word-form, lacking complex internal syntactic structure of any kind. Thus, the R nominalization nominalization can enter the syntax as a head of category N, and subsequently project and behave just like any other N element, mono- or multi-morphemic. Alternatively, individual affixes can enter the syntax independently, generating phrasal projections just like any other syntactic terminal node, and getting assembled into complex multi-morphemic words like nominalization via syntactic mechanisms such as head-movement, or post-syntactic mechanisms such as M-merger. In that case, each root and affix will project a syntactic XP, and the syntactic selectional requirements associated with each root and affix will need to be satisfied. Hence, the presence of a verbal VP
projection headed by -ize entails that AS nominals will need to satisfy verbal selectional restrictions, resulting in the mandatory presence of the syntactic arguments of the verb, and the interpretive effects associated with VP functional structure are predicted to be present, producing the event-structure semantics of such forms. Insofar as high/low attachment analyses have merit, they are certainly still implementable in Parallel Morphology terms, with the additional option of deriving complex words presyntactically as well. For a full discussion of some of the above issues, including the nature and interpretation of the verbal functional structure which may be contained within nominalizations, see Alexiadou (2010).
5. Conclusion

The above discussion can only hope to scratch the surface of the active research questions concerning the morphology-syntax interface, providing some pointers to relevant recent work and a rough overview of the rapidly evolving theoretical terrain. It should be clear, however, that there are some enduring and recurring themes, particularly concerning the difference or lack thereof between syntactic and morphological structure-building − the difference between lexicalist and non-lexicalist approaches. Empirical and methodological advances of recent years, however, may help shed new light into these dark corners. One development of primary importance is the explosion of theoretical research on lesser-studied and typologically distinct languages over the past decade or more. Many assumptions about the word/phrase distinction that in retrospect turned out to be valid only for English and typologically similar languages have had to be abandoned. The upshot of this influx of new empirical insight is that some strongly lexicalist and strongly non-lexicalist approaches are rather surprisingly converging on similar architectures, in that more and more syntactic structure-building is accomplished in the lexical component of lexicalist approaches like HPSG, and we have seen that non-lexicalist approaches like DM are moving more and more, or even all, morphological structure-building into the syntactic component. When syntactic and morphological structures are both being assembled with the same generative mechanism, the problem of the syntax-morphology interface has been handled in a reductive way − namely, there is no such interface; syntax and morphology are two names for the same thing.
Another development which promises to strongly inform our future understanding of the syntax-morphology relationship is the increasing use of psycholinguistic methodologies to probe the time course of on-line morphological processing and production, and the increasing recognition of the relevance of such research to theoretical questions. Of course such methodologies have been in use for several decades, and have yielded considerable insight already (not even touched on above, unfortunately). However, as theories come to make more and more fine-grained predictions, it is possible that such methodologies will become indispensable in deciding between alternative hypotheses; self-guided introspection about well-formedness becomes increasingly unreliable. And such methodologies are becoming more and more portable and accessible, enabling researchers to use these tools to study languages that are not easily accessible within the university laboratory context − many of which, as noted above, have typological properties that promise to shed considerable new light on the morphology-syntax interface. I look forward very much to watching the state of the art evolve over the coming decade.
6. References (selected)

Ackema, Peter, and Ad Neeleman 2004 Beyond Morphology: Interface Conditions on Word Formation. Oxford: Oxford University Press. Aldridge, Edith 2008 Generative approaches to ergativity. Language and Linguistics Compass 2(5): 966−995. Alexiadou, Artemis, and Elena Anagnostopoulou 2005 On the syntax and morphology of Greek participles. Talk presented at the Workshop on the Morphosyntax of Modern Greek, LSA Institute, July 2005. Alexiadou, Artemis 2001 Functional Structure in Nominals: Nominalization and Ergativity. Amsterdam: John Benjamins. Alexiadou, Artemis 2009 On the role of syntactic locality in morphological processes: the case of (Greek) derived nominals. In: A. Giannakidou, and M. Rathert (eds.), Quantification, Definiteness and Nominalization, 253−280. Oxford: Oxford University Press. Alexiadou, Artemis 2010 Nominalizations: A probe into the architecture of grammar, Parts I and II. Language and Linguistics Compass 4(7): 496−523. Anderson, Stephen R. 1986 Disjunctive ordering in inflectional morphology. Natural Language and Linguistic Theory 4: 1−32. Anderson, Stephen R. 1992 A-morphous Morphology. Cambridge: Cambridge University Press. Aronoff, Mark 1976 Word Formation in Generative Grammar. Cambridge, MA: MIT Press. Aronoff, Mark 1994 Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA: MIT Press. Baerman, Matthew, Dunstan Brown, and Greville G. Corbett 2005 The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press. Baker, Mark C. 1985 The mirror principle and morphosyntactic explanation. Linguistic Inquiry 16(3): 373−415. Baker, Mark C. 1988 Incorporation. Chicago: The University of Chicago Press. Baker, Mark C. 1992 Thematic conditions on syntactic structures: evidence from locative applicatives. In: I. M. Roca (ed.), Thematic Structure: Its Role in Grammar, 23−46. Berlin: Walter de Gruyter. Baker, Mark C. 1996 The Polysynthesis Parameter. Oxford: Oxford University Press.
Beard, Robert 1995 Lexeme-Morpheme Base Morphology. New York: SUNY Press.
Bejar, Susana, and Milan Rezac 2003 Cyclic agree. Lisbon Workshop on Agreement. Universidade Nova de Lisboa. Bejar, Susana 2003 Phi-syntax: a theory of agreement. Ph.D. dissertation, University of Toronto. Bhatt, Rajesh, and Roumyana Pancheva 2004 Late merger of degree clauses. Linguistic Inquiry 35(1): 1−45. Bobaljik, Jonathan David 1994 What does adjacency do? In: H. Harley, and C. Phillips (eds.), MIT Working Papers in Linguistics 22: The Morphology-Syntax Connection, 1−32. Bok-Bennema, Reineke 2005 Clitic Climbing. In: M. Everaert, and H. van Riemsdijk (eds.), The Blackwell Companion to Syntax. Blackwell Publishing. Blackwell Reference Online. 30 Aug. 2009. http://www.blackwellreference.com.ezproxy2.library.arizona.edu/subscriber/tocnode?id=g9781405114851_chunk_g978140511485116. Borer, Hagit 1984 The Projection Principle and rules of morphology. Proceedings of NELS 14: 16−33. Borer, Hagit 1991 The causative-inchoative alternation: A case study in parallel morphology. Linguistic Review 8: 119−58. Borer, Hagit 1998 Morphology and syntax. In: A. Spencer, and A. M. Zwicky (eds.), The Handbook of Morphology, 151−190. Oxford: Blackwell. Borer, Hagit 2003 Exo-skeletal vs. endo-skeletal explanations: syntactic projections and the lexicon. In: J. Moore, and M. Polinsky (eds.), The Nature of Explanation in Linguistic Theory, 31−67. Stanford: CSLI Publications. Borer, Hagit 2005 Structuring Sense, volumes I and II. Oxford: Oxford University Press. Bresnan, Joan 1982 Polyadicity. In: J. Bresnan (ed.), The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press. Bresnan, Joan 2001 Lexical Functional Syntax. Cambridge: Blackwell. Caha, Pavel 2009 The Nanosyntax of Case. Ph.D. thesis, University of Tromsø. Carmack, Stanford 1997 Blocking in Georgian verb morphology. Language 73(2): 314−38. Chomsky, Noam 2000 Minimalist inquiries: The framework. In: R. Martin et al. (eds.), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, 89−155.
Cambridge, MA: MIT Press. Chomsky, Noam 1970 Remarks on nominalization. In: R. Jacobs, and P. Rosenbaum (eds.), Readings in English Transformational Grammar, 184−221. Waltham, MA: Blaisdell. Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris. Chomsky, Noam 1995 The Minimalist Program. Cambridge: MIT Press. DiSciullo, Anne-Marie, and Edwin Williams 1987 On the Definition of Word. Cambridge: MIT Press. Embick, David, and Rolf Noyer 2006 Distributed Morphology and the syntax-morphology interface. In: G. Ramchand, and C. Reiss (eds.), Oxford Handbook of Linguistic Interfaces, 289−324. Oxford: Oxford University Press.
Embick, David, and Rolf Noyer 2001 Movement operations after syntax. Linguistic Inquiry 32(4): 555−595. Embick, David 2004 On the structure of resultative participles in English. Linguistic Inquiry 35(3): 355−392. Embick, David 2007 Blocking effects and analytic/synthetic alternations. Natural Language and Linguistic Theory 25(1): 1−37. Embick, David 2010 Localism versus Globalism in Morphology and Phonology. Cambridge, MA: MIT Press. Emonds, Joseph E. 2000 Lexicon and Grammar: The English Syntacticon. Berlin and New York: Mouton de Gruyter. Fortin, Catharine R. 2004 Minangkabau causatives: Evidence for the l-syntax/s-syntax division. Talk presented at AFLA 2004, April 2004, ZAS Berlin. Georgala, Effi, Waltraud Paul, and John Whitman 2008 Expletive and thematic applicatives. In: C. B. Chang, and H. J. Haynie (eds.), Proceedings of the 26th West Coast Conference on Formal Linguistics, 181−189. Somerville, MA: Cascadilla Press. Gerdts, Donna B. 1988 Semantic linking and the relational structure of desideratives. Linguistics 26: 843−872. Grimshaw, Jane 1990 Argument Structure. Cambridge, MA: MIT Press. Gurevich, Olga 2006 Constructional Morphology: The Georgian Version. Ph.D. dissertation, UC Berkeley. Hagstrom, Paul 1996 do-support in Korean: Evidence for an interpretive morphology. In: H-D. Ahn, M-Y. Kang, Y-S. Kim, and S. Lee (eds.), Morphosyntax in Generative Grammar (Proceedings of 1996 Seoul International Conference on Generative Grammar), 169−180. Seoul: The Korean Generative Grammar Circle, Hankwuk Publishing Co. Halle, Morris, and Alec Marantz 1993 Distributed Morphology and pieces of inflection. In: K. Hale, and S. J. Keyser (eds.), The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, 111−176. Cambridge, MA: MIT Press. Harbour, Daniel 2008 Discontinuous agreement and the morphology-syntax interface. In: D. Harbour, D. Adger, and S. Béjar (eds.), Phi-theory: Phi-features across Modules and Interfaces, 185−220.
Oxford: Oxford University Press. Harley, Heidi 2004 Merge, conflation, and head movement: The First Sister Principle revisited. In: K. Moulton and M. Wolf (eds.), Proceedings of NELS 34, UMass Amherst, GLSA. Jackendoff, Ray, and Peter Culicover 2005 Simpler Syntax. Oxford: Oxford University Press. Jackendoff, Ray 1997 The Architecture of the Language Faculty. Cambridge, MA: MIT Press. Jackendoff, Ray 2010 The parallel architecture and its place in cognitive science. In: B. Heine, and H. Narrog (eds.), The Oxford Handbook of Linguistic Analysis, 645−668. Oxford: Oxford University Press. Jackson, Eric 2005 Derived statives in Pima. Paper presented at the SSILA Annual Meeting, 7 January 2005. http://www.linguistics.ucla.edu/people/grads/ejackson/SSILA05PimaDerivedStatives.pdf
Jo, Jung-Min 2000 Korean do-support revisited: Its implications for Korean verbal inflections. Chicago Linguistics Society 36(2): 147−162. Johns, Alana 1999 On the lexical semantics of affixal ‘want’ in Inuktitut. International Journal of American Linguistics 65(2): 176−200. Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge: MIT Press. Kennedy, Christopher 1999 Projecting the Adjective: The Syntax and Semantics of Gradability and Comparison. New York: Garland. Keyser, Samuel J., and Tom Roeper 1992 Re: The Abstract Clitic Hypothesis. Linguistic Inquiry 23(1): 89−125. Keyser, Samuel J., and Tom Roeper 1984 On the middle and ergative constructions in English. Linguistic Inquiry 15: 381−416. Kim, Soowon, and Joan Maling 1994 Case assignment in the siphta construction and its implications for Case on adverbs. In: R. King (ed.), Descriptions and Explanation in Korean Linguistics, 133−168. Ithaca, NY: East Asia Program, Cornell University. King, Tracy H. 1994 SpecAgrP and Case: Evidence from Georgian. In: H. Harley, and C. Phillips (eds.), MIT Working Papers in Linguistics 22, 91−110. Cambridge, MA: MIT Press. Kiparsky, Paul 1973 Elsewhere in phonology. In: S. R. Anderson, and P. Kiparsky (eds.), A Festschrift for Morris Halle, 93−106. New York: Holt, Rinehart and Winston. Kiparsky, Paul 2005 Blocking and periphrasis in inflectional paradigms. Yearbook of Morphology 2004: 113−135. Klavans, Judith 1980 Some aspects of a theory of clitics: The syntax-phonology interface. Ph.D. dissertation, University College, University of London. Koopman, Hilda, and Anna Szabolcsi 2000 Verbal Complexes. Cambridge: MIT Press. Lamb, Sydney 1966 Outline of Stratificational Grammar. Washington: Georgetown University Press. Lebeaux, David 1986 The interpretation of derived nominals. In: A. Farley, P. Farley, and K. McCullough (eds.), Chicago Linguistics Society 22, 231−247. Chicago. Legate, Julie Ann 2008 Morphological and abstract Case. Linguistic Inquiry 39(1): 55−101.
Levin, Beth 1999 Objecthood: An event structure perspective. Chicago Linguistic Society 35: 223−247. Lieber, Rochelle 1992 Deconstructing Morphology. Chicago: University of Chicago Press. Lieber, Rochelle 2004 Morphology and Lexical Semantics. Cambridge: Cambridge University Press. Lomashvili, Leila, and Heidi Harley (Forthcoming) Phases and templates in Georgian agreement. Studia Linguistica. Marantz, Alec 1989 Relations and configurations in Georgian. Ms. University of North Carolina. Chapel Hill.
Marantz, Alec 1993 Implications of asymmetries in double object constructions. In: S. A. Mchombo (ed.), Theoretical Aspects of Bantu Grammar 1, 113−150. Stanford, CA: CSLI Publications. Marantz, Alec 1997 No escape from syntax: Don’t try a morphological analysis in the privacy of your own lexicon. In: A. Dimitriadis, L. Siegel, et al. (eds.), University of Pennsylvania Working Papers in Linguistics 4.2, Proceedings of the 21st Annual Penn Linguistics Colloquium, 1997, 201−225. McCawley, James 1968 Lexical insertion in a transformational grammar without Deep Structure. Chicago Linguistics Society 4: 71−80. McGinnis, Martha 2001 Variation in the phase structure of applicatives. Linguistic Variation Yearbook 1: 105−146. Miller, Philip H., and Ivan A. Sag 1997 French clitic movement without clitics or movement. Natural Language and Linguistic Theory 15(3): 573−639. Miyagawa, Shigeru 1994 (S)ase as an Elsewhere Causative. In: Program of the Conference on Theoretical Linguistics and Japanese Language Teaching, Tsuda University, 61−76. Miyagawa, Shigeru 1998 (S)ase as an Elsewhere Causative and the Syntactic Nature of Words. Journal of Japanese Linguistics 16: 67−110. O’Herin, Brian 2001 Abaza applicatives. Language 77: 477−493. Pfau, Roland 2009 Grammar As Processor: A Distributed Morphology Account of Spontaneous Speech Errors. Amsterdam: John Benjamins. Poser, William 1992 Blocking of phrasal constructions by lexical items. In: I. Sag, and A. Szabolcsi (eds.), Lexical Matters, 111−130. Stanford: CSLI. Pylkkänen, Liina 2002 Introducing Arguments. Ph.D. dissertation, MIT. Rizzi, Luigi 1990 Relativized Minimality. Cambridge, MA: MIT Press. Roberts, Ian 1985 The representation of implicit and dethematized subjects. Ph.D. dissertation, UCLA. Sadock, Jerrold M. 1991 Autolexical Syntax: A Theory of Parallel Grammatical Representations. Chicago: University of Chicago Press. Selkirk, Elisabeth O. 1996 The Prosodic structure of function words. In: J. L. Morgan and K.
Demuth (eds.), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, 187−213. Mahwah, NJ: Lawrence Erlbaum. Siddiqi, Dan 2009 Syntax within the Word: Economy, Allomorphy and Argument Selection in Distributed Morphology. Amsterdam: John Benjamins. Stewart, Thomas 2001 Georgian agreement without extrinsic ordering. The Ohio State University Working Papers in Linguistics 56: 107−133. Stump, Gregory T. 2001 Inflectional Morphology: A Theory of Paradigm Structure. (Cambridge Studies in Linguistics 93.) Cambridge: Cambridge University Press.
Svenonius, Peter 2005 Two domains of causatives. Talk presented at CASTL, March 10, 2005. http://www.hum.uit.no/a/svenonius/papers/Svenonius05TwoDomains.pdf Travis, Lisa 1984 Parameters and effects of word order variation. Ph.D. dissertation, Massachusetts Institute of Technology. Travis, Lisa 2000 Event structure in syntax. In: C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax, 145−185. CSLI, Stanford. Travis, Lisa (Forthcoming) Inner Aspect. Dordrecht: Springer. Van Valin, Robert, Jr., and Randy LaPolla 1997 Syntax: Structure, Meaning, and Function. Cambridge: Cambridge University Press. Wechsler, Stephen 1989 Accomplishments and the Prefix re-. NELS XIX. Williams, Edwin 2003 Representation Theory. Cambridge, MA: MIT Press. Williams, Edwin 2006 Telic too late? Ms, Princeton University. Woolford, Ellen 2006 Case-agreement mismatches. In: C. Boeckx (ed.), Agreement Systems, 299−316. Amsterdam: John Benjamins. Zwicky, Arnold 1985 Clitics and particles. Language 61: 283−305.
Heidi Harley, Tucson (USA)
34. Phonological Evidence in Syntax

1. Introduction
2. Phrasing
3. Prominence
4. Intonational tunes
5. Conclusion
6. References (selected)
Abstract

Linear precedence is one of the key sources of evidence for the syntactic structure of complex expressions, but other aspects of the phonological representation of a sentence, such as its prosody, are often not considered when testing syntactic theories. This overview provides an introduction to the three main dimensions of sentence prosody − phrasing, prominence and intonational tune − focusing on how they can enter syntactic argumentation.
1. Introduction

The prosody of an utterance can be characterized as all those phonological and phonetic properties that are not determined by the choice of words and morphemes it contains or their linear order, but rather by how they relate to each other syntactically and semantically, by what aspects of the utterance are foregrounded and backgrounded, and by the role of the utterance in discourse. If certain aspects of the phonology of an utterance systematically reflect syntactic structure, then we should be able to make inferences about syntactic structure by looking at phonological evidence. Linear precedence − a phonological property − is standardly used as evidence in current syntactic research, but the same is not true for prosodic evidence. The relationship between syntactic constituency and linear order is usually taken to be entirely systematic, even if assumptions about linearization often remain implicit. An assumption shared across many syntactic theories is that there is a deterministic way in which syntactic structure maps to linear order. Examples of theories relating the two include the Linear Correspondence Axiom of Kayne (1994), and subsequent work in the antisymmetric framework; other recent approaches include current versions of OT syntax (e.g., Sells 2001), representation theory (Williams 2003), or the theory of cyclic linearization presented in Fox and Pesetsky (2005). Prosodic evidence, on the other hand, is generally considered to be an unreliable witness for syntactic structure, partly because it is harder to establish what the precise prosodic structure of an utterance really is, and partly because the relationship between syntax and prosody is considered to be more indirect and malleable.
This article provides an overview of types of prosodic evidence and of different theories of how they relate to syntax, and presents arguments that prosody may be a much more useful tool of syntactic analysis than is generally assumed, even if our understanding of the relation between the two is still emerging. Three dimensions of prosodic structure are discussed: prosodic phrasing, prosodic prominence, and intonational tunes. They are in principle orthogonal to each other in that they can be varied independently, although there are some intricate ways in which they interact. The weighting between the three parts of the paper is very uneven, with most of the discussion focusing on prosodic boundaries, followed by prominence, and only a very brief discussion of intonational tunes. This weighting probably reflects the amount of attention given to these domains in the more syntactically oriented existing literature on prosody, but this is not to say that they are not ultimately of equal importance from a syntactic point of view. The differential attention they receive here is simply a reflection of the state of the field and also of an attempt at keeping the length of the chapter under control. I will mostly use examples from English and French for illustration.
2. Phrasing

It is uncontroversial that prosodic phrasing reflects syntactic constituent structure at least to some extent. The following two utterances differ in the location of the boundary between the two clauses they include. Although they use the same words, typical renditions are acoustically quite different; the two readings are thus disambiguated prosodically:

(1) a. If you wait | around it’ll come.
    b. If you wait around | it’ll come.
There is disagreement, however, about the nature of this relationship. In the following, we look first at diagnostics for prosodic phrasing, and then at theories that try to explain its relation to syntax.
2.1. Diagnostics for prosodic phrasing

How does one establish the prosodic phrasing of an utterance? The following sections review various types of criteria. They are usually assumed to converge on a single structure, an assumption which itself is not unproblematic.
2.1.1. Intonational cues to phrasing

One criterion for identifying prosodic domains is tonal events that align with the main prominence or the edges of phonological constituents. A common assumption is that pitch contours can be analyzed in terms of a sequence of tonal targets, consisting of high (H) and low (L) tones, and combinations thereof, following the lead of Pierrehumbert (1980). In the now often used convention for transcribing intonation called ToBI (Silverman et al. 1992), starred tones identify pitch accents, while tones with following diacritics (e.g., L-, L%) represent boundary tones of certain types of phonological constituents (e.g., intermediate phrase, intonational phrase). However, the proper phonological interpretation of pitch contours remains controversial (cf. Dilley 2005). A salient acoustic difference between (1a, b) is the temporal location of certain modulations of fundamental frequency, called boundary tones. A transcription of a typical rendition of (1a, b) following the ToBI convention looks as follows:

(2) a. If you wait | around it’ll come.
           H*    H-  H*          H*  L-L%
    b. If you wait around | it’ll come.
           H*     H*     H-  H*      L-L%
34. Phonological Evidence in Syntax
A fundamental assumption of ToBI and related representations of sentence prosody is that the intonational contour of a sentence can be broken down into pitch accents, which align with prominent syllables, and boundary tones, which align with the boundaries of certain prosodic constituents. In the examples in (2a, b), the H- boundary tones associated with what ToBI calls intermediate phrases are placed in different locations, which in turn reveals the difference in the underlying syntactic structure. Intonational cues play an important role in chunking the signal into prosodic pieces, and these chunks bear a systematic relationship to syntactic constituent structure. The analysis of pitch contours in terms of tones makes very specific assumptions about the phonological inventory of tones in a given language. French, for example, has a quite different intonational system from English. In French, there is a tonal domain that is sometimes referred to as the accentual phrase. Its end is usually delimited by a high pitch accent on the last non-schwa syllable of the phrase (often preceded by a low target). The accent is often followed by a rise if another accentual domain follows, or by a boundary tone if it occurs at the end of an intonational phrase. The annotation of a typical intonation contour in French could look as follows (cf. Post 2000; Jun and Fougeron 2000; Féry 2001; Welby 2003):

(3) Lorenzo est un petit enfant.   [French]
    L    H*         L    H* L%
Just as in the case of English, there are conflicting opinions about how French prosody should be transcribed, including on whether tones should be linked to particular vowels or syllables at all. Recent analyses of French intonation and summaries of earlier approaches can be found in Post (2000); Jun and Fougeron (2000), and Welby (2003). It is important to keep in mind that how a pitch contour is transcribed depends heavily on what phonological assumptions are made about the tonal inventory of the particular language: the same pitch contour will receive radically different transcriptions depending on the assumed transcription system. It is thus risky to run an experiment and report only transcription-based results, since, as transcription systems change over time, it will become harder and harder to interpret the results and compare them with new ones. There is thus an important advantage in also reporting quantitative measures based on theory-independent and easily replicable landmarks, or even in making available the entire data set including the acoustic files, which of course requires special consent from the speakers involved in a study.
2.1.2. External sandhi

Phonological processes that apply across word boundaries are often referred to as external sandhi processes. Selkirk (1972), Nespor and Vogel (1986), and many others use French liaison to exemplify external sandhi. French liaison consists of the pronunciation of an otherwise unpronounced word-final consonant, usually when a word beginning with a vowel follows (Selkirk 1972: 206):
(4) a. Lorenzo est petit en comparaison de Jean.   [French]
    b. Lorenzo es[t] un peti[t] enfant.
In (4a), both the [t] at the end of est and the one at the end of petit are lost, but for different reasons. Est is followed by a consonant-initial word, petit, and hence there is no liaison. The [t] of petit, in turn, remains unpronounced as well, even though a vowel-initial word follows: petit does not form a liaison context with the following word, due to the syntactic relation between the two. In (4b), on the other hand, both [t]s are preserved, since both words are followed by vowel-initial words and, in addition, bear the right syntactic or prosodic relation to the following words. Hence liaison applies. Nespor and Vogel (1986) propose that liaison applies within a certain phonological domain, the phonological phrase, and that whether words are phrased together into such a unit depends on their syntactic relation to each other. More generally, work within prosodic phonology assumes that phonological processes that seem to be sensitive to syntax are in fact sensitive to certain phonological domains, and it is in the formation of these domains that syntax indirectly exerts its influence on phonological processes. An analysis of liaison in prosodic terms holds that in (5a), but not in (5b), a prosodic boundary separates petit from the following syllable en and blocks liaison:

(5) a. Lorenzo | est petit | en comparaison de Jean.   [French]
    b. Lorenzo | es[t] un peti[t] enfant.
Looking at phonological processes that apply across word boundaries may thus provide evidence for prosodic domains. In English, as in French, there are such external sandhi processes. One is flapping, which turns [t/d] into [ɾ] at the onset of an unstressed syllable, as in (6b):

(6) a. If you wai[ʔ] | around it’ll come.
    b. If you wai[ɾ] around | it’ll come.
The difference between (6a, b) suggests that flapping is bounded by a prosodic domain. The glottalization in (6a), on the other hand, illustrates a phonological process that happens at the end of a prosodic domain. Phonological processes that apply at the edges of prosodic domains are another very common type of phenomenon that can be used to motivate prosodic phrasing. These segmental cues can in fact disambiguate syntactic structures. Scott and Cutler (1984), for example, present evidence from perception experiments that segmental cues such as flapping can by themselves disambiguate structures when all other acoustic cues to phrasing are held constant.
2.1.3. Quantitative effects The choice between a flap, a glottal stop, and a [t], the choice between pronouncing a liaison consonant or dropping it, and the presence and absence of a tonal boundary tones − all of these might constitute categorical differences (although this is not obvious
at least in the case of flapping). However, there are also cues for prosodic phrasing that are clearly quantitative. Lehiste (1973), looking at various structural ambiguities and their prosody, found duration to be the most reliable cue in disambiguating them. The importance of durational cues in signaling prosodic phrasing has been confirmed in many studies since (e.g., Price et al. 1991; Wightman et al. 1992). There are, in fact, various durational effects induced by prosodic phrasing. Pre-boundary lengthening in English, for example, mostly affects the last syllable preceding a boundary and stressed syllables (Turk and Shattuck-Hufnagel 2000). In addition to pre-boundary lengthening, there are also durational effects at the beginning of prosodic domains: Fougeron and Keating (1997) found a strengthening in the articulation of phrase-initial segments, resulting in longer duration. Both pre-boundary lengthening (Wightman et al. 1992) and domain-initial strengthening (Fougeron and Keating 1997) have been found to distinguish boundaries of different strengths by different degrees of lengthening; they are thus gradient cues to relative boundary strength. Another important cue for prosodic grouping is pitch. A percept of juncture can be induced by the relative pitch scaling between accents and by pitch resets between phrases. Resets, which some theorists represent as phonological events analogous to boundary tones (Truckenbrodt 2002), are perceived as discontinuities and form a strong cue for prosodic boundaries (Pijper and Sanderman 1994). Relative pitch scaling between phrases signals prosodic grouping (Ladd 1988; Féry and Truckenbrodt 2005), and van den Berg et al. (1992) have shown that scaling can be used in multiply embedded structures to signal several layers of bracketing. Quantitative interactions at the segmental level also provide important evidence for prosodic grouping.
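To illustrate how such gradient durational cues can be quantified, consider the following sketch. It is a toy construction of my own, not an implementation from the studies cited; the juncture labels, durations, and baseline statistics are invented for illustration. It ranks word junctures by the z-scored duration of the pre-boundary syllable:

```python
# Toy sketch of pre-boundary lengthening as a gradient boundary cue.
# Real analyses normalize durations per segment type and per speaker;
# here, baseline means and standard deviations are simply assumed.

def z_score(duration_ms, baseline_mean_ms, baseline_sd_ms):
    """Standardize an observed duration against its baseline."""
    return (duration_ms - baseline_mean_ms) / baseline_sd_ms

def rank_boundaries(junctures):
    """junctures: (label, observed_ms, baseline_mean_ms, baseline_sd_ms).
    Returns labels ordered from strongest to weakest lengthening cue."""
    scored = [(z_score(d, m, s), label) for label, d, m, s in junctures]
    return [label for _, label in sorted(scored, reverse=True)]

# Hypothetical measurements for "If you wait | around it'll come.":
junctures = [
    ("wait|around", 310, 200, 40),    # z = 2.75: strong lengthening
    ("around|it'll", 210, 200, 40),   # z = 0.25: near baseline
]
print(rank_boundaries(junctures))     # → ['wait|around', "around|it'll"]
```

A real study would of course combine such durational scores with the pitch and segmental cues discussed above rather than rely on any one of them.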
The degree of coarticulation and gestural overlap can reveal whether or not words are separated by a prosodic boundary. The divide between coarticulation and full assimilation as observed in sandhi phenomena may not always be clear-cut, but various previous studies have found clear patterns that contrast full assimilation and partial gestural overlap (Niebuhr et al. 2011). An example is palatalization in English (Zsiga 1995):

(7) a. did you → di[ʤ]u
    b. want you → wan[ʧ]u
Palatalization is more common in fast speech than in careful speech (Cooper and Paccia-Cooper 1980), and is blocked if a “major syntactic boundary” which coincides with a prosodic boundary intervenes (Cooper et al. 1978). While palatalization can be complete, it can also apply gradiently (Zsiga 1995), and thus it can provide gradient cues to how cohesive a unit two words form.
2.1.4. Contradictory diagnostics and phonological representation

According to the influential theory of the prosodic hierarchy, prosodic boundaries of differing strengths are categorically different, and these categories are organized in a hierarchical way. Phonological processes are characterized by applying within or at the
edge of prosodic domains of a certain category, and prosodic categories in turn can thus be identified by considering whether certain rules apply or not. Phonological categories lower on the hierarchy define boundaries that are weaker than boundaries defined by categories higher on the hierarchy:

(8) Prosodic Hierarchy (Selkirk 1986: 384):
    Utterance > Intonational Phrase > Phonological Phrase > Prosodic Word > Foot > Syllable
The prosodic hierarchy hypothesis holds that every utterance contains a full suite of prosodic constituents, and that each one properly contains one or more constituents of the next lower kind. Recursive nesting of prosodic constituents is assumed to be impossible, although Ladd (1986, 1988); Dresher (1994); Truckenbrodt (1995); Wagner (2005b, 2010); Ito (2010); Selkirk (2011) and Elfner (2012) argue that some level of recursion should be allowed. If all of the above-listed diagnostics for prosodic phrasing, both the various quantitative phonetic cues and the categorical phonological cues such as pitch events and sandhi phenomena, are really reflexes of the same prosodic representation, then a straightforward expectation is that they should converge on one and the same prosodic phrasing. This, however, is not always the case. A prime example is French. While the phrasing motivated by accent placement in French is by all accounts in tune with quantitative cues to phrasing (Jun and Fougeron 2000), the same is not true for liaison. Post (2000); Fougeron and Delais-Roussarie (2004); Miller and Fagyal (2005) and Pak and Friesner (2006) all find that liaison sometimes applies across prosodic boundaries, sometimes even major ones. Furthermore, liaison quite often fails to apply within a prosodic domain although it should, at least according to theories that closely link certain prosodic domains with the application of certain processes. There is considerable variation, and it is clear that lexical frequency and collocational frequency play a substantial role (Miller and Fagyal 2005). This may not be too surprising, given that liaison in French is heavily morphologized and lexicalized, amounting in many instances to phonologically conditioned allomorphy. However, it implies that we cannot, in fact, use liaison as a criterion for establishing surface prosodic phrasing.
It also means that liaison domains do not fit into a prosodic hierarchy, since contradictions with intonationally motivated boundaries exist. This may in fact be quite characteristic of a fair number of the sandhi processes that were used to establish the syntax−phonology relationship in the literature on prosodic phonology. Chen (1987), for example, points out that prosodic boundaries induced by focus do not affect the domains for tone sandhi in Xiamen (Taiwanese), and that even an intonational phrase break can interrupt them. So while Chen’s insights into the formation of sandhi domains were one of the main empirical motivations of the so-called edge-marking approach to the mapping of syntax to phonology (cf. Selkirk 1986), Chen’s paper actually also provides counter-evidence against this theory. These inconsistencies would warrant a closer look at the syntax and prosody of Xiamen, and at how sandhi and focus
domains relate to it. More evidence for contradictions that arise when different phonological processes are used simultaneously as criteria for establishing prosodic phrasing is discussed for Luganda in Hyman et al. (1987) and for Kimatuumbi in Odden (1990). Other processes, such as flapping and palatalization in American English or Mandarin tone-3 sandhi (Chen 2000), do not interact with morphology in the way liaison does and do not appear to be sensitive to the particular syntactic environment, but they are sensitive to speech rate. Even here, however, there is some evidence that lexical factors play a role, at least in the case of flapping. High-frequency words are more likely to undergo flapping than low-frequency words (Patterson and Connine 2001), and whether flapping occurs across word boundaries or not depends on the mutual predictability between two words and on collocational frequency (Gregory et al. 1999). Also, Cooper and Paccia-Cooper (1980) note that flapping does not necessarily occur even when there is no prosodic break intervening, and Ranbom et al. (2009) report evidence from a corpus study that flapping sometimes even occurs when the segment is followed by a prosodic boundary or a pause. The influence of lexical frequency, the influence of mutual predictability, and the possibility of sandhi processes across even strong boundaries call into question a widely held assumption of the prosodic hierarchy, namely that prosodic domains completely determine whether or not a phonological rule will apply. One interpretation is that it is actually incorrect that the domain of segmental phonological rule application is necessarily and sufficiently characterized by prosodic domains. Alternatively, this might point to two different levels of representation that phonological rules can be sensitive to: an early, more morpho-syntactically motivated parse and a later surface-prosodic one (Kaisse 1985; Seidl 2001; Pak 2008).
In any event, when investigating the relationship between phonology and syntax it is crucial to distinguish between processes that reflect the surface prosodic phrasing and processes that reflect syntactic or morphological relations directly. This point was already discussed in Kaisse (1985), but it has sometimes been overlooked in the literature since. An alternative perspective on sandhi phenomena like flapping is that they are part of a family of cues to relative boundary strength, along with intonational, segmental, and durational cues. Rather than being causal reflexes of rules applying within constituents of a particular phonological category, they form a set of correlated but in principle orthogonal cues for relative boundary strength. Whether or not a particular sandhi rule applies is just one signal that contributes to the percept of a certain boundary strength, and conversely many other factors may influence whether a sandhi rule applies or not. As will be discussed in more detail in the next section, this alternative way of looking at things is still compatible with the idea that flapping serves as a cue for surface prosodic phrasing, and indirectly for syntactic structure: if there is flapping across one but not across another boundary, this may still be a strong cue for a difference in relative prosodic boundary strength between those two boundaries. Arguments in favour of relative rather than absolute boundary differences are given in Wagner (2005b, 2010). Naive listeners’ intuitions about prosody usually agree to a large extent on relative prominence and relative boundary strength, making an analysis that posits relative strength as a primitive plausible, even if these observations are compatible with a categorical interpretation (Pijper and Sanderman 1994; Cole et al. 2010).
The relational view of the syntax-phonology mapping also fits well with recent findings that in perception, prosodic boundaries are interpreted relative to earlier prosodic boundaries (Clifton et al.
2002; Frazier et al. 2006), and with findings that in sentence production later boundaries are scaled relative to earlier occurring boundaries (Wagner 2005b). An even more radical departure from the received view would be to argue that for processes like flapping, the conditioning environment can in fact be characterized entirely segmentally: [t/d] flaps if a vowel follows, unless it is in the onset of a stressed syllable. The interaction with prosody could then be entirely indirect: across a prosodic boundary it is less likely that the following segmental context will already be planned out. Speakers do not consistently encode the next phonological word, so the conditioning environment may simply not be present at the point in time when the allomorph choice is made. The crucial assumption of this view, plausible in itself, is that the prosodic strength of the boundary between two words correlates with the likelihood that the segmental content of the beginning of the second word will already be phonologically encoded at the time when the first word is planned (cf. Levelt 1992; Wheeldon and Lahiri 1997; Miozzo and Caramazza 1999; Wheeldon and Lahiri 2002). Such an account of the constraints on sandhi phenomena, based on the locality of production planning, is presented in Wagner (2011, 2012c). Linking phonological locality directly to online processing and the locality of speech production is unconventional from the point of view of the generative tradition, but it actually makes for a highly parsimonious and modular characterization of the interaction between syntax and phonology. It also fits well with recent psycholinguistic models of phonological processing, which seek the explanation for certain phonological patterns in the interaction of processing constraints and the planning or retrieval of phonological representations (cf. Goldrick 2011).
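The logic of such a planning-based account can be caricatured in a few lines. The following is a toy model of my own, not the implementation in the works cited: the flapping rule itself is purely segmental, but the next word's segments are only visible at planning time with a probability that drops as boundary strength grows (the particular probability function is an arbitrary assumption).

```python
import random

def context_planned(boundary_strength, rng):
    """Toy assumption: the probability that the next word's segments are
    already phonologically encoded falls off with boundary strength."""
    return rng.random() < 1.0 / (1.0 + boundary_strength)

def realize_t(next_vowel_initial, next_onset_stressed, boundary_strength, rng):
    """Purely segmental rule: /t/ flaps before a vowel that is not the
    onset of a stressed syllable, but only if that context is visible."""
    if (context_planned(boundary_strength, rng)
            and next_vowel_initial and not next_onset_stressed):
        return "flap"
    return "t"

rng = random.Random(1)
n = 10000
# Weak boundary ("wait around"): flapping is frequent but variable.
weak = sum(realize_t(True, False, 0.2, rng) == "flap" for _ in range(n)) / n
# Strong boundary ("wait | around"): flapping is rare, but not impossible.
strong = sum(realize_t(True, False, 4.0, rng) == "flap" for _ in range(n)) / n
print(weak > strong)  # flapping rate drops as boundary strength grows
```

The model reproduces two properties discussed above without any reference to prosodic domains in the rule itself: variability even within a phrase, and occasional flapping across strong boundaries.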
This move toward a theory informed by processing considerations has the desirable effect that it can connect the phonological substance of the structural change and the structural environment of a process to the particular type of morpho-syntactic locality the process should obey, and it can also make predictions about whether or not the process will be variable. Both types of predictions lie outside of what current models of the syntax-phonology interface can explain.
2.1.5. Summary: Diagnostics for phrasing

The main lesson to draw from the discussion of diagnostics for prosodic phrasing is that when evaluating whether syntax and phonology are in tune, it is crucial to use multiple tests for prosodic structure and to ensure that they converge on the same structure. In cases of contradiction, the quantitative phonetic cues and the evidence from less morphologized and less lexicalized phonological and phonetic processes are most likely to reveal the surface prosodic bracketing.
2.2. The nature of the relation between syntax and prosody

Most current linguistic theories propose a rather direct relationship between syntax and phonology and postulate a systematic mapping between syntactic and prosodic bracketing. They vary in the precise mapping function and the degree of isomorphism between the two structures they assume. There are, however, other theories that assume a much more indirect relationship between the two, and try to explain intuitions about prosodic
grouping as a reflex of processing factors such as the cost of planning upcoming constituents and of integrating preceding ones. The loci of high processing cost may correlate up to a point with syntactic constituency. The emphasis of the following review will be on theories that posit a more direct, syntax-based mapping, but the processing-based approach has been gaining momentum in recent years, and since it provides a more parsimonious explanation by invoking general processing principles to account for prosodic phenomena, it constitutes the null hypothesis against which grammatical accounts have to be evaluated. Eurythmic effects on phrasing, such as the tendency toward binary prosodic constituents (e.g., Selkirk 2011) or toward constituents of equal size (e.g., Ghini 1993), complicate the picture, and they themselves might relate to ease of processing or to grammatical constraints. The precise way in which syntactic and eurythmic factors interact will not receive much attention in this review.
2.2.1. Is prosodic structure isomorphic to syntax?

Most early quantitative work on prosodic boundaries, such as Lehiste (1973); Grosjean and Collins (1979); Cooper and Paccia-Cooper (1980) and Gee and Grosjean (1983), assumed a close match between syntax and phonology. The basic intuition is that constituents that form prosodic units also form syntactic constituents in surface structure. This assumption was also at the heart of most generative approaches up until the 1980s. Consider the recursive algorithms proposed to derive a sentence-level prosodic representation in Chomsky et al. (1957) and Chomsky and Halle (1968). There, the phonological representation of an utterance is a translation of the tree structure, including the segments contained in the lexical entries at the terminal nodes, into a linear segmental stream. Hierarchical relations are encoded by using n-ary sequences of # symbols: every syntactic node is translated into a pair of # symbols that flank whatever phonological material the constituents dominated by that node contain. A simple right-branching structure [ a [ b c ]] might correspond to the following transcription:

(9) [ a [ b c ]] → ##a###b##c###
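This #-insertion scheme can be made precise with a short recursive function. The following is a sketch of my own (the nested-list encoding of trees is an assumption for illustration); every node, terminal or not, contributes one pair of # symbols:

```python
def spe_boundaries(tree):
    """Translate a constituent tree into an SPE-style boundary string:
    each node wraps the material it dominates in a pair of # symbols."""
    if isinstance(tree, str):                      # terminal node: a word
        return "#" + tree + "#"
    return "#" + "".join(spe_boundaries(child) for child in tree) + "#"

# The right-branching structure [ a [ b c ]] from (9):
print(spe_boundaries(["a", ["b", "c"]]))  # → ##a###b##c###
```

Running the function on [ a [ b c ]] yields exactly the transcription in (9): the three # between a and b reflect the closing of a, the opening of the [b c] node, and the opening of b.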
An issue with the particular mapping assumed in Chomsky et al. (1957) and Chomsky and Halle (1968), noted in Wagner (2005b), is that it runs into problems when it comes to keeping constituents that are internally less complex on a par prosodically with more complex constituents that are syntactically at the same level. These cases require a fix similar to the Stress Equalization Principle proposed for the assignment of prominence in Halle and Vergnaud (1987). In principle, this prosodic transcription of constituent structure could be pushed to be completely isomorphic and represent the entire tree structure in terms of boundaries of different strengths (as represented by the number of # symbols). But the claim in Chomsky and Halle (1968) was actually a weaker one: only certain nodes, roughly lexical words, NPs, and CPs, have the property that they induce boundaries around the words flanking them (similar to today’s ‘phases’). The mapping from syntax to phonology thus effectively “flattens” the tree structure into a less articulated prosodic structure which only partially encodes the syntactic tree it is based on. This captures the intuition
that many expressions that are assumed to involve an articulated phrase structure are mapped to a comparatively flat sequence of prosodic phrases; for example, a right-branching structure (e.g., complex VPs in English) often maps to a sequence of phonological units separated by boundaries of equal strength. If one assumed an isomorphic mapping with boundary symbols added around every node, this would suggest that every right-branching structure necessarily has a stronger juncture between a and b than between b and c, contrary to fact. It is not the case, however, that an isomorphic mapping necessarily cannot account for the intuition that prosodic structure is “flatter” than syntax. There are many conceivable one-to-one functions between syntax and prosody. Conveying many different degrees of prosodic boundary strength seems to be difficult for production or perception reasons, so representing every syntactic embedding by boundaries of different strengths would be an inefficient use of phonological resources. Wagner (2005b) argues that there could still be a one-to-one mapping between a syntactic structure and a prosodic structure (sometimes overridden for eurythmic reasons) if certain articulated syntactic structures were mapped to a sequence of prosodic phrases that is prosodically ‘flat’ in that they are separated by boundaries of equal size. The basic intuition of this approach is that syntactic sisters are matched in prosodic size, as first proposed in Wagner (2005a: 359). This basic case is generalized to all nodes in certain right-branching structures that are derived mono-cyclically. The crux of the analysis is that there can be an isomorphic mapping between syntax and prosody, even though the phonology appears to be ‘flatter’ than the syntax.

Such a mapping between syntax and phonology is an efficient use of phonological resources if (i) distinguishing prosodic boundaries of different strength is hard to process, and (ii) most structures are right-branching. A right-branching bias in syntax is exactly what is argued for on independent grounds in Haider (1993 et seq.) and Phillips (1996). The concrete model in Wagner (2005b) assumes that any structure assembled in a single cycle is right-branching, and that the pieces assembled are mapped to prosodic constituents separated by boundaries of equal strength; that is, they are matched in prosodic size. In other words, the mapping can still be isomorphic and yet account for the apparent flatness of prosodic structure compared to syntactic structure. A simpler assumption would be to posit that prosodically flat right-branching structures are actually syntactically ‘flat’ and involve more than two sister constituents (Culicover and Jackendoff 2005), in which case ‘sister-matching’ (Wagner 2005a) would be a sufficient account. The choice between these two accounts may in the end depend on theory-internal considerations (Wagner 2005b). A more recent account of sister-matching, formalized in an OT framework, is given in Myrberg (2013).
Such a mapping between syntax and phonology is an efficient use of phonological resources if (i) distinguishing prosodic boundaries of different strength is hard to process, and (ii) most structures are right-branching. A right-branching-bias in syntax is exactly what is argued for on independent grounds in Haider (et seq. 1993) and Phillips (1996). The concrete model in Wagner (2005b) assumes that any structure assembled in a single cycle is rightbranching, and the pieces assembled are mapped to prosodic constituents separated by boundaries of equal strength, they are matched in prosodic size. In other words, the mapping can be still be isomorphic, and yet account for the apparent flatness of prosodic structure compared to syntactic structure. A simpler assumption would be to posit that prosodically flat right-branching structures are actually syntactically ‘flat’ and involve more than two sister constituents (Culicover and Jackendoff 2005), in which case ‘sistermatching’ (Wagner 2005a) would be a sufficient account. The choice between these two accounts may in the end depend on theory-internal considerations (Wagner 2005b). A more recent account for sister-matching, formalized in an OT framework, is made in Myrberg (2013). Of course, the assumption of isomorphism might be too strong a theory of how phonology reflects syntax, and the proposal in Wagner (2005b) actually notes that distinctions among boundaries of lower ranks that the algorithms derives are often not realized phonology and “washed out.” So just as the proposal in SPE and almost all approaches since then, the actual mapping function proposed there is not isomorphic but merely homomorphic, that is, not all the details of syntactic bracketing is recoverable from prosody. Different homomorphic theories can be classified by which aspects of syntactic structure they preserve, and it is useful to make explicit how families of theories differ from each other.
2.2.2. Monotonic theories vs. non-monotonic theories

An assumption that is much weaker than a perfect isomorphism is shared by many theories of how syntax maps to prosody, including Chomsky and Halle (1968), Wagner (2010), and many others. These theories all assume a ‘monotonic’ mapping, in the following sense. The following is a slightly revised version of the Hypothesis on Attachment and Prosody proposed in Wagner (2010), who argues for a monotonic view of the relationship between syntax and phonology, with limited adjustments for eurythmic phonological reasons.

(10) Monotonicity
A mapping from syntactic constituent structure to a prosodic representation is monotonic if the following holds: in a sequence A B C, if the prosodic boundary separating A and B is weaker than the one separating B and C, then there is a node dominating AB that excludes C; if it is stronger, then there is a node dominating BC that excludes A.

The hypothesis that the mapping is monotonic, although much weaker than assuming an isomorphism, still makes strong and easily falsifiable predictions. An influential theory that does not assume (10), and is hence non-monotonic, is the theory of Edge-Alignment (Chen 1987; Selkirk 1986). Monotonicity is also not assumed in the mapping proposed in Nespor and Vogel (1986). At least some of the mismatches that the theory of Edge-Alignment is designed to capture, however, may only be apparent, and disappear once the syntax is looked at more closely. I will turn to (apparent) mismatches in section 2.3. Most recent proposals in the literature actually seem to have moved away from making strong claims about non-monotonic mapping. For example, Selkirk’s (2011) Match Theory abandons the systematic non-monotonic predictions of Edge-Alignment, and argues for a monotonic mapping, modulo phonological factors that can override it. This is effectively the position argued for in Wagner (2005b).
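Definition (10) lends itself to a mechanical check. The sketch below is my own formalization, not drawn from the literature cited: trees are nested tuples with words as strings, and boundary strengths are numbers between adjacent words in linear order. It tests whether a given assignment of boundary strengths is monotonic with respect to a given tree.

```python
def spans(tree, start=0):
    """Collect the terminal spans (half-open intervals) of all nodes."""
    if isinstance(tree, str):                       # terminal node
        return start + 1, {(start, start + 1)}
    end, out = start, set()
    for child in tree:
        end, sub = spans(child, end)
        out |= sub
    out.add((start, end))
    return end, out

def is_monotonic(tree, strengths):
    """Check (10) for every adjacent triple A B C: a weaker A|B boundary
    requires a node dominating AB that excludes C, and vice versa."""
    n, nodes = spans(tree)
    assert len(strengths) == n - 1
    for i in range(1, n - 1):                       # B is the i-th word
        left, right = strengths[i - 1], strengths[i]
        if left < right and not any(a <= i - 1 and b == i + 1 for a, b in nodes):
            return False                            # no [A B] excluding C
        if left > right and not any(a == i and b >= i + 2 for a, b in nodes):
            return False                            # no [B C] excluding A
    return True

# Right-branching [a [b c]]: a stronger first boundary is monotonic ...
print(is_monotonic(("a", ("b", "c")), [2, 1]))      # → True
# ... but a stronger second boundary contradicts the constituency.
print(is_monotonic(("a", ("b", "c")), [1, 2]))      # → False
```

Note that equal boundary strengths satisfy (10) vacuously, which is what allows a monotonic theory to map an articulated right-branching structure onto a prosodically flat sequence.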
Match Theory, although presented as a development of prosodic phonology, is hence a radical departure from a 25-year tradition of non-monotonic accounts within prosodic phonology, even if this is not made explicit in the exposition of the theory in Selkirk (2011). The question whether the mapping between syntax and prosody is indeed monotonic is hardly settled, however. For example, there is a cross-linguistically recurring prosodic asymmetry in phrasing between heads taking a complement to the left and heads taking a complement to the right. If real, this is a generalization not explained by current monotonic theories of the syntax-phonology mapping (for an insightful discussion see Taglicht 1998).
2.2.3. Relational theories vs. designated categories

The recursive mapping algorithms deriving boundary strength in Chomsky et al. (1957) and Chomsky and Halle (1968), and also more recent accounts such as the one proposed in Wagner (2005b) and in Pak (2008), assume that syntax fixes relative but not absolute ranks of prosodic boundaries. They are thus relational theories, in contrast to theories
that map certain syntactic types of objects to certain phonological categories, that is, theories that operate with “designated categories” (Hale and Selkirk 1987). Nespor and Vogel (1986), for example, provide generalizations that are intended to capture how phonological domains of different types are constructed by making reference to particular lexical or syntactic information. To give an example:

(11) Phonological Phrase Formation I. φ domain
The domain of φ consists of a C [clitic group] which contains a lexical head (X) and all C’s on its nonrecursive side up to the C that contains another head outside of the maximal projection of X. […] (Nespor and Vogel 1986: 168)

The definition of phonological phrases makes direct reference to syntactic categories (head, maximal projection). A crucial notion in this particular definition is that of the “recursive side” of a constituent. Which side counts as recursive in a language is assumed to be a fixed parameter, which raises considerable issues in languages with mixed headedness, such as any Germanic language. Other researchers have invoked c-command as the relevant notion in constructing phonological domains, among them Kaisse (1985), Hayes and Lahiri (1991), and Guimarães (2004), with considerable success. Edge-based theories of the mapping (Selkirk 1986; Chen 1987; Selkirk 1995) align the edges of certain syntactic constituents with the edges of certain phonological constituents, or enclose certain syntactic constituents in phonological domains of a certain kind (Truckenbrodt 1995). This idea of designated categories for prosodic domains was formalized in Selkirk (1996: 444) as follows:

(12) The edge-based theory of the syntax-prosody interface
Right/Left edge of α → edge of β, where α is a syntactic category and β is a prosodic category.

The intuition behind both approaches is that syntactic objects of a certain size map to phonological objects of a certain size; for example, clauses tend to correspond to intonational phrases.
The edge-based theory has been shown to run into trouble when the predicted mismatches between syntax and phonology are scrutinized more closely, as will be discussed shortly. A direct form of evidence for the designated categories assumed by Match theory, Edge Alignment, and many other theories would require looking at segmentally identical or near-identical utterances with varying syntactic structures. The following is an admittedly awkward attempt at creating such an example in English. Provided that the predicate in the second example is accented, the pair is quite minimal. Theories positing designated categories would predict a categorical difference in prosodic structure due to the difference between a clause boundary and a noun phrase boundary, with only the former inducing an intonational phrase break:

(13) a. The dark things in the cool pools.
     b. The lark thinks that the pool cools.
34. Phonological Evidence in Syntax
It is not clear, though, that there is such a categorical phonological difference in prosodic phrasing depending on the syntactic category of the constituents involved, and to my knowledge there is no experimental evidence that directly tests such predictions. Some subtle differences might exist simply because noun phrases and sentences might differ in processing time, and the conditional probability of the constituent following the crucial boundary would have to be controlled for in an actual study. The evidence for a close correspondence between syntactic category and phonological category based on controlled minimal examples is usually not provided in phonological descriptions within this framework. This is surprising, given how crucial such examples would be to justify the premise of theories with designated categories.

Two types of processes have been observed that pose challenges to the theory of designated categories. On the one hand, as discussed before, certain segmental alternations that have been used to motivate categorical differences between phonological constituents correlate well with syntax but not with surface prosody (e.g., liaison, or Taiwanese tone sandhi, as discussed above); on the other hand, certain segmental alternations correlate well with surface prosody and are affected by speech rate, but do not line up with syntactic categories in any systematic way (e.g., flapping, or Mandarin tone sandhi); that is, they do not actually align with the syntactic landmarks they are designated for. Both types of cases constitute divergences from the predictions of theories that negotiate the interface between syntax and prosody via designated categories. The relational view of boundary strength discussed before has the advantage that it can straightforwardly explain why many sandhi rules such as flapping can optionally apply in wider or narrower domains, for example in coordinate structures.
Speakers may simply have a lot of freedom in which particular line of the grid (in other words, which boundary rank) they map to which particular type of phonological unit:

(14) Syntax: (a cat or a rat) and a hat
     Possible prosodies:
     (a caɾ or a raɁ) and a haɁ
     (a caɾ or a raɾ) and a haɁ
     (a caɁ or a raɁ) and a haɁ

What is impossible is flapping across the stronger boundary when there is no flapping across the weaker boundary:

(15) Syntax: (a cat or a rat) and a hat
     Prosody: *a caɁ or a raɾ and a haɁ

Flapping is thus a tool that can be used with some flexibility to differentiate the strength of different boundaries, and this is exactly what is expected under the view that syntax fixes relative but not absolute ranks. Relational models are compatible with the view that flapping is in fact not tied to a particular phonological domain at all, but is simply used as one of many cues for relative boundary strength, along with final lengthening, initial strengthening, and the presence or absence of boundary tones, a view that, as discussed above, receives some support from experimental studies on flapping that show lexical effects on the likelihood of flapping.
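The logic behind (14)–(15) can be sketched as a small program. This is purely my own illustration: the numerical boundary ranks are invented, and the licitness condition is the one just stated in prose, namely that flapping across a stronger boundary requires flapping across every weaker one.

```python
# Toy sketch of the relational view of flapping (illustrative only).
# ranks[i] is the assumed strength of the i-th boundary in
# "(a cat or a rat) and a hat"; flapped[i] is True if /t/ surfaces
# as a flap across that boundary.

def licit(ranks, flapped):
    """A pattern is licit iff no stronger boundary flaps while a weaker one fails to."""
    return all(not (f_s and not f_w)
               for r_w, f_w in zip(ranks, flapped)
               for r_s, f_s in zip(ranks, flapped)
               if r_w < r_s)

ranks = [1, 2, 3]  # cat|or < rat|and < hat|end (assumed relative ranks)

print(licit(ranks, [True,  False, False]))  # first pattern in (14):  True
print(licit(ranks, [True,  True,  False]))  # second pattern in (14): True
print(licit(ranks, [False, False, False]))  # third pattern in (14):  True
print(licit(ranks, [False, True,  False]))  # the impossible (15):    False
```

Any licit pattern corresponds to choosing a rank threshold below which flapping applies, which is exactly the freedom the relational view attributes to speakers.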
The designated-category approach would have to attribute such variation to variable prosodic phrasing; that is, the phonological domain of a certain type (say, the phonological phrase) which provides the domain for flapping is assigned variably. Such an account is indeed proposed in Nespor and Vogel (1986), where various rules of restructuring are used to account for such variation.

There are some languages that seem to provide clear evidence for categorical phonological reflections of clause boundaries. For example, the distribution of certain boundary morphemes and allomorphs in Mayan languages seems to be sensitive to whether they occur adjacent to a phonological boundary of a certain category, the intonational phrase (Aissen 1992; Henderson 2012). The prosodic analysis of this distribution seems to require reference to designated prosodic categories, since only intonational boundaries are relevant. It is difficult, though, to assess whether the generalization is really phonological rather than syntactic in nature, since the proposed phonological IP-boundary correlates perfectly with the presence of a clause boundary, so the boundary itself could be analyzed as the spell-out of a functional head in the CP domain. Similar questions arise in the analysis of the intonation of English, in particular in the analysis of tonal events at clause boundaries. On one end of the spectrum is the syntactic view, which posits that intonational tunes are the pronunciation of syntactically represented elements that combine with syntactic constituents just like other nodes that are associated with segmentally content-full morphemes. This view was pioneered by Stockwell (1960), who proposed that intonational morphemes combine with syntactic phrase markers by virtue of rewrite rules, and was recently formalized in the context of Categorial Grammar by Steedman (2000).
On the other end of the spectrum are approaches that view intonational tunes as emerging from an independently generated prosodic representation in phonology which aligns in complex ways with a syntactic representation (e.g., Selkirk 1995, 2005).

The aforementioned idea that phonological processes are actually constrained not by prosodic domains but by the locality of production planning might be another way to make sense of the differing generalizations about domains of application. Taiwanese tone sandhi, for example, is sensitive to whether or not a lexical word is the last word within a certain syntactic domain. The number of upcoming words within a domain has been shown in the literature on production planning to be planned far ahead (e.g., Sternberg et al. 1978). The precise phonological content of upcoming words, however, is available only in a much more local domain. Mandarin tone 3 sandhi, in contrast to Taiwanese sandhi, is sensitive to the identity of the lexical tone on the following word. This could potentially explain why Mandarin tone sandhi is subject to a much stricter locality restriction. More generally, this view makes an interesting prediction: the locality domain of a phonological sandhi process should directly depend on the type of information that triggers it. The more fine-grained the information about the upcoming word is, the more local the domain of application should be, and hence the more sensitive to surface prosody and the more variable.

Rigging up the mapping between syntax and prosody in terms of designated categories presupposes distinctions among phonological categories as they are embodied by the prosodic hierarchy. But it is important to note that the converse does not hold: the hypothesis that there is a prosodic hierarchy does not entail that the mapping between syntax and phonology functions in terms of designated categories, a point already made in Dresher (1994). For example, if the mapping from syntax actually only fixes relative prosodic ranks, and
phonology uses a fixed hierarchy to implement those ranks, there could simply be a lot of flexibility regarding which line in the prosodic hierarchy should line up with which boundary rank (for discussion see Wagner 2005b). The relational view is of course equally compatible with a more recursive view of the phonological representation of prosody itself, and with a smaller inventory of phonological categories. I will not explore the issues that arise in determining the prosodic representation, and simply point out that recent approaches have shifted away from some of the strong assumptions embodied in the prosodic hierarchy and have moved to more recursive representations (Ladd 1986; Ladd and Johnson 1987; Kubozono 1989; Truckenbrodt 2002; van den Berg et al. 1992; Wagner 2005b; Ito 2010; Elfner 2012; Ito and Mester 2012).
2.2.4. Is prosodic phrasing juncture-based?

Most accounts of the syntax-phonology mapping, including all of those mentioned so far, make an assumption about intuitions about boundary strength and prosodic phrasing that seems so innocent that I know of no explicit discussion of it:

(16) Hypothesis about Juncture and Prosodic Phrasing
     In a sequence A · B · C, if the boundary separating A and B is weaker than the one separating B and C, then A and B are part of a phonological constituent that excludes C; if it is stronger, then there is a phonological constituent including B and C that excludes A.

This assumption is implicit in all theories discussed so far, independent of whether they are monotonic or not, and of whether they are relational or operate with designated categories. Nevertheless, a recent analysis of the sentence prosody of Irish (Elfner 2012) departs from this assumption, at least to some extent. For example, in a VSO sentence, previous literature reported a weaker morpho-phonological juncture between V and S than between S and O (Ackema and Neeleman 2003). The phonological structure proposed in Elfner's (2012) analysis, however, groups together S and O to the exclusion of V. This prosodic structure has the advantage that it is in line with the underlying syntactic structure assumed in most of the syntactic literature on Irish (cf. McCloskey 2011). The pattern in the distribution of pitch scaling that Elfner (2012) is able to capture on this account is intriguing. It seems too early to assess whether this analysis is the only feasible interpretation of the Irish data, and whether more evidence for this kind of pattern will be found in other languages, but considering the possibility that (16) might not actually hold opens up a whole new perspective on how to look at sentence prosody.
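Hypothesis (16) can be made concrete with a toy procedure (my own sketch, with invented rank values): recursively splitting a word string at its strongest boundary derives a constituency in which weaker junctures always correspond to smaller constituents. Applied to the Irish VSO judgments just described, where the V|S juncture is weaker than the S|O juncture, this procedure groups V with S, which is precisely the grouping that Elfner's analysis rejects.

```python
# Toy sketch (illustrative): derive a bracketing from relative boundary
# strengths by recursively splitting at the strongest remaining boundary,
# as implied by hypothesis (16).

def bracket(words, ranks):
    """words: n items; ranks: n-1 boundary strengths between adjacent words."""
    if len(words) == 1:
        return words[0]
    i = ranks.index(max(ranks))          # position of the strongest juncture
    return (bracket(words[:i + 1], ranks[:i]),
            bracket(words[i + 1:], ranks[i + 1:]))

# VSO with a weaker V|S juncture (rank 1) than S|O (rank 2); under (16)
# this forces V and S into a constituent excluding O:
print(bracket(["V", "S", "O"], [1, 2]))
# → (('V', 'S'), 'O')
```

The interest of the Irish case is that the phonologically motivated constituency is (V, (S, O)) despite the juncture ranks, which no juncture-based procedure of this kind can deliver.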
It is not inconceivable that intuitions about juncture might find an alternative explanation, and that the actual prosodic structure does not in fact line up neatly with the prosodic constituency relevant for pitch accent and boundary tone assignment. This type of account might be able to explain the inconsistencies, mentioned before, between Chen's (1987) findings and the edge-marking theory based on them. It could be that certain junctures
in the sentence are realized at points of increased processing load, but do not directly affect phonological domains. This brings us to processing-based explanations for boundary placement.
2.2.5. Processing accounts

The approaches discussed so far all assume that prosody directly encodes syntax to some extent or other, even if how this is conceived of varies (monotonic vs. non-monotonic mapping, designated categories vs. relative boundary strength). An alternative view is that the distribution of prosodic boundaries can be explained by looking at processing factors, with syntax playing in only indirectly, as one factor contributing to processing cost.

One view holds that speakers deliberately employ prosodic means as a tool to facilitate processing (either for the listener or for the speaker). Snedeker and Trueswell (2003), for instance, propose that prosodic boundaries are inserted whenever necessary in order to avoid structural ambiguities. Whether or not they are used depends on how aware the speaker is of the ambiguity. This hypothesis still assumes a reliable relationship between boundaries and syntactic bracketing, but posits that whether prosody will be used to encode syntax is optional and depends on whether or not the speaker wants to deliberately disambiguate two relevant readings. An alternative view is more production-oriented, and holds that prosodic boundaries are inserted at locations of high processing cost (e.g., Watson 2002), thus facilitating the planning of upcoming constituents and recovery from preceding ones. Effects of planning on prosodic phrasing were also discussed in Ferreira (1993). A third and related view, finally, holds that prosodic boundaries reflect a lack of mutual predictability between two words, and that prosodic phrasing in general reflects the expectedness of material in context (Turk 2008). This view is part of a broader theory of how predictability shapes prosodic reduction, which also tries to explain prosodic prominence (Gregory et al. 1999; Aylett and Turk 2004; Jaeger 2006).
A problem for purely processing-oriented approaches was pointed out in Ferreira (2007), who argues that there are qualitative distinctions between acoustic reflexes of processing effects and those directly due to prosodic structure. For example, pauses and disfluencies due to difficulty retrieving upcoming material are noticeably different from planned intonational breaks, which come with preceding final lengthening and pitch cues. Some aspects of prosodic structure may thus be part of what is grammatically encoded in the message, while other aspects may indeed be reflexes of the on-line processing of the message. A model that ties prosodic phrasing and processing cost too closely together may not be able to explain such distinctions.
2.3. Mismatches between syntax and prosody

Prosody and syntax sometimes seem to mismatch. This might mean that we have to complicate the mapping function from syntax to prosody; it might be evidence that prosody does not, in fact, reflect syntax after all; or it might be evidence that our syntactic
analysis is wrong. There is only one way to find out: We need to consider all the prosodic and syntactic evidence available to us in order to understand the apparent or actual mismatch. Wagner (2010) and Steedman (2001) have recently argued that in several cases that would conventionally be considered mismatches, prosody may actually be quite in tune with syntax. I will discuss three examples to illustrate the issues that arise.
2.3.1. Post-verbal junctures Gee and Grosjean (1983: 435) cite certain post-verbal constituents as examples where the main syntactic juncture falls between subject and predicate, but the main prosodic boundary falls between predicate and the following constituent. Consider the following example from Ferreira (2007: 1158): (17) a. Prosody: They have been avidly reading | about the latest rumours in Argentina. b. Assumed Syntax: [CP They have been avidly [VP reading [PP about the latest rumors in Argentina. ]]] Monotonic approaches seem to make the wrong prediction here, since they predict that verb and PP do not form a constituent. It is not so clear, however, whether the EdgeAlignment approach or any other existing non-monotonic account offers a straightforward explanation either, since aligning the left edge of an XP with a prosodic domain − as would be necessary to introduce a boundary between verb and prepositional phrase− would also result in a prosodic boundary before the verb. The overly strong implications regarding mismatches between syntax and prosody made by Edge-Alignment can be reined in by counter-acting constraints, for example, Truckenbrodt (1995) proposes a set of Wrap-Constraints, which favour representations in which certain syntactic constituents (e.g., XPs) are contained in certain phonological constituents (e.g., phonological phrases), although again this may not help in the present case. However, before we complicate the mapping function between syntax and prosody based on examples like the one in (17a), we need to convince ourselves that it is really necessary or even desirable in this case. The strength of any claim about a mismatch between syntax and prosody depends on the strength of the arguments supporting the syntactic analysis it is premised on. So what we should first ask about (17a) is how compelling the syntactic evidence is that the PP necessarily attaches low, as is assumed in (17b). 
Discussions of mismatches between syntax and prosody often stop short of even raising this question, the tacit assumption being that the syntactic analysis is obvious. However, it is clear that about the latest rumours in Argentina can at least attach much higher than assumed, including attachment sites above high adverbials such as unfortunately:

(18) They have been avidly reading, unfortunately, about the latest rumours in Argentina.
The apparent mismatch would be illusory if the mismatching boundary occurred exactly in those cases where the PP attaches high, and were absent in cases where the PP attaches low. In order to test this, we would need ways to control the precise attachment site in cases that do not involve intervening adverbs as in (18). One way of forcing the verb read and the following PP to form a constituent is to coordinate the preceding material, arguably a case of right-node-raising the constituent read about the latest rumours in Argentina:

(19) Some are planning to or at least would like to read about the latest rumours in Argentina.

In this example, it is intuitively much harder to separate the verb read from the PP by a prosodic boundary. This constitutes some initial evidence that the presence and absence of the prosodic boundary might in fact reveal something about the surface syntactic bracketing rather than pointing to a mismatch between syntax and prosody. There are various potentially confounding factors regarding the information structure and phonology of right-node-raising constructions as in (19) which would need to be considered here. What is important to note is that independent of how solid the phonological analysis is, it is impossible to say anything about the relationship between syntax and prosody based on examples like (17a) without syntactic tests that establish the constituent structure first. Figuring out the precise attachment of constituents is not trivial, and we cannot trust our first intuition about what the bracketing might be. So in order to establish whether (17a) can plausibly be claimed to constitute a mismatch, a much closer look at syntax and its interaction with prosody is necessary. A common objection to placing too much emphasis on syntax when dealing with prosodic boundaries is that the high variability in boundary placement makes a syntactic approach implausible.
This worry is moot, however, if the observed variation is actually constrained by the bounds of syntax. Many typical cases of ‘mismatch’ between syntax and prosody from the literature (e.g., Shattuck-Hufnagel and Turk 1996: 201) can easily be accounted for as cases of string-vacuous right-node-raising:

(20) Sesame Street is brought to you by: || The Children’s Television Workshop.

A theory without a close match between syntax and prosody still needs to provide an explanation for why ‘optional’ boundary placement cannot arbitrarily violate syntax:

(21) a. # Sesame Street is brought to || you by the Children’s Television Workshop.
     b. # Sesame Street is brought || to you by the Children’s Television Workshop.
Such syntactic restrictions are expected if the variation observed in boundary placement is in fact mediated through syntax.
2.3.2. Relative clauses and clausal complements

The example that epitomizes the discussion of whether syntax and prosody mismatch in the earlier literature involves relative clauses. The head of a relative clause tends to
phrase with preceding material and to be separated from the relative clause by a boundary (Chomsky and Halle 1968: 372). Yet, at least according to a traditional syntactic analysis, head and relative clause should form a constituent:

(22) a. Prosody: This is the cat | that caught the rat | that stole the cheese.
     b. Assumed syntax: [CP This is [DP the cat [CP that caught [DP the rat [CP that stole the cheese. ]]]]]

It could be that examples like this prove that the mapping between syntactic constituent structure and prosodic phrasing is complex. The edge-marking theory of prosodic phrasing (cf. Selkirk 1986) was specifically designed to account for such mismatches. If, for example, the left edge of clausal nodes is aligned with a prosodic boundary, perhaps an intermediate or intonational phrase, then the relative clause data in (22) can be accounted for:

(23) [CP This is the cat [CP that caught the rat [CP that stole the cheese.

And yet, even if much of the prosodic literature in the 80s and after took this to be a clear mismatch between syntax and prosody, this is not how Chomsky and Halle (1968: 372) originally presented it. They argued simply that a syntactic restructuring must have taken place, since the underlying syntax and the surface bracketing do not match. This restructuring, if there is one, might very well take place in syntax. It is well known that relative clauses can attach much higher in the structure than to the head of the relative clause. Relative clauses can readily be placed to the right of a sentential adverbial:

(24) I couldn’t remember the name of the man, last night, that I had met at the market earlier in the day.

Again, in light of the observation that a relative clause can be separated from its head by a boundary, we have to ask whether these cases might involve high attachment, before concluding that there is a mismatch between syntax and prosody.
According to Wagner (2005b, 2010), the mismatching prosody is actually only possible for those relative clauses which allow for high attachment (sometimes called ‘extraposed’ relative clauses). Another apparent prosodic mismatch can be observed when looking at relative clauses: the boundary separating the relative clause from its head is usually stronger than the one separating the determiner from the head; for example, the boundary separating the and cat in (23) is intuitively weaker than the one following cat, even in the absence of a strong boundary. As has been observed already in Partee (1976), the interpretation of restrictive relative clauses seems to require a bracketing that groups head and relative clause together to the exclusion of the definite determiner. However, this does not necessarily mean that this is the bracketing we observe in the surface syntactic structure − the fact that restrictive relative clauses can extrapose at all is already an indication that different bracketings are syntactically possible.
The claim that high attachment or extraposition and prosodic phrasing directly correlate receives further support from the prosody of complement clauses. Complement clauses differ from nominal arguments in a number of ways. For example, in many languages they obligatorily extrapose; that is, rather than being pronounced in the complement position of the selecting verb, they are pronounced after the clause of which they constitute an argument. A good example is extraposition in German:

(25) Er hat gesagt, es werde regnen. [German]
     he has said it would rain
     ‘He said it would rain.’
In fact, Moulton (2012) argues that clauses in general cannot saturate argument positions, and that they always move from the complement position of predicates in order to be interpretable. The derivation proposed in Moulton (2012) is similar to the analysis of head-initial complement clauses in Kayne (2009); I refer the reader to Moulton (2012) for other relevant work on this issue. Similar to cases of heavy noun phrase shift, first the shifted constituent, in this case the complement CP, moves leftward, and then the remnant constituent including the trace moves further to the left, placing the clausal complement all the way at the right edge of the clause. This serves to derive a number of otherwise hard-to-explain facts, such as the fact that complement clauses obligatorily follow other arguments, including PPs, or that complement clauses cross-linguistically seem to extrapose unless they have nominal morphology. Importantly for the present discussion, the quirky syntax of complement clauses seems to be matched by a quirky prosody: in contrast to nominal arguments, clausal arguments are usually separated from the selecting verb by a prosodic boundary (for evidence from German see Truckenbrodt 2010), but not if they bear nominal morphology and precede the predicate, as is the case when a complement clause is pronominalized. This provides further evidence for the correlation between prosodic boundaries and extraposition construals argued for in Wagner (2005b, 2010), and it provides an argument against a superficial mapping principle according to which the left edges of clauses map to a boundary of a certain type, since any such theory would reduce the fact that the observed prosody goes along with independent evidence for extraposition to an accident.
2.3.3. Complex NPs

A strong version of a close link between prosody and syntactic constituent structure makes some seemingly odd predictions. Consider the prosody of NPs with post-nominal modifiers or arguments. When asked to group the NP into two intuitive subgroups, many native speakers will draw the boundary between the nominal head and a following PP argument or PP modifier:

(26) a. the inventor | of the disco ball
     b. the inventor | with the weird hair-cut

Cross-linguistically, this seems to be the preferred phrasing. For example, French is reported to have obligatory liaison between an article and a following noun, and as not
showing liaison, or as showing only optional liaison, with a following PP modifier or argument. Based on the claim of monotonic theories about how boundary strength reflects syntax (10), we are driven to the conclusion that the article and the head noun form a constituent to the exclusion of the following PP, be it an argument or an adjunct. But should the argument not be the complement of the head noun and form a constituent to the exclusion of the determiner? And doesn’t a PP modifier directly modify the head noun rather than an entire determiner phrase? The prosodic phrasing of complex NPs would have us believe otherwise. Contrary to received wisdom, Adger (2013) presents compelling evidence in favour of a constituent structure that analyzes the arguments of nouns as attaching much higher than conventionally assumed. Adger’s main arguments are typological in nature: across different languages, word order facts suggest that arguments attach high, outside constituents that modify the nominal head itself (as has to some extent been noticed for some languages previously, and often been analyzed as head-raising of N):

(27) Adger’s Generalization: When AP modifiers and PP ‘complements’ both occur to one side of N inside a noun phrase, the PP is separated from the N by the AP. (Adger 2013: 93)

One piece of evidence that arguments indeed attach high comes from cases in which we can observe a post-nominal demonstrative intervening between the head noun and the argument:

(28) a. El cuadro y el foto estes de Juan [Spanish]
        the picture and the photo this.PL of Juan
        ‘This picture and this photo of Juan’ (see Adger 2013: 91)
     b. Cette peinture et ce photo-là de Jean [French]
        the painting and the photo-there of Jean
        ‘This painting and this photo of Jean’

The post-nominal demonstrative in Spanish shows plural agreement with the coordinated noun phrases, indicating that it attaches to the conjoined expression above the singular determiners, and the PP argument attaches outside the demonstrative. This provides evidence that arguments indeed attach rather high − just as their prosodic phrasing suggests. Many more arguments based on data from a typologically varied set of languages are discussed by Adger (2013).

The important lesson we can learn from Adger’s work is this: even some of our most basic assumptions about syntax, such as the assumption that the apparent complements of nouns are indeed syntactic complements, can turn out to be wrong. By implication, this means that there can be no argument about syntax-phonology interactions without establishing the syntactic structure first, a step that is unfortunately often skipped in the literature on the topic. This case also suggests that trusting the prosodic constituent structure as a guide to the underlying syntax might pay off even when it seems unlikely at first sight.
The post-nominal demonstrative in Spanish shows plural agreement with the coordinated noun phrases, indicating that it already attaches to the conjoined expression above the singular determiners, and the PP argument attaches outside the demonstratives. This provides evidence that arguments indeed attach rather high − just as their prosodic phrasing suggests. Many more arguments based on data from a typologically varied set of languages are discussed by Adger (2013). The important lesson we can learn from Adger’s work is: Even some of our most basic assumptions about syntax, such as the assumption that the apparent complements of nouns are indeed syntactic complements, can turn out to be wrong. By implication, this means that there can be no argument about syntax-phonology interactions without establishing the syntactic structure first, and this step is unfortunately often skipped in the literature on the topic. This case also suggests that trusting the prosodic constituent structure as a guide to the underlying syntax might pay off even when it seems unlikely at first sight.
2.4. Summary: Syntax and phrasing

A high-level summary of this section could be this: there can be no claim about how syntax relates to prosodic phrasing without first establishing the prosodic phrasing, based on whatever diagnostics one can find, and without first establishing the syntactic structure on independent grounds, based on all the syntactic evidence one can leverage. Existing work on how syntax and prosodic phrasing relate often focuses mostly on one side of this equation, and runs the risk of making the wrong assumptions about the other. If, as argued here, it is indeed the case that syntax and prosody match to a much greater degree than is commonly assumed, then this casts a more positive light on the project of using prosody as evidence about syntax. There are some studies that use experimental prosodic evidence to adjudicate between syntactic analyses. For example, Dehé and Samek-Lodovici (2009) use prosody to probe the structure internal to DPs. But this type of work is still the exception rather than the rule, and until more rigorous work that brings together phonological and syntactic evidence makes us more confident about how prosody and syntax relate, this state of affairs is unlikely to change.

One domain in which there has been substantial progress in the past ten years is the relation between the scope of various operators and prosody. Sugahara (2003), Ishihara (2003), and Hirotani (2005), for example, look at the relationship between post-focal prosodic reduction and the scope of focus-related operators such as question particles, and Blaszczak and Gärtner (2005) discuss the relationship between the scope of negation and prosodic boundaries. Closely related to the question of how syntactic domains line up with prosodic domains is the recent literature that tries to derive prosodic domains by cyclic spell-out.
Various proposals that directly touch on prosodic constituency have been made by Ishihara (2003), Dobashi (2003), Guimarães (2004), Kahnemuyipour (2004), Wagner (2005b), Adger (2007), Kratzer and Selkirk (2007), and Pak (2008). One issue about prosodic phrasing that this review has not discussed is the question of whether constraints on prosodic phrasing can in turn affect syntax. One of the most compelling cases for such an effect is made in Truckenbrodt (1994), who argues that certain syntactic constraints on extraposition can be explained by invoking constraints on intonational phrasing. Another proposal in the same vein involves constraints on movement that require the moved constituent to have a certain size (Zec and Inkelas 1990). I refer the reader to these earlier studies and the later work that cites them, rather than discussing these ideas here.
3. Prominence

Prosodic prominence within a sentence is affected by a number of factors, and syntactic relations and information structure − the two that this article will focus on − are of particular importance. As in the case of phrasing, frequency and predictability have also been shown to be relevant, and syntactic approaches to prominence compete with explanations in terms of processing.
34. Phonological Evidence in Syntax
3.1. Diagnostics for prosodic prominence

Prominence, just like prosodic phrasing, is not trivial to establish. It is often not possible to tell by purely instrumental means which of two words is perceived as more prominent. And yet a shift in prominence goes along with many acoustic differences, and the perception of prominence is often very clear. One major cue for the prominence of a word is whether or not it is accented, and also whether adjacent words are accented. Accented words are more prominent than unaccented words, and the last accented word within a prosodic domain is generally taken to be more prominent than non-final ones. The last accented word within a particular domain is often described as carrying the main or ‘nuclear’ stress (Newman 1946; Truckenbrodt 1995). Consider one of Newman’s classic examples (capitalization reflects accentedness, not necessarily focus):

(29) a. I have INSTRUCTIONS to leave.
        ‘I am to leave instructions.’
     b. I have INSTRUCTIONS to LEAVE.
        ‘I have been instructed to leave.’
     (Newman 1946: 179)

The last word carrying an accent is felt to be the most prominent. When leave carries the nuclear stress, the preceding word instructions can carry an accent of its own, but words following the nuclear stress, such as leave in (29a), usually remain unaccented or at least prosodically reduced. One prominence distinction often taken to be categorical is the distinction between syllables carrying a pitch accent and those that do not. However, the degree of prominence can also vary quantitatively. More prominent words are louder and longer, and emphasis is often associated with an increase in pitch range (Ladd 2008; Ladd and Morton 1997). Contextually salient material is shorter, even when accented (Bard et al. 2000; Wagner et al. 2010). Furthermore, contextually salient material is often realized with a reduced pitch range.
While this may result in a complete lack of accentuation, it is often the case that accents in the subordinated domain remain intact and only the pitch range is adjusted. This has been observed for French (Jun and Fougeron 2000) and for English (Xu 2005). It is therefore often not quite accurate to talk about “deaccentuation” of material marked as less prominent, when in fact only a pitch range reduction may be at play, so I will use the term prosodic subordination in the following. Many authors assume that what is relevant from the point of view of how syntax relates to prominence is relative prominence, rather than absolute states of being accented or unaccented. There is still no model that can successfully simulate a native speaker’s intuition about prominence, even if instrumental approaches have identified many of the acoustic cues that are most useful (cf. Ladd 2008). In an experimental situation, one can get a handle on this issue by creating stimuli that are segmentally matched and vary context or the underlying syntax, such that instrumental comparisons of minimal pairs across conditions can be informative about relative levels of prominence, even in cases where absolute levels of prominence − such as whether or not a constituent is accented − are hard to establish. When running an experiment is not an option, one useful tool for sharpening one’s intuitions about prosodic prominence is intonational tunes. A particularly useful tool in
English is the well-known vocative chant (Liberman 1975). The vocative chant is a melody that can be imposed on any utterance, typically a name, and one use is to call someone. It consists of a low plateau followed by a high pitch accent on the syllable carrying main word stress, followed by a mid tone on a secondary stress following the main stress if there is one, or else the mid tone aligns with the syllable following the main stress. If main stress falls on the final syllable, the mid tone is realized on that syllable along with the high tone. Liberman (1975) illustrates how the vocative chant can be used to uncover the fine structure of the prominence relations within words. One can also superimpose the vocative chant on bigger constituents and even sentences, and interestingly, the same generalizations about where the tune aligns scale up to sentence prominence. The location of the high pitch accent aligns with the main prominence of a sentence, and the following mid tone aligns with the syllable that is the next most prominent. The downside is that it is pragmatically somewhat odd to superimpose this tune on bigger constituents.
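The alignment rules just described can be sketched as a toy procedure. This is my own illustration, not an implementation from the literature; the numeric stress encoding and the function name are assumptions for the sake of the example:

```python
def vocative_chant(stress):
    """Toy alignment of the vocative-chant tune, following the description
    in the text. Stress levels: 2 = main stress, 1 = secondary, 0 = none."""
    main = stress.index(2)
    # If main stress is final, the high and mid tones share that syllable.
    if main == len(stress) - 1:
        return ["L"] * main + ["H+M"]
    # Otherwise the mid tone goes on the first secondary stress after the
    # main stress, or else on the syllable right after the main stress.
    after = stress[main + 1:]
    mid = main + 1 + (after.index(1) if 1 in after else 0)
    tones = []
    for i in range(len(stress)):
        if i < main:
            tones.append("L")    # low plateau before the high accent
        elif i == main:
            tones.append("H")    # high pitch accent on the main stress
        elif i == mid:
            tones.append("M")    # mid tone
        else:
            tones.append("-")    # no tone target of its own
    return tones

# 'A.le.XAN.dra': main stress on the third syllable
print(vocative_chant([0, 0, 2, 0]))  # ['L', 'L', 'H', 'M']
# Final main stress: H and M realized on the same syllable
print(vocative_chant([0, 2]))        # ['L', 'H+M']
```

The same toy alignment can then be applied to a stress contour computed over a whole sentence, which is what makes the tune useful as a diagnostic.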
3.2. Nuclear stress

The term nuclear stress is usually used to refer to the main sentence prominence. Whether there is such a thing as a unique nuclear stress per utterance, as assumed in Chomsky and Halle (1968), for example, is itself controversial. Newman (1946), in his influential analysis of prominence in English, allowed for the possibility of there being several. A simple, if admittedly unconventional, operational definition of the nuclear stress of an utterance is the syllable on which the high tone of the vocative chant would be placed, should one pronounce the utterance with the vocative chant superimposed, keeping other factors such as syntactic relations, givenness, and focus constant. A common assumption across a number of approaches is that there is a default distribution of accent placement, and that deviations from this pattern are used to encode focus and givenness presuppositions, while alternative accounts assume that accent placement is always determined by information-structural considerations.
3.2.1. Syntax or semantics?

Prosodic prominence was one of the main sources of evidence used to investigate the relationship between syntax and phonology in Chomsky and Halle (1968). One main insight was that generalizations about sentence-level prominence must be stated in a recursive fashion. Rather than giving hard and fast rules that apply to entire sentences, the transformational cycle successively negotiated prominence between the phonological representations associated with sister nodes in syntax. Schmerling (1976) and Fuchs (1976) raised some doubts with respect to the syntax-oriented approach, and proposed a semantic approach instead, based on the observation that, in Schmerling’s words, “predicates receive lower stress than their arguments,” independent of their relative order. For example, in both German and English a direct object is more prominent than the verb, independent of whether the word order is OV or VO. Newman (1946: 179) first observed that in English, prominence by
default falls on the internal argument (the “logical object”) of a predicate, independent of whether it precedes or follows it, as was illustrated in (29). Fuchs (1976) proposed that arguments and predicates become prosodically integrated, an idea which has been influential in various proposals that try to capture Newman’s basic observation. In the theory of sentence stress proposed in Gussenhoven (1983), the decisive factor that makes one constituent integrate into a single accent domain with a second constituent is semantic: if two adjacent constituents form argument and predicate, they form a single accent domain. This view is of course still compatible with a recursive statement about prominence; it just shifts attention from syntactic relations between sisters to semantic relations. Note, however, that forming a single accent domain as proposed in Gussenhoven (1983) doesn’t itself assure that the argument will be more prominent − a predicate could also form a single accentual domain with its argument if the predicate carried the prominence. This is one reason why later accounts have often implemented the idea underlying this approach differently. The generalization that prominence falls on the argument independent of linear order, in OV contexts as well as in VO contexts, is unexpected under the analysis presented in SPE, where prominence should always go to the right-hand sister at any syntactic node. Bresnan (1971), however, offers one way of saving the syntactic theory: by interleaving the syntactic derivation with stress assignment, she argues that in certain cases, including residual OV orders in English, stress can be assigned before movement.
This proposal could account for OV orders on the assumption that they are derived from a previous VO stage in the derivation to which the nuclear stress rule applies (or alternatively one could argue that the argument reconstructs at LF and that prominence relations are computed based on the position in which a constituent is interpreted). One reason not to abandon the syntactic approach too readily is that the notions predicate and argument used in more semantically oriented accounts may not be sufficient by themselves to capture the generalization. Arguments are often adjacent to their predicates, and yet the predicates fail to subordinate prosodically, for example in the case of secondary predication (Gussenhoven 1992; Wagner 2005a), here illustrated by an example from Marantz (1984):

(30) Elmar ate the porcupine raw.

And yet, these cases may not constitute counterexamples to a semantically based generalization, since the kind of depictive predicate occurring in (30) is probably actually a predicate over events, which includes an anaphor referring to the noun that it also predicates over. Marantz (1984) and Pylkkänen (2002) provide arguments against analyzing such depictives as simple predicates over individuals. An interesting observation in relation to the question of whether generalizations about prominence relations are more syntactic or more semantic in nature is the prosody of modification, where again the notion of functor versus argument seems to play a crucial role (Birch and Clifton 1995, 2002; Wagner 2005a). The pattern observed based on predicates and nominal arguments is part of a much broader pattern of prominence among predicates and their complements, observable in the various word orders in predicate sequences attested across Germanic dialects. Independent of the word order, main prominence falls on the complement of the most deeply embedded predicate, both in infinitival constructions and in predicate clusters.
The pattern can be illustrated by an example from German, which allows for at least four out of six possible word orders between three predicates in so-called extraposition orders. (The
numbers reflect the syntactic embedding, with 1 being the highest predicate and the head of its complement labeled with the next number down. Nuclear stress is marked by an accent; secondary stresses are not marked):

(31) a. … weil er versprach1 zu versuchen2 zu schwéigen3.    [German]
        because he promised to try to be.silent
     b. … weil er versprach1 zu schwéigen3 zu versuchen2.
        because he promised to be.silent to try
     c. … weil er zu schwéigen3 zu versuchen2 versprach1.
        because he to be.silent to try promised
     d. … weil er zu schwéigen3 versprach1 zu versuchen2.
        because he to be.silent promised to try
     e. ? … weil er zu versuchen2 zu schwéigen3 versprach1.
        because he to try to be.silent promised
     f. ?? … weil er zu versuchen2 versprach1 zu schwéigen3.
        because he to try promised to be.silent
     ‘… because he promised1 to try2 to be silent3.’
In each of the word orders, main prominence falls on the most deeply embedded constituent in the selectional chain, a fact expected under the syntactic view of nuclear stress as it is proposed in Cinque (1993) and Arregi (2002). The prosodic facts in predicate clusters across dialects of West Germanic are analogous, except that any given Germanic language usually doesn’t allow this many different permutations. However, all six orders are attested in some Germanic language (Wurmbrand 2003), and the prominence relations are always such that prominence falls on the complement (Wagner 2005b). There is more to the pattern, however, than just the observation that the last complement in the selectional chain receives nuclear stress, since prosodic subordination also occurs when predicate 2 precedes predicate 1, and in the order 3 1 2 − in other words, when only part of the complement precedes. The full generalization about nuclear stress in predicate sequences that holds true across Germanic dialects can be stated as follows (Wagner 2004: 587, 2005b: 226):

(32) A Descriptive Generalization
     A functor is prosodically subordinated if at least part of its complement precedes.

While the last two word orders in the German examples in (31) may be marginal, native speakers nevertheless have an intuition about where prominence should fall. In other words, native speakers have intuitions about prominence even for cases that lie beyond their own grammar, or at least at its margin. And the intuitions about prominence match the actually observed pattern in dialects that allow for the relevant order. This is compelling evidence that speakers have internalized a rather abstract generalization, as compelling as the argument in Halle (1997) that speakers of English must have internalized a rather abstract rule of voicing assimilation, based on loan words.
Speakers of English devoice even when encountering segments that are not part of English phonology, as in the plural of Bach, the Ba[xs].
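The descriptive generalization in (32) lends itself to a small computational sketch. The following toy code is my own illustration (the integer encoding of the selectional chain follows the numbering in (31)); it takes nuclear stress to fall on the last non-subordinated element and derives stress on predicate 3 for all six orders:

```python
CHAIN = [1, 2, 3]  # predicate i selects predicate i+1

def subordinated(pred, order):
    """Generalization (32): a functor is prosodically subordinated
    if at least part of its complement precedes it."""
    complement = [p for p in CHAIN if p > pred]  # everything pred (transitively) selects
    pos = order.index(pred)
    return any(order.index(c) < pos for c in complement)

def nuclear_stress(order):
    """Nuclear stress falls on the last non-subordinated (accented) element."""
    accented = [p for p in order if not subordinated(p, order)]
    return accented[-1]

for order in [(1, 2, 3), (1, 3, 2), (3, 2, 1), (3, 1, 2), (2, 3, 1), (2, 1, 3)]:
    print(order, "-> nuclear stress on predicate", nuclear_stress(order))
```

For every permutation the procedure returns predicate 3, matching the attested pattern in (31) and the cross-Germanic facts cited above.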
Note that the generalization as stated in (32) mixes semantic and syntactic vocabulary, since it makes reference to the complement − the difference between complements and non-complements indeed seems to play a role. Two possibilities for accounting for the full pattern are discussed in Wagner (2005b): one based on a syntactic nuclear stress rule with a Bresnanian interaction with movement, and one based on a more semantically formulated principle about the prosodic prominence relation between arguments and functors. But no particular argument for choosing between these possibilities is offered. The role of syntax becomes apparent when looking at complement clauses, which were already discussed in the section on phrasing. Truckenbrodt and Darcy (2010) present evidence that complement clauses do not form a single domain with a preceding selecting predicate; in contrast to nominal arguments, the predicate is usually accented and separated by a boundary. This difference is most likely due to the fact that complement clauses in German obligatorily extrapose, that is, they are treated very differently from other syntactic complements. For example, they follow the verb instead of preceding it, and the complement clause is not in fact the sister of the preceding predicate. There are thus syntactic conditions on prominence above and beyond the semantic conditions on adjacent constituents proposed in Gussenhoven (1984). The work by Cinque (1993) constitutes a major landmark in the discussion of nuclear stress, first because it revived the recursive sister-based approach to sentence prominence pioneered in Chomsky et al. (1957) and Chomsky and Halle (1968), and second because it made a strong typological claim. Cinque proposed a universal generalization about syntax and prominence directly related to syntactic embedding, thus trying to explain Schmerling’s insight that, independent of linear order, it is the argument and not the predicate that receives nuclear stress.
The rather baroque definition of depth of embedding was updated and related to syntactic selection in the insightful analysis of Basque sentence phonology in Arregi (2002). One problem with Cinque’s proposal is that it is not clear that there is indeed a universal generalization about nuclear stress. There is evidence, for instance, that in Romance languages the generalization observed in Germanic does not hold (Ladd 2008; Wagner 2005b). In French, for example, the predicate corresponding to to do is happily accented in a construction whose Germanic parallels show prominence on the argument of the predicate (based on my consultants, Dutch, German, and Norwegian pattern with English; Spanish, Brazilian Portuguese and Italian with French):

(33) a. I have some shópping to do.
     b. J’ai des courses à faire.    [French]
Verb and complement are often assumed to form a prosodic domain in French, as predicted by Gussenhoven (1983), but they do not show the prominence relation that one would expect under the syntactic account in Cinque (1993). Of course, everything hinges on the question of whether the two sentences in (33) indeed have syntactic structures that are comparable in all relevant respects. Another issue with the approach in terms of depth of embedding is whether the relative prominence between, say, two DP arguments is really parallel in nature to the prominence relation between a verbal predicate and its complement, as is assumed in Cinque’s (1993) approach, but not in Schmerling (1976) or Gussenhoven (1984).
3.2.2. Is nuclear stress mediated by phrasing?

Rather than directly stating conditions on prominence relations between sister constituents, a popular approach is to derive prominence relations as an emergent property of the edge-alignment of syntactic and phonological domains, together with additional constraints that guide the distribution of prominence within domains. This approach mediates the derivation of prominence by constraints on phonological phrasing. A serious problem for this idea is that it has been shown time and again that phrasing and prominence can be independently manipulated. In French, for example, prosodic phrasing remains intact even in the pitch-reduced domain following an early focused constituent (Jun and Fougeron 2000). Similar results of intact phrasing in the post-focal domain were obtained for English in Jaeger and Norcliffe (2005) and for Japanese in Sugahara (2003) and Féry and Ishihara (2012). This constitutes a paradox for the phrasing-based view of prosodic prominence, which would hold that post-focal subordination in fact involves phrasing all material into a single prosodic domain, and thus altering prosodic phrasing. It is instructive to look at why the original account of edge-marking (Chen 1987; Selkirk 1986, 1995) alone is not sufficient to capture prominence patterns in Germanic. If a language only showed either OV orders or VO orders, one could argue that OV languages mark the left edge of XPs and VO languages the right edge, in order to capture the fact that in both orders argument and predicate phrase together. This would not explain, however, why prominence falls on the object in both orders. Also, it doesn’t actually explain why OV languages shouldn’t instead show right-alignment and VO languages left-alignment, resulting in separate phrasings of predicate and object: it could just as well be the other way around. Maybe this additional power is needed, but most current accounts assume that such languages should be ruled out.
Setting this typological question aside, an even bigger problem for this approach is that one can find both OV and VO word orders within a single language, and the data discussed above from English and German are examples of this. A promising modification of the alignment-based approach is presented in Truckenbrodt (2006a), who uses the constraint Stress-XP from Truckenbrodt (2005). The constraint interacts with other constraints that minimize the number of accentual domains, and serves to derive the nuclear stress pattern in English and German:

(34) Stress-XP
     Each XP is assigned a beat of phrasal stress. (Truckenbrodt 2006a)

This constraint can indeed be used to explain nuclear stress in both OV and VO contexts, since in both cases an accent on the object can provide an accent for both the DP and the VP:

(35) a. [VP [DP Ó ] V ]
     b. [VP V [DP Ó ] ]
The underlying assumption is that every lexical word heads a maximal projection, and that every XP needs an accent (unless it is discourse-given). This account captures several of the predicate orders in (31). However, it fails to capture the full generalization in (32), namely that a predicate is subordinated whenever it is preceded by (part of) its complement. For crucial test cases the account based on Stress-XP makes very different predictions. Consider the following German word order, repeated from (31d):

(36) … weil er [VP3 zu schwéigen3 ]i [VP1 versprach1 [VP2 zu versuchen2 ti ] ].    [German]
        because he to be.silent promised to try
     ‘… because he promised to try to be silent.’

In this case, whether or not VP1 contains an accent depends on whether VP3 is located within its specifier. But crucially, it is clear that VP2 does not contain an accent. The generalization from Wagner (2005b, 2007) − that a functor is subordinated if at least part of its complement precedes − holds here, but the account in terms of Stress-XP proposed in Truckenbrodt (2006a) cannot capture the full pattern in (32) without making special assumptions for some of the predicate orders. Büring (2012) develops a new account and proposes a promising notion of structural strength to account for relative prosodic strength, including for cases in which there is “long-distance integration”, similar to the case in (31d). Alternative proposals combine edge-marking with a notion of cyclic spell-out (Kahnemuyipour 2004; Kratzer and Selkirk 2007). I will not discuss these approaches in detail here.
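To see how Stress-XP interacts with accent economy in the simple cases in (35), here is a toy exhaustive search. It is my own sketch, not Truckenbrodt's formalism: constituents are encoded as word-index spans, and the pressure to minimize accentual domains is modeled simply as "use as few accents as possible":

```python
from itertools import combinations

def minimal_accents(words, xps):
    """Return the smallest sets of accented words such that every XP
    span contains at least one accent (Stress-XP), preferring fewer
    accents (a stand-in for constraints minimizing accentual domains)."""
    for k in range(1, len(words) + 1):
        solutions = [set(c) for c in combinations(range(len(words)), k)
                     if all(any(s <= i < e for i in c) for s, e in xps)]
        if solutions:
            return [{words[i] for i in sol} for sol in solutions]

# (35a) [VP [DP O ] V ]: DP = words 0..1, VP = words 0..2
print(minimal_accents(["O", "V"], [(0, 1), (0, 2)]))  # [{'O'}]
# (35b) [VP V [DP O ] ]: DP = words 1..2, VP = words 0..2
print(minimal_accents(["V", "O"], [(1, 2), (0, 2)]))  # [{'O'}]
```

In both orders a single accent on the object satisfies Stress-XP for both the DP and the VP, which is exactly the point made in the text; the problematic cases like (36) arise once movement makes span membership itself uncertain.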
3.3. Information structure

An important factor affecting prominence is information structure. The interaction of focus and prosody, and what we can learn from it about the architecture of grammar, has been a major concern at least since Chomsky (1971).
3.3.1. Focus and givenness

Various proposals have been made to account for the distribution of accents depending on context, framed in terms of focus (Jackendoff 1972; Rooth 1985: i.a.), in terms of theories of newness/givenness (cf. Schwarzschild 1999), or both (Selkirk 1995; Reinhart 2006). Core phenomena that have played a major role in shaping ideas in this domain are question/answer congruence, contrastive focus, anaphoric destressing of discourse-old material, and association with focus. Question-answer congruence has the effect that the last accent of an answer is usually contained in the constituent that corresponds to the wh-word in the question under discussion (I will mark the last accent in a sentence with capital letters, and following prosodically subordinated material by underlining):

(37) Question-Answer-Congruence (Sentence Focus)
     a. Who arrested Smith?
     b. The DETECTIVE arrested Smith.

A second factor is contrast. If there is an expression that is partially overlapping and partially contrasting with a previous expression, then accent placement is often affected:
(38) Contrast
     a. Did the policeman arrest Smith?
     b. No, the DETECTIVE arrested Smith.

A third factor is givenness. Accent placement can change if a constituent that otherwise would have borne an accent refers to an individual that has already been introduced to the discourse:

(39) Givenness
     a. Smith walked into a store. What happened next?
     b. A detective ARRESTED Smith.

A fourth factor is association with focus:

(40) Focus association
     Mary only introduced SMITH to Sue.

One major point of divergence among various approaches to information-structural effects on prosody is whether these four factors are treated as reflexes of one and the same underlying phenomenon, or receive different explanations. Accounts assuming that these are underlyingly one and the same phenomenon include Rooth (1992a); Schwarzschild (1999); Williams (1997); Wagner (2006) and Büring (2008). Accounts that view them as different phenomena include Selkirk (1995); Szendröi (2001); Reinhart (2006); Féry and Samek-Lodovici (2006) and Katz and Selkirk (2011), who separate anaphoric destressing of given material from instances of true focus. An interesting question, originating in Chomsky (1971), is why certain renditions of a sentence seem to be compatible with different focus values, a phenomenon often called focus projection, following Höhle (1982). Selkirk (1984) proposed an account of this phenomenon in which a syntactic focus feature literally projects in syntax. As pointed out in von Stechow and Uhmann (1984), Selkirk’s proposal doesn’t account for differences between internal and external arguments with respect to whether or not focus can project from them upward to the sentence node; that is, it doesn’t account for Newman’s observations about the special status of internal arguments. The more recent approach to focus projection in Selkirk (1995) incorporates von Stechow and Uhmann’s insight by restricting upward focus projection so that it is licit only from internal arguments.
To what extent syntactic focus projection is necessary, however, remains questionable, partly because there may be other ways to account for the data that do not require the projection of a syntactic feature (Rooth 1992b; Schwarzschild 1999). There are also various very basic empirical problems with the formulation of focus projection in Selkirk (1995), as discussed in Wagner (2005b); Büring (2006) and Wagner (2012b). The often-made assumption that object focus and VP focus in English are not distinguished phonetically in production has been shown to be false in Gussenhoven (1984) and Breen et al. (2010), which is unexpected under certain views of focus projection. All of the proposals mentioned so far share the assumption that marking a constituent as focused or given introduces a condition on the context that calls for the presence of some salient information in the discourse. A lot of the issues that distinguish different approaches are semantic and pragmatic in nature, but the assumed representation of
focus often predicts interactions with syntax. Movement, for example, can change the semantic condition introduced by focus simply by virtue of changing the syntactic sister relation, and there is indeed evidence that movement plays an important role in the syntax of both focused and given constituents (Neeleman and Reinhart 1998; Reinhart 2006; Wagner 2006; Neeleman and Van De Koot 2008; Kucerova 2007). These approaches share the idea that there are certain syntactic configurations that lead to particular information-structural effects, either because the syntax of a focus operator interacts with movement (Wagner 2006; Kucerova 2007), or because certain discourse templates directly associate information-structural impact with certain tree geometries (Neeleman and Van De Koot 2008). A very different approach to information-structural effects is the cartographic approach, pioneered in Rizzi (1997), which assumes that information-structural notions such as Focus and Topic are syntactically encoded in functional projections that constitute the syntactic spine of a sentence. Focused and topical constituents move to the specifiers of these functional projections. Under this view, the assumption is often that these projections are part of a universal hierarchy of functional projections (Cinque 1999). A comparison between these two types of approaches is beyond the scope of this overview article, but I refer the reader to Neeleman and Van De Koot (2008) and Wagner (2012a) for some relevant comparative discussion.
3.3.2. Is focus prominence mediated by phrasing?

Many approaches to focus assume that focus directly affects prosodic phrasing, and that the negotiation of focus prominence is mediated by phrasing in some way or other (Pierrehumbert 1980; Kanerva 1990; Truckenbrodt 1995; Selkirk 1995). One common assumption is that phrasing is obliterated post-focally, since these accounts rule out positing higher-level phrasing in the absence of pitch accents (Beckman 1996). There is some evidence, however, that post-focal reduction leaves phrasing intact, in Japanese (Sugahara 2003; Féry and Ishihara 2012), English (Jaeger and Norcliffe 2005), and Mandarin (Xu 2005), in which case prominence and phrasing might actually be orthogonal dimensions of prosodic representation.
3.4. Processing accounts

A different perspective on variation in prominence is pursued in Jurafsky et al. (2001); Aylett and Turk (2004) and Jaeger (2006), who link prominence to predictability measures, or in accounts in terms of discourse accessibility, e.g. Terken (1984) or Terken and Hirschberg (1994); see Arnold (2008) for an overview. A word that is highly expected will be realized as shorter and less prominent, while a word that is highly unexpected will be realized with more prominence. A model that predicts these asymmetries is the smooth signal hypothesis of Aylett and Turk (2004), which holds that the way language works results in a constant entropy throughout the signal. The information density across an utterance is held constant by lengthening and amplifying unexpected material and by shortening and reducing expected material.
An attractive feature of this type of account is that it has the potential of tying together the default prominence distribution, in which no material is marked as being salient in discourse, and the special case where prominence is influenced by the question under discussion or by previously mentioned or salient information. That said, a model in terms of entropy or predictability alone cannot easily explain why focus and givenness marking are subject to grammatical constraints that differ dramatically between languages (Ladd 2008). Even if these types of grammatical constraints turn out to be grammaticalized instances of predictability effects, they nevertheless call for a model of grammar that goes beyond purely quantitative acoustic effects of reduction and strengthening, so both levels of analysis are likely to be necessary to understand the complete picture. Unfortunately there has been no attempt to account for core phenomena such as prosodic question-answer congruence, contrastive stress, or givenness-related deaccentuation in terms of predictability. This is an area where clearly more research is needed that takes into account insights from different disciplines.
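The predictability measures these accounts invoke are typically operationalized as surprisal, the negative log probability of a word in context. A minimal sketch, with a toy corpus and unsmoothed bigram estimates of my own (actual studies use large corpora or language models):

```python
import math
from collections import Counter

# Toy corpus; a predictable word should receive low surprisal and,
# on these accounts, be realized shorter and less prominently.
corpus = "the dog chased the cat and the dog chased the squirrel".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev, word):
    """Surprisal -log2 P(word | prev) from raw bigram counts (no smoothing)."""
    return -math.log2(bigrams[(prev, word)] / unigrams[prev])

print(surprisal("the", "dog"))       # 1.0  (frequent continuation: low information)
print(surprisal("the", "squirrel"))  # 2.0  (rare continuation: high information)
```

Under the smooth signal hypothesis, the higher-surprisal word is the one predicted to be lengthened and amplified, keeping information density roughly constant across the utterance.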
3.5. Summary: Syntax and prominence

Many of the same issues arise in the study of syntactic effects on prosodic prominence as in the study of prosodic boundaries. How does syntax interact with prominence? Is the effect of syntax limited to the negotiation of relative prominence, or are there absolute prominence constraints that syntax imposes (Wagner 2005b; Calhoun 2012)? Are certain non-phonological generalizations syntactic or semantic in nature? To what extent are apparent effects of syntax reducible to more basic processing effects? This review did not touch at all on the question of what the phonologically categorical distinctions of prominence levels or accent types are, and how they relate to syntax and semantics (cf. Breen et al. 2010). It also did not address the role of metrical and rhythmic constraints on prominence (cf. Calhoun 2010), and how they might interact with syntactic ones. This article is only able to give a small glimpse of the open research questions in this domain, and as in the discussion of boundaries, we focused on issues related to syntax and didn’t discuss issues in phonological representation in any detail. Another issue arising both for boundaries and for prominence is whether phonology can affect syntax. One compelling case of syntactic movement being motivated by the need of a constituent to receive prominence is the analysis of focus movement in Hungarian in Szendröi (2003). Related proposals have been made in the analysis of other languages. The question of whether such back-channelling effects are indeed attested, or whether there could be alternative accounts of apparent cases of prominence-driven syntax, remains controversial and constitutes an area of active research.
4. Intonational tunes

Intonational tunes encode meanings that tend to be related to the type of speech act of an utterance (e.g., interrogative vs. declarative), trigger implicatures (e.g., the contrastive
34. Phonological Evidence in Syntax
1187
topic intonation of Büring 1997), or encode attitudinal meanings (for example, sarcasm/ irony or speaker uncertainty, cf. Ward and Hirschberg 1985). Phrasing and Prominence can be varied independently of the choice of the intonational tune (Liberman 1975; Ladd 2008). Bolinger (cf. 1986) proposed separating out intonational units as meaningful elements in their own right. Recent research has converged on the view that treats at least some elements of the intonational representation as independent elements with their own meaning (Gussenhoven 1984; Pierrehumbert and Hirschberg 1990). The precise analysis of tunes remains highly controversial, including very basic questions about representation and about how much decomposition of complex tunes into smaller meaningful units is necessary. An exemplary case of an intonational tune is the so called (Rise-)Fall-Rise contour (RFR) (cf. Ward and Hirschberg 1985). It consists of a rise followed by a fall on the syllable carrying main sentence prominence, and a sentence-final rise. According to the ToBI convention, it is transcribed as [ L*H L- H%] (example from Ward and Hirschberg 1985): (41) a. b.
Did Victor get tickets for the Fellini triple feature? Veronica did.
L*H L- H% The RFR can be superimposed on any sentence, and will align with sentence prominence in a systematic way. Sometimes the pitch accent is realized on several prominent constituents. It is associated with a systematic meaning, and according to Ward and Hirschberg (1985: 756) it “conveys uncertainty about the appropriateness of some utterance in a given context (…).” Alternative characterizations can be found in Ladd (1980); Bolinger (1982); Gussenhoven (1984); Constant (2012) and Wagner (2012a). There is a growing body of recent work on the precise semantics and pragmatics of intonational tunes, and how they interact with syntax, for example the work by Gunlogson (2003); Truckenbrodt (2006b), and Trinh and Crnič (2011) on the rises in yes/noquestions and how they contribute to the meaning of inverted questions and rising declaratives, or Pruitt and Roelofson (to appear) on the intonation and syntax of alternative questions. This is another area where our understanding of the syntactic and semantic issues is still very preliminary.
5. Conclusion

Prosodic phrasing, prominence, and intonational tunes directly relate to syntactic structure in complex and interesting ways. Careful attention to prosodic properties of utterances can in principle guide the development of syntactic analyses. One fundamental problem, however, that often stands in the way of using prosodic evidence in developing syntactic arguments is that our understanding of this relationship is still rudimentary and controversial. So while this chapter is programmatically called Phonological Evidence in Syntax, the current reality is that we still need to understand the relation between syntax and prosody better in order to be able to trust the phonological evidence in guiding our syntactic analysis. This state of affairs is the result of a long tradition of work that focuses either on phonology or on syntax, but not both, and restricts itself to making plausible but untested assumptions about the underlying syntactic or prosodic structure, respectively. Unfortunately, what seems plausible often turns out to be wrong once proper evidence is considered. The hopeful result of recent work on syntax and prosody is that figuring out the correct syntactic representation often makes much more sense of the observed prosodic structure, and vice versa. Much more work that is both syntactically and prosodically responsible is needed to accomplish the goal of turning prosody into a reliable source of evidence for syntax, or at least of identifying the circumstances under which it is. This overview of current thinking on the syntax-phonology relationship will hopefully be of help to syntacticians and phonologists alike who want to embark on this project.
Acknowledgements

This work was supported by FQRSC Grant NP-132516, SSHRC Grant 410–20111062, and SSHRC funding provided through the Canada Research Chair program. I would like to thank Emily Elfner and an anonymous reviewer for very insightful comments on an earlier version of this paper, and the editors for many helpful editorial comments.
6. References (selected)

Ackema, Peter, and Ad Neeleman 2003 Context-sensitive spell-out. Natural Language and Linguistic Theory 21(4): 681−735.
Adger, David 2007 Stress and phasal syntax. Linguistic Analysis 38: 238−266.
Adger, David 2013 A Syntax of Substance. MIT Press.
Aissen, Judith L. 1992 Topic and focus in Mayan. Language 68(1): 43−80.
Arnold, Jennifer E. 2008 Reference production: Production-internal and addressee-oriented processes. Language and Cognitive Processes 23(4): 495−527.
Arregi, Karlos 2002 Focus on Basque movement. Ph.D. thesis, MIT, Cambridge, Ma.
Aylett, Matthew, and Alice Turk 2004 The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47(1): 31−56.
Bard, Ellen Gurman, Anne H. Anderson, Catherine Sotillo, Matthew Aylett, Gwyneth Doherty-Sneddon, and Alison Newlands 2000 Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language 42: 1−22.
Beckman, M. E. 1996 The parsing of prosody. Prosody and Parsing 19(11): 67.
van den Berg, Rob, Carlos Gussenhoven, and Toni Rietveld 1992 Downstep in Dutch: Implications for a model. In: Gerald Docherty, and Robert Ladd (eds.), Papers in Laboratory Phonology, vol. II: Gesture, Segment, Prosody, 335−358. Cambridge: Cambridge University Press.
Birch, Stacy, and Charles Clifton 1995 Focus, accent, and argument structure: Effects on language comprehension. Language and Speech 38(4): 365−391.
Birch, Stacy, and Charles Clifton 2002 Effects of varying focus and accenting of adjuncts on the comprehension of utterances. Journal of Memory and Language 47(4): 571−588.
Blaszczak, Joanna, and Hans-Martin Gärtner 2005 Intonational phrasing and the scope of negation. Syntax 7: 1−22.
Bolinger, Dwight 1982 Intonation and its parts. Language 58(3): 505−533.
Bolinger, Dwight 1986 Intonation and Its Parts: Melody in Spoken English. London: Edward Arnold.
Breen, Mara, Evelina Fedorenko, Michael Wagner, and Edward Gibson 2010 Acoustic correlates of information structure. Language and Cognitive Processes 25(7): 1044−1098.
Bresnan, Joan 1971 Sentence stress and syntactic transformations. Language 47(2): 257−281.
Büring, Daniel 1997 The Meaning of Topic and Focus: The 59th Street Bridge Accent. (Routledge Studies in German Linguistics.) London: Routledge.
Büring, Daniel 2006 Focus projection and default prominence. In: Valéria Molnár, and Susanne Winkler (eds.), The Architecture of Focus, 321−346. Berlin: Mouton De Gruyter.
Büring, Daniel 2008 What’s new (and what’s given) in the theory of focus? In: Proceedings of BLS 34, 403−424.
Büring, Daniel 2012 Predicate integration: Phrase structure or argument structure? In: Ad Neeleman, and Ivona Kučerová (eds.), Contrasts and Positions in Information Structure, 27−47. Cambridge: Cambridge University Press.
Calhoun, Sasha 2010 The centrality of metrical structure in signaling information structure: A probabilistic perspective. Language 86(1): 1−42.
Calhoun, Sasha 2012 The theme/rheme distinction: Accent type or relative prominence? Journal of Phonetics 40(2): 329−349.
Chen, Matthew Y. 2000 Tone Sandhi: Patterns across Chinese Dialects. Cambridge: Cambridge University Press.
Chen, Matthew Y. 1987 The syntax of Xiamen tone sandhi. Phonology Yearbook 4: 109−149.
Chomsky, Noam 1971 Deep structure, surface structure, and semantic interpretation. In: D. D. Steinberg, and L. A. Jakobovits (eds.), Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology. Cambridge: Cambridge University Press.
Chomsky, Noam, and Morris Halle 1968 The Sound Pattern of English. New York: Harper & Row.
Chomsky, Noam, Morris Halle, and Fred Lukoff 1957 On accent and juncture in English. In: Morris Halle, Horace Lunt, and Hugh MacLean (eds.), For Roman Jakobson. The Hague: Mouton.
Cinque, Guglielmo 1993 A null theory of phrase and compound stress. Linguistic Inquiry 24(2): 239−297.
Cinque, Guglielmo 1999 Adverbs and Functional Heads: A Cross-Linguistic Perspective. (Oxford Studies in Comparative Syntax.) Oxford: Oxford University Press.
Clifton, Charles Jr., Katy Carlson, and Lyn Frazier 2002 Informative prosodic boundaries. Language and Speech 45(2): 87−114.
Cole, Jennifer, Yoonsook Mo, and Soondo Baek 2010 The role of syntactic structure in guiding prosody perception with ordinary listeners and everyday speech. Language and Cognitive Processes 25(7−9): 1141−1177.
Constant, Noah 2012 English rise-fall-rise: A study in the semantics and pragmatics of intonation. Linguistics and Philosophy 35(5): 407−442.
Cooper, W. E., C. Egido, and J. M. Paccia 1978 Grammatical control of a phonological rule: Palatalization. Journal of Experimental Psychology 4(2): 264−272.
Cooper, William E., and Jeanne Paccia-Cooper 1980 Syntax and Speech. Cambridge, Mass.: Harvard University Press.
Culicover, Peter W., and Ray Jackendoff 2005 Simpler Syntax. Oxford: Oxford University Press.
Dehé, Nicole, and Vieri Samek-Lodovici 2009 On the prosody and syntax of DPs: Evidence from Italian noun adjective sequences. Natural Language and Linguistic Theory 27(1): 45−75.
Dilley, Laura 2005 The phonetics and phonology of tonal systems. Ph.D. thesis, MIT.
Dobashi, Yoshihito 2003 Phonological phrasing and syntactic derivation. Ph.D. thesis, Cornell University.
Dresher, Bezalel Elan 1994 The prosodic basis of the Tiberian Hebrew system of accents. Language 70(1): 1−52.
Elfner, Emily 2012 Syntax-prosody interactions in Irish. Ph.D. thesis, UMass Amherst.
Ferreira, Fernanda 1993 Creation of prosody during sentence production. Psychological Review 100: 233−253.
Ferreira, Fernanda 2007 Prosody and performance in language production. Language and Cognitive Processes 22(8): 1151−1177.
Féry, Caroline 2001 Focus and phrasing in French. In: Caroline Féry, and Wolfgang Sternefeld (eds.), Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow, 153−181. (Studia Grammatica 52.) Berlin: Akademie Verlag.
Féry, Caroline, and Shinichiro Ishihara 2012 How focus and givenness shape prosody. Ms. Goethe Universität Frankfurt.
Féry, Caroline, and Vieri Samek-Lodovici 2006 Focus projection and prosodic prominence in nested foci. Language 82: 131−150.
Féry, Caroline, and Hubert Truckenbrodt 2005 Sisterhood and tonal scaling. Studia Linguistica 59(2−3): 223−243.
Fougeron, C., and E. Delais-Roussarie 2004 Liaisons et enchaînements: “Fais en á Fez parlant”. Actes des XXVèmes Journées d’Etudes sur la Parole, Fez (Morocco): 221−224.
Fougeron, C., and P. Keating 1997 Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America 101(6): 3728−3740.
Fox, Danny, and David Pesetsky 2005 Cyclic linearization of syntactic structure. Theoretical Linguistics 31: 1−45.
Frazier, Lyn, Katy Carlson, and Charles Clifton Jr. 2006 Prosodic phrasing is central to language comprehension. Trends in Cognitive Sciences 10(6): 244−249.
Fuchs, Anna 1976 Normaler und kontrastiver Akzent. Lingua 38: 293−312.
Gee, J., and F. Grosjean 1983 Performance structures: A psycholinguistic appraisal. Cognitive Psychology 15: 411−458.
Ghini, Mirco 1993 ϕ-formation in Italian: A new proposal. In: Carrie Dyck (ed.), Toronto Working Papers in Linguistics, vol. 4, 41−78. University of Toronto.
Goldrick, Matthew 2011 Utilizing psychological realism to advance phonological theory. In: John Goldsmith, Jason Riggle, and Alan Yu (eds.), The Handbook of Phonological Theory, 2nd edn., 631−660. Oxford: Blackwell.
Goldsmith, John A. (ed.) 1995 The Handbook of Phonological Theory. London: Blackwell.
Gregory, M. L., W. D. Raymond, A. Bell, E. Fosler-Lussier, and D. Jurafsky 1999 The effects of collocational strength and contextual predictability in lexical production. In: Proceedings of the Chicago Linguistic Society, vol. 35, 151−166.
Grosjean, F., and M. Collins 1979 Breathing, pausing, and reading. Phonetica 36: 98−114.
Guimarães, Maximiliano 2004 Derivation and representation of syntactic amalgams. Ph.D. thesis, University of Maryland.
Gunlogson, Christine 2003 True to Form: Rising and Falling Declaratives as Questions in English. New York, London: Routledge.
Gussenhoven, Carlos 1983 Testing the reality of focus domains. Language and Speech 26: 61−80.
Gussenhoven, Carlos 1984 On the Grammar and Semantics of Sentence Accents. Dordrecht: Foris.
Gussenhoven, Carlos 1992 Sentence accents and argument structure. In: Iggy M. Roca (ed.), Thematic Structure: Its Role in Grammar, 79−106. Berlin, New York: Foris.
Haider, Hubert 1993 Deutsche Syntax − Generativ. Tübingen: Gunter Narr.
Hale, Kenneth, and Elisabeth O. Selkirk 1987 Government and tonal phrasing in Papago. Phonology Yearbook 4: 185−194.
Halle, Morris 1997 Some consequences of the representation of words in memory. Lingua 100(1): 91−100.
Halle, Morris, and Jean-Roger Vergnaud 1987 An Essay on Stress. Cambridge: MIT Press.
Hayes, Bruce, and Aditi Lahiri 1991 Bengali intonational phonology. Natural Language and Linguistic Theory 9: 47−96.
Henderson, Robert 2012 Morphological alternations at the intonational phrase edge. Natural Language & Linguistic Theory 30: 741−789.
Hirotani, Masako 2005 Prosody and LF-interpretation: Processing Japanese wh-questions. Ph.D. thesis, UMass Amherst.
Höhle, Tilman 1982 Explikation für ‘Normale Betonung’ und ‘Normale Wortstellung’. In: Werner Abraham (ed.), Satzglieder des Deutschen, 75−153. Tübingen: Narr.
Hyman, Larry M., Francis Katamba, and Livingstone Walusimbi 1987 Luganda and the strict layer hypothesis. Phonology Yearbook 4: 87−108.
Inkelas, Sharon, and Draga Zec (eds.) 1990 The Phonology-Syntax Connection. Chicago: The University of Chicago Press.
Ishihara, Shinichiro 2003 Intonation and interface conditions. Ph.D. thesis, MIT.
Ito, Junko, and Armin Mester 2010 Recursive prosodic phrasing in Japanese. In: Proceedings of the 18th Japanese/Korean Conference, 147−164. Stanford: CSLI.
Ito, Junko, and Armin Mester 2012 Prosodic subcategories in Japanese. Lingua 124: 20−40.
Jackendoff, Ray S. 1972 Semantic Interpretation in Generative Grammar. Cambridge, Ma.: MIT Press.
Jaeger, T. Florian 2006 Redundancy and syntactic reduction in spontaneous speech. Ph.D. thesis, Stanford University.
Jaeger, T. Florian, and Elisabeth J. Norcliffe 2005 Post-nuclear phrasing. Presented at the LSA Meeting, Oakland, CA.
Jun, S. A., and C. Fougeron 2000 A phonological model of French intonation. In: Antonis Botinis (ed.), Intonation: Analysis, Modelling and Technology, 209−242. Dordrecht: Kluwer Academic Publishers.
Jurafsky, Dan, Alan Bell, Michelle Gregory, and William D. Raymond 2001 Probabilistic relations between words: Evidence from reduction in lexical production. In: Joan Bybee, and Paul Hopper (eds.), Frequency in the Emergence of Linguistic Structure, 229−254. Amsterdam: John Benjamins.
Kahnemuyipour, Arsalan 2004 The syntax of sentential stress. Ph.D. thesis, University of Toronto.
Kaisse, Ellen M. 1985 Connected Speech: The Interaction between Syntax and Phonology. Orlando, Fla.: Academic Press.
Kanerva, Jonni M. 1990 Focus and Phrasing in Chichewa Phonology. New York: Garland.
Katz, J., and E. Selkirk 2011 Contrastive focus vs. discourse-new: Evidence from phonetic prominence in English. Language 87(4): 771−816.
Kayne, Richard S. 1994 The Antisymmetry of Syntax. Cambridge, Ma.: MIT Press.
Kayne, Richard S. 2009 Toward a syntactic reinterpretation of Harris and Halle (2005). Ms. NYU.
Kratzer, Angelika, and Elisabeth Selkirk 2007 Phase theory and prosodic spell-out: The case of verbs. Linguistic Review 24(2−3): 93−135.
Kubozono, Haruo 1989 Syntactic and rhythmic effects on downstep in Japanese. Phonology 6: 39−67.
Kučerová, Ivona 2007 The syntax of givenness. Ph.D. thesis, Massachusetts Institute of Technology.
Ladd, D. Robert 1980 The Structure of Intonational Meaning. Bloomington: Indiana University Press.
Ladd, D. Robert 1986 Intonational phrasing: The case for recursive prosodic structure. Phonology Yearbook 3: 311−340.
Ladd, D. Robert, and Catherine Johnson 1987 “Metrical” factors in the scaling of sentence-initial accent peaks. Phonetica 44(4): 238−245.
Ladd, D. Robert, and Rachel Morton 1997 The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics 25(3): 313−342.
Ladd, D. Robert 1988 Declination ‘reset’ and the hierarchical organization of utterances. Journal of the Acoustical Society of America 84.
Ladd, D. Robert 2008 Intonational Phonology. 2nd edn. Cambridge: Cambridge University Press.
Lehiste, Ilse 1973 Phonetic disambiguation of syntactic ambiguity. Glossa 7: 107−122.
Levelt, W. J. M. 1992 Accessing words in speech production: Stages, processes and representations. Cognition 42(1−3): 1−22.
Liberman, Mark Y. 1975 The intonational system of English. Ph.D. thesis, MIT.
Marantz, Alec 1984 On the Nature of Grammatical Relations. Cambridge, Ma.: MIT Press.
McCloskey, James 2011 The shape of Irish clauses. In: Andrew Carnie (ed.), Formal Approaches to Celtic Linguistics, 143−178. Cambridge Scholars Publishing.
Miller, Jessica, and Zsuzsanna Fagyal 2005 Phonetic cues to special cases of liaison. In: Randall Gess, and Edward J. Rubin (eds.), Theoretical and Experimental Approaches to Romance Linguistics, 179−196. Amsterdam: John Benjamins.
Miozzo, M., and A. Caramazza 1999 The selection of determiners in noun phrase production. Journal of Experimental Psychology: Learning, Memory, and Cognition 25(4): 907.
Moulton, Keir 2012 Clause positions. Ms. Simon Fraser University.
Myrberg, Sara 2013 Sisterhood in prosodic phrasing. Phonology 30: 73−124.
Neeleman, Ad, and Tanya Reinhart 1998 Scrambling and the PF-interface. In: Miriam Butt, and Wilhelm Geuder (eds.), The Projection of Arguments, 309−353. Stanford: CSLI Publications.
Neeleman, Ad, and Hans Van De Koot 2008 Dutch scrambling and the nature of discourse templates. The Journal of Comparative Germanic Linguistics 11(2): 137−189.
Nespor, Marina, and Irene Vogel 1986 Prosodic Phonology. Dordrecht: Foris.
Newman, Stanley S. 1946 On the stress system of English. Word 2: 171−187.
Niebuhr, Oliver, Meghan Clayards, Christine Meunier, and Leonardo Lancia 2011 On place assimilation in sibilant sequences − comparing French and English. Journal of Phonetics 39(3): 429−451.
Odden, David 1990 Syntax, lexical rules and postlexical rules in Kimatuumbi. In: Sharon Inkelas, and Draga Zec (eds.), The Phonology-Syntax Connection, 259−278. Chicago: The University of Chicago Press.
Pak, Marjorie 2008 The postsyntactic derivation and its phonological reflexes. Ph.D. thesis, University of Pennsylvania.
Pak, Marjorie, and Michael Friesner 2006 French phrasal phonology in a derivational model of PF. In: Christopher Davis et al. (eds.), Proceedings of NELS 36, vol. 2, 480−491.
Partee, Barbara H. 1976 Some transformational extensions of Montague grammar. In: Barbara H. Partee (ed.), Montague Grammar, 51−76. New York: Academic Press.
Patterson, David, and Cynthia M. Connine 2001 Variant frequency in flap production: A corpus analysis of variant frequency in American English flap production. Phonetica 58: 254−275.
Phillips, Colin 1996 Order and structure. Ph.D. thesis, MIT.
Pierrehumbert, Janet 1980 The phonology and phonetics of English intonation. Ph.D. thesis, MIT.
Pierrehumbert, Janet, and Julia Hirschberg 1990 The meaning of intonational contours in the interpretation of discourse. In: Philip R. Cohen, Jerry Morgan, and Martha E. Pollack (eds.), Intentions in Communication, 271−311. Cambridge, Ma.: MIT Press.
Pijper, Jan Roelof de, and Angelien A. Sanderman 1994 On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues. The Journal of the Acoustical Society of America 96: 2037−2047.
Post, Brechtje 2000 Pitch accents, liaison and the phonological phrase in French. Probus 12: 127−164.
Price, Patti J., Mari Ostendorf, Stefanie Shattuck-Hufnagel, and Cynthia Fong 1991 The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America 90(6): 2956−2970.
Pruitt, Kathryn, and Floris Roelofsen to appear The interpretation of prosody in disjunctive questions. Linguistic Inquiry.
Pylkkänen, Liina 2002 Introducing arguments. Ph.D. thesis, MIT.
Ranbom, Larissa J., Cynthia M. Connine, and Elana M. Yudman 2009 Is phonological context always used to recognize variant forms in spoken word recognition? The role of variant frequency and context distribution. Ms. State University of New York at Binghamton.
Reinhart, Tanya 2006 Interface Strategies. Cambridge, Ma.: MIT Press.
Rizzi, Luigi 1997 The fine structure of the left periphery. In: Liliane Haegeman (ed.), Elements of Grammar, 281−337. Dordrecht: Kluwer Academic Publishers.
Rooth, Mats 1985 Association with focus. Ph.D. thesis, University of Massachusetts, Amherst.
Rooth, Mats 1992a Ellipsis redundancy and reduction redundancy. In: Steve Berman, and Arild Hestvik (eds.), Proceedings of the Stuttgart Ellipsis Workshop. (Arbeitspapiere des Sonderforschungsbereichs 340: Grundlagen für die Computerlinguistik 29.) Universität Stuttgart/Tübingen.
Rooth, Mats 1992b A theory of focus interpretation. Natural Language Semantics 1(1): 75−116.
Schmerling, Susan F. 1976 Aspects of English Sentence Stress. Austin: University of Texas Press.
Schwarzschild, Roger 1999 Givenness, AVOIDF and other constraints on the placement of accent. Natural Language Semantics 7: 141−177.
Scott, Donia R., and Anne Cutler 1984 Segmental phonology and the perception of syntactic structure. Journal of Verbal Learning and Verbal Behavior 23(4): 450−466.
Seidl, Amanda 2001 Minimal Indirect Reference: A Theory of the Syntax-Phonology Interface. (Outstanding Dissertations in Linguistics.) Routledge.
Selkirk, Elisabeth 1986 On derived domains in sentence phonology. Phonology Yearbook 3: 371−405.
Selkirk, Elisabeth 1996 The prosodic structure of function words. In: James L. Morgan, and Katherine Demuth (eds.), Signal to Syntax, 187−213. Mahwah, NJ: Lawrence Erlbaum Associates.
Selkirk, Elisabeth 2005 Comments on intonational phrasing in English. In: Sonia Frota, Marina Cláudia Vigário, and Maria João Freitas (eds.), Prosodies: With Special Reference to Iberian Languages, 11−58. (Phonetics and Phonology.) Berlin/New York: Mouton de Gruyter.
Selkirk, Elisabeth 2011 The syntax-phonology interface. In: John Goldsmith, Jason Riggle, and Alan Yu (eds.), The Handbook of Phonological Theory, 435−484. Oxford: Blackwell.
Selkirk, Elisabeth O. 1972 The phrase phonology of English and French. Ph.D. thesis, MIT.
Selkirk, Elisabeth O. 1984 Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Selkirk, Elisabeth O. 1995 Sentence prosody: Intonation, stress, and phrasing. In: John A. Goldsmith (ed.), The Handbook of Phonological Theory, 550−569. London: Blackwell.
Sells, Peter 2001 Structure, Alignment and Optimality in Swedish. (Stanford Monographs in Linguistics.) Stanford, CA: CSLI Publications.
Shattuck-Hufnagel, Stefanie, and Alice E. Turk 1996 A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25(2): 193−247.
Silverman, Kim, Mary Beckman, Mari Ostendorf, Colin Wightman, Patti Price, Janet Pierrehumbert, and Julia Hirschberg 1992 ToBI: A standard for labeling English prosody. In: Proceedings of the 1992 International Conference on Spoken Language Processing, vol. 2, 867−870.
Snedeker, Jesse, and John Trueswell 2003 Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language 48: 103−130.
von Stechow, Arnim, and Susanne Uhmann 1984 On the focus-pitch accent relation. In: Werner Abraham, and Sjaak de Mey (eds.), Papers from the 6th Groningen Grammar Talks on ‘Topic, Focus, and Configurationality’, 228−263. (Groninger Arbeiten zur Germanistischen Linguistik 25.)
Steedman, Mark 2000 Information structure and the phonology-syntax interface. Linguistic Inquiry 31: 649−689.
Steedman, Mark 2001 The Syntactic Process. Cambridge, Ma.: MIT Press.
Sternberg, Saul, Stephen Monsell, Ronald L. Knoll, and Charles E. Wright 1978 The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. In: George E. Stelmach (ed.), Information Processing in Motor Control and Learning, 117−152. New York: Academic Press.
Stockwell, Robert P. 1960 The place of intonation in a generative grammar. Language 36(3): 360−367.
Sugahara, M. 2003 Downtrends and post-FOCUS intonation in Tokyo Japanese. Ph.D. thesis, University of Massachusetts Amherst, Amherst, MA.
Szendröi, Kriszta 2001 Focus and the syntax-phonology interface. Ph.D. thesis, University College London, London.
Szendröi, Kriszta 2003 A stress-based approach to the syntax of Hungarian focus. Linguistic Review 20: 37−78.
Taglicht, Josef 1998 Constraints on intonational phrasing in English. Journal of Linguistics 34: 181−211.
Terken, J., and J. Hirschberg 1994 Deaccentuation of words representing ‘given’ information: Effects of persistence of grammatical function and surface position. Language and Speech 37(2): 125.
Terken, J. M. B. 1984 The distribution of pitch accents in instructions as a function of discourse structure. Language and Speech 27(3): 269−289.
Trinh, Tue, and Luka Crnič 2011 On the rise and fall of declaratives. In: Ingo Reich, Eva Horch, and Dennis Pauly (eds.), Proceedings of Sinn & Bedeutung 15, 645−660.
Truckenbrodt, Hubert 1994 Extraposition from NP and prosodic structure. In: Proceedings of NELS, 503−517.
Truckenbrodt, Hubert 1995 Phonological phrases: Their relation to syntax, focus, and prominence. Ph.D. thesis, MIT, Cambridge, Mass.
Truckenbrodt, Hubert 2002 Upstep and embedded register levels. Phonology 19(1): 77−120.
Truckenbrodt, Hubert 2005 A short report on intonation phrase boundaries in German. Linguistische Berichte 203: 273−296.
Truckenbrodt, Hubert 2006a Phrasal stress. In: E. K. Brown, and A. Anderson (eds.), The Encyclopedia of Languages and Linguistics, vol. 9, 572−579. Oxford: Elsevier.
Truckenbrodt, Hubert 2006b On the semantic motivation of syntactic verb movement to C in German. Theoretical Linguistics 32: 257−306.
Truckenbrodt, Hubert, and Isabelle Darcy 2010 Object clauses and phrasal stress. In: Nomi Erteschik-Shir, and Lisa Rochman (eds.), The Sound Patterns of Syntax, 189−216. Oxford: Oxford University Press.
Turk, Alice 2008 Prosodic constituency signals relative predictability. Ms. University of Edinburgh.
Turk, Alice E., and Stefanie Shattuck-Hufnagel 2000 Word-boundary-related duration patterns in English. Journal of Phonetics 28(4): 397−440.
Wagner, Michael 2004 Prosody as a diagonalization of syntax: Evidence from complex predicates. In: Matthew Wolf, and Keir Moulton (eds.), Proceedings of NELS 34 (Stony Brook, 2003), 587−602.
Wagner, Michael 2005a Asymmetries in prosodic domain formation. In: Norvin Richards, and Martha McGinnis (eds.), Perspectives on Phases, 329−367. (MIT Working Papers in Linguistics 49.)
Wagner, Michael 2005b Prosody and recursion. Ph.D. thesis, MIT.
Wagner, Michael 2006 Givenness and locality. In: Masayuki Gibson, and Jonathan Howell (eds.), Proceedings of SALT XVI, 295−312. Ithaca, NY: CLC Publications.
Wagner, Michael 2007 Prosodic evidence for recursion? Ms. Cornell University. Available on Lingbuzz.
Wagner, Michael 2010 Prosody and recursion in coordinate structures and beyond. Natural Language and Linguistic Theory 28(1): 183−237.
Wagner, Michael 2011 Production-planning constraints on allomorphy. Proceedings of the Acoustics Week in Canada. Canadian Acoustics 39(3): 160−161.
Wagner, Michael 2012a Contrastive topics decomposed. Semantics & Pragmatics 5(8): 1−54.
Wagner, Michael 2012b Focus and givenness: A unified approach. In: Ad Neeleman, and Ivona Kučerová (eds.), Contrasts and Positions in Information Structure, 102−147. Cambridge: Cambridge University Press.
Wagner, Michael 2012c Locality in phonology and production planning. In: Proceedings of Phonology in the 21st Century: Papers in Honour of Glyne Piggott. McGill Working Papers.
Wagner, Michael, Mara Breen, Edward Flemming, Stefanie Shattuck-Hufnagel, and Edward Gibson 2010 Prosodic effects of discourse salience and association with focus. In: Proceedings of Speech Prosody.
Wagner, Michael, and Jeffrey Klassen 2012 Accessibility is no alternative to alternatives. Ms. McGill University.
Ward, Gregory, and Julia Hirschberg 1985 Implicating uncertainty: The pragmatics of fall-rise intonation. Language 61(3): 747−776.
Watson, Duane G. 2002 Intonational phrasing in language production and comprehension. Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences.
Welby, Pauline 2003 Effects of pitch accent position, type, and status on focus projection. Language and Speech 46(1): 53−81.
Wheeldon, Linda, and Aditi Lahiri 1997 Prosodic units in speech production. Journal of Memory and Language 37: 356−381.
Wheeldon, Linda R., and Aditi Lahiri 2002 The minimal unit of phonological encoding: Prosodic or lexical word. Cognition 85(2): 31−41.
Wightman, Colin W., Stefanie Shattuck-Hufnagel, Mari Ostendorf, and Patti J. Price 1992 Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America 92: 1707−1717.
Williams, Edwin 1997 Blocking and anaphora. Linguistic Inquiry 28(4): 577−628.
Williams, Edwin 2003 Representation Theory. Cambridge: MIT Press.
Wurmbrand, Susi 2003 Verb-clusters in West-Germanic: The empirical domain. In: Katalin É. Kiss, and Henk van Riemsdijk (eds.), The Verb-Cluster Sprachbund: Verb Clusters in Germanic and Hungarian. Amsterdam: John Benjamins.
Xu, Y. 2005 Speech melody as articulatorily implemented communicative functions. Speech Communication 46: 220−251.
Zec, Draga, and Sharon Inkelas 1990 Prosodically constrained syntax. In: Sharon Inkelas, and Draga Zec (eds.), The Phonology-Syntax Connection. Chicago: The University of Chicago Press.
Zsiga, Elizabeth C. 1995 An acoustic and electropalatographic study of lexical and postlexical palatalization in American English. In: B. Connell, and A. Arvaniti (eds.), Phonology and Phonetic Evidence, 282−302. (Papers in Laboratory Phonology 4.) Cambridge: Cambridge University Press.
Michael Wagner, Montréal (Canada)
35. The Syntax-Semantics Interface

1. Introduction: different conceptions of the interface
2. History and semantic background
3. Quantifier scope and the model of the grammar
4. Properties of QR
5. Opacity and hidden structure
6. Reconstruction
7. Two recent developments
8. References (selected)
Abstract

This article discusses central aspects of the syntax-semantics interface in derivational theories of the grammar. We describe the main theoretical tools for translating surface representations into abstract, transparent logical forms which can be directly interpreted in the semantic component. Following some brief remarks on the pedigree of this notion of the interface, the article surveys central characteristics of logical forms as they manifest themselves in the interaction between quantifier scope, binding and coreference, and explicates the way in which these results have been interpreted in current minimalist models. Topics to be discussed include scope inversion, covert movement, strategies for delayed structure building, Copy Theory, and referential opacity.
1. Introduction: different conceptions of the interface

Natural language syntax represents a discrete, deterministic, combinatorial system that generates representations inside an autonomous, informationally encapsulated module (Fodor 1983) of the language faculty. The output of the syntactic component is mapped to possibly non-linguistic representations of other cognitive modules at designated points in the computation. These points are defined as the interfaces. The present survey of the syntax-semantics interface focuses on aspects of the translation procedure from syntax to the conceptual-intentional system (Chomsky 1995), which is responsible for model-theoretic interpretation (semantics in the narrow sense), the calculation of logical inferences, the representation of concepts and intentions, and possibly other functions. All meaningful theories of the syntax-semantics interface adopt in one version or another the principle of compositionality, which requires that the meaning of a complex expression is functionally dependent upon the meaning of its immediate parts and the way these parts are combined (Frege 1892; Montague 1970). Apart from this requirement, current models of the grammar vary substantially both in the perspective they take on the role of syntax in the architecture of the grammar and in the semantic principles of interpretation they admit. This makes it difficult to provide a uniform, universally applicable definition of the syntax-semantics interface. Still, it is possible to discriminate between two groups of approaches, which are characterized by their diverging views on how to treat mismatches between syntax and semantics.
Such contexts, prototypically exemplified by quantifier scope ambiguity in sentences like A critic liked every movie, are informative about the nature of the interface because the syntactic parse fails to uniquely identify a semantic interpretation, requiring resolution of this conflict in one of the two components. On the one side, monostratal, lexicalized models such as Categorial Grammar (CG; Ajdukiewicz 1935; Jacobson 1996, 2002), Combinatory Categorial Grammar (CCG; Steedman 2000; Steedman and Baldridge 2011), Lexical Functional Grammar (LFG; Bresnan 1982, 2001), Montague Grammar (Montague 1970, 1973; Partee 1976), and Head-driven Phrase Structure Grammar (HPSG; Pollard and Sag 1994) assume that syntactic and semantic operations are computed from a single linguistic tree representation. In classic Montague Grammar, for one, syntactic and semantic representations are built simultaneously such that at each stage of the computation, the input of a syntactic rule also serves as the input of a corresponding semantic rule (rule-by-rule approach; cf. Bach 1976; or direct compositionality, cf. Barker and Jacobson 2007). These surface-oriented, non-derivational approaches characteristically operate on the hypothesis that linguistic representations do not contain hidden structure, keeping the amount of abstractness in the object-language expressions to a minimum. Potential mismatches between syntax and semantics are typically not resolved in syntax, but by employing a semantic meta-language enriched with type adjustment operations (see section 3) or additional composition rules. Moreover, theories in this tradition often guard against overgeneration by limiting the application of certain semantic rules to specific syntactic environments (meaning postulates).
As a consequence, for lexicalized models, the study of the syntax-semantics interface primarily consists in investigating the semantic rules and their combinatorial properties, and not in exploring the nature of the relation between these rules and the operations generating overt syntactic expressions.
1200
V. Interfaces
By contrast, in syntacto-centric models, the syntactic component precedes model-theoretic interpretation, which essentially has two effects. First, core syntactic operations cannot be made contingent on semantic factors without further additions. Second, communication between syntax and semantics is possible only at designated points of interaction that can be accessed from the semantic component. In derivational theories, this interface function is served by the abstract linguistic representations referred to as LFs (logical forms). Since LFs by assumption consist of abstract objects formed within the syntactic component, they are subject to the laws of natural language syntax. Derivational models differ in this respect from lexicalized theories, which predominantly contest the existence of both hidden syntactic structure and operations manipulating (abstract) object-language expressions. It is for this reason that although there is no uniformly shared conception of LF at the moment (see below), one can identify a pair of guiding methodological principles common to all LF-based approaches: (i) the willingness to admit more abstractness in syntax than surface-oriented lexicalized approaches, and (ii) a tendency to resolve mismatches between syntax and semantics, if they arise, in the course of the syntactic derivation, instead of in the interpretive component. Thus, as a rule of thumb it can be observed that derivational and categorial theories differ in that the former admit a greater degree of abstractness in syntax, while the latter tend to accept more complexity in semantics. The objective of this article consists in summarizing central aspects of the theory of LF as it has emerged over the last three decades in syntacto-centric, derivational models of the grammar.
Following some remarks on the historical roots of the notion logical form, intended to anticipate potential terminological confusion (section 2.1), the semantic background assumptions will be made explicit (section 2.2). Section 3 presents the standard treatment of quantifier scope in syntacto-centric models, while section 4 provides an overview of the main properties commonly held to be associated with scope extension by quantifier raising. In section 5, it will be seen how opacity effects arising from the interaction between scope inversion, binding and coreference patterns can be used as a diagnostic for hidden structure and abstract LF-representations. Section 6 reviews different strategies of reconstruction. Finally, section 7 briefly draws attention to two recent developments in the study of the syntax-semantics interface. Unfortunately, limitations of space make it impossible to expand the discussion to alternative views of the syntax-semantics interface (e.g. Discourse Representation Theory). For further surveys of the syntax-semantics interface see Enç (1988); Fox (2003); Huang (1995); May (1993); Sauerland and von Stechow (2001); von Stechow (2011); and Szabolcsi (2001, 2011).
2. History and semantic background

2.1. The use of logical form in philosophy and linguistics

Philosophers have been using the term “logical form” for a wide variety of concepts only loosely related to its linguistic namesake (Lappin 1993; May 1999; Pietroski 2009). On the most general interpretation, logical forms reveal fundamental regularities of propositions, rendering visible unifying logical properties that are masked by what were at times considered inherent irregularities of natural language. Only the logical
form underlying the translation into (1c) would e.g. bring to light the similarities between the first order interpretation of universal quantification ([1a]) and conditionals ([1b]): (1)
a. All dogs are awake.
b. If something is a dog, it is awake.
c. ∀x[dog(x) → awake(x)]
Closely related to this view is the claim that logical forms are not part of the object-language, but are schemata that are uniquely determined by propositions and belong to logic, where they serve to compute inferences and meaning relations. (On the history of logical form see Lappin 1993; Menzel 1998; Pietroski 2009; Preyer and Peter 2002.) The term “logical form” was first employed in a meaning close to its modern linguistic usage in Russell (1905) and Wittgenstein (1929). Russell observed that the translation from natural language expressions into a regimented formal language sometimes displays invariant regularities suggesting a systematic relationship between object- and meta-language. It is this systematicity in the meaning-form correspondence that distinguishes Russell’s conception of logical forms from earlier approaches, and makes it resemble modern linguistic views. The specific phenomena that Russell considered included definite descriptions and names. For instance, a formula akin to (2b) was seen as the underlying logical form for the surface string (2a). (2)
a. The dog is awake.
b. ∃x[dog(x) ∧ awake(x) ∧ ∀y[[dog(y) ∧ ¬x=y] → ¬awake(y)]]
However, in their early logico-philosophical incarnation (Tarski 1936; Carnap 1934), logical forms were neither compositionally derived from sentences, nor part of the object-language, but merely encoded hidden properties of the proposition expressed by a sentence. This was so because the correspondences between natural language and logical syntax were thought to be too irregular and idiosyncratic for defining a systematic mapping between them (Misleading Form Hypothesis, Russell 1905; this position was famously held by Tarski; Feferman and Feferman 2004). For instance, (2a) and its logical form (2b) differ in that the subject DP the dog forms a constituent to the exclusion of the VP in syntax, while no such bracketing can be found in the logical representation. The solution to this problem consists in Frege’s analysis of quantifiers as second order properties and the λ-calculus (see section 2.2). Resolving such mismatches between the logical meta-language syntax and constituency in natural language is a central goal common to all sufficiently precise, compositional theories of language. Lexicalized models are set apart from syntacto-centric derivational (Chomsky 1995) and some representational systems (Brody 1995; Haider 1993; Koster 1986; Williams 1986) by the fact that the latter two adopt as a heuristic for conflict resolution the hypothesis that not all aspects of linguistic meaning are phonetically represented: (3)
Abstractness Hypothesis
Linguistic representations contain abstract objects.
In derivational theories, the Abstractness Hypothesis found its most prominent expression in the idea that the grammar includes a post-syntactic component (LF) which encodes “grammatically determined aspects of meaning” (Chomsky 1976; May 1977). The Principles and Parameters model (Chomsky 1981) treated LF as a separate level of representation located between syntax and model-theoretic interpretation. In minimalist grammars (Chomsky 1995), LF is incorporated in the syntactic component, and defined by the point in the derivation at which Spell-Out applies: LF comprises all operations that follow Spell-Out, and therefore have no effect on pronunciation. It has become common practice to use the term LF to refer either to the class of all well-formed LF-representations, or to individual members thereof, without any commitment as to whether LF is taken to be a separate level of representation or not. Integrating LFs into the object-language generates the expectation that the logical syntax of the formal interpretation language is co-determined by natural language syntax, as expressed by the Transparent Interface Hypothesis (4):
(4)
Transparent Interface Hypothesis
Interpretive properties are co-determined by properties of natural language syntax.
In derivational models, the task of defining the logical syntax of an expression can therefore be partially relegated to the rules of natural language syntax operating on this expression. Demonstrating that such a division of labor between syntax and semantics in fact exists has been a central objective for adherents of LFs (see section 4 for additional details). Anticipating some other results to be addressed in detail below, the analytic techniques legitimized by the conjunction of (3) and (4) include silent movement operations; invisible syntactic structure inside traces, elliptical nodes and pronouns; unpronounced variables and variable binders (λ-operators) in the object-language; and lexical decomposition in the object-language, usually in combination with silent operators. As was seen above, the logico-philosophical tradition and current versions of generative grammar assign to logical form fundamentally distinct functions. While on the former view, logical forms are equivalence classes of properties that reside in the logical meta-language, the linguistic notion describes complex natural language objects which can be manipulated by the principles of grammar. There is also a second difference between modern conceptions of LF and the classical philosophical view on logical form, apart from their ontological status. While logical forms were meant to represent the logical skeleton of propositions, on the basis of which the inferences they entail could be computed, LFs in generative linguistics did, at least initially (Chomsky 1976), explicitly not encode meaning relations. Rather, LFs are purely syntactic objects that have not yet been assigned a model-theoretic interpretation, in line with the hypothesis that syntax operates inside an informationally encapsulated system that does not accept instructions from other modules. Some qualifying remarks are in order regarding the role of LF in current models.
Notably, the LF-interface may diverge in at least two ways from the standard minimalist model. First, there are also “flatter” versions of derivational grammars which give up the assumption that the division between overt and covert operations reflects a key characteristic of the model. Unlike lexicalized theories, these single output theories still admit abstract, hidden information to be part of linguistic representations (Bobaljik 1995; Groat and O’Neil 1996; Pesetsky 2000). However, unlike in earlier incarnations of minimalism, the derivation is no longer partitioned into a visible and an invisible section, with the result that abstractness becomes a pervasive property of the grammar. Empirically, such a conception leads one to expect systematic interaction between overt and covert processes. As will be seen in section 4, evidence to that effect has indeed been isolated in various domains.
Second, there is a growing amount of evidence that the computations mapping from syntax to semantics can also be dependent upon factors external to the syntactic system proper. This option has recently been explored in areas where choices within core syntax that are relevant to the determination of model-theoretic interpretation are either informed by contextual factors (Reinhart 2006) or by results provided by a non-domain-specific deductive system computing logical inferences (Chierchia 1984: 40; Fox 2000; Fox and Hackl 2005; Gajewski 2002; see section 7). The next subsection outlines some of the main developments in semantics that resulted in the inception of compositionally interpretable LF in the early 1990s, to be discussed in section 2.3.
2.2. Lambda Calculus

Prior to the late 1960s, the consensus was that natural language semantics could not be regimented by the strict methods developed in formal logic. Extending the formal rigor from logic to natural language was made possible by the combination of two components, though (Montague 1973; Lewis 1970; see Gamut 1991; Partee 1996; Partee and Hendriks 1997 for surveys): λ-calculus and Frege’s view of quantifier meanings as higher order functions. Adopting compositionality as its main heuristic guideline, Montague Grammar (MG) was the first explicit theory that treated natural language as the input of an interpretation function which recursively assigns meanings to complex expressions. The combination of λ-calculus and the Fregean theory of quantification in MG was instrumental in identifying compositional translation procedures for complex expressions which previous logic-based analyses could only treat in terms of construction-specific (syncategorematic) meaning rules. To illustrate, up to that time, a simple quantificational statement such as (5) could not be given a compositional analysis, because it was not possible to assign the quantificational determiner every (and, as a result, the whole quantifier phrase every dog) an interpretation independently of the meaning of its common noun sister dog:
(5)
Every dog is awake.
This in turn was a consequence of the fact that pre-Fregean logic was only equipped to generate first order predicates and therefore lacked the formal power to refer to the two place higher order function denoted by every. Hence, the meaning of the proposition expressed by (5) could not be compositionally derived from the meaning of its parts, a fact which was taken to indicate that the computation of natural language meanings involves intrinsically non-compositional aspects. A related problem in mathematics had led to the formulation of the λ-calculus (Church 1936). Consider the function in (6a), which can also be written as (6b): (6)
a. f = {⟨0,1⟩, ⟨1,2⟩, ⟨2,5⟩, ⟨3,10⟩, …}
b. f(x) = x² + 1
c. f = λx[x² + 1]
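The move from (6b) to (6c), referring to a function without mentioning its argument, is precisely what anonymous functions provide in modern programming languages. A minimal Python sketch (the range of sample arguments is an arbitrary choice for illustration):

```python
# (6c) as a lambda term: the function is named without reference to an argument.
f = lambda x: x ** 2 + 1

# f can now be passed around as a value in its own right; the pair notation
# of (6a) falls out by applying f to sample arguments.
pairs = {(x, f(x)) for x in range(4)}
assert (2, 5) in pairs and f(0) == 1
```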
The classical function notation is not optimally transparent, though, in that the function symbol (f in [6b]) cannot be separated from the independent variable (x in [6b]). An expression such as f = x² + 1 is not admissible. In the absence of a suitable notational device, there is no way to refer to the function in isolation. The λ-calculus filled this gap by severing the function from its variable, as illustrated by (6c). The right-hand side of (6c) is to be interpreted as “the smallest function mapping x to x² + 1”, which is, as desired, a description of the function f itself (on the history of λ-calculus see Cardone and Hindley, to appear). Applied to natural language semantics, the λ-calculus supplies a method for referring to functions independently of their arguments, and can therefore be used to design appropriate meanings for sub-constituents such as the quantificational determiner every (Montague 1973). As demonstrated by (7b), every is translated as a second-order two place function which, combined with the common noun meaning dog, yields a Generalized Quantifier denotation (GQ; Mostowski 1957), characterizing the semantic contribution of every dog ([7c]). GQs denote functions from properties to sentence denotations (i.e. truth values). In the case of (7c), the function maps any property to 1 (or True) if it holds of every dog in the domain, and to 0 (False) otherwise. The GQ meaning (7c) is then applied to the predicate denotation ([7d]), resulting in the appropriate truth conditions for the proposition expressed by (7a) (intensional aspects of meaning are not represented; for a survey of the analysis of quantifiers see Peters and Westerstahl 2006). (7)
a. Every dog is awake.
b. ⟦every⟧ = λPλQ∀x[P(x) → Q(x)]
c. ⟦every dog⟧ = λQ∀x[dog(x) → Q(x)]
d. ⟦every dog⟧(⟦is awake⟧) =
   = λQ∀x[dog(x) → Q(x)](λx.awake(x)) =
   = ∀x[dog(x) → awake(x)]
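The derivation in (7) can be mirrored executably. The following Python sketch evaluates the curried denotation of every over a small finite model; the domain and the extensions of dog and awake are illustrative assumptions, not part of the original analysis:

```python
# Properties are modeled as predicates (functions from individuals to bool).
DOMAIN = {"rex", "fido", "whiskers"}
dog = lambda x: x in {"rex", "fido"}
awake = lambda x: x in {"rex", "fido", "whiskers"}

# (7b): every = λPλQ.∀x[P(x) → Q(x)], curried exactly as in the text.
every = lambda P: lambda Q: all((not P(x)) or Q(x) for x in DOMAIN)

# (7c): the Generalized Quantifier denoted by "every dog".
every_dog = every(dog)

# (7d): applying the GQ to the predicate denotation yields a truth value.
assert every_dog(awake) is True
```

Since every dog is also awake in this toy model, the GQ maps the predicate to True; shrinking the extension of awake would flip the value to False.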
To recapitulate, the λ-calculus provides a method for “splitting up” the meanings of complex expressions, while Frege’s conception of quantification specifies where to apply the split (every applies to dog, and the combined result to awake), as well as how to interpret the components. In combination, these theories specify a step-by-step translation procedure from natural language syntax to semantics that proceeds compositionally, assigning a suitable meaning to each node in the syntactic tree. Moreover, in derivational models, this transition is mediated by LFs. Section 2.3 addresses in more detail the internal composition of LFs and the semantic rules rendering them interpretable.
2.3. Transparent logical forms

Up to the late 1980s, the principles relating LFs and model-theoretic semantics were generally left vague. An explicit system restricting the class of possible LFs to transparent logical forms was articulated in von Stechow (1993) and Heim and Kratzer (1998). Transparent logical forms are abstract linguistic representations derived from surface syntactic trees (i) whose shape is co-determined by the principles of natural language syntax and (ii) which satisfy the additional criterion of being compositionally
interpretable without further modification. Since compositionality describes a functional dependency, this in turn entails that each transparent LF functionally determines a (single) truth-conditional meaning (modulo context). A first effect of transparency will be made explicit in the treatment of scope and variable binding, two concepts to be introduced in turn. Apart from rendering possible a transparent mapping from syntax to semantics in the first place, the λ-calculus is helpful in modeling probably the two most important syntactic relations encoded by LFs: semantic variable binding and the syntactic notion of the scope of an operator. The scope of an operator defines the domain (i.e. the set of nodes) in an LF-tree in which this operator can bind semantic variables. Structurally, this set corresponds to the c-command domain of the operator. Semantic variables are objects that are dependent upon their (semantic) binders for meaning assignments, and include certain overt pronouns as well as traces left by various movement operations (question formation, relativization, etc.). To illustrate, the subject quantifier everybody in (8) binds the pronoun his as a variable. As a result, the valuation of the pronoun co-varies with the valuation of the antecedent subject, as shown by the paraphrases in (8a) and (8b), respectively: (8)
Everybody1 is trying his1 best.
a. ∀x[person(x) → x is trying to do x’s best]
b. “Every individual x is such that if x is a person, then x is trying to do x’s best”
It is important to note in this context that not all syntactic coindexing results in variable binding. Unlike (8), example (9) is ambiguous between a bound variable construal and a coreferential interpretation (Geach 1962). In the case at hand, this difference is truth-conditionally significant. If John and Bill tried on John’s vest and nobody else dressed up, sentence (9) is evaluated as true on the bound reading paraphrased in (9a), yet not on the coreferential interpretation (9b). Conversely, models in which John and Bill each tried on their own vest verify the sentence on its coreferential reading (9b), but fail to satisfy the criteria for the bound interpretation (9a). Thus, coindexing in syntax expresses the semantic notions of coreference or binding (Partee 1970; Büring 2005). (9)
Only John1 is trying on his1 vest.
a. No person x except for John has the property of trying on x’s vest.
b. Nobody except for John has the property of trying on John’s vest.
Turning to some semantic details, coreferential pronouns are treated as individual variables that are evaluated relative to the context. Technically, this is implemented with the help of contextually given assignment functions which map the numerical indices of free variables to members of the individual domain (Tarski 1936). Coreference arises if an antecedent DP and the index on a pronoun are mapped to the same individual. By contrast, the denotation of trees that include bound variable pronouns does not depend on assignment functions, reflecting the insight that semantic binding is not contingent upon contextual factors. Rather, binding is the result of two variables (x in [9a]) being captured by the same λ-operator. Hence, the difference between bound and coreferential readings of pronouns semantically reduces to whether their meaning is assignment dependent (referential pronoun) or not (bound pronoun).
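The contrast between assignment-dependent (coreferential) and bound pronouns can be made concrete in a small executable model. In the sketch below, assignments are Python dicts from indices to individuals; the two-individual model and the formulations of readings (9a) and (9b) are illustrative assumptions:

```python
# Two readings of "Only John1 is trying on his1 vest", evaluated against the
# two scenarios described in the text.
people = {"john", "bill"}

def tries_on_vest_of(x, y, model):
    # (x, y) in model encodes: x tries on y's vest.
    return (x, y) in model

# Model 1: John and Bill both try on John's vest.
m1 = {("john", "john"), ("bill", "john")}
# Model 2: each of them tries on his own vest.
m2 = {("john", "john"), ("bill", "bill")}

def bound_reading(model):
    # (9a): no person x other than John tries on x's own vest, and John does.
    return (all(x == "john" or not tries_on_vest_of(x, x, model) for x in people)
            and tries_on_vest_of("john", "john", model))

def coref_reading(model, g):
    # (9b): the pronoun is assignment-dependent; g maps index 1 to John.
    ref = g[1]
    return (all(x == "john" or not tries_on_vest_of(x, ref, model) for x in people)
            and tries_on_vest_of("john", ref, model))

g = {1: "john"}  # contextually given assignment function

# Matching the text: m1 verifies only the bound reading, m2 only the
# coreferential one.
assert bound_reading(m1) and not coref_reading(m1, g)
assert coref_reading(m2, g) and not bound_reading(m2)
```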
Returning to the relation between scope and λ-calculus, notice that in the formal translation of (8), repeated in (10a), the pronoun is bound by the λ-operator which is introduced by the quantifier everybody, as in (10b). More precisely, the λ-binder (‘λx’) in the formal meta-language corresponds to the index on everybody.

(10) a. Everybody1 is trying his1 best.
     b. λQ∀x[person(x) → Q(x)](λx.x is trying x’s best)
     c. [Everybody [1 is trying his1 best]]

But while the index is part of the quantificational DP in (10a), the λ-binder ‘λx’ is bracketed with the argument of the quantifier denotation in (10b), resulting in a mismatch between syntactic structure and the logical syntax underlying semantic interpretation. In order to ensure compositional translation, it has therefore, following Heim and Kratzer (1998), become common practice to re-analyze the syntactic relation between the index and its host DP at LF as given in (10c). The syntactic representation in (10c) is then transparently interpretable by the semantic rules of Predicate Abstraction (11) and Function Application (12); adapted from Heim and Kratzer (1998):

(11) Predicate Abstraction
     For any index n: ⟦[n β[… n …]]⟧g = λx.⟦β⟧g[n→x]

(12) Function Application
     For any nodes α, β, γ, such that α immediately dominates β and γ, and ⟦β⟧ ∈ D⟨δ,τ⟩ and ⟦γ⟧ ∈ Dδ: ⟦α⟧ = ⟦β⟧(⟦γ⟧)

As detailed by the sample derivation under (13), Predicate Abstraction serves two purposes: First, the modified assignment function (g[1→x]), which maps values to variables, replaces all occurrences of index 1 in (13c) by the variable x, resulting in (13d). Second, the λ-operator abstracts over this variable in (13d), creating a derived predicate of individuals. Following Heim and Kratzer (1998), these λ-binders will be assumed to be part of the object-language at LF:

(13) a. ⟦1 [is trying his1 best]⟧g =
     b. λx.⟦is trying his1 best⟧g[1→x] = (by Predicate Abstraction)
     c. λx. is trying g[1→x](his1) best = (applying assignment function g to pronoun)
     d. λx. is trying x’s best (calculating result of g)
This derived predicate then combines with its sister node, the denotation of the quantifier everybody, by Function Application. The relevant steps in the compositional derivation are detailed in (14):

(14) a. ⟦[Everybody [1 [is trying his1 best]]]⟧g =
     b. ⟦Everybody⟧g(⟦1 [is trying his1 best]⟧g) = (by Function Application)
     c. λQ∀x[person(x) → Q(x)](λx. is trying x’s best) = (interpretation of subject and substitution of argument by [13d])
     d. ∀x[person(x) → x is trying x’s best]
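The derivation in (13) and (14) can be replayed mechanically. In the Python sketch below, assignments are dicts from indices to individuals, Predicate Abstraction (11) modifies the assignment, and Function Application (12) is ordinary application; following the informal derivation in (13), the abstraction simultaneously fills the subject slot and rebinds the pronoun index. The model of who tries whose best and all names are illustrative assumptions:

```python
PEOPLE = {"ann", "ben"}
TRIES_BEST_OF = {("ann", "ann"), ("ben", "ben")}  # (x, y): x is trying y's best

def vp(g):
    # The VP "is trying his1 best" relative to assignment g:
    # λx. x is trying g(1)'s best.
    return lambda x: (x, g[1]) in TRIES_BEST_OF

def predicate_abstraction(n, node):
    # (11)/(13): evaluate the sister node under the modified assignment
    # g[n→x]; the abstracted variable also fills the subject slot, as in (13d).
    return lambda g: lambda x: node({**g, n: x})(x)

everybody = lambda Q: all(Q(x) for x in PEOPLE)  # GQ restricted to persons

# (14): the whole LF, evaluated with an empty assignment, since the pronoun
# ends up bound rather than free.
sentence = everybody(predicate_abstraction(1, vp)({}))
assert sentence is True  # in this model, everyone is trying his own best
```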
Adopting a widely used notational convention intended to improve readability, indices will from now on be rendered as subscripts to the λ-prefix, writing for example λ1 instead of 1. In current LF-models, index reanalysis of the type exemplified in (10b/c) is not limited to binding relations, but is also at work in contexts involving (certain types of) movement. The standard procedure for interpreting movement involves a rule that separates the index from its host category and re-attaches it to the sister node of the host, as shown in the transition from (15a) to (15b) (Heim and Kratzer 1998):

(15) a. [TP John1 [vP t1 is awake]]
     b. [TP John [T′ λ1 [vP t1 is awake]]]

Given this tight structural correspondence between index and λ-binder, it now becomes possible to define the scope of a variable binding operator (e.g. a quantifier) as the scope of the λ-binder associated with that operator. This shift in perspective resonates with the view that one defining property of LFs consists in their ability to match λ-binding relations with explicit syntactic representations. LF-approaches differ here once again from surface-oriented, categorial approaches, for which abstraction and variables are purely semantic notions. Explicit representation of λ-binding relations in the object-language has three immediate consequences. First, it extends the notion of scope from the scope of a quantifier (as it was used for a long time in the generative syntactic literature) to the scope of a λ-binder, thereby rendering more transparent the semantics of binding and quantification. Second, privileging the status of the λ-binder over the lexical operator in defining LFs forges a close link between movement and binding. Many movement dependencies can now be interpreted as involving variable binding, and vice versa. Adopting this view, it becomes for instance immediately evident that empty operator movement in relative clauses is the syntactic reflex of derived predicate formation by λ-abstraction.
As detailed by (16), which tracks the evolution of a relative clause at the interface, the index on the fronted empty operator in (16a) creates a derived λ-predicate (16b/c), while the operator itself remains semantically vacuous (Heim and Kratzer 1998; von Stechow 2007):

(16) a. Overt syntax: the book [OP3 she read t3]
     b. LF: the book [λ3 she read t3]
     c. Semantics: the((λx.book(x))(λx.read(x)(she)))
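The role of the operator index in (16) as a derived-predicate former can be sketched executably. The composition rule combining the noun with the relative clause (simple predicate conjunction) and the Russellian uniqueness treatment of the are assumptions made here for illustration, as are the model and all names:

```python
BOOKS = {"b1", "war_and_peace"}
READ = {("she", "war_and_peace")}  # (y, x): y read x

book = lambda x: x in BOOKS
read = lambda x: lambda y: (y, x) in READ  # curried as in (16c): read(x)(she)
rel_clause = lambda x: read(x)("she")      # derived predicate: [λ3 she read t3]

def the(P):
    # Uniqueness-presupposing 'the': return the single P-individual.
    matches = [x for x in BOOKS if P(x)]
    assert len(matches) == 1, "uniqueness presupposition failed"
    return matches[0]

# The relative-clause predicate restricts 'book' before 'the' applies.
assert the(lambda x: book(x) and rel_clause(x)) == "war_and_peace"
```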
But the analytical options also expand in the other direction, in that relations that have previously been thought to implicate binding, such as control, now become amenable to a movement analysis. If PRO is assumed to move a short distance ([17a], Heim and Kratzer 1998), the widely shared assumption that control complements denote properties ([17b], Chierchia 1984, 1989) can be made to follow from the general interpretation rules for movement:

(17) a. Overt syntax: Sally tried [PRO2 t2 to win]
     b. LF: Sally tried [λ2 t2 to win]
     c. Semantics: tried(λx.win(x))(Sally)
Thus, the transparent LF model emphasizes the relevance of abstraction by λ-binding in the analysis of relative clauses, control, and many other constructions, thereby anticipating aspects of the compositional interpretation in the syntactic component. A third consequence of representing λ-binding at LF is that quantifier scope must be explicitly represented at LF, which in turn entails syntactic disambiguation of relative quantifier scope. Since the proper treatment of quantifier scope constitutes one of the core areas of any theory of the syntax-semantics interface, and also aids in discriminating among different theories, the issue will be taken up in detail in the section to follow.
3. Quantifier scope and the model of the grammar

The current section embeds a survey of different strategies for determining quantifier scope (section 3.1) into a general discussion of the criteria that distinguish between models of the grammar that admit LFs and those that do not (section 3.2).
3.1. Quantifier scope and scope ambiguity

The principle of compositionality dictates that the meaning of each complex expression is only dependent upon the meaning of its immediate parts and the way they are combined. This leads to the problem that quantifiers in object position, illustrated by (18), cannot be interpreted without further additions to the theory:

(18) John liked every movie.

The problem resides in the incompatibility of the verb meaning and the object denotation, which can be expressed in terms of mismatches between their logical types. On their standard interpretation, transitive predicates such as like denote (Curried or Schönfinkelized) two place relations between individuals (logical type ⟨e,⟨e,t⟩⟩), and therefore need to combine with an individual denoting term (type e) as their first argument. Generalized Quantifiers such as every movie denote second order properties, i.e. functions of type ⟨⟨e,t⟩,t⟩. Such functions require properties (type ⟨e,t⟩) as input. Thus, the verb meaning is looking for an e-type argument, while the sister of every movie must be in the domain of ⟨e,t⟩-type expressions, resulting in conflicting type requirements. In most theories, the conflict is resolved by adopting one of two strategies. First, on the transparent LF approach, documented in (19), the object quantifier every movie is removed from its base position by an application of Quantifier Raising (QR; [19a]; Chomsky 1976; May 1977). QR is a movement operation that targets quantifiers, and covertly raises them into positions where they are interpretable, possibly crossing other operators. In this particular case, the trace of every movie is interpreted as an e-type variable bound by the λ-binder of the fronted quantifier ([19b]), which may then be combined with the quantifier denotation, yielding the desired interpretation outlined in (19c):
(19) a. [Every movie2 [John liked t2]]
     b. [Every movie λ2 [John liked t2]]
     c. λQ∀x[movie(x) → Q(x)](λx.John liked x) = ∀x[movie(x) → John liked x]

Thus, the LF-approach resolves the type conflict, in line with the general method outlined in the introduction, by adopting an abstract movement operation, i.e. QR, modulating the syntax. QR can in turn be motivated by a general requirement that derivations generate semantically interpretable results. Alternatively, (18) can be given a compositional interpretation by type shifting operations which adjust the meaning of one of the expressions triggering a type mismatch (every movie and like). This strategy is particularly popular in surface-oriented theories, among them Montague Grammar (MG) and various current (type-logical or combinatory) versions of Categorial Grammar, which tend to avoid hidden complexity or abstract representations. In these frameworks, transitive verbs can e.g. be mapped to the higher type ⟨⟨⟨e,t⟩,t⟩,⟨e,t⟩⟩, such that verb denotations and quantifier denotations may combine without altering surface constituency. Moreover, it is also possible to shift the denotation of the object quantifier (to type ⟨⟨e,⟨e,t⟩⟩,⟨e,t⟩⟩), keeping constant the verb denotation. Detailed discussion of the treatment of scope in Categorial Grammar can be found in Hendriks (1993), Jacobson (1996), Steedman (2012) and Szabolcsi (2010, 2011), among others. As compositionality also entails that each expression translates into a single meaning (modulo lexical ambiguity and context), it follows that structurally ambiguous sentences such as (20) have to be disambiguated before they are submitted to the semantic interpretation function.

(20) Some critic liked every movie.

Disambiguation can be achieved in various ways.
To begin with, on the transparent LF approach, QR optionally places the object quantifier either inside ([21a]) or outside ([21b]) the scope of the subject quantifier, yielding two disambiguated LFs which in turn result in two truth-conditionally distinct interpretations (May 1977):

(21) a. LF1: [Some critic1 [every movie2 [t1 liked t2]]]
        LF1 (index reanalysis): [Some critic λ1 [every movie λ2 [t1 liked t2]]]
        Translation 1: ∃x[critic(x) ∧ ∀y[movie(y) → x liked y]]
     b. LF2: [Every movie λ2 [Some critic λ1 [t1 liked t2]]]
        Translation 2: ∀y[movie(y) → ∃x[critic(x) ∧ x liked y]]
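That the two translations are truth-conditionally distinct can be checked by evaluating both in a single small model. (An illustrative sketch only; the model is invented and not taken from the text.)

```python
# Evaluating the two translations in (21) in a small invented model in
# which they differ in truth value.
critics = {"c1", "c2"}
movies = {"m1", "m2"}
liked = {("c1", "m1"), ("c2", "m2")}    # each critic liked a different movie

# Translation 1 (surface scope): some critic liked every movie.
translation1 = any(all((x, y) in liked for y in movies) for x in critics)

# Translation 2 (inverse scope): every movie was liked by some critic.
translation2 = all(any((x, y) in liked for x in critics) for y in movies)

print(translation1, translation2)       # False True
```

Since there are models satisfying Translation 2 but not Translation 1, the two LFs encode genuinely different propositions.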
Covert movement in (21) generates what is called a prenex normal form for quantified expressions in which all quantifiers precede the open formula containing their bound variables (the vPs in [21]). Semantically, the LF analysis is similar to MG, historically the first compositional theory of natural language quantification, in that scope relations are mapped into binding relations between quantifiers and individual variables. MG is representative of a radically different view on the interaction between syntax and semantics, though. Designed as a non-derivational categorial system, MG generates at each step in the computation of a sentence
a syntactic object as well as its pertaining interpretation. The specific implementation employed by MG rests on the concept of rule-by-rule interpretation (Bach 1976), on which each input expression simultaneously induces the application of a syntactic rule and its corresponding semantic partner rule. Generating (20) e.g. involves the rule (scheme) of quantifying-in, an operation that translates sentences containing unbound variables or pronouns into quantified formulas. The syntactic part of quantifying-in introduces a quantifier directly into its prenex (read: scope) position ([22a]), while the semantic rule simultaneously assigns the emerging structure a Tarskian model theoretic interpretation, paraphrased in (22b):

(22) a. [some critic α] → some critic (λx. [α … x …]) (where α is an open formula)
     b. "There is an individual which is a critic and which has property λx. [α … x …]"

Applying quantifying-in twice to (20) generates two relative scope orders. Quantifying-in the object first, followed by quantifying-in the subject, yields the surface scope reading, while the reverse sequencing results in the inverted scope interpretation. Sentences with more than one quantifier are accordingly associated with multiple derivational histories (analysis trees), and not multiple linguistic representations. Thus, MG produces results that are for all intents and purposes semantically indistinguishable from the ones generated by its descendant QR − yet, these results are obtained by different means. While in derivational approaches, the meanings are disambiguated at LF, such that each LF-tree functionally translates into a single scope order, MG and other categorial theories derive the two readings of (20) from a single surface representation. Moreover, differences between quantifying-in and QR also manifest themselves on the syntactic side of the derivation.
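The idea that scope falls out from the order of rule application, rather than from distinct representations, can be sketched executably. (Illustrative only; the model and names are invented, and the sketch abstracts away from MG's syntactic rule component.)

```python
# Rule-by-rule interpretation, sketched: quantifying-in closes one free slot
# of an open formula. The two derivational orders (analysis trees), not two
# LF representations, yield the two scope orders of (20).
critics = {"c1", "c2"}
movies = {"m1", "m2"}
liked = {("c1", "m1"), ("c2", "m2")}

some_critic = lambda P: any(P(x) for x in critics)
every_movie = lambda P: all(P(y) for y in movies)
open_formula = lambda x, y: (x, y) in liked          # "x liked y"

# Derivation 1: quantify-in the object first, then the subject
# (the subject ends up with widest scope: the surface reading).
step1 = lambda x: every_movie(lambda y: open_formula(x, y))
surface_reading = some_critic(step1)

# Derivation 2: reverse order of rule application (inverse reading).
step2 = lambda y: some_critic(lambda x: open_formula(x, y))
inverse_reading = every_movie(step2)

print(surface_reading, inverse_reading)              # False True
```

A single "surface" (the open formula) thus yields two meanings purely through two derivational histories.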
Most notably, the MG strategy for representing scope needs to stipulate that the quantifier surfaces in the position of the coindexed variable, and not in its prenex position. For further in-depth discussion of scope in different versions of CG such as Flexible Categorial Grammar, type-logical grammar, and CCG see Jacobson (1996), Partee and Hendriks (1997), Steedman and Baldridge (2011) and Szabolcsi (2011), among others. In addition to QR, type shifting and Quantifying-in, various other strategies for analyzing quantifier scope phenomena have been proposed in the literature, including Cooper Storage (Cooper 1975, 1983); Scope Indexing (Cooper and Parsons 1976; van Riemsdijk and Williams 1981; Ruys 1992); Quantifier Lowering (Lakoff 1971; May 1977, 1985); Semantic Reconstruction (Cresti 1995; Hendriks 1993; Rullmann 1995; von Stechow 1991; see section 6); syntactic reconstruction by copies; decompositional approaches, which generate the quantificational determiner in a position above the quantifier restrictor (Kratzer and Shimoyama 2002; Sportiche 2005); underspecification (Egg 2010) and game-theoretic accounts (Hintikka 1997). For an overview of these and other analytical tools for modeling scope see Szabolcsi (1997, 2001, 2010, 2011); Enç (1987); Ruys and Winter (2011), among many others.
3.2. The model of the grammar (evidence for LF)

As was seen above, there are various analytical options for coding scope and scope ambiguity, with derivational models on one side of the spectrum, and strictly surface
oriented, non-derivational, categorial theories on the other. Even though the two groups are, due to fundamental differences in their axioms and empirical coverage, to a large degree incommensurable, it is possible to isolate some diagnostics that aid in adjudicating between the competing models. The most important of these match the profile of one or more of the criteria in (23):

(23) a. There are syntactic principles that treat overt and covert expressions alike.
     b. Division of labor: the syntactic component assumes functions otherwise left to semantics.
     c. Existence of hidden structure: surface representations contain hidden structure.
     d. Opacity effects: covert operations display opaque rule orderings.

Anticipating the results of sections 4 and 5, these criteria will be seen to provide strong support for the Transparent Interface Hypothesis (repeated from above as [24]), which is in turn best compatible with a model that admits abstract LF-representations, as expressed by the Abstractness Hypothesis (25):

(24) Transparent Interface Hypothesis
     Interpretive properties are co-determined by properties of natural language syntax.

(25) Abstractness Hypothesis
     Linguistic representations may contain abstract objects.

It should be noted at this point that the distinction between theories is not as categorical as the presentation above might have suggested. In particular, there are also monostratal, non-derivational models that do not espouse the concept of LF, but still admit hidden complexities such as traces and empty operators (Brody 1995; Haider 1993; Koster 1986), in line with the Abstractness Hypothesis. Furthermore, multiple (possibly abstract) representations can also be linked by other mapping principles apart from movement (Williams 2003). This expanded topology is consistent with the observation that the criteria in (23) divide the logical matrix of plausible theories into more than two cells, and are therefore not exclusively symptomatic of the LF model.
Although space precludes a detailed discussion of criteria for these alternatives, it should in most cases be evident which specific analyses presuppose the notion of LF, and which ones are also compatible with other theoretical choices. The remainder of this article presents a review of selected pieces of evidence for the Abstractness Hypothesis from the literature, which match one or more of the criteria in (23). At the same time, and of at least equal significance, the survey aims at (i) exposing the most important analytical tools and methods used in current research on this topic and at (ii) introducing some of the basic phenomena that define the syntax-semantics interface. For expository convenience, the presentation will not proceed from criterion to criterion, but will follow the order intrinsic in the analyses.
4. Properties of QR

4.1. Locality matches that of certain overt movements

The present section reports similarities between QR and overt movement that contribute both a "QR is syntactic" ([23a]) as well as a division of labor argument for LF ([23b]). Before doing so, two remarks on the method used to diagnose non-surface scope are in order. First, following Reinhart (1976), the examples will throughout be designed in such a way that the non-surface interpretation is logically weaker than (does not entail) the surface interpretation (Cooper 1979; Reinhart 1976; Ruys 1992). This ensures that the existence of a designated LF-representation for non-surface scope can be directly inferred from the existence of models that only satisfy the derived scope reading. (20), for one, meets the criterion (the non-surface reading [21b] does not entail the surface interpretation [21a]), supporting the assumption that the inverted reading is structurally represented in form of the LF in (21b). By contrast, examples like (26) fail to elicit evidence for a non-surface LF, because the derived scope order (26b) entails the surface reading (26a). As a result, it is not possible to identify scenarios that can only be truthfully described by the inverse reading (26b).

(26) Every critic liked a movie.
     a. Surface interpretation: ∀x[critic(x) → ∃y[movie(y) ∧ x liked y]]
     b. Non-surface interpretation: ∃y[movie(y) ∧ ∀x[critic(x) → x liked y]]

More generally, on this methodology, combinations of universals in subject position and (monotone) existential objects are not suited as diagnostics for the existence of LF-representations that code non-surface scope. (Such examples can still be found in the syntactic literature, though; for discussion see e.g. Ruys 1992).
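The entailment relation underlying this diagnostic can be verified mechanically by enumerating all models over small domains. (A purely illustrative sketch; the domains and names are invented.)

```python
from itertools import product

# Reinhart-style entailment check by brute force: reading r1 entails r2
# iff r2 holds in every model in which r1 holds.
A = {"a1", "a2"}                          # critics
B = {"b1", "b2"}                          # movies
pairs = [(x, y) for x in A for y in B]

def entails(r1, r2):
    # Enumerate every possible "liked" relation over A x B.
    for bits in product([False, True], repeat=len(pairs)):
        liked = {p for p, b in zip(pairs, bits) if b}
        if r1(liked) and not r2(liked):
            return False
    return True

# The two readings of (26) "Every critic liked a movie":
surf = lambda L: all(any((x, y) in L for y in B) for x in A)   # surface
inv  = lambda L: any(all((x, y) in L for x in A) for y in B)   # non-surface

print(entails(inv, surf), entails(surf, inv))   # True False
```

Because the non-surface reading entails the surface reading here, no model can verify only the inverse reading, which is exactly why (26) is a poor diagnostic for LF.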
A second, widely used strategy for testing wide scope, apart from judgements of relative quantifier scope, is provided by the use of the relational modifier different, which displays ambiguity between a distributive ([27a]) and a deictic interpretation ([27b]; Carlson 1987; Beck 2000). As revealed by the first order translation of (27a) in (28), the former involves an implicit variable (y) in the denotation of different that is bound by the universal:

(27) Every critic liked a different movie.
     a. Every critic liked a movie and no two critics liked the same movie.
     b. Every critic liked a movie that was different from that (contextually salient) movie.

(28) ∀x[critic(x) → ∃y[movie(y) ∧ x liked y ∧ ¬∃a,b[critic(a) ∧ critic(b) ∧ a≠b ∧ a liked y ∧ b liked y]]]

As a result, distributive different must reside inside the scope of a distributive operator. It is this requirement which distinguishes between (29) and (30). While scope shift by QR is licensed in (29), resulting in LF (29a) with associated translation (29b), no standardly sanctioned syntactic operation may extend the syntactic scope of quantifiers across sentence boundaries, accounting for the deviance of (30) (but see [42]):

(29) A different critic liked every movie.
     a. LF: every movie λ2 [a different critic liked t2]
     b. ∀y[movie(y) → ∃x[critic(x) ∧ x liked y ∧ ¬∃a,b[movie(a) ∧ movie(b) ∧ a≠b ∧ x liked a ∧ x liked b]]]

(30) #A different critic arrived. Every movie looked interesting to him.

The distribution of different accordingly supplies an independent gauge for measuring the LF c-command domain of (at least a certain group of) quantifiers and will be used in the discussion of movement by QR below (Johnson 1996).
The classic argument for modeling scope extension by QR is based on the observation that the scope of certain QPs is restricted by the same syntactic conditions which limit overt movement. Among others, QR is subject to the Complex NP Constraint and the Subject Condition. Thus, the fact that (31a, b) and (33a) lack the inverted scope reading can be taken to indicate that the representation of object wide scope ([32b]) involves movement, and that movement is blocked in these instances for the same reason that it is unavailable in analogous examples of overt dislocation ([31c] and [33b]):

(31) Complex NP Constraint
     a. Some actress made [DP the claim that she liked every movie]. (∃ > ∀ / *∀ > ∃)
     b. #A different actress made [DP the claim that she liked every movie]. (∃ > ∀ / *∀ > ∃)
     c. *Which movie2 did some actress make [DP the claim that she liked t2]?

(32) a. LF 1: some actress made [DP the claim that [every movie2 [she liked t2]]]
     b. LF 2: *[every movie2 [IP some actress made [DP the claim that she liked t2]]]

(33) Subject Condition
     a. [That she disliked every movie] convinced some actress to become a critic. (∃ > ∀ / *∀ > ∃)
     b. *Which movie2 did [that she disliked t2] convince some actress to become a critic?
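The first-order translation of distributive different in (28) can likewise be evaluated in small models to confirm that it enforces an injective critic-to-movie assignment. (Illustrative only; the models are invented.)

```python
# The translation (28), evaluated in two invented models: the distributive
# reading requires that no two critics liked the same movie.
critics = {"c1", "c2"}
movies = {"m1", "m2"}

def reading_28(liked):
    # For every critic x there is a movie y that x liked, such that no two
    # distinct critics a, b both liked y.
    return all(
        any(
            (x, y) in liked
            and not any(a != b and (a, y) in liked and (b, y) in liked
                        for a in critics for b in critics)
            for y in movies
        )
        for x in critics
    )

print(reading_28({("c1", "m1"), ("c2", "m2")}))   # True: movies differ
print(reading_28({("c1", "m1"), ("c2", "m1")}))   # False: shared movie
```

The formula is thus true exactly when the critics are paired with pairwise distinct movies, matching the paraphrase in (27a).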
A second standard argument for covert displacement comes from the interaction between pronominal variable binding and scope. Just like overt wh-movement, QR feeds pronominal variable binding only if the base position of the operator (wh-phrase or quantifier) c-commands the variable, as in (34a) and (35a). Absence of c-command leads to violations of the Weak Crossover (WCO; Wasow 1972) condition ([34b] and [35b]), which characteristically involve mild ungrammaticality.

(34) a. Who1 t1 likes his1 mother?
     b. *?Who1 does his1 mother like t1?

(35) a. Everyone/Noone1 dislikes his1 mother.
     b. *?His1 mother dislikes everyone/noone1.
Assuming that WCO effects are induced by moving operators across pronominal variables they bind, the observation that the behavior of quantifiers parallels that of overtly fronted wh-phrases supports the claim that quantifiers reach their scope position by covert movement. Note in passing that if the structural relations between trace and pronoun are reversed, i.e. if the pronoun c-commands the trace left by displacement, Strong Crossover (SCO) effects emerge. The symptoms of SCO are robust judgements of unacceptability.

(36) a. Who1 t1 thinks she likes him1?
     b. *Who1 does he1 think she likes t1?

(37) *He1 thinks she dislikes noone1.

SCO is usually interpreted as a reflex of Principle C, on the assumption that the descriptive content of wh-phrases and quantifiers makes them behave like names for the purposes of Binding Theory. Unlike WCO, SCO therefore does not elicit further evidence for QR.
Ruys (1992) observes that a particular combination of the island diagnostic with pronominal variable binding generates at first sight unexpected results which, on closer inspection, further strengthen the covert movement hypothesis. The contrast in (38) demonstrates that QR is, just like overt movement, regulated by the Coordinate Structure Constraint (CSC). Curiously, quantifier exportation out of the initial conjunct all of a sudden becomes licit if the wide scope object binds a variable inside the second conjunct ([39a]):

(38) a. Some student likes every professor2 and hates the Dean. (∃ > ∀ / *∀ > ∃)
     b. *She asked who2 some student likes t2 and hates the Dean.

(39) a. Some student likes every professor2 and wants him2 to be on his committee.
     b. every professor2 [some student [likes t2] and [wants him2 to be on his committee]]

Identical contrasts can be replicated for wh-in-situ:

(40) a. *I wonder who [took what from Mary] and [gave a book to Jeremy].
     b. I wonder who [took what2 from Mary] and [gave it2 to Jeremy].
The sudden emergence of the wide scope reading in (39) and (40b) can be explained on the assumption that pronominal variables and traces are sufficiently similar for the purposes of the CSC (both are interpreted as variables), and that the CSC is a representational condition which is evaluated subsequent to covert movement ([39b]), and not a constraint on syntactic derivations (Fox 2000: 52; Ruys 1992). On this analysis, the existence of selected island violations with QR and wh-in-situ provides an independent argument for an abstract representation such as LF. Even though not strictly necessary for the Abstractness Hypothesis, it would be interesting to be able to pair QR with one of the various overt movement processes. The search for a suitable overt analogue has proved difficult, though. In some ways, QR behaves like A-movement in that it generally observes clause boundedness (Hornstein 1995; example from Johnson 2000, [6b]):
(41) I told someone you would visit everyone. (∃ > ∀ / *∀ > ∃)
There are however various exceptions to this generalization. Reinhart (2006: 49), for one, surveys cases in which QR appears to be able to cross finite sentence boundaries (see also Steedman 2012; Wilder 1997):

(42) A doctor will make sure that we give every new patient a tranquilizer. (∃ > ∀ / ∀ > ∃)

More crucially, the motivation behind this alleged parallelism remains obscure. Triggering QR by the same mechanism that drives A-movement (Case), as suggested in Hornstein (1995), fails to provide a complete analysis, because QR is also attested with categories that do not require Case (Kennedy 1997). The indirect object of (43) can e.g. be construed with wide scope even though it has already been assigned Case in its base position by to:
(d _ c / c _ d)
Similarly, in contexts of Inverse Linking (Larson 1987; May 1986; May and Bale 2005), the embedded quantifier every city may scope out of its containing DP even though it is case marked as the prepositional complement of from: (44) a. Someone from every city2 hates it2. b. [every city2 [[Someone from t2] t2 hates it1]].
(#d _ c / c _ d)
For these reasons, Kennedy (1997) concludes that QR cannot consist in A-movement. An alternative group of approaches tries to assimilate QR to the kind of Mittelfeld scrambling phenomena known from continental Western Germanic (Diesing 1992; Johnson 2000). In German, objects may e.g. move across subjects ([45b]) and scramble out of restructuring infinitivals ([45c]). Scrambling must however not cross finite sentence boundaries ([45d]; the control in [45e] shows that long extraction is generally licit): (45) a. weil Peter den Zaun reparierte since Peter the fence mended ‘since Peter mended the fence’ b. weil den Zaun1 Peter t1 reparierte since the fence Peter mended c. weil den Zaun1 Peter [TP PRO t1 zu reparieren] hoffte to mend hoped since the fence Peter ‘Peter hoped to mend the fence’ d. *weil den Zaun1 Peter hoffte [TP würde Maria t1 reparieren] would Mary mend since the fence Peter hoped e. Den Zaun1 hoffte Peter [TP würde Maria t1 reparieren] would Mary mend the fence hoped Peter ‘Peter hoped that Mary would mend the fence’
[German]
Moreover, scrambling is also blocked if the infinitival is introduced by a complementizer (Johnson 2000; example from Dutch, as German lacks infinitival complementizers):

(46) *dat Jan Marie1 heeft geprobeerd [om t1 te kussen] [Dutch]
        that John Mary has tried C° to kiss
        'that John has tried to kiss Mary'
As pointed out by Johnson (2000), the same restrictions are also characteristic of QR. QR may extend the scope of a quantifier beyond non-finite clause boundaries ([47]), but typically not across finite predicates ([48]) or complementizers ([49]):

(47) a. At least one American tourist expects to visit every European country this year. (∃ > ∀ / ∀ > ∃)
     b. At least one American tourist hopes to visit every European country this year.
     c. Some government official is required to attend every state dinner.
     (Kennedy 1997, [46], [47], [50])

(48) A different student claimed that she had read every book. (∃ > ∀ / *∀ > ∃)

(49) A different student wanted for you to read every book. (∃ > ∀ / *∀ > ∃)
Finally, in those languages that admit the operation, scrambling, just like QR, feeds scope. Scrambling languages such as German or Japanese are scope rigid in that under normal intonation, base word order is only compatible with the surface scope interpretation ([50a]). Scope ambiguity is contingent upon overt inversion of the quantifiers by scrambling ([50b]), or some other rearrangement operation (Frey 1993; Haider 1993; Kiss 2000; Krifka 1998; Wurmbrand 2008):

(50) a. weil irgendeiner jedes Buch mit Freude gelesen hat [German]
        since someone every book with joy read has (∃ > ∀ / *∀ > ∃)
        'since somebody read every book with joy'
     b. weil irgendein Buch1 jeder t1 mit Freude gelesen hat
        since some book everybody with joy read has (∃ > ∀ / ∀ > ∃)
        'since everybody read some book with joy'
This has been taken as evidence that scrambling languages lack QR unless required for resolving type conflicts that arise with Inverse Linking or in-situ object quantifiers. On this view, all non-surface scope orders are derived by reconstruction. (Inverse scope has also been attributed to a non-compositional Scope Principle, which maps configurations with multiple operators to multiple interpretations; Frey 1993; Aoun and Li 1993.) Reconstruction denotes a group of operations which restore (parts of) a dislocated category into one of its pre-movement configurations for the evaluation of scope, binding or referential opacity effects (section 5 and section 6). For the German string (50b), this has the consequence that the inverse reading, on which the universal distributes over the existential (such that books may potentially vary with readers), results from reconstructing the fronted object into a position below the subject (t1). Scrambling now resembles QR in that it feeds new scope relations.
The QR-as-covert-scrambling analysis has the additional benefit of supplying a division of labor argument for LF (Diesing 1992; Johnson 2000). If scope is determined by QR, and QR is the covert counterpart of scrambling, languages differ only in a single parameter: whether they admit overt scrambling (German) or covert scrambling (QR in English). This perspective dovetails, for one, with a single output model of the grammar (Bobaljik 1995; Groat and O'Neil 1996; Pesetsky 2000), in which overt and covert movement operations are not discriminated by relative timing, but apply in a single cycle and are distinguished only by whether the higher or the lower movement copy is pronounced. Cross-linguistic variation is thereby restricted to different parameter settings in syntax, while the semantic component can be kept uniform across all language types, presumably a desirable result in itself. By contrast, surface oriented categorial approaches not only need to find an explanation for why English-type languages employ a strategy of scope extension which is missing in German, but also have to provide an (unrelated) answer as to why scrambling is limited to German. This effectively amounts to admitting cross-linguistic variation both in the component that generates word order variation, as well as in semantics, where scope extension is computed. Thus, even though it postulates an additional, mediating level between syntax and semantics, the LF-approach eventually turns out to be more parsimonious in its design.
4.2. Types of quantifiers

It has been observed that QR reveals its nature most transparently when it targets distributive universals (each, every), but is subject to various distortions with other classes of DPs. On the one hand, singular indefinites freely scope out of islands, indicating that they do not reach their scope position by QR (Farkas 2000; Fodor and Sag 1982; Kratzer 1998; Reinhart 1997, 2006; Ruys 1992; Winter 2001).

(51) If a relative of mine dies, I will inherit a house. (Ruys 1992)

Cardinal plural indefinites are on the other hand much more limited in their scope taking options than distributive universals. For instance, Ruys (1992) notes that (52) lacks the interpretive signature of the inverse scope reading (52b): the sentence cannot be used to characterize situations in which six critics reviewed two movies (see also Reinhart 2006: 110). (52) minimally differs in this respect from structurally isomorphic examples with universals in object position ([20], repeated from above as [53]):

(52) Three critics liked two movies. (∃3 > ∃2 / *∃2 > ∃3)
     a. ∃X[|X|=3 ∧ critics(X) ∧ ∀a[a ≤ X → ∃Y[|Y|=2 ∧ movies(Y) ∧ ∀b[b ≤ Y → a liked b]]]]
     b. ∃Y[|Y|=2 ∧ movies(Y) ∧ ∀b[b ≤ Y → ∃X[|X|=3 ∧ critics(X) ∧ ∀a[a ≤ X → a liked b]]]]
(53) Some critic liked every movie. (∃ > ∀ / ∀ > ∃)
The inability of plural indefinites to obtain wide scope is also responsible for the deviance of (54):

(54) #Three different critics liked two movies.

Moreover, even if the cardinal were allowed to escape islands, the truth conditions delivered by these readings would be too weak (Ruys 1992). The wide scope distributive formula (55a) is already satisfied on the condition that a single relative of mine dies. But this is not what (55) means. Rather, (55) expresses the proposition that an inheritance is dependent on the death of all three relatives, which is only captured by the wide scope collective construal (55b):

(55) If three relatives of mine die, I will inherit a house.
     a. ∃X[|X|=3 ∧ relatives of mine(X) ∧ ∀y≤X[die(y) → I will inherit a house]]
     b. ∃X[|X|=3 ∧ relatives of mine(X) ∧ [∀y≤X[die(y)] → I will inherit a house]]
     (Ruys 1992)

Thus, treating cardinal indefinites as ordinary generalized quantifiers fails to provide the means for excluding unattested interpretations. It has therefore been suggested to analyze plural indefinites as existentially closed wide scope choice functions (Kratzer 1998; Reinhart 2006; Ruys 1992; Winter 1997). In its simplest incarnation, a choice function applies to a non-empty set of individuals and returns a member of that set ([56]). For (55), the choice function account delivers the desired collective interpretation on the assumption that the function ranges over pluralities:

(56) f is a choice function (CH) iff for any non-empty X: f(X) ∈ X

(57) ∃f[CH(f) ∧ [die(f(relatives of mine)) → I will inherit a house]]

To recapitulate, QR does not affect all noun phrases uniformly, motivating the introduction of new semantic techniques (such as choice functions) for modeling certain aspects of the translation from natural language syntax to the interpretive component.
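The contrast between the weak distributive antecedent and the collective choice-function construal can be sketched executably. (Illustrative only; the domain, the function names, and the finite restriction to three-membered pluralities are invented for exposition.)

```python
from itertools import combinations

# (56)-(57), sketched: a choice function maps any non-empty set to one of
# its members, f(X) in X.
def f_min(X):                 # one concrete choice function
    return min(X)

relatives = {"r1", "r2", "r3", "r4"}
# Let choices range over PLURALITIES: the three-membered sets of relatives.
pluralities = [frozenset(c) for c in combinations(sorted(relatives), 3)]

def antecedent_57(dead):
    # The antecedent die(f(relatives of mine)) of (57) is verifiable iff,
    # for SOME choice of plurality, every member of that plurality died:
    # the collective construal (55b), not the weak distributive (55a).
    return any(all(r in dead for r in X) for X in pluralities)

print(f_min({"r2", "r1"}))                    # 'r1': f(X) is a member of X
print(antecedent_57({"r1"}))                  # False: one death is too weak
print(antecedent_57({"r1", "r2", "r3"}))      # True
```

A single death thus never triggers the inheritance condition on the collective construal, matching the intuition reported for (55).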
For further discussion of the logical syntax of plural noun phrases and distributivity see Landman (2003); Winter (2001); Reinhart (2006); von Stechow (2000) among many others. For the question of how far QR takes different types of quantifiers see Ioup (1975); Szabolcsi (1997, 2010); Beghelli and Stowell (1997); Kamp and Reyle (1993); and Reinhart (2006).
4.3. Order preservation effects

A number of syntactic configurations impose ordering statements on relative quantifier scope, resulting in scope freezing phenomena. This observation provides further support
for the view that quantifier exportation is the result of a syntactic operation that behaves like overt movement. Order preservation restrictions are known, among others, to limit the relative scope options of the two internal arguments in the double object construction ([58]; Barss and Lasnik 1986). The same condition blocks the theme in (59b) from binding an implicit variable inside the goal:

(58) a. I gave a child each doll. (∃ > ∀ / *∀ > ∃)
     b. The judges awarded an athlete every medal. (∃ > ∀ / *∀ > ∃)
     (Bruening 2001, [2a], [28c])

(59) a. I gave every girl a different marble.
     b. #I gave a different girl every marble.
     (Johnson 1996)

Interestingly, the direct object may scope over the subject, though, as shown by (60):

(60) a. A (different) teacher gave me every book. (∀ > ∃)
     b. At least two judges awarded me every medal. (∀ > at least 2)
     (Bruening 2001, [28a], [28c])
This indicates that the relevant constraint does not put an upper bound on the scope domain of direct objects per se, but has to be formulated in such a way as to require the relative order between the two internal arguments to be preserved. Before turning to a specific account of scope freezing, it is instructive to digress briefly into order preservation effects with overt movement. In English multiple interrogatives, structurally higher arguments must precede lower ones. This generalization, known as the Superiority Condition (Chomsky 1973; Hornstein 1995; Shan and Barker 2006), forces the subject of (61) to surface to the left of the object. Analogous considerations hold for (62) and (63). Note incidentally that (61b)−(63b) are interpretable by standard techniques of question semantics, thus the constraint is unlikely to be semantic in nature.

(61) a. Who bought what?
     b. *What did who buy?

(62) a. Who did she give t what?
     b. *What did she give who t?

(63) a. Whom did Bill persuade to visit whom?
     b. *Whom did Bill persuade whom to visit?

A prominent strand of analyses relates this pattern to a general principle of economy preferring shorter movement paths over longer ones, which is variably referred to as Shortest, Shortest Move, Shortest Attract, or the Minimal Link Condition (Chomsky 1995; Richards 2001). Shortest accounts for the contrasts above because subject movement in (61a) e.g. creates a shorter movement path than object movement in (61b). In (64a), the subject crosses only a single maximal projection on its way to SpecCP, while the object in (64b) has to traverse at least three nodes:
(64) a. [CP who1 [TP t1 [vP t1 [VP bought what2]]]]
     b. *[CP what2 did [TP who1 [vP t1 [VP buy t2]]]]
One group of languages that permit overt fronting of more than one wh-phrase, among them Bulgarian and Romanian, reveals another important restriction: multiple movement generally proceeds in such a way that it preserves the original serialization of the wh-phrases.

(65) a. Koj kogo vižda?
        who whom sees
        'Who sees whom?'
[Bulgarian]
     b. *Kogo koj vižda?
        whom who sees
        (Rudin 1988, [45a, b])

Richards (2001) demonstrated that these order preservation effects also fall out from the economy condition Shortest. The derivation of (65a) is schematized in (66). Whenever a higher category α and a lower node β are attracted by the same c-commanding head ([66a]), the metric that minimizes the length of movement paths dictates that α move prior to β ([66b]). Moreover, Shortest forces the second movement, which affects β, to "tuck in" below α, rather than passing over α, yielding the crossing dependency (66c):

(66) a. head … [α … [β …            (head attracts α and β)
     b. [α1 head … [t1 … [β …       (α moves first)
     c. [α1 [β2 head … [t1 … [t2 …  (β moves second, tucking in below α)
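The interaction of Shortest with tucking in can be simulated with a toy derivation procedure. (A purely illustrative sketch; the depth encoding and all names are invented, and the procedure abstracts away from real phrase structure.)

```python
# Shortest plus "tucking in" (66), sketched: the head attracts each
# remaining phrase in order of path length, and every later mover lands
# below the earlier ones, preserving the base c-command order.
def derive(attractees):
    """attractees: dict mapping each phrase to the depth of its base
    position below the attracting head (smaller = closer = shorter path)."""
    remaining = dict(attractees)
    specs = []                                      # outermost spec first
    while remaining:
        closest = min(remaining, key=remaining.get)  # Shortest
        specs.append(closest)                        # tucks in below earlier movers
        del remaining[closest]
    return specs

# (65): subject 'koj' is base-generated closer to C than object 'kogo',
# so multiple wh-fronting preserves subject-before-object order.
print(derive({"koj": 1, "kogo": 3}))                 # ['koj', 'kogo']
```

The same procedure derives the order-preserving outcome of multiple QR in double object constructions discussed below, since the indirect object is generated above the direct object.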
Returning at this point to scope freezing in double object constructions, Bruening (2001) argues that multiple applications of QR and wh-movement can be given a common analysis in terms of (66) if it is assumed that quantificational DPs need to check a Q(uantificational)-feature on v°. Then, the indirect object (IO), which is generated above the direct object (DO), undergoes QR first ([67b]), landing in an outer specifier of vP, followed by movement of DO ([67c]). Since the second application of QR tucks in below the first one, the two internal arguments end up in an order-preserving configuration ([67c]):

(67) a. [vP SUB V[Q] [VP IO2,[Q] [DO3,[Q]]]]
     b. [vP IO2 [vP SUB V[Q] [VP t2 [DO3,[Q]]]]]
     c. [vP IO2 [vP DO3 [vP SUB [vP t2 [t3]]]]]
On this view, QR is also feature driven, and not exclusively motivated by the need to repair type clashes. Even though attractive, the feature analysis also encounters complications. First, Sauerland (2000) notes that Bruening’s account is challenged by (68a), which can, among others, be assigned a reading on which the subject scopally interferes in between the indirect and the direct object. This is unexpected inasmuch as the LF (68b) fails to preserve the base order:
(68) a. Two boys gave every girl a flower. (∀ > ∃2 > ∃)
     b. [vP IO2 [vP SUB [vP DO3 …
     (Sauerland 2000, [49])
Second, the assumption that quantifier movement is driven by the need to check Q-features in addition to the requirement to avoid type mismatches duplicates the motivation for object QR, thereby introducing redundancy into the system. For further discussion and alternative solutions see Lechner (2012); Sauerland (2000); Williams (2005).
Finally, a number of additional structural restrictions on quantifier scope have been identified in the literature, two of which will be briefly addressed below. To begin with, predicate fronting ([69a]) systematically bleeds inverse scope readings (Barss 1986; Huang 1993).

(69) a. … and [VP teach every student2]3, noone1 will t3 (¬∃ > ∀ / *∀ > ¬∃)
     b. … and noone1 will [VP teach every student2] (subsequent to reconstruction)
The topicalized VP of (69a) needs to reconstruct for reasons of interpretation, restoring the base word order, as in (69b). Scope freezing can then be interpreted as a consequence of the descriptive generalization in (70), according to which VP-movement renders ineligible the object every student (α) for long QR across the subject noone (β) in representation (69b). (70) If XP contains α, moves and is interpreted below the overt position of β, α cannot extend its scope over β. A similar restriction applies to Inverse Linking, where the two quantifiers affected are in a dominance, instead of a c-command, relation. In (71), the direct object someone from every city needs to cross the (VP-internal trace of the) subject in order to resolve a type mismatch. At the same time, every city may be inversely linked across its container someone ([71a, b]). However, the subject must not scopally interfere between the inversely linked node every city and the container ([71c], Larson 1985): (71) [β Two policemen] spy on [XP a. d2 _ c _ d b. c _ d _ d2 c. *c _ d2 _ d
someone from [α every city]]. (inverse linking, wide scope for subject) (inverse linking, narrow scope for subject) (inverse linking, intermediate scope for subject)
Just like (69), sentence (71) bears the signature of (70), the only difference being that in (71), XP moves covertly and not overtly. In (71), the quantifier someone from every city (XP) contains every city (α) and needs to move for type reasons. Moreover, in the relevant reading (71c), XP is interpreted below the position the subject (β) resides in. Thus, (70) prohibits every city from obtaining scope over the subject, excluding (71c).
As for their theoretical relevance, scope freezing effects expose once again the parallelism between overt and covert movement operations. On the one hand, it was seen that in double object constructions, both visible wh-movement and multiple QR display order preservation effects. On the other hand, embedding a quantifier inside a container that moves prevents that quantifier from crossing higher operators, irrespective of whether the container moves overtly (predicate fronting; [69]) or covertly (inverse linking; [71]). These observations contribute further arguments for the position that QR obeys laws very similar to those typical of overt movement processes (see [23a]).

V. Interfaces
4.4. Scope Economy

VP-ellipsis denotes the process by which a VP is phonologically suppressed under identity with an antecedent VP:

(72) a. John liked the movie. Mary liked the movie, too.
     b. John liked the movie. Mary did ⌂, too. (⌂ = liked the movie)
On the standard analysis, the terminals inside the elided VP are syntactically projected, but the ellipsis operation instructs them to forgo pronunciation. The elided VP is moreover subject to a semantic parallelism condition which is commonly taken to be satisfied whenever the denotation of the antecedent VP is an element of the focus semantic value of the elided VP (Rooth 1992). What is of relevance for the present purposes is that VP-ellipsis reveals a further constraint on relative quantifier scope. As first observed by Sag (1976) and Williams (1977), VP-ellipsis leads to disambiguation in the antecedent clause if the elliptical clause is unambiguous ([73b]).

(73) a. A critic liked every movie. An actress did ⌂, too. (∃ > ∀ / ∀ > ∃)
     b. A critic liked every movie. Mary did ⌂, too. (∃ > ∀ / *∀ > ∃)
     (⌂ = liked every movie)
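The semantic-effect rationale behind this contrast can be verified model-theoretically. The sketch below (a toy model of my own; the individuals and the likes relation are purely illustrative) checks that the two scopings of (73a) can differ in truth value, whereas with a referential subject, as in (73b), the two scopings coincide in every model:

```python
# Toy model: the two scopings of 'A critic liked every movie' diverge,
# so long object QR has a semantic effect; with the name 'Mary' the two
# scopings are logically equivalent. Model is illustrative only.

critics = {"c1", "c2"}
movies  = {"m1", "m2"}
likes   = {("c1", "m1"), ("c2", "m2")}   # each movie liked by a different critic

# (73a) 'A critic liked every movie'
surface = any(all((c, m) in likes for m in movies) for c in critics)  # exists > every
inverse = all(any((c, m) in likes for c in critics) for m in movies)  # every > exists
print(surface, inverse)   # False True -> long QR yields a new, distinct reading

# (73b) 'Mary liked every movie': a name denotes an individual, so both
# scopings reduce to the same formula, cf. the identity in (74b).
likes_m = {("mary", "m1"), ("mary", "m2")}
lf_mary_wide   = all(("mary", m) in likes_m for m in movies)  # Mary > every movie
lf_object_wide = all(("mary", m) in likes_m for m in movies)  # every movie > Mary
print(lf_mary_wide == lf_object_wide)   # True: no semantic effect, QR blocked
```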
On an influential idea developed in Fox (1995, 2000), the contrast under (73) follows from parallelism in conjunction with the principle of Scope Economy. Scope Economy demands that all applications of QR, except for movements that resolve type conflicts, must have a semantic effect. This requirement is met by long object QR inside the elliptical clause of (73a), because long QR generates a new interpretation, which is distinct from (and weaker than) the surface reading ([74a]). Thus, Scope Economy licenses wide scope for the object in (73a). In (73b), on the other hand, locating the object quantifier above or below the subject does not have any consequences for interpretation; the two LFs translate into synonymous formulas ([74b]). As a result, wide object scope for every movie is blocked in (73b).

(74) a. ⟦[every movie2 [an actress1 [t1 liked t2]]]⟧ ≠ ⟦[an actress1 [every movie2 [t1 liked t2]]]⟧
     b. ⟦[every movie2 [Mary1 [t1 liked t2]]]⟧ = ⟦[Mary1 [every movie2 [t1 liked t2]]]⟧

Just like Shortest, Scope Economy minimizes movement paths. However, unlike Shortest, Scope Economy applies relative to an interpretation. If two competing derivations end up with scopally indistinguishable results, the longer, more costly one is blocked by Shortest. The verdict of Shortest is suspended, though, for derivations that create new, distinct interpretations.
Similar observations have been made for overt wh-movement (Golan 1993). While the formation of multiple interrogatives in English is arguably shaped by Shortest (see [61] above), Superiority effects are systematically cancelled if costly movement leads to an interpretation which could not have been achieved by a more economical strategy. Observe to begin with that (75) can be answered either as in (75a) or as in (75b), indicating that the sentence is ambiguous between a single and a multiple question interpretation (Baker 1970). Descriptively, the two readings differ in whether the wh-in-situ object what is assigned embedded scope ([76a]) or matrix scope ([76b]; for discussion see also Reinhart 2006):

(75) Who remembers where we bought what?
     a. Sally remembers where we bought what, John remembers where we bought what, …
     b. Sally remembers where we bought fish, John remembers where we bought bread, …

(76) a. Who1 [t1 remembers [where what2 we bought t2]] (object narrow scope)
     b. Who1 what2 [t1 remembers where we bought t2] (object wide scope)
By contrast, (77) can only be interpreted with matrix scope for the embedded subject ([78a]), as seen from the fact that (77b) does not constitute a felicitous answer to (77) (Hendrick and Rochemont 1982; Lasnik and Saito 1992):

(77) Who remembers what2 who1 t1 bought t2?
     a. Sally remembers that Bill bought fish, John remembers that Sue bought bread, …
     b. *Sally remembers what who bought, John remembers what who bought, …

(78) a. Who1 who2 [t1 remembers [what3 t2 bought t3]] (subject wide scope)
     b. *Who1 [t1 remembers [what3 who2 t2 bought t3]] (subject narrow scope)

Golan (1993) suggests that this observation receives a natural explanation on the assumption that the economy conditions which regulate movement are calculated relative to a fixed interpretation. Economy excludes the narrow scope, single question construal (78b), because the competing surface representation (79) conforms better with Shortest and achieves with (79a) the same target interpretation that (77) does with (78b):

(79) Who remembers who1 t1 bought what?
     a. Who remembers [who1 what2 t1 bought t2] (subject narrow scope)
     b. *Who who1 remembers [t1 what2 bought t2] (subject wide scope)
There is, however, no alternative strategy for expressing the subject wide scope, multiple question reading (78a) apart from (77). This is so because in (79), the lower subject who1 marks the complement as an interrogative complement and therefore must be interpreted in the local SpecCP (Wh-Criterion; Rizzi 1996), excluding the subject wide scope reading (79b). As a result, economy legitimizes (77) as the optimal form for the target interpretation (77a), despite the fact that, strictly speaking, (77) fails to abide by Shortest.
To summarize, the economy metric which regulates the information flow between syntax and semantics treats QR and certain types of wh-in-situ alike. In both cases, Shortest selects the most parsimonious derivation relative to a given interpretation. That is, if two derivations based on the same numeration yield the same interpretation and differ only in the length of their respective movement paths, the grammar prefers the one with the least amount of movement. For covert movement, this translates into the generalization that QR is banned unless it generates new scope orders. And in environments of wh-movement, the costlier derivation is sanctioned only if it places the wh-in-situ into a scope position that would be inaccessible otherwise.
Since the emergence of economy effects is generally held to be symptomatic of syntactic operations, the fact that certain properties of scope fixing fall under the reign of economy provides further evidence for the claim that aspects of interpretation are determined by syntactic principles, as expressed by the Abstractness Hypothesis. For further applications of Scope Economy see Fox (2000); Meyer and Spector (2009); and Reinhart (2006), among others.
4.5. Cross-categorial QR

Movement that affects interpretation is not restricted to nominal generalized quantifiers over individuals, but is also attested with other syntactic categories and second-order properties in other ontological domains, further substantiating the claim that inverse scope phenomena have a structural basis. The present section briefly reviews two such cases: silent movement of the degree head in comparatives, and semantically detectable, overt head movement.
Heim (2000) designs a semantics for comparatives, exemplified by (80a), which treats the degree head -er as the degree counterpart of determiners quantifying over individuals, with the meaning given in (80b). The second-order property of degrees -er combines with the than-XP first and takes a derived degree predicate as its second argument. In order to generate such a derived degree predicate, the string -er than Bill needs to raise, targeting a propositional node, as shown in (80c).

(80) a. Ann is taller [than-XP than Bill].
     b. ⟦-er⟧ = λP λQ. P ⊂ Q (adapted from Bhatt and Pancheva 2004, [84])
     c. [[-er than Bill]2, [λ2 [Ann is tall-t2]]]

The hypothesis that degree heads are not interpreted in their base position has received additional support from two directions: -er movement generates new readings (Heim 2002; Beck 2011) and creates previously unavailable binding options for categories inside the than-XP (Bhatt and Pancheva 2004). Bhatt and Pancheva (2004) demonstrate that these observations are best accounted for by an analysis that moves -er on its own, followed by post-cyclic attachment of than Bill (see [117] for details). On this view, comparatives implicate instances of covert, scope shifting head movement.
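On one common way of making the denotation in (80b) concrete, degree predicates denote sets of degrees and -er asserts a proper-subset relation between them. The sketch below (my own encoding; the numeric degree values are purely illustrative) applies this to (80a):

```python
# Degree semantics sketch for (80): degree predicates as sets of degrees,
# and -er as the proper-subset relation between them. Illustrative only.

def degrees(height):
    """The set of degrees d such that the individual is tall to degree d
    (downward-closed, as standardly assumed for gradable predicates)."""
    return set(range(1, height + 1))

def er(P, Q):
    """[[-er]](P)(Q) = 1 iff P is a proper subset of Q, cf. (80b)."""
    return P < Q   # '<' is the proper-subset test on Python sets

bill_tall = degrees(170)   # than-XP argument: the degrees to which Bill is tall
ann_tall  = degrees(180)   # derived degree predicate: degrees to which Ann is tall

# (80a) 'Ann is taller than Bill' = [-er](than Bill)(λd. Ann is d-tall)
print(er(bill_tall, ann_tall))   # True
print(er(ann_tall, bill_tall))   # False
```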
A second instance of semantically detectable X°-movement has been argued to affect certain modal heads (Lechner 2007). In (81), the subject can be assigned split scope, such that the negation takes scope above, and every boy is interpreted below, the modal:

(81) Not every boy can make the team. (¬ > ◊ > ∀)
Both the subject and the modal reach their respective surface positions by movement. That modals move is confirmed by the observation that they usually scope below adverbs to their right:

(82) He can2 always t2 count on me. (always > ◊ / *◊ > always)
Hence, the split reading of (81) can in principle be derived by reconstructing every boy below the derived position of the modal, as detailed in (83) (denotation brackets mark interpreted positions):

(83) [⟦not⟧ every boy1 [⟦can⟧2 [⟦every boy⟧1 [t2 … [t1 …]]]]] (¬ > ◊ > ∀)
Crucially, if the parse in (83) turns out to be correct, it follows that can is interpreted in a derived position, since on the intended reading, the modal scopes over the subject. Evidence for the assumption that every boy indeed reconstructs into a position above the base position of the modal comes from the interaction of scope splitting with negative polarity items (NPIs). Linebarger (1980) reports that an NPI must not be separated from its licensing negation by another quantifier at LF. Among others, the Immediate Scope Constraint excludes (84b) by imposing a locality requirement that is not met by the post-QR configuration (84c):

(84) a. She doesn't budgeNPI for me.
     b. *She doesn't budgeNPI for everybody.
     c. not [everybody1 [vP budgeNPI for t1 …
Turning to scope splitting, (85) demonstrates that negative universals are generally compatible with NPIs (Horn 2000). Embedding an NPI into configurations of scope splitting produces sharply degraded results, though, as seen in (86):

(85) Not everyone has ever read any Jespersen.
(86) *Not everyone can ever be on the team. (*¬ > ◊ > ∀ > NPI)
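The workings of the Immediate Scope Constraint can be made explicit with a small checker (my own toy implementation; LFs are flattened into lists of scope-bearing items, negation is assumed to precede the NPI, and the unsplit negative quantifier of [85] is deliberately not modeled):

```python
# Toy checker for Linebarger's Immediate Scope Constraint: at LF, no
# quantifier may intervene between an NPI and its licensing negation.
# Encoding is illustrative only.

QUANTIFIERS = {"every", "everyone", "everybody"}

def npi_licensed(lf):
    """lf: scope-bearing elements in their LF scope order, widest first.
    Assumes 'not' precedes 'NPI' in the list when both are present.
    The NPI is licensed iff negation scopes over it with no quantifier
    intervening between the two."""
    if "not" not in lf or "NPI" not in lf:
        return False
    between = lf[lf.index("not") + 1 : lf.index("NPI")]
    return not any(item in QUANTIFIERS for item in between)

print(npi_licensed(["not", "NPI"]))                      # (84a): True
print(npi_licensed(["not", "everybody", "NPI"]))         # (84c): False
print(npi_licensed(["not", "everyone", "can", "NPI"]))   # (87):  False
```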
If it is assumed that the subject is located above the base position of the modal, as hypothesized in (83), the Immediate Scope Constraint offers a plausible explanation for the contrast above: (86) is ill-formed because everyone intervenes between the negation and the NPI:

(87) [not everyone1 [can2 [⟦everyone⟧1 tcan [everNPI … t2 … t1 …]]]]

Everything else being equal, this entails for the well-formed case of scope splitting in (81) that the scope order between the subject and the modal also has to be computed in derived positions, just as in (83). But then, the modal scopes above its base position, indicating that certain instances of head movement have semantic effects. This is not unexpected if raising modals are taken to denote generalized quantifiers over situations. Thus, scope extension by head movement fits naturally into the typology of other scope shifting operations.
Apart from scope splitting, there is a second context where verb movement appears to induce semantic effects. As illustrated by the contrasts in (88)−(90) below, subject NPIs (in English) are licensed if negation cliticizes onto a finite auxiliary in C°, but not by regular sentential negation ([88] attributed to Jim McCloskey by Jason Merchant; [89] attributed by Ian Roberts to Richard Kayne; [90] is from Szabolcsi 2010):

(88) a. Why isn't a single chair set up in here?
     b. *Why is a single chair not set up in here?

(89) a. Which sandwiches didn't anybody eat?
     b. *Anybody didn't eat the tuna sandwiches?

(90) a. Don't anyone/even one of you touch my arm!
     b. *Anyone/even one of you touch my arm!
To recapitulate, scope shifting movement operations are not restricted to generalized quantifiers over individuals that surface as DPs, but are also attested with operators in different ontological domains (degrees and situations) that fall into the class of other syntactic categories (degree expressions, modals). Finding such correlations further solidifies the evidence for the transparent LF-model, which postulates a close relation between the logical type of an expression and its ability to affect interpretation by movement. Finally, the addition of semantically detectable overt head movement (modals) and covert head raising operations (-er) results in a typology that exhausts the full logical space created by the two parameters overt vs. covert movement and XP- vs. X°-movement. This increase in system internal harmony can be taken as a further indicator that the Abstractness Hypothesis provides an adequate model of the interaction between surface syntax and interpretation.
5. Opacity and hidden structure

In current derivational models, the combinatory syntactic system CS employs a single structure building operation (Merge), which comes in three flavors:

(91) a. External Merge/First Merge: introduces new nodes at the root of the tree.
     b. Internal Merge/Remerge: corresponds to movement in older terminologies.
     c. Late Merge/Countercyclic Merge: targets positions created by movement and expands these positions by inserting nodes in non-root positions.

The guises of Merge correspond to three sources of abstractness that one typically expects to find in models of the grammar which espouse the Abstractness Hypothesis. First, External Merge may lead to the presence of unpronounced, yet interpreted terminals, in contexts involving ellipsis, copy traces or silent operators ([92a]). Next, Internal Merge can manifest itself in the form of invisible, covert movement such as QR ([92b]). Finally, Late Merge is (by definition) responsible for the emergence of structure at an unexpected, delayed point in the derivation ([92c]):

(92) Sources of Abstractness
     a. External Merge: ellipsis, copy traces, silent operators
     b. Internal Merge: covert movement (e.g. QR)
     c. Late Merge: delayed emergence of structure

Furthermore, there is a natural divide that singles out (92b) and (92c) to the exclusion of (92a), in that the former presuppose the existence of movement as well as a sequential ordering of representations. Apart from vindicating the Abstractness Hypothesis, the two exponents of abstractness (92b) and (92c) will accordingly be seen to supply an important tool for detecting derivations. The heuristic underlying these diagnostics is based on the concept of opacity and can − to the extent that the results are associated with interpretive effects − also be used to probe for design features of the interface between syntax and semantics. Following some introductory remarks on opacity, the sections to follow will discuss manifestations of all three types of abstractness in (92) that have been identified in the recent literature.
One of the strongest arguments for modeling natural language by means of derivations comes from rule opacity, a group of phenomena that plays a particularly prominent role in phonology (Kiparsky 1973). Opacity arises whenever "a rule is contradicted on the surface" (David Stampe, cited in Pullum 1976), or more generally, when properties of an expression cannot be solely attributed to surface appearance. The study of opacity also has important consequences for the theory of the syntax-semantics interface.
If a category is spelled out in a derived position, and if it is possible to isolate aspects of interpretation that are not determined by surface representations alone, it can be concluded that the dislocated category has retained properties of its derivational history. Opacity effects of this sort require the assumption of abstractness in syntax, and therefore demonstrate that the information flow from syntax to the semantic component is mediated by abstract representations such as LFs or enriched surface structures. Moreover, a particular subtype of opacity will be seen to furnish support for the even stronger hypothesis that at least some abstract expressions inside the syntactic tree are linked by movement.
Opacity effects come in three varieties: counterbleeding, counterfeeding, and combinations of feeding and bleeding (the latter will be addressed in section 5.3). Assume that a principle or rule of grammar R licenses a property P in context C. In counterbleeding opacity, instances of P show up outside C, resulting in what is also referred to as overapplication. The rule R is opaque in that the context C that determined R has been destroyed by an independent operation, and C is therefore not visible in the surface form. Thus, the fact that R has applied can be inferred only by inspecting earlier stages of the derivation. To illustrate, in (93), Principle A of Binding Theory, which demands that anaphors have a local c-commanding antecedent, applies in an opaque environment, because the context of application for Principle A (c-command) has been destroyed by movement.

(93) Which picture of each other1 do they1 like best?
In counterfeeding opacity (manifestations of which will be encountered in section 5.1), P is absent from an environment in which it is expected to emerge (underapplication). Rule R is opaque because the fact that R applied is not visible in the surface form. Again, reference to derivations is crucial, in this case in order to enable P to escape the triggering context of R. As will be explicated in the next subsection, the analytical tools needed to model binding opacity effects are already an integral component of the derivational model, supporting the specific design adopted in recent versions of minimalist grammars.
5.1. Binding theory and leftward movement

The contrast in (94) shows that overt movement extends the binding domain of an anaphor across a potential antecedent, indicating that fronting feeds Principle A (Chomsky 1993).

(94) a. I asked the boys1 [which picture of each other1]2 I should buy t2.
     b. *I asked the boys1 which girl will buy [which picture of each other1].

Similar observations have been made for covert movement and Principle A in Fox (2003: [28]):

(95) a. ??The two rivals1 hoped that Bill would hurt [every one of each other1's operations].
     b. The two rivals1 hoped that someone would hurt [every one of each other1's operations]. (*∃ > ∀ / ∀ > ∃)

This parallelism between overt and covert displacement not only strengthens the view that scope shift is the product of a movement rule ([23a]), but also signals that Principle A does not act on surface representations, but is evaluated at LF, after the application of QR ([23c]). In (96a), the context for the application of Principle A is restored subsequent to movement. On a prominent interpretation of this fact, the fronted node has been reconstructed into a position below the antecedent of the anaphor, as documented by (96b) (Barss 1986; Chomsky 1993; for surveys of reconstruction see Barss 2001 and Sportiche 2006):

(96) a. [Which pictures of each other1]2 do you think that they1 like best t2?
     b. [Which pictures of each other1]2 do you think that they1 like best [which pictures of each other1]?

The standard explanation for (syntactic) binding reconstruction is provided by the Copy Theory of movement, which posits that movement does not strand simple traces, but leaves behind copies of the fronted category (Chomsky 1993). One of these copies can then be recycled for the computation of Principle A ([96b]). Note that such an analysis of reconstruction effects presupposes the assumption of hidden structure, as expressed by (92a).
Principle C is also affected by movement, yet in a slightly more complex way (Freidin 1986; Johnson 1987; Lebeaux 1990; van Riemsdijk and Williams 1981). With Ā-movement, overt dislocation obviates (bleeds) disjoint reference effects, but only if the name is located inside an adjunct to the fronted category, as in (97a, b) and (98a). R-expressions that are contained inside argument-like phrases ([97c] and [98b]) reconstruct for Principle C:

(97) a. [Which picture [near John1]]2 did he1 like t2 best?
     b. [Which picture [that John1 made]]2 did he1 like t2 best?
     c. *[Which picture [of John1]]2 did he1 like t2 best?

(98) a. [Which claim [that offended Bill1]]2 did he1 repeat t2?
     b. *[Which claim [that Mary offended Bill1]]2 did he1 repeat t2?

Assuming the Copy Theory, (97a) should be parsed into the representation (99).

(99) [Which picture [near John1]]2 did he1 like [which picture [near John1]] best?

But in (99), the name is c-commanded by the coindexed pronoun, and the theory therefore wrongly predicts a disjoint reference effect. Thus, cases such as (97a, b) and (98a) represent instances of counterfeeding opacity or underapplication, because even though the context of Condition C is met, the rule does not apply, as witnessed by the well-formedness of the output.
Lebeaux (1990) presents a solution for modeling counterfeeding opacity which operates on the assumption that insertion of adjuncts can be delayed, applying subsequent to movement of the host. This strategy is known as Late Merge (LM) or countercyclic Merge. As exemplified by (100), which tracks the derivation of (97a), LM has the effect that the names inside the adjuncts are added at a time when the hosting category has already escaped the c-command domain of the coreferential term.

(100) a. Move host: [Which picture] did he1 like [which picture] best?
      b. Late Merge of adjunct: [Which picture [near John1]]2 did he1 like t2 best?
Moreover, by restricting LM to adjuncts, corresponding countercyclic derivations are blocked for (97c) and (98b), where the names are contained inside arguments. At this point, exponents of all three types of abstractness admitted by the structure building system of derivational models ([92]) have been incorporated into the discussion: silent movement (QR), silent base generated structure (copies) and the delayed emergence of structure with LM.
Two remarks are in order here. First, Lebeaux's analysis, just like the analysis of Principle C obviation ([94]) and bleeding of Principle A ([95]), relies on a sequential ordering of representations, and therefore is compatible with a derivational model only. Second, binding reconstruction is subject to different conditions if the R-expression or anaphor resides inside a fronted predicate. As illustrated by (101), predicate movement does not extend the binding domain for anaphors ([94]). Moreover, unlike reflexives within DPs ([102a]), anaphors that are contained inside predicates cannot choose their antecedent freely from potential binders they have passed ([102b]), suggesting that Principle A is invariably evaluated in the base position of the predicate (Barss 1986; Cinque 1984). This has been taken as an indication that the fronted category includes a trace of the antecedent (Huang 1993).
(101) John wonders [t1 how proud of herself1/*himself] Jill3 said that Mary1 certainly is.

(102) a. I wonder [t1 how many pictures of herself1/himself2] John2 said that Mary1 liked.
      b. I wonder [t1 how proud of herself1/*himself2] John2 said that Mary1 certainly is.

Huang's analysis turns out to be incomplete, though, as shown by the distribution of disjoint reference effects (Heycock 1995; Takano 1995). In (103), the higher subject he2 does not bind a trace inside the predicate. Still, the pronoun cannot be construed coreferentially with the fronted name. Thus, there must be some independent reason that forces predicates to reconstruct in syntax. See Takano (1995) for further discussion.

(103) *[How t2 proud of John1] do you think he2 said Mary3 is?

Returning at this point to countercyclic Merge, recall that the combination of Copy Theory and LM was seen to deliver accurate results for Ā-movement above. But the analysis fails to account for a curious property of A-movement (Chomsky 1993; Lebeaux 1990). To begin with, A-movement reconstructs for the evaluation of the principles of Binding Theory, as can be inferred from (104). In this respect, raising patterns along with wh-movement.

(104) Pictures of himself1 seem to nobody1 to be ugly.

But unlike wh-movement, subject raising obviates Principle C violations, irrespective of whether the name is contained in an adjunct or an argument (Lebeaux 1990; [106] from Takahashi 2006: 72, [15]; see also Takahashi and Hulsey 2009 and references therein).

(105) Every picture of John1 seems to him1 to be great.

(106) a. The claim that John1 was asleep seems to him1 to be correct. (Chomsky 1993: 37)
      b. Every argument that John1 is a genius seems to him1 to be flawless. (Fox 1999a: 192)
      c. John1's mother seems to him1 to be wonderful.
      d. Pictures of John1 seem to him1 to be great. (Lebeaux 1998: 23−24)
adjunct distinction that regulates LM. Instead, following Bhatt and Pancheva (2004), Takahashi adopts the assumption that LM applies unrestricted, subject only to the requirement that the resulting structures be interpretable (Fox 1999). On this Wholesale Late Merge (WSLM) analysis, even countercyclic insertion of arguments becomes possible under certain conditions. Concretely, Takahashi suggests that in the derivation of (105), relevant parts of which are provided by (107), the determiner every moves up to matrix T on its own ([107a]), followed by insertion of the NPrestrictor ([107b]). Since no occurrence of John resides inside the c-command domain
35. The Syntax-Semantics Interface
1231
of he in the representation to be submitted to interpretation, the WSLM account successfully avoids a disjoint reference effect: (107) a. [TP Every2 seems to him1 [every2 to be great]] b. [TP [Every picture of John1]2 seems to him1 [every2 to be great]] Furthermore, the fragmentary LF representation (107b) can be assigned a compositional interpretation by the same, independently motivated mechanism that converts lower movement copies into legitimate interface objects (Sauerland 2004; Fox 1999). The two rules of Trace Conversion responsible for interpreting copies are given in (108) (adapted from Fox 1999, 2003; n is valued by the index on the copy): (108) Trace Conversion a. Variable Insertion: (Det) (Pred)n ~> (Det) [(Pred) λx.x = n] b. Determiner Replacement: (Det) [(Pred) λx.x = n] ~> the [(Pred) λx.x = n] In a first step, licensed by (108a), a variable is inserted into the position that is normally occupied by the NP-restrictor in the lower copy. For (105), Variable Insertion together with index reanalysis ([15]) yields (109b). (109) a. [Every picture of John]1 λ2 seems to him1 every2 to be great b. [Every picture of John]1 λ2 seems to him1 [every λx.x = 2] to be great (by [108a]) c. [Every picture of John]1 λ2 seems to him1 [the λx.x = 2] to be great (by [108b]) d. “Every z such that z is a picture of John seems to John to be such that the x which is identical to z is great” The second rule (Determiner Replacement) substitutes a definite determiner for the original one, as shown by (109c). Semantically, the combination of these two operations amounts to treating the lower movement copy as an individual variable bound by the fronted category ([109d]; for more complex cases, in which the reconstructed DP contains a bound variable, see Sauerland 2004). Thus, the criterion that LM-derivations be interpretable is met, which in turn sanctions WSLM of the restrictor. 
Takahashi’s account therefore correctly predicts that names embedded inside raising subjects can escape the verdict of Principle C even if they are contained in an argument. But, as was noted above, disjoint reference effects persist with argument contained names that have undergone Ā-movement (s. [97c], repeated below as [110]): (110) *[Which picture of John1]2 did he1 like t2 best? Takahashi suggests to relate this contrast to a Case requirement on LM. This condition mandates that the NP-argument of the determiner has to be present by the point in the derivation at which the category is assigned Case. Thus, Case defines the upper limit for LM: Since Ā-moved objects are Case marked in the positions they originate in (by
Agree), it follows that (110) cannot be produced by the same strategy as (105), blocking derivation (111), in which the determiner moves ([111a]) and the NP-restrictor is merged countercyclically later on ([111b]):

(111) a. Which2 did he1 like which2 best?
      b. [Which picture of John1]2 did he1 like which2 best

More specifically, WSLM of the restrictor picture is illegitimate for the reason that the head noun picture agrees with the determiner, and (by assumption) must be inserted before Case feature checking applies. At this point, the question arises, though, why at least the complement PP of John, which is not subject to the Case condition, could not be merged late. The explanation for why such a derivation of (110), which would avoid a disjoint reference effect, is unavailable follows from the interpretability requirement on the output of WSLM. Suppose that the argumenthood of nominal complements is also reflected in the arity of the predicates, and that the relational head noun picture accordingly denotes a relation between individuals. Then, the determiner and the head noun induce a type clash in the lower copy subsequent to Determiner Replacement, as detailed in (112). Note incidentally that Variable Insertion provides silent predicates, but not silent individual arguments.

(112) *[Which picture of John1]2 did he1 like the picture best

As a result, the complement of the relational noun has to be inserted cyclically in its base position, accounting for the inability of Ā-moved arguments to escape Principle C. The next section will follow up on consequences that the LM-analysis entails for categories that have reached their surface position not by overt leftward displacement, but by (covert) movement to the right.
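The two Trace Conversion rules in (108) can be rendered as a pair of successive rewrite steps. The sketch below (my own encoding; the string representation of predicate conjunction with "&" is purely illustrative) applies (108a) and then (108b) to a fully specified lower copy:

```python
# Illustrative implementation of the Trace Conversion rules in (108):
# Variable Insertion followed by Determiner Replacement, applied to the
# lower copy of a moved DP. The encoding is my own, not Fox's notation.

def variable_insertion(det, pred, n):
    # (108a): (Det)(Pred)_n ~> (Det)[(Pred) λx.x = n]
    return (det, f"{pred} & λx.x = {n}")

def determiner_replacement(det_pred):
    # (108b): (Det)[(Pred) λx.x = n] ~> the [(Pred) λx.x = n]
    _det, pred = det_pred
    return ("the", pred)

def trace_convert(det, pred, n):
    """Turn the lower copy [Det Pred]_n into a definite description
    'the [Pred λx.x = n]', i.e. a bound individual variable."""
    return determiner_replacement(variable_insertion(det, pred, n))

lower_copy = trace_convert("every", "picture of John", 2)
print(lower_copy)   # ('the', 'picture of John & λx.x = 2')
```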
5.2. Binding Theory and rightward movement

The LM analysis of complements receives further, independent support from its ability to contribute to a better understanding of how Principle C interacts with rightward extraposition, rightward shift of comparative complements, and movement that resolves VP-ellipsis. The presentation below is restricted to a synopsis of some recent results; further details can be found in Bhatt and Pancheva (2004), Fox (2003), and Hulsey and Takahashi (2009), among others. Taraldsen (1981) observed that extraposition obviates Principle C violations (for similar effects of extraposition on scope see Williams 1974):

(113) a. *I showed him3 a book [that Sam3 wanted to read] yesterday.
b. I showed him3 a book t yesterday [that Sam3 wanted to read].

Movement to the right behaves in this respect just like leftward Ā-movement in that both admit LM of relative clauses and other adjuncts. But there is also a crucial disparity
that separates extraposition to the right from leftward movement. Unlike what was seen to be characteristic of leftward LM ([100]), where the hosting category and the adjunct end up in a contiguous string, rightward movement in (113b) severs the common noun and its determiner (a book) from the relative clause. This property is of course the hallmark of non-string-vacuous extraposition. As shown in Fox and Nissenbaum (1999), word order contrasts between wh-movement and extraposition become inessential if the standard Y- or inverted-T-architecture is substituted by a single output model in which all movements apply overtly (Bobaljik 1995; Groat and O’Neil 1996; Pesetsky 2000). In such theories, extraposition is derived by the two-step procedure in (114). First, the host noun (a book) undergoes silent rightward shift by Overt Covert Movement (OCM; [114a]). Second, LM attaches the relative clause to the moved node, yielding a configuration that abides by the principles regulating licit coreference relations ([114b]):

(114) a. I [VP showed him3 a book] yesterday [a book]
b. I [VP showed him3 a book] yesterday [a [book] [that Sam3 wanted to read]]

LM in contexts involving extraposition now shares all relevant properties with LM feeding leftward movement. In a single output model, apparent differences between these two operations therefore no longer pose an obstacle to a unified analysis of Principle C obviation. One important result of Fox and Nissenbaum’s (1999) account of the interaction between syntactic movement and interpretation has been that the empirical scope of LM analyses can be considerably extended once they are embedded into less restrictive theories of movement, and thereby less complex models of the grammar. That this expansion points in the right direction is further corroborated by findings from two other empirical domains: Antecedent Contained Deletion (ACD; Fox 1999) and comparatives (Bhatt and Pancheva 2004).
To begin with, the theory offers an account for a contrast between overt VPs ([115a]) and elliptical VPs in contexts of ACD ([115b]), first reported in Fiengo and May (1994: 274):

(115) a. *I showed him3 every book [CP that Sam3 wanted me to show him].
b. I showed him3 every book [CP that Sam3 wanted me to ⌂]. (⌂ = [VP show him])
c. I [VP showed him3 every bookpronounced] [every booksilent [CP that Sam3 wanted me to ⌂]]

Fox and Nissenbaum (1999) argue that a Principle C violation in the ACD example (115b) can be averted on the assumption that QR is a special case of extraposition, the only difference being that in the case of genuine extraposition, the host categories move overtly, while ACD involves silent displacement. On this analysis, which has every book undergo silent rightward movement (OCM), followed by LM of the relative clause ([115c]), the derivation of ACD exactly mirrors that of extraposition:

(116) a. Extraposition: [VP … DPsilent …] [DPpronounced [CP …]Late merged]
b. ACD: [VP … DPpronounced …] [DPsilent [CP …]Late merged]
Another configuration involving LM and OCM was identified by Bhatt and Pancheva (2004), who note that overt extraposition of comparative complements has the same effect on Principle C as extraposition of relative clauses, as the contrast between (117a) and (117b) confirms. In both cases, movement legitimizes previously unavailable coreference relations:

(117) a. ??I will tell him2 a sillier rumor (about Ann) [than-XP than Mary told John2].
b. I will tell him2 a sillier rumor (about Ann) tomorrow [than-XP than Mary told John2].
c. I [will tell him2 a d-silly rumor (about Ann) tomorrow] [-ersilent [than-XP than Mary told John2]]

The analysis is sketched in (117c): the degree head -er moves by OCM to its scope position (section 4.5; Heim 2000), followed by countercyclic attachment of the than-XP, in a way similar to how extraposed relative clauses are late merged with their head noun. However, unlike relative clauses, the extraposed constituent in (117) serves as an argument, and not as an adjunct. It follows that -er and the than-XP combine by WSLM. Bhatt and Pancheva also demonstrate that this operation is − just like WSLM inside DPs − regulated by interpretive considerations (see Bhatt and Pancheva 2004 for details and Grosu and Horvath 2006 for complications). In sum, there is accumulating evidence that LM is not restricted to adjuncts, but may also apply to arguments, provided that the operation yields interpretable output representations.
5.3. Feeding − Bleeding opacity (Duke of York)

The line between derivational and representational theories of natural language syntax is notoriously hard to draw on empirical grounds. This is partially because syntactic representations can be enriched with markers such as copies or traces that keep track of the evolution of an expression, rendering even opacity effects amenable to representational analyses. The current section adds a (third) type of opacity, known since Pullum (1976) as Duke of York, which differs from counterfeeding and counterbleeding in that it combines properties in a way that renders a representational analysis impossible. At the same time, the current section introduces evidence for two further types of silent movement operations that affect wh-phrases in pied-piping constructions. Contrasts such as (118) indicate that quantifiers induce barriers for operations that connect wh-in-situ phrases with their scope positions (Beck 1996; intervener bold; see also Pesetsky 2000):

(118) a. Sie fragte, was wer wann verstanden hat. [German]
she asked what who when understood has
‘She asked who understood what when.’
b. *Sie fragte, was niemand wann verstanden hat.
she asked what nobody when understood has
The group of interveners restricting the distribution of wh-in-situ also includes degree particles such as genau (‘exactly’):

(119) a. *?Sie fragte, wer gestern genau wann angekommen ist. [German]
she asked who yesterday exactly when arrived is
(Sauerland and Heck 2003)
b. Sie fragte, wer gestern wann genau angekommen ist.
she asked who yesterday when exactly arrived is
‘She asked who arrived yesterday when exactly.’
(adapted from Sauerland and Heck 2003)

(120) a. *Sie fragte, wer gestern genau mit wem gesprochen hat. [German]
she asked who yesterday exactly with whom spoken has
b. *?Sie fragte, wer gestern mit genau wem gesprochen hat.
she asked who yesterday with exactly whom spoken has
c. (?)Sie fragte, wer gestern mit wem genau gesprochen hat.
she asked who yesterday with whom exactly spoken has
‘She asked who spoke with whom exactly yesterday.’

Moreover, Sauerland and Heck notice that intervention effects are not restricted to wh-in-situ contexts, but are also attested with relative pronouns that pied-pipe PPs (cf. [121b] vs. [121c]):

(121) a. Maria sprach [PP über genau zwei Freunde]. [German]
Mary talked about exactly two friends
b. die Freunde, [PP über die] Maria sprach
the friends about who Mary talked
c. *die Freunde, [PP über genau die] Maria sprach
the friends about exactly who Mary talked
‘the friends (*exactly) who Mary talked about’

A unified explanation for these observations is provided by the analysis of pied-piping by von Stechow (1996), schematized in (122a), on which the pied-piper undergoes LF-movement to its scope position (or, to be precise, to the scope position of the λ-binder that translates as the index on the pied-piper). On this view, (121c) fails to satisfy the same principle that is responsible for generating intervention effects in contexts involving wh-in-situ (see Sauerland and Heck 2003 for further discussion):

(122) a. LF: the friends [who λ1 [PP about t1] Mary talked (licit dependency between who and t1)
b. *LF: the friends [who λ1 [PP exactly about t1] Mary talked (dependency crosses the intervener exactly)
Taken together, pied-piping and the recognition of a novel class of interveners provide the basis for a Duke of York argument in support of derivations. The evidence comes from German examples such as (123). The scheme in (124) tracks how the relevant steps of the derivation, to be explicated below, unfold (PRT in the gloss stands for particle):

(123) etwas [[CP [PP über (*genau) das3] auch nur mit einem seiner1 Freunde zu sprechen]]2 wohl keiner1 tCP,2 wagen würde [German]
something about exactly which even only with a of.his friends to speak PRT nobody dare would
‘something OP3 that nobody1 would dare to talk about t3 [to even a single one of his1 friends]NPI’

(124) a. [intervener1 [[CP r-pron3 … pron1] … ]]
b. [[CP r-pron3 … pron1] [intervener1 [ … ]]]
c. r-pron3 λ3 [[CP t3 … pron1] [intervener1 [ … ]]] (licit short movement out of the fronted CP)
d. *r-pron3 λ3 [ … [intervener1 [[CP t3 … pron1] … ]]] (movement out of the reconstructed CP crosses the intervener)
e. r-pron3 λ3 [[CP t3 … pron1] [intervener1 [[CP t3 … pron1] … ]]] (CP reconstructs; the lower copy is interpreted)
The Duke of York argument for derivations proceeds in two steps. First, as seen in the transition from (124a) to (124c), transporting the relative pronoun (r-pron3) inside a larger, containing CP across an intervener puts the pronoun into a position from which it can be silently moved without inducing an intervention effect (cf. smuggling in Collins 2005). This indicates that the relative pronoun (r-pron3) reaches its final location at LF by moving out of the fronted CP, as in (124c), and not out of the reconstructed CP, as in (124d). If the latter were the case, one would expect the signature of an intervention effect − which is absent in (123). The second ingredient for the Duke of York argument is provided by the fact that the fronted CP contains a pronominal variable (pron1 in [124]) that is bound by the intervener. Hence, CP must reconstruct into a position below the intervener at LF ([124e]) in order to license variable binding. It follows that CP needs to be evaluated in a position below the intervener. (In addition, [123] includes a safeguard that secures reconstruction of CP, in the form of the NPI even a single.) Given the deliberations above, it appears as if the derivation (124) imposes two contradictory requirements on CP: reconstruction is obligatory for the computation of binding relations, but prohibited for purposes of relative pronoun movement. The conflict can be resolved, though, if one assumes that intervention effects are evaluated derivationally, and if the derivation proceeds by the following three-step procedure. CP is pied-piped first ([124b]), followed by silent relative pronoun movement which targets the CP in its derived position ([124c]). Finally, CP reconstructs, such that pronominal variable binding and NPI licensing can be read off the lower copy of CP ([124e]).
It is exactly this type of conspiracy of upward movement, application of an operation in the upper position, followed by recycling of a lower copy which is characteristic of Duke of York derivations.
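The derivational logic of this argument can be made concrete in a small sketch: a per-step intervention check licenses the derivation, while a check on the output representation alone wrongly flags it. The encoding (step triples, a linearized LF, the labels `r-pron`, `intervener1`, `t3`) is purely illustrative and not the chapter's formalism:

```python
# Toy contrast between derivational and representational evaluation of
# intervention effects, modeled on derivation (124). The encoding
# (step triples, a linearized LF) is an illustrative assumption.

# Each step: (label, moving element, interveners crossed by that step)
derivation = [
    ("pied-pipe CP",        "CP",     ["intervener1"]),  # (124a) -> (124b)
    ("move r-pron from CP", "r-pron", []),               # (124b) -> (124c)
    ("reconstruct CP",      "CP",     ["intervener1"]),  # (124c) -> (124e)
]

def derivational_ok(steps):
    """Check intervention step by step, only for operator movement."""
    return all(not crossed
               for _label, mover, crossed in steps if mover == "r-pron")

def representational_ok(lf):
    """Check the output alone: does the r-pron/t3 chain span an intervener?"""
    binder, trace = lf.index("r-pron"), lf.index("t3")
    return not any(binder < lf.index(i) < trace for i in ["intervener1"])

final_lf = ["r-pron", "intervener1", "t3", "pron1"]  # schematic (124e)
print(derivational_ok(derivation))    # True: every step is licit
print(representational_ok(final_lf))  # False: the output looks illicit
```

The mismatch between the two verdicts is exactly the Duke of York signature: only a step-by-step (derivational) evaluation gets the facts right.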
To summarize, the discussion above revealed that relative pronouns that do not surface right-adjacent to their head noun covertly move to their scope positions. This movement is subject to intervention effects, and hence must not cross quantifiers or particles such as exactly. If the relative clause is contained inside a larger node that moves, pied-piping can furthermore be shown to yield evidence for a Duke of York configuration, which provides one of the strongest known arguments in support of a derivational model of the grammar. This is because representations are unable to account for Duke of York effects even if they are enriched with standardly sanctioned abstract components such as copies.
6. Reconstruction

As was seen in section 5 above, the principles of grammar not only expand the interpretive domain of expressions upwards; movement has also been observed to reconstruct. The final section of this article reviews a selection of such phenomena. It will also be seen that while movement is in most cases simultaneously undone for all reconstructible properties, there are constellations that suggest a more complex typology. These findings support the assumption of two distinct strategies for restoring expressions into non-surface positions.
6.1. Reconstruction across three dimensions

Reconstruction is attested in at least three interpretive domains (von Fintel and Heim 2011, chapter 7, among others). First, c-command sensitive relations such as disjoint reference effects, variable binding or NPI licensing can be computed in configurations that retain (some) pre-movement properties. Ample evidence of this process has been given in section 5. Second, there are contexts in which reconstruction results in scope diminishment of quantificational expressions (borrowing a term of von Fintel and Iatridou 2004). Pairs such as (125) (adapted from Lebeaux 1995; Hornstein 1996) render visible the interpretive effects of scope reconstruction with A-moved subjects. (125a) demonstrates that raising complements (marked α) are scope islands for embedded objects. Thus, the distributive, wide scope reading for every senator in (125b) cannot be derived by scope inversion in the higher clause, but must be the product of reconstructing the subject into a position inside the raising complement α. This is standardly done by assuming that t1 holds a copy of two women:

(125) a. Mary seems to two women [α to have danced with every senator]. (∃2 > ∀ / *∀ > ∃2)
b. Two women1 seem [α t1 to dance with every senator]. (∃2 > ∀ / ∀ > ∃2)
(adapted from Lebeaux 1995)
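The two readings of (125b) can be spelled out informally as follows. This is a simplified sketch: the counting existential, the restricted-quantifier notation, and the treatment of seem as a propositional operator are expository choices, not the chapter's own formalization:

```latex
% Surface scope (\exists_2 > \forall): the subject is interpreted in its
% derived position, above seem
\exists_2 x\,[\mathrm{woman}(x)] :
  \mathbf{seem}\big(\forall y\,[\mathrm{senator}(y)] : \mathrm{dance}(x,y)\big)

% Reconstructed scope (\forall > \exists_2): "two women" is interpreted in
% the position of its copy t_1 inside \alpha, where "every senator" can
% outscope it
\mathbf{seem}\big(\forall y\,[\mathrm{senator}(y)] :
  \exists_2 x\,[\mathrm{woman}(x)] : \mathrm{dance}(x,y)\big)
```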
Not all quantifiers partake in scope reduction to the same extent, and there is some debate as to the correct empirical generalization underlying these phenomena. Negative quantifiers, for one, have been recognized to resist narrow scope readings below raising
predicates: (126a) cannot be paraphrased by (126b) (Partee 1971; Lasnik 1972, 1999; Penka 2002):

(126) a. Nobody is (absolutely) certain to pass the test.
b. It is (absolutely) certain that nobody will pass the test.
So far, reconstruction was seen to affect either elements contained inside the restrictor of a moved category or the scope of the category itself. But there is also a third meaning-related property systematically correlating with the position of a node in the tree, which manifests itself in referential opacity and de dicto vs. de re ambiguities. Referentially opaque or de dicto interpretations arise whenever the extension of an expression is not functionally determined by the speaker’s knowledge, but varies according to the way alternative worlds or situations are structured which are accessible to the subject of a higher intensional predicate (such as modals or propositional attitude predicates like believe, know or hope). To exemplify, assume that John, otherwise an entirely sane and consistent person, entertains the firm belief that all planetary systems include an uneven number of planets. Suppose moreover that the actual number of planets in the solar system is eight (which is the case as of 2014). Intuitively, sentence (127) can be used to report such a scenario because of John’s non-standard, notional belief (Quine 1956). Understood de dicto, the meaning of the predicate number of planets in the solar system is not calculated on the basis of the speaker’s knowledge about the world, but by taking into account only those situations that are compatible with John’s beliefs.

(127) John believes that [DP the number of planets in the solar system] is uneven.
a. de dicto: True, because of John’s non-standard beliefs about planetary systems.
b. de re: False, because John does not believe that 8 is uneven.

Sentence (127) also has a second meaning ([127b]), which is falsified by the scenario above.
On this alternative, objectual, referentially transparent or de re reading of the common noun, the subject the number of planets is interpreted with respect to the speaker’s evaluation situation and denotes the even number 8 (assuming that the speaker is aware of current trends in astronomy). On a popular view, this ambiguity has its roots in three assumptions: (i) that situation variables are part of the object-language (Ty2; Gallin 1975; Cresswell 1990); (ii) that predicates contain silent situation variables; and (iii) that these situation variables need to be captured by c-commanding λ-binders (Percus 2000; Keshet 2010). From this, it follows that the two interpretations of (127) are mapped onto two distinct LF-representations, schematically rendered in (128), which minimally differ in whether the situation variable is bound by the λ-operator below believe, resulting in the de dicto reading, or by a λ-binder outside the scope of the intensional verb, yielding the transparent de re construal:

(128) a. de dicto: λ1 … [believe [λ2 … [DP the number of planets in the solar system(s2)]]]
b. de re: λ1 … [believe [λ2 … [DP the number of planets in the solar system(s1)]]]
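Spelled out in a simplified situation semantics (the Bel-notation and the ι-operator rendering of the definite are illustrative, not the chapter's), the two LFs in (128) yield the following truth conditions for (127):

```latex
% de dicto: the description is evaluated in John's belief alternatives s_2;
% true in the scenario, since every s_2 has an uneven number of planets
\forall s_2\,[s_2 \in \mathrm{Bel}_{\mathrm{John}}(s_1)] :
  \mathrm{uneven}\big(\iota n\,[n = |\mathrm{planet}(s_2)|]\big)

% de re: the description is evaluated in the utterance situation s_1,
% where it denotes 8; false, since John does not believe that 8 is uneven
\forall s_2\,[s_2 \in \mathrm{Bel}_{\mathrm{John}}(s_1)] :
  \mathrm{uneven}\big(\iota n\,[n = |\mathrm{planet}(s_1)|]\big)
```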
Crucially, configurations that involve referential opacity effects can also be used as a probe for the LF-position of dislocated predicates that contain bound situation variables. With this in the background, observe that sentence (129a) can be understood as true in the scenario above, indicating that it is possible to interpret the subject de dicto, that is, within the scope of the situation variable binder below seems. Given that variable binding is contingent upon c-command at LF, the availability of a de dicto reading therefore constitutes a diagnostic for reconstruction of the raised subject to a position below seem ([129b]):

(129) a. [The number of planets in the solar system] seems to John to be even. (de dicto/de re)
b. λ1 … [seems [λ2 [α … [DP the number of planets in the solar system(s2)]]]]

Finally, the examples in (130) combine the effects of scope diminishment and reconstruction for referential opacity. (130a), for one, is verified by models in which the individuals the speaker mistakenly identifies as unicorns vary from accessible situation to situation. Thus, the existential of a unicorn resides within the scope of the universal of seem quantifying over accessible situations. (130c) furthermore provides a clear illustration of A-reconstruction (Fox 1999). On its sensible, pragmatically plausible reading, (130c) expresses the proposition that it is likely that with every battle, a different soldier loses his life:

(130) a. A unicorn seems to be in the garden.
b. Someone from NY is likely to win the lottery.
c. At least one soldier seems (to Napoleon) to be likely to die in every battle.
To recapitulate, DP-reconstruction potentially restores the configuration for three interpretive properties: (i) the scope of the quantificational determiner heading the fronted DP; (ii) the evaluation of the principles of Binding Theory and other c-command sensitive phenomena which involve individual variables; and (iii) referential opacity, expressed in terms of situation variable binding. For ease of further reference, these properties will also be referred to as scope, binding, and opacity, respectively. On the standard analysis of reconstruction in terms of the Copy Theory of movement, one is, at least at first sight, led to expect that all three properties are evaluated together, in the same position of the tree. Thus, reconstruction for any of the three properties should entail reconstruction for the remaining two. Interestingly, this does not always seem to be the case. The closing part of this section will consider the nature of the correlation between scope, binding and opacity, reporting findings that pose complications for the view that all reconstruction is reducible to the Copy Theory of movement.
6.2. Scope trapping Scope Trapping denotes a family of phenomena which have in common that reconstruction for the evaluation of one principle of the grammar forces other properties to be
inspected from that reconstructed position, too (see Fox 1999; Hicks 2008; Hornstein 1995; Lasnik 1998; Romero 1998; Wurmbrand and Bobaljik 1999, among others). A first manifestation of Scope Trapping (Lebeaux 1995) is attested with quantifiers that have A-moved across pronouns they bind. In principle, raising subjects may reconstruct into their local clause, resulting in a narrow, inverse scope, distributive reading for (131a). The reconstructed, narrow scope reading is lost, though, if the subject binds a reciprocal in the higher clause, as in (131b). This follows from the assumption that Principle A demands that anaphors be c-commanded by their antecedents at LF.

(131) a. Two women1 seem to me t1 to have talked with every senator. (∀ > ∃2)
b. Two women1 seem to each other1 t1 to have talked with every senator. (*∀ > ∃2)
(adapted from Lebeaux 1995)

The same point is reinforced by the infelicity of (132b), in which anaphor licensing conflicts with the (pragmatically induced) narrow scope requirement for the raised subject (Fox 1999):

(132) a. One soldier1 seems (to Napoleon) t1 to be likely to die in every battle.
b. #One soldier1 seems to himself1 t1 to be likely to die in every battle.

These findings demonstrate that the computation of Principle A and scope cannot be distributed across different copies, as expressed by the Coherence Hypothesis in (133) (Fox 1999; Hornstein 1995; Lebeaux 2009).

(133) Coherence Hypothesis
If α moves and reconstructs into β for the evaluation of one of the three properties scope, binding and opacity, then all three of these properties are evaluated in β.

Independent, additional support for the Coherence Hypothesis is provided by a strong correlation between Principle C reconstruction and the emergence of opaque readings for the host containing the R-expression (Romero 1997: 363). The name in (134) can be construed coreferentially with the pronoun to its right only if the subject is interpreted transparently de re (i.e.
the speaker considers the subject denotation to consist of nudes of Marilyn):

(134) A nude of Marilyn1 seems to her1 to be a good emblem of the exhibit. (∃ > seem / *seem > ∃)

This demonstrates that generating opaque de dicto readings by lowering a DP into the scope of an intensional operator entails binding reconstruction of material inside that DP. To summarize, the Coherence Hypothesis finds support in two generalizations. First, the binding domain of quantifiers shrinks in accordance with their scope domain ([131/132]). And second, referential opacity correlates with binding reconstruction. Trapping Effects of this sort receive a straightforward explanation on the assumption that all reconstruction has its origins in the Copy Theory of movement. But there are also selected contexts in which reconstruction does not affect all three principles alike, to be followed up in section 6.3.
6.3. Syntactic vs. semantic reconstruction

The analysis of reconstruction in terms of the Copy Theory is challenged by two paradigms indicating that the Coherence Hypothesis does not survive exposure to the full range of data. More precisely, the behavior of (short) scrambling and amount questions reveals that coherence strictly holds for binding and opacity only, allowing scope and binding to be computed in different positions. These findings are of particular theoretical significance, because they necessitate the introduction of an additional mechanism for scope diminishment apart from syntactic reconstruction by movement copies, as well as suitable conditions on this mechanism. I will discuss scrambling first, proceeding from there to more complex cases of amount questions, which involve intensional contexts. In German, reciprocals embedded inside direct objects can be bound either by the indirect object or the subject ([135a]; Grewendorf 1984):

(135) a. weil wir3 den Gästen2,IO [einige Freunde von einander2/3]DO vorstellen wollten [German]
since we the.DAT guests.DAT some.ACC friends.ACC of each.other introduce wanted
‘since we wanted to introduce some friends of each other to the guests’
b. *weil ich [einige Freunde von einander2]1,DO den Gästen2,IO t1 vorstellen wollte
since I some.ACC friends.ACC of each.other the.DAT guests.DAT introduce wanted
‘since I wanted to introduce some friends of each other to the guests’

The former option is lost once the direct object has been shifted to the left of the dative, as in (135b), indicating that short scrambling across datives does not reconstruct for the evaluation of anaphoric binding (Frey 1993; Haider 1993). (136) furthermore documents that short scrambling feeds scope ambiguity.
At the same time, (136) patterns along with the scopally uninformative example (135) in that scrambling is not undone for variable binding (Lechner 1996, 1998):

(136) weil wir3 [einige Freunde von einander*2/3]1,DO allen Gästen2,IO t1 vorstellen wollten (∃ > ∀ / ∀ > ∃) [German]
since we some.ACC friends.ACC of each.other all.DAT guests.DAT introduce wanted
‘since we wanted to introduce some friends of each other to all the guests’

From this, it can be concluded that the lowest copy accessible for syntactic reconstruction must be located above the indirect object, in a position too high to sponsor the narrow scope reading. Thus, there must be an alternative strategy for licensing scope diminishment in the absence of syntactic reconstruction. Such a device is provided by Semantic Reconstruction (SemR; von Stechow 1991: 133), a class of operations that make it possible to delay scope diminishment to the semantic component (Cresti 1995; Rullmann 1995; Sternefeld 2001). SemR, schematically depicted in (137), applies whenever the logical type of a movement trace
matches the type of its antecedent. In (137a), a category α of logical type ε − typically the type of generalized quantifiers − has crossed over another scope-sensitive operator β, stranding an ε-type trace. In the course of the semantic computation, α is then λ-converted into its trace position t1, resulting in narrow scope for α with respect to β ([137b]):

(137) a. Movement: [α [λ1,ε … [β … [t1,ε … (Surface order: α > β)
b. Semantic reconstruction: [β … [α,ε … (Scope order: β > α)
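The mechanism in (137) can be simulated in a toy extensional model: the moved quantifier is applied to a λ-abstract over a trace of its own (generalized quantifier) type, and β-reduction puts it back into the trace position, below β. The model and the denotations assigned below are illustrative assumptions only:

```python
# Toy model: three women, two senators; every senator danced with two
# women, but only w2 danced with every senator.
women = {"w1", "w2", "w3"}
senators = {"s1", "s2"}
danced = {("w1", "s1"), ("w2", "s1"), ("w2", "s2"), ("w3", "s2")}

# Generalized quantifiers: functions from predicates to truth values
two_women = lambda P: sum(P(x) for x in women) >= 2        # GQ type
every_senator = lambda P: all(P(y) for y in senators)

# Surface scope (alpha > beta): the moved DP binds an e-type trace
surface = two_women(lambda x: every_senator(lambda y: (x, y) in danced))

# Semantic reconstruction: the trace T has the same type as the DP;
# lambda-converting two_women into the trace position yields beta > alpha
abstract = lambda T: every_senator(lambda y: T(lambda x: (x, y) in danced))
reconstructed = abstract(two_women)

print(surface, reconstructed)  # False True: only the narrow reading holds
```

In this model only the reconstructed (β > α) reading is true, even though the quantifier is applied in its high, derived position throughout; the scope diminishment is achieved entirely in the semantics, exactly as (137b) intends.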
What is of particular significance for present purposes is the fact that SemR does not restore binding relations early enough for them to be visible at LF. Applied to (136), this has the consequence that the direct object can be interpreted with narrow scope, while binding relations are left unchanged at LF (Lechner 1996, 1998). Sharvit (1998) draws attention to a second dissociation between scope and binding which further endorses the view that reconstruction can be postponed by SemR. In the amount question (138), the degree predicate n-many is most naturally interpreted with scope inside hope.

(138) How many students who hate Anton1 does he1 hope will buy him1 a beer?
a. Narrow n-many, transparent restrictor: (*de dicto/√de re)
“For what number n: in all bouletic situation alternatives s’ of Anton: there are n-many students who hate Anton in the actual situation and who will buy him a beer in s’.”
b. *Narrow n-many, opaque restrictor:
“For what number n: in all bouletic situation alternatives s’ of Anton: there are n-many students who hate Anton in s’ and who will buy him a beer in s’.”

Moreover, the restrictor of the wh-phrase (students who hate Anton1) can be construed de dicto or de re. Interestingly, the de re interpretation appears to put the restrictor outside the binding domain of the higher pronoun he ([138a]), while the de dicto reading systematically correlates with the emergence of a disjoint reference effect ([138b]). This particular combination of properties in (138a) suggests that scope diminishment has been produced by SemR, and not by reconstruction in the syntactic component (see also Romero 1997). To recapitulate, the principles determining scope diminishment appear to operate at least partially independently from those which regulate reconstruction of restrictors.
Certain environments which do not license binding reconstruction, but still permit narrow scope readings (short scrambling chains [136]) attest to the fact that not all scope reconstruction is the product of interpreting lower movement copies. Furthermore, Sharvit’s paradigm (138) demonstrates that scope reconstruction entails binding reconstruction only if the fronted node also reconstructs for the evaluation of opacity, i.e. if the restrictor is interpreted de dicto. Again, this indicates that scope reconstruction is independent of binding reconstruction, endorsing the view that the Coherence Hypothesis ([133]) is too strong. Evidently, these results conflict with the verdict reached in the discussion of Scope Trapping in section 6.2. Although the present survey is not the appropriate place for a synthesis, it is possible to isolate some systematic correlations one might further pursue in the search for a common analysis.
The three binary choices for whether a DP reconstructs for scope, binding and opacity generate the six-cell matrix shown in table 35.1. Out of these six possible dissociations among the reconstructible properties, two (IV and V) are immediately excluded by the assumption that syntactic reconstruction (SynR) by copies entails scope reconstruction.

(139) Tab. 35.1: DP-reconstruction

      Reconstruction of a moved DP for:
      Scope  Binding  Opacity   Is the combination attested?
I.    +      –        –         a. Yes (if α contains bound category; [136] and [138])
                                b. No (if α is the binder; Scope Trapping [131/132])
II.   +      +        –         No (syntactic condition on s-variable binding)
III.  +      –        +         No (by condition on logical type of trace)
IV.   –      +        +         No (since SynR entails SemR)
V.    –      +        –         No (since SynR entails SemR)
VI.   –      –        +         No (by condition on logical type of trace)
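The exclusion logic behind Tab. 35.1 can be distilled into three implications and checked mechanically. The propositional encoding below is my own paraphrase of the conditions discussed in the surrounding text, not an official formalization:

```python
from itertools import product

# Implications distilled from the discussion (assumptions, not the
# chapter's formal system):
#  - binding reconstruction requires syntactic reconstruction (SynR),
#    and SynR entails scope reconstruction  -> binding implies scope
#  - opaque (de dicto) readings require SynR, which also restores
#    binding                                -> opacity implies binding
#  - SynR invariably yields opaque readings -> binding implies opacity
def licit(scope, binding, opacity):
    if binding and not scope:
        return False  # excludes cells IV and V
    if opacity and not binding:
        return False  # excludes cells III and VI
    if binding and not opacity:
        return False  # excludes cell II
    return True

# The six cells of Tab. 35.1: all combinations except full reconstruction
# (+,+,+) and no reconstruction at all (-,-,-)
dissociations = [c for c in product([True, False], repeat=3)
                 if any(c) and not all(c)]
surviving = [c for c in dissociations if licit(*c)]
print(surviving)  # [(True, False, False)]: only cell I survives
```

Only scope reconstruction without binding and opacity (cell I) survives the filter, matching the single attested dissociation.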
Moreover, by restricting higher-type traces to certain logical types it becomes possible to define SemR in such a way that it does not feed situation variable binding (von Fintel and Heim 2011; Lechner 2013). As a consequence, a fronted DP can be assigned an opaque, de dicto reading w.r.t. an intensional operator it has passed only in case the DP’s descriptive content is restored within the scope of the operator at LF. On this conception, opaque readings require reconstruction in the syntactic component, which in turn renders ineligible cells III and VI of table 35.1. At this point, two illegitimate cases of the matrix remain to be accounted for. First, Scope Trapping of the type seen in (131) and (132) (cell I/b in table 35.1) signals that a DP which semantically reconstructs cannot serve as an antecedent for an anaphor it has crossed over. This restriction arguably falls out from the fact that such constellations, schematized in (140), impose two conflicting binding requirements on the moved DP. While SemR demands that the DP bind objects in the generalized quantifier type domain, anaphors are arguably individual variables (or contain e-type variables) and therefore require an e-type binder.

(140) *[DP [λ1 … [anaphor … [t1, …]]]]

Thus, scope reconstruction cannot be delayed to semantics, which in turn entails that scope and binding must be evaluated in a single position, as expressed by the Scope Trapping generalization. Finally, cell II of table 35.1 can be excluded by syntactic considerations. More specifically, the reconstructed de re reading of (134), repeated below, is blocked by the descriptive principle (70), which mandates that movement out of movement be local. This bars long s-variable binding as in (141b). It follows that syntactic reconstruction invariably produces opaque de dicto readings.
1244
V. Interfaces
(141) a. A nude of Marilyn1 seems to her1 to be a good emblem of the exhibit. (d _ seem / *seem _ d)
b. de re: *λ1 … [seems to her1 [λ2 … [[DP a nude of Marilyn1]…

Thus, along these lines it becomes possible to envision a system that eliminates all dissociations among scope, opacity and binding except the single attested one (cell I/a in table 35.1). Naturally, the remarks above are no substitute for an explicit and more complete theory of reconstruction. Still, the systematicity of the phenomena involved, to the extent that they are understood, fosters the hope that such a theory is not beyond reach.
7. Two recent developments

This final section sketches two trends in current research on the syntax-semantics interface. First, current LF-based theories show a tendency to postulate an increasing amount of hidden structure in syntax. In sections 5 and 6, this trend was seen to underlie the standard account of reconstruction phenomena in terms of movement copies. But enriched object-language representations have also been identified in other domains, and are, among others, manifest in analyses that treat third person pronouns as hidden definite descriptions (Elbourne 2005; Sauerland 2007; for resumptive pronouns see Guilliot 2008), or in the resurgent interest in raising analyses of relative clauses (Hulsey and Sauerland 2006) and related constructions such as comparatives (Lechner 2004).

Interestingly, the general willingness of research in the LF-tradition to adopt more structure than meets the eye co-exists with a second tendency, which appears to point in the diametrically opposite direction. Various authors have observed that the grammaticality status of certain constructions is not determined by all terminals that can be assigned a model-theoretic interpretation, but seems to be sensitive only to properties of the logical constants, or the logical skeleton (Gajewski 2002), of the expressions involved. Prominent among these phenomena are the definiteness effect in the existential construction ([142]; Barwise and Cooper 1981) and quantifier restrictions in exceptives ([143]; von Fintel 1994).

(142) a. There is no man/a man (in the garden).
b. *There is/are every man/most men (in the garden).

(143) every man/no man/*a man/*many men/*most men except Bill

For both cases, the logical skeleton can be obtained by replacing each non-logical constant (man/men and is/are in [142], and man/men and Bill in [143]) by a variable of the corresponding type.
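To make the replacement step concrete, the following sketch shows why the skeleton of the strong-determiner case in (142) comes out as trivial. It is my own simplified rendering in the spirit of Barwise and Cooper (1981) and Gajewski (2002); the set-theoretic notation is not taken from the works cited.

```latex
% Barwise & Cooper-style truth conditions for the existential construction,
% where E is the domain of individuals and P is the variable that replaces
% the non-logical constant man in the logical skeleton:
%
%   [[ there be DP ]] = 1  iff  E \in [[ DP ]]
%
% Weak determiner (142a): truth depends on the value of P,
% so the skeleton is contingent:
E \cap P \neq \emptyset
%
% Strong determiner (142b): since P \subseteq E holds no matter which set
% P denotes, the skeleton is true in every model:
P \subseteq E \quad \text{(trivially true for any } P\text{)}
```

Since the truth value of the strong-determiner skeleton does not vary across models, no choice of lexical material can rescue (142b), matching the judgment reported above.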
It then becomes possible to define a calculus which derives the well-formedness of an expression from the logical syntax of the components involved (Gajewski 2002). On this conception, triviality in truth conditions translates into ungrammaticality.

Reflexes of the logical skeleton can also be detected in other areas. For instance, on the theory of Scope Economy (Fox 2000; see section 4.4), scope shifting operations are licensed only if they result in truth-conditionally distinct and logically independent readings. Since the contribution of non-logical constants is generally irrelevant for the calculation of entailments and other logical relations, it is possible to decide whether an operation is legitimized by inspecting the logical skeleton, which contains only the logical constants (Boolean operators, quantifiers, …). Fox moreover proposes that the scope relations among the operators are computed in a language-specific deductive component, the Deductive System (DS), which calculates entailments among competing interpretations. In this manner, DS determines whether a syntactic scope shifting operation is licensed by Scope Economy or not, rendering legitimate certain instances of non-local, scope shifting QR.

This conception has an important further consequence for the architecture of the grammar, as it entails that some aspects of natural language syntax are conditioned by the extra-syntactic DS module. Thus, the syntactic autonomy hypothesis, on which syntactic operations can only be motivated by properties of the syntactic component, must be weakened, granting DS privileged access to the syntactic component.

That purely logical properties have the ability to trigger syntactic operations is also reflected in the widely held view that (certain) type mismatches are resolved by silent movement (Heim and Kratzer 1998). If the type of each atomic expression is fully specified in the logical skeleton, the recursive type assignment rules can also be computed in DS, from where they are subsequently passed on to the syntactic derivation, triggering QR if the need arises. Thus, the hypothesis that type mismatches can be repaired by syntactic movement operations provides further corroborating evidence for a model of the grammar in which the syntactic component is not entirely informationally encapsulated, but accepts instructions from a language-specific logical subsystem.
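The type-driven trigger for QR can be illustrated with the familiar configuration from Heim and Kratzer (1998); the example sentence and type annotations below are my own, not the chapter's.

```latex
% Type mismatch with a quantificational object in situ:
%   [[ read ]]       : \langle e, \langle e,t \rangle \rangle
%   [[ every book ]] : \langle \langle e,t \rangle, t \rangle
% Neither expression can take the other as its argument.
%
% QR adjoins the object and leaves a type-e trace, with lambda
% abstraction over the trace index:
%   [ every book ] [ \lambda 1 [ Mary read t_1 ] ]
%
% The derived abstract is now a suitable \langle e,t \rangle argument
% for the generalized quantifier:
[[\,\lambda 1\ [\text{Mary read } t_1]\,]] \;=\; \lambda x_e.\ \textit{read}(x)(\textit{mary})
[[\,\text{every book}\,]]\big(\lambda x_e.\ \textit{read}(x)(\textit{mary})\big) \;:\; t
```

On the view sketched in the text, it is precisely this purely type-theoretic information that DS could compute and hand back to the syntax as an instruction to apply QR.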
In sum, recent studies on the syntax-semantics interface have identified two architectural features that pull the grammar in two opposite directions. On the one hand, some natural language properties are determined by the logical constants of the expressions involved, suggesting that syntax corresponds with a designated deductive system DS via impoverished representations that ignore non-logical constants (the logical skeleton). By contrast, various tests that diagnose the presence of descriptive, lexical content indicate that object-language representations are enriched by copies and by silent definite descriptions in the position of pronouns.
Acknowledgements

I would like to thank Elena Anagnostopoulou, Friedrich Neubarth and Viola Schmitt for detailed comments on an earlier version which led to numerous improvements. Also, I am grateful to Kyle Johnson for helpful suggestions. All errors and misrepresentations are my own. Parts of this research were supported by an Alexander von Humboldt fellowship.
8. References (selected)

Aoun, Joseph, and Yen-hui Audrey Li
1993 Syntax of Scope. Cambridge, M.A.: MIT Press.
Bach, Emmon
1976 An extension of classical transformational grammar. In: Saenz (ed.), Problems of Linguistic Metatheory, Proceedings of the 1976 Conference at Michigan State University, 183−224. East Lansing, MI: Michigan State University.
Baker, C. Lee
1970 Notes on the description of English questions: The role of an abstract question morpheme. Foundations of Language 6: 197−219.
Barker, Chris, and Pauline Jacobson (eds.)
2007 Direct Compositionality. Oxford: Oxford University Press.
Barss, Andrew
1986 Chains and anaphoric dependence: on reconstruction and its implications. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Barss, Andrew, and Howard Lasnik
1986 A note on anaphora and double objects. Linguistic Inquiry 17: 347−354.
Beck, Sigrid
1996 Quantified structures as barriers for LF-movement. Natural Language Semantics 4.1: 1−56.
Beck, Sigrid
2000 The semantics of different: Comparison operator and relational adjective. Linguistics and Philosophy 23.1: 101−139.
Beck, Sigrid
2011 Comparative constructions. In: Claudia Maienborn, Klaus von Heusinger and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 1341−1390. Berlin, New York: Mouton de Gruyter.
Beghelli, Filippo, and Tim Stowell
1997 The syntax of distributivity and negation. In: Anna Szabolcsi (ed.), Ways of Taking Scope, 71−108. Dordrecht: Kluwer.
Bhatt, Rajesh, and Roumyana Pancheva
2004 Late Merger of degree clauses. Linguistic Inquiry 35.1: 1−45.
Bobaljik, Jonathan
1995 Morphosyntax: the syntax of verbal inflection. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Bresnan, Joan
1982 The Mental Representation of Grammatical Functions. Cambridge, M.A.: MIT Press.
Bresnan, Joan
2001 Lexical Functional Syntax. Malden: Blackwell.
Brody, Michael
1995 Lexico-Logical Form: A Radically Minimalist Theory. Cambridge, M.A.: MIT Press.
Bruening, Benjamin
2001 QR obeys Superiority: Frozen Scope and ACD. Linguistic Inquiry 32.2: 233−273.
Büring, Daniel
2005 Binding Theory. Cambridge: Cambridge University Press.
Cardone, Felice, and Roger Hindley
to appear History of Lambda-calculus and Combinatory Logic. In: Dov Gabbay and John Woods (eds.), Handbook of the History of Logic. Swansea University Mathematics Department Research Report. [Available at http://www.users.waitrose.com/~hindley/SomePapers_PDFs/2006Car,HistlamReprt.pdf]
35. The Syntax-Semantics Interface
1247
Carlson, Greg
1987 Same and different: consequences for syntax and semantics. Linguistics and Philosophy 10.4: 531−565.
Carnap, Rudolf
1934 Logische Syntax der Sprache. Vienna: Springer.
Chierchia, Gennaro
1984 Topics in the syntax and semantics of infinitives and gerunds. Ph.D. dissertation, University of Massachusetts, Amherst. Amherst: GLSA.
Chierchia, Gennaro
1989 Anaphora and attitudes de se. In: Renate Bartsch, Jan van Benthem and Peter van Emde Boas (eds.), Semantics and Contextual Expression, 1−32. Dordrecht: Foris.
Chomsky, Noam
1973 Conditions on transformations. In: Stephen Anderson and Paul Kiparsky (eds.), A Festschrift for Morris Halle, 232−286. New York: Holt, Rinehart and Winston.
Chomsky, Noam
1976 Conditions on rules of grammar. Linguistic Analysis 2: 303−351.
Chomsky, Noam
1993 A minimalist program for linguistic theory. In: Ken Hale and Jay Keyser (eds.), The View From Building 20, 1−52. Cambridge, M.A.: MIT Press.
Chomsky, Noam
1995 The Minimalist Program. Cambridge, M.A.: MIT Press.
Church, Alonzo
1936 An unsolvable problem of elementary number theory. American Journal of Mathematics 58: 345−363.
Collins, Chris
2005 A smuggling approach to passive in English. Syntax 8.2: 81−120.
Cooper, Robin
1979 Variable binding and relative clauses. In: Franz Guenthner and S.J. Schmidt (eds.), Formal Semantics and Pragmatics for Natural Languages, 131−170. Dordrecht: Reidel.
Cooper, Robin
1983 Quantification and Syntactic Theory. Dordrecht: Reidel.
Cooper, Robin, and Terence Parsons
1976 Montague Grammar, generative semantics and interpretive semantics. In: Barbara Partee (ed.), Montague Grammar, 311−362. New York: Academic Press.
Cresti, Diana
1995 Extraction and reconstruction. Natural Language Semantics 3.1: 79−122.
Diesing, Molly
1992 Indefinites. Cambridge, M.A.: MIT Press.
Dowty, David
1997 Non-constituent coordination, wrapping, and multimodal categorial grammars. In: M.L. dalla Chiara, K. Doets, D. Mundici and J. van Benthem (eds.), Proceedings of the 1995 International Congress of Logic, Methodology, and Philosophy of Science, Florence, 347−368. Dordrecht: Kluwer.
Egg, Markus
2010 Underspecification. Language and Linguistics Compass 4: 166−181.
Elbourne, Paul
2005 Situations and Individuals. Cambridge, M.A.: MIT Press.
Enç, Murvet
1988 The syntax-semantics interface. In: Frederick J. Newmeyer (ed.), Linguistics: The Cambridge Survey, 239−255. Cambridge: Cambridge University Press.
Farkas, Donka
2000 Scope matters. In: Klaus von Heusinger and Urs Egli (eds.), Reference and Anaphoric Relations, 79−108. Dordrecht: Kluwer Academic Publishers.
Feferman, Anita Burdman, and Solomon Feferman
2004 Alfred Tarski: Life and Logic. Cambridge: Cambridge University Press.
Fiengo, Robert, and Robert May
1994 Indices and Identity. Cambridge, M.A.: MIT Press.
von Fintel, Kai
1994 Restrictions on quantifier domains. Ph.D. dissertation, Department of Linguistics, University of Massachusetts, Amherst. Amherst: GLSA.
von Fintel, Kai, and Irene Heim
2011 Intensional semantics. Lecture notes, MIT. [Available at: http://web.mit.edu/fintel/fintel-heim-intensional.pdf]
von Fintel, Kai, and Sabine Iatridou
2004 Epistemic containment. Linguistic Inquiry 34.2: 173−198.
Fodor, Janet Dean
1970 The linguistic description of opaque contexts. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Fodor, Janet, and Ian Sag
1982 Referential and quantificational indefinites. Linguistics and Philosophy 5.3: 355−398.
Fodor, Jerry
1983 The Modularity of Mind: An Essay on Faculty Psychology. Cambridge, M.A.: MIT Press.
Fox, Danny
1995 Economy and scope. Natural Language Semantics 3.3: 283−341.
Fox, Danny
1999 Reconstruction, variable binding and the interpretation of chains. Linguistic Inquiry 30: 157−196.
Fox, Danny
2000 Economy and Semantic Interpretation. Cambridge, M.A.: MIT Press.
Fox, Danny
2003 On logical form. In: Randall Hendrick (ed.), Minimalist Syntax. Blackwell Publishers.
Fox, Danny, and Martin Hackl
2006 The universal density of measurement. Linguistics and Philosophy 29: 537−586.
Fox, Danny, and Jon Nissenbaum
1999 Extraposition and scope: a case for overt QR. Proceedings of WCCFL 18.
Frege, Gottlob
1892 Über Sinn und Bedeutung. Zeitschrift für Philosophie und Philosophische Kritik 100: 25−50.
Freidin, Robert
1986 Fundamental issues in the theory of binding. In: Barbara Lust (ed.), Studies in the Acquisition of Anaphora, Volume I, 151−188. Dordrecht: Reidel.
Frey, Werner
1993 Syntaktische Bedingungen für die Interpretation. Berlin: Studia Grammatica.
Gajewski, Jon
2002 L-analyticity and natural language. Ms., MIT. [Available at http://gajewski.uconn.edu/papers/analytic.pdf]
Gallin, Daniel
1975 Intensional and Higher-order Modal Logic. With Applications to Montague Semantics. Amsterdam: North-Holland.
Geach, Peter
[1962] 1980 Reference and Generality. An Examination of Some Medieval and Modern Theories. Ithaca and London: Cornell University Press.
Golan, Yael
1993 Node crossing economy, superiority and D-Linking. Ms., Tel Aviv University.
Grewendorf, Günther
1984 Reflexivierungsregeln im Deutschen. Deutsche Sprache 1: 14−30.
Groat, Erich, and John O'Neil
1996 Spell-Out at the LF-interface. In: Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson and Jan-Wouter Zwart (eds.), Minimal Ideas, 113−139. Amsterdam: John Benjamins.
Grosu, Alexander, and Julia Horvath
2006 Reply to Bhatt and Pancheva's "Late merger of degree clauses": the irrelevance of (non)conservativity. Linguistic Inquiry 37: 457−483.
Guilliot, Nicolas
2008 To reconstruct or not to reconstruct: that is the question. Proceedings of the Workshop on What Syntax feeds Semantics. [Available at: http://nicolas.guilliot.chez-alice.fr/essllisynsem-guilliot.pdf]
Haider, Hubert
1993 Deutsche Syntax − Generativ. Tübingen: Gunter Narr Verlag.
Heim, Irene
2000 Degree operators and scope. In: Brendan Jackson and Tanya Matthews (eds.), Proceedings of SALT X, 40−64. Ithaca, N.Y.: Cornell University, CLC Publications.
Heim, Irene, and Angelika Kratzer
1998 Semantics in Generative Grammar. Oxford: Blackwell.
Hendrick, Randall, and Michael Rochemont
1982 Complementation, multiple wh- and echo questions. Ms., University of North Carolina and University of California, Irvine.
Hendriks, Herman
1993 Studied flexibility. Categories and types in syntax and semantics. Ph.D. dissertation, Department of Linguistics, University of Amsterdam.
Heycock, Caroline
1995 Asymmetries in reconstruction. Linguistic Inquiry 26.4: 547−570.
Hicks, Glyn
2008 Why the Binding Theory doesn't apply at LF. Syntax 11.3: 255−280.
Hintikka, Jaakko, and Gabriel Sandu
1997 Game-theoretical semantics. In: Johan van Benthem and Alice ter Meulen (eds.), Handbook of Logic and Language, 361−410. North Holland: Elsevier.
Horn, Lawrence
2000 Pick a theory (not just any theory): indiscriminatives and the free-choice indefinite. In: Lawrence Horn and Yasuhiko Kato (eds.), Negation and Polarity, 147−193. Oxford: Oxford University Press.
Hornstein, Norbert
1995 Logical Form: From GB to Minimalism. Cambridge, M.A.: Basil Blackwell.
Huang, C.-T. James
1982 Logical relations in Chinese and the theory of grammar. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Huang, C.-T. James
1993 Reconstruction and the structure of VP: some theoretical consequences. Linguistic Inquiry 24.1: 103−138.
Huang, C.-T. James
1995 Logical form. In: Gert Webelhuth (ed.), Government and Binding Theory and the Minimalist Program: Principles and Parameters in Syntactic Theory, 127−177. Oxford and Cambridge: Blackwell Publishing.
Hulsey, Sarah, and Uli Sauerland
2006 Sorting out relative clauses: a reply to Bhatt. Natural Language Semantics 14.1: 111−137.
Ioup, G.
1975 The treatment of quantifier scope in a transformational grammar. Ph.D. dissertation, Department of Linguistics, City University of New York.
Jacobson, Pauline
1996 Semantics in categorial grammar. In: Shalom Lappin (ed.), The Handbook of Contemporary Semantic Theory, 89−116. Oxford: Basil Blackwell.
Jacobson, Pauline
2002 The (dis)organization of the grammar: 25 years. Linguistics and Philosophy 25.6.
Johnson, Kyle
1987 Against the notion SUBJECT. Linguistic Inquiry 18: 354−361.
Johnson, Kyle
2000 How far will quantifiers go? In: Roger Martin, David Michaels and Juan Uriagereka (eds.), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, 187−210. Cambridge, M.A.: MIT Press.
Kamp, Hans, and Uwe Reyle
1993 From Discourse to Logic. Dordrecht: Kluwer Academic Publishers.
Kennedy, Chris
1997 Antecedent contained deletion and the syntax of quantification. Linguistic Inquiry 28.4.
Keshet, Ezra
2010 Situation economy. Natural Language Semantics 18.3: 385−434.
Kiparsky, Paul
1973 Abstractness, opacity and global rules. In: Osama Fujimura (ed.), Three Dimensions of Linguistic Theory, 57−86. Tokyo: TEC Corporation.
Kiss, Tibor
2000 Configurational and relational scope determination in German. In: Tibor Kiss and Detmar Meurers (eds.), Constraint-Based Approaches to Germanic Syntax, 141−176. Stanford: CSLI.
Koster, Jan
1986 Domains and Dynasties: The Radical Autonomy of Syntax. Dordrecht: Foris.
Kratzer, Angelika
1998 Scope or pseudo-scope? Are there wide-scope indefinites? In: Susan Rothstein (ed.), Events in Grammars, 163−196. Dordrecht: Kluwer.
Kratzer, Angelika, and Junko Shimoyama
2002 Indeterminate phrases: the view from Japanese. In: Yukio Otsu (ed.), The Proceedings of the Third Tokyo Conference on Psycholinguistics, 1−25. Tokyo: Hituzi Syobo.
Krifka, Manfred
1998 Scope inversion under the rise-fall contour in German. Linguistic Inquiry 29.1: 75−112.
Lakoff, George
1971 On generative semantics. In: D.D. Steinberg and L. Jakobovits (eds.), Semantics, 232−296. Cambridge: Cambridge University Press.
Landman, Fred
2003 Indefinites and the Type of Sets. Malden: Blackwell.
Lappin, Shalom
1993 Concepts of logical form in linguistics and philosophy. In: Asa Kasher (ed.), The Chomskyan Turn, 300−333. Cambridge, M.A.: Blackwell Publishers.
Larson, Richard
1985 Quantifying into NP. Ms., MIT.
Lasnik, Howard
1972 Analyses of negation in English. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Lasnik, Howard
1999 Chain of arguments. In: Samuel Epstein and Norbert Hornstein (eds.), Working Minimalism, 189−215. Cambridge, M.A.: MIT Press.
Lasnik, Howard, and Mamoru Saito
1992 Move α. Cambridge, M.A.: MIT Press.
Lasnik, Howard, and Tim Stowell
1991 Weakest crossover. Linguistic Inquiry 22: 687−720.
Lebeaux, David
1990 Relative clauses, licensing, and the nature of the derivation. In: Juli Carter, Rose-Marie Dechaine, Bill Philip, and Tim Sherer (eds.), Proceedings of North East Linguistic Society 20, 318−332. University of Massachusetts, Amherst: GLSA.
Lebeaux, David
1995 Where does the Binding Theory apply? Maryland Working Papers in Linguistics 3: 63−88.
Lebeaux, David
2009 Where Does the Binding Theory Apply? Cambridge, M.A.: MIT Press.
Lechner, Winfried
1996 On semantic and syntactic reconstruction. Wiener Linguistische Gazette 57−59: 63−100.
Lechner, Winfried
1998 Two kinds of reconstruction. Studia Linguistica 52.3: 276−310.
Lechner, Winfried
2004 Ellipsis in Comparatives. Berlin, New York: Mouton de Gruyter.
Lechner, Winfried
2007 Interpretive effects of head movement. Ms., University of Stuttgart. [Available at http://ling.auf.net/lingBuzz/000178]
Lechner, Winfried
2012 Structure building from below: more on Survive and covert movement. In: Valmala Vidal and Myriam Uribe-Etxebarria (eds.), Structure Building, 297−329. Cambridge: Cambridge University Press.
Lechner, Winfried
2013 Diagnosing covert movement with the Duke of York and reconstruction. In: Lisa Cheng and Norbert Corver (eds.), Diagnosing Syntax, 158−190. Oxford: Oxford University Press.
Lewis, David
1970 General semantics. Synthese 22: 18−67.
Linebarger, Marcia
1980 The grammar of negative polarity. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
May, Robert
1977 The grammar of quantification. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
May, Robert
1985 Logical Form: Its Structure and Derivation. Cambridge, M.A.: MIT Press.
May, Robert
1993 Syntax, semantics, and logical form. In: Asa Kasher (ed.), The Chomskyan Turn, 334−359. Cambridge, M.A.: Blackwell Publishers.
May, Robert
1999 Logical form in linguistics. In: Robert Wilson and Frank Keil (eds.), MIT Encyclopedia of Cognitive Sciences, 486−488. Cambridge, M.A.: MIT Press.
Menzel, Christopher
1998 Logical form. In: Edward Craig (ed.), Routledge Encyclopedia of Philosophy. London and New York: Routledge.
Montague, Richard
1970 Universal Grammar. Theoria 36: 373−398. [Reprinted in Montague 1974: 222−246.]
Montague, Richard
1973 The proper treatment of quantification in ordinary English. In: Jaakko Hintikka, Julius Moravcsik and Peter Suppes (eds.), Approaches to Natural Language, 221−242. Dordrecht: Reidel. [Reprinted in Montague 1974: 247−270; reprinted in Portner and Partee 2002: 17−34.]
Mostowski, Andrzej
1957 On a generalization of quantifiers. Fundamenta Mathematicae 44: 12−36.
Partee, Barbara H.
1970 Opacity, coreference and pronouns. Synthese 21: 359−385. [Reprinted in Partee 2004: 26−49.]
Partee, Barbara H.
1971 On the requirement that transformations preserve meaning. In: Charles J. Fillmore and Terence Langendoen (eds.), Studies in Linguistic Semantics, 1−21. New York: Holt, Rinehart and Winston.
Partee, Barbara H. (ed.)
1976 Montague Grammar. New York: Academic Press.
Partee, Barbara H.
1996 The development of formal semantics in linguistic theory. In: Shalom Lappin (ed.), The Handbook of Contemporary Semantic Theory, 11−38. Oxford: Blackwell.
Partee, Barbara H., and Herman Hendriks
1997 Montague Grammar. In: Johan van Benthem and Alice ter Meulen (eds.), Handbook of Logic and Language, 5−92. Amsterdam, Cambridge, M.A.: Elsevier and MIT Press.
Penka, Doris
2002 Kein muss kein Problem sein. MA thesis, University of Tübingen.
Percus, Orin
2000 Constraints on some other variables in syntax. Natural Language Semantics 8.3: 173−231.
Pesetsky, David
2000 Phrasal Movement and its Kin. Cambridge, M.A.: MIT Press.
Peters, Stanley, and Dag Westerståhl
2006 Quantifiers in Language and Logic. Oxford: Oxford University Press.
Pietroski, Paul
2009 Logical form. In: Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Fall 2009 Edition. URL: http://plato.stanford.edu/archives/fall2009/entries/logical-form/
Pollard, Carl, and Ivan Sag
1994 Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Preyer, Gerhard, and Georg Peter
2002 Logical Form and Language. Oxford: Oxford University Press.
Pullum, Geoffrey
1976 The Duke-of-York gambit. Journal of Linguistics 12: 83−102.
Quine, Willard O.
1956 Quantifiers and propositional attitudes. Journal of Philosophy 53: 101−111.
Reinhart, Tanya
1976 The syntactic domain of anaphora. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Reinhart, Tanya
1997 Quantifier scope: how labor is divided between QR and choice functions. Linguistics and Philosophy 20.4: 335−397.
Reinhart, Tanya
2006 Interface Strategies. Cambridge, M.A.: MIT Press.
Richards, Norvin
2001 Movement in Language. Oxford: Oxford University Press.
Riemsdijk, Henk van, and Edwin Williams
1981 NP-structure. The Linguistic Review 1: 171−217.
Rizzi, Luigi
1996 Residual verb-second and the wh-Criterion. In: Adriana Belletti and Luigi Rizzi (eds.), Parameters and Functional Heads, 63−90. Oxford: Oxford University Press.
Romero, Maribel
1997 The correlation between scope reconstruction and connectivity effects. In: E. Curtis, J. Lyle and G. Webster (eds.), Proceedings of the XVI West Coast Conference in Formal Linguistics, 351−365.
Romero, Maribel
1998 Problems for a semantic account of scope reconstruction. In: Graham Katz, Shin-Sook Kim and Heike Winhart (eds.), Reconstruction. Proceedings of the 1997 Tübingen Workshop, 127−155. Stuttgart/Tübingen: University of Stuttgart/University of Tübingen.
Rooth, Mats
1992 Ellipsis redundancy and reduction redundancy. In: Steve Berman and Arild Hestvik (eds.), Proceedings of the Stuttgart Ellipsis Workshop. Stuttgart.
Rudin, Catherine
1988 On multiple questions and multiple wh-fronting. Natural Language and Linguistic Theory 6: 445−501.
Rullmann, Hotze
1995 Maximality in the semantics of wh-constructions. Ph.D. dissertation, Department of Linguistics, University of Massachusetts, Amherst. Amherst: GLSA.
Russell, Bertrand
1905 On denoting. Mind 14: 479−493.
Ruys, Eddie
1992 The scope of indefinites. Ph.D. dissertation, Department of Linguistics, Utrecht University.
Ruys, Eddie, and Yoad Winter
2011 Scope ambiguities in formal syntax and semantics. In: Dov Gabbay and Franz Guenthner (eds.), Handbook of Philosophical Logic, Volume 16, 159−225. Amsterdam: John Benjamins.
Sag, Ivan
1976 Deletion and logical form. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Sauerland, Uli
2004 The interpretation of traces. Natural Language Semantics 12.1: 63−128.
Sauerland, Uli
2005 DP is not a scope island. Linguistic Inquiry 36.2: 303−314.
Sauerland, Uli
2007 Flat binding: binding without sequences. In: Uli Sauerland and Hans Martin Gärtner (eds.), Interfaces + Recursion = Grammar? Chomsky's Minimalism and the View from Syntax-Semantics. Berlin: de Gruyter.
Sauerland, Uli, and Fabian Heck
2003 LF-intervention effects in pied-piping. In: Makoto Kadowaki and Shigeto Kawahara (eds.), Proceedings of North East Linguistic Society NELS 33.
Sauerland, Uli, and Arnim von Stechow
2001 Syntax-semantics interface. In: N. Smelser and P. Baltes (eds.), International Encyclopedia of the Social & Behavioural Sciences, 15412−15418. Oxford: Pergamon.
Shan, Chung-Chieh, and Chris Barker
2006 Explaining crossover and superiority as left-to-right evaluation. Linguistics and Philosophy 29.1.
Sharvit, Yael
1998 How-many questions and attitude verbs. Ms., University of Pennsylvania.
Sportiche, Dominique
2005 Division of labor between Merge and Move. Ms., UCLA. [Available at: http://ling.auf.net/lingbuzz/000163]
Sportiche, Dominique
2006 Reconstruction, binding and scope. In: Martin Everaert and Henk van Riemsdijk (eds.), The Blackwell Companion to Syntax, Volume IV, 35−94. Oxford: Blackwell.
Stechow, Arnim von
1991 Syntax und Semantik. In: Arnim von Stechow and Dieter Wunderlich (eds.), Semantik: Ein Handbuch der Zeitgenössischen Forschung/Semantics: An International Handbook of Contemporary Research, 90−148. Berlin/New York: de Gruyter.
Stechow, Arnim von
1993 Die Aufgaben der Syntax. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld and Theo Vennemann (eds.), Syntax − Ein Zeitgenössisches Handbuch Syntaktischer Forschung, 1−88. Berlin/New York: de Gruyter.
Stechow, Arnim von
1996 Against LF pied-piping. Natural Language Semantics 4.1: 57−110.
Stechow, Arnim von
2000 Some remarks on choice functions and LF-movement. In: Klaus von Heusinger and Urs Egli (eds.), Reference and Anaphoric Relations. Dordrecht: Kluwer.
Stechow, Arnim von
2007 Schritte zur Satzsemantik. Ms., University of Tübingen. [Available at http://www2.sfs.uni-tuebingen.de/~arnim10/Aufsaetze/index.html]
Stechow, Arnim von
2011 Syntax and semantics: an overview. In: Claudia Maienborn, Klaus von Heusinger and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 2173−2224. Berlin, New York: Mouton de Gruyter.
Steedman, Mark
2000 The Syntactic Process. Cambridge: MIT Press.
Steedman, Mark
2012 Taking Scope. Cambridge: MIT Press.
Steedman, Mark, and Jason Baldridge
2011 Combinatory Categorial Grammar. In: Robert Borsley and K. Borjars (eds.), Non-Transformational Syntax, 181−224. Malden: Blackwell.
Sternefeld, Wolfgang
2001 Semantic vs. syntactic reconstruction. In: Hans Kamp, Antje Rossdeutscher and Christian Rohrer (eds.), Linguistic Form and its Computation, 145−182. Stanford: CSLI Publications.
Szabolcsi, Anna (ed.)
1997 Ways of Taking Scope. Dordrecht: Kluwer.
Szabolcsi, Anna
2001 The syntax of scope. In: Mark Baltin and Chris Collins (eds.), The Handbook of Contemporary Syntactic Theory, 607−633. Malden: Blackwell.
Szabolcsi, Anna
2010 Quantification. Cambridge: Cambridge University Press.
Szabolcsi, Anna
2011 Scope and binding. In: Claudia Maienborn, Klaus von Heusinger and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 1605−1641. Berlin, New York: Mouton de Gruyter.
Takahashi, Shoichi
2006 Decompositionality and identity. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Takahashi, Shoichi, and Sarah Hulsey
2009 Wholesale Late Merger: beyond the A/Ā-distinction. Linguistic Inquiry 40.3: 387−426.
Takano, Yuji
1995 Predicate fronting and internal subjects. Linguistic Inquiry 26.2: 327−340.
Taraldsen, Knut Tarald
1981 The theoretical interpretation of a class of marked extractions. In: Adriana Belletti, Luigi Brandi and Luigi Rizzi (eds.), Theory of Markedness in Generative Grammar: Proceedings of the 1979 GLOW Conference, 475−516. Pisa: Scuola Normale Superiore.
Tarski, Alfred
1936 Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica 1: 261−405.
Wasow, Thomas
1972 Anaphoric Relations in English. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Wilder, Chris
1997 Phrasal movement in LF: de re readings, VP-ellipsis and binding. In: Kiyomi Kusumoto (ed.), Proceedings of North East Linguistic Society 27, 425−440. University of Massachusetts, Amherst: Graduate Linguistic Student Association.
Williams, Edwin
1974 Rule ordering in syntax. Ph.D. dissertation, Department of Linguistics and Philosophy, MIT.
Williams, Edwin
1977 Discourse and logical form. Linguistic Inquiry 8: 101−139.
Williams, Edwin
1986 A reassignment of the functions of LF. Linguistic Inquiry 17.2: 264−300.
Williams, Edwin
2003 Representation Theory. Cambridge, M.A.: MIT Press.
Winter, Yoad
1997 Choice functions and the scopal semantics of indefinites. Linguistics and Philosophy 20.4: 399−467.
Winter, Yoad
2001 Flexibility Principles in Boolean Semantics. Cambridge, M.A.: MIT Press.
Wittgenstein, Ludwig
1929 Some remarks on logical form. Proceedings of the Aristotelian Society, Supplementary Volume, 162−171.
Wurmbrand, Susi
2008 Word order and scope in German. Groninger Arbeiten zur Germanistischen Linguistik 46: 89−110.
Wurmbrand, Susi, and Jonathan Bobaljik
1999 Modal verbs must be raising verbs. In: Sonya Bird, Andrew Carnie, Jason D. Haugen and Peter Norquest (eds.), Proceedings of the 18th West Coast Conference on Formal Linguistics, 599−612.
Winfried Lechner, Athens (Greece)
36. The Syntax − Pragmatics Interface

1. Introduction
2. Syntax and pragmatics: General considerations
3. SPI: some facts
4. The model
5. Conclusion
6. References (selected)
Abstract

This chapter considers the status of the interface between syntax and pragmatics. It begins with general considerations regarding the syntactic framework to be used and the notion of pragmatics that is relevant to the study. A number of empirical cases are then considered, including information-structural notions and their representation, as well as the syntactic aspects of the calculation of scalar implicatures. The chapter concludes by pulling together the different strands of research and conceptual development discussed and considering their implications for the nature of the model of grammatical architecture.
1. Introduction
The topic of this paper is the syntax-pragmatics interface (SPI). There are many ways to approach the task of writing (and thinking) about the SPI, and some crucial choices need to be made at the outset. Given the space available, the approach I have chosen is one where the overarching question, and the main focus, is architectural. Put another way, my objective is not to introduce new empirical findings or to provide detailed criticisms or supporting arguments for this or that analysis of specific facts. I will thus be rather unconcerned with whether, for example, left-dislocation in a particular language is derived by movement or base-generated, and so forth. The literature on the topics relevant, or potentially relevant, to the SPI is vast, and it is not possible to provide a meaningful survey of the different approaches and supporting facts here; I will have to be selective.

It seems to me that, at the present state of knowledge and theoretical development, the burning question concerning the SPI is the following: given a specific system of syntactic organisation and an appropriately delimited view of pragmatics, what does the overall model of grammatical organisation, the architecture of the grammar, have to look like in order to allow syntax and pragmatics to interface? As few, if any, would argue against the importance of the interfaces, this type of architectural question becomes paramount if the interfaces are to be investigated in a fruitful manner. Similar questions can be asked about the interfaces between syntax and semantics or phonology, although, as I will shortly argue, there is some terminological cleanup to do with respect to the use of the term interface.
Clearly, the choice of syntactic theory and the view of pragmatics that one adopts will determine the final architectural outlook. We would get very different answers if we chose, for example, HPSG (Ginzburg and Sag 2001) or Dynamic Syntax (Kempson et al. 2001) as our syntactic model. But this is to be expected, of course. In this paper I will consider the architectural issues within so-called minimalist syntax. Although not every technical detail of the model will be relevant, the general architecture will be. It should also be noted that while our general assumptions will be those of the minimalist framework, the empirical, analytical and theoretical issues raised in this paper are generally relevant, modulo some terminological and technical modifications, to most frameworks for the theory of grammar.

The paper is organised as follows: section 2 offers some general considerations on syntax, pragmatics and the nature of interfaces. Section 3 provides the main empirical basis of the paper: I present a selection of data and approaches relevant to the SPI, looking mainly at information structure (section 3.1) and scalar implicature calculation (section 3.2). In section 4 we draw out the implications of the previous sections and formulate a proposal for grammatical architecture that takes seriously the need for an SPI. Section 5 concludes the paper.
2. Syntax and pragmatics: General considerations The preface to the 1975 edition of Chomsky (1955: 57) begins thus: Linguistic theory has two major subdivisions, syntax and semantics. Syntax is the study of linguistic form. Its fundamental notion is “grammatical,” and its primary concern is to determine the grammatical sentences of any given language and to bring to light their underlying formal structure, (…) Semantics on the other hand, is concerned with the meaning and reference of linguistic expressions. It is thus the study of how this instrument, whose formal structure and potentialities of expression are the subject of syntactic investigation is actually put to use in a speech community. Syntax and semantics are distinct fields of investigation. How much each draws from the other is not known, or at least has never been clearly stated.
Many semanticists would agree that semantics is concerned with the meaning and reference of linguistic expressions, but would disagree with the view above on two points. First, that semantics is the study of how language is put to use in a speech community: today this is understood to be the subject matter (or part thereof) of pragmatics. Second, that it is not known how much syntax and semantics draw on each other: surely not everything is known, but a great deal is. On the other hand, Chomsky's view does, I believe, still hold if we replace semantics with pragmatics. My purpose in this paper is to provide some elements of an answer to the questions in (1), which elaborate on Chomsky's question as applied to pragmatics, i.e. how much syntax and pragmatics draw on each other.

(1) a. Does syntax access pragmatic information? (Does syntax draw on pragmatics?)
    b. What does pragmatics really apply to?
    c. When and how do syntax and pragmatics meet (interface), if at all?
Moreover, as the relevant question is one of the interface, we must also seek to understand what formal object(s) provide appropriate representations or structures for the syntax-pragmatics interface. The SPI has not been as intensely studied in recent years as its cousins, the syntax-semantics and the syntax-phonology interfaces, despite being mentioned frequently enough in the literature. Unlike for the other interfaces, little attention has been paid to the kinds of information that a particular syntactic structure must encode in order to be an appropriate input to the pragmatics. This is another aspect of the same problem: we do not seem to have anything approximating a good definition of pragmatics such that it would be appropriate for it to interface with the syntax. I think we could do worse than spend a bit of time clarifying the questions and understanding the requirements of what we call the SPI.
2.1. The standard picture
Let's start with the standard picture and the notion of interface. Much current thinking on the nature of the architecture of the language faculty has converged on something similar to the following, give or take some minor details: the language faculty (in the narrow sense) consists of a central generative component (a.k.a. the computational system of human language, CHL) which outputs representations that must be legible to the external systems that interface with the central linguistic component. These external components impose on the computational system a number of legibility conditions (for the general framework see Chomsky 1995b, 2000, 2001, 2004; for some more general relevant aspects of the approach see Chomsky 1992, 1995a, 2002). Usually this is referred to as the Y-model, depicted as in (2):

(2)       Lexical Resources
                 |
               Syntax
                 |
             Spell-Out
             /       \
           PF         LF
            |          |
     Articulatory-  Conceptual-
     Perceptual     Intentional
     System         System
Furthermore, in the most recent versions of the theoretical model sketched above, a consensus has been reached according to which syntactic computation proceeds in a
strictly local fashion. This means that as soon as each computational cycle (the so-called phase) is completed, its result is handed over to the interpretive components while the rest of the computation proceeds. If we take this view seriously, some questions arise with respect to what we actually mean when we talk of the syntax-semantics/phonology interfaces. Semantics and phonology are clearly not external systems: semantics is not the conceptual-intentional system and phonology is not the articulatory-perceptual system. A more appropriate way to look at the model is the following. The structures assembled by the syntax receive a semantic interpretation from the formal semantic component, and the result of this is the representation, call it LF in keeping with tradition, that is the interface with the external C-I system. Something similar goes on on the sound side, with the result being PF, which is the interface representation with the A-P system. So semantics and phonology are internal components and play a role in computing the interface representations rather than taking them as their input. This may seem quite obvious, but it seems to me there is sufficient confusion in the literature to warrant saying it explicitly.

Recall also that one of the main achievements of minimalist theory has been the elimination of internal levels of representation: the only levels are the interface levels. One consequence of this view is that when we talk of the syntax-phonology or the syntax-semantics interface we should be careful to avoid any suggestion that these are actually levels of representation in their own right. This is both important and problematic. It is important because, in pursuing the agenda associated with the minimalist programme, one should not, in fact, be formulating phonological or semantic conditions on the output of syntax, but rather ensure that the syntactic output is such that it can be directly parsed and interpreted by the phonology and the semantics.
In turn this means that the syntactic output must contain enough information for the interpretive components, which in fact considerably weakens arguments from economy of representation. Now, the system is problematic for two reasons. The first reason is terminological. The term interface is used both for the only levels of representation there are, LF and PF, which are indeed true interface levels, that is, the points where the linguistic system interfaces with the external (but mind-internal) C-I and A-P systems, and for things like the syntax-semantics/phonology interface. It is important to keep in mind that these two types of interface are qualitatively, in fact metaphysically (and ontologically), different. Qua levels of representation they are existentially different, in the sense that the former (LF, PF) exist whereas the latter (syntax-phonology/semantics) do not. In the interests of clarity, in what follows I will reserve the term interface for things of the LF/PF sort, and I will use the word mapping for things of the syntax-semantics/phonology sort. One of our main questions will be into which type the relation between syntax and pragmatics falls.

But before we get to that, let me give the second reason why I think the system is problematic. The problem lies in the insistence on the absence of internal levels of representation. To see this, consider for a moment two examples of phenomena studied as part of the syntax-phonology/semantics mapping. On the semantic side, consider scope assignment based on Quantifier Raising: scope-taking elements must assume their respective scope positions in the syntax so that the semantics may calculate the correct interpretations. This is achieved via the covert application of the rule of Quantifier Raising (May 1985).
On the syntax-phonology side, a frequently cited example is the rule known as raddoppiamento sintattico (Nespor and Vogel 1982, 1986; Inkelas and Zec 1995), which states that in a sequence of two words w1 and w2, the initial consonant of w2 geminates if w1 ends in a stressed
vowel, and if certain syntactic conditions are met. In a nutshell, even if the phonological conditions are met, the rule will not apply unless w1 and w2 are immediate syntactic constituents. These are well-motivated empirical generalisations. In the phonological case, the condition only says that phonology does not apply to strings but to structures. This does not require a syntax-phonology "interface" level; in fact it does not require a level at all. What we really have is a seamless integration whereby the phonological component processes the output of the syntax according to its own rules: a mapping, as we put it earlier. In the semantic case, we see that the model in (2) is in fact inadequate, as an extra level of representation is required which serves as the input to the formal semantic component (which, as we saw, is CHL-internal) and which eventually produces the true interface. In (3) this is iLF, where i stands for interpreted. This is correct insofar as the input to the semantic component is a scopally unambiguous structure and scope disambiguation happens via QR. The model then really looks like the diagram in (3):

(3)       Lexical Resources
                 |
               Syntax
                 |
             Spell-Out
             /       \
           PF         LF
            |          |
     Articulatory-  Formal Semantics
     Perceptual        |
     Systems          iLF
                       |
                  Conceptual-
                  Intentional
                  Systems
Unfortunately, this reintroduces CHL-internal levels of representation. At the end of this section we have established three things about the general interfacing model:

(4) a. iLF and PF are the proper interfaces with external systems.
    b. There are mappings (internal interfaces of a sort) between syntax and semantics/phonology.
    c. The mapping between syntax and semantics is less direct than the one between syntax and phonology.
Let’s turn to pragmatics now. Where does pragmatics apply? We will begin by trying to establish an appropriately delimited view of pragmatics.
2.2. Pragmatics
Under the most common and general definition, pragmatics concerns those features of context and those aspects of use that contribute directly to interpretation. Assuming a broad definition of context and use that includes discourse coherence as well as enrichment of meaning via implicature etc., we can first of all divide pragmatics into two large categories: conventional and non-conventional pragmatics (see among others Goldberg 2004). The former concerns the association of certain linguistic structures and properties with particular pragmatic effects, e.g. the presuppositional properties of clefts. As Goldberg points out, these effects are non-necessary and we can expect to find some degree of linguistic variation as a result of varying degrees of conventionalisation. Within the class of conventional pragmatics we can include information structure, conventional implicature, and related notions such as the expressive content manifested in the grammar of appositives (Potts 2005) or discourse particles/markers (Blakemore 2004). Non-conventional pragmatics, on the other hand, concerns phenomena that are rooted in the use of language and have their source in the speakers using it. As a result, according to Goldberg, we expect their results to be universal. In this class we include conversational implicature and other effects generally falling under the heading of conversational pragmatics. Apart from these two large classes, we must also draw attention to the fact that appeal to pragmatics is frequent in many cases where the construction of the full meaning requires contextual supplementation. Here we include standard cases of reference fixing (indexicals, demonstratives, verbs of motion towards x or away from x, and so on). Much of this broad picture is associated with (some version of) the (Neo-)Gricean programme; Grice's most important works are collected in Grice (1989).
A widespread conception of the pragmatic realm is outlined in (5), from Recanati (2004):

(5)         What is communicated
            /                  \              CONSCIOUS
     What is said       What is implicated
      /        \
  Sentence   Contextual ingredients           UNCONSCIOUS
  meaning    of what is said
We will return to this picture later.
2.2.1. The autonomy of syntax and pragmatics
Up to this point we have been concerned with general aspects of the model of interfacing and have not said much about the place of pragmatics in the model. This will depend in
part on the facts. We will move on in a moment to consider some more specific empirical aspects of the SPI, but before we do so it is worth recalling the way the debate on the relation between syntax and pragmatics has been framed. Levinson (1983: 34), in his discussion of the role of pragmatics in the theory of grammar, writes:

In order to construct an integrated theory of linguistic competence, it is essential to discover the logical ordering of components or levels. For example, Chomsky has elegantly argued that syntax is logically prior to phonology, in that phonological descriptions require reference to syntactic categories, but not vice versa; syntax is thus AUTONOMOUS with respect to phonology, and phonology (non-autonomous with respect to syntax) can be envisaged as taking a syntactic input, on the basis of which phonological representations can be built up. Accepting for a moment this kind of argument, the question is, is it possible to argue that there is some accepted component of grammar that is non-autonomous with respect to pragmatics (i.e. some component requiring pragmatic input)? If so, pragmatics must be logically prior to that component, and so must be included in an overall theory of linguistic competence.
Consider now Chomsky's response, in Stemmer (1999: 394−400), to a question directly referring to the above quote from Levinson:

(…) I suppose it is possible to argue that the computational/representational system accesses features of language use, though what such a system would look like, I have no idea. Suppose, for example, we consider the (plainly correct) fact that in a linguistic interchange, new/old information is a matter of background that participants assume to be shared (what is sometimes misleadingly called "discourse"; there need be no discourse in any significant sense of that term). Suppose further (as appears to be correct) that old/new information relates to "displacement effects" in narrow syntax. And suppose further (merely for concreteness) that we take these displacement effects to be expressed in narrow syntax by transformational operations. Should we then say that the operations of object-shift, topicalization, etc., literally access shared background information? This seems close to incoherent; any clarification of these intuitive ideas that I can think of yields computational systems of hopeless scope, compelling us to try to formulate what amount to "theories of everything" that cannot possibly be the topic of rational inquiry (…) A more reasonable approach, I think, is to take the operations to be 'autonomous,' i.e., syntax in the broad sense, and to understand pragmatics to be a theory concerned with the ways properties of expressions (such as displacement) are interpreted by language-external (but person-internal) systems in terms of old/new information. That leaves us with manageable and coherent questions. (…) If that's correct, then syntax (broad or narrow) will be "autonomous" of pragmatics (…)
Although I believe it is true that pragmatics, or at least a significant part of it, is indeed part of linguistic competence, I am not aware of any conclusive or strong argument that would show this along the lines of what Levinson requires. There has been, however, significant confusion in the field regarding the status of pragmatic information of the kind Chomsky describes (more on this in the next section). The general approach that we adopt here is in line with Chomsky's comments. How this is to be implemented is a different matter, to which we return in section 4. We will now move on to consider some empirical issues.
3. SPI: some facts
As mentioned in the introduction, we will mainly concentrate on two types of facts, namely information structure and the calculation of scalar implicatures. The main objective throughout this section will be to establish desiderata and constraints that the interface must satisfy.
3.1. Information structure
By information structure (see Erteschik-Shir 2007 for a survey) we will understand the partitioning of sentences into units according to their informational status, i.e. old vs. new information, topic vs. comment, focus vs. background/presupposition, and so forth, though these distinctions do not always cover exactly the same empirical ground. Chafe (1976) uses the term information packaging to refer to the ways information is presented. Vallduví (1992) treats information structure as a separate component of the grammar which analyses sentences into 'instructions with which the speaker directs the hearer to enter the information carried by the sentence into her/his knowledge-store.' Languages use various strategies to realise information-structural distinctions, but the three most common ways are:

(6) a. Intonation (e.g. English, and Germanic in general)
    b. Constituent order changes (e.g. most of the Romance languages, Greek, etc.)
    c. Special morphology (e.g. Japanese, Korean)
The example languages are given for illustration; it is not intended that only one strategy is available in each of these languages. The above strategies are exemplified in (7)−(10):

(7) The Senate decided to destroy CARTHAGE.

(8) Carthagei, le Sénat a décidé de lai détruire.          [French]
    Carthage  the Senate has decided to it  destroy
    'The Senate decided to destroy Carthage.'

(9) Tin Karhidona, i siglitos apofasise na katastrepsi.    [Greek]
    the Carthage   the Senate  decided   to destroy
    'The Senate decided to destroy Carthage.'

(10) Kartako-nuni [Sangwon-i  ei pako    ha-ki-ro       keyljeng hayssta].  [Korean]
     Carthage-TOP  Senate-NOM    destroy do-NMLZ-COMP   decide   PST.DECL
     'As for Carthage, the Senate decided to destroy (it).'
In the English example (7), focal stress on the object is used to mark it as the focus, or the new information the sentence carries. In the French example (8) we have a case of clitic left dislocation used as a topicalising device. In the Greek example (9), a left
dislocation is used to signal a contrastive interpretation. Finally, in the Korean example (10) the morpheme -(n)un is attached to the noun Kartako (Carthage) to signal that it is the topic. Steedman (2000) is another place where specific claims are made regarding information structure qua interface between the syntax and the phonology. We should point out here that there is little consensus in the literature on the exact definitions of notions such as topic, focus and so on. But in very broad brushstrokes, most authors would not disagree entirely with a definition of topic as what the sentence is about. As for focus, again there seems to be no common definition but, simplifying greatly, it roughly corresponds to the new information. Finally, we need to distinguish the notion of contrast, which can apply to both topics and foci. As we are concerned mainly with grammatical architecture, we will, for concreteness, concentrate on focus and comment on topics only in passing; the relevant conclusions carry over to topics anyway.
3.1.1. Focus
As already mentioned, the most often noted effect of focusing is that focused constituents represent new information. The question-answer test is also relevant here, as the focus corresponds to the element that provides the answer to the corresponding wh-question:

(11) a. Q: Who did Arturo meet?
        A: Arturo met MARIA.
     b. Q: What did Arturo do?
        A: He SLEPT AT MARIA'S HOUSE.

The part of the sentence in capitals is the focus. The way to capture the question-answer intuition formally is via the notion of focus (or alternative) value:

(12) ⟦Arturo met MariaF⟧f

The alternative value for (12) will be the set of propositions that results from replacing the variable x in Arturo met x by the names of relevant individuals. Focus marking, and more generally information-structure related markings, categories and divisions, do not affect sentence meaning truth-conditionally in the strict sense.

How is focus assigned? We should distinguish two parts to this question. First, there is the issue of how focus is identified; the answer here is that the focus is mostly identified pragmatically. The second question is what formal mark signals focus. This is a different question from how focus is realised: rather, the question is what abstract mark or feature triggers specific pitch accent patterns, syntactic movement, or the insertion of a particular morpheme. There are, as you might have guessed, many answers to this question. Within the generative tradition, Jackendoff's (1972) idea that phrases are marked with a feature F has been influential. Under this conception, information-structural notions are represented in syntactic structure and are mapped onto specific realisations by the phonological and semantic components. Now, in Jackendoff's original formulation and in much subsequent work (at least on English), a mark F is realised as stress. As
we noted earlier, stress is the main means of focus realisation in English. According to Truckenbrodt (1995), this is explainable via one overarching constraint governing focus realisation, which he calls FocusProminence:

(13) FocusProminence
     Focus needs to be maximally prominent.

Büring (2010) shows that in fact all strategies of focus realisation can be understood as part of FocusProminence. If this is so, there is an interesting consequence. Prominence is most readily defined in phonological, accentual terms. The consequence is that we can raise the following question about the pragmatics: what does the semantic/pragmatic component really interpret, F or stress? Obviously, the answer to this question is crucial for the syntax-pragmatics interface. We can reason as follows. As we noted, some effects of focus are non-truth-conditional, hence, simplifying somewhat, pragmatic. Since the realisation of F is stress, and prominence is most obviously characterised in terms of stress, it would not be unnatural to characterise its meaning in the same way as we characterise the meaning of stress-type phenomena. Hirschberg (2004) suggests that intonational meaning is also pragmatic. As a result, it seems more natural to propose that pragmatics sees not F but stress, since this is what indicates prominence. The same would be valid for topics too, assuming that the realisation of T(opic) is again a certain intonational pattern. Now, if correct, this is, I suggest, welcome. The general model holds that syntax manipulates elements from the lexicon and deals mostly in formal features. The syntax does not have access to pragmatic information like old vs. new information and so on, and thus cannot identify focus.
So we now have the following choice: either we provide the syntactic representation with interpretable features of the type TOPIC or FOCUS, or we allow those features only a phonological realisation and derive the pragmatic effects by looking at the phonological representation, in which case the pragmatics will have to interface directly with the phonologically interpreted structure. As we have not yet stated where pragmatics actually applies (indeed, this is what we are trying to determine), we will put this question to one side for now and return to it shortly. Let's now turn to languages that use mostly movement-based strategies for the expression of information-structural notions.
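The notion of an alternative value of the kind given in (12) lends itself to a simple computational illustration. The following sketch is my own toy example (the individuals and function name are hypothetical, not part of any formal semantics implementation): it builds the set of alternative propositions by substituting each relevant individual for the F-marked constituent.

```python
# Toy illustration of the alternative ("focus") value of (12):
# [[Arturo met Maria_F]]^f = the set of propositions obtained by
# replacing the F-marked element with each relevant individual.

def focus_value(frame: str, relevant_individuals: set) -> set:
    """Substitute each relevant individual into the F-marked slot."""
    return {frame.format(x=individual) for individual in relevant_individuals}

# The domain of relevant individuals here is purely illustrative.
alternatives = focus_value("Arturo met {x}", {"Maria", "Anna", "Peter"})
# alternatives == {"Arturo met Maria", "Arturo met Anna", "Arturo met Peter"}
```

On this toy model, the answer in (11a) is felicitous because the question denotes exactly such a set of alternatives and the asserted proposition is a member of it.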
3.1.2. Movement structures
How about languages which rely more extensively on syntactic movement strategies for the expression of information structure? Predominantly, information-structure related movement targets peripheral positions, and it is the left periphery that has received the greatest attention, at least in the recent literature. On one approach, the left periphery is a richly structured field with designated positions, targeted by phrases that move there for, apparently at least, information-structural reasons. The proposals in Rizzi (1997) have been extremely influential in this respect. It should be noted that various authors assume a subset of the positions that Rizzi has proposed, often with little more than empirical motivation. Rizzi's (1997) proposed structure for the left periphery is as follows (the * symbol indicates that the node is recursive):
(14) [ForceP Force0 [TopP* Top0 [FocP Foc0 [TopP* Top0 [FinP Fin0 IP]]]]]
The position that Rizzi defends is that elements that can be characterised, from an information-structure point of view, as topics or foci are, when fronted, to be found not in adjoined positions high up in the structure but rather in dedicated specifier positions, where they establish relations with heads endowed with features such as topic and focus. The approach of course generalises to languages that do not show such movement, or do not show it obligatorily (English is a case in point). In these cases one may suggest, adopting the theory of movement as an operation driven by the existence of an EPP feature, that the operations are identical and the peripheral heads are always there, except that in some languages they are not endowed with an EPP feature. Therefore, after an AGREE relation has been established between the Top/Foc head and the topical/focused DP, there are no further derivational operations to be performed; specifically, there is no movement. We can think of Jackendoff's F-marking in precisely those terms: a DP that enters into an AGREE relation with Foc is F-marked, while one that is related to Top is T-marked. Again, the connection with the phonology is obvious. In a language where the Foc head, say, is [+EPP], the F-marked DP will be pronounced in a peripheral position. In a language where Foc is [−EPP] (or has no EPP feature), it will trigger the assignment of focal pitch accent. There is an undeniable elegance and attractiveness to the theory. Perhaps this is all there should be to it.

Rizzi's (1997) proposals have generated a great deal of debate in the literature, focusing mostly on whether the grammar needs to make use of features such as [topic] and [focus] in narrow syntax. Most of the critics suggest that there are simpler, if not better, ways to achieve the same results without postulating an extensive set of functional heads. I want to suggest here that, on reflection, the criticisms miss the mark on a number of counts. If we take Chomsky's comments in the previous section seriously, Rizzi's system would be subject to the same type of criticism as any other such system if the suggestion
If we take Chomsky’s comments in the previous section seriously, Rizzi’s system would be subject to the same type of criticism as any other such system if the suggestion
was that they had to access shared background information about what is old and what is new information, thereby allowing the computational system to manipulate such information. Let me reiterate that these notions (old/new, topic/focus), though they may be useful in describing some syntactic patterns, are not sentence-level grammatical notions. But the appropriate interpretation of Rizzi's proposals seems to me to turn on the need for syntax to account for variations in constituent order in a principled manner. As long as one assumes that it is syntax that is responsible for generating constituent orders, some way must be found for the syntax to be able to generate them. Perhaps the terminology of topic and focus is not particularly helpful, but the essence of the proposal seems to me to be a syntactic mechanism that produces structures which are interpreted in one way or another by the interpretive components. The alternatives to this approach are either base-generation or a sequence of adjoined positions. There is no suggestion here that all constructions involving fronting will involve movement to a designated specifier position; there surely is both base-generation and adjunction to the upper part of the clause structure. The point is that, in spirit, there is at the end of the day little to differentiate these points of view. A structure like (15), where the lower DP designates the thematic position, could be generated in a variety of ways, determined by the empirical properties of the construction.

(15) [… DPi … [IP … DPi …] ]

However, from the point of view of the architecture of the grammar, which is paramount in this paper, I don't think that one or the other syntactic approach is particularly (dis)favoured. In fact, as I will suggest in section 4, when we take into account the proposals relating to implicature and so on, Rizzi's proposals for the left periphery become the most attractive.
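The Agree/EPP logic sketched for the Foc head above can be illustrated schematically. The following is my own toy derivation, not Rizzi's formalism: Foc always Agrees with the focused DP (yielding F-marking), and the presence or absence of an EPP feature on Foc then decides between fronting and in-situ focal accent.

```python
# Toy sketch (hypothetical, illustrative only) of the Agree/EPP logic:
# Agree(Foc, DP) F-marks the DP; a [+EPP] Foc head forces movement to
# Spec,FocP, while a [-EPP] Foc leaves the DP in situ with a focal
# pitch accent (rendered here as capitals).

def realise_focus(dp: str, foc_has_epp: bool) -> str:
    """Return a schematic surface bracketing after Agree(Foc, DP)."""
    if foc_has_epp:
        # [+EPP]: the F-marked DP is pronounced in the left periphery
        return f"[FocP {dp} [IP ... t ...]]"
    # [-EPP] (or no EPP feature): in-situ realisation via accent
    return f"[IP ... {dp.upper()} ...]"

print(realise_focus("Carthage", foc_has_epp=True))   # movement language
print(realise_focus("Carthage", foc_has_epp=False))  # intonation language
```

The single parameter foc_has_epp stands in for the cross-linguistic variation discussed above: a Greek-type fronting language versus an English-type intonation language, with the Agree step held constant.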
As noted already, the two strategies (intonation and movement) can be combined. De Cat (2002), for instance, uses the special prosodic properties of dislocated elements in French to build her account. She criticises Rizzi along the lines already mentioned. However, there seems to be no specific incompatibility between the two accounts (apart from the fact that she uses no designated heads). The issue is that the dislocated DP must somehow come to be interpreted as a topic. If it sits in a position that provides some kind of marking to it, then the interpretive components will be able to provide appropriate interpretations. Now, clearly, in this type of language too we need to ensure that there is a connection between the pragmatic component, which identifies, say, the topic, and the phonological component, which has the overt exponent of the topic marking that was, perhaps, syntactically received. Just as we suggested for focus that phonologically realised prominence will have to be the input to the pragmatics, topic will be associated with its own intonational pattern. It is reasonable then to conclude at this stage that movement strategies represent a different way to achieve prominence, and that we therefore have to maintain that the pragmatics interfaces directly with the phonology. I would like now to move to the case of languages that employ specific marking for topics and foci.
3.1.3. Overt information structure marking

Many languages employ special morphology to mark topics and foci. Although the strategy is quite common, Japanese and Korean are two of the best studied cases. Broadly
V. Interfaces
speaking, in Japanese topics are marked with the morpheme -wa and foci with -ga; in Korean the equivalent morphemes are -(n)un and -ka/i. We note with Heycock (2008) that while there is much disagreement concerning both the definition and the usefulness of notions such as topic in syntax in studies on English or other Indo-European languages, the fact that -wa in Japanese does mark the topic is an almost banal fact for scholars working on Japanese or Korean. This overt marking of information-structural notions raises a number of issues. First, do such morphemes have special lexical semantics connected to topichood and focus? The questions are much too complex to address in this paper. However, keeping in mind the overarching goal of this paper (to elucidate the architecture that underlies the interactions between syntax and pragmatics), we will briefly review the analysis of topic and contrastive focus in Korean due to Gill and Tsoulas (2004), as it tells us something about the role of phonology in the determination of the information structure categories. Korean, like Japanese, is interesting because it instantiates both a movement strategy in the form of scrambling/topicalisation and overt topic marking. As is well known, scrambling in languages such as Japanese and Korean comes in at least two varieties: short (or clause-internal) and long (across one or more CP boundaries). Korean shows both types, as shown below in (16), (17):

(16) I chayk-uli [Younghee-ka ei ilkessta]. [Korean]
     this book-ACC Younghee-NOM read
     'Younghee read this book.'
(17) I chayk-uli [Chelswu-ka [Younghee-ka ei ilkessta]-ko malhayssta]. [Korean]
     this book-ACC Chelswu-NOM Younghee-NOM read-COMP said
     'Chelswu said that Younghee read this book.'
Now, in parallel with scrambling, topicalisation in Korean can be short or long distance too (18), (19). However, in contrast to scrambled DPs, topicalised elements are morphologically marked with the marker -(n)un:

(18) I chayk-un [Swunja-ka e sassta]. [Korean]
     this book-TOP Swunja-NOM bought
     'As for this book, Swunja bought (it).'
(19) I chayk-un [Younghee-ka [Swunja-ka e sassta]-ko malhayssta]. [Korean]
     this book-TOP Younghee-NOM Swunja-NOM bought-COMP said
     'As for this book, Younghee said that Swunja bought (it).'
The two constructions are similar in two immediately observable respects. First, they both seem to involve a preposing mechanism whereby the scrambled/topicalised element appears sentence initially. Second, from an acoustic point of view, in both cases, a rising tone is required on the case/topic marker and an intonational break is required after the scrambled/topicalised element.
From an interpretive point of view things are a little more complicated. For one thing, a scrambled element does not receive a topic reading, which, obviously enough, a topicalised element does. Moreover, the interpretation of -(n)un marked elements is not restricted to 'topic' in the usual sense of the term. Han (1998) distinguishes three different readings for -(n)un marked phrases: a topic reading, a contrastive topic reading and a contrastive focus reading, which she defines as follows:

(20) Topic reading/Contrastive topic reading [Korean]
     Chelswu-nun Younghee-lul coahanta.
     Chelswu-TOP Younghee-ACC like
     'As for Chelswu, (he) likes Younghee.'
     'As for Chelswu, (he) likes Younghee, (Frank likes Susan, and Peter likes Laura).'

(21) Contrastive focus reading [Korean]
     Chelswu-ka Younghee-nun coahanta.
     Chelswu-NOM Younghee-CF like
     'As for Younghee, Chelswu likes her.'
(CF = Contrastive Focus) Let me stress first that, both in Han's work and here, the main focus is on -(n)un marked DPs. It should also be noted that almost any category can be -(n)un marked, and that the readings produced by -(n)un marking in various positions, as detailed below, seem to be uniform for all categories. Therefore we will not pay any extra attention to non-DP -(n)un marked phrases. Here it is particularly interesting to observe with respect to (21) that the contrastive focus reading for the object is available when it is -(n)un marked in situ. Crucially, the contrastive focus reading is not available in (22):

(22) Younghee-nun Chelswu-ka coahanta. [Korean]
     Younghee-TOP Chelswu-NOM like
     'Chelswu likes Younghee.'
Here only the topic or contrastive topic reading is available. Now, taking scrambling and topicalisation together, we also observe that scrambling, far from being information-structurally inert, interacts with the general focus structure of the sentence in the following way: scrambling cancels the focal prominence of the subject when both the scrambled object and the subject represent new information. The challenge that this observation poses is to provide an analysis of scrambling from which this fact will result without undue stipulations. The first question that we must answer in order to move closer to such an analysis is whether there is a dedicated position, call it [SpecTopP] for convenience, hosting -(n)un marked phrases. This question is not only important in its own right but also turns out to be important in connection with scrambling, since it is possible to scramble an object over a -(n)un marked (topicalised) subject, as below:

(23) I chayk-uli [Chelswu-nun ei sassta]. [Korean]
     this book-ACC Chelswu-CF bought
     'It is Chelswu who bought this book.'
As the gloss and, less felicitously perhaps, the translation indicate, in this case scrambling also cancels the prominence of the topic and gives it a contrastive focus reading. Thus, as a corollary to the question concerning the existence of a position dedicated to -(n)un marked elements, there is also the question of whether there is another position above that, where scrambled elements land, and whose properties are such that it not only determines the interpretation of the element sitting in that position but also determines the interpretation of the nominal in the position below. The existence of such a functional head would be a major and very surprising discovery indeed. The mystery is compounded by the fact that when a -(n)un marked phrase is not sentence initial, for almost whatever reason, it can only be interpreted as a contrastive focus and never as a topic. Thus, apart from the case just mentioned, where an object is scrambled to the sentence initial position, if more than one -(n)un marked element occurs in a sentence, any -(n)un marked phrase which is not in sentence initial position cannot receive a topic reading.

(24) I chayk-un [Chelswu-nun e sassta]. [Korean]
     this book-TOP Chelswu-CF bought
     'As for this book, it was Chelswu (not others) who bought (it).'
From the observations above, it seems that the conditions that a DP must fulfil in order to receive a topic reading in Korean can be succinctly summarised as follows:

(25) a. There may only be one topic per sentence.
     b. It must be in sentence initial position.
     c. It must be -(n)un marked.
     d. It must be stressed (and an intonational break (or more precisely phrasal lengthening) must occur after the topic).
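The conditions in (25) can be thought of as a conjunctive filter on a DP. The following is a purely illustrative sketch (the `Phrase` record and `can_be_topic` are my own names, not part of Gill and Tsoulas' 2004 analysis):

```python
from dataclasses import dataclass

# Illustrative encoding of the topic-reading conditions in (25).

@dataclass
class Phrase:
    form: str            # surface string, e.g. "I chayk-un"
    marker: str          # "nun", "ka", "lul", ...
    position: int        # 0 = sentence initial
    stressed: bool       # carries the highest pitch accent
    break_follows: bool  # phrasal lengthening / intonational break after it

def can_be_topic(phrase: Phrase, sentence: list) -> bool:
    """True iff the phrase satisfies all of (25a-d)."""
    one_topic = sum(1 for p in sentence
                    if p.marker == "nun" and p.position == 0) <= 1   # (25a)
    return (one_topic
            and phrase.position == 0      # (25b) sentence initial
            and phrase.marker == "nun"    # (25c) -(n)un marked
            and phrase.stressed           # (25d) stressed ...
            and phrase.break_follows)     # ... with a break after it
```

On this sketch, the sentence-initial -(n)un phrase of (18) passes, while a non-initial -(n)un phrase, as in (24), fails (25b) and so is left with only the contrastive focus reading.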
How can these patterns be explained? Gill and Tsoulas (2004) suggest that Korean instantiates, as part of its normal clause structure, a TopP projection. In neutral sentences, that is, in sentences that do not contain a -(n)un marked element, this position hosts the subject, which moves there from [SpecIP] as in (26). Assuming the existence of such a position and the movement of the canonical subject is quite natural in languages such as Korean which, in Li and Thompson's (1976) terminology, are halfway between topic and subject prominent.

(26) [TopP subji [Top′ Top [IP ti [I′ I [vP ti [v′ v [VP OBJ V ]]]]]]]

Sentences with a -(n)un marked element will have the obvious structure (27):

(27) [TopP XP-(n)un [Top′ Top [IP subj/tsubj [I′ I [vP tsubj [v′ v [VP OBJ/t V ]]]]]]]
Gill and Tsoulas (2004) propose that the only information encoded in that projection is [+stress] and that the patterns are explained through the interaction of the featural content of Top with the properties of accentual phrasing in Korean. Briefly, their account is as follows: due to the properties of accentual phrasing in Korean, i.e. the fact that the first accentual phrase receives the highest pitch accent, it follows that the element in [SpecTopP] will receive the highest pitch accent in the sentence whether or not it is -(n)un marked. This is rather straightforward as long as the element in [SpecTopP] is in sentence initial position. If this is correct, they suggest, it is reasonable to assume that it is the information encoded in Top which is responsible for the phonological phrasing patterns observed. However, the problematic cases are those in which the -(n)un marked element is not in sentence initial position. Assuming still that the -(n)un marked phrases are syntactically preposed, and that scrambling occurs afterwards, the phonological patterns remain to be explained. Cases such as these have been used to argue that Korean instantiates two configurationally distinct peripheral heads, Top and Foc. The structure then should be (28).
(28) [TopP XP-(n)un [Top′ Top [FocP XP-(n)un [Foc′ Foc [IP subj/tsubj [I′ I [vP tsubj [v′ v [VP OBJ/t V ]]]]]]]]]
In (28) the -(n)un marked XP which appears in [SpecFocP] is supposed to receive a (contrastive) focus interpretation by virtue of its position alone. It would now seem that the proposal that Top0 corresponds to [+stress] provides a natural way to understand, and perhaps explain, the patterns, if we assume that the structure is as in (29):
(29) [TopP XP-(n)un/ka/lul [TopP XP-(n)un [Top′ Top [IP subj/tsubj [I′ I [vP tsubj [v′ v [VP OBJ/t V ]]]]]]]]
Given the assumption in (25a) that there may only be one topic per sentence, it follows that the [+stress] feature may not be multiply checked. Now, the interaction of the tonal realisation of the first accentual phrase, stated in (30),

(30) The first accentual phrase is realised at a level higher than the rest of the sentence.
and the idea that the Top head has a [+stress] feature seem to conflict. The resolution of this conflict comes as follows: in order to stress the element in the specifier of TopP, an accentual phrase break must occur to demarcate it from the previous element, immediately followed by dephrasing of the following words. Now, given that the first accentual phrase's high tone is realised as the highest peak in the sentence in normal circumstances, when a -(n)un marked phrase occurs in a 'sentence internal' position it is this AP's high tone that must receive the highest peak, which must, in turn, be higher than the high tone of the first accentual phrase. It is precisely this option that allows for the contrastive focus reading. This situation is, we claim, incompatible with a topic reading, as it disturbs the phrasing pattern of topic sentences, which reflects the topic-comment phrasal pattern: {Topic} − {Comment}. The contrastive focus interpretation is then the only option. This accords well with the previously noted idea that focus is always maximally prominent.
3.1.4. Information structure summary

In this section we assumed that information structure mainly deals in notions that ultimately can only receive a pragmatic definition. We examined in some detail some of the strategies that languages use in order to express information-structure distinctions. The main idea of this section is that, despite the various ways in which languages mark information-structural distinctions, there seems to be a clear pattern: maximal prominence seems to be an overarching principle, and syntax serves to provide the necessary configurations for the phonology to realise this prominence. Given this pattern, the main desideratum coming from the analysis of information structure is that the SPI must somehow be mediated by, or receive information from, the phonological component too. This is in some sense in accordance with the classical Gricean view of pragmatics, where it is whole, complete utterances that are subject to enrichment via pragmatic processes. In the next section we turn to implicature and an argument that this Neo-Gricean view is not altogether watertight.
3.2. Implicature

Implicature is in many ways the pièce de résistance of formal pragmatics (together with presupposition, which is not discussed in this paper). Generally speaking, an implicature is something meant, implied, or suggested distinct from what is said (Davis 2008). Ever since Grice's seminal works dating back to the late 1950s, the question of what a speaker implicates, as opposed to what is said, has been a pivotal issue in pragmatics. Recall the illustration in (5), repeated in (31) for convenience:
(31) [Diagram: WHAT IS COMMUNICATED (CONSCIOUS) comprises WHAT IS SAID and WHAT IS IMPLICATED; WHAT IS SAID is in turn composed, at the UNCONSCIOUS level, of SENTENCE MEANING and the CONTEXTUAL INGREDIENTS OF WHAT IS SAID.]
Note that what is communicated, i.e. the sum of what is said and what is implicated, is attributed to a conscious cognitive level, while its constituent parts are at the unconscious level. This raises an immediate question: if what is said and what is implicated are at the same cognitive level, so to speak, does it follow that the processes that compute the two aspects of the global meaning are also similar? Could they even be the same? This interpretation of the graph is mine and not Recanati's; he pursues the issue in a different direction. Under Grice's view, and that of a large proportion of those working in the area, the answer would be emphatically negative. First, core pragmatic notions such as implicature calculation are not processes applicable to sentences but to actions, actions of saying, that is. This is in fact clear in Grice's work but has given rise to a lot of confusion, as Bach (2006) very pertinently notes. If this is indeed so, the following difficulty immediately emerges: if implicatures are really associated with acts of saying, then there is no straightforward point of contact between a linguistic representation and the mechanism that draws the implicatures, much less with a syntactic representation. This difficulty has indeed been considered a major obstacle to the construction of a theory of the syntax-pragmatics interface. Under the Neo-Gricean paradigm, pragmatic processes that enrich the compositional meaning of an expression take place at root level and are post-compositional. Horn's (2004: 3) formulation is revealing: In the Gricean model, the bridge from what is said (the literal content of the uttered sentence, determined by its grammatical structure with the reference of indexicals resolved) to what is communicated is built through implicature.
As an aspect of speaker meaning, implicatures are distinct from the non-logical inferences the hearer draws; it is a category mistake to attribute implicatures either to hearers or to sentences (e.g. P and Q) and subsentential expressions (e.g. some).
In other words they apply to whole utterances and make use of reasoning processes. Now the problem this account poses for us is that in order to build the bridge between what is said and what is communicated we need to compute what is said, the meaning of what is said. We picture then the communication process as follows:
(32) What is Said → What is Communicated
The issue with implicature then, to put it figuratively, is one of timing. This view has not gone unchallenged, however. Cohen (1971) discussed cases where implicatures had to be derived within the antecedent of conditionals, and their derivation therefore did not sit well within the Gricean edifice. From the point of view of the SPI, such embedded implicatures, if they indeed exist, represent a good case for the investigation of the SPI. Indeed, recently there has been renewed interest in a particular class of implicatures, namely quantity or scalar implicatures, with the work of G. Chierchia and his collaborators (Chierchia 2004, 2006, and references therein). Chierchia has proposed what has come to be known as the grammatical view of scalar implicatures, which holds essentially that scalar implicatures are not calculated as part of the post-compositional pragmatic processes but rather are factored in as we go along, so to speak. Put another way, pragmatics (or part of it) is also a recursive procedure, just like syntax and semantics. As already noted, if this turns out to be right it is also good evidence for some sort of direct interfacing between syntax and pragmatics. Before proceeding any further, it should be stressed that there have been several responses to the grammatical view of implicatures. The responses aim to provide a fully Gricean account of the cases that Chierchia and others discuss. The work of Geurts (2009, 2010), Sauerland (2004), and Horn (2006) is representative of the recent responses to the grammatical view. The work of Geurts (2010) is especially useful here, as it constitutes the most detailed criticism and also contains many more references to other relevant work within the (neo-)Gricean paradigm. These criticisms/responses notwithstanding, the grammatical view of implicatures is the only one which gives us an insight into the nature of the SPI. As such it merits special attention in this article.
The main argument in favour of the grammatical view of scalar implicatures is that they can, and sometimes must, occur in arbitrarily embedded positions. If this is correct it follows that at least some SIs cannot be calculated at the global, post-compositional level. Let's begin with an example of SI calculation according to the standard Neo-Gricean model. Consider (33) (which is broadly based on an example of Chierchia's from Chierchia 2006).

(33) In Sunday's concert most violins were out of tune

The Neo-Gricean story tells us that (33) is associated with the following alternatives, which are also being considered:

(34) a. In Sunday's concert some violins were out of tune
     b. In Sunday's concert many violins were out of tune
     c. In Sunday's concert most violins were out of tune
     d. In Sunday's concert all violins were out of tune
The standard reasoning now goes as follows:

(35) a. The speaker uttered (34c) rather than (34a), (34b), or (34d).
     b. Now, (34d) entails (34c), which entails (34b), which in turn entails (34a).
     c. By the maxim of quantity: if the speaker had enough information to warrant (34d), given that (34d) is stronger than (34c), he would have said so.
     d. The speaker has no information warranting (34d).
     e. The speaker is well informed about the relevant facts.
     f. Therefore: the speaker has evidence that (34d) does not hold.

As Chierchia points out, the final step in this reasoning requires a leap of faith, i.e. it does not follow directly from the maxims and logic. Key to Chierchia's argument are the following observations. First, although the reasoning in (35) does involve utterances, it is automatic and unconscious, much as is the case with semantic and syntactic processing. Furthermore, he suggests that "it may be wrong to limit processes of this sort to root sentences. (…) But embedded clauses are, after all, potential utterances. And surely speakers do routinely work out the possible conversational effects of potential utterances." The suggestion then is that implicature calculation can happen at 'scope sites' freely and cyclically. Leaving specific formal details to one side, empirically, as Chierchia shows, sentence (36a) gets the reading in (36b) if implicature calculation (meaning enrichment) happens at the root level, and (36c) (which is a stronger meaning) if the implicatures are factored in at the level of the embedded clause. The examples in (36) and (37) are from Chierchia (2006: 547−548).

(36) a. John believes that many of your students complained.
     b. John believes that many of your students complained and it is conceivable for all that John believes that not all did.
     c. John believes that many though not all of your students complained.

It seems that Chierchia is correct that (36c) is indeed the preferred reading.
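The reasoning in (35) can be pictured, very schematically, as negating every unuttered stronger alternative on the quantifier scale. The sketch below is mine, not a formalisation from the literature; the scale encoding and helper name are illustrative assumptions:

```python
# A minimal rendering of the effect of the Neo-Gricean reasoning in (35):
# uttering a scalar item implicates the negation of each stronger,
# unuttered alternative on the scale.
SCALE = ["some", "many", "most", "all"]  # ordered weakest to strongest

def scalar_implicatures(uttered: str) -> list:
    """Return the implicated negations for an uttered scalar item."""
    i = SCALE.index(uttered)
    # Steps (35c-f): stronger unuttered alternatives are taken not to hold.
    return [f"not {alt}" for alt in SCALE[i + 1:]]

print(scalar_implicatures("most"))  # -> ['not all']
print(scalar_implicatures("many"))  # -> ['not most', 'not all']
```

Note that the final inference step, (35f), is precisely the 'leap of faith' Chierchia points to: the code simply stipulates it, just as the Gricean reasoner must.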
Compare the above with (37):

(37) a. John doubts that many of your students complained.
     b. John doubts that many but not all of your students complained.
     c. John doesn't believe that many of your students complained but believes that some did.

In this case the preferred reading is (37c), where the implicatures are calculated at the root level, and not (37b), where calculation happens at the embedded position. Although the presentation here is extremely simplified, I think that the basic idea should be clear: the kind of reasoning process that is responsible for the calculation of implicatures should be allowed to take place in embedded positions. I will leave much of the semantic detail aside. From our point of view it is important to note that the process in question is not random but triggered by specific elements, the scalar elements like the quantifiers which participate in the scale some < many < most < all. What is the formal marking on scalar elements that triggers the implicatures? Chierchia suggests the following implementation: assume that there is a feature [±σ] and scalar items may be specified for [+σ] in
which case they have active alternatives and must trigger implicatures, whereas an item with [−σ] does not have active alternatives and therefore cannot trigger local implicatures. Interestingly, Chierchia notes that the σ feature corresponds to the F feature used in focus semantics. The σ feature has further syntactic properties. Following Fox (2003), Chierchia suggests that an operator corresponding to the strongest meaning could be introduced at different syntactic positions, whose effect would be to 'lock in' the implicatures and whereby σ[S] will have as its meaning 'the (strongest) enriched meaning of S'. And here comes the syntactic proposal: Chierchia suggests that the ±σ feature can be uninterpretable and in need of checking against a σ operator located higher up in the structure. Vice versa, −σ must have a [+σ] in its scope:
(38) [IP σ [IP [CP if many[+σ] complained ] IP ]]

(39) [IP [CP if [IP σ [IP many[+σ] complained ]]] IP ]
The trees in (38) and (39) show the syntactic proposal clearly. Perhaps one of the strongest arguments in favour of a syntactic representation of this or a similar type comes from the fact that the operator in question gives rise to intervention effects. Observe the following Italian sentence (40) or its Greek counterpart (41):

(40) ??Un linguista ha sposato un qualunque dottore. [Italian]
     a linguist has married a whatever doctor
     'A linguist married some (unimportant) doctor.'

(41) ??Mia glossologos padreftike enan opjodipote giatro. [Greek]
     one.FEM linguist married a.MASC whatever doctor
     'A linguist married some (unimportant) doctor.'
Both sentences are odd out of context. In Chierchia's account the free choice elements qualunque and opjosdipote carry an uninterpretable [+σ] feature. I am simplifying the actual semantic details significantly here, but the syntactic points are unaffected. This feature must be checked by the interpretable operator σ, whose effect is to lock in the implicatures. At the same time, the DPs Un linguista/Mia glossologos carry an uninterpretable [−σ] feature. The resulting structure is:

(42) *σ [DP[−σ] [ … DP[+σ]]]

Chierchia's claim is that DP[−σ] intervenes between σ and the DP[+σ] and that this is a case of a minimality violation (Rizzi 1990). As a result, sentences like (40) and (41) are ruled out by the syntax. The assumption here is that any σ-marked DP found between the σ operator and the DP that it operates on must carry the same type of σ marking. If this approach to the calculation of scalar implicatures is correct, what does it tell us about the syntax-pragmatics interface? First of all, we need to clarify that we take implicature calculation to be indeed a pragmatic process. By this we mean that it is really a version of the reasoning that we presented in (35) that is responsible for generating the implicatures, and that the syntactic marking is responsible for triggering that mechanism. This is, I think, the essence of having a recursive pragmatics. A different way of understanding this would be to suggest that it is really part of the semantics. The distinction is rather subtle. The argument can be made that what Chierchia's account shows is that scalar inferences are part of the main meaning of sentences, wholly computed by the semantics. In this way the semantics−pragmatics distinction is maintained, albeit somewhat recalibrated. We take the former approach here and assume that pragmatic processes must apply cyclically at the relevant sites of the computation.
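The intervention configuration in (42) amounts to a locality condition on σ checking. The following is a hypothetical sketch of that condition (the function name and list encoding are mine, not Chierchia's implementation):

```python
# Illustrative check of the minimality configuration in (42): a σ operator
# cannot check a [+σ] DP across an intervening DP with a different σ value.

def sigma_checking_ok(chain: list) -> bool:
    """chain: σ values of the DPs c-commanded by the σ operator, outermost
    first, ending with the goal DP. Checking fails if any intervening DP
    carries a different σ value from the goal (relativized minimality)."""
    goal = chain[-1]                        # the DP σ operates on, e.g. "+σ"
    return all(f == goal for f in chain[:-1])

print(sigma_checking_ok(["+σ"]))        # no intervener -> True
print(sigma_checking_ok(["-σ", "+σ"]))  # (42): [−σ] intervenes -> False
```

On this rendering, (40) and (41) instantiate the second call: the subject's [−σ] sits between σ and the free choice item's [+σ], so the derivation is ruled out by the syntax.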
We turn now, in the final section, to the kind of model these considerations suggest.
4. The model

Let's take stock. What we saw in the previous sections on information structure and implicature calculation highlights the following characteristics of the syntax-pragmatics interface. First, for the purposes of information structure, it seems that the syntax cannot interface with the pragmatics independently of phonology. On the other hand, the studies on implicature suggest that pragmatic processes must apply (or at least must be allowed to apply) to structural units that are less than entire sentences but complete enough to be scope sites. Add to that the fact that implicature calculation must apply to interpreted subsentential units (otherwise it would not be able to provide the necessary meaning enrichment) and the result is the following. The classical Y model of grammatical architecture is inadequate, since it allows neither LF−PF interactions nor local (pre-spell-out) application of non-syntactic processes. In its phase-based incarnation the Y model is more promising, but again it seems that the required interactions between meaning and sound cannot be accounted for. Furthermore, there would be no meaningful interfacing between syntax and pragmatics in this model either. As a result, I believe that the appropriate model of grammatical architecture must allow both local computation and spell-out, but also some kind of reintegration of structure before the computation proceeds. Perhaps a picture will be more helpful. My proposal is depicted in (43):
(43) [Diagram: cyclic derivation. At each cycle the structure is spelled out and mapped to Phon and Sem; local Prag(matics) applies to the interpreted phase, and the structure is reintegrated before the derivation proceeds. The final cycle yields PF and LF, which feed the A-P and C-I systems respectively, with a GLOBAL PRAGMATICS component operating on the output.]
According to this model, spell-out is cyclic, represented in (43) by the small squares; we can assume for concreteness that these are the phases. After spell-out, the mapping to phonology and semantics takes place as usual. The novelty is that the structure is put together again before the derivation can proceed. Crucially, local, recursive pragmatics applies at this level of 'interpreted' subsentential constituents, interpreted phases. The result is that the local pragmatic processes are now able to refer to both the semantic and phonological properties of the structure. The result of the application of local pragmatics at the intermediate levels is the creation of phase-sized elements with 'strengthened' meanings and information-structure distinctions. Working out in detail how the full meaning is built up within this approach would take us too far afield and must be left for another occasion. There are two final points I would like to mention before concluding. The first concerns the question, raised at various points in the article, regarding the place of pragmatics. Recall that we made the (terminological) distinction between mappings and interfaces. It is clear, it seems to me, that local pragmatics would have to be a mapping; a second-level mapping perhaps, since it maps interpreted phases to enriched structures, but a mapping nonetheless. It is not an external system, and we should not be talking of the syntax-pragmatics interface in the case of local pragmatics. What about the box that says 'Global Pragmatics' in (43)? I assume this represents the overall felicity conditions of sentences/utterances, which are determined independently and have a global effect. Pragmatic aspects of felicitous conversation and use, for example, will come in there. These will be dependent on the processes of the CI and AP systems as well, so they are in a real sense external. Thus felicity conditions and so forth will have to be formulated on top of strengthened meanings.
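The derivational loop just described can be rendered schematically in code; all function names below are my own illustrative placeholders, a sketch of the proposed architecture rather than a claim about the actual computational system:

```python
# Schematic rendering of the cyclic architecture in (43): each phase is
# spelled out, mapped to Phon and Sem, enriched by local pragmatics (which
# sees both mappings), and reintegrated before the next cycle.

def phon(phase: str) -> str:
    return f"phon({phase})"

def sem(phase: str) -> str:
    return f"sem({phase})"

def local_pragmatics(phon_rep: str, sem_rep: str) -> str:
    # Local enrichment (e.g. scalar implicatures, IS distinctions) refers
    # to both the semantic and the phonological properties of the phase.
    return f"enriched({sem_rep}, {phon_rep})"

def derive(phases: list) -> list:
    structure = []
    for phase in phases:                              # cyclic spell-out
        enriched = local_pragmatics(phon(phase), sem(phase))
        structure.append(enriched)                    # reintegration
    return structure  # feeds LF/PF and then global pragmatics

print(derive(["vP-periphery", "CP-periphery"]))
```

Global pragmatics (felicity conditions and the like) would then operate on the list this loop returns, i.e. on already strengthened meanings, in line with its external status.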
Finally, we need to ask the question of the locus of cyclic spell-out. We referred to it in terms of phases, but we refrained from endorsing the standard view that the phases are vP and CP. From what we saw and what we have so far proposed, it seems that bare vPs will not do as loci of cyclic spell-out. Chierchia's views on the syntactic implementation of the strongest meaning operator σ at scope sites are reminiscent of a growing body of work in the last 15 years or so that has sought to establish quantificational dependencies by splitting the functions between an indefinite-like element and an operator that provides its quantificational force (Beghelli 1995; Beghelli and Stowell 1997; Butler 2004; Gill and Tsoulas 2009; Hallman 2000; Kratzer 2005; Kratzer and Shimoyama 2002; Sportiche 1997; Tsoulas 2003). If we combine this observation with the ideas on the left periphery stemming from Rizzi's (1997) work and their reinterpretation here, we arrive at a unified view of the phasal periphery as containing both the scope heads (including Chierchia's σ head), thus being ipso facto scope sites, and the syntactic coding of information-structural distinctions. Movement patterns will be derived solely on the basis of the presence (or absence) of EPP features. For evidence in favour of information structure projections above vP, see Belletti (2004) and Jayaseelan (2001).
5. Conclusion

This study of the syntax-pragmatics interface has been, by necessity, very partial. There are several issues that have not been considered here for reasons of space. However, I believe that the two major areas covered, information structure and implicature calculation, gave us the core requirements for the study. As our main goal was to elucidate the architecture, as opposed to listing a number of phenomena where an appeal to pragmatics appears necessary without knowing how or within which parameters such an appeal can be made, we had to leave out some of these cases. These include the syntax of discourse particles, parentheticals, and other elements that come under the heading of expressive content, as well as cases where covert demonstrative pronouns are assumed, as in accounts of quantifier domain restriction. These can be fruitfully studied within the model proposed here. Note also that under this model the asymmetry between semantics and phonology in terms of levels of representation disappears. Our main conclusion is that there is a core of pragmatics that should be considered on a par with semantics and phonology in the sense that it provides distinctions that are internal to language. We suggested that these distinctions should be factored in locally, on interpreted phasal constituents which include syntactic markings that trigger the pragmatic processes. To go back once more to the schema in (5)/(31), we may conclude that what is implicated (or at least part of it) and what is said are well placed where they are, in the unconscious part, where standard core linguistic computations take place. Thus the answer to the question of how much syntax and pragmatics draw on each other (a variation on Chomsky's question about semantics) is, perhaps unsurprisingly: syntax draws from pragmatics the relevant distinctions, and pragmatics draws from syntax the relevant configurations.
6. References (selected)
Bach, Kent 2006 The top ten misconceptions about implicature. In: Betty J. Birner and Gregory Ward (eds.), Drawing the Boundaries of Meaning: Neo-Gricean Studies in Pragmatics and Semantics in Honor of Laurence R. Horn, 21–30. Amsterdam: John Benjamins.
Beghelli, Filippo 1995 The Phrase Structure of Quantifier Scope. Ph.D. thesis, UCLA. [UCLA Dissertation #16].
Beghelli, Filippo, and Tim Stowell 1997 Distributivity and negation: The syntax of each and every. In: Anna Szabolcsi (ed.), Ways of Scope Taking, 71–107. Dordrecht: Kluwer Academic Publishers.
Belletti, Adriana 2004 Aspects of the low IP area. In: Luigi Rizzi (ed.), The Structure of CP and IP, 16–51. Oxford and New York: Oxford University Press.
Blakemore, Diane 2004 Discourse markers. In: Laurence R. Horn and Gregory Ward (eds.), The Handbook of Pragmatics, 221–240. Oxford: Blackwell Publishing.
Büring, Daniel 2010 Towards a typology of focus realization. In: Malte Zimmermann and Caroline Féry (eds.), Information Structure: Theoretical, Typological and Experimental Approaches, 177–205. Oxford: Oxford University Press.
36. The Syntax − Pragmatics Interface
Butler, Jonny 2004 Phase structure, phrase structure, and quantification. Ph.D. thesis, University of York.
Chafe, Wallace 1976 Givenness, contrastiveness, definiteness, subjects, topics, and points of view. In: C. N. Li (ed.), Subject and Topic, 25–55. New York: Academic Press.
Chierchia, Gennaro 2004 Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. In: Adriana Belletti (ed.), Structures and Beyond: The Cartography of Syntactic Structures, Volume 3, 39–103. (Oxford Studies in Comparative Syntax.) Oxford and New York: Oxford University Press.
Chierchia, Gennaro 2006 Broaden your views: Implicatures of domain widening and the "logicality" of language. Linguistic Inquiry 37(4): 535–590.
Chomsky, Noam 1955 The logical structure of linguistic theory. Revised 1956 version published in part by Plenum Press, 1975; University of Chicago Press, 1985.
Chomsky, Noam 1992 Explaining language use. Philosophical Topics 20(1): 205–231.
Chomsky, Noam 1995a Language and nature. Mind 104(413): 1–61.
Chomsky, Noam 1995b The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam 2000 Minimalist inquiries: The framework. In: R. Martin, D. Michaels, and Juan Uriagereka (eds.), Step by Step: Essays on Minimalist Syntax in Honour of Howard Lasnik. Cambridge, MA: MIT Press.
Chomsky, Noam 2001 Derivation by phase. In: Michael Kenstowicz (ed.), Ken Hale. A Life in Language, 1–52. Cambridge, MA: MIT Press.
Chomsky, Noam 2002 On Nature and Language. Cambridge: Cambridge University Press.
Chomsky, Noam 2004 Beyond explanatory adequacy. In: Adriana Belletti (ed.), Structures and Beyond, 104–131. Oxford: Oxford University Press.
Cohen, L. Jonathan 1971 The logical particles of natural language. In: Yehoshua Bar-Hillel (ed.), Pragmatics of Natural Language, 50–68. Dordrecht: Reidel Publishing Co.
Davis, Wayne 2008 Implicature. In: Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Winter 2008 edition.
De Cat, Cécile 2002 French dislocation. Doctoral dissertation, University of York.
Erteschik-Shir, Nomi 2007 Information Structure: The Syntax-Discourse Interface. (Oxford Surveys in Syntax and Morphology.) Oxford: Oxford University Press.
Fox, Danny 2003 The interpretation of scalar terms: Semantics or pragmatics? Or both? Paper presented at the University of Texas, Austin.
Geurts, Bart 2009 Scalar implicature and local pragmatics. Mind and Language 24(1): 51–79.
Geurts, Bart 2010 Quantity Implicatures. Cambridge: Cambridge University Press.
Gill, Kook-Hee, and George Tsoulas 2004 Peripheral effects without peripheral syntax. In: David Adger, Cécile de Cat, and George Tsoulas (eds.), Peripheries: Syntactic Edges and their Effects, 121–142. (Studies in Natural Language and Linguistic Theory 59.) Dordrecht: Kluwer Academic Publishers.
Gill, Kook-Hee, and George Tsoulas 2009 Issues in quantification and DP/QP structure in Korean and Japanese. In: Anastasia Giannakidou and Monika Rathert (eds.), Quantification, Definiteness & Nominalization. Oxford: Oxford University Press.
Ginzburg, Jonathan, and Ivan A. Sag 2001 English Interrogative Constructions. Stanford: CSLI Publications.
Goldberg, Adele E. 2004 Pragmatics and argument structure. In: Laurence R. Horn and Gregory Ward (eds.), The Handbook of Pragmatics, 427–441. Oxford: Blackwell Publishing.
Grice, H. Paul 1989 Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Hallman, Peter 2000 The structure of predicates: Interactions of derivation, Case and quantification. Ph.D. thesis, UCLA.
Han, Chung-Hye 1998 Asymmetry in the interpretation of -(n)un in Korean. In: Noriko Akatsuka, Hajime Hoji, Shoichi Iwasaki, Sung-Ock Sohn, and Susan Strauss (eds.), Japanese Korean Linguistics, Volume 7. Stanford: CSLI Publications.
Heycock, Caroline 2008 Japanese -wa, -ga, and information structure. In: Shigeru Miyagawa and Mamoru Saito (eds.), The Oxford Handbook of Japanese Linguistics. Oxford and New York: Oxford University Press.
Hirschberg, Julia 2004 Pragmatics and intonation. In: Laurence R. Horn and Gregory Ward (eds.), The Handbook of Pragmatics, 515–537. Oxford: Blackwell Publishing.
Horn, Laurence R. 2004 Implicature. In: Laurence R. Horn and Gregory Ward (eds.), The Handbook of Pragmatics, 3–28. Oxford: Blackwell Publishing.
Horn, Laurence R. 2006 The border wars: A neo-Gricean perspective. In: Klaus von Heusinger and K. Turner (eds.), Where Semantics Meets Pragmatics, 21–48. Amsterdam: Elsevier.
Horn, Laurence R., and Gregory Ward (eds.) 2004 The Handbook of Pragmatics. Oxford: Blackwell Publishing.
Inkelas, Sharon, and Draga Zec 1995 Syntax − phonology interface. In: John A. Goldsmith (ed.), The Handbook of Phonological Theory, 535–549. Cambridge, MA and Oxford: Blackwell.
Jackendoff, Ray 1972 Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press.
Jayaseelan, K. A. 2001 IP-internal Topic and Focus Phrases. Studia Linguistica 55: 39–75.
Kempson, Ruth, Wilfried Meyer-Viol, and Dov Gabbay 2001 Dynamic Syntax: The Flow of Language Understanding. Oxford: Blackwell Publishers.
Kiss, Katalin E. (ed.) 1995 Discourse Configurational Languages. Oxford: Oxford University Press.
Kratzer, Angelika 2005 Indefinites and the operators they depend on: From Japanese to Salish. In: Gregory N. Carlson and Francis Jeffry Pelletier (eds.), Reference and Quantification: The Partee Effect. Stanford: CSLI Publications.
Kratzer, Angelika, and Junko Shimoyama 2002 Indeterminate pronouns: The view from Japanese. In: Yukio Otsu (ed.), The Proceedings of the Third Tokyo Conference on Psycholinguistics, 1–25. Tokyo: Hituzi Syobo.
Levinson, Stephen C. 1983 Pragmatics. Cambridge: Cambridge University Press.
Li, Charles N., and Sandra A. Thompson 1976 Subject and topic: A new typology of languages. In: C. N. Li (ed.), Subject and Topic. New York: Academic Press.
May, Robert 1985 Logical Form: Its Structure and Derivation. Cambridge, MA: MIT Press.
Miyagawa, Shigeru, and Mamoru Saito (eds.) 2008 The Oxford Handbook of Japanese Linguistics. Oxford and New York: Oxford University Press.
Nespor, Marina, and Irene Vogel 1982 Prosodic domains of external sandhi rules. In: H. van der Hulst and N. Smith (eds.), The Structure of Phonological Representations, Volume 1, 222–255. Dordrecht: Foris.
Nespor, Marina, and Irene Vogel 1986 Prosodic Phonology. Dordrecht: Foris.
Potts, Christopher 2005 The Logic of Conventional Implicatures. (Oxford Studies in Theoretical Linguistics.) Oxford: Oxford University Press.
Recanati, François 2004 Literal Meaning. Cambridge: Cambridge University Press.
Rizzi, Luigi 1990 Relativized Minimality. Cambridge, MA: MIT Press.
Rizzi, Luigi 1997 The fine structure of the left periphery. In: Liliane Haegeman (ed.), Elements of Grammar: A Handbook of Generative Syntax, 281–337. Dordrecht: Kluwer.
Sauerland, Uli 2004 Scalar implicatures in complex sentences. Linguistics and Philosophy 27: 367–391.
Sportiche, Dominique 1997 Reconstruction and constituent structure. Talk presented at MIT, October 1997.
Steedman, Mark 2000 Information structure and the syntax-phonology interface. Linguistic Inquiry 31(4): 649–689.
Stemmer, Brigitte 1999 An on-line interview with Noam Chomsky: On the nature of pragmatics and related issues. Brain and Language 68(3): 393–401. Special issue: Pragmatics: Theoretical and Clinical Issues.
Truckenbrodt, Hubert 1995 Phonological phrases: Their relation to syntax, focus, and prominence. Ph.D. thesis, MIT.
Tsoulas, George 2003 Floating quantifiers as overt scope markers. Korean Journal of English Language and Linguistics 3(2): 157–180.
Vallduví, Enric 1992 The Informational Component. (Outstanding Dissertations in Linguistics.) New York: Garland Publishing.
George Tsoulas, York (UK)
VI. Theoretical Approaches to Selected Syntactic Phenomena

37. Arguments and Adjuncts

1. Introduction
2. Two basic properties of arguments and adjuncts
3. Dimensions of the argument notion
4. Dimensions of the adjunct notion
5. Conclusions
6. References (selected)
Abstract
The article discusses analytical options for arguments and adjuncts in syntax and semantics. Selection and non-suppressibility are identified as two core properties of arguments. Adjuncts are shown to be open to a whole array of modeling options, both in syntax and semantics. The overall trend in Generative Grammar and other frameworks diminishes the territory of adjunction proper and extends the reach of analyses which treat most traditional adjuncts as arguments.
1. Introduction
Few dichotomies in linguistics are as underdetermined as the dichotomy between arguments and adjuncts. The basic intuition underlying the opposition is that two different kinds of dependence may be observed between two linguistic expressions E1 and E2. If E1 is an argument of E2, then the presence of E1 in the construction together with E2 satisfies a need or requirement of E2 to become complete, or less incomplete; if E1 is an adjunct of E2, then, according to this intuition, the presence of E1 in the construction together with E2 has no import on the (in)completeness of E2. (Many researchers would add that if E1 is an adjunct, the presence of E2 has an import on the [in]completeness of E1.) On either construal, E2 is the head of the construction, which means that it determines the category and distribution of the resulting structure (Bloomfield 1933: 195). A general consequence of all this is that arguments should, in some sense, be obligatory in a construction, and adjuncts optional. A competing term for "argument" is "complement" unless the argument is a subject, and a competing term for "adjunct" is "modifier". In some traditions, the terms argument and adjunct are restricted to the syntax, and complement and modifier are restricted to the semantics. The research paradigms of Dependency and Valency Theory center around this basic contrast, leading to frameworks in which the dependency types of argumenthood and adjuncthood have a more basic status than the part-whole relation of constituency −
Tesnière (1959); Ágel et al. (eds.) (2003, 2006). Theories relying on constituency as the foundation of linguistic structure building will, by contrast, assign the two dependency types a somewhat secondary status in comparison with constituency. The present article focuses on the constituent-structural implementation of the argument-adjunct dichotomy, and it does so with an eye on current developments in Generative Grammar and, to a limited extent, in Head-Driven Phrase Structure Grammar (HPSG). Notwithstanding this orientation, it will be made clear in numerous places what the general analytical options are irrespective of the particular framework chosen here. Argumenthood may be an ambiguous notion (more on this in a moment), but it is uncontroversial as such. One of the most general modeling tools that science has at its disposal is functions, and functions take arguments. The arguments in linguistic theories are just this: arguments of functions in the mathematical sense (3.4). There is an ambiguity in the use of the linguistic term argument, though. It lies in the contrast between syntactic and semantic arguments. Syntactic arguments are arguments of syntactic functions, which map syntactic categories onto other syntactic categories. On an incremental reading of X-bar Theory (Jackendoff 1977), a constituent of category V taking an NP/DP argument will, for instance, be mapped onto a constituent of category V′. Semantic arguments are arguments of semantic functions, which map denotations onto other denotations. The CAPITAL-OF function, for instance, if it takes Italy as its argument, will yield Rome as output. The ambiguity does not seem to do much harm; we will often use the term "argument" without the adjunct "semantic" or "syntactic" if the context is sufficiently specific. By contrast, the problems with adjuncthood are numerous. First, there is no single uncontroversial modeling tool for adjuncts. This holds both in semantics (4.2) and in syntax (4.3).
Secondly, there is a trend to level out the difference between arguments and adjuncts such that adjuncts, too, are increasingly seen to be arguments of a special kind of function morphemes with abstract content (4.3.2). For uncontroversial arguments, at least for the non-internal ones, it is similarly argued (in Generative Grammar) that they are arguments of functions/heads of the same general kind that also accommodate adjuncts (3.2, 3.3, 4.3). The combined effect of these trends is that the difference between arguments and adjuncts disappears. The effect is further strengthened by a variety of observations that intuitive adjuncts do influence the syntactic distribution and the semantic type of the categories in which they occur (4.3.2/4.3.3), thereby threatening one of the hallmark properties of adjuncts, viz. endocentricity. Other researchers aim at giving the adjunct notion positive content by splitting it up into several sub-notions ordered along a scale of argumenthood or adjuncthood. Storrer’s (2003: 774) work belongs in this tradition, with its six different categories operationalized by an algorithm of four tests with two to three output values each. Jacobs (1994) was the first to emphasize on a principled basis that the problems with the argument-adjunct dichotomy can be seen as a consequence of the fact that the dichotomy actually captures a multitude of contrasts to be analyzed independently. We will introduce important components of Jacobs’ (1994) proposal in the section which deals with basic theoretical concepts (sect. 2) and relate back to them throughout the text. The article uses mostly data from the domain of attributive/adnominal adjunction for exemplification (as opposed to adverbial/clausal adjunction). The reader is referred to Maienborn and Schäfer (2011) and Article 9 of the present handbook for more discussion of adverb(ial)s.
The so-called cartographic tradition is given much space in our survey. The term "cartographic" characterizes proposals which aim at universal syntactic hierarchies for a multitude of fine-grained syntactic categories instead of relying on traditional endocentric adverbial or adnominal adjunction; cf. 4.3.2 for basic assumptions. Two outstanding works in the cartographic tradition dealing with adverbial and adnominal adjunction/modification are Cinque's (1999, 2010) books. While the present overview aims at compatibility with Cinque's proposals, it cannot reach their fine-grainedness. Moreover, our survey article aims at a combined syntax-semantics perspective which maps distributional regularities onto options for semantic composition, and requirements of semantic composition onto options for syntactic implementations. A note on terminology: the use of the term argument in this article always signals that, in the discussion at hand, a syntactic or semantic function/predicate is assumed which takes the respective argument, if only for expository reasons; the use of the term adjunct, by contrast, is agnostic. It just signals that the phenomenon at hand was classified as adjunction, or modification, by some author.
2. Two basic properties of arguments and adjuncts

2.1. (Non-)Suppressibility
As noted in the introduction, theories of argument structure assume that arguments are, in some sense, necessary constituents; deleting them should lead to ungrammaticality or some kind of deviance. This diagnostic is clear for the object in (1a), but it fails with the object of verbs like eat in (1b), and it yields no contrast between (1b) and (1c/d). (1)
a. Mary reached *(Estonia).
b. Paul is eating (a pizza).
c. Eddie made (Lisa) a cake.
d. Eddie made a cake (for Lisa).
Despite the omissibility contrast between (1a) and (1b), and the non-contrast between (1b) and (1c/d), most researchers will aim at a theory which categorizes both Estonia and a pizza in (1) as arguments, and not just Estonia. At the same time, (for) Lisa in (1c/d) would, for many researchers, be adjuncts even though these constituents behave no differently from a pizza if syntactic omissibility is the criterion. However, if the non-omissibility criterion of argumenthood (cf. Jacobs' 1994 obligatoriness) is conjoined with a second, semantic, criterion as in (2) (cf. Jacobs' 1994 participation), then we may arrive at a better diagnostic. (2)
Non-suppressibility of arguments
A constituent C of a simple declarative non-negated sentence S is an argument iff
(i) S is ungrammatical without C, or
(ii) S is grammatical without C, but entails the C relation (where "the C relation" is the semantic relation that links the content of C to the eventuality described by S).
(2) will declare (1a) an argument, because (i) is fulfilled. It will declare (1b) an argument, because (ii) is fulfilled; Paul is eating entails that there is something that Paul eats. It will, finally, declare (1c) an adjunct, because (ii) is not fulfilled; Eddie made a cake does not entail that there is someone who is made a cake. Similarly, (1d) may be an adjunct, because (ii) is not clearly fulfilled; Eddie made a cake does not straightforwardly entail that some other eventuality was tied to it (a person's intended benefit or detriment, for instance). To be sure, (2) contains a disjunction, but this is the price to pay if non-suppressibility is to be criterial both in the syntax and in the semantics, as appears to be desirable. The defect of the criterion lies elsewhere. We will see throughout the article that it is not at all easy to decide whether a given lexical item or construction "entails the C relation". The complementary criterion for adjuncts, which is to be regarded with the same reservations as (2), will then come out as in (3). (3)
Suppressibility of adjuncts
A constituent C of a simple declarative non-negated sentence S is an adjunct iff
(i) S is grammatical without C, and
(ii) S without C does not entail the C relation.
2.2. Selection
The second property that is almost invariably put to use to distinguish arguments from adjuncts is what I call selection here. The term selection is to capture the potential of a linguistic expression E1 to impose certain restrictions on a co-occurring expression E2. Traditionally, a verb governs the case of its object, i.e. it imposes a morphosyntactic restriction on a co-occurring expression. Hence, case government is an instance of morphosyntactic selection (Jacobs' 1994 formal specificity). At the same time, a verb will also impose semantic subcategorization requirements that its object must fulfill. Referents of objects of eat, for instance, must be tangible, whereas referents of objects of think through need not be. These are instances of semantic selection or presuppositions/subcategorization requirements (Jacobs' 1994 content specificity; cf. 3.4 for its formal implementation). Adjuncts, by contrast, may be said not to fulfill selectional requirements. On the face of it, nothing in make a cake requires the presence of a PP, or a PP with a specific semantic content, as is adjoined in (1d). As such, adjuncts may be said not to be selected. Instead, one may say that adjuncts themselves select the type of their host. In the case of for Lisa in (1d), for instance, the constituent to which the PP adjoins must be one describing a volitional action. Just like (non-)suppressibility, the notion of selection has problems tied to it. While no grammar theory that I know of makes do without selection, the direction of selection may be a matter of debate for a single co-occurrence of two linguistic expressions. This blurs the distinction between arguments and adjuncts again, and it does so in a fundamental way, as can be seen throughout the article.
Selection leads to structure building, or defines relations within an existing structure, in all sufficiently formal grammar models, among them Mainstream Generative Syntax, mainstream type-driven formal semantics, Categorial Grammar, Head-Driven Phrase Structure Grammar and unification-based grammars.
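The way selection drives structure building can be made concrete with a small sketch. The following Python toy is not drawn from any of the frameworks just named; the feature bundles and the verb entry are invented for illustration. It encodes a head's selectional requirements (here: case government by German helfen 'help', which governs the dative) and licenses combination only when the co-occurring expression satisfies them:

```python
# Toy model of selection: a head licenses a dependent only if the
# dependent satisfies the head's morphosyntactic requirements.
# All feature names and entries are illustrative assumptions.

def selects(head, dependent):
    """Return True iff `dependent` meets every selectional requirement of `head`."""
    requirements = head.get("selects", {})
    return all(dependent.get(feature) == value
               for feature, value in requirements.items())

# German 'helfen' governs dative case on its object (case government).
helfen = {"cat": "V", "selects": {"cat": "DP", "case": "dat"}}

dem_gast = {"cat": "DP", "case": "dat"}   # 'dem Gast'  (dative)
den_gast = {"cat": "DP", "case": "acc"}   # 'den Gast'  (accusative)

assert selects(helfen, dem_gast)       # dative object: licensed
assert not selects(helfen, den_gast)   # accusative object: case clash
```

The sketch shows only the argument side of selection; the point made in the text, that adjuncts may themselves select properties of their host, would simply reverse which of the two expressions carries the `selects` bundle.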
3. Dimensions of the argument notion
3.1 discusses the tree-geometrical regularities of argument taking. 3.2 and 3.3 illustrate the content of 3.1 with the help of two case studies. 3.4 deals with the semantic modeling of predicate-argument relations.
3.1. Syntactic positions of arguments in constituent structures
It would seem that, in a constituent structure, sisterhood should be the natural tree-geometrical relationship between a predicate and its argument. A predicate combines with an argument (restricted by selection), and the resulting structure assembles the two sister nodes of predicate and argument under the mother node. This is schematically depicted in (4) (linearization is irrelevant here and in the following). (4)
Predicate-argument constituent I
    [ predicate  argument ]
While it is true that sisterhood remains indispensable for the complementation of predicates, certain developments in the Generative tradition, especially the cartographic approach, have led to a downgrading of sisterhood as the privileged tree-geometrical configuration of complementation, and to an upgrading of the specifier-head relationship (i.e. a relationship that shares more tree-geometrical properties with prototypical modifier constructions than with complements that are sisters of heads). This development is ultimately a consequence of Generative Grammar’s quest for a universal syntactic and semantic decomposition of natural languages, and it will have similar repercussions in other frameworks. For these reasons the upgrading of specifier-head relationships to model complementation is a feature of Generative Grammar that we will look at in some detail. Its discussion will, moreover, provide necessary background for the treatment of the different ways in which adjuncts can be accommodated. Predicates with more than one argument are a domain where specifier-head relationships (or their notational variants) have always played a role in complementation (provided binary branching is assumed). Thus, a typical way to represent the syntax of a transitive verb, or the syntax of a ditransitive VP, is as in (5). (5)
Predicate-argument constituent II
    [ argumenti [ predicate P  argumentj ] ]
    (the inner constituent is the intermediate projection of P)
In a structure as in (5), argumenti is the subject of a transitive verb, or one of the objects of a ditransitive verb. Argumentj is the direct object, or the other object of the ditransitive construction. Argumenti is in a specifier position. To sum up, predicates with more than
one argument have always given rise to arguments in specifier positions. The two case studies to follow illustrate why the specifier implementation of arguments has gained in importance in current stages of Generativism.
3.2. Case study I: transitive agentive-causative structures
Chomsky (1993, 1995) introduced little v, i.e. a functional category to mediate between VPs and the functional structure above VP (chiefly Tense/TP, formerly IP). With little v in place, subjects were no longer introduced as specifiers of V (or of I, if no VP-internal subject analysis is adopted), but of v. Kratzer (1996; inspired by earlier work by Marantz 1984) delivered a semantic interpretation of little v (cf. Horvath and Siloni 2002 as opponents of the Little-v Hypothesis). In sentences with an agentive semantics it is a voice head which introduces the agent theta role of the event at hand. As such, it selects the agent DP as its specifier. The denotation of the VP itself no longer takes recourse to an agent argument. For this reason, Kratzer's theory leads to agent severance − the agent is cut off from the VP. The resulting structure is depicted in (6). (I often use X-bar notation in this article; this is done to ensure recognizability across frameworks, but I assume those representations to have notational variants in Bare Phrase Structure.) (6)
[vP agent DP [v′ v0 (Voice)  VP ]]
In (6) the agent/subject DP in Specv is no longer selected as an argument of the same predicate as the direct object in VP. It is the outer argument of v, and v's inner argument is VP. Agent severance thus constitutes a first example of what is meant by the upgrading of specifier-head relationships to accommodate arguments in the structure in recent Generativism. Part of the meaning and part of the syntactic selection potential of former V heads have been sourced out to a functional projection of their own. As a consequence of this move, the single DP argument of v is in Specv. The sister position of v accommodates the VP argument, or complement, of v. Kratzer (2005) further radicalizes the structure of agentive causative predications by disentangling v's causative and agentive semantics. A tree as in (7) will then have a denotation as in (7′) with the VP only denoting resultant states. (7)
[VoiceP DP [Voice′ Voice0 (+active) [CAUSEP CAUSE  VP ]]]
(7′) 'The referent of DP is the agent of an event e & e causes a state s & s is of the VP type.'
As said above, cascades of functional projections with binary branching render specifier positions as the norm for classical arguments. The sister constituents of such functional heads will host the next lower phrasal category. To be sure, these sister constituents are arguments, too, but they are so in a more theory-dependent way than the specifier arguments. Functional heads like Voice/v or CAUSE will be called F-heads henceforth (F for functional). Only at the bottom of a given projection line does sisterhood in a current Generative tree signal a more traditional predicate-argument structure as in (4) (an example would be the [PP P DP] constituent in a structure like [Paul … [+active [CAUSE [jumped [PP onto [DP the table]]]]]]). Heads taking such traditional arguments as sisters/ first arguments will henceforth be called L-heads (L reminiscent of lexical). Summing up, the ongoing (and controversial) proliferation of functional categories in Generative Grammar reverses the prototypical tree-geometry of argument-taking. While sisterhood is still a possible configuration to form predicate-argument constituents with L-heads, most arguments (in a traditional sense) are accommodated in specifier positions of F-heads. Section 4.3 below will address the proliferation of F-heads in more detail and will aim at linking it to empirical generalizations.
3.3. Case study II: German free datives vs. beneficiary PPs
Consider the examples in (8) and (9) for exemplification of the contrast between predicates on and off the main projection line.

(8) a. … dass er [ Paul     [ Fθ [ einen Drink machte ] ] ].   [German]
         that he   Paul.DAT        a     drink  made
         '… that he fixed Paul a drink.'
    b. [tree diagram]

(9) a. … dass er [ [ für Paul ] [ einen Drink machte ] ].      [German]
         that he     for Paul     a     drink  made
         '… that he fixed a drink for Paul.'
    b. [tree diagram]
(8) is a German example with a dative not subcategorized for, or selected, by the verb. (9) is a structure with a similar meaning, but the DP that was coded as a free dative in (8) has been replaced by a PP. The verb has been chosen so as to allow for a parallel contrast in English. Either structure entails that (the speaker thinks that) the subject referent considered Paul to have a benefit of (the fixing of) the drink. (8) has an F-head on the main projection line which licenses the argument position of the dative DP and denotes its thematic relation (beneficiency according to Pylkkänen 2002, a modalized experiencer relation according to Hole 2012, 2014, with the beneficiency entailment stemming from purposive structure inside the VP). Now, while both the preposition for and the thematic role head Fθ denote the predicate which selects Paul in the event described by the complete structure, Fθ is an F-head, and für is an L-head (3.2). One interesting detail about the structure of (8) is the different syntactic form that adjuncthood takes here if compared with (9). The adjunct construal in (9) is "self-contained" in the sense that the presence of the adjunct makes no difference on the main projection spine; the lower VP is dominated by another VP. Things are different in (8). Here the adjunct-like dative DP is, syntactically speaking, not an adjunct at all. It is dominated by the F category headed by the F-head introducing a thematic role. Therefore, at the level of F′, the DP Paul is a clear argument, and not an adjunct. Only if one takes the VP level as the point of reference may the complete F-structure on top of the VP count as an adjoined structure. But then again, this purported adjunction takes the form of a headed structure which determines the category of the highest category, i.e. FP, thereby defying the definitional adjunct property of endocentricity.
Given that the functional element licensing the DP Paul in (8) has no phonetic content, how can it be decided whether it sits on the main projection line, or whether it forms a constituent with Paul just like the preposition in (9)? The latter option is depicted in (8b′).

(8) b′. [tree diagram]
An argument in favor of (8b), and against (8b′), comes from the linearization options that free datives have in comparison with other bona fide dative arguments and in comparison with PPs. It turns out that the bare dative DPs pattern with other bare DP arguments, not with PPs. This is shown in (10) and (11).

(10) … dass er (für den Gast / dem Gast)           einen Drink machte   [German]
         that he  for the guest / the.DAT guest.DAT  a     drink fixed
     (für den Gast / *dem Gast)
      for the guest /  the.DAT guest.DAT
     '… that he fixed a drink for his guest.'
(11) … dass er (dem Gast)              half   (*dem Gast).   [German]
         that he  the.DAT guest.DAT      helped  the.DAT guest.DAT
     '… that he helped his guest.'
Just like other dative arguments not suspected of being hidden PPs (the dative with half 'helped' in [11]), the free dative may not surface in postverbal position in embedded clauses. This extraposed position is a natural position for PPs as in (10), especially for heavy PPs. A second argument for the analysis involving Fθ on the main projection line comes from languages that have overt thematic morphology in translational counterparts of (8). Bantu and Kartvel/South Caucasian languages have such verbal markers (the traditional category name is applicative in Bantu linguistics, and version in Kartvelian linguistics). Since this verbal morphology is affixed to the verb just as other verbal categories like aspect or tense are, it can be concluded that the tree-geometry is likewise analogous: since aspect and tense are projections on the main projection line, Fθ is, too. Two examples are given in (12). The Georgian example features additional complexity; Fθ is spelled out as the character vowel -i- for the objective version/voice on the indirect object person marker -m(i)- adjacent to the verb stem.

(12) a. Chitsiru chinagul-ir-a                atsikana mphatso.   [Chichewa; Bantu]
        fool     bought-APPL-FINAL.VOWEL      girls    presents
        'The fool bought the girls presents.' (Marantz 1993: 121)

     b. man   mo-mi-p'ara                            (me)   vašli.   [Georgian; Kartvel]
        (s)he has-1SG.OBJECTIVE.VERSION-stolen       to.me  an.apple
        'He stole an apple for me.' (Boeder 1968: 94; my gloss, D. H.)
3.4. Functions and arguments I: Basic modeling tools

Arguments are called arguments because they are viewed as arguments of functions in the mathematical sense. On the semantic side such functions may, depending on the phenomenon at hand and the theory used, have different kinds of entities in their ranges: individuals (in the case of definite descriptions); truth values (with sentences in those branches of formal semantics subscribing to a Fregean program; Heim and Kratzer 1998 and many more); events or situations (with sentences in some branches of event and situation semantics in the tradition of Kim 1966); and whatever a given semantic theory assumes to be a semantic type primitive. On the syntactic side an argument may be seen as an input to a function which maps one syntactic category onto another, such that a function f (say, V) which takes an argument of a specific category as input (say, DP) yields an output of the kind determined by f (say, V′). Further details depend on what precise syntactic framework is chosen. Within Generative Grammar, the most important difference at the time of writing is whether a version of X′-theory is assumed (Jackendoff 1977), or whether Bare Phrase Structure is given preference (cf. Art. 24).
37. Arguments and Adjuncts
1293
Over the past decades, the lambda calculus (Church 1936) has widely gained ground in the representation of lexical entries for functions and arguments in composition. This holds true not just in semantics, but also in morphology and formal pragmatics. For this reason, the conventions of the lambda calculus in the variant used by Heim and Kratzer (1998) are briefly introduced here. (13a) is the general format (to be explicated below), and (13b) presents an example, the lexical entry of red; the interpretation brackets symbolize the function from linguistic expressions to denotations, which applies at the interpretive interface. (13c) illustrates lambda conversion/Functional Application, i.e. how the function applies to an argument; the copula is ignored for ease of exposition, but could easily be rendered functional. (14) is an example with mathematical content for comparison.

(13) a. λα : φ . γ
     b. ⟦red⟧ = λx : x ∈ De & x is visible . x is red
     c. ⟦The rose is red⟧ = ⟦red⟧(⟦the rose⟧)
        = λx : x ∈ De & x is visible . x is red (the rose)
        = 1 if the rose is red, and 0 otherwise

(14) λx : x ∈ ℝ⁺ . √x (25) = √25 = 5

(14) represents the square root function as a lambda term. The domain of the function is restricted in that portion of the term which follows the colon and precedes the full stop; here the domain is restricted to the positive real numbers. A shorthand for the notation of such a domain restriction is λx∈ℝ⁺. Since the range of the (reverse) square function only has positive numbers in it, the domain of the root function must fulfill the same condition. In a similar vein, the domain of the red-function in (13b) is restricted to the domain of individuals which are visible − x must be from the intersection of De, the subset of the domain of entities D which has all and only the individuals in it, with the set of visible entities.
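The workings of such domain restrictions can be sketched in executable form. The following toy Python model (my own illustration; the names `RED`, `UNDEFINED` and the mini-model of entities are hypothetical, not from the text) treats a lambda term with a domain restriction as a partial function that yields a truth-value only for arguments satisfying the restriction:

```python
# Toy model of a Heim & Kratzer-style lambda term with a domain restriction
# (illustrative sketch only; all names are hypothetical).

UNDEFINED = object()  # sentinel: "no truth-value" (presupposition failure)

def restricted_lambda(restriction, condition):
    """Encode 'lambda x : restriction(x) . condition(x)'."""
    def f(x):
        if not restriction(x):
            return UNDEFINED       # x is outside the restricted domain
        return condition(x)        # truth-value: True ~ 1, False ~ 0
    return f

# A tiny model of entities and their properties:
entities = {"the_rose": {"visible": True,  "red": True},
            "love":     {"visible": False, "red": False}}

# [[red]] = lambda x : x is in De & x is visible . x is red
RED = restricted_lambda(lambda x: entities[x]["visible"],
                        lambda x: entities[x]["red"])

assert RED("the_rose") is True         # 'The rose is red' is true
assert RED("love") is UNDEFINED        # 'Love is red' has no truth-value
```

On this encoding, Love is red comes out neither true nor false: the function simply yields no truth-value, mirroring the presuppositional treatment of domain restrictions.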
The truth-condition to the right of the period delivers a truth-value for each member of the domain so restricted: either 1 for true if the visible individual is red, or 0 for false if it is not. With the domain of ⟦red⟧ restricted to the visible entities, a sentence like Love is red will, on a literal reading, be neither true nor false, but undefined; it will not have a truth-value at all. The mapping to truth-values is a convention which is not made explicit in (13b); just the truth-conditions themselves are stated. This is different in the mathematical example (14), in which the part to the right of the dot features the value description. If one were to render (13b) more similar to (14), one would have to write ‘⟦red⟧ = λx : x ∈ De & x is visible . 1 if x is red and 0 otherwise’. The general format in (13a) comes out as follows in natural language (for functions with truth-values in their range): that function which maps every α such that α fulfills φ to 1 if α fulfills γ, and to 0 otherwise. The domain restrictions in (13) are the general way to write down presuppositions or subcategorization requirements (semantic selection). They are really just that: domain restrictions, in the sense that the whole function only delivers a value if the argument is taken from the set singled out by the domain restriction (Heim and Kratzer 1998, implementing a philosophical tradition going back to Frege [1892] 1994 and Strawson 1950). There is a useful and widespread notational convention to write down the semantic type of a given one-place function as an ordered pair of input and output types. A function of type ⟨e,t⟩, for instance, is a function from (the domain of) individuals
(type e) to (the range of) truth-values (type t). This way of writing down semantic types is a good way to render the argument-taking potential of complex lambda-terms visible at a glance.

(15) Tab. 37.1: Examples of lambda terms and argument saturation with different categories

One argument / type ⟨e,t⟩ − ‘that function which maps every x such that x is an element of De (the domain of individuals) to 1 (true) if x …, and to 0 (false) otherwise’:

− adjectival I: example: The rose is red.; lambda term: λxe . x is red; paraphrase: ‘… is red …’; argument saturation by Functional Application: λxe . x is red (the rose) = 1 iff the rose is red
− nominal: example: The flower in the vase is a rose.; lambda term: λxe . x is a rose; paraphrase: ‘… is a rose …’; argument saturation by Functional Application: λxe . x is a rose (the flower in the vase) = 1 iff the flower in the vase is a rose
− verbal I: example: Paul sleeps.; lambda term: λxe . x sleeps; paraphrase: ‘… sleeps …’; argument saturation by Functional Application: λxe . x sleeps (Paul) = 1 iff Paul sleeps

Two arguments / type ⟨e,⟨e,t⟩⟩ − ‘that function which maps every x such that x is an element of De (the domain of individuals) to that function which maps every y such that y is an element of De to 1 (true) if y … x, and to 0 (false) otherwise’:

− adjectival II: example: Mary is proud of Paul.; lambda term: λxe . λye . y is proud of x; paraphrase: ‘… is proud of …’; argument saturation by Functional Application: λxe . λye . y is proud of x (Paul)(Mary) = λye . y is proud of Paul (Mary) = 1 iff Mary is proud of Paul
− verbal II: example: Mary loves Paul.; lambda term: λxe . λye . y loves x; paraphrase: ‘… loves …’; argument saturation by Functional Application: λxe . λye . y loves x (Paul)(Mary) = λye . y loves Paul (Mary) = 1 iff Mary loves Paul
Frequently, a predicate takes n-many arguments, with n > 1. In this case there are two ways to proceed. Either the function written down as a lambda-term takes an n-tuple
as its argument. The other option is to apply one argument after the other, which means that the first function is, after the first instance of Functional Application, mapped to an intermediate function (in the case of two arguments), before the application of the second argument leads to the final output. This piecemeal kind of saturation with schönfinkeled, or curried, functions (Schönfinkel 1924), as they are called, is the way to go in a binary branching syntax in which each syntactic node is to have a denotation. The semantic type of a transitive verb, for instance, will then come out as ⟨e,⟨e,t⟩⟩, a function from individuals to [a function from individuals to truth-values]. Exemplification is provided for the transitive adjectives and verbs in (15). Domain restrictions that go beyond those which are commonly represented as indices are left out.
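The contrast between the n-tuple option and the schönfinkeled/curried option can be made concrete in code (a sketch over a hypothetical toy model of my own, not an implementation from the literature):

```python
# Two encodings of a two-place predicate 'love' over a toy set of facts.
LOVE = {("Mary", "Paul")}   # pairs (lover, lovee) -- hypothetical model

# Option 1: a single function taking an n-tuple (here: a pair) as argument.
def loves_tuple(pair):
    lover, lovee = pair
    return (lover, lovee) in LOVE

# Option 2: the schoenfinkeled/curried variant of type <e,<e,t>>.
# Each application yields an intermediate function, as required if every
# node of a binary-branching tree is to have a denotation of its own.
def loves_curried(lovee):
    def intermediate(lover):          # an <e,t> function: 'loves lovee'
        return (lover, lovee) in LOVE
    return intermediate

# Both encodings agree on the facts of the model:
assert loves_tuple(("Mary", "Paul")) == loves_curried("Paul")("Mary") == True
assert not loves_curried("Mary")("Paul")
```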
4. Dimensions of the adjunct notion

There is a confusing array of proposals to get a theoretical grip on the adjunct notion. This holds both in syntax and semantics. For this reason the present section assembles discussions of dichotomies (and one four-way distinction) which all divide the class of adjuncts into complementary subclasses. Which of these modeling tools are made use of is a matter of theoretical persuasion, desired granularity for the discussion of the issue at hand, and taste. Not all of the empirical phenomena discussed in this section are classified as adjunction phenomena by all researchers. But each dichotomy has figured in the classification of (subclasses of) adjuncts in the literature.
4.1. The major split: attributes vs. adverb(ial)s

The dichotomy of attributes vs. adverb(ial)s is probably the best-known partition of the adjunct or modifier class. Simplifying somewhat, one may say that attributes are adjuncts of nominals; adverb(ial)s are adjuncts of non-nominals. The second-level dichotomy of adverbs vs. adverbials usually concerns the phrasal status of the adjunct: adverbs are words/terminal nodes, whereas adverbials are phrases (this does not preclude adverbs from being analyzed as being dominated by non-branching phrasal nodes in a given theory). English and Romance are among those languages that distinguish adverbial uses of predicative adjectives from attributive uses by suffixation of an adverbial marker (-ly in English, -mente in Italian). Examples are given in (16) and (17).

(16) a. a [wise reply]
     b. She [replied wise-ly].

(17) a. una [risposta sapiente]
        a reply wise
        ‘a wise reply’
     b. Lei [ha risposto sapiente-mente].
        she has replied wise-ly
        ‘She replied wisely.’
[Italian]
It was stated above that characterizing attributes as adjuncts of nominals involves a simplification. The simplification lies in the fact that not each adjunct within a nominal is an attribute, which can easily be seen from an example like the frequently useless purchases, where frequently is an adjunct inside the larger nominal, but an adverbial adjunct (as opposed to an attributive adjunct). The reason for this is, of course, that frequently is an adjunct of useless. As such, it is an adjunct inside a nominal, not an adjunct of the nominal proper. In terms of binary branching constituent structures this means that an adjunct of a nominal is an attribute if and only if it, or its maximal projection, is on the main projection line of the nominal.
4.2. Modes of semantic construal for adjuncts

The four-way distinction dealt with here is one concerning principles of semantic composition. If one aims at a close syntax-semantics fit, the differences in composition principles may lead to syntactic reflexes and vice versa. A summary of the possible matchings of composition principles and syntactic categories is deferred until 4.10.
4.2.1. Functions and arguments II: implementations for adjuncts

The first possible semantic construal for adjuncts is to have one of the two elements denote a semantic function or predicate, and the other its argument. This is no different from the situation with syntactic predicates and arguments discussed in section 3. What is different in the case of adjuncts is that the adjoined element may, on the semantic side, easily be either a predicate or an argument, depending on the construction at hand and the construal chosen. The first option is to construe the adjunct as the semantic predicate, and its sister as its argument; cf. (18). This is the traditional view in Dependency Theory and Categorial Grammar, which holds that modifiers or adjuncts are predicates, but a kind of predicate which, after it has combined with its argument, results in an expression of the type of the argument, and not of the type of the predicate (endocentricity). It is also the standard treatment of adjuncts in HPSG (Pollard and Sag 1994); but see 4.3.3/scenario I below. A consequence of this view is that the adjunct selects its sister constituent in the sense of 2.2. On this view, red in red ball is a predicate which has an argument position for a (nominal) predicate and yields a (nominal) predicate as its outcome.

(18) a. example: red ball (construal A)
     b. predication: predicate − argument
     c. types: ⟨⟨e,t⟩,⟨e,t⟩⟩ − ⟨e,t⟩
     d. denotations: λf⟨e,t⟩ . λxe . red(x) = f(x) = 1; λxe . ball(x)
     e. composition by Functional Application:
        λf⟨e,t⟩ . λxe . red(x) = f(x) = 1 [λxe . ball(x)] = λxe . red(x) = ball(x) = 1
        ‘that predicate which is true of all and only the red balls’, i.e. that characteristic function from individuals x to truth-values which yields 1 iff x is red and a ball, and 0 otherwise
The second guise of predicate-argument construals for adjunction is the reverse of the first one: the adjunct is the argument now, and the category that it adjoins to is the predicate (selection originates in the syntactic sister of the adjunct now). This is exemplified for the same example as in (18) above, and for two more examples in (20) and (21). The likely concern of some readers that this appears to make less sense for (19) than for the other examples will be addressed shortly.

(19) a. example: red ball (construal B)
     b. predication: argument − predicate
     c. types: ⟨e,t⟩ − ⟨⟨e,t⟩,⟨e,t⟩⟩
     d. denotations: λxe . red(x); λf⟨e,t⟩ . λxe . ball(x) = f(x) = 1
     e. composition by Functional Application:
        λf⟨e,t⟩ . λxe . ball(x) = f(x) = 1 [λxe . red(x)] = λxe . ball(x) = red(x) = 1
        ‘that predicate which is true of all and only the red balls’, i.e. that characteristic function from individuals x to truth-values which yields 1 iff x is red and a ball, and 0 otherwise
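That construals A and B deliver the same result for red ball can be verified mechanically. In the following sketch (with toy extensions for red and ball of my own invention) the adjunct is the function under construal A and the saturating argument under construal B:

```python
# Toy extensions (hypothetical): which individuals are balls / red things.
balls, red_things = {"b1", "b2"}, {"b1", "r1"}

# Construal A (18): the adjunct is the function, type <<e,t>,<e,t>>.
red_A = lambda f: (lambda x: x in red_things and f(x))
ball  = lambda x: x in balls
red_ball_A = red_A(ball)

# Construal B (19): the noun is the function; the adjunct saturates its slot.
red    = lambda x: x in red_things
ball_B = lambda f: (lambda x: x in balls and f(x))
red_ball_B = ball_B(red)

# Both construals yield the characteristic function of the red balls:
for x in balls | red_things:
    assert red_ball_A(x) == red_ball_B(x)
assert red_ball_A("b1") and not red_ball_A("b2") and not red_ball_A("r1")
```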
(20) a. example: Canadian − national anthem
     b. predication: argument − predicate
     c. types: ⟨e,t⟩ − ⟨⟨e,t⟩,⟨e,t⟩⟩
     d. denotations: λxe . Canadian(x); λf⟨e,t⟩ . λxe . x is the national anthem of the country of which f is the corresponding property of individuals
     e. composition by Functional Application:
        λf⟨e,t⟩ . λxe . x is the national anthem of the country of which f is the corresponding property of individuals [λxe . Canadian(x)] = λxe . x is the national anthem of the country of which ‘being Canadian’ is the corresponding property of individuals

(21) a. example: speak rudely
     b. predication: predicate − argument
     d. denotations: λm . λxe . x speaks in the m-manner; m is rude
     e. composition by Functional Application:
        λm . λxe . x speaks in the m-manner [m is rude] = λxe . x speaks in the rude manner
This construal of adjuncts as semantic arguments is certainly pertinent for cases of relational nouns (or nominals) whose arguments may be adjectives. National anthem in (20) is such a relational nominal, because each referent which is a national anthem is an anthem of a single particular state. As is the case in (20), an adjective derived from a state name, or a corresponding of-phrase, may saturate this argument slot. Somewhat less intuitively, the adverbial adjunct in (21) is construed as a manner argument which saturates a postulated manner argument slot of the verb speak (on the construal of [21]). If one wants to argue seriously for such a construal, the assumption of a specific type primitive m for manners will have to be justified; cf. Dik (1975) or Maienborn and Schäfer (2011) for implementations. Turning to red ball in (19), we must ask ourselves
whether anything can be said in favor of construal B as opposed to construal A in (18). Is there anything to support the idea that ball has a denotation which is unsaturated for an additional (color) predicate? In fact, Morzycki (2005) claims that this is one way of modeling modification or adjunction quite generally. Since the different options in this domain are complex and require closer scrutiny, we will return to the contrast between adjuncts as semantic arguments and adjuncts as semantic predicates in a separate subsection (4.3). The present subsection 4.2 continues to provide an overview of the different possible composition mechanisms for adjunction phenomena. So far, we have looked at the predicate-argument construal. We shall now turn to the second general option: Predicate Modification.
4.2.2. Predicate Modification

Predicate Modification is a rule of composition which allows one to derive denotations of sisters in a tree which are of the same (predicative) type. As such, these sisters could not be combined by Functional Application, because Functional Application requires a difference in semantic type, or argument structure, between the two constituents to be combined (cf. 3.4). Predicate Modification leads to the conjunction of the truth-conditions of the combined predicates. (22) is Heim and Kratzer’s (1998: 65) definition of Predicate Modification, designed for predicates with a single argument slot for individuals. A rule with this function was first introduced by Higginbotham (1985) under the name of theta-identification. (23) is an example.

(22) Predicate Modification⟨e,t⟩
     If α is a branching node, {β, γ} is the set of α’s daughters, and ⟦β⟧ and ⟦γ⟧ are both in D⟨e,t⟩, then ⟦α⟧ = λxe . ⟦β⟧(x) = ⟦γ⟧(x) = 1

(23) Composition by Predicate Modification⟨e,t⟩
     ⟦ [ [red] [ball] ] ⟧ = λxe . ⟦red⟧(x) = ⟦ball⟧(x) = 1
     = λxe . red(x) = ball(x) = 1
     = λxe . x is red and a ball

(22), due to the reference to the (unordered) set of daughters of α, is not sensitive to linearization, i.e. it can deal with adjuncts preceding and following heads. The rule has no symbol of conjunction in it, as one might expect if one looks at its natural-language paraphrase in the last line of (23), which conjoins truth-conditions. The reason for this is the general (Fregean) decision to have sentences refer to truth-values. If ⟦red⟧(x) has either 1 or 0 as its output, it makes no sense to write ⟦red⟧(x) & ⟦ball⟧(x), since this would amount to the undefined and nonsensical conjunction of truth-values, i.e. to 1&1, or 1&0, or 0&0. It depends on features of the desired theory whether one finds Predicate Modification stipulative and too crude, or usefully versatile and even parsimonious. Some factors
which play a role in this decision are whether uniformity of composition rules is aimed at (a minus point for Predicate Modification), how coarse-grained (plus) or fine-grained (minus) the type distinctions are to be, and to which degree higher-order types are tolerated in (minus) or banned from (plus) the desired theory.
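The rule in (22) can be rendered as a composition rule in code; the sketch below (with toy denotations of my own) conjoins the truth-conditions of two ⟨e,t⟩ sisters without an explicit conjunction symbol, mirroring the ‘⟦β⟧(x) = ⟦γ⟧(x) = 1’ format:

```python
# Sketch of Predicate Modification (22) for type <e,t> (toy model).
def predicate_modification(beta, gamma):
    # [[alpha]] = lambda x_e . [[beta]](x) = [[gamma]](x) = 1
    # The daughters form an unordered set, so the rule ignores linear order.
    return lambda x: beta(x) == gamma(x) == True

red  = lambda x: x in {"b1", "r1"}       # toy [[red]]
ball = lambda x: x in {"b1", "b2"}       # toy [[ball]]

red_ball = predicate_modification(red, ball)
assert red_ball("b1")                          # red and a ball
assert not red_ball("b2") and not red_ball("r1")

# Insensitivity to linearization: swapping the daughters changes nothing.
assert predicate_modification(ball, red)("b1") == red_ball("b1")
```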
4.2.3. Subsection: partial Predicate Modification

Bouchard (2002) argues that certain meaning contrasts in the domain of nominal adjunction structures should be analyzed by having the adjunct predicate intersect with only part of the denotation of its sister constituent. Even though it is assumed here that not all of the examples discussed by Bouchard (2002) in this context should be given such an analysis, we will develop an implementation of his general idea for the paradigmatic example in (24). The analysis sketched here is not standard and relies on the syntax-semantics interaction of a sub-lexical meaning component with the adjunct.

(24) a good chef
     ‘a person who is good as a chef’ (partial intersection)
     ‘a person who is good and a chef’ (intersection)

A good chef may either be a person who is good as a chef, or a person who is good and a chef. This is a real ambiguity. It becomes obvious when we use the adjective once, but try to interpret it differently for different referents, as in: #Mary and Pete are good chefs, because Mary has one Michelin star and Pete donated a million euros last year. There is no way to derive this difference in meaning with simple and constant lexical entries for good and chef, and by sticking to simple Predicate Modification or Functional Application. The reason is that if we do not want to assume an ambiguity for good, we will always end up with the conjoined truth-condition ‘x is good and x is a chef’, no matter whether Functional Application (4.2.1) or Predicate Modification (4.2.2) is applied. What we want is one reading where x is good as a chef, and one where x is a chef and good independently of being a chef. We will assume for our implementation that chef, as a lexical item, can be decomposed, and that the parts of this decomposition can be syntacticized. (25a) gives the assumed denotation of chef in (i) λ-notation; (ii) natural language (less detailed); (iii) natural language (more detailed).
(25b) is a proposal for a syntacticized entry of chef (morphosyntactic features are disregarded).

(25) A decomposed and syntacticized lexical entry of chef
     a. denotation of the N0 chef
        (i) ⟦chef⟧ = λxe . GENes [x p.ch.a.(e)] [∃e′s [e R e′ & x cooks(e′)]]
        (ii) ‘the property of an individual x such that, if x performs a characteristic action, x typically cooks’
        (iii) ‘the property of an individual x such that, if e is an event of x performing a characteristic action, then there typically is an event e′ such that e is a part of e′, and e′ is an event of x cooking’
     b. N0 chef: syntacticized denotation

        ① N0 chef: λxe . GENes [x p.ch.a.(e)] [∃e′s [e R e′ & x cooks(e′)]]
           ├─ ④ λxe . λes . x cooks(e)
           └─ λg⟨e,⟨s,t⟩⟩ . λxe . GENes [x p.ch.a.(e)] [∃e′ [e R e′ & g(x)(e′)=1]]
              ├─ ② λf⟨e,⟨s,t⟩⟩ . λg⟨e,⟨s,t⟩⟩ . λxe . GENes [f(x)(e)=1] [∃e′ [e R e′ & g(x)(e′)=1]]
              └─ ③ λxe . λes . x performs a characteristic action(e)
① marks the topmost N node of the syntacticized version of chef. Its denotation is the result of combining a suitable version of the genericity operator (cf. Krifka et al. 1995) in ② with its restrictor argument ③, and the nuclear scope argument ④. The result in ① is the same as the denotation in (25a). Importantly, it features the predicative node ④ with the denotation ‘λxe . λes . x cooks(e)’ in an interesting position. If we adjoin the adjective good in the next step as in (26) we have good and the cooking predicate in adjacent positions (since better chef with the reading ‘person who cooks better than someone else’ is not deviant, I favor the view that good is inside a degree phrase; cf. 4.3.3). (26)
     NP
     ├─ DegP good
     └─ N0 chef: λxe . GENes [x p.ch.a.(e)] [∃e′s [e R e′ & x cooks(e′)]]
        ├─ λxe . λes . x cooks(e)
        └─ λg⟨e,⟨s,t⟩⟩ . λxe . GENes [x p.ch.a.(e)] [∃e′ [e R e′ & g(x)(e′)=1]]
           ├─ λf⟨e,⟨s,t⟩⟩ . λg⟨e,⟨s,t⟩⟩ . λxe . GENes [f(x)(e)=1] [∃e′ [e R e′ & g(x)(e′)=1]]
           └─ λxe . λes . x performs a characteristic action(e)

The terminal node closest to the one hosting good is the one with the denotation ‘λxe . λes . x cooks(e)’. The idea now is to say that, in the case of subsection, Predicate Modification may combine an adjective and a nominal subpredicate if and only if the nominal subpredicate is the outermost predicate/is on the leftmost branch of the syntacticized nominal denotation. If we assume an event semantics denotation of good as in (27a) and combine it with the denotation of the top left branch of chef, we arrive at the denotation in (27b).

(27) a. ⟦good⟧ = λxe . λes . x is good to a high degree d(e)
     b. [λxe . λes . x is good to a high degree d(e)] [λxe . λes . x cooks(e)]   [PM]
        = λxe . λes . x is good to a high degree d and x cooks(e)
In this representation, the individual’s being good holds as part of the events in which the individual cooks. This yields a reading which may be good enough as an accurate paraphrase of the subsection reading. The denotation of good chef will come out as ‘the property of an individual x such that, if e is an event of x performing a characteristic action, then there typically is an event e′ such that e is a part of e′, and e′ is an event of x cooking and of x being good to a high degree d’. Needless to say, it is a daring step to destroy the lexical integrity of lexemes as is necessary for this account to deliver the right results. But note that syntactic evidence can be adduced to demonstrate that the respective readings are only available under strict adjacency (a good bad chef must be a chef who cooks poorly, but may be a nice person, and not vice versa). Moreover, with the adjacency requirement in place (interaction with the topmost terminal node of the lexical item), the tool to modify parts of lexical predicates from without is kept highly constrained, and is reminiscent of phases.
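Abstracting away from the event decomposition, the two readings of good chef can be contrasted in a toy model. In the sketch below (my own encoding, not the text’s analysis), the attributes `cooks_well` and `good_person` are hypothetical stand-ins for ‘good as a chef’ and ‘good independently of being a chef’:

```python
# Toy model of the ambiguity in (24); all attributes are hypothetical stand-ins.
people = {
    "Mary": {"chef": True, "cooks_well": True,  "good_person": False},
    "Pete": {"chef": True, "cooks_well": False, "good_person": True},
}

# Intersective reading: 'x is good and x is a chef'.
good_chef_intersective = lambda x: people[x]["chef"] and people[x]["good_person"]

# Subsective/partial-intersection reading: 'x is good as a chef', i.e.
# goodness is evaluated inside the cooking events.
good_chef_subsective = lambda x: people[x]["chef"] and people[x]["cooks_well"]

# The readings pull apart, as in the Michelin-star example in the text:
assert good_chef_subsective("Mary") and not good_chef_intersective("Mary")
assert good_chef_intersective("Pete") and not good_chef_subsective("Pete")
```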
4.2.4. Special rules of interpretation

The composition rules discussed in the previous subsections each have a wide array of uses, or at least their use is not restricted to a few lexical items or a small distributional class. They are not item-specific or − with the possible exception of the subsection cases of 4.2.3 − construction-specific. By contrast, the composition principles discussed here have an extremely limited domain of application. Consider, for instance, Ernst’s (2002) interpretation schema for agent-oriented adverbs like cleverly or stupidly in (29). (29a) is cited after Morzycki (2005: 17); (29b) inserts linguistic material; (29c) spells out the resulting truth-conditions.

(28) Paul cleverly used his options.

(29) agent-oriented adverbs
     a. ADV [E …] → [E′ [E …] & PADJ([E …], Agent)], where the designated relation in PADJ between the event and the Agent is [REL warrants positing], and the comparison class for PADJ is all relevant events in context.
     b. cleverlyADV [E Paul used his options] → [E′ [E Paul used his options] & ‘clever’([E ‘Paul used his options’], Agent)], where the designated relation ‘clever’ between the event and the Agent is [REL warrants positing], and the comparison class for PADJ is all relevant events in context.
     c. ‘Paul used his options & this event warrants positing more cleverness on the part of the agent of this event than is the norm for events.’

It is not fully clear how this would have to be spelled out in more detail to be fully explicit, but the idea is clear: whenever an adverb of a given syntactic-semantic class combines with an eventive constituent, the structure to be interpreted is enriched by a certain interpretive template so as to keep the denotation of the adverb very simple and to put the load of mediating between the adverbial predicate and the rest onto the rule, or the template.
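The template character of a rule like (29) can be sketched as follows (a toy encoding of my own: events are dictionaries, and the ‘warrants positing’ relation is reduced to a stored attribute):

```python
# Sketch of a construction-specific rule in the spirit of (29):
# ADV [E ...] -> [E' [E ...] & P_ADJ([E ...], Agent)]
def agent_oriented_rule(adj_predicate, event_description):
    """Enrich an event description with the adverbial template."""
    def enriched(event):
        return (event_description(event)
                and adj_predicate(event, event["agent"]))
    return enriched

# Toy denotations (hypothetical): 'clever' relates an event to its agent.
clever = lambda event, agent: event.get("warrants_cleverness_of") == agent
used_his_options = lambda event: event.get("type") == "use-options"

e1 = {"type": "use-options", "agent": "Paul", "warrants_cleverness_of": "Paul"}
e2 = {"type": "use-options", "agent": "Paul"}   # no cleverness warranted

cleverly_used = agent_oriented_rule(clever, used_his_options)
assert cleverly_used(e1)        # 'Paul cleverly used his options' true of e1
assert not cleverly_used(e2)    # ... but not of e2
```

The adverb’s own denotation stays simple; the mediating work between the adverbial predicate and the event description is done by the rule, as in Ernst’s schema.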
The trend in mainstream formal semantics, inasmuch as it follows, e.g., Heim and Kratzer’s (1998) research program, leads away from rules like this. More and more things that used to be analyzed with the help of specialized interpretive principles in the early days of Montague Grammar have been remodeled as syntactic constituents with the necessary functional potential and the required truth-conditions. This is not to say, though, that it is in any sense wrong to assume interpretive rules that are loaded with truth-conditional content as, e.g., Bittner (1999) advocates them. Moreover, frameworks like Construction Grammar as inspired by Goldberg (1995) or Culicover and Jackendoff (2005) will naturally assume that constructions, i.e. syntactically and morphologically complex expressions with meanings and functions that are attributed to the expressions as wholes, can be primitives at some relevant level of language analysis. As said, the trend to make do without construction-specific interpretive rules is mainly found in that branch of formal semantics which couples up with mainstream Generativism. This completes the overview of modeling tools for adjunction phenomena. The immediately following subsection returns to the idea that adjuncts may, quite generally, be viewed as semantic arguments of their sister constituents and that the adjunct’s sister selects the adjunct.
4.3. Generalizing the argument construal of adjuncts

4.3.1. Construing adjuncts as semantic predicates or arguments

Adjuncts may, on the semantic side, either be predicates or arguments of predicate-argument structures. This was discussed in 4.2.1. Examples (18) and (20) are repeated here as (30) and (31) in a simplified way (i.e. without the semantic analysis).

(30) a. example: red (construal A) − ball
     b. predication: predicate − argument
     c. types: ⟨⟨e,t⟩,⟨e,t⟩⟩ − ⟨e,t⟩

(31) a. example: Canadian − national anthem
     b. predication: argument − predicate
     c. types: ⟨e,t⟩ − ⟨⟨e,t⟩,⟨e,t⟩⟩
It was stated in 4.2.1 above that red ball (on construal A) is analyzed as involving red as a predicate which has an argument slot for the nominal category to its right. Still, at the relevant level of granularity, the adjective does not determine the syntactic distribution of the larger constituent (see 4.2.2/4.2.3 for a closer look). This makes it an endocentric adjunct. The example in (31) has the reverse semantic predicate-argument configuration, but a similar syntax. Since each national anthem is the anthem of a particular state, Canadian may be said to saturate the respective argument position of national anthem. (Note that this does not mean that Canadian is itself not a predicate; national anthem simply selects an argument of a predicative type.) More examples that can be given an adjunct-as-predicate analysis are provided in (32). (33) assembles further examples which probably require an adjunct-as-argument
construal because the sister constituents of the adjuncts are semantically relational. (This holds despite the possibility of using, e.g., behave without well in [33b]. The important point is that if behave is used without a pronounced adverb, it still means the same thing as if well were there.)

(32) Adjuncts as predicates
     a. attributive: long log, heavy suitcase, bright colors, American tobacco … (intersective); former president, alleged murderer, future husband … (non-intersective; cf. 4.4)
     b. adverb(ial)s: wisely, rudely, quickly, allegedly, fortunately, tomorrow, with mellow words, for three hours, the next day …

(33) Adjuncts as arguments
     a. attributive: Italian invasion of Albania, German defeat, my father, his ruin, victory of the Spartans …
     b. adverb(ial)s: behave well, readsMIDDLE easily
4.3.2. Generalizing the argument construal of adjuncts in the syntax

As mentioned repeatedly, the construal of adjuncts as predicates has a competitor even for cases like red ball, swim for three hours, or the first row of examples of (32a). The best-known syntactic proponent of such a theory is Cinque (1999, 2010). Morzycki (2005) delivered a semantic underpinning of Cinque’s proposal, unifying the perspective for the nominal and the clausal, or sentential, domain. Cinque’s (1999) book is among the first attempts at syntacticizing all adverbs and adverbials in such a way that each adverb or adverbial is a specifier of a designated functional phrase/F-head in a syntactic tree constructed in accordance with X′-Theory (Jackendoff 1977) and Anti-Symmetry (Kayne 1994); cf. Alexiadou (1997) for a study in the same vein. The gist of Cinque’s (1999) proposal can be summarized as in (34).

(34) a. There is a universal hierarchy of functional projections F1, …, Fn hosting adverbs and adverbials in clauses. The idea can be extended to the internal structure of nominal arguments.
     b. The F0 categories of these projections are either phonetically empty, or are spelled out as affixes on verbs, or as particles.
     c. Adverbs and adverbials have phrasal status and are merged in SpecF as arguments of F0.

Adverb(ial)s and, more generally, adjuncts thus saturate an argument position of the argument structure of an F0 category/F-head (3.2). Cinque himself (1999: 134−136)
addresses the possible objection that his proposal syntacticizes many things that might follow from the semantics. According to this objection there are semantic reasons why, say, allegedly scopes over quickly. Cinque admits that there may be such relationships which render part of his syntacticization redundant, but he presents arguments to the effect that not all observed ordering restrictions between adverb(ial)s can be made to follow from the semantics. (35) is Cinque’s (1999: 106, 130) proposal for the universal cascade of F-heads in the clausal domain. Each head is assumed to have a default value and one or more marked values. The unmarked value of Voice is, for instance, [active]; the marked value is [passive]. (35)
(35) a. Moodspeech act
     b. Moodevaluative
     c. Moodevidential
     d. Modepistemic
     e. T(Past)
     f. T(Future)
     g. Moodirrealis
     h. Modaleth necess
     i. Modaleth possib
     j. Modvolition
     k. Modobligation
     l. Modability/permiss
     m. Asphabitual
     n. Asprepetitive(I)
     o. Aspfrequentative(I)
     p. Aspcelerative(I)
     q. T(Anterior)
     r. Aspterminative
     s. Aspcontinuative
     t. Aspperfect
     u. Aspretrospective
     v. Aspproximative
     w. Aspdurative
     x. Aspprogressive
     y. Aspprospective
     z. AspcompletiveSg
     a′. AspcompletivePl
     b′. Voice [≠ Voice of 3.1]
     c′. Aspcelerative(II)
     d′. Asprepetitive(II)
     e′. Aspfrequentative(II)
     f′. Aspcompletive(II)
4.3.3. Generalizing the argument construal of adjuncts both in syntax and semantics

Morzycki (2005) delivers a semantic underpinning of Cinque's proposal and elaborates the general idea for most types of adjuncts (modifiers, in his terminology), where Cinque has a clear bias towards adverbial relationships (but cf. Cinque 1994, 2010). The proposal is not semantic in the sense that it would reduce the universal ordering of modifier projections to requirements and impossibilities of semantic scope taking. The general idea is that many more adjuncts can be construed as semantic arguments of predicates than is typically assumed. The predicates taking these arguments either form part of the feature bundles of nouns or verbs, or they project their own F projections outside the lexical category proper. We will first look at such features in their lexeme-internal guise (scenario I); we will then look at the fully syntacticized variant (scenario II). Finally we will discuss mixed cases and cross-linguistic variation (scenario III). The discussion concentrates on attributes, but the arguments carry over to the adverbial/clausal domain.

Scenario I − features as part of X0. Nouns and verbs have feature bundles as denotations, each feature is a predicate, and as a consequence nouns may have argument positions for property-denoting adjuncts. One might, for instance, say that nominal concepts may have a source feature as part of their feature bundle. Energy, for instance, could be said to denote not just 'λx . x is energy' ('the property of being energy'). It
37. Arguments and Adjuncts
might also have as part of its denotation the predicate 'λf⟨e,t⟩ . λx . the source of x has property f'; this would provide an argument slot for a source adjective like solar or nuclear (with denotations 'λx . x relates to the sun' and 'λx . x relates to atomic nuclei', such that the denotation of solar energy would come out as 'λx . x is energy & the source of x relates to the sun'). Another example would be a dimensionality feature. A feature like 'λf . λx . f specifies x's extension in (a) dimension(s) in space', if applied to the denotations of plate or stick, might accommodate adjectival adjuncts like thin or short (none of these examples are discussed by Morzycki 2005). A very similar implementation has been proposed in a modification of HPSG developed in the last decade (Bouma, Malouf, and Sag 2001; Sag 2005). It leads to argument-structural extensions of lexical heads to accommodate modifiers, i.e. lexical heads are furnished with argument slots for modifiers in a productive fashion. Morzycki (2005) does not implement the respective features as productive extensions of argument structure, but the resulting structures are probably comparable to the HPSG analysis.

Scenario II − features fully syntacticized as F0s. Depending on theoretical choices and phenomena to be described, the features may also be severed from their lexical categories (like N0 or V0) and be syntacticized, in bundles or one by one. An example without bundling, which merges two of Morzycki's examples (2005: 58, 63) and dispenses with semantic types, is given in (36).

(36) a. a grey steel dental instrument
     b. [DP [D a] [FP[+PRINCIPAL COLOR] [AP grey] [F′ F[+PRINCIPAL COLOR] [FP[+COMPOSITION] [AP steel] [F′ F[+COMPOSITION] [FP[+CLASS] [AP dental] [F′ F[+CLASS] [NP [N instrument]]]]]]]]]
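The Scenario I construal illustrated above with solar energy can be sketched in a toy extensional model. The Python rendering below is my own illustration (the entity representations and predicate names are assumptions, not part of Morzycki's or Cinque's proposals):

```python
# Toy extensional model of Scenario I (illustrative only): the noun 'energy'
# carries a SOURCE feature that takes a property-denoting adjective like
# 'solar' as a semantic argument.

SUN = "the sun"
NUCLEI = "atomic nuclei"

panel_output   = {"is_energy": True, "source": SUN}     # hypothetical entity
reactor_output = {"is_energy": True, "source": NUCLEI}  # hypothetical entity

# Denotation of 'solar': 'λx . x relates to the sun'
relates_to_sun = lambda y: y == SUN

# 'energy' with its source feature:
# 'λf⟨e,t⟩ . λx . x is energy & the source of x has property f'
energy = lambda f: (lambda x: x["is_energy"] and f(x["source"]))

# Functional Application: the adjunct saturates the feature's argument slot.
solar_energy = energy(relates_to_sun)

print(solar_energy(panel_output))    # True: energy whose source is the sun
print(solar_energy(reactor_output))  # False: the source is not the sun
```

The point of the sketch is only that the adjective is consumed as an argument by a predicate contributed by the noun's feature bundle, rather than being intersected with the noun.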
In the representation in (36b) each adjunct is licensed by a functional head of the semantically appropriate kind. That the information-structurally unmarked order between these adjuncts (the order with the largest contextualization potential) is fixed may then, in the Cinquean fashion, be tied to the fact that an F0[+PRINCIPAL COLOR] will, for instance, select FP[+COMPOSITION] as its internal argument. The proposal for a maximal cascade of attribute categories made by Sproat and Shih (1988, 1991) is found in (37). Cinque (1994) proposes the two variants in (38). Following Seiler (1978), it should be added that anaphoric adjectives like aforementioned scope over all non-intensional attributive categories.

(37) ordinal > cardinal > subjective comment > evidential > size > length > height > speed > depth > width > temperature > wetness > age > shape > color > nationality/origin > material

(38) a. poss > cardinal > ordinal > speaker-oriented > subject-oriented > manner/thematic
     b. poss > cardinal > ordinal > quality > size > shape > color > nationality/origin

Scenario III − mixed cases, cross-linguistic variation, and compounding. Classificatory adjuncts, as opposed to, e.g., color adjectives, are not gradable. We thus get the pattern in (39) (unless some recategorization has occurred).

(39) a (very) white (*very) steel (*very) dental instrument

If it is assumed that gradation morphemes project DegPs (degree phrases; Kennedy 1998), which take maximal adjectival projections (APs) as complements, and that such DegPs are also projected with positive uses of gradable adjectives, then (very) white in (39), and also grey in (36), should be DegPs (and not just APs, as Morzycki 2005 has it). Steel and dental, with their lack of grading options if no recategorization has occurred, may, on the other hand, be mere APs (a categorization which is compatible with the word category of steel being N if empty structure is assumed within the adjunct).
The contrast between gradable white and non-gradable steel and dental invites a mapping onto the contrast between Scenario I and Scenario II, i.e. adjunct-licensing features as parts of X0 categories vs. adjunct-licensing features as projecting F-heads in the syntax. Steel and dental will then be said to saturate argument slots of features that are bundled up with the noun; (very) white and grey will saturate F-projections above N0 as in (36b). (This and what follows constitute an extension of Morzycki's 2005 theory.) (40) is a revised and reduced version of (36b) which incorporates the results of the preceding discussion. In this revised representation, [+CLASS] and [+COMPOSITION] are in-built features of instrument, taking dental and steel as arguments within NP. Grey, by contrast, has been recategorized as a DegP, and is selected by an F-head of its own.

There is at least one more construction type with a third, highest, degree of closeness between Ns and adjuncts that is relevant in our context − compounds. German, for instance, makes more use of compounding than English. Dental instrument comes out as Dentalinstrument, with clear compound properties (single lexical accent on Dental- for the string Dentalinstrument). The syntax is most likely [N0 A0 N0]. I will leave it open what the compositional relations are at the lexical compounding level. What matters
(40) [DP [D a] [FP[+PRINCIPAL COLOR] [DegP grey] [F′ F[+PRINCIPAL COLOR] [NP [AP steel] [N[+COMPOSITION] [AP dental] [N[+CLASS][+COMPOSITION] instrument]]]]]]
in the context of our discussion is that German mostly resorts to compounding with low feature specifications of nominal categories where English has both compounding and classificatory AP adjuncts. At the next higher level of the [+COMPOSITION] feature, German has both a compounding and a phrasal [NP AP N] option: stählernes N 'steel-ish N' vs. Stahl-N 'steel-N'. At this higher level of noun-related features, English makes even less use of compounding than at the lower classificatory level. With steel in (40) being an N0 inside an AP, it is not part of a compound, because it is followed by an undisputed syntactic word (dental). But even if it were not (as in one reading of steel instrument), pronominalization facts would still deliver the same non-compound result for the compound suspect steel instrument (Morzycki 2005: 52−53). Steel instrument allows for head noun pronominalization (The steel one, not the plastic one!), whereas analogous pronominalizations are unavailable for heads of compounds (Which instrument can you play? *I can play the string one, he can play the percussion one.) Color adjectives are typically adjuncts in German as in English, but still many compounds occur in German, e.g. with names for plants and animals; English does have compounds in this domain, but fewer than German (Gelbwurz 'turmeric, lit.: yellow-spice', Schwarzspecht 'black woodpecker'). The interesting point to be emphasized here is that the compounding tendency of German is strongest for the lowest exemplified F shell [+CLASS], and weakest for the highest exemplified F shell [+PRINCIPAL COLOR]. This would follow from a generalization which assumes a universal feature hierarchy in the nominal domain which may be part of, or severed from, lexical categories, but which, as a result of its inherent order, generates implicational hypotheses of the type found for German and English above; for
instance, if a language allows [+PRINCIPAL COLOR] compounds in the N domain to a certain extent, it will allow [+CLASS] compounds to a larger extent. Such cross-linguistic comparisons can be multiplied if languages are taken into account which disfavor compounding to the advantage of adjectival adjuncts even more than English does. Russian is such a language. Concepts like fishing settlement, horse market, mousehole (arguably all compounds in English) come out as adjective-noun sequences in Russian: rybac-kij posëlok 'fisherman-ish village', kon'-skij bazar 'horse-ish market', myš-inaja nora 'mouse-ish hole'. The value of uncovering such cross-linguistic differences is that they provide a testing ground for hypotheses about universal feature hierarchies that is independent of the unmarked word order criterion. This is so because we expect the hierarchies in (41) to match in non-random ways. Generally, mappings from one category in Hierarchy1 to categories in Hierarchy2 should only target adjacent categories of Hierarchy2, where each hierarchy in (41) may be Hierarchy1 or Hierarchy2.

(41) a. Hierarchy of nominal features
     … > principal color > … > composition > … > class > …
     b. Hierarchy of adjunct categories
     … > gradable DegP adjuncts > … > non-gradable AP adjuncts > … > A0 compound parts
4.4. Intersective vs. non-intersective adjuncts

The dichotomy of intersective vs. non-intersective adjunction delivers a partition of the relevant domain into one class which allows for a paraphrase in terms of two conjoined propositions, and one which does not. Examples typically analyzed as intersective adjunction structures are given in (42), examples of non-intersective ones in (43). The examples in (44) are problematic in a way to be discussed shortly. (A note on terminology: the term intersective makes recourse to the set-theoretical construal of predicates as sets. If the denotations of red and ball are identified with the set of red things and the set of balls, respectively, then the set of red balls is the intersection of these two sets. The set construal of predicates is an ultimately equivalent alternative to the construal in terms of functions from individuals to truth-values; cf. 3.3.)

(42) a. x is a red ball
     'x is red and x is a ball'
     b. e was a frightening experience
     'e is frightening and e is an experience'

(43) a. x is the former president
     #'x is former and x is a president'
     b. x is a frequent customer
     #'x is frequent and x is a customer'
     c. e is a mere chance meeting
     #'e is mere and e is a chance meeting'
     d. x is the other person
     #'x is other and x is a person'
(44) a. x is running fast
     (i) 'e is an event of x running and e is fast.'
     (ii) 'e is an event of x running in the m-manner, with m = fast.'
b. x is a good chef
     (i) 'x has the property such that, if e is an event of x performing a characteristic action, then there typically is an event e′ such that e is a part of e′, and e′ is an event of x cooking and of x being good to a high degree'
     (ii) 'x is good and x is a chef'

As is evinced by the paraphrases making use of conjunctions in (42) and (43), the conjunction criterion delivers clear distinctions in many cases. The problem with the cases in (44) is that it is theory-dependent whether or not an adjunction structure comes out as intersective. If, for instance, both verbs and adverbs are construed as predicates of events, as is done in paraphrase (i) of (44a), then the structure is an example of intersective adjunction. If, however, the adverb is seen as an argument specifying the manner in which the running is performed, then (44a) will not come out as intersective; this option is given in paraphrase (ii), where fast fills an argument slot of an appropriately derived lexical entry of run (first proposed in explicit form by McConnell-Ginet 1982). The problem with (44b) is slightly different. The different paraphrases in (44a) were supposed to be paraphrases with identical truth-conditions. The paraphrases in (44b) are supposed to have different truth-conditions. If, for instance, x cooks lousily, but donates to the poor, then (i) is false and (ii) may be true. The decision whether good chef is intersective or not is highly theory-dependent: good chef will come out as intersective in reading (i) if, as in 4.2.3, the denotation of good just targets the cooking component of the meaning of chef. If the lexical item chef is opaque in the formalism, a non-intersective construal is going to be the result. Reading (ii) will be construed intersectively on all accounts that I can think of.
Since the composition rule of Predicate Modification is stated in intersective terms, Predicate Modification is a natural candidate to model intersective adjunction structures. But this is not a necessary tie-up. 4.2.1, for instance, has provided examples of an intersective semantics in terms of Functional Application. Conversely, however, the tie-up between (non-subsective) non-intersectivity and composition principles other than Predicate Modification is without exception.
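The theory-dependence of the verdict on (44a) can be made concrete in a toy event semantics. The sketch below is my own illustration (the dictionary-based event records and predicate names are assumptions, not the article's formalism):

```python
# Two construals of 'x is running fast' (44a), modeled over toy events.

e1 = {"kind": "run", "agent": "x", "manner": "fast"}
e2 = {"kind": "run", "agent": "x", "manner": "slow"}

run  = lambda e: e["kind"] == "run"     # verb as an event predicate
fast = lambda e: e["manner"] == "fast"  # adverb as an event predicate

# Construal (i): Predicate Modification intersects two event predicates.
def predicate_modification(p, q):
    return lambda e: p(e) and q(e)

run_fast_i = predicate_modification(run, fast)

# Construal (ii): the adverb saturates a manner argument slot of 'run'
# (in the spirit of McConnell-Ginet 1982): λm . λe . e is a running in manner m.
run_in_manner = lambda m: (lambda e: run(e) and e["manner"] == m)
run_fast_ii = run_in_manner("fast")

# The two construals yield identical truth-conditions here, but only (i)
# counts as intersective adjunction.
assert run_fast_i(e1) and run_fast_ii(e1)
assert not run_fast_i(e2) and not run_fast_ii(e2)
```

The sketch makes the point from the text tangible: the truth-conditions coincide, so the intersective/non-intersective classification of (44a) depends entirely on which composition route one assumes.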
4.5. Intensional vs. extensional adjuncts

The terminology of intensionality vs. extensionality, if discussed in the domain of adjuncts, is frequently used synonymously with the terminology of the preceding subsection, i.e. of intersective vs. non-intersective adjuncts. However, the empirical domains of the two dichotomies merely overlap. Intensional, on its most widespread contemporary understanding, means that the semantics of the expression in question is modeled with recourse to modality or temporal semantics. The standard tools in this domain are a possible worlds semantics with world variables as parts of denotations, and a semantics of time with variables for points in time or intervals of time. Extensional means that no recourse to this kind of machinery is made.

Consider construal (ii) of (44a) once more to see that a non-intersective adjunct need not be intensional. If manner adverbs are construed as arguments of V(P)/v(P) denotations, then the construal is non-intersective; it does not involve reference to, or quantification over, worlds or times, though. The standard examples in (45) and their (simplified) paraphrases make it immediately clear why intensional machinery is needed to interpret these expressions.

(45) a. the former president
     (i) 'the person who was the president at some point in time t′ before the reference time t'
     (ii) *'the x that is former and president'
     b. the alleged murderer
     (i) 'that person x who is the murderer in all those possible worlds w′ which are compatible with the evaluation-world w beliefs of the relevant source of information about the murderer property of x in w'
     (ii) *'the x that is alleged and a murderer'

The modeling of intensional adjuncts always involves tools other than Predicate Modification, typically Functional Application. We will not enter into any details here. The interested reader is referred to von Fintel and Heim (2011) and the references cited there for details.
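To see why (45a) needs temporal machinery, one can model noun denotations as time-dependent functions. The following sketch is my own toy model (the times, names, and the simplified entry for former are assumptions):

```python
# Toy intensional model for 'former president' (45a): the noun denotation is
# relativized to times, and 'former' shifts the time of evaluation instead of
# intersecting with the noun.

# Time-dependent predicate: who is president at time t (toy extension).
president = lambda t: (lambda x: (t, x) in {(2000, "ann"), (2010, "bob")})

NOW = 2010
PAST_TIMES = [2000]

# Simplified 'former': λP . λx . ¬P(now)(x) & ∃t′ < now [P(t′)(x)]
def former(P):
    return lambda x: (not P(NOW)(x)) and any(P(t)(x) for t in PAST_TIMES)

former_president = former(president)

print(former_president("ann"))  # True: president in 2000, not at NOW
print(former_president("bob"))  # False: the current president
```

Because former manipulates the time parameter of its argument, no purely intersective analysis of the string former president is available, mirroring the starred paraphrase (45a-ii).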
4.6. Adjuncts as specifiers vs. adjuncts as modifiers

In contemporary Generative syntax, adjuncts are typically accommodated in specifier positions in X′-structure, or in adjoined modifier structures. A theory like Cinque's (1999) (and no less Alexiadou 1997 or Morzycki 2005) will license most adjuncts as specifiers of functional projections of the nominal or clausal/sentential domain, because (most) adjuncts are construed as arguments of these projections in these approaches (4.3.2/4.3.3). In more traditional theories, and in theories which do not make use of the specifier concept, adjuncts will be construed as expansions of a category which do not change the category of the expanded structure (i.e. they are treated, or defined, as endocentric constructions in the traditional sense of the term; cf. sections 1 and 2).

Given the results arrived at by Cinque (1994, 1999), Sproat and Shih (1988, 1991), or Morzycki (2005), it has become less clear than before what endocentric behavior really is. Adverb(ial)s and nominal attributes are not freely ordered within nominal and clausal/sentential constituents once the level of fine-grainedness is adjusted sufficiently, and therefore the distribution of a structure with a given adjunct is most often not the same as the distribution of the same structure without the adjunct. This renders implementations of adjunction that rest on true endocentricity less attractive. There seem to be some structures, though, that may survive as true endocentric modifiers, even under the circumstances of many adjuncts nowadays having a plausible argument analysis. Such structures include DPs with gradable adjectives that are separated from each other by a comma intonation; cf. (46a).

(46) a. the green, Italian, aforementioned, heavy, stylish, long table
     b. *the green Italian aforementioned heavy stylish long table (neutral intonation)
     c. the aforementioned stylish long heavy green Italian table
(46a) is good because each non-final adjective bears a high right boundary tone. In this setting, the order of adjectives plays no role (Alexiadou, Haegeman, and Stavrou 2007: 322−323). The same order of adjectives under a neutral intonation contour as in (46b) is deviant. (46c) provides the corresponding unmarked sequence. The data in (46a/b) may be interpreted in such a way that An … A1 N sequences with boundary tones on adjectives Am>1 lead to truly endocentric modifier constructions.
4.7. Frame adjuncts vs. event adjuncts

A frame adjunct restricts the domain within which an assertion is claimed to hold true; whatever the syntactic implementation looks like, the position of a frame adjunct must be higher on some hierarchy than the position of an event adjunct. An event adjunct, by contrast, forms part of a potentially assertive part of a declarative sentence. The notion of frame adjunct used here is meant to cover the same empirical domain as Chafe's (1976: 50−51) frame-setting topics and Maienborn's (2001) and Maienborn and Schäfer's (2011) frame-setting modifiers. The notion of event adjuncts used here coincides with Maienborn's (2001) event-external modifiers. Examples of frame adjuncts and event adjuncts are provided in (47) through (49).

(47) a. In Argentina, Eva is still popular. (frame)
     a′. In Argentina, Eva signed a contract. (frame)
     b. Eva signed a contract in Argentina. (event)
     (cf. Maienborn 2001)

(48) a. On few summer evenings, he had each friend come over. (frame) (few>each, *each>few)
     b. He had each friend come over on few summer evenings. (event) (few>each, each>few)

(49) a. On Paul'si evenings off, hei had friends come over. (frame)
     b. *Hei had friends come over on Paul'si evenings off. (event)
In (47a), in Argentina restricts the place for which the claims made by Eva is still popular and Eva signed the contract are supposed to hold. In (47b), in Argentina straightforwardly contributes to the truth-conditions of the sentence. The sentence would be false in a context in which Eva signed a letter of intent, but not a contract, in Argentina, and it would likewise be false in a context in which Eva signed a contract in Paraguay, but not in Argentina. The information provided by the direct object and by the event adjunct is thus on a par in terms of potential truth-conditional import (modulo information-structural foregrounding or backgrounding). Things are different in (47a). A person uttering (47a) presents the facts in such a way that in Argentina is related to the currently relevant discourse topic. As such, it is not in the scope of assertion. If the hearer wants to refute the information that the reported situations hold/held in Argentina, this information would have to be extracted from what the speaker presented as not at issue.

A declarative sentence without quantifiers and with an event adjunct always entails its counterpart without the event adjunct; this renders event adjuncts veridical (Eva signed the contract in Argentina entails Eva signed the contract). A declarative sentence
with a frame adjunct as in (47a) does not entail its counterpart without the frame adjunct, or it only does so under specific contextual conditions (In Argentina, Eva is still popular will only entail Eva is still popular if the adjunct-less sentence is contextually restricted to be true of the situation in Argentina, or involves a contextually understood existential closure: 'There is a country in which it holds true that Eva is still popular in it.') But not all frame adjuncts are like this. With the episodic example in (47a′), veridicality holds again, because this sentence entails 'Eva signed a contract'. Consequently, frame adjuncts may or may not be veridical, depending on whether a stative or habitual/generic state-of-affairs is described, or an episodic one.

If a frame adjunct contains a quantifier, as on few summer evenings in (48a) does, it always scopes over quantifiers in the event description. (48b) features the same string on few summer evenings, but as an event adjunct this time. As such, it may scope over or under co-occurring quantifiers in the event description. (49) illustrates the different hierarchical positions of frame and event adjuncts with the help of a Principle C effect (cf. article 17 in this volume). Given that proper names may not corefer with c-commanding antecedents, the contrast in acceptability in (49) shows that frame adjuncts as in (49a) scope above subjects, whereas event adjuncts scope underneath subjects, i.e. inside the event description (Maienborn 2001: 206−207). (Note that Paul in [49a] does not c-command the pronoun and hence does not bind it. Still Paul and he may corefer; cf. Büring 2005: 125.)

The exact syntactic implementation of the dichotomy between frame adjuncts and event adjuncts will depend on the theory chosen. If one assumes syntactic projections designated to host (restrictors of) topics, or (restrictors of) topic situations (cf.
Rizzi’s 1997 Topic Phrase and the cartographic tradition kicked off by that article), then frame adjuncts will be accommodated there. If no such designated positions are assumed, frame adjuncts will be expansions of high sentential categories. Event adjuncts will be accommodated underneath subjects (Maienborn 2001); in all likelihood, event adjuncts are not restricted to occurring in a single syntactic position. Further syntactic details will depend on how adjuncts are licensed in a given theory (cf. 4.3.2). Intensional clausal adjuncts like frankly (speaking), allegedly, or fortunately are frame adjuncts with the additional property of implying a speaker-commitment − as with frankly (speaking) − or an evidential status.
4.8. Restrictive vs. non-restrictive adjuncts

The restrictive/non-restrictive dichotomy known from relative clauses is also found with adjectival adjuncts. (50) has two readings depending on whether the Britons are characterized as phlegmatic in their entirety, or whether just a proper subset of them is characterized in that way. The parallelism of the contrast with relative clauses is used to paraphrase the readings in (50a) and (50b) (from Alexiadou, Haegeman, and Stavrou 2007: 335).

(50) The phlegmatic Britons will accept his recommendations.
     a. 'Those Britons that are phlegmatic will accept his recommendations.' (restrictive)
b. 'The Britons, {inasmuch as they are/which are} phlegmatic, will accept his recommendations.' (non-restrictive)

In Romance the contrast between restrictive and non-restrictive uses of adjectival adjuncts is reflected by the prenominal (non-restrictive) vs. postnominal (restrictive) position of the adjective; the examples in (51) are from Bouchard (2002: 94−95).

(51) a. Les [britanniques flegmatiques] accepteront ses recommandations. [French]
     the Britons phlegmatic will.accept his/her recommendations
     'The Britons that are phlegmatic will accept his/her recommendations.'
     b. Les [flegmatiques britanniques] accepteront ses recommandations.
     the phlegmatic Britons will.accept his/her recommendations
     'The Britons, {inasmuch as they are/which are} phlegmatic, will accept his/her recommendations.'

It is a matter of debate how the contrasts in (50) and (51) are to be represented and derived syntactically; more on this below (Demonte 1999; Bouchard 2002; Givón 1993: 268; Lamarche 1991; Ralli and Stavrou 1997). In terms of semantics it appears to be the case that the non-restrictive readings should be analyzed so as to treat the adjective as denoting a property which is definitional for, or at least characteristic of, referents with the head noun property. Put differently, the adjective in non-restrictive readings redundantly spells out one property of the set of properties contributing to the N meaning (as assumed by the speaker in the context at hand). The speaker chooses to mention this presupposed information because there is some relation tying the proposition uttered to the property in question (the Britons in [51b] will accept his recommendations because they are phlegmatic). This can be implemented compositionally in at least two different ways. Either implementation treats the meaning of the adjective as presuppositional.
To put it differently, either implementation construes the adjective as providing information which is presented as a non-negotiable meaning component of the nominal denotation. Option (a) will assume an F-head mediating between the nominal and the adjunct category. This head will shift the entailment of its second argument, the adjunct category, to a presupposition. The entailment of the adjunct will vanish and reduce to the identity function. On this account, the F-head between phlegmatiques and britanniques in (51b) will take britanniques as its first argument and will check, on taking its second argument phlegmatiques, whether 'being a Briton' entails, or characteristically goes along with, 'being phlegmatic' in the context at hand. If this is the case, the resulting expression will denote exactly what britanniques alone means. If it is not the case, the resulting expression will fail to have a denotation, because the presupposition of the F-head is not fulfilled. The lexical entry for the required F-head is spelled out in (52a). Option (b) will assume a type-shifted variant of the adjunct category which takes the nominal category as its complement. Moreover, the entailment/assertion of the basic adjectival denotation is shifted to the presupposition. This option is spelled out in (52b).
(52) a. F-head mediating between non-restrictive ADJ-NP structures
     i. [ ADJ [ F [ NP ] ] ]
     ii. ⟦FNON-RESTR⟧ = λf⟨e,t⟩ . λg⟨e,t⟩ : ∀x[f(x)=1 → g(x)=1] . λy . f(y)=1
     b. Type-shifted adjective with assertive content shifted to the presupposition
     i. [ ADJ [ NP ] ]
     ii. ⟦phlegmatiquesNON-RESTR⟧ = λf⟨e,t⟩ : ∀x[f(x)=1 → phlegmatic(x)=1] . λy . f(y)=1
     b′. Generalized version of b., where A is a variable over truth-conditions of adjunct denotations
     ⟦ADJNON-RESTR⟧ = λf⟨e,t⟩ : ∀x[f(x)=1 → A(x)=1] . λy . f(y)=1

Both ⟦FNON-RESTR⟧ and ⟦ADJNON-RESTR⟧ check whether all individuals having the nominal/f property also have the AP/DegP property. If so, the resulting expression has the same denotation as the nominal category alone. If not, the resulting expression has no denotation. If universal quantification in the presupposition/domain restriction turns out to be too strong, a genericity operator could regulate the relationship between the truth of f(x) and g(x)/A(x) in (52) instead. Since the semantic outcome of the analyses in (52a, b) is the same, either can be chosen. In the context of the present article, with its bias towards F-head implementations, (52a) is the natural choice.

With this semantic background about non-restrictive modification in mind, the following tentative conclusions concerning syntactic detail can be drawn. Since on either account the nominal category is an argument of the F-head/the adjunct, it must be a maximal projection (unless semantic arguments of syntactic heads can be non-maximal projections, an option which is not entertained here). On the F-head analysis, the adjunct category must, for the same reason, be an XP, too. Within the type-shift account, the adjunct category can be either a head or, more generally, a non-maximal projection, simply because it takes an argument (the nominal category).
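The F-head in (52a) can be prototyped as a function that enforces its presupposition before returning the unchanged nominal denotation. The sketch below is my own toy implementation (the finite domain, the predicates, and the use of an exception to model presupposition failure are assumptions):

```python
# Toy implementation of F_NON-RESTR from (52a): presupposes that every
# f-individual is also a g-individual; on failure the expression has no
# denotation (modeled here by raising an error).

DOMAIN = ["b1", "b2", "b3"]
briton     = lambda x: x in {"b1", "b2", "b3"}
phlegmatic = lambda x: x in {"b1", "b2", "b3"}  # here: all Britons are phlegmatic
tall       = lambda x: x in {"b1"}              # not a property of all Britons

def F_non_restr(f):
    def take_adjunct(g):
        # Presupposition: ∀x [f(x)=1 → g(x)=1]
        if not all(g(x) for x in DOMAIN if f(x)):
            raise ValueError("presupposition failure: no denotation")
        return lambda y: f(y)  # denotes exactly what the noun alone denotes
    return take_adjunct

phlegmatic_britons = F_non_restr(briton)(phlegmatic)
print(phlegmatic_britons("b2"))  # True: same extension as 'briton' alone

try:
    F_non_restr(briton)(tall)  # presupposition fails: not all Britons are tall
except ValueError as err:
    print(err)
```

The design mirrors the text: the adjunct contributes no truth-conditional content of its own; it either passes the presupposition check and vanishes semantically, or composition fails altogether.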
I will leave the matter open at this point, hoping to have clarified what the syntactic options may be in the light of what the semanticist would like to see as input. The readers are invited to check for themselves how these general thoughts relate to the syntactic reasoning found in the literature, especially in Bouchard's (2002) work, in Alexiadou, Haegeman, and Stavrou's (2007) overview, and in the works cited below (51). Upon consulting those writings, one should keep in mind that the level of syntactic sophistication of some authors in this domain outranks their semantic explicitness and reliability.

Another thing that I will leave open here is how our proposal relates to the standard treatment of the difference between restrictive and non-restrictive relative clauses. Non-restrictive relative clauses are typically analyzed as (high) DP-adjuncts, whereas restrictive relative clauses are (low) NP-adjuncts. In the clausal domain, parallel phenomena of non-restrictiveness with adjuncts do not seem to exist, or at least they are not discussed in a parallel fashion. Upon closer scrutiny, it would be attractive to map the restrictive vs. non-restrictive contrast of the nominal domain onto the frame adjunct vs. event adjunct contrast of the clausal domain (4.7). What both contrasts have in common is that they draw a dividing line between material that is presented as presuppositional (frame adjuncts/non-restrictive attributes) and material that contributes to the truth-conditions of a predicate (event adjuncts/restrictive attributes). It is beyond the scope of this survey article to delve deeper into explorations of this parallel.
4.9. Generic adjuncts vs. episodic adjuncts

The contrast between the stacked adjuncts in (53) and (54) can be described as one between generic and episodic properties.

(53) a. the [(in)visibleE [visibleG stars]]
     b. the [[visibleG stars] visibleE]
     'the [usually visible]G stars that are [(in)visible at the reference time]E'

(54) a. the [(non-)navigableE [navigableG rivers]]
     b. the [[navigableG rivers] navigableE]
     'the [usually navigable]G rivers that are [(not) navigable at the reference time]E'

This type of contrast goes back to Bolinger (1967), and it has been given different names, including reference modificationG vs. referent modificationE by Bolinger himself and individual-levelG vs. stage-levelE by Larson (1999); subscripts cross-reference the terms with the uses in (53) and (54). Here the terms generic vs. episodic are preferred because on a strict reading of the stage-level/individual-level terminology, all the examples given ought to be contradictory. By contrast, the notion of genericity allows for the leeway that can account for a referent being visible in principle and under normal circumstances, but not necessarily all the time (cf. Krifka et al. 1995). The contrast probably boils down to a dichotomy of adjectival modification in a phrasal lexeme for the generic uses ([A0 N0]) vs. (gradable) adjectival modification for the episodic ones (with constituency and semantics mediated by F-heads or noun-internal features, if one of the analyses in 4.3.3 is adopted).
4.10. Syntactic categories of adjuncts

There is a confusing multitude of proposals to map individual adjunct types to syntactic categories. This is not much of a surprise, given that the adjunct notion is highly underdetermined. Instead of reviewing different proposals made for small subdomains of our field of interest, I will use the results of the preceding subsections 4.1 through 4.9 to state what our general options are.

The basic principles to allot adjunct types to types of syntactic categories are simple. An adjunct that is a single word (and not a phrase) and is given a semantic predicate, or function, analysis such that its sister is its semantic argument must be a head/X0 category. This is so because predicate-argument relations between sister nodes hold between heads and arguments. Univocal examples of such adjuncts would be the intensional adjuncts like former or alleged in the nominal domain, and their adverb counterparts like formerly or allegedly in the adverbial domain (unless F-head mediation is assumed). An adjunct that consists of more than a word and is given a semantic predicate analysis such that its sister is its semantic argument must be a non-maximal, intermediate projection/X′ category. It must contain an L-head, or be construed as having a PP-like overall argument structure. Clear examples include adverbials like according to X and in an X fashion/manner.
VI. Theoretical Approaches to Selected Syntactic Phenomena
An adjunct that is given a semantic argument analysis such that its sister takes it as its argument must be a maximal projection/XP category (maybe with some empty material) no matter if the adjunct at hand is a word or a string of words; at least this holds in those theories that assume that complements/arguments always correspond to maximal projections. If no such regularity is assumed, argumental adjuncts may be mere words/X0, or X′, categories. Examples would be adjectival arguments of relational nouns like Canadian in Canadian national anthem or, in Morzycki’s (2005) generalized theory of adjuncts-as-arguments, all attributive/adverbial complements of N/V/v-internal features or N/V/v-external F-heads. Depending on the categorial granularity of head types assumed in such adjuncts, adjuncts may, in the nominal domain, be sub-classified at least as DegPs (with gradable/graded attributes), APs (with non-gradable attributes), or CPs (with tensed/clause-worthy attributes, or if a CP origin of most attributes is assumed; Kayne 1994). An analogous reasoning is possible for adverbial adjuncts.

The regularities just outlined are tentatively summarized in (55). Meaningful mappings to concrete syntactic categories would, however, only be possible within theories that are explicit as to the available composition principles and the set of functional and lexical categories. Therefore, no category names are mentioned in (55). A theory-dependent assumption underlying the right half of (55) is that syntactic arguments are always phrases. (The branches in [55] represent the following logical relations: a node immediately dominated by a branching node fulfils the conditions of all the nodes dominating it; a node immediately dominated by a non-branching node [= a terminal node] corresponds to an entailment of the conjoined conditions dominating it.)

(55)
adjunct (ADJ)
├─ ADJ is a semantic predicate
│  ├─ sister of ADJ combines by way of Functional Application
│  │  └─ ADJ is a head OR ADJ is an intermediate projection
│  └─ sister of ADJ combines by way of Predicate Modification
│     └─ no prediction possible concerning syntactic category of ADJ
└─ ADJ is a semantic argument of its sister
   └─ ADJ is a phrase/maximal projection

Fig. 37.1: Syntactic categories of adjuncts and composition principles
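The decision tree in Fig. 37.1 can also be rendered as a small function. The following is an illustrative sketch only; the function name, the string labels, and the boolean/string encoding are mine, not the author’s:

```python
# Illustrative sketch of the decision procedure in (55)/Fig. 37.1.
# Given the semantic status of an adjunct (predicate vs. argument) and the
# composition principle used where it combines with its sister, return the
# category prediction that the tree makes.

def predict_category(adjunct_is_predicate: bool, composition: str) -> str:
    """composition: 'FA' (Functional Application) or 'PM' (Predicate Modification)."""
    if not adjunct_is_predicate:
        # ADJ is a semantic argument of its sister: on the assumption that
        # syntactic arguments are always phrases, ADJ must be an XP.
        return "phrase / maximal projection (XP)"
    if composition == "FA":
        # ADJ is itself the function applied to its sister
        # (e.g. 'former', 'allegedly').
        return "head (X0) or intermediate projection (X')"
    if composition == "PM":
        # Two predicates of identical type are conjoined: the tree makes
        # no prediction about the category of ADJ.
        return "no prediction"
    raise ValueError("composition must be 'FA' or 'PM'")

print(predict_category(True, "FA"))   # intensional adjuncts like 'former'
print(predict_category(False, "FA"))  # e.g. 'Canadian' in 'Canadian national anthem'
```

The first branch of the tree is checked before the composition mode because, on the argument side, the category prediction follows directly, without a further composition-based split.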
5. Conclusions

If a linguistic expression or category is classified as a syntactic or semantic argument, this will allow for straightforward predictions concerning syntactic structure and semantic composition, provided the theory in which the generalization is stated is sufficiently explicit. In grammars relying on constituent structure, arguments are sisters or specifiers of heads on the syntactic side. On the semantic side, an argument is what the function denoted by the/a sister category in the constituent structure applies to.

The pretheoretical intuition underlying the adjunct notion has it that adjuncts are optional expansions of syntactic structures and semantic denotations. No unified treatment of adjuncts has been proposed to date, and it may be concluded with Jacobs (1994) that what has been called adjunct in the literature comprises a multitude of phenomena that are more or less similar to argumenthood.

We have seen that adjuncts can be analyzed as arguments or functions/predicates both in syntax and semantics. On the other hand, our theories also have modeling tools to represent adjuncts as endocentric expansions in the syntax (if the category of the complete adjunct+head+complement category equals that of the head) and as predicates that combine with other predicates of identical type in the semantics (if Predicate Modification is assumed). It is an open question whether the variety of analytical options in the domain of adjuncts mirrors a variety of different adjunct types in the language, or whether the majority of these different options is just a reflex of our insufficient understanding of what we call adjuncts.
Acknowledgements

This article has benefited from written comments on an earlier version provided by Peter Ackema, Daniel Büring, Andreas Dufter, Sebastian Löbner, Marcin Morzycki, Martin Schäfer, and the editors. Remaining errors and shortcomings are mine. The research underlying this article was partly conducted with financial support from the Deutsche Forschungsgemeinschaft (grant number HO 2557/3-1).
6. References (selected)

Ágel, Vilmos, Ludwig M. Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Jürgen Heringer, and Henning Lobin (eds.) 2003/2006 Dependenz und Valenz. 2 Volumes. Berlin: de Gruyter.
Alexiadou, Artemis 1997 Adverb Placement: A Case Study in Antisymmetric Syntax. Amsterdam: John Benjamins.
Alexiadou, Artemis, Liliane Haegeman, and Melita Stavrou 2007 Noun Phrase in the Generative Perspective. (Studies in Generative Grammar 71.) Berlin/New York: Mouton de Gruyter.
Bittner, Maria 1999 Concealed causatives. Natural Language Semantics 7: 1−78.
Bloomfield, Leonard 1933 Language. New York: Holt.
Boeder, Winfried 1968 Über die Versionen des georgischen Verbs. Folia Linguistica 2: 82−152.
Bolinger, Dwight 1967 Adjectives in English: attribution and predication. Lingua 18: 1−34.
Bouchard, Denis 2002 Adjectives, Number and Interfaces: Why Languages Vary. (Linguistic Variations 61.) Amsterdam: Emerald Group Publishing.
Bouma, Gosse, Robert Malouf, and Ivan A. Sag 2001 Satisfying constraints on extraction and adjunction. Natural Language and Linguistic Theory 19(1): 1−65.
Büring, Daniel 2005 Binding Theory. Cambridge: Cambridge University Press.
Chafe, William 1976 Givenness, contrastiveness, definiteness, subjects, topics and point of view. In: Charles N. Li (ed.), Subject and Topic, 27−55. London/New York: Academic Press.
Chomsky, Noam 1993 A minimalist program for linguistic theory. In: Kenneth Hale and Samuel Jay Keyser (eds.), The View from Building 20, 1−53. Cambridge, MA: MIT Press.
Chomsky, Noam 1995 The Minimalist Program. Cambridge, MA: MIT Press.
Church, Alonzo 1936 An unsolvable problem of elementary number theory. American Journal of Mathematics 58: 345−363.
Cinque, Guglielmo 1994 Evidence for partial N-movement in the Romance DP. In: Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi, and Raffaella Zanuttini (eds.), Paths Toward Universal Grammar: Essays in Honor of Richard S. Kayne, 85−110. Washington, D.C.: Georgetown University Press.
Cinque, Guglielmo 1999 Adverbs and Functional Heads: A Cross-Linguistic Perspective. New York: Oxford University Press.
Cinque, Guglielmo 2010 The Syntax of Adjectives. Cambridge, MA: MIT Press.
Culicover, Peter, and Ray Jackendoff 2005 Simpler Syntax. Oxford: Oxford University Press.
Demonte, Violeta 1999 A minimal account of Spanish adjective position and interpretation. In: Jon A. Franco, Alazne Landa, and Juan Martín (eds.), Grammatical Analysis in Basque and Romance Linguistics, 45−76. Amsterdam: Benjamins.
Dik, Simon C. 1975 The semantic representation of manner adverbials. In: A. Kraak (ed.), Linguistics in the Netherlands 1972−1973, 96−121. Assen: Van Gorcum.
Ernst, Thomas 2002 The Syntax of Adjuncts. Cambridge: Cambridge University Press.
von Fintel, Kai, and Irene Heim 2011 Intensional Semantics. Lecture notes, Massachusetts Institute of Technology.
Frege, Friedrich Ludwig Gottlob 1994 Über Sinn und Bedeutung. In: Günther Patzig (ed.), Funktion, Begriff, Bedeutung. Fünf logische Studien, 40−65. Göttingen: Vandenhoeck und Ruprecht. First published in Zeitschrift für Philosophie und philosophische Kritik 100: 25−50 [1892].
Givón, Talmy 1993 English Grammar: A Function-Based Introduction. 2 Vols. Amsterdam/Philadelphia: John Benjamins.
Goldberg, Adele 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Heim, Irene, and Angelika Kratzer 1998 Semantics in Generative Grammar. Oxford: Blackwell.
Higginbotham, James 1985 On semantics. Linguistic Inquiry 16: 547−593.
Hole, Daniel 2012 German free datives and Knight Move Binding. In: Artemis Alexiadou, Tibor Kiss, and Gereon Müller (eds.), Local Modelling of Non-Local Dependencies in Syntax, 213−246. Berlin/Boston: de Gruyter.
Hole, Daniel 2014 Dativ, Bindung und Diathese. Berlin/Boston: de Gruyter.
Horvath, Julia, and Tal Siloni 2002 Against the Little-v Hypothesis. Rivista di Grammatica Generativa 27: 107−122.
Jackendoff, Ray 1977 X-bar Syntax: A Study of Phrase Structure. (Linguistic Inquiry Monograph 2.) Cambridge, MA: MIT Press.
Jacobs, Joachim 1994 Kontra Valenz. Trier: Wissenschaftlicher Verlag.
Kayne, Richard S. 1994 The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kennedy, Christopher 1999 Projecting the Adjective: The Syntax and Semantics of Gradability and Comparison. New York/London: Garland.
Kim, Jaegwon 1966 On the psycho-physical identity theory. American Philosophical Quarterly 3: 225−235.
Kratzer, Angelika 1996 Severing the external argument from its verb. In: Johan Rooryck and Laurie Zaring (eds.), Phrase Structure and the Lexicon, 109−137. Dordrecht: Kluwer.
Kratzer, Angelika 2005 Building resultatives. In: Claudia Maienborn and Angelika Wöllstein (eds.), Event Arguments: Foundations and Applications, 177−212. Tübingen: Max Niemeyer Verlag.
Krifka, Manfred, Francis Jeffry Pelletier, Gregory N. Carlson, Alice ter Meulen, Gennaro Chierchia, and Godehard Link 1995 Genericity: an introduction. In: Gregory N. Carlson and Francis Jeffry Pelletier (eds.), The Generic Book, 1−124. Chicago: University of Chicago Press.
Lamarche, Jacques 1991 Problems for N0-movement to Num-P. Probus 3(2): 215−236.
Larson, Richard 1999 Semantics of adjectival modification. Lecture notes, Dutch National Graduate School, Amsterdam.
Maienborn, Claudia 2001 On the position and interpretation of locative modifiers. Natural Language Semantics 9(2): 191−240.
Maienborn, Claudia, and Martin Schäfer 2011 Adverbs and adverbials. In: Klaus von Heusinger, Claudia Maienborn, and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, Vol. 2, 1390−1420. Berlin/New York: Mouton de Gruyter.
Marantz, Alec 1984 On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Marantz, Alec 1993 Implications of asymmetries in double object constructions. In: Sam A. Mchombo (ed.), Theoretical Aspects of Bantu Grammar, 113−150. Stanford, CA: Center for the Study of Language and Information Publications.
McConnell-Ginet, Sally 1982 Adverbs and logical form: a linguistically realistic theory. Language 58: 144−184.
Morzycki, Marcin 2005 Mediated modification: functional structure and the interpretation of modifier positions. Ph.D. dissertation, Department of Linguistics, University of Massachusetts Amherst.
Pollard, Carl, and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.
Pylkkänen, Liina 2002 Introducing arguments. Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Ralli, Angela, and Melita Stavrou 1997 Morphology-syntax interface: two cases of phrasal compounds. Yearbook of Morphology 1997: 243−264.
Rizzi, Luigi 1997 The fine structure of the left periphery. In: Liliane Haegeman (ed.), Elements of Grammar, 281−337. Dordrecht: Kluwer.
Sag, Ivan A. 2005 Adverb extraction and coordination: a reply to Levine. In: Stefan Müller (ed.), Proceedings of the HPSG05 Conference, 322−342. Department of Informatics, University of Lisbon.
Schönfinkel, Moses 1924 Über die Bausteine der mathematischen Logik. (Mathematische Annalen 92.) Heidelberg: Springer.
Seiler, Hansjakob 1978 Determination: a functional dimension for interlanguage comparison. In: Hansjakob Seiler (ed.), Language Universals, 301−328. Tübingen: Narr.
Sproat, Richard, and Chilin Shih 1988 Prenominal adjectival ordering in English and Mandarin. Proceedings of the North Eastern Linguistic Society 18: 465−489.
Sproat, Richard, and Chilin Shih 1991 The cross-linguistic distribution of adjective ordering restrictions. In: Carol Perkins Georgopoulos and Roberta Lynn Ishihara (eds.), Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, 565−593. Dordrecht: Academic Press.
Storrer, Angelika 2003 Ergänzungen und Angaben. In: Vilmos Ágel et al. (eds.), Dependenz und Valenz, 764−780. Berlin: de Gruyter.
Strawson, Peter Frederick 1950 On referring. Mind 59: 320−344.
Tesnière, Lucien 1959 Eléments de Syntaxe Structurale. Paris: Klincksieck.
Daniel Hole, Stuttgart (Germany)
38. Models of Control
It’s so empty without me. (Eminem)
1. Introduction
2. Controller choice and obligatory control
3. The syntax of the controlled phrase
4. Surprising facts
5. Whither control?
6. References (selected)
Abstract

This article is concerned with theoretical treatments of control, and thus complements article 14, which deals with control as an empirical phenomenon. Very early treatments of control already suggest two different approaches to deal with the phenomenon, either by assuming lexico-semantic generalizations, or by employing formal syntactic operations. More recent theoretical treatments of control can still be classified according to the opposition between lexico-semantic and syntactic treatments, but introduce further distinctions, in particular the distinction between obligatory and non-obligatory control. This paper focuses on the following analyses: Bech (1955/57), Rosenbaum (1970), Manzini (1983), Pollard and Sag (1994), Hornstein (1999), Manzini and Roussou (2000), Polinsky and Potsdam (2002), and Jackendoff and Culicover (2003). In addition to the treatment of controller choice, we will address the syntactic structure of the controlled phrase, the treatment of complement vs. adjunct control, and cases of backward control.
1. Introduction

Control describes an anaphoric relationship between a verbal projection and an NP. The content of the verbal projection is predicated of the NP, which is not in the syntactic domain of the head of that verbal projection. The majority of models of control do not assume that the verbal predicate and the NP are directly connected. Instead, the combination is mediated by a (pro-)nominal element that acts as the subject of the VP and is anaphorically related to the NP. Viewed from this perspective, it may seem that control requires an analysis couched in terms of semantics as well as in terms of syntax − a position hinted at in an assessment by Chomsky (1981: 78). But at the same time, control refers to local syntactic domains − thus justifying a characterization of control as a syntactic phenomenon.

Various models of control have been proposed (Bech 1955/57; Rosenbaum 1970; Williams 1980; Chomsky 1981; Bresnan 1982; Manzini 1983; Chierchia 1984; Pollard and Sag 1994; Hornstein 1999; Landau 2000; Manzini and Roussou 2000; Polinsky and Potsdam 2002; Jackendoff and Culicover 2003 among others), and a cursory survey may reveal great differences between individual proposals. Such differences, however, hamper a comparison to a lesser degree than the observation that models of control do not
necessarily agree as to what should be their subject matter. In an ideal scholarly environment, different models of control would be matched on the basis of an agreed-upon set of phenomena, and one would choose the model with the smallest inventory covering all the phenomena. Unfortunately, such a model does not exist, despite the claim that more recent models of control are “exercise[s] in grammatical downsizing” (Hornstein 1999: 69). Such an exercise would only be valid if the new model covers the same comprehensive ground as the models it aims to supersede. But a comprehensive coverage of the phenomenon cannot be found in a single model of control. Given this perspective, we can only set a more circumscribed goal of identifying the model that accounts for the largest subset of phenomena. But even this move does not lead to the identification of an optimal model, unless we restrict our focus to a subset so small that most parts of the phenomenon will not be embraced. Such reduced sets will include neither certain cases of control coercion (also known as controller shift), as illustrated in (1), nor adjunct control, as illustrated in (2).

(1) John persuaded Sue to be left alone.

(2) Mein Chef schickte mich nach Hamburg, um einen Chauffeur zu besorgen. [German]
    my boss sent me to Hamburg in.order a driver to obtain
    ‘My boss sent mei to Hamburg ei to find a driver.’
While the data covered by different models of control diverge considerably, the models agree that control is a local syntactic relation, as can be illustrated with the examples in (3).

(3) a. Johni promised Paulj [to behave himselfi/*j].
    b. Johni persuaded Paulj [to behave himself*i/j].
    c. *Maryi guaranteed that [S John promised [to behave herselfi]].
    d. Johni believed that [S it was necessary [to behave himselfi]].
    e. Maryi said that [S John believed that [S it was necessary [to behave herselfi]]].
The ungrammaticality of (3c) points up the locality of control: the control relation seems to be confined to the clause that includes the control predicate, promise in (3c). The examples in (3a, b) conform to this locality condition, while (3d, e) apparently do not. A reaction to the grammaticality of these examples would be to postulate that they are not instances of control in a strict sense. Williams (1980) drew such a distinction, according to which obligatory control (OC) − as exemplified in (3a−c) − must be set apart from non-obligatory control (NOC). The examples in (3d, e) are taken to be instances of NOC. Locality is then presupposed to be characteristic of obligatory control. The behaviour of predicates like promise in (3a) is interesting with respect to locality, since the controller is obviously not the most local phrase − this would be the object [NP Paul], as in (3b). Similar claims can be made for (1), while the controller in (2) does not seem to occupy a position from where to control into the adjunct at all. A prominent strand of models assumes that OC is not only local in the sense that the controller must be in the next dominating clause, but also that the controller must be the element that is realized most closely to the controlled element. Verbs like promise seem to contradict
this conclusion. And while several proposals (most prominently Larson 1991 and Hornstein 1999) have tried to analyse the behaviour of promise as a lexical anomaly, it turns out that non-minimal, yet local OC is far more common than a treatment as a lexical anomaly would suggest. It may seem obvious from the data that lexical classes of predicates trigger control, possibly determined by their semantic commonalities. Still, this conclusion is partly controversial, and partly premature. Controversy emerges as models either define control in terms of properties of lexical classes, or conclude that it is a purely structural phenomenon. The former position is held in Bech (1955/57), Bresnan (1982), Pollard and Sag (1994), and Jackendoff and Culicover (2003); the latter in Hornstein (1999) and Manzini and Roussou (2000). Adjunct control, as illustrated in (2), shows that the conclusion is premature. It does not require a lexical trigger. The integration of adjunct control is problematic for control theories based on lexical classes. Structural accounts have to assume that every predicate can be turned into a control predicate, or, stated in structural terms: the emergence of control is not dependent on the material contained in a control construction, but only on its structural make-up. The problem becomes apparent if control is compared to a phenomenon such as anaphoric binding (cf. Fischer, this volume). Neither reflexivization in itself nor the structural domains in which reflexive pronouns have to be bound are dependent on lexical classes: predicates allow the reflexivization of their arguments, and reflexives are bound in appropriate local domains, provided they contain the relevant structural properties. But, as (3a, b) show, the controller seems to depend on the control predicate. Quite often, this observation is either ignored, as in Hornstein (1999), or presumed to be based on a misconception of lexical ties of control. Manzini and Roussou (2000), e.g. 
cite cases of controller shift (i.e. control coercion), as exemplified in (1), to argue that a lexical determination of controller choice is only superficial, and that controller choice may vary with control predicates, but will be subject to locality conditions nevertheless. Section 2.3.2 will cover this claim in more detail. It seems that models of control are mainly concerned with answering one question, namely how controller choice − as illustrated in (3a, b) − is determined. But in addition, models of control have to assess various syntactic properties of the controlled phrase. One major question concerns the syntactic structure and category of the controlled phrase (are controlled phrases verbal by necessity?). Another point to be addressed is whether control is crosscut by the distinction between complements and adjuncts. Finally: does control have to be restricted to non-finite phrases? Finite control seems to suggest a negative answer to this question. And yet, there is a broad consensus that control affects non-finite phrases more commonly. As the existence of control into finite clauses cannot be disputed, it should follow from the same model. Many models, however, block finite control explicitly or implicitly, as e.g. in Pollard and Sag (1994), and Manzini and Roussou (2000). We will restrict our survey to analyses of control into non-finite phrases − for the simple reason that most models do not even deal with control into finite phrases (we note, however, Landau 2004 as an exception). In Rosenbaum’s model (1967, 1970), which can be seen as one starting point, the question of controller choice is intertwined with the structure of the controlled phrase: it is characterized as a non-finite complement clause whose subject is identical to another syntactic argument of a control predicate. The subject of this controlled complement is erased under identity, hence Rosenbaum named the relevant transformation Equi-NP-
Deletion. A Principle of Minimal Distance (PMD), later known as the Minimal Distance Principle (MDP), guides Equi-NP-Deletion. The MDP requires that the subject of the controlled clausal complement be identical to the closest nominal argument of the control predicate. Hence subject control will emerge if the control predicate only possesses one further syntactic argument, and object control becomes necessary if the control predicate governs additional nominal objects. Manzini and Roussou (2000) define controller choice in terms of a variant of the MDP. The controller stands in a nonlocal relationship to a verbal, subjectless projection. Not only Manzini and Roussou (2000), but also Larson (1991) and Hornstein (1999) invoke the MDP with regard to controller choice. In the models of Pollard and Sag (1994) and Jackendoff and Culicover (2003), however, controller choice is disentangled from syntactic configurations and distance measures. There are models of control that address the question of controller choice, but do not commit themselves to a particular syntactic structure of the controlled phrase; Bech’s (1955/57) model of control − presumably the first generative analysis of control − can be characterized in these terms. Bech assumes that control is lexically triggered. Controlled phrases are always verbal in Bech’s model, but Bech does not assign a particular syntactic structure to them. The missing subject of the controlled phrase is identified as one of the other arguments of the control predicate. This leads to syntactic models of control that do not assume a clausal but a VP analysis of the controlled complement, as e.g. Bresnan (1982), Chierchia (1984), and Pollard and Sag (1994). 
It should be noted though that these models differ with respect to the syntactic relevance of the missing subject: while Chierchia (1984) and Manzini and Roussou (2000) assume that control complements do not possess a subject at any level of linguistic description, Bresnan (1982) and Pollard and Sag (1994) make use of representations of the subject either in the functional structure or the argument structure of the controlled phrase.

Section 2 will discuss structural and semantic approaches to controller choice, section 3 will discuss the implications of particular models of control for the syntactic properties of the controlled phrase, and section 4 will illustrate the interaction between theoretical models and the phenomena they try to address with backward control (Polinsky and Potsdam 2002). A brief summary at the end of this survey serves to emphasize that models of control are still very much open to refinement and renovation.
2. Controller choice and obligatory control

2.1. Starting points

In his seminal study on non-finite constructions in German (Bech 1955/57), Gunnar Bech presents a model of controller choice that is dependent on lexical classes and already takes complement and adjunct control into consideration. Different classes are identified by their orientation, which is Bech’s term for controller choice. Orientation is expressed as a coefficient of a governor, whereby an argument of a governor is identified with the subject of a governed element. In (4), we present lexical entries for subject and object control verbs (German versprechen ‘promise’ and überreden ‘persuade’):
(4) a. versprechen(N′:N″)
    b. überreden(A′:N″)
Here, X′ stands for the argument X of the governing element, and Y″ stands for the argument Y of the immediately governed element, where arguments are differentiated according to their case-markings. Bech takes into account that the realization of N′ depends on the finiteness of V, and takes orientation to be a semantic relationship. Hence, the coefficient N′:N″ is to be read as ‘the argument of the governor that would be realized in the nominative determines the reference of the argument of the governed element that would be realized in the nominative’. The coefficient (N′:N″) indicates that versprechen as a governor will orient the NP which is marked nominative, if versprechen is realized finitely (i.e. N′) towards the unexpressed nominative NP (i.e. N″) of the verb that it governs. The coefficient (A′:N″) in (4b) shows that überreden will orient its accusative NP (A′) towards the unexpressed subject (N″) of the verb it governs. It is thus not surprising that Bech’s model has been compared to Bresnan’s (1982) LFG analysis of control in terms of grammatical functions. Orientation is transitive, so local orientations could in principle be chained together, as illustrated in (5). Since überreden has the orientation (A′:N″) and versprechen the orientation (N″:N‴) it follows that (A′ = N″ = N‴). (5)
Ulrich überredete(A′:N″) Klaus, Maria zu versprechen(N″:N‴), sich zu bessern. [German]
Ulrich persuaded Klaus Maria to promise REFL to improve
‘Ulrich persuaded Klaus to promise Maria to better himself.’
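The transitivity of orientation illustrated in (5) can be sketched as coefficient chaining. The encoding below is mine, not Bech’s notation, and the function and dictionary names are hypothetical; the sketch only covers chains whose non-initial links are subject-control verbs:

```python
# Sketch of Bech-style orientation coefficients and their transitivity.
# Each control verb maps one of its own argument slots onto the unexpressed
# nominative (N'') of the verb it governs; chaining the coefficients
# identifies the controller of the most deeply embedded predicate.

# Controller slot per governing verb:
# "N" = nominative (subject control), "A" = accusative (object control).
LEXICON = {
    "überreden": "A",    # (A':N'') -- object control
    "versprechen": "N",  # (N':N'') -- subject control
}

def resolve_controller(chain, matrix_args):
    """chain: governing verbs from the matrix clause downwards.
    matrix_args: overt NPs of the matrix clause, keyed by case slot."""
    # The matrix verb orients one of its overt arguments ...
    controller = matrix_args[LEXICON[chain[0]]]
    # ... and each deeper subject-control verb passes the identification on:
    # its own N'' is unexpressed, so (N'':N''') transmits the same referent
    # downwards, yielding A' = N'' = N''' as in (5).
    for verb in chain[1:]:
        if LEXICON[verb] != "N":
            raise NotImplementedError(
                "an object-control verb deeper in the chain would orient "
                "one of its own overt NPs instead")
    return controller

# (5) Ulrich überredete Klaus, Maria zu versprechen, sich zu bessern.
print(resolve_controller(["überreden", "versprechen"],
                         {"N": "Ulrich", "A": "Klaus"}))  # prints: Klaus
```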
Adjunct control is mediated by constructional control coefficients. Bech assumes that adjunct control is defined between the syntactic projection of the modified verb, and the verbal complement of the German non-finite complementizers um ‘in order to’, anstatt ‘instead of’, and ohne ‘without’. Bech discusses control by subjects of the modified verb, as well as by objects, the latter illustrated in (2).

In characterizing (complement) control through coefficients of lexical entries, Bech’s model of control is a forerunner of the models of control in Bresnan (1982) and Pollard and Sag (1994). Bech also allows empty coefficients to account for arbitrary control (cf. [20] below). Bech’s approach can also account for the ungrammaticality of (3c). Bech stipulates that orientation can only be established between a governing predicate and a predicate that is directly dependent on the governing predicate − or between an adjunct and the element that is directly modified by the adjunct. Bech thus does not only provide a very early model of control, but also a model that already represents the locality of OC − albeit in terms of a stipulation.

The first model of control in transformational generative grammar was developed in Rosenbaum (1967, 1970). Rosenbaum assumes that control can be described by a deletion operation. An NP erases an embedded NP if the latter is syntactically identical to the former. This operation is accordingly called Equi-NP-Deletion. The pertinent transformation rule is constrained by the Minimal Distance Principle (MDP) (Rosenbaum
1970: 27) in (6). The MDP offers a structural, distance-based account of controller choice.

(6) An NPj (…) is erased by an identical NPi (…) if and only if there is an Sα (…) such that:
    (i)   NPj is dominated by Sα
    (ii)  NPi neither dominates nor is dominated by Sα
    (iii) for all NPk neither dominating nor dominated by Sα (…), the distance between NPj and NPk is greater than the distance between NPj and NPi
    where distance between two nodes is defined in terms of the number of branches in the path connecting them.
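The branch-counting notion of distance in (6) can be made concrete with a toy tree. The following is an illustrative sketch only; the tree encoding, node labels, and function names are mine:

```python
# Toy illustration of the MDP in (6): the distance between two nodes is the
# number of branches on the path connecting them, and the erasing NP must be
# the candidate closest to the embedded subject NP_j.

# Parent links for a schematic 'John persuaded Paul [S NP_j to ...]' tree.
PARENT = {
    "NP_John": "S", "VP": "S",
    "V": "VP", "NP_Paul": "VP", "S_emb": "VP",
    "NP_j": "S_emb",
}

def path_to_root(node):
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def distance(a, b):
    """Number of branches on the path connecting a and b."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_of_b = set(pb)
    lca = next(n for n in pa if n in ancestors_of_b)  # lowest common ancestor
    return pa.index(lca) + pb.index(lca)

candidates = ["NP_John", "NP_Paul"]
controller = min(candidates, key=lambda np: distance("NP_j", np))
print(distance("NP_j", "NP_John"), distance("NP_j", "NP_Paul"))  # 4 3
print(controller)  # NP_Paul -- object control, as the MDP predicts for (3b)
```

On this encoding the matrix object is three branches away from the embedded subject and the matrix subject four, which is why the MDP wrongly selects the object for promise-type verbs like (3a).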
The MDP in (6) requires that the erasing NP be the closest NP available. Controller choice is thus derived from the subcategorization frame of the controlling predicate and the MDP. In (3a, b) NPj is the subject of the embedded sentence [S NP to behave himself]. In (3b) the embedded subject (Paul) is erased by the matrix object since the distance between NPj and the object is smaller than the distance between NPj and the subject John, as required by (6iii). Hence, the condition is at odds with the behaviour of promise in (3a), where object control would be expected by the same line of reasoning. With respect to these cases, Rosenbaum (1970: 28) remarks that “[t]here are apparent exceptions to the [MDP], but it is too early to determine whether the fault lies with the principle or with the analyses ascribed to these exceptions.” Future treatments have in fact followed both lines of inquiry. While Pollard and Sag (1994) and Jackendoff and Culicover (2003) have opted for the first option, Hornstein (1999) and Manzini and Roussou (2000) have argued for the second one.

Rosenbaum’s analysis in terms of Equi-NP-Deletion has been criticized on syntactic and semantic grounds. Controlled complements start out as clausal complements in Rosenbaum’s analysis. Brame (1976: 93−95) points out that Equi-NP-Deletion thus wrongly predicts that predicates taking non-finite VPs can take (non-)finite clausal complements as well. He notes that predicates such as try, decide, condescend, attempt, need, persuade, and convince take non-finite VPs, but prohibit the realization of non-finite for-to-clauses, as illustrated for decide in (7).

(7) *John decided for Mary to sing.
One cannot argue that (7) is semantically deviant, as an interpretation of the ungrammatical sentence is readily available; the same holds for other control verbs. Rosenbaum’s proposal has also been criticized from a semantic perspective, as discussed in McCawley (1998: 127−128). Simple quantificational control sentences can be used to test the semantic validity of Equi-NP-Deletion in the following way: if quantificational control sentences are the result of deleting the quantificational (i.e. controlled) phrase in the lower clause, these sentences should receive an interpretation that differs from a bound variable interpretation.

(8)
a. Every contestant expects to win.
b. Every contestant expects that every contestant wins.
c. Every contestanti expects that hei will win.
38. Models of Control
1327
According to (6), (8a) should receive the implausible interpretation (8b). Clearly, (8a) is best paraphrased by the bound variable reading represented in (8c), which should not be available at all, given that the pertinent NPs (every contestant, he) are not identical. The idea of a deletion of identical NPs was given up during the 1970s. Instead, structural models of control take it for granted that the subject of the controlled verb shares properties with a reflexive pronoun, which also accounts for the bound variable reading in (8).
2.2. Control as anaphoric binding

Manzini (1983) follows Chomsky (1981) in presupposing that controlled phrases contain a covert syntactic subject, PRO. Chomsky (1981) assumes that PRO is a reflexive and a pronoun at the same time; this conflict determines the distribution of PRO. In contrast, Manzini assumes that PRO is an anaphor, and hence subject to binding theory (cf. Fischer, this volume). Manzini (1983) proposes that, within a local domain, any controller choice is possible. Specific controller choices are thus not tied to control predicates but to semantic and pragmatic considerations, which unfortunately are not made explicit. As was pointed out already, Manzini’s conclusion is a reaction to data like (1) and (9), where object control verbs can act as subject control verbs and vice versa.

(9)
Mary promised Bill to be allowed to shave himself.
PRO is classified as an anaphor, and hence requires a binder in a suitable domain. If PRO is contained in an object clause, the usual conditions of binding theory apply: the next S dominating the object clause contains all elements necessary to establish a binding domain for PRO, and consequently, PRO has to be bound by an NP contained in that S. Thus (3c) is classified as ungrammatical since the antecedent of PRO is not contained in PRO’s binding domain, as further illustrated in (10).

(10) *[S Maryi guaranteed [S′ that [S John promised [S′ PROi to behave herselfi ]]]].

The analysis proceeds as follows (cf. Manzini 1983: 424−426): The reflexive anaphor herself in the lowest S′ must be bound by PRO, and hence bears the same index as PRO. But PRO is an anaphor as well. The dominating S contains all structural ingredients required to become a binding domain for PRO: it contains a governor for the lower S′, namely the verb promise, and it contains an accessible subject, [NP John], which would thus have to be the binder of PRO.

The situation is different with regard to (3d, e). It should be noted first that these examples involve it-extraposition of a subject sentence. Hence, they have to be treated on a par with examples like (11), taken from Manzini (1983: 424).

(11) a. [PRO to behave oneself in public] would help Bill.
b. [PRO to behave himself in public] would help Bill.
c. Mary knows that [PRO to behave herself in public] would help Bill.
d. [PRO to behave himself in public] would help Bill’s development.
I would like to point out that the grammaticality status of (11a, b), as well as of (12a), is not undisputed, but I will assume the examples to be grammatical for the sake of the argument presented in Manzini (1983). The reflexive oneself in (11a) shows that PRO is not coreferential with Bill; but coreferentiality is not blocked, as indicated in (11b). Similarly, PRO can be coreferential with an argument in a higher clause (11c), and even with an NP contained within another NP (11d). The examples in (11) can be turned into extraposed variants akin to (3), as given in (12).

(12) a. Iti would help Bill [PRO to behave oneself in public]i.
b. Iti would help Bill [PRO to behave himself in public]i.
c. Mary knows that iti would help Bill [PRO to behave herself in public]i.
d. Iti would help Bill’s development [PRO to behave himself in public]i.
The basic difference between (3a, b) and (3d, e) is the non-accessibility of the subject. As the controlled phrases are subjects themselves, it would become necessary to coindex PRO with the phrase that contains PRO, i.e. S′, as indicated for (3d) below in (13).

(13) Johni believed [that [S [S′ PROi to behave himselfi ]i [Agr wasi ] necessary]].

Following the assumptions of Government-Binding Theory (Chomsky 1981), the only accessible subject within S in (13) is the head of S = Agr (sometimes called Infl or just I). But Agr and the non-finite phrase are co-indexed by definition (because the non-finite phrase is the specifier of Agr). Agr would not only be the accessible subject for an anaphoric binding of PRO, it would in fact be the binder as well. As a consequence, PRO, the S′ containing PRO, and Agr would bear the same index in (13). Such a co-indexation would lead to self-indexation (as PRO is contained in S′). This is blocked on principled grounds by the infamous i-within-i condition. Since the blocked co-indexation in (13) would be the only one available, a binding domain for PRO cannot be defined, and in a slightly circular move, the status of PRO as an anaphor is denied, as it does not have a binding domain. Hence, PRO is classified as a pronominal entity in (13), as well as in (3e), (11), and (12), and can pick up antecedents in non-local domains.
2.3. Syntactic treatments of controller choice

The proposals discussed in this section differ from Manzini’s proposal in that control receives a purely syntactic treatment, thus following Rosenbaum’s suggestion that the fault does not lie with the MDP. The analyses of Hornstein (1999) and Manzini and Roussou (2000) not only share the assumption that the MDP is basically correct, but also propose that it can be derived from a more general syntactic condition on locality, the (Scopal) Minimal Link Condition (which receives different formulations in the two approaches). While Hornstein (1999) assimilates control to subject raising by assuming that both are instances of phrasal movement, Manzini and Roussou’s analysis is
based on the attraction of features. Attraction, however, does not differ from movement with respect to the locality conditions it must obey. Since there are many similarities between Hornstein (1999) and Manzini and Roussou (2000), we will discuss their approaches in parallel, beginning with the treatment of subject control, object control, and promise-type verbs in section 2.3.1, followed by the treatment of adjunct control and arbitrary control in section 2.3.2; finally, we discuss some critical aspects in section 2.3.3.
2.3.1. The locality of subject and object control

Hornstein’s starting point is the observation that OC does not allow split antecedents. He suggests that OC should be treated as movement, differing from subject raising only in that a controller bears more than one θ-role. If control is treated as movement, it has to obey constraints on movement, in particular the Minimal Link Condition (MLC). The MLC requires that movement proceed in local steps; leapfrogging of potential antecedents is not allowed. Hence, the moved phrase finds the most local antecedent available (Hornstein 1999: 76). A formulation of the MLC is given in (14).

(14) Minimal Link Condition
α moves to γ if there is no β, β closer to γ than α, such that α could move to β.

In a case of object control, the embedded subject of a controlled phrase has to move to the object position of the controlling verb, since this position is a possible landing site β, and hence closer to the embedded subject than the matrix subject position. In the case of subject control, the controlling verb does not possess an object, hence no object position intervenes between the embedded subject position and the higher subject position, and movement into the latter is not blocked by intervention. Hornstein (1999: 83, 87) assumes that promise-type verbs are marked constructions in that they are acquired significantly later than transitive object control constructions − a dubious assumption, which is in need of empirical corroboration. Without further proviso, the object of promise would be β in (14), so that the embedded subject (= α) would have to move to the object of promise (= β) instead of moving to its subject (= γ).

Manzini and Roussou (2000) point out that the MDP is based on the same notion of closeness as the Minimal Link Condition; it thus seems plausible to unify both constraints. Although Manzini and Roussou follow Hornstein in invoking the MLC to account for controller choice, their analysis of control is not one of movement (Manzini and Roussou 2000: 417).
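The intervention logic of (14) amounts to selecting the closest available landing site. This can be sketched schematically; the distance figures below are my own illustrative counts, not part of either author’s formalism:

```python
# Illustrative sketch: the MLC as "move to the closest available
# landing site". Distances are schematic counts of intervening nodes
# between the embedded subject and each potential landing site.

def mlc_target(positions):
    """positions: list of (label, distance) pairs;
    return the label of the closest licit landing site."""
    return min(positions, key=lambda p: p[1])[0]

# "John persuaded Paul to leave": the object position intervenes
# between the embedded subject and the matrix subject position.
object_control = mlc_target([("matrix_subject", 2), ("matrix_object", 1)])

# "John tried to leave": no object position intervenes.
subject_control = mlc_target([("matrix_subject", 2)])

print(object_control)   # matrix_object  -> object control
print(subject_control)  # matrix_subject -> subject control
```

The point of the sketch is simply that promise-type verbs are the anomaly: their object would always be the minimum of such a list, which is why Hornstein must stipulate markedness for them.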
Instead, they assume that θ-roles can be treated as features of predicates, and that syntactic specifiers (i.e. phrases occurring in specifier positions) must attract these features. Controlling phrases thus attract the θ-roles of predicates. And this attraction is subject to the MLC. Manzini and Roussou’s analysis further differs from Hornstein’s, and in fact from the majority of analyses within transformational grammar, in that it does not require arguments to start out in θ-positions prior to movement into functional specifier positions. The operation Attract takes care of the identification of specifiers and θ-roles.
Control emerges in the specific situation that one argument attracts more than one predicate, and hence establishes a link between one argument and more than one thematic role. Subject control is represented as in (15).

(15) [IP John I [VP tried [IP to [VP leave]]]]

The boldface type in (15) indicates that the specifier (and subject) John stands in certain thematic relations both to the predicate try and to the predicate leave, i.e. it attracts the external θ-roles of both predicates. As the operation Attract directly associates an NP in specifier position with a θ-role, viewed as a feature of a predicate, the need for a representation of a subject (such as PRO) in the controlled phrase vanishes in Manzini and Roussou’s analysis. Accordingly, there is no subject of the controlled phrase in (15). Such an association may be blocked if association of a predicate with a closer NP is possible, as expressed in terms of a Scopal Minimal Link Condition in (16).

(16) Scopal MLC
Feature F attracts feature FA only down to the next F′ that also attracts FA. (Manzini and Roussou 2000: 422)

The workings of the Scopal MLC can best be illustrated by analysing object control verbs such as persuade, as given in (17).

(17) [S Mary Agr [VP persuaded [νP John ν [VP V [S to [VP eat ]]]]]]

In (17), Manzini and Roussou (2000: 424) apply Larson’s (1991) shell analysis to control verbs. The verb persuade starts as an abstract verb V, which takes the lowest S as its complement. The NP John occupies the specifier position of νP, where it receives accusative case (the configuration ν [VP V … is not present in [15] since try is an intransitive verb; it does not assign accusative, and hence the presence of ν is not required). John can attract the internal θ-role of the embedded abstract verb V, as well as the external θ-role of the verb eat, since there is no lower F′ that would attract either role. The attraction is thus in accordance with the Scopal MLC in (16).
The subject Mary establishes a specific thematic relation to persuade in the highest VP by attracting the external θ-role of persuade. The establishment of a similar relation between Mary and the embedded verb eat is blocked by the presence of John, which has attracted the respective θ-role.

The operation Attract shares several properties with Hornstein’s analysis of control as movement: it is subject to a locality condition (the MLC), and like movement operations in general, it should be sensitive to island constraints. Hence, it may seem as if Manzini and Roussou’s analysis is a variant of Hornstein’s analysis. But Manzini and Roussou offer a strikingly different analysis of promise-type verbs. They employ Larson’s (1991) proposal not only to deal with object control, but also to account for promise-type verbs. Larson (1991: 104, 111−113) assimilated control cases of promise to ditransitive realizations of promise, as illustrated in (18).

(18) a. John promised Mary a sports car.
b. John promised a sports car to Mary.
Larson assumes that (18a) must be derived from (18b), so that Mary starts out (in D-Structure) from a syntactic position that is less prominent than the position of the other object. Larson (1991: 113) applies the same analysis to control cases, so that (3a) receives the D-structural analysis in (19).

(19) [VP [NP John] [V′ [V e ] [VP [NP e ] [V′ [V′ promise [NP Paul]] [α to behave himself]]]]]

In the representation in (19), the NP Paul forms a constituent with promise, to the exclusion of the non-finite complement (the category of which is given as α in Larson’s analysis). The complement is in fact analysed as a V′-adjunct. The NP Paul moves into the empty − dethematized − subject position of the lower VP, while promise moves into the empty head position of the higher VP, yielding the usual serialization encountered in (3a).

Manzini and Roussou (2000: 424) do not directly apply Larson’s analysis to promise, but only refer to it. The crucial aspect for the theory of attraction is that an attractor must be syntactically more prominent than the attracted θ-role, which is not the case for the external role of to behave himself in (19). It should be noted, though, that Larson’s analysis rests on the stipulation of D-Structure, which is presumably not available in Manzini and Roussou’s analysis − recall that subjects and objects do not have to appear in argument position prior to being realized as specifiers. It would therefore have been clearly preferable if Manzini and Roussou had provided a detailed analysis of sentences containing promise. Apart from that, we will see that Larson’s analysis has been subjected to criticism by Jackendoff and Culicover (2003), to which we will return below.
2.3.2. Arbitrary control and adjunct control

If control is movement, as in Hornstein (1999), then arbitrary control appears to be somewhat of an anomaly − a problem that has already been pointed out in Jackendoff and Culicover (2003: 519). Arbitrary control is exemplified in (20): there is no controller, while a movement analysis seems to suggest that control requires its presence.

(20) It is hard to work.

Hornstein proposes that the infinitive’s subject in (20) is syntactically realized as the empty pronominal pro, so that movement does not handle arbitrary control. Manzini and Roussou (2000: 427−428) object to this analysis since it requires the presence of pro in the grammar of English − an element considered problematic even in the grammars of languages that seem to require it, such as Romance languages with free drop of subject pronouns. Instead, Manzini and Roussou assume that arbitrary control is attraction of a predicate by an abstract quantificational operator in C, the head of the clause, as sketched in (21).

(21) [C [it is hard [to work]]]

The abstract quantificational operator C in (21) attracts only the predicate work. Manzini and Roussou (2000) do not assume an analysis where the expletive subject attracts the
predicates hard and work, as sketched in (22), because the expletive is actually a correlate of the non-finite complement, cf. (23).

(22) [it is hard [to work]]

But it remains unclear why the abstract quantifier in (21) does not attract both predicates. Manzini and Roussou (2000: 429) assume that only DPs may act as attractors, so that an attraction of hard by its non-finite subject is blocked in (23). But since the abstract element is a quantifier, it seems plausible to analyse it as a DP, which would then be able to attract the predicates.

(23) [C [[To work] is hard]]

The same analysis is applied to control into subject sentences in general; C is taken to be open to anaphoric relationships. The consequences for the distinction between OC and NOC will be discussed below. It should be noted here that a relationship between C and work in terms of Attract violates the Subject Condition, a strong island. The consequences of this objection will become clear in their analysis of adjunct control.

In opposition to a longstanding tradition in transformational grammar (which is very present in Manzini and Roussou’s analysis, as we have seen in 2.3.1), Hornstein (1999) assumes that movement is not necessarily guided by c-command. Consequently, Hornstein denies that upward movement is the only option. This is so because Hornstein requires sideward movement to account for adjunct control cases like (24), cf. Hornstein (1999: 88−89).

(24) John heard Mary without entering the room.

(25) [S John [Agr past [VP/VP [VP John [hear Mary]] [Adj without [S John [Agr ing [VP John [enter the room]]]]]]]]

The subject of enter moves from the position of the most deeply embedded VP into the specifier position of the adjunct clause in (25). From this position, it is required to move into the specifier position of the VP headed by hear, to pick up the θ-role of this verb. The specifier position of hear, however, does not c-command the specifier position of the most embedded clause.
What is more, movement from an adjunct should be ungrammatical in general, since adjuncts are considered strong islands − a position in fact denied by Hornstein (1999: 89). Yet adjunct control is perfectly grammatical. Hornstein has to assume that adjunct control is always subject control: sideward movement seems to imply that movement from the specifier of the lower S into the object position of hear is not allowed, and hence that object control could not take place into adjuncts.

Manzini and Roussou explicitly acknowledge the status of adjuncts as strong islands. Consequently, neither attraction nor movement out of an adjunct should be possible. Manzini and Roussou note, however, that islands can be circumvented by parasitic gaps, as exemplified in (26).

(26) [Which book]i did you review ti [without reading ei ]?
The gap ei inside the adjunct is parasitic on the gap ti in object position in (26). Manzini and Roussou apply a similar configuration to their analysis of adjunct control, which shares with Hornstein’s analysis the problematic assumption that adjunct control by objects is excluded. In an example like (27), the subject John attracts the predicate leave inside the VP and the predicate eat inside the adjunct, just as the wh-phrase in (26) attracts the trace ti inside the VP and the gap ei inside the adjunct.

(27) [S John Past [VP [νP leave the house] [before [ing [ eat]]]]]

Although they do not discuss this issue, the lack of object control into adjuncts can be derived from this analysis, since parasitic gaps do not emerge from movement inside the VP (or νP). Interestingly, the analysis requires that in an example like (26) two parasitic configurations are actually involved: one established by wh-movement and one by control.
2.3.3. Critical aspects of a syntactic theory of control

The syntactic models of control presented here have provoked quite a few critical reactions. One major point of criticism concerns the treatment of promise-type predicates. The analyses presented here basically follow the two-pronged strategy already proposed in Rosenbaum (1970): Hornstein assumes that the MDP is correct, and that promise has to be taken as an exception; Manzini and Roussou assume that the standard analysis of promise-type verbs is wrong.

Landau (2000: 201) addresses Hornstein’s claim that promise-type verbs are marked. He points out that even if promise is classified as marked, the MLC cannot be understood in such terms, since it is a general condition on movement; hence it cannot be subject to markedness conditions. If control is movement, and as such is constrained by the MLC, markedness should simply not occur in its realm. In addition, Hornstein’s assumption is at odds with a number of predicates that behave exactly like promise, but are even less susceptible to markedness. Chomsky (1968: 58) cites minimal pairs such as (28a, b) to show that object as well as subject control is possible in the presence of an object without giving rise to markedness. Further examples for English have been provided by Postal (1970).

(28) a. John asked me what to wear.
b. John told me what to wear.

Clearly, the verb ask in (28a) could be accounted for by assuming Larson’s (1991) analysis of promise as a double object verb, as illustrated in (19). This analysis has been criticized by Jackendoff and Culicover (2003: 529−531), among others. Jackendoff and Culicover point out that Larson’s analysis does not take into account the syntactic behaviour of nominalized control predicates such as vow, offer, guarantee, pledge, and oath, as illustrated for promise in (29).

(29) a. John gave Susan some sort of promise to take care of himself/*herself.
b.
Susan got from John some sort of promise to take care of himself/*herself.
Larson’s analysis may be applicable to the verb promise, as promise occurs in ditransitive constructions. But the verb pledge − which shares its behaviour with promise in control constructions − does not occur in ditransitive constructions. Yet it requires subject control despite the presence of an (indirect) object, as illustrated in (30), taken from Jackendoff and Culicover (2003: 529).

(30) John pledged to Susan to take care of himself.

What is more, the verb tell occurs in the ditransitive construction (Jackendoff and Culicover 2003: 531), but requires object control, as was illustrated in (28b). Taken together, these points suggest that MDP-based accounts as well as MLC-based accounts still suffer from the same problems as Rosenbaum’s (1967, 1970) initial analysis − irrespective of whether the MLC is defined as a constraint on actual movement or on the distance between an NP and the predicate(s) which assign a role to that NP. The problems can neither be eliminated by presuming markedness of lexical predicates with respect to control, nor by restructuring the argument structure of the predicates so as to turn superficial proximity into a larger underlying distance. The first strategy is at odds with the general concept of movement, and the second fails to account for the full range of data (a problem that holds for the first strategy as well).

Another problem for both Hornstein (1999) and Manzini and Roussou (2000) emerges from cases of non-control through a derived subject, as illustrated in (31).

(31) The boat was sunk [to collect the insurance].

An analysis in terms of movement or attraction would wrongly predict that the adjunct is controlled by the derived subject, while it is clear that we either have a case of arbitrary control or control by the implicit demoted subject.
Manzini and Roussou (2000: 435−436) assume that the adjunct in (31) is in fact attached at the sentential level, and that it is controlled by an abstract quantificational operator, just as in (21). But this conclusion is problematic in two respects. First, as Manzini and Roussou (2000: 436) acknowledge, it requires the same syntactic structure for the active counterpart of (31), as illustrated in (32).

(32) [C [S [S The shipowners sank the boat] [to collect the insurance]]].

Now the subject cannot attract the embedded predicate, since it does not c-command it. It thus remains unclear how the obvious control relation between the subject and the adjunct is established in (32). Manzini and Roussou (2000) assume that an anaphoric relation between the subject and the abstract operator establishes control indirectly here. But the subject of the clause is clearly the only element to which the operator can establish an anaphoric relationship − a relationship which, holding between C and the subject, would in fact be cataphoric on Manzini and Roussou’s account. They explain this case of apparent control by assuming that pragmatic inferences rule out any other controller, but they remain silent on the details of this claim. Which pragmatic inference should block the matrix subject Mary from acting as a controller in cases like (33)?
(33) Mary claimed that John sank the boat to collect the insurance.

But apparent super-control seems not to be pragmatically odd, but simply ungrammatical in cases like (33). In (32), C and the subject of the matrix clause must not bear the same index: the subject is an R-expression and cannot be bound by the quantificational operator C. So syntax strictly prohibits the very inference that seems to be the only one available in (32) and (33), irrespective of how much pragmatics is invoked. In this respect, the examples (32) and (33) sharply differ from cases of NOC with subject sentences, where various anaphoric options become possible.

As in general for adjunct control, Manzini and Roussou assume a parasitic gap configuration in (31) and (32). As the examples are still instances of adjunct control, apparent violations of adjunct islands must be circumvented. They do this by stipulating that the head of the matrix clause, i.e. tense, is controlled, but do not spell this out.

Adjunct control is also problematic in that Manzini and Roussou’s analysis prohibits object control into adjuncts. The existence of object control into adjuncts is noted in passing even in Hornstein (1999), and it was already exemplified in (2). What is more, parasitic gaps are generally considered to be a marginal phenomenon, while adjunct control is ubiquitous. If the same mechanism applies to both constructions, why does one belong to the core of grammar, while the other dwells in marginality?

Finally, it should be mentioned that the analysis of Manzini and Roussou (2000) does not directly reflect the distinction between OC and NOC. Complement control and certain instances of adjunct control are taken to be instances of OC, where OC proper can be characterized in terms of Attract by a nominal argument. Other instances of adjunct control, as well as instances of control into subject sentences, are taken to be instances of Attract by an abstract operator in C.
Since this operator is in principle open to anaphoric relationships, attraction by C indirectly leads to NOC. Both adjunct control and control into subject sentences require the circumvention of strong islands, either by attracting more than one lexical predicate or by attracting matrix tense through C.
2.4. Semantic analyses of controller choice

Semantic analyses of control and controller choice bring the configurational variability of control constructions to the fore, particularly in contrast to the observed rigidity of controller choice. Nominalizations have been discussed in the last section (cf. [29]) and are further illustrated with the examples in (34), taken from Jackendoff and Culicover (2003: 529) and Pollard and Sag (1994: 284−285).

(34) a. the promise to Susan from John to take care of himself/*herself
b. The promise that Sandy made, to leave the party early, caused quite an uproar.

The examples in (29) and (34) show control by the participant who gives the promise, but this participant may be part of a prepositional complement of a deverbal noun, as in (34a), or be contained in a relative clause, as in (34b). In (29a), the controller is a non-local subject of a benefactive verb embedding the control predicate, and in (29b) it is even the prepositional object (the demoted subject) of a passivized benefactive. The
control relation is not affected by this configurational variability. Similar examples can be given for object control verbs. What is more, certain configurations do not lend themselves to proposing a configurational controller at all, as has been pointed out by Williams (1985).

(35) a. Any such attempt [to leave] will be severely punished.
b. Yesterday’s orders [to leave] have been cancelled.

In semantically based control theories, controller choice is either directly determined through semantic properties of the control predicate, or through the set of thematic roles related to a specific predicate. This alternative is by no means new: initial ideas can already be found in Rosenbaum’s (1967) study, but the first prominent proposal was made in Jackendoff (1972: 214−216), who assumed that controller choice is mainly a matter of thematic roles, i.e. of the lexical semantics of the control predicate. This idea has been further developed in Williams (1980), Chomsky (1980), Bresnan (1982), Pollard and Sag (1994), and Jackendoff and Culicover (2003), among others.

In the following we will discuss the analyses of Pollard and Sag (1994) and Jackendoff and Culicover (2003). Pollard and Sag’s analysis determines controller choice on the basis of the semantic class of the control predicate, but generally requires the subjects of controlled predicates to be bound. While this may sound tautological at first glance, it has intricate consequences for the analysis of control coercion. We will see that semantically based theories of control (Jackendoff and Culicover 2003 in particular) offer new insights with regard to the distinction between OC and NOC. Their arguments suggest that a configurational analysis of this distinction − as e.g. proposed in Manzini (1983) − falls short of empirical coverage.
While this assessment will be shown to be basically correct, semantically based control theories have yet to provide a comprehensive alternative to cover clear cases of obligatory vs. non-obligatory control, as will also be illustrated with the analysis of Pollard and Sag (1994).
2.4.1. Two semantic models of control

Pollard and Sag (1994: 286−287) assume that control predicates can be reduced to three different types of relations: relations expressing an order or permission, relations expressing a commitment, and relations expressing an orientation. In addition to a propositional role identified with the controlled phrase, the three types of relations are equipped with different sets of thematic roles or participants: orders and permissions introduce an influencer and an influenced participant, commitments introduce a committor and a commissee, and orientations introduce an experiencer. Verbs like order, persuade, or appeal are instances of the influence relation. The same holds for prevent and forbid, in which a participant is influenced not to perform an action. Verbs like promise, pledge, or guarantee again invoke an action; they are instances of the commitment relation. Finally, verbs like want, desire, and expect are instances of the orientation relation, and do not invoke an action. Controller choice can then be expressed in terms of these relations and the thematic roles provided by them:
(36) Control Theory (preliminary version, Pollard and Sag 1994: 288)
If the semantics of an unsaturated phrase is the propositional argument in a relation that is a control relation, then the unexpressed subject of that phrase is coindexed with the Influenced, Committor, or Experiencer participant in that proposition, according as the control relation is of sort influence, commitment, or orientation, respectively.

The model in (36) can account for the interpretation of the examples in (34) and (35) without further proviso. It follows from the semantics of to give a promise in (34) that the participant expressed through the subject of give will be the one committing himself or herself to the promise. Since promise again is a commitment, it follows from (36) that the Committor is the controlling thematic role. Consequently, (34a) receives the interpretation that ‘John is committing himself to Susan to take care of himself’. In (35a), the pertinent predicate is attempt, a commitment. Consequently, the unexpressed subject of the (optional) syntactic argument of attempt is coindexed with the Committor role of attempt, yielding an interpretation in which someone who attempts − and thereby commits − to leave will be punished.

Semantically based models of control differ sharply from MDP-based analyses with respect to their treatment of predicates like promise. Predicates like promise are analysed as commitments that may or may not allow the syntactic realization of the commissee participant. But whether or not the participant is realized does not matter for establishing the proper control relation, which always holds between the Committor and the unexpressed subject of the controlled predicate (apparent counter-examples in cases of controller shift, as illustrated in [9], will be dealt with below, and it will turn out that they do not form counter-examples at all).
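The core of Pollard and Sag’s control theory can be rendered as a simple lookup from the semantic sort of a control relation to the participant that controls the unexpressed subject. The toy lexicon below is my own illustrative assignment, though the sorts and role names are those of the theory:

```python
# Illustrative sketch of Pollard and Sag's (1994) control theory:
# the semantic sort of a control relation determines which participant
# is coindexed with the unexpressed subject of the controlled phrase.

CONTROL_ROLE = {
    "influence":   "influenced",   # order, persuade, appeal, prevent, forbid
    "commitment":  "committor",    # promise, pledge, guarantee
    "orientation": "experiencer",  # want, desire, expect
}

LEXICON = {  # toy assignment of predicates to semantic sorts
    "persuade": "influence",  "order": "influence",
    "promise": "commitment",  "pledge": "commitment",
    "want": "orientation",    "expect": "orientation",
}

def controller(predicate):
    """Role coindexed with the unexpressed subject of the complement."""
    return CONTROL_ROLE[LEXICON[predicate]]

print(controller("promise"))   # committor  -> subject control
print(controller("persuade"))  # influenced -> object control
```

Note that the lookup is keyed to the semantic sort rather than to a syntactic configuration, which is exactly why promise requires no exceptional treatment here.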
There are two major differences between the model of Jackendoff and Culicover (2003) and the one in Pollard and Sag (1994): the two models assume different sets of fundamental classes underlying control theory, and they assign individual predicates to different classes. While Pollard and Sag (1994) propose three lexical relations to be responsible for control, Jackendoff and Culicover (2003) assume that control can be tied to actional complements. Jackendoff and Culicover (2003) claim that a distinction between obligatory and nonobligatory control, as proposed since Williams (1980), is too coarse to cover the variety of control relations. In particular, they point out that, contrary to the analysis of Manzini (1983), not all controlled predicates in object position show OC, while predicates in subject position may show OC (cf. also Stiebels, this vol.). As an example of a predicate allowing NOC with object phrases, Jackendoff and Culicover (2003: 522) present propose. Instead of assuming a distinction between OC and NOC, Jackendoff and Culicover (2003: 523) take unique control to depend on the controlled phrase denoting an action. The controller is the participant who is to perform that action. Actional complements are a subclass of situational complements, and can be identified by various tests, such as turning the complement into What X did was … If the actor is animate, a voluntary action becomes the default interpretation, which in turn can be identified by the imperative or by inserting adverbials such as voluntarily or PPs like on purpose (Jackendoff and Culicover 2003: 524). Different controller choices depend on two subtypes of relations: control relations that can be subsumed under intend require “that the intender uniquely controls the
1338
VI. Theoretical Approaches to Selected Syntactic Phenomena
actional complement”, while relations expressing a subsort of the predicate be obligated require the participant who is obligated to control the unexpressed subject of the actional complement (Jackendoff and Culicover 2003: 537−538). This is in fact not so much a property of certain control predicates as a general implication of the concept of being obligated: “One cannot be obligated to perform someone else’s action; that is, the action is necessarily bound to the person under obligation.” The semantic conditions for the predicates subsumed under intend and be obligated do not directly correspond to the concepts of subject and object control. The predicates decide and persuade − to be interpreted as ‘come to intend’ and ‘cause to come to intend’ − both fall under the intend predicate. They exhibit subject and object control because the respective participant is realized as subject or object, respectively. By the same reasoning, the predicate be obligated subsumes order as well as promise, yielding object control in the first case, but subject control in the second. It is crucial, however, that control relations are not determined by configurations, but solely by the inclusion of the respective predicate in their semantic representation. Consequently, promise shows no sign of exceptional behaviour in the model of Jackendoff and Culicover (2003), and the examples presented in (34) and (35) can be analysed just as illustrated above with Pollard and Sag’s (1994) analysis. The two models differ with regard to the number of classes of control relations (three in the case of Pollard and Sag 1994, two in the case of Jackendoff and Culicover 2003), as well as in the assignment of individual predicates to these classes. The predicates persuade and order form a class in Pollard and Sag (1994), while they do not in Jackendoff and Culicover (2003).
The predicates order and promise fall into two different classes in Pollard and Sag (1994), but into the same class in Jackendoff and Culicover (2003). The third class in Pollard and Sag’s analysis − orientation predicates − has to be dealt with as exceptional in Jackendoff and Culicover (2003), since the propositional objects of orientation predicates do not have to be actional:

(37) a. John hated to be taller than Fred.
 b. *Be taller than Fred!
 c. *What John did was to be taller than Fred.
2.4.2. Manzini’s Generalization and controller shift

The model of Pollard and Sag (1994) refers to syntactic properties in addition to the semantic classes employed. It shares with Manzini (1983) the idea that the unexpressed subject of a controlled predicate is anaphoric in nature, although the exact definition of what counts as anaphoric differs from the one proposed in Manzini (1983). According to the binding theory developed in Pollard and Sag (1994: 278), reflexive pronouns are obligatorily bound if they are either in object position of predicates selecting a subject, or if they are in subject position of predicates that are directly subcategorized by predicates selecting a subject. Pollard and Sag (1994) thus assume that the unexpressed subject of a controlled predicate is a reflexive pronoun, and as such, has to be bound if the controlled predicate is realized as an object of a control predicate. This assumption has a direct consequence: binding theory will require that the unexpressed subject of a predicate that occurs as object of a predicate that is not a control predicate will be bound, but it will not identify the binder. Hence, Pollard and Sag (1994) predict that in some cases, the binding of the unexpressed subject of a VP will not be dealt with in terms of control, but only in terms of binding. In such cases, control will emerge as a consequence of binding, but controller choice is not fixed. It will turn out that the famous cases of controller shift, as illustrated in (1) and (9), are subject to this condition. The final version of Pollard and Sag’s analysis of control takes the reflexivity of the unexpressed subject into account, as given in the following definition:

(38) Control Theory (revised version, Pollard and Sag 1994: 302)
If the semantics of an unsaturated phrase is the propositional argument in a control relation, then the unexpressed subject of that phrase is (i) reflexive; and (ii) coindexed with the Influenced, Committor, or Experiencer argument of that relation, according as the control relation is of sort influence, commitment, or orientation, respectively.

Pollard and Sag’s fundamental insight with regard to so-called cases of controller shift is that the change is only apparent. They show that typical cases of controller shift allow a second reflexive interpretation, in which the controller does not change (Pollard and Sag 1994: 316).

(39) a. Jim promised Maryi to be allowed to get herselfi a new dog.
 b. Jimi promised Mary to be allowed to get himselfi a new dog.
 c. *Sami thought that Ij promised Maryk to be allowed to get himselfi a new dog.

Example (39a) illustrates the prototypical case of apparent controller shift, visible through the binding of the reflexive pronoun by the object of promise. But (39b) indicates that it is also possible that the subject of promise is retained as a controller of the passivized complement of promise. Again, this is visible through the binding of the reflexive pronoun himself.
Furthermore, example (39c) shows that antecedents located too far away cannot bind the unexpressed subject of the passivized complement. Pollard and Sag agree with Jackendoff and Culicover that the complement of influence and commitment relations has to be actional. But in cases of controller shift, the complements have been coerced into situations (perhaps even states), and hence do not meet the semantic requirements of their respective governors. It is the coercion that accounts for the cases of apparent controller shift, not the controller shift itself. Consequently, Pollard and Sag rename the phenomenon control coercion, and provide the following analysis: non-actional complements of control relations of type influence and commitment are coerced into actional complements by inserting a relation called interpolated-cause. The insertion is realized through a lexical rule (Pollard and Sag 1994: 314), as given in (40).

(40) [Coercion Lexical Rule of Pollard and Sag (1994: 314); attribute-value matrix not reproduced]

The effects of (40) are as follows: the input of (40) is a predicate that belongs to the class of either commitment or influence relations. The predicate syntactically selects a VP whose subject is missing, i.e. a verbal phrase the SUBCAT value of which contains an NP. The semantics of the VP provides a SOA-ARG (state-of-affairs, i.e. propositional, argument) for the commitment or influence relation. The output of the lexical rule is a predicate that is syntactically unaltered, but which includes an interpolated cause relation in its semantics. The semantics of the VP is no longer the propositional argument of the commitment or influence relation, but is now taken to be the propositional argument of the interpolated cause relation. The index of the missing NP subject of the controlled predicate deserves special attention: in the input of the rule it is the index of the missing subject, while in the output it is the index of the INFLUENCE role of the interpolated cause relation. Being present in the input, this index is subject to the Control Theory given in (38). Consequently, the index is identified with the appropriate argument of the control relation, this being the COMMITTOR if it is a commitment, or the INFLUENCED if it is an influence relation. In addition, the NP is characterized as reflexive due to (38). The lexical rule removes the index from the missing subject, and assigns it to the INFLUENCE role in the output. But this index has been identified as either the COMMITTOR or the INFLUENCED in the input. The output representation will retain this identification, and will thus indicate that the INFLUENCE of the interpolated cause relation is identified with either the COMMITTOR or the INFLUENCED argument of the control relation, depending on which of the two classes the predicate belongs to. This is the effect of the Control Theory. Control Theory, however, no longer determines the identification of the missing subject’s index in the output. It is classified as a reflexive, and hence must conform to binding theory. Consider the application of this lexical rule to the verb promise. If the complement of promise is to be allowed to leave, then it does not answer the semantic requirements of promise; in particular, a promiser cannot just claim that a state will obtain, but commits himself or herself to bringing this state about. The lexical rule in (40) coerces the situational complement x to be allowed to leave into the actional complement y causes x to be allowed to leave, i.e.
inserts a cause relation to reinstate this particular entailment. As promise has to obey the Control Theory formulated in (38), the unexpressed subject of the controlled complement is a reflexive, and is identified with the committor argument of promise. The Coercion Lexical Rule dissolves the connection between the unexpressed subject and the committor of promise, so that the unexpressed subject is still classified as a reflexive, but no longer connected to the committor argument of promise. Instead, it is now identified with the Influencer argument of the interpolated cause relation. The most important result of these operations is that the unexpressed subject of to be allowed to leave is now analysed as a reflexive pronoun that is in need of an antecedent in the domain of the control predicate, i.e. one of the control predicate’s arguments must serve as its antecedent, but the potential antecedent is no longer constrained by (38). It can thus take the object of promise as its antecedent, as illustrated in (39a), yielding an apparent case of controller shift. So-called controller shift thus cannot be analysed as controller shift at all: the unexpressed subject of cause to be allowed to
leave is controlled by the committor argument of promise. In addition, the unexpressed subject of to be allowed to leave can take the subject as its antecedent, resulting in (39b). Binding theory, however, does not allow an antecedent that is outside the domain of the control predicate, as illustrated through the ungrammaticality of long-distance binding in (39c). Next, consider control coercion with an object control verb, as illustrated in (41).

(41) John persuaded Sue to be allowed to attend the party.

The example in (41) allows two interpretations. Both interpretations share the requirement that the unexpressed subject of the coerced complement, i.e. cause to be allowed to attend the party, is identified with the influenced argument of persuade, i.e. Sue. The unexpressed subject of to be allowed to attend the party, being a reflexive, can now be identified with either the subject or the object of persuade, yielding either the interpretation that ‘John persuaded Sue so that Sue caused John to be allowed to attend the party’, or that ‘John persuaded Sue so that Sue caused Sue to be allowed to attend the party’. Finally, consider a case where a subject control verb, such as promise, has been passivized:

(42) Susan was promised to be allowed to attend the party.

This example differs slightly from the ones discussed above in that the controller of cause to be allowed to attend the party is not expressed syntactically, due to passivization. The subject of to be allowed to attend the party now must be identified with the matrix subject, since the latter is the only available antecedent in terms of binding theory. Consequently, (42) is to be interpreted as ‘Susan was promised by someone that this someone causes it that Susan will be allowed to attend the party’.
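The division of labour between Control Theory (38) and binding theory under coercion can be summarized in a small sketch. It models only who fixes the causer and who may bind the inner unexpressed subject; the encoding and all names are mine, not Pollard and Sag's formalism:

```python
# Sketch of control coercion: Control Theory fixes the subject of the
# interpolated cause relation, while the inner unexpressed subject is a
# reflexive whose antecedent may be any local argument of the control verb.

CONTROLLER_ROLE = {"influence": "influenced", "commitment": "committor"}

def coerce(sort, args):
    """args maps thematic roles plus 'subject' and 'object' to indices.
    Returns the causer fixed by Control Theory and the set of
    binding-theoretic antecedent candidates for the inner reflexive."""
    causer = args[CONTROLLER_ROLE[sort]]  # fixed by (38): no real shift
    local = {v for k, v in args.items() if k in ("subject", "object")}
    return causer, local

# 'Jim promised Mary to be allowed to get ... a new dog' (39a, b):
causer, candidates = coerce(
    "commitment",
    {"subject": "jim", "object": "mary", "committor": "jim"})
# causer is fixed to Jim (the committor); the inner reflexive may be
# bound by Jim or Mary, but not by a sentence-external antecedent (39c).
```

The fixed causer shows why so-called controller shift is only apparent: what varies between (39a) and (39b) is the binding of the inner reflexive, not the controller.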
2.4.3. Returning to non-obligatory control

At least two cases of non-obligatory control have to be considered when discussing semantic analyses of control: non-obligatory control of non-finite objects, and non-obligatory control of subject sentences. With regard to the latter, many proposals assume that control into subject clauses − as illustrated e.g. in (11) − is always an instance of NOC (Manzini 1983; Pollard and Sag 1994: 302; Landau 2000), but Stiebels (this vol.) as well as Jackendoff and Culicover (2003) claim that there are cases of control into subjects that are OC (or unique control, in Jackendoff and Culicover’s terms). Jackendoff and Culicover (2003: 535) provide the following example, where all reflexives except the one referring to Bert lead to ungrammaticality, indicating that control by the object of the adjective into its subject clause is obligatory.

(43) Amy thinks that [calling attention to himself/*herself/*themselves/*oneself/*myself] was rude of Bert.
OC into subject clauses is covered neither by MDP-based nor by semantically based analyses of control. But this may have to do with the fact that current models do not even agree on which predicates require OC into their subjects. While Jackendoff and Culicover (2003: 527) assume for annoy that it allows free control (i.e. NOC) into the subject, Stiebels (this vol.) assumes that thrill requires OC. Apart from being near-antonyms, annoy and thrill are semantically very similar: both are presumably analysed as orientation verbs and have an experiencer argument in addition to their propositional subject. Controlled subjects can be actions, but do not have to be (cf. Jackendoff and Culicover 2003: 527). Such predicates thus fall outside the class of unique control predicates in Jackendoff and Culicover (2003). Their integration into Pollard and Sag’s analysis would make it necessary to recast the status of the subjects of the controlled subjects as non-reflexive. Binding theory would part company with control theory if predicates like thrill or annoy were classified as imposing OC. The reason is that compelling arguments can be given to show that the controller is less prominent in terms of argument structure than the subject, but binding theory requires that the binder of a reflexive always be more prominent than the reflexive. Similar considerations apply to an OC analysis of control into subjects in terms of movement (Hornstein 1999) or attraction (as proposed in Manzini and Roussou 2000, who also assume that control into subjects is an instance of NOC). The situation is better for NOC cases of control into objects. As Pollard and Sag (1994: 301−302) point out, NOC is accounted for if the pertinent predicates fall outside the class of control predicates. Since neither of the two conditions of the Control Theory applies to them in this case, the unexpressed subject may be analysed as a pronominal.
Given the current state of affairs, an analysis of OC into subjects can be provided neither by MDP-based nor by semantically based accounts. The latter can account for NOC into objects by assuming that control is dependent on the semantics of the predicates involved. It is less clear how purely structural proposals would deal with such cases.
3. The syntax of the controlled phrase

Three interrelated questions lie at the centre of linguistic argumentation about the syntactic structure of the controlled phrase. First, is it necessary to assume a subject of the controlled phrase? Secondly, given a linguistic representation of the subject of controlled phrases, does it have to be present in configurational syntax? Finally, which implications for the syntactic structure of the controlled phrase can be inferred from answering the first two questions affirmatively or negatively? An analysis of control can presumably be established without commitment to a particular syntactic analysis of the controlled phrase (Jackendoff and Culicover 2003 is an example of this strategy). But assumptions about the structure of the controlled phrase interfere with the analysis of control in general. Consider the analysis of control coercion in Pollard and Sag (1994) as an illustration: they characterize the unexpressed subject of the controlled phrase as a reflexive pronoun. Consequently, the subject is syntactically relevant. But this characterization does not imply the presence of a subject in configurational syntax. This example shows that even conceding that the subject of the controlled phrase is syntactically relevant does not justify inferring a clausal structure for the controlled phrase. The analysis of Pollard and Sag (1994) assumes a configurational VP structure, yet crucially employs a representation of the subject of the controlled phrase.
A rather different position, sometimes called the classic position, is presented in Chomsky (1981). Here, the subject of the controlled phrase is present configurationally, and is characterized as having specific syntactic properties that lead to further implications for the syntactic structure of controlled phrases in general. Chomsky (1981) assumes that a PRO subject is to be classified as an anaphor and a pronoun at the same time, and hence, that it must not have a governing category. As a consequence, it must be governed neither from inside nor from outside its projection, which makes it necessary to embed it in a clausal projection. The presence of a subject thus leads to assumptions about the syntactic structure of the controlled phrase. But what if further evidence suggests that the presumed structure of the controlled phrase does not comply with a larger data set? As an illustration, consider the impact of clause union phenomena (Aissen and Perlmutter 1976). Certain language types (Germanic, Romance, Altaic, to name a few) allow non-finite complements to take part in operations that have been named clause union, restructuring, or argument raising, in which arguments of a lower predicate seem to appear in the syntactic domain of a higher predicate. In the following, we give an illustration from German:

(44) Ulrich hat diese Lampe oft zu reparieren versprochen. [German]
 Ulrich has this lamp often to fix promised
 ‘Ulrich has promised to fix this lamp often.’ / ‘Ulrich has often promised to fix this lamp.’

The example in (44) is ambiguous. It allows one reading in which the adverbial oft ‘often’ takes scope over the embedded verb, and one in which the adverbial takes scope over the control verb versprechen. The second reading, which is actually slightly more natural, is remarkable insofar as the complement of the lower verb reparieren, i.e. [NP diese Lampe], appears to the left of the adverbial, while the lower verb appears to its right.
Since independent evidence can be given that the scope of an adverbial is determined by its syntactic position, the second − more natural − reading appears to be surprising. The solution to this problem is to assume that the two verbs actually form a unit, so that the complement of the lower verb is syntactically realized in the syntactic domain of the higher verb. The adverbial then may take scope over the combination of the two verbs, as indicated in (45).

(45) [VP [NP diese Lampe] [V′ [Adv oft] [V zu reparieren versprochen]]]

The analysis obviously implies that the structural integrity of the VP diese Lampe zu reparieren has been violated. If the structural integrity of the lower clause and its syntactic make-up as a full clause are required to block government of PRO, then (45) could not contain PRO, and yet it is an instance of ordinary control. So implications for the syntactic structure or the presence of the subject that hinge on the structural integrity of the lower phrase cannot be maintained. Turning to the implications of proposing a subject for the structure of the controlled phrase as a whole, a note on terminology is required. Since syntactic terminology has changed quite often in the past 40 years, it seems reasonable not to rely on current naming, but instead to employ some theory-neutral parlance to describe the options. Hence, I shall speak of sentential analyses if controlled phrases are analysed as complementizerless sentences, IPs, TPs, or the like. If controlled phrases are analysed as projections of a clausal complementizer, I shall call this a clausal analysis. Finally, if only a verbal projection is employed, this will be called a VP analysis.
3.1. The category of the complement

The clausal analysis of controlled phrases has been developed most prominently in Chomsky (1981), subsequently defended in Koster and May (1982), and still taken to be valid in Landau (2000). Chomsky’s analysis rests on the assumption that the presumed subject of the controlled phrase − PRO − must not be governed. Government from inside the clause is barred by stipulation − a stipulation that has been maintained throughout generative reasoning until at least Manzini and Roussou (2000). But government from outside the clause must be blocked by introducing a clausal projection that acts as a barrier. Denying that PRO must be the subject in question forms the starting point of the sentential analysis, which is most prominently featured in Hornstein (1999) and subsequent work. The sentential analysis assumes that a subject is present, but that this subject does not belong to a particular syntactic category, and does not require the projection of a complementizer to be protected from government. There are various guises of the VP analysis. The most radical VP analysis has been proposed in Chierchia (1984). According to Chierchia (1984), controlled phrases are subjectless VPs. The analyses of Bresnan (1982), Pollard and Sag (1994), and Manzini and Roussou (2000) assume that (at least certain) controlled phrases are VPs in terms of configuration, but that the presence of the subject is indicated on a different syntactic level, be it functional structure (Bresnan 1982), argument structure (Pollard and Sag 1994), or repeated configurational identification of thematic roles as features (Manzini and Roussou 2000). The following phenomena have been adduced as evidence for a clausal analysis of controlled complements: coordination, wh-infinitives, and complementizer-initial infinitives. Koster and May (1982) present coordination examples of the type in (46) to argue in favour of a clausal analysis of controlled phrases.
(46) John expected [to write a novel] but [that it would be a critical disaster].

It is well known, however, that coordination cannot be considered a reliable argument for determining syntactic category, let alone the internal structure of a phrase. With regard to non-finite phrases, we can observe that coordination of a PP and a non-finite adjunct is possible in German, as can be witnessed in (47).

(47) [Aus Mitleid] und [um mich seiner melancholischen Zudringlichkeit zu erwehren], hatte ich ihm angeboten vorbeizukommen. [German]
 out.of compassion and in.order REFL his doleful intrusiveness to resist had I him offered to.come.around
 ‘I had offered him to come around, out of compassion, and to resist his doleful intrusiveness.’
Also, it is well known that NPs (DPs) can be coordinated with clauses, and also with non-finite phrases, as illustrated in (48).

(48) Er versprach ihr ein neues Auto und sich regelmäßig zu waschen. [German]
 he promised her a new car and himself regularly to wash
 ‘He promised her a new car, and to clean himself on a regular basis.’

The presence of a wh-element seems to indicate the presence of a clausal projection, hence the non-finite phrase must be a CP; similarly for complementizer-initial non-finite phrases.

(49) a. John wondered [which way to go].
 b. Er kaufte das Buch, um es zu verschenken. [German]
 he bought the book in.order it to give.away
 ‘He bought the book to give it away.’
Again, the argument does not stand up to closer scrutiny. Gärtner (2009) points out that wh-infinitives are a cross-linguistic rarity, and their syntactic structure is not well-studied. Considering complementizer-initial non-finite phrases, it should be noted that complementizers governing non-finite complements typically do not select for finite complements. In German, e.g., the three complementizers um, anstatt, and ohne select non-finite phrases, and only ohne selects a finite phrase as well, but then a full-fledged CP, as can be witnessed in (50) − similarly for other languages.

(50) Ohne [CP dass er es sagen musste], wussten alle Bescheid. [German]
 without that he it say had.to knew everybody the.score
 ‘Everybody was in the know, without him having to tell about it.’
Viewed from a slightly different angle, it should become clear that the clausal analysis of controlled phrases was always more a conjecture than a result of empirical study. Rosenbaum (1967, 1970) simply assumed a clausal analysis, as his analysis required the deletion of a subject, which had to be present in the first place. Chomsky’s (1981) deduction of PRO as being embedded in a clausal structure has been attacked by various proposals even within Mainstream Generative Grammar − such as Hornstein (1999) and Manzini and Roussou (2000). It is thus not surprising that a clausal analysis has been given up entirely in their analyses. As was already pointed out, a clausal analysis is incompatible with the facts related to clause union, cf. the discussion around (44) and the analysis in (45). What is worse, an analysis that relates the presence of a subject to a clausal structure of the controlled complement would wrongly predict that clause union is incompatible with control.
3.2. The syntactic nature of the subject

While a clausal analysis of control can be refuted conceptually and empirically, this does not mean that controlled phrases should be analysed as bare VPs without any representation of the subject. It is thus not surprising that many analyses deny a clausal structure of the complement and still assume the presence of a subject (Bresnan 1982; Pollard and Sag 1994; Hornstein 1999). But there are still some analyses that go one step further and assume that controlled complements are bare VPs: Chierchia (1984) and Manzini and Roussou (2000). Chierchia’s (1984: 30−32) analysis rests on the validity of the inference in (51), which is an instance of modus ponens.

(51) a. Nando tries anything Ezio tries.
 b. Ezio tries to jog at sunrise.
 c. Nando tries to jog at sunrise.

Chierchia points out that an analysis that assumes a semantic subject of the controlled complement (and, eo ipso, the presence of a syntactic subject) is not able to derive the inference in (51). The reason is that a semantic representation of (51) as given in (52a), together with the assumption that α is propositional, i.e. that α is filled with something like jog-at-sunrise’(e), leads to the wrong inference in (52b, c), which can be paraphrased, somewhat nonsensically, as in (52d). The identity of the variables in (52b) and (52c) is a consequence of control, which implies that the subject of try is identical to the subject of the complement of try.

(52) a. ∀α[try’(α)(e) → try’(α)(n)]
 b. try’(jog-at-sunrise’(e))(e)
 c. try’(jog-at-sunrise’(e))(n)
 d. Nando tries for Ezio to jog at sunrise.
The pertinent inference would be valid if the control verb try took a property instead of a proposition as its argument. It should be noted at this point that while the validity of the argument has not been called into question, models of control which assume a propositional content of a controlled phrase tend to ignore Chierchia’s observation. Chierchia concludes that a semantic analysis of the controlled complement as a property is required, and consequently, that controlled complements are subjectless VPs. Stiebels (2010), however, points out that cases of finite control behave exactly like the data presented in Chierchia (1984) − but here, a subject cannot be denied. Manzini and Roussou’s analysis of control was discussed in some detail in section 2.3. According to their analysis, a subject does not have to be syntactically present, since control is modelled as attracting pertinent features, the attraction being subject to a minimality condition, the Scopal MLC in (16). Interestingly, control coercion not only provides a plausibility argument against the analysis of Chierchia (1984), but also provides evidence that Manzini and Roussou’s analysis falls short of a proper analysis without assuming a subject of the controlled phrase.
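If try’ is taken to denote a relation between an individual and a property, the premise in (51a) quantifies over properties, and the inference goes through without ever fixing the embedded subject. The following reconstruction is mine, using the notation of (52):

```latex
% Property analysis: the complement contributes a lambda-abstract,
% not a proposition with a fixed subject index.
\forall P\,[\mathit{try}'(P)(e) \rightarrow \mathit{try}'(P)(n)]\\
\mathit{try}'(\lambda x.\,\mathit{jog\mbox{-}at\mbox{-}sunrise}'(x))(e)\\
\therefore\;\mathit{try}'(\lambda x.\,\mathit{jog\mbox{-}at\mbox{-}sunrise}'(x))(n)
```

The conclusion now paraphrases correctly as ‘Nando tries to jog at sunrise’, since the property is predicated of whoever the matrix subject is.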
With regard to Chierchia’s analysis, we have to recall that Pollard and Sag’s (1994) analysis of control coercion crucially assumes that a coerced phrase like to be allowed to leave does not have just one subject, but two − the second one due to the interpolation of a cause relation. Hence, to be allowed to leave can be represented as y cause x to be allowed x to leave under the assumption that controlled complements show a subject at some level of representation. Chierchia, on the other hand, has to assume that the semantics of the coerced phrase consists of two (in fact three) subjectless properties, cause to be allowed to leave and to be allowed to leave. In the absence of any syntactic guidance (e.g. through a reflexive pronoun requiring binding), it remains mysterious how the two properties become attached to their proper antecedents. Hence only the interpretations in (53b, c) can be derived from (53a), while the readings in (53d, e) are impossible. This is so because the initial control relation of the respective control predicate has to be retained.

(53) a. John persuaded Sue to be allowed to leave.
 b. j persuaded s s cause j to be allowed j to leave
 c. j persuaded s s cause s to be allowed s to leave
 d. #j persuaded s j cause s to be allowed s to leave
 e. #j persuaded s j cause j to be allowed j to leave

Following the analysis in Pollard and Sag (1994), I would like to emphasize that the identification of the subject of cause to be allowed to leave with the object is due to control, but the identification of the subject of to be allowed to leave with either the object or the subject is due to binding theory. In the absence of a subject of these two predicates, binding theory cannot apply, and hence it would become necessary to derive the readings in (53b, c) from control as well − which is implausible.
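The claim that only (53b, c) survive can be made concrete with a small enumeration; the encoding is mine and merely mirrors the reasoning in the text:

```python
# Enumerate attachments of the two properties in (53): a causer for
# 'cause to be allowed to leave' and a subject for the lower
# 'to be allowed to leave'. The control relation of persuade fixes
# the causer to the influenced argument (Sue), which is what rules
# out (53d, e).
subject, obj = "j", "s"  # John, Sue
readings = [(causer, lower)
            for causer in (subject, obj)
            for lower in (subject, obj)]
licensed = [r for r in readings if r[0] == obj]  # control fixes causer = s
# licensed == [('s', 'j'), ('s', 's')], i.e. exactly (53b) and (53c)
```

Without a represented subject, however, nothing in a pure property analysis enforces the filter in the last line; that filter is exactly what control (for the causer) and binding theory (for the lower subject) supply in Pollard and Sag's account.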
Also, the model does not generally block an attachment of the property cause to be allowed to leave to the subject instead of to the object, with subsequent identification of the lower property to be allowed to leave with either, as illustrated in (53d, e). The situation is worse for the proposal in Manzini and Roussou (2000). With regard to example (54), Manzini and Roussou (2000: 426) remark that "subject control verbs such as promise do not passivize or to the extent that they passivize they require control by the derived subject, such as John in [(54)]. (…) In our theory, the wellformedness of [(54)] with an object control rather than a subject control reading, corresponds to a derivation whereby hired and promise are both attracted by John. As desired, this is the only derivation allowed by the theory".

(54) John was promised to be hired.

It seems that Manzini and Roussou (2000) confuse the subject of to be hired with the subject of cause to be hired. To begin with, the example should be paraphrased as provided in (55).

(55) a. ∃x x promise j x cause j to be hired
     b. ∃x x promise j x cause x to be hired

The interpretations in (55) retain the control relation between promise and the subject of cause to be hired. Manzini and Roussou (2000) seem to assume that the subject of cause to be hired is controlled by John. If this is the only derivation allowed by the theory, then the theory clearly must be on the wrong track. What is more, the subjectless analysis also makes wrong predictions with respect to control options with object control verbs, as illustrated in (53). Manzini and Roussou (2000) may, on the one hand, ignore the fact that the complements have been coerced − which they seem to do with regard to (54). They would thus have to assume that the complements are not actional, contrary to their interpretation. But this move would block the derivation of the interpretations in (53b, c) and (55). If, on the other hand, they agree that the complements are actually actional, and thus have undergone coercion, then neither reading in (55) can be derived, as the Scopal MLC prohibits control by the unexpressed subject of the passivized promise. In addition, the classic controller shift interpretation of (53a) in (53b) cannot be derived − as Sue should attract the predicate to be allowed as well. Indeed, Manzini and Roussou (2000: 426) propose that the subject of the passivized promise is not present in (54). But without a representation of the argument corresponding to the demoted subject, the correct control readings in (55) cannot be derived, and passive is entirely irrelevant in (53). The case can even be strengthened if control coercion of subject control verbs is considered.

(56) a. John promised Sue to be hired.
     b. j promise s j cause s to be hired
     c. j promise s j cause j to be hired

We would like to remind the reader that Manzini and Roussou (2000) − although they do not consider the case of control coercion with non-passivized promise − assume that the object of promise can be dealt with in terms of Larson's (1991) analysis. Although Larson's analysis has been refuted by Jackendoff and Culicover (2003: 529−531), we will take it for granted here for the sake of the argument. Accordingly, the DP/NP Sue does not intervene in the determination of control in terms of attraction.
Given this assumption, however, it remains unclear how (56b) can be derived, since the DP John is the only phrase to attract predicates − including to be hired.
3.3. Complement and adjunct control (and control into subjects)

The incorporation of adjunct control proves to be a great challenge for any model of control. The assessment of distance-based approaches to controller choice in section 2.3 has already illustrated some of the problems: Hornstein's (1999) analysis presumes sideward control, while Manzini and Roussou (2000) propose that all cases of adjunct control are hidden parasitic gaps. Both approaches assume that subject control into adjuncts is to be taken as the standard case. They acknowledge the existence of object control into adjuncts, but do not provide an analysis for it. Yet object control into adjuncts cannot be ignored, as was pointed out as early as Postal (1970): if object control into adjuncts is considered a viable option at all, then it will actually become the privileged option. It cannot be denied that object control is an option with adjuncts, but it has often been claimed to be far more restricted than subject control (Bech 1955/57 offers an early analysis of the options in German, and reaches this conclusion). The reaction of distance-based approaches to this problem has been to downplay the role of
object control into adjuncts so that it appears to be almost non-existent. Both Hornstein (1999) and Manzini and Roussou (2000) assume that object control is already blocked since the adjunct is attached to a position that is higher in the structure (or later in the derivation) than the attachment point of the object, thus prohibiting movement into the object position (or else assignment of the respective relation between the object and the adjunct predicate). Were the adjunct syntactically inferior to the object, then object control into adjuncts would invariably be expected. But as the adjunct occupies a position superior to the position of the object, object control into adjuncts is excluded entirely by distance-based approaches. The situation is not ameliorated by semantic approaches to control. The title of Sag and Pollard (1991) − which is a forerunner of Pollard and Sag's (1994) analysis − is revealing: An integrated theory of complement control. Jackendoff and Culicover (2003: 551) briefly mention adjunct control, but do not offer an analysis (they point out, though, that adjunct control seems to be much more guided by syntax than complement control). So adjunct control is either ignored or set aside by semantic approaches. While distance-based approaches to adjunct control have to take for granted a syntactic structure that eliminates a possible proximity between objects and adjuncts, the problem is far greater for semantically based approaches to control, since adjunct control − as far as it is understood empirically − does not seem to conform to semantic criteria in the same way as complement control does. Restricting an analysis to complement control might be just a reaction to this difference − albeit an unsatisfying one that leaves adjunct control without an analysis at all.
It is obvious from the data that cases of adjunct control are not sensitive to the specific lexical class of the verb which is modified by the controlled adjunct − a distinction between orders and commitments does not play a role here. The same considerations apply to the concept of action. Controlled adjuncts clearly do not have to be actions, as can be exemplified by the data in (57).

(57) a. Peter walked through these mean streets without being hit.
     b. Peter tried several protein diets without growing taller.

So adjunct control cannot be analysed in terms of either lexical control classes or a concept of action. Instead, the pertinent factors for the determination of control seem to be structural in nature, yet not entirely determined by distance. It is thus a fair assessment to say that models of control do not properly cover adjunct control. The determination of controller choice in terms of distance requires the presupposition of a syntactic structure that eliminates certain options of adjunct control. Semantically based approaches to control have to accept that adjunct control relies on different mechanisms, but neither are these mechanisms made explicit, nor is a more general picture offered.

A brief look at control from the perspective of the controlled phrase reveals more challenges than perhaps expected after 50 years of intensive research on control. As controlled phrases can be divided into controlled objects, controlled subjects and controlled adjuncts, the following picture emerges: all three instances of control observe OC as well as NOC, so that models of controller choice have to account for the difference between OC and NOC cases irrespective of the grammatical function of the controlled phrase. It already becomes apparent that without further stipulations, distance-based models of control have to take it for granted that OC applies to controlled objects, but NOC to controlled subjects. Adjuncts are instances of OC if they are realized in the syntactic domain of the subject of the controlling clause. But this assumption requires circumvention of conditions on adjunct islands, such as Hornstein's (1999) sideward movement, or reference to parasitic gaps, as in Manzini and Roussou (2000). Apart from these complications, distance-based approaches group complement and adjunct control into one class as being subject to OC, while control into subjects is never OC. Semantically based approaches assume that the lexical class of the predicate plays the central role; hence, both OC into complements and OC into subjects become possible, although control into subjects will presumably affect predicates that are less prototypical, e.g. predicates that are not actions or do not belong to the influence or commitment types. NOC is possible for control into complements as well as into subjects. An analysis of control that also takes syntactic conditions into account, as e.g. the idea of the reflexivity of the unexpressed subject (as in Pollard and Sag 1994), can only account for control into subjects by relaxing such a condition − which means that a disjunction is introduced in the definition of control. Semantically based analyses presumably have to involve disjunctive formulations anyway, since they are in principle silent on adjunct control.
3.4. Summary

We have addressed three syntactic properties of any model of control: the syntactic structure of the controlled phrase, the nature of a subject of the controlled phrase, and the external grammatical distribution of the controlled phrase. Initial proposals of control, in particular Rosenbaum (1967, 1970) and Chomsky (1981), tied the structure of the complement to the syntactic representation of its subject. But closer scrutiny of the arguments and facts speaks in favour of a minimal structure for a controlled phrase, i.e. in favour of a VP. Clause-union phenomena even suggest that the representation of the subject of a controlled phrase is entirely independent of the structural make-up of the phrase, and generally offer an argument against a full-fledged clausal structure of controlled phrases. Arguments adduced in favour of clausal structures are also problematic, either because doubts must be cast on their cross-linguistic validity, or because they do not stand up to closer scrutiny. Even within Mainstream Generative Grammar, clausal analyses have been abandoned in favour of sub-clausal structures, without giving up some of the very concepts underlying the argumentation for a clausal analysis − a move that indicates that generative analyses can be recast without taking recourse to clausal structures. The same can be said about the syntactic nature of the subject. Rosenbaum (1967, 1970) assumed a perfect copy of the controller (which was in any case implausible, given − among other arguments − the difference in case marking between object controller and controlled phrase in languages that overtly mark case); later proposals (Chomsky 1981; Chomsky and Lasnik 1993) suggested a special syntactic element, PRO, whose existence was derived either from a ban on government or from the assignment of a special case.
With the exception of Landau (2000), more recent transformational models have given up the assumption that a special element (PRO) is required. One of the properties of PRO is retained across frameworks: the subject of a controlled phrase is anaphoric in
nature − a crucial assumption in Pollard and Sag's (1994) analysis of control coercion. The same analysis has also provided some plausibility arguments against entirely giving up a representation of the subject of the controlled phrase. The implications of adjunct control have been noted early in the generative literature (cf. Postal 1970), but comprehensive treatments of control into complements, subjects, and adjuncts suffer from various problems, not the least of which is the controversy over what has to be accepted as a fact.
4. Surprising facts

Models of control generally share the assumption that the subject of a controlled phrase is empty. This is so either because a designated syntactic element is proposed, as in Chomsky's (1981) classification of PRO, or because it is not present in syntax, but only in argument structure, as in Pollard and Sag's (1994) model, or because controlled phrases are subclausal structures, as presumed by Chierchia (1984), as well as by Manzini and Roussou (2000). Backward control, although known since Kuroda (1965), is surprising in light of these assumptions. Backward control refers to a construction where a controller is syntactically realized in the controlled phrase. Control is determined by the governing verb, but the controller (in the sense of controller choice) is an empty element controlling an overt NP in an embedded clause. Schematically, backward control can be represented as in (58).

(58) [S … ei … [NPi … V2] V1]

In (58), ei is an unexpressed dependent of V1, and NPi is a syntactic argument of V2 that has been realized. The grammaticality of the configuration depends on the presence of V1, so V1 is called a control verb. Given the presence of V1, NPi is controlled by ei. It must be stressed that backward control appears to be a rather restricted phenomenon, both cross-linguistically and within the languages that exemplify it at all. Cross-linguistically, backward control has been ascertained for very few languages, Northern Caucasian languages in particular; Matasović (2007) offers a general overview and Polinsky and Potsdam (2002) a specific analysis of Tsez. It has been proposed for a small set of other languages, including Malagasy, Romanian, Japanese, Korean, as well as Brazilian Portuguese. But in many cases, it remains unclear whether an analysis in terms of backward control could not be replaced by an equivalent analysis omitting backward control (cf. Kirby, Davies, and Dubinsky 2010 for a recent survey).
Backward control is also peculiar in that a very restricted class of verbs exemplifies it. Tsez may serve as an illustration (EVID = evidential). Backward control in Tsez can only be observed with two similar phase verbs, -oqa and -iča, denoting begin and continue.

(59) ei [kid-bai     ziya        b-išr-a]     y-oq-si                      [Tsez]
         girl.II-ERG cow.III.ABS III-feed-INF II-begin-PST.EVID
     'The girl began to feed the cow.'
The verb -oqa agrees in noun class (II) with the ergative-marked NP kid in (59), while verbal agreement is otherwise targeted by an absolutive argument, thus a III-agreement
pattern would be expected. Polinsky and Potsdam (2002) propose that the agreement pattern can be accounted for if the verb instead agrees with the noun class of the silent subject of the clause, and this subject in turn controls the ergative-marked subject of the embedded clause, hence inheriting its noun class. The syntactic structure in (59) is justified by Polinsky and Potsdam (2002: 254−259) by the impossibility of scrambling the ergative-marked argument across an adverb modifying the main verb, illustrated in (60), as well as by reflexive binding patterns, as given in (61).

(60) a. ħuł       kid-ba / kid        ziya    bišr-a   yoq-si              [Tsez]
        yesterday girl-ERG / girl.ABS cow.ABS feed-INF begin-PST.EVID
     b. *kid-ba  ħuł       ziya bišr-a   yoq-si
         girl-ERG yesterday cow feed-INF begin-PST.EVID
     c. kid      ħuł       ziya bišr-a   yoq-si
        girl.ABS yesterday cow feed-INF begin-PST.EVID
     'Yesterday the girl began to feed the cow.'

With regard to (60), it is necessary to understand that the verb -oqa is actually ambiguous between a (backward) control and a (forward) raising interpretation. In the case of forward raising, the raised subject is marked in the absolutive case, and the verb agrees with it, as expected. In (60a), both variants are given, i.e. the verb -oqa agrees either with the noun class of the ergative-marked kid-ba under control or with the noun class of the absolutive-marked kid under raising. As Polinsky and Potsdam (2002: 254) point out, word order variation (scrambling) is strictly clause-bound in Tsez. The ungrammaticality of (60b), where the ergative-marked NP is realized to the left of the adverbial ħuł 'yesterday', which modifies the main verb, follows if the ergative-marked NP belongs to the embedded clause. In this case, a realization of the NP to the left of ħuł violates the clause-boundedness of scrambling. Example (60c), in which the absolutive-marked NP is realized to the left of the adverbial, further supports the assumption that kid is a matrix argument, while kid-ba is an embedded argument.

(61) ei nesā nesiri   [irbahin-āi    halmaγ-or  γutku     rod-a]   0̸-oq-si [Tsez]
        REFL.I.DAT     Ibrahim.I-ERG friend-DAT house.ABS make-INF I-begin-PST.EVID
     'Ibrahim began, for himself, to build a house for his friend.'
Reflexive pronouns require a preceding antecedent in the same clause in Tsez. The binding pattern in (61) seems to contradict this assumption, as the reflexive nesir is bound by the ergative-marked argument of the embedded predicate. Under the assumption of backward control, the reflexive is bound by the empty controlling subject (ei). The models discussed so far obviously cannot easily handle backward control. Polinsky and Potsdam (2002: 262−263) discuss and dismiss an analysis akin to the model proposed by Chomsky (1981), and an analysis of backward control based on semantic classes is yet to be provided. Polinsky and Potsdam (2002: 267−273) instead propose an analysis based on Hornstein's (1999) movement-based account of control, in which the Copy Theory of Movement (CTM) plays a crucial role. The CTM differs from earlier conceptions of transformational movement in assuming that movement does not leave traces behind, but exact copies (we have assumed this analysis in the presentation of Hornstein's analysis of adjunct control in [25]). Recall that according to Hornstein's (1999) analysis, control amounts to movement, and the overt realization of the controller in the higher position is determined by the presence of a case-feature in the matrix subject position, and a corresponding absence of such a feature in the embedded subject position. Polinsky and Potsdam (2002) analyse backward control as an instance of covert movement: the subject actually moves from the lower subject position to the higher subject position, so two copies of it are available, but only the copy in the lower position is pronounced. At first, it looks as if Polinsky and Potsdam's application of Hornstein's model offers a key to a possible analysis of backward control. The other models share the assumption that the controlled element always contains a subject gap; hence they cannot account for backward control. As Polinsky and Potsdam (2002: 263) put it: "[T]hese arguments are essentially independent of particulars of Tsez syntax (…) [T]hey indicate that the architecture of [the Principles and Parameters theory] and its assumptions about control (…) quite generally rule out the possibility of [backward control, BC]. Thus, according to the [the Principles and Parameters theory], BC should not exist in natural language." This criticism applies not only to the model of Chomsky (1981) and its successors, but quite generally to all models that either assume an anaphoric relationship between controller and controlled subject in cases of OC, or do not assume the existence of a controlled subject at all. Closer scrutiny reveals, however, that the analysis in terms of movement relies on several problematic assumptions. It is thus unclear whether an analysis of backward control can be used as a guide for analyses of control in general.
To begin with, Polinsky and Potsdam (2002: 267) explicitly acknowledge that Hornstein's movement-based account relies on the potentially harmful minimality assumption stated in the MLC (cf. [14]). Secondly, the analysis must somehow force the phonetic realization of the copy in the embedded clause. Polinsky and Potsdam (2002: 270) achieve this by assuming that the upper copy occupies a position where no case can be assigned, or more generally, by proposing that -oqa has a thematic subject but may not be able to assign absolutive case to it. With regard to reflexive binding in backward control, Polinsky and Potsdam assume that the reflexive is bound by the upper copy of its antecedent, thus circumventing possible Principle C violations, which would follow from assuming that a reflexive in the matrix binds an NP in the embedded clause; cf. their schematic analysis of the Logical Form (LF) of (61) in (62).

(62) irbahin-āi nesā nesiri [irbahin-āi halmaγ-or γutku rod-a] oqsi

It is interesting to observe that similar derivations are blocked in Indo-European languages, since they violate the strong crossover constraint (Wasow 1972). As an illustration, consider the raising verb scheinen 'seem' in German, which may take an optional dative experiencer argument, as illustrated in (63), with an analysis akin to the CTM.

(63) [NP Die Männer]i scheinen mir [[NP die Männer]i zu arbeiten].         [German]
         the men      seem     me                    to work
     'It seems to me that the men are working.'
The dative experiencer cannot be a reflexive pronoun, i.e. example (64) is blatantly ungrammatical irrespective of whether a de re or a de dicto interpretation is assumed. But a de re interpretation would presumably receive the same analysis as (62).

(64) *[NP Die Männer]i scheinen sichi [[NP die Männer]i zu arbeiten].      [German]
          the men      seem     REFL                    to work
      'The men themselves seem to have the impression that they are working.'

The structure licensing a de re interpretation is given in (64), under the plausible assumption that licensing a de re interpretation means that the highest copy of the raised phrase is retained at the level of Logical Form. The ungrammaticality of (64) follows from the strong crossover constraint, or alternatively, from Principle C of binding theory. But if Principle C of binding theory excludes (64), it remains mysterious how the structurally almost identical (62) can be licensed. A conceptual problem can be noted as well: the movement analysis is fairly general, while it is an individual stipulation that a particular predicate does not assign case to its subject. Taken together, we would expect more and more control predicates in a language to gradually give up the ability to assign case to their subjects, and thus become susceptible to backward control. This point can be repeated on a cross-linguistic scale: why does backward control appear in such a small set of languages, while forward control seems to be available in a very large set of languages? If the very mechanism that accounts for forward and backward control is the same, as in Hornstein's analysis, it is surprising that the distribution of forward and backward control is so unequal across the languages of the world.
5. Whither control?

We will end this chapter with a brief historical assessment of the preoccupation with control over the past 60 years. Control was one of the initial hot spots of generative theorizing, and two alternative strands of research were initiated at an early stage: a structural analysis of control in terms of distance (Rosenbaum 1967, 1970), and a lexico-semantic analysis, which makes use of semantic properties of the pertinent predicates (Bech 1955/57; Jackendoff 1972). The respective advantages and disadvantages of these two proposals became apparent very soon, and we must admit that neither analysis covers the full range of data of even a single language. While distance-based structural proposals cannot account for patterns that contradict a minimality assumption, semantic analyses have yet to provide a comprehensive account of control covering those areas in which semantics seems to play a much lesser role. The nature of the subject of the controlled phrase has been a topic of constant debate. A certain agreement was reached to the effect that an analysis is less plausible if it is carried out without any representation of the subject of the controlled phrase. But arguments in favour of a syntactically present subject (or even the designated element PRO) are inconclusive. This view on the representation of the subject of the controlled phrase has its impact on the structure of the controlled phrase as well. If a structural representation of the subject is not required, arguments in favour of a clausal analysis have lost their initial
strength. That the structure of the complement is much less rigid than initially thought also complies with well-known facts about clause union in a variety of languages. Given the lack of a comprehensive model, and a certain degree of ignorance of the arguments of the other side, it is perhaps not surprising that the development of new models of control petered out by the end of the last century, and that a renewed interest in further datasets can be observed. While backward control illustrates that an apparently peculiar dataset may raise important issues, and may also broaden the structural scope of models of control, its exact impact on a comprehensive model of control cannot yet be estimated, and will presumably require much more detailed empirical analysis.
Acknowledgements

I would like to thank Katharina Börner, Stanley Dubinsky, Marc Richards, Claudia Roch, and Barbara Stiebels for comments, corrections, and suggestions.
6. References (selected)

Aissen, Judith, and David M. Perlmutter
1976 Clause reduction in Spanish. Proceedings of the 2nd Annual Meeting of the Berkeley Linguistics Society: 1−30.
Bech, Gunnar
1955/57 Studien über das Deutsche Verbum Infinitum. Reprinted 1983. Tübingen: Niemeyer.
Brame, Michael K.
1976 Conjectures and Refutations in Syntax and Semantics. Amsterdam: North-Holland Publishers.
Bresnan, Joan
1982 Control and complementation. Linguistic Inquiry 13: 343−434.
Chierchia, Gennaro
1984 Some anaphoric properties of infinitives. Proceedings of the West Coast Conference on Formal Linguistics (WCCFL) 3: 28−39.
Chomsky, Noam
1968 Language and Mind. New York: Harcourt, Brace, and World.
Chomsky, Noam
1980 On binding. Linguistic Inquiry 11: 1−46.
Chomsky, Noam
1981 Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam, and Howard Lasnik
1993 The theory of principles and parameters. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld and Theo Vennemann (eds.), Syntax: An International Handbook of Contemporary Research, 506−569. Berlin: Walter de Gruyter.
Gärtner, Hans-Martin
2009 More on the indefinite-interrogative affinity: The view from embedded non-finite interrogatives. Linguistic Typology 13: 1−37.
Hornstein, Norbert
1999 Movement and control. Linguistic Inquiry 30: 69−96.
Jackendoff, Ray, and Peter W. Culicover
2003 The semantic basis of control in English. Language 79: 517−556.
Kirby, Susannah, William D. Davies, and Stanley Dubinsky
2010 Up to d[eb]ate on raising and control part 2: The empirical range of the constructions and research on their acquisition. Language and Linguistics Compass 4/6: 401−416.
Koster, Jan, and Robert May
1982 On the constituency of infinitives. Language 58: 117−143.
Kuroda, Shige-Yuki
1965 Generative Grammatical Studies in the Japanese Language. Cambridge, MA: MIT Dissertation. [Reprinted (1979) in Outstanding Dissertations in Linguistics series. New York: Garland].
Landau, Idan
2000 Elements of Control: Structure and Meaning in Infinitival Constructions. Dordrecht: Kluwer Academic Publishers.
Landau, Idan
2004 The scale of finiteness and the calculus of control. Natural Language and Linguistic Theory 22: 811−877.
Larson, Richard
1991 Promise and the theory of control. Linguistic Inquiry 22: 103−139.
Manzini, M. Rita
1983 On control and control theory. Linguistic Inquiry 14: 421−446.
Manzini, M. Rita, and Anna Roussou
2000 A minimalist theory of A-movement and control. Lingua 110: 409−447.
Matasović, Ranko
2007 The "Dependent First" syntactic patterns in Kabardian and other Caucasian languages. Paper presented at the Conference on Languages of the Caucasus, Leipzig.
McCawley, James D.
1998 The Syntactic Phenomena of English. 2nd edition. Chicago: University of Chicago Press.
Polinsky, Maria, and Eric Potsdam
2002 Backward control. Linguistic Inquiry 33: 245−282.
Pollard, Carl J., and Ivan A. Sag
1994 Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Rosenbaum, Peter S.
1967 The Grammar of English Predicate Complement Constructions. Cambridge/London: The MIT Press.
Rosenbaum, Peter S.
1970 A principle governing deletion in English sentential complementation. In: Roderick Jacobs and Peter Rosenbaum (eds.), Readings in English Transformational Grammar, 220−229. Waltham, MA: Ginn-Blaisdell.
Stiebels, Barbara
2010 Inhärente Kontrollprädikate im Deutschen. Linguistische Berichte 224: 391−440.
Wasow, Thomas
1972 Anaphoric Relations in English. Cambridge, MA: MIT Dissertation. [published 1979, SIGLA, 2, Ghent].
Williams, Edwin
1980 Predication. Linguistic Inquiry 11: 203−238.
Tibor Kiss, Nebelsbad (Zubrowka)
39. Theories of Binding

1. Introduction
2. Chomsky (1981) − the emergence of the binding principles A, B, and C
3. Reinhart and Reuland (1991, 1993) − a theory of reflexivity
4. Hornstein (2001) − bound elements as spelt-out traces
5. Fischer (2004b, 2006) − derivational binding and local optimization
6. Kiss (2012) − reflexivity and dependency
7. Conclusion
8. References (selected)
Abstract

This chapter presents five different approaches to binding. Each theory adopts a different framework and focuses on different empirical issues. As a result, we get a comprehensive impression of what binding theories should be able to capture and what kinds of potential flaws there are. The chapter first focuses on the pioneering work of Chomsky (1981), which relies crucially on the notion of government and develops the well-known binding principles A, B, and C. Next, Reinhart and Reuland's (1991, 1993) theory of reflexivity is discussed, in which the notions of predicate and coargumenthood play a central role. Hornstein (2001) is a minimalist proposal which assumes that bound elements are spelt-out traces of an A-movement chain. Like Hornstein (2001), Fischer (2004b, 2006) adopts a derivational point of view; however, it assumes an optimality-theoretic approach. Here, the bound element starts out as a variable with all possible realization specifications, and local optimization eliminates the anaphoric options step by step as the derivation proceeds. Finally, Kiss (2012) focuses on the occurrence of picture NP reflexives and provides an analysis which crucially relies on the distinction between reflexivity and anaphoric dependency, because only in the latter case is a binding relation established.
1. Introduction

In contrast to the chapter on pronominal anaphora (cf. Fischer, this volume), where the distribution and occurrence of reflexives and pronouns in different syntactic environments have been scrutinized, this chapter focuses on analyses that have been developed to account for these facts. In other words, different theories of binding will be presented and discussed here in detail. Chomsky (1981) can certainly be considered to play a pioneering role in this field, and the basic essence of this theory is well known to everyone dealing with binding. However, the original formulations have often become blurred over time, since (i) many refinements have been proposed in the meantime, and (ii) people have often used simplified versions of the theory. Hence, the aim of section 2 will be to thoroughly discuss the original version.
VI. Theoretical Approaches to Selected Syntactic Phenomena
Of course, since Chomsky (1981) many other theories have been developed, some in a similar spirit, some on the basis of completely different premises. Obviously, this chapter can only cover a fraction of the analyses proposed, so what I will do here is pick out some assorted samples with different background assumptions. Section 3 will be concerned with one of the first renowned alternative approaches, Reinhart and Reuland’s (1991, 1993) theory of reflexivity. Like Chomsky (1981), this analysis is based on a representational view of syntax. For reasons of space, we will then jump to more recent theories from the last decade, starting in section 4 with one of the first (and arguably most radical) approaches developed within a derivational framework, namely Hornstein (2001). What Fischer (2004a, b, 2006) additionally takes into account is the concept of competition − this is what section 5 is concerned with. While Fischer (2004a) provides a nonderivational version, Fischer (2004b, 2006) translates the basic ideas into a local derivational approach; section 5 discusses the latter version. The selection presented so far thus reflects the general shift syntactic theory has undergone from a representational setting (as in GB theory) to a derivational view (as in minimalism). A second dichotomy that can generally be found in syntactic theory concerns the question of whether global or local constraints are applied. Both Fischer (2004b, 2006) and Kiss (2012), which is discussed in section 6, adopt a local approach, but while Fischer (2004b, 2006) serves as an example of a local derivational theory, Kiss (2012) adopts a local representational view. Kiss develops an HPSG-based theory of binding and provides in particular an elaborate analysis of picture NP reflexives in English and German. Hence, section 6 discusses both data and a framework that have not been dealt with in the previous sections.
Finally, the chapter closes with a brief conclusion.
2. Chomsky (1981) − the emergence of the binding principles A, B, and C

2.1. An informal approach

Before turning to Chomsky’s original approach from 1981, let us briefly recapitulate informally what linguists generally know as the gist of this theory. The reason for proceeding this way is that it is easier to keep track of the original proposals if we keep the quintessence of the theory in mind. On the basis of data as in (1) and (2) (cf. also Fischer, this volume), the distribution of anaphors, pronouns, and R-expressions, respectively, can roughly be described as follows: (i) Anaphors must be bound in a relatively local domain; (ii) pronouns must be free in this domain; and (iii) R-expressions may not be bound at all.

(1)
a. Anna1 recognized herself1/*her1 in the picture.
b. She/*Herself/*Sheself likes the picture.
c. Her1 brother recognized *herself1/her1 in the picture.
d. Anna1 said that Paul recognized *herself1/her1 in the picture.
39. Theories of Binding

(2)
a. *Anna1 recognized Anna1 in the picture.
b. Anna likes the picture.
c. Her1 brother recognized Anna1 in the picture.
d. *Anna1 said that Paul recognized Anna1 in the picture.
As far as the concrete nature of this domain is concerned, different definitions and different terminology have been proposed. According to Chomsky (1981), it is the governing category (cf. the following sections); Chomsky (1986b) refers to the complete functional complex (= the minimal domain in which all grammatical functions compatible with the head are realized). Moreover, the notion of binding domain is widespread as a neutral term − its concrete definition varies.
2.2. The notion of government

Let us now turn to the more formal terminology that is used in Chomsky (1981) to capture these observations. The aim of this work was basically to overcome (at least some of) the technical and conceptual problems brought up by the theory of binding proposed in Chomsky’s (1980) article On Binding (cf. Chomsky 1981, section 3.3.1). The two central principles in Chomsky (1980) are the Opacity Condition and the Nominative Island Condition. The Opacity Condition subsumes the Specified Subject Condition and the Propositional Island (or Tensed-S) Condition (cf. also Fischer, this volume), which were already proposed in Chomsky (1973). One of these conceptual problems concerned certain redundancies between this theory and other modules of the grammar. For example, Chomsky observes that both case and binding theory single out one particular NP position in a clause, namely the subject position of an infinitive; however, the two theories provide seemingly unrelated reasons for this fact (cf. Chomsky 1981: 157). In order to express the close relation between these two modules, Chomsky (1981) therefore proposes that not only case but also binding theory should be based on the central notion of government. In fact, he proposes three slightly different definitions and discusses their impact; for reasons of space, however, I will concentrate on the third version he presents, which is based on an extended definition of c-command (cf. [3]−[5]).

(3)
[β ... γ ... α ... γ ...], where
a. α = X0,
b. where φ is a maximal projection, if φ dominates γ then φ dominates α,
c. α c-commands γ.
(cf. Chomsky 1981: 165)
(4)
α governs γ in (3).
(5)
α c-commands γ iff
a. α does not contain γ,
b. suppose that δ1, ..., δn is the maximal sequence such that
(a) δn = α,
(b) δi = αj,
(c) δi immediately dominates δi+1.
Then, if ε dominates α, either (I) ε dominates γ, or (II) ε = δi and δ1 dominates γ.
(cf. Chomsky 1981: 166)
Let us first take a closer look at the configuration described in (3). As far as the double occurrence of γ in the underlying structure is concerned, it indicates that γ may either occur to the right or to the left of α. Furthermore, (3a) constrains the set of potential governors to heads. As to the effects of (3b) and (3c), consider the abstract illustrations in (6)−(8): In (7), α governs γ; in (6), this is not the case if φ is a maximal projection; and in (8), it depends on the nature of φ and β − if both are projections of α, α governs γ in this scenario. (6)
Here, requirement (3b) is violated if φ is a maximal projection; (3c) is satisfied following the definition in (5) (with ε = β, [5b] [c-I] is fulfilled):
[β α [φ γ]]

(7) This structure satisfies both (3b) and (3c) (with ε = β, [5b] [c-I] is again fulfilled):
[φ=β α γ]

(8) If φ and β are projections of α, (3c) is satisfied (with ε = β, [5b] [c-I] is fulfilled; with ε = φ, [5b] [c-II] is fulfilled):
[β [φ γ] α]
In fact, structure (8) reveals that this definition of government is more liberal than earlier versions: according to the other definitions of government proposed in Chomsky (1981), the government relation would either be blocked altogether or would hold only if φ were not a maximal projection. Following (3), a government relation can be established even if φ is a maximal projection (as long as it is a projection of α). After this brief review of the original definition of the central notion of government, let us now take a closer look at the original versions of Chomsky’s binding principles.
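Since the definitions in (3)−(5) are in effect tree-walking procedures, they can be made concrete in a small programmatic sketch. The following toy implementation is my own illustration (not part of the handbook): it uses the simple branching-node notion of c-command as a stand-in for the extended definition in (5), and it folds the "projection of α" loophole directly into the domination check of (3b). All node names and attributes are illustrative; the sketch merely reproduces the verdicts for configurations (6) and (8).

```python
# Toy sketch (mine, not from the chapter): government as in (3), with the
# branching-node notion of c-command approximating the extended (5).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    maximal: bool = False                 # is this a maximal projection?
    projection_of: Optional[str] = None   # label of the head it projects
    parent: Optional["Node"] = None

def link(root: Node) -> Node:
    """Set parent pointers throughout the tree."""
    for c in root.children:
        c.parent = root
        link(c)
    return root

def dominates(a: Node, b: Node) -> bool:
    return any(c is b or dominates(c, b) for c in a.children)

def c_commands(a: Node, b: Node) -> bool:
    # approximation of (5): the first branching node above a dominates b
    n = a.parent
    while n is not None and len(n.children) < 2:
        n = n.parent
    return n is not None and dominates(n, b)

def governs(head: Node, g: Node, root: Node) -> bool:
    # (3a) -- head status of `head` -- is assumed; (3b): every maximal
    # projection dominating g must dominate the head, unless it is a
    # projection of the head (the loophole that licenses structure [8]);
    # (3c): the head must c-command g.
    def all_nodes(n):
        yield n
        for c in n.children:
            yield from all_nodes(c)
    for phi in all_nodes(root):
        if phi.maximal and dominates(phi, g) and not dominates(phi, head) \
           and phi.projection_of != head.label:
            return False
    return c_commands(head, g)

# Configuration (6): [β α [φ γ]] with φ a maximal projection -> no government.
alpha, gamma = Node("α"), Node("γ")
tree6 = link(Node("β", [alpha, Node("φ", [gamma], maximal=True)]))
print(governs(alpha, gamma, tree6))    # False

# Configuration (8): φ is maximal but a projection of α -> government holds.
alpha8, gamma8 = Node("α"), Node("γ")
tree8 = link(Node("β", [Node("φ", [gamma8], maximal=True, projection_of="α"),
                        alpha8]))
print(governs(alpha8, gamma8, tree8))  # True
```

If `projection_of` is left unset in the second tree, the sketch returns `False`, mirroring the more restrictive earlier definitions of government mentioned above.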
2.3. The binding principles

Basically, there are two notions of binding. First, binding can refer to the relation between anaphors/pronouns and their antecedents; second, operators may bind variables, which means that there is also a logical notion of binding. One way to distinguish between these two types of binding is to look at the position occupied by the binder, since binders of the first type are generally located in A-positions, whereas binders of the second type are located in A′-positions. Thus, we can refer to the two notions of binding as A- vs. A′-binding and introduce the following definition, where X ∈ {A, A′}.

(9)
a. α is X-bound by β iff α and β are coindexed, β c-commands α, and β is in an X-position.
b. α is X-free iff it is not X-bound.
(cf. Chomsky 1981: 185)
In the following, this explicit distinction between A- and A′-binding will be neglected for the sake of convenience, and binding is generally to be understood as A-binding, since this is what binding theory is concerned with. On the basis of the notion of governing category (cf. [10]), Chomsky finally proposes the binding principles in (11), which are referred to as Principles A, B, and C, respectively.

(10) β is the governing category for α iff β is the minimal category containing α and a governor of α, where β = NP or S.
(cf. Chomsky 1981: 188)

(11) Binding principles: Let β be a governing category for α.
A. If α is an anaphor, it is bound in β.
B. If α is a pronominal, it is free in β.
C. If α is an R-expression, it is free.
(cf. Chomsky 1981: 188, 225)

Note that Chomsky first assumes that the binding principles apply at LF (cf. Chomsky 1981: 188, 194); in the course of the discussion, however, he proposes that the level of application should rather be S-Structure (cf. Chomsky 1981: 196, fn. 34). This is mainly motivated by reconstruction sentences. Let us now consider which predictions these principles make with regard to the examples from above. In (1a) and (2a), repeated in (12), the bound element (= α) is in the object position and the subject is the antecedent.

(12) [S* Anna1 [VP recognized herself1/*her1/*Anna1 in the picture]]

In this configuration, the verb governs α, and hence the governing category for α is the matrix clause, S*. Obviously, S* also comprises the subject, thus α is bound in its governing category. This means that Principle A is fulfilled, whereas Principle B and Principle C are violated. Hence, (12) is only grammatical if α is realized as an anaphor. Since INFL governs the subject position in structures like these, the governing category for the subject NP in (1b) and (2b) (repeated in [13]) is again S*. However, the subject NP in these examples is not bound at all, thus Principle A rules out the anaphor
in (13) (independent of case), whereas the pronoun and the R-expression are correctly predicted to be grammatical.

(13) [S* *Herself/She/Anna likes the picture]

As far as the examples (1c) and (2c) (repeated in [14]) are concerned, V is again a governor for α, and hence S* corresponds to its governing category.

(14) [S* Her1 brother [VP recognized *herself1/her1/Anna1 in the picture]]

However, since the c-command requirement is not fulfilled, the coindexed possessive pronoun does not bind α; hence, Principle A is violated, whereas Principles B and C are fulfilled. Consequently, only the anaphor is ungrammatical in example (14). In (1d) and (2d) (repeated in [15]), the embedded verb serves as a governor for α; its governing category is therefore the embedded S.

(15) [S* Anna1 said [S′ that [S Paul [VP recognized *herself1/her1/*Anna1 in the picture]]]]

Thus, anaphors are ruled out by Principle A, since binding takes place outside the governing category, whereas pronouns satisfy Principle B. Moreover, R-expressions violate Principle C, which is independent of the notion of governing category. As a result, it can be concluded that Chomsky’s binding principles in (11) make correct predictions with respect to the examples introduced in section 2.1. However, in view of the contrast observed in (16), Chomsky proposes some further refinements as far as the notion of governing category is concerned. Following the definition in (10), NP* is the governing category for the reciprocal in both cases, since P serves as a governor for α. Accordingly, α is not bound within its governing category, and hence we should expect Principle A to rule out both sentences. (Note that Chomsky follows the structural approach to account for picture NP reflexives; cf. also section 6 and Fischer, this volume, as regards alternative options to tackle data like these.)

(16) a. We1 heard [NP* some stories about each other1 ].
b. *We1 heard [NP* John’s stories about each other1 ].
(cf. Chomsky 1981: 207, 208)

Interestingly, the formerly assumed Specified Subject Condition can capture the difference, since it is sensitive to the fact that (16b) contains an intervening subject (the possessor), whereas (16a) does not. Hence, Chomsky tries to integrate some of the old insights into the present theory by introducing the notion of SUBJECT and proposes a modification of the definition of governing category, which is given in (18). (Note that [19a] is also known as the i-within-i filter, which prohibits coindexation of γ and δ if the latter is contained inside the former.)

(17) a. S → NP INFL VP, where INFL = [[± Tense], (AGR)]
b. AGR and the subject of an infinitive, an NP or a small clause are a SUBJECT.
(cf. Chomsky 1981: 209)
(18) a. AGR is coindexed with the NP it governs.
b. β is a governing category for α iff β is the minimal category containing α, a governor of α, and a SUBJECT accessible to α.
(cf. Chomsky 1981: 211)

(19) a. *[γ ... δ ...], where γ and δ bear the same index.
b. β is accessible to α iff α is in the c-command domain of β and assignment to α of the index of β would not violate (19a).
(cf. Chomsky 1981: 212)

These definitions yield the correct result for example (16): In (16a), the governing category is now the matrix clause, S*, which contains AGR as accessible SUBJECT; hence, Principle A is fulfilled. In (16b), by contrast, NP* is the governing category of the reciprocal, since it contains an accessible SUBJECT (namely the possessor); as a result, Principle A is violated and the sentence is correctly predicted to be ungrammatical. The definition in (18) does not only capture the contrast between (16a) and (16b); it also has the positive side effect of accounting for the question of why governing categories are generally ∈ {NP, S}. While this had to be stipulated in the old definition (10), it now follows from the fact that we usually find SUBJECTs in NP or S. Note, moreover, that the modified definition in (18) does not affect the analysis of the previous examples: The governing categories remain the same in all cases (matrix/embedded S), since they all contain an accessible SUBJECT, namely AGR.
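The joint effect of the principles in (11) can be made concrete in a minimal sketch (my own illustration, not from the chapter). It takes each NP's governing category as given − computing it, per (10)/(18), is the hard part − and simply checks Principles A−C against the binding configurations of examples (12), (14), and (15); the function name and argument encoding are invented for exposition.

```python
# Toy illustration (not from the chapter): applying Principles A-C from (11).
# cc_in_gc / cc_outside_gc hold the indices of NPs that c-command the element
# under consideration inside resp. outside its governing category; computing
# the governing category itself (definitions [10]/[18]) is taken as given.
def satisfies_binding_principles(np_type, index, cc_in_gc, cc_outside_gc):
    bound_in_gc = index in cc_in_gc
    bound_at_all = index in cc_in_gc or index in cc_outside_gc
    if np_type == "anaphor":        # Principle A: bound in the governing category
        return bound_in_gc
    if np_type == "pronoun":        # Principle B: free in the governing category
        return not bound_in_gc
    if np_type == "r-expression":   # Principle C: free everywhere
        return not bound_at_all
    raise ValueError(np_type)

# (12): Anna1 recognized {herself1 / *her1 / *Anna1}; the subject c-commands
# the object inside S*, the object's governing category.
for np_type, ok in [("anaphor", True), ("pronoun", False), ("r-expression", False)]:
    assert satisfies_binding_principles(np_type, 1, cc_in_gc={1},
                                        cc_outside_gc=set()) == ok

# (14): Her1 brother recognized {*herself1 / her1 / Anna1}; the possessor is
# coindexed but does not c-command, so the object is not bound at all.
for np_type, ok in [("anaphor", False), ("pronoun", True), ("r-expression", True)]:
    assert satisfies_binding_principles(np_type, 1, cc_in_gc=set(),
                                        cc_outside_gc=set()) == ok

# (15): Anna1 said that Paul recognized {*herself1 / her1 / *Anna1}; the binder
# c-commands the object but sits outside the embedded governing category.
for np_type, ok in [("anaphor", False), ("pronoun", True), ("r-expression", False)]:
    assert satisfies_binding_principles(np_type, 1, cc_in_gc=set(),
                                        cc_outside_gc={1}) == ok
print("all configurations checked")
```

The sketch makes the complementarity built into (11) visible: wherever the anaphor is licensed, the pronoun is excluded, and vice versa − precisely the prediction that section 2.4 shows to be too strong.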
2.4. Problems

In this section, I will briefly outline four main problems Chomsky’s theory of binding faces. First, the way in which Principles A and B have been defined (cf. [11]) suggests that anaphors and pronouns generally occur in complementary distribution. However, although this prediction is correct in many contexts, it is not always borne out, neither in English nor in other languages; cf., for instance, (20) (and also the Dutch and Italian examples in [25]). (Note that although the so-called BT-compatibility algorithm, introduced in Chomsky 1986b, makes it possible to predict a non-complementary distribution in certain configurations, this is not the case in [20]; cf. Hestvik 1991 as far as a corresponding modification is concerned.)

(20) Max1 glanced behind himself1/him1.
(cf. Reuland and Everaert 2001: 642)

Further problems arise because Chomsky (1981) focuses on English. In many other languages, two different types of anaphors can be observed which differ with respect to their distribution; cf. the Dutch data in (21)−(23). As it stands, Principle A cannot account for these facts; it treats all anaphors alike. (Note that the morphologically simple anaphor, which does not occur in English, is glossed as SE in the following examples, following Reinhart and Reuland’s notation; cf. Reinhart and Reuland 1991, 1993.)
(21) Max1 wast zich1/zichzelf1. [Dutch]
Max washes SE/himself
‘Max1 washes himself1.’
(cf. Koster 1984: 141; Reuland and Reinhart 1995: 242)

(22) Max1 haat zichzelf1/*zich1. [Dutch]
Max hates himself/SE
‘Max1 hates himself1.’
(cf. Koster 1984: 141; Reuland and Reinhart 1995: 242)

(23) Max1 keek achter zich1/*zichzelf1. [Dutch]
Max glanced behind SE/himself
‘Max1 glanced behind himself1.’
(cf. Koster 1986: 334, 335)
Furthermore, Principle A suggests that anaphoric binding is a local phenomenon; however, as the case of long-distance binding in other languages shows, this is not always true (cf. the overview in Fischer, this volume). In the Icelandic example in (24), for instance, the matrix subject can bind the anaphor regardless of an intervening subject in the embedded clause.

(24) Jón1 skipaði Pétri [PRO að raka ??sjálfan sig1/sig1/hann1 [Icelandic]
John ordered Peter to shave.INF himself/SE/him
á hverjum degi].
on every day
‘John1 ordered Peter to shave him1 every day.’
(cf. Reuland and Everaert 2001: 649; Fischer 2004a: 504)

But even in more local binding relations we can observe a wide range of crosslinguistic variation, as the comparison between English, German, Dutch, and Italian in (25) reveals. (Note that some Dutch native speakers prefer the weak pronoun ’m instead of the strong pronoun hem in [25c].)

(25) a. Max1 glanced behind himself1/him1.
(cf. Reuland and Everaert 2001: 642)
b. Max1 blickte hinter sich1/??sich selbst1/*ihn1. [German]
Max glanced behind SE/himself/him
‘Max1 glanced behind himself1.’
(cf. Fischer 2004a: 490)
c. Max1 keek achter zich1/*zichzelf1/hem1. [Dutch]
Max glanced behind SE/himself/him
‘Max1 glanced behind himself1.’
(cf. Koster 1986: 334, 335)
d. Max1 ha dato un’occhiata dietro di sé1/*dietro se stesso1/ [Italian]
Max has given a look behind of SE/behind himself/
?dietro di lui1.
behind of him
‘Max1 glanced behind himself1.’
(cf. Fischer 2004a: 493)

To sum up, it can be concluded that the unsolved problems concern in particular the broad range of crosslinguistic variation (including non-local anaphoric binding) and optionality as regards the realization form of the bound element.
3. Reinhart and Reuland (1991, 1993) − A theory of reflexivity

3.1. The theory

Another influential approach to binding, which addresses some of the drawbacks of the Chomskyan binding theory, has been developed by Reinhart and Reuland. Based on Dutch examples like those in (21), (22), and (23) (Max1 wast zich1/zichzelf1; Max1 haat zichzelf1/*zich1; Max1 keek achter zich1/*zichzelf1), their starting point is that many languages exhibit a three-way distinction as regards bound elements, which means that the simple classification into anaphors and pronouns is not sufficient. Instead, they assume that within the former group we have to distinguish between two different types of anaphors which display a different binding behaviour. In Dutch, they correspond to the forms zich vs. zichzelf. Reinhart and Reuland refer to these two types of anaphors as SE (simplex expression) vs. SELF anaphors. What these two elements have in common, they argue, is that they are referentially defective: they depend on their antecedents in order to pick out a referent − a property that distinguishes them from pronouns. (For a different view cf. Kiss 2012, who argues that the assumption that anaphors are referentially defective contradicts the concept of exemptness adopted by Reinhart and Reuland.) However, the two anaphors differ from each other in one important respect. According to Reinhart and Reuland, only SELF anaphors can function as reflexivizers: they can ensure that a coargument of theirs refers to the same entity, which makes the predicate they belong to reflexive. By contrast, SE anaphors and pronouns have no reflexivizing function. On the basis of these assumptions, Reinhart and Reuland propose to replace Chomsky’s binding principles A and B with the conditions (26)−(28), which are based on the definitions in (29).

(26) Condition A: A reflexive-marked (syntactic) predicate is reflexive.
(27) Condition B: A reflexive (semantic) predicate is reflexive-marked.

(28) Condition on A-Chains (= Chain Condition): A maximal A-chain (α1, ..., αn) contains exactly one link − α1 − which is +R.
(29) a. The syntactic predicate of (a head) P is P, all its syntactic arguments, and an external argument of P (subject).
b. The syntactic arguments of P are the projections assigned θ-role or case by P.
c. The semantic predicate of P is P and all its arguments at the relevant semantic level.
d. A predicate P is reflexive iff two of its arguments are coindexed.
e. A predicate P is reflexive-marked iff either P is lexically reflexive or one of P’s arguments is a SELF anaphor.
f. Generalized Chain Definition: C = (α1, ..., αn) is a chain iff C is the maximal sequence such that there is an index i such that for all j, 1 ≤ j ≤ n, αj carries that index, and for all j, 1 ≤ j < n, αj governs αj+1.
g. An NP is +R iff it carries a full specification for φ-features (gender, number, person, case). The absence of contrasts within the domain of a class implies the absence of a specification for that class.
(cf. Reuland and Reinhart 1995: 255)

As regards the standard examples from the previous section (repeated in [30]), the new principles make correct predictions.

(30) a. Anna1 recognized herself1/*her1 in the picture.
b. She/*Herself/*Sheself likes the picture.
c. Her1 brother recognized *herself1/her1 in the picture.
d. Anna1 said that Paul recognized *herself1/her1 in the picture.
In (30a), the predicate recognized is reflexive because it has two coindexed arguments. Hence, Condition B requires reflexive-marking − however, this requirement is only fulfilled if the bound element is realized as SELF anaphor. Pronominal binding in (30a) violates Condition B. As far as Condition A is concerned, it only applies non-vacuously in (30) if the SELF anaphor is involved. In this case, it is satisfied in (30a) since the predicate is reflexive. As regards the Chain Condition, the maximal A-chain which contains the bound element is (Anna1, bound element1); if the bound element is realized as SELF anaphor, which is [−R], the condition is fulfilled − a pronoun, however, would violate it, since it is [+R] in this position (a structural case position; cf. the discussion in section 3.2). To sum up, pronominal binding is excluded in (30a) by both Condition B and the Condition on A-Chains. In (30b)−(30d) the situation is different insofar as no reflexive predicate is involved: In (30b), there are no coindexed elements at all, and in (30c) and (30d), they are not arguments of the same predicate. (In [30c], her functions as possessor and is only part of the subject, the only coargument of the SELF anaphor; in [30d], the coindexed elements are not arguments of the same clause.) Thus, Condition B applies vacuously and Condition A rules out the SELF anaphor in all three sentences. (In the case of pronominal binding, Condition A is again irrelevant.) As regards the Chain Condition, the maximal A-chain which contains the bound element corresponds in all three examples to a trivial one-member chain: In (30c) and (30d), the coindexed elements are not part of it because in this case the government requirement could not be fulfilled (cf. [29f]). As a result, the Condition on A-Chains excludes the SELF anaphor in all three cases, since it is
always [−R]. By contrast, pronominal realization of the bound element satisfies the Chain Condition, since pronouns are [+R] in these positions.

Against the background of these standard examples, it might remain unclear what the Condition on A-Chains is needed for, since it only confirms the results predicted by Conditions A and B, and furthermore, one might wonder where the difference between syntactic and semantic predicates plays a role. Let us therefore briefly turn to some examples which shed light on these questions before we come back to the sentences that proved to be problematic for Chomsky’s binding theory. In contrast to Conditions A and B, the Condition on A-Chains explicitly distinguishes between NPs which are [+R] and those that are [−R]. Hence, it generally helps to differentiate between pronominal and anaphoric binding and furthermore imposes restrictions on the binder; i.e., it makes sure that anaphors do not function as antecedents and that pronouns occur as bound elements only if a barrier intervenes. Here, the question might arise whether the latter configuration is not generally subsumed under Condition B, which excludes pronouns in a relatively local binding relation (namely if binder and bindee are coarguments of the same semantic predicate). However, although Condition B and the Condition on A-Chains make the same predictions in many contexts (as in [30a]), this is not always the case. ECM constructions as in (31) serve as an example where only the Chain Condition is violated − the maximal A-chain involved in this sentence is (Henk1, hem1), and the pronoun is [+R]. Condition B, however, is not violated: there is no reflexive semantic predicate because the coindexed elements are not coarguments, hence the condition applies vacuously.

(31) *Henk1 hoorde [hem1 zingen]. [Dutch]
Henk heard him sing
‘Henk1 heard himself1 sing.’
(cf. Reinhart and Reuland 1993: 710)
It also happens that only Condition B is violated while the Condition on A-Chains is fulfilled (in which case the ungrammaticality appears to be weaker). A case in point are coordination structures as in (32) and (33), which moreover show why Condition B must refer to semantic predicates and to reflexive-marking at the relevant semantic level. Although from a syntactic point of view no reflexive predicate seems to occur in these examples, the semantic interpretations in (32b) and (33b) show that there is indeed a semantic level at which we find a reflexive predicate that is not correctly licensed via reflexive-marking. Hence, Condition B is violated; the (syntactically defined) Chain Condition, by contrast, is satisfied in these examples.

(32) a. *[Felix but not Lucie1 ] praised her1.
b. [Felix (λx (x praised her))] but not [Lucie (λx (x praised x))]
(cf. Reinhart and Reuland 1993: 676, 677)

(33) a. *[The queen1 ] invited both [Max and her1 ] to our party.
b. the queen (λx (x invited Max & x invited x))
(cf. Reinhart and Reuland 1993: 675)

As regards Condition A, it refers to syntactic predicates and explicitly takes into account external and case-marked arguments of the predicate under consideration, and not only
θ-marked arguments. This is motivated by raising and ECM constructions of the type illustrated in (34).

(34) a. Lucie1 seems to herself1 [tLucie to be beyond suspicion].
b. Lucie1 expects [herself1 to entertain Max].
(cf. Reinhart and Reuland 1993: 679, 680)

In (34a), the matrix subject is not θ-marked by the raising verb, but the former and the anaphor are nevertheless coarguments from a syntactic point of view and make the syntactic predicate formed by seem reflexive; hence, Condition A is fulfilled and the sentence is grammatical. Similarly, in (34b), the anaphor counts as a syntactic argument of the matrix verb expect since it is case-marked by the latter, although it is θ-marked by the embedded verb. (Note that the matrix and the embedded subject are not coarguments with respect to the semantic predicate expect; the latter takes the matrix subject and the embedded TP as arguments − hence, Condition B does not apply; cf. also [31].) Thus, Condition A is again fulfilled, because the syntactic predicate which herself reflexive-marks is reflexive. In fact, since the SELF anaphor is θ-marked by the embedded verb, we should expect it also to reflexive-mark this predicate and cause a violation of Condition A. Reinhart and Reuland therefore assume that the embedded verb raises at LF and forms a complex predicate with the matrix verb (via adjunction). As a result, herself is no longer the external argument of entertain, which is therefore not reflexive-marked. However, the SELF anaphor still functions as a syntactic argument of the embedding (complex) predicate and thus reflexive-marks it. Hence, Condition A is satisfied after all.

(35) V-raising at LF: Lucie [to-entertaini-expectj ]j [herself ti Max]
(cf. Reinhart and Reuland 1993: 708)
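The interplay of Conditions A and B can likewise be sketched programmatically. The following toy checker is my own illustration, not Reinhart and Reuland's formalization: it flattens the syntactic/semantic predicate distinction and omits the Chain Condition, and it treats the lexical reflexivity of Dutch wassen as a stipulated flag (an assumption in the spirit of [29e]); argument encodings are invented for exposition.

```python
# Toy sketch (mine, not Reinhart and Reuland's formalization): Conditions A
# and B from (26)/(27), with a predicate given as a list of (form, index)
# argument pairs, e.g. ("SELF", 1) for a SELF anaphor bearing index 1.
def reflexive(args):
    # (29d): a predicate is reflexive iff two of its arguments are coindexed
    indices = [i for _, i in args]
    return len(indices) != len(set(indices))

def reflexive_marked(args, lexically_reflexive=False):
    # (29e): lexically reflexive, or one argument is a SELF anaphor
    return lexically_reflexive or any(form == "SELF" for form, _ in args)

def condition_A(args, lexically_reflexive=False):
    # (26): a reflexive-marked predicate must be reflexive
    return not reflexive_marked(args, lexically_reflexive) or reflexive(args)

def condition_B(args, lexically_reflexive=False):
    # (27): a reflexive predicate must be reflexive-marked
    return not reflexive(args) or reflexive_marked(args, lexically_reflexive)

# (22) Max1 haat zichzelf1/*zich1: the SELF anaphor reflexive-marks the
# reflexive predicate; bare zich leaves it unmarked, violating Condition B.
print(condition_B([("Max", 1), ("SELF", 1)]))   # True
print(condition_B([("Max", 1), ("SE", 1)]))     # False

# (21) Max1 wast zich1: if wassen is taken to be lexically reflexive (an
# assumption here), zich is licensed after all.
print(condition_B([("Max", 1), ("SE", 1)], lexically_reflexive=True))  # True

# (30c) Her1 brother recognized *herself1: the SELF anaphor reflexive-marks
# a predicate without coindexed coarguments, so Condition A is violated.
print(condition_A([("her-brother", 2), ("SELF", 1)]))   # False
```

The sketch also makes visible what the text notes about exemptness: if a SELF anaphor is no argument of any (syntactic) predicate, neither condition ever fires, so nothing here rules it out − that work is left to the Chain Condition, which this simplified checker omits.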
3.2. Predictions

Let us now come back to some of the problems the Chomskyan binding principles face (cf. [36], repeated from [20]).

(36) Max1 glanced behind himself1/him1.
(cf. Reuland and Everaert 2001: 642)

In contrast to Chomsky’s binding theory, the reflexivity approach no longer predicts a complementary distribution of anaphors and pronouns, and with respect to examples like (36), the conditions in (26)−(28) (Condition A, Condition B, and the Condition on A-Chains) make correct predictions: Since the preposition lacks a subject, it does not form a syntactic predicate, and hence Condition A does not apply even if a SELF anaphor occurs as an argument of the preposition (in this case, it would be an exempt anaphor). Condition B does not apply either, because the coindexed elements are not coarguments; so there is no reflexive predicate involved. As far as chain formation is concerned, Reuland and Reinhart (1995: 261) assume that there is a chain between the bound element and its antecedent. Note that this is not the case in Reinhart and Reuland (1993: 702), where they assume that the preposition in these sentences forms a minimality barrier, following Chomsky (1986a). (On this assumption, however, the Chain Condition would rule out the anaphor, because the trivial chain (himself1) would then be a maximal A-chain but would not contain a [+R] element.) But since the Chain Condition has to rule out the pronoun in sentences like (36) in languages like German (cf. [38b]), Reuland and Reinhart (1995) propose a modified analysis based on Rizzi’s (1990b) framework, according to which P does not block government. Thus, it is clear that the anaphor fulfills the Condition on A-Chains in (36) because anaphors are generally [−R] − but what about the pronoun? Reuland and Reinhart (1995: 262) suggest that since the pronoun bears inherent case in these positions and English shows no contrast within the inherent case system, the pronoun is not specified for case in these sentences. As a result, it is [−R] in (36) and thus does not violate the Condition on A-Chains. However, it seems to me that this analysis also suggests that sentences like the following should violate the Chain Condition, on the assumption that the pronoun is [−R] in this position:

(37) Max glanced behind her.
A-chain violating the Chain Condition: (her[−R])

The German and Dutch counterparts of example (36) (cf. [38b] and [38c], respectively, repeated from [25]) cannot be analyzed as straightforwardly.

(38) a. Max1 glanced behind himself1/him1.
(cf. Reuland and Everaert 2001: 642)
b. Max1 blickte hinter sich1/??sich selbst1/*ihn1. [German]
Max glanced behind SE/himself/him
‘Max1 glanced behind himself1.’
(cf. Fischer 2004a: 490)
c. Max1 keek achter zich1/*zichzelf1/hem1. [Dutch]
Max glanced behind SE/himself/him
‘Max1 glanced behind himself1.’
(cf. Koster 1986: 334, 335)
As in English, the Dutch pronoun satisfies the Chain Condition, since it is unspecified for (inherent) case. In German, the situation is different: Since German shows a case contrast within the inherent case system, the pronoun is fully specified for all φ-features; hence, it is [+R] and violates the Condition on A-Chains. However, the question remains open as to why SELF anaphors are ungrammatical in Dutch and German. As argued above, neither Condition A nor Condition B applies, which accounts for the grammaticality of the SE anaphors, since they do not violate any condition. But in order to rule out the SELF anaphors, it would have to be assumed that they violate either Condition A or the Chain Condition. The first possibility must be rejected because the English account crucially relies on the fact that Condition A does not apply in these examples; and a violation of the Chain Condition would only occur if there were a barrier between the anaphor and its antecedent and the trivial chain (anaphor1) were a
VI. Theoretical Approaches to Selected Syntactic Phenomena
maximal A-chain. However, this solution must also be excluded, because Reuland and Reinhart's (1995) account of the ungrammaticality of pronouns in German sentences like (38b) is based on the assumption that (Max1, ihn1) forms a chain that violates the Chain Condition. Hence, the ungrammaticality of the SELF anaphors in (38b) and (38c) cannot be derived directly from the three conditions in (26)−(28) − Condition A, Condition B, and the Condition on A-Chains − and something more needs to be said with respect to these cases. So again crosslinguistic variation seems to be a challenge for the theory.
4. Hornstein (2001) − bound elements as spelt-out traces

In the previous two sections, two of the most influential binding theories have been introduced − Chomsky (1981) (which has been developed further in Chomsky 1986b) and Reinhart and Reuland (1993) / Reuland and Reinhart (1995). In this section, we will take a closer look at a more recent theory, which has been developed within a derivational framework. Hornstein's (2001) approach to binding seeks to eliminate binding theory as a separate module by subsuming it under the theory of movement. What Hornstein basically proposes is that anaphors are "the residues of overt A-movement" (Hornstein 2001: 152); pronouns, on the other hand, are considered to be the elsewhere case: formatives, i.e., not genuine lexical expressions, which are licensed if the movement option is not available and the derivation cannot converge otherwise. Note that two related proposals have been put forward by Kayne (2002) and Zwart (2002). In a nutshell, these three theories can be characterized as follows: They all share the underlying assumption that an antecedent and its bindee start out as or even are (cf. Hornstein 2001) one constituent before the antecedent moves away to a higher position.
4.1. The role of case checking Let us first consider a standard example like (39). A derivation like the one indicated in (39a), where John starts in the object position, then moves to the subject position, and the remaining copy is phonetically realized as himself, must be rejected for case reasons, Hornstein argues: As subject, John − and hence also the copy − must have Nominative case features. However, the verb like bears Accusative case features, which cannot be checked by the Nominative case features on the copy John. Hence, Hornstein suggests that it is not the copy John which satisfies the verb’s case requirements but the morpheme self, a semantically inert morpheme that is adjoined to John in the beginning and prevents a case clash in examples like (39) (cf. [39b]). Note that Hornstein generally tolerates derivations in which an argument receives two θ-roles (for instance, in the object and subject position); in fact, his theory involves movement into θ-positions (cf. the discussion below). As to the notation, Hornstein marks unchecked features with “+” and checked features with “−” (i.e., these symbols are not used to reflect interpretability/ uninterpretability). (39) John likes himself. (cf. Hornstein 2001: 159)
39. Theories of Binding
a. impossible derivation (unpronounced copies rendered here as ⟨…⟩): [TP John [vP likes ⟨John⟩ (PF=himself)]]
b. proposed derivation: [TP John [vP likes [⟨John⟩ self]]]

The derivation then proceeds as indicated in (40). After John (with Nominative case features) and self (with Accusative case features) have been merged into the object position (where John is assigned the object θ-role), John moves to the subject θ-position (SpecvP in [40]), where it receives the subject θ-role. Then it moves on to SpecTP, where it finally checks its Nominative case features and satisfies the EPP. The Accusative case features of the verb are checked against the case features on self, which therefore has to move to SpecvP at LF. As Hornstein points out in his footnote 28, it is not relevant to him whether this movement is overt or covert. However, if this movement were overt, the Linear Correspondence Axiom (LCA; cf. also Kayne 1994) would require the deletion of the lower copy of self (not of the higher one) and thus predict the wrong linear order; cf. the argumentation below. Hence, this option is not really available.

(40) a. overt movement: [TP John-[−NOM] [vP likes [VP [⟨John⟩ self-[+ACC]]]]]
     b. covert movement: [TP John-[−NOM] [vP self-[−ACC] [vP likes [VP [⟨John⟩ ⟨self⟩]]]]]

What remains to be explained is where the phonetic form himself finally comes from. According to (40a), we find a copy of John plus self in the object position, which could at most result in the form Johnself. However, Hornstein argues that in order to satisfy the LCA all copies except for one must be deleted at PF, since otherwise linearization is impossible (cf. Hornstein 2001: 79, 80, 85, 160 and Nunes 1995, 1999). According to Nunes (1999) this deletion operation is triggered by a principle called Chain Reduction:

(41) Chain Reduction: Delete the minimal number of constituents of a nontrivial chain CH that suffices for CH to be mapped into a linear order in accordance with the LCA. (cf.
Nunes 1999: 228) Thus the question arises as to which of the copies are deleted. Since the derivation can only converge if no uninterpretable features survive at the interface (cf. the principle of Full Interpretation), the best option is to keep the copy of John in SpecTP and delete the others, because the former is the only one where the case features have been checked and are therefore invisible at the interfaces. Hornstein argues that the choice is motivated as follows: The copy in SpecTP could not be deleted since it is not defective (it does not bear an unchecked uninterpretable feature), which he considers to be a licensing criterion for deletion. According to Nunes (1999), the choice follows from economy considerations based on the application of Formal Feature Elimination (FF-Elimination) (cf. [42]).
(42) a. Formal Feature Elimination (FF-Elimination): Given the sequence of pairs σ = ⟨(F, P)1, (F, P)2, ..., (F, P)n⟩ such that σ is the output of Linearize, F is a set of formal features, and P is a set of phonological features, delete the minimal number of features of each set of formal features in order for σ to satisfy Full Interpretation at PF. (cf. Nunes 1999: 229)
     b. Linearize corresponds to the operation that maps a phrase structure into a linear order of X° elements in accordance with the LCA. (cf. Nunes 1999: fn. 5)
     c. The principle of Full Interpretation states that linguistic levels of representation (LF and PF) consist solely of +Interpretable elements. (cf. Martin 1999: 1)
     d. Checking operations render −Interpretable features invisible at LF and PF. (cf. Nunes 1999: 229)

If the highest copy of John in (39b) is kept, no unchecked case feature needs to be deleted by FF-Elimination (cf. also the illustration in [40a]). By contrast, if another copy survives instead, its unchecked case feature will additionally have to be eliminated. Hence, to keep the highest copy is the most economical option. However, if the lowest copy of John is deleted, the bound morpheme self gets into trouble because it needs some morphological support. Hence, a last resort expression must be inserted to ensure the convergence of the derivation − and this is the pronoun him, which agrees in case with self. The correct LF and PF representations of sentence (39) therefore look as follows, where deleted copies are rendered as ⟨…⟩. (Hornstein does not explicitly explain what happens to the unchecked uninterpretable Accusative case feature on the lower copy of self at PF. I suppose he has something like Nunes's Formal Feature Elimination in mind, according to which this feature would simply be eliminated at PF to ensure convergence; cf. [42].)

(43) a. deletion before Spell-Out: [TP John-[−NOM] [vP likes [VP [⟨John⟩ self-[+ACC]]]]]
     b. LF: [TP John-[−NOM] [vP self-[−ACC] [vP likes [VP [⟨John⟩ ⟨self⟩]]]]] (John: bears subject and object θ-role)
     c. PF: [TP John-[−NOM] [vP likes [VP [HIM+self-[+ACC]]]]]
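The economy reasoning behind Chain Reduction plus FF-Elimination − keep the one copy whose retention forces the fewest feature deletions − can be sketched as a small computation. This is purely illustrative; the data structures and function name are not part of Nunes's or Hornstein's formalism:

```python
# Toy model of copy deletion under Chain Reduction + FF-Elimination.
# A chain is a list of copies, highest first; each copy records which
# formal features are still unchecked. Chain Reduction keeps exactly
# one copy, and economy favours the copy whose retention requires the
# fewest FF-Elimination applications (fewest unchecked features).

def surviving_copy(chain):
    """Return (position_index, copy) for the cheapest copy to keep.
    On a tie, the structurally highest copy (first in the list) wins,
    because min() returns the first minimal element."""
    return min(enumerate(chain), key=lambda pc: len(pc[1]["unchecked"]))

# The chain of 'John' in "John likes himself" (cf. [39]/[40]):
john_chain = [
    {"pos": "SpecTP", "unchecked": []},       # NOM checked here
    {"pos": "SpecvP", "unchecked": ["NOM"]},  # case still unchecked
    {"pos": "object", "unchecked": ["NOM"]},  # case still unchecked
]

idx, kept = surviving_copy(john_chain)
print(kept["pos"])  # -> SpecTP: keeping it needs zero FF-Eliminations
```

The sketch reproduces the text's conclusion: the SpecTP copy survives because its case feature is already checked, so no additional FF-Elimination step is required.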
4.2. The Linear Correspondence Axiom and the Scope Correspondence Axiom

What the derivation of example (39) suggests is that self is inserted to prevent a case clash; however, there is more behind it, since SELF anaphors also occur in contexts in which their antecedents bear the same case (cf. [44]).
(44) a. I expect John to like himself.
     b. *I expect John to like.
     (cf. Hornstein 2001: 162)

If self were only needed when there was no other way to check the verb's Accusative case features, we would expect a sentence like (44b) to be grammatical, because all copies of John would have an Accusative feature, and thus a lower copy could check case against the features of the embedded verb like (cf. [45b]). Hornstein suggests that in order to do so, "one of these copies moves to the outer Spec of the lower [vP]" (Hornstein 2001: 162); however, the motivation of this additional movement step is not clear to me, since one of the copies already occurs in the lower SpecvP, where Accusative case would be checked; cf. (45a). (Unpronounced copies are again rendered as ⟨…⟩.)

(45) a. neglecting case checking: [vP expect [TP John-[+ACC] to [vP like ⟨John⟩]]]
     b. case-driven movement: [vP John-[−ACC] expect [TP ⟨John⟩ to [vP John-[−ACC] [v′ ⟨John⟩ [v′ like ⟨John⟩]]]]]

Now the question arises as to why this derivation is ruled out, independent of whether Accusative case checking takes place in overt syntax or at LF. According to Hornstein, the answer is that the derivation inevitably results in a violation of the LCA or the LF counterpart he proposes, the so-called Scope Correspondence Axiom (SCA), which assigns elements at LF a scope order (cf. Hornstein 2001: 85).

(46) Scope Correspondence Axiom (SCA): If α c-commands β at LF, then α scopes over β.

Both the LCA and the SCA basically require that all copies but one be deleted to allow linearization or scopification, respectively. Note that scope is assumed to be irreflexive, which means that an expression cannot scope over itself. Hence, in order to be able to assign a coherent scope order, the SCA forces the deletion of all copies but one at LF. If this is not respected, the derivation will crash, since the SCA is a convergence requirement just like the LCA.
However, Hornstein argues that deletion can only occur if an expression is defective in some way, for example if it bears an unchecked uninterpretable feature. On the assumption that Accusative case checking (as indicated in [45b]) takes place overtly, we find two instances of John with checked case features in the overt syntax already; hence, none of these two copies will be deleted, and both the LCA and the SCA are violated. If case checking takes place covertly, the uninterpretable case features are still unchecked at PF; thus deletion of all copies except for one can take place and the LCA will be satisfied. Again, Hornstein is not very explicit about the concrete process at PF, but he probably has in mind something like the following: Since all the copies bear an unchecked uninterpretable case feature in this case, they are all defective and could therefore in principle be deleted. However, an operation like Chain Reduction ensures that not all copies are deleted (which would be possible according to the defectiveness approach) and one member survives. But in order to guarantee the convergence of the
derivation, the remaining unchecked uninterpretable feature on this copy must be rendered invisible at the interface, which can be handled by a rule like FF-Elimination. The latter would also ensure that it is the highest copy which is not deleted, because even if all the members share the same number of unchecked uninterpretable case features, only the highest copy (in this scenario, the copy in embedded SpecTP) has its N-feature checked (against the EPP feature of the embedded T), and therefore it requires the minimal number of features to be eliminated. At LF, again two of the copies check their Accusative case feature, and as a result they will not be deleted, in violation of the SCA. By contrast, if it is assumed that the morpheme self is inserted to check the embedded verb's case features, these LCA/SCA violations do not occur. To sum up, it can be concluded that self is not only required to avoid a case clash in sentences like (39) (John likes himself); as examples like (44a) (I expect John to like himself) show, it is also needed in order to avoid a violation of the LCA and the SCA.
4.3. Principle B

One basic assumption in Hornstein's approach is that neither reflexives nor bound pronouns are part of the lexicon. They are considered to be functional morphemes that are only used if required for the convergence of a derivation. Hence, they do not occur in the numeration but can be added in the course of the derivation if necessary. This means that sentences which differ only with respect to the question of whether the bound element is realized as pronoun or anaphor have the same underlying numeration, and Hornstein assumes that pronouns only emerge as last resort expressions if reflexivization is not available, i.e., if movement is not possible. Hence, the approach captures the near-complementary distribution of pronouns and anaphors; however, it remains unclear how cases that allow both forms are to be treated. In the standard examples in (47), the proposal leads to the following result: (47b) is ungrammatical, because the alternative derivation in (47a) is licit and therefore blocks pronominalization, which is assumed to be more costly. In (47c), on the other hand, the bound pronoun occurs in the subject position of an embedded finite clause, a position in which a DP can check its case and φ-features; if it moved on to the matrix subject position, it could therefore not check these features anymore, which means that the derivation would crash under the movement approach. Hence, instead of an anaphor, the overt residue of DP movement, we find a pronoun in (47c), which is inserted to save the derivation.

(47) a. John1 likes himself1.
     b. *John1 likes him1.
     c. John1 said that he1 would come.
     d. John1 likes him2.
However, why is (47d) not blocked by (47a), the derivation in which John receives both θ-roles and self is inserted in the object position? The answer Hornstein provides is that deictic pronouns, unlike bound pronouns, do occur in the numeration and are permitted
because they are needed "to support the stress/deixis feature" (Hornstein 2001: 176). Hence, sentences involving pronouns fall into two groups, because those involving unbound pronouns are not based on the same numeration as examples involving bound pronouns. Thus, the derivation of the former cannot be compared to the latter. However, as regards the problems brought up in section 2.4, it can be concluded that Hornstein's approach does not provide any answers; as it stands, it does not seem to leave room for optionality, the distinction between simple and complex anaphors, or crosslinguistic variation, including long-distance binding. So although it might be tempting to subsume binding under the notion of movement, it is unclear what an adequate parametrization would look like.
5. Fischer (2004b, 2006) − Derivational binding and local optimization

As shown above, what remains largely unanswered in many theories of binding is in particular the question of how optionality and the broad range of crosslinguistic variation can be accounted for. Against this background, a competition-based analysis would seem to be a good alternative; here, the underlying principles can be formulated in such a general way that they reflect the universal tendencies, while violable constraints keep the system flexible enough to account for all the language-specific differences. In this section, I will first consider some general advantages of competition-based approaches to binding, before presenting one concrete example in detail, namely the optimality-theoretic account proposed in Fischer (2004b, 2006) (which also adopts a derivational view of the syntactic component). For reasons of space, I will again restrict myself to a general outline of the theory and will only apply it to selected examples to demonstrate how the mechanism works.
5.1. The merits of competition-based approaches to binding

Many competition-based binding theories have been motivated by the observation that the standard binding principles A and B are to a certain extent redundant, because they constitute two isolated principles which refer to exactly the same domain of application and are therefore completely symmetric. As a consequence, it has often been proposed to replace the two principles by one generalized constraint which works in the following way: It imposes requirements on one of the two forms, and if these requirements can be met, the insertion of the second form is blocked. A straightforward implementation has been developed by Fanselow (1991), who proposes a generalized version of Principle A. Pronouns, by contrast, can in principle occur everywhere − however, they only emerge if anaphors are blocked by Principle A; cf. also Burzio's (1989, 1991, 1996, 1998) and Safir's (2004) accounts, which are roughly based on the same underlying idea. Further competition-based theories of binding include, for instance, Newson (1998), Menuzzi (1999), or Wilson (2001); as regards a famous predecessor of blocking theory, cf. also Chomsky's (1981) Avoid Pronoun Principle.
Note that this kind of analysis a priori also predicts a strictly complementary distribution of anaphors and pronouns, and different strategies have been proposed to solve the problem concerning optionality in a competition-based approach. Thus, it is sometimes queried whether these are real instances of true optionality, or whether they exhibit subtle structural or interpretational differences (cf., for instance, Fanselow 1991; Safir 2004); on this assumption, the two forms would result from different underlying competitions. Alternatively, a common way to handle optionality in optimality theory is to adopt the concept of ties, which means that constraints can be equally important; as a result, violations of these constraints weigh the same (cf., for instance, Fischer 2004b, 2006). Let us now turn to some general advantages that competition-based approaches to binding have. First, the (near-)complementary distribution of anaphors and pronouns does not have to be stipulated by the underlying principles; instead, it is expected, since pronouns emerge whenever the requirements for anaphors cannot be fulfilled. Second, in contrast to the standard binding theory, it is not excluded a priori that pronouns might occur in a relatively local binding relation − this is only barred if anaphoric binding is available instead. However, if the latter option is not available, pronominal binding is not blocked by locality restrictions. Hence, a competition-based analysis also straightforwardly accounts for the fact that we never find syntactic configurations in which a binding relation can be expressed neither by an anaphor nor by a pronoun: As soon as anaphoric binding is blocked, pronouns typically step in, even if this implies local pronominal binding. By contrast, in a theory based on two independent principles, it is imaginable that neither of them is fulfilled and both forms are excluded in a given context.
That this does not happen in binding theory has to be implemented in the formulation of Principles A and B in the standard theory; in a competition-based approach, this follows automatically from the architecture of the system. One well-known case in point is (local) binding by first or second person antecedents, which in many languages (like German or the Romance languages) involves pronouns instead of anaphors. One way to distinguish the binding behaviour of third vs. first/second person antecedents is to assume an underlying hierarchy according to which agreement relations between first/second person antecedents and simple anaphors are worse than combinations involving a simple anaphor and a third person antecedent; cf. Burzio (1991 and subsequent work). Another example involves the so-called Anaphor Agreement Effect, which states that "anaphors do not occur in syntactic positions construed with agreement" (Rizzi 1990a: 27; Woolford 1999: 257); cf. also Everaert (1991), who focuses on Germanic and therefore associates the unavailability of anaphors in these positions with Nominative case. It accounts for the fact that we do not find anaphors in the subject position of tensed clauses in languages with agreement. In standard approaches, this has often been captured by defining the binding domain in such a way that it never contains the antecedent of bound subjects in tensed clauses; however, this typically led to a rather inhomogeneous notion of binding domain (cf. also Chomsky's 1981 formulations in [17] and [18b]). Instead, it might be more plausible to view these cases as instances of locally bound pronouns which occur as the elsewhere case since anaphors are blocked due to the Anaphor Agreement Effect. (Note also that in more recent work, the connection between anaphoric binding and agreement has often been taken up by suggesting that the former might in fact be syntactically encoded as Agree; cf., for instance, Chomsky 2008; Kratzer 2009; Reuland 2011.)
This becomes particularly apparent in the following Icelandic
example: As (48a) shows, anaphoric binding into a subjunctive complement is in principle possible, even if the anaphor occurs in the subject position − however, only as long as the subject bears a lexical case and hence does not agree. In (48b), by contrast, where the bound element bears Nominative case and thus agrees, anaphoric binding is ruled out (cf. Everaert 1991: 280, 281; Woolford 1999: 260, 261). With a pronominal subject in the embedded clause, (48b) would be grammatical. In view of (48a), we thus have another example of a bound pronoun which only emerges because the anaphor is ruled out in this particular configuration (which is not related to locality in this case, as [48a] shows).

(48) a. Hún1 sagði að sér1 þætti vænt um mig [Icelandic]
        she1.NOM said that SE1.DAT was.SBJV fond of me
        'She said that she was fond of me.'
     b. *Jón1 segir að sig1 elski Maríu.
        John1.NOM says that SE1.NOM loves.SBJV Maria
        'John says that he loves Maria.'
     (cf. Woolford 1999: 261; Everaert 1991: 280, 281)

The issue of locally bound pronouns also raises the important question of how to define anaphors versus pronouns independent of their syntactic occurrence. In fact, locally bound pronouns have sometimes also been argued to be instances of anaphors; however, if we rely on their syntactic behaviour to define these terms, we end up with a circular line of reasoning − anaphors have to be locally bound, and whatever is locally bound is considered to be an anaphor. To resolve this dilemma, the morphological and semantic properties of the two forms are usually taken into account as a distinguishing criterion. Of course, different proposals along this line have been put forward; to give an example, I will briefly outline Burzio's (1991 and subsequent work) approach. He introduces the notion of Referential Economy, which says that "a bound NP must be maximally underspecified referentially" (Safir 2004: 71); this is based on the underlying referential hierarchy anaphor < pronoun < R-expression and refers to the question of "how much semantic content a term has" (cf. Safir 2004: 71). Note that Referential Economy replaces Burzio's earlier notion of Morphological Economy, which assumed that "a bound NP must be maximally underspecified" (Safir 2004: 69); cf. also Safir's (2004) discussion of these principles. Safir then replaces the term maximally underspecified referentially with most dependent, because the latter formulation does not hinge on the assumption that anaphors do not have any features.
5.2. Basic assumptions and observations in Fischer (2004b, 2006)

In this section we will now turn to one exemplary implementation of a competition-based approach to binding. Apart from the empirical goal to capture in particular crosslinguistic variation and optionality, the theoretical aim of Fischer (2004b, 2006) is to develop a theory of binding in a local derivational framework. In contrast to Hornstein's (2001) derivational theory outlined above, this means that we are restricted in two ways. A
derivational theory in general implies that we do not wait until a derivation has been completed but compute the structure in the course of the derivation. This means that at a given point in the derivation, there is no look-ahead. In a local derivational theory, access to earlier parts of the derivation is also restricted. Technically, this is implemented by adopting Chomsky's Phase Impenetrability Condition (following Chomsky 2000 and subsequent work). (As far as the general idea is concerned that operations are restricted to some local domain, cf. also van Riemsdijk 1978 and Koster 1987.)

(49) Phase Impenetrability Condition (PIC): The domain of a head X of a phase XP is not accessible to operations outside XP; only X and its edge are accessible to such operations.

(50) The domain of a head corresponds to its c-command domain.

(51) The edge of a head X is the residue outside X′; it comprises specifiers and elements adjoined to XP.

Once it is assumed that we only have access to a small piece of the structure at a given point, it seems reasonable to make the system as restrictive as possible by minimizing the search space. This can be achieved if it is assumed that the PIC extends to all phrases; cf., for instance, Müller (2004, 2010); Fischer (2004b, 2006). The basic problem we encounter in a local derivational approach with respect to binding concerns the observation that, on the one hand, binding is not a strictly local phenomenon, i.e., binding relations typically cover a distance which goes beyond one ph(r)ase. However, it would not be sufficient just to split up the non-local relation into several local ones (as it is done, for instance, in the case of wh-movement), because the locality degree of the binding relation overall determines the shape of the bound element (i.e., whether it is realized as SELF anaphor, as SE anaphor, or as pronoun).
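The accessibility partition that the PIC in (49) imposes on a completed phrase can be sketched as a trivial computation. This is only a schematic illustration; the dictionary encoding and labels are invented for expository purposes:

```python
# Toy illustration of the (generalized) PIC in (49): once a phrase XP
# is completed, only its head and its edge (specifiers and adjuncts,
# cf. [51]) remain accessible from outside; the head's c-command
# domain (cf. [50]) is sealed off.

def accessible_from_outside(phrase):
    """Return the sub-elements of a completed phrase that a higher
    probe may still access: the head and the edge, not the domain."""
    return [phrase["head"]] + phrase["edge"]

vP = {
    "head": "v",
    "edge": ["SpecvP"],          # e.g. an element moved to the edge
    "domain": ["V", "object"],   # c-command domain of v: inaccessible
}

print(accessible_from_outside(vP))  # -> ['v', 'SpecvP']
```

This makes concrete why a bound element must keep moving through phrase edges: only material at the edge survives into the accessible domain of the next step.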
This means that in order to evaluate a binding relation, we need to know the exact configuration that holds between the bindee and its antecedent. However, since in a local derivational approach the surface position of the bound element (= x) might no longer be accessible when the binder is merged into the derivation, we have to make sure that x is dragged along until both elements are accessible at the same time. In Fischer (2004b, 2006), this is ensured by assuming that a binding relation technically corresponds to feature checking between the antecedent (= the probe) and the bound element x (= the goal); so movement of x is triggered by x's need to check a feature with its antecedent. Since feature checking can take place as soon as x and its binder are in the same accessible domain, the typical checking configuration for binding looks as follows (where β corresponds to the feature indicating a binding relation). (In the following, I adopt Sternefeld's 2004 notation according to which features on probes are starred.)

(52) Feature checking configuration for binding: [ZP XP[*β*] Z [WP YP[β] W]], where XP = antecedent, YP = bound element

With respect to the assumption that binding involves feature checking, cf. also Reuland (2001 and subsequent work). Schäfer (2008) also assumes that anaphors start out as
variables and proposes that binding is syntactically expressed as an Agree relation between the bound element and its antecedent; he adopts the idea of upward-probing and assumes that the variable corresponds to the probe.
5.3. Technical implementation

5.3.1. On binding domains and realization matrices

The general idea in Fischer (2004b, 2006) is that the concrete realization form of the bound element x is determined in the course of the derivation, depending on the locality degree of the binding relation. Before the derivation starts, we only know that there will be a binding relation between x and its designated antecedent; this is the only information we get from the numeration. However, we are familiar with all potential realizations of x. Hence, it is assumed that x is equipped with a realization matrix, i.e., a list which contains all possible realizations of x. Thus, the maximal realization matrix looks as follows: [SELF, SE, pron]. For the sake of concreteness, consider the Dutch example in (53) as an illustration; (53b) corresponds to the underlying numeration, and (53c) to x's realization matrix.

(53) a. Max1 haat zichzelf1 /*zich1 /*hem1. [Dutch]
        Max hates himself/ SE/ him
        'Max1 hates himself1.'
        (cf. Reuland and Reinhart 1995: 242; Fischer 2004b: 225)
     b. Num = {Max[*β*], haat, x[β]}
     c. fully specified realization matrix = [SELF, SE, pron]
In the course of the derivation we then benefit from the fact that − although we do not know when the antecedent will enter the derivation − we know at each point whether the binder has already been merged into the structure or not, i.e., whether [β] has already been checked. Each time x reaches one of the domains to which binding is sensitive and x remains unbound, its realization matrix might be reduced; this means that the most anaphoric specification might be deleted and henceforth no longer available (depending on the respective domain and the language under consideration). This is to be understood in the following way: SELF anaphors are more anaphoric than SE anaphors, which are in turn more anaphoric than pronouns; this gradation can be compared to Safir's notion of dependency or Burzio's referential hierarchy; cf. also section 5.2. Moreover, the approach assumes that binding is sensitive to domains of different size; cf. also, among others, Manzini and Wexler (1987); Dalrymple (1993); Büring (2005); Fischer (this volume). In the end, x stops moving when it can establish a checking relation with its antecedent; at this stage, its concrete realization can be determined, which must match one of the remaining forms in the realization matrix. If there is only one element left in the realization matrix, the choice is clear; otherwise, the remaining form that is most anaphoric is selected. Once the form of x is known, the whole chain it heads can be aligned and x can then be spelled out in the appropriate position.
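The matrix-reduction procedure just described can be rendered as a schematic algorithm. This is a toy sketch only: the function, the set of "reducing" domains, and the derivations fed to it are hypothetical stand-ins, not Fischer's actual constraint-driven evaluation (which is optimality-theoretic, see below):

```python
# Toy sketch of the realization-matrix mechanism: x starts with the
# full matrix [SELF, SE, pron]; whenever x crosses a binding-sensitive
# domain boundary while its [beta] feature is still unchecked, the
# most anaphoric remaining entry may be deleted (whether it actually
# is depends on the domain and the language). When x is finally
# bound, the most anaphoric remaining form is selected.

HIERARCHY = ["SELF", "SE", "pron"]  # most -> least anaphoric

def run_derivation(domains_crossed_unbound, deleting_domains):
    matrix = list(HIERARCHY)
    for domain in domains_crossed_unbound:
        if domain in deleting_domains and len(matrix) > 1:
            matrix.pop(0)            # delete most anaphoric entry
    return matrix[0]                 # most anaphoric surviving form

# x bound within its theta-domain: no reduction, SELF is selected
# (compare the Dutch pattern in [53]):
print(run_derivation([], {"ThD"}))                   # -> SELF

# x unbound in its theta- and case domain, both reducing: pron wins.
print(run_derivation(["ThD", "CD"], {"ThD", "CD"}))  # -> pron
```

The point of the sketch is the interplay of locality and form: the longer x stays unbound while crossing reducing domains, the less anaphoric its eventual realization.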
As to the domains that play a role, it is assumed that binding is sensitive to the six domains defined in (54)−(59). (Note that the definitions differ slightly from the formulations used in Fischer, this volume, since they have been adapted to the derivational framework adopted here.)

(54) XP-domain (XP): XP is a phrase containing x.

(55) θ-domain (ThD): XP is the θ-domain of x if it contains x and the head that θ-marks x plus its external argument (if there is one).

(56) Case domain (CD): XP is the case domain of x if it contains x and the head that bears the case features against which x checks case.

(57) Subject domain (SD): XP is the subject domain of x if it contains x and either (i) a subject distinct from x which does not contain x, or (ii) the T with which x checks its (Nominative) case features.

(58) Finite domain (FD): XP is the finite domain of x if it contains x, a finite verb, and a subject.

(59) Indicative domain (ID): XP is the indicative domain of x if it contains x, an indicative verb, and a subject.

What will not be discussed here in detail is the role of the larger domains, i.e. the subject, the finite, and the indicative domain. These distinctions are used in order to account for the (syntactic) behaviour of long-distance anaphora, which are sensitive to the type of complement clause in which they occur (cf. Fischer, this volume, as regards an overview of the behaviour of LDA). Moreover, I will neglect R-expressions and Principle C here, but note that they can be integrated into the theory straightforwardly by additionally inserting into the matrix a copy of the R-expression serving as antecedent. The same is true for inherently reflexive predicates, in which case the realization matrix might get completely cleared (for instance in the English example Max behaves like a gentleman, which does not contain an anaphor/pronoun at all).
5.3.2. On local optimization and universal subhierarchies

What has been left unexplained so far is how the deletion of the matrix entries is governed. This is where competition comes into play. Note that the mechanism outlined so far does not involve any competition, and the idea of deleting specifications from the realization matrix does not really hinge on an optimality-theoretic implementation either. However, optimality theory provides smart strategies to capture crosslinguistic variation and optionality by means of constraint reranking and tied constraints. The model in Fischer (2004b, 2006) first of all assumes that optimization is local and takes place cyclically. This means that there is not just one optimization process after
the completion of the whole syntactic derivation (= global optimization); instead, optimization takes place repeatedly after the completion of each phrase. The winner of a competition then serves as input for the next optimization procedure; the initial input corresponds to the numeration. (As to local optimization applying to phrases, cf. also Müller 2000; Heck and Müller 2000, 2003; Heck 2004.) The competing candidates differ from each other with respect to the number of specifications that have been deleted from the realization matrix of x. Since we start with the maximal realization matrix, the first competition includes three candidates − the first candidate (= O1) contains a realization matrix with full specifications ([SELF, SE, pron]); the second one (= O2) contains a realization matrix from which the most anaphoric entry has been deleted ([SE, pron]); and in the realization matrix of the third candidate (= O3), both the SELF and the SE specifications are deleted ([pron]). As regards the constraints that apply, two different types can be distinguished. On the one hand, there is a group of constraints which refer to the different domains relevant for binding; they generally favour candidates with less anaphoric specifications and therefore facilitate the deletion of features from the matrix. They require that x be minimally anaphoric if binding has not yet taken place in XD (where XD ∈ {XP, ThD, CD, SD, FD, ID}); cf. (60). The effect of these constraints is similar to the standard version of Principle A: If x is unbound in a relatively local domain, we expect x not to be anaphoric.

(60) PRINCIPLE ᏭXD (Pr.ᏭXD): If x[β] remains unchecked in its XD, x must be minimally anaphoric.

If the derivation reaches one of the relevant domains and no binding relation is established, these constraints apply non-vacuously and are violated twice by candidate O1 and once by O2.
As to the ordering of these constraints, it is assumed that they are ordered in a fixed universal subhierarchy in which constraints referring to bigger domains are higher-ranked than those referring to smaller domains.

(61) Universal subhierarchy 1: PR.ᏭID ≫ PR.ᏭFD ≫ PR.ᏭSD ≫ PR.ᏭCD ≫ PR.ᏭThD ≫ PR.ᏭXP

The second type of constraint must function as a counterbalance to the PRINCIPLE Ꮽ-constraints insofar as it must favour the more anaphoric specifications. This is achieved by the three FAITH-constraints in (62) in connection with their universal ordering indicated in (63).

(62) a. FAITHSELF (FSELF): The realization matrix of x must contain [SELF].
     b. FAITHSE (FSE): The realization matrix of x must contain [SE].
     c. FAITHpron (Fpron): The realization matrix of x must contain [pron].

(63) Universal subhierarchy 2: FAITHpron ≫ FAITHSE ≫ FAITHSELF

The choice of x’s realization form is guided by one further principle (which is not violable); cf. (64).
(64) Maximally Anaphoric Binding (MAB): Checked x[β] must be realized maximally anaphorically.

Once x is bound, the most anaphoric element of the optimal specification matrix is thus selected. In fact, (64) can be considered to be a PF instruction. Note that this does not mean that PF needs to have access to semantic information. In fact, the choice at PF is made on the basis of morphological considerations: What we can see at PF are the feature specifications contained in the matrix when it is finally mapped to the interfaces, and the preferred realization is the SELF anaphor (if available), the second best realization is the SE anaphor (if available), and the third best realization is the pronoun (cf. also Mycock, this volume, as regards the relation between syntax and its interfaces). The notion of maximal anaphoricity thus refers to the morphological forms, which in turn of course reflect the degree of anaphoricity in a semantic sense.
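The competition mechanism defined by (60)−(64) is essentially algorithmic, so it can be made concrete in a short simulation. The following Python sketch is our own illustration, not part of Fischer’s proposal: candidates are realization matrices, constraints are functions returning violation counts, and a tie between two constraints is modelled, as is standard in this strand of optimality theory, by letting a candidate win if it is optimal under at least one linearization of the tied constraints.

```python
from itertools import permutations

# Candidate realization matrices, from most to least anaphoric
CANDIDATES = [("SELF", "SE", "pron"), ("SE", "pron"), ("pron",)]

def faith(spec):
    # FAITH_spec (62): one violation if spec has been deleted from the matrix
    return lambda cand, bound, domains: 0 if spec in cand else 1

def principle_a(domain):
    # PRINCIPLE A_XD (60): if x is still unbound once XD has been reached,
    # every specification above the minimally anaphoric one is a violation
    return lambda cand, bound, domains: (len(cand) - 1) if (not bound and domain in domains) else 0

def linearizations(strata):
    # expand each stratum of tied constraints into every possible total order
    if not strata:
        yield []
        return
    for perm in permutations(strata[0]):
        for rest in linearizations(strata[1:]):
            yield list(perm) + rest

def winners(strata, bound, domains):
    # a candidate is optimal if it survives under at least one linearization
    optimal = set()
    for order in linearizations(strata):
        best = list(CANDIDATES)
        for con in order:
            m = min(con(c, bound, domains) for c in best)
            best = [c for c in best if con(c, bound, domains) == m]
        optimal.update(best)
    return optimal

# Fpron >> FSE >> FSELF >> Pr.A_XP; only the XP-domain reached, x unbound:
t1 = [[faith("pron")], [faith("SE")], [faith("SELF")], [principle_a("XP")]]
print(winners(t1, bound=False, domains={"XP"}))   # only the full matrix survives

# FSELF tied with Pr.A_ThD; XP and ThD reached -> two optimal candidates
t2 = [[faith("pron")], [faith("SE")],
      [faith("SELF"), principle_a("ThD")], [principle_a("XP")]]
print(winners(t2, bound=False, domains={"XP", "ThD"}))
```

The two calls reproduce the kind of competitions discussed for Dutch in section 5.4: with PR.ᏭXP ranked below all FAITH constraints, the fully specified matrix wins, whereas tying FAITHSELF with PR.ᏭThD yields two winners and hence optionality.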
5.4. Derivational binding in Dutch

In order to illustrate how the theory works in practice, consider the Dutch data in (65). They represent examples with the following binding behaviour: (a) binding inside the minimal θ-domain; (b) binding inside the minimal case domain; (c) binding inside the minimal subject domain; (d) binding inside the minimal finite/indicative domain.

(65) a. Max1 haat zichzelf1/*zich1/*hem1. (= [53a]) [Dutch]
        Max hates himself/ SE/ him
        ‘Max1 hates himself1.’
     b. Max1 hoorde zichzelf1/zich1/*hem1 zingen.
        Max heard himself/ SE/ him sing
        ‘Max1 heard himself1 sing.’
     c. Max1 keek achter *zichzelf1/zich1/hem1.
        Max looked after himself/ SE/ him
        ‘Max1 glanced behind him1/himself1.’
     d. Max1 weet dat Mary *zichzelf1/*zich1/hem1 leuk vindt.
        Max knows that Mary himself/ SE/ him nice finds
        ‘Max1 knows that Mary likes him1.’
     (cf. Fischer 2004b: 225)

As far as example (65a) is concerned (repeated in [66]), it only allows the complex anaphor as bound element. This is correctly predicted if PRINCIPLE ᏭXP is ranked below FAITHSELF. On this assumption, O1 is the sole winner of the first competition (cf. T1), and when the binder is merged into the derivation in the next phrase (cf. [67]), [SELF, SE, pron] is predicted to be the optimal realization matrix (cf. T1.1). Hence, MAB finally selects the SELF anaphor as optimal realization. (Note that those parts of the derivation that have become inaccessible are crossed out.)
39. Theories of Binding
(66) Max1 haat zichzelf1/*zich1/*hem1. [Dutch]
     a. [VP x[β] tx haat]

T1: VP optimization (XP reached − x[β] unchecked)

  Candidates               | Fpron | FSE | FSELF | PR.ᏭXP
  ☞ O1: [SELF, SE, pron]   |       |     |       | **
    O2: [SE, pron]         |       |     | *!    | *
    O3: [pron]             |       | *!  | *     |

(67) b. [vP Max[*β*] [VP x[β] [V′ tx thaat]] haat]

T1.1: vP optimization (x[β] checked: PRINCIPLE ᏭXD applies vacuously)

  Input: O1/T1              | Fpron | FSE | FSELF
  ☞ O11: [SELF, SE, pron]   |       |     |
    O12: [SE, pron]         |       |     | *!
    O13: [pron]             |       | *!  | *
In example (65b) (repeated in [68]), both anaphors can function as bound elements, i.e., we have an example of optionality. In order to derive this, PRINCIPLE ᏭThD must be tied with FAITHSELF: As a result, both O1 and O2 win in the first competition (cf. T2), because when optimization takes place not only an XP but also the θ-domain of x has been reached.

(68) Max1 hoorde zichzelf1/zich1/*hem1 zingen. [Dutch]
     a. [vP x[β] zingen]

T2: vP optimization (XP/ThD reached − x[β] unchecked; FSELF and PR.ᏭThD are tied)

  Candidates               | Fpron | FSE | FSELF | PR.ᏭThD | PR.ᏭXP
  ☞ O1: [SELF, SE, pron]   |       |     |       | **(!)   | **
  ☞ O2: [SE, pron]         |       |     | *(!)  | *       | *
    O3: [pron]             |       | *!  | *     |         |
When the next phrase is completed, no new domain relevant for binding has been reached, but x’s θ-role assigner (zingen) is still accessible; hence, both PRINCIPLE ᏭXP and PRINCIPLE ᏭThD apply again non-vacuously. In the competition based on the input [SELF, SE, pron], the first two candidates are therefore again predicted to be optimal (cf. T2.1), and in the second competition, the matrix [SE, pron] wins (cf. T2.2).
(69) b. [VP x[β] [vP tx zingen] hoorde]

T2.1: VP optimization (XP/ThD reached − x[β] unchecked; FSELF and PR.ᏭThD are tied)

  Input: O1/T2              | Fpron | FSE | FSELF | PR.ᏭThD | PR.ᏭXP
  ☞ O11: [SELF, SE, pron]   |       |     |       | **(!)   | **
  ☞ O12: [SE, pron]         |       |     | *(!)  | *       | *
    O13: [pron]             |       | *!  | *     |         |

T2.2: VP optimization (XP/ThD reached − x[β] unchecked; FSELF and PR.ᏭThD are tied)

  Input: O2/T2        | Fpron | FSE | FSELF | PR.ᏭThD | PR.ᏭXP
  ☞ O21: [SE, pron]   |       |     | *     | *       | *
    O22: [pron]       |       | *!  | *     |         |

Now the binder enters the derivation, and so the FAITH-constraints alone determine the optimizations at the vP level. In T2.1.1, the maximally specified matrix [SELF, SE, pron] wins, and according to T2.1.2/2.2.1, [SE, pron] is optimal. Thus, MAB finally correctly predicts that either the SELF or the SE anaphor is the optimal realization of x.

(70) c. [vP Max[*β*] [VP x[β] [vP tx zingen] thoorde] hoorde]

T2.1.1: vP optimization (x[β] checked: PRINCIPLE ᏭXD applies vacuously)

  Input: O11/T2.1            | Fpron | FSE | FSELF
  ☞ O111: [SELF, SE, pron]   |       |     |
    O112: [SE, pron]         |       |     | *!
    O113: [pron]             |       | *!  | *

T2.1.2/2.2.1: vP optimization (x[β] checked: PRINCIPLE ᏭXD applies vacuously)

  Input: O12/T2.1 or O21/T2.2   | Fpron | FSE | FSELF
  ☞ O121/O211: [SE, pron]       |       |     | *
    O122/O212: [pron]           |       | *!  | *
In (71) (repeated from [65c]), optionality arises between the pronominal and the simple anaphoric form.
(71) Max1 keek achter zich1/*zichzelf1/hem1. [Dutch]
     a. [PP x[β] achter tx]

Note that in (71) x has not yet moved to its final checking position. To trigger intermediate movement steps which do not result in feature checking, Chomsky (2000, 2001, 2008) proposes the insertion of so-called edge features (cf. also the Edge Feature Condition in Müller 2010). Alternatively, Heck and Müller (2003) offer a solution on the basis of a constraint called Phase Balance, which is adapted to all phrases in (72); cf. also Fischer (2004b, 2006), following Müller (2004: 297, 298).

(72) Phrase Balance (PB): Every XP has to be balanced: For every feature [*F*] in the numeration there must be a potentially available feature [F] at the XP level.

(73) Potential Availability: A feature [F] is potentially available if (i) or (ii) holds:
     (i) [F] is on X or edgeX of the present root of the derivation.
     (ii) [F] is in the workspace of the derivation.

(74) The workspace of a derivation D comprises the numeration N and material in the trees that have been created earlier (with material from N) and have not yet been used in D.

As to the optionality that arises in (71), it can be captured if PRINCIPLE ᏭCD and FAITHSE are tied: When the prepositional phrase is completed, the domains XP, ThD, and CD are reached, which means that in addition to PRINCIPLE ᏭXP and PRINCIPLE ᏭThD, PRINCIPLE ᏭCD is now involved in the competition. On the assumption that the latter is tied with FAITHSE, optionality between O2 and O3 is predicted (cf. T3).

T3: PP optimization (XP/ThD/CD reached − x[β] unchecked; FSE and PR.ᏭCD are tied, as are FSELF and PR.ᏭThD)

  Candidates               | Fpron | FSE  | PR.ᏭCD | FSELF | PR.ᏭThD | PR.ᏭXP
    O1: [SELF, SE, pron]   |       |      | **!    |       | **      | **
  ☞ O2: [SE, pron]         |       |      | *(!)   | *     | *       | *
  ☞ O3: [pron]             |       | *(!) |        | *     |         |

As a result, there are two optimization procedures when the next phrase boundary, VP, is reached.

(75) b. [VP x[β] [PP t′x achter tx] keek]

The competition based on the matrix [SE, pron] yields again two optimal outputs (cf. T3.1), whereas in the competition based on the input [pron] a further reduction is not possible and this matrix remains optimal (cf. T3.2).
T3.1: VP optimization (XP/ThD/CD reached − x[β] unchecked; FSE and PR.ᏭCD are tied, as are FSELF and PR.ᏭThD)

  Input: O2/T3        | Fpron | FSE  | PR.ᏭCD | FSELF | PR.ᏭThD | PR.ᏭXP
  ☞ O21: [SE, pron]   |       |      | *(!)   | *     | *       | *
  ☞ O22: [pron]       |       | *(!) |        | *     |         |

T3.2: VP optimization (XP/ThD/CD reached − x[β] unchecked)

  Input: O3/T3    | Fpron | FSE | PR.ᏭCD | FSELF | PR.ᏭThD | PR.ᏭXP
  ☞ O31: [pron]   |       | *   |        | *     |         |

In the next phrase, the binder is merged into the derivation, hence the FAITH-constraints predict that [SE, pron] is optimal in T3.1.1, and [pron] wins in T3.1.2/3.2.1.

(76) c. [vP Max[*β*] [VP x[β] [PP tx′ achter tx] tkeek] keek]

According to MAB, the optimal choice is therefore the SE anaphor in the former derivation and the pronoun in the latter.

T3.1.1: vP optimization (x[β] checked: PRINCIPLE ᏭXD applies vacuously)

  Input: O21/T3.1      | Fpron | FSE | FSELF
  ☞ O211: [SE, pron]   |       |     | *
    O212: [pron]       |       | *!  | *

T3.1.2/3.2.1: vP optimization (x[β] checked: PRINCIPLE ᏭXD applies vacuously)

  Input: O22/T3.1 or O31/T3.2   | Fpron | FSE | FSELF
  ☞ O221/O311: [pron]           |       | *   | *
In sentences in which binding takes place outside the case domain (cf. [65d], repeated in [77]), x must be realized as a pronoun, and this is captured by ranking PRINCIPLE ᏭSD (and hence also PRINCIPLE ᏭFD and PRINCIPLE ᏭID) above FAITHSE (cf. T4.1). (I treat the verbal predicate leuk vindt like a simple verb and ignore its inherent structure.)

(77) Max1 weet dat Mary hem1/*zich1/*zichzelf1 leuk vindt. [Dutch]
     a. [VP x[β] tx leuk vindt]

When the first optimization process takes place (cf. T4), only PRINCIPLE ᏭXP and the FAITH-constraints apply non-vacuously, which means that O1 serves as input for the next competition.
T4: VP optimization (XP reached − x[β] unchecked)

  Candidates               | Fpron | FSE | FSELF | PR.ᏭXP
  ☞ O1: [SELF, SE, pron]   |       |     |       | **
    O2: [SE, pron]         |       |     | *!    | *
    O3: [pron]             |       | *!  | *     |
When vP is completed, we reach at once all domains relevant for binding, which means that all PRINCIPLE Ꮽ-constraints are involved in the next competition. According to the ranking assumed above, [pron] is therefore predicted to be the optimal realization matrix (cf. T4.1).

(78) b. [vP x[β] Mary [VP tx′ tx tleuk vindt] leuk vindt]

T4.1: vP optimization (XP/ThD/CD/SD/FD/ID reached − x[β] unchecked; FSE and PR.ᏭCD are tied, as are FSELF and PR.ᏭThD)

  Input: O1/T4              | Fpron | PR.ᏭID/FD/SD | FSE | PR.ᏭCD | FSELF | PR.ᏭThD | PR.ᏭXP
    O11: [SELF, SE, pron]   |       | *!*          |     | **     |       | **      | **
    O12: [SE, pron]         |       | *!           |     | *      | *     | *       | *
  ☞ O13: [pron]             |       |              | *   |        | *     |         |
Since [pron] now serves as input for the next optimization procedure, it remains the only candidate, because the matrix cannot be further reduced. Hence, [pron] remains optimal in the following optimizations, and when x[β] is checked, MAB correctly predicts that x must be realized as a pronoun.
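The whole cyclic procedure for the Dutch data can likewise be condensed into a small simulation. The sketch below is again our own illustration, not Fischer’s implementation: each derivation is a sequence of local optimizations, the set of optimal matrices of one cycle is passed on as input to the next, ties are resolved by considering both linearizations, and MAB picks the most anaphoric surviving form once x is checked. The ranking encodes the ties assumed for Dutch in the text.

```python
from itertools import permutations

SPECS = ("SELF", "SE", "pron")  # most to least anaphoric

def suffixes(matrix):
    # reduction only ever deletes the most anaphoric remaining entry
    return [matrix[i:] for i in range(len(matrix))]

def violations(cand, con, bound, domains):
    kind, arg = con
    if kind == "FAITH":
        return 0 if arg in cand else 1
    # PRINCIPLE A_XD: if x is unbound in a reached domain XD,
    # every entry above the minimal one counts as a violation
    return (len(cand) - 1) if (not bound and arg in domains) else 0

def orders(strata):
    # every total order obtained by linearizing the tied strata
    if not strata:
        yield []
        return
    for perm in permutations(strata[0]):
        for rest in orders(strata[1:]):
            yield list(perm) + rest

def optimize(matrix, strata, bound, domains):
    winners = set()
    for order in orders(strata):
        best = suffixes(matrix)
        for con in order:
            m = min(violations(c, con, bound, domains) for c in best)
            best = [c for c in best if violations(c, con, bound, domains) == m]
        winners.update(best)
    return winners

# Ranking assumed for Dutch: Fpron >> Pr.A_ID >> Pr.A_FD >> Pr.A_SD
# >> {FSE ~ Pr.A_CD} >> {FSELF ~ Pr.A_ThD} >> Pr.A_XP
DUTCH = [[("FAITH", "pron")], [("A", "ID")], [("A", "FD")], [("A", "SD")],
         [("FAITH", "SE"), ("A", "CD")], [("FAITH", "SELF"), ("A", "ThD")],
         [("A", "XP")]]

def derive(phases):
    # phases: one (domains reached, bound?) pair per local optimization
    inputs = {SPECS}
    for domains, bound in phases:
        inputs = set().union(*(optimize(m, DUTCH, bound, domains) for m in inputs))
    # MAB: once checked, realize the most anaphoric form still in the matrix
    return {m[0] for m in inputs}

print(derive([({"XP"}, False), (set(), True)]))                  # (65a)
print(derive([({"XP", "ThD"}, False)] * 2 + [(set(), True)]))    # (65b)
print(derive([({"XP", "ThD", "CD"}, False)] * 2 + [(set(), True)]))  # (65c)
print(derive([({"XP"}, False),
              ({"XP", "ThD", "CD", "SD", "FD", "ID"}, False),
              (set(), True)]))                                   # (65d)
```

The four printed sets reproduce exactly the realizations that are grammatical in (65a−d): only the SELF anaphor; SELF or SE anaphor; SE anaphor or pronoun; only the pronoun.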
5.5. Predictions and loose ends

As the analyses of example (65b) (Max1 hoorde zichzelf1/zich1/*hem1 zingen) and (65c) (Max1 keek achter *zichzelf1/zich1/hem1) have shown, the theory outlined above can easily account for optionality by resorting to the optimality-theoretic concept of constraint ties. Similarly, crosslinguistic variation does not pose a problem either, since optimality theory generally handles this by assuming a reranked constraint order for other languages. Against the background of the two universal subhierarchies proposed in (61) and (63), this means that language variation amounts to different interactions between these two subhierarchies. Despite its flexibility, the system is therefore relatively restrictive. As far as the general predictions the theory makes are concerned, it can be concluded that it captures the following generalizations (cf. also Fischer, this volume): If we deal with local binding, only a few, low-ranked PRINCIPLE Ꮽ-constraints can apply non-vacuously before checking takes place; since only these constraints favour a reduction of the realization matrix, it is very likely that the candidate with the full specification [SELF,
SE, pron] is optimal and the SELF anaphor selected as optimal realization. In the case of non-local binding, more PRINCIPLE Ꮽ-constraints apply non-vacuously; hence, it is more likely that the realization matrix of x is gradually reduced in the course of the derivation and a less anaphoric form is selected as optimal realization in the end. Moreover, the following implications are predicted, which also seem to be true: If x can be realized as a SELF or SE anaphor when binding takes place in domain XD, these realizations are also licit if binding is more local; and if x is realized as a pronoun, pronominal binding is also possible if binding occurs in a bigger domain. What is not explicitly discussed in Fischer (2004b) are non-syntactic factors that have an impact on the occurrence of reflexives (cf. also Fischer, this volume). These discourse or pragmatic restrictions concern in particular locally free reflexives, and data of these types are only marginally addressed in Fischer (2004b). However, as pointed out in the chapter on pronominal anaphora (cf. Fischer, this volume), there are also quite a number of examples which indicate that syntactic and discourse-related factors should not be considered completely independent of each other because they seem to interact; an extension of the theory outlined above should therefore be perfectly possible, given that optimality theory easily allows for the interaction of different types of constraints. Hence, additional discourse constraints might be added.
6. Kiss (2012) − Reflexivity and dependency

Since the theories outlined in the previous sections set out to develop overall approaches to reflexivity, they initially tend to neglect less salient occurrences of reflexives and start with the most basic data involving reflexives. One well-known example of the former type is picture NP reflexives. In this section, I will therefore turn to an explicit recent analysis of these data which is developed in an HPSG framework. From a theoretical point of view, this analysis is based on a representational setting which involves local constraints (in the sense of Gazdar et al. 1985; cf. Kiss 2012: fn. 13).
6.1. Picture NP reflexives in English and German

One of the central observations in Kiss (2012) concerns a notable difference between English and German: Although German does not have exempt reflexives (cf. also Kiss 2001), it has picture NP reflexives; English, by contrast, allows both exempt reflexives and picture NP reflexives. The term exempt anaphora has been coined by Pollard and Sag (1992, 1994) for reflexives which seem to be exempt from binding principle A insofar as they need not be locally bound. (Note, however, that Kiss points out that they should rather be termed exempt reflexives; cf. below.) The English scenario has led Pollard and Sag (1992, 1994) and Reinhart and Reuland (1993) to propose an analysis of picture NP reflexives in terms of exemptness (cf. Pollard and Sag 1992: 266); however, in view of the German data, this proposal does not seem to be tenable. The question of whether a language has exempt reflexives or not can be clarified by the following tests. Typically, exempt reflexives need not be locally bound; instead, the antecedent (i) might occur in an embedding clause (cf. [79a] vs. [80a]), (ii)
need not c-command the reflexive (cf. [79b] vs. [80b]), (iii) can occur intersententially (cf. [79c] vs. [80c]), or (iv) can be a split antecedent (cf. [79d] vs. [80d]). As (80) shows, German fails all these tests.

(79) a. Peter1 believed that pictures of himself1/him1 were on sale.
     b. Peter1’s campaign required that pictures of himself1 be placed all over town.
     c. Peter1 was upset. A picture of himself1 had been mutilated.
     d. Peter1 told Mary2 that pictures of themselves1+2 were on sale.
     (cf. Kiss 2012: 156, 158)
(80) a. Peter1 glaubte, dass Bilder von *sich1/ihm1 zum Verkauf stünden. [German]
        Peter believed that pictures of SE/ him to.the sale stand.SBJV
        ‘Peter1 believed that pictures of him1 were on sale.’
     b. Peters1 Kampagne machte es erforderlich, dass Bilder von *sich1/ihm1 überall in der Stadt platziert wurden.
        Peter’s campaign made it necessary that pictures of SE/ him everywhere in the town placed become
        ‘Peter1’s campaign required that pictures of himself1 be placed all over town.’
     c. Peter1 war sauer. Ein Bild von *sich1/ihm1 war beschädigt worden.
        Peter was upset a picture of SE/ him was mutilated become
        ‘Peter1 was upset. A picture of himself1 had been mutilated.’
     d. Peter1 erzählte Maria2, dass Bilder von *sich1+2/ihnen1+2 zum Verkauf stünden.
        Peter told Mary that pictures of SE/ them to.the sale stand.SBJV
        ‘Peter1 told Mary2 that pictures of themselves1+2 were on sale.’
     (cf. Kiss 2012: 158)

On the other hand, picture NP reflexives do occur in German, as the following examples show. However, it has to be mentioned that pronouns are illicit in these examples and that the antecedent must occur within the same clause.

(81) a. Warum hat Claude Cahun1 die Bilder von sich1/*ihr1 zurückgehalten? [German]
        why has Claude Cahun the pictures of SE/ her withheld
        ‘Why has Claude Cahun1 withheld the pictures of herself1?’
     b. Verständlich, dass er1 keine konfusen Berichte über sich1/*ihn1 lesen mag.
        it.stands.to.reason that he no confused reports about SE/ him read like
        ‘It stands to reason that he1 does not like to read confused articles about himself1.’
     (cf. Kiss 2012: 156)
But although picture NP reflexives can be found in the previous examples (and are in fact the only grammatical option here), Kiss points out that the situation is slightly different in the case of object experiencer psych verbs with picture NPs functioning as subject. In the English example in (82), a reflexive can occur inside the subject picture NP.

(82) These pictures of himself1 frighten John1.
     (cf. Kiss 2012: 159)

In German, by contrast, picture NP reflexives are ruled out in this context; cf. (83). Following the exemptness approach by Pollard and Sag (1992, 1994) and Reinhart and Reuland (1993), this is in fact what we would expect for languages without exempt reflexives.

(83) a. *Die Bilder von sich1 gefielen den Kindern1. [German]
        the pictures of SE pleased the children
        ‘The children1 liked the pictures of themselves1.’
     b. *Ich glaube, dass die Bilder von sich1 den Kindern1 gefielen.
        I believe that the pictures of SE the children pleased
        ‘I think the children1 liked the pictures of themselves1.’
     (cf. Kiss 2012: 161)

Moreover, Kiss argues that topicalizing (cf. [84a]) or scrambling (cf. [84b]) the object experiencer den Kindern in front of the subject die Bilder von sich does not make a difference. In fact, I do get a contrast between (83) and (84) and would rate the examples below as fully acceptable (these judgments are confirmed by an anonymous reviewer). The theoretical consequences of this variation depend to a large extent on the concrete analysis of psych verb constructions. (Note at this point that Kiss argues against the unaccusativity analysis by Belletti and Rizzi 1988 by citing counterarguments by Pesetsky 1995 and Pollard and Sag 1992.) In any case, my impression is that the judgments in (84) might very well be compatible with the approach outlined here; cf. section 6.2.

(84) a. Den Kindern1 gefielen die Bilder von sich1. [German]
        [the children].DAT pleased [the pictures of SE].NOM
        ‘The children1 liked the pictures of themselves1.’
     b. Ich glaube, dass den Kindern1 die Bilder von sich1 gefielen.
        I believe that [the children].DAT [the pictures of SE].NOM pleased
        ‘I think the children1 liked the pictures of themselves1.’
     (cf. Kiss 2012: 161, with different judgements)
6.2. The analysis

The analysis Kiss (2012) proposes presumes first of all a strict division between the two notions of reflexives vs. anaphors. While the former term is used in order to denote the
lexical form as such, the term anaphor refers to reflexives that occur in an anaphoric dependency; hence, the notion of anaphor depends on the syntactic context. Of course, other languages might choose different strategies to express anaphoricity (like clitics in Romance, particular verbal templates in Semitic languages, or particular suffixes in Russian; cf. Reinhart and Siloni 2005: 390) − but in the languages discussed here, it is expressed by reflexives (for a more detailed overview and discussion, cf., for instance, Reinhart and Siloni 2005 or Everaert 2012). On the basis of this distinction, Kiss introduces the two features R and D. The feature R is inherently associated with reflexives; the feature D, by contrast, is dependent on the syntactic context because it indicates a dependency. The two features are furthermore associated with a value (n) which corresponds to the index of the respective reflexive or phrase. Crucially, R does not project; D, by contrast, projects until the dependency can be resolved by identifying D’s value with the index of another, minimally c-commanding phrase (which means that the latter turns out to be the binder in the end). Depending on whether or not a dependency is introduced when a reflexive occurs, it is predicted that the reflexive functions as an anaphor or not. The latter scenario corresponds to those cases in which the reflexive is exempt from syntactic restrictions; in the former case, the dependency must be resolved, which roughly means that the anaphor must be locally bound. Whether a dependency is introduced or not depends on the one hand on language-specific constraints (cf. [86], [87]), on the other hand on the question of whether a corresponding syntactic trigger is around. As to the latter point, Kiss assumes that such a trigger “will be any predicate that can have an articulated argument structure” (Kiss 2012: 161).
In order to understand the concrete algorithm Kiss proposes, consider first the features defined in (85). (Note that the definition in [85] implies that verbs typically bear the feature [+ARG-S]; elements without an external argument are generally [−ARG-S]; cf. the similarity to Chomsky’s 1986b notion of the complete functional complex.)

(85) a. [±ARG-S]: Predicates that contain articulated argument structure will be marked as [+ARG-S]; otherwise they will be marked as [−ARG-S].
     b. [COMPS]: This feature signals whether the valency of a predicate has already been fully discharged. Hence, a predicate whose argument structure is saturated bears the specification [COMPS ⟨ ⟩]; if the external argument is still missing, it bears the feature [COMPS ⟨NP⟩], etc.

As far as the technical implementation of anaphoric dependencies is concerned, their introduction and resolution are governed by the rules indicated in (86)−(89). While (86) is the rule that introduces anaphoric dependencies in English, (87) applies in German. Hence, we get the following result: If an appropriate trigger is around (namely the feature [+ARG-S]), the two rules yield the same result − D(n) is introduced and has to be resolved in the course of the derivation, following (89). Note that these assumptions bear a resemblance to the analysis by Fischer (2004b, 2006): Here, the D-feature is passed up the tree until the binder resolves the dependency, which has to take place in a restricted domain and is guided by local constraints. Similarly, in Fischer (2004b, 2006), the bound element moves from phase edge to phase edge until the binder enters the
derivation in order to establish a checking relation; anaphoric binding is also restricted by locality requirements, and the algorithm also applies strictly locally.
If no appropriate trigger is present, (86) and (87) make different predictions. In the former scenario (which corresponds to English), no anaphoric dependency is introduced; hence the reflexive finally turns out to be exempt from syntactic restrictions. In the latter scenario (which corresponds to German), the inactive dependency D̄(n) is introduced. In contrast to R-features, D̄ projects (just like D-features). When a D̄-feature reaches a position in which it can be activated by a [+ARG-S]-feature, it turns into an active dependency (= D(n); cf. [88]), which in turn must be resolved via Local Resolution (cf. [89]). Hence, the absence of exempt reflexives in German is accounted for − sooner or later, R will inevitably introduce an anaphoric dependency, which has to be resolved, even if it is not yet active at the beginning of the derivation. (Inactive dependencies are written D̄(n) here, active ones D(n).)

(86) Active Dependency: Given a phrase Y with daughters X and ZP, where ZP bears the value R(n). ZP bears the value D(n) if and only if X is [+ARG-S]. (cf. Kiss 2012: 171)

(87) Dependency: Given a phrase Y with daughters X and ZP, where ZP bears the value R(n). ZP bears the value D(n) if X is [+ARG-S], and the value D̄(n) if X is [−ARG-S]. (cf. Kiss 2012: 171)

(88) Activation: Given a phrase Y with daughters X and ZP, where ZP bears the value D̄(n). ZP bears the value D(n) if X is [+ARG-S]. (cf. Kiss 2012: 172)

(89) Local Resolution: If a daughter of a phrase Y bears D(n) and Y is specified as [COMPS ⟨ ⟩], then the other daughter of the phrase must bear index n; if Y is specified with a non-empty value for COMPS, then the index of the other daughter can bear index n. (cf. Kiss 2012: 171)

To see how the analysis works in practice, consider the following example derivations.

(90) a. Peter1 likes himself1.
     b. Peter1 mag sich1. [German]
        Peter likes SE
        ‘Peter1 likes himself1.’
     (cf. Kiss 2012: 173)

In (90), English and German pattern alike. Due to the occurrence of the reflexive forms himself1/sich1, the analysis of both sentences involves the feature R(n). The crucial point now is that the verb (= the sister node of the reflexive) bears the feature [+ARG-S], which means that the application of (86) and (87) yields the same result: In both cases, D(n) is introduced, which is then projected to the VP, where it is resolved (following [89]) by identifying its value n with the index of its sister (= the subject NP); cf. (91)
(following Kiss 2012: 173). In other words, we deal with an anaphoric dependency, and the subject NP functions as syntactic binder.

(91)  S[COMPS ⟨ ⟩]
      ├─ NP1: Peter
      └─ VP[COMPS ⟨NP⟩, +ARG-S, D(n = 1)]
         ├─ V[COMPS ⟨NP, NP⟩, +ARG-S]: likes
         └─ NP[R(n), D(n)]: himself
Let us now turn to the picture NP reflexives. Recall that the German example in (92b) (repeated from [80a]) is ungrammatical, in contrast to its English counterpart, whereas German allows picture NP reflexives in examples like (93b) (as does English).

(92) a. Peter1 believed that pictures of himself1/him1 were on sale.
     b. Peter1 glaubte, dass Bilder von *sich1/ihm1 zum Verkauf stünden. [German]
        Peter believed that pictures of SE/ him to.the sale stand.SBJV
        ‘Peter1 believed that pictures of himself1/him1 were on sale.’
     (cf. Kiss 2012: 173)

(93) a. Peter1 prefers a picture of himself1/2.
     b. Peter1 bevorzugt ein Bild von sich1/*2. [German]
        Peter prefers a picture of SE
        ‘Peter1 prefers a picture of himself1/2.’
     (cf. Kiss 2012: 173)
This difference is accounted for as follows. In English, the analysis of (92a) and (93a) is basically the same (cf. [94], following Kiss 2012: 174). Since the preposition bears the feature [−ARG-S], no D(n) is introduced − neither at this point in the derivation (cf. [86]) nor later (since R-features do not project and English does not introduce inactive dependencies). Hence, we can conclude that English picture NPs simply do not involve anaphoric dependencies at all, which means that we do not deal with binding but rather with exemptness in terms of Pollard and Sag (1992, 1994) and Reinhart and Reuland (1993).

(94)  N′[−ARG-S]
      ├─ N[−ARG-S]: pictures
      └─ PP[−ARG-S]
         ├─ P[−ARG-S]: of
         └─ NP[R(n)]: himself
In German, the situation is different. The crucial point is that (87) applies instead of (86), which means that − although no active dependency is introduced inside the PP − the inactive dependency D̄(n) comes into play; cf. (95) (following Kiss 2012: 174). (Recall that in contrast to the R-feature, the D̄-feature projects.)

(95)  N′[−ARG-S, D̄(n)]
      ├─ N[−ARG-S]: Bilder
      └─ PP[−ARG-S, D̄(n)]
         ├─ P[−ARG-S]: von
         └─ NP[R(n), D̄(n)]: sich
As to the ungrammatical example in (92b), the derivation looks as indicated in (96) (cf. Kiss 2012: 175). When the picture NP merges with the VP, the inactive dependency is activated (following [88]), since the VP bears a [+ARG-S]-feature. Hence, the resulting dependency D(n) must be resolved, and since it is immediately dominated by a node specified as [COMPS ⟨ ⟩], the resolution has to take place at this point in the derivation (cf. [89]) − however, this is not possible (there is no binder available), and therefore the derivation crashes.

(96)  S[COMPS ⟨ ⟩]
      ├─ NP[−ARG-S, D̄(n), D(n)]: Bilder von sich
      └─ VP[+ARG-S, COMPS ⟨NP⟩]
         ├─ PP: zum Verkauf
         └─ V[+ARG-S, COMPS ⟨NP, PP⟩]: stünden
The analysis of (93b), the grammatical German example, is illustrated in (97) (cf. Kiss 2012: 176). As in (96), the picture NP bears an inactive dependency (cf. also [95]), which is activated when it merges with the verb (due to the latter’s [+ARG-S]-feature). However, in this case the mother node is not yet S but the VP, which does not bear an empty value for COMPS. Hence, the dependency need not be resolved at this point (cf. [89]), and since there is no potential binder around, resolution cannot take place − thus, the D-feature is projected to VP, since it still needs to identify its value. Now the dependency can (and must, following [89]) be resolved by the subject NP Peter, which means that n is identified with the index of the subject NP. So it can be concluded that the reflexive in this example turns out to be an anaphor (as in [92b]); however, in this case it is grammatical because it can be locally bound in the end.

(97)  S[COMPS ⟨ ⟩]
      ├─ NP1: Peter
      └─ VP[+ARG-S, COMPS ⟨NP⟩, D(n = 1)]
         ├─ NP[−ARG-S, D̄(n), D(n)]: ein Bild von sich
         └─ V[+ARG-S, COMPS ⟨NP, NP⟩]: bevorzugt
39. Theories of Binding
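The life cycle of the D(n)-feature that derivations (95)−(97) walk through − introduction, activation by a [+ARG-S] sister, obligatory resolution under an empty COMPS list − can be sketched as a small procedure. This is a deliberately simplified toy model, not Kiss’s actual formalism; the triple encoding of Merge steps and all names are illustrative assumptions.

```python
# Toy sketch of the dependency life cycle described above: an inactive D(n)
# is introduced by the reflexive, activated when a [+ARG-S] sister is merged
# (cf. [88]), and must be resolved under a node whose COMPS list is empty
# (cf. [89]). Each Merge step up from the reflexive is modelled as a
# (sister_is_arg_s, mother_comps_empty, binder_index) triple.

def derive(steps):
    """Return 'resolved', 'crash', or 'exempt' for a sequence of Merge steps."""
    active = False
    for sister_is_arg_s, mother_comps_empty, binder_index in steps:
        if sister_is_arg_s:               # Activation of the inactive D(n)
            active = True
        if active and binder_index is not None:
            return "resolved"             # Local Resolution by a c-commanding binder
        if active and mother_comps_empty:
            return "crash"                # D(n) under [COMPS < >], no binder available
    return "exempt"                       # dependency never activated (English picture NPs)

# (96): the picture NP merges with a [+ARG-S] VP; the mother is S[COMPS < >].
print(derive([(True, True, None)]))                     # -> crash

# (97): activation inside VP (COMPS non-empty); the subject then resolves D(n).
print(derive([(True, False, None), (False, True, 1)]))  # -> resolved
```

The first call mimics the crash of (92b)/(96); the second mimics the successful local binding in (93b)/(97).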
By now, the theory has accounted for the fact that German does not have exempt reflexives and that the picture NP reflexives that do occur are in fact locally bound anaphors (which also explains why pronouns are illicit here). The last question that needs to be addressed is how the data involving object experiencer psych verbs are handled (cf. [82], [83]; These pictures of himself1 frighten John1). For English, nothing more needs to be said, since all picture NP reflexives pattern as indicated in (94): Since no anaphoric dependency is introduced, these reflexives are exempt, which means that they basically have the same distribution as pronouns. As to German examples like (83), the problem is again that due to (87)/(88) (Dependency and Activation) a dependency is introduced (starting out as an inactive dependency which is then activated in the course of the derivation). However, since the only potential binder (the object experiencer) is embedded more deeply in the structure, Local Resolution cannot apply successfully, and the derivation crashes.

Let me now briefly come back to example (84) (repeated in [98]), where it was suggested before that scrambling might in fact improve acceptability (cf. section 6.1). I will not provide a detailed analysis here, the more so as it is not entirely clear which underlying syntactic structure Kiss (2012) assumes for psych verbs (except that he rejects the unaccusativity hypothesis; cf. also section 6.1). However, it is clear what the analysis would have to provide in order to make correct predictions: It would have to make sure that the activated dependency can be resolved after all, so scrambling (or whichever operation is involved in the concrete example) would need to move the potential binder to a position minimally c-commanding the D(n)-feature − on this assumption, the dependency could be resolved and the picture NP reflexive would be predicted to be grammatical.

(98) a. Den Kindern1 gefielen die Bilder von sich1. [German]
        [the children].DAT pleased [the pictures of SE].NOM
        ‘The children1 liked the pictures of themselves1.’

     b. Ich glaube, dass den Kindern1 die Bilder von sich1 gefielen.
        I believe that [the children].DAT [the pictures of SE].NOM pleased
        ‘I think the children1 liked the pictures of themselves1.’
        (cf. Kiss 2012: 161 with different judgements)
VI. Theoretical Approaches to Selected Syntactic Phenomena

7. Conclusion

This chapter has presented five different ways to deal with binding facts. As a point of departure, we first focused on Chomsky (1981), which has served as a basis for all subsequent proposals (independently of whether they have been developed along the same lines or have adopted another course). In the first section, we concentrated on the original version of Chomsky’s binding theory and eventually discussed some of its major flaws, which can be subsumed, by and large, under the two categories crosslinguistic variation and optionality as regards the realization form of the bound element.

One of the leading alternative approaches developed in the early 1990s has been proposed by Reinhart and Reuland. The constraints on binding they assume are not merely structural (only their Chain Condition is), but refer instead to the notion of predicate − roughly speaking, the grammaticality of anaphors depends on whether their antecedent is a coargument or not; a strategy which leaves room for the occurrence of exempt anaphors.

The third type of analysis we dealt with was Hornstein (2001), which adopts a derivational view and suggests that bound elements are in fact spelt-out traces. In other words, the antecedent starts out in the position in which we find the bound element in the end. Hence, the bound element can be considered a residue of movement which emerges in order to satisfy case requirements as well as the Linear Correspondence Axiom and the Scope Correspondence Axiom.

A different derivational approach is put forward in Fischer (2004b, 2006). Here it is assumed that the concrete form of the bound element is determined in the course of the derivation on the basis of local optimization, which takes place after the completion of each phrase and is sensitive to binding domains of different sizes. In the beginning, the bound element is equipped with a feature matrix which contains all potential realizations. The longer it takes until the antecedent enters the derivation (and checks the features of the bound element), the more anaphoric specifications are deleted.

Finally, we considered the analysis proposed in Kiss (2012), which crucially relies on a careful distinction between the two notions of reflexivity and anaphoric dependency: a reflexive as such only designates a specific form and is not subject to syntactic conditions that demand an antecedent; only if the reflexive introduces a dependency (which depends on language-specific constraints) must it be resolved in the course of the derivation, which means that the reflexive eventually functions as an anaphor that has to be bound. Hence, we get a clear contrast between bound anaphors and so-called exempt reflexives.

This chapter has thus not stuck to one particular framework but has provided insight into diverse approaches to binding.
The different frameworks we have come across include G&B theory, minimalism (with or without optimization processes), and HPSG. Moreover, some of the proposals adopt a derivational view of syntax while others adhere to a representational view; some are based on local constraints, others use global constraints. But the theories presented here do not only differ with respect to the theoretical frameworks they adopt; they also focus on diverse binding data and thereby reveal how many factors have to be taken into account in the field of binding, including the broad range of crosslinguistic variation that can be observed.
8. References (selected)

Belletti, Adriana, and Luigi Rizzi 1988 Psych-verbs and θ-theory. Natural Language and Linguistic Theory 6: 291−352.
Büring, Daniel 2005 Binding Theory. Cambridge: Cambridge University Press.
Burzio, Luigi 1989 On the non-existence of disjoint reference principles. Rivista di Grammatica Generativa 14: 3−27.
Burzio, Luigi 1991 The morphological basis of anaphora. Journal of Linguistics 27: 81−105.
Burzio, Luigi 1996 The role of the antecedent in anaphoric relations. In: Robert Freidin (ed.), Current Issues in Comparative Grammar, 1−45. Dordrecht: Kluwer.
Burzio, Luigi 1998 Anaphora and soft constraints. In: Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky (eds.), Is the Best Good Enough?, 93−113. Cambridge, MA: MIT Press.
Chomsky, Noam 1973 Conditions on transformations. In: Stephen Anderson and Paul Kiparsky (eds.), A Festschrift for Morris Halle, 232−286. New York: Holt, Rinehart and Winston.
Chomsky, Noam 1980 On binding. Linguistic Inquiry 11: 1−46.
Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam 1986a Barriers. Cambridge, MA: MIT Press.
Chomsky, Noam 1986b Knowledge of Language. New York: Praeger.
Chomsky, Noam 2000 Minimalist inquiries: the framework. In: Roger Martin, David Michaels, and Juan Uriagereka (eds.), Step by Step, 89−155. Cambridge, MA: MIT Press.
Chomsky, Noam 2001 Derivation by phase. In: Michael Kenstowicz (ed.), Ken Hale: A Life in Language, 1−52. Cambridge, MA: MIT Press.
Chomsky, Noam 2008 On phases. In: Robert Freidin, Carlos Otero, and Maria Luisa Zubizarreta (eds.), Foundational Issues in Linguistic Theory, 133−166. Cambridge, MA: MIT Press.
Dalrymple, Mary 1993 The Syntax of Anaphoric Binding. Stanford: CSLI Publications.
Everaert, Martin 1991 Nominative anaphors in Icelandic: morphology or syntax? In: Werner Abraham, Wim Kosmeijer, and Eric Reuland (eds.), Issues in Germanic Syntax, 277−305. Berlin: Mouton de Gruyter.
Everaert, Martin 2012 The criteria for reflexivization. In: Dunstan Brown, Marina Chumakina, and Greville G. Corbett (eds.), Canonical Morphology and Syntax, 190−206. Oxford: Oxford University Press.
Fanselow, Gisbert 1991 Minimale Syntax. Habilitation thesis, University of Passau.
Fischer, Silke 2004a Optimal binding. Natural Language and Linguistic Theory 22: 481−526.
Fischer, Silke 2004b Towards an optimal theory of reflexivization. Doctoral dissertation, University of Tübingen.
Fischer, Silke 2006 Matrix unloaded: binding in a local derivational approach. Linguistics 44: 913−935.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag 1985 Generalized Phrase Structure Grammar. Oxford: Blackwell.
Heck, Fabian 2004 A theory of pied piping. Doctoral dissertation, University of Tübingen.
Heck, Fabian, and Gereon Müller 2000 Successive cyclicity, long-distance superiority, and local optimization. In: Roger Billerey and Brook D. Lillehaugen (eds.), Proceedings of WCCFL 19, 218−231. Somerville, MA: Cascadilla Press.
Heck, Fabian, and Gereon Müller 2003 Derivational optimization of wh-movement. Linguistic Analysis 33: 97−148.
Hestvik, Arild 1991 Subjectless binding domains. Natural Language and Linguistic Theory 9: 455−496.
Hornstein, Norbert 2001 Move! A Minimalist Theory of Construal. Oxford: Blackwell.
Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kayne, Richard 2002 Pronouns and their antecedents. In: Samuel Epstein and T. Daniel Seely (eds.), Derivation and Explanation in the Minimalist Program, 133−166. Oxford: Blackwell.
Kiss, Tibor 2001 Anaphora and exemptness. A comparative treatment of anaphoric binding in German and English. In: Dan Flickinger and Andreas Kathol (eds.), Proceedings of the 7th International Conference on Head-Driven Phrase Structure Grammar, 182−197. Stanford: CSLI Publications. http://csli-publications.stanford.edu/HPSG/1/hpsg00kiss.pdf
Kiss, Tibor 2012 Reflexivity and dependency. In: Artemis Alexiadou, Tibor Kiss, and Gereon Müller (eds.), Local Modelling of Non-Local Dependencies in Syntax, 155−185. Berlin: de Gruyter.
Koster, Jan 1984 Reflexives in Dutch. In: Jacqueline Guéron, Hans-Georg Obenauer, and Jean-Yves Pollock (eds.), Grammatical Representation, 141−167. Dordrecht: Foris.
Koster, Jan 1987 Domains and Dynasties. Dordrecht: Foris.
Kratzer, Angelika 2009 Making a pronoun: fake indexicals as windows into properties of pronouns. Linguistic Inquiry 40: 187−237.
Manzini, Rita, and Kenneth Wexler 1987 Parameters, binding theory, and learnability. Linguistic Inquiry 18: 413−444.
Martin, Roger 1999 Case, the extended projection principle, and minimalism. In: Samuel Epstein and Norbert Hornstein (eds.), Working Minimalism, 1−25. Cambridge, MA: MIT Press.
Menuzzi, Sergio 1999 Binding Theory and Pronominal Anaphora in Brazilian Portuguese. Leiden: Holland Academic Graphics.
Müller, Gereon 2000 Shape conservation and remnant movement. In: Masako Hirotani, Andries Coetzee, Nancy Hall, and Ji-Yung Kim (eds.), Proceedings of NELS 30, 525−539. Amherst, MA: GLSA.
Müller, Gereon 2004 Phrase impenetrability and wh-intervention. In: Arthur Stepanov, Gisbert Fanselow, and Ralf Vogel (eds.), Minimality Effects in Syntax, 289−325. Berlin: Mouton de Gruyter.
Müller, Gereon 2010 On deriving CED effects from the PIC. Linguistic Inquiry 41: 35−82.
Newson, Mark 1998 Pronominalisation, reflexivity and the partial pronunciation of traces: binding goes OT. In: László Varga (ed.), The Even Yearbook 3. ELTE SEAS Working Papers in Linguistics, 173−222.
Nunes, Jairo 1995 The copy theory of movement and linearization of chains in the minimalist program. Doctoral dissertation, University of Maryland, College Park.
Nunes, Jairo 1999 Linearization of chains and phonetic realization of chain links. In: Samuel Epstein and Norbert Hornstein (eds.), Working Minimalism, 217−249. Cambridge, MA: MIT Press.
Pesetsky, David 1995 Zero Syntax. Experiencers and Cascades. Cambridge, MA: MIT Press.
Pollard, Carl, and Ivan Sag 1992 Anaphors in English and the scope of binding theory. Linguistic Inquiry 23: 261−303.
Pollard, Carl, and Ivan Sag 1994 Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Reinhart, Tanya, and Eric Reuland 1991 Anaphors and logophors: an argument structure perspective. In: Jan Koster and Eric Reuland (eds.), Long-Distance Anaphora, 283−321. Cambridge: Cambridge University Press.
Reinhart, Tanya, and Eric Reuland 1993 Reflexivity. Linguistic Inquiry 24: 657−720.
Reinhart, Tanya, and Tal Siloni 2005 The lexicon-syntax parameter: reflexivization and other arity operations. Linguistic Inquiry 36: 389−436.
Reuland, Eric 2001 Primitives of binding. Linguistic Inquiry 32: 439−492.
Reuland, Eric 2011 Anaphora and Language Design. Cambridge, MA: MIT Press.
Reuland, Eric, and Martin Everaert 2001 Deconstructing binding. In: Mark Baltin and Chris Collins (eds.), The Handbook of Contemporary Syntactic Theory, 634−669. Oxford: Blackwell.
Reuland, Eric, and Tanya Reinhart 1995 Pronouns, anaphors and case. In: Hubert Haider, Susan Olsen, and Sten Vikner (eds.), Studies in Comparative Germanic Syntax, 241−268. Dordrecht: Kluwer.
Riemsdijk, Henk van 1978 A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Dordrecht: Foris.
Rizzi, Luigi 1990a On the anaphor-agreement effect. Rivista di Linguistica 2: 27−42.
Rizzi, Luigi 1990b Relativized Minimality. Cambridge, MA: MIT Press.
Safir, Ken 2004 The Syntax of Anaphora. Oxford: Oxford University Press.
Schäfer, Florian 2008 The Syntax of (Anti-)Causatives. External Arguments in Change-of-State Contexts. Amsterdam: Benjamins.
Sternefeld, Wolfgang 2004 Syntax. Eine Merkmalbasierte Generative Analyse des Deutschen. Tübingen: Stauffenburg.
Wilson, Colin 2001 Bidirectional optimization and the theory of anaphora. In: Géraldine Legendre, Jane Grimshaw, and Sten Vikner (eds.), Optimality-Theoretic Syntax, 465−507. Cambridge, MA: MIT Press.
Woolford, Ellen 1999 More on the anaphor agreement effect. Linguistic Inquiry 30: 257−287.
Zwart, Jan-Wouter 2002 Issues relating to a derivational theory of binding. In: Samuel Epstein and T. Daniel Seely (eds.), Derivation and Explanation in the Minimalist Program, 269−304. Oxford: Blackwell.
Silke Fischer, Stuttgart (Germany)
40. Word Order

1. Introduction and overview
2. On the notion of scrambling
3. Approaches to free word-order alternations
4. Scrambling is a syntactic phenomenon
5. Trace-based analyses of Mittelfeld scrambling
6. Base generated adjunction structures: LFG
7. Word-order domains: HPSG
8. Scope
9. Conclusion
10. References (selected)
Abstract

This chapter discusses different theories of free word order alternations that commonly go by the name of scrambling. The main example discussed here is Mittelfeld scrambling in German. The chapter argues that scrambling is a genuinely syntactic process with reflexes both in the phonology (word order) and in the semantics (binding and scope). The chapter then briefly introduces the three approaches to scrambling that have dominated the literature: trace-based accounts, base-generation accounts, and linearization-based accounts. Their main strengths and weaknesses are outlined and the most important lines of debate are sketched. The conclusion briefly turns to non-configurationality.
1. Introduction and overview

1.1. Scope of this article

This chapter treats various theoretical approaches to free constituent order. It is not concerned with typological patterns of word order (Greenberg 1963), such as the correlation between unmarked OV order and the presence of postpositions. A recent overview of such typological patterns can be found in Dryer (2007); an influential parsing-based approach is contained in Hawkins (1990). Both treat word order patterns within as well as across major category boundaries (clause, verb phrase, prepositional phrase, noun phrase); Cinque (2005, 2009) provides an important account of the range of permissible word-order types within a given major category.

Instead of typological patterns, this chapter is concerned with the theoretical treatment of constituent order alternations within a given language. More specifically, it treats word order alternations that are often referred to as free. Such alternations go by the name of scrambling in the literature. The concrete case studied here is Mittelfeld scrambling in German. The phenomenon has been worked out from a broad range of theoretical perspectives; it therefore allows the comparison of different fully developed proposals. An introduction to the phenomenon itself is provided by Frey (this volume). I will not discuss more extreme cases of free word order, found in so-called non-configurational languages. I return briefly to the question of non-configurationality and its relation to scrambling in the conclusion but will otherwise ignore the issue.
1.2. Overview

The chapter is structured as follows. Section 2 informally characterizes the constructions that are referred to as scrambling. Section 3 briefly sketches four approaches to scrambling that have been pursued in the literature: (i) the nonsyntactic approach, (ii) the trace-based approach, (iii) the base-generation approach, and (iv) the linearization-based approach. Section 4 contains an overview of arguments against the nonsyntactic approach to scrambling in German and long-distance scrambling in Japanese. The following sections sketch versions of the trace-based, base-generation, and linearization-based accounts. In the section on the trace-based account, particular attention is paid to the debate on whether scrambling is to be construed as an A- or an Ā-movement phenomenon and to the triggering problem, which arises under the minimalist thesis of movement as a last resort. The section on base generation highlights, in particular, what the conditioning factors for the availability of scrambling are from a crosslinguistic perspective. The final section briefly discusses an argument from scope that has been used to claim superiority of the base-generation account.
2. On the notion of scrambling

The term scrambling as a description of relative freedom in constituent order was coined by Ross (1967: section 3.1.2), who exemplifies it using discontinuous noun phrases in Latin. Ross’s scrambling rule imposes few constraints, except that it is clause-bound. Ross suggests locating the scrambling rule in a separate, stylistic, component of the grammar. Little is said about this stylistic component, and Ross is usually taken to imply that scrambling is not a rule of syntax proper. Ross excludes scrambling from syntax because of the nature of the rule.

Of course, we expect such a move to have consequences. The theory of syntax is, among other things, a theory of the structural aspects of meaning such as binding and scope. If an operation is extrasyntactic, it should not have an impact on these structural aspects of meaning. For any given operation, this makes a testable claim. It should be noted that the term structural is used here in a sense that is broader than that which is usually employed in the Government and Binding and Minimalist literature, where its meaning is often restricted to the dominance relations holding in tree structures. I have in mind instead the broader and less theory-bound notion of structure found in Keenan and Stabler (2003).

In informal usage the term scrambling has come to be used as a cover term for almost any kind of optional variation in word order. Corver and Riemsdijk (1994), for example, label various constructions in the following languages as scrambling: Korean, Japanese, Russian, Warlpiri, Persian, Hindi/Urdu, Dutch, German, Hungarian, and Selayarese. The constructions called scrambling represent optional word order variation in the sense that they do not have a morphological reflex, do not determine clause type and are not restricted to a particular clause type, and do not seem to be associated with unique positions. The first condition distinguishes scrambling from, for example, the passive, which is accompanied by characteristic verbal morphology and by case alternations on the arguments involved. Lack of a morphological reflex of a given alternation does not preclude the existence of morphological preconditions for it; in Turkish, for example, the presence of the accusative marker -(y)I is required for scrambling to be possible (Enç 1991; Kornfilt 2003). The second and third conditions distinguish scrambled structures from questions, relative clauses, and wh-movement constructions in general, since those do play a part in clausal typing and do target specific positions in the clause. The three properties of scrambling can be illustrated below for German. German is an SOV language with the additional property that, in clauses without a subordinator, the finite verb is found in second position.
In clauses with a subordinator, the finite verb, along with any nonfinite verbs, appears clause-finally. In traditional grammar, the space that is defined, on the one side, by the finite verb in main clauses and the subordinator in subordinate clauses and, on the other side, by the non-finite verbs is called the Mittelfeld − ‘middle field’. The examples in (1) illustrate the phenomenon called Mittelfeld scrambling (for a detailed empirical discussion see Frey, this volume). For a ditransitive verb like streitig machen − ‘compete’, all six conceivable linearizations of subject, direct object, and indirect object are possible within the Mittelfeld in one context or another. This is illustrated in (1) (from Haider 1993). These word order alternations are free in the sense outlined above, because (i) there is no morphological reflex of the alternation, (ii) the alternation does not interact with clause type (scrambling is equally possible in main and subordinate clauses, in declarative and interrogative clauses, etc.), and (iii) there is no single dedicated scrambling position. In fact, to account for the entire paradigm in (1), even assuming two dedicated positions per argument (a scrambling position and a non-scrambling position) would be insufficient, since this would allow deriving at most five of the six permissible orders.

(1)
a. dass das Objekt dem Subjekt den erste-n Platz streitig macht
   that the.NOM.SG.N object the.DAT.SG.N subject the.ACC.SG.M initial-ACC.SG.M place contested makes
   ‘that the object competes with the subject for the initial position’
[German]
b. dass das Objekt den erste-n Platz dem Subjekt streitig macht
   that the.NOM.SG.N object the.ACC.SG.M initial-ACC.SG.M place the.DAT.SG.N subject contested makes

c. dass dem Subjekt das Objekt den erste-n Platz streitig macht
   that the.DAT.SG.N subject the.NOM.SG.N object the.ACC.SG.M initial-ACC.SG.M place contested makes

d. dass dem Subjekt den erste-n Platz das Objekt streitig macht
   that the.DAT.SG.N subject the.ACC.SG.M initial-ACC.SG.M place the.NOM.SG.N object contested makes

e. dass den erste-n Platz das Objekt dem Subjekt streitig macht
   that the.ACC.SG.M initial-ACC.SG.M place the.NOM.SG.N object the.DAT.SG.N subject contested makes

f. dass den erste-n Platz dem Subjekt das Objekt streitig macht
   that the.ACC.SG.M initial-ACC.SG.M place the.DAT.SG.N subject the.NOM.SG.N object contested makes
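The paradigm in (1) is simply the full set of 3! = 6 permutations of the three argument phrases between the complementizer and the verb cluster. A toy enumeration (the strings are illustrative; nothing here models the discourse contexts that license each order):

```python
# Enumerate all Mittelfeld orders of (1): every permutation of the three
# arguments between "dass" and the verbal complex "streitig macht".
from itertools import permutations

args = ["das Objekt", "dem Subjekt", "den ersten Platz"]
orders = ["dass " + " ".join(p) + " streitig macht" for p in permutations(args)]

print(len(orders))   # -> 6
print(orders[0])     # -> dass das Objekt dem Subjekt den ersten Platz streitig macht
```

This also makes concrete the counting argument in the text: two dedicated positions per argument could derive at most five of these six orders, so a fixed-position account of the paradigm fails.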
These properties make Mittelfeld scrambling a typical representative of the general type discussed under that label. Like scrambling in Ross’s original rule but unlike so-called long-distance scrambling in languages like Hindi/Urdu, Korean, Japanese, and Persian, German Mittelfeld scrambling does not cross finite clause boundaries.
3. Approaches to free word-order alternations

Modern approaches to the analysis of word order usually take very seriously the traditional observation that those elements that belong together semantically also occur close to each other (Behagel 1932: 4). This old observation is expressed through the assumption that syntactic and semantic composition proceed hand in hand and generate phrase structure trees. Phrase-structure trees represent a hierarchical organization for a string of words; the hierarchical aspect is expressed in terms of the antisymmetric, reflexive, and transitive dominance relation, the linear aspect in terms of the transitive and asymmetric precedence relation. In such phrase structure trees, any two distinct nodes are either in a dominance relation to each other or in a precedence relation. Crucially, constituents in a tree are always continuous: two distinct nodes that are not in a dominance relation never overlap linearly. This is referred to as the Nontangling Condition on phrase structure trees, which can be formulated as follows:

(2) The Nontangling Condition:
    In any well-formed constituent structure tree, for any nodes x and y, if x precedes y, then all nodes dominated by x precede all nodes dominated by y.
    (Partee, Meulen, and Wall 1990: 440)
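The condition in (2) can be rendered as a small executable check, under simplifying assumptions: trees are (label, children) pairs, leaves are bare integers giving their linear position, dominance is approximated by yield containment, and the condition is checked as "nodes unrelated by dominance never overlap linearly". This is an illustrative sketch, not part of the original formalization.

```python
def leaf_sets(tree, acc):
    """Return the leaf positions under tree; record every subtree's set in acc."""
    if isinstance(tree, int):           # a leaf is just its linear position
        s = {tree}
    else:                               # an internal node: (label, children)
        s = set()
        for child in tree[1]:
            s |= leaf_sets(child, acc)
    acc.append(s)
    return s

def nontangling(tree):
    """Check that no two dominance-unrelated nodes overlap linearly."""
    acc = []
    leaf_sets(tree, acc)
    for x in acc:
        for y in acc:
            dominance_related = x <= y or y <= x
            overlap = max(x) > min(y) and max(y) > min(x)
            if not dominance_related and overlap:
                return False
    return True

# A continuous structure: b = {positions 0, 1}, c = {position 2}
licit = ("a", [("b", [0, 1]), ("c", [2])])
# A tangled structure: b dominates positions 0 and 2, while c's leaf sits at 1
illicit = ("a", [("b", [0, 2]), ("c", [1])])
print(nontangling(licit), nontangling(illicit))   # -> True False
```

The second tree is exactly the configuration the text goes on to rule out: a constituent whose yield is interrupted by material it does not dominate.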
The nontangling condition rules out structures like the one in Figure 40.1b, where b precedes c and dominates d and f, yet e, which is dominated by c, precedes f. Assuming that sisters in the tree compose semantically, the nontangling condition strengthens Behagel’s observation and claims that semantic composition corresponds to linear concatenation. In many of the clearest cases this gives correct results: Words that belong together semantically also act as units in other respects.

(3)
Fig. 40.1: A licit (a) and an illicit (b) structure according to the nontangling condition

The assumption that syntactic structures obey the nontangling condition is shared by theories as diverse as the Standard Theory of the sixties, the Extended Standard Theory of the seventies,
Government and Binding theory, Minimalism, Generalized Phrase Structure Grammar, Lexical Functional Grammar, Categorial Grammar (without Bach’s 1979 wrap operations), and Tree Adjoining Grammar.

Theories vary in the amount of further restrictions that they impose on phrase structure trees. One set of restrictions concerns the labels of the nodes in these trees. Other types of constraints have to do with the geometry of the tree. Richard Kayne has made a number of influential proposals in this realm: Kayne (1981) suggests that all phrase structure is maximally binary branching, and Kayne (1994) advances the idea that specifiers invariably precede heads, which, in turn, invariably precede their complements. This view is generally taken to entail that even simple clauses like (1a) are derived by a number of movement operations that transform an underlying VO-structure into the superficial OV-structure. We will not concern ourselves with these additional movement operations here (see Hinterhölzl 2006 for pertinent discussion).

Of course, there are many instances where words and phrases that belong together semantically are not adjacent and sometimes not even close to each other. Some simple examples are given below. In the constituent question in (4a), the object of the verb buy does not show up adjacent to the verb but is displaced far to the left, at the beginning of the sentence. In the raising construction in (4b), the noun phrase der Garaus − ‘the do.in’ occurs initially and separated from the verb machen − ‘make’, although semantically they belong together and make up the idiom jemandem den Garaus machen − ‘somebody.DAT the.ACC.M do.in make’, which means to do somebody in. The idiom is passivized here, and the direct idiomatic object Garaus, a noun which has no independent meaning in German, acts as the subject of scheinen − ‘appear’.
In (4c) the relative clause who haven’t turned in term papers appears separated from the noun students although it restricts this noun and thus belongs together with it semantically. Finally, (4d) is a German example where the verb lesen − ‘read’ is separated from its argument es − ‘it’ by two other arguments and the verb versprechen − ‘promise’ is separated from its two nominal arguments by the verb lesen.
(4)
a. What do you think that Bill said that Sally will buy tomorrow?

b. Der Garaus scheint ihm gemacht worden zu sein.
   the.NOM.SG.M do.in appears 3SG.DAT.M made become to be
   ‘He appears to have been done in.’
[German]
c. I only want those students to take the exam who haven’t turned in term papers. (McCawley 1987: 196)

d. dass es ihm jemand zu lesen versprochen hat
   that 3SG.ACC.N 3SG.DAT.M somebody.SG.NOM to read promised has
   ‘… that someone promised him to read it’ (Reape 1994: 157)
[German]
Theoretical reactions to these types of examples have varied. One approach has been to posit inaudible abstract elements, traces or silent copies, in the position where an element would canonically be expected to occur. This is illustrated for (4a) in (5). The trace fills the gap at the position where the object is expected to occur. It fulfills the object’s semantic function with respect to the verb, and its presence makes the generalization that elements that belong together semantically occur close to each other true on the (abstract) surface.

(5) What do you think that Bill said that Sally will buy t_what tomorrow?
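The work the trace does in (5) can be mimicked with a toy "reconstruction" step: the trace marks the canonical object position, and substituting the clause-initial filler for its trace restores verb-object adjacency. The token representation and the `t_` prefix are illustrative assumptions, not a parser.

```python
def reconstruct(tokens):
    """Put the clause-initial filler back into its trace position."""
    filler = tokens[0]
    return [filler if t == "t_" + filler else t for t in tokens[1:]]

surface = ["what", "do", "you", "think", "that", "Bill", "said",
           "that", "Sally", "will", "buy", "t_what", "tomorrow?"]
print(" ".join(reconstruct(surface)))
# -> do you think that Bill said that Sally will buy what tomorrow?
```

On the reconstructed representation, buy and its object are adjacent again, which is exactly the sense in which the trace keeps Behagel’s generalization true "on the (abstract) surface".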
While the presence of traces makes surface syntactic representations more abstract, their postulation holds fast to the assumption that elements that belong together semantically also occur close to each other both structurally and linearly. In the Principles and Parameters tradition and in Minimalism, all arguments are usually assumed to be licensed within the maximal projection of the argument-taking lexical head. This includes the external argument under the VP-internal subject hypothesis (Koopman and Sportiche 1991). Under these assumptions the analysis of (4b) would involve a trace of der Garaus − ‘the do.in’ within the projection of the verb machen − ‘make’, as shown in (6). The strategy of positing inaudible traces has also been applied to scrambling.

(6) Der Garaus scheint [ihm t_der Garaus gemacht] worden zu sein. [German]
    the.NOM.SG.M do.in appears 3SG.DAT.M made become to be
Other researchers have assumed that argument taking can be delayed to a certain extent. Under such approaches (Hinrichs and Nakazawa 1989, 1994; Jacobson 1990; Neeleman and van de Koot 2002; Pollard and Sag 1994, among others), argument satisfaction is not restricted to the projection of the argument-taking lexical head, and mechanisms are put in place
that allow higher predicates to inherit the unsaturated argument(s) of their complement. This is schematized in (7), where the subscripted [θ] on gemacht − ‘made’, scheint − ‘seems’, and all intermediate verbs is intended as a notation for argument inheritance from the lower verb by the higher verb. The strategy here is to relax the notion of what belongs together semantically and then to show that under these relaxed assumptions phrase structures obey the Nontangling Condition. Base generation analyses of scrambling typically follow this path.

(7) Der Garaus scheint[θ] ihm gemacht[θ] worden[θ] zu sein[θ]. [German]
    the.NOM.SG.M do.in appears 3SG.DAT.M made become to be
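The [θ] notation of (7) can be sketched as list-valued argument inheritance: a higher verb with no arguments of its own takes over the unsaturated argument list of its verbal complement, so the whole cluster ends up selecting the arguments of gemacht. The mini-lexicon and the `args` labels are illustrative assumptions, not an implementation of any specific framework.

```python
def inherit(higher, lower):
    """Combine a higher predicate with its verbal complement: the complex
    predicate's argument list concatenates any arguments of the higher verb
    with the still-unsaturated arguments of the lower verb."""
    return {"form": lower["form"] + " " + higher["form"],
            "args": higher["args"] + lower["args"]}

# Hypothetical lexical entries for the verb cluster of (4b)/(7):
gemacht = {"form": "gemacht", "args": ["NP.dat", "NP.nom"]}  # made: dative + promoted nominative
worden  = {"form": "worden",  "args": []}                    # passive auxiliary, no own arguments
sein    = {"form": "sein",    "args": []}                    # infinitival auxiliary

cluster = inherit(sein, inherit(worden, gemacht))
print(cluster["form"])   # -> gemacht worden sein
print(cluster["args"])   # -> ['NP.dat', 'NP.nom']
```

Because the unsaturated arguments percolate up the cluster, the nominative der Garaus can be satisfied high in the structure without any trace inside the projection of gemacht, and the resulting constituent structure can remain continuous.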
Yet a different reaction to some of the cases in (4) has been to give up the Nontangling Condition (Bach 1979; Blevins 1990; Kathol 2000; McCawley 1982; Ojeda 1988; Reape 1994; and the contributions in Bunt and Horck 1996 and Huck and Ojeda 1987). We will discuss Reape’s analysis of scrambling below. A final reaction to some of the discontinuities in (4) might be to assume that they do not arise in the syntax proper. As we have seen, Ross’s analysis of scrambling follows this general line. In the next sections we discuss these four strategies for the analysis of scrambling.
4. Scrambling is a syntactic phenomenon

Let us turn to the question of whether there is evidence that scrambling is syntactic. By assumption, a phenomenon is syntactic if it has an effect on structural aspects of meaning like binding and scope. We start the investigation with German. Here the answer is that scrambling is clearly syntactic. Consider the following two pairs (from Frey 1993). Frey indicates that (8a) is scopally unambiguous while (8b) is scopally ambiguous. The word order alternation in (8) therefore has a scopal effect; thus, scrambling is to be represented syntactically (see Frey 1993; Kiss 2001; Lechner 1996, 1998; Pafel 2005 for detailed discussion of scope alternations in scrambling). The same point is made by the pair in (9), where the relative order of subject and object is changed by scrambling, which gives rise to different binding possibilities. As discussed in G. Müller (1995: chapter 3.9), these judgments are not shared by all speakers, however. (Note that the translation of [9b] is passive only to facilitate the relevant reading. The German example is in the active voice.) (8)
a. DASS er mindestens ein Geschenk fast jede-m Gast überreichte [German]
   THAT he at.least one.ACC.SG present almost every-DAT.SG guest handed
   (✓∃ > ∀, *∀ > ∃)
   'that he handed at least one present to almost every guest'

b. DASS er fast jede-m Gast mindestens ein Geschenk überreichte
   THAT he almost every-DAT.SG guest at.least one.ACC.SG present handed
   (✓∃ > ∀, ✓∀ > ∃)

(9) a. *weil sein-e_i Mutter jede-m Kind_i hilft [German]
       because his-NOM.SG.F mother every-DAT.SG.N child helps
       'because his_i mother helps every child_i'

    b. weil jede-m Kind_i sein-e_i Mutter hilft
       because every-DAT.SG.N child his-NOM.SG.F mother helps
       'because every child_i is helped by his_i mother'

Similar judgments regarding clause-bound scrambling have been reported in the literature on Japanese (Hoji 1985; Saito 1985, 1992; Ueyama 2003), Hindi/Urdu (Kidwai 2000; Mahajan 1994), and Persian (Browning and Karimi 1994). In all of these languages clause-bound scrambling changes scope and binding relations in ways closely resembling German. Mittelfeld scrambling in German and the various clause-bound scrambling operations in other languages are therefore clearly syntactic. The empirical situation is often assumed to be different for long-distance scrambled orders, at least in Japanese (Bošković and Takahashi 1998; Saito 1992). Relevant examples from Saito (1992) with the original bracketing and translation are given below. Example (10) is similar to (9) and indicates that scrambling of arguments within the same clause affects binding relations.

(10) a. ?*[Masao-ga [[otagai_i-no sensei]-ni [karera_i-o syookaisita]]] (koto). [Japanese]
        Masao-NOM each.other-GEN teacher-to they-ACC introduced (fact)
        'Masao introduced them_i to each other's_i teachers.'

     b. [Karera_i-o [Masao-ga [[otagai_i-no sensei]-ni [t_i syookaisita]]]] (koto).
        they-ACC Masao-NOM each.other-GEN teacher-to introduced (fact)
        'Them_i, Masao introduced t_i to each other's_i teachers.'

Example (11) shows that the same is not true for reordering of arguments that belong to different clauses (see Ueyama 2003 for detailed discussion and references).

(11) a. *[Masao-ga [otagai_i-no sensei]-ni [CP [IP Hanako-ga karera_i-o hihansita] to] itta] (koto). [Japanese]
        Masao-NOM each.other-GEN teacher-to Hanako-NOM they-ACC criticized comp said (fact)
        'Masao said to each other_i's teachers that Hanako criticized them_i.'

     b. *[Karera-o_i [Masao-ga [otagai_i-no sensei]-ni [CP [IP Hanako-ga t_i hihansita] to] itta]] (koto).
        they-ACC Masao-NOM each.other-GEN teacher-to Hanako-NOM criticized comp said (fact)
        'Them_i Masao said to each other_i's teachers that Hanako criticized t_i.'
40. Word Order

Similarly, long-distance scrambling, in contrast to clause-bound scrambling, does not affect scope relations of elements that originate in different clauses (Hoji 1985; Tada 1993; Ueyama 1998 and references cited there). This contrast between clause-bound and long-distance scrambling is illustrated in (12) (from Miyagawa 2006) and (13) (from Bošković and Takahashi 1998).

(12) a. Dareka-ga daremo-o sikatta. [Japanese]
        someone-NOM everyone-ACC scolded
        (∃ > ∀, *∀ > ∃)
        'Someone scolded everyone.'

     b. Daremo-o_i dareka-ga t_i sikatta.
        everyone-ACC someone-NOM scolded
        (∃ > ∀, ∀ > ∃)
        'Everyone, someone scolded.'

(13) a. Dareka-ga [Mary-ga daremo-ni atta to] omotteiru. [Japanese]
        someone-NOM Mary-NOM everyone-DAT met comp thinks
        (∃ > ∀, *∀ > ∃)
        'Someone thinks that Mary met everybody.'

     b. Daremo_1-ni dareka-ga [Mary-ga t_1 atta to] omotteiru.
        everyone-DAT someone-NOM Mary-NOM met comp thinks
        (∃ > ∀, *∀ > ∃)
        'Someone thinks that Mary met everybody.'

These facts in and of themselves would not have given rise to the claim that long-distance scrambling in Japanese is semantically vacuous. Similar patterns are, in fact, attested in many cases of long-distance movement. In this regard, a comparison between long-distance scrambling in Japanese and long topicalization in German is instructive. Frey (1993) reports the following pattern of data for long-distance topicalization in German. These examples show that long-distance topicalization in German behaves like long-distance scrambling in Japanese: it does not give rise to new pronominal binding relations, (14a), and does not extend the scope of a quantifier, (14b).

(14) a. Jede-n Jungen_i hat sein-e_*i/k Mutter behauptet, habe der Mann bestohlen. [German]
        every-ACC.SG.M boy has his-NOM.SG.F mother claimed have.3SG.SBJV the.NOM.SG.M man mugged
        'Every boy_i, his_*i/k mother claimed that the man mugged.'

     b. Fast jede-n hat mindestens ein-er behauptet, habe der Mann bestohlen.
        almost everyone-ACC.SG.M has at.least one-NOM.SG.M claimed have.3SG.SBJV the.NOM.SG.M man mugged
        (∃ > ∀, *∀ > ∃)
        'At least one person has claimed that the man mugged almost everybody.'

The observation that led Saito (1992) to make the famous claim "that scrambling in Japanese, even when it moves a constituent 'long distance', can be literally undone in the LF component" is illustrated in (15). The point to note about the examples in (15) is that both are interpreted as indirect constituent questions. The scope of the wh-phrase dono hon-o − 'which book' is unaffected by its position in the main or embedded clause. In the framework of assumptions about the interpretation of questions underlying Saito (1992), this is only possible if the position of the wh-phrase among the elements of the matrix clause in (15b) can literally be semantically equivalent to its positioning among the elements of the embedded clause, as in (15a). This has come to be called the undoing property of long-distance scrambling in Japanese. The observation that Japanese long-distance scrambling has the undoing property has led various authors (Bošković and Takahashi 1998; Kitahara 2002; Saito 1989, 2004; Tada 1993) to make the strong claim that long-distance scrambling in Japanese never has an effect on structural aspects of interpretation. If this were true, i.e., if long-distance scrambling in Japanese never had a semantic effect, this would furnish a good argument for an extrasyntactic account.

(15) a. [Masao-ga [CP [IP Hanako-ga dono hon-o tosyokan-kara karidasita] ka] siritagatteiru] koto [Japanese]
        Masao-NOM Hanako-NOM which book-ACC library-from checked.out Q want.to.know fact
        the fact that Masao wants to know [Q [Hanako checked out which book from the library]]
        'the fact that Masao wants to know which book Hanako checked out from the library'

     b. ?[dono hon-o_i [Masao-ga [CP [IP Hanako-ga t_i tosyokan-kara karidasita] ka] siritagatteiru] koto
        which book-ACC Masao-NOM Hanako-NOM library-from checked.out Q want.to.know fact
        the fact that which book_i, Masao wants to know [Q [Hanako checked out t_i from the library]]
        'the fact that Masao wants to know which book Hanako checked out from the library'

Again, a comparison with topicalization in German is instructive (see Tada 1993). Although long-distance topicalization in German does not have the undoing property in all cases, it does in a restricted environment. Reis and Rosengren (1992) discuss a construction they call wh-imperatives, an example of which is given in (16a). Like (16b), (16a) is interpreted as an imperative which embeds an indirect question. The wh-word wen − 'whom' takes embedded scope but is topicalized to the beginning of the main clause.
(16) a. Wen stell dir vor, dass Peter besucht hat! [German]
        who.ACC imagine 2SG.DAT PRT that Peter visited has
        'Imagine who Peter visited!'

     b. Stell dir vor, wen Peter besucht hat!
        imagine 2SG.DAT PRT who.ACC Peter visited has
        'Imagine who Peter visited!'

Reis and Rosengren (1992) argue that the examples cannot be analyzed in terms of a parenthetical imperative but involve true embedding. The construction is limited to imperative verbs that take interrogative complements. Wh-imperatives provide a case, then, in which long-distance topicalization in German also exhibits the undoing property. Once an account of the undoing property of topicalization in German wh-imperatives is available that deals with it in the syntax, as, by commonly held assumption, it would have to, the existence of the undoing property with Japanese long scrambling no longer furnishes an argument for an extrasyntactic treatment. It should be noted that if the parallel between topicalization and scrambling turns out to be real, German constitutes a counterexample to Bošković's (2004) generalization that only languages without articles display movement operations with the semantic footprint of long-distance scrambling in Japanese. Returning to Japanese, we note that Saito (1985: chapter 3) had already discussed examples like (17). In the unscrambled order, (17a), the subject pronoun kanozyo − 'she' cannot be coreferential with Mary, while such an interpretation is possible under the scrambled order in (17b). On standard assumptions, the condition governing the possibility of coreference between pronouns and proper names is structural (Condition C of the binding theory in Government and Binding theory). If we follow this assumption, we have to conclude that long-distance scrambling is syntactic because it has an impact on structural aspects of meaning. The same conclusion can be reached on the basis of the patterns of coreference discussed in Miyagawa (2005, 2006) and Nishigauchi (2002). A relevant pair is given below in (18).
Similar conclusions emerge from facts concerning the scope of long-distance scrambled quantifiers discussed in Miyagawa (2005, 2006), and from the observation that the possibility of binding into an element depends on how far it has been scrambled (Saito 2003: section 5.1).

(17) a. *John-ga [kanozyo_i-ga [NP kinoo Mary_i-o tazunete kita hito-o] kirat-tei-ru to] omot-tei-ru (koto). [Japanese]
        John-NOM she-NOM yesterday Mary-ACC visit came person-ACC dislike-PROG-PRS comp think-PROG-PRS fact
        'John thinks that she_i dislikes the person who came to see Mary_i yesterday.'

     b. [NP kinoo Mary_i-o tazunete kita hito-o]_j John-ga [kanozyo_i-ga t_j kirat-tei-ru to] omot-tei-ru (koto).
        yesterday Mary-ACC visit came person-ACC John-NOM she-NOM dislike-PROG-PRS comp think-PROG-PRS fact
        'The person who came to see Mary_i yesterday, John thinks that she_i dislikes.'

(18) a. [John_i nituite-no dono hon-o]_j kare_i-ga [Hanako-ga t_j kiniitta ka] sit-tei-ru (koto). [Japanese]
        John about-GEN which book-ACC he-NOM Hanako-NOM liked Q know-PROG-PRS fact
        [Which book about John_i]_j, he_i knows [Q [Hanako likes t_j]]
        'He_i knows which book about John_i Hanako likes.'

     b. *Kare_i-ga [Hanako-ga [John_i nituite-no dono hon-o] kiniitta ka] sit-tei-ru (koto).
        he-NOM Hanako-NOM John about-GEN which book-ACC liked Q know-PROG-PRS fact
        'He_i knows [[which book about John_i]_j [Hanako likes t_j]]'

All of this illustrates that long-distance scrambling in Japanese does have an effect on coreference, binding, and scope, and is therefore not extrasyntactic. The same type of argument has been made for Korean by Johnston and Park (2001). What we have seen so far is that both scrambling in the Mittelfeld and long-distance scrambling have effects on structural aspects of meaning, which leads to the conclusion that they are syntactic phenomena. We turn to various proposals for the representation of structures that involve scrambling in the Mittelfeld in German in the next sections. The analyses entertained by different researchers depend in part on the vocabulary for syntactic analysis made available by the theories in which the analyses are formulated. They also depend crucially on the theory of scope and binding that accompanies the analysis of scrambling.
5. Trace-based analyses of Mittelfeld scrambling

There is a large number of analyses of scrambling that involve overt fillers and abstract traces (see Fanselow 1990; Frey 1993; Grewendorf and Sabel 1999; Haider and Rosengren 2003; Hinterhölzl 2006; Kidwai 2000; Kitahara 2002; Mahajan 1990; G. Müller 1995; G. Müller and Sternefeld 1993; Putnam 2007; Sabel 2005; Saito 1985; Stechow and Sternefeld 1988; Ueyama 2003; Webelhuth 1989, 1992 among numerous others). Some of these are expressed in terms of a movement operation while others are not. The distinction will not play a role until the very end of this section. Work done in the Principles and Parameters and Minimalist traditions usually assumes that the theory of scope and binding should be expressed strictly in terms of tree-configurational notions, in particular, in terms of dominance and command relations but not in terms of precedence. A strong formulation of this general position is offered by Chomsky (2008), who says "that order does not enter into the generation of the C-I [conceptual-intentional] interface, and that syntactic determinants of order fall within the phonological component." This is usually taken to entail that only those aspects of trees that are expressed in terms of dominance and command relations enter into the determination of scope and binding, but crucially not those that are expressed in terms of precedence. Lenerz (1977) had suggested criteria for establishing the unmarked word order in the Mittelfeld of German sentences. One of the properties of unmarked orders (see Frey
1993; Lechner 1996, 1998; Pafel 2005 for extensive discussion as well as Frey, this volume) is that they give rise to unequivocal scope relations while marked orders give rise to scope ambiguities. This was illustrated above in (8), where the unequivocal example (8a) shows the order that is considered the unmarked order for this class of verbs. Example (8b) shows the marked word order and is ambiguous. If the dominance and command relations encoded in trees are the only way in which scope relations are encoded, then these relations must be different in the two examples in (8). Given the Nontangling Condition in combination with certain additional assumptions, such as a restriction to binary branching structures for example, the conclusion that the hierarchical structures are different is, of course, already entailed by the linear order. A very simple account of the lack of ambiguity in (8a) and the presence thereof in (8b) can be given if the direct object in the marked order is moved across the indirect object. The general idea is that scope corresponds to c-command and that an element that has moved can be interpreted in its moved position or in the position of its trace(s) (see Aoun and Li 1989, 1993; Hornstein 1995; Lechner 1996, 1998 for proposals along these general lines). Neither of the objects has moved in (19a); therefore, scope relations are unequivocal, and since the accusative object c-commands the dative object, the former takes scope over the latter. In (19b), on the other hand, there are two potential scope positions for the dative object, one of which does and the other one of which does not c-command the accusative object. Consequently, the example is ambiguous.

(19) a. [DASS [er [[mindestens ein Geschenk] [[fast jede-m Gast] überreichte]]]] [German]
        THAT he at.least one.ACC.SG.N present almost every-DAT.SG.M guest handed
        (✓∃ > ∀, *∀ > ∃)
        'that he handed at least one present to almost every guest'

     b. [DASS [er [[fast jede-m Gast]_i [[mindestens ein Geschenk] [t_i überreichte]]]]]
        THAT he almost every-DAT.SG.M guest at.least one.ACC.SG.M present handed
        (✓∃ > ∀, ✓∀ > ∃)
        'THAT he handed at least one present to almost every guest'

The reasoning given here represents only the bare outline of the intricate arguments in the literature. Clearly, once the premise is accepted that scope relations are syntactically expressed exclusively in terms of c-command, a movement account of scrambling becomes all but unavoidable. We return to the issue of scope once we have discussed some of the alternatives. Another reason for assuming a movement analysis of scrambling can be derived from the hypothesis that thematic structures map in a uniform way onto underlying syntactic structures across languages (see Fanselow 2001 for critical discussion). Baker (1988) gave an influential formulation to this idea under the name of the Uniformity of Theta Assignment Hypothesis (UTAH), (20). UTAH and various slightly weaker versions discussed in the literature (see Baker 1997; Baltin 2001; Levin and Rappaport Hovav 1995; Perlmutter and Postal 1984; Pesetsky 1995; Rosen 1984 for relevant discussion) entail that if two sentences have the same thematic representation, as the sentences in (8) do, then the hierarchical organizations of the arguments in the underlying structure of the sentence must be identical. Therefore, at least one of the two sentences in (8) must deviate from the underlying structure and hence be derived through movement.

(20) The Uniformity of Theta Assignment Hypothesis (UTAH):
     Identical thematic relationships between items are represented by identical structural relationships between those items at the level of D-Structure. (Baker 1988: 46)

Once the trace-based account is in place, the structural difference between the scrambled and non-scrambled orders is used to derive other differences between the sentences. For example, Lenerz (1977) observed that examples with the neutral word order allow focus projection if the main sentence accent is on the immediately preverbal argument (and a number of further conditions are met) while sentences with the scrambled order do not allow focus projection (see Frey, this volume). Given the trace-based account of scrambling, the impossibility of focus projection with scrambled orders can be related to the presence of a trace. One can, for example, make the assumption that focus projection from a verbal satellite is possible only if that satellite is both the sister of the verb and selected by it. Consider an example where the neutral order is NOM-ACC-DAT, i.e., an example much like (19a), but without the additional complication of having quantificational objects. If focus projection is possible only from the sister of the verb, then focus projection from the accusative object is not possible: although the accusative is selected by the verb, it is not the verb's sister. In (19a) the dative is the sister of the verb rather than the accusative; hence, focus projection is correctly predicted to be possible from the dative here but not from the accusative. In (19b), the sister of the verb is the silent trace. Since the trace cannot act as the focus exponent, focus projection is banned altogether in this structure.
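The role of c-command and traces in the scope account can be made concrete in a small computational sketch. The tree encodings, category labels, and function names below are illustrative assumptions of ours, not part of any analysis cited above; the point is only that interpreting the scrambled dative either in its surface position or in its trace position yields the two readings of (19b):

```python
# Illustrative sketch: scope via c-command, with a moved quantifier
# interpretable in its surface position or in the position of its trace.

def find_path(tree, leaf, path=()):
    """Path of child indices from the root down to a leaf; None if absent."""
    if tree == leaf:
        return path
    if isinstance(tree, list):                 # node = [label, child, ...]
        for i, child in enumerate(tree[1:], 1):
            p = find_path(child, leaf, path + (i,))
            if p is not None:
                return p
    return None

def c_commands(tree, a, b):
    """a c-commands b: a's mother dominates b, and a does not dominate b."""
    pa, pb = find_path(tree, a), find_path(tree, b)
    return pb[:len(pa) - 1] == pa[:-1] and pb[:len(pa)] != pa

# (19a): [dass [er [[mindestens ein Geschenk] [[fast jedem Gast] V]]]]
base = ["CP", "dass", ["IP", "er", ["VP", "ACC", ["V'", "DAT", "V"]]]]

# (19b): the dative scrambled across the accusative, leaving a trace
scrambled = ["CP", "dass",
             ["IP", "er", ["VP", "DAT", ["VP", "ACC", ["V'", "t", "V"]]]]]

# Base order: only ACC c-commands DAT, so only the surface scope exists.
print(c_commands(base, "ACC", "DAT"), c_commands(base, "DAT", "ACC"))
# Scrambled order: DAT c-commands ACC from its surface position (the
# inverse reading), while ACC c-commands the trace (the surface reading).
print(c_commands(scrambled, "DAT", "ACC"), c_commands(scrambled, "ACC", "t"))
```

The asymmetry between one scope position in the base order and two in the scrambled order is exactly what underwrites the ambiguity of (19b) on this family of analyses.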
5.1. The A- vs. Ā-movement debate

In the Principles and Parameters theory an attempt was made to unify all movement operations under a single transformation, Move α (Lasnik and Saito 1992). Given its generality, conditions on this transformation had to be kept to a minimum. The bulk of the empirical limitations on movement came from representational constraints on the output of this transformation. The generalization that fillers generally c-command the site of the gap, for example, was handled by the Proper Binding Condition (Fiengo 1977), various locality constraints − by the Empty Category Principle (Chomsky 1981, 1986; Cinque 1990; Lasnik and Saito 1992; Rizzi 1990), etc. A representational device that was instrumental in the attempted unification of all movement transformations under Move α was the typology of abstract elements. Alongside abstract pronominals, the theory recognized exactly two kinds of traces: NP-traces and wh-traces. The former were assimilated to anaphoric elements: like anaphors, they were assumed to require a local c-commanding binder in an A(rgument)-position. (A-positions were standardly understood to be all those where a thematic role could potentially be assigned and, in addition,
the canonical subject position.) Wh-traces on the other hand were assimilated to referential expressions: like referential expressions, wh-traces were assumed to be incompatible with any kind of c-commanding coindexed expression in an A-position. There were no other traces: movement had to leave behind either an A-trace or an Ā-trace. The dichotomous partitioning of movement operations goes back to Postal’s (1971) study on the interaction of movement with binding and the assignment of reference. Postal observes that the range of logically possible interactions between movement, binding and reference is quite large. Nevertheless, he claims, only two kinds of interactions are observed in English: type A and type B. Since there is no third class, B is simply the complement of A, that is, B is non-A or Ā. Postal’s terminology was widely adopted, though with A and Ā given additional meaning. The relevant facts are clearest where bound-variable interpretations rather than simply coreferential interpretations of pronouns and anaphors are concerned. The examples in (21) illustrate the pattern found with raising, an A-movement operation in Postal’s typology. The examples in (21b) show that when raising does not take place, binding of the experiencer of seem is impossible, (21b [i]), and binding into the experiencer is likewise impossible, (21b [ii]). On the other hand when the subject of be a genius is raised into the main clause, binding becomes possible. The examples in (22) illustrate the pattern with Ā-movement. The examples show that when a wh-phrase moves across a pronoun this movement does not extend the binding domain of the wh-phrase. The violation in (22a [i]) is felt to be very severe and goes by the name of strong crossover, while the violation in (22a [ii]) is much less severe and goes by the name of weak crossover (Wasow 1972). 
The examples here crucially involve quantifiers to guarantee binding rather than anaphoric dependency without binding or simply coreference (see Reinhart 1983; Reinhart and Grodzinsky 1993; Williams 1997). The binding-theoretic approach to traces sketched above offers an immediate account of strong crossover, (22a [i]): the trace left behind by wh-movement behaves like a referential expression; therefore, it must not be c-commanded by a coindexed element in an A-position; in examples like (22a [i]) this condition is violated; hence, they are ungrammatical.

(21) a. (i) Every general_k seems to himself_k t_every general to be a genius.
        (ii) Every general_k seems to his_k brother t_every general to be a genius.
     b. (i) *It seems to him_k/himself_k that every general_k is a genius.
        (ii) *It seems to his_k brother that every general_k is a genius.

(22) a. (i) *Who_k did he_k see t_who?
        (ii) *Who_k did his_k brother see t_who?
     b. (i) *When did he_k see who_k?
        (ii) *When did his_k brother see who_k?
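The binding-theoretic account of strong crossover described in the text (a wh-trace, qua referential expression, must not be c-commanded by a coindexed element in an A-position) can be rendered as a small checker. The encoding below is an illustrative assumption of ours: leaves are triples of form, referential index, and whether the position is an A-position. Weak crossover, which does not fall under Condition C, is deliberately not covered:

```python
# Illustrative checker: Condition C applied to wh-traces derives strong
# crossover as in (22a i). Leaves are (form, index, is_A_position) triples.

def leaves_with_paths(tree, path=()):
    """All leaves paired with their path of child indices from the root."""
    if isinstance(tree, tuple):                  # a leaf
        return [(tree, path)]
    out = []
    for i, child in enumerate(tree[1:], 1):      # node = [label, child, ...]
        out += leaves_with_paths(child, path + (i,))
    return out

def c_commands(pa, pb):
    """Paths: pa's mother dominates pb, and pa does not dominate pb."""
    return pb[:len(pa) - 1] == pa[:-1] and pb[:len(pa)] != pa

def condition_c_ok(tree):
    """False iff a coindexed element in an A-position c-commands a trace."""
    leaves = leaves_with_paths(tree)
    for (_, i_a, a_pos), pa in leaves:
        for (form_b, i_b, _), pb in leaves:
            if form_b == "t" and a_pos and i_a == i_b and c_commands(pa, pb):
                return False
    return True

# (22a i) *Who_k did he_k see t_k? -- the pronoun A-binds the coindexed trace
soc = ["CP", ("who", "k", False),
       ["C'", ("did", None, False),
        ["IP", ("he", "k", True),
         ["VP", ("see", None, False), ("t", "k", False)]]]]
print(condition_c_ok(soc))                       # False: strong crossover

# Who_k did John see t_k? -- no coindexed A-binder, hence well-formed
ok = ["CP", ("who", "k", False),
      ["C'", ("did", None, False),
       ["IP", ("John", "j", True),
        ["VP", ("see", None, False), ("t", "k", False)]]]]
print(condition_c_ok(ok))                        # True
```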
It should be noted that there are well-known problems with the binding-theoretic treatment of traces as referential expressions. Traces of topicalized anaphors and pronouns behave respectively like anaphors and pronouns rather than like referential expressions, as discussed by Frey (1993) and Postal (1971, 2004). Postal (1971) noted that all of the A-movements, i.e., those that give rise to new binding relations, involved relatively local movements, none of them crossing finite clause boundaries.
The class of Ā-movements on the other hand includes unbounded movement operations like wh-movement, relativization, clefting, etc. Work in the Principles and Parameters tradition assumed that the two classes of binding behavior discovered by Postal correlated closely with the type of position an element moved to, and with the motivation for this movement. A-movement was seen as movement of nominal arguments from a position where their case could not be licensed to a different argument position where their case could be licensed: In the raising example in (21), the launching site of the movement is the position associated with the thematic interpretation of every general in the embedded infinitive and the landing site is the subject position in the matrix. The subject position is an A-position by assumption. Ā-movements on the other hand are movements to positions outside of the thematic and case systems, the non-A or Ā-positions: Examples like (22) involve movement to the specifier of CP, a position that is not involved in thematic or case-licensing. Later work correlated a number of further properties with Ā-movement: unboundedness on a par with wh-movement in the sense of Chomsky (1977) and parasitic gap licensing (Chomsky 1982). In this context, the question arises where scrambling falls in the typology of movement operations. Is it an A-movement operation or an Ā-movement operation? This question cannot be answered straightforwardly. On the one hand, one can argue from the principles of the theory (Stechow and Sternefeld 1988) that scrambling cannot be A-movement, because it does not target an A-position, i.e., a potential thematic or case position. This is clear for prepositional phrases. The examples in (23) illustrate scrambling of argumental PPs. Only the scrambled orders are given, in the unmarked order the subject precedes the PP. 
The movement of the PP behaves like scrambling in that it induces a scopal ambiguity, (23a), and allows binding into the subject, (23b). However, unlike nominal arguments, prepositional arguments do not need to be case-licensed; hence, the examples in (23) cannot be accounted for in terms of movement to a case position.

(23) a. weil über mindestens ein echtes Problem fast jeder Doktorand nachgedacht hat [German]
        because about at.least one real problem almost every PhD.student after.thought has
        (∃ > ∀, ∀ > ∃)
        'because almost every PhD student has thought about at least one real problem'

     b. weil mit jedem Kind_i seine_i Mutter geschimpft hat
        because with every child its mother scolded has
        'because his_i mother scolded every child_i'

Further examples of PP scrambling add to this argument. Example (24) (from Frey and Pittner 1998) illustrates scrambling of a non-argumental instrument PP. Again, the scope ambiguity is taken to be one of the diagnostic properties of scrambling; hence, (24b) represents the scrambled order. In the Principles and Parameters framework, adjuncts can never move to A-positions, because the resulting movement chain would be ill-formed (Chomsky and Lasnik 1993). Hence, scrambling cannot be A-movement.
(24) a. WEIL an mindestens einem Abend mit fast jedem Computer gearbeitet wurde [German]
        because at at.least one evening with almost every computer worked was
        (∃ > ∀, *∀ > ∃)
        'because work was done on at least one evening with almost every computer'

     b. WEIL mit mindestens einem Computer an fast jedem Abend gearbeitet wurde
        because with at.least one computer at almost every evening worked was
        (∃ > ∀, ∀ > ∃)
        'because work was done with at least one computer on almost every evening'

Finally, (25) illustrates that scrambling can split noun phrases (Fanselow 1987; Kuthy and Meurers 2001; G. Müller 1995; S. Müller 1999 for discussion and references). Again this movement displays the characteristic properties of scrambling in that it gives rise to new binding relations and induces a scope ambiguity. Clear cases of A-movement never split up noun phrases; hence, scrambling is not A-movement. The NPs split in (25) are headed by the noun 'books', which takes the about-PP as an optional complement.

(25) a. WEIL über jeden Popstar_i seine_i Fans Bücher aus der Bibliothek ausgeliehen haben [German]
        because about every popstar his fans books from the library checked.out have
        'because his_i fans have checked out books about every popstar_i from the library'

     b. WEIL über mindestens einen Popstar fast jeder Student ein Buch aus der Bibliothek ausgeliehen hat
        because about at.least one popstar almost every student a book from the library checked.out has
        (∃ > ∀, ∀ > ∃)
        'because almost every student has checked out a book about at least one popstar from the library'
5.1.1. Anaphors and reciprocals Regarding anaphor and reciprocal binding, the generally agreed upon fact is that in a double object construction a dative object cannot bind an accusative reciprocal no matter what the order of the two elements is, (26a). On the other hand the accusative object can bind a dative reciprocal, (26b) − there is a certain amount of disagreement to what extent this depends on the order of the two objects. It is equally clear that a subject reciprocal can never be bound by an object, independently of the word order, (26d−e).
40. Word Order Gäst-e-n einander | einander (26) a. *dass ich {den that I the.DAT.PL guest\PL-PL-DAT each.other each.other den Gäst-e-n} vorgestellt habe the.DAT.PL guest\PL-PL-DAT introduced have intended: ‘that I introduced the guests to each other’
1417 [German]
Gäst-e einander vorgestellt habe b. dass ich die that I the.ACC.PL guest\PL-PL.ACC each.other introduced have ‘that I introduced the guests to each other’ vorgestellt habe c. ?dass ich einander die Gäst-e that I each.other the guest\PL-PL.ACC introduced have ‘that I introduced the guests to each other’ Fisch und der Frosch einander angeguckt d. dass der that the.NOM.SG.M fish and the.NOM.SG.M frog each.other at.looked haben have ‘that the fish and the frog looked at each other’ Fisch und den Frosch einander | einander e. *dass {den that the.ACC.SG.M fish and the.ACC.SG.M frog each.other each.other den Fisch und den Frosch} angeguckt haben the.ACC.SG.M fish and the.ACC.SG.M frog at.looked have intended: ‘that the fish and the frog looked at each other’ The conclusions to be drawn from these facts and the much murkier judgments involving the reflexive sich have varied substantially. For a representative sample, see Frey 1993; Haider 2006; Haider and Rosengren 2003; G. Müller 1995, 1999; Putnam 2007. The problem is the following. On the assumption that the underlying order of objects for vorstellen − ‘introduce’ is indirect object (dative) before direct object (accusative), (26b) represents the scrambled order. The fact that the accusative may antecede the reciprocal seems to show that scrambling behaves like English A-movement (see [21] above). If this is taken to be the core fact, the rest of the observations in (26) have to be attributed to independent factors: the fact that the dative cannot antecede the accusative reciprocal can be traced to a restriction against accusative reciprocals not being able to co-occur with dative DPs (Frey 1993: 113), that the order in (26c) is derived from that in (26b) by further movement of the reciprocal which is not scrambling (see Gärtner and Steinbach 2000 for reasons to be skeptical), the observations in (26e) must be attributed to some special status of subjects in the binding theory, etc. 
Alternatively, we could take (26e) as the starting point and conclude that scrambling is not A-movement and therefore does not allow binding of a co-argument anaphor or reciprocal from the derived position. This might then be coupled with the assumption that the order in (26b) is the underlying order (G. Müller 1995, 1999). Finally, one might conclude that reciprocal and anaphor binding in German operates in terms of a case or argument hierarchy rather than the phrase-structural c-command hierarchy (Grewendorf 1985). At the present level of understanding (Sabel 2002; Sternefeld and Featherston 2003), no firm arguments can be based on these facts. The central question for the A- vs. Ā-movement debate is whether scrambling ever has an effect on reciprocal and anaphor binding in German, a question which has not been settled.
VI. Theoretical Approaches to Selected Syntactic Phenomena
5.1.2. Parasitic gaps

The second inconclusive argument revolves around the question whether scrambling licenses parasitic gaps. Example (27a) is a standard case of a parasitic gap. The example indicates the position assumed for the trace and the parasitic gap according to standard Principles and Parameters analyses. Example (27b) shows that the presence of a gap in the object position of read is indeed parasitic upon the presence of a gap in the object position of file. Without the latter gap, the former is illicit. The following two additional generalizations are at the heart of the argument concerning the A- vs. Ā-movement nature of scrambling: (i) the real gap, the trace in (27a), may not c-command the parasitic gap, (28), and (ii) only Ā-movement but not A-movement licenses parasitic gaps, (27) vs. (29).

(27) a. What did John file twhat without reading ePG?
     b. *What did John file the book without reading twhat?

(28) a. Who did you [[run into twho] [without recognizing ePG]]?
     b. *Who twho [[ran into you] [without (you) recognizing ePG]]?

(29) *The book was filed tthe book without reading ePG.
Examples like (30) were discussed by Felix (1985), who analyzes them in the way indicated, that is, as structures with parasitic gaps licensed by scrambling. If this is the correct analysis and if parasitic gaps are indeed licensed only by Ā-movement, then (30) provides a strong argument against an A- and for an Ā-movement analysis of scrambling.

(30) weil ihn [ohne ePG interviewt zu haben] er tihn einstellte [German]
     because 3SG.ACC.M without interviewed to have 3SG.NOM.M hired
     'because he hired himk without having interviewed himk'

The argument is not straightforward, however. Example (31) is taken from Webelhuth (1992: 207) and is probably the most famous example in this debate. The example is intended to show that scrambling can simultaneously exhibit A-properties and Ā-properties. The A-property in this example is the lack of a weak-crossover effect and the creation of binding into the dative object by the quantified accusative object; the Ā-property is the licensing of a parasitic gap in the infinitival adjunct. The simultaneity of A-properties with Ā-properties has come to be known as Webelhuth's paradox. Example (31) was intensely discussed in the subsequent literature, which tried to resolve the paradox for Government and Binding theory (see many of the papers in Corver and Riemsdijk 1994).

(31) ?Peter hat jede-n Gasti [ohne ePG anzuschauen] sein-emi Nachbar-n ti vorgestellt. [German]
     Peter has every-ACC.SG.M guest without look.AT.INF his-DAT.SG.M neighbor-DAT.SG.M introduced
     'Peter introduced every guest to his neighbor without looking at him.'
An important approach to the paradox was proposed by Mahajan (1990: 60), who uses the contrast between (31) and the much worse (32) to argue that the A- and Ā-properties of scrambling are not simultaneous. Rather, on Mahajan's analysis, there is an initial step of A-movement, followed by a subsequent step of Ā-movement. This analysis allows (31), where the A-property is exhibited lower in the tree than the Ā-property, but in conjunction with the ban on improper movement it disallows (32), where the Ā-property is established lower than the A-property.

(32) *?Peter hat jede-n Gasti sein-emi Nachbar-nj [ohne ePG anzuschauen] tj ti vorgestellt [German]
     Peter has every-ACC.SG.M guest his-DAT.SG.M neighbor-DAT.SG.M without look.AT.INF introduced
     'Peter introduced every guest to his neighbor without looking at him.'
However, Mahajan's solution to the paradox is not viable. Various authors have pointed out that the contrast between (31) and (32) does not stem from the illicit binding relation between the accusative object and the possessor in the dative object, but rather from the fact that the accusative has been too far removed from the infinitive containing the putative parasitic gap (Fanselow 1993; Lee and Santorini 1994; G. Müller and Sternefeld 1994). Thus Fanselow (1993: 34) claims that a scrambled object cannot be separated from the infinitive containing the parasitic gap except by adjuncts and subjects. The degradation in (32) then comes from the lack of adjacency between the accusative and the infinitival. Neeleman (1994: 403) provides the acceptable Dutch example (33). The example involves two stacked adjuncts: the higher one exhibits surface binding, the lower one a parasitic gap. Because of the hierarchical arrangement of the adjuncts, the example is not amenable to Mahajan's solution. Similar German examples, like (34), seem to be as acceptable as (31) and much better than (32).

(33) Dat Jan [de rivalen]i namens elkaari [Oi [zonder ti aan te kijken]] feliciteert [Dutch]
     that Jan the rivals on.behalf.of each.other without at to look congratulates
     'That Jan congratulates the rivals in each other's name without looking at them'

(34) weil du [jede-n Gast]i an sein-emi Geburtstag [ohne ePG anzugucken] ti umarmt hast [German]
     because you every-ACC.SG.M guest on his-DAT.SG.M birthday without look.at.INF hugged have
     'because you hugged every guest on his birthday without looking at him'

On the assumption that the example in (34) involves a parasitic gap, it provides evidence against Mahajan's account in terms of a succession of A- and Ā-movements and reaffirms the existence of Webelhuth's paradox. Deprez (1994: 128) gives the example in (35) to make a similar point. The infinitive contains both the parasitic gap and the bound pronoun.
The example shows that the object in (35) can simultaneously bind a pronoun and license a parasitic gap.
(35) weil Maria jede-n Gast [ohne sein-em Partner ePG vorzustellen] allein t läßt [German]
     because Maria every-ACC.SG.M guest without his-DAT.SG.M partner introduce.INF alone lets
     'because Maria lets every guest alone without introducing him to his partner'

As presented, Deprez's argument is inconclusive, as it rests on the untested assumption that the parasitic gap in the infinitival cannot scramble (Lee and Santorini 1994: 294 fn. 15). Since scrambling is clause-bound, it is in principle clear how to control for scrambling of the parasitic gap. Relevant examples would have to have the form in (36). As far as I know, this question has not been studied.

(36) [CP₁ whk [… [inf adjunct … [hisk N] ePG …] [CP₂ … twh …]]]

A more fundamental way out of the paradox is taken by authors who deny that parasitic gaps are involved to begin with (Fanselow 2001; Haider 1993; Haider and Rosengren 2003; Kathol 2001). These authors question whether the relevant constructions involve parasitic gaps at all. Haider and Rosengren (2003: 243) make the following observation. Postal (1993a), in his discussion of distributional differences between parasitic gaps and secondary gaps in across-the-board (ATB) constructions, claims that parasitic gaps are impossible in contexts that Postal (1998) came to call antipronominal, that is, contexts where anaphoric pronouns cannot appear, while across-the-board gaps are possible in such environments. Relevant examples from Postal (1993b) that contrast parasitic gaps with ATB gaps are given in (37).

(37) a. (i) *Where did Elaine work twhere without ever living ePG?
        (ii) Where did Elaine work twhere and Gwen vacation twhere?
     b. (i) *What he became twhat without wanting to become ePG was a traitor.
        (ii) What Ted was twhat and Greg intended to become twhat was a doctor.
     c. (i) *This is a topic about which he should think tabout which before talking ePG.
        (ii) This is a topic about which you should think tabout which and I should talk tabout which.

Crucially, the putative parasitic gaps licensed by scrambling in German pattern with ATB gaps in English rather than with parasitic gaps in that they are possible even in antipronominal contexts. This suggests that the empty category in the examples in (38) is an ATB gap rather than a parasitic gap.

(38) a. Wo hat Elena, anstatt mit dir e zu wohnen, ihr Büro eingerichtet? [German]
        where has Elena instead.of with you to live her office set.up
        'Where did Elena set up her office instead of living there together with you?'
     b. Was er wurde, ohne eigentlich e werden zu wollen, war ein Syntaktiker.
        what he became without actually become to want was a syntactician
        'What he became without really wanting to do was become a syntactician.'
     c. Das ist ein Thema, über das er, anstatt e zu schwätzen, nachdenken sollte.
        that is a topic about which he instead.of to chat think.after should
        'This is a topic he should think about instead of chatting.'

Conversely, parasitic gaps but not ATB gaps can be licensed in complex subjects, (39). Again, example (40) (from Haider and Rosengren 2003: 243) shows that the putative parasitic gaps in German pattern with English ATB gaps rather than with parasitic gaps. Fanselow uses related arguments against the parasitic gap analysis of examples like (30) and (31). He suggests that the relevant infinitival subordinators ohne − 'without', anstatt − 'instead of', etc. act as 'quasi coordinating conjunctions' and that examples like (30) and (31) result from ellipsis under quasi coordination. For a different analysis of these facts see Kathol (2001).

(39) a. He's a man that anyone who talks to ePG usually likes t
     b. *He's a man that anyone who talks to t and anyone who sees t leaves immediately.

(40) *Welches Haus wollte jeder, dem er e zeigte, twelches Haus sofort kaufen [German]
     which house wanted everyone whom he showed at.once buy
In view of the rather drastic differences between the German construction under discussion here and parasitic gaps in English, the least we can conclude is that the argument for the Ā-status of scrambling based on these facts rests on a very weak foundation.
5.1.3. Conclusion

At the beginning of this subsection, we noted that proponents of the Ā-movement analysis of scrambling often point out that the set of categories that undergo scrambling is a superset of those that undergo A-movement. This was illustrated for PPs above, which are not (at least not outside of locative inversion) assumed to undergo A-movement. This does not mean, however, that the set of categories that undergo scrambling is identical to the set undergoing standard Ā-movement. That this expectation of an Ā-movement account of scrambling is frustrated is illustrated in (41), which contrasts the possible topicalization of a separable verbal prefix with the impossibility of scrambling this prefix. The judgments given below assume that no focal stress is placed on aus. Movement of focused phrases in the Mittelfeld is generally taken not to be scrambling (Haider and
Rosengren 1998; Lenerz 1977, 2001; G. Müller 1999; Stechow and Sternefeld 1988) and to show substantially different behavior from scrambling (see also Neeleman 1994).

(41) a. Aus hat er das Radio sicher nicht gemacht. [German]
        off has he the radio certainly not made
        'He has certainly not turned off the radio.'
     b. dass (*aus) er (*aus) das Radio (*aus) sicher (*aus) nicht *(aus) gemacht hat
        that off he off the radio off certainly off not off made has
        'that he has certainly not turned off the radio'

Mittelfeld scrambling differs from A-movement in terms of the categories targeted and in terms of locality (extraction from NP is allowed). Mittelfeld scrambling also differs from Ā-movement in terms of the categories targeted and in terms of locality (extraction across finite clause boundaries is disallowed). Scrambling differs from Ā-movement in terms of its cross-over behavior, and potentially also from A-movement with respect to anaphor and reciprocal binding. It seems clear, then, that scrambling is neither A-movement nor Ā-movement.
5.2. The trigger problem

Under standard Government and Binding-theoretic assumptions, movement was a free operation, its output subject to a number of constraints whose function was to curb the generative power of the free movement operation. This meant in particular that the question why a particular movement happened was not of primary importance, as long as the result did not violate any constraints. In this context, proposals were made that linked the availability of scrambling in a particular language to the availability of landing sites in that language (e.g., G. Müller 1995). The theory did not require the analyst to identify triggers for a particular movement. The advent of the Minimalist Program has brought a change in perspective. The copy theory of movement (Chomsky 1993) made obsolete the dichotomous treatment of movement gaps as either anaphors (A-movement) or R-expressions (Ā-movement). Under the copy theory, movement gaps are filled by much richer and much more flexible objects than traces. Among other things, copies allow for simple solutions to the problems encountered by the binding-theoretic treatment of traces mentioned below example (22). In Minimalism, movement is no longer viewed as free in principle, but is subject to a last-resort condition, under which an item may move only if it has to (a.o. Chomsky 1995b, 2000; Lasnik 1995; Stroik 1999). A further constraint on theorizing comes from the idea that movement is driven by features that must have either a morphological or an interpretive reflex (Chomsky 1995a: section 4.10). While this change in perspective on movement in general has rendered the debate on the A- vs. Ā-nature of scrambling somewhat theoretically obsolete (Kidwai 2000; Putnam 2007), the underlying issues have not been resolved, and indeed, the same question arises in other frameworks, as we will see shortly: How many different ways do natural
languages provide for establishing antecedent-gap relationships? How do these interact with each other (Abels 2007) and with interpretive (scope and anaphoric) properties? What generalizations, if any, govern the relation between the length/landing site/trigger of a movement operation and its semantic behavior? How can we account for such generalizations (for a general approach see Williams 2002, 2011)? The Minimalist perspective brings into sharp focus a different issue: Why does scrambling apply? In Minimalism, the answer to this type of question lies in identifying the trigger for scrambling. The concrete suggestions have ranged from case (Zwart 1993), via a number of semantic features (e.g., topic in Meinunger 2000, scope in Hinterhölzl 2006, referentiality in Putnam 2007), to purely formal triggers (Grewendorf and Sabel 1999; G. Müller 1998). All of these are somewhat problematic. Linking scrambling to case is problematic because it leads to the wrong expectation that only noun phrases will undergo scrambling. The other proposals have similar shortcomings. Linking scrambling to scope leads to the wrong expectation that non-scopal elements (such as proper names) do not scramble and that scrambling across them does not happen, since such scrambling would have no scopal consequences. If the trigger for scrambling were scope, the fact that scope reconstruction is compatible with scrambling would have to remain mysterious. The proposals that scrambling is triggered by a referentiality feature or a topic feature stumble on the fact that quantifiers scramble, although they are clearly not referential and make for bad topics. Purely formal features triggering scrambling may be able to describe the data correctly, but they shed no light on the nature of scrambling. One of the problems for a triggering account, as the previous paragraph shows, is that scrambling does not seem to have a uniform effect.
Haider and Rosengren (2003) have taken this as an argument for a return to an account where antecedent-trace relationships can be created without a trigger. Grewendorf (2005) follows the opposite strategy, claiming, in essence, that scrambling is not a unified phenomenon and should be further analyzed into a set of different movement operations triggered by different features and targeting slightly different dedicated positions in the Mittelfeld, basing his analysis on Belletti (2004). Grewendorf (2005) is of course not alone in suggesting a multi-factorial analysis of scrambling. Optimality-theoretic accounts (Choi 1996, 1997, 2001; Cook and Payne 2006; G. Müller 1999) and similar competition-based accounts (Wurmbrand 2008) are inherently multi-factorial. Such accounts allow the same word order patterns to be conditioned by different factors, which is their advantage. However, as argued by G. Müller (1999), standard optimality theory is incapable of capturing the fact that in any given context more than one scrambled or unscrambled word order may be acceptable. At present, there is no satisfactory solution that has been shown to work over a broad range of facts. Scrambling remains a theoretical problem for Minimalism, as it appears to defy the condition that movement is a last resort. We now turn our attention away from approaches where the unmarked and scopally unambiguous order of elements is represented in the (abstract) surface constituent structure in the form of traces. There are two types of approaches that avoid traces in their treatment of scrambling: those that do adhere to the Nontangling Condition and those that do not.
6. Base generated adjunction structures: LFG

The treatment of scrambling in Lexical Functional Grammar is typical of a traceless phrase-structure based approach. Similar traceless base generation accounts have also been proposed in other frameworks; see for example Bayer and Kornfilt (1994), Fanselow (2001), Kiss (2001), Neeleman (1994), and Neeleman and van de Koot (2002). Phrase structure in Lexical Functional Grammar strictly adheres to the Nontangling Condition. However, scope and binding are not expressed in terms of dominance relations in phrase structure trees alone, and the idea that there is universal alignment of thematic structure with abstract phrase structure is also not part of the theory; this removes two of the main arguments for the movement analysis in Government and Binding theory and Minimalism. On these assumptions, scrambled structures without traces can easily be entertained. Bresnan (2001) suggests an approach in which the grammatical function of an element in a scrambled position can be recovered using case information. The general schema for such associations, which is restricted to adjoined positions, is given in (42). The schema licenses structures for scrambling where the scrambled elements are base generated in VP-adjoined positions. Case rather than configuration identifies the grammatical function of elements, which allows them to appear in any order. Crucially, (42) allows grammatical functions to be identified only in the local f-structure, which encodes the observation that scrambling is clause-bounded.

(42) Morphological Function Specification via dependent marking
     (↓ CASE) = k ⇒ (↑ GF) = ↓
     (Bresnan 2001: 111)

To go with this analysis, Bresnan (1998, 2001) formulates a binding theory designed to capture the observation that in some languages long movement does, but clause-internal scrambling does not, give rise to weak-crossover effects. This was illustrated above for Japanese ([11] vs. [10]) and German ([14] vs. [9]).
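The intuition behind case-driven function specification can be made concrete with a small sketch. This is an illustration only, not LFG machinery: the case-to-function table and the German phrases are invented for the example.

```python
# Illustrative sketch: case marking, rather than phrase-structure position,
# identifies grammatical functions, in the spirit of Bresnan's schema (42).
# The table below is an invented simplification for German ditransitives.
CASE_TO_GF = {
    "NOM": "SUBJ",       # nominative -> subject
    "ACC": "OBJ",        # accusative -> primary object
    "DAT": "OBJ_theta",  # dative -> secondary object
}

def identify_functions(nps):
    """Map (form, case) pairs to grammatical functions, ignoring order."""
    return {CASE_TO_GF[case]: form for form, case in nps}

# Two scrambled orders of the same clause yield the same function
# assignments, which is why sufficiently rich case morphology is taken
# to license free ordering:
base_order = [("ich", "NOM"), ("dem Gast", "DAT"), ("das Buch", "ACC")]
scrambled = [("das Buch", "ACC"), ("dem Gast", "DAT"), ("ich", "NOM")]
assert identify_functions(base_order) == identify_functions(scrambled)
```

Because the lookup never consults linear position, any permutation of the case-marked NPs maps onto the same f-structure, mirroring the base-generation analysis of scrambling.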
Bresnan suggests that binding relations can be read off at different levels. The domain of a binder is the minimal clause or predication structure containing it. Furthermore, a binder must be at least as prominent as any pronoun bound by it. This prominence requirement holds across levels, but the definition of prominence differs from level to level. On a-structure, prominence is defined in terms of a thematic hierarchy (agent > beneficiary > experiencer/goal > instrument > patient/theme > locative); on f-structure, prominence is defined as higher rank in the relational hierarchy of grammatical functions (SUBJ > OBJ > OBJθ > OBLθ > COMPL > ADJUNCTS); finally, on c-structure, prominence is defined in terms of linear order. The precise notion invoked is f-Precedence, defined as follows:

(43) Definition of f-Precedence
     Given a correspondence mapping φ between a c-structure and its f-structure, and given two subsidiary f-structures α and β, α f-precedes β if the rightmost node in φ−1(α) precedes the rightmost node of φ−1(β). (Bresnan 2001: 195)

According to this definition, prominence on c-structure is determined in terms of the collection of c-structure nodes that correspond to a particular f-structure. An f-structure α f-precedes an f-structure β just in case every correspondent of α precedes some correspondent of β. Bresnan further assumes that long-distance filler-gap relations are mediated via inaudible traces. Therefore, the f-structure that a long-distance displaced element corresponds to has, in fact, two correspondents on c-structure: the filler and the trace at the site of the gap. This f-structure f-precedes another f-structure only if the other f-structure has a correspondent that follows both the filler and the gap corresponding to the former. With these notions of prominence at different levels in place, Bresnan argues that there is variation regarding the type of prominence that is relevant to binding theory in different languages. Languages can vary as to whether a binder has to be more prominent than a bound pronoun in terms of f-Precedence, syntactic rank, or the thematic hierarchy. Disjunctive and conjunctive formulations are also allowed. For a language like German, in which local scrambling does not give rise to weak-crossover effects but long movement does, Bresnan assumes that prominence can be construed in terms of f-Precedence. Bresnan also assumes that if a constituent containing a pronoun scrambles, then the binder of the pronoun may follow it, just in case it is more prominent on the relational hierarchy. This is accounted for by assuming that prominence may also be construed in terms of syntactic rank. In other words, an unscrambled or locally scrambled argument can always bind (into) arguments that follow it, but it can bind (into) arguments that precede it only if it, the binder, is more prominent syntactically than the argument which is being bound (into). This accounts directly for examples like (9). The overall formulation of the binding theory for a language like German is therefore disjunctive: prominence on c-structure or prominence on f-structure. Bresnan (2001: 91) assumes that there is an economy condition on the insertion of traces, (44).
(44) Economy of Expression
     All syntactic phrase structure nodes are optional and are not used unless required by independent principles (completeness, coherence, semantic expressivity). (Bresnan 2001: 91)

This principle entails that local scrambling, local topicalization, and local wh-movement never leave a trace in German. The reason is that local scrambling, topicalization, and wh-movement never require traces for completeness, coherence, or expressivity. Therefore, postulating a trace would involve positing an extra node that is not required, which is disallowed under the principle of economy of expression. In Bresnan's theory there is a fundamental distinction between local movement and long-distance movement. Local wh-movement, local topicalization, and local scrambling are predicted to pattern together and to behave differently from long wh-movement and long topicalization. Most other theories predict that wh-movement and topicalization behave the same way, whether long or short, and distinguish them from short scrambling. The reconstructive behavior of short scrambling, short wh-movement, and short topicalization furnishes a relevant test of these divergent predictions. While all three operations allow an object to reconstruct for binding below the subject, Frank, Lee, and Rambow (1992), Frey (1993), and Lechner (1998) claim that a scrambled direct object cannot reconstruct for binding below an indirect object. Topicalized and wh-moved direct objects, on the other hand, readily reconstruct under an indirect object. This state of affairs undermines one of the fundamental assumptions of Bresnan's account. (Wurmbrand 2008 provides a somewhat more nuanced description of the scrambling facts, but the essence of the problem for Bresnan's account remains.)
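The interplay of f-Precedence, definition (43), with the trace assumption for long-distance dependencies can be sketched computationally. The positions and labels below are invented for the illustration; this is not Bresnan's own formalism.

```python
# Minimal sketch of f-Precedence as in (43): alpha f-precedes beta iff the
# rightmost c-structure node corresponding to alpha precedes the rightmost
# node corresponding to beta. phi_inverse maps an f-structure label to the
# linear positions of all its c-structure correspondents (for long-distance
# movement this includes both the filler and the trace).
def f_precedes(alpha, beta, phi_inverse):
    return max(phi_inverse[alpha]) < max(phi_inverse[beta])

# Local scrambling leaves no trace (Economy of Expression), so a fronted
# binder has a single correspondent and f-precedes a following pronoun:
local = {"binder": [2], "pronoun": [5]}
assert f_precedes("binder", "pronoun", local)

# Long movement is mediated by a trace: the filler corresponds to the
# fronted position (0) AND the trace (8). A pronoun at position 5 precedes
# the rightmost correspondent, so the filler does not f-precede it; this is
# how the account derives weak crossover under long movement only.
long_distance = {"filler": [0, 8], "pronoun": [5]}
assert not f_precedes("filler", "pronoun", long_distance)
```

The contrast between the two dictionaries reproduces the asymmetry the binding theory is built on: local displacement preserves f-precedence of the binder, while trace-mediated long displacement destroys it.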
A number of further questions for this approach have been raised in the literature. Berman (2003: 84) discusses the fact that subject-experiencer verbs allow backwards binding even if the order is nominative before accusative:

(45) … weil seinei Mutter jedeni interessiert. [German]
     because his mother everyone interests
     '… because everyone is interested in their mother'

Here jeden neither f-precedes the bound pronoun nor outranks the subject on the relational hierarchy. Bresnan's formulation of the binding theory therefore fails to predict that examples like this are acceptable. A solution can be given, Berman argues, if prominence is defined as the disjunction of f-precedence and thematic prominence. Berman (2003: 85 fn. 12) points out another problem for Bresnan's formulation of the binding theory:

(46) *Jedeni geliebt hat seinei Mutter [German]
     everyone loved has his mother
Here the operator jeden precedes − and indeed f-precedes − the pronoun, yet binding of the pronoun is impossible. Cook and Payne (2006) raise the more fundamental point that a disjunctive formulation is inherently non-explanatory. Their paper is concerned with scope rather than binding, but the criticism of a disjunctive scope theory carries over mutatis mutandis to a critique of the disjunctive binding theory. Another issue that remains unaddressed is raised by examples like (25) above. Example (25) was used to illustrate the possibility of scrambling a PP out of an NP. Notice now that the PP would be realized as a c-structure daughter of a node that corresponds to the f-structure of the verb. The relation between the PP and the gap would therefore have to be mediated via a trace, so that the scrambled PP in (25) does not f-precede the pronoun. Since the PP in addition fails to outrank the subject, Bresnan's account predicts a cross-over effect here, counter to fact. Finally, to account for scrambling of adjuncts (Frey and Pittner 1998), additional assumptions would have to be invoked. Frey and Pittner themselves argue for a trace-based analysis of argument and adjunct scrambling and assign different classes of adjuncts different base positions. A straightforward translation of this into LFG would be to assume that there is a hierarchy of adjunct grammatical functions, but allowing caseless adjuncts to scramble would threaten the idea of function identification on the basis of case. The LFG account just sketched highlights an important question: What is the conditioning factor licensing scrambling? Under the LFG account, freedom in word order is tied to the availability of function specification either via head marking (not discussed above), i.e., agreement on the verb, or via dependent marking, (42). Under this theory, a language requires sufficiently differentiated case morphology to allow scrambling. The same intuition is expressed by Neeleman and Weerman (1999).
In their account, case is always expressed as a syntactic head, but when this head fails to be expressed through a morphological case paradigm, it is subject to the Empty Category Principle. The Empty
Category Principle, according to Neeleman and Weerman (1999), derives case adjacency effects in languages like English and the possibility of scrambling in languages with overt case morphology. The truth of this correlation between case morphology and scrambling has been questioned. Dutch is often cited as a counterexample to the claim that rich case paradigms are a necessary condition for scrambling, since Dutch has no morphological case marking on full nominals yet allows a certain degree of word-order freedom in its Mittelfeld. This word-order freedom, which is also called scrambling in the literature, is much more restricted than scrambling in German; Dutch scrambling generally cannot permute arguments with each other but only arguments with adjuncts. When arguments are permuted in the Dutch Mittelfeld, the scope and binding patterns closely resemble those found in long-distance wh-extractions (Neeleman 1994), i.e., scope reconstruction is obligatory and weak-crossover effects do obtain. Dutch therefore does not seem to exhibit scrambling of the type found in German, Japanese, Hindi, and Persian; crucially, it also does not show morphological case. Nevertheless, the connection between scrambling and case has recently been called into question by Putnam (2007), who claims that some of the German heritage dialects of North America allow scrambling of the German type even in the absence of case morphology. Unfortunately, the examples provided by Putnam (2007) do not establish the point clearly. A different connection that has been made, but which is not expressed in the LFG account of scrambling, is one between head-finality and the availability of scrambling. Haider (1997, 2006) and Riemsdijk and Corver (1997), among others, claim that head-finality is a prerequisite for scrambling. In German, for example, scrambling is possible only in head-final phrases (verb phrases, the Mittelfeld, adjective phrases), but it is impossible in head-initial ones (noun phrases).
The following examples from Haider (2006: 206) illustrate this point.

(47) a. [VP Dem Subjekt den erste-n Platz streitig gemacht] hat das Objekt. [German]
        the.DAT.SG.N subject the.ACC.SG.M first-ACC.SG.M position contested made has the.NOM.SG.N object
        'The object has competed for the first position with the subject.'
     b. [VP Den erste-n Platzi dem Subjekt ei streitig gemacht] hat das Objekt.
        the.ACC.SG.M first-ACC.SG.M position the.DAT.SG.N subject contested made has the.NOM.SG.N object
        'The object has competed with the subject for the first position.'

(48) a. der [AP dem Briefträger in vielen Merkmalen nicht unähnliche] Sohn der Nachbarin [German]
        the.NOM.SG.M the.DAT.SG.M postman in many features not dissimilar son the.GEN.SG.F neighbor.F
        'the son of the neighbor resembling the postman in many features'
     b. der [AP in vielen Merkmalen dem Briefträger nicht unähnliche] Sohn der Nachbarin
        the.NOM.SG.M in many features the.DAT.SG.M postman not dissimilar son the.GEN.SG.F neighbor.F
        'the son of the neighbor resembling the postman in many features'
The same is claimed to be true in a cross-linguistic perspective. According to the authors just mentioned, scrambling occurs only in head-final languages. There are superficial counterexamples to this claim (e.g., Russian and some of the other Slavonic languages). Riemsdijk and Corver (1997) limit the scope of their claim to neutral scrambling, that is, scrambling which does not require the scrambled element to be interpreted as focal, contrastive, or topical, and claim that once this is taken into account, the generalization that scrambling is possible only in head-final structures is correct. Another potential counterexample is Yiddish. Yiddish is often analyzed as basically VO (Diesing 1997 a.o.), but it does allow scrambling. The analysis of the basic word order in Yiddish remains disputed (Haider and Rosengren 2003; Vikner 2001). While both case and head directionality may well play a role in licensing scrambling, the issue needs to be investigated further.
7. Word-order domains: HPSG

The last type of account to be considered here comprises accounts that do not involve movement and that abandon the Nontangling Condition. Most work of this type has been done in the tradition of HPSG (for German see for example Kathol 2000; S. Müller 2004; Reape 1994, 1996). The account rests on a clean separation between hierarchical and linear information. Earlier work that separated out statements about immediate dominance from those concerning linear precedence (see a.o. Falk 1983; Gazdar 1981; Gazdar et al. 1985; Sag 1987) had maintained the Nontangling Condition, but it is given up under Reape's concept of word-order domain. The central idea behind word-order domains is the claim that in certain domains, hierarchical structure and word order are independent of each other: the structure is hierarchically organized, but ordering proceeds as if on a flat structure. Reape first introduces a relation called domain union, notated '○'. Domain union is related to the shuffle operator of formal language theory. Intuitively, two lists stand in the domain union relation to a third list if all and only the elements from the first two lists occur in the third list, the relative order of elements in the first list is observed in the third list, and the relative order of elements in the second list is likewise observed in the third list; thus, the two lists in (49) stand in the domain union relation to those in (49a) but not to those in (49b).

(49)
a. [not reproduced]
b. [not reproduced]
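Reape's domain union relation, i.e. the shuffle of two lists, can be made computationally concrete in a short sketch. The function name and the list encoding below are the editor's, not Reape's formalism:

```python
from itertools import combinations

def shuffles(xs, ys):
    """All interleavings of xs and ys that preserve the relative order
    of the elements of each input list (the shuffle operator)."""
    n = len(xs) + len(ys)
    result = []
    for idxs in combinations(range(n), len(xs)):   # positions reserved for xs
        out, it_x, it_y = [None] * n, iter(xs), iter(ys)
        for i in range(n):
            out[i] = next(it_x) if i in idxs else next(it_y)
        result.append(tuple(out))
    return result
```

Two lists then stand in the domain union relation to a third list exactly when the third is among their shuffles: shuffles([1, 2], [3, 4]) contains (1, 3, 2, 4), in which 1 still precedes 2 and 3 still precedes 4, but not (2, 1, 3, 4), which reverses the order of the first list.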
40. Word Order
The idea is now that each node in a tree is associated with an ordered list of elements representing the order of words under that node. The lists associated with sisters in a local tree are not concatenated, as in standard approaches, but shuffled together. An additional feature on a constituent ([UNIONED±]) is used to control whether that constituent may or may not be linearized discontinuously. Constituents that are [UNIONED−] are also called compacted, since they behave as an unbreakable unit with respect to material higher up in the structure. Constituents that are [UNIONED+] are also called liberating, because material in liberating domains is free to interleave with material from higher domains. All constituents in the structure in (50) are compacted; therefore, this structure obeys the Nontangling Condition and allows only the linearizations in (50a−d). If one of the constituents is liberating, as in (51), the additional possibilities in (51e−h) obtain. Finally, if both e and f are liberating, all orders become possible in principle. Structures with constituents that are liberating may violate the Nontangling Condition. (Fox and Pesetsky's 2005 notion of cyclic linearization and linearization domains has certain similarities to Reape's domain union operator with liberating domains. Unlike Reape, however, Fox and Pesetsky assume non-tangling trees. For discussion of Fox and Pesetsky 2005 see the other papers in that volume of Theoretical Linguistics.) With this technology in place, Reape (1994) can easily analyze a sentence like (4d). He assigns the example the syntactic structure in Figure 40.2 and assumes the linear precedence constraints that NP precedes V, that a verb follows any verb that it governs, and that all the VPs in Figure 40.2 are liberating.

(50) [tree: a node e[UNIONED−] containing an embedded constituent f[UNIONED−] and the terminals a, b, c, d; the diagram and the licensed linearizations (50a−d) are not reproduced here]

(51) [tree: as in (50), but with e marked [UNIONED+]; the linearizations (51a−h) are not reproduced here]
(52) [S [NP1 jemand] [V1 hat] [VP1 [NP2 ihm] [VP2 [NP3 es] [V3 zu lesen]] [V2 versprochen]]]

Fig. 40.2: Reape's structure for example (4d)
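Reape's two linear precedence constraints, applied to the six words of Figure 40.2 once all VPs have been unioned into a single domain, can be checked by brute force. The encoding below is the editor's own illustrative sketch, not Reape's implementation:

```python
from itertools import permutations

# The six words of Fig. 40.2, pooled into one word-order domain
# (all VPs liberating). "hat" governs "versprochen", which governs "zu lesen".
NPS = ["jemand", "ihm", "es"]
VERBS = ["hat", "versprochen", "zu lesen"]
GOVERNS = [("hat", "versprochen"), ("versprochen", "zu lesen")]

def licensed(order):
    pos = {w: i for i, w in enumerate(order)}
    # LP constraint 1: NP precedes V
    if any(pos[np] > pos[v] for np in NPS for v in VERBS):
        return False
    # LP constraint 2: a verb follows any verb that it governs
    return all(pos[governed] < pos[governor] for governor, governed in GOVERNS)

orders = [o for o in permutations(NPS + VERBS) if licensed(o)]
```

The constraints force the fixed verb sequence zu lesen versprochen hat at the right edge, leaving exactly the six relative orders of the NPs, among them the order of (4d).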
NP precedes V forces all the verbs in Figure 40.2 to appear right-peripherally, a verb follows any verb that it governs forces the verbs in Figure 40.2 to appear in the order zu lesen versprochen hat, and the assumption that the VPs are liberating allows all six conceivable relative orders of the NPs, among them the order found in (4d). The example in (4d) and the tree in Figure 40.2 are Reape's, but they simplify considerably the more nuanced use of word-order domains to account for word order in the Mittelfeld in HPSG. The reason is that (4d) exemplifies two properties of German that have been subject to intense scrutiny: clustering of the verbs zu lesen versprochen hat and scrambling of the arguments es ihm jemand. (Hinterhölzl 2006 provides a recent book-length exploration of possible connections between these properties.) Reape treats both of these phenomena in terms of word-order domains. It is now standard practice in HPSG to assume the generalized raising analysis of Hinrichs and Nakazawa (1989, 1994) for verb clustering. Under Hinrichs and Nakazawa's analysis, verbs in the cluster may inherit the argument-taking properties of their complements. The topmost verb then takes as its own arguments the arguments of all the embedded verbs in the cluster. If this analysis is combined with a flat phrase structure for the Mittelfeld, there is no need to invoke word-order domains for argument reordering. This can be achieved by run-of-the-mill linear precedence constraints. Even under this set of assumptions, word-order domains might still have a role in accounting for scrambling under preposition stranding, (53), or scrambling from AP, (54).

(53) a. weil offenbar niemand damit gerechnet hat [German]
because apparently nobody there.with reckoned has
'because apparently nobody expected that'

b. weil da offenbar niemand mit gerechnet hat
because there apparently nobody with reckoned has
'because apparently nobody expected that'
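The generalized raising analysis of Hinrichs and Nakazawa mentioned above can be sketched as argument composition over a toy lexicon. The attribute names and the flat dictionary encoding are the editor's illustrative assumptions, not the HPSG feature geometry:

```python
# Toy lexicon for (4d): each verb lists its own NP argument and its
# verbal complement (if any). "hat" governs "versprochen" governs "zu lesen".
lexicon = {
    "zu lesen": {"args": ["es"], "comp": None},
    "versprochen": {"args": ["ihm"], "comp": "zu lesen"},
    "hat": {"args": ["jemand"], "comp": "versprochen"},
}

def composed_args(verb):
    """Argument composition: a cluster-forming verb inherits the
    unsaturated arguments of its verbal complement."""
    entry = lexicon[verb]
    inherited = composed_args(entry["comp"]) if entry["comp"] else []
    return entry["args"] + inherited
```

The topmost verb thus ends up with all three NPs as co-arguments, which is what allows a flat Mittelfeld to reorder them by plain linear precedence constraints.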
(54) a. dass auf jeden Jungen_i sein_i Vater sehr stolz war [German]
that of every boy his father very proud was
'that his_i father was very proud of every boy_i'
b. dass auf mindestens einen Jungen fast jeder Mann sehr stolz war
that of at.least one boy almost every man very proud was (∃ > ∀, ∀ > ∃)
'that at least one man was very proud of almost every boy'

As mentioned at the outset, the central idea behind word-order domains is the claim that in liberating domains, hierarchical structure and word order are independent of each other: the structure is hierarchically organized, but ordering proceeds as if on a flat structure. Under such an approach, the nominal and fully clausal arguments of verbs are treated as compacting domains, to guarantee their linear coherence. The Mittelfeld itself is made up of (possibly several layered) liberating domains. The linearization rules guarantee that verbs follow their arguments and that verbs are linearized correctly with respect to each other. The linearization rules do not, however, regulate the order of arguments with respect to each other. It should be noted that, to the extent that finer-grained, more parochial linearization rules are needed, these can be added. For example, it is a commonly held assumption that weak pronouns in the Mittelfeld are strictly ordered with respect to each other and with respect to other arguments. This refinement is often expressed by invoking dedicated positions to which these pronouns move, but it can also be expressed in terms of specific linear-precedence rules. Independently of the details of the phrase structure assumed, this system allows a very compact statement of the generalizations concerning linear order. Scrambling is simply the result of the fact that the domain union operator may allow associating various strings with the same hierarchical organization. This has the great advantage that, for example, information-structure annotations can be accessed directly by linear precedence rules. As discussed above, scrambling has effects on scope and binding relations.
Obviously, an adequate theory of scrambling that uses word-order domains cannot define scope and binding strictly in hierarchical terms, since the hierarchical organization of scrambled and unscrambled clauses is identical. Rather, it is necessary to formulate theories of binding and scope that are sensitive directly to linear order. Kathol (2000) proposes a theory of variable binding whose linear aspects are similar to the LFG proposal discussed above. As pointed out above, Bresnan (1998, 2001) assumes that when binding is not determined by the surface linear order, it is determined by rank on the relational hierarchy of grammatical functions at f-structure. Kathol (2000) (following Frank, Lee, and Rambow 1992; Frey 1993; Lechner 1998 empirically) instead assumes that binding can be determined by linear order of co-dependents, but that the subject may bind into its co-dependents even when it follows them. Similarly to the analyses previously discussed, a touchstone of this analysis is its ability to handle scrambling from NP, as in example (25). S. Müller (1997) argues that a simple extension of the idea that scrambling domains are liberating domains runs into difficulties with scrambling from NP, (25), and from AP. Treating NPs as liberating domains would give rise to the wrong prediction that material can be scrambled in between the determiner and the noun, (55b).
(55) a. dass dem Subjekt das Objekt den erste-n Platz streitig macht [German]
that the.DAT.SG.N subject the.NOM.SG.N object the.ACC.SG.M first-ACC.SG.M place contested makes
'that the object competes with the subject for the initial place'
b. *dass das dem Subjekt Objekt den erste-n Platz streitig macht
that the.NOM.SG.N the.DAT.SG.N subject object the.ACC.SG.M first-ACC.SG.M place contested makes

Similar overgeneration problems arise for scrambling from AP. S. Müller (1997, 1999) suggests treating scrambling as extraction into the Mittelfeld, i.e., he suggests extending the HPSG mechanism for long-distance filler-gap dependencies to cover these cases. de Kuthy (2002) disagrees and suggests instead extending the scope of Hinrichs and Nakazawa's (1989, 1994) generalized raising analysis to cover the cases of scrambling from NP, a treatment which could also be extended to APs, as already mentioned in passing in S. Müller (1997). The question discussed in these papers is which of the mechanisms provided by HPSG should be extended to cover scrambling: the mechanism responsible for wh-movement constructions or the one responsible for raising. This question bears a great similarity to the debate on the A- vs. Ā-nature of scrambling, which is not accidental. Similarly, the fact that binding is possible from the scrambled position when scrambling removes a PP from an NP, (25), is problematic for Kathol's formulation of the crossover constraint, as the PP and the argument into which it binds are not co-dependents.
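Kathol's linear-order-sensitive binding condition discussed above can be given a deliberately simplified sketch. The function and its arguments are the editor's encoding, not Kathol's formalism, and they abstract away from the crossover constraint:

```python
def may_bind(binder, bindee, order, subject):
    """A binder may bind a co-dependent bindee if it precedes it in surface
    order, or if the binder is the subject (which may bind into its
    co-dependents even when it follows them, per Kathol 2000)."""
    return order.index(binder) < order.index(bindee) or binder == subject
```

For instance, with surface order IO > DO > S, the subject can still bind into both objects, while DO cannot bind into a preceding IO.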
8. Scope

Although we have seen that all existing theories of scrambling wrestle with certain questions and run into problems with the same types of constructions, a direct comparison remains difficult. Too much depends on ancillary assumptions about binding, scope, prosody, etc. Nevertheless, Fanselow (2001) and Kiss (2001) claim that the interaction of scope with scrambling provides an argument against the trace-based account. As shown above, trace-based accounts have a relatively easy time with the prediction that the unmarked order is unambiguous while scrambled orders are ambiguous. This is because the unmarked order can be directly generated, and in it every argument is associated with a unique hierarchical position in the phrase structure. Scope can then be read off the c-command relations directly. In scrambled word orders, on the other hand, the scrambled arguments are associated (via copies or traces) with multiple positions in the phrase structure tree. If a quantificational element has scrambled across another quantificational element, the former c-commands the latter on the surface and the latter c-commands the trace of the former. Allowing the scoping mechanism to make reference to either of the positions associated with a scrambled element will then derive ambiguities between two quantificational expressions just in case one has scrambled across another. The idea of reconstruction to a trace position is, despite all differences, at the
heart of all trace-based accounts of scope determination (see in particular Frey 1993; Haider 1993; Haider and Rosengren 2003; Lechner 1996, 1998). While there is some disagreement in the literature on the question whether all (e.g., Frey 1993; Haider 1993; Kiss 2001) or only some (Lechner 1996, 1998; Pafel 1993) quantifiers, namely the weak ones, may take non-surface scope under scrambling, we can ignore this complication here. Fanselow's and Kiss's argument against trace-based accounts rests on the claim that they overgenerate readings. Consider example (56) (Kiss 2001: 146). The example features scrambled indirect and direct objects preceding the subject. The order amongst the objects is the same as in the neutral order, which for the verb anbieten − 'offer' is S > IO > DO. A schematic representation of the structure of this example under a trace-based account of scrambling is given in (56a). Since, under the trace-based account of reconstruction, IO and DO can reconstruct independently of each other to their respective trace positions, the account predicts a scopal ambiguity between IO and DO. IO will take scope over DO if (i) both objects are interpreted in their surface position, or if (ii) DO (but not IO) reconstructs, or if (iii) both IO and DO reconstruct. In case only IO but not DO reconstructs, scope reversal results. Fanselow (2001) and Kiss (2001) claim that in examples like these, the relevant reading is, in fact, absent (as are a number of expected readings with three quantificational expressions in scrambling structures). Both of them use this as an argument for a traceless account of scrambling.

(56) Ich glaube, dass mindestens ein-em Verleger fast jed-es Gedicht nur dies-er Dichter angeboten hat. [German]
I believe that at.least one-DAT.SG.M publisher almost every-ACC.SG.N poem only this-NOM.SG.M poet offered has
'I believe that only this poet has offered at least one publisher almost every poem.'

a. [C0 [IO [DO [S [tIO [tDO V]]]]]]
b. [C0 [IO [DO [S V]]]]
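The reconstruction options (i)−(iii) and the overgenerated reading can be enumerated mechanically. The position numbering below is the editor's illustrative encoding of (56a), with higher numbers taking wider scope:

```python
from itertools import product

# Heights in (56a): surface IO > surface DO > S > t_IO > t_DO.
POS = {"IO": {"surface": 5, "trace": 2}, "DO": {"surface": 4, "trace": 1}}

# Each scrambled object is interpreted either in its surface or its
# trace position; collect the scope relation derived by each combination.
derived = []
for io_choice, do_choice in product(("surface", "trace"), repeat=2):
    io, do = POS["IO"][io_choice], POS["DO"][do_choice]
    derived.append("IO > DO" if io > do else "DO > IO")
```

Three of the four combinations yield IO > DO, and the fourth (IO reconstructs, DO does not) yields the scope reversal DO > IO, i.e. exactly the reading that Fanselow and Kiss claim to be absent.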
Kiss (2001) assumes a phrase structure very similar to the one sketched above in the discussion of the LFG account of scrambling, (56b). He argues for a theory of scope whereby scope is either determined configurationally or relationally. The relevant configurational notion is, very roughly, c-command; the relational notion is the obliqueness hierarchy. A quantifier may either take its sister in its scope or it may take scope according to its position on the obliqueness hierarchy. If the latter, that element's scope relations are fixed with respect to all other elements according to obliqueness; no other element can take configurational scope with respect to it. Since IO and DO are arguments of the same verb, they are co-dependents. IO can take DO in its scope in two ways in (56b): configurationally, because DO is contained in IO's sister and hence in its configurational scope, or relationally, because DO is a more oblique co-dependent. DO, on the other hand, cannot take IO in its scope, because IO is not contained in DO's configurational scope and it is less rather than more oblique. While the case of three quantifiers cannot be discussed without presenting a fair amount of Kiss's technical apparatus, the upshot of the analysis is that non-surface scope construals come about by giving a quantifier scope over all more oblique co-dependents.
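Kiss's disjunctive determination of scope can be sketched crudely as follows. This is the editor's simplification of Kiss's apparatus: surface precedence among co-dependents stands in for containment in the quantifier's sister, which is only an approximation of the configurational notion:

```python
def may_outscope(q, other, surface, obliqueness):
    """q may take scope over a co-dependent 'other' either configurationally
    (here approximated: 'other' follows q in surface order, i.e. sits in q's
    sister) or relationally ('other' is more oblique than q)."""
    configurational = surface.index(other) > surface.index(q)
    relational = obliqueness.index(other) > obliqueness.index(q)
    return configurational or relational

# (56b): surface order IO > DO > S; obliqueness hierarchy S < IO < DO.
surface = ["IO", "DO", "S"]
obliqueness = ["S", "IO", "DO"]
```

Under this sketch IO may outscope DO (on either route), DO may not outscope IO, and the subject may still outscope both objects relationally despite following them, mirroring the claim that non-surface construals arise only via obliqueness.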
Cook and Payne (2006) claim that Kiss's characterization of scope is inherently disjunctive, and thereby non-explanatory. They diagnose a disjunction because, for a given quantifier, scope is determined on the basis of prominence either in the phrase structure or on the argument hierarchy. Kiss's and Fanselow's argument is relatively simple and undermines an important support of the trace-based account. Unfortunately, the facts are less than clear. Frey (1993: 188) gives an example which is identical in relevant structural properties to (56) and claims that it is ambiguous. Further empirical work should be able to shed more light on this question.
9. Conclusion

This chapter has argued for a structural, syntactic approach to scrambling and discussed a number of different approaches to the phenomenon. Word order alternations in languages that are usually dubbed non-configurational were set aside at the beginning. In these languages, word order freedom is more extreme than in scrambling languages like German and, crucially, the word order alternations provide no or very little evidence of being structural. The language of this type most frequently discussed in the formal literature is Warlpiri, a Pama-Nyungan language of Australia. Warlpiri has a set of auxiliaries that occupy the second clausal position, but the relative position of other elements in the clause is not fixed. This is illustrated in (57) from Legate (2002: 16−17), based on Hale (1983: 6−7).

(57) a. Ngarrka-ngku ka wawirri panti-rni. [Warlpiri]
man-ERG PRS.IPFV kangaroo spear-NPST
'The man is spearing the kangaroo.'

b. Wawirri ka panti-rni ngarrka-ngku.
kangaroo PRS.IPFV spear-NPST man-ERG

c. Panti-rni ka ngarrka-ngku wawirri.
spear-NPST PRS.IPFV man-ERG kangaroo

d. Ngarrka-ngku ka panti-rni wawirri.
man-ERG PRS.IPFV spear-NPST kangaroo

e. Panti-rni ka wawirri ngarrka-ngku.
spear-NPST PRS.IPFV kangaroo man-ERG

f. Wawirri ka ngarrka-ngku panti-rni.
kangaroo PRS.IPFV man-ERG spear-NPST
Warlpiri also allows constituents to split, (58) (Hale 1983: 6), and arguments of the verb to remain unpronounced altogether (Hale 1983: 7). (It should be noted that, as argued in Austin and Bresnan 1996, these properties do not necessarily co-occur.)
(58) a. Wawirri yalumpu kapi-rna panti-rni. [Warlpiri]
kangaroo that AUX spear-NPST
'I will spear that kangaroo.'

b. Wawirri kapi-rna panti-rni yalumpu.
kangaroo AUX spear-NPST that

(59) Panti-rni ka. [Warlpiri]
spear-NPST PRS.IPFV
'He/she is spearing him/her/it.'
Now, the question of whether the word order alternations in (57) are syntactic in the sense of section 3 of this chapter is usually answered in the negative (see Hale 1994). The reason is that (non-)coreference between elements seems to be fixed independently of the order of elements: we do not find crossover effects, condition C effects hold or fail to hold independently of the order of elements, and familiar subject/object asymmetries are usually claimed to be absent. (A similar state of affairs is discussed for the Hungarian VP in É. Kiss 1987, 1994. Note, though, that the data available for Warlpiri are much less detailed than for better-studied languages, and many of the conclusions must therefore remain tentative.) The difference between a structural operation of scrambling and apparently non-structural word-order permutations in non-configurational languages provides a prima facie argument for treating non-configurationality in a substantially different way from scrambling. Not surprisingly, then, a number of proposals have been made according to which the syntax of non-configurational languages differs quite dramatically from that of configurational languages. One tradition (Austin and Bresnan 1996; Bresnan 2001; Hale 1983) assumes that non-configurational languages possess a flat, n-ary branching surface syntactic representation that, in particular, does not contain a VP-node, which would include verb and object to the exclusion of the subject. Another tradition, going back to Jelinek (1984), assumes that noun phrases in non-configurational languages never occupy argument positions of the verb but are adjuncts that semantically modify (null or clitic) pronouns, which are the actual arguments of the verb. Not all researchers have found the prima facie argument entirely convincing, though.
They have instead tried to account for the word order in non-configurational languages using the tools already available for the analysis of scrambling in the sense of this chapter and of other displacement phenomena. Thus, Donohue and Sag (1999) sketch an approach to word-order in Warlpiri using Reape-style domain union, while Legate (2002, 2003) suggests that a particular combination of movement operations needed independently in the analysis of configurational languages can provide an analysis for Warlpiri. To determine whether the prima facie argument for a distinct system of non-configurational word-order alternations stands up, much more detailed work on the relevant languages will be necessary. Besides arguing for a structural approach to scrambling, this chapter has provided an overview of the main approaches to the syntactic structures and processes underlying scrambling. Despite many differences between the approaches a number of convergent themes emerge. The noncanonical cases of scrambling, that is, scrambling from NP, AP, and PP, pose unsolved difficulties for almost all theories. Partly, the difficulties stem
from the urge to assimilate scrambling to established phenomena, and partly from the concepts of locality for binding relations. The issue of a trigger for movement was highlighted, as was the question of what the cross-linguistic correlates of scrambling are. The final section on Fanselow's (2001) and Kiss's (2001) argument from scope suggests that, 40 years after Ross set aside scrambling as too different from other syntactic rules even to be considered syntax, enough progress has been made that scrambling can begin to inform theory in earnest.
10. References (selected) Abels, Klaus 2007 Towards a restrictive theory of (remnant) movement. Linguistic Variation Yearbook 7: 5−120. Abraham, Werner (ed.) 1985 Erklärende Syntax des Deutschen. Tübingen: Gunter Narr Verlag. Aoun, Joseph, and Yen-Hui Audrey Li 1989 Scope and constituency. Linguistic Inquiry 20: 141−172. Aoun, Joseph, and Yen-Hui Audrey Li 1993 Syntax of Scope. Cambridge, MA: MIT Press. Austin, Peter, and Joan W. Bresnan 1996 Non-configurationality in Australian aboriginal languages. Natural Language and Linguistic Theory 14: 215−268. Bach, Emmon 1979 Control in Montague Grammar. Linguistic Inquiry 10: 515−531. Baker, Mark 1988 Incorporation: A Theory of Grammatical Function. Chicago: University of Chicago Press. Baker, Mark 1997 Thematic roles and syntactic structure. In: Liliane Haegeman (ed.), Elements of Grammar: Handbook in Generative Syntax, 73−173. Dordrecht, Boston, London: Kluwer Academic Publishers. Baltin, Mark 2001 A-Movements. In: Mark Baltin, and Chris Collins (eds.), The Handbook of Syntactic Theory, 226−254. Oxford: Blackwell. Bayer, Josef, and Jaklin Kornfilt 1994 Against scrambling as an instance of Move-alpha. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 17−60. (Studies in Generative Grammar 41.) Berlin: De Gruyter. Behagel, Otto 1932 Deutsche Syntax. Band 4: Wortstellung. Periodenbau. Heidelberg: Carl Winters Universitätsbuchhandlung. Belletti, Adriana 2004 Aspects of the Low IP Area. In: Luigi Rizzi (ed.), The Structure of CP and IP − The Cartography of Syntactic Structures, Volume 2, 16−51. (Oxford studies in comparative syntax: The cartography of syntactic structures.) Oxford: Oxford University Press. Berman, Judith 2003 Clausal Syntax of German. CSLI Publications. Blevins, James P. 1990 Syntactic complexity: Evidence for discontinuity and multidomination. PhD thesis. University of Massachusetts.
Bošković, Željko 2004 Topicalization, focalization, lexical insertion, and scrambling. Linguistic Inquiry 35: 613−638. Bošković, Željko, and Daiko Takahashi 1998 Scrambling and last resort. Linguistic Inquiry 29: 347−366. Bresnan, Joan W. 1998 Morphology competes with syntax: Explaining typological variation in weak crossover effects. In: Pilar Barbosa et al. (eds.), Is the Best Good Enough? − Optimality and Competition in Syntax. Cambridge, MA: MIT Press. Bresnan, Joan W. 2001 Lexical-Functional Syntax. Oxford: Blackwell. Browning, Marguerite, and Ezat Karimi 1994 Scrambling to object position in Persian. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 61−100. (Studies in Generative Grammar 41.) Berlin: De Gruyter. Bunt, Harry, and Arthur van Horck (eds.) 1996 Discontinuous Constituency. (Natural Language Processing 6.) Mouton de Gruyter. Choi, Hye-Won 1996 Optimizing structure in context. PhD thesis. Stanford, CA: Stanford University. Choi, Hye-Won 1997 Information structure, phrase structure, and their interface. In: Miriam Butt, and Tracy Holloway King (eds.), Proceedings of the LFG’97 Conference. Stanford, CA: CSLI Publications. Choi, Hye-Won 2001 Phrase structure, information structure, and resolution of mismatch. In: Peter Sells (ed.), Formal and Empirical Issues in Optimality Theoretic Syntax, 17−62. (Studies in Constraint-Based Lexicalism.) Stanford, CA: CSLI Publications. Chomsky, Noam 1977 On wh-movement. In: Peter Culicover, Thomas Wasow, and Adrian Akmajian (eds.), Formal Syntax, 71−132. New York: Academic Press. Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris. Chomsky, Noam 1982 Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: MIT Press. Chomsky, Noam 1986 Barriers. (Linguistic Inquiry Monograph 13.) Cambridge, MA: MIT Press. Chomsky, Noam 1993 A minimalist program for linguistic theory. In: Kenneth Hale, and Samuel J. 
Keyser (eds.), The View from Building 20, 1−52. Cambridge, MA: MIT Press. Chomsky, Noam 1995a Categories and transformations. In: The Minimalist Program, 21−394. Cambridge, MA: MIT Press. Chomsky, Noam 1995b The Minimalist Program. Cambridge, MA: MIT Press. Chomsky, Noam 2000 Minimalist inquiries: The framework. In: Roger Martin, David Michaels, and Juan Uriagereka (eds.), Step by Step: Essays on Minimalism in Honor of Howard Lasnik, 89−155. Cambridge, MA: MIT Press. Chomsky, Noam 2008 On phases. In: Robert Freidin, Carlos Otero, and Maria-Luisa Zubizarreta (eds.), Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud, 133− 166. Cambridge, MA: MIT Press.
Chomsky, Noam, and Howard Lasnik 1993 The theory of principles and parameters. In: Joachim Jacobs et al. (eds.), Syntax: An International Handbook of Contemporary Research, vol. 1, 506−569. Berlin: Walter de Gruyter. Cinque, Guglielmo 1990 Types of A′-Dependencies. Cambridge, MA: MIT Press. Cinque, Guglielmo 2005 Deriving Greenberg’s universal 20 and its exceptions. Linguistic Inquiry 36: 315−332. Cinque, Guglielmo 2009 The fundamental left-right asymmetry of natural languages. In: Sergio Scalise, Elisabetta Magni, and Antonietta Bisetto (eds.), Universals of Language Today, 165−184. Dordrecht: Springer. Cook, Philippa, and John Payne 2006 Information structure and scope in German. In: Miriam Butt, and Tracy Holloway King (eds.), The Proceedings of the LFG06 Conference, 124−144. Stanford, CA: CSLI Publications. Corver, Norbert, and Henk van Riemsdijk, eds. 1994 Studies on Scrambling. (Studies in Generative Grammar 41.) Berlin: De Gruyter. Deprez, Viviane 1994 Parameters of object movement. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 101−152. (Studies in Generative Grammar 41.) Berlin: De Gruyter. Diesing, Molly 1997 Yiddish VP order and typology of object movement in Germanic. Natural Language and Linguistic Theory 15: 369−427. Donohue, Cathryn, and Ivan A. Sag 1999 Domains in Warlpiri. Paper presented at HPSG 6 in Edinburgh. Dryer, Matthew S. 2007 Word order. In: Timothy Shopen (ed.), Language Typology and Syntactic Description. 2nd Edition, vol. 1, Clause Structure, 61−131. Cambridge, UK: Cambridge University Press. Enç, Mürvet 1991 The semantics of specificity. Linguistic Inquiry 22: 1−25. Falk, Yehuda 1983 Constituency, word order, and phrase structure rules. Linguistic Analysis 11: 331−360. Fanselow, Gisbert 1987 Konfigurationalität. Untersuchungen zur Universalgrammatik am Beispiel des Deutschen. Tübingen: Gunter Narr Verlag. Fanselow, Gisbert 1990 Scrambling as NP-movement. 
In: Günther Grewendorf, and Wolfgang Sternefeld (eds.), Scrambling and Barriers, 113−142. John Benjamins. Fanselow, Gisbert 1993 Die Rückkehr der Basisgenerierer. Groninger Arbeiten zur Germanistischen Linguistik 36: 1−74. Fanselow, Gisbert 2001 Features, θ-roles, and free constituent order. Linguistic Inquiry 32: 405−437. Felix, Sascha 1985 Parasitic gaps in German. In: Werner Abraham (ed.), Erklärende Syntax des Deutschen, 173−200. Tübingen: Gunter Narr Verlag. Fiengo, Robert 1977 On trace theory. Linguistic Inquiry 8: 35−62.
Fox, Danny, and David Pesetsky 2005 Cyclic linearization and its interaction with other aspects of grammar: a reply. Theoretical Linguistics 31: 235−262. Frank, Robert, Young-Suk Lee, and Owen Rambow 1992 Scrambling as non-operator movement and the special status of subjects. In: Sjef Barbiers, Marcel den Dikken, and C. Levelt (eds.), Proceedings of the Third Leiden Conference for Junior Linguists, 135−154. University of Leiden. Frey, Werner 1993 Syntaktische Bedingungen für die semantische Interpretation. (Studia Grammatica 35.) Berlin: Akademie Verlag. Frey, Werner, and Karin Pittner 1998 Zur Positionierung der Adverbiale im deutschen Mittelfeld. Linguistische Berichte 176: 48−534. Gärtner, Hans Martin, and Markus Steinbach 2000 What do reduced pronominals reveal about the syntax of Dutch and German. Linguistics in Potsdam 9: 7−62. Gazdar, Gerald 1981 Unbounded dependencies and coordinate structure. Linguistic Inquiry 12: 155−184. Gazdar, Gerald et al. 1985 Generalized Phrase Structure Grammar. Cambridge, MA: Harvard University Press. Greenberg, Joseph 1963 Some universals of grammar with particular reference to the order of meaningful elements. In: Joseph Greenberg (ed.), Universals of Language, 73−113. Cambridge, MA: MIT Press. Grewendorf, Günther 1985 Anaphern bei Objekt Koreferenz im Deutschen. Ein Problem für die Rektions-BindungsTheorie. In: Werner Abraham (ed.), Erklärende Syntax des Deutschen, 13−171. Tübingen: Gunter Narr Verlag. Grewendorf, Günther 2005 The discourse configurationality of scrambling. In: Joachim Sabel, and Mamoru Saito (eds.), The Free Word Order Phenomenon: Its Syntactic Sources and Diversity, 75−135. (Studies in Generative Grammar 69.) Berlin and New York: de Gruyter. Grewendorf, Günther, and Joachim Sabel 1999 Scrambling in German and Japanese: adjunction versus multiple specifiers. Natural Language and Linguistic Theory 17: 1−65. Haider, Hubert 1993 Deutsche Syntax, Generativ: Vorstudien zur Theorie einer Projektiven Grammatik. 
Tübingen: Gunter Narr Verlag. Haider, Hubert 1997 Scrambling: Locality, economy and directionality. In: Shigeo Tonoike (ed.), Scrambling, 61−91. Tokyo: Kurosio. Haider, Hubert 2006 Mittelfeld phenomena (Scrambling in Germanic). In: Martin Everaert, and Henk van Riemsdijk (eds.), The Blackwell Companion to Syntax, vol. 3, 204−274. Malden, MA: Blackwell Publishers. Haider, Hubert, and Inger Rosengren 1998 Scrambling. Sprache und Pragmatik 49. Haider, Hubert, and Inger Rosengren 2003 Scrambling: Nontriggered chain formation in OV languages. Journal of Germanic Linguistics 15: 203−267. Hale, Kenneth 1983 Warlpiri and the grammar of non-configurational languages. Natural Language and Linguistic Theory 1: 5−48.
Hale, Kenneth 1994 Core structures and adjunctions in Warlpiri syntax. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 185−220. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
Hawkins, John A. 1990 A parsing theory of word order universals. Linguistic Inquiry 21: 223−261.
Hinrichs, Erhard W., and Tsuneko Nakazawa 1989 Flipped out: AUX in German. In: Papers from the 25th Annual Regional Meeting of the Chicago Linguistic Society, 187−202. Chicago: Chicago Linguistic Society.
Hinrichs, Erhard W., and Tsuneko Nakazawa 1994 Linearizing AUXs in German verbal complexes. In: John Nerbonne, Klaus Netter, and Carl Pollard (eds.), German in Head-Driven Phrase Structure Grammar, 11−37. Stanford, CA: CSLI Publications.
Hinterhölzl, Roland 2006 Scrambling, Remnant Movement, and Restructuring in West Germanic. (Oxford Studies in Comparative Syntax.) Oxford: Oxford University Press.
Hoji, Hajime 1985 Logical form constraints and configurational structures in Japanese. Doctoral dissertation, University of Washington.
Hornstein, Norbert 1995 Logical Form: From GB to Minimalism. Cambridge, MA: Blackwell.
Huck, Geoffrey J., and Almerindo E. Ojeda (eds.) 1987 Discontinuous Constituency. (Syntax and Semantics 20.) Orlando: Academic Press.
Jacobson, Pauline 1990 Raising as function composition. Linguistics and Philosophy 13: 423−475.
Jelinek, Eloise 1984 Empty categories, case, and configurationality. Natural Language and Linguistic Theory 2: 39−76.
Johnston, Jason C., and Iksan Park 2001 Some problems with a lowering account of scrambling. Linguistic Inquiry 32: 727−732.
Kathol, Andreas 2000 Linear Syntax. Oxford: Oxford University Press.
Kathol, Andreas 2001 On the nonexistence of true parasitic gaps in Standard German. In: Peter Culicover, and Paul Postal (eds.), Parasitic Gaps, 315−338. Cambridge, MA: MIT Press.
Kayne, Richard 1981 Unambiguous paths. In: Robert May, and Jan Koster (eds.), Levels of Syntactic Representation, 143−183. Dordrecht: Foris. [Reprinted in Kayne 1984.]
Kayne, Richard 1994 The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Keenan, Edward, and Edward P. Stabler 2003 Bare Grammar: Lectures on Linguistic Invariants. Stanford, CA: CSLI Publications.
Kidwai, Ayesha 2000 XP-Adjunction in Universal Grammar: Scrambling and Binding in Hindi-Urdu. Oxford: Oxford University Press.
É. Kiss, Katalin 1987 Configurationality in Hungarian. (Studies in Natural Language and Linguistic Theory.) Norwell, MA: Kluwer Academic Publishers.
É. Kiss, Katalin 1994 Scrambling as the base-generation of random complement order. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 221−256. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
40. Word Order
Kiss, Tibor 2001 Configurational and relational scope determination in German. In: W. Detmar Meurers, and Tibor Kiss (eds.), Constraint-Based Approaches to Germanic Syntax, 141−175. Stanford, CA: CSLI Publications.
Kitahara, Hisatsugu 2002 Scrambling, case, and interpretability. In: Samuel David Epstein, and T. Daniel Seely (eds.), Derivation and Explanation in the Minimalist Program, 167−183. Malden, MA: Blackwell Publishing.
Koopman, Hilda, and Dominique Sportiche 1991 The position of subjects. Lingua 85: 211−258.
Kornfilt, Jaklin 2003 Scrambling, subscrambling, and case in Turkish. In: Simin Karimi (ed.), Word Order and Scrambling, 125−155. Malden, MA: Blackwell.
Kuthy, Kordula de 2002 Discontinuous NPs in German: A Case Study of the Interaction of Syntax, Semantics, and Pragmatics. (Studies in Constraint-Based Lexicalism.) Stanford, CA: CSLI Publications.
Kuthy, Kordula de, and Walt Detmar Meurers 2001 On partial constituent fronting in German. The Journal of Comparative Germanic Linguistics 3: 143−205.
Lasnik, Howard 1995 Case and expletives revisited: On greed and other human failings. Linguistic Inquiry 26: 615−633.
Lasnik, Howard, and Mamoru Saito 1992 Move α: Conditions on its Application and Output. Cambridge, MA: MIT Press.
Lechner, Winfried 1996 On semantic and syntactic reconstruction. Wiener Linguistische Gazette 57−59: 63−100.
Lechner, Winfried 1998 Two kinds of reconstruction. Studia Linguistica 52: 276−310.
Lee, Young-Suk, and Beatrice Santorini 1994 Towards resolving Webelhuth’s paradox: Evidence from German and Korean. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
Legate, Julie Anne 2002 Warlpiri: Theoretical implications. PhD thesis, Massachusetts Institute of Technology.
Legate, Julie Anne 2003 Some interface properties of the phase. Linguistic Inquiry 34: 506−515.
Lenerz, Jürgen 1977 Zur Abfolge nominaler Satzglieder im Deutschen. Tübingen: Narr.
Lenerz, Jürgen 2001 Scrambling and reference in German. In: Werner Abraham, and C. Jan-Wouter Zwart (eds.), Issues in Formal German(ic) Typology, 179−192. (Linguistik Aktuell 45.) Amsterdam and Philadelphia: John Benjamins Publishing Company.
Levin, Beth, and Malka Rappaport Hovav 1995 Unaccusativity: At the Syntax − Lexical Semantics Interface. (Linguistic Inquiry Monograph 26.) Cambridge, MA: MIT Press.
Mahajan, Anoop 1990 The A/A-bar distinction and movement theory. PhD thesis, MIT.
Mahajan, Anoop 1994 Toward a unified theory of scrambling. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 301−330. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
McCawley, James D. 1982 Parentheticals and discontinuous constituent structure. Linguistic Inquiry 13: 91−106.
McCawley, James D. 1987 Some additional evidence for discontinuity. In: Geoffrey J. Huck, and Almerindo E. Ojeda (eds.), Discontinuous Constituency. (Syntax and Semantics 20.) Orlando: Academic Press.
Meinunger, André 2000 Syntactic Aspects of Topic and Comment. (Linguistik Aktuell 38.) Amsterdam and Philadelphia: John Benjamins.
Miyagawa, Shigeru 2005 EPP and semantically vacuous scrambling. In: Joachim Sabel, and Mamoru Saito (eds.), The Free Word Order Phenomenon: Its Syntactic Sources and Diversity, 181−220. (Studies in Generative Grammar 69.) Berlin and New York: de Gruyter.
Miyagawa, Shigeru 2006 On the “undoing” property of scrambling: A response to Bošković. Linguistic Inquiry 37: 607−624.
Müller, Gereon 1995 A-bar Syntax: A Study of Movement Types. (Studies in Generative Grammar 42.) New York: Mouton de Gruyter.
Müller, Gereon 1998 Incomplete Category Fronting: A Derivational Approach to Remnant Movement in German. Dordrecht and Boston: Kluwer Academic Publishers.
Müller, Gereon 1999 Optimality, word order, and markedness in German. Linguistics 37: 777−818.
Müller, Gereon, and Wolfgang Sternefeld 1993 Improper movement and unambiguous binding. Linguistic Inquiry 24: 461−507.
Müller, Gereon, and Wolfgang Sternefeld 1994 Scrambling as A-bar movement. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 331−386. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
Müller, Stefan 1997 Scrambling in German − extraction into the Mittelfeld. In: Proceedings of the Tenth Pacific Asia Conference on Language, Information and Computation, 79−83. City University of Hong Kong.
Müller, Stefan 1999 Deutsche Syntax Deklarativ. (Linguistische Arbeiten 394.) Tübingen: Niemeyer.
Müller, Stefan 2004 Continuous or discontinuous constituents? A comparison between syntactic analyses for constituent order and their processing systems. Research on Language and Computation 2: 209−257.
Neeleman, Ad 1994 Scrambling as a D-structure phenomenon. In: Norbert Corver, and Henk van Riemsdijk (eds.), Studies on Scrambling, 387−429. (Studies in Generative Grammar 41.) Berlin: De Gruyter.
Neeleman, Ad, and Hans van de Koot 2002 The configurational matrix. Linguistic Inquiry 33: 529−574.
Neeleman, Ad, and Fred Weerman 1999 Flexible Syntax. Dordrecht and Boston: Kluwer Academic Publishers.
Nerbonne, John, Klaus Netter, and Carl Pollard (eds.) 1994 German in Head-Driven Phrase Structure Grammar. Stanford, CA: CSLI Publications.
Nishigauchi, Taisuke 2002 Scrambling and reconstruction at LF. Gengo Kenkyu 121: 49−105.
Ojeda, Almerindo E. 1988 A linear precedence account of cross-serial dependencies. Linguistics and Philosophy 11: 457−492.
Pafel, Jürgen 1993 Scope and word order. In: Joachim Jacobs et al. (eds.), Syntax − An International Handbook of Contemporary Research, 867−879. (HSK.) Berlin: de Gruyter.
Pafel, Jürgen 2005 Quantifier Scope in German. Amsterdam and Philadelphia: John Benjamins Publishing Company.
Partee, Barbara Hall, Alice G. B. ter Meulen, and Robert Eugene Wall 1990 Mathematical Methods in Linguistics. Dordrecht and Boston: Kluwer Academic Publishers.
Perlmutter, David M., and Paul Postal 1984 The 1 Advancement Exclusiveness Law. In: David M. Perlmutter, and Carol Rosen (eds.), Studies in Relational Grammar, volume 2, 81−125. Chicago: University of Chicago Press.
Perlmutter, David M., and Carol Rosen (eds.) 1984 Studies in Relational Grammar, volume 2. Chicago: University of Chicago Press.
Pesetsky, David 1995 Zero Syntax. Cambridge, MA: MIT Press.
Pollard, Carl, and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Postal, Paul 1971 Cross-over Phenomena. New York: Holt, Rinehart, and Winston.
Postal, Paul 1993a Parasitic gaps and the across-the-board phenomenon. Linguistic Inquiry 24: 735−754.
Postal, Paul 1993b Some defective paradigms. Linguistic Inquiry 24: 357−364.
Postal, Paul 1998 Three Investigations of Extraction. (Current Studies in Linguistics Series 29.) Cambridge, MA: MIT Press.
Postal, Paul 2004 Skeptical Linguistic Essays. Oxford and New York: Oxford University Press.
Putnam, Michael T. 2007 Scrambling and the Survive Principle. (Linguistik Aktuell 115.) Amsterdam and Philadelphia: John Benjamins Publishing Company.
Reape, Mike 1994 Domain union and word order variation in German. In: John Nerbonne, Klaus Netter, and Carl Pollard (eds.), German in Head-Driven Phrase Structure Grammar, 151−197. Stanford, CA: CSLI Publications.
Reape, Mike 1996 Getting things in order. In: Harry Bunt, and Arthur van Horck (eds.), Discontinuous Constituency, 209−254. (Natural Language Processing 6.) Berlin: Mouton de Gruyter.
Reinhart, Tanya 1983 Anaphora and Semantic Interpretation. London: Croom Helm.
Reinhart, Tanya, and Yosef Grodzinsky 1993 The innateness of binding and coreference. Linguistic Inquiry 24: 69−101.
Reis, Marga, and Inger Rosengren 1992 What do wh-imperatives tell us about wh-movement? Natural Language and Linguistic Theory 10: 79−118.
Riemsdijk, Henk van, and Norbert Corver 1997 The position of the head and the domain of scrambling. In: Bohumil Palek (ed.), Typology: Prototypes, Item Orderings and Universals. Proceedings of LP96, Prague, 57−90. Prague: Charles University Press.
Rizzi, Luigi 1990 Relativized Minimality. (Linguistic Inquiry Monograph 16.) Cambridge, MA: MIT Press.
Rosen, Carol 1984 The interface between semantic roles and initial grammatical relations. In: David M. Perlmutter, and Carol Rosen (eds.), Studies in Relational Grammar, volume 2, 59−77. Chicago: University of Chicago Press.
Ross, John 1967 Constraints on variables in syntax. PhD thesis, Massachusetts Institute of Technology.
Sabel, Joachim 2002 Die Doppelobjekt-Konstruktion im Deutschen. Linguistische Berichte 19: 229−244.
Sabel, Joachim 2005 String-vacuous scrambling and the Effect on Output Condition. In: Joachim Sabel, and Mamoru Saito (eds.), The Free Word Order Phenomenon: Its Syntactic Sources and Diversity, 281−334. (Studies in Generative Grammar 69.) Berlin and New York: de Gruyter.
Sabel, Joachim, and Mamoru Saito (eds.) 2005 The Free Word Order Phenomenon: Its Syntactic Sources and Diversity. (Studies in Generative Grammar 69.) Berlin and New York: de Gruyter.
Sag, Ivan A. 1987 Grammatical hierarchy and linear precedence. In: Geoffrey J. Huck, and Almerindo E. Ojeda (eds.), Discontinuous Constituency, 303−340. (Syntax and Semantics 20.) Orlando: Academic Press.
Saito, Mamoru 1985 Some asymmetries in Japanese and their theoretical implications. PhD thesis, MIT.
Saito, Mamoru 1989 Scrambling as semantically vacuous A′-movement. In: Mark Baltin, and Anthony Kroch (eds.), Alternative Conceptions of Phrase Structure, 182−200. Chicago: University of Chicago Press.
Saito, Mamoru 1992 Long distance scrambling in Japanese. Journal of East Asian Linguistics 1: 69−118.
Saito, Mamoru 2003 A derivational approach to the interpretation of scrambling chains. Lingua 113: 481−518.
Saito, Mamoru 2004 Japanese scrambling in a comparative perspective. In: David Adger, Cécile de Cat, and George Tsoulas (eds.), Peripheries: Syntactic Edges and their Effects, 143−163. (Studies in Natural Language and Linguistic Theory 59.) Dordrecht, Boston, and London: Kluwer Academic Publishers.
Stechow, Arnim von, and Wolfgang Sternefeld 1988 Bausteine Syntaktischen Wissens: ein Lehrbuch der Generativen Grammatik. Opladen: Westdeutscher Verlag.
Sternefeld, Wolfgang, and Sam Featherston 2003 The German reciprocal einander in Double Object Constructions. In: Lutz Gunkel, Gereon Müller, and Gisela Zifonun (eds.), Arbeiten zur Reflexivierung, 239−266. (Linguistische Arbeiten 481.) Tübingen: Niemeyer.
Stroik, Thomas S. 1999 The Survive Principle. Linguistic Analysis 29: 282−309.
Tada, Hiroaki 1993 A/A-bar partition in derivation. PhD thesis, MIT.
Ueyama, Ayumi 1998 Two types of dependency. PhD thesis, University of Southern California.
Ueyama, Ayumi 2003 Two types of scrambling construction in Japanese. In: Andrew Barss (ed.), Anaphora: A Reference Guide, 23−71. Malden, MA: Blackwell Publishing.
Vikner, Sten 2001 Verb Movement Variation in Germanic and Optimality Theory. Habilitationsschrift, Neuphilologische Fakultät, Universität Tübingen.
Wasow, Thomas 1972 Anaphoric relations in English. PhD thesis, MIT. [Revised version published as Wasow 1979.]
Webelhuth, Gert 1989 Syntactic saturation phenomena and the modern Germanic languages. PhD thesis, University of Massachusetts, Amherst.
Webelhuth, Gert 1992 Principles and Parameters of Syntactic Saturation. New York: Oxford University Press.
Williams, Edwin S. 1997 Blocking and anaphora. Linguistic Inquiry 28: 577−628.
Williams, Edwin S. 2002 Representation Theory. Cambridge, MA: MIT Press.
Williams, Edwin S. 2011 Regimes of Derivation in Syntax and Morphology. New York: Routledge.
Wurmbrand, Susanne 2008 Word order and scope in German. Groninger Arbeiten zur Germanistischen Linguistik 46: 89−110.
Zwart, C. Jan-Wouter 1993 Dutch syntax: A minimalist approach. PhD thesis, University of Groningen.
Klaus Abels, London (UK)